Caching & Performance Engineering
The fastest query is the one you never make. Caching is not a magic performance fix — it is a careful engineering tradeoff between consistency, latency, memory, and complexity. This section teaches you how to design and operate caching systems at scale: from Spring's abstraction layer to Redis internals, from cache stampede prevention to JVM profiling and load testing.
Why Caching? The Real Story
Every junior engineer knows caching makes things faster. Senior engineers know it also introduces consistency problems, thundering herds, memory pressure, and subtle bugs that only appear under production load. Caching is a tradeoff, not a free lunch.
Don't cache data that changes very frequently (stock prices tick-by-tick), data that is personal and unique per user (shopping cart contents are better in a session store), data where stale reads cause financial or safety issues (account balances in a payment system), or small data sets that are fast to query anyway. Caching for its own sake adds complexity without proportional benefit.
Spring Cache Abstraction
Spring provides a cache abstraction layer via annotations. Your business code stays clean — cache interaction is handled by AOP proxies. The abstraction works with any backing store: ConcurrentHashMap, Caffeine, Redis, Hazelcast, and more.
Core Annotations
// 1. Enable caching in your configuration
@SpringBootApplication
@EnableCaching
public class Application { ... }
// 2. @Cacheable — cache the return value
@Service
public class ProductService {
@Cacheable(value = "products", key = "#id")
public Product findById(Long id) {
// This body only executes on cache MISS
log.info("Loading product {} from database", id);
return productRepository.findById(id)
.orElseThrow(() -> new ProductNotFoundException(id));
}
// Conditional caching
@Cacheable(
value = "products",
key = "#id",
condition = "#id > 0", // only cache if condition true
unless = "#result == null" // don't cache null results
)
public Product findByIdConditional(Long id) { ... }
// 3. @CachePut — always execute, always update cache
@CachePut(value = "products", key = "#product.id")
public Product updateProduct(Product product) {
return productRepository.save(product);
}
// 4. @CacheEvict — remove from cache
@CacheEvict(value = "products", key = "#id")
public void deleteProduct(Long id) {
productRepository.deleteById(id);
}
// Evict entire cache
@CacheEvict(value = "products", allEntries = true)
public void clearProductCache() { }
// 5. @Caching — combine multiple cache operations
@Caching(evict = {
@CacheEvict(value = "products", key = "#product.id"),
@CacheEvict(value = "product-lists", allEntries = true)
})
public void invalidateProductCaches(Product product) { }
}
Cache Configuration with Caffeine (In-Process)
For local, single-instance caching, Caffeine is the de facto standard JVM cache. Its Window TinyLFU (W-TinyLFU) eviction policy achieves higher hit rates than LRU or LFU across most access patterns.
@Configuration
@EnableCaching
public class CacheConfig {
@Bean
public CacheManager caffeineCacheManager() {
CaffeineCacheManager manager = new CaffeineCacheManager();
// Per-cache configuration
manager.registerCustomCache("products",
Caffeine.newBuilder()
.maximumSize(10_000)
.expireAfterWrite(Duration.ofMinutes(10))
.expireAfterAccess(Duration.ofMinutes(5))
.recordStats() // enable hit/miss metrics
.build());
manager.registerCustomCache("users",
Caffeine.newBuilder()
.maximumSize(5_000)
.expireAfterWrite(Duration.ofMinutes(30))
.weakValues() // GC-eligible values
.build());
return manager;
}
}
Caffeine (in-process): sub-millisecond latency, no network hop, no serialization overhead. Perfect for read-heavy, rarely-changing reference data (country lists, product categories, config). The catch: each JVM instance has its own cache — updates on one node don't propagate to others — and cache size is limited by heap.
Redis (out-of-process): shared across all instances, survives application restarts, and can hold far more data, at roughly 0.5–2ms latency per call. The right choice for user sessions, rate-limiting counters, distributed locks, and any data that must be consistent across cluster nodes.
Redis: Architecture & Internals
Redis is a single-threaded, in-memory data structure server. "Single-threaded" sounds like a limitation but is actually its secret weapon — no lock contention, predictable latency, and atomic operations on all commands. Redis 6+ added I/O threads for network handling while keeping command execution single-threaded.
String — counters, flags, sessions; values up to 512MB.
Hash — object fields, user profiles; memory-efficient for small hashes.
List — queues, activity feeds; O(1) push/pop from both ends.
Set — unique visitors, tags; supports intersection/union operations.
Sorted Set — leaderboards, rate limiting; score-ordered, O(log N) operations.
Stream — event logs, message queues; consumer groups, persistence.
Spring Data Redis Setup
spring:
data:
redis:
host: localhost
port: 6379
password: ${REDIS_PASSWORD}
timeout: 2000ms
lettuce:
pool:
max-active: 16 # max connections in pool
max-idle: 8
min-idle: 2
max-wait: 100ms # block at most 100ms waiting for connection
cache:
type: redis
redis:
time-to-live: 600000 # 10 minutes default TTL (ms)
cache-null-values: false
use-key-prefix: true
key-prefix: "myapp:"
@Configuration
public class RedisConfig {
@Bean
public RedisTemplate<String, Object> redisTemplate(
RedisConnectionFactory factory) {
RedisTemplate<String, Object> template = new RedisTemplate<>();
template.setConnectionFactory(factory);
// Use JSON serialization with embedded type info (NOT Java serialization!)
// Plain Jackson2JsonRedisSerializer<Object> would deserialize to LinkedHashMap
// and break casts back to domain types
GenericJackson2JsonRedisSerializer jsonSerializer =
new GenericJackson2JsonRedisSerializer();
template.setKeySerializer(new StringRedisSerializer());
template.setValueSerializer(jsonSerializer);
template.setHashKeySerializer(new StringRedisSerializer());
template.setHashValueSerializer(jsonSerializer);
template.afterPropertiesSet();
return template;
}
@Bean
public RedisCacheManager redisCacheManager(
RedisConnectionFactory factory) {
RedisCacheConfiguration defaultConfig = RedisCacheConfiguration
.defaultCacheConfig()
.entryTtl(Duration.ofMinutes(10))
.serializeKeysWith(
RedisSerializationContext.SerializationPair
.fromSerializer(new StringRedisSerializer()))
.serializeValuesWith(
RedisSerializationContext.SerializationPair
.fromSerializer(new GenericJackson2JsonRedisSerializer()))
.disableCachingNullValues();
// Per-cache TTL overrides
Map<String, RedisCacheConfiguration> cacheConfigs = Map.of(
"products", defaultConfig.entryTtl(Duration.ofMinutes(30)),
"users", defaultConfig.entryTtl(Duration.ofHours(1)),
"rate-limits", defaultConfig.entryTtl(Duration.ofMinutes(1))
);
return RedisCacheManager.builder(factory)
.cacheDefaults(defaultConfig)
.withInitialCacheConfigurations(cacheConfigs)
.build();
}
}
// Direct Redis operations for custom logic
@Service
@RequiredArgsConstructor
public class RedisService {
private final StringRedisTemplate stringRedisTemplate;
private final RedisTemplate<String, Object> redisTemplate;
public void set(String key, String value, Duration ttl) {
stringRedisTemplate.opsForValue().set(key, value, ttl);
}
public Optional<String> get(String key) {
return Optional.ofNullable(stringRedisTemplate.opsForValue().get(key));
}
// Atomic increment — safe for rate limiting counters
public Long increment(String key) {
return stringRedisTemplate.opsForValue().increment(key);
}
// Hash operations — store object fields without serializing whole object
public void setUserField(Long userId, String field, String value) {
redisTemplate.opsForHash().put("user:" + userId, field, value);
}
// Sorted set for leaderboard
public void addScore(String leaderboard, String userId, double score) {
redisTemplate.opsForZSet().add(leaderboard, userId, score);
}
public Set<Object> getTopN(String leaderboard, long n) {
return redisTemplate.opsForZSet()
.reverseRange(leaderboard, 0, n - 1);
}
}
The default JdkSerializationRedisSerializer writes Java binary format. This means you can't inspect cached values with redis-cli, any class rename or field change breaks deserialization of existing cache entries, and binary blobs are larger than JSON. Configure GenericJackson2JsonRedisSerializer instead: it embeds type information (an @class property) in the JSON so values deserialize back to the right type. JSON values are also human-readable and, aside from the Java-specific type hints, usable by services written in other languages.
Cache Patterns in Production
There are multiple ways to integrate a cache with your data store. Each has different consistency properties, write amplification, and failure modes. Picking the wrong pattern is a common source of production bugs.
Implementing Cache-Aside Manually (When Annotations Aren't Enough)
@Service
@RequiredArgsConstructor
public class ProductCacheService {
private final RedisTemplate<String, Object> redisTemplate;
private final ProductRepository productRepository;
private static final String KEY_PREFIX = "product:";
private static final Duration TTL = Duration.ofMinutes(30);
public Product getProduct(Long id) {
String key = KEY_PREFIX + id;
// 1. Check cache
Product cached = (Product) redisTemplate.opsForValue().get(key);
if (cached != null) {
return cached; // Cache HIT
}
// 2. Cache MISS — load from DB
Product product = productRepository.findById(id)
.orElseThrow(() -> new ProductNotFoundException(id));
// 3. Store in cache with TTL
redisTemplate.opsForValue().set(key, product, TTL);
return product;
}
public Product updateProduct(Long id, ProductUpdateRequest req) {
Product product = productRepository.findById(id)
.orElseThrow(() -> new ProductNotFoundException(id));
product.update(req);
Product saved = productRepository.save(product);
// Re-populate the cache with the new value (SET overwrites the old
// entry, so a separate delete first is redundant). Alternatively,
// delete only and let the next read re-populate.
redisTemplate.opsForValue().set(KEY_PREFIX + id, saved, TTL);
return saved;
}
public void deleteProduct(Long id) {
productRepository.deleteById(id);
redisTemplate.delete(KEY_PREFIX + id);
}
}
Cache Invalidation: The Hard Problem
Phil Karlton famously said: "There are only two hard things in computer science: cache invalidation and naming things." He was right about the first one. Invalidation bugs are subtle, reproduce only under load, and can serve stale data silently for hours.
Strategies
// Strategy 1: TTL-based expiry (simplest, eventual consistency)
// Cache entry expires after TTL. Stale window = TTL duration.
redisTemplate.opsForValue().set("product:42", product, Duration.ofMinutes(10));
// Accept: up to 10 minutes of stale data. Suitable for rarely-changed data.
// Strategy 2: Event-driven invalidation (strong consistency, more code)
@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
public void onProductUpdated(ProductUpdatedEvent event) {
// Only runs after the DB transaction commits successfully
redisTemplate.delete("product:" + event.getProductId());
// Optionally publish to other nodes via Redis pub/sub
redisTemplate.convertAndSend("cache-invalidation",
"product:" + event.getProductId());
}
// Strategy 3: Cache versioning (for bulk invalidation)
@Service
public class VersionedCacheService {
private final RedisTemplate<String, Object> redisTemplate;
// Get current version for a namespace
private long getVersion(String namespace) {
Object v = redisTemplate.opsForValue().get("version:" + namespace);
return v == null ? 1L : ((Number) v).longValue();
}
// Build versioned key
private String versionedKey(String namespace, String key) {
return namespace + ":v" + getVersion(namespace) + ":" + key;
}
// Invalidate entire namespace by bumping version (O(1)!)
public void invalidateNamespace(String namespace) {
redisTemplate.opsForValue().increment("version:" + namespace);
// Old versioned keys will naturally expire via TTL
// No need to scan and delete millions of keys
}
}
// Strategy 4: Redis keyspace notifications (react to TTL expiry)
// Requires notify-keyspace-events "Ex" to be enabled in redis.conf
@Component
public class CacheExpiryListener implements MessageListener {
@Override
public void onMessage(Message message, byte[] pattern) {
String expiredKey = message.toString();
if (expiredKey.startsWith("product:")) {
// Key expired — optionally re-warm the cache
Long id = Long.parseLong(expiredKey.split(":")[1]);
// triggerRefresh(id);
}
}
}
@Bean
public RedisMessageListenerContainer keyExpiryContainer(
RedisConnectionFactory factory,
CacheExpiryListener listener) {
RedisMessageListenerContainer container =
new RedisMessageListenerContainer();
container.setConnectionFactory(factory);
container.addMessageListener(listener,
new PatternTopic("__keyevent@0__:expired"));
return container;
}
If you delete the cache key before writing to the DB, two threads can race: Thread A deletes key, Thread B reads (miss) and loads old data from DB, Thread A writes new data to DB, Thread B puts old data back in cache. The fix: use the "cache-aside with evict-after-write" pattern — always write to DB first, then delete the cache key. Better still, use a short TTL so even if a stale entry slips in, it expires quickly.
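The interleaving above is easier to see replayed step by step. Here is a single-threaded sketch with plain maps standing in for the cache and the database (class and key names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class DeleteBeforeWriteRace {
    public static String replay() {
        Map<String, String> db = new HashMap<>();
        Map<String, String> cache = new HashMap<>();
        db.put("product:42", "old");
        cache.put("product:42", "old");

        // Thread A: deletes the cache key first...
        cache.remove("product:42");
        // Thread B: cache miss, so it reads the (still old) value from the DB
        String loadedByB = db.get("product:42");
        // Thread A: ...then writes the new value to the DB
        db.put("product:42", "new");
        // Thread B: puts the old value back into the cache
        cache.put("product:42", loadedByB);

        // The cache now serves the stale value until the TTL expires
        return cache.get("product:42");
    }
}
```

Reversing Thread A's two steps (write the DB first, then delete the key) closes this window: Thread B's miss then reads the new value.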
Cache Stampede & Hot Keys
Two of the most dangerous production caching failure modes — and the ones most often discovered at 3am when traffic spikes.
// Solution 1: Mutex Lock (only one thread reloads)
@Service
public class StampedeProtectedCache {
private final RedisTemplate<String, Object> redisTemplate;
private final ProductRepository productRepository;
public Product getWithLock(Long id) {
String dataKey = "product:" + id;
String lockKey = "lock:product:" + id;
// 1. Try cache first
Product cached = (Product) redisTemplate.opsForValue().get(dataKey);
if (cached != null) return cached;
// 2. Try to acquire lock (SET NX EX — atomic)
Boolean acquired = redisTemplate.opsForValue()
.setIfAbsent(lockKey, "1", Duration.ofSeconds(5));
if (Boolean.TRUE.equals(acquired)) {
try {
// 3. Double-check after acquiring lock
cached = (Product) redisTemplate.opsForValue().get(dataKey);
if (cached != null) return cached;
// 4. We have the lock — load from DB
Product product = productRepository.findById(id).orElseThrow();
redisTemplate.opsForValue().set(dataKey, product,
Duration.ofMinutes(30));
return product;
} finally {
// Sketch only: production code should store a unique token as the
// lock value and release via a Lua compare-and-delete, so a thread
// whose lock already expired cannot delete another thread's lock
redisTemplate.delete(lockKey);
}
} else {
// 5. Another thread is loading — wait briefly and retry
try { Thread.sleep(50); } catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
cached = (Product) redisTemplate.opsForValue().get(dataKey);
// If still null, fall through to DB (safety net)
return cached != null ? cached :
productRepository.findById(id).orElseThrow();
}
}
}
// Solution 2: Probabilistic Early Expiry (simpler, no locks)
// Randomly refresh the cache before it expires to avoid simultaneous expiry
@Service
public class ProbabilisticCache {
public Product getProduct(Long id) {
String key = "product:" + id;
CachedValue<Product> entry = getCachedWithExpiry(key);
if (entry != null) {
long remainingMs = entry.getExpiryMs() - System.currentTimeMillis();
// XFetch (probabilistic early recomputation): refresh early when
// -delta * beta * log(rand) >= time remaining. delta approximates the
// cost of recomputing the value; beta > 1 biases toward earlier refresh.
double deltaMs = 1000, beta = 1.0;
boolean refreshEarly =
-deltaMs * beta * Math.log(Math.random()) >= remainingMs;
if (remainingMs > 0 && !refreshEarly) {
return entry.getValue(); // Cache is fresh; skip early refresh this time
}
}
// Load from DB and cache
Product product = productRepository.findById(id).orElseThrow();
storeWithExpiry(key, product, Duration.ofMinutes(30));
return product;
}
}
Hot Key Problem
A "hot key" is a single cache key receiving so much traffic that it saturates the Redis CPU or network bandwidth for that key's slot. In Redis Cluster, each key lives on one shard — even with 10 shards, one hot key hits one shard at 100% while others are idle at 10%.
// Solution: Local in-process cache as L1 in front of Redis (L2)
// Breaks the hot key — each JVM node serves from its own memory
@Service
public class TwoLevelCache {
// L1: Caffeine — in-process, sub-millisecond, per-node
private final Cache<Long, Product> localCache = Caffeine.newBuilder()
.maximumSize(1_000)
.expireAfterWrite(Duration.ofSeconds(30)) // Short TTL for L1
.build();
// L2: Redis — shared, consistent, larger
private final RedisTemplate<String, Object> redisTemplate;
private final ProductRepository productRepository;
public Product getProduct(Long id) {
// Try L1 first (no network, sub-ms)
Product product = localCache.getIfPresent(id);
if (product != null) return product;
// Try L2 (Redis, ~1ms)
product = (Product) redisTemplate.opsForValue().get("product:" + id);
if (product != null) {
localCache.put(id, product); // Promote to L1
return product;
}
// Miss both — load from DB
product = productRepository.findById(id).orElseThrow();
redisTemplate.opsForValue().set("product:" + id, product,
Duration.ofMinutes(30));
localCache.put(id, product);
return product;
}
// On update: evict from Redis (L1 expires via short TTL automatically)
public void evict(Long id) {
localCache.invalidate(id);
redisTemplate.delete("product:" + id);
}
}
// Alternative: key sharding for ultra-hot keys
// Instead of one "leaderboard" key, use "leaderboard:0" through "leaderboard:9"
// Writes go to one random shard; reads merge all shards
public class ShardedHotKey {
private static final int SHARDS = 10;
public void increment(String baseKey, String userId, double delta) {
// Increment the user's score on one random shard — spreads write load
int shard = ThreadLocalRandom.current().nextInt(SHARDS);
redisTemplate.opsForZSet()
.incrementScore(baseKey + ":" + shard, userId, delta);
}
public List<String> getTopUsers(String baseKey, int n) {
// A user's total is split across shards, so sum per user before
// ranking — usually done asynchronously and cached itself
Map<String, Double> totals = new HashMap<>();
for (int i = 0; i < SHARDS; i++) {
Set<ZSetOperations.TypedTuple<Object>> tuples = redisTemplate
.opsForZSet().rangeWithScores(baseKey + ":" + i, 0, -1);
if (tuples == null) continue;
for (ZSetOperations.TypedTuple<Object> t : tuples) {
totals.merge((String) t.getValue(), t.getScore(), Double::sum);
}
}
return totals.entrySet().stream()
.sorted(Map.Entry.<String, Double>comparingByValue().reversed())
.limit(n)
.map(Map.Entry::getKey)
.toList();
}
}
Rate Limiting with Redis
Rate limiting is itself a caching problem — you're caching request counts per time window. Redis is the ideal backing store: atomic INCR, automatic TTL expiry, and shared state across all backend instances.
// Fixed Window Rate Limiter (simplest, has edge-case burst)
@Service
@RequiredArgsConstructor
public class FixedWindowRateLimiter {
private final StringRedisTemplate redis;
public boolean isAllowed(String userId, int maxRequests, Duration window) {
String key = "rate:" + userId + ":" +
(System.currentTimeMillis() / window.toMillis());
Long count = redis.opsForValue().increment(key);
if (count != null && count == 1) {
// First request in this window — set expiry.
// Note: INCR + EXPIRE is not atomic; if the process dies in between,
// the key never expires. A short Lua script makes the pair atomic.
redis.expire(key, window);
}
return count != null && count <= maxRequests;
}
}
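The "edge-case burst" mentioned above: a client can send a full quota at the end of one window and another full quota at the start of the next, doubling the effective rate for a brief span. A small in-memory replay (names and numbers are illustrative, limit 10 requests per 1000ms):

```java
import java.util.HashMap;
import java.util.Map;

public class FixedWindowBurst {
    private final Map<Long, Integer> windows = new HashMap<>();
    private static final int LIMIT = 10;
    private static final long WINDOW_MS = 1000;

    public boolean isAllowed(long nowMs) {
        long window = nowMs / WINDOW_MS; // window index
        int count = windows.merge(window, 1, Integer::sum);
        return count <= LIMIT;
    }

    public static int burstAroundBoundary() {
        FixedWindowBurst limiter = new FixedWindowBurst();
        int allowed = 0;
        // 10 requests at t=950ms (end of window 0)...
        for (int i = 0; i < 10; i++) if (limiter.isAllowed(950)) allowed++;
        // ...and 10 more at t=1050ms (start of window 1):
        for (int i = 0; i < 10; i++) if (limiter.isAllowed(1050)) allowed++;
        // all 20 pass within a 100ms span, double the intended rate
        return allowed;
    }
}
```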
}
// Sliding Window Rate Limiter (accurate, no burst at window boundary)
@Service
public class SlidingWindowRateLimiter {
private final RedisTemplate<String, Object> redis;
private static final String SCRIPT =
"local key = KEYS[1]\n" +
"local now = tonumber(ARGV[1])\n" +
"local window = tonumber(ARGV[2])\n" +
"local limit = tonumber(ARGV[3])\n" +
"local windowStart = now - window\n" +
// Remove old entries outside window
"redis.call('ZREMRANGEBYSCORE', key, 0, windowStart)\n" +
"local count = redis.call('ZCARD', key)\n" +
"if count < limit then\n" +
" redis.call('ZADD', key, now, now)\n" +
" redis.call('PEXPIRE', key, window)\n" +
" return 1\n" +
"else\n" +
" return 0\n" +
"end";
private final RedisScript<Long> script =
RedisScript.of(SCRIPT, Long.class);
public boolean isAllowed(String key, int maxRequests,
Duration windowDuration) {
long now = System.currentTimeMillis();
Long result = redis.execute(script,
List.of("ratelimit:" + key),
String.valueOf(now),
String.valueOf(windowDuration.toMillis()),
String.valueOf(maxRequests));
return Long.valueOf(1L).equals(result);
}
}
// Spring MVC interceptor to apply rate limiting
@Component
@RequiredArgsConstructor
public class RateLimitInterceptor implements HandlerInterceptor {
private final SlidingWindowRateLimiter rateLimiter;
@Override
public boolean preHandle(HttpServletRequest request,
HttpServletResponse response,
Object handler) throws Exception {
String clientId = extractClientId(request); // IP or user ID
boolean allowed = rateLimiter.isAllowed(
clientId, 100, Duration.ofMinutes(1));
if (!allowed) {
response.setStatus(429); // Too Many Requests
response.setHeader("Retry-After", "60");
response.getWriter().write("{\"error\":\"Rate limit exceeded\"}");
return false;
}
return true;
}
private String extractClientId(HttpServletRequest request) {
String userId = (String) request.getAttribute("userId");
return userId != null ? "user:" + userId :
"ip:" + request.getRemoteAddr();
}
}
The sliding window approach above is accurate but stores one entry per request in a sorted set — at 1000 rps per user, that's 60,000 entries per user per minute. For very high-throughput APIs, use the token bucket algorithm instead: store only a float (remaining tokens) and a timestamp in a Redis hash, and recalculate on each request using the elapsed time. This is O(1) space per user regardless of request rate. Bucket4j library implements token bucket with Redis out of the box.
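To make the token-bucket arithmetic concrete, here is a minimal in-memory sketch (class and method names are illustrative, not Bucket4j's API); the Redis version keeps the same two fields — token count and last-refill timestamp — in a hash and runs the same math atomically in a Lua script:

```java
import java.time.Duration;

public class TokenBucket {
    private final double capacity;    // max tokens (burst size)
    private final double refillPerMs; // tokens added per millisecond
    private double tokens;            // current token count
    private long lastRefillMs;        // last time we recalculated

    public TokenBucket(int maxRequests, Duration window, long nowMs) {
        this.capacity = maxRequests;
        this.refillPerMs = (double) maxRequests / window.toMillis();
        this.tokens = capacity;
        this.lastRefillMs = nowMs;
    }

    public synchronized boolean tryConsume(long nowMs) {
        // Refill based on elapsed time — no per-request bookkeeping,
        // so state stays O(1) regardless of request rate
        tokens = Math.min(capacity,
                tokens + (nowMs - lastRefillMs) * refillPerMs);
        lastRefillMs = nowMs;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A bucket of 10 tokens per second permits an initial burst of 10, then admits roughly one request per 100ms as tokens trickle back in.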
Distributed Caching Patterns
When you run multiple instances of your service, local caches become inconsistent. Distributed caching with Redis solves the consistency problem but introduces new challenges: serialization formats, network partitions, and eviction under memory pressure.
Redis Cluster Topology
Cache Synchronization Across Instances
// Problem: Instance A updates local Caffeine cache.
// Instance B still has the old value in its local cache.
// Solution: Redis Pub/Sub to broadcast invalidation events
@Configuration
public class CacheInvalidationConfig {
@Bean
public RedisMessageListenerContainer invalidationListener(
RedisConnectionFactory factory,
CacheInvalidationListener listener) {
RedisMessageListenerContainer container =
new RedisMessageListenerContainer();
container.setConnectionFactory(factory);
container.addMessageListener(listener,
new ChannelTopic("cache-invalidation"));
return container;
}
}
@Component
@RequiredArgsConstructor
public class CacheInvalidationListener implements MessageListener {
private final Cache<Long, Product> localCache; // Caffeine
@Override
public void onMessage(Message message, byte[] pattern) {
String key = new String(message.getBody());
if (key.startsWith("product:")) {
Long id = Long.parseLong(key.split(":")[1]);
localCache.invalidate(id);
log.debug("Invalidated local cache for product {}", id);
}
}
}
@Service
@RequiredArgsConstructor
public class ProductUpdateService {
private final ProductRepository repository;
private final RedisTemplate<String, Object> redisTemplate;
private final Cache<Long, Product> localCache;
@Transactional
public Product update(Long id, ProductUpdateRequest req) {
Product product = repository.findById(id).orElseThrow();
product.update(req);
Product saved = repository.save(product);
// Evict from Redis (L2)
redisTemplate.delete("product:" + id);
// Broadcast invalidation to all cluster nodes
redisTemplate.convertAndSend("cache-invalidation", "product:" + id);
// Evict from local cache (L1) on this node
localCache.invalidate(id);
return saved;
}
}
JVM Tuning & GC
Spring Boot applications run on the JVM. Understanding the JVM's memory model and garbage collector behaviour is essential for diagnosing latency spikes, OOM crashes, and throughput degradation in production.
Choosing the Right GC
# G1GC (default since Java 9) — good general-purpose collector
# Region-based, concurrent marking, configurable pause target
# MaxGCPauseMillis is best-effort; set -Xms equal to -Xmx to avoid heap resizing
java -XX:+UseG1GC \
-XX:MaxGCPauseMillis=200 \
-XX:G1HeapRegionSize=16m \
-Xms2g -Xmx4g \
-jar app.jar
# ZGC (production-ready since Java 15; experimental in Java 11-14)
# Sub-millisecond pauses regardless of heap size! Ideal for latency-sensitive APIs
# -XX:+ZGenerational enables Generational ZGC (Java 21+)
java -XX:+UseZGC \
-Xms4g -Xmx4g \
-XX:+ZGenerational \
-jar app.jar
# Shenandoah (alternative low-latency GC)
java -XX:+UseShenandoahGC \
-Xms4g -Xmx4g \
-jar app.jar
# GC logging (Java 9+ unified logging; the old PrintGCDetails/Xloggc
# flags shown in many older guides were removed in Java 9)
java -Xlog:gc*:file=/var/log/gc.log:time,uptime:filecount=5,filesize=20M \
-jar app.jar
// Expose GC metrics via Spring Boot Actuator + Micrometer
// Add to application.yml:
// management.metrics.enable.jvm: true
// Listen for GC notifications programmatically
@Component
public class GcMetricsCollector {
@PostConstruct
public void registerGcListener() {
for (GarbageCollectorMXBean gcBean :
ManagementFactory.getGarbageCollectorMXBeans()) {
if (gcBean instanceof NotificationEmitter emitter) {
emitter.addNotificationListener((notif, handback) -> {
GarbageCollectionNotificationInfo info =
GarbageCollectionNotificationInfo
.from((CompositeData) notif.getUserData());
long durationMs = info.getGcInfo().getDuration();
String gcName = info.getGcName();
if (durationMs > 500) {
log.warn("Long GC pause: {} took {}ms", gcName, durationMs);
// Alert or record metric
}
}, null, null);
}
}
}
}
// Memory leak detection: Heap dump on OOM
// -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/dumps/
// Analyze with: Eclipse Memory Analyzer (MAT) or IntelliJ profiler
// Common memory leak patterns in Spring Boot:
// 1. ThreadLocal not removed — grows unboundedly with request threads
// 2. Static Map/List with unbounded growth
// 3. Listeners/callbacks registered but never unregistered
// 4. Large objects promoted to Old Gen and never freed (Hibernate session cache)
// 5. Class loader leaks in hot-reload scenarios
Profiling Spring Boot Applications
Profiling is how you find what's actually slow — not what you think is slow. Every engineer who has profiled production applications has been surprised by the results. The slowest thing is almost never where you expect it to be.
Async Profiler (CPU & Allocation)
# Download async-profiler (no JVM overhead, uses perf_events)
# https://github.com/async-profiler/async-profiler
# CPU profiling — capture 30s of CPU flame graph
./asprof -d 30 -f /tmp/flamegraph.html $(pgrep -f "app.jar")
# Allocation profiling — find objects causing GC pressure
./asprof -e alloc -d 30 -f /tmp/alloc.html $(pgrep -f "app.jar")
# Wall-clock profiling — includes threads waiting on I/O
./asprof -e wall -d 30 -t -f /tmp/wall.html $(pgrep -f "app.jar")
Spring Boot Actuator for Runtime Insights
# application.yml — expose useful endpoints
management:
endpoints:
web:
exposure:
include: health, metrics, info, env, threaddump, heapdump, prometheus
endpoint:
health:
show-details: always
heapdump:
enabled: true # GET /actuator/heapdump downloads heap dump
threaddump:
enabled: true # GET /actuator/threaddump shows all threads
// Key endpoints for performance diagnosis:
// GET /actuator/metrics/jvm.memory.used?tag=area:heap — current heap usage
// GET /actuator/metrics/jvm.gc.pause — GC pause times
// GET /actuator/metrics/hikaricp.connections.active — DB pool usage
// GET /actuator/metrics/http.server.requests — request latency p50/p95/p99
// GET /actuator/threaddump — detect thread contention
// GET /actuator/heapdump — full heap dump (careful!)
// Custom Micrometer metrics for your own code
@Service
public class MetricAwareProductService {
private final MeterRegistry meterRegistry;
private final ProductRepository productRepository;
private final Cache<Long, Product> localCache; // Caffeine
private final Counter cacheHitCounter;
// Register the counter in the constructor — a field initializer would
// run before the injected meterRegistry is assigned and throw an NPE
public MetricAwareProductService(MeterRegistry meterRegistry,
ProductRepository productRepository,
Cache<Long, Product> localCache) {
this.meterRegistry = meterRegistry;
this.productRepository = productRepository;
this.localCache = localCache;
// Counter — monotonically increasing count
this.cacheHitCounter = Counter.builder("product.cache.hits")
.description("Number of product cache hits")
.register(meterRegistry);
}
// Timer — measures execution time
public Product findById(Long id) {
return Timer.builder("product.find.duration")
.tag("source", "db")
.description("Time to load product from DB")
.register(meterRegistry)
.record(() -> productRepository.findById(id).orElseThrow());
}
// Gauge — tracks a current value
@PostConstruct
public void registerGauges() {
Gauge.builder("product.cache.size",
localCache, Cache::estimatedSize)
.description("Local product cache size")
.register(meterRegistry);
}
}
Before touching any code, gather data: (1) Enable slow query log in PostgreSQL (log_min_duration_statement = 100) — the DB is usually the bottleneck. (2) Check connection pool utilization via HikariCP metrics — pool exhaustion looks like slow queries but is actually waiting for a connection. (3) Review thread dumps for BLOCKED or WAITING threads — usually a sign of lock contention. (4) Sample CPU with async-profiler for 60 seconds under load — read the flame graph from the bottom up. (5) Check allocation profile — excessive object creation causes GC pressure and latency spikes. Only then write code changes.
Performance Testing
Load testing tells you how your system behaves under stress before your users find out the hard way. A system that works for 10 users may completely fail at 1,000. Performance testing should be a non-negotiable part of any release cycle for production backend systems.
Gatling Load Test (Production-Grade)
// Gatling simulation — simulates realistic user behaviour
public class ProductApiSimulation extends Simulation {
private final HttpProtocolBuilder httpProtocol = http
.baseUrl("http://localhost:8080")
.acceptHeader("application/json")
.contentTypeHeader("application/json");
// Scenario 1: Browse products
private final ScenarioBuilder browseProducts = scenario("Browse Products")
.exec(http("Get all products")
.get("/api/v1/products?page=0&size=20")
.check(status().is(200))
.check(jmesPath("content").exists()))
.pause(Duration.ofMillis(500))
.exec(http("Get single product")
// #{productId} would be supplied by a feeder (not shown here)
.get("/api/v1/products/#{productId}")
.check(status().is(200)));
// Scenario 2: Authenticated write operations
private final ScenarioBuilder createOrders = scenario("Create Orders")
.exec(http("Login")
.post("/api/v1/auth/login")
.body(StringBody("{\"email\":\"test@example.com\",\"password\":\"pass\"}"))
.check(jmesPath("token").saveAs("authToken")))
.exec(http("Create Order")
.post("/api/v1/orders")
.header("Authorization", "Bearer #{authToken}")
.body(StringBody("{\"productId\":42,\"quantity\":2}"))
.check(status().is(201)));
{
setUp(
// Ramp up to 200 concurrent users over 1 minute
browseProducts.injectOpen(
rampUsers(200).during(Duration.ofMinutes(1))
),
// Constant 20 order creators
createOrders.injectOpen(
constantUsersPerSec(20).during(Duration.ofMinutes(5))
)
).protocols(httpProtocol)
.assertions(
// percentile4 / percentile3 default to the 99th / 95th percentiles
global().responseTime().percentile4().lt(500), // p99 < 500ms
global().responseTime().percentile3().lt(200), // p95 < 200ms
global().successfulRequests().percent().gt(99.0) // 99% success
);
}
}
Connection Pool Tuning Under Load
# HikariCP — optimal settings for web APIs
spring:
datasource:
hikari:
# Formula: pool_size = Tn × (Cm - 1) + 1
# Tn = number of threads, Cm = number of DB calls per request
# For 100 threads, 2 DB calls: 100 × (2 - 1) + 1 = 101
maximum-pool-size: 20 # Start conservative, measure then grow
minimum-idle: 5 # Keep connections warm
connection-timeout: 30000 # Max wait for a connection (30s)
idle-timeout: 600000 # Close idle connections after 10min
max-lifetime: 1800000 # Recycle connections every 30min
validation-timeout: 5000 # Check connection health in 5s
# Why NOT to blindly increase pool size — each DB connection holds:
#   - 5-10MB RAM on the PostgreSQL server
#   - a file descriptor on both client and server
#   - an active backend process on PostgreSQL
# A 200-connection pool costs 1-2GB RAM on the DB server, and PostgreSQL
# throughput actually DEGRADES above ~100 connections.
# For high-concurrency systems, put PgBouncer (a connection pooler) in front of the DB.
// Thread pool tuning for @Async and web server
@Configuration
public class ThreadPoolConfig {
// Tomcat embedded server thread pool
@Bean
public TomcatServletWebServerFactory tomcatFactory() {
TomcatServletWebServerFactory factory =
new TomcatServletWebServerFactory();
factory.addConnectorCustomizers(connector -> {
ProtocolHandler handler = connector.getProtocolHandler();
if (handler instanceof AbstractHttp11Protocol<?> protocol) {
protocol.setMaxThreads(200); // Max concurrent requests
protocol.setMinSpareThreads(20); // Always-warm threads
protocol.setAcceptCount(100); // Backlog queue
protocol.setConnectionTimeout(5000);
}
});
return factory;
}
// Async task executor — for @Async methods
@Bean
public ThreadPoolTaskExecutor asyncExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(10);
executor.setMaxPoolSize(50);
executor.setQueueCapacity(200);
executor.setThreadNamePrefix("async-");
executor.setRejectedExecutionHandler(
new ThreadPoolExecutor.CallerRunsPolicy()); // Backpressure
executor.initialize();
return executor;
}
}
Production Caching Pitfalls
Mutable cached objects: if you cache a Java object and then modify it, you've modified the cached object too — there is no copy. Subsequent callers get the mutated version. Always cache immutable objects or DTOs, never JPA entities (which can be proxies with lazy-loaded state).
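A minimal stand-in for the problem (a HashMap instead of Caffeine, names hypothetical): an in-process cache stores object references, so mutating a value you fetched mutates it for every subsequent caller.

```java
import java.util.HashMap;
import java.util.Map;

public class MutableCachePitfall {
    static class Product {
        String name;
        Product(String name) { this.name = name; }
    }

    public static String demo() {
        Map<Long, Product> cache = new HashMap<>();
        cache.put(42L, new Product("Widget"));

        // Caller 1 "just tweaks" the object it got from the cache...
        Product p1 = cache.get(42L);
        p1.name = p1.name.toUpperCase();

        // ...and caller 2 now silently sees the mutated version
        return cache.get(42L).name;
    }
}
```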
Cold cache after deploy: on a fresh deployment, every request is a cache miss. If your system can't handle a 100% miss rate, you have a problem. Mitigations: cache-warming jobs that pre-load critical data before traffic is routed, circuit breakers on downstream services, and read replicas for load distribution.
Unbounded cache growth: @Cacheable backed by Caffeine without a size limit or eviction policy grows until the JVM runs out of heap, so always set maximumSize(). Redis with no maxmemory setting grows until the OS kills it; set maxmemory and maxmemory-policy allkeys-lru in redis.conf.
Serialization version skew: deploy a new version → all old cache entries are incompatible → every request misses → the DB gets crushed. Mitigation: use a version prefix in cache keys (v2:product:42), or blue-green deployments where the green side warms its cache before traffic cuts over.
Caching paginated lists: caching /products?page=0&size=20 seems clever, but every product insert or delete invalidates all pages simultaneously. Better: cache individual product objects and assemble pages from the cache, or use cursor-based pagination, which is more cache-friendly.
Eviction storms: when Redis nears its maxmemory limit, eviction kicks in, and LRU eviction during a write-heavy workload can block Redis commands for 100ms+. Monitor the evicted_keys metric; if eviction is high, increase Redis memory or reduce cache TTLs.
Interview Preparation
Caching and performance are heavily tested in senior backend interviews. These questions probe whether you understand the tradeoffs, not just the API.