Caching & Performance Engineering

The fastest query is the one you never make. Caching is not a magic performance fix — it is a careful engineering tradeoff between consistency, latency, memory, and complexity. This section teaches you how to design and operate caching systems at scale: from Spring's abstraction layer to Redis internals, from cache stampede prevention to JVM profiling and load testing.

Why Caching? The Real Story

Every junior engineer knows caching makes things faster. Senior engineers know it also introduces consistency problems, thundering herds, memory pressure, and subtle bugs that only appear under production load. Caching is a tradeoff, not a free lunch.

The Caching Value Chain
🐌 Without Cache
GET /product/42 → Controller
→ Repository.findById()
→ DB connection pool
→ PostgreSQL disk read
← ~8–40ms round trip
Every request hits the DB
DB CPU: 80% at 2000 rps
⚡ With Cache
GET /product/42 → Controller
→ Cache lookup (Redis)
HIT: return in <1ms
MISS: query DB, store, return
← ~0.5–2ms for cache hit
95% requests skip DB
DB CPU: 12% at 2000 rps
⚠ The hidden cost: User updates product → cache still serves stale data for TTL duration. Accept this tradeoff consciously.
When NOT to Cache

Don't cache data that changes very frequently (stock prices tick-by-tick), data that is personal and unique per user (shopping cart contents are better in a session store), data where stale reads cause financial or safety issues (account balances in a payment system), or small data sets that are fast to query anyway. Caching for its own sake adds complexity without proportional benefit.

Spring Cache Abstraction

Spring provides a cache abstraction layer via annotations. Your business code stays clean — cache interaction is handled by AOP proxies. The abstraction works with any backing store: ConcurrentHashMap, Caffeine, Redis, Hazelcast, and more.

Core Annotations

Java
// 1. Enable caching in your configuration
@SpringBootApplication
@EnableCaching
public class Application { ... }

// 2. @Cacheable — cache the return value
@Service
@RequiredArgsConstructor
public class ProductService {
    private final ProductRepository productRepository;

    @Cacheable(value = "products", key = "#id")
    public Product findById(Long id) {
        // This body only executes on cache MISS
        log.info("Loading product {} from database", id);
        return productRepository.findById(id)
            .orElseThrow(() -> new ProductNotFoundException(id));
    }

    // Conditional caching
    @Cacheable(
        value = "products",
        key = "#id",
        condition = "#id > 0",          // only cache if condition true
        unless = "#result == null"      // don't cache null results
    )
    public Product findByIdConditional(Long id) { ... }

    // 3. @CachePut — always execute, always update cache
    @CachePut(value = "products", key = "#product.id")
    public Product updateProduct(Product product) {
        return productRepository.save(product);
    }

    // 4. @CacheEvict — remove from cache
    @CacheEvict(value = "products", key = "#id")
    public void deleteProduct(Long id) {
        productRepository.deleteById(id);
    }

    // Evict entire cache
    @CacheEvict(value = "products", allEntries = true)
    public void clearProductCache() { }

    // 5. @Caching — combine multiple cache operations
    @Caching(evict = {
        @CacheEvict(value = "products", key = "#product.id"),
        @CacheEvict(value = "product-lists", allEntries = true)
    })
    public void invalidateProductCaches(Product product) { }
}

Cache Configuration with Caffeine (In-Process)

For local, single-instance caching, Caffeine is the fastest JVM cache. It uses a Window TinyLFU eviction policy that outperforms LRU/LFU in most access patterns.

Java
@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager caffeineCacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager();

        // Per-cache configuration
        manager.registerCustomCache("products",
            Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(Duration.ofMinutes(10))
                .expireAfterAccess(Duration.ofMinutes(5))
                .recordStats()           // enable hit/miss metrics
                .build());

        manager.registerCustomCache("users",
            Caffeine.newBuilder()
                .maximumSize(5_000)
                .expireAfterWrite(Duration.ofMinutes(30))
                .weakValues()            // GC-eligible values
                .build());

        return manager;
    }
}
Caffeine vs Redis — When to Use Which

Caffeine (in-process): sub-millisecond latency, no network hop, no serialization overhead. Perfect for read-heavy, rarely-changing reference data (country lists, product categories, config). The catch: each JVM instance has its own cache, so updates on one node don't propagate to others, and cache size is limited by heap.

Redis (out-of-process): shared across all instances, survives application restarts, and can hold far more data. Latency is roughly 0.5–2ms per operation. The right choice for user sessions, rate-limiting counters, distributed locks, and any data that must be consistent across cluster nodes.

Redis: Architecture & Internals

Redis is a single-threaded, in-memory data structure server. "Single-threaded" sounds like a limitation but is actually its secret weapon — no lock contention, predictable latency, and atomic operations on all commands. Redis 6+ added I/O threads for network handling while keeping command execution single-threaded.

Redis Data Structures
String — SET / GET / INCR / APPEND. Use for counters, flags, sessions. Max value size: 512MB.
Hash — HSET / HGET / HMGET. Use for object fields, user profiles. Memory-efficient for small hashes.
List — LPUSH / RPOP / LRANGE. Use for queues, activity feeds. O(1) push/pop at both ends.
Set — SADD / SISMEMBER / SUNION. Use for unique visitors, tags. Supports intersection/union operations.
Sorted Set — ZADD / ZRANGE / ZRANK. Use for leaderboards, rate limiting. Score-ordered, O(log N) operations.
Stream — XADD / XREAD / XGROUP. Use for event logs, message queues. Consumer groups, persistence.
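
To make these concrete, here is a quick redis-cli session exercising each structure (key names are illustrative):

Shell
# String: atomic counter
SET page:views 0
INCR page:views                       # → 1

# Hash: store object fields individually
HSET user:42 name "Alice" plan "pro"
HGET user:42 plan                     # → "pro"

# List: simple queue
LPUSH jobs '{"id":1}'
RPOP jobs                             # → '{"id":1}'

# Set: unique membership
SADD visitors:today 10.0.0.1
SISMEMBER visitors:today 10.0.0.1     # → 1

# Sorted Set: leaderboard
ZADD leaderboard 1500 "alice" 1200 "bob"
ZREVRANGE leaderboard 0 -1 WITHSCORES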

Spring Data Redis Setup

YAML
spring:
  data:
    redis:
      host: localhost
      port: 6379
      password: ${REDIS_PASSWORD}
      timeout: 2000ms
      lettuce:
        pool:
          max-active: 16      # max connections in pool
          max-idle: 8
          min-idle: 2
          max-wait: 100ms     # block at most 100ms waiting for connection

  cache:
    type: redis
    redis:
      time-to-live: 600000   # 10 minutes default TTL (ms)
      cache-null-values: false
      use-key-prefix: true
      key-prefix: "myapp:"
Java
@Configuration
public class RedisConfig {

    @Bean
    public RedisTemplate<String, Object> redisTemplate(
            RedisConnectionFactory factory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(factory);

        // Use JSON serialization (NOT Java serialization!)
        // GenericJackson2JsonRedisSerializer embeds type info so values
        // deserialize back to their real classes (see callout below);
        // Jackson2JsonRedisSerializer<Object> would hand you LinkedHashMaps
        GenericJackson2JsonRedisSerializer jsonSerializer =
            new GenericJackson2JsonRedisSerializer();

        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(jsonSerializer);
        template.setHashKeySerializer(new StringRedisSerializer());
        template.setHashValueSerializer(jsonSerializer);
        template.afterPropertiesSet();
        return template;
    }

    @Bean
    public RedisCacheManager redisCacheManager(
            RedisConnectionFactory factory) {

        RedisCacheConfiguration defaultConfig = RedisCacheConfiguration
            .defaultCacheConfig()
            .entryTtl(Duration.ofMinutes(10))
            .serializeKeysWith(
                RedisSerializationContext.SerializationPair
                    .fromSerializer(new StringRedisSerializer()))
            .serializeValuesWith(
                RedisSerializationContext.SerializationPair
                    .fromSerializer(new GenericJackson2JsonRedisSerializer()))
            .disableCachingNullValues();

        // Per-cache TTL overrides
        Map<String, RedisCacheConfiguration> cacheConfigs = Map.of(
            "products",      defaultConfig.entryTtl(Duration.ofMinutes(30)),
            "users",         defaultConfig.entryTtl(Duration.ofHours(1)),
            "rate-limits",   defaultConfig.entryTtl(Duration.ofMinutes(1))
        );

        return RedisCacheManager.builder(factory)
            .cacheDefaults(defaultConfig)
            .withInitialCacheConfigurations(cacheConfigs)
            .build();
    }
}

// Direct Redis operations for custom logic
@Service
@RequiredArgsConstructor
public class RedisService {
    private final StringRedisTemplate stringRedisTemplate;
    private final RedisTemplate<String, Object> redisTemplate;

    public void set(String key, String value, Duration ttl) {
        stringRedisTemplate.opsForValue().set(key, value, ttl);
    }

    public Optional<String> get(String key) {
        return Optional.ofNullable(stringRedisTemplate.opsForValue().get(key));
    }

    // Atomic increment — safe for rate limiting counters
    public Long increment(String key) {
        return stringRedisTemplate.opsForValue().increment(key);
    }

    // Hash operations — store object fields without serializing whole object
    public void setUserField(Long userId, String field, String value) {
        redisTemplate.opsForHash().put("user:" + userId, field, value);
    }

    // Sorted set for leaderboard
    public void addScore(String leaderboard, String userId, double score) {
        redisTemplate.opsForZSet().add(leaderboard, userId, score);
    }

    public Set<Object> getTopN(String leaderboard, long n) {
        return redisTemplate.opsForZSet()
            .reverseRange(leaderboard, 0, n - 1);
    }
}
Never Use Java Serialization with Redis

The default JdkSerializationRedisSerializer writes Java binary format. This means: you can't inspect cached values with redis-cli, any class rename or field change causes deserialization failures for existing cache entries, and binary blobs are larger than JSON. Always configure GenericJackson2JsonRedisSerializer with an ObjectMapper that includes type information. This also lets you share cached data between services written in different languages.
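A minimal sketch of the serializer setup the callout describes. The ObjectMapper settings shown are one common way to embed type information; LaissezFaireSubTypeValidator is deliberately permissive, so restrict it in security-sensitive deployments:

Java
@Bean
public GenericJackson2JsonRedisSerializer redisJsonSerializer() {
    ObjectMapper mapper = new ObjectMapper();
    mapper.registerModule(new JavaTimeModule()); // java.time support
    // Embed an @class property so values deserialize to their real types
    mapper.activateDefaultTyping(
        LaissezFaireSubTypeValidator.instance,
        ObjectMapper.DefaultTyping.NON_FINAL,
        JsonTypeInfo.As.PROPERTY);
    return new GenericJackson2JsonRedisSerializer(mapper);
}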

Cache Patterns in Production

There are multiple ways to integrate a cache with your data store. Each has different consistency properties, write amplification, and failure modes. Picking the wrong pattern is a common source of production bugs.

Cache Pattern Comparison
Cache-Aside (Lazy Loading)
READ: Check cache → if miss → query DB → store in cache
WRITE: Write to DB → evict/update cache
✓ Only caches what is actually needed
✓ Cache failures don't break reads
✗ First request after eviction is slow
✗ Risk of stale reads between write/evict
Used by: @Cacheable annotation (default)
Write-Through
READ: Always check cache first
WRITE: Write to DB AND cache atomically
✓ Cache always fresh after writes
✓ No stale read window
✗ Write latency includes cache write
✗ Caches cold data that may never be read
Used by: @CachePut on write methods
Write-Behind (Write-Back)
WRITE: Write to cache immediately → async flush to DB
READ: Cache always returns latest value
✓ Very low write latency
✓ Batch writes to DB (efficient)
✗ Data loss risk if cache crashes before flush
✗ Complex to implement correctly
Used by: gaming scores, view counters
Read-Through
Cache is the single source — it loads from DB on miss
Application always talks to cache, never DB directly
✓ Transparent to application code
✓ Consistent access pattern
✗ Requires smart cache (e.g. Redis + loader)
✗ Cache outage blocks all reads
Used by: dedicated cache services
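
Spring's cache abstraction has no annotation for write-behind, so here is a hypothetical sketch for the view-counter use case named in the table: increments accumulate in Redis and a scheduled job flushes them to the database in batches. ViewCountRepository and its incrementViews method are illustrative, and getAndDelete requires Redis 6.2+ (GETDEL):

Java
@Service
@RequiredArgsConstructor
public class ViewCountWriteBehind {
    private final StringRedisTemplate redis;
    private final ViewCountRepository repository; // hypothetical JPA repository

    // WRITE path: O(1) Redis increment, the DB is never touched
    public void recordView(Long productId) {
        redis.opsForValue().increment("views:" + productId);
        redis.opsForSet().add("views:dirty", productId.toString());
    }

    // Async flush every 10s — batches all dirty counters into DB writes
    @Scheduled(fixedDelay = 10_000)
    public void flush() {
        Set<String> dirty = redis.opsForSet().members("views:dirty");
        if (dirty == null || dirty.isEmpty()) return;
        for (String id : dirty) {
            String count = redis.opsForValue().getAndDelete("views:" + id);
            if (count != null) {
                repository.incrementViews(Long.valueOf(id), Long.parseLong(count));
            }
            redis.opsForSet().remove("views:dirty", id);
        }
        // The tradeoff from the table: increments recorded after getAndDelete
        // but before a crash are lost — acceptable for view counts
    }
}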

Implementing Cache-Aside Manually (When Annotations Aren't Enough)

Java
@Service
@RequiredArgsConstructor
public class ProductCacheService {

    private final RedisTemplate<String, Object> redisTemplate;
    private final ProductRepository productRepository;
    private static final String KEY_PREFIX = "product:";
    private static final Duration TTL = Duration.ofMinutes(30);

    public Product getProduct(Long id) {
        String key = KEY_PREFIX + id;

        // 1. Check cache
        Product cached = (Product) redisTemplate.opsForValue().get(key);
        if (cached != null) {
            return cached; // Cache HIT
        }

        // 2. Cache MISS — load from DB
        Product product = productRepository.findById(id)
            .orElseThrow(() -> new ProductNotFoundException(id));

        // 3. Store in cache with TTL
        redisTemplate.opsForValue().set(key, product, TTL);
        return product;
    }

    public Product updateProduct(Long id, ProductUpdateRequest req) {
        Product product = productRepository.findById(id)
            .orElseThrow(() -> new ProductNotFoundException(id));

        product.update(req);
        Product saved = productRepository.save(product);

        // Option A: evict-after-write — the next read reloads from the DB
        // redisTemplate.delete(KEY_PREFIX + id);

        // Option B: pre-warm with the new value (write-through style).
        // Pick one — deleting and then immediately setting is redundant.
        redisTemplate.opsForValue().set(KEY_PREFIX + id, saved, TTL);

        return saved;
    }

    public void deleteProduct(Long id) {
        productRepository.deleteById(id);
        redisTemplate.delete(KEY_PREFIX + id);
    }
}

Cache Invalidation: The Hard Problem

Phil Karlton famously said: "There are only two hard things in computer science: cache invalidation and naming things." He was right about the first one. Invalidation bugs are subtle, reproduce only under load, and can serve stale data silently for hours.

Strategies

Java
// Strategy 1: TTL-based expiry (simplest, eventual consistency)
// Cache entry expires after TTL. Stale window = TTL duration.
redisTemplate.opsForValue().set("product:42", product, Duration.ofMinutes(10));
// Accept: up to 10 minutes of stale data. Suitable for rarely-changed data.

// Strategy 2: Event-driven invalidation (strong consistency, more code)
@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
public void onProductUpdated(ProductUpdatedEvent event) {
    // Only runs after the DB transaction commits successfully
    redisTemplate.delete("product:" + event.getProductId());
    // Optionally publish to other nodes via Redis pub/sub
    redisTemplate.convertAndSend("cache-invalidation",
        "product:" + event.getProductId());
}

// Strategy 3: Cache versioning (for bulk invalidation)
@Service
@RequiredArgsConstructor
public class VersionedCacheService {
    private final RedisTemplate<String, Object> redisTemplate;

    // Get current version for a namespace
    private long getVersion(String namespace) {
        Object v = redisTemplate.opsForValue().get("version:" + namespace);
        return v == null ? 1L : ((Number) v).longValue();
    }

    // Build versioned key
    private String versionedKey(String namespace, String key) {
        return namespace + ":v" + getVersion(namespace) + ":" + key;
    }

    // Invalidate entire namespace by bumping version (O(1)!)
    public void invalidateNamespace(String namespace) {
        redisTemplate.opsForValue().increment("version:" + namespace);
        // Old versioned keys will naturally expire via TTL
        // No need to scan and delete millions of keys
    }
}

// Strategy 4: Redis keyspace notifications (react to TTL expiry)
// Requires enabling them on the server: CONFIG SET notify-keyspace-events Ex
@Component
public class CacheExpiryListener implements MessageListener {

    @Override
    public void onMessage(Message message, byte[] pattern) {
        String expiredKey = message.toString();
        if (expiredKey.startsWith("product:")) {
            // Key expired — optionally re-warm the cache
            Long id = Long.parseLong(expiredKey.split(":")[1]);
            // triggerRefresh(id);
        }
    }
}

@Bean
public RedisMessageListenerContainer keyExpiryContainer(
        RedisConnectionFactory factory,
        CacheExpiryListener listener) {
    RedisMessageListenerContainer container =
        new RedisMessageListenerContainer();
    container.setConnectionFactory(factory);
    container.addMessageListener(listener,
        new PatternTopic("__keyevent@0__:expired"));
    return container;
}
The Delete-Before-Write Race Condition

If you delete the cache key before writing to the DB, two threads can race: Thread A deletes key, Thread B reads (miss) and loads old data from DB, Thread A writes new data to DB, Thread B puts old data back in cache. The fix: use the "cache-aside with evict-after-write" pattern — always write to DB first, then delete the cache key. Better still, use a short TTL so even if a stale entry slips in, it expires quickly.

Cache Stampede & Hot Keys

Two of the most dangerous production caching failure modes — and the ones most often discovered at 3am when traffic spikes.

Cache Stampede: The Thundering Herd Problem
T=0: Product cache key expires
Request 1 → MISS
Request 2 → MISS
Request 3 → MISS
...500 more → MISS
↓ All 503 requests hit DB simultaneously
💥 DB overwhelmed → timeouts → service down
Java
// Solution 1: Mutex Lock (only one thread reloads)
@Service
@RequiredArgsConstructor
public class StampedeProtectedCache {
    private final RedisTemplate<String, Object> redisTemplate;
    private final ProductRepository productRepository;

    public Product getWithLock(Long id) {
        String dataKey = "product:" + id;
        String lockKey = "lock:product:" + id;

        // 1. Try cache first
        Product cached = (Product) redisTemplate.opsForValue().get(dataKey);
        if (cached != null) return cached;

        // 2. Try to acquire lock (SET NX EX — atomic)
        Boolean acquired = redisTemplate.opsForValue()
            .setIfAbsent(lockKey, "1", Duration.ofSeconds(5));

        if (Boolean.TRUE.equals(acquired)) {
            try {
                // 3. Double-check after acquiring lock
                cached = (Product) redisTemplate.opsForValue().get(dataKey);
                if (cached != null) return cached;

                // 4. We have the lock — load from DB
                Product product = productRepository.findById(id).orElseThrow();
                redisTemplate.opsForValue().set(dataKey, product,
                    Duration.ofMinutes(30));
                return product;
            } finally {
                // NB: production code should store a unique token and release
                // via a Lua compare-and-delete, so it never deletes a lock
                // that expired and was re-acquired by another thread
                redisTemplate.delete(lockKey);
            }
        } else {
            // 5. Another thread is loading — wait briefly and retry
            try { Thread.sleep(50); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            cached = (Product) redisTemplate.opsForValue().get(dataKey);
            // If still null, fall through to DB (safety net)
            return cached != null ? cached :
                productRepository.findById(id).orElseThrow();
        }
    }
}

// Solution 2: Probabilistic Early Expiry (simpler, no locks)
// Randomly refresh the cache before it expires to avoid simultaneous expiry
@Service
@RequiredArgsConstructor
public class ProbabilisticCache {
    private final ProductRepository productRepository;

    public Product getProduct(Long id) {
        String key = "product:" + id;
        // getCachedWithExpiry / storeWithExpiry wrap a value together with
        // its absolute expiry timestamp
        CachedValue<Product> entry = getCachedWithExpiry(key);

        if (entry != null) {
            long remainingMs = entry.getExpiryMs() - System.currentTimeMillis();
            long betaMs = 1000; // scale factor, roughly the cost of one recompute

            // XFetch: -log(rand) is exponentially distributed, so the refresh
            // probability rises sharply as expiry nears — expiries spread out
            // over time instead of all landing on the same instant
            boolean refreshEarly = -betaMs * Math.log(Math.random()) >= remainingMs;
            if (remainingMs > 0 && !refreshEarly) {
                return entry.getValue(); // Cache still fresh — no refresh
            }
        }

        // Load from DB and (re)cache
        Product product = productRepository.findById(id).orElseThrow();
        storeWithExpiry(key, product, Duration.ofMinutes(30));
        return product;
    }
}

Hot Key Problem

A "hot key" is a single cache key receiving so much traffic that it saturates the Redis CPU or network bandwidth for that key's slot. In Redis Cluster, each key lives on one shard — even with 10 shards, one hot key hits one shard at 100% while others are idle at 10%.

Java
// Solution: Local in-process cache as L1 in front of Redis (L2)
// Breaks the hot key — each JVM node serves from its own memory
@Service
@RequiredArgsConstructor
public class TwoLevelCache {
    // L1: Caffeine — in-process, sub-millisecond, per-node
    private final Cache<Long, Product> localCache = Caffeine.newBuilder()
        .maximumSize(1_000)
        .expireAfterWrite(Duration.ofSeconds(30))  // Short TTL for L1
        .build();

    // L2: Redis — shared, consistent, larger
    private final RedisTemplate<String, Object> redisTemplate;
    private final ProductRepository productRepository;

    public Product getProduct(Long id) {
        // Try L1 first (no network, sub-ms)
        Product product = localCache.getIfPresent(id);
        if (product != null) return product;

        // Try L2 (Redis, ~1ms)
        product = (Product) redisTemplate.opsForValue().get("product:" + id);
        if (product != null) {
            localCache.put(id, product); // Promote to L1
            return product;
        }

        // Miss both — load from DB
        product = productRepository.findById(id).orElseThrow();
        redisTemplate.opsForValue().set("product:" + id, product,
            Duration.ofMinutes(30));
        localCache.put(id, product);
        return product;
    }

    // On update: evict from Redis (L1 expires via short TTL automatically)
    public void evict(Long id) {
        localCache.invalidate(id);
        redisTemplate.delete("product:" + id);
    }
}

// Alternative: key sharding for ultra-hot keys
// Instead of one "leaderboard" key, use "leaderboard:0" through "leaderboard:9".
// Each user hashes to one fixed shard; reads merge the per-shard top-N lists.
@Service
@RequiredArgsConstructor
public class ShardedHotKey {
    private static final int SHARDS = 10;
    private final RedisTemplate<String, Object> redisTemplate;

    public void incrementScore(String baseKey, String userId, double delta) {
        // Deterministic shard per user — a user's whole score stays in one shard
        int shard = Math.floorMod(userId.hashCode(), SHARDS);
        redisTemplate.opsForZSet()
            .incrementScore(baseKey + ":" + shard, userId, delta);
    }

    public List<String> getTopUsers(String baseKey, int n) {
        // Merge per-shard top-Ns — valid because each user lives in exactly
        // one shard, so the global top-N is contained in their union
        return IntStream.range(0, SHARDS)
            .mapToObj(i -> redisTemplate.opsForZSet()
                .reverseRangeWithScores(baseKey + ":" + i, 0, n - 1))
            .filter(Objects::nonNull)
            .flatMap(Collection::stream)
            .sorted(Comparator.comparingDouble(
                (ZSetOperations.TypedTuple<Object> t) ->
                    t.getScore() == null ? 0 : t.getScore())
                .reversed())
            .limit(n)
            .map(t -> (String) t.getValue())
            .collect(Collectors.toList()); // a List preserves the ranking order
    }
}

Rate Limiting with Redis

Rate limiting is itself a caching problem — you're caching request counts per time window. Redis is the ideal backing store: atomic INCR, automatic TTL expiry, and shared state across all backend instances.

Java
// Fixed Window Rate Limiter (simplest, has edge-case burst)
@Service
@RequiredArgsConstructor
public class FixedWindowRateLimiter {
    private final StringRedisTemplate redis;

    public boolean isAllowed(String userId, int maxRequests, Duration window) {
        String key = "rate:" + userId + ":" +
            (System.currentTimeMillis() / window.toMillis());

        Long count = redis.opsForValue().increment(key);

        if (count == 1) {
            // First request in this window — set expiry.
            // (INCR + EXPIRE are two round trips; if the process dies between
            // them the key never expires. Use a Lua script if that matters.)
            redis.expire(key, window);
        }

        return count <= maxRequests;
    }
}

// Sliding Window Rate Limiter (accurate, no burst at window boundary)
@Service
@RequiredArgsConstructor
public class SlidingWindowRateLimiter {
    // StringRedisTemplate so KEYS/ARGV reach the script as plain strings
    // (a JSON-serializing RedisTemplate would quote the arguments and
    // break tonumber() inside the script)
    private final StringRedisTemplate redis;
    private static final String SCRIPT =
        "local key = KEYS[1]\n" +
        "local now = tonumber(ARGV[1])\n" +
        "local window = tonumber(ARGV[2])\n" +
        "local limit = tonumber(ARGV[3])\n" +
        "local windowStart = now - window\n" +
        // Remove old entries outside window
        "redis.call('ZREMRANGEBYSCORE', key, 0, windowStart)\n" +
        "local count = redis.call('ZCARD', key)\n" +
        "if count < limit then\n" +
        "  redis.call('ZADD', key, now, now)\n" +
        "  redis.call('PEXPIRE', key, window)\n" +
        "  return 1\n" +
        "else\n" +
        "  return 0\n" +
        "end";

    private final RedisScript<Long> script =
        RedisScript.of(SCRIPT, Long.class);

    public boolean isAllowed(String key, int maxRequests,
                             Duration windowDuration) {
        long now = System.currentTimeMillis();
        Long result = redis.execute(script,
            List.of("ratelimit:" + key),
            String.valueOf(now),
            String.valueOf(windowDuration.toMillis()),
            String.valueOf(maxRequests));

        return Long.valueOf(1L).equals(result);
    }
}

// Spring MVC interceptor to apply rate limiting
@Component
@RequiredArgsConstructor
public class RateLimitInterceptor implements HandlerInterceptor {
    private final SlidingWindowRateLimiter rateLimiter;

    @Override
    public boolean preHandle(HttpServletRequest request,
                             HttpServletResponse response,
                             Object handler) throws Exception {
        String clientId = extractClientId(request); // IP or user ID
        boolean allowed = rateLimiter.isAllowed(
            clientId, 100, Duration.ofMinutes(1));

        if (!allowed) {
            response.setStatus(429); // Too Many Requests
            response.setHeader("Retry-After", "60");
            response.setContentType("application/json");
            response.getWriter().write("{\"error\":\"Rate limit exceeded\"}");
            return false;
        }
        return true;
    }

    private String extractClientId(HttpServletRequest request) {
        String userId = (String) request.getAttribute("userId");
        return userId != null ? "user:" + userId :
            "ip:" + request.getRemoteAddr();
    }
}
Token Bucket vs Sliding Window

The sliding window approach above is accurate but stores one entry per request in a sorted set — at 1000 rps per user, that's 60,000 entries per user per minute. For very high-throughput APIs, use the token bucket algorithm instead: store only a float (remaining tokens) and a timestamp in a Redis hash, and recalculate on each request using the elapsed time. This is O(1) space per user regardless of request rate. Bucket4j library implements token bucket with Redis out of the box.
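
A hypothetical token-bucket sketch along the lines the callout describes — state is a two-field Redis hash, refilled lazily from elapsed time. The Lua script and key layout are illustrative, not Bucket4j's actual implementation:

Java
@Service
@RequiredArgsConstructor
public class TokenBucketRateLimiter {
    private final StringRedisTemplate redis;

    // KEYS[1] = bucket key; ARGV = capacity, refill-per-ms, now, cost.
    // Refill from elapsed time, then try to spend 'cost' tokens — O(1) space.
    private static final RedisScript<Long> SCRIPT = RedisScript.of(
        "local b = redis.call('HMGET', KEYS[1], 'tokens', 'ts')\n" +
        "local capacity = tonumber(ARGV[1])\n" +
        "local refillPerMs = tonumber(ARGV[2])\n" +
        "local now = tonumber(ARGV[3])\n" +
        "local cost = tonumber(ARGV[4])\n" +
        "local tokens = tonumber(b[1]) or capacity\n" +
        "local ts = tonumber(b[2]) or now\n" +
        "tokens = math.min(capacity, tokens + (now - ts) * refillPerMs)\n" +
        "local allowed = 0\n" +
        "if tokens >= cost then tokens = tokens - cost; allowed = 1 end\n" +
        "redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)\n" +
        "redis.call('PEXPIRE', KEYS[1], 60000)\n" + // idle buckets expire
        "return allowed",
        Long.class);

    public boolean isAllowed(String userId, int capacity, double refillPerSec) {
        Long result = redis.execute(SCRIPT,
            List.of("bucket:" + userId),
            String.valueOf(capacity),
            String.valueOf(refillPerSec / 1000.0),
            String.valueOf(System.currentTimeMillis()),
            "1");
        return Long.valueOf(1L).equals(result);
    }
}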

Distributed Caching Patterns

When you run multiple instances of your service, local caches become inconsistent. Distributed caching with Redis solves the consistency problem but introduces new challenges: serialization formats, network partitions, and eviction under memory pressure.

Redis Cluster Topology

Redis Cluster: Hash Slots & Replication
16384 hash slots divided across masters
Master-1
Slots 0–5460
↓ Replica-1A
↓ Replica-1B
Master-2
Slots 5461–10922
↓ Replica-2A
↓ Replica-2B
Master-3
Slots 10923–16383
↓ Replica-3A
↓ Replica-3B
key → CRC16(key) % 16384 → hash slot → owning shard — routed automatically by the Lettuce client
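
Two practical consequences of slot routing, sketched below assuming Lettuce (Spring Boot's default Redis driver) is on the classpath: its SlotHash utility exposes the slot calculation, and hash tags — the {...} portion of a key — force related keys into the same slot so multi-key operations and Lua scripts can touch them together:

Java
import io.lettuce.core.cluster.SlotHash;

public class SlotDemo {
    public static void main(String[] args) {
        // The same computation the cluster uses: CRC16(key) % 16384
        System.out.println(SlotHash.getSlot("product:42"));

        // Hash tags: only the substring inside {} is hashed, so these two
        // keys land in the same slot and can be used in one MULTI/Lua call
        System.out.println(SlotHash.getSlot("{user:1000}:profile"));
        System.out.println(SlotHash.getSlot("{user:1000}:orders"));
    }
}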

Cache Synchronization Across Instances

Java
// Problem: Instance A updates local Caffeine cache.
// Instance B still has the old value in its local cache.
// Solution: Redis Pub/Sub to broadcast invalidation events

@Configuration
public class CacheInvalidationConfig {

    @Bean
    public RedisMessageListenerContainer invalidationListener(
            RedisConnectionFactory factory,
            CacheInvalidationListener listener) {
        RedisMessageListenerContainer container =
            new RedisMessageListenerContainer();
        container.setConnectionFactory(factory);
        container.addMessageListener(listener,
            new ChannelTopic("cache-invalidation"));
        return container;
    }
}

@Component
@RequiredArgsConstructor
public class CacheInvalidationListener implements MessageListener {
    private final Cache<Long, Product> localCache; // Caffeine

    @Override
    public void onMessage(Message message, byte[] pattern) {
        String key = new String(message.getBody());
        if (key.startsWith("product:")) {
            Long id = Long.parseLong(key.split(":")[1]);
            localCache.invalidate(id);
            log.debug("Invalidated local cache for product {}", id);
        }
    }
}

@Service
@RequiredArgsConstructor
public class ProductUpdateService {
    private final ProductRepository repository;
    private final RedisTemplate<String, Object> redisTemplate;
    private final Cache<Long, Product> localCache;

    // NB: these evictions run before the transaction commits; for strict
    // correctness prefer the AFTER_COMMIT event listener pattern shown earlier
    @Transactional
    public Product update(Long id, ProductUpdateRequest req) {
        Product product = repository.findById(id).orElseThrow();
        product.update(req);
        Product saved = repository.save(product);

        // Evict from Redis (L2) — note: DEL itself notifies no one;
        // the explicit publish below does that
        redisTemplate.delete("product:" + id);

        // Broadcast invalidation to all cluster nodes
        redisTemplate.convertAndSend("cache-invalidation", "product:" + id);

        // Evict from local cache (L1) on this node
        localCache.invalidate(id);

        return saved;
    }
}

JVM Tuning & GC

Spring Boot applications run on the JVM. Understanding the JVM's memory model and garbage collector behaviour is essential for diagnosing latency spikes, OOM crashes, and throughput degradation in production.

JVM Memory Layout (Heap + Non-Heap)
Heap (GC Managed)
Young Gen
Eden + S0 + S1
Short-lived objects
Minor GC: fast (~ms)
Old Gen (Tenured)
Long-lived objects, caches
Major GC: slower (~100ms–s)
Compaction: STW pauses
Non-Heap
Metaspace
Class metadata, bytecode
Code Cache
JIT-compiled native code
Direct Memory
Netty buffers, NIO

Choosing the Right GC

Shell / JVM Flags
# G1GC (default since Java 9) — good general purpose:
# region-based, concurrent marking, configurable pause target.
#   MaxGCPauseMillis  — target max pause (best effort)
#   G1HeapRegionSize  — tune for your heap size
#   Set -Xms = -Xmx to avoid heap-resizing pauses
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -XX:G1HeapRegionSize=16m \
     -Xms4g -Xmx4g \
     -jar app.jar

# ZGC (production since Java 15, experimental from Java 11)
# Sub-millisecond pauses regardless of heap size!
# Perfect for latency-sensitive APIs
# -XX:+ZGenerational enables generational ZGC (Java 21+)
java -XX:+UseZGC \
     -XX:+ZGenerational \
     -Xms4g -Xmx4g \
     -jar app.jar

# Shenandoah (alternative low-latency GC)
java -XX:+UseShenandoahGC \
     -Xms4g -Xmx4g \
     -jar app.jar

# GC logging via unified logging (Java 9+). The pre-9 flags
# (-XX:+PrintGCDetails, -Xloggc, -XX:+UseGCLogFileRotation) were
# removed in JDK 9 and will prevent a modern JVM from starting.
java -Xlog:gc*:file=/var/log/gc.log:time,uptime:filecount=5,filesize=20m \
     -jar app.jar
Java
// Expose GC metrics via Spring Boot Actuator + Micrometer
// Add to application.yml:
// management.metrics.enable.jvm: true

// Listen for GC notifications programmatically
@Component
public class GcMetricsCollector {

    @PostConstruct
    public void registerGcListener() {
        for (GarbageCollectorMXBean gcBean :
             ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gcBean instanceof NotificationEmitter emitter) {
                emitter.addNotificationListener((notif, handback) -> {
                    // Only GC notifications carry this payload
                    if (!GarbageCollectionNotificationInfo
                            .GARBAGE_COLLECTION_NOTIFICATION
                            .equals(notif.getType())) {
                        return;
                    }
                    GarbageCollectionNotificationInfo info =
                        GarbageCollectionNotificationInfo
                            .from((CompositeData) notif.getUserData());
                    long durationMs = info.getGcInfo().getDuration();
                    String gcName = info.getGcName();

                    if (durationMs > 500) {
                        log.warn("Long GC pause: {} took {}ms", gcName, durationMs);
                        // Alert or record metric
                    }
                }, null, null);
            }
        }
    }
}

// Memory leak detection: Heap dump on OOM
// -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/dumps/
// Analyze with: Eclipse Memory Analyzer (MAT) or IntelliJ profiler

// Common memory leak patterns in Spring Boot:
// 1. ThreadLocal not removed — grows unboundedly with request threads
// 2. Static Map/List with unbounded growth
// 3. Listeners/callbacks registered but never unregistered
// 4. Large objects promoted to Old Gen and never freed (Hibernate session cache)
// 5. Class loader leaks in hot-reload scenarios
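
Pattern 1 is common enough to warrant a sketch — a servlet filter that sets a ThreadLocal, and the finally/remove() that prevents the leak (the filter and its payload are illustrative):

Java
// Leak: a ThreadLocal set per request on a pooled thread is never cleared,
// so the referenced object lives as long as the worker thread does
@Component
public class RequestContextFilter implements Filter {
    private static final ThreadLocal<Map<String, Object>> CTX = new ThreadLocal<>();

    @Override
    public void doFilter(ServletRequest req, ServletResponse res,
                         FilterChain chain) throws IOException, ServletException {
        CTX.set(new HashMap<>()); // request-scoped attributes
        try {
            chain.doFilter(req, res);
        } finally {
            CTX.remove(); // THE FIX: clear before the pooled thread is reused
        }
    }
}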

Profiling Spring Boot Applications

Profiling is how you find what's actually slow — not what you think is slow. Every engineer who has profiled production applications has been surprised by the results. The slowest thing is almost never where you expect it to be.

Async Profiler (CPU & Allocation)

Shell
# Download async-profiler (no JVM overhead, uses perf_events)
# https://github.com/async-profiler/async-profiler

# CPU profiling — capture 30s of CPU flame graph
./asprof -d 30 -f /tmp/flamegraph.html $(pgrep -f "app.jar")

# Allocation profiling — find objects causing GC pressure
./asprof -e alloc -d 30 -f /tmp/alloc.html $(pgrep -f "app.jar")

# Wall-clock profiling — includes threads waiting on I/O
./asprof -e wall -d 30 -t -f /tmp/wall.html $(pgrep -f "app.jar")

Spring Boot Actuator for Runtime Insights

YAML
# application.yml — expose useful endpoints
management:
  endpoints:
    web:
      exposure:
        include: health, metrics, info, env, threaddump, heapdump, prometheus
  endpoint:
    health:
      show-details: always
    heapdump:
      enabled: true   # GET /actuator/heapdump downloads a heap dump
    threaddump:
      enabled: true   # GET /actuator/threaddump shows all threads

# Key endpoints for performance diagnosis:
#   GET /actuator/metrics/jvm.memory.used?tag=area:heap  — current heap usage
#   GET /actuator/metrics/jvm.gc.pause                   — GC pause times
#   GET /actuator/metrics/hikaricp.connections.active    — DB pool usage
#   GET /actuator/metrics/http.server.requests           — request latency p50/p95/p99
#   GET /actuator/threaddump                             — detect thread contention
#   GET /actuator/heapdump                               — full heap dump (careful!)

Java

// Custom Micrometer metrics for your own code
@Service
@RequiredArgsConstructor
public class MetricAwareProductService {
    private final MeterRegistry meterRegistry;
    private final ProductRepository productRepository;
    private final Cache<Long, Product> localCache; // Caffeine, for the gauge

    // Counter — built after injection: a field initializer would run before
    // Lombok's constructor assigns meterRegistry and throw an NPE
    private Counter cacheHitCounter;

    @PostConstruct
    public void registerMetrics() {
        cacheHitCounter = Counter.builder("product.cache.hits")
            .description("Number of product cache hits")
            .register(meterRegistry);

        // Gauge — tracks a current value
        Gauge.builder("product.cache.size", localCache, Cache::estimatedSize)
            .description("Local product cache size")
            .register(meterRegistry);
    }

    // Timer — measures execution time
    public Product findById(Long id) {
        return Timer.builder("product.find.duration")
            .tag("source", "db")
            .description("Time to load product from DB")
            .register(meterRegistry)
            .record(() -> productRepository.findById(id).orElseThrow());
    }
}
Profiling Checklist Before Optimizing

Before touching any code, gather data: (1) Enable slow query log in PostgreSQL (log_min_duration_statement = 100) — the DB is usually the bottleneck. (2) Check connection pool utilization via HikariCP metrics — pool exhaustion looks like slow queries but is actually waiting for a connection. (3) Review thread dumps for BLOCKED or WAITING threads — usually a sign of lock contention. (4) Sample CPU with async-profiler for 60 seconds under load — read the flame graph from the bottom up. (5) Check allocation profile — excessive object creation causes GC pressure and latency spikes. Only then write code changes.
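
For step (1), the concrete switches (standard PostgreSQL settings; the log path varies by distribution):

Shell
# Log any statement slower than 100ms
psql -c "ALTER SYSTEM SET log_min_duration_statement = 100;"
psql -c "SELECT pg_reload_conf();"

# Then tail the log while applying load
tail -f /var/log/postgresql/postgresql-*.log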

Performance Testing

Load testing tells you how your system behaves under stress before your users find out the hard way. A system that works for 10 users may completely fail at 1,000. Performance testing should be a non-negotiable part of any release cycle for production backend systems.

Gatling Load Test (Production-Grade)

Java (Gatling)
// Gatling simulation — simulates realistic user behaviour
// (assumes the usual static imports: io.gatling.javaapi.core.CoreDsl.*
//  and io.gatling.javaapi.http.HttpDsl.*)
public class ProductApiSimulation extends Simulation {

    private final HttpProtocolBuilder httpProtocol = http
        .baseUrl("http://localhost:8080")
        .acceptHeader("application/json")
        .contentTypeHeader("application/json");

    // Scenario 1: Browse products
    private final ScenarioBuilder browseProducts = scenario("Browse Products")
        .exec(http("Get all products")
            .get("/api/v1/products?page=0&size=20")
            .check(status().is(200))
            // Save an id so the next request has a real #{productId}
            .check(jmesPath("content[0].id").saveAs("productId")))
        .pause(Duration.ofMillis(500))
        .exec(http("Get single product")
            .get("/api/v1/products/#{productId}")
            .check(status().is(200)));

    // Scenario 2: Authenticated write operations
    private final ScenarioBuilder createOrders = scenario("Create Orders")
        .exec(http("Login")
            .post("/api/v1/auth/login")
            .body(StringBody("{\"email\":\"test@example.com\",\"password\":\"pass\"}"))
            .check(jmesPath("token").saveAs("authToken")))
        .exec(http("Create Order")
            .post("/api/v1/orders")
            .header("Authorization", "Bearer #{authToken}")
            .body(StringBody("{\"productId\":42,\"quantity\":2}"))
            .check(status().is(201)));

    {
        setUp(
            // Ramp up to 200 concurrent users over 1 minute
            browseProducts.injectOpen(
                rampUsers(200).during(Duration.ofMinutes(1))
            ),
            // Constant 20 order creators
            createOrders.injectOpen(
                constantUsersPerSec(20).during(Duration.ofMinutes(5))
            )
        ).protocols(httpProtocol)
         .assertions(
             // percentile3/percentile4 map to the 3rd/4th configured
             // percentiles — 95th and 99th with Gatling's defaults
             global().responseTime().percentile4().lt(500),   // p99 < 500ms
             global().responseTime().percentile3().lt(200),   // p95 < 200ms
             global().successfulRequests().percent().gt(99.0) // 99% success
         );
    }
}

Connection Pool Tuning Under Load

YAML
# HikariCP — optimal settings for web APIs
# Deadlock-avoidance formula: pool_size ≥ Tn × (Cm − 1) + 1
#   Tn = number of threads, Cm = connections held simultaneously per thread
#   e.g. 100 threads each holding up to 2 connections: 100 × (2 − 1) + 1 = 101
spring:
  datasource:
    hikari:
      maximum-pool-size: 20     # Start conservative; measure, then grow
      minimum-idle: 5           # Keep connections warm
      connection-timeout: 30000 # Max wait for a connection (30s)
      idle-timeout: 600000      # Close idle connections after 10min
      max-lifetime: 1800000     # Recycle connections every 30min
      validation-timeout: 5000  # Check connection health within 5s

# Why NOT to blindly increase pool size — each DB connection holds:
#   - 5–10MB RAM on the PostgreSQL server
#   - a file descriptor on both client and server
#   - an active process on PostgreSQL
# A 200-connection pool = 1–2GB RAM on the DB server, and PostgreSQL
# throughput actually DEGRADES above ~100 connections.
# Use PgBouncer (connection pooler) at the DB layer for high-concurrency systems.

Java
// Thread pool tuning for @Async and the web server
@Configuration
public class ThreadPoolConfig {

    // Tomcat embedded server thread pool
    @Bean
    public TomcatServletWebServerFactory tomcatFactory() {
        TomcatServletWebServerFactory factory =
            new TomcatServletWebServerFactory();
        factory.addConnectorCustomizers(connector -> {
            ProtocolHandler handler = connector.getProtocolHandler();
            if (handler instanceof AbstractHttp11Protocol<?> protocol) {
                protocol.setMaxThreads(200);       // Max concurrent requests
                protocol.setMinSpareThreads(20);   // Always-warm threads
                protocol.setAcceptCount(100);      // Backlog queue
                protocol.setConnectionTimeout(5000);
            }
        });
        return factory;
    }

    // Async task executor — for @Async methods
    @Bean
    public ThreadPoolTaskExecutor asyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(200);
        executor.setThreadNamePrefix("async-");
        executor.setRejectedExecutionHandler(
            new ThreadPoolExecutor.CallerRunsPolicy()); // Backpressure
        executor.initialize();
        return executor;
    }
}

Production Caching Pitfalls

1. Caching Mutable Shared Objects

If you cache a Java object and then modify it, you've modified the cached object too — there's no copy. Subsequent callers get the mutated version. Always cache immutable objects or DTOs, never JPA entities (which can be proxies with lazy-loaded state).
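
A minimal illustration of the hazard, assuming Product is mutable with the usual getters/setters:

Java
// Caffeine (like any in-process cache) stores references, not copies
Cache<Long, Product> cache = Caffeine.newBuilder().maximumSize(100).build();
cache.put(42L, new Product(42L, "Widget", new BigDecimal("9.99")));

Product p = cache.getIfPresent(42L);
p.setName(p.getName() + " (SALE)"); // mutates the cached object itself!

// Every subsequent caller now sees "Widget (SALE)".
// Fix: cache immutable values — e.g. a record — and build a new
// instance on every change:
// record ProductView(Long id, String name, BigDecimal price) { }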

2. Not Handling Cache Misses at Startup

On fresh deployment, every request is a cache miss. If your system can't handle 100% cache miss rate, you have a problem. Mitigations: cache warming jobs that pre-load critical data before traffic is routed, circuit breakers on downstream services, read replicas for load distribution.
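
One way to implement the warming job mentioned above: a hypothetical ApplicationRunner that pre-loads the hot set at startup (findTop100ByOrderByViewsDesc is an illustrative query method; pair this with a readiness probe so traffic arrives only after it finishes):

Java
@Component
@RequiredArgsConstructor
public class CacheWarmer implements ApplicationRunner {
    private final ProductRepository productRepository;
    private final RedisTemplate<String, Object> redisTemplate;

    @Override
    public void run(ApplicationArguments args) {
        // Pre-load the critical hot set so the first real requests hit cache
        for (Product p : productRepository.findTop100ByOrderByViewsDesc()) {
            redisTemplate.opsForValue()
                .set("product:" + p.getId(), p, Duration.ofMinutes(30));
        }
    }
}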

3. Unbounded Cache Growth

@Cacheable without a size limit or eviction policy on Caffeine will grow until the JVM runs out of heap. Always set maximumSize(). Redis with no maxmemory setting will grow until the OS kills it. Set maxmemory and maxmemory-policy allkeys-lru in redis.conf.

4. Cache Stampede After Deployment

Deploy new version → all old cache keys are incompatible → every request misses → DB crushed. Mitigation: use a version prefix in cache keys (v2:product:42), or use blue-green deployments where the green side has a warmed cache before traffic cuts over.

5. Caching Paginated Results

Caching /products?page=0&size=20 seems clever but product inserts/deletes invalidate every page simultaneously. Better: cache individual product objects and re-assemble pages from cache, or use cursor-based pagination which is more cache-friendly.

6. Redis Timeouts Under Memory Pressure

When Redis is near its maxmemory limit, eviction kicks in. LRU eviction during a write-heavy workload can cause Redis commands to block for 100ms+. Monitor evicted_keys metric. If eviction is high, increase Redis memory or reduce cache TTLs.
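
The relevant checks, for reference (standard redis-cli commands):

Shell
# Watch eviction and memory pressure
redis-cli INFO stats | grep evicted_keys
redis-cli INFO memory | grep -E "used_memory_human|maxmemory_human"

# Adjust at runtime (persist the change in redis.conf as well)
redis-cli CONFIG SET maxmemory 2gb
redis-cli CONFIG SET maxmemory-policy allkeys-lru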

Interview Preparation

Caching and performance are heavily tested in senior backend interviews. These questions probe whether you understand the tradeoffs, not just the API.

Q: What is cache stampede and how do you prevent it?
Cache stampede (thundering herd) occurs when a high-traffic cache key expires and hundreds of concurrent requests all experience a miss simultaneously, flooding the database. Prevention strategies include: (1) Mutex locking — use Redis SETNX to allow only one request to reload while others wait or serve slightly stale data; (2) Probabilistic early expiry — randomly refresh the cache slightly before it expires, so expiry is spread across time rather than occurring simultaneously; (3) Background refresh — a separate thread refreshes the cache before expiry, so it's never actually empty; (4) Two-level caching — a short-TTL L1 (Caffeine) and longer-TTL L2 (Redis), where L1 misses go to L2 not the DB. The right approach depends on how stale you can tolerate and how expensive a DB miss is.
Q: What's the difference between cache-aside and write-through caching?
Cache-aside (lazy loading): the application code explicitly checks the cache, loads from DB on miss, and writes to cache. The cache is only populated for data that's actually been requested. Write-through: every write goes to both DB and cache atomically (or via @CachePut). The cache is always up to date but may hold data that's never read again. Cache-aside is simpler and more memory-efficient; write-through has fresher data but wastes memory on cold writes. Most production systems use cache-aside with short TTLs as a safety net.
Q: How does Redis Cluster work, and what happens to a request for a hot key?
Redis Cluster splits the key space into 16,384 hash slots, distributing them across master nodes. A key's slot is determined by CRC16(key) % 16384. The Lettuce client caches the slot-to-node mapping and routes commands directly. A hot key — one receiving disproportionate traffic — saturates the single shard that owns its slot. Solutions: (1) Key sharding — append a random suffix (0-9) to distribute the key across multiple slots, with reads going to a random shard and writes fanning out; (2) Local L1 cache — add a per-JVM Caffeine cache in front of Redis so most traffic never reaches Redis at all; (3) Read replicas — configure Lettuce to route reads to replica nodes, spreading load across master and replicas for that shard.
Q: When would you NOT use caching?
Several scenarios: (1) Data that changes every request (stock tick data, real-time sensor readings) — the cache is always stale; (2) Security-sensitive personalized data — a misconfigured cache key could serve User A's data to User B; (3) Write-heavy workloads — if you're writing more than reading, cache invalidation overhead may exceed the benefit; (4) Small datasets with fast queries — caching a table with 100 rows that PostgreSQL keeps in its buffer pool adds complexity without benefit; (5) Financial transactions where consistency is non-negotiable — showing a user their old account balance even for 1 second could cause real harm. Always profile first to confirm the DB is actually the bottleneck before adding cache complexity.
Q: How do you handle cache invalidation in a microservices architecture?
This is the hardest problem. Service A caches data from Service B, but Service B updates it — how does Service A's cache get invalidated? Three approaches: (1) TTL-based — just let it expire. Accept eventual consistency. Most practical for non-critical data. (2) Event-driven invalidation — Service B publishes an event (Kafka/RabbitMQ) when data changes; Service A subscribes and evicts its local cache. More complex but consistent. (3) Shared cache with owned namespace — only Service B writes to the "product:" namespace in Redis; Service A reads but never writes. B invalidates keys when it updates them. This avoids the stale cache problem but couples services to a shared Redis cluster.
Q: What causes GC pauses and how do you reduce them in a high-traffic API?
GC pauses are caused by: (1) Stop-the-world (STW) phases — where all application threads pause while GC runs; (2) Large heaps taking longer to scan; (3) High object allocation rate causing frequent minor GCs; (4) Long-lived objects promoted to Old Gen causing major GCs. Reduction strategies: (1) Use ZGC or Shenandoah — both achieve sub-millisecond pauses by doing most work concurrently without STW; (2) Reduce object allocation — use object pools, reuse buffers, avoid boxing primitives; (3) Profile allocation with async-profiler to find the hot allocation sites; (4) Size Xms = Xmx to prevent heap resizing pauses; (5) Set MaxGCPauseMillis target with G1GC. For latency-critical APIs (p99 < 10ms), ZGC with generational mode (Java 21+) is the current state of the art.
Q: Design a rate limiter that works across a 10-instance cluster.
Requirements: accurate per-user limits, shared state, no single point of failure, low latency overhead. Solution using Redis sliding window: (1) Store each request timestamp in a Redis sorted set (key = "ratelimit:{userId}"); (2) Use a Lua script executed atomically: remove entries older than the window, count remaining, if under limit add current timestamp and return allowed; (3) The sorted set approach is accurate but uses O(requests per window) space per user. For very high-throughput, use token bucket instead — store only (tokens_remaining, last_refill_timestamp) as a Redis hash, updated atomically via Lua script. (4) If Redis is unavailable (circuit breaker), fail open (allow requests) rather than fail closed (block everything). (5) The key insight: Lua scripts in Redis are atomic, so no race conditions despite multiple concurrent callers from 10 instances.
Q: How do you diagnose a sudden increase in API latency in production?
Systematic approach: (1) Check error rate — is it just slow or also failing? Failing fast (DB down) looks like high latency on percentiles. (2) Check recent deployments — code or config change? Roll back if uncertain. (3) Check DB metrics — slow query log, active connections (pool exhaustion looks like latency), lock waits. (4) Check Redis metrics — connection pool saturation, latency histogram, eviction rate. (5) Get a thread dump — look for threads BLOCKED on a lock or all WAITING on the same resource. (6) Check GC logs — a major GC pause of 2 seconds shows up as a p99 spike. (7) Check external dependencies — downstream service timeouts cascade. (8) Profile with async-profiler wall-clock mode to see where threads are actually spending time. The most common causes: slow DB query (new query plan from stats change), connection pool exhaustion (traffic spike + slow queries holding connections), external service timeout, GC pause from memory pressure.

Section 09 Complete

You now understand caching as an engineering discipline — Redis internals, cache-aside vs write-through, stampede prevention, hot key sharding, distributed invalidation, JVM tuning, and performance testing. You know when NOT to cache, and how to diagnose production latency issues systematically.