Beyond Rent-Seeking: Architecting High-Performance Cache Eviction Policies in Modern Web Apps

Hey everyone, Alex here. Welcome back to another edition of Coding with Alex at sysseder.com.

Today, a fascinating paper made the rounds on Hacker News discussing the macroeconomic impact and "redistribution of wealth" caused by rent control policies. While economists and urban planners debate the real-world allocation of physical real estate, my mind immediately drifted to a problem that every single backend and systems engineer wrestles with daily: cache resource allocation.

In our software systems, memory is our prime real estate. We have a limited amount of high-speed "housing" (RAM) to store our data, and an endless queue of incoming requests demanding a spot. How we decide who gets to stay in our cache, who gets evicted, and how we distribute this digital "wealth" (bandwidth and latency savings) is one of the most critical design decisions we make.

If your eviction policy is too rigid, you end up with stale, unproductive data hogging memory (digital rent-seeking). If it's too aggressive, you suffer from constant cache thrashing, tanking your database performance. Today, we are going to dive deep into modern cache eviction architectures—moving beyond basic LRU (Least Recently Used) to explore advanced, production-grade algorithms like W-TinyLFU, LIRS, and ARC that power modern systems like Caffeine, Redis, and high-performance CDN layers.

The Cracks in the Classic LRU Model

For decades, Least Recently Used (LRU) has been the default choice for cache eviction. It’s simple, elegant, and can be implemented in constant $O(1)$ time using a doubly linked list and a hash map. When a key is accessed, it moves to the head of the list. When the cache is full, the item at the tail is evicted.

But LRU has a massive, fatal flaw: one-hit wonders.

Imagine a batch job, a web crawler, or an analytics query that performs a full table scan. It reads thousands of database rows that will never be accessed again. In a standard LRU cache, these single-use items will sweep through your memory, evicting your highly valuable, frequently accessed hot keys. Your cache hit rate plummets to zero, and your database gets hammered. This is known as "cache pollution."

The Anatomy of Cache Pollution

[Hot Key A] -> [Hot Key B] -> [Hot Key C]  (Healthy State)
      |
      v  (Enter a massive, single-use database scan)
[Scan Key 1] -> [Scan Key 2] -> [Scan Key 3] (Hot keys are evicted!)

To prevent this, modern systems engineering has moved toward frequency-aware and adaptive eviction policies. Let’s look at how we can implement and leverage these smarter strategies.

Enter LFU and the Frequency Problem

To combat the one-hit-wonder problem of LRU, we can look at Least Frequently Used (LFU). LFU keeps track of how many times an item has been requested. A one-hit wonder won't easily evict a key that has been accessed 100 times.

However, LFU has its own structural vulnerability: historical accumulation. If an item was accessed 10,000 times during a Black Friday sale, it will sit in your LFU cache forever, even if its request frequency drops to zero in December. It blocks new, relevant items from entering the cache because its historical "wealth" is too high to overcome.

To build a resilient production system, we need algorithms that balance both recency (LRU) and frequency (LFU) while incorporating a mechanism for historical decay.

Implementing W-TinyLFU: The Gold Standard of In-Memory Caching

If you are using Spring Boot, Go, or Rust, you might have heard of the Caffeine caching library (or its ports). Caffeine is widely regarded as the fastest, highest-hit-rate in-memory cache in the JVM ecosystem. At its core lies an incredibly elegant architecture called W-TinyLFU (Window Tiny Least Frequently Used).

W-TinyLFU solves the frequency tracking problem without eating up massive amounts of memory. Instead of storing a 32-bit integer counter for every single key (which would bloat our cache metadata), it uses a Count-Min Sketch—a probabilistic data structure that estimates frequency using a fraction of a byte per entry.

The W-TinyLFU Architecture

W-TinyLFU divides the cache layout into two main areas:

  • The Window Cache (LRU): A small admission window (typically 1% of total cache space) that acts as a buffer. New arrivals are placed here first, guaranteeing high throughput for bursty, short-term data.
  • The Main Cache (Segmented LRU): Divided into a "Probation" segment and a "Protected" segment.

When an item is evicted from the Window Cache, it fights for survival. The system uses the Count-Min Sketch to compare the frequency of the evicted Window item against the frequency of the victim candidate at the tail of the Main Cache's Probation segment. If the newcomer has a higher frequency, it enters the Main Cache. Otherwise, it is discarded. This is a true meritocracy of data!

Let's look at how we can implement a basic Count-Min Sketch in Python to understand how we track millions of frequencies using almost zero memory.

import math

class CountMinSketch:
    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        # A tiny matrix of counters (typically 4 bits per counter in production)
        self.table = [[0] * width for _ in range(depth)]
        
    def _hash(self, key: str, i: int) -> int:
        # Use simple seed variations to simulate multiple hash functions
        return hash(f"{key}-{i}") % self.width

    def increment(self, key: str):
        for i in range(self.depth):
            idx = self._hash(key, i)
            # Prevent integer overflow (e.g., capping at 15 for 4-bit counters)
            if self.table[i][idx] < 15:
                self.table[i][idx] += 1

    def estimate(self, key: str) -> int:
        # The estimate is the minimum of all hashed counters to avoid collisions
        return min(self.table[i][self._hash(key, i)] for i in range(self.depth))

To prevent historical accumulation, W-TinyLFU uses a reset mechanism. Every time the total number of additions to the cache hits a specific threshold (the sample size $W$), all counters in the Count-Min Sketch are halved. Just like that, old historical giants lose their power, allowing fresh, trending data to claim its space.

Adaptive Caching: Self-Tuning Policies

What if your workload shifts dynamically? During the day, your app might experience heavy transactional traffic (best suited for LRU). At night, it might run batch reports (best suited for LFU).

Instead of manual tuning, we can use an Adaptive Replacement Cache (ARC). ARC dynamically adjusts the sizes of its LRU and LFU tracking lists based on cache misses.

How ARC Dynamically Balances the Scales

ARC keeps track of two double-sized lists: $L_1$ (for recency) and $L_2$ (for frequency). These lists are split into top portions ($T_1, T_2$) which hold the actual cached data, and bottom portions ($B_1, B_2$) which hold historical ghost keys (metadata of recently evicted items, but not their values).

Total Cache Capacity (C) = Size(T1) + Size(T2)

[   T1 (Recency)   ] <---> [   T2 (Frequency)   ]
[   B1 (Ghost List)]       [   B2 (Ghost List)  ]

If we hit a key that is present in the ghost list $B_1$ (meaning we recently evicted a recency-based key), ARC realizes it made a mistake by making the LFU portion too big. It instantly increases the target size of $T_1$ (recency), dedicating more physical RAM to recently used items. Conversely, a hit in $B_2$ shifts the target size in favor of frequency ($T_2$).

This self-tuning feedback loop means your infrastructure adapts automatically to traffic spikes, database migrations, or seasonal variations without needing developer intervention.

Production Checklist: Choosing Your Cache Architecture

How do you apply this to your daily dev stack? Here is a quick reference guide on choosing the right caching engine based on your architecture:

  • For Redis / Distributed Caching: Use the volatile-lru or allkeys-lfu eviction policies. Redis uses an approximated LRU/LFU algorithm using randomized sampling to save memory. If your access patterns have high variance (e.g., API rate limiting vs. static product pages), prefer LFU.
  • For In-Memory App Caching (JVM/Go/Rust): Do not roll your own LRU using maps and mutexes. Use robust libraries that implement W-TinyLFU (like Caffeine in Java, or Ristretto in Go). They handle lock contention, concurrent updates, and high-performance eviction out of the box.
  • For CDNs and Edge Nodes: Edge caches often leverage variations of ARC or LIRS to shield origin servers from massive scale "thundering herd" problems and crawler traffic.

Conclusion & Discussion

Just like housing in a dense city, memory in our applications is a highly contested resource. If we rely on outdated allocation strategies, we pay the price in high latencies, inflated cloud bills, and brittle databases. Moving beyond standard LRU to intelligent, frequency-aware, and adaptive eviction policies like W-TinyLFU or ARC ensures your architecture remains resilient, performant, and cost-effective.

What cache eviction policies are you running in production right now? Have you ever had your databases brought down by a cache-pollution event? Let me know in the comments below, or drop a line on our community Discord!

Until next time, keep your code clean, your databases shielded, and your latency low. Happy coding!

Post a Comment

Previous Post Next Post