Distributed Caching Strategies: Cache-Aside, Write-Through, and Beyond
Every caching strategy encodes a set of consistency promises. Cache-aside promises that the application controls exactly what is cached and when. Write-through promises that every write updates the cache and the database together. Write-behind promises eventual persistence in exchange for lower write latency. These are not interchangeable variations on a theme: they produce genuinely different consistency behaviors, and choosing the wrong one for a workload leaves cache entries diverging from the database in ways that are difficult to diagnose and expensive to remediate.
A second source of complexity is specific to distributed caching. The cache itself is partitioned across multiple nodes, and that raises problems single-node textbook examples skip: deciding which node owns each key (typically with consistent hashing), handling node failures, and coordinating many application instances that miss on the same key at once (the thundering herd).
Concept
Cache-Aside (Lazy Loading)
In cache-aside — the most common pattern in microservice architectures — the application is responsible for all cache interactions. On a read, the application checks the cache first. If the data is present (cache hit), it is returned directly. If absent (cache miss), the application reads from the database, writes the result to the cache with a TTL, and returns the data.
On a write, the application writes to the database first, then invalidates (or updates) the cache entry. The critical decision is whether to invalidate (delete the cache key) or update (write the new value to the cache). Invalidation is safer: it forces the next read to populate the cache from the authoritative database, ensuring consistency. Updating the cache on write avoids the next cache miss but introduces a window where a concurrent read could overwrite the updated cache value with a stale database read.
The consistency risk in cache-aside is the race condition between a cache miss read and a concurrent write:
- Client A reads product_123 → cache miss → reads from DB: price = $50
- Client B writes product_123 → DB: price = $75 → invalidates cache
- Client A writes to cache: price = $50 (stale!) → cache now has the old price
Mitigating this requires either TTL-based eventual resolution (the stale entry expires and is refreshed), distributed locking before populating the cache, or using compare-and-swap (CAS) semantics on the cache write.
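The interleaving above can be replayed deterministically. The following is a minimal in-memory sketch, not a Redis client. `Dictionary` instances stand in for the database and the cache, every database write bumps a hypothetical per-row version number, and invalidation leaves a versioned tombstone rather than deleting the key. Filling the cache is then a compare-and-set: a row is written only if it is not older than what the cache already reflects. The tombstone is an assumption of this sketch (with a plain delete, the stale fill would land in an empty slot and succeed). Against a real Redis, the check-and-set would also need to run atomically, for example inside a Lua script.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins: each database row carries a version bumped on every write.
var db = new Dictionary<string, (decimal Price, long Version)> { ["product_123"] = (50m, 1) };
// Each cache entry records the version it reflects; a null Price is a tombstone left by invalidation.
var cache = new Dictionary<string, (decimal? Price, long Version)>();

void WriteProduct(string key, decimal price)
{
    long newVersion = db[key].Version + 1;
    db[key] = (price, newVersion);
    cache[key] = (null, newVersion);          // invalidate, remembering which version did it
}

// Naive cache-aside fill: overwrite whatever is there.
void PopulateNaive(string key, (decimal Price, long Version) row) => cache[key] = (row.Price, row.Version);

// CAS-style fill: refuse rows older than what the cache already reflects.
bool PopulateIfNewer(string key, (decimal Price, long Version) row)
{
    if (cache.TryGetValue(key, out var existing) && existing.Version > row.Version)
        return false;
    cache[key] = (row.Price, row.Version);
    return true;
}

// Replay: A misses and reads $50 (v1); B writes $75 (v2) and invalidates; A fills the cache.
var rowSeenByA = db["product_123"];
WriteProduct("product_123", 75m);
PopulateNaive("product_123", rowSeenByA);
decimal? naivePrice = cache["product_123"].Price;                         // stale $50

// Same interleaving with the versioned fill.
db["product_123"] = (50m, 1);
cache.Clear();
rowSeenByA = db["product_123"];
WriteProduct("product_123", 75m);
bool staleFillAccepted = PopulateIfNewer("product_123", rowSeenByA);         // v1 < v2: rejected
bool freshFillAccepted = PopulateIfNewer("product_123", db["product_123"]);  // next reader's fill
decimal? versionedPrice = cache["product_123"].Price;

Console.WriteLine($"naive: {naivePrice}, versioned: {versionedPrice}");
```

The naive fill leaves $50 cached after the database already says $75. The versioned fill rejects Client A's v1 row, so the next reader caches $75.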
Write-Through
In write-through, every write to the database passes through the cache layer first. The application writes to the cache, and the cache layer synchronously writes to the database before acknowledging the write. The cache therefore always holds the latest written value. Cache misses occur only for data that was never written through this path or has since been evicted.
The advantage is strong cache-database consistency for the write path. The disadvantage is write latency: every write incurs both the cache write and the database write, serialized. In read-heavy workloads with infrequent writes, this is often acceptable. In write-heavy workloads, the double write latency is a significant cost.
Write-through also populates the cache with every written record, including records that may never be read again. Cache memory is consumed by data that provides no read acceleration benefit.
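Here is a minimal in-memory sketch of the write and read paths described above, using dictionaries as hypothetical stand-ins for the database and the cache node. One design choice differs from the prose: this sketch writes the database before the cache, so a failed database write (an exception) cannot leave the cache holding a value the database never accepted. It also makes the memory cost concrete, because a record written once and never read still occupies a cache entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins: `database` is the system of record, `cache` the cache node.
var database = new Dictionary<string, string>();
var cache = new Dictionary<string, string>();
int dbReads = 0;

// Write-through: the caller is acknowledged only after both stores hold the value.
// Writing the database first means a throwing DB write never leaves a cached value behind.
void Write(string key, string value)
{
    database[key] = value;
    cache[key] = value;
}

string? Read(string key)
{
    if (cache.TryGetValue(key, out var hit))
        return hit;                          // warm: every written key is already cached
    dbReads++;                               // miss: never written through here, or evicted
    if (!database.TryGetValue(key, out var row))
        return null;
    cache[key] = row;
    return row;
}

Write("sku:1", "qty=10");
Write("sku:2", "qty=4");
Write("audit:9f2", "created");             // probably never read, but it still occupies cache memory

string? qty = Read("sku:1");
Console.WriteLine($"{qty}; database reads: {dbReads}; cached entries: {cache.Count}");
```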
Write-Behind (Write-Back)
Write-behind acknowledges the write after the cache write alone, then asynchronously flushes to the database. This provides the lowest write latency (cache-speed writes) but introduces a durability window: if the cache node fails before the flush completes, the write is lost.
Write-behind is appropriate for write-heavy workloads where durability can be relaxed — telemetry counters, session updates, view counts — and inappropriate for financial transactions or any data where loss is unacceptable.
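The durability window can be modelled with a dirty-key set and a flush step. This hypothetical sketch uses a view counter, and its names are illustrative. Writes are acknowledged as soon as the in-memory cache changes. A flush, which would run on a timer or queue consumer in a real system, persists each dirty key once and so coalesces repeated updates. A simulated node crash discards everything written since the last flush.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of write-behind for a view counter.
var database = new Dictionary<string, long>();
var cache = new Dictionary<string, long>();
var dirty = new HashSet<string>();   // keys written to the cache but not yet flushed
int dbWrites = 0;

// Acknowledged as soon as the cache is updated: a cache-speed write.
void IncrementViews(string key)
{
    cache[key] = cache.GetValueOrDefault(key) + 1;
    dirty.Add(key);
}

// Background flush: one database write per dirty key, however many updates it absorbed.
void Flush()
{
    foreach (var key in dirty)
    {
        database[key] = cache[key];
        dbWrites++;
    }
    dirty.Clear();
}

// Simulated cache-node failure: unflushed writes vanish with the node's memory.
void CrashCacheNode()
{
    cache.Clear();
    dirty.Clear();
}

for (int i = 0; i < 1000; i++) IncrementViews("video:42");
Flush();                                   // 1000 increments become 1 database write
for (int i = 0; i < 50; i++) IncrementViews("video:42");
CrashCacheNode();                          // the 50 views since the last flush are lost

Console.WriteLine($"persisted views: {database["video:42"]}, db writes: {dbWrites}");
```

The 1,000 increments reach the database as a single write, which is where write-behind gets its throughput. The 50 increments after the flush are the loss window.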
Cache Stampede (Thundering Herd)
Cache stampede is the failure mode that occurs when a popular cache entry expires while many callers are requesting it. All of them miss at the same moment, all query the database concurrently, all receive the same result, and all write it back to the cache, performing N identical database queries where one was sufficient.
At high concurrency with expensive database queries, a stampede can saturate the database connection pool. The mitigations are:
Mutex/lock pattern: The first miss acquires a distributed lock. Subsequent misses wait for the lock. When the lock holder populates the cache, waiting callers find the cache warm on retry.
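Probabilistic early expiration (the XFetch algorithm): Each reader independently decides whether to refresh an entry before it expires. A reader refreshes early when -δ × β × ln(U) ≥ remaining TTL, where δ is the time the value takes to recompute, β is a tuning constant (1.0 by default), and U is drawn uniformly from (0, 1]. The chance of an early refresh therefore rises as expiry approaches and as recomputation gets more expensive. Costly entries tend to be refreshed well before they expire, while cheap ones are refreshed only moments before expiry.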
Stale-while-revalidate: Return the stale cached value immediately while triggering an async refresh in the background. The caller never waits for the cache miss.
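The mutex pattern is easiest to see inside a single process. The sketch below is an in-process variant, often called request coalescing or single-flight, and not a distributed lock. Concurrent misses for the same key share one in-flight `Task`, so only the first caller queries the database. `GetOrLoadAsync` and `SlowDbQuery` are hypothetical names. Across multiple service instances, a distributed lock is still needed to get one query per cluster rather than one per instance.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-flight map: one in-flight load per key, shared by all concurrent callers.
var inFlight = new ConcurrentDictionary<string, Lazy<Task<string>>>();

async Task<string> GetOrLoadAsync(string key, Func<string, Task<string>> loadFromDb)
{
    // Under contention GetOrAdd may build several Lazy wrappers, but only the stored one is
    // handed out, and Lazy runs its factory once, so loadFromDb runs once per flight.
    var flight = inFlight.GetOrAdd(key, k => new Lazy<Task<string>>(() => loadFromDb(k)));
    try
    {
        return await flight.Value;
    }
    finally
    {
        // Remove only the flight we joined; the next miss after completion starts a fresh load.
        inFlight.TryRemove(new KeyValuePair<string, Lazy<Task<string>>>(key, flight));
    }
}

// Simulated slow query that stays "running" until the gate opens.
int dbCalls = 0;
var gate = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
async Task<string> SlowDbQuery(string key)
{
    Interlocked.Increment(ref dbCalls);
    await gate.Task;
    return $"row-for-{key}";
}

// 100 concurrent misses on the same key.
var callers = new Task<string>[100];
for (int i = 0; i < callers.Length; i++)
    callers[i] = GetOrLoadAsync("product:123", SlowDbQuery);

gate.SetResult(true);
var results = await Task.WhenAll(callers);
Console.WriteLine($"{results.Length} callers, {dbCalls} database call(s)");
```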
Consistent Hashing for Cache Node Distribution
When a cache cluster has multiple nodes, you need a strategy for assigning cache keys to nodes. Naive modulo hashing (node = hash(key) % nodeCount) is simple but catastrophic on node addition or removal. Going from N to N+1 nodes changes the assignment of roughly N/(N+1) of all keys (about 80% when growing from 4 to 5 nodes), so most of the cache is effectively invalidated.
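Consistent hashing places both nodes and keys on a hash ring. Each key is assigned to the nearest node clockwise from the key's position. When a node is added, only the keys on the arc it takes over move, and they all move to the new node. When a node is removed, only that node's keys move. Either way, roughly 1/N of all keys are remapped. Redis Cluster takes a related but different approach. It does not use consistent hashing. Instead, it maps every key to one of 16,384 hash slots (CRC16 of the key modulo 16384) and assigns slots to nodes, so resharding moves whole slots between nodes.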
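The ring can be sketched in a few dozen lines. This sketch makes several assumptions. Positions come from the first four bytes of MD5, chosen only because `string.GetHashCode` is randomized per process. Each physical node is placed on the ring 100 times ("virtual nodes"), a common technique for evening out arc sizes. `BuildRing` and `Lookup` are hypothetical helpers, not a library API. The usage grows a cluster from 4 to 5 nodes and counts how many of 10,000 keys change owner under modulo hashing and under the ring.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Stable 32-bit position: first 4 bytes of MD5 (string.GetHashCode is randomized per process).
static uint Hash(string s) => BitConverter.ToUInt32(MD5.HashData(Encoding.UTF8.GetBytes(s)), 0);

// Each physical node is placed on the ring `vnodes` times ("node-0#0", "node-0#1", ...).
static (uint[] Points, string[] Owners) BuildRing(IEnumerable<string> nodes, int vnodes = 100)
{
    var entries = nodes
        .SelectMany(n => Enumerable.Range(0, vnodes).Select(i => (Point: Hash($"{n}#{i}"), Node: n)))
        .OrderBy(e => e.Point)
        .ToArray();
    return (entries.Select(e => e.Point).ToArray(), entries.Select(e => e.Node).ToArray());
}

// A key belongs to the first ring point at or clockwise after its hash, wrapping past the top.
static string Lookup((uint[] Points, string[] Owners) ring, string key)
{
    int idx = Array.BinarySearch(ring.Points, Hash(key));
    if (idx < 0) idx = ~idx;                  // no exact match: index of the next larger point
    if (idx == ring.Points.Length) idx = 0;   // past the largest point: wrap to the start
    return ring.Owners[idx];
}

var keys = Enumerable.Range(0, 10_000).Select(i => $"product:{i}").ToArray();
var before = BuildRing(new[] { "node-0", "node-1", "node-2", "node-3" });
var after = BuildRing(new[] { "node-0", "node-1", "node-2", "node-3", "node-4" });

var movedKeys = keys.Where(k => Lookup(before, k) != Lookup(after, k)).ToArray();
double ringMoved = movedKeys.Length / (double)keys.Length;
double moduloMoved = keys.Count(k => Hash(k) % 4 != Hash(k) % 5) / (double)keys.Length;
bool onlyToNewNode = movedKeys.All(k => Lookup(after, k) == "node-4");

Console.WriteLine($"modulo: {moduloMoved:P1} moved, ring: {ringMoved:P1} moved, all to node-4: {onlyToNewNode}");
```

Modulo hashing moves close to 80% of the keys. The ring moves close to 20% (the new node's share), and every moved key lands on the new node.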
Constraints
TTL calibration: TTL is the primary control over cache-database consistency in cache-aside. Too short: high cache miss rate, high database load, reduced benefit. Too long: stale data returned for extended periods. The right TTL depends on how frequently the underlying data changes and how much staleness the consumer can tolerate. There is no universally correct value.
Invalidation in microservices: When the service that owns the data and the service that caches it are different microservices, cache invalidation requires cross-service coordination. The owning service must notify all caching consumers when data changes, typically via domain events. If a consumer misses an event, its cached copy stays stale until the entry's TTL expires, or indefinitely if the entry has no TTL.
Cold cache startup: After a cache flush, node restart, or cluster scale-out, the cache (or the affected share of it) is cold. Nearly every read misses and falls through to the database at once. For high-traffic services, a cold cache event can cause a traffic spike that saturates the database. Pre-warming the cache with predictably hot keys before enabling traffic is standard practice for critical services.
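Pre-warming should not cause the very spike it is meant to prevent, so one option is to load the hot-key list with bounded concurrency before the instance takes traffic. This sketch assumes the hot keys are already known (for example, from access logs) and uses `Parallel.ForEachAsync` (.NET 6 or later) to cap concurrent database queries. `LoadFromDbAsync` is a hypothetical stand-in for the real repository call, and the limit of 8 is arbitrary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var cache = new ConcurrentDictionary<string, string>();
var counterLock = new object();
int inFlight = 0, peakInFlight = 0;

// Stand-in for the real database query; records how many queries overlap.
async Task<string> LoadFromDbAsync(string key, CancellationToken ct)
{
    lock (counterLock) { inFlight++; peakInFlight = Math.Max(peakInFlight, inFlight); }
    try
    {
        await Task.Delay(10, ct);              // simulated query latency
        return $"row-for-{key}";
    }
    finally
    {
        lock (counterLock) { inFlight--; }
    }
}

// Hot keys would come from access logs or a previous instance; here they are synthetic.
var hotKeys = Enumerable.Range(1, 200).Select(i => $"product:{i}").ToList();
const int MaxDbConcurrency = 8;

// Warm the cache before taking traffic, never running more than MaxDbConcurrency queries at once.
await Parallel.ForEachAsync(
    hotKeys,
    new ParallelOptions { MaxDegreeOfParallelism = MaxDbConcurrency },
    async (key, ct) => cache[key] = await LoadFromDbAsync(key, ct));

Console.WriteLine($"warmed {cache.Count} keys; peak concurrent DB queries: {peakInFlight}");
```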
Memory pressure and eviction: Cache memory is finite. When Redis reaches its maxmemory limit, it evicts keys according to its configured maxmemory-policy (for example allkeys-lru, allkeys-lfu, volatile-lru, volatile-ttl, or allkeys-random). Under the default policy, noeviction, it rejects writes instead. If the policy does not match your access pattern, frequently read data can be evicted in favor of recently written but rarely read data.
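The eviction failure mode described above is easy to reproduce with exact, textbook LRU and LFU. This sketch is not Redis's implementation: Redis approximates both policies by sampling keys, and its LFU counter is logarithmic and decays over time. `Replay` is a hypothetical helper. In the scenario, a hot key read 20 times is followed by three one-off writes into a three-entry cache.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Replays an access sequence against a cache of `capacity` entries and returns the surviving keys.
// Exact textbook LRU / LFU, not Redis's sampled approximations.
static HashSet<string> Replay(IEnumerable<string> accesses, int capacity, bool lfu)
{
    var lastUsed = new Dictionary<string, long>();   // key -> logical time of last access
    var hits = new Dictionary<string, long>();       // key -> number of accesses
    long clock = 0;
    foreach (var key in accesses)
    {
        if (!lastUsed.ContainsKey(key) && lastUsed.Count == capacity)
        {
            // Victim: least recently used, or least frequently used with recency as tie-break.
            string victim = lfu
                ? lastUsed.Keys.OrderBy(k => hits[k]).ThenBy(k => lastUsed[k]).First()
                : lastUsed.Keys.OrderBy(k => lastUsed[k]).First();
            lastUsed.Remove(victim);
            hits.Remove(victim);
        }
        lastUsed[key] = ++clock;
        hits[key] = hits.GetValueOrDefault(key) + 1;
    }
    return lastUsed.Keys.ToHashSet();
}

// One hot key read 20 times, then a burst of one-off writes that are never read again.
var accesses = Enumerable.Repeat("hot", 20).Concat(new[] { "new:1", "new:2", "new:3" }).ToList();
var afterLru = Replay(accesses, capacity: 3, lfu: false);
var afterLfu = Replay(accesses, capacity: 3, lfu: true);

Console.WriteLine($"LRU keeps hot key: {afterLru.Contains("hot")}; LFU keeps hot key: {afterLfu.Contains("hot")}");
// prints: LRU keeps hot key: False; LFU keeps hot key: True
```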
Trade-offs
| Strategy | Write Latency | Read Hit Rate | Consistency | Durability Risk | Best For |
|---|---|---|---|---|---|
| Cache-Aside | Low (DB only) | Variable (miss-then-populate) | Eventual (TTL or invalidation) | None | Read-heavy, variable access patterns |
| Write-Through | High (cache + DB) | High (always warm after write) | Strong on write path | None | Read-heavy with predictable writes |
| Write-Behind | Lowest (cache only) | High | Eventual (async flush) | Data loss window | Write-heavy, loss-tolerant |
| Read-Through | Depends on the paired write strategy | Same as cache-aside (cache loads on miss) | Eventual | None | Simplifying application logic |
The most significant trade-off is between consistency and performance. Tighter consistency requires more synchronous coordination between cache and database; looser consistency tolerates more divergence in exchange for lower latency and higher availability.
Code
Cache-Aside with Stampede Protection via Distributed Lock
```csharp
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using StackExchange.Redis;

// Product, ProductId and IProductRepository are domain types defined elsewhere.
public sealed class ProductCatalogCache
{
    private readonly IDatabase _redisDb;
    private readonly IProductRepository _productRepository;
    private readonly ILogger<ProductCatalogCache> _logger;

    private static readonly TimeSpan CacheTtl = TimeSpan.FromMinutes(15);
    private static readonly TimeSpan LockTtl = TimeSpan.FromSeconds(5);
    private const int MaxLockWaitAttempts = 4;

    // Lua: delete the lock only if it still holds our token (guards against deleting
    // a lock that expired and was re-acquired by another instance).
    private const string ReleaseLockScript = @"
        if redis.call('GET', KEYS[1]) == ARGV[1] then
            return redis.call('DEL', KEYS[1])
        else
            return 0
        end";

    public ProductCatalogCache(
        IDatabase redisDb,
        IProductRepository productRepository,
        ILogger<ProductCatalogCache> logger)
    {
        _redisDb = redisDb;
        _productRepository = productRepository;
        _logger = logger;
    }

    public async Task<Product?> GetProductAsync(ProductId productId, CancellationToken ct)
    {
        var cacheKey = $"product:{productId.Value}";

        // Step 1: try the cache.
        var cached = await _redisDb.StringGetAsync(cacheKey);
        if (cached.HasValue)
            return JsonSerializer.Deserialize<Product>(cached.ToString());

        // Step 2: cache miss. Try to become the one instance that loads from the database.
        var lockKey = $"lock:{cacheKey}";
        var lockToken = Guid.NewGuid().ToString("N");
        bool acquired = await _redisDb.StringSetAsync(lockKey, lockToken, LockTtl, When.NotExists);

        if (!acquired)
        {
            // Another instance holds the lock: poll with exponential backoff (50, 100, 200, 400 ms).
            for (int attempt = 0; attempt < MaxLockWaitAttempts; attempt++)
            {
                await Task.Delay(50 * (1 << attempt), ct);
                var retryValue = await _redisDb.StringGetAsync(cacheKey);
                if (retryValue.HasValue)
                    return JsonSerializer.Deserialize<Product>(retryValue.ToString());
            }

            // Still cold: read the database directly instead of returning null, which the caller
            // would misread as "product does not exist". This read does not populate the cache.
            _logger.LogWarning("Cache fill for {CacheKey} timed out; reading database directly", cacheKey);
            return await _productRepository.FindByIdAsync(productId, ct);
        }

        try
        {
            // Step 3: we hold the lock. Re-check the cache first: the previous lock holder
            // may have populated it between our miss and our lock acquisition.
            var recheck = await _redisDb.StringGetAsync(cacheKey);
            if (recheck.HasValue)
                return JsonSerializer.Deserialize<Product>(recheck.ToString());

            var product = await _productRepository.FindByIdAsync(productId, ct);
            if (product is not null)
            {
                await _redisDb.StringSetAsync(cacheKey, JsonSerializer.Serialize(product), CacheTtl);
            }
            return product;
        }
        finally
        {
            // Release the lock only if we still own it.
            await _redisDb.ScriptEvaluateAsync(
                ReleaseLockScript,
                new RedisKey[] { lockKey },
                new RedisValue[] { lockToken });
        }
    }

    public async Task InvalidateProductAsync(ProductId productId)
    {
        var cacheKey = $"product:{productId.Value}";
        await _redisDb.KeyDeleteAsync(cacheKey);
        _logger.LogInformation("Invalidated cache key {CacheKey}", cacheKey);
    }
}
```
Write-Through Cache with Atomic Check-and-Decrement
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using StackExchange.Redis;

// SkuId and IInventoryRepository are domain types defined elsewhere.
public sealed class WriteThroughInventoryCache
{
    private readonly IDatabase _redisDb;
    private readonly IInventoryRepository _inventoryRepository;

    // Lua: atomic check-and-decrement, so two callers cannot both pass the stock check.
    // Returns -1 on cache miss, -2 on insufficient stock, otherwise the new stock level.
    private const string DecrementScript = @"
        local current = tonumber(redis.call('GET', KEYS[1]) or '-1')
        if current < 0 then return -1 end
        if current < tonumber(ARGV[1]) then return -2 end
        return redis.call('DECRBY', KEYS[1], ARGV[1])";

    public WriteThroughInventoryCache(
        IDatabase redisDb,
        IInventoryRepository inventoryRepository)
    {
        _redisDb = redisDb;
        _inventoryRepository = inventoryRepository;
    }

    /// <summary>
    /// Write-through: updates the cache atomically, then synchronously persists the same
    /// change to the database before returning. If the database write fails, the cache
    /// entry is deleted so the next read reloads the authoritative value.
    /// </summary>
    public async Task<bool> DecrementStockAsync(SkuId skuId, int quantity, CancellationToken ct)
    {
        if (quantity <= 0)
            throw new ArgumentOutOfRangeException(nameof(quantity));

        var cacheKey = $"inventory:{skuId.Value}";
        var newValue = (long)await _redisDb.ScriptEvaluateAsync(
            DecrementScript,
            new RedisKey[] { cacheKey },
            new RedisValue[] { quantity });

        if (newValue == -1)
        {
            // Cache miss: fall back to the database (the repository is assumed to lock the row).
            return await _inventoryRepository.DecrementStockAsync(skuId, quantity, ct);
        }
        if (newValue == -2)
            return false; // insufficient stock

        // Persist the delta, not the absolute cached level: concurrent writers can reach the
        // database in any order, and writing absolute values would let an older level win.
        bool persisted;
        try
        {
            persisted = await _inventoryRepository.DecrementStockAsync(skuId, quantity, ct);
        }
        catch
        {
            await _redisDb.KeyDeleteAsync(cacheKey); // cache may now be ahead of the database
            throw;
        }

        if (!persisted)
            await _redisDb.KeyDeleteAsync(cacheKey); // database rejected the change: drop the cached level
        return persisted;
    }

    /// <summary>
    /// XFetch probabilistic early expiration: returns true if this caller should
    /// recompute the entry now, before it actually expires.
    /// </summary>
    /// <param name="recomputeCost">Observed time to regenerate the value (delta in XFetch).</param>
    /// <param name="beta">Tuning constant; values above 1.0 favour earlier refreshes.</param>
    public async Task<bool> ShouldEagerlyRefreshAsync(
        string cacheKey, TimeSpan recomputeCost, double beta = 1.0)
    {
        // null: the key is missing, or has no TTL (which XFetch does not cover); treat as due.
        var ttlRemaining = await _redisDb.KeyTimeToLiveAsync(cacheKey);
        if (ttlRemaining is null || ttlRemaining <= TimeSpan.Zero)
            return true;

        // Refresh if -delta * beta * ln(U) >= remaining TTL, with U uniform in (0, 1].
        double u = 1.0 - Random.Shared.NextDouble();
        double earlyMs = -recomputeCost.TotalMilliseconds * beta * Math.Log(u);
        return earlyMs >= ttlRemaining.Value.TotalMilliseconds;
    }
}
```
```csharp
using System.Threading;
using System.Threading.Tasks;
using MediatR;
using Microsoft.Extensions.Logging;
using StackExchange.Redis;

// Cache invalidation via domain event subscription (MediatR notification handler):
// when ProductUpdatedEvent is published, every cache entry derived from that product is deleted.
public sealed class ProductCacheInvalidationHandler
    : INotificationHandler<ProductUpdatedEvent>
{
    private readonly IDatabase _redisDb;
    private readonly ILogger<ProductCacheInvalidationHandler> _logger;

    public ProductCacheInvalidationHandler(
        IDatabase redisDb, ILogger<ProductCacheInvalidationHandler> logger)
    {
        _redisDb = redisDb;
        _logger = logger;
    }

    public async Task Handle(ProductUpdatedEvent notification, CancellationToken ct)
    {
        // Delete every cached variant of the product: by ID, by slug, and the summary projection.
        // Key formats must match the writers exactly: ProductCatalogCache writes
        // "product:{productId.Value}", so notification.ProductId must render as that same raw value.
        var keysToDelete = new RedisKey[]
        {
            $"product:{notification.ProductId}",
            $"product:slug:{notification.Slug}",
            $"product-summary:{notification.ProductId}"
        };

        // A single DEL for all keys; the result counts only keys that actually existed.
        var deleted = await _redisDb.KeyDeleteAsync(keysToDelete);
        _logger.LogInformation(
            "Invalidated {Count} cache keys for ProductId {ProductId} following update.",
            deleted, notification.ProductId);
    }
}
```