Theoretical Foundations
Welcome to the curriculum workspace. Here you will find long-form technical guidelines outlining core architectural blueprints and implementation mechanics.
Module 6: Caching Topologies & Content Delivery
PHASE 2 — SYSTEM DESIGN FOUNDATIONS: Caching is the single highest-leverage performance and cost optimization in distributed systems. A correctly designed cache hierarchy can reduce database load by 90%+, cut user-facing latency from 200ms to 5ms, and reduce CDN bandwidth costs. This module covers the full stack of caching strategies — from browser to edge to application to database — and the failure modes that make caching notoriously difficult to get right.
Introduction: Why Databases Can't Scale With Raw Read Traffic
A PostgreSQL database on a 32-core server handles approximately 10,000 complex queries/second before CPU becomes the bottleneck. An uncached e-commerce home page might require 8 database queries per page view. At 5,000 concurrent users, that's 40,000 queries/second — 4x over the limit.
Adding more database replicas helps (linearly scales read capacity) but costs linearly in infrastructure: $$\text{Cost}_{\text{replicas}} = N \times \text{Instance Cost}$$
Caching breaks this linear relationship. A single Redis cluster costing $200/month can serve 1 million cache reads/second, replacing the need for multiple expensive database read replicas.
The Cache Hit Ratio Formula
The performance gain from caching is governed by the Cache Hit Ratio ($H$): $$\text{Avg Latency} = H \times L_{\text{cache}} + (1-H) \times L_{\text{db}}$$
Where:
- $H$ = fraction of requests served from cache (hit ratio)
- $L_{\text{cache}}$ = cache read latency (Redis: ~0.1ms)
- $L_{\text{db}}$ = database query latency (PostgreSQL: ~5–50ms)
| Hit Ratio ($H$) | Avg Latency (cache=0.1ms, DB=20ms) | DB Load Reduction |
|---|---|---|
| 0% (no cache) | 20.0ms | 0% |
| 50% | 10.05ms | 50% |
| 80% | 4.08ms | 80% |
| 95% | 1.095ms | 95% |
| 99% | 0.299ms | 99% |
A 99% cache hit ratio reduces your database load by 99% while serving users 67x faster.
Section 1: The Caching Layer Hierarchy
Modern systems use caches at multiple layers. Each layer adds speed and absorbs load before requests reach the next layer:
[User Browser]
|
| Browser Cache (HTTP Cache-Control headers)
v
[CDN Edge Node] (Cloudflare, CloudFront, Fastly)
|
| CDN Cache (static assets, edge-computed responses)
v
[Reverse Proxy Cache] (NGINX FastCGI cache, Varnish)
|
| Process-Level Cache (in-memory LRU in Node.js process)
v
[Application Server]
|
| Distributed Cache (Redis, Memcached)
v
[Database] (PostgreSQL, MySQL)
Each layer has different characteristics:
| Cache Layer | Latency | Scope | Storage | TTL Range |
|---|---|---|---|---|
| Browser (HTTP) | 0ms (local) | Per-user | User's disk/memory | Seconds to years |
| CDN Edge | 1–5ms | Global | Edge PoP SSDs | Minutes to days |
| Reverse Proxy | 0.5ms | Per-node | Server memory/disk | Seconds to hours |
| In-Process | 0.001ms | Per-process | Process heap | Seconds to minutes |
| Distributed (Redis) | 0.1ms | Cluster-wide | RAM | Seconds to hours |
| Database query cache | 1–5ms | Database-wide | DB memory | Implicit (invalidated on write) |
Section 2: Cache-Aside (Lazy Loading)
The most common caching pattern. The application code manages both the cache and the database directly.
Cache-Aside Read Flow:
Application → GET user:42 → Redis
|
┌──────────────┴──────────────┐
│ Cache HIT │ Cache MISS
│ Return cached value │ Query PostgreSQL
│ (~0.1ms) │ SET user:42 in Redis (TTL: 300s)
│ │ Return value (~20ms)
└──────────────────────────────┘
class UserRepository {
constructor(
private redis: Redis,
private db: PrismaClient
) {}
async findById(id: string): Promise<User | null> {
const cacheKey = `user:${id}`;
const TTL = 300; // 5 minutes
// 1. Check cache first
const cached = await this.redis.get(cacheKey);
if (cached) {
return JSON.parse(cached); // Cache hit — return immediately
}
// 2. Cache miss — query database
const user = await this.db.user.findUnique({ where: { id } });
if (!user) return null;
// 3. Populate cache for future reads
await this.redis.setex(cacheKey, TTL, JSON.stringify(user));
return user;
}
async update(id: string, data: Partial<User>): Promise<User> {
const updated = await this.db.user.update({ where: { id }, data });
// Invalidate stale cache entry
await this.redis.del(`user:${id}`);
return updated;
}
}
Trade-offs:
- Pros: Only caches data that's actually requested. Cache failures are non-fatal (falls through to DB). Fine-grained cache control per query.
- Cons: Cache miss on cold start causes a "thundering herd" (addressed in Section 5). Data can be stale until TTL expires if using TTL-based invalidation.
Section 3: Write-Through vs. Write-Behind
A. Write-Through
The application writes to the cache first. The cache synchronously writes to the database before confirming success.
Application → WRITE → [Cache] → WRITE → [Database]
↓
Success confirmed to Application
async createOrder(order: Order): Promise<void> {
const tx = await this.db.$transaction(async (prisma) => {
const saved = await prisma.order.create({ data: order });
// Synchronously cache the new order
await this.redis.setex(`order:${saved.id}`, 3600, JSON.stringify(saved));
return saved;
});
}
Trade-offs:
- Pros: Cache is always consistent with the database. Read-after-write always returns fresh data.
- Cons: Writes are slower (two writes: cache + DB). Cache stores data that may never be read again (write-all, read-few workloads waste cache memory).
B. Write-Behind (Write-Back)
The application writes to the cache only. The cache asynchronously flushes to the database in batches.
Application → WRITE → [Cache] → returns success immediately
↓
(async, batch flush every 500ms)
↓
[Database]
Trade-offs:
- Pros: Lowest write latency. Database receives batched writes — dramatically reduces write IOPS.
- Cons: Data loss risk — if the cache node crashes before flushing, writes are lost. Complex to implement correctly (requires write durability guarantees in the cache layer, e.g., Redis AOF persistence).
C. Choosing the Right Write Strategy
| Use Case | Recommended Strategy |
|---|---|
| User session data (can be regenerated) | Write-Through (safety over speed) |
| Product catalog (read-heavy, rare writes) | Cache-Aside with TTL |
| IoT sensor metrics (high write volume) | Write-Behind with AOF persistence |
| Financial account balances | Write-Through or no cache (strong consistency) |
| Search autocomplete suggestions | Write-Behind (eventual consistency acceptable) |
Section 4: Cache Eviction Policies
When a cache is full and a new item must be inserted, an existing item must be evicted. The eviction policy determines which item is removed.
A. LRU (Least Recently Used)
Evicts the item that has not been accessed for the longest time. Implemented using a doubly-linked list + hash map combination ($O(1)$ get and put).
Cache State (capacity=3):
[A] → [B] → [C] (A is least recently used)
Access D → Cache full → Evict A:
[B] → [C] → [D]
Best for: General-purpose caching where recently accessed items are likely to be accessed again.
B. LFU (Least Frequently Used)
Evicts the item with the lowest access count.
Item A: accessed 50 times
Item B: accessed 2 times ← Evicted
Item C: accessed 30 times
Best for: Workloads with long-running hot items (popular product pages, trending news). LRU would eventually evict popular items if they haven't been accessed recently.
C. ARC (Adaptive Replacement Cache)
ARC adaptively balances between LRU and LFU based on recent workload patterns. Used by ZFS and some high-performance databases.
Redis Eviction Policy Configuration
# Set max memory and eviction policy in redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lru
# Available policies:
# noeviction — Return errors when memory limit is reached (fail-safe)
# allkeys-lru — Evict any key using LRU
# volatile-lru — Evict only keys with TTL set, using LRU
# allkeys-random — Evict random keys
# volatile-ttl — Evict the key with the shortest TTL first
Section 5: Cache Failure Modes and Mitigations
A. Cache Stampede (Thundering Herd)
Scenario: A trending article page is cached with a 60-second TTL. 10,000 users are viewing it simultaneously. The key expires at exactly the same moment — all 10,000 requests miss the cache and hit the database simultaneously.
10,000 concurrent users
|
Key expires at T=0
|
10,000 cache misses → 10,000 DB queries → DB CPU spikes to 100% → CRASH
Mitigation 1: Mutex Lock (Singleflight Pattern)
Only one goroutine/thread queries the database for an expired key. All other concurrent requests wait for the first result:
class SingleFlight {
private in_flight = new Map<string, Promise<any>>();
async do<T>(key: string, fn: () => Promise<T>): Promise<T> {
if (this.in_flight.has(key)) {
return this.in_flight.get(key)!; // Wait for in-flight request
}
const promise = fn().finally(() => this.in_flight.delete(key));
this.in_flight.set(key, promise);
return promise;
}
}
const singleFlight = new SingleFlight();
async function getArticle(id: string): Promise<Article> {
const cached = await redis.get(`article:${id}`);
if (cached) return JSON.parse(cached);
// Only ONE database query fires even with 10,000 concurrent misses
return singleFlight.do(`article:${id}`, async () => {
const article = await db.article.findUnique({ where: { id } });
await redis.setex(`article:${id}`, 60, JSON.stringify(article));
return article;
});
}
Mitigation 2: Probabilistic Early Expiration (XFetch Algorithm)
Before the key's TTL expires, a background thread proactively recomputes the value. The probability of triggering early recomputation increases as the TTL approaches:
$$\text{Refresh early if:} \quad -\beta \times \delta \times \ln(\text{rand}()) > (\text{TTL}_\text{remaining})$$
Where $\beta$ is a sensitivity parameter (higher = more aggressive early refresh) and $\delta$ is the time taken to recompute the value.
B. Cache Penetration
Scenario: An attacker sends millions of requests for user:999999999 — a user ID that doesn't exist. Every request misses the cache (no entry to cache for non-existent users) and hits the database, which also returns null.
Attacker → GET user:999999999 → Redis (miss) → PostgreSQL (null) → repeat ×1M
↑
DB overwhelmed by null queries
Mitigation 1: Cache Null Results
Cache the null result with a short TTL:
const user = await db.user.findUnique({ where: { id } });
// Cache both found users AND null results
await redis.setex(`user:${id}`, user ? 300 : 30, JSON.stringify(user ?? null));
Mitigation 2: Bloom Filter
A Bloom Filter is a probabilistic data structure that can definitively answer "Does key X definitely NOT exist?" with zero false negatives (though it can have false positives):
Bloom Filter for valid user IDs:
- On user creation: bloomFilter.add(userId)
- On request: if !bloomFilter.has(userId) → return 404 immediately (skip cache + DB)
- Memory: 10M users stored in ~12MB (vs. 400MB for a hash set)
class UserService {
private bloomFilter: BloomFilter;
async findById(id: string): Promise<User | null> {
// Fast rejection: definitively not in database
if (!this.bloomFilter.test(id)) {
return null; // Return immediately — no cache or DB lookup
}
const cached = await this.redis.get(`user:${id}`);
if (cached) return JSON.parse(cached);
const user = await this.db.user.findUnique({ where: { id } });
if (user) await this.redis.setex(`user:${id}`, 300, JSON.stringify(user));
return user;
}
}
C. Cache Avalanche
Scenario: All cached items across the system expire at the same time (e.g., you preloaded the cache at midnight with all TTLs set to exactly 3600 seconds — at 1:00 AM, everything expires simultaneously). Every request hits the database at once.
Mitigation: TTL Jitter
Add random variance to all TTL values to distribute expiry across time:
function jitteredTTL(baseTTL: number, jitterRatio = 0.1): number {
const jitter = baseTTL * jitterRatio * Math.random();
return Math.floor(baseTTL + jitter); // e.g., 3600 ± 10% → 3600–3960 seconds
}
await redis.setex(key, jitteredTTL(3600), value);
// Different keys expire at different times — avalanche prevented
Section 6: CDN Edge Caching
A Content Delivery Network (CDN) is a globally distributed network of edge servers that cache and serve content from locations geographically close to users.
CDN Cache Mechanics
First request from user in Tokyo for /static/app.js:
Tokyo PoP → MISS → Origin server (Virginia) → 100ms RTT
CDN stores app.js at Tokyo PoP
All subsequent requests from Tokyo users for /static/app.js:
Tokyo PoP → HIT → serves from local SSD → 2ms
HTTP Cache-Control Headers
The origin server controls CDN and browser caching behavior via HTTP response headers:
// Next.js API route: control caching at the HTTP layer
export async function GET(request: Request) {
const data = await fetchProductCatalog();
return new Response(JSON.stringify(data), {
headers: {
'Content-Type': 'application/json',
// Cache for 1 hour at CDN edge, 5 min in browser, never private
'Cache-Control': 'public, s-maxage=3600, max-age=300',
// CDN can serve stale for 1 hour while revalidating in background
'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=3600',
// Unique version-based cache key (immutable assets)
'Cache-Control': 'public, max-age=31536000, immutable',
}
});
}
Cache-Control Directives:
| Directive | Meaning |
|---|---|
public |
Can be cached by CDNs and proxies |
private |
Only cacheable in the browser (not CDN) |
no-store |
Never cache anywhere |
no-cache |
Must revalidate with origin before serving cached copy |
max-age=N |
Browser caches for N seconds |
s-maxage=N |
CDN caches for N seconds (overrides max-age for shared caches) |
stale-while-revalidate=N |
Serve stale while fetching fresh in background |
immutable |
Browser never revalidates (use only with versioned URLs) |
Section 7: Redis CLI Command Reference
# Connect to Redis
redis-cli -h redis.internal -p 6379
# String operations (most common)
SET user:42 '{"name":"Alice"}' EX 300 # Set with 300s TTL
GET user:42 # Get value
DEL user:42 # Delete key
TTL user:42 # Remaining TTL in seconds
PERSIST user:42 # Remove TTL (make permanent)
# Atomic counter (rate limiting)
INCR api:requests:user:42 # Increment counter
EXPIRE api:requests:user:42 60 # Set TTL atomically
# Hash (partial field updates without full re-serialization)
HSET user:42 name "Alice" role "architect"
HGET user:42 name
HMGET user:42 name role
# Sorted Set (leaderboard)
ZADD leaderboard 9450 "alice"
ZADD leaderboard 8230 "bob"
ZREVRANGE leaderboard 0 9 WITHSCORES # Top 10 with scores
ZRANK leaderboard "alice" # Rank of a specific member
# Pipeline (batch commands to reduce RTT overhead)
redis-cli --pipe << EOF
SET key1 value1 EX 300
SET key2 value2 EX 300
SET key3 value3 EX 300
EOF
# Lua script (atomic multi-command operations — rate limiting example)
EVAL "
local count = redis.call('INCR', KEYS[1])
if count == 1 then
redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
" 1 api:rate:user:42 60
Section 8: Hands-On Practice Challenge
The Challenge
The current architecture shows a client browser querying the database directly for every request. Design a multi-tier cache hierarchy that intercepts requests at multiple layers before reaching the database.
Your Goal
Construct a cache hierarchy diagram showing:
Client Browserwith local cache (HTTP Cache-Control).CDNcaching static assets at the edge.App ServercheckingRedisbefore querying the database.PostgreSQLas the system of record.
Solution Model
graph TD
Client[Client Browser\nBrowser Cache: 5min]
CDN[CDN Edge Node\nStatic Assets: 1yr\nAPI Responses: 1hr]
App[Application Server\nIn-Process LRU: 30s]
Redis[Redis Cluster\nDistributed Cache: 5min]
DB[(PostgreSQL\nSystem of Record)]
Client -->|HTTPS| CDN
CDN -->|Cache Miss: Static| App
CDN -->|Cache Miss: API| App
App -->|Cache Miss| Redis
Redis -->|Cache Miss| DB
DB -->|Result| Redis
Redis -->|Cached Result| App
App -->|Response + Cache-Control headers| CDN
CDN -->|Cached Response| Client
style CDN fill:#1e40af,color:#fff
style Redis fill:#dc2626,color:#fff
style DB fill:#065f46,color:#fff
Cache miss chain (worst case):
Browser (miss) → CDN (miss) → App (miss) → Redis (miss) → PostgreSQL
Total latency: ~25ms (CDN: 5ms + Redis: 0.5ms + DB: 20ms)
Cache hit at CDN (best case):
Browser (miss) → CDN (HIT)
Total latency: ~2ms
Bridge to Module 7: You now know how to make a single system fast. Module 7 (Distributed Computing Realities & CAP Theorem) introduces the fundamental constraints that govern how distributed systems behave — particularly what guarantees you can and cannot make when network partitions occur between your caching and database nodes.