Theoretical Foundations

Welcome to the curriculum workspace. Here you will find long-form technical guidelines outlining core architectural blueprints and implementation mechanics.

Module 6: Caching Topologies & Content Delivery

PHASE 2 — SYSTEM DESIGN FOUNDATIONS: Caching is the single highest-leverage performance and cost optimization in distributed systems. A correctly designed cache hierarchy can reduce database load by 90%+, cut user-facing latency from 200ms to 5ms, and reduce CDN bandwidth costs. This module covers the full stack of caching strategies — from browser to edge to application to database — and the failure modes that make caching notoriously difficult to get right.

Introduction: Why Databases Can't Scale With Raw Read Traffic

A PostgreSQL database on a 32-core server handles approximately 10,000 complex queries/second before CPU becomes the bottleneck. An uncached e-commerce home page might require 8 database queries per page view. At 5,000 concurrent users, that's 40,000 queries/second — 4x over the limit.

Adding more database replicas helps (linearly scales read capacity) but costs linearly in infrastructure: $$\text{Cost}_{\text{replicas}} = N \times \text{Instance Cost}$$

Caching breaks this linear relationship. A single Redis cluster costing $200/month can serve 1 million cache reads/second, replacing the need for multiple expensive database read replicas.

The Cache Hit Ratio Formula

The performance gain from caching is governed by the Cache Hit Ratio ($H$): $$\text{Avg Latency} = H \times L_{\text{cache}} + (1-H) \times L_{\text{db}}$$

Where:

$H$ = fraction of requests served from cache (hit ratio)
$L_{\text{cache}}$ = cache read latency (Redis: ~0.1ms)
$L_{\text{db}}$ = database query latency (PostgreSQL: ~5–50ms)

Hit Ratio ($H$)	Avg Latency (cache=0.1ms, DB=20ms)	DB Load Reduction
0% (no cache)	20.0ms	0%
50%	10.05ms	50%
80%	4.08ms	80%
95%	1.095ms	95%
99%	0.299ms	99%

A 99% cache hit ratio reduces your database load by 99% while serving users 67x faster.

Section 1: The Caching Layer Hierarchy

Modern systems use caches at multiple layers. Each layer adds speed and absorbs load before requests reach the next layer:

[User Browser]
    |
    | Browser Cache (HTTP Cache-Control headers)
    v
[CDN Edge Node] (Cloudflare, CloudFront, Fastly)
    |
    | CDN Cache (static assets, edge-computed responses)
    v
[Reverse Proxy Cache] (NGINX FastCGI cache, Varnish)
    |
    | Process-Level Cache (in-memory LRU in Node.js process)
    v
[Application Server]
    |
    | Distributed Cache (Redis, Memcached)
    v
[Database] (PostgreSQL, MySQL)

Each layer has different characteristics:

Cache Layer	Latency	Scope	Storage	TTL Range
Browser (HTTP)	0ms (local)	Per-user	User's disk/memory	Seconds to years
CDN Edge	1–5ms	Global	Edge PoP SSDs	Minutes to days
Reverse Proxy	0.5ms	Per-node	Server memory/disk	Seconds to hours
In-Process	0.001ms	Per-process	Process heap	Seconds to minutes
Distributed (Redis)	0.1ms	Cluster-wide	RAM	Seconds to hours
Database query cache	1–5ms	Database-wide	DB memory	Implicit (invalidated on write)

Section 2: Cache-Aside (Lazy Loading)

The most common caching pattern. The application code manages both the cache and the database directly.

Cache-Aside Read Flow:

Application → GET user:42 → Redis
                               |
               ┌──────────────┴──────────────┐
               │ Cache HIT                    │ Cache MISS
               │ Return cached value          │ Query PostgreSQL
               │ (~0.1ms)                     │ SET user:42 in Redis (TTL: 300s)
               │                              │ Return value (~20ms)
               └──────────────────────────────┘

class UserRepository {
  constructor(
    private redis: Redis,
    private db: PrismaClient
  ) {}

  async findById(id: string): Promise<User | null> {
    const cacheKey = `user:${id}`;
    const TTL = 300; // 5 minutes

    // 1. Check cache first
    const cached = await this.redis.get(cacheKey);
    if (cached) {
      return JSON.parse(cached);  // Cache hit — return immediately
    }

    // 2. Cache miss — query database
    const user = await this.db.user.findUnique({ where: { id } });
    if (!user) return null;

    // 3. Populate cache for future reads
    await this.redis.setex(cacheKey, TTL, JSON.stringify(user));
    return user;
  }

  async update(id: string, data: Partial<User>): Promise<User> {
    const updated = await this.db.user.update({ where: { id }, data });
    // Invalidate stale cache entry
    await this.redis.del(`user:${id}`);
    return updated;
  }
}

Trade-offs:

Pros: Only caches data that's actually requested. Cache failures are non-fatal (falls through to DB). Fine-grained cache control per query.
Cons: Cache miss on cold start causes a "thundering herd" (addressed in Section 5). Data can be stale until TTL expires if using TTL-based invalidation.

Section 3: Write-Through vs. Write-Behind

A. Write-Through

The application writes to the cache first. The cache synchronously writes to the database before confirming success.

Application → WRITE → [Cache] → WRITE → [Database]
                                         ↓
                              Success confirmed to Application

async createOrder(order: Order): Promise<void> {
  const tx = await this.db.$transaction(async (prisma) => {
    const saved = await prisma.order.create({ data: order });
    // Synchronously cache the new order
    await this.redis.setex(`order:${saved.id}`, 3600, JSON.stringify(saved));
    return saved;
  });
}

Trade-offs:

Pros: Cache is always consistent with the database. Read-after-write always returns fresh data.
Cons: Writes are slower (two writes: cache + DB). Cache stores data that may never be read again (write-all, read-few workloads waste cache memory).

B. Write-Behind (Write-Back)

The application writes to the cache only. The cache asynchronously flushes to the database in batches.

Application → WRITE → [Cache] → returns success immediately
                         ↓
             (async, batch flush every 500ms)
                         ↓
                    [Database]

Trade-offs:

Pros: Lowest write latency. Database receives batched writes — dramatically reduces write IOPS.
Cons: Data loss risk — if the cache node crashes before flushing, writes are lost. Complex to implement correctly (requires write durability guarantees in the cache layer, e.g., Redis AOF persistence).

C. Choosing the Right Write Strategy

Use Case	Recommended Strategy
User session data (can be regenerated)	Write-Through (safety over speed)
Product catalog (read-heavy, rare writes)	Cache-Aside with TTL
IoT sensor metrics (high write volume)	Write-Behind with AOF persistence
Financial account balances	Write-Through or no cache (strong consistency)
Search autocomplete suggestions	Write-Behind (eventual consistency acceptable)

Section 4: Cache Eviction Policies

When a cache is full and a new item must be inserted, an existing item must be evicted. The eviction policy determines which item is removed.

A. LRU (Least Recently Used)

Evicts the item that has not been accessed for the longest time. Implemented using a doubly-linked list + hash map combination ($O(1)$ get and put).

Cache State (capacity=3):
[A] → [B] → [C]  (A is least recently used)

Access D → Cache full → Evict A:
[B] → [C] → [D]

Best for: General-purpose caching where recently accessed items are likely to be accessed again.

B. LFU (Least Frequently Used)

Evicts the item with the lowest access count.

Item A: accessed 50 times
Item B: accessed 2 times  ← Evicted
Item C: accessed 30 times

Best for: Workloads with long-running hot items (popular product pages, trending news). LRU would eventually evict popular items if they haven't been accessed recently.

C. ARC (Adaptive Replacement Cache)

ARC adaptively balances between LRU and LFU based on recent workload patterns. Used by ZFS and some high-performance databases.

Redis Eviction Policy Configuration

# Set max memory and eviction policy in redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lru

# Available policies:
# noeviction    — Return errors when memory limit is reached (fail-safe)
# allkeys-lru   — Evict any key using LRU
# volatile-lru  — Evict only keys with TTL set, using LRU
# allkeys-random — Evict random keys
# volatile-ttl  — Evict the key with the shortest TTL first

Section 5: Cache Failure Modes and Mitigations

A. Cache Stampede (Thundering Herd)

Scenario: A trending article page is cached with a 60-second TTL. 10,000 users are viewing it simultaneously. The key expires at exactly the same moment — all 10,000 requests miss the cache and hit the database simultaneously.

10,000 concurrent users
        |
  Key expires at T=0
        |
10,000 cache misses → 10,000 DB queries → DB CPU spikes to 100% → CRASH

Mitigation 1: Mutex Lock (Singleflight Pattern)

Only one goroutine/thread queries the database for an expired key. All other concurrent requests wait for the first result:

class SingleFlight {
  private in_flight = new Map<string, Promise<any>>();

  async do<T>(key: string, fn: () => Promise<T>): Promise<T> {
    if (this.in_flight.has(key)) {
      return this.in_flight.get(key)!;  // Wait for in-flight request
    }

    const promise = fn().finally(() => this.in_flight.delete(key));
    this.in_flight.set(key, promise);
    return promise;
  }
}

const singleFlight = new SingleFlight();

async function getArticle(id: string): Promise<Article> {
  const cached = await redis.get(`article:${id}`);
  if (cached) return JSON.parse(cached);

  // Only ONE database query fires even with 10,000 concurrent misses
  return singleFlight.do(`article:${id}`, async () => {
    const article = await db.article.findUnique({ where: { id } });
    await redis.setex(`article:${id}`, 60, JSON.stringify(article));
    return article;
  });
}

Mitigation 2: Probabilistic Early Expiration (XFetch Algorithm)

Before the key's TTL expires, a background thread proactively recomputes the value. The probability of triggering early recomputation increases as the TTL approaches:

$$\text{Refresh early if:} \quad -\beta \times \delta \times \ln(\text{rand}()) > (\text{TTL}_\text{remaining})$$

Where $\beta$ is a sensitivity parameter (higher = more aggressive early refresh) and $\delta$ is the time taken to recompute the value.

B. Cache Penetration

Scenario: An attacker sends millions of requests for user:999999999 — a user ID that doesn't exist. Every request misses the cache (no entry to cache for non-existent users) and hits the database, which also returns null.

Attacker → GET user:999999999 → Redis (miss) → PostgreSQL (null) → repeat ×1M
                                                       ↑
                                               DB overwhelmed by null queries

Mitigation 1: Cache Null Results

Cache the null result with a short TTL:

const user = await db.user.findUnique({ where: { id } });
// Cache both found users AND null results
await redis.setex(`user:${id}`, user ? 300 : 30, JSON.stringify(user ?? null));

Mitigation 2: Bloom Filter

A Bloom Filter is a probabilistic data structure that can definitively answer "Does key X definitely NOT exist?" with zero false negatives (though it can have false positives):

Bloom Filter for valid user IDs:
- On user creation: bloomFilter.add(userId)
- On request: if !bloomFilter.has(userId) → return 404 immediately (skip cache + DB)
- Memory: 10M users stored in ~12MB (vs. 400MB for a hash set)

class UserService {
  private bloomFilter: BloomFilter;

  async findById(id: string): Promise<User | null> {
    // Fast rejection: definitively not in database
    if (!this.bloomFilter.test(id)) {
      return null;  // Return immediately — no cache or DB lookup
    }

    const cached = await this.redis.get(`user:${id}`);
    if (cached) return JSON.parse(cached);

    const user = await this.db.user.findUnique({ where: { id } });
    if (user) await this.redis.setex(`user:${id}`, 300, JSON.stringify(user));
    return user;
  }
}

C. Cache Avalanche

Scenario: All cached items across the system expire at the same time (e.g., you preloaded the cache at midnight with all TTLs set to exactly 3600 seconds — at 1:00 AM, everything expires simultaneously). Every request hits the database at once.

Mitigation: TTL Jitter

Add random variance to all TTL values to distribute expiry across time:

function jitteredTTL(baseTTL: number, jitterRatio = 0.1): number {
  const jitter = baseTTL * jitterRatio * Math.random();
  return Math.floor(baseTTL + jitter);  // e.g., 3600 ± 10% → 3600–3960 seconds
}

await redis.setex(key, jitteredTTL(3600), value);
// Different keys expire at different times — avalanche prevented

Section 6: CDN Edge Caching

A Content Delivery Network (CDN) is a globally distributed network of edge servers that cache and serve content from locations geographically close to users.

CDN Cache Mechanics

First request from user in Tokyo for /static/app.js:
  Tokyo PoP → MISS → Origin server (Virginia) → 100ms RTT
  CDN stores app.js at Tokyo PoP

All subsequent requests from Tokyo users for /static/app.js:
  Tokyo PoP → HIT → serves from local SSD → 2ms

HTTP Cache-Control Headers

The origin server controls CDN and browser caching behavior via HTTP response headers:

// Next.js API route: control caching at the HTTP layer
export async function GET(request: Request) {
  const data = await fetchProductCatalog();

  return new Response(JSON.stringify(data), {
    headers: {
      'Content-Type': 'application/json',
      // Cache for 1 hour at CDN edge, 5 min in browser, never private
      'Cache-Control': 'public, s-maxage=3600, max-age=300',
      // CDN can serve stale for 1 hour while revalidating in background
      'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=3600',
      // Unique version-based cache key (immutable assets)
      'Cache-Control': 'public, max-age=31536000, immutable',
    }
  });
}

Cache-Control Directives:

Directive	Meaning
`public`	Can be cached by CDNs and proxies
`private`	Only cacheable in the browser (not CDN)
`no-store`	Never cache anywhere
`no-cache`	Must revalidate with origin before serving cached copy
`max-age=N`	Browser caches for N seconds
`s-maxage=N`	CDN caches for N seconds (overrides `max-age` for shared caches)
`stale-while-revalidate=N`	Serve stale while fetching fresh in background
`immutable`	Browser never revalidates (use only with versioned URLs)

Section 7: Redis CLI Command Reference

# Connect to Redis
redis-cli -h redis.internal -p 6379

# String operations (most common)
SET user:42 '{"name":"Alice"}' EX 300    # Set with 300s TTL
GET user:42                               # Get value
DEL user:42                               # Delete key
TTL user:42                               # Remaining TTL in seconds
PERSIST user:42                           # Remove TTL (make permanent)

# Atomic counter (rate limiting)
INCR api:requests:user:42                 # Increment counter
EXPIRE api:requests:user:42 60            # Set TTL atomically

# Hash (partial field updates without full re-serialization)
HSET user:42 name "Alice" role "architect"
HGET user:42 name
HMGET user:42 name role

# Sorted Set (leaderboard)
ZADD leaderboard 9450 "alice"
ZADD leaderboard 8230 "bob"
ZREVRANGE leaderboard 0 9 WITHSCORES     # Top 10 with scores
ZRANK leaderboard "alice"                # Rank of a specific member

# Pipeline (batch commands to reduce RTT overhead)
redis-cli --pipe << EOF
SET key1 value1 EX 300
SET key2 value2 EX 300
SET key3 value3 EX 300
EOF

# Lua script (atomic multi-command operations — rate limiting example)
EVAL "
  local count = redis.call('INCR', KEYS[1])
  if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
  end
  return count
" 1 api:rate:user:42 60

Section 8: Hands-On Practice Challenge

The Challenge

The current architecture shows a client browser querying the database directly for every request. Design a multi-tier cache hierarchy that intercepts requests at multiple layers before reaching the database.

Your Goal

Construct a cache hierarchy diagram showing:

Client Browser with local cache (HTTP Cache-Control).
CDN caching static assets at the edge.
App Server checking Redis before querying the database.
PostgreSQL as the system of record.

Solution Model

graph TD
    Client[Client Browser\nBrowser Cache: 5min]
    CDN[CDN Edge Node\nStatic Assets: 1yr\nAPI Responses: 1hr]
    App[Application Server\nIn-Process LRU: 30s]
    Redis[Redis Cluster\nDistributed Cache: 5min]
    DB[(PostgreSQL\nSystem of Record)]

    Client -->|HTTPS| CDN
    CDN -->|Cache Miss: Static| App
    CDN -->|Cache Miss: API| App
    App -->|Cache Miss| Redis
    Redis -->|Cache Miss| DB
    DB -->|Result| Redis
    Redis -->|Cached Result| App
    App -->|Response + Cache-Control headers| CDN
    CDN -->|Cached Response| Client

    style CDN fill:#1e40af,color:#fff
    style Redis fill:#dc2626,color:#fff
    style DB fill:#065f46,color:#fff

Cache miss chain (worst case):

Browser (miss) → CDN (miss) → App (miss) → Redis (miss) → PostgreSQL
Total latency: ~25ms (CDN: 5ms + Redis: 0.5ms + DB: 20ms)

Cache hit at CDN (best case):

Browser (miss) → CDN (HIT)
Total latency: ~2ms

Bridge to Module 7: You now know how to make a single system fast. Module 7 (Distributed Computing Realities & CAP Theorem) introduces the fundamental constraints that govern how distributed systems behave — particularly what guarantees you can and cannot make when network partitions occur between your caching and database nodes.

Theoretical Foundations

Welcome to the curriculum workspace. Here you will find long-form technical guidelines outlining core architectural blueprints and implementation mechanics.

Module 6: Caching Topologies & Content Delivery

PHASE 2 — SYSTEM DESIGN FOUNDATIONS: Caching is the single highest-leverage performance and cost optimization in distributed systems. A correctly designed cache hierarchy can reduce database load by 90%+, cut user-facing latency from 200ms to 5ms, and reduce CDN bandwidth costs. This module covers the full stack of caching strategies — from browser to edge to application to database — and the failure modes that make caching notoriously difficult to get right.

Introduction: Why Databases Can't Scale With Raw Read Traffic

Adding more database replicas helps (linearly scales read capacity) but costs linearly in infrastructure: $$\text{Cost}_{\text{replicas}} = N \times \text{Instance Cost}$$

Caching breaks this linear relationship. A single Redis cluster costing $200/month can serve 1 million cache reads/second, replacing the need for multiple expensive database read replicas.

The Cache Hit Ratio Formula

The performance gain from caching is governed by the Cache Hit Ratio ($H$): $$\text{Avg Latency} = H \times L_{\text{cache}} + (1-H) \times L_{\text{db}}$$

Where:

$H$ = fraction of requests served from cache (hit ratio)
$L_{\text{cache}}$ = cache read latency (Redis: ~0.1ms)
$L_{\text{db}}$ = database query latency (PostgreSQL: ~5–50ms)

Hit Ratio ($H$)	Avg Latency (cache=0.1ms, DB=20ms)	DB Load Reduction
0% (no cache)	20.0ms	0%
50%	10.05ms	50%
80%	4.08ms	80%
95%	1.095ms	95%
99%	0.299ms	99%

A 99% cache hit ratio reduces your database load by 99% while serving users 67x faster.

Section 1: The Caching Layer Hierarchy

Modern systems use caches at multiple layers. Each layer adds speed and absorbs load before requests reach the next layer:

[User Browser]
    |
    | Browser Cache (HTTP Cache-Control headers)
    v
[CDN Edge Node] (Cloudflare, CloudFront, Fastly)
    |
    | CDN Cache (static assets, edge-computed responses)
    v
[Reverse Proxy Cache] (NGINX FastCGI cache, Varnish)
    |
    | Process-Level Cache (in-memory LRU in Node.js process)
    v
[Application Server]
    |
    | Distributed Cache (Redis, Memcached)
    v
[Database] (PostgreSQL, MySQL)

Each layer has different characteristics:

Cache Layer	Latency	Scope	Storage	TTL Range
Browser (HTTP)	0ms (local)	Per-user	User's disk/memory	Seconds to years
CDN Edge	1–5ms	Global	Edge PoP SSDs	Minutes to days
Reverse Proxy	0.5ms	Per-node	Server memory/disk	Seconds to hours
In-Process	0.001ms	Per-process	Process heap	Seconds to minutes
Distributed (Redis)	0.1ms	Cluster-wide	RAM	Seconds to hours
Database query cache	1–5ms	Database-wide	DB memory	Implicit (invalidated on write)

Section 2: Cache-Aside (Lazy Loading)

The most common caching pattern. The application code manages both the cache and the database directly.

Cache-Aside Read Flow:

Application → GET user:42 → Redis
                               |
               ┌──────────────┴──────────────┐
               │ Cache HIT                    │ Cache MISS
               │ Return cached value          │ Query PostgreSQL
               │ (~0.1ms)                     │ SET user:42 in Redis (TTL: 300s)
               │                              │ Return value (~20ms)
               └──────────────────────────────┘

class UserRepository {
  constructor(
    private redis: Redis,
    private db: PrismaClient
  ) {}

  async findById(id: string): Promise<User | null> {
    const cacheKey = `user:${id}`;
    const TTL = 300; // 5 minutes

    // 1. Check cache first
    const cached = await this.redis.get(cacheKey);
    if (cached) {
      return JSON.parse(cached);  // Cache hit — return immediately
    }

    // 2. Cache miss — query database
    const user = await this.db.user.findUnique({ where: { id } });
    if (!user) return null;

    // 3. Populate cache for future reads
    await this.redis.setex(cacheKey, TTL, JSON.stringify(user));
    return user;
  }

  async update(id: string, data: Partial<User>): Promise<User> {
    const updated = await this.db.user.update({ where: { id }, data });
    // Invalidate stale cache entry
    await this.redis.del(`user:${id}`);
    return updated;
  }
}

Trade-offs:

Pros: Only caches data that's actually requested. Cache failures are non-fatal (falls through to DB). Fine-grained cache control per query.
Cons: Cache miss on cold start causes a "thundering herd" (addressed in Section 5). Data can be stale until TTL expires if using TTL-based invalidation.

Section 3: Write-Through vs. Write-Behind

A. Write-Through

The application writes to the cache first. The cache synchronously writes to the database before confirming success.

Application → WRITE → [Cache] → WRITE → [Database]
                                         ↓
                              Success confirmed to Application

async createOrder(order: Order): Promise<void> {
  const tx = await this.db.$transaction(async (prisma) => {
    const saved = await prisma.order.create({ data: order });
    // Synchronously cache the new order
    await this.redis.setex(`order:${saved.id}`, 3600, JSON.stringify(saved));
    return saved;
  });
}

Trade-offs:

Pros: Cache is always consistent with the database. Read-after-write always returns fresh data.
Cons: Writes are slower (two writes: cache + DB). Cache stores data that may never be read again (write-all, read-few workloads waste cache memory).

B. Write-Behind (Write-Back)

The application writes to the cache only. The cache asynchronously flushes to the database in batches.

Application → WRITE → [Cache] → returns success immediately
                         ↓
             (async, batch flush every 500ms)
                         ↓
                    [Database]

Trade-offs:

Pros: Lowest write latency. Database receives batched writes — dramatically reduces write IOPS.
Cons: Data loss risk — if the cache node crashes before flushing, writes are lost. Complex to implement correctly (requires write durability guarantees in the cache layer, e.g., Redis AOF persistence).

C. Choosing the Right Write Strategy

Use Case	Recommended Strategy
User session data (can be regenerated)	Write-Through (safety over speed)
Product catalog (read-heavy, rare writes)	Cache-Aside with TTL
IoT sensor metrics (high write volume)	Write-Behind with AOF persistence
Financial account balances	Write-Through or no cache (strong consistency)
Search autocomplete suggestions	Write-Behind (eventual consistency acceptable)

Section 4: Cache Eviction Policies

When a cache is full and a new item must be inserted, an existing item must be evicted. The eviction policy determines which item is removed.

A. LRU (Least Recently Used)

Evicts the item that has not been accessed for the longest time. Implemented using a doubly-linked list + hash map combination ($O(1)$ get and put).

Cache State (capacity=3):
[A] → [B] → [C]  (A is least recently used)

Access D → Cache full → Evict A:
[B] → [C] → [D]

Best for: General-purpose caching where recently accessed items are likely to be accessed again.

B. LFU (Least Frequently Used)

Evicts the item with the lowest access count.

Item A: accessed 50 times
Item B: accessed 2 times  ← Evicted
Item C: accessed 30 times

Best for: Workloads with long-running hot items (popular product pages, trending news). LRU would eventually evict popular items if they haven't been accessed recently.

C. ARC (Adaptive Replacement Cache)

ARC adaptively balances between LRU and LFU based on recent workload patterns. Used by ZFS and some high-performance databases.

Redis Eviction Policy Configuration

# Set max memory and eviction policy in redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lru

# Available policies:
# noeviction    — Return errors when memory limit is reached (fail-safe)
# allkeys-lru   — Evict any key using LRU
# volatile-lru  — Evict only keys with TTL set, using LRU
# allkeys-random — Evict random keys
# volatile-ttl  — Evict the key with the shortest TTL first

Section 5: Cache Failure Modes and Mitigations

A. Cache Stampede (Thundering Herd)

10,000 concurrent users
        |
  Key expires at T=0
        |
10,000 cache misses → 10,000 DB queries → DB CPU spikes to 100% → CRASH

Mitigation 1: Mutex Lock (Singleflight Pattern)

Only one goroutine/thread queries the database for an expired key. All other concurrent requests wait for the first result:

class SingleFlight {
  private in_flight = new Map<string, Promise<any>>();

  async do<T>(key: string, fn: () => Promise<T>): Promise<T> {
    if (this.in_flight.has(key)) {
      return this.in_flight.get(key)!;  // Wait for in-flight request
    }

    const promise = fn().finally(() => this.in_flight.delete(key));
    this.in_flight.set(key, promise);
    return promise;
  }
}

const singleFlight = new SingleFlight();

async function getArticle(id: string): Promise<Article> {
  const cached = await redis.get(`article:${id}`);
  if (cached) return JSON.parse(cached);

  // Only ONE database query fires even with 10,000 concurrent misses
  return singleFlight.do(`article:${id}`, async () => {
    const article = await db.article.findUnique({ where: { id } });
    await redis.setex(`article:${id}`, 60, JSON.stringify(article));
    return article;
  });
}

Mitigation 2: Probabilistic Early Expiration (XFetch Algorithm)

Before the key's TTL expires, a background thread proactively recomputes the value. The probability of triggering early recomputation increases as the TTL approaches:

$$\text{Refresh early if:} \quad -\beta \times \delta \times \ln(\text{rand}()) > (\text{TTL}_\text{remaining})$$

Where $\beta$ is a sensitivity parameter (higher = more aggressive early refresh) and $\delta$ is the time taken to recompute the value.

B. Cache Penetration

Attacker → GET user:999999999 → Redis (miss) → PostgreSQL (null) → repeat ×1M
                                                       ↑
                                               DB overwhelmed by null queries

Mitigation 1: Cache Null Results

Cache the null result with a short TTL:

const user = await db.user.findUnique({ where: { id } });
// Cache both found users AND null results
await redis.setex(`user:${id}`, user ? 300 : 30, JSON.stringify(user ?? null));

Mitigation 2: Bloom Filter

A Bloom Filter is a probabilistic data structure that can definitively answer "Does key X definitely NOT exist?" with zero false negatives (though it can have false positives):

Bloom Filter for valid user IDs:
- On user creation: bloomFilter.add(userId)
- On request: if !bloomFilter.has(userId) → return 404 immediately (skip cache + DB)
- Memory: 10M users stored in ~12MB (vs. 400MB for a hash set)

class UserService {
  private bloomFilter: BloomFilter;

  async findById(id: string): Promise<User | null> {
    // Fast rejection: definitively not in database
    if (!this.bloomFilter.test(id)) {
      return null;  // Return immediately — no cache or DB lookup
    }

    const cached = await this.redis.get(`user:${id}`);
    if (cached) return JSON.parse(cached);

    const user = await this.db.user.findUnique({ where: { id } });
    if (user) await this.redis.setex(`user:${id}`, 300, JSON.stringify(user));
    return user;
  }
}

C. Cache Avalanche

Mitigation: TTL Jitter

Add random variance to all TTL values to distribute expiry across time:

function jitteredTTL(baseTTL: number, jitterRatio = 0.1): number {
  const jitter = baseTTL * jitterRatio * Math.random();
  return Math.floor(baseTTL + jitter);  // e.g., 3600 ± 10% → 3600–3960 seconds
}

await redis.setex(key, jitteredTTL(3600), value);
// Different keys expire at different times — avalanche prevented

Section 6: CDN Edge Caching

A Content Delivery Network (CDN) is a globally distributed network of edge servers that cache and serve content from locations geographically close to users.

CDN Cache Mechanics

First request from user in Tokyo for /static/app.js:
  Tokyo PoP → MISS → Origin server (Virginia) → 100ms RTT
  CDN stores app.js at Tokyo PoP

All subsequent requests from Tokyo users for /static/app.js:
  Tokyo PoP → HIT → serves from local SSD → 2ms

HTTP Cache-Control Headers

The origin server controls CDN and browser caching behavior via HTTP response headers:

// Next.js API route: control caching at the HTTP layer
export async function GET(request: Request) {
  const data = await fetchProductCatalog();

  return new Response(JSON.stringify(data), {
    headers: {
      'Content-Type': 'application/json',
      // Cache for 1 hour at CDN edge, 5 min in browser, never private
      'Cache-Control': 'public, s-maxage=3600, max-age=300',
      // CDN can serve stale for 1 hour while revalidating in background
      'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=3600',
      // Unique version-based cache key (immutable assets)
      'Cache-Control': 'public, max-age=31536000, immutable',
    }
  });
}

Cache-Control Directives:

Directive	Meaning
`public`	Can be cached by CDNs and proxies
`private`	Only cacheable in the browser (not CDN)
`no-store`	Never cache anywhere
`no-cache`	Must revalidate with origin before serving cached copy
`max-age=N`	Browser caches for N seconds
`s-maxage=N`	CDN caches for N seconds (overrides `max-age` for shared caches)
`stale-while-revalidate=N`	Serve stale while fetching fresh in background
`immutable`	Browser never revalidates (use only with versioned URLs)

Section 7: Redis CLI Command Reference

# Connect to Redis
redis-cli -h redis.internal -p 6379

# String operations (most common)
SET user:42 '{"name":"Alice"}' EX 300    # Set with 300s TTL
GET user:42                               # Get value
DEL user:42                               # Delete key
TTL user:42                               # Remaining TTL in seconds
PERSIST user:42                           # Remove TTL (make permanent)

# Atomic counter (rate limiting)
INCR api:requests:user:42                 # Increment counter
EXPIRE api:requests:user:42 60            # Set TTL atomically

# Hash (partial field updates without full re-serialization)
HSET user:42 name "Alice" role "architect"
HGET user:42 name
HMGET user:42 name role

# Sorted Set (leaderboard)
ZADD leaderboard 9450 "alice"
ZADD leaderboard 8230 "bob"
ZREVRANGE leaderboard 0 9 WITHSCORES     # Top 10 with scores
ZRANK leaderboard "alice"                # Rank of a specific member

# Pipeline (batch commands to reduce RTT overhead)
redis-cli --pipe << EOF
SET key1 value1 EX 300
SET key2 value2 EX 300
SET key3 value3 EX 300
EOF

# Lua script (atomic multi-command operations — rate limiting example)
EVAL "
  local count = redis.call('INCR', KEYS[1])
  if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
  end
  return count
" 1 api:rate:user:42 60

Section 8: Hands-On Practice Challenge

The Challenge

Your Goal

Construct a cache hierarchy diagram showing:

Client Browser with local cache (HTTP Cache-Control).
CDN caching static assets at the edge.
App Server checking Redis before querying the database.
PostgreSQL as the system of record.

Solution Model

graph TD
    Client[Client Browser\nBrowser Cache: 5min]
    CDN[CDN Edge Node\nStatic Assets: 1yr\nAPI Responses: 1hr]
    App[Application Server\nIn-Process LRU: 30s]
    Redis[Redis Cluster\nDistributed Cache: 5min]
    DB[(PostgreSQL\nSystem of Record)]

    Client -->|HTTPS| CDN
    CDN -->|Cache Miss: Static| App
    CDN -->|Cache Miss: API| App
    App -->|Cache Miss| Redis
    Redis -->|Cache Miss| DB
    DB -->|Result| Redis
    Redis -->|Cached Result| App
    App -->|Response + Cache-Control headers| CDN
    CDN -->|Cached Response| Client

    style CDN fill:#1e40af,color:#fff
    style Redis fill:#dc2626,color:#fff
    style DB fill:#065f46,color:#fff

Cache miss chain (worst case):

Browser (miss) → CDN (miss) → App (miss) → Redis (miss) → PostgreSQL
Total latency: ~25ms (CDN: 5ms + Redis: 0.5ms + DB: 20ms)

Cache hit at CDN (best case):

Browser (miss) → CDN (HIT)
Total latency: ~2ms

Module 6: Caching Topologies & Content Delivery

Theoretical Foundations

Module 6: Caching Topologies & Content Delivery

Introduction: Why Databases Can't Scale With Raw Read Traffic

The Cache Hit Ratio Formula

Section 1: The Caching Layer Hierarchy

Section 2: Cache-Aside (Lazy Loading)

Section 3: Write-Through vs. Write-Behind

A. Write-Through

B. Write-Behind (Write-Back)

C. Choosing the Right Write Strategy

Section 4: Cache Eviction Policies

A. LRU (Least Recently Used)

B. LFU (Least Frequently Used)

C. ARC (Adaptive Replacement Cache)

Redis Eviction Policy Configuration

Section 5: Cache Failure Modes and Mitigations

A. Cache Stampede (Thundering Herd)

B. Cache Penetration

C. Cache Avalanche

Section 6: CDN Edge Caching

CDN Cache Mechanics

HTTP Cache-Control Headers

Section 7: Redis CLI Command Reference

Section 8: Hands-On Practice Challenge

The Challenge

Your Goal

Solution Model

Module Deliverables

Module 6: Caching Topologies & Content Delivery

Theoretical Foundations

Module 6: Caching Topologies & Content Delivery

Introduction: Why Databases Can't Scale With Raw Read Traffic

The Cache Hit Ratio Formula

Section 1: The Caching Layer Hierarchy

Section 2: Cache-Aside (Lazy Loading)

Section 3: Write-Through vs. Write-Behind

A. Write-Through

B. Write-Behind (Write-Back)

C. Choosing the Right Write Strategy

Section 4: Cache Eviction Policies

A. LRU (Least Recently Used)

B. LFU (Least Frequently Used)

C. ARC (Adaptive Replacement Cache)

Redis Eviction Policy Configuration

Section 5: Cache Failure Modes and Mitigations

A. Cache Stampede (Thundering Herd)

B. Cache Penetration

C. Cache Avalanche

Section 6: CDN Edge Caching

CDN Cache Mechanics

HTTP Cache-Control Headers

Section 7: Redis CLI Command Reference

Section 8: Hands-On Practice Challenge

The Challenge

Your Goal

Solution Model

Module Deliverables