
The System Design Interview Framework That Senior Engineers Actually Use

March 28, 2024·14 min read·by Bishwambhar Sen
[Figure: Whiteboard session with a distributed system diagram, capacity math, and constraint annotations]

Concept

The system design interview has a well-known failure mode that affects even very strong engineers: they start drawing boxes immediately. Service here, database there, load balancer up front. The result is a reasonable-looking diagram assembled from muscle memory, without any systematic analysis of the problem. The interview runs 45 minutes, produces a plausible-looking architecture, and lands a "weak hire" because the candidate demonstrated pattern-matching, not reasoning.

What senior engineers actually do when designing systems — in interviews and in production — is a structured process of constraint elicitation, envelope calculation, bottleneck identification, and explicit trade-off articulation. The diagram is the output of that process, not the process itself.

This post gives you the full framework as a repeatable methodology. It is not a set of templates to memorize. It is a reasoning process that, once internalized, generalizes to any problem from "design Twitter" to "design a payment processor" to "redesign our order management system."

The Five-Phase Framework

  1. Requirements Scoping — eliminate ambiguity before drawing anything
  2. Capacity Estimation — derive the scale parameters that constrain technology choices
  3. System Decomposition — identify the primary components and their relationships
  4. Bottleneck Identification — reason explicitly about where the system will fail under load
  5. Trade-off Articulation — make the decision rationale explicit and defensible

Phase 1: Requirements Scoping

The first five minutes of a system design interview should produce zero architecture. They should produce a requirements document.

The questions to ask are not politeness rituals — they directly determine the architecture:

Scale questions: How many users? Daily active users vs. registered users? Peak concurrent sessions? Read-to-write ratio? These determine whether you need sharding, caching layers, read replicas, or whether a single well-configured database is fine.

Consistency questions: What happens if a user sees slightly stale data? Is strong consistency required (financial transactions, inventory reservations) or acceptable to sacrifice (social media timelines, recommendation feeds)? This determines whether you can use eventual consistency, which dramatically expands your architectural options.

Latency questions: What is the user-facing SLA? What operations must be synchronous? What can be deferred? A 200ms SLA on a user-facing API with three downstream service calls has a fundamentally different architecture than a 2-second SLA on a background processing pipeline.

Availability questions: What is the acceptable downtime? 99.9% (about 8.8 hours/year)? 99.99% (about 53 minutes/year)? Higher availability multiplies operational complexity and infrastructure cost; it is not free.

Scope questions (often the most important): "Design a URL shortener" — does that include analytics? Custom domains? Expiry? Rate limiting? Abuse prevention? Each of these is a separate subsystem. Clarifying scope tells you what to design and demonstrates that you understand what you're building.

The scoping output is a written list (on the whiteboard) of:

  • Functional requirements (what the system does)
  • Non-functional requirements (scale, latency, availability, consistency)
  • Explicitly out-of-scope items
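
The downtime figures attached to the availability question follow directly from the percentage. A minimal sketch of that conversion, assuming a 365-day year; the class and method names are illustrative and not part of any library:

```csharp
using System;

// Hypothetical helper: converts an availability target into a yearly downtime budget.
public static class DowntimeBudget
{
    // Assumption: a 365-day year (leap years ignored).
    private static readonly TimeSpan Year = TimeSpan.FromDays(365);

    public static TimeSpan PerYear(double availabilityPercent)
    {
        if (availabilityPercent < 0 || availabilityPercent > 100)
            throw new ArgumentOutOfRangeException(nameof(availabilityPercent));

        // Fraction of the year the system is allowed to be unavailable.
        double unavailableFraction = 1.0 - availabilityPercent / 100.0;
        return TimeSpan.FromSeconds(Year.TotalSeconds * unavailableFraction);
    }
}

// Usage:
// DowntimeBudget.PerYear(99.9).TotalHours      // about 8.76 hours
// DowntimeBudget.PerYear(99.99).TotalMinutes   // about 52.6 minutes
```

Writing the budget down in hours or minutes tends to make the cost conversation concrete: each additional nine cuts the allowance by a factor of ten.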

Phase 2: Capacity Estimation

Capacity estimation is arithmetic, not guesswork. Senior engineers do this with pen and paper (or whiteboard) before any architectural decision because the numbers determine the technology choices.

The Estimation Toolkit

Memorize these constants:

  • 1 million seconds ≈ 11.6 days → 1 billion seconds ≈ 32 years
  • Character in UTF-8: 1–4 bytes. ASCII: 1 byte.
  • Average tweet / short message: 200 bytes
  • Average photo: 300KB–2MB
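  • SSD sequential read: ~500MB/s (SATA), several GB/s (NVMe). HDD sequential read: 50–150MB/s; HDD random reads are far slower because each one pays a seek of several milliseconds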
  • Network bandwidth (cloud datacenter): 1–10Gbps per instance
  • PostgreSQL single-instance write throughput: ~10,000–50,000 inserts/second (index-dependent)
  • Redis SET/GET throughput: ~100,000–500,000 ops/second single node

The Estimation Process

For a URL shortener handling 100 million daily URL redirects:

Read throughput:

100M redirects / day ÷ 86,400 seconds = ~1,157 reads/second (average)
Peak (10x average) = ~12,000 reads/second

Write throughput (assume new URL creations equal 1% of redirect volume):

1M writes/day ÷ 86,400 = ~12 writes/second

Storage (5-year horizon, 100 bytes per URL record):

1M writes/day × 365 × 5 years = 1.825 billion records
1.825B × 100 bytes = ~182 GB

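Cache sizing (80/20 rule: top 20% of URLs serve 80% of traffic):

Using one day of new URLs as a rough proxy for the hot set: 0.2 × 1M = 200K URLs
200K × 100 bytes = 20MB, trivially small for Redis

Even the pessimistic case, caching 20% of all 1.825 billion records, is 365M × 100 bytes ≈ 36.5GB, which still fits in memory on one large node or a small cluster.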

This arithmetic tells you immediately: this is a read-heavy, storage-light, write-trivial system. The architecture should optimize for reads (cache aggressively), the database doesn't need to be large or write-optimized, and writes are so infrequent that any relational database handles them comfortably.
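
The arithmetic above can be checked mechanically. A minimal sketch using the same inputs as the worked example (100M redirects/day, writes at 1% of reads, 10x peak, 100-byte records, 5-year horizon); the class and member names are illustrative:

```csharp
using System;

// Hypothetical back-of-envelope calculator for the URL shortener example.
public static class UrlShortenerEstimate
{
    public const double SecondsPerDay = 86_400;
    public const double RedirectsPerDay = 100_000_000; // given in the example
    public const double WriteFraction = 0.01;          // assumption: writes = 1% of reads
    public const double PeakMultiplier = 10;           // assumption: peak = 10x average
    public const double BytesPerRecord = 100;
    public const int HorizonYears = 5;

    public static double AverageReadRps => RedirectsPerDay / SecondsPerDay;
    public static double PeakReadRps => AverageReadRps * PeakMultiplier;

    public static double WritesPerDay => RedirectsPerDay * WriteFraction;
    public static double AverageWriteRps => WritesPerDay / SecondsPerDay;

    // Records accumulated over the horizon, assuming nothing is ever deleted.
    public static double TotalRecords => WritesPerDay * 365 * HorizonYears;

    // Decimal gigabytes (1 GB = 10^9 bytes), matching the prose figures.
    public static double StorageGb => TotalRecords * BytesPerRecord / 1e9;
}

// Usage:
// UrlShortenerEstimate.AverageReadRps   // about 1,157
// UrlShortenerEstimate.PeakReadRps      // about 11,574 (rounded up to ~12,000 in the text)
// UrlShortenerEstimate.AverageWriteRps  // about 11.6
// UrlShortenerEstimate.StorageGb        // about 182.5
```

Keeping every assumption as a named constant makes it easy to revisit the estimate when the interviewer changes a number mid-conversation.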

Phase 3: System Decomposition

Only after requirements and capacity estimation do you draw components. The decomposition should be driven by the data, not by instinct.

For the URL shortener:

  • Write path: API → URL shortener service → PostgreSQL (12 writes/second — a single primary handles this)
  • Read path: Client → CDN (cache hot URLs at edge) → Load balancer → URL shortener service → Redis (cache warm URLs in-memory) → PostgreSQL (cache miss only)

The read path has two layers of caching (the CDN and Redis) before hitting the database. The architecture follows from the estimation: of 12,000 reads/second at peak, an assumed 80% CDN hit rate leaves ~2,400 requests/second reaching the service, and an assumed ~90% Redis hit rate on those leaves ~240 requests/second reaching PostgreSQL. A single read replica handles this trivially.
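
The load that reaches each tier is just the incoming rate multiplied by the miss rate of every cache in front of it. A minimal sketch of that funnel, with the hit rates treated as assumptions to be validated by measurement; the class and method names are illustrative:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: propagates request load through successive cache layers.
public static class CacheFunnel
{
    // Returns the request rate that passes each layer, in order,
    // given the hit rate (0..1) of every cache layer in the path.
    public static IReadOnlyList<double> LoadAfterEachLayer(
        double incomingRps, params double[] hitRates)
    {
        var loads = new List<double>();
        double remaining = incomingRps;

        foreach (double hitRate in hitRates)
        {
            if (hitRate < 0 || hitRate > 1)
                throw new ArgumentOutOfRangeException(nameof(hitRates));

            remaining *= 1.0 - hitRate; // only cache misses continue downstream
            loads.Add(remaining);
        }

        return loads;
    }
}

// Usage:
// CacheFunnel.LoadAfterEachLayer(12_000, 0.80, 0.90)
// yields approximately 2,400 RPS past the CDN and 240 RPS past Redis.
```

The same function shows how sensitive the database is to cache effectiveness: if the Redis hit rate drops from 90% to 50%, the load on PostgreSQL grows fivefold.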

Phase 4: Bottleneck Identification

Every system has a limiting resource. Identifying it explicitly is the mark of architectural maturity.

Common bottlenecks by resource type:

Network: Inter-service call frequency exceeds the connection pool capacity. Fix: batch calls, use HTTP/2 multiplexing, reduce call count.

CPU: Business logic is computationally intensive (image processing, ML inference, cryptography). Fix: horizontal scaling, offload to dedicated compute tier.

Memory: Large working sets (in-memory caches, session state, graph traversals). Fix: partitioned cache, off-process caching (Redis), streaming rather than buffering.

Disk I/O: Write throughput exceeds single-disk capacity. Fix: write batching, LSM-tree storage engines, sharding.

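Database connections: Connection pool exhaustion under concurrent load. Little's Law gives the required concurrency: average connections in use = query arrival rate × average time each query holds a connection. Fix: size the pool to that figure plus headroom, add PgBouncer for PostgreSQL, and use async I/O end to end so threads are not blocked while holding a connection.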

External rate limits: Third-party API call quota exceeded under load. Fix: queue-based rate limiting, exponential backoff with jitter, client-side throttle.
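
For the database-connection bottleneck, Little's Law (L = λ × W) turns a throughput figure into a pool size. A minimal sketch, where the headroom factor is an illustrative assumption rather than a recommended value, and the class and method names are hypothetical:

```csharp
using System;

// Hypothetical helper: sizes a connection pool from Little's Law.
public static class ConnectionPoolSizing
{
    // Little's Law: L = lambda * W
    //   lambda = arrival rate (queries per second)
    //   W      = average time each query holds a connection
    //   L      = average number of connections in use
    public static double AverageConnectionsInUse(
        double queriesPerSecond, TimeSpan averageHoldTime)
        => queriesPerSecond * averageHoldTime.TotalSeconds;

    // headroomFactor absorbs bursts above the average; 1.5 is an
    // assumption for illustration and should be tuned from measurements.
    public static int SuggestedPoolSize(
        double queriesPerSecond, TimeSpan averageHoldTime, double headroomFactor = 1.5)
    {
        if (queriesPerSecond < 0)
            throw new ArgumentOutOfRangeException(nameof(queriesPerSecond));
        if (headroomFactor < 1)
            throw new ArgumentOutOfRangeException(nameof(headroomFactor));

        double inUse = AverageConnectionsInUse(queriesPerSecond, averageHoldTime);
        return Math.Max(1, (int)Math.Ceiling(inUse * headroomFactor));
    }
}

// Usage:
// 240 queries/second holding a connection for 5 ms each keeps about
// 1.2 connections busy on average, so
// ConnectionPoolSizing.SuggestedPoolSize(240, TimeSpan.FromMilliseconds(5))
// suggests a pool of 2.
```

The useful consequence is that pool demand scales with latency as much as with traffic: if a slow query pushes the hold time from 5 ms to 500 ms, the same 240 queries/second need roughly 120 connections instead of 2.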

Constraints

The Trade-off Triangle: CAP, Cost, and Team

Every architectural choice optimizes across three dimensions:

  1. Technical trade-offs (consistency vs. availability, latency vs. throughput)
  2. Cost trade-offs (infrastructure cost, operational complexity, build cost)
  3. Team trade-offs (team skill ceiling, maintenance burden, onboarding complexity)

Senior engineers make all three explicit. "We'll use Cassandra" is incomplete. "We'll use Cassandra because we need write throughput > 100K/second that PostgreSQL cannot provide at this replication factor, our team already operates a Cassandra cluster for another service, and the eventual consistency model is acceptable for this use case" is an architectural rationale.

Trade-offs

The Trade-off Articulation Structure

The single most differentiating skill in a system design interview is articulating trade-offs with specificity. The structure is:

Option A vs. Option B, given constraint C:

  • Option A is better when [specific condition] because [mechanism]
  • Option B is better when [specific condition] because [mechanism]
  • I recommend Option [X] because [our specific context matches the conditions for X]

Example: "SQL vs. NoSQL for the user profile store:"

  • SQL (PostgreSQL) is better when profiles have complex relationships to other entities, need ACID transactions, and write volume is below ~50K/second — because the relational model, MVCC transactions, and mature query planner handle these requirements with low operational overhead.
  • NoSQL (DynamoDB) is better when profiles are large, access patterns are single-key lookups, and write volume exceeds what a single relational primary can sustain, because the key-value/document model avoids join overhead and DynamoDB's horizontal partitioning spreads writes across partitions as long as keys are evenly distributed.
  • For this use case (100M users, profile reads on every login, write on profile update only): PostgreSQL with a Redis cache, because our read volume is the constraint, not writes, and PostgreSQL handles 10K writes/second with ease. NoSQL would add complexity without addressing our actual bottleneck.

Code

The following models the capacity estimation process as a structured calculation object — the kind of reasoning that should precede any architectural decision, but is rarely formalized:

// CapacityEstimationModel.cs
// First-principles capacity analysis for a system design
using System;

public record TrafficProfile
{
    public required long DailyActiveUsers { get; init; }
    public required double ReadsPerUserPerDay { get; init; }
    public required double WritesPerUserPerDay { get; init; }
    // Optional: defaults to 5x average when the caller does not set it.
    // (Marking it `required` would make the default unreachable.)
    public double PeakMultiplier { get; init; } = 5.0;
    public required int AveragePayloadBytes { get; init; }
}

public class CapacityEstimationModel
{
    private readonly TrafficProfile _profile;

    public CapacityEstimationModel(TrafficProfile profile)
    {
        _profile = profile;
    }

    public double AverageReadRps =>
        (_profile.DailyActiveUsers * _profile.ReadsPerUserPerDay) / 86_400.0;

    public double PeakReadRps => AverageReadRps * _profile.PeakMultiplier;

    public double AverageWriteRps =>
        (_profile.DailyActiveUsers * _profile.WritesPerUserPerDay) / 86_400.0;

    public double PeakWriteRps => AverageWriteRps * _profile.PeakMultiplier;

    public long FiveYearStorageBytes =>
        (long)(_profile.DailyActiveUsers
               * _profile.WritesPerUserPerDay
               * 365 * 5
               * _profile.AveragePayloadBytes);

    // Binary gigabytes (GiB, 2^30 bytes); divide by 1e9 instead for decimal GB.
    public double FiveYearStorageGb => FiveYearStorageBytes / 1_073_741_824.0;

    public double NetworkBandwidthMbps =>
        (PeakReadRps * _profile.AveragePayloadBytes * 8) / 1_000_000.0;

    public StorageRecommendation RecommendStorage()
    {
        if (PeakWriteRps < 5_000 && FiveYearStorageGb < 500)
            return StorageRecommendation.SingleRelationalPrimary;

        if (PeakWriteRps < 50_000 && FiveYearStorageGb < 5_000)
            return StorageRecommendation.RelationalWithReadReplicas;

        if (PeakWriteRps < 200_000)
            return StorageRecommendation.ShardedRelational;

        return StorageRecommendation.NoSqlHorizontalScale;
    }

    public void PrintSummary()
    {
        Console.WriteLine("=== Capacity Estimation ===");
        Console.WriteLine($"Avg Read RPS:    {AverageReadRps:F0}  | Peak: {PeakReadRps:F0}");
        Console.WriteLine($"Avg Write RPS:   {AverageWriteRps:F0} | Peak: {PeakWriteRps:F0}");
        Console.WriteLine($"5-Year Storage:  {FiveYearStorageGb:F1} GB");
        Console.WriteLine($"Network (peak):  {NetworkBandwidthMbps:F0} Mbps");
        Console.WriteLine($"Storage Rec:     {RecommendStorage()}");
    }
}

public enum StorageRecommendation
{
    SingleRelationalPrimary,
    RelationalWithReadReplicas,
    ShardedRelational,
    NoSqlHorizontalScale
}

// Example: Twitter-scale timeline service
// var model = new CapacityEstimationModel(new TrafficProfile
// {
//     DailyActiveUsers = 100_000_000,
//     ReadsPerUserPerDay = 50,   // 50 timeline fetches/day
//     WritesPerUserPerDay = 0.1, // 1 tweet per 10 days on average
//     PeakMultiplier = 10,
//     AveragePayloadBytes = 500
// });
// model.PrintSummary();
// Approximate values (the actual output has no thousands separators):
// → Avg Read RPS: ~57,870 | Peak: ~578,700
// → Avg Write RPS: ~116 | Peak: ~1,157
// → 5-Year Storage: ~8,498 GB (100M × 0.1 × 365 × 5 × 500 bytes ≈ 9.1 TB)
// → Network (peak): ~2,315 Mbps
// → Storage Rec: ShardedRelational (storage exceeds the 5,000 GB replica tier;
//   NoSQL is worth considering for read scale)

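The second code block shows a trade-off decision annotation: a pattern for recording architectural decisions as attribute metadata at the point where the decision is encoded, aligned with the ADR (Architecture Decision Record) practice:

// ArchitecturalDecisionLogger.cs
// Documents trade-off decisions as attribute metadata at the code callsite.
// The attribute itself logs nothing; tooling can read it via reflection
// for searchability and audit.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;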

[AttributeUsage(AttributeTargets.Method | AttributeTargets.Class)]
public sealed class ArchitecturalDecisionAttribute : Attribute
{
    public string DecisionRef { get; }       // ADR number: "ADR-0015"
    public string Rationale { get; }
    public string[] RejectedAlternatives { get; }

    public ArchitecturalDecisionAttribute(
        string decisionRef,
        string rationale,
        params string[] rejectedAlternatives)
    {
        DecisionRef = decisionRef;
        Rationale = rationale;
        RejectedAlternatives = rejectedAlternatives;
    }
}

// Applied at the point where the architectural decision manifests in code:
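public class TimelineReadService
{
    // IRedisCache, ITimelineRepository, and TimelinePost are application-defined
    // types that this listing does not show.
    private readonly IRedisCache _cache;
    private readonly ITimelineRepository _repository;
    private readonly ILogger<TimelineReadService> _logger;

    // Dependencies are injected; without this constructor the fields stay null.
    public TimelineReadService(
        IRedisCache cache,
        ITimelineRepository repository,
        ILogger<TimelineReadService> logger)
    {
        _cache = cache;
        _repository = repository;
        _logger = logger;
    }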

    [ArchitecturalDecision(
        "ADR-0022",
        "Using a Redis cache-aside pattern with 60s TTL because timeline reads " +
        "are ~500x more frequent than writes. Eventual consistency is acceptable " +
        "for social timelines (users tolerate 60s staleness per product decision).",
        "No cache (rejected: DB cannot sustain 580K RPS at peak)",
        "CDN cache (rejected: personalized content cannot be cached at edge)")]
    public async Task<IReadOnlyList<TimelinePost>> GetTimelineAsync(
        Guid userId,
        int pageSize = 20,
        CancellationToken cancellationToken = default)
    {
        var cacheKey = $"timeline:{userId}:page1:{pageSize}";

        var cached = await _cache.GetAsync<IReadOnlyList<TimelinePost>>(
            cacheKey, cancellationToken);

        if (cached is not null)
        {
            _logger.LogDebug(
                "Timeline cache HIT for user {UserId} — cache key {CacheKey}",
                userId, cacheKey);
            return cached;
        }

        _logger.LogDebug(
            "Timeline cache MISS for user {UserId} — reading from database",
            userId);

        var timeline = await _repository.GetLatestPostsAsync(
            userId, pageSize, cancellationToken);

        // ADR-0022: 60-second TTL — balances freshness against DB load
        await _cache.SetAsync(
            cacheKey,
            timeline,
            expiry: TimeSpan.FromSeconds(60),
            cancellationToken);

        return timeline;
    }
}
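
An attribute only pays off if something reads it. A minimal sketch of a reflection-based report that lists the decisions recorded on a type and its methods; `ArchitecturalDecisionReport` and `SampleService` are hypothetical names introduced for illustration, and the attribute is repeated in compact form only so the snippet compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Repeated from the listing above so this sketch is self-contained.
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Class)]
public sealed class ArchitecturalDecisionAttribute : Attribute
{
    public string DecisionRef { get; }
    public string Rationale { get; }
    public string[] RejectedAlternatives { get; }

    public ArchitecturalDecisionAttribute(
        string decisionRef, string rationale, params string[] rejectedAlternatives)
    {
        DecisionRef = decisionRef;
        Rationale = rationale;
        RejectedAlternatives = rejectedAlternatives;
    }
}

// Hypothetical helper: collects every decision declared on a type and its methods.
public static class ArchitecturalDecisionReport
{
    public static IReadOnlyList<string> Describe(Type type)
    {
        var lines = new List<string>();

        var onType = type.GetCustomAttribute<ArchitecturalDecisionAttribute>();
        if (onType is not null)
            lines.Add($"{onType.DecisionRef} on {type.Name}: {onType.Rationale}");

        // Only methods declared on this type; inherited members are skipped.
        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic
            | BindingFlags.Instance | BindingFlags.Static | BindingFlags.DeclaredOnly;

        foreach (MethodInfo method in type.GetMethods(flags))
        {
            var onMethod = method.GetCustomAttribute<ArchitecturalDecisionAttribute>();
            if (onMethod is null) continue;
            lines.Add($"{onMethod.DecisionRef} on {type.Name}.{method.Name}: {onMethod.Rationale}");
        }

        return lines;
    }
}

// Example target used only to exercise the helper.
public class SampleService
{
    [ArchitecturalDecision("ADR-0001", "Illustrative rationale", "Illustrative rejected option")]
    public void DoWork() { }
}

// Usage:
// ArchitecturalDecisionReport.Describe(typeof(SampleService))
// returns one line: "ADR-0001 on SampleService.DoWork: Illustrative rationale"
```

A report like this could run in a build step or at startup to check that every referenced ADR number has a matching document, though that wiring is left out here.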

Further Reading

External references:

  • Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly.
  • Freeman, S. & Pryce, N. (2009). Growing Object-Oriented Software, Guided by Tests. Addison-Wesley.
  • "Numbers Every Programmer Should Know" — Jeff Dean, Google (2009 Stanford lecture).