
Tags: distributed-systems, cloud, resilience, dotnet

The 8 Fallacies of Distributed Computing — Still Relevant in 2024

January 15, 2024 · 14 min read · by Bishwambhar Sen

A distributed cloud architecture diagram showing inter-service calls with annotated failure points corresponding to each of the eight fallacies

Concept

The fallacies of distributed computing are a set of false assumptions that engineers make when designing distributed systems. The list originated at Sun Microsystems: L. Peter Deutsch is credited with compiling the first seven in 1994, building on earlier work by colleagues, and James Gosling later added the eighth. They represent recurring design mistakes that surface as production failures, not unit test failures. Each fallacy feels locally reasonable. None survives contact with production at scale.

What makes this list endure is that the fallacies do not disappear as infrastructure improves — they shift their manifestation. In 1994, "the network is reliable" failed when a physical switch failed. In 2024, it fails when an AWS availability zone's networking fabric loses connectivity for 90 seconds, or when a misconfigured security group silently drops packets between two ECS tasks. The failure mode is different; the false assumption is identical.

This post addresses each fallacy, its modern cloud manifestation, and the architectural mitigation pattern in .NET.

Constraints

Fallacy 1: The Network Is Reliable

The assumption: If I make an HTTP call, it will either succeed or return an error. It will not disappear.

Modern manifestation: Kubernetes pod-to-pod communication traverses multiple network interfaces, iptables rules, CNI plugin layers, and load balancer health checks. A request can be sent, received, and silently dropped before processing, or processed but the response dropped before receipt. The caller has no way to distinguish between "the request was not received" and "the request was received but the response was lost."

Mitigation: All inter-service calls must be idempotent and retried with exponential backoff and jitter. The client must not assume that a timeout means the server did not process the request. Use idempotency keys on all mutating operations. Implement at-least-once delivery for event publishing, with deduplication on the consumer side.
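
As a minimal sketch of the idempotency-key idea, the class below runs an operation at most once per key and hands the stored result back to any retry that carries the same key. IdempotentExecutor is a hypothetical name, and the in-memory dictionary is an assumption made to keep the example short; a real service would need a durable store (a database table or Redis) with an expiry on each key.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: remembers the outcome of each operation by idempotency key.
public sealed class IdempotentExecutor<TResult>
{
    // Lazy<T> guarantees the stored operation runs only once, even when two
    // retries with the same key arrive at the same time.
    private readonly ConcurrentDictionary<string, Lazy<TResult>> _results = new();

    // A retried request with a known key gets the stored result back
    // instead of re-running the side effect.
    public TResult Execute(string idempotencyKey, Func<TResult> operation)
    {
        var entry = _results.GetOrAdd(idempotencyKey, _ => new Lazy<TResult>(operation));
        return entry.Value;
    }
}
```

The client generates the key once per logical operation (for example a GUID per checkout attempt) and sends the same key on every retry, so a timeout followed by a retry cannot charge a customer twice. Note that this sketch also caches a failure: if the first execution throws, later calls with the same key rethrow the same exception, which may or may not be what you want.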

Fallacy 2: Latency Is Zero

The assumption: A call to a remote service is fast enough to treat as synchronous without special handling.

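Modern manifestation: In a microservices architecture with 5 serial inter-service calls on the critical path, each with a P99 latency of 50ms, a request in which every call hits its P99 takes 250ms end to end, before accounting for serialisation overhead, service mesh sidecar latency, or network jitter. Percentiles do not add exactly, but tails do compound: assuming independent calls, the chance that at least one of the five exceeds its own P99 is 1 - 0.99^5 ≈ 4.9%, so roughly 1 request in 20 sees tail latency from some dependency. If each call degrades to 150ms at P99.9, the same path can take 750ms. This compounds silently until a latency spike causes a user-visible timeout.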

Mitigation: Measure P95 and P99 latency for every dependency, not just averages. Parallelize independent downstream calls using Task.WhenAll. Use read-side caching with explicit TTL to eliminate redundant cross-service reads. Set explicit timeouts on every outbound call and never rely on defaults: HttpClient.Timeout defaults to 100 seconds, and OS-level TCP timeouts can run to 2 minutes or more.
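
To illustrate the Task.WhenAll advice, here is a small sketch that starts three independent downstream calls together under one shared time budget, so the total wait is roughly the slowest call instead of the sum of all three. ParallelFanOut and its delegate parameters are hypothetical stand-ins for real service clients.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelFanOut
{
    // The three delegates stand in for hypothetical, mutually independent downstream calls.
    public static async Task<(TA, TB, TC)> LoadAsync<TA, TB, TC>(
        Func<CancellationToken, Task<TA>> getA,
        Func<CancellationToken, Task<TB>> getB,
        Func<CancellationToken, Task<TC>> getC,
        TimeSpan budget,
        CancellationToken ct = default)
    {
        // One explicit deadline for the whole fan-out, linked to the caller's token.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(budget);

        // Start all three calls before awaiting any of them.
        var a = getA(cts.Token);
        var b = getB(cts.Token);
        var c = getC(cts.Token);

        await Task.WhenAll(a, b, c);

        // All tasks have completed, so these awaits return immediately.
        return (await a, await b, await c);
    }
}
```

This only helps when the calls are genuinely independent. If the second call needs the result of the first, the path stays serial and the fix is to reshape the API or cache the first result.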

Fallacy 3: Bandwidth Is Infinite

The assumption: I can send as much data as I want between services without concern for bandwidth cost or throughput.

Modern manifestation: A GraphQL API that returns full entity graphs where the client needs only 3 fields. An event stream that serialises large domain objects as JSON when only an ID and status are needed. Inter-AZ data transfer in AWS is billed at roughly $0.01 per GB in each direction: negligible until your service transfers 50TB per month of unnecessarily large payloads, producing a line item of $500/month or more, plus measurable latency from serialisation overhead.

Mitigation: Shape API responses to client needs (BFF pattern, sparse fieldsets, pagination). Use binary serialisation formats (Protobuf, MessagePack) for high-throughput internal services. Implement compression for large response bodies. Audit event payload sizes and extract references over full object copies.
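
The "references over full object copies" point can be sketched as a projection from a domain object to a slim event. The Order and OrderStatusChanged shapes below are hypothetical, chosen only to make the size difference visible; the size helper uses System.Text.Json to measure what would actually go over the wire.

```csharp
using System;
using System.Text.Json;

// Hypothetical domain object and event shapes, for illustration only.
public sealed record Order(
    Guid Id, string Status, string CustomerName, string ShippingAddress, string[] LineItems);

public sealed record OrderStatusChanged(Guid OrderId, string Status);

public static class EventPayloads
{
    // Publish a reference (ID plus the changed field), not a copy of the whole object.
    public static OrderStatusChanged ToEvent(Order order) => new(order.Id, order.Status);

    // Size of the value once serialised as UTF-8 JSON.
    public static int JsonSizeInBytes<T>(T value) =>
        JsonSerializer.SerializeToUtf8Bytes(value).Length;
}
```

Consumers that need more than the ID and status fetch the order by ID. That trades a smaller event for an occasional extra read, which is usually the right trade when most consumers only care about the status change.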

Fallacy 4: The Network Is Secure

The assumption: Communication between internal services does not need to be secured — the network perimeter protects it.

Modern manifestation: Kubernetes clusters with permissive network policies where any pod can call any other pod. A misconfigured Service Account with excessive RBAC permissions. A shared Redis instance accessible from every microservice without authentication. The 2020 SolarWinds attack propagated through trusted internal network paths that defenders assumed were safe.

Mitigation: Zero-trust networking: mutual TLS (mTLS) between all services (enforced by the service mesh — Istio, Linkerd, or AWS App Mesh). Least-privilege RBAC for all service identities. Network policies at the namespace level restricting inter-service communication to declared dependencies. Secret rotation through a secrets manager (HashiCorp Vault, AWS Secrets Manager) rather than static environment variables.

Fallacy 5: Topology Doesn't Change

The assumption: The set of services, their addresses, and their configurations are stable between deployments.

Modern manifestation: Kubernetes schedules pods to any available node. Pod IP addresses change on every restart. A rolling deployment of Service B changes 5 of its 10 instances simultaneously. A DNS record has a 30-second TTL, but your HttpClient was created once at startup and keeps reusing its pooled connections, so it never re-resolves the name and keeps routing to a decommissioned IP.

Mitigation: Use service discovery (Kubernetes DNS, Consul, or AWS Cloud Map) rather than hardcoded IPs. Configure HttpClient via IHttpClientFactory, or set PooledConnectionLifetime on SocketsHttpHandler, so that connections are recycled periodically and DNS changes are picked up. Implement health check probes so load balancers remove unhealthy instances from rotation before they receive traffic. Use circuit breakers to detect topology changes that manifest as sustained failures.

Fallacy 6: There Is One Administrator

The assumption: A single person or team has full operational control over the system.

Modern manifestation: A multi-cloud architecture spanning AWS and Azure, where the networking team controls the VPN peering, the platform team controls Kubernetes RBAC, the security team controls certificate rotation, and each product team controls their own services. A production incident requires coordination across 4 teams, each with different oncall rotations, ticketing systems, and escalation paths.

Mitigation: Define and document runbooks that cross team boundaries. Invest in platform engineering that provides self-service operational capabilities to product teams (deploy, rollback, scale, rotate secrets) without requiring cross-team escalation for routine operations. Implement chaos engineering to discover multi-team coordination failures before production incidents do.

Fallacy 7: Transport Cost Is Zero

The assumption: The overhead of network serialisation, protocol negotiation, and connection establishment is negligible.

Modern manifestation: A service making 500 outbound HTTP calls per request because each database entity is fetched individually (the N+1 problem applied to inter-service calls). JSON serialisation and deserialisation consuming 15% of CPU in a high-throughput API because every call to the user service returns a full 2KB user object when only the user's timezone is needed. Connection establishment overhead for short-lived connections to a service that doesn't support HTTP Keep-Alive.

Mitigation: Batch remote calls where possible. Use connection pooling for all outbound connections. Profile serialisation overhead in high-throughput paths; System.Text.Json source generation avoids runtime reflection, which typically reduces startup time and allocations, though the throughput gain depends on the payload and should be measured rather than assumed. Cache frequently read, rarely changed remote data locally with appropriate TTLs.
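
For the last point, local caching with a TTL, the following is a deliberately small sketch rather than a production cache. TtlCache is a hypothetical class; the injectable clock is an assumption made so that expiry can be tested without waiting. In a real .NET service, IMemoryCache or a distributed cache would be the more likely choice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical minimal TTL cache for rarely changing remote data (e.g. a user's timezone).
public sealed class TtlCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;

    public TtlCache(TimeSpan ttl) : this(ttl, () => DateTimeOffset.UtcNow) { }

    // The clock is injectable so expiry can be exercised in tests.
    public TtlCache(TimeSpan ttl, Func<DateTimeOffset> now)
    {
        _ttl = ttl;
        _now = now;
    }

    public async Task<TValue> GetOrFetchAsync(TKey key, Func<TKey, Task<TValue>> fetch)
    {
        // Serve from the cache while the entry is still fresh.
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresAt > _now())
            return entry.Value;

        // Missing or expired: pay the remote call once and remember the result.
        // Concurrent misses for the same key may each fetch; this sketch accepts that.
        var value = await fetch(key);
        _entries[key] = (value, _now() + _ttl);
        return value;
    }
}
```

The TTL is the trade-off dial: a longer TTL saves more remote calls but serves stale data for longer, so pick it per data type rather than globally.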

Fallacy 8: The Network Is Homogeneous

The assumption: All network paths have equivalent characteristics — latency, reliability, bandwidth, protocol support.

Modern manifestation: A service deployed across AWS us-east-1 and eu-west-1 that assumes cross-region calls have the same latency as intra-region calls. A mobile client on a 3G connection hitting an API designed for broadband. A gRPC service calling a legacy SOAP endpoint through a protocol translation layer that adds 40ms of overhead.

Mitigation: Measure latency from every network path, not just from the deployment perspective. Design APIs to minimise round-trips for high-latency clients (batch endpoints, GraphQL, or gRPC server streaming). Implement regional routing (Route 53 latency-based routing, Cloudflare GeoDNS) to route users to the nearest region. Never assume a network path is uniform without measurement.

Trade-offs

The practical trade-off all eight fallacies share is this: acknowledging each fallacy adds engineering complexity. Idempotency keys, retry logic, timeout configurations, circuit breakers, service discovery, mTLS, connection pooling — each is correct, and each adds code to write, test, and maintain. The engineering question is not "should we handle this?" but "where on the spectrum of defensive programming do we stop?"

The answer is risk-based: for infrastructure that handles financial transactions, healthcare records, or user authentication, all eight mitigations are mandatory. For an internal analytics dashboard serving 10 engineers, you can accept higher failure tolerance in exchange for reduced operational overhead. The fallacy is not in accepting trade-offs — it is in not recognising that you are making them.

Code

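This C# demonstrates a resilient HTTP client factory configuration that addresses Fallacies 1, 2, 3, 5, and 7 together, reflecting the reality that resilience patterns are not independent; they compose. It assumes the Microsoft.Extensions.Http.Resilience package (which builds on Polly v8) and a plain CatalogueServiceOptions class exposing BaseUrl and TimeoutMs.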

using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;
using Polly.Timeout;

public static class CatalogueServiceHttpClientExtensions
{
    private const int RetryAttempts = 3;

    public static IServiceCollection AddCatalogueServiceClient(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        var options = configuration
            .GetRequiredSection("Services:Catalogue")
            .Get<CatalogueServiceOptions>()
            ?? throw new InvalidOperationException("Section 'Services:Catalogue' could not be bound.");

        var attemptTimeout = TimeSpan.FromMilliseconds(options.TimeoutMs);

        services
            .AddHttpClient<ICatalogueServiceClient, CatalogueServiceClient>(client =>
            {
                client.BaseAddress = new Uri(options.BaseUrl);

                // Fallacy 2 mitigation: explicit timeout, not the 100-second default.
                // HttpClient.Timeout spans the whole call including retries, so it must
                // exceed the per-attempt timeout: all attempts plus headroom for backoff.
                client.Timeout = attemptTimeout * (RetryAttempts + 1) + TimeSpan.FromSeconds(2);

                // Fallacy 3 mitigation: ask for gzip-compressed JSON responses
                client.DefaultRequestHeaders.AcceptEncoding.Add(
                    new StringWithQualityHeaderValue("gzip"));
                client.DefaultRequestHeaders.Accept.Add(
                    new MediaTypeWithQualityHeaderValue("application/json"));
            })
            // Fallacy 5 mitigation: recycle pooled connections so that new connections
            // trigger a fresh DNS lookup. HttpClient does not track DNS TTL itself,
            // so do NOT keep one long-lived connection pool without a lifetime.
            .ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
            {
                PooledConnectionLifetime = TimeSpan.FromMinutes(2), // pick up DNS changes
                PooledConnectionIdleTimeout = TimeSpan.FromMinutes(1),
                MaxConnectionsPerServer = 20, // Fallacy 7: bounded, reused connection pool
                EnableMultipleHttp2Connections = true,
                AutomaticDecompression = DecompressionMethods.GZip // Fallacy 3
            })
            // Fallacy 1 & 2 mitigation: circuit breaker + retry + per-attempt timeout via Polly.
            // Strategies run in the order they are added, outermost first. The builder is
            // typed to HttpResponseMessage, so the generic option types are required.
            .AddResilienceHandler("catalogue-pipeline", pipeline =>
            {
                pipeline.AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage>
                {
                    FailureRatio = 0.5,
                    MinimumThroughput = 20,
                    SamplingDuration = TimeSpan.FromSeconds(30),
                    BreakDuration = TimeSpan.FromSeconds(15)
                });

                pipeline.AddRetry(new RetryStrategyOptions<HttpResponseMessage>
                {
                    MaxRetryAttempts = RetryAttempts,
                    BackoffType = DelayBackoffType.Exponential,
                    UseJitter = true, // Prevents thundering herd on recovery
                    Delay = TimeSpan.FromMilliseconds(150),
                    ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                        .Handle<HttpRequestException>()
                        .Handle<TimeoutRejectedException>()
                });

                // Innermost: applies to each individual attempt
                pipeline.AddTimeout(attemptTimeout);
            });

        return services;
    }
}

This second example addresses Fallacy 7 (transport cost) by batching N lookups into a single call, and Fallacy 3 (bandwidth) through response shaping: requesting only the fields your service actually needs from a downstream API:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

// Assumed DTO shapes for illustration; match them to the real pricing endpoint's contract.
public sealed record ProductPricingSnapshot(
    Guid Id, decimal Price, string Currency, bool InStock, int StockLevel);
public sealed record ProductPricingBatchResponse(IReadOnlyList<ProductPricingSnapshot> Snapshots);

public class CatalogueServiceClient : ICatalogueServiceClient
{
    private readonly HttpClient _httpClient;
    private readonly ILogger<CatalogueServiceClient> _logger;

    public CatalogueServiceClient(HttpClient httpClient, ILogger<CatalogueServiceClient> logger)
    {
        _httpClient = httpClient;
        _logger = logger;
    }

    // Fetches only price and availability for a batch of products instead of
    // fetching full product entities (2KB+) one at a time.
    public async Task<IReadOnlyList<ProductPricingSnapshot>> GetPricingSnapshotsAsync(
        IReadOnlyList<Guid> productIds,
        CancellationToken ct)
    {
        if (productIds.Count == 0)
            return Array.Empty<ProductPricingSnapshot>();

        // Batch request (Fallacy 7): single call vs. N calls.
        // Very large ID lists can exceed URL length limits; chunk them or use a POST body.
        // Sparse fieldset (Fallacy 3): minimal payload vs. full entity.
        var queryString = string.Join("&", productIds.Select(id => $"ids={id}"));
        var endpoint = $"/api/products/pricing?{queryString}&fields=id,price,currency,inStock,stockLevel";

        using var response = await _httpClient.GetAsync(endpoint, ct);
        response.EnsureSuccessStatusCode();

        var body = await response.Content.ReadFromJsonAsync<ProductPricingBatchResponse>(
            cancellationToken: ct);

        // ContentLength is null for chunked or automatically decompressed responses,
        // in which case -1 is logged.
        _logger.LogDebug(
            "Fetched pricing for {Count} products, response size: {Size} bytes",
            productIds.Count,
            response.Content.Headers.ContentLength ?? -1);

        return body?.Snapshots ?? Array.Empty<ProductPricingSnapshot>();
    }
}

The fields=id,price,currency,inStock,stockLevel querystring parameter is a sparse fieldset: a mechanism the server must explicitly support (JSON:API defines it; GraphQL achieves the same effect with selection sets) to return only the properties the client requested. If the full product entity is 2KB and you only need 5 fields totalling 120 bytes, sparse fieldsets reduce the payload by about 94% on every call.