Theoretical Foundations

Welcome to the curriculum workspace. Here you will find long-form technical guidelines outlining core architectural blueprints and implementation mechanics.

Module 19: Observability & Production Operations

1. Module Title & Overview

Title: Module 19: Observability & Production Operations
Overview: This module teaches engineers how to instrument, collect, and analyze system telemetry in production environments. Students will move beyond basic monitoring to design context-propagated distributed tracing, unified metric collection, and action-oriented alerting frameworks using OpenTelemetry standards.

2. Learning Objectives

Design Distributed Tracing Across Microservices: Implement tracing standards that propagate trace headers through synchronous HTTP/gRPC API channels and asynchronous message brokers.
Establish Metric Frameworks (RED vs. USE): Contrast application-level metrics (Rate, Errors, Duration) with system infrastructure metrics (Utilization, Saturation, Errors) to build coherent dashboards.
Formulate SLOs and Error Budgets: Translate business availability targets into quantifiable Service Level Objectives, measuring system performance using Service Level Indicators.
Optimize Observability Cost & Sampling: Apply trace sampling algorithms (head-based, tail-based) and metric rollup policies to limit telemetry storage costs.

3. Prerequisite Statement

Requires Module 5 (Storage Paradigms & Database Mechanics) and Module 14 (Fault Tolerance & Resiliency). Students must understand write latency, disk saturation, and failure recovery policies before designing tracing pipelines that capture these occurrences.

4. Content Outline

Section 19.1: The Paradigm Shift to Observability

Concepts: Monitoring (Is it broken?) vs. Observability (Why is it broken?). The limits of unstructured logs. Structured events as system state representations.
Deep Dive: Limitations of traditional CPU/RAM threshold alerts in complex distributed networks. Root-cause isolation mechanics in microservices. The physical costs of un-indexed vs. indexed log architectures.
Architectural Trade-offs: Structured JSON logging enables field-level queries but increases serialization CPU overhead and disk storage needs, because every event repeats its field names. Unstructured logging is simple to write but requires expensive regex parsing at query time.
Physical Constraints: Telemetry extraction overhead, memory buffer sizes for asynchronous log flushers, and disk write speeds for local collectors.
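
The trade-off between structured and unstructured logging can be made concrete with a minimal sketch. It assumes a hypothetical request event with five fields; the function names, the field names, and the free-text wording are illustrative and not part of any logging library.

```python
import json
import re


def structured_event(service, route, status, duration_ms, trace_id):
    """Serialize one request as a JSON event with typed, named fields."""
    return json.dumps(
        {
            "service": service,
            "route": route,
            "status": status,
            "duration_ms": duration_ms,
            "trace_id": trace_id,
        },
        sort_keys=True,
    )


def unstructured_line(service, route, status, duration_ms):
    """Write the same request as free text, the way ad hoc log calls often do."""
    return f"{service} handled {route} -> {status} in {duration_ms}ms"


# Hypothetical pattern: every consumer of the free-text line must maintain
# a regex like this one, and it breaks whenever the wording changes.
LINE_PATTERN = re.compile(r"^(\S+) handled (\S+) -> (\d+) in (\d+(?:\.\d+)?)ms$")


def parse_unstructured(line):
    """Recover fields from the free-text line, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    service, route, status, duration = match.groups()
    return {
        "service": service,
        "route": route,
        "status": int(status),
        "duration_ms": float(duration),
    }


def slow_failures(events, min_duration_ms=250.0):
    """Field-level query: server errors that were also slow."""
    return [
        e for e in events
        if e["status"] >= 500 and e["duration_ms"] >= min_duration_ms
    ]
```

The JSON event is longer on disk than the free-text line, which is the storage cost named above, but `slow_failures` can filter it directly after `json.loads`, while the free-text line first has to survive `parse_unstructured`.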

Section 19.2: Distributed Tracing & W3C Trace Context Propagation

Concepts: Trace IDs, Span IDs, Parent-Child spans, Trace State, and the W3C Trace Context standard.
Deep Dive: Trace context propagation mechanics across boundaries. Header injection and extraction protocol details. Handling trace propagation in message queues (Kafka headers) vs. synchronous HTTP calls.
Architectural Trade-offs: Automatic instrumentation (runtime monkey-patching or agent-based hooks) provides immediate visibility but adds performance overhead that is hard to inspect or tune. Manual instrumentation requires code changes but gives precise control over span boundaries and attributes, so only the spans that matter are emitted.
Physical Constraints: CPU overhead of span generation in high-throughput loops, context leak risks across asynchronous execution threads, and packet size increases from large headers.
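
Header extraction and injection can be sketched in a few functions. This is a simplified illustration that handles only version 00 of the traceparent header and ignores tracestate; the function names are hypothetical, and the message-header shape is an assumption modeled on Kafka record headers, which pair a string key with a byte value.

```python
import re
import secrets

# version-traceid-parentid-flags, all lowercase hex (W3C Trace Context, version 00)
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Return the parsed fields, or None when the header is missing or invalid."""
    if not header:
        return None
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def new_span_id():
    """Generate a random 8-byte span ID as 16 hex characters."""
    return secrets.token_hex(8)


def child_traceparent(ctx, span_id):
    """Build the header for an outgoing call: same trace ID, this hop's span ID."""
    return f"00-{ctx['trace_id']}-{span_id}-{ctx['flags']}"


def inject_http(ctx, span_id, headers):
    """Synchronous hop: the context rides in a plain string header."""
    headers["traceparent"] = child_traceparent(ctx, span_id)
    return headers


def inject_message(ctx, span_id, record_headers):
    """Asynchronous hop: Kafka-style record headers are (key, bytes) pairs."""
    value = child_traceparent(ctx, span_id).encode("ascii")
    record_headers.append(("traceparent", value))
    return record_headers
```

In both transports the trace ID is copied unchanged and only the parent span ID is replaced, which is what lets a backend stitch the hops into one trace.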

Section 19.3: Metrics Engine Design: RED Method vs. USE Method

Concepts: The RED Method (Rate, Errors, Duration) for APIs. The USE Method (Utilization, Saturation, Errors) for system components. Histograms, Gauges, Counters, and Summaries.
Deep Dive: Statistical processing of percentile latency ($p50, p90, p99, p99.9$). The mathematical trap of averaging averages. Dimensionality and Cardinality limits in modern TSDBs (Time Series Databases).
Architectural Trade-offs: Storing raw high-cardinality metrics (like metric labels containing user IDs) allows granular debugging but degrades TSDB write performance and memory allocation. Metric rollups save space but destroy resolution.
Physical Constraints: Network bandwidth usage of metric scraping pulls vs. push mechanisms, and memory pressure on collector agent buffers.
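
The trap of averaging averages is easiest to see with numbers. The sketch below uses made-up traffic for two hypothetical hosts and a simple nearest-rank percentile; real TSDBs estimate percentiles from histogram buckets rather than raw samples.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical traffic: host A serves 900 fast requests, host B serves 100 slow ones.
host_a = [10.0] * 900
host_b = [1000.0] * 100
combined = host_a + host_b

# Averaging the per-host means ignores how many requests each host served.
average_of_averages = mean([mean(host_a), mean(host_b)])

# The true mean weights every request equally.
true_mean = mean(combined)

# Percentiles cannot be averaged either: the per-host p99 values hide the tail.
average_of_p99s = mean([percentile(host_a, 99), percentile(host_b, 99)])
true_p99 = percentile(combined, 99)
```

Here `average_of_averages` is 505.0 ms while `true_mean` is 109.0 ms, and `average_of_p99s` is 505.0 ms while `true_p99` is 1000.0 ms. Aggregating raw counts or histogram buckets first, then computing the statistic, avoids both errors.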

Section 19.4: SLOs, SLIs, and Actionable Alerting

Concepts: Service Level Agreement (SLA), Service Level Objective (SLO), Service Level Indicator (SLI), Error Budgets, and Burn Rates.
Deep Dive: Calculating error budgets mathematically. Designing multi-window, multi-burn-rate alerts to reduce alert fatigue. Connecting paging alerts to operational runbooks.
Architectural Trade-offs: Tight SLOs (e.g., 99.99%, roughly 4.3 minutes of allowed downtime per 30 days) drive high availability but require expensive multi-region architectures. Looser SLOs (e.g., 99.9%, roughly 43 minutes per 30 days) allow faster feature shipping and cheaper operations.
Physical Constraints: Monitoring loop evaluation intervals, system clock drift impacts, and alerting engine evaluation latencies.
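
The error budget arithmetic and a two-window burn-rate check can be sketched as follows. The function names are illustrative, and the default threshold of 14.4 is an assumption: it is the burn rate at which 2% of a 30-day budget is spent in one hour (0.02 × 720 hours), a value often paired with a one-hour long window and a five-minute short window.

```python
def error_budget(slo_target):
    """Fraction of requests (or time) allowed to fail, e.g. about 0.001 for 99.9%."""
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target, window_days=30):
    """Convert the budget into minutes of full outage over the SLO window."""
    return error_budget(slo_target) * window_days * 24 * 60


def burn_rate(error_ratio, slo_target):
    """Observed error ratio divided by the budget.

    A burn rate of 1.0 spends the budget exactly by the end of the SLO window;
    higher values exhaust it proportionally sooner.
    """
    return error_ratio / error_budget(slo_target)


def should_page(long_window_error_ratio, short_window_error_ratio,
                slo_target, threshold=14.4):
    """Page only when both windows burn faster than the threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, so a recovered incident stops paging quickly.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )
```

For a 99.9% SLO, a sustained 2% error ratio in both windows is a burn rate of about 20 and pages; the same ratio in the long window with a short-window ratio of 0.05% does not, because the incident is already over.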

Section 19.5: OpenTelemetry Architecture & Collection Pipelines

Concepts: OpenTelemetry SDKs, APIs, OTEL Collector (Receivers, Processors, Exporters), OTLP protocol, and Prometheus/Jaeger/Elastic backends.
Deep Dive: Designing localized collector agent sidecars vs. centralized collector service gateways. Configuring batch processing, memory limits, and queue retries in the collector configurations.
Architectural Trade-offs: Running a local OTEL collector sidecar moves batching, retry, and export work out of the application process, but increases the memory footprint of every container/pod and still consumes CPU on the same node.
Physical Constraints: CPU limits of the collector container, local disk storage for buffering during collector outages, and backend network bandwidth limits.
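
The interaction between batching, a bounded queue, and retries can be illustrated with a toy buffer. This is a conceptual sketch only, not how the OpenTelemetry Collector is implemented; the class name, the parameters, and the `export` callable (which returns True on success) are all hypothetical.

```python
from collections import deque


class BatchingBuffer:
    """Queue telemetry items, export them in batches, keep them on failure."""

    def __init__(self, export, max_batch_size=3, max_queue_size=6):
        self._export = export              # callable(list) -> bool
        self._max_batch_size = max_batch_size
        self._max_queue_size = max_queue_size
        self._queue = deque()
        self.dropped = 0                   # items rejected to protect memory

    def add(self, item):
        """Enqueue one item; drop it if the queue is already full."""
        if len(self._queue) >= self._max_queue_size:
            self.dropped += 1
            return False
        self._queue.append(item)
        if len(self._queue) >= self._max_batch_size:
            self.flush()
        return True

    def flush(self):
        """Try to export the oldest batch; on failure leave it queued for retry."""
        if not self._queue:
            return False
        size = min(self._max_batch_size, len(self._queue))
        batch = [self._queue[i] for i in range(size)]
        if self._export(batch):
            for _ in range(size):
                self._queue.popleft()
            return True
        return False

    def pending(self):
        """Number of items still waiting to be exported."""
        return len(self._queue)
```

While the backend is down the queue absorbs items up to `max_queue_size` and then starts dropping, which is the memory-versus-data-loss decision that the collector's memory limit and queue settings encode.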

Section 19.6: Cost Management & Sampling Strategies

Concepts: Head-based sampling vs. Tail-based sampling, log level filtering, metric rollup rules, and telemetry storage tiering.
Deep Dive: Execution flow of tail-based sampling rules. Retaining 100% of errors and only 1% of successful trace paths. Designing dynamic sampling rates based on network load spikes.
Architectural Trade-offs: Head-based sampling decides at the start of a trace, minimizing trace creation CPU overhead but risking missing downstream transaction failures. Tail-based sampling evaluates the entire trace at the end, capturing all failures but requiring memory-intensive buffering.
Physical Constraints: Buffer memory on tail-sampling processors, query latency limits on trace search backends, and storage indexing latency.
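
The difference between the two sampling decisions can be sketched with a deterministic hash of the trace ID. The span dictionaries, the "ERROR" status string, and the function names are assumptions for illustration; a production tail sampler also has to buffer spans until the trace is complete, which this sketch leaves out.

```python
import hashlib


def trace_bucket(trace_id):
    """Map a trace ID to a stable value in [0, 1) so every node agrees."""
    digest = hashlib.sha256(trace_id.encode("ascii")).digest()
    # 48 bits divide exactly in floating point and always stay below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def head_sample(trace_id, rate):
    """Head-based: decide from the trace ID alone, before any span exists."""
    return trace_bucket(trace_id) < rate


def tail_sample(spans, success_rate=0.01):
    """Tail-based: inspect the finished trace, keep every error, sample the rest."""
    if not spans:
        return False
    if any(span["status"] == "ERROR" for span in spans):
        return True
    return trace_bucket(spans[0]["trace_id"]) < success_rate
```

`head_sample` never sees span status, so a trace that fails downstream is kept only if its ID happens to fall under the rate. `tail_sample` keeps that trace unconditionally, at the cost of holding all of its spans in memory until the decision is made.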

5. Key Concepts

Distributed Tracing: A method of tracking a request as it flows from frontend clients through backend microservices, recorded as a tree of timed spans that share one trace ID.
Context Propagation: The system mechanism that carries metadata (Trace ID, Span ID) across logical and physical network boundaries.
Telemetry Collector: A proxy service that receives, processes, filters, and exports application telemetry to storage backends.
USE Method: An infrastructure monitoring framework evaluating Utilization, Saturation, and Error rates of physical resources.
RED Method: An API monitoring framework evaluating request Rate, response Errors, and execution Duration.
Metric Cardinality: The number of unique time-series combinations generated by a metric name and its label key-value pairs.
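Error Budget: The allowable fraction of requests or time (1 minus the SLO target) in which a service can fail or perform poorly before violating its SLO.
Burn Rate: How fast a service consumes its error budget relative to the pace that would exhaust it exactly at the end of the SLO window; a burn rate of 1 spends the whole budget over the full window.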
Tail-based Sampling: Deciding to record or drop a trace after the entire transaction chain has finished executing.
OpenTelemetry (OTel): A vendor-neutral, open-source observability framework (APIs, SDKs, the Collector, and the OTLP protocol) for generating, collecting, and exporting telemetry data.
W3C Trace Context: The standardized format for trace context HTTP headers (traceparent, tracestate).
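
Metric cardinality, as defined above, has a simple upper bound: the product of the number of distinct values each label can take. The sketch below uses a hypothetical request counter with made-up label sets; real series counts are usually lower because not every combination occurs.

```python
from math import prod


def max_series(label_values):
    """Upper bound on time series for one metric name.

    label_values maps each label key to the collection of values it can take.
    A metric with no labels still produces exactly one series.
    """
    return prod(len(values) for values in label_values.values())


# Hypothetical labels for a request counter.
bounded_labels = {
    "method": ["GET", "POST", "PUT", "DELETE", "PATCH"],
    "route": range(20),      # 20 route templates
    "status": range(6),      # 6 distinct status codes observed
}

# The same metric after someone adds a user ID label.
unbounded_labels = dict(bounded_labels, user_id=range(100_000))
```

`max_series(bounded_labels)` is 600, while `max_series(unbounded_labels)` is 60,000,000. One unbounded label multiplies every existing combination, which is why identifiers such as user IDs belong in traces or logs rather than metric labels.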

6. Practice Section Description

Practice Exercise: Implementing Distributed Tracing and Metrics in a Payment Chain.
Scenario: An e-commerce system is experiencing intermittent checkout latency spikes. The payment chain consists of an API Gateway -> Checkout Service -> Payment Processor -> Database.
Challenge: Students must write the instrumentation code to generate context-propagated traces and capture latency histograms. They must build a telemetry flow diagram (via the diagram editor) mapping trace context headers as they travel across network boundaries and queue layers.
Constraints: Must use W3C traceparent headers. Must extract the trace context from incoming HTTP requests and inject it into outgoing client calls. Must record error spans with stack traces if transactions fail.
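
The latency histograms the challenge asks for can be sketched with fixed buckets. The bucket bounds and the class name below are assumptions chosen for illustration; a real solution would normally use the histogram instrument of the metrics SDK rather than a hand-rolled class.

```python
import bisect
import math


class LatencyHistogram:
    """Fixed-bucket histogram; a value lands in the first bucket whose bound is >= it."""

    def __init__(self, bounds_ms=(5, 10, 25, 50, 100, 250, 500, 1000)):
        self.bounds = sorted(bounds_ms)
        # One counter per bound plus a final overflow (+Inf) bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0
        self.sum_ms = 0.0

    def observe(self, duration_ms):
        """Record one request duration in milliseconds."""
        index = bisect.bisect_left(self.bounds, duration_ms)
        self.counts[index] += 1
        self.total += 1
        self.sum_ms += duration_ms

    def quantile_upper_bound(self, q):
        """Upper bound of the bucket containing quantile q, or None if empty."""
        if self.total == 0:
            return None
        target = q * self.total
        running = 0
        for bound, count in zip(self.bounds + [math.inf], self.counts):
            running += count
            if running >= target:
                return bound
        return math.inf
```

Because only bucket counts are stored, histograms from several service instances can be summed before a quantile is estimated, and the answer is only as precise as the bucket boundaries. Checkout latency spikes that land in the overflow bucket are a sign that the bounds need a higher top value.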

flowchart TD
    subgraph Client Portal
        Browser[Client Browser]
    end

    subgraph Service Mesh Topology
        GW[API Gateway]
        Check[Checkout Service]
        Pay[Payment Processor]
        DB[(Payment Database)]
    end

    subgraph Telemetry Pipeline
        OTel_Agent[Local Collector Sidecar]
        OTel_Coll[Central OTEL Collector Gateway]
        TSDB[(Prometheus - Metrics)]
        TraceStore[(Jaeger - Traces)]
    end

    %% Business Request Flow
    Browser -->|HTTP Request| GW
    GW -->|HTTP Post - Inject traceparent| Check
    Check -->|gRPC - Inject traceparent| Pay
    Pay -->|SQL Query| DB

    %% Telemetry Collection Flow
    GW -.->|OTLP over gRPC| OTel_Agent
    Check -.->|OTLP over gRPC| OTel_Agent
    Pay -.->|OTLP over gRPC| OTel_Agent

    OTel_Agent -->|Batch Push| OTel_Coll
    OTel_Coll -->|Export Metrics| TSDB
    OTel_Coll -->|Export Traces| TraceStore

7. Deliverable/Documentation

Deliverable Name: Enterprise Observability Architecture & SLO Ledger
Description: A formal operations blueprint containing:
1. A structural network diagram mapping application workloads, collector proxies, and backends.
2. A trace propagation map detailing W3C header handling at each node of the payment workflow.
3. A defined SLO registry specifying three Service Level Objectives, error budget equations, and corresponding SLI metrics.
4. A tail-based sampling policy configuration (YAML) that retains 100% of anomalies/errors while throttling telemetry volume for success paths to keep storage costs under control.

Code Snippet: C# Middleware Implementing Context-Propagated Distributed Tracing

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public class TracePropagationMiddleware
{
    private readonly RequestDelegate _next;
    private static readonly ActivitySource MpcActivitySource = new ActivitySource("Mpc.Telemetry.Core");

    public TracePropagationMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Extract the W3C Trace Context headers (traceparent, tracestate)
        // traceparent format: version-traceId-parentId-traceFlags (e.g., 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01)
        string traceParentHeader = context.Request.Headers["traceparent"];
        string traceStateHeader = context.Request.Headers["tracestate"];
        Activity activity = null;

        // TryParse returns false for a missing or malformed traceparent, so bad input never becomes a parent
        if (ActivityContext.TryParse(traceParentHeader, traceStateHeader, out ActivityContext parentContext))
        {
            // Start a server span parented by the caller's remote trace context
            activity = MpcActivitySource.StartActivity("HTTP Request Inbound", ActivityKind.Server, parentContext);
        }
        else
        {
            // No usable header: the span parents to Activity.Current if one exists, otherwise it starts a new root trace
            activity = MpcActivitySource.StartActivity("HTTP Request Inbound", ActivityKind.Server);
        }

        // Add HTTP semantic-convention tags; StartActivity returns null when no listener is sampling this source
        if (activity != null)
        {
            activity.SetTag("http.request.method", context.Request.Method);
            // url.path holds the raw path; http.route is reserved for the low-cardinality route template
            activity.SetTag("url.path", context.Request.Path.Value);
            activity.SetTag("component", "middleware");
        }

        try
        {
            // Execute the next step in the pipeline
            await _next(context);

            if (activity != null)
            {
                activity.SetTag("http.response.status_code", context.Response.StatusCode);

                // Mark the span as failed for 5xx server errors
                if (context.Response.StatusCode >= 500)
                {
                    activity.SetStatus(ActivityStatusCode.Error, $"Inbound request failed with status code {context.Response.StatusCode}");
                }
            }
        }
        }
        catch (Exception ex)
        {
            if (activity != null)
            {
                activity.SetStatus(ActivityStatusCode.Error, ex.Message);
                activity.RecordException(ex);
            }
            throw;
        }
        finally
        {
            // Stop and record span data
            activity?.Stop();
        }
    }
}

public static class ActivityExtensions
{
    public static void RecordException(this Activity activity, Exception ex)
    {
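        // exception.stacktrace carries ex.ToString() so the type, message, and inner exceptions travel with the trace;
        // ex.StackTrace alone is null for an exception that was never thrown
        activity.AddEvent(new ActivityEvent("exception", DateTimeOffset.UtcNow, new ActivityTagsCollection
        {
            { "exception.type", ex.GetType().FullName },
            { "exception.message", ex.Message },
            { "exception.stacktrace", ex.ToString() }
        }));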
    }
}

8. Integration Notes

Curriculum Placement: Maps directly to Module 14 (Fault Tolerance & Resiliency). Provides the telemetry mechanism required to test and evaluate circuit breaker trip states, retry effectiveness, and fallback outcomes.
Hiring Signal: Can trace complex microservice failures back to their originating calls using distributed trace context standards and design performance indicators that align technical systems with business objectives.

Module Deliverables

Draw a microservices telemetry collector routing pipeline (OTel to Prometheus/Jaeger)
