Theoretical Foundations
Module 15: Environmental Assessment (Greenfield vs. Brownfield)
PREREQUISITE STATEMENT: Read this module after completing Module 14 (Fault Tolerance). Building resilient codebases is half the challenge; the other half is deploying them within complex, pre-existing enterprise environments. This module teaches you how to decompose legacy monoliths without disrupting active production operations.
1. Greenfield vs. Brownfield Architectures
Software architects rarely work with a blank slate. Most enterprise engineering occurs within the context of existing legacy systems:
[ Greenfield Architecture ]
- Absolute design freedom
- No technical debt
- Risk: Analysis paralysis, lack of real operational baseline
[ Brownfield Architecture ]
- Constrained by legacy databases and code
- Must maintain active revenue lines (100% uptime)
- Risk: Technical debt, leaky abstractions, regression bugs
A. Greenfield Engineering
Greenfield engineering refers to developing a system from scratch, with no pre-existing legacy constraints, schemas, or running servers:
- The Opportunity: Absolute design freedom. You can select the modern frameworks, databases, and deployment pipelines (e.g., Serverless, Kubernetes) that perfectly align with your target scalability goals.
- The Risks:
  - Analysis Paralysis: The lack of constraints can lead to over-engineering and delayed delivery times.
  - Unknown Operational Profile: You have no historical data regarding user load spikes, database query profiles, or cache hit ratios. Architects must design based on speculative assumptions.
B. Brownfield Transformation
Brownfield transformation refers to modifying, extending, or replacing a live legacy system:
- The Constraint: The legacy monolith is actively serving users and generating company revenue. You cannot shut down operations for a rewrite. You must migrate the architecture incrementally — equivalent to "changing the engine of a plane while it is flying."
- The Risks:
  - Data Integration Bottlenecks: Legacy databases are often unnormalized, lack foreign key constraints, and are coupled to multiple systems.
  - Regression Failures: Lack of automated test coverage in legacy systems makes modifying code risky.
  - Leaky Abstractions: It is easy for legacy design patterns to leak into new microservices, corrupting the clean domain models.
2. The Strangler Fig Pattern
To decompose a legacy monolith safely, architects use the Strangler Fig Pattern. Named after the strangler fig tree, which seeds in the upper branches of a host tree, grows roots down to the ground, and eventually envelops and kills the host, this pattern replaces monolithic capabilities incrementally:
Phase 1 (Interception):
[ Client ] ---> [ Routing Gateway ] ---> [ Legacy Monolith ]
Phase 2 (Co-existence):
[ Client ] ---> [ Routing Gateway ] ---> [ Legacy Monolith ] (User/Billing queries)
                                    ---> [ Order Service ] (Orders queries)
Phase 3 (Strangled):
[ Client ] ---> [ Routing Gateway ] ---> [ Order Service ]
                                    ---> [ User Service ]
(Legacy Monolith Decommissioned)
The Strangler Migration Lifecycle:
- Phase 1: Ingress Interception: Insert a Routing Gateway (e.g. Envoy, NGINX, or a Cloud CDN) in front of the Legacy Monolith. Initially, configure the gateway to route 100% of public traffic to the monolith.
- Phase 2: Extract & Co-exist: Identify a cohesive business capability (such as /orders) to extract. Build a new microservice (OrderService) with its own database. Update the Routing Gateway to route requests targeting /orders to the new service, while all other requests (e.g. /users, /billing) continue to hit the monolith.
- Phase 3: Strangling: Extract subsequent capabilities (e.g. /users to UserService, /billing to BillingService) one by one, shifting routing targets in the gateway.
- Phase 4: Monolith Elimination: Once all capabilities have been migrated, the legacy monolith receives zero traffic and can be decommissioned.
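The routing decision at the heart of the lifecycle above can be sketched as a prefix table with a monolith fallback. This is a minimal illustration, not the configuration format of Envoy or NGINX; the names GatewayRoute, resolveUpstream, and the upstream labels are assumptions made for the example.

```typescript
// Hypothetical route table for a Strangler Fig gateway (all names illustrative).
interface GatewayRoute {
  prefix: string;   // path prefix owned by an extracted capability, e.g. "/orders"
  upstream: string; // logical name of the service that now owns that prefix
}

const LEGACY_UPSTREAM = "legacy-monolith";

// Phase 2 state: only /orders has been extracted so far.
const phaseTwoRoutes: GatewayRoute[] = [
  { prefix: "/orders", upstream: "order-service" },
];

// Returns the upstream for a request path. The longest matching prefix wins;
// any path without a match falls through to the monolith.
function resolveUpstream(path: string, routes: GatewayRoute[]): string {
  let best: GatewayRoute | undefined;
  for (const route of routes) {
    const matches = path === route.prefix || path.startsWith(route.prefix + "/");
    if (matches && (best === undefined || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best === undefined ? LEGACY_UPSTREAM : best.upstream;
}
```

Moving from Phase 2 to Phase 3 then amounts to appending entries to the table (for example a /users route); decommissioning becomes possible once no request falls through to the fallback.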
3. The Anti-Corruption Layer (ACL) Pattern
When a newly extracted microservice needs to communicate with the legacy monolith to fetch data, we face a data modeling challenge. Legacy systems often represent data in messy formats (e.g., a single string column containing delimited user details, or archaic naming conventions).
If the new service adopts these schemas directly, its clean domain model is corrupted. To prevent this, place an Anti-Corruption Layer (ACL) between the new service and the legacy monolith:
[ New Order Service ] ---> [ Anti-Corruption Layer (Translation) ] ---> [ Legacy Monolith API ]
The ACL translates data structures between the legacy representation and the clean domain models of the new service.
Example: TypeScript Anti-Corruption Layer implementation
Below is a clean domain representation and an ACL translating a legacy address string structure into a structured domain value object:
// Clean Domain Value Object
interface CustomerAddress {
  street: string;
  city: string;
  postalCode: string;
}

// Legacy Monolith API Output
interface LegacyAddressPayload {
  // Legacy format: "123 Main St, Dublin, D02, IE"
  raw_address_line: string;
  country_code: string;
}

// Anti-Corruption Layer (ACL) Translator
class AddressACLTranslator {
  public static toDomain(legacyPayload: LegacyAddressPayload): CustomerAddress {
    // Split the delimited legacy string into its positional fields.
    const parts = legacyPayload.raw_address_line.split(",");
    return {
      // Missing or empty fields fall back to "Unknown".
      street: parts[0]?.trim() || "Unknown",
      city: parts[1]?.trim() || "Unknown",
      postalCode: parts[2]?.trim() || "Unknown"
    };
  }
}
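The translator above substitutes "Unknown" for missing fields, which keeps the new service running but can hide malformed legacy rows. One possible alternative, sketched here under the assumption that the legacy string always follows the "street, city, postal code, country" layout, is to return an explicit success-or-failure result so the caller decides what to do with bad data. The names ParsedAddress, AddressParseResult, and parseLegacyAddress are hypothetical.

```typescript
// Hypothetical strict variant of the ACL translation (all names illustrative).
interface ParsedAddress {
  street: string;
  city: string;
  postalCode: string;
}

type AddressParseResult =
  | { ok: true; value: ParsedAddress }
  | { ok: false; reason: string };

// Assumes the legacy layout "street, city, postal code[, country]".
function parseLegacyAddress(rawAddressLine: string): AddressParseResult {
  const parts = rawAddressLine.split(",").map((part) => part.trim());
  const [street, city, postalCode] = parts;
  if (!street || !city || !postalCode) {
    return {
      ok: false,
      reason: `expected at least 3 non-empty fields, got "${rawAddressLine}"`,
    };
  }
  return { ok: true, value: { street, city, postalCode } };
}
```

Whether to fall back to a placeholder or to reject the record is a design choice: the placeholder favours availability, while the explicit failure makes legacy data-quality problems visible during migration.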
4. Tooling: Legacy System Assessment
Before executing a brownfield migration, architects use static analysis and mapping tools to discover dependencies:
- Dependency Structure Matrix (DSM): Tools like Lattix or Structure101 analyze codebases to create a matrix mapping package dependencies. This highlights cycles and tightly coupled classes that must be decoupled before extraction.
- OpenTelemetry Tracing: Instrumenting the legacy monolith with tracing agents and exporting the traces to a backend (e.g., Jaeger, Datadog) maps the real-world runtime execution paths, identifying database tables that are shared across different domains.
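The cycle detection that a Dependency Structure Matrix makes visible can be approximated with a depth-first search over a package dependency graph. The sketch below is not how Lattix or Structure101 work internally; it only illustrates the idea, and the graph contents (including the com.legacy.billing package) are made up for the example.

```typescript
// Hypothetical package dependency graph: package -> packages it depends on.
type DependencyGraph = Record<string, string[]>;

// Depth-first search that returns the first dependency cycle found, as a path
// whose first and last entries are the same package, or null if there is none.
function findDependencyCycle(graph: DependencyGraph): string[] | null {
  const finished = new Set<string>(); // fully explored, known to be cycle-free
  const onPath: string[] = [];        // packages on the current DFS path

  const visit = (pkg: string): string[] | null => {
    const seenAt = onPath.indexOf(pkg);
    if (seenAt !== -1) {
      return [...onPath.slice(seenAt), pkg]; // back edge closes a cycle
    }
    if (finished.has(pkg)) return null;
    onPath.push(pkg);
    for (const dep of graph[pkg] ?? []) {
      const cycle = visit(dep);
      if (cycle !== null) return cycle;
    }
    onPath.pop();
    finished.add(pkg);
    return null;
  };

  for (const pkg of Object.keys(graph)) {
    const cycle = visit(pkg);
    if (cycle !== null) return cycle;
  }
  return null;
}

// Illustrative input: orders -> users -> billing -> orders forms a cycle.
const sampleGraph: DependencyGraph = {
  "com.legacy.orders": ["com.legacy.users"],
  "com.legacy.users": ["com.legacy.billing"],
  "com.legacy.billing": ["com.legacy.orders"],
  "com.legacy.reporting": ["com.legacy.orders"],
};
```

A package that appears in a reported cycle cannot be extracted on its own; the cycle has to be broken first, for example by introducing an interface owned by one side.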
5. Documentation Standard: Legacy Modernization Ledger
A Legacy Modernization & Technical Debt Valuation Ledger documents the codebase sections selected for migration, estimating the technical debt cost and defining the replacement strategy:
Technical Debt Ledger
| Monolith Code Module | Technical Debt Risk | Cyclomatic Complexity | Business Impact | Chosen Migration Pattern | Estimated Migration Effort | Target Microservice |
|---|---|---|---|---|---|---|
| com.legacy.orders | High (Slow billing runs, lock errors) | 84 (High loop nesting) | Outages block client revenue | Strangler Fig with dynamic proxy routing | 6 Weeks | orders-service |
| com.legacy.users | Medium (Static schema, slow joins) | 32 | Prevents user logins | Strangler Fig with Conformist integration | 4 Weeks | user-service |
| com.legacy.reporting | Low (Batch reporting runs overnight) | 12 | Non-critical reporting delay | Direct rewrite to Serverless Cron task | 2 Weeks | reports-lambda |
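If the ledger is kept as structured data rather than only as a table, simple questions such as "what is the total estimated effort?" or "which module comes first?" can be answered mechanically. The sketch below encodes the three rows above; the LedgerEntry shape and the ordering policy (highest risk first, ties broken by complexity) are assumptions for illustration, not a prescribed standard.

```typescript
type DebtRisk = "High" | "Medium" | "Low";

// Hypothetical typed form of one ledger row (subset of the table columns).
interface LedgerEntry {
  module: string;
  risk: DebtRisk;
  cyclomaticComplexity: number;
  effortWeeks: number;
  targetService: string;
}

const ledger: LedgerEntry[] = [
  { module: "com.legacy.reporting", risk: "Low", cyclomaticComplexity: 12, effortWeeks: 2, targetService: "reports-lambda" },
  { module: "com.legacy.orders", risk: "High", cyclomaticComplexity: 84, effortWeeks: 6, targetService: "orders-service" },
  { module: "com.legacy.users", risk: "Medium", cyclomaticComplexity: 32, effortWeeks: 4, targetService: "user-service" },
];

const riskRank: Record<DebtRisk, number> = { High: 0, Medium: 1, Low: 2 };

// Assumed policy: highest risk first; ties broken by higher complexity.
function prioritise(entries: LedgerEntry[]): LedgerEntry[] {
  return [...entries].sort(
    (a, b) =>
      riskRank[a.risk] - riskRank[b.risk] ||
      b.cyclomaticComplexity - a.cyclomaticComplexity,
  );
}

// Sum of the "Estimated Migration Effort" column, in weeks.
function totalEffortWeeks(entries: LedgerEntry[]): number {
  return entries.reduce((sum, entry) => sum + entry.effortWeeks, 0);
}
```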
6. Hands-on Architecture Challenge
Scenario Description
A client application (ClientBrowser) connects directly to a LegacyMonolith. The monolith database handles both User Profiles and Order Transactions. You must refactor this architecture to implement the Strangler Fig Pattern with an Anti-Corruption Layer.
Your Goal:
- Insert an APIGateway between the Client and the services.
- Define the routing paths inside the APIGateway:
  - Route legacy endpoints (e.g., /users) to the LegacyMonolith.
  - Route refactored endpoints (e.g., /orders) to a new OrderMicroservice.
- Inside the OrderMicroservice, introduce an AntiCorruptionLayer (ACL) block.
- Show the AntiCorruptionLayer communicating with the LegacyMonolith to fetch legacy user profile dependencies and translating them.
- Draw this migration architecture using the diagram editor's graph syntax.
7. Practice Challenge Template
Use this template in your sandbox to model the Strangler Fig migration:
graph TD
    subgraph Legacy_State [Legacy Direct Access]
        ClientDirect[Client Browser] -->|Direct HTTP Calls| MonolithDirect[Legacy Monolith]
        style ClientDirect fill:#faa,stroke:#333,stroke-width:2px
        style MonolithDirect fill:#faa,stroke:#333,stroke-width:2px
    end
    subgraph Strangler_State [Strangler Fig Co-existence]
        Client[Client Browser] -->|1. HTTP Request| Gateway[Routing Gateway]
        Gateway -->|2a. Route /users| LegacyMonolith[Legacy Monolith]
        Gateway -->|2b. Route /orders| OrderService[Order Microservice]
        subgraph OrderService_Boundary [Order Service Boundary]
            OrderService -->|3. Query Legacy Profile| ACL[Anti-Corruption Layer]
            ACL -->|4. Translate Schema| CleanDomain[Order Domain Logic]
        end
        ACL -->|5. HTTP API Call| LegacyMonolith
        style Client fill:#9f9,stroke:#333,stroke-width:2px
        style Gateway fill:#9ff,stroke:#333,stroke-width:3px
        style LegacyMonolith fill:#faa,stroke:#333,stroke-width:2px
        style OrderService fill:#9f9,stroke:#333,stroke-width:2px
        style ACL fill:#9ff,stroke:#333,stroke-width:2px
        style CleanDomain fill:#9f9,stroke:#333,stroke-width:2px
    end
8. Data Migration Strategy
Migrating data out of a legacy monolith is the highest-risk activity in any brownfield programme. The schema is shared by multiple consumers, the data volume may be in the hundreds of gigabytes, and you cannot afford downtime. Three primitives solve this: dual-write, shadow reads, and cutover gates.
8.1 The Dual-Write Pattern
During co-existence (Strangler Fig Phase 2), every write that reaches the legacy system is mirrored to the new service's database. This keeps both stores progressing in lock-step until the cutover gate is opened.
public class DualWriteOrderRepository : IOrderRepository
{
    private readonly ILegacyOrderStore _legacy;
    private readonly INewOrderStore _modern;
    // Assumed abstraction: SaveAsync enqueues failed secondary writes here.
    private readonly IReconciliationQueue _reconciliationQueue;
    private readonly ILogger<DualWriteOrderRepository> _log;

    public DualWriteOrderRepository(
        ILegacyOrderStore legacy,
        INewOrderStore modern,
        IReconciliationQueue reconciliationQueue,
        ILogger<DualWriteOrderRepository> log)
    {
        _legacy = legacy;
        _modern = modern;
        _reconciliationQueue = reconciliationQueue;
        _log = log;
    }

    public async Task SaveAsync(Order order, CancellationToken ct)
    {
        // Primary write: legacy store must succeed or we abort entirely.
        await _legacy.SaveAsync(order, ct);

        // Secondary write: failures here are logged but do NOT roll back
        // the primary. A reconciliation job corrects divergence nightly.
        try
        {
            await _modern.SaveAsync(order, ct);
        }
        catch (Exception ex)
        {
            _log.LogError(ex,
                "Dual-write divergence for OrderId={OrderId}. " +
                "Queuing reconciliation job.", order.Id);
            await _reconciliationQueue.EnqueueAsync(order.Id, ct);
        }
    }
}
Why a secondary-write failure is tolerated: If the new store becomes temporarily unavailable during migration, the legacy system (the source of truth during Phase 2) must not degrade. The secondary write is still awaited, so it adds latency to the request, but its failure is caught rather than propagated to the caller. The reconciliation queue acts as a compensating mechanism; it replays missed writes from the legacy write-ahead log (WAL) once the new store recovers.
Constraint — write amplification: Dual-write doubles write throughput to the database tier. For write-heavy domains (> 5,000 writes/s), measure the impact on connection pool saturation and WAL pressure before enabling this pattern in production.
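The compensating step can be sketched as a job that drains the queue of order IDs and copies the legacy version of each order into the new store. This is a simplified illustration: it re-reads the current legacy record by ID instead of replaying a write-ahead log, it uses in-memory maps as stand-ins for both stores, and the names OrderRecord and reconcile are hypothetical.

```typescript
// Hypothetical minimal order shape for the example.
interface OrderRecord {
  id: string;
  totalAmount: number;
}

// Drains the list of order IDs whose secondary write failed, copying the
// legacy version (the source of truth in Phase 2) into the modern store.
// Returns the IDs that could not be reconciled so they can be investigated.
function reconcile(
  pendingIds: string[],
  legacyStore: Map<string, OrderRecord>,
  modernStore: Map<string, OrderRecord>,
): string[] {
  const unresolved: string[] = [];
  for (const id of pendingIds) {
    const source = legacyStore.get(id);
    if (source === undefined) {
      unresolved.push(id); // nothing to copy from the legacy side
      continue;
    }
    // Upsert by ID: replaying the same ID twice leaves the same result.
    modernStore.set(id, { ...source });
  }
  return unresolved;
}
```

Making the copy an upsert keyed by ID keeps the job idempotent, so a crashed or repeated run does not create duplicates.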
8.2 Shadow Reads
Shadow reads validate that the new store's data matches the legacy store before you route real user traffic to it. A small percentage of reads—typically 1%–5%—are forked: the primary result is returned from the legacy store immediately, while an async comparison job fetches the same record from the new store and diffs the response.
public class ShadowReadOrderRepository : IOrderRepository
{
    private readonly ILegacyOrderStore _legacy;
    private readonly INewOrderStore _modern;
    private readonly IDivergenceReporter _reporter;
    private readonly double _shadowSampleRate; // e.g. 0.02 = 2%

    public ShadowReadOrderRepository(
        ILegacyOrderStore legacy,
        INewOrderStore modern,
        IDivergenceReporter reporter,
        double shadowSampleRate)
    {
        _legacy = legacy;
        _modern = modern;
        _reporter = reporter;
        _shadowSampleRate = shadowSampleRate;
    }

    public async Task<Order?> GetByIdAsync(Guid orderId, CancellationToken ct)
    {
        var legacyResult = await _legacy.GetByIdAsync(orderId, ct);
        if (Random.Shared.NextDouble() < _shadowSampleRate)
        {
            // Fire-and-forget: never delay the caller's response.
            // The caller's token is deliberately not passed on: it may be
            // cancelled once the request ends, which would abort the comparison.
            // Exceptions thrown inside this task are not observed by the caller.
            _ = Task.Run(async () =>
            {
                var modernResult = await _modern.GetByIdAsync(orderId, CancellationToken.None);
                if (!AreEquivalent(legacyResult, modernResult))
                {
                    await _reporter.ReportAsync(new DivergenceEvent
                    {
                        EntityId = orderId.ToString(),
                        LegacyValue = legacyResult,
                        ModernValue = modernResult,
                        DetectedAtUtc = DateTime.UtcNow
                    });
                }
            });
        }
        return legacyResult;
    }

    private static bool AreEquivalent(Order? a, Order? b)
    {
        if (a is null && b is null) return true;
        if (a is null || b is null) return false;
        // Field-by-field comparison; excludes volatile fields like UpdatedAt.
        return a.Id == b.Id
            && a.CustomerId == b.CustomerId
            && a.TotalAmount == b.TotalAmount
            && a.Status == b.Status;
    }
}
Shadow reads surface three categories of defect: (1) missing rows caused by incomplete bulk migration; (2) field-mapping errors in the Anti-Corruption Layer; and (3) timing windows where the reconciliation job has not yet caught up. The divergence reporter writes to a dedicated Postgres table or streams to a Grafana dashboard so the team can track the error rate trending toward zero before opening the cutover gate.
8.3 Cutover Gates
A cutover gate is a feature flag, controlled by a single configuration value, that flips routing from the legacy store to the new store. The gate should only open when three observable conditions are met:
| Gate Condition | Measurement | Threshold |
|---|---|---|
| Shadow-read divergence rate | divergences / total_shadow_reads | < 0.01% over 72 hours |
| Reconciliation queue depth | Count of unprocessed compensation jobs | 0 |
| New-store P99 read latency | 99th percentile read time | ≤ legacy P99 latency |
public class CutoverGate
{
    private readonly IFeatureFlags _flags;
    private readonly ILegacyOrderStore _legacy;
    private readonly INewOrderStore _modern;

    public CutoverGate(
        IFeatureFlags flags,
        ILegacyOrderStore legacy,
        INewOrderStore modern)
    {
        _flags = flags;
        _legacy = legacy;
        _modern = modern;
    }

    public async Task<Order?> GetByIdAsync(Guid orderId, CancellationToken ct)
    {
        // The flag is toggled in a config store (e.g., Azure App Config).
        // A single atomic flip routes all reads; no code deployment needed.
        bool useModern = await _flags.IsEnabledAsync("orders.read.modern", ct);
        return useModern
            ? await _modern.GetByIdAsync(orderId, ct)
            : await _legacy.GetByIdAsync(orderId, ct);
    }
}
The key constraint is atomicity: flip the flag for reads and writes in a single configuration update. Leaving reads on legacy while writes go to modern (or vice versa) creates a split-brain state that is extremely difficult to reconcile.
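The three gate conditions from the table in this section can be checked mechanically before anyone flips the flag. The following is a minimal sketch that assumes the metrics have already been aggregated over the 72-hour window; the CutoverMetrics shape and the function name failedGateConditions are hypothetical.

```typescript
// Hypothetical snapshot of the gate measurements over the 72-hour window.
interface CutoverMetrics {
  shadowReads: number;              // total shadow reads in the window
  divergences: number;              // shadow reads whose results differed
  reconciliationQueueDepth: number; // unprocessed compensation jobs
  modernP99Ms: number;              // new-store P99 read latency
  legacyP99Ms: number;              // legacy P99 read latency
}

const MAX_DIVERGENCE_RATE = 0.0001; // 0.01% expressed as a fraction

// Returns the list of failed conditions; an empty list means the gate may open.
function failedGateConditions(m: CutoverMetrics): string[] {
  const failures: string[] = [];
  if (m.shadowReads === 0) {
    // Without any samples the divergence rate is unknown, so stay closed.
    failures.push("no shadow reads recorded");
  } else if (m.divergences / m.shadowReads >= MAX_DIVERGENCE_RATE) {
    failures.push("divergence rate");
  }
  if (m.reconciliationQueueDepth !== 0) failures.push("reconciliation queue depth");
  if (m.modernP99Ms > m.legacyP99Ms) failures.push("P99 latency");
  return failures;
}
```

Returning the failed conditions, rather than a single boolean, gives the team a concrete reason whenever the gate stays closed.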
9. Testing Seams in Legacy Code
Legacy codebases rarely have meaningful test coverage. Introducing characterization tests before any refactoring creates a safety net: the tests document existing behaviour, not intended behaviour. If a refactoring breaks a characterization test, you have changed observable behaviour — which may or may not be intentional.
9.1 What Is a Seam?
Michael Feathers defines a seam as a place where you can alter program behaviour without editing code in that place. Seams are the entry points for test doubles (stubs, fakes, mocks). In C#, the most practical seam types are:
- Constructor injection seams: Replace new ConcreteType() with an IConcreteType parameter.
- Virtual method seams: Override a virtual method in a test subclass.
- Delegate/Func seams: Inject a Func<T> where a hard-coded static call existed.
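The seam types listed above are not specific to C#. As an illustration in TypeScript, the sketch below turns a hard-coded clock into a function seam supplied through the constructor. Both classes and the weekend rule are hypothetical examples, not code from a real system.

```typescript
// Before: the clock is hard-coded, so the rule cannot be tested for a
// specific date without waiting for that date.
class LegacyDiscountRule {
  isWeekendDiscountActive(): boolean {
    const day = new Date().getDay();
    return day === 0 || day === 6; // Sunday or Saturday
  }
}

// After: a function seam injected through the constructor. Production
// callers rely on the default clock; tests pass a fixed one.
class SeamedDiscountRule {
  private readonly now: () => Date;

  constructor(now: () => Date = () => new Date()) {
    this.now = now;
  }

  isWeekendDiscountActive(): boolean {
    const day = this.now().getDay();
    return day === 0 || day === 6;
  }
}

// A test double for the clock: 6 January 2024 was a Saturday.
const saturdayRule = new SeamedDiscountRule(() => new Date(2024, 0, 6));
```

The default parameter keeps existing call sites unchanged, which matters in legacy code where every caller you touch is another chance of a regression.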
9.2 Characterization Tests in C#
A characterization test calls a legacy method and records its actual output. The output becomes the assertion. This is deliberately the reverse of normal TDD: you are not specifying intent; you are pinning existing behavior.
// Requires the MSTest framework:
// using Microsoft.VisualStudio.TestTools.UnitTesting;

// Legacy code under test: do NOT modify this class yet.
public class LegacyPricingEngine
{
    public decimal CalculateTotal(int quantity, decimal unitPrice, string region)
    {
        decimal tax = region == "IE" ? 0.23m : 0.20m;
        // Undocumented rounding: truncates rather than rounds.
        return Math.Truncate((unitPrice * quantity) * (1 + tax) * 100) / 100;
    }
}

// Characterization test: captures actual behaviour as an assertion.
[TestClass]
public class LegacyPricingEngine_CharacterizationTests
{
    private readonly LegacyPricingEngine _engine = new();

    [TestMethod]
    public void IrishTaxRate_TruncatesResult_NotRounded()
    {
        // Act: call legacy code with real production inputs.
        decimal actual = _engine.CalculateTotal(3, 14.99m, "IE");

        // Assert: pin the ACTUAL output, even if it looks wrong.
        // We will understand WHY later; for now we are capturing it.
        // 3 * 14.99 = 44.97; 44.97 * 1.23 = 55.3131; truncated to 55.31.
        Assert.AreEqual(55.31m, actual,
            "Existing truncation behaviour must be preserved during refactor.");
    }

    [TestMethod]
    public void NonIrishTaxRate_UsesStandardVAT()
    {
        // 44.97 * 1.20 = 53.964; truncated to 53.96.
        decimal actual = _engine.CalculateTotal(3, 14.99m, "GB");
        Assert.AreEqual(53.96m, actual);
    }
}
Once you have a suite of characterization tests passing against the legacy code, you can refactor the internals safely. Any change that breaks an assertion forces a deliberate conversation: "Are we intentionally changing this behaviour, or did we introduce a regression?"
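When choosing inputs to pin, it helps to pick at least one case where truncation and rounding give different answers; otherwise the test would still pass if someone "fixed" the rounding. The sketch below re-derives the pricing rule in integer arithmetic (cents and basis points) to find such a case. It is an independent illustration of the arithmetic, not a port of LegacyPricingEngine, and the function name totalCents is hypothetical.

```typescript
// Integer arithmetic (cents and basis points) avoids floating-point noise.
// Mirrors the rule described in this section: IE is taxed at 23%, all other
// regions at 20%.
function totalCents(
  quantity: number,
  unitPriceCents: number,
  region: string,
  mode: "truncate" | "round",
): number {
  const taxBasisPoints = region === "IE" ? 2300 : 2000;
  // Scaled by 10,000 so that the tax multiplier stays an integer.
  const scaled = quantity * unitPriceCents * (10000 + taxBasisPoints);
  return mode === "truncate"
    ? Math.floor(scaled / 10000)
    : Math.round(scaled / 10000);
}

// 3 x 14.99 at the IE rate is 55.3131 before any rounding: both modes agree,
// so this input alone cannot distinguish truncation from rounding.
const agreeingCase = [
  totalCents(3, 1499, "IE", "truncate"),
  totalCents(3, 1499, "IE", "round"),
];

// 1 x 9.99 at the IE rate is 12.2877: truncation and rounding now differ,
// so pinning this input protects the truncation behaviour.
const distinguishingCase = [
  totalCents(1, 999, "IE", "truncate"),
  totalCents(1, 999, "IE", "round"),
];
```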
9.3 Approval Tests
Approval tests (popularized by the ApprovalTests library) extend characterization testing to complex outputs: serialized JSON, HTML reports, or multi-field objects. Instead of writing individual Assert.AreEqual calls for every field, you serialize the entire output and compare it against a golden file stored in version control.
// Requires: dotnet add package ApprovalTests
using ApprovalTests;
using ApprovalTests.Reporters;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[UseReporter(typeof(DiffReporter))]
[TestClass]
public class LegacyOrderSummaryService_ApprovalTests
{
    [TestMethod]
    public void GenerateSummary_MatchesApprovedOutput()
    {
        // Fixed inputs (including the Guid) keep the output deterministic.
        var service = new LegacyOrderSummaryService();
        var order = new Order
        {
            Id = Guid.Parse("00000000-0000-0000-0000-000000000001"),
            CustomerId = 42,
            Lines = new[]
            {
                new OrderLine { Sku = "SKU-001", Qty = 2, UnitPrice = 9.99m },
                new OrderLine { Sku = "SKU-002", Qty = 1, UnitPrice = 24.99m }
            }
        };

        string actualSummary = service.GenerateSummary(order);

        // On first run this FAILS and creates a .received.txt file.
        // After manual inspection, rename it to .approved.txt to lock in.
        Approvals.Verify(actualSummary);
    }
}
The .approved.txt file is committed to the repository. On every subsequent CI run, the Approvals.Verify() call compares the live output against the approved file. This makes approval tests invaluable for legacy output formatters, report generators, and serialization layers where the exact character-by-character output must remain stable during refactoring.
Constraint — approved file maintenance: Every intentional behaviour change requires an engineer to deliberately update the approved file. This is by design: it creates a traceable audit trail of what changed and why.
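The comparison at the core of an approval test is simple enough to sketch. The following is an illustration of the idea only, not the ApprovalTests library's implementation: it takes the received output and the approved text (or undefined when no approved file exists yet) and reports one of three outcomes. The names ApprovalOutcome and verifyAgainstApproved are hypothetical.

```typescript
type ApprovalOutcome =
  | { verdict: "approved" }
  | { verdict: "missing-approved-file"; received: string }
  | { verdict: "mismatch"; firstDifferenceAt: number };

// Compares live output with the golden text, character by character.
function verifyAgainstApproved(
  received: string,
  approved: string | undefined,
): ApprovalOutcome {
  if (approved === undefined) {
    // First run: nothing has been approved yet, so a human must inspect
    // the received output before it becomes the golden file.
    return { verdict: "missing-approved-file", received };
  }
  if (received === approved) return { verdict: "approved" };

  // Report where the two texts first diverge to speed up diagnosis.
  let index = 0;
  while (
    index < received.length &&
    index < approved.length &&
    received[index] === approved[index]
  ) {
    index++;
  }
  return { verdict: "mismatch", firstDifferenceAt: index };
}
```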
10. Measuring Modernisation Progress
A brownfield migration can take 12–36 months. Without objective measurement, teams lose confidence, stakeholders question ROI, and engineers burn out. The DORA (DevOps Research & Assessment) four-key metrics provide an evidence-based progress framework that directly correlates with migration health.
10.1 The Four DORA Metrics
| Metric | Definition | Elite Threshold | Why It Matters for Modernisation |
|---|---|---|---|
| Deployment Frequency | How often code is deployed to production | Multiple times per day | Monoliths ship monthly; microservices should ship daily. Increasing frequency signals successful decoupling. |
| Lead Time for Changes | Time from code commit to production deploy | < 1 hour | Short lead times prove CI/CD pipelines are healthy and test suites are fast enough to not block release. |
| Change Failure Rate | % of deployments causing production incidents | < 5% | High failure rates during migration indicate insufficient test coverage or missing characterization tests. |
| Mean Time to Recover (MTTR) | Time to restore service after an incident | < 1 hour | Microservices should recover faster than monoliths due to independent deployment. |
10.2 Tracking Metrics in C#
Instrumenting your CI/CD pipeline to emit DORA telemetry requires capturing deployment events and correlating them with incident records. The following model illustrates the event types you need to capture:
using System;
using System.Collections.Generic;
using System.Linq;

public record DeploymentEvent(
    string ServiceName,
    string CommitSha,
    DateTime CommitTimestamp,
    DateTime DeployedAtUtc,
    DeploymentOutcome Outcome);

public enum DeploymentOutcome { Success, RolledBack, HotfixRequired }

public class DoraMetricsCalculator
{
    private readonly IReadOnlyList<DeploymentEvent> _events;

    public DoraMetricsCalculator(IReadOnlyList<DeploymentEvent> events)
        => _events = events;

    /// <summary>
    /// Deployment Frequency: average deployments per day over the window.
    /// </summary>
    public double DeploymentFrequency(TimeSpan window)
    {
        var cutoff = DateTime.UtcNow - window;
        var inWindow = _events.Where(e => e.DeployedAtUtc >= cutoff).ToList();
        return inWindow.Count / window.TotalDays;
    }

    /// <summary>
    /// Lead Time: median time from commit to production deploy.
    /// </summary>
    public TimeSpan MedianLeadTime()
    {
        var leadTimes = _events
            .Select(e => e.DeployedAtUtc - e.CommitTimestamp)
            .OrderBy(t => t)
            .ToList();
        if (leadTimes.Count == 0) return TimeSpan.Zero;

        int mid = leadTimes.Count / 2;
        // Odd count: the middle value. Even count: mean of the two middle values.
        return leadTimes.Count % 2 == 1
            ? leadTimes[mid]
            : TimeSpan.FromTicks((leadTimes[mid - 1].Ticks + leadTimes[mid].Ticks) / 2);
    }

    /// <summary>
    /// Change Failure Rate: fraction of deployments that were rolled back
    /// or required a hotfix.
    /// </summary>
    public double ChangeFailureRate()
    {
        if (_events.Count == 0) return 0;
        int failures = _events.Count(e =>
            e.Outcome is DeploymentOutcome.RolledBack or DeploymentOutcome.HotfixRequired);
        return (double)failures / _events.Count;
    }
}
10.3 Using Metrics as Migration Gates
Rather than declaring migration phases complete on a calendar date, bind phase transitions to DORA thresholds:
- Phase 2 → Phase 3 (Strangling): Only begin strangling the next capability once the newly extracted service has achieved a Deployment Frequency of ≥ 1/day for 30 consecutive days and a Change Failure Rate below 10%.
- Phase 3 → Phase 4 (Decommission): Only decommission the monolith once the new services have MTTR ≤ 30 minutes, demonstrating the on-call team can resolve incidents in the microservice estate without the safety net of the monolith as a fallback.
This transforms abstract migration milestones into measurable engineering outcomes, which is precisely the language executive stakeholders understand when making infrastructure investment decisions.
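The two phase transitions above can be expressed as small predicate functions. This is a sketch under stated assumptions: the input is one deployment count per calendar day (oldest first), failure counts cover the same period, and the function names are hypothetical.

```typescript
// True when each of the most recent `requiredDays` days had at least one deployment.
// dailyDeployCounts: one entry per calendar day, oldest first (assumed input shape).
function hasDailyDeployStreak(dailyDeployCounts: number[], requiredDays: number): boolean {
  if (dailyDeployCounts.length < requiredDays) return false;
  return dailyDeployCounts.slice(-requiredDays).every((count) => count >= 1);
}

// Phase 2 -> Phase 3: at least 1 deployment per day for 30 consecutive days
// and a Change Failure Rate below 10%.
function mayExtractNextCapability(
  dailyDeployCounts: number[],
  failedDeploys: number,
  totalDeploys: number,
): boolean {
  if (totalDeploys === 0) return false; // no evidence yet
  const changeFailureRate = failedDeploys / totalDeploys;
  return hasDailyDeployStreak(dailyDeployCounts, 30) && changeFailureRate < 0.1;
}

// Phase 3 -> Phase 4: MTTR of the new services is at most 30 minutes.
function mayDecommissionMonolith(mttrMinutes: number): boolean {
  return mttrMinutes <= 30;
}
```

Encoding the thresholds in code means the gate can run in the same pipeline that collects the metrics, so a phase transition is backed by a reproducible check rather than a status-meeting judgement.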
10.4 Baseline First, Then Improve
A critical error teams make is not capturing a baseline measurement before the migration begins. Without a pre-migration baseline, you cannot prove improvement. Instrument the legacy monolith's deployment pipeline on day one, even if deployments only happen monthly. That single data point—one deployment per month—becomes the "before" state that your post-migration metrics are compared against.
Module 15 → Module 16: Defining technical migration patterns establishes the physical implementation rules, but driving organizational buy-in and documenting key engineering decisions requires architectural governance. Proceed to Module 16: Governance, ADRs, & The Architecture Review Board (ARB) to discover how to align engineering teams and record architectural choices.