Theoretical Foundations
Welcome to the curriculum workspace. Here you will find long-form technical guidelines outlining core architectural blueprints and implementation mechanics.
Module 12: CQRS & Event Sourcing
PREREQUISITE STATEMENT: This module assumes familiarity with event-driven architecture primitives covered in Module 9: Event-Driven Architectures (EDA), choreography and orchestration trade-offs from Module 11: Sagas & Distributed Transactions, and storage engine internals from Module 5: Storage Paradigms. If you have not completed those modules, do so before proceeding. The patterns introduced here — CQRS and Event Sourcing — are frequently misunderstood in isolation and are most coherent when viewed as natural extensions of event-driven thinking.
Introduction
Every production database system ever deployed encodes an implicit assumption: that the schema and data model used to write data is also the most efficient shape for reading it back. This assumption holds for toy applications and simple internal tooling. It collapses under the weight of real-world asymmetric access patterns.
The central observation that drives this entire module is deceptively simple: in most real-world systems, reads massively outnumber writes. A social media platform processes thousands of post writes per second, but serves millions of feed reads per second. An e-commerce platform processes hundreds of order submissions per minute, but its product catalog is queried billions of times per day. A banking system records thousands of transactions per hour, but customers and compliance systems interrogate the ledger continuously.
This asymmetry is not incidental — it is structural. And it demands a fundamentally different approach to data modeling.
Command Query Responsibility Segregation (CQRS) and Event Sourcing are two orthogonal but synergistic patterns that address this challenge. CQRS partitions the data access layer into two distinct pipelines — one optimized for writing, another optimized for reading. Event Sourcing reframes the persistence model entirely, replacing snapshot-based state tables with an immutable, append-only log of everything that has ever happened to a domain entity.
Together, they provide capabilities that no single unified data model can match: complete audit trails, temporal queries, scalable read projections, independent scaling of read and write paths, and near-zero friction for debugging production anomalies by replaying history.
This module provides an exhaustive treatment of both patterns. You will learn where they came from, how they work mechanically, when to apply them, and — critically — when not to. Every concept is grounded in production TypeScript implementations and validated against real-world case studies from organizations that operate at scale.
Section 1: The Read/Write Asymmetry Problem
A. The Ratio That Breaks CRUD
In an introductory database course, you are taught the Create/Read/Update/Delete (CRUD) model. A single table holds the canonical state of each entity. Every application layer — the write path, the read path, the search path — reads from and writes to this table. The schema is normalized according to Boyce-Codd Normal Form to eliminate redundancy and preserve data integrity.
This model is elegant. It is also catastrophically inefficient at scale.
Consider the read-to-write ratio on a selection of canonical production systems:
| System Type | Typical Read:Write Ratio | Dominant Read Pattern |
|---|---|---|
| Social media feed (Twitter/X, Facebook) | 100:1 to 1000:1 | Timeline aggregation across followed accounts |
| Product catalog (Amazon, Shopify) | 500:1 | Faceted search, related products, price lookups |
| Banking ledger | 10:1 to 50:1 | Balance inquiries, statement generation |
| Blog platform | 50:1 | Article rendering with author, tags, comment counts |
| E-commerce order management | 20:1 | Order status, fulfillment tracking, invoice generation |
A 100:1 read-to-write ratio means that for every row you write, you query it 100 times. If you optimize your schema for writes (normalize it), every read must execute expensive multi-table JOIN operations. If you optimize your schema for reads (denormalize it), your writes must maintain redundant copies of data and risk anomalies during updates.
This is not an implementation detail. It is a fundamental tension in relational data modeling.
B. The Anatomy of the Mismatch: A Blog Platform Case Study
To make this concrete, examine a canonical blog platform. The write model is normalized following standard database design principles:
Write Model (Normalized Schema):
Table: articles
+-----------+------------+------------+-----------+
| id (PK) | title | body | author_id |
+-----------+------------+------------+-----------+
| 1 | "CQRS 101" | "..." | 42 |
| 2 | "Sagas" | "..." | 17 |
+-----------+------------+------------+-----------+
Table: authors
+--------+-------------+------------------+
| id (PK)| display_name| profile_image_url|
+--------+-------------+------------------+
| 42 | "Alice Wang" | "..." |
| 17 | "Bob Patel" | "..." |
+--------+-------------+------------------+
Table: article_tags (many-to-many junction)
+------------+---------+
| article_id | tag_id |
+------------+---------+
| 1 | 8 |
| 1 | 14 |
| 2 | 8 |
+------------+---------+
Table: comments
+--------+------------+---------+-----------------------------+
| id (PK)| article_id | user_id | body |
+--------+------------+---------+-----------------------------+
| 901 | 1 | 55 | "Great explanation of CQRS" |
| 902 | 1 | 67 | "Helped me so much" |
+--------+------------+---------+-----------------------------+
To render a single article card in the blog feed, the application must execute:
SELECT
a.id,
a.title,
au.display_name AS author_name,
au.profile_image_url,
COUNT(c.id) AS comment_count,
STRING_AGG(t.name, ', ') AS tags
FROM articles a
JOIN authors au ON a.author_id = au.id
LEFT JOIN comments c ON c.article_id = a.id
LEFT JOIN article_tags at ON at.article_id = a.id
LEFT JOIN tags t ON t.id = at.tag_id
WHERE a.id = $1
GROUP BY a.id, au.display_name, au.profile_image_url;
This query executes five table scans, four JOIN operations, and a GROUP BY aggregation for every single article card rendered. On a blog with 50,000 articles and 2 million comments, this query can take 80 to 300 milliseconds per call under production load. At 50:1 read-to-write ratio, this becomes the primary bottleneck in the system.
The read model that your users actually need looks nothing like the normalized write schema:
// Read Model (Denormalized Article Card View):
{
"articleId": 1,
"title": "CQRS 101",
"authorName": "Alice Wang",
"authorAvatarUrl": "https://cdn.mpc.io/avatars/alice.jpg",
"tags": ["distributed-systems", "patterns"],
"commentCount": 2,
"publishedAt": "2026-01-15T09:30:00Z",
"excerpt": "In most production systems, reads massively outnumber..."
}
This pre-joined, pre-aggregated JSON document can be served from a Redis cache or an Elasticsearch index in under 2 milliseconds — 40 to 150 times faster than the JOIN query against the normalized schema.
The insight of CQRS is to treat these two shapes — the write model and the read model — as first-class, separately maintained data structures, each optimized for its specific access pattern.
C. Why ORM-Based Abstractions Amplify the Problem
Object-Relational Mappers (ORMs) such as TypeORM, Prisma, Hibernate, or ActiveRecord create a single class hierarchy that maps to the database schema and is used for both reads and writes. This convenience becomes a liability at scale:
- Overfetching: ORM queries retrieve full entity objects (with all columns) even when only two fields are needed for a list view.
- N+1 Queries: Lazy-loading associations causes ORM frameworks to execute one query for each item in a list, turning a 50-item feed into 51 database round trips.
- Impedance Mismatch: The ORM's object graph cannot be efficiently serialized into the shape clients need without additional transformation layers that add CPU overhead.
- Schema Lock-in: Adding a new field to support a new read requirement forces a schema migration on the write table, which may require locking on large tables.
The mathematical consequence of the N+1 problem is straightforward. If your feed renders $K$ articles per page and each article requires $J$ additional associated lookups (tags, author, comments):
$$\text{DB Queries per Page Load} = 1 + (K \times J)$$
For $K = 20$ articles and $J = 3$ additional lookups each:
$$\text{DB Queries per Page Load} = 1 + (20 \times 3) = 61\text{ queries}$$
At a read-to-write ratio of 50:1 with 1,000 writes per minute, the system must execute:
$$61 \times 50{,}000 = 3{,}050{,}000\text{ queries per minute}$$
This load saturates a standard database connection pool and collapses under moderate traffic. CQRS eliminates this by maintaining pre-built read projections that can be retrieved with a single key lookup.
Section 2: CQRS — Command Query Responsibility Segregation
A. Intellectual Origins
The philosophical root of CQRS lies in Bertrand Meyer's Command-Query Separation (CQS) principle, introduced in his 1988 book "Object-Oriented Software Construction." Meyer articulated a constraint that is elegant in its simplicity:
"Every method should either be a command that performs an action, or a query that returns data to the caller, but not both."
A query is a pure function: it returns a value and produces no observable side effects. A command changes the state of the system and returns nothing (or at most an acknowledgment of receipt). This dual classification eliminates a class of subtle bugs in which a function both mutates state and returns data, making reasoning about program state difficult.
In 2010, Greg Young extended Meyer's CQS principle from the method level to the architectural level, coining the term Command Query Responsibility Segregation (CQRS). Young's key insight was that the same principle that clarifies method design also clarifies system design: the infrastructure that processes writes and the infrastructure that serves reads should be separate, independently scalable components.
graph LR
subgraph "Bertrand Meyer's CQS (1988)"
M1[method query()] --> |returns value, no side effects| RV[Return Value]
M2[method command()] --> |mutates state, returns void| SS[State Change]
end
subgraph "Greg Young's CQRS (2010)"
CMD[CommandHandler] --> |writes events, returns void| ES[(Event Store)]
QRY[QueryHandler] --> |reads projection, returns DTO| RM[(Read Model DB)]
end
B. The Core Architecture
A CQRS system divides all operations into two mutually exclusive categories:
CQRS Architectural Partition
+-----------------------------------------------------------+
| CLIENT APPLICATION |
+-----------------------------------------------------------+
| |
| Commands (mutations) | Queries (reads)
| (no return value) | (no side effects)
v v
+-------------------+ +-------------------+
| COMMAND SIDE | | QUERY SIDE |
| | | |
| - CommandBus | | - QueryBus |
| - CommandHandlers | | - QueryHandlers |
| - Domain Model | | - Read Model DTOs |
| - Event Store | | - Projections |
+-------------------+ +-------------------+
| ^
| Domain Events | Async updates
v |
+-------------------------------------------+
| PROJECTION / SYNC ENGINE |
| (Projectionist workers subscribe to |
| event stream and update read models) |
+-------------------------------------------+
1. The Command Side (Write Model)
The command side handles all state mutations. The design vocabulary on the command side is intentional:
Commands represent intent, not data. A command is a request for the system to do something. It is named in the imperative, present tense: PlaceOrderCommand, RegisterUserCommand, CancelSubscriptionCommand, TransferFundsCommand. A command may be rejected if business validation fails. It carries the context and parameters needed to execute the operation, but it does not carry a return value — only an acknowledgment or a rejection error.
Command Handlers contain business logic. Each command is routed by a CommandBus to a single CommandHandler responsible for that command type. The handler loads the relevant domain aggregate, invokes domain logic, validates invariants, and — if validation passes — emits domain events to the event store.
Domain Events record what happened. After a command succeeds, the command side emits domain events: OrderPlacedEvent, UserRegisteredEvent, SubscriptionCancelledEvent. These events describe immutable facts about what occurred in the domain, past tense. They are broadcast to the event store and then consumed asynchronously by the projection engine.
2. The Query Side (Read Model)
The query side handles all reads. Its job is singular: serve pre-built, pre-joined, pre-aggregated data as fast as possible.
Queries describe data requirements, not mutations. A query is also named expressively: GetOrderByIdQuery, ListRecentOrdersByCustomerQuery, SearchProductsByCategoryQuery. Queries carry filter parameters and pagination hints but make no changes to system state.
Query Handlers read from optimized projections. A QueryHandler does not touch the normalized write database or the event store. It queries a read model — a Redis hash, an Elasticsearch document, a denormalized PostgreSQL materialized view — that was built and maintained by the projection engine. The read model is shaped precisely for what the client needs: no joins, no aggregations, no N+1 issues.
Queries are horizontally scalable without coordination. Because queries make no writes and read from eventually consistent projections, the query layer can be scaled to dozens of instances behind a load balancer with no coordination overhead.
C. Complete TypeScript CQRS Implementation
The following implementation demonstrates the full command/query pipeline for an order management domain. This is not pseudocode — it represents the structure of a production CQRS system.
// --- SHARED TYPES ---
export interface Command {
readonly commandId: string;
readonly correlationId: string;
readonly issuedAt: Date;
}
export interface Query<TResult> {
readonly queryId: string;
}
export interface CommandResult {
readonly success: boolean;
readonly aggregateId?: string;
readonly error?: string;
}
// --- DOMAIN EVENTS ---
export interface DomainEvent {
readonly eventId: string;
readonly eventType: string;
readonly aggregateId: string;
readonly occurredAt: Date;
readonly version: number;
readonly data: Record<string, unknown>;
readonly metadata: Record<string, unknown>;
}
// --- COMMAND: PlaceOrderCommand ---
export class PlaceOrderCommand implements Command {
public readonly commandId: string;
public readonly correlationId: string;
public readonly issuedAt: Date;
constructor(
public readonly customerId: string,
public readonly items: Array<{ productId: string; quantity: number; unitPrice: number }>,
public readonly shippingAddressId: string,
correlationId: string,
) {
this.commandId = crypto.randomUUID();
this.correlationId = correlationId;
this.issuedAt = new Date();
}
}
// --- COMMAND HANDLER: PlaceOrderCommandHandler ---
export class PlaceOrderCommandHandler {
constructor(
private readonly orderRepository: OrderRepository,
private readonly eventStore: EventStore,
private readonly customerValidator: CustomerValidator,
private readonly inventoryService: InventoryService,
) {}
async handle(command: PlaceOrderCommand): Promise<CommandResult> {
// Step 1: Validate the customer exists and has payment credentials
const customerValid = await this.customerValidator.validate(command.customerId);
if (!customerValid) {
return { success: false, error: `Customer ${command.customerId} not found or lacks payment credentials` };
}
// Step 2: Validate inventory availability for all items
for (const item of command.items) {
const available = await this.inventoryService.checkAvailability(item.productId, item.quantity);
if (!available) {
return { success: false, error: `Product ${item.productId} has insufficient inventory` };
}
}
// Step 3: Compute order total
const totalAmount = command.items.reduce(
(sum, item) => sum + item.quantity * item.unitPrice,
0,
);
// Step 4: Generate the aggregate ID for the new order
const orderId = crypto.randomUUID();
// Step 5: Construct the domain event (this is the ONLY write)
const orderPlacedEvent: DomainEvent = {
eventId: crypto.randomUUID(),
eventType: 'OrderPlaced',
aggregateId: orderId,
occurredAt: new Date(),
version: 1,
data: {
customerId: command.customerId,
items: command.items,
totalAmount,
shippingAddressId: command.shippingAddressId,
status: 'PENDING',
},
metadata: {
correlationId: command.correlationId,
commandId: command.commandId,
},
};
// Step 6: Append the event to the event store (append-only, no UPDATE operations)
await this.eventStore.appendToStream(`order-${orderId}`, [orderPlacedEvent], {
expectedVersion: 0, // This is a new stream; fail if it already exists
});
return { success: true, aggregateId: orderId };
}
}
// --- QUERY: GetOrderByIdQuery ---
export class GetOrderByIdQuery implements Query<OrderDetailReadModel> {
public readonly queryId: string;
constructor(
public readonly orderId: string,
public readonly requestingUserId: string,
) {
this.queryId = crypto.randomUUID();
}
}
// --- READ MODEL: OrderDetailReadModel ---
export interface OrderDetailReadModel {
orderId: string;
status: 'PENDING' | 'CONFIRMED' | 'SHIPPED' | 'DELIVERED' | 'CANCELLED';
customerName: string;
customerEmail: string;
items: Array<{
productId: string;
productName: string;
quantity: number;
unitPrice: number;
lineTotal: number;
}>;
totalAmount: number;
shippingAddress: string;
placedAt: Date;
lastUpdatedAt: Date;
}
// --- QUERY HANDLER: GetOrderByIdQueryHandler ---
export class GetOrderByIdQueryHandler {
constructor(
private readonly readModelDb: ReadModelRepository,
private readonly authorizationService: AuthorizationService,
) {}
async handle(query: GetOrderByIdQuery): Promise<OrderDetailReadModel | null> {
// Verify the requesting user has permission to view this order
const authorized = await this.authorizationService.canViewOrder(
query.requestingUserId,
query.orderId,
);
if (!authorized) {
throw new UnauthorizedError(`User ${query.requestingUserId} cannot access order ${query.orderId}`);
}
// Read directly from the denormalized read model — no JOINs, no aggregation
const orderDetail = await this.readModelDb.findOrderDetail(query.orderId);
return orderDetail ?? null;
}
}
D. Eventual Consistency Between the Two Sides
The command side and the query side do not share a database. The command side writes events to the event store. The projection engine asynchronously consumes those events and updates the read model database. This introduces eventual consistency: there is a window of time — typically measured in milliseconds to seconds — during which the read model does not yet reflect the outcome of a recently processed command.
This is not a defect. It is an explicit architectural trade-off. The read side becomes available for independent scaling, independent technology choice (Redis, Elasticsearch, Cassandra, PostgreSQL materialized views), and independent performance tuning, in exchange for a bounded window of data staleness.
The projection lag can be quantified:
$$\Delta t_{\text{projection}} = t_{\text{read model updated}} - t_{\text{event appended to store}}$$
Under normal operating conditions, this lag is typically 50–200ms for an event-driven projection system using persistent subscriptions to EventStoreDB. Under load spikes or projection worker failures, this can grow to seconds or minutes.
Communicating this to users requires deliberate UI design:
- Optimistic updates: The UI immediately reflects the command outcome locally before the server projection catches up. If the command is rejected, the UI rolls back.
- Eventual consistency hints: After a write, display "Your order is being processed" rather than immediately fetching the order from the read model.
- Read-after-write tokens: Attach a version/checkpoint to the command acknowledgment. The query handler waits until the read model has processed at least up to that checkpoint before returning results.
E. Consistency Model Comparison
| Property | CQRS System | Traditional CRUD |
|---|---|---|
| Read/write schema coupling | Decoupled — separate data shapes | Tightly coupled — one schema serves both |
| Read scalability | Horizontal scale of read replicas, projections | Limited by primary database replication lag |
| Write optimization | Normalized, ACID, domain-validated | Normalized, but must also serve read patterns |
| Read latency | Sub-millisecond (Redis/Elasticsearch) | Tens to hundreds of ms (JOIN queries) |
| Operational complexity | High — two pipelines, eventual consistency | Low — single pipeline, immediate consistency |
| Consistency model | Eventual (read side lags write side) | Immediate (reads reflect writes at commit) |
| Suitable for | High read/write ratio, complex domain | Low traffic, simple domain, small teams |
Section 3: Event Sourcing
A. The Paradigm Shift
Every database you have worked with operates under the same assumption: persistence means storing the current state of an entity. When you update a row in PostgreSQL, the previous value is gone. The database stores a mutable snapshot — the entity as it exists right now.
Event Sourcing inverts this assumption entirely. Instead of persisting the current state, you persist the sequence of events that led to that state. The current state is not stored at all — it is derived on demand by replaying the event log.
Traditional Snapshot Model:
Entity: Order #42
+----------+------------+-----------+--------+
| order_id | status | total | items |
+----------+------------+-----------+--------+
| 42 | DELIVERED | $289.99 | [...] |
+----------+------------+-----------+--------+
(History is gone. You cannot know what the order looked like last Tuesday.)
Event Sourced Model:
Stream: order-42
+-----+-----------------------+--------------------+---------+
| seq | eventType | occurredAt | version |
+-----+-----------------------+--------------------+---------+
| 1 | OrderPlaced | 2026-01-10 09:01 | 1 |
| 2 | PaymentConfirmed | 2026-01-10 09:03 | 2 |
| 3 | OrderItemAdded | 2026-01-10 09:05 | 3 |
| 4 | OrderShipped | 2026-01-11 14:22 | 4 |
| 5 | OrderDelivered | 2026-01-14 11:09 | 5 |
+-----+-----------------------+--------------------+---------+
(Full history preserved. Reconstruct the state at any point in time.)
The current state of an entity is computed as a fold (reduce operation) over the event log:
$$\text{CurrentState} = \text{fold}([\text{event}_1, \text{event}_2, \ldots, \text{event}_n], \text{initialState}, \text{applyEvent})$$
Where applyEvent is a pure function that takes the current state and an event, and returns a new state.
This is isomorphic to how functional programming manages state transitions: a left fold over an immutable sequence of state-transforming functions.
B. The Event Store
The event store is the append-only persistence layer at the heart of an event-sourced system. Its semantics are fundamentally different from a relational database:
- No UPDATE operations. Events are immutable once written. You never modify a past event.
- No DELETE operations. The log is the source of truth. Deletion would corrupt the historical record.
- Streams, not tables. Events are organized into streams, where each stream corresponds to a single aggregate (e.g.,
order-42,user-7891,account-delta-prime). - Version numbers. Each event in a stream carries a monotonically increasing version number. Optimistic concurrency control uses this version to prevent concurrent writers from producing conflicting state.
EventStoreDB is the canonical purpose-built event store. It is an append-only database with native support for persistent subscriptions (consumers that subscribe to streams and receive events in order), projections (server-side stream transformations), and catch-up subscriptions (for projection rebuilds).
An event record contains the following canonical fields:
Event Record Structure:
+------------------+-------------------------------------------------------+
| Field | Description |
+------------------+-------------------------------------------------------+
| eventId | UUID — globally unique identifier for this event |
| streamId | The aggregate stream (e.g., "order-42") |
| eventType | Discriminator string (e.g., "OrderPlaced") |
| data | JSON payload — the business data of the event |
| metadata | JSON — correlation IDs, causation IDs, user context |
| occurredAt | UTC timestamp of when the event was created |
| version | Monotonic sequence number within the stream |
| globalPosition | Position in the global event log (all streams) |
+------------------+-------------------------------------------------------+
C. Domain Events and the Aggregate Root
In Domain-Driven Design (DDD), an Aggregate Root is the top-level entity in an aggregate cluster. It encapsulates all state and invariants for the cluster and is the single unit of transactional consistency.
In an event-sourced system, the aggregate root is responsible for:
- Validating that a command can be applied given the current state.
- Emitting the appropriate domain events that record what happened.
- Applying those events to evolve its own internal state.
The key discipline is the apply pattern: the aggregate never mutates state directly. All state changes happen through the apply(event) method, which is also used during event replay to reconstruct the aggregate from the event log.
// --- DOMAIN EVENTS ---
export interface OrderPlacedEvent extends DomainEvent {
eventType: 'OrderPlaced';
data: {
customerId: string;
items: Array<{ productId: string; quantity: number; unitPrice: number }>;
totalAmount: number;
shippingAddressId: string;
};
}
export interface OrderItemAddedEvent extends DomainEvent {
eventType: 'OrderItemAdded';
data: {
productId: string;
quantity: number;
unitPrice: number;
};
}
export interface OrderCancelledEvent extends DomainEvent {
eventType: 'OrderCancelled';
data: {
reason: string;
cancelledByUserId: string;
};
}
export interface OrderShippedEvent extends DomainEvent {
eventType: 'OrderShipped';
data: {
trackingNumber: string;
shippedAt: string;
};
}
export type OrderEvent =
| OrderPlacedEvent
| OrderItemAddedEvent
| OrderCancelledEvent
| OrderShippedEvent;
// --- AGGREGATE ROOT: Order ---
export type OrderStatus = 'UNINITIALIZED' | 'PENDING' | 'CONFIRMED' | 'SHIPPED' | 'CANCELLED';
export interface OrderItem {
productId: string;
quantity: number;
unitPrice: number;
}
export class Order {
private _orderId: string = '';
private _customerId: string = '';
private _status: OrderStatus = 'UNINITIALIZED';
private _items: OrderItem[] = [];
private _totalAmount: number = 0;
private _version: number = 0;
// Pending uncommitted events (emitted during command handling, to be persisted)
private _pendingEvents: OrderEvent[] = [];
get orderId() { return this._orderId; }
get status() { return this._status; }
get version() { return this._version; }
get pendingEvents() { return [...this._pendingEvents]; }
// -- COMMAND METHODS --
place(customerId: string, items: OrderItem[], shippingAddressId: string): void {
if (this._status !== 'UNINITIALIZED') {
throw new Error('Order has already been placed');
}
if (items.length === 0) {
throw new Error('An order must contain at least one item');
}
const totalAmount = items.reduce((sum, item) => sum + item.quantity * item.unitPrice, 0);
const event: OrderPlacedEvent = {
eventId: crypto.randomUUID(),
eventType: 'OrderPlaced',
aggregateId: crypto.randomUUID(),
occurredAt: new Date(),
version: this._version + 1,
data: { customerId, items, totalAmount, shippingAddressId },
metadata: {},
};
this.applyAndRecord(event);
}
addItem(productId: string, quantity: number, unitPrice: number): void {
if (this._status !== 'PENDING') {
throw new Error(`Cannot add items to an order in status ${this._status}`);
}
const event: OrderItemAddedEvent = {
eventId: crypto.randomUUID(),
eventType: 'OrderItemAdded',
aggregateId: this._orderId,
occurredAt: new Date(),
version: this._version + 1,
data: { productId, quantity, unitPrice },
metadata: {},
};
this.applyAndRecord(event);
}
cancel(reason: string, cancelledByUserId: string): void {
if (this._status === 'SHIPPED' || this._status === 'CANCELLED') {
throw new Error(`Cannot cancel an order in status ${this._status}`);
}
const event: OrderCancelledEvent = {
eventId: crypto.randomUUID(),
eventType: 'OrderCancelled',
aggregateId: this._orderId,
occurredAt: new Date(),
version: this._version + 1,
data: { reason, cancelledByUserId },
metadata: {},
};
this.applyAndRecord(event);
}
// -- EVENT APPLICATION (used for both new events and replay) --
apply(event: OrderEvent): void {
switch (event.eventType) {
case 'OrderPlaced':
this._orderId = event.aggregateId;
this._customerId = event.data.customerId;
this._items = [...event.data.items];
this._totalAmount = event.data.totalAmount;
this._status = 'PENDING';
break;
case 'OrderItemAdded':
this._items.push({
productId: event.data.productId,
quantity: event.data.quantity,
unitPrice: event.data.unitPrice,
});
this._totalAmount += event.data.quantity * event.data.unitPrice;
break;
case 'OrderCancelled':
this._status = 'CANCELLED';
break;
case 'OrderShipped':
this._status = 'SHIPPED';
break;
}
this._version = event.version;
}
// -- RECONSTRUCTION FROM EVENT LOG --
static reconstitute(events: OrderEvent[]): Order {
const order = new Order();
for (const event of events) {
order.apply(event);
}
return order;
}
private applyAndRecord(event: OrderEvent): void {
this.apply(event);
this._pendingEvents.push(event);
}
}
// --- EVENT STORE INTERFACE ---
export interface AppendOptions {
expectedVersion: number; // -1 for new stream, N for optimistic concurrency
}
export interface EventStore {
appendToStream(
streamId: string,
events: DomainEvent[],
options: AppendOptions,
): Promise<void>;
readStream(
streamId: string,
fromVersion?: number,
): Promise<DomainEvent[]>;
readAllEvents(
fromGlobalPosition?: number,
): AsyncIterable<DomainEvent>;
}
D. Optimistic Concurrency Control in the Event Store
A critical property of the event store is its use of optimistic concurrency control via the expectedVersion parameter. When appending events to a stream, the caller declares what version they expect the stream to currently be at. The event store atomically checks this expectation:
- If the stream is at the expected version, the events are appended and the stream advances.
- If another writer has already appended events and the stream version is higher than expected, the append is rejected with a
WrongExpectedVersionException.
This prevents two concurrent command handlers from both loading an order at version 5, making decisions independently, and both attempting to append at version 6, creating a split-brain state. The second writer's attempt is rejected, and they must reload the current stream and retry their command with the updated state.
The mathematical invariant maintained by the event store is:
$$\forall \text{ stream } S: \text{version}(S_t) = \text{version}(S_{t-1}) + |\text{events appended at time } t|$$
No gaps, no overwrites, no concurrent conflicting appends.
Section 4: Projections and Projection Lag
A. What is a Projection?
A projection is a read model built by consuming the event stream and transforming it into a shape optimized for querying. The projection is not the source of truth — the event store is. The projection is a derived, disposable view. If a projection becomes corrupted, out-of-sync, or needs to be redesigned for a new query pattern, it can be deleted and rebuilt by replaying the entire event stream from the beginning.
This disposability is a profound architectural property. In a snapshot-based system, your read model IS the source of truth; corrupting it means permanent data loss. In an event-sourced system, the read model is a cache of derived state that can always be reconstructed.
Projections can take many forms:
| Projection Type | Storage Technology | Use Case |
|---|---|---|
| Summary View | Redis Hash | Order status lookups, user profiles |
| Search Index | Elasticsearch | Full-text search, faceted filtering |
| Reporting Aggregate | PostgreSQL materialized view | Dashboard metrics, financial summaries |
| Graph Projection | Neo4j | Social graph, recommendation engines |
| Time-Series Projection | InfluxDB / TimescaleDB | Analytics, monitoring dashboards |
B. The Projectionist Pattern
The Projectionist (also called the Projection Engine or Event Processor) is the background worker that subscribes to the event store and maintains the read models. It is a long-running process that:
- Subscribes to the event store using a persistent subscription or catch-up subscription.
- Receives events in global order as they are appended.
- Routes each event to the appropriate projection handler(s).
- Updates the read model database accordingly.
- Checkpoints its progress (stores the last processed global position) so it can resume after a restart without reprocessing all historical events.
graph TD
CMD[Command Handler] -->|AppendToStream| ES[(EventStoreDB)]
ES -->|Persistent Subscription| PE[Projection Engine / Projectionist]
PE -->|Update| REDIS[(Redis: Order Summary)]
PE -->|Index| ES_IDX[(Elasticsearch: Search Index)]
PE -->|Upsert| PG[(PostgreSQL: Reporting Views)]
PE -->|Checkpoint| CP[(Checkpoint Store)]
QH[Query Handler] -->|ReadModel query| REDIS
QH -->|Search query| ES_IDX
CLIENT[Client] -->|Command| CMD
CLIENT -->|Query| QH
C. TypeScript Projection Handler Implementation
// --- PROJECTION: OrderSummaryProjection ---
export interface OrderSummaryRecord {
orderId: string;
customerId: string;
status: string;
totalAmount: number;
itemCount: number;
placedAt: string;
lastEventAt: string;
}
export class OrderSummaryProjection {
constructor(
private readonly redis: RedisClient,
private readonly checkpointStore: CheckpointStore,
) {}
private orderKey(orderId: string): string {
return `order:summary:${orderId}`;
}
async handle(event: DomainEvent): Promise<void> {
switch (event.eventType) {
case 'OrderPlaced':
await this.onOrderPlaced(event as OrderPlacedEvent);
break;
case 'OrderItemAdded':
await this.onOrderItemAdded(event as OrderItemAddedEvent);
break;
case 'OrderCancelled':
await this.onOrderCancelled(event as OrderCancelledEvent);
break;
case 'OrderShipped':
await this.onOrderShipped(event as OrderShippedEvent);
break;
}
// Checkpoint: record the global position of the last processed event
await this.checkpointStore.save('order-summary-projection', event.metadata['globalPosition'] as number);
}
private async onOrderPlaced(event: OrderPlacedEvent): Promise<void> {
const summary: OrderSummaryRecord = {
orderId: event.aggregateId,
customerId: event.data.customerId,
status: 'PENDING',
totalAmount: event.data.totalAmount,
itemCount: event.data.items.length,
placedAt: event.occurredAt.toISOString(),
lastEventAt: event.occurredAt.toISOString(),
};
await this.redis.hSet(this.orderKey(event.aggregateId), summary as unknown as Record<string, string>);
// Also maintain a customer-orders index for list queries
await this.redis.zAdd(`customer:orders:${event.data.customerId}`, {
score: event.occurredAt.getTime(),
value: event.aggregateId,
});
}
private async onOrderItemAdded(event: OrderItemAddedEvent): Promise<void> {
await this.redis.hIncrBy(this.orderKey(event.aggregateId), 'itemCount', 1);
await this.redis.hIncrByFloat(
this.orderKey(event.aggregateId),
'totalAmount',
event.data.quantity * event.data.unitPrice,
);
await this.redis.hSet(this.orderKey(event.aggregateId), 'lastEventAt', event.occurredAt.toISOString());
}
private async onOrderCancelled(event: OrderCancelledEvent): Promise<void> {
await this.redis.hSet(this.orderKey(event.aggregateId), {
status: 'CANCELLED',
lastEventAt: event.occurredAt.toISOString(),
});
}
private async onOrderShipped(event: OrderShippedEvent): Promise<void> {
await this.redis.hSet(this.orderKey(event.aggregateId), {
status: 'SHIPPED',
lastEventAt: event.occurredAt.toISOString(),
});
}
}
// --- PROJECTION ENGINE ---
export class ProjectionEngine {
private isRunning = false;
constructor(
private readonly eventStore: EventStore,
private readonly projections: Array<{ handle: (event: DomainEvent) => Promise<void> }>,
private readonly checkpointStore: CheckpointStore,
private readonly metricsCollector: MetricsCollector,
) {}
async start(): Promise<void> {
this.isRunning = true;
const lastCheckpoint = await this.checkpointStore.load('projection-engine') ?? 0;
for await (const event of this.eventStore.readAllEvents(lastCheckpoint)) {
if (!this.isRunning) break;
const startTime = Date.now();
for (const projection of this.projections) {
await projection.handle(event);
}
const processingLatency = Date.now() - startTime;
const eventAge = Date.now() - event.occurredAt.getTime();
// Track projection lag: how far behind the event store is the read model?
this.metricsCollector.gauge('projection.lag.ms', eventAge);
this.metricsCollector.histogram('projection.processing.latency.ms', processingLatency);
}
}
async stop(): Promise<void> {
this.isRunning = false;
}
}
D. Measuring and Managing Projection Lag
Projection lag is the most operationally critical metric in a CQRS/ES system. It can be measured as:
$$\text{lag}(t) = t_{\text{now}} - t_{\text{last event processed}}$$
Or equivalently, as a position delta in the global event log:
$$\text{position lag} = \text{GlobalHead} - \text{ProjectionCheckpoint}$$
Alert thresholds should be established based on the specific SLA of the system:
| Lag Threshold | Alert Level | Recommended Action |
|---|---|---|
| 0–200ms | Normal | No action |
| 200ms–2s | Warning | Investigate projection worker CPU/memory |
| 2s–30s | Critical | Page on-call; scale projection workers |
| >30s | Emergency | UI fallback to "processing" state; throttle commands |
Projection lag can be reduced by:
- Running multiple projection workers partitioned by aggregate ID range or stream prefix.
- Co-locating projection workers geographically with the event store.
- Optimizing per-event database write operations (use Redis pipelines, batch Elasticsearch indexing).
- Reducing the number of projections that a single event must update.
Section 5: Temporal Queries — Time-Travel Debugging
A. The Power of the Immutable Log
Because event sourcing preserves the complete history of every domain entity as an immutable, ordered log, it provides a capability that no snapshot-based database can match: temporal queries — the ability to reconstruct the exact state of any entity at any arbitrary point in time.
A typical auditor question that is impossible with a snapshot database but trivial with event sourcing:
"What was the balance of account ACC-4429-DELTA at 14:30:00 UTC on March 15, 2026, immediately before the suspicious transaction?"
With a traditional snapshot database, the answer requires either a dedicated audit table (which is often incomplete, forgeable, and expensive to maintain) or a point-in-time recovery of the database backup — a process that can take hours and affects the entire database.
With event sourcing, the answer requires a single query:
export class TemporalQueryService {
constructor(private readonly eventStore: EventStore) {}
async getOrderStateAt(orderId: string, pointInTime: Date): Promise<Order> {
const allEvents = await this.eventStore.readStream(`order-${orderId}`);
// Filter: only replay events that occurred before the requested timestamp
const eventsUpToPointInTime = allEvents.filter(
(event) => event.occurredAt <= pointInTime,
) as OrderEvent[];
if (eventsUpToPointInTime.length === 0) {
throw new Error(`Order ${orderId} did not exist at ${pointInTime.toISOString()}`);
}
// Reconstruct the aggregate state at that point in time
return Order.reconstitute(eventsUpToPointInTime);
}
async getAccountBalanceAt(accountId: string, pointInTime: Date): Promise<number> {
const events = await this.eventStore.readStream(`account-${accountId}`);
const relevantEvents = events.filter((e) => e.occurredAt <= pointInTime);
let balance = 0;
for (const event of relevantEvents) {
if (event.eventType === 'MoneyDeposited') {
balance += (event.data as { amount: number }).amount;
} else if (event.eventType === 'MoneyWithdrawn') {
balance -= (event.data as { amount: number }).amount;
}
}
return balance;
}
}
B. Use Cases for Temporal Queries
Temporal queries underpin a family of capabilities that have significant business and compliance value:
Debugging production anomalies: A customer claims an order was placed at a specific time with specific items, but the current state shows different items. With temporal queries, you can reconstruct the order state at the exact timestamp the customer reported and trace each subsequent mutation.
Compliance and regulatory reporting: PCI-DSS, SOX, and HIPAA require organizations to produce evidence that specific transactions occurred at specific times in specific states. Event sourcing provides this as a first-class capability.
Undo and rollback: Because you can compute the state at any past point, you can implement undo by simply resetting the read model projection to the state at time T-1. This is not available in snapshot systems without dedicated undo tables.
A/B testing and retrospective analysis: By replaying the event stream with a modified projection (e.g., "apply a new discount algorithm to past orders and compute what revenue would have been"), you can validate business hypotheses against real historical data without touching production state.
Temporal Query Visualization:
Event Stream: order-42
t=0 OrderPlaced --> { status: PENDING, total: $89.99, items: [A] }
t=1 OrderItemAdded --> { status: PENDING, total: $149.99, items: [A, B] }
t=2 PaymentConfirmed --> { status: CONFIRMED, total: $149.99, items: [A, B] }
t=3 OrderShipped --> { status: SHIPPED, total: $149.99, items: [A, B] }
Query: "What was the state at t=1.5?"
Answer: Replay t=0 and t=1 => { status: PENDING, total: $149.99, items: [A, B] }
(PaymentConfirmed at t=2 has not yet occurred in the timeline)
Section 6: Snapshots for Performance
A. The Replay Performance Problem
Event sourcing trades storage efficiency for historical completeness. A highly active aggregate — an order with hundreds of status updates, a bank account with decades of transactions, a game character with thousands of level-ups — can accumulate tens of thousands of events in its stream.
If the system must replay all $N$ events every time the aggregate is loaded for command processing, the computational cost scales linearly with stream length:
$$T_{\text{reconstitute}}(N) = N \times T_{\text{apply event}}$$
For an aggregate with $N = 10{,}000$ events and $T_{\text{apply event}} = 0.1\text{ms}$:
$$T_{\text{reconstitute}}(10{,}000) = 10{,}000 \times 0.1\text{ms} = 1{,}000\text{ms} = 1\text{ second}$$
One second of computation to load a single aggregate is unacceptable in a system that must process hundreds of commands per second.
B. The Snapshot Strategy
The solution is snapshotting: periodically persist a serialized snapshot of the aggregate's current state alongside its stream version. On subsequent loads, instead of replaying from the beginning of the stream, the system:
- Loads the most recent snapshot (a single database read).
- Reads only the events that occurred after the snapshot was taken.
- Applies only those newer events to the snapshot state.
The amortized reconstitution cost becomes:
$$T_{\text{reconstitute with snapshot}}(N) = T_{\text{load snapshot}} + (N \bmod K) \times T_{\text{apply event}}$$
Where $K$ is the snapshot frequency (every $K$ events). For $K = 100$:
$$T_{\text{reconstitute with snapshot}}(10{,}000) = T_{\text{load snapshot}} + (0 \text{ to } 100) \times T_{\text{apply event}}$$
The maximum reconstitution cost is bounded by $100 \times T_{\text{apply event}} = 10\text{ms}$, regardless of how many total events exist in the stream.
C. Snapshot Cadence Strategies
| Strategy | Trigger | Pros | Cons |
|---|---|---|---|
| Event Count | Every N events (e.g., N=100) | Simple, predictable | May snapshot during low-activity periods |
| Time-Based | Every T seconds/minutes | Bounded temporal lag | May snapshot aggregates with no recent changes |
| On-Demand | After commands that produce many events | Targeted efficiency | Requires command handler coordination |
| Background Compactor | Asynchronous process checks stale aggregates | No impact on write path | Adds operational complexity |
D. TypeScript Snapshot Implementation
// --- SNAPSHOT TYPES ---
export interface AggregateSnapshot {
aggregateId: string;
aggregateType: string;
streamVersion: number; // The event version at which this snapshot was taken
snapshotData: string; // JSON serialized aggregate state
takenAt: Date;
}
export interface SnapshotStore {
saveSnapshot(snapshot: AggregateSnapshot): Promise<void>;
loadLatestSnapshot(aggregateId: string, aggregateType: string): Promise<AggregateSnapshot | null>;
}
// --- SNAPSHOT-AWARE AGGREGATE REPOSITORY ---
export class SnapshotAwareOrderRepository {
private readonly SNAPSHOT_EVERY_N_EVENTS = 50;
constructor(
private readonly eventStore: EventStore,
private readonly snapshotStore: SnapshotStore,
) {}
async load(orderId: string): Promise<Order> {
// Step 1: Attempt to load the most recent snapshot
const snapshot = await this.snapshotStore.loadLatestSnapshot(orderId, 'Order');
let order: Order;
let fromVersion: number;
if (snapshot) {
// Step 2a: Hydrate the aggregate from the snapshot
order = Order.fromSnapshot(JSON.parse(snapshot.snapshotData));
fromVersion = snapshot.streamVersion + 1; // Only load events AFTER the snapshot
} else {
// Step 2b: No snapshot available — cold start from the beginning of the stream
order = new Order();
fromVersion = 0;
}
// Step 3: Load only the events after the snapshot version
const recentEvents = await this.eventStore.readStream(`order-${orderId}`, fromVersion);
// Step 4: Apply the remaining events
for (const event of recentEvents) {
order.apply(event as OrderEvent);
}
return order;
}
async save(order: Order): Promise<void> {
const pendingEvents = order.pendingEvents;
if (pendingEvents.length === 0) return;
// Persist the new events to the event store
await this.eventStore.appendToStream(`order-${order.orderId}`, pendingEvents, {
expectedVersion: order.version - pendingEvents.length,
});
// Check if we should take a snapshot after this save
if (order.version % this.SNAPSHOT_EVERY_N_EVENTS === 0) {
await this.snapshotStore.saveSnapshot({
aggregateId: order.orderId,
aggregateType: 'Order',
streamVersion: order.version,
snapshotData: JSON.stringify(order.toSnapshotData()),
takenAt: new Date(),
});
}
}
}
Section 7: Audit Ledgers and Compliance
A. Event Sourcing as the Ideal Audit Mechanism
The regulatory compliance requirements for enterprise software are exacting. PCI-DSS (Payment Card Industry Data Security Standard) requires tamper-evident records of all payment transactions. SOX (Sarbanes-Oxley Act) requires immutable audit trails of all financial record modifications. HIPAA requires healthcare organizations to maintain audit logs of all access to and modifications of protected health information (PHI).
Traditional approaches to auditing rely on one of two mechanisms, each with fundamental limitations:
Audit triggers: Database triggers that fire on INSERT, UPDATE, and DELETE operations and write copies of affected rows to a separate audit table. These are fragile: triggers can be disabled by privileged database users, they capture the data-layer representation rather than the business-level intent, they add write overhead to every operation, and they break during schema migrations.
Application-level audit logging: Application code explicitly writes audit records alongside state mutations. This is better than triggers but is error-prone — developers forget to add audit writes, audit logic is inconsistently applied across code paths, and the audit record is separate from the state that it documents.
Event sourcing eliminates both of these problems by making the audit trail the primary record. The event log is the system of record. There is no secondary table, no trigger to disable, no audit code path to forget. Every state change is already recorded as an immutable, timestamped, attributed event.
B. Comparing Audit Approaches
Traditional Audit Table Approach:
Table: orders (mutable state)
+----------+------------+ <-- Subject to UPDATE and DELETE
Table: orders_audit (tacked-on audit log)
+----------+------------+-----------+-----------+-----------+
| audit_id | order_id | old_status| new_status| changed_by|
| 1 | 42 | PENDING | CONFIRMED | system |
+----------+------------+-----------+-----------+-----------+
Problems:
- Trigger can be dropped: DROP TRIGGER audit_orders ON orders;
- Schema changes to 'orders' break the trigger
- No business context -- why was the status changed?
- Cannot reconstruct full state at arbitrary point in time
Event Sourced Approach:
Stream: order-42 (append-only, immutable)
+---------+-------------------+----------------------------+
| version | eventType | data |
| 1 | OrderPlaced | { customerId, items, ... } |
| 2 | PaymentConfirmed | { paymentRef, amount, ...} |
| 3 | OrderShipped | { trackingNumber, ... } |
+---------+-------------------+----------------------------+
Properties:
- Cannot be deleted or modified (append-only store)
- Carries business intent ("PaymentConfirmed"), not just field deltas
- Full state reconstructable at any version
- Causation chain preserved via correlationId/causationId metadata
C. Immutability Guarantees in EventStoreDB
EventStoreDB provides several mechanisms that enforce immutability at the infrastructure level, not just the application level:
Append-only semantics: The EventStoreDB API does not expose UPDATE or DELETE operations on event records. Once an event is appended to a stream, it is physically immutable.
Stream ACLs: Access Control Lists on streams allow write-only access to application services, while compliance and audit systems receive read-only access. No application service can read-and-then-modify an event.
Scavenging and soft deletion: EventStoreDB's "scavenge" operation (log compaction) can physically delete old events, but this is a deliberate operational action requiring administrator-level credentials, not something that can be triggered by application logic. For compliance scenarios, scavenging is typically disabled entirely on ledger streams.
Content-Addressable Storage: Each event's hash can be verified against its storage position, detecting any tampering with the underlying storage engine.
Section 8: Advanced Topics
A. CQRS Without Event Sourcing
A critical architectural point that practitioners frequently miss: CQRS and Event Sourcing are orthogonal patterns. You can use one without the other.
CQRS without Event Sourcing is a common and often appropriate choice. The command side writes to a normalized relational database using standard SQL. The query side reads from a separate denormalized database or cache. A synchronization mechanism (change data capture, scheduled ETL, or explicit cache invalidation on write) keeps the read side updated. This delivers the scalability benefits of separate read and write models without the complexity of event log storage and projection management.
Event Sourcing without CQRS is theoretically possible but rare. An event-sourced aggregate can be the only data model in a system, serving both write commands and read queries. In practice, event sourcing's reconstitute-from-events read path is too slow for high-volume query scenarios, which naturally pushes teams toward CQRS to add optimized read projections.
The combination of both — CQRS with Event Sourcing — is the most powerful configuration and the primary subject of this module.
B. Process Managers vs. Sagas
Process Managers and Sagas (covered in depth in Module 11: Sagas & Distributed Transactions) are the mechanisms by which multi-aggregate workflows are coordinated in event-sourced systems. The distinction is important:
- Sagas are choreography-based: each service listens for domain events and reacts by producing commands or further events, with no central coordinator.
- Process Managers are orchestration-based: a dedicated process manager aggregate listens to events from multiple streams and explicitly dispatches commands to other services to drive a workflow to completion.
In an event-sourced system, a Process Manager is itself an event-sourced aggregate. It receives events, records its own state changes as events (e.g., OrderFulfillmentStarted, PaymentCommandDispatched, ShipmentCommandDispatched), and can be fully reconstructed from its own event stream.
C. Event Versioning and Schema Evolution
Production event-sourced systems inevitably encounter the need to change the structure of domain events over time. A business rule changes, a new field is required, or an old field is renamed. Because events are immutable, you cannot alter historical events in the stream. You must handle schema evolution explicitly.
The two primary strategies are:
Upcasting: When the system reads an old event version during replay, an upcaster transforms it to the current event version format before passing it to the aggregate's apply method. The transformation is applied at read time, not stored:
export class OrderPlacedEventUpcaster {
// Converts v1 event (without 'shippingAddressId') to v2 format
upcast(rawEvent: Record<string, unknown>): OrderPlacedEvent {
const version = rawEvent['schemaVersion'] as number ?? 1;
if (version === 1) {
return {
...(rawEvent as unknown as OrderPlacedEvent),
data: {
...(rawEvent['data'] as Record<string, unknown>),
shippingAddressId: 'LEGACY_DEFAULT_ADDRESS', // Backfill with default
schemaVersion: 2,
},
};
}
return rawEvent as unknown as OrderPlacedEvent;
}
}
Weak schema / tolerant reader: Events carry only the fields that existed when they were written. The aggregate's apply method uses optional chaining and default values to tolerate missing fields from older events:
case 'OrderPlaced':
this._shippingAddressId = event.data.shippingAddressId ?? 'LEGACY_DEFAULT_ADDRESS';
break;
The second approach (tolerant reader) is simpler to implement but can accumulate technical debt as the number of supported "old shapes" grows. Upcasting is more explicit and maintains clean aggregate code at the cost of maintaining the upcaster chain.
D. Idempotent Event Processing in Projections
A projection worker may receive the same event multiple times due to network retries, redelivery after crashes, or at-least-once delivery guarantees in the message broker. If the projection is not idempotent, it will double-count events and produce incorrect read models.
Idempotency in projections is achieved by:
- Storing event IDs processed: Before handling an event, check whether its
eventIdhas been processed before. If so, skip it. - Upsert semantics: Design projection writes as upsert operations. Writing the same projection record twice produces the same result as writing it once.
- Version gating: Only apply a projection update if the incoming event's version is greater than the currently stored version in the read model.
Section 9: Real-World Case Studies
A. Microsoft Azure: CQRS in Cloud-Scale Microservices
Microsoft's Azure team published the canonical CQRS guidance in their Cloud Design Patterns documentation. Their primary motivation was the CosmosDB-backed microservices in Azure's control plane, where read operations from the Azure Portal, APIs, and internal monitoring systems dwarfed write operations (provisioning, scaling, deletion) by a ratio of several hundred to one.
Microsoft's implementation separates the ARM (Azure Resource Manager) write path, which validates and commits resource configuration changes as events, from the read path, which serves resource state from pre-built materialized views in Cosmos DB. The read model is eventually consistent with the write model, which is why the Azure Portal sometimes shows "Updating" states for several seconds after a resource configuration change.
The key lesson from Microsoft's implementation is that the read model projection strategy must be tailored to each entity type. High-frequency entities (virtual machines, network interfaces) use incremental projection updates. Low-frequency entities (management groups, subscriptions) use full-rebuild projections that replay from the beginning of their event streams on each update.
B. LinkedIn: The Activity Feed as an Event-Sourced Projection
LinkedIn's activity feed — the stream of connections' posts, likes, shares, and profile updates that appears on a user's homepage — is one of the most demanding read models in the industry. A user with 5,000 first-degree connections produces a feed that is the real-time aggregation of events from all 5,000 activity streams.
LinkedIn's engineering team implemented this as a fan-out-on-write projection. When a connection posts a new update (the domain event), the projection engine writes that update to the personalized feed timelines of all of that user's connections in Redis. Each user's feed is a pre-built, pre-sorted Redis sorted set, ranked by event timestamp.
The query path becomes a trivial Redis sorted set range query — a single network call returning a pre-paginated list of activity events. Without the event-sourced projection strategy, this would require an online merge-join across all 5,000 connection activity tables at read time, which is infeasible at LinkedIn's scale of hundreds of millions of users.
C. Banking Systems: Event Sourcing for Ledger Accuracy
The core challenge in banking ledger systems is that the ledger must be simultaneously a high-throughput transaction processor and an immutable audit record. These requirements are in direct tension in snapshot-based databases: high throughput requires optimized writes, but immutability requires that no record can be modified or deleted.
Several major financial institutions, including BBVA and ING, have adopted event sourcing for their core banking ledgers. The ledger is modeled as an event stream where each transaction is an immutable MoneyDeposited, MoneyWithdrawn, TransferInitiated, or InterestApplied event. The current account balance is a projection computed by the fold over this event stream.
The audit trail that regulators require is not a secondary system — it is the primary event store. Compliance reporting tools read directly from the event store and apply temporal queries to answer questions like "what was the outstanding loan balance as of the end of Q3?"
BBVA's implementation reported a 40% reduction in the complexity of their audit reporting code after migrating to event sourcing, because audit functionality that previously required complex ETL pipelines and audit table schemas became simple event store queries.
D. E-Commerce Order Management: The Canonical CQRS Use Case
E-commerce order management systems are the archetypal CQRS/ES use case, which is why this module uses them extensively for examples. Companies such as IKEA, Shopify, and Klarna have publicly discussed CQRS/ES implementations in their order systems.
The write path handles the rich domain logic of order placement, payment authorization, fraud detection, inventory reservation, fulfillment allocation, and shipping label generation. Each of these steps is represented as a domain event appended to the order's stream.
The read path serves multiple consumer types simultaneously from pre-built projections:
- Customer-facing order tracking: A Redis projection provides sub-millisecond order status lookups for 99.9% of customer queries.
- Fulfillment team dashboard: An Elasticsearch projection provides full-text search, date-range filtering, and status aggregations across millions of orders.
- Finance reporting: A PostgreSQL materialized view provides revenue aggregations, refund rates, and payment reconciliation data.
- Fraud analysis: A graph database projection represents connections between orders, customers, payment methods, and IP addresses for anomaly detection.
Each of these four read models is independently maintained by separate projection workers, independently scaled, and independently evolvable without touching the write path or the event store.
Section 10: Decision Matrix — CQRS and Event Sourcing
A. When to Use CQRS
| Scenario | Recommendation | Rationale |
|---|---|---|
| High read/write ratio (>10:1) | CQRS recommended | Independent read scaling provides cost-effective throughput |
| Complex domain with rich query patterns | CQRS recommended | Multiple projection shapes serve diverse consumers |
| Simple CRUD app, small team, early stage | Avoid CQRS | Operational complexity outweighs benefits at small scale |
| Single database schema serves both reads and writes efficiently | Avoid CQRS | No asymmetry to exploit; adds complexity without benefit |
| Multiple consumers needing different data shapes | CQRS recommended | Projections decouple consumer needs from write schema |
| Strict read-after-write consistency required | Use with caution | Eventual consistency must be managed explicitly |
B. When to Use Event Sourcing
| Scenario | Recommendation | Rationale |
|---|---|---|
| Complete audit trail required (finance, healthcare, compliance) | Event Sourcing strongly recommended | Audit trail is the primary record, not an afterthought |
| Temporal queries required ("what was the state at time T?") | Event Sourcing required | Snapshot databases cannot provide this |
| Complex aggregate with many state transitions | Event Sourcing recommended | Event log is more coherent than a tangled state machine |
| High regulatory compliance burden (PCI, SOX, HIPAA) | Event Sourcing recommended | Immutable event log satisfies regulators natively |
| Simple entity with stable schema and no audit requirements | Avoid Event Sourcing | Full event log storage and projection maintenance add cost |
| Team unfamiliar with DDD and event-driven patterns | Avoid Event Sourcing | Learning curve is steep; incorrect implementation is worse than CRUD |
| Data that must be physically deleted (GDPR right to erasure) | Use with caution | Immutable log conflicts with deletion requirements; requires crypto-shredding |
C. Combined Decision Framework
[New Feature or System]
|
Does it have complex domain
logic with many state transitions?
/ \
YES NO
| |
Does it require Use simple
audit trail or CRUD + standard ORM.
temporal queries?
/ \
YES NO
| |
Event Sourcing Does read load
+ CQRS exceed write load
by >10:1?
/ \
YES NO
| |
CQRS Standard
only CRUD
Section 11: Anti-Patterns to Avoid
A. Commands That Return Domain Data
The single most common CQRS violation: a command handler that returns the updated entity after processing.
// ANTI-PATTERN: Command that returns data
async handle(command: PlaceOrderCommand): Promise<OrderDetailReadModel> {
// ... process command ...
const order = await this.orderRepository.load(orderId);
return this.mapper.toReadModel(order); // WRONG: violates CQS
}
// CORRECT: Command returns only an acknowledgment
async handle(command: PlaceOrderCommand): Promise<CommandResult> {
// ... process command ...
return { success: true, aggregateId: orderId };
// Client must subsequently issue a GetOrderByIdQuery to read the result
}
Returning data from a command creates implicit coupling between the write and read paths, preventing independent scaling and evolution.
B. Projections That Query the Command Side
A projection should consume events and update the read model database. It should never query the event store, the normalized write database, or other aggregates to enrich the projection.
// ANTI-PATTERN: Projection that queries the command side
async onOrderPlaced(event: OrderPlacedEvent): Promise<void> {
// WRONG: Querying the command database from the projection
const customer = await this.commandDb.query(
'SELECT name, email FROM customers WHERE id = $1',
[event.data.customerId]
);
await this.redis.hSet(`order:${event.aggregateId}`, { customerName: customer.name });
}
// CORRECT: CustomerRegistered event should carry denormalized customer data,
// or a separate CustomerProjection should maintain a customer read model
// that the order projection reads from.
async onOrderPlaced(event: OrderPlacedEvent): Promise<void> {
// Read from the customer READ MODEL (not the command database)
const customerSummary = await this.customerReadModel.findById(event.data.customerId);
await this.redis.hSet(`order:${event.aggregateId}`, { customerName: customerSummary.name });
}
C. Events That Are Too Granular (Event Soup)
Defining an event for every property change produces an unreadable event stream with no business meaning.
Event Soup (Anti-Pattern):
CustomerFirstNameUpdated
CustomerLastNameUpdated
CustomerEmailUpdated
CustomerPhoneUpdated
Correct Domain Events:
CustomerProfileUpdated (carries all changed fields in one event)
CustomerEmailChanged (only if email change has domain significance, e.g., requires re-verification)
Events should reflect business intent and domain language, not property-level deltas. Each event should answer the question: "What meaningful thing happened in the domain?"
D. Events That Are Too Coarse (Missing Domain Detail)
The opposite problem: bundling too much into a single event that obscures the domain narrative.
Too Coarse (Anti-Pattern):
OrderUpdated { previousState: {...}, newState: {...} }
Correct Domain Events:
OrderItemAdded { productId, quantity, unitPrice }
OrderShippingAddressChanged { oldAddress, newAddress, reason }
OrderCancelled { reason, cancelledByUserId, refundAmount }
Each event should be a specific, named business fact that carries only the data relevant to that fact.
E. Not Handling Idempotency in Projections
Failing to handle duplicate event delivery causes projection state corruption. Every projection handler must be designed for at-least-once delivery:
// ANTI-PATTERN: Non-idempotent projection handler
async onOrderPlaced(event: OrderPlacedEvent): Promise<void> {
// WRONG: Blindly incrementing counter without idempotency check
await this.db.query('UPDATE stats SET order_count = order_count + 1');
}
// CORRECT: Idempotent handler using event ID deduplication
async onOrderPlaced(event: OrderPlacedEvent): Promise<void> {
const alreadyProcessed = await this.processedEventStore.contains(event.eventId);
if (alreadyProcessed) return;
await this.db.query('UPDATE stats SET order_count = order_count + 1');
await this.processedEventStore.add(event.eventId);
}
Section 12: The Practice Challenge
The following diagram represents the canonical CQRS with Event Sourcing data path. Study the separation between the command mutation path and the query projection retrieval path. Trace both paths from the client to the data store and back.
graph TD
CLIENT[Client Application]
subgraph "Command Path — Write Side"
CMD_BUS[CommandBus]
CMD_HANDLER[CommandHandler\nDomain Validation + Aggregate Logic]
AGG[Aggregate Root\nOrder / Account / etc.]
EVT_STORE[(EventStoreDB\nAppend-Only Event Log)]
end
subgraph "Projection Path — Async Sync"
PROJ_ENGINE[Projectionist Worker\nPersistent Subscription]
CHECKPOINT[(Checkpoint Store\nLast Processed Position)]
end
subgraph "Read Model Stores"
REDIS[(Redis\nOrder Summary Cache)]
ES_IDX[(Elasticsearch\nSearch Index)]
PG_VIEW[(PostgreSQL\nReporting Views)]
end
subgraph "Query Path — Read Side"
QRY_BUS[QueryBus]
QRY_HANDLER[QueryHandler\nNo Domain Logic — Read Only]
end
CLIENT -->|"PlaceOrderCommand"| CMD_BUS
CMD_BUS --> CMD_HANDLER
CMD_HANDLER -->|"Load from event stream"| AGG
CMD_HANDLER -->|"AppendToStream — domain events"| EVT_STORE
CMD_HANDLER -->|"CommandResult — aggregateId only"| CLIENT
EVT_STORE -->|"Persistent Subscription — ordered events"| PROJ_ENGINE
PROJ_ENGINE -->|"Update"| REDIS
PROJ_ENGINE -->|"Index"| ES_IDX
PROJ_ENGINE -->|"Upsert"| PG_VIEW
PROJ_ENGINE -->|"Checkpoint global position"| CHECKPOINT
CLIENT -->|"GetOrderByIdQuery"| QRY_BUS
QRY_BUS --> QRY_HANDLER
QRY_HANDLER -->|"Key lookup — O(1)"| REDIS
QRY_HANDLER -->|"Query result DTO"| CLIENT
style EVT_STORE fill:#2d4a22,stroke:#5a9e3c,color:#ffffff
style REDIS fill:#8b1a1a,stroke:#cc2222,color:#ffffff
style ES_IDX fill:#1a3a5c,stroke:#2266aa,color:#ffffff
style PG_VIEW fill:#3a2a5c,stroke:#6644aa,color:#ffffff
style PROJ_ENGINE fill:#4a3a00,stroke:#aa8800,color:#ffffff
The diagram illustrates the key architectural invariants of CQRS with Event Sourcing:
- The client never reads from the event store directly. The event store is a write-optimized, append-only log. Queries route to purpose-built read model stores.
- The event store is the single source of truth. All read models are derived views that can be discarded and rebuilt by replaying the event store.
- The projection path is asynchronous. There is no synchronous coupling between the command acknowledgment and the read model update. The client receives a
CommandResultimmediately; the read model updates in the background. - Multiple read models coexist. Redis, Elasticsearch, and PostgreSQL views serve different query patterns from the same underlying event stream, each optimized for its specific consumer.
- The checkpoint enables resilience. If the Projectionist worker crashes, it resumes from the last checkpointed global position, processing only the events it missed — not replaying the entire history.
This architecture is the operational foundation for any system that must simultaneously support high-throughput writes, low-latency reads, complete audit trails, temporal debugging, and independently scalable read and write paths.
For the next layer of distributed workflow coordination built on top of this foundation, proceed to Module 11: Sagas & Distributed Transactions and Module 9: Event-Driven Architectures.