ARCHITECTURE | 18 min read

Why We Chose Event Sourcing for a $200M Trading Platform

Published Mar 12, 2026, by alphabench Engineering

When Meridian Capital approached us to rebuild their equity trading platform, the requirements were clear: every transaction had to be auditable, replayable, and projected in real time across multiple read models. Traditional CRUD architecture wasn't going to cut it.

This is the story of how we designed and shipped an event-sourced trading platform that now processes over $200M in daily volume - including the decisions that worked, the ones that didn't, and everything we learned along the way.

What Is Event Sourcing, and Why Should You Care?

Before diving into our implementation, it's worth establishing what event sourcing actually is - because the term gets thrown around loosely.

In a traditional CRUD system, you store the current state of your data. When a user places an order, you insert a row into an orders table. When the order is filled, you update that row. The previous state is gone unless you've bolted on an audit log.

Event sourcing flips this model. Instead of storing the current state, you store the sequence of events that led to that state. The current state becomes a derived projection - one of potentially many. An order isn't a row that gets mutated; it's a sequence of events: OrderPlaced, OrderValidated, OrderRouted, OrderFilled.

The event log is the source of truth. Everything else - every database table, every dashboard, every report - is a derivative.
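To make "state is a projection of events" concrete, here is a minimal TypeScript sketch. The event names are the ones mentioned above; the fields (`orderId`, `symbol`, `venue`, `fillPrice`) and the `OrderState` shape are assumptions made for illustration, not the platform's actual schema.

```typescript
// Hypothetical event shapes: the names come from the article, the fields are assumptions.
type OrderEvent =
  | { type: "OrderPlaced"; orderId: string; symbol: string; quantity: number }
  | { type: "OrderValidated"; orderId: string }
  | { type: "OrderRouted"; orderId: string; venue: string }
  | { type: "OrderFilled"; orderId: string; fillPrice: number };

interface OrderState {
  orderId: string;
  symbol: string;
  quantity: number;
  status: "placed" | "validated" | "routed" | "filled";
  venue?: string;
  fillPrice?: number;
}

// Pure transition function: previous state + one event -> next state.
function applyOrderEvent(state: OrderState | null, event: OrderEvent): OrderState {
  if (event.type === "OrderPlaced") {
    return { orderId: event.orderId, symbol: event.symbol, quantity: event.quantity, status: "placed" };
  }
  if (state === null) {
    throw new Error(`${event.type} arrived before OrderPlaced`);
  }
  if (event.type === "OrderValidated") return { ...state, status: "validated" };
  if (event.type === "OrderRouted") return { ...state, status: "routed", venue: event.venue };
  return { ...state, status: "filled", fillPrice: event.fillPrice };
}

const orderLog: OrderEvent[] = [
  { type: "OrderPlaced", orderId: "o-1", symbol: "AAPL", quantity: 1000 },
  { type: "OrderValidated", orderId: "o-1" },
  { type: "OrderRouted", orderId: "o-1", venue: "XNAS" },
  { type: "OrderFilled", orderId: "o-1", fillPrice: 189.5 },
];

// The current state is a left fold over the whole log.
const currentOrder = orderLog.reduce<OrderState | null>(applyOrderEvent, null);
// Folding only a prefix of the log gives the state as it was at that point.
const orderAfterRouting = orderLog.slice(0, 3).reduce<OrderState | null>(applyOrderEvent, null);
```

Because the log is never mutated, any number of other fold functions can derive different views from the same four events.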

This distinction sounds academic until you're sitting in a room with regulators who want to know the exact state of a portfolio at 2:47 PM on March 3rd. In a CRUD system, that's a multi-day forensic exercise. In an event-sourced system, you replay events up to that timestamp and you have your answer in seconds.
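A point-in-time query of this kind can be sketched as a fold that stops at a cutoff. The `ExecutionRecorded` shape and the UTC timestamps below are assumptions made for illustration; a real system would also need to settle which clock and time zone "2:47 PM" refers to.

```typescript
// Hypothetical event shape; real execution events would carry far more detail.
interface ExecutionRecorded {
  occurredAt: string; // ISO-8601 timestamp in UTC
  symbol: string;
  signedQuantity: number; // positive for buys, negative for sells
}

// Rebuild holdings as they stood at `asOf` by folding only the events up to that instant.
function holdingsAsOf(log: ExecutionRecorded[], asOf: string): Map<string, number> {
  const cutoff = Date.parse(asOf);
  const holdings = new Map<string, number>();
  for (const e of log) {
    if (Date.parse(e.occurredAt) > cutoff) continue; // happened after the moment in question
    holdings.set(e.symbol, (holdings.get(e.symbol) ?? 0) + e.signedQuantity);
  }
  return holdings;
}

const executionLog: ExecutionRecorded[] = [
  { occurredAt: "2026-03-03T14:30:00Z", symbol: "AAPL", signedQuantity: 100 },
  { occurredAt: "2026-03-03T14:45:00Z", symbol: "MSFT", signedQuantity: 50 },
  { occurredAt: "2026-03-03T14:50:00Z", symbol: "AAPL", signedQuantity: -40 },
];

const at1447 = holdingsAsOf(executionLog, "2026-03-03T14:47:00Z"); // AAPL 100, MSFT 50
const atClose = holdingsAsOf(executionLog, "2026-03-03T21:00:00Z"); // AAPL 60, MSFT 50
```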


The Problem with CRUD in Financial Systems

Most trading platforms start life as straightforward database applications. You have orders, executions, positions, and balances - all stored as mutable rows in a relational database. This works fine when you're small. It starts breaking down when:

Regulatory scrutiny increases. Meridian operates under SEC and FINRA oversight. The ability to reconstruct historical state at any point in time isn't a nice-to-have - it's a legal requirement. With CRUD, you're reconstructing state from audit logs that were bolted on as an afterthought, hoping nothing was missed or miscategorized.

Multiple teams need different views of the same data. The trading desk needs real-time P&L. Risk management needs exposure calculations. Compliance needs transaction surveillance. Operations needs settlement tracking. In a CRUD system, these all compete for the same database, creating contention, or you build elaborate ETL pipelines that are always slightly stale.

Debugging production issues requires forensic archaeology. When a discrepancy appears - and in trading systems, they always do - you need to understand exactly how the system arrived at the current state. CRUD gives you the current state and maybe some logs. Event sourcing gives you the complete, ordered sequence of everything that happened.

Data model changes become terrifying. In a CRUD system, schema migrations can alter historical data. You're modifying the source of truth. In event sourcing, events are immutable - you can always rebuild from the original event stream.


Why Event Sourcing Was the Right Fit for Meridian

Not every system needs event sourcing. We chose it for Meridian because the requirements aligned almost perfectly with event sourcing's strengths:

  • Complete audit trail by default. Every state change is an immutable event. There's no separate audit log to maintain - the event store is the audit log. When FINRA asks for a complete trading history, it's a query, not a project.

  • Temporal queries for free. Want to know the portfolio value at any point in time? Replay events up to that timestamp. No historical copies of state to maintain for this purpose, no guesswork about what data was available when.

  • Multiple read models from a single source. The risk team, compliance team, and traders all need different views of the same data. Event sourcing lets us project the same event stream into purpose-built read models - each optimized for its consumer, each eventually consistent but independently scalable.

  • Replayability for debugging and testing. When a discrepancy appears, we can replay the exact sequence of events that produced it. In the best cases, this turned multi-day investigations into 15-minute exercises.

  • Natural fit for distributed systems. Events are the lingua franca of microservices. Every service publishes what happened; other services react as they see fit. This decouples services in ways that API calls never can.


The Architecture in Detail

We built the system on a stack of Node.js microservices with Apache Kafka as the event backbone and PostgreSQL for both the event store and materialized projections.

The Event Store

The event store is an append-only table in PostgreSQL with strict ordering guarantees. Each event record contains:

  • Aggregate ID - the entity the event belongs to (e.g., an order ID or portfolio ID)
  • Aggregate type - the category of entity (Order, Position, Portfolio)
  • Event type - what happened (OrderPlaced, ExecutionReceived, PositionUpdated)
  • Event data - the full payload as JSON
  • Metadata - timestamp, causation ID (what triggered this event), correlation ID (for tracing across services), actor ID (who/what initiated the action)
  • Version - a monotonically increasing sequence number per aggregate, used for optimistic concurrency control
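The record layout and the role of the version number can be sketched with an in-memory stand-in for the table. Everything here is illustrative: the property names mirror the list above, and `InMemoryEventStore` is a hypothetical class, not the production code. In PostgreSQL, one common way to get the same check is a unique constraint on the aggregate ID and version pair, so the second of two conflicting inserts fails.

```typescript
interface StoredEvent {
  aggregateId: string;
  aggregateType: string;
  eventType: string;
  data: Record<string, unknown>;
  metadata: { timestamp: string; causationId: string; correlationId: string; actorId: string };
  version: number; // per-aggregate sequence number, starting at 1
}
type NewEvent = Omit<StoredEvent, "aggregateId" | "version">;

class InMemoryEventStore {
  private readonly streams = new Map<string, StoredEvent[]>();

  load(aggregateId: string): StoredEvent[] {
    return [...(this.streams.get(aggregateId) ?? [])];
  }

  // Append only if the stream is still at the version the caller loaded.
  append(aggregateId: string, expectedVersion: number, events: NewEvent[]): StoredEvent[] {
    const stream = this.streams.get(aggregateId) ?? [];
    if (stream.length !== expectedVersion) {
      throw new Error(
        `Concurrency conflict on ${aggregateId}: expected version ${expectedVersion}, found ${stream.length}`,
      );
    }
    const stored = events.map((e, i) => ({ ...e, aggregateId, version: expectedVersion + i + 1 }));
    this.streams.set(aggregateId, [...stream, ...stored]);
    return stored;
  }
}

const meta = { timestamp: "2026-03-03T14:30:00Z", causationId: "cmd-1", correlationId: "req-1", actorId: "trader-7" };
const store = new InMemoryEventStore();
store.append("order-1", 0, [{ aggregateType: "Order", eventType: "OrderPlaced", data: { symbol: "AAPL" }, metadata: meta }]);

// Two writers both loaded the stream at version 1; only the first append succeeds.
store.append("order-1", 1, [{ aggregateType: "Order", eventType: "OrderValidated", data: {}, metadata: meta }]);
let conflictDetected = false;
try {
  store.append("order-1", 1, [{ aggregateType: "Order", eventType: "OrderRouted", data: {}, metadata: meta }]);
} catch {
  conflictDetected = true; // the losing writer reloads the aggregate and retries
}
```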

We chose PostgreSQL over purpose-built event stores like EventStoreDB because the team had deep Postgres expertise, and Postgres's ACID guarantees and mature tooling outweighed the specialized features of dedicated event stores. For our scale (millions of events per day, not billions), this was the right trade-off.

The Command Pipeline

The flow for every state change follows the same pattern:

  1. A command is received (e.g., "Place Order for 1,000 shares of AAPL at market")
  2. The aggregate is loaded by replaying its events from the store (or from a snapshot + subsequent events)
  3. The aggregate validates the command against its current state (Does this portfolio have sufficient buying power? Is this instrument tradeable?)
  4. If valid, one or more events are emitted (OrderPlaced, BuyingPowerReserved)
  5. Events are persisted to the event store within a database transaction
  6. Events are published to Kafka for consumption by projection services and other bounded contexts
  7. Projection services consume events and update their read models - the positions table, the P&L dashboard, the risk exposure view, the compliance surveillance feed
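Steps 2 through 4 of the list above can be sketched as two pure functions: one that folds past events into state, and one that decides which events a command produces. The event fields, the `estimatedCost` parameter, and the validation rules are assumptions for illustration; persistence, publishing, and projection (steps 5 to 7) are left out.

```typescript
type PortfolioEvent =
  | { type: "FundsDeposited"; amount: number }
  | { type: "OrderPlaced"; orderId: string; symbol: string; quantity: number }
  | { type: "BuyingPowerReserved"; orderId: string; amount: number };

interface PlaceOrder {
  orderId: string;
  symbol: string;
  quantity: number;
  estimatedCost: number; // hypothetical field standing in for a real pricing check
}

interface PortfolioState {
  buyingPower: number;
}

// Step 2: rebuild state by folding the aggregate's past events.
function evolve(state: PortfolioState, event: PortfolioEvent): PortfolioState {
  if (event.type === "FundsDeposited") return { buyingPower: state.buyingPower + event.amount };
  if (event.type === "BuyingPowerReserved") return { buyingPower: state.buyingPower - event.amount };
  return state; // OrderPlaced does not change buying power in this sketch
}

// Steps 3 and 4: validate against current state, then emit events or reject.
function decide(state: PortfolioState, command: PlaceOrder): PortfolioEvent[] {
  if (command.quantity <= 0) throw new Error("Quantity must be positive");
  if (command.estimatedCost > state.buyingPower) throw new Error("Insufficient buying power");
  return [
    { type: "OrderPlaced", orderId: command.orderId, symbol: command.symbol, quantity: command.quantity },
    { type: "BuyingPowerReserved", orderId: command.orderId, amount: command.estimatedCost },
  ];
}

function handlePlaceOrder(pastEvents: PortfolioEvent[], command: PlaceOrder): PortfolioEvent[] {
  const state = pastEvents.reduce<PortfolioState>(evolve, { buyingPower: 0 });
  return decide(state, command);
}

const portfolioHistory: PortfolioEvent[] = [{ type: "FundsDeposited", amount: 250000 }];
const emitted = handlePlaceOrder(portfolioHistory, {
  orderId: "o-1",
  symbol: "AAPL",
  quantity: 1000,
  estimatedCost: 190000,
});
```

Keeping `decide` free of I/O is a design choice of this sketch: it means the business rules can be exercised with nothing but arrays of events.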

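Steps 5 and 6 are the trickiest part. We use the transactional outbox pattern: events are written to both the event store table and an outbox table in the same database transaction. A separate relay process reads the outbox and publishes to Kafka. This guarantees that if an event is stored, it will eventually be published, with no dual-write consistency issues. The guarantee is at-least-once rather than exactly-once: a relay that crashes between publishing and recording that it published will send the event again, so consumers need to tolerate duplicates.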
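The outbox mechanics can be shown with an in-memory stand-in for the database. `FakeDatabase`, `persistEvent`, and `relayOnce` are hypothetical names for this sketch; the real system would use a PostgreSQL transaction and a Kafka producer in their place.

```typescript
interface OutboxRow {
  id: number;
  key: string; // aggregate ID, which would become the message key
  payload: string;
  published: boolean;
}

interface Tables {
  events: string[];
  outbox: OutboxRow[];
}

// Stand-in for PostgreSQL: writes inside `transaction` are committed together or not at all.
class FakeDatabase {
  private tables: Tables = { events: [], outbox: [] };

  transaction(work: (tx: Tables) => void): void {
    const staged: Tables = {
      events: [...this.tables.events],
      outbox: this.tables.outbox.map((row) => ({ ...row })),
    };
    work(staged); // if this throws, the staged copy is discarded (rollback)
    this.tables = staged; // commit
  }

  read(): Tables {
    return this.tables;
  }
}

// Step 5: the event and its outbox row are written in one transaction.
function persistEvent(db: FakeDatabase, aggregateId: string, payload: string): void {
  db.transaction((tx) => {
    tx.events.push(payload);
    tx.outbox.push({ id: tx.outbox.length + 1, key: aggregateId, payload, published: false });
  });
}

// Step 6: one relay pass. Publish each pending row, then mark it as published.
// A crash between those two actions means the row is sent again on the next pass.
function relayOnce(db: FakeDatabase, publish: (row: OutboxRow) => void): number {
  const pending = db.read().outbox.filter((row) => !row.published);
  for (const row of pending) {
    publish(row);
    db.transaction((tx) => {
      const target = tx.outbox.find((r) => r.id === row.id);
      if (target) target.published = true;
    });
  }
  return pending.length;
}

const db = new FakeDatabase();
persistEvent(db, "order-1", JSON.stringify({ type: "OrderPlaced" }));
persistEvent(db, "order-1", JSON.stringify({ type: "OrderFilled" }));

const sentToBroker: string[] = [];
const firstPass = relayOnce(db, (row) => sentToBroker.push(row.payload)); // publishes both rows
const secondPass = relayOnce(db, (row) => sentToBroker.push(row.payload)); // nothing left to send
```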

Why Kafka?

We chose Kafka over simpler message brokers for several reasons:

  • Durability. Kafka retains messages for a configurable period (we use 30 days). This means consumers can replay from any offset if they need to rebuild their state.
  • Consumer groups. Multiple independent consumers can process the same event stream at their own pace. The risk service doesn't slow down the P&L service.
  • Ordering guarantees. Within a partition, messages are strictly ordered. We partition by aggregate ID, ensuring all events for a given order or portfolio are processed in sequence.
  • Backpressure handling. Consumers pull from the retained log at their own pace, so one that falls behind simply accumulates lag without affecting producers or other consumers.
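The ordering guarantee in the list above depends on every event for an aggregate landing on the same partition. The sketch below shows that property with a simple stable hash. It is not Kafka's partitioner (the Java client hashes the serialized key with murmur2), and the partition count of 12 is an arbitrary assumption.

```typescript
// Illustrative hash only; any stable hash gives "same key -> same partition".
function partitionFor(key: string, partitionCount: number): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep the value in unsigned 32-bit range
  }
  return hash % partitionCount;
}

const messageKeys = ["order-1", "portfolio-9", "order-1", "order-2", "order-1"];
const assignments = messageKeys.map((k) => partitionFor(k, 12));
// Every "order-1" event maps to the same partition, so a consumer sees them in publish order.
// Events for different aggregates may share a partition or not; no cross-aggregate order is implied.
```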

Read Models and Projections

This is where event sourcing really shines. From the same event stream, we project into:

  • Positions database - current holdings per portfolio, updated on every execution event
  • P&L service - real-time profit/loss calculations, streaming to trader dashboards via WebSocket
  • Risk engine - exposure calculations, VaR estimates, and limit monitoring
  • Compliance feed - transaction surveillance for pattern detection and regulatory reporting
  • Settlement service - tracks pending settlements and reconciliation with clearing houses

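Each projection is an independent service with its own database. If the risk engine's database corrupts, we rebuild it by replaying events: from Kafka if the gap fits inside the 30-day retention window, otherwise from the event store in global sequence order. No data loss, no coordination with other services.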
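A projection can be sketched as a small class that applies events one at a time and can be rebuilt by applying the whole stream to an empty instance. The `ExecutionReceived` fields and the `PositionsProjection` class are assumptions for illustration. The version check makes the projection safe against redelivered messages; a real service would persist the holdings and the last applied version in the same database transaction.

```typescript
// Hypothetical event shape. `version` is the per-aggregate sequence number from the event store.
interface ExecutionReceived {
  portfolioId: string;
  version: number;
  symbol: string;
  signedQuantity: number; // positive for buys, negative for sells
}

class PositionsProjection {
  private readonly lastVersion = new Map<string, number>();
  private readonly holdings = new Map<string, number>();

  apply(e: ExecutionReceived): void {
    // Skip anything already applied, so a redelivered message does not double-count.
    if (e.version <= (this.lastVersion.get(e.portfolioId) ?? 0)) return;
    const cell = `${e.portfolioId}:${e.symbol}`;
    this.holdings.set(cell, (this.holdings.get(cell) ?? 0) + e.signedQuantity);
    this.lastVersion.set(e.portfolioId, e.version);
  }

  quantity(portfolioId: string, symbol: string): number {
    return this.holdings.get(`${portfolioId}:${symbol}`) ?? 0;
  }

  // Rebuilding is just applying the full stream to an empty projection.
  static rebuild(stream: ExecutionReceived[]): PositionsProjection {
    const fresh = new PositionsProjection();
    stream.forEach((e) => fresh.apply(e));
    return fresh;
  }
}

const executionStream: ExecutionReceived[] = [
  { portfolioId: "pf-1", version: 1, symbol: "AAPL", signedQuantity: 100 },
  { portfolioId: "pf-1", version: 2, symbol: "AAPL", signedQuantity: -40 },
  { portfolioId: "pf-1", version: 2, symbol: "AAPL", signedQuantity: -40 }, // duplicate delivery
  { portfolioId: "pf-2", version: 1, symbol: "MSFT", signedQuantity: 25 },
];
const rebuilt = PositionsProjection.rebuild(executionStream);
// rebuilt.quantity("pf-1", "AAPL") is 60: the duplicate was ignored.
```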


Testing an Event-Sourced System

One of the unexpected benefits of event sourcing is how naturally it supports testing. Since every state change is driven by events, tests become declarative:

  • Given these events have occurred (set up state)
  • When this command is issued (trigger behavior)
  • Then these events should be emitted (assert outcomes)

This pattern eliminates the need for complex database setup in tests. You're testing pure business logic - given this history, does this command produce the correct events? No mocking databases, no managing test fixtures.
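The Given/When/Then shape translates almost directly into code. This sketch uses a tiny hypothetical helper (`given`, `when`, `thenEmits`, `thenRejects`) and a simplified buying-power rule; neither is taken from the actual test suite, and a real project would more likely build this on its test framework of choice.

```typescript
type BuyingPowerEvent =
  | { type: "FundsDeposited"; amount: number }
  | { type: "BuyingPowerReserved"; orderId: string; amount: number };

interface ReserveBuyingPower {
  orderId: string;
  amount: number;
}

// Business logic under test: pure function from (past events, command) to new events.
function decideReserve(pastEvents: BuyingPowerEvent[], cmd: ReserveBuyingPower): BuyingPowerEvent[] {
  const available = pastEvents.reduce(
    (total, e) => (e.type === "FundsDeposited" ? total + e.amount : total - e.amount),
    0,
  );
  if (cmd.amount > available) throw new Error("Insufficient buying power");
  return [{ type: "BuyingPowerReserved", orderId: cmd.orderId, amount: cmd.amount }];
}

// Minimal Given/When/Then helper; assertion failures surface as thrown errors.
function given(pastEvents: BuyingPowerEvent[]) {
  return {
    when(cmd: ReserveBuyingPower) {
      return {
        thenEmits(expected: BuyingPowerEvent[]): void {
          const actual = decideReserve(pastEvents, cmd);
          if (JSON.stringify(actual) !== JSON.stringify(expected)) {
            throw new Error(`Expected ${JSON.stringify(expected)} but got ${JSON.stringify(actual)}`);
          }
        },
        thenRejects(): void {
          let rejected = false;
          try {
            decideReserve(pastEvents, cmd);
          } catch {
            rejected = true;
          }
          if (!rejected) throw new Error("Expected the command to be rejected");
        },
      };
    },
  };
}

given([{ type: "FundsDeposited", amount: 1000 }])
  .when({ orderId: "o-1", amount: 400 })
  .thenEmits([{ type: "BuyingPowerReserved", orderId: "o-1", amount: 400 }]);

given([{ type: "FundsDeposited", amount: 1000 }])
  .when({ orderId: "o-2", amount: 5000 })
  .thenRejects();
```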

We also use event replay for integration testing. We capture production event streams (sanitized of PII), replay them against new code, and compare the resulting projections. This caught several regression bugs that unit tests missed - particularly around event ordering edge cases that only appear with real-world data patterns.


Performance Considerations

The Snapshotting Problem

Replaying thousands of events to rebuild an aggregate's state on every command gets expensive. A portfolio that's been active for three years might have 50,000+ events. Loading all of them on every trade is not viable.

We implemented periodic snapshots - serialized representations of an aggregate's state at a given event version. When loading an aggregate, we find the latest snapshot and replay only events that occurred after it. Our snapshotting strategy:

  • Snapshots are created every 100 events per aggregate
  • Snapshots are stored in a separate table, indexed by aggregate ID and version
  • Snapshot creation is asynchronous - it doesn't block the command pipeline
  • Old snapshots are garbage-collected after 90 days (we can always rebuild from events)
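Loading from a snapshot can be sketched as "start from the snapshot's state and version, then fold only the tail". The `CashEvent` and `CashSnapshot` shapes are simplified assumptions; the interval of 100 comes from the list above.

```typescript
interface CashEvent {
  version: number;
  delta: number;
}

interface CashSnapshot {
  version: number; // the event version this snapshot reflects
  cash: number;
}

const SNAPSHOT_INTERVAL = 100;

// A snapshot is due each time the aggregate crosses a multiple of the interval.
function snapshotDue(version: number): boolean {
  return version > 0 && version % SNAPSHOT_INTERVAL === 0;
}

// `eventsAfter` stands in for the "version > N" range scan on the event store.
function loadCash(
  snapshot: CashSnapshot | null,
  eventsAfter: (version: number) => CashEvent[],
): { cash: number; version: number; replayed: number } {
  let cash = snapshot?.cash ?? 0;
  let version = snapshot?.version ?? 0;
  const tail = eventsAfter(version);
  for (const e of tail) {
    cash += e.delta;
    version = e.version;
  }
  return { cash, version, replayed: tail.length };
}

// 250 events that each add 1 to the balance.
const allCashEvents: CashEvent[] = Array.from({ length: 250 }, (_, i) => ({ version: i + 1, delta: 1 }));
const eventsAfter = (v: number): CashEvent[] => allCashEvents.filter((e) => e.version > v);

const fromScratch = loadCash(null, eventsAfter); // replays all 250 events
const fromSnapshot = loadCash({ version: 200, cash: 200 }, eventsAfter); // replays only 50
```

Both calls arrive at the same state, which is why discarding old snapshots is safe: they only shorten the replay.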

Event Store Query Performance

The append-only nature of the event store makes reads straightforward but requires careful indexing. Our primary access patterns:

  • Load all events for an aggregate (indexed on aggregate_id + version)
  • Load events after a specific version for an aggregate (same index, range scan)
  • Load all events after a global sequence number (for projection rebuilds, indexed on global_sequence)

At our current scale (~3M events/day), PostgreSQL handles this comfortably. We partition the event store table by month for operational manageability and archive partitions older than 12 months to cold storage.


What We'd Do Differently

Event sourcing isn't free. Here are the trade-offs we encountered and what we learned:

Event schema evolution is harder than we anticipated. As the system grew, we needed to change event structures. Adding a new field to OrderPlaced is simple - old events just don't have it. But renaming a field, changing its type, or splitting an event into two is painful. We settled on a versioned schema approach with upcasters that transform old event formats to new ones at read time. This works but adds complexity. If we started over, we'd invest more upfront in event schema design and treat it as seriously as a public API contract.
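An upcaster in this sense is a function from any stored version of an event to the latest one, applied when events are read. The example below is hypothetical: it imagines a rename from `qty` to `quantity` plus a new `currency` field, and the `"USD"` default is an assumption that a real migration would have to justify.

```typescript
// Hypothetical schema history for OrderPlaced.
interface OrderPlacedV1 {
  schemaVersion: 1;
  orderId: string;
  qty: number;
}

interface OrderPlacedV2 {
  schemaVersion: 2;
  orderId: string;
  quantity: number; // renamed from `qty`
  currency: string; // added in v2
}

type StoredOrderPlaced = OrderPlacedV1 | OrderPlacedV2;

// Applied at read time: stored events stay untouched, domain code only ever sees V2.
function upcastOrderPlaced(stored: StoredOrderPlaced): OrderPlacedV2 {
  if (stored.schemaVersion === 2) return stored;
  return {
    schemaVersion: 2,
    orderId: stored.orderId,
    quantity: stored.qty,
    currency: "USD", // assumed default for events written before the field existed
  };
}

const storedPayloads: StoredOrderPlaced[] = [
  { schemaVersion: 1, orderId: "o-1", qty: 1000 },
  { schemaVersion: 2, orderId: "o-2", quantity: 500, currency: "USD" },
];
const upcasted = storedPayloads.map(upcastOrderPlaced);
```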

Eventual consistency requires careful UX design. There's an inherent delay between writing an event and seeing it reflected in read models. For most views, this is imperceptible (under 50ms in our system). But for the order confirmation screen, traders expected instant feedback. We solved this with optimistic updates on the client - the UI shows the expected state immediately, then reconciles when the projection catches up. The key insight: you need to design for eventual consistency from the beginning, not bolt it on later.
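The client-side half of that approach can be sketched as a view model that records a provisional row and later overwrites it with whatever the read model reports. `OrderViewModel` and its method names are hypothetical; the actual client code is not described in this article.

```typescript
type ServerStatus = "accepted" | "rejected";

interface OrderRowView {
  orderId: string;
  status: ServerStatus;
  confirmed: boolean; // false while the row is only the client's expectation
}

class OrderViewModel {
  private readonly rows = new Map<string, OrderRowView>();

  // Called right after the command is sent, before any projection has caught up.
  showOptimistic(orderId: string): void {
    this.rows.set(orderId, { orderId, status: "accepted", confirmed: false });
  }

  // Called when the read model delivers its view of the order; the projection always wins.
  reconcile(orderId: string, status: ServerStatus): void {
    this.rows.set(orderId, { orderId, status, confirmed: true });
  }

  row(orderId: string): OrderRowView | undefined {
    return this.rows.get(orderId);
  }
}

const orderViews = new OrderViewModel();
orderViews.showOptimistic("o-1");
const beforeConfirm = orderViews.row("o-1"); // accepted, not yet confirmed
orderViews.reconcile("o-1", "rejected");
const afterConfirm = orderViews.row("o-1"); // the optimistic guess was wrong and is replaced
```

The `confirmed` flag lets the UI render provisional rows differently, which matters most in the case shown here where the optimistic guess turns out to be wrong.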

Developer onboarding takes longer. Event sourcing is a paradigm shift. Engineers accustomed to CRUD need time to internalize the pattern. Our onboarding now includes a dedicated "event sourcing bootcamp" - a two-day workshop where new team members build a simplified event-sourced system from scratch.

CQRS complexity compounds. Separating reads and writes (Command Query Responsibility Segregation) is powerful but means you're maintaining multiple data stores. Schema changes, data migrations, and debugging all require awareness of both sides. We underestimated this operational overhead initially.


When NOT to Use Event Sourcing

Event sourcing is not a universal architecture. We'd recommend against it when:

  • Simple CRUD is sufficient. If your domain is straightforward create/read/update/delete with no complex business rules, event sourcing adds unnecessary complexity.
  • You don't need an audit trail. If historical state reconstruction isn't a requirement, you're paying complexity costs for a benefit you don't need.
  • Your team is small and unfamiliar with the pattern. The learning curve is real. A two-person team building an MVP should probably start with CRUD and migrate later if needed.
  • Write volume vastly exceeds read volume. Event sourcing optimizes for read flexibility at the cost of write simplicity. If your workload is write-heavy with simple reads, it's likely overkill.

Results

Twelve months after launch, the numbers speak for themselves:

  • $200M+ in daily trading volume processed through the platform
  • 12ms average order-to-execution latency - 3x faster than the legacy system
  • 100% audit compliance - FINRA requested a complete trading history audit and we generated the report in under 30 minutes. The legacy system required a team of three analysts working for two weeks.
  • Zero data discrepancies between read models - down from ~15 per month on the legacy system
  • 5 independent read models serving different teams, all derived from the same event stream, each optimized for its specific use case
  • 60% faster bug resolution - the ability to replay exact event sequences reduced average investigation time from 8 hours to under 3 hours

"The ability to answer any regulatory question by replaying events changed our relationship with compliance. It went from adversarial to collaborative." - CTO, Meridian Capital


Key Takeaways

  1. Event sourcing is a commitment, not a feature. It permeates your entire architecture. Evaluate honestly whether the benefits justify the complexity for your specific domain.

  2. Invest heavily in event schema design. Treat events as a public API. Version them. Document them. Review schema changes as carefully as you'd review a database migration.

  3. The transactional outbox pattern is non-negotiable. Dual writes between an event store and a message broker will eventually lose data. Use the outbox pattern or accept data loss.

  4. Build for eventual consistency from day one. It's a UX concern as much as a technical one. Design your interfaces to handle the propagation delay gracefully.

  5. Snapshotting isn't optional at scale. Plan for it from the beginning, even if you don't need it initially. Retrofitting snapshots onto a running system is significantly harder.

Event sourcing isn't the right choice for every system. But for financial platforms where auditability, replayability, and multi-view projections are core requirements, it's hard to beat.
