Event-Driven Architecture Patterns
Why Events Matter
Event-driven architecture (EDA) has become fundamental to building scalable, loosely coupled distributed systems. Instead of services directly calling each other, they communicate by publishing and subscribing to events. This inversion of dependencies unlocks powerful patterns for handling complexity at scale.
I’ve spent the last several years building event-driven systems across multiple domains: from financial transaction processing to IoT data ingestion to orchestrating infrastructure changes. Here’s what I’ve learned about the patterns that work and the pitfalls to avoid.
What Is an Event?
Before diving into patterns, let’s define our terms. An event is a fact about something that happened in the past:
{
  "eventId": "evt_2025_abc123",
  "eventType": "OrderPlaced",
  "timestamp": "2025-10-20T14:23:45Z",
  "aggregateId": "order-12345",
  "data": {
    "orderId": "order-12345",
    "customerId": "cust-789",
    "items": [...],
    "totalAmount": 149.99
  }
}
Events are immutable. They describe what happened, not what should happen. This distinction is crucial.
Events vs Commands vs Messages:
- Event: Something that happened (“OrderPlaced”, “PaymentCompleted”)
- Command: A request to do something (“PlaceOrder”, “ProcessPayment”)
- Message: Generic term for either
Core Patterns
1. Event Notification
The simplest pattern: a service publishes an event when something interesting happens. Other services that care can react.
Example:
OrderService: publishes "OrderPlaced" event
EmailService: subscribes, sends confirmation email
AnalyticsService: subscribes, updates dashboards
InventoryService: subscribes, reserves items
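The fan-out above can be sketched with a minimal in-process event bus. This is illustrative only (real systems would route through a broker), and the handler names are hypothetical:

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process pub/sub bus (a sketch; production systems use a broker)."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Fan out to every subscriber. In a real system, one handler
        # failing should not prevent the others from running.
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
log = []
bus.subscribe("OrderPlaced", lambda e: log.append(f"email to {e['customerId']}"))
bus.subscribe("OrderPlaced", lambda e: log.append(f"reserve items for {e['orderId']}"))
bus.publish("OrderPlaced", {"orderId": "order-12345", "customerId": "cust-789"})
```

The publisher never knows who is listening, which is exactly the decoupling this pattern buys you.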
When to use:
- Decoupling services that don’t need synchronous responses
- Fan-out scenarios where multiple systems need to react
- Audit trails and observability
Challenges:
- Eventual consistency: email might arrive before inventory is reserved
- No guarantee subscribers exist or are working
- Debugging flows across multiple services
2. Event-Carried State Transfer
Events contain enough data that consumers don’t need to call back to the source service.
Example:
{
  "eventType": "CustomerAddressChanged",
  "customerId": "cust-789",
  "oldAddress": {...},
  "newAddress": {
    "street": "123 Main St",
    "city": "Portland",
    "state": "OR",
    "zip": "97201"
  }
}
The shipping service can update its local cache of customer addresses without calling the customer service.
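A consumer for this pattern can be sketched in a few lines; the handler name and cache structure here are hypothetical:

```python
# Hypothetical shipping-service consumer keeping a local address cache.
address_cache = {}

def handle_customer_address_changed(event):
    # The event carries the full new address, so there is no
    # callback to the customer service.
    address_cache[event["customerId"]] = event["newAddress"]

handle_customer_address_changed({
    "eventType": "CustomerAddressChanged",
    "customerId": "cust-789",
    "newAddress": {"street": "123 Main St", "city": "Portland",
                   "state": "OR", "zip": "97201"},
})
```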
Benefits:
- Reduced coupling and service-to-service calls
- Services can work when dependencies are down
- Lower latency (no remote calls needed)
Challenges:
- Events can become large
- Data duplication across services
- Eventually consistent reads
3. Event Sourcing
Instead of storing current state, store the sequence of events that led to that state. Current state is derived by replaying events.
Traditional approach:
UPDATE orders SET status = 'shipped', shipped_at = NOW()
WHERE id = 'order-12345';
Event sourcing approach:
Event 1: OrderPlaced(orderId, customerId, items)
Event 2: PaymentReceived(orderId, amount, method)
Event 3: OrderShipped(orderId, trackingNumber, carrier)
Current state = replay all events for that order.
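Replaying is just a fold over the event stream. A minimal sketch, with hypothetical event shapes:

```python
def replay(events):
    """Derive current order state by folding over its event stream."""
    state = {}
    for event in events:
        etype, data = event["type"], event["data"]
        if etype == "OrderPlaced":
            state = {"orderId": data["orderId"], "status": "placed",
                     "items": data["items"]}
        elif etype == "PaymentReceived":
            state["status"] = "paid"
        elif etype == "OrderShipped":
            state["status"] = "shipped"
            state["trackingNumber"] = data["trackingNumber"]
    return state

events = [
    {"type": "OrderPlaced", "data": {"orderId": "order-12345", "items": ["sku-1"]}},
    {"type": "PaymentReceived", "data": {"amount": 149.99}},
    {"type": "OrderShipped", "data": {"trackingNumber": "1Z999"}},
]
state = replay(events)
```

Truncate the list before replaying and you get the state at any earlier point in time, which is the "time travel" benefit below.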
Benefits:
- Complete audit trail (you never lose history)
- Time travel: reconstruct state at any point in the past
- New views: create new projections from existing events
- Natural fit for domains with compliance requirements
Challenges:
- Complexity: need event store, projection management
- Eventual consistency for read models
- Schema evolution is harder
- Performance: replaying long event streams is slow without snapshots
When to use:
- Financial systems, healthcare, legal domains
- When audit trail is critical
- When you need to analyze historical state
I built an event-sourced system for financial transactions. The ability to replay any account to any point in time was invaluable for debugging and compliance. But the complexity was high: we needed separate read models, event versioning strategies, and careful handling of event schema changes.
4. CQRS (Command Query Responsibility Segregation)
Separate the write model (commands) from the read model (queries). Often paired with event sourcing.
Write side:
PlaceOrder command -> Order aggregate -> Events saved
Read side:
Events -> Projection -> Materialized view -> Query response
Example:
- Write: Orders are stored as event streams
- Read: Materialized views in Postgres for fast queries
- orders_summary table for list views
- customer_order_history table for the customer portal
- order_search index in Elasticsearch for full-text search
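A projection is just a consumer that folds events into a query-optimized view. A sketch of an in-memory orders summary projection (structure and event shapes are illustrative):

```python
# Hypothetical projection building an orders_summary read model from events.
orders_summary = {}

def project(event):
    etype, data = event["type"], event["data"]
    if etype == "OrderPlaced":
        orders_summary[data["orderId"]] = {
            "customerId": data["customerId"], "status": "placed"}
    elif etype == "OrderShipped":
        orders_summary[data["orderId"]]["status"] = "shipped"

for e in [{"type": "OrderPlaced", "data": {"orderId": "o-1", "customerId": "c-9"}},
          {"type": "OrderShipped", "data": {"orderId": "o-1"}}]:
    project(e)
```

In production the dict would be a Postgres table or Elasticsearch index, updated by a consumer reading from the event stream.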
Benefits:
- Optimize reads and writes independently
- Scale read and write sides separately
- Support multiple read models from same events
- Query models match UI needs exactly
Challenges:
- Increased system complexity
- Eventual consistency between write and read models
- Need infrastructure for projections
5. Saga Pattern
Distributed transactions across multiple services without two-phase commit.
A saga is a sequence of local transactions. If one step fails, compensating transactions undo previous steps.
Example: Order Fulfillment Saga
- Order service: reserve order
- Payment service: charge customer
- Inventory service: allocate stock
- Shipping service: create shipment
If step 3 fails, compensating transactions:
- Refund payment
- Cancel order
Two approaches:
Choreography: Services publish events, next service reacts
OrderService -> OrderCreated event
PaymentService -> PaymentCompleted event
InventoryService -> InventoryReserved event
If inventory fails, publishes InventoryReservationFailed, which triggers refund.
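One choreographed step can be sketched as a handler that reacts to the previous event and publishes its own outcome. All names here are illustrative:

```python
# Sketch of a choreographed saga step: the inventory service reacts to
# PaymentCompleted and publishes its own success or failure event.
published = []

def publish(event_type, payload):
    published.append({"type": event_type, **payload})

def on_payment_completed(event, stock):
    order_id = event["orderId"]
    if stock.get(order_id, 0) > 0:
        stock[order_id] -= 1
        publish("InventoryReserved", {"orderId": order_id})
    else:
        # Downstream, this failure event is what triggers the refund.
        publish("InventoryReservationFailed", {"orderId": order_id})

on_payment_completed({"orderId": "order-1"}, stock={"order-1": 0})
```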
Orchestration: Central coordinator manages the saga
class OrderSaga:
    def execute(self, order):
        payment_id = None
        try:
            payment_id = self.payment_service.charge(order)
            self.inventory_service.reserve(order)
            self.shipping_service.create(order)
        except InventoryError:
            # Compensate: undo the completed steps in reverse order
            if payment_id is not None:
                self.payment_service.refund(payment_id)
            self.order_service.cancel(order.id)
            raise
Choreography vs Orchestration:
| Choreography | Orchestration |
|---|---|
| Decentralized | Centralized |
| Services are autonomous | Coordinator controls flow |
| Harder to debug | Easier to understand flow |
| More resilient | Single point of failure |
I’ve used both. For simple flows (2-3 steps), choreography works well. For complex business processes (10+ steps, conditional logic), orchestration is clearer.
6. Transactional Outbox
Ensure an event is published if and only if a database transaction commits.
The problem:
# This is broken!
db.save(order)
event_bus.publish(OrderCreated(order))
If publish() fails after save(), the database has the order but no event was sent. Reversing the order doesn't help: if save() fails after publish(), the event was sent but no order exists.
The solution: Outbox pattern
# Atomic transaction
with db.transaction():
    db.save(order)
    db.outbox.insert(OrderCreated(order))

# Separate process reads outbox and publishes events
event_publisher.poll_outbox()
The outbox table is in the same database, so writes are atomic. A separate process (or the same process using polling) reads from the outbox and publishes events.
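A minimal sketch of the polling variant, using SQLite to stand in for the service's database (table name, columns, and the list standing in for the broker are all illustrative):

```python
import json
import sqlite3

# In-memory database standing in for the service's own datastore.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, published INTEGER DEFAULT 0)"
)

def save_order_with_event(order):
    # Business write and outbox write commit (or roll back) atomically.
    with conn:
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (json.dumps({"type": "OrderCreated", "order": order}),))

sent = []  # stand-in for the message broker

def poll_outbox():
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        sent.append(json.loads(payload))  # broker publish would go here
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()

save_order_with_event({"orderId": "order-12345"})
poll_outbox()
```

Note the poller may crash between publishing and marking the row, so consumers still need to be idempotent; the outbox guarantees at-least-once, not exactly-once.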
Implementation options:
- Polling: Background job polls outbox table
- Transaction log tailing: CDC (Change Data Capture) reads database log
- Dual writes with idempotency: Accept occasional duplicates, handle with idempotent consumers
Debezium is a popular tool for the CDC-based outbox pattern with Kafka.
Message Brokers and Event Stores
Your choice of infrastructure shapes what patterns are practical.
Apache Kafka
Strengths:
- High throughput (millions of events/second)
- Event log retention (days, weeks, forever)
- Strong ordering guarantees within a partition
- Replay capability
Use cases:
- Event streams for analytics
- Event sourcing event store
- Inter-service communication at scale
Challenges:
- Operational complexity (partitions, consumer rebalancing, and historically ZooKeeper, now replaced by KRaft)
- Overkill for simple pub/sub
- At-least-once delivery (duplicates possible)
RabbitMQ / AWS SQS
Strengths:
- Traditional message queue semantics
- Simpler operational model
- Rich routing (exchanges, bindings)
- Good for task queues
Use cases:
- Work queues
- Simple pub/sub
- RPC patterns
Challenges:
- Not designed for event sourcing (messages are deleted when consumed)
- Limited replay capability
- Lower throughput than Kafka
Redis Streams
Strengths:
- Lightweight, part of Redis
- Consumer groups
- Good performance
- Simpler than Kafka
Use cases:
- Medium-scale event streams
- Real-time notifications
- Activity feeds
Challenges:
- Not as battle-tested as Kafka for high scale
- Retention tied to memory
EventStoreDB
Strengths:
- Purpose-built for event sourcing
- Projections as first-class concept
- Optimistic concurrency built-in
Use cases:
- Event sourcing systems
- When you need built-in projections
Design Principles
1. Events Should Be Immutable
Once published, an event never changes. If you made a mistake, publish a correction event.
Wrong:
{"eventType": "OrderPlaced", "status": "pending"}
// Later: modify event to status: "completed"
Right:
{"eventType": "OrderPlaced", "status": "pending"}
{"eventType": "OrderCompleted", "completedAt": "..."}
2. Events Should Be Self-Contained
Include enough data that consumers don’t need to call back to fetch details.
Bad:
{"eventType": "ProductUpdated", "productId": "prod-123"}
Good:
{
  "eventType": "ProductUpdated",
  "productId": "prod-123",
  "changes": {
    "price": {"old": 99.99, "new": 89.99},
    "stock": {"old": 50, "new": 45}
  }
}
3. Design for Idempotency
Consumers might receive the same event multiple times. Design handlers to be idempotent.
def handle_order_placed(event):
    # Skip if already processed (in production, use an atomic
    # set-if-absent to avoid a race between check and mark)
    if db.exists(f"processed_event_{event.id}"):
        return

    # Process event
    send_confirmation_email(event.order_id)

    # Mark as processed
    db.set(f"processed_event_{event.id}", True)
4. Version Your Events
Event schemas evolve. Plan for it from day one.
{
  "eventType": "OrderPlaced",
  "eventVersion": "v2",
  "data": {...}
}
Use schema registries (like Confluent Schema Registry) to manage evolution.
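One common technique is upcasting: consumers normalize old event versions to the latest schema before handling them. A sketch, with hypothetical v1/v2 field names:

```python
# Sketch of an upcaster. Assumes a hypothetical v1 schema with a flat
# "total" field that v2 moved under data.totalAmount.
def upcast(event):
    if event.get("eventVersion", "v1") == "v1":
        event = {
            "eventType": event["eventType"],
            "eventVersion": "v2",
            "data": {"totalAmount": event["total"]},
        }
    return event

v1_event = {"eventType": "OrderPlaced", "total": 149.99}
v2_event = upcast(v1_event)
```

This keeps handler code written against a single (current) schema even as old events remain in the log forever.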
Observability Challenges
Event-driven systems are harder to debug than synchronous request-response systems.
Essential tools:
- Distributed tracing: OpenTelemetry, Jaeger, Zipkin
  - Trace events as they flow through services
  - Correlate events with trace IDs
- Event visualization: Tools like Kafka UI, Redpanda Console
  - See events in real time
  - Replay specific events for debugging
- Dead letter queues: Capture failed events
  - Don't lose events that fail processing
  - Investigate and replay
- Event schemas: Centralized registry
  - Know what events exist
  - Understand event structure
  - Track schema versions
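The dead-letter idea is simple enough to sketch: after a bounded number of failed attempts, park the event for inspection instead of retrying forever. Names and retry policy here are illustrative:

```python
# Sketch of dead-letter handling with a fixed retry budget.
MAX_ATTEMPTS = 3
dead_letter_queue = []

def consume(event, handler):
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handler(event)
            return True
        except Exception as exc:
            last_error = str(exc)
    # Exhausted retries: park the event with its error for investigation.
    dead_letter_queue.append({"event": event, "error": last_error})
    return False

def flaky_handler(event):
    raise ValueError("downstream unavailable")

consume({"eventType": "OrderPlaced"}, flaky_handler)
```

Real brokers (SQS, RabbitMQ, Kafka via a DLQ topic) provide this policy as configuration rather than code.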
Anti-Patterns to Avoid
1. Event Chains That Are Too Long
If event A triggers B which triggers C which triggers D, you’ve created a fragile chain that’s hard to debug.
Better: Use orchestration or combine steps.
2. Using Events for Synchronous Workflows
Don’t use events when you need immediate feedback.
Wrong: Click “Checkout” -> publish event -> wait for confirmation email
Right: Click “Checkout” -> synchronous API call -> immediate confirmation
3. Publishing Internal State Changes
Events should represent business-level facts, not database row changes.
Wrong: UserTableRowUpdated
Right: UserEmailChanged, UserSubscriptionUpgraded
4. Tight Coupling Through Events
If changing one service’s events breaks five others, you’ve just created distributed coupling.
Solution: Define stable contracts, version events, use schema registries.
When to Use Event-Driven Architecture
Good fit:
- Multiple systems need to react to the same change
- You need audit trails and compliance
- Scale requires decoupling services
- Asynchronous workflows (order fulfillment, ETL pipelines)
- Real-time data processing
Not a good fit:
- Simple CRUD applications
- When you need immediate consistency
- Small teams without operational maturity
- Synchronous user interactions
Getting Started
If you’re new to event-driven architecture:
- Start with event notifications for non-critical flows
- Add transactional outbox to ensure reliability
- Consider CQRS if read and write patterns diverge significantly
- Try event sourcing for one bounded context, not everything
- Invest in observability from day one
Conclusion
Event-driven architecture is a powerful tool for building scalable, loosely coupled systems. The patterns I’ve described—event notification, event sourcing, CQRS, sagas—each solve specific problems and come with specific trade-offs.
Don’t adopt these patterns because they’re trendy. Adopt them when you have concrete problems they solve: scale, decoupling, audit requirements, or complex workflows.
Start simple, add complexity only when needed, and invest heavily in observability. Events are a powerful abstraction, but they make debugging harder. With the right patterns and tooling, event-driven systems can be both powerful and maintainable.
Events aren’t a silver bullet, but they’re a damn good tool when you need loose coupling, scale, and audit trails. Use them wisely.