Event Driven Architecture (EDA) - Optimizer or Complicator

This article shares real-world lessons from using tools like Kafka and AWS SNS, covering both the strengths of EDA and its common pitfalls.

By Soumya Ramesh · May. 23, 25 · Tutorial

Abstract

This article explores the practical realities of Event-Driven Architecture (EDA), a paradigm often celebrated for its scalability, responsiveness, and flexibility in modern software systems. Drawing on years of hands-on experience with technologies like Kafka, RabbitMQ, and AWS SNS, I present a balanced view of EDA's benefits and pitfalls. The article covers its core advantages, including loose coupling, asynchronous processing, real-time responsiveness, and extensibility, and shows how it aligns with complementary approaches like Domain-Driven Design. It also sheds light on lesser-discussed challenges such as observability gaps, schema versioning, testing complexity, event duplication, and message sequencing. Real-world success stories from companies like Netflix and Walmart illustrate its potential, and I emphasize the importance of guardrails, error-handling patterns, and security best practices for building resilient systems. Ultimately, this article advocates a thoughtful, problem-first approach to adopting EDA: it can optimize systems at scale, but it must be applied judiciously to avoid unnecessary complexity.

Introduction

Over the years I have used a variety of technologies and tools to build applications for customers and internal teams. I am a developer who has seen the transition from the Titanic (waterfall) to the Jet Ski (agile), from Gantt charts to stand-ups, and from giant bricks (monoliths) to tiny boxes (microservices). We are always riding the wave of technology. Every time we move to something new or more modern, it is glorified, and sometimes we put the technology ahead of the problem.

An image showing "technology is not the problem"

For years I have been part of implementations using ActiveMQ, RabbitMQ, and AWS SNS, and more recently Apache Kafka. I have seen how well event-driven design blends in and helps build distributed systems. But not everything can be event driven, and the approach has its own pitfalls. This is my effort to pen down my thoughts about the rush to move from a "request-driven architecture" to an "event-driven architecture."

What Is Event Driven Architecture?

Event-driven architecture is a software design pattern built around events. It allows systems to detect, process, and react to events in real time as they happen. It helps bind together complex, distributed systems and is usually combined with other architectural styles such as microservices, domain-driven design, or serverless. It works best alongside a technology or architecture that complements it and brings little value when used alone.

An image showing an Event Driven Architecture


Many technologies and tools support event-driven systems and real-time processing. By decoupling services and using event brokers like Kafka, RabbitMQ, or AWS SNS/SQS, EDA can support high availability, fault tolerance, and efficient handling of large-scale workloads.

Benefits of EDA                                        

An image showing the benefits of Event Driven Architecture


Asynchronous Processing 

Tasks that take a long time or do not need to give the user immediate feedback can run in the background, allowing the system to handle many tasks in parallel. Since event producers and consumers are decoupled, each can scale independently based on demand. The producer does not wait for a response before moving on to the next task, which reduces blocking and improves response times. Failures in one part of the system don’t necessarily bring down the entire application.
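To make the idea concrete, here is a minimal sketch of asynchronous processing using only Python's standard `asyncio` module. An in-process `asyncio.Queue` stands in for a real broker, and the `OrderPlaced` event, the `produce_orders` and `consume_orders` functions, and the sentinel value are all assumptions made for illustration.

```python
import asyncio


async def produce_orders(queue: asyncio.Queue, order_ids: list) -> None:
    """Publish one event per order without waiting for it to be processed."""
    for order_id in order_ids:
        await queue.put({"type": "OrderPlaced", "order_id": order_id})
    await queue.put(None)  # sentinel: tells the consumer there is no more work


async def consume_orders(queue: asyncio.Queue, handled: list) -> None:
    """Process events in the background until the sentinel arrives."""
    while True:
        event = await queue.get()
        if event is None:
            break
        await asyncio.sleep(0)  # stand-in for slow work (email, invoice, ...)
        handled.append(event["order_id"])


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    handled: list = []
    # Producer and consumer run concurrently and only share the queue.
    await asyncio.gather(
        produce_orders(queue, [1, 2, 3]),
        consume_orders(queue, handled),
    )
    return handled


result = asyncio.run(main())
```

The producer never calls the consumer directly; it only knows about the queue, which is the property that lets each side be scaled or replaced on its own.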

Loosely Coupled

Multiple systems can work together without knowing each other's internal complexities. Because components are decoupled, you can modify or upgrade one part of the system without significantly affecting others. Testing components independently is easier in loosely coupled systems: mocking or stubbing other services and components becomes straightforward, allowing for thorough, isolated tests. When issues arise, they are typically easier to isolate to specific components because of the loose dependencies. Different teams can work on separate components or services simultaneously without being blocked by dependencies on each other’s work. This speeds up development cycles and deployment.

Highly Scalable

Event-driven systems are designed to scale up or down with demand and traffic, which makes EDA a sought-after pattern these days. Asynchronous processing and loose coupling make them highly scalable. Event brokers can be scaled horizontally: adding more nodes to the broker infrastructure increases event-handling capacity, often close to linearly, allowing for higher throughput. Cloud platforms provide services that automatically adjust resources based on demand, so large amounts of traffic can be handled without manually managing the scaling. This in turn can reduce costs considerably.

An image showing scalability


Real-Time Processing 

Various systems can react to each other's events and bring a whole ecosystem together. One event can be consumed by one or more consumers and they could use it for varied operations or even decide not to act upon it. Real-time processing in Event-Driven Architecture enables systems to respond to changes as they happen, ensuring low-latency, high-throughput processing. This model is ideal for scenarios that require immediate action based on real-time data, such as fraud detection, real-time analytics, and user-facing applications. 

Highly Extensible

Once an event is produced, there is no limit to how many systems can consume it, nor any constraint on how the message gets used. Each consumer can process the same message and interpret all or part of its content according to its own use case. Since consumers process events independently, the overall system can often handle much higher throughput than a traditional request-response model.
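The fan-out described above can be sketched with a tiny in-memory event bus. This is an illustrative stand-in for a real broker, not how Kafka or SNS is implemented; the `EventBus` class, the `OrderPlaced` event type, and the field names are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """In-memory stand-in for a broker: one event, any number of consumers."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)  # event type -> list of handlers

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> int:
        """Deliver the event to every subscriber; return how many received it."""
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
emails, totals = [], []

# Two independent consumers pick different fields from the same event.
bus.subscribe("OrderPlaced", lambda e: emails.append(e["customer"]))
bus.subscribe("OrderPlaced", lambda e: totals.append(e["amount"]))

delivered = bus.publish(
    "OrderPlaced", {"customer": "ana", "amount": 42, "sku": "X1"}
)
```

Adding a third consumer later requires no change to the producer, which is what makes the pattern extensible.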

Supplements Domain-Driven Design

An image showing Domain-Driven Architecture


Multiple domains coming together and working in tandem is a hard goal to achieve. If the domains just focus on their business logic, then event driven architecture can work beautifully to integrate various domains together. Event-Driven Architecture (EDA) aligns perfectly with Domain-Driven Design (DDD), especially in large, complex systems where services need to evolve independently while maintaining business integrity. EDA enhances DDD by providing a flexible, scalable, and asynchronous way to handle communication between bounded contexts, ensuring that domain events are propagated efficiently while keeping services loosely coupled and autonomous.

Success Stories 

There is no dearth of success stories for event-driven architecture. Companies of every size, from giants like Uber, Netflix, Walmart, and LinkedIn to mid-size companies and startups, have used it and reaped the benefits. Event-driven architecture has become a core architectural choice for modern software enterprises.

Netflix is a good case study: it picked a combination of push and pull mechanisms for its events. It is an example of one event flow with multiple consumption patterns (push or pull), at massive scale, with resilience and flexibility built in.

Walmart uses Kafka for its real-time replenishment use case. Walmart moved from reactive, slow replenishment to a proactive, real-time supply chain powered by event-driven architecture, with Kafka providing a scalable, decoupled, and resilient backbone.

Hidden Pitfalls

Event-driven architecture also comes with challenges that are rarely talked about and that can make development and maintenance very complex. The more common ones are listed below.

An image showing the challenges with using event driven architecture


Harder Observability 

Because events pave the way to a distributed system, consolidating logs becomes a pain point. There is no centralized dashboard available by default, so some failures can go unnoticed and have a cascading effect.

Schema Versioning

The development phase usually goes well with events. Once the system is in production, however, schema changes such as adding or removing fields have to be investigated carefully and made backward compatible, and a good regression suite has to be built.

Complexity in Testing

Developer testing, such as unit testing, can be straightforward. But end-to-end testing, performance testing, and simulating scenarios where systems go offline can be quite tedious.

Event Spamming 

Events that lack detail can be useless. And, as with logs, too many events cause nothing but havoc for the developer who has to debug them.

Message Broker

EDA ties you to a message broker such as Kafka or ActiveMQ. You can run a self-managed solution or buy a platform; either way it comes with a price and a learning curve.

Deduplication 

Duplicate events are a common problem, and they typically come from:

  • Events being replayed because of faulty error handling
  • Producers accidentally sending the same event more than once
  • The lack of a standard unique ID mechanism

Sequencing Messages

Sometimes business requirements call for messages from a single business entity to be processed in sequence. Brokers support this in different ways, but it is not always the default behavior, so teams have to handle it based on their business or design needs.
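One common way to get per-entity ordering is to route every event for the same business key to the same partition, since brokers such as Kafka preserve order within a partition but not across partitions. The sketch below illustrates the idea with plain Python lists; `partition_for` and the use of CRC32 are assumptions for illustration and are not the hash function any particular broker uses.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a business key to a partition index."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [
    ("account-1", "Opened"),
    ("account-2", "Opened"),
    ("account-1", "Deposited"),
    ("account-1", "Closed"),
]

# Events with the same key always land in the same partition, so their
# relative order is preserved even though partitions are independent.
for key, name in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, name))

account_1_partition = partitions[partition_for("account-1", NUM_PARTITIONS)]
account_1_history = [name for key, name in account_1_partition if key == "account-1"]
```

The trade-off is that ordering is only guaranteed per key; events for different entities may still be processed in any relative order.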

Best Practices for Resilience

A resilient event-driven system is scalable, efficient, maintainable and, most importantly, fault tolerant. Some suggestions for making a system more resilient are below:

Schema Versioning and Validation 

Validating the schema at the client, broker, or consumer end is a good way to avoid exceptions that can bring event processing to a halt. A good schema versioning pattern goes a long way toward preventing regressions. A new consumer should be able to process old events, and old consumers should be able to ignore new fields.
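A minimal sketch of that compatibility rule on the consumer side might look like the following. The `OrderPlaced` schema, its fields, and the `parse_order_placed` helper are hypothetical; a real system would more likely rely on a schema registry or a format such as Avro or Protobuf.

```python
from dataclasses import dataclass, fields


@dataclass
class OrderPlaced:
    order_id: str
    amount: float
    currency: str = "USD"  # assumed to be added in a later version; the
                           # default keeps older events readable


REQUIRED_FIELDS = ("order_id", "amount")


def parse_order_placed(payload: dict) -> OrderPlaced:
    """Validate required fields, default missing ones, ignore unknown ones."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    known_names = {f.name for f in fields(OrderPlaced)}
    known = {k: v for k, v in payload.items() if k in known_names}
    return OrderPlaced(**known)


# An old event without "currency" is still accepted by the new consumer.
old_event = parse_order_placed({"order_id": "o-1", "amount": 10.0})

# A newer event with an extra "coupon" field does not break this consumer.
new_event = parse_order_placed(
    {"order_id": "o-2", "amount": 5.0, "currency": "EUR", "coupon": "X"}
)
```

Rejecting only events that lack required fields, rather than any event with an unexpected shape, is what lets producers and consumers be upgraded independently.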

Use Persistent Event Storage

A good persistence mechanism at the broker's end is very important: crashes, exceptions, and service restarts must not cause data loss, which can lead to catastrophic production issues. An immutable log with a longer TTL (time to live) is one good approach, and it also allows for replays. Depending on your needs, databases such as Postgres or DocumentDB can also be good options for event storage.
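As a rough illustration of an append-only log that survives restarts and supports replay, here is a file-backed sketch using JSON lines. The `FileEventLog` class and its methods are hypothetical; it ignores retention (TTL), concurrency, and compaction, all of which a real broker or database handles for you.

```python
import json
import os
import tempfile


class FileEventLog:
    """Append-only event log stored as one JSON document per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, event: dict) -> None:
        # Events are only ever appended, never updated in place.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to persist before returning

    def replay(self, from_offset: int = 0) -> list:
        """Return all events starting at the given zero-based offset."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            events = [json.loads(line) for line in f if line.strip()]
        return events[from_offset:]


log_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
log = FileEventLog(log_path)
for seq in range(3):
    log.append({"seq": seq})

# A new instance (for example, after a restart) can replay from any offset.
replayed = FileEventLog(log_path).replay(from_offset=1)
```

Because consumers track an offset rather than deleting messages, a failed consumer can be fixed and then replay the events it missed.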

Idempotency and Unique IDs

An image showing idempotency and unique constraints


  • Assign unique IDs to events, ideally at the producer side.
  • Enforce unique constraints if a database is involved.
  • Use a reliable consumer acknowledgement mechanism, so that events are not redelivered unnecessarily.
  • Deduplicate at the broker where possible; this is a very powerful mechanism too. For example, Kafka offers exactly-once semantics (EOS), which can help prevent duplication.

Error Handling  

Some powerful, well-established error-handling patterns apply to events too, such as the following:

  • Dead Letter Queue (DLQ) - This keeps problematic messages out of the main flow and allows for manual intervention or analysis later. These messages can be retried, logged, or used to generate alarms and alerts.


An image showing dead letter queue



  • Circuit Breaker Pattern - If a service is failing, this pattern stops further events from being sent to that consumer. This reduces strain on the system and avoids cascading failures while maintaining high system availability.
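A simplified circuit breaker around an event handler could look like the sketch below. The `CircuitBreaker` class, its thresholds, and the injectable `clock` are assumptions for illustration; production code would usually use an existing resilience library rather than a hand-rolled one.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the handler is not called."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, handler, event):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("consumer is paused")
            self._opened_at = None  # half-open: let one trial call through
        try:
            result = handler(event)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        self._failures = 0  # a success closes the circuit again
        return result


def failing_handler(event):
    raise RuntimeError("downstream unavailable")


fake_time = [0.0]  # controllable clock so the example needs no real waiting
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: fake_time[0])
outcomes = []
for _ in range(3):
    try:
        breaker.call(failing_handler, {"id": 1})
    except CircuitOpenError:
        outcomes.append("rejected")
    except RuntimeError:
        outcomes.append("failed")

fake_time[0] = 11.0  # after the reset window a trial call is allowed
outcomes.append(breaker.call(lambda event: "ok", {"id": 2}))
```

While the circuit is open, events are rejected immediately instead of piling more load onto a service that is already failing.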

Observability 

Because the system is distributed, special thought has to be given to consolidating and monitoring logs, setting up alerts, collecting metrics, and monitoring performance. Tools like Datadog, Grafana, Splunk, or New Relic come to the rescue, but at a heavy price.

Security Considerations

By applying security best practices, an event-driven architecture can remain robust against cyber threats, support compliance (GDPR, PCI-DSS, HIPAA), and maintain data integrity across distributed systems.

DDoS & Rate Limiting – Implement rate limits and throttling on event producers and consumers to prevent abuse and DDoS attacks.
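One widely used way to implement such a limit is a token bucket, sketched below. The `TokenBucket` class, its parameters, and the injectable `clock` are illustrative assumptions; managed brokers and API gateways typically offer rate limiting as configuration rather than code.

```python
import time


class TokenBucket:
    """Allows short bursts up to `capacity`, then `rate` events per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Return True if one more event may be published right now."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


fake_now = [0.0]  # controllable clock so the example is deterministic
bucket = TokenBucket(rate=1.0, capacity=2, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(3)]  # third event exceeds the burst
fake_now[0] = 1.0                           # one second later, one token is back
after_refill = bucket.allow()
```

A rejected event does not have to be dropped; the producer can delay and retry it, which is the throttling behavior described above.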

Message Broker Security – Secure Kafka, RabbitMQ, or AWS SNS/SQS by configuring ACLs, enabling SSL/TLS, and restricting public access to prevent unauthorized event publishing or consumption.

Data Encryption – Encrypt events in transit (TLS) and at rest (AES-256) to protect sensitive information from interception or leaks.

Authentication & Authorization – Use OAuth 2.0, JWT, or API keys to ensure that only authorized producers and consumers can publish or subscribe to events. Role-Based Access Control (RBAC) or Attribute-Based Access Control (ABAC) can also be implemented for fine-grained permissions.

Other Considerations 

  • Throttling - Slowing down processing when the system is already stressed is pivotal. Throttling can be applied at the broker level or by the consumers themselves.
  • Retries - Exponential backoff, minimum and maximum retry counts at the consumer, and retry topics are some mechanisms to consider.
  • Rate limiting - Helps systems protect themselves from attacks or accidental overload.
  • Auto Scaling - The number of consumers can be adjusted based on traffic load, adding pods under heavy traffic and removing them when traffic drops.
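The retry mechanisms above combine naturally with the dead letter queue described earlier. Here is a minimal sketch: the `process_with_retries` function, its default delays, and the list used as a DLQ are all hypothetical, and jitter is left out for brevity even though it is usually worth adding.

```python
import time


def process_with_retries(
    event,
    handler,
    dead_letters,
    max_attempts=3,
    base_delay=0.5,
    max_delay=8.0,
    sleep=time.sleep,
):
    """Retry with exponential backoff; park the event in a DLQ if all fail."""
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(event)
        except Exception as exc:
            if attempt == max_attempts:
                # Out of retries: keep the event for later analysis or replay.
                dead_letters.append(
                    {"event": event, "error": repr(exc), "attempts": attempt}
                )
                return None
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))


def always_fails(event):
    raise ValueError("bad payload")


delays, dlq = [], []  # record the delays instead of actually sleeping
outcome = process_with_retries({"id": 7}, always_fails, dlq, sleep=delays.append)
```

Capping both the number of attempts and the delay keeps one poisonous message from blocking the events queued behind it.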

Conclusion - A Balanced Approach

I am not advocating a return to synchronous architecture; I have seen how scalable and extensible event-driven architecture has made our systems. But if it is not done right, it can become a maintenance nightmare for an organization. Some key takeaways I would like to focus on:

  • Not everything has to be solved via event driven patterns. 
  • Simple use cases have plenty of tried and tested alternatives. When a problem can be solved with a few microservices and a frontend, bringing in an event-driven architecture might be overkill.
  • There is no "one size fits all": understand your business requirements, scaling needs, and NFRs (non-functional requirements).
  • Event-driven architecture and the right guardrails around it go hand in hand.
  • Event-driven architecture is not complex in itself; it just exposes complexity that we did not have to solve before or that was hidden in a monolith.
  • Most of us will not get event-driven architecture right the first time. Tweaking, learning as we go, and making modifications will reap results.
  • Wherever possible, a hybrid system might be a good pattern to follow. 
  • As we look at a brighter future with AI and more robust new technologies, I think the first question to ask will always be “What is the problem we are trying to solve and why?” 