# Event-Driven Architecture

This document provides an overview of how microservices communicate with one another
using Apache Kafka. It offers both a high-level explanation of how information is sent
and received and a look at the layers of abstraction implemented within Hexkit. The
document also outlines how the outbox pattern is employed to solve some of the
challenges that event-driven architecture introduces.

## Event-driven Architecture with Apache Kafka

> See the [Kafka documentation](https://kafka.apache.org/intro) for in-depth
> information on concepts and Kafka's inner workings.

Event-driven architecture refers to a protocol where information is exchanged
indirectly and often asynchronously by interacting with a broker. Rather than
making direct API calls from one service to another, one service, called the *producer*,
publishes information to a broker in the form of an *event* (also called a
*message*), and is ignorant of all downstream processing of the event. It is essentially
"fire and forget". Other services, called *consumers*, can consume the event once it has
been published. In the same way that the producer is ignorant of an event's fate, the
consumers are ignorant of its origin. This has important implications for design.

In Kafka, events have metadata headers, a timestamp, a *key*, and a *payload*.
Hexkit defines a `type` field in the metadata headers to further distinguish events.
Events are organized into configurable *topics*, and further organized within each topic
by key. A topic can have any number of producers and consumers, and messages are not
deleted after consumption (enabling rereads).
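
As a rough illustration, the information attached to a single event might be pictured as
follows; the concrete values and the correlation-ID header name are assumptions for
illustration, not Hexkit's exact wire format:

```python
# Illustrative only: this dictionary merely groups the pieces of information named
# above; the actual wire format is defined by Kafka and Hexkit, not by this sketch.
example_event = {
    "topic": "users",                 # configurable topic the event is published to
    "key": "user-123",                # key used for ordering within the topic
    "headers": {
        "type": "user_created",       # Hexkit's event type header
        "correlation_id": "3f2a...",  # request/correlation ID (see Tracing below)
    },
    "timestamp": 1700000000000,       # assigned when the event is produced
    "payload": {"user_id": "user-123", "email": "jane@example.org"},
}
```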

In the context of microservices, event-driven architecture provides some important benefits:

1. **Decoupling of Service Dependencies:** Services communicate by emitting events,
rather than direct requests, reducing service coupling. Keeping contractual definitions
(i.e. payload schemas) in a dedicated library means services can evolve with minimal
change propagation.
1. **Asynchronous Communication:** Kafka facilitates asynchronous data processing,
allowing services to operate independently without waiting for responses. This improves
performance and fault tolerance, as services can continue functioning even if one
component is slow or temporarily unavailable.
1. **Consistency and Order Guarantee:** Kafka maintains the order of events within
a partition for a given key, which is crucial for consistency when processing events
that are related or dependent upon one another.


But event-driven architecture introduces challenges, too:

1. **Duplicate processing:** Because the event consumers have no way to know whether
an event is a duplicate or a genuinely distinct event with the same payload, the
consumers must be designed to be idempotent.
2. **Data Duplication:** If service A needs to access the data maintained by service B,
the options are essentially an API call (which introduces security and coupling concerns)
or duplicating the data for service A. GHGA has opted to mitigate this issue via data
duplication by way of the outbox pattern, described later in this document.
3. **Tracing:** A single request flow can span many services with a multitude of messages
being generated along the way. The agnostic nature of producers and consumers makes it
harder to tag requests than with traditional API calls, and several consumers can
process a given message in parallel, sometimes leading to complex request flows.
GHGA enables basic tracing by generating and propagating a *correlation ID* (also called
a request ID) as an event header.


## Abstraction Layers

Hexkit features protocols and providers to simplify using
Kafka in services. Instead of writing the functionality from scratch in every new
project, Hexkit provides the `KafkaEventPublisher` and `KafkaEventSubscriber`
provider classes, in addition to an accompanying `KafkaConfig` Pydantic config class.
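
For orientation, a minimal configuration sketch is given below. It deliberately does not
import Hexkit: the field names mirror commonly used Kafka settings but are assumptions, so
the installed Hexkit version should be consulted for the real `KafkaConfig` fields.

```python
# Sketch only: a stand-in for Hexkit's KafkaConfig showing the kind of settings involved.
# The field names are assumptions; consult the installed Hexkit version for the real ones.
from pydantic_settings import BaseSettings


class KafkaConfigSketch(BaseSettings):
    """Illustrative Kafka settings a service would pass to the provider classes."""

    service_name: str = "my_service"               # identifies the service, e.g. for consumer groups
    service_instance_id: str = "1"                 # distinguishes replicas of the same service
    kafka_servers: list[str] = ["localhost:9092"]  # bootstrap servers of the Kafka cluster
```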


### Producers

When a service publishes an event, it uses the `KafkaEventPublisher` provider class
along with a service-defined _translator_ (a class implementing the `EventPublisherProtocol`).
The purpose of the translator is to provide a domain-bound API to the core of the service for
publishing an event without exposing any of the specific provider-related details. The core
calls a method like `publish_user_created()` on the translator, and the translator provides
the requisite payload, topic, type, and key to the `KafkaEventPublisher`.
The translator's module usually features a config class for defining the topics and event
types used. It's important to note that a translator is not strictly required in order to
use the `KafkaEventPublisher`. As long as the required configuration is provided, there's
nothing to stop the core from using it directly.
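
To make this concrete, a minimal producer-side sketch follows. It does not import Hexkit:
the `EventPublisher` protocol is a stand-in for the real provider, and the topic name, event
type, and keyword arguments of `publish()` are assumptions for illustration.

```python
# Sketch only: in a real service the provider would be Hexkit's KafkaEventPublisher and
# the translator would implement EventPublisherProtocol. The topic, event type, and the
# exact keyword arguments of publish() are illustrative assumptions.
from typing import Any, Protocol


class EventPublisher(Protocol):
    """Shape of the provider the translator delegates to (stand-in for the real one)."""

    async def publish(
        self, *, payload: dict[str, Any], type_: str, key: str, topic: str
    ) -> None: ...


class UserEventsTranslator:
    """Domain-bound API the service core uses to publish user-related events."""

    def __init__(self, *, provider: EventPublisher, topic: str = "users"):
        self._provider = provider
        self._topic = topic  # would normally come from the translator's config class

    async def publish_user_created(self, *, user_id: str, email: str) -> None:
        """Translate a domain fact into an event and hand it to the provider."""
        await self._provider.publish(
            payload={"user_id": user_id, "email": email},
            type_="user_created",  # the event type header
            key=user_id,           # keeps events for one user ordered
            topic=self._topic,
        )
```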

### Consumers

While the `KafkaEventPublisher` can be used in isolation, the `KafkaEventSubscriber` requires
a *translator* to deliver event payloads to the service's core components. The translator is
defined outside of Hexkit (i.e. in the service code) and implements the `EventSubscriberProtocol`.
The provider hands off the event to the translator through the protocol's `consume` method,
which in turn calls an abstract method implemented by the translator. The translator receives
the event and examines its attributes (the topic, payload, key, headers, etc.) to determine
how to process it. Naturally, the actual processing logic varies, but usually the payload is
validated against the expected schema and used to call a method in the service's core.
When control is returned to the `KafkaEventSubscriber`, the underlying consumer (from
`AIOKafka`) commits its offsets, declaring that it has successfully handled the event and
is ready for the next.
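
A consumer-side sketch under the same caveats (class, attribute, and method names are
illustrative assumptions rather than Hexkit's exact interface) might look like this:

```python
# Sketch only: a real translator would implement Hexkit's EventSubscriberProtocol.
# The attribute names, method signature, and validation shown here are assumptions.
from typing import Any


class UserEventsSubTranslator:
    """Receives events from the subscriber provider and dispatches them to the core."""

    topics_of_interest = ["users"]        # topics the provider should subscribe to
    types_of_interest = ["user_created"]  # event types this translator handles

    def __init__(self, *, core):
        self._core = core  # the service's core / use-case layer

    async def consume(
        self, *, payload: dict[str, Any], type_: str, topic: str, key: str
    ) -> None:
        """Called by the provider for each received event."""
        if type_ == "user_created":
            # validate the payload against the expected schema (simplified here) ...
            user_id, email = payload["user_id"], payload["email"]
            # ... then call into the service's core
            await self._core.register_user(user_id=user_id, email=email)
        # returning without raising signals success; the provider then commits its offsets
```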

A diagram illustrating the process of event publishing and consuming, starting with
the microservice:
![Kafka abstraction](../img/kafka%20basics%20generic.png)


## Outbox Pattern

Sometimes services need to access data owned by another service.
The challenge in enabling this kind of linkage lies in ensuring data consistency while avoiding
tight coupling. The outbox pattern mitigates these risks by publishing persisted data
as Kafka events for other services to consume as needed. Some advantages include:
- Kafka data can be republished from the database as needed (fault tolerance)
- The coupling between services is minimized
- The data producer remains the single source of truth
- Kafka events no longer need to be backed up separately from the database
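
A deliberately simplified sketch of the producer side of this idea is shown below. Hexkit
provides dedicated outbox machinery for this, so the wrapper class and its method names are
purely illustrative assumptions.

```python
# Sketch only: Hexkit provides dedicated outbox machinery for this; the wrapper below is a
# generic illustration of persisting a resource and publishing its full state as an event.
from typing import Any


class OutboxRepository:
    """Persists documents and publishes their latest state as Kafka events."""

    def __init__(self, *, dao, publisher, topic: str):
        self._dao = dao              # database access object (the single source of truth)
        self._publisher = publisher  # event publisher provider
        self._topic = topic

    async def upsert(self, *, key: str, document: dict[str, Any]) -> None:
        """Write to the database, then publish the full state with type 'upserted'."""
        await self._dao.upsert(key, document)
        await self._publisher.publish(
            payload=document, type_="upserted", key=key, topic=self._topic
        )

    async def republish_all(self) -> None:
        """Replay the current database state, e.g. to recover lost Kafka data."""
        async for key, document in self._dao.iterate_all():
            await self._publisher.publish(
                payload=document, type_="upserted", key=key, topic=self._topic
            )
```

A deletion would analogously remove the record and publish a `deleted` event (see the next
section).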


### Topic Compaction & Idempotence

Topic compaction means retaining only the latest event per key in a topic. Why do this?
In Hexkit's implementation of the outbox pattern, published events are categorized as one
of two event types: `upserted` or `deleted`. The entire state of the object, insofar as
it is exposed by the producer, is represented in every event.
Consumers interpret the event type and act accordingly, based on their needs (although
this typically translates to at least upserting or deleting the corresponding record in their
own database). When combined with topic compaction, the result is that consumers only have
to consume the latest event per key to be up to date with the originating service.
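
For example, a consumer's handling of these two event types might look roughly like the
following sketch, where the `local_db` interface is an assumption for illustration:

```python
# Sketch only: shows how a consumer might interpret the outbox event types idempotently.
# The local_db interface (upsert/delete) is an assumption for illustration.
from typing import Any


async def handle_outbox_event(
    *, type_: str, key: str, payload: dict[str, Any], local_db
) -> None:
    """Apply the latest known state of a resource to the consumer's own database."""
    if type_ == "upserted":
        # upserts are naturally idempotent: replaying the same event yields the same state
        await local_db.upsert(key, payload)
    elif type_ == "deleted":
        # deleting an already-absent record must also be tolerated
        await local_db.delete(key)
```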

### Implications

One requirement of using the outbox pattern as implemented in Hexkit is that consumers
must be idempotent. The state published to Kafka is stored in the producer's database and
can be republished as needed, so services must be prepared to consume the same event more
than once. Event
republishing might need to occur at any time, so it is possible for sequence-dependent
events to be re-consumed out of sequence. Depending on the complexity of the microservice
landscape, it can be difficult to foresee all the potential cases where out-of-order
events could be problematic. That's why it's paramount to perform robust testing and use
features like topic compaction where necessary.

## Tracing

Request flows are traced with a *correlation ID*, or request ID, which is generated at
the request origin. The correlation ID is propagated through subsequent services by
adding it to the metadata headers of Kafka events. When a service consumes an event, the
provider automatically extracts the correlation ID and sets it as a
[context variable](https://docs.python.org/3.9/library/contextvars.html#context-variables)
before calling the translator's `consume()` method. In the case of the outbox mechanism,
the correlation ID is stored in the document metadata. When initial publishing occurs,
the correlation ID is already set in the context. For republishing, however, the
correlation ID is extracted and set before publishing the event, thereby maintaining
traceability for disjointed request flows.
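
A simplified sketch of the propagation mechanism follows; Hexkit ships its own
correlation-ID utilities, so the helper functions and the header name shown here are
illustrative assumptions only.

```python
# Sketch only: illustrates propagating a correlation ID via a context variable.
# Hexkit ships its own correlation-ID utilities; the names here are illustrative.
from contextvars import ContextVar
from uuid import uuid4

correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")


def set_correlation_id_from_headers(headers: dict[str, str]) -> None:
    """Extract the correlation ID from event headers (or create one) and set it."""
    correlation_id.set(headers.get("correlation_id") or str(uuid4()))


def headers_for_outgoing_event() -> dict[str, str]:
    """Attach the current correlation ID to events published downstream."""
    return {"correlation_id": correlation_id.get() or str(uuid4())}
```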