diff --git a/documentation/documentation/arch_concepts/event_driven_arch.md b/documentation/documentation/arch_concepts/event_driven_arch.md
index 153d95f6..bb3d5cd7 100644
--- a/documentation/documentation/arch_concepts/event_driven_arch.md
+++ b/documentation/documentation/arch_concepts/event_driven_arch.md
@@ -15,3 +15,147 @@ limitations under the License.
-->

# Event-Driven Architecture

This document provides an overview of how microservices communicate with each other
using Apache Kafka. It offers both a high-level explanation of how information is sent
and received and a look at the layers of abstraction implemented within Hexkit. The
document also outlines how a design pattern called the outbox pattern is employed to
solve certain challenges.

## Event-driven Architecture with Apache Kafka

> See the [Kafka documentation](https://kafka.apache.org/intro) for in-depth
> information on concepts and Kafka's inner workings.

Event-driven architecture refers to a communication style in which information is
exchanged indirectly, and often asynchronously, through a broker. Rather than making
direct API calls from one service to another, one service, called the *producer*,
publishes information to the broker in the form of an *event* (also called a
*message*) and remains ignorant of all downstream processing of the event. It is
essentially "fire and forget". Other services, called *consumers*, can consume the
event once it has been published. Just as the producer is ignorant of an event's fate,
the consumers are ignorant of its origin. This has important implications for design.
In Kafka, events have metadata headers, a timestamp, a *key*, and a *payload*.
Hexkit defines a `type` field in the metadata headers to further distinguish events.
Events are organized into configurable *topics*, and further organized within each topic
by key. A topic can have any number of producers and consumers, and messages are not
deleted after consumption (enabling rereads).

In the context of microservices, event-driven architecture provides some important benefits:

1. **Decoupling of Service Dependencies:** Services communicate by emitting events
rather than making direct requests, reducing service coupling. Keeping contractual
definitions (i.e. payload schemas) in a dedicated library means services can evolve with
minimal change propagation.
2. **Asynchronous Communication:** Kafka facilitates asynchronous data processing,
allowing services to operate independently without waiting for responses. This improves
performance and fault tolerance, as services can continue functioning even if one
component is slow or temporarily unavailable.
3. **Consistency and Order Guarantee:** Kafka maintains the order of events within
a partition for a given key, which is crucial for consistency when processing events
that are related or dependent upon one another.


But event-driven architecture introduces challenges, too:

1. **Duplicate Processing:** Because the event consumers have no way to know whether
an event is a duplicate or a genuinely distinct event with the same payload, the
consumers must be designed to be idempotent.
2. **Data Duplication:** If service A needs to access the data maintained by service B,
the options are essentially an API call (which introduces security and coupling concerns)
or duplicating the data for service A. GHGA has opted to mitigate this issue via data
duplication by way of the outbox pattern, described later in this document.
3. **Tracing:** A single request flow can span many services, with a multitude of messages
being generated along the way. The agnostic nature of producers and consumers makes it
harder to tag requests than with traditional API calls, and several consumers can
process a given message in parallel, sometimes leading to complex request flows.
GHGA enables basic tracing by generating and propagating a *correlation ID* (also called
a request ID) as an event header.


## Abstraction Layers

Hexkit features protocols and providers that simplify using Kafka in services. Instead of
writing the functionality from scratch in every new project, Hexkit provides the
`KafkaEventPublisher` and `KafkaEventSubscriber` provider classes, in addition to an
accompanying `KafkaConfig` Pydantic config class.


### Producers

When a service publishes an event, it uses the `KafkaEventPublisher` provider class
along with a service-defined _translator_ (a class built on top of the `EventPublisherProtocol`).
The purpose of the translator is to give the core of the service a domain-bound API for
publishing events without exposing any provider-specific details. The core
calls a method like `publish_user_created()` on the translator, and the translator provides
the requisite payload, topic, type, and key to the `KafkaEventPublisher`.
The translator's module usually features a config class for defining the topics and event
types used. Note that a translator is not strictly required in order to use the
`KafkaEventPublisher`: as long as the required configuration is provided, there's
nothing to stop the core from using it directly.

### Consumers

While the `KafkaEventPublisher` can be used in isolation, the `KafkaEventSubscriber` requires
a *translator* to deliver event payloads to the service's core components. The translator is
defined outside of Hexkit (i.e. in the service code) and implements the `EventSubscriberProtocol`.
The provider hands off the event to the translator through the protocol's `consume` method,
which in turn calls an abstract method implemented by the translator. The translator receives
the event and examines its attributes (the topic, payload, key, headers, etc.) to determine
how the event should be processed. Naturally, the actual processing logic varies, but usually
the payload is validated against the expected schema and used to call a method in the
service's core. When control is returned to the `KafkaEventSubscriber`, the underlying
consumer (from `aiokafka`) saves its offsets, declaring that it has successfully handled
the event and is ready for the next.

A diagram illustrating the process of event publishing and consuming, starting with
the microservice:
![Kafka abstraction](../img/kafka%20basics%20generic.png)
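To make the division of responsibilities more concrete, the simplified sketch below mimics
the pattern described above without importing Hexkit itself: a publisher-side translator
exposes a domain method (`publish_user_created()`) and supplies topic, type, key, and
payload to a publisher port, while a subscriber-side translator receives consumed events
and dispatches them to the core. The stand-in port class only approximates Hexkit's
`EventPublisherProtocol`, the subscriber stand-in collapses the protocol's
`consume`-plus-abstract-method indirection into a single method, and all class, method,
and topic names (`UserEventsPublisher`, `UserEventsSubscriber`, `"users"`, etc.) are
hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any

JsonObject = dict[str, Any]


class EventPublisherPort(ABC):
    """Stand-in for the publishing port; Hexkit's EventPublisherProtocol is richer."""

    @abstractmethod
    async def publish(
        self, *, payload: JsonObject, type_: str, key: str, topic: str
    ) -> None:
        ...


class UserEventsPublisher:
    """Hypothetical outbound translator: offers a domain-bound API to the core
    and hides all Kafka-specific details behind the publisher port."""

    def __init__(self, *, publisher: EventPublisherPort, topic: str = "users"):
        self._publisher = publisher
        self._topic = topic  # normally supplied via a Pydantic config class

    async def publish_user_created(self, *, user_id: str, name: str) -> None:
        # The translator decides topic, type, key, and payload -- the core does not.
        await self._publisher.publish(
            payload={"user_id": user_id, "name": name},
            type_="user_created",
            key=user_id,
            topic=self._topic,
        )


class UserEventsSubscriber:
    """Hypothetical inbound translator: receives events from the subscriber
    provider and calls into the service's core."""

    topics_of_interest = ["users"]  # which topics this translator cares about

    def __init__(self, *, user_registry: Any):
        self._user_registry = user_registry  # a core component

    async def consume(
        self, *, payload: JsonObject, type_: str, topic: str, key: str
    ) -> None:
        # Examine the event's attributes to decide how to process it:
        if type_ == "user_created":
            await self._user_registry.add_user(
                user_id=payload["user_id"], name=payload["name"]
            )
        # Unknown types are ignored so new event types don't break the service.
```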
## Outbox Pattern

Sometimes services need to access data owned by another service.
The challenge of enabling this kind of linkage lies in ensuring data consistency while
avoiding coupling. The outbox pattern provides a way to mitigate these risks by publishing
persisted data as Kafka events for other services to consume as needed.
Some advantages include:
- Kafka data can be republished from the database as needed (fault tolerance)
- The coupling between services is minimized
- The data producer remains the single source of truth
- Kafka events no longer need to be backed up separately from the database


### Topic Compaction & Idempotence

Topic compaction means retaining only the latest event per key in a topic. Why do this?
In Hexkit's implementation of the outbox pattern, published events are categorized as one
of two event types: `upserted` or `deleted`. The entire state of the object, insofar as
it is exposed by the producer, is represented in every event.
Consumers interpret the event type and act accordingly, based on their needs (although
this typically means at least upserting or deleting the corresponding data in their own
database). When combined with topic compaction, the result is that consumers only have to
consume the latest event per key to be up to date with the originating service.
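To illustrate why the `upserted`/`deleted` split keeps consumers simple, the sketch below
shows a hypothetical consumer-side store (not Hexkit's actual outbox machinery) that
applies outbox events idempotently: upserts overwrite by key, deletes tolerate missing
keys, and duplicates or replays leave the state unchanged. The names `UserCache` and
`handle_outbox_event` are illustrative only.

```python
from typing import Any

JsonObject = dict[str, Any]


class UserCache:
    """Hypothetical consumer-side copy of data received via outbox events.

    Handling is idempotent: replaying the same 'upserted' or 'deleted' event
    any number of times leaves the store in the same state, which is exactly
    what a compacted topic plus occasional republishing requires.
    """

    def __init__(self) -> None:
        self._users: dict[str, JsonObject] = {}  # keyed by event key (resource ID)

    def handle_outbox_event(self, *, type_: str, key: str, payload: JsonObject) -> None:
        if type_ == "upserted":
            # The event carries the full exposed state of the resource, so a
            # plain overwrite suffices -- no need to know if it already existed.
            self._users[key] = payload
        elif type_ == "deleted":
            # Deleting a resource that is already gone is a no-op, not an error.
            self._users.pop(key, None)


cache = UserCache()
cache.handle_outbox_event(type_="upserted", key="u1", payload={"user_id": "u1", "name": "Ada"})
cache.handle_outbox_event(type_="upserted", key="u1", payload={"user_id": "u1", "name": "Ada"})  # duplicate: same state
cache.handle_outbox_event(type_="deleted", key="u1", payload={})
```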
### Implications

One requirement of using the outbox pattern as implemented in Hexkit is that consumers
must be idempotent. The event data is persisted in the producer's database and can be
republished to Kafka as needed, so services must be prepared to consume the same event
more than once. Event republishing might need to occur at any time, so it is possible for
sequence-dependent events to be re-consumed out of sequence. Depending on the complexity
of the microservice landscape, it can be difficult to foresee all the potential cases
where out-of-order events could be problematic. That's why it's paramount to perform
robust testing and use features like topic compaction where necessary.

## Tracing

Request flows are traced with a *correlation ID*, or request ID, which is generated at
the request origin. The correlation ID is propagated through subsequent services by
adding it to the metadata headers of Kafka events. When a service consumes an event, the
provider automatically extracts the correlation ID and sets it as a
[context variable](https://docs.python.org/3.9/library/contextvars.html#context-variables)
before calling the translator's `consume()` method. In the case of the outbox mechanism,
the correlation ID is stored in the document metadata. When initial publishing occurs,
the correlation ID is already set in the context. For republishing, however, the
correlation ID is extracted from the stored document metadata and set before publishing
the event, thereby maintaining traceability for disjointed request flows.

diff --git a/documentation/documentation/img/kafka basics generic.png b/documentation/documentation/img/kafka basics generic.png
new file mode 100644
index 00000000..45565a99
Binary files /dev/null and b/documentation/documentation/img/kafka basics generic.png differ