Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,102 @@
limitations under the License.
-->
# Outbox vs Persistent Publisher: Choosing the Right One

When using `KafkaEventPublisher`, produced events are generated and sent to the Kafka
broker in an ephemeral manner: there's no ability to generate the same events again.
The Persistent Publisher is a Hexkit provider that solves this problem by storing
a representation of events in MongoDB as they are published. This makes the database
the de facto backup of Kafka, to an extent. The Persistent Publisher class is
called `PersistentEventPublisher`, located in the `MongoKafka` subpackage.

The other `MongoKafka` provider, the Outbox Publisher, solves a different problem.
Microservices sometimes need to share data without sharing access to the same database.
The Outbox Publisher is essentially a MongoDB DAO retrofitted with a Kafka publisher.
It attaches a small amount of metadata to documents as it upserts or deletes data.
Upon each modifying operation, the provider run the given document through a
user-defined function to produce a Kafka payload which it then publishes. If a
document is deleted, the provider merely publishes a tombstone record with the object
ID as the key. When another service wants to access the data in question, it
subscribes to the Kafka topic housing the outbox events. The Outbox Publisher is
called `MongoKafkaDaoPublisher`, also located in the `MongoKafka` subpackage.

Both the `PersistentEventPublisher` and the `MongoKafkaDaoPublisher` save data to
the database, but they are intended for use in different situations, and are not
interchangeable.

### When to use the Persistent Publisher (`PersistentEventPublisher`)
The Persistent Publisher should be employed when:
- The publisher needs to be used like a normal `KafkaEventPublisher`.
- The data published *does not* represent a stateful entity, like a domain object.
- You want to be able to publish the same Kafka event again.
- Same topic, event type, correlation ID, event ID, payload, etc.
- Often these are events that initiate a sequence of events, or Kafka flow.

### When to use the Outbox Publisher (`MongoKafkaDaoPublisher`)
The Outbox Publisher should be used when:
- The publisher needs to be used like a normal `MongoDbDao`.
- The data published represents the latest state of a domain object
- Examples: user data, access requests, order status, etc.
- Your service needs to inform other services about changes to a given MongoDB
collection.
- You're focused on **storing** data. The Outbox Publisher cannot publish arbitrary
events like the Persistent Publisher -- it behaves like the MongoDB DAO and publishes
events in the background.


## Further Details

### How the Two Publishers Store Information
The Persistent Publisher stores data according to the Pydantic model defined
[here](../../../src/hexkit/providers/mongokafka/provider/persistent_pub.py):

```python
class PersistentKafkaEvent(BaseModel):
"""A model representing a kafka event to be published and stored in the database."""

id: str = Field(
...,
description="The unique ID of the event. If the topic is set to be compacted,"
+ " the ID is set to the topic and key in the format <topic>:<key>. Otherwise"
+ " the ID is set to a random UUID.",
)
topic: Ascii = Field(..., description="The event topic")
type_: Ascii = Field(..., description="The event type")
payload: JsonObject = Field(..., description="The event payload")
key: Ascii = Field(..., description="The event key")
headers: Mapping[str, str] = Field(
default_factory=dict,
description="Non-standard event headers. Correlation ID and event type are"
+ " transmitted as event headers, but added as such within the publisher"
+ " protocol. The headers here are any additional header that need to be sent.",
)
correlation_id: str = Field(..., description="The event correlation ID")
created: datetime = Field(
..., description="The timestamp of when the event record was first inserted"
)
published: bool = Field(False, description="Whether the event has been published")
```

By contrast, the Outbox Publisher saves the domain object like a DAO normally would,
but adds a `__metadata__` field. If we were saving data from a `User` class with
a `user_id`, `name` and `email` field, the saved data might look like this:

```json
{
"__metadata__": {
"correlation_id": <UUID4>,
"published": true,
"deleted": false,
},
"_id": <UUID4>, // hexkit converts models' ID field names to the MongoDB `_id`
"name": "John Doe",
"email": "[email protected]"
}
```

### Republishing
Both providers expose the methods `publish_pending` and `republish`. The former
only publishes documents where `published` is false, while the latter will republish
all its documents regardless of publish status. We suggest implementing a method
in your service accessible via CLI in order to trigger republishing as a job.
In both cases, the provider will use the stored correlation ID when republishing data.