How We Solved Kafka Event Sequencing in Our Online Grocery Application
🛒 Introduction: When Event Order Does Matter
In an event-driven architecture, event sequencing can sometimes be overlooked, especially when the order of events isn’t critical. However, in our online grocery application, we encountered a situation where event order became crucial for ensuring accurate processing of cashback promotions.
In this blog post, I will share how we solved a Kafka event sequencing issue in our system: the challenges we faced, the potential solutions we explored, and the strategy we ultimately implemented to ensure that events were processed in the correct order. By adjusting our approach to Kafka partitioning and event handling, we resolved the issue while also preparing the system for future scalability.
🧾 The Business Context: Cashback in an Online Grocery App
Before diving into the technical details, let’s discuss the context of our project. Our client operates an online grocery shop application where customers can purchase items. After an order is placed, the items are collected from the store and delivered to the customer.
Within this application, we’ve implemented a “cashback promotion” on certain products. For example, if you purchase a chocolate bar for $2, the application credits $1 back to your wallet after you receive your order.
📦 The Architecture Behind Our Cashback System
To manage these promotions effectively, state changes related to each order must be communicated to a third-party system. Our architecture consists of two main services:
- Order Service: Responsible for creating and updating orders.
- Order-Cashback Integration Service: Facilitates communication between the order service and the third-party system.
Communication between these services is managed through Kafka topics such as OrderCashbackCreated, OrderCashbackUpdated, OrderCashbackCancelled, and OrderCashbackCompleted.
The Order Service produces messages based on order updates and sends them to Kafka. Subsequently, the order-cashback-integration service consumes these messages and sends REST requests to the third-party system.
Below is an architecture diagram that provides a visual overview of this process.
⚠️ The Problem: Events Out of Order
Some time after this project went live, our system started firing a flood of alerts. We checked the exceptions and discovered that the OrderCashbackUpdated event was being consumed before the OrderCashbackCreated event. As a result, the OrderCashbackUpdatedConsumer was throwing exceptions.
We checked Kafka and realized there was lag on the OrderCashbackCreated topic, but none on OrderCashbackUpdated. This made sense, as updates are generally less frequent than creates.
🔍 First Thoughts: Is It a Consumer Speed Issue?
Initially, we thought the problem was the speed of the OrderCashbackCreatedConsumer. We considered increasing the number of partitions in the OrderCashbackCreated topic, which would allow the consumer to process events faster and potentially prevent the errors.
However, after a brainstorming session, we realized that this approach wouldn’t guarantee the correct order of consumption. If the system receives many orders from users simultaneously, how can we ensure that the OrderCashbackCreated event is consumed before any updates?
💭 Considering Alternatives: What About HTTP?
We also considered using HTTP calls instead of Kafka to preserve the order of events. While this approach ensures sequencing, it would have required significantly more effort to implement and could have introduced scalability issues in the future.
Because of these trade-offs, we decided against this solution.
✅ The Solution: One Topic, Partitioned by Order ID
Ultimately, we recognized that Kafka ensures ordered consumption within a single partition. So, we decided to combine all cashback-related events into a single topic and ensure that all events related to a specific order were routed to the same partition.
By using Kafka’s default partitioner and setting the order ID as the message key, we guaranteed that all events for a given order would be processed in the correct sequence. Below is our final architecture diagram.
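To make the keying concrete, below is a minimal sketch of how a producer could publish these events with the order ID as the message key. The publisher class, topic name, and KafkaTemplate wiring here are illustrative assumptions, not our exact production code.
import org.springframework.kafka.core.KafkaTemplate
import org.springframework.stereotype.Service

@Service
class OrderCashbackPromotionEventPublisher(
    private val kafkaTemplate: KafkaTemplate<String, OrderCashbackPromotionEvent<*>>
) {
    // Using the order ID as the key lets Kafka's default partitioner hash all
    // events for the same order onto the same partition, preserving their order.
    fun publish(orderId: Long, event: OrderCashbackPromotionEvent<*>) {
        kafkaTemplate.send("order-cashback-promotion", orderId.toString(), event)
    }
}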
🧰 Implementation Highlights
For the implementation, we leveraged Kotlin generics so that a single event class could carry different payload types. Below is the event class we used. These events are published by the order-service and consumed by the order-cashback-integration service.
data class OrderCashbackPromotionEvent<T>(
    val payload: T,
    val type: OrderCashbackPromotionCommandType
)
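For reference, OrderCashbackPromotionCommandType is a simple enum mirroring the four lifecycle events we merged into the single topic. The exact values are sketched here as an assumption based on the original topic names.
enum class OrderCashbackPromotionCommandType {
    CREATED,
    UPDATED,
    CANCELLED,
    COMPLETED
}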
An example create event is shown below.
{
"payload": {
"orderId": 24323523,
"storeNo": "54645",
"orderItemsWithDiscountSummaryItems": [
{
"productItem": {
"externalProductId": "67867",
"quantity": 2,
"unit": "KG",
"originalPrice": 156,
"discountedPrice": 153,
"finalTotalPrice": 306,
"vatRate": 1
}
}
],
"buyerId": "1234",
"transactionId": "5661042b-cc39-4567-a264-265943097139",
"transactionDate": "2025-02-28T13:36:06.026732837Z"
},
"type": "CREATED"
In the integration service, we needed to consume multiple event types using a single consumer. To handle this, we applied the Strategy Pattern.
Each event type has a corresponding processor. The consumer identifies and invokes the correct processor based on the event.
@Component
class OrderCashbackPromotionConsumer(
private val cashbackPromotionProcessorProvider: CashbackPromotionProcessorProvider,
) {
private val logger = LoggerFactory.getLogger(this.javaClass)
@KafkaListener(
topics = ["\${kafka.topics.orderCashbackPromotion}"],
concurrency = "\${kafka.listeners.orderCashbackPromotion.concurrency}"
)
fun receive(@Payload orderCashbackPromotionEvent: OrderCashbackPromotionEvent<*>) {
runBlocking {
runCatching {
cashbackPromotionProcessorProvider.getProcessor(orderCashbackPromotionEvent.type)
?.process(orderCashbackPromotionEvent)
}.onFailure {
logger.error(
"OrderCashbackPromotionConsumer consumer failed, eventPayload: ${orderCashbackPromotionEvent.payload}",
it
)
throw it
}
}
}
}
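The consumer above depends on a CashbackPromotionProcessor abstraction and a provider that maps each event type to its processor. A minimal sketch of those two pieces might look as follows; the interface shape is implied by the consumer and the processor shown below, while the provider implementation is an assumption.
import org.springframework.stereotype.Component

interface CashbackPromotionProcessor {
    suspend fun process(orderCashbackPromotionEvent: OrderCashbackPromotionEvent<*>)
    fun getType(): OrderCashbackPromotionCommandType
}

@Component
class CashbackPromotionProcessorProvider(processors: List<CashbackPromotionProcessor>) {
    // Spring injects every CashbackPromotionProcessor bean; index them by event type once.
    private val processorsByType = processors.associateBy { it.getType() }

    fun getProcessor(type: OrderCashbackPromotionCommandType): CashbackPromotionProcessor? =
        processorsByType[type]
}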
Within each processor, we deserialize the payload using ObjectMapper and forward it to the responsible downstream service. An example processor for a "create" event is shown below.
@Service
class CashbackPromotionCreatedProcessor(
private val orderCashbackPromotionService: OrderCashbackPromotionService,
private val objectMapper: ObjectMapper
) : CashbackPromotionProcessor {
override suspend fun process(orderCashbackPromotionEvent: OrderCashbackPromotionEvent<*>) {
val payload = objectMapper.convertValue(orderCashbackPromotionEvent.payload, OrderCashbackPromotionCreatedEvent::class.java)
orderCashbackPromotionService.handleOrderCashbackPromotionCreated(payload)
}
override fun getType(): OrderCashbackPromotionCommandType {
return OrderCashbackPromotionCommandType.CREATED
}
}
📈 Preparing for Scale: Handling Partition Increases
Another challenge was how to handle increased system load, particularly the need to add more Kafka partitions in the future. Adding partitions changes the key-to-partition mapping, so events for the same order ID can start landing on a different partition, breaking the ordering guarantee for in-flight orders.
To mitigate this, we persist all events in our database, assigning each one a transactionDate timestamp that represents the publishing time. This allows us to replay events in their original chronological order.
To handle failed events, we created an endpoint that accepts an orderId as a parameter. This endpoint queries the database and republishes the related events, ordered by transactionDate.
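A minimal sketch of such a replay endpoint is shown below. The repository, route, and method names are illustrative assumptions; the point is simply to query by orderId, sort by transactionDate, and republish with the same order-ID key.
import org.springframework.kafka.core.KafkaTemplate
import org.springframework.web.bind.annotation.PathVariable
import org.springframework.web.bind.annotation.PostMapping
import org.springframework.web.bind.annotation.RestController

@RestController
class OrderCashbackPromotionReplayController(
    // Hypothetical Spring Data repository over the persisted events.
    private val eventRepository: OrderCashbackPromotionEventRepository,
    private val kafkaTemplate: KafkaTemplate<String, OrderCashbackPromotionEvent<*>>
) {
    // Republishes all persisted events for an order in chronological order,
    // keyed by order ID so they land on a single partition again.
    @PostMapping("/order-cashback-promotions/{orderId}/replay")
    fun replay(@PathVariable orderId: Long) {
        eventRepository.findAllByOrderIdOrderByTransactionDate(orderId).forEach { event ->
            kafkaTemplate.send("order-cashback-promotion", orderId.toString(), event)
        }
    }
}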
🔁 Replaying Events: Our Recovery Mechanism
By persisting events and replaying them with their original timestamps, we not only solved the sequencing issue but also enabled graceful failure recovery. This mechanism gave us the flexibility to rebuild the order lifecycle in a deterministic manner, without breaking downstream assumptions.
🚀 Final Thoughts: Lessons from Event Sequencing
In this journey to optimize our online grocery store’s cashback promotion system, we navigated several technical challenges that tested our understanding and application of Kafka within a microservices architecture. Our initial struggle with the order of event consumption highlighted the intricacies of event-driven systems and the importance of maintaining a robust and scalable infrastructure.
After evaluating multiple solutions, we implemented a strategy that combined all related events into a single Kafka topic and ensured ordered processing by partitioning events based on order IDs. This approach not only resolved the issue of events being processed out of sequence but also prepared our system for future scalability.
With these modifications, we improved reliability, resulting in a smoother user experience and greater satisfaction for both customers and our development team.
Veysel Pehlivan
Software Developer | kloia