by Mark Soule | at Minnebar 19
Real-time streaming and event-driven technologies like Apache Kafka, Google Pub/Sub, Apache Flink, and RabbitMQ all face a common issue: duplicate events. Delivery guarantees, network retries, crashes, and system restarts can all introduce duplicate messages, making duplication a pervasive challenge for data-intensive applications. Deduplication ensures data accuracy and consistency in real-time systems, preventing inflated metrics, overcounting, and confusion downstream. By handling redundant events effectively, organizations maintain data integrity, reduce storage overhead, and produce reliable analytics.
In this guide, we’ll explore practical, battle-tested strategies for handling duplicates across popular messaging and streaming technologies. We’ll cover the core concepts of unique event identifiers, idempotent operations, consumer-side state tracking, and exactly-once semantics. Developers will leave with a clear playbook of best practices and strategies to build deduplicated, high-throughput data pipelines.
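As a taste of the consumer-side state tracking mentioned above, here is a minimal sketch: each event carries a unique identifier assigned by the producer, and the consumer remembers which IDs it has already handled so redelivered copies cause no duplicate side effects. The names (`Event`, `DedupConsumer`) are hypothetical and not tied to any particular broker's API; a production version would bound the seen-ID set (e.g. with a TTL cache or an external store) rather than let it grow forever.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str  # unique identifier assigned by the producer
    payload: str


class DedupConsumer:
    """Illustrative consumer that drops redelivered events by ID."""

    def __init__(self) -> None:
        # In production this would be a bounded TTL cache or external store,
        # not an unbounded in-memory set.
        self._seen: set[str] = set()
        self.processed: list[str] = []

    def handle(self, event: Event) -> bool:
        """Process the event at most once; return False for duplicates."""
        if event.event_id in self._seen:
            return False  # duplicate delivery: skip all side effects
        self._seen.add(event.event_id)
        self.processed.append(event.payload)  # stand-in for real processing
        return True


consumer = DedupConsumer()
events = [
    Event("a1", "order placed"),
    Event("a1", "order placed"),   # a retry redelivers the same event
    Event("b2", "order shipped"),
]
results = [consumer.handle(e) for e in events]
# results → [True, False, True]; only two payloads are processed.
```

The same idea underpins idempotent operations: if processing keys off a unique event ID, applying the event twice leaves the system in the same state as applying it once.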
Mark Soule is a technologist specializing in distributed systems, with a particular focus on event-driven architectures. He is proud to be a Principal Engineer at Improving, where he has led many successful projects. Mark is a Minnesota native and still lives there with his daughter. Outside of work you can expect to find him with a Nintendo controller in his hands.