Use Case Track  

High cardinality data stream processing with large states

  

At Klaviyo, we process more than a billion events daily, with spikes as high as 75,000 events per second on peak days, and the workload is growing exponentially year over year. We migrated our legacy event processing pipeline from Python to Flink in 2018 and gained a tremendous performance improvement, reducing our AWS EC2 footprint from over 100 nodes to 15 in the process. Because the system is multi-tenant and serves diverse datasets spanning over a billion user profiles, we spent significant engineering effort resolving performance bottlenecks specific to handling high-cardinality data streams in Flink. In this talk, we will share what we learned about using keyed operator state, windowing over high-cardinality data, and making Flink production ready. We will also share our journey of moving from a 99% Python culture to Java.
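To make the pattern concrete, here is a minimal, hypothetical sketch of the kind of job the talk centers on: a keyed, windowed aggregation over a high-cardinality stream using Flink's DataStream API. The Event class, field names, window size, and in-memory source are illustrative assumptions rather than Klaviyo's actual pipeline; the point is that incremental aggregation keeps per-key state down to a single accumulator, which matters when the key space spans a billion profiles.

```java
// Minimal sketch (illustrative, not Klaviyo's production code) of a keyed,
// windowed aggregation over a high-cardinality event stream in Flink.
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class HighCardinalityWindowSketch {

    // Hypothetical event type: one record per user-profile event.
    public static class Event {
        public String profileId;  // high-cardinality key
        public long timestamp;    // event time in epoch millis

        public Event() {}
        public Event(String profileId, long timestamp) {
            this.profileId = profileId;
            this.timestamp = timestamp;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in source; a real pipeline would read from Kafka, Kinesis, etc.
        DataStream<Event> events = env
            .fromElements(
                new Event("profile-1", 1_000L),
                new Event("profile-2", 1_500L),
                new Event("profile-1", 2_000L))
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((e, ts) -> e.timestamp));

        events
            // Partition by profile id so each key gets its own windowed state.
            .keyBy(e -> e.profileId)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            // Incremental aggregation keeps a single accumulator per key per
            // window instead of buffering every event, so state stays bounded
            // even with millions of active keys.
            .aggregate(new CountPerProfile())
            .print();

        env.execute("high-cardinality window sketch");
    }

    // Counts events per profile per window.
    public static class CountPerProfile implements AggregateFunction<Event, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(Event event, Long acc) { return acc + 1; }
        @Override public Long getResult(Long acc) { return acc; }
        @Override public Long merge(Long a, Long b) { return a + b; }
    }
}
```

The choice of an AggregateFunction over a plain window apply/process call is the relevant design point here: with per-key accumulators, checkpointed state grows with the number of keys rather than the number of buffered events.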

Authors

Ning Shi
Klaviyo

Ning Shi works with the engineering teams focused on event processing, data storage, and analytics at Klaviyo. Previously, he worked on building the distributed in-memory database VoltDB. He is passionate about building high-performance distributed systems.