Flink Forward Session Preview: Streaming Event-Time Partitioning With Apache Flink and Apache Iceberg at Netflix


by Julia Bennett

Are you thinking of joining Flink Forward Europe 2019? As a first-time speaker at the event, I am excited to meet the Apache Flink community and share how we leveraged Apache Flink and Apache Iceberg (Incubating) at Netflix to build streaming event-time partitioning at scale. In this post, I will give a sneak preview of my talk, Streaming Event-Time Partitioning With Apache Flink and Apache Iceberg, which you can attend on October 8, 2019!

If you haven’t done so already, go ahead and register to learn more from exciting use cases and technical deep dives on October 8 & 9, or attend one of the Apache Flink training sessions scheduled on October 7!

Here’s what my talk will be about at this year’s Flink Forward Europe:

Streaming Event-Time Partitioning With Apache Flink and Apache Iceberg

Background

At Netflix, we’ve seen a lot of success, and gathered valuable learnings, building some of our core data pipelines with near real-time stream processing in Flink. One challenge we hadn’t yet tackled, though, was landing data directly from a stream into a table partitioned on event time. Instead, we often relied on post-processing batch jobs that “put the data in the right place”.

Our playback data, in particular, depends heavily on a repartitioning pattern to handle a long tail of late-arriving events. This partitioning ensures that all playbacks that occurred on any given date can easily be queried together, regardless of what time we actually processed those events. This is a critical dataset with high volume used broadly across Netflix, powering product experiences, A/B test metrics, and offline insights.
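
To make that concrete, here is a minimal, hypothetical illustration in plain Java (not Netflix’s actual code or schema): the partition key is derived from the event timestamp carried by the playback event, so an event that arrives days late is still filed under the date on which the playback occurred.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical illustration: deriving a date partition from event time rather
// than processing time, so late-arriving playbacks land in the date they occurred.
public class EventTimePartitionKey {

    static final class PlaybackEvent {
        final String accountId;
        final long eventTimestampMillis; // when the playback actually happened

        PlaybackEvent(String accountId, long eventTimestampMillis) {
            this.accountId = accountId;
            this.eventTimestampMillis = eventTimestampMillis;
        }
    }

    // Partition key based on event time, independent of when we process the event.
    static LocalDate eventTimePartition(PlaybackEvent e) {
        return Instant.ofEpochMilli(e.eventTimestampMillis)
                .atZone(ZoneOffset.UTC)
                .toLocalDate();
    }

    public static void main(String[] args) {
        // A playback from 2019-09-30 that is processed two days later still
        // belongs to the 2019-09-30 partition.
        PlaybackEvent late = new PlaybackEvent(
                "acct-1", Instant.parse("2019-09-30T23:50:00Z").toEpochMilli());
        System.out.println(eventTimePartition(late)); // prints 2019-09-30
    }
}
```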

With the introduction of Apache Iceberg as a new table format being developed at Netflix and a supporting internal Flink integration, we decided it was time to give streaming event-time partitioning a shot. As an important dataset with a growing number of low latency requirements, playback data was the perfect candidate to put it to the test.

Topics covered

In this talk, I will focus on what you do with your data after it leaves your streaming pipelines, discussing a pattern for performing event-time partitioning in Flink on high-volume streams.

I’ll give an overview of how we process playback data at Netflix and why event-time partitioning is so important. I’ll briefly introduce Iceberg and how it helps solve this problem, and then describe our implementation of generic event-time partitioning on streams through configurable Flink components that leverage Iceberg as an output table format. And of course, I’ll discuss how we applied this generic implementation to improve playback data, share some of the challenges we hit along the way, and talk through the tradeoffs of this approach.
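
To give a rough feel for the shape of such a pipeline ahead of the session, here is a minimal sketch that writes a stream of rows into an Iceberg table partitioned by day on the event timestamp. It uses the open-source Iceberg Flink sink (FlinkSink) with a Hadoop table; the schema, field names, and warehouse path are illustrative assumptions, and Netflix’s internal Flink/Iceberg integration covered in the talk is more configurable than this.

```java
import java.time.Instant;
import java.util.Arrays;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.StringData;
import org.apache.flink.table.data.TimestampData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.TimestampType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.types.Types;

// Sketch only: an Iceberg table partitioned by day(event_ts), fed by a Flink
// stream of RowData. Schema and paths are illustrative, not Netflix's.
public class EventTimePartitionedSinkSketch {

    public static void main(String[] args) throws Exception {
        String location = "file:///tmp/warehouse/playback_events"; // hypothetical location

        // Daily event-time partitioning: each row is routed to the partition of
        // the date the playback occurred, no matter how late it arrives.
        Schema schema = new Schema(
                Types.NestedField.optional(1, "account_id", Types.StringType.get()),
                Types.NestedField.optional(2, "event_ts", Types.TimestampType.withoutZone()));
        PartitionSpec spec = PartitionSpec.builderFor(schema).day("event_ts").build();
        new HadoopTables(new Configuration()).create(schema, spec, location);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // the Iceberg sink commits files on checkpoints

        // Stand-in for the real playback event stream.
        RowType rowType = RowType.of(
                new LogicalType[] {new VarCharType(VarCharType.MAX_LENGTH), new TimestampType(6)},
                new String[] {"account_id", "event_ts"});
        DataStream<RowData> playbacks = env.fromCollection(
                Arrays.asList(
                        playback("acct-1", Instant.parse("2019-10-08T12:00:00Z")),
                        // A late event from an earlier date still lands in its own day partition.
                        playback("acct-2", Instant.parse("2019-10-05T23:59:00Z"))),
                InternalTypeInfo.of(rowType));

        // Append into the event-time-partitioned Iceberg table.
        FlinkSink.forRowData(playbacks)
                .tableLoader(TableLoader.fromHadoopTable(location))
                .append();

        env.execute("event-time-partitioned-playback-sketch");
    }

    private static RowData playback(String accountId, Instant eventTime) {
        GenericRowData row = new GenericRowData(2);
        row.setField(0, StringData.fromString(accountId));
        row.setField(1, TimestampData.fromEpochMillis(eventTime.toEpochMilli()));
        return row;
    }
}
```

Part of what makes this pattern practical at scale is that Iceberg tracks data files in table metadata rather than via directory listings, so late data can simply be appended to old event-time partitions as new files.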

Key takeaways

Interested in what you will learn from my session? Here are some key takeaways the Apache Flink community can expect:

  • A generic pattern for event-time partitioning at scale with Flink and Iceberg

  • Challenges and tradeoffs with a streaming approach to event-time partitioning

  • Overview of how we process Netflix’s core playback data and learnings from moving it to this pattern

  • A brief introduction to Apache Iceberg as a new table format and how it integrates with Flink

Make sure to secure your spot before September 15 by registering on the Flink Forward website. Sessions this year cover multiple aspects of stream processing with Apache Flink, including use cases, technology deep dives, ecosystem, operations, community, and research talks, among others! Grab your pass and come meet the Apache Flink community in Berlin this October!
