Massive Scale Data Processing at Netflix using Flink

Written by Pallavi Phadnis | 14 March 2019

Authors:Pallavi Phadnis& Snehal Nagmote

Getting ready for Flink Forward San Francisco next month? This is our first time speaking at the conference and we can’t wait to meet the Apache Flink community and share our experiences using Flink at Netflix! In this blog post, we will give you a sneak preview of our talk Massive Scale Data Processing at Netflix using Flink that you can attend on April 2!

If you haven’t done so already, please go ahead and register for the event to attend our talk and hear more exciting use cases and stream processing developments!

Here’s a sneak preview of our talk during this year’s Flink Forward San Francisco:

Massive Scale Data Processing at Netflix using Flink

Background

Over 137 million members worldwide are enjoying TV series, feature films across a wide variety of genres and languages on Netflix. It leads to petabyte scale of user behavior data. At Netflix, our client logging platform collects and processes this data to empower recommendations, personalization and many other services to enhance user experience.

Last year, we redesigned the legacy consolidated logging pipeline and migrated to a managed scalable stream-processing Flink-based platform. This has resulted in improved computation efficiency, reduced costs and operational overhead. Considering the massive scale that continues to grow and the business criticality of this most upstream system, we have some interesting learnings to share with the Apache Flink user community.

Topics covered

In this presentation, we will give you an overview of the consolidated logging platform at Netflix.

Built with Apache Flink, this platform processes hundreds of billions of events and petabyte data per day, 2.5 million events/sec in sub-milliseconds latency. The processing involves a series of data transformations such as decryption and data enrichment of customer, geo, device information using microservices-based lookups.

The transformed and enriched data is further used by multiple data consumers for a variety of applications such as improving user-experience with A/B tests, tracking application performance metrics, tuning algorithms and more. This causes redundant reads of the dataset by multiple batch jobs and incurs heavy processing costs. To avoid this, we have developed a configuration-driven, centralized, managed platform, on top of Apache Flink, that reads this data once and routes it to multiple streams based on dynamic configuration.

We will deep dive into the architecture and design patterns. Additionally, we will share the challenges and learnings related to Apache Flink.

Key takeaways

Interested in what you will learn from our session? Here are some key takeaways the Apache Flink community can expect from our session on April 2, 2019:

Challenges of embarrassingly parallel Flink application
Learnings from running one of the largest Flink applications in production at Netflix, in terms of scale
Best practices to ensure the production systems are scalable and cost-efficient

Make sure to secure your spot before March 23 by registering on the Flink Forward website. Sessions cover multiple aspects of stream processing with Apache Flink such as use cases, technology deep dives, ecosystem and operations so don’t miss out on the exciting Flink talks!

About the authors:

Pallavi Phadnis is a Senior Software Engineer at Netflix working on the Consolidated Logging team. At Netflix, she contributes to the backend systems that process user behavior data to enable analytics and personalization using technologies such as Flink, Spark, Kafka, Hadoop etc. Prior to Netflix, she worked for several years building large scale, high-performance batch and real-time processing systems in domains such as e-commerce and online advertising at WalmartLabs, eBay and advertising.com. Pallavi has a Master's degree from Carnegie Mellon University, Pittsburgh.

Snehal Nagmote is Senior Software Engineer on the Consolidated Logging Team at Netflix. She focuses on building batch and real time data pipelines for processing user behavioral data using Flink, Spark, Kafka. Prior to Netflix, she worked on the real time search indexing system at WalmartLabs using Spark and Cassandra. She has several years of experience of building distributed and large scale data processing systems.

View full post