Flink Forward Session Preview: Real-time Experiment Analytics at Pinterest with Apache Flink

September 17, 2019 | by Parag Kesar & Ben Liu

Flink Forward Europe 2019 is fast approaching, and we would like to share with the Apache Flink community some more information about the talk we will present at the conference in October. In the following sections, we describe how we use Apache Flink at Pinterest to run real-time experiment analytics at scale: the background to our talk, the topics we will cover, and the takeaways attendees can expect from Real-time Experiment Analytics at Pinterest with Apache Flink, scheduled for October 8, 2019.

If you want to find out how Pinterest uses Flink to run real-time experiment analytics at scale, get your pass and attend our presentation on October 8, 2019!

Real-time Experiment Analytics at Pinterest with Apache Flink


Background

At Pinterest, we run thousands of experiments every day. We mostly rely on daily experiment metrics to evaluate experiment performance. The daily pipelines can take 10+ hours to run and are sometimes delayed, which makes it slow to verify an experiment's setup, the correctness of its triggering, and whether it performs as expected. To solve this, we developed a near-real-time experimentation platform based on Apache Flink that gives us fresher experiment metrics and helps us catch issues as soon as possible.

 

Topics covered

This talk will give a detailed overview of our challenges and how we developed Pinterest’s real-time experimentation platform to overcome them. During our session, we will go over the following aspects of our approach:

  • We will discuss the high-level design of our system, which consists of Flink, Kafka and HBase.

  • We will showcase how we used Flink's IntervalJoin to associate user actions with A/B experiment metadata in a timely fashion.

  • We will explain how we utilized Flink's MapState and KeyedProcessFunction to perform customized aggregation and handle message lateness.

  • We will showcase how we used the Flink UI and Flink Metrics.

  • Finally, we will cover how we performed data validation for our streaming jobs.
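
As a rough illustration of the interval-join idea in the list above, the following plain-Python sketch pairs each experiment activation with the same user's actions that fall inside a time interval relative to the activation. It does not use Flink's API and is not the Pinterest implementation; the record types, field names, and the one-hour upper bound are assumptions made for the example. The bounds are inclusive, which matches the default behaviour of Flink's interval join.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Activation(NamedTuple):
    """Hypothetical record: a user was triggered into an experiment group."""
    user_id: str
    experiment: str
    group: str
    ts: int  # event time, in seconds


class Action(NamedTuple):
    """Hypothetical record: something the user did (click, repin, ...)."""
    user_id: str
    action: str
    ts: int  # event time, in seconds


def interval_join(
    activations: List[Activation],
    actions: List[Action],
    lower: int,
    upper: int,
) -> List[Tuple[Activation, Action]]:
    """Pair records with the same user_id where
    activation.ts + lower <= action.ts <= activation.ts + upper."""
    # Group actions by key, as a keyed stream would.
    actions_by_user = defaultdict(list)
    for action in actions:
        actions_by_user[action.user_id].append(action)

    joined = []
    for activation in activations:
        for action in actions_by_user.get(activation.user_id, []):
            if activation.ts + lower <= action.ts <= activation.ts + upper:
                joined.append((activation, action))
    return joined


activations = [
    Activation("u1", "exp_42", "treatment", ts=100),
    Activation("u2", "exp_42", "control", ts=200),
]
actions = [
    Action("u1", "click", ts=130),   # inside u1's interval
    Action("u1", "click", ts=5000),  # too long after the activation
    Action("u2", "repin", ts=190),   # happened before the activation
    Action("u3", "click", ts=100),   # user never activated
]

# Keep only actions that occur up to one hour after the activation.
pairs = interval_join(activations, actions, lower=0, upper=3600)
for activation, action in pairs:
    print(activation.experiment, activation.group, action.action, action.ts)
```

A streaming engine performs the same matching incrementally and discards buffered records once the watermark passes the end of their interval, whereas this batch-style sketch simply holds everything in memory.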

 

Key takeaways

What will you learn during our session? Some of the things you can expect to hear from us are the following:

  1. The use of KeyedProcessFunction and State to implement complex stateful operations and handle message lateness

  2. How to detect and deal with data skew, backpressure and checkpoint failures

  3. How to make the best use of the unit test utilities and monitoring metrics provided by Flink

  4. How we combined batch and streaming processing for data validation 
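
To make the first takeaway more concrete, here is a minimal plain-Python sketch of keyed, stateful aggregation that tolerates late messages. It only mimics the concepts: the per-key dictionary stands in for something like MapState, and `process` plays the role of a per-element callback. The class name, the window size, the allowed lateness, and the crude "highest timestamp seen" watermark are all assumptions for illustration; in Flink the watermark is supplied by the runtime, and the logic used at Pinterest may differ.

```python
from collections import defaultdict


class KeyedWindowCounter:
    """Counts events per key and per fixed-size event-time window."""

    def __init__(self, window_size: int, allowed_lateness: int):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        # key -> {window_start: count}; a stand-in for per-key map state.
        self.state = defaultdict(dict)
        # Simplified watermark: the highest event time seen so far.
        self.max_ts = 0
        self.dropped = 0

    def process(self, key: str, ts: int):
        """Handle one event; return (key, window_start, count) or None if dropped."""
        self.max_ts = max(self.max_ts, ts)
        window_start = ts - ts % self.window_size
        window_end = window_start + self.window_size

        # Too late: the window closed more than allowed_lateness ago.
        if window_end + self.allowed_lateness <= self.max_ts:
            self.dropped += 1
            return None

        counts = self.state[key]
        counts[window_start] = counts.get(window_start, 0) + 1
        return key, window_start, counts[window_start]


counter = KeyedWindowCounter(window_size=60, allowed_lateness=30)
key = "exp_42/treatment"  # hypothetical key: experiment plus group

for ts in [10, 50, 70, 55, 200, 20]:
    # 55 arrives out of order but within the allowed lateness, so it is counted;
    # 20 arrives after its window has expired, so it is dropped.
    counter.process(key, ts)

print(counter.state[key], "dropped:", counter.dropped)
```

Keeping one count per window inside the keyed state is what lets an out-of-order event update an older window instead of being lost or folded into the wrong one.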

 

Don’t forget to register before September 20 to secure your spot and immerse yourself in the exciting world of stream processing and Apache Flink! See you in Berlin in a few weeks!


About the authors: 

Parag Kesar

Experienced Software Engineer with 15 years of professional experience building scalable, distributed, high-performance web applications, backend services, and big data applications. Experience working with high scale systems like Apple's IdMS and Ooyala's recommendation engine.

Languages - Java, Scala, Python

Big Data - Apache Spark, HBase, Elastic Search, Couchbase NoSQL, Cassandra, Flink

Machine Learning - Content and Collaborative filtering algorithms for video recommendations based on Spark

Ben Liu

Software engineer at Pinterest focusing on large scale data analytics with work experience in Spark, Hive, Flink and HBase.

Before joining Pinterest, Ben Liu completed an MS in Statistics at Stanford University, with a background in computer science.

 

 
