Real-time Experiment Analytics at Pinterest with Apache Flink

Written by Parag Kesar & Ben Liu | 17 September 2019

Flink Forward Europe 2019 is fast approaching, and we would like to share with the Apache Flink community some details about the talk we will present at the conference in October. In the sections below, we give some background on how we use Apache Flink at Pinterest to run real-time experiment analytics at scale, outline the topics we will cover, and list the takeaways attendees can expect from our talk, Real-time Experiment Analytics at Pinterest with Apache Flink, scheduled for October 8, 2019.

If you want to find out how Pinterest uses Flink to run real-time experiment analytics at scale, get your pass and attend our presentation on October 8, 2019!

Background

At Pinterest, we run thousands of experiments every day. We mostly rely on daily experiment metrics to evaluate experiment performance. The daily pipelines can take 10+ hours to run and are sometimes delayed. This makes it slow to verify an experiment's setup, the correctness of its triggering, and whether it is performing as expected. To solve this, we developed a near-real-time experimentation platform based on Apache Flink that gives us fresher experiment metrics and helps us catch issues as soon as possible.

Topics covered

This talk will cover a detailed overview of our challenges and how we developed Pinterest’s real-time experimentation platform to overcome them. During our session, we will go over the following aspects of our approach:

We will discuss the high-level design of our system, which consists of Flink, Kafka and HBase.
We will showcase how we used Flink's IntervalJoin to associate user actions with A/B experiment metadata in a timely fashion.
We will explain how we used Flink's MapState and KeyedProcessFunction to perform customized aggregation and handle message lateness.
We will showcase how we used the Flink UI and Flink Metrics.
Finally, we will cover how we performed data validation for our streaming jobs.
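
For readers unfamiliar with interval joins, the idea behind the second topic can be sketched in plain Python. This is not Flink code and not the pipeline described in the talk. It only mimics the matching rule of an interval join, where two records with the same key are paired when the second record's timestamp falls within a bounded range relative to the first (both bounds inclusive). The record types, field names and the 600-second bound are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Activation(NamedTuple):
    """Hypothetical record: a user was placed in an experiment group."""
    user_id: str
    experiment: str
    group: str
    ts: int  # event time in seconds


class Action(NamedTuple):
    """Hypothetical record: a user action we want to attribute."""
    user_id: str
    action: str
    ts: int  # event time in seconds


def interval_join(activations, actions, lower, upper):
    """Pair each activation with same-user actions whose timestamp lies in
    [activation.ts + lower, activation.ts + upper], both bounds inclusive."""
    actions_by_user = defaultdict(list)
    for act in actions:
        actions_by_user[act.user_id].append(act)

    joined = []
    for a in activations:
        for act in actions_by_user.get(a.user_id, []):
            if a.ts + lower <= act.ts <= a.ts + upper:
                joined.append((a.experiment, a.group, act.action, act.ts))
    return joined


activations = [
    Activation("u1", "exp_a", "treatment", 100),
    Activation("u2", "exp_a", "control", 100),
]
actions = [
    Action("u1", "click", 130),   # inside the interval for u1
    Action("u1", "click", 1000),  # too late, not joined
    Action("u2", "save", 100),    # on the lower bound, joined
    Action("u3", "click", 110),   # no activation for u3, not joined
]

result = interval_join(activations, actions, lower=0, upper=600)
```

A streaming engine performs the same matching incrementally and discards buffered records once the watermark shows they can no longer match, which this batch-style sketch does not attempt to model.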

Key takeaways

What will you learn during our session? Some of the things you can expect to hear from us are the following:

How to use KeyedProcessFunction and State to implement complex stateful operations and handle message lateness
How to detect and deal with data skew, backpressure and checkpoint failures
How to make the best use of the unit-testing utilities and monitoring metrics provided by Flink
How we combined batch and stream processing for data validation
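
As a rough illustration of the first takeaway, the sketch below shows one way keyed, map-style state can support custom aggregation with tolerance for late messages. It is plain Python, not the Flink API and not our production job. The class name, the bucket size, the allowed-lateness value and the single global watermark are all simplifying assumptions made for this example.

```python
from collections import defaultdict


class BucketedCounter:
    """Per-key counts kept as {bucket_start: count}, loosely mirroring
    map-style keyed state. Hypothetical helper for illustration only."""

    def __init__(self, bucket_size, allowed_lateness):
        self.bucket_size = bucket_size
        self.allowed_lateness = allowed_lateness
        self.state = defaultdict(dict)  # key -> {bucket_start: count}
        self.watermark = 0
        self.dropped = 0  # events that arrived after their bucket was finalized

    def process(self, key, ts):
        """Add one event to its time bucket, or drop it if it is too late."""
        bucket = ts - ts % self.bucket_size
        if bucket + self.bucket_size + self.allowed_lateness <= self.watermark:
            self.dropped += 1
            return
        buckets = self.state[key]
        buckets[bucket] = buckets.get(bucket, 0) + 1

    def advance_watermark(self, watermark):
        """Emit and clear every bucket that can no longer accept events."""
        self.watermark = max(self.watermark, watermark)
        emitted = []
        for key in sorted(self.state):
            buckets = self.state[key]
            for bucket in sorted(buckets):
                closes_at = bucket + self.bucket_size + self.allowed_lateness
                if closes_at <= self.watermark:
                    emitted.append((key, bucket, buckets.pop(bucket)))
        return emitted


counter = BucketedCounter(bucket_size=60, allowed_lateness=30)
treatment = ("exp_a", "treatment")
control = ("exp_a", "control")

counter.process(treatment, 5)
counter.process(treatment, 42)
counter.process(control, 61)
first = counter.advance_watermark(70)    # nothing is final yet

counter.process(treatment, 50)           # late, but within allowed lateness
second = counter.advance_watermark(100)  # bucket [0, 60) is now final

counter.process(treatment, 10)           # too late, counted as dropped
```

In a real streaming job, finalization would typically be driven by per-key timers rather than an explicit call, and the state would be checkpointed by the framework; both are omitted here to keep the example short.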

Don’t forget to register before September 20 to secure your spot and immerse yourself in the exciting world of stream processing and Apache Flink! See you in Berlin in a few weeks!

About the authors:

Parag Kesar

Experienced software engineer with 15 years of professional experience building scalable, distributed, high-performance web applications, backend services, and big data applications. Has worked with high-scale systems such as Apple's IdMS and Ooyala's recommendation engine.

Languages - Java, Scala, Python

Big Data - Apache Spark, HBase, Elasticsearch, Couchbase NoSQL, Cassandra, Flink

Machine Learning - Content and collaborative filtering algorithms for video recommendations based on Spark

Ben Liu

Software engineer at Pinterest focusing on large scale data analytics with work experience in Spark, Hive, Flink and HBase.

Before joining Pinterest, Ben earned an MS in Statistics from Stanford University, with a background in computer science.
