
Flink Forward Session Preview: Real-time Experiment Analytics at Pinterest with Apache Flink



Flink Forward Europe 2019 is fast approaching, and we would like to share with the Apache Flink community some details about the talk we will present at the conference in October: Real-time Experiment Analytics at Pinterest with Apache Flink, scheduled for October 8, 2019. In the following sections, we give some background on how we use Apache Flink at Pinterest to run real-time experiment analytics at scale, outline the topics we will cover, and list the key takeaways attendees can expect.

If you want to find out how Pinterest uses Flink to run real-time experiment analytics at scale, get your pass and attend our presentation on October 8, 2019!

Real-time Experiment Analytics at Pinterest with Apache Flink


Background

At Pinterest, we run thousands of experiments every day. We mostly rely on daily experiment metrics to evaluate experiment performance. The daily pipelines can take 10+ hours to run and are sometimes delayed. This makes it slow to verify the setup of an experiment, the correctness of its triggering, and whether it performs as expected. To solve this, we developed a near-real-time experimentation platform based on Apache Flink that gives us fresher experiment metrics and helps us catch issues as soon as possible.

Topics covered

This talk gives a detailed overview of our challenges and of how we developed Pinterest’s real-time experimentation platform to overcome them. During our session, we will go over the following aspects of our approach:

  • We will discuss the high-level design of our system, which consists of Flink, Kafka and HBase.

  • We will showcase how we used Flink's IntervalJoin to associate user actions with A/B experiment metadata in a timely fashion.

  • We will explain how we utilized Flink's MapState and KeyedProcessFunction to perform customized aggregation and handle message lateness.

  • We will showcase how we used the Flink UI and Flink Metrics.

  • Finally, we will cover how we performed data validation for our streaming jobs.
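
To make the join described in the list above more concrete, here is a minimal sketch in plain Python (not Flink code) of the semantics an interval join provides: two streams are matched on the same key, and a pair is kept only when the second event's timestamp falls within a bounded interval relative to the first. The record types, field names, and the 60-second bound are assumptions made for illustration; they are not taken from Pinterest's pipeline.

```python
from collections import defaultdict
from typing import NamedTuple


class Activation(NamedTuple):
    """Hypothetical record: a user was assigned to an experiment group."""
    user_id: str
    experiment: str
    group: str
    ts: int  # event time, in seconds


class Action(NamedTuple):
    """Hypothetical record: a user performed some action."""
    user_id: str
    action: str
    ts: int  # event time, in seconds


def interval_join(activations, actions, lower, upper):
    """Pair each activation with same-user actions whose timestamp lies in
    [activation.ts + lower, activation.ts + upper], both bounds inclusive."""
    actions_by_user = defaultdict(list)
    for act in actions:
        actions_by_user[act.user_id].append(act)

    joined = []
    for a in activations:
        for act in actions_by_user[a.user_id]:
            if a.ts + lower <= act.ts <= a.ts + upper:
                joined.append((a.experiment, a.group, act.action, act.ts))
    return joined


activations = [Activation("u1", "exp_42", "treatment", 100)]
actions = [
    Action("u1", "click", 130),  # inside the interval: joined
    Action("u1", "click", 500),  # same user, but too late: not joined
    Action("u2", "click", 120),  # different user: not joined
]
result = interval_join(activations, actions, lower=0, upper=60)
```

A streaming engine evaluates the same condition incrementally and can discard buffered events once the watermark has passed the upper bound, which a batch-style loop like this one does not model.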

Key takeaways

What will you learn during our session? Some of the things you can expect to hear from us are the following:

  1. The use of KeyedProcessFunction and State to implement complex stateful operations and handle message lateness

  2. How to detect and deal with data skew, backpressure and checkpoint failures

  3. How to make the best use of the unit test utilities and monitoring metrics provided by Flink

  4. How we combined batch and streaming processing for data validation
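
As a rough illustration of the first takeaway, the sketch below uses plain Python (not the Flink API) to mimic a keyed stateful operator: each key owns a map from time bucket to count, results are emitted once the watermark passes a bucket's end plus an allowed lateness, and events arriving after that point are set aside. The class name, bucket size, and lateness value are hypothetical. In Flink, the per-key updates would roughly correspond to MapState access in `processElement`, and the emission to event-time timers firing in `onTimer`.

```python
from collections import defaultdict


class KeyedCounter:
    """Toy stand-in for one keyed operator: per-key map state plus a watermark."""

    def __init__(self, bucket_size, allowed_lateness):
        self.bucket_size = bucket_size
        self.allowed_lateness = allowed_lateness
        self.state = defaultdict(dict)  # key -> {bucket_start: count}
        self.watermark = 0
        self.dropped = []  # events that arrived after their bucket was closed

    def process(self, key, ts):
        """Count one event in its time bucket, or set it aside if too late."""
        bucket = ts - ts % self.bucket_size
        if bucket + self.bucket_size + self.allowed_lateness <= self.watermark:
            self.dropped.append((key, ts))
            return
        counts = self.state[key]
        counts[bucket] = counts.get(bucket, 0) + 1

    def advance_watermark(self, watermark):
        """Move the watermark forward; emit and clear every closed bucket."""
        self.watermark = max(self.watermark, watermark)
        emitted = []
        for key, counts in self.state.items():
            for bucket in sorted(counts):
                if bucket + self.bucket_size + self.allowed_lateness <= self.watermark:
                    emitted.append((key, bucket, counts.pop(bucket)))
        return emitted


key = ("exp_42", "treatment")
counter = KeyedCounter(bucket_size=60, allowed_lateness=30)
counter.process(key, 10)                 # falls in bucket [0, 60)
counter.process(key, 70)                 # falls in bucket [60, 120)
first = counter.advance_watermark(80)    # bucket 0 stays open until 90
counter.process(key, 50)                 # late, but within allowed lateness
second = counter.advance_watermark(95)   # bucket 0 is now closed and emitted
counter.process(key, 20)                 # too late: bucket 0 already closed
```

Setting late events aside in a list keeps the example observable; a real job might instead route them to a side output or simply count them in a metric.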

Don’t forget to register before September 20 to secure your spot and immerse yourself in the exciting world of stream processing and Apache Flink! See you in Berlin in a few weeks!


About the authors:

Parag Kesar


Experienced software engineer with 15 years of professional experience building scalable, distributed, high-performance web applications, backend services, and big data applications. Has worked with high-scale systems such as Apple's IdMS and Ooyala's recommendation engine.

Languages - Java, Scala, Python

Big Data - Apache Spark, HBase, Elasticsearch, Couchbase NoSQL, Cassandra, Flink

Machine Learning - Content and collaborative filtering algorithms for video recommendations based on Spark

Ben Liu


Software engineer at Pinterest focusing on large scale data analytics with work experience in Spark, Hive, Flink and HBase.

Before joining Pinterest, Ben graduated from Stanford University with an MS in Statistics and a background in computer science.


Article by:

Parag Kesar & Ben Liu
