
The Apache Flink Story at Pinterest - Flink Forward Global 2021



On October 27, at Flink Forward Global 2021, the annual Apache Flink user conference, Pinterest Tech Lead Chen Qin will deliver a keynote talk, “Sharing what we love: The Apache Flink Story at Pinterest”. Chen has been using Apache Flink since late 2015 and has been attending Flink Forward conferences since 2017.

Pinterest is a visual discovery platform that helps over 478 million Pinners find and share ideas that spark inspiration. In Chen’s presentation, he will share the story of Pinterest’s journey with Apache Flink and how the use of Flink has helped transform Pinterest with new real-time experiences for Pinners.


We interviewed Chen about his talk, his experiences with Apache Flink, and with Flink Forward...

What will people learn from the story of Pinterest’s journey with Apache Flink?

“The application of stream processing technology is evolving rapidly thanks to product innovation and competition in the online space. Making good use of mature, open source stream processing frameworks like Apache Flink can accelerate production iterations and reduce cloud infrastructure costs. Our story of adoption was a mixture of education, close working relationships with key stakeholders, and abstraction, as well as mastering dev-to-prod and operational life cycle automation at scale. Once you can demonstrate production stability to internal teams, you can support mission-critical use cases reliably at scale.”


What’s your real-time stack? What software do you run with Apache Flink at Pinterest?

“Our stack features the following:

  • Apache Kafka and S3 are the most popular ways to access datasets
  • We built a XenonUnifiedSource/UnifiedSink to provide seamless backfill and reprocessing
  • We built an NRTG compiler to expedite user development and keep it consistent with the batch implementation
  • Apache ZooKeeper to achieve job high availability in a single availability zone
  • A CI system on top of Spinnaker to run frequent application regression tests at scale
  • Dr. Squirrel to provide actionable insights and debugging support to application developers at scale
  • We store Flink SQL logical tables in Hive Metastore.”

 


What are your favorite Flink Forward memories?

“My first time attending Flink Forward was in early 2017. I was able to see many insightful talks and use case discussions. And I tasted beverages brought by the organizing committee from Deutschland.”

What do you look forward to most at Flink Forward Global 2021?

“Every year, my colleagues and I have to split the work and attend different tracks at Flink Forward. There are many practical talks and interesting new industrial use cases. Ease-of-use topics like abstraction (e.g. Flink SQL, Stateful Functions and our compiler-based approach) and unification (to reduce the burden of maintaining batch and streaming implementations) are quite intriguing.”

 

What other Flink Forward Global 2021 sessions interest you?

 

Do you have any advice for first-time Flink Forward attendees?

“There are many tracks and great presentations, and it's easy to get distracted. Figure out your attendance strategy (with your colleagues) to ensure coverage of the talks you want to attend.”

 

Any final thoughts?

“The industry is undergoing a rapid evolution from a batch-only data stack to a hybrid of real-time and batch data stacks. There are many challenges to building a unified, simplified offering that empowers product teams to iterate fast without overspending their infrastructure budget. Flink Forward is a good place to learn how to meet these challenges.”

 

We hope you will join Chen and the rest of the Apache Flink community at Flink Forward online on October 26-27. Secure your spot here!

Article by:

Chen Qin
