Stream Processing & Apache Flink - News and Best Practices

The Apache Flink Story at Pinterest - Flink Forward Global 2021

Written by Chen Qin | 21 September 2021

On October 27, at the annual Apache Flink user conference, Flink Forward Global 2021, Pinterest Tech Lead Chen Qin will deliver a keynote talk titled “Sharing what we love: The Apache Flink Story at Pinterest”. Chen has been using Apache Flink since late 2015 and has been attending Flink Forward conferences since 2017.

Pinterest is a visual discovery platform that helps over 478 million Pinners find and share ideas that spark inspiration. In Chen’s presentation, he will share the story of Pinterest’s journey with Apache Flink and how the use of Flink has helped transform Pinterest with new real-time experiences for Pinners.

We interviewed Chen about his talk, his experiences with Apache Flink, and with Flink Forward...

What will people learn from the story of Pinterest’s journey with Apache Flink?

“The application of stream processing technology is evolving rapidly thanks to product innovation and competition in the online space. Making good use of mature, open source stream processing frameworks like Apache Flink can accelerate production iterations and reduce cloud infrastructure costs. Our story of adoption was a mixture of education, close working relationships with key stakeholders, and abstraction, as well as mastering dev-to-prod and operational life cycle automation at scale. Once you can demonstrate production stability to internal teams, you can support mission-critical use cases reliably at scale.”

What’s your real-time stack? What software do you run with Apache Flink at Pinterest?

“Our stack features the following:

  • Apache Kafka and S3, the most popular ways to access datasets
  • A XenonUnifiedSource/UnifiedSink that we built to provide seamless backfill and reprocessing
  • An NRTG compiler that we built to speed up user development and keep it consistent with the batch implementation
  • Apache ZooKeeper, to achieve job high availability in a single Availability Zone
  • A CI system on top of Spinnaker to run frequent application regression tests at scale
  • Dr. Squirrel, to provide actionable insights and debugging support to application developers at scale
  • Hive Metastore, where we store Flink SQL logical tables.”
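To make the idea of a unified source for backfill and reprocessing more concrete, here is a minimal, framework-free sketch in Python. It is not Pinterest's XenonUnifiedSource and it does not use any Flink API. The names `Event`, `unified_source`, and `cutover_ts` are hypothetical and chosen only for illustration. The assumption is that archived data (for example, files on S3) and live data (for example, a Kafka topic) carry the same records, so a job can replay the archive up to a cutover timestamp and then continue from the live stream without duplicates.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Hypothetical record type shared by the archive and the live stream."""
    timestamp: int  # event time, e.g. epoch milliseconds
    payload: str


def unified_source(
    archive: Iterable[Event],
    live: Iterable[Event],
    cutover_ts: int,
) -> Iterator[Event]:
    """Yield archived events older than cutover_ts, then live events from cutover_ts on.

    Filtering both sides on the same timestamp keeps records that appear
    in both inputs from being emitted twice.
    """
    # Backfill phase: bounded read of historical data.
    for event in archive:
        if event.timestamp < cutover_ts:
            yield event
    # Live phase: continue from the stream, skipping what the archive covered.
    for event in live:
        if event.timestamp >= cutover_ts:
            yield event


# Illustrative data: the record with timestamp 3 exists in both inputs.
archive = [Event(1, "a"), Event(2, "b"), Event(3, "c")]
live = [Event(3, "c"), Event(4, "d")]

replayed = [e.payload for e in unified_source(archive, live, cutover_ts=3)]
print(replayed)
```

In a real deployment the cutover point would more likely be expressed as source offsets or a watermark than as a single timestamp, and the live input would be unbounded. The sketch only shows why a single source abstraction lets the same job logic serve both reprocessing and real-time traffic.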

What are your favorite Flink Forward memories?

“My first time attending Flink Forward was in early 2017. I was able to see many insightful talks and use case discussions. And I tasted beverages brought by the organizing committee from Deutschland.”

What do you look forward to most at Flink Forward Global 2021?

“Every year, my colleagues and I have to split up and attend different tracks at Flink Forward. There are many practical talks and interesting new industrial use cases. Ease-of-use topics like abstraction (e.g. Flink SQL, Stateful Functions, and our compiler-based approach) and unification (to reduce the burden of maintaining both batch and streaming implementations) are quite intriguing.”

What other Flink Forward Global 2021 sessions interest you?

Do you have any advice for first-time Flink Forward attendees?

“There are many tracks and great presentations, and it's easy to get distracted. Figure out your attendance strategy (with your colleagues) to ensure you cover the talks you want to attend.”

Any final thoughts?

“The industry is undergoing a rapid evolution from a batch-only data stack to a hybrid of real-time and batch data stacks. There are many challenges in building a unified, simplified offering that empowers product teams to iterate fast without overspending their infrastructure budget. Flink Forward is a good place to learn how to meet these challenges.”

We hope you will join Chen and the rest of the Apache Flink community at Flink Forward online on October 26-27.