Operations Track  

Practical Experience Running Flink in Production

 

The stream processing platform team at Uber has been running Flink in production for more than two years. The platform currently runs and manages more than 1000 jobs across multiple data centers and powers many business-critical use cases at Uber, from real-time surge computation and UberEATS real-time ETD estimation to the real-time OLAP ingestion engine. In this talk, we will share the challenges we faced, the experience we gained, and the work we have done to run Flink reliably and securely at Uber scale. Specifically, we will cover Flink HA, Flink job build optimization, deployment management, job management, infrastructure maintenance, catastrophic failure recovery in a multi-DC environment, and security.

Authors

Shuyi Chen
Uber

Shuyi Chen is a software engineer at Uber and the tech lead of Uber’s stream processing platform. He is a committer on both Apache Flink and Apache Calcite. Shuyi has years of experience in storage infrastructure, data infrastructure, and mobile development at both Google and Uber.

Rong Rong
Uber

Rong Rong is a senior software engineer on Uber’s stream processing team. He worked on AthenaX, Uber’s SQL-based stream analytics engine, which currently powers more than 1000 production real-time data analytics and ML pipelines. Previously, Rong was a software and machine learning engineer on Qualcomm's computer vision team.