Operations Track  

Managing Flink on Kubernetes - FlinkK8sOperator

 

The goal of Lyft is to “Improve people’s lives with the world’s best transportation”. Our product is fundamentally real-time, and building a reliable platform that consumes and processes massive amounts of streaming data helps us achieve that mission. The advent of containers and Kubernetes has completely changed how we deploy and manage stateless services. At Lyft we have doubled down on Docker containers and Kubernetes for all services in production. To achieve a homogeneous infrastructure, we decided to extend Kubernetes to manage stateful streaming services like Flink. We developed the FlinkK8sOperator, which uses a Kubernetes CustomResourceDefinition to enable native management of Flink applications on Kubernetes. FlinkK8sOperator employs a state machine that transitions the application through a series of states until a stable state is attained. Each Flink application on Kubernetes spins up a separate Flink cluster with its own UI, providing clear isolation for monitoring and debugging. This talk provides an overview of running Flink applications on Kubernetes using FlinkK8sOperator and showcases the entire lifecycle of an application from creation to execution, with a focus on the transitions during deployments and stateful updates. It concludes with a demo.
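To make the state-machine idea in the abstract concrete, here is a minimal sketch of a loop that steps an application through phases until it reaches a stable one. It is an illustration only, not the FlinkK8sOperator implementation: the phase names, the `next_phase` and `reconcile` functions, and the `cluster_ready` and `job_submitted` flags are all hypothetical stand-ins for whatever the operator actually observes on the cluster.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical phase names, chosen for illustration only.
    NEW = "New"
    CLUSTER_STARTING = "ClusterStarting"
    SUBMITTING_JOB = "SubmittingJob"
    RUNNING = "Running"
    FAILED = "Failed"


# Phases in which the state machine stops moving on its own.
STABLE = {Phase.RUNNING, Phase.FAILED}


def next_phase(phase, cluster_ready, job_submitted):
    """Return the phase that follows `phase`, given the observed facts."""
    if phase is Phase.NEW:
        return Phase.CLUSTER_STARTING
    if phase is Phase.CLUSTER_STARTING:
        # Stay in this phase until the (assumed) cluster readiness check passes.
        return Phase.SUBMITTING_JOB if cluster_ready else Phase.CLUSTER_STARTING
    if phase is Phase.SUBMITTING_JOB:
        return Phase.RUNNING if job_submitted else Phase.FAILED
    return phase  # stable phases are left unchanged


def reconcile(phase, cluster_ready, job_submitted, max_steps=10):
    """Step the state machine until a stable phase or the step limit is hit."""
    history = [phase]
    for _ in range(max_steps):
        if phase in STABLE:
            break
        phase = next_phase(phase, cluster_ready, job_submitted)
        history.append(phase)
    return history


if __name__ == "__main__":
    for p in reconcile(Phase.NEW, cluster_ready=True, job_submitted=True):
        print(p.value)
```

In this sketch each transition depends only on the current phase and the observed facts, so the loop can be re-run at any time and picks up from wherever the application currently is. A real operator would typically read those facts from the Kubernetes API and the Flink cluster rather than take them as arguments.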

Authors

Anand Swaminathan
Lyft

Anand Swaminathan

Currently working as a Software Engineer at Lyft, building infrastructure for large-scale streaming and batch processing systems. Previously built real-time replication systems for Lyft application tables. Also worked on DynamoDB (AWS), building DynamoDB Streams, TTL, etc.

Ketan Umare
Lyft

Ketan Umare

Engineer passionate about making data accessible to everyone. Lead for Flyte (a Kubernetes-native orchestration platform for machine learning and large-scale data analysis) and passionate about data on Kubernetes. Past lead for ETA at Lyft, responsible for extracting real-time traffic signals from all Lyft location data, which makes Lyft more efficient. Previously lead for Block Storage at Oracle Bare Metal Cloud, and lead for Maps and dynamic route planning for logistics at Amazon.