Use Case Track  

Building production Flink jobs with AirStream at Airbnb

AirStream is a real-time stream computation framework that supports Flink as one of its processing engines. It lets engineers and data scientists at Airbnb use Flink to build real-time data pipelines and feedback loops, and multiple mission-critical applications have been built on top of it. In this talk, we will start with an overview of AirStream and describe how we designed it to leverage Flink's SQL support so that users can easily build real-time data pipelines. We will go over a few production use cases, such as building a user activity profiler and building user identity mapping in real time. We will also cover how we have integrated AirStream into the data infrastructure ecosystem at Airbnb through configurable connectors, such as Kafka and Hive, that let users bring these components into their pipelines.

Authors

Pala Muthiah
Airbnb

Pala is an engineer on the Data Platform team at Airbnb, working on projects revolving around stream processing, analytics engines, and storage. Prior to that, he was a systems engineer at Trooly (acquired by Airbnb), working on a web crawl system (storage and indexing), and an engineer at RocketFuel, working on data and ML infrastructure. Before that, he worked as an engineer on the Microsoft SQL Server team on database manageability for SQL Azure and SQL Server. He holds Bachelor's and Master's degrees in Computer Science from Cornell University.

Hao Wang
Airbnb

Hao is a software engineer on Airbnb’s Data Platform team. He has been leading the development and driving the adoption of real-time stream processing at Airbnb. Before that, he helped build the data and machine learning infrastructure for IBM Watson. He received his PhD from the University of Southern California.