Running Flink Data-Connectors at Scale
Data is a core part of our infrastructure here at Yelp, with tens of billions of messages per day flowing across our streaming pipelines, empowering us to solve core business problems. To reliably connect and route this massive amount of data across a variety of data stores, we have built Flink-based data connectors that have either exactly-once or at-most-once semantics.
In this talk, I will introduce the various kinds of data connectors we have built and how they help us solve various business problems. I’ll focus on how these data connectors integrate with our infrastructure, discuss the technical tradeoffs between stateless and stateful connectors, and share the general guidelines and specifications we have for developers building such connectors. I’ll also touch briefly on how we audit these pipelines to ensure end-to-end data integrity. Finally, I will talk about how we ensure these connectors run at scale and integrate with our home-grown Flink supervisor solution.
Vipul Singh is a software engineer at Yelp Inc. He is part of the Distributed Systems team, where he works on developing and maintaining stream processing applications. His current focus is on building data connectors, enabling the creation of a data lake from schematized streams, and developing next-gen stream processing infrastructure at Yelp.