Technology Deep Dive Track
Towards Flink 2.0: Rethinking the stack and APIs to unify Batch & Stream
Flink currently features different APIs for bounded/batch (DataSet) and streaming (DataStream) programs. While the DataStream API can handle batch use cases, it does so much less efficiently than the DataSet API. The Table API was built as a unified API on top of both, covering batch and streaming with the same API and delegating under the hood to either DataSet or DataStream.
In this talk, we present the latest on the Flink community's efforts to rework the APIs and the stack for a better unified batch & streaming experience. We will discuss:
- The future roles and interplay of DataSet, DataStream, and Table API
- The new Flink stack and the abstractions on which these APIs will build
- The new unified batch/streaming sources
- How batch and streaming optimizations differ in the runtime, and what the future interplay of batch and streaming execution could look like
Aljoscha Krettek is a PMC member at Apache Flink and co-founder and software engineer at 'Ververica':https://www.ververica.com/. He studied Computer Science at TU Berlin and has worked at IBM Germany and at the IBM Almaden Research Center in San Jose. In Flink, Aljoscha mainly works on the Streaming API. The most recent additions to the windowing and state APIs were designed and implemented by him. Aljoscha has previously spoken about stream processing and Apache Flink at Hadoop Summit, Strata, Flink Forward, and several meetups.
Stephan Ewen is CTO and co-founder at Ververica where he leads the development of the stream processing platform based on open source Apache Flink. He is also a PMC member and one of the original creators of Apache Flink. Before working on Apache Flink, Stephan worked on in-memory databases, query optimization, and distributed systems. He holds a Ph.D. from the Berlin University of Technology.