Technology Deep Dive Track  

Build a Table-centric Apache Flink Ecosystem

The Flink Table API was initially created to address relational query use cases. It has been a good complement to the DataStream and DataSet APIs, letting users write declarative queries. Moreover, the Table API provides a unified API for batch and stream processing. We have been exploring ways to extend the Table API beyond classical relational queries, and through this work we are establishing an ecosystem on top of it. This talk introduces the following enhancements we have made to the Table API to expand its horizon. Most of this work has been or will be contributed back to Apache Flink. We will also share our experience of building an ecosystem around the Flink Table API and our vision for its future.

Non-relational processing API

Relational queries are natively supported by the Table API, and they are powerful enough to express complicated computation logic. However, a non-relational API comes in handy for general-purpose computation. We have introduced a set of non-relational methods, such as map() and flatMap(), to the Table API in a systematic manner to improve the overall user experience.
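
As a rough illustration of how row-wise map() and flatMap() differ from relational operators, here is a minimal sketch on a toy in-memory table. The `ToyTable` class, its `flat_map` spelling, and the sample data are hypothetical stand-ins invented for this example; they are not the Flink Table API and say nothing about its actual signatures.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple  # a row is modeled as a plain tuple


class ToyTable:
    """Hypothetical in-memory table; NOT the Flink Table API."""

    def __init__(self, rows: Iterable[Row]):
        self.rows: List[Row] = list(rows)

    def map(self, fn: Callable[[Row], Row]) -> "ToyTable":
        # One input row produces exactly one output row.
        return ToyTable(fn(row) for row in self.rows)

    def flat_map(self, fn: Callable[[Row], Iterable[Row]]) -> "ToyTable":
        # One input row produces zero or more output rows.
        return ToyTable(out for row in self.rows for out in fn(row))


# Split each line into words (flat_map), then attach each word's length (map).
lines = ToyTable([("hello flink",), ("table api",)])
words = lines.flat_map(lambda row: [(w,) for w in row[0].split()])
lengths = words.map(lambda row: (row[0], len(row[0])))
print(lengths.rows)
```

The point of the sketch is the shape of the two operations: map keeps the row count unchanged, while flat_map can expand or drop rows, which is awkward to express as a purely relational projection.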

Interactive programming

Ad-hoc queries are a common use case for processing engines, especially in batch processing. To meet the requirements of such use cases, we introduced interactive programming to the Table API, which allows users to cache intermediate results. We envision that the underlying service, which caches the intermediate Flink Table, will grow significantly to provide more sophisticated capabilities.
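
To illustrate why caching an intermediate result helps in an interactive session, the following minimal sketch counts how often an expensive step runs with and without a cache. `LazyTable`, `cache()`, `collect()`, and `expensive_step` are hypothetical names for this toy model; it makes no claim about how the Flink cache service is implemented.

```python
from typing import Callable, List, Optional


class LazyTable:
    """Hypothetical lazily evaluated table; NOT the Flink Table API."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute
        self._cached: Optional[List[int]] = None
        self._use_cache = False

    def cache(self) -> "LazyTable":
        # Mark the table so the first evaluation is kept for reuse.
        self._use_cache = True
        return self

    def collect(self) -> List[int]:
        # Reuse the stored result if caching is on and a result exists.
        if self._use_cache and self._cached is not None:
            return self._cached
        result = self._compute()
        if self._use_cache:
            self._cached = result
        return result


runs = {"count": 0}


def expensive_step() -> List[int]:
    # Stand-in for a costly intermediate computation.
    runs["count"] += 1
    return [x * x for x in range(5)]


# Without cache: every ad-hoc query recomputes the intermediate result.
uncached = LazyTable(expensive_step)
uncached.collect()
uncached.collect()
runs_without_cache = runs["count"]

# With cache: the intermediate result is computed once and then reused.
runs["count"] = 0
cached = LazyTable(expensive_step).cache()
first = cached.collect()
second = cached.collect()
runs_with_cache = runs["count"]
print(runs_without_cache, runs_with_cache)
```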

Iterative processing

Compared with DataSet and DataStream, one thing missing from the Table API is native iteration support. Instead of naively copying the iteration API from DataSet / DataStream, we designed a new API that addresses the caveats we have seen in the existing iteration support in DataStream and DataSet.

ML on Table API

ML is an important part of the Flink ecosystem. We have proposed building ML on top of the Table API, so that algorithm engineers can also benefit from the optimizations provided by Flink in both batch and stream jobs.

Authors

Shaoxuan Wang
Alibaba

Shaoxuan is a senior staff engineer and director of engineering at Alibaba, working on Flink SQL and the AI platform. Alibaba runs Blink, a fork of Flink, at very large scale and offers it as a service to internal and cloud customers. Prior to Alibaba, Shaoxuan was a senior software engineer working on social graph and core infrastructure at Facebook. Shaoxuan received his Ph.D. degree from UC San Diego. He is an Apache Flink® committer. Shaoxuan and his colleagues contribute heavily to Apache Flink®.