Integrate Flink with the Hive Ecosystem
Alongside the community's efforts, at Alibaba we have explored Flink's potential as an execution engine not just for stream processing but also for batch processing. The findings are encouraging, so we have started working to make Flink's batch capabilities full-fledged, especially in SQL support. When comparing Flink to a mature SQL tool, we identified a major gap: good integration with the Hive ecosystem. This is crucial to the success of Flink SQL and batch processing because the majority of Flink users, and of big data users in general, have already established a data ecosystem around Hive. Therefore, we have decided to promote a close integration of Flink with the Hive ecosystem.
In this talk, we will present functional and design proposals as well as a roadmap toward that goal. We will also share the latest developments in this direction, along with early feedback and outcomes.
Xuefu Zhang is a long-time open source veteran who has worked, or is working, on many projects under the Apache Software Foundation, of which he is also an honored member. About 10 years ago he worked on the Hadoop team at Yahoo, when the project was just getting started. Later he worked at Cloudera, initiating and leading the development of the Hive on Spark project in the community and across many organizations. Prior to joining Alibaba, he worked at Uber, where he promoted Hive on Spark for all of Uber's SQL-on-Hadoop workloads and significantly improved Uber's cluster efficiency.
Bowen is a senior software engineer at Alibaba, working on new initiatives for Flink and for Alibaba’s internal fork, known as Blink. Before joining Alibaba, he led the development of a Flink-based stream processing platform at OfferUp and worked on the infrastructure of the data analytics platform at Tableau.