Announcing Google Cloud Dataflow on Flink and easy Flink deployment on Google Cloud


Today, we are pleased to announce a deeper engagement between Google, data Artisans, and the broader Apache Flink™ community to bring easy Flink deployment to Google Cloud Platform, and enable Google Cloud Dataflow users to leverage Apache Flink™ as a backend.

Flink deployment on Google Cloud Platform

We recently contributed a patch to bdutil, Google's open source tool for deploying data processing systems on Google Compute Engine. In addition to managing Hadoop on Google Compute Engine, bdutil now lets you deploy Flink as easily as:

bdutil -e extensions/flink/flink_env.sh deploy


See here for detailed instructions. Automatic Flink deployment on Google Compute Engine is a natural next step after our recent experience of using Flink and Google Compute Engine to factorize a 28-billion element matrix in 5 hours on a 40-node cluster. Check out our recent blog post here and an extended version here.


Google Cloud Dataflow on Flink

Google Cloud Dataflow is a data analytics service running on Google’s infrastructure. It allows users to write sophisticated data analytics pipelines for both batch and streaming programs and run them at scale on Google Cloud Platform. Dataflow offers a unified view of batch and stream processing, as well as highly flexible window semantics that support complex event stream analysis patterns. Cloud Dataflow is a descendant of Google’s FlumeJava and MillWheel projects.

Google recently released an SDK for Dataflow as open source. The SDK decouples the programming model from the execution engine via pluggable "runners". Google provides runners to run Dataflow programs on Google Cloud Platform or on a local machine (for development).

Today, we are pleased to announce a Flink runner for Cloud Dataflow. Dataflow users can now run their programs using Apache Flink™ as the execution backend. The current Flink runner supports all the batch functionality of Dataflow. We are currently working on bringing the Dataflow streaming functionality into the Flink runner. Fortunately, Flink already supports flexible window semantics, as does Cloud Dataflow.

Flink and Cloud Dataflow are very well aligned: both share the vision of natively unifying stream and batch processing at the engine level. Flink has always executed both batch and streaming programs using the same streaming (pipelined) engine. The addition of Flink to the family of Dataflow SDK runners (which now includes Google’s cloud platform, a local runner, and a Cloudera-contributed Apache Spark runner) is great for users who want to run the same hybrid analytical pipelines in the cloud and on premises.

Click here to get started on Google Dataflow. To install the Flink Dataflow runner, follow the instructions here. As always, we would love to know what you think, so please give us feedback by submitting an issue. For more information, see the announcement on the Google Cloud Platform Blog.
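
To make the "runner" idea concrete, here is a minimal sketch of how a programming model can be decoupled from the engine that executes it. It is written in plain Python and is not the Dataflow SDK API: the names `Pipeline`, `Runner`, `LocalRunner`, and `PipelinedRunner` are hypothetical and exist only for illustration. The point is that the same pipeline definition can be handed to different runners without being changed.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, List


class Pipeline:
    """Hypothetical pipeline: an ordered list of element-wise transforms."""

    def __init__(self) -> None:
        self.transforms: List[Callable[[int], int]] = []

    def apply(self, fn: Callable[[int], int]) -> "Pipeline":
        # Record the transform; nothing is executed at definition time.
        self.transforms.append(fn)
        return self


class Runner(ABC):
    """Hypothetical runner interface: decides how a pipeline is executed."""

    @abstractmethod
    def run(self, pipeline: Pipeline, data: Iterable[int]) -> List[int]:
        ...


class LocalRunner(Runner):
    """Applies each transform to the whole input before starting the next."""

    def run(self, pipeline: Pipeline, data: Iterable[int]) -> List[int]:
        result = list(data)
        for fn in pipeline.transforms:
            result = [fn(x) for x in result]
        return result


class PipelinedRunner(Runner):
    """Pushes each element through all transforms before reading the next."""

    def run(self, pipeline: Pipeline, data: Iterable[int]) -> List[int]:
        out = []
        for x in data:
            for fn in pipeline.transforms:
                x = fn(x)
            out.append(x)
        return out


# One pipeline definition, two interchangeable execution strategies.
pipeline = Pipeline().apply(lambda x: x + 1).apply(lambda x: x * 2)
local_result = LocalRunner().run(pipeline, [1, 2, 3])
pipelined_result = PipelinedRunner().run(pipeline, [1, 2, 3])
```

Both runners produce the same result for the same pipeline, which is the property that lets a user move a program between execution backends. A real runner would of course translate the pipeline into jobs for a distributed engine rather than loop over a list.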

Article by:

Maximilian Michels
