

Announcing Google Cloud Dataflow on Flink and easy Flink deployment on Google Cloud



Today, we are pleased to announce a deeper engagement between Google, data Artisans, and the broader Apache Flink™ community to bring easy Flink deployment to Google Cloud Platform, and enable Google Cloud Dataflow users to leverage Apache Flink™ as a backend.

Flink deployment on Google Cloud Platform

We recently contributed a patch to bdutil, Google's open source tool for deploying data processing systems on Google Compute Engine. In addition to managing Hadoop on Google Compute Engine, bdutil now lets you deploy Flink as easily as:

bdutil -e extensions/flink/flink_env.sh deploy


See here for detailed instructions. Automatic Flink deployment on Google Compute Engine is a natural next step after our recent experience using Flink on Google Compute Engine to factorize a 28-billion-element matrix in 5 hours on a 40-node cluster. Check out our recent blog post here, and an extended version here.


Google Cloud Dataflow on Flink

Google Cloud Dataflow is a data analytics service running on Google’s infrastructure. It allows users to write sophisticated data analytics pipelines for both batch and streaming programs and run them at scale on Google Cloud Platform. Dataflow offers a unified view of batch and stream processing, as well as highly flexible window semantics that support complex event stream analysis patterns. Cloud Dataflow is a descendant of Google’s FlumeJava and MillWheel projects.

Google recently released the SDK for Dataflow as open source. The SDK decouples the programming model from the execution engine via pluggable "runners". Google provides runners that execute Dataflow programs on Google Cloud Platform or on a local machine (for development).

Today, we are pleased to announce a Flink runner for Cloud Dataflow. Dataflow users can now run their programs using Apache Flink™ as the execution backend. The current Flink runner supports all of Dataflow's batch functionality, and we are working on bringing Dataflow's streaming functionality into the runner. Fortunately, Flink already supports flexible window semantics, as does Cloud Dataflow.

Flink and Cloud Dataflow are closely aligned: both share the vision of natively unifying stream and batch processing at the engine level. Flink has always executed both batch and streaming programs using the same streaming (pipelined) engine. The addition of Flink to the family of Dataflow SDK runners (which now includes Google’s cloud platform, a local runner, and a Cloudera-contributed Apache Spark runner) is great for users who want to run the same hybrid analytical pipelines both in the cloud and on premises.

Click here to get started on Google Dataflow. To install the Flink Dataflow runner, follow the instructions here. As always, we would love to know what you think, so please give us feedback by submitting an issue. For more information, see the announcement on the Google Cloud Platform Blog.
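The idea of decoupling the programming model from the execution engine through pluggable runners can be illustrated with a small conceptual sketch. This is not the Dataflow SDK API (the SDK is Java, and none of its classes are reproduced here): `Pipeline`, `Runner`, `StagedRunner`, and `PipelinedRunner` are hypothetical names, and the sketch assumes element-wise stages only, with no grouping or windowing. It shows one pipeline definition running unchanged on two "engines": one that materializes each stage before starting the next, and one that streams each record through all stages, loosely mirroring how Flink runs batch programs on its pipelined engine.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Iterator, List

# Hypothetical element-wise stage: maps one input record to zero or more output records.
FlatMapFn = Callable[[str], Iterable[str]]


class Pipeline:
    """Hypothetical engine-agnostic pipeline: a source plus an ordered list of stages."""

    def __init__(self, source: Iterable[str]) -> None:
        self.source = list(source)
        self.stages: List[FlatMapFn] = []

    def apply(self, fn: FlatMapFn) -> "Pipeline":
        self.stages.append(fn)
        return self  # allow chaining

    def run(self, runner: "Runner") -> List[str]:
        # The program never chooses an engine itself; it defers to the plugged-in runner.
        return runner.run(self)


class Runner(ABC):
    @abstractmethod
    def run(self, pipeline: Pipeline) -> List[str]:
        ...


class StagedRunner(Runner):
    """Runs one stage at a time, materializing the full intermediate result in between."""

    def run(self, pipeline: Pipeline) -> List[str]:
        data = pipeline.source
        for fn in pipeline.stages:
            data = [out for record in data for out in fn(record)]
        return data


class PipelinedRunner(Runner):
    """Chains lazy generators, so each record flows through every stage before the next is read."""

    def run(self, pipeline: Pipeline) -> List[str]:
        stream: Iterator[str] = iter(pipeline.source)
        for fn in pipeline.stages:
            stream = self._lift(fn, stream)
        return list(stream)

    @staticmethod
    def _lift(fn: FlatMapFn, upstream: Iterator[str]) -> Iterator[str]:
        # Binding fn as a parameter avoids the late-binding pitfall of generator expressions in a loop.
        for record in upstream:
            yield from fn(record)


def build() -> Pipeline:
    return (
        Pipeline(["to be or", "not to be"])
        .apply(str.split)                             # line -> words
        .apply(lambda w: [w.upper()])                 # word -> upper-cased word
        .apply(lambda w: [w] if w != "OR" else [])    # drop the word "OR"
    )


batch = build().run(StagedRunner())
streamed = build().run(PipelinedRunner())
print(batch)             # ['TO', 'BE', 'NOT', 'TO', 'BE']
print(streamed == batch)  # True
```

The pipeline definition in `build()` never mentions an engine; swapping `StagedRunner()` for `PipelinedRunner()` changes how records are scheduled but not the result, which is the property that lets the same Dataflow program target different backends.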


Article by:

Maximilian Michels

