Announcing Ververica Platform 2.4

March 10, 2021 | by Konstantin Knauf

Newest release adds full support for Flink SQL and Flink 1.12, and improves resource utilization via new shared session clusters.

 

Today, I am very happy to announce the release of Ververica Platform 2.4. It is our mission to enable any organization to derive insights faster and to serve its customers in real time. To this end, Ververica Platform 2.4 makes the development and operations of Apache Flink easier and more accessible. Flink SQL, which opens up Apache Flink to non-engineers, is at the center of our strategy and has therefore been the focus of the enhancements in this release:

  • Preview any SQL query in the editor (not just insert-only queries)

  • Extend Flink SQL with custom connectors & formats

  • Integrate with more data sources and sinks out of the box

  • Save resources with Deployments in session mode

  • Benefit from the latest improvements of Apache Flink

In the rest of the post, I will walk you through each of these improvements in more detail.

 

Preview any SQL query in the SQL Editor

Ververica Platform’s SQL Editor sits at the core of Flink SQL on Ververica Platform. With it, you develop, preview, and deploy your scripts right from your browser. Prior to v2.4.0, the SQL Editor could only display the results of queries producing inserts. With Ververica Platform 2.4.0, all types of queries are supported. Figure 1 shows two previously unsupported queries in action.


 

Figure 1: A non-windowed aggregation producing updates (left), and a continuous top-n producing deletions (right)
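For illustration, queries of the two kinds shown in Figure 1 might look roughly like this (table and column names are made up for the example):

```sql
-- A non-windowed aggregation: every new row updates the running counts
SELECT booking_channel, COUNT(*) AS activities
FROM train_activities
GROUP BY booking_channel;

-- A continuous top-2 per channel: rows are deleted from the result
-- when they drop out of the ranking
SELECT booking_channel, station, activities
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY booking_channel ORDER BY activities DESC) AS row_num
  FROM station_activities)
WHERE row_num <= 2;
```

The first query emits updates as counts change; the second uses Flink SQL's top-n pattern (`ROW_NUMBER()` plus a `row_num` filter), whose result can shrink as well as grow.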


Extend Flink SQL with custom connectors & formats

If you have implemented your own connectors, or you are using a connector from a third-party (like Apache Pulsar or Apache Iceberg) you can now seamlessly add these via the web user interface or REST API. When you do this, Ververica Platform will analyze the archive, find all connector or format implementations and determine their types and configuration options. Afterwards, your connector is available to any developer in your namespace just like the packaged connectors. This makes it very easy to share such connectors across your development team.

 


Figure 2: Adding a custom connector to Ververica Platform
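Once registered, a custom connector is used like any packaged one, via its factory identifier in a `CREATE TABLE` statement. A hypothetical sketch for a Pulsar connector follows; the identifier and option names depend entirely on the connector implementation you upload:

```sql
CREATE TABLE clicks (
  user_id STRING,
  url     STRING,
  ts      TIMESTAMP(3)
) WITH (
  'connector'   = 'pulsar',               -- identifier discovered from the uploaded JAR
  'topic'       = 'clicks',
  'service-url' = 'pulsar://broker:6650'  -- option exposed by the connector factory
);
```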

 

 

Integrate with more data sources and sinks out of the box

Ververica Platform provides a set of packaged connectors including the most popular connectors in the Apache Flink community. New additions in Ververica Platform 2.4 include AWS Kinesis (source & sink), Apache Kafka with Upsert semantics (source & sink) and Debezium Avro (format). In addition, some of the existing connectors have seen significant improvements like exactly-once support for Apache Kafka and auto-compaction for Apache Parquet.


Figure 3: All packaged connectors (left), and formats (right) in Ververica Platform 2.4.
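For example, the new upsert-kafka connector interprets the records of a (typically compacted) Kafka topic as a changelog: each record is an upsert on the primary key, and a record with a null value is a deletion. A sketch with illustrative topic and field names:

```sql
CREATE TABLE user_profiles (
  user_id STRING,
  region  STRING,
  PRIMARY KEY (user_id) NOT ENFORCED   -- derived from the Kafka record key
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'user-profiles',
  'properties.bootstrap.servers' = 'kafka:9092',
  'key.format' = 'raw',
  'value.format' = 'json'
);
```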

 

Save resources with Deployments in session mode

In Ververica Platform, a long-running Apache Flink application or Flink SQL query is represented by a Deployment. Deployments can be upgraded, (auto-)scaled, and configured declaratively. Until now, each Deployment has been backed by a dedicated Flink cluster. This is what we call “application mode”.

On the one hand, “application mode” provides strong isolation between Deployments, which, among other things, simplifies resource management and monitoring. On the other hand, “application mode” introduces some resource overhead per Deployment. In our experience, applications written in the DataStream API are usually large in scale and quite heterogeneous in their performance requirements, so the overhead is easily justified by the higher level of isolation.

With Flink SQL, we often see a large number of fairly small queries that share the same requirements in terms of availability and latency. In that case, it makes sense to reduce the overhead by deploying all of them into a shared cluster. This is called “session mode”, and it is now supported in Ververica Platform 2.4.

Deployments in “session mode” have the same capabilities as Deployments in “application mode” when it comes to lifecycle management and scaling. In many cases, upgrading or rescaling a Deployment will even be notably faster in “session mode”. The platform also allows you to switch between “session mode” and “application mode” using the regular upgrade strategies. This allows you, for example, to deploy new queries in “session mode” initially and move them to “application mode” (i.e. a dedicated cluster) as they become more business-critical or larger in scale over time.

“Session mode” is not only supported for SQL Deployments, but also for applications written in the DataStream API.
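As a rough sketch, a SQL Deployment targeting a shared session cluster might be declared along these lines. The field names below are assumptions for illustration, not the authoritative spec; please consult the platform documentation for the exact resource definitions:

```yaml
# Illustrative sketch only; field names are assumptions, not the exact API.
kind: Deployment
metadata:
  name: top-sellers-query
spec:
  sessionClusterName: analytics-session   # shared cluster instead of a dedicated one
  template:
    spec:
      artifact:
        kind: SQLSCRIPT
        sqlScript: INSERT INTO top_sellers SELECT ...
```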

 

Benefit from the latest improvements of Apache Flink

As always, Ververica Platform supports the latest release of Apache Flink (1.12.2). Released in December 2020, Apache Flink 1.12 comes with a ton of improvements across the stack. Let me highlight the ones that we think have the highest impact in the context of Ververica Platform.


Batch execution on the DataStream API

Prior to Flink 1.12, you were able to use the DataStream API to process bounded streams (e.g. files), with the limitation that the runtime was not “aware” that the job was bounded. To optimize the runtime for bounded input, the new BATCH mode uses sort-based shuffles and an improved scheduling strategy, and performs aggregations purely in memory. As a result, BATCH mode execution in the DataStream API already comes very close to the performance of the DataSet API in Flink 1.12. Batch execution on the DataStream API is a major milestone of a larger effort to converge Flink’s stack towards two unified APIs (Table API/SQL and DataStream API).
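In Flink 1.12, the execution mode is a regular configuration option, so a bounded job can opt into the batch runtime without code changes (it can also be set programmatically via `StreamExecutionEnvironment#setRuntimeExecutionMode`). A minimal flink-conf.yaml fragment:

```yaml
# Select the batch-optimized runtime for a bounded DataStream program.
# AUTOMATIC lets Flink choose based on the boundedness of the sources.
execution.runtime-mode: BATCH
```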

 

Temporal table joins

Apache Flink 1.11 only supported temporal table joins based on pre-registered temporal table functions. In Flink 1.12, you can simply use the standard SQL clause FOR SYSTEM_TIME AS OF to express a temporal table join. In addition, temporal joins are now supported against any kind of table that has a time attribute and a primary key, which unlocks temporal joins with Kafka topics or database changelogs (e.g. from Debezium). Please see Figure 4 for an example that joins a regular Kafka topic with a compacted Kafka topic.

Figure 4: A multi-way temporal table join in Ververica Platform’s new SQL Deployment Overview
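In Flink SQL, such a join against the version of a table that was valid at the probe row's time attribute can be sketched as follows (the schemas are illustrative):

```sql
-- Enrich each order with the exchange rate that was valid at order time.
-- currency_rates needs a primary key and a time attribute, e.g. a compacted
-- Kafka topic or a Debezium changelog.
SELECT o.order_id, o.price * r.rate AS converted_price
FROM orders AS o
JOIN currency_rates FOR SYSTEM_TIME AS OF o.order_time AS r
  ON o.currency = r.currency;
```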

 

Production-ready unaligned checkpoints

Originally introduced as an experimental feature in Flink 1.11, unaligned checkpoints provide consistently lower checkpointing times even under backpressure, which is one of the most frequently mentioned challenges of operating Apache Flink. With Flink 1.12.2, unaligned checkpointing has been hardened sufficiently for use in production environments.
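Unaligned checkpoints are enabled with a single configuration switch, for example in flink-conf.yaml:

```yaml
# Trigger a checkpoint every minute and persist in-flight data as part of
# the checkpoint instead of waiting for barrier alignment under backpressure.
execution.checkpointing.interval: 60 s
execution.checkpointing.unaligned: true
```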

As usual, the previous release of Apache Flink (1.11) continues to be supported.

 

In closing...

As you have seen above, Flink SQL has continued to sit at the center of our product development efforts over the last few months, simply because we believe it is the key to widespread adoption of real-time analytics in any organization. But that is not all: we have, for example, also worked on the usability of our REST API and improved the interoperability of the platform. We will follow up on these topics in a separate post next week. For now, please check out the release notes for a complete list of improvements and bug fixes.

With an Apache Flink 1.13 release anticipated in April, Ververica Platform 2.5 is already around the corner. Beyond Ververica Platform 2.5, our attention is slowly turning towards a major release of Ververica Platform in the second half of this year. We want to take this opportunity to double down on the functionality most important to our users, while dropping functionality that does not add to the platform's value. To determine which features those are, we look forward to working closely with our customers and the wider community.

If you would like to try out the platform yourself, our getting started guide and the corresponding playground repository are the best places to start. We are very much looking forward to your feedback and suggestions.

 
