Announcing Ververica Platform 2.4

March 10, 2021 | by Konstantin Knauf

Newest release adds full support for Flink SQL and Flink 1.12, and improves resource utilization via new shared session clusters.

 

Today, I am very happy to announce the release of Ververica Platform 2.4. It is our mission to enable any organization to derive insights faster and to serve its customers in real time. To this end, Ververica Platform 2.4 makes the development and operations of Apache Flink easier and more accessible. Flink SQL, which opens up Apache Flink to non-engineers, is at the center of our strategy and has therefore been the focus of the enhancements in this release:

  • Preview any SQL query in the editor (not just insert-only queries)

  • Extend Flink SQL with custom connectors & formats

  • Integrate with more data sources and sinks out of the box

  • Save resources with Deployments in session mode

  • Benefit from the latest improvements of Apache Flink

In the rest of the post, I will walk you through each of these improvements in more detail.

 

Preview any SQL query in the SQL Editor

Ververica Platform’s SQL Editor sits at the core of Flink SQL on Ververica Platform. With it, you develop, preview, and deploy your scripts right from your browser. Prior to v2.4.0, the SQL Editor could only display the results of queries producing inserts. With Ververica Platform 2.4.0, all types of queries are supported. Figure 1 shows two previously unsupported queries in action.


 

Figure 1: A non-windowed aggregation producing updates (left), and a continuous top-n producing deletions (right)
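For illustration, queries of the two kinds shown in Figure 1 might look roughly like this (table and column names are made up for the example):

```sql
-- A non-windowed aggregation: every new row updates the running counts
SELECT booking_channel, COUNT(*) AS activities
FROM train_activities
GROUP BY booking_channel;

-- A continuous top-2 per channel: rows are deleted from the result
-- when they drop out of the ranking
SELECT booking_channel, station, activities
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY booking_channel ORDER BY activities DESC) AS row_num
  FROM station_activities)
WHERE row_num <= 2;
```

The first query emits updates as counts change; the second uses Flink SQL's top-n pattern (`ROW_NUMBER()` plus a `row_num` filter), whose result can shrink as well as grow.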


Extend Flink SQL with custom connectors & formats

If you have implemented your own connectors, or you are using a connector from a third-party (like Apache Pulsar or Apache Iceberg) you can now seamlessly add these via the web user interface or REST API. When you do this, Ververica Platform will analyze the archive, find all connector or format implementations and determine their types and configuration options. Afterwards, your connector is available to any developer in your namespace just like the packaged connectors. This makes it very easy to share such connectors across your development team.

 


Figure 2: Adding a custom connector to Ververica Platform
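Once registered, a custom connector is used like any packaged one, via its factory identifier in a `CREATE TABLE` statement. A hypothetical sketch for a Pulsar connector follows; the identifier and option names depend entirely on the connector implementation you upload:

```sql
CREATE TABLE clicks (
  user_id STRING,
  url     STRING,
  ts      TIMESTAMP(3)
) WITH (
  'connector'   = 'pulsar',               -- identifier discovered from the uploaded JAR
  'topic'       = 'clicks',
  'service-url' = 'pulsar://broker:6650'  -- option exposed by the connector factory
);
```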

 

 

Integrate with more data sources and sinks out of the box

Ververica Platform provides a set of packaged connectors including the most popular connectors in the Apache Flink community. New additions in Ververica Platform 2.4 include AWS Kinesis (source & sink), Apache Kafka with Upsert semantics (source & sink) and Debezium Avro (format). In addition, some of the existing connectors have seen significant improvements like exactly-once support for Apache Kafka and auto-compaction for Apache Parquet.


Figure 3: All packaged connectors (left), and formats (right) in Ververica Platform 2.4.
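For example, the new upsert-kafka connector interprets the records of a (typically compacted) Kafka topic as a changelog: each record is an upsert on the primary key, and a record with a null value is a deletion. A sketch with illustrative topic and field names:

```sql
CREATE TABLE user_profiles (
  user_id STRING,
  region  STRING,
  PRIMARY KEY (user_id) NOT ENFORCED   -- derived from the Kafka record key
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'user-profiles',
  'properties.bootstrap.servers' = 'kafka:9092',
  'key.format' = 'raw',
  'value.format' = 'json'
);
```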

 

Save resources with Deployments in session mode

In Ververica Platform, a long-running Apache Flink application or Flink SQL query is represented by a Deployment. Deployments can be upgraded, (auto-)scaled, and configured declaratively. Until now, each Deployment has been backed by a dedicated Flink cluster. This is what we call “application mode”.

On the one hand, “application mode” provides strong isolation between Deployments, which, among other things, simplifies resource management and monitoring. On the other hand, “application mode” introduces some resource overhead per Deployment. In our experience, applications written in the DataStream API are usually large in scale and quite heterogeneous in their performance requirements, so the overhead is easily justified by the higher level of isolation.

With Flink SQL, we often see a large number of fairly small queries that share the same requirements in terms of availability and latency. In that case, it makes sense to reduce the overhead by deploying all of them into a shared cluster. This is called “session mode”, and it is now supported in Ververica Platform 2.4.

Deployments in “session mode” have the same capabilities as Deployments in “application mode” when it comes to lifecycle management and scaling. In many cases, upgrading or rescaling a Deployment will even be notably faster in “session mode”. The platform also allows you to switch between “session mode” and “application mode” using the regular upgrade strategies. This allows you, for example, to deploy new queries in “session mode” initially and move them to “application mode” (i.e. a dedicated cluster) as they become more business-critical or larger in scale over time.

“Session mode” is not only supported for SQL Deployments, but also for applications written in the DataStream API.
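As a rough sketch, a SQL Deployment targeting a shared session cluster might be declared along these lines. The field names below are assumptions for illustration, not the authoritative spec; please consult the platform documentation for the exact resource definitions:

```yaml
# Illustrative sketch only; field names are assumptions, not the exact API.
kind: Deployment
metadata:
  name: top-sellers-query
spec:
  sessionClusterName: analytics-session   # shared cluster instead of a dedicated one
  template:
    spec:
      artifact:
        kind: SQLSCRIPT
        sqlScript: INSERT INTO top_sellers SELECT ...
```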

 

Benefit from the latest improvements of Apache Flink

As always, Ververica Platform supports the latest release of Apache Flink (1.12.2). Released in December 2020, Apache Flink 1.12 comes with a ton of improvements across the stack. Let me highlight the ones that we think have the highest impact in the context of Ververica Platform.


Batch execution on the DataStream API

Prior to Flink 1.12, you were able to use the DataStream API to process bounded streams (e.g. files), with the limitation that the runtime was not “aware” that the job was bounded. To optimize the runtime for bounded input, the new BATCH mode uses sort-based shuffles and an improved scheduling strategy, and performs aggregations purely in memory. As a result, BATCH mode execution in the DataStream API already comes very close to the performance of the DataSet API in Flink 1.12. Batch execution on the DataStream API is a major milestone of a larger effort to converge Flink’s stack towards two unified APIs (Table API/SQL and DataStream API).
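In Flink 1.12, the execution mode is a regular configuration option, so a bounded job can opt into the batch runtime without code changes (it can also be set programmatically via `StreamExecutionEnvironment#setRuntimeExecutionMode`). A minimal flink-conf.yaml fragment:

```yaml
# Select the batch-optimized runtime for a bounded DataStream program.
# AUTOMATIC lets Flink choose based on the boundedness of the sources.
execution.runtime-mode: BATCH
```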

 

Temporal table joins

Apache Flink 1.11 only supported temporal table joins based on pre-registered temporal table functions. In Flink 1.12, you can simply use the standard SQL clause FOR SYSTEM_TIME AS OF to express a temporal table join. In addition, temporal joins are now supported against any kind of table that has a time attribute and a primary key, which unlocks temporal joins with Kafka topics or database changelogs (e.g. from Debezium). Please see Figure 4 for an example that joins a regular Kafka topic with a compacted Kafka topic.

Figure 4: A multi-way temporal table join in Ververica Platform’s new SQL Deployment Overview
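In Flink SQL, such a join against the version of a table that was valid at the probe row's time attribute can be sketched as follows (the schemas are illustrative):

```sql
-- Enrich each order with the exchange rate that was valid at order time.
-- currency_rates needs a primary key and a time attribute, e.g. a compacted
-- Kafka topic or a Debezium changelog.
SELECT o.order_id, o.price * r.rate AS converted_price
FROM orders AS o
JOIN currency_rates FOR SYSTEM_TIME AS OF o.order_time AS r
  ON o.currency = r.currency;
```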

 

Production-ready unaligned checkpoints

Originally introduced as an experimental feature in Flink 1.11, unaligned checkpoints provide consistently lower checkpointing times even under backpressure, which is one of the most frequently mentioned challenges of operating Apache Flink. With Flink 1.12.2, unaligned checkpointing has been hardened sufficiently for use in production environments.
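Unaligned checkpoints are enabled with a single configuration switch, for example in flink-conf.yaml:

```yaml
# Trigger a checkpoint every minute and persist in-flight data as part of
# the checkpoint instead of waiting for barrier alignment under backpressure.
execution.checkpointing.interval: 60 s
execution.checkpointing.unaligned: true
```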

As usual, the previous release of Apache Flink (1.11) continues to be supported.

 

In closing...

As you have seen above, Flink SQL has continued to sit at the center of our product development efforts over the last few months, simply because we believe it is the key to widespread adoption of real-time analytics in any organization. But that is not all: we have, for example, also worked on the usability of our REST API and improved the interoperability of the platform. We will follow up on these topics in a separate post next week. For now, please check out the release notes for a complete list of improvements and bug fixes.

With an Apache Flink 1.13 release anticipated in April, Ververica Platform 2.5 is already around the corner. Beyond Ververica Platform 2.5, our attention is slowly turning towards a major release of Ververica Platform in the second half of this year. We want to take this opportunity to double down on the functionality most important to our users, while dropping functionality that does not add to the platform's value. To determine which features those are, we look forward to working closely with our customers and the wider community.

If you would like to try out the platform yourself, our getting started guide and the corresponding playground repository are the best places to start. We are very much looking forward to your feedback and suggestions.

 
