Announcing Ververica Platform 2.8

17 October 2022 by Daisy Tsang

The latest minor release includes an improved High Availability service, improvements to user experience, and vulnerability fixes!

Ververica Platform 2.8 now supports Apache Flink 1.15.2. This patch release includes several bug fixes, vulnerability fixes, and minor improvements for Flink 1.15. You can read more about it on the Apache Flink blog.

In addition to supporting the latest patch release of Flink, Ververica Platform 2.8 includes an improved Kubernetes-based High Availability service designed to make your Flink jobs more robust and resilient.

Changes to the release process

Until now, each Ververica Platform minor release has corresponded to a Flink minor release. We decided that this could hold us back when we want to ship larger platform features. Starting with Ververica Platform 2.8, a minor version will correspond to either a new Flink minor version or a major new Ververica Platform feature. For example, 2.8 continues to support Flink 1.15 (just like VVP 2.7) but adds the Flink Kubernetes High Availability option.

Improvements to the High Availability service

Keeping Flink applications performant and resilient is a top priority. For large, distributed, stateful applications, making sure that data remains available when a machine suddenly dies is a huge challenge. Flink clusters on Ververica Platform are therefore configured to run in High Availability mode, which, in essence, sets up the Flink cluster on Kubernetes, along with its supporting components, so that there is no single point of failure.

In a Flink cluster, High Availability (HA) protects the JobManager, which coordinates the job, including failovers. Because there is only one JobManager, it would otherwise be a single point of failure. If the JobManager fails, the HA service records where the job left off, so that a new JobManager can continue the job from that point when it starts.
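As a rough illustration of the recovery idea described above, the following Python sketch models it with a toy in-memory store. It is not Flink's HighAvailabilityServices API: the HaStore and JobManager classes and their methods are hypothetical, and the in-memory object stands in for the durable state that a real HA service keeps outside the JobManager process (Flink Kubernetes HA, for example, uses Kubernetes ConfigMaps together with a storage directory). Leader election is reduced to a single field, and a crash is simulated by releasing leadership directly.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HaStore:
    """Hypothetical durable HA store; it outlives any single JobManager."""
    leader: Optional[str] = None
    # Job ID -> ID of the latest completed checkpoint.
    latest_checkpoint: Dict[str, int] = field(default_factory=dict)

    def try_acquire_leadership(self, candidate: str) -> bool:
        # At most one JobManager leads at a time.
        if self.leader is None:
            self.leader = candidate
        return self.leader == candidate

    def release_leadership(self, candidate: str) -> None:
        if self.leader == candidate:
            self.leader = None


class JobManager:
    """Hypothetical JobManager that keeps all recovery state in the HA store."""

    def __init__(self, name: str, store: HaStore) -> None:
        self.name = name
        self.store = store

    def start(self, job_id: str) -> int:
        """Become leader; return the checkpoint to resume from (0 means a fresh start)."""
        if not self.store.try_acquire_leadership(self.name):
            raise RuntimeError(f"{self.name}: another JobManager is already leader")
        return self.store.latest_checkpoint.get(job_id, 0)

    def complete_checkpoint(self, job_id: str, checkpoint_id: int) -> None:
        # Record where the job can continue from, outside this process.
        self.store.latest_checkpoint[job_id] = checkpoint_id

    def crash(self) -> None:
        # Simulated failure; a real HA service would wait for the leader lease to expire.
        self.store.release_leadership(self.name)


store = HaStore()
jm1 = JobManager("jm-1", store)
first_start = jm1.start("orders")       # no checkpoint yet -> 0
jm1.complete_checkpoint("orders", 41)
jm1.complete_checkpoint("orders", 42)
jm1.crash()                             # leadership freed; checkpoint pointer survives
jm2 = JobManager("jm-2", store)
resumed_from = jm2.start("orders")      # new leader picks up checkpoint 42
print(first_start, resumed_from)        # prints "0 42"
```

The key design point is that the checkpoint pointer lives outside the JobManager, so losing the JobManager process loses nothing its successor needs to resume the job.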

The HA service in Ververica Platform was originally based on a ZooKeeper quorum. Starting with Ververica Platform 2.0.0, a custom Kubernetes-based HighAvailabilityServices implementation, VVP-Kubernetes-HA, took over ZooKeeper’s role. This improved usability, since there is one fewer component (the ZooKeeper cluster) to manage. Building on VVP-Kubernetes-HA, Ververica Platform also supports the “Latest State” restore strategy, which lets users restart their jobs from the latest checkpoint or savepoint.

Apache Flink has since introduced a native Kubernetes HA service (available since Flink 1.12) that ships out of the box. After testing it, we determined that it improves on our custom implementation because it supports higher load on the Kubernetes API server. We therefore decided to move this important component to Flink Kubernetes HA and migrate away from our custom implementation, avoiding unnecessary duplication and maintenance overhead.

We are pleased to announce that Ververica Platform 2.8 offers both “VVP Kubernetes” and “Flink Kubernetes” as HA options in the Flink JobManager Failover Configuration, so that users can migrate smoothly. We plan to phase out “VVP Kubernetes” in later releases, so we encourage users to adopt the new option. Both options support restoring from the latest state.

The Flink Kubernetes HA option is not available in the Community Edition.

Other improvements

Migrate Docker base images

We have been using openjdk as the base for the Ververica Platform images we ship publicly, but the openjdk Docker repository is deprecated and no longer maintained. In this release, we switched the base images from openjdk to eclipse-temurin (based on Ubuntu 20.04, “Focal”):

eclipse-temurin:8-jdk-focal for Java 8 images
eclipse-temurin:11-jdk-focal for Java 11 images

Make Google Cloud Storage plugin work by default

Ververica Platform’s universal blob storage stores artifacts such as job JARs and automatically configures checkpoint and savepoint directories. You specify the base URI, including the provider scheme (for example, s3:// or gs://), in a YAML file.
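To illustrate why a single base URI is enough, the following Python sketch derives per-deployment checkpoint and savepoint directories from it. Everything here is hypothetical: the function name derive_storage_paths, the set of accepted schemes, and the directory layout are assumptions for illustration only and do not reflect Ververica Platform's actual storage layout.

```python
from typing import Dict
from urllib.parse import urlparse

# Assumed set of provider schemes for this sketch only.
SUPPORTED_SCHEMES = {"s3", "gs", "wasbs", "hdfs"}


def derive_storage_paths(base_uri: str, namespace: str, deployment_id: str) -> Dict[str, str]:
    """Derive directories from one base URI (hypothetical layout, for illustration)."""
    scheme = urlparse(base_uri).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported blob storage scheme: {scheme!r}")
    root = base_uri.rstrip("/")  # tolerate a trailing slash in the configured URI
    deployment_root = f"{root}/namespaces/{namespace}/deployments/{deployment_id}"
    return {
        "artifacts": f"{root}/namespaces/{namespace}/artifacts",
        "checkpoints": f"{deployment_root}/checkpoints",
        "savepoints": f"{deployment_root}/savepoints",
    }


paths = derive_storage_paths("gs://my-bucket/vvp/", "default", "1234")
for kind, uri in paths.items():
    print(f"{kind}: {uri}")
```

Because every directory is computed from the same root, users configure storage once and each deployment gets its own isolated checkpoint and savepoint locations.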

Google Cloud Storage (GCS) was already supported, but the plugin did not work out of the box: users had to download the GCS plugin and rebuild the Flink image. In this release, the plugin is enabled by default, so users no longer need to create a custom Flink image, which makes for a smoother experience.

More Info & Get Ververica Platform 2.8

If you'd like more detail about Apache Flink 1.15, you can find the full list of Apache Flink 1.15 enhancements on the Apache Flink blog. In addition to supporting Apache Flink 1.15, we also resolved a number of Ververica Platform issues and vulnerabilities, which are listed in the Ververica Platform 2.8 release notes.

If you would like to try Ververica Platform 2.8, the getting started guide and the corresponding playground repository are the best places to start. We look forward to your feedback and suggestions. :)