New release enables Apache Flink users to address new mixed batch/stream application use cases and simplifies the operation of stream processing systems at scale.
On September 29, the developers at Ververica, along with nearly 200 other Apache Flink contributors, released Apache Flink 1.14. Today, we announce the availability of Ververica Platform 2.6, which facilitates the development and operation of stream processing applications built with Apache Flink 1.14.
Apache Flink 1.14 included hundreds of changes, and we're excited to support it in Ververica Platform 2.6 so quickly after its release. With this update, Ververica Platform's Flink SQL version is bumped from Flink 1.13 to Flink 1.14, bringing all of the recent upstream improvements in Flink SQL to the platform. Here are some of the highlights:
Improved Unified Batch and Stream Processing Experience
Apache Pulsar Connector
Improved Unified Batch and Stream Processing Experience
One of the things that makes Apache Flink unique is how it integrates stream and batch processing, using unified APIs and a runtime that supports multiple execution paradigms. Several new enhancements make this easier to manage:
Checkpointing Unbounded and Bounded Streams
If you mix bounded and unbounded streams in an application, Flink now supports taking checkpoints of applications that are partially running and partially finished (i.e., some operators have already reached the end of their bounded inputs). Additionally, bounded streams now take a final checkpoint when they reach their end, ensuring that all sink data is committed before the job finishes (similar to how stop-with-savepoint behaves).
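In Flink 1.14 this behavior sits behind a feature flag that is disabled by default; a minimal configuration sketch, assuming the setting is applied via flink-conf.yaml:

```yaml
# flink-conf.yaml (Flink 1.14)
# Allow checkpointing to continue after some tasks have finished.
# Disabled by default in 1.14.
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
```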
A Flink SQL Deployment with bounded and unbounded sources with "Checkpoints After Finished Tasks" enabled
Batch Execution for Mixed DataStream and Table/SQL Applications
SQL and the Table API are often the default starting points for new projects, because their declarative nature and rich set of built-in types and operations make it easy to develop applications quickly. However, developers sometimes hit the limits of SQL's expressiveness for certain types of event-driven business logic.
Flink 1.14 enables bounded, batch-executed SQL/Table programs to convert their intermediate Tables to a DataStream, apply some DataStream API operations, and convert the result back to a Table. Under the hood, Flink builds a dataflow DAG that mixes optimized, declarative SQL execution with batch-executed DataStream logic.
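A minimal sketch of what such a mixed program can look like; the table, view, and field names are illustrative, not from the release:

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class MixedBatchJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH); // bounded, batch-executed
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Declarative part: a bounded table (e.g. backed by files).
        Table orders = tableEnv.sqlQuery("SELECT user_id, amount FROM orders");

        // Drop down to the DataStream API for logic that is hard to express in SQL...
        DataStream<Row> stream = tableEnv.toDataStream(orders);
        DataStream<Row> filtered =
                stream.filter(row -> (Double) row.getField("amount") > 0.0);

        // ...and convert the result back to a Table to continue declaratively.
        Table result = tableEnv.fromDataStream(filtered);
        tableEnv.createTemporaryView("positive_orders", result);
    }
}
```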
New “Hybrid Source” - Easier Streaming From Tiered Storage
A new Hybrid Source produces a combined stream from multiple sources, by reading those sources one after the other, seamlessly switching over from one source to the other. For example, you might read streams from tiered storage, with older data stored in S3 and newer data landing in Kafka (before it’s migrated to S3). The Hybrid Source can read this as one contiguous logical stream, starting with the historic data on S3 and transitioning over to the more recent data in Kafka.
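A sketch of wiring this up with the new HybridSource builder, assuming a bounded FileSource over the historic S3 data and a KafkaSource for the live topic have already been configured elsewhere (all names here are illustrative):

```java
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.kafka.source.KafkaSource;

// fileSource: bounded FileSource reading the historic records from S3.
// kafkaSource: unbounded KafkaSource starting where the S3 data ends.
FileSource<String> fileSource = /* ... configured elsewhere ... */ null;
KafkaSource<String> kafkaSource = /* ... configured elsewhere ... */ null;

HybridSource<String> hybridSource =
        HybridSource.builder(fileSource)   // read the historic data first...
                .addSource(kafkaSource)    // ...then switch over to Kafka
                .build();
```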
This is an exciting step toward realizing the full promise of the Kappa Architecture: even if older parts of an event log are physically migrated to different storage (for reasons such as cost, better compression, or faster reads), you can still treat and process it as one contiguous log.
Buffer Debloating Improves Memory Utilization Efficiency
The new automatic network memory tuning mechanism (a.k.a. buffer debloating) speeds up checkpoints under high load. It continuously adjusts the network buffers to provide stable and predictable alignment times for aligned checkpoints under backpressure, and vastly reduces the amount of in-flight data stored in unaligned checkpoints under backpressure.
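Buffer debloating is opt-in in Flink 1.14; a hedged configuration sketch, assuming flink-conf.yaml:

```yaml
# flink-conf.yaml (Flink 1.14)
# Enable automatic tuning of network buffer sizes.
taskmanager.network.memory.buffer-debloat.enabled: true
# Target time to consume the buffered in-flight data.
taskmanager.network.memory.buffer-debloat.target: 1s
```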
Standard Connector Metrics For Easier Monitoring
Until this release, there were no standard or conventional metric definitions for the connectors in Apache Flink; each connector defined its own metrics, which complicates operation and monitoring. Flink 1.14 starts to implement standard connector metrics, beginning with the Kafka and FileSystem connectors.
Apache Pulsar Connector
The new Apache Pulsar connector reads data from Pulsar topics and supports both streaming and batch execution modes. Using the transaction functionality introduced in Pulsar 2.8.0, the connector provides exactly-once delivery semantics, ensuring that a message is delivered exactly once to a consumer even if the producer retries sending it.
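A sketch of building a Pulsar source with the new connector's builder API; the service URLs, topic, and subscription name are illustrative placeholders:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.pulsar.source.PulsarSource;
import org.apache.flink.connector.pulsar.source.enumerator.cursor.StartCursor;
import org.apache.flink.connector.pulsar.source.reader.deserializer.PulsarDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

PulsarSource<String> source = PulsarSource.<String>builder()
        .setServiceUrl("pulsar://localhost:6650")   // illustrative URLs
        .setAdminUrl("http://localhost:8080")
        .setTopics("user-events")                   // illustrative topic
        .setSubscriptionName("flink-subscription")
        .setStartCursor(StartCursor.earliest())
        .setDeserializationSchema(
                PulsarDeserializationSchema.flinkSchema(new SimpleStringSchema()))
        .build();

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.fromSource(source, WatermarkStrategy.noWatermarks(), "pulsar-source");
```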
With the upgrade of Flink SQL to 1.14, new functions like CURRENT_WATERMARK become available in Ververica Platform. Here it is used to filter out late mobile usage records in the interactive SQL editor (left) and in the final Deployment (right).
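A sketch of using CURRENT_WATERMARK to filter out late records; the table and column names are illustrative. Note that CURRENT_WATERMARK returns NULL before the first watermark has been emitted, which the predicate has to allow for:

```sql
-- usage_records has an event-time attribute `ts` with a watermark defined.
SELECT *
FROM usage_records
WHERE ts > CURRENT_WATERMARK(ts)        -- keep records that are not late
   OR CURRENT_WATERMARK(ts) IS NULL;    -- no watermark emitted yet
```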
Hello Apache Flink 1.14, Goodbye Apache Flink 1.11
With the addition of Apache Flink 1.14 support in Ververica Platform 2.6, we are sunsetting our support for Apache Flink 1.11. Per our support policy, Ververica Platform 2.6 is supported for use with Apache Flink 1.12 and higher.
More Info & Get Ververica Platform 2.6
If you'd like more detail about Apache Flink 1.14, you can see the full list of Apache Flink 1.14 enhancements on the Apache Flink blog. In addition to building support for Apache Flink 1.14, we also resolved a number of Ververica Platform issues, which you can read about in the Ververica Platform 2.6 Release Notes.
If you would like to try Ververica Platform 2.6, the getting started guide and the corresponding playground repository are the best places to start. We also offer training and support for Apache Flink 1.14. We look forward to your feedback and suggestions.