
Apache Flink® Community Announces 1.2.0 Release


by Michael Winters

On Monday, February 6, the Apache Flink® community announced the project's 1.2.0 release. We at data Artisans would like to extend a sincere thanks to the 122 members of the Flink community who contributed to 1.2.0. Contributors to the release include engineers from Alibaba, Amazon, Cloudera, King, and many other enterprises. At data Artisans, we spend most of our waking hours thinking about and working on Flink, and so there's a lot that we're excited about in the 1.2.0 release. In this post, members of the data Artisans engineering team will share their thoughts on just a subset of the release's new features. For a complete overview, be sure to check out the changelog on the project site. And in the coming weeks, we'll be writing about 1.2.0 features in more detail here on the data Artisans blog.

Rescalable State

One of Flink's core features is state management, and in order to support dynamic scaling (changing the number of machines allocated to a job while it's running), it's necessary to be able to redistribute state to a smaller or larger number of machines. For Flink 1.2, the internal representation of state was redesigned with a focus on rescaling. It's now possible to change the scale of jobs without breaking Flink's consistency guarantees. This makes jobs elastic so that applications can easily adapt to evolving performance requirements and allows for better utilization of hardware infrastructure, improving user experience during peak workloads and reducing costs during off-peak workloads. -Stefan Richter (@StefanRRichter)
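The core idea behind rescalable keyed state is that keys are hashed into a fixed number of key groups, and whole key groups (not individual keys) are assigned to operator instances. The sketch below illustrates this in Python; the names are illustrative rather than Flink's actual API, and `MAX_PARALLELISM` stands in for the job's configured maximum parallelism.

```python
# Sketch of key-group-based state redistribution, the idea behind
# Flink's rescalable keyed state. Names are illustrative, not Flink's API.

MAX_PARALLELISM = 128  # number of key groups, fixed for the job's lifetime

def key_group(key) -> int:
    # Every key maps to a stable key group, independent of parallelism.
    return hash(key) % MAX_PARALLELISM

def operator_index(group: int, parallelism: int) -> int:
    # Key groups are assigned to operator instances in contiguous ranges,
    # so rescaling moves whole groups and never splits one key's state.
    return group * parallelism // MAX_PARALLELISM

# Rescaling from 2 to 3 instances reassigns key groups, not individual keys:
before = {g: operator_index(g, 2) for g in range(MAX_PARALLELISM)}
after = {g: operator_index(g, 3) for g in range(MAX_PARALLELISM)}
```

Because a key's group never changes, a job restored at a different parallelism still routes every key to exactly one instance, which is what preserves Flink's consistency guarantees across rescaling.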

Low-level Stream Operations

An interesting new feature in Flink 1.2.0 is a DataStream operation called process() that comes with a new type of user function: ProcessFunction. It gives user code access to the same machinery that the WindowOperator uses internally, making it possible to write custom operations that keep state and set timers. We expect ProcessFunction to be used for cases that require more flexibility than the current windowing system offers. -Aljoscha Krettek (@aljoscha) Relevant Documentation:
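To make the "state plus timers" combination concrete, here is a single-threaded Python model of the pattern a ProcessFunction enables: count events per key and fire a result once a key has been idle past a timeout. This mirrors the concept only; the class and method names are invented, and Flink's real API is Java/Scala.

```python
# Toy model of the ProcessFunction pattern: per-element processing with
# keyed state and event-time timers. Illustrative only, not Flink's API.

class CountWithTimeout:
    """Count events per key; emit (key, count) once a key is idle past a timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.counts = {}   # keyed state
        self.timers = {}   # key -> event-time timestamp at which to fire
        self.fired = []

    def process_element(self, key, timestamp):
        self.counts[key] = self.counts.get(key, 0) + 1
        # (Re)register a timer: fire if no new event arrives within timeout.
        self.timers[key] = timestamp + self.timeout

    def on_watermark(self, watermark):
        # Fire every registered timer whose timestamp has passed.
        for key, ts in list(self.timers.items()):
            if ts <= watermark:
                self.fired.append((key, self.counts[key]))
                del self.timers[key]

fn = CountWithTimeout(timeout=10)
fn.process_element("a", 0)
fn.process_element("a", 5)   # resets the idle timer to fire at 15
fn.on_watermark(20)          # watermark passes 15, so ("a", 2) fires
```

Each new element pushes the key's timer forward, which is exactly the kind of custom stateful logic that is awkward to express with the built-in windows alone.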

Asynchronous I/O

Flink’s new asynchronous I/O operator makes it possible to run multiple blocking user operations, such as database queries or external system requests, concurrently. This can increase Flink's throughput significantly by overlapping waiting times while offering the same exactly-once processing guarantees as the rest of the system. Kudos to the Alibaba team for their contributions to this feature. - Till Rohrmann (@stsffap) Relevant Documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html
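The gain from overlapping waits is easy to demonstrate outside of Flink. In this Python sketch, a stand-in `lookup` function simulates a slow external call; issuing several lookups concurrently with a bounded capacity, while keeping results in input order, is analogous to Flink's ordered async I/O mode. All names here are illustrative.

```python
# Concept sketch of the async I/O operator: issue several blocking lookups
# concurrently instead of one at a time, so waiting times overlap.
import time
from concurrent.futures import ThreadPoolExecutor

def lookup(key):
    time.sleep(0.1)          # simulated external-system latency
    return key.upper()

def sync_map(keys):
    return [lookup(k) for k in keys]          # waits are serialized

def async_map(keys, capacity=4):
    # Up to 'capacity' requests in flight; pool.map keeps results in
    # input order, analogous to Flink's ordered async mode.
    with ThreadPoolExecutor(max_workers=capacity) as pool:
        return list(pool.map(lookup, keys))
```

With four keys and a capacity of four, the serial version waits roughly four times the per-request latency, while the concurrent version waits roughly once, yet both produce identical, ordered output.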

Table API and SQL

With Flink 1.2.0, the community significantly expanded the coverage of Flink’s Table API and SQL support for batch and streaming tables. Besides adding many built-in functions and support for more data types and relational operators, both APIs now support user-defined scalar and user-defined table functions. In addition, the Table API features tumbling, sliding, and session group window aggregations over streaming tables. - Fabian Hueske (@fhueske) Relevant Documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/table_api.html
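A tumbling group-window aggregation of the kind the Table API now supports can be sketched in a few lines: assign each row to a fixed-size, non-overlapping window based on its timestamp, then aggregate per key and window. The function below is a hypothetical illustration of that semantics, not Flink code.

```python
# Sketch of a tumbling group-window aggregation: each row belongs to
# exactly one fixed-size window, and counts are kept per (key, window).
from collections import defaultdict

def tumbling_window_count(rows, size):
    """rows: iterable of (key, timestamp). Returns {(key, window_start): count}."""
    counts = defaultdict(int)
    for key, ts in rows:
        window_start = (ts // size) * size   # tumbling windows never overlap
        counts[(key, window_start)] += 1
    return dict(counts)
```

Sliding windows differ only in that a row may fall into several overlapping windows, and session windows close after a gap of inactivity rather than at fixed boundaries.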

Externalized Checkpoints

Flink now gives you the option to persist periodic checkpoints externally. You can think of these as “lightweight” savepoints that ensure you can resume your program from a checkpoint after the program has terminated. Previously, this was only possible with manual savepoints. The difference from savepoints is that externalized checkpoints are bound to a specific program: you will only be able to resume with the *same program*, whereas savepoints allow rescaling and migration between Flink versions as well. - Ufuk Celebi (@iamuce) Relevant Documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/checkpoints.html#externalized-checkpoints

Queryable State

This experimental feature allows you to directly query program state from Flink. Instead of populating a key/value database from data streams, you can build your application directly on top of the stream processor. This eliminates the need for (distributed) transactions with an external database, leading to very high performance. You do point lookups via the `QueryableStateClient` and expose your state for queries by simply marking it as queryable (`StateDescriptor#setQueryable(String)`) or using a queryable sink (`DataStream#asQueryableState(String)`). You can expect the APIs for this to change with upcoming releases, but feel encouraged to try this out now. The Flink community is very happy to get early feedback on this. - Ufuk Celebi (@iamuce) Relevant Documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/queryable_state.html
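The shape of the feature is: the stream processor itself keeps keyed state, some of that state is registered under a name as queryable, and a client does point lookups by name and key. The toy registry below models that flow; the class and method names are invented for illustration and are not Flink's API.

```python
# Toy model of queryable state: the processor keeps keyed state and a
# client does point lookups against it, instead of reading from an
# external key/value store. Names are invented for illustration.

class StateRegistry:
    def __init__(self):
        self._stores = {}

    def register_queryable(self, name):
        # Analogous to marking state queryable via StateDescriptor#setQueryable.
        self._stores[name] = {}
        return self._stores[name]

    def query(self, name, key):
        # Analogous to a point lookup through QueryableStateClient.
        if name not in self._stores:
            raise KeyError(f"no queryable state registered as {name!r}")
        return self._stores[name].get(key)

registry = StateRegistry()
counts = registry.register_queryable("word-counts")
for word in ["flink", "state", "flink"]:
    counts[word] = counts.get(word, 0) + 1
```

The point of the design is that reads hit the same state the job is already maintaining, so there is no second system to keep transactionally in sync.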

Flink Documentation

Many contributors have worked to improve the Flink documentation for this release. The navigation is clearer, and the docs are better organized and more complete. Many sections have been substantially rewritten, not only to bring them up-to-date, but also to better communicate how to take advantage of what Flink offers. - David Anderson (@alpinegizmo) Flink Documentation Home: https://ci.apache.org/projects/flink/flink-docs-release-1.2/

Mesos Integration

Flink 1.2 comes with support for Apache Mesos as a first-class citizen. This feature further increases Flink's flexibility when it comes to different deployment scenarios. So no matter whether you have a YARN, Mesos, or a bare-metal cluster at hand, Flink can run everywhere. - Till Rohrmann (@stsffap) Relevant Documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/mesos.html

Backwards Compatibility for Savepoints

Even though the format of checkpoints and savepoints changed substantially since Flink 1.1 (due to the introduction of the “rescaling” feature), the new version 1.2 can resume savepoints from Flink 1.1. The core mechanisms are in place that allow the framework to convert the format of savepoints upon loading them. With backwards compatibility for savepoints, continuous applications built on Flink can go through framework upgrades without loss of state and consistency and without the need for reprocessing. - Stephan Ewen (@StephanEwen) Relevant Documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/upgrading.html
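"Converting the format upon loading" is a standard version-migration pattern, sketched below: tag each snapshot with a format version and chain migration steps until the current version is reached. The versions, field names, and the v1-to-v2 conversion here are illustrative stand-ins, not Flink's actual savepoint layout.

```python
# Sketch of version-aware savepoint loading: detect the serialized format's
# version and convert older layouts to the current one on restore.
# Versions and field names are illustrative, not Flink's actual format.

CURRENT_VERSION = 2

def migrate_v1_to_v2(snapshot):
    # Hypothetical example: v1 stored one monolithic state blob; v2 splits
    # state into key groups to support rescaling.
    return {"version": 2, "key_groups": {0: snapshot["state"]}}

MIGRATIONS = {1: migrate_v1_to_v2}

def load_savepoint(snapshot):
    # Apply migrations step by step until the snapshot is current.
    while snapshot["version"] < CURRENT_VERSION:
        snapshot = MIGRATIONS[snapshot["version"]](snapshot)
    return snapshot
```

Chaining one-step migrations keeps each conversion simple and lets future versions reuse the existing steps, which is why this pattern scales across many releases.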

Article by:

Michael Winters

