Apache Flink® Master Branch Monthly: New in Flink in February 2018

Apache Flink Flink Features

15 March 2018 by Till Rohrmann

Thanks to Stefan Richter, Kostas Kloudas, and Chesnay Schepler for their contributions to this post.

Last month, we introduced our first ever Flink Master Monthly blog post so we could highlight features that were merged into Flink’s master branch during the previous month but aren’t yet part of a stable release.

Flink’s major version releases occur every few months, and there’s a constant stream of activity as new features are merged to the Flink master branch in between releases. Keeping an eye on what’s going into Flink’s master is one of the best ways to stay up-to-date on new work that hasn’t yet made it into an official release.

If you’re interested in trying out these features out once they’re in master, you can certainly do so. Just keep in mind that they haven’t yet been fully tested until they go through the official release process.
If you’d like to see a full list of newly-merged features from a given time period, Git is your friend. You can run the following:

git shortlog -e --since="01 Feb 2018" --before="01 Mar 2018"

This month’s raw list is more than 3000 words long, and here's our summary.

FLIP-6 Work: FLIP-6, a major rework of Flink’s deployment and process model in order to improve integration with YARN, Mesos, and container managers (e.g. Docker & Kubernetes), is nearing completion. This FLIP will clear the way for oft-requested features such as dynamic scaling, among other things.

More than 50 issues under the flip6 label were merged in February 2018 as the community moves closer completion of the FLIP. You can take a look here if you’d like to go deeper into the details. In particular, we’ll highlight FLINK-8614, which activates “FLIP-6 mode” by default.

Expose broadcast state on the DataStream API: Many use cases consist of a stream of rules or patterns that a user wants to apply to another stream of incoming data. And these rules should be able to be updated over time. The broadcast state feature (FLINK-3659) allows a user to:

Connect a broadcast stream (the rules) with another keyed or non-keyed stream (the incoming data)
Store the rules into this new type of state
Apply the rules to incoming data

The fact that the rules are broadcasted means that each rule will be sent to all parallel tasks downstream. This is a commonly-requested feature and which unblocks, among other things, the implementation of the "dynamic patterns" feature in FlinkCEP.

Table API / Streaming SQL: In February, the community added an initial SQL CLI for Flink’s streaming SQL (FLINK-8607), making it possible to submit pure SQL queries to Flink and to visualize real-time results without the need for Java or Maven skills.

An experimental version of the feature was first demoed at Flink Forward Berlin 2017 in case you’d like to see it in action. The issue falls under the umbrella of Flink Improvement Proposal 24 for adding a SQL Client to Flink’s streaming SQL. The SQL Client is in an early development stage and the community open for more feedback, so feel free to join the discussion!

Task recovery from local state for faster failure recovery: Last month, we highlighted a few of the issues that were merged in an effort to introduce task recovery from local state. As a refresher, this means that Flink will store a second copy of the most recent checkpoint on the local disk of a task manager. In case of failover, the scheduler will try to reschedule tasks to their previous task manager (in other words, on the same machine) if possible.

The community's continued progress on this work is captured by a handful of critical issues that were closed in February:

Implement state storage for local recovery and integrate with task lifecycle (FLINK-8360)
Implement file-based local recovery for FsStateBackend (FLINK-8360)
Implement file-based local recovery for RocksDBStateBackend (FLINK-8360)
Attempt to reschedule failed tasks to a previous allocation (FLINK-8781)

Rescale running jobs without manually taking a savepoint: A new feature was added that allows a user to rescale running jobs without manually taking a savepoint then resuming the job with a changed parallelism (FLINK-8629). This process can now be handled by the JobMaster instead.

Creating a keyed stream without a keyBy: In a case where, for example, a stream is already pre-partitioned in the correct way, this issue (FLINK-8571) would eliminate the need for a shuffle which, in turn, can make jobs with keyed state embarrassingly parallel.

Documentation and Quickstarts: Two improvements were made to the documentation and the general getting-started-with-Flink user experience:

The addition of an example for a minimal, roll-your-own event pattern detection scenario (FLINK-8548) that uses a simple state machine that is evaluated over the stream
The ability to make Flink quickstarts work for IntelliJ, Eclipse, and Maven Jar packaging out of the box (FLINK-8764)

Metrics: Flink’s watermark metrics were improved, exposing `currentLowWatermark` for all operators including data sources (FLINK-4812). The latency metric was also reworked (FLINK-7608), providing more flexibility in how it can be used by common metrics systems.

Ecosystem:
The community added new features to connect Flink to other parts of the ecosystem:

- Reading and writing JSON messages from and to connectors has been improved. It’s now possible to parse a standard JSON schema in order to configure serializers and deserializers (FLINK-8630). The SQL Client is able to read JSON records from Kafka (FLINK-8538).
- Flink’s file input format supports to read from multiple paths now (FLINK-3655).
- A user of Flink’s bucketing sink can also specify custom extensions for multiple parts now (FLINK-8814).
- The Cassandra output format can handle instances of the Row type (FLINK-8397).
- The Kinesis consumer allows for more customization (FLINK-8516, FLINK-8648).