Skip to content

Ververica Cloud, a fully-managed cloud service for stream processing!

Learn more

Apache Flink® Master Branch Monthly: New in Flink in June 2018


by

Thanks to Dawid Wysakowicz, Piotr Nowojski, and Stefan Richter for their contributions and support in this blog post.

At the beginning of the year, we introduced the Flink Master Monthly blog post series (see the parts for January 2018 here and February 2018 here) to highlight features that were merged into Apache Flink’s master branch during the previous month but aren’t yet part of a stable release.

Apache Flink’s major version releases occur every few months, and there’s a constant stream of activity as new features are merged to the Flink master branch in between releases. Keeping an eye on what’s going into Flink’s master is one of the best ways to stay up-to-date on new work that hasn’t yet made it into an official release.

While the Apache Flink community was focused on releasing Flink 1.5.0 in May 2018, (for more details on what is updated in Flink 1.5.0, read our blog post here) June was also a busy month with some very valuable additions that you can explore below.

If you’d like to see a full list of newly-merged features from a given time period, Git is your friend. You can run the following git command for a full list of all additions for June:

git shortlog -e --since="01 June 2018" --before="01 July 2018"

[FLINK-9418] Migrate SharedBuffer to use MapState

  • Hardens the CEP library and makes it production ready. The migration to MapState makes ShareBuffer smarter with the memory management since only the necessary parts, such as tail entries, are deserialized rather than the entire structure. This change greatly stabilizes the CEP library and allows its use for applications with far more complex patterns that maintain more state.

[FLINK-9366] & [FLINK-8620] DistributedCache works with Distributed File System & BlobStore

  • In May the team worked on the FLINK-8620 issue that modified the distributed cache to disseminate files via the BlobStore, instead of downloading them from a distributed file system. Previously, task managers would download the requested files from the DFS, while now they can retrieve it from the BlobStore. This requires the client to pre-emptively upload all used files with the distributed cache.

  • Following from FLINK-8620, the community working on FLINK-9366 in June to allow both the old and new behavior of DistributedCache. This means that files from the local filesystem can be distributed via the BlobStore, based on a path scheme, while files in DFS will be downloaded and cached directly from DFS.

[FLINK-3952][runtime] Upgrade to Netty 4.1

  • The latest Netty version, which contains a lot of bug fixes, is a major improvement in Flink that includes significant enhancements available in Netty 4.1, and particularly HTTP/2 codecs. Additionally, upgrading to the latest Netty version will allow Flink to take advantage of all available Hadoop patches for Netty 4.1 once tested and fully merged.

[FLINK-9487][state] Prepare InternalTimerHeap for asynchronous snapshots

  • This is the first step in our efforts to make timer services scalable and improve checkpoint speed.

[FLINK-8430] [table] Implement stream-stream non-window full outer join

  • This commit completes the family of the most important non-windowed join operations in Flink’s streaming SQL. In addition to time-windowed joins, the upcoming Flink version will support INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN. A design document about another class of joins for streaming enrichment has just been published. Feel free to

    Thanks to Dawid Wysakowicz, Piotr Nowojski, and Stefan Richter for their contributions and support in this blog post.

    At the beginning of the year, we introduced the Flink Master Monthly blog post series (see the parts for January 2018 here and February 2018 here) to highlight features that were merged into Apache Flink’s master branch during the previous month but aren’t yet part of a stable release.

    Apache Flink’s major version releases occur every few months, and there’s a constant stream of activity as new features are merged to the Flink master branch in between releases. Keeping an eye on what’s going into Flink’s master is one of the best ways to stay up-to-date on new work that hasn’t yet made it into an official release.

    While the Apache Flink community was focused on releasing Flink 1.5.0 in May 2018, (for more details on what is updated in Flink 1.5.0, read our blog post here) June was also a busy month with some very valuable additions that you can explore below.

    If you’d like to see a full list of newly-merged features from a given time period, Git is your friend. You can run the following git command for a full list of all additions for June:

    git shortlog -e --since="01 June 2018" --before="01 July 2018"

    [FLINK-9418] Migrate SharedBuffer to use MapState

    • Hardens the CEP library and makes it production ready. The migration to MapState makes ShareBuffer smarter with the memory management since only the necessary parts, such as tail entries, are deserialized rather than the entire structure. This change greatly stabilizes the CEP library and allows its use for applications with far more complex patterns that maintain more state.

    [FLINK-9366] & [FLINK-8620] DistributedCache works with Distributed File System & BlobStore

    • In May the team worked on the FLINK-8620 issue that modified the distributed cache to disseminate files via the BlobStore, instead of downloading them from a distributed file system. Previously, task managers would download the requested files from the DFS, while now they can retrieve it from the BlobStore. This requires the client to pre-emptively upload all used files with the distributed cache.

    • Following from FLINK-8620, the community working on FLINK-9366 in June to allow both the old and new behavior of DistributedCache. This means that files from the local filesystem can be distributed via the BlobStore, based on a path scheme, while files in DFS will be downloaded and cached directly from DFS.

    [FLINK-3952][runtime] Upgrade to Netty 4.1

    • The latest Netty version, which contains a lot of bug fixes, is a major improvement in Flink that includes significant enhancements available in Netty 4.1, and particularly HTTP/2 codecs. Additionally, upgrading to the latest Netty version will allow Flink to take advantage of all available Hadoop patches for Netty 4.1 once tested and fully merged.

    [FLINK-9487][state] Prepare InternalTimerHeap for asynchronous snapshots

    • This is the first step in our efforts to make timer services scalable and improve checkpoint speed.

    [FLINK-8430] [table] Implement stream-stream non-window full outer join

    • This commit completes the family of the most important non-windowed join operations in Flink’s streaming SQL. In addition to time-windowed joins, the upcoming Flink version will support INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN. A design document about another class of joins for streaming enrichment has just been published. Feel free to join the discussion there!

    [FLINK-8861] [sql-client] Add support for batch queries in SQL Client

    • Even though Flink SQL has unified batch and streaming semantics, the SQL Client has not been able to execute batch queries so far. With this contribution, the SQL Client becomes a more useful tool for experimenting with Flink SQL and quick prototyping.

    [FLINK-7789] Add handler for Async IO operator timeouts

    • Extends the AsyncIO operator such that the user can specify how to handle timeouts of the asynchronous operations. This is a major improvement that gives greater flexibility to Flink users.

    To stay up-to-date with the latest additions and news about Apache Flink and data Artisans you can subscribe to the Apache Flink mailing list. Follow @VervericaData to be notified when our next blog post is ready.

    join the discussion there!

[FLINK-8861] [sql-client] Add support for batch queries in SQL Client

  • Even though Flink SQL has unified batch and streaming semantics, the SQL Client has not been able to execute batch queries so far. With this contribution, the SQL Client becomes a more useful tool for experimenting with Flink SQL and quick prototyping.

[FLINK-7789] Add handler for Async IO operator timeouts

  • Extends the AsyncIO operator such that the user can specify how to handle timeouts of the asynchronous operations. This is a major improvement that gives greater flexibility to Flink users.

To stay up-to-date with the latest additions and news about Apache Flink and data Artisans you can subscribe to the Apache Flink mailing list. Follow @VervericaData to be notified when our next blog post is ready.

New call-to-action

Topics:
Till Rohrmann
Article by:

Till Rohrmann

Find me on:

Comments

Our Latest Blogs

Announcing the Release of Apache Flink 1.19 featured image
by Lincoln Lee 18 March 2024

Announcing the Release of Apache Flink 1.19

The Apache Flink PMC is pleased to announce the release of Apache Flink 1.19.0. As usual, we are looking at a packed release with a wide variety of improvements and new features. Overall, 162 people...
Read More
Building real-time data views with Streamhouse featured image
by Alexey Novakov 10 January 2024

Building real-time data views with Streamhouse

Use Case Imagine an e-commerce company operating globally, we want to see, in near real-time, the amount of revenue generated per country while the order management system is processing ongoing...
Read More
Streamhouse: Data Processing Patterns featured image
by Giannis Polyzos 05 January 2024

Streamhouse: Data Processing Patterns

Introduction In October, at Flink Forward 2023, Streamhouse was officially introduced by Jing Ge, Head of Engineering at Ververica. In his keynote, Jing highlighted the need for Streamhouse,...
Read More