3 important performance factors for stateful functions and operators in Flink

February 15, 2019 | by Tzu-Li (Gordon) Tai

This post focuses on the 3 factors developers should keep in mind when assessing the performance of a function or operator that uses Flink’s Keyed State in a stateful streaming application.

Keyed state is one of the two basic types of state in Apache Flink, the other being Operator state. As its name suggests, keyed state is bound to keys and is only available to functions and operators that process data from a KeyedStream. The difference between operator and keyed state is that operator state is scoped per parallel instance of an operator (sub-task), while keyed state is partitioned or sharded based on exactly one state-partition per key.

For more information about State in the Apache Flink, the documentation section “Working with State” describes how to use Flink’s state abstractions when developing an application. 

Friday-Flink-Tip-3-important-performance-factors-for-stateful-functions-and-operators-in-Flink

 

3 factors that impact the performance of
Keyed State in Apache Flink

With this in mind, let’s discuss below 3 factors that will impact the performance of Flink’s keyed state that you should keep in mind while developing stateful streaming applications:

The chosen state backend

The selected state backend has the most dominant impact on the performance of your stateful function or operation in a Flink application. The most distinctive factor here is how each state backend handles serialization of your state for persistence differently.

For example, when using the FsStateBackend or MemoryStateBackend, local state is maintained as on-heap objects during runtime and therefore has low overhead when accessing or updating them. Serialization overheads only occur when snapshots of the state are taken to create Flink checkpoints or savepoints. Downside to using this state backend is that state size is limited by heap size of the JVM, and can potentially run into OutOfMemory errors or long pauses for garbage collection.

On the contrary, out-of-core state backends such as the RocksDBStateBackend allows much larger state sizes by maintaining local state on disk. The tradeoff is that every state read and write would require serialization / deserialization.

To wrap this up, if your state size is small and expected to not exceed heap size, then using on-heap backends would be the obvious choice as it avoids serialization overheads. Otherwise, currently, RocksDBStateBackend would be the go-to choice for applications with large state sizes.

Per-key state primitives

ValueState / ListState / MapState

Another important factor is knowing to choose the correct state primitives. Flink currently supports 3 main state primitives for keyed state: ValueState, ListState, and MapState.

One common mistake new developers to Flink might make is having as state, for example, a  ValueState<Map<String, Integer>> while the map entries are intended to only be randomly accessed. In this case, it is definitely better to use MapState<String, Integer>, especially when taking into account that out-of-core state backends, such as RocksDBStateBackend, serializes/deserializes ValueState states completely on access, while for MapState, serialization occurs per-entry.

Access Pattern

Following up the previous section about state primitives, it was already quite thoroughly hinted that assessing how your application logic accesses state will help determine what state structure you should be using. As developers should always expect when designing any kind of application, using unsuitable data structures for your application’s specific data access pattern can have a severe impact on overall performance.

Conclusion

Developers should consider all three factors above as they can affect the performance of stateful functions and operators in Flink to a great extent. We encourage you to check our Advanced Training Schedule below for more best practices and application design patterns for stateful streaming applications.

Public Training Flink, Flink training, Ververica training, Apache Flink training

Ververica Contact

 

 

 

 

Topics: Apache Flink

Tzu-Li (Gordon) Tai
Article by:

Tzu-Li (Gordon) Tai

Find me on:

Related articles

Comments

Sign up for Monthly Blog Notifications

Please send me updates about products and services of Ververica via my e-mail address. Ververica will process my personal data in accordance with the Ververica Privacy Policy.

Our Latest Blogs

by Stephan Ewen January 20, 2022

A Farewell Message

Today I need to share some bittersweet news: I have decided to leave Ververica and reduce my engagement in Apache Flink, to start a new endeavor. This was one of the toughest decisions of my life,...

Read More

Ververica Apache Flink Hackathon - Nov 2021 Edition

Hackathons at Ververica

At Ververica we have a long-standing tradition of Hackathons. In fact, some widespread and very relevant features and efforts in Apache Flink and Ververica Platform...

Read More
by Konstantin Knauf December 17, 2021

Security Advisory - Log4Shell

What is Log4Shell?

A Remote Code Execution (RCE) vulnerability was discovered in the popular Java logging library, Log4j. It is tracked via CVE-2021-44228 and is known as Log4Shell. This is a...

Read More