Flink Forward 2025 Barcelona: The Future of AI is Real-Time
Which state backend should I choose for my Flink job?
Note: This applies to Flink 1.8 and later.
Out of the box, Flink bundles three state backends:
The following table shows an overview of these state backends focusing on where in-flight data lives and where the state is persisted during snapshotting (checkpoints and savepoints). It also summarizes the use cases they are good for, which should help you choose the proper state backend for your job.
Name | In-Flight data | Snapshots |
MemoryStateBackend |
JVM Heap | JobManager JVM Heap |
|
||
FsStateBackend |
JVM Heap | Distributed File System |
|
||
RocksDBStateBackend |
Local disk (tmp dir) with Off-Heap/Native Buffers |
Distributed File System |
|
To set the state backend in flink-conf.yaml
, use the key state.backend
and set its value to jobmanager
, filesystem
, or rocksdb
.
To separate the in-flight state storage and the checkpoint storage explicitly, Flink 1.13 and later bundle two state backends:
which stores the in-flight state in the JVM heap or RocksDB, respectively. You can use these state backends with different checkpoint storage independently, e.g., JobManagerCheckpointStorage or FileSystemCheckpointStorage. To set in flink-conf.yaml
, use
state.backend: hashmap (or rocksdb)
state.checkpoint-storage: filesystem (or jobmanager)
# if specified, implies 'filesystem' checkpoint-storage
state.checkpoints.dir: file:///checkpoint-dir/
Starting Flink 1.13, one can switch from one state backend to another one by first making a job savepoint and then restoring it with a different state backend. Therefore, a user can try first to use the "hashmap" backend in production and then maybe later switch to "rocksdb" to benefit from its incremental checkpointing, which is not yet supported in the "hashmap" backend as of May 2023.