How to choose a state backend for a Flink job

Answer

Note: This applies to Flink 1.8 and later.

Flink 1.12 or earlier

Out of the box, Flink bundles three state backends:

MemoryStateBackend (Default)
FsStateBackend
RocksDBStateBackend

The following table shows an overview of these state backends focusing on where in-flight data lives and where the state is persisted during snapshotting (checkpoints and savepoints). It also summarizes the use cases they are good for, which should help you choose the proper state backend for your job.

Name	In-Flight data	Snapshots
MemoryStateBackend	JVM Heap	JobManager JVM Heap
MemoryStateBackend	Good for testing and experimentation with small states Not for production use!
FsStateBackend	JVM Heap	Distributed File System
FsStateBackend	Fast, requires a large heap Subject to GC
RocksDBStateBackend	Local disk (tmp dir) with Off-Heap/Native Buffers	Distributed File System
RocksDBStateBackend	Supports state larger than available memory Supports incremental snapshotting Rule of thumb: 10x slower than heap-based backends

To set the state backend in flink-conf.yaml, use the key state.backend and set its value to jobmanager, filesystem, or rocksdb.

Flink 1.13 or later

To separate the in-flight state storage and the checkpoint storage explicitly, Flink 1.13 and later bundle two state backends:

HashMapStateBackend (Default)
EmbeddedRocksDBStateBackend

which stores the in-flight state in the JVM heap or RocksDB, respectively. You can use these state backends with different checkpoint storage independently, e.g., JobManagerCheckpointStorage or FileSystemCheckpointStorage. To set in flink-conf.yaml, use

state.backend: hashmap (or rocksdb)
state.checkpoint-storage: filesystem (or jobmanager)

# if specified, implies 'filesystem' checkpoint-storage
state.checkpoints.dir: file:///checkpoint-dir/

Starting Flink 1.13, one can switch from one state backend to another one by first making a job savepoint and then restoring it with a different state backend. Therefore, a user can try first to use the "hashmap" backend in production and then maybe later switch to "rocksdb" to benefit from its incremental checkpointing, which is not yet supported in the "hashmap" backend as of May 2023.

Related Information

Flink State Backends