Note: This applies to Flink 1.8 and later.
The JobManager and TaskManagers fail with org.rocksdb.RocksDBException: ... Not supported, and the logs contain an entry like the following:
2019-10-15 12:33:53,699 INFO org.apache.flink.streaming.api.operators.AbstractStreamOperator [] - Could not complete snapshot 1 for operator testOperator (1/2).
org.rocksdb.RocksDBException: while link file to /mnt/checkpoints/job/<path>/chk-1.tmp/000022.sst: /mnt/checkpoints/job/<path>/db/000022.sst: Not supported
at org.rocksdb.Checkpoint.createCheckpoint(Native Method)
at org.rocksdb.Checkpoint.createCheckpoint(Checkpoint.java:51)
at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.takeDBNativeCheckpoint(RocksIncrementalSnapshotStrategy.java:243)
at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:154)
at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:128)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:484)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:407)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
at org.apache.flink.streaming.runtime.io.BarrierTracker.notifyCheckpoint(BarrierTracker.java:270)
at org.apache.flink.streaming.runtime.io.BarrierTracker.processBarrier(BarrierTracker.java:186)
at org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:105)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:273)
at org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:117)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
at java.lang.Thread.run(Thread.java:748)
An Azure File Share is mounted at /mnt/checkpoints, and Flink is configured as follows:
state.backend: rocksdb
state.backend.incremental: 'true'
state.checkpoints.dir: 'file:///mnt/checkpoints'
state.backend.rocksdb.localdir: /mnt/checkpoints/job
While keeping state.checkpoints.dir on a distributed file system (Azure File Share in this case), move state.backend.rocksdb.localdir to a local file system like /tmp.
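As a minimal sketch of the adjusted configuration, assuming the same Azure File Share mount at /mnt/checkpoints and using /tmp as the local path mentioned above (any locally-attached directory could be substituted), the settings might look like this:

```yaml
state.backend: rocksdb
state.backend.incremental: 'true'
# Checkpoints remain on the distributed file system (the Azure File Share mount).
state.checkpoints.dir: 'file:///mnt/checkpoints'
# RocksDB working directory moved to a local file system; /tmp is only an example path.
state.backend.rocksdb.localdir: /tmp
```

Only state.backend.rocksdb.localdir changes; the checkpoint directory stays where it was.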
When state.checkpoints.dir and state.backend.rocksdb.localdir are configured to use the same file system, RocksDB uses hard links for checkpointing. Azure File Share, however, does not support hard links, so the checkpoint fails.
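To check in advance whether a candidate directory supports hard links, a small probe along the following lines may help. This is an illustrative sketch, not part of Flink or RocksDB: the function name supports_hard_links and the probe file prefix are made up for this example. It only tests whether a hard link can be created inside a single directory.

```python
import os
import sys
import tempfile


def supports_hard_links(directory: str) -> bool:
    """Return True if a hard link can be created inside `directory`."""
    # Create a throwaway file in the directory under test.
    fd, source = tempfile.mkstemp(dir=directory, prefix="hardlink-probe-")
    os.close(fd)
    link = source + ".link"
    try:
        # os.link raises OSError on file systems without hard-link support.
        os.link(source, link)
        return True
    except OSError:
        return False
    finally:
        # Remove the probe files regardless of the outcome.
        for path in (link, source):
            if os.path.exists(path):
                os.remove(path)


if __name__ == "__main__":
    # Usage: python probe.py /mnt/checkpoints/job /tmp
    for directory in sys.argv[1:]:
        print(directory, supports_hard_links(directory))
```

Running it against the directory intended for state.backend.rocksdb.localdir before deploying gives a quick indication of whether the "Not supported" error is likely to occur there.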
Important: RocksDB is a local embedded database used by Flink on each TaskManager. It is not used for state persistence or fault tolerance. As such, it should always be on a local file system, preferably a locally-attached SSD.