How to Make Sure My Deployment Always Restores From the Latest State

Question

I am running Flink jobs in Ververica Platform. What should I configure so that whenever my deployments restart, they always restore from the latest checkpoint or savepoint, whichever is more recent?

Answer

Note: This section applies to Ververica Platform 2.0-2.8.

Restarting Flink jobs automatically from the latest state, known as the LATEST_STATE restore strategy, is one of the features that Ververica Platform provides on top of Flink. The latest state can be a checkpoint or a savepoint, whichever is more recent at the time of restoring. To use the LATEST_STATE restore strategy, you need to configure the following:

1) Enable checkpointing in your Flink job. For example, to take a checkpoint every 60 seconds:

execution.checkpointing.interval: 60s

You can also configure this via the "Advanced" editor in the Ververica Platform Web UI.

2) Retain checkpoints when your job fails or is canceled.

execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION

You can also configure this via the "Advanced" editor in the Ververica Platform Web UI.

Note: If this is not configured, checkpoints will not be retained. As a result, the LATEST_STATE restore strategy will behave in the same way as the LATEST_SAVEPOINT restore strategy.

3) Configure Kubernetes HA so that the latest checkpoint is remembered and used when the job is restored:

high-availability: vvp-kubernetes

You can configure this via the "Advanced" editor in the Ververica Platform Web UI.

Note: If this is not configured, when your Flink job fails (i.e., it has exhausted the configured retry attempts), Ververica Platform will restart the job from scratch. This means that the job will be restarted either from an empty state or from the savepoint that it was initially started with.

4) Configure the LATEST_STATE restore strategy. While the settings in (1)-(3) are all Flink configuration options, the LATEST_STATE restore strategy is configured at the deployment level:

spec:
  ...
  restoreStrategy:
    kind: LATEST_STATE

You can configure this via the "Advanced" editor in the Ververica Platform Web UI.
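
Putting steps (1)-(4) together, a Deployment specification might look roughly like the sketch below. The deployment name is a hypothetical placeholder, and the placement of the Flink options under spec.template.spec.flinkConfiguration is an assumption based on the usual layout of a Ververica Platform Deployment resource, so verify it against your platform version. Other required fields, such as the artifact, are omitted here.

```yaml
kind: Deployment
apiVersion: v1
metadata:
  name: my-deployment            # hypothetical name, replace with your own
spec:
  # Step 4: deployment-level restore strategy
  restoreStrategy:
    kind: LATEST_STATE
  template:
    spec:
      # Steps 1-3: Flink configuration (assumed location in the spec)
      flinkConfiguration:
        # Step 1: enable periodic checkpointing
        execution.checkpointing.interval: 60s
        # Step 2: keep checkpoints when the job fails or is canceled
        execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
        # Step 3: Kubernetes HA so the latest checkpoint is remembered
        high-availability: vvp-kubernetes
```

If any one of the three Flink options is missing, the notes in steps (2) and (3) describe how the restore behavior degrades, so it is worth checking all four settings together.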

Related Information