I want to configure Kubernetes High-Availability (HA) for my Flink cluster. Could you tell me how Ververica Platform's Kubernetes HA and Apache Flink's Kubernetes HA work? What are the differences between these two solutions?
Note: This article applies to Flink 1.8 and later, running in Ververica Platform 2.0+.
Since Ververica Platform 2.8, we have integrated Apache Flink's Kubernetes HA and adapted it to support the LATEST_STATE restore strategy in Ververica Platform. We recommend that all users of Ververica Platform 2.8 or later use Flink Kubernetes HA for their deployments.
TL;DR: see the summary table at the end of this article.
For a Flink cluster without HA, if the JobManager (JM) crashes, no new jobs can be submitted and all running jobs fail. With HA enabled, new jobs can be submitted as soon as a new JM is ready, and running jobs continue to make progress after one more step: restarting from the previous checkpoint. You can consider this restart the job's downtime. In Flink, you should always expect restarts and tune your application so that your SLAs still hold despite a sporadic restart.
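For example, here is a minimal sketch of enabling periodic checkpointing so a restarted job has recent state to resume from; the interval and directory below are placeholder values, and execution.checkpointing.interval assumes a reasonably recent Flink version:

execution.checkpointing.interval: 60s  # placeholder interval; tune it so a restart stays within your SLA
state.checkpoints.dir: s3://my-bucket/flink-checkpoints  # hypothetical bucket for completed checkpoints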
Apache Flink's Kubernetes HA can be activated by the following Flink configuration:
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: /path/to/ha/store
kubernetes.cluster-id: cluster-id
Typically, in this approach, there is a single leading JobManager at any time and multiple standby JobManagers to take over leadership in case the leader fails.
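For example, in Flink's native Kubernetes mode the number of JM replicas (one leader plus standbys) can be raised via configuration; a minimal sketch, assuming a Flink version that supports kubernetes.jobmanager.replicas:

kubernetes.jobmanager.replicas: 2  # one leading JM plus one standby ready to take over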
Ververica Platform's Kubernetes HA can be activated as follows.
Ververica Platform 2.8 or later
kind: Deployment # this is a Ververica Platform Deployment, not the Kubernetes one
spec:
  template:
    spec:
      flinkConfiguration:
        high-availability: kubernetes
        high-availability.storageDir: /path/to/ha/store # optional, see more below
        high-availability.kubernetes.job-graph-store.enabled: true # optional, see more below
Ververica Platform 2.7 or earlier
kind: Deployment # this is a Ververica Platform Deployment, not the Kubernetes one
spec:
  template:
    spec:
      flinkConfiguration:
        high-availability: vvp-kubernetes
        high-availability.storageDir: /path/to/ha/store # optional, see more below
        high-availability.vvp-kubernetes.job-graph-store.enabled: true # optional, see more below
Unlike Apache Flink's Kubernetes HA, this approach starts a single JobManager (JM) and manages it with a Kubernetes Job resource with the following specification:
kind: Job
spec:
  completions: 1
  parallelism: 1
Whenever that JM crashes, i.e., exits with a non-zero code, Kubernetes launches another JM as a replacement. With the help of the high availability storage, the new JM knows where to resume. The high availability storage is configured automatically if Ververica Platform's Universal Blob Storage is configured.
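As a sketch, Universal Blob Storage is typically set in the platform's Helm values; the bucket below is a hypothetical example:

vvp:
  blobStorage:
    baseUri: s3://my-bucket/vvp  # hypothetical bucket; the HA storageDir is then derived automatically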
In addition, with the configuration high-availability.kubernetes.job-graph-store.enabled: true (Ververica Platform 2.8 or later) or high-availability.vvp-kubernetes.job-graph-store.enabled: true (Ververica Platform 2.7 or earlier), you can also store the generated job graph in the high availability storage. Upon JM failover, instead of re-executing your job's main() to generate the job graph, the stored job graph is used.
In all Kubernetes HA approaches described above, when the active JM (the leading JM in the case of Apache Flink's Kubernetes HA, the single JM in the case of Ververica Platform's original and new Kubernetes HA) is lost, the TaskManagers (TMs) that are talking to the failed JM fail the tasks running on them. This leads to a job restart from the previous checkpoint. Flink's restart strategy controls how these restarts are handled.
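A minimal sketch of such a restart strategy in flinkConfiguration; the attempt count and delay are placeholders to tune against your SLAs:

restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3  # placeholder: restarts allowed before the job fails terminally
restart-strategy.fixed-delay.delay: 10 s  # placeholder: pause between restart attempts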
In terms of failover and recovery time, Apache Flink's Kubernetes HA can switch over to one of the standby JMs quickly, as it is already started. With Ververica Platform's Kubernetes HA, a new JM pod needs to be launched and started, but the execution of the job's main() can be skipped if you enable the job graph store. The actual recovery time depends on your job and your cluster environment.
Another difference is that Ververica Platform's Kubernetes HA runs a single leader election process for the entire JM process, while Apache Flink's Kubernetes HA runs multiple of them: one for the Dispatcher, one for the ResourceManager, one for the JobManager, etc. Flink 1.15 followed the same route as Ververica Platform and implemented a multiple-component leader election service. See FLINK-24038 for more details.
The biggest advantage of Ververica Platform's Kubernetes HA is that, if you set RestoreStrategy=LATEST_STATE in your deployment, then even if the job fails terminally (e.g., after exhausting the configured restart attempts), it can still restart from the latest state (i.e., the previously completed checkpoint or savepoint) once the underlying issue is fixed.
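A sketch of the corresponding setting in a Ververica Platform Deployment:

kind: Deployment # this is a Ververica Platform Deployment, not the Kubernetes one
spec:
  restoreStrategy:
    kind: LATEST_STATE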
The following table summarizes the comparison between the two Kubernetes HA approaches.
| | Apache Flink Kubernetes HA | Ververica Platform Kubernetes HA |
|---|---|---|
| The number of JMs | one active, multiple standbys | one |
| Is the job restarted upon failover? | yes | yes |
| Is the job graph regenerated upon failover? | yes, main() is re-executed to generate the job graph | the job graph can (optionally) be stored in the HA store for re-use |
| The number of leader election processes in JM | multiple (Flink < 1.15), single (Flink 1.15 or later) | single |
| Can recover from LATEST_STATE | no | yes |