Deployments with a very large deployment specification may get stuck

Issue

A Ververica Platform Deployment with a very large deployment specification is stuck in the TRANSITIONING state, and the platform's appmanager DEBUG logs show a java.nio.BufferOverflowException such as the following:

2020-12-09 09:10:38.888 DEBUG 1 --- [eduler-worker-5] c.d.a.c.c.drivers.ScheduledController    : Exception while invoking the controller JobControllerLogic{jobId=4dd83c94-e0da-4955-b45e-a420179d94a3}. Will retry after backing off for 55155 milliseconds
java.nio.BufferOverflowException: null
	at java.base/java.nio.HeapByteBuffer.put(Unknown Source) ~[na:na]
	at com.fasterxml.jackson.databind.util.ByteBufferBackedOutputStream.write(ByteBufferBackedOutputStream.java:16) ~[jackson-databind-2.11.2.jar!/:2.11.2]
	at com.fasterxml.jackson.core.json.UTF8JsonGenerator._flushBuffer(UTF8JsonGenerator.java:2159) ~[jackson-core-2.11.2.jar!/:2.11.2]
	at com.fasterxml.jackson.core.json.UTF8JsonGenerator.close(UTF8JsonGenerator.java:1203) ~[jackson-core-2.11.2.jar!/:2.11.2]
	at com.fasterxml.jackson.databind.ObjectMapper._writeValueAndClose(ObjectMapper.java:4412) ~[jackson-databind-2.11.2.jar!/:2.11.2]
	at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:3619) ~[jackson-databind-2.11.2.jar!/:2.11.2]
	at com.dataartisans.appmanager.controller.core.serde.VersionAwareJacksonSerde.serialize(VersionAwareJacksonSerde.java:65) ~[appmanager-controller-2.2.2.jar!/:na]
	at com.dataartisans.appmanager.persistence.inmemory.InMemoryPersistence.serialize(InMemoryPersistence.java:389) ~[appmanager-persistence-inmemory-2.2.2.jar!/:na]
...

This can happen when the deployment is created, after it is edited, or while the platform synchronizes the current deployment state.

Environment

Ververica Platform versions 2.0 - 2.8

Cause

Ververica Platform versions 2.0 - 2.3.3 have an internal limit of 32 KB on the size of the serialized form of a deployment. If a deployment exceeds this limit, the appmanager fails to serialize it and the problem described above occurs. This often happens when a deployment passes many arguments to the main class of the job.
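The stack trace shows Jackson writing into a fixed-capacity heap ByteBuffer (HeapByteBuffer.put) that runs out of space. The minimal Java sketch below is not Ververica Platform code; it only reproduces that failure mode: ByteBuffer.put throws BufferOverflowException when the payload is larger than the buffer's remaining capacity. The class name BufferLimitDemo and the generated argument string are hypothetical; the 32 KB capacity mirrors the limit of versions 2.0 - 2.3.3.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferLimitDemo {
    // Illustrative capacity: 32 KB, matching the limit in Ververica Platform 2.0 - 2.3.3.
    static final int LIMIT_BYTES = 32 * 1024;

    /** Returns true if the UTF-8 encoded payload fits into a fixed-size heap buffer. */
    static boolean fits(String serialized, int capacity) {
        ByteBuffer buffer = ByteBuffer.allocate(capacity); // fixed capacity, never grows
        try {
            buffer.put(serialized.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (BufferOverflowException e) {
            // Same exception type as in the appmanager log above.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical deployment fragment with thousands of main-class arguments.
        StringBuilder sb = new StringBuilder("{\"args\":\"");
        for (int i = 0; i < 5000; i++) {
            sb.append("--key").append(i).append("=value ");
        }
        sb.append("\"}");
        String big = sb.toString();
        System.out.println(big.length() + " bytes, fits=" + fits(big, LIMIT_BYTES));
    }
}
```

A payload of exactly 32768 bytes still fits; one more byte triggers the overflow, which is why a specification that grows slightly past the limit suddenly gets stuck.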

Workaround

If your job's main class takes many arguments, avoid passing them as command-line parameters; provide them via a file or via Kubernetes ConfigMaps / Secrets instead. Different jobs can even share a common set of properties through common files, ConfigMaps, or Secrets. See the article "How to inject many properties into Flink Jobs running on Ververica Platform" for more details.
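As a minimal sketch of this workaround, assuming the properties are mounted as a file (for example from a ConfigMap or Secret volume), the main class can take a single path argument and read everything else from that file with java.util.Properties. The class name JobConfigLoader, the default path /flink/config/job.properties, and the key input.topic are hypothetical and depend on how you mount the file.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class JobConfigLoader {
    /** Loads job settings from a properties file, e.g. one mounted from a ConfigMap or Secret. */
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical convention: the only main argument is the path to the properties file.
        Path configFile = Paths.get(args.length > 0 ? args[0] : "/flink/config/job.properties");
        Properties props = load(configFile);
        // Hypothetical key with a fallback value.
        String inputTopic = props.getProperty("input.topic", "default-input");
        System.out.println("input.topic = " + inputTopic);
    }
}
```

This keeps the deployment specification down to one short argument regardless of how many settings the job needs. In a Flink job, Flink's ParameterTool.fromPropertiesFile offers similar behavior if you prefer Flink's own utilities.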

Fixed Version

Ververica Platform 2.3.4 and later raise the default limit to 64 KB. If you need a higher limit, you can configure it in your values.yaml file as shown below. Be cautious about raising it too far, though; large sets of configuration are better provided via files, ConfigMaps, or Secrets.

vvp:
  appmanager:
    persistence:
      appmanager-persistence-inmemory.max-resource-size-bytes: <size_limit_in_bytes>