Jobs fail to start due to catalog connection timeout

Issue

My Ververica Platform installation was working fine, but suddenly I could not start my jobs. I found the following in the Ververica Platform logs:

[Timestamp] ERROR 1 --- [io-8080-exec-39] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.RuntimeException: Failed to access catalog random due to connection timeout.] with root cause
" at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119) ~[spring-web-5.3.9.jar:5.3.9]"
" at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) ~[tomcat-embed-core-9.0.52.jar:9.0.52]"
" at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119) ~[spring-web-5.3.9.jar:5.3.9]"
" at com.ververica.platform.sql.environment.TableEnvProvider.createTableEnvironment(TableEnvProvider.java:62) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at com.ververica.platform.sql.environment.TableEnvProvider.getTableEnvironment(TableEnvProvider.java:49) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at com.ververica.platform.sql.service.validation.SqlScriptValidationServiceImpl.validate(SqlScriptValidationServiceImpl.java:66) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at com.ververica.platform.sql.service.SqlService.validateStatement(SqlService.java:299) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at com.ververica.platform.sql.controller.SqlController.validateStatement(SqlController.java:191) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at com.ververica.platform.sql.controller.SqlController$$FastClassBySpringCGLIB$$9329171f.invoke(<generated>) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.9.jar:5.3.9]"
" at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:779) ~[spring-aop-5.3.9.jar:5.3.9]"
" at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.9.jar:5.3.9]"
" at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750) ~[spring-aop-5.3.9.jar:5.3.9]"
" at org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:61) ~[spring-security-core-5.5.2.jar:5.5.2]"
" at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.9.jar:5.3.9]"
" at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750) ~[spring-aop-5.3.9.jar:5.3.9]"
" at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:692) ~[spring-aop-5.3.9.jar:5.3.9]"
" at com.ververica.platform.sql.controller.SqlController$$EnhancerBySpringCGLIB$$5fd459ca.validateStatement(<generated>) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at java.base/java.util.concurrent.FutureTask.get(Unknown Source) ~[na:na]"
" at com.ververica.platform.sql.catalog.ExternalCatalog.callWithClassLoaderAndTimeout(ExternalCatalog.java:86) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at com.ververica.platform.sql.catalog.ExternalCatalog.callWithClassLoaderAndTimeout(ExternalCatalog.java:72) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at com.ververica.platform.sql.catalog.ExternalCatalog.close(ExternalCatalog.java:117) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at java.base/java.util.Optional.ifPresent(Unknown Source) ~[na:na]"
" at com.ververica.platform.sql.catalog.FlinkCatalogProvider$CatalogCacheManager.evictCatalog(FlinkCatalogProvider.java:299) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at java.base/java.lang.Iterable.forEach(Unknown Source) ~[na:na]"
" at com.ververica.platform.sql.catalog.FlinkCatalogProvider$CatalogCacheManager.evictUnregisteredCatalogs(FlinkCatalogProvider.java:319) ~[vvp-sql-service-2.6.1-plain.jar:na]"
" at com.ververica.platform.sql.catalog.FlinkCatalogProvider.getCatalogs(FlinkCatalogProvider.java:95) ~[vvp-sql-service-2.6.1-plain.jar:na]"

… (Some logs skipped)

What can I do?

Environment

Ververica Platform version: 2.x

Resolution

You are likely experiencing a timeout when Ververica Platform accesses the external catalog service. First, find out why the catalog service is not responding in time. For example:

  1. Do you have network issues?
  2. Is your catalog service overloaded?
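
As a first check for the network question above, the following sketch tests whether a TCP connection to the catalog endpoint can be opened within a time limit. The function name check_catalog_tcp and the host and port in the usage comment are hypothetical placeholders, not values from Ververica Platform. The sketch assumes bash and the coreutils timeout command are available, and that you run it from a machine or pod that shares the network path Ververica Platform uses. It only tests TCP reachability and does not show whether the catalog service itself is overloaded.

```shell
# Sketch: check whether a TCP connection to the catalog endpoint can be
# opened within a time limit. All names here are illustrative assumptions.
check_catalog_tcp() {
  local host="$1" port="$2" limit_secs="${3:-10}"
  # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
  # coreutils `timeout` aborts the attempt after limit_secs seconds.
  if timeout "$limit_secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
    return 1
  fi
}

# Usage (hypothetical endpoint; the third argument is the limit in seconds):
# check_catalog_tcp catalog.example.internal 9083 10
```

If the function prints "unreachable" with a limit equal to the configured catalog timeout, the problem is likely on the network path rather than in Ververica Platform itself.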

As a temporary workaround, you can increase the timeout for the external catalog service by adding the following property to values-vvp.yaml (the values file that you use to install Ververica Platform). The default timeout is 10 seconds. The following example sets the timeout to 60 seconds (1 minute).

vvp:
  sqlService:
    catalog:
      externalCatalogTimeoutInSecs: 60

Then upgrade Ververica Platform with the updated values-vvp.yaml to apply the new configuration, for example:

# Set --version to your Ververica Platform chart version, for example 5.5.1 (VVP 2.9.1)
$ helm upgrade --install ververica-platform \
            ververica/ververica-platform \
            --version your_ververica_platform_chart_version \
            --values values-vvp.yaml
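
After the upgrade, you may want to confirm that the new timeout is part of the deployed release. The sketch below filters a stream of Helm values for the timeout property. The helper name show_catalog_timeout is hypothetical, and the usage comment assumes the release is named ververica-platform as in the command above and lives in the current namespace (otherwise add --namespace).

```shell
# Sketch: print the externalCatalogTimeoutInSecs line from YAML on stdin.
# The helper name is an illustrative assumption, not part of Helm or VVP.
show_catalog_timeout() {
  # Match the property at any indentation, then strip the leading whitespace.
  grep -E '^[[:space:]]*externalCatalogTimeoutInSecs:' | sed -E 's/^[[:space:]]*//'
}

# Usage: `helm get values` prints the user-supplied values of a release.
# helm get values ververica-platform | show_catalog_timeout
```

If nothing is printed, the property was not found in the values of the release, which may mean the upgrade used a different values file.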

Cause

Ververica Platform did not receive a response from the external catalog service within the configured timeout.

Related Information

Getting Started - Installation — Ververica Platform documentation

Installation using Helm — Ververica Platform documentation