Troubleshooting Apache Flink with Byteman

July 13, 2021 | by Victor Xu


What would you do if you needed to see more details of some Apache Flink application logic at runtime, but there was no logging in that code path? One option is to modify the Flink source code or the application code, then recompile and redeploy it, which is time-consuming and error-prone. A quicker and more straightforward approach is to use Byteman, which can inject Java code into a running JVM and retrieve the runtime details you need.

What is Byteman


Byteman is a tool that makes it easy to trace, monitor, and test Java applications and JDK runtime code behavior. It can inject Java code into the application methods or into Java runtime methods without the need for you to recompile, repackage or even redeploy your application. The injected code can access any of your data and call any application methods, including the ones that are private.

To fully unleash the power of Byteman, you can use a simple scripting language built around Event Condition Action (ECA) rules to specify where, when, and how the original Java code should be transformed. A rule specifies a trigger point, that is, the location where you want code to be injected. When execution reaches the trigger point, the rule's condition, a Java boolean expression, is evaluated. The Java expression (or sequence of expressions) in the rule's action is executed only when the condition is true.
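The evaluation order described above can be modeled in a few lines of plain Java. This is only a sketch of the ECA idea and not Byteman's API: the class `EcaRuleSketch`, the nested `Rule` type, and the `fire` method are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Plain-Java model of an Event Condition Action rule (hypothetical, not Byteman code). */
public class EcaRuleSketch {

    /** A rule bundles a name, a condition, and an action over an event of type E. */
    static final class Rule<E> {
        final String name;
        final Predicate<E> condition;
        final Consumer<E> action;

        Rule(String name, Predicate<E> condition, Consumer<E> action) {
            this.name = name;
            this.condition = condition;
            this.action = action;
        }

        /**
         * Called when execution reaches the trigger point (the event).
         * The action runs only if the condition evaluates to true.
         */
        boolean fire(E event) {
            if (!condition.test(event)) {
                return false;
            }
            action.accept(event);
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Rule<Integer> rule =
                new Rule<>("log_large_values", v -> v > 10, v -> log.add("saw " + v));

        rule.fire(3);   // condition is false, so the action is skipped
        rule.fire(42);  // condition is true, so the action runs

        System.out.println(log); // prints [saw 42]
    }
}
```

In Byteman the event is not an explicit method call as it is here: the agent rewrites the bytecode of the target method so that the rule is evaluated whenever execution passes the chosen location.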

In the next section, I will use an example to show how to leverage Byteman to retrieve more details of the underlying logic within a Flink application.

Apache Flink Troubleshooting Case Study


Checkpointing is the fault tolerance mechanism in the Apache Flink framework. When using S3 as the checkpointing destination, Flink usually leverages the Hadoop or Presto libraries for any underlying communication. However, it can sometimes be challenging to troubleshoot issues in that code path because the third-party libraries don’t always contain sufficient logging. In this example, I’ll demonstrate some of Byteman’s capabilities by injecting logging code into a Flink application running on Ververica Platform.


Download the latest version (which is 4.0.15 at the time of this writing) of Byteman from the official website. After decompression, you can find the required byteman.jar in the byteman-download-4.0.15/lib directory.

Note that the byteman-install.jar and byteman-submit.jar in the same directory are not sufficient for this use case.


To achieve the goal, let’s write some rules for Byteman in a plain text file (lines that start with ‘#’ are comments):

# File name: rules_v1.btm
# Start of a rule (naming the rule)
RULE rule_example_1

# Target class
CLASS ^org.apache.flink.fs.s3.common.FlinkS3FileSystem

# Target method (i.e. the constructor method)
METHOD <init>

# Injection position in the method (e.g. ENTRY/EXIT/LINE number/...)
AT ENTRY

# Bind a parameter for logging (same as a local variable)
BIND myLOG:org.slf4j.Logger = org.slf4j.LoggerFactory.getLogger($0.getClass());

# Trigger condition (not needed in this case, so I put 'true')
IF true

# Actions to take when the rule is triggered (print the total number of parameters
# of the constructor method together with the values of the first and fifth parameters)
DO
myLOG.info("Hello, FlinkS3FileSystem! -- Byteman");
myLOG.info("Total number of parameters: " + $#);
myLOG.info("#1 hadoopS3FileSystem: " + $1);
myLOG.info("#5 S3AccessHelper: " + $5);

# End of rule
ENDRULE

# ----------------------------------
# Another rule. (A single Byteman rule file can contain multiple rule definitions.)

RULE rule_example_2

CLASS ^com.facebook.presto.hive.s3.PrestoS3FileSystem

METHOD initialize

# Inject at method exit so that the fields set up by initialize can be read
AT EXIT

BIND myLOG:org.slf4j.Logger = org.slf4j.LoggerFactory.getLogger($0.getClass());

IF true

DO
myLOG.info("Hello, PrestoS3FileSystem! -- Byteman");
myLOG.info("AmazonS3: [" + $0.s3 + "]");
myLOG.info("TransferManagerConfiguration: [" + $0.transferConfig + "]");

ENDRULE

# End of file rules_v1.btm

The full explanation of the rule language can be found in the Byteman Programmer's Guide.


Ververica Platform Configuration

The version of Ververica Platform in this demo is 2.5.0, and the corresponding Flink version is 1.13.1. The Flink application in the demo is Top Speed Windowing from the Ververica Platform documentation.

First, upload both byteman.jar and the rules file to Ververica Platform as two artifacts.


The next step is to add both files to the Additional Dependencies of the Deployment and to configure the JVM options of the Apache Flink application so that it loads them.


Note that the references to the Byteman files in the configuration must be absolute paths; otherwise, loading them will fail. In a Ververica Platform Deployment, the default local path for additional dependencies is /flink/usrlib, so both files are referenced under that directory.
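A minimal sketch of such a configuration follows. It assumes the agent is attached through Flink's `env.java.opts` option using Byteman's `-javaagent:<agent jar>=script:<rule file>` syntax, and that the artifacts keep the file names used in this article. Treat the exact key and value as assumptions to adapt to your own Deployment.

```yaml
# Assumed Flink configuration entry for the Deployment.
# Both paths are absolute and point to the default additional-dependencies
# directory (/flink/usrlib); the file names are the ones used in this article.
env.java.opts: >-
  -javaagent:/flink/usrlib/byteman.jar=script:/flink/usrlib/rules_v1.btm
```

Because the option is passed to the JVM at startup, the rules are loaded before any Flink code runs, which is why the Deployment has to restart for a change to take effect.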



After implementing the above changes, the Deployment will automatically restart. To check the results, click the Flink UI button at the top of the Deployment page and then click the Task Manager tab on the left of the Flink page. 


Open the TaskManager (TM) page and click the Logs tab to see the TM log. Searching for ‘Byteman’ in the log view then shows the injected log messages.



Byteman is a powerful tool for diagnosing issues in Java applications. It can save you many hours when troubleshooting complex problems, especially in cases where other troubleshooting methods have failed. This article provided only a brief introduction, and the example above reveals only a tiny fraction of Byteman's capabilities. To learn quickly how to make the best use of it, please refer to its quick tutorial; for more information, check the Byteman website. For additional troubleshooting and debugging tips, check our Ververica Troubleshooting & Operations training and sign up for the next available training date on our website.






Topics: Apache Flink
