
Troubleshooting Apache Flink with Byteman



What would you do if you needed more details about some Apache Flink application logic at runtime, but there was no logging in that code path? One option is to modify the Flink source code or the application code, then recompile and redeploy it, which is time-consuming and error-prone. A quicker and more straightforward approach is to use Byteman, which can inject Java code into the JVM and retrieve the runtime details you need.

What is Byteman


Byteman is a tool that makes it easy to trace, monitor, and test the behavior of Java applications and JDK runtime code. It can inject Java code into application methods or into Java runtime methods without the need for you to recompile, repackage, or even redeploy your application. The injected code can access any of your data and call any application method, including private ones.

Byteman rules are written in a simple scripting language based on a formalism called Event Condition Action (ECA) rules, which specify where, when, and how the original Java code should be transformed. A rule specifies a trigger point: the target class and method, and the location within that method where the code is injected. When execution reaches the trigger point, the rule's condition, a Java boolean expression, is evaluated. The Java expression (or sequence of expressions) in the rule's action is executed only when the condition is true.
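To make this evaluation order concrete, here is a minimal plain-Java model of ECA rules. It is only an illustration under our own assumptions: the `Rule` class and the `onEvent` method are hypothetical and are not part of Byteman's API. Byteman itself works by rewriting the bytecode of the target method rather than by dispatching events like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Plain-Java model of Event Condition Action (ECA) rules.
// Hypothetical illustration only: Byteman rewrites bytecode at the trigger
// point instead of exposing an API like this one.
public class EcaRuleDemo {

    /** A hypothetical rule: trigger method (event), condition, and action. */
    static final class Rule {
        final String triggerMethod;
        final Predicate<Object[]> condition;
        final Consumer<Object[]> action;

        Rule(String triggerMethod, Predicate<Object[]> condition, Consumer<Object[]> action) {
            this.triggerMethod = triggerMethod;
            this.condition = condition;
            this.action = action;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void register(Rule rule) {
        rules.add(rule);
    }

    /**
     * Simulates execution reaching the trigger point of {@code method}.
     * For every rule bound to that method, the condition is evaluated first,
     * and the action runs only if the condition is true.
     * Returns the number of actions that ran.
     */
    int onEvent(String method, Object... args) {
        int fired = 0;
        for (Rule rule : rules) {
            if (rule.triggerMethod.equals(method) && rule.condition.test(args)) {
                rule.action.accept(args);
                fired++;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        EcaRuleDemo engine = new EcaRuleDemo();
        // Loosely analogous to: METHOD initialize / IF $1 != null / DO print $1
        engine.register(new Rule("initialize",
                a -> a.length > 0 && a[0] != null,
                a -> System.out.println("initialize called with " + a[0])));

        engine.onEvent("initialize", "s3://bucket/checkpoints"); // condition true: action runs
        engine.onEvent("initialize", (Object) null);             // condition false: nothing printed
        engine.onEvent("close");                                 // no rule for this trigger
    }
}
```

The point of the model is the ordering: the event alone never runs the action; the condition acts as a guard, which is why the rules later in this article simply use `IF true` when every invocation should be logged.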

In the next section, I will use an example to show how to leverage Byteman to retrieve more details of the underlying logic within a Flink application.

Apache Flink Troubleshooting Case Study


Checkpointing is the fault tolerance mechanism in the Apache Flink framework. When S3 is used as the checkpointing destination, Flink usually relies on either the Hadoop or the Presto libraries for the underlying communication. However, troubleshooting issues in that code path can be challenging because these third-party libraries don’t always contain sufficient logging. In this example, I’ll demonstrate how Byteman can inject logging code into a Flink application running on Ververica Platform.


Download the latest version of Byteman (4.0.15 at the time of this writing) from the official website. After decompressing the archive, you can find the required byteman.jar in the byteman-download-4.0.15/lib directory.

Note that the byteman-install.jar and byteman-submit.jar files in the same directory are not sufficient for this use case.


To achieve this goal, let’s write some Byteman rules in a plain text file (lines starting with ‘#’ are comments):

# File name: rules_v1.btm
# Start of a rule (naming the rule)
RULE rule_example_1

# Target class
CLASS ^org.apache.flink.fs.s3.common.FlinkS3FileSystem

# Target method (i.e. the constructor)
METHOD <init>

# Injection position in the method (e.g. ENTRY/EXIT/LINE number/...)
AT EXIT

# Bind a variable for logging (same as a local variable)
BIND myLOG:org.slf4j.Logger = org.slf4j.LoggerFactory.getLogger($0.getClass());

# Trigger condition (not needed in this case, so I put 'true')
IF true

# Actions to take when the rule is triggered (log the total number of constructor parameters together with the values of the first and fifth parameters)
DO myLOG.info("Hello, FlinkS3FileSystem! -- Byteman");
   myLOG.info("Total number of parameters: " + $#);
   myLOG.info("#1 hadoopS3FileSystem: " + $1);
   myLOG.info("#5 S3AccessHelper: " + $5);

# End of rule
ENDRULE

# ----------------------------------
# Another rule. (A single Byteman rule file can contain multiple rule definitions.)

RULE rule_example_2

CLASS ^com.facebook.presto.hive.s3.PrestoS3FileSystem

METHOD initialize

AT EXIT

BIND myLOG:org.slf4j.Logger = org.slf4j.LoggerFactory.getLogger($0.getClass());

IF true

DO myLOG.info("Hello, PrestoS3FileSystem! -- Byteman");
   myLOG.info("AmazonS3: [" + $0.s3 + "]");
   myLOG.info("TransferManagerConfiguration: [" + $0.transferConfig + "]");

ENDRULE

# End of file rules_v1.btm

A full explanation of the rule language can be found in the Byteman Programmer's Guide.
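To see what the rule variables refer to, the following sketch writes out by hand, in plain Java, roughly what `rule_example_1` adds at the end of a constructor: `$0` is `this`, `$1` to `$n` are the method parameters (numbered from 1), and `$#` is the number of parameters. `DemoFileSystem` is a hypothetical stand-in with five parameters, not Flink's `FlinkS3FileSystem`, and `java.util.logging` replaces SLF4J only to keep the example free of dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hand-written approximation of the logging that rule_example_1 injects.
// DemoFileSystem is hypothetical; Byteman adds equivalent code to the real
// constructor's bytecode without any source change.
public class RuleEquivalentDemo {

    static final class DemoFileSystem {
        // Messages produced by the "injected" code, kept for inspection.
        final List<String> injectedLog = new ArrayList<>();

        DemoFileSystem(Object hadoopFs, String tmpDir, String entropyKey,
                       int entropyLength, Object accessHelper) {
            // ... the original constructor body would run here ...

            // ---- roughly what AT EXIT + BIND + DO add ----
            Object[] params = {hadoopFs, tmpDir, entropyKey, entropyLength, accessHelper};
            // BIND myLOG = LoggerFactory.getLogger($0.getClass()), where $0 == this
            Logger myLOG = Logger.getLogger(this.getClass().getName());
            log(myLOG, "Hello, DemoFileSystem! -- Byteman");
            log(myLOG, "Total number of parameters: " + params.length); // $#
            log(myLOG, "#1 hadoopFs: " + params[0]);                     // $1 (rule variables count from 1)
            log(myLOG, "#5 accessHelper: " + params[4]);                 // $5
        }

        private void log(Logger logger, String message) {
            injectedLog.add(message);
            logger.info(message);
        }
    }

    public static void main(String[] args) {
        DemoFileSystem fs = new DemoFileSystem("hadoop-fs", "/tmp", null, 0, "access-helper");
        fs.injectedLog.forEach(System.out::println);
    }
}
```

Note the off-by-one between the two notations: `$5` in the rule corresponds to `params[4]` in Java, because Byteman reserves `$0` for the target instance.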


Ververica Platform Configuration

The version of Ververica Platform in this demo is 2.5.0, and the corresponding Flink version is 1.13.1. The Flink application in the demo is Top Speed Windowing from the Ververica Platform documentation.

First, upload both byteman.jar and the rules file to Ververica Platform as two artifacts.


Next, add both files to the Deployment's Additional Dependencies and configure the Apache Flink application to load them.


Note that the references to the Byteman files in the configuration must be absolute paths; otherwise, the setup will fail. In a Ververica Platform Deployment, the default local path for additional dependencies is /flink/usrlib, so the files are available as /flink/usrlib/byteman.jar and /flink/usrlib/rules_v1.btm.
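As a sketch of what the absolute-path references might look like, the helper below builds a JVM option that loads byteman.jar as a Java agent together with the rules file, using Byteman's `-javaagent:<byteman.jar>=script:<rules file>` agent syntax. Which Flink configuration key receives this value is an assumption on our part; Flink's `env.java.opts` (or `env.java.opts.taskmanager`) option is the usual place for extra JVM flags. The `javaAgentOption` helper is hypothetical and exists only to make the absolute-path requirement explicit.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: builds a Byteman -javaagent option and refuses
// relative paths, mirroring the absolute-path requirement described above.
public class BytemanAgentOpts {

    /**
     * Returns "-javaagent:<jar>=script:<rules1>,script:<rules2>,...".
     * Byteman agent options are comma-separated, and "script:" may repeat.
     */
    static String javaAgentOption(Path bytemanJar, Path... ruleScripts) {
        requireAbsolute(bytemanJar);
        if (ruleScripts.length == 0) {
            throw new IllegalArgumentException("at least one rule script is required");
        }
        StringBuilder option = new StringBuilder("-javaagent:").append(bytemanJar);
        for (int i = 0; i < ruleScripts.length; i++) {
            requireAbsolute(ruleScripts[i]);
            // '=' separates the jar from its first option; ',' separates later options.
            option.append(i == 0 ? '=' : ',').append("script:").append(ruleScripts[i]);
        }
        return option.toString();
    }

    private static void requireAbsolute(Path path) {
        if (!path.isAbsolute()) {
            throw new IllegalArgumentException("path must be absolute: " + path);
        }
    }

    public static void main(String[] args) {
        // Paths follow the /flink/usrlib convention for additional dependencies.
        System.out.println(javaAgentOption(
                Paths.get("/flink/usrlib/byteman.jar"),
                Paths.get("/flink/usrlib/rules_v1.btm")));
    }
}
```

On a Unix-like system, `main` prints `-javaagent:/flink/usrlib/byteman.jar=script:/flink/usrlib/rules_v1.btm`. A relative path would presumably be resolved against the JVM's working directory, which is not necessarily /flink/usrlib, so the helper rejects it up front.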



After you apply the above changes, the Deployment restarts automatically. To check the results, click the Flink UI button at the top of the Deployment page and then click the Task Managers tab on the left of the Flink page.


Open a TaskManager's page and click the Logs tab to see its log. Searching for ‘Byteman’ in the log area then shows the customized logging output.



Byteman is a powerful tool that can help diagnose a wide range of Java application issues. It can save you many hours when troubleshooting complex issues in your application, especially when other troubleshooting methods have failed. This article provided only a brief introduction to Byteman, and the example above reveals only a tiny fraction of its capabilities. To learn quickly how to make the best use of Byteman, refer to the Byteman tutorials, and for more information, check the Byteman website. For additional troubleshooting and debugging tips, check out our Ververica Troubleshooting & Operations training and sign up for the next available training date on our website.






Article by: Victor Xu

