How to Include External Python Libraries in PyFlink Deployments on Ververica Platform
I need to use external Python libraries (e.g., requests, numpy, boto3) in my PyFlink job, but importing them results in ModuleNotFoundError: No module named 'xxx'. How can I add third-party Python packages to my PyFlink deployment?
Answer
Note: This section applies to Ververica Platform 2.6+ with Apache Flink 1.15+.
External Python libraries are not included in the default Flink runtime image. To use them in a PyFlink job, you must explicitly provide them to the Flink execution environment. There are two approaches depending on whether your Flink cluster has internet access.
Approach 1: Using requirements.txt (Internet Access Required)
If your Flink cluster nodes can access the internet (e.g., PyPI), you can let Flink install the dependencies at runtime.
Step 1: Create a requirements.txt file listing the packages you need. Pinning versions is recommended for reproducibility:
my-library==1.2.3
another-library==2.0.0
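Because unpinned entries can resolve to different versions each time the job starts, it can help to check the file before uploading it. The following is a minimal sketch, not part of PyFlink or Ververica Platform; the helper name unpinned_requirements is hypothetical, and it only recognizes simple name==version lines (entries with environment markers or URLs would be reported as unpinned).

```python
import re
from typing import List

# Matches simple lines such as "my-library==1.2.3" or "pkg[extra]==1.0".
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._-]+\])?==[^=\s]+$")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Calling unpinned_requirements(open("requirements.txt").read()) returns an empty list when every entry is pinned.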
Step 2: Reference the file in your PyFlink script:
from pyflink.table import EnvironmentSettings, TableEnvironment
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.set_python_requirements("/flink/usrlib/requirements.txt")
Step 3: Upload the requirements.txt file as an Additional Dependency in Ververica Platform so that it is available in the pod at runtime.
Important: This approach requires that all cluster nodes have network access to PyPI (or your private package index). If your environment is air-gapped or has restricted network policies, use Approach 2 instead.
Approach 2: Pre-Packaging Dependencies as a Zip (Recommended)
This is the recommended approach as it does not require internet access at runtime, ensures consistency across environments, and avoids runtime installation delays.
Step 1: Download or obtain the source/wheel package of the library. For example:
pip download my-library==1.2.3 --no-binary :all: -d .
Step 2: Extract the source package (if it is a .tar.gz):
tar -xvzf my-library-1.2.3.tar.gz
Step 3: Install the package into a flat directory using --no-deps and --target:
mkdir mylib_tmp
python3 -m pip install my-library-1.2.3/ --no-deps --target mylib_tmp
Important: Use --no-deps to avoid pulling in dependencies that may conflict with packages already present in the Flink Python environment. If your library has required dependencies that are not part of the default Flink image, repeat Steps 1-3 for each dependency separately.
Step 4: Zip the contents of the directory (not the directory itself):
cd mylib_tmp
zip -r ../mylib.zip .
cd ..
Critical: The zip file must contain the Python module directories at the root level. For example, mylib.zip should contain my_library/ at the top, not mylib_tmp/my_library/. You can verify with: unzip -l mylib.zip
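The same layout check can be scripted with Python's standard zipfile module, which may be convenient in a build pipeline. This is a minimal sketch; the helper name top_level_entries is hypothetical.

```python
import zipfile
from typing import Set


def top_level_entries(zip_path: str) -> Set[str]:
    """Return the first path component of every entry in the zip."""
    with zipfile.ZipFile(zip_path) as zf:
        return {name.split("/", 1)[0] for name in zf.namelist() if name}
```

For a correctly built archive, top_level_entries("mylib.zip") should contain my_library. If it contains only mylib_tmp, the parent directory was zipped instead of its contents.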
Step 5: Reference the zip file in your PyFlink script using add_python_file():
from pyflink.table import EnvironmentSettings, TableEnvironment
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.add_python_file("/flink/usrlib/mylib.zip")
import my_library # Now available
Step 6: Upload both the zip file and your Python script to the Ververica Platform deployment:
- Upload mylib.zip under Additional Dependencies
- Upload your Python script as the main Python file
Packaging Multiple Libraries
If your job requires multiple external libraries, you can package them all into a single zip:
mkdir all_deps
# Install each library into the same directory
python3 -m pip install my-library-1.2.3/ --no-deps --target all_deps
python3 -m pip install helper-lib-1.5.0/ --no-deps --target all_deps
python3 -m pip install utils-pkg-2.0.0/ --no-deps --target all_deps
# Zip all contents together
cd all_deps
zip -r ../python_deps.zip .
cd ..
Then reference the single zip in your script:
t_env.add_python_file("/flink/usrlib/python_deps.zip")
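If the zip utility is not available on the build machine, the packaging step can also be done with Python's standard library. The sketch below is one possible way to do it, under the assumption that the target zip is written outside the source directory; the function name zip_directory_contents is hypothetical.

```python
import os
import zipfile


def zip_directory_contents(src_dir: str, zip_path: str) -> None:
    """Zip everything inside src_dir, storing paths relative to src_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                # A relative archive name keeps my_library/ at the zip root
                # instead of nesting it under the source directory name.
                arcname = os.path.relpath(full_path, src_dir)
                zf.write(full_path, arcname)
```

For example, zip_directory_contents("all_deps", "python_deps.zip") is intended to produce the same layout as running zip -r ../python_deps.zip . from inside all_deps.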
Common Mistakes to Avoid
| Mistake | Why It Fails | Correct Approach |
|---|---|---|
| Uploading .tar.gz directly and referencing it with add_python_archive() | Source packages are not installed Python modules; they contain setup.py, metadata, etc., not importable code | Extract, install with pip install --target, then zip |
| Zipping the parent folder instead of its contents | Python cannot find modules inside a nested directory (e.g., deps/my_library/ instead of my_library/) | cd into the directory and zip from there: cd deps && zip -r ../deps.zip . |
| Using --target without --no-deps | May pull in packages that conflict with Flink's built-in Python environment | Always use --no-deps and handle each dependency explicitly |
| Using a different Python version to build the zip | Native extensions (.so files) are version-specific (e.g., cpython-310) and will fail to load on a different version | Match the Python version used for packaging to the one in your Flink image (check the Python version shipped in your Flink base image) |
| Confusing Additional Dependencies with Additional Python Archives in the VVP UI | These fields serve different purposes in how files are mounted and extracted | Use add_python_file() in your code for the most reliable behavior |
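The Python version mismatch in particular can be caught before deployment by inspecting the file names of compiled extensions. The following is a rough sketch with hypothetical helper names; it only looks at CPython-tagged .so files (names such as _speedups.cpython-310-x86_64-linux-gnu.so) and assumes you already know the Python version of your Flink image.

```python
import re
import sys
from typing import Iterable, List

# Captures the interpreter tag in names like "_x.cpython-310-x86_64-linux-gnu.so".
_CPYTHON_TAG = re.compile(r"\.cpython-(\d+)[^/]*\.so$")


def mismatched_extensions(names: Iterable[str], target: str) -> List[str]:
    """Return .so entries whose cpython tag differs from target (e.g. '310')."""
    bad = []
    for name in names:
        match = _CPYTHON_TAG.search(name)
        if match and match.group(1) != target:
            bad.append(name)
    return bad


def current_tag() -> str:
    """Tag of the interpreter running this script, e.g. '310' for Python 3.10."""
    return f"{sys.version_info.major}{sys.version_info.minor}"
```

A possible use is mismatched_extensions(zipfile.ZipFile("mylib.zip").namelist(), "310"), where "310" stands for the Python version in the Flink image. An empty result means no mismatched CPython-tagged extensions were found; it does not prove the package will load.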
Libraries with Native Dependencies
Some Python libraries require native system-level libraries that are not available in the default Flink Docker image. For example:
- psycopg2 requires libpq (PostgreSQL client library)
- Database-specific drivers may require vendor client libraries pre-installed in the container
For such libraries, consider these alternatives:
- Use pure-Python alternatives when available (e.g., a JDBC-based connector instead of a native database driver)
- Build a custom Flink Docker image with the required system libraries pre-installed
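To find out whether a system library is visible inside a given image, one option is to run a short check with ctypes.util.find_library from the Python standard library inside the container. This is only a sketch with a hypothetical helper name; find_library relies on tools such as ldconfig being present, so a missing result is a hint rather than proof that the library is absent.

```python
from ctypes.util import find_library
from typing import Iterable, List


def missing_native_libs(names: Iterable[str]) -> List[str]:
    """Return the library names that find_library could not locate."""
    return [name for name in names if find_library(name) is None]
```

For example, missing_native_libs(["pq"]) returns ["pq"] when the lookup cannot locate libpq in the environment where it runs.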
Written by Naci Simsek · Published 13 Feb 2026 · Last updated 13 Feb 2026