How to set arbitrary S3 configuration options (Hadoop S3A, Presto S3) in Flink?
The Flink docs on S3 show only a few configuration examples, such as how to configure access credentials. How can I set arbitrary configuration parameters like presto.s3.max-retry-time or fs.s3a.attempts.maximum in Flink?
Answer
Note: This section applies to Flink 1.5 or later.
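You can configure both S3 file system implementations via flink-conf.yaml. A configuration parameter is forwarded to the native implementation only if its key starts with one of the following prefixes; the prefix is then mapped to the implementation's native one (fs.s3a. for Hadoop S3A, presto.s3. for Presto S3):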
- Hadoop S3A: s3.|s3a.|fs.s3a.
- Presto S3: s3.|presto.s3.
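As a minimal sketch, the two parameters from the question could be set in flink-conf.yaml as follows. The values are illustrative placeholders, not recommendations, and each key only takes effect if the corresponding file system implementation is in use.

```yaml
# flink-conf.yaml (values are illustrative assumptions)

# Read by flink-s3-fs-hadoop; the key already has the native fs.s3a. prefix
fs.s3a.attempts.maximum: 10

# Read by flink-s3-fs-presto; the key already has the native presto.s3. prefix
presto.s3.max-retry-time: 10m
```

Using the fully prefixed keys keeps each setting tied to one implementation; the shared s3. prefix would match both.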
Examples
The following examples show how the mapping works. Note the special replacement that allows s3.access-key/s3.access.key and s3.secret-key/s3.secret.key to be used with both implementations despite their different native configuration names.
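For instance, thanks to this replacement, a single pair of credential entries should work with either implementation. The values below are placeholders, not real credentials.

```yaml
# flink-conf.yaml (placeholder values)
# Mapped to fs.s3a.access.key / fs.s3a.secret.key by flink-s3-fs-hadoop
# and to presto.s3.access-key / presto.s3.secret-key by flink-s3-fs-presto
s3.access-key: YOUR_ACCESS_KEY
s3.secret-key: YOUR_SECRET_KEY
```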
flink-s3-fs-hadoop
flink-s3-fs-hadoop native   |     flink-conf.yaml variants
==============================================================
fs.s3a.access.key           <->   s3.access.key
                                  s3a.access.key
                                  fs.s3a.access.key
                                  s3.access-key
fs.s3a.secret.key           <->   s3.secret.key
                                  s3a.secret.key
                                  fs.s3a.secret.key
                                  s3.secret-key
fs.s3a.endpoint             <->   s3.endpoint
                                  s3a.endpoint
                                  fs.s3a.endpoint
fs.s3a.proxy.host           <->   s3.proxy.host
                                  s3a.proxy.host
                                  fs.s3a.proxy.host
fs.s3a.proxy.port           <->   s3.proxy.port
                                  s3a.proxy.port
                                  fs.s3a.proxy.port
For more configuration options, please refer to Hadoop's AWS module documentation (the actual supported values may depend on the flink-s3-fs-hadoop version you use).
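A sketch of how the endpoint and proxy entries from the table above might look in flink-conf.yaml when flink-s3-fs-hadoop is used. The host names and port are hypothetical and need to be replaced with values from your environment.

```yaml
# flink-conf.yaml, flink-s3-fs-hadoop in use (hypothetical hosts and port)
s3.endpoint: https://s3.example.internal   # forwarded as fs.s3a.endpoint
s3.proxy.host: proxy.example.internal      # forwarded as fs.s3a.proxy.host
s3.proxy.port: 8080                        # forwarded as fs.s3a.proxy.port
```

Any of the listed variants (s3., s3a., or fs.s3a.) can be used for the same option; one variant per option is enough.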
flink-s3-fs-presto
flink-s3-fs-presto native   |     flink-conf.yaml variants
==============================================================
presto.s3.access-key        <->   s3.access-key
                                  presto.s3.access-key
                                  s3.access.key
presto.s3.secret-key        <->   s3.secret-key
                                  presto.s3.secret-key
                                  s3.secret.key
presto.s3.endpoint          <->   s3.endpoint
                                  presto.s3.endpoint
presto.s3.max-error-retries <->   s3.max-error-retries
                                  presto.s3.max-error-retries
presto.s3.max-retry-time    <->   s3.max-retry-time
                                  presto.s3.max-retry-time
For more configuration options, please refer to Presto's S3 configuration docs, replacing the hive.s3 prefix with presto.s3 (the actual supported values may depend on the flink-s3-fs-presto version you use).
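A sketch of the retry options from the table above as they might appear in flink-conf.yaml when flink-s3-fs-presto is used. The numbers are illustrative assumptions, not tuned recommendations.

```yaml
# flink-conf.yaml, flink-s3-fs-presto in use (illustrative values)
s3.max-error-retries: 5   # forwarded as presto.s3.max-error-retries
s3.max-retry-time: 5m     # forwarded as presto.s3.max-retry-time
```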