How to set arbitrary S3 configuration options (Hadoop S3A, Presto S3) in Flink?

Question

The Flink documentation on S3 shows only a few configuration examples, such as how to configure access credentials. How can I set arbitrary configuration parameters like presto.s3.max-retry-time or fs.s3a.attempts.maximum in Flink?

Answer

Note: This section applies to Flink 1.5 or later.

You can configure both S3 file system implementations via flink-conf.yaml. Flink forwards a configuration parameter to the native implementation only if its key starts with one of the following prefixes:

  • Hadoop S3A: s3. | s3a. | fs.s3a.
  • Presto S3: s3. | presto.s3.
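
As a minimal sketch, the two parameters from the question could be set in flink-conf.yaml under their native, fully prefixed names. The values are placeholders chosen for illustration, not recommendations, and the value formats are assumptions to be checked against the upstream documentation.

```yaml
# flink-conf.yaml (illustrative values only)

# Matches the "presto.s3." prefix, so it is forwarded to flink-s3-fs-presto.
# The duration format "5m" is an assumption; check Presto's S3 docs.
presto.s3.max-retry-time: 5m

# Matches the "fs.s3a." prefix, so it is forwarded to flink-s3-fs-hadoop.
# The value 10 is a placeholder.
fs.s3a.attempts.maximum: 10
```

According to the prefix list above, a key written with the generic s3. prefix matches both implementations, whereas the fully prefixed names target only one of them.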

Examples

The following tables show how the mapping is done. Note one special replacement: s3.access-key/s3.access.key and s3.secret-key/s3.secret.key can be used with both implementations even though their native configuration names differ.

flink-s3-fs-hadoop

flink-s3-fs-hadoop native     |   flink-conf.yaml variants
==============================================================
fs.s3a.access.key            <->  s3.access.key
                                  s3a.access.key
                                  fs.s3a.access.key
                                  s3.access-key
fs.s3a.secret.key            <->  s3.secret.key
                                  s3a.secret.key
                                  fs.s3a.secret.key
                                  s3.secret-key
fs.s3a.endpoint              <->  s3.endpoint
                                  s3a.endpoint
                                  fs.s3a.endpoint
fs.s3a.proxy.host            <->  s3.proxy.host
                                  s3a.proxy.host
                                  fs.s3a.proxy.host
fs.s3a.proxy.port            <->  s3.proxy.port
                                  s3a.proxy.port
                                  fs.s3a.proxy.port

For more configuration options, refer to Hadoop's AWS module documentation. The options actually supported may depend on the flink-s3-fs-hadoop version you use.
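
The Hadoop mapping can be sketched with the short s3. prefix. The host names and port below are hypothetical placeholders; only the keys come from the table above.

```yaml
# flink-conf.yaml (hypothetical endpoint and proxy values)

# Forwarded to flink-s3-fs-hadoop as fs.s3a.endpoint
s3.endpoint: https://s3.example.com

# Forwarded as fs.s3a.proxy.host
s3.proxy.host: proxy.example.com

# Forwarded as fs.s3a.proxy.port
s3.proxy.port: 8080
```

Writing the same keys as s3a.endpoint or fs.s3a.endpoint should be equivalent, since all three variants map to the same native option.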

flink-s3-fs-presto

flink-s3-fs-presto native     |   flink-conf.yaml variants
==============================================================
presto.s3.access-key         <->  s3.access-key
                                  presto.s3.access-key
                                  s3.access.key
presto.s3.secret-key         <->  s3.secret-key
                                  presto.s3.secret-key
                                  s3.secret.key
presto.s3.endpoint           <->  s3.endpoint
                                  presto.s3.endpoint
presto.s3.max-error-retries  <->  s3.max-error-retries
                                  presto.s3.max-error-retries
presto.s3.max-retry-time     <->  s3.max-retry-time
                                  presto.s3.max-retry-time

For more configuration options, refer to Presto's S3 configuration documentation and replace the hive.s3 prefix with presto.s3. The options actually supported may depend on the flink-s3-fs-presto version you use.
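
The prefix replacement can be sketched as follows: an option that Presto documents as hive.s3.max-error-retries is written in flink-conf.yaml as presto.s3.max-error-retries. The value is a placeholder for illustration only.

```yaml
# flink-conf.yaml (illustrative value only)

# Presto documents this option as hive.s3.max-error-retries;
# in Flink the hive.s3 prefix is replaced with presto.s3.
presto.s3.max-error-retries: 5
```

Per the table above, the shorter form s3.max-error-retries maps to the same native option.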

Related Information