This extension allows you to do two things: ingest data from objects stored in S3, and write segments to S3 deep storage.
To use this Apache Druid extension, include druid-s3-extensions in the extensions load list.
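As a minimal sketch, the load list entry could look like the fragment below. The property name druid.extensions.loadList is an assumption here (the text above only says "extensions load list"), so check it against your Druid version's configuration reference.

```properties
# runtime.properties (common configuration shared by all Druid services)
# Assumption: druid.extensions.loadList is the property behind the "extensions load list".
# Keep any extensions you already load in the same JSON array.
druid.extensions.loadList=["druid-s3-extensions"]
```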
Reading data from S3
To read objects from S3, you need to configure how Druid connects to S3. See Connecting to S3 configuration.
S3-compatible deep storage means either AWS S3 or a compatible service like Google Storage which exposes the same API as S3.
S3 deep storage needs to be explicitly enabled by setting druid.storage.type=s3. Only after setting the storage type to S3 will any of the settings below take effect.
Deep storage specific configuration
|Description||Default|
|Bucket to store in.||Must be set.|
|A prefix string that will be prepended to the object names for the segments published to S3 deep storage.||Must be set.|
|Global deep storage provider. Must be set to s3 to use S3 deep storage.||Must be set (likely s3).|
|S3 bucket name for archiving when running the archive task.||none|
|S3 object key prefix for archiving.||none|
|Boolean flag to disable ACL. When ACL is not disabled, additional permissions are required; see S3 permissions settings.||false|
|If true, use the "s3a" filesystem when using Hadoop-based ingestion. If false, the "s3n" filesystem will be used. Only affects Hadoop-based ingestion.||false|
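A minimal sketch of a deep storage configuration follows. Only druid.storage.type and druid.storage.disableAcl are named in this page's text; druid.storage.bucket and druid.storage.baseKey are assumed property names for the "bucket" and "prefix" rows, and the values are placeholders. Verify the names against the Druid configuration reference before use.

```properties
# runtime.properties: sketch of S3 deep storage settings

# Required switch: none of the S3 deep storage settings take effect without it.
druid.storage.type=s3

# Assumed property names for the bucket and key prefix rows; placeholder values.
druid.storage.bucket=example-druid-bucket
druid.storage.baseKey=druid/segments

# false is the listed default; with ACL enabled, extra S3 permissions are needed.
druid.storage.disableAcl=false
```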
S3 authentication methods
Druid uses the following credentials provider chain to connect to your S3 bucket, whether it is a deep storage bucket or a source bucket.
Note: You can override the default credentials provider chain for connecting to a source bucket by specifying an access key and secret key using Properties Object parameters in the ingestionSpec.
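|Order||Type||Details|
|1||Druid config file||Based on your runtime.properties if it contains the S3 access key and secret key values (see Connecting to S3 configuration).|
|2||Custom properties file||Based on a custom properties file in which you supply credentials (see Connecting to S3 configuration).|
|3||Environment variables||Based on environment variables.|
|4||Java system properties||Based on JVM properties.|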
|5||Profile information||Based on credentials you may have on your Druid instance.|
|6||ECS container credentials||Based on environment variables available on AWS ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_CONTAINER_CREDENTIALS_FULL_URI), as described in the EC2ContainerCredentialsProviderWrapper documentation.|
|7||Instance profile information||Based on the instance profile you may have attached to your Druid instance.|
More information about these authentication methods is available in the AWS SDK for Java documentation on credentials.
Note: Order is important here, as it indicates the precedence of authentication methods.
For example, to use instance profile information, you must not set druid.s3.secretKey (or the corresponding access key) in your Druid runtime.properties.
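The precedence rule can be illustrated with a small runtime.properties sketch. druid.s3.secretKey appears in the text above; druid.s3.accessKey is assumed here as its sibling property for the access key, and both values are placeholders.

```properties
# runtime.properties

# Option A: static credentials in the Druid config file.
# This source is first in the chain, so it takes precedence over every source after it.
# Assumption: druid.s3.accessKey is the access key counterpart of druid.s3.secretKey.
druid.s3.accessKey=EXAMPLE_ACCESS_KEY
druid.s3.secretKey=EXAMPLE_SECRET_KEY

# Option B: instance profile credentials, last in the chain.
# Remove or comment out both lines above, and make sure no source with higher
# precedence (properties file, environment variables, JVM properties, profile,
# ECS container credentials) supplies credentials.
```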
S3 permissions settings
Object read and write permissions, including s3:PutObject, are required for pushing/loading segments to/from S3.
When ACL is not disabled through druid.storage.disableAcl, ACL permissions, including s3:PutObjectAcl, are additionally required to set ACL for objects.
The AWS SDK requires that the target region be specified. Two ways of doing this are by using the JVM system property aws.region or an environment variable.
As an example, to set the region to 'us-east-1' through system properties:
Add -Daws.region=us-east-1 to the jvm.config file for all Druid services.
Add -Daws.region=us-east-1 to druid.indexer.runner.javaOpts in Middle Manager configuration so that the property will be passed to Peon (worker) processes.
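As a sketch of the Middle Manager side of this setup, the fragment below passes the region to Peon processes. The -server flag is only a stand-in for whatever JVM options you already set in this property. Each service's jvm.config file would separately carry the line -Daws.region=us-east-1.

```properties
# Middle Manager runtime.properties
# Options listed here are handed to Peon (worker) processes when they are launched.
# "-server" is a placeholder for your existing options; append the region flag to them.
druid.indexer.runner.javaOpts=-server -Daws.region=us-east-1
```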
Connecting to S3 configuration
|Description||Default|
|S3 access key. See S3 authentication methods for more details.||Can be omitted according to authentication methods chosen.|
|S3 secret key. See S3 authentication methods for more details.||Can be omitted according to authentication methods chosen.|
|Path to properties file containing credentials. See S3 authentication methods for more details.||Can be omitted according to authentication methods chosen.|
|Communication protocol type to use when sending requests to AWS. |
|Disables chunked encoding. See the AWS documentation for details.||false|
|Enables path style access. See the AWS documentation for details.||false|
|Enables global bucket access. See the AWS documentation for details.||false|
|Service endpoint either with or without the protocol.||None|
|Region to use for SigV4 signing of requests (e.g. us-west-1).||None|
|Proxy host to connect through.||None|
|Port on the proxy host to connect through.||None|
|User name to use when connecting through a proxy.||None|
|Password to use when connecting through a proxy.||None|
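|Server-side encryption type. Should be one of the supported types described for druid.storage.sse.type.||None|
|AWS KMS key ID. This is used only when druid.storage.sse.type selects KMS-based encryption.||None|
|Base64-encoded key. Should be specified if druid.storage.sse.type selects encryption with a user-supplied key.||None|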
You can enable server-side encryption by setting druid.storage.sse.type to a supported type of server-side encryption. The currently supported types are: