Dremio Jekyll

Configuring Dremio to Read S3 Files Using AWS STS Tokens

Mark Johnson

Intro

If you work for an organization that restricts access to long-term AWS S3 access keys and only provides temporary keys for data access, or you need to bring into Dremio a dataset from an external organization that likewise grants only temporary S3 access, you need a way to authenticate with AWS STS (Security Token Service) credentials. Before you go down the road of enabling S3 access via STS, though, make certain there really is no way to use a permanent key. Temporary credentials expire, and you could find yourself trying to execute a query only to discover that you forgot to update the S3 login information.

This how-to article walks through the steps to enable Dremio to access S3 files with a temporary STS key.

STEP 1: Get the AWS STS Credentials

Have someone whose AWS credentials allow access to the S3 bucket you are interested in request STS credentials, using either the AWS CLI command shown below or the AWS SDK. The example requests a token valid for 129600 seconds (36 hours).

~/.aws :> aws sts get-session-token --duration-seconds 129600
{
    "Credentials": {
        "SecretAccessKey": "7NPf1yPIRH88PJ4d1ayOlJ68nFaqJEVQSdaAQ3",
        "SessionToken": "FQoGZXIvYXdzEOb///////wEaDE9YmnarY04xH2h1MiKUAa2jyQ2PRZlyI4TGB44IOHTS1MZFbBW+ybNAdUVL6pW4o+FtDtkMTFvhfiicEb4XMAwasdadjr1Zk/rf2TsIgLOprUJdpWmv8s6VKKaAKWzXnQnOfOIZmBQ7gaKJX0gLaGN52cDaAPyGu4lkZzKaVyWSeL6wCz7AFoefxo9QrF1KHH/jrer0+nCHQP3CDalLVYLeoVEo6c787AU=",
        "Expiration": "2019-10-10T13:43:21Z",
        "AccessKeyId": "ASIAZBEVQE24GVQ"
    }
}
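For the SDK route, a minimal Python sketch might look like the following. It assumes boto3 is installed and that the caller's default AWS credentials are allowed to call `sts:GetSessionToken`; the helper names `extract_sts_fields` and `fetch_session_token` are hypothetical. Note that boto3 returns `Expiration` as a `datetime`, whereas the CLI prints an ISO-8601 string, so the helper normalises both.

```python
from datetime import datetime


def extract_sts_fields(response: dict) -> dict:
    """Pull the values Dremio needs out of a GetSessionToken response.

    Works on the dict returned by boto3 and on the parsed JSON printed
    by `aws sts get-session-token`.
    """
    creds = response["Credentials"]
    expiration = creds["Expiration"]
    # boto3 returns a datetime; the CLI JSON contains an ISO-8601 string.
    if isinstance(expiration, datetime):
        expiration = expiration.isoformat()
    return {
        "access_key_id": creds["AccessKeyId"],
        "secret_access_key": creds["SecretAccessKey"],
        "session_token": creds["SessionToken"],
        "expiration": expiration,
    }


def fetch_session_token(duration_seconds: int = 129600) -> dict:
    """Request temporary credentials using the caller's default AWS profile."""
    import boto3  # imported lazily so extract_sts_fields works without boto3

    sts = boto3.client("sts")
    response = sts.get_session_token(DurationSeconds=duration_seconds)
    return extract_sts_fields(response)
```

Calling `fetch_session_token()` with valid credentials returns the access key ID, secret, session token and expiration in one dictionary, ready to be copied into Dremio in the steps that follow.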

STEP 2: Create an S3 Data Source in Dremio

Start with a clean slate by creating a new S3 data source in Dremio.

Now that you have the STS credentials from STEP 1, you are ready to create an S3 data source in Dremio. Log into your Dremio instance and select the ‘+’ button next to the word ‘Sources’, as shown below.


You will now see a list of the data sources Dremio supports. From this list, select ‘Amazon S3’.


STEP 3: Fill in the AWS STS Access Key and Secret on the S3 General Page

Now you are ready to enter the AWS STS credentials on the S3 source General screen, as shown below. Make certain you have selected the ‘AWS Access Key’ authentication option on this screen. An alternative, more permanent approach is to select ‘EC2 Metadata’, which lets Dremio access S3 using the credentials associated with the EC2 instance it runs on. As long as that EC2 instance also has access to the S3 bucket you wish to query, EC2 Metadata authentication is in many ways the better choice, and if it works for you, you can stop here.

However, if your IAM policy does not allow full access to the desired S3 bucket, or you are trying to reach buckets that are in another region or managed under a different set of AWS credentials, and all you have is a temporary token, the remaining steps show how to proceed.

The ‘AWS Access Key’ field takes the ‘AccessKeyId’ property returned by the AWS CLI or an SDK-based program, and the ‘AWS Access Secret’ field takes the ‘SecretAccessKey’ property. The ‘SessionToken’ is not entered here; it goes into the Advanced Options described next.

You can also specify Public Buckets and a specific IAM role if you wish to filter the available buckets further. To keep this post simple, we will leave that write-up for another day.

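Because the three values from STEP 1 are easy to mix up, it can help to write the General page mapping down explicitly. The sketch below is illustrative only: the dictionary keys simply mirror the labels on Dremio's General screen (they are not a Dremio API), and the function name `general_page_values` is hypothetical. The prefix check relies on the fact that temporary access key IDs issued by STS begin with `ASIA`, while long-term IAM user keys begin with `AKIA`.

```python
def general_page_values(creds: dict) -> dict:
    """Map GetSessionToken 'Credentials' onto the Dremio S3 General page.

    The keys are UI labels used for illustration, not Dremio API fields.
    """
    access_key = creds["AccessKeyId"]
    if not access_key.startswith("ASIA"):
        # Long-term IAM user keys begin with AKIA and need no session token.
        raise ValueError(f"{access_key!r} does not look like a temporary STS key")
    return {
        "Authentication": "AWS Access Key",
        "AWS Access Key": access_key,
        "AWS Access Secret": creds["SecretAccessKey"],
        # The SessionToken is deliberately absent: it belongs in Advanced Options.
    }
```

Failing fast on an `AKIA` key catches the case where a permanent key was pasted by mistake, in which case the STS-specific settings are unnecessary.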

STEP 4: Fill in the S3 Advanced Options with the STS Token and Enable Compatibility Mode

With the General options set, we need to enable ‘Compatibility mode’ (absolutely required). In addition, we need to add a connection property named ‘fs.s3a.session.token’ and set its value to the ‘SessionToken’ from the AWS STS get-session-token output retrieved in STEP 1. We also need to specify a credentials provider for STS by adding the property ‘fs.s3a.aws.credentials.provider’ with the value ‘org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider’.


Then click ‘Save’ at the bottom of the screen to complete the Dremio S3 data source definition using the STS token.
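The two connection properties described above are standard Hadoop S3A settings. As a small illustration, the hypothetical helper below assembles them as name/value pairs ready to be copied into the Advanced Options screen; it does not talk to Dremio, and ‘Compatibility mode’ remains a separate checkbox on the same screen rather than a connection property.

```python
from typing import List, Tuple

# S3A provider that combines access key, secret key and session token.
TEMP_PROVIDER = "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"


def advanced_connection_properties(session_token: str) -> List[Tuple[str, str]]:
    """Return the S3A connection properties needed for an STS session token."""
    if not session_token:
        raise ValueError("session token is empty")
    return [
        # The SessionToken value from `aws sts get-session-token`.
        ("fs.s3a.session.token", session_token),
        # Tells S3A to authenticate with temporary (session) credentials.
        ("fs.s3a.aws.credentials.provider", TEMP_PROVIDER),
    ]
```

Keeping the provider class name in one constant avoids typos in the long property value, which is an easy mistake to make when typing it into the form by hand.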

FINAL RESULT

You now have access to the S3 bucket using your temporary AWS token until the token expires, which in this case is 1:43 PM UTC on 10/10/2019 (so this example token has long since expired). While the token was still valid, the example below uses the master (long-term) credentials to list the S3 buckets.

~/.aws :> aws s3 ls
2019-04-10 21:30:29 dremio-dwn
2016-07-21 15:52:02 mfj
2019-05-07 13:03:41 mfjtaxi
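The listing above uses the master credentials. To check that the temporary credentials see the same buckets before relying on them in Dremio, you can run the same `aws s3 ls` with them. A minimal sketch, assuming the AWS CLI is on the PATH: it passes the three STS values through the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` environment variables, which the CLI reads ahead of the shared credentials file. The function names are hypothetical.

```python
import os
import subprocess


def sts_env(access_key_id: str, secret_access_key: str, session_token: str) -> dict:
    """Copy the current environment and inject temporary STS credentials."""
    env = dict(os.environ)
    env.pop("AWS_PROFILE", None)  # avoid mixing in a named profile
    env["AWS_ACCESS_KEY_ID"] = access_key_id
    env["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    env["AWS_SESSION_TOKEN"] = session_token
    return env


def list_buckets_with_sts(access_key_id: str, secret_access_key: str,
                          session_token: str) -> str:
    """Run `aws s3 ls` with the temporary credentials and return its output."""
    result = subprocess.run(
        ["aws", "s3", "ls"],
        env=sts_env(access_key_id, secret_access_key, session_token),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError, e.g. once the token has expired
    )
    return result.stdout
```

If the output matches the master-credentials listing, any problem you then see in Dremio is in the source configuration rather than in the token itself.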

Then, going into Dremio with a valid STS token, you will see the same S3 bucket list that is accessible with the master credentials. Just drill down into a bucket to find the specific file you want, and everything works the same way as with a conventional S3 connection.


Once the STS token has expired, the S3 STS data source is shown in red, and if you click edit details for that source, as shown below, the expired-token error message is displayed.

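Because the data source stops working at the `Expiration` time without any other warning, it can be worth checking how long a token has left before entering it. The sketch below uses only the standard library; the function names are hypothetical. Note that the CLI prints `Expiration` in UTC (the trailing `Z`), so the example token expired at 13:43:21 UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_expiration(value: str) -> datetime:
    """Parse an STS Expiration string such as '2019-10-10T13:43:21Z'."""
    # fromisoformat() before Python 3.11 rejects a trailing 'Z', so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def time_remaining(value: str, now: Optional[datetime] = None) -> timedelta:
    """Return how long the token is still valid; negative once it has expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_expiration(value) - now
```

For the token in this post, `time_remaining("2019-10-10T13:43:21Z")` is negative today, which corresponds to the red, expired data source described above.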