How to Deploy Dremio on Amazon EKS

Intro

In this tutorial, created by our Principal Engineer Nirmalya Sen, we show you how to analyze data stored in Amazon S3 with a Dremio cluster running on Amazon EKS. We also show how to shut down the Dremio cluster and scale the EKS worker nodes down to zero to save on AWS infrastructure costs when the cluster is not in use. For more information about Dremio’s other deployment methods, visit our Deploy page.

Pre-requisites

To complete this tutorial, you should have the following tools set up in your shell:

  • kubectl
  • helm
  • awscli
  • eksctl
  • git
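
One way to confirm that the tools listed above are available is a small check like the sketch below. The function name check_tools is a hypothetical helper introduced here for illustration, and the sketch assumes the AWS CLI is installed under its usual command name, aws.

```shell
# Report any of the required command-line tools that are missing from PATH.
# check_tools is a hypothetical helper name, not part of any of these tools.
check_tools() {
  missing=0
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the name
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools kubectl helm aws eksctl git
```

The function prints one line per missing tool and returns a non-zero status if any are absent, so it can also guard a setup script.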

Alternatively, you can use the Docker image dremio/cloud-tools that has all these tools installed.

docker run -i -t dremio/cloud-tools bash

Configure your shell to use your AWS account:

aws configure

The command prompts you for four pieces of information (access key, secret access key, AWS Region, and output format) and stores them in a profile (a collection of settings) named default. This profile is then used any time you run an AWS CLI command that doesn’t explicitly specify a profile. You can find more information on how to get an access key and secret access key in the AWS documentation.
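
To check that the stored profile resolves to the account you expect, a sketch like the following may help. It wraps the AWS CLI call `aws sts get-caller-identity`; the function name aws_account_id is a hypothetical helper, and the profile argument falls back to default as described above.

```shell
# Print the AWS account ID that the given profile resolves to.
# aws_account_id is a hypothetical helper name; the optional argument is
# a profile name and falls back to the "default" profile.
aws_account_id() {
  profile="${1:-default}"
  aws sts get-caller-identity \
    --profile "$profile" \
    --query Account \
    --output text
}

# Usage: aws_account_id            (uses the default profile)
#        aws_account_id my-profile (my-profile is a placeholder name)
```

If the credentials are missing or invalid, the AWS CLI reports an error instead of printing an account ID.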

Setting up EKS Cluster

The quickest way to set up an EKS cluster is with the eksctl tool. You can adjust the number of nodes or the node type based on your needs.

eksctl create cluster \
  --name dremio-test \
  --version 1.12 \
  --nodegroup-name dremio-test-workers \
  --node-type r5d.4xlarge \
  --nodes 5 --nodes-min 0 --nodes-max 5 \
  --node-ami auto \
  --region us-west-2

After your EKS cluster is ready, set up Helm in your cluster. If you are using the dremio/cloud-tools Docker image, run:

helm-init.sh

If you are not using the Docker image, run the following commands to set up Helm:

kubectl create serviceaccount -n kube-system tiller
kubectl create clusterrolebinding tiller-binding --clusterrole=cluster-admin --serviceaccount kube-system:tiller
helm init --service-account tiller --wait

Deploying Dremio

Dremio publishes Helm charts for deploying Dremio in a Kubernetes cluster. Clone the dremio-cloud-tools repo:

git clone https://github.com/dremio/dremio-cloud-tools.git

Go to the directory dremio-cloud-tools/charts/dremio. In values.yaml, adjust the memory and CPU for the coordinator and executors, and the executor count, to suit your needs (the defaults should work if you did not change the node type and node count when creating the EKS cluster). You can also configure Dremio to store your uploads in an existing S3 bucket.

Deploy Dremio

helm install . --wait --timeout 900

Once Dremio is deployed, open the Dremio UI, register, and start using it. Get the hostname to connect to from the Kubernetes service:

kubectl get services dremio-client

The EXTERNAL-IP value in the output of the above command is the hostname to connect to. For example, if EXTERNAL-IP is ae315257aa03911e98bb90e46a9f1e9a-1557651590.us-west-2.elb.amazonaws.com, you would connect to:

http://ae315257aa03911e98bb90e46a9f1e9a-1557651590.us-west-2.elb.amazonaws.com:9047
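
If you prefer not to copy the hostname by hand, the URL can be assembled with kubectl's JSONPath output, as in this sketch. The function name dremio_url is a hypothetical helper, and the sketch assumes the load balancer publishes a hostname (as in the example above) rather than an IP address.

```shell
# Build the Dremio UI URL from the load balancer hostname of the
# dremio-client service. dremio_url is a hypothetical helper name;
# 9047 is the UI port used in this tutorial.
dremio_url() {
  host=$(kubectl get service dremio-client \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
  if [ -z "$host" ]; then
    # The hostname can be empty while the load balancer is still provisioning
    echo "load balancer hostname not assigned yet" >&2
    return 1
  fi
  echo "http://${host}:9047"
}

# Usage: dremio_url
```

An empty result usually just means the load balancer is not ready yet, so running the function again after a short wait is a reasonable first step.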

Register and you are ready to use Dremio.

Analyzing Data in S3

Add your S3 bucket as a data source in Dremio.

Select Add Source.

Select Amazon S3

Add your S3 bucket

You are now ready to analyze your data.

You can run your queries directly in Dremio or you can use other clients to analyze your data.

Shutdown and Restart Dremio

You do not need to keep Dremio (and your EKS worker nodes) running when you are not using it. You can shut down Dremio and scale the EKS worker nodes down to zero, then bring them back up when you need Dremio again.

Shutting down Dremio

Find the helm release running Dremio.

helm list

Delete the release. For example, if the release name is invited-narwhal:

helm delete --purge invited-narwhal

When the following command returns no results, you can scale down the EKS cluster.

kubectl get pods
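
Rather than rerunning the command by hand, you could poll until the pod list is empty. The sketch below is one possible approach: wait_for_no_pods is a hypothetical helper, and the poll interval and attempt count are illustrative defaults, not values from the Dremio charts.

```shell
# Poll until `kubectl get pods` lists nothing in the current namespace.
# wait_for_no_pods is a hypothetical helper; its arguments are the poll
# interval in seconds (default 5) and the maximum attempts (default 60).
wait_for_no_pods() {
  interval="${1:-5}"
  attempts="${2:-60}"
  while [ "$attempts" -gt 0 ]; do
    # --no-headers leaves stdout empty when there are no pods
    if ! pods=$(kubectl get pods --no-headers 2>/dev/null); then
      echo "kubectl failed" >&2
      return 2
    fi
    if [ -z "$pods" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  echo "pods still present" >&2
  return 1
}

# Usage: wait_for_no_pods && echo "safe to scale down"
```

Returning a distinct status when kubectl itself fails avoids mistaking an unreachable cluster for an empty one.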

Scale down EKS cluster

If you changed the cluster name, node group name, or region when creating the EKS cluster, use your own values in the command below instead of dremio-test, dremio-test-workers, and us-west-2.

eksctl scale nodegroup \
  --cluster dremio-test \
  --name dremio-test-workers \
  --nodes 0 \
  --region us-west-2

Scale up EKS cluster

This is the same as the command to scale down the cluster, except for the number of nodes.

eksctl scale nodegroup \
  --cluster dremio-test \
  --name dremio-test-workers \
  --nodes 5 \
  --region us-west-2
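
Because the scale-down and scale-up commands differ only in the node count, they can be wrapped in a single function, as in the sketch below. The name scale_dremio_nodes and the CLUSTER, NODEGROUP, and REGION environment variables are hypothetical conveniences introduced here; their defaults mirror the names used in this tutorial.

```shell
# Scale the worker node group to the requested node count.
# scale_dremio_nodes is a hypothetical helper; the defaults mirror the
# cluster, node group, and region used in this tutorial and can be
# overridden through the CLUSTER, NODEGROUP, and REGION variables.
scale_dremio_nodes() {
  nodes="$1"
  case "$nodes" in
    # Reject an empty argument or anything that is not a whole number
    ''|*[!0-9]*)
      echo "usage: scale_dremio_nodes <node-count>" >&2
      return 2
      ;;
  esac
  eksctl scale nodegroup \
    --cluster "${CLUSTER:-dremio-test}" \
    --name "${NODEGROUP:-dremio-test-workers}" \
    --nodes "$nodes" \
    --region "${REGION:-us-west-2}"
}

# Usage: scale_dremio_nodes 0   (scale down)
#        scale_dremio_nodes 5   (scale back up)
```

Keeping the names in one place makes it harder for the scale-down and scale-up invocations to drift apart if you rename the cluster or node group.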

Re-install Dremio

Once your EKS cluster is scaled up, you can install Dremio again.

helm install . --wait --timeout 900

Installing Dremio again restores the existing Dremio metadata, because Kubernetes retains persistent volumes even after the stateful sets using them are deleted. The hostname for the Dremio cluster will be different, however. Find the new hostname the same way as before, from the EXTERNAL-IP value in the output of this command:

kubectl get services dremio-client

Log in with the user you created when you first registered with Dremio.


Conclusion

In this tutorial, we walked through the steps of deploying Dremio on AWS using Amazon EKS, and connected Dremio to an S3 bucket to analyze the data in it. With Helm, deploying Dremio on EKS is straightforward. If you would like to learn more about how you can gain insights from your data faster, check out the rest of our tutorials and Dremio University.