In this tutorial by Nirmalya Sen, we will show you how to analyze data stored in Amazon S3 with a Dremio cluster running on Amazon EKS. We will also show how to shut down the Dremio cluster and scale the EKS worker nodes down to zero to save on AWS infrastructure costs when Dremio is not in use. For more information about Dremio's different deployment methods, visit our Deploy page.
Prerequisites
To complete this tutorial, you should have the following tools set up in your shell:
kubectl
helm
awscli
eksctl
git
Alternatively, you can use the Docker image dremio/cloud-tools that has all these tools installed.
docker run -i -t dremio/cloud-tools bash
Configure your shell to use your AWS account:
aws configure
The command prompts you for four pieces of information (access key ID, secret access key, AWS Region, and output format) and stores them in a profile (a collection of settings) named default. This profile is used any time you run an AWS CLI command that doesn't explicitly specify a profile. See the AWS documentation for how to obtain an access key ID and secret access key.
Setting Up EKS Cluster
The quickest way to set up an EKS cluster is with the eksctl tool. You can adjust the number of nodes or the node type based on your needs.
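For example, a minimal eksctl invocation might look like the following sketch. The cluster name, node-group name, instance type, and node count here are all placeholder assumptions; adjust them to your needs.

```shell
# Hypothetical names and sizes -- adjust to your needs.
CLUSTER_NAME=dremio-cluster
NODEGROUP_NAME=dremio-nodes
NODE_TYPE=m5.2xlarge
NODE_COUNT=3

# Create the EKS cluster with a node group of worker nodes.
eksctl create cluster \
  --name "$CLUSTER_NAME" \
  --nodegroup-name "$NODEGROUP_NAME" \
  --node-type "$NODE_TYPE" \
  --nodes "$NODE_COUNT"
```

Cluster creation typically takes several minutes; eksctl prints progress as it provisions the control plane and worker nodes.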
After your EKS cluster is ready, set up Helm in your cluster. If you are using the dremio/cloud-tools Docker image, run helm-init.sh. Otherwise, set up Helm in the cluster manually.
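If you are setting up Helm manually, note that this tutorial uses Helm 2 commands (for example, helm delete --purge later on), so the setup amounts to creating a service account for Tiller, Helm 2's in-cluster component, and initializing Helm with it. A sketch:

```shell
# Create a service account for Tiller and grant it cluster-admin,
# then initialize Helm in the cluster (Helm 2 syntax).
kubectl --namespace kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller-cluster-rule \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:tiller
helm init --service-account tiller --wait
```

The --wait flag blocks until Tiller is up, so subsequent helm install commands will succeed immediately.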
Go to the directory dremio-cloud-tools/charts/dremio. Adjust the memory and CPU for the coordinator and executors, and the executor count, in values.yaml as needed (the defaults should work well if you did not change the node type and node count when creating the EKS cluster). You can also configure Dremio to store your uploads in an existing S3 bucket.
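The chart referenced above lives in Dremio's dremio-cloud-tools repository on GitHub; if you do not have it locally yet, clone it and change into the chart directory first:

```shell
# Fetch the Helm chart used in this tutorial.
git clone https://github.com/dremio/dremio-cloud-tools.git
cd dremio-cloud-tools/charts/dremio
```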
Deploy Dremio
helm install . --wait --timeout 900
Once it is deployed, go to the Dremio UI, register, and start using it. Get the hostname to connect to from the Kubernetes service.
kubectl get services dremio-client
The value of EXTERNAL-IP in the output of the above command is the hostname to connect to. For example, if the EXTERNAL-IP is ae315257aa03911e98bb90e46a9f1e9a-1557651590.us-west-2.elb.amazonaws.com, you would point your browser at that hostname on Dremio's web UI port (9047 by default).
You can run your queries directly in Dremio or you can use other clients to analyze your data.
Shutdown and Restart Dremio
You do not need to keep Dremio (and your EKS worker nodes) running when you are not using it. You can shut down Dremio, scale the EKS worker nodes down to zero, and bring everything back up when you need it again.
Shutting down Dremio
Find the helm release running Dremio.
helm list
Delete the release. For example, if the release name was invited-narwhal:
helm delete --purge invited-narwhal
When the following command returns no results, you can scale down the EKS cluster.
kubectl get pods
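Note that deleting the release removes the pods but not the persistent volumes holding Dremio's metadata; you can confirm they are still present before scaling down:

```shell
# PersistentVolumes and their claims survive the helm delete; they are
# what allow a later reinstall to restore Dremio's metadata.
kubectl get pv
kubectl get pvc
```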
Scale down EKS cluster
If you changed the cluster name or node-group name when creating the EKS cluster, you need to match the EKS cluster name and the name of the node-group with the ones you used when creating the cluster.
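Assuming the placeholder names dremio-cluster and dremio-nodes (substitute the names you actually used), scaling the worker nodes down to zero, and later back up, might look like this:

```shell
# Hypothetical cluster and node-group names -- use your own.
CLUSTER_NAME=dremio-cluster
NODEGROUP_NAME=dremio-nodes

# Scale the worker nodes down to zero when Dremio is not needed...
eksctl scale nodegroup --cluster "$CLUSTER_NAME" \
  --name "$NODEGROUP_NAME" --nodes 0

# ...and back up when you need Dremio again.
eksctl scale nodegroup --cluster "$CLUSTER_NAME" \
  --name "$NODEGROUP_NAME" --nodes 3
```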
Once your EKS cluster is scaled back up, you can install Dremio again.
helm install . --wait --timeout 900
Installing Dremio again restores the existing Dremio metadata, because Kubernetes retains persistent volumes even when the stateful sets using those volumes are deleted. The hostname for the Dremio cluster will be different, however; find it the same way as before, from the EXTERNAL-IP in the output of:
kubectl get services dremio-client
Login with the user you created when you registered Dremio the first time.
Conclusion
In this tutorial, we walked through deploying Dremio on AWS with Amazon EKS, and we connected Dremio to an S3 bucket to analyze the data it contains. Deploying Dremio on EKS is straightforward, especially with the help of a powerful tool like Helm. If you would like to learn more about how you can gain insights from your data faster, check out the rest of our tutorials and Dremio University.