12 minute read · September 2, 2024

Evaluating Dremio: Deploying a Single-Node Instance on a VM

Alex Merced · Senior Tech Evangelist, Dremio

If you are reading this, you are probably looking at Dremio as a potential solution to many different problems:

If any of the above propositions would add value to your organization, then it would be advisable to evaluate Dremio as a solution, and this guide can hopefully help you learn how to assess Dremio.

Individual Assessment on your Laptop with Docker

At this level, the goal is simply to get hands-on with Dremio and run a few queries to better understand its workflow and features. You should see strong performance when you connect to many of your datasets, but keep in mind that running Dremio on your laptop is limited by your laptop's specifications and internet connection.

With Docker installed, you can have Dremio running in moments with the following command in your terminal:

docker run -p 9047:9047 -p 31010:31010 -p 45678:45678 -p 32010:32010 -e DREMIO_JAVA_SERVER_EXTRA_OPTS=-Dpaths.dist=file:///opt/dremio/data/dist --name try-dremio dremio/dremio-oss

A few moments after running this command, you'll find a local instance of Dremio running at http://localhost:9047. With this local Dremio, you can connect your existing data sources (databases, data lakes, and data warehouses) and query data in each location. When you're done with this demo environment, you can stop it, and start it again later, with:

# turn off environment
docker stop try-dremio

# turn it back on
docker start try-dremio
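
Because the container needs a little time before the UI answers, a small polling helper can tell you when it is ready. The following is only a sketch: `wait_for_http` is a hypothetical helper name (not part of Dremio or Docker), and it assumes that a successful HTTP response from port 9047 means the web UI has finished starting.

```shell
# wait_for_http URL [TRIES] [DELAY]
# Polls URL with curl until it gets a non-error response or runs out of tries.
# (Hypothetical helper for illustration; not shipped with Dremio or Docker.)
wait_for_http() {
  url=$1
  tries=${2:-30}   # number of attempts before giving up
  delay=${3:-5}    # seconds to wait between attempts
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl fail on HTTP errors, -sS hides progress but keeps error text
    if curl -fsS -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

After starting the container, you could call it as `wait_for_http http://localhost:9047 30 5`, which retries for roughly two and a half minutes before giving up.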

If you want a more guided experience, we have many tutorials that simulate a range of workflows. We recommend allocating at least 6 GB of RAM to Docker for the more complex exercises.

These exercises will give you a good feel for what is possible when Dremio works with different sources. One of Dremio's best features, though, is not just accessing multiple sources yourself but collaborating with others on all your data in one place. For that, we'll need to deploy Dremio online.

Testing Out Collaboration with a Single-Node Deployment

You can deploy a single-node version of Dremio using a virtual machine from your favorite cloud compute provider (I tested the steps below on an AWS t2.medium instance). This deployment is not meant for production and is limited by the power of the compute you use. It does let you test the collaboration features by creating user accounts for some colleagues and letting them access the data you've connected to this deployment.

Provision a compute instance from your favorite provider (this guide assumes an Ubuntu-based VM) and SSH into that instance.

Save the following scripts to .sh files on your instance using nano or vim:

setup1.sh

# Update the package list and install UFW
sudo apt-get update -y
sudo apt-get install -y ufw

# Set default firewall policies to deny incoming connections
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Allow SSH and Dremio traffic through the firewall
sudo ufw allow OpenSSH
sudo ufw allow 9047/tcp    # Dremio Web UI
sudo ufw allow 31010/tcp   # Dremio ODBC/JDBC client connections
sudo ufw allow 32010/tcp   # Dremio Arrow/Flight client connections
sudo ufw allow 45678/tcp   # Dremio Internal Process Communication
sudo ufw enable

# Install Docker
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update -y
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

# Add the current user ($USER) to the docker group to run Docker without sudo
sudo usermod -aG docker $USER

setup2.sh

# Pull the Dremio Docker image
docker pull dremio/dremio-oss

# Run Dremio in Docker
docker run -d --name dremio \
  -e DREMIO_JAVA_SERVER_EXTRA_OPTS=-Dpaths.dist=file:///opt/dremio/data/dist \
  -p 9047:9047 \
  -p 31010:31010 \
  -p 45678:45678 \
  -p 32010:32010 \
  dremio/dremio-oss

# Notify that setup is complete
echo "Setup complete. Dremio is running as $USER."

nginx.sh (optional)

# Ensure the DOMAIN environment variable is set
if [ -z "$DOMAIN" ]; then
  echo "Error: DOMAIN environment variable is not set."
  exit 1
fi

# Ensure the EMAIL environment variable is set
if [ -z "$EMAIL" ]; then
  echo "Error: EMAIL environment variable is not set."
  exit 1
fi

# Install Nginx
sudo apt-get update -y
sudo apt-get install -y nginx

# Allow Nginx Full profile through UFW firewall
sudo ufw allow 'Nginx Full'

# Install Certbot and the Nginx plugin
sudo apt-get install -y certbot python3-certbot-nginx

# Create an initial Nginx server block configuration
sudo tee /etc/nginx/sites-available/dremio <<EOF
server {
    listen 80;
    server_name $DOMAIN;

    location / {
        proxy_pass http://localhost:9047;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF

# Enable the Nginx site configuration
sudo ln -s /etc/nginx/sites-available/dremio /etc/nginx/sites-enabled/

# Test the Nginx configuration
sudo nginx -t

# Reload Nginx to apply the configuration
sudo systemctl reload nginx

# Obtain an SSL certificate using Certbot
sudo certbot --nginx --non-interactive --agree-tos --email "$EMAIL" -d "$DOMAIN"

# Do a dry run to confirm that certificate renewal works
sudo certbot renew --dry-run

echo "Nginx reverse proxy with SSL setup complete. You can access Dremio via https://$DOMAIN"

Then follow these steps:

  • Run setup1.sh with the command "source setup1.sh"
  • Terminate the SSH connection
  • Reconnect with a new SSH session, so that your new docker group membership takes effect
  • Run setup2.sh with the command "source setup2.sh"
  • In a few minutes, Dremio should be available at http://IPADDRESS:9047
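
If the steps above don't behave as expected, two quick checks can narrow things down: whether the new session actually has the docker group, and whether the container is running. The following is a minimal sketch; `has_group` and `container_running` are hypothetical helper names, and the container name `dremio` is taken from setup2.sh.

```shell
# has_group "GROUP LIST" NAME
# Succeeds when NAME appears as a whole word in the space-separated list.
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
  esac
  return 1
}

# container_running NAME
# Succeeds when docker inspect reports the container's State.Running as true.
container_running() {
  state=$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)
  [ "$state" = "true" ]
}

# Before setup2.sh (id -nG prints the current session's group names):
#   has_group "$(id -nG)" docker || echo "log out and back in first"
# After setup2.sh:
#   container_running dremio || docker logs --tail 50 dremio
```

The commented lines at the bottom show one way to use the helpers; the second falls back to printing the last 50 log lines when the container is not running.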

At this point, you can already start working with Dremio using the compute instance's IP address. If you want to use a domain name with an SSL certificate, export the DOMAIN and EMAIL environment variables and then run the nginx.sh script, assuming your domain's DNS settings point to your instance. Run it with "bash nginx.sh" rather than "source", because the script calls exit when a variable is missing, and in a sourced script that would close your SSH session. Other things to keep in mind:

  • Your cloud provider must allow traffic to the instance on the ports it is using
  • You need the credentials to SSH into the instance
  • If you are using a domain, its DNS settings must have propagated
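
Certbot can only issue a certificate once the domain resolves to your instance, so it can help to check DNS before running nginx.sh. The following is a rough sketch: `dns_matches` is a hypothetical helper name, it relies on `getent ahostsv4` (available on Ubuntu and other glibc-based systems), and the domain and IP address in the usage note are placeholders.

```shell
# dns_matches DOMAIN EXPECTED_IP
# Succeeds when the first IPv4 address DOMAIN resolves to equals EXPECTED_IP.
# (Hypothetical helper for illustration.)
dns_matches() {
  domain=$1
  expected=$2
  # getent uses the system resolver; the first column of its output is the address
  resolved=$(getent ahostsv4 "$domain" | awk '{ print $1; exit }')
  if [ "$resolved" = "$expected" ]; then
    echo "ok: $domain -> $resolved"
    return 0
  fi
  echo "mismatch: $domain resolves to '${resolved:-nothing}', expected $expected" >&2
  return 1
}
```

For example, `dns_matches dremio.example.com 203.0.113.7` would succeed only once the record for that placeholder domain points at that placeholder address; substitute your own domain and your instance's public IP.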

Doing a POC with Dremio Cloud-Managed or Self-Managed

At this point, you and your colleagues have been able to experience the value of Dremio, and it's time to take the next step: a production POC with your organization's data to see Dremio directly solving your organization's challenges. You can deploy a cloud-managed version of Dremio on AWS or Azure in moments, or use Kubernetes to deploy a self-managed version that can run in any environment, in the cloud or on-prem. To get this process started, you can follow either of the two links:

Conclusion

I hope this guide has helped you on your journey in exploring whether Dremio is the right solution for eliminating data silos, reducing costs, and improving your organization's data outcomes overall.