Visualize your data lake with Apache Superset and Dremio

Jan 22, 2020
Naren Sankaran

Superset is a modern BI web application currently incubating at The Apache Software Foundation. It is an open source project that provides users with an intuitive, visual, and interactive data exploration platform. Some of the key features Superset offers are:

  • Over 30 types of visualizations
  • Druid.io integration
  • Easy-to-use constructor for visualizations
  • Easy to share and collaborate on dashboards
  • Enterprise-ready authentication
  • A simple semantic layer that allows users to decide which fields they want to use in their visualizations
  • And much more.

Today, I am excited to announce that Dremio has been integrated into Apache Superset, and the integration is now available in Superset's master branch. It is expected to ship in the next Superset release, version 0.35.3, within the next couple of weeks. If you would like to test Dremio with Superset right away, you can use the following code snippet.

This snippet does the following:

  • Clones Superset from source (pinned to commit 0cf354c)
  • Installs the dependencies (sqlalchemy_dremio, Dremio’s ODBC driver)
  • Creates the first admin user for Superset
  • Starts Superset with Gunicorn

Setting up the EC2 Instance


Launch a new EC2 instance in your AWS account. In this scenario I worked with a CentOS 7 AMI: from the AWS Marketplace options, search for CentOS and select the first option in the list.

Next, select the instance type you would like to use and click Configure Instance Details. Edit any parameters you need, then scroll down to the User Data field and paste the following snippet:


Author’s note: take a moment to review the code before you launch your instance; some parameters may need to be edited for your environment.


#!/bin/bash
# The following script can be used for testing Dremio with Superset. It has been tested on CentOS 7. You can copy and paste this entire script into the User Data field while provisioning an EC2 instance with CentOS 7 as the AMI.
# To run Superset on port 80, this script must be run as root (EC2 user data scripts run as root). Otherwise, change the last line by replacing 80 with a port of 1024 or higher.
# If you also want Dremio CE on the same instance, uncomment the two commands under "Java 8 and Dremio" and the curl command under "Create dremio user".
# Dremio's username and password: dremio, dremio123
# Superset's username and password: admin, admin
# Connection URI example: dremio://dremio:dremio123@localhost:31010/dremio

# Dependencies for Superset and the Dremio connector (Node.js, build tools, unixODBC, Dremio's ODBC driver)

curl -sL https://rpm.nodesource.com/setup_13.x | sudo -E bash - && sudo yum update -y && sudo yum install -y unixODBC unixODBC-devel python3 python3-setuptools gcc gcc-c++ libffi-devel python3-devel python3-pip python3-wheel openssl-devel cyrus-sasl-devel openldap-devel python-virtualenv git nodejs https://download.dremio.com/odbc-driver/1.4.2.1003/dremio-odbc-1.4.2.1003-1.x86_64.rpm

# Java 8 and Dremio
# sudo yum install -y java-1.8.0-openjdk-devel http://download.dremio.com/community-server/dremio-community-LATEST.noarch.rpm
# sudo service dremio start

# Superset from source, pinned to commit 0cf354c
git clone https://github.com/apache/incubator-superset.git && cd incubator-superset && git checkout 0cf354c
virtualenv -p python3 venv && source venv/bin/activate
pip3 install -r requirements.txt
pip3 install -r requirements-dev.txt
pip3 install sqlalchemy_dremio
# Build the frontend assets, then install Superset itself in editable mode
cd superset/assets && npm ci && npm run build
cd ../.. && pip3 install -e .
pip3 install setuptools==45.0.0
# Initialize Superset's metadata database, then create default roles and permissions
superset db upgrade
superset init

# Create admin user for Superset (the flask CLI needs FLASK_APP to locate the Superset app)
export FLASK_APP=superset
flask fab create-admin --username admin --firstname admin --lastname admin --email admin@admin --password admin

# Create dremio user (only works if Dremio CE was installed and started above)
# curl 'http://localhost:9047/apiv2/bootstrap/firstuser' -X PUT -H 'Authorization: _dremionull' -H 'Content-Type: application/json'  --data-binary '{"userName":"dremio","firstName":"dremio","lastName":"dremio","email":"dremio@dremio.com","createdAt":1557027923359,"password":"dremio123"}' --compressed

# Run Superset with Gunicorn as a background daemon
gunicorn -w 4 -b 0.0.0.0:80 --timeout 120 --limit-request-line 0 --limit-request-field_size 0 --daemon "superset.app:create_app()"
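
The port rule in the script's header comments comes from how Linux treats low-numbered ports. The following is a minimal sketch, assuming a default Linux configuration where ports below 1024 are privileged and only root (UID 0) can bind them without extra capabilities; the helper name can_bind_port is hypothetical and not part of the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: can a process running as UID bind TCP PORT?
# Assumes a default Linux setup: ports below 1024 are privileged, and only
# root (UID 0) may bind them unless extra capabilities are configured.
can_bind_port() {
  local port="$1" uid="$2"
  # Reject values that are not plain numbers.
  if ! [[ "$port" =~ ^[0-9]+$ ]]; then
    return 2
  fi
  # Force base 10 so values like "0080" are not read as octal.
  port=$((10#$port))
  # Reject numbers outside the TCP port range.
  if (( port < 1 || port > 65535 )); then
    return 2
  fi
  # Root may bind any port; other users need 1024 or higher.
  if (( uid == 0 || port >= 1024 )); then
    return 0
  fi
  return 1
}

# Example: decide which port to pass to gunicorn -b for the current user.
if can_bind_port 80 "$EUID"; then
  echo "Superset can listen on port 80"
else
  echo "Not running as root: use a port such as 8088 instead of 80"
fi
```

Returning distinct exit codes (1 for "not allowed", 2 for "invalid port") lets a calling script tell a permission problem apart from a typo in the port number.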

Launching and Accessing the UI


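When you are ready, click Review and Launch. The instance itself takes a few minutes to launch, and because the user data script keeps running after the instance boots (installing dependencies and building Superset from source), the Superset UI may take longer to become available.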
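
If you prefer the command line to the console, the same launch can be sketched with the AWS CLI's ec2 run-instances command, assuming the CLI is installed and configured and the script above has been saved to a local file. The function name launch_superset_instance, the AMI ID, key pair, security group ID, instance type, and file name are all placeholders for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the AWS CLI. Every argument is a placeholder
# for your own value (CentOS 7 AMI ID, instance type, key pair, security
# group ID, and the local file containing the user data script).
launch_superset_instance() {
  local ami_id="$1" instance_type="$2" key_name="$3" sg_id="$4" user_data_file="$5"
  # file:// makes the CLI read the script from disk and pass it as user data.
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type "$instance_type" \
    --key-name "$key_name" \
    --security-group-ids "$sg_id" \
    --user-data "file://$user_data_file" \
    --count 1 \
    --query 'Instances[0].InstanceId' \
    --output text
}

# Usage (placeholder values):
# launch_superset_instance ami-0123456789abcdef0 t2.large my-key sg-0123456789abcdef0 superset-dremio.sh
```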

Once the instance is up and running and the script has finished, you should be able to open Superset’s UI by entering the instance's public DNS name and port in your browser.
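
Because Gunicorn only starts at the very end of the user data script, it can help to poll the port from your own machine instead of refreshing the browser. The following is a minimal sketch, assuming bash with /dev/tcp support and the coreutils timeout command; the helper name wait_for_port and its attempt and delay defaults are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll HOST:PORT until a TCP connection succeeds or the
# attempts run out. Relies on bash's /dev/tcp redirection, so curl is not needed.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-60}" delay="${4:-10}"
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # The redirection succeeds only if something accepts the connection;
    # timeout keeps a filtered port (e.g. a closed security group) from hanging.
    if timeout 5 bash -c ': <>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      echo "$host:$port is accepting connections"
      return 0
    fi
    sleep "$delay"
  done
  echo "Gave up waiting for $host:$port after $attempts attempts" >&2
  return 1
}

# Usage: replace the host with your instance's public DNS name.
# wait_for_port ec2-xx-xx-xx-xx.compute-1.amazonaws.com 80
```

A TCP check only shows that something is listening; it does not confirm that Superset finished initializing, so a failed page load right after the port opens is still possible.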

To log in, use admin as both the username and the password. Dremio can then be added as a source by going to Databases > Add new Database and using a connection string like the following:

dremio://dremio:dremio123@localhost:31010/dremio
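
The connection string follows the pattern dremio://USER:PASSWORD@HOST:PORT/DATABASE. In SQLAlchemy-style URIs, characters such as @, : and / have special meaning, so a password containing them generally needs to be percent-encoded. The following is a minimal sketch, assuming that URI shape; the helper names urlencode and build_dremio_uri are hypothetical, and the defaults (localhost, 31010, dremio) are taken from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for use inside a URI.
# RFC 3986 unreserved characters (letters, digits, . ~ _ -) are kept as-is.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      # printf with a leading quote yields the character's byte value.
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: assemble a dremio:// URI in the same shape as the
# example connection string (defaults mirror that example).
build_dremio_uri() {
  local user="$1" password="$2" host="${3:-localhost}" port="${4:-31010}" db="${5:-dremio}"
  printf 'dremio://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$user")" "$(urlencode "$password")" "$host" "$port" "$db"
}

build_dremio_uri dremio dremio123
```

With the credentials from the script, this reproduces the example string exactly; a password such as p@ss:w/rd would instead be written as p%40ss%3Aw%2Frd.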

If you edited the script to install Dremio CE on the same EC2 instance, you should be able to reach Dremio at this point using the same DNS name with port 9047.

If entering the DNS name and port in your browser does not bring up the UI for Superset or Dremio, check the security group settings for your EC2 instance and make sure its inbound rules allow traffic on the ports you are using (80 for Superset and 9047 for Dremio in this setup). At this point you should be able to use Dremio and Superset; check out this article for more details about creating a dashboard with Superset and Dremio.
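
If the inbound rules are missing, they can also be added from the command line. The following is a minimal sketch using the AWS CLI's ec2 authorize-security-group-ingress command, assuming the CLI is installed and configured; the function name open_ports, the security group ID, and the CIDR block are placeholders. Limiting the CIDR to your own IP address (a /32) is generally safer than opening the ports to 0.0.0.0/0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: allow inbound TCP traffic from one CIDR block to each
# listed port on an EC2 security group. Stops at the first failing call.
open_ports() {
  local sg_id="$1" cidr="$2"
  shift 2
  local port
  for port in "$@"; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg_id" \
      --protocol tcp \
      --port "$port" \
      --cidr "$cidr" || return 1
  done
}

# Usage (placeholder values): open Superset (80) and Dremio (9047) to a single IP.
# open_ports sg-0123456789abcdef0 203.0.113.10/32 80 9047
```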

And that's a wrap! I hope you learned something useful today. To learn more about Dremio, visit our tutorials and resources. If you would like to experiment with Dremio in your own virtual lab, check out Dremio University, and if you have any questions, visit our community forums, where we are all eager to help.

If you encounter any issues, please send an email to naren@dremio.com.
