Unlocking Data-as-a-Service for Apache Superset using Dremio
Superset is a modern BI web application project that is in the incubating stages at The Apache Software Foundation. It is an open source project that provides users with an intuitive, visual and interactive data exploration platform. Some of the key features that Superset offer are:
- Over 30 types of visualizations
- Druid.io integration
- Easy to use constructor for visualizations
- Easy to share and collaborate on dashboards
- Enterprise-ready authentication
- A simple semantic layer that allows users to decide which fields they want to use in their visualizations
- And much more.
In this tutorial we will explore how can we unlock Data-as-a-Service for Superset using Dremio. We will connect Dremio to two different data sources and then after preparing and curating the data, we will connect Superset to Dremio and generate a dashboard. Without further ado, let’s dive right in!
For this tutorial we expect that you have Superset installed and configured, if you haven’t done so, take a look at the Superset installation documentation, there you will find detailed information about pre-requisites, and installation steps for different operating systems.
- Be familiar with the fundamentals of Dremio (we will be using Dremio 3.2)
- Have the latest version of python and pip installed
For this tutorial, I will be installing Supserset on Mac OSX Mojave. If you need details about how to install it on a different OS, take a look at their documentation page, they have very well detailed information about how to do so.
First, we need to update pip
1 pip install --upgrade setuptools pip
Then follow these steps to install Superset
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 # Install superset pip install superset # Create an admin user (you can hit enter in each entry to leave values as default, however, you will need to set an admin password) fabmanager create-admin --app superset # Initialize the database superset db upgrade # Load sample data superset load_examples # Create default roles and permissions superset init # To start a development web server on port 8088, use -p to bind to another port superset runserver -d
Once all the steps are completed, open a new tab in your browser and go to http://localhost:8088/ here you will be presented with the login screen
If you left the default values when you were installing Superset, just type admin and the password that you chose in the respective fields.
Before we continue on Superset, let’s switch gears for a moment and get our data ready in Dremio.
Connecting to your data sources
In this scenario I’ll go ahead and play the role of a data analyst, as such, I would like to analyze data about the employees that we currently have in our company (disclaimer: the datasets that I’m using are fictional).
First thing I need to do is find the employees dataset. Since I’m familiar with the data source, I can navigate directly to the datasource and find the table that I need to work with which I know is stored in the PG datasource.
However, if you are not familiar with the data, Dremio provides the option to search the catalog for the dataset that you need. To do so, simply type the word that you want to search by in the “search catalog” entry and hit enter. I’ll go ahead and search for “employees” and Dremio will automatically provide a list of available datasets that match the string I typed.
Select the dataset that you want to preview
Once we verify that the values of the fields are the ones that we are expecting, we can move into doing the preparation of the data.
Prepping the data
In this scenario, to have a complete dataset, I need to join my ‘employees’ Virtual Dataset (VDS) with data about the departments where each one of these employees work. The catch is that this dataset is located in SQL Server. In an ordinary environment, I would have to put together an ETL process to extract the data from each data source and then join both of them and then copy the results into a different data set so I can then analyze it.
Let’s get that done in Dremio with just a few clicks.
First, click “join” on the data preview screen
Here you have several options, you can create a custom join with any data set, or you can accept any of the suggestions that Dremio is proposing. Additionally, you also have the option to select the type of join to be done: Inner, outer, etc.
I’ll select one of Dremio’s suggestions, inner join with the ‘Departments’ dataset located in SQL Server
After clicking ‘Apply’, Dremio will present us with a virtual dataset that contain all the information we need for this analysis.
Now let’s save the dataset. I’ll click “Save as”, give it a name and select the space where I want to save it.
If there are any changes that you want to do to the data, this is the moment to do so. In our Data Curation with Dremio tutorial, we explain in detail all the tasks that you can perform when trying to prepare your datasets. I encourage you to take a look. For know, let’s get our connection with Superset ready.
Creating and Connecting a Database in Superset
At this point you should have Superset open in a different tab. On the main screen, select “Sources” and then from the drop-down menu select “Databases”
Then select the plus (+) sign on the main screen
Now type in the following parameters in the indicated fields
SQLAlchemy URI: dremio+pyodbc://[Dremio username]:[Dremio password]@[Dremio host]:31010/dremio
Expose in SQL Lab: check
Allow Run Sync: check
NOTE: Replace the parameters in the SQLAlchemy URI to match the information from your environment. In my case the URI looks like the following:
Now, click on “Test Connection”, if everything went well, you should see the following pop-up
After all the parameters have been typed in, scroll down and then click ‘Save’
If everything went as expected, you will see your newly created source in the “List Databases” menu
Adding a new table
At this point we have successfully added a new data source to Supserset, which in this case is Dremio. Now we need to add the table that we want to work with. To do this, first select “Tables” from the “Sources menu”
From the new screen, click on the plus sign and then from the database menu, select “Dremio”. Then I’ll go ahead and fill in the rest of the parameters based on the table that I want to use. In this case, is the employee VDS that we created at the beginning of the tutorial.
Once you click “Save”, Superset will present you with the list of available datasets
Previewing your data
Before we start visualizing data, I’ll run a query to double check the data that I connected to. To do this, I’ll go to the SQL Lab tab in Superset
Then I’ll select the database and schema that I want to query and hit “Run query”
Exploring your data
Now we can start exploring the data, to do so, head back to the ‘Sources’ drop-down menu and select ‘Tables’. Then, click on the name of the dataset that we added.
Superset will present you with a table visualization type by default
Now let’s walk through the steps of exploring the data that we are working with. First, I’ll change the default time range filter to ‘No Filter’
Then, since I want to know how many employees I have per department, I’ll group by “department_name” and then select “count( * )” as the metric to use
After the changes are done, hit the “Run query” button and then you will be presented with the results based on the parameters selected.
Before I continue creating other visualizations, I’ll go ahead and save this table as “Employees by Department”. To name a visualization, simply click on the “- untitled” field and add a name, then hit enter. The visualization will be saved automatically.
Now, I’m going to create a different visualization. I want to see what is the average salary for each one of the departments that we have in our dataset. To do so, I’ll select the “Bar chart” chart from the visualization options.
Then, I will change the metric configuration to represent the “salary” column with the AVG aggregate
After running the query, the new chart looks like the following
Now, I would like to create a filter that I can use later in a dashboard. To do so, change the visualization type to “Filter Box” and will leave the rest of the parameters with their default value.
Creating a Dashboard in Superset
Creating dashboards in Superset is very simple and intuitive. The first thing that we need to do is select “Dashboards” in the main menu and then click on the plus sign to create a new one
I’ll name it “Acme Employees”, assign an owner and then save it.
Once you save it, Superset will present you with the list of dashboards available, in this case the only one is “Acme Employees”, I’ll click on the name link which will take me to the dashboard editor.
Here, select “Edit Dashboard”
Then select “Your charts & filters”
In this screen you will be able to see all the elements available to build the dashboard, in this case, I have three different elements: 2 charts and 1 filter.
To add elements to the dashboard, open the visualization from the charts menu, then click “Save” and select “Add to existing dashboard”
Then select “save and go to dashboard”
The visualization will be automatically added to the dashboard, here we can continue adding more elements. I’ll click on “Edit dashboard” and finish customizing the dashboard with the rest of the elements that I have available.
Once I’m done, I’ll save the changes and I can start slicing the data using the filer that I created. For example, I can change the values of the filter so the dashboard can show me the data about employees who were hired between 1987 and 2005.
Then I can start answering questions such as the average monthly salary for the Sales department between 1995 and 2005 was $8K and the team had 29 employees.
We focused our attention in this tutorial on working with a modern BI tool such as Superset and Dremio. We showed you the steps to connect, curate and prepare data from disparate data sources using Dremio. Moreover, we walked through the steps of connecting Supserset to Dremio and then creating several visualizations and a filter that we finally used to create a dashboard.
We hope you found this tutorial useful. To learn more about Dremio, and experience the benefits of using Dremio Enterprise Edition, please visit Dremio University, a free online Dremio learning platform where you can become a Dremio expert in no time.