Data Analytics on The Data Lake Using Apache Superset
Naren Sankaran
Introduction
Apache Superset is a modern, open source business intelligence (BI) web application that gives users an intuitive, visual, and interactive platform for data exploration. Some of the key features Superset offers are:
Over 30 types of visualizations
Easy-to-use builder for visualizations
Dashboards that are easy to share and collaborate on
Enterprise-ready authentication
A simple semantic layer that allows users to decide which fields they want to use in their visualizations
And much more.
Superset 1.0 was released on January 21, 2021, and Superset has graduated from the incubator to become a top-level project of the Apache Software Foundation. Superset provides first-class connectivity to Dremio via ODBC and Arrow Flight. Preset offers a SaaS version of Apache Superset.
To get started with Apache Superset and Dremio, install the Dremio SQLAlchemy connector on the VM where Apache Superset is running.
Creating a dataset and reflection in Dremio
To get started with Dremio, follow the Dremio getting-started tutorial. To follow along with the rest of this post, you will need to create a space called taxi and save a virtual dataset called trips in that space. The dataset is located in the sample source at "samples.dremio.com"."NYC-taxi-trips".
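Path components that contain dots or hyphens, such as `samples.dremio.com` and `NYC-taxi-trips`, have to be double-quoted in Dremio SQL. As an alternative to saving the view through the UI, the sketch below assembles a statement that creates `taxi.trips`. It assumes the sample source is mounted under its default name `Samples`, that the `taxi` space already exists, and that your Dremio version accepts the `CREATE VDS` statement (recent releases also accept `CREATE VIEW`). The helper functions are hypothetical.

```python
def quote_identifier(part: str) -> str:
    """Double-quote one path component, doubling any embedded double quotes."""
    return '"' + part.replace('"', '""') + '"'


def dataset_path(*parts: str) -> str:
    """Join components into a dotted, fully quoted dataset path."""
    return ".".join(quote_identifier(p) for p in parts)


def create_vds_sql(target: tuple[str, ...], source: tuple[str, ...]) -> str:
    """Build a statement that saves SELECT * FROM source as a virtual dataset."""
    return f"CREATE VDS {dataset_path(*target)} AS SELECT * FROM {dataset_path(*source)}"


# Assumed locations: default "Samples" source name and the "taxi" space from this post.
SAMPLE_TRIPS = ("Samples", "samples.dremio.com", "NYC-taxi-trips")
TAXI_TRIPS = ("taxi", "trips")

if __name__ == "__main__":
    print(create_vds_sql(TAXI_TRIPS, SAMPLE_TRIPS))
```

Quoting every component, not just the ones with special characters, keeps the helper simple and is still valid SQL identifier syntax.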
Then create an aggregate reflection on the trips dataset, as shown below:
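An aggregate reflection lets Dremio answer GROUP BY queries from data that has already been summarized by the chosen dimensions, rather than scanning every raw trip. The toy sketch below illustrates only that pre-aggregation idea; it is not how Dremio implements reflections, and the rows and the column roles (`passenger_count` as a dimension, `fare_amount` as a measure) are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (passenger_count, fare_amount). Not real Dremio data.
TRIPS = [(1, 10.0), (2, 15.5), (1, 7.5), (3, 20.0), (2, 4.5)]


def build_rollup(rows):
    """Pre-aggregate rows by one dimension, storing SUM and COUNT of the measure."""
    acc = defaultdict(lambda: [0.0, 0])
    for dimension, measure in rows:
        acc[dimension][0] += measure
        acc[dimension][1] += 1
    return {dim: (total, count) for dim, (total, count) in acc.items()}


def avg_by_dimension(rollup):
    """Answer AVG(measure) GROUP BY dimension using only the rollup."""
    return {dim: total / count for dim, (total, count) in sorted(rollup.items())}


def avg_by_dimension_full_scan(rows):
    """Compute the same answer by scanning every raw row, for comparison."""
    groups = defaultdict(list)
    for dimension, measure in rows:
        groups[dimension].append(measure)
    return {dim: sum(vals) / len(vals) for dim, vals in sorted(groups.items())}


if __name__ == "__main__":
    rollup = build_rollup(TRIPS)
    print(avg_by_dimension(rollup))  # {1: 8.75, 2: 10.0, 3: 20.0}
```

The rollup stores SUM and COUNT rather than an average because averages of groups cannot be recombined correctly, while sums and counts can.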
Configuring Dremio in Superset
Data > Databases > Add database:
ODBC Connection:
Arrow Flight Connection:
Change the hostname, username, and password to point to your Dremio cluster.
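Superset's Add database dialog accepts a SQLAlchemy URI. The helpers below sketch how such URIs can be assembled. The layouts follow the patterns we understand Superset's database documentation lists for the Dremio connector (`dremio://` for ODBC, `dremio+flight://` for Arrow Flight), so verify them against the connector version you install. The function names and the `database` path segment are assumptions for illustration; 31010 and 32010 are Dremio's default ODBC and Arrow Flight ports.

```python
from urllib.parse import quote_plus

ODBC_DEFAULT_PORT = 31010    # Dremio's default ODBC/JDBC client port
FLIGHT_DEFAULT_PORT = 32010  # Dremio's default Arrow Flight port


def _credentials(user: str, password: str) -> str:
    # URL-encode both parts so characters such as '@' or ':' do not break the URI.
    return f"{quote_plus(user)}:{quote_plus(password)}"


def dremio_odbc_uri(user: str, password: str, host: str, database: str,
                    port: int = ODBC_DEFAULT_PORT, ssl: bool = False) -> str:
    """Assumed pattern: dremio://user:pwd@host:port/<database>/dremio[?SSL=1]."""
    uri = f"dremio://{_credentials(user, password)}@{host}:{port}/{database}/dremio"
    return (uri + "?SSL=1") if ssl else uri


def dremio_flight_uri(user: str, password: str, host: str,
                      port: int = FLIGHT_DEFAULT_PORT) -> str:
    """Assumed pattern: dremio+flight://user:pwd@host:port/dremio."""
    return f"dremio+flight://{_credentials(user, password)}@{host}:{port}/dremio"


if __name__ == "__main__":
    # Hypothetical host and credentials; replace with your own cluster's values.
    print(dremio_flight_uri("analyst", "s3cr@t", "dremio.example.com"))
```

Encoding the password matters in practice: an unescaped `@` in a password would otherwise be read as the start of the host name.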
Creating the trips dataset
Data > Datasets > Add dataset:
Creating charts
Data > Charts > Add chart:
Creating a dashboard
Dashboards > Add dashboard:
You can do a live refresh:
Then check the Jobs page in Dremio to confirm that the queries Dremio executed completed in under a second:
To learn more about Dremio, visit our tutorials and resources. If you would like to experiment with Dremio in your own virtual lab, check out Dremio University, and if you have any questions, visit our community forums, where we are eager to help.
If you encounter any issues, please send an email to [email protected]