Data Analytics on the Data Lake Using Apache Superset

   

Introduction

Apache Superset is a modern, open source BI web application that provides users with an intuitive, visual, and interactive data exploration platform. Some of the key features that Superset offers are:

  • Over 30 types of visualizations
  • Easy to use constructor for visualizations
  • Easy to share and collaborate on dashboards
  • Enterprise-ready authentication
  • A simple semantic layer that allows users to decide which fields they want to use in their visualizations
  • And much more.

Superset 1.0 was released on January 21, 2021, and the project has graduated from the incubator to become a top-level project at the Apache Software Foundation. Superset provides first-class connectivity to Dremio via ODBC and Arrow Flight. Preset offers a SaaS version of Apache Superset.

To get started with Apache Superset and Dremio, install the Dremio SQLAlchemy Connector in the VM where Apache Superset is running.
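The connector is published on PyPI as `sqlalchemy_dremio` and can be installed with `pip install sqlalchemy_dremio` in the same Python environment that runs Superset. As a minimal sketch, assuming the importable module name matches the package name, the following check confirms that the interpreter can find it. The helper name `dremio_connector_installed` is hypothetical and introduced only for illustration.

```python
import importlib.util


def dremio_connector_installed(module_name: str = "sqlalchemy_dremio") -> bool:
    """Return True if the given top-level module can be located by this interpreter."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if dremio_connector_installed():
        print("Dremio SQLAlchemy connector found")
    else:
        print("Connector not found; try: pip install sqlalchemy_dremio")
```

Run it with the same interpreter that launches Superset; a connector installed in a different virtual environment will not be visible to Superset.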

Creating a dataset and reflection in Dremio

To get started with Dremio, follow this tutorial. To continue with the rest of this blog post, you will need to create a space called taxi and save a virtual dataset called trips in the taxi space. The location of this dataset in the sample source is: "samples.dremio.com"."NYC-taxi-trips"
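Because the sample path contains dots and hyphens, each path component has to be double-quoted when it is referenced in SQL. The sketch below builds a statement that could save the virtual dataset from Dremio's SQL editor instead of the UI. It assumes the sample source is named `Samples`, that the `taxi` space already exists, and that your Dremio version accepts `CREATE VDS`; the helper names are hypothetical.

```python
def quote_identifier(part: str) -> str:
    """Double-quote one path component, doubling any embedded quotes (standard SQL style)."""
    return '"' + part.replace('"', '""') + '"'


def quote_path(parts) -> str:
    """Join quoted components with dots to form a full dataset path."""
    return ".".join(quote_identifier(p) for p in parts)


# "Samples" is an assumed source name; the other two components come from the article.
SOURCE_PATH = ["Samples", "samples.dremio.com", "NYC-taxi-trips"]
TARGET_PATH = ["taxi", "trips"]


def create_vds_sql(target, source) -> str:
    """Build a statement that saves `source` as a virtual dataset at `target`."""
    return f"CREATE VDS {quote_path(target)} AS SELECT * FROM {quote_path(source)}"


if __name__ == "__main__":
    print(create_vds_sql(TARGET_PATH, SOURCE_PATH))
```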

[Image: getting started]

Then create an aggregate reflection like the one below:

[Image: aggregate reflection]

Configuring Dremio in Superset


Data > Databases > Add database:

ODBC Connection:

[Image: ODBC connection]

Arrow Flight Connection:

[Image: Arrow Flight connection]

Change the hostname, username, and password to point to your Dremio cluster.
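Superset takes the connection as a SQLAlchemy URI. The sketch below assembles one for each transport and percent-encodes the credentials so that characters such as `@` or `/` in a password do not break the URI. The URI shapes and schemes are assumptions based on the templates in the Superset documentation and may differ between connector versions, so verify them against the version you installed. The ports are Dremio's defaults (31010 for ODBC, 32010 for Arrow Flight), and the host name in the usage example is a placeholder.

```python
from urllib.parse import quote

ODBC_PORT = 31010    # Dremio default client port used by ODBC
FLIGHT_PORT = 32010  # Dremio default Arrow Flight port


def _credentials(user: str, password: str) -> str:
    """Percent-encode user and password so special characters survive URI parsing."""
    return f"{quote(user, safe='')}:{quote(password, safe='')}"


def odbc_uri(user, password, host, port=ODBC_PORT, database="dremio",
             scheme="dremio+pyodbc"):
    """Assumed ODBC URI shape: scheme://user:pass@host:port/database/dremio?SSL=1."""
    return f"{scheme}://{_credentials(user, password)}@{host}:{port}/{database}/dremio?SSL=1"


def flight_uri(user, password, host, port=FLIGHT_PORT):
    """Assumed Arrow Flight URI shape: dremio+flight://user:pass@host:port/dremio."""
    return f"dremio+flight://{_credentials(user, password)}@{host}:{port}/dremio"


if __name__ == "__main__":
    # Placeholder host and credentials; replace with your own cluster details.
    print(odbc_uri("admin", "p@ss/word", "dremio.example.com"))
    print(flight_uri("admin", "p@ss/word", "dremio.example.com"))
```

Paste the resulting string into the SQLAlchemy URI field of the Add database dialog and use the test connection button before saving.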

Creating the trips dataset

Data > Datasets > Add dataset:

[Image: trips dataset]

Creating charts

Charts > Add chart:

  • Total Count
  • Rides by quarter
  • Average total and tips by month
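To make explicit what each of these charts aggregates, here is a small sketch that computes the same three results over a few in-memory rows. The rows are made-up sample data, and the column names `pickup_datetime`, `total_amount`, and `tip_amount` are assumptions about the trips dataset rather than something stated in this post. In practice Superset pushes these aggregations down to Dremio as SQL.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample rows; values and column names are illustrative assumptions.
TRIPS = [
    {"pickup_datetime": datetime(2014, 1, 5, 8, 30), "total_amount": 10.0, "tip_amount": 2.0},
    {"pickup_datetime": datetime(2014, 1, 20, 18, 0), "total_amount": 20.0, "tip_amount": 4.0},
    {"pickup_datetime": datetime(2014, 4, 2, 9, 15), "total_amount": 15.0, "tip_amount": 0.0},
]


def total_count(trips):
    """Total Count: the number of rides."""
    return len(trips)


def rides_by_quarter(trips):
    """Rides by quarter: ride count keyed by (year, quarter)."""
    counts = Counter()
    for trip in trips:
        ts = trip["pickup_datetime"]
        counts[(ts.year, (ts.month - 1) // 3 + 1)] += 1
    return dict(counts)


def average_total_and_tip_by_month(trips):
    """Average total and tips by month: (avg total, avg tip) keyed by (year, month)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running total, running tip, row count
    for trip in trips:
        ts = trip["pickup_datetime"]
        bucket = sums[(ts.year, ts.month)]
        bucket[0] += trip["total_amount"]
        bucket[1] += trip["tip_amount"]
        bucket[2] += 1
    return {key: (total / n, tip / n) for key, (total, tip, n) in sums.items()}


if __name__ == "__main__":
    print(total_count(TRIPS))
    print(rides_by_quarter(TRIPS))
    print(average_total_and_tip_by_month(TRIPS))
```

All three are group-by aggregations over a time column, which is the query shape an aggregate reflection is meant to accelerate.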

Creating a dashboard

Dashboards > Add dashboard:

[Image: dashboard]

You can do a live refresh:

[Image: live refresh]

Then check the jobs page in Dremio to see that the queries were executed in under a second:

[Image: Dremio jobs page]

To learn more about Dremio, visit our tutorials and resources. If you would like to experiment with Dremio in your own virtual lab, check out Dremio University, and if you have any questions, visit our community forums, where we are eager to help.

If you encounter any issues, please send an email to [email protected].
