6 minute read · March 29, 2024

BI Dashboards with Apache Iceberg Using AWS Glue and Apache Superset

Alex Merced · Senior Tech Evangelist, Dremio

Business Intelligence (BI) dashboards aggregate, visualize, and analyze data to provide actionable insights and support data-driven decision-making. Serving these dashboards directly from the data lake, especially with technologies like Apache Iceberg, offers real-time data access, cost efficiency, and the elimination of data silos. Dremio, as a data lakehouse platform, enhances this setup by providing high-performance query acceleration and an integrated analytics layer, so that dashboards are timely and draw on rich, comprehensive data sources. This blog will guide you through using your AWS Glue catalog as a Dremio data source and Apache Superset as the BI tool to create and deliver BI dashboards. Combining these technologies makes the data in your data lake more accessible and actionable for all stakeholders.

Setting Up Our Environment

This exercise will use Docker Compose to set up a local Dremio and Superset environment. We will use the official Dremio Docker image and a custom Superset image with the requisite Dremio libraries installed. Create a docker-compose.yml with the following:

version: "3"

services:
  # Dremio
  dremio:
    platform: linux/x86_64
    image: dremio/dremio-oss:latest
    ports:
      - 9047:9047
      - 31010:31010
      - 32010:32010
    container_name: dremio
    environment:
      - DREMIO_JAVA_SERVER_EXTRA_OPTS=-Dpaths.dist=file:///opt/dremio/data/dist
    networks:
      dremio-superset:
  #Superset
  superset:
    image: alexmerced/dremio-superset
    container_name: superset
    networks:
      dremio-superset:
    ports:
      - 8080:8088
networks:
  dremio-superset:

To spin up the environment, run the following command with your terminal in the same folder as the docker-compose.yml:

docker compose up

This will spin up Dremio and Superset. To finish initializing Superset, open another terminal and run:

docker exec -it superset superset init
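Both containers can take a little while to start answering requests, so it can help to confirm they are reachable before moving on. Below is a minimal sketch of such a check in Python, using only the standard library. The helper name `wait_for_http` is hypothetical, and the choice of URLs to poll is an assumption based on the port mappings in the docker-compose.yml above (9047 for Dremio, 8080 for Superset).

```python
import http.client
import time
import urllib.request


def wait_for_http(url: str, timeout: float = 180.0, interval: float = 3.0) -> bool:
    """Poll `url` until it answers with a successful HTTP response.

    Returns True on the first successful response, or False once
    `timeout` seconds have passed without one.
    """
    # Ignore any system-wide proxy settings, since we are talking to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, timeout, or an HTTP error status:
            # treat all of these as "not ready yet" and retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example usage (assumed URLs, taken from the published ports above):
#   wait_for_http("http://localhost:9047")   # Dremio UI
#   wait_for_http("http://localhost:8080")   # Superset UI
```

A successful response only shows that the web server is answering. It does not guarantee that every background service inside the container has finished starting, so treat it as a rough signal.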

Connecting Your AWS Glue Catalog to Dremio

Go to localhost:9047 in your browser and create your Dremio user. Add a new “AWS Glue” data source.
Name the source “glue”, select your preferred AWS region, and enter your AWS credentials. The simplest way is to use your access key and secret key, but IAM roles are also supported if you prefer them.
Under the advanced options tab, add a connection property named “hive.metastore.warehouse.dir” whose value is the location where you want your data written when Dremio creates Iceberg tables in your Glue catalog.
Then click “Save” to add the data source.

Connecting Superset to Dremio

Dremio can be used with most existing BI tools, with one-click integrations in the user interface for tools like Tableau and Power BI. We will use Superset, an open-source option, for this exercise, but the experience is similar with any BI tool.

To get started, head over to localhost:8080 and log in to Superset with the username “admin” and password “admin”. Once you are in, click on “Settings” and select “Database Connections”.

Add a New Database
Select “Other”
Use the following connection string (make sure to include your Dremio username and password in the URL): dremio+flight://USERNAME:PASSWORD@dremio:32010/?UseEncryption=false
* For Dremio Cloud, see the Dremio documentation for details on what the URL should look like
Test connection
Save connection
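
The connection string in the steps above is a SQLAlchemy-style URL, so characters such as “@”, “/”, or “:” in a username or password need to be percent-encoded, or the URL will be parsed incorrectly. The following is a small sketch of how one might build the string safely in Python. The function name `dremio_flight_url` and its parameters are hypothetical. The defaults mirror the host, port, and `UseEncryption` value used in this exercise.

```python
from urllib.parse import quote


def dremio_flight_url(
    username: str,
    password: str,
    host: str = "dremio",   # service name from the docker-compose.yml
    port: int = 32010,      # Arrow Flight port published by the Dremio container
    use_encryption: bool = False,
) -> str:
    """Build a dremio+flight:// connection string with encoded credentials."""
    # safe="" forces every reserved character, including "/", to be encoded.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    flag = "true" if use_encryption else "false"
    return f"dremio+flight://{user}:{pwd}@{host}:{port}/?UseEncryption={flag}"


# Example with a made-up password containing reserved characters.
print(dremio_flight_url("admin", "p@ss/word:1"))
# prints: dremio+flight://admin:p%40ss%2Fword%3A1@dremio:32010/?UseEncryption=false
```

The resulting string can be pasted into the connection field in Superset. The host is `dremio` rather than `localhost` because Superset reaches Dremio over the shared Docker network.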

The next step is to add a dataset by clicking on the + icon in the upper right corner and selecting “create dataset”. From here, select the table you want to add to Superset, in this case, our sales_data table.

We can then click the + to add charts based on the datasets we’ve added. Once we create the charts we want, we can add them to a dashboard, and that’s it! It is as simple as that, and if you want to accelerate your dashboard even further, you can enable aggregate reflections on your underlying datasets for an additional boost.

Consider deploying Dremio into production to make delivering data for analytics easier for your data engineering team.
