Business Intelligence (BI) dashboards aggregate, visualize, and analyze data to provide actionable insights and support data-driven decision-making. Serving these dashboards directly from the data lake, especially with a table format like Apache Iceberg, offers clear benefits: access to fresh data without copying it into a separate warehouse, lower cost, and the elimination of data silos. Dremio, as a data lakehouse platform, enhances this setup with high-performance query acceleration and an integrated analytics layer, so dashboards stay timely and draw on rich, comprehensive data sources. This blog will guide you through using your AWS Glue catalog as a Dremio data source and Apache Superset as the BI tool to create and deliver dynamic, insightful BI dashboards. Combining these technologies makes the data in your lake more accessible and actionable for all stakeholders.
Setting Up Our Environment
This exercise will use Docker Compose to set up a local Dremio and Superset environment. We will use the official Dremio Docker image and a custom Superset image with the requisite Dremio libraries installed. Create a docker-compose.yml with the following:
To spin up the environment, run the following command from a terminal in the same folder as the docker-compose.yml:
docker compose up
This will spin up Dremio and Superset. To finish initializing Superset, open another terminal and run:
docker exec -it superset superset init
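Both containers can take a little while to start accepting connections. As an optional convenience, the following is a minimal sketch (not part of the original setup) that polls the two local ports used in this post until each answers. The helper names `is_up` and `wait_for` are hypothetical, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=2.0):
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status, so it is listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or reset: not ready yet.
        return False


def wait_for(urls, check=is_up, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until every URL responds; return the URLs that never did."""
    pending = list(urls)
    for _ in range(attempts):
        # Keep only the URLs that are still not responding.
        pending = [u for u in pending if not check(u)]
        if not pending:
            return []
        sleep(delay)
    return pending
```

For example, `wait_for(["http://localhost:9047", "http://localhost:8080"])` returns an empty list once both the Dremio and Superset ports respond, or the list of URLs that were still unreachable after the last attempt.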
Connecting Your AWS Glue Catalog to Dremio
Go to localhost:9047 in your browser and create your Dremio user. Add a new “AWS Glue” data source.
Name the source “glue”, select your preferred AWS region, and enter your AWS credentials. The simplest way is to use your access key and secret key, but IAM roles are also supported.
Under the advanced options tab, add a connection property named “hive.metastore.warehouse.dir”. Its value should be the location where you want Dremio to write data when it creates Iceberg tables in your Glue catalog.
Then click “save” to add the data source.
Connecting Superset to Dremio
Dremio can be used with most existing BI tools, with one-click integrations in the user interface for tools like Tableau and Power BI. We will use an open-source option in Superset for this exercise, but any BI tool would have a similar experience.
To get started, head over to localhost:8080 and log in to Superset with the username “admin” and password “admin”. Once you are in, click on “Settings” and select “Database Connections”.
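Superset connects to databases through a SQLAlchemy URI entered when you add a new database connection. The sketch below shows one way to assemble such a URI for Dremio over Arrow Flight. It rests on several assumptions not stated in this post: that the custom Superset image includes the `sqlalchemy-dremio` driver (which registers the `dremio+flight` scheme), that Dremio's Flight endpoint is on its default port 32010, that the Dremio container is reachable from Superset under the hostname `dremio`, and that encryption is off for this local setup. The function name `dremio_flight_uri` is hypothetical.

```python
from urllib.parse import quote


def dremio_flight_uri(username, password, host="dremio", port=32010,
                      use_encryption=False):
    """Build a SQLAlchemy URI for Dremio's Arrow Flight endpoint.

    Assumed defaults: host "dremio" (a Docker Compose service name) and
    port 32010 (Dremio's default Arrow Flight port).
    """
    # Percent-encode credentials so characters like "@" or "/" cannot
    # break the URI structure.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    enc = "true" if use_encryption else "false"
    return f"dremio+flight://{user}:{pwd}@{host}:{port}/dremio?UseEncryption={enc}"


# Placeholder credentials; use the Dremio user you created earlier.
print(dremio_flight_uri("my_user", "my_password"))
```

If the hostname `dremio` does not resolve from inside the Superset container, `host.docker.internal` is a common alternative on Docker Desktop. The resulting string would be pasted into the SQLAlchemy URI field of the new connection.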
The next step is to add a dataset by clicking on the + icon in the upper right corner and selecting “create dataset”. From here, select the table you want to add to Superset, in this case, our sales_data table.