15 minute read · June 9, 2025
Building AI-Ready Data Products with Dremio and dbt

· Master Principal Solutions Architect

This guide will equip you with the expertise to build an AI-ready data product using Dremio and dbt. A quality, AI-ready data product should be discoverable, addressable, trustworthy, self-describing, interoperable, and securely governed. By enabling a consistent and repeatable approach to delivery, Dremio and dbt together provide the workflow foundations for an effective and valuable data product that the business can use to make more accurate, informed decisions.
TL;DR
Data products are structured data assets vital for business. Although they are complex to deliver, Dremio unifies data access by cutting ETL, and dbt (the data build tool) streamlines transformations. Together, they speed up data product delivery, boosting self-service analytics and insights.
What is a data product?
A data product is more than just a dataset - it’s a structured, curated and readily consumable asset designed to address a specific business objective. Data products can take many forms including dashboards, reports, data-sets, machine learning and Gen-AI models. They empower a wide range of impactful initiatives, from real-time insights into critical business events and performance to strategic decision support such as informing product development and operational efficiency. Ultimately, data products drive tangible value across key business areas.
A data product should make it easy to find, reuse and version data-sets across an organisation, whilst eliminating redundancy and waste. Data products are not just data-sets: they are reusable and trustworthy, enriched with the necessary context to drive confident decisions in an organisation.
What’s slowing down the delivery of data products?
In the current landscape, limitations in data accessibility represent a substantial constraint on the advancement of innovation. Figure 1.0 (below) depicts this.
- First, Data Silos and Access Limitations make it hard to access information across disparate systems. Consequently, technical teams lose capacity to drive innovation, as they must dedicate a significant portion of their efforts to data discovery, comprehension and retrieval.
- Next, Complex ETL Processes mean that integrating data from disconnected sources requires intricate and time-consuming procedures.
- This leads to Inefficient Data Management, where the extended timelines of current processes prompt users to develop independent data extracts, leading to governance challenges and the emergence of shadow IT.
- Finally, this compounds Technical Debt and limits our ability to scale effectively.
How can you deliver governed data products across the enterprise with Dremio and dbt?
By leveraging Dremio and dbt, data teams can efficiently discover, design and deliver well-governed, structured and version-controlled data products across the enterprise. Dremio’s semantic layer provides unified access to diverse data sources without the need for ETL, empowering domain experts to access relevant data, derive new insights and unlock business value independently of data engineering teams. Together they enhance data trust through comprehensive documentation and metadata management, thereby expediting analysis, reducing redundancy and facilitating seamless collaboration.
The results are significant: Data delivery timelines are reduced from months to minutes, data definition and discovery become self-service operations, and data access is streamlined through self-service capabilities.
What are we going to do?
For the remainder of this blog we will demonstrate the creation and utilisation of a “Sales Growth” data product that can be used for understanding sales performance across products, brands, channels and classes. We will leverage dbt to construct this data product within Dremio’s semantic layer.
The data used for this blog will be the item, catalog_sales, web_sales, store_sales and date_dim tables from TPC-DS (refer to TPC-DS for more details). The data represents business domains covering sales and product, and sources including both relational (through Google BigQuery) and object storage (through Amazon S3).
Dremio will be used to create a series of views that will represent the different stages of developing a data product. These are discussed below.
- Preparation - This is the closest layer to the data source, within which we will organise and expose only the data-sets required for the data product. In this layer we will connect to our tables in their respective sources and build out a series of views. This includes a view that performs a union across all of the sales tables (web_sales, catalog_sales and store_sales) to represent a single view across sales.
- Business - This layer is designed as the initial point for joining data-sets from various sources and performing data cleaning. In this layer we will join the sales view with the calendar and item tables to create the product_growth view, which also includes some data cleansing to remove aberrant data points such as empty classes and categories.
- Application - This layer is designed to organise data-sets specifically for the needs of data consumers, business units and departments. In this layer we will construct the sales_growth view, which will be our data product. This involves performing an aggregation on the product_growth view to derive year-on-year sales growth at different granularities including year, channel, brand, category and class.
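To make the preparation layer concrete, the union view described above could be expressed as a dbt model along the following lines. This is an illustrative sketch only: the actual sales.sql in the repository may differ, and the column aliases and source references shown here are assumptions based on the standard TPC-DS schema.

```sql
-- models/preparation/sales.sql (illustrative sketch; the repo's real model may differ)
-- Unions the three sales channels into a single view, tagging each row with its channel.
{{ config(materialized='view') }}

SELECT ss_sold_date_sk AS sold_date_sk,
       ss_item_sk      AS item_sk,
       ss_net_paid     AS net_paid,
       'store'         AS channel
FROM   {{ source('samples', 'store_sales') }}

UNION ALL

SELECT cs_sold_date_sk, cs_item_sk, cs_net_paid, 'catalog'
FROM   {{ source('samples', 'catalog_sales') }}

UNION ALL

SELECT ws_sold_date_sk, ws_item_sk, ws_net_paid, 'web'
FROM   {{ source('samples', 'web_sales') }}
```

With the channel captured as a column, the downstream business and application layers can group by it without needing to know that three physical tables sit underneath.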
Finally, we will build an aggregate reflection to deliver sub-second performance for the data product (refer to the Dremio documentation for details on reflections). This is depicted in Figure 2.0 below.
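For reference, an aggregate reflection can also be defined directly in Dremio SQL. The statement below is a hedged sketch: the view path, reflection name, and the dimension and measure columns are illustrative assumptions, not the definitions used in the repository.

```sql
-- Illustrative only: the path, name and columns are assumptions for this example.
ALTER DATASET "Application"."sales_growth"
  CREATE AGGREGATE REFLECTION sales_growth_agg
  USING DIMENSIONS ("year", channel, brand, category, "class")
  MEASURES (net_paid (SUM, COUNT))
```

Dremio can then transparently accelerate any query whose dimensions and measures are covered by the reflection, without consumers changing their SQL.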
Once we have successfully created this data product, it can be consumed through a variety of methods such as:
- Tableau, PowerBI, Qlik, etc. for BI,
- Claude/ChatGPT through the Dremio MCP server (please refer to Dremio-MCP blog for more details on this),
- Jupyter/Python/R to use the data product in an application for the business.
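Whichever client is used, consumption ultimately reduces to simple SQL against the published view. A hypothetical consumer query might look like the following (the column names are illustrative assumptions about the finished sales_growth view):

```sql
-- Hypothetical consumer query against the finished data product.
SELECT "year", channel, brand, yoy_growth
FROM   "Application"."sales_growth"
WHERE  "year" = 2002
ORDER  BY yoy_growth DESC
LIMIT  10
```

The point is that consumers never need to know about the underlying union, joins or cleansing: the data product presents one addressable, governed interface.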
For the purpose of this blog, we are going to focus on using it for BI purposes, using a Tableau dashboard.

Now let’s get started!
Before we embark on building a data product, let’s ensure that we have the relevant pieces in place.
| Requirement | Description |
| --- | --- |
| Query Engine | A Dremio cluster will be the processing engine for the new data product that will be consumed downstream. |
| Query Tool | Tableau, PowerBI or Qlik will be used to access the data product. This must be configured correctly to connect to your Dremio cluster. Please refer to our guides for more details. |
| dbt | Enables you to connect, transform, curate and document data products for user consumption. |
| dbt-dremio | Provides the ability for dbt to connect to your Dremio cluster to build the data products. |
| git | Required to access the code repository for the dbt project. Install and configure git, referring to Install Git. |
| Dremio credentials | A username/password or PAT will be required to authenticate access to the data product. |
If available, you might want to include an additional data source such as Google BigQuery (or any other relational database), which will be a secondary data source that Dremio will interface with to demonstrate its ability to query data across domains and sources.
Now that the pre-requisites are addressed, the following sections walk through building the data product.
Prepare the project
1. Refer to Dremio and dbt Setup to install and configure dbt with Dremio.
2. Run the following command to download the dbt project:
git clone https://github.com/ashleyfarrugia89/data_product_demo
3. Load the project in an IDE of your choice; for the purpose of this blog we are using PyCharm.
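Part of the setup in step 1 is a dbt profile pointing at your Dremio cluster. A minimal sketch for a self-managed Dremio instance is shown below; the profile name, host and credentials are placeholders you must replace, and your environment (for example Dremio Cloud) may require different keys, so treat this as an assumption-laden starting point rather than a definitive configuration.

```yaml
# ~/.dbt/profiles.yml (sketch; replace placeholder values with your own)
data_product_demo:
  target: dev
  outputs:
    dev:
      type: dremio
      software_host: your-dremio-host.example.com   # placeholder
      port: 9047
      user: your_username                           # placeholder
      pat: your_personal_access_token               # or use password
      use_ssl: true
      threads: 1
```

Running dbt debug after editing the profile is a quick way to confirm that dbt can authenticate against the cluster before building anything.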
Prepare the Data
4. Add the sample source (if not already present) by clicking on “Add Source” and selecting “Sample Source”.
5. Promote the TPC-DS data-sets within the Sample Source by navigating to the Samples."samples.dremio.com".tpcds_sf1000 folder, finding the date_dim, store_sales, catalog_sales and web_sales folders, and clicking “Format Folder” on the right-hand side of each.

Figure: the Samples."samples.dremio.com".tpcds_sf1000 folder

6. If you have an extra data source available then follow the steps below. Alternatively, you can use the “Sample Source” for the item table.
- Click on the item table and download it as a CSV.
- Create a dataset using the CSV in Google BigQuery by following this guide.
- Create a connection to Google BigQuery in Dremio by following these steps, and confirm that Dremio can successfully query the table.
7. Amend the source models of the dbt project to reflect the sources inside your Dremio environment. This can be achieved by editing calendar.sql, product.sql and sales.sql in the models/preparation folder of the project.
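As a sketch of what such a source-model edit looks like, a calendar model over the sample source might resemble the following. The FROM path assumes the Sample Source shown earlier and the column list assumes the standard TPC-DS date_dim schema; the repository's actual calendar.sql may select different columns.

```sql
-- models/preparation/calendar.sql (illustrative; amend the FROM path to your source)
{{ config(materialized='view') }}

SELECT d_date_sk, d_date, d_year, d_moy, d_dom
FROM   Samples."samples.dremio.com".tpcds_sf1000.date_dim
```

If your item table lives in Google BigQuery instead, only the FROM path in product.sql changes; the rest of the project remains source-agnostic thanks to Dremio's unified access.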
Constructing the Data Product
8. Execute dbt run in the terminal to run the project and build the objects within Dremio.
Verify your data product
9. Verify that the views and spaces are created inside Dremio.
10. Load Tableau and connect it to the data product.
11. Build your dashboard to show Sales Growth.
Once you have successfully followed this blog’s guidance, you will have constructed a robust Sales Growth data product. This will be represented by a meticulously organised series of spaces, views and reflections in Dremio. This data product will then be readily accessible by downstream applications, empowering users to conduct comprehensive analysis of sales performance across key dimensions including Year, Product, Brand, Channel and Class.
Conclusion and Future Work
In this blog, we've demonstrated how Dremio and dbt can be leveraged to construct a data product tailored for consumption by business teams. Data products are pivotal in driving tangible business value, acting as structured and curated data assets strategically designed to advance key business initiatives.
Delivering robust data products often involves navigating significant technical and operational complexities. However, the combination of Dremio and dbt addresses these challenges effectively. Dremio's semantic layer unifies data access, crucially eliminating the need for traditional ETL processes. Concurrently, dbt streamlines data transformation workflows and fosters enhanced collaboration among data teams. Together, Dremio and dbt significantly accelerate the delivery of data products, empowering self-service analytics and enabling the faster generation of actionable insights.
Future work could focus on integrating the data product with Claude Desktop or ChatGPT via Dremio-MCP. This integration would enable real-time discovery of sales growth insights by eliminating the need for users to write SQL queries or understand complex data models. The aim would be to translate business questions into actionable insights, removing technical jargon and simplifying data access for a broader audience, and therefore enable the rollout of data products to the business.
References
https://www.getdbt.com/blog/data-product-examples
https://docs.getdbt.com/docs/core/connect-data-platform/dremio-setup
https://docs.dremio.com/24.3.x/sonar/client-applications/clients/dbt