A lakehouse is only as useful as the data inside it. Query performance, governance, and semantic layers all depend on one assumption: that the underlying data is accurate, complete, and behaving as expected. When it isn't, dashboards return wrong answers, AI agents reason from bad inputs, and engineering teams spend days diagnosing problems that should have been caught at the source.
Data quality testing is how you close that gap. The tools below each integrate with Dremio and cover the two main approaches: assertion-based testing, where you define what good data looks like and verify it explicitly, and observability-based monitoring, where the platform learns your data's normal behaviour and alerts you when something deviates. Used together, they give you both the checks you know to write and coverage for the problems you haven't anticipated yet.
dbt
The dbt-dremio adapter brings dbt's built-in test framework directly to your Dremio lakehouse. Tests are defined in YAML alongside your models and cover the most common data quality assertions: uniqueness, not-null constraints, accepted value sets, and referential integrity between tables. Running dbt test executes each test as a SQL query against Dremio and reports failures with the rows that caused them.
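To make the "each test is a SQL query" idea concrete, here is a minimal Python sketch of the four assertion types described above. It is an illustration under assumptions, not dbt's own implementation: sqlite3 stands in for Dremio so the example runs anywhere, the customers and orders tables are hypothetical, and the queries only approximate the shape of the SQL that dbt's generic tests produce. The convention is the same, though: a test selects the offending rows, and zero rows means it passes.

```python
import sqlite3

# sqlite3 is a stand-in for Dremio so this sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, status TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES
        (1, 'active'), (2, 'inactive'), (2, 'unknown'), (NULL, 'active');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")

# Hypothetical test names mapped to queries that return failing rows.
TESTS = {
    # Uniqueness: any id that appears more than once.
    "unique_customers_id": """
        SELECT id, COUNT(*) AS n FROM customers
        WHERE id IS NOT NULL GROUP BY id HAVING COUNT(*) > 1
    """,
    # Not-null: any row with a missing id.
    "not_null_customers_id": "SELECT * FROM customers WHERE id IS NULL",
    # Accepted values: any status outside the allowed set.
    "accepted_values_customers_status": """
        SELECT * FROM customers WHERE status NOT IN ('active', 'inactive')
    """,
    # Referential integrity: orders pointing at a customer that does not exist.
    "relationships_orders_customer_id": """
        SELECT o.* FROM orders o
        LEFT JOIN customers c ON o.customer_id = c.id
        WHERE o.customer_id IS NOT NULL AND c.id IS NULL
    """,
}

def run_tests(connection, tests):
    """Run every test query and return {test_name: failing_rows}."""
    return {name: connection.execute(sql).fetchall() for name, sql in tests.items()}

results = run_tests(conn, TESTS)
for name, failures in results.items():
    status = "PASS" if not failures else "FAIL"
    print(f"{status} {name}: {failures}")
```

With the sample data above, all four tests fail, and each reports the specific rows responsible.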
Because dbt tests live in the same project as your transformation logic, quality checks and the code they validate stay in sync. When a model changes, its tests change with it. For teams already using dbt-dremio for transformations, adding tests is a low-friction step rather than a separate tooling decision. Dremio's official dbt documentation is at docs.dremio.com/dremio-cloud/developer/dbt/, and the adapter repository with setup instructions is at github.com/dremio/dbt-dremio.
Soda
Soda connects to Dremio via Arrow Flight SQL and lets you write data quality checks in SodaCL, a human-readable YAML-based language. A check might assert that a column has no null values, that row counts fall within an expected range, or that a custom SQL expression evaluates to true. Checks run as SQL against your Dremio tables, so they respect your existing access controls and work against federated sources just as they would against native Iceberg tables.
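The checks described above share one pattern: compute a metric with SQL, then compare it to a threshold. The Python sketch below illustrates that pattern only. It is not SodaCL and does not use the Soda library; sqlite3 stands in for a Dremio connection, and the trips table, the check labels, and the scan function are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a Dremio connection
conn.executescript("""
    CREATE TABLE trips (id INTEGER, fare REAL, city TEXT);
    INSERT INTO trips VALUES (1, 12.5, 'Leeds'), (2, NULL, 'York'), (3, -4.0, 'Leeds');
""")

# Hypothetical checks: (label, metric query, predicate on the metric value).
CHECKS = [
    ("row count between 1 and 1000",
     "SELECT COUNT(*) FROM trips",
     lambda v: 1 <= v <= 1000),
    ("no missing fares",
     "SELECT COUNT(*) FROM trips WHERE fare IS NULL",
     lambda v: v == 0),
    ("no negative fares (custom SQL)",
     "SELECT COUNT(*) FROM trips WHERE fare < 0",
     lambda v: v == 0),
]

def scan(connection, checks):
    """Evaluate each check and return (label, metric_value, passed) tuples."""
    outcomes = []
    for label, sql, passes in checks:
        value = connection.execute(sql).fetchone()[0]
        outcomes.append((label, value, passes(value)))
    return outcomes

scan_results = scan(conn, CHECKS)
for label, value, passed in scan_results:
    print(f"{'PASS' if passed else 'FAIL'} {label} (value={value})")
```

Because each check reduces to a single number, results are easy to store and trend over time, which is the kind of history a managed layer can track.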
Soda Cloud provides a managed layer for scheduling scans, tracking check results over time, and routing alerts to Slack or email when checks fail. For teams running Dremio as their primary analytics platform, Soda is suited to validating data quality at the point of ingestion or after transformation, before results reach downstream consumers. The Soda Core GitHub repository is at github.com/sodadata/soda-core, and documentation for the required Arrow Flight SQL ODBC driver is in the Dremio docs.
Great Expectations
Great Expectations is an open-source framework that connects to Dremio and lets you define "expectations" about your data: assertions covering column ranges, regex pattern matching, statistical distributions, set membership, and more. Expectations are organised into suites and run as validation jobs against your Dremio tables. When a validation run completes, Great Expectations generates data docs: HTML reports that show exactly which expectations passed or failed, with sample failing rows included.
The framework is particularly well-suited to teams that want granular, reproducible quality checks that can be versioned and reviewed like code. Expectations can be generated automatically by profiling an existing dataset, giving you a baseline to refine rather than starting from scratch. Compatibility and setup information is available at docs.greatexpectations.io/docs/application_integration_support.
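The profiling idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the Great Expectations API: the profile and validate functions, the dictionary layout of an expectation, and the sample rows are all hypothetical. The sketch derives a value range for numeric columns and a value set for everything else from a baseline sample, then validates a new batch against that generated suite.

```python
def profile(rows, columns):
    """Derive baseline expectations from a sample of rows (list of dicts)."""
    suite = []
    for col in columns:
        values = [r[col] for r in rows if r[col] is not None]
        if not values:
            continue  # nothing to learn from an all-null column
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric column: expect future values inside the observed range.
            suite.append({"column": col, "type": "between",
                          "min": min(values), "max": max(values)})
        else:
            # Other columns: expect future values from the observed set.
            suite.append({"column": col, "type": "in_set",
                          "values": sorted(set(values))})
    return suite

def validate(rows, suite):
    """Check rows against each expectation and collect unexpected values."""
    report = []
    for exp in suite:
        col = exp["column"]
        present = [r[col] for r in rows if r[col] is not None]
        if exp["type"] == "between":
            bad = [v for v in present if not exp["min"] <= v <= exp["max"]]
        else:
            bad = [v for v in present if v not in exp["values"]]
        report.append({"column": col, "type": exp["type"],
                       "success": not bad, "unexpected": bad})
    return report

baseline = [{"amount": 10, "currency": "GBP"}, {"amount": 250, "currency": "EUR"}]
new_batch = [{"amount": 40, "currency": "GBP"}, {"amount": 9000, "currency": "USD"}]

suite = profile(baseline, ["amount", "currency"])
report = validate(new_batch, suite)
for entry in report:
    print(entry)
```

A generated suite like this is deliberately too strict: it encodes whatever the sample happened to contain. That is why it works best as a starting point to review and loosen, rather than a finished set of rules.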
Monte Carlo
Monte Carlo takes a different approach to data quality. Rather than requiring you to define checks upfront, it connects to Dremio and automatically learns the normal behaviour of your tables: typical row counts, schema structure, distribution patterns, and freshness cadences. When something deviates from the norm, Monte Carlo raises an alert. This covers the class of data quality problems that are hard to write explicit checks for because you don't know what to look for until something goes wrong.
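As a rough illustration of learning a baseline and flagging deviations, the sketch below applies a simple z-score rule to a table's daily row counts. This is an assumption made for the example, not Monte Carlo's actual model, which is not described here; the is_anomalous function, the threshold of three standard deviations, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev

def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` standard deviations
    from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough observations to learn a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # perfectly stable history: any change is a deviation
    return abs(latest - mu) / sigma > threshold

# Hypothetical daily row counts for one table over the past week.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_075, 10_030]

normal_day = is_anomalous(daily_row_counts, 10_100)
partial_load = is_anomalous(daily_row_counts, 4_200)
print(f"10,100 rows anomalous? {normal_day}")
print(f"4,200 rows anomalous? {partial_load}")
```

Nobody had to write a rule saying "row count must exceed 9,000"; the threshold falls out of the table's own history. The same idea extends to freshness (time since last update) and to column-level distributions.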
For Dremio environments, Monte Carlo supports schema change detection, custom SQL monitors, and comparison rules that validate consistency across tables or transformations. It authenticates via a Personal Access Token, and both Dremio Cloud and Dremio Software deployments are supported. The integration is currently in public preview. Setup documentation is at docs.getmontecarlo.com/docs/dremio, and Dremio's partnership overview is at dremio.com/blog/dremio-and-monte-carlo-enhanced-data-reliability-for-your-data-lakehouse/.
Getting Started
To start testing data quality against Dremio, you'll first need a Dremio environment. You can get a free Dremio Cloud account at dremio.com/get-started, which gives you a working lakehouse to connect any of these tools to from the outset.