In data analytics, it's the query engine that gets all the attention. It's where the SQL runs and where the performance story is told. But the storage layer underneath is just as important; it's the "lake" part of the "lakehouse" after all. Choose the wrong storage infrastructure and you're facing I/O bottlenecks no query engine optimisation can help with.
With Data Federation, Dremio can sit on top of a range of storage backends, from the major cloud object stores to on-premises infrastructure for organisations that can't or won't put everything in the cloud. For hybrid deployments, where data lives partly on-premises and partly in the cloud, the storage choice defines the architecture. Read on to learn about four storage platforms with documented integrations that work well with Dremio.
VAST Data: All-Flash Performance at Lakehouse Scale
VAST Data builds all-flash storage infrastructure designed for petabyte and exabyte-scale workloads, and its partnership with Dremio is one of the more deeply integrated storage relationships in the ecosystem. A dedicated Dremio plugin for the VAST Database, together with native Apache Iceberg support in VAST DataStore, means Dremio can manage Iceberg tables directly on VAST infrastructure without any translation layer between them. The combination is positioned as a high-performance alternative to cloud data warehouses for organisations that want warehouse-level analytics on-prem.
In practice, the architecture targets organisations running large-scale analytical workloads that require consistently low query latency against very large datasets. VAST's approach to data reduction, flash management, and data protection significantly reduces effective storage cost compared to traditional all-flash arrays, and cost is one of the common pain points of running flash at lakehouse scale. Dremio's query acceleration via Reflections layers on top, meaning frequently accessed datasets benefit from both fast underlying storage and pre-computed query results. More detail on the integration is at dremio.com/blog/hybrid-lakehouse-infrastructure-solutions-vast-data/.
MinIO: Open-Source Object Storage for Flexible Deployments
MinIO is open-source, S3-compatible object storage that can run on commodity hardware, Kubernetes, or any cloud environment. Because it implements the S3 API, Dremio connects to it using the same S3-compatible storage configuration used for AWS S3, with a couple of additional properties. From Dremio's perspective, once configured, a MinIO bucket behaves like any other S3 data source.
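As a rough illustration of what those additional properties might look like, here is a minimal Python sketch that assembles them for an S3-compatible endpoint. The `fs.s3a.*` names are standard Hadoop S3A settings; the `dremio.s3.compat` flag, the helper name `minio_connection_properties`, and the example hostname are assumptions made for illustration, so check the exact property list against Dremio's documentation for your version.

```python
from typing import Dict


def minio_connection_properties(endpoint: str, use_tls: bool = True) -> Dict[str, str]:
    """Assemble connection properties for an S3-compatible store such as MinIO.

    Hypothetical helper: it only builds a dictionary and does not talk to
    Dremio or MinIO.
    """
    if not endpoint:
        raise ValueError("endpoint must be a non-empty host:port string")
    return {
        # Point the S3A client at the MinIO service instead of AWS.
        "fs.s3a.endpoint": endpoint,
        # MinIO deployments typically use path-style bucket addressing.
        "fs.s3a.path.style.access": "true",
        # Match this to whether the MinIO endpoint is served over TLS.
        "fs.s3a.connection.ssl.enabled": str(use_tls).lower(),
        # Assumed name of Dremio's S3 compatibility-mode flag.
        "dremio.s3.compat": "true",
    }


if __name__ == "__main__":
    # Example hostname is a placeholder, not a real service.
    props = minio_connection_properties("minio.example.internal:9000", use_tls=False)
    for name, value in props.items():
        print(f"{name} = {value}")
```

Keeping the properties in one place like this makes it easier to apply the same settings consistently, whether they are entered in the source configuration UI or written to a configuration file.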
The practical appeal of MinIO is flexibility. Teams that want an on-premises or self-hosted object storage layer without a proprietary vendor relationship can deploy MinIO on Kubernetes alongside Dremio and get a fully open lakehouse stack. MinIO has published detailed guides covering this deployment pattern, including TLS configuration and Kubernetes-based deployments with Dremio. For organisations migrating away from HDFS, MinIO is a common landing target precisely because, once configured, Dremio treats it like any other S3 source. Configuration documentation for using MinIO as distributed storage is also available.
NetApp: Enterprise Storage for the Hybrid Iceberg Lakehouse
NetApp's StorageGRID object storage and ONTAP platforms both integrate with Dremio as part of a published reference architecture for hybrid Iceberg lakehouse deployments. StorageGRID exposes an S3-compatible endpoint that Dremio connects to directly, while ONTAP can expose data via NFS or S3 depending on the workload. The combination supports teams that need to keep sensitive or regulated data on-premises while still running modern Iceberg-based analytics against it through Dremio's query layer.
As part of the reference architecture, NetApp published a case study showing a roughly 95% reduction in query time, from 45 minutes down to just 2 minutes, when running Dremio against NetApp StorageGRID. That result reflects both the storage performance and Dremio's ability to push down predicates and use Reflections effectively against the underlying data. For organisations already running NetApp infrastructure for primary storage, adding Dremio as the query layer gets you to lakehouse analytics with less disruption than migrating data to cloud object storage.
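For readers who want to check the headline figure, the arithmetic behind it can be sketched in a few lines of Python. The 45-minute and 2-minute values come from the case study as quoted above; the function name is just an illustrative choice.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage by which `after` is lower than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


before_minutes = 45.0  # query time quoted before the change
after_minutes = 2.0    # query time quoted when running Dremio on StorageGRID

reduction = percent_reduction(before_minutes, after_minutes)
speedup = before_minutes / after_minutes

print(f"{reduction:.1f}% reduction, {speedup:.1f}x faster")
# prints "95.6% reduction, 22.5x faster"
```

In other words, the quoted 95% figure corresponds to the query finishing about 22 times sooner.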
Pure Storage: High-Performance On-Premises Analytics
Pure Storage's FlashBlade and FlashArray platforms integrate with Dremio for on-premises analytics workloads where flash performance is a requirement. Dremio queries data in place on Pure Storage via its SQL engine, meaning data teams get fast analytical queries without moving data to a cloud tier or maintaining separate analytical copies. A joint solution brief covers the architecture, positioning the combination for organisations that need the economics and data sovereignty of on-premises storage with the self-service analytics capabilities of a modern lakehouse platform.
Pure Storage fits particularly well in organisations running data-intensive workloads in regulated industries, where data residency requirements make cloud storage difficult or impossible. Pairing Pure's all-flash infrastructure with Dremio's federated query engine means analysts can reach data across the organisation through a single SQL interface, with Reflections accelerating the queries that run most frequently. Find Dremio's overview of the partnership at dremio.com/blog/hybrid-lakehouse-storage-solutions-purestorage/.
Choosing Your Storage Layer
The four platforms discussed above sit at different points on the spectrum from open-source flexibility to enterprise partnership. MinIO is the right choice for teams that want full control and no vendor dependency. VAST, NetApp, and Pure Storage are the right choices for organisations investing in on-premises infrastructure at enterprise scale, each with different performance and cost profiles.
In all cases, Dremio's role stays consistent: it provides the query, governance, and semantic layer on top of whatever storage the data lives in. The storage platform handles durability, performance, and access; Dremio handles everything above it.
If you want to test this architecture with your own data, a free Dremio environment at dremio.com/get-started connects to S3-compatible storage from day one, making it straightforward to validate the integration before committing to an on-premises deployment.