Dremio Blog

7 minute read · February 3, 2026

Driving Open Source and Open Standard Innovation at Dremio

Mark Shainman, Principal Product Marketing Manager

Dremio is a commercial platform, and we’re straightforward about that. But the standards and projects that power it are genuinely open, and Dremio has been an active contributor to building them, not just consuming them. Apache Arrow, Apache Iceberg, and Apache Polaris all have Dremio fingerprints on their design, specification, and governance. That work matters because open standards are what protect customers from lock-in, what allow multiple engines to share the same data, and what make the data ecosystem better for everyone, including us.

This post covers where Dremio is contributing today: the Apache projects we help lead, the Iceberg V3 milestones we just shipped, and the open source community tools we’ve built for practitioners who want to extend what’s possible on top of Dremio.

Co-Creating the Standards That Define the Lakehouse

Dremio co-created Apache Arrow, the open columnar memory format that has become the standard for high-performance data interchange across the ecosystem. Arrow is not a Dremio product. It is an Apache project, governed by the community, available to anyone. Dremio’s query engine is built on it, but so are dozens of other tools, and that breadth of adoption is exactly the point. Open standards create a rising tide.

Dremio was also a co-founder of Apache Polaris, an open source catalog that implements the Iceberg REST catalog specification. Polaris has now graduated to a top-level Apache project, which means it is governed independently by the broader Apache community. Any engine that implements the Iceberg REST catalog API can read and write Iceberg tables through a Polaris-compatible catalog, including Spark, Flink, Trino, and DuckDB. Dremio’s Open Catalog is built on Polaris, but Polaris itself belongs to the community.
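To make the REST point concrete, here is a minimal sketch of how a client might build the URL for the "load table" route defined in the Iceberg REST catalog specification. The base URL, prefix, namespace, and table name are hypothetical, and the helper `load_table_url` is illustrative rather than part of any Dremio or Polaris client. It assumes the spec's default of joining multi-level namespaces with the unit separator character.

```python
from urllib.parse import quote

# Assumption: the spec's default separator for multi-level namespaces
# in path parameters is the unit separator (0x1F).
NAMESPACE_SEPARATOR = "\x1f"


def load_table_url(base_url: str, prefix: str, namespace: tuple[str, ...], table: str) -> str:
    """Build the URL for the REST catalog's load-table route.

    Route shape: /v1/{prefix}/namespaces/{namespace}/tables/{table}
    """
    encoded_namespace = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    parts = ["v1"]
    if prefix:
        # The prefix is optional and is supplied by the catalog's configuration.
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", encoded_namespace, "tables", quote(table, safe="")]
    return base_url.rstrip("/") + "/" + "/".join(parts)


# Hypothetical endpoint and identifiers, for illustration only.
url = load_table_url(
    "https://catalog.example.com/api/catalog",
    "analytics",
    ("sales", "emea"),
    "orders",
)
print(url)
```

Because every engine resolves tables through the same route, the catalog rather than any single engine is the shared point of truth for table metadata.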

On Apache Iceberg, Dremio has been a leading contributor and educator, publishing extensively on Iceberg architecture, running hands-on workshops, and pushing the spec forward. Iceberg is the table format Dremio is built on, and we have a direct interest in making it the best it can be.

The election of Dremio engineer JB Onofré to the Apache Software Foundation board is the latest marker of that commitment. Onofré shepherded Apache Polaris through incubation, and his board role means Dremio has a voice in the governance of the foundation that stewards all of these projects. That kind of participation goes beyond code contributions. It is a long-term investment in the health of the open source ecosystem.

Shipping Iceberg V3 in Dremio Cloud

The most recent concrete milestone in Dremio’s Iceberg work is V3 support, now generally available in Dremio Cloud. V3 extends the spec to address workload classes that were difficult or expensive to support with earlier versions, and Dremio’s implementation covers all three of the major additions.

Deletion vectors make row-level operations faster and cheaper for change data capture and streaming workloads, removing the bottleneck that made fine-grained updates expensive at scale. The VARIANT data type eliminates the schema-on-write requirement for semi-structured data, letting teams ingest JSON without pre-defining structure. And row-level lineage provides built-in creation and update tracking directly in the table format, with no additional tooling required, which matters especially in regulated industries.
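The idea behind deletion vectors can be sketched in a few lines. This is a toy model, not Dremio's or Iceberg's implementation: the class `DeletionVector` and the function `scan` are hypothetical names, and a plain Python integer stands in for the compressed bitmap a real engine would use. The point it illustrates is that a delete only flips a bit for a row position, and the data file itself is left untouched.

```python
from typing import Iterable, Iterator


class DeletionVector:
    """Toy bitmap of deleted row positions for a single data file."""

    def __init__(self) -> None:
        # A Python int serves as an arbitrary-length bitmap (illustrative only).
        self._bits = 0

    def delete(self, position: int) -> None:
        """Mark the row at `position` as deleted."""
        self._bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return bool((self._bits >> position) & 1)


def scan(rows: Iterable[dict], dv: DeletionVector) -> Iterator[dict]:
    """Yield only live rows; deleted positions are skipped at read time."""
    for position, row in enumerate(rows):
        if not dv.is_deleted(position):
            yield row


# Hypothetical data file with four rows; delete the rows at positions 1 and 3.
data_file = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
dv = DeletionVector()
dv.delete(1)
dv.delete(3)

live_rows = list(scan(data_file, dv))
print(live_rows)
```

Recording a delete costs one small metadata write instead of a rewrite of the whole file, which is why this approach suits frequent, fine-grained changes.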

These capabilities build on one another inside the Dremio platform. Autonomous Reflections observe query patterns and automatically create, refresh, and retire materializations to keep performance sub-second. Iceberg Clustering uses Z-order to co-locate data across multiple columns simultaneously, with two-level pruning that cuts I/O on petabyte-scale tables. Both run continuously without manual intervention, which means teams spend less time maintaining their lakehouse and more time building on it.
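For readers unfamiliar with Z-order, the following sketch shows the general technique of bit interleaving (a Morton code) on two integer columns. It is not Dremio's clustering implementation; the function `z_value` and the sample rows are hypothetical. Sorting by the interleaved value keeps rows that are close in both columns close together on disk, which is what lets an engine skip files when filtering on either column.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x fills the even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y fills the odd bit positions
    return z


# Hypothetical (x, y) column values for six rows.
rows = [(0, 0), (3, 3), (0, 1), (2, 2), (1, 0), (1, 1)]

# Sorting by the interleaved key clusters rows on both columns at once.
clustered = sorted(rows, key=lambda row: z_value(*row))
print(clustered)
```

A plain sort on `x` alone would scatter rows with similar `y` values across the table, whereas the interleaved key treats both columns roughly equally.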

Open Source Tools from the Dremio Community

Beyond the Apache projects, Dremio created and supports the dremio-community GitHub organization as a home for open source connectors, integrations, and tools built by and for the Dremio practitioner community. 

dremio-community-connectors is a library of community-built connectors that extend Dremio to additional data sources. If your team needs access to a niche system or custom source, this is where you build and share the connector so others don’t have to solve the same problem from scratch.

dremio-community-udfs brings geospatial functions, vector similarity operations, and other specialized SQL capabilities into Dremio through community-driven UDF libraries. Teams working on geospatial analysis or vector search can extend the query engine for their workload.

dremio-transform-studio is the repository for the free, open source Transform Studio tool, a visual low-code pipeline builder for Dremio. It lets practitioners browse the catalog, chain transforms, preview results, and write output tables from a clean UI, without writing SQL by hand. Publishing Transform Studio as open source is a direct invitation: use it, build on it, and contribute back.

Why Open Standards Matter for Your Data Platform Decision

Open standards are not just a philosophical position. They have direct practical consequences for the teams that build and run data platforms. When the table format is Apache Iceberg and the catalog is Apache Polaris, any engine that speaks the open Iceberg REST catalog API can query your data. You are not locked into one vendor’s compute. You can run Spark jobs alongside Dremio queries on the same Iceberg tables, under the same governance policies, without copying data or maintaining separate catalogs.

That interoperability is the whole point. Dremio’s commercial platform sits on top of these open foundations, and we think that’s the right way to build a data platform. Customers get the performance, governance, and autonomous management capabilities that Dremio adds, without giving up the freedom that open standards provide. Contributing to those standards, helping govern the projects that define them, and building open source tooling on top of them is how we make sure that freedom stays real.

Go Further

Learn about Dremio’s Apache Iceberg capabilities, including V3 support, Autonomous Reflections, and Iceberg Clustering.

Explore the dremio-community GitHub organization to find connectors, UDF libraries, and Transform Studio.

Attend a Dremio Cloud Workshop for hands-on time with the platform and the open ecosystem it runs on.

Start for free and build on the open lakehouse platform today.
