12 minute read · December 17, 2024

2024 Year in Review: Lakehouses, Apache Iceberg and Dremio

Alex Merced · Senior Tech Evangelist, Dremio

As 2024 comes to a close, it’s clear that this year has been a remarkable one for the data lakehouse, with growing momentum driving its adoption. In this blog, I’ll reflect on some of the most exciting developments in the data lakehouse space, focusing on the new possibilities unlocked by tools like Apache Iceberg and Dremio.

Apache Iceberg in 2024

Apache Iceberg is a powerful lakehouse table format that allows large Parquet datasets to be managed like traditional data warehouse tables. It provides full ACID guarantees and advanced schema evolution capabilities, going beyond what traditional data warehouses offer.
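
To make that concrete, here is a minimal PySpark sketch of creating an Iceberg table, committing a write, and evolving its schema. It is illustrative only: it assumes the Iceberg Spark runtime is on the classpath, and the catalog name (demo), warehouse path, and table names are placeholders rather than anything tied to a specific deployment.

```python
from pyspark.sql import SparkSession

# Minimal sketch: a Spark session wired to a local, file-based Iceberg catalog.
# The catalog name, warehouse path, and table names below are placeholders.
spark = (
    SparkSession.builder
    .appName("iceberg-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS demo.sales")

# The table's data files are plain Parquet; Iceberg adds the table metadata on top.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.sales.orders (
        order_id BIGINT,
        amount   DOUBLE,
        ts       TIMESTAMP
    ) USING iceberg
""")

# Each write commits atomically as a new snapshot, which is where the
# ACID guarantees come from.
spark.sql("""
    INSERT INTO demo.sales.orders
    VALUES (1, 19.99, TIMESTAMP '2024-12-01 10:00:00')
""")

# Schema evolution is a metadata-only change; no data files are rewritten.
spark.sql("ALTER TABLE demo.sales.orders ADD COLUMN customer_region STRING")
```

Each of those operations is recorded as a new snapshot in the table’s metadata, which is also what enables the time travel touched on later in this post.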

If you’re looking to learn more and get hands-on with Apache Iceberg, here are some helpful resources:

  • Download a free copy of "Apache Iceberg: The Definitive Guide"
  • Free Apache Iceberg Crash Course (on-demand)
  • Lakehouse Catalog Crash Course (on-demand)
  • Hands-on with Apache Iceberg Course

Apache Iceberg has had an incredible year, stepping further into the spotlight as the industry-standard table format and capturing mindshare ahead of other popular formats like Apache Hudi and Delta Lake. Here are some of the key highlights from 2024:

  • Dremio announced the private preview of its Hybrid Iceberg Catalog, extending governance and table maintenance capabilities across both on-premises and cloud environments. This builds on the general availability of its cloud catalog in previous years.
  • Snowflake introduced the Polaris Catalog and partnered with Dremio, AWS, Google, and Microsoft to donate it to the Apache Software Foundation, marking a major step toward open collaboration.
  • Upsolver rolled out native Iceberg support, including streamlined table maintenance for streaming data landing directly into Iceberg tables.
  • Confluent unveiled new features to enhance Iceberg integrations, further bridging the gap between streaming and analytics.
  • Databricks acquired Tabular, the startup founded by Apache Iceberg creators Ryan Blue and Daniel Weeks together with Jason Reid, signaling increased investment and focus on Iceberg’s future.
  • AWS announced Amazon S3 Tables, a new table bucket type that delivers native Apache Iceberg support, improving performance and reliability.
  • Google BigQuery added native Iceberg table support, enabling more flexibility and interoperability in analytics workloads.
  • Microsoft Fabric introduced “Iceberg Links”, simplifying access to Iceberg tables and providing seamless integration within its environment.

These milestones reflect Iceberg's continued momentum as the table format of choice for modern data architectures. As organizations increasingly adopt data lakehouses to unify structured and unstructured data for analytics and AI, Iceberg’s open and scalable design has become a critical enabler. Its robust capabilities, such as ACID transactions, schema evolution, and time travel, provide the flexibility and performance required for today’s demanding workloads.
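
As a small, hedged illustration of the time-travel capability mentioned above, the snippet below reuses the placeholder demo.sales.orders table and Spark session from the earlier sketch; the timestamp shown is arbitrary and would need to fall within the table's actual snapshot history.

```python
# Inspect the table's snapshot history via Iceberg's metadata tables.
spark.sql("""
    SELECT snapshot_id, committed_at, operation
    FROM demo.sales.orders.snapshots
""").show()

# Query the table as it existed at an earlier point in time; a specific
# snapshot can also be pinned with VERSION AS OF <snapshot_id>.
spark.sql("""
    SELECT * FROM demo.sales.orders TIMESTAMP AS OF '2024-12-01 00:00:00'
""").show()
```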

Looking ahead, these advancements position Apache Iceberg as the foundation for the next generation of analytics and AI platforms, empowering businesses to unlock deeper insights and drive innovation at scale.

Dremio in 2024

Dremio is the Unified Data Lakehouse Platform, providing a first-class experience for Apache Iceberg-based data lakehouses. With its powerful query engine and seamless Iceberg catalog integrations, Dremio empowers organizations to unlock the full potential of Iceberg.

In addition to Iceberg support, Dremio offers federated query capabilities that allow you to unite data across databases, data warehouses, and data lakes into a single platform. Combined with its robust semantic layer and advanced governance features, Dremio delivers a unified solution for making data available anywhere to users everywhere.
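
To sketch what that federation can look like from a client's point of view, the snippet below submits one SQL query to Dremio's Arrow Flight endpoint that joins a table from a relational source with an Iceberg table in the lake. The host, port, credentials, and source names (postgres_crm, lakehouse) are placeholders for sources you would already have configured in Dremio.

```python
from pyarrow import flight

# Placeholders: host/port, credentials, and source/table names are illustrative.
client = flight.FlightClient("grpc+tcp://dremio.example.com:32010")
token = client.authenticate_basic_token("my_user", "my_password")
options = flight.FlightCallOptions(headers=[token])

# One query that joins a relational source with an Iceberg table in the lake;
# Dremio plans and executes the federated join.
sql = """
    SELECT c.region, SUM(o.amount) AS total_sales
    FROM postgres_crm.public.customers AS c
    JOIN lakehouse.sales.orders AS o
      ON o.customer_id = c.customer_id
    GROUP BY c.region
"""

info = client.get_flight_info(flight.FlightDescriptor.for_command(sql), options)
reader = client.do_get(info.endpoints[0].ticket, options)
df = reader.read_all().to_pandas()
print(df.head())
```

The point is that the cross-source join is expressed as ordinary SQL and executed by Dremio, rather than being stitched together in application code.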

Discover more about Dremio’s capabilities and take your skills to the next level by earning your Verified Lakehouse Associate Badge.

In 2024, Dremio introduced a series of groundbreaking features across its cloud-managed (AWS and Azure) and self-managed "anywhere" Kubernetes deployment options. Many of these updates further enhanced query acceleration with Reflections, one of Dremio’s most unique and powerful capabilities, while also improving usability, performance, and integrations.

Here’s a summary of the key releases:

  • Dark Mode for the UI: A sleek, modern interface update to improve user experience.
  • Incremental Reflections: For Iceberg tables, reflections now refresh incrementally, processing only changes between snapshots. This reduces both cost and time for reflection maintenance.
  • Live Refreshes: Reflections on Apache Iceberg tables now auto-refresh whenever the underlying table is updated, ensuring real-time query acceleration.
  • Reflection Recommender: Dremio clusters can now analyze query patterns and recommend the most valuable reflections, taking the guesswork out of optimizing performance.
  • Polaris Catalog Connector: Enterprise Edition users can now connect to Snowflake-managed Polaris Deployments (known as “Open Catalog”), bringing Iceberg tables seamlessly into Dremio.
  • Unity Catalog Connector: Delta Lake users on Databricks can leverage UniForm to mirror their Delta metadata as Iceberg metadata. This enables users to bring these tables into Dremio and benefit from features like incremental and live reflections.
  • Hybrid Iceberg Catalog (Private Preview): Powered by Apache Polaris, this new catalog allows hybrid lakehouses to track Iceberg tables across both cloud and on-prem environments. For the first time, hybrid lakehouses can enjoy the same governance and management features as cloud-only solutions.
  • Enhanced dbt Integration: Dremio’s dbt integration now supports dbt's incremental features, providing seamless, performance-optimized workflows for analytics and transformations.
  • New Integrations: Dremio expanded its ecosystem with integrations for products like Vast Data and Monte Carlo, enabling broader interoperability and governance.
  • Result-Set Cache: A new performance feature that caches query results so repeated queries return faster.
  • Improved Iceberg Support: Dremio introduced several key updates, including auto-ingest pipelines, COPY INTO for Parquet files, and support for merge-on-read, enhancing flexibility and ease of use for Iceberg tables.
  • Monitoring and Stability Enhancements: Dremio introduced features like the Memory Arbiter, improving cluster resiliency and ensuring more stable operations under high workloads.

These updates showcase Dremio’s continued commitment to innovation and leadership in the data lakehouse space, offering unparalleled query performance, integration, and governance. Whether you're leveraging Apache Iceberg, adopting hybrid architectures, or optimizing analytics workflows, Dremio remains at the forefront of empowering organizations to achieve their data goals with speed, efficiency, and simplicity.

Conclusion

As we wrap up 2024, it’s clear that this year has been a transformative one for the data lakehouse ecosystem. Apache Iceberg has firmly solidified its position as the industry-standard table format, gaining widespread adoption and inspiring significant innovations across the data landscape. From advanced integrations and enhanced streaming support to contributions from major industry players like Snowflake, Databricks, AWS, and Microsoft, Iceberg continues to enable organizations to unify structured and unstructured data for analytics and AI at unprecedented scale.

At the same time, Dremio has proven itself as the Unified Data Lakehouse Platform of choice, delivering critical advancements in query acceleration, hybrid catalog capabilities, and seamless integrations. Dremio’s focus on empowering organizations to work with Apache Iceberg efficiently—while unifying diverse data sources—demonstrates its commitment to driving performance, governance, and cost efficiency across modern data architectures.

Looking ahead to 2025, the momentum behind data lakehouses will only accelerate as businesses strive to unlock greater value from their data. With Apache Iceberg’s continued evolution and Dremio’s unwavering innovation, organizations have the tools they need to build scalable, flexible, and future-ready data platforms. Whether you’re adopting Iceberg for its robust capabilities or leveraging Dremio for its performance and simplicity, the opportunities to drive innovation through data have never been more promising.

Here’s to another year of breakthroughs, growth, and success in the world of data lakehouses—where the future of analytics and AI continues to take shape.

Start Your 2025 with a Meeting to Discuss How Dremio and Apache Iceberg Fit into Your Data Architecture
