5 minute read · April 9, 2024

Advancing the Capabilities of the Premier Data Lakehouse Platform for Apache Iceberg

Mark Shainman · Principal Product Marketing Manager

With Dremio 25.0, our latest release, we are helping accelerate the adoption and benefits of Apache Iceberg while bringing your users closer to the data with lakehouse flexibility, scalability, and performance at a fraction of the cost. We are excited to announce new features that improve scalability, manageability, ease of use, and overall time to business insight.

Improved Data Ingestion and Migration into Iceberg 

Dremio has expanded the options for companies to ingest data into an Iceberg lakehouse. A new Kafka sink connector supports high-speed streaming from Kafka into Iceberg, so organizations can ingest data continuously and run near-real-time analytics on the data lakehouse.

Dremio also simplifies migration from legacy data lake file formats to Iceberg tables, with one-command ingestion that brings improved performance and automated optimization. Improvements to the data copy commands make Iceberg adoption easier and less error-prone. Dremio can convert raw data (in JSON, CSV, and Parquet formats) from data lakes, relational databases, data warehouses, and NoSQL databases into Apache Iceberg, both in the cloud and on-premises.
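
As a rough illustration of what one-command ingestion can look like, the sketch below assembles a COPY INTO statement as a string. The statement shape (`COPY INTO ... FROM '@...' FILE_FORMAT '...'`) is an assumption that should be checked against the Dremio SQL reference for your version, and the table and source names are hypothetical.

```python
def build_copy_into(table: str, location: str, file_format: str) -> str:
    """Return a COPY INTO statement for loading files into an Iceberg table.

    The statement shape is an assumption for illustration; confirm it
    against the Dremio SQL reference before using it.
    """
    allowed = {"csv", "json", "parquet"}  # the formats named in this post
    fmt = file_format.lower()
    if fmt not in allowed:
        raise ValueError(f"unsupported file format: {file_format!r}")
    if "'" in location:
        # Keep the quoted location literal well-formed.
        raise ValueError("location must not contain single quotes")
    return f"COPY INTO {table} FROM '{location}' FILE_FORMAT '{fmt}'"


if __name__ == "__main__":
    # Hypothetical table and source names, for illustration only.
    print(build_copy_into("lake.sales.orders", "@landing/orders/", "CSV"))
```

The generated string would then be submitted through whichever SQL client or API you already use with Dremio.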

Improved Reliability, Stability and Scalability

With v25.0, Dremio delivers its most durable and robust version yet for large and complex workloads. Improved memory management guards against out-of-memory failures so that queries can run to completion. Dremio’s memory arbiter dynamically monitors memory usage and allocates memory as needed. When a complex query or process approaches memory exhaustion, it spills to disk instead of failing, which reduces memory usage.

With new spillable hash joins, data spills to disk temporarily if available memory is exceeded, allowing the join to proceed instead of running out of memory. Hash joins are now more scalable because they can handle datasets larger than available memory. This makes it easier for Dremio users to run large, complex queries against extremely large datasets to gain business insight, and it helps those queries run to completion regardless of other workloads and available resources.
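
To make the spilling idea concrete, here is a minimal, generic sketch of a partitioned hash join that falls back to disk when the build side exceeds a budget. It is not Dremio's implementation: the function names are hypothetical, the budget is counted in rows rather than bytes, and a single partition that still exceeds the budget is not split further.

```python
import itertools
import os
import pickle
import tempfile
from collections import defaultdict


def _join_in_memory(build_rows, probe_rows, key):
    """Classic hash join: index the build side, then stream the probe side."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    for probe in probe_rows:
        for build in table.get(probe[key], []):
            yield {**build, **probe}


def _spill(rows, key, num_partitions, directory, prefix):
    """Write each row to the file for its hash partition; return the paths."""
    paths = [os.path.join(directory, f"{prefix}_{i}.bin") for i in range(num_partitions)]
    files = [open(path, "wb") for path in paths]
    try:
        for row in rows:
            pickle.dump(row, files[hash(row[key]) % num_partitions])
    finally:
        for f in files:
            f.close()
    return paths


def _read(path):
    """Stream rows back from a spill file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def spillable_hash_join(build_rows, probe_rows, key, max_rows_in_memory=1000, num_partitions=8):
    build_iter = iter(build_rows)
    buffered = list(itertools.islice(build_iter, max_rows_in_memory + 1))
    if len(buffered) <= max_rows_in_memory:
        # Build side fits in the budget: no spilling needed.
        return list(_join_in_memory(buffered, probe_rows, key))

    results = []
    with tempfile.TemporaryDirectory() as tmp:
        # Matching keys land in the same partition number on both sides,
        # so each pair of partition files can be joined independently.
        build_paths = _spill(itertools.chain(buffered, build_iter), key, num_partitions, tmp, "build")
        probe_paths = _spill(probe_rows, key, num_partitions, tmp, "probe")
        for build_path, probe_path in zip(build_paths, probe_paths):
            results.extend(_join_in_memory(_read(build_path), _read(probe_path), key))
    return results
```

Both paths return the same joined rows, possibly in a different order; the spilled path trades extra disk I/O for a smaller in-memory hash table per partition.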

Enhanced Performance & Management For Iceberg Lakehouses

Dremio continues to innovate with Reflections to make SQL query performance faster and more intelligent while simplifying manageability and lowering TCO. Improved refresh capabilities reduce the time required to update Reflections, giving users timely access to current data. With the new incremental refresh capability, Reflections no longer need to be completely rebuilt when the data in the underlying tables changes, which provides faster and more cost-effective access to current data. Finally, new scheduling capabilities let administrators schedule and manage refresh processes across all of their data.
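
The difference between a full rebuild and an incremental refresh can be sketched in a few lines. The example below is a conceptual illustration only, not how Reflections are implemented: it assumes an append-only source, tracks a simple row offset as the "last refreshed" position, and uses a hypothetical `IncrementalSum` class to maintain a grouped sum.

```python
from collections import defaultdict


class IncrementalSum:
    """Maintains SUM(amount) GROUP BY key by processing only new rows."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.rows_processed = 0  # position reached in the append-only source

    def refresh(self, source_rows):
        """Fold in rows appended since the last refresh; return how many."""
        new_rows = source_rows[self.rows_processed:]
        for key, amount in new_rows:
            self.totals[key] += amount
        self.rows_processed = len(source_rows)
        return len(new_rows)


if __name__ == "__main__":
    rows = [("east", 10.0), ("west", 5.0)]
    agg = IncrementalSum()
    agg.refresh(rows)            # first refresh processes both rows
    rows.append(("east", 2.5))
    agg.refresh(rows)            # second refresh processes only the new row
    print(dict(agg.totals))
```

The cost of each refresh is proportional to the amount of new data rather than the size of the whole table, which is where the time and cost savings come from.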

At the same time, Dremio continues to reduce complexity and total cost of ownership (TCO) by automating lakehouse management tasks, such as space management. By automating Iceberg management processes, Dremio not only reduces costs but also enhances data team productivity and improves overall time to insight.

Ease of Administration and Monitoring with Integrated Observability

Dremio now delivers an administration console for monitoring a Dremio lakehouse environment. It provides visibility into aggregate metrics on the jobs running on the cluster over time so admins can optimize performance and cost. Catalog usage metrics, such as the most active users and the most queried data products, are also available. Organizations can also build custom metrics from the underlying system tables, or periodically push data from system tables to their monitoring tool of choice, such as Splunk or Datadog, or to object storage for more flexibility. With this information, companies can manage and monitor their Dremio environments more efficiently.
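
As a sketch of the "custom metrics" idea, the snippet below summarizes job records that have already been fetched from a system table and serializes the summary as one JSON line, a shape most monitoring tools can ingest. The field names `user_name` and `status` and both function names are assumptions for illustration; check the actual system table columns in your Dremio version.

```python
import json
from collections import Counter


def summarize_jobs(jobs):
    """Aggregate job records into a small metrics summary.

    `jobs` is a list of dicts with hypothetical `user_name` and `status` keys.
    """
    by_user = Counter(job["user_name"] for job in jobs)
    by_status = Counter(job["status"] for job in jobs)
    return {
        "total_jobs": len(jobs),
        "jobs_by_status": dict(by_status),
        "top_users": by_user.most_common(3),  # (user, job count) pairs
    }


def to_json_line(summary):
    """Serialize a summary as a single JSON line for shipping to a monitor."""
    return json.dumps(summary, sort_keys=True)


if __name__ == "__main__":
    sample = [
        {"user_name": "ana", "status": "COMPLETED"},
        {"user_name": "ana", "status": "FAILED"},
        {"user_name": "bo", "status": "COMPLETED"},
    ]
    print(to_json_line(summarize_jobs(sample)))
```

Running a job like this on a schedule and forwarding the output is one simple way to feed an external dashboard or write periodic snapshots to object storage.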

Expanded SQL Coverage

In this release, we continue to expand SQL coverage for an even more comprehensive SQL engine. New additions include the ARRAYS_OVERLAP function, cumulative window framing, and sliding window framing.
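
For readers less familiar with these features, the Python sketch below mimics their general semantics. It is an illustration under simplifying assumptions, not Dremio's implementation: NULL handling is ignored, array elements must be hashable, and the helper names are hypothetical. The frame clauses in the comments are standard SQL.

```python
def arrays_overlap(left, right):
    """True if the two arrays share at least one element."""
    return not set(left).isdisjoint(right)


def cumulative_sum(values):
    """SUM(...) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)."""
    out, total = [], 0
    for value in values:
        total += value
        out.append(total)
    return out


def sliding_sum(values, preceding):
    """SUM(...) OVER (ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW)."""
    return [
        sum(values[max(0, i - preceding): i + 1])
        for i in range(len(values))
    ]


if __name__ == "__main__":
    print(arrays_overlap([1, 2, 3], [3, 4]))
    print(cumulative_sum([1, 2, 3, 4]))
    print(sliding_sum([1, 2, 3, 4], preceding=1))
```

A cumulative frame grows with every row, which suits running totals; a sliding frame keeps a fixed-size window, which suits moving sums and averages.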

Get started now!

All of these capabilities are available now! If you’re already a Dremio Self-Managed customer, it’s easy to upgrade: visit our Support Portal to download the latest version. Dremio Cloud users, simply log in to get started! Not yet a Dremio user? Visit the Get Started page to find offerings for Dremio Cloud or Dremio Self-Managed.