4 minute read · April 16, 2024
What’s New in Dremio, Improved Data Ingestion and Migration into Apache Iceberg
Senior Tech Evangelist, Dremio
Dremio version 25 marks a significant milestone in data lakehouse management, particularly in its native support for Apache Iceberg, an open table format gaining momentum in the data community. This release cements Dremio's position as the foremost analytics engine tailored for Apache Iceberg, delivering unparalleled ease of management and performance.
A Unified Apache Iceberg Experience
Dremio's latest offering stands out with a bold claim: it is the most robust lakehouse management service built natively for Apache Iceberg. This distinction positions Dremio as a leader in Apache Iceberg analytics, providing a seamless and performant experience not just for Iceberg but across various table and file formats.
Seamless Integration and Superior Performance
Dremio excels in data ingestion, analytics, and BI performance on Apache Iceberg, ensuring that users experience sub-second response times. Moreover, Dremio's engine is adept at working with other table formats, such as Delta Lake, maintaining high performance and seamless integration. The platform's ability to support DML operations in Iceberg and perform time-travel queries in Iceberg and Delta Lake tables showcases its versatility and advanced capabilities.
Innovations in Version 25
The release of Dremio version 25 introduces several key features that enhance its Apache Iceberg support, making data management more efficient and intuitive.
Support for Apache Iceberg Kafka Connect Sink for Real-Time Ingestion
Dremio announces support for using the Apache Iceberg Kafka Connect sink with Dremio's Lakehouse Catalog, which is powered by Nessie, the open-source transactional catalog. The Kafka Connect sink was created by Apache Iceberg contributors, who recently contributed it to the project; that contribution is pending PMC approval. Dremio documentation and tutorials will become available once the contribution is complete.
Additional Features for Comprehensive Support
COPY INTO ON_ERROR SKIP FILES: An enhancement to the existing COPY INTO command, this option makes the load skip an entire file if any record in that file is rejected. No file is ever partially loaded, which protects data integrity.
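The skip-file behavior described above can be illustrated with a small Python simulation. This is only a sketch of the semantics, not Dremio's implementation or its SQL syntax: the function names, the `LoadResult` type, and the two-column `id,amount` record format are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    loaded_rows: list = field(default_factory=list)
    skipped_files: list = field(default_factory=list)


def parse_record(raw: str) -> dict:
    """Hypothetical validator: expects 'id,amount' with an integer id and a numeric amount."""
    id_text, amount_text = raw.split(",")  # raises ValueError on a wrong column count
    return {"id": int(id_text), "amount": float(amount_text)}


def copy_into_skip_file(files: dict) -> LoadResult:
    """Load each file all-or-nothing: one rejected record skips the whole file."""
    result = LoadResult()
    for name, raw_records in files.items():
        staged = []  # rows are staged per file and committed only if every record parses
        try:
            for raw in raw_records:
                staged.append(parse_record(raw))
        except ValueError:
            result.skipped_files.append(name)  # discard the staged rows for this file
            continue
        result.loaded_rows.extend(staged)
    return result


files = {
    "a.csv": ["1,10.5", "2,3"],
    "b.csv": ["3,7", "oops,1"],  # second record is malformed
    "c.csv": ["4,2.0"],
}
result = copy_into_skip_file(files)
print([row["id"] for row in result.loaded_rows], result.skipped_files)
```

In this sketch, the valid first record of `b.csv` is not loaded either, because the file is treated as a unit. That is the difference from a mode that keeps the good records and drops only the bad ones.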
Read Support for Equality Deletes: This functionality is crucial for users who rely on Iceberg tables for real-time data analysis. It lets Dremio read Iceberg tables that carry equality deletes alongside positional deletes, so tables maintained by row-level delete writers can be queried directly.
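To make the two delete types concrete, here is a minimal Python sketch of how a reader might apply them at scan time. A positional delete identifies a row by data file path and row position, while an equality delete identifies rows by column values. The function name and the in-memory data layout are assumptions for illustration. The sketch also leaves out details of the Iceberg specification, such as scoping deletes by sequence number and partition, and it does not represent Dremio's reader.

```python
def apply_deletes(data_files, position_deletes, equality_deletes):
    """Return the rows that remain visible after applying both delete types.

    data_files:       dict of file path -> list of row dicts (hypothetical layout)
    position_deletes: set of (file path, row position) pairs
    equality_deletes: list of dicts, each mapping column name -> value to delete
    """
    live_rows = []
    for path, rows in data_files.items():
        for position, row in enumerate(rows):
            # Positional delete: this exact row in this exact file is removed.
            if (path, position) in position_deletes:
                continue
            # Equality delete: any row matching all of the predicate's column values is removed.
            if any(
                all(row.get(column) == value for column, value in predicate.items())
                for predicate in equality_deletes
            ):
                continue
            live_rows.append(row)
    return live_rows


data_files = {
    "f1.parquet": [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    "f2.parquet": [{"id": 3, "v": "c"}, {"id": 2, "v": "d"}],
}
position_deletes = {("f1.parquet", 0)}  # removes the row with id 1
equality_deletes = [{"id": 2}]          # removes every row with id 2, in any file

live = apply_deletes(data_files, position_deletes, equality_deletes)
print(live)
```

The equality delete matches rows in both files without the writer needing to know where those rows live. This is why row-level writers favor it, and why the matching work shifts to the reader.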
Conclusion
With version 25, Dremio redefines the analytics landscape for Apache Iceberg, offering a robust, efficient, and user-friendly platform. By enhancing its native support for Iceberg and integrating features like real-time data ingestion and simplified migration, Dremio empowers organizations to harness the full potential of their data, enabling insightful analytics and informed decision-making in a modern data environment.