Dremio Blog: Open Data Insights
Unifying Data Sources with Dremio to Power a Streamlit App
By leveraging Dremio's unified analytics capabilities and Streamlit's simplicity in app development, we can overcome the challenges of data unification.
Why Thinking about Apache Iceberg Catalogs Like Nessie and Apache Polaris (incubating) Matters
Iceberg catalogs are essential in the Iceberg lakehouse ecosystem, enabling core features such as table portability, concurrency control, governance, and versioning. As data lakehouse adoption grows, solutions like Nessie and Apache Polaris (incubating) provide the necessary tools to streamline data management across diverse environments. With innovations like catalog versioning and centralized governance, these catalogs ensure consistency and reliability and empower organizations to manage their data more efficiently.
The Iceberg Lakehouse: Key Benefits for Your Business
Choosing an Iceberg Lakehouse for your business means investing in a data architecture that meets your current needs while scaling and evolving with your organization, all while delivering significant cost savings and enhanced analytics capabilities. As you consider the next steps for your data strategy, the Iceberg Lakehouse offers a compelling, forward-looking solution that will drive your business's success in the data-driven future.
Introduction to the Iceberg Data Lakehouse
The Iceberg Data Lakehouse represents a significant advancement in data management architectures, combining the best features of data lakes and data warehouses. Its robust features, scalability, and cost efficiency make it a compelling choice for organizations looking to optimize their data platforms. Learn more about Lakehouse management for Apache Iceberg and why there's never been a better time to adopt Apache Iceberg as your data lakehouse table format.
Guide to Maintaining an Apache Iceberg Lakehouse
Maintaining an Apache Iceberg Lakehouse involves strategic optimization and vigilant governance across its core components—storage, data files, table formats, catalogs, and compute engines. Key tasks like partitioning, compaction, and clustering enhance performance, while regular maintenance such as expiring snapshots and removing orphan files helps manage storage and ensures compliance. Effective catalog management, whether through open-source or managed solutions like Dremio's Enterprise Catalog, simplifies data organization and access. Security is fortified with Role-Based Access Control (RBAC) for broad protections and Fine-Grained Access Controls (FGAC) for detailed security, with tools like Dremio enabling consistent enforcement across your data ecosystem. By following these practices, you can build a scalable, efficient, and secure Iceberg Lakehouse tailored to your organization's needs.
Apache XTable: Converting Between Apache Iceberg, Delta Lake, and Apache Hudi
Apache XTable offers a way to convert your existing data lakehouse tables to the format of your choice without having to rewrite all of your data. This, along with robust Iceberg DML support from Dremio, offers an additional way to easily migrate to an Apache Iceberg data lakehouse along with the catalog versioning benefits of the Dremio and Nessie catalogs.
Migration Guide for Apache Iceberg Lakehouses
Migrating to an Apache Iceberg Lakehouse enhances data infrastructure with cost-efficiency, ease of use, and business value, despite the inherent challenges. By adopting a data lakehouse architecture, you gain benefits like ACID guarantees, time travel, and schema evolution, with Apache Iceberg offering unique advantages. Selecting the right catalog and choosing between in-place or shadow migration approaches, supported by a blue/green strategy, ensures a smooth transition. Tools like Dremio simplify migration, providing a uniform interface between old and new systems, minimizing disruptions and easing change management. Leveraging Dremio's capabilities, such as CTAS and COPY INTO, alongside Apache XTable, ensures an optimized and seamless migration process, maintaining consistent user experience and robust data operations.
Getting Hands-on with Snowflake Managed Polaris
In previous blogs, we've discussed understanding Polaris's architecture and getting hands-on with Polaris self-managed OSS; in this article, I hope to show you how to get hands-on with the Snowflake Managed version of Polaris, which is currently in public preview.
Getting Hands-on with Polaris OSS, Apache Iceberg and Apache Spark
A crucial component of an Iceberg lakehouse is the catalog, which tracks your tables, making them discoverable by various tools like Dremio, Snowflake, Apache Spark, and more. Recently, a new community-driven open-source catalog named Polaris has emerged at the forefront of open-source Iceberg catalog discussions.
Comparing Apache Iceberg to Other Data Lakehouse Solutions
Apache Iceberg is a powerful data lakehouse solution with advanced features, robust performance, and broad compatibility. It addresses many of the challenges associated with traditional data lakes, providing a more efficient and reliable way to manage large datasets.
Apache Iceberg Crash Course: What is a Data Lakehouse and a Table Format?
While data lakes democratized data access, they also introduced challenges that hindered their usability compared to traditional systems. The advent of table formats like Apache Iceberg and catalogs like Nessie and Polaris has bridged this gap, enabling the data lakehouse architecture to combine the best of both worlds.
Unified Semantic Layer: A Modern Solution for Self-Service Analytics
The demand for flexible and fast data-driven decision-making is critical for modern business strategy. Semantic layers are designed to bridge the gap between complex data structures and business-friendly terminology, enabling self-service analytics. However, traditional approaches often struggle to meet performance and flexibility demands for today’s business insights. This is where a data lakehouse-powered semantic layer […]
How Apache Iceberg is Built for Open Optimized Performance
Apache Iceberg's open and extensible design empowers users to achieve optimized query performance while maintaining flexibility and compatibility with a wide range of tools and platforms. Iceberg is indispensable in modern data architectures, driving efficiency, scalability, and cost-effectiveness for data-driven organizations.
What is Data Virtualization? What Makes an Ideal Data Virtualization Platform?
Dremio's approach removes primary roadblocks to virtualization at scale while maintaining all the governance, agility, and integration benefits.
The Nessie Ecosystem and the Reach of Git for Data for Apache Iceberg
The recent adoption of the Apache Iceberg REST catalog specification by Nessie not only broadens its accessibility and usability across different programming environments but also cements its position as a cornerstone in the data architecture landscape.