June 24, 2024
The Unified Apache Iceberg Lakehouse: Self Service & Ease of Use
Senior Tech Evangelist, Dremio
Data Mesh, Data Lakehouse, Data Fabric, Data Virtualization—there are many buzzwords describing ways to build your data platform. Regardless of the terminology, everyone seeks the same core features in their data platform:
- The ability to govern data in compliance with internal and external regulations.
- The ability to access all data seamlessly.
- The ability to query data and receive quick answers.
- The assurance that the data is up-to-date.
- Achieving all the above at minimal cost.
Many of these "Data X" concepts address different aspects of these goals. However, when you integrate solutions that cover all these needs, you often converge on a combination of a data lakehouse (treating your data lake as both a data warehouse and the primary data source) and data virtualization (connecting to multiple data sources and interacting with them through a unified interface). We’ll refer to this combination as the "Unified Apache Iceberg Lakehouse." This approach typically involves:
- Storing most of your analytics data in Apache Iceberg tables within your data lake.
- Enriching this data through data virtualization, drawing from a diverse array of databases and data warehouses.
- Using Dremio as the unified access and governance layer for all this data.
This series of blogs aims to explore the various benefits of this architecture, providing a deep dive into the value of this approach.
The Value of a Self-Service Data Lakehouse
Empowering users to access and analyze data independently can lead to faster insights, more informed decisions, and a more agile organization. Self-service eliminates bottlenecks in data access and allows users to explore data and generate reports without relying on IT or data engineering teams. This democratization of data enhances productivity and ensures that data-driven decision-making is not restricted to a few experts.
How Dremio Enables Self-Service Data Access
Dremio facilitates self-service data access through several key features:
- Unified Data Access: Dremio makes it easy to connect disparate data sources behind a single point of access. Users can then query and analyze data from multiple sources together without first moving it.
- Robust Governance: Dremio simplifies data access governance by providing granular access controls at the user, role, column, and row level for all your data, regardless of where it lives. This ensures that sensitive information is protected and that access is compliant with governance policies.
- User-Friendly Interface: Dremio's web application UI is designed for ease of use, enabling users to explore and discover datasets effortlessly. Within the integrated semantic layer, users can craft SQL queries to create desired business metrics. The UI includes generative AI text-to-SQL as well as wizards that generate SQL for joining datasets, changing column types, creating derived columns, and more.
- Integrated Documentation: Dremio's semantic layer includes an integrated wiki for documenting datasets, which gives users the context they need to understand the data. Generative AI can draft the initial wiki content, helping kickstart documentation.
- Flexible Data Access: Outside the UI, data in Dremio can be accessed via common interfaces such as JDBC/ODBC, REST API, and Apache Arrow Flight. This flexibility allows users to connect their favorite BI tools or Python notebooks, facilitating a wide range of analytical workflows.
- Empowering Analysts: Dremio enables data analysts and analytics engineers to handle more transformation work in the final stages of data processing. This reduces the workload on data engineers, freeing them up to focus on new data projects instead of addressing endless data request tickets from downstream users.
- Experimentation with Zero-Copy Environments: The Dremio integrated catalog, powered by open-source Nessie, allows for the creation of zero-copy environments for data experimentation. Analysts can model different scenarios without the need to duplicate or triplicate the data, promoting efficient and flexible analysis.
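As a rough illustration of the REST option mentioned under Flexible Data Access, the sketch below builds a request that submits a SQL query to Dremio over HTTP, using only the Python standard library. It rests on several assumptions that are not stated in this post: that the SQL endpoint is `/api/v3/sql`, that a personal access token is passed as a Bearer token, and that Dremio is listening on `http://localhost:9047`. The `sales."orders"` table and the `build_sql_request` helper are hypothetical names used only for illustration. Check the REST reference for your Dremio version before relying on any of this.

```python
import json
import urllib.request


def build_sql_request(base_url: str, token: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits SQL over REST."""
    # Assumption: the SQL endpoint is /api/v3/sql and takes a JSON body
    # with a "sql" field. Verify this against your Dremio version's docs.
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v3/sql",
        data=payload,
        headers={
            # Assumption: a personal access token sent as a Bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host, token placeholder, and table name.
request = build_sql_request(
    base_url="http://localhost:9047",
    token="YOUR_PERSONAL_ACCESS_TOKEN",
    sql='SELECT * FROM sales."orders" LIMIT 10',
)
print(request.get_method(), request.full_url)

# To actually submit the query against a running instance:
# with urllib.request.urlopen(request) as response:
#     job = json.loads(response.read())
```

Building the request separately from sending it keeps the example runnable without a live server. For larger result sets, JDBC/ODBC or Apache Arrow Flight would likely be a better fit than REST.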
Conclusion
Self-service data access is a game-changer for modern organizations, fostering a data-driven culture and enabling faster, more informed decision-making. Dremio's comprehensive suite of features makes it an ideal platform for achieving self-service data access. By unifying data sources, simplifying governance, providing an intuitive UI, and supporting flexible data access methods, Dremio empowers users to independently explore and analyze data. This not only enhances productivity but also allows data engineers to focus on strategic projects, ultimately driving innovation and growth.
Here Are Some Exercises for You to See Dremio's Features at Work on Your Laptop
- Intro to Dremio, Nessie, and Apache Iceberg on Your Laptop
- From SQLServer -> Apache Iceberg -> BI Dashboard
- From MongoDB -> Apache Iceberg -> BI Dashboard
- From Postgres -> Apache Iceberg -> BI Dashboard
- From MySQL -> Apache Iceberg -> BI Dashboard
- From Elasticsearch -> Apache Iceberg -> BI Dashboard
- From Kafka -> Apache Iceberg -> Dremio
Explore Dremio University to learn more about data lakehouses, Apache Iceberg, and the associated enterprise use cases. You can even learn how to deploy Dremio via Docker and explore these technologies hands-on.