A data lakehouse combines the best of both worlds: the data warehouse and the data lake. It pairs the performance and optimization of a data warehouse with the flexibility of a data lake. It is not simply an integration of a data lake and a data warehouse; it delivers the speed, performance, agility, optimization and governance that today’s workloads require.
A data lakehouse supports key capabilities such as:
Transactions - Supports ACID transactions directly on the lake. This includes data versioning, concurrent transactions and record-level mutations (updates, deletes, etc.) across large-scale datasets.
BI and Analytics - Lakehouse enables seamless integration with BI and analytics tools.
Open Data Architecture - Flexibility to use open standard formats and multiple engines to run different types of workloads: Dremio (best-of-breed SQL), Databricks (best-of-breed batch/ML), EMR, Athena, Redshift Spectrum, Presto, Dask, Flink, or any other engine you choose to process the data.
Decoupled storage and compute - The concept of separating compute and storage has been around for years but the next-generation cloud data lake architecture enables the separation of compute and data with data being its own tier.
Decoupled data and compute - The power of compute and data separation lies in the preservation of data in standardized open file and table formats, maximizing the flexibility for organizations to use best-of-breed technology (e.g., Spark, Dremio, Flink, Dask and Kafka) for the analytics use case at hand while avoiding lock-in with a particular vendor.
Semantic layer - A universal, business-friendly layer that hides the complexity of underlying data structures and physical storage from end users across all BI and analytics tools.
Governance and security - It should support enterprise-grade security and governance to ensure that data can be safely accessed from data sources across the enterprise.
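The transactional capability in the list above (data versioning, concurrent transactions and record-level mutations) can be illustrated with a toy model. The sketch below is not the API of any real table format or engine; `ToyTable`, `CommitConflict` and their methods are hypothetical names invented for illustration. It assumes each commit produces a new immutable snapshot and that a writer whose base version is stale is rejected, which is one common way to get optimistic concurrency.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer committed first (hypothetical error type)."""


class ToyTable:
    """In-memory stand-in for a versioned table: a list of immutable snapshots."""

    def __init__(self):
        self._snapshots = [{}]         # version 0 is the empty table
        self._lock = threading.Lock()  # makes the version check and append atomic

    def current_version(self):
        return len(self._snapshots) - 1

    def read(self, version=None):
        """Return a copy of the rows at a version (latest by default)."""
        if version is None:
            version = self.current_version()
        return dict(self._snapshots[version])

    def commit(self, base_version, upserts=None, deletes=()):
        """Apply record-level changes on top of base_version, or fail if stale."""
        with self._lock:
            if base_version != self.current_version():
                raise CommitConflict(f"table moved past version {base_version}")
            rows = dict(self._snapshots[base_version])
            rows.update(upserts or {})   # record-level inserts and updates
            for key in deletes:          # record-level deletes
                rows.pop(key, None)
            self._snapshots.append(rows)
            return self.current_version()


# Usage: two commits, then a time-travel read and a rejected stale write.
table = ToyTable()
v1 = table.commit(0, upserts={1: "alice", 2: "bob"})
v2 = table.commit(v1, upserts={2: "robert"}, deletes=[1])
latest = table.read()        # rows as of v2
as_of_v1 = table.read(v1)    # older snapshot is still readable

try:
    table.commit(v1, upserts={3: "carol"})  # based on a stale version
    stale_write_rejected = False
except CommitConflict:
    stale_write_rejected = True
```

Real lakehouse table formats keep snapshots as metadata over data files rather than copying rows, so this only shows the behavior a reader or writer would observe, not how it is stored.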
A SQL data lakehouse (SQL lakehouse) is designed to run high-performance, low-latency BI and analytics queries on a lakehouse. It has all of the above capabilities plus support for BI tools, dashboards and ANSI SQL, which data consumers use to run queries directly on the lakehouse without creating copies or cubes of data.
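As a small illustration of querying through business-friendly names without copying data, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a SQL engine. SQLite is not a lakehouse, and the table, view and column names (`raw_orders`, `customer_revenue`, and so on) are hypothetical. It only shows the general idea behind a semantic layer: a view gives consumers clean names and units while the data stays in one physical table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical table with terse, storage-oriented column names (hypothetical).
    CREATE TABLE raw_orders (ord_id INTEGER, cust_cd TEXT, amt_cents INTEGER);
    INSERT INTO raw_orders VALUES (1, 'C1', 1250), (2, 'C1', 750), (3, 'C2', 4000);

    -- Semantic view: friendly names and units, no copy of the data is made.
    CREATE VIEW customer_revenue AS
        SELECT cust_cd AS customer, SUM(amt_cents) / 100.0 AS revenue_usd
        FROM raw_orders
        GROUP BY cust_cd;
""")

query = "SELECT customer, revenue_usd FROM customer_revenue ORDER BY customer"
rows_before = conn.execute(query).fetchall()

# A new record in the physical table is visible through the view immediately,
# because the view is evaluated at query time rather than materialized.
conn.execute("INSERT INTO raw_orders VALUES (4, 'C2', 1000)")
rows_after = conn.execute(query).fetchall()
```

Because the view is only a definition, there is no second copy to refresh or secure, which is the property the text attributes to running queries directly on the lakehouse.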
A true SQL lakehouse is a fully managed platform - with no software to install or configure and no infrastructure to monitor or maintain. It will scale elastically and dynamically with workloads, while delivering the performance and cost-efficiency that you need to run interactive queries directly against cloud data lakes.
Dremio Cloud is an infinitely scalable service that eliminates the cost and complexity of copying and moving data.
Benefits of a SQL lakehouse:
Eliminates multiple data copies typically required to get interactive performance (many stages of ETL, loading into data warehouses, downstream cubes, BI extracts), which mitigates data drift and KPI drift
Makes data analysts and data scientists more self-sufficient: they can quickly get to the data they need, resulting in much faster time-to-insight
Interactive query speeds, combined with true data democratization, mean that more data-driven business decisions are made
Seamless BI workflow experience with your BI tool of choice (Tableau, Power BI, etc.)
Flexibility to use multiple engines on the same data
Easier security administration, as you’re not managing security for all the data copies
No vendor lock-in as the open data architecture is designed for today’s and tomorrow’s needs.