3 minute read · October 27, 2021
What is a Data Lakehouse? What are the benefits?
· Vice President, Portfolio Marketing, Dremio
A data lakehouse combines the best of both worlds: the data warehouse and the data lake. It pairs the performance and optimization of a data warehouse with the flexibility of a data lake. It is not simply a data lake integrated with a data warehouse; it is an architecture that delivers the speed, performance, agility, optimization and governance that today’s needs require.
A data lakehouse supports key capabilities such as:
Transactions - Supports ACID transactions directly on the lake. This includes data versioning, concurrent transactions and record-level mutations (updates, deletes, etc.) across large-scale datasets.
BI and Analytics - The lakehouse integrates seamlessly with BI and analytics tools.
Open Data Architecture - Data is kept in open standard formats, which gives you the flexibility to use multiple engines for different types of workloads: Dremio (best-of-breed SQL), Databricks (best-of-breed batch/ML), EMR, Athena, Redshift Spectrum, Presto, Dask, Flink, or whatever else you want to use to process the data.
Decoupled storage and compute - The concept of separating compute and storage has been around for years, but the next-generation cloud data lake architecture goes a step further and separates compute from data, with data being its own tier.
Decoupled data and compute - The power of compute and data separation lies in preserving data in standardized open file and table formats. This maximizes the flexibility for organizations to use best-of-breed technology (e.g., Spark, Dremio, Flink, Dask and Kafka) for the analytics use case at hand while avoiding lock-in with a particular vendor.
Semantic layer - A universal, business-friendly layer that hides the complexity of underlying data structures and physical storage from end users across all BI and analytics tools.
Governance and security - The lakehouse should support enterprise-grade security and governance to ensure that data can be safely accessed from data sources across the enterprise.
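To make the Transactions capability above more concrete, here is a minimal Python sketch of snapshot-based versioning with record-level updates and deletes. It is a toy, in-memory illustration only: `ToyTable`, `commit` and `read` are hypothetical names invented for this example, not the API of Dremio or of any open table format, and real lakehouse table formats track far more than this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class ToyTable:
    """Hypothetical in-memory table that keeps every committed snapshot."""
    # Version 0 is the empty table; each commit appends a new snapshot.
    snapshots: List[List[Row]] = field(default_factory=lambda: [[]])

    def current_version(self) -> int:
        return len(self.snapshots) - 1

    def read(self, version: int = -1) -> List[Row]:
        # Return copies so readers can never mutate a committed snapshot.
        return [dict(row) for row in self.snapshots[version]]

    def commit(self, base_version: int,
               change: Callable[[List[Row]], List[Row]]) -> int:
        # Optimistic concurrency: reject a writer whose base snapshot is stale.
        if base_version != self.current_version():
            raise RuntimeError("conflict: table changed since base_version")
        new_rows = change(self.read(base_version))
        # Appending the snapshot is the only step that publishes the change.
        self.snapshots.append(new_rows)
        return self.current_version()


table = ToyTable()

# Insert two records.
v1 = table.commit(0, lambda rows: rows + [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "new"},
])

# Record-level update: only the row with id 1 changes.
v2 = table.commit(v1, lambda rows: [
    dict(r, status="shipped") if r["id"] == 1 else r for r in rows
])

# Record-level delete: drop the row with id 2.
v3 = table.commit(v2, lambda rows: [r for r in rows if r["id"] != 2])

latest = table.read()        # current state of the table
as_of_v1 = table.read(v1)    # "time travel" to an earlier version
```

Every commit produces a new version while older versions stay readable, and a writer that starts from an out-of-date version is rejected instead of silently overwriting newer data.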
A SQL data lakehouse (SQL lakehouse) is designed to run high-performance, low-latency BI and analytics queries on a lakehouse. It has all of the above capabilities, plus support for the BI tools, dashboards and ANSI SQL that data consumers use to query the lakehouse directly, without creating copies or cubes of data.
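The idea of querying data in place through a business-friendly layer, rather than through copies, can be sketched with an ordinary SQL view. The example below uses Python's built-in `sqlite3` module purely as a stand-in engine; it is not a lakehouse, and the table, view and column names (`raw_orders`, `customer_revenue`, and so on) are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the physical storage layer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical table with terse, storage-oriented column names.
    CREATE TABLE raw_orders (ord_id INTEGER, cust_cd TEXT, amt_cents INTEGER);
    INSERT INTO raw_orders VALUES
        (1, 'ACME', 1250),
        (2, 'ACME', 750),
        (3, 'GLOBEX', 4000);

    -- The view is a definition only: no rows are copied or materialized.
    CREATE VIEW customer_revenue AS
    SELECT cust_cd AS customer,
           SUM(amt_cents) / 100.0 AS revenue_usd
    FROM raw_orders
    GROUP BY cust_cd;
""")

QUERY = "SELECT customer, revenue_usd FROM customer_revenue ORDER BY customer"

# Consumers query the friendly names and never see the raw layout.
before = conn.execute(QUERY).fetchall()

# New data in the base table is visible through the view immediately,
# because there is no extract or cube to refresh.
conn.execute("INSERT INTO raw_orders VALUES (4, 'GLOBEX', 1000)")
after = conn.execute(QUERY).fetchall()
```

Because the view stores no data of its own, there is a single copy to secure and keep consistent, which is the property the benefits listed later in this post depend on.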
A true SQL lakehouse is a fully managed platform - with no software to install or configure and no infrastructure to monitor or maintain. It will scale elastically and dynamically with workloads, while delivering the performance and cost-efficiency that you need to run interactive queries directly against cloud data lakes.
Dremio Cloud is an infinitely scalable service that eliminates the cost and complexity of copying and moving data. Learn more about Dremio Cloud and try it for free.
Benefits of a SQL Lakehouse:
- Eliminates multiple data copies typically required to get interactive performance (many stages of ETL, loading into data warehouses, downstream cubes, BI extracts), which mitigates data drift and KPI drift
- Makes data analysts and data scientists more self-sufficient; they can quickly get to the data they need, resulting in much faster time-to-insight
- Interactive query speeds, combined with true data democratization, mean that more data-driven business decisions are made
- Seamless BI workflow experience with your BI tool of choice (Tableau, Power BI, etc.)
- Flexibility to use multiple engines on the same data
- Easier security administration, as you’re not managing security for all the data copies
- No vendor lock-in, as the open data architecture is designed for today’s and tomorrow’s needs