6 minute read · March 10, 2023

Still Stuck with a Data Warehouse? It’s Time to Consider a Better Architecture: the Data Lakehouse

Kamran Hussain · Presales Solutions Architect, Dremio

Everyone recognizes that data platforms are critical for making data-driven decisions across every function of an enterprise. A flaw in the foundation of your data platform can translate into significant costs and lost revenue.

The traditional data platform has been the data warehouse (DWH). Most data architects and engineers are very comfortable with the data warehouse since it has been around for 30+ years. They know the vendors, the tools that connect to it, and how to run dashboards against it, and technical resources are easy to find. Data warehouses are now available in the cloud as well. So why consider an alternative?

If you've been around the block, you will agree that business users often struggle with slow performance and with getting access to all the data they need for analysis. On top of that, the cost of hardware and software is high. Even the newer cloud data warehouses seem affordable at first, but costs climb as consumption increases.

Some of the challenges with the DWH:

  • Time to market – Data modelers and ETL developers spend a lot of time with business users trying to understand their requirements. More often than not, changes have to be made repeatedly to give business users what they need and to keep up with new requirements driven by evolving business rules.
  • Proprietary format – When enterprises choose a data platform, they make a long-term commitment to that vendor. Once data is loaded into a DWH, only that engine can operate on that data. These systems are designed to be the end of the dataflow, so by design extracting data out of them is not very easy. Just ask users of Teradata, Netezza, and Oracle DWH!
  • Data integrity, security, and governance – When business users cannot run their reports within their desired time frame, they make copies of the data. This results in loss of trust in the data and a data governance nightmare.
  • Very expensive to maintain – Most enterprises have very large and very old data warehouses that have become too complex and require expert resources to manage them.

Let’s take a look at why you should consider the modern architecture of a data lakehouse, which combines the functionality of a data warehouse and the benefits of a data lake. 

Data lakehouse strengths:

  • Faster time to insight – Instead of the traditional data pipeline that moves data from the source systems to a data lake, then to the data warehouse, and then transforms it into data marts, you can simply connect your preferred BI or data science tool directly to the data lakehouse (using a solution like Dremio) and start analyzing the data quickly.
  • Data in an open format without vendor lock-in – Most enterprises have a data lake where data is stored in an open format such as Parquet, CSV, or JSON. This means there is no need to convert data into a proprietary format to consume it. This strategy also pays off in the future if you decide to move to a different platform, since there is no proprietary format to export the data out of.
  • Choice of the best engine for the use case – With the data in an open format, you are free to choose the best engine for the specific use case, without changing or copying the data. For example, you can use Dremio for interactive BI workloads and Spark for ETL/batch processing.
  • Data warehouse performance at data lake cost – Lakehouse technologies like Dremio provide interactive BI performance directly on the data lakehouse, eliminating multiple copies of the data (required on data warehouses for best performance), which means less hardware and fewer resources to manage the complex environment. Also, the cost of lakehouse technologies is significantly lower than data warehouses.
  • DML on data lakes – With lakehouse capabilities available on the data lake, users can now perform DML operations (insert, update, delete) using open source technologies like Apache Iceberg. Iceberg enables ACID transactions, schema evolution, and much more.

Apache Iceberg is a core foundational component of an effective data lakehouse. Iceberg is a table format that lets you query and modify data in the data lake (files) as if it were a database table. Iceberg provides ACID transactions, schema evolution, time travel, and many other capabilities required for data warehousing workloads.
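To make this concrete, here is a minimal sketch of Iceberg DML, schema evolution, and time travel using Spark SQL. The `lakehouse` catalog and `sales.orders` table names are hypothetical, and the example assumes a Spark session (3.3 or later for the time travel syntax) configured with the Iceberg runtime:

```sql
-- Create an Iceberg table on the data lake
CREATE TABLE lakehouse.sales.orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10, 2),
  order_date  DATE
) USING iceberg;

-- ACID DML directly on data lake files
INSERT INTO lakehouse.sales.orders VALUES (1, 100, 25.00, DATE '2023-03-01');
UPDATE lakehouse.sales.orders SET amount = 30.00 WHERE order_id = 1;
DELETE FROM lakehouse.sales.orders WHERE customer_id = 100;

-- Schema evolution without rewriting existing data files
ALTER TABLE lakehouse.sales.orders ADD COLUMNS (region STRING);

-- Time travel: query the table as it was at an earlier point
SELECT * FROM lakehouse.sales.orders TIMESTAMP AS OF '2023-03-01 10:00:00';
```

Because the table's data lives in open files (such as Parquet) with Iceberg metadata alongside them, any Iceberg-aware engine — Dremio, Spark, and others — can operate on the same table without copying the data.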

We hope this short blog has piqued your interest in the data lakehouse and in how it can solve your data warehouse challenges as a modern data platform. The following resources can help you learn more about Iceberg and try out the data lakehouse for free.

Ready to Get Started?

Bring your users closer to the data with organization-wide self-service analytics and lakehouse flexibility, scalability, and performance at a fraction of the cost. Run Dremio anywhere with self-managed software or Dremio Cloud.