Modernize to an Open Lakehouse

Data consumers need timely data to make business decisions, but data teams struggle with stale data, poor self-service, and slow paths to production for new analytics. Learn how to solve these challenges with an open lakehouse.

Open Lakehouse Essentials

A lakehouse is a data analytics architecture that converges the data lake and data warehouse in the cloud. An open lakehouse built on an open data architecture enables organizations to use their cloud data lake as their data warehouse so that they can make full use of their data for analytics.

Steps to an Open Lakehouse

  • Store data in open formats

    Use open-source formats (for instance, Apache Parquet for files and Apache Iceberg for tables) rather than proprietary formats tied to specific vendors.

  • Treat data as its own tier

    With an open lakehouse, data exists as its own independent layer, eliminating the need to move or copy data to data warehouses, cubes, or extracts for analysis.

  • Use a SQL query engine to accelerate time to insight

    In an open lakehouse, data is accessed by decoupled and elastic compute engines (for example, Dremio Sonar) with query acceleration for BI and ad hoc workloads.

  • Support self-service through use of a semantic layer

    A business-friendly semantic layer provides consistent, shared access for all users and tools, along with centralized security and governance.

  • Automate data management

    Use an intelligent metastore like Dremio Arctic for all the data management capabilities you’re used to with a data warehouse — and more.

3 Reasons to Modernize to an Open Lakehouse

Today, many companies have data in cloud data storage (like Amazon S3, Azure Data Lake Storage, or Google Cloud Storage), but have needed to move and copy subsets of data into proprietary data warehouses for analytics — and from there create aggregates, cubes, and extracts for better performance. This leads to three significant challenges.

Slow, complex architecture

Moving data through complex ETL pipelines creates backlogs for data requests and headaches for data teams.

Out-of-Control Costs

Expensive data warehouses (along with multiple data copies, extracts, and cubes) add up to a high total cost of ownership.

Risky Vendor Lock-In

Proprietary data warehouse formats prevent you from using multiple best-of-breed engines on the same data or easily adopting new engines.

Why move data from your cloud data storage if you don’t have to?

With an open lakehouse, you keep your data where it is and make all your data available for analytics.

Get Started with an Open Lakehouse

Dremio’s open lakehouse platform is available as a fully managed cloud service. Sign up now for a forever-free account on Dremio Cloud.

The Open Source Technology Behind the Open Lakehouse

Dremio’s open lakehouse platform makes use of key open source technologies.

Apache Arrow

An in-memory columnar format that supports zero-copy reads for fast data access without serialization.

More about Apache Arrow

Apache Arrow Flight

Open-source data connectivity technology that provides 20x faster data transfer rates than JDBC and ODBC.

More about Apache Arrow Flight

Apache Iceberg

An open-source table format for huge analytic datasets, Iceberg enables multiple applications to work on the same data in a transactionally consistent manner.

More about Apache Iceberg

Project Nessie

Nessie is a lakehouse metastore that provides a Git-like experience on data lake storage.

More about Project Nessie

Get Started Free

No time limit - totally free - just the way you like it.

Sign Up Now

Watch Demo

Not ready to get started today? See the platform in action.

Check Out Demo