Store data in open formats
Use open-source formats (for instance, Apache Parquet for files and Apache Iceberg for tables) rather than proprietary formats tied to specific vendors.
Data consumers need data for analytics to make business decisions. Data teams struggle with stale data, poor self-service, and slow delivery of new analytics into production. Learn how to solve these challenges with an open lakehouse.
A lakehouse is a data analytics architecture that combines the data lake and the data warehouse in the cloud. An open lakehouse built on an open data architecture enables organizations to use their cloud data lake as their data warehouse so that they can make full use of their data for analytics.
Today, many companies keep data in cloud data storage (such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage) but still have to move and copy subsets of it into proprietary data warehouses for analytics, and from there create aggregates, cubes, and extracts for better performance. This leads to three significant challenges.
Moving data through complex ETL pipelines creates backlogs for data requests and headaches for data teams.
Expensive data warehouses (along with multiple data copies, extracts, and cubes) add up to a high total cost of ownership.
Proprietary data warehouse formats prevent you from using multiple best-of-breed engines on the same data or easily adopting new engines.
With an open lakehouse, you keep your data where it is and make all your data available for analytics.
Dremio’s open lakehouse platform is available as a fully managed cloud service, Dremio Cloud, with a forever-free tier.
Dremio’s open lakehouse platform builds on several key open-source technologies:
Apache Arrow: An in-memory columnar format that supports zero-copy reads for fast data access without serialization.
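To make "columnar" and "zero-copy" concrete, here is a small sketch that uses only Python's standard library. It is not the Arrow API: the column names and values are invented, and `array` and `memoryview` merely stand in for Arrow's buffers. Each column lives in one contiguous typed buffer, and a slice taken through a `memoryview` shares that memory rather than copying it.

```python
from array import array

# Columnar layout: one contiguous typed buffer per column (hypothetical data).
order_ids = array("q", [101, 102, 103, 104, 105])        # 64-bit signed ints
amounts = array("d", [19.99, 5.00, 42.50, 7.25, 13.00])  # 64-bit floats

# A memoryview exposes the column's memory without serializing or copying it.
amounts_view = memoryview(amounts)

# Slicing a memoryview yields another view over the same underlying buffer.
last_three = amounts_view[2:]

# A write through the original buffer is visible through the slice,
# which shows that the slice did not copy the data.
amounts[4] = 99.0
shared = last_three[2] == 99.0

# A scan touches only the column it needs; order_ids is never read here.
total = sum(amounts_view)
```

The same idea, applied across processes and languages with a standardized layout, is what lets consumers of a columnar in-memory format read data in place instead of decoding it first.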
Apache Arrow Flight: An open-source data connectivity technology that provides 20x faster data transfer rates than JDBC and ODBC.
Apache Iceberg: An open-source table format for huge analytic datasets. Iceberg enables multiple applications to work on the same data in a transactionally consistent manner.
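One way to picture "transactionally consistent" access by multiple applications is optimistic concurrency: each writer prepares a new snapshot and commits by atomically swapping the table's current pointer, and the swap is rejected if another writer got there first. The sketch below is a toy in-memory model of that idea, not Iceberg's API; `ToyTable`, `Snapshot`, `CommitConflict`, and the file names are all hypothetical.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable list of data files that make up one table version."""
    snapshot_id: int
    files: tuple


class CommitConflict(Exception):
    """Raised when a writer's base snapshot is no longer the current one."""


class ToyTable:
    def __init__(self):
        self._lock = threading.Lock()
        self.current = Snapshot(0, ())

    def commit(self, base, new_files):
        # Compare-and-swap: succeed only if nobody committed since `base` was read.
        with self._lock:
            if base.snapshot_id != self.current.snapshot_id:
                raise CommitConflict("table changed since snapshot was read")
            self.current = Snapshot(base.snapshot_id + 1, base.files + new_files)
            return self.current


table = ToyTable()

# Two writers read the same starting snapshot.
base_a = table.current
base_b = table.current

# Writer A commits first and wins.
table.commit(base_a, ("part-a.parquet",))

# Writer B's commit is rejected, so it retries on top of the fresh snapshot.
try:
    table.commit(base_b, ("part-b.parquet",))
    conflict = False
except CommitConflict:
    conflict = True
    table.commit(table.current, ("part-b.parquet",))
```

Readers that hold a `Snapshot` keep seeing a complete, unchanging version of the table while writers add new ones, which is the property the toy model is meant to illustrate.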
Project Nessie: A lakehouse metastore that provides a Git-like experience on data lake storage.
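"Git-like" here can be read as: changes are made on a branch, stay invisible to other branches, and become visible only when merged. The following toy model sketches that workflow in plain Python. It is not Nessie's API; `ToyCatalog`, its methods, and the metadata paths are hypothetical, and the merge is deliberately naive (the source branch wins, with no conflict detection).

```python
class ToyCatalog:
    """Each branch maps table names to the metadata pointer visible on it."""

    def __init__(self):
        self.branches = {"main": {}}

    def create_branch(self, name, from_branch="main"):
        # A branch starts from the source branch's pointers; no data is copied.
        self.branches[name] = dict(self.branches[from_branch])

    def commit(self, branch, table, pointer):
        # Build a new state rather than mutating the old one in place.
        new_state = dict(self.branches[branch])
        new_state[table] = pointer
        self.branches[branch] = new_state

    def merge(self, source, target="main"):
        # Naive merge: pointers from the source branch overwrite the target's.
        merged = dict(self.branches[target])
        merged.update(self.branches[source])
        self.branches[target] = merged


catalog = ToyCatalog()
catalog.commit("main", "sales", "sales/v1.metadata.json")

# Work happens on an isolated branch.
catalog.create_branch("etl")
catalog.commit("etl", "sales", "sales/v2.metadata.json")
main_before = catalog.branches["main"]["sales"]

# Merging publishes the branch's changes to main in one step.
catalog.merge("etl")
main_after = catalog.branches["main"]["sales"]
```

Because only pointers move, readers of `main` see either the old table version or the new one, never a half-finished update.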