These are the 10 capabilities found in the modern data lake reference architecture, along with vendor tools and libraries for each one.
2. OTF-Based Data Warehouse
Object storage is also the underlying storage solution for a data warehouse based on open table formats (OTFs). Using object storage for a data warehouse may sound odd, but a data warehouse built this way represents the next generation of data warehouses. This is made possible by the OTF specifications authored by Netflix, Uber and Databricks, which make it seamless to employ object storage within a data warehouse.
The OTFs — Apache Iceberg, Apache Hudi and Delta Lake — were written because there were no products on the market that could handle the creators’ data needs. Essentially, what they all do (in different ways) is define a data warehouse that can be built on top of object storage. Object storage provides the combination of scalable capacity and high performance that other storage solutions cannot.
Because these are modern specifications, they have advanced features that old-fashioned data warehouses lack, such as partition evolution, schema evolution and zero-copy branching.
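To make schema evolution more concrete, here is a minimal Python sketch of the general idea: column names live in table metadata and map to stable field IDs, so adding or renaming a column changes only metadata and leaves existing data files untouched. This is a toy in-memory model, loosely inspired by the ID-based column tracking in the Apache Iceberg specification. `ToyTable` and all of its methods are hypothetical names invented for illustration; they are not the API of Iceberg, Hudi, Delta Lake or any vendor product.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for an OTF table (illustration only)."""

    # Table metadata: column name -> stable field ID.
    schema: dict[str, int] = field(default_factory=dict)
    # "Data files": immutable batches of rows keyed by field ID, not by name.
    data_files: list[tuple[dict[int, Any], ...]] = field(default_factory=list)
    next_id: int = 1

    def add_column(self, name: str) -> None:
        # Metadata-only change: existing data files are not rewritten.
        self.schema[name] = self.next_id
        self.next_id += 1

    def rename_column(self, old: str, new: str) -> None:
        # Keep the field ID and column order; only the name changes.
        self.schema = {
            (new if name == old else name): fid
            for name, fid in self.schema.items()
        }

    def append(self, rows: list[dict[str, Any]]) -> None:
        # Write a new immutable batch using the current name -> ID mapping.
        batch = tuple(
            {self.schema[name]: value for name, value in row.items()}
            for row in rows
        )
        self.data_files.append(batch)

    def scan(self) -> list[dict[str, Any]]:
        # Project every stored row through the current schema;
        # columns missing from older files read back as None.
        return [
            {name: row.get(fid) for name, fid in self.schema.items()}
            for batch in self.data_files
            for row in batch
        ]


table = ToyTable()
table.add_column("id")
table.add_column("city")
table.append([{"id": 1, "city": "Oslo"}])
files_before = list(table.data_files)

# Evolve the schema after data has already been written.
table.add_column("country")
table.rename_column("city", "town")
table.append([{"id": 2, "town": "Lyon", "country": "FR"}])

rows = table.scan()
print(rows)
```

The first batch is never modified: the old row is simply read through the new schema, with the renamed column resolved by its ID and the added column returned as `None`. Real table formats persist this metadata alongside the data files in object storage and add snapshots, partition specs and concurrency control, none of which this sketch attempts to model.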
Two MinIO partners whose OTF-based data warehouses can run on top of MinIO are Dremio and Starburst.
- Dremio Sonar (data warehouse processing engine)
- Dremio Arctic (data warehouse catalog)
- Starburst Open Data Lakehouse (catalog and processing engine)
Read the full story, via The New Stack.