Data Lake Storage

A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Besides Amazon S3, Azure Data Lake Storage (ADLS), and Google Cloud Storage (GCS), storage options include MinIO, Dell ECS, object storage from IBM and Alibaba, and offerings from smaller cloud providers.
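To show what a flat architecture means in practice, here is a minimal Python sketch that assumes an in-memory dictionary stands in for an object store. No real directories exist. Every object is addressed by a single opaque key such as `sales/2024/01/part-0000.parquet`, and a folder-like view is synthesized only at listing time by grouping keys on a prefix and a delimiter. This is similar in spirit to the prefix/delimiter listing that S3-compatible stores offer. `FlatObjectStore`, `put`, and `list_objects` are hypothetical names for illustration, not the API of any service named above.

```python
from dataclasses import dataclass, field


@dataclass
class FlatObjectStore:
    """Hypothetical in-memory object store: one flat mapping from key to bytes."""

    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        # No directories are created; the full "path" is just an opaque key.
        self.objects[key] = data

    def list_objects(
        self, prefix: str = "", delimiter: str = "/"
    ) -> tuple[list[str], list[str]]:
        """Return (keys, common_prefixes) directly under `prefix`.

        Keys with no further delimiter after the prefix are returned as-is;
        all other keys are rolled up into "common prefixes", which is how a
        folder view is synthesized on top of a flat namespace.
        """
        keys: set[str] = set()
        prefixes: set[str] = set()
        for key in self.objects:
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            idx = rest.find(delimiter)
            if idx == -1:
                keys.add(key)  # an object sitting "directly" under the prefix
            else:
                # Collapse everything up to and including the next delimiter.
                prefixes.add(prefix + rest[: idx + len(delimiter)])
        return sorted(keys), sorted(prefixes)


if __name__ == "__main__":
    store = FlatObjectStore()
    store.put("sales/2024/01/part-0000.parquet", b"...")
    store.put("sales/2024/02/part-0000.parquet", b"...")
    store.put("sales/README.txt", b"...")

    print(store.list_objects("sales/"))
    # (['sales/README.txt'], ['sales/2024/'])
    print(store.list_objects("sales/2024/"))
    # ([], ['sales/2024/01/', 'sales/2024/02/'])
```

Because the hierarchy is only a naming convention, "moving" a folder in such a store means rewriting every key that shares the prefix, which is one reason table formats track files in metadata rather than relying on directory listings.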

Apache Iceberg

The Future of Intelligent Storage in Big Data

Project Nessie: Transactional Catalog for Data Lakes with Git-like semantics

Lessons Learned From Running Apache Iceberg at Petabyte Scale

How to keep Iceberg tables in optimal shape while running at petabyte scale.

Covering Indexes in the Data Lake with Hyperspace