A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.
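One way to picture the flat architecture is a minimal sketch like the one below. It assumes an object-store-style layout in which every object sits in a single namespace under one key, and a "folder" is only a shared key prefix. The names `store` and `list_keys` are hypothetical and exist only for illustration; they are not part of any real storage API, and some storage services add a true hierarchical namespace on top of this model.

```python
# Toy in-memory stand-in for an object store: one flat mapping of key -> bytes.
# The "/" inside a key is just a character; no real directories exist.
store: dict[str, bytes] = {
    "raw/events/2024-01-01.json": b'{"user": 1}',
    "raw/events/2024-01-02.json": b'{"user": 2}',
    "raw/images/cat.png": b"\x89PNG",
}


def list_keys(prefix: str) -> list[str]:
    """Return all keys starting with `prefix`, emulating a folder listing."""
    return sorted(key for key in store if key.startswith(prefix))


# "Listing a folder" is a prefix filter over the flat key space.
events = list_keys("raw/events/")
```

Because data is stored as-is under a key, nothing about its schema has to be decided when it is written; structure is applied later, when the data is read.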
In addition to S3, ADLS, and GCS, data lake storage options include MinIO, Dell ECS, IBM, Alibaba, and other smaller cloud providers.
Time travel keeps older table snapshots queryable, which puts it at odds with privacy laws like GDPR and CCPA that require deleting personal data on request. Here’s how to make sure you’re GDPR/CCPA compliant while using time travel in Apache Iceberg.