Cloud data lakes are now the go-to architecture for data storage and analytics across organizations of all types and sizes because cloud storage is scalable, easy to use and inexpensive. Digital experiences are ubiquitous, and every company needs to make its data accessible to unlock innovation and offset competitive threats.
Organizations are now able to use cloud data lakes for workloads that traditionally went to data warehouses (such as BI and analytics). This shift is possible in part because of Apache Iceberg, an open source table format that provides many of the same features and capabilities found in traditional databases and data warehouses, but within an open, flexible data lake environment.
Apache Iceberg continues to gain mindshare in the data ecosystem because it is a well-documented, engine-agnostic, open standard. While Apache Parquet is the de facto standard file format for storing the rows and columns of data, we need the next layer of abstraction, a table, to track the files so that each query reads only the minimum data necessary. In addition to a better user experience based on SQL, Apache Iceberg tables also provide atomic transactions, data consistency guarantees, time travel and versioning.
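To make the "table tracks the files" idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Iceberg's actual metadata format or API: the names `DataFile`, `ToyTable`, `commit_append` and `plan_scan`, the file paths, and the single `id` column are all hypothetical and exist only for illustration. It assumes each data file records the minimum and maximum value of one column, and that every commit produces a new immutable snapshot listing the table's files.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """One data file plus min/max stats for a hypothetical `id` column."""
    path: str
    min_id: int
    max_id: int


@dataclass
class ToyTable:
    # Each snapshot is an immutable tuple of data files.
    # Snapshot 0 is the empty table; every commit appends a new snapshot.
    snapshots: list = field(default_factory=lambda: [()])

    def commit_append(self, *files: DataFile) -> int:
        """Add files by creating a new snapshot; return its snapshot id."""
        self.snapshots.append(self.snapshots[-1] + tuple(files))
        return len(self.snapshots) - 1

    def plan_scan(self, lo: int, hi: int, snapshot_id: int = -1) -> list:
        """Return only the files whose id range overlaps [lo, hi]."""
        return [
            f.path
            for f in self.snapshots[snapshot_id]
            if f.max_id >= lo and f.min_id <= hi
        ]


table = ToyTable()
s1 = table.commit_append(
    DataFile("a.parquet", 1, 100),
    DataFile("b.parquet", 101, 200),
)
s2 = table.commit_append(DataFile("c.parquet", 201, 300))

# Latest snapshot: a.parquet is skipped because its ids cannot match.
print(table.plan_scan(150, 250))                  # ['b.parquet', 'c.parquet']
# "Time travel": the same query against the earlier snapshot.
print(table.plan_scan(150, 250, snapshot_id=s1))  # ['b.parquet']
```

Because a commit only ever adds a new snapshot, readers of an older snapshot are unaffected by later writes. That is the intuition behind the consistency and time travel features mentioned above, although a real table format does considerably more work than this sketch.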
Signs of Apache Iceberg Growth and Adoption
In May 2021, Apache Iceberg graduated from incubation to become a top-level Apache Software Foundation project.
A project like this requires vast ecosystem adoption to become an industry standard. Let’s look at what has happened in the Iceberg ecosystem over the past few months which makes us at Dremio very bullish on its future:
Over the past three years, code additions to the Apache Iceberg project have increased, and there are no signs of this slowing down based on the recent ecosystem announcements.
Top 10 Apache Iceberg Contributors and Influencers
There is a vibrant community of Apache Iceberg contributors and thought leaders who help drive growth and continuing innovation. Based on GitHub, LinkedIn, and our own research, here is a top 10 list of people we think are worth following on the topic of Apache Iceberg.
Ryan Blue - Tabular.io, previously Netflix
Anton Okolnychyi - Apple
Kyle Bendickson - Tabular.io, previously Apple
Jack Ye - AWS Athena
Openinx - Alibaba
Russell Spitzer - Apple
Eduard Tudenhöfner - Dremio
Junjie Chen - Tencent
Fokko Driesprong - Datafold
Jun-he - Netflix
How to Get Involved and Learn More about Apache Iceberg
To learn more about Apache Iceberg, check out these other resources:
Register for Subsurface LIVE Winter 2022 to hear more from Ryan Blue, the co-creator of Apache Iceberg, as well as other companies contributing to the project, including Uber and Apple.
We have some exciting Iceberg sessions on the agenda, including: