Apache Iceberg Becomes Industry Open Standard with Ecosystem Adoption
Cloud data lakes are now the go-to architecture for data storage and analytics across organizations of all types and sizes because cloud storage is scalable, easy to use and inexpensive. Digital experiences are ubiquitous, and every company needs to make its data accessible to unlock innovation and offset competitive threats.
Organizations are now able to use cloud data lakes for workloads that traditionally went to data warehouses (such as BI and analytics). This shift is possible in part because of Apache Iceberg, an open source table format that provides many of the same features and capabilities found in traditional databases and data warehouses, but within an open, flexible data lake environment.
Apache Iceberg continues to gain mindshare in the data ecosystem because it is a well-documented, engine-agnostic open standard. Apache Parquet is the de facto standard file format for tracking the rows and columns of data, but we need the next layer of abstraction, a table, to track the files so that each query can efficiently access the minimum data necessary. In addition to a better user experience based upon SQL, Apache Iceberg tables provide atomic transactions, data consistency guarantees, time travel and versioning.
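To make the idea of "a table that tracks files" concrete, here is a minimal sketch in Python. It is a toy model, not Iceberg's API or its actual metadata layout: the `DataFile` class, the `plan_files` function, the `ts` column bounds and the file paths are all hypothetical names chosen for illustration. It only shows the general principle that per-file statistics kept at the table layer let a query skip files that cannot contain matching rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical table-level entry describing one Parquet file."""
    path: str
    min_ts: int  # smallest value of an assumed `ts` column in the file
    max_ts: int  # largest value of the assumed `ts` column in the file


def plan_files(manifest, lo, hi):
    """Return only the files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    return [f for f in manifest if f.max_ts >= lo and f.min_ts <= hi]


# A toy "table": three data files with non-overlapping ts ranges.
manifest = [
    DataFile("data/part-0.parquet", 0, 99),
    DataFile("data/part-1.parquet", 100, 199),
    DataFile("data/part-2.parquet", 200, 299),
]

# A query for 150 <= ts <= 220 only needs to open two of the three files.
selected = plan_files(manifest, 150, 220)
print([f.path for f in selected])
```

In this sketch the first file is never opened, because its recorded range ends before the query range starts. A real table format has to handle much more (multiple columns, partitions, snapshots, deletes), so treat this only as an illustration of why tracking files at the table level reduces the data read per query.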
Signs of Apache Iceberg Growth and Adoption
In May 2020, Apache Iceberg graduated from incubation to become a top-level Apache Software Foundation project.
A project like this requires vast ecosystem adoption to become an industry standard. Let’s look at what has happened in the Iceberg ecosystem over the past few months which makes us at Dremio very bullish on its future:
- At re:Invent AWS announced Athena support for Apache Iceberg.
- More recently AWS announced EMR support for Apache Iceberg.
- Adobe Experience Cloud adopts Apache Iceberg.
- Ryan Blue, the co-creator of Apache Iceberg at Netflix, starts Tabular and raises Series A funding.
- Snowflake announces support for Apache Iceberg external table query.
- It is easy to create a data lake based upon Apache Iceberg.
Over the past three years, code additions to the Apache Iceberg project have increased, and based upon the recent ecosystem announcements there are no signs of this slowing down.
Top 10 Apache Iceberg Contributors and Influencers
There is a vibrant community of Apache Iceberg contributors and thought leaders who help drive growth and continuing innovation. Based on GitHub, LinkedIn, and our own research, here is a top 10 list of people we think are worth following on the topic of Apache Iceberg.
- Ryan Blue - Tabular.io, previously Netflix
- Anton Okolnychyi - Apple
- Kyle Bendickson - Tabular.io, previously Apple
- Jack Ye - AWS Athena
- Openinx - Alibaba
- Russell Spitzer - Apple
- Eduard Tudenhöfner - Dremio
- Junjie Chen - Tencent
- Fokko Driesprong - Datafold
- Jun-he - Netflix
How to Get Involved and Learn More about Apache Iceberg
To learn more about Apache Iceberg, check out these other resources:
Register for Subsurface LIVE Winter 2022 to hear more from Ryan Blue, the co-creator of Apache Iceberg, as well as other companies contributing to the project, including Uber and Apple.
We have some exciting Iceberg sessions in the agenda, including: