7 minute read · February 3, 2022

Apache Iceberg Becomes Industry Open Standard with Ecosystem Adoption

Mark Lyons · Vice President of Product Management, Dremio

Cloud data lakes are now the go-to architecture for data storage and analytics across organizations of all types and sizes because cloud storage is scalable, easy to use and inexpensive. Digital experiences are ubiquitous, and every company needs to make its data accessible to unlock innovation and offset competitive threats.

Organizations are now able to use cloud data lakes for workloads that traditionally went to data warehouses (such as BI and analytics). This shift is possible in part because of Apache Iceberg, an open source table format that provides many of the same features and capabilities found in traditional databases and data warehouses, but within an open, flexible data lake environment.

Apache Iceberg continues to gain mindshare in the data ecosystem because it is a well-documented, engine-agnostic open standard. Apache Parquet is the de facto standard file format for tracking the rows and columns of data, but we need the next layer of abstraction, a table, to track the files so that each query reads only the minimum data necessary. In addition to a better user experience based on SQL, Apache Iceberg tables provide atomic transactions, data consistency guarantees, time travel and versioning.
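To make the idea of a table that tracks files more concrete, here is a minimal Python sketch. It is a toy model, not the Iceberg API or its metadata layout: the names ToyTable, Snapshot, DataFile, commit_append and plan_scan, the "ts" column, and the file paths are all hypothetical and chosen only for illustration. It assumes each file records the minimum and maximum value of one column, and that every commit publishes a complete new list of files.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str     # hypothetical Parquet file location
    min_ts: int   # smallest value of the "ts" column in this file
    max_ts: int   # largest value of the "ts" column in this file


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: Tuple[DataFile, ...]   # complete file list for this table version


@dataclass
class ToyTable:
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit_append(self, new_files: List[DataFile]) -> int:
        """Publish a new snapshot holding the previous files plus new_files."""
        current = self.snapshots[-1].files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, current + tuple(new_files))
        # One append stands in for an atomic metadata swap:
        # readers see the old version or the new one, never a mix.
        self.snapshots.append(snap)
        return snap.snapshot_id

    def plan_scan(self, lo: int, hi: int,
                  snapshot_id: Optional[int] = None) -> List[str]:
        """Return paths of files that may contain rows with lo <= ts <= hi."""
        if not self.snapshots:
            return []
        if snapshot_id is None:
            snap = self.snapshots[-1]          # latest table version
        else:
            # "Time travel": read the file list of an older version.
            snap = next(s for s in self.snapshots
                        if s.snapshot_id == snapshot_id)
        # Skip any file whose value range cannot overlap the query range.
        return [f.path for f in snap.files
                if f.max_ts >= lo and f.min_ts <= hi]


table = ToyTable()
first = table.commit_append([DataFile("a.parquet", 0, 99),
                             DataFile("b.parquet", 100, 199)])
second = table.commit_append([DataFile("c.parquet", 200, 299)])

pruned = table.plan_scan(150, 250)                         # latest version
time_travel = table.plan_scan(150, 250, snapshot_id=first)  # older version
print(pruned)       # ['b.parquet', 'c.parquet']
print(time_travel)  # ['b.parquet']
```

In this sketch the query never opens a.parquet, because the table-level metadata already shows that the file cannot match. Reading an older snapshot simply means planning against an earlier file list.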

Signs of Apache Iceberg Growth and Adoption

In May 2020, Apache Iceberg graduated from incubation to become a top-level Apache Software Foundation project.

A project like this requires broad ecosystem adoption to become an industry standard. Let’s look at what has happened in the Iceberg ecosystem over the past few months, which makes us at Dremio very bullish on its future:

Over the past three years, code additions to the Apache Iceberg project have increased, and there are no signs of this slowing down, judging by the recent ecosystem announcements.

Source: https://github.com/apache/iceberg/graphs/code-frequency

Top 10 Apache Iceberg Contributors and Influencers

There is a vibrant community of Apache Iceberg contributors and thought leaders that helps drive growth and continuing innovation. Based on GitHub, LinkedIn, and our own research, here is a top 10 list of people we think are worth following on the topic of Apache Iceberg.

  • Ryan Blue - Tabular.io, previously Netflix
  • Anton Okolnychyi - Apple
  • Kyle Bendickson - Tabular.io, previously Apple
  • Jack Ye - AWS Athena
  • Openinx - Alibaba
  • Russell Spitzer - Apple
  • Eduard Tudenhöfner - Dremio
  • Junjie Chen - Tencent
  • Fokko Driesprong - Datafold
  • Jun-he - Netflix

How to Get Involved and Learn More about Apache Iceberg

To learn more about Apache Iceberg, check out these other resources:

Register for Subsurface LIVE Winter 2022 to hear more from Ryan Blue, the co-creator of Apache Iceberg, as well as other companies contributing to the project, including Uber and Apple.

We have some exciting Iceberg sessions in the agenda, including:
