What the Big Fuss Over Table Formats and Metadata Catalogs Is All About

June 7, 2024

The big data community gained clarity on the future of data lakehouses earlier this week as a result of Snowflake’s open sourcing of its new Polaris metadata catalog and Databricks’ acquisition of Tabular. The actions cemented Apache Iceberg as the winner of the battle of open table formats, a big win for customers and open data, while exposing a new competitive front: the metadata catalog.

The news Monday and Tuesday was as hot as the weather in San Francisco this week, and left some longtime big data watchers gasping for breath. To recap:

On Monday, Snowflake announced that it was open sourcing Polaris, a new metadata catalog based on Apache Iceberg. The move will enable Snowflake customers to use their choice of query engine, including Spark, Flink, Presto, Trino, and soon Dremio, to process data stored in Iceberg.

Snowflake followed that up on Tuesday by announcing that, after a year and a half in tech preview, support for Iceberg was generally available. The moves, while expected, capped a dramatic about-face for Snowflake, from proud supporter of proprietary storage formats and query engines to champion of openness and customer choice.


Later Tuesday, Databricks came out of left field with its own groundbreaking news: the acquisition of Tabular, the company founded by the creators of Iceberg.

The move, made in the middle of Snowflake’s Data Cloud Summit at the Moscone Center in San Francisco (and a week before Databricks’ own Data + AI Summit at the same venue), was a de facto admission by Databricks that Iceberg had won the table format war. Its own open table format, Delta Lake, was trailing Iceberg in support and adoption in the community.

Read the full story, via Alex Woodie at Datanami.
