Looking Under the Data Catalog Umbrella: What Every CDO Should Know About Iceberg Before Getting Started

September 16, 2024

The momentum around data catalogs has never been higher than it is today. That said, it has probably never been more confusing to understand how each company and product differs, and where each delivers (and fails to deliver) at scale. The emergence of Apache Iceberg and continued market consolidation in pursuit of efficiencies and cost savings have left a number of executives reconsidering their previous make vs. buy decisions.

Historically, as a data leader in large enterprises, I realized that in order to break through data and organizational silos, you have to address the technical challenges of catalogs, which have typically required a full build strategy (and rarely even through open source). Most organizations have too many platforms consuming, enriching, serving, and otherwise interacting with data. The list is long, and it's simply not realistic to expect commercial catalogs to offer enough connectors to track full lineage and provenance across all of them. Treating data as an asset requires tracking and understanding that asset over its lifecycle, including as it crosses platforms that may integrate poorly or not at all. The emergence of Iceberg as a standard, including its flexibility in managing assets, has dramatically lowered the bar. But be warned: at the use-case level, there is now daylight, but the problem is not yet solved and the finish line has yet to come into view.

Breaking Up the Data Catalog to Create an Enterprise Picture

I have presented at a number of conferences on going beyond basic governance and building an enterprise data strategy that includes catalogs. Every time, I use the graphic below to break the data catalog into four distinct functional areas: Business Terms & Glossary, Metadata Management (with emphasis on business-level metadata, a piece missing from many technology teams' strategies), Integration & Messaging, and Discovery & Compliance.

Read the full article, by Nik Acheson, via Datanami.
