Data Discovery at Lyft and Convoy
In this talk, we will explain why it’s so crucial to solve data discovery and share what we learned addressing this problem at Lyft and Convoy. Amundsen, the leading open source data catalog, is used by 750 users every week at Lyft and by 80% of Convoy’s employees every month. We will share what makes a data catalog successful and the latest improvements in Amundsen, including lineage and dbt integration. We will end with what still isn’t working well and how we as a community could tackle it.
Everyone has access to data, but few know what exists, what’s trustworthy and how to use it. Humans solve this problem naturally through the “gossip protocols” of Slack and shoulder-tapping, but that doesn’t scale and comes at a huge cost in productivity. And it gets worse: wrong data leads to wrong conclusions.
Mark and Chad saw this problem firsthand at their respective organizations, Lyft and Convoy, where analysts and data scientists were spending more than a third of their time discovering data and establishing trust in it. Lyft has made its analysts and data scientists over 20% more productive by creating and using Amundsen, the leading open source data discovery and metadata engine. At Convoy, about 80% of the company uses Amundsen for data discovery and trust.
Mark Grover is the co-creator of Amundsen, the open source data catalog and metadata engine. Data scientists and analysts use Amundsen to discover, understand and trust the data they work with. At Lyft, Amundsen has 700+ active users every week, and outside of Lyft, it is used by 27 companies, including Instacart, ING and Square.
Chad Sanderson is the Product Lead for Convoy’s Data Platform team, which includes the data warehouse, streaming, BI and visualization, experimentation, machine learning and data discovery.