Data Discovery at Lyft and Convoy

Thursday, July 22, 2021

In this talk, we will explain why solving data discovery is so crucial and share what we learned from addressing the problem at Lyft and Convoy. Amundsen, the leading open source data catalog, is used by 750 users every week at Lyft and by 80% of Convoy’s employees every month. We will share what makes a data catalog successful and the latest improvements in Amundsen, including lineage and dbt integration. We will end with what is still not working well and how we as a community could tackle it.

Everyone has access to data, but few know what exists, what’s trustworthy, and how to use it. People solve this problem naturally through the gossip protocols of Slack and shoulder-tapping, an approach that doesn’t scale and comes at a huge cost in productivity. But it gets worse: wrong data leads to wrong conclusions.

Mark and Chad saw this problem firsthand at their respective organizations, Lyft and Convoy. Analysts and data scientists were spending more than a third of their time discovering and establishing trust in the data they use. Lyft has made its analysts and data scientists over 20% more productive by creating and using Amundsen, the leading open source data discovery and metadata engine. At Convoy, roughly 80% of the company uses Amundsen for data discovery and trust.