CUSTOMER STORY

Amazon Accelerates Supply Chain Decision Making by Implementing an Innovative Analytics Architecture using Dremio

10x increase in query performance, from 60 seconds to 4-6 seconds.

60 hours of work eliminated per project with auto query planning.

90% reduction in setup time.

The Customer

Amazon's Supply Chain Finance Analytics team is responsible for providing controllership for Amazon's consumer supply chain systems and serves as a strategic partner for the Supply Chain Optimization Technologies (SCOT) team. The team's work is data-intensive: they have to stitch together hundreds of tables produced by various business units to create a unified financial view of the results from supply chain systems. Those systems make science- and machine-learning-driven decisions on tens of billions of dollars in supply chain spend. The team faced significant challenges in streamlining Extract, Transform, & Load (ETL) processes and in reducing their data engineering workloads while also providing reliable, consistent, and accurate insights to internal users through high-quality analytics products.

The Challenge

The SCOT Finance Analytics team at Amazon had to create financial reporting that provided a unified financial view of supply chain systems for decision-making. The team had to deal with enormous volumes of data, with some datasets consisting of billions of rows and hundreds of columns. Insights had to be generated across various dimensions and filters, including date ranges, geographical regions (country, state, city, zip), business categories, and product categories. However, the data source performed well only when filtering on one to three dimensions, and the BI tools’ extracts usually failed after approximately 100 million rows, which severely degraded the performance of the analytics products, especially with high numbers of columns. Complex queries took four minutes or more to complete, even with sample data.

To try to solve this challenge, the team initially created 20+ materialized views of the same table in the data warehouse, each partitioned, sorted, and distributed by different dimensions, and they used parametrization functions in the BI tool to select the right data source based on the filter and value selected by the user. However, this approach was neither performant nor scalable, and it required significant ongoing maintenance and management. It also created numerous complex data pipelines, which introduced more chances for human error, and system responsiveness varied with the data warehouse cluster's load. The team required a better and more scalable solution to meet their standards.
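The routing workaround described above can be pictured with a minimal Python sketch. It is only an illustration of the idea, not the team's implementation: the view names, the dimension keys, and the select_view helper are all hypothetical, and the real logic lived in the BI tool's parametrization functions rather than in Python.

```python
from typing import Mapping

# Hypothetical mapping from a filter dimension to the materialized view
# that is partitioned and sorted for it. The source does not name the
# actual views; these names are placeholders.
VIEWS_BY_DIMENSION = {
    "date": "costs_by_date",
    "country": "costs_by_country",
    "business_category": "costs_by_business_category",
    "product_category": "costs_by_product_category",
}

# Hypothetical fallback when no optimized view matches the selection.
DEFAULT_VIEW = "costs_base"


def select_view(active_filters: Mapping[str, str]) -> str:
    """Return the view tuned for the first recognized filter dimension.

    Filters are checked in the order the user applied them; if none of
    them has a dedicated view, the unoptimized base view is used.
    """
    for dimension in active_filters:
        if dimension in VIEWS_BY_DIMENSION:
            return VIEWS_BY_DIMENSION[dimension]
    return DEFAULT_VIEW


if __name__ == "__main__":
    print(select_view({"country": "US", "date": "2023-01"}))
    print(select_view({"warehouse": "X"}))
```

Even in this toy form, the maintenance cost is visible: every new dimension or combination of dimensions needs another view and another routing entry, which is the scaling problem the team ran into.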

The Solution

The team researched several commercial and open-source solutions and selected Apache Kylin, a commercial big data Online Analytical Processing (OLAP) provider, and Dremio for in-depth evaluation. They had several requirements: the solution needed to produce views with less than a 10-second refresh time for each user click or filter selection, complete daily backend data appends consistently, query the entire dataset (more than three years of historical data) without reducing scope, require little setup and maintenance labor, and scale compute elastically without bottlenecking resources. Ultimately, the team chose Dremio. They quickly set up a Dremio instance using an Amazon Web Services CloudFormation Template (AWS CFT), and they were able to scale compute up and down as needed. Dremio allowed them to do everything in a modern user interface for both SQL and no-code analytics. It integrated seamlessly with their existing BI tools and included a built-in SQL Runner (SQL IDE) for ad hoc query analysis and exploration. The team was able to set up a reflection build trigger based on new data ingestion into the Amazon Simple Storage Service (Amazon S3) bucket, simplifying their ETL process. Dremio could also build multiple combinations of reflections through a GUI. After the evaluation process, they chose Dremio over Kylin because it had a faster setup time, was more user-friendly, and offered seamless integration with most BI tools.

Conclusion

Amazon's Supply Chain Optimization Technologies (SCOT) Finance Analytics team faced challenges in managing their data pipeline while providing reliable, consistent, and accurate insights to internal users through high-quality analytics products. They achieved their goals by using Dremio to accelerate queries, streamline ETL, and reduce their data engineering workloads, getting high-quality insights into the hands of their end users quickly. Dremio met all of their requirements, and the team delivered the end result exactly as imagined, with no compromise on their vision.
