Architectural Analysis – Why Dremio Is Faster Than Any Presto Distribution
In our previous blog post, we reviewed the benchmarking results comparing Dremio with Presto distributions and highlighted the standout performance and cost-efficiency of Dremio, the cloud data lake query engine. In this post, we explain how Dremio achieves this level of performance and infrastructure cost savings over any Presto distribution at any scale.
Dremio co-created Apache Arrow and built the first and only cloud data lake engine from the ground up on it. At its core, Dremio executes queries in memory, powered by Apache Arrow (a columnar in-memory data format) and Gandiva (an LLVM-based execution kernel).
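To see why a columnar in-memory layout matters for analytics, here is a minimal pure-Python sketch of the principle behind Arrow's format. This is an illustration only, not Arrow's actual API; the column names and data are hypothetical.

```python
# Row-oriented layout: one record per object, as a transactional system
# might store data. Hypothetical example rows.
rows = [
    {"trip_id": 1, "distance_km": 3.2, "fare": 12.50},
    {"trip_id": 2, "distance_km": 8.7, "fare": 27.00},
    {"trip_id": 3, "distance_km": 1.1, "fare": 6.25},
]

# Columnar layout: one contiguous array per column -- the shape of data
# in a columnar in-memory format like Arrow.
columns = {
    "trip_id": [r["trip_id"] for r in rows],
    "distance_km": [r["distance_km"] for r in rows],
    "fare": [r["fare"] for r in rows],
}

# An analytic query ("average fare") touches only the one column it
# needs and scans contiguous values -- the access pattern that makes
# vectorized, in-memory execution fast.
avg_fare = sum(columns["fare"]) / len(columns["fare"])
print(avg_fare)  # 15.25
```

In a real Arrow buffer the column values sit in contiguous memory rather than Python lists, which is what enables vectorized kernels like Gandiva's to process them efficiently.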
The core capabilities of the Dremio data lake engine allow it to execute queries extremely fast and efficiently, directly on cloud data lake storage. With advanced technologies like columnar cloud cache (C3), predictive pipelining and massively parallel readers for S3, the Dremio engine delivers 4x better performance overall, and up to 12x faster ad hoc queries, out of the box than any distribution of Presto. For BI and reporting queries, Dremio offers additional acceleration technologies such as data reflections, which can make queries up to 3,000x faster than traditional SQL engines.
The Dremio elastic engines capability included in the Dremio AWS Edition offers even bigger savings on infrastructure costs, with the ability to use the cloud computing resources your queries need, only when they need them. When there is no query activity, the engine shuts down and consumes no compute resources. An incoming query triggers the engine to start automatically and elastically scale up to its full, tailored size. When query activity pauses, the engine automatically scales back down and stops. In other words, Dremio AWS Edition takes full advantage of the underlying elasticity of AWS to give you more value for every query, reducing AWS query compute costs by 60% on average.
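The start/stop lifecycle described above can be sketched as a simple state machine: zero nodes when idle, full size while queries run. This is a hypothetical illustration of the behavior, not Dremio's implementation; all class and method names are assumptions.

```python
class ElasticEngine:
    """Hypothetical sketch of an auto start/stop engine lifecycle."""

    def __init__(self, max_nodes):
        self.max_nodes = max_nodes  # the engine's tailored full size
        self.nodes = 0              # 0 nodes => stopped, no compute cost
        self.active_queries = 0

    def submit_query(self):
        # An incoming query starts the engine and scales it to full size.
        if self.nodes == 0:
            self.nodes = self.max_nodes
        self.active_queries += 1

    def finish_query(self):
        self.active_queries -= 1
        # When query activity pauses, scale back down and stop.
        if self.active_queries == 0:
            self.nodes = 0

engine = ElasticEngine(max_nodes=8)
engine.submit_query()
print(engine.nodes)   # 8: running at full, tailored size
engine.finish_query()
print(engine.nodes)   # 0: stopped, consuming no compute resources
```

The cost saving follows directly from the state machine: compute is billed only for the intervals where `nodes > 0`, which track actual query activity rather than a cluster left running around the clock.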
Presto's architecture is similar to that of a classic massively parallel processing (MPP) database management system. Like Dremio, it has a single coordinator node working in sync with multiple worker nodes. However, it provides no multi-engine capabilities or workload isolation: all workload types execute within the same cluster.
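The coordinator/worker pattern common to MPP engines such as Presto and Dremio can be illustrated with a toy scatter/gather aggregation: the coordinator splits a scan across workers, each worker computes a partial result, and the coordinator merges the partials. This is a conceptual sketch under assumed names, not either engine's actual code.

```python
# Toy sketch of MPP-style scatter/gather. Conceptual only -- not
# Presto or Dremio internals; function names are hypothetical.

def worker_partial_sum(partition):
    # Each worker aggregates its own partition (in a real engine,
    # these run in parallel on separate nodes).
    return sum(partition), len(partition)

def coordinator_avg(data, n_workers):
    # The coordinator splits the data into one partition per worker...
    partitions = [data[i::n_workers] for i in range(n_workers)]
    partials = [worker_partial_sum(p) for p in partitions]
    # ...then merges the partial (sum, count) results into the answer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(coordinator_avg([10, 20, 30, 40, 50, 60], n_workers=3))  # 35.0
```

Because both engines share this coordinator/worker shape, the performance differences come from what runs inside each worker (columnar in-memory execution, caching, readers) and from how clusters are provisioned, rather than from the basic MPP topology.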
This table compares the Dremio and Presto architectures side by side and highlights the major differences in the underlying technologies that allow Dremio to achieve unprecedented performance and cost-efficiency at any scale.
While Presto is widely used, the technology underneath is outdated and unable to support the demands of modern data teams building next-generation cloud data lakes. As a result, slow performance and high operational costs are motivating organizations to seek alternatives that more efficiently meet their modern-day needs. Only Dremio delivers the ability to execute queries directly on cloud data lake storage with high efficiency, low cost and high performance, making it the ideal data lake engine for your cloud data lake.