Architectural Analysis: Why Dremio Is Faster Than Any Presto Distribution
In our previous blog, we reviewed the results of benchmarking Dremio against Presto distributions and highlighted the standout performance and cost-efficiency of Dremio, the cloud data lake query engine. In this blog post, we will explain how Dremio achieves this level of performance and infrastructure cost savings over any Presto distribution at any scale.
Dremio co-created Apache Arrow and built the first and only cloud data lake engine from the ground up on it. At its core, Dremio executes queries in memory on Apache Arrow's columnar in-memory data format, with Gandiva, an LLVM-based execution kernel, compiling expressions into vectorized code that operates directly on Arrow buffers.
Dremio vs. Presto Performance and Efficiency Benchmark
The core capabilities of the Dremio data lake engine allow it to execute queries fast and efficiently, directly on cloud data lake storage. With advanced technologies like the columnar cloud cache (C3), predictive pipelining and massively parallel readers for S3, the Dremio engine delivers 4x better performance and up to 12x faster ad hoc queries out of the box than any distribution of Presto. For BI and reporting queries, Dremio offers additional acceleration technologies such as data reflections, which make queries up to 3,000x faster compared to traditional SQL engines.
The elastic engines capability included in the Dremio AWS Edition offers even bigger savings on infrastructure costs by using the cloud computing resources your queries need, only when they need them. When there is no query activity, the engine remains shut down and consumes no compute resources. Incoming query demand triggers the engine to start automatically and elastically scale up to its full, tailored size. When the queries pause, the engine again automatically scales back down and stops. In other words, Dremio AWS Edition takes full advantage of the underlying elasticity of AWS to give you more value for every query, reducing AWS query compute costs by 60% on average.
In contrast, Presto-based SQL engines rely on row-based processing across the distributed cluster. Presto's architecture resembles a classic massively parallel processing (MPP) database management system: like Dremio, it has one coordinator node working in sync with multiple worker nodes. However, it does not provide multi-engine capabilities or workload isolation; all workload types execute within the same cluster.
While all processing is done in memory and pipelined across the network between stages to avoid unnecessary I/O overhead, columnar data read from storage is still translated into a row-based representation inside the Presto engine for further processing. This approach falls short at larger scale, demanding additional compute resources to load and redistribute the requested data across all available workers.
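The cost described above can be sketched in plain Python (this is an illustration of the two layouts, not Presto internals; the column names are invented). A row-based engine must pivot column vectors into per-row records before operating on them, materializing one record per row, whereas a columnar engine scans each column as a single contiguous vector.

```python
# The same data in a columnar layout: one vector per field.
columns = {
    "trip_id": [1, 2, 3],
    "fare": [12.5, 7.25, 30.0],
}

# A row-based engine pivots the column vectors into per-row records
# before processing -- one tuple (plus its overhead) per row.
rows = list(zip(columns["trip_id"], columns["fare"]))
print(rows)  # [(1, 12.5), (2, 7.25), (3, 30.0)]

# A columnar engine instead scans a whole column at once, so a filter
# or aggregate touches only the bytes of the columns it needs.
total_fare = sum(columns["fare"])
print(total_fare)  # 49.75
```

At three rows the transposition is trivial; at billions of rows, building and shuffling those per-row records is exactly the extra work that consumes the additional compute the paragraph above describes.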
This table compares the Dremio and Presto architectures side by side and highlights the major differences in the underlying technologies that allow Dremio to achieve unprecedented performance and cost-efficiency at any scale.
While Presto is broadly used, the technology underneath is dated and unable to support the demands of modern data teams building next-generation cloud data lakes. As a result, slow performance and high operational costs are motivating organizations to seek alternatives that more efficiently meet their modern-day needs. Only Dremio delivers the ability to execute queries directly on cloud data lake storage with high efficiency, low cost and high performance, making it the ideal engine for your cloud data lake.
Download your copy of the Dremio vs. Presto Performance and Efficiency Benchmark to learn more about how and why Dremio outperforms any distribution of Presto.