Dremio vs. Presto Distros – Performance and Efficiency Benchmark
Today we are excited to share our benchmark results based on the TPC-DS benchmarking methodology for general-purpose decision support systems. This is our first (but not last) official benchmarking effort, in which we compared the efficiency and performance of Dremio, the cloud data lake engine, against various flavors of Presto, with the ultimate goal of providing a side-by-side comparison of query engine performance and efficiency. Our benchmark compares and highlights query execution cost and performance, evaluates execution time for BI/reporting and ad hoc queries, and offers additional analysis of the performance improvements achieved with Dremio’s data reflections.
Our test is based on the most recent product versions available in mid-April 2020. New versions have been and will continue to be released over time (e.g., Dremio has a monthly release cadence), and we are committed to continuing this effort on an ongoing basis. We compared Dremio AWS Marketplace edition version 4.2.1 against PrestoDB 0.233.1, PrestoSQL 332, Starburst Presto 323e and AWS Athena.
For this exercise, we leveraged AWS cloud infrastructure and executed benchmark tests at the SF1000 (1TB) and SF10000 (10TB) scale factors using a range of node counts to test the scalability and performance of the engines on large datasets. The TPC-DS-generated data, stored in the Apache Parquet file format, resided in an AWS S3 bucket within the same cloud region as the EC2 compute instances. We leveraged AWS Marketplace and serverless offerings as much as possible to make it easy for others to reproduce the same test results. However, with the open source versions of PrestoDB and PrestoSQL, we had no choice but to manually provision engine instances on EC2 by following the deployment guides and configurations.
Findings: Cost, Performance and Execution
Based on the benchmarking tests, on average Dremio is 3x-4x faster than PrestoDB at the SF1000 (1TB) scale factor, and at this scale PrestoDB failed up to 10% of the executed queries. For PrestoDB to achieve performance similar to a 4-node Dremio cluster, it requires 20 worker nodes at 5x the cost.
The overall performance of PrestoSQL during the benchmarking test at the SF1000 (1TB) scale factor was up to 3x slower than Dremio, and at the same time at least 2x more expensive to run on the same m5d.8xlarge EC2 instance type. For example, to achieve performance similar to Dremio’s 4-node engine, PrestoSQL requires 16 to 20 nodes, with up to a 4.5x increase in cost and a 4x-5x larger compute infrastructure footprint. At the SF10000 (10TB) scale factor, PrestoSQL maintained the same 2x-3x cost and performance gap compared to Dremio, and even at 20 nodes it was still slower than an 8-node Dremio cluster. It requires at least 10 nodes of PrestoSQL to achieve performance similar to a 4-node Dremio cluster, representing approximately 2.5x higher total execution cost.
At the SF1000 (1TB) scale factor, Starburst Presto requires at least 12 worker nodes to achieve the same performance as a 4-node Dremio engine. In other words, for the same performance as Dremio, Starburst required 3.4x higher cost and 3x as many nodes (despite leveraging higher-end memory-optimized instances). Overall, Dremio offers approximately 2x better performance and cost savings over Starburst Presto on the same number of nodes. At larger scale, the performance gap remains the same: at the SF10000 (10TB) scale factor, Dremio is on average 2x faster than Starburst Presto on the same number of nodes. For example, to achieve performance similar to an 8-node Dremio engine, Starburst requires 2x more nodes at about 2.3x the cost.
AWS Athena is a shared service, and its performance usually depends on the time of day or day of the week. In our case, Athena was 3.5x more expensive and 6x slower than an 8-node Dremio engine, or 2x more expensive and 10x slower than a 20-node Dremio engine, at the SF1000 (1TB) scale factor. Athena didn’t do well at the SF10000 (10TB) scale factor: the cost of execution for successful queries was 1.5x more expensive than an 8-node Dremio engine, and performance was 6.4x slower.
As for individual query types, for BI/reporting queries Dremio demonstrated up to 9x better performance than Presto at the SF1000 (1TB) and SF10000 (10TB) scale factors, and up to a 5x smaller infrastructure footprint for the same query performance. For ad hoc queries, all distributions of Presto were up to 12x less performant, and for some queries even a 20-node Presto cluster could never match the execution time of a 4-node Dremio engine.
In general, across all queries executed at the SF1000 (1TB) scale factor during the benchmarking test, Dremio was more than 2x faster than Starburst Presto, 3x faster than PrestoSQL, 4x faster than PrestoDB and 6x faster than AWS Athena, at significantly lower cost.
Acceleration With Data Reflections
Finally, by utilizing Dremio data reflections, which are essentially a hybrid between materialized views and indexes, created in advance and then utilized transparently during query execution, Dremio delivers multiple orders of magnitude in performance and efficiency gains compared to the Presto distributions. At the SF10000 (10TB) scale, the maximum performance gain versus Presto was 1,700x for BI/reporting queries and up to 3,000x for ad hoc queries, with tremendous savings on infrastructure cost.
The biggest observation that we want to highlight is that the Presto distributions did not do well on relatively small clusters (fewer than 8 nodes) or on Parquet files with a large row group size. Their query execution success rate was approximately 70%-80%, and they constantly threw an “INSUFFICIENT RESOURCES” error at the 1TB and 10TB scales due to the volume of data selected and their inability to process the returned results in the allocated memory. Dremio, on the other hand, delivers consistent performance even with a 4-node cluster. To overcome the “INSUFFICIENT RESOURCES” error, we had to reduce the Parquet row group size from the recommended 256MB to 128MB to decrease the memory footprint, which allowed Presto to match Dremio’s query success rate. This, however, calls into question the practicality of using Presto on datasets with many columns or wide columns, because a small row group size reduces throughput and efficiency, while a larger row group size offers better data compression and IO throughput for full or large table scans.
We measured the average execution cost per query based on the EC2 instance hourly cost and the number of nodes in the cluster. Query engines can scale horizontally, and better throughput can be achieved by spending more on infrastructure; however, the most desirable outcome for any organization is to get the best performance while spending less, doing more with fewer resources.
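The per-query cost metric described above can be computed directly from the cluster size, the instance hourly rate and the total runtime. A small sketch follows; the $1.808/hour rate and the runtime and query count are illustrative assumptions, not the benchmark’s measured figures.

```python
def cost_per_query(hourly_rate_usd, node_count, total_runtime_s, query_count):
    """Average execution cost per query:
    (per-node hourly rate * nodes) * total runtime in hours / number of queries.
    """
    cluster_rate = hourly_rate_usd * node_count          # $/hour for the cluster
    total_cost = cluster_rate * (total_runtime_s / 3600.0)
    return total_cost / query_count

# Illustrative values: a 4-node cluster at an assumed $1.808/hour per node,
# running 99 TPC-DS queries in one hour of total wall-clock time.
print(round(cost_per_query(1.808, 4, 3600, 99), 4))  # ≈ 0.0731 dollars/query
```

This also shows why a larger cluster can lose on cost even when it wins on speed: doubling the nodes doubles `cluster_rate`, so the runtime must drop by more than half for the per-query cost to improve.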
While Presto is broadly used, the technology underneath is outdated and not capable of supporting the demands of next-generation data lakes. Slow performance and high operational costs are not something that companies are looking for these days. The ability to execute queries directly on cloud data lake storage with high efficiency, at low cost and high performance is the highest priority on the list of demands for next-generation cloud data lakes.
Based on the results of the benchmarking tests, Dremio’s cloud data lake query engine is the most performant and efficient on the market, offering truly interactive query performance at a fraction of the cost of Presto-based engines. Dremio’s modern technology, powered by Apache Arrow, offers unmatched query performance and infrastructure cost savings even with a small compute footprint. Dremio delivers truly interactive query speed with 4x better performance on average and up to 3x cost savings on infrastructure at any scale, up to 12x faster ad hoc queries and up to 9x faster BI/reporting queries, and more than 1,000x better query performance with additional acceleration via data reflections.
Read more about the Dremio versus Presto performance and efficiency benchmark here.