Dremio vs. Starburst

You want to run interactive, high-performance BI dashboards and analytics directly on your data lake storage. Which SQL query technology should you use? Here’s what you should consider in comparing Dremio to Starburst.

Dremio vs. Starburst Comparison

Feature | Dremio | Starburst
--- | --- | ---
Workload Focus | Ad-hoc, mission critical BI, and everything in between directly against the data lake | Ad-hoc

Architecture | Dremio | Starburst
--- | --- | ---
Memory format | Columnar, Apache Arrow | Columnar, custom format
Physical workload isolation model | Single cluster, multiple engines | Multiple clusters
Scalability | Auto scale feature to scale up/down to handle any fluctuations in workload | Key functionalities around auto-scaling are limited to enterprise edition
Transparent Acceleration | Robust and proven (Data Reflections) | Very limited
Pushdowns with Relational DBs | Yes | Yes
High performance NVMe acceleration | Yes (Columnar Cloud Cache) | Yes, but limited
Data Curation & Semantic Layer | Yes | No
Data Lineage | Yes | No
User Interface | Yes, user-friendly UI supporting all features | Yes, but very limited capabilities
SQL Engine | Arrow in-memory with Gandiva LLVM compilation to machine code for perf | Java-based compilation only
LLVM Compilation to Machine Code | Yes (Gandiva) | No (Java only)
Predictive Reads for S3/ADLS Data Sets | Yes | No

Dremio vs. Starburst Performance and Cost Comparison

In benchmarking reports, Dremio provides significant performance and cost advantages over Starburst.

TPC-DS Performance Benchmarking Results

  • TPC-DS – at SF1000 (1TB) scale factor: To match Dremio's performance, Starburst requires 3x as many nodes at 3.4x the cost
  • TPC-DS – at SF10000 (10TB) scale factor: To achieve similar performance to an 8-node Dremio engine, Starburst requires 2x as many nodes at about 2.3x the cost
  • The performance gap is consistent as the workload scales
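
As a rough reading of the two TPC-DS figures above, the node and cost multiples can be combined into an implied per-node cost ratio. The sketch below is illustrative arithmetic only. It assumes total cost scales as node count times per-node price for runs of equal duration, and it reads the quoted node figures as "3x as many" and "2x as many" nodes. The names `BenchmarkRatio` and `implied_per_node_cost_ratio` are hypothetical and do not come from the benchmark reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRatio:
    """Starburst-to-Dremio multiples quoted for one TPC-DS scale factor."""
    scale_factor: str
    node_multiple: float  # Starburst nodes / Dremio nodes
    cost_multiple: float  # Starburst cost / Dremio cost


def implied_per_node_cost_ratio(ratio: BenchmarkRatio) -> float:
    """Cost multiple divided by node multiple.

    Assumption (not stated in the source): cost = nodes * per-node price
    for runs of equal duration, so the quotient is a per-node price ratio.
    """
    return ratio.cost_multiple / ratio.node_multiple


# Figures taken from the two bullets above; the interpretation is an assumption.
RATIOS = [
    BenchmarkRatio("SF1000 (1TB)", node_multiple=3.0, cost_multiple=3.4),
    BenchmarkRatio("SF10000 (10TB)", node_multiple=2.0, cost_multiple=2.3),
]

for ratio in RATIOS:
    per_node = implied_per_node_cost_ratio(ratio)
    print(f"{ratio.scale_factor}: implied per-node cost ratio {per_node:.2f}")
```

Under these assumptions both ratios come out a little above 1, which would mean the cost gap mostly tracks the node gap. The benchmark reports themselves remain the source for actual instance types and pricing.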

BI/Reporting Queries

  • Dremio is over 300x faster than Starburst for BI/reporting queries with Data Reflections enabled on a 4-node cluster at SF10000 (10TB)

Dremio Overview

Dremio is a SQL lakehouse platform that enables high-performance business intelligence (BI) and analytics directly on data lake storage. Dremio simplifies data engineering and eliminates the need to copy data into proprietary data warehouses and create cubes, aggregation tables and BI extracts, providing flexibility and control for data architects and data engineers, and self-service for data consumers.

Key Facts about Dremio Technology

  • Ideal for high-performing BI dashboards and interactive analytics directly on data lake storage. Dremio technologies (Data Reflections, Columnar Cloud Cache, Gandiva, Predictive Pipelining) work alongside Apache Arrow for lightning-fast analytics.
  • Self-service semantic layer enables data exploration, curation, and collaboration
  • High performance: Nearly 2-3x faster with significant cost savings on compute
  • Low-latency analytics at high concurrency
  • Fast and easy to productionalize low-latency analytics. Virtual datasets and Data Reflections together make it easy to move an analytic product such as a dashboard into production by eliminating complex steps like building ETL pipelines, managing data copies, and creating BI extracts.
  • Elastic engines: Dremio can provision multiple separate execution engines from a single Dremio coordinator node and start and stop them at runtime based on predefined workload requirements
  • Rich UI experience with advanced functionality for both technical and non-technical users
  • Cost efficiency: Auto-stop/start and right-sized engines eliminate the need to over-provision infrastructure and lower EC2 compute cost by ~60%
  • Transactional data tier for data lakes: Enables you to deliver data warehouse-like capabilities directly on a data lake
  • Open data architecture (loosely-coupled) with the flexibility to choose best-of-breed engines, both today and tomorrow
  • Infinitely scalable platform: an elastic architecture handles workloads of any concurrency
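
The cost-efficiency point in the list above comes down to paying only for the hours an engine is actually running. A minimal sketch of that arithmetic follows. The function name `auto_stop_savings` and the usage pattern are hypothetical, chosen only to show what kind of duty cycle would correspond to a saving of roughly that size. They are not taken from Dremio's pricing or from the data behind the ~60% figure, and the sketch ignores start-up time and right-sizing effects.

```python
def auto_stop_savings(hours_in_period: float, busy_hours: float) -> float:
    """Fraction of compute cost saved by stopping an engine while it is idle,
    compared with running the same engine for the whole period.

    Assumes a flat hourly price, so the price cancels out of the ratio.
    """
    if hours_in_period <= 0:
        raise ValueError("hours_in_period must be positive")
    if not 0 <= busy_hours <= hours_in_period:
        raise ValueError("busy_hours must be between 0 and hours_in_period")
    return 1.0 - busy_hours / hours_in_period


# Hypothetical week: an engine that is busy about 9.6 hours per day.
savings = auto_stop_savings(hours_in_period=7 * 24, busy_hours=7 * 9.6)
print(f"Compute saved versus always-on: {savings:.0%}")
```

An engine that is busy around the clock saves nothing under this model, so the benefit depends entirely on how bursty the workload is.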

Starburst Overview

Starburst is the commercial version of the open source Trino (formerly PrestoSQL). Trino is a SQL-based query engine that gives users the ability to run analytics on data lakes. Starburst is fundamentally designed to handle non-interactive use cases such as ad-hoc workloads.

Without a query acceleration layer, Starburst requires data engineers to copy and move subsets of data into cloud data warehouse (CDW) or data mart platforms and create cubes or aggregation tables. Starburst may work for federated queries over disparate data sources, but it is not designed for interactive BI workloads that demand high concurrency and low latency.

Key Facts about Starburst Technology

  • Based on Trino, a fork of Presto
  • Generally only used for ad-hoc workloads
  • Generally requires ETL pipelines, data copies, and/or BI extracts for low-latency analytics
  • Very limited capabilities around materialized views
  • Inability to provide low-latency analytics for BI workloads
  • Poor UI experience with limited functionality

Deep Dive Resources

CASE STUDY

Boosts Productivity and Accelerates Growth

Leap selected Dremio to provide self-service access to collected data from thousands of energy meters, dramatically improving analyst productivity, reducing data engineering workload, and improving the quality and timeliness of business decisions.

WHITEPAPER

The Path to Self-Service Analytics on the Data Lake

Download this white paper to get a step-by-step roadmap of Dremio adoption. At each step, you’ll learn about benefits gained, as well as the complexities and risks reduced, as workloads are migrated from traditional systems to Dremio.

DATASHEET

Intro to Dremio Cloud

Find out more about Dremio Cloud, the only data lakehouse platform built for SQL and built on open source technologies that both data engineers and data analysts love. Dremio powers BI dashboards and interactive analytics directly on data lake storage.
