The Data Lakehouse Query Engine

Dremio is the only engine built from the ground up to deliver high-performing BI dashboards and interactive analytics directly on data lake storage.

The World’s Fastest Lakehouse Engine

With query acceleration technologies like Reflections and Columnar Cloud Cache (C3), we make it possible to achieve interactive response times directly on data lake storage, without having to copy the data into warehouses, marts, extracts, or cubes.

C3: Columnar Cloud Cache

Columnar Cloud Cache (C3) enables Dremio to achieve NVMe-level I/O performance on S3/ADLS/GCS by leveraging the NVMe SSDs built into cloud compute instances like Amazon EC2 and Azure Virtual Machines.

C3 only caches data required to satisfy your workloads and can even cache individual microblocks within datasets. If your table has 1,000 columns and you only query a subset of those columns and filter for data within a certain timeframe, then C3 will just cache that portion of your table.

By selectively caching data, C3 also eliminates over 90% of S3/ADLS/GCS I/O costs, which can make up 10-15% of the costs for each query you run.
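To make the idea concrete, here’s a minimal Python sketch of microblock-level caching. It is purely illustrative, not Dremio’s implementation; the cache key and fetch helper are hypothetical.

```python
from typing import Callable, Dict, Tuple

class MicroblockCache:
    """Illustrative only: cache object-store reads at (file, column, block) granularity."""

    def __init__(self, fetch: Callable[[str, str, int], bytes]):
        self._fetch = fetch                                   # reads one block from S3/ADLS/GCS
        self._blocks: Dict[Tuple[str, str, int], bytes] = {}  # stand-in for local NVMe storage

    def read(self, file: str, column: str, block: int) -> bytes:
        key = (file, column, block)
        if key not in self._blocks:           # cold read: one remote I/O, then kept locally
            self._blocks[key] = self._fetch(file, column, block)
        return self._blocks[key]              # warm read: served at local-SSD speed

# A query touching 3 of 1,000 columns only ever populates blocks for those 3 columns,
# so repeated dashboard queries skip most remote GETs (and the I/O cost that goes with them).
```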

Reflections

Reflections are data structures that intelligently precompute aggregations and other operations on data, so complex aggregations and drilldowns don’t have to be computed on the fly.

Reflections are completely transparent to end users. Instead of connecting to a specific materialization, users query the desired tables and views and the Dremio optimizer picks the best Reflections to satisfy and accelerate the query.

Aside from being transparent to data analysts, Reflections are also incredibly easy to create and maintain! You can administer Reflections through a UI or REST API, instead of having to write complicated SQL statements to define materialized views and refresh rules.
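For example, here’s a hedged Python sketch of creating an aggregation Reflection through the REST API. The /api/v3/reflection endpoint follows Dremio’s documented Reflection API, but the host, token, dataset id, and exact payload fields below are assumptions and may vary by version.

```python
import requests

DREMIO_URL = "https://dremio.example.com"   # hypothetical host
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

# Hypothetical payload: precompute sales totals by region and day.
reflection = {
    "type": "AGGREGATION",
    "name": "sales_by_region_day",
    "datasetId": "<dataset-id>",            # id of the table or view to accelerate
    "enabled": True,
    "dimensionFields": [{"name": "region"}, {"name": "order_date"}],
    "measureFields": [{"name": "amount", "measureTypeList": ["SUM", "COUNT"]}],
}

resp = requests.post(f"{DREMIO_URL}/api/v3/reflection", json=reflection, headers=HEADERS)
resp.raise_for_status()
print(resp.json()["id"])  # existing queries are accelerated transparently from here on
```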

Cost-Based Optimizer

Query engines can choose from multiple strategies to execute any query you submit. Picking the right strategy is crucial: the wrong join algorithm can grind a query to a halt!

Dremio’s cost-based optimizer picks the fastest path to complete your query by maintaining detailed statistics about the data you query, including its location, cardinality, and distribution. It uses those statistics to predict how much data will flow through each of the query’s operators and chooses the best plan accordingly. It also takes into account the Reflections in the system, rewriting the query plan to use them when they can satisfy the query faster.
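To see which plan the optimizer chose for a given query (including any Reflection substitutions), you can prepend EXPLAIN PLAN FOR to the query. In the sketch below, `conn` is an assumed, already-open DB-API connection to Dremio (for example through its ODBC driver), and the tables are made up.

```python
# `conn` is assumed: any open DB-API connection to Dremio (e.g. via its ODBC driver).
cursor = conn.cursor()
cursor.execute("""
    EXPLAIN PLAN FOR
    SELECT r.region, SUM(s.amount)
    FROM sales s JOIN regions r ON s.region_id = r.id
    GROUP BY r.region
""")
# The physical plan lists each operator with its estimated row counts,
# and shows any Reflection the optimizer substituted for the base tables.
for row in cursor.fetchall():
    print(row[0])
```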

Granular Pruning

Runtime filtering enables Dremio to dynamically apply filters derived from the smaller table in a join (such as a dimension table) to the scan of the larger table (such as a fact table). Dremio applies these filters automatically, without any user involvement, and delivers up to 100x better performance on traditional star and snowflake schemas.
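Conceptually, runtime filtering looks like the following Python sketch: the join keys that survive the dimension-side filter are collected (Dremio uses a compact structure for this in practice) and pushed into the fact-table scan before the join runs. This is a simplified illustration, not Dremio’s implementation.

```python
# Star schema: a small dimension table and a large fact table (made-up data).
dim_store = [
    {"store_id": 1, "state": "CA"},
    {"store_id": 2, "state": "NY"},
    {"store_id": 3, "state": "CA"},
]
fact_sales = [{"store_id": i % 500, "amount": i} for i in range(1_000_000)]

# 1. Evaluate the WHERE clause on the small dimension side first.
matching_keys = {row["store_id"] for row in dim_store if row["state"] == "CA"}

# 2. Push those keys down into the fact scan, so non-matching rows are
#    skipped before they ever reach the join operator.
total = sum(row["amount"] for row in fact_sales if row["store_id"] in matching_keys)
print(total)
```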

Apache Arrow Gandiva

Dremio is a columnar engine powered by Apache Arrow, the open source standard for columnar, in-memory computing (which we co-created!).

Dremio leverages Gandiva, an LLVM-based library for runtime code generation, to compile machine code that efficiently evaluates arbitrary expressions over batches of columnar Arrow data, rather than evaluating rows one at a time.

Gandiva maximizes CPU utilization and leverages optimizations like vectorized processing and SIMD execution to make your queries fly!
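The Gandiva bindings shipped with pyarrow let you see the compile-then-evaluate flow for yourself. This sketch assumes a pyarrow build with Gandiva enabled; the schema and data are made up.

```python
import pyarrow as pa
import pyarrow.gandiva as gandiva  # requires a pyarrow build with Gandiva enabled

schema = pa.schema([("a", pa.int64()), ("b", pa.int64())])

# Build the expression tree a + b, then JIT-compile it to machine code via LLVM.
builder = gandiva.TreeExprBuilder()
node = builder.make_function(
    "add",
    [builder.make_field(schema.field("a")), builder.make_field(schema.field("b"))],
    pa.int64(),
)
expr = builder.make_expression(node, pa.field("sum", pa.int64()))
projector = gandiva.make_projector(schema, [expr], pa.default_memory_pool())

# Evaluate the compiled expression over an entire columnar batch at once.
batch = pa.record_batch([pa.array([1, 2, 3]), pa.array([10, 20, 30])], schema=schema)
(result,) = projector.evaluate(batch)
print(result)  # [11, 22, 33]
```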


Apache Arrow Flight

Apache Arrow is Dremio’s internal memory format, and it’s also the standard for Python and R developers, with over 20 million downloads per month. Arrow Flight is a modern, open source RPC framework, co-created by Dremio, that enables ultra-fast data transfer between Arrow-enabled systems.

Flight eliminates serialization and deserialization, enables parallelism, and avoids the need for proprietary client-side drivers. The result: 20-100x faster access to query results compared to traditional JDBC and ODBC interfaces.
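Here’s a hedged Python sketch of fetching query results from Dremio over Arrow Flight with pyarrow. The host, credentials, and table are placeholders; 32010 is Dremio’s default Flight port.

```python
import pyarrow.flight as flight

# Hypothetical endpoint and credentials; 32010 is Dremio's default Flight port.
client = flight.FlightClient("grpc+tcp://dremio.example.com:32010")
bearer = client.authenticate_basic_token("user", "password")
options = flight.FlightCallOptions(headers=[bearer])

# Describe the query, then stream its results as Arrow record batches.
descriptor = flight.FlightDescriptor.for_command("SELECT * FROM sales LIMIT 100")
info = client.get_flight_info(descriptor, options)
reader = client.do_get(info.endpoints[0].ticket, options)

table = reader.read_all()  # Arrow end to end: no row-wise (de)serialization,
print(table.num_rows)      # no proprietary client driver
```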


Multi-Engine Architecture and Workload Management

Dremio features a multi-engine architecture, so you can create multiple right-sized, physically isolated engines for the various workloads in your organization. You can easily set up workload management rules to route queries to the engines you define, so you’ll never again have to worry about complex data science workloads preventing an executive’s dashboard from loading.

Aside from eliminating resource contention, engines can quickly resize to tackle workloads of any concurrency and throughput, and auto-stop when you’re not running queries.
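The routing idea itself is simple. The sketch below is a purely conceptual Python illustration of first-match rule routing; it is not Dremio’s rule syntax or API, and every name in it is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Query:
    user: str
    workload: str  # e.g. "dashboard", "data_science", "adhoc"

# First matching rule wins; each engine is a physically isolated pool of executors.
RULES = [
    (lambda q: q.workload == "dashboard",    "bi-engine-small"),
    (lambda q: q.workload == "data_science", "ds-engine-xlarge"),
    (lambda q: True,                         "default-engine"),  # catch-all
]

def route(query: Query) -> str:
    return next(engine for matches, engine in RULES if matches(query))

print(route(Query("cfo", "dashboard")))         # -> bi-engine-small
print(route(Query("ml_team", "data_science")))  # -> ds-engine-xlarge
```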

Zero noisy neighbors, 100% resource control, 60% lower compute costs.
