Get Started

The Data Lakehouse Query Engine

Dremio is the only engine built from the ground up to deliver high-performing BI dashboards and interactive analytics directly on data lake storage.

The World’s Fastest Lakehouse Engine

With query acceleration technologies like data reflections and columnar cloud cache (C3), we make it possible to achieve interactive response times directly on data lake storage, without having to copy the data into warehouses, marts, extracts or cubes.

C3: Columnar Cloud Cache

Columnar Cloud Cache (C3) enables Dremio to achieve NVMe-level I/O performance on S3/ADLS/GCS by leveraging the NVMe/SSD built into cloud compute instances, like Amazon EC2 and Azure Virtual Machines.

C3 caches only the data required to satisfy your workloads, and can even cache individual microblocks within datasets. If your table has 1,000 columns and you query only a subset of those columns, filtered to a certain timeframe, then C3 caches just that portion of your table.
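The idea behind selective caching can be sketched in a few lines. This is a conceptual illustration, not Dremio's implementation: the cache is keyed by (column, block), populated lazily on first access, and never stores blocks the workload doesn't touch.

```python
# Sketch of selective columnar caching in the spirit of C3 (illustrative
# only): only the (column, block) ranges a query touches are pulled from
# remote object storage and kept on local NVMe/SSD.

class ColumnarCache:
    def __init__(self, fetch_from_remote):
        self._fetch = fetch_from_remote      # e.g. a ranged GET against S3
        self._blocks = {}                    # (column, block_id) -> bytes

    def read(self, column, block_id):
        key = (column, block_id)
        if key not in self._blocks:          # cache miss: one remote read
            self._blocks[key] = self._fetch(column, block_id)
        return self._blocks[key]             # cache hit: served locally

# Hypothetical remote reader that records how many remote I/Os occur.
remote_reads = []
def fetch(column, block_id):
    remote_reads.append((column, block_id))
    return f"{column}:{block_id}".encode()

cache = ColumnarCache(fetch)
# A workload that touches 2 of 1,000 columns only ever caches those 2,
# and repeated queries cost no additional remote I/O.
for _ in range(3):
    cache.read("event_time", 0)
    cache.read("user_id", 0)

print(len(remote_reads))  # 2 remote reads despite 6 logical reads
```

Because repeated reads of hot blocks are served from local storage, the remote I/O count (and its per-request cost) stays flat no matter how often the same data is queried.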

By selectively caching data, C3 also eliminates over 90% of S3/ADLS/GCS I/O costs, which can make up 10-15% of the costs for each query you run.

Data Reflections

Data Reflections are data structures that intelligently precompute aggregations and other operations on your data, so complex aggregations and drilldowns don't have to be computed on the fly.

Reflections are completely transparent to end users. Instead of connecting to a specific materialization, users query the desired tables and views and the Dremio optimizer picks the best Reflections to satisfy and accelerate the query.

Aside from simplicity for data analysts, Reflections are also incredibly easy to create and maintain! You can use a UI or REST API to administer Reflections, instead of having to write complicated SQL statements to define materialized views and refresh rules.
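The transparency described above boils down to a simple pattern: an aggregate is computed ahead of time, and a planner answers matching queries from it while users keep querying the original table. The sketch below illustrates only the concept; the names and the lookup logic are invented for illustration, not Dremio's API.

```python
from collections import defaultdict

# Conceptual sketch of an aggregation reflection (illustrative, not
# Dremio's implementation): totals are precomputed once, and a tiny
# "optimizer" satisfies matching queries from the precomputed structure
# instead of rescanning the raw rows.

sales = [
    {"region": "EU", "amount": 120},
    {"region": "EU", "amount": 80},
    {"region": "US", "amount": 200},
]

# Build the reflection: total amount per region, computed up front.
reflection = defaultdict(int)
for row in sales:
    reflection[row["region"]] += row["amount"]

def query_total(region):
    # The user "queries the table"; when the reflection covers the
    # query, it is answered without touching the raw rows.
    if region in reflection:
        return reflection[region]
    # Fall back to scanning the base data otherwise.
    return sum(r["amount"] for r in sales if r["region"] == region)

print(query_total("EU"))  # 200
```

The key property is that `query_total` has the same signature either way: callers never name the materialization, which is what lets the engine add, drop, or refresh reflections without breaking any queries.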

Cost-Based Optimizer

Query engines can choose from multiple strategies to execute any query you submit. Picking the right strategy is crucial: the wrong join algorithm can grind a query to a halt!

Dremio’s cost-based optimizer picks the fastest path to complete your query by understanding deep statistics about the data you want to query, including location, cardinality, and distribution. It uses that data to accurately predict how much data will flow through the query’s operators so that it can choose the best plan. It also takes into account the Reflections in the system, and rewrites the query plan to use them.
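To make the join-strategy point concrete, here is a toy version of one such decision, assuming (as the text describes) the planner knows each input's estimated row count. Real optimizers weigh many more statistics and many more plan alternatives; this shows only the shape of a cost-based choice, and the threshold is invented.

```python
# Toy cost-based choice between two common distributed join strategies.
# The statistics and threshold here are illustrative, not Dremio's.

BROADCAST_LIMIT = 10_000  # assumed row-count threshold for broadcasting

def choose_join_strategy(left_rows, right_rows):
    smaller = min(left_rows, right_rows)
    if smaller <= BROADCAST_LIMIT:
        # The small side fits in memory everywhere: replicate it to every
        # node and avoid shuffling the big side across the network.
        return "broadcast"
    # Both sides are large: partition both inputs on the join key.
    return "shuffle-hash"

print(choose_join_strategy(1_000_000_000, 5_000))       # broadcast
print(choose_join_strategy(1_000_000_000, 50_000_000))  # shuffle-hash
```

Misjudging cardinality here is exactly how the "wrong join algorithm" failure happens: broadcasting a table the planner thought was small but isn't can overwhelm every node in the cluster.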

Granular Pruning

Runtime filtering enables Dremio to dynamically apply filters derived from the smaller side of a join to the larger table, pruning rows before they are scanned. Dremio applies these filters automatically, without any user involvement, and delivers up to 100x better performance on traditional star and snowflake schemas.
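The mechanism can be sketched on a miniature star schema. The tables and the 30-day cycle below are made up for illustration: the keys observed while scanning the filtered dimension table become a filter pushed into the scan of the fact table, so most fact rows are skipped before the join ever runs.

```python
# Sketch of runtime filtering on a star-schema join (illustrative data):
# keys from the small, already-filtered dimension side are used to prune
# the large fact table at scan time.

# Small dimension table, already filtered to the dates we care about.
dim_dates = [{"date_id": d, "month": "2024-01"} for d in (1, 2, 3)]

# Large fact table (synthetic: date_id cycles through 30 values).
fact_sales = [{"date_id": d % 30, "amount": d} for d in range(100_000)]

# 1. Scan the dimension side first and collect its join keys at runtime.
wanted_keys = {row["date_id"] for row in dim_dates}

# 2. Apply that key set while scanning the fact table ("runtime filter"),
#    so rows that could never match the join are dropped early.
pruned = [row for row in fact_sales if row["date_id"] in wanted_keys]

print(len(fact_sales), len(pruned))  # ~90% of fact rows pruned pre-join
```

The join that follows then processes roughly a tenth of the fact rows, which is where the large speedups on star and snowflake schemas come from.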

Apache Arrow Gandiva

Dremio is a columnar engine powered by Apache Arrow, the open source standard for columnar, in-memory computing (which we co-created!).

Dremio leverages Gandiva, an LLVM-based library for runtime code generation, to create machine code that efficiently evaluates arbitrary expressions on batches of columnar Arrow data, rather than interpreting them row by row.

Gandiva maximizes CPU utilization and leverages optimizations like vectorized processing and SIMD execution to make your queries fly!
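The compile-once, evaluate-per-batch shape that makes this fast can be mimicked in a few lines. To be clear about the analogy: Gandiva emits native machine code via LLVM, while Python's built-in `compile()` only produces bytecode, but both pay the compilation cost once up front and then run the compiled expression over every batch.

```python
# Loose analogue of runtime expression compilation (illustrative only):
# compile an arbitrary expression over a column named "x" once, then
# evaluate it against many batches without re-parsing.

def compile_expression(expr_src):
    # Generate and compile a batch-evaluating expression up front.
    code = compile(f"[({expr_src}) for x in batch]", "<expr>", "eval")
    return lambda batch: eval(code, {"batch": batch})

evaluate = compile_expression("x * 2 + 1")
print(evaluate([1, 2, 3]))  # [3, 5, 7]
print(evaluate([10, 20]))   # [21, 41]
```

In Gandiva the generated code additionally operates on contiguous Arrow buffers, which is what enables the vectorized and SIMD execution mentioned above.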

Apache Arrow Flight

Apache Arrow is Dremio’s internal memory format, and it’s also the standard for Python and R developers with over 20 million downloads per month. Arrow Flight is a modern, open source RPC framework that was co-created by Dremio to enable ultra-fast data transfer between Arrow-enabled systems.

Flight eliminates serialization and deserialization, enables parallelism, and avoids the need for proprietary client-side drivers. The result: 20-100x faster access to query results compared to traditional JDBC and ODBC interfaces.

Multi-Engine Architecture and Workload Management

Dremio features a multi-engine architecture, so you can create multiple right-sized, physically isolated engines for various workloads in your organization. You can easily set up workload management rules to route queries to the engines you define, so you’ll never have to worry again about complex data science workloads preventing an executive’s dashboard from loading.
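Rule-based routing like this amounts to an ordered list of predicates, evaluated until one matches. The rule syntax below is invented for illustration (Dremio configures routing through its own UI and API), but it shows why an executive dashboard and a heavy data-science job never compete for the same engine.

```python
# Sketch of rule-based query routing (rule predicates and engine names
# are illustrative): the first matching rule wins, and each engine is
# sized independently for its workload.

rules = [
    (lambda q: q["user_group"] == "executives", "dashboard-engine"),
    (lambda q: q["estimated_cost"] > 1_000_000, "data-science-engine"),
    (lambda q: True, "default-engine"),  # catch-all rule, always last
]

def route(query):
    for predicate, engine in rules:
        if predicate(query):
            return engine

print(route({"user_group": "executives", "estimated_cost": 5}))
print(route({"user_group": "analysts", "estimated_cost": 2_000_000}))
```

Because an executive's query matches the first rule before cost is ever considered, it lands on a dedicated engine regardless of what else is running.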

Aside from eliminating resource contention, engines can quickly resize to tackle workloads of any concurrency and throughput, and auto-stop when you’re not running queries.

0 noisy neighbors, 100% resource control, 60% lower compute costs.

Ready to Get Started? Here Are Some Resources to Help

...

eBook

High Performance BI on Data Lake

3 Steps for Making High-Performance BI Work Directly with Cloud Data Lake Storage

...

White Paper

The New Data Tier

Learn how the new data tier brings data warehousing capabilities to the data lake and enables net-new capabilities that data warehouses cannot provide.

...

Webinar

The Next Generation Cloud Data Architecture

David Loshin from TDWI helps you prepare for the next-generation cloud data architecture and discusses steps to take best advantage of this modernized environment.

see all resources

Ready for an amazing BI experience?

Test Drive | Deploy Now