

Announcing the Dremio Fall 2020 Release

Oct 27, 2020
Louise Westoby

With unlimited inexpensive storage as well as open source file and table formats, the data lake has emerged as the default data repository in the cloud. In addition, the cloud’s infinite supply of on-demand compute resources has led to the rise of decoupled and elastic compute engines such as Dremio (SQL), Apache Spark (batch) and Apache Kafka (streaming). The resulting separation of data from both storage and compute means that cloud data lakes are much more scalable and cost-efficient than cloud data warehouses.

Despite these advances, cloud data lakes have not historically addressed the needs of business analysts, who are either:

  • Constrained by querying the data directly on the data lake using traditional SQL engines that are too slow for modern BI tools
  • Delayed by long waits while the data team moves subsets of the data into data warehouses or data marts and, in some cases, then optimizes that data for querying by creating cubes, extracts and aggregation tables

Today, we announced the Dremio Fall 2020 Release, which tackles these challenges head-on, making BI on the data lake a reality. You no longer have to invest weeks of engineering work to update dashboards or datasets, and your BI users gain more control over their data and significantly faster time to insight.


For more information on the Dremio Fall Release, attend the November 12 webinar



New Low-Latency Query Technology Accelerates BI Queries Directly on S3 and ADLS

The Dremio Fall 2020 Release includes new features that deliver sub-second query response times directly on Amazon S3 and Azure Data Lake Storage (ADLS), support for thousands of concurrent users and queries, and more than 100x performance acceleration for star schema workloads.

Apache Arrow Caching

Dremio can now cache data reflections (physically optimized representations of data) in the Apache Arrow format, so the data can be loaded directly into memory without a conversion step. This eliminates the need to decode and decompress data at runtime, enabling sub-second query response times for BI dashboards. In a TPC-H benchmark, query processing time dropped by up to 80% when the reflections were cached in Arrow format.
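To make the idea concrete, here is a minimal sketch in plain Python. It is not Dremio's implementation and does not use the Arrow library; the names `read_encoded` and `read_cached` are hypothetical. It only contrasts a compressed, encoded copy of a column, which must be decompressed and parsed on every read, with a fixed-width buffer that is already in its in-memory layout and can be viewed without copying.

```python
import json
import zlib
from array import array

values = list(range(1_000))

# Path 1: a compressed, encoded copy of the column.
# Every read pays for decompression and parsing.
encoded = zlib.compress(json.dumps(values).encode("utf-8"))

def read_encoded(blob):
    return json.loads(zlib.decompress(blob))

# Path 2: the column cached as a contiguous buffer of 64-bit integers,
# i.e. already in the layout the reader wants in memory.
cached = array("q", values).tobytes()

def read_cached(buf):
    # cast() reinterprets the bytes as 64-bit integers without copying them.
    return memoryview(buf).cast("q")

total_encoded = sum(read_encoded(encoded))
total_cached = sum(read_cached(cached))
```

Both paths return the same values; the difference is the work done on each read. The trade-off is that the ready-to-use buffer is usually larger than the compressed copy.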


Scale-Out Query Planning

Dremio now scales horizontally across multiple coordinator nodes as well as executor nodes. This makes it possible to run high-concurrency workloads consisting of thousands of simultaneous users and queries.

Runtime Filtering

With runtime filtering, Dremio automatically uses information gathered from dimension tables at runtime to drastically reduce the amount of data that must be read from a fact table. This results in a speedup of more than 100x for star schema workloads, which have traditionally been run only on data warehouses.
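The general technique can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Dremio's implementation: the tables, the `join_with_runtime_filter` function, and the use of a plain set as the filter are all hypothetical (engines commonly use more compact structures such as Bloom filters or min/max ranges, and apply them to skip whole files or row groups rather than individual rows).

```python
# Small dimension table and a much larger fact table (made-up data).
stores = [
    {"store_id": 1, "region": "EMEA"},
    {"store_id": 2, "region": "APAC"},
    {"store_id": 3, "region": "EMEA"},
]
sales = [{"store_id": (i % 100) + 1, "amount": 10} for i in range(10_000)]

def join_with_runtime_filter(dim_rows, fact_rows, region):
    # Build side: evaluate the dimension predicate first.
    build = {d["store_id"]: d for d in dim_rows if d["region"] == region}
    # The runtime filter is the set of join keys that survived the predicate.
    key_filter = set(build)

    joined, rows_passed = [], 0
    for row in fact_rows:
        # Probe side: discard fact rows whose key cannot possibly match.
        if row["store_id"] not in key_filter:
            continue
        rows_passed += 1
        joined.append({**row, "region": build[row["store_id"]]["region"]})
    return joined, rows_passed

joined, rows_passed = join_with_runtime_filter(stores, sales, "EMEA")
```

In this example only the fact rows for the two EMEA stores reach the join; the rest are dropped as early as possible, which is where the savings come from when the fact table is large and the dimension predicate is selective.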

Improved Front- and Back-End Connectivity

Beyond delivering sub-second query speeds, Dremio now makes it faster and easier for your BI users to run SQL queries using Microsoft Power BI. You can also join data in your data lake with data in relational databases without creating copies.

Enhanced Power BI Integration

Microsoft and Dremio have partnered to build a deeper integration between Power BI and Dremio that allows users to launch Power BI directly from the Dremio interface with the click of a button. Power BI then automatically connects using the native Dremio connector, so users can easily transition from building a dataset in Dremio to analyzing their data in Power BI.

 

Dremio is a Microsoft Gold Cloud Competency Partner, a Power BI Partner and an Azure Data Lake Storage (ADLS) partner.

External Queries

With the Fall 2020 Release, Dremio now enables users to incorporate explicit SQL queries on relational databases within Dremio virtual datasets. This makes it easy to join data between large datasets in a cloud data lake and smaller datasets in existing relational databases. You can push down any SQL statement, including proprietary and custom database functions.
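As a conceptual illustration only, the following Python sketch uses an in-memory SQLite database to stand in for the relational source and a list of dictionaries to stand in for data lake records. It does not use Dremio's syntax or API; `external_query`, the table, and the sample rows are hypothetical. The point it shows is that the SQL text is handed to the database unchanged, so database-side functions run where the data lives, and only the result is joined with the larger dataset.

```python
import sqlite3

# Stand-in for an existing relational database (hypothetical data).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Stand-in for a large dataset in the data lake.
lake_orders = [
    {"customer_id": 1, "total": 120.0},
    {"customer_id": 2, "total": 75.5},
    {"customer_id": 1, "total": 30.0},
]

def external_query(conn, sql):
    """Send the SQL text to the database as-is and return its rows."""
    return conn.execute(sql).fetchall()

# upper() stands in for any function the source database evaluates itself.
rows = external_query(db, "SELECT id, upper(name) FROM customers")
names = dict(rows)

# Join the pushed-down result with the lake records; no table is copied.
joined = [(names[o["customer_id"]], o["total"]) for o in lake_orders]
```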


Slash Data Warehouse Costs with Dremio

This latest Dremio release delivers simple, self-service access to data and lets your analysts see results immediately, reducing their dependency on manual ETL processes and data engineering. You can now run production BI workloads directly on S3 and ADLS without compromising on query performance or having to move data into data warehouses, cubes, aggregation tables or extracts. The result? You can rely far less on your data warehouse and significantly reduce data warehousing costs.

Availability

The new product features are available today in both AWS and Azure. Deployment options are available here.

Learn More

To learn more about the new features available in the Dremio Fall 2020 Release, please join us for this webinar. We’ll discuss and demonstrate:

  • New Apache Arrow-based capabilities that deliver sub-second query response times directly on cloud data lakes
  • How you can support thousands of concurrent users and queries
  • How Dremio accelerates performance >100x for data warehousing workloads that use star schemas
  • A new, built-in Power BI integration that lets users immediately query data via Power BI’s DirectQuery

For a complete list of new features, enhancements, changes and fixes, please review the release notes. As always, we look forward to your feedback. Please post any questions or comments on our community site.
