Dremio Reflection Recommender

We are delighted to announce the release of the new Dremio Reflection Recommender. The Reflection Recommender eliminates guesswork by facilitating the creation of Reflections that will accelerate the input query workload.

This capability is available now in Dremio Cloud and 24.2 release of Dremio software.

Accelerate BI workloads with Reflections in seconds

A Reflection is an optimized materialization of an “anchor” table or view that can be used to partially or wholly accelerate data processing. A Reflection may be aggregated, sorted, and/or partitioned in various ways. The Dremio query engine automatically uses Reflections to accelerate queries by matching and substituting them in a query plan.

Reflections are particularly useful for BI workloads because they address performance challenges for BI dashboards and reports (e.g., Tableau, Power BI, Looker). As a result, data teams no longer need to export data from the lakehouse into Tableau extracts or Power BI imports, or to wire up new ETL pipelines to precook tables in the lakehouse for BI use cases.

Dremio now includes the Reflection Recommender using the sys.recommend_reflections table function. To get the most out of Dremio Reflections, it is essential to design Reflections that require fewer resources to build, but can accelerate frequently executed slow and expensive queries. The Reflection Recommender simplifies the creation of Reflections to accelerate BI workloads by streamlining the design process.

This blog demonstrates the power of the Reflection Recommender.

Reflection Recommender in action:

With the Reflection Recommender, users identify the most frequently run slow queries and submit their job IDs with the sys.recommend_reflections table function. The recommender will retrieve job ID profiles from the catalog, process them, and output a SQL statement to construct VIEW, also known as a supporting anchor, and an ALTER DATASET statement to construct a Reflection to accelerate input queries.

The following command generates recommendations for input BI workloads in seconds:

select * from TABLE(sys.recommend_reflections(ARRAY['<job_id>'[, ...] ]));

For this demonstration, we’ll use three queries on the TPC-H dataset that took a few seconds, but you can get recommendations for your dataset and workload. We will start by pulling the job_id of these queries from Dremio UI and call sys.recommend_reflections table function to get recommendation within seconds. Once we have a recommendation, we will apply this recommendation and confirm that the above queries will get accelerated and run under a second.

-- Query: 1
-- Job ID: 1b229fe1-f17f-b153-3675-c677ccfbff00
SELECT Max("p_size")
FROM s3."datasets.dremio.com".tpch.sf10.parquet.part
JOIN s3."datasets.dremio.com".tpch.sf10.parquet.lineitem
ON "p_partkey" = "l_partkey"
GROUP BY "p_brand";
-- Query: 2
-- Query with various aggregate functions over different columns
-- Job ID: 1b22a3c3-3a2b-431f-2f8a-4a27f4ab8400
SELECT MAX("p_size"), MIN("P_RETAILPRICE"), AVG("P_RETAILPRICE"), SUM("p_size"),
COUNT("p_size")
FROM s3."datasets.dremio.com".tpch.sf10.parquet.part
JOIN s3."datasets.dremio.com".tpch.sf10.parquet.lineitem
ON "p_partkey" = "l_partkey"
GROUP BY "p_brand";
-- Query: 3
-- Query with different filters(WHERE, HAVING) + ORDER BY and LIMIT clauses
-- Job ID: 1b229cdd-4fe2-5ee1-d3e1-7def15fe4700
SELECT MAX("p_size"), MIN("P_RETAILPRICE"), AVG("P_RETAILPRICE") AS avg_price,
SUM("p_size"), COUNT("p_size")
FROM s3."datasets.dremio.com".tpch.sf10.parquet.part
JOIN s3."datasets.dremio.com".tpch.sf10.parquet.lineitem
ON "p_partkey" = "l_partkey"
WHERE NOT "L_TAX" < 0.01
GROUP BY "p_brand"
HAVING avg_price > 50
ORDER BY avg_price
LIMIT 50;
select * from TABLE (sys.recommend_reflections(ARRAY['1b229fe1-f17f-b153-3675-c677ccfbff00',
'1b22a3c3-3a2b-431f-2f8a-4a27f4ab8400', '1b229cdd-4fe2-5ee1-d3e1-7def15fe4700']));

In a few seconds, the sys.recommend_reflections table function will generate the following Reflection recommendation for the above three queries. CREATE VIEW is also referred to as a supporting anchor and ALTER DATASET is used to create the aggregate Reflection in the output.

CREATE VIEW "recommended_view"."Dataset_a0f7fb80-9a5e-4e10-8ea9-725fe4fd5d56" AS
SELECT "lineitem1"."L_TAX" AS "F0[L_TAX]", "part1"."P_BRAND" AS "F1[P_BRAND]",
"part1"."P_SIZE" AS "F2[P_SIZE]", "part1"."P_RETAILPRICE" AS "F3[P_RETAILPRICE]",
"part1"."P_PARTKEY" AS "extra#0", "part1"."P_NAME" AS "extra#1", "part1"."P_MFGR" AS
"extra#2", "part1"."P_TYPE" AS "extra#4", "part1"."P_CONTAINER" AS "extra#6",
"part1"."P_COMMENT" AS "extra#8", "lineitem1"."L_ORDERKEY" AS "extra#9",
"lineitem1"."L_PARTKEY" AS "extra#10", "lineitem1"."L_SUPPKEY" AS "extra#11",
"lineitem1"."L_LINENUMBER" AS "extra#12", "lineitem1"."L_QUANTITY" AS "extra#13",
"lineitem1"."L_EXTENDEDPRICE" AS "extra#14", "lineitem1"."L_DISCOUNT" AS "extra#15",
"lineitem1"."L_RETURNFLAG" AS "extra#17", "lineitem1"."L_LINESTATUS" AS "extra#18",
"lineitem1"."L_SHIPDATE" AS "extra#19", "lineitem1"."L_COMMITDATE" AS "extra#20",
"lineitem1"."L_RECEIPTDATE" AS "extra#21", "lineitem1"."L_SHIPINSTRUCT" AS "extra#22",
"lineitem1"."L_SHIPMODE" AS "extra#23", "lineitem1"."L_COMMENT" AS "extra#24"
FROM "s3"."datasets.dremio.com"."tpch"."sf10"."parquet"."lineitem" AS "lineitem1"
INNER JOIN "s3"."datasets.dremio.com"."tpch"."sf10"."parquet"."part" AS "part1" ON
"part1"."P_PARTKEY" = "lineitem1"."L_PARTKEY";
ALTER DATASET "recommended_view"."Dataset_a0f7fb80-9a5e-4e10-8ea9-725fe4fd5d56"
CREATE AGGREGATE REFLECTION "agg_f1d8fa3b-1d80-4502-9f39-dcb9a328220f"
USING DIMENSIONS ("F0[L_TAX]", "F1[P_BRAND]") MEASURES ("F3[P_RETAILPRICE]" (SUM, MIN,
COUNT), "F2[P_SIZE]" (MAX, SUM, COUNT));

After the user has created and altered the preceding view definition, three queries that took more than a few seconds will execute in less than one second.

With Dremio Reflection Recommender, you can now accelerate your BI workloads in a matter of seconds by obtaining recommendations. In this release, Dremio will only recommend Aggregate Reflections, but future releases will include recommendations for raw Reflections. More information about this table function can be found in our documentation here.

Ready to Get Started?

Bring your users closer to the data with organization-wide self-service analytics and lakehouse flexibility, scalability, and performance at a fraction of the cost. Run Dremio anywhere with self-managed software or Dremio Cloud.