8 minute read · September 13, 2023
Dremio Reflection Recommender
· Software Engineer, Dremio
· Senior Staff Engineer, Dremio
· Principal Product Manager, Dremio
Dremio Reflection Recommender
We are delighted to announce the release of the new Dremio Reflection Recommender. The Reflection Recommender eliminates guesswork by facilitating the creation of Reflections that will accelerate the input query workload.
This capability is available now in Dremio Cloud and 24.2 release of Dremio software.
Accelerate BI workloads with Reflections in seconds
A Reflection is an optimized materialization of an “anchor” table or view that can be used to partially or wholly accelerate data processing. A Reflection may be aggregated, sorted, and/or partitioned in various ways. The Dremio query engine automatically uses Reflections to accelerate queries by matching and substituting them in a query plan.
Reflections are particularly useful for BI workloads because they address performance challenges for BI dashboards and reports (e.g., Tableau, Power BI, Looker). As a result, data teams no longer need to export data from the lakehouse into Tableau extracts or Power BI imports, or to wire up new ETL pipelines to precook tables in the lakehouse for BI use cases.
Dremio now includes the Reflection Recommender using the sys.recommend_reflections table function. To get the most out of Dremio Reflections, it is essential to design Reflections that require fewer resources to build, but can accelerate frequently executed slow and expensive queries. The Reflection Recommender simplifies the creation of Reflections to accelerate BI workloads by streamlining the design process.
This blog demonstrates the power of the Reflection Recommender.
Reflection Recommender in action:
With the Reflection Recommender, users identify the most frequently run slow queries and submit their job IDs with the sys.recommend_reflections table function. The recommender will retrieve job ID profiles from the catalog, process them, and output a SQL statement to construct VIEW, also known as a supporting anchor, and an ALTER DATASET statement to construct a Reflection to accelerate input queries.
The following command generates recommendations for input BI workloads in seconds:
select * from TABLE(sys.recommend_reflections(ARRAY['<job_id>'[, ...] ]));
For this demonstration, we’ll use three queries on the TPC-H dataset that took a few seconds, but you can get recommendations for your dataset and workload. We will start by pulling the job_id of these queries from Dremio UI and call sys.recommend_reflections table function to get recommendation within seconds. Once we have a recommendation, we will apply this recommendation and confirm that the above queries will get accelerated and run under a second.
-- Query: 1 -- Job ID: 1b229fe1-f17f-b153-3675-c677ccfbff00 SELECT Max("p_size") FROM s3."datasets.dremio.com".tpch.sf10.parquet.part JOIN s3."datasets.dremio.com".tpch.sf10.parquet.lineitem ON "p_partkey" = "l_partkey" GROUP BY "p_brand"; -- Query: 2 -- Query with various aggregate functions over different columns -- Job ID: 1b22a3c3-3a2b-431f-2f8a-4a27f4ab8400 SELECT MAX("p_size"), MIN("P_RETAILPRICE"), AVG("P_RETAILPRICE"), SUM("p_size"), COUNT("p_size") FROM s3."datasets.dremio.com".tpch.sf10.parquet.part JOIN s3."datasets.dremio.com".tpch.sf10.parquet.lineitem ON "p_partkey" = "l_partkey" GROUP BY "p_brand"; -- Query: 3 -- Query with different filters(WHERE, HAVING) + ORDER BY and LIMIT clauses -- Job ID: 1b229cdd-4fe2-5ee1-d3e1-7def15fe4700 SELECT MAX("p_size"), MIN("P_RETAILPRICE"), AVG("P_RETAILPRICE") AS avg_price, SUM("p_size"), COUNT("p_size") FROM s3."datasets.dremio.com".tpch.sf10.parquet.part JOIN s3."datasets.dremio.com".tpch.sf10.parquet.lineitem ON "p_partkey" = "l_partkey" WHERE NOT "L_TAX" < 0.01 GROUP BY "p_brand" HAVING avg_price > 50 ORDER BY avg_price LIMIT 50; select * from TABLE (sys.recommend_reflections(ARRAY['1b229fe1-f17f-b153-3675-c677ccfbff00', '1b22a3c3-3a2b-431f-2f8a-4a27f4ab8400', '1b229cdd-4fe2-5ee1-d3e1-7def15fe4700']));
In a few seconds, the sys.recommend_reflections table function will generate the following Reflection recommendation for the above three queries. CREATE VIEW is also referred to as a supporting anchor and ALTER DATASET is used to create the aggregate Reflection in the output.
CREATE VIEW "recommended_view"."Dataset_a0f7fb80-9a5e-4e10-8ea9-725fe4fd5d56" AS SELECT "lineitem1"."L_TAX" AS "F0[L_TAX]", "part1"."P_BRAND" AS "F1[P_BRAND]", "part1"."P_SIZE" AS "F2[P_SIZE]", "part1"."P_RETAILPRICE" AS "F3[P_RETAILPRICE]", "part1"."P_PARTKEY" AS "extra#0", "part1"."P_NAME" AS "extra#1", "part1"."P_MFGR" AS "extra#2", "part1"."P_TYPE" AS "extra#4", "part1"."P_CONTAINER" AS "extra#6", "part1"."P_COMMENT" AS "extra#8", "lineitem1"."L_ORDERKEY" AS "extra#9", "lineitem1"."L_PARTKEY" AS "extra#10", "lineitem1"."L_SUPPKEY" AS "extra#11", "lineitem1"."L_LINENUMBER" AS "extra#12", "lineitem1"."L_QUANTITY" AS "extra#13", "lineitem1"."L_EXTENDEDPRICE" AS "extra#14", "lineitem1"."L_DISCOUNT" AS "extra#15", "lineitem1"."L_RETURNFLAG" AS "extra#17", "lineitem1"."L_LINESTATUS" AS "extra#18", "lineitem1"."L_SHIPDATE" AS "extra#19", "lineitem1"."L_COMMITDATE" AS "extra#20", "lineitem1"."L_RECEIPTDATE" AS "extra#21", "lineitem1"."L_SHIPINSTRUCT" AS "extra#22", "lineitem1"."L_SHIPMODE" AS "extra#23", "lineitem1"."L_COMMENT" AS "extra#24" FROM "s3"."datasets.dremio.com"."tpch"."sf10"."parquet"."lineitem" AS "lineitem1" INNER JOIN "s3"."datasets.dremio.com"."tpch"."sf10"."parquet"."part" AS "part1" ON "part1"."P_PARTKEY" = "lineitem1"."L_PARTKEY"; ALTER DATASET "recommended_view"."Dataset_a0f7fb80-9a5e-4e10-8ea9-725fe4fd5d56" CREATE AGGREGATE REFLECTION "agg_f1d8fa3b-1d80-4502-9f39-dcb9a328220f" USING DIMENSIONS ("F0[L_TAX]", "F1[P_BRAND]") MEASURES ("F3[P_RETAILPRICE]" (SUM, MIN, COUNT), "F2[P_SIZE]" (MAX, SUM, COUNT));
After the user has created and altered the preceding view definition, three queries that took more than a few seconds will execute in less than one second.
With Dremio Reflection Recommender, you can now accelerate your BI workloads in a matter of seconds by obtaining recommendations. In this release, Dremio will only recommend Aggregate Reflections, but future releases will include recommendations for raw Reflections. More information about this table function can be found in our documentation here.