13 minute read · March 30, 2020
How to Query Your Data Lake Using SQL Parameters in Excel
· Manager of Consulting Architects EMEA, Dremio
Introduction
There are a myriad of BI tools available today. While they offer a great variety of features for almost every data analysis use case imaginable, spreadsheets are still the most popular way to consume data. If you are one of the whopping 800 million Microsoft Excel users, read on to learn how to tackle one of the most common challenges when trying to query data - from Microsoft Excel - directly on your data lake. In order to make the most out of your data lake, it is important to have swift access to your data, this will allow you to create interactive reports that will support you or your stakeholders to make timely data-driven decisions to improve operations in your organization. A common challenge when creating these reports is the real-time requirements that they bring along, i.e: the ability to change parameters in the report to satisfy data requirements by different users or departments, custom levels of granularity, etc. Up to this point, a normal workaround for this challenge was to create a static report based on each request. In this tutorial I will guide you through the process of how you can connect directly to Dremio via Excel, we show you how to write queries in Excel that get executed in Dremio whenever the sheet is refreshed and we show you how you can parameterise these queries to make Excel’s interactions with the data more dynamic. This tutorial assumes you already have the Dremio ODBC driver installed on the machine where you will run Excel, and that you can connect successfully to Dremio from this machine. If you are not familiar with, or what to refresh your Dremio basics, take a look at our Getting Oriented to Dremio tutorial before proceeding.
Add connection to Dremio and define a query on that connection
The first step in the process is to connect to Dremio in order to define the data that we eventually want to parameterize.
-
- Under the Data tab select Get Data > From Other Sources > From ODBC
-
- Select Dremio Connector from the list of Data source names, then expand the Advanced options:
-
- In the SQL Statement (optional) box, enter a valid Dremio query, then click OK. RECOMMENDATION: Add a WHERE clause that represents the type of filtering you want to perform when any parameters are introduced:
-
- When the query has executed you will see a results summary window, click Load
Now we have a query prepared and ready for a parameter to be added.
Create Excel Query Function
Before we attempt to add a parameter to the Dremio query, one of the key pieces to enabling a seamless parameterization experience is creating a generic Excel query function that allows you to specify a named Excel cell as input and provides the value in that cell as output.This is essentially a one-time task, the function can be used many times across all sheets in a single workbook. In this tutorial we assume the function is called fGetValue, but you can call it whatever you like.
-
- In Excel, under the Data tab select Get Data > From Other Sources > Blank Query
-
- In the Power Query Editor that is presented, select Advanced Editor
-
- Paste the following function into the Advanced Editor window, then click Done:
<figure class="highlight"> <pre><code class="language-sql" data-lang="sql"></code></pre> <table class="rouge-table"> <tbody> <tr> <td class="gutter gl"> <pre class="lineno">1 2 3 4 5 6 7</pre> </td> <td class="code"> <pre><span class="n">let</span> <span class="n">fGetValue</span><span class="o">=</span><span class="p">(</span><span class="n">rangeName</span><span class="p">)</span> <span class="o">=></span> <span class="n">let</span> <span class="n">name</span> <span class="o">=</span> <span class="n">Excel</span><span class="p">.</span><span class="n">CurrentWorkbook</span><span class="p">()</span><span class="err">{</span><span class="p">[</span><span class="n">Name</span><span class="o">=</span><span class="n">rangeName</span><span class="p">]</span><span class="err">}</span><span class="p">[</span><span class="n">Content</span><span class="p">],</span> <span class="n">value</span> <span class="o">=</span> <span class="n">name</span><span class="err">{</span><span class="mi">0</span><span class="err">}</span><span class="p">[</span><span class="n">Column1</span><span class="p">]</span> <span class="k">in</span> <span class="n">value</span> <span class="k">in</span> <span class="n">fGetValue</span></pre> </td> </tr> </tbody> </table> <pre><code class="language-sql" data-lang="sql"></code></pre> <p><code class="language-sql" data-lang="sql"></code><code class="language-sql" data-lang="sql"></code></p> </figure>
-
- Change the name of the query to fGetValue, then click Close & Load:
Add a parameter to a Dremio SQL query in Excel
With our Dremio SQL query and our generic function now ready, we are ready to add the function into the SQL definition in place of any fixed value we may have added to a WHERE clause in the original query. If no WHERE clause was initially present then now will be the time to add one.
-
- Right-Click on a cell in Excel that you want to use as a value for a parameter, select Define Name
-
- Provide a name for the column, in this case we choose to name the column region2, notice how it captures the specific cell this name refers to. Click OK:
- Provide a name for the column, in this case we choose to name the column region2, notice how it captures the specific cell this name refers to. Click OK:
-
- Right-click on the data table that was previously loaded into Excel and select Table > Edit Query from the popup menu
- Right-click on the data table that was previously loaded into Excel and select Table > Edit Query from the popup menu
-
- Select Advanced Editor
-
- In the SQL Query dialog, replace any fixed value predicates with calls to the previously created function instead, passing the name of the cell containing the parameter value into the function, then click Done. Example:
let Source = Odbc.Query("dsn=Dremio Connector", "SELECT COUNTRIES.COUNTRY_ID, COUNTRIES.COUNTRY_NAME, COUNTRIES.REGION_ID FROM ""Staging.Application"".COUNTRIES WHERE COUNTRIES.REGION_ID = " & Number.ToText(fGetValue("region2"))) in Source
-
- Back in the Power Query Editor, click Close & Load
-
- Now we can change the value in the cell whose name we changed to region2, then click Refresh All:
-
- At this point a new query will run in Dremio with the parameter set to our new value and the results will be returned to Excel:
Wrapping Up!
Keeping efficiency in mind is fundamental when working with data. In this tutorial we went through the steps to create parameters in SQL queries when using Microsoft Excel to consume data from your data lake using Dremio. Following these steps will help you increase the efficiency of reporting activities, especially when these reports require real-time abilities such as custom data granularity based on who is viewing the data. We saw how this is possible by implementing parameters that will allow you to use the same report to satisfy the data needs of multiple users instead of creating a static report for each one of them. To learn more about Dremio visit our tutorials and resources, also if you would like to experiment with Dremio in your own virtual lab, go ahead and checkout Dremio University, and if you have any questions visit our community forums where we are all eager to help.