Dremio Jekyll

Replacing Microsoft SSIS With Dremio And Power BI


In this Blog, we will take a closer look at using Dremio as an alternative to building a traditional data warehouse using Microsoft products. Traditionally we would write ETL, move the data from our operational systems into a data warehouse, then point Power BI to the star schema of the data warehouse for reporting and visualization. With Dremio we have a different alternative where we build the star schema using Dremio’s virtual datasets, without ETL and without building a data warehouse.

As we step through the examples, we will be using AdventureWorks database as example. The AdventureWorks database is an OLTP system that is used by Microsoft SQL Server as Sample database. It comes with the SQL Server installation (with the install file or a download from the internet). For more information on this database, please visit: https://technet.microsoft.com/en-us/library/ms124825(v=sql.100).aspx

AdventureWorks database has many subject areas. In this blog, we will only tackle some aspects of the InternetSales. The star schema of the Internet Sales may include the following dimensions and fact tables:

  • DimProduct
  • DimSalesTerritory
  • DimCustomer
  • FactInternetSales

Doing It The SSIS way

If we were to build ETL process of this star schema with Microsoft BI stack, we would use a combination of SSIS packages and SQL Server queries. An SSIS package may have one or more data flows that would take data from AdventureWorks database, transform it, and then load it to AdventureWorkDW database.

So, we would need to install Microsoft BI Data Tools on the developer machine. We would also need to create a new database AdventureWorksDW. Create new dimensions and facts in the new database. Once the packages are ready, we would then make them available on the SQL Server (either file system objects or SSIS server objects). Depending on the SSIS design, we then need to attach one or more packages to SQL Server Agent job(s) to run them on a schedule to maintain the data going forward.

The process once deployed in SQL Server, it will be owned by IT department (most likely one of the DBAs would take charge of the process and support it).

The business user must make development requests for any ETL changes done in SSIS or any daily operation requests via IT department. The user may only be able to interact with data exposed via AdventureWorkDW or a cube built in SSAS. This limits the user’s ability to be self-sufficient and be able to try new idea on the AdventureWork database or (Copy of it).

Below is a diagram showing systems and steps involved to build a data warehouse using the Microsoft BI stack.

Process flow for building data warehouse with Microsoft SSIS

Upon completing the steps of building the data warehouse, many star schemas could be formed. One of them could be viewed as the following (InternetSales):

Star schema

Doing It The Dremio Way

With Dremio, we would only need access to Dremio via our browsers (no need to install any local software). We would not need to create databases or tables. We would only need access to these tables in the AdventureWork databases:

  • SalesOrderDetail
  • SalesOrderHeader
  • Product
  • ProductCategory
  • ProductSubcategory

The following diagram depicts the systems and steps involved:

SSIS architecture

Connect Dremio to a SQL Server instance

Prerequisites: SQL Server

Make sure sql server is set up to listen to remote connections if your SQL Server instance is not local on your machine where Dremio is installed. This is a task mostly done and controlled by Database Administrator. But if you happen to need to do it yourself and open the port for remote connection, follow instructions found here:


Dremio will need a SQL Server user. It will not work with domain user. Create a SQL Server user with minimum read permissions on the databases that Dremio will be accessing.

Dremio install

At this point, you should have Dremio running (Default: http://localhost:9047 )

Login to Dremio

At the UI, login with your user:

Log into Dremio

You should see a panel on the right side of Dremio UI. Click on the ‘Sources’ or click on ‘+’ sign next to ‘Sources’

Enter the create new source dialog

You should see a popup like this one with many types of data sources.

Select the data source

Click on Microsoft SQL Server.

Use the SQL Server user you created previously to connect to the SQL server instance. In the screenshot below, the Dremio user is used to connect to a local SQL Server instance on port 1433.

Connecting to SQL Servrer

Next, click ‘Save’.

You should now see the new source created in Dremio.

View the new SQL Server source in Dremio

At this point, Dremio is ready to create virtual datasets by browsing database, the schema, and the desired tables.

Locate the SalesOrderDetail table and click on Query as shown in screenshot below:

Preview the data

This will bring you to this screen. Click the SQL Editor window to view the SQL:

Previewing the dataset

Click on Join. You would then be presented by the available sources:

Select a dataset for the join

Navigate to and select the SalesOrderHeader table and click Next:

Building the join

Define the Join condition by dragging the appropriate column from the SalesOrderDetail and column from SalesOrderHeader. We are setting this Join as an Inner join. Hit Apply.

Building an inner join

As user interacts with Query editor, Dremio Internally builds the query.

By now, the query is 37 columns all together. Let’s get rid of some columns that we do not need.

Scroll right to a column that we do not need and click on dropdown arrow (at the top right corner of each column) and select Drop.

Dropping a column

Dropping a column

After deleting several columns we won’t need, we are able to keep 20 columns in the query:

Keeping the right columns

Let’s check the data type of each column and make sure Dremio has them correctly defined. Look at left top corner of each column and validate the data type. If any column needs data type change, click on the top left side of the column to get a drop down menu and change type to appropriate one:

Changing the type

Because we are only interested in Internet Sales, we would like to keep only transactions that have onlineflag turned on.

Scroll to the column OnlineOrderFlag and click on the dropdown menu on the top right corner of the column. This will bring a menu and click on Keep Only button:

Keep only certain values menu

The follow screen would then show up:

Histogram of values

Let’s only select the values of OnlineOrderFlag that are True and hit Apply.

Now our query has only the columns we kept with a where clause.

Review the query

At this point, it would be a good idea to save our query. Look at the top right corner of the SQL Editor. Locate Save As button and click on it.

Name the query. Select a space to save it to and click Save.

Name the query for saving in a space

The query name should appear at the top left corner of the SQL Editor pane:

Query name

Let’s now create a calculated field. Let calculate the Year and Quarters where the Order happened. At the top right corner of the OrderDate, click the dropdown and select calculated Field:

Select calculated field

You would be brought to this screen, where you can type an expression using the available functions (on the right):

Expression builder screen

Let’s search for extract function. By entering a search term in the search box, you would get a result and brief description of the function.

Searching for the extract function

Double click on the EXTRACT function to be inserted in the expression. Continue building the expression and validate by hitting Preview button. We will rename our new column to OrderYear and we will uncheck the box to keep the original OrderDate (It will not be dropped). Hit Apply.

Extracting quarters from the dates

If for some reason, you need to go back in your steps, you can always go back by using the breadcrumb bar on the right side:

Dremio breadcrumbs

Columns can be renamed right in the column header or through the dropdown menu in each column. Let’s rename the OrderDateQuarter to OrderQuarter.

Rename the column from SQL Server

At this point, we can bring in the Product table.

So, let’s click on Join button and select the Product table from the our source. Navigate to the Product table in the AdventureWorks database and click Next.

Build the join in SQL Server

You can review table structure before you bring in the table into the join by clicking on the purple table icon:

View the physical dataset

Sample data from the table would be displayed at the bottom of the screen as well.

Complete the join condition. This time, we want to make sure it is FullOuter join and Hit Apply. We want to bring all records from both sides.

Using Full Outer Join in Dremio

We do not need all the columns from Product table, so we will need to drop some of them. Go to the top right corner of a column and select Drop.

Let’s bring in ProductSubcategory and ProductCategory. Follow same steps to join them to the query:

  • Create an Inner join ProductSubcategory on ProductSubcategoryID.
  • Also, Create an Inner join ProductCategory on ProductCategoryID.

Let’s also bring in the SalesTerritory table. Let’s full Join the SalesTerritory on TerritoryID.

The last table we are bringing in is the Customer table. Let’s also full join Customer table on CustomerID.

We may see a lot of customers without orders but that would help us find out which of our customers are active and those which are dormant if we desire to run such query later.

Ultimately, the query we build for this virtual dataset looks like the following:

SELECT SalesOrderDetailID, OrderQty, UnitPrice, UnitPriceDiscount, LineTotal, SalesOrderID, OrderDate, OrderYear, OrderQuarter, DueDate, ShipDate, Status, OnlineOrderFlag, SalesPersonID, SubTotal, TaxAmt, Freight, TotalDue, ProductID, Name, ProductNumber, MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice, Size, SizeUnitMeasureCode, WeightUnitMeasureCode, Weight, DaysToManufacture, ProductLine, Class, Style, ProductModelID, SellStartDate, SellEndDate, DiscontinuedDate, SubCategoryName, CategoryName, TerritoryName, CountryRegionCode, nested_0."Group" AS "Group", SalesYTD, SalesLastYear, CostYTD, CostLastYear, CustomerID, CustomerID0, StoreID, TerritoryID, AccountNumber
  SELECT nested_8.SalesOrderDetailID AS SalesOrderDetailID, nested_8.OrderQty AS OrderQty, nested_8.UnitPrice AS UnitPrice, nested_8.UnitPriceDiscount AS UnitPriceDiscount, nested_8.LineTotal AS LineTotal, nested_8.SalesOrderID AS SalesOrderID, nested_8.OrderDate AS OrderDate, nested_8.OrderYear AS OrderYear, nested_8.OrderQuarter AS OrderQuarter, nested_8.DueDate AS DueDate, nested_8.ShipDate AS ShipDate, nested_8.Status AS Status, nested_8.OnlineOrderFlag AS OnlineOrderFlag, nested_8.SalesPersonID AS SalesPersonID, nested_8.SubTotal AS SubTotal, nested_8.TaxAmt AS TaxAmt, nested_8.Freight AS Freight, nested_8.TotalDue AS TotalDue, nested_8.ProductID AS ProductID, nested_8.Name AS Name, nested_8.ProductNumber AS ProductNumber, nested_8.MakeFlag AS MakeFlag, nested_8.FinishedGoodsFlag AS FinishedGoodsFlag, nested_8.Color AS Color, nested_8.SafetyStockLevel AS SafetyStockLevel, nested_8.ReorderPoint AS ReorderPoint, nested_8.StandardCost AS StandardCost, nested_8.ListPrice AS ListPrice, nested_8.Size AS Size, nested_8.SizeUnitMeasureCode AS SizeUnitMeasureCode, nested_8.WeightUnitMeasureCode AS WeightUnitMeasureCode, nested_8.Weight AS Weight, nested_8.DaysToManufacture AS DaysToManufacture, nested_8.ProductLine AS ProductLine, nested_8.Class AS Class, nested_8.Style AS Style, nested_8.ProductModelID AS ProductModelID, nested_8.SellStartDate AS SellStartDate, nested_8.SellEndDate AS SellEndDate, nested_8.DiscontinuedDate AS DiscontinuedDate, nested_8.SubCategoryName AS SubCategoryName, nested_8.CategoryName AS CategoryName, nested_8.TerritoryName AS TerritoryName, nested_8.CountryRegionCode AS CountryRegionCode, nested_8."Group" AS "Group", nested_8.SalesYTD AS SalesYTD, nested_8.SalesLastYear AS SalesLastYear, nested_8.CostYTD AS CostYTD, nested_8.CostLastYear AS CostLastYear, nested_8.CustomerID AS CustomerID, join_Customer.CustomerID AS CustomerID0, join_Customer.PersonID AS PersonID, join_Customer.StoreID AS StoreID, join_Customer.TerritoryID AS TerritoryID, join_Customer.AccountNumber AS AccountNumber, join_Customer.rowguid AS rowguid, join_Customer.ModifiedDate AS ModifiedDate
  FROM (
    SELECT nested_7.SalesOrderDetailID AS SalesOrderDetailID, nested_7.OrderQty AS OrderQty, nested_7.UnitPrice AS UnitPrice, nested_7.UnitPriceDiscount AS UnitPriceDiscount, nested_7.LineTotal AS LineTotal, nested_7.SalesOrderID AS SalesOrderID, nested_7.OrderDate AS OrderDate, nested_7.OrderYear AS OrderYear, nested_7.OrderQuarter AS OrderQuarter, nested_7.DueDate AS DueDate, nested_7.ShipDate AS ShipDate, nested_7.Status AS Status, nested_7.OnlineOrderFlag AS OnlineOrderFlag, nested_7.CustomerID AS CustomerID, nested_7.SalesPersonID AS SalesPersonID, nested_7.SubTotal AS SubTotal, nested_7.TaxAmt AS TaxAmt, nested_7.Freight AS Freight, nested_7.TotalDue AS TotalDue, nested_7.ProductID AS ProductID, nested_7.Name AS Name, nested_7.ProductNumber AS ProductNumber, nested_7.MakeFlag AS MakeFlag, nested_7.FinishedGoodsFlag AS FinishedGoodsFlag, nested_7.Color AS Color, nested_7.SafetyStockLevel AS SafetyStockLevel, nested_7.ReorderPoint AS ReorderPoint, nested_7.StandardCost AS StandardCost, nested_7.ListPrice AS ListPrice, nested_7.Size AS Size, nested_7.SizeUnitMeasureCode AS SizeUnitMeasureCode, nested_7.WeightUnitMeasureCode AS WeightUnitMeasureCode, nested_7.Weight AS Weight, nested_7.DaysToManufacture AS DaysToManufacture, nested_7.ProductLine AS ProductLine, nested_7.Class AS Class, nested_7.Style AS Style, nested_7.ProductModelID AS ProductModelID, nested_7.SellStartDate AS SellStartDate, nested_7.SellEndDate AS SellEndDate, nested_7.DiscontinuedDate AS DiscontinuedDate, nested_7.SubCategoryName AS SubCategoryName, nested_7.CategoryName AS CategoryName, join_SalesTerritory.Name AS TerritoryName, join_SalesTerritory.CountryRegionCode AS CountryRegionCode, join_SalesTerritory."Group" AS "Group", join_SalesTerritory.SalesYTD AS SalesYTD, join_SalesTerritory.SalesLastYear AS SalesLastYear, join_SalesTerritory.CostYTD AS CostYTD, join_SalesTerritory.CostLastYear AS CostLastYear
    FROM (
      SELECT nested_6.SalesOrderDetailID AS SalesOrderDetailID, nested_6.OrderQty AS OrderQty, nested_6.UnitPrice AS UnitPrice, nested_6.UnitPriceDiscount AS UnitPriceDiscount, nested_6.LineTotal AS LineTotal, nested_6.SalesOrderID AS SalesOrderID, nested_6.OrderDate AS OrderDate, nested_6.OrderYear AS OrderYear, nested_6.OrderQuarter AS OrderQuarter, nested_6.DueDate AS DueDate, nested_6.ShipDate AS ShipDate, nested_6.Status AS Status, nested_6.OnlineOrderFlag AS OnlineOrderFlag, nested_6.CustomerID AS CustomerID, nested_6.SalesPersonID AS SalesPersonID, nested_6.TerritoryID AS TerritoryID, nested_6.SubTotal AS SubTotal, nested_6.TaxAmt AS TaxAmt, nested_6.Freight AS Freight, nested_6.TotalDue AS TotalDue, nested_6.ProductID AS ProductID, nested_6.Name AS Name, nested_6.ProductNumber AS ProductNumber, nested_6.MakeFlag AS MakeFlag, nested_6.FinishedGoodsFlag AS FinishedGoodsFlag, nested_6.Color AS Color, nested_6.SafetyStockLevel AS SafetyStockLevel, nested_6.ReorderPoint AS ReorderPoint, nested_6.StandardCost AS StandardCost, nested_6.ListPrice AS ListPrice, nested_6.Size AS Size, nested_6.SizeUnitMeasureCode AS SizeUnitMeasureCode, nested_6.WeightUnitMeasureCode AS WeightUnitMeasureCode, nested_6.Weight AS Weight, nested_6.DaysToManufacture AS DaysToManufacture, nested_6.ProductLine AS ProductLine, nested_6.Class AS Class, nested_6.Style AS Style, nested_6.ProductModelID AS ProductModelID, nested_6.SellStartDate AS SellStartDate, nested_6.SellEndDate AS SellEndDate, nested_6.DiscontinuedDate AS DiscontinuedDate, nested_6.Name0 AS SubCategoryName, join_ProductCategory.Name AS CategoryName
      FROM (
        SELECT nested_5.SalesOrderDetailID AS SalesOrderDetailID, nested_5.OrderQty AS OrderQty, nested_5.UnitPrice AS UnitPrice, nested_5.UnitPriceDiscount AS UnitPriceDiscount, nested_5.LineTotal AS LineTotal, nested_5.SalesOrderID AS SalesOrderID, nested_5.OrderDate AS OrderDate, nested_5.OrderYear AS OrderYear, nested_5.OrderQuarter AS OrderQuarter, nested_5.DueDate AS DueDate, nested_5.ShipDate AS ShipDate, nested_5.Status AS Status, nested_5.OnlineOrderFlag AS OnlineOrderFlag, nested_5.CustomerID AS CustomerID, nested_5.SalesPersonID AS SalesPersonID, nested_5.TerritoryID AS TerritoryID, nested_5.SubTotal AS SubTotal, nested_5.TaxAmt AS TaxAmt, nested_5.Freight AS Freight, nested_5.TotalDue AS TotalDue, nested_5.ProductID AS ProductID, nested_5.Name AS Name, nested_5.ProductNumber AS ProductNumber, nested_5.MakeFlag AS MakeFlag, nested_5.FinishedGoodsFlag AS FinishedGoodsFlag, nested_5.Color AS Color, nested_5.SafetyStockLevel AS SafetyStockLevel, nested_5.ReorderPoint AS ReorderPoint, nested_5.StandardCost AS StandardCost, nested_5.ListPrice AS ListPrice, nested_5.Size AS Size, nested_5.SizeUnitMeasureCode AS SizeUnitMeasureCode, nested_5.WeightUnitMeasureCode AS WeightUnitMeasureCode, nested_5.Weight AS Weight, nested_5.DaysToManufacture AS DaysToManufacture, nested_5.ProductLine AS ProductLine, nested_5.Class AS Class, nested_5.Style AS Style, nested_5.ProductModelID AS ProductModelID, nested_5.SellStartDate AS SellStartDate, nested_5.SellEndDate AS SellEndDate, nested_5.DiscontinuedDate AS DiscontinuedDate, nested_5.ModifiedDate AS ModifiedDate, join_ProductSubcategory.ProductCategoryID AS ProductCategoryID, join_ProductSubcategory.Name AS Name0
        FROM (
          SELECT nested_4.SalesOrderDetailID AS SalesOrderDetailID, nested_4.OrderQty AS OrderQty, nested_4.UnitPrice AS UnitPrice, nested_4.UnitPriceDiscount AS UnitPriceDiscount, nested_4.LineTotal AS LineTotal, nested_4.SalesOrderID AS SalesOrderID, nested_4.OrderDate AS OrderDate, nested_4.OrderYear AS OrderYear, nested_4.OrderQuarter AS OrderQuarter, nested_4.DueDate AS DueDate, nested_4.ShipDate AS ShipDate, nested_4.Status AS Status, nested_4.OnlineOrderFlag AS OnlineOrderFlag, nested_4.CustomerID AS CustomerID, nested_4.SalesPersonID AS SalesPersonID, nested_4.TerritoryID AS TerritoryID, nested_4.SubTotal AS SubTotal, nested_4.TaxAmt AS TaxAmt, nested_4.Freight AS Freight, nested_4.TotalDue AS TotalDue, nested_4.ProductID AS ProductID, join_Product.Name AS Name, join_Product.ProductNumber AS ProductNumber, join_Product.MakeFlag AS MakeFlag, join_Product.FinishedGoodsFlag AS FinishedGoodsFlag, join_Product.Color AS Color, join_Product.SafetyStockLevel AS SafetyStockLevel, join_Product.ReorderPoint AS ReorderPoint, join_Product.StandardCost AS StandardCost, join_Product.ListPrice AS ListPrice, join_Product.Size AS Size, join_Product.SizeUnitMeasureCode AS SizeUnitMeasureCode, join_Product.WeightUnitMeasureCode AS WeightUnitMeasureCode, join_Product.Weight AS Weight, join_Product.DaysToManufacture AS DaysToManufacture, join_Product.ProductLine AS ProductLine, join_Product.Class AS Class, join_Product.Style AS Style, join_Product.ProductSubcategoryID AS ProductSubcategoryID, join_Product.ProductModelID AS ProductModelID, join_Product.SellStartDate AS SellStartDate, join_Product.SellEndDate AS SellEndDate, join_Product.DiscontinuedDate AS DiscontinuedDate, join_Product.ModifiedDate AS ModifiedDate
          FROM (
            SELECT SalesOrderDetailID, OrderQty, ProductID, UnitPrice, UnitPriceDiscount, LineTotal, SalesOrderID, OrderDate, EXTRACT(YEAR FROM "OrderDate" )  AS OrderYear, OrderDateQuarter AS OrderQuarter, DueDate, ShipDate, Status, OnlineOrderFlag, CustomerID, SalesPersonID, TerritoryID, SubTotal, TaxAmt, Freight, TotalDue
            FROM (
              SELECT SalesOrderDetailID, OrderQty, ProductID, UnitPrice, UnitPriceDiscount, LineTotal, SalesOrderID, OrderDate, EXTRACT(QUARTER FROM "OrderDate" ) AS OrderDateQuarter, DueDate, ShipDate, ShipDate_1, Status, OnlineOrderFlag, CustomerID, SalesPersonID, TerritoryID, SubTotal, TaxAmt, Freight, TotalDue
              FROM (
                SELECT SalesOrderDetailID, OrderQty, ProductID, UnitPrice, UnitPriceDiscount, LineTotal, SalesOrderID, CAST(CAST(OrderDate as DATE) as TIMESTAMP) AS OrderDate, DueDate, ShipDate, date_part('Month',"ShipDate")
                 AS ShipDate_1, Status, OnlineOrderFlag, CustomerID, SalesPersonID, TerritoryID, SubTotal, TaxAmt, Freight, TotalDue
                FROM (
                  SELECT nested_0.SalesOrderDetailID AS SalesOrderDetailID, nested_0.OrderQty AS OrderQty, nested_0.ProductID AS ProductID, nested_0.UnitPrice AS UnitPrice, nested_0.UnitPriceDiscount AS UnitPriceDiscount, nested_0.LineTotal AS LineTotal, nested_0.SalesOrderID AS SalesOrderID, join_SalesOrderHeader.OrderDate AS OrderDate, join_SalesOrderHeader.DueDate AS DueDate, join_SalesOrderHeader.ShipDate AS ShipDate, join_SalesOrderHeader.Status AS Status, join_SalesOrderHeader.OnlineOrderFlag AS OnlineOrderFlag, join_SalesOrderHeader.CustomerID AS CustomerID, join_SalesOrderHeader.SalesPersonID AS SalesPersonID, join_SalesOrderHeader.TerritoryID AS TerritoryID, join_SalesOrderHeader.SubTotal AS SubTotal, join_SalesOrderHeader.TaxAmt AS TaxAmt, join_SalesOrderHeader.Freight AS Freight, join_SalesOrderHeader.TotalDue AS TotalDue
                  FROM (
                    SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate
                    FROM "SQL Server 2012".AdventureWorks2012_Data.Sales.SalesOrderDetail
                  ) nested_0
                   INNER JOIN "SQL Server 2012".AdventureWorks2012_Data.Sales.SalesOrderHeader AS join_SalesOrderHeader ON nested_0.SalesOrderID = join_SalesOrderHeader.SalesOrderID
                   WHERE join_SalesOrderHeader.OnlineOrderFlag = true
                ) nested_1
              ) nested_2
            ) nested_3
          ) nested_4
           FULL JOIN "SQL Server 2012".AdventureWorks2012_Data.Production.Product AS join_Product ON nested_4.ProductID = join_Product.ProductID
        ) nested_5
         INNER JOIN "SQL Server 2012".AdventureWorks2012_Data.Production.ProductSubcategory AS join_ProductSubcategory ON nested_5.ProductSubcategoryID = join_ProductSubcategory.ProductSubcategoryID
      ) nested_6
       INNER JOIN "SQL Server 2012".AdventureWorks2012_Data.Production.ProductCategory AS join_ProductCategory ON nested_6.ProductCategoryID = join_ProductCategory.ProductCategoryID
    ) nested_7
     FULL JOIN "SQL Server 2012".AdventureWorks2012_Data.Sales.SalesTerritory AS join_SalesTerritory ON nested_7.TerritoryID = join_SalesTerritory.TerritoryID
  ) nested_8
   FULL JOIN "SQL Server 2012".AdventureWorks2012_Data.Sales.Customer AS join_Customer ON nested_8.CustomerID = join_Customer.CustomerID
) nested_0

The query when it runs, it should return 61,899 rows. This can be viewed going to the Datasets screen. From there, choose the data set:

Choosing the dataset

Now click on the Jobs link to be taken here where you can see some statistics about the run.

See query profile for preview run

At this point, our star schema is ready to be used. Let’s point a BI tool to it. In this blog, we will be using Microsoft Power BI Desktop.

Power BI with Dremio

Prerequisites: Install Dremio Connector

Go to Dremio download page https://www.dremio.com/download/ and download the ODBC Drivers for Windows 64 bit. Once downloaded, run the install by following these prompts to install it.

In some cases, you may need to have Administrator privileges or run the install with Administrator user to be able to install the Dremio Connector on your machine.

Install the Dremio Connector

Install the Dremio Connector

Install the Dremio Connector

Install the Dremio Connector

Install the Dremio Connector

Microsoft Power BI Desktop

If you have not installed Microsoft Power BI Desktop yet, please proceed to Power BI download page and follow instructions: https://powerbi.microsoft.com/en-us/downloads/

In some cases, you may need to have Administrator privileges or run the install with Administrator user to be able to install the Power BI Desktop on your machine.

Connecting Power BI to Dremio datasets in SQL Server

Start Microsoft Power BI Desktop.

Click on Get Data as shown in this screenshot.

Click Get Data from Power BI

You should see a pop up. Scroll down where you see Dremio(Beta).

Use the Dremio connection

You may get a warning. Click Continue through it.

Ignore the warning

Enter the name of the Dremio Cluster Node. If you have a standalone Dremio service as in the community version, it would be the same as the url that starts the Dremio UI but without the port number. Select DirectQuery to allow Power BI not to store the data and take advantage of Dremio query performance.

Use DirectQuery from Power BI with Dremio

Enter Dremio user name and password created previously and click Connect.

Login to Dremio from Power BI

Navigate to the dataset previously created in Dremio. Click Load to connect to the selected dataset.

Connect to the selected dataset from Power BI

You may need to click on Apply Changes in Power BI Desktop to be able to see the data imported.

See data imported from Dremio to Power BI

You would next be able to see the dataset metadata (Columns) on the right side of the Power BI Desktop.

See metadata columns

At this point, you are ready to use the Dremio dataset you connected to and start running reports against Dremio. Just drag dataset columns to the report canvas and start showing those awesome charts with Dremio’s blazing speed like this one below.

Example Power BI report with Dremio

We can also run SQL Server profiler to see the queries that are being pushed down to SQL Sever by Dremio. If you have SQL Management Studio installed and have enough permissions, you would be able to go to Tools in the menu and launch SQL Server profiler.

Launch SQL Server profiler

Once launched, we can then create a trace that filters on loginName = Dremio and start it.

Create trace

We could see then Dremio user logged and the tables that we mentioned earlier when we were building our query are being run. At this point, several queries are being pushed down to SQL Server OLTP AdventureWorks database.

Query push-dwon

In very busy business environment, a data warehouse query, with its tendency to grab a huge chunk of data, could bring the business core operational systems down. In such cases, we would want to offload the analytical queries from the OLTP system to ensure the performance of our transactional environment. Dremio addresses this issue with a feature called Data Reflections, which we will cover in our next blog post.