Dremio Jekyll

Connecting Looker to Dremio

Introduction

Looker is a popular platform for Business Intelligence and embedded analytics. Its web-based interface makes it easy for users to build powerful reports and visualizations on a wide range of data sources.

Dremio is a self-service data platform. It runs between your data sources and your analytical tools, like Looker, to simplify and accelerate how you run analytics. Dremio supports all your data sources, like relational databases, data lakes, NoSQL, and object stores like S3. This means that even when a source doesn’t support SQL, Dremio makes it possible to run analytics using standard SQL, which is what Looker uses.

In this tutorial we will show how to connect Looker to Dremio. With Dremio, Looker users gain access to NoSQL databases like MongoDB and Elasticsearch, as well as Data Lakes running on Hadoop, Amazon S3, and Azure ADLS. In addition, Dremio makes it easy to join data across different data sources, such as joining data between your data lake and your relational database. Finally Dremio’s data acceleration features provide interactive speed on massive datasets, and allow users to offload their analytical queries from operational sources.

Getting started

For this tutorial you’ll need access to Looker and you’ll also need access to a Dremio environment. Dremio is open source software you run wherever you like. You can download Dremio here, and follow the Quickstart Guide to get setup in just a few minutes.

For this tutorial we will use a sample dataset that includes CSV and JSON data stored in Amazon S3. If you’re new to Dremio, we suggest you read over Getting Oriented to Dremio and Working With Your First Dataset. You should follow along in the second tutorial to set up the sample data we will use for the remainder of this tutorial. Once you have completed this tutorial you should be able to run the following query through Dremio:

1
2
3
4
5
6
SELECT Category, count(*) as "incident count"
FROM SFIncidents.incidents
WHERE DayOfWeek in ('Monday','Tuesday','Wednesday','Thursday')
GROUP BY Category
ORDER BY "incident count" DESC
LIMIT 5

This query asks for the top 5 most frequent incidents that occur during weekdays. You should see the following results (be sure to click Run instead of Preview):

Connecting Looker to Dremio

Once you understand the basics for connecting Dremio to files in S3, you should be able to connect your other sources to Dremio and follow a similar pattern for connecting Looker to MongoDB, Elasticsearch, Hadoop, and other sources.

Setting up the connection from Looker to Dremio

Now that you have Dremio connected to the sample dataset on S3, let’s connect Looker to Dremio to analyze this data. First go to:

Admin > Database Connections > New Connection

Connecting Looker to Dremio

On the next screen, select Dremio as the dialect:

Connecting Looker to Dremio

  • For the host, use the IP address of the instance or server where Dremio is working.
  • For the database enter DREMIO.
  • Enter the user/password for the Dremio user that will be used to access Dremio.
  • All other fields use the default or leave blank.

Connecting Looker to Dremio

Once your connection is setup, you can test the connection to ensure your setup is correct:

Connecting Looker to Dremio

Creating a new project in Looker

Now you can go to Create Project and enter in the details to create views in Looker for all the virtual datasets in the SFIncidents space in Dremio:

Connecting Looker to Dremio

This will generate LookML for your project to make connecting to Dremio very simple:

Connecting Looker to Dremio

You can extend the defaults here to include the joins for the incidents dataset by replacing the entry for incidents{} with:

1
2
3
4
5
6
7
8
9
10
11
12
13
explore: incidents {
  join: zip_lookup {
    type: left_outer
    sql_on: ${incidents.lat} = ${zip_lookup.lat} AND ${incidents.lon} = ${zip_lookup.lon};;
    relationship: many_to_one
  }

  join: zip_pop {
    type: left_outer
    sql_on: ${zip_lookup.zip} = ${zip_pop.zip} ;;
    relationship: one_to_one
  }
}

Running queries from Looker

Now you can go to Explore to begin to interact with the sample data in S3 using Dremio. For example, go to Explore > Tutorial > Incidents, then add Category and Count to your report:

Connecting Looker to Dremio

You can then look in Dremio at the Jobs screen which will show you the query issued by Looker:

Connecting Looker to Dremio

Clicking on the most recent job shows the SQL issued by Dremio to generate the report:

1
2
3
4
5
6
7
SELECT 
	incidents.Category  AS incidents_category,
	COUNT(*) AS incidents_count
FROM SFIncidents.incidents  AS incidents
GROUP BY incidents.Category 
ORDER BY 2 DESC
LIMIT 500

Your job may not include a flame icon at the top next to Completed. This icon indicates a Data Reflection was used to accelerate the query. Data Reflections accelerate your queries for a wide range of query patterns, and for any of your tools, including Looker. You can read more about Data Reflections here.

Creating Data Reflections

To create your own Data Reflection on your dataset, navigate to the dataset by clicking on the preview icon in the job for SFIncidents.incidents:

Connecting Looker to Dremio

This will take you back to the dataset. Click on the gear icon to configure Data Reflections. You can configure the following options:

Connecting Looker to Dremio

This configuration will produce two Data Reflections for this dataset:

  • A Raw Reflection, to accelerate ad-hoc queries
  • An Aggregation Reflections, to accelerate GROUP BY queries

In this example we are using a dummy field for measures since all our queries will be simple counts on this dataset.

Now you can go back to Looker to re-run your query and you should see that it was accelerated:

Connecting Looker to Dremio

If you click on the Acceleration tab for the profile, you can see that multiple reflections were considered, and the cheapest options was selected:

Connecting Looker to Dremio

To learn more about running Dremio with MongoDB, Elasticsearch, S3, and other data sources, check out our tutorials.

If you have questions along the way, be sure to ask in our community.