Introduction
Looker is a popular platform for Business Intelligence and embedded analytics. Its web-based interface makes it easy for users to build powerful reports and visualizations on a wide range of data sources.
Dremio is a self-service data platform. It sits between your data sources and your analytical tools, such as Looker, to simplify and accelerate how you run analytics. Dremio supports a wide range of sources, including relational databases, data lakes, NoSQL databases, and object stores like Amazon S3. Even when a source doesn’t support SQL natively, Dremio makes it possible to run analytics using standard SQL, which is what Looker uses.
In this tutorial we will show how to connect Looker to Dremio. With Dremio, Looker users gain access to NoSQL databases like MongoDB and Elasticsearch, as well as data lakes running on Hadoop, Amazon S3, and Azure Data Lake Store (ADLS). In addition, Dremio makes it easy to join data across different data sources, such as joining data between your data lake and your relational database. Finally, Dremio’s data acceleration features provide interactive speed on massive datasets and allow users to offload their analytical queries from operational sources.
Getting started
For this tutorial you’ll need access to Looker, and you’ll also need access to a Dremio environment. Dremio is open source software you can run wherever you like. You can deploy Dremio here, and follow the Quickstart Guide to get set up in just a few minutes.
For this tutorial we will use a sample dataset that includes CSV and JSON data stored in Amazon S3. If you’re new to Dremio, we suggest you read Getting Oriented to Dremio and Working With Your First Dataset. Follow along in the second tutorial to set up the sample data we will use for the remainder of this one. Once you have completed that tutorial, you should be able to run the following query through Dremio:
SELECT Category, COUNT(*) AS "incident count"
FROM SFIncidents.incidents
WHERE DayOfWeek IN ('Monday', 'Tuesday', 'Wednesday', 'Thursday')
GROUP BY Category
ORDER BY "incident count" DESC
LIMIT 5
This query asks for the top 5 most frequent incident categories occurring Monday through Thursday. You should see the following results (be sure to click Run instead of Preview):
Once you understand the basics for connecting Dremio to files in S3, you should be able to connect your other sources to Dremio and follow a similar pattern for connecting Looker to MongoDB, Elasticsearch, Hadoop, and other sources.
Setting up the connection from Looker to Dremio
Now that you have Dremio connected to the sample dataset on S3, let’s connect Looker to Dremio to analyze this data. First go to:
Admin > Database Connections > New Connection
On the next screen, select Dremio as the dialect:
- For the host, use the IP address or hostname of the server where Dremio is running.
- For the database enter DREMIO.
- Enter the user/password for the Dremio user that will be used to access Dremio.
- Leave all other fields at their defaults or blank.
Once your connection is set up, you can test it to ensure your configuration is correct:
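If the connection test fails, it helps to rule out the Dremio side independently of Looker. One way (assuming the sample dataset from earlier in this tutorial) is to run the following queries in Dremio’s SQL editor as the same user you entered in Looker:

```sql
-- Confirms the user can authenticate and execute SQL at all.
SELECT 1

-- Confirms the user can read the sample dataset Looker will query.
SELECT COUNT(*) FROM SFIncidents.incidents
```

If both succeed in Dremio but the Looker test still fails, the problem is most likely in the host, port, or dialect settings on the Looker side rather than in Dremio itself.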
Creating a new project in Looker
Now you can go to Create Project and enter the details to create views in Looker for all the virtual datasets in the SFIncidents space in Dremio:
This will generate LookML for your project to make connecting to Dremio very simple:
You can extend the defaults here to include the joins for the incidents dataset by replacing the entry for incidents{} with:
explore: incidents {
  join: zip_lookup {
    type: left_outer
    sql_on: ${incidents.lat} = ${zip_lookup.lat} AND ${incidents.lon} = ${zip_lookup.lon} ;;
    relationship: many_to_one
  }
  join: zip_pop {
    type: left_outer
    sql_on: ${zip_lookup.zip} = ${zip_pop.zip} ;;
    relationship: one_to_one
  }
}
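With these joins in place, a Looker report that mixes fields from the three views generates a single SQL statement against Dremio. A sketch of what that SQL would look like (assuming the zip_lookup and zip_pop datasets also live in the SFIncidents space; the exact aliases Looker emits may differ):

```sql
SELECT zip_pop.zip AS zip_pop_zip,
       COUNT(*) AS incidents_count
FROM SFIncidents.incidents AS incidents
LEFT JOIN SFIncidents.zip_lookup AS zip_lookup
  ON incidents.lat = zip_lookup.lat
 AND incidents.lon = zip_lookup.lon
LEFT JOIN SFIncidents.zip_pop AS zip_pop
  ON zip_lookup.zip = zip_pop.zip
GROUP BY zip_pop.zip
```

Note that the left outer joins match the type: left_outer setting in the LookML, so incidents without a matching zip code are still counted.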
Running queries from Looker
Now you can go to Explore to begin to interact with the sample data in S3 using Dremio. For example, go to Explore > Tutorial > Incidents, then add Category and Count to your report:
You can then look in Dremio at the Jobs screen which will show you the query issued by Looker:
Clicking on the most recent job shows the SQL Looker issued to generate the report:
SELECT incidents.Category AS incidents_category,
       COUNT(*) AS incidents_count
FROM SFIncidents.incidents AS incidents
GROUP BY incidents.Category
ORDER BY 2 DESC
LIMIT 500
Your job may not include a flame icon at the top next to Completed. This icon indicates a Data Reflection was used to accelerate the query. Data Reflections accelerate your queries for a wide range of query patterns, and for any of your tools, including Looker. You can read more about Data Reflections here.
Creating Data Reflections
To create your own Data Reflection on your dataset, navigate to the dataset by clicking on the preview icon in the job for SFIncidents.incidents:
This will take you back to the dataset. Click on the gear icon to configure Data Reflections. You can configure the following options:
This configuration will produce two Data Reflections for this dataset:
- A Raw Reflection, to accelerate ad-hoc queries
- An Aggregation Reflection, to accelerate GROUP BY queries
In this example we are using a dummy field for measures since all our queries will be simple counts on this dataset.
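The gear menu in the UI is the simplest way to configure reflections, but Dremio also exposes them through SQL. A rough sketch of the equivalent statements is below; the reflection names and field lists here are illustrative, and the exact clause syntax varies by Dremio version, so check the Dremio documentation before running these:

```sql
-- Raw Reflection: materializes the listed fields for ad-hoc queries.
ALTER DATASET SFIncidents.incidents
  CREATE RAW REFLECTION raw_incidents
  USING DISPLAY (Category, DayOfWeek)

-- Aggregation Reflection: pre-aggregates by the listed dimensions,
-- which accelerates GROUP BY queries like the ones in this tutorial.
ALTER DATASET SFIncidents.incidents
  CREATE AGGREGATE REFLECTION agg_incidents
  USING DIMENSIONS (Category, DayOfWeek)
```

Either way the reflections are created, the query planner picks them up automatically; no change is needed on the Looker side.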
Now you can go back to Looker to re-run your query and you should see that it was accelerated:
If you click on the Acceleration tab for the profile, you can see that multiple reflections were considered and the cheapest option was selected:
To learn more about running Dremio with MongoDB, Elasticsearch, S3, and other data sources, check out our tutorials.
If you have questions along the way, be sure to ask in our community.