Dremio Jekyll

Connecting Tableau to MongoDB

Intro

MongoDB is a scalable, flexible document database that stores data in JSON-like documents. While MongoDB can store documents of varying structures, this makes it more difficult to integrate with BI tools like Tableau which connect to traditional relational databases.

Tableau is a business intelligence tool that helps businesses better understand their data, offering an easy-to-use interface to create standard visualizations. Tableau provides the ability to connectors to an array of data sources, but has difficulty connecting to NoSQL databases like MongoDB. You could, perhaps, download MongoDB’s Connector for BI which would allow you to use MongoDB as a data source for BI tools like Tableau, but it is part of MongoDB’s Enterprise Advanced software, and even then, the driver is slow.

Fortunately, Dremio allows us to connect data from MongoDB to Tableau easily. If you also wanted to connect data from other sources like Amazon S3, Azure Data Lake Store, HDFS, or ElasticSearch, among other sources where your data lake lives, you would only need to download one ODBC driver to join all of your data. In addition, after connecting your data, Dremio also provides an interface for you to curate and transform your data before creating visualization in Tableau or your favorite BI tool.

Assumptions

We assume that you have Dremio and ODBC driver installed; if not, go to Dremio’s download page, pick the installations for your operating system, and visit Dremio Docs to read about installations and deployments. Also, we will be working with MongoDB and Tableau Desktop, so you need to have them installed and configured as well.

Importing data into MongoDB and connecting to Dremio

Today, we will be working with Yelp’s academic dataset. If your data is not already in MongoDB, We will need to connect to a running instance of MongoDB in order to add our data.

First, run

1
mongod

to start the MongoDB process and run it in the background. Then, in another tab of your terminal, we can run

1
mongoimport -d dremio -c yelp --type json --file <PATH TO FILE>/yelp.json

to add our data to MongoDB. Now we can connect MongoDB to Dremio.

First, click on the + button to add a new source to Dremio.

add source

You will see a list of options that you can connect to. Click on MongoDB.

add source

With your MongoDB process running in the background, enter the host and port number on which it is running, pick No Authentication and click Save to connect.

create new mongo source

Data curation in Dremio

Now, we can see all of the data you’ve imported into MongoDB as physical datasets indicated by the purple icon.

add source

First, let’s filter all of our data for just businesses in Las Vegas, NV.

add source

Data stored in MongoDB is in a JSON-like structure, which allows arrays and nested objects. Dremio’s interface allows you to unnest arrays and extract subfields from objects, allowing you to write vanilla SQL on unstructured data.

Let’s say it’s a Friday night and we’re looking for potential places to eat that are still open. Hours is stored as an object, but we can extract the Friday closed field from the object and filter on that column simply by pressing the three dots on the cell and then highlighting the field that we want.

add source

Let’s rename the field friday_close once we’ve extracted it. add source

Now we can convert friday_close as a string into a time type.

add source

We need to enter the correct time string formatter for it to read the string properly. add source

By converting friday_close into a time, we can then filter for restaurants that are open late at night using the Keep Only function.

add source

We can edit the SQL editor so that it returns restaurants that close after 10pm or later on Friday.

1
2
('07:00:00' >= TO_TIME(business.hours.Friday."close", 'HH24:MI', 1) OR '22:00:00' <= TO_TIME(business.hours.Friday."close", 'HH24:MI', 1))
 

Next, let’s unnest Categories. Unnesting Categories will flatten the array so that one row is created per category per business. add source

We’re craving tacos tonight, so we can now filter Categories for just Mexican restaurants.

add source

Let’s also unnest the Neighborhoods column so we can look at what neighborhoods restaurants are concentrated in on our visualization.

add source

To further dig through the data, you can also use the SQL Editor to write your queries. For now, our resulting SQL looks like this:

1
2
3
4
5
6
7
8
9
10
SELECT "_id", business_id, full_address, hours, friday_close, nested_0."open" AS "open", categories, city, review_count, name, flatten(neighborhoods) AS neighborhoods, longitude, state, stars, latitude, attributes, type
FROM (
  SELECT "_id", business_id, full_address, hours, friday_close, nested_0."open" AS "open", flatten(categories) AS categories, city, review_count, name, neighborhoods, longitude, state, stars, latitude, attributes, type
  FROM (
    SELECT "_id", business_id, full_address, hours, TO_TIME(business.hours.Friday."close", 'HH24:MI', 1) AS friday_close, business."open" AS "open", categories, city, review_count, name, neighborhoods, longitude, state, stars, latitude, attributes, type
    FROM mongo.yelp.business
     WHERE (city = 'Las Vegas') AND ('07:00:00' >= TO_TIME(business.hours.Friday."close", 'HH24:MI', 1) OR '22:00:00' <= TO_TIME(business.hours.Friday."close", 'HH24:MI', 1))
  ) nested_0
) nested_0
 WHERE categories = 'Mexican'

Visualizing in Tableau

Now we’re ready to visualize our data in Tableau. Simply save the changes you’ve made as a your virtual dataset and click on Tableau to create a .tds file.

add source

Then, open your .tds file and enter your Dremio login and password.

add source

Once you’ve entered your credentials and connected to your data successfully, make latitude and longitude dimensions and then drag them into rows and columns respectively. This will plot all of your restaurants on a map.

add source

Then, we can drag review_count into size. Bigger circles will have more reviews, while smaller circles will have fewer. Then, let’s drag stars to color and change the the color scale from red to blue. Restaurants with better reviews will be a deeper shade of blue. Make sure that the measure of review_count and stars are AVG and not SUM. You should see this in your Marks window.

add source

Your resulting graph should look like this.

add source

Let’s also drag the name of the restaurants and corresponding neighborhoods into tooltip. add source It looks like most restaurants are clustered in the Strip, and the biggest, bluest circle is Tacos El Gordo. We’ve found our late night snack.

add source

Conclusion

Today, we demonstrated how to visualize data from MongoDB in Tableau. As your company’s data becomes more distributed across multiple big data platforms, Dremio provides a home for all of your data and allows you to use your favorite BI tools like Tableau, which traditionally have trouble connecting to and visualizing big data. With Dremio, your data becomes self-service.