MongoDB is a scalable, flexible document database that stores data in JSON-like documents. While MongoDB can store documents of varying structures, this makes it more difficult to integrate with BI tools like Tableau which connect to traditional relational databases.
Tableau is a business intelligence tool that helps businesses better understand their data, offering an easy-to-use interface to create standard visualizations. Tableau provides the ability to connectors to an array of data sources, but has difficulty connecting to NoSQL databases like MongoDB. You could, perhaps, download MongoDB’s Connector for BI which would allow you to use MongoDB as a data source for BI tools like Tableau, but it is part of MongoDB’s Enterprise Advanced software, and even then, the driver is slow.
Fortunately, Dremio allows us to connect data from MongoDB to Tableau easily. If you also wanted to connect data from other sources like Amazon S3, Azure Data Lake Store, HDFS, or ElasticSearch, among other sources where your data lake lives, you would only need to download one ODBC driver to join all of your data. In addition, after connecting your data, Dremio also provides an interface for you to curate and transform your data before creating visualization in Tableau or your favorite BI tool.
We assume that you have Dremio and ODBC driver installed; if not, go to Dremio’s download page, pick the installations for your operating system, and visit Dremio Docs to read about installations and deployments. Also, we will be working with MongoDB and Tableau Desktop, so you need to have them installed and configured as well.
Today, we will be working with Yelp’s academic dataset. If your data is not already in MongoDB, We will need to connect to a running instance of MongoDB in order to add our data.
to start the MongoDB process and run it in the background. Then, in another tab of your terminal, we can run
1 mongoimport -d dremio -c yelp --type json --file <PATH TO FILE>/yelp.json
to add our data to MongoDB. Now we can connect MongoDB to Dremio.
First, click on the + button to add a new source to Dremio.
You will see a list of options that you can connect to. Click on MongoDB.
With your MongoDB process running in the background, enter the host and port number on which it is running, pick No Authentication and click Save to connect.
Now, we can see all of the data you’ve imported into MongoDB as physical datasets indicated by the purple icon.
First, let’s filter all of our data for just businesses in Las Vegas, NV.
Data stored in MongoDB is in a JSON-like structure, which allows arrays and nested objects. Dremio’s interface allows you to unnest arrays and extract subfields from objects, allowing you to write vanilla SQL on unstructured data.
Let’s say it’s a Friday night and we’re looking for potential places to eat that are still open. Hours is stored as an object, but we can extract the Friday closed field from the object and filter on that column simply by pressing the three dots on the cell and then highlighting the field that we want.
Let’s rename the field friday_close once we’ve extracted it.
Now we can convert friday_close as a string into a time type.
We need to enter the correct time string formatter for it to read the string properly.
By converting friday_close into a time, we can then filter for restaurants that are open late at night using the Keep Only function.
We can edit the SQL editor so that it returns restaurants that close after 10pm or later on Friday.
1 2 ('07:00:00' >= TO_TIME(business.hours.Friday."close", 'HH24:MI', 1) OR '22:00:00' <= TO_TIME(business.hours.Friday."close", 'HH24:MI', 1))
Next, let’s unnest Categories. Unnesting Categories will flatten the array so that one row is created per category per business.
We’re craving tacos tonight, so we can now filter Categories for just Mexican restaurants.
Let’s also unnest the Neighborhoods column so we can look at what neighborhoods restaurants are concentrated in on our visualization.
To further dig through the data, you can also use the SQL Editor to write your queries. For now, our resulting SQL looks like this:
1 2 3 4 5 6 7 8 9 10 SELECT "_id", business_id, full_address, hours, friday_close, nested_0."open" AS "open", categories, city, review_count, name, flatten(neighborhoods) AS neighborhoods, longitude, state, stars, latitude, attributes, type FROM ( SELECT "_id", business_id, full_address, hours, friday_close, nested_0."open" AS "open", flatten(categories) AS categories, city, review_count, name, neighborhoods, longitude, state, stars, latitude, attributes, type FROM ( SELECT "_id", business_id, full_address, hours, TO_TIME(business.hours.Friday."close", 'HH24:MI', 1) AS friday_close, business."open" AS "open", categories, city, review_count, name, neighborhoods, longitude, state, stars, latitude, attributes, type FROM mongo.yelp.business WHERE (city = 'Las Vegas') AND ('07:00:00' >= TO_TIME(business.hours.Friday."close", 'HH24:MI', 1) OR '22:00:00' <= TO_TIME(business.hours.Friday."close", 'HH24:MI', 1)) ) nested_0 ) nested_0 WHERE categories = 'Mexican'
Now we’re ready to visualize our data in Tableau. Simply save the changes you’ve made as a your virtual dataset and click on Tableau to create a .tds file.
Then, open your .tds file and enter your Dremio login and password.
Once you’ve entered your credentials and connected to your data successfully, make latitude and longitude dimensions and then drag them into rows and columns respectively. This will plot all of your restaurants on a map.
Then, we can drag review_count into size. Bigger circles will have more reviews, while smaller circles will have fewer. Then, let’s drag stars to color and change the the color scale from red to blue. Restaurants with better reviews will be a deeper shade of blue. Make sure that the measure of review_count and stars are AVG and not SUM. You should see this in your Marks window.
Your resulting graph should look like this.
Let’s also drag the name of the restaurants and corresponding neighborhoods into tooltip. It looks like most restaurants are clustered in the Strip, and the biggest, bluest circle is Tacos El Gordo. We’ve found our late night snack.
Today, we demonstrated how to visualize data from MongoDB in Tableau. As your company’s data becomes more distributed across multiple big data platforms, Dremio provides a home for all of your data and allows you to use your favorite BI tools like Tableau, which traditionally have trouble connecting to and visualizing big data. With Dremio, your data becomes self-service.