Dremio Jekyll

Analyzing Multiple Data Sources Simultaneously with Dremio and DBVisualizer

Intro

DbVisualizer is a powerful tool for database management and analysis that can be used for every major database. DbVisualizer provides many useful features such as visual actions for SQL queries, schema exporting, support procedures, functions, triggers, etc.

However, in spite of all the great abilities, merging data sources in DbVisualizer is not a simple task. Dremio, the Data-as-a-Service platform provides an ideal solution to this problem. In this tutorial, we will show how to connect Microsoft SQL Server and Postgres to Dremio, perform data curation in Dremio, and then connect Dremio to DbVisualizer. Also, we will demonstrate how to work with DbVisualizer.

Assumptions

We assume that you have Dremio and JDBC driver installed; if not, go to Dremio’s download page, pick the installations for your operating system, and visit Dremio Docs to read about installations and deployments. Also, we will be working with SQL-Server, Postgre, and DbVisualizer, so you need to have them installed and configured as well.

Importing data into sources

In this tutorial, we will use a baseball dataset available in Kaggle. The History of Baseball is a reformatted version of the famous Lahman’s Baseball Database. It contains batting and pitching statistics, fielding statistics, standings, team stats, park stats, player demographics, managerial records, awards, post-season data, and more.

Follow the link above and get a database.sqllite file which can be easily transformed into SQL. By executing the file in MSSQL and Postgres, we import data in the databases.

Inserting data in MSSQL:

image alt text

Inserting data in Postgres:

image alt text

Connecting to Dremio

When data is uploaded, login to Dremio and choose an option to add a new source. Then, select Microsoft SQL Server and Postgre and fill in fields with necessary information as shown below:

image alt text

image alt text

Data curation in Dremio

When the sources are connected, we can begin data preprocessing. Let’s open the all_star table. First, we can see that starting_pos contains empty text, so we exclude such values. Note how convenient it is to see the results preview in Dremio.

image alt text

Also, this field contains numeric values, but now it is a string type. Let’s convert it to an integer.

image alt text

Next, we want to combine Microsoft SQL Server and PostgreSQL data sources. For that, we click join, choose the desired data source and its data.

image alt text

Next, we need to add a join condition. In our case, player_award_vote and all_star have common leage_id, so we select it and click “Apply”.

image alt text

After all the preparations, Dremio will automatically generate the final SQL script.

1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT nested_0.player_id AS player_id, nested_0.date_year AS date_year, nested_0.game_num AS game_num, nested_0.game_id AS game_id, nested_0.team_id AS team_id, nested_0.gp AS gp, nested_0.starting_pos AS starting_pos, nested_0.league_id AS league_id, join_player_award_vote.league_id AS league_id0, join_player_award_vote.award_id AS award_id, join_player_award_vote.date_year AS date_year0, join_player_award_vote.player_id AS player_id0, join_player_award_vote.points_won AS points_won, join_player_award_vote.points_max AS points_max, join_player_award_vote.votes_first AS votes_first

FROM (

 SELECT player_id, date_year, game_num, game_id, team_id, league_id, CONVERT_TO_INTEGER(gp, 1, 1, 0) AS gp, CONVERT_TO_INTEGER(starting_pos, 1, 1, 0) AS starting_pos

 FROM Postgre.public.all_star

  WHERE starting_pos <> ''

) nested_0

INNER JOIN MSSS.baseball.dbo.player_award_vote AS join_player_award_vote ON nested_0.league_id = join_player_award_vote.league_id

Finally, let’s save the dataset to our previously created space.

image alt text

When data is ready, you can describe it using Wiki-content and Tags in Catalog.

image alt text

You can see the result in your space.

image alt text

Connecting to DbVisualizer

Now, we want to connect DbVisualizer to Dremio. For this, add a Dremio driver to driver manager tools. For more detailed information you can take a look at this DbVisualizer tutorial and also visit dremio docs.

When the driver is configured, add a new connection, choose the Dremio driver and click “Connect”. This is what you should see as a result:

image alt text

You can look at your data by double-clicking on its name or using SQL commander. Also, with SQL commander you can analyze data in different ways. You can use SQL queries, visualizing tools, etc.

image alt text

For chart visualizing, you should purchase a pro edition; there is also a free trial period. You can get more detailed information on how to work with charts in DbVisualizer here.

image alt text

At the right side of the page, there is “Show as Chart” option. After clicking on it, you will receive the proposed chart.

image alt text

You can try various chart types. Each chart can describe your data in a different way, so it might be useful to experiment.

image alt text

Let’s check which points receives each league using a pie chart. For that, from the chart list, we choose “Pie”, and then select league_id and points_max.

image alt text

Now, you can export your chart; moreover, you can also export tables. In pro edition, you can export them in different forms: CSV, HTML, XLS, XLSX, XML, SQL.

image alt text

Finally, you can print your chart or data from DbVisualizer. It might be convenient when preparing a report.

image alt text

Conclusion

In this tutorial, we showed how you can consume data from different data sources using DB Visualizer. With the help of Dremio, we were able to join two different datasets that originated from different data source systems (Postgres and Microsoft SQL Server). Also, using Dremio we enhanced the meaning of this dataset by adding contextual information in its wiki space.

We hope you enjoyed this tutorial, stay tuned for more tutorials and resources to learn how you can use Dremio to start gaining insights from your data, faster.