Analyzing Multiple Data Sources Simultaneously with Dremio and DBVisualizer
DbVisualizer is a powerful tool for database management and analysis that can be used for every major database. DbVisualizer provides many useful features such as visual actions for SQL queries, schema exporting, support procedures, functions, triggers, etc.
However, in spite of all the great abilities, merging data sources in DbVisualizer is not a simple task. Dremio, the Data-as-a-Service platform provides an ideal solution to this problem. In this tutorial, we will show how to connect Microsoft SQL Server and Postgres to Dremio, perform data curation in Dremio, and then connect Dremio to DbVisualizer. Also, we will demonstrate how to work with DbVisualizer.
We assume that you have Dremio and JDBC driver installed; if not, go to Dremio’s deployments page, pick the installations for your operating system, and visit Dremio Docs to read about installations and deployments. Also, we will be working with SQL-Server, Postgre, and DbVisualizer, so you need to have them installed and configured as well.
Importing data into sources
In this tutorial, we will use a baseball dataset available in Kaggle. The History of Baseball is a reformatted version of the famous Lahman’s Baseball Database. It contains batting and pitching statistics, fielding statistics, standings, team stats, park stats, player demographics, managerial records, awards, post-season data, and more.
Follow the link above and get a database.sqllite file which can be easily transformed into SQL. By executing the file in MSSQL and Postgres, we import data in the databases.
Inserting data in MSSQL:
Inserting data in Postgres:
Connecting to Dremio
When data is uploaded, login to Dremio and choose an option to add a new source. Then, select Microsoft SQL Server and Postgre and fill in fields with necessary information as shown below:
Data curation in Dremio
When the sources are connected, we can begin data preprocessing. Let’s open the all_star table. First, we can see that starting_pos contains empty text, so we exclude such values. Note how convenient it is to see the results preview in Dremio.
Also, this field contains numeric values, but now it is a string type. Let’s convert it to an integer.
Next, we want to combine Microsoft SQL Server and PostgreSQL data sources. For that, we click join, choose the desired data source and its data.
Next, we need to add a join condition. In our case, player_award_vote and all_star have common leage_id, so we select it and click “Apply”.
After all the preparations, Dremio will automatically generate the final SQL script.
1 2 3 4 5 6 7 8 9 10 11 12 13 SELECT nested_0.player_id AS player_id, nested_0.date_year AS date_year, nested_0.game_num AS game_num, nested_0.game_id AS game_id, nested_0.team_id AS team_id, nested_0.gp AS gp, nested_0.starting_pos AS starting_pos, nested_0.league_id AS league_id, join_player_award_vote.league_id AS league_id0, join_player_award_vote.award_id AS award_id, join_player_award_vote.date_year AS date_year0, join_player_award_vote.player_id AS player_id0, join_player_award_vote.points_won AS points_won, join_player_award_vote.points_max AS points_max, join_player_award_vote.votes_first AS votes_first FROM ( SELECT player_id, date_year, game_num, game_id, team_id, league_id, CONVERT_TO_INTEGER(gp, 1, 1, 0) AS gp, CONVERT_TO_INTEGER(starting_pos, 1, 1, 0) AS starting_pos FROM Postgre.public.all_star WHERE starting_pos <> '' ) nested_0 INNER JOIN MSSS.baseball.dbo.player_award_vote AS join_player_award_vote ON nested_0.league_id = join_player_award_vote.league_id
Finally, let’s save the dataset to our previously created space.
When data is ready, you can describe it using Wiki-content and Tags in Catalog.
You can see the result in your space.
Connecting to DbVisualizer
Now, we want to connect DbVisualizer to Dremio. For this, add a Dremio driver to driver manager tools. For more detailed information you can take a look at this DbVisualizer tutorial and also visit dremio docs.
When the driver is configured, add a new connection, choose the Dremio driver and click “Connect”. This is what you should see as a result:
You can look at your data by double-clicking on its name or using SQL commander. Also, with SQL commander you can analyze data in different ways. You can use SQL queries, visualizing tools, etc.
For chart visualizing, you should purchase a pro edition; there is also a free trial period. You can get more detailed information on how to work with charts in DbVisualizer here.
At the right side of the page, there is “Show as Chart” option. After clicking on it, you will receive the proposed chart.
You can try various chart types. Each chart can describe your data in a different way, so it might be useful to experiment.
Let’s check which points receives each league using a pie chart. For that, from the chart list, we choose “Pie”, and then select league_id and points_max.
Now, you can export your chart; moreover, you can also export tables. In pro edition, you can export them in different forms: CSV, HTML, XLS, XLSX, XML, SQL.
Finally, you can print your chart or data from DbVisualizer. It might be convenient when preparing a report.
In this tutorial, we showed how you can consume data from different data sources using DB Visualizer. With the help of Dremio, we were able to join two different datasets that originated from different data source systems (Postgres and Microsoft SQL Server). Also, using Dremio we enhanced the meaning of this dataset by adding contextual information in its wiki space.
We hope you enjoyed this tutorial, stay tuned for more tutorials and resources to learn how you can use Dremio to start gaining insights from your data, faster.