Dremio Jekyll

Analyzing Data With TIBCO Spotfire and Dremio

Intro

As data consumers, we all want to start gaining value from our data as soon as we get our hands on it. TIBCO Spotfire is an analytics tool designed to answer any question, easy or challenging, based on your data as fast as possible. It is a BI platform that can be used across organizations and provides a simple way to consume your data via attractive reports and visualizations regardless of the user’s level of expertise.

In this tutorial we are going to show you how to connect TIBCO Spotfire to Dremio so you can start gaining insights from your data, faster. Additionally, thanks to the amazing TIBCO Spotfire team, we will share with you best practices to improve the performance of your data analysis when working with push down queries. We will discuss how can you share and collaborate on your analysis, optimize analysis files and also troubleshoot in case the inevitable hiccup shows up while you are working with your data.

While we encourage you to read the tutorial, we’ve also made a video available in case you rather sit back and watch the process.

Assumptions

In this tutorial we will be working on a MS Windows environment; to follow along, we assume that you have access to the following requirements:

Connecting To Dremio

To connect Spotfire Analyst to Dremio, we will use the native Dremio connector.

First, in Spotfire Analyst, open the “Files and data” flyout and click “Connect to”, then select “Dremio” followed by “New Connection”.

image alt text

In the Dremio Connection dialog, all your System and User DSNs for Dremio are listed. Select the DSN you want to connect to. Then, enter your Dremio Username and Password, and then click Connect.

Congrats! You have successfully connected Spotfire to Dremio. In the next step, you can select the data that you want to analyze.

Select the Dremio data to analyze

When you have connected to Dremio, you can target and select the data that is relevant to you. You select data in the Views in connection dialog. It’s easy to start analyzing your data:

1. Browse the tables available in Dremio.

2. Add the tables you want as views, which will be the data tables you analyze in Spotfire.

3. For each view, you can decide which columns you want to include.

image alt text

To create a more specific and flexible data selection, you can use a range of powerful tools in this dialog, such as:

1. Custom queries. This is where you can put your SQL skills to use. With custom queries, you can select the data you want to analyze by typing a custom SQL query.

2. Prompting. Leave the data selection to the user of your analysis file. You configure prompts based on columns of your choice. Then, the end user who opens the analysis can select to limit and view data for relevant values only. For example, she can select data within a certain span of time or for a specific geographic region.

Once you are happy with your data selection, click OK. In the next and final step, you can choose how you want your data to be accessed.

Push-down queries or imported data?

When you have selected data to analyze in Spotfire, you can choose how Spotfire should retrieve the data from Dremio. In the summary of data tables that you have selected, you can click each table to change the data loading method.

image alt text

In this tutorial we will select External which is the default option to work with Dremio. This means the data table is kept in-database in Dremio. Spotfire pushes different queries to the database for the relevant slices of data, based on your actions in the analysis.

You can also select Imported and Spotfire will extract the entire data table up-front. When you import data tables, you also unlock all analytical functions in the embedded in-memory data engine of Spotfire. You can also use the statistical capabilities of the embedded TIBCO Enterprise Runtime for R.

The third option is On-demand (corresponding to a dynamic WHERE clause). This means that slices of data will be extracted based on user actions in the analysis. You can define the criteria, which could be actions like marking or filtering data, or changing document properties. You can also combine on-demand data loading with External data tables.

After this step, your data will be ready for you to create reports and visualizations on.

image alt text

Using TIBCO Spotfire, I can now start visualizing the San Francisco Incidents dataset that we connected to.

image alt text

TIBCO Spotfire best practices

Sharing and collaborating on your Dremio analyses

It’s easy to share and collaborate on analysis files in Spotfire. In the TIBCO Cloud Spotfire web client library, you have access to your organization’s shared team folder. When you save your analysis file in your team folder, it’s available to potentially hundreds of thousands of users in your organization.

You can also share analysis files only with selected individuals. To do this, you can set up a shared folder inside your private Home folder. Simply right-click the folder, and then select Sharing to configure the sharing options. In the dialog box, select Shared and enter the email addresses of the users you want to share the contents with.

Tip: If you are using Spotfire in an on-premises environment, sharing works differently. You use the library administrator in the Spotfire Analyst Windows client to configure access rights for specific users and groups.

image alt text

image alt text

Enabling Spotfire web client access to your Dremio instance

With TIBCO Cloud Spotfire, you can continue to work on your Dremio analyses in the web client, once the first connection to the data has been configured in the Spotfire Analyst Windows client. This is also a convenient way to share your analyses within your organization. You might need to allow traffic from the TIBCO Cloud Spotfire servers in Dremio. For up-to-date information about the IP addresses you need to whitelist, see: Whitelisting TIBCO Cloud Spotfire IP Addresses

Optimizing analysis files

If your Spotfire analysis is very complex, there are things you can do to improve performance. The following are best-practices tips to help you optimize your analysis and reduce the load on the Dremio instance.

Combining push-down queries with on-demand imported data

You can optimize your analysis by combining the load methods External and On-demand in appropriate ways. This can help you reduce the number of rows scanned by Dremio and the number of queries pushed into Dremio.

Cascading filters and caching in Spotfire

Spotfire supports cascading filters. Cascading filters means that the values displayed in filters are dependent on selections in other filters. Only relevant values are available, and any values that are already filtered out are not displayed.

When you use the External loading method to access data from Dremio with push-down queries, cascading filters is disabled by default. If you have enabled cascading filters and are using many filters and/or filters with many unique values, you might experience long waiting times for visualizations and filter content to be refreshed. This is because all filters will be re-calculated using push-down queries when you change filter settings.

You can enable cascading filters in the data connection settings (select Data > Data connection properties on the menu bar, and then click Settings…). This is also where you configure the push-down query cache. Unless you are analyzing high velocity data it’s recommended to increase the cache expiration time from minutes to hours.

image alt text

Learn more in the data connection performance settings user documentation.

Spread visualizations across pages

With Spotfire, it is possible to link different data tables together, so that marking items in one visualization also marks items in visualizations based on other linked data tables. This is sometimes called brush-linking. If you are using the External loading method, Spotfire analysis files with many linked visualizations on the same page will push queries to Dremio every time you mark items in a brush-linked visualization.

This might lead to long waiting times until all visualizations on the page are refreshed. Consider spreading visualizations on multiple pages and design your visual data discovery workflow accordingly.

image alt text

Troubleshooting

Log files

If you want to troubleshoot, you can study the Spotfire log files. To enable logging, select Help > Support diagnostics and logging on the menu bar. On the Logging tab, you can change the log level according to how much detail you want.

Analyzing queries

Queries from Spotfire to Dremio are logged in the query log, Spotfire.Dxp.QueryLog.log. Here you can find useful information for understanding more about the queries from Spotfire, and you can learn, for example, which visualization is generating a certain query.

To enable the query log, set the log level to DEBUG or TRACE.

Analyzing events and actions

In the Windows client, user actions and events are logged to the general purpose log file Spotfire.Dxp.SupportDiagnostics.log. In this file, you can find detailed information that can be useful for troubleshooting.

Marking in non-aggregated visualizations

If you want to be able to mark individual rows or data points in your visualizations, for example to mark items in a table visualization, the data table must have a primary key defined that identifies each row. If your data table does not have a primary key, you can define the primary key column manually when you select tables and columns, or later by editing your data connection.

image alt text

Learn more in the Views in Connection user documentation.

Limit of 10.000 rows visible in table visualizations

When your data is External, Spotfire only retrieves and visualizes 10.000 rows in table visualizations. This is to protect you from extracting too many rows from big data sources like Dremio. If you have an on-premises environment of TIBCO Spotfire, your Spotfire administrator can adjust the limit by editing the following preference setting in the Administration Manager:

Application > DataImportPreferences > TableVisualizationExternalRowLimit

image alt text

Conclusion

In this tutorial we showed you the steps to connect TIBCO Spotfire to Dremio. We talked about the different ways that Spotfire retrieves data for analysis. We connected from Spotfire to the the SF Incidents data located in the Dremio Samples data source and created a dashboard in Spotfire.

Additionally, we shared best practices to enhance analysis performance in Spotfire, collaborate and also troubleshoot.

We hope you enjoyed this tutorial, stay tuned for more tutorials and resources to learn how you can use Dremio to start gaining insights from your data, faster.