In today's data science and business intelligence landscape, every tool needs a separate driver for each database it connects to. These drivers are sometimes bundled with the tool, but more often they are separate add-ons that users must install themselves. Installing them creates extra work for end users and IT administrators and gets in the way of simply letting users analyze their own data. Individual drivers can also be hundreds of megabytes in size, so tools that support a large number of data sources can quickly balloon in size purely from bundling drivers.
Figure 1. BI tools require drivers for each database they support
With the advancements made as part of the Arrow Flight SQL initiative for Apache Arrow, this is no longer necessary. Arrow Flight SQL provides a lightning-fast protocol for sending data remotely and provides everything needed to describe the schema of a database. A database provider can expose an Arrow Flight SQL endpoint and any application written for Arrow Flight SQL will be able to connect to it.
How does this help with BI and data science tools that aren’t written for Arrow Flight SQL? A JDBC driver is being written for the Arrow Flight SQL protocol itself, rather than for a particular database as in the traditional approach. The idea is a “one-size-fits-all” driver: a user or tool vendor only needs to supply one generic driver, which can connect to any database that exposes an Arrow Flight SQL endpoint. This is even future-proof: if a new database comes out, it can work with existing tools as long as it provides an Arrow Flight SQL endpoint. In fact, by adding an Arrow Flight SQL endpoint, a database provider automatically enables JDBC connectivity too.
Figure 2. The JDBC driver for Arrow Flight SQL greatly simplifies configuration for the user.
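To make the "one generic driver" idea concrete, here is a minimal Java sketch of what connecting through such a driver could look like. It assumes the `jdbc:arrow-flight-sql://host:port` URL form and the `useEncryption` parameter used by the Apache Arrow Flight SQL JDBC driver; the class name, host names, and ports are hypothetical, and the driver JAR would need to be on the classpath before `connect` could succeed. The point is that only the host and port change from one database to the next.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper class, for illustration only.
class FlightSqlJdbcSketch {

    // URL prefix assumed from the Arrow Flight SQL JDBC driver; verify against its docs.
    static final String URL_PREFIX = "jdbc:arrow-flight-sql://";

    // Builds a JDBC URL for any Flight SQL endpoint. Nothing here is database-specific.
    static String jdbcUrl(String host, int port, boolean useEncryption) {
        return URL_PREFIX + host + ":" + port + "/?useEncryption=" + useEncryption;
    }

    // Opens a connection through the generic driver.
    // Requires the driver JAR on the classpath and a reachable endpoint.
    static Connection connect(String host, int port, String user, String password)
            throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        return DriverManager.getConnection(jdbcUrl(host, port, true), props);
    }

    public static void main(String[] args) {
        // Two hypothetical databases, one driver: only the endpoint differs.
        System.out.println(jdbcUrl("warehouse.example.com", 32010, true));
        System.out.println(jdbcUrl("lakehouse.example.com", 443, true));
    }
}
```

A BI tool that ships this single driver would only need to ask the user for a host, a port, and credentials, rather than asking them to pick and install a database-specific driver.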
Additional Advantages
Not only will Arrow Flight SQL reduce the technical burden on applications and users, but because it builds on Arrow, it will also provide better performance. Like Arrow, it will be open source, so rough edges and bugs will be fixed by an active community as they are found. And since it will be used by a wide variety of sources, it is more likely to be of high quality. Having a single reference JDBC driver also allows any data source that adds an Arrow Flight SQL endpoint to get JDBC "for free" as an on-ramp. The selling point is simple: add an Arrow Flight SQL endpoint to your data source and automatically get JDBC connectivity.
Example BI Tool: Tableau
Tableau is one of the most popular analytics tools on the market. It comes in three variants: Tableau Desktop, Tableau Server, and Tableau Online.
Tableau ships with a set of named connectors, which provide connectivity between Tableau and various data sources (for example, relational, flat-file, or multi-dimensional sources).
There are over 90 named connectors in Tableau Desktop (see Figure 3) as of version 2021.2.
Figure 3. Many drivers are required for Tableau to connect to the databases it supports.
However, Tableau does not include a driver for every source, and sources without a bundled driver require extra installation steps. Tableau's driver download page lists instructions for installing the driver for each of these sources, and the instructions vary by Tableau version and operating system (Windows, Mac, and Linux). Another problem is that some sources do not provide a driver for every operating system that Tableau Desktop and Tableau Server run on.
Under the Arrow Flight SQL model, any source that provides an Arrow Flight SQL endpoint can share the same driver, and that driver would work on all operating systems that Tableau Desktop and Server can run on.