Dremio Blog

5 minute read · September 24, 2021

Arrow Flight SQL: A Universal JDBC Driver

James Duong Lead Software Developer, Dremio

In the current world of data science and business intelligence, each tool needs a separate driver for every database it connects to. Some drivers ship with the tool, but most are separate add-ons that users must install. Installing and maintaining these add-ons creates extra work for end users and IT administrators and gets in the way of simply letting users analyze their own data. Individual drivers can also be hundreds of megabytes in size, so tools that support many data sources can balloon purely from bundling drivers.

Figure 1. BI tools require drivers for each database they support

With the advancements made as part of the Arrow Flight SQL initiative for Apache Arrow, this is no longer necessary. Arrow Flight SQL provides a lightning-fast protocol for sending data remotely and provides everything needed to describe the schema of a database. A database provider can expose an Arrow Flight SQL endpoint and any application written for Arrow Flight SQL will be able to connect to it.

How does this help with BI and data science tools that aren’t written for Arrow Flight SQL? A JDBC driver is being written for the Arrow Flight SQL protocol itself, rather than for a particular database as in the traditional approach. The idea is a “one-size-fits-all” driver: a user or tool vendor only needs to supply one generic driver, and it can connect to any database that exposes an Arrow Flight SQL endpoint. It is also future-proof: if a new database comes out, it works with existing tools as long as it provides an Arrow Flight SQL endpoint. In fact, by adding an Arrow Flight SQL endpoint, a database automatically gains JDBC connectivity too.
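
As a rough illustration of the single generic driver idea, the Java sketch below uses only the standard `java.sql` API. It assumes a Flight SQL JDBC driver is on the classpath and that it accepts URLs of the form `jdbc:arrow-flight-sql://host:port` (the form used by the Apache Arrow driver; confirm against the driver you install). The class name, helper names, hostnames, and ports are hypothetical and chosen only for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Hypothetical helper class for illustration; not part of any driver.
final class FlightSqlJdbcSketch {

    // Assumed URL scheme; check the documentation of the driver you install.
    static final String URL_PREFIX = "jdbc:arrow-flight-sql://";

    // Builds a connection URL. Only host and port differ between databases;
    // the driver and the URL scheme stay the same.
    static String flightSqlUrl(String host, int port) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return URL_PREFIX + host + ":" + port;
    }

    // Runs a query through plain JDBC and returns the first column of the
    // first row as a string, or null if the result set is empty.
    // Requires a Flight SQL JDBC driver on the classpath at run time.
    static String firstValue(String url, String user, String password, String sql)
            throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            return rs.next() ? rs.getString(1) : null;
        }
    }
}
```

With such a driver installed, a call like `FlightSqlJdbcSketch.firstValue(FlightSqlJdbcSketch.flightSqlUrl("warehouse.example.com", 32010), user, password, "SELECT 1")` would look the same for any database that exposes an Arrow Flight SQL endpoint; only the host and port change.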

Figure 2. The JDBC driver for Arrow Flight SQL greatly simplifies configuration for the user.

Additional Advantages

Not only will Arrow Flight SQL reduce the technical burden on applications and users, it also leverages Arrow, which should mean better performance. Like Arrow, it will be open source, so rough edges and bugs can be fixed by an active community as they are found. Because the same driver will be used across a wide variety of sources, it is more likely to be of high quality. Finally, a single reference JDBC driver lets any data source that adds an Arrow Flight SQL endpoint get JDBC "for free" as an on-ramp: add the endpoint to your data source and JDBC connectivity comes with it.

Example BI Tool: Tableau

Tableau is one of the most popular analytics tools on the market. It has three variants: Tableau Desktop, Tableau Server, and Tableau Online.

Tableau ships with named connectors that provide connectivity between Tableau and various data sources (for example, relational, flat-file, or multidimensional sources).

There are over 90 named connectors in Tableau Desktop (see Figure 3) as of version 2021.2.

Figure 3. Many drivers are required for Tableau to connect to the databases it supports.

However, not every source has its driver included with Tableau; some require extra installation steps. Tableau's driver download page lists instructions for installing the driver for each of these sources, and the instructions vary by Tableau version and operating system (Windows, Mac, and Linux). Another problem is that some sources do not provide a driver for every operating system that Tableau Desktop and Tableau Server run on.

Under the Arrow Flight SQL model, any source that provides an Arrow Flight SQL endpoint can share the same driver, and that driver would work on all operating systems that Tableau Desktop and Server can run on.

Learn More

To learn more about Arrow Flight SQL, watch the "Arrow Flight and Arrow Flight SQL: Accelerating Data Movement" video from Subsurface LIVE. You can also follow the status of the Flight SQL pull request on GitHub. To learn more about Apache Arrow and ways to contribute to the project, check out the Apache Arrow documentation.
