Dremio Jekyll

How to Create an ARP Connector

Intro

Dremio’s Advanced Relational Pushdown (ARP) framework allows data consumers and developers to create custom relational connectors for those cases where data is stored in uncommon locations. Using the ARP framework not only allows you to create better connectors with improved push-down abilities but it also provides a way to develop connectors more efficiently. In this tutorial we will walk through the steps of using the ARP connector template to develop a connector, we will discuss best practices, things to avoid, potential problems and how to fix them.

Assumptions

For this tutorial we will use the latest deployment of Dremio which you can download from our deploy page. Also, for the purposes of this tutorial we will use SQLite, a software library that provides a lightweight serverless relational database management system. You can obtain its JDBC driver from their Github page.

Setting things up

Before we start working on our SQLite connector, let’s take a look at what are the steps that we will take to get this connector ready to go. There are two files that are necessary for the creation of an ARP-based plugin:

  • Storage plugin configuration: [data source]Conf.java
  • Plugin ARP YAML file: [data source].arp.yaml

The storage plugin configuration file tells Dremio what the name of the plugin should be, what connection options should be displayed in the source UI such as host address, user credentials, etc., what the name of the ARP file is, which JDBC driver to use and how to make a connection to the JDBC driver.

The Plugin ARP YAML file is what is used to modify the SQL queries that are sent to the JDBC driver, allowing you to specify support for different data types and functions, as well as rewrite them if tweaks need to be made for your specific data source.

Understanding the ARP File Format

The ARP file is broken down into several sections:

image alt text

Metadata: This section outlines some high level metadata about the plugin.

Syntax: This section allows for specifying some general syntax items like the identifier quote character.

Data Types: This section outlines which data types are supported by the plugin, their names as they appear in the JDBC driver, and how they map to Dremio types.

Relational Algebra: This section is divided up into a number of other subsections:

  • Aggregation: Specify what aggregate functions, such as SUM, MAX, etc, are supported and what signatures they have. You can also specify a rewrite to alter the SQL for how this is issued.
  • except/project/join/sort/union/union_all/values: These sections indicate if the specific operation is supported or not.
  • Expressions: This section outlines general operations that are supported. The main sections are:
  • Operators: Outlines which scalar functions, such as SIN, SUBSTR, LOWER, etc, are supported, along with the signatures of these functions which are supported. Finally, you can also specify a rewrite to alter the SQL for how this is issued.
  • Variable_length_operators: The same as operators, but allows specification of functions which may have a variable number of arguments, such as AND and OR.

If an operation or function is not specified in the ARP file, then Dremio will handle the operation itself. Any operations which are indicated as supported but need to be stacked on operations which are not will not be pushed down to the SQL query.

Understanding the ARP Template

The conf or Storage Plug-in file for the ARP connector provides Dremio with the necessary information about the JDBC driver that we will use, the name of the connector, GUI connection options and much more.

In this section I’ll identify which elements were edited to work with SQLite. At first, if you are not a day-to-day coder, this file might look intimidating, but the changes needed are very straightforward and foolproof.

Open the SqliteConf.java file and notice every instance of Sqlite.

Line 38

1
@SourceType(value = "SQLITE", label = "SQLite")

Line 39

1
public class SqliteConf extends AbstractArpConf<SqliteConf> {

And Line 40

1
private static final String ARP_FILENAME = "arp/implementation/sqlite-arp.yaml";

In addition to these lines, we have indicated the JDBC driver class for the driver that we will use in line 43:

Note that this driver class is different for each data source, you will need to look up in your source’s documentation to see what is the driver class for that source.

1
private static final String DRIVER = "org.sqlite.JDBC";

Editing The GUI Input Fields

Now we need to enable the GUI fields for the connection parameters. In this case SQLite only needs the following parameters: Database and Record fetch size.

1
2
3
4
5
6
7
8
@NotBlank
  @Tag(1)
  @DisplayMetadata(label = "Database")
  public String database;
@Tag(2)
  @DisplayMetadata(label = "Record fetch size")
  @NotMetadataImpacting
  public int fetchSize = 200;

These lines will be responsible for the following items in the GUI

image alt text

By default, each field is considered “metadata impacting”, which means that a change of the value of that field could result in different metadata that Dremio will see. For example, the service account username would be metadata impacting because different users could have different permissions and therefore see different objects. The @NotMetadataImpacting tag tells Dremio that this field is not considered “metadata impacting”.

Setting up the database URL

Before continuing with this step, read the documentation for your data source and find out what is the correct structure for its JDBC URL, in the case of SQLite it is: org.sqlite.JDBC

Because we are capturing the database name only in the UI, we need to provide that value to the JDBC URL in the toJdbcConnectionString class, to do so, make sure the variable names inside this class match the names of the parameters captured in the section above. In our case the class will look as follows:

1
2
3
4
5
6
@VisibleForTesting
  public String toJdbcConnectionString() {
    final String database = checkNotNull(this.database, "Missing database.");
    return String.format("jdbc:sqlite:%s", database);

 

Save and close the file.

Editing The Plug-in ARP File

This file is used to modify the SQL queries that are sent to the JDBC driver, here we will indicate what are the different data types supported as well as SQL functions supported by the datasource. It is a good practice to check the datasource’s documentation to look for this information. You can see the original file here.

In lines 19, and 20 we indicate the name of the API as well as the datasource that we will use.

1
2
3
4
5
metadata:
  # Manually Configured Metadata Section.
  name: SQLITE
  apiname: sqlite
  spec_version: '1'

The syntax section allows for specifying some general syntax items like the identifier quote character.

1
2
3
4
5
6
7
8
9
10
syntax:
  # Manually Configured Syntax Section.
  identifier_quote: '"'
  identifier_length_limit: 128
  allows_boolean_literal: false
  map_boolean_literal_to_bit: false
  supports_catalogs: false
  supports_schemas: false

 

Now we can move on to the data_types section, this part of the file outlines which data types are supported by the plugin, their names as they appear in the JDBC driver, and how they map to Dremio types. You will need to refer to your data source’s documentation to complete the mapping.

First we will do the basic mappings for numeric types, date types and string types. We will list some of them here, you can refer to the final YAML file to see all of them.

Data types:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
data_types:
  mappings:
  # Manually Configured Data Types Mappings Section.
  - source:
      name: "INTEGER"
    dremio:
      name: "integer"
    required_cast_arguments: "none"
  - source:
      name: "REAL"
    dremio:
      name: "double"
    required_cast_arguments: "none"
  - source:
      name: "VARCHAR"
    dremio:
      name: "varchar"
    required_cast_arguments: "none"

 

Building The Connector

Now that we have all the files ready, we will go to the final step of creating the ARP connector. Before continuing, make sure your connector directory contains the following files in the appropriate paths:

  • Pom.xml
  • Readme.md
  • Src (folder) ->Main->Resources->Arp->Implementation->sqlite-arp.yaml
  • Scr->Main->Java->Com->dremio->exec->store->jdbc->conf->SqliteConf.java

Now, verify that Maven is installed on your computer. Once ready using the terminal, navigate to the directory where the pom, readme and src files are located and run the following command:

1
mvn clean install

After a few minutes, you will see a Build Success message on the terminal, and also there will be a Target directory created in your current working path.

If there are any errors in the build, Maven will display the exact location of the issue so you can troubleshoot it, once ready simply save the file and run the build command again.

Installing The Connector

First, make sure Dremio is not running, then, from the target **directory, copy the **.jar file and place it inside the /$DREMIO_HOME/jars directory. Additionally, move the JDBC driver to the /$DREMIO_HOME/jars/3rdparty directory. Start and log in in Dremio.

Inside Dremio, click on the plus sign to add a new Data source

image alt text

Select SQLite from the list of available sources.

image alt text

Add the connection and authentication parameters and click save.

image alt text

If everything went well, you should be able to see the directories inside your data source.

Things that could go wrong

While this method is very straightforward, there is always the possibility that something might need certain level of troubleshooting. Here are some of the common issues that you might encounter and how to solve them.

When running the mvn clean install **command, you might see the following error on your terminal. This means that something is not ok with the **[your source]Conf.java file.

image alt text

Thankfully, Maven is good about pinpointing the coordinates (line and character, issue, and file causing it) of where the issue is located.

image alt text

These are some of the things to look for in the [your source]Conf.java file:

  • Double check that the private static final String DRIVER = contains the correct JDBC class for the data source that you are working with.
  • Check that the @DisplayMetadata variable names (username, password, etc) are uniform across the entire file.
  • Make sure that the JDBC string is correctly built

Pushdown Failures

To debug pushdowns for queries, open the logback.xml **file located in the **dremio/conf directory and add the following lines:

1
2
3
<logger name="com.dremio.exec.store.jdbc">
    <level value="${dremio.log.level:-trace}"/>
  </logger>

After running queries, you might notice lines in the server.log file like the following:

1
- 2019-07-11 18:56:24,001 [22d879a7-ce3d-f2ca-f380-005a88865700/0:foreman-planning] DEBUG c.d.e.store.jdbc.dialect.arp.ArpYaml - Operator / not supported. Aborting pushdown.

To troubleshoot aborted pushdowns, always refer to the [your source]-arp.yaml and double check your data source’s documentation to make sure the supported SQL functions are listed correctly.

Conclusion

In earlier versions of Dremio each connector was developed on an independent code path. Starting in Dremio 3.0 we have developed an all-new declarative framework (ARP) for developing relational connectors. This allows us to standardize on a single code base that is now more efficient, provides better push-down abilities, and is easier for us to maintain.

In this tutorial we walked through the steps of developing a custom ARP connector for SQLite, we also highlighted what are some of the issues that users may come across when developing the connector and how to solve them.

We hope you find this tutorial useful, stay tuned to learn more about how Dremio can help you gain insights from your data, faster.