
Introducing the REST API

Intro

Users can now interact with Dremio through a comprehensive set of REST APIs. DevOps teams can use them to orchestrate Dremio with other components of their technology stacks, and end users can build web applications directly on top of Dremio more easily. You can perform most operations through the REST API, including:

- issuing queries and retrieving results as JSON;
- browsing and managing the data catalog;
- managing reflections, including manually refreshing a reflection;
- accessing the status and results of a specific job.

The full API reference is in the Dremio documentation. Follow along below to learn how to perform some basic requests in Python.

Assumptions

To follow this tutorial, you should have access to a Dremio installation. You should also have completed the first two tutorials, Getting Oriented to Dremio and Working With Your First Dataset. You will need a Dremio account with a username and password to use the API, and some familiarity with basic Dremio concepts.

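You should also have Python installed and configured on your operating system; if you haven’t already, download it from the Python website. The requests library used below is not part of Python’s standard library, so install it with pip install requests.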

Setting up your requests

To use the API, we’ll first import two libraries: Python’s requests library to make HTTP requests, and the json library to encode request bodies as JSON and decode JSON responses. Then we’ll define some constants:

- your Dremio username and password;
- the request headers (for now just the content type; the authentication token is added after we log in);
- the address of your Dremio server.

If you are running Dremio on your local machine, the server is localhost, and 9047 is Dremio’s default port.

import json
import requests

username = '<your username>'
password = '<your password>'
headers = {'content-type':'application/json'}
dremioServer = 'http://<server>:9047'

The Dremio API is designed around RESTful principles, so next we will define some wrapper functions for HTTP GET, POST, PUT, and DELETE.

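def apiGet(endpoint):
  # GET a v3 endpoint and decode the JSON response
  response = requests.get('{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint), headers=headers)
  response.raise_for_status()
  return response.json()

def apiPost(endpoint, body=None):
  # send a JSON body only when one is given
  data = json.dumps(body) if body is not None else None
  response = requests.post('{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint), headers=headers, data=data)
  response.raise_for_status()

  # a post may return no data
  if response.text:
    return response.json()
  else:
    return None

def apiPut(endpoint, body=None):
  # like apiPost, decode the JSON response when there is one
  response = requests.put('{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint), headers=headers, data=json.dumps(body))
  response.raise_for_status()
  return response.json() if response.text else None

def apiDelete(endpoint):
  # a delete returns no data, so hand back the response to inspect its status
  response = requests.delete('{server}/api/v3/{endpoint}'.format(server=dremioServer, endpoint=endpoint), headers=headers)
  response.raise_for_status()
  return response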

POST and PUT requests generally take a JSON body. For example, creating a source requires a source configuration. The expected body for each endpoint is described under the Models subheading in the API documentation. Make sure your body follows that format, otherwise the request is likely to fail.

Authenticating a user

Now that we’ve defined our request functions, we can start using the API. Dremio uses token-based authentication, so we first need to generate a token by posting our username and password to the login endpoint. Note that login currently goes through the older v2 API (apiv2/login), not the v3 API used elsewhere in this tutorial.

def login(username, password):
  # we log in using the older v2 API for now
  loginData = {'userName': username, 'password': password}
  response = requests.post('{server}/apiv2/login'.format(server=dremioServer), headers=headers, data=json.dumps(loginData))
  response.raise_for_status()
  data = response.json()

  # retrieve the login token; the auth header is '_dremio' followed directly by the token
  token = data['token']
  return {'content-type':'application/json', 'authorization':'_dremio{authToken}'.format(authToken=token)}

headers = login(username, password)

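The login function returns the headers, including the authorization token, that we must pass with all of our other API requests. The wrapper functions read the global headers variable each time they are called, so reassigning it as above is enough.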
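You may prefer to keep header construction separate from the network call, for example when a token comes from somewhere other than a fresh login. The sketch below does that. authHeaders is a hypothetical helper name. It only reproduces the '_dremio' prefix format already used by the login function above.

```python
def authHeaders(token):
  # Build the headers for authenticated requests: the token is prefixed
  # with '_dremio' and no space, the same format the login function uses.
  return {'content-type': 'application/json',
          'authorization': '_dremio{token}'.format(token=token)}

# Example with a made-up token value
print(authHeaders('abc123')['authorization'])  # prints _dremioabc123
```

With this helper, the last line of login could simply be return authHeaders(token).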

Querying data using the API

In previous tutorials you learned how to add data to Dremio, either by uploading it from your local machine or through one of our connectors (S3, MongoDB, ADLS, and many others). Let’s use the SQL and Job endpoints to query this data and return rows.

Every dataset has a path, which you can also find in the Dremio UI, and you can use that path to access the dataset through the API. Define the path as a list, join its components with dots, and build the SQL query to send to the API. Today I’m using a dataset of all WTA tennis matches, and my SQL query returns all of Serena Williams’s matches. The home space name @elbert is wrapped in double quotes because @ is not allowed in an unquoted SQL identifier.


def querySQL(query):
  # submit the query; the SQL endpoint responds with the id of the job it started
  queryResponse = apiPost('sql', body={'sql': query})
  jobid = queryResponse['id']
  return jobid

# '@elbert' is a home space, so it must be double-quoted in SQL
path = ['"@elbert"', 'wta', 'matches']
path = '.'.join(path)
query = "SELECT * FROM {source} WHERE winner_name = 'Serena Williams' OR loser_name = 'Serena Williams'".format(source=path)
jobid = querySQL(query)
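The path above quotes only "@elbert", which is the one component that needs it. If your space, folder, or dataset names may contain dots, spaces, or other special characters, you can instead quote every component. The sketch below does this with a hypothetical helper, toSqlPath. It assumes Dremio follows the standard SQL rule of escaping a double quote inside a quoted identifier by doubling it.

```python
def toSqlPath(path):
  # Quote every path component as a SQL identifier so names containing
  # '@', '.', spaces, or other special characters parse correctly.
  # An embedded double quote is escaped by doubling it (standard SQL rule).
  return '.'.join('"{}"'.format(part.replace('"', '""')) for part in path)

source = toSqlPath(['@elbert', 'wta', 'matches'])
print(source)  # prints "@elbert"."wta"."matches"
```

Build the query with .format(source=toSqlPath([...])) and pass unquoted names in the list.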

The SQL endpoint returns a job id, and the same job appears on the Jobs page of the Dremio UI. If the query succeeded, it is marked with a green checkmark along with its metadata.
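The SQL endpoint returns as soon as the job is submitted, so its results may not be ready when you first ask for them. The sketch below polls the job until it finishes. It assumes the job status returned by apiGet('job/{id}') includes a jobState field whose terminal values are COMPLETED, FAILED, and CANCELED; check the Job model in the API documentation for your version. waitForJob and its parameters are hypothetical names. The status fetch is passed in as a function so the sketch can be exercised without a server.

```python
import time

def waitForJob(jobid, getJob, pollSeconds=1.0, timeoutSeconds=300):
  # Poll the job status until it reaches a terminal state.
  # getJob(jobid) must return the job status as a dict, for example:
  #   lambda id: apiGet('job/{id}'.format(id=id))
  deadline = time.monotonic() + timeoutSeconds
  while True:
    state = getJob(jobid)['jobState']
    if state == 'COMPLETED':
      return state
    if state in ('FAILED', 'CANCELED'):
      raise RuntimeError('job {id} ended in state {state}'.format(id=jobid, state=state))
    if time.monotonic() > deadline:
      raise TimeoutError('job {id} still {state} after {t}s'.format(id=jobid, state=state, t=timeoutSeconds))
    time.sleep(pollSeconds)
```

Usage: waitForJob(jobid, lambda id: apiGet('job/{id}'.format(id=id))) before requesting results.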


Once the job has completed, we can use the Job endpoint to retrieve its rows. The results are paged, so you can step through them by setting an offset and limit on each call.

results = apiGet('job/{id}/results?offset={offset}&limit={limit}'.format(id=jobid, offset=0, limit=100))

The results response contains rowCount (the total number of rows in the job’s result), rows (the rows in the current page), and schema (the name and type of each column).
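Because rowCount is the total and rows holds only one page, collecting a full result means advancing the offset until every row has arrived. The sketch below does that. fetchAllRows is a hypothetical helper, and a page size of 100 is an assumed safe value; the API documentation lists the maximum limit per call. The page fetch is passed in as a function, which keeps the loop testable offline.

```python
def fetchAllRows(jobid, getPage, pageSize=100):
  # Page through a completed job's results until rowCount rows are collected.
  # getPage(jobid, offset, limit) must return one results page as a dict, e.g.:
  #   lambda id, o, l: apiGet('job/{id}/results?offset={o}&limit={l}'.format(id=id, o=o, l=l))
  rows = []
  offset = 0
  while True:
    page = getPage(jobid, offset, pageSize)
    rows.extend(page['rows'])
    offset += len(page['rows'])
    # stop once every row is collected, or if the server returns an empty page
    if offset >= page['rowCount'] or not page['rows']:
      return rows
```

Stopping on an empty page as well as on rowCount guards against looping forever if the two ever disagree.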

Exploring the Catalog

Dremio’s data catalog is a representation of all of your datasets, spaces, and sources that you have either created or have access to.

The Catalog endpoint gives you access to this representation.

The following request returns a top-level view of the data catalog.

apiGet('catalog')

Each container returned in the response has an associated id. Given a path to a dataset, you can recursively traverse ids returned by the Catalog endpoint to get metadata about that dataset. Here’s how:

from urllib.parse import quote

def getCatalogRoot():
  # top-level containers: home space, spaces, and sources
  return apiGet('catalog')['data']

def getById(id):
  # ids may contain characters such as '/', so encode every character
  return apiGet('catalog/{id}'.format(id=quote(id, safe='')))

def getByPathChildren(path, children, depth):
  # search this level's children for the next path component
  for item in children:
    if item['path'][depth] == path[depth]:
      response = getById(item['id'])
      if depth == len(path) - 1:
        return response
      return getByPathChildren(path, response['children'], depth + 1)
  return None  # nothing at this level matches the path

def getByPath(path):
  # walk down from the root catalog; the path list itself is not modified
  return getByPathChildren(path, getCatalogRoot(), 0)

dataset = getByPath(['@elbert', 'wta', 'matches'])
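Walking the tree costs one request per path component. Dremio’s v3 API also documents a catalog/by-path endpoint that looks up an entity from its full path in a single request; confirm it is available in your Dremio version. The sketch below only builds that endpoint string. byPathEndpoint is a hypothetical helper. It percent-encodes each component separately, so a '/' inside a name cannot be mistaken for a separator.

```python
from urllib.parse import quote

def byPathEndpoint(path):
  # Build 'catalog/by-path/<a>/<b>/...', URL-encoding each component
  # (including '@' and '/') and joining the encoded parts with '/'.
  return 'catalog/by-path/' + '/'.join(quote(part, safe='') for part in path)

print(byPathEndpoint(['@elbert', 'wta', 'matches']))
# prints catalog/by-path/%40elbert/wta/matches
```

Usage: dataset = apiGet(byPathEndpoint(['@elbert', 'wta', 'matches'])).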

Using the id returned by getByPath, you can also refresh the reflections for that dataset:

apiPost('catalog/{id}/refresh'.format(id=dataset['id']))

The Catalog endpoint also lets you promote a file or folder to a dataset, given its path.

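path2 = ['path', 'to', 'your', 'dataset']
# pass a copy in case getByPath modifies the list it is given
file = getByPath(list(path2))
# the format type must match the file; this example promotes a JSON file
newDataset = {'entityType': 'dataset', 'id': file['id'], 'type': 'PHYSICAL_DATASET', 'path': path2, 'format': {'type': 'JSON'}}
# file and folder ids may contain '/', so encode every character
apiPost('catalog/{id}'.format(id=quote(file['id'], safe='')), body=newDataset)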
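Only the format object changes between file types, so a small builder keeps the rest of the promotion body in one place. The sketch below uses a hypothetical helper, promotionBody. For a comma-separated file it assumes a format with type 'Text' and the fields fieldDelimiter and extractHeader. Treat those names as an assumption and check them against the file-format model in the API documentation.

```python
def promotionBody(fileId, path, formatConfig):
  # Body for promoting a file or folder to a physical dataset;
  # only the format object differs between file types.
  return {'entityType': 'dataset', 'id': fileId, 'type': 'PHYSICAL_DATASET',
          'path': list(path), 'format': formatConfig}

# Assumed format for a CSV file whose first line holds the column names
csvFormat = {'type': 'Text', 'fieldDelimiter': ',', 'extractHeader': True}
body = promotionBody('<file-id>', ['path', 'to', 'your', 'dataset'], csvFormat)
```

The body is then sent the same way as above: apiPost('catalog/{id}'.format(id=quote(fileId, safe='')), body=body).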

Sources

Sources represent Dremio’s connections to your data through its connectors, including data on your local machine. To list all of your sources, run:

apiGet('source')

To access a specific source, run:

apiGet('source/{id}'.format(id='<source-id>'))
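If you know a source only by its name, you can look up its id in the listing returned by apiGet('source'). The sketch below assumes that listing holds the sources under a data key, each entry with id and name fields. sourceIdByName is a hypothetical helper, and 'elastic' in the usage line is just an example name.

```python
def sourceIdByName(sourceList, name):
  # sourceList is the parsed response of apiGet('source'), assumed to be
  # {'data': [{'id': ..., 'name': ...}, ...]}
  for source in sourceList['data']:
    if source['name'] == name:
      return source['id']
  return None  # no source with that name
```

Usage: apiGet('source/{id}'.format(id=sourceIdByName(apiGet('source'), 'elastic'))).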

To create a new source, send a POST request whose body gives the source a name, its type, and a config object in the format your connector expects. Here I am connecting to our Elasticsearch cluster.

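# 'elastic' is an example name; replace 9200 (Elasticsearch's default port) if your cluster differs
sourceParams = {'name': 'elastic', 'type': 'ELASTIC', 'config': {
  'username': '<your elastic username>', 'password': '<your elastic password>', 'authenticationType': 'MASTER',
  'hostList': [{'hostname': '<your elastic hostname>', 'port': 9200}]}}
apiPost('source', body=sourceParams)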

Conclusion

We hope this has been a gentle but informative introduction to using the REST API with Dremio. More experienced and enterprise users can also explore the Reflection and Voting endpoints.