6 minute read · September 13, 2023

New Array Functions in Dremio

Albert Vernon

Albert Vernon · Senior Product Manager, Dremio

Introduction

Data types in Dremio fall into two categories: primitive types such as INT and VARCHAR that hold single values, and semi-structured types like LIST, STRUCT, and MAP that hold complex values.

Arrays are lists of arbitrary size of any single type, indexed by non-negative integers, and are useful for holding sparse data.

Note: LIST and ARRAY are synonyms in Dremio, so you will see references to both in documentation, error messages, and function names.

Dremio Cloud as of mid-September and Dremio Software version 24.2 provide a variety of functions for manipulating array data. 

The remainder of this article describes array functions in Dremio.

Dremio Array Functions

Creating Arrays

In Dremio Software v24.2+ and in Dremio Cloud, you create array literals with the ARRAY keyword like this:

SELECT ARRAY['apple', 'strawberry', 'banana']
-- ["apple","strawberry","banana"]

Dremio Software versions before v24.2 do not have array literals, but you can get the same result using CONVERT_FROM and JSON strings. For example:

SELECT CONVERT_FROM('["apple", "strawberry", "banana"]', 'json')
-- ["apple","strawberry","banana"]

You can also query array columns from tables and views in the usual manner. The Python 3 script below creates a file example.parquet that has array columns named fruits and numbers:

#!/usr/bin/env python3
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table(
    {
        "fruits": [("apple", "strawberry", "banana")],
        "numbers": [(1, 2, 3,)],
    }
)

pq.write_table(table, "example.parquet")

After uploading example.parquet you can query fruits and numbers through Dremio as you would any columns:

SELECT fruits FROM example;
-- ["apple","strawberry","banana"]
SELECT numbers FROM example;
-- [1, 2, 3]

Available Functions

The functions below are available in Dremio Software v24.2 and in Dremio Cloud:

SignatureDescription
array_avg(A)Returns the average of all non-null elements in A.
array_contains(A, V)Returns whether A contains V.
array_max(A)Returns the maximum value in A.
array_min(A)Returns the minimum value in A.
array_remove(A, V)Removes all elements that equal V in A.
array_sum(A)Returns the sum of all non-null elements in A.
cardinality(A)Returns the number of elements in A.
unnest(A)Converts elements in A into rows.

Examples

SELECT ARRAY_AVG(numbers) FROM example;
-- 2.000000
SELECT ARRAY_CONTAINS(fruits, 'banana') FROM example;
-- true
SELECT ARRAY_MAX(numbers) FROM example;
-- 3
SELECT ARRAY_MIN(numbers) FROM example;
-- 1
SELECT ARRAY_REMOVE(fruits, 'strawberry') FROM example;
-- ["apple","banana"]
SELECT ARRAY_SUM(numbers) FROM example;
-- 6
SELECT CARDINALITY(fruits) FROM example;
-- 3
SELECT fruit FROM example, UNNEST(fruits) as t(fruit);
-- apple
-- strawberry
-- banana

Coming Soon

The functions below are planned for Dremio Software v24.3 and the November 2023 update of Dremio Cloud:

SignatureDescription
array_agg(expr)Returns an array consisting of all values in expr.
array_append(A, E)Returns a new array with E at the end of A.
array_distinct(A)Returns a new array with only the distinct elements from A.
array_frequency(A)Returns a map where the keys are the unique elements in A,
and the values are how many times the key appears.
array_prepend(A, E)Returns a new array with E at the beginning of A.
arrays_overlap(X, Y)Returns whether X and Y have any elements in common.
set_union(X, Y, ...)Returns an array of all the distinct values contained in each
array of the input.

Get Started with Dremio Cloud – It’s Free!

Dremio Cloud: The easy and open, fully managed data lakehouse platform.

Sign Up Now

Everything you need to build, automate, and query your data lakehouse in production.

Ready to Get Started?

Bring your users closer to the data with organization-wide self-service analytics and lakehouse flexibility, scalability, and performance at a fraction of the cost. Run Dremio anywhere with self-managed software or Dremio Cloud.