13 minute read · September 29, 2025

Handling Complex Data Types in Dremio

Michael Flower

Michael Flower · System Engineer, Dremio

Overview

Dremio provides out-of-the-box methods of handling complex data types in, for example JSON and parquet datasets. Common characteristics are embedded “columns within columns” and “rows within columns”. 

In this blog, we will demonstrate how Dremio can discover and handle these types of data. The examples have been tested on the following Dremio versions:

  • 23.1
  • 25.2.18
  • 26.0.5

Preparation of Example Dataset

We will use the “Samples” dataset which is available to any Dremio environment.

Try Dremio’s Interactive Demo

Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI

Add the Sample data source

Click on the + symbol next to Data Lakes

Add the sample source

After the source has been added, click on samples.dremio.com

Click on Dremio University

Scroll down the list of files, and click on restaurant_reviews.parquet

Click “Save” to promote the Restaurant Reviews parquet file as a PDS (Promoted Data Set).

Run a test query against the sample source

We will run the SQL examples from the resultant “SQL Editor”, e.g.

SELECT *
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo

Handling Arrays and Structs in Dremio

Handling Arrays in Dremio

Run the following query to see an example array:

SELECT  "_id", business_id , categories
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo

Here, we are retrieving 3 columns. Note that the “categories” column is an array :

  • (as denoted by the icon)
  • Let’s use this column for further analysis.

Flatten an Array

Arrays can be flattened using the SQL FLATTEN() function. 

Run the following SQL command:

select  "_id", business_id
, categories
, flatten(categories) as category
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo;

The result shows:

  • As previously we see the unflattened array (the “categories” column) 
  • Also, each entry of that array has been flattened into a column (“category”); each array entry listed on separate rows, e.g.  “_id” value ‘W6KUFNWNYYcE+g60’ has 2 entries in the results, one row per category

Expanding an Array

Alternatively, the array can be expanded along the single row, e.g. run the following SQL command:

SELECT  "_id", business_id
, categories
, categories[0] as cat0
, categories[1] as cat1
, categories[2] as cat2
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet";

The result shows:

  • The unflattened array (the “categories” column) 
  • The first 3 array entries have been flattened into other columns on the same  row (cat0, cat1, cat2).

Handling Structs in Dremio

Run the following query to see an example struct:

select  "_id", business_id
, votes
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet";

Note that the “votes” column is a struct (as denoted by the  icon)

Expanding a struct

We can extract an individual column by name, e.g. to extract the “cool” column from the struct:

select  "_id", business_id
, foo.votes
, foo.votes.cool 
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo ;

The result shows:

  • The unflattened struct (the “votes” column) 
  • The extracted “cool” column and its value (0).

We can also expand all of the struct entries into separate columns using the asterisk notation, e.g.

SELECT "_id", business_id
, foo.votes
, foo.votes.* 
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo ;

The ”votes” struct is expanded into separate columns ("cool", "useful", “funny”).

Full example of flattened and expanded dataset:

To fully flatten and expand our sample dataset, run this query:

SELECT  "_id", business_id
, "open", full_address
, flatten(categories) as category
, foo.attributes.parking.*
, flatten(neighborhoods) as neighbourhood
, foo.hours.tuesday."open" as tuesdayopen
, foo.hours.tuesday."close" as tuesdayclosed
, foo.hours.wednesday."open" as wednesdayopen
, foo.hours.wednesday."close" as wednesdayclosed
, foo.hours.thursday."open" as thursdayopen
, foo.hours.thursday."close" as thursdayclosed
, foo.hours.friday."open" as fridayopen
, foo.hours.friday."close" as fridayclosed
, foo.hours.saturday."open" as saturdayopen
, foo.hours.saturday."close" as saturdayclosed
, foo.hours.sunday."open" as sundayopen
, foo.hours.sunday."close" as sundayclosed
, foo.hours.monday."open" as mondayopen
, foo.hours.monday."close" as mondayclosed
, foo.votes.*
, stars0
, foo.votes0.*
, foo.compliments.*
, flatten(elite) as elite
, average_stars, type1
, review_count0, name0, user_id0, fans
, type0, business_id0, "_index", "_type", "_uid", "_id0", yelping_since
, type, "date", review_id, user_id
, longitude, state, stars, latitude
, text
, city, review_count, name
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo;

Nb; in the original parquet dataset there are:

  • 100 rows 
  • 38 columns

However in the resulting dataset:

  • 371 rows (flattened)
  • 69 columns (expanded)

Using a Semantic layer

To present the flattened, expanded dataset in a digestible form, save the query as a Virtual Data Set:

Click on Save View As …

Provide a name and a storage location for the new VDS, e.g.

  • Name: reviews
  • Location: working

Click “Save”

We can now access our ready-formatted dataset using the VDS, e.g.

SELECT * FROM working.reviews;

Next steps:

  • This can be saved as a raw reflection, so that the data is already pre-processed. 
  • Or it can be saved as an Iceberg table, e.g. 

create table s3b.iceberg.reviews as SELECT * FROM working.reviews;

Make data engineers and analysts 10x more productive

Boost efficiency with AI-powered agents, faster coding for engineers, instant insights for analysts.