Overview
Dremio provides out-of-the-box methods of handling complex data types in, for example JSON and parquet datasets. Common characteristics are embedded “columns within columns” and “rows within columns”.
In this blog, we will demonstrate how Dremio can discover and handle these types of data. The examples have been tested on the following Dremio versions:
- 23.1
- 25.2.18
- 26.0.5
Preparation of Example Dataset
We will use the “Samples” dataset which is available to any Dremio environment.
Try Dremio’s Interactive Demo
Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI
Add the Sample data source
Click on the + symbol next to Data Lakes

Add the sample source

After the source has been added, click on samples.dremio.com

Click on Dremio University

Scroll down the list of files, and click on restaurant_reviews.parquet

Click “Save” to promote the Restaurant Reviews parquet file as a PDS (Promoted Data Set).

Run a test query against the sample source
We will run the SQL examples from the resultant “SQL Editor”, e.g.
SELECT *
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo

Handling Arrays and Structs in Dremio
Handling Arrays in Dremio
Run the following query to see an example array:
SELECT "_id", business_id , categories
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo

Here, we are retrieving 3 columns. Note that the “categories” column is an array :
- (as denoted by the
icon)
- Let’s use this column for further analysis.
Flatten an Array
Arrays can be flattened using the SQL FLATTEN() function.
Run the following SQL command:
select "_id", business_id
, categories
, flatten(categories) as category
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo;

The result shows:
- As previously we see the unflattened array (the “categories” column)
- Also, each entry of that array has been flattened into a column (“category”); each array entry listed on separate rows, e.g. “_id” value ‘W6KUFNWNYYcE+g60’ has 2 entries in the results, one row per category
Expanding an Array
Alternatively, the array can be expanded along the single row, e.g. run the following SQL command:
SELECT "_id", business_id
, categories
, categories[0] as cat0
, categories[1] as cat1
, categories[2] as cat2
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet";

The result shows:
- The unflattened array (the “categories” column)
- The first 3 array entries have been flattened into other columns on the same row (cat0, cat1, cat2).
Handling Structs in Dremio
Run the following query to see an example struct:
select "_id", business_id
, votes
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet";

Note that the “votes” column is a struct (as denoted by the icon)
Expanding a struct
We can extract an individual column by name, e.g. to extract the “cool” column from the struct:
select "_id", business_id
, foo.votes
, foo.votes.cool
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo ;

The result shows:
- The unflattened struct (the “votes” column)
- The extracted “cool” column and its value (0).
We can also expand all of the struct entries into separate columns using the asterisk notation, e.g.
SELECT "_id", business_id
, foo.votes
, foo.votes.*
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo ;

The ”votes” struct is expanded into separate columns ("cool", "useful", “funny”).
Full example of flattened and expanded dataset:
To fully flatten and expand our sample dataset, run this query:
SELECT "_id", business_id
, "open", full_address
, flatten(categories) as category
, foo.attributes.parking.*
, flatten(neighborhoods) as neighbourhood
, foo.hours.tuesday."open" as tuesdayopen
, foo.hours.tuesday."close" as tuesdayclosed
, foo.hours.wednesday."open" as wednesdayopen
, foo.hours.wednesday."close" as wednesdayclosed
, foo.hours.thursday."open" as thursdayopen
, foo.hours.thursday."close" as thursdayclosed
, foo.hours.friday."open" as fridayopen
, foo.hours.friday."close" as fridayclosed
, foo.hours.saturday."open" as saturdayopen
, foo.hours.saturday."close" as saturdayclosed
, foo.hours.sunday."open" as sundayopen
, foo.hours.sunday."close" as sundayclosed
, foo.hours.monday."open" as mondayopen
, foo.hours.monday."close" as mondayclosed
, foo.votes.*
, stars0
, foo.votes0.*
, foo.compliments.*
, flatten(elite) as elite
, average_stars, type1
, review_count0, name0, user_id0, fans
, type0, business_id0, "_index", "_type", "_uid", "_id0", yelping_since
, type, "date", review_id, user_id
, longitude, state, stars, latitude
, text
, city, review_count, name
FROM Samples."samples.dremio.com"."Dremio University"."restaurant_reviews.parquet" foo;

Nb; in the original parquet dataset there are:
- 100 rows
- 38 columns
However in the resulting dataset:
- 371 rows (flattened)
- 69 columns (expanded)
Using a Semantic layer
To present the flattened, expanded dataset in a digestible form, save the query as a Virtual Data Set:
Click on Save View As …

Provide a name and a storage location for the new VDS, e.g.
- Name: reviews
- Location: working
Click “Save”

We can now access our ready-formatted dataset using the VDS, e.g.
SELECT * FROM working.reviews;

Next steps:
- This can be saved as a raw reflection, so that the data is already pre-processed.
- Or it can be saved as an Iceberg table, e.g.
create table s3b.iceberg.reviews as SELECT * FROM working.reviews;