Data Lake Engines

A data lake engine is an application or service that queries and/or processes the vast sets of data stored in data lake storage. Data lake processing engines like Apache Spark are often used for batch data transformation jobs and machine learning. Data lake query engines such as Dremio and Presto are used to analyze structured and semi-structured data in place for business intelligence (BI) and data science.

How Z-Ordering in Apache Iceberg Helps Improve Performance

September 13, 2022

This tutorial introduces the Z-order clustering algorithm in Apache Iceberg and explains how it adds value to the file optimization strategy.
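To make the Z-order idea concrete, here is a minimal, self-contained Python sketch. It is not Iceberg's implementation (in Iceberg, Z-ordering is applied when rewriting data files, for example through the Spark `rewrite_data_files` procedure). It assumes two small non-negative integer columns, `x` and `y`, and hypothetical fixed-size "files" of four rows each. The helper names `interleave_bits`, `file_bounds`, and `files_scanned` are illustrative only. The sketch compares how many files a min/max-statistics filter must scan after a plain linear sort versus a Z-order sort.

```python
from itertools import product


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) value of (x, y) by interleaving their bits.

    Bit i of x goes to position 2*i, and bit i of y goes to position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def file_bounds(rows, rows_per_file):
    """Split sorted rows into fixed-size hypothetical 'files' and record
    each file's (min, max) range for column x and column y."""
    bounds = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        bounds.append(((min(xs), max(xs)), (min(ys), max(ys))))
    return bounds


def files_scanned(bounds, x_range, y_range):
    """Count files whose min/max ranges overlap the query ranges.

    Files that do not overlap can be skipped using their statistics alone.
    """
    def overlaps(lo_hi, query):
        return lo_hi[0] <= query[1] and query[0] <= lo_hi[1]

    return sum(
        1 for bx, by in bounds if overlaps(bx, x_range) and overlaps(by, y_range)
    )


rows = list(product(range(4), range(4)))  # 16 (x, y) rows on a 4x4 grid
linear = sorted(rows)                      # sort by x, then by y
zorder = sorted(rows, key=lambda r: interleave_bits(r[0], r[1], bits=2))

box_query = ((0, 1), (0, 1))  # x BETWEEN 0 AND 1 AND y BETWEEN 0 AND 1
y_query = ((0, 3), (2, 2))    # y = 2, any x

print("box query, linear:", files_scanned(file_bounds(linear, 4), *box_query))
print("box query, z-order:", files_scanned(file_bounds(zorder, 4), *box_query))
print("y = 2, linear:", files_scanned(file_bounds(linear, 4), *y_query))
print("y = 2, z-order:", files_scanned(file_bounds(zorder, 4), *y_query))
```

With a linear sort, each file holds a single `x` value but the full range of `y`, so the box query scans 2 files and the `y = 2` query scans all 4. With Z-order, each file covers a compact 2x2 block of both columns, so the same queries scan 1 and 2 files respectively. The trade-off is that a filter on `x` alone can touch more files than it would under a pure `x` sort.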

Apache Iceberg 101 – Your Guide to Learning Apache Iceberg Concepts and Practices

September 12, 2022

This article provides an introductory course on the concepts and practices of Apache Iceberg tables for running scalable data lakehouses.

Getting Started with Apache Iceberg in Databricks

September 9, 2022

Getting started with Apache Iceberg in Databricks is straightforward. This article walks through the setup and usage step by step.

Tracking & Triggering Pattern with Spark Stateful Streaming

March 30, 2022

Subsurface LIVE Winter 2022 sessions are now online!
