Introduction

Memory Arbiter is a new and important feature that has been released to Dremio Cloud and is coming to Dremio 25.0. Memory Arbiter enables executors to better utilize their direct memory by accurately tracking its usage across certain operators and, where possible, dynamically reducing the memory footprint of those operators so that others can consume it. Along with Spillable Hash Join, it is the first of several planned features that let Dremio react better to diminishing resources and keep running where it previously would have failed.

How Memory Arbiter changes execution

Memory Arbiter changes execution for a particular set of operators once a fragment of work has landed on an executor and processing is underway. The way queries are planned and initial memory is allocated remains the same and follows this flow (a sketch of the estimation step appears after the list):

  • A query is submitted, and a plan is generated
  • Based on the constituent operators of the plan and metadata regarding the source, a minimum viable estimate is generated
    • For operators that do not hold data or grow as data is processed, this is a small static number
    • For operators that do hold data or grow as data is processed, formulas leveraging the stats of the queried datasets generate this initial allocation
  • The initial memory allocation for a fragment of execution is made on the executors
  • Further memory allocations are made throughout the execution of the fragment as needed
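
To make the estimation step concrete, here is a minimal sketch of how a planner might derive the initial allocation. The OperatorStats interface, constants, and formula are hypothetical simplifications for illustration, not Dremio's actual planning code.

import java.util.List;

// Hypothetical planning-time estimator; names and formulas are illustrative.
final class InitialAllocationEstimator {

  interface OperatorStats {
    boolean holdsOrExpandsData(); // e.g. Hash Join, Hash Agg, Sort, TopN
    long estimatedRows();         // from dataset metadata
    int averageRowWidthBytes();   // from dataset metadata
  }

  // Small static reservation for pass-through operators (value is made up).
  private static final long STATIC_OPERATOR_BYTES = 1L << 20; // 1 MiB

  // Minimum viable estimate for one fragment: sum over its operators.
  static long estimateFragmentBytes(List<OperatorStats> operators) {
    long total = 0;
    for (OperatorStats op : operators) {
      total += op.holdsOrExpandsData()
          ? estimateFromStats(op)   // stats-driven formula
          : STATIC_OPERATOR_BYTES;  // small static number
    }
    return total;
  }

  // Assumed formula: rows held times average row width, padded for
  // hash-table/sort overhead. Real formulas are operator-specific.
  private static long estimateFromStats(OperatorStats op) {
    return (long) (op.estimatedRows() * op.averageRowWidthBytes() * 1.5);
  }
}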

The key differentiator is the final point. Previously, all operators would request as much memory as they needed until the query completed or memory was exhausted. Existing guard rails offered some fault tolerance but often intervened too late to prevent query failure. For example, spillable operators had their spill trigger point defined during planning; if one encountered an out-of-memory exception before reaching that point, it would start spilling instead of failing. This prevented the failure, but such an exception signals there is little headroom left to continue execution. With Memory Arbiter, executors can now react more quickly to changes in allocatable memory instead of relying on arbitrary spill points set in the plan or on an out-of-memory exception. The sketch below contrasts the two behaviors.
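
This is an illustrative simplification: the operator and arbiter interfaces here are hypothetical stand-ins, not Dremio's internals.

// Old behavior: spill when a threshold baked into the plan is crossed, or
// fall back to spilling on an out-of-memory exception.
final class PlannedSpillOperator {
  private final long plannedSpillLimitBytes; // fixed at planning time
  private long allocatedBytes;

  PlannedSpillOperator(long plannedSpillLimitBytes) {
    this.plannedSpillLimitBytes = plannedSpillLimitBytes;
  }

  void allocate(long bytes) {
    if (allocatedBytes + bytes > plannedSpillLimitBytes) {
      spill(); // arbitrary trigger point chosen before execution began
    }
    allocatedBytes += bytes; // may still hit memory exhaustion near the wall
  }

  void spill() { /* write in-memory state to disk */ }
}

// New behavior: every allocation passes through the executor's arbiter,
// which sees live memory pressure and can trigger shrinks proactively.
final class ArbitratedOperator {
  private final MemoryArbiter arbiter;

  ArbitratedOperator(MemoryArbiter arbiter) {
    this.arbiter = arbiter;
  }

  void allocate(long bytes) throws InterruptedException {
    arbiter.requestGrant(bytes); // may block and ask operators to shrink
    // ... perform the actual allocation once the grant is approved ...
  }

  interface MemoryArbiter {
    void requestGrant(long bytes) throws InterruptedException;
  }
}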

How Memory Arbiter Works

As mentioned above, Memory Arbiter only changes the execution of four operators (at the time of writing):

  • Hash Join
  • Hash Agg
  • External Sort
  • TopN

These operators are often the highest memory consumers and, importantly, also have a mechanism to shrink their memory usage: Hash Join, Hash Agg, and External Sort can spill mid-operation, and TopN can discard data structures that were used during processing and are no longer needed for the output. Each executor has its own Memory Arbiter that tracks each instance of these operators, the amount of memory it is using and, importantly, the amount of memory it is able to give up. If memory usage on an executor crosses a threshold, Memory Arbiter starts requesting that these operators shrink their usage, starting with the one with the most to give. Memory Arbiter also acts as a middleman for these operators' new memory allocation requests, so requests to shrink can be paired with temporarily blocking new allocations, reducing both existing allocations and the growth rate. A minimal sketch of this loop follows.
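
The sketch assumes a hypothetical Shrinkable interface and a single executor-level threshold; Dremio's actual arbiter is internal and certainly more nuanced.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative executor-local arbiter; names and policy are simplified.
final class ExecutorMemoryArbiter {

  interface Shrinkable {
    long shrinkableBytes();  // how much the operator could give up
    long shrink(long bytes); // spill or discard structures; returns bytes freed
  }

  private final List<Shrinkable> operators = new ArrayList<>();
  private final long thresholdBytes; // executor-level pressure threshold
  private long usedBytes;

  ExecutorMemoryArbiter(long thresholdBytes) {
    this.thresholdBytes = thresholdBytes;
  }

  synchronized void register(Shrinkable op) {
    operators.add(op);
  }

  // All new allocations by tracked operators flow through here, so a shrink
  // request can be paired with temporarily blocking new grants.
  synchronized void requestGrant(long bytes) throws InterruptedException {
    while (usedBytes + bytes > thresholdBytes) {
      // Start with the operator that has the most memory to give.
      Shrinkable victim = operators.stream()
          .max(Comparator.comparingLong(Shrinkable::shrinkableBytes))
          .orElse(null);
      if (victim == null || victim.shrinkableBytes() == 0) {
        wait(); // nothing to shrink right now; block until memory is released
        continue;
      }
      usedBytes -= victim.shrink(Math.min(victim.shrinkableBytes(), bytes));
    }
    usedBytes += bytes; // grant approved
  }

  synchronized void release(long bytes) {
    usedBytes -= bytes;
    notifyAll(); // wake blocked allocation requests
  }
}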

Memory Arbiter only tracks Direct memory, since that is where data processing happens. However, not all operations in Direct memory are tracked, so despite Memory Arbiter's best efforts, Direct memory exhaustion remains possible.
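
For readers less familiar with the distinction, Direct memory in Java lives outside the garbage-collected heap. This small standalone snippet only illustrates that difference; it is not Dremio code.

import java.nio.ByteBuffer;

public class DirectMemoryDemo {
  public static void main(String[] args) {
    // Heap buffer: lives on the JVM heap, managed by the garbage collector.
    ByteBuffer heap = ByteBuffer.allocate(64 * 1024 * 1024);

    // Direct buffer: lives outside the heap, capped by -XX:MaxDirectMemorySize.
    // Exceeding that cap raises OutOfMemoryError ("Direct buffer memory"),
    // the kind of exhaustion Memory Arbiter works to avoid.
    ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024 * 1024);

    System.out.println(heap.isDirect());   // false
    System.out.println(direct.isDirect()); // true
  }
}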

Testing Memory Arbiter and Spillable Hash Join

The addition of Memory Arbiter and Spillable Hash Join is a significant change that requires extensive testing. We used TPC-DS, our Regression suite, and a new suite based on customer queries. The customer queries selected were either very complex or prone to failures before Memory Arbiter. All of these runs were compared to runs of the same version of Dremio with the features disabled.

Test Suite        Data                                                     Run style    Number of executors
TPC-DS            1TB, partitioned and non-partitioned                     In-series    2, 4, 8
TPC-DS            1TB, partitioned and non-partitioned                     Concurrent   10, 20, 50
TPC-DS            10TB, partitioned and non-partitioned                    In-series    16
TPC-DS            100TB, partitioned                                       In-series    32
Regression        Synthetic, relatively small to allow for faster cycles   In-series    8
Customer Queries  Synthetic, same size as the original data set            In-series    8

We targeted zero failures and a performance delta of no more than 5% across all executor counts.

With the inclusion of Memory Arbiter and Spillable Hash Join, Dremio could complete a full 100TB TPC-DS run without failure, which was previously impossible.

Initial Customer Testing

Multiple EE Cloud customers took part in the private beta. Some had been experiencing memory-related issues, while others joined mainly to gauge any perceived performance impact. Comparing same-sized time windows before and after the features were enabled (close to two months on either side), we saw some very positive results for the customers facing memory-related issues.

The largest improvement was a 90% decrease in Direct memory allocation errors, and the smallest was a 56% decrease. This graph shows the customer that saw the 90% decrease:

MA = Memory Arbiter, SHJ = Spillable Hash Join. The two spikes were driven by an automated process that repeatedly submitted the same query even though it had not succeeded before and was destined to fail again.

The graph shows that direct memory errors occurred at a relatively constant background rate, punctuated by spikes. After enablement, these errors dropped immediately and stayed low.

The results for customers testing for performance degradation during day-to-day operation were also positive, with one asking if we’d forgotten to switch it on.

One note applies only to customers using self-managed Dremio once Memory Arbiter launches in version 25.0: if you have set queue memory limits, we recommend removing them. Queue and job memory limits help self-managed customers avoid the noisy-neighbor problem between workloads of varying priority, but they are relatively heavy-handed and may intervene even when Memory Arbiter has the situation under control, diminishing its effectiveness.

Closing

With the Cloud general availability release behind us, we've been very happy with the results and are looking forward to the imminent 25.0 software release. We are already planning further enhancements: additional shrinkable operators, a better shrink-selection algorithm, prioritized operators, integration with other monitoring systems, and more.

Ready to Get Started?

Bring your users closer to the data with organization-wide self-service analytics and lakehouse flexibility, scalability, and performance at a fraction of the cost. Run Dremio anywhere with self-managed software or Dremio Cloud.