Dremio Blog

16 minute read · March 12, 2026

“Random Engine” Design for Dremio Software

Michael Flower, System Engineer, Dremio

Key Takeaways

  • The current Dremio Software architecture has performance bottlenecks during peak workloads due to fixed engine pools.
  • The proposed 'Random Engine Routing' design introduces Time-Segmented Engine Pools to manage predictable peak workloads effectively.
  • This design features baseline, medium, and peak engines that activate automatically based on scheduled demand, improving resource utilization.
  • New routing rules and queues ensure optimal engine usage while minimizing cloud costs during idle times.
  • Considerations for setup include enabling auto-start/stop features and adjusting operational hours for engines to further enhance performance.


Current Architecture (Conceptual)

The current Dremio Software architecture often uses a fixed pool of executor engines. While this provides stability for baseline workloads, it struggles to handle predictable spikes in demand. The result is performance bottlenecks during peak periods and overallocation of engines during quieter periods, which increases cloud costs.

The examples in this document were tested on Dremio Software v26.0.6.

Proposed “Random Engine Routing” Design

To accommodate predictable peak workloads, the proposed design introduces the concept of Time-Segmented Engine Pools.

  • Instead of one fixed pool, Dremio will be configured with multiple, identically-configured, replica engine pools, active during specific time segments. 
  • A random mechanism will distribute query traffic across the pools. 
  • To reduce cloud consumption costs, engines will be deactivated at configurable times, and also after a defined period of idle time.

Core Components

  • Baseline-workload Engine (E-Base): The primary engine running 24/7, handling all off-peak and standard workloads.
  • Medium-workload Engine (E-Medium): An additional replica engine, configured identically to E-Base, but only scheduled to be active during known medium-active and peak-active hours (e.g. 08:00–12:59 and 14:00–17:59).
  • Peak-workload Engine (E-Peak): An additional replica engine, configured identically to E-Base, but only scheduled to be active during known peak-active hours (e.g. 14:00–17:59).
  • Random Load Balancer (Conceptual): A logical function within the Dremio Coordinator that randomly distributes incoming queries among all currently active engine pools.

Design Benefits

| Feature | Current Architecture | Proposed Random Engine Design |
|---|---|---|
| Scaling for peak/medium workloads | Manual, or relies on external scaling (if configured). Slow reaction. | Automatic activation of pre-configured replica engines during scheduled times. Fast reaction. |
| Engine utilisation | Potentially underutilised during off-peak. Overloaded during peak. | Consistent utilisation of E-Base, with supplemental capacity (E-Medium, E-Peak) only when scheduled. |
| Predictability | High variability in performance during peak. | Stable and predictable query performance during known peak windows. |

Designed Workload Capacity

The following table shows the available engine capacity over a 24-hour period under the proposed Random Engine design, using the scheduled Peak and Medium engines.

  • Assuming that each engine has 4 nodes:
| Time Slot (24h) | Proposed Engines | Proposed Capacity (Nodes) |
|---|---|---|
| 08:00 - 12:59 (Medium) | E-Base, E-Medium | 8 |
| 13:00 - 13:59 (Low) | E-Base | 4 |
| 14:00 - 17:59 (Peak) | E-Base, E-Medium, E-Peak | 12 |
| 18:00 - 07:59 (Low) | E-Base | 4 |
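
The capacity table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the `SCHEDULE` dictionary and the function names are illustrative and are not part of any Dremio configuration. It also ignores any extra time an engine stays up before its idle timeout stops it.

```python
# Illustrative only: the schedule below restates the table above in code.
NODES_PER_ENGINE = 4  # assumption stated in the text

# Each engine maps to the set of hours (0-23) in which it is expected to receive traffic.
SCHEDULE = {
    "E-Base": set(range(24)),
    "E-Medium": set(range(8, 13)) | set(range(14, 18)),
    "E-Peak": set(range(14, 18)),
}

def active_engines(hour):
    """Return the engines scheduled for the given hour of day (0-23)."""
    return [name for name, hours in SCHEDULE.items() if hour in hours]

def capacity(hour):
    """Return the number of executor nodes available in the given hour."""
    return NODES_PER_ENGINE * len(active_engines(hour))

def daily_node_hours():
    """Sum node capacity over all 24 hours of the day."""
    return sum(capacity(h) for h in range(24))

for hour in (3, 9, 13, 15):
    print(hour, active_engines(hour), capacity(hour))
print("node-hours per day:", daily_node_hours())
```

Under these assumptions the schedule totals 148 node-hours per day, compared with 288 node-hours if all three 4-node engines ran around the clock.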

Setup

This section will describe how to configure the additional engines and rules, to satisfy the use case described earlier. 

Assumptions

  • An Engine “E-Base” has previously been configured.
  • A Queue “Base” already exists, which routes queries to the E-Base engine.
  • A Rule already exists to capture the conditions in order to route to the “Base” queue.

Engine Configuration

Two new engines will be added: E-Medium and E-Peak. They will be sized identically to the existing engine E-Base.

We need to ensure that our new engines are enabled to “Automatically start/stop” (see Engine Settings documentation):

  • When no traffic is routed to the engine, then the engine will automatically stop.
  • When any traffic is routed to the engine and it is not running, then the engine will automatically start.

NB. If the “E-Base” engine is required to be always on (24/7), then “Automatically start/stop” should be disabled for this engine. Alternatively, if the E-Base engine should be shut down after a period of inactivity, then “Automatically start/stop” should be enabled, with an appropriately-defined Idle Time.

Workload Management Queues and Rules

We need to add 2 extra queues and rules to handle Medium and Peak workloads. 

New Queues 

(see also  Dremio Queues documentation)

The new queues will be based on the existing “Base” queue, with the difference that they will route to the new engines.

| Queue Name | Engine Name |
|---|---|
| Medium | E-Medium |
| Peak | E-Peak |

New Rules

(see also Dremio Rules documentation)

We need to add our new rules so that they are only activated during required time periods. We will also need to apportion the workload randomly during those allotted time periods. 

The new rules:

  • must include the existing rule conditions of our E-Base engine (see below).
  • must have a higher precedence than the existing rule of our E-Base engine.

Any conditions in the existing “E-Base” routing rule must also be included in the new rules.

  • For example, assuming the existing routing condition for E-Base: query_cost() >= 300000
  • This same condition will have to be applied to our new rules.

In these examples: 

  • We will assume that query_cost() >= 300000 is an existing rule condition.
  • We will use the EXTRACT (HOUR FROM CURRENT_TIME) feature to determine the allotted time periods for medium / peak workloads
    • Note that the CURRENT_TIME function returns a result in GMT / UTC.
| Priority | Rule Name | Rule Conditions | Queue | Comment |
|---|---|---|---|---|
| 1 (new rule) | Peak | query_cost() >= 300000 AND (EXTRACT(HOUR FROM CURRENT_TIME) BETWEEN 14 AND 17) AND RANDOM() < 0.33 | Peak | Peak Activity engine will only be active between 14:00 - 17:59. |
| 2 (new rule) | Medium | query_cost() >= 300000 AND (EXTRACT(HOUR FROM CURRENT_TIME) BETWEEN 14 AND 17 OR EXTRACT(HOUR FROM CURRENT_TIME) BETWEEN 08 AND 12) AND RANDOM() < 0.5 | Medium | Medium Activity engine will be active between both 14:00 - 17:59 and 08:00 - 12:59. |
| 3 (existing rule) | Base | query_cost() >= 300000 | Base | Base Activity engine will be active all day. |
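
Because CURRENT_TIME is evaluated in UTC, business hours expressed in local time need to be shifted before they are written into a rule. The helper below is a minimal sketch of that shift. The function name is made up for illustration, and it assumes a fixed whole-hour UTC offset, so it does not handle daylight saving changes.

```python
def local_hours_to_utc(start_hour, end_hour, utc_offset_hours):
    """Convert an inclusive local hour window to the list of UTC hours it covers.

    utc_offset_hours is local time minus UTC (e.g. -5 for UTC-05:00).
    Whole-hour offsets only; daylight saving changes are not handled.
    """
    n_hours = (end_hour - start_hour) % 24 + 1
    return [(start_hour - utc_offset_hours + i) % 24 for i in range(n_hours)]

print(local_hours_to_utc(14, 17, 0))    # [14, 15, 16, 17]
print(local_hours_to_utc(14, 17, -5))   # [19, 20, 21, 22]
print(local_hours_to_utc(8, 12, 10))    # [22, 23, 0, 1, 2]
```

The last case wraps past midnight UTC, so a single BETWEEN range cannot express it. The rule would need two ranges combined with OR.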

The result of this configuration will be:

| Time of Day | Engine Name | Proportion of Traffic |
|---|---|---|
| 08:00 - 12:59 (Medium) | E-Base | ~50% |
| | E-Medium | ~50% |
| 13:00 - 13:59 (Base) | E-Base | 100% |
| 14:00 - 17:59 (Peak) | E-Base | ~33% |
| | E-Medium | ~33% |
| | E-Peak | ~33% |
| 18:00 - 07:59 (Base) | E-Base | 100% |
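
The proportions follow from the rules being evaluated in priority order: each rule only sees the queries that the rules above it did not claim. The sketch below works through that arithmetic with exact fractions. It assumes each rule's RANDOM() call is an independent uniform draw, and the function and variable names are illustrative only.

```python
from fractions import Fraction

def traffic_shares(rules):
    """Compute each queue's share of traffic for rules evaluated in priority order.

    rules is a list of (queue_name, threshold) pairs; a query matches a rule
    with probability `threshold`, otherwise it falls through to the next rule.
    """
    shares = {}
    remaining = Fraction(1)
    for queue, threshold in rules:
        shares[queue] = remaining * threshold
        remaining -= shares[queue]
    return shares

# Thresholds taken from the rule table; the Base rule has no RANDOM() condition,
# so it takes everything that reaches it (threshold 1).
peak_window = [("Peak", Fraction(33, 100)), ("Medium", Fraction(1, 2)), ("Base", Fraction(1))]
medium_window = [("Medium", Fraction(1, 2)), ("Base", Fraction(1))]

for name, rules in (("14:00-17:59", peak_window), ("08:00-12:59", medium_window)):
    print(name, {q: float(s) for q, s in traffic_shares(rules).items()})
```

In the peak window the Peak rule takes 33% of queries, and the Medium rule then takes half of the remaining 67%, leaving 33.5% each for E-Medium and E-Base.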

Other Considerations

Using replicas only on weekdays

If you would like the replica engines to be active only on weekdays, add the following condition to the replica engines’ routing rules:

AND extract(DOW from CURRENT_DATE) between 2 and 6

NB. Extracting DOW returns the day of the week as follows:

  • 2 == Monday
  • 6 == Friday
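
To sanity-check which calendar dates the weekday condition would match, the numbering can be mirrored in Python. This sketch assumes the convention implied by the values above (Sunday = 1 through Saturday = 7). The function names are illustrative.

```python
from datetime import date

def dow_sunday_first(d):
    """Map a date to a day-of-week number where Sunday = 1 and Saturday = 7."""
    # isoweekday() returns Monday = 1 ... Sunday = 7, so shift by one and wrap.
    return d.isoweekday() % 7 + 1

def is_weekday(d):
    """Mirror the rule fragment: DOW between 2 and 6 (Monday to Friday)."""
    return 2 <= dow_sunday_first(d) <= 6

for d in (date(2026, 3, 12), date(2026, 3, 14), date(2026, 3, 16)):
    print(d, dow_sunday_first(d), is_weekday(d))
```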

Adding an extra time band

For example, suppose we add an extra band, “SuperPeak”, to cater for higher concurrency, active only between 17:00 and 17:59:

  • Step 1: Add a new engine, Name: E-SuperPeak
  • Step 2: Add a new Queue
| Queue Name | Engine Name |
|---|---|
| SuperPeak | E-SuperPeak |
  • Step 3: Add a new Rule above the existing rules. The existing rules keep their conditions and each moves down one position in the order.

Note that the new rule adds a 4th time band, so it allocates ¼ of the queries to the new engine during the hour it is required (17:00 - 17:59). The remaining 75% fall through to the existing rules, which split them roughly evenly, so each of the four engines receives about 25% of the queries in that hour.

| Order | Name | Rule | Queue |
|---|---|---|---|
| 1 (new rule) | SupPeak | query_cost() >= 300000 AND (EXTRACT(HOUR FROM CURRENT_TIME) = 17) AND RANDOM() < 0.25 | SuperPeak |
| 2 (existing rule 1) | Peak | query_cost() >= 300000 AND (EXTRACT(HOUR FROM CURRENT_TIME) BETWEEN 14 AND 17) AND RANDOM() < 0.33 | Peak |
| 3 (existing rule 2) | Medium | query_cost() >= 300000 AND (EXTRACT(HOUR FROM CURRENT_TIME) BETWEEN 14 AND 17 OR EXTRACT(HOUR FROM CURRENT_TIME) BETWEEN 08 AND 12) AND RANDOM() < 0.5 | Medium |
| 4 (existing rule 3) | Base | query_cost() >= 300000 | Base |
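
The thresholds 0.25, 0.33 and 0.5 follow a simple pattern that extends to any number of engines. The sketch below derives them, assuming the rules are evaluated in priority order and the goal is an even split. The function name is illustrative.

```python
from fractions import Fraction

def even_split_thresholds(n_engines):
    """Return RANDOM() thresholds, in priority order, that split traffic evenly.

    The rule at position i (0-based) only sees queries that the earlier rules
    passed over, so it must claim 1 / (n_engines - i) of what reaches it.
    The last entry is 1: the catch-all rule takes everything left.
    """
    return [Fraction(1, n_engines - i) for i in range(n_engines)]

for n in (2, 3, 4):
    print(n, [str(t) for t in even_split_thresholds(n)])
```

For four engines this gives 1/4, 1/3, 1/2 and 1, which matches the thresholds in the table above (with 1/3 rounded to 0.33 and the Base rule carrying no RANDOM() condition).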

Engine Auto Start and Auto Stop

Dremio engines can be enabled to automatically start and stop - see Adding an Engine.

Auto Start Considerations

Description of autostart feature: When a query is routed to the engine, then the engine will start (if it is not already running).

Impact: If an engine is not already running, then there will be a delay before the query is executed. End users will experience slower query times while the engine starts.

Workaround: Start the engine before it is scheduled to run. This can be achieved using the Dremio REST API - see starting engines - REST API documentation.
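
As a rough illustration of the workaround, an external scheduler could compute when to trigger the start call so that the engine is warm before its window opens. The sketch below covers only the timing. The function name and the lead time are hypothetical, and the REST call itself is left out because its details should be taken from the REST API documentation.

```python
from datetime import datetime, timedelta, timezone

def next_prewarm_time(now, window_start_hour, lead_minutes):
    """Return the next UTC datetime at which to start an engine ahead of its window.

    now must be timezone-aware in UTC; window_start_hour is the UTC hour (0-23)
    at which the routing rules begin sending traffic to the engine.
    """
    window_start = now.replace(hour=window_start_hour, minute=0, second=0, microsecond=0)
    prewarm = window_start - timedelta(minutes=lead_minutes)
    if prewarm <= now:
        prewarm += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return prewarm

now = datetime(2026, 3, 12, 9, 30, tzinfo=timezone.utc)
print(next_prewarm_time(now, 14, 10))  # 2026-03-12 13:50:00+00:00
print(next_prewarm_time(now, 8, 10))   # 2026-03-13 07:50:00+00:00
```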

Auto Stop Considerations

Description of autostop feature: After a period of idle time (default 1 hour), the engine will automatically stop running. 

Impact of low idle time: If the engine stops too frequently, then subsequent queries will be delayed while the engine restarts.

Workarounds: 

  1. Configure a higher idle time (this will increase your running costs).
  2. Reconsider your time boundaries. Perhaps this engine is not required to be running for all of the configured period.

Impact of high idle time: The engine will remain running for a longer period. This will increase the cost of running your instance.
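
The trade-off between low and high idle times can be explored with a toy simulation. This is a simplified model, not a description of how Dremio measures idle time: queries are treated as instantaneous, start-up time is ignored, and the arrival times are made up for illustration.

```python
def simulate_auto_stop(query_minutes, idle_timeout):
    """Estimate running minutes and cold starts for an auto start/stop engine.

    query_minutes: sorted arrival times in minutes; queries are treated as instantaneous.
    idle_timeout: minutes of inactivity after which the engine stops.
    """
    running = 0
    cold_starts = 0
    stop_at = None  # time at which the engine would stop if no further query arrives
    for t in query_minutes:
        if stop_at is None or t >= stop_at:
            cold_starts += 1          # engine was stopped: this query waits for start-up
            running += idle_timeout   # engine runs at least one idle period after this query
        else:
            running += t + idle_timeout - stop_at  # extend the current run
        stop_at = t + idle_timeout
    return running, cold_starts

arrivals = [0, 20, 50, 200, 230]  # hypothetical query arrival times, in minutes
for timeout in (15, 60):
    print(timeout, simulate_auto_stop(arrivals, timeout))
```

With these made-up arrivals, a 15-minute idle time gives 75 running minutes but five cold starts, while a 60-minute idle time gives 200 running minutes and only two cold starts.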
