Dremio Jekyll

Gandiva Initiative Update: Improving SQL Projection Performance by 70x

Last month we announced The Gandiva Initiative for Apache Arrow, a new vectorized execution kernel for Apache Arrow based on LLVM. In that post we provided a few examples of how this work is improving the SQL processing performance in Dremio. In this post we’ll look at another example query in more detail and show how Gandiva is improving the performance of a production query by over 70x.

Gandiva was designed to be used in many contexts. In Dremio, we are building Gandiva into an upcoming release, where it will power our SQL execution engine. The current version of Dremio uses a Java compiler to compile queries at runtime, and is thus limited in its ability to use vectorization (SIMD instructions). Gandiva’s LLVM-based compiler, combined with Arrow’s columnar memory representation, enable Dremio to take full advantage of vectorization in the CPU. We have a number of tests we use to evaluate Dremio’s performance, including thousands of interesting queries we have encountered working with customers. We think it is important to continuously evaluate performance using a mix of different query conventions, including industry benchmarks such as TPC, query frameworks from a number of BI vendors, as well as real-world production queries from our customers.

A Problematic Query

The following query was provided by a customer who was generally happy with the performance of their deployment (muti-PB cluster, queries regularly scan 10s and 100s of TB) but had found a particular query pattern was causing issues for their users due to high latency. The query appears below (slightly modified to mask any sensitive data):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
SELECT SUM(stat_count) AS sum_stat
FROM   json."data.json"
WHERE  ( NOT ( ( CASE
                   WHEN ( ( alpha_key_code = 1101 )
                           OR ( alpha_key_code = 1102 ) ) THEN 'CODE100'
                   WHEN ( ( ( alpha_key_code = 1201 )
                             OR ( alpha_key_code = 1207 ) )
                          AND ( beta_key_code <> 24 )
                          AND ( beta_key_code <> 26 ) ) THEN 'CODE101'
                   WHEN ( ( alpha_key_code = 1235 )
                          AND ( beta_key_code <> 24 )
                          AND ( beta_key_code <> 26 ) ) THEN 'CODE102'
                   WHEN ( ( alpha_key_code = 1236 )
                          AND ( beta_key_code <> 24 )
                          AND ( beta_key_code <> 26 )
                          AND ( zeta_key_code = '9' ) ) THEN 'CODE103'
                   WHEN ( ( alpha_key_code = 2101 )
                           OR ( alpha_key_code = 2102 )
                           OR ( alpha_key_code = 2103 )
                           OR ( alpha_key_code = 2104 )
                           OR ( alpha_key_code = 2105 )
                           OR ( alpha_key_code = 2106 ) ) THEN ( CASE
                                                                  WHEN ( ( ( alpha_key_code = 2103 )
                                                                            OR ( alpha_key_code = 2104 )
                                                                            OR ( alpha_key_code = 2105 ) )
                                                                          AND ( theta_key_code <> '' )
                                                                          AND ( gamma_key_code = 3 ) ) THEN 'CODE104'
                                                                  ELSE 'CODE105'
                                                                END )
                   WHEN ( ( ( alpha_key_code = 1501 )
                            AND ( ( delta_key_code = 4 )
                                   OR ( delta_key_code = 5 )
                                   OR ( delta_key_code = 6 ) )
                            AND ( ( epsilon_key_code = 101 )
                                   OR ( epsilon_key_code = 102 )
                                   OR ( epsilon_key_code = 103 )
                                   OR ( epsilon_key_code = 422 ) ) )
                           OR ( ( alpha_key_code = 2204 )
                                AND ( zeta_key_code = 'CODE25' )
                                AND ( ( eta_key_code = 'CODE105' )
                                       OR ( eta_key_code = 'CODE1' )
                                       OR ( eta_key_code = 'CODE2' )
                                       OR ( eta_key_code = 'CODE3' )
                                       OR ( eta_key_code = 'CODE4' )
                                       OR ( eta_key_code = 'CODE5' )
                                       OR ( eta_key_code = 'CODE6' )
                                       OR ( eta_key_code = 'CODE7' )
                                       OR ( eta_key_code = 'CODE8' )
                                       OR ( eta_key_code = 'CODE9' )
                                       OR ( eta_key_code = 'CODE10' )
                                       OR ( eta_key_code = 'CODE11' ) ) ) ) THEN 'CODE106'
                   WHEN ( ( alpha_key_code = 2001 )
                           OR ( alpha_key_code = 2002 )
                           OR ( alpha_key_code = 2003 )
                           OR ( alpha_key_code = 2004 )
                           OR ( alpha_key_code = 2005 )
                           OR ( alpha_key_code = 2006 )
                           OR ( alpha_key_code = 2007 )
                           OR ( alpha_key_code = 2008 ) ) THEN 'Freezes'
                   WHEN ( alpha_key_code = 3002 ) THEN ( CASE
                                                          WHEN ( beta_key_code = 26 ) THEN 'CODE107'
                                                          WHEN ( beta_key_code = 24 ) THEN 'CODE108'
                                                          WHEN ( gamma_key_code = 11 ) THEN 'CODE109'
                                                          ELSE NULL
                                                        END )
                   WHEN ( ( ( alpha_key_code = 1216 )
                             OR ( alpha_key_code = 1218 ) )
                          AND ( proc_inp_typ_cde = 17 ) ) THEN 'CODE110'
                   WHEN ( ( ( ( alpha_key_code = 1103 )
                               OR ( alpha_key_code = 1104 )
                               OR ( alpha_key_code = 1105 )
                               OR ( alpha_key_code = 1106 ) )
                            AND ( ( gamma_key_code = 2 )
                                   OR ( gamma_key_code = 4 )
                                   OR ( gamma_key_code = 5 ) )
                            AND ( theta_key_code <> '' ) )
                           OR ( ( alpha_key_code = 1501 )
                                AND ( ( delta_key_code = 1 )
                                       OR ( delta_key_code = 2 )
                                       OR ( delta_key_code = 3 ) )
                                AND ( ( epsilon_key_code = 101 )
                                       OR ( epsilon_key_code = 102 )
                                       OR ( epsilon_key_code = 103 )
                                       OR ( epsilon_key_code = 104 )
                                       OR ( epsilon_key_code = 116 )
                                       OR ( epsilon_key_code = 201 )
                                       OR ( epsilon_key_code = 422 ) ) )
                           OR ( ( alpha_key_code = 2203 )
                                AND ( ( delta_key_code = 1 )
                                       OR ( delta_key_code = 2 )
                                       OR ( delta_key_code = 3 ) ) )
                           OR ( ( alpha_key_code = 2204 )
                                AND ( zeta_key_code = 'CODE111' )
                                AND ( ( eta_key_code = 'CODE112' )
                                       OR ( eta_key_code = 'CODE113' )
                                       OR ( eta_key_code = 'CODE114' )
                                       OR ( eta_key_code = 'CODE115' )
                                       OR ( eta_key_code = 'CODE116' )
                                       OR ( eta_key_code = 'CODE117' )
                                       OR ( eta_key_code = 'CODE118' )
                                       OR ( eta_key_code = 'CODE119' )
                                       OR ( eta_key_code = 'CODE120' )
                                       OR ( eta_key_code = 'CODE121' )
                                       OR ( eta_key_code = 'CODE122' )
                                       OR ( eta_key_code = 'CODE123' )
                                       OR ( eta_key_code = 'CODE124' )
                                       OR ( eta_key_code = 'CODE125' ) ) ) ) THEN 'CODE126'
                   WHEN ( ( alpha_key_code = 3003 )
                          AND ( beta_key_code <> 24 )
                          AND ( beta_key_code <> 26 ) ) THEN 'CODE127'
                   WHEN ( ( alpha_key_code = 2301 )
                           OR ( alpha_key_code = 2302 )
                           OR ( alpha_key_code = 2303 ) ) THEN 'CODE128'
                   WHEN ( ( alpha_key_code = 2304 )
                           OR ( alpha_key_code = 2305 )
                           OR ( alpha_key_code = 2306 )
                           OR ( alpha_key_code = 3001 ) ) THEN 'CODE129'
                   WHEN ( ( alpha_key_code = 2501 )
                           OR ( alpha_key_code = 2502 )
                           OR ( alpha_key_code = 2503 )
                           OR ( alpha_key_code = 2504 ) ) THEN 'CODE130'
                   WHEN ( ( alpha_key_code = 2601 )
                           OR ( alpha_key_code = 2602 )
                           OR ( alpha_key_code = 2603 )
                           OR ( alpha_key_code = 2604 )
                           OR ( alpha_key_code = 2605 )
                           OR ( alpha_key_code = 2606 )
                           OR ( alpha_key_code = 2801 ) ) THEN 'CODE131'
                   WHEN ( alpha_key_code = 3009 ) THEN 'CODE132'
                   WHEN ( alpha_key_code = 3010 ) THEN 'CODE133'
                   WHEN ( ( alpha_key_code = 1901 )
                           OR ( alpha_key_code = 1902 )
                           OR ( alpha_key_code = 1903 ) ) THEN 'CODE134'
                   ELSE NULL
                 END ) IS NULL ) )
HAVING ( Count(1) > 0 )

In order to debug the query we generated a synthetic dataset following the same schema as the JSON used in their dataset.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
#include <iostream>
#include <string>
#include <vector>

using namespace std;

class DataGenerator {
  public:
  virtual std::string getNext() = 0;
};

class IntDataGenerator : public DataGenerator {
  public:
  IntDataGenerator(std::string name, std::vector<int> list) {
    name_ = name;
    list_ = list;
  }

  std::string getNext() override {
    int i = rand() % list_.size();
    return name_ + ":" + std::to_string(list_.at(i));
  }

  private:
  std::string name_;
  std::vector<int> list_;
};

class IntRangeDataGenerator : public DataGenerator {
  public:
  IntRangeDataGenerator(std::string name, int start, int end) {
    name_ = name;
    start_ = start;
    end_ = end;
  }

  std::string getNext() override {
    int i = rand() % (end_ - start_);
    return name_ + ":" + std::to_string(start_ + i);
  }

  private:
  std::string name_;
  int start_;
  int end_;
};

class StringDataGenerator : public DataGenerator {
  public:
  StringDataGenerator(std::string name, std::vector<std::string> list) {
    name_ = name;
    list_ = list;
  }

  std::string getNext() override {
    int i = rand() % list_.size();
    return name_ + ":\"" + list_.at(i) + "\"";
  }

  private:
  std::string name_;
  std::vector<std::string> list_;
};

int main()
{
  std::vector<DataGenerator *> generators;
  int num_rows = 10000000;

  generators.push_back(new IntDataGenerator(
                         "alpha_key_code",
                         {1101,1102,1201,1207,1235,1236,2101,2102,2103,2104,2105,2106,
                          2103,2104,2105,1501,2204,2001,2002,2003,2004,2005,2006,2007,
                          2008,3002,1216,1218,1103,1104,1105,1106,1501,2203,2204,3003,
                          2301,2302,2303,2304,2305,2306,3001,2501,2502,2503,2504,2601,
                          2602,2603,2604,2605,2606,2801,3009,3010,1901,1902,1903}));
  generators.push_back(new IntDataGenerator(
                         "beta_key_code",
                         {11, 24, 36}));
  generators.push_back(new IntDataGenerator(
                         "iota_key_code",
                         {2, 3, 4, 5, 11}));
  generators.push_back(new IntDataGenerator(
                         "kappa_key_code",
                         {1, 2, 3, 4, 5, 6}));
  generators.push_back(new StringDataGenerator(
                         "eta_key_code",
                         {"CODE1","CODE2","CODE3","CODE4","CODE5","CODE6",
                          "CODE7","CODE8","CODE9","CODE10","CODE11","CODE12",
                          "CODE13","CODE14","CODE15","CODE16","CODE17","CODE18","CODE19",
                          "CODE20","CODE21","CODE22","CODE23","CODE24"}));

  generators.push_back(new StringDataGenerator(
                         "zeta_key_code",
                         {"9", "CODE25"}));
  generators.push_back(new IntDataGenerator(
                         "epsilon_key_code",
                         {101, 102, 103, 104, 116, 201, 422}));
  generators.push_back(new StringDataGenerator(
                         "theta_key_code",
                         {"", "a"}));
  generators.push_back(new IntRangeDataGenerator(
                         "lambda_key_code",
                         15, 17));
  generators.push_back(new IntRangeDataGenerator(
                         "mu_key_code",
                         100, 200));

  for (int i = 0; i < num_rows; i++) {
    bool is_first = true;

    cout << "{";
    for (auto &gen : generators) {
      if (!is_first) {
        cout << ", ";
      }
      is_first = false;

      cout << gen->getNext();
    }
    cout << "}\n";
  }
  return 0;
}

We were able to keep the size small and still observe the issue, so the tests you see below are for 10M records running on a MacBook Pro.

A query plan in Dremio can vary in complexity and number of stages. The system collects a significant amount of diagnostic information for each query. This data can help understand how time is allocated across multiple threads in each of the stages of the query plan. We started by running the customer query on a current release of Dremio to help identify if one or more specific stages of the query were the main contributors to latency.

You can find this diagnostic information in the job history as the query profile. Instructions for how to find a job are provided in this tutorial, How to Share a Query Profile, which is what our support engineers will ask you to do if you’re ever working with us to debug one of your queries.

In the image below you can see some of the summary details about this query plan.

query profile for original query

If we focus on Average Processing Time column in the middle of the table, the last two stages stand out as the main contributors of the query latency: PROJECT and JSON_SUB_SCAN. Let’s take a closer look at each of these stages.

PROJECT Stage

In a SQL query the project stage determines which columns are returned. This particular query is over 130 lines, with an outer case statement having 15 cases. Of these, two have nested case expressions. And, all of them have long Boolean expressions (AND/OR), the biggest of which has 30 AND/OR clauses. Queries can have many PROJECT stages; this particular stage is where about half of the total query processing time is spent, and the other projection stages have trivial overhead.

When we look at the same stage processed in Gandiva, things look a lot better:

query profile with Gandiva kernel processing via LLVM

Here the PROJECT time has dropped from almost 29 seconds to 351 milliseconds, which is 81.8x faster. We tested similar queries across multiple runs and observed an average of 73x lower latency with Gandiva. In the customer query over 99% of the time was spent in a PROJECT stage, so we expect the improvements provided by Gandiva will improve the total query time by over 70-80x.

JSON_SUB_SCAN Stage

The last stage, JSON_SUB_SCAN, is the time required to scan the raw JSON data from the file system. In this test all data was stored in a single large JSON file. This stage of the query was taking over 20 seconds, partially because scanning JSON is inherently inefficient, and because a single thread was used for a single file.

Scanning JSON is an area of performance that can be greatly improved by the use of Data Reflections, which optimizes the physical data for more efficient scanning through the use of Parquet, sorting, filtering, and partitioning the data ahead of time to accelerate various query patterns. You can learn more in our docs or in this tutorial on Getting Started With Data Reflections.

We created a raw Data Reflection on the test data to assess how this might further improve the total latency of the query. You can see the time for JSON_SUB_SCAN has been replaced by PARQUET_ROW_SCAN and the time has dropped from over 20 seconds to 1.5 seconds:

query profile with Data Reflection

Conclusion

These results are preliminary. We believe there are still many opportunities for Gandiva to improve the performance of query evaluation in Dremio. For example, in these tests the setup time for Java-based query compilation is significantly lower once the query is cached (0.420s vs 0.016s), whereas with Gandiva we do not yet cache query plans. We will soon add to Gandiva filter expressions and fast code paths for batches which have no nulls. In future blog posts we will keep you updated on our progress.