Running SQL-Based Workloads in the Cloud Using Apache Arrow

Transcript

Jacques Nadeau:

All right, let's get started. Hi there everybody, I'm Jacques. I am going to be talking about running SQL-based workloads faster using Apache Arrow. So, quickly about myself, I am co-founder and CTO of Dremio. Dremio is a company that's been around for about four years. We're downstairs if you want to know more about Dremio, and I'll talk about some of the things we use inside of Dremio with Arrow in this talk. I'm also the PMC chair of the Apache Arrow project. Apache Arrow started about three years ago when Wes McKinney and I got together and realized that data scientists and database people needed to get closer together and work well together. And so that's what a lot of this is built on top of.

So, let's start out. So, what I want to talk about here is, if you think about performance, performance by itself is actually meaningless in the cloud. Like, on prem, when you have fixed resources, it's all about how much power, how much speed, can I get out of this set of resources. In the cloud, I can always add more resources and so, really performance is about cost. And how can I balance my performance and my cost, most effectively?

And so, in many cases, I can scale, but I'd rather not just spend millions of dollars on my infrastructure. And so, rather than talking about "look how fast things are," which is what this talk could be, I'm going to talk about a specific set of things that we do to make things faster and why they work, and give you an overview of the technical side of those things.

But really, different kinds of use cases and different kinds of challenges have different sets of technical solutions. And so, for a particular use case, some of these solutions may not be appropriate and others may be appropriate. For some use cases, all of them may be appropriate. And so it's really a set of tools that you can be thinking about, in terms of how you're solving your problems and which tools you can bring to bear for this particular set of challenges.

I would say that the last thing is that if you think about the way of approaching things, one of the things that I think is really important is that you want to think about things as being loosely coupled. So, if you think about the old school way of doing things: build a data warehouse, load everything into the data warehouse and do everything inside of the data warehouse, there were some nice things about that. But, in reality, we have a much more complicated data ecosystem today, and so being able to loosely couple those things and solve things whether or not you've loaded the data or whether you've interacted with the data before, all of those things become very important.

So, let's talk briefly, just one slide each, on Dremio and Arrow. So, Arrow, as I mentioned before, the project's been around about three years. It's seeing huge adoption, really beyond, I think, everybody's expectations, including my own, in terms of how well it's gone. The goal with Arrow was really to try to provide a common way to represent data and a way to process data very efficiently. And it's been incorporated into a bunch of different projects, some of which you probably know: Python and Spark are two big consumers of Arrow. NVIDIA, with their RAPIDS GPU initiative, also uses Arrow. Dremio, we use Arrow extensively. And really, what Arrow's about is allowing you to have this loosely coupled set of technologies that work together and achieve faster analytical performance.

The second thing is Dremio. I'm going to be talking about that and talking about Arrow in the context of Dremio and how to make things go faster. And so Dremio, quickly, is a way to interact with data, find it, access it, share it, curate it, collaborate with others on it. It's built for the cloud; more than half of all Dremio usage is in the cloud, whether that's Azure, AWS or Google. It is, at its core, a processing engine for Arrow data. And so inside of Dremio everything that we do is against the Arrow representation of data. And really, it's trying to provide SQL for any source. So no matter whether you're interacting with a relational database or a non-relational database, a NoSQL system, a data lake, or cloud storage, all of those things should have a common interface that allows people to think about how they're interacting with the data rather than the mechanics behind how that data is being stored.

There's an open source and an enterprise edition, so you can go download the open source community edition and play with it all you want. Everything that I'm demoing here is something that you can try out in the community edition. And one of the key ways we approached this world a little differently is by trying to integrate multiple technologies together. There are lots of solutions out there that are trying to solve what I would describe as one niche or another. I want to just curate data and make those data sets available for analysis. I want to figure out what all my data sets are and catalog those and share those with my colleagues. I want to accelerate the performance of access to data. I want to federate multiple sources together.

Those are all really useful things to be able to achieve, but if you can combine them together we think that you have a much better experience in terms of how you're trying to achieve sort of your end goals, which is to get business value out of your data.

So, these are the four things that I'm going to talk about. So, I'm going to go through each of these and sort of talk about how they work and how they can impact your workload. So, the first one is Gandiva. So Gandiva is a processing initiative that was built in Dremio and then we contributed it to the Arrow project so that anyone can use it if they're working with Arrow data. Then, secondly, I'll talk about column-aware predictive coalescing IO and how that can substantially impact your performance working in the cloud. Then I'll talk about what we call our columnar cloud cache or C3. Which is about how you can cache portions of data to make things go faster. And then I'll also talk about Arrow Flight and how that can accelerate the access of data from whatever application it is that is consuming the data at the end of the day.

So, let's start with Gandiva. So, I didn't go into a lot of detail about Arrow, I think that I've done enough talks with Arrow that hopefully most of you have some sense of what Arrow is, but let me talk about Gandiva. So Arrow is, at its core, a representation of data, and it's a language-agnostic, in-memory representation. The goal behind Arrow was to basically have a better way of moving data between systems. And the way we figured out how to solve the transport problem was to solve it for processing. To say, "Hey, if two systems have a common way of representing data for processing purposes, then it can be very efficient to move data between those systems." And so Arrow's designed, first and foremost, for processing even though it allows communication between different systems.

And, Gandiva delivers on the promise of that by basically saying, "Hey, if we have a canonical representation of data, then we can do really efficient things in terms of how we process that data." I don't have to work through any different APIs to interact with that data. So if you think about old ways of interacting with data, it was very much API based. So you would call an interface where you'd say, "Hey, give me the next record. Give me the next cell in this record."

And that worked well because you could build an application and the internals didn't matter to you, but the problem there is that if you're doing huge amounts of processing, then all of those interactions, all of those method calls and invocations, can be a substantial performance penalty. So, if you can have a representation of data that is very well known and interact directly with that memory representation, then you can gain a lot of things. And so that's really what Gandiva's about. High speed compute, and we leverage it inside of Dremio. So if you download Dremio today, we're using Gandiva to process and that's one of the key reasons that we process as fast as we do.

It is built primarily for CPUs today, but the vision is actually to also extend it to GPUs in the future. So at its core, Gandiva is built on top of LLVM. And so it will take an expression tree and compile that down. So the expression tree is generated in whatever application you have, you compile that down with the Gandiva compiler into an actual piece of execution code. And then the execution code does something very simple. The execution code is simply responsible for taking in batches of Arrow data and then outputting batches of Arrow data. And so the complexity is all in how we compile this information, but how it actually plugs into the application is quite simple in that you just say, "Hey, I got a stream of Arrow data and I can apply this operation on top of it and now I've got a new stream of Arrow data."
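
To make that concrete, here's a minimal sketch of that flow using the Gandiva bindings that ship with pyarrow (the `pyarrow.gandiva` module is only present in builds compiled with Gandiva enabled, and the column names and sample values below are made up for illustration): build an expression tree, compile it once into a projector, then feed it batches of Arrow data.

```python
import pyarrow as pa
import pyarrow.gandiva as gandiva

# Two int32 columns and a small batch of Arrow data (sample values).
field_a, field_b = pa.field("a", pa.int32()), pa.field("b", pa.int32())
schema = pa.schema([field_a, field_b])
batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, None], type=pa.int32()),
     pa.array([10, None, 30], type=pa.int32())],
    schema=schema)

# Build the expression tree for "a + b" and compile it once with LLVM.
builder = gandiva.TreeExprBuilder()
add_node = builder.make_function(
    "add", [builder.make_field(field_a), builder.make_field(field_b)], pa.int32())
expr = builder.make_expression(add_node, pa.field("a_plus_b", pa.int32()))
projector = gandiva.make_projector(schema, [expr], pa.default_memory_pool())

# The compiled kernel just consumes Arrow batches and emits Arrow arrays.
(result,) = projector.evaluate(batch)
print(result)  # [11, null, null] -- SQL null-in/null-out semantics
```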

And so what does it actually do? Well, it supports an arbitrary set of expressions. And so when you think about compilation inside of query execution, there are lots of different operations. You might be doing a hash join, you might be doing aggregation, you might be doing some kind of expression calculation. If you look at those different things, things like aggregation and joins don't need that much runtime compilation to perform well. The reason is that the patterns and where they spend their time are fairly consistent across different people using those operations. With expression evaluation, runtime compilation becomes extremely critical to performance because every single expression that someone will write is going to be a little different. And so, a plus b plus c, or a plus b divided by c, are going to be different expressions and I want to figure out what's the fastest way to run each of those things and compile those things. And so that's really what Gandiva's about, is saying, "Hey, whatever arbitrary expression you have, I can compile that very, very efficiently."

And there's simple versions of this, a plus b, that's a pretty straightforward thing and there's actually primitives inside of LLVM that already allow that basic operation, but then you have to apply things like SQL semantics on top of that. And SQL semantics include the consideration of null and nullability and how that impacts the different operations. And so how do I calculate that correctly? And so the Gandiva expression library has, I think, a thousand or several thousand expressions now, basically supporting filtering and projecting operations to apply a set of conditions to a stream of Arrow records.
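
As a rough illustration of the filtering side (again with `pyarrow.gandiva`; the function name "greater_than" is the comparison as I understand it to be registered in Gandiva's function registry, and the data is made up), a condition is compiled the same way and produces a selection vector of the rows that pass, with null inputs dropping out just as a SQL filter would drop them:

```python
import pyarrow as pa
import pyarrow.gandiva as gandiva

schema = pa.schema([pa.field("a", pa.int32())])
batch = pa.RecordBatch.from_arrays(
    [pa.array([5, None, 42, 7], type=pa.int32())], schema=schema)

# Compile the condition "a > 10"; rows where a is null evaluate to null
# and are excluded, matching SQL filter semantics.
builder = gandiva.TreeExprBuilder()
condition = builder.make_condition(
    builder.make_function(
        "greater_than",
        [builder.make_field(schema.field("a")), builder.make_literal(10, pa.int32())],
        pa.bool_()))
filt = gandiva.make_filter(schema, condition)

selection = filt.evaluate(batch, pa.default_memory_pool())
print(selection.to_array())  # indices of rows that passed, here [2]
```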

The other thing to note — I think I've got that in the next section. So, how does it perform? Well, this goes to the Arrow representation. So, for those of you that don't remember or need a reminder, the Arrow representation of data is a columnar representation. It's columnar, in memory; Parquet and ORC are columnar, on-disk representations. Most systems will typically bring data into memory as a row-wise representation. They take advantage of the underlying IO capabilities by using columnar there, but they don't actually take advantage of the CPU capabilities around vectorization by using columnar in memory. Arrow is trying to change that by saying, "Hey, the best representation in memory is also columnar."

Now, the representation of an Arrow integer. So let's say we're doing an a plus b operation, where a and b are both four-byte integers. The Arrow representation of that is actually independent chunks of memory for each of those things. So the data is a contiguous chunk of four-byte integers, end to end, one for each value. And then, separately from that, there's a second structure which is a bitmap of whether each value is valid or not. So it describes the nullability of each of those values. And so when I'm doing a plus b, I'm actually working with four data structures: the validity and the data for each of a and b. Okay.

And so because we know that's the representation, inside of the LLVM code and the Gandiva operations, we can actually take advantage of this and start doing null decomposition, and work with null values and null resolution completely independently of the data itself. And so in this example here, right, we've got a validity and a data vector for a, and we've got a validity and a data vector for b. We can actually take the validity vectors, and the AND of the validity vectors will define the validity of the output according to SQL's null-in, null-out semantics. So, SQL's standard semantic is, "If one of my arguments is null, then my output is null."

And so normally when you would write code to solve this problem, you would actually write a bunch of conditional statements. You would say, "If a is null, then null. If b is null, then null. Else, a plus b." Okay, but if you know CPUs well, then you know that all this branching logic can substantially stall the CPU's pipeline. And so, instead of doing that, you just decompose it into these two different structures and you can say, "Okay, I'm going to do a bitwise AND." And that's going to allow me to work on each word, most likely, or maybe larger than a word if I've got SIMD instructions, but let's say each word. So I can handle 64 values at once. Where I say, "Hey, I'm going to take these 64 values and these 64 values, do a bitwise AND, and now I know the nullability of those 64 output values." Right?
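
Here's a tiny sketch of that decomposition idea in plain NumPy (just to illustrate the concept, not how Gandiva's generated code actually looks): the validity bits live in their own words, so one bitwise AND settles the nullability of a whole word of values, and the data add runs with no branches at all.

```python
import numpy as np

# Data for a and b, plus one validity bit per value packed into a word.
a_data = np.array([1, 2, 3, 4], dtype=np.int32)
b_data = np.array([10, 20, 30, 40], dtype=np.int32)
a_valid = 0b1011   # values 0, 1, 3 are valid; value 2 is null
b_valid = 0b1110   # values 1, 2, 3 are valid; value 0 is null

# Output validity: one bitwise AND per word (64 values at a time for real
# 64-bit words) implements SQL's null-in/null-out rule with no branching.
out_valid = a_valid & b_valid        # 0b1010 -> only values 1 and 3 are valid

# Output data: a branch-free vectorized add; whatever lands in the null
# slots is ignored because the validity bitmap marks them invalid.
out_data = a_data + b_data
print(bin(out_valid), out_data)      # 0b1010 [11 22 33 44]
```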

Then separately, I can do the SIMD add for the data. Okay. And so I can add the two vectors together. And so the simplest version of an add would be, "Hey, let me just loop through these things and add them." Okay? A more complex one would be, "Hey, I can work with things that are wider than four bytes and so I'm actually going to add both of them together with the larger SIMD operations." But in either case, there are no conditions there either. And so what happens then is that this means that there's no conditions anywhere in this logic, no branching anywhere in this logic, so the performance can be substantially faster. Plus, when we have SIMD, we can take advantage of those operations.

Okay? And so this is one very trivial example of how you can customize what's going on in the compilation to improve the performance of things. So when you talk about that, the null management, this is one example of null management. Nulls are a big problem in terms of processing, for improving performance. So inside of Gandiva there's basically a whole set of primitives around how nulls are managed. So there are expressions that are null-if-null, there are expressions that always produce a non-null output even if I have a null input, and there are expressions where, depending on what's going on, there may be a nullable input or a non-nullable input. Case statements are actually a really interesting thing when you're applying null semantics to them to improve performance. So, basically, inside of Gandiva is all of the logic to deal with all of these things, so that you can just build an expression that's going to be working against the Arrow data and have very good performance.

And so, as I mentioned, the other thing you can do is that because these representations are all columnar in nature, they're actually designed to be consistent with the representations that the underlying CPU expects, so that you can do a SIMD operation where you're adding multiple values. And so here's an example with a bunch of two-byte integers. I've got this collection of two-byte integers, a and b — so this is a two-byte version instead of a four-byte version — and I'm going to add them together. When I do that, I could do it where I read the first integer on each side and then add those together, then I read the second integer on each side and add those together, right? The number of instructions is basically the number of reads plus the number of adds, so it's going to be 2x to 3x the number of records in total instructions.

But the reality is that CPUs are better at this than that. So we can take the CPU and say, "Hey, CPU, you know how to do 128-bit adds at one time where these things are co-located inside these chunks of memory, so do that add all together." And so all of a sudden this becomes three instructions instead of, whatever it is, 24 instructions. So an 8x improvement in the number of instructions that we have to apply here to complete this operation. Now, it's not something that you actually have to worry about once you've got something like Gandiva to solve the problem, but this is why Gandiva can go more quickly than other things. And the core of this is that the underlying representation also lines up with this. If step one is to reformat my data into a representation that the CPU can understand for SIMD operations, and step two is to take advantage of the SIMD operations, you're going to lose a lot of the benefits you could otherwise gain by using these operations. In this case, because the representation is consistent with what the CPU expects, we can improve performance substantially.
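
A rough way to see the same effect from Python is to compare a scalar loop with NumPy's vectorized add, which compiles down to packed SIMD adds over contiguous buffers (this is only an analogy for what the Gandiva/LLVM-generated code does against Arrow buffers):

```python
import numpy as np

a = np.arange(8, dtype=np.int16)        # eight 2-byte integers = 128 bits
b = np.arange(8, dtype=np.int16) * 10

# Scalar version: a read of a, a read of b and an add per value,
# so roughly 3x the number of values in instructions.
out_scalar = np.empty(8, dtype=np.int16)
for i in range(8):
    out_scalar[i] = a[i] + b[i]

# Vectorized version: because the values are fixed-width and contiguous,
# the whole 128 bits can be added with a handful of packed instructions.
out_simd = a + b
assert np.array_equal(out_scalar, out_simd)
```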

So, how is Gandiva used? Well the nice thing about Gandiva is that it's language agnostic. So it's trying to continue to achieve on that promise that Arrow has, which is, "We don't care what language it is." So, Arrow has something like 10 or 12 different language bindings, all the common languages you might work with today. Gandiva has several of those already, I believe it has Java, C++, Python and Rust is actually not up here, but I believe we have a Rust set of bindings now for Gandiva. In all those cases, you build up the expression tree, the expression tree is communicated with Protobuf, because that's a common format that works across different languages, and then hand that to Gandiva to compile and use.

And so in the context of Dremio, we actually run Dremio inside of the JVM. And the JVM is notorious for not working well with other languages, because of Arrow we can actually work very well with Gandiva and make that work extremely efficiently. So that's Gandiva, that's the first of the four topics that we're going to cover.

The second one is reading. So, Gandiva will make you go faster whether you're in the cloud or on prem; it's just about making processing go faster. One of the things that becomes a really interesting challenge in the cloud is that object stores are awesome, but they're also slow. They're way slower than if I've got an NVMe on my local machine. And so let's talk about the representation of columnar data and how that maps to the performance that we can get by changing how we do the underlying reads.

And so, this is a quick review of how columnar data formats work. So imagine you have a table of data off here on the left and you have these columns — I've got color-coded columns so that you can see what's going on with this data. So the first thing that happens is that you will generally take a subset of rows and put them into some kind of chunk, what might be called a row group or a stripe or something like that. I'll use the Parquet terminology because I use Parquet the most, so that's called a Parquet row group. And so basically you say, "Okay, I'm going to take this subset of data and I'm going to put those together so that all of those rows are together."

Because I want that to be sort of a manageable size. So, the size might be 128 megabytes, it might be one gigabyte, something like that. Once I have those rows of data together, I then coalesce the values for each column independently. Right? So I put all of the oranges together, all the yellows, all the greens, all the, whatever, light blues, then all the dark blues, then all the grays. Right? So you put those together and that's actually the format of that row group inside of the Parquet file.

And so, this is designed so that you can improve the performance of reading from disk. And so what happens is that if I only need orange and green, then I don't have to read yellow, the blues, or the grays. Right? And so I can substantially reduce my read on disk. Okay? Inside of one of those column chunks, as we call them in Parquet, are what are called pages, which are little compressed units that are the actual data itself. Okay? So this is how, roughly speaking, all of the columnar formats are composed. There are slight variations to these things, but this is basically how they all look. Okay?

And so, this is great, it works really well for a lot of things. But if you start thinking about how it reads, it becomes a little bit problematic. Because think about those columns: I want to read data and turn it into a row-wise representation, which is how most processing systems work — or even a columnar representation where I'm not trying to read all the data at once. If I read all the data at once, I can start from the beginning of the file and go to the end of the file; I read all the data and everything is fine in terms of reading. But I want to take advantage of the fact that I'm only going to read some of the columns, so I don't really want to read from the start to the finish.

Okay? If I want an orange column or a green column, I could read from the start of the orange to the end of the orange, then the start of the green to the end of the green, in that order. But that's also pretty inefficient, because orange may be a really big chunk here — and this is something to note in this picture: different columns inside that chunk are going to use substantially different amounts of space depending on how well they can be compressed. Okay? And so if I read all of the orange before I read any of the green, I may be using a substantial amount of memory holding the orange in memory before I even get to reading the green. And depending on the operation, it may be that I'm wasting all of those reads and all that memory.

And so when you read from disk, there's this thing called read ahead. It's trying to figure out how we can go faster reading from disk. And so if you think about a read: if every time you asked for one byte of data from disk, the disk had to go get that one byte, read it off of disk and then hand it back to you, things would be ridiculously slow. And so at various layers of your IO there are basically buffers that say, "Hey, I'm going to read this much at a time." And then there's this read ahead concept, which is, "Hey, if you just asked for the first 4k of the file, then odds are you're going to want the next 4k in the file."

Okay? And that's a simplified version of read ahead. But basically a client's going to say, "Hey, I want the first bytes." Okay? And I've used some color coding in here which I'm going to continue to use. So the orange here means, hey, I've got to read this inline — I'm waiting for it. And so this first set of data, I'm going to have to wait for, because I'm asking for it and then I have to go and read from disk to get it. Okay? Now most systems will say, "Hey, you just asked for the first 4k, I might as well get the next 4k because odds are you're going to want that too." And this is to support pipelining. Because if you then ask for the next 4k, I already have that, so I can hand that back to you and then ask for the next 4k while you're consuming that 4k. The goal being that the underlying IO is ahead of you so that you're not waiting for it.
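
To make the mechanism concrete, here's a minimal, hypothetical sketch of that kind of sequential read ahead (the block size and single background worker are simplifications, not how any particular filesystem implements it): serve a block, then start fetching the next one in the background so a sequential reader finds it already there. The color comments match the coding used in the talk.

```python
import concurrent.futures


class ReadAheadReader:
    """Naive sequential read-ahead over any file-like object with seek/read."""

    def __init__(self, raw, block_size=4096):
        self.raw = raw
        self.block_size = block_size
        # One worker so all seek/read calls on `raw` stay serialized.
        self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self.pending = None              # (offset, future) for the block read ahead

    def _fetch(self, offset):
        self.raw.seek(offset)
        return self.raw.read(self.block_size)

    def read_block(self, offset):
        if self.pending and self.pending[0] == offset:
            data = self.pending[1].result()            # "green": already in flight
        else:
            if self.pending:
                self.pending[1].result()               # "red": wasted read ahead
            data = self.pool.submit(self._fetch, offset).result()   # "orange": wait
        # "yellow": kick off the next sequential block in the background.
        next_offset = offset + self.block_size
        self.pending = (next_offset, self.pool.submit(self._fetch, next_offset))
        return data
```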

Okay? So if I read the first 4k, the system says, "Okay. I'm going to pull 8k from disk." So then when I have the second request, I get a green request, which means that when I asked for it, it was already ready for me and therefore I don't have to wait for the disk again. Okay? And this is critical to performance; it's basically anticipating what is needed in order to improve the performance of what we're going to get. And so, again, the read ahead's going to say, "Okay. Well, you just asked for that 4k, I better go get the next 4k." Okay? And this works really well with files that are read sequentially, okay? But it doesn't work as well when you look at something like a Parquet format. So this is the columnar format — I've rotated it here so that it's along the top, and down the left-hand side I've got time. And what I want to show you is what happens when a naïve read ahead approach interacts with the reads of these columnar formats. Okay?

And so what happens is that most systems will start by reading the first column. And they'll read a subset of the first column in order to get some of the data. They don't want to read the whole of orange into memory, so they're going to read only the first little bit. They're going to say, "Well, I only want 5,000 records, and I want to send those 5,000 records down the pipeline, because there's a bunch of other operations. If I'm doing SQL, I might be doing: read from disk, do a join, then do an aggregation, then do some other operation, a filter or something like that. I want to get the data through that pipeline as quickly as possible, and so I don't want to read all of orange before I start returning data downstream."

And so I make that first request, that orange request, which is, "Hey, I'm going to read this, I'm going to wait for the read." And at the same time the read ahead's going to say, "Okay, let me go and read the next 4k." Right? This is where it breaks down. So at time two, I'm now going to say, "Okay, well, I want the yellow column." So the read ahead is probably going to say, "Oh shoot." And this depends on the particular read ahead — there are more advanced read aheads that are better than this — but for illustration purposes, this is basically what happens. "Oh shoot. That read ahead, I didn't need that, let me throw that away." And that's the red. I'm throwing away the data that I just read, because it turns out that this person seeked to another place in the file and is now reading in that location.

And so now I'm going to say, "Oh okay, they're reading in this location. I'm going to do another read ahead at this location." Right? Well, guess what happens? Time three: I'm going to read someplace else, because that's the part of the file I need now. Okay? And again, I'm going to throw things away. And this continues on, and in some cases it might be okay. In this case, one column is so narrow that the read ahead happened to overlap with what was being read next, and there, all of a sudden, we could get a green block. But mostly what you're seeing is waiting for data, and the yellows and reds are basically pulling down data but then not using that data. Okay?

So this just goes on indefinitely. Okay? Now realistically, most read aheads, if they see that they're constantly wasting their reads, they'll actually turn themselves off. So at some point the yellows and the reds will just go away and you'll just have a bunch of oranges. Okay? The problem though is that each time you do one of those oranges it means that I ask for the data and then I have to wait for the data to come back from the underlying system. Now this isn't a huge problem, historically, on local disks because local disks are pretty fast. I know that they're not the fastest thing in the world, but generally speaking you set the buffers at a reasonable size, they have a reasonable latency and things are okay.

When you go into the cloud, the latencies are one to two orders of magnitude higher to get data back, and that means that all of a sudden these oranges, where we wait and try to get the information back from the underlying system, start to make things way slower. And so when you're interacting with these systems, you'll start to see substantially lower performance than if you're using local disks. And this is actually what leads many people to implement things like using EMR or using HDFS on Amazon to do interactive workloads, because they can't get the performance from S3.

And this is one of the contributors. The latency itself is actually a problem, the actual read pattern and how we read these things, is also a problem.

So, basically, systems need to consume all these columns and we really want to pipeline things, but we're basically waiting. We're not keeping the pipeline populated and that becomes very painful. And so, as I kind of talked about, one solution is you can read the whole file at once. A second solution is you can say, "Well, I'm going to read large chunks of the file at once." You say, "Well, I'm only going to read two big chunks." And then two megs at a time is not as bad, but I've still got a lot of wastage, and there's still this question of how do I anticipate these things?

And so the third option, and this is what we've actually found to be very, very valuable, is you merge the IO layer with an understanding of the file format. And you say, "The read ahead should be aware of how the file format works, or else you're not going to be able to get to the performance that you want to get to." And so that's what we did — we built what we call, basically, a predictive columnar reader. Okay?

So, it's going to understand the columnar formats and read those things accordingly. And so what happens is that when you first start, we actually go and request a bunch of stuff before you even start reading. Okay? We're going to read the first little bit of all of the columns, because we know that this is going to be the most common pattern that happens. And obviously, if you only read three of the columns, we only ask for three of the columns. Okay?

And so what that means is that when the normal read pattern layers on top — which continues to happen, because the natural way to decode is one column at a time: you're decompressing this column, then that column, then the other column — you get greens. And so you actually maintain an understanding of every column, and the read aheads for every column independently, so that you can continue to make sure that each of these things is populated. So every time I go to request something, hey, I'm going to get that data, and I'm going to keep track of that as a read ahead that's independent of the other read aheads to improve performance. And this comes down all the way to where you actually always have greens. Which means that the application is reading some data, then decompressing the data and sending it downstream, then reading some more data, then decompressing, then sending downstream. Basically, all of the requests are happening while it's processing and decompressing that data, and so the next time it goes to read some more data, that data's already going to be there. And that's really what you want to have.
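
Here's a rough sketch — not Dremio's actual implementation — of what a format-aware prefetch can look like from Python: use the Parquet footer to find the byte ranges of just the projected column chunks, then warm an independent read ahead for each of them before decoding starts (the filesystem object, prefetch size, and helper name are assumptions for illustration).

```python
import concurrent.futures
import pyarrow.parquet as pq


def prefetch_projected_columns(fs, path, columns, head_bytes=1 << 20):
    """Fetch the first `head_bytes` of every projected column chunk so each
    column's read ahead is already warm when the decoder asks for it.
    `fs` is any pyarrow filesystem (e.g. pyarrow.fs.S3FileSystem)."""
    meta = pq.ParquetFile(fs.open_input_file(path)).metadata
    ranges = []
    for rg in range(meta.num_row_groups):
        row_group = meta.row_group(rg)
        for c in range(row_group.num_columns):
            col = row_group.column(c)
            if col.path_in_schema in columns:
                start = col.data_page_offset
                if col.has_dictionary_page:
                    start = min(start, col.dictionary_page_offset)
                ranges.append((start, min(head_bytes, col.total_compressed_size)))

    def fetch(start, length):
        with fs.open_input_file(path) as f:
            f.seek(start)
            return f.read(length)

    # One ranged read per projected column chunk, issued concurrently.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(fetch, s, n) for s, n in ranges]
    return {start: fut.result() for (start, _), fut in zip(ranges, futures)}
```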

And so, to give you an example of the impact that this had: we ran some example queries — this is on, I think, Azure Storage Gen2 — and this was the performance improvement of this example workload, in terms of just how much faster you can get by applying this technique.

So I'm going to jump to Arrow Flight and how that can make things faster too. I'm realizing that I'm using more time than I should, so I'm going to go a little quicker here. So, Arrow Flight is a high-performance protocol for moving Arrow data between two different systems. Okay?

Basically you have streams, and you have clients and servers, but realistically a stream can go in either direction. So I can be pushing the stream up to my server, or I can be consuming it from my server. Now, because Arrow Flight focuses on analytical data sets, it's very important to support parallelization. And so it allows a system — say you have 100 servers talking to 100 servers — where you can actually send Arrow streams between all of those 100 servers. So it's not a single-stream concept. If you think about ODBC and JDBC, which are things you can kind of think about this replacing a little bit — there's a lot more to ODBC and JDBC than this, but for the purposes of receiving data, this is a good analog — those are single stream. Here we're saying, "Hey, if I want to have 100 streams to 100 streams, I can do that. If I want to have 1,000 streams to 1,000 streams, I can do that." So we can move data very, very efficiently, and it's all moved in this Arrow format.
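
As a sketch of what that parallel model looks like from a client (the server address and the command payload below are hypothetical placeholders, and authentication is left out): the server hands back a FlightInfo whose endpoints can point at many different nodes, and the client pulls each stream independently.

```python
import pyarrow.flight as flight

# Hypothetical coordinator address; a Flight query is just a descriptor.
client = flight.connect("grpc://coordinator.example.com:32010")
info = client.get_flight_info(
    flight.FlightDescriptor.for_command(b"SELECT * FROM some_dataset"))

tables = []
for endpoint in info.endpoints:
    # Each endpoint carries a ticket plus the location(s) that serve it,
    # so N servers can stream to N readers in parallel.
    location = endpoint.locations[0] if endpoint.locations else None
    reader = (flight.connect(location) if location else client).do_get(endpoint.ticket)
    tables.append(reader.read_all())          # one Arrow table per stream
```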

And so I'm going to show you a quick example of this. So what I have here is just a simple little Jupyter notebook that hopefully will run for me. And what I'm going to do is I'm just going to run a SQL query. This is running against Dremio — it's just a single node on EC2 in this case. And so what you see is that- is that big enough for people to see? Sorry, let me make it a little bigger there. Is that big enough for the back? Yeah? Okay. Cool. So I ran a SQL query here where I'm just getting 2,000 records back, because Arrow Flight is about transport, not about processing. And so, how do I efficiently get stuff back to the client? And the client could be Python, it could be a BI tool, it could be something else entirely; Arrow Flight doesn't really care. I'm using ODBC and Python here as examples. But basically, if you run a small query, it doesn't really matter what the format of the transport is. It's pretty fast either way. So we're at .06 seconds and .05 seconds. So Flight is slightly faster, but they're basically in the same ballpark.

So what I'll do here is I'm going to go get 5,000,000 records now. Okay? And so let's see if I can run this thing. And so 5,000,000 records is actually not that many records. I am actually constantly shocked by how slow this goes. But what's happening here is that Dremio is here, it's reading some data — there's some amount of time it takes to read the data from its underlying storage — it's then converting that into a representation and sending that over the network to the client that's running inside of Python, and the client is then going through that interface. And if you look at the ODBC interface, it's, "Give me the next record. Okay, ask for the cell, ask for the cell, ask for the cell."

And so that's actually what's happening inside of the pyodbc client: it's basically reading every cell out of every record. And I think that we're reading something like 10 columns here, so you're talking about 50,000,000 invocations and method calls to get this data back into Python. And, right now, I'm not even converting it into a data frame or something else. I'm just doing something very simple — I'm getting this back into its native representation, which is the pyodbc representation. And so, it took 60 seconds — 58 seconds. All right? Which is a long time to stand up here and talk and wait for it and hope that it finishes.

So I'm going to run the same exact thing with Flight. So, exact same query, against the exact same system, same locations. Okay? And the goal is: faster. As you can see, it's way faster. Drastically faster. And the reason is pretty simple: if you think about that first scenario, every single one of those invocations and all of that code that had to be run doesn't have to be run in the second case. Because, guess what? The Arrow representation in one language is the same as the Arrow representation in another language. And even though Dremio's running in the JVM over here, it can communicate this Arrow Flight data back in its memory representation, and the other side is like, "Hey, I know exactly how to consume the Arrow representation and start working with it."
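
A stripped-down version of what the notebook is doing looks roughly like this (the DSN, host, port and table name are hypothetical, and the Dremio authentication step is omitted); the point is that the ODBC path materializes rows cell by cell, while the Flight path hands back Arrow batches the client can use directly.

```python
import pyodbc
import pyarrow.flight as flight

query = "SELECT * FROM demo_table LIMIT 5000000"        # hypothetical table

# ODBC path: every record, and every cell of every record, crosses a
# Python method-call boundary before anything is usable.
odbc_conn = pyodbc.connect("DSN=dremio")                 # hypothetical DSN
rows = odbc_conn.cursor().execute(query).fetchall()

# Flight path: the server streams Arrow record batches; the client reads
# them as-is, with no per-cell calls and no row-wise conversion.
client = flight.connect("grpc://dremio-host:32010")      # hypothetical address
info = client.get_flight_info(flight.FlightDescriptor.for_command(query))
table = client.do_get(info.endpoints[0].ticket).read_all()
df = table.to_pandas()
```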

And so, what this allows you to do — and this one I'm not going to run, because it takes about 3 minutes — is load a billion records into a data frame in about 3 minutes. Okay? Now what would this have taken if I had done it with pyodbc, which I would honestly have never done? About two and a half hours. Okay? And the goal, again, being that by using these techniques, all of a sudden things that were not achievable before, or were unreasonable for people to wait for, you can now actually achieve with reasonable effort.

Now if you know Python well, you may know that there's actually something faster than pyodbc, and so I actually have another notebook here, which is turbodbc. Which is hard to say, and it's actually also Arrow native, so it knows how to do Arrow well, and that's one of the things it does. I won't run it for you, but it takes about 23 seconds for that exact same workload that was taking 58 seconds on pyodbc. So, optimizing within Python is always a good place to start, because frankly, a lot of times there can be a bunch of optimizations done just there. But even with the fastest implementation of how you get data into a Python data frame from a database today, with turbodbc, you're still looking at 23, 24 seconds.
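
For reference, the turbodbc path looks roughly like this (the DSN and table are hypothetical): it pulls the result set back as Arrow in one call, which is why it beats pyodbc while still paying for the ODBC transport underneath.

```python
import turbodbc

conn = turbodbc.connect(dsn="dremio")                    # hypothetical DSN
cursor = conn.cursor()
cursor.execute("SELECT * FROM demo_table LIMIT 5000000")
table = cursor.fetchallarrow()      # result set as a pyarrow.Table
df = table.to_pandas()
```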

And so that's the kind of difference you can have by using Arrow to solve data access. So lastly, let me just talk briefly about what we call the columnar cloud cache, or C3 for short, because it's kind of a mouthful. So this is talking about storage in the cloud. And so when you think about storage in the cloud, you don't have a great option. You have some really great things, but they're only great at some things. And so you've got cloud storage, whether it's S3 or ADLS or Azure storage or Blob storage or GCS or whatever you have, right? Unlimited scale, super cheap, no management burden. Of course, you could spend a bunch of money because you didn't know you had a bunch of data sitting there, because it's unlimited scale. But, by and large, it's the best of infrastructure as a service.

Now, it's slow though. Hopefully most of you have seen this. It's not super slow, it's getting better all the time, but it's way slower than if I've got a locally attached drive — especially if I've got a locally attached high-performance drive like NVMe. Okay? And it actually can be relatively expensive for hot data. If you're querying the same thing over and over again from S3, you start to realize the storage costs are not the biggest cost for you, it's actually the access costs. And so that's something else that you have to really think about. So then you've got this other option, which is, hey, locally attached storage — like locally attached SSDs. And you can just do that with basic techniques, or you can say, "Let me apply an HDFS or something like that on top of that," to have more layers on top of it. Right?

And so this actually allows you to get substantially better performance and it gets rid of the costs of doing those accesses in S3 and whatnot. The downside, though, is that it is also a huge amount of maintenance and management burden. It's the reason we went to infrastructure as a service, to get rid of all this bullshit. But if you're thinking about applications, yes, cloud storage is great for long-term storage and for analytical data sets where you don't need to hit interactive performance. But if you want low-latency, critical applications, probably local SSDs. Well, that sucks, because you don't want to have to make a choice.

And so what we did with Dremio is we said, "Hey, let's do something that combines the two of these things." So let's leverage the elasticity of cloud storage — that's where you put your data — but then let's actually build a high-performance local caching layer on ephemeral storage. Okay? On local storage. And we can do this across all workloads because the cost to implement and use this is less than ten percent for any missed read. So the cost of actually caching locally is low enough that it actually can work really, really well.

So we're not caching files. So caching files, again, is kind of a problem because if you think about that previous picture when we talked about Parquet, I may only read three of the columns of that file. And so if I cache the whole file, I'm wasting a huge amount of local storage. In fact, inside of a file, in one of those column chunks, I may not read all of the pages in that column chunk. And the reason is that frequently you apply common predicates. And so only some of the pages in some of those columns are actually what I want to store.

And so Dremio C3 is about that. It's basically allowing you to store sub-chunks of sub-files of sub-columns locally to improve the performance. And one of the things that I call this is screen door caching. And so what happens is that, over time, you see different subsets of the file being read and then you can start to bring those into cache. And so the greens here are reads going into the cache and the grays are reads from the cache — let me clear this up: green means a user accessed it, gray means it's still in the cache.

And so, over time, what will happen is that you will see that some things get accessed a lot, so they'll stay in the cache. Other things maybe only get accessed once, so they'll very quickly come into and out of cache again. Okay? But this requires no management at all. You simply check a box, and then all of a sudden we're using that local storage to make things much, much faster. And so how does it work? Well, Dremio has these things called executors — these are the things that do the heavy lifting on the data. Each executor works independently, and each executor has this local cache, which is a combination of using RocksDB to maintain what's called a page translation table along with file system storage directly for the blobs or chunks of data.
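
Here's a heavily simplified, hypothetical sketch of that idea — page-granular, read-through/write-through caching of remote byte ranges on local disk. Dremio's real implementation keeps its page translation table in RocksDB; a plain dict stands in for it here, and `fetch_range` is whatever function reads a byte range from your object store.

```python
import hashlib
import os


class LocalPageCache:
    """Cache fixed-size pages of remote objects on local disk, keyed by
    (object path, page number), instead of caching whole files."""

    def __init__(self, cache_dir, fetch_range, page_size=1 << 20):
        os.makedirs(cache_dir, exist_ok=True)
        self.cache_dir = cache_dir
        self.fetch_range = fetch_range      # fn(path, offset, length) -> bytes
        self.page_size = page_size
        self.translation = {}               # page translation table (RocksDB in Dremio)

    def _local_path(self, path, page_no):
        key = hashlib.sha1(f"{path}:{page_no}".encode()).hexdigest()
        return os.path.join(self.cache_dir, key)

    def _get_page(self, path, page_no):
        key = (path, page_no)
        if key in self.translation:                      # hit: serve from local disk
            with open(self.translation[key], "rb") as f:
                return f.read()
        data = self.fetch_range(path, page_no * self.page_size, self.page_size)
        local = self._local_path(path, page_no)
        with open(local, "wb") as f:                     # miss: write through to disk
            f.write(data)
        self.translation[key] = local
        return data

    def read(self, path, offset, length):
        """Read-through: assemble the requested range from cached pages,
        fetching only the pages that are not yet local."""
        out, end = bytearray(), offset + length
        page_no = offset // self.page_size
        while page_no * self.page_size < end:
            page = self._get_page(path, page_no)
            start = max(offset - page_no * self.page_size, 0)
            stop = min(end - page_no * self.page_size, self.page_size)
            out.extend(page[start:stop])
            page_no += 1
        return bytes(out)
```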

Okay? Each executor works independently, so there are no scaling limits here. So we have customers that are starting to deploy this, running 10,000 queries on 200-terabyte Dremio clusters with this. So it can run at massive scale and in parallel. And then basically it does a read-through and a write-through cache. But it's also designed so that if nodes are coming in and out of the cluster, the system survives and does a good job of accommodating where that data should be afterwards. And so if you lose a node, that's okay. That's not the end of the world; we'll continue to be fast and we'll actually reposition the stuff in an effective manner.

So what does the performance look like when you do this? Well, one of the key things that we figured out was basically doing what we call generational operations. Normally in caching layers there are a lot of locking and concurrency problems. And so what we did is basically have these generations that are concurrently happening, with operations happening within one generation, and that allows us to drastically reduce the contention. Which allows us to do over 50,000 page translations a second, which is more than enough for most sizes of nodes that exist today. We can do five gigabytes of reads per second with this, against the standard high-end storage you can get inside of Amazon or something like that. As I mentioned, the cache-miss overhead is very, very low, and it supports a gigabyte a second of churn. What that means is identifying new data that needs to be read that should replace old data that already exists inside the cache. So it can very quickly churn out things that we're not actually reading commonly. And then there's also the memory flow coming into the cache itself.

So we use the disk and not the memory. A lot of people use memory to do caching, but the problem is memory is a scarce resource, and using it for a cache that may or may not be used is very expensive. And so what we do instead is we basically have a very small footprint of memory that covers a huge amount of on-disk cache. So if we look at most of the largest machines with NVMe direct-attached storage, you're not going to go above 16 terabytes today, and 16 terabytes requires less than one gigabyte of memory in order to cover the information. The other thing that we have is an incomplete LRU. And what I mean by this is that an LRU tries to maintain an understanding of the entirety of that data set — most LRUs try to understand every single value in the data set. The problem there is, again, huge amounts of pressure on your memory in order to maintain all that data for every single translation or every single object you're matching.

And so we have what we call an incomplete LRU, which is keeping statistics at a higher level to improve the performance of the LRU without having to maintain everything in there at once.

And so how do these things all work together? Let's say you have a use case where you're trying to access Azure storage with Python, right? So the initial approach is simple: I'm just going to use Python to access Azure storage. And that's fine. But the reality is that that's probably not going to be fast enough for you unless it's a really small data set. And so you can scale things out, and Dremio's one way you can scale things out — there are other things you could put here to scale things out, but Dremio's one example of that. And that allows you to then scale up how many things are processing. Because the odds are the data set that you want isn't exactly the data set that's in Azure storage. There's some set of processing that must happen before you get to the data set that you want: filtering things out, reorganizing things, cleansing things. Things like that.

But what really happens is that all these different techniques come together. And I actually didn't even talk about relational caching, because I didn't have enough time. But you can basically put the stuff in Azure storage, use the predictive reader to read the stuff quickly into Dremio, Dremio can then use the local cache to accelerate access to some of this data, which is hot data, you then process it with Gandiva, and then you move it back to the user via Arrow Flight transport. Okay?

So you can combine all these things together and get a substantial lift depending on your situation. The amount you can gain is substantial — of course, if you just multiplied all these things together, it would be like a 40,000x improvement, and that's not actually true, because it depends on where your time is. If most of your time is not in transport, then making transport 20 times faster doesn't actually help you that much. Right? So it's about which techniques actually have an impact on your situation. That's what I've got. So let me know your thoughts. I've got like 30 seconds for questions, although I think there's a break now, so if people want to ask questions beyond that, I think that's probably okay, or people can come up. Thank you.

Any questions, or, no? There's a question.

Speaker 2:

So, for on-premise use cases, what's still valuable?

Jacques Nadeau:

So the question was, in on-premise use cases, how many of these things are still valuable? So that's a good question. So if you think about it, Gandiva is just as impactful on prem as it is in the cloud. Arrow Flight is just as impactful on prem as in the cloud. The IO pattern is more impactful if you're working in a situation where you've got remote clusters. So for Dremio, we have many customers that have multiple clusters, and in that situation, some of the clusters are going to be more distant. And so taking advantage of the way that we're reading can be hugely advantageous, as well as the columnar cache. So those two are really more about whether or not there's enough distance between the data and where you're at. If you've got low-latency local storage, then those two things probably will have less impact on you. Other questions? Thanks everybody.