Companies use SQL Execution Engines to run analytical workloads on data they have accumulated in environments such as Hadoop and Amazon S3. SQL Execution Engines let companies apply their existing SQL-based skills and tools to this analytical data without first loading it into a relational database.
SQL execution engines evaluate SQL expressions against underlying sources of data. Traditional databases tightly integrate data storage and query evaluation. In contrast, SQL execution engines can evaluate queries against data in sources they do not manage, such as file systems, databases, and NoSQL systems.
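The storage/execution split described above can be sketched in miniature, with Python's built-in sqlite3 module standing in for a distributed engine; the CSV contents and column names here are invented purely for illustration.

```python
# Toy sketch of the storage/execution split: an in-memory SQL engine
# (sqlite3, standing in for a real distributed engine) evaluates a query
# over rows that live in an "external" CSV source the engine does not manage.
# File contents, table name, and columns are hypothetical.
import csv
import io
import sqlite3

# External data source: a CSV the engine does not own or store.
external_csv = io.StringIO("region,amount\neast,10\nwest,5\neast,7\n")

conn = sqlite3.connect(":memory:")  # the engine holds no persistent data
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(r["region"], int(r["amount"])) for r in csv.DictReader(external_csv)],
)

# Existing SQL skills apply directly to the external data.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 17), ('west', 5)]
```

A real engine such as Presto or Spark SQL reads the external source at query time rather than copying it in, but the analyst-facing contract is the same: plain SQL over data the engine does not store.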
In the Hadoop ecosystem, Hive quickly became the most popular way for users to run analytical jobs on data in HDFS due to the familiarity of its SQL interface. Users could apply their existing SQL skills to write HiveQL, which Hive compiled into MapReduce jobs executed against data in HDFS.
A number of alternatives to Hive emerged as new SQL execution engine projects, including Apache Impala, Apache Spark SQL, Apache Drill, Presto, and Apache HAWQ.
SQL Execution Engines typically allow users to:

- Run SQL queries against data in distributed storage such as HDFS and S3
- Apply existing SQL skills and SQL-based tools to this data
- Avoid loading the data into a relational database first
Dremio is the Data-as-a-Service Platform. It helps you get more value from your data, faster. Unlike SQL Execution Engines, Dremio is a comprehensive solution that eliminates the need for complex ETL, aggregation tables, or data cubes. Instead of cobbling together products from multiple vendors, Dremio lets you start seeing value in minutes, with a user experience unmatched in the SQL execution engine market.
Dremio lets you easily query all of your data sources, not just the data you’ve moved to HDFS, with optimized pushdowns to relational and non-relational systems like MongoDB, Elasticsearch, and S3. Dremio lets you reach your data faster, with far less effort.
Analysts connect to Dremio with their favorite BI tool (Tableau, Power BI, Qlik Sense, etc.) or language (SQL, R, Python, etc.). To an analyst, all data appears as tables, no matter what system it came from, with the full power of SQL to join, aggregate, transform and sort data across one or more data sources.
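As a rough sketch of what "all data appears as tables" means, the following uses two separate sqlite3 database files as stand-ins for two unrelated sources (say, a CRM system and an order store) and joins them with ordinary SQL in a single connection; all names and figures are hypothetical.

```python
# Hedged sketch of federated SQL: two separate "sources" (two sqlite3
# database files, standing in for systems like MongoDB and S3) are exposed
# as ordinary tables in one connection and joined with plain SQL.
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
crm_path = os.path.join(tmp, "crm.db")
orders_path = os.path.join(tmp, "orders.db")

# Source 1: customer records.
with sqlite3.connect(crm_path) as c:
    c.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    c.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "east"), (2, "west")])

# Source 2: order records, in a completely separate database file.
with sqlite3.connect(orders_path) as c:
    c.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
    c.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 30), (2, 5), (1, 12)])

# One connection sees both sources as ordinary tables and joins across them.
conn = sqlite3.connect(crm_path)
conn.execute("ATTACH DATABASE ? AS ordersrc", (orders_path,))
rows = conn.execute(
    """SELECT c.region, SUM(o.amount)
       FROM customers c JOIN ordersrc.orders o ON o.customer_id = c.id
       GROUP BY c.region ORDER BY c.region"""
).fetchall()
print(rows)  # [('east', 42), ('west', 5)]
```

The analyst writes one join and one aggregation; which system each table actually lives in is a detail the query does not need to know.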
Entirely invisible to your users, Dremio Reflections™ accelerate your data so that no matter how big it is or where it came from, it feels small, approachable, and instantaneous. Unlike cubes that only work for small data on a small set of pre-defined queries, Dremio makes all your SQL fast, including ad-hoc row-level queries as well as OLAP workloads.
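The following is an illustrative sketch of the idea behind this kind of acceleration, not Dremio's actual planner: the user query targets the logical table, but a toy "planner" transparently answers it from a precomputed aggregate instead of scanning the base rows. All table and column names are hypothetical.

```python
# Toy sketch of reflection-style query acceleration (not Dremio's actual
# implementation): a precomputed aggregate answers a GROUP BY query without
# a full scan of the base table, invisibly to the user.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("east", 10), ("west", 5), ("east", 7)])

# The "reflection": a maintained aggregate of the base table.
conn.execute("""CREATE TABLE events_agg AS
                SELECT region, SUM(amount) AS total
                FROM events GROUP BY region""")

def plan(sql: str) -> str:
    # Toy planner rule: this aggregate over the base table can be answered
    # from the precomputed summary instead of rescanning the base rows.
    base = "SELECT region, SUM(amount) AS total FROM events GROUP BY region"
    return "SELECT region, total FROM events_agg" if sql.strip() == base else sql

# The user writes the query against the logical table as usual.
user_sql = "SELECT region, SUM(amount) AS total FROM events GROUP BY region"
rows = sorted(conn.execute(plan(user_sql)).fetchall())
print(rows)  # [('east', 17), ('west', 5)]
```

The result is identical either way; only the cost changes, which is why the rewrite can stay invisible to the end user.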
Dremio runs as a distributed process, on dedicated infrastructure, in containers, or as a YARN application in your Hadoop cluster. With Dremio you can query data that’s already in HDFS, or you can query external systems directly, removing the need for ETL.
Unlike SQL Execution Engines, Dremio provides:
| | Dremio | SQL Execution Engines |
| --- | --- | --- |
| Accelerates aggregation queries | Yes. Queries are written against the logical schema, and Dremio's query planner automatically rewrites them to use Aggregation Reflections, invisibly to the end user. | No. Requires a slow full table scan each time. |
| Accelerates ad-hoc queries | Yes. Queries are written against the logical schema, and Dremio's query planner automatically rewrites them to use Raw Reflections, invisibly to the end user. | No. Requires a slow full table scan each time. |
| Accelerates relational data sources | Yes. Dremio Reflections, plus native optimizers with first-class query pushdowns. | No. Varies by engine, but most require third-party ETL to move and prep data for HDFS or S3. |
| Accelerates NoSQL data sources | Yes. Dremio Reflections, plus native optimizers with first-class query pushdowns. | No. Varies by engine, but most require third-party ETL to move and prep data for HDFS. |
| Integrated data curation | Yes. Natural and intuitive UI for data discovery, curation, acceleration, and collaboration. | No. Requires a third-party tool or custom scripts written by data engineers. |
| Integrated data lineage | Yes. Full visibility into data lineage and access patterns for governance and error remediation. | No. Requires a third-party tool or custom scripts written by data engineers. |
Dremio lets you reimagine your end-to-end analytical processes, with a solution that makes your data engineers and your analysts more productive on day one. Instead of using ETL and custom scripts to move your data between different environments, Dremio connects to your data sources directly and automatically creates a highly optimized cache that makes even your biggest data feel small, approachable, and interactive. Dremio supports all your favorite BI tools, as well as advanced languages and frameworks like Python/Pandas, R, and Apache Spark.
We see a wide range of applications, but here are a few popular first projects:
Dremio is a new approach to data analytics. Learn about Dremio.