Most companies rely on a data warehouse to centralize current and historical data for analytical use. These systems are critical to the business and are used across many departments, including sales, marketing, and finance.
In this article we compare the capabilities of a data warehouse to those of Dremio, the Data-as-a-Service Platform. Data warehouses are part of a larger end-to-end analytical process that involves many steps, technologies, and teams across IT. Dremio is fundamentally different in that it integrates these steps into a new self-service solution for business users to access any data from any source using their favorite BI and data science tools. Dremio is complementary to your existing data warehouse investments.
The data warehouse is a central repository of data from across the enterprise that powers data analytics. BI tools and data science technologies access data from the data warehouse to serve many use cases across most business units. Because most companies use many different applications to run their business, the data warehouse simplifies analysis by providing access through a single system. It also provides data in a standardized and consistent form, which makes the analysis more reliable and informative.
Typically there are one or more data pipelines that move data from sources into the data warehouse. As part of these data pipelines, data may be transformed, filtered, enriched, or summarized in order to make it more suitable to the needs of analysts and data scientists. In some projects ETL tools are used as part of the data pipeline. In other projects data prep tools might also be used to help get the data ready for the data warehouse.
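As a rough illustration of the pipeline steps described above, the sketch below filters, normalizes, and summarizes a handful of rows before loading them into a warehouse table. Everything in it is an assumption made for the example: the `SOURCE_ORDERS` rows, the `sales_by_region` table, and the use of an in-memory SQLite database as a stand-in for a real data warehouse.

```python
import sqlite3
from collections import defaultdict

# Hypothetical source rows, standing in for an export from an operational system.
SOURCE_ORDERS = [
    {"order_id": 1, "region": "emea", "amount": 120.0, "status": "complete"},
    {"order_id": 2, "region": "amer", "amount": 80.0, "status": "cancelled"},
    {"order_id": 3, "region": "emea", "amount": 50.0, "status": "complete"},
    {"order_id": 4, "region": "apac", "amount": 200.0, "status": "complete"},
]


def transform(rows):
    """Filter out incomplete orders, normalize region codes, summarize by region."""
    totals = defaultdict(float)
    for row in rows:
        if row["status"] != "complete":  # filter
            continue
        region = row["region"].upper()   # transform to a standard form
        totals[region] += row["amount"]  # summarize
    return sorted(totals.items())


def load(conn, summary):
    """Write the summarized rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", summary)
    conn.commit()


def run_pipeline(conn, rows=SOURCE_ORDERS):
    """Run transform and load, then read back what analysts would query."""
    load(conn, transform(rows))
    return conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()


warehouse = sqlite3.connect(":memory:")  # assumption: SQLite as a toy warehouse
result = run_pipeline(warehouse)
print(result)
```

In a real deployment each of these steps is usually handled by a dedicated ETL or data prep tool rather than a single script; the point here is only the shape of the flow from source to warehouse table.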
It is common for companies to store subsets of the data warehouse in smaller systems called data marts. A data mart might be specific to an individual department or region within an organization, with data specific to these users and their needs.
The data warehouse must be capable of reliably storing large volumes of data, while providing SQL access for BI tools and data science tools with high concurrency and low latency. In addition, user access must be secure and follow enterprise governance standards.
Dremio is a new and unique approach to data analytics that lets you do more with your data, with less effort, and at an end-to-end speed never before possible. Dremio connects to your data warehouse and source systems directly, minimizing the need for elaborate data pipelines, ETL, and data prep tools. Dremio also optimizes your data and queries, providing fast, interactive performance no matter where your data originates. Instead of building cubes, aggregation tables, or BI extracts, Dremio makes your data fast using cutting-edge columnar, in-memory data structures.
Instead of cobbling together products from multiple vendors, Dremio lets you start seeing value in minutes, and for the first time makes all of your data easily accessible to IT as well as business users.
Analysts connect to Dremio with their favorite BI tool (Tableau, Power BI, Qlik Sense, etc.) or language (SQL, R, Python, etc.). To an analyst, all data appears as tables, no matter what system it came from, with the full power of SQL to join, aggregate, transform and sort data across one or more data sources. Dremio is entirely transparent to your users. And Dremio Reflections accelerate your data so that no matter the size or data source, your data feels small, approachable, and instantaneous. Unlike cubes that only work for a small set of pre-defined queries, Dremio makes all your SQL fast, including ad-hoc row-level queries.
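To make the idea of "all data appears as tables" concrete, here is a minimal sketch of a single SQL query joining tables that live in two separate databases. It does not use Dremio: two attached in-memory SQLite databases stand in for independent source systems, and the `crm` and `billing` names, the tables, and the rows are all hypothetical. With Dremio, an analyst would issue a similar query through a BI tool or an ODBC/JDBC connection.

```python
import sqlite3

# One connection, two separate attached databases standing in for two sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

# Hypothetical tables and rows, one table per "source system".
conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 100.0), (1, 250.0), (2, 75.0)],
)

# The analyst sees only tables: join, aggregate, and sort across both sources.
QUERY = """
SELECT c.name, SUM(i.amount) AS total
FROM crm.customers AS c
JOIN billing.invoices AS i ON i.customer_id = c.id
GROUP BY c.name
ORDER BY total DESC
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

The query itself contains nothing specific to where each table is stored, which is the property the paragraph above describes.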
|Capability|Dremio|Data Warehouse|
|---|---|---|
|Store multi-TB to multi-PB datasets|Yes|Yes (limited; most systems struggle to support PB-scale datasets)|
|Provide full SQL access to all structured data over ODBC and JDBC|Yes|Yes|
|Provide full SQL access to all unstructured data over ODBC and JDBC|Yes|No|
|Support BI workloads with 100s of concurrent users|Yes|No|
|Ensure secure, governed access through integration with centralized security controls for authentication, authorization, and auditing, as well as end-to-end encryption|Yes|Yes|
|Ensure SLAs across many users and multiple tenants with resource management|Yes|Yes|
|Scale out on commodity hardware to 1000+ nodes|Yes|No|
|Query and join across external sources, including non-relational systems (e.g., MongoDB, Elasticsearch, S3)|Yes|No|
|Provide a self-service interface for business users to discover, curate, accelerate, and share data|Yes|No|
|Natively optimize data structures for multiple workloads, entirely transparent to end users, eliminating the need for cubes, BI extracts, and aggregation tables|Yes|No|
|Provide direct access to in-memory data buffers with zero copy and zero serialization/deserialization for Python, R, C++, Java, Spark, and other languages|Yes|No|
|Software license|Open source|Proprietary|
Dremio lets you reimagine your end-to-end analytical processes with a solution that makes your data engineers and your analysts more productive on day one. Instead of using data prep tools, ETL, and custom scripts to move your data between different environments, Dremio connects to your data sources directly and automatically accelerates your data and queries to make even your biggest data feel small, approachable, and interactive. Dremio supports all your favorite BI tools, as well as advanced languages and frameworks like Python/Pandas, R, and Apache Spark.
Customers use Dremio in a wide range of applications. Here are some popular first projects: