Analysts and data scientists are consumers of enterprise data. BI tools like Tableau, Qlik, and Power BI, as well as data science languages like Python and R, enable these teams to analyze data in powerful ways. All of these tools work best when the data has been collected from the different sources, prepared for analysis, and stored in a high-performance database for access.
Data Prep tools have emerged to help companies perform the work of sourcing and preparing their data for analysis. Companies use these tools as an integral part of their data pipelines. In this article we compare Data Prep tools to Dremio, the Data-as-a-Service Platform, which integrates several critical functional areas, including Data Prep.
Data Prep, also known as Data Wrangling, is a new class of software products designed to help companies improve the quality of their data for analytics. The idea of improving data quality isn’t new. What is new with Data Prep is allowing users to make decisions about how to improve data based on the data itself rather than relying on metadata.
ETL tools are another class of products that perform a similar function. There are many more ETL tools, and they have been around much longer. ETL tools differ from Data Prep in that they tend to work from metadata definitions instead of data samples. They are also different in that they are exclusively designed for IT users, whereas Data Prep tools are designed for IT users as well as technical users within a business function.
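To make the distinction concrete, here is a minimal Python sketch of sample-driven profiling: rather than trusting a declared column type, it inspects a handful of values and reports which format dominates. The sample values, the `PATTERNS` table, and the function names are all hypothetical and do not reflect how any particular Data Prep product works.

```python
import re
from collections import Counter

# Hypothetical sample of raw values pulled from a "signup_date" column.
sample = ["2018-03-01", "2018-03-02", "03/04/2018", "2018-03-05", "n/a"]

# Candidate formats to test against the data (illustrative only).
PATTERNS = {
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "us_date": re.compile(r"\d{2}/\d{2}/\d{4}"),
}


def classify(value):
    """Return the name of the first pattern that fully matches, else 'other'."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return "other"


def profile(values):
    """Count how many sampled values fall into each format."""
    return Counter(classify(v) for v in values)


counts = profile(sample)
dominant, _ = counts.most_common(1)[0]
print(counts)
print("dominant format:", dominant)
```

A user looking at this profile could decide to treat the column as ISO dates and flag the remaining values for correction, which is a decision driven by the data rather than by a schema definition.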
Data Prep is an important part of an end-to-end Data Pipeline. Data Pipelines typically cover multiple steps, and involve several technologies. In a Data Pipeline, data is:
Data Prep is especially important for Joining, Extracting, Standardizing, and Correcting data. These tools typically run on a centralized server where they are accessed through a browser, or on an individual’s desktop. Some BI tools include basic data prep capabilities, but these are usually only suitable for small datasets that fit on the desktop computer.
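The four operations named above can be sketched in a few lines of plain Python. This is an illustrative example under stated assumptions, not the behavior of any specific tool: the records, the `STATE_FIXES` lookup, and the helper names are invented for the sketch.

```python
import re

# Hypothetical records from two separate sources.
customers = [
    {"id": 1, "name": " Alice ", "state": "calif."},
    {"id": 2, "name": "BOB", "state": "NY"},
]
orders = [
    {"customer_id": 1, "note": "Order #A-1001 shipped"},
    {"customer_id": 2, "note": "Order #B-2002 pending"},
]

# Hypothetical lookup of known bad spellings and their corrections.
STATE_FIXES = {"calif.": "CA", "ny": "NY"}


def standardize_name(name):
    """Standardize: trim whitespace and use consistent capitalization."""
    return name.strip().title()


def correct_state(state):
    """Correct: map known variants to a two-letter code, else uppercase."""
    cleaned = state.strip()
    return STATE_FIXES.get(cleaned.lower(), cleaned.upper())


def extract_order_id(note):
    """Extract: pull the order identifier out of free text, if present."""
    match = re.search(r"#([A-Z]-\d+)", note)
    return match.group(1) if match else None


def prepare(customers, orders):
    """Join orders to customers on id, applying the cleanup steps above."""
    by_id = {c["id"]: c for c in customers}
    rows = []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is None:
            continue  # skip orders with no matching customer
        rows.append({
            "name": standardize_name(customer["name"]),
            "state": correct_state(customer["state"]),
            "order_id": extract_order_id(order["note"]),
        })
    return rows


prepared = prepare(customers, orders)
for row in prepared:
    print(row)
```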
Dremio is a new and unique approach to data analytics that helps you get more value from your data, faster. Unlike Data Prep products, Dremio is a comprehensive solution that is designed for business users to conduct end-to-end analytics from any data source, at any scale, for use with any BI or data science tool. While Data Prep is focused on one step of the process, Dremio allows users to perform data preparation tasks as part of a larger comprehensive process that includes:
Instead of cobbling together products from multiple vendors, Dremio lets you start seeing value in minutes, and for the first time makes all of your data easily accessible to IT as well as business users.
Analysts connect to Dremio with their favorite BI tool (Tableau, Power BI, Qlik Sense, etc.) or language (SQL, R, Python, etc.). To an analyst, all data appears as tables, no matter what system it came from, with the full power of SQL to join, aggregate, transform and sort data across one or more data sources. Dremio is entirely transparent to your users. And Dremio Reflections™ accelerates your data so that no matter the size or data source, your data feels small, approachable, and instantaneous. Unlike cubes that only work for a small set of pre-defined queries, Dremio makes all your SQL fast, including ad-hoc row-level queries.
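As a rough illustration of the kind of query an analyst might write, the sketch below joins and aggregates two tables with standard SQL. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for a SQL endpoint; it does not use Dremio's drivers or API, and the table names, columns, and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a SQL endpoint (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crm_customers (id INTEGER, region TEXT);
CREATE TABLE web_orders (customer_id INTEGER, amount REAL);
INSERT INTO crm_customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
INSERT INTO web_orders VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5);
""")

# Join the two tables, aggregate order amounts by region, and sort.
QUERY = """
SELECT c.region, SUM(o.amount) AS total
FROM crm_customers AS c
JOIN web_orders AS o ON o.customer_id = c.id
GROUP BY c.region
ORDER BY total DESC
"""

totals = conn.execute(QUERY).fetchall()
for region, total in totals:
    print(region, total)
```

In this sketch both tables live in one database. The scenario described above is the same query shape applied to tables that originate in different systems.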
|Capability|Dremio|Data Prep tools|
|---|---|---|
|Read data from many structured sources (RDBMS, file system, Hadoop, NoSQL, etc.)|Yes|Yes|
|Read data from unstructured sources (Elasticsearch, Amazon S3, etc.)|Yes|Very limited|
|Push optimized queries into data sources via native connectors, minimizing impact on data source operations|Yes|No|
|Transform data using SQL, regular expressions, nested data operators, etc.|Yes|Yes|
|Calculate new values from discrete fields or extracted values using a range of math operators|Yes|Yes|
|Aggregate data with many measures across many dimensions|Yes|Yes|
|Apply transformations, calculations, and aggregations in-memory as a dynamic operation|Yes|No|
|Accelerate data for rapid processing|Yes|No|
|Execute BI and data science tool queries|Yes|No. Requires loading a copy of the data into the execution environment.|
|Load data into destination systems|No. Dremio provides dynamic access to all data instead of loading it into a new destination system, accelerating time to insight and minimizing cost of ownership.|Yes|
Dremio lets you reimagine your end-to-end analytical processes with a solution that makes your data engineers and your analysts more productive from day one. Instead of using Data Prep, ETL, and custom scripts to move your data between different environments, Dremio connects to your data sources directly and automatically creates a highly optimized cache that makes even your biggest data feel small, approachable, and interactive. Dremio supports all your favorite BI tools, as well as advanced languages and frameworks like Python/Pandas, R, and Apache Spark.
Customers use Dremio in a wide range of applications. Here are some popular first projects:
Dremio is a new approach to data analytics. Learn about Dremio.