The variety of data has changed dramatically in the last few years, and self-service discovery and analytics tools, along with new methods for fast, easy access to data lakes, have arrived not a moment too soon. In this blog, I’ll review classic solutions for collecting and consuming data, how things have changed, and how Dremio can query your data directly with lightning-fast speed.
Here is a classic example of how data warehouses are built.
We have all seen such solutions for collecting, processing and consuming data. The model works fine as long as the data is structured and of reasonable size, in the hundreds of gigabytes. Once data grows to terabytes and petabytes, you must invest more time in understanding how to partition the data and precalculate cubes and BI extracts.
What Has Changed in the Data Analytics World
The data. Data is the new oil, and it has become far more varied. Relational data no longer dominates; semi-structured and unstructured data such as JSON, Parquet, voice, images and video now make up the bulk of what organizations collect. Modern object storage, such as AWS S3 and ADLS in the cloud or Scality and Dell EMC ECS on premises, solves these integration challenges, making it easy to store all of your data in a lake.
The rise of self-service discovery and analytics tools. Tableau, Power BI and Jupyter Notebooks give analysts and data scientists the freedom to explore the data on their own.
Unfortunately, access to the data lake is not easy. The volume of raw data is massive, and queries against the lake are usually slower than queries against a warehouse. To control compute resource usage, only a select group of people is granted direct access to the data lake. Another problem is keeping data secure and closely governed: by their nature, data lakes offer no easy way to control access to data or align with enterprise-wide master data management practices.
The next question is, how can the data in the lake be consumed with acceptable performance while also being governed?
First and foremost, the natural tendency is to reuse methods that have been in place for years: let’s build or extend a data warehouse. This gives us the required, aligned structure and provides a fast layer on which to build data marts or cubes.
However, the old pattern cannot satisfy today’s requirements. Easy ingestion and self-service data access put stress on the data warehouse, which is, by its nature, a monolith. Its inflexibility conflicts with the agile nature of the self-service layer, and as a result data engineers cannot keep up with analysts’ requirements.
What we need is:
Fast data access without complex ETL processes or cubes
An easy way to get access to the data lake without duplicating the data
Governed, secured and audited data access
An easily searchable semantic layer
Dremio’s Data Lake Engine delivers lightning-fast query speed and a self-service semantic layer operating directly against your data lake storage.
Dremio provides connections to S3, ADLS, Hadoop or wherever your data is. Apache Arrow, Data Reflections, C3 and other Dremio technologies work together to speed up queries by up to 1,000x. An abstraction layer enables IT to apply security and business meaning, while enabling analysts and data scientists to explore data and derive new virtual datasets.
Dremio works directly with your data lake storage. You don’t have to send your data to Dremio, or have it stored in proprietary formats that lock you in. Dremio is built on open source technologies such as Apache Arrow, and can run in any cloud or data center. Dremio’s powerful joining abilities mean that you can easily take advantage of other data sources as well.