Separation of Compute and Data: A Profound Shift in Data Architecture
For many years now, the industry has talked about the separation of compute and storage, and for good reason – it was a critical step forward for efficiency. When we were able to separate the compute tier from the storage tier, at least three important things happened:
Raw storage costs became so cheap that they were practically "free" on an IT budget spreadsheet.
Compute costs were isolated, meaning customers paid only for the compute they actually used when processing data, further lowering overall costs.
Independent scaling of storage and compute allowed for on-demand, elastic fine-tuning of resources, bringing new flexibility to architectural designs.
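The cost argument above can be made concrete with some back-of-the-envelope arithmetic. All prices, node counts, and hours below are made up purely for illustration; the point is the shape of the comparison, not the numbers:

```python
# Illustrative arithmetic (hypothetical rates and workload): why decoupled
# billing wins when compute demand is bursty. A coupled system provisions
# nodes that bundle storage and compute around the clock; a decoupled system
# bills storage per GB-month and compute only for hours actually used.
STORAGE_GB = 10_000
COMPUTE_HOURS_USED = 4 * 30            # 4 hours of queries per day, one month

# Coupled: 8 nodes sized for peak at $0.50/hr, running all month.
coupled_cost = 8 * 0.50 * 24 * 30

# Decoupled: storage at $0.023/GB-month plus 8 nodes for the hours used.
decoupled_cost = STORAGE_GB * 0.023 + COMPUTE_HOURS_USED * 8 * 0.50

print(round(coupled_cost))    # 2880
print(round(decoupled_cost))  # 710
```

The gap narrows as utilization approaches 24/7, which is exactly why decoupling matters most for bursty analytical workloads.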
But these benefits didn’t arrive right away. Large, expensive SANs and cheaper (but complicated in their own way) NAS systems have been with us for quite some time; the limiting factor for both storage models was administration and procurement overhead. Mass adoption of separated compute and storage became practical only with public cloud computing. Separate compute and storage in the public clouds is simple to administer and relatively low cost. In addition, these compute and storage cloud services are, for all intents and purposes, infinitely scalable, which eliminates the hardware procurement problems of old, and they provide very high availability and performance.
Today, another paradigm shift in fully leveraging cloud infrastructure is underway: one that puts data, not a vendor, at the center of the architecture. Just as applications have moved to microservice architectures, data itself is now able to follow suit, fully exploiting cloud capabilities in the process. Consider how the model shifts:
Let’s take a cloud data warehouse as an example. From a pure cost standpoint, if a vendor charges you separately for your storage and compute utilization, you are in a better position than if those were inextricably linked. While that is progress, it brings some further challenges.
Let’s grant that a cloud data warehouse separates your compute and storage costs. So far, so good. But is your data itself separated from that vendor’s compute? Can you freely (in every sense of the word) access that data without paying the data warehouse vendor? You cannot. Are you paying the data warehouse vendor to get the data into their system? Or out of it? Yes, you are. Is your data stored independently, such that myriad other cloud services can access it through industry-standard formats? No, it is not. That state of affairs exists because that’s simply not how data warehouses were designed some 30 years ago, and the same design principle carries over into cloud data warehouses: the data was meant to be completely under the control of the data warehouse itself.
One hardly needs to make the point that data is central to an organization’s future. The question then becomes, what is the best architectural way to unlock that centrality? By separating compute from data, three immediate benefits are realized:
A dramatic reduction in complex and costly data copies and movement: instead of treating the data warehouse as the single source of truth, compute engines access the data in open formats in the data lake, which also eliminates data silos.
Open data standards and formats allow for universal data access from unlimited services and applications, creating freedom to choose best-of-breed solutions.
An open architecture means future cloud services will be able to access the data directly instead of going through a data warehouse vendor’s proprietary format or moving/copying the data from the data warehouse for access.
Application architectures have proved that a services approach allows for maximum scale, flexibility, and agility. Separating compute and storage was an important first step in lowering costs for analytics, but it does not provide the kind of advantages found in modern application architectures. By separating compute and data, application design benefits now can be realized for data analytics. And given the critical nature of data for all businesses, that can’t happen fast enough.