5 minute read · November 29, 2021

So Long Proprietary Data Warehouses, S3 and Open Lakehouse Architectures Are Changing the Landscape Forever

Tomer Shiran · Founder & Chief Product Officer, Dremio

We’ve come to expect big, industry-changing developments at AWS re:Invent. This year, Dremio is making a statement of its own, and it’s one I expect to have many conversations about while I’m here at the conference. You’ll see it if you stop by our booth at the show, but its impact will be felt long after everyone leaves Las Vegas: Make S3 Your Data Warehouse.

What do we mean by this? We’ve entered a new era, one where you no longer need proprietary data warehouses for your BI and analytics. To explain, let’s take a look at how data infrastructure is evolving.

The Evolution of Data Infrastructure

We all know the demand for data is increasing exponentially. Our vision at Dremio is to make corporate data as easy to access and use as personal data. To that end, we’ve changed the way many think about mission-critical BI and self-service data access.

I’ve written before about the data infrastructure trends that are challenging the decades-old paradigm of extracting and loading data into expensive, proprietary data warehouses for use in BI and analytics. Driven by advances in technology, we’re rapidly moving to modern, open data architectures where data is stored on cloud data lakes in open-source formats as its own independent layer, accessible by loosely coupled, elastic query engines. All BI and analytics goals can now be achieved directly on this open data architecture, with no need to extract and load the data into proprietary data warehouses.

The Building Blocks of a Data Warehouse on S3

With recent advancements in analytic technologies, you now have the pieces in place to build your data warehouse right on S3. That’s a goal that some of us in the industry have been working toward for a while. In fact, large tech companies like Netflix have been using S3 as their data warehouse for some time now. Only with these recent advancements has the approach become feasible for organizations that don’t have armies of data and infrastructure engineers.

For example, Apache Iceberg, which is quickly becoming an industry standard, came out of a need Netflix had for a more consistent, performant, and end-user-friendly table format for its S3 environment. If you haven’t heard the story, it’s worth listening to the session Netflix database architect Ted Gooch gave on Iceberg at our last Subsurface conference.

Project Nessie, which Dremio open sourced, adds another building block. Nessie extends and leverages table formats like Iceberg, bringing multi-table transactions and Git-like version control to the data lake. Nessie also brings capabilities to the data lake that leapfrog data warehouses, such as safe experimentation and streamlined promotion workflows. It makes life much easier for data engineers and accelerates data science and data engineering.
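
To make the idea of Git-like version control for tables a little more concrete, here is a minimal sketch in plain Python. It is a toy model only and does not use or represent Nessie’s actual API: the names ToyCatalog, Commit, create_branch, commit, merge, and demo, as well as the snapshot IDs, are hypothetical and exist purely for illustration. The sketch assumes a catalog that maps table names to snapshot IDs, where each commit may update several tables at once and a branch is just a named pointer to a commit.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """One catalog state: a mapping of table name -> snapshot id (toy model)."""
    tables: dict
    parent: "Commit | None" = None
    message: str = ""


class ToyCatalog:
    """Hypothetical in-memory catalog with named branches. Not Nessie's API."""

    def __init__(self):
        # Every branch is a pointer to a commit; "main" starts out empty.
        self.branches = {"main": Commit(tables={})}

    def create_branch(self, name, source="main"):
        # A new branch starts at the same commit as its source (no data copied).
        self.branches[name] = self.branches[source]

    def commit(self, branch, changes, message=""):
        # All table changes land in one new commit, so readers of the branch
        # see either every change or none of them.
        head = self.branches[branch]
        self.branches[branch] = Commit({**head.tables, **changes}, head, message)

    def tables(self, branch):
        return dict(self.branches[branch].tables)

    def merge(self, source, target="main"):
        # Fast-forward only: the target head must be an ancestor of the source.
        src, tgt = self.branches[source], self.branches[target]
        node = src
        while node is not None and node is not tgt:
            node = node.parent
        if node is None:
            raise ValueError("target has diverged; this toy only fast-forwards")
        self.branches[target] = src


def demo():
    catalog = ToyCatalog()
    catalog.commit("main", {"orders": "snap-1", "customers": "snap-1"}, "initial load")
    catalog.create_branch("etl")
    # Experiment on the branch: update two tables in a single commit.
    catalog.commit("etl", {"orders": "snap-2", "customers": "snap-2"}, "nightly update")
    before = catalog.tables("main")   # main is untouched by the branch work
    catalog.merge("etl")              # promote both table changes together
    after = catalog.tables("main")
    return before, after
```

In this sketch, work on the etl branch is invisible to readers of main until the merge, at which point both tables move to their new snapshots together. That is the general shape of the safe experimentation and promotion workflow described above, under the simplifying assumptions stated.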

While these building blocks now exist, companies still need a way to tie all these capabilities together and make an S3 data warehouse a reality. That’s where Dremio comes in.

Making S3 Your Data Warehouse With Help From Dremio Cloud

Having access to various building blocks is one thing. But companies need to be able to focus on deriving value from their data instead of being consumed by system setup and maintenance. Earlier this year we launched Dremio Cloud, which provides the same great Dremio value, such as running BI directly on S3, along with infinite scale and limitless concurrency in a cost-efficient, fully managed service. AWS and the SaaS Factory team were instrumental in helping us reach this milestone (we describe that journey in a separate blog post).

Dremio enables you to run all of your BI, from the ad-hoc to the mission-critical, directly on your data lake. Now with Dremio Cloud you get all the benefits of Dremio as a fully managed PaaS offering.

In the next few months we’ll be making more announcements about the future of Dremio Cloud, so stay tuned and save Feb. 9 and 10, 2022, for the winter Subsurface LIVE conference where we’ll fill you in on what’s coming next.

Until then, I invite you to sign up for Dremio Cloud and experience it for yourself for free.
