Here Comes the Data Lake Engine: Why I Joined Dremio
It’s official: a little over two weeks ago I started at Dremio, makers of the Data Lake Engine, as their VP of Marketing. I’m super excited to join the Dremio team, as they are in a fantastic space with a huge total addressable market (TAM), a strong leadership team, and a disruptively better architecture and technology. I’d love to share the journey that led me here, so let’s dive in!
After leaving Pure Storage in June and taking the summer off to travel and spend time with family and friends, I started looking around for the Next Big Thing. I knew I wanted to move “up the stack” from storage and also go all software/SaaS, and so I explored a number of really interesting spaces (and relevant startups) including enterprise Kubernetes, analytics and AI, and IoT/OT security.
Cloud Data Lake Storage is a Disruptively Better Core Technology
I was looking for a space that shared the same characteristics of disruption that enabled Pure to be a breakout success. Namely, I was looking for the emergence of a new core technology that is creating the opportunity for a ground-up re-architecture of existing legacy solutions. For Pure, that core technology was NAND flash, and Pure used it to create a new storage architecture purpose-built for flash that was way simpler, way more efficient, and way more effective. The rest is history.
As I dug into the analytics space and talked to a lot of players and mentors (thank you all!), it became clear to me that the same thing is playing out in analytics, with Dremio perfectly positioned to capitalize on the shift. The analytics space is crowded, and it is fundamentally built around the same architecture that has been in place for decades: lots of data sources and silos, with a bunch of semantic data prep and painful ETL/ELT into data warehouses and marts (yet another silo), and yet more painful structural optimization via cubes, extracts, aggregation tables, etc., all in order to provide interactive analytics (on at least some data) to a wide range of end users. It’s a lot of data movement, copies, silos, hand-offs, points of failure, and so on. That is painful for data engineers and IT, and painful for data analysts, BI analysts, and data scientists.
But there is a new core technology that is gaining steam quickly, just like flash did. That technology is cloud data lake storage: large, inexpensive, durable, and elastic object stores such as AWS S3 and Microsoft ADLS. Enterprises are all moving to land their data in cloud data lake storage and then find ways to use it. But with the volume, velocity, and variety of data, it has become extremely painful (often impossible) to get value from that data using the same aging architecture. It doesn’t matter if all of the pieces of that architecture are running in the cloud or are even cloud-native, because the architecture itself is broken at this point.
Dremio Delivers Interactive Analytics Directly Against Your Cloud Data Lake
Enter Dremio. The visionary team here has created a new architecture (and a bunch of amazing technology) that is purpose-built for cloud data lake storage. Dremio has built a data lake engine that sits directly on top of the data lake storage, collapsing multiple layers of the analytics stack and connecting all those myriad users and applications directly to the data. Others have tried (and are trying) to rebuild using this new architecture, but only Dremio has the technology to deliver incredible query consistency, cloud infrastructure efficiency (for lower cost), and easy accessibility - for all data, all the time.
What’s even cooler is that Dremio slides right in and starts adding value on day one. If you have a highly optimized data warehouse, Dremio can connect to it so you can query across both the warehouse and your data lake storage. What’s more, Dremio works easily with your existing BI tools, such as Tableau, Power BI, and Looker. We also co-created Apache Arrow, with the help of Apache community members, and it now has 6 million+ monthly downloads. Apache Arrow users will feel right at home with Arrow Flight, a new general-purpose framework that simplifies high-performance transport of large datasets over network interfaces.
The Results Are Orders of Magnitude Better
Armed with this new architecture, our customers are on average getting 100X faster time to value, 10X greater efficiency, and zero copies or data movement, all wrapped in a dramatically simpler and easier experience. This is exciting stuff. It all frankly feels a LOT like Pure in the storage space, as Dremio offers a way simpler, way more efficient, and way more effective path to getting value from your data. Many will say “I don’t believe it. It sounds too good to be true!” (like they did all the time with Pure). I couldn’t be more excited to be joining Dremio to get the message out and have enterprises around the globe experience the difference for themselves. It’s going to be a fun ride! And of course it doesn’t hurt that co-founder Jacques is also a Nadeau (he’s only the second other Nadeau I’ve met in the US in 20 years). My dad is busy figuring out whether/how we’re related, so stay tuned on that front. :)
It’s no surprise that we’re growing fast and hiring in every area. If this hot space sounds interesting to you, reach out to us and let’s talk!