DM Radio and The Case for Immediate Analytics
I recently had the pleasure of participating in a DM Radio panel discussing the case for immediate analytics. The podcast was hosted by Eric Kavanagh and included additional panelists Rob Hedgpeth from MariaDB and Matthew Halliday from Incorta.
In this blog I’ll share some of the main points we discussed. It was a wide-ranging conversation, so definitely check out the podcast itself.
1) Why does it currently take so long to go from data to insight?
There are multiple reasons, but a big one is that most companies still use a data warehouse (DW) model to centralize and analyze their data. Meanwhile, most new data is landing in cloud data lakes, so it takes time to ETL that data into the warehouse, apply a schema, and ultimately make it accessible to end-user analysts. And since warehouse queries still often aren’t fast enough for BI and reporting, it takes even more time (and cost) to build external acceleration layers using cubes and extracts.
So if the data is already in the lake, what if you could just query it right there, with high performance, and run BI directly against data lake storage? We think that’s the right approach, because it enables immediate analytics on data while it’s fresh. And that’s exactly what we’re doing at Dremio.
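As a minimal, engine-agnostic sketch of the idea (the data and column names are invented, and the inline string stands in for a file sitting in object storage), here is what "query it where it lands" looks like in plain Python - filter and aggregate the raw data directly, with no load step into a separate warehouse first:

```python
import csv
import io

# A raw file as it might land in a data lake (inline here for the sketch;
# in practice this would be an object in cloud storage such as S3).
raw = """region,amount
east,120
west,340
east,75
"""

# Query the data in place: no ETL into a warehouse, no pre-built schema step.
rows = csv.DictReader(io.StringIO(raw))
east_total = sum(int(r["amount"]) for r in rows if r["region"] == "east")
print(east_total)  # 195
```

A real lake engine does this at scale against columnar formats like Parquet, but the shape of the workflow - read in place, transform at query time - is the same.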
2) What is the role of self-service in speeding up insights?
It’s huge. The need for self-service grows out of the wait data analysts face when data engineering and IT have to provision access. Simply handing analysts data isn’t enough today. As soon as they start exploring it, they come up with new questions that require changes to the data: lightweight transformations or, more significantly, joins with other data sets. Every time this happens - and it happens frequently - it’s another trip through the data engineering loop. On top of that, business requirements are constantly in flux, and those changes drive changes in dashboards and reports that once again require IT involvement.
So data analysts simply don’t have the self-service they need to do their work. At Dremio we believe the solution is to pair fast query performance with a self-service semantic layer. Data engineers work with physical data sets, right in the cloud data lake, and expose only virtual data sets to end-user analysts. Those analysts can then find and work with the data sets on demand, and as they make changes - transforms, joins - they can create new virtual data sets for their own use and to share with others.
In this way data analysts, on their own, can define consistent KPIs and business logic in a single place, with centralized governance and security. The result is a single source of truth, regardless of the visualization tool used.
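The physical/virtual split can be sketched with plain SQL views (the table, columns, and KPI here are invented, and Dremio’s virtual data sets are richer than a bare view, but the core idea is similar): engineers manage the physical table, while the KPI logic lives once in a view that every analyst and BI tool queries.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Physical data set, managed by data engineers.
con.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 100.0), (2, "west", 250.0), (3, "east", 50.0)],
)

# Virtual data set: a view exposing curated, transformed data to analysts.
# The KPI logic is defined once here, so every tool sees the same numbers.
con.execute("""
    CREATE VIEW revenue_by_region AS
    SELECT region, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
""")

# Analysts query the virtual data set, never the physical table directly.
for region, revenue in con.execute(
    "SELECT * FROM revenue_by_region ORDER BY region"
):
    print(region, revenue)
```

If an analyst needs a new transform or join, they layer another view on top - the self-service loop - without waiting for a data engineering ticket.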
3) How is the cloud changing the way we should think about query performance?
The cloud brings a really important new core capability - the separation of compute from data. That separation makes it possible to bring multiple engines to bear on the same data, giving tremendous flexibility and freedom.
It also enables independent elasticity for compute and data. That inherent elasticity means we can start trading off performance against cloud infrastructure cost and efficiency. Greater performance is now something you can choose - though of course it comes at a higher price.
It also means that once your environment works the way you want and you’re meeting your performance SLAs for data access, you can choose to use any future performance gains to shrink your cloud compute infrastructure and pay less for it.
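With hypothetical numbers, the trade works like this: if queries get 2x faster and your latency SLA is fixed, you can either enjoy the headroom or cut the cluster roughly in half and bank the savings (assuming near-linear scaling, which real workloads only approximate):

```python
# All numbers are hypothetical, for illustration only.
sla_seconds = 10          # required query latency
nodes = 8                 # current cluster size
cost_per_node_hour = 2.0  # dollars per node-hour
latency = 10.0            # seconds per query today, right at the SLA

# Suppose an engine upgrade makes queries 2x faster.
speedup = 2.0
new_latency_same_nodes = latency / speedup  # 5.0s: well under the SLA

# Option: bank the gain as cost savings instead of extra speed.
# Halving the nodes brings latency back up to roughly the SLA.
new_nodes = nodes / speedup
new_cost = new_nodes * cost_per_node_hour
print(new_nodes, new_cost)  # 4.0 nodes at $8.00/hour, down from $16.00
```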
We built Dremio for a world of resource scarcity and elasticity - we’re highly efficient and able to deliver a lot of performance for a given amount of compute in the cloud. And we do things like completely shut off compute engines when they aren’t servicing queries, and spin them up elastically on demand. So that means you not only get fast query speed, you get lower cloud infrastructure costs at the same time.
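The shut-off-when-idle behavior can be sketched as a simple state machine (the class and method names are invented for illustration - this is not Dremio’s implementation): the engine spins up on the first query, tracks when it was last used, and a periodic check stops it - and the billing - once it has been idle past a timeout.

```python
import time


class ElasticEngine:
    """Toy model of a compute engine that starts on demand and stops when idle."""

    def __init__(self, idle_timeout=60.0):
        self.idle_timeout = idle_timeout  # seconds of idleness before shutdown
        self.running = False
        self.last_used = None

    def query(self, sql):
        if not self.running:              # cold start: spin up on demand
            self.running = True
        self.last_used = time.monotonic()
        return f"results for: {sql}"      # stand-in for real execution

    def tick(self):
        """Called periodically: stop the engine (and its cost) once idle."""
        if self.running and time.monotonic() - self.last_used > self.idle_timeout:
            self.running = False


engine = ElasticEngine(idle_timeout=0.0)  # zero timeout so the sketch runs fast
engine.query("SELECT 1")
assert engine.running                     # engine came up for the query
time.sleep(0.01)
engine.tick()                             # idle past the timeout: shuts off
print(engine.running)  # False
```

The point of the model is that compute cost tracks actual query activity rather than a cluster that runs around the clock.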
If you’re interested in how an open data lake architecture can speed up your time to insight, check out our company datasheet and architecture guide - and then give us a shout. You can also try us out right now via the AWS Marketplace - for free!