March 1, 2023

10:45 am - 11:15 am PST

Leveraging a Data Lakehouse for Energy Trading

Sharing analyst data is a key success factor for us as an energy trading company. Designing, building, and operating a platform to support hundreds of users is non-trivial. We at RWE Supply & Trading, together with Baringa as our consulting partner, have successfully undergone this journey to build such a platform. In our talk we will give insights into our goals, our approach, best practices, and learnings in using Dremio in our architecture.

Topics Covered

Customer Use Cases
Enterprises
Lakehouse Architecture

Transcript

Note: This transcript was created using speech recognition software. It may contain errors.

Nick Plassmann:

I’m very happy to present this use case here for all of you. Yeah, as Tony already said, this is about leveraging a data lakehouse for energy trading. This is a strategic initiative we kicked off a few years back in RWE Supply and Trading, and something we would like to share our experiences on. Just a word on myself: my name is Nick, Nick Plassmann. I’ve got quite a few years of experience in the IT industry. I’ve worked with different data warehouse projects and products, and over the past years I’ve been working on an initiative in RWE Supply and Trading which we called Lead in Data, something we will go into more in our presentation. I work for RWE Supply and Trading, and I’m based in Germany, one of our many locations, which also include the UK. With that one, over to you, Costas.

Costas Gavriel:

Thank you, Nick. Hi everyone. My name is Costas Gavriel. I’m a senior manager in a company called Baringa Partners. I have about a decade of experience in data analytics, cloud delivery, and machine learning across a number of industries. And for the last three years, I’ve been partnering up with Nick and the RWE team to support them on their data transformation journey. I’m currently the technical lead on the project we’re about to introduce in this session, and it’s all about supporting the RWE team in rolling out a brand-new cloud platform to enable data-driven decisions around trading. So yeah, that’s it about me. Nick?

Nick Plassmann:

Thank you, Costas. Yeah, maybe a few words on who we are as companies. About RWE: we are one of the leading energy companies in Europe, and we have been around in the markets for more than 120 years. We started off our journey, and ran for quite a while, in the traditional energy production business with conventional power plants. In the past years we have been focusing very much on green energy, so we’ve got a very considerable green energy portfolio, both wind and solar. And with those portfolios we are focusing very much on the commercial optimization of those parks and on trading the power that comes out of our renewable energy parks. With that energy trading, we obviously also need some underlying data for our traders and our commercial analysts. That data, and the platform to work with it, is provided by the platform we are talking about today. So that’s the background about the company, but also already a bit about our platform.

Costas Gavriel:

Yeah, I just want to introduce my company as well. As I said, it’s called Baringa Partners. It’s a management consultancy headquartered in London in the UK, where I am based as well, but we have a global presence in the Europe, US, and APAC regions. Our heritage is mainly in energy, but we’ve been supporting clients across a range of industries as well. I’m specifically part of the data analytics and AI team. We’re a team of about 75 data professionals, and our approach is very much outcome focused. We’ve been partnering with clients like Nick and the RWE team, helping them with their data transformation journeys, and that includes anything from data strategy all the way to deep technical execution on the ground, implementing end-to-end data products and solutions. Good. Nick, I think we should go into the topic now.

Nick Plassmann:

Yeah, wonderful. So maybe to start with, it’s probably important to understand why we started our journey, why we initiated our project, which we titled Lead in Data. What is it we wanted to achieve? The key drivers that we identified back then at the start of our project were that we wanted to be more efficient in finding data. In our company, we have a tremendous amount of data. It can be various things, mostly prices, but then there are other influencing factors; weather data, of course, is quite important these days. But finding the right data that is available in the company has been a challenge for the different analysts, and that’s something we wanted to address. Not only that: we have different data sources, different data buckets I would call them, and one of the key challenges was also to provide one access platform.

So that is something we wanted to do: make it easy once the data is found, and also go from finding into accessing. Now, with data getting more and more important and data volumes getting bigger and bigger, scalability was one of the very early points we identified to be very important. And in order for the models with which we model the markets in which we are present to be more precise, we also need more fine-grained models in various dimensions, and hence the models get bigger. So scalability is really the key point that lets us work with large-scale models. That’s something we saw from the very beginning, and we wanted to look at and understand how we can do that: move away a bit from local compute power into something that offers scalability and also some ad hoc mechanisms to do that.

These are very tangible points. But apart from that, we also really wanted to look at data culture. Now, what does data culture mean? There are various aspects for us, and one of the key points is that we wanted to enable a sharing culture. If we have various analysts looking after different aspects, different markets, we want to make sure that the different market views can be shared. With that one, we also wanted to make sure that our analysts can make use of the most modern technology, and that also includes data literacy and upskilling, which is something we integrated into our project from the very beginning. So we really go beyond the technical platform and make sure that we have the organizational setup around it.

Apart from sharing the data, which I’ve mentioned earlier, it is of course also important for us to be able to share some logic. We really want to make sure that we can reuse what is there already, and that also includes reuse of results from our models. That is one key aspect that we see: we want different teams to be able to very easily exchange data. And last but not least, we found in the past that we were sometimes locked in a bit with technology, and we wanted to be more open and able to absorb whatever potential technology offers us. One of the key examples for us is machine learning. So that is something we wanted to incorporate in our platform from the very beginning, so that we are set up for those technologies and enable easy integration of whatever is available in the future. This should give you a bit of an idea of what we wanted to achieve and what our target setting was from the very beginning. Now let’s take a bit of a closer look at how our platform looks. So over to you, Costas, with a few more details.

Costas Gavriel:

Yeah, and as Nick said, there was a wide range of ambitions set from the very beginning. As you can imagine, for RWE, being a trading company, rapid access to timely and accurate data translates into more informed trading decisions. So when the project started about three years ago, there was a clear steer from the board, and the vision around data was to be, essentially, the leaders in the use of data in the trading space. To enable all these ambitions that Nick described, we identified a clear need for a new, modern technical platform to support and enable the commercial analysts to perform their data operations more efficiently. That meant being able to find, access, analyze, and visualize data a lot more efficiently, and that’s roughly what you see in the diagram on the right-hand side of this slide.

Essentially, there are three key boxes in what we have delivered as a platform. There is a data catalog component, for which we have the tool Alation, which enables users, analysts, traders, and even IT to go and find the data they want to access. We have Dremio as our access point for all the data, and it is literally placed at the heart of this platform, allowing access to data across a range of sources: relational databases, the data lake, and streaming data from Kafka. And then on the right-hand side, we placed Databricks, essentially a wrangling tool that allows power users to use Python and run modeling, analysis, and reporting on top of the data. The combination of the three tools serves the needs of those users. Now, on top of all this, it’s not just the technical side of things. We also have to enable collaboration through the tools, and we managed to do that efficiently through all the components of the platform: enable safe and efficient sharing of data, with certain constraints that need to be enforced across the platform, but also enable self-service for the users to be able to bring and onboard their own data, merge it with existing data sets available in the platform, and then enrich their analysis as they go along.
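As a rough illustration of this access pattern (not code from the talk): an analyst might query Dremio from Python over its Arrow Flight endpoint, along these lines. The host, credentials, and dataset path below are hypothetical placeholders.

# Minimal sketch: querying Dremio from Python via Arrow Flight.
# Host, port, credentials, and the dataset path are hypothetical placeholders.
from pyarrow import flight

client = flight.FlightClient("grpc+tcp://dremio.example.internal:32010")

# Dremio's Flight server returns a bearer token after basic authentication.
token = client.authenticate_basic_token("analyst_user", "analyst_password")
options = flight.FlightCallOptions(headers=[token])

# Any Dremio SQL works here, including queries against virtual data sets.
query = 'SELECT * FROM analytics."power"."de_hourly_prices" LIMIT 1000'
info = client.get_flight_info(flight.FlightDescriptor.for_command(query), options)

# Results stream back as Arrow record batches and convert cleanly to pandas.
df = client.do_get(info.endpoints[0].ticket, options).read_pandas()
print(df.head())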

Now, as you can imagine, it’s not just tech, it’s not just a platform; there’s a lot more coming around that. So alongside all these technical activities, we had to enable a brand-new operating model, new ways of working, and new governance to support this new platform coming into RWE Supply and Trading. And more importantly, we built a brand-new organizational unit, a new department, the Lead in Data team, which is essentially responsible for maintaining and supporting the platform, and supporting the users as well. It’s a team of about 30 multidisciplinary engineers, data engineers, developers, and BI developers, who are all about supporting the users as part of that journey. Now, as I said, the project started about three years ago and is still a bit underway, with a cutoff date around the middle of this year to transition to BAU. But we’re currently in a position where we have around 600 users on the platform from 73 different trading and analysis teams, around 200 terabytes of data available across all the systems and sources, and around a hundred thousand different data sets. We have around 200,000 queries daily on Dremio, about 80,000 virtual data sets, and quite a lot of ETL jobs on top of the data. And all of these are still growing as we move along into BAU, and there’s a lot of growth expected in the platform over the next years.

I just want to touch a bit more now on some of the principles behind the platform and how we approached everything; they roughly divide into three different areas. First of all, it’s all about us leveraging the cloud. We wanted to essentially not start from scratch, from a blank piece of paper. We wanted to bring in quite a lot of functionality ready-made through the tools I mentioned, Alation, Dremio, and Databricks, and on top of that, build customization that would allow us to deliver more tailored functionality for the traders and the trading analysts. We introduced cloud, which offers scalability, and on top of that, we also said from day one that we wanted to bring a full DevOps discipline to the deployment of the platform. We really wanted to adhere to really high SLAs, and we wanted one-click deployment across the entire infrastructure, end to end.

We have been very big advocates of SaaS solutions. We have Alation and Databricks on the cloud via SaaS. We wanted to go with Dremio Cloud, but at the time, three years ago, it wasn’t really at the stage to support us. It’s on our midterm roadmap, though, to also move to Dremio Cloud and get those benefits there. Then, around the data and the integration of all the data sets, we wanted to have a single point of access, as I mentioned, but we followed a bit of a hybrid approach. We did a blend of data lake, relational databases, and streaming sources, but we didn’t want to ETL the data into one place. We pretty much used Dremio to keep that decentralized ownership of data at the different sources and by the different teams.

With Dremio, we managed to abstract that complexity and build business-intuitive views with a whole semantic layer around it. The performance came on top through different techniques: reflections, caching of data, and precalculation of quite a lot of data sets. We’re now able to build interactive dashboards in Tableau, for example, based on billions of rows, and filter on the fly with minimal latency. As I mentioned earlier, quite a lot of focus has been placed on enabling user self-service capability, and collaboration has been part of the project from pretty much the beginning, when we were doing POCs and pilots with quite a few vendors. We also enabled rapid onboarding of data for the users. So if a trader in a specific country or industry or commodity wants to explore some new data sets for their analysis, they are able to do that now with the platform, and they can do it at scale by blending data across different sources. And this whole platform is centrally hosted, which allows different trading teams across the globe to share both data and code and standards and to collaborate a lot more efficiently. There are a few other additions that come on the back of that, but hopefully this gives you a bit of an idea of how we approached this project from the beginning.
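To sketch what a business-intuitive view in that semantic layer might look like (illustrative only; all source paths and column names below are hypothetical): a virtual data set in Dremio can join a relational price source with weather data from the lake, without copying either.

# Minimal sketch of a semantic-layer view in Dremio, expressed as Dremio SQL
# held in a Python string. Source paths and column names are hypothetical.
create_view_sql = """
CREATE OR REPLACE VIEW analytics."power"."de_hourly_prices" AS
SELECT p.delivery_hour,
       p.price_eur_mwh,
       w.temperature_c,
       w.wind_speed_ms
FROM   postgres_trades."market"."day_ahead_prices" p
JOIN   lake."weather"."observations_hourly" w
  ON   p.delivery_hour = w.observation_hour
"""
# Run via the Dremio SQL editor or any Dremio client. A reflection can then
# be enabled on the view (in the UI or via DDL) so that dashboards over
# billions of rows are served from a precomputed materialization rather than
# recomputed from the raw sources on every query.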

Nick Plassmann:

Cool. Thanks, Costas. Yeah, maybe it’s worth bringing it a bit more to life and looking at a use case. What can this platform do, and more importantly, what can our users do with the platform? This is one example, the long-term power forecast, which our power analysts have done and which was one of the first use cases where we and our users have been extensively using the platform. So what is the situation we came from, before introducing Lead in Data and the platform? Basically, we had been working very much with manual processes and the related local desktops, which led to slow processing of data. With that, we were also quite limited in terms of the amount of data, and hence also the history, that we were able to process and work with.

Also, with that, the visualizations that were available were somewhat limited, so in that sense it was a very limited approach. There was a lot of desire to extend that, and with the platform, and Dremio sitting at the center of it, I think we’ve now managed to come to a very different setup. Now we’ve got results that we can process in a fully automated fashion. The data really flows through pipelines end to end, and there is no manual intervention required. Hence the time that it takes to generate results has also been reduced significantly, by something like 60 to 70 percent, which is really considerable. Of course, one of the key success factors in that is that we have cloud-based storage.

We also have cloud-based compute power. As Costas mentioned already, when it comes to the complex models we are using Databricks, but then again, the sharing is done very much in Dremio, where results and different slices of results are shared with different teams via Dremio. And here, the customizability of those views, so exactly which data is to be shared with which teams, is something teams can very easily take into their own hands. The self-service element was something that we looked at from the very beginning, and I think here it really comes to life that teams can take exactly the data they need out of the large data model. With that, we have got quite a few reusable components.
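Picking up the point about teams taking sharing into their own hands, here is a hypothetical sketch of what such a team-owned slice might look like: it is just another Dremio view layered over the shared model output, so no data is copied. All names below are illustrative.

# Minimal sketch: a team-owned view slicing a shared model-output data set.
# Space, data set, and column names are hypothetical illustrations.
team_slice_sql = """
CREATE OR REPLACE VIEW team_de_power."forecast_slice" AS
SELECT forecast_date,
       delivery_hour,
       forecasted_load_mw,
       forecasted_price_eur_mwh
FROM   shared_models."long_term_power_forecast"
WHERE  market_area = 'DE'
"""
# Because the slice is just another virtual data set, the owning team can
# adjust it without copying data, and access can be governed on the view
# rather than on the underlying model output.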

When we look at the data pipeline, but also at the visualization and at the views, these are all elements that we can reuse for other use cases. That is also something, by the way, that we did via a community which we put into place: this use case was implemented by one user, and other use cases are implemented by other users, and having that community is a really key element. So reusability and sharing is another aspect that comes out here.
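As a hedged illustration of one such reusable pipeline step in the Databricks layer: a scheduled PySpark job might score the forecast model and write results to the lake, where Dremio then exposes them. The paths, columns, and model logic below are hypothetical placeholders, not the actual pipeline.

# Minimal sketch of an automated pipeline step, assuming a Databricks/Spark
# job. Lake paths, columns, and the model logic are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def score_long_term_model(features_df):
    # Hypothetical stand-in for the real forecasting model.
    return features_df.selectExpr(
        "market_area",
        "delivery_hour",
        "load_mw * 1.02 AS forecasted_load_mw",  # placeholder logic only
    )

# Read features, score the model, and persist results partitioned by market,
# with no manual intervention anywhere in the flow.
features = spark.read.parquet("s3://trading-lake/power/features/")
forecast = score_long_term_model(features)
(forecast.write
    .mode("overwrite")
    .partitionBy("market_area")
    .parquet("s3://trading-lake/power/long_term_forecast/"))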

Now, if we take a step back from that use case and take a closer look at what the ambition was and where we are today, what the status is and what the benefits are, I think it’s quite remarkable where we stand. We have a dedicated governance team in place; Costas has mentioned it. It’s not only the platform, but also how we use the platform. So we have a team in place that is basically looking after making sure we have the right processes for sharing data, taking license requirements into account, and so on and so forth. That is quite a substantial element of the overall solution. As I mentioned earlier, we have a very strong focus on upskilling. We have our own training pathway that we have defined for our users, to upskill them and improve data literacy. We have a dedicated department, a dedicated team; we’ve set up not only the governance team but also the technical team looking after our platform. And we’ve even taken further steps to bring more data and IT competencies together into one larger team that we’ve just built.

Data culture and data ownership are key elements. We have quite a lot of data, about a hundred thousand curves in our data catalog. A key element is that ownership is being taken by the different data owners and data stewards, so that, for example, metadata is kept up to date. The sharing element is also key in our culture, and we now see it coming across in many, many use cases like the one I talked about earlier. So quite a lot of these are more cultural elements, but there are also a lot of technical elements. Costas, do you want to quickly touch upon those?

Costas Gavriel:

Yeah, from a platform and technical point of view, we’re still on the journey of completing and finalizing the platform. We are on the cloud and we have the scalability aspects of it, but there is a lot to be done to make the platform more and more mature and established across the firm. From a data point of view, we’re around halfway through the integration of data sources and data sets; we expect quite a lot to be done over the next few months. We’re also giving more time to users, especially users who sit more at the non-coding end of the spectrum, to come on board and better leverage the platform. Some of you might laugh, but we went and implemented custom Excel plugins that connect to Dremio and allow users to access bigger data sets, or analysis on top of big data sets, and bring it down to Excel.

It can help traders do that kind of quick investigation on top of data. In a way, we had to cater for a wide range of users and audiences, all the way from data engineers and data scientists to traders who are more familiar with Excel and some basic BI tooling. So we have been working quite a lot on integration, and also on making life a lot easier for users to access data. We’re still underway with the full integration between our catalog tool and Dremio. We want to have single-click access, where you search for, I don’t know, gas prices in the catalog, and the catalog gives you all the entries; you click a button, and that takes you along to Dremio to access and query the data and feed it into your analysis and tools. So we want to build this vision of a one-stop shop for the entire platform, for everyone to serve their data needs and also be enabled to do their analysis at scale. So yeah, hopefully we’ve completed the first part of that journey, and we’ll keep building along the way.
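To illustrate the kind of round trip such an Excel integration performs (a sketch under assumed names, not the actual plugin): pull a Dremio result set over Arrow Flight and hand Excel a bounded extract.

# Minimal sketch of an Excel round trip: query Dremio, cap the extract,
# and write a workbook. Connection details and the query are hypothetical.
from pyarrow import flight

def dremio_to_excel(query: str, path: str, max_rows: int = 100_000) -> None:
    client = flight.FlightClient("grpc+tcp://dremio.example.internal:32010")
    token = client.authenticate_basic_token("trader_user", "trader_password")
    options = flight.FlightCallOptions(headers=[token])
    info = client.get_flight_info(flight.FlightDescriptor.for_command(query), options)
    df = client.do_get(info.endpoints[0].ticket, options).read_pandas()
    # Cap rows so the workbook stays well under Excel's sheet limits.
    df.head(max_rows).to_excel(path, index=False)  # needs openpyxl installed

dremio_to_excel('SELECT * FROM analytics."gas"."ttf_prices_daily"', "ttf_prices.xlsx")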

Nick Plassmann:

Thanks, Costas. So yeah, I think that concludes what we wanted to share: a super exciting use case and platform that Costas and myself have been working on over the past months and even years. So yeah.
