

Subsurface Summer 2020

Extracting Value from Data Assets

Session Abstract

Exelon Utilities delivers electricity and natural gas to approximately 10 million customers in Delaware, the District of Columbia, Illinois, Maryland, New Jersey and Pennsylvania. Recently, Exelon Utilities embarked on a multi-year journey to amplify the knowledge generated from its data assets and develop analytics-driven products to improve operational efficiencies and deliver a premier customer experience.

This presentation will cover the foundation that Exelon Utilities relied upon to assemble a sustainable data science team and create a culture of data-driven innovation. Further, it will dive beyond the surface to discuss how a combination of open source and proprietary technology drives adoption of analytics within the business and maximizes the value extracted from Exelon's data assets.

Presented By

Yannis Katsanos, Head of Customer Data Science, Exelon Utilities

Yannis Katsanos is the head of customer data science in the customer analytics department at Exelon Utilities. In this role, he is charged with developing analytics-based solutions that focus on delivering value to customers and reducing operating costs. He comes to Exelon with more than 20 years of experience in creating value through analytics, having worked across several industries and handled data in the range of several petabytes and trillions of rows.


Webinar Transcript

Host

For those of you who are joining us, thank you for attending one of the final sessions of the day. I would like to introduce Yannis, head of customer data science of Exelon, and he will be taking questions if there is time at the end of the session. You can either type them into the chat here, or you can click on the audio share in the top right hand corner. And you can also ask your question live. If we don't get to your question, we can answer it at the Subsurface community. Yannis will be available there in his dedicated channel. And then, for those of you who are wanting to see a larger screen of the presentation, you can double click on the presentation tile, which will expand the video so you can get a better look at Yannis' slide. And with that, I will turn it over to you. Thanks, Yannis, for presenting.

Yannis Katsanos

Thank you, [Colleen 00:01:07]. And I would like to thank the Dremio team and the Subsurface organizers for the invitation to discuss how Exelon Utilities have been using all this great technology that we covered earlier in the conference to extract the value from our data assets. And of course, I know that there are some definitions in there. I mean, why we call it data assets? What do we mean by extracting value? And things like that.

But before starting, I should say that I have been in my role with the Exelon family for about five years now, actually for more than five years now. And when I joined, the idea was how can we take all that data that Exelon Utilities has been gathering and try to create insights out of that? What started as a simple question, of course, morphed into a long-term strategy, and eventually our goal is to use all that [inaudible 00:02:23] to get into our data and be able to drive business decisions with that data.

The area that I'm going to cover in my presentation is, of course, to explain what all these words mean, Exelon Utilities, et cetera. Then how we use data science as the spearhead, if you want, to create value out of our data assets.

But at the same time, be able to expand the usage of data, since at the end of the day, you are a data scientist. When you join any organization, you want not to run the day-to-day reports, but to be able to look into the big picture items.

At the same time, these day-to-day reports need to be run by someone, and if you as a data scientist are not going to do that, you need to create the culture and the environment so that other people can do that.

So, I'm going to discuss the technology side of it, what we have put in place in the form of a data lake to allow that wide usage of data. But at the same time, what we have done from the personnel side, to be able to extract those insights from our data systems and not create what is sometimes the risk: a very expensive paperweight.

First of all, Exelon Utilities. If we think of energy, electricity, from the perspective of an electron, initially what happens is that the electron needs to be generated somewhere. And that is the purview of Exelon Generation. Then that electron gets bundled with other electrons in a product that goes into the open market so people can bid on that electron to get the energy that they need. And that is part of what is called the marketing function, and within the Exelon family, that is handled by Constellation.

And then you hear about what we call the utilities, on the transmission and delivery side, which is taking those electrons from point A, which usually is the interconnection to our grid, to point B, which is the consumption, to allow you guys to watch a webinar about data usage in the electric industry.

Exelon is a Fortune 100 company. As I said, we are split on the unregulated side, i.e. generation and marketing, and the regulated side, the Exelon Utilities. Exelon Utilities itself is the corporate structure of several smaller utilities or operating companies. Overall we serve Northern Illinois, Philadelphia, Baltimore, DC, Maryland and Southern New Jersey.

We serve 10 million customers, if we count customers as service points, or about 25 million people. And of those 10 million service points, 9.5 million are so-called smart meters, which are nothing more than IoT devices in the field.

So, our company is one of the larger implementations of IoT in the field, one that has been running for several years now, and the meters gather data about electricity usage. And so, overall, we have to deal with data issues that may not be on the level of a Google or a Facebook, but certainly it is a higher data volume than what you would traditionally expect from the utility industry.

For Exelon, in order to be able to deal with this challenge, we created an analytics organization. The analytics organization is split in two parts. Infrastructure, which deals with our greater... as machinery. And then you have the customer side, which is the side of the company that deals with customer interactions. And that means trying to understand how customers use energy, how they interact with the company, and of course, products and services that would be appropriate for those customers.

Overall, we are dealing with several petabytes of data. Again, as I said, maybe not as big as other technology companies, but at the same time, significantly bigger than what Exelon or the subsidiaries had to deal with in their more than 200 years of history.

And of course, you may have picked up that I called Exelon a technology company. I remember when I first entered the private sector, moving out of academia. I heard a presentation where someone said that in the future, there are going to be two types of companies: digitally, technology-driven companies, and those that didn't make it. And Exelon made that choice a few years back that we are going to be that technology, digitally driven company.

I sit on the customer side. I'm leading the data science group there. And as part of customer analytics, our perspective is trying to understand, as I said earlier, using all that sensor data that we have in the field, how our customers use energy. But at the same time, trying to understand how they communicate with the company, et cetera.

So we take that data, which can be usage, billing, of course, call-level data, web data, et cetera, and put everything together and combine it, in order to improve customer satisfaction. That is always our goal as a group, while at the same time increasing our own savings, which eventually would lead to lower rates and higher customer satisfaction.

I want to talk a little bit about our data science approach. When I first joined Exelon Utilities and we started building the organization, the view that we took was that, as data scientists, we have mainly three roles in the company.

The first role is a technology consultant. So be able to scout what is the greatest and the latest in the field, and be able to see how that fits on our legacy technology environment, and be able to make some wise choices by filling the gaps or sometimes pushing the envelope. That is about 10% of our time.

Another 30% of our time is what I call acting like a management consulting company, where someone, an internal client, asks us a question. We go into the data, we analyze the data, and then we produce a PowerPoint as the final deliverable. If someone wants to ask the same question next year, for example, we'll have to do everything from scratch.

A question that we get is why, for example, this year we had a lower call volume than the previous year. Then we explain that drop, and if we can explain the drop, we see whether it is sustainable.

So at the end of the day, we just produce, as I said, a PowerPoint that includes the analysis and potentially some insights on what to do in the future.

Where we spend about 60% of our time is creating analytics products. That is, being in control of the full chain, from the data acquisition, often data extraction from our systems, through analysis, to packaging it into a solution that our internal clients, or even our customers, can interact with.

And this is something that can be repeatable. Not only repeatable, but automated as well. And be able to just create the solution, put it into production, and then go to the next problem. You can see things like that being a recommendation system that can be surfaced through our website. That can be propensity scores for a customer to adopt a particular new product, and things like that.
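To make the idea of a propensity score concrete, here is a minimal sketch in Python. It is not Exelon's model: the feature names, weights, and bias below are hypothetical, chosen only to show how a logistic function can turn customer attributes into a score between 0 and 1 that is then used to rank customers.

```python
import math

# Hypothetical weights for illustration only; a real model would be fitted to data.
WEIGHTS = {
    "monthly_kwh": 0.002,
    "web_logins_per_month": 0.3,
    "has_smart_meter": 0.8,
}
BIAS = -3.0


def propensity_score(features):
    """Return a logistic score in (0, 1); missing features count as 0."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up customers, ranked from most to least likely to adopt the product.
customers = {
    "A": {"monthly_kwh": 500, "web_logins_per_month": 0, "has_smart_meter": 0},
    "B": {"monthly_kwh": 900, "web_logins_per_month": 4, "has_smart_meter": 1},
}
ranked = sorted(customers, key=lambda cid: propensity_score(customers[cid]), reverse=True)
```

Once such a scoring function is packaged and scheduled, it can be rerun on fresh data without redoing the analysis, which is the repeatable, automated quality described above.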

But overall, we see our team as being at the heart of the full analytics chain and actually touching all of the elements. That's why, for example, you see me here presenting at a data engineering conference, not only how to use the data, but what happens in the earlier stages of that analytics chain.

Now, what makes all the magic really happen? And that is our data analytics platform. When we first started on the journey, about five years ago, a little bit less than five years ago, we had more than 100 disparate systems across our various data centers, or many of them sitting with external vendors.

The task of getting access to that universe of systems was always a very difficult one. And in particular, I remember from my early days, it took me three months to get access to the first system.

So when we started designing what the data science program would be in Exelon Utilities, and in particular how that will evolve over the years, we decided that the first problem that we need to solve is the accessibility. So take all those systems and bring them into a central repository. And then make sure that data scientists have the tool set to run their analytics on top of that system. But at the same time, make the system flexible enough that business analysts and other folks who want to include data in day-to-day operations are able to do so.

Now, of course, at that point in time, we had to build the system on-prem, reflecting the realities of five years ago. And I mention five years ago because for the analytics industry, that is like the previous century. But for the utilities industry, that is yesterday. So, at the same time, we have to bring onto the same page the timescales of the utilities industry and the timescales of the analytics industry.

We named that system the data analytics platform, or DAP, because I think it's a requirement that everything has to be an acronym. And the idea in that DAP is that data scientists will be able to get the highest value data, which is the data in its rawest form, on which you can perform the more complicated operations.

But at the same time, being able to create a curated system where business analysts will be able to run their trusted day-to-day reports. My group operates in that environment as a Python shop. So we made the investment early on, with Anaconda Enterprise being our main entry point to the Exelon Utilities data world. And of course, when we say Python, we mean all the greatest and latest modules that you can find in the Python ecosystem.

We have things like Dremio, Plotly, Alation, of course. And even though this is the currently enumerated system, we want at the same time to be open to what is new out there and see where we have gaps, to make sure that we go and fill those gaps.

But I talked for almost 15 minutes now on the abstract. I will continue being on the abstract, but maybe a little bit more in detail about our data analytics platform. And that data analytics platform, as I said, at the beginning you have chaos, you have all those different systems. If we look only at the customer data, the customer data itself, I mean records that describe our customers, who they are and how they pay and the programs that they participate in, live in 25 distinct systems.

The sensor data that we have from the smart meters and other data from infrastructure, when you put all this together, gives this ecosystem of 100+ data sources. And every day I'm surprised to find out there is a new data source or data system that, when we did our list, we missed.

So, what we do is we bring that system into our data lake through an integration layer that consists of technologies like NiFi, Kafka, and the Oracle-based technologies like ODI. So we combine all these different data in different processes to bring the data into our HDFS layer.

The HDFS layer is a Cloudera distribution that runs on hardware that Oracle has built and calls the Big Data Appliance. Then we sort of normalize a part of that data to an RDBMS powered by Exadata.

And we do that in order to be able to send data to our external partners. And the data in that Exadata system is more or less reflecting the needs of those external partners, because in order to maximize our value, we decided to take a part of our analytics and develop it internally, but to just send the other part to companies that already have a pre-built solution.
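The flow just described can be summarized as a small directed graph. The sketch below, in Python, is an illustration only: the node labels are shorthand for the stages named in the talk, and the traversal helper is hypothetical, not part of any of the products mentioned.

```python
from collections import deque

# Stages as described in the talk; the short labels are illustrative shorthand.
FLOW = {
    "sources": ["integration"],   # 100+ source systems
    "integration": ["hdfs"],      # NiFi, Kafka, ODI
    "hdfs": ["exadata"],          # Cloudera distribution on Oracle hardware
    "exadata": ["partners"],      # normalized subset sent to external partners
}


def route(flow, start, goal):
    """Breadth-first search: return the list of stages from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in flow.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Calling `route(FLOW, "sources", "partners")` walks the stages in order, and the same structure could be extended with further edges if more of the platform were modeled.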

My favorite part of this data analytics platform ecosystem is, of course, our GPU cluster. I believe we all understand that as a data scientist, I want to have access to the data, but I also want to have access to readily available compute to do our operations. Oops, wrong direction.

But, having that ecosystem, you need the entry points in there. And there cannot be only one entry point, given that most of our users have different levels of sophistication.

So the first entry point is, of course, Alation. Anyone needs to have access to our data catalog and get information about data, and then do the reports through Power BI.

More sophisticated users will have access to our Anaconda Enterprise. And for those people who do not use any of the new technologies, like notebooks, et cetera, and prefer working through Emacs and CLIs, those people go directly into the edge server. Again...

[inaudible 00:20:57].

Host

[inaudible 00:20:57].

Yannis Katsanos

Okay. I was talking about Power BI. Fair enough. So, the vast majority of the users will use Power BI to connect. And as you probably have experienced in the past, a Power BI connection directly to a Kerberized system does not exist. That is why we introduced Dremio, where Dremio can...

Okay. For some reason, I do not have control of my presentation. Okay. So, Power BI can connect to Dremio. At the same time, we can get acceleration through Dremio on the other connections. And of course, my favorite toy, if you want: being able to create a dashboard so that our customers, the internal clients, can get access to Plotly.

Now, all this is good. That is great technology, and it took us some time to put it together. But of course, by giving all this power to the users, we need to make sure that we put some guardrails in place. That's why my team works closely with IT and legal to make sure that we have a data privacy component, and a component for fair and unbiased machine learning. And I'm happy to talk about these efforts maybe on Slack.

The last thing is being able to create as wide an audience, or user base, as possible. In order to do that, we started a collaboration with Coursera, where we made videos available to our users, and actually our users, we sometimes call them employees, can participate in the Analytics Academy.

There is a structure, and based on their personas, they can sign up for particular courses. The courses cover becoming aware of analytics and how to ask other people to do that for you, doing analytics yourself, and of course, being able to start leading an analytics-driven organization.

Certainly I have now migrated to that leading phase instead of being a hands-on data scientist, but we need to make sure that more leaders in the company know what to do about analytics and how to include analytics in their groups.

This Advanced Analytics Academy program has been going for two years now. And we have had some significant success on the use cases coming out of our Analytics Academy program. I will give the example of the estimated restoration time application that started as an academic exercise from the first cohort of our Analytics Academy.

But now my team has taken over that to see how we can productionalize it and deploy it across all of our operating companies.

Now that is our journey up to now. Today we have our analytics infrastructure on-prem, but we have started talking about how we can migrate into the cloud. So for the next one to five years, we will have a hybrid: part of our data on-prem, part of our data in the cloud, and potentially compute in the cloud.

And tomorrow we do see a full transition to cloud. Thank you very much for your attention.



