March 2, 2023

11:45 am - 12:15 pm PST

Unboxing the Concept of Drift in ML

Drift is a common phenomenon that occurs during the lifecycle of your model. It is one of the leading causes of performance deterioration. This session will unveil the concepts of drift, including model drift and data drift. This will also cover some of the practical techniques to handle drift and some of the challenges associated with it.

Topics Covered

Data Science



Note: This transcript was created using speech recognition software. It may contain errors.

Supreet Kaur:

Hi everyone. I'm just going to share my screen and do a brief introduction before I go to the topic. So let me just expand my screen. I'm here to talk about a very important topic. I think it is definitely underrated in the world of machine learning, but it is very important, and it is very important to me because I have been in two of the most regulated industries. I was in healthcare, now I'm in finance, and I have seen the impact that drift can have on your models, and sometimes it can be a life and death situation. So right now I am an AI product manager at Morgan Stanley. My day-to-day entails managing the entire product lifecycle of an AI product, but I also develop POCs with the data science team. Previously, I was a data science consultant in the healthcare space.

In terms of my education, I have completed my MBA plus MS (it's called an MBS) in data science from Rutgers, and I'm originally from India. So without further ado, the agenda for today is pretty straightforward: giving you a brief overview of what drift is, then what the different types of drift are. I'll also touch upon some statistical techniques to detect drift, and then how you can deal with it. I'm also going to talk about some of the tools and technologies that are available out there for you to integrate into the entire lifecycle of your machine learning project or product. And in the end, I'll summarize everything that we discussed in terms of best practices.

So what is drift? As the word suggests, drift is basically a gradual change or degradation of a model's performance and data properties over time. People often refer to it as model drift or data drift, but if you understand drift, the model drift creeps up because there is drift in your data, so both terms in my head are interconnected. It is a phenomenon that occurs when the data flowing into a machine learning model changes from the original data that was used to train the model. In other words, the model can no longer accurately predict data points that were not included in the original training set; it's basically not able to generalize anymore and predict as well as it was supposed to. Another important thing: all of us who were data scientists in 2020 saw a drastic change in user behavior.

We all started seeing people preferring loungewear over office wear; that was one of the biggest consumer changes, along with people buying masks and sanitizer on a scale that they never did. That is also one of the causes of drift, and I'm sure you must have observed how the accuracy of your models went for a toss during that time. Now, as I mentioned in my introduction, the impact of drift is especially serious if you're working in an industry with high-impact problems. I'll give you an example of a use case that I have worked on, of which I'm very proud, but which at the same time was scary. I was working for one of my clients in the pharma space, and we used to predict how many units of a drug would be required every month for patients suffering from a rare cancer.

And it is very important for the medicine to reach them on time for them to, honestly, be alive. So if something like drift occurs and creeps into your model, the accuracy and the predictions will go for a toss, you won't be able to predict inventory on time, and you won't be able to manage the entire inventory lifecycle. And it cuts both ways: even if you over-predict, you have to take write-offs, and all of these drugs are super expensive, so it will cause your revenue to go for a toss. Both of those cases are extreme, and that is why it's important for us to understand this concept and make sure that we are doing everything in our power to manage it.

So now, the different types of drift that occur. If you read any book or blog on drift, all of these terms are used interchangeably. I honestly don't even know if there is one definition that exists for all these phenomena, so I want you to take all of these concepts with a grain of salt; this is just to give you context on the different types of drift that exist, instead of getting into the technicalities of the names, which we shouldn't really care about. So let's start with concept drift. I just gave you the example of COVID-19, when the performance of the model was hit for most use cases. That is because the user preferences changed, and hence that is concept drift.

Now, obviously, if the preferences have changed and the data has changed, your predictions won't be the same, and hence it'll impact your predictions; that's called prediction drift. There are also two types of data drift, which in theory are prior probability shift, which is basically when the target variable shifts, and covariate drift (or shift), which is when your input variables have drifted but the relationship between your features and target variable is still intact. Ultimately, if you think about it, both of these data drift behaviors will lead to model drift, and that is why I said all of these concepts are interrelated.

So now I'm going to shift gears a little bit to the statistical techniques. They're very complicated to explain in such a short time, but still, I want to give you an intuition of two important statistical techniques that, in my experience, I've found useful for the kind of data that is available. The first one is the Kolmogorov-Smirnov test, often called the KS test, and it determines if two data sets come from the same distribution. It's a non-parametric test, which means that it does not require any assumptions about the underlying distribution of the data. That is also one of its advantages, because sometimes you don't know what that distribution looks like. As for how the KS test works: it basically compares the cumulative distribution functions of any two data sets.

And you can calculate a CDF in multiple ways; you can use a Python function or an R function, whatever is viable for you. Basically, by comparing those CDFs, you can see how different your data sets are. Here, the two data sets I'm talking about are your training data and your post-training data, so you should be able to compare them. After that, it's like the simplest statistical test, as I mentioned: the null hypothesis is that the distributions of both data sets are identical, so based on whatever results you get, if the null hypothesis is rejected, we conclude that yes, drift has happened; if not, you're still fine. One of the disadvantages of this test is that in some variants you need to specify parameters of the reference distribution, like location, scale, and shape.
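To make that concrete, here is a minimal Python sketch of the two-sample KS test using SciPy, comparing a training sample against a (deliberately shifted) post-training sample. The synthetic data and the 0.05 significance cutoff are illustrative choices, not something prescribed in the talk:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=1000)    # training-time feature values
serving = rng.normal(loc=0.5, scale=1.0, size=1000)  # post-training values, shifted

# Null hypothesis: both samples come from the same distribution.
stat, p_value = ks_2samp(train, serving)

# A small p-value rejects the null, which we read as "drift has happened".
drift_detected = p_value < 0.05
print(f"KS statistic={stat:.3f}, p={p_value:.2e}, drift={drift_detected}")
```

In practice you would run this per feature (or on model scores) at each refresh cadence, rather than once.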

Sometimes, especially with larger data sets or big data, this might not be possible, which is fine, but if it is, then you can use the KS test. The next one is the population stability index (PSI), and this is a little bit easier to understand than the KS test. It is, again, a statistical analysis technique to detect population changes over time, used to evaluate whether the characteristics of the population, such as your mean and variance, have changed, and how much they have changed over a period of time. This test is heavily used in finance, where it is essential to ensure that investment models or trading strategies continue to perform consistently over time. Some of its advantages I've already mentioned: it's easy to interpret, and it's very efficient, I would say, in identifying trends in the population.
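Here is a minimal sketch of the PSI in plain NumPy. The function name, the decile binning on the baseline sample, and the common 0.1 / 0.25 rule-of-thumb thresholds are my own illustrative additions, not a standard library API:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a new sample.
    Bins come from the baseline's quantiles (equal-frequency decile bins)."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid log(0) / division by zero.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, 5000)
stable = rng.normal(0.0, 1.0, 5000)   # same population, only sampling noise
shifted = rng.normal(0.8, 1.0, 5000)  # mean has drifted

# Common rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
print(f"stable PSI={psi(baseline, stable):.4f}, shifted PSI={psi(baseline, shifted):.4f}")
```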

But again, the disadvantage is that it assumes there's a linear change in the population over time, which might not be the case; that, I would say, is a pitfall of the test. Okay, now shifting gears to some of the techniques to deal with drift. More than detecting drift, you also have to deal with it at some point, right? I think the first two are very obvious: you monitor the model and you monitor the data quality. There are tools, even with your model in production, that let you set an alarm so that if your model deteriorates below a certain benchmark, it alerts you; you know that model performance has deteriorated, and then you have enough time to mitigate it.

And monitoring the data quality is equally important, because most of the time your model will have drifted because there's an issue with your data quality. One thing that has worked in my experience: if you have your DQ (data quality) frameworks, set them up just before your model is refreshed in production, whatever that cadence looks like for you (it might be weekly, monthly, or quarterly), so that if there is an issue with the data, you and your developers have a few hours to look into those issues, analyze them further, and then assess what the performance of the model could be. Sometimes those alarms will be false, and the issues will be minor; that's fine. But at least if you have those frameworks in place, you will be able to detect issues on time and curb them if possible.
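As a toy illustration of the kind of alarm described above, here is a rolling-accuracy monitor. The class name, the window size, and the 0.9 benchmark are all hypothetical; in a real deployment you would typically use a monitoring tool rather than hand-rolled code, and the benchmark would be agreed with stakeholders:

```python
from collections import deque

class PerformanceMonitor:
    """Toy production alarm: track recent prediction outcomes and fire
    when rolling accuracy drops below an agreed benchmark."""

    def __init__(self, benchmark: float, window: int = 100):
        self.benchmark = benchmark
        self.outcomes = deque(maxlen=window)

    def record(self, correct: bool) -> bool:
        """Record one prediction outcome; return True if the alarm fires."""
        self.outcomes.append(1.0 if correct else 0.0)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough history yet
        rolling = sum(self.outcomes) / len(self.outcomes)
        return rolling < self.benchmark

monitor = PerformanceMonitor(benchmark=0.9, window=10)
for _ in range(10):
    monitor.record(True)  # healthy period, no alarm
alarm = monitor.record(False) or monitor.record(False)
print("alarm fired:", alarm)  # enough misses push rolling accuracy below 0.9
```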

So somewhere you have to opt for preemptive approaches when it comes to dealing with drift. Another technique is retraining and redeployment. This is definitely the last resort: if nothing is working, then you have to retrain, so by all means do it, but this is not a safe method per se. Another one is data augmentation and data normalization. Again, it's dependent on the data quality. Data augmentation is great because, if you have a data quality framework, augmenting the data using synthetic data techniques gives your DQ tool enough data to learn what is bad and what is good.

So all of these techniques can help. And data normalization is definitely good for your model as well; everything should be on the same scale, both for drift detection and for your model's training performance. Next are the XAI (explainable AI) frameworks, which have gained a lot of popularity in the past few years. That is because if you opt for a glass-box approach, you will be able to know exactly what is happening with your model, so in case drift occurs, you will be able to backtrack those performance changes. Your business stakeholders and partners will also have some peace of mind, because they know you are able to explain to them exactly what happened, rather than giving them a black-box excuse. And the last one is feature dropping. It is often the case that a few features that were used to train the model might not be relevant anymore six months or a year down the line, so it is important for you to be able to assess which features are important and which are not. You can again use a tool, or some sort of code, to see whether those features are contributing enough, and if they're not, you decide to drop them. So, in a nutshell, continuous testing of your AI model and of your data is important.
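One way to code up that last idea is permutation importance on a held-out set, dropping features whose importance is near zero. The synthetic data, the 0.02 cutoff, and scikit-learn as the tooling are my own illustrative choices; the talk does not prescribe a specific method:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labelled production snapshot: only the first
# two features actually drive the target; the last two are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Importance measured on held-out data; near-zero features are drop candidates.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
keep = [i for i, imp in enumerate(result.importances_mean) if imp > 0.02]
print("features to keep:", keep)
```

Re-running this at each refresh cadence gives you the "continuous testing" loop: features that stop contributing get flagged for dropping before they quietly degrade the model.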

Now, these are some of the tools and technologies. Some of them are open source, and some of them are not and you might have to buy them. I am going to quickly skim through the slides; in no way is this to promote one tool over another, because one size doesn't fit all. It's possible that you might be more comfortable with Deepchecks and I might be more comfortable with MLflow. This is just to name a few of the tools that are available out there; by all means, explore and see whatever is the best fit for your use case.

Okay, so the last slide, to conclude the presentation, covers some of the things that I've already spoken about, but this is more like a summary, or you can say key takeaways. The best practices to deal with drift: obviously, data collection and pre-processing should be done diligently, since as we all know, garbage in, garbage out. Then, have frameworks that continuously flag your AUC, your ROC, MSE, whatever metric you want to measure, whatever your problem looks like, whether it's a prediction problem or a classification problem; it's important to be able to flag it, as I mentioned with some of the tools. Then data labeling is another one. You all may know that the industry is struggling to do accurate data labeling at this point. There are companies who do it for you.

There are tools available that do it for you, but if not done properly, data labeling can induce some kind of bias, or some kind of error, that can ultimately lead to deterioration of your model performance. So if you're choosing an outside vendor, you need to be careful, and if you're designing your own tool, then you also need to be mindful of how you are doing data labeling. Then there's defining a clear benchmark, as I mentioned, in your data quality or model monitoring tools. You are obviously thinking about a benchmark such that if my model's performance goes below it, then it's a problem. Now, deciding that benchmark is not something data scientists or ML engineers can do in a silo. This has to be a collective decision, with business stakeholders and SMEs deciding what that benchmark should look like.

One thing that has worked in my experience is to look at the model accuracy from a legacy tool. So if you were using, let's say, SAP for your forecasting, what was that number? What did that accuracy look like? Comparing that with your model will give you a benchmark range. But again, I don't think you can ever get this to perfection unless you have gone through three or four years of the production lifecycle and now understand the model in and out. Next is more of a technical technique that you can choose, which is ensemble learning with model weighting. This has been a popular technique that people have been using to mitigate drift, but again, it's not for all use cases, so see if it fits yours. And last, but not least, is choosing the right tools and technologies. As you know, there are two kinds of approaches. One is reactive: something blew up, let's do some patchwork and move ahead with our lives. But to be able to mitigate drift, if you really want to ensure the success of your model in production, it is important for you to have some sort of tools or benchmarking in place even before you experience drift for the first time.
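To close with a sketch of that ensemble-with-model-weighting idea: members (for example, models trained on different time windows) are weighted by their accuracy on a recent labelled window, so models fitted to stale data lose influence as drift sets in. The class, the binary-vote scheme, and the toy constant members are illustrative, not a standard API:

```python
import numpy as np

class WeightedEnsemble:
    """Illustrative ensemble with model weighting, for binary labels in {0, 1}."""

    def __init__(self, models):
        self.models = models
        self.weights = np.ones(len(models)) / len(models)

    def reweight(self, X_recent, y_recent):
        # Weight each member by its accuracy on a recent labelled window,
        # so members fitted to stale data lose influence as drift sets in.
        accs = np.array([np.mean(m.predict(X_recent) == y_recent)
                         for m in self.models])
        accs = np.clip(accs, 1e-6, None)  # avoid an all-zero weight vector
        self.weights = accs / accs.sum()

    def predict(self, X):
        # Weighted majority vote across members.
        votes = np.stack([m.predict(X) for m in self.models])
        return (self.weights @ votes >= 0.5).astype(int)

# Toy members standing in for models trained on different time windows.
class Constant:
    def __init__(self, v):
        self.v = v
    def predict(self, X):
        return np.full(len(X), self.v)

ens = WeightedEnsemble([Constant(0), Constant(1)])
ens.reweight(np.zeros((10, 1)), np.ones(10))  # recent labels favour the second member
print(ens.weights)  # the second member now dominates the vote
```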