March 1, 2023

10:45 am - 11:15 am PST

How Wayfair’s product team tracks, visualizes, and analyzes user engagement and behavior data

Product analytics is the process of analyzing how users engage with a product or service. It enables product teams to track, visualize, and analyze user engagement and behavior data. See how Wayfair is doing this across more than 30 applications in Wayfair’s Supply Chain Tech organization.

Topics Covered

Enterprises
Real-world implementation


Transcript

Note: This transcript was created using speech recognition software. It may contain errors.

Siddharth Jain:

Welcome, guys. I hope you’re having a great day at Subsurface. I’ll quickly start jumping into the content that we have. I think we have roughly 30 minutes of time, 28 minutes now. So what we’ll do is go through the material, maybe to the 20- or 22-minute mark, and then leave some time for questions. I’ll try my best to monitor the chat, but in case any questions pop up in the middle, feel free to interrupt me and we can have a good conversation as well. So, with that said, let’s start with the overview. A little bit of background about myself: I’ve been doing data engineering and data integration work for over 20 years now.

I’ve been building data platforms at scale, both on-prem in the past and now on the cloud, using a bunch of tools and technologies. Initially it was a lot of packaged applications, things that you would buy and then implement, and then a lot of open source development using packages that are available out there. I have strong experience building event-driven and batch applications, and I’ve worked at various companies, primarily in the retail and financial services space. And like I said, I am a data engineering lead in the product analytics space within Wayfair’s supply chain technology group. So that’s a quick background about myself. A little bit about the agenda: we’ll quickly go through the agenda, give you a quick overview of Wayfair and what we do as a company, and then get into the product analytics area, what it means to organizations, what it means to Wayfair, the challenges that we faced, how we went about overcoming those challenges, and the state we are in right now. Obviously I’ll share as many of the technical details as I can, and then I’m happy to take questions if you have any afterwards.

So, Wayfair as a company was founded about 20 years ago, or maybe a little over that, and it’s headquartered in Boston, Massachusetts. It’s primarily focused on the huge home goods market. It’s a platform that connects thousands of suppliers with millions of customers; probably thousands and thousands of people shop on Wayfair every day. It has a very strong presence in North America, which is the US and Canada, and then we have a good presence in Western Europe in countries like the UK, Germany, and Ireland. Wayfair is also known for its proprietary end-to-end logistics network that we have built on our own. It enables us to drive faster delivery speeds and reduced damage, and as a result lower costs, which we then try to pass on to our customers.

So let’s talk about product analytics. What is product analytics? Product analytics is the process of gathering and transforming user-generated data into insights that reveal how customers interact with a product. Capturing and analyzing usage metrics provides good information about the quality of the product. Things like error rates and anomalies give us an idea of what the users’ friction points are and what we need to do to overcome them. It also helps an organization track and analyze user journeys. So, for example, what are the steps a user is taking in the application to fulfill a particular request or use a particular feature? It helps you visualize all of that, and then basically gives product managers and engineering managers the opportunity to course correct or change things if they see that there is some anomaly there.
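For illustration only, here is a minimal sketch of the kind of user-journey analysis described above. The event data, step names, and column names are hypothetical, not Wayfair’s actual schema; the idea is simply to show how step-to-step drop-off can surface friction points.

```python
import pandas as pd

# Hypothetical click-event data: one row per user action in an application flow.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "step": ["search", "add_to_cart", "checkout", "search", "add_to_cart", "search"],
})

# Count distinct users who reached each step, then compute step-to-step drop-off,
# one simple way to surface friction points in a user journey.
funnel = (
    events.groupby("step")["user_id"]
    .nunique()
    .reindex(["search", "add_to_cart", "checkout"])
)
drop_off = (1 - funnel / funnel.shift(1)).fillna(0)
print(funnel)
print(drop_off.round(2))
```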

Besides actual user applications, backend services can also be analyzed to understand the impact those services are having. So, for example, we recently worked on a project for the fulfillment optimization team. We performed extensive analytics on the data the service was producing. At a high level, this service was responsible for providing recommendations for order fulfillment based on product availability, cost of shipment, delivery time, things like that. We analyzed the data set, which was quite complex with a huge volume of data, and then we built KPIs around it to see how efficiently the service was being used and how efficient the service was in doing the forecasting, so to speak. For example, we calculated the percentage of recommendations that took the cheapest route, and also the fastest route, and things like that. So a bunch of metrics were calculated, captured, and then shared back with the business. Okay, so let’s move on.
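As a rough sketch of the kind of KPI calculation described here, assuming a hypothetical recommendation log with invented column names (this is not the actual fulfillment optimization data model), the percentage of recommendations taking the cheapest or fastest route could be computed like this:

```python
import pandas as pd

# Hypothetical recommendation log: each row is one fulfillment recommendation,
# with the cost/speed of the chosen route and the best available alternatives.
recs = pd.DataFrame({
    "order_id":      [101, 102, 103, 104],
    "chosen_cost":   [5.0, 7.5, 4.0, 9.0],
    "cheapest_cost": [5.0, 6.0, 4.0, 9.0],
    "chosen_days":   [2, 3, 2, 5],
    "fastest_days":  [2, 2, 2, 4],
})

# Share of recommendations that picked the cheapest / fastest available route.
pct_cheapest = (recs["chosen_cost"] == recs["cheapest_cost"]).mean() * 100
pct_fastest = (recs["chosen_days"] == recs["fastest_days"]).mean() * 100
print(f"Cheapest route taken: {pct_cheapest:.1f}% of recommendations")
print(f"Fastest route taken: {pct_fastest:.1f}% of recommendations")
```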

So at Wayfair, there is a need to identify opportunities, analyze and measure impact, and generate actionable insights, thus enabling our technology teams to make better data-led decisions and make faster progress against their goals. But there were a bunch of challenges with the way we were doing this within Wayfair. Over the course of the years, the processes that were being used lacked cohesion. The work was done in a very ad hoc manner with various degrees of fidelity. No standard process was being followed, which basically meant that we could not trust the data; the results were often inaccurate, and it would take time to generate the KPIs that were needed and then get feedback from them. So all in all, no standard tools were being used.

Data collection and reporting were time consuming, and all that. So there was definitely a need to put a framework in place so that product managers and engineering managers would be empowered to build and own their metrics through self-service dashboards. Also, there was a need to standardize the interface for curation, dashboarding, and metrics calculation to generate consistent output. There was also a need to make the curated data available, searchable, and explorable, so it was not just restricted to that particular domain or that particular business area; it could be opened up to other teams who would be interested in looking at the data. This approach would improve our velocity significantly, because we have more than 30 applications within supply chain, and it would help the product managers shift their focus more into discovery work, as opposed to doing the work of building out the KPIs and things like that.

So what were the technical considerations that we had to take into account? Based on the functional requirements that I just mentioned on the previous slide, we started building out a foundational layer of data pipelines. Our goal was to ingest new data sources that were available out there into a common storage area for easy access. We followed software engineering and data engineering principles to build robust pipelines. Obviously a lot of emphasis was given to data quality and also to observability and alerting, because the scale was just too much. Also, as mentioned, we collected all the data within a domain model so that it’s easily accessible by the product teams. And whenever we created a newly generated asset, we would ensure that it appears in the data catalog.

So, getting a little bit more into the details of our implementation. This shows basically the progression from left to right. Typically we would ingest the data; the data may already be in a BigQuery table in GCP, or it may be an event stream that’s coming in. For example, Google Tag Manager would generate a lot of these events from our applications. We would collect that and store it in what we’re calling a collated layer. Then we would curate that data, doing a lot of the typical data wrangling steps to clean the data and get it into a form where it could be easily queried, as I said. I’ll cover a little bit more detail in later sections, but the data was quite complex at times, and so it required us to use a lot of flattening and cleanup techniques.

Duplicate handling was also a concern, because depending on the volume of the data and how the services and applications are generating it, we want to make sure that the data is clean and also complete at the same time. So we built in a lot of controls in our pipelines to ensure that the data is clean, queryable, and available for people to consume. And then obviously we want to present this data in a fashion that makes sense to the users, and so we are using Looker for most of our reporting and visualization.
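As a hedged sketch of one common way to implement this kind of de-duplication control, not necessarily how Wayfair’s pipelines do it, a BigQuery job run from Python could keep only the most recently ingested copy of each event. The dataset, table, and column names below are made up.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Keep only the most recently ingested copy of each event_id.
# The dataset, table, and column names here are hypothetical.
dedup_sql = """
CREATE OR REPLACE TABLE analytics.events_curated AS
SELECT * EXCEPT(rn)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY ingested_at DESC) AS rn
  FROM analytics.events_collated
)
WHERE rn = 1
"""

client.query(dedup_sql).result()  # blocks until the deduplication job completes
```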

Okay, any questions up to now? Cool, sweet. A quick overview of the tech stack. For the infrastructure, I think I’ll just say it: we are all on Google Cloud. Wayfair is on Google Cloud, and a lot of our applications are running on GCP. We are obviously using tools like Datadog and Slack for monitoring and alerting. For DevOps, we built elaborate CI/CD pipelines for doing all of the work; we use Jenkins and Buildkite to do a lot of our deployments and automated testing. We are using Cloud Composer, which is basically a wrapper around Airflow within GCP, for a lot of our pipeline development. So typically Airflow would be the orchestration tool we’d be using, and then it would call either GBQ queries or Spark jobs or just Python code and things like that. And then we are using Google Data Catalog for metadata management. For storage, a lot of our backend storage is in GCP, sorry, in BigQuery, just because it’s easy to access, and a lot of our users are analysts of the data, so they are very adept at using SQL to get all the information that they want.
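To make the orchestration pattern concrete, here is a minimal, hypothetical sketch of an Airflow DAG on Cloud Composer that runs a daily BigQuery curation query. The project, dataset, table, and SQL are placeholders, and this is not an actual Wayfair DAG.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Hypothetical daily DAG: raw events are assumed to have landed already;
# this step runs a BigQuery SQL transformation into a curated table.
with DAG(
    dag_id="product_analytics_curation",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    curate_events = BigQueryInsertJobOperator(
        task_id="curate_events",
        configuration={
            "query": {
                "query": "SELECT * FROM analytics.events_collated WHERE event_date = '{{ ds }}'",
                "destinationTable": {
                    "projectId": "my-project",       # placeholder project
                    "datasetId": "analytics",        # placeholder dataset
                    "tableId": "events_curated",     # placeholder table
                },
                "writeDisposition": "WRITE_TRUNCATE",
                "useLegacySql": False,
            }
        },
    )
```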

And like I said, we are using BigQuery for a lot of our SQL development and also using PySpark for some of the use cases that we have. On the presentation side there is Looker for dashboarding, and LookML for doing a lot of the development. There is no Dremio in this whole ecosystem just yet. I have been exposed to Dremio as a product, but I have not really used it in this ecosystem. That’s not to say never; there’s definitely potential. It’s just that we wanted to make sure, first of all, that we are getting ourselves out there and showing the value to all the different teams that we are supporting. Then, as we onboard more and more complex requirements with respect to the volume and complexity of the data, we can definitely look at using tools like Dremio in our environment.

So, the end-to-end data flow is pretty straightforward from what I explained earlier. Again, on the left you start with the source systems, depending on where the data is coming from, and then you ingest the data either on a daily basis or in a realtime or near-realtime basis. Then we do a lot of the processing of the data using DBT, or just plain BigQuery, or Python and Spark. Storage is either in BigQuery data sets or, depending on the need, we’re landing the data in Google Cloud Storage. And then we are doing a lot of the aggregation in Looker as well. So from a technical outcome standpoint, once we implemented this, we pretty much got a few of our pipelines out into production.
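For the Spark path mentioned here, a rough sketch might look like the following. It assumes the spark-bigquery connector is available on the cluster, and the bucket, table, and field names are invented for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("supply-chain-event-curation").getOrCreate()

# Read raw event files landed in Cloud Storage (paths are hypothetical).
raw = spark.read.json("gs://my-bucket/raw/events/2023-03-01/*.json")

# Light wrangling: drop obvious noise, de-duplicate, and keep only the fields
# that downstream KPIs need.
curated = (
    raw.filter(F.col("event_name").isNotNull())
       .dropDuplicates(["event_id"])
       .select("event_id", "event_name", "user_id", "event_timestamp")
)

# Write to BigQuery via the spark-bigquery connector (assumed to be installed).
(
    curated.write.format("bigquery")
    .option("table", "my-project.analytics.events_curated")
    .option("temporaryGcsBucket", "my-temp-bucket")
    .mode("overwrite")
    .save()
)
```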

Since the scope of work was so large, we obviously wanted to build a rinse-and-repeat methodology where we would capture all the development into a playbook so that it’s easy to scale, because you want to keep it a repeatable process. And also because the body of work that we were taking on was a lot, we created a scorecarding mechanism to prioritize which projects we want to take on. Since this was a lot of greenfield work, we got a chance to learn and play with a lot of different tools and technologies. For example, we had to build a CI/CD process from scratch, working with Cloud Composer to do a lot of the development, testing, and promotion of code from dev to production, and also using dbt expectations.

In general, we put in a framework for data quality. We have basically created a lot of these assets that can be reused across a lot of the projects that we take on. So with the work that we’ve done, we have one place where all the curated data sets are stored and accessed across the organization. Access to GBQ is pretty straightforward, and if you have experience with it, it’s pretty easy to get access and query. We have also built the Looker environment in such a way that whoever has experience with Looker can go in and start doing explorations. And eventually, if you’re a super user of sorts, you can even go in and add your own KPIs and things like that.

A quick use case win that we had recently: we had an analyst who was working on a complex data set, and it would take him at least three or four days to calculate about four metrics, go through a bunch of curation steps and things like that, and then report that back to the business. He would also be working on somewhat stale data. With our automated process, we are now able to run daily workloads, we’re able to curate the data quickly, and we’re also able to build out about 10 KPIs with almost fresh data that we are ingesting. Basically, the goal was to free up everybody’s time for other critical work, as opposed to tasks that were repetitive or would take time and which we can automate using the tools available to us. I think we are at 10 minutes. I have one last thing to cover, which is lessons learned.

So essentially, the lessons learned. Since this was a new program, teams were not very aware of what the goals of the project were, so we had to constantly socialize what we were trying to do and how we were trying to do things. We created a lot of artifacts for that. We had office hours sessions every week so that people could come in and vet projects to see whether it would make sense for us to take them on. And also, just to show progress in terms of how we are doing certain things and how these end products are working, we included a lot of these stakeholders in the demos that we were doing as well. So it’s basically about creating education around what we’re doing and how it’s going to help them at the end of the day.

One big thing I alluded to a few minutes ago was working with the data. That’s one of the biggest challenges we face even now: whenever you work with a new dataset, you have to understand the data properly. The data, right off the bat, is very complex. It’s very deeply nested tag data; if you have experience with it, you probably know about it. And there are also huge amounts of data that get generated. Think about a user on an iOS app or a web app, clicking and going back and forth between screens and things like that. So there’s a lot of noise that comes in with the data, and then we have to do a lot of extreme cleansing of the data, or filtering of the data, and things like that. Those are the types of challenges we faced while doing this work.
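As a small illustration of the kind of flattening and filtering described here, the snippet below works on an invented, loosely tag-manager-style nested payload rather than Wayfair’s actual event schema.

```python
import pandas as pd

# Invented example of a deeply nested tag event payload.
raw_events = [
    {
        "event_id": "e1",
        "user": {"id": 42, "platform": "ios"},
        "event": {"name": "page_view", "params": [{"key": "screen", "value": "home"}]},
    },
    {
        "event_id": "e2",
        "user": {"id": 42, "platform": "ios"},
        "event": {"name": "heartbeat", "params": []},  # noise we want to filter out
    },
]

# Flatten the nested structure into tabular columns and drop noisy event types.
flat = pd.json_normalize(raw_events, sep="_")
flat = flat[flat["event_name"] != "heartbeat"]
print(flat[["event_id", "user_id", "user_platform", "event_name"]])
```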

So in the end, I would like to summarize that we’ve been doing this work for the last eight or nine months. We’ve onboarded four full-size implementations. Obviously the level of complexity was a little bit different in each case, but we feel good that we have all the critical components in place that are needed to support the needs of the business, and also to take on more and more work faster, just because now we have a runbook that we created and we can onboard easily. As I said earlier, we’ve already started to open it up to other users to come in and start building their own KPIs and things like that, because the data is now in a form that can be shared easily.

And then the complexity of querying and parsing the data is not there anymore. In that way we are moving toward the goal of decentralizing a lot of the development, so that people can do more with what’s out there. Sorry, I got a little bit distracted with one other question. So, one of the big issues is quality of data: what is the process for making sure the data is fit for purpose? So yeah, the quality of data is obviously the biggest thing. We did multiple things while building the pipelines. We worked with the stakeholders to understand what’s needed, what the key performance indicators are in terms of what is needed. Then it’s a combination of using the standard data quality checks that are out there.

DBT comes with, I think, more than a hundred data quality checks that we can implement. And for a lot of the post-processing work that we do, we have put in some guardrails in terms of going back and checking on the data quality and then reporting that back as part of our alerting and monitoring implementation. So that is something that we’ve done. And again, this is not one size fits all; it’s mostly handled on a case-by-case basis. Sure, sure, Sanji, I hope that answers your question. So yeah, actually, that’s pretty much what I had in terms of material.
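As a loose sketch of the kind of post-processing guardrail and alerting described here, the check below computes a null rate on a hypothetical curated table and posts to a Slack webhook when it crosses a threshold. The table name, column, threshold, and webhook URL are all placeholders.

```python
import requests
from google.cloud import bigquery

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder webhook

client = bigquery.Client()

# Post-processing guardrail: flag the curated table if too many rows lack a user_id.
row = next(iter(client.query("""
    SELECT
      COUNT(*) AS total_rows,
      COUNTIF(user_id IS NULL) AS null_user_ids
    FROM analytics.events_curated
""").result()))

null_rate = row.null_user_ids / max(row.total_rows, 1)
if null_rate > 0.01:  # threshold chosen for illustration only
    requests.post(
        SLACK_WEBHOOK_URL,
        json={"text": f"Data quality alert: {null_rate:.1%} of events_curated rows have no user_id"},
    )
```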
