How Personalized Recommendations Increased Sales For a Cosmetics Retailer
August 4, 2023
The client is a major player in the Cosmetics Retail Industry, with >200 stores across India, an inventory of >50,000 SKUs from hundreds of brands, and ~5Mn customers generating >$100Mn in revenue.
In today's world, brands are adopting digital transformation to help them capture their customers' attention. However, to increase a brand's share of wallet, they need to recommend the right product, with the right message, at the right time, to the right person, i.e. hyper-personalization. The retail chain wanted to create an outstanding customer experience by helping shoppers find relevant products faster, which would in turn improve their average basket value.
We kicked off with a diagnostic phase, interviewing stakeholders to narrow down the problem statement. We then performed exploratory analysis to understand current trends and key KPIs, and used these KPIs as a reference to set quantifiable acceptance criteria for the project. This diagnostic / exploratory phase also gave us insight into the scale of data the system would need to process, so that we could design its architecture accordingly.
Yugen’s team broke down the problem into the following 3 components:
- Building scalable ETL Pipelines capable of handling data at scale
- Building a recommendation engine to understand user behavior and show relevant products
- Building a micro-service to enable the retailer to seamlessly consume the output of the engine via an API.
Experimentation is a key tenet of Data Science. Yugen’s team leverages it with the ABDM Framework:
- Perform deep exploratory analysis to identify trends, patterns, and features that influence customer purchase behavior
- Build infrastructure & pipelines capable of handling data at scale at the desired latency, so that engineering is not an afterthought
- Focus on fast prototyping to production, which lets us make model releases faster and hence improve consistently; all of our promising model iterations make it to production
- Build a model performance measurement framework, because "What gets measured gets improved!"
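To make the measurement tenet concrete, here is a minimal sketch of two standard top-k recommendation metrics. The metric choice mirrors the precision & recall mentioned later in this case study; the function names and data shapes are illustrative assumptions, not the client's actual code.

```python
# Illustrative top-k metrics for a recommendation measurement framework.
# (Names and data shapes are assumptions for this sketch.)

def precision_at_k(recommended, purchased, k):
    """Fraction of the top-k recommendations the customer actually bought."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for sku in top_k if sku in purchased)
    return hits / len(top_k)

def recall_at_k(recommended, purchased, k):
    """Fraction of purchased items that appeared in the top-k recommendations."""
    if not purchased:
        return 0.0
    top_k = set(recommended[:k])
    return len(top_k & set(purchased)) / len(purchased)

# Example: 8 recommendations served, 3 items purchased
recs = ["lipstick_01", "serum_12", "kajal_03", "mask_07",
        "toner_02", "scrub_09", "balm_04", "mist_11"]
bought = {"serum_12", "balm_04", "shampoo_55"}
print(precision_at_k(recs, bought, 8))  # 2 hits out of 8 -> 0.25
print(recall_at_k(recs, bought, 8))     # 2 of 3 purchases covered -> ~0.67
```

Tracking these per release makes model-to-model comparisons objective rather than anecdotal.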
To support this scale, we needed to build pipelines capable of handling data for ~4-5Mn customers making ~100k transactions a day. We built ETL pipelines using a combination of highly optimised Python scripts & NoSQL queries, and scheduled them as cron jobs. We initially considered orchestrating the system with Airflow, but decided it would be overkill at this stage and could be added later down the line.
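A single step of such a pipeline can be sketched as below. The field names and cleaning rules are hypothetical; the actual pipelines combined optimised Python scripts with NoSQL queries, but the transform-and-validate shape is the same.

```python
# Minimal sketch of one ETL transform step (hypothetical field names).
from datetime import datetime, timezone

def transform(raw_txns):
    """Normalise raw transaction records into clean (customer, sku, qty, ts) rows."""
    rows = []
    for t in raw_txns:
        if t.get("qty", 0) <= 0:  # drop refunds / malformed rows
            continue
        rows.append({
            "customer_id": t["cust"],
            "sku": t["sku"].upper(),
            "qty": int(t["qty"]),
            "ts": datetime.fromtimestamp(t["epoch"], tz=timezone.utc).isoformat(),
        })
    return rows

raw = [{"cust": "C1", "sku": "sku-101", "qty": 2, "epoch": 1_600_000_000},
       {"cust": "C2", "sku": "sku-102", "qty": 0, "epoch": 1_600_000_100}]
print(transform(raw))  # only the valid C1 row survives
```

A script like this would then be scheduled with a crontab entry such as `0 * * * * python run_etl.py` to run hourly.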
Performing a deep exploratory analysis yielded many insights and helped us understand customer behaviour. We experimented with different approaches & algorithms, such as collaborative filtering (based on user-user & user-item interactions) and content-based models (such as probabilistic classifiers). While optimizing the models for better precision & recall, we also had to ensure that diversity (serendipity) & explainability were not compromised.
Upon further evaluation, we found that the collaborative approaches had higher bias but lower variance, whereas the content-based approaches had lower bias but higher variance. We therefore went ahead with a hybrid ensemble model that uses a combination of both approaches to:
- Identify candidate products
- Identify the probability of conversion
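The two stages above can be sketched as follows. This is an illustrative toy, not the production model: the co-purchase candidate generator stands in for the collaborative filtering stage, and the tag-overlap score stands in for the trained content-based probability model.

```python
# Illustrative two-stage hybrid recommender (all names and scoring are
# simplified stand-ins for the actual trained models).
from collections import Counter

def candidates_from_cf(user_history, co_purchase, top_n=5):
    """Stage 1: products frequently co-purchased with the user's past items."""
    counts = Counter()
    for sku in user_history:
        counts.update(co_purchase.get(sku, []))
    for sku in user_history:  # don't re-recommend items already bought
        counts.pop(sku, None)
    return [sku for sku, _ in counts.most_common(top_n)]

def score_conversion(user_tags, item_tags):
    """Stage 2: stand-in conversion score (a real content model would be trained)."""
    shared = set(user_tags) & set(item_tags)
    return len(shared) / max(len(item_tags), 1)

def recommend(user_history, user_tags, co_purchase, catalog_tags, k=3):
    """Generate candidates collaboratively, then rank them by content score."""
    cands = candidates_from_cf(user_history, co_purchase)
    ranked = sorted(
        cands,
        key=lambda s: score_conversion(user_tags, catalog_tags.get(s, [])),
        reverse=True,
    )
    return ranked[:k]

co_purchase = {"lipstick": ["liner", "gloss", "lipstick"]}
catalog_tags = {"liner": ["matte"], "gloss": ["shine"]}
print(recommend(["lipstick"], ["matte", "red"], co_purchase, catalog_tags))
```

Splitting candidate generation from conversion scoring keeps the fast, coarse collaborative step separate from the slower, per-item content model.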
We also addressed the problem of cold-start (for new users) by first rolling out a few experiments like:
- Showing top 5 highest selling products of a store
- Showing top 5 highest selling products based on demography
- Showing top 5 products with highest discount
- Showing newly launched products
We eventually concluded that this problem should be solved using a combination of Collaborative approach & a rule-set based framework for higher ROI.
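A rule-set framework of the kind described can be sketched as a prioritised fallback chain over the four experiments above. The priority order and function signature here are illustrative assumptions.

```python
# Illustrative cold-start fallback: blend rule outputs in priority order,
# de-duplicating, until k recommendations are filled.

def cold_start_recs(store_top, demo_top, discounted, new_launches, k=5):
    """Fill up to k SKUs from the rule outputs, highest-priority source first."""
    seen, out = set(), []
    for source in (store_top, demo_top, discounted, new_launches):
        for sku in source:
            if sku not in seen:
                seen.add(sku)
                out.append(sku)
            if len(out) == k:
                return out
    return out

print(cold_start_recs(["a", "b"], ["b", "c"], ["d"], ["e", "f"], k=5))
```

Once a new user accumulates enough transactions, the main hybrid model takes over from this rule set.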
We developed this model using data from pre-COVID transactions; however, we observed that user behaviour changed significantly post-COVID:
- Many new categories of products were introduced in the stores (like disinfectants, sanitizers etc.)
- Many of the top selling products were experiencing significantly lower sales (like Nail Polish/ Nail Paints etc.)
- While only 5% of their revenue was from online stores previously, it grew to around 15%-20% post covid.
We rolled out a series of experiments to understand & test our models in different situations and get the best results.
Inference Pipeline (API)
We needed to serve ~8 recommendations per user within a latency of ~500ms. We decided to build a micro-service and expose it as an API, so that it could be easily consumed by the client's systems. API contracts were designed to capture information like CustomerID, Loyalty ID, Gender, items currently in the cart etc. The system was built using a combination of AWS API Gateway, AWS Lambda & AWS EMR.
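The contract described above can be sketched as a Lambda-style handler. The JSON field names follow the text (CustomerID, cart items), but the validation rules and the placeholder lookup are assumptions; a real handler would call the recommendation stages described below rather than return a stub list.

```python
# Sketch of the API contract as a Lambda-style handler (simplified; the
# recommendation lookup is a placeholder stub).
import json

def handler(event, _context=None):
    body = json.loads(event["body"])
    customer_id = body.get("CustomerID") or body.get("LoyaltyID")
    if not customer_id:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing customer id"})}
    cart = set(body.get("CartItems", []))
    # Placeholder: fetch precomputed recommendations, drop items already in cart
    recs = [s for s in ["SKU1", "SKU2", "SKU3"] if s not in cart]
    return {"statusCode": 200,
            "body": json.dumps({"recommendations": recs[:8]})}

event = {"body": json.dumps({"CustomerID": "C1", "CartItems": ["SKU2"]})}
print(handler(event))
```

Behind API Gateway, a handler of this shape fans out to the model stages and returns the top ~8 SKUs within the latency budget.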
The inference pipeline consisted of 3 stages:
- Getting outputs from the collaborative filtering framework, which requires fast querying & high throughput
- Getting outputs from the content-based models, which calculate the probability of a user purchasing each individual candidate SKU
- Combining the results, post-processing, and generating the final output
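The third stage can be sketched as a simple score blend. The 50/50 weighting, the reciprocal-rank score, and the cart filter are illustrative assumptions; in practice such weights would be tuned experimentally.

```python
# Sketch of stage 3: blend CF rank with content-model probability, filter
# items already in the cart, return the top-k. Weights are illustrative.

def combine(cf_rank, content_prob, in_cart, k=8):
    """Merge stage-1 and stage-2 outputs into a final ranked list."""
    scored = []
    for i, sku in enumerate(cf_rank):
        if sku in in_cart:
            continue  # post-processing: don't recommend what's in the cart
        rank_score = 1.0 / (i + 1)  # earlier CF rank -> higher score
        p = content_prob.get(sku, 0.0)
        scored.append((0.5 * rank_score + 0.5 * p, sku))
    scored.sort(reverse=True)
    return [sku for _, sku in scored[:k]]

print(combine(["a", "b", "c"], {"a": 0.1, "b": 0.9, "c": 0.5}, in_cart={"c"}))
```

Keeping this stage stateless makes it cheap to run inside the API path within the latency budget.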
Post release, in a span of 3 months, we observed that:
- We served recommendations to ~2Mn customers who made ~7Mn transactions
- The average latency of our API was ~400 ms.
- Average Customer Basket Value increased by ~2.5%
- # of unique products purchased per customer increased by ~5%
- Customer Retention improved by ~0.5 ppt