ManoMano is a unicorn start-up in the e-commerce industry, specializing in DIY products. As a marketplace, its goal is to match customers with DIY sellers, taking a fee in the process. It also generates a large quantity of data that can be used to tailor recommender systems and to optimize retention, acquisition and user experience. ManoMano worked with Pyramind to build its paid acquisition systems, increase its conversion rate and industrialize its data science models with industry-standard craftsmanship.
"As Data Science manager at ManoMano, I had the chance to manage Bryce for several years. Don't be fooled by his young age! He worked successfully on complex data topics that were strategic for the company. Focused on user impact and production, Bryce consistently came forward with innovative yet pragmatic solutions across the whole value chain, from data collection to production, by way of Data Engineering and Data Science. Bryce has a rare multidisciplinary profile that gives him a precise understanding of the challenges facing each data team (Engineering, Analytics, Science, MLOps, ...) and therefore lets him work with every one of them."
- Romain Ayres, Head of Data Science @ ManoMano
I developed an algorithm to identify which Google keywords to target, manage ManoMano's AdWords campaigns, score each keyword daily to determine its optimal budget and select the best AdWords parameters.
Keywords were identified using a mix of Google APIs, ManoMano's own search database and business recommendations. They were then uploaded to Google AdWords campaigns through the Google AdWords API. New keywords were discovered every day (for example when a new kind of product went on sale) and campaigns were updated accordingly.
Using ManoMano's historical data, I built a dataset containing every query that eventually led to a conversion. I used LightGBM, a gradient-boosted decision tree ensemble, to model query revenue at a daily granularity. Feature engineering included the category of the query in ManoMano's taxonomy, the user's device, the user's country and cumulative past revenue.
The result was an automated campaign spending around 1M€ per month while remaining profitable.
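To make the "cumulative past revenue" feature concrete, here is a minimal sketch in plain Python. It is an illustration rather than the production code: the function name, the row layout (`query`, `day`, `revenue`) and the assumption of one row per query and per day are all hypothetical. The point it shows is that the feature for a given day only sums revenue from strictly earlier days, so the value being predicted never leaks into its own features.

```python
from collections import defaultdict
from datetime import date


def add_cumulative_past_revenue(rows):
    """Return rows sorted by (query, day) with a cumulative_past_revenue field.

    Assumes (hypothetically) one row per query and per day. The feature for a
    day is the sum of that query's revenue on strictly earlier days.
    """
    totals = defaultdict(float)  # running revenue total per query
    enriched = []
    for row in sorted(rows, key=lambda r: (r["query"], r["day"])):
        # Read the total BEFORE adding today's revenue to avoid target leakage.
        enriched.append({**row, "cumulative_past_revenue": totals[row["query"]]})
        totals[row["query"]] += row["revenue"]
    return enriched


# Made-up example rows, not ManoMano data.
example_rows = [
    {"query": "drill", "day": date(2020, 1, 2), "revenue": 5.0},
    {"query": "drill", "day": date(2020, 1, 1), "revenue": 10.0},
    {"query": "saw", "day": date(2020, 1, 1), "revenue": 3.0},
]
example_features = add_cumulative_past_revenue(example_rows)
```

Rows enriched this way could then be fed to a regressor such as LightGBM alongside the categorical features (category, device, country).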
I improved the existing Google Shopping bidding algorithm by crafting new features, rewriting the feature engineering compute engine and analyzing the dataset and the model's output by hand.
Google Shopping is ManoMano's main source of traffic. With automated campaigns spending up to 1M€ daily, even a small uplift in spend and profitability can bring a very large return on investment. The model was a LightGBM gradient-boosted tree model trained on over 1 billion rows and 500 features. Its dataset contained one row per day x product x device, which means it represented a time series of daily product revenue per device.
My first and most important contribution was rewriting the feature engineering compute engine, which computed moving averages of past revenue over different granularities. Its main issue was heavy memory consumption, which could crash the model training process when too much data was computed at once. I fixed that with a completely different process that could easily be split by time period, which made it possible to compute features chunk by chunk instead of all at once. The result was a slower training process, but one whose memory footprint was now constant instead of growing linearly with the amount of data.
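The idea of splitting the computation by time period can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual engine: the function name, the tuple layout and the 7-day window are hypothetical, and it assumes chunks arrive in chronological order with one row per day x product x device. Only a bounded window of past revenue is kept per (product, device) pair, so memory no longer grows with the number of days processed.

```python
from collections import defaultdict, deque


def moving_average_features(chunks, window=7):
    """Yield one list of (day, product, device, past_avg) per input chunk.

    chunks: iterable of lists of (day, product, device, revenue) tuples, in
    chronological order (hypothetical layout). past_avg is the mean revenue
    over up to `window` previous days, or 0.0 when there is no history yet.
    """
    # Per-key state is capped at `window` values, whatever the history length.
    history = defaultdict(lambda: deque(maxlen=window))
    for chunk in chunks:
        features = []
        for day, product, device, revenue in chunk:
            past = history[(product, device)]
            past_avg = sum(past) / len(past) if past else 0.0
            features.append((day, product, device, past_avg))
            past.append(revenue)  # today's revenue only affects later days
        yield features  # the chunk can be written out and freed here


# Made-up data: six days of revenue for one product on one device.
example_days = [(d, "p1", "mobile", float(d)) for d in range(1, 7)]
example_chunks = [example_days[:2], example_days[2:4], example_days[4:]]
chunked_features = [
    row
    for chunk in moving_average_features(example_chunks, window=3)
    for row in chunk
]
```

Because the state carried between chunks is bounded, processing the data in chunks gives the same features as processing it in one pass, at the cost of a sequential scan.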
Other contributions included designing new features (ensembling different models, using different time windows, mixing different granularities, ...) and tuning the model's hyperparameters.
The result of that work was a model that could spend more while being more profitable, and that was able to outperform Google's own bidding algorithm.
A randomized A/B test showed that the final uplift was around 7%, or roughly 30M€ of incremental business volume per year for ManoMano.
An analysis using an internal causal model inspired by Amazon showed that delivery time had a strong impact on conversion rate. A product whose delivery time customers judged too slow would simply not get ordered.
By working with business stakeholders, I uncovered that the displayed delivery times were too pessimistic compared to historical tracking data. I developed a model that estimates the delivery time of a given product from that same data, and industrialized it into production.
Estimated delivery times were reduced for 60% of products. An A/B test then showed, with statistical significance, that our model was responsible for a 3% uplift in conversion rate.
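For readers unfamiliar with how such an uplift is checked for significance, below is a minimal sketch of a two-sided two-proportion z-test using only the Python standard library. It is one common way to analyze a conversion-rate A/B test, not necessarily the method used at ManoMano, and the function name and all visitor and order counts are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and variant (b).

    Returns (relative_uplift, z, p_value) using a pooled standard error and
    a two-sided p-value under the normal approximation.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return rate_b / rate_a - 1, z, p_value


# Hypothetical counts: a 2.00% vs 2.06% conversion rate is a 3% relative uplift.
small_test = two_proportion_z_test(2_000, 100_000, 2_060, 100_000)
large_test = two_proportion_z_test(20_000, 1_000_000, 20_600, 1_000_000)
```

With these made-up numbers, the same 3% relative uplift is not significant at the usual 5% level with 100,000 visitors per group but is with 1,000,000, which is why a small uplift needs a large test to be confirmed.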
Shall we talk?