Germain Michou-Tonning

Building a Modern Data Stack and Saving €50M Yearly on Fraud at Alma

Context

  • Alma is a French fintech leader in the Buy Now, Pay Later industry.
  • To drive its complex operations, Alma needed truthful and reliable dashboards, integration with third-party tools, and efficient tooling for their Business Analysts and Data Scientists.
  • Before our collaboration, Alma was building dashboards directly on top of their application database and raw data. This was hard to maintain and time-consuming for their analysts. Data Scientists lacked the proper tools and know-how to streamline their work and build top-notch risk assessment algorithms. Data ingestion and transformation were done at the same time.
  • Over 8 months we built a Modern Data Stack from scratch, integrated all their data sources, built layered BI transformations, and delivered a high-quality framework for business analysts and data scientists. We also built a real-time ingestion pipeline for their application data, ensuring it was always fresh.
  • The end result was much more efficient reporting, automated tedious tasks, and streamlined workflows for virtually every team. The fraud assessment algorithm built on top of the Data Platform helped prevent around €50M of fraudulent transactions yearly.
Hamza Sefiane

"I had the pleasure of working with Bryce at Alma, where we collaborated as Data Engineers. Bryce is an exceptional problem-solver, consistently demonstrating a proactive approach to tackling challenges. His programming skills are first-rate, and he has a deep understanding of Data Engineering principles. One of our achievements was building the data platform at Alma. Bryce's dedication and expertise were essential to its success. His ability to communicate effectively and collaborate with the team ensured a smooth and efficient development process. I highly recommend Bryce for any Data Engineer position. His technical skills, combined with strong problem-solving abilities and excellent communication skills, make him an invaluable asset to any organization. Bryce's contributions were invaluable to our team, and I am confident he will continue to excel in any future project."

- Hamza Sefiane, Lead Data Engineer @ Alma

Alberto Rodriguez

"It was a pleasure to work with Bryce, an energetic, positive, and proactive person with extensive expertise in data engineering. He is a constructive collaborator who brings great value to the projects he takes part in. Bryce is without a doubt a pragmatic problem-solver, never hesitating to take the most efficient path to reach his goals. His communication skills are exceptional, and working with him is very enjoyable. I hope to have the opportunity to collaborate with him again in the future!"

- Alberto Rodriguez, Lead Data Scientist @ Alma

Rodolphe Quiédeville

"When you work with Bryce, you had better be healthy and full of energy yourself; I have never seen him exhausted! Bryce knows what data is about, constantly has ideas to improve your processes, and never gives up in the face of a problem. It was a real pleasure for me to work with him. If you want to set up data processing from scratch, don't hesitate to call him, before I do, because I will be contacting him very soon when I need a Data Engineer on my team."

- Rodolphe Quiédeville, Head of Platform @ Alma

Marin Huet

"Bryce worked with us for about a year to build a modern data stack from scratch at Alma. His expertise and rigor were invaluable during his assignment, and the whole team progressed enormously thanks to him. I strongly recommend him for any short or long data engineering assignment!"

- Marin Huet, Head of Data @ Alma

Data Engineering

Data Engineering is the first step in building a Modern Data Stack. This layer is responsible for:

  • Ingestion of external data (Datalake)
  • Exporting data to third-party tools (Reverse ETL)
  • Data Privacy
  • Maintenance of the data infrastructure

Datalake

I integrated a variety of raw data sources into their Datalake:

  • Postgres (application database)
  • Hubspot
  • Zendesk
  • Stripe
  • Adyen
  • Google Sheets

I ensured that ingestion was fault-resilient and truthful by providing dashboards that compared data quality between the Analytics database and the source of truth.
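The kind of reconciliation logic behind those dashboards can be sketched as follows. This is a minimal, hypothetical illustration: the table names, counts, and tolerance threshold are assumptions, not Alma's actual values.

```python
def reconcile(source_counts: dict, analytics_counts: dict, tolerance: float = 0.001):
    """Compare row counts between the source of truth and the analytics DB.

    Returns a per-table report flagging tables whose relative drift
    exceeds the tolerance (0.1% by default).
    """
    report = {}
    for table, src in source_counts.items():
        dst = analytics_counts.get(table, 0)
        drift = abs(src - dst) / src if src else 0.0
        report[table] = {"source": src, "analytics": dst, "ok": drift <= tolerance}
    return report

# Hypothetical counts: "payments" has drifted beyond tolerance, "merchants" is fine.
report = reconcile(
    {"payments": 1_000_000, "merchants": 5_230},
    {"payments": 998_000, "merchants": 5_230},
)
```

In practice a check like this runs on a schedule and feeds the comparison dashboard, so a silent ingestion failure surfaces as drift rather than going unnoticed.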

Their Postgres database was initially ingested using a custom batch workflow, but as they needed real-time data we ultimately switched to real-time ingestion using CDC (Change Data Capture).
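Conceptually, a CDC consumer replays a stream of change events onto the analytics copy. The sketch below assumes Debezium-style payloads (`"op"`, `"before"`, `"after"` fields); the actual pipeline and event schema at Alma may have differed.

```python
def apply_event(table: dict, event: dict) -> None:
    """Apply one CDC change event to an in-memory table keyed by id."""
    op = event["op"]           # "c" create, "u" update, "d" delete, "r" snapshot read
    if op in ("c", "u", "r"):
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":
        table.pop(event["before"]["id"], None)

# Replaying a small hypothetical stream of payment changes.
payments = {}
apply_event(payments, {"op": "c", "after": {"id": 1, "amount": 120}})
apply_event(payments, {"op": "c", "after": {"id": 2, "amount": 60}})
apply_event(payments, {"op": "u", "before": {"id": 1, "amount": 120},
                       "after": {"id": 1, "amount": 90}})
apply_event(payments, {"op": "d", "before": {"id": 2, "amount": 60}})
```

The same replay logic, applied to a warehouse table instead of a dict, is what keeps the analytics copy continuously fresh instead of hours behind a batch schedule.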

I monitored their infrastructure daily, ensuring quick fixes whenever an upstream ingestion workflow broke.

Export data to third parties

An important need for Alma was delivering transformed data into the external tools used by business teams. I created custom Reverse ETL connectors to deliver data to Hubspot, Zendesk, and Customer.io, built on their respective APIs.
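A Reverse ETL connector mostly boils down to pushing records to a third-party API in batches, with retries, since those APIs rate-limit and occasionally fail. Here is a generic sketch; the `send` callable stands in for whatever client call performs the actual HTTP request, and the batch size and retry counts are illustrative.

```python
import time

def push_in_batches(records, send, batch_size=100, retries=3):
    """Push records to a third-party API in batches, retrying with
    exponential backoff on failure. `send` performs the actual API call."""
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        for attempt in range(retries):
            try:
                send(batch)
                break
            except Exception:
                if attempt == retries - 1:
                    raise  # give up after the last retry
                time.sleep(2 ** attempt)  # backoff: 1s, 2s, ...

# Usage with a fake sender that just records what would be sent.
sent_batches = []
push_in_batches(list(range(250)), sent_batches.append, batch_size=100)
```

Each real connector then only differs in how `send` maps records onto the target API (Hubspot properties, Zendesk fields, Customer.io attributes).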

I created dedicated environments for third parties, so that they could deliver data directly into the Analytics database without compromising production data.

Data Privacy

Alma is a fintech that works with many businesses and direct clients: it pays businesses directly on behalf of clients, who then pay Alma in installments. A data leak could therefore compromise many actors at the same time, so it was of the utmost importance to ensure that only people who needed a specific dataset could access it.

I used GCP BigQuery's tag system to tag tables and columns according to their security level, so that only people with the corresponding accreditation could access sensitive data.
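The access model can be illustrated in plain Python. This is not the BigQuery API itself (the real implementation used BigQuery policy tags and IAM); the tag names, columns, and roles below are entirely hypothetical, only the tag-to-clearance idea matches.

```python
# Each column carries a security tag; each role carries a set of clearances.
COLUMN_TAGS = {
    "payments.iban": "pii_high",
    "payments.amount": "internal",
}
USER_CLEARANCE = {
    "analyst": {"internal"},
    "risk_officer": {"internal", "pii_high"},
}

def can_read(user_role: str, column: str) -> bool:
    """A column is readable only if the role holds its tag's clearance."""
    tag = COLUMN_TAGS.get(column, "internal")
    return tag in USER_CLEARANCE.get(user_role, set())
```

The benefit of tagging columns rather than granting table-level access is that an analyst can query `payments` for amounts without ever being able to select the IBAN column.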

Infrastructure Maintenance

  • Managing the data stack with Terraform
  • Ensuring third parties had access to our cloud
  • Acting as Site Reliability Engineer within the data team

Analytics Engineering

Analytics Engineering is the layer where we craft a robust framework for data manipulation. It is responsible for:

  • Transforming raw data into higher-level business objects (modeling)
  • Building informative dashboards for business owners
  • Ensuring transformations are fault-tolerant

Transformation of raw data

Raw data is transformed into refined business objects that are tangible for analysts and stakeholders. This transformation happens within a hierarchy of layers; the business layer contains tables that describe business objects in a holistic way.

We built this hierarchy for them and worked with business analysts and business stakeholders to build a relevant business layer.

We implemented those transformations using dbt, a powerful data transformation tool. We then onboarded business analysts and data scientists onto this infrastructure so they could be autonomous and build their own models without the help of data engineers.

This layered architecture makes it 10x easier for business analysts to build dashboards. After building it, we migrated the existing dashboards onto the new architecture.
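The layering idea can be sketched in plain Python. In production these were dbt SQL models; the table shapes, column names, and aggregation below are hypothetical stand-ins for the raw → staging → business progression.

```python
def staging_payments(raw_rows):
    """Staging layer: rename and clean raw columns, one model per source table.
    Here, cents become euros and cryptic source names become business names."""
    return [
        {"payment_id": r["id"], "merchant_id": r["m_id"], "amount_eur": r["amt"] / 100}
        for r in raw_rows
    ]

def business_merchant_revenue(staged):
    """Business layer: a holistic object analysts can query directly,
    built only from staging models, never from raw data."""
    revenue = {}
    for row in staged:
        revenue[row["merchant_id"]] = revenue.get(row["merchant_id"], 0) + row["amount_eur"]
    return revenue

staged = staging_payments([
    {"id": 1, "m_id": "acme", "amt": 1000},
    {"id": 2, "m_id": "acme", "amt": 500},
])
merchant_revenue = business_merchant_revenue(staged)
```

Because dashboards read only from the business layer, a change in a raw source schema is absorbed once in staging instead of breaking every dashboard that touched the raw table.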

Fault tolerant transformations

We implemented defensive programming with statistical checks running in the GitHub CI workflow: if an updated column showed a shift in standard deviation, or any other statistic deemed relevant, the pull request could not be merged automatically.
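One such guard can be sketched as below. The exact statistics and thresholds used at Alma are not specified here; the 1.5x ratio is a hypothetical choice for illustration.

```python
import statistics

def stddev_shift_ok(old_values, new_values, max_ratio=1.5):
    """CI guard: flag a column update whose standard deviation shifted
    by more than max_ratio in either direction (threshold hypothetical)."""
    old_sd = statistics.stdev(old_values)
    new_sd = statistics.stdev(new_values)
    if old_sd == 0:
        return new_sd == 0
    ratio = new_sd / old_sd
    return 1 / max_ratio <= ratio <= max_ratio
```

In CI, a returning `False` blocks the merge, forcing the author to confirm whether the shift is an intended business change or a bug in the transformation.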

The CI would also spin up a Docker Compose environment with Airflow and dbt; if a failure arose, merging was blocked. This ensured a safe and streamlined development process.

Data Science

The main focus of this project was Data and Analytics engineering but I also contributed to Data Science efforts by supporting the team on a daily basis. I was responsible for coaching them on Data Engineering and Software Engineering practices.

Together we brainstormed and industrialized their risk assessment model, which decides whether a transaction should be accepted or not.

An A/B test revealed that the new version of the model saved up to €50M a year. This was the crowning achievement of Alma's Data Platform, and it would never have been possible without a powerful and robust Data Engineering infrastructure.

Shall we talk?