Affectiva's Demographic Debiasing tool
Data Infrastructure Designer & Developer leading a team of three to use statistical modelling & greedy algorithms to demographically debias machine learning models
Team
Deyin Xu
Principal Data Infrastructure Engineer
Matthew Rossi
Director of Data Strategy
Farah Ashraf
Software Engineer
Ahmed Ibrahim
Hesham Ismail
Engineering Interns
My Role
Data Infrastructure Engineer
Project Manager
Deliverable
Front-end and back-end app running asynchronously on AWS that debiases data by balancing multiple probability distributions across demographics, producing equitable train, validation & testing datasets
Time & Skillset
All Year 2018
Project Management
System Design
User Research
Big Data
Software Development
Distributed Systems
Data Visualizations
How did it work?
With more than 6 billion data points spanning different ethnicities, age groups, genders and other distinguishing factors, a balanced split of the data into train, validation and testing sets is essential to ensure equitable machine learning models. However, dividing a set into multiple subsets while balancing multiple probability distributions is NP-hard.
The solution had two parts: a greedy algorithm that divides the set into multiple subsets, randomizing the split and experimenting with different factor combinations, and a cloud-based black-box system for machine learning engineers to use.
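To give a flavour of the greedy idea, here is a minimal Python sketch. It is an illustration written for this page, not the production code: the function name `greedy_split`, its parameters, and the "largest remaining quota" scoring rule are all assumptions. Records are visited in random order, and each one goes to the split that is furthest below its target share for that record's demographic values.

```python
import random
from collections import Counter


def greedy_split(records, factors, ratios, seed=0):
    """Assign each record to a split so that every factor value is spread
    across the splits roughly in proportion to `ratios`.

    Hypothetical sketch: the scoring rule below is an assumption, not the
    algorithm used in the real tool.
    """
    rng = random.Random(seed)
    order = list(range(len(records)))
    rng.shuffle(order)  # visit records in a random order

    # How many records carry each (factor, value) pair overall.
    totals = Counter((f, rec[f]) for rec in records for f in factors)
    # How many of each (factor, value) pair every split has received so far.
    counts = {name: Counter() for name in ratios}
    assignment = [None] * len(records)

    for i in order:
        keys = [(f, records[i][f]) for f in factors]

        def deficit(name):
            # Share of the split's quota still unfilled, summed over
            # this record's factor values.
            return sum(
                (ratios[name] * totals[k] - counts[name][k]) / totals[k]
                for k in keys
            )

        best = max(ratios, key=deficit)  # greedy choice: neediest split
        assignment[i] = best
        counts[best].update(keys)

    return assignment
```

Calling `greedy_split(records, ["gender", "age"], {"train": 0.6, "val": 0.2, "test": 0.2})` on a list of dictionaries returns one split name per record. Changing the `factors` list is one simple way to experiment with different factor combinations. Because the choice is greedy, the result is a heuristic approximation rather than an optimal balance.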
I initially worked on creating the algorithm and building a manually curated prototype of the system. I experimented with this algorithm for 5 months before designing a cloud-based system with the rest of the team. I then managed two interns as we designed and built the system together on top of Amazon Web Services.
In this project, I learnt a lot about system design, team management and task breakdown over the course of 4 months with the intern team. I also learnt a lot about documentation and handover to the team member who took over after I left.