Affectiva's Demographic Debiasing tool
Data Infrastructure Designer & Developer leading a team of three to use statistical modelling & greedy algorithms to demographically debias machine learning models
Principal Data Infrastructure Engineer
Director of Data Strategy
Data Infrastructure Engineer
Front-end and back-end app running asynchronously on AWS that debiases data by balancing multiple probability distributions across demographics, producing equitable training, validation and test datasets
Time & Skillset
All Year 2018
How did it work?
With more than 6 billion data points spanning different ethnicities, age groups and genders, as well as other distinguishing factors, a balanced split of the data into training, validation and test sets is essential for equitable machine learning models. Dividing a set into multiple subsets while balancing multiple probability distributions at once is NP-hard.
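To make "balancing multiple probability distributions" concrete, one common way to score a split is to compare each subset's distribution of every demographic attribute with the distribution over the whole dataset. The sketch below uses total variation distance for that comparison. The metric, the function names and the dict-based record layout are illustrative assumptions, not the tool's actual scoring method.

```python
from collections import Counter


def distribution(records, attribute):
    """Share of each value of `attribute` among `records` (records are dicts)."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_imbalance(splits, attributes):
    """Worst distance between any non-empty split and the full dataset, over all attributes."""
    everything = [r for split in splits.values() for r in split]
    worst = 0.0
    for attribute in attributes:
        overall = distribution(everything, attribute)
        for split in splits.values():
            if split:
                worst = max(worst, total_variation(distribution(split, attribute), overall))
    return worst
```

A score of 0 means every subset mirrors the full dataset on every listed attribute. A splitting algorithm then tries to keep this worst case small for all attributes at the same time, which is where the difficulty comes from.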
The solution combined two parts. The first was a greedy algorithm that divides the set into multiple subsets, splitting randomly while experimenting with different factor combinations. The second was a cloud-based black-box system for machine learning engineers to use.
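The following is a minimal sketch of one greedy approach, a greedy stratified split. It is an assumption for illustration, not the algorithm the tool used. Records are grouped by one chosen combination of demographic factors and shuffled within each group. Each record then goes to whichever subset is furthest below its target share. The names `greedy_stratified_split`, `factors` and `ratios` are hypothetical. The sketch does not search for which factor combination to stratify on.

```python
import random
from collections import defaultdict


def greedy_stratified_split(records, factors, ratios, seed=0):
    """Split dict records into named subsets (hypothetical helper).

    factors: keys whose combined values define a stratum, e.g. ["gender", "age_group"].
    ratios:  target share per subset, e.g. {"train": 0.8, "validation": 0.1, "test": 0.1}.
    """
    rng = random.Random(seed)

    # Group records by their combination of demographic factor values.
    strata = defaultdict(list)
    for r in records:
        strata[tuple(r[f] for f in factors)].append(r)

    splits = {name: [] for name in ratios}
    for members in strata.values():
        rng.shuffle(members)  # random order within the stratum
        counts = {name: 0 for name in ratios}
        for i, r in enumerate(members, start=1):
            # Greedy step: choose the subset with the largest deficit between its
            # target count after i assignments and the count it has so far.
            name = max(ratios, key=lambda n: ratios[n] * i - counts[n])
            counts[name] += 1
            splits[name].append(r)
    return splits
```

For example, with 10 women and 10 men and an 80/10/10 ratio, each stratum is divided 8/1/1. Every subset therefore ends up with the same gender balance as the full set. Each record is placed in a single pass, which keeps the approach cheap at large scale. As a greedy method, though, it offers no optimality guarantee for the NP-hard problem described above.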
I began by creating the algorithm and building a manually curated prototype of the system. After experimenting with the algorithm for 5 months, I designed a cloud-based system with the rest of the team. I then managed two interns as we designed and built the system together on Amazon Web Services.
Over the 4 months with the intern team, I learnt a lot about system design and team management, as well as how to break down and track tasks. I also learnt a lot about documentation and about handing the project over to the team member who succeeded me after I left.