Pushing the limits of where artificial intelligence can reasonably be applied today, Janelle Shane, Ph.D., ran an experiment that caught our attention. Dr. Shane wondered how AI would behave when faced with a creative, culturally specific challenge: naming paint colors. While her AI’s results are humorously awkward, they point to deeper concerns about machine learning that deserve our attention. It is all too easy to be distracted by the humor, or at times ridiculousness, of the results from these kinds of experiments and dismiss them as irrelevant to the “serious work of AI”, but that is exactly the wrong impulse. It is at the edges, in these kinds of failures, that we get clues about what we need to pay attention to. These clues are our warning to be diligent: to learn from the edges and not be lulled into training data and models that rely on frequency and central tendency.
Dr. Shane’s two blog posts (1) (2) explain her paint-naming experiment and are a very interesting read. Of particular note, she specifically invited readers to collaborate and contribute improvements to the results. In other words, she didn’t go into the experiment assuming she had special insight into the domain. She was also not doing this for commercial purposes and could therefore solicit outside (non-employee) input.
Major tech companies, and most minor ones, are “all in” on AI and machine learning, with projects across the spectrum of human experience, many of which carry much higher stakes than naming paint colors: insurance rates, healthcare risk profiles, and loan eligibility, to name just a few. At the risk of squawking like Chicken Little, we need to pay attention to how these efforts are set up. Are they taking into account the full diversity of lived experience, in ways that allow for the ongoing incorporation of outlying data? Are they deliberately bringing in different perspectives, as Dr. Shane’s experiment did? Our fear is that the previously non-inclusive approaches to software development at tech companies, and in the industries that increasingly rely on their know-how, will not be disrupted in this new era of AI and are, in fact, in danger of becoming even less inclusive.
Whatever unconscious bias a corporate culture has is not going to melt away when it decides to create an algorithm to do work previously done by employees, such as deciding who gets a medical treatment or a home loan. In fact, the team of people creating the algorithm is likely to be less diverse than the employees who used to do the evaluation and decision making.
We were going to write a longer musing on this topic, but then we discovered this excellent article by Richard Sharp, so instead we’d like to amplify what he is saying. Mr. Sharp is quite correct that algorithms, just like individuals and teams, are subject to unconscious bias from at least three sources:
- Selection bias in training data
- Hidden variables
- Existing social biases that are perpetuated
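The first of these sources is easy to demonstrate concretely. The following is a minimal sketch, not drawn from Mr. Sharp’s article; the groups, numbers, and sampling rate are all invented for illustration. It shows how a statistic learned from a selectively sampled “training set” can diverge badly from the truth about the full population:

```python
import random

random.seed(0)

# Hypothetical population made of two groups with different typical values.
# All group names and numbers here are invented for illustration.
group_a = [random.gauss(50, 5) for _ in range(1000)]  # well represented
group_b = [random.gauss(30, 5) for _ in range(1000)]  # under-represented below

population = group_a + group_b
true_mean = sum(population) / len(population)  # roughly 40

# Selection bias: the "training data" happens to capture only 5% of group B.
sample = group_a + group_b[:50]
sample_mean = sum(sample) / len(sample)  # roughly 49, skewed toward group A

print(f"true mean: {true_mean:.1f}, biased sample mean: {sample_mean:.1f}")
```

Any model fit to the biased sample inherits that skew, no matter how sophisticated the algorithm is; the error was introduced before training even began.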
Mr. Sharp goes on to say:
“As machine learning expands into sensitive areas -- such as credit scoring, hiring and even criminal sentencing -- it is imperative that we are careful and vigilant about keeping the algorithms fair.
Accomplishing this goal requires raising awareness about social biases in machine learning and the serious, negative consequences that it can have. Just as tech employees are educated about the negative implications of their own unconscious biases, so should they be educated about biases in the models they are building.
It also requires companies to explicitly test machine learning models for discriminatory biases and publish their results. Useful methods and datasets for performing such tests should be shared and reviewed publicly to make the process easier and more effective.”
We wholeheartedly agree. Bravo!
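The explicit testing Mr. Sharp calls for can start very simply. Here is a minimal sketch of one such check, a demographic parity gap; the group labels, decisions, and the 0.2 threshold are all invented for illustration, and a real audit would use real model outputs and a metric chosen for the domain:

```python
# Hypothetical model decisions (1 = approved, 0 = denied) for two groups.
decisions = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 0), ("group_b", 1), ("group_b", 0), ("group_b", 0),
]

def approval_rate(group):
    """Fraction of decisions in `group` that were approvals."""
    outcomes = [d for g, d in decisions if g == group]
    return sum(outcomes) / len(outcomes)

# Demographic parity gap: difference in approval rates between groups.
gap = approval_rate("group_a") - approval_rate("group_b")  # 0.75 - 0.25 = 0.50

# A simple audit check: flag the model if the gap exceeds a chosen bound.
FAIRNESS_BOUND = 0.2  # an assumed threshold for this illustration
flagged = abs(gap) > FAIRNESS_BOUND

print(f"approval gap: {gap:.2f}, flagged: {flagged}")
```

Even a toy check like this, run routinely and published, would move teams toward the transparency Mr. Sharp describes.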
--Jeanine & Kent