Mobilization

by Joel Limardo

Software Developer and IT Professional

Chicago, IL

joel@joellimardo.com


Sun, 13 Feb 2022

Please Don't Police Algorithms -- I Mean Math

I saw a film on Netflix called Coded Bias, which discussed an MIT researcher's discovery of bias in face recognition technology. The title of the film itself was a problem -- 'Coded Bias' -- because it is possible to create similar results in machine learning systems without writing code at all. It can be accomplished simply by feeding the system improperly constructed datasets.

So as a programmer I am appalled that we were implicated in this so that someone could sell the rights to a film. Shame on those who knew better and allowed this to happen. A less provocative title would have been 'Machine Learning Bias' or 'Weak Dataset Collection', but that doesn't fuel the alarmist in us, so my guess is it failed to make the cut for that reason. People in this field are not well understood, and that makes them easy targets for sensationalist films.

Computers Only Know What Somebody Tells Them

Think of bad machine learning this way: three very small researchers all want to know how to identify an elephant. They each approach one using their own tools and report their findings (the data). One says an elephant is a gigantic ball, because he approached the animal from the rear. Another says it is long like a snake, because he approached it from the front. Still another says there is no such thing as an elephant, because he was given the wrong coordinates and wound up at a nearby health juice bar. Whether you take one, two, or in this case all three perspectives, you can still end up with the wrong description of an elephant.
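In machine-learning terms the parable is about training-data coverage. Here is a minimal sketch in plain Python (the features, values, and labels are all hypothetical) showing that the same classification rule, untouched between runs, gives a different answer when the training data only covers one view of the animal:

    # The classification rule below never changes between runs; only the
    # training data does. Hypothetical features: (trunk_visible, legs_visible).

    def classify(sample, training_data):
        """1-nearest-neighbor: return the label of the closest training example."""
        def dist2(p, q):
            return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        best_label, _ = min(
            ((label, dist2(sample, point))
             for label, points in training_data.items()
             for point in points),
            key=lambda pair: pair[1],
        )
        return best_label

    balanced = {  # elephants photographed from the front AND the rear
        "elephant": [(2.0, 1.0), (0.0, 1.0)],
        "boulder":  [(0.0, 0.0), (0.1, 0.0)],
    }
    skewed = {    # someone only collected front views of elephants
        "elephant": [(2.0, 1.0), (1.8, 1.0)],
        "boulder":  [(0.0, 0.0), (0.1, 0.0)],
    }

    rear_view_of_elephant = (0.0, 1.0)
    print(classify(rear_view_of_elephant, balanced))  # elephant
    print(classify(rear_view_of_elephant, skewed))    # boulder (same code, worse data)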

When machine learning 'gets it wrong' it isn't because it was coded that way but potentially for several reasons: (a) it was fed poop-quality data, (b) somebody either lazily or purposely omitted important data and skewed the results, or (c) the algorithms were applied somewhere they were never intended to be used. I get pretty impatient when I read online articles like this, and right about now I would be asking, 'so what is the solution then?' Well, I must admit to wanting to come up with a novel approach to this, but another person at MIT beat me to the punch. Marzyeh Ghassemi of MIT (wow, the same school), in her MIT News article titled "The downside of machine learning in health care", not only frames the problem but offers a solution that has nothing to do with creating some new governmental regulatory agency (the term "FDA for algorithms" is literally used in the movie):

Ghassemi recommends assembling diverse groups of researchers — clinicians, statisticians, medical ethicists, and computer scientists — to first gather diverse patient data and then “focus on developing fair and equitable improvements in health care that can be deployed in not just one advanced medical setting, but in a wide range of medical settings.”

That's it.

Because of the way the economy works, certain software packages become popular and widely used. Users of these systems must organize and ask, with some regularity, for evidence that the product performs correctly against suitable test data (in this case, one must petition for the inclusion of data covering various genders and races). We have non-governmental bodies that do this all the time -- notably ASTM.
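One way such a user group could make that request concrete is a routine acceptance check: run the product against a held-out test set and compare its error rate across the subgroups the users care about. A minimal sketch, where `model`, the sample format, and the 0.05 gap threshold are all hypothetical stand-ins:

    from collections import defaultdict

    def audit_by_group(model, test_set, max_gap=0.05):
        """Compare per-group accuracy and flag gaps wider than max_gap.

        test_set: iterable of (features, true_label, group) tuples.
        The 0.05 threshold is illustrative, not an industry standard.
        """
        hits, totals = defaultdict(int), defaultdict(int)
        for features, true_label, group in test_set:
            totals[group] += 1
            if model(features) == true_label:
                hits[group] += 1
        accuracy = {g: hits[g] / totals[g] for g in totals}
        gap = max(accuracy.values()) - min(accuracy.values())
        return accuracy, gap <= max_gap

    # Usage: accuracy, ok = audit_by_group(vendor_model, held_out_samples)
    # A False result is concrete evidence to take back to the vendor.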

But This Article Was About Math

Right-o. It was. Specifically, it was about why we shouldn't go around trying to police algorithms. For this I have to pull out an old book titled Mathematics for the Nonmathematician by Morris Kline (Dover, 1967). The passage of interest is buried at the end of Section 14-5, THE MOTION OF PROJECTILES FIRED AT AN ARBITRARY ANGLE:

...One could repeat the procedures pursued in the preceding section, using different initial velocities and angles of fire, and thus perhaps obtain answers to some of these questions. But the work would be considerable and still leave us with the problem of trying to infer a general conclusion from a number of special cases. The mathematician would not proceed in this way. He [...it was 1967...mankind was referred to as 'he' back then] would suppose that the initial velocity is an arbitrary value, V, and that the angle of fire is an arbitrary angle, A, and then study the motion with these arbitrary values V and A. He might thereby obtain conclusions about all such motions because his results would hold for any initial velocity and any angle of fire.

The previous quote precisely describes how algorithms are produced. They are abstractions derived from special cases, and they are then rigorously tested against more data (or with formal proof methods) to verify that they are indeed applicable to a wider set of cases.
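For concreteness, here is the general result Kline is pointing at, worked out in modern notation (ignoring air resistance, with g the local gravitational acceleration):

    x(t) = V \cos A \, t, \qquad y(t) = V \sin A \, t - \tfrac{1}{2} g t^2

Setting y(t) = 0 gives the time of flight t = 2V \sin A / g, and substituting that into x(t) gives the range

    R = \frac{V^2 \sin 2A}{g}

a single formula that holds for any V and any A -- exactly the kind of general conclusion the special cases alone could not deliver.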

This is why policing algorithms threatens research, learning, and even human thought. Your data will likely not accompany the algorithm as it treks its way through the validation process. This is where computer science diverges a bit from other sciences. Published algorithms are almost always divorced from their test data and are reused in different settings for validation, accompanied merely by a description of recommended applicability. A good example that people often do not even remember is the disclaimer that came with the first versions of Java, warning users to avoid using the language and the virtual machine in settings where human life might be lost. People in computer science look for and read these descriptions and apply the code and/or algorithms accordingly.

So, for projectile trajectory we must say that it works for any initial velocity and any angle of fire, but our description must include the caveat that we only tested it on our own planet -- it might or might not work on Venus, for instance. The business of testing it on Venus is for the next person, who can then publish either a modification that covers other planets or a basic warning not to use it there at all. This process goes on basically forever.
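In code, that caveat can travel with the abstraction as a parameter rather than as a buried footnote. A minimal sketch in Python, ignoring air resistance (the 8.87 figure is Venus's approximate surface gravity in m/s^2):

    import math

    EARTH_G = 9.81  # m/s^2 -- the only value the formula was "tested" with
    VENUS_G = 8.87  # m/s^2 -- approximate surface gravity of Venus

    def projectile_range(v, angle_deg, g=EARTH_G):
        """Horizontal range R = V^2 * sin(2A) / g.

        Holds for ANY initial velocity v and ANY launch angle -- that is the
        abstraction -- but assumes a uniform gravitational field and no air
        resistance. The applicability caveat travels with the g parameter.
        """
        return v ** 2 * math.sin(2 * math.radians(angle_deg)) / g

    print(projectile_range(100, 45))             # Earth: ~1019 m
    print(projectile_range(100, 45, g=VENUS_G))  # Venus: ~1127 m

(And on the real Venus the dense atmosphere makes the no-drag assumption far worse, which is exactly why the disclosed caveat matters.)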

Companies that want to commercialize technology will likely ignore these admonishments (they probably do not fully understand them anyway) and may rush a technology to market in hopes of recouping their considerable investments or cornering a market before the tools actually mature. Whatever the reason -- the point is that algorithms are neither good nor evil. They describe something abstractly, and they were developed under circumstances that simply need to be openly disclosed. This is one of those 'Frankenstein'-type problems created by science (like fluorocarbons) that requires industry/science remediation before any discussion of government action should even take place.
