Modern AI Described by Mathematics from High School

Modern artificial intelligence is built on math you learn in a high school calculus class. It blows my mind. I have just finished reading David Rumelhart, Geoffrey Hinton, and Ronald Williams’ 1986 paper on the then-revolutionary concept of back-propagation, the technique behind all of modern deep learning. In those four double-spaced pages, a third of which are schematics and diagrams, and with far less mathematics than you would ever see on an AP Calculus exam, lies the key to all of modern artificial intelligence. As a mathematician, I am flabbergasted that this isn’t some kind of hoax.

The purpose of applied mathematics is, after all, to adapt complex mathematical models to solve real industrial, socio-political, or economic challenges. The difficulty in moving from theory to application usually lies in the complexity of the model: either it is far too mathematical and needs to be toned down for reader intelligibility, or it simply fails to account for all the probabilistic parameters of the real working world.

Since artificial intelligence is meant to “imitate” the most complex algorithm of all, the functioning of the human brain, how can a simple chain rule of derivatives, one a student could work out on the back of a napkin, possibly imitate our neural connections? For that is what back-propagation, the bedrock of deep learning and therefore of AI, amounts to: gradient-based optimization of learning parameters using the chain rule for derivatives. Child’s play, yet at scale the applications have become huge. The technique is a simple, benign replication of how humans learn: you make a whole lot of errors at first in a given field, then iteratively adjust your performance so that the losses and errors shrink each time you return to the same activity. Done at scale, this becomes increasingly similar to how a human can be taught. Sometimes the machine even learns faster, and more reliably. Either way, though, it’s only elementary calculus, Watson.
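
To make the point concrete, here is a minimal sketch of that idea: nothing but the chain rule and gradient descent, applied to a tiny network learning XOR. Everything below (the network size, the learning rate, the toy data, the NumPy implementation) is an illustrative assumption of mine, not something taken from the 1986 paper:

```python
import numpy as np

# Toy problem: teach a tiny 2-4-1 network the XOR function.
# All sizes and hyperparameters here are illustrative choices.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass: a composition of simple differentiable functions.
    h = sigmoid(X @ W1 + b1)          # hidden activations
    p = sigmoid(h @ W2 + b2)          # network output
    loss = np.mean((p - y) ** 2)      # squared error

    # Backward pass: the chain rule, applied one layer at a time.
    dp  = 2 * (p - y) / len(X)        # d(loss)/d(output)
    dz2 = dp * p * (1 - p)            # through the output sigmoid
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dh  = dz2 @ W2.T                  # propagate the error backwards
    dz1 = dh * h * (1 - h)            # through the hidden sigmoid
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Gradient descent: nudge every parameter downhill on the loss.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")      # close to 0 once XOR is learned
```

Every gradient above falls out of one application of the chain rule, and adding more layers just means repeating the same two lines per layer. That, scaled up by a factor of millions, is deep learning.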