Just about a month ago, AI claimed another victory in gaming, winning a three-round match of 5v5 Dota 2 against a team of high-ranking players. While last year a bot from the same team beat professionals in 1v1 matches, OpenAI’s new 5v5 bot, fittingly named OpenAI Five, is an even greater accomplishment (OpenAI’s Dota Team, 2018). Adding four more players on each side adds a tremendous amount of complexity to the game and introduces the new challenge of teamwork. Many leaders in tech, like Bill Gates, commended the achievement as a milestone in the history of AI (Clifford, 2018).
OpenAI Five uses a technique called reinforcement learning, a specialized branch of AI that has proven to work particularly well in game-playing scenarios. Unlike other sub-fields of AI, reinforcement learning uses a reward system to train agents (such as a Dota 2 bot). This reward system works much like the way one would train a dog, or the way a baby learns new behaviors. When you touch a hot stove, you receive a negative reward in the form of pain, which makes you want to stop touching hot things. When you do something good, you might receive a gift from your parents, which encourages you to repeat the behavior. Similarly, a bot in Dota 2 might receive a negative reward when it loses a game and a positive reward when it wins.
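The reward idea can be sketched in a few lines of Python. This is a toy illustration, not OpenAI Five's training method: the two actions, their reward values, and the `train` function with its parameters are all made up for the example. The agent keeps a running estimate of how good each action is and mostly picks the one it currently believes is best.

```python
import random

# Hypothetical rewards for two made-up actions (illustrative only).
REWARDS = {"touch_stove": -1.0, "do_chores": 1.0}


def train(episodes=500, epsilon=0.1, step_size=0.1, seed=0):
    """Learn action values from rewards with an epsilon-greedy rule."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in REWARDS}
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.choice(list(REWARDS))
        else:
            # Exploit: pick the action currently believed to be best.
            action = max(values, key=values.get)
        reward = REWARDS[action]
        # Nudge the estimate toward the reward just received.
        values[action] += step_size * (reward - values[action])
    return values


values = train()
best_action = max(values, key=values.get)
print(values, best_action)
```

After training, the estimate for the punished action is negative and the estimate for the rewarded action is positive, so the agent ends up preferring the rewarded one. A real game-playing system replaces the two-entry table with a neural network and the fixed rewards with outcomes of full games.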
OpenAI Five is not alone in its achievement, but rather one of many game-playing AI systems. One of the first big game-playing achievements to make use of reinforcement learning came in the early 1990s, when Gerry Tesauro created a Backgammon bot that could compete with professional players (Tesauro, 1995). In 1997, the chess world champion was dethroned by IBM’s Deep Blue, which chose its moves with an algorithm that rapidly searched through possible outcomes and predicted its opponent’s moves (Piech, 2013). After a 19-year silence, DeepMind took the world by surprise in 2016 with AlphaGo, an AI that beat the then world champion of Go. With around 10^54 more possible board configurations than chess, Go is a game of extreme complexity and difficulty. Even the then champion, Lee Sedol, expected to beat AlphaGo easily, 4–1 or 5–0 (Lee, 2016). Just a year later, in 2017, a newer version, AlphaGo Zero, beat the 2016 version 100–0 (Silver, 2017). With OpenAI Five coming out just a year after that, game-playing AI seems to be on an unstoppable path to surpass humans in no time. But all the success raises the questions: why the sudden revival of game-playing AI? Why is it working so well? And will it continue like this?
Reinforcement learning is actually not the only AI field that has seen such rapid success in recent years. Rather, the field of deep learning, which powers modern reinforcement learning systems, has been improving in tasks from voice recognition (like you might use with Siri) to object detection (as used in the software made for self-driving cars). The AI boom was kicked off by the 2012 ImageNet competition, where teams competed to build programs that could identify what was in a given image. While progress on this problem had been slow for years, the winners of the 2012 competition improved on the previous best by around 9 percentage points, taking the accuracy from 75% to 84% with the use of deep learning (Gershgorn, 2017). However, the invention of deep learning is not the sole cause of the AI boom, as the underlying techniques had been invented in the 1950s but never heavily used. What did change was access to massive amounts of formatted data and to much faster hardware that could train larger deep learning models more quickly. While software has played an essential part in the current era of AI, it was not the cause of the initial revolution.
Software came to take a bigger role in development following the initial jump in 2012. Since then, a plethora of modified software has come out to augment the abilities of deep learning, and many models have been created that work particularly well on certain tasks. Driven primarily by improvements in software, the ImageNet competition came to an end in 2017, when AI surpassed humans in image detection with an accuracy score of 97.3%.
Looking into the future, software, hardware, and data will most likely all serve as pillars for the advancement of AI, but what is to say AI progress won’t slow down, or even reach a limit and stop? Data is more readily available on the internet than ever before. In particular, public data sets exist for most popular tasks in deep learning. Platforms like Kaggle and Google even have features that allow you to search for data sets. New data sets can also be created efficiently with services like Amazon’s Mechanical Turk, a crowdsourcing platform for labeling data. For the majority of tasks, lack of data no longer poses a threat. Hardware is a different story. While hardware had been doubling in efficiency every two years since 1965, recent evidence suggests that current hardware production techniques are reaching their limits, as parts can’t physically get much smaller (Simonite, 2017).
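To get a feel for what "doubling every two years since 1965" adds up to, here is a small back-of-the-envelope calculation. The helper name `doubling_factor` and the end year of 2017 (the year of the article cited above) are assumptions made for illustration, and the steady two-year doubling is the idealized trend, not measured data.

```python
def doubling_factor(start_year, end_year, years_per_doubling=2):
    """Total growth factor under an idealized steady doubling trend."""
    doublings = (end_year - start_year) / years_per_doubling
    return 2 ** doublings


# Assumed end year: 2017. That gives 52 years, or 26 doublings.
factor = doubling_factor(1965, 2017)
print(f"{factor:,.0f}")  # prints 67,108,864
```

A factor of roughly 67 million over five decades is why even a modest slowdown in this trend matters so much for compute-hungry methods.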
Luckily, software has been improving, but not as fast as recent accomplishments in the field may lead some to think. OpenAI used an LSTM, a deep learning model that has existed since 1997 (Hochreiter, 1997). To achieve its results, OpenAI used 128,000 CPU cores and 256 GPUs to train its bots (OpenAI, 2018). By comparison, an average computer has only about 2 CPU cores and 1 GPU. The bots trained over the course of months, collecting around 900 years of experience every day. AlphaGo Zero used a comparable amount of computing power to achieve its results. Such cases make it unlikely that achievements of this kind can be reached without a colossal supercomputer. With conventional hardware reaching its computing limits, such supercomputer systems are unlikely to keep scaling at a consistent rate. Deep learning will surely continue to achieve success in many fields, but without a new revolution, it may plateau, leading the world to its next AI winter.
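The scale of these numbers is easier to appreciate with a little arithmetic. The figures below are the ones quoted above; the 365-day year and the idea of spreading the experience evenly across all CPU cores are simplifying assumptions for illustration, not a description of how OpenAI actually distributed the work.

```python
YEARS_OF_PLAY_PER_DAY = 900   # experience collected per real day (from the text)
CPU_CORES = 128_000           # from the text
GPUS = 256                    # from the text
DAYS_PER_YEAR = 365           # assumption: ignore leap years

# How much faster than real time the bots gather experience overall.
speedup_vs_real_time = YEARS_OF_PLAY_PER_DAY * DAYS_PER_YEAR

# Ratio of CPU cores to GPUs in the training setup.
cores_per_gpu = CPU_CORES / GPUS

# Assumption: experience is spread evenly over every CPU core.
game_days_per_core_per_day = speedup_vs_real_time / CPU_CORES

print(speedup_vs_real_time)        # 328500
print(cores_per_gpu)               # 500.0
print(game_days_per_core_per_day)  # 2.56640625
```

Under these assumptions, each core only runs a couple of times faster than real time. The 328,500-fold speedup comes almost entirely from running on a huge number of cores at once, which is exactly the kind of hardware most people do not have.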
Clifford, C. (2018, June 28). Bill Gates says gamer bots from Elon Musk-backed nonprofit are ‘huge milestone’ in A.I. Retrieved from https://www.cnbc.com/2018/06/27/bill-gates-openai-robots-beating-humans-at-dota-2-is-ai-milestone.html
Gershgorn, D. (2017, July 26). The data that transformed AI research-and possibly the world. Retrieved from https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. doi:10.1162/neco.1997.9.8.1735
Piech, C. (2013). Deep Blue. Retrieved from http://stanford.edu/~cpiech/cs221/apps/deepBlue.html
Lee Sedol certain he’ll beat AI at ancient Chinese game Go. (2016, February 22). Retrieved from https://www.dailymail.co.uk/sciencetech/article-3458081/Human-champion-certain-hell-beat-AI-ancient-Chinese-game.html
OpenAI. (2018, September 11). OpenAI Five. Retrieved from https://blog.openai.com/openai-five/
OpenAI’s Dota Team. (2018, August 31). OpenAI Five Benchmark: Results. Retrieved from https://blog.openai.com/openai-five-benchmark-results/
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., . . . Hassabis, D. (2017, October 18). Mastering the game of Go without human knowledge. Retrieved from https://www.nature.com/articles/nature24270
Simonite, T. (2017, February 06). The foundation of the computing industry’s innovation is faltering. What can replace it? Retrieved from https://www.technologyreview.com/s/601441/moores-law-is-dead-now-what/
Tesauro, G. (1995). Temporal Difference Learning and TD-Gammon. Retrieved from http://www.bkgm.com/articles/tesauro/tdl.html
Source: Deep Learning on Medium