DeepMind achieved StarCraft II GrandMaster Level, but at what cost?

AlphaStar Final vs Serral. Source: YouTube ArtosisTV

Thanks to advances in deep learning and reinforcement learning, DeepMind achieved groundbreaking success first in Atari games, then in Go, and finally in StarCraft II.

One of the major benefits of AI research is the ease of replication. Before this decade, many researchers had to hand-code gradient calculations, loss functions, and data processing pipelines. Nowadays, solving the MNIST digit-classification problem with Keras requires fewer than 70 lines of code.
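
As a rough illustration, here is a minimal sketch along the lines of the standard Keras beginner example (the layer sizes and rough accuracy below are a common baseline, not figures from this article):

```python
# Minimal MNIST classifier in Keras -- well under 70 lines.
import tensorflow as tf
from tensorflow import keras

# Load and normalize the MNIST digits (28x28 grayscale images, 10 classes).
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small fully connected network is enough for roughly 98% test accuracy.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```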

Replicating DeepMind’s Atari results is relatively easy. An Nvidia 2080 Ti GPU would undoubtedly speed up your training, but all you need is a modern computer. There are many resources available online that help speed up the development, training, and testing of an Atari deep RL agent. Most notably, OpenAI Baselines has already implemented the Atari solutions for us. In fact, this task is easy enough to be assigned as homework in Berkeley's CS285 Deep Reinforcement Learning course (OK, the class is not easy, but you get my point).
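
To give a sense of how little glue code is involved, here is a sketch of interacting with an Atari environment through OpenAI Gym; the Baselines command in the comment follows the pattern from its README, though the exact flags may vary across versions:

```python
# Random-policy rollout on an Atari game (requires: pip install gym[atari]).
# Uses the classic Gym API (reset() returns obs, step() returns a 4-tuple).
import gym

env = gym.make("BreakoutNoFrameskip-v4")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # replace with a trained policy
    obs, reward, done, info = env.step(action)
env.close()

# OpenAI Baselines wraps the full training loop behind one entry point, e.g.:
#   python -m baselines.run --alg=deepq --env=BreakoutNoFrameskip-v4 --num_timesteps=1e7
```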

Replicating AlphaGo, DeepMind’s agent for playing Go, is a different story. AlphaGo Zero: Starting from scratch is the official story behind the development of AlphaGo Zero, and you can read more details about its performance and computation requirements in the Mastering the game of Go without human knowledge paper. According to the analysis by Dan, it would cost around $35 million to replicate the paper. Yeah, that is not a small number for any independent researcher, or even a large university lab.

AlphaStar Agent Visualization. Source: DeepMind.

Training AlphaStar, the DeepMind agent for StarCraft II, is not an easy feat either. According to the DeepMind blog,

In order to train AlphaStar, we built a highly scalable distributed training setup using Google’s v3 TPUs that supports a population of agents learning from many thousands of parallel instances of StarCraft II. The AlphaStar league was run for 14 days, using 16 TPUs for each agent.

We are not sure how many agents DeepMind used. The DeepMind blog illustrated 600 agent IDs, but not all of those agents were trained concurrently for 14 days. According to the DeepMind StarCraft II Demonstration, only 5 agents were selected in the end for the match against TLO. These 5 agents are now considered an intermediate step, because they only played Protoss and seemed to have superhuman control, such as peak APM.

Let us assume there were only 5 agents in training. As a reference, a v3-32 TPU Pod costs $32 per hour based on the Cloud TPU pricing updated in December 2019. Thus, we are looking at the following numbers:

Number of agents: 5

Hours: 14 days * 24 hours per day = 336 hours

Number of TPUs per agent: 16

TPU Cost Per Hour: $32

Total Cost: $860,160

In the more recent DeepMind Nature paper, DeepMind resolved the earlier superhuman-control issues and improved the performance of the agents. We can also find additional clues regarding the number of agents. The agent here refers to the AlphaStar Final agent for each StarCraft race that achieved GrandMaster level.

In StarCraft, each player chooses one of three races — Terran, Protoss or Zerg — each with distinct mechanics. We trained the league using three main agents (one for each StarCraft race), three main exploiter agents (one for each race), and six league exploiter agents (two for each race). Each agent was trained using 32 third-generation tensor processing units (TPUs) over 44 days.

Number of agents: 3 + 3 + 6 = 12

Hours: 44 days * 24 hours per day = 1,056 hours

Number of TPUs per agent: 32

TPU Cost Per Hour: $32

Total Cost: $12,976,128
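
Putting both estimates in one place, here is a quick sanity check of the arithmetic (the agent counts, TPU counts, and the $32-per-hour rate are the figures quoted above):

```python
# Back-of-the-envelope replication cost: agents * hours * TPUs per agent * $/hour.
def training_cost(num_agents, days, tpus_per_agent, tpu_cost_per_hour=32):
    hours = days * 24
    return num_agents * hours * tpus_per_agent * tpu_cost_per_hour

# AlphaStar league from the blog post: 5 agents, 14 days, 16 TPUs each.
print(training_cost(5, 14, 16))              # 860160
# AlphaStar Final from the Nature paper: 3 + 3 + 6 agents, 44 days, 32 TPUs each.
print(training_cost(3 + 3 + 6, 44, 32))      # 12976128
```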

Yes, it would cost $12,976,128 to replicate AlphaStar Final. Admittedly, it costs DeepMind less to use TPUs because it is considered an internal customer at Google. However, we are only considering the cost of replication for an independent researcher here, and we are not accounting for the cost of model tuning, evaluation, and pipeline code development (or keeping Oriol Vinyals and David Silver on the payroll).

Without both the infrastructure and the financial support of its parent company Google, DeepMind would never have been able to achieve GrandMaster level in StarCraft II. And this is only the beginning, as DeepMind continues to contribute in areas such as weather forecasting and radiology screening.

Moving forward into the new decade, as independent researchers we should not try to compete with big companies like Google in areas that require large amounts of computing resources. While waiting for compute costs to come down for the same or better hardware, we should keep optimizing our model architectures for faster training and inference, and come up with novel ways to gather and annotate data.