The current pace of progress in artificial intelligence is remarkably fast. Not a week goes by without an exciting new result about an AI system that can better understand natural language, recognize objects in an image, master games like chess or Go, or solve a Rubik’s Cube.
A major factor in this type of AI progress is the introduction of bigger, more expensive deep learning models. An analysis by Amodei and Hernandez (2018) shows an overall increase of 300,000x in the cost of training AI models (essentially, teaching them how to perform a given task) over time for state-of-the-art models.
This large increase in computational requirements has raised barriers to participation in AI research due to the rising cost of generating state-of-the-art results: the annual computation budget of many academic research labs, for example, is often smaller than the cost of training just one of the state-of-the-art models in the figure above a single time. It also has negative environmental impacts; Strubell et al. (2019) showed that training a modern AI model can emit a similar amount of CO2 to what an average car emits in five years.
To draw attention to this trend, our team of researchers at the Allen Institute for AI recently authored an opinion piece titled “Green AI”. In our paper, we listed the factors that contribute to this trend and outlined potential research directions to help mitigate it. We are happy to see our paper receiving wide interest (New York Times, Slate, VentureBeat, GeekWire, Fortune, Synced, MIT Tech Review, and others) and inciting discussions about this important topic.
In this blog post, we summarize the main points presented in the paper and address some of the issues raised in the public discussion that followed.
To better understand the concept of Green AI, we start by describing the opposite term, Red AI. Red AI refers to AI research that seeks to obtain state-of-the-art results in accuracy (or related measures) through the use of massive computational power — essentially “buying” stronger results.
To highlight the prevalence of Red AI, we demonstrated the AI community's focus on measures of performance, such as accuracy, at the expense of measures of efficiency, such as speed or energy cost: a large majority of the papers in top AI conferences target improvements in accuracy rather than efficiency.
What makes an AI system Red?
To better understand the different ways in which AI research can be considered Red, consider a specific result that gets reported in a scientific paper. AI systems typically learn to perform a specific task by observing many examples. Such tasks include recognizing an object in a given image, translating a piece of text from English to French, or making the next chess move. Developing such a system typically involves a phase called training, in which the model processes the collection of examples one by one (e.g., pairs of English/French translated sentences) until it has learned to make correct predictions on new, unseen examples (e.g., translating a new English sentence into French).
AI models are typically evaluated by measuring their accuracy (e.g., out of a set of, say, 1,000 unseen English sentences, how many did the model translate correctly into French?). To further improve accuracy, the development process often includes running multiple rounds of training and then selecting the model with the highest accuracy score.
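As a minimal sketch of the evaluation step described above (with toy data standing in for real translations), accuracy is simply the fraction of held-out examples the model gets right:

```python
def accuracy(predictions, references):
    """Fraction of unseen examples the model predicted correctly."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Toy example: the model's output matches the reference on 3 of 4 sentences.
score = accuracy(["a", "b", "c", "x"], ["a", "b", "c", "d"])
print(score)  # 0.75
```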
In recent years, many AI researchers have adopted a simple way to make AI models more accurate, by simply pouring more resources into their development:
a) training a more computationally expensive model, in which processing a single (E)xample is more costly in terms of money and energy,
b) training the model on more (D)ata (a larger number of examples), which results in longer and more expensive training, and
c) running many training experiments, in search of even higher accuracy results, often referred to as (H)yperparameter tuning.
Together, the overall cost of an AI (R)esult can be thought of as a function of these three factors:
The equation of Red AI: the cost of an AI (R)esult grows linearly with each of the cost of processing a single (E)xample, the size of the training (D)ataset, and the number of (H)yperparameter experiments.
Increasing each of these factors increases the overall monetary and energy cost of generating the given result, which we refer to as Red AI.
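The relation among these three factors can be sketched in a few lines of Python (the numbers below are hypothetical, chosen only to show the linear scaling; they are not figures from the paper):

```python
def red_ai_cost(cost_per_example, dataset_size, num_experiments):
    """Overall cost of producing an AI result, per the relation above:
    cost grows linearly in each of E (cost per example),
    D (dataset size), and H (number of hyperparameter experiments)."""
    return cost_per_example * dataset_size * num_experiments

# Doubling any single factor doubles the total cost of the result.
base = red_ai_cost(cost_per_example=1.0, dataset_size=1_000_000, num_experiments=10)
doubled_data = red_ai_cost(cost_per_example=1.0, dataset_size=2_000_000, num_experiments=10)
assert doubled_data == 2 * base
```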
Our goal here is to shed light on the practices of Red AI. Importantly, we note that Red AI work is valuable, and in fact, much of it contributes to what we know by pushing the boundaries of AI. Indeed, there is value in pushing the limits of each of the quantities discussed above. Currently, despite the massive amount of resources put into recent AI models, such investment still pays off in terms of accuracy (albeit at an increasingly slower rate). Finding the point of saturation (if one exists) is an important question for the future of AI. Red AI costs can sometimes even be amortized, and can therefore be extremely valuable: a model trained with Red AI methods may be reused by many research projects as a built-in component that does not require re-training.
A key goal of our paper is to raise awareness of the cost of Red AI, as well as encourage the AI community to recognize the value of work by researchers that take a different path, optimizing efficiency rather than accuracy alone.
The term Green AI refers to AI research that yields novel results without increasing computational cost, and ideally reducing it. Whereas Red AI has resulted in rapidly escalating computational (and thus carbon) costs, Green AI has the opposite effect. If measures of efficiency are widely accepted as important evaluation metrics for research alongside accuracy, then researchers will have the option of focusing on the efficiency of their models with a positive impact on both the environment and inclusiveness.
One of the challenges in making efficiency a core evaluation measure is that there are multiple potential measures of efficiency, each limited in various ways. For instance, the amount of carbon emitted by developing a given AI system is an important measure, but one that is hard to measure accurately and that largely depends on the local electricity infrastructure. The amount of electricity consumed by an AI system is easier to measure, but it also largely depends on the local machine on which the experiments are run, and is thus not comparable across researchers in different locations. Measuring the amount of money an experiment costs could help inspire the development of cheaper AI models, but here again measurement is a challenge, as it is not clear how to price experiments on local hardware.
Among the set of potential measures, we suggest reporting the total number of floating-point operations (FPO) required to generate a result. FPO provides an estimate of the amount of work performed by a computational process. It can be computed analytically for any AI model. FPO has been used in the past to quantify the energy footprint of a model, but it is not yet widely adopted in AI.
FPO has several appealing properties. First, it directly measures the amount of work done by the machine when executing a specific instance of a model, and is thus tied to the amount of energy consumed. Second, FPO is agnostic to the hardware on which the model is run. This facilitates fair comparisons between different approaches, unlike the measures described above.
Despite these benefits, FPO also has some limitations. Most importantly, the energy consumption of a model is influenced not only by the amount of work, but also by the communication between the different components, which is not captured by FPO. Nonetheless, we believe that FPO is a good compromise for measuring the costs of AI experiments, and we urge the AI community to report it in their experiments. As an example, our Green AI paper presents several FPO results for leading AI models.
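To make concrete what "computed analytically" means, here is a sketch of an FPO count for a simple fully connected network. The layer sizes are hypothetical, the count covers only the weight multiplications and additions of a forward pass (one multiply and one add per weight, a standard convention), and the 3x backward-pass factor is a common rule of thumb rather than an exact figure:

```python
def dense_layer_fpo(in_dim, out_dim):
    # One multiply and one add per weight: 2 * in_dim * out_dim operations.
    return 2 * in_dim * out_dim

def model_fpo(layer_dims):
    """Analytic FPO for one forward pass through a fully connected
    network (bias and activation-function operations ignored)."""
    return sum(dense_layer_fpo(m, n) for m, n in zip(layer_dims, layer_dims[1:]))

forward_fpo = model_fpo([784, 512, 256, 10])  # hypothetical small classifier

# Total training FPO then scales with dataset size, number of epochs,
# and a ~3x factor to account for the backward pass.
total_fpo = forward_fpo * 3 * 60_000 * 10
```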
Some companies such as Google, Amazon, and Microsoft make substantial efforts to use energy from renewable sources and/or buy carbon offsets to reduce their carbon footprints. These efforts are very important and could have a significant positive impact on the overall emissions of the AI industry.
Nonetheless, we argue that more should be done to promote Green AI. First, not all researchers have access to renewable energy. Second, it is unclear that buying carbon offsets is as environmentally friendly as not polluting (or polluting less) to begin with. Third, the inclusiveness concerns raised by the rising costs of AI still stand, even in a 100% renewable energy world, due to the monetary cost of energy.
Our call for increased reporting of computational resources echoes existing efforts in the AI community, in scientific publications (Sculley et al., 2018, Oliver et al., 2018, Dodge et al., 2019), as well as through initiatives of AI conferences such as NeurIPS 2019 and EMNLP 2020, which require submissions to fill out forms that include information about the computational budget used to generate the reported results. We urge more researchers to report the computational budgets used in their experiments, as well as to develop methods that facilitate this reporting.
The development of efficient machine learning approaches has also received attention in the research community. For example, a significant amount of work in the computer vision community has addressed efficient methods for real-time processing of images for applications like self-driving cars (Rastegari et al., 2016, Liu et al., 2016, Ma et al., 2018), or for placing models on devices such as mobile phones (Howard et al., 2017, Sandler et al., 2018). Recently, several works have addressed efficient model training (Dettmers and Zettlemoyer, 2019, Lan et al., 2020, Clark et al., 2020) and presented methods that reach high accuracy with fewer training runs (Li et al., 2017). Our goal is to encourage more researchers to work on these problems by highlighting the large potential benefits of introducing more efficient AI methods.
Conclusion: increased transparency about the costs of AI
The sharp increase in the costs of AI prevents many researchers from studying state-of-the-art methods, and many practitioners from adopting the products of cutting-edge research. It also has a potentially non-negligible impact on the environment.
We argue that the first step towards making AI greener is to increase reporting of these costs, which are often hidden in AI scientific publications. We believe that making efficiency a core evaluation metric will encourage the community to put more effort into developing more efficient methods, which will help mitigate many of these concerns. At the Allen Institute for AI, we are working towards solutions to each of these problems.