Original article was published on Artificial Intelligence on Medium
Emergence: How Artificial General Intelligence can be Computationally Modeled
An Argument in Support of Cellular Automata-like Mechanisms in Approaching Artificial General Intelligence (AGI)
“If we are ever to make a machine that will speak, understand or translate human languages, solve mathematical problems with imagination, practice a profession or direct an organization, either we must reduce these activities to a science so exact that we can tell a machine precisely how to go about doing them or we must develop a machine that can do things without being told precisely how” — Richard M. Friedberg, 1958
If artificial intelligence (AI) is a problem to be solved, artificial general intelligence (AGI) is currently the coveted prize. AGI refers to the hypothetical intelligence of a machine capable of understanding and learning human (anthropocentric) tasks. However, unlike normal problems, we seem to have difficulty understanding what an AGI system would even look like, and we have only recently begun to look at ways to systematically evaluate one.
That being said, it is clear that the three basic machine learning paradigms (supervised learning, e.g. neural networks, SVMs, and regression; unsupervised learning, e.g. clustering and dimensionality reduction; and reinforcement learning), although highly useful in narrowly defined tasks like computer vision, natural language processing, or even beating humans at the games of chess and shogi, fail to generalize to other domains. One could argue that the immense success of these paradigms is a byproduct of their assumptions, or inductive biases; for example, a linear model assumes the data is linear. These biases inhibit combinatorial generalization: the ability humans have to infer, reason, and act in a possibly infinite number of scenarios. For example, if you ask a neural network trained to detect human faces to detect faces hidden behind ski masks, the model may perform no better than random guessing, although a human could quickly reason that a face is beneath the mask.
In designing an AGI system, we need to consider systems that have no knowledge of the tasks at hand nor access to training data that could be used to “beat” the task metrics through an “unintelligent” workaround. Instead, an AGI system must be evaluated on its ability to efficiently acquire new skills to solve not just one but many tasks spread across various domains, with respect to anthropocentric priors, learned experience, and relative generalization difficulty. As a result, some basic assumptions about such a system can be made. These assumptions are central to the argument of this article, which is a call to action to reinvigorate research into cellular-automata-like machines as a path toward an AGI system.
Assumptions of an AGI System
- Evolution Assumption: The system must be able to evolve in a fashion dependent on time and space such that it can solve increasingly complex tasks and, for example, develop constructs like working memory — thus prioritizing the design of the initial system, environment, and evolutionary rules over the final deliverable. This is in opposition to the goal of traditional statistical models which is to develop an “expert” model in a particular, well-defined task.
- Generalization Assumption: The system must be capable of broad generalization (i.e. combinatorial generalization) such that previous experiences are the foundation for acquiring new learned skills and not the specific skills themselves.
- Learning Assumption: The system must optimize for flexibility and efficiency across a variety of tasks rather than a finite set of task-specific metrics (mean squared error, cross entropy, etc.). For example, efficiency in acquiring a new skill, with a stopping condition for when that skill is acquired, could be a potential optimization problem.
- Anthropocentric (Core Knowledge) Assumption: The system must be able to encode anthropocentric priors, i.e. what humans are predisposed to understand, like the domain of objects and the domain of numbers (humans strongly perceive the world as being composed of objects, naturally tend toward hierarchical representations, and have a basic understanding of the natural numbers).
These four assumptions are certainly not exhaustive but serve as a good starting point for where and where not to look. For example, many computational frameworks can easily incorporate the known priors necessary for the anthropocentric assumption (e.g. Bayesian networks or hidden Markov models); however, few models have the potential for combinatorial generalization (graph networks being one), and even fewer systems can perform time-dependent evolution to handle increasingly complex tasks (graph networks, for example, cannot).
In addition to our assumptions, several properties would be desirable in an AGI system. First and foremost, we would prefer the AGI system to be simple. Second, it would be desirable that from this simplicity arise emergent properties that the system can benefit from and that we can use to evaluate it. For example, it is well known that humans visually attend to objects both in serial and in parallel (e.g. reading a sentence vs. reading a word). While the designer of an AGI system might be able to directly encode both serial and parallel processing into it, Allen Newell’s quote below warns that attempting to pick apart human cognition through experimentation may be not only difficult but intractable, as human responses to controlled experiments may reveal less about the underlying mechanisms of cognition than about the human subject itself.
“Science advances by playing twenty questions with nature. The proper tactic is to frame a general question, hopefully binary, that can be attacked experimentally. Having settled that bits-worth, one can proceed to the next. The policy appears optimal — one never risks much, there is feedback from nature at every step, and progress is inevitable. Unfortunately, the questions never seem to be really answered, the strategy does not seem to work.” — Allen Newell, 1973, “You Can’t Play 20 Questions with Nature and Win”
Finally, we would like our AGI system to be tied to a mathematical framework so that we can define constraints, control the complexity and evolution of the system, and limit the search space of potential solutions. For example, the idea of “whole brain simulation”, aside from the likelihood that it is not possible, is problematic because its evolution would be uncontrollable due to our lack of understanding of the underlying mechanisms, which has obvious ethical implications. An essential extension of this property is that the mathematical framework itself must be able to increase in complexity if the capacity of the AGI system must increase. A linear model, for example, would never suffice: no matter how many parameters we add, we could never model a non-linear problem (e.g. XOR). Neural networks, on the other hand, can approximate any continuous function according to the Universal Approximation Theorem, so a newly defined architecture that also satisfies the assumptions above could increase its capacity by adding neurons and layers.
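The XOR claim is easy to verify computationally. The brute-force check below (an illustration, not from the article) searches a grid of weights and biases for a single linear threshold unit that labels all four XOR points correctly, and finds none:

```python
import itertools

# XOR truth table: (x1, x2) -> label
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def classifies_xor(w1, w2, b):
    """True if sign(w1*x1 + w2*x2 + b) labels all four XOR points correctly."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(y) for (x1, x2), y in xor.items())

# Sweep a grid of weights from -5.0 to 5.0 in steps of 0.25.
grid = [i / 4 for i in range(-20, 21)]
found = any(classifies_xor(w1, w2, b) for w1, w2, b in itertools.product(grid, repeat=3))
print(found)  # False: no linear separator exists for XOR
```

No grid resolution would change the answer: the four constraints a separator must satisfy (b ≤ 0, w1 + w2 + b ≤ 0, w2 + b > 0, w1 + b > 0) are jointly contradictory.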
Now, let’s step back and consider a seemingly unrelated topic to AGI — childhood memory, or more specifically the lack thereof.
Why Don’t We Remember Things as a Child?
If you’ve ever tried remembering your earliest memory, you may have realized that it is fragmented, unclear, or blurry — almost like a dream. In fact, it is highly likely that you have no recollection of the first several years of your life at all and that these initial fragmented memories occurred between the ages of 3 and 7. This well-studied phenomenon is known as “infantile amnesia” and refers to our inability to recall episodic memories from the first 3–4 years of our life; it is not just a human experience — in fact, many animal species exhibit forms of infantile amnesia. While it is unclear whether this is due to a failure of memory retrieval or a failure to store memories, it is apparent that the phenomenon has something to do with the process of neurogenesis, or, simply put, the brain maturing.
The validity of the neurogenesis hypothesis is actually of little concern to the argument of this article. More important is the recognition of the phenomenon itself: if we disregard developmental issues, human working memory seems to develop regularly and independently of our experiences. Nature, in this case, seems to trump nurture. Biological, psychological, and systematic analyses of this phenomenon all suggest that the information processing system of our brain becomes increasingly complex and that this evolution depends entirely on time.
The consequence of time-dependent neurological complexity arising from simplicity — human intelligence — is profound. Naturally, questions arise: what makes human intelligence different from non-anthropocentric intelligence, and is there a blueprint shared among all primates, all mammals, or even all animals? Most importantly, what mechanisms control the maturation of the brain, and how can this environment be sufficiently modeled? Answering these questions requires psychological, biological, and computational considerations; as such, we should limit our search to computational frameworks flexible enough to incorporate new understandings in these related domains and capable of complex evolution from simple initial configurations — enter cellular automata.
Background and Significance
Before his death in 1957, John von Neumann, considered one of the greatest mathematicians of his time, was working on the design of a universal constructor: a self-replicating machine that could be built in a cellular automata environment. The details of the machine were eventually compiled posthumously by Arthur W. Burks and published in the book Theory of Self-Reproducing Automata. The idea behind the universal constructor was to determine the minimal requirements for a machine to grow in complexity, similar to evolution in biological and social systems. Such a machine could generate any other machine conceivable in the cellular automata environment. The universal constructor consisted of three parts: a blueprint, a mechanism to execute the blueprint, and a mechanism to copy the blueprint, a design that impressively preceded Watson and Crick’s famous discovery of the structure of DNA. In 1995, a concrete implementation was designed (Figure 1). The key discovery of the universal constructor was that complex systems capable of reproduction could emerge from a simple environment defined by a simple set of rules.
Cellular automata, originally described by von Neumann and Stan Ulam, are mathematical idealizations of physical systems where space and time are discretized. The spatial environment is defined by the tessellation of an N-dimensional lattice often infinite in extent, where each discrete cell is associated with a finite, time-dependent state. The system evolves in discrete time steps according to a set of rules that depend on the neighborhood of each cell in the previous time step. Often, the neighborhood is defined by a cell’s immediately adjacent cells. Each cell is governed by the same rules, can take on exactly one state from a finite set of states, and is updated synchronously with the rest of the environment. The configuration of the system is the current state held by each cell. A predecessor is the configuration in the previous time step and the successor is the configuration that results from applying the transition rules to the current configuration.
For example, imagine a finite 5×5 grid with 25 cells that can each take on one of two states — on or off. For now, let’s ignore how to handle the edge cells. The neighborhood of each cell can then be defined as its 8 adjacent cells, like in Figure 2b. Because each cell can have 2 states, the neighborhood and the cell itself can be defined by 2⁹ different configurations. Furthermore, because our cell can then transition into 2 possible states, there are 2^(2⁹) possible rules. How we define these rules is the final step in specifying this cellular automata environment.
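The counting above can be checked directly:

```python
# Count the rule space for a binary CA with a Moore (8-cell) neighborhood.
states = 2          # on / off
neighborhood = 9    # the cell itself plus its 8 adjacent cells

configs = states ** neighborhood   # distinct neighborhood configurations
rules = states ** configs          # distinct transition functions

print(configs)           # 512
print(len(str(rules)))   # 2**512 has 155 decimal digits
```

Even for the simplest binary 2D case, the space of possible rules is astronomically large, which is why the choice of rules is the decisive step in specifying a cellular automata environment.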
One of the most famous and well-studied specifications of cellular automata is the late John Conway’s “Game of Life”, or simply “Life”. Like the previous example, each cell can be on or off — or in this case, alive or dead. The game is similarly defined on a 2D grid with three rules: any living cell with two or three living neighbors stays alive, any dead cell with exactly three living neighbors becomes alive, and all other living cells die in the next generation. Trivially, all other dead cells remain dead.
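The three rules translate almost directly into code. A minimal sketch of one Life generation, representing the grid as a set of live-cell coordinates (an implementation choice, not from the article, that sidesteps the edge-cell question by making the grid unbounded):

```python
from collections import Counter

def life_step(alive):
    """One generation of Conway's Game of Life.

    `alive` is a set of (x, y) coordinates of living cells on an unbounded grid.
    """
    # Count how many living neighbors each candidate cell has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in alive
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell lives next generation if it has exactly 3 living neighbors,
    # or if it is alive now and has exactly 2.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in alive)}

# The "blinker", a period-2 oscillator: a vertical bar flips to horizontal and back.
blinker = {(0, -1), (0, 0), (0, 1)}
print(life_step(blinker))                       # the horizontal bar
print(life_step(life_step(blinker)) == blinker) # True: period 2
```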
The “Game of Life” is called a zero-player game because the system evolves based only on its initial configuration, which specifies which cells are alive and dead. Amazingly, complex systems can be designed, as shown in Figure 3, with patterns that remain stationary (still lifes), oscillate with various periods (oscillators), and move rhythmically across the grid (spaceships).
Hopefully, it is now clear how a cellular automata system could conceivably organize itself in a way that satisfies all the assumptions of a basic AGI system. The evolution assumption is implicit in its design, which incorporates both temporal and spatial evolution. As with graph networks, combinatorial generalization could be achieved by allowing the system to develop basic building blocks with which to approach a task. Next, we can directly compare various initial configurations in terms of their efficiency and flexibility in a number of ways: for example, the number of iterations an initial configuration takes to reach a useful, complex configuration could be used to compare efficiencies, whereas the number of unique successor patterns that fit in an n × n grid could be used to evaluate flexibility. Finally, because cellular automata systems can be designed to be Turing complete, we can encode priors directly into the system. Only certain sets of rules allow for Turing-complete cellular automata, like rule 110 among the elementary (1D) cellular automata. This imposes a constraint on how we configure the system, because only sets of rules that allow Turing completeness should be evaluated for a given number of iterations.
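Elementary cellular automata like rule 110 are simple enough to simulate in a few lines. A minimal sketch under Wolfram’s rule numbering, where bit k of the rule number gives the next state for the neighborhood whose three cells spell k in binary (the wrap-around boundary is an illustrative choice):

```python
def eca_step(cells, rule=110):
    """One step of an elementary (1-D, binary) cellular automaton.

    `cells` is a list of 0/1 states; the row wraps around at the edges.
    Bit k of `rule` is the next state for the neighborhood with binary value k.
    """
    n = len(cells)
    return [
        (rule >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

# Evolve a single live cell for a few steps and render each row.
row = [0] * 20 + [1]
for _ in range(5):
    print("".join(".#"[c] for c in row))
    row = eca_step(row, rule=110)
```

Swapping the `rule` argument (0–255) selects any of the 256 elementary automata, only a handful of which, like rule 110, are known to support universal computation.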
How Cellular Automata Systems are Configured
It is important to quickly discuss how cellular automata could be configured. In a way, this represents the entire potential search space that could be explored. For now, we assume that all cellular automata systems exist in an infinite space and that all parameterizations of the cellular automata system are time-independent (i.e. will not change once initially set) and shared across all cells.
- Curvature and dimensionality of space (e.g. “Life” is a 2D Euclidean plane with zero curvature but higher dimensional hyper-planes with non-zero curvature could be explored)
- Tessellation or tiling geometries (e.g. regular polygons like squares, triangles, and hexagons, or semi-regular tilings mixing squares and triangles)
- Set of states (von Neumann’s original Universal Constructor had 29 states whereas “Life” only has 2)
- Neighborhood definition
- Transition function (rules)
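As an illustration, the five parameters above could be collected into a single record describing one point in the search space. All names below are hypothetical (not from the article), with “Life” filled in as the example configuration:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence, Tuple

@dataclass(frozen=True)
class CAConfig:
    """One point in the cellular-automata search space.

    All fields are fixed once set (time-independent) and shared across cells.
    """
    dimensions: int                            # dimensionality of the lattice
    tiling: str                                # e.g. "square", "triangular", "hexagonal"
    states: FrozenSet[int]                     # the finite set of cell states
    neighborhood: Tuple[Tuple[int, ...], ...]  # relative offsets of neighbor cells
    transition: Callable[[int, Sequence[int]], int]  # (state, neighbor states) -> next state
    curvature: float = 0.0                     # curvature of the embedding space

# "Life" in this record: 2-D flat square lattice, 2 states, Moore neighborhood.
MOORE = tuple((dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0))

def life_rule(state, neighbors):
    n = sum(neighbors)
    return 1 if n == 3 or (state == 1 and n == 2) else 0

life = CAConfig(dimensions=2, tiling="square", states=frozenset({0, 1}),
                neighborhood=MOORE, transition=life_rule)
print(len(life.neighborhood))  # 8
```

A search over this space would then amount to enumerating or sampling `CAConfig` instances and scoring each by the efficiency and flexibility measures discussed earlier.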
Conditional dependencies could also be explored, such as making the states, neighborhood definitions, and transition functions dependent on the tessellation geometry of a particular cell. I conjecture that exploring such dependencies, while increasing the complexity of the system, could increase the efficiency with which the automata evolve into useful, complex systems.
Where Do We Go From Here?
There are still disadvantages to cellular-automata-like systems. For example, by representing time and space as discrete units, inputs into the system must also be discretized. Unlike neural networks, which perform inference over raw sensory data like pictures, a cellular automaton must take in object-like input. This is an open problem and a disadvantage shared with graph neural networks. However, because the cellular automata could conceivably be used to design a sub-system that operates on discretized versions of, say, an audio input, this may prove not to be a limitation after all. Another limitation is the search space. Assuming a configuration like “Life”, one would have to explore a number of configurations that grows exponentially in the number of cells, for an undecidably long number of iterations, because whether a given pattern can ever appear from an initial configuration is undecidable. Of utmost importance and concern, however, is how to efficiently explore the space of initial configurations to solve a given problem. While I have some ideas here, they are at this point no more than educated hypotheses and require additional exploration.
As Friedberg pointed out, designing a system capable of artificial general intelligence exists on a spectrum between two extremes — one in which the entire human brain and all related activities are simulated by the system, and another in which the system is capable of evolving on its own to match the needs of the human brain — more succinctly, a model-centric vs. a system-centric approach. If you lean toward the latter, cellular-automata-like systems are a promising option because they allow complex systems — possibly even those required for human-like consciousness — to arise from simple initial configurations. A consequence with surprisingly human parallels.
Von Neumann, John, and Arthur W. Burks. Theory of Self-Reproducing Automata. University of Illinois Press, 1966.
 Pesavento, Umberto. “An implementation of von Neumann’s self-reproducing machine.” Artificial Life 2.4 (1995): 337–354.
 Wolfram, Stephen. Cellular automata and complexity: collected papers. CRC Press, 2018.
 Lee, Chung-Hwan, et al. “Implementation of Lava Flow Simulation Program Using Cellular Automata.” The Journal of the Petrological Society of Korea 26.1 (2017): 93–98.
 Alberini, Cristina M., and Alessio Travaglia. “Infantile amnesia: a critical period of learning to learn and remember.” Journal of Neuroscience 37.24 (2017): 5783–5795.
 Akers, Katherine G., et al. “Hippocampal neurogenesis regulates forgetting during adulthood and infancy.” Science 344.6184 (2014): 598–602.
 Chollet, François. “The Measure of Intelligence.” arXiv preprint arXiv:1911.01547 (2019).
Sainath, Tara N., et al. “Deep convolutional neural networks for LVCSR.” 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, 2013.
 LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep learning.” nature 521.7553 (2015): 436–444.
 Silver, David, et al. “Mastering chess and shogi by self-play with a general reinforcement learning algorithm.” arXiv preprint arXiv:1712.01815 (2017).
 Battaglia, Peter W., et al. “Relational inductive biases, deep learning, and graph networks.” arXiv preprint arXiv:1806.01261 (2018).
 Spelke, Elizabeth S. “Core knowledge.” American psychologist 55.11 (2000): 1233.
Treisman, Anne M., and Garry Gelade. “A feature-integration theory of attention.” Cognitive Psychology 12.1 (1980): 97–136.
 Newell, Allen. “You can’t play 20 questions with nature and win: Projective comments on the papers of this symposium.” (1973).
 Stiefel, Klaus M., and Daniel S. Brooks. “Why is There No Successful Whole Brain Simulation (Yet)?.” Biological Theory 14.2 (2019): 122–130.
Csáji, Balázs Csanád. “Approximation with artificial neural networks.” Faculty of Sciences, Eötvös Loránd University, Hungary 24.48 (2001): 7.
Cover photo courtesy of Dr. Manahel Thabet, PhD from his post https://www.manahelthabet.com/2018/08/06/artificial-intelligence-will-like-powerful-human-brain-future/