Recurrence in biological and artificial neural networks

Source: Deep Learning on Medium


By Matthew Roos

Recurrence is an overloaded term in the context of neural networks, with disparate colloquial meanings in the machine learning and the neuroscience communities. The difference is narrowing, however, as the artificial neural networks (ANNs) used for practical applications are increasingly sophisticated and more like biological neural networks (BNNs) in some ways (yet still vastly different on the whole).

In this post we’ll highlight the historic differences in the use of the term recurrence within these two communities, describe some fairly recent deep learning ANN models that creep towards the neuroscience usage, point to some neuroscience studies that shed light on the function of recurrence, and speculate on future advancements.

Too long; didn’t read

  • What the deep learning community refers to as recurrent connections is akin to what the neuroscience community calls lateral connections: namely, connections among neurons in a localized area.
  • In the neuroscience community, a recurrent network is one that is prolific in its connectivity, including feed-forward, lateral, and feed-back connections.
  • Feed-back connections accommodate animal capabilities and behaviors that may be impossible to replicate in deep learning models where such connections are absent.

Recurrence in deep learning artificial neural networks

As many readers will be aware, deep learning networks are a sub-type of ANNs in which the neurons (or nodes) are arranged into layers. The presence of many layers in such a network gives it a subjective depth that is its namesake, contrasting with earlier researched networks of this type that had only one or two such layers. In a stereotypical fully-connected feed-forward deep learning network, all neurons in a given layer send their outputs to all neurons in the layer immediately following it (the directional flow of computations is typically schematized as going between layers from bottom-to-top or from left-to-right).

One might also design networks in which neurons in a given layer send their outputs to the layer immediately preceding it, thus introducing feed-back connections between layers. We’ll come back to that a bit later.

Finally, a layer of neurons could send its outputs back to itself in a fully-connected (or other) fashion. Information stored in the layer recurs as input to that same layer in the next time/processing step. This is the type of recurrence that is nearly always meant when discussed by a deep learning practitioner — recurrence that is confined to a layer. (Note that there may be multiple recurrent layers, but the inter-layer connections are only feed-forward.)

Unlike the feed-forward network on the right, the network on the left has a recurrent layer (the larger light blue circles) that “feeds back” onto itself. In deep learning parlance, the network is termed a recurrent network rather than a feed-back network because the feed-back does not project to a preceding layer. Note that although the recurrent neurons in the figure are depicted as connecting back to themselves individually, the typical arrangement is for all neurons in the recurrent layer to connect to all other neurons in that same layer.

What this recurrent connectivity does is bestow memory on the recurrent neural network (RNN). The outputs of the network are no longer dependent solely on the input at the current time step. Rather, the network has a “state” at any given time, which in combination with the next input provides a new output and also updates that network’s state.
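The state update described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the tanh nonlinearity is the textbook choice for a vanilla recurrent layer, and the weight values and the names `rnn_step`, `run_rnn`, `W_xh`, and `W_hh` are assumptions made up for this example.

```python
import math

def rnn_step(x, h, W_xh, W_hh, b):
    """One time step of a vanilla recurrent layer: h_new = tanh(W_xh x + W_hh h + b)."""
    h_new = []
    for i in range(len(h)):
        total = b[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))  # feed-forward input
        total += sum(W_hh[i][k] * h[k] for k in range(len(h)))  # within-layer recurrence
        h_new.append(math.tanh(total))
    return h_new

def run_rnn(inputs, h0, W_xh, W_hh, b):
    """Feed a sequence through the layer, carrying the state from step to step."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states

# Hypothetical weights: one input unit, two recurrent units.
W_xh = [[1.0], [0.5]]
W_hh = [[0.0, 0.8], [0.8, 0.0]]
b = [0.0, 0.0]

# Both sequences end with the same input (zero) but have different histories.
seq_a = [[1.0], [0.0]]
seq_b = [[0.0], [0.0]]
final_a = run_rnn(seq_a, [0.0, 0.0], W_xh, W_hh, b)[-1]
final_b = run_rnn(seq_b, [0.0, 0.0], W_xh, W_hh, b)[-1]
```

Although the last input is identical in both sequences, `final_a` and `final_b` differ, because the state carries a trace of the earlier input. A purely feed-forward layer would produce the same output in both cases.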

This allows RNNs to recognize or produce patterns that vary in their temporal structure, such as speech [1]. The utterances <sleep> and <sleeeep> can both be recognized as the word “sleep,” for example. In fact, major advancements in the design and training methods of such sequence-to-sequence networks [2] are a key reason why speech recognition technologies have advanced so greatly in the past 2–3 years. Siri and Alexa may still be dumb as rocks, but at least they can translate your spoken words into text with great accuracy (though you may not always know it based on their responses!).

Language translation for text has been another area of great success. The use of recurrence allows information to be accumulated during an encoding phase and distributed (output across time) during a decoding phase, so a direct word-to-word or phrase-to-phrase alignment is not necessary [2]. This allows, for example, modifiers that precede a word in one language to follow it in another, such as when translating red hat to sombrero rojo.

The use of LSTMs, a type of RNN, allows for language translation networks with network memory that can accumulate information. Words (as vector representations) are incrementally input to the network, and the network distributes output words in a different language, with some delay. This is successful even when ordering of parts of speech (nouns, adjectives, etc.) is different between the two languages. [Image taken from The Keras Blog.]

We’d be remiss not to mention that the “vanilla” RNN architecture described above is rarely used in practice. Advanced applications typically rely on human-devised modifications that accommodate gating mechanisms. In some sense, this allows the state memory of the recurrent layer to be “dumped” when a certain input is received or a certain output is delivered. As an analogy, when you finish a sentence, and the related thought, you may wish to dump that thought so it does not become muddled with your next one. Remarkably, one of the most common and effective gated layers, the long short-term memory (LSTM) layer, was originally created in 1997 [3], well before the recent advances in RNN-based applications. See Christopher Olah’s blog post for an excellent tutorial on LSTMs.
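To make the idea of “dumping” state concrete, here is a minimal sketch of a single-unit LSTM cell in plain Python. It uses the commonly taught formulation with input, forget, and output gates, which is not necessarily identical to the 1997 original. The parameter values, the two-element input layout (a signal and a reset flag), and the names `lstm_step` and `params` are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, params):
    """One step of a single-unit LSTM cell; x is a list, h and c are floats."""
    def pre_activation(gate):
        w_x, w_h, bias = params[gate]
        return sum(w * xi for w, xi in zip(w_x, x)) + w_h * h + bias

    f = sigmoid(pre_activation("forget"))       # how much old cell state to keep
    i = sigmoid(pre_activation("input"))        # how much new content to write
    o = sigmoid(pre_activation("output"))       # how much state to expose
    g = math.tanh(pre_activation("candidate"))  # proposed new content
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# Hypothetical parameters: (input weights, recurrent weight, bias) per gate.
# Input is [signal, reset]; a reset of 1 drives the forget gate towards 0.
params = {
    "forget": ([0.0, -20.0], 0.0, 10.0),
    "input": ([0.0, 0.0], 0.0, 10.0),
    "output": ([0.0, 0.0], 0.0, 10.0),
    "candidate": ([1.0, 0.0], 0.0, 0.0),
}

h, c = 0.0, 0.0
cell_states = []
for x in [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]:
    h, c = lstm_step(x, h, c, params)
    cell_states.append(c)
```

With these made-up parameters the cell state accumulates over the first two steps and then collapses to nearly zero on the third step, when the reset flag closes the forget gate. In a trained network the gate parameters are learned rather than set by hand.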

Recurrence in biological neural networks

Among neuroscientists, recurrence has a broader definition, based in part on the nearly isotropic connectivity patterns between neurons in biological neural networks (BNNs). Neurons are prolific in their axonal projections to other neurons, sending them both forwards and backwards, short distances and long distances. While there is strong evidence of a coarse hierarchical arrangement both structurally and functionally [4], the cortex of the brain is clearly not arranged into confined layers (groups) of neurons. The brain as a whole has distinct regions, with distinct types of neurons and neurotransmitters, but nothing like the compartmentalized connectivity that is a defining feature of deep learning ANNs. Nonetheless, what a deep learning practitioner calls recurrent connections are more likely to be called lateral connections by a neuroscientist.

An aspect of recurrent networks that has been heavily researched by computational neuroscientists is the pattern-completion property of a so-called attractor network. Consider how in our own minds it may take only a brief glimpse, a short burst of sound, or a scant whiff of an odor to bring about a strong, vibrant memory. Or when trying to recall an actor or actress’s name we visualize their face, think of names of other performers they have worked with, movie titles, etc., until suddenly their name seems to magically pop into our heads. An analogy to this sort of phenomenon has been observed in simulations of recurrent attractor networks (an ANN, but without a deep learning structure, and often with inhibitory as well as excitatory artificial neurons, meant to be a more realistic model of BNNs). For example, a pattern of neural activity driven by the image of a face may also be driven by an obscured or noisy image of that same face, although the dynamics of the network take longer to evolve to a stable state in the latter case.

The energy landscape of a Hopfield attractor network. Sensory information may briefly position the network activity in an unstable, partial-information state, from which it dynamically moves (adjusts neuron firing rates) to a stable state that represents a fully-remembered object, sensation, abstract idea, etc. In this case, “memory” is really the strength and pattern of the synaptic connections between neurons, and recall of that memory is the neuronal firing pattern that ensues when external or internal stimuli push the network beyond the edge of the memory’s attractor basin (like a drop of rain that eventually flows to one of many possible lakes or oceans). See Jack Terwilliger’s blog post for more detail.
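Pattern completion in an attractor network can be illustrated with a tiny binary Hopfield network. The sketch below uses the textbook construction (Hebbian outer-product weights, asynchronous sign updates, and the standard energy function); the two stored patterns, the network size, and the function names are assumptions picked so the example stays small and deterministic.

```python
def train_hopfield(patterns):
    """Hebbian learning: W[i][j] is the average co-activation of units i and j."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    W[i][j] += p[i] * p[j] / n
    return W

def energy(W, s):
    """Standard Hopfield energy; stored memories sit at local minima."""
    n = len(s)
    return -0.5 * sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def recall(W, probe, max_sweeps=10):
    """Update units one at a time until the state stops changing."""
    s = list(probe)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(W[i][j] * s[j] for j in range(len(s)))
            new = s[i] if field == 0 else (1 if field > 0 else -1)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

# Two hypothetical memories stored in the same set of connections.
memory_a = [1, 1, 1, 1, -1, -1, -1, -1]
memory_b = [1, -1, 1, -1, 1, -1, 1, -1]
W = train_hopfield([memory_a, memory_b])

# A corrupted cue: memory_a with its first unit flipped.
cue = [-1, 1, 1, 1, -1, -1, -1, -1]
recalled = recall(W, cue)
```

Here `recalled` settles back onto `memory_a`, and `energy(W, recalled)` is lower than `energy(W, cue)`, which corresponds to the network rolling downhill into the attractor basin of the stored memory.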

What may be more important than the distinction between the confined recurrence (within layers) of deep learning ANNs versus the broad recurrence of BNNs is the lack of feed-back connectivity in most deep learning models. In the neuroscience community, the term recurrence is nearly synonymous with a mix of feed-back and feed-forward connections, and recent studies are providing new evidence of the role of feed-back.

Likely functions of recurrent and feed-back connectivity in biological networks:

  • Iterative sensory processing: Recurrent processing, in which bottom-up and top-down information streams interact to settle upon a stable result. See the following section for a deeper treatment of the topic.
  • Long-term memory: Incomplete information can initiate recall of a memory from long-term storage in an attractor network (as described above).
  • Short-term memory: Short-term memory, the kind needed to remember a short sequence of digits or the content of a few sentences, may be maintained by neurons that collectively generate a stable (but possibly dynamic) firing pattern, somewhat like an attractor except maintaining a new, short-term memory rather than recalling a stored long-term memory. This functionality is related to that of the sequence-to-sequence deep learning RNNs described above (e.g., allowing for speech recognition and language translation).
  • Top-down goal-driven attention: Based on an organism’s task-at-hand and related goals, not all sensory information is equally valuable. An animal searching for its favorite red berries may have feed-back connections that enhance the activity of lower-level neurons that respond to red light while reducing activity of those that respond to other colors. Neural models of this process have leveraged work by the deep learning community [5].
  • Plasticity: Recurrence is also an important part of the learning mechanisms in biological brains. For example, dopamine-releasing neurons in sub-cortical basal nuclei are part of an intricate network composed of cortical and sub-cortical areas, and ultimately can enhance plasticity in cortical regions in response to behaviors that result in rewards (food, mating, etc.), thereby reinforcing that behavior. This type of neuronal and network sophistication is almost completely absent in state-of-the-art deep learning.
  • Gating: Speculatively, feed-back may also serve as a gating mechanism to control the flow of information from lower-level to higher-level neurons. Attention may co-opt such gating, but here we refer to gating that is not driven by an organism’s conscious perceptions and goals. For example, it is well known that visual information about the identity of an object is extracted and refined along a pathway from occipital cortex to inferotemporal cortex. In contrast, object location information is extracted and refined along a path from occipital cortex to parietal cortex. (Note that this is an overly simplistic description.) Gating may help direct this information routing, and may be a mechanism that supports the iterative sensory processing discussed at the top of this list.
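As a toy illustration of the top-down attention item in the list above, feed-back can be caricatured as a multiplicative gain applied to lower-level responses. This is only a cartoon under stated assumptions: the multiplicative form, the `strength` parameter, the response values, and the function name `apply_feedback_gain` are all hypothetical and are not taken from any of the cited models.

```python
def apply_feedback_gain(responses, goal, strength=0.5):
    """Boost the lower-level response that matches the goal; suppress the others."""
    return {
        feature: r * (1.0 + strength if feature == goal else 1.0 - strength)
        for feature, r in responses.items()
    }

# Hypothetical bottom-up responses of color-tuned units to a scene.
responses = {"red": 0.6, "green": 0.7, "blue": 0.5}

# The animal is searching for red berries, so higher areas feed back "red".
modulated = apply_feedback_gain(responses, goal="red")

strongest_before = max(responses, key=responses.get)
strongest_after = max(modulated, key=modulated.get)
```

Before feed-back the green-tuned unit responds most strongly; after feed-back the red-tuned unit dominates, even though the sensory input itself has not changed.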

Iterative Sensory Processing

We’ll briefly highlight the iterative sensory processing role of recurrent/feed-back connections in BNNs and contrast it with the feed-forward convolutional neural networks (CNNs) that dominate image classification tasks in deep learning ANNs.

Deep learning object (image) recognition models have been a huge success in the field, and since the publication of the first CNN model to win the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), “AlexNet” [6], the field has grown rapidly. For a nice tutorial, see this blog post, or one of many others. Because the visual cortex is the most well-studied area of the mammalian cortex, many subjective and quantitative comparisons have been made between deep learning CNNs and mammalian vision.

Earlier neuroscientific models of vision, based on recordings of individual neurons by Hubel and Wiesel [7], and others, were similar to standard CNNs in that they had convolutions, pooling, and exclusively feed-forward connections [8, 9]. Part of the motivation for functional models that were exclusively feed-forward is the fact that visual perception happens quickly, on the order of 100 ms. This estimate is based on neural firing times in “higher” areas of the brain relative to the moment at which an image is shown to lab animals. Based on anatomy, the visual cortex is often modeled as a loose hierarchy of 4–6 levels with heavy feedback connectivity. Despite the feedback connections, the rapidity of neural responses at the higher levels suggested that the feedback connections are not altogether necessary (for a simple object recognition task). If that were not the case, stable responses would be slower to form in those areas because additional time would be needed for the contributions from the feedback loops to propagate.

Yet CNNs require dozens if not hundreds of layers to achieve good image classification performance on the challenging ILSVRC test set, in contradiction to a model of the visual cortex composed of just a few feed-forward levels. In addition, in some computational studies, relatively shallow RNNs have performed comparably to very deep CNNs [10, 11].

Liao and Poggio [10] built a 4-level recurrent network meant to model the visual cortex. In this coarse model, visual input from the eye (via retina and thalamus) enters the primary visual cortex, V1. The network contains feed-forward connections (left-to-right), feed-back connections (right-to-left) and lateral connections (looping back to same area; synonymous with recurrent connections in deep learning terminology). Outputs from the inferotemporal area, IT, are used for object classification. They demonstrate that a shallow RNN is equivalent to a very deep ResNet model (a type of deep learning ANN) with weight sharing among the ResNet layers.
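The equivalence between a shallow recurrent network and a deep weight-shared ResNet can be seen by unrolling the recurrence in time. The sketch below is a simplified illustration rather than the model from [10]: the residual function `tanh(W h)`, the 2x2 weight matrix, and the function names are assumptions chosen to keep the example short.

```python
import math

def residual_block(h, W):
    """h + tanh(W h): one residual block, or equivalently one recurrent time step."""
    n = len(h)
    return [h[i] + math.tanh(sum(W[i][j] * h[j] for j in range(n))) for i in range(n)]

def deep_resnet(h, layer_weights):
    """Feed-forward: a stack of residual blocks, one weight matrix per layer."""
    for W in layer_weights:
        h = residual_block(h, W)
    return h

def recurrent_net(h, W, steps):
    """Recurrent: a single block whose output is fed back as its own input."""
    for _ in range(steps):
        h = residual_block(h, W)
    return h

# Hypothetical weights and input.
W = [[0.2, -0.5], [0.4, 0.1]]
x = [1.0, -1.0]
depth = 20

out_resnet = deep_resnet(x, [W] * depth)  # 20 layers that all share W
out_rnn = recurrent_net(x, W, depth)      # 1 layer iterated 20 times
```

The two outputs are identical because the same operations are applied in the same order. The difference is in the bookkeeping: the recurrent version needs only one weight matrix, while an ordinary ResNet without weight sharing would learn a separate matrix for each of the 20 layers.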

Recently, a pair of neuroscientific studies out of world-class labs, along with a more nuanced understanding of the time delays of biological recurrent connections, suggest that recurrence is required to capture the dynamic computations of the human visual cortex [12], and that recurrence is critical to the visual cortex’s execution of object recognition behavior [13]. Simply put, the evidence suggests that more “challenging” instances of object images could not be recognized without being iterated upon multiple times by the recurrent network. Said another way, additional nonlinear transformations were needed to successfully recognize objects in the challenge cases.

Final words

As mentioned, while recurrent deep learning ANN models have within-layer recurrence (“lateral” connectivity in neuroscience parlance), very few have the type of feed-back connections generally studied by neuroscientists — that is, connections from higher layers to lower layers. Notable exceptions include ANNs that model attention, and a handful of image classification models.

One reason for the near absence of deep learning models with feed-back connectivity is the difficulty in training such models. We may need new learning rules (methods other than back-propagation) to achieve the type of functionality that feed-back provides in BNNs.

Relatedly, biological neurons operate in parallel such that computations in a massive recurrent network can happen quickly. Indeed, the simultaneous computational update of neuronal states may be critical for success. This degree of parallelism can be difficult or impossible to implement for large, highly recurrent (neuroscience parlance) ANNs running on modern hardware.

We speculate that the introduction of heavy feed-back recurrence into deep learning models, and the development of training methods for such models, will bring about powerful new AI capabilities. The rate of progress on these advances is difficult to predict, however.

References

  1. Graves (2012). “Sequence transduction with recurrent neural networks.” https://arxiv.org/abs/1211.3711
  2. Sutskever, Vinyals, and Le (2014). “Sequence to sequence learning with neural networks.” NIPS 2014. https://arxiv.org/abs/1409.3215
  3. Hochreiter and Schmidhuber (1997). “Long short-term memory.” Neural Computation. https://www.mitpressjournals.org/doi/abs/10.1162/neco.1997.9.8.1735
  4. Kravitz et al. (2013). “The ventral visual pathway: An expanded neural framework for the processing of object quality.” Trends Cogn Sci. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3532569/
  5. Yamins and DiCarlo (2016). “Using goal-driven deep learning models to understand sensory cortex.” Nature Neuroscience. https://www.nature.com/articles/nn.4244
  6. Krizhevsky, Sutskever, and Hinton (2012). “ImageNet classification with deep convolutional neural networks.” NIPS 2012. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networ
  7. Hubel and Wiesel (1959). “Receptive fields of single neurones in the cat’s striate cortex.” J. Physiol. https://physoc.onlinelibrary.wiley.com/doi/pdf/10.1113/jphysiol.1959.sp006308
  8. Fukushima (1980). “Neocognitron. A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position.” Biol Cybernetics. https://www.rctn.org/bruno/public/papers/Fukushima1980.pdf
  9. Riesenhuber and Poggio (1999). “Hierarchical models of object recognition in cortex.” Nature Neuroscience. https://www.nature.com/articles/nn1199_1019
  10. Liao and Poggio (2016). “Bridging the gaps between residual learning, recurrent neural networks, and visual cortex.” CBMM memo 047. https://arxiv.org/abs/1604.03640
  11. Zamir et al. (2016). “Feedback networks.” The IEEE conference on computer vision and pattern recognition (CVPR). https://arxiv.org/pdf/1612.09508.pdf
  12. Kietzmann et al. (2019). “Recurrence required to capture the dynamic computations of the human ventral visual stream.” https://arxiv.org/abs/1903.05946
  13. Kar et al. (2019). “Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior.” Nature Neuroscience. https://www.nature.com/articles/s41593-019-0392-5