On integrating symbolic inference into deep neural networks

Source: Deep Learning on Medium


Deep neural networks have been a tremendous success story over the last couple of years. Many advances in the field of AI, such as recognizing real-world objects, fluently translating natural language or playing Go at a world-class level, are based on deep neural networks. However, there have been only a few reports concerning the limitations of this approach. One such limitation is the inability to learn from a small number of examples. Deep neural networks usually require a huge number of training examples, whereas humans are able to learn from a single example. If you show a cat to a child who has never seen one before, the child can recognize another cat based on this single instance. Deep neural networks, on the other hand, require hundreds of thousands of images to learn what a cat looks like. Another limitation is the inability to make inferences based on previously learned common knowledge. When reading a text, humans tend to derive wide-ranging inferences about its possible interpretations. They can do this because they can recall knowledge from very different domains and apply it to the text.

These limitations indicate that something fundamental is missing in deep neural networks: the ability to establish symbolic references to entities in the real world and to put them in relation to each other. Symbolic inference in the form of formal logic has been at the core of classic AI for decades, but it has proven to be brittle and complex to work with. Nevertheless, is there really no way to enhance deep neural networks so that they become capable of processing symbolic information? Deep neural networks were inspired by biological neural networks like the human brain. Essentially, they are a simplified model of the neurons and synapses that are the basic building blocks of the brain. One such simplification is the omission of the spiking nature of biological neural networks. But what if it matters not only whether a neuron is activated, but also exactly when? What if the point in time at which a neuron fires establishes a relational context to which the activation refers?

Take, for example, a neuron that stands for a particular word. Would it not make sense for that neuron to be triggered every time the word appears in a text? In this case, the timing of the spikes would play an important role. And not only the timing of a single activation, but the timing of all incoming spikes of a neuron relative to each other would be crucial. This timing pattern might be used to establish a relation between these input activations. For example, if a neuron representing a particular word has an input synapse for each letter in this word, it is important that the word neuron is only triggered when the letter neurons have fired in the right order relative to each other. Conceptually, these timing differences could be modeled as relations between the input synapses of a neuron. These relations also define the point in time at which the neuron itself fires relative to its input activations.
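
To make the timing idea concrete, here is a minimal sketch in Python. It assumes that each character of a text triggers its letter neuron at the time step equal to the character's position, and that the word neuron fires only if its letter inputs spiked in the right order, each exactly one step after the previous one. The functions `text_to_spikes` and `word_neuron_spikes` and the fixed one-step spacing are hypothetical illustrations, not part of any published model.

```python
from typing import Dict, List

def text_to_spikes(text: str) -> Dict[str, List[int]]:
    """Letter neuron `ch` spikes at time step t whenever text[t] == ch (assumed encoding)."""
    spikes: Dict[str, List[int]] = {}
    for t, ch in enumerate(text):
        spikes.setdefault(ch, []).append(t)
    return spikes

def word_neuron_spikes(word: str, spikes: Dict[str, List[int]]) -> List[int]:
    """Return the time steps at which a hypothetical word neuron fires.

    The neuron fires only if its letter inputs spiked in the right order,
    each exactly one time step after the previous one (an assumed timing relation).
    It fires at the time of its last input, so its own spike is placed
    relative to the spikes of its inputs.
    """
    fires = []
    for start in spikes.get(word[0], []):
        if all(start + i in spikes.get(ch, []) for i, ch in enumerate(word)):
            fires.append(start + len(word) - 1)
    return fires

print(word_neuron_spikes("cat", text_to_spikes("the cat sat")))  # [6]
print(word_neuron_spikes("cat", text_to_spikes("act")))          # [] (same letters, wrong order)
```

The second call shows why timing matters: the same three letter neurons fire, but in the wrong order, so the word neuron stays silent.
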
For practical reasons, it might be useful to allow the activation of a neuron to have several slots associated with it, such as the beginning and the end of a word. Otherwise, the beginning and the end of a word would have to be modeled as two separate neurons. These relations are a very powerful concept. They make it easy to capture the hierarchical structure of a text or to relate different ranges within a text to each other. In this case, a neuron might refer to very local information, like a letter, or to very wide-ranging information, like the topic of a text.
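
A minimal sketch of activations with slots, assuming each activation carries a `begin` and an `end` position in the text and that relations between activations are simple comparisons of those slots. The `Activation` class and the `follows`, `contains` and `compose` helpers are hypothetical illustrations, not the data structures of any particular library.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Activation:
    """A hypothetical neuron activation with two slots: begin and end positions in the text."""
    neuron: str
    begin: int
    end: int  # exclusive

def follows(a: Activation, b: Activation) -> bool:
    """Relation between slots: b begins exactly where a ends (e.g. adjacent letters)."""
    return a.end == b.begin

def contains(outer: Activation, inner: Activation) -> bool:
    """Relation between slots: inner lies within outer's range (hierarchical structure)."""
    return outer.begin <= inner.begin and inner.end <= outer.end

def compose(neuron: str, parts: List[Activation]) -> Activation:
    """Build one higher-level activation spanning adjacent parts, e.g. a word from its letters."""
    if not parts or not all(follows(x, y) for x, y in zip(parts, parts[1:])):
        raise ValueError("parts must be non-empty and adjacent")
    return Activation(neuron, parts[0].begin, parts[-1].end)

letters = [Activation(ch, i, i + 1) for i, ch in enumerate("cat")]
word = compose("word:cat", letters)          # a single activation with begin=0, end=3
topic = Activation("topic:animals", 0, 40)   # a wide-ranging activation over a whole passage
print(word, contains(word, letters[1]), contains(topic, word))
```

With two slots per activation, one word neuron covers both the beginning and the end of the word, and the same `contains` relation links a letter to its word and a word to a text-wide topic.
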

Another simplification with regard to biological neural networks is that an activation function is used to approximate the firing rate of an individual neuron. Classical neural networks use the sigmoid function for this purpose. However, the sigmoid function is symmetric: it saturates in the same way for large positive and large negative input values (σ(-x) = 1 - σ(x)), and its output never drops to exactly zero. This makes it very difficult to model logic-gate-like operations with sigmoid neurons. Spiking networks, on the other hand, have a clear threshold and ignore all input signals that remain below it. Therefore, the ReLU function or some other asymmetric function could be a better approximation of the firing rate. This asymmetry is also essential for neurons that process relational information: the neuron representing a particular word must remain completely inactive whenever the word does not occur.
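
The difference can be checked numerically. This sketch assumes a single neuron with hand-picked weights (1.0 each) and bias (-1.5) that are illustrative values, not taken from any trained model; `and_neuron` is a hypothetical name. It confirms the symmetry of the sigmoid and compares how a sigmoid neuron and a ReLU neuron behave as an AND gate.

```python
import math
from typing import Callable

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    return max(0.0, x)

# The sigmoid is point-symmetric about (0, 0.5): sigmoid(-x) == 1 - sigmoid(x).
for x in (0.5, 2.0, 5.0):
    assert math.isclose(sigmoid(-x), 1.0 - sigmoid(x))

def and_neuron(a: float, b: float, activation: Callable[[float], float]) -> float:
    """A neuron meant to act as an AND gate; weights 1.0 and bias -1.5 are hypothetical."""
    return activation(1.0 * a + 1.0 * b - 1.5)

for a in (0, 1):
    for b in (0, 1):
        # Sigmoid output is never exactly 0; the ReLU output is 0 unless both inputs are on.
        print(a, b, round(and_neuron(a, b, sigmoid), 3), and_neuron(a, b, relu))
```

With no input at all, the sigmoid neuron still outputs about 0.18, whereas the ReLU neuron is completely silent for every input combination except (1, 1).
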

Also neglected in deep neural networks is the fact that different types of neurons occur in the cerebral cortex. Two important types are the spiny pyramidal cell, which is primarily excitatory, and the aspiny stellate cell, which is inhibitory. Inhibitory neurons are special because they make it possible to build negative feedback loops. Such feedback loops are not normally found in deep neural networks because they introduce an internal state into the network. Consider a small network consisting of an inhibitory neuron and two excitatory neurons that represent two different meanings of the word ‘August’: the month and the first name. Both excitatory neurons activate the inhibitory neuron, which in turn feeds back onto them through negative synapses.

The two meanings are mutually exclusive, so the network now has two stable states. Which of them prevails may depend on further input synapses of the two excitatory neurons. For example, if the word following ‘August’ is a potential last name, a corresponding input synapse of the entity neuron August-(first name) could increase the weight of that state. It then becomes more likely that ‘August’ will be classified as a first name and not as a month. But keep in mind that both states still need to be evaluated. In larger networks, many neurons may be connected by negative or positive feedback loops, potentially creating a great number of stable states within the network.
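
The ‘August’ example can be sketched as a search for self-consistent states. This is a minimal illustration under loudly stated assumptions: the inhibitory neuron relays each active excitatory neuron's input strength to all *other* excitatory neurons (never back to its source), a neuron is active exactly when its net input is positive, and the feedback strength `W_INH` and the extra input of 0.8 are hypothetical values. `net_inputs` and `stable_states` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

W_INH = 2.0  # hypothetical strength of the feedback through the inhibitory neuron

def net_inputs(inputs: Dict[str, float], active: FrozenSet[str]) -> Dict[str, float]:
    """Net input of each excitatory neuron, given the set of currently active ones.

    Simplifying assumption: the inhibitory neuron relays each active neuron's
    input strength to all *other* excitatory neurons, never back to its source.
    """
    return {
        n: x - W_INH * sum(inputs[m] for m in active if m != n)
        for n, x in inputs.items()
    }

def stable_states(inputs: Dict[str, float]) -> List[FrozenSet[str]]:
    """Enumerate every subset of active neurons and keep the self-consistent ones."""
    names = sorted(inputs)
    states = []
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            active = frozenset(combo)
            net = net_inputs(inputs, active)
            # Stable: exactly the neurons with positive net input are active.
            if all((net[n] > 0) == (n in active) for n in names):
                states.append(active)
    return states

inputs = {"August-(month)": 1.0, "August-(first name)": 1.0}
print(stable_states(inputs))  # two stable states, one per meaning

# The next word looks like a last name: a hypothetical extra synapse adds 0.8.
inputs["August-(first name)"] += 0.8
for state in stable_states(inputs):
    print(sorted(state), sum(inputs[n] for n in state))
```

Even after the extra input, both single-meaning states remain stable; the first-name state merely carries more activation (1.8 versus 1.0). This is why both states still have to be evaluated rather than simply read off.
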

For this reason, an efficient optimization process is required to determine the best state with respect to some objective function. This objective function could, for example, be to minimize the need to suppress strongly activated neurons. Despite this cost, these states have the tremendous advantage of allowing different interpretations of a given text to be considered. It is a kind of thought process in which different interpretations are evaluated and the best-fitting one becomes the result. Fortunately, the search for an optimal state can be made quite efficient.
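
To make the suggested objective concrete, here is a brute-force sketch: choose one meaning per ambiguous word so that the total activation of the suppressed meanings is minimal. It assumes that meaning names are unique across words and that chosen meanings can excite each other through a hypothetical `support` table (here, a first name and a following last name reinforce each other). `activation`, `best_state`, and all weights are illustrative assumptions. Exhaustive enumeration grows exponentially with the number of ambiguous words, so it only illustrates the objective, not the efficient search the text refers to.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

Groups = Dict[str, Dict[str, float]]   # word -> {meaning: base activation}
Support = Dict[FrozenSet[str], float]  # {meaning_a, meaning_b} -> mutual excitation

def activation(meaning: str, base: float, chosen: Tuple[str, ...], support: Support) -> float:
    """Base activation plus positive feedback from the meanings chosen for other words."""
    return base + sum(support.get(frozenset((meaning, c)), 0.0) for c in chosen if c != meaning)

def best_state(groups: Groups, support: Support) -> Tuple[Dict[str, str], float]:
    """Brute-force search: pick one meaning per word, minimizing suppressed activation."""
    words = list(groups)
    best: Dict[str, str] = {}
    best_cost = float("inf")
    for chosen in product(*(list(groups[w]) for w in words)):
        # Assumed objective: total activation of all meanings that must be suppressed.
        cost = sum(
            activation(m, base, chosen, support)
            for w, pick in zip(words, chosen)
            for m, base in groups[w].items()
            if m != pick
        )
        if cost < best_cost:
            best, best_cost = dict(zip(words, chosen)), cost
    return best, best_cost

groups = {
    "August": {"month": 1.0, "first name": 1.0},
    "Smith": {"last name": 1.2, "occupation": 0.5},
}
support = {frozenset(("first name", "last name")): 0.8}  # hypothetical weight
print(best_state(groups, support))  # ({'August': 'first name', 'Smith': 'last name'}, 1.5)
```

Without the `support` entry, both readings of ‘August’ cost the same (1.5), so it is the context provided by the following word that tips the balance toward the first-name interpretation.
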

The reason we need inhibitory neurons in these feedback loops is that otherwise all mutually suppressive neurons would have to be directly connected to each other, so the number of synapses would grow quadratically with the number of competing neurons.
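
A quick count makes the quadratic growth concrete. It assumes that direct mutual suppression needs one negative synapse per ordered pair of neurons, while a single shared inhibitory neuron needs one incoming and one outgoing synapse per excitatory neuron. The function names are hypothetical.

```python
def synapses_direct(n: int) -> int:
    """Each of n mutually exclusive neurons inhibits every other one directly."""
    return n * (n - 1)  # directed synapses, one per ordered pair

def synapses_via_inhibitory(n: int) -> int:
    """One shared inhibitory neuron: n synapses into it and n negative synapses back out."""
    return 2 * n

for n in (3, 10, 100, 1000):
    print(n, synapses_direct(n), synapses_via_inhibitory(n))
```

The two wirings break even at three neurons (6 synapses each); for 1000 competing neurons, direct wiring needs 999,000 synapses compared with 2,000 through a shared inhibitory neuron.
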

Through negative feedback loops, that is, by simply connecting a neuron back to one of its precursor neurons via a negative synapse, we have suddenly entered the mystical realm of nonmonotonic logic. Nonmonotonic logic is a subfield of formal logic in which conclusions are not only added to a model but may also be withdrawn when new information arrives. It is assumed that nonmonotonic logic is needed to draw conclusions in many common-sense reasoning tasks. One of the main problems of nonmonotonic logic is that it often cannot decide which conclusions to draw and which not to draw. Some inferences, whether skeptical or credulous, should only be drawn if no other conclusions are more likely. This is where the weighted nature of neural networks comes in handy: here, more likely states can suppress less likely ones.
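
The textbook illustration of nonmonotonic reasoning is that birds normally fly, but penguins, which are birds, do not. The sketch below assumes hand-picked weights where a negative synapse from "penguin" outweighs the positive one from "bird", and a ReLU-style threshold that draws a conclusion only when the summed evidence is positive. `RULES` and `conclusions` are hypothetical names, and this single-pass evaluator is only an illustration, not the author's algorithm.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical weighted rules: conclusion -> [(premise, weight)]; negative weights inhibit.
RULES: Dict[str, List[Tuple[str, float]]] = {
    "flies": [("bird", 1.0), ("penguin", -2.0)],
}

def conclusions(facts: Set[str]) -> Set[str]:
    """One inference pass: draw a conclusion when its summed evidence is positive."""
    drawn = set(facts)
    for conclusion, premises in RULES.items():
        net = sum(w for premise, w in premises if premise in facts)
        if net > 0:
            drawn.add(conclusion)
    return drawn

print("flies" in conclusions({"bird"}))             # True: the default conclusion is drawn
print("flies" in conclusions({"bird", "penguin"}))  # False: a new fact withdraws it
```

Adding a fact shrinks the set of conclusions, which is exactly what monotonic logic forbids; the weights decide which of the competing conclusions wins.
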

Conclusion

Although deep neural networks have come a long way and are now delivering impressive results, it may be worth taking another look at the original: the human brain and its circuitry. If a structure as inherently complex as the human brain is to be used as a blueprint for a neural model, simplifying assumptions must be made. However, care must be taken in this process, as otherwise important aspects of the original may be lost.
