An IQ Test Proves that Neural Networks are Capable of Abstract Reasoning

Original article was published by Jesus Rodriguez on Artificial Intelligence on Medium

DeepMind published a methodology that resembles an IQ test for neural networks.


The ability to create abstractions from knowledge representations is one of the hallmarks of human intelligence. Arguably, the two most famous theories about the dynamics of the universe were derived as results of abstractions. Legend has it that a young Isaac Newton was sitting under an apple tree when he was bonked on the head by a falling piece of fruit, a 17th-century “aha moment” that prompted him to suddenly come up with his law of gravity. In reality, things didn’t happen quite like that, but I will take the folklore version for the purposes of this article. The ability to relate two abstract concepts also allowed Albert Einstein to formulate the basics of the theory of relativity, as he reasoned that an equivalence exists between an observer undergoing uniform acceleration and an observer in a uniform gravitational field. Abstract reasoning has long been used as an example of what separates human cognition from artificial intelligence (AI). Are AI systems capable of abstract thinking? A recent research paper from DeepMind suggests that it is possible and proposes a methodology for evaluating abstract reasoning in deep neural networks.

Abstract thinking can be seen as a form of knowledge generalization, which is a widely used concept in deep learning systems. However, one of the main differences between abstraction and pure generalization is that the former is based on deriving new knowledge from seemingly unrelated data. One of the paradoxical characteristics of human abstract reasoning is that it is surprisingly quantifiable. In 1936, psychologist John Raven introduced the first version of a test that is still widely used as a quantified estimator of human intelligence. Its puzzles are the famous Raven’s Progressive Matrices (RPM), each of which consists of an incomplete 3×3 matrix of context images and some candidate answer images. The subject must decide which of the candidate images is the most appropriate choice to complete the matrix. To solve an RPM puzzle, the subject must consider a large number of possible answers, which is why this test has long been considered a measure of eductive, fluid and, therefore, abstract reasoning.


An IQ Test for Neural Networks

In their research, DeepMind borrows some ideas from the RPM section of the IQ test in order to measure abstract reasoning in deep learning agents. Specifically, the researchers built a generator for creating RPM-like matrix problems involving a set of abstract factors such as the following:

· Relation types (R, with elements r): progression, XOR, OR, AND, consistent union.

· Object types (O, with elements o): shape, line.

· Attribute types (A, with elements a): size, type, color, position, number.

Using those primitives, DeepMind generated a dataset known as Procedurally Generated Matrices (PGM), in which each puzzle is defined by triplets such as [progression, shape, color]. The relationship between the elements of a triplet represents an abstract challenge. For instance, if the relation in the triplet is progression, the values of the chosen attribute (here, the color of the shapes) must progress along rows or columns in the matrix.
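To make the triplet idea concrete, here is a minimal Python sketch of how such a triplet could be sampled and how a progression constraint could be checked. It is not DeepMind's generator: the function names (sample_triplet, progression_grid, is_progression) are hypothetical, attribute values are reduced to plain integers, and the sketch ignores any restrictions the real generator may place on which combinations are valid.

```python
import random

# Primitive sets as listed in the article.
RELATIONS = ["progression", "XOR", "OR", "AND", "consistent union"]
OBJECTS = ["shape", "line"]
ATTRIBUTES = ["size", "type", "color", "position", "number"]


def sample_triplet(rng):
    """Pick one (relation, object, attribute) triplet uniformly at random."""
    return (rng.choice(RELATIONS), rng.choice(OBJECTS), rng.choice(ATTRIBUTES))


def progression_grid(row_starts, step):
    """Build a 3x3 grid of integer attribute values that grow by `step` along each row."""
    return [[start + step * col for col in range(3)] for start in row_starts]


def is_progression(row):
    """True if the three values change by the same non-zero amount."""
    first_step = row[1] - row[0]
    second_step = row[2] - row[1]
    return first_step == second_step and first_step != 0


# Example: a toy "color intensity" value that increases by 2 along every row.
grid = progression_grid([1, 2, 3], step=2)  # [[1, 3, 5], [2, 4, 6], [3, 5, 7]]
```

A puzzle would then be posed by hiding the last cell of the grid and offering several candidate values, only one of which keeps the final row a progression.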

In order to show signs of abstract reasoning using PGM, a neural network must be able to explicitly compute relationships between different matrix images and evaluate the viability of each potential answer in parallel. To address this challenge, the DeepMind team created a new neural network architecture called Wild Relation Network (WReN) in recognition of John Raven’s wife Mary Wild, who was also a contributor to the original IQ Test.


In the WReN architecture, a convolutional neural network (CNN) processes each of the 8 context panels and an individual answer choice panel independently to produce 9 vector embeddings. This set of embeddings is then passed to a Relation Network (RN), whose output is a single sigmoid unit encoding the “score” for the associated answer choice panel. 8 such passes are made through this network, one for each answer choice, and the scores are put through a softmax function to determine the model’s predicted answer.
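The scoring loop described above can be sketched in plain Python. This is only an illustration of the data flow (embed 9 panels, score one candidate at a time, softmax over the 8 scores), not the trained WReN model: embed and pair_relation are hypothetical stand-ins for the learned CNN and the learned pairwise function, panels are reduced to short lists of floats, and the "relation" used here is simply negative squared distance.

```python
import math
from itertools import product


def embed(panel):
    """Stand-in for the CNN: a panel is already a short list of floats."""
    return list(panel)


def pair_relation(a, b):
    """Stand-in for the learned pairwise function: negative squared distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score_candidate(context_panels, candidate):
    """Embed 8 context panels plus 1 candidate, sum pairwise relations, squash."""
    embeddings = [embed(p) for p in context_panels] + [embed(candidate)]
    assert len(embeddings) == 9
    total = sum(pair_relation(a, b) for a, b in product(embeddings, repeat=2))
    return sigmoid(total)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def predict(context_panels, candidates):
    """One scoring pass per candidate, then softmax; returns (best index, probabilities)."""
    scores = [score_candidate(context_panels, c) for c in candidates]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

The point of the design is that every candidate is scored by the same network against the same context, so the final softmax compares answers on a common scale.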

The experiments conducted by the DeepMind team used a series of PGM datasets with different deep neural network models such as CNNs, Long Short-Term Memory (LSTM) networks, ResNet and the new WReN. The results showed that WReN was able to outperform the other architectures, but all of them exhibited different levels of abstract reasoning.


To pass the PGM experiment, a deep neural network needed to be able to solve complex visual reasoning questions, and to do so, it needed to induce and detect from raw pixel input the presence of abstract notions such as logical operations and arithmetic progressions, and apply these principles to never-before-observed data. The WReN architecture was able to excel in those tasks partly because it is built to model the relationships between different parts of the data from the first layers of the network.

The DeepMind experiment produced a lot of interesting results that could help us understand how deep neural networks abstract knowledge. For instance, the different models generalized relatively well when required to reason using attribute values ‘interpolated’ between previously seen attribute values, and also when applying known abstract relations in unfamiliar combinations. That wasn’t the case in ‘extrapolation’ scenarios, in which attribute values in the test set did not lie within the same range as those seen during training. An example of this occurs for puzzles that contain dark colored objects during training and light colored objects during testing. Despite the variety of results in the experiment, one thing was clear: neural networks exhibit primitive forms of abstract reasoning.
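The difference between the interpolation and extrapolation regimes can be illustrated with a small Python sketch. The splitting rules below are assumptions chosen for illustration (alternating values for interpolation, lower half versus upper half for extrapolation), and the function names are hypothetical; the sketch only shows why test values fall inside the training range in one case and outside it in the other.

```python
def split_interpolation(values):
    """Train on every other value; test on the values in between."""
    ordered = sorted(values)
    return ordered[0::2], ordered[1::2]


def split_extrapolation(values):
    """Train on the lower half of the range; test on the upper half."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def within_training_range(train, test):
    """True if every test value lies between the smallest and largest training value."""
    return all(min(train) <= value <= max(train) for value in test)


# Toy attribute scale, e.g. color intensity from dark (0) to light (10).
intensities = list(range(11))
interp_train, interp_test = split_interpolation(intensities)
extrap_train, extrap_test = split_extrapolation(intensities)
```

With this toy scale, the interpolation test values are unseen but sit between training values, while the extrapolation test values (the light end of the scale) lie entirely outside what was seen in training, which mirrors the dark-versus-light example above.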