Original article was published on Deep Learning on Medium
The Roller Coaster Ride
Neural networks, a.k.a. deep learning, have had a roller coaster ride over the last 10–15 years, claiming victories in tasks such as pattern matching, classification, and generation.
As of today, the hyperbole and excitement have toned down, and little of what was achieved has translated into production solutions.
Even DeepMind, the top deep learning firm, has had little or no commercial success. As of today there are no production-ready autonomous vehicles, conversational engines, or cancer diagnosis solutions.
“Talk is cheap; the ultimate degree of enthusiasm for AI will depend on what is delivered.” — Gary Marcus
So what are the fathers of Deep Learning saying now?
Top A.I. firms are now trying to combine deep learning with symbolic computing, a combination known as deep neurosymbolic computing.
The tension is that deep learning is an approximate, function-fitting method, while symbolic computing is an exact one.
Symbolic computing, for its part, lacks rich integration with sensors and multimodal inputs such as vision and speech.
So mixing the two together makes sense.
But when an approximate method like deep learning is put in control of an exact method, the results are less than promising.
“Neural networks can get you to the symbolic domain, and then you can use a wealth of ideas from symbolic AI to understand the world.”
The Problem Statement
Here are some examples of questions that are trivial for a human child to answer but can be highly challenging for AI systems based solely on neural networks.
The CLEVR DataSet
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning. They also conflate multiple sources of error, making it hard to pinpoint model weaknesses. We present a diagnostic dataset that tests a range of visual reasoning abilities. It contains minimal biases and has detailed annotations describing the kind of reasoning each question requires. We use this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.
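The key idea behind CLEVR is that each question is compositional: it can be broken down into a chain of primitive reasoning steps over the objects in a scene. A minimal sketch of that idea is below; the scene schema and function names are illustrative, not the official CLEVR annotation format.

```python
# Illustrative sketch of CLEVR-style compositional reasoning.
# The scene is a list of objects with symbolic attributes.
scene = [
    {"shape": "cube", "color": "red", "material": "rubber"},
    {"shape": "sphere", "color": "blue", "material": "metal"},
    {"shape": "cube", "color": "red", "material": "metal"},
]

def filter_attr(objects, attr, value):
    """Keep only the objects whose attribute matches the value."""
    return [o for o in objects if o[attr] == value]

# "How many red cubes are there?" as a chain of primitive steps:
program = [("filter_color", "red"), ("filter_shape", "cube")]

result = scene
for step, value in program:
    attr = step.split("_", 1)[1]  # e.g. "filter_color" -> "color"
    result = filter_attr(result, attr, value)

answer = len(result)  # 2
```

Because the question is a program, a diagnostic dataset can annotate exactly which reasoning steps each question requires, which is what makes pinpointing model weaknesses possible.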
The CLEVRER DataSet
The data set consists of 20,000 short synthetic video clips and more than 300,000 question-and-answer pairs about the events in the videos. Each video shows a simple world of toy objects that collide with one another following simulated physics. In one, a red rubber ball hits a blue rubber cylinder, which continues on to hit a metal cylinder.
The questions fall into four categories: descriptive (e.g., “What shape is the object that collides with the cyan cylinder?”), explanatory (“What is responsible for the gray cylinder’s collision with the cube?”), predictive (“Which event will happen next?”), and counterfactual (“Without the gray object, which event will not happen?”). The questions mirror many of the concepts that children learn early on as they explore their surroundings. But the latter three categories, which specifically require causal reasoning to answer, often stump deep-learning systems.
Current Efforts in The Industry
“We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and reading paired questions and answers. Our model builds an object-based scene representation and translates sentences into executable, symbolic programs. To bridge the learning of two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation. Analog to the human concept learning, given the parsed program, the perception module learns visual concepts based on the language description of the object being referred to. Meanwhile, the learned visual concepts facilitate learning new words and parsing new sentences. We use curriculum learning to guide searching over the large compositional space of images and language. Extensive experiments demonstrate the accuracy and efficiency of our model on learning visual concepts, word representations, and semantic parsing of sentences. Further, our method allows easy generalization to new object attributes, compositions, language concepts, scenes and questions, and even new program domains. It also empowers applications including visual question answering and bidirectional image-text retrieval.”
“We marry two powerful ideas: deep representation learning for visual recognition and language understanding, and symbolic program execution for reasoning. Our neural-symbolic visual question answering (NS-VQA) system first recovers a structural scene representation from the image and a program trace from the question. It then executes the program on the scene representation to obtain an answer. Incorporating symbolic structure as prior knowledge offers three unique advantages. First, executing programs on a symbolic space is more robust to long program traces; our model can solve complex reasoning tasks better, achieving an accuracy of 99.8% on the CLEVR dataset. Second, the model is more data- and memory-efficient: it performs well after learning on a small number of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering. Third, symbolic program execution offers full transparency to the reasoning process; we are thus able to interpret and diagnose each execution step.”
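The pipeline the NS-VQA abstract describes can be sketched in a few lines. Here the structural scene representation and the program trace are hand-written for clarity; in the real system, both are produced by neural networks from the image and the question. The operation names (`filter`, `unique`, `query`) are assumptions for illustration, not the paper's exact vocabulary.

```python
# Illustrative sketch of NS-VQA-style symbolic program execution.
# Scene representation (in NS-VQA, recovered from the image):
scene = [
    {"shape": "cylinder", "color": "gray", "material": "metal"},
    {"shape": "cube", "color": "blue", "material": "rubber"},
    {"shape": "cube", "color": "yellow", "material": "metal"},
]

# Program trace (in NS-VQA, parsed from the question)
# for "What color is the metal cube?":
program = [
    ("filter", "material", "metal"),
    ("filter", "shape", "cube"),
    ("unique", None, None),
    ("query", "color", None),
]

def execute(program, scene):
    """Run each symbolic operation on the current state in turn."""
    state = scene
    for op, attr, value in program:
        if op == "filter":
            state = [o for o in state if o[attr] == value]
        elif op == "unique":
            assert len(state) == 1, "expected exactly one object"
            state = state[0]
        elif op == "query":
            state = state[attr]
    return state

print(execute(program, scene))  # prints: yellow
```

Since every intermediate state is an explicit symbolic value, each execution step can be inspected directly, which is the transparency advantage the abstract highlights.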
This time around, researchers moved fast. They understood the fallacy of putting deep learning networks in control of symbolic computing.
So they quickly switched things the other way around, putting symbolic computing in control of, and driving, the approximate deep neural networks.
The results have been promising so far.
We might just invent A.I. this way.
Visit our website: http://automatski.com