Limitations of Deep Learning in AI Research

Source: Deep Learning on Medium

Artificial intelligence has achieved incredible feats thanks to deep learning; however, it still falls short of human capabilities.

Image source: Machine Learning Memoirs | [11]

Deep learning, a subset of machine learning, has delivered super-human accuracy in a variety of practical applications over the past decade, revolutionizing customer experience, machine translation, speech recognition, autonomous vehicles, computer vision, text generation, language understanding, and a multitude of other AI applications [2].

In contrast to classical machine learning, where an AI agent learns from data through hand-designed algorithms and features, deep learning is built on neural network architectures that act loosely like the human brain, allowing the AI agent to analyze the data fed in with a structure similar to the way humans do. Deep learning models do not require hand-crafted rules specifying what to do with the data, which is made possible by the extraordinary amount of data we as humans collect and consume, and which is in turn fed to deep learning models [3].

The “traditional” forms of deep learning combine different mixes of feed-forward modules (frequently convolutional neural networks) and recurrent neural networks (sometimes with memory units such as LSTMs [4] or MemNNs [5]). These models are restricted in their capacity to “reason”, for example to carry out long chains of deductions or to plan out a method for arriving at an answer. The number of steps in a computation is bounded by the number of layers in a feed-forward net, and by the time span over which a recurrent neural network can remember things.
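The point about recurrent memory can be sketched numerically. Below is a toy vanilla-RNN update (the weights are invented for illustration; this is not a trained model, and not the gated LSTM/MemNN variants cited above): the influence of an early input on the hidden state shrinks geometrically as more steps are processed, which is exactly the forgetting that memory units were introduced to fight.

```python
# Minimal vanilla recurrence h_t = tanh(w_h * h_{t-1} + w_x * x_t).
# With |w_h| < 1, the trace of an early input decays at every step.
import math

def run_rnn(inputs, w_h=0.5, w_x=1.0):
    h = 0.0
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
    return h

base = [0.0] * 20
bumped = [1.0] + [0.0] * 19   # identical sequence except the first input

influence = abs(run_rnn(bumped) - run_rnn(base))
print(influence)  # tiny: the early input has all but vanished after 20 steps
```

The same geometric decay (or blow-up, for |w_h| > 1) is why plain recurrence struggles with long chains of dependencies.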

Then there is the opacity problem. Once a deep learning model has been trained, it is not always clear how it arrives at its decisions [6]. In many settings that is simply not acceptable, even when the model finds the correct answer. Suppose a bank uses AI to assess your creditworthiness and then denies you a loan: in many jurisdictions there are laws requiring the bank to explain why. If the bank is using a deep learning model for its loan decisions, its loan department likely will not be able to give a clear explanation of why the loan was denied.
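To see why explanations fall out of some models and not others, here is a hedged toy contrast (the feature names and weights are invented for illustration, not any bank's real scorer): with a linear model, each feature's contribution to the decision is a single readable number; a deep network entangles features across layers, so no comparable per-feature account exists in its weights.

```python
# Hypothetical linear credit scorer: every feature's contribution to the
# decision can be read off directly from weight * value.
weights = {"income": 0.8, "debt": -1.2, "late_payments": -0.9}
applicant = {"income": 0.6, "debt": 0.9, "late_payments": 1.0}

contributions = {f: weights[f] * applicant[f] for f in weights}
score = sum(contributions.values())
decision = "denied" if score < 0 else "approved"

# The explanation is immediate: debt and late payments outweighed income.
print(decision, contributions)
# A deep net replaces `contributions` with millions of entangled weights,
# and no per-feature breakdown like this falls out of them.
```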

Figure 1 | Captions generated by a recurrent neural network (RNN); here the RNN is trained to convert high-level image representations into captions. [1]

Most importantly, there is the absence of common sense. Deep learning models may be the best at perceiving patterns, yet they cannot comprehend what those patterns mean, much less reason about them. To enable deep learning models to reason, we need to change their structure so that they do not produce a single output (the interpretation of an image, the translation of a paragraph, etc.), but instead produce an entire set of alternative outputs (e.g., the different ways a sentence can be translated). This is what energy-based models are designed to do: give you a score for every possible configuration of the variables to be inferred.
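A minimal sketch of that idea (the energy function here is a toy bag-of-words mismatch, not a trained model; it is chosen only to show the shape of the computation): rather than committing to one translation, score every candidate configuration and keep the whole ranking.

```python
# Toy energy-based scoring: lower energy = better configuration. Every
# alternative keeps a score instead of being discarded.
def energy(candidate, reference="the cat sat"):
    ref = set(reference.split())
    return len(ref.symmetric_difference(set(candidate.split())))

candidates = ["the cat sat", "a cat sat", "the dog ran"]
ranking = sorted(candidates, key=energy)

for c in ranking:
    print(energy(c), c)
```

In a real energy-based model the scoring function is learned; the structural point is only that all configurations remain comparable, which is what downstream reasoning needs.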

Increasingly, such weaknesses are raising concerns about AI among the general public, particularly as autonomous vehicles, which use similar deep learning strategies to navigate the roads [7], are involved in accidents and fatalities [8]. The public has started to say that perhaps there is a problem with AI, in a world where perfection is expected; and even though evidence suggests self-driving cars powered by deep learning would cause far fewer casualties than human drivers, humanity will not fully place its trust in autonomous vehicles until no casualties are involved.

In addition, deep learning is fundamentally limited in its current form because practically all of its successful applications [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] use supervised learning with human-annotated labels, which has been noted as a significant weakness: this dependence prevents deep neural networks from being applied to problems where labeled data is scarce. It is imperative to find ways to train large neural nets from “raw”, non-annotated data in order to capture the regularities of the real world. Combining deep learning with adversarial machine learning techniques [17] [18] may hold the answer we are looking for.
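One hedged illustration of learning from “raw” non-annotated data (a toy bigram predictor, not one of the cited systems): the training targets are manufactured from the text itself, each position predicting the next word, so no human annotation is involved.

```python
# Self-supervised sketch: the "label" at each position is simply the next
# word in the raw text, so the data annotates itself.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1        # target comes from the data itself

def predict(word):
    # most frequently observed continuation
    return counts[word].most_common(1)[0][0]

print(predict("the"))
```

Modern self-supervised systems scale this same recipe, learn-the-next-piece-of-the-raw-data, to billions of parameters; the point here is only that the supervision signal costs no human labor.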

As for the general public, unfortunately it does not have a fair understanding of deep learning. If work in deep learning were confined to AI research labs, that would be one thing; however, deep learning techniques are being used in every possible application nowadays. The level of confidence that tech executives and marketers place in deep learning is worrisome. While deep learning is an incredible feat, it is important not only to explore its strengths, but also to be aware of its weaknesses, in order to have a plan of action.

Mrinmaya Sachan’s research, Towards Literate Artificial Intelligence [33], makes an interesting case: even though we have seen notable developments in artificial intelligence thanks to deep learning, today’s AI systems still lack the intrinsic nature of human intelligence. He then asks, before humanity starts to build AI systems that possess human capabilities (reasoning, understanding, common sense), how can we evaluate AI systems on such tasks, so as to thoroughly understand and develop truly intelligent systems? His research proposes evaluating AI systems with standardized tests (similar to the tests students take as they progress through the formal education system), using two frameworks to further develop AI systems, with notable benefits that can be applied toward social good and education.

On Deep Learning and Decision Making: Do we have a true theoretical understanding of neural networks?

Artificial neural networks, which try to mimic the architecture of the brain, possess a multitude of connections between artificial neurons (nodes). The network itself is not an algorithm but a framework on which a variety of machine learning algorithms can operate to achieve desired tasks. The foundations of neural network engineering are almost entirely heuristic, with little principled guidance on architecture choices; unfortunately, there is no definitive theory that tells us how to decide the right number of neurons for a given model. There are theoretical works on neuron counts and overall model capacity [12] [13] [14]; nevertheless, these are rarely practical to apply.

Princeton Professor Sanjeev Arora takes a vivid approach to the generalization theory of deep neural networks [15], in which he poses the generalization mystery of deep learning: why do trained deep neural networks perform well on previously unseen data? For instance, take a deep learning model and train it on ImageNet images whose labels have been randomized: it will still reach high training accuracy. Yet standard regularization strategies, which should imply better generalization, do not help much [16]. Regardless, the trained neural net remains unable to predict the random labeling of unseen images, which means the network does not generalize.
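The random-label observation can be caricatured with a stand-in high-capacity model (a pure memorizer, far simpler than the networks studied in [16], but it isolates the same point): perfect fit on arbitrary training labels, chance-level behavior on anything unseen.

```python
# A lookup table "fits" 100 randomly labeled examples perfectly, yet that
# perfect training accuracy says nothing about generalization.
import random

random.seed(0)
train_x = list(range(100))
train_y = [random.randint(0, 9) for _ in train_x]    # random labels

memory = dict(zip(train_x, train_y))                 # training = memorizing
train_acc = sum(memory[x] == y for x, y in zip(train_x, train_y)) / 100
print(train_acc)  # 1.0: pure noise, fit perfectly

# Re-label the same inputs randomly, as if they were unseen data:
fresh_y = [random.randint(0, 9) for _ in train_x]
fresh_acc = sum(memory[x] == y for x, y in zip(train_x, fresh_y)) / 100
print(fresh_acc)  # near chance: nothing was learned about the labels
```

The puzzle in [15] [16] is that deep nets have enough capacity to behave like this memorizer, yet on real (non-random) labels they somehow generalize anyway.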

Figure 2 | One-pixel attacks that successfully fooled three types of deep neural networks trained on the CIFAR-10 dataset [9] [10]. The original labels are in black, while the attack's output labels are in blue with their corresponding confidence [9].

Recently, researchers were able to expose vulnerabilities of deep neural network architectures by adding small perturbations to images from a large dataset so as to alter (with high probability) the model's outputs [9]. The study follows several others showing similarly brittle outputs under small perturbations of the input. Results of this type do not inspire confidence: in autonomous vehicles, for instance, the environment presents perturbations of all kinds (rain, snow, fog, shadows, false positives, etc.). Now imagine a vision system being thrown off by a small change in its input. I am sure that Tesla, Uber, and several others have identified these issues and are working on plans to address them, but it is important for the public to be aware of them as well.
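A toy version of that brittleness (a hand-set linear "classifier" over four "pixels", nothing like the actual one-pixel method of [9] in scale or technique): when the score sits near the decision boundary, changing a single input value flips the label.

```python
# Hypothetical 4-"pixel" linear classifier; the weights are invented to
# place the example near the decision boundary.
def classify(pixels, weights, threshold=0.0):
    score = sum(p * w for p, w in zip(pixels, weights))
    return "cat" if score > threshold else "dog"

weights = [0.1, -0.2, 0.3, 0.4]
image = [0.5, 0.5, 0.5, 0.2]         # score 0.18: classified "cat"

attacked = list(image)
attacked[3] -= 0.5                    # perturb exactly one pixel

print(classify(image, weights), "->", classify(attacked, weights))
```

Real networks are nonlinear, but the geometry is analogous: high-dimensional inputs leave many directions along which a small, targeted nudge crosses a decision boundary.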

Figure 3 | Successful one-pixel attacks on deep neural networks (DNNs). Original label first, followed by the attack's output label in parentheses [9]

Nowadays, we are surrounded by technology: the smart gadgets in our homes, the smartphones in our pockets, the computers on our desks, the routers that connect us to the internet, and so on. In each of these technologies, the underlying architectures function properly thanks to the solid engineering principles they were built upon: deep mathematics, physics, electrical, computer, and software engineering, and, above all, years, if not decades, of statistical testing and quality assurance.

It is important to remember that deep learning models need a large amount of data to train an initial model (in order to achieve high accuracy without overfitting; keep in mind that subsequent tasks can learn via transfer learning), and that ultimately, without a profound understanding of what is truly happening inside a “deep neural architecture”, it is neither practically nor theoretically wise to build technological solutions that are sustainable in the long run.


The author would like to thank Matt Gormley, Assistant Professor at Carnegie Mellon University, and Arthur Chan, Principal Speech Architect and Deep Learning Specialist, for their constructive criticism during the preparation of this article.

DISCLAIMER: The views expressed in this article are those of the author and do not represent the views of Carnegie Mellon University, nor other companies (directly or indirectly) associated with the author.

You can find me on: My website, Medium, Instagram, Twitter, Facebook, LinkedIn or through my web design company.


[1] Deep Learning Review | Yann LeCun, Yoshua Bengio, Geoffrey Hinton |

[2] 30 Amazing Applications of Deep Learning | Yaron Hadad |

[3] Introduction to Deep Learning | Bhiksha Raj | Carnegie Mellon University |

[4] Understanding LSTM Networks | Christopher Olah |

[5] Memory-Augmented Neural Networks | Facebook AI Research |

[6] The Dark Secret at the Heart of Artificial Intelligence | MIT Technology Review |

[7] MIT 6.S094: Deep Learning for Self-Driving Cars | Massachusetts Institute of Technology |

[8] List of Self Driving Car Fatalities | Wikipedia |

[9] One Pixel Attack for Fooling Deep Neural Networks | Jiawei Su, Danilo Vasconcellos Vargas, Kouichi Sakurai |

[10] Canadian Institute for Advanced Research Dataset | CIFAR-10 Dataset |

[11] Images, courtesy of Machine Learning Memoirs |

[12] Deep Neural Network Capacity | Aosen Wang, Hua Zhou, Wenyao Xu, Xin Chen | Arxiv |

[13] On Characterizing the Capacity of Neural Networks Using Algebraic Topology | William H. Guss, Ruslan Salakhutdinov | Machine Learning Department, School of Computer Science, Carnegie Mellon University |

[14] Information Theory, Complexity, and Neural Networks | Yaser S. Abu-Mostafa | California Institute of Technology |

[15] Generalization Theory and Deep Nets, An Introduction | Sanjeev Arora | Princeton University |

[16] Understanding Deep Learning Requires Re-Thinking Generalization | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals |

[17] The Limitations of Deep Learning in Adversarial Settings | Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, Ananthram Swami | Proceedings of the 1st IEEE European Symposium on Security and Privacy, IEEE 2016. Saarbrucken, Germany |

[18] Machine Learning in Adversarial Settings | Patrick McDaniel, Nicolas Papernot, and Z. Berkay Celik | Pennsylvania State University |

[19] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.

[20] Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. Deepface: Closing the gap to humanlevel performance in face verification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1701–1708, 2014.

[21] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. Advances in Neural Information Processing Systems, 2015.

[22] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, et al. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[23] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.

[24] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.

[25] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.

[26] Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, et al. Deep speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567, 2014.

[27] Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig. Achieving human parity in conversational speech recognition. arXiv preprint arXiv:1610.05256, 2016.

[28] Chung-Cheng Chiu, Tara N Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J Weiss, Kanishka Rao, Katya Gonina, et al. State-of-the-art speech recognition with sequence-to-sequence models. arXiv preprint arXiv:1712.01769, 2017.

[29] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations, 2015.

[30] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.

[31] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.

[32] Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, et al. Achieving human parity on automatic chinese to english news translation. arXiv preprint arXiv:1803.05567, 2018.

[33] Towards Literate Artificial Intelligence | Mrinmaya Sachan | Machine Learning Department, Carnegie Mellon University |