Original article can be found here (source): Artificial Intelligence on Medium

# Prologue

From disseminating targeted advertising to detecting pandemics, *Artificial Intelligence* (or *AI*) is truly transforming the world. Deep learning, an instance of the *AI* sub-field of Connectionist-*AI*, is at the heart of this transformative renaissance. This article examines its foundational theory, which manifests through the neural networks used to solve learning problems, in order to understand its black-box operational nature.

# The Foundational Theory

To understand state-of-the-art *AI*, its sub-field of Connectionist-*AI* in general, and Deep learning in particular, it is important to consider the inceptive historical context and the incumbent interpretation of the foundational theory before the proposed alternate interpretation can be appreciated. Accordingly, the inceptive historical context, the incumbent interpretation, and the proposed alternate interpretation are briefly described below.

The Inceptive Historical Context. Through the synergistic efforts of several forward-looking scientists, including (alphabetically) *Claude Shannon*, *John McCarthy*, *Marvin Minsky*, and *Nathaniel Rochester*, among others, the field of *AI* was born in the mid-1950s. Since its inception, the field has been broadly subdivided into four distinct sub-fields, namely, Connectionist, Bayesian, Evolutionary, and Symbolist *AI*. These sub-fields fundamentally differ in the way they theorize, formulate, and implement the underlying ideas governing the field of *AI*.

Notably, throughout its history, owing to a variety of constraints, including progress (or the lack thereof) in its theoretical underpinnings as well as in hardware, the field of *AI* has witnessed alternating periods of optimism and pessimism. This article focuses on the *AI* sub-field of Connectionist-*AI* in general, and Deep learning, an instance of Connectionist-*AI* that utilizes neural networks, in particular.

Deep learning, also known as differentiable programming, utilizes neural networks to formulate and solve learning problems. The term was formally introduced in the mid-1980s by *Rina Dechter*, a distinguished Professor of Computer Science at the University of California, Irvine, and a former doctoral student of *Judea Pearl*, the 2011 *Turing* Award winner. The sub-field has made tremendous strides since its inception, with several fundamental contributions by (alphabetically) *Geoffrey Hinton*, *Yann LeCun*, and *Yoshua Bengio*, the 2018 *Turing* Award winners, among other notable scientists.

The Incumbent Interpretation. The neural networks, through which Deep learning instruments the learning process, are a connectionist construct loosely modeled on biological neural networks. A neural network is a collection of individual artificial neuron nodes connected to each other in a mesh-type network topology. Generally, a neural network has an input layer that accepts domain-specific input such as images, text, and video, and an output layer that assigns prediction scores based on the requirements of the underlying learning problem. Between the input and output layers sit one or more hidden layers, with each hidden layer providing varying-order feature output to the subsequent layer. Typically, the neural networks used in Deep learning have several tens of hidden layers, if not more.
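As a minimal sketch of this layered topology (the layer sizes, the initialization scale, and NumPy itself are illustrative assumptions, not prescriptions of the article), the weight matrices connecting adjacent layers can be laid out as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# An assumed topology: a 4-feature input layer, two hidden layers of
# 8 nodes each, and a 3-score output layer.
layer_sizes = [4, 8, 8, 3]

# One weight matrix per pair of adjacent layers; each matrix maps the
# output dimension of one layer to the input dimension of the next.
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

for i, W in enumerate(weights):
    print(f"layer {i} -> layer {i + 1}: weight shape {W.shape}")
```

The mesh-type connectivity of the fully connected case is captured entirely by these matrix shapes: every node of one layer feeds every node of the next.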

Separately, by the universal approximation theorem, it is known that, given pertinent parameters, neural networks can approximate any continuous function defined on a compact (closed and bounded) subset of *R^n*, the *n*-dimensional *Euclidean* real space. However, the theorem does not explicitly elaborate on the computability of such approximated continuous functions.
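For reference, one classical single-hidden-layer form of the theorem (in the style of Cybenko and Hornik; the statement above is more general) can be written as:

```latex
\text{For every continuous } f : K \to \mathbb{R} \text{ on a compact } K \subset \mathbb{R}^n,
\text{ a non-polynomial activation } \sigma, \text{ and every } \varepsilon > 0,
\text{ there exist } N \in \mathbb{N},\ \alpha_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n
\text{ such that}
\quad
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \,
  \sigma\!\left( w_i^{\top} x + b_i \right) \right| < \varepsilon .
```

Note the theorem is purely about the existence of an approximating network; it says nothing about how (or whether) the parameters can be found, nor about the computability of the approximated function.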

A recent result, precipitating the computability-learnability equivalence principle, states that neural networks can learn only those computable functions (i.e., the aforementioned approximated continuous functions) which *Turing* machines are able to compute, whether operating over the classical or the quantum computing paradigm. Thus, this result bounds the computability of the approximated continuous functions to the underlying model of computation (e.g., a *Turing* machine with random-access memory) while establishing a learnability equivalence between the Connectionist and Symbolist *AI*s, among other manifestable connections.

For example, by this principle, it is understood that neural networks can learn to play games based on large albeit finite combinatorial-space computable functions, such as *Chess* and *Go*, which are computable by *Turing* machines. However, they cannot learn uncomputable functions, such as the *Busy Beaver* function, which no *Turing* machine can compute.

Operationally, to learn the latent computable functions in multi-dimensional input, a neural network, in its forward propagation phase, transforms the input by applying dimension-adjusted weights at the input layer. The transformed input is then operated upon by the successive application of non-linear activation functions, whose purpose is to provide selective output from the converging transformed inputs of the previous layer's neuron nodes. This process repeats up to the output layer, where the predicted output, along with the cumulative error, is produced.
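A compact sketch of one forward propagation pass, assuming a single hidden layer, a tanh activation, and a squared-error measure (all illustrative choices rather than the article's prescription):

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.standard_normal(4)          # multi-dimensional input
y_true = np.array([0.0, 1.0])       # target output for this input

# Dimension-adjusted weights for the input-to-hidden and hidden-to-output steps.
W1, b1 = rng.standard_normal((4, 6)) * 0.1, np.zeros(6)
W2, b2 = rng.standard_normal((6, 2)) * 0.1, np.zeros(2)

# Non-linear activation applied to the converging transformed inputs.
hidden = np.tanh(x @ W1 + b1)

# Output layer: the predicted output and the cumulative (squared) error.
y_pred = hidden @ W2 + b2
error = 0.5 * np.sum((y_pred - y_true) ** 2)
```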

Then, in the backward propagation phase, the cumulative error derived from the predicted output of the forward propagation phase is minimized. The error minimization is framed as an optimization problem whose objective is to shape the predicted output toward minimizing the cumulative error. To solve this optimization problem, the stochastic gradient descent algorithm, or a computationally efficient variant thereof, is used; it repeatedly applies the differentiation operation, via the chain rule, to propagate the error back through the layers of the forward propagation phase. Thus, through repeated application of the forward and backward propagation phases, the cumulative error between the real and predicted output is minimized.
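The two phases can be combined into one descent loop; the sketch below, assuming a single hidden layer, a tanh activation, a squared-error objective, and plain gradient descent on a single sample (all simplifying assumptions), hand-derives the chain-rule gradients that frameworks normally compute automatically:

```python
import numpy as np

rng = np.random.default_rng(2)
lr = 0.1  # learning rate (an assumed value)

x = rng.standard_normal(3)      # a single multi-dimensional input
y_true = np.array([1.0])        # its real output

W1, b1 = rng.standard_normal((3, 4)) * 0.5, np.zeros(4)
W2, b2 = rng.standard_normal((4, 1)) * 0.5, np.zeros(1)

for _ in range(200):
    # Forward propagation phase: predicted output and cumulative error.
    h = np.tanh(x @ W1 + b1)
    y_pred = h @ W2 + b2
    error = 0.5 * np.sum((y_pred - y_true) ** 2)

    # Backward propagation phase: chain-rule differentiation of the error.
    d_y = y_pred - y_true            # dE/dy_pred
    d_W2 = np.outer(h, d_y)          # dE/dW2
    d_b2 = d_y
    d_h = W2 @ d_y                   # error propagated back to the hidden layer
    d_pre = d_h * (1.0 - h ** 2)     # tanh'(z) = 1 - tanh(z)^2
    d_W1 = np.outer(x, d_pre)
    d_b1 = d_pre

    # Gradient descent step: adjust the weights to minimize the error.
    W2 -= lr * d_W2; b2 -= lr * d_b2
    W1 -= lr * d_W1; b1 -= lr * d_b1
```

After a couple hundred such passes on this one sample, the cumulative error shrinks toward zero, which is exactly the minimization the paragraph describes.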

The Proposed Alternate Interpretation. The aforementioned repeated application of the forward and backward propagation phases is performed in the training and testing phases of the neural networks, in which they learn and select, respectively, the latent computable functions present in the multi-dimensional input. Essentially, the training phase extracts the latent computable functions and represents them within the neural networks. Subsequently, in the testing phase, the latent computable functions in the new multi-dimensional input are extracted, and a decision is made as to whether they are equivalent to the previously represented computable functions.
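Under this interpretation, training represents a latent function and testing checks for agreement on fresh input; the sketch below substitutes a least-squares line for the neural network, and assumes a simple latent linear function, purely to keep the idea visible:

```python
import numpy as np

rng = np.random.default_rng(3)

# An assumed latent computable function embodied by the data.
def latent(x):
    return 3.0 * x - 1.0

# Training and testing inputs drawn from the same underlying distribution.
x_train = rng.uniform(-1.0, 1.0, 200)
x_test = rng.uniform(-1.0, 1.0, 50)
y_train, y_test = latent(x_train), latent(x_test)

# Training phase: extract and represent the latent function
# (a least-squares fit stands in for the neural network here).
slope, intercept = np.polyfit(x_train, y_train, 1)

# Testing phase: the represented function and the function latent in the
# new input agree to within a bounded difference.
test_error = np.mean((slope * x_test + intercept - y_test) ** 2)
```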

Mathematically, the forward propagation process of successively applying the weights and activation functions to the multi-dimensional input, while converging and diverging the intermediate hidden-layer outputs up to the output layer, can be abstracted as the successive application of the integration operation over a given multi-dimensional space. At the output layer, the predicted output is the resulting set of computable functions, along with the cumulative error.

Subsequently, in the backward propagation phase, the differentiation operation is successively applied to shape the output computable functions in a manner that minimizes the aforementioned cumulative error. It is important to note that, for efficiency, numerical variants of the integration and differentiation operations may be utilized in the forward and backward propagation phases, respectively. The only distinction between the training and testing phases of the neural networks is that, before the training phase, the neural networks have not learned any latent computable functions from the multi-dimensional input, whereas during the testing phase, the neural networks decide whether the newly extracted latent computable functions are equivalent to the previously learned ones.

If the multi-dimensional inputs provided to the neural networks in the training and testing phases share the same underlying data distribution, then the latent computable functions extracted in the two phases would be approximately equivalent, and any difference between them would be bounded. This alternate interpretation of the neural networks' operability is formally defined in the equation below.