The Beginner’s Glossary of Neural Network Terms

Original article was published on Artificial Intelligence on Medium


One of the greatest roadblocks for newcomers to data science and deep learning is the multitude of terms present in even the most introductory courses. The field has a steep learning curve, as most concepts are intertwined and complementary. To make things worse, most newcomers have little to no machine learning background, making the task even harder. In this article, I break down some of the field’s most important concepts in a sequential manner, linking each new term to the last.

Let’s start with the definition of…

Hierarchy of terms surrounding “Deep Learning”

Intelligence: So far, no commonly accepted definition for intelligence exists. The definition is an on-going debate that spans a wide range of fields and disciplines, such as philosophy, psychology, sociology, and computer science. However, although no definition is widely accepted, science has been trying to mimic intelligence for decades, creating…

Artificial Intelligence (AI): The field within Computer Science that studies “intelligent behavior” in machines. In other words, systems that behave in non-trivially predictable ways. Systems that react and (appear to) reason about themselves and the world around them. This definition is vague as we can’t even define intelligence properly. However, something that we can define is learning, which brings us to…

Machine Learning (ML): The task of showing the inputs and outputs of a problem to an algorithm and letting it learn how to solve it. For instance, we can create an algorithm that recognizes dogs and cats in pictures by showing it hundreds of photos of pets. Within ML, a growing approach is:

Deep Learning (DL): The idea of stacking multiple learning algorithms to jointly solve a difficult task. When we are kids, we learn the alphabet, then we learn to read simple words, then full sentences, etc. With practice, we are reading books and forming opinions. In between letters and fluency, several learning steps are involved. In imitation, deep learning algorithms approach learning by breaking the task down into sequential problems, each one building an increased level of abstraction.

In practice, DL and ML algorithms are frequently the tools of choice for…

Data Science (DS): The study of data, including acquiring, cleaning, storing, viewing, and processing it. A common misconception is that DL and DS are the same thing, as many DS courses feature DL prominently. ML and DL are tools for the data scientist, as are SQL, NumPy, etc.

Data Engineering, Data Analytics, Data Analysis, Cognitive Computing: This area is full of lesser-known synonyms and confusing terms. Some seem to exist only to sound new and fresh, while others refer to specific tasks within DS. For instance, “analytics” usually refers to exploratory analysis while “engineering” refers to implementing systems for production use. Stick to the widely used terms whenever in doubt, such as “data science”.

A sample fully-connected neural network. Visualization built with the NN-SVG tool.

Artificial Neural Network (NN): Among several ways of implementing deep learning, neural networks are by far the most popular. In short, they are a stack of simple learning algorithms (called layers) that sequentially process the input, producing an output. This embodies the idea of deep learning by design, as each layer learns a more refined understanding of the input.

Network Layers: All neural networks are made up of layers (as shown in the image above). A layer defines an operation that takes some inputs, some parameters, and produces a set of outputs. For instance, the…

Dense Layer: The layer that receives a vector (input) and multiplies it by a matrix (parameters), producing another vector (outputs). While this is one of the most widely used layers in practice, it has one problem. It is:

Linear: Something is linear when it can be decomposed into independent parts, which makes it easy to study. However, a linear system is no more interesting than the parts that make it up. “A + B” is no more interesting than “A” and “B” themselves. This is opposed by:

Non-Linear: A system is non-linear when its parts are intertwined as a complex whole. A non-linear system cannot be easily factored. Hence, the whole is more than the sum of its parts. For instance, “sine(A + B)” cannot be broken down into a sum of separate “A” terms and “B” terms. Thankfully, any linear system can be made non-linear by means of an…

Activation Function: These are ordinary non-linear functions used in DL that have the sole purpose of receiving input and making it non-linear. In the above example, we made “A + B” non-linear by using the sine function. Within DL, the most famous activation function is the:

ReLU: Short for Rectified Linear Unit, defined as ReLU(x) = max(0, x). This is one of the simplest (and most efficient) ways of making something non-linear. That’s why it is so popular: it is simple and fast. Combining dense layers and ReLU activations, we can build a…

Dense Network / Fully-Connected Network / Multi-Layer Perceptron: These are all synonyms for the basic neural network: a collection of dense layers interleaved with activation functions. It is important to highlight that, without activation functions, a network would be no more interesting than a single dense layer. The non-linearities are the glue that creates a powerful model out of ordinary parts.
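
To see why the activation functions matter, here is a minimal sketch in plain Python (no libraries, so the arithmetic stays visible). The helper names (relu, matvec, matmul) and the toy numbers are assumptions made up for this illustration, not part of any library. It stacks two dense layers without an activation, compares the result against a single dense layer whose matrix is the product of the two, and then repeats the stack with a ReLU in between.

```python
def relu(values):
    """Apply ReLU(x) = max(0, x) to every entry of a list."""
    return [max(0.0, v) for v in values]


def matvec(x, W):
    """Multiply a row vector x by a matrix W given as a list of rows."""
    return [sum(xi * wij for xi, wij in zip(x, column)) for column in zip(*W)]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [matvec(row, B) for row in A]


# Hypothetical toy values, chosen only so the arithmetic is easy to follow.
x = [1.0, -2.0]
W1 = [[3.0, 1.0], [1.0, 2.0]]  # first dense layer: 2 inputs -> 2 outputs
W2 = [[1.0], [1.0]]            # second dense layer: 2 inputs -> 1 output

# Two dense layers with no activation in between...
stacked_linear = matvec(matvec(x, W1), W2)
# ...match one dense layer whose matrix is the product of W1 and W2.
single_layer = matvec(x, matmul(W1, W2))

# With a ReLU between the layers, the result no longer matches.
stacked_relu = matvec(relu(matvec(x, W1)), W2)

print(stacked_linear, single_layer, stacked_relu)  # [-2.0] [-2.0] [1.0]
```

In this toy case the first layer outputs [1.0, -3.0]. The ReLU zeroes out the negative entry, which is exactly the kind of change that no single matrix multiplication can reproduce for every input.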

Out of the synonyms for neural networks, one has historical significance…

Perceptron Model: In 1958, Frank Rosenblatt created a computational model of a neuron: the perceptron, which is the basis for most of what came afterward. Simply put, the perceptron is a weighted sum followed by an activation function. Formally, P(x; w) = a(x ⋅ w). This means the perceptron receives an “x” vector, has a set of weights “w”, and computes the dot product of “x” with “w” (the weighted sum) and feeds it to “a”, the activation function.
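
The formula P(x; w) = a(x ⋅ w) can be sketched in a few lines of plain Python. This is only an illustration: the function names and the numbers are made up, and ReLU is used as the activation “a” simply because it is the one defined in this glossary (the original perceptron used a threshold function instead).

```python
def relu(z):
    """ReLU(z) = max(0, z) for a single number."""
    return max(0.0, z)


def perceptron(x, w, activation=relu):
    """P(x; w) = a(x . w): a weighted sum followed by an activation."""
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w))
    return activation(weighted_sum)


# Hypothetical input and two hypothetical sets of weights.
x = [1.0, 2.0, 3.0]
w_positive = [0.5, -1.0, 1.0]   # weighted sum: 0.5 - 2.0 + 3.0 = 1.5
w_negative = [-1.0, -1.0, 0.5]  # weighted sum: -1.0 - 2.0 + 1.5 = -1.5

print(perceptron(x, w_positive))  # 1.5 (positive sums pass through ReLU)
print(perceptron(x, w_negative))  # 0.0 (negative sums are clipped to zero)
```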

Dense Layer (revisited): If an input is fed to many perceptrons at once, as in the figure above, we can simplify the math by “joining” the weights of each perceptron as a matrix, creating the dense layer: D(x; W) = a(x ⋅ W). Using this definition, we can build a two-layer network by feeding the result of a dense layer into another, such as: D(D(x; W₁); W₂) = a(a(x ⋅ W₁) ⋅ W₂). This showcases an important notion: neural networks are just math functions.
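
As a rough sketch of the “just math functions” idea, the two formulas above translate almost word for word into plain Python. The names dense and two_layer_network, the layer sizes, and the weight values are all assumptions chosen for this example. Each column of W holds the weights of one perceptron, and ReLU stands in for the activation “a”.

```python
def relu(values):
    """Apply ReLU(x) = max(0, x) to every entry of a list."""
    return [max(0.0, v) for v in values]


def dense(x, W, activation=relu):
    """D(x; W) = a(x . W); column j of W holds the weights of perceptron j."""
    weighted_sums = [
        sum(xi * wij for xi, wij in zip(x, column)) for column in zip(*W)
    ]
    return activation(weighted_sums)


def two_layer_network(x, W1, W2):
    """D(D(x; W1); W2) = a(a(x . W1) . W2)."""
    return dense(dense(x, W1), W2)


# Hypothetical weights: 2 inputs -> 3 hidden values -> 1 output.
x = [1.0, 2.0]
W1 = [[1.0, -1.0, 0.5],
      [0.5, 1.0, -1.0]]
W2 = [[1.0], [2.0], [3.0]]

hidden = dense(x, W1)                   # [2.0, 1.0, 0.0]
output = two_layer_network(x, W1, W2)   # [4.0]
print(hidden, output)
```

Calling two_layer_network with different W1 and W2 gives a different function of x, while the code of the network itself stays the same.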

In this expression, what we haven’t given attention to yet are the…

Weights: Machine learning models define an “operation” that is guided by a set of “weights”. By changing a model’s weights, we are able to make it do different things. For instance, we might train the same network to recognize cats and dogs or birds and fishes. What changes are the weights, not the network. This brings us to…