Source: Deep Learning on Medium

# Deep Learning 101 — The Theory

There are many machine learning practitioners who question the approach to theory that much machine learning instruction typically takes. Their position is one of pragmatism: learn to apply the model, see how it works, and then, once you're comfortable, go back and learn the theory. I can appreciate this approach; after all, machine learning is a rather empirical exercise. However, I prefer to start with the theory because I'm simply not comfortable developing and implementing models without at least some understanding of how they work under the hood. If you're like me, then read on as we walk through the important theoretical underpinnings of deep learning. If you're in the former camp, then check out this post to dive right in.

# TL;DR

We look at neural networks: their structure, their purpose, and the math that underpins them, including topics related to linear algebra, gradient descent, and backpropagation. Medium doesn't do a great job with equations, so if you want all of the gory details, check out my original post.

# What is Deep Learning?

Deep learning is a technique that has exploded into the mainstream of AI and machine learning research over the past five years. It consists of **artificial neural networks** with many layers (hence the characterization "deep"), which allow for the expression of complex representations. The advances in image recognition, for example, have largely been driven by deep learning, where the successive layers of the neural network allow simple features to be aggregated into higher and higher levels until a proper classification can be determined. The raw pixels come in, simple features such as contours are combined to form lines, edges, and boundaries, and eventually the entire image is recognized by the network. Deep learning applications extend beyond image recognition into a wide variety of other spaces, such as natural language processing (NLP), forecasting, a myriad of classification problems, and other, classic predictive analytics problems.

# Anatomy of a Neural Network

The layout or architecture of a neural network is one of the drivers of its performance, and there is a veritable zoo of architectures out there. Despite the wide variation, they contain the same few components, just rearranged. At the most basic level, we have an **input layer**, **hidden layer**, and **output layer**, each with various **nodes** (sometimes called **cells**). Below, we have a simple image of a two-layer neural network (we don’t typically count the input layer when counting neural network layers).
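To make the anatomy concrete, here is a minimal sketch in NumPy of the kind of two-layer network described above: an input layer (which, as noted, we don't count), one hidden layer, and an output layer. The layer sizes, sigmoid activation, and random weights are illustrative assumptions, not details from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes (assumptions): 3 input nodes,
# 4 hidden nodes, 2 output nodes.
n_input, n_hidden, n_output = 3, 4, 2

# Each layer of connections has a weight matrix and a bias vector.
W1 = rng.normal(size=(n_input, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_output))
b2 = np.zeros(n_output)

def sigmoid(z):
    # A common squashing activation; maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Propagate an input vector through both layers of the network."""
    hidden = sigmoid(x @ W1 + b1)       # hidden-layer activations
    output = sigmoid(hidden @ W2 + b2)  # output-layer activations
    return output

x = np.array([0.5, -1.2, 0.3])  # one sample with 3 input features
y = forward(x)                  # one activation per output node
```

Counting only the weighted layers (hidden and output) is what makes this a "two-layer" network, even though three groups of nodes appear in the diagram.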