Deep Learning in a Nutshell

Source: Deep Learning on Medium

Hey Folks!

The field of Artificial Intelligence came into the picture when machines began to automate tasks that require human intelligence. Machine learning, in turn, gives a machine the capability to learn from experience and acquire skills without human intervention. Deep learning is a subset of machine learning in which Artificial Neural Networks (ANNs), built from units called perceptrons and inspired by the neurons of the human brain, learn from large amounts of data. According to Wikipedia:

Deep learning is a class of machine learning algorithms that use multiple layers to progressively extract higher level features from raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify human-meaningful items such as digits/letters or faces.

Just as we learn from experience in real life, Deep Neural Networks perform a task iteratively, and learning from every iteration results in a better outcome. Why is it called deep learning? Because the networks stack many layers, and that depth boosts what the algorithm can learn. In the modern era a massive amount of data is generated. In addition to more data, DNN algorithms benefit from the stronger computing power available today as well as the proliferation of Artificial Intelligence (AI) as a Service.

In this post, I will be covering most of the Deep Learning algorithms, but in a nutshell. It will be a concise, point-by-point explanation of popular supervised and unsupervised Deep Learning algorithms.


Multi-Layer Perceptron a.k.a. Feed Forward Neural Network

  • Perceptron is the name given to a model with a single layer of neurons. A network with just one hidden layer is also known as a Shallow Neural Network.
  • In simple words, information starts at the input layer, flows through the “hidden” layers, and ends at the output layer.
  • The network has no loops, unlike the recurrent networks covered later. Information only moves forward and stops at the output layer; backpropagation is used only during training to send the error back through the layers.
  • If we extend the perceptron to multiple layers, we get a Multilayer Perceptron. The image below shows an input layer followed by an intermediate (hidden) layer and an output layer.
  • It is a field in which we investigate how models of biological brains, built from perceptrons, can solve complex predictive tasks. The power of a Neural Network comes from its ability to learn from our data points and map them to the output for better prediction.
Typical ANN
  • The net is dense, which basically means each neuron in a hidden layer is connected to all neurons in the previous layer and to all neurons in the following layer.
  • The predictive capability of neural networks comes from their hierarchical, multi-layered structure. The network can pick out features at different resolutions and merge them into high-level features.
  • Python Implementation of Multilayer perceptron powered by Keras.
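
To make the forward flow concrete without depending on any framework, here is a minimal plain-Python sketch (not Keras) of a dense feed-forward pass. The layer sizes, the ReLU and sigmoid activations, and the random weights are assumptions chosen only for illustration; a real model would learn its weights through training.

```python
import math
import random


def relu(values):
    """ReLU activation applied element-wise."""
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(inputs, layers):
    """Input -> hidden layers (ReLU) -> output layer (sigmoid), no loops."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    out_weights, out_biases = layers[-1]
    return [sigmoid(z) for z in dense(activations, out_weights, out_biases)]


rng = random.Random(0)
# 4 inputs -> 8 hidden neurons -> 1 output (sizes are arbitrary).
layers = [init_layer(4, 8, rng), init_layer(8, 1, rng)]
prediction = mlp_forward([0.5, -1.2, 3.0, 0.1], layers)
print(prediction)
```

The output is a single value between 0 and 1, which is how a binary classifier built this way would express its confidence.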

Popular Datasets for ANN’s

Convolutional Neural Networks

  • Convolutional Neural Nets are multilayered architectures designed to extract increasingly complex features of the data at each layer to determine the output. They are mostly used with images (unstructured data), where we want to extract information from the image.
  • CNNs came into the picture because traditional ANNs take more time to train on unstructured data. A CNN first reduces the size of the images using convolutional layers and pooling layers and then feeds the reduced data to fully connected layers.
  • Let's take a simple example: a Convolutional Neural Network receives an image of a dog. In computational terms, it is just a grid of pixels with one channel if the image is grayscale or three channels (red, green, blue) if it is in color. In the process of feature extraction it basically grabs features like the dog's tail, head, ears, and so on.
  • After learning is completed, a CNN provides a probability for each label of an image. The label with the highest probability becomes the prediction of your CNN.
  • Convolutional networks are able to capture spatial and temporal relationships in the image through relevant filters. The architecture fits image datasets better thanks to the reduced number of parameters and the reuse of weights.
  • Some of the CNN-based architectures that have been used to maximize performance are AlexNet, GoogLeNet, VGGNet, and Residual Neural Network (ResNet).
  • The key concepts of CNNs revolve around the convolution layer, strides, padding, the pooling layer, and the fully connected layer. For more info regarding these terminologies check out this Link.
CNN Architecture
  • So in a nutshell a CNN works like this: provide the input image to the convolution layer; choose parameters and apply filters with strides, and padding if required; perform the convolution and apply the ReLU activation function to the resulting matrix; perform pooling to reduce the dimensionality; flatten the output and feed it into a fully connected layer; output the class using an activation function to classify the image.
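
The pipeline in the last bullet can be sketched in plain Python on a tiny image. This is only a rough illustration: the 6x6 image, the vertical-edge filter, the class weights, and the two labels are all made up for the example, and a real CNN would learn its filters and weights from data.

```python
import math


def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D convolution on a square image and kernel."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def relu(feature_map):
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keeps the strongest response per block."""
    n = len(feature_map) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(n)
        ]
        for r in range(n)
    ]


def flatten(feature_map):
    return [v for row in feature_map for v in row]


def dense(inputs, weights, biases):
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Grayscale image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]  # hand-picked filter

pooled = max_pool(relu(conv2d(image, vertical_edge)))
features = flatten(pooled)

# Hypothetical fully connected layer with fixed weights for two labels.
labels = ["edge", "no edge"]
class_weights = [[0.1] * 4, [-0.1] * 4]
probs = softmax(dense(features, class_weights, [0.0, 0.0]))
prediction = labels[probs.index(max(probs))]
print(pooled, probs, prediction)
```

The 6x6 image shrinks to a 4x4 feature map after the convolution and to 2x2 after pooling, which is the size reduction that makes the fully connected part cheap.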

Popular Datasets for CNN

Recurrent Neural Networks (RNN)

  • Recurrent Neural Networks are a type of Artificial Neural Network designed to find patterns in sequential data such as text, genomes, handwriting, the spoken word, or numerical time series data originating from sensors. These algorithms take time and sequence into account, that is, they have a temporal dimension.
  • We can imagine an RNN as multiple copies of the same network, each passing information to its successor. An RNN stores context in its nodes, which allows it to learn sequences of data and then output either a number or another sequence. Put simply, it is an ANN with looping connections between the neurons.
RNN Block Diagram
  • RNNs can be applied even to images, because each image can be divided into patches that are then treated as a sequence.
  • RNNs do have a disadvantage. They are able to learn from past information when only recent information is needed to perform the present task, but there are cases in which we need more context, and plain RNNs are not capable of handling such long-term dependency problems.
  • LSTMs are a special type of RNN capable of handling long-term dependencies. Here is the Link for an awesome explanation of LSTMs by Christopher Olah.
  • A more recent RNN variant that competes with LSTMs is the Gated Recurrent Unit (GRU), which also aims to solve the vanishing gradient problem. The GRU can be considered a variation of the LSTM, and both produce excellent results in most cases. Check out this tutorial on GRU.
  • In the last few years, there has been incredible success applying RNNs to a variety of problems such as speech recognition, language modeling, translation, image captioning, etc.
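
The "same network passing information to its successor" idea boils down to one recurrence applied at every time step. Below is a minimal sketch of a plain (non-LSTM, non-GRU) RNN cell in Python, assuming the common tanh formulation h_t = tanh(W_x x_t + W_h h_(t-1) + b). The weights are hand-picked for illustration, not trained.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrence: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(w_x[j], x_t))
            + sum(w * h for w, h in zip(w_h[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]


def run_rnn(sequence, w_x, w_h, b):
    """Apply the same cell (same weights) to every element of the sequence."""
    h = [0.0] * len(b)  # initial hidden state
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# Illustrative weights: 1 input feature, 2 hidden units.
w_x = [[0.5], [-0.3]]
w_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.0]

# Only the first time step carries a signal; later inputs are zero.
states = run_rnn([[1.0], [0.0], [0.0]], w_x, w_h, b)
for t, h in enumerate(states):
    print(t, h)
```

The hidden state stays non-zero at the later steps even though their inputs are zero. That carried-over state is the network's memory, and the way it shrinks from step to step hints at why long-term dependencies are hard for plain RNNs.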

Popular Dataset for RNN

Unsupervised Learning

Self Organizing Maps (SOMs)

  • A SOM is an unsupervised ANN that produces a lower dimensional representation of the input samples in the training set. This lower dimensional representation is called a map, so in straight terms it is a method for dimensionality reduction of data points.
  • SOMs differ from other ANN algorithms in that they apply competitive learning rather than error-correction learning such as backpropagation with gradient descent.
  • The core motive of a SOM is dimensionality reduction, and it is also used for finding the structure of your multidimensional data through a 2D representation. In this 2D representation, each of your original inputs maps to one of the nodes on the 2D grid. Similar higher dimensional inputs map to the same 2D node, or at least to the same region in 2D space.
  • Moreover, a SOM helps us visualize high dimensional data in 2D easily, which is otherwise not possible once your data has more than 3 dimensions.
  • A typical example is with colors: data points are 3D vectors of red, green, and blue values. When mapped to a 2D SOM, you can see regions of similar colors appearing on the map.
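
Competitive learning on the color example can be sketched in a few lines of plain Python. For each sample, the node with the closest weights wins (the best matching unit), and the winner and its grid neighbors are pulled toward the sample. The grid size, learning rate, and exponential decay schedule below are assumptions picked for a small demo rather than recommended settings.

```python
import math
import random


def best_matching_unit(grid, sample):
    """Return (row, col) of the node whose weights are closest to the sample."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            dist = sum((w - s) ** 2 for w, s in zip(weights, sample))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(samples, rows, cols, steps=300, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0])
    grid = [
        [[rng.random() for _ in range(dim)] for _ in range(cols)]
        for _ in range(rows)
    ]
    radius0 = max(rows, cols) / 2
    for step in range(steps):
        decay = math.exp(-step / steps)  # shrink the updates over time
        lr, radius = lr0 * decay, radius0 * decay
        sample = rng.choice(samples)
        br, bc = best_matching_unit(grid, sample)
        for r in range(rows):
            for c in range(cols):
                # Nodes close to the winner on the 2D grid move the most.
                grid_dist_sq = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
                node = grid[r][c]
                for k in range(dim):
                    node[k] += lr * influence * (sample[k] - node[k])
    return grid


# 3D inputs (red, green, blue) mapped onto a 5x5 grid of nodes.
colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
som = train_som(colors, rows=5, cols=5)
for color in colors:
    print(color, "->", best_matching_unit(som, color))
```

No error is backpropagated anywhere: the only signal is which node wins and how far the other nodes sit from it on the grid.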

Popular Datasets for SOM’s

Boltzmann Machines

  • Boltzmann Machines are unsupervised deep learning models built around the Boltzmann distribution, which comes from statistical mechanics. Boltzmann Machines are considered to be a point where deep learning meets physics.
  • Boltzmann Machines are Energy-Based Models (EBMs), and their most widely used variant is the Restricted Boltzmann Machine (RBM). Also, there is the case in which RBMs are stacked on top of each other, called Deep Belief Networks.
  • So Boltzmann Machines are nothing but stochastic, generative deep learning models that have only input (visible) nodes and hidden nodes, with no output nodes.
  • In a Boltzmann Machine the input nodes are also connected to each other, which makes this unsupervised algorithm different from the others. All nodes, as shown in the image below, exchange data with each other, so the model can generate data by itself, which is why it is called a generative deep model.
  • When we feed data into a Boltzmann Machine, the nodes in the network learn all the parameters as well as the correlations between them on their own. The trained model can then be used to monitor for patterns that deviate from what it has learned.
  • Explaining Boltzmann Machines in detail is out of scope for this post, so we start with Restricted Boltzmann Machines, which belong to the Energy-Based Models. The aim is to find the joint probability distribution that maximizes the log-likelihood function.
  • Deep Belief Networks are formed when the hidden layers of RBMs are stacked on top of each other. A DBN consists of an undirected associative memory at the top, while the remaining hidden layers form a directed acyclic graph that converts the representations in the associative memory into observable variables such as the pixels of an image. DBN training is further broken down into Greedy Layer-wise Training and the Wake-Sleep Algorithm.
  • Deep Boltzmann Machines are unsupervised models trained with the help of a separate recognition model, which is used to quickly initialize, in a single bottom-up pass, the values of the latent variables in all hidden layers. Detailed Explanation here.
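
To see what "energy-based" means in practice, here is a small Python sketch of the standard RBM energy function E(v, h) = -a·v - b·h - vᵀWh for binary units, together with the conditional probabilities of the hidden units. In an RBM, connections exist only between visible and hidden nodes, which is what makes these conditionals easy to compute. The symbols W, a, b and the tiny weight matrix are assumptions used only for this illustration, and training (for example with contrastive divergence) is left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def energy(v, h, W, a, b):
    """E(v, h) = -a.v - b.h - v^T W h; lower energy means more probable."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -visible_term - hidden_term - interaction


def hidden_probabilities(v, W, b):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j])."""
    return [
        sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b))
    ]


# 3 visible nodes, 2 hidden nodes; weights chosen by hand for the demo.
W = [[2.0, -1.0], [2.0, -1.0], [-1.0, 2.0]]
a = [0.0, 0.0, 0.0]  # visible biases
b = [0.0, 0.0]       # hidden biases

v = [1, 1, 0]
low = energy(v, [1, 0], W, a, b)   # hidden unit 0 agrees with this input
high = energy(v, [0, 1], W, a, b)  # hidden unit 1 does not
p_hidden = hidden_probabilities(v, W, b)
print(low, high, p_hidden)
```

The configuration with the lower energy is the one the model considers more likely, and the conditional probabilities say the same thing: the first hidden unit is very likely to switch on for this input, the second is not.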

Popular Datasets for RBM’s


AutoEncoders

  • An autoencoder is an unsupervised neural network that learns a compressed representation of an input such as an image. It is trained using supervised methods with the input itself as the target, which is why it is sometimes referred to as a self-supervised learning algorithm. It is trained as part of a model that aims to recreate its inputs.
  • An autoencoder encodes the input values x using a function f and then decodes the encoded values f(x) using a function g, producing output values g(f(x)) that should be as close as possible to the input values.
  • The aim of an autoencoder is to minimize the reconstruction error between input and output values. This helps the autoencoder learn patterns between data points.
  • An autoencoder starts by encoding the input in order to identify latent feature representations from which the input can be recreated. The loss is calculated by comparing input and output. The weights of the autoencoder are updated based on how much each weight is responsible for the error.
Deep AutoEncoders
  • Usually, an autoencoder has three parts, namely the encoder, the code, and the decoder. The encoder compresses the input image into a representation with reduced dimensions. The code is the part of the autoencoder that holds the compressed input and feeds it to the decoder. The decoder decodes the encoded image back to the original dimensions.
  • Autoencoders are broadly divided into Convolutional Autoencoders, Sparse Autoencoders, Deep Autoencoders, and Contractive Autoencoders.
  • Autoencoders are usually preferred over Principal Component Analysis because they can learn non-linear transformations, using non-linear activation functions and multiple layers.
  • Keras implementation of LSTM Autoencoder fires this way
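
Independent of any framework, the encode, decode, and minimize-the-reconstruction-error loop described in the bullets above can be sketched in plain Python. Note that this is a tiny dense autoencoder with a tanh code layer, not an LSTM autoencoder and not Keras code; the data, sizes, learning rate, and number of epochs are assumptions chosen so that the example runs quickly.

```python
import math
import random


def encode(x, w_enc):
    """f(x): compress the input into a smaller code (tanh non-linearity)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_enc]


def decode(code, w_dec):
    """g(f(x)): expand the code back to the input dimension."""
    return [sum(w * c for w, c in zip(row, code)) for row in w_dec]


def reconstruction_loss(data, w_enc, w_dec):
    """Mean squared reconstruction error between inputs and outputs."""
    total = 0.0
    for x in data:
        y = decode(encode(x, w_enc), w_dec)
        total += 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)


def init_weights(n_inputs, code_size, seed=0):
    rng = random.Random(seed)
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(code_size)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(code_size)] for _ in range(n_inputs)]
    return w_enc, w_dec


def train(data, w_enc, w_dec, epochs=2000, lr=0.05):
    """Per-sample gradient descent; updates the weight lists in place."""
    n, code_size = len(w_dec), len(w_enc)
    for _ in range(epochs):
        for x in data:
            code = encode(x, w_enc)
            y = decode(code, w_dec)
            err = [yi - xi for yi, xi in zip(y, x)]
            # How much each code unit is responsible for the error
            # (includes the tanh derivative 1 - code^2).
            grad_code = [
                sum(err[i] * w_dec[i][k] for i in range(n)) * (1 - code[k] ** 2)
                for k in range(code_size)
            ]
            for i in range(n):
                for k in range(code_size):
                    w_dec[i][k] -= lr * err[i] * code[k]
            for k in range(code_size):
                for j in range(len(x)):
                    w_enc[k][j] -= lr * grad_code[k] * x[j]


# 4-dimensional inputs that really only vary along 2 directions.
data = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]]
w_enc, w_dec = init_weights(n_inputs=4, code_size=2)
loss_before = reconstruction_loss(data, w_enc, w_dec)
train(data, w_enc, w_dec)
loss_after = reconstruction_loss(data, w_enc, w_dec)
print(loss_before, loss_after)
```

The input is its own target, so no labels are needed: the only thing being pushed down is the difference between what goes in and what comes out of the 2-unit bottleneck.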

Popular AutoEncoders Datasets


If you like this post, please follow me and press that Clap button as many times as you think I deserve. If you have noticed any mistakes in the way of thinking, formulas, animations or code, please let me know.