Neural Networks with Memory

Original article was published on AI Magazine



Understanding RNNs and LSTMs in under 5 minutes

We have always heard that Neural Networks (NNs) are inspired by biological neural networks, and the inspiration is surprisingly direct: the structure of a single neuron maps almost one-to-one onto the basic building block of an artificial network.

Figure 1 shows the anatomy of a single neuron. The central part is called the cell body, where the nucleus resides. Many fibers, called dendrites, pass the stimulus to the cell body, and a few fibers send the output on to other neurons. The thickness of a dendrite reflects the weight/strength of the stimulus. Many such neurons connected together form the biological neural network.

Figure 1: Anatomy of Single Neuron ( Source, Edited by author)

The same structure is implemented in artificial neural networks. The input is passed along weighted edges into an activation function, which generates an output that can in turn be fed to other activation functions. Many such units can be stacked; each stack is called a layer, and a layer can contain multiple neurons.

Figure 2: Single neuron neural network (Image created by author)

These activation functions can be as simple as a sigmoid function, which squashes any input into a value between 0 and 1.
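As a minimal sketch, a single sigmoid neuron can be written in a few lines of plain Python (the weights, bias, and inputs below are made up purely for illustration):

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, then the activation function
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

print(neuron([0.5, -1.0], [0.8, 0.2], 0.1))  # ~0.574
```

Replacing `sigmoid` with another function (tanh, ReLU, ...) changes only the activation; the weighted-sum structure stays the same.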

Advantages of neural networks over traditional machine learning algorithms

  • Various types and sizes of data can be handled
  • Multiple functions can be easily configured
  • Non-linear data can be efficiently handled

Neural Networks with memory

The main difference between the functioning of artificial neural networks and the biological neural network is memory. While both the human brain and neural networks can read from and write to the memory available to them, only the brain can also create and store new memories. Researchers have identified this key difference as a major roadblock for today's AI systems reaching human-level intelligence.

Researchers at DeepMind aimed to build a differentiable computer by linking a neural network to external memory, with the network acting as a CPU and the memory attached to it. Such differentiable computers aim to learn programs (algorithms) from input and output data.

Neural networks are typically used when the amount of data is huge: for example, text data with its enormous number of dimensions, or image data split into a huge number of pixels.

Recurrent Neural Network

A movie consists of a sequence of scenes. When we watch a particular scene, we don't try to understand it in isolation, but in connection with previous scenes. In a similar fashion, a machine learning model has to understand new text by utilizing text it has already seen, just as a human brain does.

Traditional machine learning models cannot store information about their previous stages. Recurrent Neural Networks (commonly called RNNs) can do this for us. Let's take a closer look at RNNs below.

Figure 3: Working of a basic RNN (Image by Author)

An RNN has a repeating module that takes input from the previous stage and feeds its output forward as input to the next stage. In practice, however, a plain RNN retains useful information only from the most recent stages; over long sequences, the signal from early inputs fades away (the vanishing-gradient problem). That's where LSTM comes into the picture.
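A toy, single-unit version of this recurrence (with hand-picked weights, purely for illustration) shows both the state being carried forward and the early signal fading:

```python
import math

# Hand-picked weights for one recurrent unit (not trained)
W_x, W_h = 0.5, 0.8

def rnn_step(x, h_prev):
    # The new hidden state mixes the current input with the previous state
    return math.tanh(W_x * x + W_h * h_prev)

h = 0.0
for x in [1.0, 0.0, 0.0, 0.0]:  # one "pulse" of input, then silence
    h = rnn_step(x, h)
    print(round(h, 3))  # the trace of the first input shrinks every step
```

Because the old state is repeatedly multiplied by a factor below 1 and squashed through tanh, the memory of the pulse decays step by step; this is the mechanism behind the short memory of plain RNNs.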

Long Short Term Memory Networks

To learn long-term dependencies, our network needs memorization power. LSTMs are a special kind of RNN that can provide it. They have the same chain-like structure as RNNs, but with a different structure inside the repeating module: gates that control what is stored, forgotten, and exposed.
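A minimal sketch of one LSTM step makes the gate structure concrete. The scalar weights below are invented for illustration; a real LSTM learns weight matrices for every gate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev):
    # Toy scalar weights; a real LSTM learns a matrix per gate
    f = sigmoid(0.5 * x + 0.5 * h_prev + 1.0)  # forget gate: how much old memory to keep
    i = sigmoid(1.0 * x + 0.5 * h_prev - 1.0)  # input gate: how much new memory to write
    o = sigmoid(1.0 * x + 0.5 * h_prev)        # output gate: how much memory to expose
    g = math.tanh(1.0 * x + 0.5 * h_prev)      # candidate memory content
    c = f * c_prev + i * g   # cell state: gated mix of old and new memory
    h = o * math.tanh(c)     # hidden state passed on to the next stage
    return h, c

h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c)
```

The key difference from the plain RNN is the cell state `c`: because the forget gate can stay close to 1, information can flow through it over many steps largely unchanged, which is what gives the LSTM its long-term memory.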

Figure 4: Working of LSTM (Image by author)

LSTM has a wide range of applications in Sequence-to-Sequence modeling tasks like Speech Recognition, Text Summarization, Video Classification, and so on.

To understand at a glance how these networks can be adopted in real-life applications, consider spam detection.

A spam detection model can be achieved by converting text data into vectors, creating an LSTM model, and fitting the model with the vectors.
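As a toy sketch of the first step, converting text into vectors can be as simple as a bag-of-words count. The vocabulary and messages below are invented for illustration; in practice a Keras tokenizer handles this (plus padding) for you.

```python
# A made-up vocabulary for the spam example
vocab = ["free", "win", "meeting", "tomorrow", "prize"]

def vectorize(message):
    # Count how often each vocabulary word appears in the message
    words = message.lower().split()
    return [words.count(w) for w in vocab]

print(vectorize("win a free prize"))       # [1, 1, 0, 0, 1]
print(vectorize("meeting tomorrow at 9"))  # [0, 0, 1, 1, 0]
```

The resulting fixed-length vectors are what the model consumes; spam-flavored words light up different positions than ordinary ones.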


Keras and TensorFlow make implementing these complete neural networks simple.

We are implementing a Bidirectional LSTM, with a ReLU activation function in the dense layer.

#Importing necessary libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Bidirectional, Embedding
model = Sequential()
#Embedding turns word indices into dense vectors (10000 is an example vocabulary size)
model.add(Embedding(input_dim=10000, output_dim=64))
#Adding Bi-directional LSTM
model.add(Bidirectional(LSTM(64)))
#ReLU allows converging quickly and allows backpropagation
model.add(Dense(16, activation='relu'))
#Single sigmoid output for the binary spam/not-spam decision
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Creating each layer is just a single line of code.


In this article, we learned how artificial neural networks are linked to biological neural networks, and how neural networks with memory (namely, RNNs and LSTMs) work.

Thanks for reading. I am going to write more beginner-friendly posts in the future too. Follow me on Medium to be informed about them. I welcome feedback and can be reached on Twitter ramya_vidiyala and LinkedIn RamyaVidiyala. Happy learning!