Source: Deep Learning on Medium
Learn everything about RNNs and build a dinosaur name generator
Recurrent neural networks (RNNs) are part of a larger group of algorithms called sequence models. Sequence models have driven major advances in speech recognition, music generation, DNA sequence analysis, machine translation, and many other fields.
In this post, we introduce the theory behind RNNs and then consolidate it by building a model that generates new dinosaur names.
Why use a recurrent neural network?
Suppose that you want to build a neural network that will identify people’s names in a sentence.
In this case, a traditional neural network is not well suited to the task. First, the inputs and outputs can have different lengths from one example to the next, since sentences and the names within them vary widely in length. Second, features learned at one position of the text are not shared across other positions, resulting in subpar performance.
This is why RNNs are very popular with sequential data, such as sentences or music.
Architecture of a recurrent neural network
In an RNN, the first input is fed to a layer of the network, which outputs a prediction. Then, a second input is fed to the next layer, which also outputs a prediction. This time, however, the layer also uses information computed by the previous one. In practice, these “layers” are the same cell applied at successive time steps, with shared weights.
The process is repeated until a final output is obtained.
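As a rough illustration of the process described above, here is a minimal NumPy sketch of a single recurrent step applied to a short sequence. The parameter names (Wax, Waa, Wya, b, by), the sizes, and the tanh/softmax choices are assumptions made for this example and are not taken from the original notebook.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, a_prev, Wax, Waa, Wya, b, by):
    """Compute the new hidden state and the prediction for one time step."""
    a_t = np.tanh(Wax @ x_t + Waa @ a_prev + b)  # mixes current input with past state
    y_t = softmax(Wya @ a_t + by)                # probability of each character
    return a_t, y_t

# Hypothetical sizes: 27 possible characters, 16 hidden units.
n_x, n_a = 27, 16
rng = np.random.default_rng(0)
Wax = rng.normal(scale=0.01, size=(n_a, n_x))
Waa = rng.normal(scale=0.01, size=(n_a, n_a))
Wya = rng.normal(scale=0.01, size=(n_x, n_a))
b = np.zeros((n_a, 1))
by = np.zeros((n_x, 1))

a = np.zeros((n_a, 1))       # initial hidden state
for ix in [4, 9, 14]:        # an arbitrary sequence of character indices
    x = np.zeros((n_x, 1))
    x[ix] = 1.0              # one-hot encoding of the current character
    a, y = rnn_step(x, a, Wax, Waa, Wya, b, by)
```

Note that the same weights are reused at every step; only the hidden state a carries information forward.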
Therefore, a typical RNN looks like this:
This structure allows RNNs to link information from earlier steps to a present step. However, if the RNN is unrolled over a long sequence, so that it effectively has many layers, it will be prone to vanishing or exploding gradients.
Vanishing gradients occur when the gradients become close to 0, so the weights are barely updated anymore. Exploding gradients, on the other hand, occur when the gradients become too large, so the updates overshoot and the optimization fails to converge to a minimum. There are ways to mitigate both problems; we will see one of them, gradient clipping, when we code an RNN.
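To get a feel for why this happens, consider the following simplified sketch. It pushes a gradient backward through many steps by repeatedly multiplying it by the same recurrent weight matrix. The matrix here is a scaled orthogonal matrix, an assumption chosen so that each step changes the gradient norm by exactly the scale factor; the activation function's derivative is ignored for simplicity.

```python
import numpy as np

def backprop_norm(scale, steps=50, size=16, seed=0):
    """Norm of a gradient after flowing backward through `steps` time steps."""
    rng = np.random.default_rng(seed)
    # Q is orthogonal, so multiplying by scale * Q rescales the norm by `scale`.
    Q, _ = np.linalg.qr(rng.normal(size=(size, size)))
    W = scale * Q
    grad = np.ones(size)
    for _ in range(steps):
        grad = W.T @ grad
    return np.linalg.norm(grad)

vanishing = backprop_norm(0.9)   # shrinks by a factor of 0.9 at every step
exploding = backprop_norm(1.1)   # grows by a factor of 1.1 at every step
print(vanishing, exploding)
```

Even a factor that is only slightly below or above 1 compounds quickly over 50 steps, which is why long sequences are difficult for a plain RNN.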
Coding an RNN to generate new dinosaur names
Now, we will use Python to write an RNN that will generate new dinosaur names for us. We will learn how to:
- Store text data to process it with an RNN
- Synthesize data
- Generate text
- Clip the gradient to avoid exploding gradients
Our RNN will look like this:
Of course, the full notebook is available if you want to code along. You will also find all the necessary utilities to replicate the results.
Step 1: Import libraries and explore dataset
We start off by importing the necessary libraries and utilities, and we take a look at the data we will work with.
You should see that we have quite a few dinosaur names and 27 unique characters: the 26 letters of the alphabet plus the special character “\n”, which marks a new line. This will be our end-of-sentence (EOS) character.
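One possible way to store these characters for an RNN is to map each of them to an integer index. The sketch below builds the 27-character vocabulary directly from the alphabet rather than from the dataset file, and the encode and decode helpers are hypothetical names introduced for illustration; only char_to_ix and ix_to_char appear later in this post.

```python
import string

# 26 lowercase letters plus the newline character used as EOS.
chars = sorted("\n" + string.ascii_lowercase)
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

def encode(name):
    """Turn a name into a list of indices, ending with the EOS index."""
    return [char_to_ix[ch] for ch in name.lower() + "\n"]

def decode(indices):
    """Turn a list of indices back into text."""
    return "".join(ix_to_char[i] for i in indices)

print(len(char_to_ix))   # 27
print(encode("Rex"))     # [18, 5, 24, 0]
```

Because “\n” sorts before the letters, it receives index 0, and the letters a to z receive indices 1 to 26.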
Step 2: Define the gradient clipping function
As mentioned above, RNNs are prone to vanishing or exploding gradients, so we will implement gradient clipping to prevent the gradients from “exploding”.
Gradient clipping is very simple: if the magnitude of a calculated gradient exceeds a certain threshold, we scale it back to the threshold. Otherwise, we leave it as is. This keeps the updates bounded and makes it much more likely that the optimization will converge.
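A minimal sketch of this idea, assuming the gradients are stored in a dictionary of NumPy arrays and that clipping is applied element by element, could look like the following. The dictionary keys and the threshold of 5 are illustrative choices. Clipping by the overall gradient norm is a common alternative that is not shown here.

```python
import numpy as np

def clip(gradients, max_value):
    """Return a copy of the gradients with every entry limited to [-max_value, max_value]."""
    return {name: np.clip(grad, -max_value, max_value)
            for name, grad in gradients.items()}

# Hypothetical gradients with a few very large entries.
gradients = {
    "dWaa": np.array([[10.0, -0.5], [-12.0, 3.0]]),
    "db": np.array([[7.0], [-1.0]]),
}
clipped = clip(gradients, max_value=5.0)
print(clipped["dWaa"])   # entries beyond +/-5 are set to +/-5, others unchanged
```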
Step 3: Define the sampling function
Once our model is trained, we will need a function that picks characters according to the probabilities the model predicts, so that it generates a plausible dinosaur name.
In other words, it will need to sample one character at a time from the vocabulary until it has generated a word.
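The sketch below shows one way such a sampling function might be written. It assumes a parameter dictionary with the keys Wax, Waa, Wya, b, and by, a tanh hidden layer, and a softmax output; none of these details are confirmed by the original notebook. The weights in the example are random, so the output is gibberish until the model is trained.

```python
import string
import numpy as np

def sample(params, char_to_ix, rng, max_length=50):
    """Sample character indices one at a time until EOS or max_length."""
    Wax, Waa, Wya = params["Wax"], params["Waa"], params["Wya"]
    b, by = params["b"], params["by"]
    n_a, n_x = Wax.shape
    x = np.zeros((n_x, 1))   # the first input is a vector of zeros
    a = np.zeros((n_a, 1))   # initial hidden state
    eos = char_to_ix["\n"]
    indices = []
    while len(indices) < max_length:
        a = np.tanh(Wax @ x + Waa @ a + b)
        z = Wya @ a + by
        p = np.exp(z - z.max())
        p = p / p.sum()
        # Draw the next character according to the predicted probabilities.
        ix = int(rng.choice(n_x, p=p.ravel()))
        indices.append(ix)
        if ix == eos:
            break
        x = np.zeros((n_x, 1))
        x[ix] = 1.0          # feed the sampled character back in as the next input
    return indices

chars = sorted("\n" + string.ascii_lowercase)
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

rng = np.random.default_rng(0)
n_x, n_a = len(chars), 16
params = {
    "Wax": rng.normal(scale=0.01, size=(n_a, n_x)),
    "Waa": rng.normal(scale=0.01, size=(n_a, n_a)),
    "Wya": rng.normal(scale=0.01, size=(n_x, n_a)),
    "b": np.zeros((n_a, 1)),
    "by": np.zeros((n_x, 1)),
}
indices = sample(params, char_to_ix, rng)
name = "".join(ix_to_char[i] for i in indices)
```

Sampling from the distribution, rather than always taking the single most likely character, is what lets the model produce a different name on each call.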
Step 4: Build the model
Now, we are ready to build the model. First, we implement a function that performs one step of stochastic gradient descent with gradient clipping:
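For readers without the notebook at hand, here is a compact sketch of what one such step could involve: a forward pass over a sequence, backpropagation through time, element-wise gradient clipping, and a parameter update. The function name sgd_step, the parameter names, the learning rate, and the clipping threshold are all assumptions made for this example, not the notebook's actual code.

```python
import numpy as np

def sgd_step(inputs, targets, a_prev, params, learning_rate=0.01, max_value=5.0):
    """Forward pass, backward pass, clipping, and update for one sequence."""
    Wax, Waa, Wya = params["Wax"], params["Waa"], params["Wya"]
    b, by = params["b"], params["by"]
    n_x = Wax.shape[1]
    xs, ps, a = {}, {}, {-1: a_prev}
    loss = 0.0

    # Forward pass: accumulate the cross-entropy loss over the sequence.
    for t, ix in enumerate(inputs):
        xs[t] = np.zeros((n_x, 1))
        xs[t][ix] = 1.0
        a[t] = np.tanh(Wax @ xs[t] + Waa @ a[t - 1] + b)
        z = Wya @ a[t] + by
        e = np.exp(z - z.max())
        ps[t] = e / e.sum()
        loss -= np.log(ps[t][targets[t], 0])

    # Backward pass: backpropagation through time.
    grads = {k: np.zeros_like(v) for k, v in params.items()}
    da_next = np.zeros_like(a_prev)
    for t in reversed(range(len(inputs))):
        dz = ps[t].copy()
        dz[targets[t]] -= 1.0              # gradient of softmax + cross-entropy
        grads["Wya"] += dz @ a[t].T
        grads["by"] += dz
        da = Wya.T @ dz + da_next
        dpre = (1.0 - a[t] ** 2) * da      # back through tanh
        grads["Wax"] += dpre @ xs[t].T
        grads["Waa"] += dpre @ a[t - 1].T
        grads["b"] += dpre
        da_next = Waa.T @ dpre

    # Clip every gradient entry, then update the parameters in place.
    for k in params:
        np.clip(grads[k], -max_value, max_value, out=grads[k])
        params[k] -= learning_rate * grads[k]
    return loss, a[len(inputs) - 1]

# Toy usage: repeatedly fit one short sequence of character indices.
rng = np.random.default_rng(0)
n_x, n_a = 5, 8
params = {
    "Wax": rng.normal(scale=0.1, size=(n_a, n_x)),
    "Waa": rng.normal(scale=0.1, size=(n_a, n_a)),
    "Wya": rng.normal(scale=0.1, size=(n_x, n_a)),
    "b": np.zeros((n_a, 1)),
    "by": np.zeros((n_x, 1)),
}
inputs, targets = [0, 1, 2, 1], [1, 2, 1, 3]   # targets are the next characters
losses = []
for _ in range(200):
    loss, a_last = sgd_step(inputs, targets, np.zeros((n_a, 1)), params, learning_rate=0.05)
    losses.append(loss)
```

On this toy sequence the loss should fall as the steps accumulate, which is a quick sanity check that the gradients point in the right direction.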
Perfect! Now we can write and train our model:
Now, we simply need to run the cell below to get new dinosaur names:
parameters = model(data, ix_to_char, char_to_ix)
You should see that the algorithm is getting better at generating plausible dinosaur names as training progresses. A few examples that I got are:
If you have more time and computation power, feel free to tune the model and train it for longer, as it should generate better names.
That’s it! You learned the basics of RNNs and applied them to generate new dinosaur names!
In a future post, we will discuss long short-term memory (LSTM) units, another type of recurrent network, and we will use them to generate jazz music.