Introduction to LSTM Units While Playing Jazz

Source: Deep Learning on Medium

Learn about long short-term memory (LSTM) units and apply them to generate jazz music using Keras

Photo by Chris Bair on Unsplash

Long short-term memory (LSTM) units allow a network to learn very long sequences. The LSTM is a more general and robust version of the gated recurrent unit (GRU), which will not be addressed in this post.

In this post, we will learn how an LSTM unit works, and we will apply it to generate some jazz music.

Let’s get to it!

You can call me Al is a great song!

Structure of an LSTM

The main characteristic of an LSTM is the presence of three gates:

  • update gate (also called the input gate)
  • forget gate
  • output gate

Below is a schema of an LSTM unit. The three gates are shown by the presence of the three sigmas:

Schema of an LSTM unit

From the picture above, notice the line running from left to right at the top of the LSTM unit. This line is the cell state: it lets earlier information pass on to later steps in the network, which is why an LSTM unit is so good at remembering long sequences. Consequently, a model built from LSTM units can capture longer-range dependencies.
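To make the role of the three gates and of the top line concrete, here is a minimal NumPy sketch of a single LSTM step. The function name `lstm_step`, the parameter names (`Wu`, `Wf`, `Wo`, `Wc` and their biases), and the sizes in the example are assumptions made for illustration; they are not taken from the notebook.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, a_prev, c_prev, p):
    """One forward step of an LSTM unit.

    x:      input at this step, shape (n_x,)
    a_prev: previous activation (hidden state), shape (n_a,)
    c_prev: previous cell state, shape (n_a,)
    p:      dict of weights, each (n_a, n_a + n_x), and biases, each (n_a,)
    """
    concat = np.concatenate([a_prev, x])

    gate_u = sigmoid(p["Wu"] @ concat + p["bu"])   # update gate
    gate_f = sigmoid(p["Wf"] @ concat + p["bf"])   # forget gate
    gate_o = sigmoid(p["Wo"] @ concat + p["bo"])   # output gate
    c_cand = np.tanh(p["Wc"] @ concat + p["bc"])   # candidate cell state

    # The line at the top of the unit: old memory scaled by the forget gate,
    # plus candidate memory scaled by the update gate.
    c_next = gate_f * c_prev + gate_u * c_cand
    a_next = gate_o * np.tanh(c_next)
    return a_next, c_next


# Example with arbitrary sizes and random weights (illustration only).
rng = np.random.default_rng(0)
n_x, n_a = 5, 4
params = {}
for name in ("u", "f", "o", "c"):
    params["W" + name] = rng.normal(scale=0.1, size=(n_a, n_a + n_x))
    params["b" + name] = np.zeros(n_a)

a, c = np.zeros(n_a), np.zeros(n_a)
for x in rng.normal(size=(3, n_x)):   # three steps, same weights at every step
    a, c = lstm_step(x, a, c, params)
print(a.shape, c.shape)
```

When the forget gate is close to 1 and the update gate is close to 0, `c_next` is almost equal to `c_prev`, which is how information can travel across many steps unchanged.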

This might seem like there is not a lot to know about LSTM, but understand that this is simply a unit within a recurrent neural network (RNN). The green box above represents one unit of an RNN. Multiple units are then connected, resulting in a full network.

Using LSTMs to generate jazz

Now, we will implement an RNN with LSTM to generate jazz music! We will implement the network using Keras, and it will generate a 15-second clip of jazz.

As always, the full notebook is available for consultation.

Unfortunately, the data used for training was too large to upload to GitHub.


Our model’s architecture will look like this:

RNN architecture to generate jazz music. Source

Basically, the model is fed a musical value and then generates a series of musical values, which together form the music.

Here, a “value” encodes a note, its duration, and whether another note is played at the same time (also called a chord).
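As a rough illustration of how such values can be fed to a network, the sketch below maps each distinct value to an index and one-hot encodes a short sequence. The tuple representation `(pitch, duration, in_chord)` and the example melody are hypothetical; the actual encoding used in the notebook may differ.

```python
import numpy as np

# Hypothetical musical values: (pitch, duration in beats, part of a chord?).
melody = [
    ("C4", 0.5, False),
    ("E4", 0.5, False),
    ("G4", 1.0, True),
    ("E4", 0.5, False),
]

# Map every distinct value to an integer index.
vocab = sorted(set(melody))
value_to_index = {value: i for i, value in enumerate(vocab)}

# One-hot encode the sequence: one row per time step, one column per value.
indices = [value_to_index[v] for v in melody]
one_hot = np.eye(len(vocab))[indices]

print(one_hot.shape)
```

Each row of `one_hot` contains a single 1 marking which value is played at that step, so a softmax output of the same width can be read as a probability for each possible next value.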

Step 1: Initial model setup

First, we define the dimension of the LSTM's hidden state. In this case, we will use 64.

Now, Keras has simple built-in functions to build the model. However, for sequence generation, not all input values are known; they are generated one at a time. Therefore, we need to add a custom for-loop, and define layer objects in order to have shared weights across steps.

Step 2: Build the model

Now, we are ready to build the model:
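A minimal sketch of what this can look like is given here, assuming TensorFlow's bundled Keras. Only the hidden size of 64 comes from the text; `n_values = 78`, `Tx = 30`, and the names `reshaper`, `lstm_cell`, `densor`, and `build_model` are assumptions for illustration. The layer objects are created once, outside the loop, so that every time step reuses the same weights.

```python
from tensorflow.keras.layers import Dense, Input, LSTM, Reshape
from tensorflow.keras.models import Model

n_a = 64        # dimension of the LSTM hidden state (from the text)
n_values = 78   # assumed number of distinct musical values
Tx = 30         # assumed number of time steps per training sequence

# Shared layer objects: created once, called at every time step.
reshaper = Reshape((1, n_values))
lstm_cell = LSTM(n_a, return_state=True)
densor = Dense(n_values, activation="softmax")


def build_model(Tx, n_a, n_values):
    X = Input(shape=(Tx, n_values))
    a0 = Input(shape=(n_a,), name="a0")   # initial activation
    c0 = Input(shape=(n_a,), name="c0")   # initial cell state
    a, c = a0, c0
    outputs = []
    for t in range(Tx):
        x = X[:, t, :]                    # value at time step t
        x = reshaper(x)                   # LSTM expects (batch, steps, features)
        a, _, c = lstm_cell(x, initial_state=[a, c])
        outputs.append(densor(a))         # distribution over the next value
    return Model(inputs=[X, a0, c0], outputs=outputs)


model = build_model(Tx, n_a, n_values)
print(len(model.outputs), len(model.trainable_weights))
```

Although the loop runs `Tx` times, the model only holds one set of LSTM weights and one set of dense weights, because the same layer objects are called at every step.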

Step 3: Create, compile, fit

Now, we can create the model, compile it, and fit it to the data:
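The sketch below shows one way these three steps can fit together, assuming TensorFlow's bundled Keras. It rebuilds a compact shared-weight model so that it runs on its own, and it trains on random placeholder data, since the real training set is not reproduced here. The sizes, the optimizer settings, and all variable names are assumptions.

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input, LSTM, Reshape
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Small assumed sizes so the sketch runs quickly.
m, Tx, n_values, n_a = 8, 5, 12, 64

# Create: shared layers called once per time step.
reshaper = Reshape((1, n_values))
lstm_cell = LSTM(n_a, return_state=True)
densor = Dense(n_values, activation="softmax")
X_in = Input(shape=(Tx, n_values))
a0_in = Input(shape=(n_a,))
c0_in = Input(shape=(n_a,))
a, c, outputs = a0_in, c0_in, []
for t in range(Tx):
    a, _, c = lstm_cell(reshaper(X_in[:, t, :]), initial_state=[a, c])
    outputs.append(densor(a))
model = Model(inputs=[X_in, a0_in, c0_in], outputs=outputs)

# Placeholder data: random one-hot sequences. The target at step t is the
# input value at step t + 1 (the last target wraps around, for simplicity).
rng = np.random.default_rng(0)
idx = rng.integers(0, n_values, size=(m, Tx))
X = np.eye(n_values, dtype="float32")[idx]
shifted = np.roll(X, -1, axis=1)
Y = [shifted[:, t, :] for t in range(Tx)]   # list of Tx arrays, each (m, n_values)
a0 = np.zeros((m, n_a), dtype="float32")
c0 = np.zeros((m, n_a), dtype="float32")

# Compile and fit: one softmax output per step, each with cross-entropy loss.
model.compile(optimizer=Adam(learning_rate=0.01), loss="categorical_crossentropy")
history = model.fit([X, a0, c0], Y, epochs=2, verbose=0)
print(history.history["loss"])
```

Because the model has one output per time step, the targets are passed as a list of `Tx` arrays, and the reported loss is the sum over all steps.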

Step 4: Generate music

Now, we get to the fun part: let’s generate music with our model!

At each sampling step, the activation and cell state from the previous LSTM cell are passed to the next cell, where they are used to generate the next output.

How the output propagates in the network. Source

To generate music, we need to sample notes first:
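The idea can be sketched in plain NumPy: at each step, the most likely value is picked from the softmax output, one-hot encoded, and fed back in as the next input together with the updated activation and cell state. The function `sample_sequence`, the stacked weight layout, and the random untrained weights are assumptions for illustration, so the output is not actual music.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def sample_sequence(W, b, Wy, by, x0, a0, c0, n_steps):
    """Generate n_steps value indices, feeding each prediction back in.

    W, b:   stacked LSTM weights for the gates and candidate,
            shapes (4 * n_a, n_a + n_values) and (4 * n_a,)
    Wy, by: output layer weights, shapes (n_values, n_a) and (n_values,)
    """
    n_a = a0.shape[0]
    n_values = x0.shape[0]
    x, a, c = x0, a0, c0
    picked = []
    for _ in range(n_steps):
        z = W @ np.concatenate([a, x]) + b
        gate_u = sigmoid(z[:n_a])              # update gate
        gate_f = sigmoid(z[n_a:2 * n_a])       # forget gate
        gate_o = sigmoid(z[2 * n_a:3 * n_a])   # output gate
        c_cand = np.tanh(z[3 * n_a:])          # candidate cell state
        c = gate_f * c + gate_u * c_cand       # cell state carried to the next step
        a = gate_o * np.tanh(c)                # activation carried to the next step
        probs = softmax(Wy @ a + by)
        index = int(np.argmax(probs))          # most likely next value
        picked.append(index)
        x = np.eye(n_values)[index]            # one-hot vector, the next input
    return picked


# Random, untrained weights with arbitrary sizes (illustration only).
rng = np.random.default_rng(1)
n_a, n_values = 8, 6
W = rng.normal(scale=0.5, size=(4 * n_a, n_a + n_values))
b = np.zeros(4 * n_a)
Wy = rng.normal(scale=0.5, size=(n_values, n_a))
by = np.zeros(n_values)
sequence = sample_sequence(
    W, b, Wy, by, np.zeros(n_values), np.zeros(n_a), np.zeros(n_a), n_steps=10
)
print(sequence)
```

This sketch always takes the most likely value. Drawing the index at random according to `probs` is a common alternative that gives more varied sequences.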

With that in place, we can now predict and create a sequence of music:
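As a hedged sketch of the post-processing involved, the snippet below turns a stack of per-step probability vectors into value indices and one-hot vectors. The function name `predictions_to_sequence`, the shape `(Ty, 1, n_values)`, and the sizes 50 and 78 are assumptions; random numbers stand in for the output of the inference model.

```python
import numpy as np


def predictions_to_sequence(pred):
    """Turn per-step probability vectors into indices and one-hot vectors.

    pred: array of shape (Ty, 1, n_values), one softmax output per generated step.
    """
    n_values = pred.shape[-1]
    indices = np.argmax(pred, axis=-1)            # shape (Ty, 1)
    one_hot = np.eye(n_values)[indices[:, 0]]     # shape (Ty, n_values)
    return indices, one_hot


# Placeholder predictions standing in for the inference model's output.
rng = np.random.default_rng(0)
pred = rng.random((50, 1, 78))
pred /= pred.sum(axis=-1, keepdims=True)          # each step sums to 1
indices, one_hot = predictions_to_sequence(pred)
print(indices.shape, one_hot.shape)
```

The indices can then be mapped back to musical values with the same lookup table that was used to encode the training data.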

The music is then generated with this code cell:

out_stream = generate_music(inference_model)

You can find my generated music file (.midi file) in the repository.

That’s it! You learned what an LSTM unit is and how to apply it to generate music with Keras. Remember that an LSTM unit is just a block used in an RNN.

For further reading, I suggest you look up the GRU, as it is a simpler version of an LSTM.