Examining the Weights and Biases of an LSTM in TensorFlow 2

Source: Deep Learning on Medium

source: https://dnacentre.co.uk/wp-content/uploads/2018/04/blog-uk-genetic-engineering.png

First, what is an LSTM? LSTM stands for Long Short-Term Memory. It is one of the most widely used kinds of Recurrent Neural Network (RNN). An RNN takes sequential data, such as a time series, as input. Each element of the series is fed to the RNN in order, starting from the earliest timestep. The output the RNN produces at one timestep is fed back in as an input together with the data of the next timestep. The final output is produced once the RNN has processed the last timestep of the data.

Figure: how an RNN processes its input
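The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration of a plain RNN cell, not the LSTM itself: the weight names `W_x`, `W_h`, and `b` are made up for the example, the values are random, and the sizes (18 timesteps, 7 features, 2 hidden units) are chosen only to match the example used later in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

timesteps, features, units = 18, 7, 2
x = rng.normal(size=(timesteps, features))  # one made-up input sequence

# Hypothetical, randomly initialised parameters of a plain RNN cell
W_x = rng.normal(size=(features, units))  # input -> hidden
W_h = rng.normal(size=(units, units))     # previous hidden -> hidden
b = np.zeros(units)

h = np.zeros(units)  # hidden state before the first timestep
for t in range(timesteps):
    # The previous output h is fed back in together with the current input x[t]
    h = np.tanh(x[t] @ W_x + h @ W_h + b)

print(h.shape)  # (2,)
```

Only the value of `h` after the last timestep is kept here, which corresponds to the "final output" mentioned above.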

Figure: the structure of an LSTM cell

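At every timestep, an LSTM cell uses 4 layers of neurons. Together these layers form three gates (the forget gate, the input gate, and the output gate) plus a candidate layer that proposes new values for the cell state. The forget gate decides what to drop from the cell state, the input gate decides how much of the candidate values to write, and the output gate decides which part of the cell state becomes the hidden state. That is the LSTM in brief. I will not cover it in detail because it would make for a very long post and it is not my focus this time.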
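For readers who want to see the four layers as arithmetic, here is a minimal NumPy sketch of one LSTM timestep. It follows the common textbook formulation (as in the colah post listed in the references), where the four layers are the forget gate, the input gate, the candidate layer, and the output gate. The function `lstm_step`, the `params` dictionary, and the random weights are assumptions made for illustration; this is not TensorFlow's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM timestep; p maps each layer name to its (W, U, b)."""
    def layer(name, activation):
        W, U, b = p[name]
        return activation(x_t @ W + h_prev @ U + b)

    f = layer("forget", sigmoid)      # what to keep from the old cell state
    i = layer("input", sigmoid)       # how much of the candidate to write
    g = layer("candidate", np.tanh)   # proposed new cell values
    o = layer("output", sigmoid)      # which part of the cell state to expose

    c_t = f * c_prev + i * g          # updated cell state
    h_t = o * np.tanh(c_t)            # updated hidden state (the cell's output)
    return h_t, c_t

features, units = 7, 2
rng = np.random.default_rng(0)
# Random stand-in weights: input weights W, recurrent weights U, bias b per layer
params = {
    name: (rng.normal(size=(features, units)),
           rng.normal(size=(units, units)),
           np.zeros(units))
    for name in ("forget", "input", "candidate", "output")
}

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(18, features)):
    h, c = lstm_step(x_t, h, c, params)

print(h)  # two numbers: the hidden state after the last timestep
```

Each of the four layers has its own input weights, recurrent weights, and biases, which is why the parameter shapes discussed later all contain a factor of 4.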

Long story short, for a recent experiment I needed to retrieve the weights and biases of my LSTM, which I built using TensorFlow 2. In TensorFlow 2, we can inspect the structure of an LSTM’s weights and biases using the code below.

import tensorflow as tf

simple_lstm_model = tf.keras.models.Sequential()
# Input: 18 timesteps, 7 features per timestep
simple_lstm_model.add(tf.keras.Input((18,7)))
# LSTM layer with 2 units
simple_lstm_model.add(tf.keras.layers.LSTM(2))
print(simple_lstm_model.layers[0].trainable_weights)

In the code above, I build an LSTM that takes input with shape 18 x 7. 18 is the number of timesteps and 7 is the number of features (parameters) per timestep, so at every timestep the LSTM takes 7 values. I declare this LSTM with 2 units, which means its hidden state has 2 elements. The hidden state is the output of the LSTM at each timestep, so our LSTM will output 2 real numbers at the end. It also means that each of the 4 layers of neurons in the LSTM has 2 neurons. When you run this script, you will get output like this.

[<tf.Variable 'lstm/kernel:0' shape=(7, 8) dtype=float32>, <tf.Variable 'lstm/recurrent_kernel:0' shape=(2, 8) dtype=float32>, <tf.Variable 'lstm/bias:0' shape=(8,) dtype=float32>]

So here we have 3 sets of parameters: lstm/kernel (with shape 7×8), lstm/recurrent_kernel (with shape 2×8), and lstm/bias (with shape 8). The last one (lstm/bias) is the obvious one: it is the bias. Why do we have 8 of them? Remember, the LSTM has 4 layers of neurons and I declared this LSTM with 2 units, that is, 2 neurons in every layer. Every neuron has 1 bias, and there are 2 neurons in each of the 4 layers, so the total is 8.

The confusing part is lstm/kernel and lstm/recurrent_kernel. Kernel here means weight. lstm/kernel holds the weights between our input and every neuron in the LSTM. I declared the input with 18 timesteps and 7 features, and every feature has 1 weight for each of the 8 neurons, which is why lstm/kernel has shape 7×8. The same weights are reused at every timestep, so the 18 timesteps do not appear in the shape. Finally, lstm/recurrent_kernel holds the weights between the hidden state, a.k.a. the output of the LSTM at the previous timestep (t-1), and every neuron in the LSTM. The hidden state has 2 elements and the LSTM has 8 neurons in total, so the shape of these weights must be 2 x 8.
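To make the "8 = 4 layers × 2 neurons" reading concrete, the sketch below splits arrays of the same shapes back into their four per-layer blocks. The arrays here are zero-filled stand-ins, not weights from a trained model. Keras concatenates the blocks along the last axis in the order input gate, forget gate, cell candidate, output gate; the variable names `k_i`, `r_f`, and so on are just labels chosen for this example.

```python
import numpy as np

features, units = 7, 2

# Stand-in arrays with the same shapes as the three LSTM variables
kernel = np.zeros((features, 4 * units))         # lstm/kernel
recurrent_kernel = np.zeros((units, 4 * units))  # lstm/recurrent_kernel
bias = np.zeros(4 * units)                       # lstm/bias

# Block order along the last axis: input, forget, cell candidate, output
k_i, k_f, k_c, k_o = np.split(kernel, 4, axis=1)
r_i, r_f, r_c, r_o = np.split(recurrent_kernel, 4, axis=1)
b_i, b_f, b_c, b_o = np.split(bias, 4)

print(k_f.shape, r_f.shape, b_f.shape)  # (7, 2) (2, 2) (2,)
```

With a real model, the same split should work on the NumPy arrays returned by the layer's `get_weights()` method in place of the stand-ins.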

And that is the explanation of the weights and biases of an LSTM in TensorFlow 2. You can also see the structure of the entire neural network using the summary() method, like below.

import tensorflow as tf

simple_lstm_model = tf.keras.models.Sequential()
simple_lstm_model.add(tf.keras.Input((18,7)))
simple_lstm_model.add(tf.keras.layers.LSTM(2))
simple_lstm_model.add(tf.keras.layers.Dense(5))
simple_lstm_model.summary()

And the output is

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 2)                 80
_________________________________________________________________
dense (Dense)                (None, 5)                 15
=================================================================
Total params: 95
Trainable params: 95
Non-trainable params: 0

Here I stack the previous LSTM with an ordinary fully connected (Dense) layer. The final output of this network has 5 elements. The total number of weights and biases in this stacked neural network is 95: 80 in the LSTM and 15 in the Dense layer.
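The parameter counts in the summary can be checked by hand. The two helper functions below are hypothetical names written for this post, not part of TensorFlow; they simply apply the counting logic described earlier (4 layers in the LSTM, each with input weights, recurrent weights, and one bias per neuron).

```python
def lstm_param_count(input_dim, units):
    # 4 layers, each with input weights, recurrent weights, and one bias per neuron
    return 4 * (input_dim * units + units * units + units)

def dense_param_count(input_dim, units):
    # One weight per input per neuron, plus one bias per neuron
    return input_dim * units + units

lstm_params = lstm_param_count(input_dim=7, units=2)    # 4 * (14 + 4 + 2)
dense_params = dense_param_count(input_dim=2, units=5)  # 2 * 5 + 5
print(lstm_params, dense_params, lstm_params + dense_params)  # 80 15 95
```

The Dense layer's input size is 2 because it receives the 2-element hidden state produced by the LSTM.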

That’s all, see you in the next post.

References:

https://medium.com/ai-journal/lstm-gru-recurrent-neural-networks-81fe2bcdf1f9, accessed on 15 December 2019

https://colah.github.io/posts/2015-08-Understanding-LSTMs/, accessed on 15 December 2019