Can we write a sonnet like it’s the Middle Ages?

Original article was published by bored_developer on Deep Learning on Medium

Loading the data

You can get the data to train this poetry generator from here:

Let’s start by loading the data and printing a small sample of it, say the first 300 characters.

data = open('./sonnets.txt').read()
print(data[:300])

The output of the above code block looks something like this:

FROM fairest creatures we desire increase,
That thereby beauty’s rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou, contracted to thine own bright eyes,
Feed’st thy light’st flame with self-substantial fuel,
Making a famine where abundance

Load the packages

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers
import tensorflow.keras.utils as ku
import numpy as np

To check the TensorFlow version, use the following code block:

import tensorflow as tf
print(tf.__version__)

It should output 2.1.0

Tokenizing the training data

You can achieve this using the following code block:

tokenizer = Tokenizer()
corpus = data.lower().split("\n")
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1
print('Total number of words in corpus:', total_words)

This lowercases the training data and splits it into individual lines. fit_on_texts then breaks these lines into words and assigns each unique word an integer index, starting from 1. The resulting total_words is the number of unique words plus one, because index 0 is reserved for padding.
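To make the word index concrete, here is a rough pure-Python approximation of what Tokenizer.fit_on_texts does with its default settings. It is a sketch for illustration, not the Keras implementation: the helper names text_to_words and build_word_index and the two-line toy corpus are hypothetical. It lowercases each line, replaces the default filter characters with spaces, splits on whitespace, and numbers words from 1 in order of decreasing frequency, leaving 0 free for padding.

```python
from collections import Counter

# Default Keras Tokenizer filters (note: the apostrophe is NOT filtered).
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def text_to_words(text):
    """Lowercase, replace filtered characters with spaces, split on whitespace."""
    table = str.maketrans({c: " " for c in FILTERS})
    return text.lower().translate(table).split()


def build_word_index(lines):
    """Hypothetical helper approximating Tokenizer.fit_on_texts.

    Words are numbered from 1 by decreasing frequency. Counter keeps
    first-seen order and sorted() is stable, so ties keep that order.
    """
    counts = Counter()
    for line in lines:
        counts.update(text_to_words(line))
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


corpus = [
    "from fairest creatures we desire increase,",
    "that thereby beauty's rose might never die,",
]
word_index = build_word_index(corpus)
total_words = len(word_index) + 1  # +1 because index 0 is reserved for padding
print(total_words)  # 14
```

Because 0 never appears in the index, adding 1 to the vocabulary size leaves room for the padding value used later.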

Preparing the data for training

This is the most important part of the entire script, and it can be broadly split into four steps. Let’s get into it.

For each line in the text file (training data), we are going to do the following operations:

1. Converting text to sequences.

You can do that using the following:

token_list = tokenizer.texts_to_sequences([line])[0]


Once a line is converted to a sequence, the output looks something like this:

[34, 417, 877, 166, 213, 517]

Each number is the unique word index that the tokenizer assigned to the corresponding word.
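The lookup behind texts_to_sequences can be sketched the same way. This is a simplified, hypothetical helper (text_to_sequence) with a made-up word index, not the Keras code; it mirrors the default behaviour when no oov_token is set, where words missing from the index are simply dropped.

```python
# Default Keras Tokenizer filters (the apostrophe is not among them).
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

# Hypothetical toy word index; the real one comes from fit_on_texts.
WORD_INDEX = {"we": 1, "increase": 2, "from": 3, "desire": 4,
              "creatures": 5, "fairest": 6}


def text_to_sequence(line, word_index):
    """Map each word of a line to its index, skipping unknown words."""
    table = str.maketrans({c: " " for c in FILTERS})
    words = line.lower().translate(table).split()
    return [word_index[w] for w in words if w in word_index]


print(text_to_sequence("From fairest creatures we desire increase,", WORD_INDEX))
# [3, 6, 5, 1, 4, 2]
print(text_to_sequence("We desire roses", WORD_INDEX))
# [1, 4]  ("roses" is not in the index, so it is dropped)
```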

2. Creating the N-gram sequences.

The next step is to create N-gram sequences, which look something like this:

[34, 417]
[34, 417, 877]
[34, 417, 877, 166]
[34, 417, 877, 166, 213]
[34, 417, 877, 166, 213, 517]
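Written out without TensorFlow, this step just collects every prefix of a line’s token list that is at least two tokens long, using the same slicing as the combined code later in this section. A line of n tokens therefore yields n - 1 sequences, and a one-word line yields none. The function name ngram_sequences is hypothetical.

```python
def ngram_sequences(token_list):
    """Return every prefix of token_list that has at least two tokens."""
    return [token_list[:i + 1] for i in range(1, len(token_list))]


for seq in ngram_sequences([34, 417, 877, 166, 213, 517]):
    print(seq)
```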


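3. Finding the maximum sequence length and padding the rest.

Find the length of the longest sequence, then pad every other sequence with zeros to match that length.

NOTE: Remember to use pre-padding (zeros added at the start) for this step. That way the last element of every padded sequence is a real word, which step 4 relies on when it takes the last element as the label.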

You can do this using the following:

pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre')

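The output of this would look something like:

[0, 0, 0, 0, 34, 417]
[0, 0, 0, 34, 417, 877]
[0, 0, 34, 417, 877, 166]
[0, 34, 417, 877, 166, 213]
[34, 417, 877, 166, 213, 517]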
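For illustration, here is a pure-Python stand-in for pad_sequences with padding='pre'. The name pad_pre is hypothetical; in practice you would call pad_sequences itself, which returns a NumPy array. Sequences longer than maxlen lose tokens from the front, matching the pad_sequences default of truncating='pre'; that case does not arise here because maxlen is the longest length.

```python
def pad_pre(sequences, maxlen):
    """Left-pad each sequence with zeros so every row has length maxlen."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # keep the last maxlen tokens
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded


ngrams = [
    [34, 417],
    [34, 417, 877],
    [34, 417, 877, 166],
    [34, 417, 877, 166, 213],
    [34, 417, 877, 166, 213, 517],
]
max_sequence_len = max(len(x) for x in ngrams)
for row in pad_pre(ngrams, max_sequence_len):
    print(row)
```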


4. Creating the predictors and the labels.

This is where the most interesting part comes in: we treat the last element of each padded N-gram sequence as the label and the rest of the sequence as the predictors. For example:

PREDICTORS                LABELS
[0, 0, 0, 0, 34]          417
[0, 0, 0, 34, 417]        877
[0, 0, 34, 417, 877]      166
[0, 34, 417, 877, 166]    213
[34, 417, 877, 166, 213]  517
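The split itself is just slicing off the last column. Below is a pure-Python sketch of it together with a hand-rolled one-hot encoding, which is what ku.to_categorical produces (as float arrays) for the labels. The helper one_hot is hypothetical, and NUM_CLASSES = 3211 is taken from the output layer in the model summary, purely for illustration.

```python
padded = [
    [0, 0, 0, 0, 34, 417],
    [0, 0, 0, 34, 417, 877],
    [0, 0, 34, 417, 877, 166],
    [0, 34, 417, 877, 166, 213],
    [34, 417, 877, 166, 213, 517],
]

# Everything but the last column is input; the last column is the word to predict.
predictors = [row[:-1] for row in padded]
labels = [row[-1] for row in padded]


def one_hot(index, num_classes):
    """Return a list of num_classes zeros with a single 1 at position index."""
    vec = [0] * num_classes
    vec[index] = 1
    return vec


NUM_CLASSES = 3211  # vocabulary size read off the model summary; illustrative
one_hot_labels = [one_hot(i, NUM_CLASSES) for i in labels]

for p, l in zip(predictors, labels):
    print(p, l)
```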

The code for all of the above steps is condensed into the next code block:

# create input sequences using list of tokens
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)
# pad sequences
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
# create predictors and label
predictors, label = input_sequences[:, :-1], input_sequences[:, -1]
label = ku.to_categorical(label, num_classes=total_words)

Defining the model

At this point, it is clear what we are going to predict: the next word. The model behaves like a plain text classifier, where the number of classes equals the vocabulary size (total_words) and the input is the predictors defined above.

# Defining the model.
model = Sequential()
model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
model.add(Bidirectional(LSTM(150, return_sequences=True)))
model.add(Dropout(0.18))
model.add(Bidirectional(LSTM(100)))
model.add(Dense(total_words // 2, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(total_words, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

The summary of this model looks like the following:

Model: "sequential_20"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_20 (Embedding)     (None, 10, 100)           321100
_________________________________________________________________
bidirectional_41 (Bidirectio (None, 10, 300)           301200
_________________________________________________________________
dropout_21 (Dropout)         (None, 10, 300)           0
_________________________________________________________________
bidirectional_42 (Bidirectio (None, 200)               320800
_________________________________________________________________
dense_48 (Dense)             (None, 1605)              322605
_________________________________________________________________
dense_49 (Dense)             (None, 3211)              5156866
=================================================================
Total params: 6,422,571
Trainable params: 6,422,571
Non-trainable params: 0
_________________________________________________________________
None
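The parameter counts in this summary can be reproduced by hand, which is a useful sanity check on the architecture. The sketch below assumes the standard Keras formulas: an Embedding layer stores one vector per word, an LSTM has four gates that each have input weights, recurrent weights and a bias, a Bidirectional wrapper doubles that, and a Dense layer has one weight per input-output pair plus one bias per unit. The vocabulary size of 3211 is read off the last layer, and 1605 is 3211 // 2.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)


def bidirectional_lstm_params(input_dim, units):
    # forward and backward LSTMs have separate weights
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim, units):
    return input_dim * units + units


total_words = 3211
hidden = total_words // 2  # 1605

layers = {
    "embedding": embedding_params(total_words, 100),
    "bilstm_1": bidirectional_lstm_params(100, 150),
    "dropout": 0,  # dropout has no weights
    "bilstm_2": bidirectional_lstm_params(2 * 150, 100),  # input is 150 fwd + 150 bwd
    "dense_relu": dense_params(2 * 100, hidden),
    "dense_softmax": dense_params(hidden, total_words),
}
for name, count in layers.items():
    print(f"{name:15s}{count:>10,}")
print(f"{'total':15s}{sum(layers.values()):>10,}")
```

The arithmetic also shows where the weights go: roughly 80% of all parameters sit in the final softmax layer, because its size grows with the vocabulary.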

Training the model

Now we can actually start training the model. Depending on the hardware it runs on, this can take anywhere from a few minutes to a couple of hours.

history = model.fit(predictors, label, epochs=100, verbose=1)
Sample screenshot from the training of the network.

Visualizing the performance of the model

Here we are going to plot two graphs: accuracy vs. epochs and loss vs. epochs.
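A minimal matplotlib sketch for these two plots follows. It assumes the History object returned by model.fit above, whose history dictionary in TF 2.x uses the keys 'accuracy' and 'loss' when the model is compiled with metrics=['accuracy']; the function name plot_training_curves is hypothetical. Since no validation data is passed to fit, only training curves are available.

```python
import matplotlib.pyplot as plt


def plot_training_curves(history_dict):
    """Plot accuracy vs. epochs and loss vs. epochs side by side.

    history_dict is assumed to be Keras' History.history, e.g.
    {"accuracy": [...], "loss": [...]} with one value per epoch.
    """
    epochs = range(1, len(history_dict["loss"]) + 1)
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))

    ax_acc.plot(epochs, history_dict["accuracy"], "b", label="Training accuracy")
    ax_acc.set_title("Accuracy vs. epochs")
    ax_acc.set_xlabel("Epochs")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.legend()

    ax_loss.plot(epochs, history_dict["loss"], "r", label="Training loss")
    ax_loss.set_title("Loss vs. epochs")
    ax_loss.set_xlabel("Epochs")
    ax_loss.set_ylabel("Loss")
    ax_loss.legend()

    fig.tight_layout()
    return fig
```

With the trained model, you would call plot_training_curves(history.history) followed by plt.show().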