Finishing off RNNs


Hello everyone! This is my twelfth post on my journey of completing the Deep Learning Nanodegree in a month! I’m finally done with the fourth of the degree’s six modules. Today, we will wrap up our discussion of RNNs.

Day 17

In this post, I will continue the discussion from my last one; I’ll link the previous post at the end.

Sub Sampling

We use this to get rid of very frequently occurring words that carry little information about the previous or next word, like ‘the’, ‘of’, and so on. Such words have almost no effect on what the next word should be, so they don’t help our model perform well: nearly anything can appear next to them, and they have no specific context.

Subsampling
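
A minimal sketch of how this can be done, assuming the corpus has already been converted to a list of word indices called int_words (the function name and the threshold value are my own choices, following the usual word2vec recipe):

import numpy as np
from collections import Counter

def subsample_words(int_words, threshold=1e-5):
    # Frequency of each word in the corpus
    counts = Counter(int_words)
    total = len(int_words)
    freqs = {word: count / total for word, count in counts.items()}

    # Probability of dropping a word grows with its frequency
    p_drop = {word: 1 - np.sqrt(threshold / freqs[word]) for word in counts}

    # Keep each word with probability 1 - p_drop[word]
    return [word for word in int_words if np.random.random() > p_drop[word]]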

Making Batches

The thing to note is that we don’t consider words that are very far from the chosen one, because they have that much less impact on what the current word is; hence, we only keep the nearby ones in view, as in the sketch below.
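
As a rough sketch of that idea (the function name get_target and the randomly sized window are my assumptions, following the standard skip-gram setup), we can sample context words from a window around the chosen word:

import numpy as np

def get_target(words, idx, window_size=5):
    # A randomly sized window means closer words get sampled more often
    r = np.random.randint(1, window_size + 1)
    start = max(0, idx - r)
    stop = idx + r
    # Everything in the window except the chosen word itself
    return words[start:idx] + words[idx + 1:stop + 1]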

Batching data means considering a subset of all the data at a time. This particular implementation is for a character-level RNN.

import numpy as np

def get_batches(array, batch_size, sequence_length):
    # Total number of characters per batch and the number of full batches we can make
    total_batch = batch_size * sequence_length
    no_of_batches = len(array) // total_batch

    # Keep only enough characters to fill complete batches, then reshape into rows
    array = array[:no_of_batches * total_batch]
    array = array.reshape((batch_size, -1))

    # Slide a window of sequence_length across the columns; this window
    # determines which entries we consider for each batch
    for n in range(0, array.shape[1], sequence_length):
        x = array[:, n:n + sequence_length]

        # The targets are the inputs shifted by one step: y[t] = x[t + 1]
        y = np.zeros_like(x)
        try:
            y[:, :-1], y[:, -1] = x[:, 1:], array[:, n + sequence_length]
        except IndexError:
            # If this is the last window, wrap around to the 0th column
            y[:, :-1], y[:, -1] = x[:, 1:], array[:, 0]

        yield x, y
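
A quick check of the generator (the encoded array here is just a placeholder for the integer-encoded text):

batches = get_batches(encoded, batch_size=8, sequence_length=50)
x, y = next(batches)
print(x.shape, y.shape)  # both (8, 50)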

Negative Sampling

For every example we give the network, we train it using the output from the softmax layer. That means for each input, we’re making very small changes to millions of weights even though we only have one true example. This makes training the network very inefficient. We can approximate the loss from the softmax layer by only updating a small subset of all the weights at once. We’ll update the weights for the correct example, but only a small number of incorrect, or noise, examples. This is called “Negative Sampling”.

There are two modifications we need to make. First, since we’re not taking the softmax output over all the words, we’re really only concerned with one output word at a time. Similar to how we use an embedding table to map the input word to the hidden layer, we can now use another embedding table to map the hidden layer to the output word. Now we have two embedding layers, one for input words and one for output words. Secondly, we use a modified loss function where we only care about the true example and a small subset of noise examples.

Negative Sampling
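
A minimal sketch of such a loss in PyTorch (the class name and tensor shapes are my assumptions): input_vectors come from the input embedding table, output_vectors from the output embedding table for the true words, and noise_vectors holds the embeddings of the sampled noise words.

import torch
from torch import nn

class NegativeSamplingLoss(nn.Module):
    def forward(self, input_vectors, output_vectors, noise_vectors):
        batch_size, embed_size = input_vectors.shape

        input_vectors = input_vectors.view(batch_size, embed_size, 1)    # column vectors
        output_vectors = output_vectors.view(batch_size, 1, embed_size)  # row vectors

        # Log-sigmoid of the dot product with the correct output word
        out_loss = torch.bmm(output_vectors, input_vectors).sigmoid().log().squeeze()

        # Log-sigmoid of the negated dot products with the noise words,
        # summed over the noise samples
        noise_loss = torch.bmm(noise_vectors.neg(), input_vectors).sigmoid().log()
        noise_loss = noise_loss.squeeze().sum(1)

        # Negate and average: minimising the loss maximises both terms
        return -(out_loss + noise_loss).mean()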

Padding Features

Let’s say we have converted the array of words into numbers, but we cannot feed them into a neural network yet; we first need to make all the inputs the same size. We pad the shorter ones with zeros and truncate the longer ones. This process is called padding.

# Padding representation
# int_sequences is the list of tokenized inputs, each a list of word indices
features = np.zeros((len(int_sequences), sequence_length), dtype=int)

for i, row in enumerate(int_sequences):
    # Left-pad short sequences with zeros and truncate long ones to sequence_length
    features[i, -len(row):] = np.array(row)[:sequence_length]

Splitting Data

The easiest way I found is:

split_idx = int(len(features)*split_frac)
train_x, remaining_x = features[:split_idx], features[split_idx:]
train_y, remaining_y = encoded_labels[:split_idx], encoded_labels[split_idx:]
test_idx = int(len(remaining_x)*0.5)
val_x, test_x = remaining_x[:test_idx], remaining_x[test_idx:]
val_y, test_y = remaining_y[:test_idx], remaining_y[test_idx:]

Create DataLoaders and batch our training, validation, and test Tensor datasets:

import torch
from torch.utils.data import TensorDataset, DataLoader

train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
train_loader = DataLoader(train_data, batch_size=batch_size)
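
The validation and test sets follow the same pattern (the loader names here are my own):

valid_data = TensorDataset(torch.from_numpy(val_x), torch.from_numpy(val_y))
valid_loader = DataLoader(valid_data, batch_size=batch_size)

test_data = TensorDataset(torch.from_numpy(test_x), torch.from_numpy(test_y))
test_loader = DataLoader(test_data, batch_size=batch_size)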

Creating Lookup Tables

Here, we convert the text into numbers that can be fed into a model.

# Build the vocabulary and two lookup tables:
# int2char maps index -> token, char2int maps token -> index
vocab = set(words)
int2char = dict(enumerate(vocab))
char2int = {token: index for index, token in enumerate(vocab)}

Forward Feeding into Fully Connected Layers

We cannot directly feed the output of an RNN layer into a fully connected layer; we first need to reshape it, using the following syntax.

out = lstm_output.contiguous().view(-1, self.no_of_hidden)
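
For context, here is a sketch of where that reshape sits in a forward pass (the layer names self.lstm and self.fc, and the batch-first shapes, are assumptions):

def forward(self, x, hidden):
    # lstm_output has shape (batch_size, sequence_length, no_of_hidden)
    lstm_output, hidden = self.lstm(x, hidden)

    # Flatten so every time step becomes its own row before the linear layer
    out = lstm_output.contiguous().view(-1, self.no_of_hidden)
    out = self.fc(out)

    return out, hidden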