Transfer Learning using MXNet (Parts 5+6) — Training a Base Model & Extracting Embeddings

Original article can be found here (source): Deep Learning on Medium

Part 6: Extracting Embeddings from the Network

The above network has learned multiple things: how to represent each English word as a high-dimensional vector (through an embedding layer), how to model sentences in the English language (through the GRU), and how to classify a given review as positive or negative (through the final dense layer).

Using t-SNE, we can project high-dimensional vectors into 2D/3D space. Let’s see what kind of embeddings our network has learned for English words, and whether there is any pattern in them that later helped the network classify sentences as positive or negative.

Our network learned 64-dimensional embedding representations (from the embedding layer) for 63,167 unique words. Visualizing the embeddings for all of them would be far too cluttered, so let’s look at 100 example words and their t-SNE mapping from 64D to 2D.

Visualization of the learned embeddings

From the plot, we can clearly see that from the embedding layer itself, the network started to learn how to differentiate negative words from positive words. The positive words are clustered together in the green circle area (excellent, loved, fun, perfect, wonderful, amazing, etc.), and the negative words are clustered together in the red circle area (bad, mess, pointless, disappointment, poorly, terrible, etc.).

The code for generating the t-SNE plot is pretty straightforward, using the sklearn.manifold module:
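A minimal sketch of that step is below. The random `embeddings` matrix and the `words` list are stand-ins for the real 64-D vectors and vocabulary sample from the trained network; the t-SNE call itself (`sklearn.manifold.TSNE`) and the annotation loop reflect the standard scikit-learn/matplotlib pattern.

```python
# Hedged sketch: replace the stand-in `embeddings` and `words` with the
# trained network's embedding rows and the corresponding vocabulary words.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 64))     # stand-in for 100 real 64-D word vectors
words = [f"word_{i}" for i in range(100)]   # stand-in for the 100 sampled words

# Map 64-D vectors down to 2-D; perplexity must stay below the sample count.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
coords = tsne.fit_transform(embeddings)     # shape (100, 2)

plt.figure(figsize=(10, 10))
plt.scatter(coords[:, 0], coords[:, 1], s=10)
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y), fontsize=8)     # label each point with its word
plt.savefig("tsne_embeddings.png")
```

With the real embeddings plugged in, clusters like the positive and negative word groups described above become visible in the saved plot.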

What’s Next

Coming up next is the last and the most important part of this tutorial series. So far, we’ve completed all the required steps to train a neural network on the IMDB movie reviews dataset. We’re also happy with our network’s performance, and the word vector representations also make sense for the task at hand.

Now, the final task in this tutorial series is to take the trained network’s weights and apply them to a similar task in another domain. The question we’re trying to answer is whether the knowledge gained by reading movie reviews is transferable to reviews in other domains, such as hotels or Amazon products. This transfer of knowledge can save hours of training time and can be very useful for businesses that lack labeled data.