Importance Of Thought Vector In Seq2seq Model

Original article was published by Aditya Mohanty on Deep Learning on Medium

Picture By Alexandru Goman On Unsplash

Sequence-to-sequence (seq2seq) models have been extremely useful for NLP tasks such as question answering and machine translation. At the heart of the classic encoder-decoder seq2seq model is the thought vector, also known as the context vector. Here we discuss why the thought vector is so important.

What Is An Encoder-Decoder Architecture:

Suppose we are working on a machine translation problem. A natural choice here is an encoder-decoder architecture. The first half of this architecture is the encoder, which reads the source text and turns it into a fixed-length, lower-dimensional representation. This representation is known as the thought vector, and it is given as input to the decoder.
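A minimal sketch can make the "fixed-length" idea concrete. It assumes a plain tanh RNN (a simplified stand-in for the LSTM usually used in seq2seq), a hypothetical four-word vocabulary, and random untrained weights. The names `encode`, `EMBED`, `W_xh` and `W_hh` are illustrative only. The point is that sentences of different lengths are compressed into vectors of the same size.

```python
import math
import random

# Toy vocabulary and sizes are illustrative assumptions, not from the article.
VOCAB = {"how": 0, "are": 1, "you": 2, "today": 3}
EMB_DIM, HIDDEN_DIM = 4, 3

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

EMBED = rand_matrix(len(VOCAB), EMB_DIM)    # one embedding row per token id
W_xh = rand_matrix(HIDDEN_DIM, EMB_DIM)     # input-to-hidden weights
W_hh = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)  # hidden-to-hidden weights

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def encode(tokens):
    """Run a tanh RNN over the tokens; the final hidden state is the thought vector."""
    h = [0.0] * HIDDEN_DIM
    for tok in tokens:
        x = EMBED[VOCAB[tok]]
        h = [math.tanh(a + b) for a, b in zip(matvec(W_xh, x), matvec(W_hh, h))]
    return h

v3 = encode(["how", "are", "you"])
v4 = encode(["how", "are", "you", "today"])
print(len(v3), len(v4))  # 3 3
```

In a trained model the weights are learned, but the shape argument is the same: however long the input sentence is, the thought vector has `HIDDEN_DIM` entries.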

As the picture above illustrates, the thought vector serves as the decoder's input. More precisely, it is used as the decoder's initial state. Starting from this state and a special start token, the decoder generates the output sequence one token at a time. During training, the decoder is also fed the expected (ground-truth) previous token at each step, a technique known as teacher forcing.
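The training-time behaviour of the decoder can be sketched under a few illustrative assumptions. The sketch uses a tanh RNN cell instead of an LSTM, a hypothetical French target vocabulary with `<start>` and `<end>` markers, random weights, and a made-up thought vector standing in for real encoder output. It shows two things: the thought vector is used as the initial decoder state, and the ground-truth previous token is fed in at each step (teacher forcing). The function names are hypothetical.

```python
import math
import random

# Hypothetical target vocabulary, sizes and random weights (illustrative assumptions).
VOCAB = ["<start>", "<end>", "je", "vais", "bien"]
IDX = {w: i for i, w in enumerate(VOCAB)}
EMB_DIM, HIDDEN_DIM = 4, 3

rng = random.Random(1)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

EMBED = rand_matrix(len(VOCAB), EMB_DIM)
W_xh = rand_matrix(HIDDEN_DIM, EMB_DIM)
W_hh = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)
W_hy = rand_matrix(len(VOCAB), HIDDEN_DIM)  # hidden-to-vocabulary scores

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def decoder_step(token, h):
    """One decoder step: read a token, update the hidden state, score every word."""
    x = EMBED[IDX[token]]
    h = [math.tanh(a + b) for a, b in zip(matvec(W_xh, x), matvec(W_hh, h))]
    return matvec(W_hy, h), h

def teacher_forced_pass(thought_vector, target):
    """Training-time pass: start from the thought vector and feed the
    ground-truth previous token at every step (teacher forcing)."""
    inputs = ["<start>"] + target   # what the decoder reads
    expected = target + ["<end>"]   # what it should predict at each step
    h = list(thought_vector)        # thought vector = initial decoder state
    steps = []
    for tok_in, tok_out in zip(inputs, expected):
        scores, h = decoder_step(tok_in, h)
        steps.append((tok_in, tok_out, scores))
    return steps

thought = [0.1, -0.2, 0.3]  # made-up stand-in for an encoder output
for tok_in, tok_out, _ in teacher_forced_pass(thought, ["je", "vais", "bien"]):
    print(tok_in, "->", tok_out)  # <start> -> je, je -> vais, vais -> bien, bien -> <end>
```

A real training loop would compare each step's scores with the expected token using a cross-entropy loss. That part is omitted here.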

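At inference time, however, the expected text is not available. The decoder must start from the thought vector and the start token alone, feeding each token it predicts back in as the input for the next step. The mismatch between training, where the decoder is conditioned on ground-truth previous tokens, and inference, where it is conditioned on its own possibly wrong predictions, is known as the exposure bias problem.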
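At inference time the decoding loop changes: there is no expected text, so each prediction is fed back as the next input. The sketch below makes illustrative assumptions: a tanh RNN cell in place of an LSTM, a hypothetical five-word vocabulary, random untrained weights, and a made-up thought vector. It picks the highest-scoring word at each step and stops at `<end>` or after `max_len` tokens. The names `greedy_decode` and `decoder_step` are hypothetical.

```python
import math
import random

# Toy target vocabulary, sizes and random weights are illustrative assumptions.
VOCAB = ["<start>", "<end>", "je", "vais", "bien"]
IDX = {w: i for i, w in enumerate(VOCAB)}
EMB_DIM, HIDDEN_DIM = 4, 3

rng = random.Random(2)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

EMBED = rand_matrix(len(VOCAB), EMB_DIM)
W_xh = rand_matrix(HIDDEN_DIM, EMB_DIM)
W_hh = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)
W_hy = rand_matrix(len(VOCAB), HIDDEN_DIM)

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def decoder_step(token, h):
    """One decoder step: read a token, update the hidden state, score every word."""
    x = EMBED[IDX[token]]
    h = [math.tanh(a + b) for a, b in zip(matvec(W_xh, x), matvec(W_hh, h))]
    return matvec(W_hy, h), h

def greedy_decode(thought_vector, max_len=10):
    """Inference: no ground-truth target exists, so each predicted token is fed
    back as the next input. An early mistake therefore affects later steps."""
    h = list(thought_vector)  # thought vector = initial decoder state
    token, output = "<start>", []
    for _ in range(max_len):
        scores, h = decoder_step(token, h)
        token = VOCAB[max(range(len(scores)), key=lambda i: scores[i])]
        if token == "<end>":
            break
        output.append(token)
    return output

print(greedy_decode([0.1, -0.2, 0.3]))  # untrained weights, so the words are arbitrary
```

Because the weights are untrained, the printed words carry no meaning. What matters is the structure of the loop: if an early prediction is wrong, every later step is conditioned on that mistake, which is the practical effect of exposure bias.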

Importance Of Thought Vector and Seq2seq Model:

Suppose we are tackling a machine translation problem and decide to use a traditional LSTM language model on its own. This would be a poor choice. A single LSTM produces one output per input step, so the source and target sentences would have to be of the same length and aligned word by word. That almost never holds in machine translation, where sentence lengths and word order differ between languages.

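So we need a mapping between the two languages that allows the input and output sequences to have different lengths. The seq2seq model provides exactly this: the encoder consumes the whole source sentence, and the decoder then generates a target sentence of whatever length it needs.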

Here the thought vector is a fixed-size vector representation of the English source sentence. It is passed as input to the decoder, which at every time step picks the most probable next word and outputs it.
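The phrase "most probable word" can be made precise. The decoder's raw scores for one time step are turned into probabilities with a softmax, and the word with the highest probability is chosen. The vocabulary and scores below are hypothetical numbers for a single step, not output from a real model.

```python
import math

def softmax(scores):
    """Turn raw decoder scores into a probability distribution over the vocabulary."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def most_probable_word(vocab, scores):
    """Greedy choice for one time step: the word with the highest probability."""
    probs = softmax(scores)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]

# Hypothetical vocabulary and decoder scores for a single time step.
vocab = ["<end>", "je", "vais", "bien"]
scores = [0.1, 2.0, 0.5, -1.0]
word, p = most_probable_word(vocab, scores)
print(word, round(p, 2))  # je 0.7
```

Always taking the single most probable word is called greedy decoding. Beam search, which keeps several candidate sequences at each step, is a common alternative.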

This was a brief overview of the thought vector in seq2seq models.
