Original article was published on Deep Learning on Medium
Multi-Headed Attention Mechanism
Improving the self-attention mechanism
In my last blog post, we discussed Self-Attention. I strongly recommend going through that before tackling the Multi-Headed Attention mechanism. Now, let’s see how Multi-Headed Attention can help.
Say we have a sentence:
“I gave my dog Charlie some food.” As we can see, there are multiple actions going on:
- “I gave” is one action.
- “to my dog Charlie” is a second action.
- “What did I give? (some food)” is a third action.
To keep track of all these actions, we need Multi-Headed Attention.
As you can see in the image above, Multi-Headed Attention is an extension of Self-Attention with multiple heads at the Keys, Queries, and Values blocks, which is why we concatenate the heads’ outputs and pass them through a dense layer to get the final output. This multi-head mechanism is efficient because it performs several attention computations in parallel. Earlier, in the Self-Attention mechanism, a single layer was supposed to capture all the actions going on in the sentence “I gave my dog Charlie some food.” With Multi-Headed Attention, those actions are shared across multiple heads and captured better.
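The split-into-heads, attend-in-parallel, concatenate, then dense-layer flow described above can be sketched in NumPy. This is a minimal illustration, not a production implementation: the weight matrices `W_q`, `W_k`, `W_v`, `W_o` and the function name are my own placeholders, and I assume the model dimension divides evenly by the number of heads.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """Multi-head self-attention over a sequence X of shape (seq_len, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads  # assumes d_model is divisible by num_heads
    # Project the input into Queries, Keys, Values, then split into heads:
    # (seq_len, d_model) -> (num_heads, seq_len, d_head)
    Q = (X @ W_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    K = (X @ W_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    V = (X @ W_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention, computed for all heads in parallel.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                   # (heads, seq, d_head)
    # Concatenate the heads back together and apply the final dense layer.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o

# Toy usage: 7 tokens ("I gave my dog Charlie some food .") with random weights.
rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 16, 4, 7
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads)
print(out.shape)  # (7, 16) — same shape as the input sequence
```

Each head attends over the whole sentence with its own learned projections, which is how one head can focus on “I gave” while another tracks “to my dog Charlie”.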
In my next blog post, we will discuss Transformers, in which Multi-Headed Attention plays a crucial role. Until then, goodbye.