Original article was published by Vidisha Jitani on Artificial Intelligence on Medium
Shortcoming #1 — Output Layer
Some quick questions!
Think think! It's no rocket science!! 🤔
Ok, answer time!!
We had 10 output nodes earlier because only 10 labels were possible (0–9). Now, if we want the position along with the label, what should we do?
Imagine a hypothetical rectangle (Centre: (x, y), Width: w, Height: h) that encloses the digit. Now we just need to output the coordinates of that rectangle and, hurray, we have the localization info as well. That's all! So, as you can see, we have just added a little more information to the output layer to get the position as well.
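To make the enclosing rectangle concrete, here is a minimal sketch of how such a box could be computed for a digit drawn on a tiny pixel grid. The function name `enclosing_box`, the toy `image`, and the convention that pixel `c` spans the interval from `c` to `c + 1` are all assumptions made for illustration; they are not taken from the original article.

```python
def enclosing_box(image):
    """Return (x, y, w, h) of the tightest box around the non-zero pixels.

    `image` is a list of rows. (x, y) is the box centre in pixel
    coordinates; w and h are the box width and height in pixels.
    Assumed convention: pixel index c covers the interval [c, c + 1).
    """
    # Rows and columns that contain at least one "ink" pixel.
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for row in image for c, value in enumerate(row) if value]
    if not rows:
        raise ValueError("image contains no non-zero pixels")

    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)

    w = right - left + 1
    h = bottom - top + 1
    x = left + w / 2  # horizontal centre of the box
    y = top + h / 2   # vertical centre of the box
    return x, y, w, h


# A toy 5x6 "digit": non-zero cells are ink.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box = enclosing_box(image)
print(box)
```

In practice these four numbers usually come from labelled training data rather than from pixel thresholds; the sketch only shows what (x, y, w, h) describe.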
Previous Output: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
New Output: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, x, y, h, w]
So, the previous output here is the one-hot encoding of the label: only index 3 is 1 (the digit is 3) and all the others are 0. The object detector just adds 4 more values to tell us the position as well. And voila! Back-propagation will work just as before, since the network simply has 4 more outputs.
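The 14-value output described above can be sketched in a few lines of plain Python. The helper names `make_target` and `split_output`, and the example box values, are hypothetical and chosen only for illustration; the layout (10 class slots followed by x, y, h, w) follows the "New Output" line.

```python
def make_target(label, x, y, h, w, num_classes=10):
    """Build [one-hot label..., x, y, h, w] as a flat list."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in the range 0..num_classes-1")
    one_hot = [0] * num_classes
    one_hot[label] = 1
    # Append the 4 box values after the class slots.
    return one_hot + [x, y, h, w]


def split_output(output, num_classes=10):
    """Split a flat output back into (predicted label, [x, y, h, w])."""
    class_scores = output[:num_classes]
    box = output[num_classes:]
    # The predicted label is the index of the largest class score.
    label = max(range(num_classes), key=lambda i: class_scores[i])
    return label, box


# Digit 3 with an assumed box; the box numbers are made up for this example.
target = make_target(3, x=0.5, y=0.4, h=0.6, w=0.3)
print(target)
print(split_output(target))
```

The first 10 entries are read exactly as before, and the last 4 are simply read as the box, which is why the rest of the network does not need to change.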