Zebra Detector — Your third end-to-end CNN in 5 minutes

Original article was published by Vidisha Jitani on Artificial Intelligence on Medium


Shortcoming #1 — Output Layer

Some quick questions!

Think, think! It's no rocket science!! 🤔
Ok, answer time!!

We had 10 output nodes earlier because only 10 labels were possible (0–9). Now, if we want the position along with the label, what should we do?

Imagine a hypothetical rectangle (Centre: (x, y), Width: w, Height: h) that encloses the digit. Now we just need to output the coordinates of that rectangle and, hurray, we have the localization info as well. That's all! So, as you can see, we have only added a few more values to the output layer to get the position.

Image by Author

Previous Output: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
New Output: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, x, y, h, w]

So, the previous output here is the one-hot encoding of the label: only index 3 is 1 and every other entry is 0, meaning the digit is a 3. The object detector just adds 4 more values to tell us the position as well. And voila! Backpropagation works exactly as before, since the network simply has 4 more outputs.
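To make the new output layout concrete, here is a minimal sketch in Python with NumPy of how such a 14-value vector could be built and split apart again. The helper names `encode_target` and `decode_output` are hypothetical, and the box values (normalized to the 0 to 1 range) are made up for illustration; the article does not specify either.

```python
import numpy as np

NUM_CLASSES = 10  # digits 0-9, as in the article


def encode_target(label, x, y, h, w):
    """Build the 14-value vector: one-hot label followed by x, y, h, w."""
    one_hot = np.zeros(NUM_CLASSES, dtype=np.float32)
    one_hot[label] = 1.0
    box = np.array([x, y, h, w], dtype=np.float32)
    return np.concatenate([one_hot, box])


def decode_output(output):
    """Split a 14-value vector back into (label, box)."""
    label = int(np.argmax(output[:NUM_CLASSES]))  # index of the largest class score
    box = output[NUM_CLASSES:]                    # the last 4 values: x, y, h, w
    return label, box


# Illustrative values only: a digit "3" centred in the image.
target = encode_target(label=3, x=0.5, y=0.5, h=0.4, w=0.3)
label, box = decode_output(target)
print(target)
print(label, box)
```

The first 10 entries stay a classification output and the last 4 are plain numbers, so the only change to the network itself is the size of its final layer.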