Original article was published on Deep Learning on Medium
The prediction is a probability vector, meaning it represents predicted probabilities of all classes, summing up to 1.
In a neural network, you typically get this prediction by applying a softmax activation to the last layer, but any method works as long as the output is a valid probability vector.
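As a minimal sketch of how softmax turns raw scores into a probability vector, the snippet below uses only the standard library. The function name `softmax` and the example scores are assumptions made for illustration, not values from the article.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into a probability vector."""
    # Subtracting the maximum keeps exp() from overflowing and
    # does not change the result.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the classes dog, cat, and panda.
logits = [2.0, 1.0, 0.1]
prediction = softmax(logits)

# Every entry lies between 0 and 1, and the entries sum to 1
# (up to floating-point rounding).
print(prediction)
print(sum(prediction))
```

Larger scores map to larger probabilities, so the ordering of the classes is preserved.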
Let’s compute the cross-entropy loss for this image.
Loss measures how far the model's predictions are from the targets. The lower, the better. During training, the model is adjusted to make the loss as low as possible.
The target represents the desired probabilities for all classes: dog, cat, and panda.
The target for multi-class classification is a one-hot vector, meaning it has a 1 in a single position and 0s everywhere else.
For the dog class, we want the probability to be 1. For other classes, we want it to be 0.
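A one-hot target for the dog class could be built as in the sketch below. The helper `one_hot` and the class ordering in `CLASSES` are assumptions chosen for illustration.

```python
# Assumed class order; the position of each class in the vector follows this list.
CLASSES = ["dog", "cat", "panda"]


def one_hot(label, classes):
    """Return a list with 1.0 at the label's position and 0.0 elsewhere."""
    if label not in classes:
        raise ValueError(f"unknown class: {label!r}")
    return [1.0 if c == label else 0.0 for c in classes]


target = one_hot("dog", CLASSES)
print(target)  # [1.0, 0.0, 0.0]
```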
We will start by calculating the loss for each class separately and then summing the results. The loss for each separate class is computed like this:
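In the standard definition of cross-entropy, the per-class term can be written as below. The symbols are notation assumed here: $t_i$ is the target probability of class $i$, $p_i$ is the predicted probability of class $i$, and the logarithm is taken to be the natural logarithm.

$$
\text{loss}_i = -\, t_i \log(p_i)
$$

Summing over all classes gives the total:

$$
\text{loss} = -\sum_i t_i \log(p_i)
$$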
Don’t worry too much about the formula; we’ll cover it in a second. Just notice that if the class probability is 0 in the target, the loss for that class is also 0.
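The per-class computation and the final sum can be sketched as follows, assuming the standard per-class term of minus the target probability times the natural log of the predicted probability. The prediction values are hypothetical, and the function names are chosen for illustration.

```python
import math


def per_class_loss(target_prob, predicted_prob):
    """Loss contribution of one class: -t * log(p)."""
    # A class whose target probability is 0 contributes no loss,
    # so its term is returned directly without taking a logarithm.
    if target_prob == 0:
        return 0.0
    return -target_prob * math.log(predicted_prob)


def cross_entropy(target, prediction):
    """Sum the per-class losses over all classes."""
    return sum(per_class_loss(t, p) for t, p in zip(target, prediction))


# One-hot target for dog, with classes ordered dog, cat, panda.
target = [1.0, 0.0, 0.0]
# Hypothetical predicted probabilities (not taken from the article).
prediction = [0.7, 0.2, 0.1]

losses = [per_class_loss(t, p) for t, p in zip(target, prediction)]
total = cross_entropy(target, prediction)

print(losses)  # only the dog entry is non-zero
print(total)   # equals -log(0.7), roughly 0.357
```

With a one-hot target, the total reduces to the negative log of the probability the model assigns to the correct class.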