Original article was published on Deep Learning on Medium
Triplet Network and Triplet Loss
A triplet network is an extension of the siamese network. As the name implies, it takes three input images: an anchor sample, a positive sample, and a negative sample. An anchor sample is picked first; a positive sample is then picked from the same category as the anchor, and a negative sample is picked from a different category. A triplet network has two advantages over a siamese network: it learns the positive and negative distances simultaneously, and the number of possible training combinations increases, which helps against overfitting.
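The sampling procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sample_triplet`, the toy `labels` list, and the uniform random choice of classes are assumptions made for the example, not details taken from the original article.

```python
import random
from collections import defaultdict


def sample_triplet(labels, rng):
    """Return (anchor, positive, negative) indices into `labels`.

    Hypothetical helper: the anchor and positive share a class,
    the negative comes from a different class.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # The anchor class needs at least two samples so that a positive
    # distinct from the anchor exists.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= 2]
    if not eligible or len(by_class) < 2:
        raise ValueError("need a class with 2+ samples and a second class")

    anchor_class = rng.choice(eligible)
    anchor, positive = rng.sample(by_class[anchor_class], 2)

    # Pick the negative from any other class.
    negative_class = rng.choice([c for c in by_class if c != anchor_class])
    negative = rng.choice(by_class[negative_class])
    return anchor, positive, negative


labels = ["cat", "cat", "dog", "dog", "bird"]
a, p, n = sample_triplet(labels, random.Random(0))
print(labels[a], labels[p], labels[n])
```

Because every anchor can be combined with many positives and many negatives, the number of distinct triplets grows much faster than the number of labeled images, which is the combinatorial effect mentioned above.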
Triplet loss measures the loss over the network's outputs for the three input samples. Conceptually, as shown in Fig. 4, a triplet network learns to decrease the distance between anchor and positive while increasing the distance between anchor and negative (Fig. 4, left), until the anchor-negative distance exceeds the anchor-positive distance by at least alpha, a user-defined margin hyper-parameter (Fig. 4, right). In the loss function of Fig. 4, the first term is the distance between anchor and positive, and the second term is the distance between anchor and negative. Training drives the first term smaller and the second term larger. Once the first term minus the second term is at or below minus alpha, the loss becomes zero and that triplet no longer updates the network parameters.
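As a rough illustration of this behaviour, the loss can be written as max(d(anchor, positive) - d(anchor, negative) + alpha, 0). The sketch below assumes plain Euclidean distance on small embedding vectors and a margin of 0.2; both are assumptions for the example, since the distance function and the value of alpha depend on the setup behind Fig. 4.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Hinge-style triplet loss with margin `alpha`."""
    d_ap = euclidean(anchor, positive)  # first term: anchor-positive distance
    d_an = euclidean(anchor, negative)  # second term: anchor-negative distance
    # Zero once the negative is at least `alpha` farther away than the positive.
    return max(d_ap - d_an + alpha, 0.0)


anchor = [0.0, 0.0]
positive = [0.1, 0.0]

# Easy triplet: the negative is already far away, so the loss is zero.
easy_loss = triplet_loss(anchor, positive, [1.0, 0.0])

# Hard triplet: the negative is almost as close as the positive,
# so the margin is violated and the loss is positive (about 0.1 here).
hard_loss = triplet_loss(anchor, positive, [0.2, 0.0])

print(easy_loss, hard_loss)
```

The easy triplet contributes no gradient, which matches the statement that the parameters are not updated once the margin is satisfied.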
In some real-world applications of triplet loss, producing labeled data is challenging. A training triplet actually has two parts: an anchor-positive pair and an anchor-negative pair. Likewise, the triplet loss has two parts: the loss contributed by the anchor-positive pair and the loss contributed by the anchor-negative pair. Anchor-negative pairs are easy to obtain, while anchor-positive pairs are difficult. As a result, anchor-negative pairs far outnumber anchor-positive pairs, which can lead to overfitting on the anchor-positive pairs. One solution is to control the ratio between the numbers of the two kinds of pairs, for example 1:3. The idea is similar to anomaly detection tasks, in which labeled anomaly data is scarce.
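One possible way to enforce such a ratio is to subsample the more plentiful pairs before training. The sketch below reads 1:3 as one anchor-positive pair to three anchor-negative pairs, which is an assumption, since the article does not say which side of the ratio is which. The function name `balance_pairs` and the toy pair lists are likewise hypothetical.

```python
import random


def balance_pairs(positive_pairs, negative_pairs, negatives_per_positive=3, rng=None):
    """Subsample negative pairs so there are at most
    `negatives_per_positive` of them for each positive pair."""
    rng = rng or random.Random()
    limit = negatives_per_positive * len(positive_pairs)
    if len(negative_pairs) <= limit:
        # Already within the target ratio; keep everything.
        return list(positive_pairs), list(negative_pairs)
    # Draw `limit` negative pairs uniformly without replacement.
    return list(positive_pairs), rng.sample(negative_pairs, limit)


# Toy data: 2 anchor-positive pairs, 50 anchor-negative pairs (index tuples).
pos = [(0, 1), (2, 3)]
neg = [(i, i + 100) for i in range(50)]

kept_pos, kept_neg = balance_pairs(pos, neg, negatives_per_positive=3, rng=random.Random(0))
print(len(kept_pos), len(kept_neg))
```

Uniform subsampling is the simplest choice; other strategies, such as re-drawing the negative subset every epoch, would fit the same interface.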