YOLO V5 — Explained and Demystified



YOLO v5 Model Architecture

As a single-stage object detector, YOLO v5 has the same three important parts as any other single-stage detector:

  1. Model Backbone
  2. Model Neck
  3. Model Head

The Model Backbone is mainly used to extract important features from the given input image. In YOLO v5, CSP (Cross Stage Partial) networks are used as the backbone to extract rich, informative features from the input image.

CSPNet has shown a significant improvement in processing time with deeper networks. Refer to the following image and GitHub repository for more information about CSPNet.

https://github.com/WongKinYiu/CrossStagePartialNetworks

Source: https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/fig/cmp3.png
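To make the idea concrete, here is a minimal sketch of a CSP-style block in PyTorch. It is loosely modeled on the cross-stage split-and-merge pattern used in such backbones, but the layer sizes, module names, and activation slope are illustrative assumptions, not the exact YOLO v5 implementation.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # Standard residual bottleneck: 1x1 conv, 3x3 conv, shortcut add.
    def __init__(self, channels):
        super().__init__()
        self.cv1 = nn.Conv2d(channels, channels, 1, bias=False)
        self.cv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return x + self.act(self.bn(self.cv2(self.cv1(x))))

class CSPBlock(nn.Module):
    # Cross Stage Partial block: split channels into two paths, run the
    # bottlenecks on only one path, then concatenate and fuse both paths.
    def __init__(self, in_ch, out_ch, n=1):
        super().__init__()
        hidden = out_ch // 2
        self.part1 = nn.Conv2d(in_ch, hidden, 1, bias=False)  # transformed path
        self.part2 = nn.Conv2d(in_ch, hidden, 1, bias=False)  # shortcut path
        self.blocks = nn.Sequential(*[Bottleneck(hidden) for _ in range(n)])
        self.fuse = nn.Conv2d(2 * hidden, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        y1 = self.blocks(self.part1(x))  # partial that goes through the stage
        y2 = self.part2(x)               # partial that skips the stage
        return self.act(self.bn(self.fuse(torch.cat([y1, y2], dim=1))))

# Example: one CSP stage on a stride-4 feature map of a 640x640 image.
x = torch.randn(1, 64, 160, 160)
print(CSPBlock(64, 128, n=3)(x).shape)  # torch.Size([1, 128, 160, 160])
```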

The Model Neck is mainly used to generate feature pyramids. Feature pyramids help the model generalize well to object scaling, so it can identify the same object at different sizes and scales.

Feature pyramids are very useful and help models perform well on unseen data. Other models use different feature pyramid techniques such as FPN, BiFPN, and PANet.

In YOLO v5, PANet is used as the neck to build the feature pyramids. For more information on feature pyramids, refer to the following link.
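The sketch below shows the PANet idea schematically: a top-down pass (as in FPN) followed by an extra bottom-up pass over three pyramid levels. The channel counts, module names (TinyPANet, lat4, down3, etc.), and plain 1x1/3x3 fusion convolutions are simplifying assumptions for illustration, not the actual YOLO v5 neck.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPANet(nn.Module):
    # Schematic PANet neck: top-down fusion followed by a bottom-up pass,
    # so shallow (small-object) and deep (large-object) features both mix.
    def __init__(self, ch=(128, 256, 512)):
        super().__init__()
        c3, c4, c5 = ch
        self.lat4 = nn.Conv2d(c5 + c4, c4, 1)                    # top-down, stride 16
        self.lat3 = nn.Conv2d(c4 + c3, c3, 1)                    # top-down, stride 8
        self.down3 = nn.Conv2d(c3, c3, 3, stride=2, padding=1)   # bottom-up
        self.out4 = nn.Conv2d(c3 + c4, c4, 1)
        self.down4 = nn.Conv2d(c4, c4, 3, stride=2, padding=1)
        self.out5 = nn.Conv2d(c4 + c5, c5, 1)

    def forward(self, p3, p4, p5):
        # Top-down: upsample deeper features and merge with shallower ones.
        t4 = self.lat4(torch.cat([F.interpolate(p5, scale_factor=2), p4], 1))
        t3 = self.lat3(torch.cat([F.interpolate(t4, scale_factor=2), p3], 1))
        # Bottom-up: push the refined shallow features back down the pyramid.
        n4 = self.out4(torch.cat([self.down3(t3), t4], 1))
        n5 = self.out5(torch.cat([self.down4(n4), p5], 1))
        return t3, n4, n5  # pyramid levels for small, medium, large objects

p3, p4, p5 = (torch.randn(1, c, s, s) for c, s in [(128, 80), (256, 40), (512, 20)])
for f in TinyPANet()(p3, p4, p5):
    print(f.shape)
```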

The Model Head is mainly used to perform the final detection step. It applies anchor boxes to the feature maps and generates the final output vectors with class probabilities, objectness scores, and bounding boxes.

In YOLO v5, the model head is the same as in the previous YOLO v3 and v4 versions.
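As a rough illustration of what the head produces, the sketch below applies a 1x1 convolution per pyramid level and reshapes the output into per-anchor prediction vectors. The class and anchor counts (80 classes, 3 anchors per level) are assumptions for the example, and this is not the actual YOLO v5 head code.

```python
import torch
import torch.nn as nn

# Assumed numbers for illustration: 80 classes (COCO-like) and 3 anchors per level.
num_classes, num_anchors = 80, 3
out_channels = num_anchors * (5 + num_classes)  # 5 = x, y, w, h, objectness

# The head is essentially a 1x1 convolution per pyramid level that maps the
# neck features to a raw prediction vector for every anchor at every grid cell.
head = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in (128, 256, 512))

features = [torch.randn(1, c, s, s) for c, s in [(128, 80), (256, 40), (512, 20)]]
for conv, f in zip(head, features):
    pred = conv(f)                                   # (1, 255, H, W)
    b, _, h, w = pred.shape
    pred = pred.view(b, num_anchors, 5 + num_classes, h, w)
    print(pred.shape)  # e.g. torch.Size([1, 3, 85, 80, 80]) for the stride-8 level
```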

Additionally, I am attaching the final model architecture of the YOLO v5 small version.

Activation Function

The choice of activation function is crucial in any deep neural network. Many activation functions have been introduced recently, such as Leaky ReLU, Mish, and Swish.

The YOLO v5 authors decided to go with the Leaky ReLU and Sigmoid activation functions.

In YOLO v5, the Leaky ReLU activation function is used in the middle/hidden layers, and the sigmoid activation function is used in the final detection layer. You can verify it here.
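Here is a minimal sketch of how the two activations fit together, assuming a Conv-BN-LeakyReLU hidden block with a 0.1 negative slope and a plain sigmoid applied to the raw detection outputs; both choices are illustrative rather than copied from the repository.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # Typical hidden-layer block: convolution, batch norm, Leaky ReLU.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)  # hidden-layer activation

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# At the detection layer, the raw logits are squashed with a sigmoid so that
# box offsets, objectness, and class scores all land in (0, 1).
raw = torch.randn(1, 3, 85, 80, 80)   # (batch, anchors, 5 + classes, H, W)
decoded = torch.sigmoid(raw)
print(decoded.min().item() >= 0, decoded.max().item() <= 1)  # True True
```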

Optimization Function

For the optimization function in YOLO v5, we have two options:

  1. SGD
  2. Adam

In YOLO v5, the default optimization function for training is SGD.

However, you can change it to Adam by using the "--adam" command-line argument.
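Conceptually, the switch looks something like the sketch below, where build_optimizer is a hypothetical helper and the learning rate and momentum values are placeholder defaults, not the repository's actual hyper-parameters.

```python
import torch

def build_optimizer(model, use_adam=False, lr=0.01, momentum=0.937):
    # Mirrors the idea behind the --adam flag: the same model parameters are
    # handed either to SGD (the default) or to Adam. Values are illustrative.
    params = model.parameters()
    if use_adam:
        return torch.optim.Adam(params, lr=lr, betas=(momentum, 0.999))
    return torch.optim.SGD(params, lr=lr, momentum=momentum, nesterov=True)

model = torch.nn.Linear(10, 2)              # stand-in for the detection model
optimizer = build_optimizer(model, use_adam=False)
print(type(optimizer).__name__)             # SGD
```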

Cost Function or Loss Function

In the YOLO family, a compound loss is calculated from the objectness score, the class probability score, and the bounding box regression score.

Ultralytics used PyTorch's Binary Cross-Entropy with Logits loss function to calculate the loss for the class probabilities and the objectness score.

We also have the option to use the Focal Loss function to calculate the loss. You can choose to train with Focal Loss by setting the fl_gamma hyper-parameter.
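The sketch below shows the idea: a focal-loss wrapper around PyTorch's BCEWithLogitsLoss whose modulating factor is controlled by gamma. This FocalLoss class and its default gamma/alpha values are an illustrative reimplementation, not the exact code in the repository.

```python
import torch
import torch.nn as nn

class FocalLoss(nn.Module):
    # Focal-loss wrapper around BCEWithLogitsLoss: down-weights easy examples
    # by the factor (1 - p_t) ** gamma; enabled in spirit when fl_gamma > 0.
    def __init__(self, gamma=1.5, alpha=0.25):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss(reduction="none")
        self.gamma, self.alpha = gamma, alpha

    def forward(self, logits, targets):
        bce = self.bce(logits, targets)
        p = torch.sigmoid(logits)
        p_t = targets * p + (1 - targets) * (1 - p)           # prob of true class
        alpha_t = targets * self.alpha + (1 - targets) * (1 - self.alpha)
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()

logits = torch.randn(8, 80)                        # raw class scores, 8 predictions
targets = torch.randint(0, 2, (8, 80)).float()
plain = nn.BCEWithLogitsLoss()(logits, targets)    # default class/objectness loss
focal = FocalLoss(gamma=1.5)(logits, targets)      # used when fl_gamma > 0
print(plain.item(), focal.item())
```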

Weights, Biases, Parameters, Gradients, and Final Model Summary

To look closely at weights, biases, shapes, and parameters at each layer in the YOLOv5-small model, refer to the following information.

Source: https://gist.github.com/mihir135/969d78149b724b7684e327a1672da667

Additionally, you can also refer to the following brief summary of the YOLO v5 — small model.

Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients
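A summary line like this can be reproduced for any PyTorch module with a few lines of code; model_info below is a hypothetical helper that counts modules, parameters, and gradient-carrying parameters, shown on a stand-in model rather than YOLO v5 itself.

```python
import torch

def model_info(model):
    # Reproduces the gist of the "Model Summary" line: count modules,
    # total parameters, and parameters that require gradients.
    n_layers = len(list(model.modules()))
    n_params = sum(p.numel() for p in model.parameters())
    n_grads = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Model Summary: {n_layers} layers, {n_params} parameters, {n_grads} gradients")
    # Per-layer view of names, shapes, and parameter counts.
    for name, p in model.named_parameters():
        print(f"{name:30s} shape={tuple(p.shape)} params={p.numel()}")

# Works on any nn.Module; here a tiny stand-in model is used for the demo.
model_info(torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.BatchNorm2d(16)))
```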