The original article was published on Deep Learning on Medium.
YOLO v5 Model Architecture
Like any other single-stage object detector, YOLO v5 has three important parts:
- Model Backbone
- Model Neck
- Model Head
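The three parts can be pictured as a pipeline. Each stage changes the tensor shapes in a predictable way. The sketch below tracks only those shapes and is not the Ultralytics code. It assumes a 640x640 input, output strides of 8, 16, and 32, 3 anchors per scale, and the 80 COCO classes. The backbone channel widths (128, 256, 512) are illustrative assumptions, and all function names are hypothetical.

```python
# Shape-only sketch of a single-stage detector: backbone -> neck -> head.
# Strides, channel widths, anchor count, and class count are illustrative assumptions.
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)

STRIDES = (8, 16, 32)                # assumed downsampling factors of the three output levels
BACKBONE_CHANNELS = (128, 256, 512)  # hypothetical channel widths


def backbone(height: int, width: int) -> List[Shape]:
    """Return the shapes of three feature maps taken at increasing depth."""
    return [(c, height // s, width // s) for c, s in zip(BACKBONE_CHANNELS, STRIDES)]


def neck(features: List[Shape]) -> List[Shape]:
    """A PANet-style neck mixes information across levels but keeps each level's resolution."""
    return list(features)


def head(features: List[Shape], num_anchors: int = 3, num_classes: int = 80) -> List[Shape]:
    """Map each level to num_anchors * (4 box + 1 objectness + num_classes) channels."""
    out_channels = num_anchors * (5 + num_classes)
    return [(out_channels, h, w) for _, h, w in features]


def detector_output_shapes(height: int, width: int) -> List[Shape]:
    return head(neck(backbone(height, width)))


if __name__ == "__main__":
    shapes = detector_output_shapes(640, 640)
    print(shapes)  # [(255, 80, 80), (255, 40, 40), (255, 20, 20)]
    # Every grid cell predicts one box per anchor, so the raw prediction count is:
    print(sum(3 * h * w for _, h, w in shapes))  # 25200
```

The coarsest level (20x20) covers large objects and the finest level (80x80) covers small ones, which is why the neck and head work on all three levels.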
Model Backbone is mainly used to extract important features from the given input image. In YOLO v5, Cross Stage Partial Networks (CSPNet) are used as the backbone to extract rich, informative features from the input image.
CSPNet has shown a significant improvement in processing time for deeper networks. Refer to the following image and GitHub repository for more information about CSPNet.
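One way to see where CSPNet's savings come from is to count weights. In a cross-stage partial block, the channels are split into two parts. One part goes through the stack of convolutions, the other bypasses it, and a transition layer merges them. The sketch below is a simplified count. It assumes bias-free 3x3 convolutions, an even 50/50 split, and a single 1x1 transition convolution. It ignores the extra 1x1 layers that real CSP blocks (including YOLO v5's) contain, so the numbers are only indicative. The function names are hypothetical.

```python
# Simplified weight count: plain conv stack vs. a cross-stage partial (CSP) version.
# Assumptions: bias-free 3x3 convs, a 50/50 channel split, one 1x1 transition conv.


def conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Number of weights in a bias-free k x k convolution."""
    return c_in * c_out * k * k


def plain_stack_weights(channels: int, depth: int) -> int:
    """`depth` 3x3 convs that all operate on the full channel width."""
    return depth * conv_weights(channels, channels, 3)


def csp_stack_weights(channels: int, depth: int) -> int:
    """Only half of the channels pass through the 3x3 stack; the other half bypasses it.
    A 1x1 transition conv then merges the concatenated halves back to `channels`."""
    half = channels // 2
    stack = depth * conv_weights(half, half, 3)
    transition = conv_weights(channels, channels, 1)
    return stack + transition


def csp_split_merge(channels: list, block) -> list:
    """Data flow of the split: transform the second half, keep the first half untouched,
    then concatenate (a feature map is reduced to a plain list of channel values here)."""
    half = len(channels) // 2
    bypass, processed = channels[:half], channels[half:]
    return bypass + block(processed)


if __name__ == "__main__":
    print(plain_stack_weights(64, 3))  # 110592
    print(csp_stack_weights(64, 3))    # 31744
    print(csp_split_merge([1, 2, 3, 4], lambda part: [2 * v for v in part]))  # [1, 2, 6, 8]
```

Because convolution cost grows with the product of input and output channels, halving the width of the expensive path cuts its weights by roughly a factor of four in this simplified model.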
Model Neck is mainly used to generate feature pyramids. Feature pyramids help models generalize well with respect to object scale: they help identify the same object at different sizes and scales.
In YOLO v5, PANet is used as the neck to get feature pyramids. For more information on feature pyramids, refer to the following link.
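PANet builds on a feature pyramid network. A top-down pass carries coarse, semantically strong features to the finer levels. An extra bottom-up pass then carries fine localization detail back to the coarse levels. The toy sketch below shows only that two-pass data flow on tiny grids, and it is a simplification. Nearest-neighbour upsampling and stride-2 subsampling stand in for the learned layers, and fusion is element-wise addition, whereas YOLO v5's neck uses convolutions and concatenation. All function names are hypothetical.

```python
# Toy PANet-style fusion on 2D grids (lists of lists of floats).
# Assumptions: upsampling = nearest neighbour, downsampling = keep every other cell,
# fusion = element-wise addition (real necks use learned convs and concatenation).
from typing import List

Grid = List[List[float]]


def upsample2x(g: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: repeat every value along both axes."""
    out: Grid = []
    for row in g:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(g: Grid) -> Grid:
    """Stride-2 subsampling, standing in for a stride-2 convolution."""
    return [row[::2] for row in g[::2]]


def add(a: Grid, b: Grid) -> Grid:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def panet_fuse(c3: Grid, c4: Grid, c5: Grid) -> List[Grid]:
    # Top-down path (as in FPN): coarse semantics flow to finer levels.
    p5 = c5
    p4 = add(upsample2x(p5), c4)
    p3 = add(upsample2x(p4), c3)
    # Extra bottom-up path (the PANet addition): fine details flow back up.
    n3 = p3
    n4 = add(downsample2x(n3), p4)
    n5 = add(downsample2x(n4), p5)
    return [n3, n4, n5]


def filled(n: int, value: float = 1.0) -> Grid:
    return [[value] * n for _ in range(n)]


if __name__ == "__main__":
    n3, n4, n5 = panet_fuse(filled(4), filled(2), filled(1))
    print(n3[0], n4[0], n5)  # [3.0, 3.0, 3.0, 3.0] [5.0, 5.0] [[6.0]]
```

Each output level keeps its own resolution, but after both passes every level carries information from every other level.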
Model Head is mainly used to perform the final detection step. It applies anchor boxes to the features and generates the final output vectors with class probabilities, objectness scores, and bounding boxes.
In YOLO v5, the model head is the same as in the previous YOLO v3 and v4 versions.
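Each output vector of the head holds, per anchor, 4 box values, 1 objectness score, and one score per class. The sketch below splits one such raw vector and turns it into a detection confidence. It assumes the common YOLO layout [x, y, w, h, objectness, class scores...]. It applies a sigmoid to the objectness and class logits and scores the best class as objectness times class probability. Box decoding relative to the anchor and grid cell is left out, and the function name `split_prediction` is hypothetical.

```python
# Split one raw head output vector into box, objectness, and class scores.
# Assumed layout: [x, y, w, h, objectness_logit, class_logit_0, ..., class_logit_{nc-1}].
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def split_prediction(vector: List[float], num_classes: int) -> Tuple[List[float], int, float]:
    """Return (raw box values, best class index, confidence = objectness * class probability)."""
    if len(vector) != 5 + num_classes:
        raise ValueError("expected 4 box values, 1 objectness value, and one value per class")
    box = vector[:4]                                # still needs anchor/grid decoding
    objectness = sigmoid(vector[4])                 # probability that the anchor holds an object
    class_probs = [sigmoid(v) for v in vector[5:]]  # independent per-class probabilities
    best = max(range(num_classes), key=lambda i: class_probs[i])
    return box, best, objectness * class_probs[best]


if __name__ == "__main__":
    raw = [0.1, 0.2, 0.3, 0.4, 0.0, -2.0, 3.0, 0.0]  # 3 classes
    box, cls, conf = split_prediction(raw, num_classes=3)
    print(cls, round(conf, 4))  # 1 0.4763
```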
Additionally, I am attaching the final model architecture for the small version of YOLO v5 (YOLO v5-small).
The YOLO v5 authors decided to go with the Leaky ReLU and Sigmoid activation functions.
In YOLO v5, the Leaky ReLU activation function is used in the middle/hidden layers, and the sigmoid activation function is used in the final detection layer. You can verify it here.
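Both activations are simple element-wise functions. The sketch below implements them in plain Python. The negative slope of 0.1 for Leaky ReLU is an assumption for illustration, so check the model code for the exact value.

```python
import math


def leaky_relu(x: float, negative_slope: float = 0.1) -> float:
    """Identity for non-negative inputs, a small linear slope for negative ones,
    so hidden units still pass a gradient when their input is negative."""
    return x if x >= 0 else negative_slope * x


def sigmoid(x: float) -> float:
    """Squashes any real number into [0, 1], so detection outputs can be read as probabilities.
    Written in a numerically stable form so large negative inputs do not overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    print([leaky_relu(v) for v in (-2.0, 0.0, 3.0)])  # [-0.2, 0.0, 3.0]
    print(sigmoid(0.0))  # 0.5
```

The sigmoid suits the final layer because objectness and class scores must be probabilities. Leaky ReLU suits the hidden layers because it is cheap to compute and does not saturate for positive inputs.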
For the optimization function in YOLO v5, we have two options: SGD and Adam.
In YOLO v5, the default optimization function for training is SGD.
However, you can change it to Adam by using the “--adam” command-line argument.
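To show how the two options differ, the sketch below minimizes a one-dimensional toy loss f(w) = (w - 3)^2 using the textbook update rules. It runs SGD with momentum and Adam side by side. The learning rate, momentum, and beta values are illustrative assumptions rather than YOLO v5's training hyperparameters. The functions are hypothetical helpers, not the torch.optim API.

```python
# SGD with momentum vs. Adam on the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3.
# Hyperparameters are illustrative assumptions, not YOLO v5 defaults.
import math


def grad(w: float) -> float:
    """Derivative of (w - 3)^2."""
    return 2.0 * (w - 3.0)


def sgd_momentum(w: float, steps: int, lr: float = 0.01, momentum: float = 0.9) -> float:
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity + grad(w)  # accumulate a running direction
        w -= lr * velocity                        # one global learning rate for every step
    return w


def adam(w: float, steps: int, lr: float = 0.01,
         beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8) -> float:
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # first moment (running mean of gradients)
        v = beta2 * v + (1 - beta2) * g * g  # second moment (running mean of squared gradients)
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step size adapted per parameter
    return w


if __name__ == "__main__":
    print(round(sgd_momentum(0.0, steps=1000), 4))  # 3.0
    print(adam(0.0, steps=2000))                    # should be close to 3.0
```

The design difference is visible in the update lines. SGD scales every gradient by the same learning rate, while Adam divides by a running estimate of the gradient's magnitude, so each parameter gets its own effective step size.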
Cost Function or Loss Function
In the YOLO family, a compound loss is calculated from the objectness score, the class probability score, and the bounding box regression score.
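A compound loss is a weighted sum of the three terms. The sketch below assumes the box term is 1 - IoU between the predicted and target boxes. YOLO v5 uses an IoU-based box loss, and plain IoU keeps the example short. The objectness and classification terms are taken as already-computed numbers. The weights `box_gain`, `obj_gain`, and `cls_gain` are hypothetical placeholders, not values taken from YOLO v5.

```python
# Compound detection loss as a weighted sum of box, objectness, and classification terms.
# The gains are hypothetical placeholders; the box term uses plain IoU for brevity.
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def compound_loss(pred_box: Box, target_box: Box, obj_loss: float, cls_loss: float,
                  box_gain: float = 1.0, obj_gain: float = 1.0, cls_gain: float = 1.0) -> float:
    box_loss = 1.0 - iou(pred_box, target_box)  # 0 when the boxes coincide
    return box_gain * box_loss + obj_gain * obj_loss + cls_gain * cls_loss


if __name__ == "__main__":
    pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 3.0, 2.0)
    print(round(iou(pred, target), 4))                      # 0.3333
    print(round(compound_loss(pred, target, 0.5, 0.25), 4))  # 1.4167
```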
Ultralytics used PyTorch's Binary Cross-Entropy with Logits loss function (BCEWithLogitsLoss) to calculate the loss for the class probability and objectness scores.
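BCEWithLogitsLoss fuses the sigmoid and the binary cross-entropy into one numerically stable expression: max(x, 0) - x*y + log(1 + exp(-|x|)) for a logit x and a target y in [0, 1]. The plain-Python sketch below applies that formula to lists of logits and targets and averages the result, which matches PyTorch's default mean reduction. It is an illustration, not the PyTorch implementation, and it omits the weight and pos_weight options.

```python
# Binary cross-entropy on logits, written in the numerically stable form.
import math
from typing import List


def bce_with_logits(logits: List[float], targets: List[float]) -> float:
    """Mean of max(x, 0) - x*y + log(1 + exp(-|x|)) over all elements."""
    if not logits or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and of equal length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


if __name__ == "__main__":
    print(round(bce_with_logits([0.0], [1.0]), 4))    # 0.6931 (= log 2)
    print(round(bce_with_logits([100.0], [1.0]), 4))  # 0.0, with no overflow for a large logit
```

Working on logits instead of probabilities is the reason for this design. A separate sigmoid followed by log(p) would underflow to log(0) for large negative logits, while the fused form never exponentiates a positive number.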
We also have the option to use the Focal Loss function to calculate the loss. You can train with Focal Loss by setting the fl_gamma hyperparameter (a value greater than 0 enables it).
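Focal loss rescales the binary cross-entropy so that well-classified examples contribute less. Each element's loss is multiplied by (1 - p_t)^gamma, where p_t is the predicted probability of the true label, and optionally by a class-balancing factor alpha. With gamma = 0 and no alpha, it reduces to plain BCE. The sketch below wraps the stable BCE formula this way. The gamma parameter plays the role of fl_gamma, but the function names are hypothetical. The alpha default of 0.25 is the value commonly used with focal loss, not something confirmed for YOLO v5 here.

```python
# Focal loss on logits: BCE scaled by (1 - p_t) ** gamma and an optional alpha balance.
# Function and parameter names are hypothetical; gamma plays the role of fl_gamma.
import math
from typing import List, Optional


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_term(x: float, y: float) -> float:
    """Stable binary cross-entropy for one logit x and target y."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def focal_loss(logits: List[float], targets: List[float],
               gamma: float = 2.0, alpha: Optional[float] = 0.25) -> float:
    """Mean focal loss; gamma = 0 with alpha = None reduces to plain BCE."""
    if not logits or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and of equal length")
    total = 0.0
    for x, y in zip(logits, targets):
        p = sigmoid(x)
        p_t = y * p + (1 - y) * (1 - p)  # probability given to the true label
        factor = (1.0 - p_t) ** gamma    # near 0 for easy, confident examples
        if alpha is not None:
            factor *= y * alpha + (1 - y) * (1 - alpha)  # balance positives vs. negatives
        total += factor * bce_term(x, y)
    return total / len(logits)


if __name__ == "__main__":
    print(round(focal_loss([0.0], [1.0], gamma=0.0, alpha=None), 4))  # 0.6931, same as BCE
    easy_bce = bce_term(4.0, 1.0)
    easy_focal = focal_loss([4.0], [1.0], gamma=2.0, alpha=None)
    print(easy_focal < easy_bce)  # True: the easy example is strongly down-weighted
```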
Weights, Biases, Parameters, Gradients, and Final Model Summary
To look closely at weights, biases, shapes, and parameters at each layer in the YOLOv5-small model, refer to the following information.
Additionally, you can refer to the following brief summary of the YOLO v5-small model:
Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients
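The summary reports the same number of parameters and gradients because every parameter in the model is trainable, so each one receives a gradient. Per-layer counts can be reproduced by hand. The sketch below assumes a convolution without bias followed by batch normalization, whose trainable weight and bias add 2 * c_out parameters. The layer specs, such as 12 input channels, 32 output channels, and a 3x3 kernel, are illustrative assumptions, and the function name is hypothetical.

```python
# Count trainable parameters of a Conv2d (no bias) + BatchNorm2d block by hand.


def conv_bn_params(c_in: int, c_out: int, k: int) -> int:
    conv = c_in * c_out * k * k  # convolution weights, no bias term
    bn = 2 * c_out               # BatchNorm weight and bias per output channel
    # BatchNorm running mean/variance are buffers, not parameters, so they are not counted.
    return conv + bn


if __name__ == "__main__":
    print(conv_bn_params(12, 32, 3))  # 3520
    layers = [(12, 32, 3), (32, 64, 3)]  # hypothetical (c_in, c_out, kernel) specs
    print(sum(conv_bn_params(*spec) for spec in layers))  # 22080
```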