Source: Deep Learning on Medium
In my first blog post, I explored sentiment analysis, a Deep Learning application in natural language understanding. This time I turned my focus to the other major application of Deep Learning: Computer Vision.
What is Computer Vision?
Computer vision is the area of artificial intelligence, now dominated by Deep Learning, that gives computers the ability to perceive and interpret the world around them. Computer vision powers the driver-assistance and self-driving features in cars and, more recently, Google Lens, which can identify objects and lets you copy and paste text from the real world. Amazon is currently in the process of releasing the Echo Look, a device that scans you and your clothes and makes recommendations about your outfit. Typical computer vision problems include object detection and image classification, and all of these applications are built on these two fundamental tasks.
The Evolution of Computer Vision
The field of computer vision started out by applying hand-crafted rules to pick out relevant information from images. These techniques, such as hand-written rules, hand-picked feature extraction and data augmentation, are now referred to as traditional methods of solving computer vision problems. Although successful, traditional methods did not scale well and had to be re-engineered whenever a new category of objects was introduced. Competitions such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), however, pushed researchers to come up with new ways to solve the problem of image classification. Below is a timeline of the evolution of computer vision and ConvNets:
2012: AlexNet, The Birth of Modern Computer Vision:
AlexNet won ILSVRC 2012 by a large margin over more traditional methods.
- Used data augmentation, the ReLU activation function, dropout, and a GPU implementation
- Used overlapping pooling (3 x 3 windows with a stride of 2) to reduce overfitting
- Proved the value of Convolutional Neural Networks
- 5 convolution layers
- Overlapping max-pooling layers
- 3 fully connected (dense) layers
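Overlapping pooling is easiest to see with a little arithmetic. The plain-Python sketch below computes output sizes with the usual formula floor((n + 2p - k) / s) + 1 and implements a small max pooling. The function names are illustrative, and the 227 x 227 input is an assumption: it is the size that makes AlexNet's first-layer arithmetic work out, although the paper states 224 x 224.

```python
from typing import List


def conv_output_size(n: int, k: int, s: int, p: int = 0) -> int:
    """Spatial output size of a convolution or pooling window: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


def max_pool2d(grid: List[List[float]], k: int, s: int) -> List[List[float]]:
    """Max pooling with a k x k window moved s pixels at a time (no padding).
    When s < k, neighbouring windows share pixels: this is overlapping pooling."""
    out_h = conv_output_size(len(grid), k, s)
    out_w = conv_output_size(len(grid[0]), k, s)
    return [
        [
            max(grid[i * s + di][j * s + dj] for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Assumed 227 x 227 input: conv1 is 11 x 11 with stride 4, then 3 x 3 pooling with stride 2
conv1 = conv_output_size(227, k=11, s=4)
pool1 = conv_output_size(conv1, k=3, s=2)
print(conv1, pool1)  # 55 27

# Toy 5 x 5 feature map holding the values 0..24
grid = [[5 * i + j for j in range(5)] for i in range(5)]
print(max_pool2d(grid, k=3, s=2))  # overlapping:     [[12, 14], [22, 24]]
print(max_pool2d(grid, k=2, s=2))  # non-overlapping: [[6, 8], [16, 18]]
```

With k=3 and s=2, the middle row and column of the toy map fall into two windows each, which is exactly the overlap AlexNet relied on.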
2013: ZFNet, Winner of ILSVRC 2013
ZFNet won ILSVRC 2013 with a modified version of AlexNet.
- Changed AlexNet's first convolution layer from an 11 x 11 convolution with a stride of 4 to a 7 x 7 convolution with a stride of 2 to retain more information
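ZFNet shrank the first-layer kernel from 11 x 11 to 7 x 7 and also reduced its stride from 4 to 2. A smaller stride samples the image on a denser grid, and a smaller kernel means fewer weights per filter. The plain-Python sketch below compares the two first layers on the same 224 x 224 RGB input with no padding; that input and padding are simplifying assumptions, so the sizes differ slightly from the published architectures.

```python
def conv_output_size(n: int, k: int, s: int, p: int = 0) -> int:
    """Spatial output size of a k x k convolution with stride s and padding p."""
    return (n + 2 * p - k) // s + 1


def weights_per_filter(k: int, in_channels: int = 3) -> int:
    """Number of weights in one k x k filter over an RGB input (bias excluded)."""
    return k * k * in_channels


# Same hypothetical 224 x 224 RGB input and no padding for both first layers
alexnet_out = conv_output_size(224, k=11, s=4)
zfnet_out = conv_output_size(224, k=7, s=2)
print(alexnet_out, zfnet_out)                         # 54 109
print(weights_per_filter(11), weights_per_filter(7))  # 363 147
```

The ZFNet-style layer produces roughly twice as many positions along each side, so less spatial detail is discarded at the very first step.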
2014: VGGNet, One of the Most Popular Conv Nets:
VGGNet replaced large convolution kernels with stacks of several small 3 x 3 kernels.
- 5 blocks, each containing two to four 3 x 3 convolutions with a stride and padding of 1, followed by a 2 x 2 max pooling with a stride of 2
- VGG-16 has about 138 million parameters, a little over 0.5 GB of weights
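Two pieces of arithmetic explain VGGNet's design and its size. Stacking n 3 x 3 convolutions with stride 1 sees a (2n + 1) x (2n + 1) region, so three of them cover the same 7 x 7 area as one large kernel with fewer weights. The layer list below follows VGG-16 ("configuration D" in the VGG paper) for a 224 x 224 input with 1000 classes; the helper names are hypothetical, and the size estimate assumes 32-bit floats.

```python
def receptive_field(num_layers: int, k: int = 3) -> int:
    """Receptive field of num_layers stacked k x k convolutions with stride 1."""
    return 1 + num_layers * (k - 1)


def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights plus biases of one k x k convolution layer."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# VGG-16: numbers are output channels, "M" is a 2 x 2 max pooling (no weights)
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_param_count(in_channels: int = 3, num_classes: int = 1000) -> int:
    total, c = 0, in_channels
    for layer in VGG16:
        if layer == "M":
            continue
        total += conv_params(c, layer)
        c = layer
    # Five poolings shrink 224 x 224 to 7 x 7; the feature map is flattened for the dense layers
    total += dense_params(7 * 7 * c, 4096)
    total += dense_params(4096, 4096)
    total += dense_params(4096, num_classes)
    return total


params = vgg16_param_count()
print(params)                      # 138357544
print(params * 4 / 1e6)            # float32 weights in MB, about 553
print(receptive_field(3))          # 7
c = 512
print(3 * 3 * 3 * c * c, 7 * 7 * c * c)  # three 3x3 vs one 7x7 (weights only): 7077888 12845056
```

Most of the half gigabyte sits in the first dense layer (25088 x 4096 weights), not in the 3 x 3 convolutions.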
2014: Inception, or GoogLeNet, Winner of ILSVRC 2014
The Inception model addressed the overfitting and computational cost that came along with deeper convolution models. It also removes the challenge of choosing the correct kernel size by applying several kernel sizes in parallel.
- Instead of stacking convolution layers with different kernel sizes, an Inception module runs 1 x 1, 3 x 3 and 5 x 5 convolutions (plus a max pooling) in parallel and concatenates their outputs along the channel axis
- Uses 1 x 1 convolutions to reduce the number of channels before the expensive 3 x 3 and 5 x 5 convolutions, keeping the computation manageable; padding keeps the spatial sizes equal so the outputs can be concatenated
- To keep gradients from vanishing in the middle of the model, added auxiliary classifiers at 2 intermediate points and added their weighted losses to the final loss
- Inception v2 and v3 replaced each 5 x 5 convolution with two stacked 3 x 3 convolutions and further factorized n x n convolutions into a 1 x n convolution followed by an n x 1 convolution
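The Inception ideas can be checked with a few lines of plain Python. The channel counts for the "inception (3a)" module (28 x 28 x 192 input; 64, 128, 32 and 32 output channels per branch; a 1 x 1 reduction to 16 channels before the 5 x 5) come from the GoogLeNet paper's architecture table, and the 0.3 auxiliary-loss weight is the value reported there. The helper names are hypothetical, and the cost count covers multiplications only, ignoring biases. The last two lines compare the factorizations from the Inception v3 paper: one 5 x 5 replaced by two 3 x 3, and one 3 x 3 replaced by a 1 x 3 followed by a 3 x 1.

```python
def concat_channels(branch_channels):
    """Inception branches are concatenated along the channel axis, so the
    module's output depth is the sum of the branch depths."""
    return sum(branch_channels)


def conv_mults(h, w, c_in, k, c_out):
    """Multiplications in a k x k 'same' convolution: H * W * C_in * k * k * C_out."""
    return h * w * c_in * k * k * c_out


def combined_loss(main_loss, aux_losses, aux_weight=0.3):
    """Main classifier loss plus down-weighted auxiliary classifier losses."""
    return main_loss + aux_weight * sum(aux_losses)


# "inception (3a)": four branches concatenated into one output tensor
out_depth = concat_channels([64, 128, 32, 32])

# 5 x 5 branch (32 filters) applied directly vs. after a 1 x 1 reduction to 16 channels
direct = conv_mults(28, 28, 192, 5, 32)
reduced = conv_mults(28, 28, 192, 1, 16) + conv_mults(28, 28, 16, 5, 32)
print(out_depth, direct, reduced)  # 256 120422400 12443648

# Factorization: weights per (input channel, output channel) pair
print(5 * 5, 2 * (3 * 3))          # one 5 x 5 vs two stacked 3 x 3: 25 18
print(3 * 3, 1 * 3 + 3 * 1)        # one 3 x 3 vs 1 x 3 then 3 x 1: 9 6
```

The 1 x 1 "reduce" layer cuts the 5 x 5 branch's cost by almost a factor of ten, which is what made running several kernel sizes side by side affordable.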
2015: ResNet, Winner of ILSVRC 2015
A model built from stacked residual blocks:
- Reduces model size and complexity substantially compared with VGGNet
- Each residual block contains 2 convolution layers, followed by the addition of the block's original input to their output
- This addition (the skip connection) lets the model learn an identity function easily, because the convolutions only need to output zero
- Much deeper than preceding model architectures (up to 152 layers on ImageNet)
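To see why the skip connection makes the identity easy to learn, here is a minimal plain-Python sketch. As a simplifying assumption, each convolution is replaced with a matrix-vector product (a dense layer) and biases are omitted; all function names are hypothetical. With every weight at zero, the residual block passes a non-negative input through unchanged, while the same two layers without the skip connection output zeros.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def linear(w: Matrix, v: Vector) -> Vector:
    # Stand-in for a convolution layer: a plain matrix-vector product
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def plain_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    # Two weight layers, each followed by ReLU, with no shortcut
    return relu(linear(w2, relu(linear(w1, x))))


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    # F(x): two weight layers with a ReLU in between
    f = linear(w2, relu(linear(w1, x)))
    # Skip connection: add the original input back, then apply ReLU
    return relu([fi + xi for fi, xi in zip(f, x)])


zeros = [[0.0] * 3 for _ in range(3)]
x = [1.0, 2.0, 3.0]
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 3.0]
print(plain_block(x, zeros, zeros))     # [0.0, 0.0, 0.0]
```

Because the block only has to learn the difference F(x) from its input, "do nothing" is the easy default, which is what allows very deep stacks of these blocks to train.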
2016: Inception-ResNet, The Hybrid Network
Building on the success of ResNet and the Inception network, the two designs were combined into a hybrid network with the advantages of both.
- 3 different types of Inception blocks (A, B and C)
- Added ResNet's residual connection to each Inception block
- Scaled down the residuals (by a factor of roughly 0.1 to 0.3) before adding them, to keep very wide networks from "dying" early in training
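Residual scaling is a one-line change to the addition in a residual block. The plain-Python sketch below assumes a scale of 0.1, the low end of the 0.1 to 0.3 range reported for Inception-ResNet. The function name and the unusually large branch output are hypothetical, chosen only to show that the scaled update stays close to the input.

```python
from typing import List


def scaled_residual(x: List[float], residual: List[float], scale: float = 0.1) -> List[float]:
    """Add a residual branch's output to its input after shrinking it by `scale`."""
    return [xi + scale * ri for xi, ri in zip(x, residual)]


x = [1.0, 1.0]
large_branch_output = [10.0, -10.0]  # hypothetical, unusually large residual

print(scaled_residual(x, large_branch_output))       # [2.0, 0.0]
print(scaled_residual(x, large_branch_output, 1.0))  # unscaled: [11.0, -9.0]
```

Shrinking the residual keeps each block's contribution small at the start of training, so a single large branch output cannot swamp the signal carried by the shortcut.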
Thanks for reading my second blog post! I hope it provided you with some important information about the growth of the Computer Vision field. My next blog post will be about how I created an anomaly detection system to predict fraudulent transactions in parking garages around the US. Signing off for now! -Tanish