Week 3: Introduction to Models

Source: Deep Learning on Medium

Hi everyone, we know this is a tough week for all of you. That’s why we don’t want to tire you further.

This week’s focus will be on CNN models that we will use in our project.

Last week we talked about data transforms and data loaders, so we are now able to use CNN models. If you have not read last week's article, or want to browse it again, you can reach it here.

So, What is a CNN Model?

A convolutional neural network (CNN) is a specific type of artificial neural network that uses perceptrons, a basic machine learning unit, to analyze data through supervised learning. CNNs are applied to image processing, natural language processing and other kinds of cognitive tasks.

A convolutional neural network is also known as a ConvNet.
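To make the "convolution" part concrete, here is a minimal sketch of what a single filter does to an image, written in plain Python with no padding and a stride of 1. The function name `conv2d`, the toy image, and the edge filter are our own illustrations, not part of any library. (Like most deep learning frameworks, it slides the filter without flipping it.)

```python
def conv2d(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1)."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy image: dark on the left half, bright on the right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The filter only responds in the middle column, exactly where the edge is. In a real CNN the filter values are learned during training rather than written by hand.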

Our aim is to use models such as Densenet-121, Resnet-50, VGG-16, and AlexNet.

What is a Pre-trained Model?

A pre-trained model has been previously trained on a dataset and contains the weights and biases that represent the features of whichever dataset it was trained on. Learned features are often transferable to different data. For example, a model trained on a large dataset of bird images will contain learned features, like edges or horizontal lines, that would be transferable to your dataset.


ResNet-50 is a deep residual network. The “50” refers to the number of layers. It is a type of convolutional neural network, and ResNet is most commonly used for image classification.

The main innovation of ResNet is the skip connection. As you know, deep networks without adjustments often suffer from vanishing gradients: as the error is propagated backwards through the model, the gradient becomes smaller and smaller. Tiny gradients can make learning intractable.
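To get a feel for how quickly gradients can shrink, here is a rough sketch of the chain rule across many layers. It assumes every layer contributes the same local derivative of 0.25 (the maximum slope of a sigmoid), which is a simplification, since real networks vary from layer to layer. The name `chained_gradient` is our own.

```python
def chained_gradient(local_derivative, num_layers):
    """Multiply the same local derivative across num_layers layers (chain rule)."""
    gradient = 1.0
    for _ in range(num_layers):
        gradient *= local_derivative
    return gradient


# Assumption for illustration: each layer scales the gradient by 0.25.
SIGMOID_MAX_DERIVATIVE = 0.25

for depth in (5, 20, 50):
    print(depth, chained_gradient(SIGMOID_MAX_DERIVATIVE, depth))
```

Under this assumption the gradient that reaches the first layer of a 50-layer network is below 1e-30, far too small to update the weights in any useful way.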

The skip connection in the following figure is denoted by “identity”. It allows the network to learn the identity function, which lets it route the input through the block without passing through the other weight layers!
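The identity idea can be sketched in a few lines of plain Python. This is a simplified residual block, assuming small fully connected layers instead of convolutions and leaving out batch normalization and the final activation. The helper names (`linear`, `relu`, `residual_block`) are ours.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def linear(values, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, weights_1, weights_2):
    """Compute F(x) + x, where F is linear -> ReLU -> linear."""
    fx = linear(relu(linear(x, weights_1)), weights_2)
    # The skip connection adds the untouched input back to F(x).
    return [f + x_i for f, x_i in zip(fx, x)]


x = [1.0, -2.0, 3.0]
# With all weights at zero, F(x) is zero, so the block passes x through unchanged.
zero_weights = [[0.0] * 3 for _ in range(3)]
output = residual_block(x, zero_weights, zero_weights)
print(output)  # [1.0, -2.0, 3.0]
```

Because the block only has to push its weights towards zero to behave like the identity, adding more blocks should not make the network worse than a shallower one, at least in principle.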


VGG-16 is a convolutional neural network that is trained on more than a million images from the ImageNet database. The network is 16 layers deep and can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224.

It is a simple network model, and its most important difference from previous models is the use of convolution layers in groups of two or three. Before the fully connected (FC) layers, the final 7x7x512 feature map is flattened into a feature vector of 25,088 values, which is fed into two FC layers with 4096 neurons each. A final FC layer with 1000 outputs, followed by softmax, produces the class probabilities. The network has approximately 138 million parameters. As in other models, the height and width of the feature maps decrease from input to output, while the depth increases.
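The shape and parameter arithmetic for VGG-16 can be checked with a short sketch. It assumes the standard VGG-16 layout: 3x3 convolutions with padding that keeps height and width, 2x2 max-pooling that halves them, and three FC layers of 4096, 4096 and 1000 units. The names `VGG16_LAYOUT` and `vgg16_summary` are our own.

```python
# Numbers are output channels of a 3x3 convolution; "M" marks a 2x2 max-pool.
VGG16_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_summary(input_size=224, in_channels=3, num_classes=1000):
    """Return (final size, final channels, flattened length, parameter count)."""
    size, channels, params = input_size, in_channels, 0
    for item in VGG16_LAYOUT:
        if item == "M":
            size //= 2  # pooling halves height and width
        else:
            # 3x3 kernel weights plus one bias per output channel
            params += 3 * 3 * channels * item + item
            channels = item
    flattened = size * size * channels
    for fan_in, fan_out in [(flattened, 4096), (4096, 4096), (4096, num_classes)]:
        params += fan_in * fan_out + fan_out  # FC weights plus biases
    return size, channels, flattened, params


print(vgg16_summary())  # (7, 512, 25088, 138357544)
```

Most of those 138 million parameters sit in the first FC layer, which alone connects 25,088 inputs to 4096 neurons.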

VGG-16 Architecture


DenseNet requires fewer parameters than conventional convolutional networks, because there is no need to re-learn redundant feature maps. Conventional feed-forward architectures can be viewed as algorithms with a state that is passed from layer to layer. Each layer reads the state from the previous layer and writes to the next layer.

In addition to better parameter efficiency, a major advantage of DenseNets is the improved flow of information and gradients throughout the network, which makes them easy to train. Each layer has direct access to the gradients from the loss function and to the original input signal, resulting in implicit deep supervision. This helps in training deeper network architectures. Moreover, dense connections have a regularizing effect that reduces overfitting on tasks with smaller training set sizes.
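One way to see why each layer can stay small is to count channels inside a dense block, where every layer receives the concatenated outputs of all layers before it. The sketch below uses illustrative values chosen for this example (64 input channels, 32 new feature maps per layer, 6 layers); the function name is ours.

```python
def dense_block_input_channels(initial_channels, growth_rate, num_layers):
    """Channels seen by each layer when all earlier outputs are concatenated."""
    return [initial_channels + growth_rate * i for i in range(num_layers)]


# Illustrative values for this sketch, not read from the article.
inputs_per_layer = dense_block_input_channels(
    initial_channels=64, growth_rate=32, num_layers=6
)
print(inputs_per_layer)  # [64, 96, 128, 160, 192, 224]

# After the last layer adds its own 32 feature maps:
block_output_channels = 64 + 32 * 6
print(block_output_channels)  # 256
```

Each layer only adds 32 new feature maps, yet the last layer still sees all 224 channels produced before it, including the original input.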



AlexNet, published in 2012, is the study that made convolutional neural network models and deep learning popular again. It was developed by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton. It is very similar to the LeNet model in that it has consecutive convolution and pooling layers. ReLU (Rectified Linear Unit) is used as the activation function and max-pooling is used in the pooling layers.
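ReLU and max-pooling are both simple enough to sketch in plain Python. The 2x2 window with stride 2 used here is an assumption to keep the example small, not AlexNet's exact pooling setting, and the function names are ours.

```python
def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ))
        pooled.append(row)
    return pooled


feature_map = [
    [-1, 2, 0, -3],
    [4, -5, 1, 6],
    [0, 0, -2, -1],
    [3, -4, -7, 8],
]
activated = relu(feature_map)
pooled = max_pool_2x2(activated)
print(pooled)  # [[4, 6], [3, 8]]
```

The 4x4 feature map shrinks to 2x2, which is the "height and width decrease" effect you see in every architecture diagram in this post.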

AlexNet Architecture

What is the difference between VGG16, Resnet50, Alexnet?

They are different kinds of Convolutional Neural Networks.

Alexnet and VGG are pretty much the same concept, but VGG is deeper and has more parameters, as well as using only 3×3 filters.

Resnets are a kind of CNN called Residual Networks. They are very deep compared to Alexnet and VGG, and Resnet 50 refers to a 50-layer Resnet.

Resnet introduced residual connections between layers, meaning that the output of a layer is a convolution of its input plus its input. Moreover, layers in a Resnet also use Batch Normalization, which has also been incorporated into VGG.
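A quick bit of arithmetic shows why stacking small 3×3 filters is attractive. The sketch below assumes stride-1 convolutions with the same number of input and output channels and ignores biases; both function names are ours.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Input region seen by one output value after stacking stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, num_layers=1):
    """Weights in num_layers convolutions with `channels` inputs and outputs each."""
    return num_layers * kernel_size * kernel_size * channels * channels


print(stacked_receptive_field(3, 2))  # 5, the same as a single 5x5 filter
print(conv_weights(3, 64, num_layers=2))  # 73728
print(conv_weights(5, 64))  # 102400
```

Two stacked 3×3 layers cover the same 5×5 region as one larger filter with fewer weights, and they get an extra non-linearity in between.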


That is all from us for this week, see you next week.