5 CNN Architectures

Original article was published by Subhash Achutha on Deep Learning on Medium

Differences between five common convolutional neural networks:


  1. LeNet
  2. AlexNet
  3. VGG
  4. ResNet
  5. Inception

1. LeNet-5 (1998)

LeNet-5 is one of the simplest architectures. It has 2 convolutional and 3 fully-connected layers and about 60,000 parameters. Its input images are small (32×32 grayscale digits), so a shallow network with few parameters is enough to extract the relevant features.
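To make the "about 60,000 parameters" figure concrete, here is a rough sketch that traces the feature-map sizes and counts the weights. It assumes a simplified, modern reading of LeNet-5 (32×32 input, 5×5 kernels, 2×2 pooling without trainable weights, fully connected second convolution), so the total differs slightly from the original paper's count. The helper names are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(c_in, c_out, kernel):
    """Kernel weights plus one bias per output channel."""
    return c_in * c_out * kernel * kernel + c_out

def fc_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

size = 32                     # assumed 32x32 grayscale input
size = conv_out(size, 5)      # first 5x5 convolution  -> 28
size = conv_out(size, 2, 2)   # 2x2 pooling, stride 2  -> 14
size = conv_out(size, 5)      # second 5x5 convolution -> 10
size = conv_out(size, 2, 2)   # 2x2 pooling, stride 2  -> 5
flat = 16 * size * size       # 16 feature maps flattened -> 400

total = (
    conv_params(1, 6, 5)      # conv 1: 1 -> 6 channels
    + conv_params(6, 16, 5)   # conv 2: 6 -> 16 channels
    + fc_params(flat, 120)    # fully connected 1
    + fc_params(120, 84)      # fully connected 2
    + fc_params(84, 10)       # fully connected 3 (10 classes)
)
print(size, flat, total)      # 5 400 61706
```

Almost all of the parameters sit in the first fully-connected layer, which is typical for small CNNs of this kind.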

2. AlexNet (2012)

AlexNet popularized ReLU activations, local response normalization (LRN), and data augmentation. With 60M parameters, AlexNet has 8 layers: 5 convolutional and 3 fully-connected. In essence, it stacks a few more layers onto LeNet-5. ReLU mitigates the vanishing gradient problem because its gradient does not saturate for positive inputs, which makes deeper networks easier to train. LRN normalizes each activation across neighboring channels (that is, across depth). Data augmentation enlarges the training set by transforming the images, for example by rescaling or rotating them. Kernel sizes vary from layer to layer (11×11, 5×5, and 3×3). The network is still too shallow to learn features of features, meaning deep feature hierarchies, which motivated VGG.
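The vanishing gradient argument can be illustrated with a small sketch. It multiplies the activation derivatives along one path through 8 layers and deliberately ignores the weights, so it is a simplification rather than a real backward pass. The pre-activation value 0.5 is an arbitrary choice for the illustration.

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_grad(grad_fn, pre_activations):
    """Product of activation derivatives along one path through the layers."""
    g = 1.0
    for x in pre_activations:
        g *= grad_fn(x)
    return g

pre_acts = [0.5] * 8  # hypothetical pre-activations, one per layer
sig = chained_grad(sigmoid_grad, pre_acts)
rel = chained_grad(relu_grad, pre_acts)
print(f"sigmoid path: {sig:.2e}, ReLU path: {rel:.1f}")
```

With sigmoid the product shrinks by at least a factor of four per layer, while the ReLU path passes the gradient through unchanged as long as the units are active.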

3. VGG-16 (2014)

VGG-16 has 13 convolutional and 3 fully-connected layers, carrying on the ReLU tradition from AlexNet, and 138M parameters. The number of feature maps grows with depth (64 → 128 → 256 → 512), so deeper layers extract more features. Every convolution uses a 3×3 kernel, and the network is trained with SGD with momentum. VGG-16 contains 16 weight layers and VGG-19 contains 19.
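The layer and parameter counts can be checked with a short sketch. The configuration list below follows the common convention of writing channel counts for 3×3 convolutions and "M" for 2×2 max pooling. The function name and the 224×224 RGB input with 1000 classes are assumptions for this illustration.

```python
# Channel count per 3x3 convolution; "M" marks a 2x2 max-pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def vgg_param_count(cfg, in_channels=3, image_size=224, num_classes=1000):
    """Return (number of conv layers, total parameters) for a VGG-style net."""
    params = 0
    conv_layers = 0
    channels, size = in_channels, image_size
    for item in cfg:
        if item == "M":
            size //= 2                              # pooling halves height and width
        else:
            params += channels * item * 3 * 3 + item  # 3x3 weights + biases
            channels = item
            conv_layers += 1
    # Three fully-connected layers on top of the flattened feature maps.
    fc_sizes = [channels * size * size, 4096, 4096, num_classes]
    for n_in, n_out in zip(fc_sizes, fc_sizes[1:]):
        params += n_in * n_out + n_out
    return conv_layers, params

print(vgg_param_count(VGG16_CFG))  # (13, 138357544)
```

Most of the 138M parameters come from the first fully-connected layer, not from the convolutions.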


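4. ResNet (2015)

ResNet is a powerful model built on skip connections. Simply increasing depth leads to the vanishing gradient problem and adds parameters, and ResNet overcomes the training difficulty with the residual block: the block's input is added back to its output through a shortcut, so the stacked layers only need to learn a residual. The shortcut is a fixed identity mapping, not a random choice; when the input and output shapes differ, a projection is used to match them. Because of these shortcuts, much deeper networks still train well. Deeper variants also use 1×1 convolutions inside the block, which add non-linearity and keep the parameter count manageable.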
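A residual block can be sketched in a few lines. This toy version works on plain Python lists and uses two small linear layers in place of convolutions, so it only illustrates the "output = F(x) + x" idea and is not the layout of any specific ResNet.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(weights, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU between."""
    out = relu(linear(w1, x))
    out = linear(w2, out)
    # Skip connection: add the unchanged input back before the final ReLU.
    return relu([o + xi for o, xi in zip(out, x)])

x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all-zero weights F(x) is zero, so the block passes x through unchanged.
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 3.0]
```

The example shows why extra residual blocks are cheap to add: a block that has learned nothing behaves like the identity instead of destroying the signal.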

The original figure compared VGG-19, a 34-layer plain network, and a 34-layer residual network. Common variants are ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152.


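5. Inception (2014)

Inception increases both depth and width, which lets the network extract more features from the image. Each module applies 1×1, 3×3, and 5×5 filters in parallel to the same input and so extracts a mixture of features at different scales. Padding keeps the spatial size of every branch the same, and the branch outputs are finally concatenated along the depth (channel) dimension and passed as input to the next layer.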
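The shape bookkeeping of such a module can be sketched as follows. The function names and the branch widths (64, 128, 32 channels on a 28×28 input) are illustrative values chosen for this example; a real Inception module also has a pooling branch and 1×1 reduction layers, which are left out here.

```python
def same_padding(kernel):
    """Padding that keeps the spatial size unchanged for odd kernels, stride 1."""
    return (kernel - 1) // 2

def conv_out(size, kernel, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

def inception_output_shape(size, branches):
    """branches maps kernel size -> number of output channels of that branch."""
    sizes = {conv_out(size, k, padding=same_padding(k)) for k in branches}
    # Concatenation along depth only works if all branches agree spatially.
    assert len(sizes) == 1, "branches must agree on spatial size"
    return sizes.pop(), sum(branches.values())

out_size, out_channels = inception_output_shape(28, {1: 64, 3: 128, 5: 32})
print(out_size, out_channels)  # 28 224
```

The spatial size stays at 28 while the channel counts of the branches simply add up.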

The Inception architecture takes intermediate outputs at two places through auxiliary classifiers. Later versions use batch normalization, which allows a somewhat higher learning rate and therefore faster training. Dropout can then be removed, because batch normalization already has a regularizing effect. Inception v1, Inception v2, Inception v3, and Inception v4 are some of the models in this family.
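As a minimal sketch of what batch normalization computes, the function below normalizes one feature over a batch of scalar activations. The training-time formula is standard, but the function name and example values are made up for the illustration, and running statistics for inference are omitted.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

activations = [2.0, 4.0, 6.0, 8.0]   # hypothetical batch of one feature
normalized = batch_norm(activations)
print([round(v, 3) for v in normalized])
```

Keeping activations in a stable range like this is what makes the larger learning rate tolerable.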


  1. https://github.com/tensorflow/models/tree/master/research/slim/nets
  2. http://slazebni.cs.illinois.edu/spring17/lec01_cnn_architectures.pdf
  3. https://github.com/keras-team/keras-applications/tree/master/keras_applications