A Crash Course in Computer Vision

Original article was published on Deep Learning on Medium

Popular Models

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) started in 2010, with teams training models to classify images into 1,000 categories. A good ILSVRC top-5 error rate was considered to be around 25%, but the introduction of AlexNet, a deep convolutional neural network, in 2012 changed the dynamics of the competition. AlexNet achieved a top-5 error rate of 15.3%, more than 10 percentage points lower than that of the runner-up. AlexNet contained eight layers: the first five were convolutional layers, some of them followed by max-pooling layers, and the last three were fully connected layers.

Later entries improved on AlexNet by using smaller receptive windows and strides in the first convolutional layer, while the VGG model focused on depth, stacking many small 3x3 convolutions. VGG finished second in the 2014 competition with a top-5 error rate of 7.3%. In 2015, the competition was won by Microsoft's ResNet, a very deep CNN with over 100 layers.
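To make the AlexNet layer stack concrete, the sketch below traces the shape of the data through its five convolutional layers, three max-pooling layers and three fully connected layers, using the standard output-size formula (size + 2 * padding - kernel) // stride + 1. It assumes a 227x227 RGB input and the padding values used in common reimplementations (the original paper quotes 224x224); the helper names `conv_out` and `trace_alexnet` are hypothetical and only for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kind, kernel, stride, padding, output channels).
# Assumed AlexNet-style settings; pooling layers keep the channel count.
ALEXNET_FEATURES = [
    ("conv1", "conv", 11, 4, 0, 96),
    ("pool1", "pool", 3, 2, 0, None),
    ("conv2", "conv", 5, 1, 2, 256),
    ("pool2", "pool", 3, 2, 0, None),
    ("conv3", "conv", 3, 1, 1, 384),
    ("conv4", "conv", 3, 1, 1, 384),
    ("conv5", "conv", 3, 1, 1, 256),
    ("pool5", "pool", 3, 2, 0, None),
]
FC_UNITS = [("fc6", 4096), ("fc7", 4096), ("fc8", 1000)]  # 1000 ImageNet classes

def trace_alexnet(input_size=227, in_channels=3):
    """Return (layer name, output shape) pairs for the assumed AlexNet layout."""
    size, channels = input_size, in_channels
    shapes = []
    for name, kind, kernel, stride, padding, out_channels in ALEXNET_FEATURES:
        size = conv_out(size, kernel, stride, padding)
        if kind == "conv":
            channels = out_channels
        shapes.append((name, (channels, size, size)))
    # The last feature map is flattened into a vector for the fully connected layers.
    shapes.append(("flatten", (channels * size * size,)))
    for name, units in FC_UNITS:
        shapes.append((name, (units,)))
    return shapes

for name, shape in trace_alexnet():
    print(f"{name:8s} {shape}")
```

Only the convolutional and fully connected layers carry weights, which is why AlexNet is described as having eight layers even though the trace lists pooling and flattening steps too.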

Python Libraries

TensorFlow — TensorFlow is an open-source library for fast numerical computing, created and released by Google, with Python as its main interface. It is a foundation library that can be used to create deep learning models directly, or through wrapper libraries built on top of it that simplify the process.
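A minimal sketch of TensorFlow's two core ingredients, assuming TensorFlow 2.x is installed: tensor arithmetic and automatic differentiation with `tf.GradientTape`, the mechanism used to compute gradients when training deep learning models. The function being differentiated is an arbitrary illustrative choice.

```python
import tensorflow as tf

# Tensors support vectorized numerical operations, much like NumPy arrays.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)  # matrix product: [[7, 10], [15, 22]]

# Automatic differentiation: record operations on the tape, then ask for gradients.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x
grad = tape.gradient(y, x)  # dy/dx = 2x + 2, which is 8.0 at x = 3
```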

Keras — Keras offers a simpler, high-level mechanism for expressing neural networks. It was developed with a primary focus on enabling fast experimentation, and Keras models are portable: they can be saved, shared and loaded again elsewhere.
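As an illustration of how compactly Keras expresses a network, here is a small image classifier sketch. It assumes TensorFlow's bundled Keras (`from tensorflow import keras`) and hypothetical 28x28 grayscale inputs with 10 classes; the layer sizes are arbitrary choices, not a recommended architecture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A tiny CNN: one convolution + pooling stage, then a softmax classifier.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),           # assumed 28x28 grayscale images
    layers.Conv2D(32, 3, activation="relu"),  # 32 filters of size 3x3 -> 26x26x32
    layers.MaxPooling2D(2),                   # downsample by 2 -> 13x13x32
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # assumed 10 output classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Run a dummy batch of two blank images through the untrained model.
probs = model(np.zeros((2, 28, 28, 1), dtype="float32"))
```

Training would then be a single call to `model.fit` on real image and label arrays.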

NumPy — NumPy is a Python library for scientific computing. Its N-dimensional array type can represent images, sound waves and other binary raw streams as arrays of numbers.
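The sketch below shows what "an image is an N-dimensional array" means in practice, using a tiny synthetic RGB image stored as height x width x channels with 8-bit values. Cropping, flipping and grayscale conversion then become ordinary array operations; the grayscale weights are the ITU-R BT.601 luma coefficients, one common choice among several.

```python
import numpy as np

# A tiny synthetic RGB image: height x width x channels, 8-bit values (0-255).
height, width = 4, 6
image = np.zeros((height, width, 3), dtype=np.uint8)
image[:, :, 0] = 255            # fill the red channel everywhere
image[1:3, 2:4] = [0, 255, 0]   # paint a 2x2 green square

print(image.shape, image.dtype)  # (4, 6, 3) uint8

cropped = image[1:3, 2:4]   # crop by slicing rows and columns
flipped = image[:, ::-1]    # horizontal flip: reverse the column axis
# Weighted sum over the channel axis gives a 2-D grayscale image.
gray = (image @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)
normalized = image.astype(np.float32) / 255.0  # scale to [0, 1] before feeding a network
```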

Pandas — Pandas offers fast, expressive and flexible data structures that make it easier for developers to work with labeled or relational data.
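In a computer vision project, pandas is often used for the labeled metadata around the images rather than the pixels themselves. This sketch uses a hypothetical annotation table (the file names, labels and widths are invented for illustration) to count images per class, filter rows and compute per-class statistics.

```python
import pandas as pd

# Hypothetical annotation table for a small image dataset.
labels = pd.DataFrame({
    "filename": ["img_001.jpg", "img_002.jpg", "img_003.jpg", "img_004.jpg"],
    "label": ["cat", "dog", "cat", "bird"],
    "width": [640, 800, 640, 1024],
})

counts = labels["label"].value_counts()               # number of images per class
cats = labels[labels["label"] == "cat"]               # filter rows by label
mean_width = labels.groupby("label")["width"].mean()  # per-class statistics
```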

Matplotlib — Matplotlib is a two-dimensional plotting library for the Python programming language, capable of producing publication-quality figures in numerous hardcopy formats.
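A common computer vision use of Matplotlib is displaying an image array and saving the figure. This sketch renders a synthetic grayscale gradient (standing in for a real image) with the non-interactive "Agg" backend and writes it to an in-memory PNG; passing a file name and another format such as "pdf" or "svg" to `savefig` works the same way.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import numpy as np

# A synthetic 64x64 grayscale gradient stands in for a real image.
image = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))

fig, ax = plt.subplots(figsize=(3, 3))
ax.imshow(image, cmap="gray")
ax.set_title("Synthetic gradient")
ax.axis("off")  # hide axis ticks for image display

buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)  # free the figure's memory
png_bytes = buffer.getvalue()
```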

Scikit-Learn — This library provides a range of tools for data analysis and data mining. It supports a number of machine learning tasks, notably classification, regression, clustering, dimensionality reduction and model selection.
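Several of those tasks fit naturally into one small image-classification sketch. It uses scikit-learn's bundled 8x8 handwritten digits dataset (no download needed), reduces each flattened image with PCA and classifies it with a support vector machine. The choice of 30 components and default SVC settings are illustrative assumptions, not tuned values.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# 1,797 grayscale 8x8 digit images, each flattened into 64 pixel features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)  # model selection: hold out a test set

# Dimensionality reduction (PCA) followed by classification (SVM) in one pipeline.
model = make_pipeline(PCA(n_components=30), SVC())
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # fraction of test images classified correctly
print(f"test accuracy: {accuracy:.3f}")
```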

PyTorch — PyTorch was developed at Facebook’s AI Research lab. It offers an ecosystem that supports development in Computer Vision and NLP. PyTorch defines a class called Tensor to store and operate on multidimensional rectangular arrays of numbers (similar to NumPy arrays).
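Because PyTorch tensors mirror NumPy arrays, moving image data between the two is cheap. This sketch, assuming PyTorch is installed, converts a synthetic height x width x channels image into the channels-first, batched float layout that PyTorch vision models typically expect, then converts it back.

```python
import numpy as np
import torch

# A deterministic synthetic 32x32 RGB image in NumPy's height x width x channels layout.
image = (np.arange(32 * 32 * 3) % 256).astype(np.uint8).reshape(32, 32, 3)

tensor = torch.from_numpy(image)                # shares memory with the NumPy array
chw = tensor.permute(2, 0, 1).float() / 255.0  # channels first, values scaled to [0, 1]
batch = chw.unsqueeze(0)                       # add a batch dimension: (1, 3, 32, 32)

# And back again: drop the batch dimension, restore H x W x C, convert to NumPy.
back = batch.squeeze(0).permute(1, 2, 0).numpy()
```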

OpenCV — OpenCV is a library of programming functions aimed at real-time Computer Vision. It can perform image operations such as reading and writing images, face detection, face landmark detection and text recognition.