An Introduction to Convolutional Networks with Pytorch and Fast.ai


Course 1 — Part 1

Although there is an abundance of resources online for deep learning (and finding the good ones is even harder), there is a shortage of courses that take a practical approach, and that's how I came across Fast.ai. It's a rare gem even among courses, and you can do it absolutely free of cost. One of the things that sets fast.ai apart from other courses is that they embrace the "learning by doing" approach. Instead of throwing jargon at you that you may not understand, they ask you to implement code, then pick that code apart piece by piece while simultaneously explaining the neural network you are coding.

Before getting started with the article, you need to set up basics like a GPU, ensure that you've installed the various dependencies on your system, and download the dataset. Fast.ai advocates the use of Crestle or Paperspace (and earlier an AWS p2 instance), but these charge on an hourly basis. You can also set up a Google Cloud GPU instance, since $300 of credit is offered to all users for the first year. You can look at these options and find the necessary instructions on forums.fast.ai. Finally, you can find the actual notebook for the first week here. One other thing to be aware of is that fastai is a library written on top of PyTorch; it was written as a best-practices abstraction that helps in quick prototyping, i.e. it helps you build complex models with ease since you won't have to write them from scratch.

First let's look at the four lines of code we will be running; with these you will have successfully written your first neural network (the setup lines above them simply import the library and point it at the data):

from fastai.conv_learner import *  # fastai v0.7-style import

PATH = "data/dogscats/"  # wherever you downloaded the dataset
sz = 224                 # image size the pretrained resnet34 expects

arch = resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 2)

Neural Networks

Before we start picking apart parts of the code, let's try to understand what exactly neural networks are. A neural network can be thought of simply as a model that tries to approximate a function. Because NNs are inherently non-linear, they can model a wide variety of functions. Why neural networks? These models made a comeback in the last decade for two major reasons: they need a lot of data and a lot of computation to train, both of which were experiencing exponential growth in the late 2000s. Neural networks weren't this deep earlier, but it was found that deeper networks could learn and model better. It is this depth that we refer to as "Deep Learning".
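To make "approximating a function" concrete, here is a minimal sketch (not from the course, and using plain NumPy rather than fastai) of a one-hidden-layer network trained with gradient descent to fit the non-linear function f(x) = x²:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(256, 1))
y = X ** 2  # the target function the network will try to approximate

# one hidden layer of 16 units with a tanh non-linearity
W1 = rng.normal(0, 0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, size=(16, 1)); b2 = np.zeros(1)

lr = 0.1
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)            # forward pass
    pred = h @ W2 + b2
    err = pred - y                      # gradient of mean squared error
    dW2 = h.T @ err / len(X); db2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)    # backprop through tanh
    dW1 = X.T @ dh / len(X); db1 = dh.mean(0)
    W2 -= lr * dW2; b2 -= lr * db2      # gradient descent step
    W1 -= lr * dW1; b1 -= lr * db1

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
print(mse)  # small after training: the network has approximated x**2
```

The non-linearity (tanh) is what lets this tiny network fit a curve; without it, the two layers would collapse into a single linear map.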

Convolutional Neural Networks

Computer Vision is a field whose holy grail is making computers as adept at seeing as human eyes. Why computer vision? Human eyes effortlessly detect objects, distinguish one object from another, and classify many kinds of objects innately; teaching this to computers has been immensely difficult, but in the wake of deep learning we've found much-needed breakthroughs. It matters because it can enable autonomous vehicles and applications like facial detection.

Vanilla neural networks can't directly be used to solve computer vision problems, because such a network can't generalize well across a range of different photos. For example, consider the photos of the dogs below. These photos are fed into a neural network as pixel values. While the network may learn that the first photo is a dog, when the second photo is put in, how will the network know that it's a dog? Its pixel values are nowhere near those of the first photo, so a direct comparison of pixel values would not work out well. Let's think of another approach: what if we modeled the network in such a way that it detected edges first, which were then combined to detect legs and other "characteristics" unique to a dog? This is exactly what a convolutional neural network does.

Convolutional Neural Networks (CNNs) evaluate an image through convolved features. Consider the gif below: any image entered is first convolved with a filter, which helps in detecting edges and other low-level features in the earlier layers, and more complex features in the deeper layers of the network.

https://cdn-images-1.medium.com/max/1000/1*ZCjPUFrB6eHPRi4eyP6aaA.gif
An image convolved with a filter
From edges to complex features
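To see what "convolved with a filter" means, here is a minimal sketch (not from fastai) that applies a hand-made vertical-edge kernel to a toy image, the same operation a CNN's early layers learn to do with kernels of their own:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D convolution (strictly cross-correlation, as CNNs use)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# toy 6x6 image: dark left half, bright right half -> one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Sobel-style vertical-edge kernel
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

response = conv2d(image, kernel)
print(response)  # nonzero only where the window straddles the edge
```

Away from the edge the window sees a constant region and the positive and negative kernel weights cancel, so the filter responds only at the edge itself.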

Diving into the code

First let's look at the approach here: the way we are going to train the neural network is through transfer learning. What does that mean? The ResNet34 architecture has been trained on the ImageNet challenge dataset, which consists of 1000 different classes of objects and over 1.2 million photos. These classes are common everyday objects, so what we will do is take this network and finetune it to cats and dogs by showing it around 15k images of the same. This practice of taking a network pretrained on one dataset and finetuning it for another is called transfer learning. The finetuning can be done by training just the last layer or the entire network. In our code we will use the ResNet34 architecture, which is about 34 layers deep.

data = ImageClassifierData.from_paths(PATH,tfms=tfms_from_model(arch, sz))

The second line in our code creates an object that refers to our dataset; the PATH variable points to the location where you have the dataset downloaded. The tfms argument sets the size of the dataset images and can later be used to augment the dataset in case overfitting occurs.

learn = ConvLearner.pretrained(arch, data, precompute=True)

The third line in our code creates the learner. By default, the learner has all but the last layer of the network frozen; as I mentioned earlier, this is one way to finetune the network. The architecture and data are passed in. The precompute argument, when set to True, caches the activations of the frozen layers so that training the last layer is much faster; to finetune the entire network, you would later unfreeze it.

learn.fit(0.01, 2)

Finally, we look at the last line of our code. The fit function has two arguments: the first is the learning rate and the second is the number of epochs. The training of neural networks happens through something called gradient descent. You can picture gradient descent as trying to walk down a valley, where the learning rate is how big a step you take: if you take large steps you may skip over the bottom, but if you take very small steps you may never reach the bottom of the valley. The learning rate therefore has to be chosen carefully. The number of epochs is the number of times the network looks at the entire dataset.
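The valley analogy can be sketched in a few lines of plain Python (a toy example, not the course's code): gradient descent on f(x) = (x − 3)², whose "bottom of the valley" is at x = 3:

```python
def descend(lr, steps=50, x=0.0):
    """Run `steps` gradient descent updates on f(x) = (x - 3)^2."""
    for _ in range(steps):
        grad = 2 * (x - 3)   # derivative of (x - 3)^2
        x = x - lr * grad    # step against the gradient, scaled by lr
    return x

print(descend(0.1))    # converges close to the minimum at x = 3
print(descend(1.1))    # too large: each step overshoots and diverges
print(descend(0.001))  # too small: after 50 steps, still far from 3
```

The three calls show exactly the trade-off described above: a well-chosen learning rate walks to the bottom, too large a rate bounces out of the valley, and too small a rate barely moves.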

So there you have it: you have created your first convolutional neural network and understood the code needed to run it. Please take a look at the original notebook for the first week here to download the dataset. To read more on convolutional networks and get an in-depth understanding of them, you can read this article.

Source: Deep Learning on Medium