The three lines of code listed above have, in my opinion, revolutionized my understanding of how to get state-of-the-art results in deep learning. Having completed the in-person Deep Learning Certificate at the Data Institute at the University of San Francisco, taught by distinguished scholar Jeremy Howard, a founding researcher at fast.ai, I have gained a new level of understanding of the concepts. I come from a non-computer background and have reviewed and completed various deep learning courses from Coursera and Udacity, and I have to admit that my deep learning journey really accelerated after completing the Deep Learning Certificate at USF.
The three lines of code use advanced techniques such as different pretrained models (ResNet, Inception, VGG, and NASNet) as well as processes such as data augmentation, dropout, and random zooming. Once I understood the three lines, it became extremely easy to see what each parameter was being used for and which part of the fast.ai library was being invoked. It also became a lot easier to take the same or similar code and apply it to different datasets.
Looking under the hood, the code reveals the use of various Python modules from the fast.ai library, and although I believe the goal of fast.ai is to provide accessibility to the masses (those who don't have PhDs from Stanford), it is important to understand how the code above works and why it produces exceptional results. The fast.ai library consists of 27 scripts that deal with various aspects of deep learning, but in order to get a good grasp of what's under the hood, I found deep-diving into the following Python scripts very beneficial: conv_learner.py, dataset.py, transforms.py and learner.py.
Dissecting the 3 lines of code:
This line primarily deals with what kind of transformations we are going to use. The ‘arch’ parameter is the pretrained architecture or model we are going to use, and the ‘sz’ parameter is the size of the input image, bearing in mind that a smaller size will speed up training. The goal is to start with a smaller size in order to train quickly, then increase the size to improve accuracy and avoid over-fitting. The ‘aug_tfms’ parameter is the data augmentation portion of the code, where a single image can be randomly augmented in ways that do not change its interpretation, such as horizontal flipping, zooming and rotating. In the example above we are using a pre-defined list of side-on transformations (transforms_side_on) with random zooming at a scale of up to 1.2. This line of code is predominantly handled by transforms.py.
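To make the augmentation idea concrete, here is a toy sketch (not the fast.ai implementation, which operates on real image tensors with interpolation) of the two transforms mentioned above: a horizontal flip, which preserves the label of a side-on photo, and a random zoom of up to 1.2x, simulated here as a random crop. The function names are illustrative only.

```python
import random

def horizontal_flip(img):
    """Reverse each row of a 2D 'image' -- a label-preserving
    transform for side-on photos (cats, dogs, cars...)."""
    return [row[::-1] for row in img]

def random_zoom(img, max_zoom=1.2):
    """Keep a random sub-region whose sides are shrunk by a factor
    of up to max_zoom -- a toy stand-in for the library's
    interpolated zoom."""
    h, w = len(img), len(img[0])
    scale = random.uniform(1.0, max_zoom)
    new_h, new_w = max(1, int(h / scale)), max(1, int(w / scale))
    top = random.randint(0, h - new_h)
    left = random.randint(0, w - new_w)
    return [row[left:left + new_w] for row in img[top:top + new_h]]

img = [[1, 2, 3], [4, 5, 6]]
print(horizontal_flip(img))  # [[3, 2, 1], [6, 5, 4]]
```

Each training epoch sees a slightly different version of every image, which is why augmentation acts as a cheap form of regularization.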
In the second line, ‘ImageClassifierData.from_csv’ is used because we are providing training labels from a CSV file. Other input methods include .from_arrays and .from_paths, and this is handled by dataset.py. Here we provide the ‘path’ to the training and test folders as well as the location of the labels CSV file. Depending on the dataset, you may have to indicate the suffix of the images being used; in this case I am using JPEGs. The library also splits the dataset into training and validation sets using ‘val_idxs’; by default, 20% of the data is held out for validation. The batch size parameter, denoted by ‘bs’, specifies the number of images considered in each iteration. A larger batch size gives more accurate gradient updates; however, the batch size you can use depends on the size of the images and the amount of GPU memory available.
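The default 20% validation split can be sketched in a few lines of plain Python. This is only an illustration of the idea behind ‘val_idxs’ (the library has its own helper for this); the function name and seed here are my own choices.

```python
import random

def get_val_idxs(n, val_pct=0.2, seed=42):
    """Pick a random val_pct fraction of the row indices 0..n-1 to
    hold out as the validation set; the rest are used for training."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    n_val = int(n * val_pct)
    return rng.sample(range(n), n_val)

val_idxs = get_val_idxs(1000)
print(len(val_idxs))  # 200
```

Fixing the seed matters: if the split changes between runs, validation scores from different experiments are no longer comparable.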
The third line involves the learner class defined in learner.py; other types of learners, such as the one in conv_learner.py, extend this class. Here we again specify the pretrained architecture being used and pass in the data object. We also use precompute=True, which caches the activations produced by the pretrained architecture; in this state no data augmentation occurs. Setting precompute=False, on the other hand, enables data augmentation and makes the freeze/unfreeze functions meaningful: unfreeze makes all layers trainable, while freeze makes all layers un-trainable (i.e. frozen) except for the last.
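The freeze/unfreeze semantics can be illustrated with a toy learner that just tracks a per-layer trainable flag. This is a conceptual sketch of my own, not the fast.ai implementation (which toggles gradient computation on real layer groups), but the rule it encodes is the same one described above.

```python
class TinyLearner:
    """Toy model with named layers and per-layer 'trainable' flags,
    illustrating freeze/unfreeze semantics only."""

    def __init__(self, layer_names):
        self.trainable = {name: True for name in layer_names}

    def freeze(self):
        # Freeze every layer except the last one
        # (the newly attached classifier head).
        names = list(self.trainable)
        for name in names[:-1]:
            self.trainable[name] = False
        self.trainable[names[-1]] = True

    def unfreeze(self):
        # Make the whole network trainable for fine-tuning.
        for name in self.trainable:
            self.trainable[name] = True

learn = TinyLearner(['conv1', 'conv2', 'fc'])
learn.freeze()
print(learn.trainable)  # {'conv1': False, 'conv2': False, 'fc': True}
```

Training only the head first, then unfreezing, is the standard transfer-learning recipe: the pretrained features are left intact until the new classifier has learned something sensible.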
The ps parameter sets the dropout probability. Dropout is a powerful regularization technique in which randomly selected neurons are ignored during training. In this case ps is set to 0.5, so 50% of randomly selected neurons in the n-1 layer are dropped. The effect of this technique is that the network becomes less sensitive to any specific weights, which in turn results in a network that generalizes better and is less likely to over-fit.
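The mechanism is simple enough to write out by hand. Below is a minimal sketch of inverted dropout (the variant used by modern frameworks): each activation is zeroed with probability p, and the survivors are scaled by 1/(1-p) so the expected sum of the layer's output is unchanged and nothing needs to be rescaled at test time.

```python
import random

def dropout(activations, p=0.5, seed=0):
    """Inverted dropout: zero each activation with probability p and
    scale the survivors by 1/(1-p) so the expected total is preserved."""
    rng = random.Random(seed)
    out = []
    for a in activations:
        if rng.random() < p:
            out.append(0.0)           # this neuron is ignored this pass
        else:
            out.append(a / (1 - p))   # survivor, scaled up
    return out
```

With p=0.5, every surviving activation is doubled; a different random subset is dropped on every forward pass, so no single neuron can be relied upon.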
These three lines of code pack an astounding amount of ‘power’, so much so that I was able to get into the top 7% of the leaderboard in my first-ever Kaggle competition, all thanks to fast.ai. If you want to learn more about the three lines of code and fast.ai, please visit their website here.
Source: Deep Learning on Medium