Below is all the code you need to build a state-of-the-art image classifier. We learned all of these techniques in the last lesson.
# Imports, path to data, and hyperparameters
from fastai.conv_learner import *
PATH = "data/dogscats/"
# sz (image size) and bs (batch size) are used below but their values are not given in these notes; the numbers here are assumed placeholders
sz = 224
bs = 64
# Set up data augmentation, load the data from folders, and finally create a learner and fit it.
tfms = tfms_from_model(resnet50, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms, bs=bs)
learn = ConvLearner.pretrained(resnet50, data)
learn.fit(1e-2, 3, cycle_len=1)
# Unfreeze all layers. We have not covered bn_freeze yet, but basically it helps when fine-tuning deeper pretrained models like resnet50. Then we fit the model using different learning rates for the different layer groups.
learn.unfreeze()
learn.bn_freeze(True)
learn.fit([1e-5,1e-4,1e-2], 1, cycle_len=1)
# Finally, we use test time augmentation (TTA) to get a better result.
log_preds, y = learn.TTA()
metrics.log_loss(y, np.exp(log_preds)), accuracy(log_preds, y)
The fast.ai library does most of the work for us, so from this code you can only see intuitively what is happening. Also, you still need to run the learning rate finder.
If you are building something for mobile devices, TensorFlow is recommended because PyTorch is not well supported there. Jeremy showed an example where he used fastai and Keras code to classify images: the Keras code got 97% accuracy while the fastai code got 99%. So it is easier to do things in fastai, but you should still understand what the functions do.
- We take a convolutional filter (a.k.a. kernel) that is white on the right side and black on the left side. It turns each 3×3 area of the image into one pixel. Then we take another kernel that is white at the top and black at the bottom and get another new image. (From one image we have created two different images.)
- Then we change all negative values to zero (ReLU). After that we use max pooling. Max pooling takes each 2×2 area and writes the largest number from that area to the new layer.
- Then we apply a 3×3 convolutional filter again. After that we throw away the negative values (ReLU) and max pool again.
- Finally, we combine the two pixel grids and compare them to real letters, which gives us a percentage of how similar the pixels are.
What you should take from this example is an understanding of the structure of CNNs. Also, as you can see in the video, the first-layer kernels don't change a lot and have white and black in clear places, but in the second set of kernels the white and black look almost randomly placed.
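The steps in the list above can be sketched in plain NumPy. This is only a minimal illustration, not the code from the lesson: the kernel values, the toy 8×8 image, and the helper names `conv3x3`, `relu`, and `max_pool2x2` are all assumptions made for this example.

```python
import numpy as np

def conv3x3(image, kernel):
    """Slide a 3x3 kernel over the image (no padding): each 3x3 area becomes one pixel."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
    return out

def relu(x):
    """Change all negative values to zero."""
    return np.maximum(x, 0)

def max_pool2x2(x):
    """Keep the largest number from each non-overlapping 2x2 area."""
    h, w = x.shape
    h, w = h - h % 2, w - w % 2  # drop an odd last row/column if there is one
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Assumed kernels: "black" = -1, "white" = +1
left_black_right_white = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
top_white_bottom_black = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])

# Toy image: dark left half, bright right half (one vertical edge)
image = np.zeros((8, 8))
image[:, 4:] = 1.0

v = max_pool2x2(relu(conv3x3(image, left_black_right_white)))
h = max_pool2x2(relu(conv3x3(image, top_white_bottom_black)))
print(v)  # responds where the vertical edge is
print(h)  # all zeros: the image has no horizontal edge
```

The first kernel lights up only along the vertical edge, while the second finds nothing, which is the sense in which one image becomes two different images.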
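| class    | output | exp   | softmax |
|----------|--------|-------|---------|
| cat      | -1.83  | 0.16  | 0.00    |
| dog      | 2.85   | 17.25 | 0.09    |
| plane    | 3.86   | 47.54 | 0.26    |
| fish     | 4.08   | 59.03 | 0.32    |
| building | 4.07   | 58.78 | 0.32    |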
Softmax values are between 0 and 1, and together they add up to one. We use softmax in the last layer to decide what is in the image.
output is the number we get from the last linear layer of the network. Remember that in order to build complex functions we need both linear and non-linear layers. Softmax is a non-linear function.
exp is just e raised to the power of the output in that row. This makes the differences between the numbers much larger. After calculating the exps we add them up, which in our case gives 182.75. Finally, we calculate the softmax by dividing the exp in each row by the sum of the exps. So the first softmax calculation looks like this:
exp = e^-1.83 =~ 0.16
softmax = exp/sum(exp) = 0.16/182.75 =~ 0.00
Because we always divide each exp by the sum of all the exps, we get something between 0 and 1, and the results add up to one. Most of the numbers are small and only one or two probabilities are bigger because raising e to a power makes the differences bigger.
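The same calculation can be checked with a few lines of NumPy. This is a small sketch that uses the rounded outputs from the table, so the exps differ slightly from the table in the second decimal (the table was presumably computed from unrounded outputs), but the softmax values agree to two decimals. The variable names are just for this example.

```python
import numpy as np

labels = ["cat", "dog", "plane", "fish", "building"]
outputs = np.array([-1.83, 2.85, 3.86, 4.08, 4.07])  # rounded values from the table

exps = np.exp(outputs)        # e to the power of each output
softmax = exps / exps.sum()   # divide each exp by the sum of the exps

for label, o, e, s in zip(labels, outputs, exps, softmax):
    print(f"{label:10s} {o:6.2f} {e:7.2f} {s:5.2f}")

print("sum of softmax:", softmax.sum())
```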
zip is a built-in Python function that takes two (or more) sequences and combines them element by element. For example, if you have a list a which contains 0,1,2,3,4,… and a list b which contains “one”, “two”, “three”,… and you write zip(a,b), you get pairs where 0 is in the first row, first column and “one” is in the first row, second column. This is a handy function, and I recommend at least remembering that it exists, so that if you need it some day you can read the documentation more closely.
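A quick example of what that looks like in practice. The list contents are just illustrative. Note that in Python 3 zip returns an iterator, so it is wrapped in list() here to see the pairs, and it stops at the end of the shorter input.

```python
a = [0, 1, 2, 3, 4]
b = ["one", "two", "three"]

pairs = list(zip(a, b))  # zip is lazy in Python 3, so materialize it
print(pairs)             # [(0, 'one'), (1, 'two'), (2, 'three')]

# The most common use: loop over two lists in step
for number, word in zip(a, b):
    print(number, word)
```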
Sigmoid is the function you should use if you want to predict multiple things in one image.
softmax predicts: chair
sigmoid predicts: chair, plane, desk.
Sigmoid is calculated in the following way:
sigmoid = exp/(1+exp)
So now the sigmoid values don't have to add up to one, because each one is calculated independently of the others.
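To see the difference between the two, here is a small sketch. The labels, the output values, and the 0.5 threshold for deciding that something is in the image are all made up for this example.

```python
import numpy as np

def sigmoid(x):
    exp = np.exp(x)
    return exp / (1 + exp)

labels = ["chair", "plane", "desk", "cat"]   # hypothetical classes
outputs = np.array([2.1, 1.3, 0.8, -3.0])    # made-up last-layer outputs

# Softmax: probabilities compete with each other, so we pick the single best class
softmax = np.exp(outputs) / np.exp(outputs).sum()
print("softmax predicts:", labels[int(np.argmax(softmax))])

# Sigmoid: each class gets its own independent probability
probs = sigmoid(outputs)
predicted = [label for label, p in zip(labels, probs) if p > 0.5]  # 0.5 is an assumed threshold
print("sigmoid predicts:", predicted)
print("sum of sigmoids:", probs.sum())  # more than one here
```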
Source: Deep Learning on Medium