Unleash the power of Callbacks [Deep Learning]

If you are into deep learning and use the TensorFlow framework, this post should be quite insightful.

You might build great models using TensorFlow and classify almost anything on earth with deep learning, but sooner or later you will run into training issues such as:

  1. Limited compute power for tweaking and experimenting with different hyperparameters
  2. A power cut or loss of connection to an online Jupyter Notebook
  3. Not knowing which learning rate to use
  4. Not knowing when to stop training [how many epochs to use]
  5. And many more

Here comes your savior: callback functions. These are provided by the famous deep learning framework TensorFlow. They are functions that let you interact with the network while it trains, and even change hyperparameters during training.

This post is all about using callback functions to the fullest to make your training efficient.

Bonus :

1. Learn how to import any dataset from Kaggle into Google Colab

2. Train a Dogs vs Cats classifier directly from folders, without explicit label files

The dataset can be downloaded from here: CLICK HERE

The post is divided into 2 sections:

  1. Top five callback functions [that I use]
  2. Building a classifier using these callbacks

Top 5 Callback functions I use:

Here I will be talking about the callbacks I use most often while training.

a. ModelCheckpoint()

You must have played games with checkpoints, where you respawn at the checkpoint when you die rather than starting from the beginning. This is a similar concept: your model is saved at intervals, and in the end only the best model is kept, meaning the model with the best trained parameters according to a chosen metric.

Example:

tf.keras.callbacks.ModelCheckpoint(filepath='best_weights.hdf5', monitor='val_acc', save_best_only=True)

Here the .hdf5 model file is saved with the best weights according to validation accuracy, which I feel is important for better generalization. To dig deeper, check this.
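
If you need the checkpoint later, it can be loaded back into a model with the same architecture (a minimal sketch, assuming the model variable is called model):

model.load_weights('best_weights.hdf5')   # restores the best weights saved by ModelCheckpoint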

b. EarlyStopping()

There comes a time when you need to stop training your model, or else you will waste compute power for no reason because the accuracy has been flat for many epochs. This might even mean paying for a cloud service for nothing. This function helps you stop before that happens, based on a criterion such as validation loss, accuracy, or any other metric.

Example:

tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)

This callback will monitor the validation loss and stop training if it doesn't improve for 3 epochs; that's what patience refers to. Check this for further details.
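
A handy extra option is restore_best_weights, which rolls the model back to the weights from its best epoch once training stops (a minimal sketch; the argument exists in recent TensorFlow versions):

tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)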

c. Custom function using the Callback class

You might have a different criterion or condition you want to apply to the network while training, so you need to build a custom callback; no worries, TensorFlow has you covered. You just have to specify when you want the condition or function to apply, for example after an epoch ends or after a batch of images is processed, among others.

Example:

class c(Callback):
    def on_epoch_end(self, epoch, logs={}):
        if logs.get('val_acc') > 0.95:
            print('Too much accuracy so exiting')
            self.model.stop_training = True
        if logs.get('loss') < 0.1:
            print('Too much loss lost so exiting')
            self.model.stop_training = True

custom_callback = c()

You have to make a class that inherits from TensorFlow's Callback. As I said before, you need to specify when the condition should apply, here after an epoch ends. logs holds the metrics of the current training step. Here I stop training when the loss falls below 0.1 or the validation accuracy rises above 95% at the end of an epoch. For more information on the various callback hooks, click here.
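
Besides on_epoch_end, the Callback base class exposes several other hooks you can override; a small non-exhaustive sketch (the class name MyHooks is just for illustration):

class MyHooks(Callback):
    def on_train_begin(self, logs=None):
        print('training started')       # runs once, before the first epoch
    def on_epoch_begin(self, epoch, logs=None):
        pass                             # runs before every epoch
    def on_batch_end(self, batch, logs=None):
        pass                             # runs after every batch; logs holds the running metrics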

d. CSVLogger()

Do you build web apps where you need to showcase training and test graphs, or do you simply want to save the training information locally? There is a built-in callback for that too. It essentially writes a log file for your model, so you can plot it later using the Matplotlib or seaborn libraries.

Example:

tf.keras.callbacks.CSVLogger('logfile.csv')

It's as simple as giving it the name of the log file. Easy, right? To know more, click here.
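
Once training is done, the log file can be plotted in a few lines (a minimal sketch, assuming pandas is available and the acc/val_acc metric names used later in this post):

import pandas as pd
import matplotlib.pyplot as plt

log = pd.read_csv('logfile.csv')                      # columns include epoch, acc, loss, val_acc, val_loss
plt.plot(log['epoch'], log['acc'], label='train acc')
plt.plot(log['epoch'], log['val_acc'], label='val acc')
plt.xlabel('epoch')
plt.legend()
plt.show()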

e. LearningRateScheduler()

This one could be quite handy if you are new to machine learning and have zero clue what your learning rate should be. Someone, help!!! No worries, TensorFlow to your rescue.

This function is my favorite because it lets you decide the learning rate per epoch, or based on any other condition. Mastering the skill of choosing a learning rate is not easy, so why not use this?

Example:

def scheduler(epoch):
    if epoch < 10:
        return 0.001
    else:
        return 0.001 * tf.math.exp(0.1 * (10 - epoch))

tf.keras.callbacks.LearningRateScheduler(scheduler)

Define a function that takes the epoch number as input, uses it to decide when the learning rate should change, and returns the new rate. Easy, right? Here I use a constant learning rate of 0.001 for the first 10 epochs and then decrease it gradually. For more information regarding this, click here.
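
As a quick sanity check of what the schedule returns per epoch (a small sketch; these values line up with the lr column in the training log further below):

for e in range(9, 13):
    print(e, float(scheduler(e)))
# 9  0.001
# 10 0.001
# 11 ~0.00090484
# 12 ~0.00081873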

Building a classifier using these callbacks

As I always do, here is the code for implementing the whole network and playing with different callback functions.

Here I have used the cats vs dogs dataset. I know that's cliché, but what matters more is showing you how to use the different callback functions. This code runs on Google Colab, an amazing free tool for prototyping models; you can then scale the same code to any high-end paid cloud service. If you start directly on a paid service, you end up paying just for prototyping, which is not the kind of money you want to waste.

STEP 1 [Optional]:

Authenticate your Kaggle account for downloading the data directly to Google Colab

a. Upload the kaggle.json file, which you will find in the Account section of your Kaggle profile

from google.colab import files
files.upload()

b. Kaggle expects the file in a specific folder and needs it to have the right permissions

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

c. Download and unzip your data

!kaggle datasets download -d tongpython/cat-and-dog
!unzip cat-and-dog.zip

Step 2:

Import All the libraries

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.optimizers import RMSprop, Adam
from tensorflow.keras.losses import binary_crossentropy, categorical_crossentropy
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array
from tensorflow.keras.callbacks import Callback
import matplotlib.pyplot as plt
import random

TensorFlow for building the model, Matplotlib for plotting, and random for random number generation.

Step 3:

Set up image generators for data augmentation and load the images directly from folders, where each folder represents a label.

Remember to give the path to the parent folder containing one subfolder per class.

train_data_gen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
train_gen = train_data_gen.flow_from_directory('training_set/training_set', target_size=(64,64), batch_size=32, class_mode='binary')
valid_data_gen = ImageDataGenerator(rescale=1./255)
valid_gen = valid_data_gen.flow_from_directory('test_set/test_set', target_size=(64,64), batch_size=32, class_mode='binary')
'''
Found 8005 images belonging to 2 classes.
Found 2023 images belonging to 2 classes.
'''

We resize the images to 64×64 and apply some data augmentation such as flips and zooms.
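
If you want to confirm which label was assigned to which folder, the generator exposes the mapping (a small sketch; the exact folder names depend on the dataset, here presumably cats and dogs):

print(train_gen.class_indices)   # e.g. {'cats': 0, 'dogs': 1}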

Step 4:

View your augmented images

def disp():
    x, y = train_gen.next()
    idx = random.randint(0, x.shape[0] - 1)   # pick a random image from the batch
    print("This belongs to class " + str(y[idx]))
    plt.imshow(x[idx])
    plt.show()
disp()
'''
This belongs to class 1.0
'''

Step 5:

Build your model

model = Sequential()
model.add(Conv2D(input_shape=(64,64,3),filters=32,kernel_size=(3,3),activation='relu'))
model.add(MaxPooling2D(2,2))
model.add(Conv2D(filters=32,kernel_size=(3,3),activation='relu'))
model.add(MaxPooling2D(2,2))
model.add(Flatten())
model.add(Dense(128,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['acc'])
model.summary()

The Architecture

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 62, 62, 32)        896
max_pooling2d (MaxPooling2D) (None, 31, 31, 32)        0
conv2d_1 (Conv2D)            (None, 29, 29, 32)        9248
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 32)        0
flatten (Flatten)            (None, 6272)              0
dense (Dense)                (None, 128)               802944
dense_1 (Dense)              (None, 1)                 129
=================================================================
Total params: 813,217
Trainable params: 813,217
Non-trainable params: 0
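
A quick sanity check on the parameter count: the first Conv2D has 3×3×3×32 + 32 = 896 parameters, the second 3×3×32×32 + 32 = 9,248, the first Dense layer 6272×128 + 128 = 802,944, and the output Dense 128×1 + 1 = 129, which adds up to 813,217.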

Step 6 [Important]:

Declare your callbacks

class c(Callback):
    def on_epoch_end(self, epoch, logs={}):
        if logs.get('val_acc') > 0.95:
            print('Too much accuracy so exiting')
            self.model.stop_training = True
        if logs.get('loss') < 0.1:
            print('Too much loss lost so exiting')
            self.model.stop_training = True

custom_callback = c()
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath='best_weights.hdf5', monitor='val_acc', save_best_only=True)
csv = tf.keras.callbacks.CSVLogger('logfile.csv')
term_nan = tf.keras.callbacks.TerminateOnNaN()

def scheduler(epoch):
    if epoch < 10:
        return 0.001
    else:
        return 0.001 * tf.math.exp(0.1 * (10 - epoch))

lr = tf.keras.callbacks.LearningRateScheduler(scheduler)

All of these callbacks were discussed above, plus TerminateOnNaN(), which simply stops training if the loss becomes NaN.

Step 7:

Training Begins

history = model.fit_generator(train_gen, steps_per_epoch=8005//32, epochs=50, validation_data=valid_gen, validation_steps=2023//32, callbacks=[custom_callback, early_stopping, checkpointer, csv, term_nan, lr])

Here steps_per_epoch refers to the number of steps needed to cover all the images at the batch size we set in the image generators: 8005 // 32 = 250 training steps and 2023 // 32 = 63 validation steps per epoch.

Note: newer TensorFlow versions print a warning here, "Please use Model.fit, which supports generators," because fit_generator is deprecated.
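
If you want to follow that advice, the same call goes through model.fit directly, which accepts generators in TF 2.x (a minimal sketch with identical arguments):

history = model.fit(train_gen, steps_per_epoch=8005//32, epochs=50, validation_data=valid_gen, validation_steps=2023//32, callbacks=[custom_callback, early_stopping, checkpointer, csv, term_nan, lr])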

Epoch 1/50 250/250 [==============================] — 33s 131ms/step — loss: 0.6877 — acc: 0.5511 — val_loss: 0.6743 — val_acc: 0.5724 — lr: 0.0010

Epoch 2/50 250/250 [==============================] — 33s 131ms/step — loss: 0.6511 — acc: 0.6166 — val_loss: 0.6297 — val_acc: 0.6731 — lr: 0.0010

Epoch 3/50 250/250 [==============================] — 33s 132ms/step — loss: 0.6015 — acc: 0.6733 — val_loss: 0.5697 — val_acc: 0.6969 — lr: 0.0010

Epoch 4/50 250/250 [==============================] — 33s 133ms/step — loss: 0.5625 — acc: 0.7105 — val_loss: 0.5571 — val_acc: 0.7197 — lr: 0.0010

Epoch 5/50 250/250 [==============================] — 34s 134ms/step — loss: 0.5343 — acc: 0.7293 — val_loss: 0.5374 — val_acc: 0.7376 — lr: 0.0010

Epoch 6/50 250/250 [==============================] — 33s 133ms/step — loss: 0.5142 — acc: 0.7425 — val_loss: 0.5054 — val_acc: 0.7584 — lr: 0.0010

Epoch 7/50 250/250 [==============================] — 33s 133ms/step — loss: 0.4785 — acc: 0.7711 — val_loss: 0.4850 — val_acc: 0.7748 — lr: 0.0010

Epoch 8/50 250/250 [==============================] — 34s 134ms/step — loss: 0.4599 — acc: 0.7781 — val_loss: 0.5026 — val_acc: 0.7738 — lr: 0.0010

Epoch 9/50 250/250 [==============================] — 33s 134ms/step — loss: 0.4413 — acc: 0.7941 — val_loss: 0.4764 — val_acc: 0.7808 — lr: 0.0010

Epoch 10/50 250/250 [==============================] — 33s 133ms/step — loss: 0.4302 — acc: 0.8003 — val_loss: 0.4914 — val_acc: 0.7808 — lr: 0.0010

Epoch 11/50 250/250 [==============================] — 33s 133ms/step — loss: 0.4138 — acc: 0.8060 — val_loss: 0.4670 — val_acc: 0.7812 — lr: 0.0010

Epoch 12/50 250/250 [==============================] — 33s 134ms/step — loss: 0.3900 — acc: 0.8224 — val_loss: 0.4950 — val_acc: 0.7827 — lr: 9.0484e-04

Epoch 13/50 250/250 [==============================] — 34s 134ms/step — loss: 0.3767 — acc: 0.8299 — val_loss: 0.4581 — val_acc: 0.7877 — lr: 8.1873e-04

Observe the constant learning rate for the first 10 epochs and how it decays afterwards; training actually ended well before 50 epochs, at the 16th epoch. The last learning rate shown in the log above is 8.1873e-04.

Last Step:

Evaluate the model

Train Accuracy of 82% and Validation Accuracy of 78.7%

Train loss of 0.3767 and validation loss of 0.4581
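
These numbers come from the last epochs of the training log; to compute the validation figures explicitly, you can load the best checkpoint saved by ModelCheckpoint and evaluate on the validation generator (a minimal sketch):

model.load_weights('best_weights.hdf5')
val_loss, val_acc = model.evaluate(valid_gen, steps=2023 // 32)
print(val_loss, val_acc)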

Thank you! To visualize your convolutions, do visit my previous post.

Hope this post helped you. Happy deep learning!