One-Cycle Policy, Cyclic Learning Rate, and Learning Rate Range Test

Source: Deep Learning on Medium

TensorFlow and Keras tools

Keras callbacks that can complete your training toolkit

By Pisek K

In deep learning, the learning rate is a key hyperparameter that governs how a model converges to a good solution. Leslie Smith has published two papers covering the cyclic learning rate (CLR), the one-cycle policy (OCP), and the learning rate range test (LrRT). He claims that CLR/OCP helps a model converge faster and has a regularizing effect. Jeremy Howard of fastai loved the idea and integrated it into the fastai library. But fastai runs on PyTorch. What about Keras?

For Keras, a few callbacks that implement OCP/CLR are available on GitHub (such as this one from the keras-contrib repository). They cycle the learning rate but do not change the momentum. Moreover, they require the cycle length, in steps (batches), to be given when the callback is initialized.
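To make the "cycle length in steps" requirement concrete, here is a minimal sketch of the triangular CLR rule from Smith's paper, written as a plain Python function rather than a Keras callback. The function name and the default values are illustrative assumptions, not taken from any of the callbacks mentioned above.

```python
import math


def triangular_clr(step, step_size, base_lr=0.001, max_lr=0.006):
    """Learning rate at a given step under the triangular CLR policy.

    step_size is half a cycle, measured in steps (batches). The caller has
    to supply it, which is the inconvenience described above.
    Names and defaults here are illustrative assumptions.
    """
    # Index of the current cycle, starting at 1.
    cycle = math.floor(1 + step / (2 * step_size))
    # Distance from the peak of the current cycle, scaled to the range 0..1.
    x = abs(step / step_size - 2 * cycle + 1)
    # Linear interpolation between base_lr (x = 1) and max_lr (x = 0).
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

With step_size=4, the rate starts at base_lr at step 0, peaks at max_lr at step 4, and is back at base_lr at step 8. Note that momentum is left untouched here, just as in the callbacks described above.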

I think there are ways to do OCP/CLR better:

  • Update learning rate
  • Update momentum
  • Automatically calculate the cycle length from the number of epochs given to model.fit or model.fit_generator
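The three points above can be sketched as a single schedule function. This is only an illustration under simplifying assumptions: one symmetric triangular cycle, no final annihilation phase, and hypothetical names and defaults (one_cycle_schedule, div_factor, mom_max, mom_min). It is not the code from my repository.

```python
def one_cycle_schedule(epochs, steps_per_epoch, lr_max=0.01, div_factor=10.0,
                       mom_max=0.95, mom_min=0.85):
    """Return a list of (learning_rate, momentum) pairs, one per training step.

    The cycle length is derived from epochs * steps_per_epoch, so the caller
    never has to give it in steps. All names and defaults are assumptions.
    """
    total_steps = epochs * steps_per_epoch
    if total_steps < 2:
        raise ValueError("need at least two training steps")
    half = total_steps / 2.0
    lr_min = lr_max / div_factor
    schedule = []
    for step in range(total_steps):
        # frac rises from 0 to 1 over the first half, then falls back toward 0.
        frac = 1.0 - abs(step - half) / half
        # The learning rate follows frac; momentum moves in the opposite direction.
        lr = lr_min + (lr_max - lr_min) * frac
        mom = mom_max - (mom_max - mom_min) * frac
        schedule.append((lr, mom))
    return schedule
```

In a Keras callback, steps_per_epoch and epochs would typically be read from the parameters Keras passes to the callback at the start of training, and the pair for the current step would be written to the optimizer before each batch.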

We also need a better LrRT. Previously available LrRT modules (such as surmenok's and jeremyjordan's) can run a learning rate range test, but each has drawbacks. With surmenok's, you have to train for a whole epoch, which can be very large. jeremyjordan's needs the number of steps (batches) per epoch at initialization, and the range test is then distributed over the batches needed to complete a specified number of epochs. Neither is simple to use, and neither offers a weight decay search.

I think we can do LrRT better by:

  • Specifying how many steps the LrRT should run (so that we don't have to train for a whole epoch, only long enough to finish the range test)
  • Adding the ability to run the test for different weight decay values, using a validation set
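As a rough sketch of the first point, the range test can be driven by a fixed number of steps, with the learning rate growing exponentially from a small to a large value. The names below (lr_range_test_rates, run_range_test, step_fn, stop_factor) are hypothetical; step_fn stands in for whatever trains one batch at a given learning rate and returns a loss.

```python
def lr_range_test_rates(lr_start=1e-7, lr_end=10.0, num_steps=100):
    """Exponentially spaced learning rates, one per step of the range test."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    # Constant multiplicative factor between consecutive steps.
    ratio = (lr_end / lr_start) ** (1.0 / (num_steps - 1))
    return [lr_start * ratio ** i for i in range(num_steps)]


def run_range_test(rates, step_fn, stop_factor=4.0):
    """Run step_fn(lr) -> loss for each rate and record (lr, loss) pairs.

    step_fn is a hypothetical callable supplied by the caller. The test
    stops early once the loss exceeds stop_factor times the best loss
    seen so far, which is an assumed divergence criterion.
    """
    history = []
    best = float("inf")
    for lr in rates:
        loss = step_fn(lr)
        history.append((lr, loss))
        best = min(best, loss)
        if loss > stop_factor * best:
            break
    return history
```

For the second point, one option is to repeat run_range_test once per candidate weight decay value, each time with a step_fn that rebuilds the model with that weight decay and reports the validation loss, and then compare the resulting curves.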

The code can be found in my repository.

The following is an example of how to use it.