Continual/Lifelong Learning with Deep Architectures

Source: Deep Learning on Medium

“Towards Artificial Intelligence”

On 28 January 2019, Data Science Milan organized a workshop on Continual Learning at Mikamai. Deep learning models can learn to solve multiple tasks when trained on all of them at once, but what happens when a new task is introduced later?

“Continual/Lifelong Learning with Deep Architectures”, by Vincenzo Lomonaco, PhD student, Author, Teaching Assistant

In this talk Vincenzo Lomonaco explained the concepts behind Continual/Lifelong Learning. Artificial intelligence requires the ability to learn tasks sequentially, but standard neural networks struggle to do so because they suffer from “catastrophic forgetting”. When a deep network is trained on multiple tasks one after another, it loses the knowledge acquired on the earlier ones, because the weights that were important for a previous task get overwritten while fitting the following one.

The goal of Continual Learning is to overcome “catastrophic forgetting”, so that the architecture can smoothly update its prediction model across several tasks and data distributions.

There are several strategies to address this problem, and the talk covered three of them:

-Naïve Strategy;

-Rehearsal Strategy;

-Elastic Weight Consolidation Strategy.

Vincenzo demonstrated these strategies in a hands-on workshop on Google Colaboratory, using PyTorch and the MNIST dataset. Look at the GitHub repository.

After first training the model to a good accuracy on MNIST (94%), he permuted the pixels of the images to create a new task and tried to solve it with the same model, obtaining poor results.
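A minimal sketch of how such a permuted task can be built, assuming the standard Permuted MNIST setup in which one fixed random permutation is applied to the pixel positions of every 28x28 image while the labels stay the same. The function names are hypothetical, and plain Python lists stand in for the PyTorch tensors used in the workshop.

```python
import random


def make_permutation(n_pixels, seed):
    """Return a fixed random ordering of pixel indices (784 for 28x28 MNIST)."""
    rng = random.Random(seed)
    perm = list(range(n_pixels))
    rng.shuffle(perm)
    return perm


def make_permuted_task(images, labels, seed):
    """Build a new task by reordering the pixels of every flattened image
    with the same permutation; the labels are left unchanged."""
    perm = make_permutation(len(images[0]), seed)
    permuted = [[img[i] for i in perm] for img in images]
    return permuted, list(labels)


# Usage with two dummy flattened "images" of 784 pixels each.
images = [[i / 783 for i in range(784)], [1 - i / 783 for i in range(784)]]
labels = [3, 7]
task2_images, task2_labels = make_permuted_task(images, labels, seed=1)
print(task2_images[0][:5], task2_labels)
```

Each pixel keeps its value but moves to a new position, so the digits remain just as learnable, yet a network trained on the original layout now sees what is effectively a new input distribution. Using a different seed per task yields a whole sequence of tasks.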

The Naïve Strategy simply fine-tunes the model from one task to the next: back-propagation continues on the new task's data, starting from the weights learned on the previous one, with no mechanism to protect the old knowledge.
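The forgetting effect of naive fine-tuning can be reproduced on a much smaller problem. This sketch is not the workshop code: it replaces the MNIST network with a two-weight linear regression model trained by per-sample gradient descent, and uses two hypothetical toy tasks whose inputs lie along different directions but whose targets come from the same assumed weights `TRUE_W`, so a single model could in principle solve both. After each task it records the loss on every task seen so far.

```python
import random

TRUE_W = [2.0, -1.0]  # hypothetical weights that solve both toy tasks


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    """Mean squared error of the linear model on a list of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def make_task(direction, n=50, seed=0):
    """Toy task: inputs are multiples of `direction`, targets come from TRUE_W."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        x = [t * d for d in direction]
        data.append((x, predict(TRUE_W, x)))
    return data


def sgd_epoch(w, data, lr, rng):
    """One shuffled pass of per-sample gradient descent on the squared error."""
    data = list(data)
    rng.shuffle(data)
    for x, y in data:
        err = predict(w, x) - y
        w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
    return w


def naive_strategy(tasks, epochs=100, lr=0.1, seed=0):
    """Fine-tune the same weights on each task in turn, using only its data."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    history = []  # history[k][j]: loss on task j after training on task k
    for k, task in enumerate(tasks):
        for _ in range(epochs):
            w = sgd_epoch(w, task, lr, rng)
        history.append([mse(w, t) for t in tasks[: k + 1]])
    return w, history


tasks = [make_task([1.0, 0.0], seed=0), make_task([1.0, 1.0], seed=1)]
w, history = naive_strategy(tasks)
print(w, history)
```

After the first task its loss is essentially zero. Fine-tuning on the second task then moves the weights to about [1.5, -0.5], which fits the new task but raises the loss on the first one again: a small-scale version of catastrophic forgetting.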

The Rehearsal Strategy, instead, combines the data of the current task with data from the previous tasks and shuffles them together before training, so that the network keeps revisiting old examples while it learns the new one.
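A minimal sketch of rehearsal on a toy problem rather than MNIST: a two-weight linear regression model and two hypothetical tasks whose targets come from shared assumed weights `TRUE_W`. A memory of past samples is shuffled into each new task's training data. The memory size `mem_per_task` and the random choice of stored samples are illustrative assumptions; the workshop's exact buffer policy may differ.

```python
import random

TRUE_W = [2.0, -1.0]  # hypothetical weights that solve both toy tasks


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    """Mean squared error of the linear model on a list of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def make_task(direction, n=50, seed=0):
    """Toy task: inputs are multiples of `direction`, targets come from TRUE_W."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        x = [t * d for d in direction]
        data.append((x, predict(TRUE_W, x)))
    return data


def sgd_epoch(w, data, lr):
    """One pass of per-sample gradient descent on the squared error."""
    for x, y in data:
        err = predict(w, x) - y
        w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
    return w


def rehearsal_strategy(tasks, epochs=100, lr=0.1, mem_per_task=20, seed=0):
    """Train on each task mixed with stored samples from earlier tasks."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    memory = []   # (x, y) pairs kept from previous tasks
    history = []  # history[k][j]: loss on task j after training on task k
    for k, task in enumerate(tasks):
        for _ in range(epochs):
            batch = list(task) + memory  # combine current and old samples
            rng.shuffle(batch)           # interleave them before the pass
            w = sgd_epoch(w, batch, lr)
        memory += rng.sample(task, mem_per_task)  # store a random subset
        history.append([mse(w, t) for t in tasks[: k + 1]])
    return w, history


tasks = [make_task([1.0, 0.0], seed=0), make_task([1.0, 1.0], seed=1)]
w, history = rehearsal_strategy(tasks)
print(w, history)
```

Because the stored samples keep contributing gradients, the weights settle at about [2.0, -1.0], which fits both tasks, instead of drifting to a solution that serves only the latest one. The cost is having to store part of the old data.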

The last method relies on regularization: weight updates are constrained so as to preserve the knowledge acquired on previous tasks and avoid “catastrophic forgetting”. The Elastic Weight Consolidation Strategy (EWC) estimates the importance of each weight through the Fisher information and adds a regularization term to the loss that penalizes changes to the weights that were important for previous tasks.
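The EWC objective for a new task B, after training on a task A that ended with weights θ*_A, is commonly written as L(θ) = L_B(θ) + Σ_i (λ/2) · F_i · (θ_i - θ*_A,i)², where F_i is the diagonal of the Fisher information and λ (`lam` below) sets how strongly old knowledge is protected. The sketch below applies it to a toy problem, not MNIST, under stated assumptions: the model is a two-weight linear regression with a unit-variance Gaussian likelihood, for which the diagonal Fisher reduces to the mean of x_i² over task A's inputs (deep networks typically estimate it from squared gradients of the log-likelihood instead). The tasks, `TRUE_W` and λ = 10 are hypothetical choices for illustration.

```python
import random

TRUE_W = [2.0, -1.0]  # hypothetical weights that solve both toy tasks


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    """Mean squared error of the linear model on a list of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def make_task(direction, n=50, seed=0):
    """Toy task: inputs are multiples of `direction`, targets come from TRUE_W."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        x = [t * d for d in direction]
        data.append((x, predict(TRUE_W, x)))
    return data


def diagonal_fisher(data):
    """Diagonal Fisher information of a unit-variance Gaussian linear model:
    E[(d log p / d w_i)^2] = E[x_i^2], averaged over the task's inputs."""
    dim = len(data[0][0])
    return [sum(x[i] ** 2 for x, _ in data) / len(data) for i in range(dim)]


def train(w, data, lr, epochs, anchor=None, fisher=None, lam=0.0):
    """Per-sample gradient descent on squared error plus, if an anchor is
    given, the EWC penalty (lam / 2) * sum_i F_i * (w_i - anchor_i) ** 2."""
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            grad = [2 * err * xi for xi in x]
            if anchor is not None:
                grad = [g + lam * f * (wi - a)
                        for g, f, wi, a in zip(grad, fisher, w, anchor)]
            w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


task_a = make_task([1.0, 0.0], seed=0)
task_b = make_task([1.0, 1.0], seed=1)

w_a = train([0.0, 0.0], task_a, lr=0.1, epochs=100)  # plain training on A
fisher_a = diagonal_fisher(task_a)                    # weight importance for A
w = train(w_a, task_b, lr=0.1, epochs=100,
          anchor=w_a, fisher=fisher_a, lam=10.0)      # train on B with EWC
print(w, mse(w, task_a), mse(w, task_b))
```

In this toy, task A's Fisher is zero for the second weight, so EWC leaves that weight free to fit task B while anchoring the first one; the model ends near [2.0, -1.0] with both losses close to zero, whereas plain fine-tuning on B alone would pull the first weight away from its task-A value. In real networks Fisher entries are rarely exactly zero, so λ trades off retaining old tasks against learning new ones.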

Look at the video.

Written by Claudio G. Giancaterino

Originally published on February 3, 2019.