Transfer Learning - An Unorthodox Explanation

Hey Folks! I am one of those people who are really interested in the latest buzzwords in technology like Machine Learning, Neural Networks, and whatnot. The world of technology is growing so fast that it has become close to impossible to keep up with it. I have been learning and applying the concepts of Deep Learning for over 2 years now. In this introductory post, I want to give a quick and unorthodox overview of the concept of Transfer Learning.

Why Transfer Learning?

One of the first and foremost questions that will arise for the reader is: why transfer learning? There is a simple reason why I decided to start my blog on this particular topic. In the realm of machine learning, knowledge is power. The goal of an ML scientist anywhere on this earth is to increase the amount of knowledge his/her model can represent. Transfer Learning shows us that if any one of us manages to increase the knowledge covered by a model by even 1%, the rest of the world can benefit from that gain too. It has a much bigger significance as well: transfer learning lets models share their knowledge and become better together. So, for my first blog, I think it’s justifiable to start on a topic which might well change the landscape of ML research.

Definition & Building Blocks

The formal definition of Transfer Learning is the reuse of a pre-trained model on a new, yet related problem. So, let’s examine this closely. One of the terms in that definition is that of a pre-trained model. What does that mean? Well, it’s as simple as it sounds. A pre-trained model is a model that has already been trained on some dataset and is able to represent that data well. In practice, we have a lot of CNN architectures available online with weights pre-trained on the ImageNet dataset. Now, the main idea of transfer learning is to take this already learned model and apply it to a related problem. For CNN architectures pre-trained on ImageNet, a related problem would be detecting a specific type of flower, bird, or animal.
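
To make the idea concrete, here is a minimal sketch of loading such a pre-trained model, assuming TensorFlow/Keras is installed; the choice of ResNet50 and the input shape are purely illustrative, not part of the definition above.

```python
import tensorflow as tf

# ResNet50 with weights learned on ImageNet; include_top=False drops the
# original 1000-class classifier so the network can be reused on a new problem.
base_model = tf.keras.applications.ResNet50(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3),
)
base_model.trainable = False  # freeze the pre-trained knowledge

print(base_model.output_shape)  # (None, 7, 7, 2048): features reusable for a new task
```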

A simple example

Instead of getting into the hard specifics of this vast technique, the idea of this post is to give you an intuition of what it is. With that motivation in mind, let me give you a really simple, yet helpful example. Imagine that you are a student at a university. Each year, you are given a set of courses that you must complete, and you have to pass the examination for each. So how does this happen? When you start off, you don’t have any knowledge of the subject. A professor who has been teaching this subject for a long time comes into your class and starts to give lectures on it. You take those lectures down as notes and use said notes to pass the exam. Simple, right?

Now, let’s look more closely at the example and relate it to the concept of Transfer Learning. The first entity in the example is the student. The student is analogous to the new, untrained model that needs to learn a specific task; in the student’s case, that task is the specific course he/she has enrolled in. The next entity in the example is the professor. The professor is analogous to the pre-trained model that has a bigger knowledge base. Think about it: the professor has had years of experience in his field and has a vast knowledge of not only the specific course the student enrolls in but also other subjects in his field. The professor in our case is a perfect candidate for a pre-trained model that can be used for transfer learning. As far as the student is concerned, the only objective is to pass the examination for his course. So, the pre-trained model transfers only the knowledge relevant to that specific course. Now, if the student is good at learning, he can easily pick up the bits and pieces of knowledge given by the professor and pass his examination. In fact, the student may even outperform the professor in this particular examination, although the professor has more knowledge about the field as a whole.

Why is it used?

By this time, I hope the reader has understood the basic concept behind transfer learning and some of its advantages. In this section, I just want to drive the point home by addressing the question of why transfer learning might be used for a particular problem. For many of these reasons, I will revisit the student example to make things easier to follow.

  1. It saves a lot of time. This is one of the easiest things to observe directly from the student example. If the student had to learn everything that the professor did just to pass that one exam, well, I wouldn’t want to be that student. In most cases, transfer learning is a quick fix for a huge problem. In technical terms, the training time of the model drops dramatically when transfer learning is applied.
  2. It does not require huge amounts of data. Let’s return to our favorite example. The “data” for learning a subject is its books. For the amount of knowledge that the professor has, he must have read thousands of books by a variety of authors on the same subject. But the only data that the student needed to pass the examination is the lecture notes that he took down during the lecture. The amount of “data” required in the case of transfer learning is strikingly low.
  3. Sharing is caring, right? Transfer Learning is one of the most promising areas of research on the path toward general intelligence. It might be a key to producing an AI with knowledge of a variety of different subjects at the same time.

How to use it?

There are many ways and many applications in which transfer learning can be applied. The most basic and abstract steps of using this technique are listed below, followed by a minimal code sketch:

  1. Identify a pre-trained model: This step involves identifying an already available pre-trained model that is as close to the target problem as possible. In computer vision applications, a pre-trained CNN is the ideal model for transfer learning.
  2. Identify a way to transfer knowledge: The next step is identifying a way of transferring knowledge about the target dataset using the pre-trained model. This basically involves finding a way to represent the “knowledge” encoded by the pre-trained model in a form the target model can learn from. In computer vision applications, the output of the last convolutional layer is usually used as this representation.
  3. Training the target model: The final step involves using the “knowledge” provided by the pre-trained model to train the target model efficiently. This is usually the easiest part as it just involves fitting the target model to the intermediate representation of the dataset.
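
Putting the three steps together, here is a minimal sketch assuming TensorFlow/Keras is installed. The choice of ResNet50, the input shape, and the hypothetical flower dataset (with `num_classes` categories and `train_images`/`train_labels` arrays) are illustrative assumptions, not a prescribed recipe.

```python
import tensorflow as tf

num_classes = 5  # hypothetical number of flower species in the target dataset

# Step 1: identify a pre-trained model (ResNet50 trained on ImageNet).
base_model = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base_model.trainable = False  # freeze the transferred knowledge

# Step 2: use the output of the last convolutional block as the knowledge
# representation, pooled into a single feature vector per image.
features = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)

# Step 3: train a small target model (a single dense classifier) on top of
# the transferred features.
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(features)
model = tf.keras.Model(inputs=base_model.input, outputs=outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # hypothetical training data
```

Freezing the base model keeps the transferred knowledge intact during training; if more accuracy is needed, a few of its top layers can later be unfrozen and fine-tuned with a small learning rate.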

Summary

In this post, an intuitive explanation of the concept of Transfer Learning was provided through easily understandable and relatable examples.

Specifically, we learned:

  1. What is Transfer Learning?
  2. Advantages of Transfer Learning
  3. A real-world example of Transfer Learning
  4. How to use Transfer Learning?

Please comment below with any queries or suggestions about the quality and content of this post. I will do my best to answer them.