Building AI: you might not need as much data as you think (transfer learning)

Transfer learning is a deep learning technique for reusing what a network has learned on one task to accelerate learning on a similar one. When designing intelligent software, keep transfer learning in mind: it opens up possibilities that traditionally needed Google-level quantities of data.

Supervised learning: high level overview

Let’s take a common machine learning situation, supervised learning: learning to make accurate predictions from human-provided examples.

Take the task: “given this image, tell me what object you see in the image”. For the machine to learn, it needs examples, provided by humans:

  • Some inputs (e.g. images of cats, bicycles, houses)
  • The expected outputs for those inputs (e.g. a text label describing the object found in each image: cat, bicycle, house, etc.)

To achieve this there are two steps.

  1. Create a trained network:
  • Take the inputs (e.g. images)
  • Predict outputs and check the predictions against the human-provided outputs
  • Tweak the configuration of the network to reduce errors, then try again, until the network’s results are good enough (or, more technically, “unlikely to change with further tweaks”). Training can take a long time (hours, days, weeks, even months), which is why the abundance of cloud computing power in recent years has helped machine learning immensely

  2. Predict outputs from new inputs:

  • Give new inputs (e.g. a new image)
  • Ask for the predicted output. This is usually very quick (effectively real time, or close to it) and cheap to do.
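The two steps above can be sketched in plain Python. This is a toy illustration, not a real deep learning setup: a single-weight model (y = w * x) stands in for a network, and all names and numbers are made up for the example. Real frameworks such as PyTorch or Keras automate this loop, but the shape is the same: predict, measure error, tweak, repeat.

```python
# Step 1: create a trained "network" from human-provided examples.
inputs = [1.0, 2.0, 3.0, 4.0]           # e.g. images; here, just numbers
expected = [2.0, 4.0, 6.0, 8.0]         # human-provided labels (here y = 2x)

w = 0.0                                 # the network's single "weight"
learning_rate = 0.01
for _ in range(1000):                   # tweak until good enough
    for x, y in zip(inputs, expected):
        prediction = w * x              # predict an output
        error = prediction - y          # check against the human label
        w -= learning_rate * error * x  # tweak the weight to reduce error

# Step 2: predict outputs from new inputs -- fast and cheap.
new_input = 5.0
print(round(w * new_input, 2))          # close to 10.0: the model learned y = 2x
```

The expensive part is the training loop; once `w` has converged, each prediction is a single multiplication, which is why step 2 is effectively real time.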

Transfer learning is a way to speed up learning by reusing the learning from a similar task. For example, reusing a network trained for general image recognition to accelerate learning for X-ray recognition.

When can you expect transfer learning to work well?

  • When the two tasks have the same kind of inputs and outputs (e.g. for both general object recognition and X-ray recognition, the input is an image and the output is a text description of the object found)
  • You have a lot more data (10–100x) for the first task than for the second.
  • You believe that the low-level features learned for the first task (task A) could be helpful for learning the second (task B) — there is something in common. For general object recognition, low-level features such as edges, lines, curves, dots, and parts of shapes are likely to be very similar to the low-level features of X-ray images

Transfer learning works like this:

  • Take a network trained for one task (such as image recognition: predicting the object in an image, trained on millions of images). This original training is often called pretraining.
  • Reuse most of that trained network, but reset the values of the highest one or two layers
  • Connect it to a new data set
  • Train again, updating only the reset layers (often called fine-tuning). In theory, as the low-level features are the same, training should be faster and/or need less data.
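These steps can be sketched as a toy in plain Python: a "network" split into frozen low-level features (the reused pretraining) and a top layer that is reset and retrained on the new task. Everything here is illustrative — the feature function, the data, and the weights are invented for the example; real transfer learning would freeze the lower layers of a pretrained model in a framework such as PyTorch or Keras.

```python
def low_level_features(x):
    # Pretend these features were learned on task A (e.g. edges and
    # curves from general images). We keep them frozen -- the "transfer".
    return [x, x * x]

# Reset the top layer: fresh weights for the new task (task B).
head = [0.0, 0.0]
learning_rate = 0.01

# Connect to a (small) new data set: labels follow y = 3*x + 1*x^2,
# i.e. a combination the frozen features can already express.
new_inputs = [1.0, 2.0, 3.0]
new_labels = [3 * x + x * x for x in new_inputs]

# Train again -- but only the top layer, so far less data is needed.
for _ in range(2000):
    for x, y in zip(new_inputs, new_labels):
        feats = low_level_features(x)
        prediction = head[0] * feats[0] + head[1] * feats[1]
        error = prediction - y
        # Tweak only the head; the frozen features are never touched.
        head[0] -= learning_rate * error * feats[0]
        head[1] -= learning_rate * error * feats[1]

print([round(h, 1) for h in head])  # close to [3.0, 1.0]
```

Because only two weights are being trained instead of the whole network, three examples are enough here — which is the whole point: the hard-won low-level features come for free from the first task.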


This is substantially derived from Andrew Ng’s excellent video on transfer learning.

Source: Deep Learning on Medium