Recently I wrote about OpenAI's work in the meta-learning space, following the publication of the Reptile algorithm research paper and its initial TensorFlow implementation. Based on the feedback I received on that article, I thought it might be a good idea to review some of the fundamental concepts and history of meta-learning, as well as some of the popular algorithms in the space.
The ideas behind meta-learning can be traced back to 1979 and the work of Donald B. Maudsley, who referred to the new cognitive paradigm as “the process by which learners become aware of and increasingly in control of habits of perception, inquiry, learning, and growth that they have internalized”. A simpler definition can be found in the work of John Biggs (1985), who defined meta-learning as “being aware of and taking control of one’s own learning”. Those definitions are accurate from the cognitive science standpoint, but they are a bit hard to adapt to the world of artificial intelligence (AI).
In the context of AI systems, meta-learning can simply be defined as the ability to acquire knowledge versatility. As humans, we are able to learn multiple tasks simultaneously from minimal information. We can recognize a new type of object after seeing a single picture of it, or we can learn complex, multi-task activities such as driving or piloting an airplane. While AI agents can master really complex tasks, they require massive amounts of training on each atomic subtask, and they remain incredibly bad at multi-tasking. So the path to knowledge versatility requires AI agents to “learn how to learn” or, to use a more obnoxious term, to meta-learn :).
Types of Meta-Learning Models
Humans learn by following different methodologies tailored to specific circumstances. In the same way, not all meta-learning models follow the same techniques. Some meta-learning models focus on optimizing neural network structures, while others (like Reptile) focus more on finding the right datasets to train specific models. A recent research paper from the UC Berkeley AI Lab does a comprehensive job of enumerating the different types of meta-learning. Here are some of my favorites:
Few-Shot Meta-Learning
The idea of few-shot meta-learning is to create deep neural networks that can learn from minimalistic datasets, mimicking, for instance, how babies can learn to identify objects by seeing only a picture or two. The ideas behind few-shot meta-learning have inspired the creation of techniques such as memory-augmented neural networks and one-shot generative models.
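To make "learning from minimalistic datasets" a bit more concrete, here is a minimal sketch of how a few-shot training episode is commonly assembled: pick a handful of classes, keep only a few labeled examples per class as the support set, and hold out a few more as queries. The function name `sample_episode` and its parameters are my own illustrative choices, not something taken from a specific paper or library.

```python
import numpy as np

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Return index arrays for an N-way, K-shot episode (hypothetical helper)."""
    # Pick n_way distinct classes for this episode.
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        # Shuffle the indices of every example that belongs to class c.
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:k_shot])                   # the few labeled examples
        query.extend(idx[k_shot:k_shot + n_query])     # held-out examples to classify
    return np.array(support), np.array(query)

# Toy dataset: 5 classes with 10 examples each (labels only, for illustration).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 10)
support_idx, query_idx = sample_episode(labels, n_way=3, k_shot=1, n_query=2, rng=rng)
```

With `k_shot=1` this is the one-shot setting: the model sees a single labeled example per class before it is asked to classify the queries.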
Optimizer Meta-Learning
Optimizer meta-learning models are focused on learning how to optimize a neural network to better accomplish a task. Those models typically include a neural network that applies different optimizations to the hyperparameters of another neural network in order to improve performance on a target task. A great example of optimizer meta-learning is the family of models that focus on improving gradient descent techniques, like the one published in this research.
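As a rough illustration of the idea, the sketch below shrinks the "optimizer" down to a single learnable number, the step size of plain gradient descent, instead of a full neural network. An outer loop tunes that step size so that a short inner training run ends with a lower loss across a couple of toy tasks. The tasks, function names, and the finite-difference meta-gradient are all assumptions made for this example; this is not the method from the research mentioned above.

```python
import numpy as np

# Hypothetical toy tasks: minimize 0.5 * a * (w - t)**2, given as (a, t) pairs.
TASKS = [(1.0, 3.0), (2.0, -2.0)]

def inner_train(lr, a, t, steps=5):
    """Run a few gradient descent steps with step size lr; return the final loss."""
    w = 0.0
    for _ in range(steps):
        grad = a * (w - t)      # derivative of the task loss with respect to w
        w -= lr * grad
    return 0.5 * a * (w - t) ** 2

def meta_loss(lr):
    """Average final loss over all tasks when training with step size lr."""
    return float(np.mean([inner_train(lr, a, t) for a, t in TASKS]))

def meta_train(lr=0.05, meta_lr=1e-3, meta_steps=100, eps=1e-4):
    """Outer loop: adjust lr by gradient descent on the meta-loss."""
    for _ in range(meta_steps):
        # Central finite difference stands in for a true meta-gradient.
        meta_grad = (meta_loss(lr + eps) - meta_loss(lr - eps)) / (2 * eps)
        lr -= meta_lr * meta_grad
    return lr

initial_lr = 0.05
learned_lr = meta_train(initial_lr)
```

Comparing `meta_loss(initial_lr)` with `meta_loss(learned_lr)` shows whether the learned step size trains the inner problems faster. A real optimizer meta-learner would replace the single scalar with a network that proposes the update for each parameter.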
Metric Meta-Learning
The objective of metric meta-learning is to determine a metric space in which learning is particularly efficient. This approach can be seen as a subset of few-shot meta-learning in which we use a learned metric space to evaluate the quality of learning with a few examples. This research paper shows how to apply metric meta-learning to classification problems.
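One common way to use a metric space for few-shot classification is to average the embeddings of the few labeled examples of each class into a "prototype" and assign a new example to the nearest prototype. The sketch below assumes the embedding is a fixed linear map `W` (in a real system this would be a learned network), and all names are illustrative rather than taken from the paper referenced above.

```python
import numpy as np

def embed(x, W):
    """Map raw inputs into the metric space (a linear map stands in for a learned network)."""
    return x @ W

def prototypes(support_x, support_y, W):
    """Average the embedded support examples of each class into one prototype."""
    classes = np.unique(support_y)
    emb = embed(support_x, W)
    protos = np.stack([emb[support_y == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query_x, classes, protos, W):
    """Assign each query to the class whose prototype is closest (squared Euclidean distance)."""
    q = embed(query_x, W)
    dists = ((q[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy two-class, two-shot problem; the identity matrix is an assumed "learned" embedding.
W = np.eye(2)
support_x = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
support_y = np.array([0, 0, 1, 1])
classes, protos = prototypes(support_x, support_y, W)
pred = classify(np.array([[0.1, 0.0], [4.8, 5.2]]), classes, protos, W)
```

The meta-learning part, which this sketch leaves out, is training `W` across many such small problems so that distances in the embedded space line up with class membership.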
Recurrent Model Meta-Learning
This type of meta-learning model is tailored to recurrent neural networks (RNNs) such as Long Short-Term Memory (LSTM) networks. In this architecture, the meta-learner algorithm trains an RNN model that processes a dataset sequentially and then processes new inputs from the task. In an image classification setting, this might involve passing in the set of (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.
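The data flow described above can be sketched with a tiny, untrained vanilla RNN: each (example, label) pair is fed in one step at a time, and the query is then presented with an empty label slot so the network has to produce the prediction from what it has stored in its hidden state. The weight shapes, the name `run_episode`, and the use of a plain tanh RNN instead of an LSTM are simplifying assumptions for illustration only.

```python
import numpy as np

def run_episode(support_x, support_y, query_x, n_classes, params):
    """Feed (example, label) pairs sequentially, then the query; return class probabilities."""
    Wxh, Whh, Who = params
    h = np.zeros(Whh.shape[0])
    for x, y in zip(support_x, support_y):
        # Each step sees the example concatenated with its one-hot label.
        inp = np.concatenate([x, np.eye(n_classes)[y]])
        h = np.tanh(inp @ Wxh + h @ Whh)
    # The query arrives with a zeroed label slot: the label is what must be predicted.
    inp = np.concatenate([query_x, np.zeros(n_classes)])
    h = np.tanh(inp @ Wxh + h @ Whh)
    logits = h @ Who
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

# Random (untrained) weights, so the output is not meaningful yet.
rng = np.random.default_rng(0)
n_features, n_classes, n_hidden = 4, 2, 8
params = (
    rng.normal(scale=0.1, size=(n_features + n_classes, n_hidden)),
    rng.normal(scale=0.1, size=(n_hidden, n_hidden)),
    rng.normal(scale=0.1, size=(n_hidden, n_classes)),
)
support_x = rng.normal(size=(3, n_features))
support_y = np.array([0, 1, 0])
probs = run_episode(support_x, support_y, rng.normal(size=n_features), n_classes, params)
```

Meta-training would repeat this over many episodes and update the weights so that the query prediction is correct, which is what pushes the hidden state to act as a fast, task-specific memory.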
Source: Deep Learning on Medium