Original article was published on Deep Learning on Medium
Convolutional neural networks (CNNs) were inspired by biological processes: the connectivity pattern between their neurons resembles the organization of the animal visual cortex. CNNs can learn complex functions and scale to a large number of classes, as seen with the well-known ImageNet dataset, whose 1,000-class benchmark is widely used to measure the performance of computer vision algorithms.
In the past couple of years, these cutting-edge techniques have started to become available to the broader software development community. Industrial-strength packages such as PyTorch and TensorFlow give us the same kind of building blocks that companies like Google and Facebook use to build deep learning applications, from embedded and mobile devices to scalable clusters in the cloud, without having to hand-code the GPU matrix operations, partial-derivative gradients, and stochastic optimizers that make efficient applications possible.
On top of these sit user-friendly APIs such as fastai and Keras, which abstract away some of the lower-level details and let us focus on rapidly prototyping a deep learning computation graph.
In the future, I'd also like to build something that can estimate the amount of food in a photo and give its nutritional values, but for now, let's focus on the image classification part.
Food-101 is a challenging dataset of 101,000 images spanning 101 food classes, with 1,000 images per class. Each class has 750 training and 250 test images, and all images were rescaled to have a maximum side length of 512 pixels.
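As a quick sanity check, the per-class counts above imply the following totals for the whole dataset. This is a trivial sketch; the constant names are mine, and the only inputs are the numbers stated in the text.

```python
# Food-101 split arithmetic, using only the per-class counts stated above.
NUM_CLASSES = 101
TRAIN_PER_CLASS = 750
TEST_PER_CLASS = 250

images_per_class = TRAIN_PER_CLASS + TEST_PER_CLASS  # 1,000 images per class
total_images = NUM_CLASSES * images_per_class        # whole dataset
total_train = NUM_CLASSES * TRAIN_PER_CLASS          # training split
total_test = NUM_CLASSES * TEST_PER_CLASS            # test split
test_fraction = TEST_PER_CLASS / images_per_class    # share of each class held out

print("total:", total_images, "train:", total_train, "test:", total_test)
print("test fraction:", test_fraction)
```

So roughly three quarters of every class is available for training, and the held-out quarter is large enough to give a stable per-class accuracy estimate.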
The dataset contains the following classes:
Some of the classes are just variants of the same kind of food, which makes them hard to tell apart, even for a human. For example, steak and filet mignon differ mainly in which part of the animal the meat is cut from.
Even in the same class, images can vary wildly. For example, all of the images in Figure 2 have been labelled as “bread pudding”, yet even as a human, I think I’d struggle to classify them as such.
The creators of the dataset deliberately left the training images uncleaned, so they contain a number of mislabeled images and, as we can see, a wide range of brightness and color saturation. More fundamentally, it's clear that no two bread puddings are quite alike (or, it seems, alike at all). Classifying images with such high intra-class variation is hard.
If our model is to perform well “in the wild”, it needs to be able to handle these issues: real, non-professional photos of food aren’t going to be of uniform size and are unlikely to have perfect lighting or framing.
If you're only interested in the results, skip to the Results section.
Data augmentation and Dataloader
Data augmentation increases the diversity of the data available for training models without collecting new data, by applying random transformations to the existing images. Techniques such as cropping, padding, and horizontal flipping are commonly used when training large neural networks.
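To make cropping, padding, and flipping concrete, here is a minimal, library-free sketch of the common pad-then-random-crop recipe plus a random horizontal flip. It assumes a hypothetical toy format (a grayscale image stored as a list of rows of integers), and the function names are illustrative; it is not fastai's implementation, which works on tensors and offers many more transforms.

```python
import random
from typing import List

# Toy format (assumption): a grayscale image as a list of rows of pixel values.
Image = List[List[int]]


def pad(img: Image, p: int, fill: int = 0) -> Image:
    """Surround the image with a border of p pixels set to `fill`."""
    width = len(img[0]) + 2 * p
    top = [[fill] * width for _ in range(p)]
    middle = [[fill] * p + row + [fill] * p for row in img]
    bottom = [[fill] * width for _ in range(p)]
    return top + middle + bottom


def random_crop(img: Image, h: int, w: int, rng: random.Random) -> Image:
    """Cut out an h x w window at a uniformly random position."""
    top = rng.randint(0, len(img) - h)
    left = rng.randint(0, len(img[0]) - w)
    return [row[left:left + w] for row in img[top:top + h]]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def augment(img: Image, rng: random.Random, p: int = 1, flip_prob: float = 0.5) -> Image:
    """Pad by p pixels, crop back to the original size, then flip with probability flip_prob."""
    h, w = len(img), len(img[0])
    out = random_crop(pad(img, p), h, w, rng)
    if rng.random() < flip_prob:
        out = hflip(out)
    return out


rng = random.Random(0)
original = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
for _ in range(3):
    print(augment(original, rng))  # shifted and possibly mirrored copies
```

Because the crop is taken back at the original size, every augmented copy has the same shape as the input, so it can be batched alongside untouched images. The input itself is never modified.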
I have used fastai’s inbuilt data augmentation tools here:
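Besides geometric transforms, fastai v1's standard augmentation helper, `get_transforms()`, also applies random lighting changes controlled by its `max_lighting` and `p_lighting` parameters, which speaks directly to the brightness variation noted earlier. As a rough, library-free illustration of the idea, here is a simple linear brightness/contrast jitter. The defaults echo fastai v1's documented defaults (`max_lighting=0.2`, `p_lighting=0.75`), but fastai's own lighting math differs in detail, and the function names and the `[0, 1]` grayscale toy format are assumptions.

```python
import random
from typing import List

# Toy format (assumption): grayscale pixels scaled to the range [0, 1].
Image = List[List[float]]


def clamp(v: float) -> float:
    """Keep a pixel value inside [0, 1]."""
    return min(1.0, max(0.0, v))


def adjust_lighting(img: Image, brightness: float, contrast: float) -> Image:
    """Scale contrast around mid-grey (0.5), add a brightness shift, and clamp."""
    return [[clamp((px - 0.5) * contrast + 0.5 + brightness) for px in row] for row in img]


def random_lighting(img: Image, rng: random.Random,
                    max_change: float = 0.2, p: float = 0.75) -> Image:
    """With probability p, apply a random brightness shift and contrast factor."""
    if rng.random() >= p:
        return img  # leave this image unchanged
    brightness = rng.uniform(-max_change, max_change)
    contrast = rng.uniform(1 - max_change, 1 + max_change)
    return adjust_lighting(img, brightness, contrast)


rng = random.Random(42)
photo = [[0.2, 0.4], [0.6, 0.8]]
print(random_lighting(photo, rng))
```

Applied on the fly during training, a transform like this shows the model the same dish under many lighting conditions, which is one way to cope with the uneven, non-professional photos it will see "in the wild".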
Next, we create an `ImageList` using fastai's `ImageList.from_folder` method.
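In fastai v1, `ImageList.from_folder` collects the image files found under a directory, and a follow-up call such as `label_from_folder()` labels each image with the name of the folder it sits in. The sketch below reproduces only that idea with the standard library's `pathlib`. The function name `list_labelled_images` and the extension set are hypothetical, and it assumes a one-folder-per-class layout such as `images/apple_pie/<file>.jpg`.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Hypothetical set of extensions treated as images.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def list_labelled_images(root: Path) -> List[Tuple[Path, str]]:
    """Recursively collect image files under root, labelling each by its parent folder name."""
    items = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            items.append((path, path.parent.name))
    return items


# Usage on a tiny throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for cls, name in [("apple_pie", "1.jpg"), ("apple_pie", "2.jpg"), ("steak", "3.jpg")]:
        (root / cls).mkdir(exist_ok=True)
        (root / cls / name).touch()
    (root / "README.txt").touch()  # not an image, so it is skipped
    for path, label in list_labelled_images(root):
        print(path.relative_to(root), label)
```

Deriving labels from folder names keeps the labelling logic out of the code entirely: moving a file to another class folder is all it takes to relabel it.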
Let’s look at one image from each class.
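How the displayed images were chosen isn't specified. One simple approach, sketched below under that caveat, is to group `(item, label)` pairs by label and draw one item per label with a seeded random generator so the selection is reproducible. The function name `one_per_class` and the sample file names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple, TypeVar

T = TypeVar("T")


def one_per_class(items: List[Tuple[T, str]], seed: int = 0) -> Dict[str, T]:
    """Pick one random item for each label, reproducibly for a given seed."""
    by_label: Dict[str, List[T]] = defaultdict(list)
    for item, label in items:
        by_label[label].append(item)
    rng = random.Random(seed)
    # Iterate labels in sorted order so the same seed always yields the same picks.
    return {label: rng.choice(group) for label, group in sorted(by_label.items())}


samples = [("img_001.jpg", "apple_pie"), ("img_002.jpg", "apple_pie"),
           ("img_003.jpg", "steak"), ("img_004.jpg", "filet_mignon")]
for label, item in one_per_class(samples).items():
    print(f"{label}: {item}")
```

Fixing the seed means re-running the notebook shows the same grid of examples, which makes it easier to compare visually similar classes such as steak and filet mignon side by side.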