How to train CNNs on ImageNet

Original article was published on Deep Learning on Medium

A practical guide to using large image datasets for deep learning

I’ll go over how to get the ImageNet dataset and train your convolutional neural net on it. Along the way, I’ve added some advice and lessons learned that are specific to training CNNs with PyTorch.

Before you start

If you haven’t already, I recommend first trying to run your model on a sample image. When you’re starting out, it’s really tempting to jump to a big dataset like ImageNet to train your next state-of-the-art model. However, I’ve found it more effective to start small and slowly scale up your experiment. First, try a single image to make sure your code works. Then, try a smaller dataset like CIFAR-10. Finally, try it out on ImageNet. Do sanity checks along the way and repeat them for each “scale up”.

Also, be aware of the differences in your model for the smaller image sizes of one dataset vs the other. For example, CIFAR-10 has only 32×32 size images which are smaller than ImageNet’s variable image sizes. The average resolution of an ImageNet image is 469×387. They are usually cropped to 256×256 or 224×224 in your image preprocessing step.

In PyTorch, we don’t define an input height or width like we would in TensorFlow, so it’s your job to make sure the feature map sizes along the way are appropriate in your network for a given input size. My advice is to be wary of how dimensionality reduction occurs from shallow to deeper layers in your network, especially as you change your dataset.

VGG-16 Network Architecture. Output channels have shrinking dimensions (h x w). What errors could we come across if we change the input size? [Source:]
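To make the caption’s question concrete, here is a minimal sketch of the standard output-size formula for convolution and pooling layers, out = floor((n + 2p - k) / s) + 1, traced through the five VGG-16 stages (3×3 convolutions with padding 1, each stage followed by a 2×2 max-pool with stride 2). The function names are my own, chosen only for illustration.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Side length of the feature map after a conv or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def vgg16_feature_sizes(n):
    """Trace the feature map side length at the end of each of VGG-16's five stages."""
    convs_per_stage = [2, 2, 3, 3, 3]
    sizes = []
    for num_convs in convs_per_stage:
        for _ in range(num_convs):
            n = out_size(n, kernel=3, stride=1, padding=1)  # size-preserving conv
        n = out_size(n, kernel=2, stride=2)  # 2x2 max-pool halves the side length
        sizes.append(n)
    return sizes


print(vgg16_feature_sizes(224))  # [112, 56, 28, 14, 7]  (ImageNet-style crop)
print(vgg16_feature_sizes(32))   # [16, 8, 4, 2, 1]      (CIFAR-10)
```

With a 32×32 input the last feature map is only 1×1, so a classifier head written for a 512×7×7 input would no longer match, and the deepest convolutions see almost no spatial context.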

In general, as you move to a dataset with a higher input resolution, the early receptive field should also increase in size (by increasing the kernel size or adding pooling layers).

This is for two reasons:

  1. Increasing the size of the early receptive field is a form of regularisation to guard your CNN from learning ultra specific details of images that are less generalisable.
  2. Conversely, when decreasing the input resolution, shrinking the early receptive field helps avoid premature shrinkage of the feature maps. Applying a convolution to a 256×1×1 tensor is of little use.

Both of these errors fail silently. They result in only an 8% decrease in top-1 accuracy when an ImageNet-shaped ResNet is improperly applied to CIFAR-10. To correct this, when moving from CIFAR-10 to ImageNet, the ResNet authors add an early max-pool layer and use a larger initial kernel size (3×3 → 7×7).
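As a rough illustration of how much the stem changes the early receptive field, the sketch below applies the usual recursion (each layer adds (k - 1) times the current stride product) to two stems. The layer lists are my reading of the ResNet paper (a 7×7 stride-2 convolution plus a 3×3 stride-2 max-pool for ImageNet, a single 3×3 stride-1 convolution for CIFAR-10) and should be treated as assumptions, as should the function name.

```python
def receptive_field(layers):
    """Return (receptive field, total stride) in input pixels for (kernel, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) current steps
        jump *= stride             # spacing between neighbouring outputs, in input pixels
    return rf, jump


imagenet_stem = [(7, 2), (3, 2)]  # assumed: 7x7 conv stride 2, then 3x3 max-pool stride 2
cifar_stem = [(3, 1)]             # assumed: single 3x3 conv, stride 1, no pooling

print(receptive_field(imagenet_stem))  # (11, 4)
print(receptive_field(cifar_stem))     # (3, 1)
```

The ImageNet-style stem sees an 11×11 patch and downsamples by 4 before the first residual block, which would leave a 32×32 CIFAR-10 image at 8×8 almost immediately.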

I’d really recommend reading this blog post by Andrej Karpathy for a deeper intuition of this art. I’d also recommend this post by Tim Rocktäschel with advice for short-term ML projects.

Downloading ImageNet

This is best done in a cloud environment. Unless you have access to a powerful GPU and a large SSD, I wouldn’t recommend doing this locally.

Before doing any training, spin up a Google Colab instance or an AWS SageMaker instance and use a Jupyter Notebook to experiment with your model and visualise the data being passed in. Then, when you want to train your model, I’d recommend using a script and spinning up an EC2 instance with the AWS Deep Learning AMI. Attach an EBS volume to your EC2 instance with enough storage space for downloading and unzipping ImageNet. For 2012 ImageNet, the compressed download is 150GB, but you will need ~400GB, since you need enough space to unzip the files before deleting the .tar archives. Using an EBS volume also means you can upgrade your EC2 instance without having to re-download the data.

Now to actually download ImageNet, the official instructions are to sign up as a researcher with your research institution here.

I don’t think Stanford has been maintaining this for quite some time, as you’ll never get the email invite. What I found effective instead is to download ImageNet from Academic Torrents.

Search for ImageNet, get the desired magnet links, and use the CLI to download torrents with Transmission. Make sure your instance has internet access!

sudo yum install transmission transmission-daemon transmission-cli

Then set up your download directory

transmission-daemon --download-dir "your-download-directory-path"

And add your magnet link

transmission-remote -a "magnet-link"

Find other important commands here.

Once you have downloaded the compressed files, we’d like to extract them and put them in the correct folders so that they match what the PyTorch ImageFolder class expects as described in the documentation here.

Place ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar in the same folder as the following script to get the desired folders. Edit as necessary for your specific torrent.
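As a hedged Python sketch of what such a script needs to do (not the original script), the functions below assume the usual layout of the 2012 archives: the training tar holds one inner tar per class, named by WordNet ID, and the validation tar extracts to a flat folder of images. The `filename_to_wnid` mapping is a hypothetical argument; you have to build it yourself from the ground-truth labels that come with your download.

```python
import os
import shutil
import tarfile


def extract_train(train_tar, out_dir):
    """Unpack the training archive into out_dir/<wnid>/<image>, one folder per class."""
    os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(train_tar) as outer:
        for member in outer.getmembers():
            if not member.name.endswith(".tar"):
                continue
            wnid = os.path.splitext(os.path.basename(member.name))[0]
            class_dir = os.path.join(out_dir, wnid)
            os.makedirs(class_dir, exist_ok=True)
            # Open the inner tar directly from the outer one, without a temporary copy.
            with tarfile.open(fileobj=outer.extractfile(member)) as inner:
                for item in inner:
                    if not item.isfile():
                        continue
                    # Keep only the base name so files cannot escape class_dir.
                    target = os.path.join(class_dir, os.path.basename(item.name))
                    with inner.extractfile(item) as src, open(target, "wb") as dst:
                        shutil.copyfileobj(src, dst)


def sort_val(val_dir, filename_to_wnid):
    """Move already-extracted, flat validation images into val_dir/<wnid>/."""
    for filename, wnid in filename_to_wnid.items():
        class_dir = os.path.join(val_dir, wnid)
        os.makedirs(class_dir, exist_ok=True)
        shutil.move(os.path.join(val_dir, filename), os.path.join(class_dir, filename))
```

After both steps, the train and val folders each contain one subfolder per class, which is the layout ImageFolder reads.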

I’d also recommend throwing both .tar files onto a bucket in S3 so you can get them from there next time. Don’t upload the uncompressed files, since S3 charges per request for each individual object.

Setting up Data Loaders

I’d recommend setting up your usage of PyTorch’s DataLoader and ImageFolder in a module named after the dataset. I’ve found that this helps keep dataset-specific augmentations in separate files. For use with ResNet, set up your default batch size, your normalising transformation, and the crop that is specific to this dataset. In another file you could have the dataset loader with settings specific to CIFAR-10 (with a different batch size, normalisation, and crop).

Training with ImageNet

I would not recommend training a model on a massive dataset like ImageNet or Sports1M in a Jupyter notebook. You may hit timeouts, and your instance can disconnect from stdout, which means you won’t see the progress your model is making either. A safer option is to SSH in and train with a script inside a screen session.

I would also recommend using an experiment-tracking tool to follow progress in a neat visual dashboard. Some people use TensorBoard or TensorBoardX for PyTorch, but I’ve yet to try that out. I liked having a tracker because it keeps my results around even after I’ve closed the instances, and lets me easily compare experiments.

Now use your data loaders with your model, your choice of an optimiser, and your choice of loss to train on ImageNet. It’ll look like some variation of the following pseudocode:

# one epoch
model.train()  # enable training behaviour (dropout, batch-norm updates)
for i, (images, target) in enumerate(train_loader):
    # compute output
    output = model(images)
    loss = criterion(output, target)

    # measure accuracy and record loss
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
    losses.update(loss.item(), images.size(0))
    top1.update(acc1[0], images.size(0))
    top5.update(acc5[0], images.size(0))

    # compute gradient and do step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

This is only for training. Use this in a loop with a validation function to alternate training and scoring on the validation set in each epoch. For more examples on how to do this, look at the official PyTorch examples here.
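The loop above calls `accuracy` and updates meters such as `losses` and `top1` without defining them. Below is one possible sketch of those helpers plus a small validation function, loosely in the spirit of the official PyTorch examples rather than a copy of them. It assumes the model and the batches already live on the same device.

```python
import torch


class AverageMeter:
    """Keeps a running, sample-weighted average of a scalar such as the loss."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, value, n=1):
        self.sum += float(value) * n
        self.count += n

    @property
    def avg(self):
        return self.sum / max(self.count, 1)


def accuracy(output, target, topk=(1,)):
    """Top-k accuracy in percent for each k, as a list of 1-element tensors."""
    with torch.no_grad():
        # Indices of the highest-scoring classes per sample: shape (batch, max k).
        _, pred = output.topk(max(topk), dim=1)
        # correct[i, j] is True when the j-th guess for sample i is the true label.
        correct = pred.eq(target.view(-1, 1))
        results = []
        for k in topk:
            hits = correct[:, :k].any(dim=1).float().sum()
            results.append((hits * 100.0 / target.size(0)).view(1))
        return results


def validate(model, loader, criterion):
    """Score the model on a held-out loader without updating any weights."""
    losses, top1 = AverageMeter(), AverageMeter()
    model.eval()  # disable dropout, use running batch-norm statistics
    with torch.no_grad():
        for images, target in loader:
            output = model(images)
            losses.update(criterion(output, target).item(), images.size(0))
            top1.update(accuracy(output, target)[0], images.size(0))
    return losses.avg, top1.avg
```

Calling `validate` once after each training epoch gives the alternating train and score pattern described above.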

Remember to have a look at the data before it goes into your network at least once. This means actually visualising it. Here’s a sample sanity check below that you can use to make sure everything is going well during preprocessing.

For completeness, I’ve added some code above the sanity check to generate the denormalising transformation (to view the actual image without the effects of normalisation).
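One way such a check could look is sketched below: a denormalising helper that inverts (x - mean) / std, followed by a few basic assertions on a batch. The statistics are assumed to be the same ones passed to the normalising transform, and the function names are hypothetical.

```python
import torch

# Assumption: the same statistics that were given to the normalising transform.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def denormalise(image, mean=MEAN, std=STD):
    """Invert (x - mean) / std for one (C, H, W) tensor and clamp the result to [0, 1]."""
    mean = torch.tensor(mean, dtype=image.dtype, device=image.device).view(-1, 1, 1)
    std = torch.tensor(std, dtype=image.dtype, device=image.device).view(-1, 1, 1)
    return (image * std + mean).clamp(0.0, 1.0)


def sanity_check(images):
    """Check a normalised (N, 3, H, W) batch and return its first image as (H, W, 3)."""
    assert images.dim() == 4 and images.size(1) == 3, "expected an (N, 3, H, W) batch"
    assert torch.isfinite(images).all(), "batch contains NaN or inf"
    # Channels-last is the layout plotting libraries such as matplotlib expect.
    return denormalise(images[0]).permute(1, 2, 0)
```

If matplotlib is installed, `plt.imshow(sanity_check(images).cpu().numpy())` on a batch from the training loader should show a recognisable, correctly coloured crop.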

Now have fun training & keep your sanity with sanity checks! 😄