Kaggle Protein Image Classification Competition: A Brief Review

Source: Deep Learning on Medium

Author: kaze

Recently, some friends and I took part in the ‘Human Protein Atlas Image Classification’ challenge on Kaggle. In the end we won a silver medal (27th out of 2,172 teams), which came as a total surprise. I think it is worth writing a brief summary of what we did during the competition, as a contribution back to the machine learning/deep learning community, from which we have learnt a lot.


We trained a ResNet-50 model with a few tricks.

Problem statement

Each team was given an image dataset from the Human Protein Atlas and asked to predict the protein organelle localisation labels for each sample (image). The dataset contains 28 labels, and a single image can carry more than one of them. The main challenges were:


  1. Extreme class imbalance: some classes have so few samples that they are hard to train on, let alone predict. At the same time, they seem to carry a lot of weight in the ranking.
  2. The data distribution is inconsistent between the train and test splits.
  3. The external HPAv18 data and the high-resolution images are both useful, but it is difficult to strike a balance between inference efficiency and accuracy.
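To see why rare classes weigh so heavily, here is a minimal sketch that assumes the leaderboard score is macro-averaged F1 (the unweighted mean of per-class F1), which is our understanding of the competition metric. The helper names `f1_for_class` and `macro_f1` are our own, and the two-class toy data are invented for illustration.

```python
# Hedged sketch: macro-averaged F1 for multi-label predictions.
# Assumes the ranking metric is macro F1, so every class counts equally
# no matter how many samples it has.

def f1_for_class(y_true, y_pred, c):
    """Binary F1 for label index c over all samples (0/1 label vectors)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] and p[c])
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t[c] and p[c])
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] and not p[c])
    if tp == 0:
        return 0.0  # no true positives: treat F1 as 0
    return 2 * tp / (2 * tp + fp + fn)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    n_classes = len(y_true[0])
    return sum(f1_for_class(y_true, y_pred, c) for c in range(n_classes)) / n_classes


# Toy data: class 0 is common (9 of 10 samples), class 1 is rare (1 of 10).
y_true = [[1, 0]] * 9 + [[0, 1]]
pred_a = [[1, 0]] * 9 + [[0, 0]]                  # perfect on class 0, misses the rare sample
pred_b = [[1, 0]] * 7 + [[0, 0]] * 2 + [[0, 1]]   # misses 2 common samples, finds the rare one

print(macro_f1(y_true, pred_a))
print(macro_f1(y_true, pred_b))
```

Model A is right on 9 of 10 samples yet scores 0.5, while model B, which makes more mistakes on the common class, scores 0.9375, because finding the single rare sample is worth half of the total score.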

Model structure and training details

We adopted the fastai code from a public notebook kernel, switched the backbone to ResNet-50 and enlarged the input size to 512×512, since we believed we needed a stronger feature extractor and more image detail.
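One rough way to see what the larger input buys: ResNet-50 shrinks each spatial side by a factor of 32 before its final pooling, so the last feature map grows from 7×7 at the usual ImageNet size of 224×224 to 16×16 at 512×512. The sketch below works this out layer by layer, assuming the standard torchvision-style ResNet-50 layout; it is arithmetic only, not our training code, and the function names are hypothetical.

```python
# Hedged worked arithmetic: spatial size of ResNet-50's final feature map.
# Assumes the standard layout: 7x7 stride-2 stem conv, 3x3 stride-2 max-pool,
# then stride-2 downsampling at the start of the last three stages.

def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_final_map(size):
    size = conv_out(size, kernel=7, stride=2, padding=3)  # stem convolution
    size = conv_out(size, kernel=3, stride=2, padding=1)  # max-pool
    for _ in range(3):  # stride-2 3x3 convs opening stages conv3_x, conv4_x, conv5_x
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return size


for side in (224, 512):
    s = resnet50_final_map(side)
    print(f"{side}x{side} input -> {s}x{s} final feature map")
```

Each cell of the final map still summarises a 32×32 pixel region, so the larger input mainly gives the classifier more spatial cells (256 instead of 49) to pool over, which helps with small organelle patterns.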

Like other high-ranking teams, we added the external dataset to our training data, and we also used oversampling during data augmentation. In addition, we lengthened the cycles of our cyclical learning-rate training schedule to get better performance. These measures proved useful.
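We do not describe our exact oversampling weights here, so the following is only a sketch of one common scheme for multi-label data: weight each sample by the inverse frequency of its rarest label, then draw training indices with replacement. The function names and toy labels are hypothetical.

```python
import random
from collections import Counter

# Hedged sketch of rarest-label oversampling for multi-label data.
# Assumes every sample has at least one label.

def rarest_label_weights(labels):
    """labels: list of label-index lists, one per sample."""
    counts = Counter(label for sample in labels for label in sample)
    # A sample is as "rare" as its rarest label.
    return [max(1.0 / counts[label] for label in sample) for sample in labels]


def oversampled_indices(labels, n_draws, seed=0):
    """Draw n_draws sample indices with replacement, favouring rare labels."""
    rng = random.Random(seed)
    weights = rarest_label_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=n_draws)


# Toy data: label 27 appears in only 10 of 100 samples.
labels = [[0]] * 90 + [[0, 27]] * 10
idx = oversampled_indices(labels, n_draws=1000)
share = sum(1 for i in idx if 27 in labels[i]) / len(idx)
print(f"share of draws containing label 27: {share:.2f}")
```

With these toy labels, label 27 appears in 10% of the raw samples but in roughly half of the oversampled draws. In a real pipeline the drawn indices would feed the data loader, with augmentation applied on top so repeated images are not identical.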

As I experimented with different training modules, I found that the focal loss over the whole training set was itself a good metric, while directly optimising the F1 score did not work well. When predicting on the test set, F1 was too sensitive to the thresholds, and it was almost impossible to track it consistently across different folds.
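The contrast between the two metrics can be illustrated with a small sketch. It assumes the standard binary focal loss applied independently to each label, with γ = 2 (the usual default, not necessarily our setting), and it uses a plain F1 pooled over all (sample, label) pairs rather than a per-class average to keep it short. Names and data are illustrative.

```python
import math

# Hedged sketch: binary focal loss averaged over every (sample, label) pair,
# FL(p_t) = -(1 - p_t)**gamma * log(p_t), with p_t = p if y == 1 else 1 - p.

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    total, n = 0.0, 0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
            p_t = p if y == 1 else 1 - p
            total += -((1 - p_t) ** gamma) * math.log(p_t)
            n += 1
    return total / n


def f1_at_threshold(probs, targets, threshold):
    """F1 pooled over all (sample, label) pairs after thresholding."""
    tp = fp = fn = 0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            predicted = p >= threshold
            if predicted and y == 1:
                tp += 1
            elif predicted:
                fp += 1
            elif y == 1:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


probs = [[0.45, 0.10], [0.55, 0.35], [0.30, 0.60], [0.48, 0.05]]
targets = [[1, 0], [1, 1], [0, 1], [1, 0]]

print("focal loss:", round(focal_loss(probs, targets), 4))
for t in (0.3, 0.4, 0.5):
    print("threshold", t, "F1", round(f1_at_threshold(probs, targets, t), 3))
```

On this toy data the F1 score drops from about 0.89 to 0.57 when the threshold moves from 0.4 to 0.5, while the focal loss is a single threshold-free number computed directly from the probabilities.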

In the final stage, we also used the public leaderboard (LB) as an additional validation set.

We regret not downloading the high-resolution dataset at the beginning (out of ‘laziness’); otherwise we might have achieved even better results. Still, since we were all busy with our day jobs and could only spare a couple of hours after work for the competition, we cannot complain.

We also tried other well-known classification networks, such as Xception and Inception v3, but they did not outperform our final model.

During the last three weeks, the three of us trained models separately and then ensembled them. Some of these ensembles scored slightly better on the public LB, but they did not hold up on the private LB. Clearly, fitting the validation set too closely is not a clever strategy.
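For reference, the simplest way to ensemble multi-label models is to average their per-class probabilities before thresholding. This sketch assumes plain averaging and a single 0.5 threshold, which are illustrative choices rather than a record of what we submitted; the helper names and model outputs are made up.

```python
# Hedged sketch: ensemble by averaging per-class probabilities, then thresholding.

def average_probs(model_outputs):
    """model_outputs: one [n_samples][n_classes] probability table per model."""
    n_models = len(model_outputs)
    return [
        [sum(vals) / n_models for vals in zip(*rows)]  # average one class across models
        for rows in zip(*model_outputs)                # rows: each model's row for one sample
    ]


def to_labels(probs, threshold=0.5):
    """Return, per sample, the label indices whose probability reaches the threshold."""
    return [[c for c, p in enumerate(row) if p >= threshold] for row in probs]


model_a = [[0.9, 0.2], [0.3, 0.7]]
model_b = [[0.8, 0.4], [0.6, 0.5]]
model_c = [[0.7, 0.3], [0.4, 0.9]]

avg = average_probs([model_a, model_b, model_c])
print(to_labels(avg))
```

Averaging reduces the variance of individual models, but if the ensemble members or thresholds are picked by public LB score, the ensemble can still overfit the public split, which matches what happened to us.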

I will try to post some of the code snippets we used during the competition. If you have any questions, leave me a message.