Three models for Kaggle’s “Flowers Recognition” Dataset

Screenshot from my notebook

The images above come from the Kaggle dataset “Flowers Recognition” by Alexander. The title of each image is its class name and index number in the dataset. The dataset contains 4242 images of flowers, divided into five classes: chamomile, tulip, rose, sunflower and dandelion. Each class has about 800 photos. The photos are not high resolution, about 320×240 pixels. I trained three CNN models on this dataset and got quite different results (the code and details can be viewed on my GitHub). Here is the process.


This is what the dataset looks like. As we can see, the zip folder has five sub-folders, one per class, each containing the images for that class.

Environment && Packages

  • Environment
    Because I don’t have a good GPU (sad face) and don’t want to spend money on experimental training, I used Google’s Colab, which provides a Tesla K80 GPU for free (actually it’s 20h runtime for free but I have trained my model for that long time).
  • Packages
    1. Python: 3.6.3
    2. Keras (tensorflow.python.keras) for building models
    3. OpenCV (cv2) for processing images
    4. scikit-learn (sklearn) for train test split


As you can see in the first image of this article, the picture dimensions are not uniform. After loading the images, all of them need to be resized before being split into a training part and a validation part. I tried two sizes here: 64×64 and 256×256.

Shape statistic for all images

I previously trained on the Dogs and Cats dataset with 64×64 input and got a reasonable result, with accuracy above 0.9. For this flowers dataset, the customised pre-trained ResNet-50 model only reached an accuracy of around 0.74 with 64×64 input. However, the accuracy increased to 0.92 with 256×256 input. I think this is a tradeoff between training time and accuracy: 64×64 input took only 4 minutes to train, while 256×256 input took 35 minutes. On the other hand, 256×256 input preserves more features of the original images than 64×64 input, so the model can be “smarter” with more features to learn.
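
To put rough numbers on that tradeoff, here is a small back-of-the-envelope sketch. It assumes 3-channel RGB input, and the helper `input_values` is only for illustration; the timings are the ones quoted above.

```python
def input_values(side, channels=3):
    """Number of raw values in one square image (RGB assumed)."""
    return side * side * channels

values_64 = input_values(64)     # 64 * 64 * 3
values_256 = input_values(256)   # 256 * 256 * 3

size_ratio = values_256 // values_64   # how much more data per image
time_ratio = 35 / 4                    # training minutes reported above

print(values_64, values_256, size_ratio, time_ratio)
```

Each 256×256 image carries 16 times as many values as a 64×64 one, while the reported training time grew by a factor of about 8.75.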

model_from_ResNet50 with 256×256 input

Moreover, in this specific dataset the flowers are sometimes quite small in the image. After resizing by interpolation, the features of these flowers are harder to recognise, even for human eyes.

Images from dataset with small flowers

After resizing, the samples are divided into two parts, one for training and one for validation. I didn’t split off a third part for testing because this is not a competition that needs a precise accuracy figure for my model, and the dataset is already quite small for training.
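
A minimal sketch of such a split with scikit-learn's `train_test_split` is shown below. The 80/20 ratio, the `random_state` value and the use of `stratify` are assumptions for illustration, and the zero arrays stand in for the resized images and their labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 images, 20 per class (five classes)
images = np.zeros((100, 64, 64, 3), dtype=np.float32)
labels = np.repeat(np.arange(5), 20)

# test_size=0.2 is an assumed ratio; stratify keeps the class balance
x_train, x_val, y_train, y_val = train_test_split(
    images, labels, test_size=0.2, random_state=42, stratify=labels
)
print(x_train.shape, x_val.shape)
```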

The train_images array is filled sub-folder by sub-folder, in sequence, so we need to shuffle the dataset. Otherwise, the model would only see “daisy” in the first 800 images, which is bad for optimising the parameters of the model. Note that the same seed needs to be set and applied to both train_images and train_labels, so that each image still matches the right label.
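
One way to keep images and labels aligned is to draw a single seeded permutation and index both arrays with it. This is a variant of the "same seed for both" idea rather than necessarily the code in the notebook, and `shuffle_together` is a hypothetical helper name.

```python
import numpy as np

def shuffle_together(images, labels, seed=0):
    """Shuffle images and labels with the same seeded permutation."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(images))
    return images[order], labels[order]

# Toy data in which image i holds the value i and has label i
toy_images = np.arange(10).reshape(10, 1)
toy_labels = np.arange(10)

shuffled_images, shuffled_labels = shuffle_together(toy_images, toy_labels, seed=7)

# Every image still sits next to its own label after shuffling
print(np.array_equal(shuffled_images[:, 0], shuffled_labels))
```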

Build The Model

The first model is built from scratch. It has three convolutional layers and two fully connected layers. The numbers of filters in the convolutional layers are 32, 64 and 128, a common setting for image classification tasks. The activation is ‘ReLU’ and the pooling size is 2×2. The two dense layers have sizes 512 and 128, each with ‘ReLU’ activation. A final dense layer of size 5 with ‘softmax’ activation is the output layer. The loss function is ‘categorical_crossentropy’. The optimiser is ‘adam’, which adapts the effective learning rate during training.
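
The description above could be sketched in Keras roughly like this. The 3×3 kernel size, the placement of a pooling layer after every convolution and the function name `build_scratch_cnn` are assumptions; only the filter counts, pooling size, dense sizes, activations, loss and optimiser come from the text.

```python
from tensorflow.keras import layers, models

def build_scratch_cnn(input_shape=(64, 64, 3), num_classes=5):
    """Three conv blocks (32, 64, 128 filters) followed by two dense layers."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),   # kernel size assumed
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # output layer
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

scratch_model = build_scratch_cnn()
scratch_model.summary()
```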

Summary for model built from scratch

The second model customises the pre-trained VGG19 model. Here I froze the first 5 layers (set them as untrainable) and added two dense layers of size 1028 with ‘ReLU’ activation. The output layer and the other settings are the same as in the first model.

The third model customises the pre-trained ResNet-50 model. Every setting is the same as in the second model, except that I froze only the first layer, which means more of the model is free to adapt to the new dataset I give it.
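
Both customised models follow the same pattern, which can be sketched as below. This is an illustration under assumptions, not the notebook's code: `build_transfer_model` and `n_frozen` are hypothetical names, and the global average pooling between the pre-trained base and the new dense layers is my assumption, since the text does not say how the two are connected.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50, VGG19

def build_transfer_model(base_cls, n_frozen, input_shape=(256, 256, 3),
                         num_classes=5, weights="imagenet"):
    """Pre-trained base without its top, plus two dense layers and a softmax."""
    base = base_cls(weights=weights, include_top=False, input_shape=input_shape)

    # Freeze the first n_frozen layers so their weights are not updated
    for layer in base.layers[:n_frozen]:
        layer.trainable = False

    x = layers.GlobalAveragePooling2D()(base.output)  # assumed connection
    x = layers.Dense(1028, activation="relu")(x)
    x = layers.Dense(1028, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Example calls (each downloads the ImageNet weights on first use):
# vgg_model = build_transfer_model(VGG19, n_frozen=5)
# resnet_model = build_transfer_model(ResNet50, n_frozen=1)
```

Note that in Keras the first entry of `base.layers` is the input layer, so freezing "the first layer" by index has little effect on which weights are trainable.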

Summary for model built from pre-trained model ResNet-50

Input Data

Considering that the dataset is small, data augmentation might be a useful way to improve the accuracy. Besides normalising the pixel values by dividing by 255, rotation, shift, shear, zoom and horizontal flip are applied to the input data using ImageDataGenerator in Keras.
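
A sketch of such a generator is shown below. The specific ranges (20 degrees, 0.1) are assumed values for illustration, and the random arrays stand in for the training data. Recent Keras releases mark ImageDataGenerator as deprecated, so this reflects the API as it was used at the time.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation for the training data; all ranges are assumed values
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalise pixel values to [0, 1]
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)

# Validation data is only rescaled, never augmented
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

# Placeholder batch: 8 random images with one-hot labels for 5 classes
x = np.random.randint(0, 256, size=(8, 64, 64, 3)).astype("float32")
y = np.eye(5)[np.random.randint(0, 5, size=8)]

batch_x, batch_y = next(train_datagen.flow(x, y, batch_size=4, seed=1))
print(batch_x.shape, batch_y.shape)
```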

Train and Evaluate the Model

The batch size I used is 32, which is quite common and works well with the GPU’s parallel computing.

I chose a different number of epochs for each of the three models.

  • For the first, I used 50 because it is built from scratch and needs a longer time to learn.
  • For the second (with the pre-trained VGG19 model), I used 10 because the accuracy for both training and validation was quite low (only 0.24). I haven’t figured out the reason for that yet. Even when I froze only the first layer, the result was similar. VGG19 has more complex layers and a larger number of parameters than ResNet-50, so maybe it is too complicated to adapt. In the future I might remove some of the top layers of VGG19, add my own simpler layers and train again.
  • For the third (with the pre-trained ResNet-50 model), I used 30. I didn’t expect it to fit so well in only around 12 epochs. Since VGG19 and ResNet-50 were both pre-trained on the same dataset, ‘ImageNet’, I expected the two models to behave similarly. It’s a little weird that I used nearly the same structure but got totally different results.

Model built from scratch
Model built from VGG19 model
Model built from ResNet-50 model

As we can see, the accuracy is very good for the customised ResNet-50 model. But I guess these five flower classes are already covered by the original ResNet-50 model, which is a bit of a cheat for my training. Still, for practice and for extending to future classification tasks, it makes sense.

Finally, I downloaded some random flower pictures and tested my models on them. The customised ResNet-50 model works perfectly well, and the model built from scratch can also make a rough prediction.
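
Turning the softmax output into a class name is a small step that can be sketched as follows. The class order is an assumption (alphabetical folder names), `decode_predictions` is a hypothetical helper, and the probabilities are made-up values standing in for the output of `model.predict`.

```python
import numpy as np

# Assumed class order: alphabetical sub-folder names
CLASS_NAMES = ["daisy", "dandelion", "rose", "sunflower", "tulip"]

def decode_predictions(probs, class_names=CLASS_NAMES):
    """Map each row of softmax probabilities to (class name, probability)."""
    probs = np.asarray(probs)
    best = probs.argmax(axis=1)
    return [(class_names[i], float(probs[n, i])) for n, i in enumerate(best)]

# Made-up probabilities standing in for model.predict(images)
fake_probs = np.array([
    [0.05, 0.05, 0.80, 0.05, 0.05],
    [0.10, 0.60, 0.10, 0.10, 0.10],
])
decoded = decode_predictions(fake_probs)
print(decoded)
```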

Prediction for images downloaded from Baidu
Prediction for validation images

So, that’s it for now. I might update my models in the future, and I hope I can find out why the second and third models differ so much.

Source: Deep Learning on Medium