CNN approach for predicting Movie Genre from Posters!

Original article was published on Deep Learning on Medium

A convolutional neural network (CNN) is a class of deep neural networks commonly used for image and video recognition and classification tasks.

Image by Sky Mane from Pixabay

Can we build a model that learns what makes a movie poster of one genre different from the others? A poster conveys a lot about its movie and plays a vital role in sparking a viewer’s interest. When you look at posters of different genres, you notice that they differ in characteristic ways: if a poster’s dominant colors are red and black, for instance, a trained model might learn to classify such images as ‘horror’ or ‘thriller’. This will be an interesting task to take on. For example, look at the poster below. All movie poster images used in this article were collected from IMDB.

We’re the Millers (2013)

It represents a comedy movie. Now take a look at the poster below, which is of an action movie. We can see that posters capture an important aspect of a movie’s genre.

Quantum of Solace (2008)

In this project, we will build a neural model that can distinguish between posters of three movie genres and predict the genre of any random poster. We will build this model step by step from scratch! The dataset used in this project was created with the help of IMDB and contains over 3,900 poster images for each genre — Action, Comedy, and Drama.

Let’s get to the coding portion.

1. Working with the dataset

Our dataset is structured as shown below. We have kept training images and test images in different directories. Each directory contains three subdirectories — Action, Comedy, and Drama. These subdirectories further contain the images of our movie posters.

Directory organization of the dataset
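The layout assumed in this article (directory names are as described above; the root path is illustrative) looks like this:

```
images2/
├── Training/
│   ├── Action/
│   ├── Comedy/
│   └── Drama/
└── Validation/
    ├── Action/
    ├── Comedy/
    └── Drama/
```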

We will use ImageDataGenerator for labeling purposes. ImageDataGenerator from the Keras API labels the data automatically based on the directory structure. It is also useful for implementing data augmentation. Let’s see how this is done in code.

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

TRAINING_DIR = "/images2/Training"
training_datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

VALIDATION_DIR = "/images2/Validation"
validation_datagen = ImageDataGenerator(rescale=1./255)

train_generator = training_datagen.flow_from_directory(
    TRAINING_DIR,
    target_size=(150, 150),
    class_mode='categorical',
    batch_size=256
)
validation_generator = validation_datagen.flow_from_directory(
    VALIDATION_DIR,
    target_size=(150, 150),
    class_mode='categorical',
    batch_size=64
)

We first create an instance of ImageDataGenerator for both training and validation. Since pixel values range from 0 to 255, we normalize them to the range 0 to 1 by passing the argument rescale=1./255 when creating each instance. We then call the .flow_from_directory() method on each instance to label the images in both directories, storing the results in train_generator and validation_generator. When calling this method we pass the target_size argument to ensure that all images are resized to the same dimensions. Because we have 3 classes, we set class_mode to 'categorical'. The batch sizes for training and validation depend on how many images the dataset contains.
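One detail worth knowing: flow_from_directory assigns class indices alphabetically by subdirectory name, and exposes the mapping via the generator's class_indices attribute. A minimal sketch of the mapping our three subdirectories produce:

```python
# flow_from_directory sorts subdirectory names alphabetically and
# assigns each class an index in that sorted order.
genres = ["Drama", "Action", "Comedy"]  # subdirectories, in any discovery order
class_indices = {name: i for i, name in enumerate(sorted(genres))}
print(class_indices)  # {'Action': 0, 'Comedy': 1, 'Drama': 2}
```

This is the same dictionary you get from train_generator.class_indices, and it determines which position in the softmax output corresponds to which genre.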

We have labeled our data into three classes — Action, Comedy, Drama. Now we can go ahead and create our CNN model.

2. Creating the CNN model

We will use Keras’ sequential model for building our model. We will add 3 pairs of Conv2D and MaxPooling2D layers. Then we will add the Flatten layer so that we have our data in one dimension. Finally, we will add a fully connected Dense layer with 1024 hidden units followed by a softmax layer. Below is the code to implement this.

from tensorflow.keras.optimizers import RMSprop

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['acc'])

After creating the model, we compile it with the RMSprop optimizer, which lets us set the learning rate explicitly; the value of 0.001 was chosen after many experiments with the model. Since we have more than two classes, we use categorical_crossentropy as the loss function.
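To make the loss concrete, here is a small NumPy sketch (not part of the training code; the logit values are illustrative) of how categorical cross-entropy scores a softmax output against a one-hot label:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # illustrative raw scores for 3 classes
probs = softmax(logits)              # probabilities, sum to 1

y_true = np.array([1.0, 0.0, 0.0])   # one-hot label for the first class
loss = -np.sum(y_true * np.log(probs))  # categorical cross-entropy
```

With a one-hot label the loss reduces to the negative log-probability the model assigns to the true class, so confident correct predictions give a loss near zero.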

Our model is ready for training. Let’s train it with the data we labeled earlier.

3. Training the model

We will pass in the train_generator and validation_generator variables we created earlier with the right values for epochs.

history = model.fit(
    train_generator,
    steps_per_epoch=36,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=36
)
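The value of steps_per_epoch is essentially the number of training images divided by the batch size, rounded up, so that each epoch sees the whole dataset once. A sketch, assuming a hypothetical count of 9,216 training images with the batch size of 256 used above:

```python
import math

# hypothetical count; substitute the actual size reported by the generator
num_train_images = 9216
batch_size = 256

# round up so a final partial batch is not dropped
steps_per_epoch = math.ceil(num_train_images / batch_size)
print(steps_per_epoch)  # 36
```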

After 100 epochs our model reached 69.8% training accuracy, while validation accuracy was only 53.3%. Below is the chart plotted for both accuracy metrics.

As we can see, the highest validation accuracy is around 0.53 while training accuracy reaches about 0.70. Let’s now try our model on some images.
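A chart like this can be reproduced from the history object returned by fit. A minimal matplotlib sketch — the short accuracy lists below are illustrative placeholders, not the actual training log; in practice you would use history.history['acc'] and history.history['val_acc']:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# placeholder values; replace with history.history['acc'] / ['val_acc']
acc = [0.35, 0.48, 0.60, 0.70]
val_acc = [0.34, 0.42, 0.50, 0.53]
epochs = range(1, len(acc) + 1)

fig, ax = plt.subplots()
ax.plot(epochs, acc, label='Training accuracy')
ax.plot(epochs, val_acc, label='Validation accuracy')
ax.set_xlabel('Epoch')
ax.set_ylabel('Accuracy')
ax.legend()
fig.savefig('accuracy.png')
```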

4. Testing our model

We will use Google Colab’s built-in library for uploading images and then pass them to our model to see if it can get the genre correct.

# predicting for random images
import numpy as np
from google.colab import files
from tensorflow.keras.preprocessing import image

uploaded = files.upload()
for fn in uploaded.keys():
    path = '/content/' + fn
    img = image.load_img(path, target_size=(150, 150))
    x = image.img_to_array(img) / 255.0  # same rescaling as during training
    x = np.expand_dims(x, axis=0)
    classes = model.predict(x)
    print(classes)

We are passing in three different movie posters of different genres — action, comedy, and drama.

The output of the above code is shown below.

All three are classified correctly here, but this will not happen every time. Remember that our validation accuracy is only around 53%, so in roughly half of the cases the prediction may be wrong.
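The printed output is a softmax probability vector; it can be mapped back to a genre name using np.argmax together with the generator’s class_indices. A sketch, using an illustrative prediction vector:

```python
import numpy as np

# class_indices as produced by flow_from_directory (alphabetical order)
class_indices = {'Action': 0, 'Comedy': 1, 'Drama': 2}
index_to_genre = {i: name for name, i in class_indices.items()}

pred = np.array([[0.12, 0.75, 0.13]])  # illustrative softmax output
genre = index_to_genre[int(np.argmax(pred[0]))]
print(genre)  # Comedy
```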

You can find all of the code here on GitHub.

Future scope and limitations

A fairly small dataset was used here, which limits the accuracy. In the future, a larger dataset could be used to improve accuracy or even to predict multiple genres for the same movie. The model currently predicts only 3 genres, but a more complex model built on ResNet could predict 10 or 20 genre types. The k-nearest neighbours algorithm could also be tried for this purpose.

Conclusion

Above we saw how to build a model that predicts a movie’s genre from its poster. Some posters will still be hard to classify — for example, the drama film poster shown below contains only text, which makes it difficult for our model to predict the correct genre.

The genre prediction field is not yet fully explored. Using CNNs for image recognition may prove useful for predicting genre from movie poster images: a CNN may learn what makes a comedy movie poster different from an action movie poster.