Predicting Genre from Movie-poster!


Deep learning | Convolutional Neural Networks


Movie posters convey a lot about a movie. Building a model that can find patterns across posters from different genres is an interesting venture.

Photo by Letícia Costa on Pinterest

Posters are one of the ways media influences what people choose to watch. The film industry depends heavily on their effectiveness: a good poster builds excitement about a movie and draws in audiences, so designers pack posters with prominent features meant to attract as many spectators as possible. This is what makes image recognition on movie posters interesting. A model that can identify the features that predispose a poster to success would reveal which aspects viewers find appealing, which in turn is useful information for designers.

Comedy movie poster
Action movie poster

The above posters are for the comedy “We’re the Millers” (2013) and the action movie “Quantum of Solace” (2008). Even at a glance, the two posters differ in many ways. We want to take these differences into account and build a model that classifies posters into three popular movie genres: Action, Comedy, and Drama. The movie-poster dataset used in this project is self-created, with the help of the dataset linked below, which contains links to the poster images.

Let’s get straight to the code. The first task is labeling our poster images.

Labeling using ImageDataGenerator

ImageDataGenerator from the Keras API is used to label the data. The directory structure is shown below. We have 9358 training images and 2338 validation images.

The file structure of our dataset
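Based on the directory paths used in the code below, the layout is roughly the following, with one sub-folder per genre under each split:

images2/
    Training/
        Action/
        Comedy/
        Drama/
    Validation/
        Action/
        Comedy/
        Drama/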

ImageDataGenerator labels each image from the name of the directory it sits in. As the directory structure above shows, all images inside the Action folder will be labeled “Action”, and likewise for the other folders. You can learn more about ImageDataGenerator here. Below is the code for this task.

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

TRAINING_DIR = "/images2/Training"

# Rescale pixel values to [0, 1] and apply random augmentations to the training images.
training_datagen = ImageDataGenerator(rescale=1./255,
                                      width_shift_range=0.2,
                                      height_shift_range=0.2,
                                      zoom_range=0.2,
                                      horizontal_flip=True,
                                      fill_mode='nearest')

VALIDATION_DIR = "/images2/Validation"

# Validation images are only rescaled, not augmented.
validation_datagen = ImageDataGenerator(rescale=1./255)

# Labels are inferred from the sub-directory names (Action, Comedy, Drama).
train_generator = training_datagen.flow_from_directory(
    TRAINING_DIR,
    target_size=(150, 150),
    class_mode='categorical',
    batch_size=256
)

validation_generator = validation_datagen.flow_from_directory(
    VALIDATION_DIR,
    target_size=(150, 150),
    class_mode='categorical',
    batch_size=64
)

We have also added data augmentation for our images, since the dataset we are using is not very large. Data augmentation helps keep the model from overfitting a smaller dataset.
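If you want to eyeball what the augmentation does, a minimal sketch along these lines works in the same notebook (it assumes matplotlib is available and simply draws the first few posters from one augmented batch):

import matplotlib.pyplot as plt

# Pull one augmented batch from the training generator and display a few posters.
sample_images, sample_labels = next(train_generator)
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, img in zip(axes, sample_images[:4]):
    ax.imshow(img)   # pixel values are already rescaled to [0, 1]
    ax.axis('off')
plt.show()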

Next, we define the training and validation image directories and create a train_generator and a validation_generator, which we will pass to our model for training. Because the images in the dataset come in different sizes, we pass the target_size argument to resize everything read from the training and validation directories. The class_mode argument is set to ‘categorical’ because we have more than two classes to predict. The batch size depends on how many images you have for training and validation.
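A quick sanity check is to print the label-to-index mapping that flow_from_directory inferred from the folder names; with alphabetical ordering this should come out as Action, Comedy, Drama:

# Classes are indexed alphabetically by folder name.
print(train_generator.class_indices)
# Should print something like: {'Action': 0, 'Comedy': 1, 'Drama': 2}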

Finally, we will train our model by calling the .fit() method.

Building and Optimizing the model

from tensorflow.keras.optimizers import RMSprop

model = tf.keras.models.Sequential([
    # Three Conv2D-MaxPooling2D pairs extract and downsample visual features.
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the feature maps and classify with fully connected layers.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['acc'])

We are using 3 pairs of Conv2D-MaxPooling2D layers here. The input shape is 150 x 150 x 3, and ReLU (Rectified Linear Unit) is used as the activation function. After these 6 layers, we add a Flatten layer to collapse the feature maps into a single one-dimensional vector. Then we use a fully connected Dense layer with 1024 hidden units, followed by another Dense layer with the softmax activation function to predict between the 3 classes. Below is how our model looks.

Note that we are using a learning rate of 0.001 here for best results. Various learning rates were tested for this model, and the one that gave the best accuracy was chosen.

Overview of the model using Keras-sequential-ASCII
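The ASCII overview above comes from the keras-sequential-ascii package; if you just want a quick look at the layer shapes and parameter counts, Keras’ built-in summary is enough:

# Prints each layer's output shape and parameter count.
model.summary()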

Training our model

Let’s finally train our model. In the .fit() method we pass train_generator and validation_generator as shown below, along with steps_per_epoch for training and validation_steps for validation. Finally, we train the model for 100 epochs, which means it will go through our dataset that many times.

history = model.fit(
    train_generator,
    steps_per_epoch=36,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=36
)
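The value of 36 for both steps_per_epoch and validation_steps simply comes from dividing the number of images by the corresponding batch size:

# One epoch should roughly cover every image once: images // batch size.
print(9358 // 256)   # 36 training steps per epoch
print(2338 // 64)    # 36 validation steps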

After 100 epochs we get 70% training accuracy and 53.2% validation accuracy. Take a look at how the training and validation accuracy evolved over the course of training. Below is a line plot of both accuracies.

Training accuracy vs Validation accuracy
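A plot like the one above can be reproduced from the History object returned by .fit(); this is a minimal sketch assuming matplotlib (the metric was registered as ‘acc’, so the history keys are ‘acc’ and ‘val_acc’):

import matplotlib.pyplot as plt

plt.plot(history.history['acc'], label='Training accuracy')
plt.plot(history.history['val_acc'], label='Validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()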

It may look like the model is overfitting the dataset, but it is not; this is the best validation accuracy we could get. The model was tested with many different hyperparameter values, and the combination that worked best was chosen.

Predicting for any random image

Let’s now test our model for some movie posters and see if it can get the genre of that movie correct.

Movie posters for three different genres

These are posters for the genres ‘Action’, ‘Comedy’, and ‘Drama’ respectively. Let’s see whether our model can guess them. For prediction, we use Google Colab’s built-in files utility, which lets us upload an image directly. Below is the code, followed by its output.

# Predicting for random images uploaded through Colab
import numpy as np
from google.colab import files
from tensorflow.keras.preprocessing import image

uploaded = files.upload()
for fn in uploaded.keys():
    path = '/content/' + fn
    # Load and resize the poster to the input size the model was trained on.
    img = image.load_img(path, target_size=(150, 150))
    x = image.img_to_array(img)
    x = x / 255.0                      # apply the same rescaling used during training
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    classes = model.predict(images, batch_size=256)
    print(classes)
The output of our model

Fortunately, all three predictions were correct, but that will not happen every time, since our accuracy is only around 53%. The prediction comes back as an array: the three classes are arranged in alphabetical order, and the softmax probabilities are given in that same order. For example, ([[0. 0. 1.]]) corresponds to the class ‘Drama’, because the probability of the last class is 1. All probabilities sum to one.
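To turn that probability array into a readable genre name, the class ordering from the training generator can be reused; a small sketch, assuming train_generator is still in scope and using the classes array from the last prediction above:

# Recover the genre names in index order and pick the most probable one.
genre_names = sorted(train_generator.class_indices, key=train_generator.class_indices.get)
predicted_genre = genre_names[np.argmax(classes[0])]
print(predicted_genre)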

You can find all of the code above here on GitHub. The dataset you will find there contains the links to the poster images.

Future scope and limitations

Here, only 3 popular genres were predicted because of a lack of data. Poster images are available online, but collecting and classifying them is a major undertaking. In this project, 3120 images per genre were used for training and 780 for testing. In the future, this project could be extended to predict 10 or 20 movie genres with a larger dataset. Predicting multiple genres for the same movie is also feasible, as sketched below.
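For multi-genre prediction, the usual change is to replace the softmax output with independent sigmoid units and train with binary cross-entropy, so each genre gets its own probability. A rough sketch of such an output head (the layer sizes and the 10-genre figure are assumptions, not something used in this project):

# Hypothetical multi-label variant: each genre gets an independent probability.
num_genres = 10   # assumed size of a larger genre set
multi_label_model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(num_genres, activation='sigmoid')   # sigmoid instead of softmax
])
multi_label_model.compile(loss='binary_crossentropy',
                          optimizer='rmsprop',
                          metrics=['acc'])

The labels would then need to be multi-hot vectors rather than one-hot, so flow_from_directory alone would no longer be enough to prepare the data.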

Further, many different deep learning and machine learning models could be used for this task, for example ResNet (Residual Network) or KNN (K-Nearest Neighbours). Transfer learning may also prove helpful for this project, as illustrated below.
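As an illustration of the transfer-learning idea, a pretrained ImageNet backbone such as MobileNetV2 (just one possible choice, not something used in this article) could replace the convolutional layers and be fine-tuned on the poster data:

# Hypothetical transfer-learning variant with a frozen pretrained backbone.
# Note: MobileNetV2 normally expects inputs preprocessed with
# tf.keras.applications.mobilenet_v2.preprocess_input rather than a plain 1/255 rescale.
base = tf.keras.applications.MobileNetV2(input_shape=(150, 150, 3),
                                         include_top=False,
                                         weights='imagenet')
base.trainable = False   # reuse the pretrained features as a fixed extractor

transfer_model = tf.keras.models.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation='softmax')
])
transfer_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop',
                       metrics=['acc'])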

Conclusion

As we saw above, the genre of a movie can be predicted from its poster, because posters tell us many things about the movie. Convolutional neural networks learn by filtering for important aspects of the images, such as various edges. Convolutions extract the information that distinguishes one class from another, while pooling reduces the data to make it more manageable. Our network may learn, for instance, that animated movie posters have smoother, more rounded edges than other genres, or that action movie posters often feature guns. In this way, our model can predict the genre from a movie poster.