Create a caption with Deep learning

Source: Deep Learning on Medium



We shall explore how to generate an image caption using deep learning. The reader should be familiar with Python, and you need to have the Keras and NumPy libraries installed.

Keras is an open-source neural network library that is easy to learn. We shall use it to generate a caption for a given picture; the caption here is the name of the object class that the network considers most likely. Let's get started.


from keras.preprocessing import image
# the image module loads a picture and converts it into an array our model can process
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
# preprocess_input normalizes the colors the way VGG16 expects
# decode_predictions converts the model's output scores into readable class labels
import os
import numpy as np

We will use VGG16 as our deep learning model. We could have trained our own model from the ground up, but that takes time and is more complex to build. The beauty of deep learning is that you can reuse a pre-trained model: a model that someone else has already trained and that can be applied to a similar problem.

VGG16 network architecture

Next, we need to check that the file exists and has an image format:

# path is the location of the image to be captioned
if not os.path.isfile(path):
    raise FileNotFoundError("File not found: " + path)  # check that the file exists
image_format = ['.jpeg', '.png', '.jpg']  # accepted image extensions
# compare the file extension, ignoring upper/lower case
assert os.path.splitext(path)[1].lower() in image_format, "Image format required"
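The extension check above can also be factored into a small helper. This is only a minimal sketch and not part of the original code: the names `has_image_extension` and `IMAGE_EXTENSIONS` are hypothetical, and the helper looks at the file name only, not at the file contents.

```python
import os

# Hypothetical helper; the accepted extensions mirror the list used in this article.
IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")

def has_image_extension(path):
    """Return True if the file name ends with one of the accepted extensions."""
    extension = os.path.splitext(path)[1]  # e.g. ".JPG" for "cat.JPG"
    return extension.lower() in IMAGE_EXTENSIONS

print(has_image_extension("cat.JPG"))
print(has_image_extension("notes.txt"))
```

Using `os.path.splitext` means a name such as `xjpg`, which has no dot, is rejected, and upper-case extensions are accepted.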

Next, we resize the image to the dimensions VGG16 expects. VGG16, as loaded here with its ImageNet classification layers, takes input images of shape (224,224) with three color channels.

# path is the location of our image to be predicted
# we use the image module in keras to load and convert our image
img = image.load_img(path, target_size=(224, 224))  # load and resize to 224x224
image_array = image.img_to_array(img)  # array of shape (224, 224, 3)
image_array = np.expand_dims(image_array, axis=0)  # add a batch axis: (1, 224, 224, 3)
image_array = preprocess_input(image_array)  # color normalization for VGG16
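To make the shapes concrete, here is a rough NumPy-only sketch of what these steps do to the array. It uses a random array as a stand-in for a loaded image, and the function name `vgg16_style_preprocess` is made up for illustration. It assumes that VGG16's `preprocess_input` reorders the channels from RGB to BGR and subtracts the ImageNet per-channel means without further scaling; in real code, call `preprocess_input` itself.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed to match VGG16's preprocess_input)
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(batch):
    """Approximate VGG16 preprocessing for a batch shaped (n, height, width, 3)."""
    batch = batch.astype(np.float32)
    batch = batch[..., ::-1]          # reorder channels from RGB to BGR
    return batch - IMAGENET_MEAN_BGR  # zero-center each channel

# Stand-in for image.img_to_array(img): random pixel values in [0, 255]
fake_image = np.random.randint(0, 256, size=(224, 224, 3)).astype(np.float32)
batch = np.expand_dims(fake_image, axis=0)  # add the batch axis
processed = vgg16_style_preprocess(batch)

print(fake_image.shape)  # (224, 224, 3)
print(batch.shape)       # (1, 224, 224, 3)
print(processed.shape)   # (1, 224, 224, 3)
```

The extra leading axis is needed because the model predicts on batches of images, even when the batch holds a single picture.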

The last step is to create our model and use it to predict:

# we load our model and use it to predict
model = VGG16(weights='imagenet')  # weights pre-trained on ImageNet
predict = model.predict(image_array)
predicted = decode_predictions(predict, top=1)
print("Predicted:", predicted[0][0][1])
# predicted[0][0] is the top (class id, label, score) tuple for the first image;
# index 1 selects the label, which we use as the caption
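The nested indexing is easier to follow with a hand-written example. The list below only imitates the structure that `decode_predictions` returns (one list per image, each holding `(class id, label, score)` tuples with the best score first). The scores are made up for illustration and are not real model output.

```python
# Hand-written stand-in for decode_predictions(predict, top=3) on a single image.
# The scores are invented for illustration only.
predicted = [
    [
        ("n02123045", "tabby", 0.61),
        ("n02123159", "tiger_cat", 0.27),
        ("n02124075", "Egyptian_cat", 0.08),
    ]
]

first_image = predicted[0]   # predictions for the first (and only) image
best = first_image[0]        # tuple with the highest score
class_id, label, score = best
print("Predicted:", label)   # same value as predicted[0][0][1]

# Turn all returned labels into one readable caption
caption = ", ".join(name.replace("_", " ") for _, name, _ in first_image)
print(caption)
```

With `top=1` the inner list holds a single tuple, so `predicted[0][0][1]` is simply the label of the most likely class.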

Now we can combine all of our code into one function:

from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image
import os
import numpy as np

def predict(path, variable=1):
    # variable is the number of top predictions to decode
    if not os.path.isfile(path):
        raise FileNotFoundError("File not found: " + path)
    image_format = ['.jpeg', '.png', '.jpg']
    assert os.path.splitext(path)[1].lower() in image_format, "format not known"
    model = VGG16(weights='imagenet')
    img = image.load_img(path, target_size=(224, 224))
    image_array = image.img_to_array(img)
    image_array = np.expand_dims(image_array, axis=0)
    image_array = preprocess_input(image_array)
    predictions = model.predict(image_array)
    predicted = decode_predictions(predictions, top=variable)
    print("Predicted:", predicted[0][0][1])  # label of the most likely class

Now that we have our model working, let's call it:

path="ADD THE PATH OF YOUR IMAGE HERE"
predict(path=path)

Done! We have created an image caption generator.