Computer Vision in Advertisements

Original article was published by Girish Bhide on Artificial Intelligence on Medium

Computer Vision in Web Advertisements

In my work, I have come across issues related to the ad images displayed on a website. Out of curiosity, I wondered: could machine learning or AI solve this kind of issue? And if so, how would it work? To find out, let us dive in…

The Task –

Let us say, for example, that a website is displaying some advertisements, but for some reason a few ads are showing inappropriate images. One way to sort this issue out is to manually check which domains are displaying such ads; alternatively, we can try to build a neural network that does the task for us.

Let us see what we have in our arsenal if we want to make the computer do our manual task…

Computer Vision it is !!!

Sounds interesting, doesn’t it? Indeed. But teaching a computer to behave like a human brain is not as simple as it sounds. So, what should we do now?

Let us go into a bit more detail with the help of our experimental trial and see how it works…

Exploring The Dataset –

To simulate the situation of getting ads from a web page, I have downloaded 46 images from Google. Of these, 23 are random inappropriate ads that most website owners do not want to see on their pages (e.g., dating site and gambling site ads). The other 23 are random ads for car sales, insurance, and investment.

(In this article I am looking at image ads only, keeping video and animated ads for the next article 🤓)

Now we have an idea of what our dataset is. Our goal is to classify all these ads into two groups: inappropriate ads (dating and gambling ads) and ads approved by the website owner (car sale, insurance, and investment ads). For the classification, out of the 46 ads, 3 ads from each group are kept for testing the model and the remaining 40 are used for training and validation.
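The split described above can be sketched in a few lines (shown here in Python for illustration; the file names are made-up placeholders, and the article's own code is written in R):

```python
import random

# Hypothetical file names standing in for the 46 downloaded ad images:
# 23 inappropriate (dating/gambling) and 23 acceptable (car, insurance, investment).
inappropriate = [f"inappropriate_{i}.jpg" for i in range(23)]
acceptable = [f"acceptable_{i}.jpg" for i in range(23)]

random.seed(42)  # make the shuffle reproducible
random.shuffle(inappropriate)
random.shuffle(acceptable)

# Keep 3 images from each group for testing; the remaining 40 images
# go to training and validation.
test_set = inappropriate[:3] + acceptable[:3]
train_set = inappropriate[3:] + acceptable[3:]

print(len(train_set), len(test_set))  # -> 40 6
```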

Now let us look at how this image classification and CNN (Convolutional Neural Network) works step by step…

Convolutional Neural Network (CNN) –

First, let us see what a neural network is…

Neural networks are inspired by the human brain. Neurons (the fundamental elements of our brain) are interlinked with each other and perform all of its processing, which is what lets us carry out our daily activities. In the same manner, deep learning algorithms use nodes which are connected to each other in several layers. In simple words, each node can be described as a decision-making point. The type of neural network is determined by the layout of these nodes and by how many layers of them are present: a shallow neural network has one input layer, one hidden layer and an output layer, whereas a deep neural network has one input layer, multiple hidden layers and an output layer. Depending on the type of problem, one can decide on the architecture of the network.
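As a toy illustration of nodes arranged in layers, here is a minimal shallow network sketched in Python: two inputs, one hidden layer of two nodes, and one output node. All weights and biases below are made up purely for illustration.

```python
import math

def neuron(inputs, weights, bias):
    # One decision-making node: weighted sum of inputs plus a bias,
    # passed through a sigmoid activation.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Shallow network: 2 inputs -> hidden layer of 2 nodes -> 1 output node.
x = [0.5, -1.0]
hidden = [neuron(x, [0.4, 0.6], 0.1),
          neuron(x, [-0.3, 0.8], 0.0)]
output = neuron(hidden, [1.2, -0.7], 0.05)
print(round(output, 3))  # a value between 0 and 1
```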

One question that arises is: why use a CNN and not some other neural network? It is because a CNN is a specialized neural network developed to perform operations on two-dimensional image data.

A convolution is a linear operation that involves the multiplication of a set of weights with the input, much like a traditional neural network. Given that the technique was designed for two-dimensional input, the multiplication is performed between an array of input data and a two-dimensional array of weights, called a filter or kernel.
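The sliding multiply-and-sum can be sketched in plain Python (a minimal illustration, not the article's code; CNN libraries actually implement cross-correlation under the name "convolution", as shown here):

```python
def convolve2d(image, kernel):
    """Valid convolution of a 2-D image with a 2-D filter (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the current window with the filter and sum.
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 4x4 "image" and a 2x2 filter that reacts to diagonal differences.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
filt = [[1, 0],
        [0, -1]]
print(convolve2d(img, filt))  # -> [[-5, -5, -5], [-5, -5, -5], [-5, -5, -5]]
```

Note how a 4 x 4 input and a 2 x 2 filter produce a 3 x 3 output: with no padding, the output side length is input size minus filter size plus one.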

Neural Network For The Classification of Images –

You can see below a snippet of my code which shows the architecture of the neural network. I will try to explain all the important parameters one by one.

Convolutional Neural Network Layers and Compilation

So, as you can see in the above code, there is a total of 12 layers in our network. The first is the input layer, which is marked with the blue box; it feeds the image data into our network. You can see the highlighted parameters which we need to define while creating a layer. The input_shape defines the dimensions of our image array. Here, I have resized all images to 24 x 24, and each image has 3 channels, so the shape becomes 24 x 24 x 3. Resizing the images is good practice if you want to avoid errors that occur while running the model. 😇
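Since the snippet is shown only as an image, here is a hypothetical reconstruction of a 12-layer stack matching the description, written in Python Keras (the article's original code is R keras; the filter counts, dropout rates and dense-layer size below are my assumptions, not the author's exact values):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical 12-layer stack: two blocks of (Conv, Conv, MaxPooling, Dropout),
# then Flatten and dense layers ending in a single sigmoid output.
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(24, 24, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation="relu"),   # more filters in block 2
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),                            # higher dropout in block 2
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),          # binary output: 0 or 1
])
model.compile(loss="binary_crossentropy", optimizer="sgd", metrics=["accuracy"])
print(model.output_shape)  # -> (None, 1)
```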

Illustration of Kernel

To get an idea of how the kernel works, look at the image above. We have a kernel of size 3 x 3, so a 3 x 3 window (marked as a yellow box) slides over the input image from left to right and top to bottom, extracting features from the input image at each step. The image above is for illustration purposes; the actual process happens at the pixel level, where a window of 3 x 3 pixels moves over the image to gather the features.

The next parameter is the activation, which decides whether a node fires depending on its value. Here the Rectified Linear Unit, or ReLU, is used as the activation method. In this method, nodes with negative values are prevented from firing, which improves computation time. Some other methods are Sigmoid, TanH, Leaky ReLU, Parametric ReLU and Softmax.
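ReLU itself is tiny: it passes positive values through unchanged and zeroes out the negatives. A minimal sketch in Python:

```python
def relu(x):
    # Positive values pass through; negative values become 0.
    return max(0.0, x)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]])
# -> [0.0, 0.0, 0.0, 1.5, 3.0]
```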

After the two convolution layers, there is a pooling layer, which reduces the sample size. Here the Max Pooling method is used, which takes the maximum value from each 2 x 2 window.
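Max pooling can be sketched in a few lines of Python (an illustration, not the article's code): each 2 x 2 window of the feature map is replaced by its largest value, halving both dimensions.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2-D feature map by taking the max of each 2x2 window."""
    out = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = [feature_map[i][j], feature_map[i][j + 1],
                      feature_map[i + 1][j], feature_map[i + 1][j + 1]]
            row.append(max(window))
        out.append(row)
    return out

fm = [[1, 3, 2, 1],
      [4, 6, 5, 0],
      [7, 2, 9, 8],
      [1, 0, 3, 4]]
print(max_pool_2x2(fm))  # -> [[6, 5], [7, 9]]
```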

A Dropout layer is added after the Max Pooling layer. Dropout is a technique where randomly selected neurons or nodes are ignored during training; they are “dropped out” at random. You can imagine that if neurons are randomly dropped out of the network during training, then other neurons have to step in and handle the representation required to make predictions for the missing neurons. This results in multiple independent internal representations being learned by the network.

The effect is that the network becomes less sensitive to the specific weights of neurons. This in turn results in a network that is capable of better generalization and is less likely to overfit the training data.
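The mechanism can be sketched in plain Python using the common "inverted dropout" formulation (an illustration, not the article's code): during training each value is zeroed with probability `rate`, and the survivors are scaled up so the expected total stays the same; at prediction time the values pass through untouched.

```python
import random

def dropout(values, rate, training=True):
    """Zero a random fraction `rate` of the values during training,
    scaling the survivors by 1/(1 - rate) (inverted dropout)."""
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if random.random() < keep else 0.0 for v in values]

random.seed(0)
activations = [0.8, 0.1, 0.5, 0.9, 0.3, 0.7]
print(dropout(activations, rate=0.5))  # roughly half the values become 0.0
```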

Representation of CNN

I have added another set of the same layers to the network to extract more features. The architecture of the network depends entirely on the type of problem. You will notice that in the second set of layers, the number of filters and the dropout percentage are increased to get better accuracy.

Once we have created the required feature arrays through the convolution layers, we need to convert them into a single vector. This is achieved by the Flatten layer, which converts the data into one long vector that is then passed to the final output layer for classification.
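Flattening is just an unrolling of the feature maps into one list, which can be sketched as:

```python
def flatten(feature_maps):
    # Unroll a list of 2-D feature maps into one long 1-D vector.
    return [v for fmap in feature_maps for row in fmap for v in row]

maps = [[[1, 2], [3, 4]],   # feature map 1 (2x2)
        [[5, 6], [7, 8]]]   # feature map 2 (2x2)
print(flatten(maps))  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```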

After building the model architecture, we need to compile our neural network. As we have only two outputs for the classification (coded as 1 for inappropriate and 0 for acceptable), we use binary_crossentropy as our loss function. The most important parameter for model performance is the optimizer; here we use Stochastic Gradient Descent, or SGD for short. With this method, the network is forced to learn to extract features from the image that minimize the loss for the specific task it is being trained to solve, e.g., features that are most useful for classifying images into the two groups.
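As a toy illustration of the two pieces (an illustration in Python with made-up numbers, not the article's code): binary cross-entropy scores how far the predicted probabilities are from the 0/1 labels, and one SGD step nudges each weight against its gradient.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Average log loss over a batch of two-class labels (0 or 1).
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def sgd_step(weights, gradients, learning_rate=0.1):
    # One stochastic gradient descent update: w <- w - lr * dL/dw.
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

loss = binary_crossentropy([1, 0, 1], [0.9, 0.2, 0.6])
print(round(loss, 4))
print(sgd_step([0.5, -0.3], [0.2, -0.1]))
```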

Results –

After compiling and running the model on the training dataset, as you can see in the image below, we get an accuracy of 92.5% on training and 83% on our test dataset, which means that out of the 6 images in our test dataset, our neural network classified 5 correctly. This result can be improved further by changing the hyperparameters, the network architecture or the optimization method.

Output of Classification

Looking at the plots, we get an idea of how the loss decreases and the accuracy increases with each new iteration. The lines for the training and validation datasets move in a similar way and converge at around the 100th iteration, so we can say that our model is not over-fitted.

Plots for Accuracy and Loss

Feeding Images From A Web Page –

At the beginning of the data exploration part, I mentioned that I downloaded the images to simulate the scenario. But we can go one step further and automate this process by feeding images into our dataset directly from a web page. To do that, you can refer to the following image of the code snippet.

Web Scraping for Images

As we want to classify images from the website mentioned at the beginning of the article 😛, we can use web scraping tools and packages like RSelenium for RStudio to feed the images into our dataset. The most important thing to keep in mind while doing this is to always check the website’s scraping policy. You can check it by typing *domain name*/robots.txt into your address bar.
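The same check can be done programmatically. Python's standard library, for instance, ships a robots.txt parser (the article works in R, so this is just an illustrative sketch; the robots.txt content below is made up, and in practice you would point the parser at https://*domain name*/robots.txt):

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, parsed from a string so no network request is needed.
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /ads/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "/ads/banner1.jpg"))    # -> True
print(rp.can_fetch("*", "/private/data.html"))  # -> False
```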

Key Points for R/RStudio Users –

  • If you are running Tensorflow and Keras in R/RStudio, make sure you have Python or Anaconda installed on your machine.
  • You need to install Tensorflow and Keras in Python first. Make sure the installed versions are compatible with your R/RStudio versions; otherwise, the packages will not load in RStudio.

(This gave me a lot of headache during the first installation; it took almost 3 days to figure out why I was unable to run Tensorflow 🤯)

If you found this article helpful, then like it and share with others. (It motivates me to keep on trying new things 🤓)

Thanks for reading, you are awesome!!!🥳