Building a Rock Paper Scissors AI using TensorFlow and OpenCV

The original article was published on Deep Learning on Medium.

Collecting our Data:

The basis of any Deep Learning model is DATA. Most Machine Learning engineers would agree that in ML the data matters far more than the algorithm itself. We need to collect images for the symbols Rock, Paper and Scissors. Instead of downloading somebody else's data and training on it, I made my own dataset, and I encourage you to build your own too. Later, try changing the data and retraining the model to see how strongly the data affects a Deep Learning model.

I used Python's OpenCV library for all camera-related operations. In my script, label is the class the image belongs to (rock, paper or scissors), and it decides which directory the image is saved in. ct and maxCt are the start and end indices used to number the saved images. The rest is standard OpenCV code to read the webcam feed and save images to a directory. A key point to note is that all my images are 300 x 300 pixels. After running this, my directory tree looks like this:

│ paper0.jpg
│ paper1.jpg
│ paper2.jpg

│ rock0.jpg
│ rock1.jpg
│ rock2.jpg


If you are following the GitHub repo, it does this step for you!

Preprocessing our Data:

A computer understands numbers, but what we have are images, so we convert every image into its numeric representation (an array of pixel values). Our labels also still need to be generated, and since labels can't be text, I manually built One-Hot-Encoded representations for each class using the shape_to_label dictionary.

Because the images are saved in one directory per class, the directory name serves as the label, and shape_to_label converts it to its One-Hot representation. We then iterate over the files in each directory. cv2.imread() reads each image into a NumPy array of shape 300 x 300 x 3 (height, width, colour channels). We do some manual Data Augmentation by flipping each image and zooming into it, which increases the size of our dataset without having to take new photos. Data Augmentation is a crucial part of generating datasets. Finally, the images and labels are stored in separate NumPy arrays.

More on Data Augmentation here.

Building Our Model with Transfer Learning:

When it comes to working with image data, many pre-trained models are at our disposal. They have been trained on large labelled datasets (the weights shipped with Keras Applications come from ImageNet, with 1,000 classes), and TensorFlow and Keras distribute them through the Keras Applications API. This makes including these pre-trained models in our applications a breeze!

Transfer Learning Diagram

In short, transfer learning takes a pre-trained model and leaves out the final layers that make its predictions. What remains is the powerful part of the model, which in this case can distinguish features in images. We pass this information on to our own Dense Neural Network.

Why not train your own model? It's entirely up to you! However, transfer learning can often make your progress a lot faster; in a sense, you're avoiding reinventing the wheel.

Some other popular pre-trained models:

  • InceptionV3
  • VGG16/19
  • ResNet
  • MobileNet
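Any of these models can be swapped in as the feature extractor through tf.keras.applications by passing include_top=False. The sketch below picks ResNet50 and MobileNetV2 as representatives of their families, and it uses global average pooling (pooling="avg") so each base returns one feature vector per image. Those choices, and the names BASES and feature_extractor, are illustrative assumptions.

```python
import tensorflow as tf

# Constructors from tf.keras.applications (ResNet50 / MobileNetV2 chosen as examples)
BASES = {
    "DenseNet121": tf.keras.applications.DenseNet121,
    "InceptionV3": tf.keras.applications.InceptionV3,
    "VGG16": tf.keras.applications.VGG16,
    "VGG19": tf.keras.applications.VGG19,
    "ResNet50": tf.keras.applications.ResNet50,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
}


def feature_extractor(name: str, input_shape=(300, 300, 3), weights="imagenet"):
    """Pre-trained base without its classification head (include_top=False)."""
    return BASES[name](include_top=False, weights=weights,
                       input_shape=input_shape, pooling="avg")
```

Each family also has its own preprocess_input function (for example tf.keras.applications.mobilenet_v2.preprocess_input) that matches the way its ImageNet weights were trained. Check the documentation for the model you pick.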

Here’s an interesting article on Transfer Learning!

Note: whenever we work with image data, Convolutional Neural Network (CNN) layers are almost a given. The transfer learning model used here is built from these layers.

The Implementation:

I've used the DenseNet121 model for feature extraction, and its outputs are fed into my own Dense Neural Network.

Key Points:

  • As our images are 300 x 300, the input shape is specified as (300, 300, 3): height, width and 3 colour channels (OpenCV loads them in BGR order). This way the model receives the entire image.
  • We’ve used the DenseNet layer as our base/first layer followed by our own Dense Neural Network.
  • I've set the trainable parameter to True, which retrains the DenseNet weights too. This gave me better results, though it was quite time-consuming. In your own implementations, I'd suggest trying different runs by changing such parameters, also known as hyperparameters.
  • As we have 3 classes (Rock, Paper, Scissors), the final layer is a Dense layer with 3 neurons and softmax activation.
  • This final layer returns the probability of the image belonging to each of the 3 classes.
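The key points above can be sketched in Keras roughly as follows. This is a hedged sketch, not the repo's exact model. The pooling layer, the 256-unit hidden layer, the dropout rate, the Adam learning rate and the name build_model are all assumptions. The fixed parts come from the article: the DenseNet121 base, the (300, 300, 3) input, trainable=True, and a 3-way softmax output. Categorical cross-entropy fits because the labels are one-hot.

```python
import tensorflow as tf


def build_model(input_shape=(300, 300, 3), num_classes=3, weights="imagenet"):
    # DenseNet121 without its ImageNet classifier acts as the feature extractor
    densenet = tf.keras.applications.DenseNet121(
        include_top=False, weights=weights, input_shape=input_shape)
    densenet.trainable = True  # fine-tune the DenseNet weights too (slower to train)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        densenet,
        tf.keras.layers.GlobalAveragePooling2D(),           # feature map -> vector
        tf.keras.layers.Dense(256, activation="relu"),      # assumed head size
        tf.keras.layers.Dropout(0.3),                       # assumed rate
        tf.keras.layers.Dense(num_classes, activation="softmax"),  # one probability per class
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy",  # labels are one-hot vectors
                  metrics=["accuracy"])
    return model
```

When the whole base is trainable, a small learning rate is a common choice so the pre-trained weights are not disrupted early in training. Setting densenet.trainable = False instead trains only the new head, which is much faster.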

If you're following the GitHub repo, it takes care of data preparation and model training!
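Training itself is a single model.fit call, with one pitfall worth showing. Keras's validation_split takes the last fraction of the arrays before any shuffling, and images loaded folder by folder are grouped by class. Without an explicit shuffle, the validation set could therefore hold only one class. The function name train and its default arguments are hypothetical.

```python
import numpy as np
import tensorflow as tf


def train(model: tf.keras.Model, images: np.ndarray, labels: np.ndarray,
          epochs: int = 10, batch_size: int = 16, val_fraction: float = 0.2,
          seed: int = 0):
    """Shuffle once, then fit with a held-out validation fraction."""
    # validation_split slices off the *last* samples before shuffling,
    # so mix the class-ordered data first.
    order = np.random.default_rng(seed).permutation(len(images))
    images, labels = images[order], labels[order]
    return model.fit(images, labels, epochs=epochs, batch_size=batch_size,
                     validation_split=val_fraction, shuffle=True)
```

The returned History object holds per-epoch loss and accuracy for both the training and validation data, which is handy when comparing hyperparameter runs.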

By this point we have gathered our data and built and trained our model. The only part left is deployment using OpenCV.

OpenCV implementation:

The flow for this implementation is simple:

  • Start webcam feed and read each frame
  • Pass this frame to the model for classification, i.e. predict its class
  • Make a random move for the computer
  • Calculate Score
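The computer's random move and the scoring step need no machine learning at all. Below is a plain-Python sketch of the standard Rock Paper Scissors rules. The move names and their order (rock, paper, scissor) are assumptions, and the order must match the one-hot order used at training time so that the index of the highest probability maps to the right move. All function names are hypothetical.

```python
import random

MOVES = ("rock", "paper", "scissor")  # must match the one-hot order used in training
BEATS = {"rock": "scissor", "paper": "rock", "scissor": "paper"}  # key beats value


def move_from_probs(probs) -> str:
    """Map the model's 3 class probabilities to a move name."""
    return MOVES[max(range(len(probs)), key=lambda i: probs[i])]


def computer_move() -> str:
    return random.choice(MOVES)


def winner(user: str, computer: str) -> str:
    """Return 'user', 'computer' or 'tie'."""
    if user == computer:
        return "tie"
    return "user" if BEATS[user] == computer else "computer"


def update_score(scores: dict, user: str, computer: str) -> dict:
    result = winner(user, computer)
    if result != "tie":
        scores[result] += 1
    return scores
```

For example, update_score({"user": 0, "computer": 0}, "scissor", "paper") gives the point to the user.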

The code for these steps is the important part; the rest just makes the game user-friendly and implements the RPS rules and scoring.

We begin by loading our trained model. Next comes a stone-age for-loop implementation that shows a countdown before the prediction part of the program begins. After prediction, the scores are updated based on the players' moves.

We explicitly draw a target zone using cv2.rectangle(). Only this part of the frame is passed to the model for prediction, after being pre-processed by the prepImg() function.

The entire code is available here on my repo.


We've successfully implemented this project and understood how it works. So go ahead, fork my implementation and experiment with it. A major improvement would probably be adding hand detection, so that we no longer need an explicit target zone: the model would first detect the hand's position and then make a prediction. I've tried to keep my language as beginner-friendly as possible; if you still have any questions, do comment. I encourage you to improve the project and send me your suggestions. Excelsior!