Source: Deep Learning on Medium
A step towards development of virtual interaction for gaming applications using Deep learning.
What is QuickDraw?
Pictionary is a game where one person draws a shape or picture of an object and another has to guess it. Just like Pictionary, QuickDraw is a game where you draw a pattern in front of the camera and let the computer guess what you have drawn.
About QuickDraw Database
The idea originated with the Magenta team at Google Research. The game “Quick, Draw!” was first featured at Google I/O in 2016; the team later trained a Convolutional Neural Network (CNN) based model to predict drawing patterns and made the game available online. The dataset used to train the model consists of 50 million drawings across 345 categories. A sample of these drawings is shown in figure 1. Image is taken from here.
The team made the dataset publicly available to help researchers train their own CNN models. The full dataset is split into four categories:
- Raw files (.ndjson)
- Simplified drawings files (.ndjson)
- Binary files (.bin)
- Numpy bitmap files (.npy)
Several researchers have used this dataset to train CNN models that successfully predict drawing patterns. You can find these experiments and tutorials here. They have also published their source code online to help other researchers.
Development of a Real-time QuickDraw Application using CNN
Here we developed a TensorFlow-based QuickDraw application using 15 classes instead of the whole database. We downloaded the images of these 15 classes in .npy format from here. The classes include Apple, Candle, Bowtie, Door, Envelope, Guitar, Ice Cream, Fish, Mountain, Moon, Toothbrush, Star, Tent, and Wristwatch.
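Loading the downloaded bitmaps can be sketched as below. The file names and the `limit` parameter are assumptions (the QuickDraw .npy files are conventionally named after the class, e.g. `data/apple.npy`, and each holds flattened 28×28 grayscale bitmaps); this is not the repository's actual loader.

```python
import numpy as np

# Classes from the article; file names are an assumption based on the
# usual QuickDraw .npy naming convention, e.g. data/apple.npy.
CLASSES = ["apple", "candle", "bowtie", "door", "envelope", "guitar",
           "ice cream", "fish", "mountain", "moon", "toothbrush",
           "star", "tent", "wristwatch"]

def load_class(name, data_dir="data", limit=10000):
    """Load one class: N flattened 28x28 bitmaps -> (N, 28, 28, 1) floats."""
    arr = np.load(f"{data_dir}/{name}.npy")[:limit]
    return arr.reshape(-1, 28, 28, 1).astype("float32") / 255.0
```

Capping each class with `limit` keeps the training set balanced and small enough to fit in memory on a modest machine.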
The downloaded dataset is kept in a folder named ‘data’. A CNN model is trained on these image samples. Since the dataset is small, we trained a CNN with 3 convolutional layers, each followed by max-pooling and dropout. Two dense layers are used after flattening the network. The number of trainable parameters is shown in figure 2.
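The architecture described above can be sketched in Keras as follows. The filter counts, kernel sizes, dense width, and dropout rates are illustrative assumptions, not the exact values from the original project; only the overall shape (3 convolutional layers with max-pooling and dropout, then two dense layers) comes from the article.

```python
from tensorflow.keras import layers, models

def build_model(num_classes=15, input_shape=(28, 28, 1)):
    # Three conv blocks with pooling/dropout, then flatten and two dense
    # layers, matching the structure described in the article. Exact
    # hyperparameters here are placeholders.
    return models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Calling `build_model().summary()` prints the per-layer trainable parameter counts, similar to figure 2.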
We used Adam optimization during training with the default learning rate specified in the Keras library. We also tried Stochastic Gradient Descent (SGD), but in that case you have to set the learning rate manually, as SGD with its default learning rate gives low training and validation accuracy. We trained for 50 epochs. Training the CNN model took around 2 hours on a CPU machine with 6GB RAM. The accuracy of the model was around 94%. The output of real-time testing is shown in the video below. Bandicam software was used to record the desktop activity.
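The training setup can be sketched as below. The stand-in model, batch size, and the SGD learning rate and momentum values are assumptions for illustration; only the choice of Adam with its Keras default learning rate and the 50-epoch budget come from the article.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam, SGD

# Minimal stand-in model just to make the snippet self-contained;
# the real architecture is the 3-conv-layer CNN described above.
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(15, activation="softmax"),
])

# Adam converges well with its Keras default learning rate (0.001).
model.compile(optimizer=Adam(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# SGD typically needs the learning rate (and momentum) tuned by hand;
# these values are illustrative, not the ones used in the article:
# model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), ...)

# Dummy data to show the training call; replace with the loaded bitmaps.
x = np.random.rand(64, 28, 28, 1).astype("float32")
y = np.eye(15)[np.random.randint(0, 15, 64)]
model.fit(x, y, epochs=1, batch_size=16, verbose=0)  # article used 50 epochs
```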
The application finds the blue pointer and draws a circle around it, then uses the centre of the circle to draw lines or curves following the hand movement. You can replace the blue pointer with another colour; for that, you need to give the appropriate colour values in the code. Based on the drawing pattern, the model guesses and shows the predicted emoji as output. However, it sometimes fails to classify the object correctly. The obvious reason is that a real-time application is sensitive to noise, background colour, and the quality of the sensor used to acquire the video stream. The accuracy of the system can be improved by training for more epochs and by keeping the hand with the blue pointer close enough to the webcam.
The source code of this application is available in a GitHub repository. You can download it and use it to train your own CNN model. The installation process for the required packages is also described on the GitHub page.
Thanks to Jayeeta Chakraborty, who helped me develop this project. I would also like to thank Akshay Bahadur for his excellent tutorial and code. I made some improvements to the CNN architecture by adding extra convolutional and dropout layers to improve the performance of the system. I hope this article and code help you develop other similar real-time applications.