Tutorial: Alphabet Recognition In Real Time — A Deep Learning and OpenCV Application

This is a tutorial on how to build a deep learning application that recognizes letters of the alphabet written in real time with an object of interest (a bottle cap in this case).

Project Description

A popular demonstration of the capability of deep learning techniques is object recognition in image data.

This deep learning Python application recognizes letters of the alphabet from real-time webcam data. The user writes a letter on the screen using an object of interest (a water bottle cap in this case).

You can access the full project code:

Working Example

Code Requirements

The code is written in Python 3.6 and uses the OpenCV and Keras libraries.

Follow this Medium post to install OpenCV and Keras for Python 3.

Data Description

The “Extended Hello World” of object recognition for machine learning and deep learning is the EMNIST dataset for handwritten letter recognition. It is an extended version of the MNIST dataset (the “Hello World” of object recognition).

The letter ‘e’ is stored in a 28 x 28 NumPy array as shown above.
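To make that layout concrete, here is a minimal NumPy sketch. It uses a synthetic stand-in image, not real EMNIST data: one letter is a 28 x 28 array of 8-bit grayscale intensities (white strokes on black), and flattening it row by row gives the 784-element vector an MLP takes as input.

```python
import numpy as np

# Synthetic stand-in for one EMNIST letter (not real dataset data):
# white strokes (255) on a black background (0).
image = np.zeros((28, 28), dtype=np.uint8)
image[6:22, 8:11] = 255   # a vertical stroke
image[6:9, 8:20] = 255    # a horizontal stroke on top

# Flattening row by row gives the 1-D vector an MLP expects.
vector = image.reshape(-1)
print(image.shape, vector.shape)   # (28, 28) (784,)
print(vector.min(), vector.max())  # 0 255
```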

Code Explanation

Step 1: Train A Multilayer Perceptron Model

1.1 Load Data

We use the mnist Python package to load the data.

Let's now get the data ready to be fed to the model: splitting it into train and test sets, standardizing the images, and other preliminary steps.
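These preparation steps can be sketched in plain NumPy. This is a sketch under stated assumptions, not the author's code. It assumes the images arrive as an (n, 784) uint8 array and that the labels follow the EMNIST Letters convention of class ids 1 to 26. The function name `preprocess`, the 20% test fraction and the random demo data are all hypothetical.

```python
import numpy as np

def preprocess(images, labels, num_classes=26, test_fraction=0.2, seed=0):
    """Scale pixels, one-hot encode labels, and split into train/test sets.

    Assumes `images` is (n, 784) uint8 in 0..255 and `labels` holds
    EMNIST Letters class ids 1..26.
    """
    x = images.astype(np.float32) / 255.0                   # pixels -> [0, 1]
    y = np.eye(num_classes, dtype=np.float32)[labels - 1]   # ids 1..26 -> one-hot rows 0..25

    # Shuffle once, then hold out a fraction of the samples for testing.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(len(x) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

# Usage with random stand-in data (not real EMNIST samples).
rng = np.random.default_rng(42)
images = rng.integers(0, 256, size=(100, 784)).astype(np.uint8)
labels = rng.integers(1, 27, size=100)
x_train, y_train, x_test, y_test = preprocess(images, labels)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
# (80, 784) (80, 26) (20, 784) (20, 26)
```

Subtracting 1 from the labels maps letter ‘a’ to output neuron 0, which keeps the one-hot rows aligned with a 26-neuron output layer.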

1.2 Define Model

In Keras, models are defined as a sequence of layers. We first initialize a ‘Sequential’ model and then add the layers, each with its respective number of neurons. The following code does this.

The model, as expected, takes the 28 x 28 = 784 pixels of an image as input (we flatten the image and pass its pixels as a 1-D vector). The output of the model has to be a decision on one of the 26 letters, so the output layer has 26 neurons, and the decision is expressed as a probability for each letter.
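The tutorial builds this model with Keras. To show what the input and output shapes mean without requiring Keras, here is a NumPy sketch of the forward pass of a comparable MLP. The 512-unit hidden layer and the ReLU activation are assumptions, not necessarily the author's choices. The weights are random, so the probabilities carry no meaning until the model is trained.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 28 * 28, 512, 26   # 512 hidden units is an assumption

# Randomly initialised weights (training would learn these).
W1 = rng.normal(0, 0.01, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.01, size=(n_hidden, n_out))
b2 = np.zeros(n_out)

def mlp_forward(x):
    """x: (batch, 784) flattened images -> (batch, 26) letter probabilities."""
    hidden = np.maximum(0, x @ W1 + b1)   # dense layer + ReLU
    return softmax(hidden @ W2 + b2)      # dense output layer + softmax

batch = rng.random((4, n_in))             # four stand-in flattened images
probs = mlp_forward(batch)
print(probs.shape)                          # (4, 26)
print(np.allclose(probs.sum(axis=1), 1.0))  # True: one distribution per image
```

In Keras, a comparable model would be a Sequential model with a Dense(512, activation='relu') layer taking input_shape=(784,), followed by Dense(26, activation='softmax').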

1.3 Compile Model

Now that the model is defined, we can compile it. Compiling the model uses efficient numerical libraries under the hood (the so-called backend), such as Theano or TensorFlow.

Here, we specify some properties needed to train the network. By training, we are trying to find the best set of weights for making a decision on the input. We must specify the loss function used to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics we would like to collect and report during training.
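The text does not name the loss function. For a one-of-26 decision with softmax outputs, categorical cross-entropy (Keras's 'categorical_crossentropy') is the usual choice, so treat it here as an assumption. This NumPy sketch shows what "evaluating a set of weights" means: the loss is lower when the predicted probabilities put more weight on the correct class. Three classes are used instead of 26 for brevity.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes (the real model has 26).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]], dtype=float)
confident = np.array([[0.9, 0.05, 0.05],
                      [0.1, 0.8, 0.1]])
uniform = np.full((2, 3), 1 / 3)

print(categorical_crossentropy(y_true, confident))  # ~0.164
print(categorical_crossentropy(y_true, uniform))    # ~1.099 (= ln 3)
```

The optimizer's job is to adjust the weights so that this number goes down on the training data.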

1.4 Fit Model

Here, we train the model using a model checkpointer, which will help us save the best model (best in terms of the metric we defined in the previous step).
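In Keras this is typically done with the ModelCheckpoint callback, for example with save_best_only=True. The pure-Python sketch below only illustrates the underlying logic. The class name `BestCheckpoint` is hypothetical, "saving" is simulated by copying a placeholder weights object in memory, and the validation accuracies are made up.

```python
import copy

class BestCheckpoint:
    """Hypothetical helper: keep a copy of the weights whenever the monitored
    metric improves, mimicking a 'save best only' checkpoint."""

    def __init__(self, mode="max"):
        self.mode = mode
        self.best = None
        self.best_weights = None
        self.best_epoch = None

    def update(self, epoch, metric, weights):
        improved = (
            self.best is None
            or (self.mode == "max" and metric > self.best)
            or (self.mode == "min" and metric < self.best)
        )
        if improved:
            self.best = metric
            self.best_weights = copy.deepcopy(weights)  # stands in for writing a file
            self.best_epoch = epoch
        return improved

# Made-up validation accuracies for five epochs (illustrative only).
val_acc = [0.80, 0.86, 0.85, 0.88, 0.87]
ckpt = BestCheckpoint(mode="max")
for epoch, acc in enumerate(val_acc, start=1):
    weights = {"epoch": epoch}  # placeholder for real model weights
    if ckpt.update(epoch, acc, weights):
        print(f"epoch {epoch}: val_acc improved to {acc:.2f}, saving")
print(ckpt.best_epoch, ckpt.best)  # 4 0.88
```

Epochs 3 and 5 are not saved because they do not beat the best score so far. The kept model is therefore the one from epoch 4, not the last one trained.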

1.5 Evaluate Model

Test accuracy of the model on the EMNIST dataset was 91.1%.
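An accuracy figure like this is the fraction of test images whose highest-probability class matches the true label. Here is a minimal NumPy sketch of that computation on a made-up four-sample, three-class example. Keras reports the same idea through its accuracy metric.

```python
import numpy as np

def accuracy(y_true_onehot, y_prob):
    """Fraction of samples whose most probable class matches the true class."""
    predicted = np.argmax(y_prob, axis=1)
    actual = np.argmax(y_true_onehot, axis=1)
    return float(np.mean(predicted == actual))

y_true = np.eye(3)[[0, 1, 2, 1]]        # true classes 0, 1, 2, 1
y_prob = np.array([[0.7, 0.2, 0.1],     # predicts 0: correct
                   [0.3, 0.6, 0.1],     # predicts 1: correct
                   [0.5, 0.3, 0.2],     # predicts 0: wrong
                   [0.1, 0.8, 0.1]])    # predicts 1: correct
print(accuracy(y_true, y_prob))  # 0.75
```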

1.6 Put It All Together

Putting all the steps together, we get the complete code needed to build a decent MLP model trained on the EMNIST data.

Step 2: Train A Convolutional Neural Network Model

2.1 Load Data

This step is exactly the same as the one we implemented in building the MLP model.

2.2 Define Model

I chose this CNN architecture to solve the problem at hand; the reasons behind it are beyond the scope of this tutorial. To learn more about CNNs, visit this tutorial page. It's the best!
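The CNN architecture itself is not reproduced here, but two things differ from the MLP whatever the exact layers are. First, a Keras Conv2D layer (with the default channels-last layout) expects each image as a 28 x 28 x 1 array rather than a flat 784-element vector. Second, each convolution or pooling layer changes the spatial size. The sketch below shows both. The layer stack in it (two 3 x 3 "valid" convolutions, then 2 x 2 max pooling) is a hypothetical example, not necessarily the author's architecture.

```python
import numpy as np

def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window:
    floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Flattened MLP inputs -> image tensors for a CNN: (n, 784) -> (n, 28, 28, 1).
x_flat = np.zeros((10, 784), dtype=np.float32)
x_img = x_flat.reshape(-1, 28, 28, 1)
print(x_img.shape)  # (10, 28, 28, 1)

# Hypothetical stack: conv 3x3 -> conv 3x3 -> max-pool 2x2 (stride 2).
size = 28
for name, k, s in [("conv3x3", 3, 1), ("conv3x3", 3, 1), ("maxpool2x2", 2, 2)]:
    size = conv_output_size(size, k, s)
    print(f"{name}: {size} x {size}")
# prints 26 x 26, then 24 x 24, then 12 x 12
```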

2.3 Compile Model

Unlike the MLP model, this time I am using the Adadelta optimizer.
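Adadelta (Zeiler, 2012) adapts a separate step size for each parameter. It keeps running averages of squared gradients and squared updates, and scales each step by the ratio of their root-mean-square values. The NumPy sketch below implements that update rule with the paper's suggested rho = 0.95 and eps = 1e-6, and applies it to a toy quadratic rather than a neural network. Keras's Adadelta optimizer additionally exposes a learning_rate multiplier, which this original form does not have. The function name `adadelta_minimize` is hypothetical.

```python
import numpy as np

def adadelta_minimize(grad_fn, x0, steps, rho=0.95, eps=1e-6):
    """Adadelta: per-parameter steps from running averages of squared
    gradients and squared updates; no global learning rate."""
    x = np.asarray(x0, dtype=float).copy()
    avg_sq_grad = np.zeros_like(x)    # running E[g^2]
    avg_sq_update = np.zeros_like(x)  # running E[dx^2]
    for _ in range(steps):
        g = grad_fn(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g**2
        # The ratio of RMS values sets the step size for each parameter.
        dx = -np.sqrt(avg_sq_update + eps) / np.sqrt(avg_sq_grad + eps) * g
        avg_sq_update = rho * avg_sq_update + (1 - rho) * dx**2
        x = x + dx
    return x

def grad(x):
    # Gradient of f(x) = x1^2 + 10 * x2^2.
    return np.array([2 * x[0], 20 * x[1]])

start = np.array([5.0, 5.0])
end = adadelta_minimize(grad, start, steps=200)
print(start, "->", end)  # both coordinates move toward 0
```

The two coordinates have gradients that differ by a factor of 10, yet they take similarly sized steps. That per-parameter scaling is the main reason to reach for Adadelta instead of plain gradient descent.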

2.4 Fit Model

To learn how the model variables batch_size and epochs affect our model's performance, visit this link.
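batch_size sets how many samples contribute to each weight update, and epochs sets how many full passes over the training data are made. The number of updates is therefore epochs * ceil(n / batch_size). This NumPy sketch counts them for stand-in data. The generator name `iterate_minibatches` and the values 128 and 3 are illustrative assumptions.

```python
import math
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x, y) mini-batches covering the data once (one epoch)."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

x = np.zeros((1000, 784), dtype=np.float32)   # stand-in training data
y = np.zeros((1000, 26), dtype=np.float32)
batch_size, epochs = 128, 3                   # illustrative values
rng = np.random.default_rng(0)

updates = 0
for epoch in range(epochs):
    for xb, yb in iterate_minibatches(x, y, batch_size, rng):
        updates += 1                          # one weight update per batch
print(math.ceil(len(x) / batch_size), updates)  # 8 24
```

Smaller batches give more (noisier) updates per epoch. More epochs give the optimizer more passes, at the risk of overfitting.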

2.5 Evaluate Model

Test accuracy for the model on the EMNIST dataset was 93.1%.

2.6 Put It All Together

Putting it all together, we get the complete code needed to build a decent CNN model trained on the EMNIST data.

Step 3: Initializing Stuff

Before we look into the recognition code, let's initialize a few things.

First, we load the models built in the previous steps. We then create a letters dictionary; blueLower and blueUpper boundaries to detect the blue bottle cap; a kernel to smooth things along the way; an empty blackboard to store the writing in white (just like the letters in the EMNIST dataset); a deque to store all the points generated by the pen (the blue bottle cap); and a couple of variables with default values.
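Here is a sketch of that setup, leaving out the model loading (which needs Keras and the saved model files). The names letters, blueLower, blueUpper, blackboard and points come from the text. The `kernel` name, every concrete value (HSV bounds, 5 x 5 kernel, 480 x 640 canvas, deque length 512) and the dictionary layout are assumptions, so the original code may differ.

```python
from collections import deque
import numpy as np

# Hypothetical values: the exact numbers in the original code may differ.
letters = {i: chr(ord("a") + i) for i in range(26)}  # class index -> letter

# HSV bounds for "blue" (OpenCV uses H in 0..179, S and V in 0..255).
blueLower = np.array([100, 60, 60], dtype=np.uint8)
blueUpper = np.array([140, 255, 255], dtype=np.uint8)

kernel = np.ones((5, 5), dtype=np.uint8)              # structuring element for smoothing
blackboard = np.zeros((480, 640, 3), dtype=np.uint8)  # black canvas for white strokes
points = deque(maxlen=512)                            # recent pen positions
```

A bounded deque silently drops the oldest points once it is full. That keeps memory use fixed if the cap stays in view for a long time.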

Step 4: Capturing The Writings

Once we start reading the input video frame by frame, we try to find the blue bottle cap and use it as a pen. We use OpenCV's cv2.VideoCapture() to read the video frame by frame (using a while loop), either from a video file or from a webcam in real time. In this case, we pass 0 to read from the default webcam. The following code demonstrates this.
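The read loop follows a simple contract: cap.read() returns an (ok, frame) pair, and ok becomes False when the file ends or the camera stops delivering frames. Here is a sketch of that loop as a generator that works with any object offering such a read() method. The names `read_frames` and `FakeCapture` are hypothetical, and the fake capture only exists so the loop can run without a camera.

```python
def read_frames(cap, max_frames=None):
    """Yield frames from an OpenCV-style capture object until read() fails.

    `cap` is anything with a read() method returning (ok, frame),
    such as cv2.VideoCapture(0) for the default webcam.
    """
    count = 0
    while max_frames is None or count < max_frames:
        ok, frame = cap.read()
        if not ok:          # end of file, or the camera returned no frame
            break
        count += 1
        yield frame

class FakeCapture:
    """Stand-in capture that replays a fixed list of frames."""
    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None

frames = list(read_frames(FakeCapture(["f1", "f2", "f3"])))
print(frames)  # ['f1', 'f2', 'f3']
```

With OpenCV installed, you would create camera = cv2.VideoCapture(0), iterate over read_frames(camera), and finally call camera.release() and cv2.destroyAllWindows().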

While reading the webcam feed, we constantly look for a blue object in each frame with the help of the cv2.inRange() method, using the blueLower and blueUpper variables initialized beforehand. The resulting mask then goes through a series of image operations that smooth it before we search it for contours. Smoothing just makes our lives easier. If you want to know more about these operations (erode, morph and dilate), check this out.
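For a 3-channel image, cv2.inRange() marks a pixel as 255 when every channel lies within the lower and upper bounds (inclusive) and as 0 otherwise. Here is a NumPy equivalent, applied to a tiny hand-made HSV frame. In the real application the frame would first be converted with cv2.cvtColor(frame, cv2.COLOR_BGR2HSV). The function name `in_range` and the bound values are assumptions.

```python
import numpy as np

def in_range(hsv, lower, upper):
    """NumPy equivalent of cv2.inRange for a 3-channel image: 255 where every
    channel lies within [lower, upper] (inclusive), 0 elsewhere."""
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return inside.astype(np.uint8) * 255

blueLower = np.array([100, 60, 60], dtype=np.uint8)   # assumed bounds
blueUpper = np.array([140, 255, 255], dtype=np.uint8)

hsv = np.zeros((4, 4, 3), dtype=np.uint8)   # tiny stand-in HSV frame
hsv[1:3, 1:3] = [120, 200, 200]             # a 2x2 "blue" patch
mask = in_range(hsv, blueLower, blueUpper)
print(mask)
```

The erode, morph and dilate steps then operate on this binary mask. They remove isolated specks and fill small gaps so that the cap shows up as one clean blob.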

Once we find a contour (the if condition passes when a contour is found), we use its center (the blue cap) to draw on the screen as it moves. The following code does this.

The code checks whether a contour was found and, if so, takes the largest one (assuming it's the bottle cap). It draws a circle around it using the cv2.minEnclosingCircle() and cv2.circle() methods and gets the center of the contour with the help of the cv2.moments() method. Finally, the center is stored in a deque called points so that we can join all the points to form the complete letter.
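The center comes from the first-order moments: cx = M10 / M00 and cy = M01 / M00. M00 is the area, and M10 and M01 are the sums of the x and y coordinates. The NumPy sketch below applies that formula to the pixels of a binary blob. cv2.moments() on a contour computes the same quantities from the polygon outline, so its values can differ slightly. The function name `centroid` and the demo mask are hypothetical.

```python
from collections import deque
import numpy as np

def centroid(mask):
    """Centre of a binary blob from raw moments: cx = M10/M00, cy = M01/M00."""
    ys, xs = np.nonzero(mask)
    m00 = len(xs)            # area: number of foreground pixels
    if m00 == 0:
        return None          # no blob, so no centre (avoids dividing by zero)
    m10, m01 = xs.sum(), ys.sum()
    return int(m10 / m00), int(m01 / m00)

points = deque(maxlen=512)
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:5, 6:9] = 255         # 3x3 blob covering x = 6..8, y = 2..4
center = centroid(mask)
points.append(center)        # newest pen position stored in the deque
print(center)  # (7, 3)
```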

We draw the writing on both the frame and the blackboard: the frame is for display, and the blackboard is what we pass to the model.

Note: I have written a brief tutorial on setting up a drawing environment that lets us draw as in a paint application; check it out here to understand clearly what's going on.

Step 5: Scraping The Writing And Passing It To The Model

Once the user finishes writing, we take the points we stored earlier, join them up, put them on a blackboard and pass it to the models.

Control enters this elif block when we stop writing (because no contours are detected). Once we verify that the points deque is not empty, we can be sure the writing is done. We then take the blackboard image and do a quick contour search again (to scrape the writing out). Once the writing is found, we crop it appropriately, resize it to the input dimensions the models expect (28 x 28 pixels), and pass it to both models!
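The crop-and-resize step can be sketched in NumPy. This sketch makes three simplifications, so it is not the author's code. It works on a grayscale blackboard (in OpenCV you would convert with cv2.cvtColor(blackboard, cv2.COLOR_BGR2GRAY)). It uses the bounding box of all non-zero pixels in place of a contour search. It uses a nearest-neighbour resize where the application would call cv2.resize. The helper names and the 10-pixel margin are hypothetical.

```python
import numpy as np

def crop_to_writing(gray, margin=10):
    """Crop to the bounding box of non-zero pixels, plus a margin (clipped)."""
    ys, xs = np.nonzero(gray)
    if len(xs) == 0:
        return None                      # nothing was written
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + 1 + margin, gray.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + 1 + margin, gray.shape[1])
    return gray[y0:y1, x0:x1]

def resize_nearest(img, size=28):
    """Nearest-neighbour resize to size x size (cv2.resize would normally be used)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]

board = np.zeros((480, 640), dtype=np.uint8)   # stand-in grayscale blackboard
board[100:300, 200:260] = 255                  # a thick white stroke
letter = resize_nearest(crop_to_writing(board))

# The MLP takes a flat vector; the CNN takes a 28 x 28 x 1 image.
x_mlp = letter.reshape(1, 784).astype(np.float32) / 255.0
x_cnn = letter.reshape(1, 28, 28, 1).astype(np.float32) / 255.0
print(letter.shape, x_mlp.shape, x_cnn.shape)  # (28, 28) (1, 784) (1, 28, 28, 1)
```

The margin keeps a black border around the letter, which makes the input look more like the EMNIST images the models were trained on.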

Step 6: Showing Model Predictions

We then draw the predictions made by our models on the frame and display it using the cv2.imshow() method. After we exit the while loop that reads data from the webcam, we release the camera and destroy all the windows.
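Each model returns one probability per letter. Showing a prediction means taking the index of the largest probability and looking it up in the letters dictionary. Here is a small sketch with a made-up probability vector. The function name `decode` and the label format are hypothetical.

```python
import numpy as np

letters = {i: chr(ord("a") + i) for i in range(26)}

def decode(probs, letters):
    """Map a (1, 26) probability row to its most likely letter and confidence."""
    idx = int(np.argmax(probs[0]))
    return letters[idx], float(probs[0, idx])

probs = np.full((1, 26), 0.01)
probs[0, 4] = 0.75                 # pretend the model is fairly sure it's 'e'
letter, confidence = decode(probs, letters)
label = f"CNN: {letter} ({confidence:.0%})"
print(label)  # CNN: e (75%)
```

In the application, a string like this would be drawn onto the frame with cv2.putText() before cv2.imshow() displays it.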

Step 7: Execution

7.1 Download The Data

Download the data folder from here and put it in your project directory.

7.2 Build The MLP Model

> python mlp_model_builder.py

7.3 Build The CNN Model

> python cnn_model_builder.py

7.4 Run The Engine File

> python alphabet_recognition.py

7.5 Grab A Blue Bottle Cap

And have fun!


In this tutorial, we built two deep learning models, an MLP model and a CNN model, trained on the famous EMNIST data, and used them to predict letters written in real time with an object of interest. I encourage you to tweak the architectures of both models and see how they affect your predictions.

Hope this tutorial was fun. Thanks for reading.

Live and let live!

Source: Deep Learning on Medium