Computer Vision — Auto grading Handwritten Mathematical Answersheets

Source: Deep Learning on Medium

(Note: I have not included all the code here. The complete tutorial is available as an ipynb notebook on my GitHub.)


Workflow diagram

The workflow has two modules: the Workspace Detection module and the Analysis module. The Workspace Detection module is responsible for detecting the multiple workspaces in a given sheet of paper.

The Analysis module is responsible for detecting and localizing the characters in each line of a single workspace, analyzing them mathematically, and then drawing red or green boxes depending on their correctness.

Workspace Detection

The workspace detection model assumes that the scanned worksheet contains valid rectangular boxes. The image below shows the worksheet design; the three rectangular boxes in it are the workspaces.

Worksheet design

Step 1 : Finding Rectangular Boxes

A rectangle is formed by two horizontal and two vertical lines, so the first step is to find all the valid rectangles while ignoring digits, symbols, or anything else written on the worksheet.

The code first creates a binary image called “vertical_lines_img”, which contains the vertical lines present in the worksheet, and then another binary image called “horizontal_lines_img”, which contains the horizontal lines.

Next, we add “vertical_lines_img” to “horizontal_lines_img” to get the final image.

Adding vertical and horizontal lines

OpenCV’s findContours method finds all the contours in final_image (the binary image containing only the horizontal and vertical lines).

Now that we know the coordinates of the workspaces, let’s draw them on the original image using OpenCV’s drawContours method.

Code to find and draw the contours

Sorting the contours

Now that we have found all the rectangles, it’s time to sort them top-to-bottom based on their coordinates. The sort_contours function does that for us.

Function to sort contours
Sorted contours

The sort_contours function returns the contours and their bounding boxes sorted according to the given method. In this case, the method is top-to-bottom.
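A function with this behaviour might look like the sketch below. The signature is an assumption: here the bounding boxes are passed in as (x, y, w, h) tuples (for example from cv2.boundingRect) rather than computed inside the function, and the four method names are chosen for the example.

```python
def sort_contours(contours, bounding_boxes, method="top-to-bottom"):
    """Sort contours together with their (x, y, w, h) bounding boxes."""
    # Map each method to (index of the coordinate to sort on, reverse flag).
    methods = {
        "left-to-right": (0, False),
        "right-to-left": (0, True),
        "top-to-bottom": (1, False),
        "bottom-to-top": (1, True),
    }
    if method not in methods:
        raise ValueError("unknown sort method: %r" % (method,))
    axis, reverse = methods[method]

    if not contours:
        return [], []

    # Pair each contour with its box so both stay aligned while sorting.
    paired = sorted(
        zip(contours, bounding_boxes),
        key=lambda pair: pair[1][axis],
        reverse=reverse,
    )
    sorted_contours, sorted_boxes = zip(*paired)
    return list(sorted_contours), list(sorted_boxes)


# Strings stand in for real contours in this usage example.
boxes = [(40, 620, 500, 250), (40, 60, 500, 250), (40, 340, 500, 250)]
names = ["bottom", "top", "middle"]
sorted_names, sorted_boxes = sort_contours(names, boxes)
print(sorted_names)  # ['top', 'middle', 'bottom']
```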

Selection based on area

There are many rectangles, but we only need the three largest. How can we select them? One answer is to compute the area of each rectangle and then keep the three with the largest area.
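A minimal sketch of this selection, assuming each rectangle is given as an (x, y, w, h) bounding box and that its area can be approximated by w * h. The helper name largest_boxes is hypothetical, and the notebook may measure area differently.

```python
def largest_boxes(bounding_boxes, count=3):
    """Keep the `count` largest boxes by area, returned top-to-bottom."""
    # Area of an axis-aligned bounding box is width times height.
    by_area = sorted(bounding_boxes, key=lambda box: box[2] * box[3], reverse=True)
    selected = by_area[:count]
    # Restore reading order so the workspaces are processed top-to-bottom.
    return sorted(selected, key=lambda box: box[1])


boxes = [
    (40, 60, 500, 250),    # workspace 1
    (60, 20, 120, 30),     # small header box
    (40, 340, 500, 250),   # workspace 2
    (300, 900, 80, 20),    # small footer box
    (40, 620, 500, 250),   # workspace 3
]
print(largest_boxes(boxes))
# [(40, 60, 500, 250), (40, 340, 500, 250), (40, 620, 500, 250)]
```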

Overall solution

These selected rectangles are the work-spaces which are then extracted from the worksheet and sent to the Analysis Module.

Analysis Module

Line Detection

Detecting the lines is the tricky part. Everyone has their own way of solving equations: some solve step by step, some solve in a single line, some write steps that run for pages, and some write exponents so far from the equation that the module mistakes them for a separate line.

Our line detection module assumes that there is a sufficient gap between lines and that exponent characters overlap at least slightly with their line. First, the detected workspaces are converted to binary images and then compressed into a single one-dimensional array so that a forward derivative can be taken. Wherever a line starts or ends, the derivative changes.

Change in derivatives of a binary image

This is only a glimpse of how line extraction works; the complete code is in extract_line.
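To make the idea concrete, here is a simplified sketch in NumPy. It is not the extract_line code: the function name find_text_lines is hypothetical, and reducing each pixel row to a single 0/1 "has ink" value is an assumption about how the image is compressed into one array.

```python
import numpy as np


def find_text_lines(binary):
    """Return (top, bottom) row ranges of written lines, bottom exclusive.

    binary is a 2-D array that is non-zero wherever there is ink.
    """
    # Compress the image to one value per row: 1 if the row has any ink.
    has_ink = (np.asarray(binary) > 0).any(axis=1).astype(np.int8)

    # Pad with a blank row on both sides so lines touching the image
    # border still produce a start and an end.
    padded = np.concatenate(([0], has_ink, [0]))

    # Forward difference: +1 where a line starts, -1 just after it ends.
    change = np.diff(padded)
    tops = np.flatnonzero(change == 1)
    bottoms = np.flatnonzero(change == -1)
    return list(zip(tops.tolist(), bottoms.tolist()))


page = np.zeros((20, 10), dtype=np.uint8)
page[2:5, 1:4] = 255    # first written line
page[9:14, 2:8] = 255   # second written line
print(find_text_lines(page))  # [(2, 5), (9, 14)]
```

Because an exponent that overlaps its line vertically shares at least one inked row with it, the two merge into a single range, which is why the overlap assumption matters.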

Character Segmentation and Exponent Detection

After detecting all the lines, we send the extracted line images to the text_segment function, which uses OpenCV’s findContours to segment the characters and sorts them with the sort_contours function described above, with the method set to left-to-right.

It’s easy for us to say whether a given number is an exponent, but for the model it’s not that easy. Assuming that exponents sit at least above the middle of the line, we draw a baseline at the center of the image; any character above the baseline is considered an exponent.

Exponent detection
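The baseline rule can be sketched like this. The function name is_exponent is hypothetical, and treating "above the baseline" as "the character's bottom edge is at or above the midline" is one possible reading, assumed here for illustration.

```python
def is_exponent(box, line_height):
    """Return True if a character box lies above the line's midline.

    box is (x, y, w, h) in line-image coordinates, with y growing downward.
    """
    _, y, _, h = box
    baseline = line_height / 2   # horizontal line through the image centre
    bottom = y + h               # lowest row covered by the character
    return bottom <= baseline


line_height = 100
print(is_exponent((10, 20, 30, 60), line_height))  # False: spans the midline
print(is_exponent((45, 5, 15, 30), line_height))   # True: ends above the midline
```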

Optical Character Recognition

I used the MNIST dataset for digits and Kaggle’s Handwritten Mathematical Symbols dataset for symbols to train the model.


Images of symbols are preprocessed in the same way as MNIST digits before training.

The model was trained on almost 60,000 images using a Deep Columnar Convolutional Neural Network (DCCNN), a single deep and wide neural network architecture that offers near state-of-the-art performance, comparable to ensemble models, on various image classification challenges such as the MNIST, CIFAR-10, and CIFAR-100 datasets. This model reached an accuracy of up to 96%.