Document scanner and private data sharing

Original article was published on Deep Learning on Medium

Abstract

The problem: we want to take a picture of a document with a phone and send it privately to a recipient. This article addresses the following issues, which appear in real life:

  • taking a picture of a document can be messy: parts of it may be hard to read, and the photo contains a lot of unneeded data;
  • the picture can be intercepted by a third party (a hosting provider, an image sharing service, etc.), and we want everything to stay private;
  • we want a fast way to show the image in a browser, converted directly from the ciphertext.

Is this a real-life problem? Taking a picture of a document and “repairing” the perspective is nothing new, but the need comes up very often in real life. It is common for someone to ask: “Please scan this contract and send it to me today” or “Please scan this declaration and send it to me in an hour”. But what do we do if we don’t have a scanner? Of course, we take a picture with our phone and send a messy photo. Then we get the response: “Hey, do you have a scanner? I don’t want to have your shoes on the picture” and so on.

How will we solve the problem? With math, of course. We will go through the process step by step, using corner detection, a change of perspective, and crypto libraries to hide the important information and transmit it easily.

What libraries will we use? We will combine several types of libraries:

  • numpy: the standard library for numerical operations in Python;
  • OpenCV: image processing in Python;
  • blurhash-python: to get a placeholder of the image. It is not very useful for documents, but we may want to extend this private image sharing service in the future. If we do, placeholders will be very important. The reader can view only the initial preview, but has no way to see the whole picture without the password;
  • imutils: a series of convenience functions that make basic image processing operations such as translation, rotation, resizing, skeletonization, displaying Matplotlib images, sorting contours, detecting edges, and much more, easier with OpenCV;
  • pyaes: a pure-Python implementation of the AES (FIPS-197) block-cipher algorithm and common modes of operation (CBC, CFB, CTR, ECB, OFB) with no dependencies beyond standard Python libraries;
  • pbkdf2: the password-based key derivation function PBKDF2, specified in RSA PKCS#5 v2.0.

What encodings and algorithms will we use to secure the data?

  • base64: for converting images to text and back. It is an encoding rather than encryption, and it is useful for sending a picture to browsers;
  • blurhash: for generating placeholders of the image;
  • AES: for encrypting the base64 text.

Change of perspective in math

Why is changing the perspective so important in this text? This article takes a practical look at the change of perspective, but digging deeper into the concept and its mathematical fundamentals is crucial for understanding the whole picture.

Here you can find an article that gives a deep overview of the mathematics behind the perspective transformation: https://www.math.utah.edu/~treiberg/Perspect/Perspect.htm It covers the mathematical ideas that occur in art and computer graphics.

Alberti’s problem

Alberti’s problem prompted the development of a new subject, projective geometry, whose exponent was Girard Desargues (1591–1661).

Parallel transformation of points

The perspective transformations that describe how a point in three-space is mapped to the drawing plane can be explained using elementary geometry. We begin by setting up coordinates. A projection involves two coordinate systems. A point in the coordinate system of an object to be drawn is given by X=(x, y, z), and the corresponding point in the imaging system (on the drawing plane) is P=(u, v). If we use the standard right-handed system, then x and y correspond to width and depth and z corresponds to height. On the drawing plane, we let u be the horizontal variable and v the vertical.

Projecting an object to the drawing plane

We can measure the distances between pairs of points in the usual way using the Euclidean metric.

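If $X_1 = (x_1, y_1, z_1)$ and $P_1 = (u_1, v_1)$ and so on, then:

$$\operatorname{dist}(X_1, X_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}, \qquad \operatorname{dist}(P_1, P_2) = \sqrt{(u_1 - u_2)^2 + (v_1 - v_2)^2}.$$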

The projection from X to P is called a parallel projection if all sets of parallel lines in the object are mapped to parallel lines on the drawing. Such a mapping is given by an affine transformation, which is of the form

$$P = A X + T,$$

where T is a fixed vector in the plane and A is a constant 2 x 3 matrix (it maps a point of three-space to a point of the plane). Parallel projection has the further property that ratios are preserved. That is, if $X_1, X_2, X_3, X_4$ are collinear points in the object, then the ratio of distances is preserved under parallel projection:

$$\frac{\operatorname{dist}(P_1, P_2)}{\operatorname{dist}(P_3, P_4)} = \frac{\operatorname{dist}(X_1, X_2)}{\operatorname{dist}(X_3, X_4)}.$$

Of course, denominators are assumed to be nonzero.
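To make the affine form concrete, here is a minimal numpy sketch of a parallel projection P = AX + T. The matrix A, the offset T and the sample points are made-up values chosen only for illustration; A is stored as a 2 x 3 array so that it maps a 3-vector to a 2-vector. The sketch checks numerically that the ratio of distances between collinear points is the same before and after the projection.

```python
import numpy as np


def parallel_projection(points, A, T):
    """Map object points X (N x 3) to drawing-plane points P (N x 2) via P = A X + T."""
    points = np.asarray(points, dtype=float)
    return points @ A.T + T


def dist(a, b):
    """Euclidean distance between two points of the same dimension."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


# Hypothetical oblique projection: depth (y) is smeared into both u and v.
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
T = np.array([2.0, 1.0])

# Four collinear points X_1..X_4 on the line start + t * direction.
start = np.array([1.0, 2.0, 3.0])
direction = np.array([1.0, 1.0, 2.0])
X = np.array([start + t * direction for t in (0.0, 1.0, 3.0, 7.0)])

P = parallel_projection(X, A, T)

ratio_object = dist(X[0], X[1]) / dist(X[2], X[3])
ratio_drawing = dist(P[0], P[1]) / dist(P[2], P[3])
print(ratio_object, ratio_drawing)  # the two ratios agree up to rounding
```

A true perspective (camera) projection does not preserve these ratios, which is why the later steps need a full perspective transform rather than an affine one.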

Full process

Step 0. Requirements

It is always a pain to start a Python script when you don’t know the required libraries and their versions. That’s why I created a requirements.txt file:

requirements.txt

Step 1. Read the image

At this stage we make the imports that we will use throughout the article. Please don’t forget them, otherwise the code will not work as expected. We also define some helper functions for later use. They wrap basic OpenCV operations that are repeated many times, so it is good practice to keep them in functions (read_image, show_image_opencv, save_image_opencv, etc.). We also write a function get_current_dir, which helps when we don’t know the current directory or want to load the image from a different location.

Please keep in mind that on *nix systems (such as macOS), show_image_opencv may not work well: it can “freeze” at the destroyAllWindows() call.

We read our input file, bulsatcom.png, which is placed in the same directory as the course project files. Then we keep the input image in one variable and a copy of it in another.

Original file:

bulsatcom.png

The expected result on this step: We now have the OpenCV object, holding the image. We also have a copy of the image in input_image.png

Step 2. Identify the edges

Every image has some noise, and our goal in this step is to clean it up. One approach is to convert the color image to grayscale. After that, we blur the image with a (3, 3) filter. Blurring reduces high-frequency noise and makes contour detection easier.

We have only one function here, detect_edges; it accepts the input image and returns an image containing the edges.

Maybe the most interesting part here is Canny edge detection, a popular edge detection algorithm developed by John F. Canny in 1986. It is a multi-stage algorithm; in short, the steps are:

  • Noise Reduction;
  • Finding Intensity Gradient of the Image;
  • Non-maximum Suppression;
  • Hysteresis Thresholding.

So what we finally get is the strong edges in the image. The first argument of the Canny function is the image (already gray and blurred); the second and third arguments are minVal and maxVal, the lower and upper hysteresis thresholds.

The expected result of this step: we have only one function here, but a very important one. It cleans up the noise in the image by applying filters and returns the detected edges.

edged_image.jpg

Additional methods, articles & approaches for edge detection:

Step 3. Detect document edges in the image

One of the most interesting parts is finding the contours in the image. It is also a challenge (but a very important one) to find the contour with the largest area. That way, we exclude big letters or images inside the paper. We only need the largest area, i.e. the whole document.

We write a function calculate_draw_contours that uses some of OpenCV’s built-in functions, such as findContours. findContours returns the contours found in the image, together with their hierarchy.

The expected result on this step: We have contours of the image.

contoured_image.jpg

Step 4. Identify and extract document boundary/edges

This is one of the hardest moments in this article. We have the coordinates of all the corners of our document, and it is crucial to arrange them and know which coordinate corresponds to which corner.
Images are composed of pixels. A gray picture has no color depth, which would be an additional dimension, so we can work with such pictures in two dimensions: width and height.
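A widely used heuristic for arranging the four corners is sketched below; it is an assumption about how this step can be done, not necessarily the article's own code. The top-left corner has the smallest x + y and the bottom-right the largest, while the top-right has the smallest y - x and the bottom-left the largest.

```python
import numpy as np


def order_points(points):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left."""
    points = np.asarray(points, dtype="float32").reshape(4, 2)
    ordered = np.zeros((4, 2), dtype="float32")

    sums = points.sum(axis=1)                 # x + y for every corner
    diffs = np.diff(points, axis=1).ravel()   # y - x for every corner

    ordered[0] = points[np.argmin(sums)]      # top-left
    ordered[2] = points[np.argmax(sums)]      # bottom-right
    ordered[1] = points[np.argmin(diffs)]     # top-right
    ordered[3] = points[np.argmax(diffs)]     # bottom-left
    return ordered


corners = [(240, 160), (60, 40), (60, 160), (240, 40)]
print(order_points(corners))
```

The heuristic works for photos taken roughly upright; for a sheet rotated by close to 45 degrees two corners can tie, and a more careful ordering is needed.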

Step 5. Apply perspective transform

When we have the dimensions, we can construct the destination points. We can use the getPerspectiveTransform function from OpenCV, which calculates a perspective transform from four pairs of corresponding points. After that, we can use warpPerspective, which applies the perspective transformation to the image.

The expected result of this step: an almost scanned image, shown in a much better perspective.

scanned_image.jpg

Step 6. Encode the image in base64

But what is Base64?

Base64 is a way of encoding 8-bit binary data using only printable characters that fit in 7-bit ASCII. It uses only the characters A-Z, a-z, 0–9, +, and / to represent data, with = used for padding. With this encoding, every three 8-bit bytes (24 bits) are converted into four characters, each carrying 6 bits.

The term Base64 is taken from the Multipurpose Internet Mail Extensions (MIME) standard, which is widely used for HTTP and XML, and was originally developed for encoding email attachments for transmission.

Why do we use Base64?

Base64 is very important for representing binary data: it lets binary data look and act like plain text, which makes it more reliable to store in databases, send in emails, or embed in text-based formats such as XML. In short, Base64 represents data as an ASCII string.
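As a small illustration with Python's standard base64 module, the sketch below encodes a few bytes and builds a data URI that a browser can use as an image source. The helper names are hypothetical, and the 8-byte PNG signature stands in for real image data so that the example needs no file.

```python
import base64


def encode_image_bytes(image_bytes):
    """Return the Base64 text for raw image bytes."""
    return base64.b64encode(image_bytes).decode("ascii")


def decode_image_text(text):
    """Return the raw bytes for Base64 text."""
    return base64.b64decode(text)


png_signature = b"\x89PNG\r\n\x1a\n"      # first 8 bytes of every PNG file
encoded = encode_image_bytes(png_signature)
print(encoded)                            # iVBORw0KGgo=

data_uri = "data:image/png;base64," + encoded
restored = decode_image_text(encoded)
```

Eight input bytes become twelve characters here, including one = of padding, which shows the roughly one-third size overhead of the encoding.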

Why DON’T we use Base64 everywhere?

It is good that Base64 can do some important things for us, but we must keep in mind that we should not use it everywhere, especially in web development: the encoded data is about a third larger than the original binary. Here you can find an interesting article about this.

Step 7. Get also the blurhash value of the image

BlurHash is a compact representation of a placeholder for an image. I find it useful in projects where I want to save bandwidth and show a placeholder until the image is actually loaded. It is also a good fit for this article, as we can calculate the BlurHash value of a picture and store it in a DB. After that, we can show a “preview” in the browsers of users who are not allowed to view the full picture/document.
It can be used as a kind of secret variant of an image: it carries some data, but not enough to read or identify patterns.

BlurHash

More information and links about it

Step 8. Encrypt with AES

This step uses simple password-based AES encryption (PBKDF2 + AES-CTR) without message authentication (unauthenticated encryption). I find this useful for this article, as we want to encrypt the base64 equivalent of the image and make it “password protected”, so that nobody can see the content, even if they own the servers or somehow read our message.

Useful links for such operations: https://cryptobook.nakov.com/symmetric-key-ciphers/aes-encrypt-decrypt-examples

Step 9. Send the cipher text and visualize in browsers

This step is optional, and we are not going to go deep into this topic. The idea is that once we have the encrypted image plus the blurhash to show in the browser (a short preview), a user with the password can decrypt the ciphertext and obtain the base64 string, which can then be converted to an image. It is very easy to write a JavaScript library that accepts the BlurHash value plus the ciphertext and, after a successful password entry, renders the base64 image (natively in HTML).

An example library that can be used for such operations (AES decryption in the browser) can be found here: https://github.com/ricmoo/aes-js

Summary

In short, what do we want to do in this article?

  • take a picture of a document with our phone;
  • repair the perspective to get an almost scanned document;
  • encode it in base64;
  • get the blurhash value;
  • encrypt with AES;
  • send the ciphertext;
  • show a blurhash preview;
  • decrypt and decode in browsers with the libraries available.

This solves some problems with private document/picture sharing and with repairing the perspective of a photographed document. We use various techniques to achieve this. These approaches can easily be turned into an API; I tried to write most of the code as functions, which can be transformed into endpoints.

Similar articles/research

What does this article add?

  • it extends the idea with private document sharing using encryption methods;
  • a descriptive explanation for functions, steps and math concepts;
  • some tests of the functions, which will help us if something is broken in calculations.

Some quick tests for functions

Links