How to convert an image of text into a binary view in Java using Deep Learning

Source: Deep Learning on Medium


Often the first step in performing optical character recognition (OCR) on a document is to convert the image into a binary view, that is, pure black and white. This often improves recognition results considerably. It is possible to apply this effect with conventional adaptive thresholding, but the results are not always reliable. Deep Learning can give more accurate results by training a model to perform the image adjustments. Normally this would take quite a bit of work to set up. However, today we will be using an API to benefit from this functionality in just a few minutes.
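To make "binary view" concrete, here is a minimal sketch of the simplest conventional approach: a fixed global threshold. This is not the Deep Learning method the API uses; the class name ThresholdBinarizer and the choice of threshold are illustrative assumptions, and the luminance weights are the common 0.299/0.587/0.114 approximation.

```java
import java.awt.image.BufferedImage;

public class ThresholdBinarizer {

    /**
     * Returns a copy of the source image in which every pixel is either
     * pure black or pure white, decided by a fixed luminance threshold (0-255).
     */
    public static BufferedImage binarize(BufferedImage source, int threshold) {
        int width = source.getWidth();
        int height = source.getHeight();
        // TYPE_BYTE_BINARY stores one bit per pixel: black or white.
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);

        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = source.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Approximate perceived brightness with integer arithmetic.
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luminance >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }
}
```

A single threshold like this tends to struggle with uneven lighting, shadows, or stains, which is the kind of case where an adaptive or learned approach is meant to do better.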

First off, we must install our library. We will be using Jitpack for this, so we need to add a repository reference and a dependency reference to our Maven POM file.
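The POM entries for a Jitpack-hosted library generally look like the fragment below. This is a sketch rather than the exact snippet from the original post: the repository block is the standard Jitpack one, the coordinates are assumed to match the Cloudmersive Java client on GitHub, and the version is a placeholder that should be replaced with a real release tag.

```xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <!-- Assumed coordinates; check the client's repository for the current ones -->
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <!-- Placeholder: substitute an actual release tag -->
        <version>VERSION</version>
    </dependency>
</dependencies>
```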





Next we call preprocessingBinarizeAdvanced and provide an imageFile.

// Import classes:
//import java.io.File;
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.PreprocessingApi;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

PreprocessingApi apiInstance = new PreprocessingApi();
File imageFile = new File("/path/to/file"); // File | Image file to process. Common file formats such as PNG, JPEG are supported.

try {
    // The result holds the bytes of the binarized image
    byte[] result = apiInstance.preprocessingBinarizeAdvanced(imageFile);
    // Printing a byte[] directly only shows an object reference, so report its size instead
    System.out.println("Received " + result.length + " bytes");
} catch (ApiException e) {
    System.err.println("Exception when calling PreprocessingApi#preprocessingBinarizeAdvanced");
    e.printStackTrace();
}

And that’s it! The image will be converted to a high-contrast black and white view that can then be fed into other OCR functions.
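Since the call returns raw bytes, you will probably want to write them to disk before passing the image along. The helper below is a minimal sketch using only the standard library; the class name BinarizedImageSaver is hypothetical, and it assumes the returned bytes are an already encoded image file (for example PNG), so they are written unchanged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BinarizedImageSaver {

    /**
     * Writes the given image bytes to the target path as-is and returns that path.
     * Assumes the bytes are an already encoded image, so no re-encoding is done.
     */
    public static Path save(byte[] imageBytes, Path target) throws IOException {
        if (imageBytes == null || imageBytes.length == 0) {
            throw new IllegalArgumentException("No image data to save");
        }
        // Creates the file if needed and overwrites any existing content.
        return Files.write(target, imageBytes);
    }
}
```

Inside the try block above, this could be used as BinarizedImageSaver.save(result, Path.of("binarized.png")), with IOException handled alongside ApiException. The output file name here is only an example.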