Source: Deep Learning on Medium
Deep Image Quality Assessment with TensorFlow 2.0
In this tutorial, we will implement the Deep CNN-Based Blind Image Quality Predictor (DIQA) methodology proposed by Jongyoo Kim, Anh-Duc Nguyen, and Sanghoon Lee. Also, I will go through the following TensorFlow 2.0 concepts:
- Download and prepare a dataset using a tf.data.Dataset builder.
- Define a TensorFlow input pipeline to pre-process the dataset records using the tf.data API.
- Create the CNN model using the tf.keras functional API.
- Define a custom training loop for the objective error map model.
- Train the objective error map and subjective score model.
- Use the trained subjective score model to make predictions.
Note: Some of the functions are implemented in utils.py as they are out of the guide’s scope.
What is DIQA?
DIQA is an original proposal that focuses on solving some of the most concerning challenges of applying deep learning to image quality assessment (IQA). Its advantages over other methodologies are:
- The model is not limited to Natural Scene Statistics (NSS) images.
- It prevents overfitting by splitting the training into two phases: (1) feature learning and (2) mapping the learned features to subjective scores.
The cost of generating datasets for IQA is high since it requires expert supervision. Therefore, the fundamental IQA benchmarks comprise only a few thousand records each. This complicates the creation of deep learning models because they require large amounts of training samples to generalize.
The total number of samples does not exceed 4,000 records for any of these benchmarks.
The IQA benchmarks contain a limited number of records that might not be enough to train a CNN. However, for the purpose of this guide, we are going to use the LIVE dataset. It comprises 29 reference images and 5 different distortions with 5 severity levels each.
The first task is to download and prepare the dataset. I have created a couple of TensorFlow dataset builders for image quality assessment and published them in the image-quality package. The builders are an interface defined by tensorflow-datasets.
Note: This process might take several minutes because of the size of the dataset (700 megabytes).
After downloading and preparing the data, turn the builder into a dataset and shuffle it. Note that the batch size is 1. The reason is that each image has a different shape, so a larger batch size will cause an error.
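As a rough sketch of this step, the snippet below shuffles and batches a synthetic stand-in for the records, since the real builder needs the image-quality package and a large download. The generator `fake_records`, the image sizes, the random values, and the feature key names (`distorted_image`, `reference_image`, `dmos`) are assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf

def fake_records():
    # Hypothetical stand-in for the real records: every image has its own size.
    rng = np.random.default_rng(0)
    for height, width in [(32, 48), (40, 40), (24, 64)]:
        reference = rng.random((height, width, 3)).astype(np.float32)
        noise = 0.1 * rng.standard_normal((height, width, 3)).astype(np.float32)
        yield {
            "distorted_image": reference + noise,
            "reference_image": reference,
            "dmos": np.float32(rng.uniform(0.0, 100.0)),
        }

# Height and width are left as None because they change from record to record.
signature = {
    "distorted_image": tf.TensorSpec(shape=(None, None, 3), dtype=tf.float32),
    "reference_image": tf.TensorSpec(shape=(None, None, 3), dtype=tf.float32),
    "dmos": tf.TensorSpec(shape=(), dtype=tf.float32),
}

ds = tf.data.Dataset.from_generator(fake_records, output_signature=signature)
ds = ds.shuffle(1024).batch(1)

# Each batch holds exactly one image, so differing shapes never need to be stacked.
batch_shapes = [tuple(features["distorted_image"].shape) for features in ds]
```

With a batch size of 1, each element keeps its own height and width; a larger batch size would ask TensorFlow to stack tensors of different shapes.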
The output is a tf.data.Dataset, which is iterable but not subscriptable; therefore, accessing the samples using the bracket operator causes an error. There are two ways to access the images in the dataset. The first way is to turn the dataset into an iterator and extract a single sample using the next function.
The output is a dictionary that contains the tensor representation for the distorted image, the reference image, and the subjective score (dmos). Another way is to extract samples from the dataset by taking them with a for loop:
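Both access patterns can be sketched on a small stand-in dataset. The tensors and the key names below are assumptions chosen to mirror the dictionary described above, not the real data.

```python
import tensorflow as tf

# Hypothetical stand-in dataset with the same kind of dictionary records.
ds = tf.data.Dataset.from_tensor_slices({
    "distorted_image": tf.zeros([4, 8, 8, 3]),
    "reference_image": tf.ones([4, 8, 8, 3]),
    "dmos": tf.constant([10.0, 20.0, 30.0, 40.0]),
}).batch(1)

# Option 1: build an iterator and pull a single sample with next().
features = next(iter(ds))
first_keys = sorted(features.keys())

# Option 2: loop over a limited number of samples with take().
scores = []
for features in ds.take(2):
    scores.append(float(features["dmos"][0]))
```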
The first step for DIQA is to pre-process the images. The image is converted into grayscale, and then its low-frequency component is subtracted. The normalized image is defined as:
$$\hat{I} = I_{gray} - I^{low}$$
where the low-frequency image $I^{low}$ is the result of the following algorithm:
- Blur the grayscale image.
- Downscale it by a factor of 1 / 4.
- Upscale it back to the original size.
The main reasons for this normalization are (1) the Human Visual System (HVS) is not sensitive to changes in the low-frequency band, and (2) image distortions barely affect the low-frequency component of images.
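The normalization can be sketched in TensorFlow as follows. The helper names, the Gaussian kernel size (15), its sigma (7/6), and the bicubic interpolation are assumptions for illustration; the guide itself only fixes the three steps and the 1/4 scale factor.

```python
import numpy as np
import tensorflow as tf

def gaussian_kernel(size=15, sigma=7.0 / 6.0):
    # Normalized 2-D Gaussian shaped [height, width, in_channels, out_channels].
    coords = np.arange(size, dtype=np.float32) - (size - 1) / 2.0
    g = np.exp(-(coords ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    kernel /= kernel.sum()
    return tf.constant(kernel[:, :, np.newaxis, np.newaxis], dtype=tf.float32)

def low_frequency(gray, scale=0.25):
    # gray: float32 tensor of shape [batch, height, width, 1].
    # Step 1: blur the grayscale image.
    blurred = tf.nn.conv2d(gray, gaussian_kernel(), strides=1, padding="SAME")
    # Step 2: downscale by the given factor.
    size = tf.shape(gray)[1:3]
    small = tf.cast(tf.cast(size, tf.float32) * scale, tf.int32)
    down = tf.image.resize(blurred, small, method="bicubic")
    # Step 3: upscale back to the original size.
    return tf.image.resize(down, size, method="bicubic")

def image_preprocess(image):
    # Grayscale conversion followed by subtraction of the low-frequency image.
    gray = tf.image.rgb_to_grayscale(tf.cast(image, tf.float32))
    return gray - low_frequency(gray)

normalized = image_preprocess(tf.random.uniform([1, 32, 48, 3]))
```

Away from the borders, a perfectly flat image maps to values near zero, which matches the idea that only the higher-frequency content is kept.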
Objective Error Map
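For the first model, objective errors are used as a proxy target: they can be computed for every distorted image without human ratings, which effectively increases the amount of training data. The loss function is defined by the mean squared error between the predicted and ground-truth error maps. The ground-truth error map is
$$e_{gt} = err(\hat{I}_r, \hat{I}_d)$$
where $\hat{I}_r$ and $\hat{I}_d$ are the normalized reference and distorted images, and err(·) can be any error function. For this implementation, the authors recommend using
$$e_{gt} = |\hat{I}_r - \hat{I}_d|^p$$
with p = 0.2. The exponent prevents the values in the error map from being too small or close to zero.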
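A minimal sketch of this error function, assuming it is the element-wise absolute difference between the two pre-processed images raised to the power p. The function name `error_map` and the toy inputs are illustrative.

```python
import tensorflow as tf

def error_map(reference, distorted, p=0.2):
    # Element-wise |reference - distorted| ** p on the pre-processed images.
    return tf.pow(tf.abs(reference - distorted), p)

reference = tf.constant([[0.0, 0.5], [1.0, 1.0]])
distorted = tf.constant([[0.0, 0.5001], [0.0, 0.9]])
e_gt = error_map(reference, distorted)
```

With p below 1, even a difference of about 1e-4 is lifted to roughly 0.16, so small errors do not vanish from the target.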
According to the authors, the model is likely to fail to predict images with homogeneous regions. To prevent it, they propose a reliability function. The assumption is that blurry areas have lower reliability than textured ones. The reliability function is defined as
$$r = \frac{2}{1 + \exp(-\alpha |\hat{I}_d|)} - 1$$
where $\hat{I}_d$ is the normalized distorted image and α controls the saturation property of the reliability map. The positive part of a sigmoid is used to assign sufficiently large values to pixels with low intensity.
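The previous definition might directly affect the predicted score. Therefore, the average reliability map is used instead:
$$\hat{r} = \frac{r}{\frac{1}{H_r W_r} \sum_{(i,j)} r(i,j)}$$
where $r$ is the reliability map and $H_r$ and $W_r$ are its height and width. For the TensorFlow function, we just calculate the reliability map and divide it by its mean.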
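The two functions could look like the sketch below. It assumes the reliability map is the positive half of a sigmoid, 2 / (1 + exp(-α|x|)) - 1, applied to the pre-processed distorted image; the default α = 1 and the function names are assumptions.

```python
import tensorflow as tf

def reliability_map(distorted, alpha=1.0):
    # Positive half of a sigmoid, shifted and scaled so that 0 maps to 0.
    return 2.0 / (1.0 + tf.exp(-alpha * tf.abs(distorted))) - 1.0

def average_reliability_map(distorted, alpha=1.0):
    # Divide by the mean so the map averages to 1 and does not rescale the loss.
    r = reliability_map(distorted, alpha)
    return r / tf.reduce_mean(r)

x = tf.constant([[0.0, 1.0], [-1.0, 2.0]])
r = reliability_map(x)
r_hat = average_reliability_map(x)
```

Note that `tf.reduce_mean` here averages over the whole tensor, which equals the per-image mean only because the batch size is 1.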
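The loss function is defined as the mean squared error of the product between the average reliability map $\hat{r}$ and the objective error, where the error is the difference between the predicted error map $e_{pred}$ and the ground-truth error map $e_{gt}$:
$$\mathcal{L} = \frac{1}{N} \sum_{(i,j)} \left( \left( e_{pred}(i,j) - e_{gt}(i,j) \right) \hat{r}(i,j) \right)^2$$
where N is the number of pixels in the error map.
The loss function requires multiplying the error by the reliability map; therefore, we cannot use the default loss implementation tf.keras.losses.MeanSquaredError.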
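A minimal sketch of such a loss, written independently of any model. The name `weighted_mse` and its argument names are illustrative.

```python
import tensorflow as tf

def weighted_mse(e_pred, e_gt, r_hat):
    # Mean of the squared, reliability-weighted difference between error maps.
    return tf.reduce_mean(tf.square((e_pred - e_gt) * r_hat))

e_pred = tf.constant([[1.0, 2.0]])
e_gt = tf.constant([[0.0, 0.0]])
r_hat = tf.constant([[1.0, 0.5]])

weighted = weighted_mse(e_pred, e_gt, r_hat)
unweighted = weighted_mse(e_pred, e_gt, tf.ones_like(e_pred))
```

When the reliability map is all ones, the result reduces to the ordinary mean squared error.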
After creating the custom loss, we need to tell TensorFlow how to differentiate it. The good thing is that we can take advantage of automatic differentiation using tf.GradientTape.
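The gradient computation might be sketched as follows. The `loss` and `gradient` signatures and the one-layer toy model are assumptions used only to make the example runnable; the real model is defined later in the guide.

```python
import tensorflow as tf

def loss(model, x, y_true, r):
    # Reliability-weighted mean squared error on the model's prediction.
    y_pred = model(x)
    return tf.reduce_mean(tf.square((y_true - y_pred) * r))

def gradient(model, x, y_true, r):
    # Record the forward pass so TensorFlow can differentiate the loss.
    with tf.GradientTape() as tape:
        loss_value = loss(model, x, y_true, r)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

# Toy stand-in model (hypothetical): a single 1x1 convolution.
model = tf.keras.Sequential([tf.keras.layers.Conv2D(1, 1)])
x = tf.random.uniform([1, 8, 8, 1])
y_true = tf.zeros([1, 8, 8, 1])
r = tf.ones([1, 8, 8, 1])

loss_value, grads = gradient(model, x, y_true, r)
```

The function returns one gradient tensor per trainable variable, in the same order as `model.trainable_variables`.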
The authors suggested using a Nadam optimizer with a learning rate of 2e-4.
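A short sketch of creating the optimizer and applying a single update. The toy dense layer and its input are hypothetical and serve only to show how the gradients are applied.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Nadam(learning_rate=2e-4)

# Toy layer (hypothetical) with a single weight initialized to 1.
layer = tf.keras.layers.Dense(1, use_bias=False, kernel_initializer="ones")
x = tf.ones([1, 1])

with tf.GradientTape() as tape:
    loss_value = tf.reduce_mean(tf.square(layer(x)))

# One optimization step: the weight moves slightly against the gradient.
grads = tape.gradient(loss_value, layer.trainable_variables)
optimizer.apply_gradients(zip(grads, layer.trainable_variables))
weight_after = float(layer.trainable_variables[0].numpy()[0, 0])
```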
Objective Error Model
For the training phase, it is convenient to use tf.data input pipelines, which produce much cleaner and more readable code. The only requirement is to create the function to apply to the input.
Then, map the calculate_error_map function over the tf.data.Dataset.
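The mapping step can be sketched on a stand-in dataset. The body of `calculate_error_map` below is an assumption: pre-processing is reduced to a grayscale conversion, α is taken as 1, and any resizing of the targets to match the model output is left out.

```python
import tensorflow as tf

def image_preprocess(image):
    # Simplified stand-in: grayscale only, without the low-frequency subtraction.
    return tf.image.rgb_to_grayscale(tf.cast(image, tf.float32))

def calculate_error_map(features):
    I_d = image_preprocess(features["distorted_image"])
    I_r = image_preprocess(features["reference_image"])
    # Average reliability map of the distorted image (alpha assumed to be 1).
    r = 2.0 / (1.0 + tf.exp(-tf.abs(I_d))) - 1.0
    r = r / tf.reduce_mean(r)
    # Ground-truth error map with p = 0.2.
    e_gt = tf.pow(tf.abs(I_r - I_d), 0.2)
    return I_d, e_gt, r

# Hypothetical stand-in for the real dataset, batched one image at a time.
ds = tf.data.Dataset.from_tensor_slices({
    "distorted_image": tf.random.uniform([2, 32, 32, 3]),
    "reference_image": tf.random.uniform([2, 32, 32, 3]),
    "dmos": tf.constant([30.0, 60.0]),
}).batch(1)

train = ds.map(calculate_error_map)
I_d, e_gt, r = next(iter(train))
```

Each element of `train` is a tuple holding the model input, the target error map, and the reliability map used to weight the loss.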
Applying the transformation takes almost no time. The reason is that the processor does not perform any operation on the data yet; the work happens on demand. This concept is commonly called lazy evaluation.
So far, the following components are implemented:
- The generator that pre-processes the input and calculates the target.
- The loss and gradient functions required for the custom training loop.
- The optimizer function.
The only missing pieces are the model definitions.