Source: Deep Learning on Medium
NIMA: Neural Image Assessment (2017)
This paper uses a convolutional neural network to predict the distribution of human opinion scores for an image. Scores derived from the predicted distribution correlate highly with human perception, so the network can be used to rate images automatically. It is also useful in photo editing and enhancement. The paper aims to predict both the technical and the aesthetic quality of images.
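Because the network outputs a distribution rather than a single number, a scalar score has to be derived from it. The sketch below assumes ratings are the integers 1 to 10 (the AVA rating scale) and reports the mean and standard deviation of the predicted distribution. The function name `score_stats` and the example probabilities are hypothetical.

```python
import numpy as np

def score_stats(probs):
    """Mean and standard deviation of a predicted rating distribution.

    Assumes probs[i] is the probability of rating i + 1 (ratings 1..10).
    """
    probs = np.asarray(probs, dtype=float)
    ratings = np.arange(1, probs.shape[-1] + 1)
    mean = np.sum(ratings * probs)
    std = np.sqrt(np.sum(probs * (ratings - mean) ** 2))
    return mean, std

# Hypothetical prediction with most of the mass on ratings 5 and 6.
probs = [0.0, 0.0, 0.05, 0.1, 0.3, 0.3, 0.15, 0.1, 0.0, 0.0]
mean, std = score_stats(probs)
print(f"mean={mean:.2f} std={std:.2f}")  # prints mean=5.70 std=1.27
```

The standard deviation is a useful by-product: two images with the same mean can differ in how much raters agreed about them.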
The network is trained with the squared EMD (earth mover’s distance) loss, because it boosts performance in classification with ordered classes such as rating buckets. Training uses the Aesthetic Visual Analysis (AVA) dataset, which contains about 255,000 images rated on their aesthetic qualities by amateur photographers.
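To show why EMD suits ordered classes, here is a minimal NumPy sketch of a normalized EMD over N ordered buckets, computed from the cumulative distributions with exponent r = 2. This follows our reading of the form NIMA uses; the name `emd` and the one-hot examples are illustrative assumptions.

```python
import numpy as np

def emd(p_true, p_pred, r=2):
    """Normalized earth mover's distance between two distributions over
    N ordered buckets:

        EMD = ((1/N) * sum_k |CDF_true(k) - CDF_pred(k)| ** r) ** (1/r)

    r=2 is the variant referred to here as the squared EMD loss.
    """
    cdf_true = np.cumsum(np.asarray(p_true, dtype=float))
    cdf_pred = np.cumsum(np.asarray(p_pred, dtype=float))
    return float(np.mean(np.abs(cdf_true - cdf_pred) ** r) ** (1.0 / r))

target = np.eye(10)[0]  # every vote for rating 1
near = np.eye(10)[1]    # prediction: rating 2
far = np.eye(10)[9]     # prediction: rating 10
print(emd(target, near), emd(target, far))
```

Here the far miss costs exactly three times as much as the near miss (the square roots of 0.9 and 0.1). A plain cross-entropy loss against the same one-hot target would treat both predictions identically, since it only looks at the probability assigned to the correct bucket.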
The baseline architectures explored for image quality assessment are VGG16, Inception-v2, and MobileNet. VGG16 has 13 convolutional and 3 fully-connected layers and uses small 3 × 3 convolutional filters. Inception-v2 is built from Inception modules, which run convolution and pooling operations in parallel.
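The VGG16 layer counts can be checked with a short sketch. The layer list is the standard ImageNet VGG16 configuration (configuration "D" in the original VGG paper), assumed here rather than taken from this summary, with a 224 × 224 RGB input and the original 1000-class head.

```python
# VGG16 ("configuration D"): integers are output channels of 3x3 conv
# layers, "M" is a 2x2 max-pool that halves the spatial size.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]  # ImageNet classifier layers

def count_vgg16(in_channels=3, input_size=224):
    """Return (conv layers, fully-connected layers, total parameters)."""
    conv_layers, params = 0, 0
    channels, size = in_channels, input_size
    for item in VGG16_CONV:
        if item == "M":
            size //= 2
        else:
            params += 3 * 3 * channels * item + item  # weights + biases
            channels = item
            conv_layers += 1
    features = channels * size * size  # flattened 7 x 7 x 512 feature map
    for units in VGG16_FC:
        params += features * units + units
        features = units
    return conv_layers, len(VGG16_FC), params

print(count_vgg16())  # (13, 3, 138357544)
```

Most of those parameters sit in the first fully-connected layer (about 103 million), which is one reason lighter backbones such as MobileNet are attractive.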
MobileNet is a deep CNN designed for mobile vision applications; it replaces standard convolutional filters with depthwise separable filters. For all three networks, the last layer of the baseline CNN is replaced with a fully-connected layer of 10 neurons followed by a softmax activation.
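The replacement head is small: a linear map from the backbone's pooled features to 10 logits (one per rating bucket), then softmax. The NumPy sketch below assumes a 1024-dimensional feature vector (MobileNet's final pooled width; VGG16 and Inception-v2 differ) and a hypothetical initialization with standard deviation 0.01. It only shows the forward pass, not training.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class ScoreHead:
    """Hypothetical replacement last layer: a randomly initialized
    fully-connected layer with 10 outputs followed by softmax."""

    def __init__(self, feature_dim, num_buckets=10, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed init scale; the summary only says "randomly initialized".
        self.w = rng.normal(0.0, 0.01, size=(feature_dim, num_buckets))
        self.b = np.zeros(num_buckets)

    def __call__(self, features):
        return softmax(features @ self.w + self.b)

# Hypothetical pooled backbone features for a batch of 4 images.
features = np.random.default_rng(1).normal(size=(4, 1024))
head = ScoreHead(feature_dim=1024)
probs = head(features)
print(probs.shape, probs.sum(axis=1))
```

Each row of `probs` is a distribution over ratings 1 to 10, which is exactly what a distribution-matching loss such as EMD compares against the human vote histogram.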
Training images are first scaled to 256 × 256, and then a 224 × 224 crop is taken at a random position. The CNNs are implemented in TensorFlow. The baseline CNN weights are initialized by training on ImageNet, and the last fully-connected layer is randomly initialized.
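The resize-then-random-crop step can be sketched in plain NumPy. Nearest-neighbour resizing is used only to keep the example dependency-free; the interpolation method is an assumption, and a real TensorFlow pipeline would typically use a library resize. The function names are hypothetical.

```python
import numpy as np

def resize_nearest(image, size):
    """Nearest-neighbour resize of an H x W x C array to size x size."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols[None, :]]

def random_crop(image, crop, rng):
    """Take a crop x crop window at a uniformly random position."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    return image[top:top + crop, left:left + crop]

def preprocess_train(image, rng, resize_to=256, crop=224):
    return random_crop(resize_nearest(image, resize_to), crop, rng)

rng = np.random.default_rng(42)
photo = rng.integers(0, 256, size=(480, 640, 3)).astype(np.uint8)
x = preprocess_train(photo, rng)
print(x.shape)  # (224, 224, 3)
```

Because the crop position changes on every call, the network sees slightly different framings of each photo across epochs, which acts as a cheap form of data augmentation.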
The performance of the proposed method is shown below: