‘Semantic Generation Pyramid’ for Image Generation and Manipulation

Researchers from Google and the Weizmann Institute of Science have proposed a new image generative model that leverages the hierarchical space of deep features learned by pretrained classification networks and provides a unified and versatile framework for image generation and manipulation tasks.

Convolutional Neural Networks (CNNs) are powerful tools for learning meaningful feature spaces in visual classification tasks, and the features they learn carry semantic information ranging from low level to high level. However, there is no one-to-one mapping between deep features and an image, which makes it difficult to invert manipulated deep features back into realistic images. So far, this challenge has been addressed by imposing regularization priors on the generated image. But this technique brings problems of its own: it limits the type of features that can be used, and the reconstructed images can become blurry and unrealistic.

The Google and Weizmann team applied Generative Adversarial Networks (GANs) to the task of feature inversion. GANs take a different approach, with a game-theoretic formulation and a proven ability to generate highly realistic images. GANs, however, struggle to utilize the globally coherent semantic information encapsulated in deep features.

The researchers’ proposed “Semantic Generation Pyramid” is a novel generative model: a unified, versatile framework for image generation and manipulation tasks that can leverage the continuum of semantic information encapsulated in deep features.

Given a set of learned features from a reference image, the model generates images with matching features at each semantic level. It can also generate only specified areas of an image. To do this, the model’s feature maps are multiplied with masks. In the training stage, a randomly placed crop is blocked out and the “selected layer” is chosen at random. At inference time, the user can set a mask of any shape and choose the “selected layer” to suit the original input. The generated images thus keep only the wanted parts of the original images.
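The masking step can be illustrated with a toy example. This is only a minimal sketch, not the authors’ implementation: the helper names `apply_mask` and `box_mask`, the 4×4 single-channel feature map, and the convention that 1 keeps a reference feature while 0 blocks it are all assumptions made for illustration.

```python
def apply_mask(feature_map, mask):
    """Element-wise multiply a 2-D feature map by a same-sized mask."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feature_map, mask)]


def box_mask(height, width, top, left, box_h, box_w, inside=0.0):
    """Mask equal to `inside` within a rectangle and 1 - inside elsewhere."""
    outside = 1.0 - inside
    return [
        [inside if (top <= r < top + box_h and left <= c < left + box_w) else outside
         for c in range(width)]
        for r in range(height)
    ]


# Toy 4x4 feature map standing in for one channel of the "selected layer".
features = [[float(r * 4 + c + 1) for c in range(4)] for r in range(4)]

# Assumed convention: 0 blocks the reference features (the region is left for
# the generator to fill in), 1 passes them through unchanged.
mask = box_mask(4, 4, top=1, left=1, box_h=2, box_w=2, inside=0.0)
masked = apply_mask(features, mask)

for row in masked:
    print(row)
```

Here the central 2×2 block of reference features is zeroed out while the surrounding features are kept, which corresponds to asking the model to re-generate only that region.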

Applying spatially varying masks to generate only the wanted areas of the image

The generator is designed to work in tandem with a pretrained classification model: every stage of the classifier (2–3 conv layers + pooling) has exactly one corresponding block in the generator.
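The stage-to-block pairing can be sketched as a small bookkeeping exercise. This is an illustrative sketch under stated assumptions rather than the paper’s code: the `Stage` class, the channel counts, the 64-pixel input, the halving of resolution at each pooling step, and the deepest-first ordering of generator blocks are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    channels: int  # hypothetical channel count
    size: int      # side length of the stage's output feature map


def classifier_stages(input_size, channel_plan):
    """Build classifier stages; each one is assumed to halve the resolution."""
    stages, size = [], input_size
    for i, channels in enumerate(channel_plan, start=1):
        size //= 2  # pooling at the end of the stage
        stages.append(Stage(f"cls_stage{i}", channels, size))
    return stages


def pair_generator_blocks(stages):
    """Assign one generator block to each classifier stage.

    Assumption: the generator mirrors the classifier, so its first block is
    paired with the deepest (most semantic, lowest-resolution) stage.
    """
    return [(f"gen_block_for_{s.name}", s) for s in reversed(stages)]


stages = classifier_stages(64, [64, 128, 256, 512])
pairs = pair_generator_blocks(stages)

for block_name, stage in pairs:
    print(block_name, "<-", stage.name, f"({stage.channels} ch, {stage.size}x{stage.size})")
```

Each generator block in this sketch would receive the (possibly masked) features of its paired classifier stage at a matching resolution, which is what lets the same model condition on any semantic level.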

Semantic Pyramid Generator images
Image Re-painting (left); Image generation from paintings (right)
Semantic Image Composition (left); Image re-labelling (right)

The researchers introduced several applications to test the proposed framework’s versatility and flexibility: Re-painting, where an image region is re-generated; Semantic image composition, where an object or image crop is implanted inside another image; Generation from an unnatural reference image, such as converting paintings to realistic photos; and Re-labelling, where the class label fed to the generator is changed. All tasks were performed with the same model without further training, and the results demonstrated the potential of utilizing semantic information in generative models.

The paper Semantic Pyramid for Image Generation is on arXiv.