pix2pix in a Nutshell

Source: Deep Learning on Medium

As one of the best-known computer vision papers of 2017, the pix2pix paper (“Image-to-Image Translation with Conditional Adversarial Networks”), as its name suggests, proposes a general solution to the problem of translating one image into another — be it B&W to color, aerial photo to map, or sketch to photo. The paper offers a one-size-fits-all solution to rule them all.

This article serves as a memo to help me review the ideas without having to go through the paper again in the future.

Highlights of the paper:

  1. A conditional GAN (cGAN) is used so that the discriminator D provides a structured loss: it penalizes the joint configuration of the output rather than treating each pixel independently.
  2. A U-Net-based architecture for the generator G. In many image-translation problems a great deal of low-level information is shared between input and output, so the skip connections help quite a bit.
  3. A convolutional PatchGAN classifier is used for the discriminator D; it penalizes structure only at the scale of image patches. The N x N patch can be much smaller than the full image and still produce high-quality results.
  4. Unlike more “traditional” GANs, pix2pix does not feed the generator a Gaussian noise vector z; the only source of stochasticity is dropout, applied at both training and test time.
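To make point 3 concrete, here is a minimal PyTorch sketch of a PatchGAN-style conditional discriminator. The layer widths follow the paper’s 70 x 70 PatchGAN (C64-C128-C256-C512), but the class name and defaults are my own, not the authors’ code:

```python
import torch
import torch.nn as nn

class PatchGAN(nn.Module):
    """Sketch of a PatchGAN discriminator in the spirit of pix2pix.

    It outputs a grid of real/fake scores, one per receptive-field patch,
    instead of a single scalar for the whole image.
    """

    def __init__(self, in_channels=6):  # input + target images concatenated
        super().__init__()

        def block(c_in, c_out, stride=2, norm=True):
            layers = [nn.Conv2d(c_in, c_out, 4, stride, 1)]
            if norm:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_channels, 64, norm=False),
            *block(64, 128),
            *block(128, 256),
            *block(256, 512, stride=1),
            nn.Conv2d(512, 1, 4, 1, 1),  # one score per N x N patch
        )

    def forward(self, x, y):
        # Conditional discriminator: score the (input, output) pair jointly.
        return self.model(torch.cat([x, y], dim=1))

d = PatchGAN()
scores = d(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print(scores.shape)  # a grid of patch scores, not a single scalar
```

Averaging the losses over this score grid is what makes D penalize structure only at the patch scale; anything farther apart than one receptive field is judged independently.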

GAN Losses Summary: