Source: Deep Learning on Medium
This article illustrates the Domain Transfer Network (DTN) in the face domain. It also explains a method for transferring a set of random, unlabeled face images to a set of emoji images.
I have used a set of one million unlabeled images.
Block f is the feature encoder, block g is the generator, and block D is the discriminator. Together, the composed generator G = g ∘ f and the discriminator D form a typical Generative Adversarial Network (GAN). f is pretrained and remains constant during training.
The loss function combines the generator's loss and the discriminator's loss. The generator loss has four terms: Lgang, Lconst, Ltid, Ltv. The equation is:
Lg = Lgang + αLconst + βLtid + γLtv
- Lgang measures how well the generator tricks the discriminator
- Lconst encourages f(G(s)) and f(s) to be identical
- Ltid encourages G to act as the identity mapping on samples t from the target domain
- Ltv smooths the generated image pixel-wise (a total-variation term)
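The four terms above can be sketched as a single loss function. This is a minimal sketch, not the paper's exact implementation: the weights, the MSE distances, and the assumption that the discriminator's third class means "real target image" are all illustrative.

```python
import torch
import torch.nn.functional as F

def tv_loss(img):
    # Total variation: penalize differences between neighboring pixels,
    # which smooths the generated image
    dh = img[:, :, 1:, :] - img[:, :, :-1, :]
    dw = img[:, :, :, 1:] - img[:, :, :, :-1]
    return dh.abs().mean() + dw.abs().mean()

def generator_loss(d_logits_fake, f_s, f_Gs, G_s, t, G_t,
                   alpha=15.0, beta=15.0, gamma=0.5):
    # Lgang: push the discriminator to label generated images as
    # "real target" (assumed here to be class index 2 of its 3 outputs)
    n = d_logits_fake.size(0)
    real_tgt = torch.full((n,), 2, dtype=torch.long)
    l_gang = F.cross_entropy(d_logits_fake, real_tgt)
    l_const = F.mse_loss(f_Gs, f_s)  # keep f(G(s)) close to f(s)
    l_tid = F.mse_loss(G_t, t)       # G acts as identity on target samples
    l_tv = tv_loss(G_s)              # smooth the generated image
    return l_gang + alpha * l_const + beta * l_tid + gamma * l_tv
```

Each term maps one-to-one onto the equation above, with alpha, beta, and gamma as the tunable weights.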
Feature Encoder Block
Internally, block f is a pretrained OpenFace model (a Torch implementation of face recognition) that outputs a 128-dimensional vector representation of the input image; it is trained so that similar faces lie close together in feature space.
Generator Block
- 5 blocks, each consisting of a stride-2 transposed convolution followed by batch normalization and a ReLU
- A 1 × 1 convolution is added after each block to lower Lconst
- A final transposed convolution followed by a Tanh output layer ensures outputs lie between -1 and 1
- The number of filters in each block can be varied for experimentation
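The generator block described above can be sketched in PyTorch. The filter counts and the 64 × 64 output size are illustrative assumptions, since the article leaves them open to experimentation:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Upsamples the 128-d feature vector from f into an image in [-1, 1]."""

    def __init__(self, feat_dim=128, filters=(512, 256, 128, 64, 32)):
        super().__init__()
        layers, in_ch = [], feat_dim
        for out_ch in filters:
            layers += [
                # stride-2 transposed convolution doubles spatial size
                nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 1),  # 1x1 conv to help lower Lconst
            ]
            in_ch = out_ch
        # final transposed conv + Tanh keeps outputs between -1 and 1
        layers += [nn.ConvTranspose2d(in_ch, 3, 4, stride=2, padding=1),
                   nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, z):  # z: (N, 128) feature vector from f
        return self.net(z.view(z.size(0), -1, 1, 1))
```

Starting from a 1 × 1 spatial map, the five blocks plus the final layer double the resolution six times, giving a 64 × 64 RGB image.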
Discriminator Block
- Similar to the generator block, it has 5 blocks
- Each block contains a stride-2 convolution, batch normalization, and a LeakyReLU non-linearity with α = 0.2
- The final layer is a convolution with three filters, one per class the discriminator distinguishes
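A matching sketch of the discriminator, with the same caveat that the filter counts and 64 × 64 input size are assumptions:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Downsamples a 64x64 image to 3 class logits."""

    def __init__(self, filters=(32, 64, 128, 256, 512)):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in filters:
            layers += [
                # stride-2 convolution halves spatial size (no max pooling)
                nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.LeakyReLU(0.2, inplace=True),
            ]
            in_ch = out_ch
        # final convolution with three filters -> three class logits
        layers.append(nn.Conv2d(in_ch, 3, 2))
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (N, 3, 64, 64)
        return self.net(x).flatten(1)  # (N, 3) logits
```

The five stride-2 blocks reduce 64 × 64 to 2 × 2, and the final convolution collapses that to a single set of three logits per image.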
GAN Training Strategies:
Beyond the architecture described above, some additional strategies can help:
Balance of discriminator and generator:
Balancing is key in training a GAN; the generator is usually overpowered by the discriminator. Methods for addressing this include:
- Train the generator more often than the discriminator
- Adjust hyperparameters, e.g., give the generator loss a higher weight
- Add a lower bound on the discriminator loss
- Change the model architecture
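Two of the tricks above (training G more often than D, and a lower bound on the discriminator loss) can be combined into a simple gating rule. The step ratio and loss floor here are illustrative assumptions:

```python
# Update D only every k-th step, and only while its loss is still above
# a floor; otherwise D is "too strong" and only G should train.
G_STEPS_PER_D_STEP = 2   # assumed ratio, tune per experiment
D_LOSS_FLOOR = 0.3       # assumed lower bound on discriminator loss

def should_update_d(step, d_loss):
    return step % G_STEPS_PER_D_STEP == 0 and d_loss > D_LOSS_FLOOR
```

Inside the training loop, the discriminator's optimizer step is simply skipped whenever this returns False, while the generator updates every step.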
Normalize the inputs:
Instead of normalizing input images to a standard normal distribution, normalize them to [−1, 1], and use Tanh as the last layer of the generator.
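The [−1, 1] normalization is a one-line transform that matches the generator's Tanh output range:

```python
import numpy as np

def to_tanh_range(img_uint8):
    # Map uint8 pixels [0, 255] to floats in [-1, 1],
    # matching the generator's Tanh output range
    return img_uint8.astype(np.float32) / 127.5 - 1.0
```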
Avoid sparse gradients:
Instead of downsampling with max pooling, use strided convolutions, and use LeakyReLU instead of ReLU in the discriminator.
Tune the optimizers:
Change the learning rate schedule, use SGD for the discriminator and Adam for the generator, and adjust weight decay.
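One way to apply this split in PyTorch; the learning rates, betas, and weight decay are illustrative assumptions, and `nn.Linear` stands in for the real generator and discriminator:

```python
import torch

G = torch.nn.Linear(8, 8)  # stand-in for the generator
D = torch.nn.Linear(8, 8)  # stand-in for the discriminator

# Adam for the generator, SGD with weight decay for the discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.SGD(D.parameters(), lr=1e-3, weight_decay=1e-4)
```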
This article explained the problem of unsupervised domain transfer, and showed how domain transfer can be used to perform unsupervised domain adaptation.