Fluorescent Microscopic Imaging Denoising using Generative Adversarial Networks (GANs)

Source: Deep Learning on Medium

The new loss functions of both the generator and the discriminator can be written simply as follows:

Loss Functions of Generator and Discriminator in WGAN
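These loss terms can be sketched in a few lines of PyTorch. This is a minimal illustration of the WGAN objectives, not the authors' code: the critic (discriminator) maximizes D(real) − D(fake), so its loss is the negation of that difference, and the generator minimizes −D(fake).

```python
import torch

def wgan_losses(d_real, d_fake):
    """WGAN objectives: the critic maximizes D(real) - D(fake),
    so its loss is the negation; the generator minimizes -D(fake)."""
    d_loss = d_fake.mean() - d_real.mean()
    g_loss = -d_fake.mean()
    return d_loss, g_loss

# Toy critic scores for two real and two generated images
d_loss, g_loss = wgan_losses(torch.tensor([1.0, 2.0]),
                             torch.tensor([0.5, 0.5]))
```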

2. Removing the sigmoid function can lead to unstable training because of steep gradients from the discriminator. This issue was solved by Ishaan Gulrajani et al. [3] in 2017 by adding a penalty for large gradients to the loss function of the discriminator.
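The gradient penalty of Gulrajani et al. can be sketched as follows. This is an illustrative implementation, assuming image tensors of shape (batch, channels, H, W); the penalty is evaluated on random interpolations between real and generated samples, and lam=10 is the value suggested in the WGAN-GP paper.

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP: evaluate the critic on random interpolations of real
    and generated images, and penalize gradient norms that deviate
    from 1. lam=10 is the default suggested by Gulrajani et al."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    # Per-sample gradients of the critic's output w.r.t. the interpolations
    grads = torch.autograd.grad(scores.sum(), interp, create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return lam * ((grad_norm - 1) ** 2).mean()
```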

BACK TO OUR PROBLEM:

We have used three different architectures based on WGANs to solve our denoising problem.

3.A. WGAN with MSE Loss (WGAN-MSE) [5]:

Generator in WGAN-MSE network

In this architecture, the generator's loss function includes the mean squared error between the denoised image and the ground truth, similar to CNN-MSE, plus the negative of the discriminator's output for generated images (remember, the generator wants to fool the discriminator!). The final architecture looks like the following diagram:

WGAN-MSE
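The WGAN-MSE generator objective can be sketched as below. The weighting factor lam between the MSE and adversarial terms is an assumed hyperparameter for illustration, not a value taken from the paper.

```python
import torch
import torch.nn.functional as F

def wgan_mse_generator_loss(denoised, ground_truth, d_fake, lam=1.0):
    """WGAN-MSE generator objective: pixel-wise MSE to the ground truth
    plus the adversarial term -D(G(x)). The weighting lam is an assumed
    hyperparameter, not a value from the paper."""
    return F.mse_loss(denoised, ground_truth) - lam * d_fake.mean()
```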

3.B. WGAN with Perceptual Loss (WGAN-VGG) [5]:

Generator in WGAN-VGG network

In this architecture, the generator's loss function includes the mean squared error between the outputs of a pretrained VGG-19 network for the denoised image and the ground truth, similar to CNN-VGG, plus the negative of the discriminator's output for generated images. The final architecture can be viewed as follows:

WGAN-VGG Architecture

In both WGAN-MSE and WGAN-VGG, we used the Adam optimizer with low momentum and a 1e-5 learning rate, and the networks were trained for 50 epochs. The same CNN used in CNN-MSE and CNN-VGG serves as the generator for these networks. We also used the same discriminator network, shown below. The discriminator loss function penalizes steep gradients in addition to wrong predictions for real or denoised images. (Remember, the discriminator tries to catch fake denoised images!)

Discriminator Network in WGAN-MSE and WGAN-VGG
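The optimizer setup above can be sketched as follows. The networks here are tiny placeholders standing in for the real generator and discriminator, and beta1=0.5 is a common low-momentum choice for WGANs; the authors' exact betas are an assumption.

```python
import torch
import torch.nn as nn

# Placeholder networks; only the optimizer configuration is the point.
generator = nn.Conv2d(1, 1, 3, padding=1)
discriminator = nn.Sequential(
    nn.Conv2d(1, 8, 3), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

# Adam with low first-moment coefficient (beta1=0.5 is assumed) and the
# 1e-5 learning rate mentioned above
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-5, betas=(0.5, 0.9))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-5, betas=(0.5, 0.9))
```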

4. GAN Network with Residual Learning and Auto-Context Model for Refinement (ResUNet) [6]:

Another GAN architecture that has proved to perform well in the medical imaging field is ResUNet. Furthermore, there is evidence that this architecture has potential for the image denoising task [7].

The proposed architecture is an elegant modification of a classical convolutional network. Its main advantage is that it can learn more features in high-resolution images and performs well even when little data is available [8].

The ResUNet architecture that we are using consists of Generator and Discriminator networks. The Discriminator is a standard convolutional network similar to the ones described above. The main difference lies in the Generator, so let's learn more about its structure.

The Generator of ResUNet consists of the following blocks:

  • ConvBlock is simply two Conv2D layers with a kernel size of 3×3; batch normalization and ReLU activation are applied after each layer.
  • Residual Unit consists of a ConvBlock and an additional so-called bridge Conv2D layer, whose two outputs are added element-wise. Residual blocks largely mitigate the vanishing and exploding gradient problems present in deep architectures.
  • UpResBlock takes two inputs. The first input goes through a Conv2D layer, while the bridge input is cropped to compensate for the border pixels lost in every convolution. The two inputs are then concatenated and fed to a ConvBlock.
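The three blocks above can be sketched in PyTorch as follows. This is an illustrative implementation, not the authors' code: convolutions are padded so that spatial sizes are preserved (the classical U-Net uses unpadded convolutions and cropping instead), and the 1×1 bridge convolution and transposed-convolution upsampling are assumed design choices.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 Conv2D layers, each followed by BatchNorm and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class ResidualUnit(nn.Module):
    """ConvBlock plus a 1x1 'bridge' convolution, added element-wise."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = ConvBlock(in_ch, out_ch)
        self.bridge = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.conv(x) + self.bridge(x)

class UpResBlock(nn.Module):
    """Upsample, concatenate with the bridge feature map from the
    contracting path, then apply a ConvBlock. With padded convolutions
    no cropping of the bridge input is needed."""
    def __init__(self, in_ch, bridge_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, 2, stride=2)
        self.conv = ConvBlock(in_ch // 2 + bridge_ch, out_ch)

    def forward(self, x, bridge):
        x = self.up(x)
        return self.conv(torch.cat([x, bridge], dim=1))
```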

Now that we know the main building blocks of the U-Net, let's look at the Generator as a whole. The Generator consists of two parts, contracting and expansive, arranged in a U-shape, hence the name of the architecture. The Generator architecture looks as follows:

As we can see, the input to each element of the expansive path is the feature map from the corresponding element of the contracting path, concatenated with the output of the previous UpResBlock. Finally, the output of the generator is added element-wise to the residual source, which is the middle slice of the initial input.
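The final residual step can be sketched as below, assuming the network takes a stack of neighbouring slices as input (the stack size of 5 and the function name are illustrative): the generator output is added element-wise to the middle slice, so the network effectively learns only the noise residual.

```python
import torch

def apply_residual(generator_output, source_stack):
    """Add the generator output to the middle slice of the input stack."""
    mid = source_stack.size(1) // 2            # index of the middle slice
    residual_source = source_stack[:, mid:mid + 1]
    return generator_output + residual_source

stack = torch.randn(2, 5, 32, 32)              # 5 neighbouring slices
out = torch.zeros(2, 1, 32, 32)                # dummy generator output
denoised = apply_residual(out, stack)
```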

The rest of the architecture is similar to the GANs described above.