Semi-automated Annotation Model, Polygon RNN, Polygon RNN++

Source: Deep Learning on Medium

Problems of Polygon RNN

Polygon RNN looks pretty good at first glance, but has some problems.

First, Polygon RNN treats the position of the next vertex as a classification problem. With this setup, even a prediction that captures the outline of the object is penalized if it does not exactly match the annotated vertex. In other words, the training loss is not directly correlated with IoU.
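As a small illustration of this mismatch, the sketch below compares an annotated polygon with a prediction whose right edge is off by one grid cell. The two shapes are hypothetical and are kept as axis-aligned rectangles so that IoU can be computed from their bounding boxes; real polygons would need to be rasterized instead.

```python
def bounding_box(vertices):
    """Return (x0, y0, x1, y1) for a list of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# Hypothetical rectangles on a 28x28 grid (illustrative values only).
annotated = [(5, 5), (20, 5), (20, 20), (5, 20)]
predicted = [(5, 5), (21, 5), (21, 20), (5, 20)]  # right edge off by one cell

iou = box_iou(bounding_box(annotated), bounding_box(predicted))
# A per-vertex classification target only rewards exact matches.
exact_matches = sum(p == a for p, a in zip(predicted, annotated))

print(f"IoU = {iou:.4f}, exact vertex matches = {exact_matches}/4")
```

Here the regions overlap almost completely, yet half of the vertices count as wrong classes, which is the kind of penalty the paragraph above describes.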

Second, the resolution of the vertices is coarse because it depends on the feature map resolution (in this case, 28×28).
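To make the coarseness concrete, here is a minimal sketch of what snapping a vertex to a 28×28 grid does. The 224×224 input size is an assumption made for illustration (it gives 8 pixels per cell), and `quantize_vertex` is a hypothetical helper, not part of any published code.

```python
def quantize_vertex(x, y, image_size=224, grid_size=28):
    """Snap a pixel coordinate to its grid cell and decode it back.

    image_size=224 is an assumed crop size; grid_size=28 matches the
    feature map resolution mentioned in the text.
    """
    cell = image_size / grid_size  # pixels covered by one grid cell
    col = min(int(x // cell), grid_size - 1)
    row = min(int(y // cell), grid_size - 1)
    # A predicted cell can only be decoded back to one position, here its centre.
    decoded = ((col + 0.5) * cell, (row + 0.5) * cell)
    return (col, row), decoded


cell_index, decoded = quantize_vertex(101.0, 57.0)
error = (abs(decoded[0] - 101.0), abs(decoded[1] - 57.0))
print(cell_index, decoded, error)
```

Under these assumptions every vertex can be off by up to half a cell (4 pixels) in each direction, no matter how well the network is trained.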

Polygon RNN++, which is introduced below, addresses these issues.

Polygon RNN++

Polygon RNN++ was proposed by David Acuna et al. in 2018 as an improvement on Polygon RNN. Its main changes are:

  • Maximizing IoU as the reward by using reinforcement learning
  • Predicting vertices at high resolution by using Graph Neural Networks

Maximizing IoU by using reinforcement learning

The network is first trained on the vertex classification problem, as in Polygon RNN. Reinforcement learning then starts from those trained weights as its initial values.

The authors treat the network, with its parameters, as a policy and maximize IoU through reinforcement learning. To maximize IoU, they use a loss function that is the expected IoU with its sign flipped.

However, since IoU cannot be differentiated, the expected value of the gradient is calculated using the REINFORCE trick (Williams (1992)). Here r is the reward (IoU) and p_θ is the policy, parameterized by the network parameters θ.
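Written out, the standard REINFORCE estimator for such a loss takes the form below. This is a sketch in generic notation rather than a transcription of the paper's equation: v^s is a symbol introduced here for a polygon (vertex sequence) sampled from the policy.

```latex
L(\theta) = -\,\mathbb{E}_{v^{s} \sim p_\theta}\!\left[\, r(v^{s}) \,\right],
\qquad
\nabla_\theta L(\theta) = -\,\mathbb{E}_{v^{s} \sim p_\theta}\!\left[\, r(v^{s})\, \nabla_\theta \log p_\theta(v^{s}) \,\right]
```

The reward only appears as a scalar weight on the log-probability gradient, so it never has to be differentiated itself.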

This works in principle, but training with it is known to be unstable. Therefore, the authors use the self-critical method (Rennie et al. (2017)), which subtracts a baseline from the reward. The baseline is the reward that the current model obtains with its own greedy (test-time) prediction. If the sampled polygon achieves a higher reward than this baseline, the value in the parentheses (reward minus baseline) is positive; if there is no improvement, it is zero or negative. This stabilizes training.
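The effect of the baseline can be sketched on a toy problem. The code below is a deliberate simplification and not the authors' implementation: the whole vertex sequence is collapsed into a single categorical choice among three hypothetical candidate polygons with made-up IoU rewards, and the baseline is the reward of the greedy (highest-probability) choice, as in self-critical training.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_critical_grad(logits, rewards, sampled):
    """Gradient of the self-critical loss w.r.t. the logits for one sample.

    loss = -(r(sampled) - r(greedy)) * log p(sampled)
    d log p(sampled) / d logit_i = onehot(sampled)_i - p_i
    """
    probs = softmax(logits)
    greedy = max(range(len(probs)), key=probs.__getitem__)  # test-time choice
    advantage = rewards[sampled] - rewards[greedy]
    return [
        -advantage * ((1.0 if i == sampled else 0.0) - p)
        for i, p in enumerate(probs)
    ]


# Made-up IoU rewards for three hypothetical candidate polygons.
rewards = [0.2, 0.9, 0.5]
logits = [0.0, 0.0, 0.0]

rng = random.Random(0)
sampled = rng.choices(range(len(logits)), weights=softmax(logits))[0]
grad = self_critical_grad(logits, rewards, sampled)
logits = [z - 0.5 * g for z, g in zip(logits, grad)]  # one gradient-descent step
print(sampled, logits)
```

When the sample beats the greedy choice, its probability is pushed up; when it does worse, it is pushed down; when it ties, the update is zero.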

Evaluator Network

The Evaluator Network predicts the IoU value from three inputs: the output of the CNN, the hidden state of the RNN, and the predicted polygon (object region).

At inference, a predicted object region is computed for each of several initial vertex candidates, and the IoU of each is estimated by this network. The best polygon (and initial vertex) is then chosen by selecting the candidate with the largest predicted IoU.
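The selection step itself is a simple argmax over predicted scores, which can be sketched as follows. `select_best_polygon` and `stub_evaluator` are hypothetical names, the scores are made up, and the stub ignores the CNN features and RNN state that the real evaluator also receives.

```python
def select_best_polygon(candidates, predict_iou):
    """Return the candidate polygon with the highest predicted IoU and its score."""
    scores = [predict_iou(polygon) for polygon in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


# Hypothetical polygons, one decoded from each initial vertex candidate.
candidates = [
    [(2, 2), (10, 2), (10, 10)],
    [(2, 2), (10, 2), (10, 10), (2, 10)],
    [(3, 2), (10, 3), (9, 10), (2, 9)],
]

# Stand-in for a trained evaluator network: fixed, made-up IoU predictions.
stub_scores = {0: 0.41, 1: 0.87, 2: 0.79}


def stub_evaluator(polygon):
    return stub_scores[candidates.index(polygon)]


best_polygon, best_score = select_best_polygon(candidates, stub_evaluator)
print(best_polygon, best_score)
```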

The Evaluator Network is trained after RL training has converged, and it is used to select among initial vertex candidates during inference. Note that this network is not used when training the Encoder / Decoder with RL.

Gated Graph Neural Networks

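The predicted polygon, with a midpoint added between adjacent vertices, is fed into a Gated Graph Neural Network (Li et al. (2015)), a Graph Neural Network that updates its node states recurrently over time steps. For each node, the network solves a classification problem: which direction the node should move.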
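The geometric part of this step, adding midpoints and then moving each node by a predicted offset, can be sketched without any neural network. Both helpers below are hypothetical, and the offsets are made-up values standing in for the per-node classifier output.

```python
def add_midpoints(polygon):
    """Insert the midpoint of every edge of a closed polygon after its start vertex."""
    nodes = []
    n = len(polygon)
    for i, (x0, y0) in enumerate(polygon):
        x1, y1 = polygon[(i + 1) % n]  # wrap around to close the polygon
        nodes.append((x0, y0))
        nodes.append(((x0 + x1) / 2, (y0 + y1) / 2))
    return nodes


def apply_offsets(nodes, offsets):
    """Move each node by its (dx, dy) offset."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(nodes, offsets)]


coarse = [(0, 0), (4, 0), (4, 4), (0, 4)]
nodes = add_midpoints(coarse)  # 4 vertices become 8 graph nodes

# Made-up offsets: in the real model each one would be the class
# (direction to move) predicted for that node.
offsets = [(0, 0), (0, -1), (0, 0), (1, 0), (0, 0), (0, 1), (0, 0), (-1, 0)]
refined = apply_offsets(nodes, offsets)
print(refined)
```

Doubling the number of nodes lets the refined outline bend between the original vertices, which is where the extra resolution comes from.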

ResNet Encoder

The Encoder has been changed from VGG to ResNet to get better quality features.

Results of Polygon RNN++

Compared with the Polygon RNN, the resolution of the output vertices is improved.

The results also show the effectiveness of RL, the Evaluator Network, and the Gated Graph Neural Network.