Introduction to Capsule Networks

Original article was published by Mike Leske on Artificial Intelligence on Medium


Each capsule’s state sj is calculated as a weighted sum of prediction vectors. Every capsule i in the lower layer multiplies its output vector by a weight matrix to produce a prediction vector for capsule j, and these prediction vectors are summed, each one weighted by the coupling coefficient cij between lower-level capsule i and capsule j.
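The weighted sum can be sketched in a few lines of plain Python. This is a minimal illustration, not the original implementation: the names u (lower-level outputs), W_j (one weight matrix per lower-level capsule) and c_j (coupling coefficients towards capsule j) follow the notation of the paper [3], and the numbers are made up for the example.

```python
def mat_vec(W, u):
    """Multiply matrix W (a list of rows) by vector u."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]

def capsule_input(W_j, u, c_j):
    """Return s_j = sum_i c_ij * (W_ij @ u_i) for one parent capsule j."""
    # Prediction vector of every lower-level capsule i for capsule j.
    u_hat = [mat_vec(W_ij, u_i) for W_ij, u_i in zip(W_j, u)]
    dim = len(u_hat[0])
    # Weighted sum of the prediction vectors, component by component.
    return [sum(c * uh[d] for c, uh in zip(c_j, u_hat)) for d in range(dim)]

# Hypothetical toy data: two lower-level capsules with 2-D outputs.
u = [[1.0, 0.0], [0.0, 2.0]]
W_j = [
    [[1.0, 0.0], [0.0, 1.0]],  # capsule 1: identity matrix
    [[0.0, 1.0], [1.0, 0.0]],  # capsule 2: swaps the two components
]
c_j = [0.25, 0.75]             # coupling coefficients towards capsule j

s_j = capsule_input(W_j, u, c_j)
print(s_j)  # [1.75, 0.0]
```

Here the second lower-level capsule has the larger coupling coefficient, so its prediction dominates the state of capsule j.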

Capsules in the first capsule layer of a capsule network are an exception: they calculate their activation from the output of the preceding convolutional layer, so no coupling coefficient cij exists for them.

Because the length of a capsule’s output vector indicates the probability of having detected a certain feature, the state sj is “squashed” into the output vector vj: long vectors are scaled to a length just below 1, short vectors shrink to a length close to zero, and the direction of the vector is preserved.
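A minimal sketch of the squashing step, assuming the formula from the paper [3], vj = (|sj|² / (1 + |sj|²)) · sj / |sj|. The function names and the small eps guard against division by zero are choices made for this example.

```python
import math

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def squash(s, eps=1e-9):
    """Rescale s so that its length lies in [0, 1) and its direction is kept."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    # norm_sq / (1 + norm_sq) is the new length, s / norm the unit direction.
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]

long_v = squash([30.0, 40.0])    # input length 50
short_v = squash([0.03, 0.04])   # input length 0.05

print(length(long_v))   # just below 1
print(length(short_v))  # close to 0
```

The long input ends up with a length of roughly 0.9996 and the short one with a length of roughly 0.0025, which matches the interpretation of the length as a probability.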

The coupling coefficients cij define the “activation routing” between a capsule i and all potential parent capsules j in the next layer; for each capsule i they sum to 1. The softmax-based calculation ensures that the most likely “parent” capsule receives the largest share of capsule i’s output.
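The normalization can be illustrated as follows. This sketch assumes, as in the paper [3], that every lower-level capsule keeps one routing logit per potential parent (called b_i here) and that the coupling coefficients are the softmax of these logits; the logit values are invented for the example.

```python
import math

def coupling_coefficients(b_i):
    """Softmax over the routing logits of one lower-level capsule."""
    m = max(b_i)                      # subtract the max for numerical stability
    exps = [math.exp(b - m) for b in b_i]
    total = sum(exps)
    return [e / total for e in exps]

# Equal logits: the output is shared equally among three parents.
uniform = coupling_coefficients([0.0, 0.0, 0.0])

# A higher logit for parent 1: that parent receives most of the output.
peaked = coupling_coefficients([0.0, 2.0, 0.0])

print(uniform)
print(peaked)
```

In both cases the coefficients sum to 1, so routing only redistributes a capsule's output among its parents and never amplifies it.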

By following the presented calculations, that is, the routing preferences between capsules and the prediction of next-layer activations, Capsule Networks claim to address the CNN limitations listed above. In particular, they model stronger feature relationships than a CNN can represent, which is a very strong tool to boost image segmentation.

Capsule Networks Use Cases

The CapsNet architecture represents a baseline implementation of Capsule Networks for the MNIST dataset. CapsNet not only achieves performance on par with state-of-the-art CNNs in digit classification, it especially excels on the MultiMNIST dataset, where each image includes 2 overlapping digits.

The CapsNet was designed with an initial classical convolutional layer, followed by 2 capsule layers.

Source: Dynamic Routing Between Capsules, Sabour, Frosst, Hinton [3]
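The layer sizes of this architecture can be checked with a little arithmetic. The sketch below assumes the hyperparameters reported in the paper [3] (28x28 input, 9x9 kernels without padding, stride 1 and then stride 2, 32 channels of 8-dimensional primary capsules, 10 digit capsules of 16 dimensions); the helper name is made up for this example.

```python
def conv_output_size(size, kernel, stride):
    """Spatial output size of a convolution without padding."""
    return (size - kernel) // stride + 1

# Layer 1: classical convolution on the 28x28 MNIST image.
conv1 = conv_output_size(28, kernel=9, stride=1)

# Layer 2: first capsule layer, a strided convolution that outputs
# 32 channels of 8-dimensional capsules per spatial position.
primary = conv_output_size(conv1, kernel=9, stride=2)
num_primary_capsules = primary * primary * 32

# Layer 3: 10 digit capsules with 16 dimensions each. Every pair of
# primary and digit capsule has its own 8x16 weight matrix.
routing_weights = num_primary_capsules * 10 * 8 * 16

print(conv1, primary, num_primary_capsules, routing_weights)
# 20 6 1152 1474560
```

Routing therefore takes place between 1152 lower-level capsules and 10 parent capsules, one per digit class.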

At the CVPR 2019 conference, several capsule use cases were presented. The left image below demonstrates how CapsNet is able to correctly detect overlapping digits thanks to the feature bonding between capsules. The right image demonstrates how SegCaps, a segmentation network built from capsules, achieves more accurate results when segmenting fine-grained retinal vessels.