Semantic segmentation: UNet

Original article was published on Deep Learning on Medium


UNet is a popular image segmentation architecture used mostly for biomedical applications. The name comes from its architecture, which contains a contracting path and an expansive path that together can be viewed as a U shape. It is a “Fully Convolutional Neural Network”, with no fully connected layers. The architecture is designed to give good results even with a small number of training images.


The left side of the U is the contraction path and the right side is the expansion path. Each step in the contraction path applies “CONV1-3×3 → CONV2-3×3 → MAXPOOL-2×2”. Each step in the expansion path applies “DECONV1-2×2 → CONCAT(CONV2-3×3, DECONV1-2×2) → CONV3-3×3 → CONV4-3×3”, where the concatenation joins the upsampled feature map with the CONV2 output from the matching contraction step. A final “CONV-1×1” at the end of the network produces the output, and the number of channels in this output layer corresponds to the number of classes to be predicted. Shallower and deeper layers are concatenated because shallower layers contain more spatial information, while deeper layers contain more abstract image features.
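To make the sequence of steps concrete, here is a minimal sketch that traces only the spatial size of the feature maps through the network. It assumes unpadded 3×3 convolutions (each trims 2 pixels), 2×2 max pooling, an upsampling step that doubles the size, and 4 levels with a 572-pixel input as in the original U-Net paper. The function name `trace_unet_sizes` is hypothetical and chosen for illustration.

```python
def trace_unet_sizes(input_size, depth=4):
    """Trace the feature-map side length through a U-Net-style network.

    Assumes unpadded 3x3 convolutions and 2x2 pooling/upsampling.
    Returns (output_size, crops), where crops[i] is the number of pixels
    trimmed from each side of the skip feature map before concatenation
    at the i-th expansion step.
    """
    size = input_size
    skips = []
    # Contraction path: CONV 3x3 -> CONV 3x3 -> MAXPOOL 2x2
    for _ in range(depth):
        size -= 4                      # two unpadded 3x3 convs, 2 pixels each
        skips.append(size)             # this map is later concatenated
        if size % 2:
            raise ValueError(f"size {size} is odd, cannot pool by 2")
        size //= 2                     # 2x2 max pooling halves the size

    size -= 4                          # two convs at the bottom of the U

    # Expansion path: upsample -> CONCAT with skip -> CONV 3x3 -> CONV 3x3
    crops = []
    for skip in reversed(skips):
        size *= 2                      # up-convolution doubles the size
        crops.append((skip - size) // 2)
        size -= 4                      # two unpadded 3x3 convs
    return size, crops


if __name__ == "__main__":
    out, crops = trace_unet_sizes(572)
    print(out, crops)
```

Under these assumptions a 572×572 input yields a 388×388 output, and the skip feature maps have to be center-cropped before concatenation because they are larger than the upsampled maps.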

What is the overlap-tile strategy?

Since unpadded convolutions are used, the output is smaller than the input. Instead of downsizing the image before the network and upsampling the result afterwards, an overlap-tile strategy is used: the whole image is predicted part by part, as in the figure above. The yellow area in the image is predicted using the larger blue area as input. At the image boundary, the missing context is extrapolated by mirroring the image.
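The tiling can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' code: the function name `extract_input_tile` and its parameters are hypothetical, the image is assumed to be 2-D (single channel), and `margin` is the number of context pixels the network consumes on each side (92 for the 572 → 388 configuration of the original paper).

```python
import numpy as np


def extract_input_tile(image, top, left, out_size, margin):
    """Return the input tile needed to predict an out_size x out_size
    output region whose top-left corner is at (top, left) in `image`.

    The image is mirrored at its borders so that tiles touching the
    boundary still receive `margin` pixels of context on every side.
    """
    # mode="reflect" mirrors the image about its edge pixels
    padded = np.pad(image, margin, mode="reflect")
    size = out_size + 2 * margin
    # (top, left) in the original image is (top + margin, left + margin)
    # in the padded image, so the tile starts at (top, left) in `padded`.
    return padded[top:top + size, left:left + size]


if __name__ == "__main__":
    image = np.zeros((512, 512), dtype=np.float32)
    tile = extract_input_tile(image, top=0, left=0, out_size=388, margin=92)
    print(tile.shape)
```

To cover the whole image, the output regions are laid out side by side; for the last row or column, one option is to choose `top` or `left` so that the output region ends exactly at the image edge, accepting some overlap with the previous tile.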

What is elastic deformation?

Elastic deformation is a data augmentation technique. It is used because annotated data is scarce in the medical field. To increase the effective size of the training set, the input image and its output segmentation map are randomly deformed in the same way.
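A minimal NumPy sketch of the idea follows. The defaults mirror the original U-Net paper (random displacement vectors on a coarse 3×3 grid, drawn from a Gaussian with a standard deviation of 10 pixels), but this is a simplified illustration: the paper uses bicubic interpolation, whereas this sketch upsamples the displacements bilinearly and resamples with nearest-neighbor lookup. The function names are hypothetical, and both arrays are assumed to be 2-D.

```python
import numpy as np


def upsample_bilinear(coarse, height, width):
    """Bilinearly interpolate a small (g, g) grid up to (height, width)."""
    g = coarse.shape[0]
    knots = np.arange(g)
    ys = np.linspace(0, g - 1, height)
    xs = np.linspace(0, g - 1, width)
    # Interpolate along x for each coarse row, then along y for each column.
    rows = np.stack([np.interp(xs, knots, row) for row in coarse])
    return np.stack([np.interp(ys, knots, col) for col in rows.T], axis=1)


def elastic_deform(image, mask, grid=3, sigma=10.0, seed=None):
    """Apply the same random smooth deformation to an image and its mask."""
    rng = np.random.default_rng(seed)
    height, width = image.shape
    # Random displacement vectors (dy, dx) on a coarse grid.
    coarse = rng.normal(0.0, sigma, size=(2, grid, grid))
    dy = upsample_bilinear(coarse[0], height, width)
    dx = upsample_bilinear(coarse[1], height, width)
    # For every output pixel, look up the displaced source pixel.
    yy, xx = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    src_y = np.clip(np.rint(yy + dy), 0, height - 1).astype(int)
    src_x = np.clip(np.rint(xx + dx), 0, width - 1).astype(int)
    # Nearest-neighbor lookup keeps mask labels intact (no blended classes).
    return image[src_y, src_x], mask[src_y, src_x]
```

The important design point is that the image and the segmentation map share one displacement field, so every pixel keeps its correct label after deformation. For smoother results on the image, a library routine such as `scipy.ndimage.map_coordinates` could replace the nearest-neighbor lookup.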

What is separation of touching objects?

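In the biomedical field, objects such as cells are often in close proximity or touching, which makes it difficult to tell them apart. To solve this, a weight map is applied to the loss at the output of the network, so that the thin background borders between touching objects contribute more to training.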
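For reference, the weight map in the original U-Net paper has the form below. The symbols follow that paper and are not defined elsewhere in this article: w_c is a class-balancing weight, d_1 is the distance from pixel x to the border of the nearest cell, d_2 is the distance to the border of the second nearest cell, and the paper reports w_0 = 10 and σ ≈ 5 pixels.

```latex
w(\mathbf{x}) = w_c(\mathbf{x})
  + w_0 \cdot \exp\!\left( -\frac{\bigl(d_1(\mathbf{x}) + d_2(\mathbf{x})\bigr)^2}{2\sigma^2} \right)
```

The exponential term is large only where both distances are small, that is, in the narrow gaps between two neighboring cells, which is exactly where the network needs to learn a separating border.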


On the ISBI 2012 dataset, this architecture was the best performing at the time of its release.