Review: 3D U-Net — Volumetric Segmentation (Medical Image Segmentation)

Source: Deep Learning on Medium

3D U-Net for Dense Volumetric Segmentation from Sparse Annotation

Volumetric Segmentation

In this story, 3D U-Net is briefly reviewed. It is a work by the University of Freiburg, the BIOSS Centre for Biological Signalling Studies, University Hospital Freiburg, University Medical Center Freiburg, and Google DeepMind. It was published at MICCAI 2016 and has over 600 citations. (Sik-Ho Tsang @ Medium)


  1. 3D U-Net Architecture
  2. Results

1. 3D U-Net Architecture

3D U-Net Architecture
  • The 3D U-Net architecture is quite similar to the U-Net.
  • It comprises an analysis path (left) and a synthesis path (right).
  • In the analysis path, each layer contains two 3×3×3 convolutions, each followed by a ReLU, and then a 2×2×2 max pooling with a stride of two in each dimension.
  • In the synthesis path, each layer consists of a 2×2×2 up-convolution with a stride of two in each dimension, followed by two 3×3×3 convolutions, each followed by a ReLU.
  • Shortcut connections from layers of equal resolution in the analysis path provide the essential high-resolution features to the synthesis path.
  • In the last layer, a 1×1×1 convolution reduces the number of output channels to the number of labels, which is 3.
  • Batch normalization (BN) is applied before each ReLU.
  • The network has 19,069,955 parameters in total.
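
As a rough check of the parameter count, the layers listed above can be tallied by hand. The sketch below is not the authors' code. The channel widths are an assumption taken from the architecture figure of the original paper (they are not stated in this review), and the bias convention (biases on the 3×3×3 and 1×1×1 convolutions, none on the up-convolutions, BN parameters not counted) is a guess, since the review does not say how the total was computed.

```python
# Assumption: channel widths read off the architecture figure of the paper.
CONV3 = [  # (in_channels, out_channels) of every 3x3x3 convolution
    (3, 32), (32, 64),              # analysis path, full resolution
    (64, 64), (64, 128),            # analysis path, 1/2 resolution
    (128, 128), (128, 256),         # analysis path, 1/4 resolution
    (256, 256), (256, 512),         # bottom of the "U", 1/8 resolution
    (256 + 512, 256), (256, 256),   # synthesis path, after shortcut concat
    (128 + 256, 128), (128, 128),
    (64 + 128, 64), (64, 64),
]
UPCONV = [(512, 512), (256, 256), (128, 128)]  # 2x2x2 up-convolutions
FINAL = (64, 3)  # 1x1x1 convolution down to the 3 labels


def conv_params(c_in, c_out, kernel, bias=True):
    """Weights (kernel^3 * c_in * c_out) plus optional per-channel bias."""
    return kernel ** 3 * c_in * c_out + (c_out if bias else 0)


def count_parameters():
    total = sum(conv_params(i, o, 3) for i, o in CONV3)
    # Assumption: up-convolutions are counted without a bias term.
    total += sum(conv_params(i, o, 2, bias=False) for i, o in UPCONV)
    total += conv_params(*FINAL, kernel=1)
    return total


print(count_parameters())
```

Under these assumptions the tally comes to 19,069,955, the same figure quoted above. A different bias or BN convention would shift the total by a few thousand parameters.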

2. Results

2.1. Some Details

  • Different structures were given the labels 0: “inside the tubule”, 1: “tubule”, 2: “background”, and 3: “unlabeled”.
  • Weighted cross entropy loss is used, where weights are reduced for the frequently seen background and weights are increased for the inner tubule to reach a balanced influence of tubule and background voxels on the loss.
  • Voxels with label 3 (“unlabeled”) do not contribute to the loss computation, i.e. have a weight of 0.
  • The volumes are used at half the original resolution, i.e. down-sampled by a factor of two.
  • Only 3 samples of Xenopus kidney are used.
  • The data sizes used in the experiments are 248×244×64, 245×244×56, and 246×244×59 in x×y×z dimensions for samples 1, 2, and 3, respectively.
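
The weighted loss with ignored “unlabeled” voxels described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the class weight values are hypothetical, and normalizing by the sum of weights is an assumption, since the review gives neither.

```python
import math

UNLABELED = 3
# Hypothetical weights: increased for the inner tubule (0), reduced for the
# frequently seen background (2), zero for unlabeled voxels (3).
CLASS_WEIGHTS = {0: 2.0, 1: 1.0, 2: 0.2, UNLABELED: 0.0}


def weighted_cross_entropy(probs, labels, weights=CLASS_WEIGHTS):
    """probs: per-voxel softmax probabilities for labels 0..2.
    labels: per-voxel ground-truth label in {0, 1, 2, 3}."""
    weighted_sum = 0.0
    weight_total = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        if w == 0.0:
            continue  # unlabeled voxels contribute nothing to the loss
        weighted_sum += -w * math.log(p[y])
        weight_total += w
    # Assumption: normalize by the total weight of the contributing voxels.
    return weighted_sum / weight_total if weight_total else 0.0


probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.3, 0.4]]
labels = [0, 2, UNLABELED]  # the third voxel is ignored
print(weighted_cross_entropy(probs, labels))
```

Because unlabeled voxels carry zero weight, the network can be trained on sparsely annotated volumes without the unannotated regions pulling the prediction toward any class.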

2.2. Two Cases

Semi-Automatic segmentation (Top) Fully-Automatic Segmentation (Bottom)
  • 1st case: Semi-automatic segmentation based on a sparsely annotated data set, i.e. only some slices of the 3D volume are annotated and the network fills in the rest.
  • The numbers of manually annotated slices in the orthogonal (yz, xz, xy) directions are (7, 5, 21), (6, 7, 12), and (4, 5, 10) for samples 1, 2, and 3, respectively.
  • 2nd case: Fully-automatic segmentation, in which the trained network generalizes to a new data set.
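
To get a feel for how sparse these annotations are, the share of voxels lying on at least one annotated slice can be estimated. The sketch below is a back-of-the-envelope calculation under several assumptions that the review does not confirm: every annotated slice is fully labeled, slices in one direction are distinct, and the slice counts apply directly to the down-sampled volume sizes quoted earlier.

```python
def labeled_fraction(shape, n_annotated):
    """shape: (x, y, z) volume size in voxels.
    n_annotated: number of annotated (yz, xz, xy) slices.

    A yz slice fixes x, an xz slice fixes y, an xy slice fixes z, so a voxel
    is unlabeled only if none of its three coordinates hits an annotated slice.
    """
    unlabeled = 1.0
    for size, n in zip(shape, n_annotated):
        unlabeled *= 1.0 - n / size
    return 1.0 - unlabeled


samples = {
    1: ((248, 244, 64), (7, 5, 21)),
    2: ((245, 244, 56), (6, 7, 12)),
    3: ((246, 244, 59), (4, 5, 10)),
}
for name, (shape, n_annotated) in samples.items():
    print(f"sample {name}: {labeled_fraction(shape, n_annotated):.1%} labeled")
```

The remaining voxels would carry label 3 (“unlabeled”) during training.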

2.3. Semi-Automatic Segmentation

Effect of # of slices for semi-automated segmentation (IoU)
  • The experiment starts with 1 annotated slice in each orthogonal direction and gradually increases the number of annotated slices.
  • The more slices are annotated, the higher the IoU.
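
For reference, the IoU (intersection over union) used as the metric here can be computed per class as below. This is a generic sketch, not the paper's evaluation script: the function name, the flat label lists, and the choice to skip voxels whose ground truth is “unlabeled” are all assumptions.

```python
def iou(pred, truth, cls, ignore=3):
    """Intersection over union for one class over flat label sequences.
    Voxels whose ground-truth label equals `ignore` are skipped."""
    inter = 0
    union = 0
    for p, t in zip(pred, truth):
        if t == ignore:
            continue
        p_hit = p == cls
        t_hit = t == cls
        inter += p_hit and t_hit
        union += p_hit or t_hit
    return inter / union if union else float("nan")


pred = [1, 1, 0, 2, 1, 0]
truth = [1, 0, 0, 2, 1, 3]  # the last voxel is unlabeled and ignored
print(iou(pred, truth, cls=1))  # 2 shared voxels out of 3 in the union
```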
Cross validation results for semi-automated segmentation (IoU)
  • The 77 manually annotated slices from all 3 samples are partitioned into three subsets for a 3-fold cross validation, run both with and without batch normalization (BN).
  • 3D U-Net with BN outperforms the other variants.

2.4. Fully-automated Segmentation

Cross validation results for fully-automated segmentation (IoU)
  • Train on two kidney volumes, segment the third one.
  • BN improves the result in every setting except the third.
  • The authors argue that large differences between the data sets are responsible for this effect, and that much larger sample sizes would be the remedy.


[2016 MICCAI] [3D U-Net]
3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation
