Original article was published on Deep Learning on Medium
Computer Vision in Knee MRI: Segmentation of the Human Tibia Bone
This article presents the development of a Machine Learning model that interprets Magnetic Resonance Images (MRI) of the human knee and labels the pixels belonging to the Tibia bone, using a Deep Learning network and image segmentation. Deep convolutional networks have outperformed the state of the art in many visual recognition tasks; the semantic image segmentation challenge consists in classifying each pixel of an image into an instance corresponding to an object or a part of the image.
The post is divided into three sections:
The data set used consists of a total of 90 cases of human knee medical images, also known as Magnetic Resonance Imaging (MRI). These images capture the anatomy and physiological processes of the body. Each case consists of a set of 160 medical images of the knee in the Digital Imaging and Communications in Medicine (DICOM) format.
In order to extract the area of interest in each DICOM image, the Tibia bone was labeled with a software tool called BML BaseLine, which is used to mark the bounds of the bone on each DICOM image of each case.
Once the labeling was finished for each instance of the Tibia bone on the DICOM images, the mask of the tibia was extracted by reading the labeling file produced by BML BaseLine. For use with the Machine Learning model, the DICOM images were also converted to the Portable Network Graphics (PNG) format.
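The BML BaseLine label-file format is not shown in the article; assuming it reduces to a list of contour vertices per slice, a minimal NumPy sketch of turning such a contour into a binary tibia mask (a hypothetical `contour_to_mask` helper using even-odd ray casting) could look like:

```python
import numpy as np

def contour_to_mask(points, height, width):
    """Rasterize a closed contour, given as (x, y) vertices, into a binary
    mask via even-odd ray casting: a pixel is inside the polygon if a
    horizontal ray from it crosses the boundary an odd number of times."""
    pts = np.asarray(points, dtype=float)
    ys, xs = np.mgrid[0:height, 0:width]        # pixel-center coordinates
    inside = np.zeros((height, width), dtype=bool)
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]               # edge from vertex i to i+1
        crosses = (y1 > ys) != (y2 > ys)        # edge spans this pixel row?
        with np.errstate(divide="ignore", invalid="ignore"):
            # x-coordinate where the edge intersects the pixel's row
            x_at = (x2 - x1) * (ys - y1) / (y2 - y1) + x1
            hit = crosses & (xs < x_at)
        inside ^= hit                           # toggle on each crossing
    return inside.astype(np.uint8)

# Example: a square tibia-like contour rasterized into a 12x12 slice
mask = contour_to_mask([(2, 2), (8, 2), (8, 8), (2, 8)], 12, 12)
```

The resulting mask can then be saved alongside the corresponding PNG slice as the training target.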
Once the data was prepared, it was divided into three groups: a training group, a validation group, and a testing group.
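The article does not state the split ratios; a simple sketch of a per-case split, with assumed 80/10/10 fractions, might look like the following. Splitting by case rather than by slice ensures the 160 slices of one knee never appear in both training and test data:

```python
import random

def split_cases(case_ids, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle the case IDs and split them into training, validation and
    test groups. The 10%/10% fractions and the seed are assumptions; the
    article does not state the exact ratios used."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    # test group first, then validation, the remainder is training
    return ids[n_test + n_val:], ids[n_test:n_test + n_val], ids[:n_test]

# 90 cases, as in the data set described above
train_ids, val_ids, test_ids = split_cases(range(90))
```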
Training and validation
The Deep Learning model used was the U-Net architecture, which consists of a set of convolution operations, max pooling, and ReLU activations. The downsampling path extracts features with 3×3 convolutions, and the upsampling path expands by deconvolution, reducing the number of feature channels, until a final 1×1 convolution generates the segmentation map that categorizes each pixel of the input image.
The main idea behind Convolutional Neural Networks is to extract or learn features that are then mapped to an output; this works for classification, since the image is converted into a vector that is then classified. For image segmentation, that vector must be reconstructed back into an image: the predicted mask.
The network architecture is shaped like a U, and it has three sections:
Contraction: This section is made of several blocks; each block applies two convolution layers with a 3×3 kernel, as the ContractionBlock class below shows:
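The article's original listing is not reproduced here; a minimal sketch of such a `ContractionBlock`, written against PyTorch's Python API rather than the Libtorch C++ API the article mentions, might look like the following. The `padding=1` setting is an assumption that keeps the spatial size unchanged, unlike the unpadded convolutions of the original U-Net paper:

```python
import torch
import torch.nn as nn

class ContractionBlock(nn.Module):
    """One contraction block: two 3x3 convolutions, each followed by a
    ReLU activation. Channel counts are set by the caller."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```

A 2×2 max-pooling layer between blocks (shown in the complete architecture) then halves the spatial resolution.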
The contraction section is composed of four contraction blocks and four max pooling layers (see the U-Net Architecture image).
Bottleneck: It’s the bottom of the architecture; this part of the network connects and mediates the contraction and expansion sections. The bottleneck is built from just two convolution layers with a 3×3 kernel (with batch normalization) and one up-convolution layer (ConvTranspose2d in Libtorch).
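A sketch of the bottleneck as described, again in PyTorch's Python API: two 3×3 convolutions with batch normalization, followed by a ConvTranspose2d up-convolution into the expansion path (the channel counts are assumptions):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottom of the U: two 3x3 convolutions with batch normalization,
    then an up-convolution (ConvTranspose2d) into the expansion path."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        # "convolution layer UP": halves the channels, doubles the spatial size
        self.up = nn.ConvTranspose2d(out_channels, out_channels // 2,
                                     kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(self.conv(x))
```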
Expansion: Like the contraction section, this section is composed of four blocks. The expansion section is the heart, or main feature, of this architecture: each expansion block passes its input through two 3×3 convolution layers followed by an upsampling layer.
The last expansion block differs from the others, since its output is the predicted mask.
Each contraction block maps to an expansion block, so each expansion input is appended with the feature maps of the corresponding contraction layer; this ensures that the features learned while contracting are used to reconstruct the image.
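Combining the last two paragraphs, a sketch of one expansion block: the input is concatenated with the feature maps saved by the matching contraction block, passed through two 3×3 convolutions, and upsampled. The class name `ExpansionBlock` is hypothetical; the last block would replace the up-convolution with the 1×1 output convolution:

```python
import torch
import torch.nn as nn

class ExpansionBlock(nn.Module):
    """One expansion block: concatenate the skip connection from the
    matching contraction block, apply two 3x3 convolutions, then upsample
    with a ConvTranspose2d."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.up = nn.ConvTranspose2d(out_channels, out_channels // 2,
                                     kernel_size=2, stride=2)

    def forward(self, x, skip):
        # append the feature maps saved during contraction
        x = torch.cat([skip, x], dim=1)
        return self.up(self.conv(x))
```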
The complete architecture is defined as follows:
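The article's full listing is not included here; the following is a self-contained PyTorch sketch of the complete four-level U-Net described above. The channel widths (64 up to 1024) follow the original U-Net paper and are assumptions about this implementation; same-padding is used so the input and output sizes match:

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, the building block of both paths."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, in_channels=1, num_classes=1):
        super().__init__()
        # Contraction: four blocks, each followed by 2x2 max pooling
        self.enc1 = double_conv(in_channels, 64)
        self.enc2 = double_conv(64, 128)
        self.enc3 = double_conv(128, 256)
        self.enc4 = double_conv(256, 512)
        self.pool = nn.MaxPool2d(2)
        # Bottleneck at the bottom of the U
        self.bottleneck = double_conv(512, 1024)
        # Expansion: four up-convolutions, each followed by a double conv
        self.up4 = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
        self.dec4 = double_conv(1024, 512)
        self.up3 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
        self.dec3 = double_conv(512, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.dec2 = double_conv(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec1 = double_conv(128, 64)
        # Final 1x1 convolution produces the segmentation map
        self.head = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        e4 = self.enc4(self.pool(e3))
        b = self.bottleneck(self.pool(e4))
        # Each decoder step concatenates the matching encoder feature maps
        d4 = self.dec4(torch.cat([e4, self.up4(b)], dim=1))
        d3 = self.dec3(torch.cat([e3, self.up3(d4)], dim=1))
        d2 = self.dec2(torch.cat([e2, self.up2(d3)], dim=1))
        d1 = self.dec1(torch.cat([e1, self.up1(d2)], dim=1))
        return self.head(d1)

# One grayscale MRI slice in, one tibia mask logit map of the same size out
out = UNet(in_channels=1, num_classes=1)(torch.zeros(1, 1, 64, 64))
```

With `num_classes=1`, a sigmoid over the output logits gives a per-pixel tibia probability.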