Deep-Learning based tool for prognosis of Alzheimer

Original article was published by Carlos Sanmiguel on Deep Learning on Medium

According to the World Health Organization (WHO), an estimated 152 million people will suffer from dementia by 2050. Alzheimer’s disease, a subtype of dementia, accounts for around 60–70% of cases [1]. The disease progresses through several stages, from the weak impact of the early stage known as Mild Cognitive Impairment (MCI) to memory, reasoning, and behaviour problems as it develops. There is currently no treatment that stops its advance. Nevertheless, if it is detected early that a subject is developing the disease, symptoms can be treated temporarily to minimize their impact on the subject’s daily life, which can add to life expectancy, in the best cases by up to around 10 years.

The development of new technologies has allowed the scientific community to start proposing different approaches to the diagnosis and prognosis of Alzheimer’s disease. Although there are still relatively few studies for this particular problem, Artificial Intelligence (AI) and its branches, such as Machine Learning (ML) and Deep Learning (DL), have proven their effectiveness in analysis and forecasting for other medical problems. In particular, DL- and ML-based algorithms have been used for the diagnosis and prognosis of Alzheimer’s disease, as shown in several studies [1,2,3].

Due to the complexity of the problem, previous studies rely on combining different sources of clinical information such as demographic/clinical data, structural MRI, and PET scans. In practice, however, many public datasets do not contain all this information, and some tests such as PET scans are not possible for all patients because of adverse reactions to the tracer. For these reasons, a deep learning model that relies only on non-invasive procedures would be a significant step forward in the Alzheimer’s prognosis field. With this purpose in mind, the aim of this project is to develop a DL-based early diagnosis and prognosis tool for Alzheimer’s disease using structural MRIs and clinical data of the subject. The dataset used here is the ADNI (Alzheimer’s Disease Neuroimaging Initiative) database, which contains around 1000 subjects with 1.5T and 3T three-dimensional structural MRIs and clinical data. Subjects are classified by the severity of the disease at the baseline visit into different groups: Normal Cognitive (NL), Mild Cognitive Impairment (MCI), or Alzheimer’s disease (AD), and their evolution and progression to subsequent states over the years is recorded. Diagnosis and prognosis are treated as a classification problem, where the outputs of the network are the probabilities of belonging, on one hand, to the group of NL subjects or MCI subjects that do not develop AD, or, on the other hand, to the group of MCI subjects that develop AD or subjects that were already AD at baseline. The deep-learning-based tool is built on a 3D convolutional network that uses few-shot learning techniques to manage the shortage of instances. This might allow us to extract hidden relevant features that could help future Alzheimer’s treatment and prognosis techniques.


A statistical study of the data was conducted to allow for a balanced and suitable split of the train and test sets. The images were partially preprocessed but required some additional work. Both the MRI images and the clinical data correspond to the first medical examination. Figure 1 shows the images as they were extracted from the ADNI database. Skull-stripping was first needed to extract the brain. To do that, we used a DL-based tool named deepbrain, which extracts a mask of the brain, as shown in figure 2.

Figure 1. Non-preprocessed structural MRI extracted from a patient with AD.
Figure 2. Masked brain extracted from structural MRI using deepbrain.
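Tools like deepbrain produce a voxel-wise probability map of brain tissue; the skull-stripping step then amounts to thresholding that map and zeroing everything outside it. A minimal sketch of that masking step, using a synthetic volume and a synthetic probability map in place of a real MRI and the deepbrain output (the deepbrain call itself is omitted):

```python
import numpy as np

# Synthetic stand-ins: a 3D "MRI" volume and a brain-probability map
# (in practice the probability map would come from deepbrain's extractor).
rng = np.random.default_rng(0)
volume = rng.uniform(1.0, 255.0, size=(8, 8, 8))
prob = np.zeros((8, 8, 8))
prob[2:6, 2:6, 2:6] = 0.9        # "brain" region with high probability

mask = prob > 0.5                # binarize the probability map
brain = volume * mask            # zero out non-brain voxels (skull, background)

print(brain[0, 0, 0])            # background voxel: 0.0
print(np.array_equal(brain[mask], volume[mask]))  # brain voxels preserved: True
```

The threshold of 0.5 is an illustrative choice; the appropriate cut-off depends on the mask quality for each dataset.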

To normalize the images and allow for a suitable spatial representation, they were corrected for noise and intensity (N3 bias correction) and registered (translated, rotated, and resized) against the MNI152 template from the Montreal Neurological Institute (shown in figure 3), so that they all share a common space. This was done using ANTsPy.

Figure 3. Several slices extracted from the MNI152 template.

Figure 4 shows the same brain as in figure 2 after registration against MNI152, confirming that it now shares the common space we were looking for.

Figure 4. Extracted brain registered against MNI152 template

After registration, cropping of the black stripes, and resizing, the images have a resolution of 111x139x154 voxels, each weighing around 2.37 MB.
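The cropping of black stripes amounts to trimming the all-zero planes that surround the brain after registration. A minimal numpy sketch of that bounding-box crop (illustrative only, not the project’s exact code):

```python
import numpy as np

def crop_to_bounding_box(vol):
    """Trim all-zero slabs on every axis, keeping the nonzero bounding box."""
    nonzero = np.argwhere(vol != 0)
    lo = nonzero.min(axis=0)
    hi = nonzero.max(axis=0) + 1
    return vol[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

# Toy volume: the "brain" occupies an interior box, the rest is background.
vol = np.zeros((10, 12, 9))
vol[3:7, 4:9, 2:8] = 1.0

print(crop_to_bounding_box(vol).shape)   # (4, 5, 6)
```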

From the clinical data, the following variables were chosen: age in years, gender, years of education, ethnicity, race, marital status, APOE4 expression level, Clinical Dementia Rating Scale–Sum of Boxes, Alzheimer’s Disease Assessment Scale–Cognitive Subscale based on 11 questions, Alzheimer’s Disease Assessment Scale–Cognitive Subscale based on 13 questions, Mini-Mental State Examination, Rey’s Auditory Verbal Learning Test, Functional Activities Questionnaire, and the polygenic hazard score. The numerical variables are normalized to [0, 1] and the categorical variables are encoded using a one-hot encoder.
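Both preprocessing steps are standard; a small sketch with made-up values for one numerical and one categorical variable (the column names and data here are purely illustrative):

```python
import numpy as np

# Hypothetical values for one numerical and one categorical clinical variable.
age = np.array([62.0, 75.0, 81.0, 70.0])
marital = ["Married", "Widowed", "Married", "Divorced"]

# Min-max normalization to [0, 1]
age_norm = (age - age.min()) / (age.max() - age.min())

# One-hot encoding of the categorical variable
categories = sorted(set(marital))   # ['Divorced', 'Married', 'Widowed']
one_hot = np.array([[c == m for c in categories] for m in marital], dtype=float)

print(age_norm.min(), age_norm.max())   # 0.0 1.0
print(one_hot.sum(axis=1))              # each row has exactly one 1
```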


Typically, machine learning problems related to clinical pathologies suffer from a lack of data due to the difficulty of obtaining samples. Clinical data that includes medical images has an additional limitation, since the images require labelling by an expert in the pathology. For this reason, an approach based on few-shot learning algorithms was chosen for this project. Few-shot learning techniques offer a way to create robust classifiers from a limited amount of training data [4]. In this project, a deep neural network based on a Triplet Loss architecture is employed to overcome the limited number of samples.

In the Triplet Loss algorithm, the objective is to build triplets consisting of an anchor image, a positive image, and a negative image [5]. Positive images have the same label as the anchor and are similar to it; conversely, negative images have a different label and are chosen to be dissimilar to the anchor. As shown in Figure 5, the idea is to minimize the distance between the positive samples and the anchor and, in contrast, maximize the distance between the anchor and the negative. By the end of training, the Triplet Loss algorithm generates embeddings of the samples that form well-separated cluster regions associated with the different labels.

Figure 5. Image taken from FaceNet paper [6].

The distance employed in this algorithm is the Euclidean distance, which, with f denoting the embedding produced by the network, allows the loss function to be defined as follows:

L(A, P, N) = max(‖f(A) − f(P)‖² − ‖f(A) − f(N)‖² + α, 0)

where A is the anchor input, P is the positive sample input, N is the negative sample input, and α is a margin used to specify when a triplet has become trivial, so that there is no interest in adjusting the weights from it. In this project, the chosen triplet definition is “semi-hard”: triplets where the negative is farther from the anchor than the positive but still produces a positive loss.
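The loss and the semi-hard condition can be checked numerically on a single triplet. A small sketch with toy 2-D embeddings (real embeddings would be the network’s outputs, and losses would be computed over a batch):

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    """Triplet loss with squared Euclidean distances: max(d_ap - d_an + margin, 0)."""
    d_ap = np.sum((a - p) ** 2)
    d_an = np.sum((a - n) ** 2)
    return max(d_ap - d_an + margin, 0.0)

def is_semi_hard(a, p, n, margin=0.2):
    """Semi-hard: negative farther than positive, but loss still positive."""
    d_ap = np.sum((a - p) ** 2)
    d_an = np.sum((a - n) ** 2)
    return d_ap < d_an < d_ap + margin

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # d_ap = 0.01
n = np.array([0.3, 0.0])   # d_an = 0.09  ->  0.01 < 0.09 < 0.21

print(is_semi_hard(a, p, n))            # True
print(round(triplet_loss(a, p, n), 3))  # 0.12
```

Moving the negative past d_ap + margin from the anchor would make the loss zero and the triplet trivial, which is exactly why such triplets are excluded.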


This choice is justified in the FaceNet paper [6], where this algorithm was used for face recognition and the best results were obtained using these triplets.


The deep learning model proposed here is designed around the constraint of having few training examples (fewer than 1000). Therefore, a parameter-efficiency strategy is followed, with the objective of keeping a deep network capable of learning the necessary features while preventing overfitting during training. This is achieved with:

  1. Residual connections to preserve variability
  2. Custom separable 3D convolutional layers (far fewer parameters than a standard 3D convolution) [7]
  3. Few Shot Learning strategy: Triplet Semi Hard Loss (clustering)
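For point 2, the savings can be estimated with a quick parameter count, assuming a k×k×k kernel, C_in input channels, and C_out output channels (biases ignored; the project’s custom layer may differ in detail):

```python
def conv3d_params(k, c_in, c_out):
    """Standard 3D convolution: one k*k*k kernel per (input, output) channel pair."""
    return k ** 3 * c_in * c_out

def separable_conv3d_params(k, c_in, c_out):
    """Depthwise-separable variant: one k*k*k filter per input channel,
    followed by a 1x1x1 pointwise convolution to mix channels."""
    return k ** 3 * c_in + c_in * c_out

# Illustrative layer sizes
k, c_in, c_out = 3, 32, 64
print(conv3d_params(k, c_in, c_out))            # 55296
print(separable_conv3d_params(k, c_in, c_out))  # 2912
```

For these illustrative sizes the separable layer needs roughly 19x fewer parameters, which is what makes a reasonably deep 3D network feasible with so few training subjects.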

Figure 6 shows a summary of the model, which is built from convolutional and separable-convolutional blocks: convolutional layer, batch normalization, rectified linear activation, optional max pooling, and dropout. First, an embedding of the MRI image is obtained using these convolutional blocks; it is then concatenated with the clinical data, which is passed through a fully connected layer. A final embedding combining both data sources is obtained, and an SVC classifier is then used to predict the class, which can be Healthy Control or Alzheimer’s Disease. To sum up, embeddings of size 100 are generated for every 3D input image and its clinical data; the triplet loss is then computed over the whole batch, measuring the Euclidean distances between the candidate triplets. The resulting embeddings are used to fit a simple SVM, from which accuracy metrics are obtained.

Figure 6. Model architecture scheme

The dropout rate has been set to 0.1, kernel regularizers to L2, and the batch size to 64 for this preliminary study. Note that larger batches would be beneficial for triplet-loss learning, since more information is extracted at every weight-optimization step. Training, validation, and test sizes are set to 579, 69, and 34 subjects, respectively. The full code and details can be found in the project’s GitHub repository:


Figure 7 shows the resulting embeddings from an MRI + clinical data training run and an MRI-only training run. These embeddings were used to fit an SVM, obtaining the accuracy metrics shown in Figure 8.
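Fitting the SVM on precomputed embeddings is a short final step; a sketch with random toy embeddings standing in for the network’s 100-dimensional outputs (hypothetical data, not the project’s results):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy stand-ins for 100-dimensional embeddings of two well-separated clusters,
# mimicking what a trained triplet-loss network should produce.
emb_hc = rng.normal(loc=-1.0, scale=0.3, size=(40, 100))   # "Healthy Control"
emb_ad = rng.normal(loc=+1.0, scale=0.3, size=(40, 100))   # "Alzheimer's Disease"
X = np.vstack([emb_hc, emb_ad])
y = np.array([0] * 40 + [1] * 40)

clf = SVC(kernel="linear").fit(X, y)
print(clf.score(X, y))   # well-separated clusters give near-perfect accuracy
```

In practice the SVM would be fitted on the training embeddings and scored on held-out test embeddings, which is where the reported accuracy metrics come from.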