Article

Computer-Aided Diagnosis of Alzheimer’s Disease through Weak Supervision Deep Learning Framework with Attention Mechanism

Shuang Liang and Yu Gu *
1 School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
2 School of Automation, Guangdong University of Petrochemical Technology, Maoming 525000, China
3 Beijing Advanced Innovation Center for Soft Matter Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China
4 Department of Chemistry, Institute of Inorganic and Analytical Chemistry, Goethe-University, 60438 Frankfurt, Germany
* Author to whom correspondence should be addressed.
Sensors 2021, 21(1), 220; https://doi.org/10.3390/s21010220
Submission received: 1 December 2020 / Revised: 28 December 2020 / Accepted: 28 December 2020 / Published: 31 December 2020
(This article belongs to the Special Issue Computer Vision and Machine Learning for Medical Imaging System)

Abstract
Alzheimer’s disease (AD) is the most prevalent neurodegenerative disease causing dementia and poses significant health risks to middle-aged and elderly people. Brain magnetic resonance imaging (MRI) is the most widely used diagnostic method for AD. However, it is challenging to collect sufficient brain imaging data with high-quality annotations. Weakly supervised learning (WSL) is a machine learning technique aimed at learning effective feature representations from limited or low-quality annotations. In this paper, we propose a WSL-based deep learning (DL) framework (ADGNET) consisting of a backbone network with an attention mechanism and task networks for simultaneous image classification and image reconstruction, which identifies and classifies AD using limited annotations. The ADGNET achieves excellent performance on six evaluation metrics (Kappa, sensitivity, specificity, precision, accuracy, F1-score) on two brain MRI datasets (2D MRI and 3D MRI data) when fine-tuned with only 20% of the labels from both datasets. The ADGNET achieves an F1-score of 99.61% and a sensitivity of 99.69%, outperforming two state-of-the-art models (ResNeXt WSL and SimCLR). The proposed method represents a potential WSL-based computer-aided diagnosis method for AD in clinical practice.

1. Introduction

Alzheimer’s disease (AD) is a common chronic progressive neurodegenerative disease of the elderly characterized by progressive dementia and brain degeneration. It significantly affects the cognitive functions, memory, quality of life, and emotions of more than 50 million people worldwide [1]. According to a report by the World Health Organization (WHO), AD has become the fifth leading cause of death, and the number of AD patients is expected to increase to 152 million by 2050 [2]. However, the etiology of AD remains unclear, and there are no effective drugs or treatments to reverse dementia [3]. The preclinical stage of AD, called mild cognitive impairment (MCI), is a transitional state between normal aging and AD [4]. According to a report of the American Academy of Neurology [5], about 10% to 15% of patients with MCI may eventually develop AD, whereas only 1% to 2% of patients experience normal aging. Unfortunately, owing to a lack of understanding of AD by patients and their family members, most patients are already in the moderate or severe stage of AD at the time of diagnosis and have missed the optimal intervention window [6]. Therefore, it is of great significance to identify the risk and extent of AD as early as possible. Typically, doctors have to conduct careful medical assessments of patients, such as neuropsychological examinations and neuroimaging, to identify the risk and extent of developing dementia [7].
As a result of significant progress in neuroimaging technology, characteristic changes can now be observed in the brains of patients with AD, including changes in the prodromal and presymptomatic states, providing information for doctors to obtain a more accurate diagnosis [8]. Different neuroimaging techniques have been used in clinical practice to diagnose AD, including computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI) [9]. CT is a structural imaging technique that integrates X-ray projections from multiple angles and generates cross-sectional or three-dimensional (3D) images [10]. It has the advantages of low cost and fast examination speed; however, its resolution of the medial temporal lobe is relatively low, which may lead to MCI being misdiagnosed as a normal aging symptom [11]. PET is a functional imaging technique that provides useful information for the diagnosis of AD by detecting the distribution of positron nuclide markers to obtain metabolic information [12]. However, both CT and PET examinations expose the patient to radiation, whereas MRI has the unique advantage of not causing radiation damage [13]. MRI is a medical imaging technique that uses electromagnetic signals obtained from the human body by magnetic resonance to generate images of organs [14]. Moreover, it is highly sensitive to brain contraction and can be used to construct 3D brain tissue images at high resolution [15]. Therefore, MRI is a promising modality for understanding and diagnosing AD in clinical practice.
With the rapid development and wide application of artificial intelligence (AI) in the medical field, computer-aided diagnosis (CAD) of AD using neuroimaging may serve as an auxiliary method to assist physicians. Here, CAD can be regarded as an image understanding and classification problem. Deep learning (DL), in particular convolutional neural networks (CNNs), has proved to be an effective method of feature extraction from images and has provided state-of-the-art (SOTA) solutions in different image understanding and recognition tasks. Various DL-based methods for CAD have also been developed. Mansour used AlexNet [16] for diagnosing diabetic retinopathy and achieved an accuracy of 97.93% [17]. Song et al. designed a patch-based DL framework to detect prostate cancer using MRI data and achieved a specificity of 90.6% [18]. Zhu et al. proposed a landmark-based feature representation method and employed a CNN model for the diagnosis of AD with an accuracy of 91.57% [19]. These are supervised learning methods that require a large sample size and high-quality manually annotated data for accurate feature representation [20].
However, obtaining medical images along with high-quality annotations is time-consuming and costly in practical applications. Therefore, the development of weakly supervised learning (WSL) methods is of great significance for mining massive amounts of medical image data at low cost and with high accuracy. Mahajan et al. presented a WSL method that uses ResNeXt [21] as the backbone network and pre-trains it on images from the Instagram website, with hashtags used as labels; the pre-trained model was then fine-tuned on the ImageNet dataset and achieved a top-1 accuracy of 85.4% on the ImageNet-1k benchmark [22]. In 2020, Chen et al. presented SimCLR, a simple framework for contrastive learning of visual representations trained in a self-supervised manner; SimCLR achieved a high accuracy of 85.8% using only 1% of the ImageNet labels [23].
This paper proposes a WSL-based DL framework for the identification and classification of AD. The proposed framework consists of two parts, i.e., the backbone network with an attention mechanism and the task networks. This paper provides the following contributions:
  • An attention module (AM) is proposed to improve the discriminative ability of the backbone network with a low computational cost. The AM is an automatic weighting module that adjusts the weights of the channels in the feature maps so that the backbone network selectively focuses on the significant parts of the input.
  • The task networks perform two tasks (image classification and image reconstruction) in parallel. The task networks utilize the feature vector generated by the backbone network and use fully-connected (FC) layers and a decoder for label prediction and image reconstruction.
  • A multi-task learning (MTL) framework is proposed for conducting image recognition and reconstruction in parallel, with low computational requirements and good performance (with the best F1-score of 99.61% and a sensitivity of 99.69%) using only 20% of the labels from the datasets for fine-tuning.
The rest of the paper consists of five parts. The related studies are described in Section 2. The proposed method is explained in Section 3. Section 4 summarizes the results. The discussion is presented in Section 5, and the conclusion is given in Section 6.

2. Related Works

2.1. Multi-Task Learning

Multi-task learning is a transfer learning method that extracts domain-specific information from related tasks for an improved representation of the input data [24]. The concept of MTL was first proposed by Caruana [25] and has been applied in many fields. Various studies have been conducted to explore effective MTL methods. Misra et al. built a novel sharing unit to learn representations from different tasks and reported excellent results [26]. Lu et al. developed an adaptive feature-sharing mechanism in MTL to identify different attributes of people [27]. In the studies above, each task has the same priority. In other work, the focus is on a single main task, while the remaining tasks act as auxiliary tasks that provide additional information supporting the main task. Zhang et al. used head pose estimation and facial attribute prediction as auxiliary tasks and facial landmark detection as the main task; the precision of this method was higher than that of other methods [28]. Our framework belongs to this type of transfer learning method.

2.2. Weakly Supervised Learning

It is well known that supervised learning-based models require a large amount of well-labeled data to obtain accurate predictions. In contrast, unsupervised learning-based models typically lack high precision, and the learning process is less effective than that of supervised learning methods [29]. Weakly supervised learning is a machine learning technique with the objective of learning effective feature representations from limited or low-quality annotations [30]. Various WSL-based methods have been explored in different fields. Hu et al. proposed a CNN-based WSL framework for the task of multimodal image registration and achieved SOTA performance [31]. Wang et al. developed a WSL-based method for accurate automated segmentation of remote sensing data using a proposed U-Net framework and obtained superior segmentation performance [32]. ResNeXt WSL is the most recent WSL method; it was pre-trained on images from the Instagram website and then fine-tuned on the ImageNet dataset [22]. SimCLR, a self-supervised contrastive learning method, was used in combination with WSL and achieved SOTA performance [23].

2.3. Image Classification

The objective of image classification is to classify an image or instance into categories [33]. With the rapid development and verification of CNNs, various CNN-based frameworks have been developed and used for image classification [34]. LeCun et al. first developed a CNN framework (LeNet) for document recognition [35]. AlexNet, an improvement on LeNet that surpassed traditional machine learning methods, won the ImageNet competition in 2012 [16]. Since then, different types of CNN architectures have been proposed, such as ResNet [36], ResNeXt [21], and InceptionNet [37]. In this study, we adopted the structure of the residual block used in ResNet [36].

2.4. Image Reconstruction

Image reconstruction, a critical problem in medical imaging, is a technique for creating 2D or 3D images from sets of 1D projections [38]. The autoencoder is the most popular technique and has proved effective for the reconstruction of unlabeled images [39]. In this study, we designed a sub-network to perform image reconstruction using the abundant features extracted by the backbone network.

3. Materials and Methods

3.1. The Pipeline of the Proposed Framework

The proposed WSL-based DL framework, called ADGNET, is a CNN-based single-input-multi-output (SIMO) architecture consisting of two components: an improved backbone network with the attention mechanism and task nets comprising two sub-networks, i.e., the classification sub-network (CSN) and the reconstruction sub-network (RSN). The backbone network has a residual network structure with the proposed AM to obtain highly discriminative representations while suppressing unrelated regions in the images. As shown in Figure 1, the backbone network consists of five convolution stages (C1-Attention to C5-Attention) following the ResNet design. Generally, feature maps generated by the deeper stages contain more semantic information, and those generated by the shallower stages contain more detailed information, such as edges and corners. The backbone network extracts features step-by-step from the input MRI images and generates a pooling map using global average pooling (GAP). The task nets use the pooling map as input and flatten it into a feature vector $V_f$. The $V_f$ is then sent to two task branches; one generates the prediction vector $V_p$ using an FC layer, and the other reconstructs the original images using FC layers and a decoding module. As shown in Figure 1, $V_p$ is sent to the argmax (Amax) operation, which returns the index of the largest value along the class axis of $V_p$ and provides the classification result. Here, $C$ denotes the number of categories.
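To make the SIMO structure concrete, the following is a minimal PyTorch sketch of the forward pass, assuming a generic ResNet-style backbone and a decoder module; the class and parameter names (e.g., `ADGNetSketch`, `code_dim`) are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class ADGNetSketch(nn.Module):
    """Single-input-multi-output sketch of the ADGNET pipeline.

    A shared backbone produces a pooled feature vector V_f, which feeds
    two parallel heads: a classification head (V_p) and a reconstruction
    head (two encoder FC layers followed by a decoder)."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int,
                 decoder: nn.Module, code_dim: int = 256):
        super().__init__()
        self.backbone = backbone             # five attention-augmented stages
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling (GAP)
        self.fc_p = nn.Linear(feat_dim, num_classes)  # classification branch FC_p
        self.fc_e1 = nn.Linear(feat_dim, code_dim)    # encoder FC_e1
        self.fc_e2 = nn.Linear(code_dim, code_dim)    # encoder FC_e2
        self.decoder = decoder               # transposed-convolution decoder

    def forward(self, x):
        feat = self.backbone(x)              # B x feat_dim x H x W
        v_f = self.gap(feat).flatten(1)      # flattened feature vector V_f
        v_p = torch.sigmoid(self.fc_p(v_f))  # class probabilities V_p
        v_e = self.fc_e2(self.fc_e1(v_f))    # latent code V_e
        recon = self.decoder(v_e)            # reconstructed image
        return v_p, recon                    # argmax over v_p gives the class
```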

3.2. Backbone Network with the Proposed Attention Module

The backbone network is a multi-stage convolution network that follows the ResNet design to avoid the vanishing-gradient problem. Different stages in the backbone network generate feature maps with different resolutions. Notably, the backbone network is shared by the two sub-task nets, providing a parameter-efficient and time-efficient design. In the reconstruction task, the backbone network can be regarded as part of an encoder network that automatically learns feature representations from images without annotation information. In the classification task, the backbone network can be considered a feature extractor optimized under the supervised learning principle. As described in Section 3.1, each stage of the backbone network generates attention feature maps after applying the proposed AM. As shown in Figure 2, the input feature maps $F_i^s$ with a size of $H \times W \times C$ and a scale of $s$ at layer $i$ are the output of the $i$th stage of the ResNet. Given $F_i^s$ as input, the AM outputs the channel attention factors (CAF) with a size of $1 \times 1 \times C$ so that the network can automatically determine the importance of the extracted features. This process can also be regarded as a feature filtering and selection method that improves the discrimination ability of the network at low computational cost. The CAF are then fused with $F_i^s$ using the element-wise multiplication operation (EWMO), and the fused feature map $\tilde{F}_i^s$ is the output. The feature extraction process of the backbone network can be expressed as follows:
$CAF = AM(F_i^s), \quad \tilde{F}_i^s = CAF \otimes F_i^s$  (1)
where AM is the proposed attention module, and ⊗ is the EWMO.
The AM is an automatic weighting module that learns the channel weights of the input feature maps. As shown in Figure 2, the input $F_i^s$ is first downsampled using a global max-pooling operation to retain important information while reducing the computational cost. The downsampled feature maps are then flattened into a one-dimensional vector and sent to a convolution layer with a $1 \times 1$ kernel to extract features and adjust the dimension. The extracted features are sent to a batch norm (BN) layer and activated using the ReLU function to speed up the training and convergence of the module and increase its nonlinear representation ability. After these operations, an FC layer (Linear) with ReLU as the activation function outputs the CAF with a size of $1 \times 1 \times C$. The final output is generated using the sigmoid function, which converts the CAF values to the range of 0 to 1; these values can be considered the importance scores of the channels in $F_i^s$.
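The following is a minimal PyTorch sketch of the described attention module. The hidden width of the $1 \times 1$ convolution (reduction ratio `r`) is an assumption, since the paper does not state it, and the conv is applied before flattening (mathematically equivalent on a $1 \times 1$ map).

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Channel-attention sketch following Section 3.2 (reduction ratio r
    is an assumption; the paper does not state the hidden width)."""
    def __init__(self, channels: int, r: int = 8):
        super().__init__()
        hidden = max(channels // r, 1)
        self.pool = nn.AdaptiveMaxPool2d(1)          # global max pooling -> B x C x 1 x 1
        self.conv = nn.Conv2d(channels, hidden, 1)   # 1x1 conv adjusts the dimension
        self.bn = nn.BatchNorm2d(hidden)
        self.relu = nn.ReLU(inplace=True)
        self.fc = nn.Linear(hidden, channels)        # outputs the channel attention factors
        self.sigmoid = nn.Sigmoid()

    def forward(self, f):                            # f: B x C x H x W
        s = self.relu(self.bn(self.conv(self.pool(f))))
        # FC with ReLU, then sigmoid, per the text; values lie in (0, 1)
        caf = self.sigmoid(torch.relu(self.fc(s.flatten(1))))
        # element-wise channel reweighting (EWMO)
        return f * caf.view(f.size(0), -1, 1, 1)
```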

3.3. Task Sub-Networks

As shown in Figure 1, the task sub-networks consist of the CSN and the RSN. The input to both is the flattened vector $V_f$, and the two tasks are performed in parallel. The CSN is a single FC layer, called $FC_p$, that generates a vector with a size of $1 \times C$. The vector is then passed through the sigmoid function to generate the prediction vector $V_p$, whose entries range from 0 to 1 and are used as the prediction probabilities of the specified classes. The CSN can be formulated as follows:
$V_p = \sigma(FC_p(V_f))$  (2)
where $FC_p$ is the FC layer and $\sigma$ is the sigmoid function. The RSN consists of two components, i.e., the encoder and the decoder. The encoder is constructed from the backbone network and two FC layers, called $FC_{e1}$ and $FC_{e2}$. The backbone network extracts and abstracts the features step-by-step, and the two FC layers encode the features into a vector $V_e$. The decoder is a multi-layer transposed convolution network; its details are shown in Figure 3. The input of the decoder is $V_e$ after a reshape operation that converts $V_e$ into a two-dimensional feature map. The decoder is a modular network consisting of two parts. As shown in Figure 3, the first part contains $M$ transposed convolution layers with ReLU activations, which decode and up-sample the input feature maps; each transposed convolution layer has multiple kernels with a size of $3 \times 3$, a stride of 2, and a padding of 1. The second part of the decoder contains a convolution layer and a Tanh layer. The convolution layer performs dimension normalization to convert the feature maps to the same size as the input MRI image. The Tanh layer outputs the predicted MRI image; its wide output range improves the prediction accuracy. The RSN can be formulated as follows:
$V_e = FC_{e2}(FC_{e1}(V_f)), \quad I_{mr} = \tanh(Dec(V_e))$  (3)
where $FC_{e1}$ and $FC_{e2}$ are the FC layers, $\tanh$ is the tanh function, $Dec$ is the proposed decoder, and $I_{mr}$ is the reconstructed image.
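A minimal PyTorch sketch of such a decoder is shown below. The latent width, channel schedule, number of blocks `m`, and the FC layer used to expand $V_e$ before reshaping are assumptions; the $3 \times 3$ kernels, stride 2, and padding 1 follow the text, and `output_padding=1` is added so each block exactly doubles the spatial resolution.

```python
import torch
import torch.nn as nn

class DecoderSketch(nn.Module):
    """Sketch of the decoder in Figure 3 (dimensions are illustrative)."""
    def __init__(self, code_dim: int = 256, ch0: int = 128, base: int = 4,
                 m: int = 4, out_ch: int = 1):
        super().__init__()
        self.ch0, self.base = ch0, base
        self.fc = nn.Linear(code_dim, ch0 * base * base)  # expand V_e for the reshape
        blocks, ch = [], ch0
        for _ in range(m):                                # M transposed-conv + ReLU blocks
            blocks += [nn.ConvTranspose2d(ch, ch // 2, kernel_size=3, stride=2,
                                          padding=1, output_padding=1),
                       nn.ReLU(inplace=True)]
            ch //= 2
        self.up = nn.Sequential(*blocks)
        # final convolution normalises the channel count; Tanh emits the image
        self.head = nn.Sequential(nn.Conv2d(ch, out_ch, kernel_size=3, padding=1),
                                  nn.Tanh())

    def forward(self, v_e):                               # v_e: B x code_dim
        x = self.fc(v_e).view(-1, self.ch0, self.base, self.base)
        return self.head(self.up(x))
```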

3.4. Loss Function

The proposed ADGNET can be trained in an end-to-end manner. The loss function of the framework is composed of two parts and is defined as follows:
$L = \lambda_1 L_{cls} + \lambda_2 L_{rec}$  (4)
where $L_{cls}$ and $L_{rec}$ are the classification loss and the image reconstruction loss, respectively, and $\lambda_1$ and $\lambda_2$ are the weighting factors that balance the two losses.
As described in the introduction, it is difficult to distinguish dementia in the early and middle stages from normal aging because of the small differences in brain imaging. Thus, we adopted the focusing idea introduced in previous works [40,41] to ensure that the framework concentrates primarily on difficult and misclassified samples, and we propose a new loss function based on the cross-entropy loss. The modified classification loss is defined as follows:
$L_{cls} = -\frac{1}{N} \sum_{i} \left[ p_i (1 - \hat{p}_i)^{\gamma_i} \log(\hat{p}_i) + (1 - p_i)\, \hat{p}_i^{\gamma_i} \log(1 - \hat{p}_i) \right]$  (5)
where $N$ is the number of samples participating in a single optimization step, $\gamma_i$ represents the class-wise weight reduction factors, which adjust the importance of different samples for an improved representation, $p_i$ is the ground-truth probability of a target belonging to a given class, and $\hat{p}_i$ is the predicted probability of the target belonging to that class.
The new loss function is a modification of the cross-entropy (CE) loss. As shown in Figure 1, the final probability of a sample belonging to a specified category is generated using the sigmoid function and ranges from 0 to 1. Equation (5) therefore gives the loss for each specified category following the formulation of the CE loss, and the class-wise weight reduction factors $\gamma_i$ can be considered a numerical vector. In this work, all values of $\gamma_i$ were set to 2 by default.
We used the mean square error function as the reconstruction loss; it is defined as follows:
$L_{rec} = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$  (6)
where $N$ is the number of samples participating in a single optimization step, $Y_i$ is the ground-truth value of the input sample, and $\hat{Y}_i$ is the predicted value of the reconstructed sample.
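For illustration, here is a minimal PyTorch sketch of the combined objective, using the focal-style classification loss of Equation (5) as reconstructed above and the MSE reconstruction loss of Equation (6); the function name and default weights are illustrative.

```python
import torch

def adgnet_loss(v_p, target, recon, image, gamma=2.0, lam1=1.0, lam2=1.0):
    """Sketch of L = lam1 * L_cls + lam2 * L_rec.

    v_p:    predicted class probabilities (after sigmoid), shape B x C
    target: one-hot ground-truth probabilities, shape B x C
    gamma = 2 matches the paper's default weight reduction factor."""
    eps = 1e-7
    p = v_p.clamp(eps, 1.0 - eps)
    # down-weight easy samples: (1 - p)^gamma for positives, p^gamma for negatives
    l_cls = -(target * (1 - p) ** gamma * torch.log(p)
              + (1 - target) * p ** gamma * torch.log(1 - p)).mean()
    l_rec = torch.mean((image - recon) ** 2)   # mean-squared reconstruction error
    return lam1 * l_cls + lam2 * l_rec
```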

4. Results

4.1. Multi-Modal Brain Imaging Dataset

In this study, the proposed ADGNET was evaluated on two different brain imaging datasets for a comprehensive assessment: the Kaggle Alzheimer’s classification dataset (KACD) [42] and the Recognition of Alzheimer’s Disease dataset (ROAD) [43]. Example data from the two datasets are shown in Figure 4. Each dataset was divided into a train-val part and a test part using the train-test-split function (TTSF) from the scikit-learn library; the details of the split are shown in Table 1. The KACD dataset contains 6400 2D MRI images from 6400 cases, each assigned to one of four categories: Non-Demented, Very Mild Demented, Mild Demented, and Moderate Demented. The ROAD dataset contains 532 3D MRI images from 532 cases, each assigned to one of three categories: Non-Demented, Mild Demented, and Alzheimer’s disease. As shown in Table 1, each dataset was separated into a training-val set (TVS) for training and selection of model weights and an independent test set (TS) for evaluating the performance of the models. The TVS of the KACD contains 5121 2D MRI images, and the TS contains 1279; the TVS of the ROAD contains 300 3D MRI images, and the TS contains 232.
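For reference, here is a sketch of such a split using scikit-learn's train_test_split; `images` and `labels` are placeholder arrays, and the 20% test fraction shown matches the KACD split in Table 1 (5121/1279), while the ROAD split uses a different ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the loaded MRI images and labels.
images = np.random.rand(6400, 128, 128)      # e.g., 6400 2D slices (illustrative shape)
labels = np.random.randint(0, 4, size=6400)  # four KACD categories

# Split into a training-val set (TVS) and an independent test set (TS);
# stratify preserves the class balance in both parts.
X_tvs, X_ts, y_tvs, y_ts = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=0)
```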

4.2. Evaluation Metrics

The Kappa score (Kappa), sensitivity (Sen), specificity (Spe), precision (Pr), accuracy (Acc), and F1-score metrics were used to evaluate the performance of the proposed ADGNET comprehensively. The six metrics are defined as follows:

$p_e = \frac{(TN + FN)(TN + FP) + (TP + FP)(TP + FN)}{N \times N}$  (7)

$p_0 = \frac{TP + TN}{N}$  (8)

Given the definitions of $p_e$ and $p_0$, the Kappa score is defined as:

$Kappa = \frac{p_0 - p_e}{1 - p_e}$  (9)

The sensitivity is defined as:

$Sen = \frac{TP}{TP + FN}$  (10)

The specificity is defined as:

$Spe = \frac{TN}{TN + FP}$  (11)

The precision is defined as:

$Pr = \frac{TP}{TP + FP}$  (12)

The accuracy is defined as:

$Acc = \frac{TP + TN}{TP + TN + FP + FN}$  (13)

The F1-score is defined as:

$F1\text{-}score = \frac{2 \times Pr \times Sen}{Pr + Sen}$  (14)

where TP, TN, FP, and FN represent the numbers of true positives, true negatives, false positives, and false negatives, respectively, and $N$ is the total number of samples. These six metrics (Kappa, Sen, Spe, Pr, Acc, and F1-score) were employed to evaluate the performance of the proposed ADGNET and the other SOTA WSL-based methods. Kappa is a statistical indicator of the stability of the model's predictions. Sen is the proportion of positive cases that are correctly identified and is a significant indicator in medical diagnosis. Spe is the proportion of negative cases that are correctly identified and is likewise of great significance in medical diagnosis. Pr is the fraction of positive predictions that are correct. Acc indicates the overall correctness of the model's predictions. The F1-score is the harmonic mean of Pr and Sen.
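These definitions translate directly into code; the following sketch computes all six metrics from the entries of a binary confusion matrix.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the six evaluation metrics from a binary confusion matrix."""
    n = tp + tn + fp + fn
    p0 = (tp + tn) / n                                   # observed agreement (= accuracy)
    pe = ((tn + fn) * (tn + fp) + (tp + fp) * (tp + fn)) / (n * n)  # chance agreement
    sen = tp / (tp + fn)                                 # sensitivity (recall)
    pr = tp / (tp + fp)                                  # precision
    return {
        "kappa": (p0 - pe) / (1 - pe),
        "sen": sen,
        "spe": tn / (tn + fp),                           # specificity
        "pr": pr,
        "acc": p0,
        "f1": 2 * pr * sen / (pr + sen),                 # harmonic mean of Pr and Sen
    }
```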

4.3. Experimental Results

The performance of the proposed ADGNET was evaluated on multimodal datasets (2D MRI and 3D MRI images) with six metrics (Kappa, Sen, Spe, Pr, Acc, and F1-score) to assess the generalization ability and transferability of the model. Two sets of experiments (experiments A and B) were conducted. A bootstrapping method was used to calculate the empirical distributions shown in the boxplots. All experiments were conducted on the KACD and ROAD datasets. An ablation study was also conducted to better demonstrate the effectiveness of the proposed framework. In Table 2 and Table 3, ADGNET (proposed) denotes the full framework shown in Figure 1; ADGNET (no RSN) denotes the framework with the Subnet2:RSN of Figure 1 excluded, with the rest retained and trained with the same amount of annotations; and ADGNET (no AM) denotes the framework with the attention mechanism of Figure 2 excluded, with the rest retained and trained with the same amount of annotations. Training and inference were performed on four Nvidia RTX 2080 Ti GPUs and an Intel Xeon E5-2600 v4 3.60 GHz CPU using the PyTorch framework.

4.3.1. Experiment A: Performance on the KACD Dataset (Comparison between the Proposed ADGNET, ResNeXt WSL and SimCLR)

In this experiment, we used the 2D MRI images from the KACD dataset to evaluate the models’ performances. The overall results of the six metrics for the proposed ADGNET, the ResNeXt WSL, and the SimCLR are listed in Table 2. The optimum performance was obtained by ADGNET, with an F1-score of 99.61%, followed by SimCLR (98.67%) and ResNeXt WSL (98.37%). The Acc was highest for ADGNET (99.61%), followed by SimCLR (98.60%) and ResNeXt WSL (98.36%). The Pr, Spe, Sen and Kappa of ADGNET were 99.53%, 99.53%, 99.69% and 99.22%, respectively. The values of the indices were higher than those of ResNeXt WSL (97.84%, 97.81%, 98.91% and 96.72%) and SimCLR (98.60%, 98.59%, 98.75% and 97.34%). The corresponding boxplots of the six evaluation metrics (Kappa, Sen, Spe, Pr, Acc and F1-score) of the models’ performance on the KACD dataset are shown in Figure 5.

4.3.2. Experiment B: Performance on the ROAD Dataset (Comparison between the Proposed ADGNET, ResNeXt WSL and SimCLR)

In this experiment, we used the 3D MRI images from the ROAD dataset to evaluate the models’ performance. The overall results of the six metrics for the proposed ADGNET, the ResNeXt WSL, and the SimCLR are listed in Table 3. The best performance was obtained by the proposed ADGNET, with an F1-score of 98.49%, followed by SimCLR (93.00%) and ResNeXt WSL (92.61%). The Acc was highest for ADGNET (98.71%), followed by ResNeXt WSL (93.53%) and SimCLR (93.00%). The Pr, Spe, Sen and Kappa of ADGNET were 98.99%, 99.24%, 98.00% and 97.36%, respectively. These values were higher than those of ResNeXt WSL (91.26%, 93.18%, 94.00% and 86.87%) and SimCLR (93.00%, 94.70%, 93.00% and 87.70%). The corresponding boxplots of the six evaluation metrics (Kappa, Sen, Spe, Pr, Acc and F1-score) of the models’ performance on the ROAD dataset are shown in Figure 6.

4.3.3. Training Details

The TVS of each dataset was split into 5 parts using a stratified sampling method. The model was trained using 20% of the labels. A 5-fold cross-validation was adopted to evaluate the performance of the trained model. The samples in the TS of each dataset were used to verify the performance of the proposed ADGNET.
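A sketch of this protocol using scikit-learn's StratifiedKFold is shown below; `train_model` and `evaluate` are hypothetical hooks standing in for the actual training loop, and `X_tvs`/`y_tvs` are the training-val arrays from the split sketch in Section 4.1.

```python
from sklearn.model_selection import StratifiedKFold

# 5-fold cross-validation on the TVS with stratified sampling (Section 4.3.3).
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr_idx, va_idx) in enumerate(skf.split(X_tvs, y_tvs)):
    model = train_model(X_tvs[tr_idx], y_tvs[tr_idx])      # hypothetical hook
    score = evaluate(model, X_tvs[va_idx], y_tvs[va_idx])  # hypothetical hook
    print(f"fold {fold}: F1 = {score:.4f}")
```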

5. Discussion

We developed a CAD method for the identification and classification of AD in multi-modal brain imaging data (2D MRI and 3D MRI) using WSL-based DL techniques. Excellent performance was obtained by the proposed ADGNET on six evaluation metrics, and the method proved superior to two SOTA WSL-based methods.
The proposed ADGNET is a modular framework consisting of a backbone and task sub-nets. We used residual blocks in the design of the backbone network to retain features while preventing degradation of the framework. We incorporated an AM into the backbone network to ensure high discriminative ability at a low computational cost. Unlike conventional methods such as ResNeXt WSL, which assign the same weight to each channel, the proposed AM learns the channel weights of the input feature maps from the supervised information and the images, yielding an improved feature representation of the samples. This helps the framework focus on the most discriminative parts of the feature space. The feature vector obtained from the pooling map of the backbone network was flattened and sent to the two sub-networks. The CSN extracted the feature information directly from the vector and was optimized using the supervised information. The RSN encoded the vector into a new feature space using two FC layers and used a decoder to reconstruct the input images. The two FC layers and the backbone network comprised the encoder used for feature coding. The proposed decoder network consisted of transposed convolution layers and a convolution layer; the transposed convolution layers learn the feature information for image reconstruction, and the convolution layer adjusts the number of channels and generates the final output.
Unlike previous WSL-based methods (e.g., ResNeXt WSL and SimCLR), which have to be pre-trained on a large independent dataset and fine-tuned on the target dataset, the proposed ADGNET is trained in an end-to-end manner. The training process of the ADGNET is controlled by adjusting the weighting parameters: when $\lambda_1$ is zero, the network learns only from the images; when $\lambda_2$ is zero, the network learns only from the annotations. In this way, large-scale unlabeled data can be fully utilized to help the proposed framework obtain more stable and representative features.
Excellent performance was achieved by ADGNET on the six statistical metrics for the multi-modal brain imaging datasets (KACD (2D MRI) and ROAD (3D MRI)). To analyze the experimental results intuitively, the heatmaps of the proposed ADGNET and the two SOTA models are shown in Figure 7. As can be seen from Figure 7, the proposed ADGNET captures key features while retaining more features by means of the AM and the RSN, whereas ResNeXt WSL and SimCLR use only very limited features for prediction, and their prediction scores are relatively low. Notably, ADGNET's prediction score is considerably higher than those of ResNeXt WSL and SimCLR, indicating that ADGNET's predictions are more reliable. The ADGNET has promising potential as an auxiliary tool to assist in the diagnosis of AD due to its high performance, good stability, and cross-modal flexibility. Moreover, medical diagnosis in real situations is much more complex than in experimental environments, and sufficient high-quality annotations are difficult to obtain. Therefore, the development of WSL-based DL methods such as ours is crucial for diagnosing conditions such as AD. However, the proposed ADGNET may encounter problems when the data distribution is extremely imbalanced across categories. It is also important to develop effective generative frameworks that can produce large amounts of effective data to complement WSL-based methods.

6. Conclusions

This study presented a unique WSL-based DL framework for the identification and classification of AD using multi-modal brain imaging data (2D MRI and 3D MRI). The proposed ADGNET provided excellent performance on six metrics (Kappa, Sen, Spe, Pr, Acc and F1-score), outperforming the two SOTA WSL-based models on two public datasets (KACD (2D MRI) and ROAD (3D MRI)) using limited annotation (only 20% of the labels). Most notably, the Kappa of the ADGNET was 0.9922 on the KACD dataset and 0.9736 on the ROAD dataset. These values were 2.50% and 1.88% higher than those of the two SOTA methods on the KACD dataset and 10.49% and 9.66% higher on the ROAD dataset, respectively.
The excellent performance achieved by ADGNET indicates that the proposed AM and the overall framework are suitable for the task and that the model is superior to the two SOTA WSL-based methods. The proposed AM enables the ADGNET to automatically assign different weights to different channels in the feature maps, allowing a better capture of discriminative features. It is well known that obtaining a large sample size and high-quality annotations of medical images is time-consuming and expensive. The introduction of the sub-network for the image reconstruction task helps the ADGNET mine effective features from large-scale unlabeled data. Therefore, the development of WSL-based DL methods might represent a promising research direction for the accurate mining of massive medical data. In the future, the potential of this framework will be explored in depth for other challenging tasks, including the detection of brain tumors and other major diseases.

Author Contributions

All authors contributed extensively to the study presented in this manuscript. S.L. and Y.G. contributed significantly to the conception of the study. S.L. designed the network and conducted the experiments. S.L. and Y.G. provided, marked, and analyzed the experimental results. Y.G. supervised the work and contributed valuable discussions and scientific advice. All authors contributed to writing this manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Technology of the People’s Republic of China (Grant No. 2017YFB1400100) and the National Natural Science Foundation of China (Grant No. 61876059).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the Ministry of Science and Technology of the People’s Republic of China (Grant No. 2017YFB1400100) and the National Natural Science Foundation of China (Grant No. 61876059) for their support.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AD: Alzheimer’s Disease
ADGNET: Alzheimer’s Disease Grade Network
WSL: Weakly Supervised Learning
DL: Deep Learning
WHO: World Health Organization
MCI: Mild Cognitive Impairment
CT: Computed Tomography
PET: Positron Emission Tomography
MRI: Magnetic Resonance Imaging
CAD: Computer-Aided Diagnosis
CNN: Convolutional Neural Network
SOTA: State-of-the-Art
AM: Attention Module
MTL: Multi-Task Learning
SIMO: Single-Input-Multi-Output
CSN: Classification Sub-Network
RSN: Reconstruction Sub-Network
GAP: Global Average Pooling
FC: Fully Connected
CAF: Channel Attention Factors
EWMO: Element-Wise Multiplication Operation
BN: Batch Norm
KACD: Kaggle Alzheimer’s Classification Dataset
ROAD: Recognition of Alzheimer’s Disease Dataset
TTSF: Train-Test-Split Function
TVS: Training-Val Set
TS: Test Set
Sen: Sensitivity
Spe: Specificity
Pr: Precision
Acc: Accuracy

References

  1. Alzheimer’s Disease International. World Alzheimer Report 2019: Attitudes to Dementia. Available online: https://www.alz.co.uk/research/WorldAlzheimerReport2019.pdf (accessed on 20 November 2020).
  2. World Health Organization. The Top 10 Causes of Death. Available online: https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death (accessed on 24 May 2018).
  3. Korczyn, A.D. Why have we failed to cure Alzheimer’s disease? J. Alzheimers Dis. 2012, 29, 275–282.
  4. Sanford, A.M. Mild cognitive impairment. Clin. Geriatr. Med. 2017, 33, 325–337.
  5. Petersen, R.C.; Stevens, J.C.; Ganguli, M.; Tangalos, E.G.; Cummings, J.; DeKosky, S.T. Practice parameter: Early detection of dementia: Mild cognitive impairment (an evidence-based review): Report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology 2001, 56, 1133–1142.
  6. Alberdi, A.; Aztiria, A.; Basarab, A. On the early diagnosis of Alzheimer’s Disease from multimodal signals: A survey. Artif. Intell. Med. 2016, 71, 1–29.
  7. Frisoni, G.B.; Fox, N.C.; Jack, C.R.; Scheltens, P.; Thompson, P.M. The clinical use of structural MRI in Alzheimer disease. Nat. Rev. Neurol. 2010, 6, 67–77.
  8. Trombella, S.; Assal, F.; Zekry, D.; Gold, G.; Giannakopoulos, P.; Garibotto, V.; Démonet, J.F.; Frisoni, G.B. Brain imaging of Alzheimer’s disease: State of the art and perspectives for clinicians. Rev. Medicale Suisse 2016, 12, 795–798.
  9. Dubois, B.; Feldman, H.H.; Jacova, C.; Hampel, H.; Molinuevo, J.L.; Blennow, K.; DeKosky, S.T.; Gauthier, S.; Selkoe, D.; Bateman, R. Advancing research diagnostic criteria for Alzheimer’s disease: The IWG-2 criteria. Lancet Neurol. 2014, 13, 614–629.
  10. Beaulieu, J.; Dutilleul, P. Applications of computed tomography (CT) scanning technology in forest research: A timely update and review. Can. J. For. Res. 2019, 49, 1173–1188.
  11. Zhang, B.; Gu, G.J.; Jiang, H.; Guo, Y.; Shen, X.; Li, B.; Zhang, W. The value of whole-brain CT perfusion imaging and CT angiography using a 320-slice CT scanner in the diagnosis of MCI and AD patients. Eur. Radiol. 2017, 27, 4756–4766.
  12. Jack, C.R., Jr.; Wiste, H.J.; Schwarz, C.G.; Lowe, V.J.; Senjem, M.L.; Vemuri, P.; Weigand, S.D.; Therneau, T.M.; Knopman, D.S.; Gunter, J.L. Longitudinal tau PET in ageing and Alzheimer’s disease. Brain 2018, 141, 1517–1528.
  13. Domingues, I.; Pereira, G.; Martins, P.; Duarte, H.; Santos, J.; Abreu, P.H. Using deep learning techniques in medical imaging: A systematic review of applications on CT and PET. Artif. Intell. Rev. 2020, 53, 4093–4160.
  14. Debette, S.; Schilling, S.; Duperron, M.G.; Larsson, S.C.; Markus, H.S. Clinical significance of magnetic resonance imaging markers of vascular brain injury: A systematic review and meta-analysis. JAMA Neurol. 2019, 76, 81–94.
  15. Battineni, G.; Chintalapudi, N.; Amenta, F.; Traini, E. A Comprehensive Machine-Learning Model Applied to Magnetic Resonance Imaging (MRI) to Predict Alzheimer’s Disease (AD) in Older Subjects. J. Clin. Med. 2020, 9, 2146.
  16. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
  17. Mansour, R.F. Deep-learning-based automatic computer-aided diagnosis system for diabetic retinopathy. Biomed. Eng. Lett. 2018, 8, 41–57.
  18. Song, Y.; Zhang, Y.D.; Yan, X.; Liu, H.; Zhou, M.; Hu, B.; Yang, G. Computer-aided diagnosis of prostate cancer using a deep convolutional neural network from multiparametric MRI. J. Magn. Reson. Imaging 2018, 48, 1570–1577.
  19. Zhu, T.; Cao, C.; Wang, Z.; Xu, G.; Qiao, J. Anatomical Landmarks and DAG Network Learning for Alzheimer’s Disease Diagnosis. IEEE Access 2020, 8, 206063–206073.
  20. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
  21. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
  22. Mahajan, D.; Girshick, R.; Ramanathan, V.; He, K.; Paluri, M.; Li, Y.; Bharambe, A.; van der Maaten, L. Exploring the limits of weakly supervised pretraining. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 181–196.
  23. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. arXiv 2020, arXiv:2002.05709.
  24. Zhang, Y.; Yang, Q. An overview of multi-task learning. Natl. Sci. Rev. 2018, 5, 30–43.
  25. Caruana, R. Multitask learning. Mach. Learn. 1997, 28, 41–75.
  26. Misra, I.; Shrivastava, A.; Gupta, A.; Hebert, M. Cross-stitch networks for multi-task learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3994–4003.
  27. Lu, Y.; Kumar, A.; Zhai, S.; Cheng, Y.; Javidi, T.; Feris, R. Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5334–5343.
  28. Zhang, Z.; Luo, P.; Loy, C.C.; Tang, X. Facial landmark detection by deep multi-task learning. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 94–108.
  29. Li, Y.F.; Guo, L.Z.; Zhou, Z.H. Towards safe weakly supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 2019.
  30. Zhou, Z.H. A brief introduction to weakly supervised learning. Natl. Sci. Rev. 2018, 5, 44–53.
  31. Hu, Y.; Modat, M.; Gibson, E.; Li, W.; Ghavami, N.; Bonmati, E.; Wang, G.; Bandula, S.; Moore, C.M.; Emberton, M. Weakly-supervised convolutional neural networks for multimodal image registration. Med. Image Anal. 2018, 49, 1–13.
  32. Wang, S.; Chen, W.; Xie, S.M.; Azzari, G.; Lobell, D.B. Weakly supervised deep learning for segmentation of remote sensing imagery. Remote Sens. 2020, 12, 207.
  33. Wang, W.; Yang, Y.; Wang, X.; Wang, W.; Li, J. Development of convolutional neural network and its application in image classification: A survey. Opt. Eng. 2019, 58, 040901.
  34. Zhang, J.; Xie, Y.; Wu, Q.; Xia, Y. Medical image classification using synergic deep learning. Med. Image Anal. 2019, 54, 10–19.
  35. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
  36. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  37. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
  38. Zhu, B.; Liu, J.Z.; Cauley, S.F.; Rosen, B.R.; Rosen, M.S. Image reconstruction by domain-transform manifold learning. Nature 2018, 555, 487–492.
  39. Xu, W.; Keshmiri, S.; Wang, G. Adversarially approximated autoencoder for image generation and manipulation. IEEE Trans. Multimed. 2019, 21, 2387–2396.
  40. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 39, 2999–3007.
  41. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. arXiv 2020, arXiv:2006.04388.
  42. Dubey, S. Alzheimer’s Dataset (4 Class of Images). Available online: https://www.kaggle.com/tourist55/alzheimers-dataset-4-class-of-images (accessed on 29 November 2020).
  43. CCF BDCI. Recognition of Alzheimer’s Disease Dataset. Available online: https://www.datafountain.cn/competitions/369 (accessed on 29 November 2020).
Figure 1. The proposed framework. The framework consists of two parts, including a backbone network with an attention mechanism that acts as a shared network for extracting salient features and task nets that contain two sub-networks. The two sub-networks simultaneously conduct two sub-tasks, i.e., classification and reconstruction.
Figure 2. The proposed attention module.
Figure 3. Details of the proposed decoder component. The decoder consists of two parts: a first part with transposed convolution layers and ReLU activation functions, and a second part with a convolution layer and a Tanh layer.
Figure 4. Examples of multi-modal data; (a) magnetic resonance imaging (MRI) images (2D) from the KACD dataset, (b) MRI images (3D) from the ROAD dataset.
Figure 5. Boxplots of the six evaluation metrics of the models in experiment A. (a) kappa score. (b) sensitivity. (c) specificity. (d) precision. (e) accuracy. (f) F1-score.
Figure 6. Boxplots of the six evaluation metrics of the models in experiment B. (a) kappa score. (b) sensitivity. (c) specificity. (d) precision. (e) accuracy. (f) F1-score.
Figure 7. Visualization of heatmaps. We compare the visualization heatmaps of the ADGNET (proposed, no reconstruction sub-network (RSN) and no attention module (AM)), the ResNeXt WSL (weakly supervised learning), and the SimCLR. The heatmap visualization is calculated for the last convolutional outputs and P denotes the prediction score of each network for the ground-truth category.
Table 1. Distribution of the Kaggle Alzheimer’s classification dataset (KACD) and Recognition of Alzheimer’s Disease dataset (ROAD).
KACD                    Train-Val   Test   Total
Non-Demented            2560        640    3200
Very Mild Demented      1792        448    2240
Mild Demented           717         179    896
Moderate Demented       52          12     64
Total                   5121        1279   6400

ROAD                    Train-Val   Test   Total
Non-Demented            68          52     120
Very Mild Demented      -           -      -
Mild Demented           151         116    267
Alzheimer's disease     81          64     145
Total                   300         232    532
Table 2. Performance indices of the proposed ADGNET framework in experiment A and the average performance of the two state-of-the-art (SOTA) models on the KACD dataset.
KACD Dataset
Metric (95% CI)   ADGNET (Proposed)        ADGNET (No RSN)          ADGNET (No AM)           ResNeXt WSL              SimCLR
Kappa             0.9922 (0.9844, 0.9984)  0.9781 (0.9656, 0.9890)  0.9812 (0.9703, 0.9906)  0.9672 (0.9514, 0.9797)  0.9734 (0.9609, 0.9844)
Sen               0.9969 (0.9921, 1.0000)  0.9906 (0.9824, 0.9970)  0.9922 (0.9845, 0.9984)  0.9891 (0.9804, 0.9955)  0.9875 (0.9785, 0.9953)
Spe               0.9953 (0.9890, 1.0000)  0.9875 (0.9783, 0.9953)  0.9890 (0.9799, 0.9955)  0.9781 (0.9663, 0.9888)  0.9859 (0.9769, 0.9937)
Pr                0.9953 (0.9894, 1.0000)  0.9875 (0.9780, 0.9953)  0.9891 (0.9798, 0.9956)  0.9784 (0.9670, 0.9889)  0.9860 (0.9805, 0.9922)
Acc               0.9961 (0.9922, 0.9992)  0.9891 (0.9828, 0.9945)  0.9906 (0.9851, 0.9953)  0.9836 (0.9757, 0.9898)  0.9860 (0.9805, 0.9922)
F1-score          0.9961 (0.9922, 0.9992)  0.9891 (0.9828, 0.9945)  0.9906 (0.9849, 0.9956)  0.9837 (0.9756, 0.9901)  0.9867 (0.9806, 0.9922)
Table 3. Performance indices of the proposed ADGNET framework in experiment B and the average performance of the two SOTA models on the ROAD dataset.
ROAD Dataset
Metric (95% CI)   ADGNET (Proposed)        ADGNET (No RSN)          ADGNET (No AM)           ResNeXt WSL              SimCLR
Kappa             0.9736 (0.9387, 1.0000)  0.9210 (0.8696, 0.9654)  0.9473 (0.9032, 0.9825)  0.8687 (0.7986, 0.9300)  0.8770 (0.8143, 0.9308)
Sen               0.9800 (0.9500, 1.0000)  0.9600 (0.9175, 0.9906)  0.9700 (0.9310, 1.0000)  0.9400 (0.8889, 0.9804)  0.9300 (0.8764, 0.9770)
Spe               0.9924 (0.9754, 1.0000)  0.9621 (0.9274, 0.9924)  0.9773 (0.9503, 1.0000)  0.9318 (0.8824, 0.9699)  0.9470 (0.9091, 0.9835)
Pr                0.9899 (0.9674, 1.0000)  0.9505 (0.9053, 0.9897)  0.9700 (0.9347, 1.0000)  0.9126 (0.8509, 0.9619)  0.9300 (0.8775, 0.9780)
Acc               0.9871 (0.9698, 1.0000)  0.9612 (0.9353, 0.9828)  0.9741 (0.9526, 0.9914)  0.9353 (0.9009, 0.9655)  0.9300 (0.9095, 0.9655)
F1-score          0.9849 (0.9655, 1.0000)  0.9552 (0.9246, 0.9817)  0.9700 (0.9436, 0.9903)  0.9261 (0.8832, 0.9608)  0.9300 (0.8912, 0.9630)
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

