Article

OViTAD: Optimized Vision Transformer to Predict Various Stages of Alzheimer’s Disease Using Resting-State fMRI and Structural MRI Data

by Saman Sarraf 1,2,*, Arman Sarraf 3, Danielle D. DeSouza 4, John A. E. Anderson 5, Milton Kabia 2 and The Alzheimer’s Disease Neuroimaging Initiative
1 Institute of Electrical and Electronics Engineers, Piscataway, NJ 08854, USA
2 School of Technology, Northcentral University, San Diego, CA 92123, USA
3 Department of Electrical and Software Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada
4 Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
5 Departments of Cognitive Science and Psychology, Carleton University, Ottawa, ON K1S 5B6, Canada
* Author to whom correspondence should be addressed.
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wpcontent/uploads/howtoapply/ADNIAcknowledgementList.pdf (accessed on 15 November 2021).
Brain Sci. 2023, 13(2), 260; https://doi.org/10.3390/brainsci13020260
Submission received: 12 November 2022 / Revised: 19 January 2023 / Accepted: 1 February 2023 / Published: 3 February 2023
(This article belongs to the Section Neurodegenerative Diseases)

Abstract:
Advances in applied machine learning techniques for neuroimaging have encouraged scientists to implement models to diagnose brain disorders such as Alzheimer’s disease at early stages. Predicting the exact stage of Alzheimer’s disease is challenging; however, complex deep learning techniques can manage this precisely. While successful, these complex architectures are difficult to interrogate and computationally expensive. Therefore, using novel, simpler architectures with more efficient pattern extraction capabilities, such as transformers, is of interest to neuroscientists. This study introduced an optimized vision transformer architecture to predict group membership by separating healthy adult, mild cognitive impairment, and Alzheimer’s brains within the same age group (>75 years) using resting-state functional (rs-fMRI) and structural magnetic resonance imaging (sMRI) data aggressively preprocessed by our pipeline. Our optimized architecture, known as OViTAD, is currently the sole vision transformer-based end-to-end pipeline and outperformed the existing transformer models and most state-of-the-art solutions. Our model achieved F1-scores of 97% ± 0.0 and 99.55% ± 0.39 on the testing sets for the rs-fMRI and sMRI modalities in the triple-class prediction experiments, while using 30% fewer parameters than a vanilla transformer. The model was also robust and repeatable, producing similar estimates across three runs with random data splits (we reported the averaged evaluation metrics). Finally, to challenge the model, we observed how it handled increasing noise levels by inserting varying numbers of healthy brains into the two dementia groups. Our findings suggest that optimized vision transformers are a promising and exciting new approach for neuroimaging applications, especially for Alzheimer’s disease prediction.

1. Introduction

An early diagnosis of Alzheimer’s disease (AD) delays the onset of dementia-related consequences of this life-threatening brain disorder and reduces both the mortality rate and the multibillion-dollar cost of caring for AD patients [1,2,3,4]. The damage that Alzheimer’s disease inflicts is widespread, mostly targeting memory. Over time, shrinkage of the brain, atrophy of posterior cortical tissue in the right temporal, parietal, and left frontal lobes, and ventricular expansion interfere with patients’ language and memory abilities [5,6,7]. Researchers consider mild cognitive impairment (MCI) a transition phase from normal aging to acute AD, which often takes two to six years. As a result, patients lack focus, exhibit poor decision-making and judgment, experience confusion about time and location, and suffer the onset of memory loss [8,9,10].
Among various biomarker examinations, such as blood and clinical tests, neuroimaging has remained the primary approach for medical practitioners attempting an early prediction of Alzheimer’s disease [11,12,13,14]. However, neurologists conduct various neuroimaging tests to diagnose Alzheimer’s disease, since the effects of normal aging and early-stage Alzheimer’s are barely distinguishable in neuroimaging [15].
Today, artificial intelligence (AI) in neuroimaging is considered an emerging technology in which neuroscientists employ and adapt novel and advanced algorithms to analyze medical imaging data [16,17,18]. Over the past decade, deep learning techniques have enabled medical imaging scientists to predict various stages of Alzheimer’s disease [19,20]. Using robust computational resources such as cloud computing, scientists can implement end-to-end prediction pipelines to preprocess medical imaging data, build complex deep learning models, and post-process results to assist medical doctors in distinguishing early-stage MCI brains from highly correlated normal aging images [21,22,23,24].
Convolutional neural networks (CNNs), inspired by the human visual system, form the core image classification component of such pipelines. CNN-based classifiers consist of sophisticated feature extractors that retrieve hierarchical patterns from brain images and produce highly accurate predictions [25,26,27,28,29,30]. Although CNN models often require only a light preprocessing pipeline and implicitly aim to lessen the impact of noise, many studies have shown that a comprehensive preprocessing pipeline for neuroimaging data significantly improves prediction performance [31,32,33].
Advances in CNN architectures and hybrids of CNNs with other architectures, such as recurrent neural networks (RNNs), have significantly improved performance in multi-stage AD prediction [34,35,36]. The central pillar of CNN-based pipelines is the convolutional layer, considered an invariant operator in signal and image processing. The convolutional layer reduces the sensitivity of the image classification pipeline to morphological variations such as shift and rotation [37,38,39].
Also, multi-dimensional filters in CNN models and various combinations of feature map concatenation enhance such models’ invariance [40,41]. However, the high complexity of models with hundreds of millions of trainable parameters, requiring heavy computation and an enormous amount of data, is considered a disadvantage of such methods [42,43,44]. Moreover, CNN models incorporate contextual information into training without considering the positional embeddings offered by transformer blocks [45,46].
In this study, we explore a novel method to bridge the gap of position-based context extraction through an optimized vision transformer. The literature shows that the use of vision transformers for predicting Alzheimer’s disease is at a very early stage, and this study opens a new avenue for employing vision transformers in this domain. We implement two separate end-to-end pipelines to predict three classes of Alzheimer’s disease stages, where pre- and postprocessing modules play crucial roles in improving prediction performance. Also, we analyze the impact of merging MCI data with healthy control and Alzheimer’s brains to assess modeling performance in binary classification tasks.
We repeat each model three times using random data splits and assess our pipelines using standard evaluation metrics, averaging across repetitions, to ensure the robustness and reproducibility of our models. Finally, we visualize the attention and feature maps to demonstrate the global impact of the attention mechanisms employed in the architecture.

2. Related Work

Machine learning applications for predicting various stages of Alzheimer’s disease have been of interest to numerous researchers, who began by employing classical techniques such as support vector machines [47,48]. Researchers extracted features from Alzheimer’s imaging data using autoencoders and classical techniques to classify AD and MCI brains. This approach introduced more advanced feature extractors than the classical methods, which improved the performance of AD prediction [49,50]. The next generation of predictive models included many CNN architectures to classify mainly AD and healthy control (HC) brains. The success of binary classification motivated imaging and neuroscientists to employ sophisticated techniques to address the 3-class prediction task of HC vs. MCI vs. AD [51,52,53,54].
Besides 2D CNN architectures, 3D convolutional layers enabled scientists to incorporate the volumetric data into the training process. Such approaches produced promising predictions using structural MRI data [55,56,57]. The 3D models used the signal intensity at the voxel level and applied the convolution operator to 3D filters and previous-layer feature maps. Although 3D models became popular due to producing high accuracy rates, many scientists challenged these techniques, conducting experiments in which 2D models outperformed 3D models [54,58,59,60].
Some research groups considered using 4D functional MRI data to predict various stages of AD, decomposing the brain images into 2D samples along the depth and time axes. The data decomposition method produced a significant amount of data for training and resulted in nearly perfect binary classification performance, outperforming most of the models built with structural data. The major challenge in using 4D fMRI data was establishing a preprocessing pipeline to prepare the data for model development [61,62,63,64].
Recurrent neural networks (RNNs) and their subsequent architectures, such as long short-term memory (LSTM) models, capture features from sequences of data and are useful for extracting the temporal relationships encoded in Alzheimer’s imaging data [65,66]. A special use of LSTM models occurs in longitudinal analysis for Alzheimer’s disease prediction. In this approach, researchers extract spatial maps from imaging data using various feature extractors, such as multi-layer perceptrons (MLPs), and train bi-directional LSTMs to address the AD classification problem. This two-step prediction allows neuroscientists to explore patterns in longitudinal imaging that are suppressed in cross-sectional methods [67,68,69]. However, the extra step of explicit feature extraction and the complexity of sensitive longitudinal analysis remain the major challenges of such methods [70,71].
The next category of machine learning methods used for Alzheimer’s disease prediction is hybrid modeling, where CNN and RNN models extract hierarchical and temporal features in a cascade architecture. The CNN component of such networks is considered the central feature extractor, and the RNN-LSTM component extracts position-related features and forms the core of the model [35,71,72].
Multimodal imaging, in the same category, provides complementary information from each modality, such as fMRI, structural MRI, and PET, and the resulting predicted labels are often passed to a postprocessing or ensemble model. Since the nature of each modality differs, using combined data to build a single model for AD prediction produces poor performance because the model hardly converges [73,74,75,76]. Some researchers considered a hybrid approach, using clinical and imaging data to develop separate models followed by a predictive model. Such a technique offers strong decision-making, since mispredictions by the imaging models are compensated for by the clinical data [77,78].
Transformers with the various implementations of attention mechanisms stemming from natural language processing (NLP) domains have been of interest to scientists regarding whether such technology is adaptable for Alzheimer’s disease prediction [79,80]. For example, a deep neural network with transformer blocks was the core of an Alzheimer’s study to assess risks using targeted speech [81].
Transformers’ temporal or sequential feature extraction capability allowed researchers to develop end-to-end solutions to predict Alzheimer’s through a longitudinal model known as TransforMesh using structural data [82]. Also, a universal brain encoder based on a transformer with attention mechanisms offered model explainability for analyzing 3D MRI data [82]. Transformer technology has also motivated scientists to implement predictive models using 3D data in non-Alzheimer’s studies, such as defect assessment of knee cartilage [83]. To date, our proposed method of using an optimized vision transformer (OViTAD) to predict various stages of Alzheimer’s is considered the first initiative in adopting this technology.

3. Materials and Methods

3.1. Datasets

We used two datasets from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/, accessed on 15 July 2021), including fMRI and structural MRI imaging data. We included only older adults (age > 75 years) for both imaging modalities in this study, with the aim of suppressing the effect of aging on modeling; this ensured that our models predicted Alzheimer’s stages rather than aging effects. To ensure ground-truth quality, we cross-checked the labels proposed for the participants by ADNI against their mini-mental state examination (MMSE) scores. The fMRI dataset contained 275 participants scanned for resting-state fMRI (rs-fMRI) studies, comprising 52 Alzheimer’s (AD), 92 healthy control (HC), and 131 MCI brains. The structural MRI dataset included 1076 participants, where we found 211 AD, 91 HC, and 744 MCI brains. Table 1 shows the participants’ demographic details for both modalities categorized into three groups: gender, age, and MMSE scores.

3.2. Image Acquisition Protocol

ADNI provided a standard protocol for scientists to acquire imaging data using 3 Tesla scanners, including General Electric (GE) Healthcare, Philips Medical Systems, and Siemens Medical Solutions machines [84]. We ensured that the two datasets utilized in this study were collected using the same scanning parameters. The protocol stated that the functional scans were performed using an echo-planar imaging (EPI) sequence (150 volumes, repetition time (TR) = 2 seconds (s), echo time (TE) = 30 milliseconds (ms), flip angle (FA) = 70 degrees, field-of-view (FOV) = 20 centimeters (cm)) that produced 64 × 64 matrices with 30 axial slices of 5 millimeters (mm) thickness without a gap. The structural MRI data acquisition employed a 3-dimensional (3D) magnetization-prepared rapid acquisition gradient echo sequence known as MPRAGE (TR = 2 s, TE = 2.63 ms, FOV = 25.6 cm) that produced 256 × 256 matrices with 160 slices of 1 mm thickness.

3.3. Data Preprocessing

3.3.1. rs-fMRI

We used an extensive 7-step pipeline to preprocess the rs-fMRI data from scratch, as research has indicated that enhanced rs-fMRI preprocessing improves modeling performance [85,86]. First, we converted the raw rs-fMRI data, downloaded from ADNI in digital imaging and communications in medicine (DICOM) format, to neuroimaging informatics technology initiative (NIfTI/NII) format using an open-source tool known as dcm2niix [87]. Second, we removed skull and neck voxels, considered non-brain regions, from the structural T1-weighted imaging data corresponding to each fMRI time course using FSL-BET [88]. Third, using FSL-MCFLIRT [89], we corrected the rs-fMRI data for motion artifacts caused by low-frequency drifts, which could negatively impact the time course decomposition. Next, we applied a standard slice timing correction (STC) method known as Hanning-Windowed Sinc Interpolation (HWSI) to each voxel’s time series; according to the ADNI data acquisition protocol, the brain slices were acquired halfway through the relevant volume’s TR, so we shifted each time series by a proper fraction relative to the middle point of the TR period. We then spatially smoothed the rs-fMRI time series using a Gaussian kernel with a 5 mm full width at half maximum (FWHM). Next, we employed a temporal high-pass filter with a cut-off frequency of 0.01 Hz (sigma = 90 s) to remove low-frequency noise. We registered the fMRI brains to the corresponding high-resolution structural T1-weighted scans using an affine linear transformation with seven degrees of freedom (7 DOF) and subsequently aligned the registered brains to the Montreal Neurological Institute standard brain template (MNI152) using an affine linear transformation with 12 DOF [90]. Finally, we resampled the aligned brains with a 4 mm kernel, which generated 45 × 54 × 45 brain slices per time course. The rs-fMRI preprocessing pipeline produced 4-dimensional (4D) data with time series of length T ∈ [124, 200] and a mode of 140 data points per participant; therefore, we obtained 4D NIfTI/NII files of 45 × 54 × 45 × T.
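A minimal sketch of the core preprocessing steps is given below, assuming the FSL command-line tools (bet, mcflirt, slicetimer, fslmaths, flirt) are installed; the file names are hypothetical, and the parameters follow the values reported above. This is an illustration of the described pipeline, not the authors’ exact scripts.

```python
import subprocess

def run(cmd):
    """Run one FSL command-line call and fail loudly on error."""
    subprocess.run(cmd, check=True)

TR = 2.0                                  # repetition time in seconds (ADNI protocol)
HIGHPASS_SIGMA_VOL = 90.0 / TR            # sigma quoted in seconds, converted to volumes

# 1. Skull-strip the structural T1 scan (remove non-brain voxels).
run(["bet", "t1.nii.gz", "t1_brain.nii.gz", "-R"])

# 2. Motion-correct the rs-fMRI time course.
run(["mcflirt", "-in", "func.nii.gz", "-out", "func_mc"])

# 3. Slice timing correction (slices acquired halfway through the TR).
run(["slicetimer", "-i", "func_mc.nii.gz", "-o", "func_st.nii.gz", "-r", str(TR)])

# 4. Spatial smoothing with a 5 mm FWHM Gaussian (fslmaths -s expects sigma in mm).
run(["fslmaths", "func_st.nii.gz", "-s", str(5.0 / 2.355), "func_sm.nii.gz"])

# 5. Temporal high-pass filter (~0.01 Hz cut-off); -1 disables the low-pass branch.
run(["fslmaths", "func_sm.nii.gz", "-bptf", str(HIGHPASS_SIGMA_VOL), "-1", "func_hp.nii.gz"])

# 6. Register fMRI to the subject's T1 (7 DOF), then align to MNI152 (12 DOF).
run(["flirt", "-in", "func_hp.nii.gz", "-ref", "t1_brain.nii.gz",
     "-out", "func2t1.nii.gz", "-dof", "7"])
run(["flirt", "-in", "func2t1.nii.gz", "-ref", "MNI152_T1_2mm_brain.nii.gz",
     "-out", "func2mni.nii.gz", "-dof", "12"])
```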

3.3.2. Structural MRI

We preprocessed the structural MRI data from scratch using a 6-step pipeline, in which we first converted the DICOM raw images to NIfTI/NII format using dcm2niix [87]. Next, we extracted the brain regions by removing the skull and neck tissues from the data [88]. Then, using the FSL-VBM library [91], we segmented the brain images into grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF). We registered the GM images to the GM ICBM-152 standard template using a linear affine transformation with 6 DOF. Next, we concatenated the brain images, flipped them along the x-axis, and re-averaged them to create a first-pass, study-specific template, following the standard approach [88]. We then re-registered the structural MRI brains to this template using a non-linear transformation and resampled them to create a 2 × 2 × 2 mm³ GM template in the standard space. Per the FSL-VBM standard protocol, we applied a modulation technique to the structural MRI data by multiplying each voxel by the Jacobian of the warp field to compensate for the enlargement introduced by the non-linear component of the transformation. Subsequently, we used all the concatenated and averaged 3D GM images (one 3D sample per participant) to create a 4D data stack. Finally, we smoothed the structural MRI data using a range of Gaussian kernels with sigma = 3, 4 (FWHM of 4.6, 7, and 9.3 mm), as research has shown that smoothing significantly impacts modeling performance [92,93]. The structural MRI preprocessing pipeline produced two sets (one per sigma) of 3D NIfTI/NII files of 91 × 109 × 91.
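For reference, the sigma values above relate to FWHM through the standard Gaussian identity (a general property of the Gaussian kernel, not specific to this study):

```latex
\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma,
\qquad \sigma = 3\ \mathrm{mm} \Rightarrow \mathrm{FWHM} \approx 7.1\ \mathrm{mm},
\qquad \sigma = 4\ \mathrm{mm} \Rightarrow \mathrm{FWHM} \approx 9.4\ \mathrm{mm}.
```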

3.4. Proposed Architecture: Optimized Vision Transformer (OViTAD)

Inspired by the transformer built for natural language processing use cases [94], vision transformers have been adopted for computer vision tasks such as image classification and object detection. The vanilla vision transformer [95] employs a dozen multi-head self-attention (MHSA) layers, considered the transformer blocks forming the core of the architecture. This algorithm splits an input image into small patches that are passed through positional embedding and transformer encoder layers. During the training process, positional information is incorporated by the attention layers, which, as in NLP transformers, helps the model relate data points that are far from the current state [94,95]. The vision transformer generated patches from a given set of preprocessed images, converted the 2D arrays into 1D arrays, and decomposed them along the axes of the three channels. The dimension of each patch is calculated by multiplying the number of channels by the height and width of the patch. We prepared the linearly embedded arrays to feed into the next blocks. To address the objective of our multiclass Alzheimer’s prediction with specific imaging data dimensions, we set our transformer’s input dimension to 56 × 56 for fMRI and 112 × 112 for structural MRI, which were the closest meaningful dimensions reflecting popular image sizes. This data-driven approach allowed us to bypass a computationally massive grid search for optimizing the network’s hyperparameters. Since we reduced the vision transformer input dimension from 224 × 224 × 3 to 112 × 112 × 3 and 56 × 56 × 3, we also reduced the number of heads in the MHSA layers to optimize the architecture. The core intention was to improve the efficiency of our model while producing the same or better performance than the vanilla version with fewer trainable parameters. In the next step, the vision transformer used positional embedding to feed the arrays to the 8-head self-attention transformer block with six layers in depth, which applied a set of standard steps to the arrays similar to the original architecture [94,95]. To decrease the chance of overfitting, we set our dropout and embedding dropout to 0.1. We used a multi-layer perceptron, known as the fully connected layer, with 2048 neurons to translate the features extracted by the optimized vision transformer into a format usable by the cross-entropy loss function for evaluating classification performance. Figure 1 depicts the architecture of the optimized vision transformer implemented in this study.
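A minimal sketch of how such a reduced-input, 8-head, depth-6 vision transformer could be instantiated with the open-source vit-pytorch package is shown below; the patch size and embedding dimension are illustrative assumptions, not values reported in the text.

```python
import torch
from vit_pytorch import ViT

# Hypothetical OViTAD-style configuration for the fMRI branch: 56x56 input,
# 8 attention heads, depth 6, a 2048-unit MLP head, and dropout 0.1.
# patch_size and dim are illustrative assumptions.
ovitad_fmri = ViT(
    image_size=56,        # reduced from the vanilla 224x224 input
    patch_size=8,         # image_size must be divisible by patch_size
    num_classes=3,        # AD vs. HC vs. MCI
    dim=512,              # token embedding size (assumption)
    depth=6,              # six transformer encoder layers
    heads=8,              # reduced number of self-attention heads
    mlp_dim=2048,         # fully connected layer width reported in the text
    dropout=0.1,
    emb_dropout=0.1,
)

x = torch.randn(4, 3, 56, 56)   # a batch of four decomposed 2D slices
logits = ovitad_fmri(x)          # shape: (4, 3)
```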
We used DeepViT, a deeper version of the vision transformer, to build our baselines [96]. DeepViT employs a mechanism known as re-attention, instead of MHSA, to regenerate attention maps and increase the diversity of the features extracted by the architecture. The re-attention layers exploit interactions across the various heads to capture further information, which improves the diversity of the attention maps through a learnable transformation matrix known as Q. Figure 2 (left) demonstrates the DeepViT transformer block with its re-attention mechanism. To enhance the scope of our benchmarking, we used another vision transformer image classifier known as class-attention in image transformers (CaIT), which introduced a class-attention layer [97]. The CaIT architecture consists of two major components: (a) a standard self-attention stage identical to the ViT transformer, and (b) a class-attention stage, including a set of operations that convert the positionally embedded patches into class embedding arrays (CLS), followed by a linear classification method. CaIT with the CLS mechanism avoids the saturation of deep vision transformers at an early stage and allows the model to keep learning throughout training. Figure 2 (right) shows the CaIT transformer block.
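The DeepViT and CaIT baselines are also available in the same vit-pytorch package (as the DeepViT and CaiT classes); a hypothetical instantiation with illustrative dimensions could look like the following sketch.

```python
from vit_pytorch.deepvit import DeepViT
from vit_pytorch.cait import CaiT

# DeepViT baseline: re-attention blocks instead of plain multi-head self-attention.
deepvit_baseline = DeepViT(
    image_size=56, patch_size=8, num_classes=3,
    dim=512, depth=6, heads=8, mlp_dim=2048,
    dropout=0.1, emb_dropout=0.1,
)

# CaIT baseline: separate self-attention and class-attention stages.
cait_baseline = CaiT(
    image_size=56, patch_size=8, num_classes=3,
    dim=512, depth=6,        # self-attention layers
    cls_depth=2,             # class-attention layers producing the CLS embedding
    heads=8, mlp_dim=2048,
    dropout=0.1, emb_dropout=0.1,
)
```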

3.5. fMRI Pipeline

We categorized the preprocessed 4D fMRI samples (one NIfTI/NII per participant) into AD, HC, and MCI classes. In the next step, we used a stratified 80%–10%–10% split with random shuffling at the class level to generate the training, validation, and testing sets. The sets included 226, 27, and 31 participants for training, validation, and testing, respectively. The main objective of this study was to perform a multiclass prediction; however, we expanded our modeling approach to explore the impact of merging MCI data with the two other classes and generated samples for the AD + MCI vs. HC and AD vs. HC + MCI experiments. Using our optimized model, we also built AD vs. HC and HC vs. MCI models for a consistent comparison with the literature. For consistency, we reused the identical data splits generated for the multiclass prediction in the two binary classifications, modifying only the corresponding ground truth according to each experiment.
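A minimal sketch of the stratified 80%–10%–10% split at the participant level is shown below; the subject IDs and label lists are hypothetical placeholders, and the seeds simply stand in for the three random repetitions.

```python
from sklearn.model_selection import train_test_split

# Hypothetical participant metadata: one entry per subject (52 AD, 92 HC, 131 MCI).
subject_ids = [f"sub-{i:03d}" for i in range(275)]
labels = ["AD"] * 52 + ["HC"] * 92 + ["MCI"] * 131

def stratified_80_10_10(ids, y, seed):
    """Split subjects (not slices) into train/val/test sets, stratified by class."""
    train_ids, hold_ids, train_y, hold_y = train_test_split(
        ids, y, test_size=0.20, stratify=y, random_state=seed)
    val_ids, test_ids, _, _ = train_test_split(
        hold_ids, hold_y, test_size=0.50, stratify=hold_y, random_state=seed)
    return train_ids, val_ids, test_ids

# Three repetitions with different seeds mimic the paper's three random data splits.
splits = [stratified_80_10_10(subject_ids, labels, seed) for seed in (0, 1, 2)]
```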

3.5.1. Data Decomposition from 4D to 2D

We decomposed the 4D fMRI data along the z and t axes into 2D images using a lossless data conversion method to generate portable network graphics (PNG) samples for model development. We first loaded the NIfTI files into memory using the Nibabel package (https://nipy.org/nibabel/, accessed on 20 August 2021) and employed the Python OpenCV library (OpenCV.org) to store the decomposed 2D images on the server. Next, we removed the last ten brain slices and any empty brain images to improve data quality. To find the empty slices, we measured the sum of the pixel intensities in a given brain image and only stored images with a non-zero sum. Equation (1) shows the details of the fMRI data decomposition.
$$
\text{for } z = 1,\dots,Z-10,\;\; t = 1,\dots,T:\quad
\begin{cases}
BS_{z,t}(x,y) \rightarrow \mathrm{PNG}\!\left(BS_{z,t}(x,y)\right), & \text{if } SI_{z,t} = \sum_{x=1}^{X}\sum_{y=1}^{Y} BS_{z,t}(x,y) \neq 0\\[4pt]
\text{ignore } BS_{z,t}(x,y), & \text{otherwise}
\end{cases}
\tag{1}
$$
where X, Y, and Z represent the spatial dimensions of the fMRI data (45, 54, 45), and T refers to the 140 data points of a given fMRI time course. $SI_{z,t}$ represents the sum of the voxel intensities in a given brain slice, $BS_{z,t}(x,y)$ represents the brain slice at depth z and time point t, and PNG denotes the lossless data conversion function. The decomposition module produced 1,433,880 images consisting of 1,141,280, 138,600, and 154,000 samples for training, validation, and testing purposes.
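A minimal sketch of the decomposition in Equation (1), using Nibabel and OpenCV as named in the text, is given below; the file paths and output naming convention are hypothetical.

```python
import os
import cv2
import nibabel as nib
import numpy as np

def decompose_fmri_to_png(nii_path, out_dir, subject_id):
    """Decompose a preprocessed 4D fMRI volume (45x54x45xT) into 2D PNG slices,
    dropping the last ten axial slices and any empty (all-zero) slices."""
    os.makedirs(out_dir, exist_ok=True)
    data = nib.load(nii_path).get_fdata()          # shape: (X, Y, Z, T)
    n_z, n_t = data.shape[2], data.shape[3]
    for z in range(n_z - 10):                      # remove the last ten brain slices
        for t in range(n_t):
            brain_slice = data[:, :, z, t]
            if brain_slice.sum() == 0:             # skip empty slices
                continue
            # Rescale to 8-bit before writing the PNG file.
            img = cv2.normalize(brain_slice, None, 0, 255,
                                cv2.NORM_MINMAX).astype(np.uint8)
            cv2.imwrite(os.path.join(out_dir,
                                     f"{subject_id}_z{z:02d}_t{t:03d}.png"), img)
```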

3.5.2. Modeling

The central objective of our fMRI pipeline was to address the multiclass prediction of AD, HC, and MCI using our optimized vision transformer. Furthermore, we considered the two additional binary classification experiments mentioned earlier, (a) AD + MCI against HC and (b) HC + MCI against AD, to explore the clinical impact of merging MCI with the other classes. We built our optimized vision transformer (OViTAD) and three other baselines—CaIT, DeepViT, and the vanilla vision transformer—and used the Amazon Web Services (AWS) SageMaker infrastructure as our development environment. We spun up a p3.8xlarge instance with 32 virtual central processing units (vCPUs) and 244 gigabytes (GB) of memory. The instance included four NVIDIA TESLA V100-SXM2-16GB graphical processing units (GPUs) and 10 gigabits per second (Gbps) network performance. We trained all the models for 40 epochs with a batch size of 64 using the Adam optimization method with a learning rate lr = 3 × 10−5, gamma = 0.7, and step size = 1. We monitored modeling performance across the epochs using accuracy rates and loss scores for the training and validation sets and used the validation accuracy rate as the criterion for selecting the best model. We implemented the prediction module to load the stored best models into memory and predict the validation and testing sets with their probability scores at the slice level. We evaluated the performance of the models using a standard classification report by calculating precision, recall, F1-score, and accuracy rates. Table A1 in Appendix A demonstrates the models’ performance at the slice level for the validation and test datasets across three repetitions (random data splits) of the fMRI experiments.
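A condensed sketch of this training setup (Adam with lr = 3 × 10−5, a step scheduler with gamma = 0.7 and step size 1, cross-entropy loss, 40 epochs, best-model selection on validation accuracy) is given below; the data loaders and model are assumed to come from the earlier sketches.

```python
import torch
from torch import nn, optim

def train(model, train_loader, val_loader, device, epochs=40):
    """Train a slice-level classifier and keep the checkpoint with the best
    validation accuracy, mirroring the model-selection criterion in the text."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=3e-5)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.7)
    best_acc, best_state = 0.0, None

    for epoch in range(epochs):
        model.train()
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()

        # Validation accuracy drives best-model selection.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, targets in val_loader:
                preds = model(images.to(device)).argmax(dim=1).cpu()
                correct += (preds == targets).sum().item()
                total += targets.numel()
        acc = correct / total
        if acc > best_acc:
            best_acc = acc
            best_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}

    model.load_state_dict(best_state)
    return model, best_acc
```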

3.5.3. Subject-Level Evaluation

We designed our modeling based on the decomposition of brain images into 2D images; therefore, the performance obtained from the prediction module reflects slice-level performance (see Table A1, Appendix A). To calculate the performance of our models at the subject level, we applied a majority-voting method to the predicted labels by aggregating the results based on the subjects’ identifiers (IDs). Next, we calculated the probability of each class per subject and then voted for the class with the highest probability. Finally, we used our standard classification report to measure the performance of our models at the subject level. Table A2 in Appendix A shows the models’ performance for the validation and test datasets across three repetitions (random data splits) of the fMRI experiments. First, we calculated the macro average (macro-avg) and weighted average (weighted-avg) for the precision, recall, and F1-score evaluation metrics. Next, we analyzed the results at the model level to explore classification performance across the experiments. We used the weighted average scores of the aforementioned metrics and calculated each experiment’s average and standard deviation across the three repetitions (random data splits). Table 2 shows the performance of the models for the validation and test sets with the averaged metrics and the corresponding standard deviation values. We summarized the results of this table in Figure A5, comparing the performance of the fMRI models using the averaged F1-score for the three testing sets.
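A minimal sketch of the subject-level majority vote, assuming a hypothetical DataFrame of slice-level predictions keyed by subject ID, is:

```python
import pandas as pd

def subject_level_vote(slice_predictions: pd.DataFrame) -> pd.DataFrame:
    """Aggregate slice-level predicted labels into one label per subject by
    voting for the class with the highest within-subject share (probability)."""
    counts = (slice_predictions
              .groupby(["subject_id", "predicted_label"])
              .size()
              .rename("n")
              .reset_index())
    counts["probability"] = counts.groupby("subject_id")["n"].transform(
        lambda n: n / n.sum())
    winners = counts.loc[counts.groupby("subject_id")["probability"].idxmax()]
    return winners[["subject_id", "predicted_label", "probability"]]

# Expected input columns: subject_id, predicted_label (one row per 2D slice).
```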

3.6. Structural Pipeline

3.6.1. Data Split

We categorized the preprocessed 3D structural MRI samples (one NIfTI/NII per participant) into AD, HC, and MCI classes. In the next step, we used a stratified 80%–10%–10% split with random shuffling at the class level to generate training, validation, and testing sets for the two sets of preprocessed data, S3 (sigma = 3 mm) and S4 (sigma = 4 mm). The sets included 1167, 144, and 149 participants for training, validation, and testing, respectively. Similar to the fMRI pipeline, we explored the impact of merging MCI data with AD and HC. We used the identical data splits generated for the multiclass prediction to address the binary classification experiments by updating the corresponding ground truth; this strategy allowed us to perform a consistent comparison across the experiments and the two sigma variations.

3.6.2. Data Decomposition from 3D to 2D

We employed the same technique explained in Equation (1) to decompose the 3D MRI data into 2D PNG images. As the structural MRI data contain no temporal information, we set the time parameter in the equation to T = 1. The structural MRI decomposition module produced 111,899 images per set, containing 89,446, 11,040, and 11,413 samples for training, validating, and testing our models.

3.6.3. Modeling

The main objective of the structural MRI pipeline was to conduct a multiclass prediction of the AD, HC, and MCI classes using the two sets of preprocessed data (sigma = 3, 4) and to evaluate our proposed optimized vision transformer architecture. Also, we used four other models as baselines, similar to the fMRI pipeline, to investigate the performance of the optimized architecture. Furthermore, we considered combining MCI data with AD and HC to classify (a) AD + MCI against HC and (b) HC + MCI against AD. Similar to the fMRI pipeline, we utilized AWS SageMaker as the development environment on a p3.8xlarge instance equipped with NVIDIA GPUs. We trained all the models for 40 epochs with a batch size of 64 using the Adam optimization method with a learning rate lr = 3 × 10−5, gamma = 0.7, and step size = 1. Using the loss scores and accuracy rates of the training and validation sets, we evaluated the training process and selected the best model based on the highest accuracy rate obtained from the validation sets. Since we designed our vision transformers to use 2D images, we developed a prediction module to output the validation and test sets’ labels at the slice level. We employed our standard classification report module to generate macro and weighted averages of precision, recall, F1-scores, and accuracy rates. We show the slice-level performance of the structural MRI models in Table A3 and Table A4 in Appendix A for sigma = 3 and 4, respectively.

3.6.4. Subject-Level Evaluation

We used the predicted labels for the brain slices and aggregated the results by subject ID to calculate the models’ performance at the subject level (the slice-level performance is shown in Table A4, Appendix A). Then, using the postprocessing module based on the majority-voting concept, we counted the number of predictions for each class in an experiment and measured each class probability. In the next step, we assigned the label corresponding to the highest probability to a given subject. Finally, we employed our standard classification reports, as described earlier, and generated the evaluation scores at the subject level. Table A5 and Table A6 in Appendix A demonstrate the subject-level performance of the structural MRI models for the preprocessed data with spatial smoothing sigma = 3 and 4, respectively. To measure the performance of the experiments at the model level, we used the weighted average evaluation scores and calculated the average and standard deviation of the scores for both structural MRI datasets, shown in Table 3. We summarized the results of this table in Figure A6, comparing the performance of the sMRI models using the averaged F1-score for the three testing sets.

3.7. Discussion

3.7.1. Technical/Architecture Design

We designed an optimized vision transformer architecture to predict multiple stages of Alzheimer’s disease using fMRI and MRI data. Our end-to-end pipeline for the two modalities was built on four major components: (a) aggressive preprocessing of fMRI and MRI data, (b) data decomposition from higher dimensions to 2D, (c) vision transformer model development, and (d) postprocessing. The core concept of this study was to explore the capability of vision transformers to predict Alzheimer’s stages. We exhaustively trained models to conduct a comprehensive evaluation of our proposed architecture. We investigated the performance of our baselines and our proposed architecture on fMRI and two sets of structural MRI data to address the 3-class AD vs. HC vs. MCI, AD vs. HC + MCI, and AD + MCI vs. HC classifications. To demonstrate the robustness of our modeling approach, we repeated each experiment three times with random data splits. More repetitions, such as five to ten runs, could be explored in future work. We reported performance at the slice level and subject level, which allowed us to compare our models across all experiments (model level). We proposed an optimized vision transformer architecture as the core of our end-to-end prediction pipeline. Our optimization approach is based on choosing an architecture input size that is closest to the dimensions of the preprocessed fMRI data. Therefore, we set the architecture input dimension to 56 × 56 and resampled our data (45 × 54) to fit the optimized architecture, so that the original data content is largely preserved through minimal upsampling. Next, we reduced the number of heads in the multi-head attention layer to decrease the complexity and the number of trainable parameters of the network. We showed in Table 4 that we decreased the input image size and the number of trainable parameters of the optimized network by 75% and 28% compared to the vanilla vision transformer, while improving the models’ performance in the fMRI experiments and producing performance similar to the other models in the structural MRI experiments. Unlike grid-search-based optimization, which requires massive model development to achieve an optimal architecture and topology, our fact- and data-driven optimization method, which stems from the impact of input size, produced faster-converging models. This allowed us to explore a broader set of model development and clinical analyses.
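The effect of the reduced input size and head count on model size can be checked with a one-line parameter count over any instantiated model; the comparison below uses the earlier sketched configurations, whose dimensions and head counts are assumptions, so the exact percentage will differ from the paper’s Table 4.

```python
from vit_pytorch import ViT

def count_parameters(model):
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

vit_vanilla = ViT(image_size=224, patch_size=16, num_classes=3,
                  dim=512, depth=6, heads=12, mlp_dim=2048)
ovitad = ViT(image_size=56, patch_size=8, num_classes=3,
             dim=512, depth=6, heads=8, mlp_dim=2048)

reduction = 1 - count_parameters(ovitad) / count_parameters(vit_vanilla)
print(f"parameter reduction with these illustrative settings: {reduction:.1%}")
```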
We consider the fMRI testing datasets as our gold standard for comparing the performance of our models. Unlike the training and validation datasets, the testing datasets are unseen and never used in the training processes. The models’ performance at the subject level using fMRI data, shown in Table 2, reveals that OViTAD, DeepViT, ViT-vanilla, and ViT-224-8 in the AD vs. HC + MCI classification outperform the other models with an F1-score of 0.99 ± 0.02. Also, among the models trained for the 3-class AD vs. HC vs. MCI prediction, our optimized OViTAD model is on par with ViT-vanilla and ViT-224-8 and outperforms the other models with an F1-score of 0.97 ± 0.02, while containing far fewer trainable parameters. We also investigated the impact of the postprocessing step based on the majority-voting algorithm; the results indicated that the models’ performance at the subject level (after postprocessing), with an averaged F1-score of 0.89 ± 0.02 across all experiments (testing datasets), is higher (a 3% improvement) than the slice-level performance, with an averaged F1-score of 0.86 ± 0.02. This finding aligns with the literature [54,58,64], which shows that postprocessing plays a crucial role in improving modeling performance, and indicates that decomposing data from higher dimensions to 2D and back-transforming the slice-level predictions to the subject level significantly improve prediction quality.
Similar to the above approach, we consider the structural MRI (sigma = 3) testing datasets as the gold standard for identifying the best-performing model. The results shown in Table 3 reveal that our OViTAD is on par with DeepViT, ViT-vanilla, and ViT-224-8 in the AD + MCI vs. HC S3 and S4 experiments in terms of F1-scores at the subject level. To explore the central objective of this study, we reviewed the performance of the models for the 3-class AD vs. HC vs. MCI prediction. The results indicated that ViT-vanilla and ViT-224-8 competed with our OViTAD and produced an F1-score of 0.99 ± 0.01 (only 0.004 higher than OViTAD, a negligible margin) using MRI S3. After preprocessing, the original MRI slice dimension was 91 × 109, and we resampled the structural MRI data to 112 × 112, which can distort contextual information. Similarly, we analyzed the behavior of our models trained and evaluated on the preprocessed MRI with sigma = 4 testing datasets. Our OViTAD model using MRI S4 was on par with the other architectures, producing the best performance with an F1-score of 0.99 ± 0.01.
The results suggest that the input size and the number of patches in the attention layers greatly impact the performance of the structural MRI models. As with the fMRI testing datasets, the models’ performance at the subject level (after postprocessing with majority voting) increased by 7% compared to the slice-level models across the experiments for sigma = 3, 4. Our analysis indicated that spatial smoothing with a Gaussian kernel of sigma = 3 mm resulted in slightly higher evaluation scores across the study (an average increase of 0.43% for sigma = 3 compared to the sigma = 4 dataset), which aligns with previous research; however, the improvement is negligible [54,58]. Spatial smoothing is an important MRI preprocessing step that removes random noise in a given voxel’s neighborhood [98,99].
This finding implies that the nature of the features extracted by the attention layers in the vision transformer differs from that of the features extracted by convolutional layers, since the impact of sigma = 3, 4 in previous studies was negligible [54,58]. Next, we calculated the confusion matrices of the testing samples, normalized per group, for the best-performing OViTAD fMRI (test set 2), MRI-S3, and MRI-S4 (test set 3) models in the multiclass experiment predicting AD vs. HC vs. MCI, illustrated in Figure 3. The performance of the best-performing OViTAD models for the same test sets across 40 epochs is shown in Figure A4, Appendix A.
Moreover, we comprehensively compared our findings with the recent literature in which the ADNI dataset was used for Alzheimer’s disease classification. We carefully selected the most current, highly cited studies offering novel techniques with highly competitive model performance. Table 5 compares the performance achieved by OViTAD in the two modalities with the most highly cited recent literature. Our findings show that this study offers a broader range of classifications, and the optimized vision transformer outperforms the state-of-the-art models.

3.7.2. Clinical Observation

We considered combining the healthy control brains with Alzheimer’s and mild cognitive impairment brains to generate new sets from the ADNI dataset and performed two binary classification tasks using all the models. The fMRI models revealed a consistent pattern in which the AD vs. HC + MCI models outperformed AD + MCI vs. HC by 4.64% with respect to the averaged F1-scores across all experiments, shown in Table 2. This finding revealed some level of similarity between the HC and MCI functional data. Also, the results showed that our predictive models could differentiate HC data from non-HC data, indicating that our models properly addressed the aging effect in this study. Furthermore, we analyzed the binary models trained with structural MRI data for the AD + MCI vs. HC and AD vs. HC + MCI experiments for the two smoothing levels (sigma = 3, 4). The results indicated that our AD + MCI vs. HC models outperformed AD vs. HC + MCI by 2.82% with respect to the averaged F1-scores across all experiments for the two sigma values, shown in Table 3.

3.7.3. Local and Global Attention Visualization

We extracted the post-SoftMax attention weights for the eight self-attention heads across the six layers of depth. Then, using a random AD fMRI brain slice, we generated the self-attention maps from OViTAD for the AD vs. HC vs. MCI classification, as shown in Figure A1, Appendix A. The attention maps in each column represent one self-attention head, whereas the maps in each row represent the depth of the attention layers. We also explored the impact of the attention mechanisms at the global level. We utilized the last feature vector of OViTAD for the fMRI AD vs. HC vs. MCI classification—the fully connected (FC) layer—and considered it the global attention feature. The FC layer represents the features produced by the self-attention layers; therefore, it contains the global attention information. We employed an element-wise operator to obtain the sum of the products between each pixel and all the elements in the FC vector. Next, we generated the normalized global attention feature maps for a set of AD fMRI slices in the testing set, as shown in Equation (2), and visualized the maps using the CIVIDIS color map, illustrated in Figure 4.
$$
\begin{aligned}
image_{resize} &= \mathrm{Resize}\big(image_{original},\, 56 \times 56\big) \\
\mathrm{GAFM} &= image_{resize} \times FC_{vector} \\
\mathrm{GAFM}_{normalized} &= \frac{\big(\mathrm{GAFM} - \min(\mathrm{GAFM})\big) \times 255}{\max(\mathrm{GAFM}) - \min(\mathrm{GAFM})}
\end{aligned}
\tag{2}
$$
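A minimal NumPy/OpenCV sketch of Equation (2) is shown below; the slice and FC vector used in the usage example are random stand-ins for a preprocessed fMRI slice and the extracted 2048-unit FC feature vector.

```python
import cv2
import numpy as np
import matplotlib.pyplot as plt

def global_attention_feature_map(image_original, fc_vector):
    """Compute the normalized global attention feature map of Equation (2):
    resize the slice to the model input size, weight every pixel by the sum of
    its products with the FC-layer elements, and rescale to [0, 255]."""
    image_resized = cv2.resize(image_original.astype(np.float32), (56, 56))
    gafm = image_resized * np.sum(fc_vector)      # element-wise weighting per pixel
    gafm_norm = (gafm - gafm.min()) * 255.0 / (gafm.max() - gafm.min())
    return gafm_norm

# Visualize with the CIVIDIS color map, as in Figure 4 (stand-in inputs).
slice_2d = np.random.rand(45, 54)
fc = np.random.rand(2048)
plt.imshow(global_attention_feature_map(slice_2d, fc), cmap="cividis")
plt.axis("off")
plt.show()
```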

3.7.4. Limitations

The number of repetitions for model development is a limitation of this research study. Although we utilized a large dataset and generated three random data splits for modeling, it is highly recommended to repeat this exercise up to 10 times with randomly shuffled data to ensure the robustness of OViTAD. Also, we included a majority-voting technique as postprocessing to stabilize the models’ performance; however, this step adds an extra layer of computation to our pipeline, increasing the modeling cost. Future work could address this limitation using higher-dimensional models, including 3D vision transformers. Training vision transformer models is costly; therefore, reducing the image input size, as discussed in this research, decreases training time and inference latency. Finally, this research study outlined an end-to-end machine learning pipeline to predict Alzheimer’s disease stages using the ADNI dataset, so the models’ performance reflects the accuracy of the pipeline for this dataset. Since the early prediction of this brain disorder is crucial in clinical studies, a variety of existing Alzheimer’s datasets should be explored along with ADNI to examine OViTAD performance in future work.

4. Conclusions

This study introduced an optimized vision transformer called OViTAD to predict healthy, MCI, and AD brains using rs-fMRI and structural MRI (sigma = 3, 4 mm) data. The prediction pipeline included two separate preprocessing stages for the two modalities, training and evaluation of slice-level vision transformers, and a postprocessing step based on the majority-voting concept. The results showed that our optimized vision transformer outperformed or was on par with the vision transformer-based benchmarks while reducing the number of trainable parameters by 30% compared to the vanilla ViT. The average performance of OViTAD across three repetitions (random data splits) was 97% ± 0.0 and 99.55% ± 0.39 for the two modalities in the multiclass classification experiments, which outperformed most existing deep learning and CNN-based models. We also introduced a method for visualizing the global effect of the attention mechanism, enabling scientists to explore crucial brain areas. This study showed that vision transformers can compete with and outperform state-of-the-art algorithms in predicting various stages of Alzheimer’s disease with less complex architectures.

Author Contributions

S.S. designed, implemented, and executed the end-to-end project, including data processing, modeling, and writing. A.S. collaborated on modeling and software development. D.D.D., J.A.E.A. and M.K. reviewed and proofread the project and manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Institutional Review Board Statement

The ADNI committee approved the use of the dataset in this research study.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Table A1. The slice-level models’ performance is described in this table for the validation and test datasets and three repetitions (random data splits). The classification report includes the macro and weighted average precision, recall, and F1-score. The report also includes the accuracy rate and the number of unseen slices used for each model evaluation.
Model | Dataset | Repetition | Accuracy | Precision macro_avg | Precision weighted_avg | Recall macro_avg | Recall weighted_avg | F1-Score macro_avg | F1-Score weighted_avg | Slices
CaIT_AD-HC-MCIVal10.49840.41160.4490.39910.49840.36140.442138600
20.49380.43350.45980.38680.49380.35190.4379133560
30.48770.41480.44040.3550.48770.27770.3673134400
Test10.50110.47010.48650.4020.50110.36040.4396155820
20.49890.48280.48950.40050.49890.35770.4377156100
30.46910.35070.40310.37230.46910.31480.3865155820
CaIT_AD-HCMCIVal10.80810.4040.6530.50.80810.44690.7223138600
20.81660.40830.66680.50.81660.44950.7341133560
30.80210.90110.84130.50010.80210.44520.714134400
Test10.81130.40570.65820.50.81130.44790.7268155820
20.81170.40580.65880.50.81170.4480.7273156100
30.79790.89890.83870.50.79790.44390.7082155820
CaIT_ADMCI-HCVal10.67990.63890.660.60.67990.60070.6546138600
20.6750.64960.6590.5480.6750.51210.6002133560
30.67160.55060.59240.50040.67160.40390.5412134400
Test10.65020.61590.62910.5680.65020.55360.6071155820
20.65720.6540.6550.55820.65720.52330.5879156100
30.63970.47830.52390.49980.63970.39230.5013155820
DeepViT_ADMCI_HCVal10.97230.96960.97230.96810.97240.97110.9723138600
20.99040.98950.99040.99030.99040.98860.9904138600
30.98320.98120.98310.98580.98340.97710.9832133560
Test10.91090.90380.91050.90760.91060.90040.9109155820
20.9890.98820.9890.98940.9890.98690.989155820
30.9910.99040.9910.98820.99120.99280.991156100
DeepViT_AD_HCMCIVal10.96180.9350.96070.96180.96180.91310.9618138600
20.98310.97190.98310.96950.98320.97430.9831133560
30.98750.98040.98750.97790.98760.9830.9875134400
Test10.98070.96830.98060.96980.98060.96680.9807155820
20.97190.9540.97190.95420.97190.95380.9719156100
30.93150.88180.92750.93810.93250.84510.9315155820
DeepViT_AD_HC_MCIVal10.93870.93220.93850.93580.93920.92960.9387138600
20.9350.93520.93520.93650.93610.93470.935133560
30.92990.93230.92990.92820.93030.9370.9299134400
Test10.89970.90490.89950.90190.89960.90830.8997155820
20.90690.90920.90690.91230.90780.90690.9069156100
30.88550.87270.88370.89540.88790.85890.8855155820
ViT24_8_ADMCI_HCVal10.97720.9750.97730.97230.97750.9780.9772138600
20.95720.95270.95720.95140.95730.9540.9572133560
30.95070.94420.95070.9430.95080.94550.9507134400
Test10.91810.91120.91760.91660.91790.90680.9181155820
20.93960.93480.93930.93870.93950.93140.9396156100
30.94610.94120.9460.94380.9460.93880.9461155820
ViT24_8_AD_HCMCIVal10.97380.95640.97340.97080.97360.94360.9738138600
20.98490.97470.98490.97610.98480.97330.9849133560
30.98940.98340.98940.97840.98960.98860.9894134400
Test10.97820.9650.97840.95710.97880.97340.9782155820
20.98560.97650.98560.97430.98560.97870.9856156100
30.9320.88520.92890.92740.93140.85520.932155820
ViT24_8_AD_HC_MCIVal10.94910.9430.94870.94850.94970.93920.9491138600
20.93180.93280.9320.93130.93340.93540.9318133560
30.9220.92410.92180.91830.92230.93080.922134400
Test10.91370.91740.91350.91490.91370.92020.9137155820
20.92070.92320.92070.92260.92080.92410.9207156100
30.8870.87050.8850.88710.88710.85960.887155820
ViT_vanilla_ADMCI_HCVal10.96530.96220.96550.95810.96610.96690.9653138600
20.93760.93080.93760.93120.93760.93040.9376133560
30.94450.93680.94440.93810.94430.93570.9445134400
Test10.93520.92980.93480.93510.93520.92530.9352155820
20.91980.91170.91850.92780.9220.90110.9198156100
30.94060.93550.94060.93550.94060.93550.9406155820
ViT_vanilla_AD_HCMCIVal10.96630.94310.96550.96560.96620.9240.9663138600
20.97770.96340.97790.95680.97820.97020.9777133560
30.98660.97910.98670.97480.98680.98360.9866134400
Test10.98370.97350.98370.96960.98390.97760.9837155820
20.97060.95150.97050.95590.97040.94720.9706156100
30.92130.86770.91790.9060.91950.84020.9213155820
ViT_vanilla_AD_HC_MCIVal10.93590.92450.9350.93290.93550.91830.9359138600
20.90950.90920.90950.90270.91120.91740.9095133560
30.92550.92850.92550.9270.92560.930.9255134400
Test10.91580.91940.91550.92480.91820.91650.9158155820
20.90740.90710.90720.90730.90880.90850.9074156100
30.88760.86530.88390.89470.89010.850.8876155820
OViTAD_ADMCI_HCVal10.96360.96020.96370.95760.9640.96290.9636138600
20.95010.94450.950.94590.94990.94310.9501133560
30.94350.93620.94360.93450.94380.9380.9435134400
Test10.91460.90730.9140.91340.91440.90240.9146155820
20.92810.92110.92710.93480.92970.91160.9281156100
30.92210.91550.92220.91520.92220.91590.9221155820
OViTAD_AD_HCMCIVal10.95390.92180.95270.94670.95340.90120.9539138600
20.97540.95860.97530.96210.97530.95530.9754133560
30.98540.97730.98550.97270.98560.98190.9854134400
Test10.96990.95110.970.94820.97010.95410.9699155820
20.98430.97490.98450.96530.9850.98530.9843156100
30.92050.8670.91730.90220.91840.84120.9205155820
OViTAD_AD_HC_MCIVal10.92390.9140.92320.9250.9240.90550.9239138600
20.91020.910.91020.90970.91120.91110.9102133560
30.91920.92190.91930.9170.9210.92830.9192134400
Test10.89390.90260.89360.90570.89490.90090.8939155820
20.90250.90530.90240.90380.90240.9070.9025156100
30.88820.86820.88520.8910.8890.85570.8882155820
OViTAD_AD_HCVal10.96150.96390.96180.95190.96150.95740.961274900
20.98770.9890.98780.9840.98770.98640.987770420
30.98150.97750.9820.98380.98150.98040.981670700
Test10.96080.95010.96270.96530.96080.95690.961187220
20.95450.94290.95680.95890.95450.950.954987500
30.89460.90040.89620.86940.89460.88140.892587500
OViTAD_HC_MCIVal10.95960.95760.96020.96080.95960.9590.9597112000
20.92650.92310.92770.9280.92650.92510.9267109060
30.92830.92430.9290.92850.92830.92620.9285107800
Test10.90380.90490.90410.90130.90380.90270.9036126420
20.89210.89340.89260.88940.89210.89090.8918126700
30.94190.94290.94210.93980.94190.94110.9418124320
Table A2. The fMRI subject-level models’ performance is described in this table for the validation and test datasets and three repetitions (random data splits). The classification report includes the macro and weighted average precision, recall, and F1-score. The report also includes the accuracy rate and the number of unseen subjects aggregated by the postprocessing module and used for each model evaluation. In this table, AD-HC-MCI refers to multiclass (3-class) prediction, AD-HCMCI refers to AD vs. HC + MCI, and ADMCI-HC represents AD + MCI vs. HC binary classifications.
Model | Dataset | Repetition | Accuracy | Precision macro_avg | Precision weighted_avg | Recall macro_avg | Recall weighted_avg | F1-Score macro_avg | F1-Score weighted_avg | Subjects
CaIT_AD-HC-MCIVal10.59260.41270.49740.45580.59260.41310.517627
20.55560.38180.46260.41880.55560.37140.47327
30.48150.16050.23180.33330.48150.21670.31327
Test10.45160.31480.37810.34630.45160.2840.35931
20.48390.30.36770.37010.48390.30.382331
30.45160.31480.37810.34630.45160.2840.35931
CaIT_AD-HCMCIVal10.81480.40740.66390.50.81480.4490.731727
20.81480.40740.66390.50.81480.4490.731727
30.81480.40740.66390.50.81480.4490.731727
Test10.80650.40320.65040.50.80650.44640.7231
20.80650.40320.65040.50.80650.44640.7231
30.80650.40320.65040.50.80650.44640.7231
CaIT_ADMCI-HCVal10.70370.68750.69440.58330.70370.57140.650827
20.66670.33330.44440.50.66670.40.533327
30.66670.33330.44440.50.66670.40.533327
Test10.64520.32260.41620.50.64520.39220.50631
20.67740.83330.78490.54550.67740.48330.575331
30.64520.32260.41620.50.64520.39220.50631
DeepViT_ADMCI_HCVal1111111127
2111111127
3111111127
Test10.96770.9640.96740.97620.96930.95450.967731
2111111131
3111111131
DeepViT_AD_HCMCIVal10.9630.93330.96130.97830.96460.90.96327
2111111127
3111111127
Test1111111131
2111111131
30.96770.94470.96660.98080.9690.91670.967731
DeepViT_AD_HC_MCIVal1111111127
2111111127
3111111127
Test10.96770.97260.96750.97780.96990.96970.967731
20.96770.97260.96750.97780.96990.96970.967731
30.93550.93160.93540.95830.94350.91410.935531
ViT24_8_ADMCI_HCVal1111111127
2111111127
3111111127
Test10.96770.9640.96740.97620.96930.95450.967731
20.96770.9640.96740.97620.96930.95450.967731
30.96770.9640.96740.97620.96930.95450.967731
ViT24_8_AD_HCMCIVal1111111127
2111111127
3111111127
Test1111111131
2111111131
30.96770.94470.96660.98080.9690.91670.967731
ViT24_8_AD_HC_MCIVal1111111127
2111111127
30.9630.9680.96260.97620.96560.9630.96327
Test1111111131
20.96770.97260.96750.97780.96990.96970.967731
30.93550.93160.93540.95830.94350.91410.935531
ViT_vanilla_ADMCI_HCVal1111111127
2111111127
3111111127
Test10.96770.9640.96740.97620.96930.95450.967731
20.96770.9640.96740.97620.96930.95450.967731
3111111131
ViT_vanilla_AD_HCMCIVal1111111127
2111111127
3111111127
Test1111111131
2111111131
30.96770.94470.96660.98080.9690.91670.967731
ViT_vanilla_AD_HC_MCIVal1111111127
2111111127
30.9630.96910.96320.96670.96670.97440.96327
Test10.96770.97260.96750.97780.96990.96970.967731
20.96770.97260.96750.97780.96990.96970.967731
30.96770.95820.96680.97780.96990.94440.967731
OViTAD_ADMCI_HCVal1111111127
2111111127
3111111127
Test10.96770.9640.96740.97620.96930.95450.967731
20.96770.9640.96740.97620.96930.95450.967731
3111111131
OViTAD_AD_HCMCIVal10.9630.93330.96130.97830.96460.90.96327
2111111127
3111111127
Test1111111131
2111111131
30.96770.94470.96660.98080.9690.91670.967731
OViTAD_AD_HC_MCIVal1111111127
20.9630.96910.96320.96670.96670.97440.96327
3111111127
Test10.96770.97260.96750.97780.96990.96970.967731
20.96770.97260.96750.97780.96990.96970.967731
30.96770.95820.96680.97780.96990.94440.967731
OViTAD_AD_HCVal1111111114
2111111114
3111111114
Test1111111117
2111111117
30.94120.95830.94610.91670.94120.93280.939817
OViTAD_HC_MCIVal1111111122
2111111122
3111111122
Test10.960.96670.96270.95450.960.95890.959725
20.960.96670.96270.95450.960.95890.959725
3111111125
Table A3. The slice-level performance of structural MRI models using the preprocessed data with spatial smoothing sigma = 3 mm (S3). The naming convention for models and classes is as in the previous tables.
Model | Dataset | Repetition | Accuracy | Precision macro_avg | Precision weighted_avg | Recall macro_avg | Recall weighted_avg | F1-Score macro_avg | F1-Score weighted_avg | Slices
CaIT_ADMCI-HC_S3Val10.93170.79970.91430.51880.93170.51910.902511040
20.93160.76030.90980.52720.93160.53450.904611027
30.9310.77620.91050.51910.9310.51970.901811040
Test10.9210.7970.90210.52430.9210.52650.888611413
20.91980.75240.89410.51770.91980.51450.886111427
30.92130.7970.90250.52530.92130.52840.889211455
CaIT_AD-HCMCI_S3Val10.69120.67960.68540.6570.69120.66020.68111040
20.67530.66040.6680.64120.67530.64360.665111027
30.66450.64820.65590.62620.66450.62740.651311040
Test10.66870.65260.66090.63490.66870.6370.658611413
20.66190.64430.65320.62640.66190.62790.650711427
30.6720.65630.66440.63780.6720.640.661811455
CaIT_AD-HC-MCI_S3Val10.67570.69410.67640.48260.67570.4790.649811040
20.66370.65830.65950.46560.66370.45330.635411027
30.65910.6160.64780.48320.65910.49190.633711040
Test10.64960.64090.64420.47010.64960.4640.620811413
20.63310.66460.6370.4480.63310.43110.600411427
30.66750.61570.65380.49720.66750.50380.641911455
DeepViT_ADMCI_HC_S3Val10.98980.9580.98940.98830.98970.93180.989811027
20.99010.96040.98990.97960.990.9430.990111040
30.98290.93260.98270.94440.98250.92150.982911040
Test10.98910.96180.98890.98590.98910.94050.989111427
20.98920.96180.98890.98970.98920.93750.989211413
30.98690.95460.98670.96910.98670.94120.986911455
DeepViT_AD_HCMCI_S3Val10.94520.94210.94480.94940.94620.93670.945211027
20.95050.9480.95030.95190.95070.94480.950511040
30.91860.91370.91780.92250.91960.90760.918611040
Test10.94040.93690.93990.94540.94160.93080.940411427
20.94580.9430.94550.94730.9460.93940.945811413
30.92450.91950.92360.93230.9270.91140.924511455
DeepViT_AD_HC_MCI_S3Val10.92550.91520.92480.93980.92840.89590.925511040
20.91760.90280.9170.92350.91830.88550.917611027
30.91490.90580.91470.92920.91550.88650.914911040
Test10.92020.90790.91930.93880.92410.88490.920211413
20.91030.90140.90990.90960.9110.89450.910311427
30.92080.91560.92060.93640.92150.89820.920811455
ViT44_8_ADMCI_HC_S3Val10.99120.9650.99110.97970.99110.95120.991211027
20.99130.96530.99120.98170.99120.95020.991311040
30.98510.94020.98470.96030.98470.92210.985111040
Test10.99110.96930.9910.98210.9910.95730.991111427
20.98920.96240.9890.98270.98910.94390.989211413
30.98460.94650.98440.96370.98430.93060.984611455
ViT44_8_AD_HCMCI_S3Val10.96070.95870.96060.9630.96110.95520.960711027
20.96360.96170.96340.9670.96420.95740.963611040
30.93720.93410.9370.93760.93730.93110.937211040
Test10.96160.95950.96140.96530.96230.95490.961611427
20.9620.95990.96180.96580.96270.95530.96211413
30.93620.93270.93580.93830.93660.92840.936211455
ViT44_8_AD_HC_MCI_S3Val10.93590.930.93580.94250.93610.91890.935911040
20.92790.91580.92770.92320.92790.90880.927911027
30.92870.92090.92830.9280.930.91550.928711040
Test10.93390.92430.93370.94070.93420.91010.933911413
20.92280.91790.92270.91750.92280.91850.922811427
30.92770.92190.92720.92910.92950.9170.927711455
ViT_vanilla_ADMCI_HC_S3Val10.99160.96580.99140.99010.99150.94420.991611027
20.99150.96650.99140.97650.99140.95690.991511040
30.9860.94330.98560.96880.98570.92080.98611040
Test10.98980.96430.98960.9890.98980.94240.989811427
20.98920.96210.9890.98640.98920.94050.989211413
30.98440.94550.98410.96350.98410.9290.984411455
ViT_vanilla_AD_HCMCI_S3Val10.95980.95770.95960.96320.96040.95330.959811027
20.95660.95410.95630.96190.95780.94840.956611040
30.92620.92250.92590.92570.92610.91980.926211040
Test10.95850.95620.95820.96340.95960.95070.958511427
20.95450.9520.95420.95960.95570.94630.954511413
30.93190.9280.93140.93540.93270.92260.931911455
ViT_vanilla_AD_HC_MCI_S3Val10.94390.94080.94370.94970.94430.93290.943911040
20.9240.91090.92370.92740.92410.89660.92411027
30.92670.9110.92640.92040.92680.90240.926711040
Test10.93850.9320.93820.9480.93930.91820.938511413
20.92140.91830.92130.92670.92150.91060.921411427
30.93210.92110.93160.93230.93310.91180.932111455
OViTAD_ADMCI_HC_S3Val10.98860.95290.98820.98620.98850.92450.988611027
20.99010.96010.98990.98390.990.93880.990111040
30.98280.93170.98250.94740.98240.91730.982811040
Test10.98810.95760.98770.98850.98810.93110.988111427
20.98480.94620.98440.97480.98460.92140.984811413
30.97910.92870.9790.9360.97880.92180.979111455
OViTAD_AD_HCMCI_S3Val10.94570.94290.94550.94650.94580.93990.945711027
20.94550.94270.94520.94670.94570.93940.945511040
30.91670.91140.91580.92210.91820.90440.916711040
Test10.94490.94190.94460.94740.94540.93750.944911427
20.94030.93710.940.94280.94080.93270.940311413
30.91760.91240.91670.92320.91920.90530.917611455
OViTAD_AD_HC_MCI_S3Val10.9310.92430.93060.93850.93220.91230.93111040
20.91160.89310.9110.90780.91260.88080.911611027
30.9110.89360.91020.92070.91260.87210.91111040
Test10.91130.89970.91070.91820.91290.88470.911311413
20.90470.89830.90420.91120.90580.88760.904711427
30.91090.89650.90990.92190.91410.87740.910911455
OViTAD_AD_HC_S3Val10.96730.96360.96710.90460.96730.93110.96625173
20.96280.94560.9620.90340.96280.92290.96195164
30.96560.95620.96510.90520.96560.92850.96465173
Test10.95750.96070.95780.88540.95750.91770.95565482
20.96530.95050.96480.92370.96530.93650.96485480
30.95340.94170.95260.88730.95340.91160.95195493
OViTAD_HC_MCI_S3Val10.93960.84120.94450.88670.93960.86190.94156636
20.91830.79190.93490.88410.91830.82820.92386630
30.92640.81040.93620.8770.92640.83880.92996641
Test10.93440.85720.93510.8650.93440.86110.93476857
20.91360.80450.93020.89040.91360.83840.91896874
30.91440.8080.92320.86030.91440.83080.91776889
Table A4. The slice-level performance of structural MRI models using the preprocessed data with spatial smoothing sigma = 4 mm (S4). The naming convention for models and classes is as in the previous tables.
Model | Dataset | Repetition | Accuracy | Precision macro_avg | Precision weighted_avg | Recall macro_avg | Recall weighted_avg | F1-Score macro_avg | F1-Score weighted_avg | Slices
CaIT_ADMCI-HC_S4Val10.9330.8590.92330.52550.9330.53140.904811040
20.93070.7240.90580.53630.93070.54980.906311027
30.93290.8850.92650.52610.93290.53240.904511040
Test10.92210.8350.90890.52930.92210.53560.890511413
20.91840.69770.88620.52530.91840.52880.887711427
30.92140.81520.90530.52390.92140.52580.888811455
CaIT_AD-HCMCI_S4Val10.74490.73670.74190.72180.74490.72640.740811040
20.73870.72820.73590.71960.73870.72270.736211027
30.73390.72560.73050.70760.73390.71240.728411040
Test10.72930.71910.72580.70680.72930.71060.725511413
20.71420.70140.71130.69550.71420.69780.712111427
30.74180.7330.73860.71860.74180.72310.737711455
CaIT_AD-HC-MCI_S4Val10.74680.7690.74960.54820.74680.55480.724411040
20.73860.7030.73010.53750.73860.54080.71511027
30.72290.67160.71350.53080.72290.54060.698111040
Test10.71880.69150.7110.53550.71880.53970.694611413
20.70390.67110.69440.51340.70390.50840.676111427
30.73080.67260.71840.54740.73080.55640.705611455
DeepViT_ADMCI_HC_S4Val10.98920.95640.9890.97880.98910.93630.989211027
20.99030.96110.99010.98090.99020.94310.990311040
30.98340.93270.98290.96030.9830.90870.983411040
Test10.98910.96170.98880.98370.9890.94190.989111427
20.98710.95470.98680.97860.9870.93340.987111413
30.9850.94680.98460.97290.98470.9240.98511455
DeepViT_AD_HCMCI_S4Val10.95690.95450.95660.96120.95780.94940.956911027
20.95340.95080.95310.95870.95470.9450.953411040
30.93830.9350.9380.94040.93870.93080.938311040
Test10.94910.94620.94870.95330.950.94080.949111427
20.9510.94810.95060.95710.95260.94180.95111413
30.9380.93440.93750.94220.9390.92880.93811455
DeepViT_AD_HC_MCI_S4Val10.94070.93140.94050.9360.94060.92690.940711040
20.93480.91680.93430.94010.93510.89740.934811027
30.93130.91190.93090.92510.93120.90010.931311040
Test10.93590.92170.93550.93540.9360.90980.935911413
20.92480.9150.92450.92810.92520.90360.924811427
30.93510.92420.93480.94090.93550.90980.935111455
ViT44_8_ADMCI_HC_S4Val10.99180.96730.99170.98510.99180.9510.991811027
20.99330.97340.99320.9880.99320.95970.993311040
30.9850.93840.98440.9710.98470.91070.98511040
Test10.99350.97780.99340.98970.99350.96650.993511427
20.99120.9690.99090.99190.99120.94840.991211413
30.98460.9450.98410.97680.98440.91790.984611455
ViT44_8_AD_HCMCI_S4Val10.9680.96630.96780.97180.96870.96190.96811027
20.96820.96670.96810.96920.96830.96440.968211040
30.950.94750.94980.95080.95010.94470.9511040
Test10.96670.96480.96640.97150.96770.95960.966711427
20.96710.96540.9670.96860.96720.96270.967111413
30.94490.94210.94470.94630.94510.93860.944911455
ViT44_8_AD_HC_MCI_S4Val10.95110.94510.95090.95920.95170.93260.951111040
20.94680.93470.94660.94540.94680.92490.946811027
30.94620.93480.94610.94070.94620.92920.946211040
Test10.94530.93310.9450.95520.94620.91480.945311413
20.93540.92980.93520.93130.93570.92890.935411427
30.94970.940.94950.94730.94980.93320.949711455
ViT_vanilla_ADMCI_HC_S4Val10.99220.96850.9920.99110.99220.94820.992211027
20.99430.97750.99420.98870.99430.96690.994311040
30.9880.95260.98780.9650.98770.9410.98811040
Test10.99190.97210.99180.98870.99190.95680.991911427
20.9890.96180.98880.9770.98880.94770.98911413
30.98580.95120.98560.96080.98560.94210.985811455
ViT_vanilla_AD_HCMCI_S4Val10.96890.96750.96890.96840.96890.96670.968911027
20.96210.96020.9620.96420.96240.95690.962111040
30.94410.94110.94380.94640.94450.9370.944111040
Test10.96710.96550.9670.96780.96720.96350.967111427
20.96090.95890.96070.96390.96140.95480.960911413
30.94230.93920.9420.94490.94280.93470.942311455
ViT_vanilla_AD_HC_MCI_S4Val10.94720.92660.9470.93260.9470.92090.947211040
20.9430.92690.94270.94010.94310.91510.94311027
30.93660.92560.93650.92650.93680.92510.936611040
Test10.94520.92980.94490.94110.9450.91960.945211413
20.93970.93680.93950.94670.94040.92820.939711427
30.94040.92440.94020.92820.94040.9210.940411455
OViTAD_ADMCI_HC_S4Val10.99010.96010.98990.9820.990.94040.990111027
20.98890.95470.98860.98110.98870.93150.988911040
30.9830.92990.98230.96470.98250.90070.98311040
Test10.9890.96130.98870.98470.98890.94040.98911427
20.98680.95260.98630.98650.98680.92390.986811413
30.97930.92560.97860.95810.97870.89820.979311455
OViTAD_AD_HCMCI_S4Val10.9560.95390.95590.95680.95610.95130.95611027
20.94990.94710.94950.95470.9510.94140.949911040
30.93550.93240.93530.93490.93540.93010.935511040
Test10.95440.9520.95420.95660.95480.94830.954411427
20.94950.94650.94910.95590.95120.940.949511413
30.9390.93590.93880.93920.9390.93320.93911455
OViTAD_AD_HC_MCI_S4Val10.94030.93110.94010.9430.94040.92030.940311040
20.93430.91630.93390.93810.93460.8980.934311027
30.92870.91420.92820.93140.92990.89990.928711040
Test10.92880.91320.92840.92880.92880.89960.928811413
20.92090.90990.92050.9250.92140.89690.920911427
30.93380.91670.93320.93290.93490.90320.933811455
OViTAD_AD_HC_S4Val10.97350.96530.97320.92810.97350.94550.9735173
20.95620.9410.95530.88010.95620.90720.95465164
30.96680.95130.96610.9150.96680.9320.96615173
Test10.96680.96370.96660.91590.96680.93770.96595482
20.95780.93880.9570.90760.95780.92230.95715480
30.9550.92830.95430.90850.9550.9180.95455493
OViTAD_HC_MCI_S4Val10.9260.8070.94030.89660.9260.8430.93076636
20.90750.77180.92360.85030.90750.80330.91346630
30.93680.83380.94330.88910.93680.85830.93926641
Test10.91640.81030.92940.88330.91640.84050.92086857
20.90820.79550.9250.87730.90820.82790.91386874
30.92730.83180.93510.88740.92730.85620.93016889
Table A5. The performance of structural MRI models at subject-level for preprocessed data using spatial smoothing with sigma = 3 mm (S3). The naming convention for models and classes is as in the previous tables.
Model | Dataset | Repetition | Accuracy | Precision macro_avg | Precision weighted_avg | Recall macro_avg | Recall weighted_avg | F1-Score macro_avg | F1-Score weighted_avg | Subjects
CaIT_ADMCI-HC_S3Val10.93060.46530.86590.50.93060.4820.8971144
20.93060.46530.86590.50.93060.4820.8971144
30.93060.46530.86590.50.93060.4820.8971144
Test10.91950.45970.84540.50.91950.4790.8809149
20.91950.45970.84540.50.91950.4790.8809149
30.91950.45970.84540.50.91950.4790.8809149
CaIT_AD-HCMCI_S3Val10.74310.74160.74230.70870.74310.71520.7338144
20.70830.7150.71240.65880.70830.66060.6871144
30.69440.70550.70170.63820.69440.63530.6659144
Test10.70470.7040.70430.65920.70470.66190.6869149
20.69130.68930.69010.64230.69130.64260.67149
30.71810.72670.72330.67030.71810.67370.6987149
CaIT_AD-HC-MCI_S3Val10.81940.54250.76030.57460.81940.55610.7861144
20.79860.53410.74450.55560.79860.53860.7625144
30.78470.52280.72990.54390.78470.52630.7473144
Test10.78520.52020.71950.55370.78520.53180.7448149
20.7450.49140.68070.52250.7450.50070.7037149
30.79190.52450.72550.55930.79190.53740.752149
DeepViT_ADMCI_HC_S3Val11111111144
21111111144
31111111144
Test11111111149
21111111149
31111111149
DeepViT_AD_HCMCI_S3Val11111111144
21111111144
30.99310.99270.9930.99430.99310.99120.9931144
Test10.99330.9930.99330.99450.99340.99150.9933149
21111111149
31111111149
DeepViT_AD_HC_MCI_S3Val11111111144
21111111144
31111111144
Test10.99330.9950.99330.99580.99340.99440.9933149
20.98660.99010.98660.99010.98660.99010.9866149
31111111149
ViT44_8_ADMCI_HC_S3Val11111111144
21111111144
31111111144
Test11111111149
21111111149
31111111149
ViT44_8_AD_HCMCI_S3Val11111111144
21111111144
30.99310.99270.9930.99430.99310.99120.9931144
Test11111111149
20.99330.9930.99330.99450.99340.99150.9933149
31111111149
ViT44_8_AD_HC_MCI_S3Val11111111144
21111111144
31111111144
Test11111111149
20.99330.9950.99330.99440.99340.99570.9933149
31111111149
ViT_vanilla_ADMCI_HC_S3Val11111111144
21111111144
31111111144
Test11111111149
21111111149
31111111149
ViT_vanilla_AD_HCMCI_S3Val11111111144
21111111144
30.99310.99270.9930.99430.99310.99120.9931144
Test10.99330.9930.99330.99450.99340.99150.9933149
20.99330.9930.99330.99450.99340.99150.9933149
31111111149
ViT_vanilla_AD_HC_MCI_S3Val11111111144
21111111144
31111111144
Test11111111149
20.99330.9950.99330.99440.99340.99570.9933149
31111111149
OViTAD_ADMCI_HC_S3Val11111111144
21111111144
31111111144
Test11111111149
21111111149
31111111149
OViTAD_AD_HCMCI_S3Val11111111144
21111111144
30.99310.99270.9930.99430.99310.99120.9931144
Test10.99330.9930.99330.99450.99340.99150.9933149
20.99330.9930.99330.99450.99340.99150.9933149
31111111149
OViTAD_AD_HC_MCI_S3Val11111111144
21111111144
30.99310.99490.9930.99570.99310.99420.9931144
Test10.99330.9950.99330.99580.99340.99440.9933149
20.98660.99010.98660.99010.98660.99010.9866149
31111111149
OViTAD_AD_HC_S3Val1111111167
2111111167
3111111167
Test1111111171
2111111171
3111111171
OViTAD_HC_MCI_S3Val1111111187
2111111187
3111111187
Test1111111190
2111111190
3111111190
Table A6. The performance of structural MRI models at subject-level for preprocessed data using spatial smoothing with sigma = 4 mm (S4). The naming convention for models and classes is as in the previous tables.
Model | Dataset | Repetition | Accuracy | Precision macro_avg | Precision weighted_avg | Recall macro_avg | Recall weighted_avg | F1-Score macro_avg | F1-Score weighted_avg | Subjects
CaIT_S4_ADMCI-HCVal10.93060.46530.86590.50.93060.4820.8971144
20.93060.46530.86590.50.93060.4820.8971144
30.93060.46530.86590.50.93060.4820.8971144
Test10.91950.45970.84540.50.91950.4790.8809149
20.91950.45970.84540.50.91950.4790.8809149
30.91950.45970.84540.50.91950.4790.8809149
CaIT_S4_AD-HCMCIVal10.86810.86070.86990.86660.86810.86320.8686144
20.89580.88890.89930.89870.89580.89260.8965144
30.85420.84820.85380.8460.85420.84710.8539144
Test10.85230.84480.85350.84860.85230.84650.8527149
20.83890.83070.84180.83750.83890.83350.8397149
30.87920.8720.88380.88250.87920.87570.88149
CaIT_S4_AD-HC-MCIVal10.90970.60210.84870.64910.90970.62450.8778144
20.91670.60690.85610.6550.91670.62960.8848144
30.86810.57430.80630.6140.86810.59320.8356144
Test10.89260.59070.8230.64410.89260.6160.8561149
20.87250.57690.80360.62850.87250.60150.8365149
30.86580.57220.79530.62150.86580.59580.829149
DeepViT_ADMCI_HC_S4Val11111111144
21111111144
31111111144
Test11111111149
21111111149
31111111149
DeepViT_AD_HCMCI_S4Val11111111144
21111111144
30.99310.99270.9930.99430.99310.99120.9931144
Test11111111149
20.99330.9930.99330.99450.99340.99150.9933149
31111111149
DeepViT_AD_HC_MCI_S4Val11111111144
21111111144
30.99310.99490.9930.99570.99310.99420.9931144
Test11111111149
20.98660.99010.98660.99010.98660.99010.9866149
31111111149
ViT44_8_ADMCI_HC_S4Val11111111144
21111111144
31111111144
Test11111111149
21111111149
31111111149
ViT44_8_AD_HCMCI_S4Val11111111144
21111111144
31111111144
Test10.99330.9930.99330.99450.99340.99150.9933149
21111111149
31111111149
ViT44_8_AD_HC_MCI_S4Val11111111144
21111111144
30.99310.99490.9930.99570.99310.99420.9931144
Test10.99330.9950.99330.99580.99340.99440.9933149
20.98660.99010.98660.99010.98660.99010.9866149
31111111149
ViT_vanilla_ADMCI_HC_S4Val11111111144
21111111144
31111111144
Test11111111149
21111111149
31111111149
ViT_vanilla_AD_HCMCI_S4Val11111111144
21111111144
30.99310.99270.9930.99430.99310.99120.9931144
Test11111111149
20.99330.9930.99330.99450.99340.99150.9933149
31111111149
ViT_vanilla_AD_HC_MCI_S4Val11111111144
21111111144
30.99310.99490.9930.99570.99310.99420.9931144
Test10.99330.9950.99330.99580.99340.99440.9933149
20.99330.9950.99330.99580.99340.99440.9933149
31111111149
OViTAD_ADMCI_HC_S4Val11111111144
21111111144
31111111144
Test11111111149
21111111149
31111111149
OViTAD_AD_HCMCI_S4Val11111111144
21111111144
30.99310.99270.9930.99430.99310.99120.9931144
Test10.99330.9930.99330.99450.99340.99150.9933149
20.99330.9930.99330.99450.99340.99150.9933149
31111111149
OViTAD_AD_HC_MCI_S4Val11111111144
21111111144
30.99310.99490.9930.99570.99310.99420.9931144
Test11111111149
20.98660.99010.98660.99010.98660.99010.9866149
31111111149
OViTAD_AD_HC_S4Val1111111167
2111111167
3111111167
Test1111111171
2111111171
3111111171
OViTAD_HC_MCI_S4Val1111111187
2111111187
3111111187
Test1111111190
2111111190
3111111190
Table A7. The summary of highlights for each method.
Reference | Highlights
Lin et al. 2018 [100] | MCIc vs. MCInc 68.68%; FreeSurfer-based features + 3-layer CNN
Dimitriadis et al. 2018 [101] | Random forest feature selection + SVM; model interpretability
Kruthika et al. 2019 [102] | FreeSurfer-based features + multistage classifier; further non-ML optimization (PSO) 96.31%
Spasov et al. 2019 [103] | 3D images + 3D CNN + statistical model; sMCI vs. pMCI trained on AD, HC, MCI
Basaia et al. 2019 [104] | ADNI + non-ADNI data; c-MCI vs. s-MCI 75.1%
Abrol et al. 2020 [105] | 3D adapted ResNet; standard 4-way AD, HC, sMCI, pMCI
Shao et al. 2020 [106] | Hypergraph + multi-task feature selection + SVM
Alinsaif et al. 2021 [107] | HC + sMCI vs. pMCI + AD dataset; 3D shearlet technique + SVM
Alinsaif et al. 2021 [107] | HC + sMCI vs. pMCI + AD; fine-tuned MobileNet
Ramzan et al. 2020 [63] | Fine-tuned ResNet18
Hojjati et al. 2018 [108] | Functional connectivity + cortical thickness; SVM
Cui et al. 2019 [109] | 3D CNN features + RNN
Amoroso et al. 2018 [110] | Random forest feature selection + deep neural network
Buvaneswari et al. 2021 [111] | Hippocampal visual features; PCA-SVR
Duc et al. 2019 [61] | 3D CNN + MMSE regression
OViTAD - fMRI | First vision transformer for Alzheimer's prediction using rs-fMRI; aggressive fMRI preprocessing + 4D-to-2D data decomposition; postprocessing to retrieve subject-level predictions
OViTAD - MRI (sigma = 3, 4) | First vision transformer for Alzheimer's prediction using structural MRI; aggressive MRI preprocessing + 3D-to-2D data decomposition; postprocessing to retrieve subject-level predictions
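The fMRI highlights above mention decomposing the 4D rs-fMRI volumes into 2D images before training. A minimal sketch follows, assuming nibabel and Pillow and an axial slicing axis; the file names, slicing axis, and intensity scaling are illustrative assumptions rather than the study's exact settings.

```python
# A minimal sketch (assuming nibabel and Pillow) of 4D-to-2D decomposition:
# a preprocessed 4D rs-fMRI volume is split into axial 2D slices per time
# point and saved as images. Paths and scaling are hypothetical.
import nibabel as nib
import numpy as np
from PIL import Image

volume = nib.load("sub-001_rest_preprocessed.nii.gz").get_fdata()   # shape: (x, y, z, t)
for t in range(volume.shape[3]):
    for z in range(volume.shape[2]):
        slice_2d = volume[:, :, z, t]
        # Rescale each slice to 8-bit grayscale before saving it as an image.
        scaled = np.uint8(255 * (slice_2d - slice_2d.min()) / (np.ptp(slice_2d) + 1e-8))
        Image.fromarray(scaled).save(f"sub-001_t{t:03d}_z{z:03d}.png")
```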
Figure A1. The attention maps for a random AD fMRI slice from the testing set in the AD vs. HC vs. MCI experiment, generated by OViTAD with heads = 8, depth = 6, and input dimension = 56.
Figure A2. The attention maps for a random AD structural MRI slice from the testing set in the AD vs. HC vs. MCI experiment, generated by OViTAD with heads = 8, depth = 6, and input dimension = 112.
Figure A3. The global attention feature map was obtained by multiplying the FC-layer vector by each pixel in the structural MRI brain slices and summing the products per pixel. Next, we normalized the feature map to (0, 255) and visualized the maps using the CIVIDIS color map. Finally, we selected the first slice of each time-course to demonstrate the varying brain morphology across the MRI data acquisition.
Figure A4. The performance of the best-performing models for fMRI, MRI-S3, and MRI-S4 in the multi-class experiment predicting AD vs. HC vs. MCI, including the accuracy rates and loss scores for the training and validation sets. The modeling was conducted on 2D images, and the metrics shown represent the slice-level performance used to derive the subject-level metrics.
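Figure A4 refers to slice-level metrics that are later converted into subject-level metrics through postprocessing. A minimal sketch follows under the assumption of a plain majority vote over each subject's slice predictions; the exact aggregation rule is not restated in this appendix, so the voting scheme and the subject identifiers are hypothetical.

```python
# A minimal sketch of turning slice-level predictions into a subject-level
# prediction. A majority vote over each subject's slices is assumed here for
# illustration; slice_preds maps a hypothetical subject ID to the class
# predicted for each of that subject's 2D slices.
from collections import Counter

def subject_level_prediction(slice_preds: dict[str, list[int]]) -> dict[str, int]:
    """Return the most frequent slice-level class per subject."""
    return {sid: Counter(preds).most_common(1)[0][0] for sid, preds in slice_preds.items()}

if __name__ == "__main__":
    example = {"subj_001": [0, 0, 1, 0, 0], "subj_002": [2, 2, 2, 1, 2]}
    print(subject_level_prediction(example))  # {'subj_001': 0, 'subj_002': 2}
```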
Figure A5. The summary of fMRI models’ performance using averaged F1-scores for three testing sets.
Figure A6. The summary of sMRI models’ performance (S3,S4) using averaged F1-scores for three testing sets.

References

  1. Lin, P.J.; D’Cruz, B.; Leech, A.A.; Neumann, P.J.; Sanon Aigbogun, M.; Oberdhan, D.; Lavelle, T.A. Family and caregiver spillover effects in cost-utility analyses of Alzheimer’s disease interventions. Pharmacoeconomics 2019, 37, 597–608. [Google Scholar] [CrossRef] [PubMed]
  2. Alzheimer’s Association. 2018 Alzheimer’s disease facts and figures. Alzheimer’s Dement. 2018, 14, 367–429. [Google Scholar] [CrossRef]
  3. Frisoni, G.B.; Boccardi, M.; Barkhof, F.; Blennow, K.; Cappa, S.; Chiotis, K.; Démonet, J.F.; Garibotto, V.; Giannakopoulos, P.; Gietl, A.; et al. Strategic roadmap for an early diagnosis of Alzheimer’s disease based on biomarkers. Lancet Neurol. 2017, 16, 661–676. [Google Scholar] [CrossRef]
  4. Rasmussen, J.; Langerman, H. Alzheimer’s disease–why we need early diagnosis. Degener. Neurol. Neuromuscul. Dis. 2019, 9, 123. [Google Scholar] [CrossRef]
  5. Fitzpatrick, A.W.; Falcon, B.; He, S.; Murzin, A.G.; Murshudov, G.; Garringer, H.J.; Crowther, R.A.; Ghetti, B.; Goedert, M.; Scheres, S.H. Cryo-EM structures of tau filaments from Alzheimer’s disease. Nature 2017, 547, 185–190. [Google Scholar] [CrossRef]
  6. Mazure, C.M.; Swendsen, J. Sex differences in Alzheimer’s disease and other dementias. Lancet Neurol. 2016, 15, 451. [Google Scholar] [CrossRef]
  7. Murphy, M.C.; Jones, D.T.; Jack, C.R., Jr.; Glaser, K.J.; Senjem, M.L.; Manduca, A.; Felmlee, J.P.; Carter, R.E.; Ehman, R.L.; Huston, J., III. Regional brain stiffness changes across the Alzheimer’s disease spectrum. Neuroimage Clin. 2016, 10, 283–290. [Google Scholar] [CrossRef]
  8. Gillis, C.; Mirzaei, F.; Potashman, M.; Ikram, M.A.; Maserejian, N. The incidence of mild cognitive impairment: A systematic review and data synthesis. Alzheimer’s Dementia: Diagn. Assess. Dis. Monit. 2019, 11, 248–256. [Google Scholar] [CrossRef] [PubMed]
  9. Cabeza, R.; Albert, M.; Belleville, S.; Craik, F.I.; Duarte, A.; Grady, C.L.; Lindenberger, U.; Nyberg, L.; Park, D.C.; Reuter-Lorenz, P.A.; et al. Maintenance, reserve and compensation: The cognitive neuroscience of healthy ageing. Nat. Rev. Neurosci. 2018, 19, 701–710. [Google Scholar] [CrossRef] [PubMed]
  10. Petersen, R.C. Mild cognitive impairment. Contin. Lifelong Learn. Neurol. 2016, 22, 404. [Google Scholar]
  11. Anthony, M.; Lin, F. A systematic review for functional neuroimaging studies of cognitive reserve across the cognitive aging spectrum. Arch. Clin. Neuropsychol. 2018, 33, 937–948. [Google Scholar] [CrossRef] [PubMed]
  12. Mateos-Pérez, J.M.; Dadar, M.; Lacalle-Aurioles, M.; Iturria-Medina, Y.; Zeighami, Y.; Evans, A.C. Structural neuroimaging as clinical predictor: A review of machine learning applications. NeuroImage Clin. 2018, 20, 506–522. [Google Scholar] [CrossRef] [PubMed]
  13. Neale, N.; Padilla, C.; Fonseca, L.M.; Holland, T.; Zaman, S. Neuroimaging and other modalities to assess Alzheimer’s disease in Down syndrome. NeuroImage Clin. 2018, 17, 263–271. [Google Scholar] [CrossRef] [PubMed]
  14. Rathore, S.; Habes, M.; Iftikhar, M.A.; Shacklett, A.; Davatzikos, C. A review on neuroimaging-based classification studies and associated feature extraction methods for Alzheimer’s disease and its prodromal stages. NeuroImage 2017, 155, 530–548. [Google Scholar] [CrossRef]
  15. Vemuri, P.; Lesnick, T.G.; Przybelski, S.A.; Knopman, D.S.; Lowe, V.J.; Graff-Radford, J.; Roberts, R.O.; Mielke, M.M.; Machulda, M.M.; Petersen, R.C.; et al. Age, vascular health, and Alzheimer disease biomarkers in an elderly sample. Ann. Neurol. 2017, 82, 706–718. [Google Scholar] [CrossRef]
  16. Lindquist, M. Neuroimaging results altered by varying analysis pipelines. Nature 2020, 582, 36–37. [Google Scholar] [CrossRef]
  17. Wang, X.; Huang, W.; Su, L.; Xing, Y.; Jessen, F.; Sun, Y.; Shu, N.; Han, Y. Neuroimaging advances regarding subjective cognitive decline in preclinical Alzheimer’s disease. Mol. Neurodegener. 2020, 15, 1–27. [Google Scholar] [CrossRef]
  18. Hainc, N.; Federau, C.; Stieltjes, B.; Blatow, M.; Bink, A.; Stippich, C. The bright, artificial intelligence-augmented future of neuroimaging reading. Front. Neurol. 2017, 8, 489. [Google Scholar] [CrossRef] [PubMed]
  19. Jo, T.; Nho, K.; Saykin, A.J. Deep learning in Alzheimer’s disease: Diagnostic classification and prognostic prediction using neuroimaging data. Front. Aging Neurosci. 2019, 220. [Google Scholar] [CrossRef] [PubMed]
  20. Henschel, L.; Conjeti, S.; Estrada, S.; Diers, K.; Fischl, B.; Reuter, M. Fastsurfer-a fast and accurate deep learning based neuroimaging pipeline. NeuroImage 2020, 219, 117012. [Google Scholar] [CrossRef]
  21. Puranik, M.; Shah, H.; Shah, K.; Bagul, S. Intelligent Alzheimer’s detector using deep learning. In Proceedings of the 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 14–15 June 2018; pp. 318–323. [Google Scholar]
  22. Bi, X.; Li, S.; Xiao, B.; Li, Y.; Wang, G.; Ma, X. Computer aided Alzheimer’s disease diagnosis by an unsupervised deep learning technology. Neurocomputing 2020, 392, 296–304. [Google Scholar] [CrossRef]
  23. Kazemi, Y.; Houghten, S. A deep learning pipeline to classify different stages of Alzheimer’s disease from fMRI data. In Proceedings of the 2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), St. Louis, MO, USA, 30 May–2 June 2018; pp. 1–8. [Google Scholar]
  24. Tang, Z.; Chuang, K.V.; DeCarli, C.; Jin, L.W.; Beckett, L.; Keiser, M.J.; Dugger, B.N. Interpretable classification of Alzheimer’s disease pathologies with a convolutional neural network pipeline. Nat. Commun. 2019, 10, 1–14. [Google Scholar] [CrossRef] [PubMed]
  25. Wen, J.; Thibeau-Sutre, E.; Diaz-Melo, M.; Samper-González, J.; Routier, A.; Bottani, S.; Dormont, D.; Durrleman, S.; Burgos, N.; Colliot, O.; et al. Convolutional neural networks for classification of Alzheimer’s disease: Overview and reproducible evaluation. Med. Image Anal. 2020, 63, 101694. [Google Scholar] [CrossRef] [PubMed]
  26. Liu, M.; Cheng, D.; Wang, K.; Wang, Y. Multi-modality cascaded convolutional neural networks for Alzheimer’s disease diagnosis. Neuroinformatics 2018, 16, 295–308. [Google Scholar] [CrossRef]
  27. Islam, J.; Zhang, Y. Brain MRI analysis for Alzheimer’s disease diagnosis using an ensemble system of deep convolutional neural networks. Brain Informatics 2018, 5, 1–14. [Google Scholar] [CrossRef] [PubMed]
  28. Song, T.A.; Chowdhury, S.R.; Yang, F.; Jacobs, H.; El Fakhri, G.; Li, Q.; Johnson, K.; Dutta, J. Graph convolutional neural networks for Alzheimer’s disease classification. In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 8–11 April 2019; pp. 414–417. [Google Scholar]
  29. Sarraf, A.; Jalali, A.E.; Ghaffari, J. Recent Applications of Deep Learning Algorithms in Medical Image Analysis. Am. Acad. Sci. Res. J. Eng. Technol. Sci. 2020, 72, 58–66. [Google Scholar]
  30. Sarraf, A.; Azhdari, M.; Sarraf, S. A comprehensive review of deep learning architectures for computer vision applications. Am. Acad. Sci. Res. J. Eng. Technol. Sci. 2021, 77, 1–29. [Google Scholar]
  31. Janghel, R.; Rathore, Y. Deep convolution neural network based system for early diagnosis of Alzheimer’s disease. Irbm 2021, 42, 258–267. [Google Scholar] [CrossRef]
  32. Chen, S.; Zhang, J.; Wei, X.; Zhang, Q. Alzheimer’s Disease Classification Using Structural MRI Based on Convolutional Neural Networks. In Proceedings of the 2020 2nd International Conference on Big-data Service and Intelligent Computation, Johannesburg, South Africa, 28–30 April 2020; pp. 7–13. [Google Scholar]
  33. Albright, J.; Initiative, A.D.N. Forecasting the progression of Alzheimer’s disease using neural networks and a novel preprocessing algorithm. Alzheimer’s Dementia: Transl. Res. Clin. Interv. 2019, 5, 483–491. [Google Scholar] [CrossRef]
  34. Li, F.; Liu, M.; Initiative, A.D.N. A hybrid convolutional and recurrent neural network for hippocampus analysis in Alzheimer’s disease. J. Neurosci. Methods 2019, 323, 108–118. [Google Scholar] [CrossRef]
  35. Feng, C.; Elazab, A.; Yang, P.; Wang, T.; Zhou, F.; Hu, H.; Xiao, X.; Lei, B. Deep learning framework for Alzheimer’s disease diagnosis via 3D-CNN and FSBi-LSTM. IEEE Access 2019, 7, 63605–63618. [Google Scholar] [CrossRef]
  36. Dua, M.; Makhija, D.; Manasa, P.; Mishra, P. A CNN–RNN–LSTM based amalgamation for Alzheimer’s disease detection. J. Med. Biol. Eng. 2020, 40, 688–706. [Google Scholar] [CrossRef]
  37. Anwar, S.M.; Majid, M.; Qayyum, A.; Awais, M.; Alnowami, M.; Khan, M.K. Medical image analysis using convolutional neural networks: A review. J. Med. Syst. 2018, 42, 1–13. [Google Scholar] [CrossRef] [PubMed]
  38. Yao, G.; Lei, T.; Zhong, J. A review of convolutional-neural-network-based action recognition. Pattern Recognit. Lett. 2019, 118, 14–22. [Google Scholar] [CrossRef]
  39. Dhillon, A.; Verma, G.K. Convolutional neural network: A review of models, methodologies and applications to object detection. Prog. Artif. Intell. 2020, 9, 85–112. [Google Scholar] [CrossRef]
  40. Sornam, M.; Muthusubash, K.; Vanitha, V. A survey on image classification and activity recognition using deep convolutional neural network architecture. In Proceedings of the 2017 ninth international conference on advanced computing (ICoAC), Chennai, India, 14–16 December 2017; pp. 121–126. [Google Scholar]
  41. Sultana, F.; Sufian, A.; Dutta, P. Evolution of image segmentation using deep convolutional neural network: A survey. Knowl.-Based Syst. 2020, 201, 106062. [Google Scholar] [CrossRef]
  42. Ebrahimighahnavieh, M.A.; Luo, S.; Chiong, R. Deep learning to detect Alzheimer’s disease from neuroimaging: A systematic literature review. Comput. Methods Programs Biomed. 2020, 187, 105242. [Google Scholar] [CrossRef]
  43. Altinkaya, E.; Polat, K.; Barakli, B. Detection of Alzheimer’s disease and dementia states based on deep learning from MRI images: A comprehensive review. J. Inst. Electron. Comput. 2020, 1, 39–53. [Google Scholar]
  44. Murn, L.; Blasi, S.; Smeaton, A.F.; O’Connor, N.E.; Mrak, M. Interpreting CNN for low complexity learned sub-pixel motion compensation in video coding. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 798–802. [Google Scholar]
  45. You, J.; Korhonen, J. Transformer for image quality assessment. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 1389–1393. [Google Scholar]
  46. Li, N.; Liu, S.; Liu, Y.; Zhao, S.; Liu, M. Neural speech synthesis with transformer network. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6706–6713. [Google Scholar]
  47. Haller, S.; Lovblad, K.O.; Giannakopoulos, P. Principles of classification analyses in mild cognitive impairment (MCI) and Alzheimer disease. J. Alzheimer’s Dis. 2011, 26, 389–394. [Google Scholar] [CrossRef]
  48. Dukart, J.; Mueller, K.; Barthel, H.; Villringer, A.; Sabri, O.; Schroeter, M.L.; Initiative, A.D.N. Meta-analysis based SVM classification enables accurate detection of Alzheimer’s disease across different clinical centers using FDG-PET and MRI. Psychiatry Res. Neuroimaging 2013, 212, 230–236. [Google Scholar] [CrossRef]
  49. Suk, H.I.; Lee, S.W.; Shen, D.; Initiative, A.D.N. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage 2014, 101, 569–582. [Google Scholar] [CrossRef] [PubMed]
  50. Zhu, X.; Suk, H.I.; Lee, S.W.; Shen, D. Canonical feature selection for joint regression and multi-class identification in Alzheimer’s disease diagnosis. Brain Imaging Behav. 2016, 10, 818–828. [Google Scholar] [CrossRef] [PubMed]
  51. Rieke, J.; Eitel, F.; Weygandt, M.; Haynes, J.D.; Ritter, K. Visualizing convolutional networks for MRI-based diagnosis of Alzheimer’s disease. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications; Springer: Berlin, Germany, 2018; pp. 24–31. [Google Scholar]
  52. Farooq, A.; Anwar, S.; Awais, M.; Rehman, S. A deep CNN based multi-class classification of Alzheimer’s disease using MRI. In Proceedings of the 2017 IEEE International Conference on Imaging systems and techniques (IST), Beijing, China, 18–20 October 2017; pp. 1–6. [Google Scholar]
  53. Long, X.; Chen, L.; Jiang, C.; Zhang, L.; Initiative, A.D.N. Prediction and classification of Alzheimer disease based on quantification of MRI deformation. PLoS ONE 2017, 12, e0173372. [Google Scholar] [CrossRef] [PubMed]
  54. Sarraf, S.; DeSouza, D.D.; Anderson, J.; Tofighi, G. DeepAD: Alzheimer’s disease classification via deep convolutional neural networks using MRI and fMRI. BioRxiv 2017, 070441. [Google Scholar]
  55. Wang, S.; Wang, H.; Shen, Y.; Wang, X. Automatic recognition of mild cognitive impairment and alzheimers disease using ensemble based 3d densely connected convolutional networks. In Proceedings of the 2018 17th IEEE International conference on machine learning and applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 517–523. [Google Scholar]
  56. Khvostikov, A.; Aderghal, K.; Benois-Pineau, J.; Krylov, A.; Catheline, G. 3D CNN-based classification using sMRI and MD-DTI images for Alzheimer disease studies. arXiv 2018, arXiv:1801.05968. [Google Scholar]
  57. Hosseini-Asl, E.; Keynton, R.; El-Baz, A. Alzheimer’s disease diagnostics by adaptation of 3D convolutional network. In Proceedings of the 2016 IEEE international conference on image processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 126–130. [Google Scholar]
  58. Sarraf, S.; Desouza, D.D.; Anderson, J.A.; Saverino, C. MCADNNet: Recognizing stages of cognitive impairment through efficient convolutional fMRI and MRI neural network topology models. IEEE Access 2019, 7, 155584–155600. [Google Scholar] [CrossRef]
  59. Soliman, S.A.; Hussein, R.R.; El-Dahshan, E.S.A.; Salem, A.B.M. Intelligent Algorithms for the Diagnosis of Alzheimer’s Disease. In Innovative Smart Healthcare and Bio-Medical Systems; CRC Press: Boca Raton, FL, USA, 2020; pp. 51–86. [Google Scholar]
  60. Soliman, S.A.; El-Sayed, A.; Salem, A.B.M. Predicting Alzheimer’s Disease with 3D Convolutional Neural Networks. Int. J. Appl. Fuzzy Sets Artif. Intell. 2020, 10, 125–146. [Google Scholar]
  61. Duc, N.T.; Ryu, S.; Qureshi, M.N.I.; Choi, M.; Lee, K.H.; Lee, B. 3D-deep learning based automatic diagnosis of Alzheimer’s disease with joint MMSE prediction using resting-state fMRI. Neuroinformatics 2020, 18, 71–86. [Google Scholar] [CrossRef]
  62. Li, W.; Lin, X.; Chen, X. Detecting Alzheimer’s disease Based on 4D fMRI: An exploration under deep learning framework. Neurocomputing 2020, 388, 280–287. [Google Scholar] [CrossRef]
  63. Ramzan, F.; Khan, M.U.G.; Rehmat, A.; Iqbal, S.; Saba, T.; Rehman, A.; Mehmood, Z. A deep learning approach for automated diagnosis and multi-class classification of Alzheimer’s disease stages using resting-state fMRI and residual neural networks. J. Med. Syst. 2020, 44, 1–16. [Google Scholar] [CrossRef]
  64. Sarraf, S.; Tofighi, G. Deep learning-based pipeline to recognize Alzheimer’s disease using fMRI data. In Proceedings of the 2016 future technologies conference (FTC), San Francisco, CA, USA, 6–7 December 2016; pp. 816–820. [Google Scholar]
  65. Cheng, D.; Liu, M. Combining convolutional and recurrent neural networks for Alzheimer’s disease diagnosis using PET images. In Proceedings of the 2017 IEEE International Conference on Imaging Systems and Techniques (IST), Beijing, China, 18–20 October 2017; pp. 1–5. [Google Scholar]
  66. Hong, X.; Lin, R.; Yang, C.; Zeng, N.; Cai, C.; Gou, J.; Yang, J. Predicting Alzheimer’s disease using LSTM. IEEE Access 2019, 7, 80893–80901. [Google Scholar] [CrossRef]
  67. Wang, T.; Qiu, R.G.; Yu, M. Predictive modeling of the progression of Alzheimer’s disease with recurrent neural networks. Sci. Rep. 2018, 8, 1–12. [Google Scholar] [CrossRef] [PubMed]
  68. Sethi, M.; Ahuja, S.; Rani, S.; Bawa, P.; Zaguia, A. Classification of Alzheimer’s Disease Using Gaussian-Based Bayesian Parameter Optimization for Deep Convolutional LSTM Network. Comput. Math. Methods Med. 2021, 2021, 4186666. [Google Scholar] [CrossRef] [PubMed]
  69. Cui, R.; Liu, M.; Li, G. Longitudinal analysis for Alzheimer’s disease diagnosis using RNN. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 1398–1401. [Google Scholar]
  70. Bubu, O.M.; Pirraglia, E.; Andrade, A.G.; Sharma, R.A.; Gimenez-Badia, S.; Umasabor-Bubu, O.Q.; Hogan, M.M.; Shim, A.M.; Mukhtar, F.; Sharma, N.; et al. Obstructive sleep apnea and longitudinal Alzheimer’s disease biomarker changes. Sleep 2019, 42, zsz048. [Google Scholar] [CrossRef]
  71. Benoit, J.S.; Chan, W.; Piller, L.; Doody, R. Longitudinal sensitivity of Alzheimer’s disease severity staging. Am. J. Alzheimer’s Dis. Other Dementias® 2020, 35, 1533317520918719. [Google Scholar] [CrossRef]
  72. Jabason, E.; Ahmad, M.O.; Swamy, M. Hybrid Feature Fusion Using RNN and Pre-trained CNN for Classification of Alzheimer’s Disease (Poster). In Proceedings of the 2019 22th International Conference on Information Fusion (FUSION), Ottawa, ON, Canada, 2–5 July 2019; pp. 1–4. [Google Scholar]
  73. Song, J.; Zheng, J.; Li, P.; Lu, X.; Zhu, G.; Shen, P. An effective multimodal image fusion method using MRI and PET for Alzheimer’s disease diagnosis. Front. Digit. Health 2021, 3, 19. [Google Scholar] [CrossRef]
  74. Gupta, Y.; Kim, J.I.; Kim, B.C.; Kwon, G.R. Classification and graphical analysis of Alzheimer’s disease and its prodromal stage using multimodal features from structural, diffusion, and functional neuroimaging data and the APOE genotype. Front. Aging Neurosci. 2020, 12, 238. [Google Scholar] [CrossRef]
  75. Thushara, A.; Amma, C.U.; John, A.; Saju, R. Multimodal MRI Based Classification and Prediction of Alzheimer’s Disease Using Random Forest Ensemble. In Proceedings of the 2020 Advanced Computing and Communication Technologies for High Performance Applications (ACCTHPA), Cochin, India, 2–4 July 2020; pp. 249–256. [Google Scholar]
  76. Liu, M.; Cheng, D.; Yan, W.; Initiative, A.D.N. Classification of Alzheimer’s disease by combination of convolutional and recurrent neural networks using FDG-PET images. Front. Neuroinformatics 2018, 12, 35. [Google Scholar] [CrossRef] [PubMed]
  77. Yuen, S.C.; Liang, X.; Zhu, H.; Jia, Y.; Leung, S.W. Prediction of differentially expressed microRNAs in blood as potential biomarkers for Alzheimer’s disease by meta-analysis and adaptive boosting ensemble learning. Alzheimer’s Res. Ther. 2021, 13, 1–30. [Google Scholar] [CrossRef] [PubMed]
  78. Kim, J.; Park, Y.; Park, S.; Jang, H.; Kim, H.J.; Na, D.L.; Lee, H.; Seo, S.W. Prediction of tau accumulation in prodromal Alzheimer’s disease using an ensemble machine learning approach. Sci. Rep. 2021, 11, 1–8. [Google Scholar] [CrossRef]
  79. Hu, D. An introductory survey on attention mechanisms in NLP problems. In Proceedings of the SAI Intelligent Systems Conference, London, UK, 5–6 September 2019; pp. 432–448. [Google Scholar]
  80. Letarte, G.; Paradis, F.; Giguère, P.; Laviolette, F. Importance of self-attention for sentiment analysis. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, 1 November 2018; pp. 267–275. [Google Scholar]
  81. Roshanzamir, A.; Aghajan, H.; Soleymani Baghshah, M. Transformer-based deep neural network language models for Alzheimer’s disease risk assessment from targeted speech. BMC Med. Informatics Decis. Mak. 2021, 21, 1–14. [Google Scholar] [CrossRef] [PubMed]
  82. Sarasua, I.; Pölsterl, S.; Wachinger, C.; Neuroimaging, A.D. TransforMesh: A Transformer Network for Longitudinal Modeling of Anatomical Meshes. In Proceedings of the International Workshop on Machine Learning in Medical Imaging, Strasbourg, France, 27 September 2021; pp. 209–218. [Google Scholar]
  83. Wang, S.; Zhuang, Z.; Xuan, K.; Qian, D.; Xue, Z.; Xu, J.; Liu, Y.; Chai, Y.; Zhang, L.; Wang, Q.; et al. 3DMeT: 3D Medical Image Transformer for Knee Cartilage Defect Assessment. In Proceedings of the International Workshop on Machine Learning in Medical Imaging, Strasbourg, France, 27 September 2021; pp. 347–355. [Google Scholar]
  84. Jack, C.R., Jr.; Bernstein, M.A.; Fox, N.C.; Thompson, P.; Alexander, G.; Harvey, D.; Borowski, B.; Britson, P.J.; Whitwell, J.L.; Ward, C.; et al. The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods. J. Magn. Reson. Imaging Off. J. Int. Soc. Magn. Reson. Med. 2008, 27, 685–691. [Google Scholar] [CrossRef]
  85. Churchill, N.W.; Spring, R.; Afshin-Pour, B.; Dong, F.; Strother, S.C. An automated, adaptive framework for optimizing preprocessing pipelines in task-based functional MRI. PLoS ONE 2015, 10, e0131520. [Google Scholar] [CrossRef] [PubMed]
  86. Churchill, N.W.; Oder, A.; Abdi, H.; Tam, F.; Lee, W.; Thomas, C.; Ween, J.E.; Graham, S.J.; Strother, S.C. Optimizing preprocessing and analysis pipelines for single-subject fMRI. I. Standard temporal motion and physiological noise correction methods. Hum. Brain Mapp. 2012, 33, 609–627. [Google Scholar] [CrossRef] [PubMed]
  87. Li, X.; Morgan, P.S.; Ashburner, J.; Smith, J.; Rorden, C. The first step for neuroimaging data analysis: DICOM to NIfTI conversion. J. Neurosci. Methods 2016, 264, 47–56. [Google Scholar] [CrossRef]
  88. Smith, S.M. Fast robust automated brain extraction. Hum. Brain Mapp. 2002, 17, 143–155. [Google Scholar] [CrossRef]
  89. Jenkinson, M.; Bannister, P.; Brady, M.; Smith, S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage 2002, 17, 825–841. [Google Scholar] [CrossRef]
  90. Fonov, V.; Evans, A.C.; Botteron, K.; Almli, C.R.; McKinstry, R.C.; Collins, D.L.; the Brain Development Cooperative Group. Unbiased average age-appropriate atlases for pediatric studies. Neuroimage 2011, 54, 313–327. [Google Scholar] [CrossRef]
  91. Smith, S.M.; Jenkinson, M.; Woolrich, M.W.; Beckmann, C.F.; Behrens, T.E.; Johansen-Berg, H.; Bannister, P.R.; De Luca, M.; Drobnjak, I.; Flitney, D.E.; et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 2004, 23, S208–S219. [Google Scholar] [CrossRef]
  92. Scarpazza, C.; Tognin, S.; Frisciata, S.; Sartori, G.; Mechelli, A. False positive rates in Voxel-based Morphometry studies of the human brain: Should we be worried? Neurosci. Biobehav. Rev. 2015, 52, 49–55. [Google Scholar] [CrossRef]
  93. Mikl, M.; Mareček, R.; Hluštík, P.; Pavlicová, M.; Drastich, A.; Chlebus, P.; Brázdil, M.; Krupa, P. Effects of spatial smoothing on fMRI group inferences. Magn. Reson. Imaging 2008, 26, 490–503. [Google Scholar] [CrossRef]
  94. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 1–11. [Google Scholar]
  95. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  96. Zhou, D.; Kang, B.; Jin, X.; Yang, L.; Lian, X.; Jiang, Z.; Hou, Q.; Feng, J. Deepvit: Towards deeper vision transformer. arXiv 2021, arXiv:2103.11886. [Google Scholar]
  97. Touvron, H.; Cord, M.; Sablayrolles, A.; Synnaeve, G.; Jégou, H. Going deeper with image transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 32–42. [Google Scholar]
  98. Alakörkkö, T.; Saarimäki, H.; Glerean, E.; Saramäki, J.; Korhonen, O. Effects of spatial smoothing on functional brain networks. Eur. J. Neurosci. 2017, 46, 2471–2480. [Google Scholar] [CrossRef]
  99. Chen, Z.; Calhoun, V. Effect of spatial smoothing on task fMRI ICA and functional connectivity. Front. Neurosci. 2018, 12, 15. [Google Scholar] [CrossRef] [PubMed]
  100. Lin, W.; Tong, T.; Gao, Q.; Guo, D.; Du, X.; Yang, Y.; Guo, G.; Xiao, M.; Du, M.; Qu, X.; et al. Convolutional neural networks-based MRI image analysis for the Alzheimer’s disease prediction from mild cognitive impairment. Front. Neurosci. 2018, 777. [Google Scholar] [CrossRef]
  101. Dimitriadis, S.I.; Liparas, D.; Initiative, A.D.N. How random is the random forest? Random forest algorithm on the service of structural imaging biomarkers for Alzheimer’s disease: From Alzheimer’s disease neuroimaging initiative (ADNI) database. Neural Regen. Res. 2018, 13, 962. [Google Scholar] [CrossRef]
  102. Kruthika, K.; Maheshappa, H.; Initiative, A.D.N. Multistage classifier-based approach for Alzheimer’s disease prediction and retrieval. Informatics Med. Unlocked 2019, 14, 34–42. [Google Scholar] [CrossRef]
  103. Spasov, S.; Passamonti, L.; Duggento, A.; Lio, P.; Toschi, N.; Initiative, A.D.N. A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer’s disease. Neuroimage 2019, 189, 276–287. [Google Scholar] [CrossRef]
  104. Basaia, S.; Agosta, F.; Wagner, L.; Canu, E.; Magnani, G.; Santangelo, R.; Filippi, M.; Initiative, A.D.N. Automated classification of Alzheimer’s disease and mild cognitive impairment using a single MRI and deep neural networks. NeuroImage Clin. 2019, 21, 101645. [Google Scholar] [CrossRef] [PubMed]
  105. Abrol, A.; Bhattarai, M.; Fedorov, A.; Du, Y.; Plis, S.; Calhoun, V.; Initiative, A.D.N. Deep residual learning for neuroimaging: An application to predict progression to Alzheimer’s disease. J. Neurosci. Methods 2020, 339, 108701. [Google Scholar] [CrossRef]
  106. Shao, W.; Peng, Y.; Zu, C.; Wang, M.; Zhang, D.; Initiative, A.D.N. Hypergraph based multi-task feature selection for multimodal classification of Alzheimer’s disease. Comput. Med. Imaging Graph. 2020, 80, 101663. [Google Scholar] [CrossRef]
  107. Alinsaif, S.; Lang, J.; Initiative, A.D.N. 3D shearlet-based descriptors combined with deep features for the classification of Alzheimer’s disease based on MRI data. Comput. Biol. Med. 2021, 138, 104879. [Google Scholar] [CrossRef]
  108. Hojjati, S.H.; Ebrahimzadeh, A.; Khazaee, A.; Babajani-Feremi, A.; Initiative, A.D.N. Predicting conversion from MCI to AD by integrating rs-fMRI and structural MRI. Comput. Biol. Med. 2018, 102, 30–39. [Google Scholar] [CrossRef]
  109. Cui, R.; Liu, M.; Initiative, A.D.N. RNN-based longitudinal analysis for diagnosis of Alzheimer’s disease. Comput. Med. Imaging Graph. 2019, 73, 1–10. [Google Scholar] [CrossRef]
  110. Amoroso, N.; Diacono, D.; Fanizzi, A.; La Rocca, M.; Monaco, A.; Lombardi, A.; Guaragnella, C.; Bellotti, R.; Tangaro, S.; Initiative, A.D.N.; et al. Deep learning reveals Alzheimer’s disease onset in MCI subjects: Results from an international challenge. J. Neurosci. Methods 2018, 302, 3–9. [Google Scholar] [CrossRef] [PubMed]
  111. Buvaneswari, P.; Gayathri, R. Detection and Classification of Alzheimer’s disease from cognitive impairment with resting-state fMRI. Neural Comput. Appl. 2021, 1, 1–16. [Google Scholar] [CrossRef]
Figure 1. The OViTAD architecture is an optimized ViT, shown here for structural MRI data, composed of a linear projection layer applied to the flattened patches, which are fed into an eight-head self-attention (8-HSA) transformer encoder. An MLP layer of 2048 parameters translates the features from the transformer encoder into the format required by the cross-entropy loss function.
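The Figure 1 caption describes the flow from flattened patches through a linear projection, an 8-head transformer encoder, and an MLP head trained with cross-entropy. A minimal PyTorch sketch of this style of classifier follows; the patch size, embedding dimension, and depth below are illustrative assumptions and not the exact OViTAD configuration.

```python
# A minimal PyTorch sketch of an OViTAD-style classifier: patch embedding via a
# strided convolution (equivalent to a linear projection of flattened patches),
# an 8-head transformer encoder, and an MLP head producing class logits.
import torch
import torch.nn as nn

class TinyViTClassifier(nn.Module):
    def __init__(self, image_size=56, patch_size=7, channels=3,
                 dim=256, depth=6, heads=8, mlp_dim=2048, num_classes=3):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        self.to_patch_embedding = nn.Conv2d(channels, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embedding = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                   dim_feedforward=mlp_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.mlp_head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_classes))

    def forward(self, x):                                            # x: (batch, channels, H, W)
        x = self.to_patch_embedding(x).flatten(2).transpose(1, 2)    # (batch, patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embedding
        x = self.encoder(x)
        return self.mlp_head(x[:, 0])                                # logits for cross-entropy

model = TinyViTClassifier()
logits = model(torch.randn(4, 3, 56, 56))
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 1]))
```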
Figure 2. The transformer block in DeepViT architecture includes a re-attention module instead of a standard self-attention layer (Left). Class-Attention in Image Transformer architecture consists of a class embedding (CLS) and additional class-attention layers preceded by self-attention layers (Right).
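Figure 2 contrasts DeepViT's re-attention module with standard self-attention: the attention maps of all heads are mixed by a learnable head-to-head matrix before being applied to the values. A minimal sketch follows; the dimensions are illustrative, the post-mixing normalization used by DeepViT is omitted for brevity, and this is not the authors' exact implementation.

```python
# A minimal sketch of a DeepViT-style re-attention block: per-head attention
# maps are mixed across heads by a learnable (heads x heads) matrix theta.
import torch
import torch.nn as nn

class ReAttention(nn.Module):
    def __init__(self, dim, heads=8, dim_head=64):
        super().__init__()
        inner = heads * dim_head
        self.heads, self.scale = heads, dim_head ** -0.5
        self.to_qkv = nn.Linear(dim, inner * 3, bias=False)
        self.theta = nn.Parameter(torch.randn(heads, heads))   # head-mixing matrix
        self.to_out = nn.Linear(inner, dim)

    def forward(self, x):                                      # x: (batch, tokens, dim)
        b, n, _ = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = [t.reshape(b, n, self.heads, -1).transpose(1, 2) for t in qkv]
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        attn = torch.einsum('hg, bgij -> bhij', self.theta, attn)   # re-attention across heads
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)

print(ReAttention(dim=256)(torch.randn(2, 65, 256)).shape)     # torch.Size([2, 65, 256])
```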
Figure 3. The normalized confusion matrices for the best-performing fMRI (left), MRI-S3 (middle), and MRI-S4 (right) OViTAD models to classify AD vs. HC vs. MCI at subject-level.
Figure 4. The global attention feature map was obtained by multiplying the FC-layer vector by each pixel in the fMRI brain slices and summing the products per pixel. Next, we normalized the feature map to (0, 255) and visualized the maps using the CIVIDIS color map. Finally, we selected the first slice of each time-course to demonstrate the varying brain morphology across the fMRI data acquisition.
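A minimal sketch of the global attention feature map computation described in the caption follows, assuming NumPy and Matplotlib: each pixel of a slice is multiplied by the FC-layer vector, the products are summed per pixel, the map is normalized to (0, 255), and it is rendered with the CIVIDIS color map. How the FC-layer vector is extracted from the trained model is omitted, and the arrays used here are stand-ins.

```python
# A minimal sketch of the global attention feature map: scale each pixel by the
# FC-layer vector, sum per pixel, normalize to (0, 255), and plot with CIVIDIS.
import numpy as np
import matplotlib.pyplot as plt

def global_attention_map(slice_2d: np.ndarray, fc_vector: np.ndarray) -> np.ndarray:
    # Multiplying every pixel by each FC weight and summing per pixel reduces to
    # scaling the slice by the sum of the FC vector; written out for clarity.
    feature_map = (slice_2d[..., None] * fc_vector[None, None, :]).sum(axis=-1)
    feature_map -= feature_map.min()
    if feature_map.max() > 0:
        feature_map = 255.0 * feature_map / feature_map.max()   # normalize to (0, 255)
    return feature_map

example_slice = np.random.rand(56, 56)        # stand-in for a preprocessed fMRI slice
example_fc = np.random.randn(128)             # stand-in for the FC-layer vector
plt.imshow(global_attention_map(example_slice, example_fc), cmap="cividis")
plt.axis("off")
plt.savefig("global_attention_map.png", bbox_inches="tight")
```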
Table 1. The demographics of the two ADNI datasets used in model development show that all groups are older adults (>75 years).
Modality | Total | Group | Participants | Female | Age | Male | Age | MMSE
rs-fMRI | 284 | AD | 54 | 27 | 80.96 ± 4.64 | 27 | 79.0 ± 2.74 | 22.70 ± 2.10
 | | HC | 99 | 49 | 79.78 ± 4.76 | 50 | 82.57 ± 3.88 | 28.82 ± 1.35
 | | MCI | 131 | 66 | 79.15 ± 3.09 | 65 | 79.72 ± 4.84 | 26.53 ± 2.51
MRI | 1460 | AD | 577 | 232 | 80.98 ± 4.65 | 345 | 81.27 ± 4.08 | 23.07 ± 2.06
 | | HC | 108 | 51 | 79.37 ± 3.54 | 57 | 80.81 ± 4.42 | 28.81 ± 1.35
 | | MCI | 775 | 265 | 80.28 ± 3.31 | 510 | 81.61 ± 4.15 | 26.53 ± 2.09
Table 2. To evaluate model-level performance, we used the weighted-average scores of the subject-level results and calculated each experiment's average and standard deviation across three repetitions with random data splits, for the validation and test sets.
Model | Dataset | Precision | Recall | F1-Score | Accuracy
CaIT_ADMCI-HC | Val | 0.53 ± 0.14 | 0.68 ± 0.02 | 0.57 ± 0.07 | 0.68 ± 0.02
 | Test | 0.54 ± 0.21 | 0.66 ± 0.02 | 0.53 ± 0.04 | 0.66 ± 0.02
CaIT_AD-HCMCI | Val | 0.66 ± 0 | 0.81 ± 0 | 0.73 ± 0 | 0.81 ± 0
 | Test | 0.65 ± 0 | 0.81 ± 0 | 0.72 ± 0 | 0.81 ± 0
CaIT_AD-HC-MCI | Val | 0.4 ± 0.14 | 0.54 ± 0.06 | 0.43 ± 0.11 | 0.54 ± 0.06
 | Test | 0.37 ± 0.01 | 0.46 ± 0.02 | 0.37 ± 0.01 | 0.46 ± 0.02
DeepViT_AD_HC_MCI | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 0.96 ± 0.02 | 0.96 ± 0.02 | 0.96 ± 0.02 | 0.96 ± 0.02
DeepViT_AD_HCMCI | Val | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
 | Test | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
DeepViT_ADMCI_HC | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
ViT_224_8_AD_HC_MCI | Val | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
 | Test | 0.97 ± 0.03 | 0.97 ± 0.03 | 0.97 ± 0.03 | 0.97 ± 0.03
ViT_224_8_AD_HCMCI | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
ViT_224_8_ADMCI_HC | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 0.97 ± 0 | 0.97 ± 0 | 0.97 ± 0 | 0.97 ± 0
ViT_vanilla_AD_HC_MCI | Val | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
 | Test | 0.97 ± 0 | 0.97 ± 0 | 0.97 ± 0 | 0.97 ± 0
ViT_vanilla_AD_HCMCI | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
ViT_vanilla_ADMCI_HC | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 0.98 ± 0.02 | 0.98 ± 0.02 | 0.98 ± 0.02 | 0.98 ± 0.02
OViTAD_AD_HC_MCI | Val | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
 | Test | 0.97 ± 0 | 0.97 ± 0 | 0.97 ± 0 | 0.97 ± 0
OViTAD_AD_HCMCI | Val | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
 | Test | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
OViTAD_ADMCI_HC | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 0.98 ± 0.02 | 0.98 ± 0.02 | 0.98 ± 0.02 | 0.98 ± 0.02
OViTAD_AD_HC | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02
OViTAD_HC_MCI | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 0.97 ± 0.03 | 0.97 ± 0.03 | 0.97 ± 0.03 | 0.97 ± 0.03
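As the Table 2 caption states, each model-level entry is the mean and standard deviation of the weighted-average score over three repetitions with random data splits. A minimal NumPy sketch of this aggregation follows; the per-run values used here are hypothetical placeholders, not results from the paper.

```python
# A minimal NumPy sketch of forming the "mean ± std" entries in Table 2 from
# the weighted-average score of each of three repetitions (random data splits).
import numpy as np

f1_weighted_per_run = np.array([0.97, 0.96, 0.98])   # hypothetical weighted-average F1 per repetition
print(f"F1-score: {f1_weighted_per_run.mean():.2f} ± {f1_weighted_per_run.std():.2f}")
```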
Table 3. The model-level performance for the two structural MRI experiment sets (S3 and S4), evaluated with standard metrics.
Model | Dataset | Precision | Recall | F1-Score | Accuracy
CaIT_S3_AD-HC-MCI | Val | 0.74 ± 0.02 | 0.8 ± 0.02 | 0.77 ± 0.02 | 0.8 ± 0.02
 | Test | 0.71 ± 0.02 | 0.77 ± 0.03 | 0.73 ± 0.03 | 0.77 ± 0.03
CaIT_S3_AD-HCMCI | Val | 0.72 ± 0.02 | 0.72 ± 0.03 | 0.7 ± 0.03 | 0.72 ± 0.03
 | Test | 0.71 ± 0.02 | 0.7 ± 0.01 | 0.69 ± 0.01 | 0.7 ± 0.01
CaIT_S3_ADMCI-HC | Val | 0.87 ± 0 | 0.93 ± 0 | 0.9 ± 0 | 0.93 ± 0
 | Test | 0.85 ± 0 | 0.92 ± 0 | 0.88 ± 0 | 0.92 ± 0
DeepViT_S3_AD-HC-MCI | Val | 0.81 ± 0.06 | 0.85 ± 0.03 | 0.82 ± 0.04 | 0.85 ± 0.03
 | Test | 0.77 ± 0.02 | 0.84 ± 0.02 | 0.81 ± 0.02 | 0.84 ± 0.02
DeepViT_S3_AD-HCMCI | Val | 0.84 ± 0.04 | 0.84 ± 0.05 | 0.84 ± 0.05 | 0.84 ± 0.05
 | Test | 0.83 ± 0.04 | 0.83 ± 0.04 | 0.83 ± 0.04 | 0.83 ± 0.04
DeepViT_S3_ADMCI-HC | Val | 0.87 ± 0 | 0.93 ± 0 | 0.9 ± 0 | 0.93 ± 0
 | Test | 0.85 ± 0 | 0.92 ± 0 | 0.88 ± 0 | 0.92 ± 0
ResNet50_S3_AD-HC-MCI | Val | 0.85 ± 0.09 | 0.86 ± 0.05 | 0.84 ± 0.06 | 0.86 ± 0.05
 | Test | 0.83 ± 0.06 | 0.85 ± 0.02 | 0.82 ± 0.02 | 0.85 ± 0.02
ResNet50_S3_AD-HCMCI | Val | 0.84 ± 0.01 | 0.84 ± 0.01 | 0.84 ± 0.01 | 0.84 ± 0.01
 | Test | 0.84 ± 0.04 | 0.84 ± 0.04 | 0.84 ± 0.04 | 0.84 ± 0.04
ResNet50_S3_ADMCI-HC | Val | 0.87 ± 0 | 0.93 ± 0 | 0.9 ± 0 | 0.93 ± 0
 | Test | 0.85 ± 0 | 0.92 ± 0 | 0.88 ± 0 | 0.92 ± 0
ViT_S3_AD-HC-MCI | Val | 0.84 ± 0.05 | 0.88 ± 0.02 | 0.85 ± 0.03 | 0.88 ± 0.02
 | Test | 0.84 ± 0.08 | 0.85 ± 0.03 | 0.83 ± 0.04 | 0.85 ± 0.03
ViT_S3_AD-HCMCI | Val | 0.84 ± 0.01 | 0.84 ± 0.01 | 0.83 ± 0.02 | 0.84 ± 0.01
 | Test | 0.84 ± 0.03 | 0.83 ± 0.02 | 0.83 ± 0.02 | 0.83 ± 0.02
ViT_S3_ADMCI-HC | Val | 0.87 ± 0 | 0.93 ± 0 | 0.9 ± 0 | 0.93 ± 0
 | Test | 0.85 ± 0 | 0.92 ± 0 | 0.88 ± 0 | 0.92 ± 0
OViTAD_S3_AD-HC-MCI | Val | 0.78 ± 0.02 | 0.84 ± 0.02 | 0.81 ± 0.02 | 0.84 ± 0.02
 | Test | 0.75 ± 0.03 | 0.82 ± 0.03 | 0.79 ± 0.03 | 0.82 ± 0.03
OViTAD_S3_AD-HCMCI | Val | 0.79 ± 0.03 | 0.77 ± 0.05 | 0.75 ± 0.07 | 0.77 ± 0.05
 | Test | 0.79 ± 0.02 | 0.77 ± 0.04 | 0.75 ± 0.06 | 0.77 ± 0.04
OViTAD_S3_ADMCI-HC | Val | 0.87 ± 0 | 0.93 ± 0 | 0.9 ± 0 | 0.93 ± 0
 | Test | 0.85 ± 0 | 0.92 ± 0 | 0.88 ± 0 | 0.92 ± 0
OViTAD_S3_AD-HC | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
OViTAD_S3_HC-MCI | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
CaIT_S4_AD-HC-MCI | Val | 0.84 ± 0.03 | 0.9 ± 0.03 | 0.87 ± 0.03 | 0.9 ± 0.03
 | Test | 0.81 ± 0.01 | 0.88 ± 0.01 | 0.84 ± 0.01 | 0.88 ± 0.01
CaIT_S4_AD-HCMCI | Val | 0.87 ± 0.02 | 0.87 ± 0.02 | 0.87 ± 0.02 | 0.87 ± 0.02
 | Test | 0.86 ± 0.02 | 0.86 ± 0.02 | 0.86 ± 0.02 | 0.86 ± 0.02
CaIT_S4_ADMCI-HC | Val | 0.87 ± 0 | 0.93 ± 0 | 0.9 ± 0 | 0.93 ± 0
 | Test | 0.85 ± 0 | 0.92 ± 0 | 0.88 ± 0 | 0.92 ± 0
DeepViT_S4_AD-HC-MCI | Val | 0.85 ± 0.01 | 0.91 ± 0.01 | 0.88 ± 0.01 | 0.91 ± 0.01
 | Test | 0.82 ± 0.01 | 0.88 ± 0.01 | 0.85 ± 0.01 | 0.88 ± 0.01
DeepViT_S4_AD-HCMCI | Val | 0.91 ± 0.01 | 0.91 ± 0.01 | 0.91 ± 0.01 | 0.91 ± 0.01
 | Test | 0.89 ± 0.02 | 0.89 ± 0.02 | 0.89 ± 0.02 | 0.89 ± 0.02
DeepViT_S4_ADMCI-HC | Val | 0.87 ± 0 | 0.93 ± 0 | 0.9 ± 0 | 0.93 ± 0
 | Test | 0.85 ± 0 | 0.92 ± 0 | 0.88 ± 0 | 0.92 ± 0
ResNet50_S4_AD-HC-MCI | Val | 0.93 ± 0.01 | 0.93 ± 0.01 | 0.91 ± 0.01 | 0.93 ± 0.01
 | Test | 0.89 ± 0.06 | 0.91 ± 0.02 | 0.89 ± 0.03 | 0.91 ± 0.02
ResNet50_S4_AD-HCMCI | Val | 0.93 ± 0 | 0.93 ± 0 | 0.93 ± 0 | 0.93 ± 0
 | Test | 0.91 ± 0.01 | 0.91 ± 0.01 | 0.91 ± 0.01 | 0.91 ± 0.01
ResNet50_S4_ADMCI-HC | Val | 0.87 ± 0 | 0.93 ± 0 | 0.9 ± 0 | 0.93 ± 0
 | Test | 0.87 ± 0.05 | 0.92 ± 0 | 0.89 ± 0.01 | 0.92 ± 0
ViT_S4_AD-HC-MCI | Val | 0.9 ± 0.06 | 0.91 ± 0.02 | 0.89 ± 0.02 | 0.91 ± 0.02
 | Test | 0.88 ± 0.06 | 0.9 ± 0.02 | 0.87 ± 0.02 | 0.9 ± 0.02
ViT_S4_AD-HCMCI | Val | 0.91 ± 0 | 0.91 ± 0 | 0.91 ± 0 | 0.91 ± 0
 | Test | 0.9 ± 0.03 | 0.9 ± 0.03 | 0.9 ± 0.03 | 0.9 ± 0.03
ViT_S4_ADMCI-HC | Val | 0.87 ± 0 | 0.93 ± 0 | 0.9 ± 0 | 0.93 ± 0
 | Test | 0.85 ± 0 | 0.92 ± 0 | 0.88 ± 0 | 0.92 ± 0
OViTAD_S4_AD-HC-MCI | Val | 0.86 ± 0.06 | 0.9 ± 0.02 | 0.87 ± 0.03 | 0.9 ± 0.02
 | Test | 0.81 ± 0.01 | 0.87 ± 0.01 | 0.84 ± 0.01 | 0.87 ± 0.01
OViTAD_S4_AD-HCMCI | Val | 0.9 ± 0 | 0.89 ± 0 | 0.89 ± 0 | 0.89 ± 0
 | Test | 0.89 ± 0.02 | 0.88 ± 0.02 | 0.88 ± 0.02 | 0.88 ± 0.02
OViTAD_S4_ADMCI-HC | Val | 0.87 ± 0 | 0.93 ± 0 | 0.9 ± 0 | 0.93 ± 0
 | Test | 0.85 ± 0 | 0.92 ± 0 | 0.88 ± 0 | 0.92 ± 0
OViTAD_S4_AD-HC | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
OViTAD_S4_HC-MCI | Val | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
 | Test | 1 ± 0 | 1 ± 0 | 1 ± 0 | 1 ± 0
Table 4. The number of trainable parameters in OViTAD is 28% lower than in the vanilla vision transformer and DeepViT, while OViTAD achieves higher performance on fMRI data and performance comparable to the other models on structural MRI data.
Model | Input (Channel, x, y) | Params
CaIT | 3, 224, 224 | 120,707,075
DeepViT | 3, 224, 224 | 53,532,867
ViT-vanilla | 3, 224, 224 | 53,532,675
ViT-224-8 | 3, 224, 224 | 40,949,763
OViTAD | 3, 56, 56 | 38,406,147
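For readers who wish to verify counts like those in Table 4, the short sketch below counts trainable parameters for any PyTorch module; the vit-pytorch import and the hyperparameters of the reduced-input model are illustrative assumptions rather than the exact OViTAD configuration.

import torch
from vit_pytorch import ViT  # assumed open-source vanilla vision transformer implementation

def count_trainable_params(model: torch.nn.Module) -> int:
    # Sum the elements of every parameter that receives gradient updates.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical reduced-input variant in the spirit of OViTAD: 56 x 56 inputs with
# 3 channels shrink the patch-embedding and positional-embedding parameters
# relative to a 224 x 224 vanilla ViT.
small_vit = ViT(image_size=56, patch_size=7, num_classes=3,
                dim=512, depth=6, heads=8, mlp_dim=2048)
print(count_trainable_params(small_vit))

Applying the same counter to each candidate architecture yields figures comparable to the Params column, provided the configurations match those used in the experiments.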
Table 5. Comparison between recent studies of Alzheimer's classification using ADNI and our OViTAD. The analysis shows that our study addresses a broader range of classification tasks with novel vision transformer technology, and our model outperformed the literature. Further details are provided in Table A7.
Reference | Modality | AD vs. HC vs. MCI | AD + MCI vs. HC | AD vs. MCI + HC | AD vs. HC | MCI vs. HC
Lin et al. 2018 [100] | MRI | - | - | - | 88.79% | -
Dimitriadis et al. 2018 [101] | MRI | 61.90% | - | - | - | -
Kruthika et al. 2019 [102] | MRI | 90.47% | - | - | - | -
Spasov et al. 2019 [103] | MRI + Clinical | - | - | - | - | 86%
Basaia et al. 2019 [104] | MRI | - | - | - | 98% | 87%
Abrol et al. 2020 [105] | MRI | 83.01% | - | - | - | -
Shao et al. 2020 [106] | MRI + PET | - | - | - | 92.51% | 82.53%
Alinsaif et al. 2021 [107] | MRI | - | 70.50% | - | 62.22% | -
Alinsaif et al. 2021 [107] | MRI | - | 91.61% | - | 92.78% | -
Ramzan et al. 2019 [63] | rs-fMRI | 97.92% | - | - | - | -
Hojjati et al. 2018 [108] | MRI + rs-fMRI | - | - | 93% | - | -
Cui et al. 2019 [109] | MRI | - | - | - | 91.33% | -
Amoroso et al. 2018 [110] | MRI | 38.80% | - | - | - | -
Buvaneswari et al. 2021 [111] | rs-fMRI | - | - | - | - | 79.15%
Duc et al. 2019 [61] | rs-fMRI + Clinical | - | - | - | 85.27% | -
OViTAD—fMRI | rs-fMRI | 0.97 ± 0 | 0.98 ± 0.02 | 0.99 ± 0.02 | 0.99 ± 0.02 | 0.97 ± 0.03
OViTAD—MRI (Sigma = 3) | MRI | 0.9955 ± 0.0039 | 1 ± 0 | 0.9955 ± 0.0039 | 1 ± 0 | 1 ± 0
OViTAD—MRI (Sigma = 4) | MRI | 0.9955 ± 0.0039 | 1 ± 0 | 0.9955 ± 0.0039 | 1 ± 0 | 1 ± 0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
