Article

Artificial Cognition for Detection of Mental Disability: A Vision Transformer Approach for Alzheimer’s Disease

1
Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakakah 72388, Saudi Arabia
2
Department of Computer Science, Bahria University, Islamabad 44000, Pakistan
*
Author to whom correspondence should be addressed.
Healthcare 2023, 11(20), 2763; https://doi.org/10.3390/healthcare11202763
Submission received: 25 September 2023 / Revised: 16 October 2023 / Accepted: 17 October 2023 / Published: 18 October 2023
(This article belongs to the Section Health Assessments)

Abstract
Alzheimer’s disease is a common neurological disorder and mental disability that causes memory loss and cognitive decline, presenting a major challenge to public health due to its impact on millions of individuals worldwide. It is crucial to diagnose and treat Alzheimer’s in a timely manner to improve the quality of life of both patients and caregivers. In the recent past, machine learning techniques have shown potential in detecting Alzheimer’s disease by examining neuroimaging data, especially Magnetic Resonance Imaging (MRI). This research proposes an attention-based mechanism that employs the vision transformer approach to detect Alzheimer’s using MRI images. The presented technique applies preprocessing to the MRI images and forwards them to a vision transformer network for classification. This network is trained on the publicly available Kaggle dataset and achieved impressive results, with an accuracy of 99.06%, precision of 99.06%, recall of 99.14%, and F1-score of 99.1%. Furthermore, a comparative study is conducted to evaluate the performance of the proposed method against various state-of-the-art techniques on diverse datasets. The proposed method demonstrated superior performance, outperforming other published methods when applied to the Kaggle dataset.

1. Introduction

Alzheimer’s disease (AD) is a condition of neurodegeneration that is distinguished by the gradual deterioration in memory, cognitive skills, and reasoning capabilities. It is a leading cause of dementia in the elderly, with symptoms typically starting in mid- to late life. The underlying mechanisms of AD are not yet fully understood; however, it is hypothesized that AD arises from a blend of genetic and environmental elements. The brain of an Alzheimer’s patient experiences the loss of neurons and connections between neurons, leading to a decline in cognitive function. Presently, no definitive remedy for AD has been identified, and there are multiple treatment options available that can aid in symptom management and decelerate the progression of the condition.
The timely detection of AD holds significant importance because it can help individuals and their families prepare for the future, access treatments and support services, and plan for long-term care. However, early detection can pose challenges as the symptoms of AD can resemble those of other conditions and may even be considered a normal part of the aging process. In numerous countries, including the United States, AD is officially recognized as a disability. As per the Americans with Disabilities Act (ADA), a disability refers to a substantial impairment in one or more crucial life activities arising from a mental or physical disability, such as mobility, vision, hearing, or cognitive function. Given that AD can substantially impede a person’s capacity to think, remember, and communicate effectively—key aspects of daily life—it is regarded as a disability. This limitation can make it challenging for individuals with AD to perform tasks they were previously capable of, such as managing finances or driving. Moreover, AD has the potential to affect an individual’s ability to involve themselves in social activities, leading to social isolation and a decline in overall quality of life. Under the ADA, people with AD are entitled to reasonable accommodations in the workplace, such as modified work schedules or job duties, as well as access to public accommodations and services, such as transportation or housing. Moreover, individuals diagnosed with AD might be eligible for Social Security Disability Insurance (SSDI) or Supplemental Security Income (SSI) benefits if their condition hinders their ability to engage in gainful employment and sustain a livelihood.
The diagnostic process for AD entails a thorough assessment that encompasses a medical history and physical examination, a neurological evaluation, cognitive and psychological assessments, brain imaging studies such as Magnetic Resonance Imaging (MRI) or positron emission tomography (PET) scans, and laboratory tests to exclude other disorders with similar symptoms. Structural Magnetic Resonance Imaging (sMRI) is extensively utilized to study progressive neurological impairments. This non-invasive technique employs radio waves and strong magnetic fields to analyze anatomical changes in the brain, ensuring a high level of spatial resolution and providing a painless imaging method [1]. The field of Artificial Intelligence (AI) is progressing at a swift pace and has the capacity to transform numerous sectors. AI is having a significant impact on healthcare, particularly in the field of medical diagnosis, where it is being used to create systems that can analyze large amounts of medical data and identify patterns that may be indicative of disease. This research emphasizes the use of AI techniques to diagnose AD using brain MRIs. The novelty of the presented technique lies in its application of the vision transformer approach, originally designed for natural language processing tasks, to detect Alzheimer’s disease using MRI images. By employing an attention-based mechanism within the vision transformer network, the method showcases remarkable efficacy in accurately diagnosing Alzheimer’s. Furthermore, the paper includes a comprehensive comparative study that demonstrates the superiority of the proposed approach over various state-of-the-art techniques when applied to the Kaggle dataset.
This novel combination of the vision transformer and attention-based mechanism for Alzheimer’s detection contributes to the growing body of research in the field of neuroimaging and machine learning.
The first section of this article represents the introduction of the domain and research problem. After the introduction, this paper provides an analysis of related state-of-the-art work and identifies gaps in the research. The third section describes the methodology, including the proposed algorithms and techniques for data acquisition, preprocessing, feature extraction, and classification. The results and analysis section presents the research findings, including evaluation metrics used to assess the model’s performance and analysis of the results. Furthermore, this section compares the achieved results with other prominent studies. This paper is concluded in the last section by summarizing the main contributions and the future research directions.

2. Literature Review

The advancements achieved in AI are highly promising from a scientific standpoint [2,3,4]. The area of computer-aided diagnosis is advancing rapidly [5,6,7]. A variety of studies have examined the use of AI, machine learning, and image analysis techniques to examine brain scans and identify AD. Specifically, AI systems can be trained to identify patterns in MRI, fMRI, and other types of scans that are characteristic of AD, such as shrinkage or dilation in specific areas of the brain.

2.1. Machine Learning Methods

There are many methods in the literature that use traditional machine learning models for the detection of Alzheimer’s. Kloppel et al. [8] proposed a technique to reduce the dimensionality of input features, after which AD patients can be detected using a support vector machine (SVM) algorithm. A poly-modal discriminator for PET and MRI images employing a random forest predictor was proposed in Gray et al.’s study [9]. In another work by Neati et al. [10], a combination of SVM and kernel principal component analysis (PCA) was introduced for dimensionality reduction and extraction of information from MRI images, achieving 92.5% accuracy on the OASIS datasets.
In their study [10], researchers aimed to achieve 100% accuracy by introducing wavelet entropy and a biogeography-based optimization engine. They employed a six-fold cross-validation technique on sixty-four brain images. Another approach for improving feature extraction and selection accuracy on datasets of AD and normal control patients was developed by [11]. They utilized a gray-level co-occurrence matrix and an approach based on voxel-based morphometry. El-Dahshan et al. [12] presented two classifiers for feature representation and selection of weighted MRI images obtained from Harvard Medical School and facilitated the multi-classification of AD.
Wavelet entropy and the multilayer perceptron were employed by Wang et al. [13] to develop a new AD detection system. The technique was tested on three-dimensional volumetric data by selecting optimum slices. Li et al. [14] obtained ninety-three 3D features from MRI and PET images using ROI, and subsequently conducted PCA.

2.2. Deep Learning Methods

Deep learning-based techniques have shown promising results in detecting AD [15]. These methods typically involve using neural network algorithms to analyze medical imaging data like MRI or PET scans, and identify characteristic patterns associated with AD.
One example of a deep learning method used for AD detection is a Convolutional Neural Network (CNN), which has been used to analyze structural MRI scans and predict the presence of AD. In the realm of deep learning, scholars can devise their own framework by relying on Convolutional Neural Networks (CNNs), which have demonstrated remarkable success. Numerous refined CNN models have surfaced and offer a proficient means of detecting AD, like VGGNet, AlexNet, ResNet, DenseNet, and Inception [16].
In another study [17], a high-performance multiple sclerosis classification model was developed. AlexNet served as the foundational model, and transfer learning was applied to adapt it for the specific task of classifying multiple-sclerosis brain images. Various configurations of transfer learning were tested, involving the transfer and replacement of different numbers of layers. Another study automates the detection of pathological brain regions in Magnetic Resonance Imaging (MRI) using a deep learning structure combined with transfer learning [18]. Initially, a pre-trained AlexNet structure was obtained, and then the parameters of the last three layers were replaced with random weights, while the remaining parameters retained their pre-trained values.
Another example is the utilization of Recurrent Neural Networks (RNNs) to analyze functional MRI data and predict cognitive decline in AD patients.
In this study [19], the researchers aim to predict the clinical diagnosis, cognition, and ventricular volume of individuals based on multimodal AD markers and clinical diagnosis data from one or more time points. To achieve this, a minimal Recurrent Neural Network (minimalRNN) model was proposed and applied to longitudinal data from The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) challenge.
The researchers devise an automated classification method capable of effectively handling EEG data [20]. The study demonstrates the use of the RNN robust principal component analysis (RPCA) algorithm for AD detection. In another study [21], neuropsychological measures and MRI biomarkers are derived and subsequently fed into an RNN. The RNN utilizes the long short-term memory (LSTM) architecture, and the proposed model aims to forecast the biomarkers (feature vectors) of patients at 6, 12, 18, 21, 24, and 36 months into the future.
Ren et al. [22] proposed three classifiers—SBPCNNs with a single slice, SACNNs with simple assembly, and MSCNNs with multiple slices—to observe the model using a reduced number of slices, achieving over 90% performance in AD detection. Zhang et al. [23] presented a deep neural network framework for sMRI gray matter slicing, using an attention approach to enhance characteristic information and improve accuracy by 1–8% in comparison to contemporary techniques.
Suk et al. [24] used resting-state functional Magnetic Resonance Imaging (rs-fMRI) to select Regions of Interest (ROIs), and then trained a deep model for every ROI. They also proposed an auto-encoder to uncover hierarchical connections between different ROIs.
In [25], Shi et al. proposed a stacked denoising sparse auto-encoder that utilized an SVM for classification. Subsequently, ref. [26] employed a deep belief network composed of a layered Restricted Boltzmann Machine for AD detection, and the results demonstrated superior performance compared to the SVM. In their work, Shakeri et al. [27] employed a deep probabilistic autoencoder to acquire low-dimensional feature portrayal from hippocampal morphological variations. Another study [28] introduced a neural network architecture with multi-scale depth for the purpose of AD diagnosis. Their approach utilized the basic-level patch features extracted from PET images.
The study in [29] utilizes the vision transformer architecture to automatically detect Alzheimer’s patients from healthy controls. The vision transformer architecture is chosen for its ability to effectively capture global or long-range relationships among image features. To improve the network’s performance, frequency and image domain features are integrated since MRI data are acquired in the frequency domain before being transformed into images. Another study [30] introduces a novel approach to AD classification using a Dual-Input Convolution Encoder Network (DICE-net). The proposed method involves denoising the EEG data, followed by the extraction of band power and coherence features. These extracted features are then fed into the DICE-net, which comprises convolution, transformer encoder, and feedforward layers.
A very deep convolutional network is designed by Islam et al. [31], and the performance is demonstrated on the Open Access Series of Imaging Studies (OASIS) database.
Deep learning techniques have shown great promise in detecting AD. In most cases, they have outperformed other image analysis methods and even human experts in certain instances. However, there are a few limitations to deep learning methods for AD detection. Notwithstanding these limitations, deep learning methods have the capacity to revolutionize the diagnosis and treatment of AD.

3. Proposed Methodology

3.1. Data

The dataset for AD was obtained from the Open Access Series of Imaging Studies-1 (OASIS-1) [32,33]. OASIS is a project with the goal of providing the scientific community with open access to neuroimaging datasets of the brain. Its objective is to promote advancements in both basic and clinical neuroscience by gathering and openly sharing neuroimaging datasets, thus fostering potential breakthroughs in the field. It consists of cross-sectional MRI data from young, middle-aged, non-demented, and demented older adults. It comprises 80k MRI images categorized into four classes: Mild Demented (MID), Moderate Demented (MOD), Non-Demented (ND), and Very Mild Demented (VMD). Patient classification was performed based on the provided metadata and Clinical Dementia Rating (CDR) values; the CDR is based on the Mini-Mental State Exam (MMSE) for Alzheimer’s [34]. The dataset is partitioned in an 80:20 ratio, with 80% of the data allocated for training and the remaining 20% designated for validation. The 80:20 split validation offered a reasonable baseline to showcase the performance gain achieved by the advanced architecture compared to previous models. This setup allowed us to directly compare against prior studies, providing a benchmark for the community; many medical researchers have used this split validation approach [35,36,37,38]. The images in the dataset have a size of 176 × 208 pixels. Figure 1 shows the distribution of data in feature space for various classes.
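As a concrete illustration of the 80:20 partitioning described above, the following sketch shuffles a list of samples and cuts it at the 80% mark (the sample tuples and file names here are hypothetical stand-ins, not the actual dataset layout):

```python
import random

def split_80_20(samples, seed=0):
    """Shuffle and split samples into 80% training / 20% validation."""
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    cut = int(0.8 * len(samples))
    return samples[:cut], samples[cut:]

# Hypothetical stand-ins for (image path, class label) pairs.
samples = [(f"scan_{i:03d}.png", label)
           for i, label in enumerate(["MID", "MOD", "ND", "VMD"] * 25)]
train, val = split_80_20(samples)
```

Fixing the shuffle seed keeps the partition reproducible across runs, which matters when comparing against prior studies on the same split.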

3.2. Data Preparation

Data augmentation is a technique used to increase the size of a training dataset by creating new data points from existing ones. This can help to improve the performance of deep learning models by reducing overfitting. This research uses rotation and flipping techniques to enlarge the training data [39]. Rotation involves rotating an image by a certain angle γ to create a new image. The mathematical representation of rotation is as follows:

Img_rotated(s, t) = Img(s·cos(γ) − t·sin(γ), s·sin(γ) + t·cos(γ))

where Img(s, t) represents the original image, Img_rotated(s, t) represents the rotated image, and γ represents the angle of rotation. Flipping involves horizontally or vertically mirroring an image to create a new image. The mathematical representation of flipping is as follows:

Img_Vflipped(s, t) = Img(s, H − t)

Img_Hflipped(s, t) = Img(W − s, t)

where Img(s, t) represents the original image, Img_Vflipped(s, t) represents the vertically flipped image, and Img_Hflipped(s, t) represents the horizontally flipped image. The variables W and H denote the number of columns and rows of the image, respectively.
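A minimal numerical sketch of these two augmentations, with the rotation formula implemented via nearest-neighbour source-pixel lookup (the function names are illustrative, not from the paper):

```python
import numpy as np

def rotate(img, gamma):
    """Rotate an image by angle gamma (radians), sampling each output
    pixel from the source location given by the rotation formula,
    rounded to the nearest neighbour."""
    h, w = img.shape
    out = np.zeros_like(img)
    for s in range(h):
        for t in range(w):
            src_s = int(round(s * np.cos(gamma) - t * np.sin(gamma)))
            src_t = int(round(s * np.sin(gamma) + t * np.cos(gamma)))
            if 0 <= src_s < h and 0 <= src_t < w:
                out[s, t] = img[src_s, src_t]
    return out

def flip_vertical(img):
    """Mirror the image along the row axis."""
    return img[::-1, :]

def flip_horizontal(img):
    """Mirror the image along the column axis."""
    return img[:, ::-1]

img = np.arange(16.0).reshape(4, 4)
```

Applying either flip twice recovers the original image, and a rotation by γ = 0 is the identity, which makes both easy to sanity-check.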

3.3. Preprocessing

The input image is normalized first. Image normalization is a process of rescaling the pixel values of an image so that they lie within a certain range [40]. The purpose of normalization is to standardize the intensity values of an image to facilitate comparison and analysis across different images. Mathematically, image normalization can be represented as follows:

Img_norm = (Img − min(Img)) / (max(Img) − min(Img))

where Img is the original image, Img_norm is the normalized image, and min(Img) and max(Img) represent the lowest and highest pixel values of Img, respectively. The resulting Img_norm values lie in the range of 0 to 1. After normalization, the image is treated for bias field correction. Bias field correction is a technique used to correct for intensity inhomogeneities or non-uniformities in the MRI signal. These non-uniformities can be caused by various factors such as magnetic field inhomogeneities, patient motion, and hardware issues. Mathematically, bias field correction can be represented as follows:

I_corr = Img_norm / B

where Img_norm is the normalized image, B is the bias field estimated by the Non-Parametric Non-Uniformity Normalization (N3) algorithm [41], and I_corr is the corrected image.
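Both preprocessing steps can be sketched in a few lines. Note that the bias field B is taken here as a given array rather than estimated by the N3 algorithm, which requires a dedicated implementation:

```python
import numpy as np

def normalize(img):
    """Min-max rescaling of pixel values into [0, 1]."""
    return (img - img.min()) / (img.max() - img.min())

def bias_correct(img_norm, bias_field):
    """Divide the normalized image by a pre-estimated bias field B."""
    return img_norm / bias_field

img = np.array([[10.0, 20.0],
                [30.0, 50.0]])
img_norm = normalize(img)
bias = np.full_like(img, 2.0)    # toy stand-in for an N3 estimate
img_corr = bias_correct(img_norm, bias)
```

In practice the bias field varies smoothly across the image rather than being constant; the constant field here only demonstrates the division step.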

3.4. Feature Extraction

The proposed methodology uses transformer networks for feature extraction. Moreover, it uses the concept of transfer learning to deal with the data scarcity issue.
Given an input image Img ∈ R^(H×W×C) with a ground-truth label vector of v binary labels lab_1, lab_2, …, lab_v, where lab_i ∈ {0, 1} and v = 4 corresponds to the four classes of Alzheimer’s patients, the aim of this methodology is to learn a classification function cf that predicts the set of labels for an image, so that lab̂ = cf(Img). A vision transformer is a neural network architecture that adapts the transformer architecture, originally created for natural language processing tasks, to images [42]. In a vision transformer, the input is typically an image, and the goal is to perform tasks like object identification, segmentation, or classification [43,44]. Figure 2 shows the architectural details of the proposed methodology. The first step in the vision transformer is to convert the input image into a sequence of tokens suitable for the transformer architecture. The subsequent paragraphs explain each module of the architecture in detail. For tokenization, suppose we have an input image x of size H × W with C channels; it is first transformed into a sequence of 1D tokens of length N = H × W, with each token having a dimension of C. This is performed using an embedding layer that maps each pixel in the image to a token vector:
E_x = {e_1, e_2, …, e_N} ∈ R^(N×D)
where D is the dimensionality of the token embedding, which is a hyperparameter of the model. Each token vector e_i ∈ R^D is obtained by applying a linear projection followed by a non-linearity such as ReLU. Next, we apply a positional encoding to the linear projection of the flattened tokens to encode the spatial characteristics of the image. The positional encoding is defined as follows:
PosE(pos_i, 2j) = sin(pos_i / 10,000^(2j/D))

PosE(pos_i, 2j+1) = cos(pos_i / 10,000^(2j/D))
where pos_i is the position of the i-th token and j is the index of the dimension of the token embedding. The positional encoding is added to the token embeddings to obtain the input sequence for the transformer:

X = E_x + PosE
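The sinusoidal positional encoding above can be computed directly; a small numpy sketch, assuming an even embedding dimension D:

```python
import numpy as np

def positional_encoding(n_tokens, d_model):
    """PosE[pos, 2j] = sin(pos / 10000^(2j/d));
    PosE[pos, 2j+1] = cos(pos / 10000^(2j/d))."""
    pe = np.zeros((n_tokens, d_model))
    pos = np.arange(n_tokens)[:, None]    # token positions, column vector
    j = np.arange(0, d_model, 2)          # even dimension indices
    angles = pos / (10_000 ** (j / d_model))
    pe[:, 0::2] = np.sin(angles)          # even dims get sine
    pe[:, 1::2] = np.cos(angles)          # odd dims get cosine
    return pe

pe = positional_encoding(4, 8)
```

Each position receives a unique pattern of sinusoids at geometrically spaced frequencies, so the transformer can distinguish token positions despite its permutation-invariant attention.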
The next module is the transformer encoder. The transformer consists of a stack of L identical blocks, each containing a multi-head self-attention mechanism (MH) and a feedforward network, also referred to as a multi-layer perceptron (MLP). The attention mechanism allows the model to focus on relevant parts of the image while suppressing noise and irrelevant regions. The MH is defined by

MH(V, Q, K) = concat(hd_1, hd_2, …, hd_h) W^O

where V, K, Q ∈ R^(N×d) are the value, key, and query matrices, respectively, and d is the dimensionality of the attention space, which is typically smaller than D. hd_i = Attention(V W_i^V, Q W_i^Q, K W_i^K) ∈ R^(N×d/h) is the i-th attention head, where W_i^Q, W_i^K, and W_i^V are learnable projection matrices, and Attention(V, Q, K) = softmax(Q K^T / sqrt(d)) V is the scaled dot-product attention function. The output of the multi-head attention mechanism is concatenated and projected using a learnable matrix W^O to obtain the output of the block. The feedforward network is a two-layer MLP with a GELU activation function, defined as follows:
FFN(Img) = GELU(Img W_1 + b_1) W_2 + b_2

where W_1, W_2 ∈ R^(d×4d) and b_1, b_2 ∈ R^(4d) are learnable parameters. The output of the transformer is obtained by applying the L blocks of the multi-head self-attention mechanism and feedforward network to the input sequence X:

Y = Transformer(X) = Block_L(Block_(L−1)(… Block_1(X)))
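The scaled dot-product attention at the heart of each block is compact enough to write out directly; a single-head numpy sketch following the N × d matrix shapes above:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(V, Q, K):
    """Attention(V, Q, K) = softmax(Q K^T / sqrt(d)) V for one head."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))          # N x N attention map
    return weights @ V

rng = np.random.default_rng(0)
N, d = 5, 8
Q, K, V = rng.normal(size=(3, N, d))
out = attention(V, Q, K)
```

A useful property for checking the implementation: with all-zero queries and keys, the softmax weights become uniform, so every output row equals the mean of the value rows.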

3.5. Classification

Finally, once the dependencies between features and labels have been modeled using the transformer encoder, a classifier is employed to make the final predictions for the labels. An independent feedforward network FFN, containing two linear layers, is employed for final label assignment:

lab̂ = FFN(Y) = softmax(ReLU(Y W_1 + b_1) W_2 + b_2)
The loss function employed is categorical cross-entropy, and the optimization is performed using the Adam optimizer. In this study, a pre-trained vision transformer (ViT) model, initially trained on the ImageNet dataset, was employed as the foundation. Subsequently, the model underwent a fine-tuning process utilizing MRI images.
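A toy version of the two-layer classification head and its categorical cross-entropy loss (the weight matrices here are random stand-ins; in the paper they are learned with the Adam optimizer):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classifier_head(y, W1, b1, W2, b2):
    """Two linear layers with a ReLU in between, then softmax over classes."""
    hidden = np.maximum(0.0, y @ W1 + b1)
    return softmax(hidden @ W2 + b2)

def cross_entropy(probs, true_class):
    """Categorical cross-entropy for a single integer class label."""
    return -np.log(probs[true_class])

rng = np.random.default_rng(1)
D, H, C = 16, 8, 4                 # feature dim, hidden dim, 4 AD classes
y = rng.normal(size=D)             # toy stand-in for pooled transformer output
W1, b1 = rng.normal(size=(D, H)), np.zeros(H)
W2, b2 = rng.normal(size=(H, C)), np.zeros(C)
probs = classifier_head(y, W1, b1, W2, b2)
loss = cross_entropy(probs, 0)
```

The softmax guarantees a valid probability distribution over the four stages, and the loss is minimized when the predicted probability of the true class approaches 1.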

4. Evaluation Metrics

Various assessment metrics are employed to evaluate the proposed methodology. The list includes the confusion matrix, precision, recall, F-1 score, and micro-averaged accuracy. A confusion matrix is a tabular representation commonly employed for assessing the effectiveness of a classification model. The table displays the actual classification results against the predicted results of the model. A confusion matrix for a multiclass classification problem is similar to the binary confusion matrix, but it includes more than two classes. The confusion matrix for a multiclass problem is a square matrix, where each row corresponds to the instances predicted for a specific class, while each column represents the instances belonging to a particular actual class. The diagonal elements of the matrix indicate the number of correctly predicted samples, whereas the off-diagonal elements correspond to instances that were classified incorrectly.
For a multiclass classification problem, precision, recall, and F1-score can be defined based on the confusion matrix.
Precision quantifies the proportion of instances classified as positive that are genuinely positive. For class i, precision is defined as follows:
precision_i = TP_i / (TP_i + FP_i)
Recall measures how many of the actual positive instances are predicted as positive. For class i, recall is defined as follows:
recall_i = TP_i / (TP_i + FN_i)
TP_i refers to the count of correctly identified instances of class i (true positives), FP_i represents the count of false positives for class i, and FN_i represents the count of false negatives for class i.
The F1-score is a metric that strikes a balance between precision and recall by calculating their harmonic mean. For class i, the F1-score is defined as follows:
F1-score_i = 2 × precision_i × recall_i / (precision_i + recall_i)
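These per-class metrics follow mechanically from the confusion matrix; a small sketch using the row = predicted / column = actual convention described above (the 3-class matrix is a toy example, not the paper's results):

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, and F1 per class from a confusion matrix
    whose rows are predicted classes and columns are actual classes."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=1) - tp   # predicted as class i, actually another class
    fn = cm.sum(axis=0) - tp   # actually class i, predicted as another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy 3-class confusion matrix (rows = predicted, columns = actual).
cm = np.array([[8, 1, 0],
               [2, 7, 1],
               [0, 0, 9]])
prec, rec, f1 = per_class_metrics(cm)
```

Micro-averaged accuracy then follows as the trace of the matrix divided by its total count.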

5. Result and Analysis

Figure 3 illustrates the confusion matrix of the proposed classification model for a four-class Alzheimer’s detection problem. The first row of the matrix represents the predicted instances for the Mild-Demented class. The model predicted this class correctly 500 times and incorrectly predicted instances of other classes as Mild-Demented zero times. The second row represents the predicted instances for the Moderate-Demented class. The model predicted this class correctly 48 times and incorrectly predicted instances of other classes as Moderate-Demented zero times. The third row represents the predicted instances for the Non-Demented class. The model predicted this class correctly 580 times. The fourth row represents the predicted instances for the Very Mild-Demented class. The model predicted this class correctly 571 times and incorrectly predicted instances of other classes as Very-Mild-Demented one time. This confusion matrix shows that the model has high accuracy for predicting all the classes. The model tends to correctly predict the Mild-, Moderate-, and Non-Demented classes.
Figure 4 depicts the loss–accuracy curves for both the training and validation phases of Alzheimer’s detection. Loss curves are an important tool for monitoring the training and testing performance of machine learning models. The loss function measures how far off the predicted values are from the actual values, and the goal is to find the model that minimizes this difference.
During the training process, the model is optimized to minimize the loss function by adjusting its parameters. The loss curves track the value of the loss function over the training and validation datasets as the model is trained over multiple epochs.
The validation loss curve shows how the value of the loss function changes during training on a separate dataset, called the validation dataset. The validation dataset is employed to evaluate the model’s capabilities on data that are not part of the training dataset. As the model learns from the training dataset, a well-generalizing model should also show a decreasing loss on the validation dataset.
Figure 5 shows boxplots for different evaluation metrics of the Alzheimer’s detection process. Boxplots can provide a quick visual summary of the variability of values in a training process. They show the median, upper and lower quartiles, and minimum and maximum values.
Table 1 represents the results of Alzheimer’s detection on each stage. The model’s performance is evaluated using three common metrics for classification problems: precision, recall, and F1-score. Looking at the table, we can see that the model has excellent performance for all four classes, as indicated by high precision, recall, and F1-scores. These results suggest that the model has learned to accurately classify AD patients into their respective stages with high accuracy.
A transformer module comprises numerous heads, and each head projects the input data to unique sub-spaces in a transformer block. This enables every single head to focus on diverse portions of the image. Hence, it is logical to display each attention head map individually to understand what each head is focusing on. Figure 6 shows the attention head maps for a sample mild AD image. The top row of the headmap represents the mean of all the attention head maps, providing an overall view of the regions that the model attends to the most. The second and third rows show individual attention head maps, each corresponding to a specific attention head in the network. These individual maps reveal the model’s attention patterns for specific aspects or features in the image. The underlying motivation behind this visual interpretation is to gain insights into the model’s reasoning process and understand which regions or features are critical for detecting mild Alzheimer’s disease. By visualizing the attention head maps, researchers can identify regions in the image that receive higher attention weights, indicating their importance in making the final prediction.
An ablation study is a technique employed to evaluate the significance of various features or components within a model. By selectively altering or eliminating specific elements of the model, researchers can examine how these modifications affect the model’s performance. Through this process, valuable insights can be obtained regarding the relative importance or influence of each individual factor. Different factors are analyzed for the optimized results of the presented technique. Figure 7 shows the results of different choices. Various learning rates are examined during the training phase of the proposed method. The learning rate is denoted as l r , with l r taking values from the set { 0.001 , 0.01 , 0.005 , 0.05 , 0.0001 , 0.0005 } . An l r of 0.0001 gives the best results. Two classifiers, namely softmax and support vector machine (SVM), are also compared. Softmax gives slightly better results. Feature fusion can be performed in different ways in the transformer architecture. We employed three different ways of feature fusion, namely addition, multiplication, and concatenation. Feature fusion using concatenation outperformed the other two methods.
The comparison table (Table 2) represents the performance evaluation of different algorithms for AD detection, using precision, recall, and F1-score as criteria. All the reported results are on the same dataset. The table shows the outcomes of five different methods, including four previously published methods and the proposed method. The proposed method outperforms the rest of the techniques in terms of accuracy, precision, recall, and F1-score, attaining a precision of 99.99%, recall of 99.99%, and F1-score of 99.99%. This suggests that the proposed method has learned to classify AD with high accuracy and precision and can identify a high proportion of true positive instances while minimizing the number of false negatives.
The proposed deep learning model demonstrates remarkable performance in detecting Alzheimer’s disease using MRI images. The confusion matrix analysis reveals high accuracy in predicting the Demented and Non-Demented classes. The loss-accuracy curves exhibit effective training and validation performance, showcasing the model’s ability to generalize well. Moreover, the attention head maps provide insights into the regions of focus for the vision transformer during classification. The ablation study highlights the importance of various choices for optimizing the proposed technique. Comparison with other methods demonstrates the superiority of the proposed approach in Alzheimer’s detection on the chosen dataset. However, the model’s performance may vary depending on the dataset and modality employed, warranting further research for broader applicability.

6. Limitations and Future Recommendations

This research demonstrates promising results in an academic setting, but the absence of clinical validation and real-world testing hinders the practical applicability of the proposed method in clinical diagnosis and treatment. Moreover, the research relies on a single publicly available Kaggle dataset, which might introduce dataset-specific biases and limit the generalizability of the proposed model to diverse populations or different acquisition protocols.
To validate the proposed method’s generalizability, future research should evaluate the model on multiple independent datasets with varied demographics and imaging protocols, spanning different populations and geographical regions. Conducting clinical trials and collaborating with medical experts for expert validation would strengthen the proposed method’s credibility and potential adoption in real clinical settings. Other AI and machine learning techniques can also be applied to this problem [46,47]. Incorporating explainable AI techniques can enhance the model’s interpretability, allowing clinicians to understand the features contributing to the diagnosis and fostering trust in the AI system.

7. Conclusions

The proposed approach offers an AD detection method that utilizes a vision transformer architecture with MRI images. The methodology includes preprocessing the MRI images and inputting them into the vision transformer network for classification. By leveraging self-attention mechanisms, the network is trained on a Kaggle dataset of MRI images to learn discriminative features for AD detection.
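The first step of the vision transformer pipeline described above, splitting an MRI slice into non-overlapping patch tokens and projecting them alongside a [CLS] token, can be sketched as follows. The 224 x 224 input size, 16 x 16 patch size, and 64-dimensional embedding are illustrative assumptions, not values stated in the paper:

```python
import numpy as np

def patchify(img, patch=16):
    """Split a square single-channel image into flattened,
    non-overlapping patches: the ViT tokenization step."""
    h, w = img.shape
    patches = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            patches.append(img[i:i + patch, j:j + patch].reshape(-1))
    return np.stack(patches)

# A hypothetical 224x224 MRI slice (zeros as a stand-in).
img = np.zeros((224, 224))
tokens = patchify(img)
print(tokens.shape)  # (196, 256): 14x14 patches of 16x16 pixels

# Linear patch projection plus a prepended [CLS] token, whose final
# representation the classifier head reads.
rng = np.random.default_rng(0)
d_model = 64
W = rng.normal(size=(tokens.shape[1], d_model)) * 0.02
cls = np.zeros((1, d_model))
seq = np.vstack([cls, tokens @ W])
print(seq.shape)  # (197, 64)
```

The resulting 197-token sequence is what the self-attention layers operate on; the discriminative features for AD detection are learned in those layers.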
The experimental findings illustrate the remarkable effectiveness of the proposed approach in AD detection using MRI, surpassing other published methods on the Kaggle dataset. However, it is crucial to acknowledge that performance may vary depending on the specific dataset and imaging modality utilized. As a result, future research endeavors should concentrate on devising effective techniques for AD detection encompassing diverse datasets and modalities. Furthermore, several alternative methods have undergone evaluation on different datasets and modalities, in addition to the proposed approach. These methods include DEMNET, EfficientNetV2B1, InceptionResnetV2, InceptionV3, and Acharya. These methods achieved varying levels of accuracy, precision, recall, and F1-score, depending on the dataset and modality used. An area that holds promise for future investigation is the exploration of multimodal approaches that integrate distinct modalities, such as MRI and fMRI, to enhance AD detection capabilities. Another area for future research is the development of methods that can handle imbalanced datasets, which is a common issue in AD detection due to the large number of healthy controls compared to patients. Moreover, research can be conducted to identify the most informative regions of the brain for AD detection, which can help in developing more accurate and efficient methods. Additionally, methods can be developed that can track disease progression and predict the onset of AD.
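One standard remedy for the class imbalance raised above is to weight the loss inversely to class frequency. The sketch below uses hypothetical per-class counts for the four categories of Table 1 (ND, VMD, MID, MOD), not the actual dataset statistics:

```python
import numpy as np

def inverse_frequency_weights(counts):
    """Class weights inversely proportional to class frequency,
    normalized so that a balanced dataset yields weights of 1."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

# Hypothetical sample counts for ND, VMD, MID, MOD.
counts = [3200, 2240, 896, 64]
weights = inverse_frequency_weights(counts)
print(np.round(weights, 3))
```

Passed to a weighted cross-entropy loss, such weights make errors on the rare Moderate-Demented class cost far more than errors on the abundant Non-Demented class, counteracting the imbalance during training.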

Author Contributions

Conceptualization, M.F.A. and S.T.; formal analysis, S.T. and S.K.; funding acquisition, M.F.A.; investigation, M.F.A.; project administration, M.H.; supervision, S.T.; validation, S.T. and S.K.; writing—original draft, S.T. and S.K.; writing—review and editing, M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This paper received funding from the King Salman Center for Disability Research through Research Group no. KSRG-2023-085.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors extend their appreciation to the King Salman Center for Disability Research for funding this work through Research Group no. KSRG-2023-085.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tomassini, S.; Sbrollini, A.; Covella, G.; Sernani, P.; Falcionelli, N.; Müller, H.; Morettini, M.; Burattini, L.; Dragoni, A.F. Brain-on-Cloud for automatic diagnosis of Alzheimer’s disease from 3D structural magnetic resonance whole-brain scans. Comput. Methods Programs Biomed. 2022, 227, 107191. [Google Scholar] [CrossRef]
  2. Hamza, M.; Tehsin, S.; Humayun, M.; Almufareh, M.F.; Alfayad, M. A Comprehensive Review of Face Morph Generation and Detection of Fraudulent Identities. Appl. Sci. 2022, 12, 12545. [Google Scholar] [CrossRef]
  3. Tehsin, S.; Kausar, S.; Jameel, A.; Humayun, M.; Almofarreh, D.K. Satellite Image Categorization Using Scalable Deep Learning. Appl. Sci. 2023, 13, 5108. [Google Scholar] [CrossRef]
  4. Alrayes, F.S.; Asiri, M.M.; Maashi, M.S.; Nour, M.K.; Rizwanullah, M.; Osman, A.E.; Drar, S.; Zamani, A.S. Waste classification using vision transformer based on multilayer hybrid convolution neural network. Urban Clim. 2023, 49, 101483. [Google Scholar] [CrossRef]
  5. Almufareh, M.F.; Tehsin, S.; Humayun, M.; Kausar, S. A Transfer Learning Approach for Clinical Detection Support of Monkeypox Skin Lesions. Diagnostics 2023, 13, 1503. [Google Scholar] [CrossRef] [PubMed]
  6. Loh, H.W.; Ooi, C.P.; Seoni, S.; Barua, P.D.; Molinari, F.; Acharya, U.R. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Comput. Methods Programs Biomed. 2022, 226, 107161. [Google Scholar] [CrossRef]
  7. Dicuonzo, G.; Donofrio, F.; Fusco, A.; Shini, M. Healthcare system: Moving forward with artificial intelligence. Technovation 2023, 120, 102510. [Google Scholar] [CrossRef]
  8. Klöppel, S.; Stonnington, C.M.; Chu, C.; Draganski, B.; Scahill, R.I.; Rohrer, J.D.; Fox, N.C.; Jack, C.R.; Ashburner, J.; Frackowiak, R.S.J. Automatic classification of MR scans in Alzheimer’s disease. Brain 2008, 131, 681–689. [Google Scholar] [CrossRef]
  9. Gray, K.R.; Aljabar, P.; Heckemann, R.A.; Hammers, A.; Rueckert, D. Random forest-based similarity measures for multi-modal classification of Alzheimer’s disease. NeuroImage 2013, 65, 167–175. [Google Scholar] [CrossRef]
  10. Neati, S.; Abdellafou, K.B.; Jael, I.; Taouali, O.; Bouzrara, K. An improved machine learning technique based on downsized KPCA for Alzheimer’s disease classification. Int. J. Imaging Syst. Technol. 2019, 29, 121–131. [Google Scholar]
  11. Ahmed, O.B.; Benois-Pineau, J.; Allard, M.; Amar, C.B.; Catheline, G. Classification of Alzheimer’s disease subjects from MRI using hippocampal visual features. Multimed. Tools Appl. 2014, 74, 1249–1266. [Google Scholar] [CrossRef]
  12. El-Dahshan, E.S.A.; Hosny, T.; Salem, A.B.M. Hybrid intelligent techniques for MRI brain images classification. Digit. Signal Process. 2010, 20, 433–441. [Google Scholar] [CrossRef]
  13. Wang, S.-H.; Zhang, Y.; Li, Y.-J.; Jia, W.-J.; Liu, F.-Y.; Yang, M.-M.; Zhang, Y.-D. Single slice based detection for Alzheimer’s disease via wavelet entropy and multilayer perceptron trained by biogeography-based optimization. Multimed. Tools Appl. 2018, 77, 10393–10417. [Google Scholar] [CrossRef]
  14. Li, F.; Tran, L.; Thung, K.H.; Ji, S.; Shen, D.; Li, J. A robust deep model for improved classification of AD/MCI patients. IEEE J. Biomed. Health Inform. 2015, 19, 1610–1616. [Google Scholar] [CrossRef] [PubMed]
  15. Böhle, M.; Eitel, F.; Weygandt, M.; Ritter, K. Layer-wise relevance propagation for explaining deep neural network decisions in MRI-based Alzheimer’s disease classification. Front. Aging Neurosci. 2019, 11, 194. [Google Scholar] [CrossRef] [PubMed]
  16. Yang, K.; Mohammed, E.A. A review of artificial intelligence technologies for early prediction of Alzheimer’s disease. arXiv 2020, arXiv:2101.01781. [Google Scholar]
  17. Govindaraj, V.V. High performance multiple sclerosis classification by data augmentation and AlexNet transfer learning model. J. Med. Imaging Health Inform. 2019, 9, 2012–2021. [Google Scholar]
  18. Lu, S. Pathological brain detection based on AlexNet and transfer learning. J. Comput. Sci. 2019, 30, 41–47. [Google Scholar] [CrossRef]
  19. Nguyen, M.; He, T.; An, L.; Alexander, D.C.; Feng, J.; Yeo, B.T.T.; Alzheimer’s Disease Neuroimaging Initiative. Predicting Alzheimer’s disease progression using deep recurrent neural networks. Neuroimage 2020, 222, 117203. [Google Scholar] [CrossRef] [PubMed]
  20. Aless, R.M.; Biagetti, G.; Crippa, P.; Falaschetti, L.; Luzzi, S.; Turchetti, C. EEG-Based Alzheimer’s disease recognition using robust-PCA and LSTM recurrent neural network. Sensors 2022, 22, 3696. [Google Scholar]
  21. Aqeel, A.; Hassan, A.; Khan, M.A.; Rehman, S.; Tariq, U.; Kadry, S.; Majumdar, A.; Thinnukool, O. A long short-term memory biomarker-based prediction framework for Alzheimer’s disease. Sensors 2022, 22, 1475. [Google Scholar] [CrossRef]
  22. Ren, F.; Yang, C.; Qiu, Q.; Zeng, N.; Cai, C.; Hou, C.; Zou, Q. Exploiting discriminative regions of brain slices based on 2D CNNs for Alzheimer’s disease classification. IEEE Access 2019, 7, 181423–181433. [Google Scholar]
  23. Zhang, Y.; Teng, Q.; Liu, Y.; Liu, Y.; He, X. Diagnosis of Alzheimer’s disease based on regional attention with sMRI gray matter slices. J. Neurosci. Methods 2021, 365, 109376. [Google Scholar] [CrossRef]
  24. Suk, H.I.; Wee, C.Y.; Lee, S.W.; Shen, D. State-space model with deep learning for functional dynamics estimation in resting-state fMRI. NeuroImage 2016, 129, 292–307. [Google Scholar] [CrossRef] [PubMed]
  25. Shi, B.; Chen, Y.; Zhang, P.; Smith, C.D.; Liu, J.; Neuroimaging, I.A.D. Nonlinear feature transformation and deep fusion for Alzheimer’s Disease staging analysis. Pattern Recognit. 2017, 63, 487–498. [Google Scholar] [CrossRef]
  26. Faturrahman, M.; Wasito, I.; Hanifah, N.; Mufidah, R. Structural MRI classification for Alzheimer’s disease detection using deep belief network. In Proceedings of the 11th International Conference on Information & Communication Technology and System (ICTS), Surabaya, Indonesia, 31 October 2017; pp. 37–42. [Google Scholar]
  27. Shakeri, M.; Lombaert, H.; Tripathi, S.; Kadoury, S.; Initiative, A.S.D.N. Deep spectral-based shape features for Alzheimer’s disease classification. In Proceedings of the International Workshop on Spectral and Shape Analysis in Medical Imaging, Athens, Greece, 21 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 15–24. [Google Scholar]
  28. Lu, D.; Popuri, K.; Ding, G.W.; Balachandar, R.; Beg, M.F.; Initiative, A.S.D.N. Multiscale deep neural network based analysis of FDG-PET images for the early diagnosis of Alzheimer’s disease. Med. Image Anal. 2018, 46, 26–34. [Google Scholar] [CrossRef] [PubMed]
  29. Kushol, R.; Masoumzadeh, A.; Huo, D.; Kalra, S.; Yang, Y.H. Addformer: Alzheimer’s disease detection from structural mri using fusion transformer. In Proceedings of the 2022 IEEE 19th International Symposium On Biomedical Imaging (ISBI), Kolkata, India, 28–31 March 2022; IEEE: Piscataway Township, NJ, USA, 2022; pp. 1–5. [Google Scholar]
  30. Miltiadous, A.; Gionanidis, E.; Tzimourta, K.D.; Giannakeas, N.; Tzallas, A.T. DICE-net: A Novel Convolution-Transformer Architecture for Alzheimer Detection in EEG Signals. IEEE Access 2023, 11, 71840–71858. [Google Scholar] [CrossRef]
  31. Islam, J.; Zhang, Y. A novel deep learning based multi-class classification method for Alzheimer’s disease detection using brain MRI data. In Proceedings of the Brain Informatics: International Conference, BI 2017, Beijing, China, 16–18 November 2017; Springer International Publishing: Berlin/Heidelberg, Germany, 2017; pp. 213–222. [Google Scholar]
  32. OASIS Alzheimer’s Detection. Available online: https://www.kaggle.com/datasets/ninadaithal/imagesoasis (accessed on 10 June 2023).
  33. OASIS-1: Cross-Sectional: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382. Available online: https://www.oasis-brains.org/ (accessed on 10 June 2023).
  34. The Mini-Mental State Exam for Alzheimer’s. Available online: https://www.verywellhealth.com/mini-mental-state-exam-as-an-alzheimers-screening-test-98623 (accessed on 25 May 2023).
  35. Saba, T.; Khan, S.U.; Islam, N.; Abbas, N.; Rehman, A.; Javaid, N.; Anjum, A. Cloud-based decision support system for the detection and classification of malignant cells in breast cancer using breast cytology images. Microsc. Res. Tech. 2019, 82, 775–785. [Google Scholar] [CrossRef]
  36. Ayadi, M.; Ksibi, A.; Al-Rasheed, A.; Soufiene, B.O. COVID-AleXception: A deep learning model based on a deep feature concatenation approach for the detection of COVID-19 from chest X-ray images. Healthcare 2022, 10, 2072. [Google Scholar] [CrossRef] [PubMed]
  37. Kothadiya, D.; Rehman, A.; Abbas, S.; Alamri, F.S.; Saba, T. Attention-based deep learning framework to recognize diabetes disease from cellular retinal images. Biochem. Cell Biol. 2023. ahead of print. [Google Scholar] [CrossRef] [PubMed]
  38. Hern, E.-G.A.; Santana-Perez, I.; Arteaga-Marrero, N.; Fabelo, H.; Callico, G.M.; Ruiz-Alzola, J. Performance Evaluation of Deep Learning Models for Image Classification Over Small Datasets: Diabetic Foot Case Study. IEEE Access 2022, 10, 124373–124386. [Google Scholar]
  39. Hussain, Z.; Gimenez, F.; Yi, D.; Rubin, D. Differential data augmentation techniques for medical imaging classification tasks. AMIA Annu. Symp. Proc. 2017, 2017, 979. [Google Scholar] [PubMed]
  40. Shah, M.; Xiao, Y.; Subbanna, N.; Francis, S.; Arnold, D.L.; Collins, D.L.; Arbel, T. Evaluating intensity normalization on MRIs of human brain with multiple sclerosis. Med Image Anal. 2011, 15, 267–282. [Google Scholar] [CrossRef] [PubMed]
  41. Boyes, R.G.; Gunter, J.L.; Frost, C.; Janke, A.L.; Yeatman, T.; Hill, D.L.G.; Bernstein, M.A.; Thompson, P.M.; Weiner, M.W.; Schuff, N.; et al. Intensity non-uniformity correction using N3 on 3-T scanners with multichannel phased array coils. Neuroimage 2008, 39, 1752–1762. [Google Scholar] [CrossRef] [PubMed]
  42. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 1–15. [Google Scholar]
  43. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  44. Tolstikhin, I.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. Mlp-mixer: An all-mlp architecture for vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272. [Google Scholar]
  45. Kabir, A.; Kabir, F.; Mahmud, M.A.H.; Sinthia, S.A.; Azam, S.R.; Hussain, E.; Parvez, M.Z. Multi-Classification based Alzheimer’s Disease Detection with Comparative Analysis from Brain MRI Scans using Deep Learning. In Proceedings of the TENCON 2021–2021 IEEE Region 10 Conference (TENCON), Auckland, New Zealand, 7–10 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 905–910. [Google Scholar]
  46. Qureshi, S.A.; Raza, S.E.A.; Hussain, L.; Malibari, A.A.; Nour, M.K.; Rehman, A.U.; Al-Wesabi, F.N.; Hilal, A.M. Intelligent ultra-light deep learning model for multi-class brain tumor detection. Appl. Sci. 2022, 12, 3715. [Google Scholar] [CrossRef]
  47. Qureshi, S.A.; Hussain, L.; Ibrar, U.; Alabdulkreem, E.; Nour, M.K.; Alqahtani, M.S.; Nafie, F.M.; Mohamed, A.; Mohammed, G.P.; Duong, T.Q. Radiogenomic classification for MGMT promoter methylation status using multi-omics fused feature space for least invasive diagnosis through mpMRI scans. Sci. Rep. 2023, 13, 3291. [Google Scholar] [CrossRef]
Figure 1. T-SNE visualization of the dataset.
Figure 2. Diagrammatic illustration of proposed feature extraction and classification method.
Figure 3. Confusion matrix for Alzheimer’s classification.
Figure 4. Curves depicting the loss and accuracy during the training and validation stages. (a) Accuracy curves. (b) Loss curves.
Figure 5. Boxplot for different evaluation metrics.
Figure 6. Attention head map visualization.
Figure 7. Ablation study results for different parameters. (a) Learning rate. (b) Classifiers. (c) Feature fusion.
Table 1. Evaluation metrics for each specific class.
Disease    Precision%    Recall%    F1-Score%
ND         100           100        100
VMD        99.99         100        99.99
MID        100           100        100
MOD        100           100        100
Table 2. Performance comparison on Kaggle dataset.
Methodology                 Precision%    Recall%    F-1%
Kabir et al. (2021) [45]    92.78         90.78      0.94
EfficientNetV2B1            90.37         89.76      90.06
InceptionResnetV2           97.4          94.76      95.80
InceptionV3                 98.13         97.72      98.05
Proposed                    99.99         99.99      99.99
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Almufareh, M.F.; Tehsin, S.; Humayun, M.; Kausar, S. Artificial Cognition for Detection of Mental Disability: A Vision Transformer Approach for Alzheimer’s Disease. Healthcare 2023, 11, 2763. https://doi.org/10.3390/healthcare11202763

