A Novel Framework for Classification of Different Alzheimer's Disease Stages Using CNN Model

Background: Alzheimer's, the predominant form of dementia, is a neurodegenerative brain disorder with no known cure. With the lack of innovative findings to diagnose and treat Alzheimer's, the number of middle-aged people with dementia is estimated to rise to nearly 13 million by the end of 2050. The estimated cost of Alzheimer's and other related ailments is USD 321 billion in 2022 and can rise above USD 1 trillion by the end of 2050. Therefore, the early prediction of such diseases using computer-aided systems is a topic of considerable interest and substantial study among scholars. The major objective is to develop a comprehensive framework for the early detection and categorization of the different phases of Alzheimer's. Methods: The experimental work of this novel approach is performed by implementing convolutional neural networks (CNN) on MRI image datasets. Five classes of Alzheimer's disease subjects are classified in a multi-class setting. We used transfer learning to reap the benefits of pre-trained classification models such as MobileNet. Results: Various performance metrics are used for the evaluation and comparison of the proposed model. The test results reveal that the CNN architectures have the following characteristics: appropriately simple structures that mitigate computational burden, memory usage, and overfitting, as well as manageable running time. The MobileNet pre-trained model has been fine-tuned and has achieved 96.6 percent accuracy for multi-class AD stage classification. Other models, such as VGG16 and ResNet50, were applied to the same dataset while conducting this research, and this model was found to yield better results than the other models. Conclusion: The study develops a novel framework for the identification of different AD stages. The main advantage of this novel approach is the creation of lightweight neural networks. The MobileNet model is mostly used for mobile applications and has rarely been used for medical image analysis; hence, we implemented this model for disease detection, and it yielded better results than existing models.


Introduction
Alzheimer's is a condition of the brain's central part that causes gradual memory loss, cognitive impairment, and emotional distress. Around 46.8 million individuals worldwide have dementia, with Alzheimer's disease accounting for 60-70% of cases and costing more 1.
AD or common variants of AD [16].
Cognitive assessments and imaging examinations, such as neuroimaging, are used to rule out alternative causes of memory impairment, such as tumors [17]. When integrated with these approaches and professional clinical expertise, medical criteria have an 80 percent good prognosis and a 60 percent diagnostic accuracy for clinical diagnosis. Recent imaging advances, such as magnetic resonance imaging (MRI) [18], PET scans [19,20], and single-photon emission computed tomography (SPECT) [21], have enabled the tracking of neurodegeneration, malformations in neuronal development, and edema.
The number of AD patients is expected to rise substantially, necessitating the use of a computer-aided diagnosis (CAD) system for early and precise AD diagnosis [22]. Furthermore, mild cognitive impairment (MCI) is an interim phase between normal cognition and dementia. According to a previous study [23], MCI participants advance to clinical AD at a rate of 10-15 percent yearly. In recent years, research on detecting MCI patients who will develop clinical dementia has received much attention. Recognizing the conversion from one stage to another is vital to identifying the different stages of Alzheimer's disease.
The primary emphasis of this study is on identifying different stages of AD based on an image dataset. Deep learning techniques are frequently employed for time series classification, image identification, and multidimensional data processing [24]. These methods are widely applied to neuroimaging data to identify Alzheimer's disease (AD) [25]. Potential genetic indicators of Alzheimer's disease have also been explored using these methods [26].
In order to diagnose Alzheimer's, Zhang et al. [27] combined neuroimaging data with clinical and neuropsychological evaluations using a multimodal deep learning model. The idea put forth by Spasov et al. [28] emphasizes the significance of deep learning designs in patients at risk of AD in stopping the progression of mild cognitive impairment. Deep learning-based models also use extensive genomic and DNA methylation data to anticipate AD, mitigating the symptoms of usual neurodegeneration (falls, memory loss, etc.).
Our work advances these approaches (neuroimaging and clinical analysis) by applying a technique that uses an ensemble of CNN models to identify the different stages of AD. Different CNN models are applied to the same dataset and their results are compared; the MobileNet model is found to be efficient for medical image analysis. The rest of this paper is organized as follows. Section 2 discusses previous findings on diagnosing Alzheimer's disease. Section 3 describes our CNN-based method for determining the stage of Alzheimer's disease based on MobileNet and ImageNet weights. Section 4 presents the experimental findings of our model. Section 5 discusses the findings, and Section 6 concludes the paper.

Related Work
AD detection has been extensively investigated and encompasses many problems and complexities [29]. Payan et al. [30] used a sparse autoencoder and 3D convolutional neural networks. They developed an algorithm that analyses a brain MRI scan to determine a person's disease status. The primary innovation was the use of 3D convolutions, which outperformed 2D convolutions. The autoencoder was used to pre-train the convolutional layer, but it was not fine-tuned. Performance is expected to improve with fine-tuning [24].
Sarraf et al. [31] distinguished AD brains from NC brains (binary classification) using a widely used CNN architecture, LeNet-5. The work presented in [30] was extended by Hosseini et al. [32], who used a deeply supervised adaptive 3D-CNN (DSA-3D-CNN) classifier to predict AD. Three-layered 3D convolutional autoencoder (3D-CAE) architectures were pre-trained on a CAD-Dementia dataset without any skull-stripping pre-processing. Performance was analyzed using ten-fold cross-validation.
Gupta et al. [33] devised a sparse autoencoder model for the categorization of Alzheimer's disease (AD), mild cognitive impairment (MCI), and healthy controls (HC). Payan et al. [34] used sparse autoencoders and a CNN architecture to diagnose Alzheimer's. They also devised a two-dimensional CNN model that performed similarly. Brosch et al. [34] employed a deep belief network model with manifold learning to diagnose Alzheimer's disease in MRI images, etc. [35][36][37][38].
Liu and Shen [39] used unsupervised and supervised techniques to develop a deep learning model that categorized AD and MCI patients. Korolev et al. [40] showed that a corresponding finding might be accomplished. When the control neural network and basic 3D CNN designs were deployed to three-dimensional MRIs, the outcomes revealed that the depth and complexity of the two networks were remarkably similar. They did not perform as well as originally anticipated.
Sarraf and Tofghi [41] employed functional MRI data and the deep LeNet model for AD diagnosis. Suk et al. [42][43][44][45] used multiple complex SVM kernels for classification in an autoencoder network-based model for AD diagnosis. They used a multi-kernel classifier to classify small-to semi-characteristics extracted from magnetic current imaging, MCI-converter structural MRI, and PET data.
Wang et al. [46] developed a novel CNN approach that is based on a multimodal MRI analysis approach that involves diffusion tensor images or functional brain imaging data. Patients with Alzheimer's ailment, dementia, and other related conditions were classified utilizing the framework. Despite the excellent classification accuracy, it is anticipated that employing 3D convolution rather than 2D convolution would enhance efficiency. A 3D multi-scale CNN (3DMSCNN) model was devised by Ge et al. [47]. The 3DMSCNN was a novel architecture for AD diagnosis. They also devised a multi-scale feature augmentation technique as well as a feature fusion.
Song et al. [48] postulated a Graph Convolutional Neural Network (GCNN) classifier based on graph techniques. By using structural connection graphs, which indicate a multi-class model, training and architecture evaluation divides the AD spectrum into four categories. Xu Y et al. [49] proposed a medical image segmentation method based on multi-dimensional statistical features. The main purpose of this paper is to integrate CNN and transformers to detect and diagnose brain tumors. This research has efficient results. Based on such techniques, this research can be further enhanced by integrating both data modalities and models to detect and diagnose brain disorders. We are also currently working on the detection and diagnosis of brain disorders using different data modalities.

Problem Description and Solution Strategy
As highlighted in Section 2, numerous paradigms encompassing AD prognosis and clinical image assessment have recently been presented in the literature. However, most do not use transfer learning algorithms, multi-class clinical object detection, or an Alzheimer's disease monitoring cloud service to assess the distinct phases of AD and provide remote guidance. These issues have received insufficient attention in the literature. Thus, relative to the other cutting-edge techniques discussed in Section 2, the novelties of this research are as follows. A novel framework is devised to identify the various phases of Alzheimer's disease and to classify medical images. The suggested method relies on CNN architectures for structural MRI images of the brain. Transfer learning is utilized to leverage already trained architectures, such as VGG19, ResNet50, and DenseNet121.
Class imbalance and limited dataset size are the most problematic aspects of medical image analysis. Resampling techniques are employed to balance the dataset, while data augmentation methods are utilized to enlarge the dataset and mitigate overfitting. According to the performance indicators, the experimental results are promising.

Methods and Materials
The early diagnosis of Alzheimer's ailment is critical for precluding and managing its progression. The data used in this research are taken from ADNI (Alzheimer's Disease Neuroimaging Initiative). The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a longitudinal study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer's disease (AD). This initiative started in 2004 and is supported by multiple companies. Complete 1-year data were taken from ADNI to design this novel research. The main purpose of this research is to design a novel approach for the initial identification and tracking of Alzheimer's stages. The proposed framework workflow, data preparation algorithms, and medical image classification techniques are thoroughly explained below.

The Proposed Framework
The proposed approach consists of the four steps described below.
Step1. Data Attainment: This approach uses the ADNI dataset in the T2w MRI format. It offers medical images in JPEG format in axial, coronal, and sagittal views. The dataset comprises data from 300 subjects divided into five classes: Alzheimer's disease (AD), mild cognitive impairment (MCI), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI), and normal control (NC). A patient with LMCI symptoms is more likely to progress to AD than an EMCI subject because the patient has undergone severe neuron damage at this stage. The total number of images available is 1101: the AD class comprises 145 images, EMCI 204 images, LMCI 61 images, MCI 198 images, and NC 493 images. We resized all the images to 224 × 224 pixels with three channels (RGB). A batch of 32 images was passed at each training iteration to reduce the computational load (height = 224, width = 224, channels = 3, batch size = 32).
Step2. Pre-processing: The data are unbalanced, and training on an unbalanced dataset leads to underfitting or overfitting, so that in the end the model would not be able to classify the images correctly. The solution is to balance the data, for which we use upsampling, in which the number of images in the classes with fewer images is increased. After resampling, every class contains 580 MRI images, and thus the entire dataset comprises 2900 images. The data are refined, standardized, scaled, denoised, and formatted appropriately. In the future, other techniques, such as downsampling, will be used. Figure 1 illustrates the resampling technique applied to the MRI images.
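The upsampling step can be illustrated with a minimal sketch. The class counts and the target of 580 images per class come from the text; the choice of random duplication with replacement, the function name `upsample_indices`, and the seed are assumptions made only for illustration, since the paper does not state exactly how the extra images were produced.

```python
import numpy as np

# Class counts reported in the text (1101 images in total).
CLASS_COUNTS = {"AD": 145, "EMCI": 204, "LMCI": 61, "MCI": 198, "NC": 493}
TARGET_PER_CLASS = 580  # per-class size after resampling, as reported


def upsample_indices(labels, target, seed=0):
    """Return sample indices that bring every class up to `target` items.

    All original samples are kept; the shortfall is drawn at random,
    with replacement, from the same class (an assumed strategy).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        if len(idx) > target:
            raise ValueError(f"class {cls} already exceeds the target size")
        extra = rng.choice(idx, size=target - len(idx), replace=True)
        picked.append(np.concatenate([idx, extra]))
    return np.concatenate(picked)


# Build a label vector matching the reported class counts.
labels = np.concatenate([np.full(n, name) for name, n in CLASS_COUNTS.items()])
balanced = upsample_indices(labels, TARGET_PER_CLASS)
balanced_labels = labels[balanced]
print(len(labels), len(balanced))  # 1101 2900
```

In practice the returned indices would be used to select image files, so that the duplicated images can later be varied by augmentation.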
Step3. Data Augmentation: The primary objectives of employing data augmentation methods are to (1) expand the data size and (2) mitigate overfitting. Data augmentation is applied as follows.
The input images are pre-processed using the pre-processing function of the pre-trained model, followed by horizontal flipping, rotation by 5 degrees, and width and height shifts of the images.
We used the Keras image data generator API to apply the data augmentation. Some images are rotated by 5 degrees and some are flipped, as shown in Figure 2.
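The augmentation operations named above can be sketched in plain NumPy. This is not the Keras image data generator used in the study; it is a simplified stand-in that shows only horizontal flipping and pixel shifts. The 5-degree rotation is omitted to keep the example dependency-free, and the 10% maximum shift, the flip probability of 0.5, and all function names are illustrative assumptions.

```python
import numpy as np


def horizontal_flip(img):
    """Mirror an H x W x C image left to right."""
    return img[:, ::-1, :]


def shift(img, dy, dx):
    """Shift an H x W x C image by (dy, dx) pixels, zero-filling exposed pixels."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    # Region of the source image that stays visible after the shift.
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out


def random_augment(img, rng, max_shift=0.1, flip_prob=0.5):
    """Apply a random flip and a random shift (assumed ranges, for illustration)."""
    h, w = img.shape[:2]
    if rng.random() < flip_prob:
        img = horizontal_flip(img)
    dy = int(rng.integers(-int(max_shift * h), int(max_shift * h) + 1))
    dx = int(rng.integers(-int(max_shift * w), int(max_shift * w) + 1))
    return shift(img, dy, dx)


rng = np.random.default_rng(0)
image = np.ones((224, 224, 3), dtype=np.float32)  # placeholder for an MRI slice
augmented = random_augment(image, rng)
print(augmented.shape)  # (224, 224, 3)
```

Because each call draws new random parameters, repeated passes over the same image produce different variants, which is what lets augmentation counteract the duplication introduced by upsampling.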
As a result, the dataset expands to 2900 images, with 580 images in each class. After that, the balanced dataset of 2900 MRI scans is reorganized and randomly split into training, validation, and test groups, with an 80:10:10 split ratio for each class. The division of the data into training, validation, and test groups for 5-way classification is summarized in the accompanying table.
Step4. Pre-processing Techniques:
a. Data normalization: Data normalization is beneficial for removing redundancies from the datasets, such as varied contrasts and varied subject poses, which simplifies the detection of subtle differences. It can rescale the attributes to a mean value of 0 and a standard deviation of 1. Several normalization techniques exist, such as Z normalization (also called standardization), min-max normalization, and unit vector normalization. We applied unit vector normalization to our dataset.
b. Unit vector normalization: It shrinks or stretches a vector so that it has unit length. We applied it to the whole dataset, and the transformed data can be viewed as a cluster of vectors pointing in different directions on the d-dimensional unit sphere. The general formula for unit vector normalization is

$$\hat{U} = \frac{U}{|U|}$$

where $\hat{U}$ is the normalized vector, $U$ is a non-zero vector, and $|U|$ is the length of $U$.
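The 80:10:10 per-class split and the unit vector normalization can be sketched as follows. This is a minimal illustration, assuming that each image is flattened to a row vector before normalization and that the Euclidean norm is used for |U|; the function names, the seed, and the small `eps` guard against division by zero are not from the paper.

```python
import numpy as np


def split_sizes(n, ratios=(0.8, 0.1, 0.1)):
    """Train/validation/test sizes for n samples under the given ratios."""
    n_train = int(round(n * ratios[0]))
    n_val = int(round(n * ratios[1]))
    return n_train, n_val, n - n_train - n_val


def stratified_split(labels, seed=0):
    """Randomly split indices into train/val/test, class by class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train, n_val, _ = split_sizes(len(idx))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)


def unit_vector_normalize(x, eps=1e-12):
    """Scale each row U to unit Euclidean length: U_hat = U / |U|."""
    x = np.asarray(x, dtype=np.float64)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


# Five balanced classes of 580 images each, as reported after resampling.
labels = np.repeat(np.arange(5), 580)
train_idx, val_idx, test_idx = stratified_split(labels)
print(split_sizes(580))                              # (464, 58, 58)
print(len(train_idx), len(val_idx), len(test_idx))   # 2320 290 290

vectors = unit_vector_normalize([[3.0, 4.0], [1.0, 0.0]])
print(vectors[0])  # [0.6 0.8]
```

Splitting within each class keeps the five classes equally represented in all three groups, which matches the per-class 80:10:10 ratio stated above.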

Proposed Classification Methods and Techniques
The three critical components of machine learning algorithms are feature extraction, feature reduction, and classification. In traditional machine learning algorithms, all three steps are performed manually or separately. The beauty of deep learning algorithms such as CNNs is that there is no need for manual feature extraction: the three stages are performed jointly within the CNN architecture. CNN architectures achieve higher classification performance than traditional models. The three layer types of CNN architectures are the convolution layer, the pooling layer, and the fully connected layer [54]. Feature extraction is the responsibility of the convolution layer, dimension reduction is performed by the pooling layer, and classification is performed by the fully connected layers. The conversion of two-dimensional matrices into one-dimensional vectors is also performed by a fully connected layer [55].

Convolution Layer
The convolution layer acts as the base of the CNN architecture. It comprises a set of learnable filters, also called kernels, whose weights are learned through the training process. The filter dimensions are smaller than those of the input image. The filters convolve with the image and create activation maps; in this way, the convolution layer extracts the features from a given image. For a three-dimensional image with dimensions H × W × C, H denotes the height, W the width, and C the total number of channels. Consider a 3D filter of size $F_H \times F_W \times F_C$, where $F_H$ denotes the filter height, $F_W$ the filter width, and $F_C$ the number of filter channels. The output activation map then has size $A_H \times A_W$, where $A_H$ stands for the activation height and $A_W$ for the activation width. The following equations are used to calculate the activation height and width values:

$$A_H = 1 + \frac{H - F_H + 2P}{S} \tag{1}$$

$$A_W = 1 + \frac{W - F_W + 2P}{S} \tag{2}$$

Here, P signifies the padding and S represents the stride. With n filters, the activation map dimensions are $A_H \times A_W \times n$. Figure 3 illustrates the complete convolution.
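The activation map size can be checked numerically with a short sketch. It uses the standard relation output = (input − filter + 2 × padding) / stride + 1, with floor division for cases where the stride does not divide evenly. The example values (stride 2, padding 1 for a 3 × 3 filter with 32 output channels on a 224 × 224 input) are assumptions chosen for illustration, and the function names are hypothetical.

```python
def conv_output_size(size, filter_size, padding=0, stride=1):
    """Output length along one axis: (size - filter + 2 * padding) // stride + 1."""
    return (size - filter_size + 2 * padding) // stride + 1


def activation_map_shape(h, w, f_h, f_w, n_filters, padding=0, stride=1):
    """Shape A_H x A_W x n of the activation map produced by n filters."""
    a_h = conv_output_size(h, f_h, padding, stride)
    a_w = conv_output_size(w, f_w, padding, stride)
    return a_h, a_w, n_filters


# 224 x 224 input, 3 x 3 filters, 32 filters, assumed padding 1 and stride 2.
print(activation_map_shape(224, 224, 3, 3, 32, padding=1, stride=2))  # (112, 112, 32)

# No padding, stride 1: a 5 x 5 input with a 3 x 3 filter gives a 3 x 3 map.
print(activation_map_shape(5, 5, 3, 3, 1))  # (3, 3, 1)
```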

Pooling Layer
The pooling layer's primary purpose is to reduce the size of the feature maps, so that the network has fewer parameters to learn and fewer computations to perform. The different pooling layers are max pooling, average pooling, and global pooling. By applying a non-linear transformation to the given inputs, the activation function introduces non-linearity into the network. Our proposed multi-class classifier uses the SoftMax activation function in the output layer. The main function of SoftMax is to calculate relative probabilities. The general SoftMax equation is given below in Equation (3).

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \tag{3}$$

In this case, $z_i$ stands for the values of the output layer neurons, K is the number of output neurons, and the exponential acts as a non-linear function. These values are then normalized and transformed into probabilities by dividing them by the sum of the exponential values. For all hidden layers, we applied the ReLU activation function, the most commonly used activation function in CNNs. There are different variants of the ReLU activation function, such as parametric ReLU, leaky ReLU, exponential linear units (ELU, SELU), and concatenated ReLU (CReLU). We applied leaky ReLU since it has some benefits over other variants; for example, it fixes the "dying ReLU" problem because it has no zero-slope parts. It also speeds up the training process because it is more balanced and therefore learns faster. However, it should be kept in mind that leaky ReLU is not always superior to simple ReLU and should be considered as an alternative. The general equations for the ReLU and leaky ReLU activation functions are given below in Equation (4) and Equation (5), respectively.

$$f(z) = \max(0, z) \tag{4}$$

$$f(z) = \begin{cases} z, & z > 0 \\ \alpha z, & z \le 0 \end{cases} \tag{5}$$
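The three activation functions discussed above can be written compactly in NumPy. This is a minimal sketch using the standard definitions; subtracting the maximum before exponentiating is a common numerical-stability step and is not mentioned in the paper, and the default slope of 0.01 for leaky ReLU is the commonly used value.

```python
import numpy as np


def softmax(z):
    """Turn raw output scores into probabilities that sum to 1."""
    z = np.asarray(z, dtype=np.float64)
    shifted = z - np.max(z, axis=-1, keepdims=True)  # stability shift, result unchanged
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def relu(z):
    """ReLU: max(0, z) element-wise."""
    return np.maximum(0.0, np.asarray(z, dtype=np.float64))


def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: z for z > 0, alpha * z otherwise."""
    z = np.asarray(z, dtype=np.float64)
    return np.where(z > 0, z, alpha * z)


scores = np.array([2.0, 1.0, 0.1, -1.0, 0.5])  # hypothetical scores for five classes
probs = softmax(scores)
print(int(probs.argmax()))            # 0
print(relu([-1.0, 2.0]))              # [0. 2.]
print(leaky_relu([-1.0, 2.0]))        # [-0.01  2.  ]
```

The largest score always receives the largest probability, so taking the index of the maximum probability gives the predicted class.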

When z is less than 0, leaky ReLU allows a small, nonzero, constant gradient α. Generally, α = 0.01. In our study on medical image analysis, we combine MobileNet with ImageNet weights and transfer learning for image classification. These architectures can readily handle two-dimensional and three-dimensional brain neuroimages using 2D and 3D convolutions. The general flow of our novel framework is shown in Table 2. We applied the MobileNet model with ImageNet weights to categorize the distinct phases of Alzheimer's. MobileNet uses depthwise separable convolutions. The main advantage of this model is that it reduces the number of parameters compared to other networks and generates lightweight deep neural networks [58,59]. It is a class of CNN that gives an excellent starting point for training classifiers that are very small and very fast. MobileNets are built on depthwise separable convolution layers, each of which consists of a depthwise convolution followed by a pointwise convolution [60][61][62]. There are almost 4.2 million parameters in a MobileNet architecture. The size of the input image is 224 × 224 × 3. The convolution kernel shape is 3 × 3 × 3 × 32, with an average pooling size of 7 × 7 × 1024. Dropout layers are followed by a flatten layer and fully connected layers. The final fully connected layer, with SoftMax as the activation function, handles the five classes of Alzheimer's disease stages, while ReLU is the activation function for the hidden layers.
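The parameter saving from depthwise separable convolutions can be illustrated with simple arithmetic. A standard K × K convolution from C_in to C_out channels has K·K·C_in·C_out weights, whereas the depthwise plus pointwise pair has K·K·C_in + C_in·C_out. The sketch below ignores biases and batch-norm parameters, and the 32-to-64-channel layer is an illustrative choice rather than a figure taken from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 32 input channels, 64 output channels.
std = standard_conv_params(3, 32, 64)
sep = depthwise_separable_params(3, 32, 64)
print(std, sep)               # 18432 2336
print(round(std / sep, 2))    # 7.89

# The first layer's 3 x 3 x 3 x 32 kernel shape quoted in the text is a standard convolution.
print(standard_conv_params(3, 3, 32))  # 864
```

For 3 × 3 kernels the reduction approaches a factor of nine as the number of output channels grows, which is the source of the lightweight networks described above.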
The general architecture of the MobileNet model we applied is shown below in Figure 4. The total number of trainable parameters is 25,958,917, and the number of non-trainable parameters is 3,231,936. We used RMSProp as our optimizer with a learning rate of 0.00001 and categorical cross-entropy as the loss for multi-class classification, and we tracked accuracy as the metric, which yields the training and validation loss and accuracy values during training. The comparison of the novel approach with other developed approaches is given in the tables below.
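The loss and optimizer named above can be sketched in NumPy to show what each computes. The learning rate of 0.00001 comes from the text; the decay factor `rho = 0.9`, the `eps` values, and the function names are assumptions, and the update shown is the textbook RMSProp rule rather than the exact implementation of any particular library.

```python
import numpy as np


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))


def rmsprop_step(w, grad, cache, lr=1e-5, rho=0.9, eps=1e-7):
    """One RMSProp update: scale the step by a running RMS of past gradients."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache


# One sample whose true class is the third of five, with a uniform prediction.
y_true = np.array([[0.0, 0.0, 1.0, 0.0, 0.0]])
uniform = np.full((1, 5), 0.2)
print(round(categorical_cross_entropy(y_true, uniform), 4))  # 1.6094

# A single update of one weight with gradient 2.0 (hypothetical values).
w, cache = rmsprop_step(np.array([1.0]), np.array([2.0]), np.zeros(1))
print(w[0] < 1.0)  # True
```

A uniform prediction over five classes gives a loss of ln 5, which is a useful reference point: a training loss near 1.61 means the model is doing no better than guessing.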

Experimental Findings and Model Evaluation
The novel model considers various scenarios. We examined the empirical results in terms of many performance benchmarks, including the confusion matrix, accuracy, loss, F1 score, precision, recall, ROC, sensitivity, and AUC. Table 3 below provides a summary of the novel model.

Model Evaluation
For the multi-class classification, we used the MobileNet architecture, a type of CNN. The performance of the proposed model is compared with that of existing models; as depicted in Table 3, the proposed model achieves better accuracy than the existing models. Our model achieves an accuracy of 96.22%, as shown in Figure 5, while Juan Ruiz et al. [49] achieved 66.67%, Spasov et al. [55] achieved 88%, and Sahumbaiev et al. [56] achieved 89.47%. The training and validation accuracy and loss of the proposed approach are shown in Figure 6.
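Several of the metrics used in this evaluation (accuracy, precision, recall, and F1 score) can be derived directly from a confusion matrix. The sketch below assumes that rows are true classes and columns are predicted classes; the 3 × 3 matrix is a made-up toy example and does not reproduce any result from this study.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Accuracy plus per-class precision, recall, and F1 from a confusion matrix.

    Assumes cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how many samples each class has
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy matrix for three hypothetical classes (not data from the paper).
toy_cm = [[8, 1, 1],
          [2, 7, 1],
          [0, 0, 10]]
result = metrics_from_confusion(toy_cm)
print(round(result["accuracy"], 4))  # 0.8333
print(result["recall"])              # [0.8 0.7 1. ]
```

Per-class recall is the same quantity as sensitivity, so the diagonal of a row-normalized confusion matrix can be read directly as the sensitivity for each stage.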

Table 3. Evaluation of the performance metrics of the devised model [60].
The number of patients diagnosed with each type of AD stage (NC/MCI/AD/LMCI/EMCI) is shown in the confusion matrix. The normalized confusion matrix for the proposed framework is shown in Figure 6. The comparative analysis of the proposed method with existing approaches is shown graphically in Figure 7, which compares the devised architecture with other approaches. Our approach yields better results than the existing approaches.

Conclusions
This study presents a system for medical image categorization and Alzheimer's disease recognition. The proposed approach is built on deep-learning CNN architectures. Five stages of Alzheimer's disease are considered. We employ the MobileNet model with ImageNet weights. The beauty of the MobileNet architecture is that it uses depthwise separable convolutions, which reduce the number of parameters compared to models with regular convolutions and result in lightweight neural networks. The other important characteristic of the MobileNet architecture is that, instead of a single 3 × 3 convolution layer followed by batch norm and ReLU as in traditional networks, it divides the convolution into a 3 × 3 depthwise convolution and a 1 × 1 pointwise convolution. MobileNet models are valuable for detection, embedding, segmentation, and classification. ImageNet is a standard benchmark for image classification and provides a standard measure of how effective a model is for classification. Different performance metrics are used for the assessment of the model. Our model achieves an accuracy of 96.22%. In the future, we plan to implement other pre-trained models for classification and to check whether a patient can convert from one AD stage to another. The dataset's size will also be increased to improve accuracy, and different resampling approaches, such as downsampling, will be used. The author is currently working on multi-modal data fusion techniques to detect and diagnose AD.
