HTLML: Hybrid AI Based Model for Detection of Alzheimer’s Disease

Alzheimer’s disease (AD) is a degenerative condition of the brain that affects the memory and reasoning abilities of patients. The condition steadily wipes out memory and gradually impairs the brain’s ability to think, recall, and form intentions. To identify this disease, a variety of manual imaging modalities, including CT, MRI, and PET, are currently used. These methods, however, are time-consuming and troublesome in the context of early diagnosis. Deep learning models have therefore been devised that are less time-intensive, require less specialized hardware and human interaction, continue to improve in performance, and are useful for the prediction of AD; their outputs can also be verified against experimental results obtained by doctors in medical institutions and health care facilities. In this paper, we propose a hybrid AI-based model that combines transfer learning (TL) with a permutation-based machine learning (ML) voting classifier in two basic phases. The first phase comprises two TL-based models, namely DenseNet-121 and DenseNet-201, for feature extraction, whereas the second phase applies three different ML classifiers, i.e., SVM, Naïve Bayes, and XGBoost, for classification. The final classifier outcomes are evaluated by means of permutations of the voting mechanism. The proposed model achieved an accuracy of 91.75%, a specificity of 96.5%, and an F1-score of 90.25%. The dataset used for training was obtained from Kaggle and contains 6126 images: 896 classified as mildly demented, 64 as moderately demented, 3200 as non-demented, and 1966 as very mildly demented. The results show that the suggested model outperforms current state-of-the-art models.
Based on these results, such models could be used to generate therapeutically viable methods for detecting AD in MRI images in prospective clinical studies.


Introduction
AD is a neurological condition that damages brain cells, slowly eroding memory and hampering basic cognitive functions and abilities. The disease is detected through changes in the brain and eventually results in the loss of neurons and their connections. According to the WHO, around 50 million people have dementia, and there are nearly 10 million new cases every year. The main contributions of this work are as follows:

• A hybrid AI-based model was proposed by combining transfer learning (TL) and permutation-based machine learning (ML) for AD diagnosis. Three hybrid DenseNet-121 models were simulated with three ML classifiers, i.e., SVM, Gaussian Naïve Bayes, and XGBoost, respectively, for the detection of Alzheimer's disease. From these models, the best hybrid model, DenseNet-121-SVM, was selected for further simulation.
• Two TL-based models, namely DenseNet-121 and DenseNet-201, were implemented for feature extraction.
• The three most popular machine learning (ML) classifiers, namely SVM, Gaussian Naïve Bayes, and XGBoost, were implemented for classification purposes.
• A permutation-based voting classifier was implemented for final accuracy observation.
• The proposed model was trained with the Adam optimizer for 1000 epochs for evaluation purposes.
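The permutation-based voting step above can be sketched as follows. The exact voting rule is not reproduced here, so this assumes simple majority voting over the three classifiers' predicted labels; the per-classifier predictions are illustrative values only.

```python
# Hypothetical sketch of the voting step over per-classifier predictions,
# evaluated for every ordering (permutation) of the classifiers.
from collections import Counter
from itertools import permutations

def majority_vote(predictions):
    """Return the most common class label among the classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Predicted class labels for one image from the three classifiers
# (SVM, Gaussian Naive Bayes, XGBoost); illustrative values only.
per_classifier = {"svm": "ND", "gnb": "VMD", "xgb": "ND"}

# With a simple majority rule the outcome is independent of classifier order.
for order in permutations(per_classifier):
    votes = [per_classifier[name] for name in order]
    assert majority_vote(votes) == "ND"

print(majority_vote(list(per_classifier.values())))  # -> ND
```

With a plain majority rule the permutation order does not change the winner; the permutations loop above simply verifies that.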

Background Literature
Most previous research implemented binary classification (Bin.c) of AD and utilized smaller datasets to design the proposed models, which were therefore not adaptable; even the authors who trained on bigger datasets still performed binary classification. Table 1 compares previous models:
• One study used a dataset of 3127 3T T1-weighted images, performed tertiary classification, and achieved an accuracy of 88.9%. It also aims to focus on reverting MCI individuals to normal individuals, predicting AD progression, and improving the diagnosis of AD in the future.
• [3] GoogleNet, AlexNet, VGGNet16, VGGNet19, SqueezeNet, ResNet18, ResNet50, ResNet101, Inception-v3: to detect AD on MRI scans using DL techniques. The dataset consisted of 177 images; the model performed Bin.c and achieved an accuracy of 84.38%. Future work is to include other neuro-imaging modalities, such as PET scans, or additional features in the system to take different aspects of AD into consideration.
• Cross-modal transfer learning was used to classify AD. The dataset contained 416 sMRI image scans; the model performed Bin.c and achieved an accuracy of 83.57%. Future work is to proceed with a longitudinal dataset and develop a method based on spatial optimization of ROI.
• [5] DTCWT, PCA, FNN: to develop a CAD system to diagnose AD individuals early. The dataset contained 416 T1-weighted image scans; the model performed Bin.c and achieved an accuracy of 90.06%. Various feature reduction methods, such as ICA, LDA, and PCA, were utilized for swarm optimization.
• [6] SVM, CNN: to classify AD from MCI using a semi-supervised SVM-CNN. The dataset contained 359 T1-weighted images; the model performed Bin.c and achieved an accuracy of 82.91%. A semi-supervised SVM is applied to distinguish brain MRI images.
• Another study used a dataset of 1167 T1-weighted image scans, performed Bin.c, and achieved an accuracy of 81%, distinguishing brain images by using SVM-RFE.
• [8] 2D-CNN, VGG16: to classify AD using an ensemble-based CNN. The dataset contained 798 T1-weighted image scans; the model performed Bin.c and achieved an accuracy of 90.36%, distinguishing AD from MCI images by using a 2D-CNN.
• [9] SVM, CNN: to distinguish MCI from AD using an SVM classifier with a linear kernel. The dataset contained 1167 T1-weighted image scans; the model performed Bin.c and achieved an accuracy of 69.37%, distinguishing AD from MCI images by using SVM-CNN.
• [10] SVM, k-NN, CNN: to distinguish MCI from AD using SVM and k-NN. The dataset contained 1311 T1- and T2-weighted image scans; the model performed Bin.c and achieved an accuracy of 75%, distinguishing AD from MCI images by using SVM-CNN and k-NN.
As Table 1 shows, smaller datasets were utilized to execute these models. The proposed model, by contrast, runs on a large dataset and does not use binary classification; rather, it classifies Alzheimer's disease into four categories, that is, MD, VMD, Mod D, and ND.

Proposed Research Methodology
A transfer learning-based DenseNet model is ensembled with machine learning classifiers: DenseNet is used to extract the feature map from the image dataset, whereas the machine learning classifiers classify the feature map into four categories, i.e., the ND, VMD, MD, and Mod D classes. For feature map extraction, two DenseNet models, i.e., DenseNet121 and DenseNet201, are used. For classification, three types of classifiers are used, i.e., SVM, Gaussian NB, and XGBoost. The proposed model utilizes the Kaggle dataset containing 6126 AD images. The framework consists of various layers, as shown in Figure 1, and the different blocks of the proposed model are explained below. The model is executed using the Keras package in Python with TensorFlow at the backend, on an Intel(R) Core(TM) i5-6400 CPU 2.70 GHz processor with 12 GB RAM.
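The two-phase design can be sketched as a feature-extraction transform feeding an ML classifier. In this minimal illustration, PCA stands in for the DenseNet feature extractor purely so the snippet is self-contained and runs offline; the paper itself uses DenseNet-121/201 feature maps, and the data below are random placeholders.

```python
# Phase 1 (feature extraction, stand-in) feeding Phase 2 (ML classifier).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))     # toy flattened image vectors
y = rng.integers(0, 4, size=200)   # four classes: ND, VMD, MD, Mod D

model = make_pipeline(
    PCA(n_components=16),          # stand-in for the DenseNet feature map
    SVC(kernel="rbf"),             # one of the three classifiers (SVM)
)
model.fit(X, y)
print(model.predict(X[:5]).shape)  # one predicted class per image
```

Swapping `SVC` for `GaussianNB` or an XGBoost classifier gives the other two hybrid variants described in this paper.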

Input Dataset
The database used in the study consists of a total of 6126 AD images that were collected from the Kaggle database. It comprises grayscale images of 896 MD, 64 Mod D, 3200 ND, and 1966 VMD of dimensions (208 × 176 × 3) pixels. The dataset for evaluation was divided in such a way that 80% of the data were utilized for training and the remaining 20% for testing. Table 2 shows the number of images used for training and validation purposes.
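The 80/20 split can be sketched with scikit-learn as below. The class counts match the dataset (896 MD, 64 Mod D, 3200 ND, 1966 VMD); stratification is an assumption here, since the paper does not state whether its split preserved class ratios, and the indices stand in for the image arrays.

```python
# 80/20 train/test split over per-image labels.
import numpy as np
from sklearn.model_selection import train_test_split

labels = np.array(["MD"] * 896 + ["ModD"] * 64 + ["ND"] * 3200 + ["VMD"] * 1966)
indices = np.arange(len(labels))  # stand-ins for the 6126 image arrays

train_idx, test_idx, y_train, y_test = train_test_split(
    indices, labels, test_size=0.20, stratify=labels, random_state=42
)
print(len(train_idx), len(test_idx))  # 4900 1226
```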


Data Pre-Processing
Data preprocessing is a technique of assembling input data and making them utilizable for deep learning models [20][21][22]. It is the fundamental step in this proposed research methodology. Data preprocessing is required for cleaning the noise, identifying and rectifying the missing values, and making the data usable in an appropriate format [23,24]. Further, this enhances the efficacy of the whole proposed research model [25,26]. In this model, two types of data pre-processing are utilized, namely data normalization and data augmentation.


Data Normalization
Data normalization preserves the numerical stability of the model. The MRI images were acquired in grayscale format; by utilizing normalization techniques, the MRI datasets in the proposed model were trained faster.
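A common normalization for grayscale MRI slices is rescaling 8-bit pixel intensities into [0, 1], which keeps activations in a numerically stable range and speeds convergence. The 2 × 2 patch below is a toy stand-in for a slice.

```python
# Rescale 8-bit grayscale intensities into the [0, 1] range.
import numpy as np

img = np.array([[0, 128], [255, 64]], dtype=np.uint8)  # toy grayscale patch
normalized = img.astype(np.float32) / 255.0            # now in [0.0, 1.0]
print(normalized.max())  # 1.0
```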

Data Augmentation
In order to enhance the usefulness of the model, a large dataset is needed. However, multi-site acquisition as well as privacy and data restrictions are issues faced while acquiring such datasets. To overcome these problems, various augmentation methods were implemented on the dataset, significantly expanding the original data quantity. Five techniques were applied: horizontal flipping, vertical flipping, anticlockwise rotation by 90°, rotation by 270°, and brightness adjustment by a factor of 0.7. These five data augmentation methods are shown in Figure 3. Table 3 exhibits the quantity of images before and after data augmentation. Further, there was a disproportion in the number of images in each class, and the above processes were performed to reduce this imbalance. After applying these methods, the original dataset was updated to 10,760 images; Table 3 represents the newly updated image counts. The augmentation was applied only to the training images: earlier, the training images of MD, Mod D, ND, and VMD numbered 896, 64, 3200, and 1966, respectively, and after augmentation the training images totaled 10,760. Table 3 also shows the total training and validation images after augmentation.
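The five augmentations named above can be sketched with NumPy array operations; library pipelines (e.g., in Keras) work equally well. The 3 × 4 array is a toy grayscale image.

```python
# The five augmentations: horizontal flip, vertical flip, 90 and 270 degree
# anticlockwise rotations, and brightness scaled by a factor of 0.7.
import numpy as np

img = np.arange(12, dtype=np.float32).reshape(3, 4)  # toy grayscale image

augmented = {
    "hflip": np.fliplr(img),
    "vflip": np.flipud(img),
    "rot90": np.rot90(img, k=1),             # 90 degrees anticlockwise
    "rot270": np.rot90(img, k=3),            # 270 degrees anticlockwise
    "brightness": np.clip(img * 0.7, 0, 255),  # intensity scaled by 0.7
}

# Each source image yields five additional training samples.
print(len(augmented))  # 5
```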






Feature Extraction Using Different DenseNet Transfer Learning Models
In the proposed model, input images of size 208 × 176 are applied to two DenseNet models, i.e., DenseNet121 and DenseNet201, for feature map extraction. The DenseNet121 model consists of five convolutional blocks, as shown in Table 4. In the first convolution block (Block-1), the image is shaped to match the Conv_1 size of 112 × 112, then transferred to the max pooling block. From the max pooling block, it is sent to Conv_2 to be shaped into 56 × 56, then to Conv_3 for 28 × 28, Conv_4 for 14 × 14, and finally to Conv_5 for 7 × 7. After Conv_5, the obtained features are sent to the global average pooling layer and finally to the dense layer to be obtained as output. Similarly, the layer description of DenseNet201 is shown in Table 5. The difference between DenseNet121 and DenseNet201 lies in the number of convolution layers used in each convolution block; the last dense layer also differs between the two structures, with 1024 filters used in DenseNet121 and 1920 in DenseNet201. Table 6 exhibits the filter visualization images of every convolution layer of DenseNet121 after every dense block; there are a total of five convolutional blocks in the proposed model, and the single kernels, or filters, along with the images after every convolution layer, are shown in Table 6. Table 7 exhibits the filter visualization image of every convolution layer of DenseNet201 as well as the filtered images of each class after every dense block; the single kernels, or filters, along with the images after every convolution layer, are shown in Table 7 (filter visualization and image visualization for each convolution layer of DenseNet201).
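Phase-1 feature extraction can be sketched in Keras as below, assuming TensorFlow is installed. `weights=None` keeps the snippet offline and runnable; the actual transfer-learning setup would load pretrained ImageNet weights. Dropping the top layer and applying global average pooling yields a 1024-dimensional feature vector per image for DenseNet-121 (1920 for DenseNet-201), matching the dense-layer widths described above.

```python
# DenseNet-121 as a feature extractor: no classification head, global
# average pooling over the final 1024-channel feature map.
import numpy as np
from tensorflow.keras.applications import DenseNet121

extractor = DenseNet121(weights=None, include_top=False,
                        pooling="avg", input_shape=(208, 176, 3))

batch = np.zeros((2, 208, 176, 3), dtype=np.float32)  # two placeholder images
features = extractor.predict(batch, verbose=0)
print(features.shape)  # (2, 1024)
```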



Classification Using Hybrid Machine Learning-Convolutional Neural Network
The extracted features obtained from Block-5 of both DenseNet architectures are sent to the machine learning classifiers, namely SVM, XGBoost, and Gaussian NB. After that, they are finally sent to a dense layer to be obtained as output. The layer description of both hybrid architectures is shown in Tables 8 and 9, respectively, in which there are two major differences: one is the number of filters used at each layer, and the other is the number of times a block runs.

Gaussian Naïve Bayes Classifier
Gaussian Naïve Bayes classifiers are supervised machine learning classification techniques based on Bayes' theorem, and they can be used to calculate conditional probability. The values of individual features are treated as totally independent: each feature does not depend on the values of any other feature. This classifier utilizes continuous data that usually take continuous values associated with their respective class. The likelihood of a feature is given by Equation (1).
$$p(x = v \mid C_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left(-\frac{(v - \mu_k)^2}{2\sigma_k^2}\right) \tag{1}$$
where $\mu_k$ is the mean of the values in $x$ associated with class $C_k$, $\sigma_k^2$ is the Bessel-corrected variance of those values, and $v$ is a random observation value.
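As a numeric sanity check of Equation (1), the Gaussian class-conditional likelihood can be computed directly from a class's training values; the values below are illustrative only.

```python
# Gaussian likelihood of observation v given a class's feature values,
# using the Bessel-corrected (n - 1) variance as in Equation (1).
import math

def gaussian_likelihood(v, values):
    mu = sum(values) / len(values)
    var = sum((x - mu) ** 2 for x in values) / (len(values) - 1)  # Bessel-corrected
    return math.exp(-((v - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

vals = [1.0, 2.0, 3.0]                           # toy feature values for one class
print(round(gaussian_likelihood(2.0, vals), 4))  # density at the class mean: 0.3989
```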

XGBoost Classifier
XGBoost stands for Extreme Gradient Boosting; it is a fully optimized, distributed gradient boosting library that tends to be highly flexible, portable, and efficient.

where µ is mean of values in x associated with class , and is Bessel corrected variance; v is a random observation value.

XGBoost Classifier
XG Boost stands for Extreme Gradient Boosting, which is a fully optimized distributed gradient boosting module which tends to be highly flexible, portable and efficient.

Classification Using Hybrid Machine Learning-Convolutional Neural Network
The features extracted from Block-5 of both DenseNet architectures are sent to machine learning classifiers, namely SVM, XGBoost and Gaussian NB. After that, these are finally sent to a dense layer to obtain the output. The layer description of both hybrid architectures is shown in Tables 8 and 9, respectively, in which there are two major differences: one is the number of filters used at each layer, and the other is the number of times a block runs.

Gaussian Naïve Bayes Classifier
This is a supervised machine learning classification technique based on Bayes' theorem, used to calculate conditional probability. The individual features in this classifier are treated as totally independent; no feature depends on the values of any other feature. The classifier handles continuous data that take continuous values associated with their respective class. The likelihood of a feature is given by Equation (1):

P(x = v | C_k) = (1 / √(2πσ_k²)) · exp(−(v − µ_k)² / (2σ_k²))    (1)

where µ_k is the mean of the values in x associated with class C_k, σ_k² is the Bessel-corrected variance, and v is a random observation value.
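As a minimal sketch of Equation (1), the per-class Gaussian likelihood can be evaluated directly (the function name and the way it would be combined across features are illustrative, not code from the paper):

```python
import math

def gaussian_likelihood(v, mu, var):
    """Equation (1): Gaussian likelihood P(x = v | C_k), where mu is the
    per-class feature mean and var the Bessel-corrected variance."""
    return math.exp(-(v - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# A Gaussian NB classifier multiplies these likelihoods across features
# (or sums their logs) together with the class prior, then picks the
# class with the highest posterior.
```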

XGBoost Classifier
XGBoost stands for Extreme Gradient Boosting, a fully optimized distributed gradient boosting library that is highly flexible, portable and efficient. For unstructured datasets such as image datasets, this classifier is widely used together with convolutional neural network models. The XGBoost prediction is given by Equation (2):

F_M(x) = Σ_{m=1}^{M} f_m(x)    (2)

where x is an input from the training set, the weak learners range from m = 1 to M, and f_m(x) defines the model update contributed at round m.
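The additive update in Equation (2) can be illustrated with a toy squared-loss booster whose weak learners are constants fit to the current residuals (a deliberately simplified stand-in for XGBoost's regularized trees; the function names and learning rate are illustrative):

```python
def fit_constant_booster(ys, rounds=50, lr=0.1):
    """Each round's weak learner f_m is the shrunken mean of the current
    residuals; the ensemble prediction is the sum of all updates, as in
    F_M(x) = sum over m of f_m(x)."""
    updates = []
    pred = 0.0
    for _ in range(rounds):
        residual_mean = sum(y - pred for y in ys) / len(ys)
        step = lr * residual_mean  # constant weak learner with shrinkage
        updates.append(step)
        pred += step
    return updates

def boosted_predict(updates):
    # Constant learners ignore the input x, so prediction is just the sum.
    return sum(updates)
```

After enough rounds the sum of updates converges to the least-squares optimum (here, the mean of the targets); real XGBoost replaces the constants with regularized decision trees fit to gradients.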

Support Vector Machine Classifier
This technique maps each data item into n-dimensional space, with each feature value as a coordinate value. A hyperplane is then obtained that distinguishes the specified classes. The soft-margin SVM classifier minimizes the objective in Equation (3):

(1/n) Σ_{i=1}^{n} max(0, 1 − y_i(wᵀx_i − b)) + λ‖w‖²    (3)

where x_i is the ith n-dimensional real vector, y_i indicates the class to which x_i belongs, w is the normal vector to the hyperplane, wᵀx_i − b gives the ith output, n is the total number of input points, and λ is the regularization parameter; for linearly separable data, a sufficiently small λ yields hard-margin behaviour.
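Equation (3) can be sketched as a direct computation (a hypothetical helper, not code from the paper; in practice the DenseNet features would be fed to a library SVM implementation):

```python
def svm_objective(w, b, xs, ys, lam):
    """Soft-margin SVM objective of Equation (3): mean hinge loss plus
    lambda times the squared norm of w. Labels in ys are +1 or -1."""
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    hinge = sum(max(0.0, 1.0 - y * (dot(w, x) - b)) for x, y in zip(xs, ys))
    return hinge / len(xs) + lam * dot(w, w)
```

Points classified correctly with margin at least 1 contribute zero hinge loss; the λ term trades margin width against violations.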

Results Analysis
Hyperparameters are essential for tuning the model; they include the optimizer, batch size (BS) and number of epochs. Optimizers are algorithms used to modify neural network weights so as to minimize the loss. The Adam optimizer was used in this model, as it combines the characteristics of both the AdaGrad and RMSProp optimizers. BS specifies the number of images processed in a single iteration; BS 64 was used in these models. A large BS makes each training step computationally heavy, whereas a small BS allows faster steps, so there is always a trade-off between large and small BS. Epochs indicate the number of times the dataset is passed through the neural network; 1000 epochs were used in these models. The number of epochs should be high enough that the error is minimized during training, but a large number of epochs increases the computational time, so a trade-off is again required. Table 8 shows the hyper-tuning parameters and their values.
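As an illustration of the batch-size trade-off, the number of weight updates per epoch under the reported settings can be computed (the helper name is illustrative; the dataset size of 6200 images is taken from the paper):

```python
import math

# Hyperparameters reported for the hybrid models.
config = {"optimizer": "adam", "batch_size": 64, "epochs": 1000}

def steps_per_epoch(n_images, batch_size):
    """Iterations (weight updates) needed for one full pass over the dataset."""
    return math.ceil(n_images / batch_size)

# With 6200 images and batch size 64, each epoch takes 97 update steps,
# so 1000 epochs correspond to 97,000 updates in total.
```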

Analysis of Hybrid DenseNet 121 Model
The features extracted from the DenseNet121 model are classified using three different machine learning classifiers, namely SVM, Gaussian NB and XGBoost. The performance of these three hybrid DenseNet121 models is analyzed using training and validation loss and confusion matrix parameters.

Training and Validation Loss of Hybrid DenseNet121 Models with Different Epochs
Hinge loss, which depicts changes in loss during model training, is shown in Figure 4. Minimum hinge loss is achieved for hybrid DenseNet121-SVM (Figure 4a) and hybrid DenseNet121-Gaussian NB (Figure 4b), whereas validation hinge loss is higher for hybrid DenseNet121-XGBoost (Figure 4c). The performance parameters of the hybrid DenseNet121 models for different epochs are depicted in Table 10, in which hybrid DenseNet121-SVM outperforms the others at the 1000th epoch, with a training loss of 0.051 and a validation loss of 0.313. The confusion matrices of the hybrid DenseNet121 models with machine learning classifiers are shown in Figure 5. Table 11 depicts the average performance comparison of all the DenseNet121 hybrid models in terms of average Precision (P), average Sensitivity (S), average Specificity (Sp), average F1-Score (F1) and average accuracy. These average parameters are obtained using batch size 64. A stable and better testing performance is achieved with DenseNet121-SVM.
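The averaged parameters reported in Table 11 can be derived from confusion-matrix counts; a minimal one-vs-rest sketch (the function and the example counts are illustrative, not the paper's figures):

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, sensitivity (recall), specificity and F1-score computed
    from the one-vs-rest counts of a single class in a confusion matrix."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, specificity, f1

# Macro-averaging these per-class values over the four dementia classes
# gives the average P, S, Sp and F1 reported in the tables.
```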

Analysis of Hybrid DenseNet 201 Model
The features extracted from the DenseNet201 model are classified using three different machine learning classifiers, namely SVM, Gaussian NB and XGBoost. The performance of these three hybrid DenseNet201 models is analyzed using training and validation loss and confusion matrix parameters.

Training and Validation Loss of Hybrid DenseNet201 Models with Different Epochs
Hinge loss, which depicts changes in loss during model training, is shown in Figure 7. Minimum hinge loss is achieved for hybrid DenseNet201-SVM (Figure 7a) and hybrid DenseNet201-Gaussian NB (Figure 7b), whereas validation hinge loss for hybrid DenseNet201-XGBoost (Figure 7c) is higher than that of the other two hybrid models. The performance parameters of the hybrid DenseNet201 models for different epochs are depicted in Table 12, in which hybrid DenseNet201-Gaussian NB outperforms the remaining models at the 1000th epoch, with a minimum training loss of 0.028 and a validation loss of 0.265. The confusion matrices of the DenseNet201 models with machine learning classifiers at batch size 64 are shown in Figure 8; both true and false predictions are displayed in these matrices. The accuracy of all DL models is shown in Figure 9, where DenseNet201 with Gaussian NB performs better than the other DL models with machine learning classifiers. Table 13 depicts the average performance comparison of all DenseNet201 hybrid models in terms of average Precision, average Sensitivity, average Specificity, average F1-Score and average accuracy. These average parameters are obtained using batch size 64. A stable and better testing performance is achieved with DenseNet201-Gaussian NB.

Comparison of Hybrid DenseNet121-SVM and DenseNet201-GNB Classifier
Precision, sensitivity, specificity and F1-Score of both the DenseNet121-SVM and DenseNet201-Gaussian NB models are depicted in Figure 10a. Average precision, average sensitivity, average specificity, average F1-score and average accuracy of both models are depicted in Figure 11, from which DenseNet201-Gaussian NB outperforms DenseNet121-SVM in all of these criteria.
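The final decision in the proposed model combines the classifier outcomes through a voting mechanism; a minimal hard-voting sketch over per-classifier predictions (illustrative names, not the paper's exact permutation scheme):

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote over the class labels predicted by several classifiers;
    ties are broken by the label that appears first in the input."""
    return Counter(predictions).most_common(1)[0][0]

# Example: if SVM and Gaussian NB both predict "NonDemented" while XGBoost
# predicts "MildDemented", the ensemble outputs "NonDemented".
```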

State-of-the-Art Comparison
Results obtained from the pre-trained DL models are compared with previous models using MRI images, as shown in Table 14. The proposed approach outperformed the previous approaches; it utilized DenseNet121 and DenseNet201 with preprocessing and machine learning classifier methods to improve their efficacy.
Diagnostics 2022, 12, x FOR PEER REVIEW
Figure 11. Average Performance Parameters of both Hybrid DenseNet Models with Machine Learning Classifiers.


Conclusions
This paper demonstrated the usefulness of DL models for the prediction of AD. DenseNet201 outperforms DenseNet121 on various comparative parameters. The dataset was acquired from Sarvesh Dubey via Kaggle. Accuracy of 91.75%, specificity of 96.5% and F1-score of 90.25% were achieved with the DenseNet201-Gaussian NB model. These results could help radiologists obtain a second opinion.
The model performs better when the training and testing environments are similar. A possible limitation is guaranteeing reproducibility; this issue could be addressed by collecting large brain MRI datasets. The hybrid approach feeds the convolutional features into machine learning components and the AD images into deep learning components before combining the results of both processes. Medical imaging requires various DL techniques for various biomedical applications. Further progress will be made on this model to overcome issues with varying image acquisition, enhancement, integration of different data formats, and weight misalignment, while applying the model to specific AD problems. As more data are acquired, the research could be made more impactful. Expanding from 2D CNN to 3D CNN could also be pursued, which mostly deals with the multimodal aspects of brain MRI images. For data augmentation, a GAN could be implemented, and reinforcement learning, which makes decisions based on the existing environment, could also be used. This approach is still evolving to achieve better performance and transparency. As AD image data and computing resources grow rapidly, research on AD using deep learning with hybrid methods is also continuously evolving, and it would benefit not only these applications but also new approaches presently conducted in medical institutions.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.