An Approach for Classification of Alzheimer’s Disease Using Deep Neural Network and Brain Magnetic Resonance Imaging (MRI)

Abstract: Alzheimer’s disease (AD) is a deadly cognitive condition in which people develop severe dementia symptoms. Neurologists commonly diagnose AD with a series of physical and mental tests, which may not always be effective. Damage to brain cells is the most significant physical change in AD. Proper analysis of brain images may assist in the identification of crucial bio-markers for the disease. Because the development of brain cells is so intricate, traditional image processing algorithms sometimes fail to perceive important bio-markers. The deep neural network (DNN) is a machine learning technique that helps specialists make appropriate decisions. In this work, we used brain magnetic resonance scans to implement some commonly used DNN models for AD classification. Based on the average of multiple metrics (accuracy, precision, recall, and F1 score), the DenseNet-121 model achieved the best performance among the existing models (86.55%). Since DenseNet-121 is a computationally expensive model, we propose a hybrid technique incorporating LeNet and AlexNet that is lightweight and also capable of outperforming DenseNet. To extract important features, we replaced the traditional convolution layers with three parallel small filters (1 × 1, 3 × 3, and 5 × 5). The model functions effectively, with an overall performance rate of 93.58%. Mathematically, it is observed that the proposed model generates significantly fewer convolutional parameters, resulting in a lightweight model that is computationally efficient.


Introduction
Alzheimer's disease (AD) is a severe neurological syndrome that renders a patient incapable of making decisions, memorising, speaking, learning, and so on [1,2]. The majority of Alzheimer's patients are in their early 60s or older. Damage to brain cells is the most devastating of all the physical changes. The hippocampus, amygdala, and certain other brain regions that regulate the majority of AD symptoms suffer the most damage [3-5]. Cells involved in learning are impacted first, and subsequently other grey matter cells are destroyed, rendering the patient incapable of performing even the most basic tasks. As a consequence, individuals with Alzheimer's disease have severe behavioural and cognitive difficulties, as well as memory loss [6]. Beginning in the early 1960s, the

From Figure 1, it can be observed that the hippocampus region (in the centre of the brain images) in AD patients is much smaller than in cognitively normal (CN) individuals and individuals with mild cognitive impairment (MCI). Similarly, the hippocampal size of MCI patients is smaller than that of CN individuals.
Amongst all the machine learning (ML) approaches, the artificial neural network (ANN) is one of the most widely used techniques, especially in the field of medical image processing [20]. An ANN works by building multiple interlinked artificial neurons that simulate the biological functions of a human brain in order to interpret information from the environment [21,22]. The deep neural network (DNN) is a type of ANN in which a collection of hidden layers is placed between the input and output layers to aid in learning crucial characteristics for improved model training [23]. The DNN is a commonly utilised machine learning approach that has been successful in a variety of healthcare applications [24]. Another reason for the popularity of the DNN is that it can handle even the most complex data, such as brain images [20].
Deep learning (DL) has many medical applications. Recent studies revealed that DL can effectively detect K-complexes in EEG signals, which helps in identifying biomarkers of various diseases [25,26]. DL can also classify X-ray images effectively. A recent article showed that DL can be used to detect developmental dysplasia of the hip from X-ray images [27]. Artificial intelligence is also being utilized in veterinary medicine. Recent research discussed how artificial intelligence could effectively predict the likelihood of survival and the need for surgery in horses presenting with acute abdomen (colic) [28].
Research is ongoing to develop reliable DNN-based image classifiers, and several effective models have been created to date. DNNs have been used in the classification of AD and have shown highly compelling findings [29]. Still, as far as we are aware, DNN algorithms are used relatively rarely in the diagnosis of AD. In order to test the efficacy of DNN models in AD classification, we deployed a collection of existing models and evaluated their overall performance in this study. The models we have implemented are LeNet [30], AlexNet [31], VGG-16 and VGG-19 [32], Inception-V1/V2/V3 [33], ResNet-50 [34], MobileNet-V1 [35], EfficientNet-B0 [36], Xception [37], and DenseNet-121 [38]. The motivation behind considering these DNN models includes the following: (a) LeNet has one of the simplest architectures that works effectively. (b) AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. (c) Apart from LeNet and AlexNet, all the other models are well recognized and are available in the Keras library for transfer learning.

The main contributions of this work can be summarized as follows:
• According to the results of the performance evaluation, all of the existing models achieved an average performance below 90%. It was also observed that, amongst all the models, LeNet and AlexNet train and test fastest because of their simple and effective architectures.
• The main aim of this work is to develop a lightweight hybrid model that can perform faster and better. We combined LeNet and AlexNet in parallel and proposed a new hybrid DNN architecture.
• Different convolutional kernel sizes may help a network to learn more crucial aspects, and mixing several features can improve feature representations [39]. Hence, in the proposed hybrid model, we replaced all the traditional large convolutional filters with a set of three small filters (1 × 1, 3 × 3, and 5 × 5).
• Better feature extraction improves the model's performance, and the model's average performance improved to 93.58%. Mathematically, it is shown that the proposed hybrid model has far fewer convolutional parameters (significantly fewer even than the regular AlexNet model), making it a computationally faster model.
• In comparison to all other deployed models, as well as the discussed state of the art, the proposed hybrid model achieved the most convincing performance.
The organization of the paper is as follows: (a) Section 2 discusses some of the recently published related state-of-the-art works. (b) Section 3 discusses and evaluates the performance of some existing DNN models for AD classification. (c) Section 4 discusses the proposed hybrid model and evaluates it using the same data set. (d) Section 5 presents the results and discussion. (e) Section 6 concludes the paper and discusses some future scope of work.

Related Study
In the diagnosis of Alzheimer's disease, ANN methods are becoming increasingly popular. One of the key reasons for their popularity is the ability to learn the best features from the data and thereby improve forecasting accuracy over time [40]. Some of the recently published state-of-the-art works are discussed in this section.
A ResNet-based model for brain shrinkage identification, which in turn helps in AD classification, is proposed in ref. [41]. Initially, a residual self-attention DNN is developed to increase the classification efficiency by integrating local, global, and spatial features from brain scans. To make the learned characteristics more interpretable, a gradient-based localisation class activation mapping (g-LCAM) procedure is developed. Finally, the authors propose an automatic classification approach based on sub-sequential training. The proposed 3D model is inspired by the original ResNet model. In order to achieve the most convincing results, the 3D g-LCAM is used in the model. The proposed model can classify AD/CN and progressive MCI (pMCI)/stable MCI (sMCI).
In a related work, a new broad learning system (BroLeS) based model for the categorization of AD is proposed in ref. [42]. The diagnosis method employs brain MRIs and leverages BroLeS and its convolutional advances to categorize several stages of Alzheimer's disease. The computational anatomy toolbox (CAT-12) is used to perform various preprocessing tasks. A new model is developed, known as the convolution feature-based cascade of enhancement nodes BroLeS (CCEBroLeS) based on processed data that aid in merging BroLeS variants. As a result, a new version is offered that incorporates both the CCEBroLeS and the BroLeS. For feature extraction, a multi-layer CNN is utilised. The model is based on the well-known VGG model.
An artificial neural network-based AD diagnosis model that can also predict the progression of the disease is proposed in ref. [43]. A 3D multi-information generative adversarial network (MulGAN) is used to determine brain changes as age progresses. A DenseNet-based model is built to classify the disease; it fundamentally optimises the localized degradation of the brain to predict Alzheimer's stages. The model incorporates a variety of factors, including age, gender, and so on. The voxel-based morphometry (VoBM) toolkit is used in pre-processing to perform skull removal and to split brain images into three sections (grey matter, white matter, and cerebrospinal fluid). The presented method can differentiate various phases of cognition, such as MCI vs. AD, MCI vs. CN, pMCI vs. sMCI, and so on. The model was also tested as a multi-class classifier and produced a positive result.

Taking brain atrophy as an important bio-marker, a new AD classification approach is introduced in ref. [44]. The model can be used for both atrophy identification and classification. To determine the most discriminatory regions, a cluster-based CNN is designed. Crucial characteristics are retrieved from the detected locations and utilised for training the model. Information from regional brain MRI slices is also used in the training of the model. A cell-based anatomy with each of the axially aligned images is created to obtain the approximate positions for extracting the features. A composite loss function is used to improve the results.
A CNN-based AD diagnostic system is proposed in ref. [45]. A CNN model is designed that incorporates the most relevant characteristics of the hippocampal lobes using T1-MR and FDG-PET data. No segmentation procedures are carried out. All image data are transformed into the same spatial space to prepare the data for training and testing. Rigid normalization is used to ensure that the cells from the same brain areas in both sources correspond. The original VGG model is taken as a reference while building the proposed model. CN vs. AD, CN vs. pMCI, and sMCI vs. pMCI individuals are classified using the proposed model.

A novel DNN model that can classify AD from DT images is proposed in ref. [46]. Preprocessing, such as normalisation, RoI separation, etc., is performed using the statistical parametric mapping software. After performing the segmentation, a separate volumetric measurement is performed for GM and WM. The proposed model is a combination of input, convolution, batch-normalization, activation, pooling, and dense layers, and a final classification layer. Five-fold cross-validation enhanced the training/testing performance of the model.
An AD classifier is developed using a fusion of CNN with the recurrent neural networks in a research work [47]. All 3D brain images are processed into a series of 2D slices. A mixture of convolutional and recurrent neural networks is used to fit the classifier appropriately on intra-inter characteristics. Slice-wise characteristics are adopted using CNNs, and multi-characteristics are adopted using RNNs' gated recurrent unit.
A DenseNet-based strategy for AD detection is proposed in ref. [48]. Some of the effective slices are used for further analysis from 3D MR data. In the proposed model, the concept of the DenseNet's Bottleneck is used. A channel factor is also used that takes into account three specific channels (RGB) from monocular MRIs. The M3d-Cam toolkit is combined with a guided gradient weighted class activation mapping (Grad-CAM) technique to improve imaging feature extraction. The procedure is known as attention mapping, and it aids in the discovery of undesired characteristics. All undesirable pixels are therefore eliminated using the proper processing methods.
An artificial neural network model for AD diagnosis using brain MR data is proposed in the research work of [49]. The 3D Slicer toolkit is used to separate the hippocampal lobes from the brain data. The surface of the voxels is then processed using a uniformity rectifying analysis based on Local-Entropy-Minimization-bi cubic Spline. Finally, the diagnoses are carried out using a CNN-based predictor. The input layer, convolutional layers, pooling layers, flatten layers, fully connected layers, and output class label make up the proposed CNN.
By using the concept of extreme learning, a novel CNN model for AD classification is proposed in ref. [50]. Cognitive control networks are classified using two distinct networks. Additionally, the concept of an enhanced extreme learning machine is also used. Employing extreme learning, the model is trained on elements of deep regional connectivity. Extreme learning is also used to assist the network in learning more about characteristics in the area. The Pearson correlation (PC) coefficient is used to construct the brain network. The suggested DNN is made up of convolutional layers, the ReLU activation function, pooling layers, fully connected layers, and decision layers.
For staging the AD spectrum, a RoI-CNN based classifier is proposed in a research article [51]. Patches of three orthogonal views of selected RoIs from cerebral regions are used to train a CNN model. From the brain images, the hippocampus, amygdala, and insulae regions are chosen as RoIs. The softmax activation function is used to predict the probability of the AD stages. The Gwangju Alzheimer and Related Dementia (GARD) data set is used. RoI-based data are used in a CNN for binary classifications. The classifiers are then grouped together for staging AD. A permutation test is performed to choose the specific 3 pairs of RoIs from 101 different RoIs in the data set.
From the discussion about various recently published state-of-the-art works, it is observed that the majority of the papers did not give a high priority to computational time. Moreover, the highest performance is reported as 90%. So to enhance both the computational and classification performance results, we propose a hybrid approach that generates fewer parameters to make it a light-weight model (discussed in Section 4).

Data and Tools
T1-weighted, MPRAGE MRI data are acquired from the online data set ADNI [52]. During the acquisition of data, a total of 150 subjects (CN: 50, MCI: 50, AD: 50) are considered.
Throughout ageing, the thickness and biological structures of the human brain change [53,54]. Our earlier studies [55,56] showed that the hippocampal size as well as the volume of grey matter (GM) varies with the individual's age. It is observed that, within a given category (CN, MCI, or AD), the average hippocampal and GM size/volume is higher in participants aged 60-69 years than in older participants (70+ years). Similarly, individuals aged 70-79 years have greater hippocampal/GM areas than those aged 80 years and above. As a consequence, all training and testing data are separated into multiple subgroups depending on patient age (60-69/70-79/80+ years) for better evaluation of the algorithms.
The actual number of training images was 5000. We used the Data-Gen process to generate a large amount of training data with several variables, such as rotation, mirror reflection, and so on. The total number of images surpassed 11,000. Table 1 shows how all of the data are organized.

Experimental Setup
The machine used in this work is configured with (a) 12 GB of RAM, (b) 500 GB of SSD storage, (c) 2 GB of graphics memory, and (d) an i7 processor. For the implementation, we used Python 3.0. We employed the "softmax" activation function, the "StochasticGradientDescent (SGD)" optimizer, and the "SparseCategoricalCrossEntropy (SCCE)" loss function for all of the models. The data are split into 32 batches and trained across 40 epochs.
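To make the roles of the three named components concrete, the following is a minimal plain-Python sketch, not the implementation used in this work. The function names, the sample logits, and the learning rate are illustrative assumptions; the source does not report a learning rate.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_cross_entropy(probs, label):
    """Loss for one sample; `label` is an integer class index, not one-hot."""
    return -math.log(probs[label])

def sgd_step(params, grads, lr=0.01):
    """One plain SGD update; lr is a hypothetical value."""
    return [p - lr * g for p, g in zip(params, grads)]

# Hypothetical scores for three classes (e.g. CN, MCI, AD).
probs = softmax([2.0, 1.0, 0.1])
loss = sparse_categorical_cross_entropy(probs, label=0)
```

The "sparse" variant of the loss takes the class index directly, which is convenient when labels are stored as integers rather than one-hot vectors.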

Pre-Processing
With the help of a radiologist from the North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences (NEIGRIHMS), we performed a pre-processing step to select an appropriate slice from the 3D images. Using the 3D Slicer software, we selected a slice of each image in which the hippocampus region can be visualised properly. The hippocampus is taken as the region of visual interest because it is the primarily affected region of the brain in AD. After obtaining the 2D images, we applied the skull-stripping operation.
The data acquired from the ADNI data set are not skull-free. In our study, the skull is unnecessary; hence, we performed the skull-stripping operation before training the models. Five frequently utilized segmentation strategies, namely region-growing, region splitting-merging, K-means clustering, histogram-based thresholding, and the fuzzy c-means method, are examined to separate the skulls more precisely [57]. As shown in Table 2, the histogram-based thresholding technique delivers a reasonable outcome [57]. As a result, a histogram-based technique is used to remove the skull. Figure 2 shows an example of an input and the corresponding output result (skull stripping).

Discussion about the Implemented DNN Models
Below is a brief overview of all the models that have been implemented.

LeNet
In 1989, Yann LeCun presented one of the simplest and most effective DNN architectures, with only 7 layers [30]. After the input, the layers are arranged as (1) conv layer, (2) pooling layer, (3) conv layer, (4) pooling layer, (5) fully connected layer, (6) fully connected layer, and (7) the output layer. In a CNN, the convolution layer is responsible for extracting the important features from the input data. By learning attributes from small sections of the input data, convolution preserves the correlation among neighbouring pixels. It is a computational process with two inputs: an image matrix and a filter/kernel. A sample convolutional operation can be presented as Equation (1).
In Equation (1), 'C' stands for the convolution output, 'A' stands for the input, 'B' stands for the kernel, and 'b' is the bias value. The matrices' rows and columns are denoted by 'x' and 'y'.
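As an illustration of the operation described above, the following plain-Python sketch slides a kernel over an input matrix and adds a bias. It assumes a stride of 1, no padding, and the cross-correlation form that deep learning libraries usually call convolution, i.e. C(x, y) = Σ_i Σ_j A(x + i, y + j) · B(i, j) + b. The function name `conv2d_valid` and the sample values are hypothetical.

```python
def conv2d_valid(A, B, b=0.0):
    """Slide kernel B over input A (stride 1, no padding) and add bias b."""
    kh, kw = len(B), len(B[0])
    out_h = len(A) - kh + 1
    out_w = len(A[0]) - kw + 1
    C = []
    for x in range(out_h):          # output row
        row = []
        for y in range(out_w):      # output column
            s = b
            for i in range(kh):
                for j in range(kw):
                    s += A[x + i][y + j] * B[i][j]
            row.append(s)
        C.append(row)
    return C

# Hypothetical 3 x 3 input and 2 x 2 kernel.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
B = [[1, 0],
     [0, -1]]
C = conv2d_valid(A, B, b=1.0)  # 2 x 2 output feature map
```

Each output value depends only on a small neighbourhood of the input, which is how the layer preserves the relationship between nearby pixels.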
The pooling operation involves sliding a 2D window across each channel of the feature map and aggregating the features that fall inside the window's coverage zone. Pooling layers reduce the dimensionality of the feature maps. As a result, both the number of parameters to train and the cost of processing in the network are reduced. Three types of pooling layers are used by different DNN models: max pooling, average pooling, and global pooling.
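The three pooling variants can be sketched in a few lines of plain Python. This is an illustrative example on a single-channel feature map; the function names, window size, stride, and sample values are assumptions made for the example.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Aggregate each size x size window of a 2D feature map."""
    out = []
    for x in range(0, len(fmap) - size + 1, stride):
        row = []
        for y in range(0, len(fmap[0]) - size + 1, stride):
            window = [fmap[x + i][y + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            else:  # average pooling
                row.append(sum(window) / len(window))
        out.append(row)
    return out

def global_average_pool(fmap):
    """Collapse a whole feature map into a single value."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)

# Hypothetical 4 x 4 feature map.
fmap = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [9, 2, 1, 0],
        [3, 4, 5, 6]]
max_pooled = pool2d(fmap, mode="max")
avg_pooled = pool2d(fmap, mode="avg")
global_pooled = global_average_pool(fmap)
```

With a 2 × 2 window and a stride of 2, the 4 × 4 map shrinks to 2 × 2, so the layers that follow process a quarter of the values.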

AlexNet
AlexNet was first introduced by Krizhevsky et al. in 2012 [58]. AlexNet, which uses an 8-layer network, won the ILSVRC 2012 by a large margin [59]. AlexNet and LeNet have nearly identical design ideas, yet they also differ substantially. Firstly, AlexNet is substantially larger than LeNet: it has 8 layers, comprising 5 convolutional layers and 2 dense layers, followed by an output layer. Secondly, AlexNet adopts the ReLU activation function instead of the sigmoid. This model demonstrated that learned features can outperform manually designed features, breaking the old paradigm in machine vision.

Inception-V1 and V2 and V3
Although deep CNN architectures, such as VGG-16 and VGG-19, can convincingly perform classification tasks, they sacrifice computation time [62]. Furthermore, such networks are prone to overfitting, and it is difficult to propagate gradient updates across the entire network. Lin et al. developed the notion of the inception module in 2014 to address these challenges [33]. The main goal of the inception block is to approximate an ideal local sparse structure. It allows several filter sizes to be applied to a single image block, rather than being limited to a single filter size, and the combination of all the outputs is forwarded to the next layer. Szegedy et al. then developed the architecture of Inception-V1 (also known as GoogLeNet) by borrowing the concept of the inception module [63]. Inception-V1 won the ILSVRC 2014. Including several inception modules, this model is designed with a total of 22 layers, and in each module a set of 1 × 1, 3 × 3, and 5 × 5 filters is used.
Although the performance of Inception-V1 is adequate, the topology has a flaw. The use of larger filters, such as 5 × 5, can cause the input dimensions to shrink by a large factor, possibly resulting in the loss of vital information [64]. To address this problem, the Inception-V2 framework was created, in which each 5 × 5 convolution is replaced by two 3 × 3 convolutions [65]. A further modification is the replacement of each n × n convolution with an n × 1 convolution followed by a 1 × n convolution, which makes the method computationally faster.
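The saving from both factorizations can be checked with simple arithmetic. The sketch below counts only weights (bias terms are ignored) and assumes, purely for illustration, that every layer has the same hypothetical number of input and output channels `c`.

```python
def conv_weights(kh, kw, channels_in, channels_out):
    """Number of weights in one convolution layer (bias terms ignored)."""
    return kh * kw * channels_in * channels_out

c = 64  # hypothetical channel width, identical for input and output

# One 5 x 5 convolution versus two stacked 3 x 3 convolutions,
# which cover the same 5 x 5 receptive field.
single_5x5 = conv_weights(5, 5, c, c)
two_3x3 = 2 * conv_weights(3, 3, c, c)

# One 3 x 3 convolution versus a 3 x 1 followed by a 1 x 3 convolution.
single_3x3 = conv_weights(3, 3, c, c)
asymmetric = conv_weights(3, 1, c, c) + conv_weights(1, 3, c, c)
```

Under these assumptions the two stacked 3 × 3 layers need 18/25 of the weights of the 5 × 5 layer, and the n × 1 plus 1 × n pair needs 2/n of the weights of the n × n layer (here 2/3).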
Inception-V3 was introduced by upgrading the earlier versions and adding some new concepts, including label smoothing, factorized 7 × 7 convolutions, use of the RMSProp optimizer, and so on [65]. Inception-V3 came in second place in the ILSVRC contest in 2015 [66].

ResNet-50
Although deep models, such as Inception, produce impressive results, as a network grows deeper its accuracy saturates and then degrades rapidly [34,67]. The notion of a residual block was developed to overcome this problem. The fundamental idea is to create a shortcut connection that bypasses one or more layers. The concept of residual blocks worked successfully, as ResNet won the ILSVRC 2015 [68]. ResNet-50 can be divided into five blocks, each comprising a collection of convolution and residual blocks.
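The shortcut idea can be shown with a toy example. The sketch below computes relu(F(v) + v), where F is a pair of small fully connected layers; dense layers stand in for the convolutions of a real ResNet block purely to keep the example short, and all names and values are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def residual_block(v, w1, b1, w2, b2):
    """Compute relu(F(v) + v), where F is two small dense layers."""
    f = dense(relu(dense(v, w1, b1)), w2, b2)
    return relu([fi + vi for fi, vi in zip(f, v)])  # shortcut adds the input

v = [1.0, 2.0]
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]

# With all-zero weights F(v) = 0, so the block passes its input through.
out = residual_block(v, zeros_w, zeros_b, zeros_w, zeros_b)
```

Because the block falls back to the identity when F contributes nothing, adding more blocks should not make the network worse than its shallower counterpart, which is the intuition behind bypassing layers.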

MobileNet-V1
MobileNet is well known for its use in lightweight applications [69]. The notion of depth-wise convolution is applied in this framework, which aids in reducing the number of less significant parameters [35]. The convolution is divided into two parts: first, a depth-wise convolutional layer filters the input, and then a 1 × 1 convolutional layer (also known as point-wise convolution) merges the filtered information to form new features.
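The effect of this split on the number of weights can be illustrated as follows. The layer sizes are hypothetical and bias terms are ignored; the sketch only compares a standard convolution with its depth-wise plus point-wise counterpart.

```python
def standard_conv_weights(k, c_in, c_out):
    """Every output channel has its own k x k filter over all input channels."""
    return k * k * c_in * c_out

def depthwise_separable_weights(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64     # hypothetical layer sizes
standard = standard_conv_weights(k, c_in, c_out)
separable = depthwise_separable_weights(k, c_in, c_out)
reduction = separable / standard   # equals 1/c_out + 1/k**2
```

For these sizes the separable layer uses roughly one eighth of the weights of the standard layer.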

EfficientNet-B0
Tan et al. introduced a novel model scaling strategy in 2019 that is built on a simple compound coefficient, which helps in scaling up networks in a more ordered manner [36]. Traditionally, scaling is performed along a single dimension (width, depth, or resolution), whereas EfficientNet scales all three together using a fixed set of scaling coefficients [70]. This technique is also called compound scaling (CS). If, for a given network, the depth, width, and resolution are given by $m = a^{\varphi}$, $n = b^{\varphi}$, and $o = c^{\varphi}$, respectively, where $\varphi$ represents the compound coefficient, then the mathematical expression of CS can be defined as Equation (2):

$$a \cdot b^{2} \cdot c^{2} \approx 2, \quad a \geq 1,\ b \geq 1,\ c \geq 1 \qquad (2)$$
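Compound scaling can be sketched numerically. In the formulation of Tan et al. [36], the base coefficients are constrained so that a · b² · c² ≈ 2, which means that raising φ by 1 roughly doubles the computational cost. The default values below (a = 1.2, b = 1.1, c = 1.15) are the ones reported in [36] for the EfficientNet baseline; the function names are illustrative.

```python
def compound_scale(phi, a=1.2, b=1.1, c=1.15):
    """Return the (depth, width, resolution) multipliers for coefficient phi."""
    depth = a ** phi
    width = b ** phi
    resolution = c ** phi
    return depth, width, resolution

def cost_growth_per_step(a=1.2, b=1.1, c=1.15):
    """Approximate factor by which cost grows when phi increases by 1."""
    return a * b ** 2 * c ** 2

baseline = compound_scale(0)   # phi = 0 leaves the network unchanged
scaled = compound_scale(2)     # hypothetical larger variant
growth = cost_growth_per_step()
```

Width and resolution enter the constraint squared because doubling either one roughly quadruples the number of operations, whereas doubling the depth only doubles it.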

Xception
A team at Google designed this CNN architecture based on the Inception network topologies, with the introduction of a new idea termed depth-wise separable convolution [37]. This convolution technique is simply a revised form of the depth-wise convolution. The operation starts with a 1 × 1 convolution and then proceeds to channel-wise spatial convolutions. The intermediate non-linearity of the inception module is removed in Xception by applying the depth-wise separable convolution.

DenseNet-121
In deeper models, the path from the input to the output, as well as the path that the gradient traverses in the opposite direction, can become so long that certain information is lost before it reaches its target [71]. DenseNet changed the way the layers communicate with one another. All layers in the network are directly connected to one another, and the idea of feature reuse is used to lower the overall number of parameters. Another difficulty with DNN models is maintaining the flow of information and gradients during learning. To address this problem, DenseNet gives all layers direct access to the gradients from the loss function [38].
We used several of the most common metrics, including accuracy, precision, recall, and F1-score, to evaluate performance. The average of all of these key metrics is computed. The overall evaluation is presented in Table 3. From Table 3, it can be noticed that the highest average performance is achieved by the DenseNet-121 model. However, this comes at the cost of a long execution time. LeNet and AlexNet, on the other hand, have the fastest execution times because of their simple and effective architectures. Our major goal is to design an effective architecture that can perform better and faster. We propose a hybrid architecture in which LeNet and AlexNet are combined. A detailed discussion of the architecture is given in Section 4.

Proposed Model for AD Classification
Building an ensemble of different DL models is a popular way to enhance classification performance. Effective examples of ensembles of different ML models for automatic sleep-arousal detection, attention classification, etc., can be found in refs. [72,73]. Taking the original LeNet and AlexNet architectures as a reference, we propose a new model in which all the layers of both models are combined in parallel. In addition, since our region of interest is the brain, which is not very large, and since different convolutional kernel sizes help a model to learn better [39], we replaced the large convolutional filters of the original architectures with a set of three small parallel filters (1 × 1, 3 × 3, 5 × 5). The architecture of the proposed model is presented in Figure 3. In Figure 3, the symbol '+' represents the concatenation of different layers. The average performance of the model is presented in Table 4.
One of the primary motivations for substituting the normal convolution layers is to speed up the model by using fewer parameters while extracting more diverse features. Instead of employing a large number of kernels with the same large filter size, we split them across three separate filter sizes. This step not only helped us obtain multiple kinds of features, but it also resulted in fewer parameters, allowing the model to run more efficiently. The mathematical advantages of adopting a set of convolution layers with small kernel sizes are addressed in the next paragraph.
Considering no padding and a stride of 1, if a convolution layer uses $M$ kernels of size $K_h \times K_w$ and is preceded by a layer with $N$ kernels, the total number of parameters $P$ generated in the current convolution layer can be represented by Equation (3):

$$P = \left((K_h \times K_w \times N) + 1\right) \times M \qquad (3)$$

where 1 is added as the bias term in each filter. The original LeNet architecture has a total of two convolution layers, with 6 filters of size 5 × 5 and 16 filters of size 5 × 5, respectively. If the input dimension is 256 × 256 × 3, the number of parameters generated by each of the convolution layers can be calculated as follows:

1. No. of parameters generated by first convolution layer = ((5 × 5 × 3) + 1) × 6 = 456.
2. No. of parameters generated by second convolution layer = ((5 × 5 × 6) + 1) × 16 = 2416.

Total parameters generated in LeNet by the two convolution layers = 2872.
Total parameters generated in AlexNet by the five convolution layers = 3,747,200.
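The totals above can be reproduced with a short script that applies P = ((K_h × K_w × N) + 1) × M layer by layer. The LeNet filter counts come from the text; the AlexNet configuration (96 filters of 11 × 11, 256 of 5 × 5, then 384, 384, and 256 of 3 × 3) is an assumption based on the standard AlexNet design, chosen here because it yields the stated total. The function names are illustrative.

```python
def conv_params(kh, kw, n_prev, m):
    """P = ((kh * kw * n_prev) + 1) * m; the +1 is each filter's bias."""
    return (kh * kw * n_prev + 1) * m

def total_conv_params(layers, in_channels=3):
    """layers: list of (kernel_size, n_filters) applied in sequence."""
    total, n_prev = 0, in_channels
    for k, m in layers:
        total += conv_params(k, k, n_prev, m)
        n_prev = m  # this layer's filters become the next layer's input depth
    return total

lenet = [(5, 6), (5, 16)]
# Assumed standard AlexNet convolution configuration (not listed in the text).
alexnet = [(11, 96), (5, 256), (3, 384), (3, 384), (3, 256)]

lenet_total = total_conv_params(lenet)
alexnet_total = total_conv_params(alexnet)
combined_total = lenet_total + alexnet_total
```

Note that the spatial input size (256 × 256) does not enter the count; only the kernel size, the input depth, and the number of filters do.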
If we add the original LeNet and AlexNet architectures together, we get a total of 3,750,072 convolutional parameters. Since this number of parameters is huge, it would make the hybrid model slow in execution. Hence, we introduced multiple small-sized kernels in place of the original convolution layers to gain a variety of features while also reducing the total number of parameters significantly. In the modified LeNet architecture, we divided the total number of kernels into three parts in each of the convolution layers. The modified convolution layers of LeNet have [(2, 1 × 1), (2, 3 × 3), (2, 5 × 5)] and [(6, 1 × 1), (6, 3 × 3), (6, 5 × 5)] filters. The number of parameters generated by each of the modified convolution layers can be calculated as follows:

1. No. of parameters generated by first convolution layer = [((1
2. No. of parameters generated by second convolution layer = [((1

Total parameters generated in the modified LeNet by the two convolution layers = 652.
Total parameters generated in the modified AlexNet by the five convolution layers = 1,445,380.
The proposed hybrid model (modified LeNet + modified AlexNet) generates 1,446,032 convolutional parameters, which is 2,304,040 fewer than the combination of the original architectures. It can be clearly observed that the proposed model generates significantly fewer convolutional parameters, resulting in a model that is lighter in weight and faster. It is also worth noting that the number of convolutional parameters in the hybrid model is much lower even than in the original AlexNet (a difference of 2,301,168 parameters), which makes the hybrid model faster than the original AlexNet model. The hybrid model not only runs faster than all other implemented models except the original LeNet, but it also has a better ability to classify AD. In Table 4, the average performance of the proposed hybrid model is presented.
All the performance results reported in this work are measured on test images. The performance evaluation metrics used in this work are accuracy, precision, recall, and F1 score. Accuracy is the percentage of correct predictions (true positives + true negatives) among all predictions. Precision is the percentage of predicted positive cases that are actually positive (true positives out of all predicted positives). Recall is the percentage of positive cases the classifier correctly predicted out of all the positive instances in the data. The F1-score is a measurement that combines recall and precision; it is the harmonic mean of the two.
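The four metrics, and their average as used for the overall performance figures, can be computed from the entries of a binary confusion matrix. The sketch below uses hypothetical counts; it is an illustration of the definitions, not the evaluation code of this work.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion counts.

    Assumes at least one predicted positive and one actual positive,
    so none of the denominators is zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts for one binary task (e.g. AD as the positive class).
acc, prec, rec, f1 = classification_metrics(tp=40, fp=10, fn=5, tn=45)
average_performance = (acc + prec + rec + f1) / 4
```

Because the harmonic mean is dominated by the smaller of its two inputs, a classifier cannot obtain a high F1-score by doing well on only one of precision and recall.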
From Table 4, it can be observed that the average performance of the proposed hybrid model, which is around 93.58%, is the maximum among all the implemented models. Moreover, the average time required per epoch in the proposed model is lower than most of the discussed models.
The proposed model is also tested for multi-class classification (CN vs. MCI vs. AD) using 5-fold cross-validation. The dataset is split into 5 folds, with examples assigned randomly to each fold. In the ith run (i = 1 to 5), the examples in the ith fold are used for testing and the examples in the remaining folds are used for training; predictions are then made on the ith testing fold. The performance observed is presented in Table 5. The best-performing confusion matrix for multi-class classification is shown in Figure 4. From Table 5, the standard deviation (σ) is determined as Equation (4):

$$\sigma = \sqrt{\frac{1}{T}\sum_{i=1}^{T}(x_i - \mu)^2} \qquad (4)$$

In Equation (4), $T$ is the population size (3), $x_i$ is the ith observed value, and $\mu$ is the mean. Mean performance ($\mu$) = 0.87; sum of squared differences $s = \sum_{i}(x_i - \mu)^2 = 0.0027$; variance $\sigma^2 = s/T = 0.0027/3 \approx 0.001$; standard deviation $\sigma = \sqrt{0.001} \approx 0.032$.
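The fold assignment and the population standard deviation described above can be sketched in plain Python. The function names, the random seed, and the sample size are hypothetical, and the sketch only produces index splits; model training and prediction are left out.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random assignment to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        yield train_idx, test_idx

def population_std(values):
    """sigma = sqrt(sum((x - mu)^2) / T), with T = len(values)."""
    mu = sum(values) / len(values)
    return (sum((x - mu) ** 2 for x in values) / len(values)) ** 0.5

# Hypothetical data set of 20 examples split into 5 folds.
splits = list(k_fold_indices(n_samples=20, k=5))
```

In each run the model would be trained on `train_idx`, evaluated on `test_idx`, and the per-run scores passed to `population_std` to summarise their spread.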

Results and Discussion
Some of the most commonly used DL models are implemented for AD classification using the same dataset and the same experimental setup. As presented in Table 3, the average performance results of all implemented models are observed. It is observed from Table 3 that LeNet and AlexNet are computationally faster (68 s/epoch, and 79 s/epoch) than all other implemented models. The idea of this work is to design a hybrid approach that can classify AD efficiently with less computational time. Hence, we combined LeNet and AlexNet with certain modifications.
The resulting hybrid model is tested for binary classification. As presented in Table 4, we measured its performance with widely used evaluation metrics: accuracy, precision, recall, and F1 score. According to the average values shown in Table 4, the hybrid model outperforms all implemented DL models in terms of performance (93.58%), and at 72 s/epoch it is faster than all of them except the original LeNet.
The same hybrid model is also tested for multi-class classification. For a more reliable performance analysis, we used 5-fold cross-validation. The multi-class classification performance is presented in Table 5, which shows that the proposed hybrid approach can be used effectively for multi-class classification as well.
We also compared the proposed approach with the recently published state-of-the-art works discussed earlier, using the performance figures reported by their authors. It is observed from Table 6 that the proposed hybrid approach outperforms all of the discussed state-of-the-art works.
Comparing Tables 3 and 6, it can be observed that, amongst all the discussed models and state-of-the-art works, the proposed hybrid approach classifies AD most effectively. From Table 6, it can be observed that, amongst all the implemented models, EfficientNet requires the most time for execution. The proposed hybrid approach requires approximately 72 s per epoch, which is the lowest among all implemented models (except the original LeNet).

Conclusions and Future Work
In this experimental work, we implemented 12 of the most commonly used DNN models. We tested the models on the same data, acquired from the online ADNI database. Three classes were considered for classification (CN, MCI, and AD), and all data were further grouped separately by age. From the implementation results, it was observed that DenseNet performed best but required considerable computational time. Although LeNet and AlexNet did not perform as well as DenseNet, their simple and effective architectures allow them to run significantly faster. Given the importance of computing time, we presented a new hybrid DNN model in which we integrated LeNet and AlexNet in parallel. On its own, however, this parallel combination would take longer to run because of its more complicated architecture and large number of convolutional operations. To address this, we replaced all of the typical convolution layers with a set of three small parallel convolution layers having 1 × 1, 3 × 3, and 5 × 5 filters, which also allowed the model to extract more significant features. Mathematically, the proposed hybrid model generated far fewer convolutional parameters than all of the other models presented (except LeNet), showing that it is one of the most lightweight models. We also discussed some recently published state-of-the-art works for comparison. From the experimental evaluation, it was observed that the average performance of the proposed hybrid model exceeded not only that of all the implemented models but also that of all the discussed state-of-the-art works. The proposed model's average performance is approximately 93.58%, and training takes around 72 s per epoch, which is faster than all the discussed models (except the original LeNet model).
This work demonstrates that large convolutional filters are not necessarily required to extract features for image classification with a DNN. More relevant features can be extracted by combining more than two small kernels, which results in significantly fewer parameters and reduces computing time.
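The parameter saving from small parallel kernels can be illustrated with a short calculation. This is a sketch under stated assumptions, not the layer configuration of the proposed model: the channel counts are hypothetical, the branch outputs are assumed to be concatenated, and the single large layer (96 filters of size 11 × 11 on a 3-channel input) is chosen only as a familiar AlexNet-style reference.

```python
def conv2d_params(kernel_size, in_channels, out_channels, bias=True):
    """Trainable parameters of one 2D convolution layer with square kernels."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def parallel_block_params(kernel_sizes, in_channels, out_channels_per_branch):
    """Total parameters of parallel conv branches whose outputs are concatenated."""
    return sum(
        conv2d_params(k, in_channels, out_channels_per_branch)
        for k in kernel_sizes
    )


# Hypothetical configuration: both variants produce 96 output channels.
single_large = conv2d_params(kernel_size=11, in_channels=3, out_channels=96)
parallel_small = parallel_block_params(
    kernel_sizes=(1, 3, 5), in_channels=3, out_channels_per_branch=32
)
```

Under these assumptions, the single 11 × 11 layer has 34,944 parameters, whereas the three parallel branches together have 3,456, roughly a tenth as many for the same number of output channels.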
Although the proposed DNN model can classify AD effectively, it can be improved in future work. Its performance may be further improved by adopting advanced DNN concepts, such as dense blocks, that can help the model cope with gradient loss. For better feature extraction, a GA-based approach may be used in the proposed model. Since lower-intensity pixels may also contain important information, a hybrid pooling layer (Min + Max pooling) may help the model capture more relevant features. In this work, only one dataset (ADNI) is used; in the future, more data from different databases can be acquired to compare and improve the results. The performance of the model can also be compared with more advanced DNN-based approaches. Only MRI is used in this work; in the future, other imaging modalities, such as CT and PET, can be acquired and tested with the model.