A3C-TL-GTO: Alzheimer Automatic Accurate Classification Using Transfer Learning and Artificial Gorilla Troops Optimizer

Alzheimer’s disease (AD) is a chronic disease that affects the elderly. There are many different types of dementia, but Alzheimer’s disease is one of the leading causes of death. AD is a chronic brain disorder that leads to problems with language, disorientation, memory loss, cognitive decline, mood or personality changes, impaired bodily functions, and ultimately death due to dementia. Unfortunately, no cure has yet been developed for it, and its causes remain unknown. Clinically, imaging tools can aid in the diagnosis, and deep learning has recently emerged as an important component of these tools. Deep learning requires little or no image preprocessing and can infer an optimal data representation from raw images without prior feature selection, resulting in a more objective and less biased process. The performance of a convolutional neural network (CNN) is primarily affected by the chosen hyperparameters and the dataset used. In this work, a deep learning model for classifying Alzheimer’s patients for early diagnosis is developed using transfer learning and optimized by the Artificial Gorilla Troops Optimizer. This study proposes the A3C-TL-GTO framework for MRI image classification and AD detection. A3C-TL-GTO is an empirical quantitative framework for accurate and automatic AD classification, developed and evaluated with the Alzheimer’s Dataset (four classes of images) and with magnetic resonance imaging (MRI) brain images from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), an online Alzheimer’s disease dataset. The proposed framework reduces the bias and variability that preprocessing steps and hyperparameter optimization introduce into the classifier model and dataset used. Our strategy, evaluated on MRIs, is easily adaptable to other imaging methods. According to our findings, the proposed framework is an excellent instrument for this task, with a significant potential advantage for patient care.
The experimental results demonstrate that the proposed framework achieves 96.65% accuracy on the Alzheimer’s Dataset and 96.25% accuracy on the ADNI dataset. Moreover, it demonstrates better accuracy than other state-of-the-art approaches.


Introduction
The prevalence of age-related diseases rises as people live longer, especially brain diseases, mostly neurodegenerative, such as Alzheimer's disease (AD) [1]. AD was named in 1907 after Alois Alzheimer, who described a fifty-year-old woman dying of advanced dementia after four years of rapid memory deterioration [2]. AD is an irreversible, progressive, and ultimately fatal brain degenerative disorder that affects middle-aged and older people. When the disease is discovered, most patients have already progressed to an advanced stage [3]. By destroying brain cells, AD gradually deteriorates memory, thinking abilities, and the ability to carry out even the most basic duties of daily life. Unfortunately, there is currently no curative treatment for AD. Thus, early detection is essential for treating cognitive losses effectively at the initial stage.
Various ailments are associated with aging, and AD is a major cause of dementia, also known as a major neurocognitive disorder, which mainly affects older people and poses the highest cost to society and healthcare budgets. The estimated annual cost of dementia is one trillion dollars, and it is expected to double by 2030 [4]. The World Health Organization (WHO) stated that dementia is a major societal concern, with more than 55 million people worldwide suffering from dementia, nearly 10 million new cases diagnosed each year, and 82 million cases expected in the next ten years [5]. Furthermore, the report [6] pointed out that, by 2050, patients with dementia will reach 152 million, with a patient being diagnosed with dementia every three seconds [7]. AD is a progressively developing disease and is considered the seventh leading cause of death in the USA, with 132,741 deaths in 2020 [8], which exceeds deaths from breast and prostate cancer combined [9]. In addition, AD, with its unknown causes, endangers the physical health of the elderly [7]. The aging of the world's population is increasing year by year [3]. For the first time in US history, older adults are projected to outnumber children (77 million) by 2034 as global aging speeds up. As a result, the incidence of AD will increase dramatically and become more challenging with this quickening of global population aging. Figure 1 reports the anticipated number of people above 65 with AD in the US population from 2020 to 2060 [10]. There are no viable therapy techniques or medications available for Alzheimer's disease at the moment. Therefore, the dementia diagnosis journey is often complex and involves long wait times [11]. On the other hand, AD treatments at early stages slow down the complications and maintain the residual brain functions. Therefore, early detection and intervention for this central nervous system degeneration are crucial to providing timely treatment to patients.
In this vein, a complete understanding of its biomarkers is essential to differentiate AD symptoms from normal aging symptoms and accordingly slow its progression. Indeed, many neurological disorders directly impact the brain, particularly the hippocampus, which is essential in forming memories [12], emotional control, and learning. Hippocampus damage has been linked to various neurological and psychiatric disorders, including AD [12]. Prolonged AD is linked to tissue loss in various brain regions [13]. The damage begins in the gray matter (GM) and progresses to the white matter (WM) before reaching the hippocampus [12]. Figure 2 shows the major signs and symptoms of dementia [5,11], which start with memory loss and end with death. Early on, AD manifests as a mild cognitive impairment (MCI) and gradually gets worse. MCI is a condition in which people have more memory problems than usual for their age, and it increases the risk that some older people will develop AD. Patients with mild Alzheimer's frequently present with getting lost, difficulty performing tasks, repetitive questioning, and behavioral changes. The disease progresses in stages, ranging from a moderate to a severe AD stage [14]. In the moderate AD stage, damage occurs in areas of the brain that control language, reasoning, and thought. As a result, memory loss worsens, and people have difficulty recognizing others. Severe Alzheimer's disease is distinguished by significant brain tissue shrinkage and plaques and tangles spread throughout the brain. Patients in this stage cannot communicate and must rely entirely on others for their care. The manual diagnosis of AD is based on recent developments in advanced neuroimaging techniques (such as magnetic resonance imaging (MRI), Computed Tomography (CT), and Positron Emission Tomography (PET)), manual feature extraction, and clinical evaluations.
MRI scans are the most commonly utilized method and have achieved unprecedented progress due to their non-invasive nature, high resolution, nonionizing radiation, and multidirectional imaging [15]. However, the brain structure is very complicated, and the imaging data involved are multi-modal and high-dimensional (the curse of dimensionality), making manual diagnosis time-consuming, error-prone, and tedious.
Recently, the rise of decision support systems based on medical imaging analysis has played a great role in developing intelligent diagnosis systems for AD that can identify the severity of the patient's disease and, therefore, keep AD in the initial stage. Furthermore, artificial intelligence and machine learning appear to be promising solutions that aid radiologists in AD diagnosis. Thus, the accurate classification of brain images in the different stages of the disease can be efficiently performed. However, AD diagnosis based on traditional machine learning algorithms faces challenges related to time and space complexity, statistical data distribution, convergence, and overfitting. Deep learning (DL) has recently been used in image classification to address these challenges and introduce an accurate medical image classification approach. The key elements of a successful DL model are the datasets used for training and testing, the design of the network, and the parameter and hyperparameter optimization [16]. Current deep learning approaches are effective in medical image evaluation as they do not require great effort for prior preprocessing and feature selection, resulting in a more objective and less biased process [17]. As a result, deep learning can efficiently classify brain images at various stages of the disease.
The main objective of this study is to propose the A3C-TL-GTO framework for MRI image classification and Alzheimer's disease detection. The proposed framework consists of four phases: (1) Acquisition Phase, (2) Preprocessing Phase, (3) Classification, Learning, and Optimization Phase, and (4) Population Updating Phase. The A3C-TL-GTO framework is based on transfer learning and the Artificial Gorilla Troops Optimizer (GTO). The main contributions of this study are:
• Introduce a novel Alzheimer's classification framework based on pretrained CNNs.
• Formulate the choice of CNN architecture for Alzheimer's brain MRI scans as an optimization problem handled by the Gorilla Troops Optimizer, one of the top-performing nature-inspired algorithms.
• Improve the performance of each pretrained model by optimizing the CNN and transfer learning hyperparameters with the GTO.
• Eliminate the need to manually configure hyperparameters, because the framework is adaptable.
• Report quite promising findings on standard performance measurements.
The paper is organized as follows: The background is introduced in Section 2. In Section 3, related work is reviewed. Section 4 describes the proposed A3C-TL-GTO framework and algorithms. Section 5 discusses the experiments and the results. Section 6 concludes the paper.

Background
Alzheimer's disease (AD) is a type of dementia that progresses over time and is among the many ailments associated with aging. Alzheimer's disease develops gradually over the years, and there is no cure. Older people are more prone to AD; early onset is rare [18]. AD is fatal if left untreated. Diagnosing AD at an early stage is imperative because existing treatments only slow the progression of symptoms [12,16]. One of the neurologists' most difficult issues is classifying Alzheimer's disease. Manual classification methods can be time-consuming and inaccurate. Because the brain is the most impacted region in AD [19], a precise classification framework based on a brain imaging dataset may deliver better results. Various research studies use different datasets to evaluate and compare their proposed methodology with other state-of-the-art research [2]; Figure 3 summarizes the characteristics of well-known AD datasets. Historically, basic scientific findings concerning neurological disorders have been hard to translate into effective treatments. Nevertheless, gathering and manipulating large datasets has become exponentially easier with big data. Multi-modal and multidimensional datasets, such as imaging and genomics analysis, are among these complicated datasets. Analytics become more challenging as datasets grow. Advanced statistical and mathematical algorithms based on machine learning, deep learning, and deep reinforcement learning are being used to tackle this formidable challenge. Computer-aided techniques and medical imaging are the most reliable means of detecting AD early [20,21]. In recent years, deep learning has received great success in the medical image field. As well as being used in medical image analysis, it has also gained wide attention for AD detection [22]. The AI learning model learns directly from the data, and as it is exposed to huge datasets and trained over time, it gets better.
The model can make predictions based on previously unknown data with this knowledge. AI learning models are classified into three types: a supervised model for structured and labeled data, an unsupervised model for unlabeled and unstructured data, and a semi-supervised model, which combines both.
Machine learning techniques, such as deep learning (DL), simulate the brain's functions to identify patterns that can be used to make more complex decisions. DL is the first choice for researchers because of its ability to draw information from even unstructured and unlabeled data [19]. In DL calculations, many nonlinear layers can be used to extract features, and each layer contributes to the depth of understanding of a system. A DL family member, the convolutional neural network (CNN), typically analyzes images without prior processing [21]. LeCun et al. introduced a deep CNN for document recognition in 1998 [23]. Machine learning has been used as a diagnostic tool by physicians in recent years, as it offers additional information [20,23].
Deep learning is predicted to be the future of artificial intelligence, but it requires enormous amounts of data, and when feature spaces change, algorithms must be rebuilt to address new problems. In previous studies, the network was generally built from the ground up, which is rarely achievable: the training process is time-consuming, labor-intensive, and ineffective, it is inappropriate for small radiology datasets, and overfitting is prevalent during training [18,24]. Because transfer learning is much faster and more effective than traditional learning, using pretrained networks, such as AlexNet, to identify images changed the significance of DL networks in the long term. Deep learning layers transferred between datasets could be an interesting research topic for various tasks, and meta-learning may achieve higher reuse levels in the future. Despite the difficulty of the process, researchers can use a variety of internet databases and software packages to identify AD. The depth model can be implemented using Matlab, Keras, Tensorflow, Torch, and other software programs.
Because deep learning models outperform traditional models only on large datasets, the methods described above are less reliable when applied to clinical cases, where data are often limited. In addition, the models above depend on standard parameters. The chosen hyperparameters and datasets significantly influence CNN performance [25]. Hyperparameters differ from model weights: the former are determined before training, whereas the latter are determined during training. Hyperparameters can be adjusted in several ways [25]. A poor choice of hyperparameters can negatively impact the performance of an application [26]. Therefore, hyperparameter values are selected through an optimization process [25] instead of being chosen randomly for each application.
A proposed framework will typically include numerous layers, intermediate processing elements, and other structural features, necessitating the use of search metaheuristics to find these hyperparameters. Metaheuristic algorithms provide accurate and robust solutions to nonlinear, multidimensional optimization problems. Most metaheuristics are derived from natural organisms and are used to solve optimization problems [27]. Furthermore, because metaheuristics use a black-box approach, they are highly flexible, simple to use, and not reliant on gradient information. Regardless of structural characteristics, metaheuristic methods begin with random trial solutions within their limitations. Algorithm-specific equations then iteratively evolve candidate solutions until the termination condition is satisfied. As a result, various optimization algorithms can propose varying degrees of solution improvement [28]. Evolution, physics, and swarms are three commonly used metaheuristic algorithm types [29]. The swarm algorithm simulates a population's social behavior. Since the early 1990s, various swarm-based optimization algorithms, such as particle swarm optimization (PSO) and ant colony optimization (ACO), have been developed. Swarm intelligence algorithms include the firefly, grey wolf, sparrow, whale optimization, and artificial bee colony algorithms.
The Artificial Gorilla Troops Optimizer (GTO) is a new algorithm based on gorillas' natural behaviors. In 2021, Abdollahzadeh et al. proposed the GTO, which mimics gorillas' social behavior and movement [27,30]. A troop of gorillas consists of a silverback, a group of several females and their offspring, and some other males. The silverback gets its name from the silver hair that emerges on its back during puberty [27,31]; a silverback is typically more than 12 years old. The group's attention centers on the silverback. However, it is not only the one who makes all the decisions: it also mediates fights, determines the group's movements, guides the gorillas to food sources, and is responsible for their safety and well-being. Blackbacks are young male gorillas who serve as backup guardians for the silverback; they are between the ages of 8 and 12, and their backs are free of silver hairs. Gorillas, both male and female, move away from their birthplaces. Normally, gorillas migrate to new groups. A male gorilla, on the other hand, is more likely to leave his group and start a new one by wooing a female gorilla who has traveled outside. Some male gorillas, however, stay in their birth group and may eventually become silverbacks themselves. If the silverback dies, certain gorillas may strive to dominate the group or stand and fight the silverback's successor to attain their objectives [32].
The accuracy and efficiency of the GTO have been demonstrated [31]. The optimizer is simple to apply to engineering problems and does not require many adjustments [27]. Furthermore, the GTO algorithm can produce good results for a wide range of system dimensions through its increased search capabilities. Other optimizers' performance drops significantly as the dimensions increase, giving the GTO a significant advantage across all comparable dimensions [32]. Gorillas cannot live alone due to their group-living preferences; they hunt for food in groups and are led by a silverback in charge of group choices. In this algorithm, the silverback is regarded as the best solution, and every candidate gorilla tends to approach it, while the weakest gorilla, being the worst solution, is excluded.
In this algorithm, gorillas are denoted as X, while silverbacks are denoted as GX. For example, consider a gorilla on the hunt for better food sources. As a result, the iteration process generates GX each time and exchanges it with another solution if a better value can be determined [30]. The GTO flowchart is shown in Figure 4. This algorithm is also divided into two phases as follows.

Exploration Phase
In the GTO algorithm, silverback gorillas are the best candidate solutions at each optimization step, and all gorillas are regarded as potential solutions. Exploration is carried out with three operators. The first is migration to an unknown place, which expands the GTO exploration further. The second balances exploration and exploitation by moving toward other gorillas. The third, migration toward a known site, lets the GTO explore different optimization spaces more effectively during the exploration phase.
The migration mechanism is selected using a parameter named p. Before conducting the optimization procedure [30], the factor p, in the range 0-1, must be specified to determine the likelihood of adopting a migration strategy to an unidentified location. The first mechanism is selected when rand < p. Otherwise, if rand ≥ 0.5, the mechanism of approaching other gorillas is chosen, and if rand < 0.5, a movement to a well-known site is chosen; each can deliver a good performance to the algorithm based on the strategies used. At the end of the exploration phase, all of the solutions are evaluated, and if GX(t) has a lower cost than X(t), GX(t) replaces X(t) and becomes the silverback. Equations (7)-(9) in Section 4.3.3 summarize the three mechanisms [27].
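The selection logic above can be sketched as a small helper. This is a hedged illustration of the branching only (the function name is ours; the actual GTO implementation draws a fresh rand value each iteration and then applies Equations (7)-(9)):

```python
def select_exploration_mechanism(rand_value, p):
    """Pick one of the three GTO exploration operators.

    rand_value: a uniform random draw in [0, 1]
    p: migration-probability parameter, fixed before the optimization run
    """
    if rand_value < p:
        return "migrate to an unknown location"   # operator 1
    elif rand_value >= 0.5:
        return "move toward other gorillas"       # operator 2
    else:
        return "migrate to a known location"      # operator 3
```

For example, with the commonly used small p, most draws fall into the second or third branch, so migration to unknown locations stays rare.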

Exploitation Phase
Two mechanisms are used during this phase. The first mechanism is "follow the silverback", while the second is "adult female competition". The decision is made by comparing the value of D with the random number W chosen at the start of the optimization procedure [27].
The silverback, the newly established group's leader, is a young and fit gorilla whom the other male gorillas closely follow; they obey the silverback's orders to find food and travel to various locations, and members of a group can also influence each other's movements. The silverback directs the gorillas to food-supply locations, and this "follow the silverback" strategy is used when D ≥ W. When young gorillas reach maturity, they struggle with other adult gorillas for the right to choose females for their group, a frequently violent process; this "adult female competition" strategy is used when D < W. If GX(t) has a lower cost than X(t), GX(t) replaces X(t) and becomes the best solution (silverback) [30].
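Putting the two phases together, the overall greedy-replacement loop can be sketched on a toy objective. This is a simplified illustration, not the full GTO: the candidate-generation step below is a generic move toward the silverback with a small perturbation rather than the exact operators of Equations (7)-(9), and the sphere function stands in for the classifier's validation error:

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(x):
    # Toy objective (sphere function); in the framework this would be
    # the error of a classifier trained with the decoded hyperparameters.
    return float(np.sum(x ** 2))

N, D, T_max = 10, 5, 50
X = rng.uniform(-1.0, 1.0, size=(N, D))        # gorilla positions (candidate solutions)
fitness = np.array([cost(x) for x in X])
silverback = X[np.argmin(fitness)].copy()      # best solution so far
initial_best = float(fitness.min())

for t in range(T_max):
    for i in range(N):
        # Candidate position GX(t): a generic move toward the silverback
        # plus noise (stands in for the full GTO exploration/exploitation).
        GX = X[i] + rng.uniform(0.0, 1.0, D) * (silverback - X[i]) \
             + 0.1 * rng.normal(size=D)
        if cost(GX) < fitness[i]:              # greedy replacement
            X[i], fitness[i] = GX, cost(GX)
    silverback = X[np.argmin(fitness)].copy()  # promote the new best

best_cost = float(fitness.min())
```

The greedy replacement guarantees the best cost never worsens across iterations, which is the property the GTO relies on when promoting a new silverback.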

Related Studies
Recently, many researchers have studied machine learning in the medical field. Finding a more accurate and efficient method for diagnosing and predicting AD is a hot research topic [14]. Deep learning has great potential in diagnosing AD based on imaging or molecular data. This section explores the current state of the art that uses deep learning architectures for AD diagnosis and prediction.
Islam and Zhang [33] developed a DCNN model for AD four-class classification based on MRI images. The Inception-V4 model was trained and tested on the OASIS dataset. The proposed model achieved an accuracy of 73.75%. However, the proposed model suffered from a limited dataset and low accuracy. Zhang et al. [34] introduced an extreme learning machine (ELM) model for binary AD classification. First, manually segmented Voxel-based Morphometry images from the ADNI database of 627 patients were used. Then, feature calculation, simple feature extraction, and classification were performed using the ELM model. Ten-fold cross-validation was performed to ensure the ELM model's validity, which achieved an accuracy of 96%. However, its major drawbacks are dataset limitation and poor feature selection.
Martinez et al. [35] studied applying deep learning to discover the relationship between symptoms, tests, and feature extraction using Convolutional Autoencoders (CAEs). This study began with data acquisition from three data sources: MRI from the ADNI database, data obtained via the Alzheimer's Disease Assessment Scale (ADAS), and the Clinical Dementia Ratio (CDR-SB). After data preprocessing, CAEs were used for feature extraction and manifold modeling and achieved a classification accuracy of 85%. Saratxaga et al. [2] developed a deep learning-based approach for AD multi-class classification. They used 305 MRI images from the OASIS database and CDR clinical annotation. They used different pretrained architectures, and the ResNet achieved the best results with an accuracy of 93%. Raees et al. [36] introduced a light DL classification and feature extraction approach. They deployed different pretrained models to build a trinary classifier. Functional MRI (fMRI) images retrieved from the ADNI database were used for training and testing. The VGG19 achieved the highest accuracy of 90%. Buvaneswari and Gayathri [37] introduced a segmentation, feature extraction, and classification approach based on deep learning. From the ADNI, 240 sMRI images with SegNet were used to train the ResNet-10 architecture for classification. The proposed approach recorded an accuracy of 95%.
Katabathula et al. [38] developed a lightweight 3D DenseCNN2 model for AD classification. The DenseCNN2 was built on the global shape and visual hippocampus segmentation. Their proposed model was trained and tested with 933 sMRI images obtained from the ADNI. The DenseCNN2 model achieved a classification accuracy of 92.52%. Mahendran and Vincent [12] developed a feature selection and classification approach for AD. They used a DNA methylation dataset that consisted of 68 records. First, preprocessing was performed to improve the classification performance. The feature selection was then applied using Ada Boost, Random Forest, and SVM to select useful genes. An Enhanced Deep Recurrent Neural Network (EDRNN) model was used for classification. They used the Bayesian optimization technique with five-fold cross-validation for hyperparameter optimization. The approach achieved an accuracy of 87% with the Ada Boost. Zhang et al. [39] introduced an effective CNN-based framework based on T1-weighted structural MRI (sMRI) images from the ADNI. Data preprocessing was performed using conventional procedures. An improved framework tresnet of a residual network was used for classification. The proposed method achieved a classification accuracy of 90%.
Liu et al. [15] developed a multi-scale CNN with a channel attention mechanism for enhanced AD diagnosis. They used preprocessing and segmentation to obtain the WM and GM datasets and model training. They extracted multi-scale features and fused them between channels to obtain more comprehensive information. ResNet-50 was used and achieved an accuracy of 92.59%. The CLSIndRNN model for AD feature selection and classification was introduced in [9] using the ADNI dataset, which contains 805 samples of MRI images. A recurrent neural network regression was used to predict the early diagnosis clinical score. Image preprocessing, feature selection, and classification techniques proved the effectiveness of the proposed model in clinical score prediction. Shanmugam et al. [16] introduced a transfer learning-based approach for multi-class detection of cognitive impairment stages and AD. They used GoogLeNet, AlexNet, and ResNet-18 networks trained and tested on 6000 MRI ADNI images. The ResNet-18 network achieved the highest classification accuracy of 98.63%. Kong et al. [3] developed a deep learning-based strategy that involved a novel MRI and PET image fusion and 3D CNN for AD multi-classification methods. The ADNI dataset of 740 3D images was used. The proposed strategy achieved an accuracy of 93.5%. A study [40] applied network architecture and hyperparameter optimization based on a Genetic Algorithm. They used an amyloid brain image dataset that contains PET/CT images of 414 patients. The proposed algorithm achieved a classification accuracy of 81.74%. A TL-based approach for Alzheimer's diagnosis based on sagittal MRI (sMRI) was introduced in [13]. The authors used the ADNI and OASIS datasets and concluded that sMRI can be used effectively to differentiate AD stages and that TL is necessary for completing the task.
Helaly et al. [4] developed a deep learning-based framework for the early multi-classification of AD named the E2AD2C framework. The E2AD2C framework consists of six stages: Data Acquisition, Preprocessing, Data Augmentation, Classification, Evaluation, and Application. For classification, they used two architectures: (1) light CNN architectures and (2) a transfer learning-based architecture. The ADNI dataset for 300 patients divided into four classes was used. The E2AD2C framework achieved accuracies of 93.61% and 95.17% for 2D and 3D multi-class AD stage classifications. In addition, an accuracy of 97% was recorded via the VGG19 model. The same authors then developed a deep learning-based framework for hippocampus segmentation [41] using the U-Net architecture. This framework consists of four stages: data acquisition, preprocessing, data augmentation, and segmentation. The segmentation step was performed via two architectures: (1) the U-Net architecture with hyperparameter tuning and (2) a ResNet-based pretrained U-Net. They achieved an accuracy of 97% using the ADNI dataset. Andrea [17] developed an automatic deep-ensemble approach for AD classification. They used MRI and fMRI images from the Kaggle, OASIS, and ADNI datasets. They evaluated AlexNet, ResNet-50, ResNet-101, GoogLeNet, and Inception-ResNet-v2 architectures. The proposed approach achieved 98.51% and 98.67% accuracy in binary and multi-class classification. Serkan [17] used different pretrained CNN architectures for the trinary classification of AD. A total of 2182 T1-weighted sMRI images were used from the ADNI database. After data acquisition, preprocessing was performed in three steps. For data analysis, he used DL architectures created with the CNN algorithm. The EfficientNetB0 model achieved the best accuracy of 92.98%.
CNN and deep learning-based approaches have been widely studied as a key methodology for AD diagnosis. However, there are still challenges, such as the MRI image complexity, CNN-based methods that cannot analyze MRI images on the deep structure, the empirical design of DL technologies, limited datasets, time and space complexity, inaccuracy, and large model parameters and hyperparameter optimization.

Methodology
The main objective of this study is to introduce a novel framework for automatic and accurate classification of Alzheimer's based on MRI images with the help of transfer learning and the Artificial Gorilla Troops Optimizer (GTO). The framework is called A3C-TL-GTO. Figure 5 depicts the different framework stages. The stages and processes are discussed in the following subsections.

Data Acquisition
The datasets can be retrieved from different sources, such as online repositories. The current study retrieves the datasets from Kaggle and IDA (Image and Data Archive by LONI). In addition, the experiments are performed on two datasets named Alzheimer's Dataset (4 class of Images) and Alzheimer's Disease Neuroimaging Initiative (ADNI).
Alzheimer's Dataset (4 class of Images): This dataset consists of MRI images that are hand-collected from different verified websites [42]. It is partitioned into four classes: Mild Demented, Moderate Demented, Non-Demented, and Very Mild Demented. It consists of 6400 images. The dataset can be retrieved from [42].
Alzheimer's Disease Neuroimaging Initiative (ADNI): The DICOM data are downloaded from LONI. The current study focused on the MRI T2-weighted axial cases. The data are partitioned into three classes: AD (Alzheimer's), NC (Normal Cohort), and MCI (Mild Cognitive Impairment), with 17,976, 138,105, and 70,076 images, respectively [43]. The dataset can be retrieved from http://adni.loni.usc.edu/ and https://ida.loni.usc.edu/ (accessed on 1 February 2022). Figure 6 shows samples from each dataset. It shows the "Alzheimer's Dataset (4 class of Images)" dataset with its four categories in the first row and the "Alzheimer's Disease Neuroimaging Initiative (ADNI)" dataset with its three categories in the second row.

Data Preprocessing
The second stage focuses on preprocessing the datasets by applying six processes: data conversion and cleaning, data resizing, categories encoding, data scaling, train-to-test splitting, and dataset balancing.

Data Conversion and Cleaning
The ADNI dataset is subjected to a cleaning process in which noisy images are discarded, as shown in Figure 7. In this process, the DICOM records are converted to images, the signal-to-noise ratio (SNR) values are calculated, and images whose SNR falls below a threshold of 1.15 are removed.
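A hedged sketch of the cleaning step, assuming SNR is estimated as the mean intensity divided by its standard deviation (the paper does not spell out its exact SNR formula, and the DICOM-to-image conversion, e.g., via pydicom, is omitted; `snr` and `drop_noisy` are illustrative helper names):

```python
import numpy as np

SNR_THRESHOLD = 1.15

def snr(image):
    """Estimate SNR as mean intensity over its standard deviation
    (one common definition; an assumption, not the paper's formula)."""
    image = np.asarray(image, dtype=np.float64)
    sigma = image.std()
    return image.mean() / sigma if sigma > 0 else float("inf")

def drop_noisy(images, threshold=SNR_THRESHOLD):
    """Keep only images whose SNR is at or above the threshold."""
    return [img for img in images if snr(img) >= threshold]
```

An image with strong random variation relative to its mean intensity falls below the 1.15 threshold and is discarded.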

Data Resizing
The images in the target dataset have various dimensions; hence, equalizing their dimensions (i.e., width and height) is required. The current study resizes each image to (128, 128, 3) using bicubic interpolation in the RGB color mode.
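A minimal sketch of this step using Pillow, which provides bicubic resampling (`resize_image` is an illustrative helper name, not a function from the paper):

```python
import numpy as np
from PIL import Image

TARGET_SIZE = (128, 128)

def resize_image(array):
    """Resize an image array to 128x128 RGB using bicubic interpolation."""
    img = Image.fromarray(np.asarray(array, dtype=np.uint8))
    img = img.convert("RGB").resize(TARGET_SIZE, Image.BICUBIC)
    return np.asarray(img)
```

Grayscale inputs are converted to three-channel RGB so every image ends up with the (128, 128, 3) shape the network expects.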

Categories Encoding
The categories are encoded and converted to numeric values. This process is run on the two used datasets. For example, the ADNI categories (i.e., NC, MCI, and AD) are converted to [0, 1, 2].
Data Scaling

The images are scaled using per-image standardization, X_output = (X − µ)/σ, where X is the input image, X_output is the scaled image, µ is the image mean, and σ is the image standard deviation.
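The label encoding and the per-image standardization described above can be sketched together; the NC/MCI/AD-to-[0, 1, 2] mapping comes from the text, while the helper names are illustrative:

```python
import numpy as np

# Categories encoding: ADNI category names mapped to numeric values as in the text.
CLASS_MAP = {"NC": 0, "MCI": 1, "AD": 2}

def encode_labels(labels):
    """Convert category names to their numeric codes."""
    return [CLASS_MAP[label] for label in labels]

def standardize(image: np.ndarray) -> np.ndarray:
    """Scale an image by subtracting its mean and dividing by its standard deviation."""
    return (image - image.mean()) / image.std()

encoded = encode_labels(["NC", "AD", "MCI"])          # -> [0, 2, 1]
scaled = standardize(np.arange(16, dtype=float).reshape(4, 4))  # mean 0, std 1
```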

Train-To-Test Splitting
The two datasets are split into training, validation, and testing subsets: 85% of the images are used for training (and validation), and the remaining 15% are used for testing.
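An 85/15 split like the one above can be done with a shuffled index permutation; a minimal sketch (the seed and function name are assumptions, not from the paper):

```python
import numpy as np

def train_test_split_85_15(images, labels, seed=42):
    """Shuffle, then hold out 15% of the samples for testing (85% for training/validation)."""
    n = len(images)
    order = np.random.default_rng(seed).permutation(n)
    cut = int(round(n * 0.85))
    train_idx, test_idx = order[:cut], order[cut:]
    return ([images[i] for i in train_idx], [labels[i] for i in train_idx],
            [images[i] for i in test_idx], [labels[i] for i in test_idx])

imgs = list(range(100))
labs = [i % 4 for i in imgs]
Xtr, ytr, Xte, yte = train_test_split_85_15(imgs, labs)  # 85 train, 15 test
```

In practice the training portion would be split again into training and validation subsets with the same mechanism.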

Dataset Balancing
When one category contains more records than another, the model learns to extract the features of the majority class better than those of the minority classes. Hence, data balancing is required to overcome this issue. The current study balances the datasets during the training process using data augmentation, which can be applied with different techniques, including GANs [44].
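Balancing by augmentation can be sketched as oversampling the minority class with randomly transformed copies. This toy version uses only flips and small shifts in plain NumPy; the paper's actual augmentation set (rotation, shear, zoom, brightness, etc.) is richer, so treat this as an assumption-laden illustration:

```python
import numpy as np

def augment(image, rng):
    """Generate one randomly transformed copy (flips and a width shift only, in this sketch)."""
    out = image.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)           # horizontal flip
    if rng.random() < 0.5:
        out = np.flipud(out)           # vertical flip
    out = np.roll(out, int(rng.integers(-4, 5)), axis=1)  # small width shift
    return out

def balance_class(images, target_count, seed=0):
    """Oversample a minority class with augmented copies until it reaches target_count."""
    rng = np.random.default_rng(seed)
    out = list(images)
    while len(out) < target_count:
        out.append(augment(images[rng.integers(len(images))], rng))
    return out

minority = [np.random.rand(32, 32) for _ in range(10)]
balanced = balance_class(minority, target_count=25)  # grown from 10 to 25 samples
```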

Classification, Learning, and Optimization Phase
After preprocessing the datasets and generating the initial population, the learning phase begins. This phase utilizes the GTO metaheuristic optimizer to optimize the different transfer learning hyperparameters, such as whether data augmentation is applied and the batch size. The approach is to find the best hyperparameter configuration for each pretrained transfer learning model used. This stage comprises three processes, summarized in Algorithm 1 and Figure 5. As presented there, the first process runs only once, while the other two processes run repeatedly for a number of iterations Tmax.

Algorithm 1: The overall hyperparameter optimization process (summarized).

Initial Population Generation
The population is randomly generated once at the beginning of the optimization process. The number of solutions in the population is set to Nmax. Each solution is a vector of size 1 × D whose elements lie in [0, 1]. The value of D is determined by the number of hyperparameters; in the current study, it is set to 16. Equation (5) shows the population initialization process:

X = LB + rand × (UB − LB)    (5)

where X denotes the matrix of all population solutions, LB is the lower-boundaries vector, UB is the upper-boundaries vector, and rand is a vector of random values in [0, 1].
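The initialization can be written directly from the definitions above; here LB = 0 and UB = 1 since every solution element lies in [0, 1], and the population size is an illustrative value:

```python
import numpy as np

N_MAX, D = 10, 16                 # population size (illustrative) and D = 16 from the text
LB, UB = np.zeros(D), np.ones(D)  # per-element lower and upper boundaries

rng = np.random.default_rng(7)
# Equation (5): X = LB + rand * (UB - LB), one row per candidate solution
X = LB + rng.random((N_MAX, D)) * (UB - LB)
```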

Fitness Function Calculation
In the current step, the fitness function score is evaluated for each solution. As described earlier, each solution consists of random floating-point numbers in [0, 1]; hence, each element must be converted (i.e., mapped) to the corresponding hyperparameter, as defined in Table 1 (which includes, among others, the data augmentation flag and, when augmentation is applied, the rotation, width shift, height shift, shear, zoom, horizontal flipping, vertical flipping, and brightness change values). To illustrate the mapping mechanism, assume that the batch size (i.e., the second element of the solution) must be mapped to a corresponding hyperparameter value. First, the allowed batch size range must be determined: the current study uses the "4 → 48 (step = 4)" range, so there are 12 possibilities. With a simple calculation (Equation (6)), the index can be determined. For example, if the random value is 0.75 and there are 12 possibilities, the index is 9, corresponding to a batch size of 36. The ranges of all hyperparameters are presented in Table 2.
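The mapping can be reconstructed from the worked example: with 12 options and r = 0.75, the (1-based) index is 9 and the batch size is 36. A ceiling of r × N is one formula consistent with that example (the paper's exact Equation (6) is not reproduced here, so this is an assumption):

```python
import math

def map_to_choice(r: float, n_options: int) -> int:
    """Map a random value r in [0, 1] to a 1-based option index (Equation (6) style).

    Assumed form: index = ceil(r * n_options), clamped to at least 1 so r = 0 is valid.
    """
    return max(1, math.ceil(r * n_options))

def map_batch_size(r: float, start: int = 4, stop: int = 48, step: int = 4) -> int:
    """Map a solution element to a batch size from the '4 -> 48 (step = 4)' range."""
    options = list(range(start, stop + 1, step))      # 4, 8, ..., 48 -> 12 possibilities
    return options[map_to_choice(r, len(options)) - 1]

bs = map_batch_size(0.75)  # index 9 of 12 -> batch size 36, matching the worked example
```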
After mapping each element in the solution to the corresponding hyperparameter, the target pretrained transfer learning model is compiled with these hyperparameters. DenseNet201, MobileNet, MobileNetV2, MobileNetV3Small, MobileNetV3Large, VGG16, VGG19, and Xception with the "ImageNet" pretrained weights are the utilized pretrained transfer learning CNN models. Each model begins the learning process on the split subsets for a number of epochs, set to 5 in the current study. To validate its generalization, each model is then evaluated on the entire input dataset.
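As a sketch of this step, a mapped solution can be turned into a training configuration before being handed to the deep learning framework. The helper below is hypothetical (the paper does not show such a function); in practice its output would feed a Keras-style `model.compile()`/`model.fit()` call on the chosen pretrained model:

```python
# The eight pretrained models and the 5-epoch budget come from the text;
# everything else in this helper is an illustrative assumption.
PRETRAINED_MODELS = ["DenseNet201", "MobileNet", "MobileNetV2", "MobileNetV3Small",
                     "MobileNetV3Large", "VGG16", "VGG19", "Xception"]

def build_training_config(model_name, loss, optimizer, batch_size, use_augmentation):
    """Bundle mapped hyperparameters into one configuration for a single fitness evaluation."""
    if model_name not in PRETRAINED_MODELS:
        raise ValueError(f"unknown model: {model_name}")
    return {"model": model_name, "weights": "imagenet", "loss": loss,
            "optimizer": optimizer, "batch_size": batch_size,
            "augment": use_augmentation, "epochs": 5}  # 5 epochs per the text

cfg = build_training_config("MobileNet", "categorical_crossentropy", "SGD", 36, True)
```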
The performance metrics utilized in the current study are accuracy, F1-score, precision, recall (or sensitivity), and specificity.
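These metrics follow directly from the TP, TN, FP, and FN counts reported later (Tables 3 and 6); their standard definitions can be computed as:

```python
def metrics_from_confusion(tp, tn, fp, fn):
    """Standard classification metrics from the confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Example with made-up counts: 90 TP, 85 TN, 15 FP, 10 FN out of 200 samples.
m = metrics_from_confusion(tp=90, tn=85, fp=15, fn=10)
```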

Population Updating
The population is arranged in descending order of fitness score. As a result, the best solution is at the top and the worst solution is at the bottom. This ordering is crucial for determining Xbest(t) and Xworst(t) when they are required in the population updating process. The current study utilizes the GTO metaheuristic optimizer to determine the best hyperparameters for each CNN model.
The GTO relies on (1) three exploration mechanisms, (2) an exploitation mechanism, and (3) a competition-for-adult-females mechanism. Equation (7) represents the expanded exploration process, Equation (8) represents the exploitation mechanism, and Equation (9) represents the competition-for-adult-females mechanism, where r1, r2, and r3 are three random values, Xr(t) is a random solution from the population, Xsilverback is the silverback gorilla position vector (i.e., the best solution), Q simulates the impact force, and A is the coefficient vector determining the degree of violence in conflicts.
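As one concrete piece of the update, the competition-for-adult-females move can be sketched from the standard GTO formulation (Abdollahi et al.), GX = Xsilverback − (Xsilverback·Q − X(t)·Q)·A with Q = 2r − 1 and A = β·E. The paper's Equations (7)-(9) are not reproduced here, so this should be read as an assumed reconstruction of one mechanism, not the full optimizer:

```python
import numpy as np

def competition_update(X_i, X_silverback, beta=3.0, rng=None):
    """One 'competition for adult females' move, assuming the standard GTO form:
    GX = X_silverback - (X_silverback*Q - X_i*Q) * A, Q = 2*r - 1, A = beta * E."""
    rng = rng or np.random.default_rng()
    Q = 2 * rng.random() - 1                                  # impact force in [-1, 1]
    # E is a normal random vector or scalar depending on a coin flip (per the GTO paper).
    E = rng.normal(size=X_i.shape) if rng.random() >= 0.5 else rng.normal()
    A = beta * E                                              # violence degree in conflicts
    return X_silverback - (X_silverback * Q - X_i * Q) * A

rng = np.random.default_rng(3)
candidate = np.full(16, 0.5)      # a D = 16 solution, as in this study
silverback = np.full(16, 0.8)     # current best solution
new_position = competition_update(candidate, silverback, rng=rng)
```

After each such move, the new position would be clipped back into [0, 1] and re-evaluated by the fitness function.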

The Suggested A3C-TL-GTO Framework Pseudocode
The steps are iteratively computed for a number of iterations Tmax. After the learning iterations complete, the best hyperparameter combination can be used in any further analysis. Algorithm 2 summarizes the proposed overall classification, learning, and hyperparameter optimization approach.

Experiments Configurations
The configurations of the experiments performed in this study are described in Table 2.

The "Alzheimer's Dataset (4 Class of Images)" Experiments
The A3C-TL-GTO framework stages are run on the "Alzheimer's Dataset (4 class of Images)" dataset. Table 3 reports the confusion matrix entries (i.e., TP, TN, FP, and FN) for each pretrained CNN model. From Table 3, different performance metrics can be derived, as shown in Table 4. Table 5 reports the corresponding best hyperparameters that lead to the reported results. "Categorical Crossentropy" is the loss function recommended by five models. "SGD" is the parameter optimizer recommended by three models. Applying data augmentation to balance and increase the diversity of the images during training is recommended by six models. Figure 8 summarizes the performance metrics graphically; the x-axis shows the metrics, while the y-axis shows the scores. It shows that the "MobileNet" pretrained CNN model reports the highest performance metrics. Figure 9 shows the confusion matrices for the used models on the "Alzheimer's Dataset (4 class of Images)" dataset.

The "Alzheimer's Disease Neuroimaging Initiative (ADNI)" Experiments
The A3C-TL-GTO framework stages are run on the "Alzheimer's Disease Neuroimaging Initiative (ADNI)" dataset. Table 6 reports the confusion matrix entries (i.e., TP, TN, FP, and FN) for each pretrained CNN model. From Table 6, different performance metrics can be derived, as shown in Table 7. Table 8 reports the corresponding best hyperparameters that lead to the reported results. "KLDivergence" is the loss function recommended by six models. "AdaGrad" and "AdaMax" are the parameter optimizers recommended by three models each. Applying data augmentation to balance and increase the diversity of the images during training is recommended by seven models. Figure 10 summarizes the performance metrics graphically; the x-axis shows the metrics, while the y-axis shows the scores. It shows that the "Xception" pretrained CNN model reports the highest performance metrics. Figure 11 shows the confusion matrices for the used models on the "Alzheimer's Disease Neuroimaging Initiative (ADNI)" dataset. Figure 12 presents a graphical summary of the hyperparameter selection process performed in the current study. The best models are shown at the right of the figure; the different hyperparameters appear in gray, while the best hyperparameters are highlighted in a different color.

The Proposed Approach and Related Studies Comparison
A comparison between the suggested A3C-TL-GTO framework and other related state-of-the-art studies is conducted in Table 9. The A3C-TL-GTO framework outperforms most of the related studies. One of the main objectives of the suggested approach is to design a general framework that combines pretrained CNN models with hyperparameter tuning using metaheuristic optimizers. In other words, the framework is adaptable to the metaheuristic optimizer and the datasets used. Hence, in the comparison with related studies, the systems are compared as black boxes. One of the main advantages of the suggested framework is that it is not sensitive to the datasets and their outliers.

Conclusions
With the rapid growth of artificial intelligence, computer vision has become increasingly helpful in identifying Alzheimer's disease. In recent years, deep learning technology has increasingly dominated medical imaging and has been successfully used to automate AD detection from medical images. A deep network model based on transfer learning, optimized by the Gorilla Troops Optimizer, has been developed to aid in the classification of Alzheimer's disease patients for early diagnosis. In the present study, an empirical quantitative framework for automatic and accurate Alzheimer's classification is proposed and evaluated using multi-class MRI datasets. Convolutional neural network (CNN) performance is primarily affected by the hyperparameters selected and the dataset used. The proposed framework reduces the bias and variability of the preprocessing steps and hyperparameter optimization with respect to the classifier model and dataset utilized. Specifically, the proposed framework combines CNNs, transfer learning (TL), and the Gorilla Troops Optimizer (GTO) for optimizing parameters and hyperparameters; the transfer learning hyperparameters are optimized using the nature-inspired GTO optimizer. The ADNI dataset, an online dataset on Alzheimer's disease, is used to obtain magnetic resonance (MR) brain images. When all models are compared, MobileNet and Xception achieved the top accuracies of 96.65% and 96.25%, respectively.