Deep Learning-Based Computer-Aided Diagnosis (CAD): Applications for Medical Image Datasets

Computer-aided diagnosis (CAD) has proved to be an effective and accurate method for diagnostic prediction over the years. This article focuses on the development of an automated CAD system with the intent to perform diagnosis as accurately as possible. Deep learning methods have been able to produce impressive results on medical image datasets. This study employs deep learning methods in conjunction with meta-heuristic algorithms and supervised machine-learning algorithms to perform an accurate diagnosis. Pre-trained convolutional neural networks (CNNs) or an auto-encoder are used for feature extraction, whereas feature selection is performed using an ant colony optimization (ACO) algorithm. ACO searches for the most relevant features while reducing the amount of data. Lastly, diagnosis prediction (classification) is achieved using learnable classifiers. The novel framework for the extraction and selection of features is based on deep learning, auto-encoder, and ACO. The performance of the proposed approach is evaluated using two medical image datasets: chest X-ray (CXR) and magnetic resonance imaging (MRI) for the prediction of the existence of COVID-19 and brain tumors. Accuracy is used as the main measure to compare the performance of the proposed approach with existing state-of-the-art methods. The proposed system achieves an average accuracy of 99.61% and 99.18%, outperforming all other methods in diagnosing the presence of COVID-19 and brain tumors, respectively. Based on the achieved results, it can be claimed that physicians or radiologists can confidently utilize the proposed approach for diagnosing COVID-19 patients and patients with specific brain tumors.


Introduction
Ever since the development of the first known "expert system" in medicine, researchers have been striving to explore the possibilities of engaging artificial intelligence to solve medical problems of various natures [1]. The advent of modern computers paved the way for a completely new domain that emerged from the joint effort of medical personnel and computer scientists, known as computer-aided diagnosis (CAD) [2]. The first commercial CAD system was approved by the US Food and Drug Administration (FDA) for mammography in 1998 [3]. Since then, several CAD systems have been developed for analyzing the chest, brain, heart, colon, and kidney, among many other organs, based upon conventional projection radiography, ultrasound [4], computed tomography (CT), and magnetic resonance imaging.
The modern concept of the CAD is to help physicians, pathologists, or radiologists in decision-making. However, researchers in the 1960s and 1970s had a different perspective [5,6]. The original intention was to replace humans with computers for detecting

• The extraction of features from the image dataset using auto-encoder and well-known pre-trained CNNs, e.g., ResNet and AlexNet;
• The selection of the most notable features using ACO to enhance accuracy;
• A generic framework is proposed that can work on multiple datasets, such as MRIs and X-rays;
• The evaluation of the proposed classification model against the current baseline diagnostic models.
The remainder of this paper is organized as follows: the related work is presented in Section 2; the datasets and their pre-processing, methodologies, deep learning techniques, and the proposed feature selector are described in Section 3; and the results are presented and analyzed in Section 4. Section 5 discusses the findings. Finally, the conclusion and future recommendations are presented in Section 6.

Related Works
The adoption of deep learning methods to perform automated CAD for brain tumors and COVID-19 has been astonishing [16,17]. Deep convolutional neural networks are compelling neural network architectures that can produce promising results when applied for image analysis, pattern recognition, and other purposes. For medical image analysis primarily used for diagnostics, the accuracy of the prediction is vital. Prediction based upon an image dataset generally depends upon segmentation and pre-processing, feature extraction and selection, and classification [18].

Segmentation and Pre-Processing
The first stage in the CAD process is gathering the relevant data and assessing their quality [18]. After the initial screening, all corrupted images are removed from the dataset. Once the dataset has been obtained, the data are divided into several subsets based upon some uniformity conditions through a process known as segmentation. It is regarded as one of the essential tasks in medical image processing to enhance the already existing information [19]. The wide variety of segmentation techniques can be broadly classified into classical (pixel-based, region-based, edge-based, texture-based) and intelligent (artificial neural network-based, fuzzy logic-based, decision tree-based, genetic algorithm-based) [20].
The captured chest X-ray (CXR) and MRI images are mostly considered noisy, inconsistent, and incomplete; therefore, pre-processing these images plays a crucial role in acquiring higher accuracy [21]. Different types of linear and non-linear filters are extensively used for denoising. Filtering is also considered a means of suppressing unwanted information and enhancing the wanted information. According to their architecture, different deep learning methods have specific input image requirements; thus, image resizing, cropping, normalization, and padding are considered the essential steps in pre-processing [22]. The volume, region, and intensity adjustments are also considered effective pre-processing techniques to improve the execution rate.

Feature Extraction and Selection
Accurate diagnosis is also dependent upon the availability of the large-scale and complex dataset. This also entails the requirement of large memory space and computing power. Feature extraction and selection can be potentially used as a means to remove irrelevant or duplicate information; hence, the time and storage-space requirements can be reduced, which may also lead to the better performance of the CAD systems.
To minimize the information loss and reduce the dimensionality, principal component analysis (PCA) is utilized for the feature selection and extraction [23]. The authors in [24] used a combination of discrete wavelet transform [25] and Gabor filter for feature selection from magnetic resonance images. Some recent work used deep learning techniques for the extraction and selection of features from medical images [22,26,27]. Studies were conducted to extract deep features from brain MRI images using pre-trained networks [28]. Rajpurkar et al. [29] used discriminative chest X-ray (CXR) imaging features to detect pneumonia at different stages through CNN.

Classification
The diagnostic predictions are performed through the central processing module, usually called a classifier. A common problem that exists today is the determination of an appropriate classifier for a given dataset. It is observed that improper classifier selection may substantially reduce the accuracy. Multiple classifiers exist, including those based on neural networks, decision trees, and support vector machines. Recent trends have shown that deep learning-based methods have received much appreciation from investigators for the improved classification on various datasets. In [30], the gray-level co-occurrence matrix is proposed to classify brain tumors as Meningioma, Glioma, and Pituitary based upon CNN. Mustafa R. Ismael et al. [24] proposed a system that integrates statistical features and neural network methods to classify brain cancers in MRI images. The deep convolutional neural network-based multi-grade is presented in [31] for brain tumor classification.
It is typical for a Rician distribution to control the noise in MRI scans. The bendlets system is a second-order shearlet transform with bent elements and is considered an effective tool for sparsely representing images with curve outlines, such as brain MRI scans. An adaptive denoising approach for microsection pictures with Rician noise is proposed in [32] using the bendlet characteristic. This method allows for identifying the curve's texture and contour as low-frequency components. The Rician noise is clearly recognized to belong to a high-frequency channel, thus making it simple to eliminate without impairing the clarity of the contour.
Alqahtani, Ali et al. developed a novel deep CNN-based framework for chest X-ray analysis [33] that is computationally light and efficient. They used several machine learning classifiers to improve the framework's discriminating capabilities and suggested a new COV-Net to learn COVID-specific patterns from chest X-rays. The network can assist in the multi-class categorization of two different infection types by utilizing max-pooling procedures.
In [34], soft sensor techniques are used to assess the neurocognitive dysfunctions unique to neurodevelopmental disorders, ADD/HD, and specific learning impairments. The bendlet transform was proposed on the basis of the shearlet transform and used for image processing [35]. K Gopalakrishnan et al. investigated the capacity of several pre-trained DCNN models (AlexNet, ResNet50, GoogLeNet, VGG-16, ResNet101, VGG-19, Inceptionv3, and InceptionResNetV2) using transfer learning to classify diseased brain images [36].
In [37], the authors attempted to establish an automated method for categorizing fundus images. Through an ablation study, Tong He et al. evaluated a number of these modifications' effects on the final model's accuracy empirically [38]. Samir S. Yadav et al. classified pneumonia using a convolutional neural network (CNN)-based technique on a chest X-ray dataset [39].
In [40], the authors used a deep convolutional neural network to detect COVID-19 from X-ray images. They introduced a new open-access benchmark that included 13,975 chest X-ray (CXR) images across 13,870 patient case datasets. The three-player knowledge transfer and concentration framework counting a pre-trained appearing network that extracts the chest X-ray (CXR) imaging features from a large scale of lung disease CXR images is presented in [41]. Deep learning based on the DarkNet model is used to classify the binary and multi-class cases of COVID-19 [42]. In this model, YOLO was employed for real-time object detection. A rapid and valid method [43] diagnoses COVID-19 with an artificial neural network; its authors tested ten different CNNs to distinguish COVID-19 cases from non-COVID-19 cases.
The existing deep learning diagnostic models use CT, CXR, or MRI images for the detection of brain tumors and COVID-19. The performance indicators of these models indicate that there is still massive room for improvement. In this study, the main emphasis is on selecting the most effective and important features based on ant colony optimization (ACO). The obtained features are then processed through state-of-the-art deep learning classifiers to perform an accurate diagnosis. This paper also strives to develop a generic framework that can work on multiple datasets, such as MRI and CXR.

Material and Methods
In this study, for detecting brain tumors and COVID-19, the features from the image datasets are obtained either through a pre-trained CNN or an auto-encoder. These methods are implemented separately and combined with feature selection methods. The proposed framework is designed to work with datasets of different natures. To prove its efficacy, it is tested on two different medical image datasets based on MRI and CXR to detect brain tumors and COVID-19. Both datasets have been used widely by the research community in the recent past.

COVID-19 Dataset
COVID-19 (coronavirus disease 2019) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first case was reported in Wuhan, China, in late December 2019, before spreading worldwide [44]. The dataset contains 6432 X-ray images of varying sizes [45]. The training-testing distribution is defined as 80-20 over three classes: COVID-19, pneumonia, and normal. The number of training images for COVID-19, pneumonia, and normal is 460, 3418, and 1266, respectively. For testing, 116 COVID-19, 855 pneumonia, and 317 normal images are selected. Figure 1 shows a sample from each class for the COVID-19 dataset.

Brain Tumor Dataset
The figshare brain tumor MRI dataset is publicly available and is commonly used for classification evaluation. The involved dataset was acquired from Nanfang Hospital, Guangzhou, China, and Tianjin Medical University General Hospital, China, between 2005 and 2010 [46]. This dataset includes 3064 T1-weighted contrast-enhanced images obtained from 233 patients. Three different types of brain tumor are considered in this dataset: meningioma (708 slices), glioma (1426 slices), and pituitary (930 slices). This study defines the distribution between train and test datasets as 80-20. It means that meningioma has 566 slices for training the model and 142 slices for testing, glioma has 1141 slices for training and 285 slices for the testing, and pituitary has 744 slices to train the model and 186 slices for evaluation. Figure 2 represents the brain dataset samples for the three classes.
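The per-class split sizes quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming that each class is split independently and that the training share is rounded to the nearest integer; the helper name `split_counts` is hypothetical and is not taken from the original study.

```python
# Per-class slice counts of the figshare brain tumor dataset, as stated in the text.
CLASS_COUNTS = {"meningioma": 708, "glioma": 1426, "pituitary": 930}


def split_counts(total, train_percent=80):
    """Return (train, test) sizes for one class.

    Assumption: the training share is rounded to the nearest integer and
    the remaining slices form the test set.
    """
    train = round(total * train_percent / 100)
    return train, total - train


splits = {name: split_counts(n) for name, n in CLASS_COUNTS.items()}
for name, (train, test) in splits.items():
    print(f"{name}: {train} train / {test} test")
```

Under this rounding assumption the sketch yields the training and testing counts listed in the paragraph above for all three tumor classes.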

Convolutional Neural Network
This study combines pre-trained CNNs with ant colony optimization (ACO) and some classifiers for image classification. In the first stage, the pre-trained model is applied to extract important features from the image. AlexNet is one of the several pre-trained networks involved in this study, and is a common model in image classification studies. The study aims to extract high-level and sensitive features from input images. The CNN model is inspired by the human visual system. Nowadays, CNNs are widely used in computer vision problems because they share weights across spatial positions, which reduces the computation compared with a fully connected approach. AlexNet consists of several layer types: convolution layers, pooling layers, and fully connected layers. The number of extracted features differs from one network to another; for example, DenseNet-201 yields 1920 features.
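To make the benefit of weight sharing concrete, the sketch below compares the number of learnable parameters in a convolution layer with that of a fully connected layer producing an output of the same size. The layer hyperparameters (96 filters of size 11 × 11, stride 4, no padding, on a 227 × 227 × 3 input) are assumptions taken from the commonly cited first layer of AlexNet; they are not stated in this paper, and the function names are illustrative.

```python
def conv_output_size(size, kernel, stride):
    """Spatial output size of a convolution without padding."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter; independent of the image size."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Assumed AlexNet-style first layer: 227x227x3 input, 96 filters of 11x11, stride 4.
side = conv_output_size(227, kernel=11, stride=4)
shared = conv_params(kernel=11, in_channels=3, out_channels=96)
unshared = dense_params(227 * 227 * 3, side * side * 96)

print(f"output map: {side}x{side}x96")
print(f"convolution parameters:     {shared:,}")
print(f"fully connected parameters: {unshared:,}")
```

Because the same filter weights are reused at every spatial position, the convolution layer needs several orders of magnitude fewer parameters than a dense layer with the same input and output sizes.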

Auto-Encoders
An auto-encoder is a neural network that is trained to replicate its input at its output. Auto-encoders can be used as tools to learn deep neural networks. Training an auto-encoder is unsupervised in the sense that no labeled data are needed. The training process is still based on the optimization of a cost function. The cost function measures the error between the input $x$ and its reconstruction at the output, $\hat{x}$.
An auto-encoder is composed of an encoder and a decoder. The encoder and decoder can have multiple layers, but for simplicity consider that each of them has only one layer [47,48].
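If the input to an auto-encoder is a vector $x \in \mathbb{R}^{D_x}$, then the encoder maps the vector $x$ to another vector $z \in \mathbb{R}^{D^{(1)}}$, as follows:

$$z = h^{(1)}\left(W^{(1)} x + b^{(1)}\right) \quad (1)$$

where the superscript (1) indicates the first layer, $h^{(1)}: \mathbb{R}^{D^{(1)}} \to \mathbb{R}^{D^{(1)}}$ is the transfer function for the encoder, $W^{(1)} \in \mathbb{R}^{D^{(1)} \times D_x}$ is a weight matrix, $b^{(1)} \in \mathbb{R}^{D^{(1)}}$ is a bias vector, $D_x$ is the dimension of the input $x$, and $\mathbb{R}$ denotes the real numbers. Then, the decoder maps the encoded representation $z$ back into an estimate $\hat{x}$ of the original input vector, as follows:

$$\hat{x} = h^{(2)}\left(W^{(2)} z + b^{(2)}\right) \quad (2)$$

where the superscript (2) represents the second layer, $h^{(2)}: \mathbb{R}^{D_x} \to \mathbb{R}^{D_x}$ is the transfer function for the decoder, $W^{(2)} \in \mathbb{R}^{D_x \times D^{(1)}}$ is a weight matrix, and $b^{(2)} \in \mathbb{R}^{D_x}$ is a bias vector.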
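The encoder and decoder mappings can be illustrated with a forward pass through a single-layer auto-encoder. This is a minimal sketch under several assumptions that the paper does not specify: the transfer functions are logistic sigmoids, the weights are small random values rather than trained ones, the dimensions are toy values, and the cost is the mean squared reconstruction error. All function names are hypothetical.

```python
import math
import random


def sigmoid(v):
    """Element-wise logistic transfer function (an assumed choice of h)."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, x, b):
    """Compute W x + b for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def encode(x, W1, b1):
    """z = h1(W1 x + b1): maps D_x inputs to D_1 hidden units."""
    return sigmoid(affine(W1, x, b1))


def decode(z, W2, b2):
    """x_hat = h2(W2 z + b2): maps D_1 hidden units back to D_x outputs."""
    return sigmoid(affine(W2, z, b2))


def mse(x, x_hat):
    """Mean squared reconstruction error between input and output."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


rng = random.Random(0)
D_x, D_1 = 8, 3                      # toy dimensions, chosen for illustration
W1, b1 = random_matrix(D_1, D_x, rng), [0.0] * D_1
W2, b2 = random_matrix(D_x, D_1, rng), [0.0] * D_x

x = [rng.random() for _ in range(D_x)]
z = encode(x, W1, b1)
x_hat = decode(z, W2, b2)
print(len(z), len(x_hat), mse(x, x_hat))
```

In a feature-extraction setting, training would adjust the weights to minimize the reconstruction error, after which the hidden vector `z` would serve as the compact feature representation passed on to feature selection.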

Ant Colony Optimization Algorithm for Feature Selection
One of the most interesting collective behaviors in nature is the way ants search for food. Ants have an efficient way of locating food sources and of converging on the shortest path to them. In this paper, the behavior of ants is first introduced, and a modified ant colony optimization method for feature selection is then built upon it. In the second phase, the deep learning and auto-encoder-based methods are used for classifying the COVID-19 disease and brain tumor data [49].
The ant colony optimization algorithm was proposed in the early 1990s by Marco Dorigo as an innovative method for solving combinatorial optimization problems [49]. This algorithm is derived from the actual behavior of ants when finding food along the shortest path [50]. Each ant leaves a chemical substance called pheromone along its path to the food, and other ants choose the shortest path using these previously secreted pheromones. This algorithm is very useful for solving non-deterministic polynomial-time (NP) hard problems and is used in problems such as the traveling salesman problem, scheduling problems, vehicle routing problems, etc. [51].
To solve any NP problem using the ant colony algorithm, the following must be specified:
• The problem is represented as a graph consisting of nodes and edges.
• The heuristic information (η), based on the distance between nodes, is defined.
• A feasible solution is constructed according to the problem.
• The pheromone update rule is used to determine the edges that are effective in achieving the best answer.
• The probabilistic transition rule is used to find the next node [52].
Several different implementations of ant colony algorithms, such as Ant System, Max-Min Ant System, and Ant Colony System, have been proposed; the main difference among these methods lies in the pheromone update formula [52]. The implementation of the proposed algorithm is described next. Feature selection is based on the ant colony method, with two dependence measures: feature-class (FC) and feature-feature (FF).

Working Method of the Proposed Model
The method of implementing the algorithm is shown in Figure 3. In this flowchart, $T_i$ is the pheromone value in the i-th iteration.
First, a graph model of all the features in the dataset S is introduced. Features are considered nodes, and all nodes are interconnected. Then, τ, η, the number of ants, and the number of iterations must be determined [53]. The value of τ is known as the pheromone trail, and at the beginning of the algorithm its value for every feature is set to one by default. The value of η is known as heuristic information and is equal to the inverse of the distance between the features [54]; in this article this distance is set based on the two measures FC and FF.
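After determining the initial values, the algorithm can be run. In each iteration, each ant is first placed on a randomly chosen node. To determine the next node, the state transition rule of Equation (3) is used, given here in its standard Ant Colony System form:

$$j = \begin{cases} \arg\max_{u \in J_k} \left\{ [\tau_u]^{\alpha} [\eta_{iu}]^{\beta} \right\}, & \text{if } q \le q_0 \\ J, & \text{otherwise} \end{cases} \quad (3)$$

where, in the probabilistic case, the next feature $J$ is drawn with probability $P_k(i, j) = [\tau_j]^{\alpha} [\eta_{ij}]^{\beta} / \sum_{u \in J_k} [\tau_u]^{\alpha} [\eta_{iu}]^{\beta}$ for $j \in J_k$. The exponents α and β weight the relative influence of τ and η. $J_k$ is the set of features that ant $k$ has not yet visited, and the selection probability of a feature that the ant has already visited is zero. The parameter $q_0$ controls the balance between the greedy and the probabilistic choice, and $q$ is a random number between zero and one.
After all $n$ ants have completed their traversal of the nodes, the amount of pheromone is updated according to Equation (4), given here in its standard form:

$$\tau_i(t+1) = (1 - \rho)\,\tau_i(t) + \sum_{k=1}^{n} \Delta\tau_i^{k}(t) \quad (4)$$

The evaporation rate ρ must be chosen to reduce the effect of previously deposited pheromone. $\Delta\tau_i^{k}$ is the inverse of the error obtained for the wrapper method and is equal to the average number of nodes selected for the filter method [53,54].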
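The transition and pheromone update steps described above can be sketched as follows. This is an illustrative reading based on the usual Ant Colony System formulation (greedy choice when q ≤ q0, roulette-wheel choice otherwise, followed by evaporation and deposit); it is not the authors' MATLAB implementation, and the function names, data layout, and toy values are assumptions.

```python
import random


def choose_next(current, unvisited, tau, eta, alpha, beta, q0, rng):
    """Pick the next feature for one ant.

    tau[j]    : pheromone on feature j
    eta[i][j] : heuristic desirability of moving from feature i to feature j
    With probability q0 the best-scoring feature is taken (greedy);
    otherwise a feature is drawn in proportion to its score.
    """
    scores = {j: (tau[j] ** alpha) * (eta[current][j] ** beta) for j in unvisited}
    if rng.random() <= q0:
        return max(scores, key=scores.get)
    threshold = rng.random() * sum(scores.values())
    cumulative = 0.0
    for j, score in scores.items():
        cumulative += score
        if threshold <= cumulative:
            return j
    return j  # guard against floating-point round-off


def update_pheromone(tau, deposits, rho):
    """Evaporate a fraction rho of each trail, then add the new deposits."""
    return [(1.0 - rho) * t + d for t, d in zip(tau, deposits)]


# Toy example with four features; all values are made up for illustration.
rng = random.Random(1)
tau = [1.0, 1.0, 1.0, 1.0]            # pheromone initialised to one
eta = [[0.0, 0.2, 0.9, 0.5],
       [0.2, 0.0, 0.4, 0.7],
       [0.9, 0.4, 0.0, 0.3],
       [0.5, 0.7, 0.3, 0.0]]
greedy = choose_next(0, [1, 2, 3], tau, eta, alpha=1, beta=1, q0=1.0, rng=rng)
sampled = choose_next(0, [1, 2, 3], tau, eta, alpha=1, beta=1, q0=0.0, rng=rng)
new_tau = update_pheromone([1.0, 1.0], deposits=[0.5, 0.0], rho=0.2)
print(greedy, sampled, new_tau)
```

Setting `q0` close to one makes the ants exploit the currently best-looking features, whereas a smaller `q0` gives more weight to exploration through the probabilistic branch.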

Criteria for Distance and Similarity of Features
The relationship between two random variables falls into one of two categories: linear or nonlinear. The best-known measure of linear dependence is the correlation coefficient. In [55], the authors used entropy and information theory to measure nonlinear dependence. The problem with the correlation coefficient method is that it is inefficient on non-numerical and batch data, while the entropy method works well [55].
Entropy, a measure of irregularity, is used to quantify the uncertainty of a discrete or continuous random variable. The entropy $H(X)$ of the discrete random variable $X = (x_1, x_2, \ldots, x_n)$ is calculated from Equation (5):

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \quad (5)$$

where $p(x_i)$ is the probability of $x_i$ occurring in the whole set. The joint entropy of two discrete random variables $X$ and $Y$ of dimension $n$ is calculated according to Equation (6):

$$H(X, Y) = -\sum_{i=1}^{n} \sum_{j=1}^{n} p(x_i, y_j) \log_2 p(x_i, y_j) \quad (6)$$

Equation (7) is used to calculate the conditional entropy of $X$ given $Y$:

$$H(X \mid Y) = -\sum_{j=1}^{n} p(y_j) \sum_{i=1}^{n} p(x_i \mid y_j) \log_2 p(x_i \mid y_j) \quad (7)$$

The purpose of the above formulas is to calculate the information gain (IG). To examine how interdependent the two variables are, the IG criterion is used, which is in accordance with Equation (8):

$$IG(X \mid Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X, Y) \quad (8)$$

If the value of IG is zero, the two variables are independent, and the higher this value, the more dependent X and Y are [56]. Figure 4 shows the relationship between information gain and entropy.
This study uses the normalized form of IG, known as symmetrical uncertainty (SU), which is consistent with Equation (9):

$$SU(X, Y) = \frac{2\, IG(X \mid Y)}{H(X) + H(Y)} \quad (9)$$

The advantage of this formula is that the dependence of the two variables is normalized to the range between 0 and 1. If the value of SU is close to one, the two variables are dependent, and if it is close to zero, they are independent.
In this paper, two criteria, $SU_{FC}$ and $SU_{FF}$, are used to calculate η [55].
The $SU_{FC}$ criterion is defined as the dependence of each feature on the class. The closer this value is to one, the more important the feature is, and the more strongly it should be selected.
The $SU_{FF}$ criterion measures the dependence of two features on each other. If its value is close to one, the two features are very similar, so one of them is a candidate for removal.
In selecting features, we seek to preserve class-related features and remove duplicate or trivial ones. The goal is to select features with a high $SU_{FC}$ and a low $SU_{FF}$ [57].
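The entropy-based quantities used for the heuristic information can be computed directly from empirical frequencies. The sketch below is a minimal illustration using the standard definitions of entropy, information gain, and symmetrical uncertainty for discrete variables; the function names and the toy data are assumptions, and continuous features would first need to be discretized.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (base 2) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def joint_entropy(xs, ys):
    """Entropy of the paired observations (x_i, y_i)."""
    return entropy(list(zip(xs, ys)))


def information_gain(xs, ys):
    """IG = H(X) + H(Y) - H(X, Y), i.e. H(X) - H(X | Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)


def symmetrical_uncertainty(xs, ys):
    """SU = 2 * IG / (H(X) + H(Y)); defined as 0 when both entropies are 0."""
    denom = entropy(xs) + entropy(ys)
    if denom == 0.0:
        return 0.0
    return 2.0 * information_gain(xs, ys) / denom


# Toy data: a feature that determines the class and one that is unrelated to it.
labels = [0, 0, 1, 1]
informative = [5, 5, 9, 9]
unrelated = [0, 1, 0, 1]

su_fc_informative = symmetrical_uncertainty(informative, labels)
su_fc_unrelated = symmetrical_uncertainty(unrelated, labels)
su_ff = symmetrical_uncertainty(informative, unrelated)
print(su_fc_informative, su_fc_unrelated, su_ff)
```

In this toy case the informative feature reaches the maximum feature-class score, while the unrelated feature scores zero against both the class and the other feature, which matches the selection goal of high feature-class and low feature-feature dependence.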

Dataset Pre-Processing
Pre-processing image datasets is an essential step that improves the quality of the features and hence the diagnosis prediction. The pre-processing steps for CXR and MRI image datasets are detailed in Figure 5. The processed data are later used by either the auto-encoder or a CNN for feature extraction. The RGB images are converted to grayscale when processed through an auto-encoder, which reduces the amount of processed data to one-third. Furthermore, for the auto-encoder, images are resized to 64 × 64 and then converted to a single array (vector) without losing any features, in order to enhance the quality of the trained model. The brain tumor images are available as png files with the size of 512 × 512, whereas the chest X-ray (CXR) images have a jpg format of varying sizes. The mismatch among the input images is fixed by resizing them to a fixed 227 × 227 (AlexNet) or 224 × 224 (GoogleNet, ResNet-50, and DenseNet-201) resolution.
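The auto-encoder branch of the pre-processing pipeline (grayscale conversion, resizing to 64 × 64, and flattening) can be sketched in plain Python. The luminance weights and the nearest-neighbor resizing used here are assumptions, since the paper does not state which conversion or interpolation was used; in practice an image-processing library would normally handle these steps. The function names and the synthetic test image are illustrative only.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) pixels to single-channel intensities.

    Assumption: ITU-R BT.601 luminance weights.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a 2-D list image (an assumed interpolation)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def flatten(img):
    """Concatenate the image rows into a single feature vector."""
    return [p for row in img for p in row]


def preprocess_for_autoencoder(rgb, size=64):
    return flatten(resize_nearest(to_grayscale(rgb), size, size))


# Synthetic 128 x 100 RGB image standing in for a chest X-ray of arbitrary size.
image = [[(x % 256, y % 256, (x + y) % 256) for x in range(100)] for y in range(128)]
vector = preprocess_for_autoencoder(image)
print(len(vector))
```

Each image, whatever its original size, ends up as a vector of 64 × 64 = 4096 intensities, which is the input dimension seen by the auto-encoder.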

Overall Architecture of the Proposed Framework
In the proposed framework (Figure 6), features can be extracted either using pre-trained CNNs (AlexNet, GoogleNet, ResNet-50, or DenseNet-201) or an auto-encoder. Once features are extracted, the meta-heuristic ant colony optimization (ACO) is utilized to choose the most effective and prominent features [58]. The main goal of this study is to generate smaller subsets of essential features that help to expedite the search process and select the most relevant features. Afterward, learnable classifiers (decision tree, support vector machine, k-nearest neighbor, ensemble, naive Bayes, or discriminant analysis) can be employed for the diagnosis prediction. These classifiers learn from the features selected by the ACO algorithm and classify them based on the labels of the input type. Table 1 shows the characteristics of the chosen classifiers. The initial selection of the classifier can be made based on the trade-off among prediction speed, memory usage, and interpretability.

Table 1. Characteristics of the chosen classifiers.
Prediction Speed    Memory Usage    Interpretability
Fast                Small           Easy

Experiments and Results Analysis
In this study, MATLAB R2021a is used to execute the proposed method on an Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz 2.60 GHz. Several experiments are conducted to investigate the performance of the prediction methods. Once pre-processing is done, feature extraction and selection are performed, followed by classification. In the first stage, an auto-encoder/CNN is combined with an ACO algorithm to extract and select the most prominent features from the input training dataset. The second stage employs a range of learnable classifiers to diagnose tumors or COVID-19 accurately.

Evaluation Metrics
The performance evaluation of the proposed framework is based upon the following performance metrics: sensitivity, specificity, precision, accuracy, F1-score, misclassification rate (MR), and Matthews correlation coefficient (MCC). For the confusion matrix, TP represents the true positives, TN the true negatives, FP the false positives, and FN the false negatives. All the parameters used for the evaluation are listed in Table 2. Among these, accuracy is usually considered the most reliable measure to compare the performance of different classification methods.
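The listed metrics follow directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions, which are assumed to match those in Table 2; the function name and the example counts are hypothetical, and for the three-class problems in this study the counts would be taken per class in a one-vs-rest fashion.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion-matrix counts.

    Assumption: every denominator below is non-zero, except for MCC,
    which is reported as 0.0 when its denominator vanishes.
    """
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)              # recall / true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "misclassification_rate": 1.0 - accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, used only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Since the misclassification rate is simply one minus the accuracy, the classifier with the highest accuracy always has the lowest misclassification rate.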

Classification Using Learnable Classifiers
The performance of the CAD prediction system depends upon the combination of features and the classifier model. In this study, feature extraction and selection are done using either a combination of auto-encoder + ACO or pre-trained CNNs + ACO. For classification, a broad range of learnable classifiers is selected: decision tree, support vector machine, k-nearest neighbor, ensemble, naive Bayes, and discriminant analysis.

Feature Extraction and Feature Selection Using Auto-Encoder and ACO
In the initial stage, image features from the chest X-ray (CXR) image dataset are extracted using an auto-encoder, whereas the most relevant features are selected based on ACO. For training and testing, six different classifiers are chosen to evaluate the performance of the proposed framework. The chest X-ray (CXR) image dataset is used to determine the health status of the patients, i.e., whether they are affected by COVID-19 or pneumonia or are considered normal. Table 3 shows the classification results using the evaluation metrics. The accuracy determined using true values of the test images is considered the most important parameter to evaluate the performance. The support vector machine achieves the highest accuracy of 98.68% among all classifiers, followed by ensemble and discriminant analysis as the second-best classifiers with almost 98% accuracy. The misclassification rate also supports the accuracy results, as the lowest MR of 1.31% is obtained using SVM. The NB classifier exhibits the worst performance with the lowest accuracy of 89.28% and an MR of 10.71%.
The MRI image dataset is used to classify three types of brain tumors: meningioma, glioma, and pituitary. Once the image features are extracted by the auto-encoder, ACO searches for the most prominent features. Six different classifiers are used to evaluate the performance of the proposed framework on the MRI image dataset for diagnosing tumors (see Table 3). The classification results demonstrate that the highest classification accuracy of 99.18% is obtained through KNN; likewise, its MR is the lowest, i.e., 0.81%. The worst prediction results are obtained through NB, which achieves the lowest accuracy of 78.62% and an MR of 21.37%.

Feature Extraction-Selection Using Pre-Trained CNNs and ACO
For feature extraction from the chest X-ray (CXR) image dataset, several pre-trained CNNs (AlexNet, GoogleNet, ResNet-50, and DenseNet-201) are considered. Once again, the ACO method is employed to select the most prominent image features. The training and testing of the CXR image dataset for the diagnosis of COVID-19, pneumonia, and normal cases is performed using six classifiers (see Table 1). Table 4 presents the evaluation metrics obtained through all classifiers for the four pre-trained CNNs + ACO. The best overall results are obtained when feature extraction is performed with ResNet-50 and classification with SVM: this combination reaches the highest accuracy of 99.61% with an MR of 0.38%. The lowest accuracy of 94.95% is obtained for AlexNet evaluated with the NB classifier. It is worth mentioning that all the pre-trained CNNs achieve decent results.
The MRI image dataset is processed through pre-trained CNNs and ACO to determine the existence of a specific type of brain tumor. The evaluation metrics for classifying brain tumors using six different classifiers are shown in Table 5. Features extracted using AlexNet, when evaluated with the discriminant analysis classifier, produce the best results. The highest classification accuracy of 98.69% is obtained through discriminant analysis, whereas the lowest accuracy of 87.27% is obtained when features extracted using GoogleNet are evaluated with the decision tree classifier.

Performance Comparison with Existing Methods
The proposed method is compared with the most prominent existing methods for brain tumor and COVID-19 diagnosis. Table 6 shows the classification accuracy obtained by the selected state-of-the-art methods and by our proposed approach on the MRI image dataset. Classification accuracy measures how often the brain tumor type is diagnosed correctly. For training-testing, a proportion of 80-20 is used for all the involved methods. The comparison table shows that the proposed method attains superior results in diagnosing brain tumor type when feature extraction, selection, and classification are performed using auto-encoder, ACO, and KNN, respectively.
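The 80-20 training-testing proportion can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether the split was stratified or how it was randomized, so the per-class (stratified) split, the `seed` parameter, the function name `stratified_split`, and the class counts below are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_indices, test_indices), keeping class proportions.

    Assumption: the split is done per class; the source only states
    an 80-20 proportion.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical class counts, for illustration only.
labels = ["meningioma"] * 50 + ["glioma"] * 100 + ["pituitary"] * 50
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the same images in the test set across all compared classifiers, which makes their accuracies directly comparable.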

Method                                  Acc (%)
CNN with data fusion [64]               98.27
ANN [65]                                83.98
DNN + SVM [66]                          95.33
DNN [67]                                94.80
Random Forest [68]                      95.90
Stacked-auto-encoder [9]                94.70
Deep Convolutional Auto-encoder [69]    76.52
CNN + Auto-encoder [70]                 96.05
Tailored CNN [40]                       92.30
Dense Net [41]                          88.90
Capsule Networks [71]                   95.70
DarkNet-19 based CNN [42]               87.02
Deep Learning [42]                      98.08
Deep Learning [43]                      86.

Table 8 compares ACO with the genetic algorithm (GA) as a feature selection method and shows a clear difference in both time and accuracy: GA takes twice as long and is less accurate than ACO. Ant colony optimization reaches the global minimum faster than GA because it avoids being trapped in local minima. By simulating the intelligent behavior of ants, ACO tries to find optimal solutions to various optimization problems. It has gained considerable interest worldwide because of its advantages, such as simple implementation, a small number of parameters, and flexibility.
Ant colony optimization offers simplicity, flexibility, robustness, scalability, and self-organization. It has fewer control parameters than a genetic algorithm (GA). Individual ants can carry out their tasks simultaneously, and swarm intelligence (SI) methods use less memory than GA.
Ant colony algorithms also have an advantage over simulated annealing and genetic algorithm approaches to similar problems when the graph may change dynamically: the ant colony algorithm can be run continuously and adapt to changes in real time [72][73][74].
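To make the idea of ACO-based feature selection concrete, the following is a minimal sketch and not the implementation used in this study. Each feature carries a pheromone value, each ant samples a fixed-size feature subset with probability proportional to pheromone, and the best subset of each iteration is reinforced after evaporation. The nearest-centroid fitness, the toy data, and all names and parameter values (`aco_select`, `n_ants`, `n_iter`, `rho`) are assumptions introduced for illustration.

```python
import random


def nearest_centroid_accuracy(rows, labels, subset):
    """Fraction of rows whose nearest class centroid (on `subset`) matches the label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(subset))
        for k, j in enumerate(subset):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [v / counts[c] for v in sums[c]] for c in sums}

    def dist(row, centroid):
        return sum((row[j] - centroid[k]) ** 2 for k, j in enumerate(subset))

    hits = 0
    for row, label in zip(rows, labels):
        predicted = min(centroids, key=lambda c: dist(row, centroids[c]))
        hits += predicted == label
    return hits / len(rows)


def aco_select(rows, labels, n_select, n_ants=10, n_iter=20, rho=0.2, seed=0):
    """Return (best feature subset, its fitness) found by a simple ACO loop."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    pheromone = [1.0] * n_features
    best_subset, best_fitness = None, -1.0
    for _iteration in range(n_iter):
        iter_subset, iter_fitness = None, -1.0
        for _ant in range(n_ants):
            # Each ant picks n_select distinct features, weighted by pheromone.
            candidates = list(range(n_features))
            subset = []
            for _step in range(n_select):
                weights = [pheromone[j] for j in candidates]
                choice = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(choice)
                candidates.remove(choice)
            fitness = nearest_centroid_accuracy(rows, labels, subset)
            if fitness > iter_fitness:
                iter_subset, iter_fitness = subset, fitness
        # Evaporate, then deposit pheromone on the iteration-best features.
        pheromone = [(1.0 - rho) * t for t in pheromone]
        for j in iter_subset:
            pheromone[j] += iter_fitness
        if iter_fitness > best_fitness:
            best_subset, best_fitness = sorted(iter_subset), iter_fitness
    return best_subset, best_fitness


def make_toy_data(n_per_class=20, n_features=6, seed=1):
    """Two classes; only features 0 and 1 are informative, the rest are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, shift in (("a", 0.0), ("b", 6.0)):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += shift
            row[1] += shift
            rows.append(row)
            labels.append(label)
    return rows, labels


rows, labels = make_toy_data()
subset, fitness = aco_select(rows, labels, n_select=2)
print(subset, fitness)
```

In a real pipeline the fitness would be the validation accuracy of the chosen classifier on the deep features, and the subset size could itself be left for the ants to decide; the fixed `n_select` here only keeps the sketch short.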

Discussion
Advances in artificial intelligence have also influenced diagnostics in medical science. This progress has enabled computer scientists to build CAD tools with the assistance of medical personnel, and physicians worldwide frequently use such tools to support accurate diagnosis. Although deep learning methods achieve high accuracy, their success is highly dependent upon the features and their pre-processing. A plethora of diverse methods exists today, so the main problem is selecting an appropriate one. In this study, we attempted to combine the most suitable methods for feature extraction-selection and classification to attain the highest accuracy on medical image datasets.
To demonstrate the effectiveness of the proposed method, two different medical image datasets are employed: CXR for the detection of COVID-19 and MRI for diagnosing brain tumors. Many researchers have evaluated the performance of existing prediction models for COVID-19 and brain tumors, and these models reach a certain level of accuracy beyond which no further improvement is made. The results in [9,69] indicate that the auto-encoder extracts many unimportant image features; the features therefore need to be refined and the best ones selected. This study used a unique combination of deep learning techniques and a meta-heuristic algorithm to perform feature extraction and selection. The results reported in Tables 6 and 7 show that the proposed method outperformed other state-of-the-art methods, which suggests that it is applicable to different types of medical image datasets.
In [31], researchers applied the deep convolutional neural network VGG-19 with a SoftMax classifier, resulting in 94.58% accuracy. Combining GLCM features with the deep convolutional neural network VGG-19 leads to highly accurate results, as obtained by the authors of [62] in the comparison on the brain tumor dataset in Table 6.
For the COVID-19 dataset comparisons in Table 7, high accuracy cannot be expected when a feature extraction method such as Dense Net [41] is not combined with a feature selection method. In our study, however, combining the pre-trained DenseNet-201 with the ACO algorithm achieved 98.99% accuracy.
The GLCM-based methods in [30,62] achieve accuracies of 82.00% and 96.50%, respectively. For this type of dataset, gray level co-occurrence matrix (GLCM) features alone do not give accurate results; GLCM is better suited to texture images such as fingerprint and palmprint images [75]. The method in [30] relies on GLCM only, and its accuracy is limited because texture features alone are not sufficient for medical images. In [62], however, GLCM was combined with a pre-trained VGG-16 CNN, and this combination improved the results. This indicates that GLCM on its own cannot improve the performance of the system.
In [23], texture feature extraction is used together with PCA to find the best features. Feature selection based on principal component analysis cannot find the best features in medical data [76]. Moreover, none of the other methods listed in Tables 6 and 7 combines feature extraction with feature selection, and ineffective features cannot improve the results. These related works were chosen for comparison because they use similar methods for either feature extraction or selection on the same datasets that we used.
Sachdeva et al. [23] recommended PCA based on intensity and texture features. In this method, PCA computes the eigenvalues of the features and favors the directions with high eigenvalues in the image. In brain tumor images, however, some regions of the image contribute little to the high eigenvalues, so the result contains errors and is ultimately less accurate.
Cheng et al. [59] used the Bag of Words (BoW) model to extract features from the images and an SVM to classify them. BoW is mainly used for recognizing text sources [77,78] and is not well suited to extracting features from medical images. In addition, the researchers in [59,60,66] used deep learning with the SVM classifier without any feature selection step to pick the effective features and achieve quality results. In [22], a modified deep CNN was applied to the dataset to extract the most effective features, which were then classified with the SVM classifier.
The 2D Discrete Wavelet transform (DWT) and 2D Gabor filter were used in [24]. These feature extraction methods are also useful in the face recognition systems used in [79][80][81].
Capsule networks (CapsNets) were used in [61,71] to extract features from brain tumor images and COVID-19 images. This method does not give high accuracy for 2D signals such as medical images and was recently used for speech recognition signals [82]; its performance is better for 1D signals than for 2D signals.
Ghassemi et al. [63] combined a generative adversarial network (GAN) and a ConvNet (random split) to find and classify brain tumor images. Recently, GAN methods were used in face recognition systems [83,84]. Face images have high resolution, and there are more objects on the human face than in tumor images. However, combining a GAN with a ConvNet can produce high accuracy.
The use of a deep convolutional auto-encoder alone cannot produce good results [9,69,70]. The auto-encoder produces many unimportant features from images; these features should be enhanced and the best ones selected.
Özkaya et al. [64] used a CNN with data fusion and obtained 98.27% accuracy. Compared with other methods, this method can be considered high-performing because data fusion creates robust features.
In the proposed study, we implemented a powerful method to extract features from medical image datasets: the deep learning algorithms (CNN, auto-encoder) produced high-quality results when extracting features. In addition, the meta-heuristic algorithm, ACO, was used to select the effective features, and this combination had a favorable impact.

Conclusions
Computer-aided diagnostic systems have emerged as an effective tool for performing diagnosis prediction based on medical images. The performance of these systems is challenged by the processing of 'big data', which is also vital for accurate diagnosis. This study proposes a novel framework for extracting and selecting features based on deep learning, auto-encoder, and ACO, with the intention of selecting the most prominent image features and reducing the amount of data to be processed. The meta-heuristic algorithm, ACO, is used to search for the essential features in the available feature set so as to minimize the error rate. The obtained features are then processed through the learnable classifiers to determine the accuracy of the CAD system. The performance of the proposed system is evaluated on two medical image datasets: CXR and MRI. The CXR dataset is used to diagnose the patient's condition as COVID-19, pneumonia, or normal, whereas the MRI images are used to determine the type of brain tumor (meningioma, glioma, or pituitary). The removal of minor, redundant, and noisy features produces a significant effect on the accuracy of the overall system. The proposed approach achieves the highest accuracy compared with other state-of-the-art methods, such as ANN, CNN, CNN with data fusion, stacked auto-encoder, capsule networks, and DarkNet-19-based CNN, on the medical image datasets considered. The basic purpose of the proposed CAD system is to assist the physician. The primary limitation of this study is its use of labeled data for supervised learning; hence, the process is not claimed to be completely automated. To exploit the full potential of deep learning, the diagnostic prediction would have to be made using unlabeled data without any human intervention. Further studies will explore the possibility of performing disease prediction without requiring supervision to train the model.