Multimodality Imaging of COVID-19 Using Fine-Tuned Deep Learning Models

In the face of the COVID-19 pandemic, many studies have been undertaken to provide assistive recommendations to patients to help overcome the burden of the expected shortage in clinicians. Thus, this study focused on diagnosing the COVID-19 virus using a set of fine-tuned deep learning models to overcome the latency in virus checkups. Five recent deep learning algorithms (EfficientB0, VGG-19, DenseNet121, EfficientB7, and MobileNetV2) were utilized to label both CT scan and chest X-ray images as positive or negative for COVID-19. The experimental results showed the superiority of the proposed method compared to state-of-the-art methods in terms of precision, sensitivity, specificity, F1 score, accuracy, and data access time.


Introduction
The first confirmed COVID-19 case, according to the World Health Organization (WHO) reports, was in the central Chinese city of Wuhan on 8 December 2019. COVID-19 was reported as an epidemic in January 2020. From that date onwards, new confirmed cases were reported each day and the COVID-19 virus spread to every continent. According to the WHO, the number of confirmed COVID-19 virus cases is more than 416,614,050 and the number of deaths is more than 5,844,095 [1]. COVID-19 has become a global health crisis and the WHO has declared it a major pandemic. Understanding how the disease spreads and determining how undetected and undocumented cases contribute to the transmission of the virus are major challenges. Although COVID-19 vaccines are available on the market, there is still an inevitable demand for smart healthcare systems for the general population, and especially the elderly, so that the expected shortage of doctors in the health sector during the crisis can be promptly addressed. With the spread of COVID-19 to numerous countries of the world, and considering the day-to-day increases in the numbers of people infected and of deaths, it has become mandatory to diagnose and identify the COVID-19 virus rapidly.
All over the world, the COVID-19 virus remains a threat to the economies of countries and the health of people. It has been proven that the disease is transmitted from one person to another and, therefore, delays in discovering the disease lead to the spread of infection through interactions between the healthy and infected patients [2][3][4].
The test to verify a person's infection with the virus is often implemented by taking samples from the patient's throat, sputum, or nasopharynx for PCR analysis of the viral RNA. However, one of the limitations of such tests is their low accuracy [5][6][7][8]. The diagnosis of COVID-19 based on laboratory tests is costly, laborious, time-consuming, and involves a complicated manual process [9]. Therefore, it is recommended that this type of test be replaced with chest CT images, which could be used as one type of early investigative test [3]. The challenge of replacing laboratory testing with CT image analysis is that it demands a radiology expert and takes a significant amount of time. One solution to this challenge is to save valuable time for medical professionals by using an automated analysis system. Thus, the early diagnosis of COVID-19 would be invaluable in containing the disease outbreak. However, as the diagnosis of COVID-19 based on laboratory tests still displays a low positive rate, as well as being costly, laborious, time-consuming, and manually complicated [9], suggestions have emerged from specialists recommending the use of radiography screening as a primary tool for checking for COVID-19. With increased numbers of infected patients on a daily basis, especially in the epidemiological setting, the bottleneck for this method is the requirement for a large number of expert radiologists to interpret the images. Moreover, the similar and overlapping patterns of lung diseases make it difficult for radiologists to point out these slight differences [10]. As a result, there is an urgent need to develop intelligent systems that aid radiologists with fast and accurate results.
Artificial intelligence (AI) has made great strides in recent years. Deep learning and the accompanying innovations offer radiologists a chance to change the radiology scene and improve survival rates. Deep learning neural network models have been applied to several imaging tasks to date, including image segmentation, classification, and object detection. Deep learning techniques differ from classic AI strategies in one significant respect: the latter employ handcrafted feature extraction in preparation for the computation, while deep learning techniques learn image representations directly, without requiring manual feature extraction. Deep learning thus provides promising solutions for the analysis of medical images in future applications. With the recent advances in machine learning and artificial intelligence applications, especially in the medical field and including medical image processing, artificial intelligence has become a promising tool that can change diagnosis methods. Deep learning networks, as an artificial intelligence tool, have proven successful in image classification with their unique ability to learn image representations automatically and map features onto continuous vectors that are subsequently used for predictions. AI applications in radiology [11] are driven by the idea that medical images are sets of data that can be computed by a machine to extract useful information from the properties of the image [12].
Thus, this paper presents a novel technique for the early detection of the COVID-19 virus based on X-ray and CT chest radiography image analysis using deep learning algorithms. This methodology can be classified as image analysis, as it comprises steps such as image acquisition, image preprocessing, feature extraction, and the employment of different classifiers. This paper's contributions can be outlined as follows: (1) We built promising, fine-tuned DL models capable of diagnosing chest X-ray and CT images that showed improvements in precision, sensitivity, specificity, F1-score, accuracy, and data access time. The models for COVID-19 detection were trained and their performances were evaluated using novel samples in order to contribute to the control of the epidemic.
(2) The proposed system will reduce the pressure on the national healthcare budget by reducing the cost of the physical tests performed in clinical laboratories. It has been reported that the cost of a PCR test is about USD 120-130 [13]. (3) For radiologists, with the spread of COVID-19, a daily flood of CT images needs to be analyzed; the proposed techniques can act as an aid for radiologists (regardless of their level of experience) by reducing not only the time required to diagnose CT scans but also the pressure on the radiologist. (4) The disease can be localized by using the Grad-CAM algorithm, which visualizes the infected areas of the lungs in chest X-ray and CT images.
The remainder of the paper is laid out as follows: Section 2 contains a literature review; Section 3 presents the proposed methodology; Section 4 contains the experimental results and discussion; and, finally, conclusions are drawn in Section 5.

Literature Review
The attention the COVID-19 virus has received from researchers has led to an enormous number of publications. In this section, most of the recently developed systems are reviewed. The authors of [14] proposed a coronavirus classification technique for CT images based on machine learning methods. Their dataset, collected from patients in Italy, one of the most severely affected regions, comprised 618 images, including 219 images from COVID-19 patients, 224 images from influenza-A viral pneumonia patients, and 175 images from healthy cases. Chowdhury [15] trained four convolutional neural networks (CNNs) to classify CT images into two classes: normal and COVID-19 pneumonia. Linda et al. [9] proposed COVID-net, which is a deep CNN that can recognize and identify the COVID-19 disease from CT images. Shuai et al. [16] adapted the Inception transfer learning model to detect COVID-19 in CT images. Ghoshal et al. [17] presented a Bayesian deep learning classifier to estimate model uncertainty using the transfer learning method with COVID-19 X-ray images. The proposed system differs from the others in tracking disease progression through a deep analysis of the periodic changes in CT images from the same patient, making it possible to modify treatment and help achieve improved results for patients. It also utilizes an efficient parameterized transfer learning model and smart data augmentation.
Rahimzadeh et al. [18] proposed a concatenated CNN based on the Xception and ResNet50V2 models to characterize COVID-19 cases from chest X-rays. The created framework utilized a dataset that contained 180 images of COVID-19 patients, 6054 images of pneumonia patients, and 8851 images of healthy individuals. For each of the eight training phases, 633 images were selected. The reported results exhibited 99.56% precision overall, with 80.53% for COVID-19 cases. Fan et al. [19] proposed a multi-kernel attention network to analyze chest X-ray images from COVID-19 patients. Their model has three stages: a feature extraction stage, followed by two parallel multi-kernel-size attention modules and, finally, the classification stage. The experimental results demonstrated improved performance in COVID-19 detection and an accuracy of 98.2%.
Loey et al. [20] presented a generative adversarial network (GAN) using deep learning to analyze COVID-19 from chest X-rays. Their study utilized three pretrained models: AlexNet, GoogleNet, and ResNet18. Apostolopoulos et al. [21] presented a transfer learning technique with a CNN for the analysis of COVID-19 cases using chest X-rays. The framework can detect COVID-19 images using five main models: VGG19, Inception, MobileNet, Xception, and Inception-ResNetV2. VGG19 was chosen as the fundamental learning model and it showed 93.48% accuracy. To detect COVID-19 patients, Panwar et al. [22] presented a binary image classification task in which the input data were classified using a fine-tuned VGG model. Mishra et al. [23] used deep CNN-based image classification models to differentiate COVID-19 instances using chest CT scan images. Song et al. [24] employed a linear classifier to extract semantic features from CT scans. Jaiswal et al. [2] used a DenseNet201-based deep transfer learning (DTL) model to identify patients with COVID-19; the proposed model uses weights pretrained on the ImageNet dataset to extract features. Silva et al. [25] proposed CovidNet, an efficient, voting-based technique for analyzing COVID-19 patterns in CT images.
Allioui et al. [26] proposed a multi-agent deep learning model for enhancing COVID-19 CT image segmentation. Their proposal was based on multi-agent deep reinforcement learning (DRL), which utilizes a modified version of the Deep Q-Network. Khan et al. [27] proposed a COVID-19 detection method for CT images using deep learning, entropy-controlled optimization, and parallel feature fusion techniques. Their method mainly depends on the AlexNet and VGG16 models. The features are extracted and fused using a parallel positive correlation approach; then, the entropy-controlled firefly optimization method is employed to select the optimal features. Their best achievement was an accuracy rate of 98%. Rehman et al. [28] proposed a framework for the detection of COVID-19 disease and 14 other types of chest diseases. They employed a convolutional neural network architecture with a soft-max classifier. Then, transfer learning was applied in order to extract deep features, which provided results similar to classic machine learning classification methods. Guo et al. [29] studied COVID-19 diagnosis from chest CT scans via an ensemble learning method based on ordinal regression. Their proposal relies on multi-binary, neuron stick-breaking, and soft label techniques. Mukherjee et al. [30] implemented an advanced deep network architecture with two CT image datasets. The authors utilized the transfer learning strategy with custom-sized input tailored to each type of deep architecture in order to improve the performance. Their best models achieved an average accuracy of 99.4%. Nasiri and Hasani [31] proposed a method for diagnosing coronavirus disease from X-rays. They used the DenseNet169 deep neural network (DNN); the extracted features were then used as input for the Extreme Gradient Boosting (XGBoost) algorithm to perform the classification task. They achieved accuracy of up to 99.78%. Ullah et al. [32] developed an effective COVID-19 detection technique using the ShuffleNet CNN by employing three types of images, i.e., chest radiographs, CT scans, and ECG trace images. Nasiri and Alavi [33] proposed using a pretrained network, DenseNet169, to extract features from X-ray images. Analysis of variance (ANOVA) was employed as a feature selection method to reduce the computation and time complexity. Then, the selected features were classified with Extreme Gradient Boosting (XGBoost). Their proposed method reached 98.72% accuracy for two-class classification and 92% accuracy for multiclass classification.

Proposed Methodology
When building the diagnostic DL model, the chest X-ray and CT images were initially collected. In this study, as illustrated in Figure 1, a publicly available SARS-CoV-2 CT scan dataset was used [34]. The dataset contained 1252 CT scans positive for COVID-19 and 1230 CT scans from non-infected individuals. Another chest X-ray dataset [35] with 6939 sample images was also considered in this study, which included three classes (COVID-19, normal, and pneumonia) with 2313 samples for each category. Section 4.1 provides a detailed description of the datasets utilized. Preprocessing, one of the basic phases in a DL pipeline, is responsible for resizing images to fit the deep learning model. Other processes are also performed to prepare the images for the next phase, such as data augmentation, which produces a more diverse and robust training set; image grayscale conversion; and image binarization.
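The resizing and grayscale-conversion steps can be sketched as follows. This is a minimal numpy illustration on a toy 4 × 4 RGB image, not the pipeline code itself; a real pipeline would resize to each model's expected input size (e.g., 224 × 224) and typically use library interpolation routines.

```python
import numpy as np

def to_grayscale(img):
    """Convert an H x W x 3 RGB array to H x W grayscale using luma weights."""
    return img @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array to (out_h, out_w)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

# Toy 4 x 4 RGB "scan" converted to grayscale, then resized to a 2 x 2 grid.
scan = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
gray = to_grayscale(scan)
small = resize_nearest(gray, 2, 2)
```

Binarization would follow the same pattern, thresholding the grayscale array into a boolean mask.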
Transfer learning is an essential step for networks trained on sparse data (a few hundred or thousand images): a network pretrained on a vast dataset of millions of images is reused. There are two main techniques for applying transfer learning: feature extraction and fine-tuning. In the first technique, only the newly added layers are updated during the training phase. In contrast, in the second technique, the weights of all layers are updated, optimized, and customized for the new classification problem. In general, fine-tuning is more effective than the feature extraction technique. Fine-tuning DL models (EfficientB0, VGG-19, DenseNet121, MobileNetV2, etc.) requires extensive resources and time. Initially, the convolution layers learn low-level features and, as the network deepens, mid- and high-level features are learned. With fine-tuning, these trained low-level features are retained, while the high-level features are retrained for the new classification problem.
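The difference between the two techniques can be illustrated with a minimal numpy sketch (not the authors' code): a toy two-layer network in which the "backbone" weights stand in for a pretrained convolutional base and the "head" is a newly added classifier. The layer sizes, learning rate, and data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained backbone" weights and a freshly added classification head.
W_backbone = rng.normal(size=(8, 4))   # stands in for the conv base
W_head = rng.normal(size=(4, 1))       # newly added output layer

x = rng.normal(size=(1, 8))            # one input sample
y = np.array([[1.0]])                  # its label

def grads(Wb, Wh):
    """One forward pass plus the BCE gradients w.r.t. both weight matrices."""
    h = np.tanh(x @ Wb)                          # backbone features
    p = 1.0 / (1.0 + np.exp(-(h @ Wh)))          # sigmoid head output
    d_logit = p - y                              # dE/d(logit) for cross-entropy
    g_head = h.T @ d_logit
    g_backbone = x.T @ ((d_logit @ Wh.T) * (1 - h ** 2))
    return g_backbone, g_head

lr = 0.1
g_b, g_h = grads(W_backbone, W_head)

# Feature extraction: the backbone stays frozen; only the head is updated.
backbone_fe = W_backbone
head_fe = W_head - lr * g_h

# Fine-tuning: every layer is updated for the new classification problem.
backbone_ft = W_backbone - lr * g_b
head_ft = W_head - lr * g_h
```

In a framework such as Keras the same distinction is expressed by marking the base layers as non-trainable (feature extraction) or trainable (fine-tuning) before compiling the model.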
In this study, five residual blocks were used: the input, two convolution layers, a max-pooling layer, and an output layer. Subsequently, fine-tuning transfer learning was employed for the first four head layers of the network. The trainable parameters were adjusted along with the supplemented soft-max activation function, which consisted of two or three output neurons for the binary or three-way classification, respectively. Algorithm 1 summarizes the working steps of our DL model: while the stopping criterion is not met, a forward pass computes the cross-entropy E_c = −(y log(p) + (1 − y) log(1 − p)), and a backward pass updates the weights through the optimizer. In this section, the architectures of the CNNs and the transfer learning approach are described. CNNs are similar to vanilla neural networks, except that the convolution operation is carried out in one or more layers [36]. A simple neural network layer is presented in Equation (1).
z^[1] = W^[1] a^[0] + b^[1] (1)

where z^[1] is the current layer; a^[0] is the first (input) layer; W^[1] represents the weights of the first layer; and b^[1] is the bias. For instance, for the VGG19 Conv layer [37] in Equation (2), each channel of x has a corresponding channel in the first filter of W_C^[1]. Equation (3) illustrates the output of the final layer.
where i, j, and k correspond to the row, column, and channel of z^[1], respectively; l, m, and n refer to the row, column, and channel of the filter, respectively; and k denotes the filter used in the present epoch. Figure 2 depicts the general convolution operation carried out by the CNNs, which comprise input, convolution, pooling, fully connected, and output layers. The chest X-ray and CT scan dataset images were fed into the input layer. Figure 3 illustrates an example of the convolution operation on a 6 × 6 matrix using a stride of 2 and a 3 × 3 filter. The stride value defines how the filter window moves across the input matrix. The pooling layer, which comes after the convolutional layer, is responsible for reducing the computational load of the network. The fully connected layer then receives its inputs from the flattened form of the previous layer, as illustrated in Figure 4; an example of a flattening operation is depicted in Figure 5. In our study, the pooling functions considered were the average, L2-norm, minimum, and maximum functions. The size of the output layer depends on the number of categories used to train the DL models. In our experiments, two different datasets were utilized: the CT scan dataset, with a binary classification of COVID and non-COVID, and the chest X-ray dataset, with a three-way classification of COVID-19, normal, and pneumonia classes.
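The stride arithmetic described above can be checked with a small numpy sketch (illustrative values, not our model code): a 6 × 6 input with a 3 × 3 filter and stride 2 yields a feature map of side floor((6 − 3)/2) + 1 = 2.

```python
import numpy as np

def conv2d_valid(image, kernel, stride):
    """'Valid' 2-D cross-correlation with a square kernel and a given stride."""
    k = kernel.shape[0]
    out = (image.shape[0] - k) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            result[i, j] = np.sum(patch * kernel)
    return result

image = np.arange(36, dtype=float).reshape(6, 6)   # the 6 x 6 input matrix
kernel = np.ones((3, 3)) / 9.0                     # a 3 x 3 averaging filter
feature_map = conv2d_valid(image, kernel, stride=2)
# The result is a 2 x 2 feature map, matching floor((6 - 3) / 2) + 1 = 2.
```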
The proposed DL models use pretrained weights, which help in learning COVID-19 cases. Three main steps follow. In the first step, the training and test sets of CT scan or chest X-ray images are prepared. The first (CT scan) dataset was divided into a training set and a testing set, and the training samples were used to train the models. The training:testing split was 978:274 for the COVID-19 class and 1006:223 for the non-COVID class, as presented in Table 1. For the second (chest X-ray) dataset, there was an approximately equal distribution across the COVID-19, normal, and pneumonia classes, with 1850 images for training and 463 for testing per class, as reported in Table 2.

Table 1. Frequency of training and testing images in the first dataset (CT scan).

               COVID-19    Non-COVID
Training set   978         1006
Testing set    274         223
Total          1252        1229

Table 2. Frequency of training and testing images in the second dataset (chest X-ray).

               COVID-19    Normal    Viral Pneumonia
Training set   1850        1850      1850
Testing set    463         463       463
Total          2313        2313      2313

In the second step, the base model and the new model are generated. Here, five main models with weights pretrained on ImageNet were used as the base models. The experiments were run many times with the intention of reaching the most suitable hyperparameters, which, in turn, would provide the best results. Table 3 summarizes the hyperparameters of the different DL models used in this study, and Table 4 presents the characteristics of the DL model architectures used in our experiments.
Finally, in the third step, the trained weights are updated and then stored. Hence, once the forward propagation is completed, the binary cross-entropy loss function (Equation (4)) is calculated for the output layer.

E_c = −(y log(p) + (1 − y) log(1 − p)) (4)
where y denotes the true value, and p denotes the probability predicted by the model. Then, during backpropagation, the required changes to the weights are computed. One full forward and backward pass over the training data is called an epoch; within an epoch, the samples are passed through the network in batches of size BS.
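The behaviour of this loss can be made concrete with a tiny pure-Python sketch (the probabilities below are made up for illustration): a confident correct prediction incurs almost no loss, while a confident wrong one is penalized heavily.

```python
import math

def binary_cross_entropy(y, p):
    """E_c = -(y*log(p) + (1 - y)*log(1 - p)) for a single prediction."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# True label 1 predicted with probability 0.99: near-zero loss.
low_loss = binary_cross_entropy(1, 0.99)
# True label 1 predicted with probability 0.01: large loss.
high_loss = binary_cross_entropy(1, 0.01)
```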

Description of Datasets
In this study, two different open-access sources were used as our basic experimental datasets. The CT scan dataset [34] had a total number of 2481 CT images divided into 1229 normal cases and 1252 COVID-19 patients, whereas the chest X-ray dataset [35] had 6939 sample images consisting of three classes (COVID-19, normal, and pneumonia), with 2313 samples used for each category. Figure 6 shows samples from the CT scan dataset and Figure 7 shows samples from the chest X-ray dataset. Next, the database was split into training and testing sets. The details for the training and testing samples are shown in Tables 1 and 2 for the CT scan and chest X-ray datasets, respectively, with the results of the different models displayed in the subsequent tables. Table 1 describes the splitting strategy used for the training and testing sets in the experiment for the CT scan dataset. The 80-20 training-testing ratio was adopted in our experiments. The COVID-19 class had 1013 images for training and 239 images for testing, while the normal class had 971 images for training and 258 for the testing set. Table 2 describes the splitting strategy used for the chest X-ray dataset. This dataset had an equal distribution, with 1850 images for training and 463 images for testing in each class. Data augmentation techniques were employed here to effectively increase the number of training samples. The images were augmented through cropping, noising, brightness modifications, contrast modifications, and random flipping.
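The augmentation step can be sketched in numpy as follows; the jitter ranges are illustrative, not the values used in our experiments, and cropping and noising would be added analogously.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img):
    """Apply a random flip plus brightness/contrast jitter to a 2-D image in [0, 1]."""
    out = img.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)                 # random horizontal flip
    gain = rng.uniform(0.9, 1.1)             # contrast modification
    bias = rng.uniform(-0.05, 0.05)          # brightness modification
    return np.clip(out * gain + bias, 0.0, 1.0)

base = rng.random(size=(8, 8))
# Each call yields a new training variant of the same underlying image.
variants = [augment(base) for _ in range(4)]
```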

Performance Metrics
In order to measure the performance of the proposed DL models, Equations (5)-(9) were used.
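These metrics follow the standard confusion-matrix definitions, reconstructed below; the assignment of equation numbers (5)-(9) is assumed, except for (8), whose accuracy formula appears later in the text.

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \tag{5} \\
\text{Sensitivity} &= \frac{TP}{TP + FN} \tag{6} \\
\text{Specificity} &= \frac{TN}{TN + FP} \tag{7} \\
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN} \tag{8} \\
F1 &= \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{9}
\end{align}
```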

Accuracy = (TP + TN) / (TP + FP + TN + FN) (8)

Table 5 reports the results obtained for the five baseline DL models (EfficientB0, VGG-19, DenseNet121, EfficientB7, and MobileNetV2) in the classification task for both the CT scan and chest X-ray datasets. The maximum values are marked in red and underlined. Table 5 also reports the run times in seconds for the training and testing sets; the minimum measured values are likewise marked in red and underlined. As presented in the table, for the CT scan dataset, all five models provided an average accuracy greater than 95%, while for the chest X-ray dataset, all models provided an average accuracy greater than 95% except for the EfficientB7 model, which had 89.04% average accuracy. For the CT scan dataset, EfficientB0 achieved the highest results (99%) in terms of precision and the F1-score, while VGG-19 and DenseNet121 attained the lowest average scores (96%). The MobileNetV2 model achieved the highest averages for sensitivity and accuracy (99.18% and 99.19%, respectively), while EfficientB7 achieved the highest score (99.74%) in terms of specificity. In contrast, VGG-19 attained the lowest averages for sensitivity, specificity, and accuracy. MobileNetV2 achieved the best training and testing run times, with 117.43 s for training and 0.77 s for testing. In general, the MobileNetV2 model can be considered superior to the other four models.

Results for DL Models
For the chest X-ray dataset, DenseNet121 achieved the highest average precision, F1-score, specificity, and accuracy with 99.57%, 99.56%, 99.78%, and 99.71%, respectively. The EfficientB0 model achieved the highest average sensitivity with 99.77%. The VGG-19 model achieved the lowest average values for all metrics. In general, the DenseNet121 model can be considered the best model, despite MobileNetV2 having the lowest training run time. For more details, see Appendix A. Table 6 presents the detailed results for the two categories (COVID-19 vs. non-COVID) obtained with the five mentioned DL models with regard to precision, sensitivity, specificity, F1-score, and accuracy. As can be observed in the table, all models achieved accuracy greater than 95%. The highest values for the COVID-19 class are underlined and marked in red, while those for the non-COVID class are underlined and marked in green. For the COVID-19 category, the EfficientB0 model was the best in terms of precision and F1-score; EfficientB7 achieved the highest sensitivity and specificity, while MobileNetV2 achieved the highest accuracy. Regarding the non-COVID class, the EfficientB0 model was also the best in terms of precision and F1-score, and the MobileNetV2 model achieved the highest sensitivity, specificity, and accuracy. For more details, see Appendix A. Table 7 presents the detailed results for the three-category (COVID-19 vs. normal vs. viral pneumonia) dataset. The highest values for the COVID-19 class are underlined and marked in red, the highest values for the normal class are underlined and marked in green, and, finally, the highest values for the viral pneumonia class are underlined and marked in blue. As can be observed in the table, for the COVID-19 category, DenseNet121 achieved the highest precision, sensitivity, specificity, F1-score, and accuracy with 98.93%, 99.46%, 100%, 99.45%, and 99.63%, respectively.
DenseNet121 also achieved the highest values for the normal class on all metrics, with 100%, 99.89%, 99.78%, 100%, and 99.92%, respectively. Furthermore, for the viral pneumonia class, it achieved the best results in terms of precision, specificity, and accuracy with 99.78%, 99.89%, and 99.56%, respectively; however, MobileNetV2 surpassed DenseNet121 in terms of the F1-score and sensitivity, achieving 99.35% and 98.91%, respectively. For more details, see Appendix A. Figure 8 illustrates the 80-20% confusion matrices for the five models with the CT scan dataset, with support of 274 and 223 for the COVID-19 and normal classes, respectively. The best model, MobileNetV2, misclassified only 2 of the 274 COVID-19 images as normal, and 2 of the 223 normal images were likewise misclassified. In contrast, the worst model, VGG-19, misclassified ten images in the COVID-19 class as normal. Figure 9 illustrates the 80-20% confusion matrices for the five models with the chest X-ray dataset, with support of 457, 463, and 462 for the normal, COVID-19, and viral pneumonia classes, respectively. As can be observed, DenseNet121 was the best model because it classified all 463 COVID-19 images correctly. Classifying COVID-19 images with no errors is a remarkable result, given how readily the virus spreads among individuals. Moreover, only one normal image was misclassified as pneumonia, and five pneumonia images were misclassified as COVID-19; this is less severe than misclassifying COVID-19 images. It can be observed that the DenseNet121 model was not confused with respect to COVID-19 images. Figure 10 shows the receiver operating characteristic (ROC) curves, plotting the true-positive rate (TPR) vs. the false-positive rate (FPR), for the normal, viral pneumonia, and COVID-19 chest X-ray dataset images, along with the area under the curve (AUC) values for each category.
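The AUC values reported alongside the ROC curves in Figure 10 summarize each one-vs-rest curve in a single number; a perfect separator scores 1.0. One way to compute AUC without plotting is the rank (Mann-Whitney) formulation, sketched below with made-up softmax scores rather than the paper's data:

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive example outscores a negative one
    (Mann-Whitney U statistic); ties count as half a win."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

auc = roc_auc([0.9, 0.8, 0.7], [0.6, 0.3])  # → 1.0 (perfect separation)
```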
Figure 9. The 80-20% confusion matrices for the five models with the chest X-ray dataset.

In addition, Figure 11 shows the ROC curves for the COVID-19 and non-COVID CT scan dataset images. Figure 12 illustrates the results of applying the Grad-CAM algorithm to overlay the chest X-ray and CT scan dataset images with heat maps. In the chest X-ray images, the class activation mapping concentrated on particular portions of the normal, viral pneumonia, and COVID-19 classes; in the CT scan images, we applied the Grad-CAM algorithm to the COVID-19 and non-COVID classes. In general, the normal images in both datasets contained no opacity that would distinguish normal patients from other patients, and, as depicted in Figure 12, there were no significant localized regions in normal images. For the other classes, our models demonstrated the capability to detect localized regions in the generated heat maps.

Table 8 presents a comparison of the best proposed DL model and the state-of-the-art method [38] that obtained the best results in the literature for the CT scan dataset. The model proposed here surpassed the state-of-the-art method in terms of sensitivity, specificity, and accuracy. Moreover, the authors of [38] relied on a deep feature fusion stage built from the deep features of AlexNet, GoogleNet, ShuffleNet, and ResNet-18; this fusion demands an enormous number of parameters, which, in turn, requires excessive processing time. Table 9 presents a comparison of the best proposed DL model and the state-of-the-art methods for the chest X-ray dataset. The worst results, such as those for the model from [34], were obtained with models pretrained for feature extraction rather than with the transfer learning strategy.
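For reference, the core Grad-CAM step used to produce the heat maps in Figure 12 weights each feature map of the last convolutional layer by the spatial average of the class score's gradient, then keeps only positive evidence. A NumPy sketch of that combination step (in practice the feature maps and gradients come from a backward pass through the trained model; this is a standalone illustration):

```python
import numpy as np

def grad_cam_heatmap(feature_maps, gradients):
    """Core Grad-CAM combination step.

    feature_maps: (H, W, K) activations of the last conv layer.
    gradients:    (H, W, K) gradient of the class score w.r.t. those maps.
    Returns an (H, W) heat map normalized to [0, 1].
    """
    # Channel weights: global-average-pooled gradients (alpha_k in Grad-CAM).
    weights = gradients.mean(axis=(0, 1))                       # (K,)
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))  # (H, W)
    cam = np.maximum(cam, 0.0)                                  # ReLU: keep positive evidence
    if cam.max() > 0:
        cam /= cam.max()                                        # normalize for overlaying
    return cam
```

The normalized map is then resized to the input image and overlaid as a color heat map.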
In general, the results demonstrate the superiority of the proposed model for the chest X-ray three-class classification task, with a remarkably high average accuracy of 99.32% achieved by the DenseNet121 model. Fine-tuning with a moderate number of layers and parameters contributed appreciably to these results. Similarly, dimension reduction promotes faster learning, which was reflected in the short training time (2906.43 s).

Conclusions
In this study, two primary benchmark datasets of CT scan and chest X-ray images were used. All images were enhanced and preprocessed as part of the basic DL learning phase, and a set of fine-tuned transfer learning models was trained and tested to classify the images as positive or negative for COVID-19. For the CT scan dataset, all five models provided average accuracy greater than 95%, whereas, for the chest X-ray dataset, all models provided average accuracy greater than 95% except EfficientB7, which achieved 89.04% accuracy. Compared to the methods in the literature, the results show that MobileNetV2 surpassed the best method in terms of sensitivity, specificity, and accuracy, with a training run time of 117.43 s and a testing run time of 0.77 s. In addition, DenseNet121 achieved the highest precision, specificity, F1-score, and accuracy for X-ray images with 99.57%, 99.78%, 99.56%, and 99.71%, respectively. In the future, a prediction approach based on a combination of these DL models will be considered in order to improve the results, and more complicated and larger datasets will be used for training to assess the robustness of the proposed approach. Moreover, since using CT scans for COVID-19 detection may expose patients to additional radiation, we will also consider the radiation dose issue in our proposed models.
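One simple form the combination of DL models mentioned as future work could take is soft voting: average the per-class probabilities produced by each fine-tuned model and take the argmax. The sketch below uses hypothetical probabilities for two unnamed models; it is an illustration of the idea, not an implemented part of this study:

```python
import numpy as np

def soft_vote(prob_list):
    """Average per-class probabilities from several models, pick the argmax.

    prob_list: list of (n_samples, n_classes) probability arrays,
               one per model (e.g., MobileNetV2, DenseNet121, ...).
    """
    avg = np.mean(np.stack(prob_list), axis=0)
    return avg.argmax(axis=1)

# Two hypothetical models disagree on the second image; averaging resolves it.
m1 = np.array([[0.9, 0.1], [0.4, 0.6]])
m2 = np.array([[0.8, 0.2], [0.7, 0.3]])
print(soft_vote([m1, m2]))  # → [0 0]
```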