Detection of COVID-19 Using Transfer Learning and Grad-CAM Visualization on Indigenously Collected X-ray Dataset

The COVID-19 outbreak began in December 2019 and has dreadfully affected our lives since then. More than three million lives have been engulfed by this newest member of the corona virus family. With the emergence of continuously mutating variants of this virus, it is still indispensable to successfully diagnose the virus at early stages. Although the primary technique for the diagnosis is the PCR test, the non-contact methods utilizing the chest radiographs and CT scans are always preferred. Artificial intelligence, in this regard, plays an essential role in the early and accurate detection of COVID-19 using pulmonary images. In this research, a transfer learning technique with fine tuning was utilized for the detection and classification of COVID-19. Four pre-trained models i.e., VGG16, DenseNet-121, ResNet-50, and MobileNet were used. The aforementioned deep neural networks were trained using the dataset (available on Kaggle) of 7232 (COVID-19 and normal) chest X-ray images. An indigenous dataset of 450 chest X-ray images of Pakistani patients was collected and used for testing and prediction purposes. Various important parameters, e.g., recall, specificity, F1-score, precision, loss graphs, and confusion matrices were calculated to validate the accuracy of the models. The achieved accuracies of VGG16, ResNet-50, DenseNet-121, and MobileNet are 83.27%, 92.48%, 96.49%, and 96.48%, respectively. In order to display feature maps that depict the decomposition process of an input image into various filters, a visualization of the intermediate activations is performed. Finally, the Grad-CAM technique was applied to create class-specific heatmap images in order to highlight the features extracted in the X-ray images. Various optimizers were used for error minimization purposes. DenseNet-121 outperformed the other three models in terms of both accuracy and prediction.


Introduction
The world has been going through an existential pandemic of the COVID-19 disease since December 2019, and it has affected every aspect of our lives [1]. This pandemic has had an intense social and economic impact on most countries. COVID-19 is characterized by a high transmissibility and has a significant mortality rate [2]. All countries have been implementing several precautionary measures to ensure the safety of their citizens. More than 200 countries and territories have been reported to become infected by this virus. As of 29 June 2021, nearly 180 million confirmed cases of COVID-19 and four million deaths have been reported globally [3]. named 'CoVIR-Net'. They exhibited two approaches: CoVIR-Net with Inception-Resnet-v2 residual blocks and the CoVIR-Net feature extractor with a random forest classifier. The two approaches presented accuracies of 95.78% and 97.29%, respectively. Another method named by the authors as 'COVID-Net' is presented in [24]. The presented architecture utilizes a lightweight residual PEPX (projection-expansion-projection-extension) design pattern. The presented article utilizes an open-source dataset, COVIDx. This dataset has 13,975 X-ray images that have been collected from 13,870 patient cases. The presented model reports a 93.3% test accuracy. Similarly, a ResNet-50-based model termed 'DREnet' is presented in [25]. It was used for the diagnosis of CT images that were divided into three classes. The achieved accuracy is reported as 89.36%. A modified model named 'DeCoVNet' has been presented in [26], which use UNet as the base model. The presented model was tested on 499 CT scan images, with an accuracy of 95.9%. In addition, a model called 'DarkCovidNet' is presented in [27], in which the DarkNet model has been utilized as a classifier for the YOLO real time object selection. It exhibited a binary and multiclass accuracy of 98.08% and 87.02%, respectively. Another model, named 'Coronet', is presented in [28], with the use of Xception as the base model. A total of 203 normal and 660 bacterial pneumonia images were used. The reported accuracy is 89.6%.
A comparative study was performed in [29] using ResNet-18, ResNet-50, COVID-NET, and DenseNet-121 on Pytorch 1.4. The collection of a custom dataset was reported (named as 'CORDA'), which consists of data of 386 patients from a hospital in Turin, Italy. Six other publicly available datasets were also utilized. In addition to the implementation of the aforementioned models, authors have also used a new convolutional neural network comprising of eight convolution layers and a fully connected layer. This new model has been named 'Conv8'. The use of this smaller architecture did not perform well in comparison to large models, such as ResNet-18, but presented acceptable results, with a BA of 0.61 and DOR of 2.38. The main constraint of the study was the lack of availability of a large dataset. The authors suggested that these models can perform quite well upon the availability of a larger dataset. In comparison, the re-search work in this paper focuses on the use of fine tuning and transfer learning by utilizing the pre-trained weights through freezing the pre-trained layers. Only the weights of the last two layers have been retrained/updated. Furthermore, a large dataset of 7232 images has been used in this research, which improves the accuracies of the models used in this paper. Moreover, the testing dataset has been segregated from the training and validation datasets. Apart from this segregated dataset, a locally collected dataset of 450 images has also been utilized for the purpose of testing and prediction.
In this research, the transfer learning technique is used to detect COVID-19 from the pulmonary (chest) X-rays. For this purpose, four pre-trained CNN models, i.e., VGG16, ResNet50, MobileNet, and DenseNet-121 have been used. In order to utilize both the transfer learning technique and the fine-tuning on the aforementioned pre-trained models, batch normalization and dropout layers are added in the FC layer. Details on the implementation of transfer learning are given in Section 2.3. The models under study have been compared on the basis of important parameters, such as the number of epochs, batch size, learning rate, etc. Suggestions have been made on the basis of comparison and critical analysis. The models have been tested on 450 indigenously collected X-ray images of Pakistani patients. Finally, the Grad-CAM technique has been applied in order to create class-specific heatmap images to highlight the features that are extracted from the X-ray images. Various optimizers, i.e., Adam, the stochastic gradient descent (SGD), Adadelta, and RMSprop from Keras version 2.6.0 have been used for error minimization purpose. It is noteworthy to mention here that the pretrained models had originally been trained on the Imagenet dataset, which consisted of millions of images with multiple classes. Whereas, in this research, these models have been retrained on 5062 images assessed through the Kaggle website for two classes only. By utilizing transfer learning and fine tuning, a higher accuracy is achieved for each model. Tensorflow and Keras API is used for the processing of these models. This study was performed on a 12 GB NVIDIA Tesla K80 GPU that

Dataset
In this study, pulmonary (chest) X-ray images were used for the diagnosis of COVID-19. The dataset was categorized into two main classes, i.e., COVID-19 and normal. A total of 7232 images (3616 COVID-19 + 3616 normal) were accessed from the 'COVID-19 Radiography Database (available at Kaggle, https://www.kaggle.com/tawsifurrahman/ covid19-radiography-database, accessed on 4 May 2021) [30]. From the total images, 70% (5062 images) were used for training + validation, and the remaining 30% (2170 images) were used for testing purposes. The training + validation dataset of 5062 images was further split into a 70:30 ratio, i.e., 3544 (70%) images for training purposes and 1518 (30%) images for validation purposes. The details of the data splitting are given in Table 1. In addition to the dataset accessed via Kaggle, another locally collected dataset of 450 images (COVID-19 + normal) was also used for testing and prediction purposes. This indigenous data of chest X-ray images of Pakistani COVID-19 positive and normal patients have been collected from a local hospital. The samples of normal and COVID-19 X-ray images assessed via the Kaggle database are shown in Figure 1, whereas the samples of the locally collected images are given in Figure 2.

Methodology
The methodology adopted for transfer learning for the detection and classification of COVID-19 is depicted in Figure 3. The primary objective was to classify a chest X-ray image into two categories, i.e., normal and COVID-19. The two main stages involved in the model were the preprocessing stage (which further included normalization and data augmentation) and the classification stage (which involved the use of transfer learning on pre-trained models and prediction). A normalization range of 0 to 1 was used. The images were rescaled by multiplying each pixel with a factor of 1/255. Moreover, the images were augmented by the following: (1) rotation at 40 degrees, (2) height, width, and zoom range scaling, and (3) horizontal flipping and vertical flipping. The data augmentation performed on the training images is shown in Figure 4. A sample of the results obtained after the data augmentation is depicted in Figure 5.
During the training process in a CNN model, the visualization of intermediate activations helps to better understand the feature extraction process, especially for an imagebased dataset. The term activation here is referred to as the output of a layer, or, more specifically, the outputs of the several pooling and convolution layers are termed 'feature maps'. Therefore, the purpose of visualizing the intermediate activations was actually to display these feature maps in order to better understand the process of decomposition of an input image into various filters learned by the network. The visualization of the intermediate activations for the CNNs used in this study is given in Figure 6.  (a-c) normal chest X-rays; (d-f) COVID-19 chest X-rays.

Methodology
The methodology adopted for transfer learning for the detection and classification of COVID-19 is depicted in Figure 3. The primary objective was to classify a chest X-ray image into two categories, i.e., normal and COVID-19. The two main stages involved in the model were the preprocessing stage (which further included normalization and data augmentation) and the classification stage (which involved the use of transfer learning on pre- trained models and prediction). A normalization range of 0 to 1 was used. The images were rescaled by multiplying each pixel with a factor of 1/255. Moreover, the images were augmented by the following: (1) rotation at 40 degrees, (2) height, width, and zoom range scaling, and (3) horizontal flipping and vertical flipping. The data augmentation performed on the training images is shown in Figure 4. A sample of the results obtained after the data augmentation is depicted in Figure 5.   [33]. During the training process in a CNN model, the visualization of intermediate activations helps to better understand the feature extraction process, especially for an image- trained models and prediction). A normalization range of 0 to 1 was used. The images were rescaled by multiplying each pixel with a factor of 1/255. Moreover, the images were augmented by the following: (1) rotation at 40 degrees, (2) height, width, and zoom range scaling, and (3) horizontal flipping and vertical flipping. The data augmentation performed on the training images is shown in Figure 4. A sample of the results obtained after the data augmentation is depicted in Figure 5.   [33].  . Augmentation techniques applied on input images used in this study [33].
trained models and prediction). A normalization range of 0 to 1 was used. The images were rescaled by multiplying each pixel with a factor of 1/255. Moreover, the images were augmented by the following: (1) rotation at 40 degrees, (2) height, width, and zoom range scaling, and (3) horizontal flipping and vertical flipping. The data augmentation performed on the training images is shown in Figure 4. A sample of the results obtained after the data augmentation is depicted in Figure 5.   [33]. During the training process in a CNN model, the visualization of intermediate activations helps to better understand the feature extraction process, especially for an image- display these feature maps in order to better understand the process of decomposition of an input image into various filters learned by the network. The visualization of the intermediate activations for the CNNs used in this study is given in Figure 6.

Architecutral Overview of Pre-Trained Models
The VGG16 contains 13 convolutional layers and 3 fully connected dense layers. The ResNet-50 contains 50 layers, and skip connection technique is used in this residual network, which skips convolutional layers that help a lot during backpropagation method. The original model consists of 16 residual blocks and one dense layer, whereas, in this study, three dense layers are added and utilized. Regarding MobileNet, it is a lightweight neural network with lesser parameters and higher image classification accuracy. MobileNet contains 28 convolutional layers and comprises the depthwise separable convolution. The depthwise separable convolution further contains two layers, i.e., depthwise convolution, and pointwise convolution. DenseNet-121, on the other hand, contains 121 layers and has 4 dense blocks. The architectural parameters of the four models under study are given in Table 2, and the architectural designs of the models utilized for transfer learning are shown in Figure 7. The original model consists of 16 residual blocks and one dense layer, whereas, in this study, three dense layers are added and utilized. Regarding MobileNet, it is a lightweight neural network with lesser parameters and higher image classification accuracy. Mo-bileNet contains 28 convolutional layers and comprises the depthwise separable convolution. The depthwise separable convolution further contains two layers, i.e., depthwise convolution, and pointwise convolution. DenseNet-121, on the other hand, contains 121 layers and has 4 dense blocks. The architectural parameters of the four models under study are given in Table 2, and the architectural designs of the models utilized for transfer learning are shown in Figure 7.   The models used in this research have already been trained on a large scale labeled dataset, i.e., ImageNet. During the training phase, the pre-trained weights (the weights obtained when a model was trained on the ImageNet dataset) were utilized. Transfer learning was applied by freezing the pre-trained layers, except for the last two layers of each model. The last two layers were unfrozen and were retrained on our dataset. Moreover, a batch normalization layer was added before the fully connected layer in each model. After flattening the model in the fully connected layer, dense layer 1 was added, which contains 4096 units with ReLU activation function. A batch normalization layer was added again after the dense layer 1. After the batch normalization layer, a dropout layer was added with a dropout size of 0.5. Then, a dense layer 2 was added, which contains 1024 units with the activation function ReLU. Another dropout layer was then added after the dense layer 2, with a dropout size of 0.5, and, finally, a dense layer 3 was added, which contains 2 units with the Softmax activation function.
Usually, a great number of data are needed to train a neural network from scratch, but access to data is not always available, e.g., the COVID-19 radiography data. Utilizing a pretrained model with pretrained weights addresses the issue of large training data requirement. The learning process becomes faster and more accurate and needs less training data. Traditional learning, in comparison, starts with randomized weights and tunes them until they finally converge. Transfer learning, on the other hand, offers a higher learning rate during training. Therefore, with a better starting point and a higher learning rate, transfer learning converges the neural networks faster and at a higher performance level, enabling more accurate outputs.
In this study, fully connected (FC) layers, batch normalization layers, and a dropout of 0.5 was added to the model for the purpose of fine tuning. Details pertaining to the trainable and non-trainable parameters are given in Table 3. The FC layer was kept the same for all of the models. In the FC layer, the dense layer 1 and dense layer 2, an activation function of 'ReLU' with 4096 and 1024 units was used, respectively. Whereas, in the dense layer 3, two units, along with the 'softmax' activation function, was used. This activation function gave us the output from the 2 classes, i.e., COVID-19 and normal. Even though, for binary classification, a simple Sigmoid can also work, in comparison, the Softmax as an output layer works better. The probabilities produced by Sigmoid do not sum up to the value of 1. Whereas, the output probabilities of Softmax are interrelated and always sum up to the value 1. In case of Softmax, increasing the output value of one class makes the output of other class go down. Furthermore, while using ReLU as the activation function, use of Softmax as the output layer is mostly preferred [34]. In a recent research [34], the authors evaluated the performance capability of the Softmax output layer with the ReLU activation layer for several neural networks and validated better performance of Softmax with ReLU activation layers for classification tasks. Hence, Softmax as an output layer in this research is primarily preferred in order to obtain prediction in the form of probabilities. As for the purpose of predicting the final result, it is convenient to know how much our model is closer to predicting the specific class. In addition, in order to have a high probability for one class, the probability of the other class has to decrease by an equal amount; thus, Softmax works better with ReLU activation layers [34]. Table 3. Parameters of the pre-trained models in this study. In addition to dense layers, batch normalization layers were used, which help to converge the model faster and achieve lower error in the training phase. The size of input images was kept constant to 224 × 224 pixels for all four models. However, for the purpose of fine tuning, the last two layers of the pre-trained model were unfrozen and trained on the training dataset used in this study. The final architectures used for transfer learning are given in Figure 7.

The Grad-CAM Technique
In CNNs, the gradient class activation map (Grad-CAM) is a technique that is used to create a class-specific heatmap. This generated class-specific heatmap is based on a specific input image using a trained CNN model [35]. The Grad-CAM technique is employed to figure out the COVID-19 detection transparency. This technique actually highlights the regions of the input image, which is where the model pays much attention during the classification process, implying the fact that the feature maps generated in the final convolution layer hold the spatial information that helps in capturing the visual pattern. This visual pattern contributes in distinguishing assigned classes. The Grad-CAM technique is applied by utilizing the layers and extracted features of the trained model. The architecture explaining the Grad-CAM technique is shown in Figure 8. are given in Figure 7.

The Grad-CAM Technique
In CNNs, the gradient class activation map (Grad-CAM) is a technique that is used to create a class-specific heatmap. This generated class-specific heatmap is based on a specific input image using a trained CNN model [35]. The Grad-CAM technique is employed to figure out the COVID-19 detection transparency. This technique actually highlights the regions of the input image, which is where the model pays much attention during the classification process, implying the fact that the feature maps generated in the final convolution layer hold the spatial information that helps in capturing the visual pattern. This visual pattern contributes in distinguishing assigned classes. The Grad-CAM technique is applied by utilizing the layers and extracted features of the trained model. The architecture explaining the Grad-CAM technique is shown in Figure 8.  [36]. Figure 9 shows an example of a chest X-ray image generated using the Grad-CAM technique. The original chest X-ray image is shown in Figure 9a, whereas the overlay heatmap on the input image is shown in Figure 9b. The jet color scheme is used in this study. In this color scheme, blue tones represent lower values, which means that no features are extracted for a specific class, whereas the yellow and green tones represent medium values depicting quite less feature extraction, and the red and dark red tones represent larger values, i.e., the features in the region represent the specific class.  Figure 9 shows an example of a chest X-ray image generated using the Grad-CAM technique. The original chest X-ray image is shown in Figure 9a, whereas the overlay heatmap on the input image is shown in Figure 9b. The jet color scheme is used in this study. In this color scheme, blue tones represent lower values, which means that no features are extracted for a specific class, whereas the yellow and green tones represent medium values depicting quite less feature extraction, and the red and dark red tones represent larger values, i.e., the features in the region represent the specific class.

Implementation Details and Performance Parameters
The prepared dataset is evaluated using the four pre-trained models, i.e., VGG16, ResNet-50, DenseNet-121, and MobileNet. A 70:30 training-validation ratio was utilized, and the data splitting details can be found in Table 1. The training of the four models in this study was performed on the augmented data. For training, the dataset images were resized to 224 × 224 pixels. Besides, a batch size of 32 is kept with the number of epochs,

Implementation Details and Performance Parameters
The prepared dataset is evaluated using the four pre-trained models, i.e., VGG16, ResNet-50, DenseNet-121, and MobileNet. A 70:30 training-validation ratio was utilized, and the data splitting details can be found in Table 1. The training of the four models in this study was performed on the augmented data. For training, the dataset images were resized to 224 × 224 pixels. Besides, a batch size of 32 is kept with the number of epochs, which is 80. These values were finalized using the grid search approach. The learning rate was fixed to 0.0001 for the training of each model. Furthermore, in order to evaluate the performance of each model, the important performance parameters, i.e., precision, recall, specificity, accuracy, and F1-score were calculated. These parameters were calculated using Equations (1)-(5). The quantities involved in the calculation of aforementioned performance parameters, i.e., True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) were obtained from confusion matrices [37].
The normal and COVID-19 are considered as negative and positive cases, respectively. Therefore, nTN and nTP represent the correctly predicted normal and correctly predicted COVID-19 images, respectively, whereas nFN and nFP represent the falsely predicted normal and falsely predicted COVID-19 cases, respectively.

Results and Discussion
The training performance of the models under study were evaluated in terms of important parameters, i.e., training accuracy, validation accuracy, training loss, and validation loss at different epochs. The results of these parameters can be found in Table 4. These parameters are calculated to estimate the over-fitting and under fitting of the trained models. The graphs of training loss vs. validation loss and training accuracy vs. validation accuracy of each model are given in Figure 10. The training curves are reported on a population of 3544 patients from the training dataset, whereas the validation curves are reported on a population of 1518 patients from the validation dataset. Details of the training and validation datasets with data splitting ratios are given in Section 2.1 and Table 1. It can be observed that the Dense-Net-121 has the minimum training and validation loss and exhibits the best training and validation accuracy.   Furthermore, confusion matrices for all four models were generated in order to quantify the performance metrics, i.e., precision, F1-score, recall, specificity, and accuracy. The results of the aforementioned parameters are given in Table 5. The confusion matrices were generated using the testing dataset of 2170 images. These images were not included in the training and validation datasets. The confusion matrices of each model is given in Figure 11, whereas the true labels and predicted labels from confusion matrices with accuracies and 95% confidence intervals (CI) are presented in Table 6. In addition to calculating the confidence intervals, the Cohen's kappa coefficient was also calculated for each model, which confirms the reliability of the implemented models. 80 is considered to be a substantial agreement with the presented results, whereas the range of 0.81-1.00 is considered as an almost perfect agreement with the presented results [37,38]. It is evident from the aforementioned results that the Cohen's kappa coefficients of the three retrained models, i.e., DenseNet-121, Resnet-50, and MobileNet, lie in the range of 0.81-1.00, and that the Cohen's kappa coefficient of the VGG16 model lies in the range of 0.61-0.80. This confirms the reliability of the implemented models.    Using the numbers obtained via the confusion matrices, the performance parameters for all four models were calculated using Equations (1)- (5). The results of these parameters are given in Table 5. It can be seen that among the four models under study, the DenseNet-121 shows an eminent performance with a recall of 100%, specificity of 92.99%, F1-score of 0.97, and accuracy of 96.49%. When benchmarked with the popular convolutional neural network COVID-Net, which has an accuracy of 93.3%, two of the four retrained models in this research, i.e., DenseNet-121 and MobileNet exhibit an improved accuracy of 96.49% and 96.48%, respectively.

Optimizers
Four optimizers, i.e., SGD, Adadelta, RMSprop, and Adam were applied on the two best trained models in order to compare the performance of the optimizers and to select on optimizer which can be applied on all four models under study. The parameters obtained for Densenet-121 and MobileNet after applying the four aforementioned optimizers are given in Table 7. It can be seen that the RMSprop optimizer exhibited promising results among all four optimizers. Hence, the RMSprop optimizer was chosen to be applied on the four trained models. The results previously shown in Table 5 are actually RMSpropoptimized results. In order to choose an optimum learning rate for all models, the models were evaluated over a selected range of learning rates, i.e., from 10 −2 to 10 −6 (0.01, 0.001, 0.0001, 0.00001, 0.000001). The value of the testing loss was calculated for each model using the testing dataset of 2170 images (dataset details have been provided in Section 2.1 and Table 1). A value of the learning rate at the minimum loss was considered best for a model. The graphs between learning rates and values of the testing loss are given in Figure 12. In Figure 12a, the DenseNet-121 has the minimum testing loss of 0.1 at the learning rate of 0.0001. ResNet-50 has the minimum testing loss of 0.15 at 0.01, as shown in Figure 12b. MobileNet has the minimum testing loss of 0.129 at a 0.0001 learning rate, as shown in Figure 12c, and VGG16 has the minimum testing loss of 0.08 at a 0.0001 learning rate, as well as the same testing loss at a 0.00001 learning rate, as shown in Figure 12d. A learning rate of 0.0001 was chosen based on these results.  Figure 12c, and VGG16 has the minimum testing loss of 0.08 at a 0.0001 learning rate, as well as the same testing loss at a 0.00001 learning rate, as shown in Figure 12d. A learning rate of 0.0001 was chosen based on these results.

Optimal Batch Size
In the training of a CNN model, the batch size plays an indispensable role. The role of the batch size in the improvement of the testing accuracy is studied. The values of test accuracies based on different batch sizes, i.e., 8,16, and 32 are given in Table 8. It is evident that, for a batch size of 32, all of the models demonstrate highest test accuracies. Therefore, a batch size of 32 was set for this work.

Optimal Batch Size
In the training of a CNN model, the batch size plays an indispensable role. The role of the batch size in the improvement of the testing accuracy is studied. The values of test accuracies based on different batch sizes, i.e., 8,16, and 32 are given in Table 8. It is evident that, for a batch size of 32, all of the models demonstrate highest test accuracies. Therefore, a batch size of 32 was set for this work.

Prediction
By applying the transfer learning technique on the pre-trained models, we trained the augmented chest X-ray images, whereas for the testing and validation, local verified images of COVID-19-positive and normal patients were used. The prediction of ResNet-50, MobileNet, and Densenet-121 was accurate in determining whether the X-ray under test is of a COVID-19 patient or of a normal patient. However, VGG16 displayed the least accurate prediction and misclassified the test images because of its low testing accuracy and loss.
The prediction result for a chest X-ray of a local COVID-19-positive patient for each model is given in Figure 13a-d. The DenseNet-121 had the highest probability percentage of COVID-19, i.e., 99.9%. In order to highlight the area where the model is paying the most attention during feature extraction, the Grad-CAM technique was applied. The Grad-CAM results of all four models under study are given in Figure 14. Moreover, the results of the predicted normal patients for each model are shown in Figure 15a-d. Considering these results, the MobileNet has the highest probability of 96.8% for the prediction of a normal chest X-ray.

Prediction
By applying the transfer learning technique on the pre-trained models, we trained the augmented chest X-ray images, whereas for the testing and validation, local verified images of COVID-19-positive and normal patients were used. The prediction of ResNet-50, MobileNet, and Densenet-121 was accurate in determining whether the X-ray under test is of a COVID-19 patient or of a normal patient. However, VGG16 displayed the least accurate prediction and misclassified the test images because of its low testing accuracy and loss.
The prediction result for a chest X-ray of a local COVID-19-positive patient for each model is given in Figure 13a-d. The DenseNet-121 had the highest probability percentage of COVID-19, i.e., 99.9%. In order to highlight the area where the model is paying the most attention during feature extraction, the Grad-CAM technique was applied. The Grad-CAM results of all four models under study are given in Figure 14. Moreover, the results of the predicted normal patients for each model are shown in Figure 15a-d. Considering these results, the MobileNet has the highest probability of 96.8% for the prediction of a normal chest X-ray.

Conclusions
In this paper, we applied transfer learning technique with fine tuning on the four pretrained models (VGG16, ResNet-50, MobileNet, and DenseNet-121) in order to detect COVID-19 using chest X-ray images acquired from a hospital in Pakistan. The models under study were trained using the dataset of more than 3600 COVID-19 and normal chest X-ray images, whereas an indigenously collected dataset of 450 X-ray images of Pakistani patients were used for testing and prediction purposes. Various important parameters, e.g., recall, specificity, F1-score, precision, loss graphs, and confusion matrices were used to validate the accuracy of the models. The VGG16 model exhibited the least accurate performance in classifying the COVID-19 and normal chest X-ray images; however, the DenseNet-121 displayed promising results in the classification of the COVID-19 and normal images. The achieved accuracies of VGG16, ResNet-50, DenseNet-121, and MobileNet are 83.27%, 92.48%, 96.49%, and 96.48%, respectively. Furthermore, in order to highlight the area where the model is paying the most attention during feature extraction, the Grad-CAM technique was also applied to create class-specific heatmap images. Various optimizers were tested, and among all of the optimizers under study, the 'RMSprop' optimizer exhibited the best performance and hence was applied for the error minimization and better optimization in the training procedure.

Conclusions
In this paper, we applied transfer learning technique with fine tuning on the four pre-trained models (VGG16, ResNet-50, MobileNet, and DenseNet-121) in order to detect COVID-19 using chest X-ray images acquired from a hospital in Pakistan. The models under study were trained using the dataset of more than 3600 COVID-19 and normal chest X-ray images, whereas an indigenously collected dataset of 450 X-ray images of Pakistani patients were used for testing and prediction purposes. Various important parameters, e.g., recall, specificity, F1-score, precision, loss graphs, and confusion matrices were used to validate the accuracy of the models. The VGG16 model exhibited the least accurate performance in classifying the COVID-19 and normal chest X-ray images; however, the DenseNet-121 displayed promising results in the classification of the COVID-19 and normal images. The achieved accuracies of VGG16, ResNet-50, DenseNet-121, and MobileNet are 83.27%, 92.48%, 96.49%, and 96.48%, respectively. Furthermore, in order to highlight the area where the model is paying the most attention during feature extraction, the Grad-CAM technique was also applied to create class-specific heatmap images. Various optimizers were tested, and among all of the optimizers under study, the 'RMSprop' optimizer exhibited the best performance and hence was applied for the error minimization and better optimization in the training procedure.  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.