A Deep-Learning-Based Framework for Automated Diagnosis of COVID-19 Using X-ray Images

Abstract: The emergence and outbreak of the novel coronavirus (COVID-19) had a devastating effect on global health, the economy, and individuals' daily lives. Timely diagnosis of COVID-19 is a crucial task, as it reduces the risk of pandemic spread, and early treatment will save patients' lives. Due to the time-consuming and complex nature and the high false-negative rate of the gold-standard RT-PCR test used for the diagnosis of COVID-19, the need for an additional diagnosis method has increased. Studies have proved the significance of X-ray images for the diagnosis of COVID-19. Applying deep-learning techniques to X-ray images can automate the diagnosis process and serve as an assistive tool for radiologists. In this study, we used four deep-learning models (DenseNet121, ResNet50, VGG16, and VGG19) with the transfer-learning concept to classify X-ray images as COVID-19 or normal. In the proposed study, VGG16 and VGG19 outperformed the other two deep-learning models. The study achieved an overall classification accuracy of 99.3%.


Introduction
A novel coronavirus (2019-nCoV or COVID-19 or SARS-CoV-2) case was reported in China in December 2019. The spread of the virus has increased at an exponential rate globally and has become a pandemic. The COVID-19 outbreak has had a distressing impact on individuals' lives and community health and has led to an economic crisis. The COVID-19 pandemic has led to 7,981,067 cases, with 435,141 deaths, and is affecting 213 countries worldwide [1]. Some of the common symptoms associated with the virus are fever, cough, sore throat, shortness of breath, etc. The virus is transmitted via human-to-human interaction and respiratory droplets [2]. Due to the highly contagious nature of the COVID-19 virus, early detection is crucial to control the outbreak. The long incubation period of COVID-19 has posed a substantial challenge to controlling this pandemic. Furthermore, the asymptomatic nature of COVID-19 in some patients is another driver of the outbreak. Owing to these reasons, early detection and control of the spread of COVID-19 is very hard.
Due to the above-mentioned challenges and the increase in the number of cases, efforts have been made to explore an effective method for accurate and easy diagnosis of COVID-19. Among the gold-standard laboratory methods used for the diagnosis of COVID-19 is real-time Reverse Transcription Polymerase Chain Reaction (RT-PCR) using an oral or nasopharyngeal swab [3]. The RT-PCR test requires a special kit to perform the diagnosis, and it is a complex and time-consuming test. Furthermore, RT-PCR suffers from a high false-negative rate and poor sensitivity [4]. A high false-negative rate increases the prevalence of the virus, as some of the patients who carry the virus are diagnosed as negative, may interact with a substantial number of people, and therefore transmit the virus. These individuals are, in turn, very precarious and can potentially increase the spread of the virus.
The study is organized as follows: Section 2 covers the literature review. Section 3 discusses the data set description. Section 4 covers the methodology along with the evaluation parameters used. Section 5 presents the experimental results, while Section 6 contains the comparison between the proposed study and previous studies. Section 7 concludes the study.

Literature Review
Having established the significance of X-ray images and deep-learning in COVID-19 diagnosis, several notable studies are discussed in this section.
Recently, a study by Hemdan et al. [29] presented COVIDX-Net, containing a comparative analysis of seven deep-learning models for the diagnosis of COVID-19. The models used were VGG19, DenseNet201, ResNetV2, InceptionV3, InceptionResNetV2, Xception, and MobileNetV2, using a binary data set that consisted of 50 X-ray samples. Experiments were performed using the X-ray images from two data sets: the COVID-19 X-ray image database [30], which consists of 123 frontal-view X-rays, and the Adrian Rosebrock data set [31]. The study achieved the highest accuracy of 90% with VGG19 and DenseNet201. Nevertheless, the study suffers from the limitation of a small data set.
Likewise, VGG19 and ResNet50 were used and compared with the proposed COVID-Net model in the study performed by Wang et al. [32] for COVID-19 diagnosis, using an ImageNet-pretrained model and the Adam optimizer on a multiclass data set (normal, pneumonia, and COVID-19). They achieved a better accuracy of 93.3% when compared to the study mentioned earlier. The 13,975 X-ray images were taken from multiple open-source data sets [30,[33][34][35][36]. To address the data imbalance issue, a data augmentation technique was used. Similar to Wang et al., another study performed by Apostolopoulos et al. [37] also used VGG19. The data set used in the study was collected from four open-source data sets [30,34], Radiopaedia [38], and the Italian Society of Medical and Interventional Radiology (SIRM) [39]; the total number of X-ray images was 1427 (224 COVID-19, 700 pneumonia, and 504 normal). Experiments were performed for binary and multiclass categories. The highest accuracy achieved for the binary class was 98.75%, and for multiclass, the highest accuracy was 93.48%. Like the two previously mentioned studies [29,32], VGG19 outperformed the other models in terms of accuracy.
Furthermore, Kumar et al. [40] used ResNet50 and Support Vector Machine (SVM) classification for the diagnosis of COVID-19 using X-ray images. The chest X-ray images used in the study were taken from two COVID-19 X-ray image databases: Cohen [30] and Kaggle Chest X-ray Images (Pneumonia) [41]; however, the experiments used only 50 chest X-rays (25 COVID-19, 25 non-COVID). The study achieved an accuracy of 95.38% for the binary class, outperforming the study performed by Hemdan et al. [29]. Similarly, the study performed by Ali et al. [42] also found ResNet50 to have the highest accuracy, of 98%, using some X-ray images from the data sets [30,41]. Although the study achieved a very high outcome, the data set used was small.
In addition, Ozturk et al. [43] proposed DarkNet, a deep neural network model for COVID-19 diagnosis using X-ray images from two data sets, namely, the COVID-19 X-ray image database [30] and the ChestX-ray8 database [44], for both binary (125 COVID, 500 no-findings) and multiclass (125 COVID, 500 pneumonia, and 500 no-findings) classification. The model contains a 17-layer convolutional network, a leaky ReLU activation function, and a "You Only Look Once" (YOLO) object detection model. Accuracies of 98.08% for the binary class and 87.02% for the multiclass were achieved. DarkNet yielded a better classification performance when compared to the former study [40]. Despite all these benefits, the study suffers from the limitation of a small number of X-ray images of COVID-19 patients.
Consequently, due to the effectiveness of the ResNet and DenseNet models in the previous studies, Minaee et al. [45] proposed a Deep-COVID model using ResNet18, ResNet50, SqueezeNet, and DenseNet121. ResNet was trained using an ImageNet-pretrained model. The study created a data set of 5000 chest X-ray images (COVID X-ray 5k) using two open-source data sets [30,46]. The study achieved 97.5% sensitivity and 90% specificity for the binary classification. However, the number of samples for COVID-19 was 100, while the non-COVID category had 5000 samples; this indicates a huge data imbalance.
To address the small size of the open-source data sets for COVID-19, Afshar et al. [47] proposed a model, COVID-CAPS, using a capsule network containing four CNN layers and three capsule layers. The study was performed using two open-source data sets [30,41] and achieved the highest accuracy of 95.7%, a specificity of 95.8%, and a sensitivity of 90%. However, the study suffers from a huge data imbalance. Furthermore, Ucar et al. [48] proposed COVIDiagnosis-Net, a deep-learning-based model using SqueezeNet and Bayesian optimization techniques on a COVIDX-Net data set [29]. The study produced an accuracy of 98.3% for multiclass classification. Likewise, the same data set was used in another model, COVID-ResNet, by Farooq et al. [49]. They used a pretrained ResNet50 with the aim of reducing training time and achieved an overall accuracy of 96.23% for multiclass. This study achieved lower accuracy when compared to the former study but also covered bacterial infection. Correspondingly, a pretrained ResNet18 was used in another study, performed by Oh et al. [50], which achieved an accuracy of 88.9%. Several data sets were used, such as the Japanese Society of Radiological Technology (JSRT) [51,52], the U.S. National Library of Medicine (USNLM) collected Montgomery County (MC) data set (NLM/MC) [53], CoronaHack [54], and the COVID-19 X-ray image database [30].
Undoubtedly, the significance of the chest X-ray for the diagnosis of COVID-19 and the applicability of deep convolutional models for the automated analysis of X-rays [28] motivated the need for further exploration. Despite these advantages, it is difficult to find an open-source data set containing a large number of COVID-19 X-ray images, and most of the previous studies suffer from a small data set or data imbalance. To avoid these drawbacks, we used a data set that combines a number of open-source data sets. Several studies have already been performed, but there is still a need for further exploration.

Data Set Description
The X-ray images used were taken from four open-source chest X-ray image data sets, with a total of 1683 X-rays. The details of the images taken from each data set are as follows:
a. The COVID-19 X-ray image database collected by Cohen et al. [30] consists of a total of 660 images; some of the images in the data set were CT scans, and some were nonfrontal chest X-rays. CT scans, nonfrontal X-rays, and X-rays of non-COVID-19 patients were removed. Moreover, the images tagged with pneumonia were also removed from the data set. A total of 390 frontal chest X-rays of COVID-19-positive patients were selected from Cohen's data set.
b. Furthermore, 25 X-ray images of COVID-19 patients were selected from the COVID-19 chest X-ray data initiative [33]. The original data set consisted of 55 X-rays; some of the images were not clear and were not considered in our experiments.
c. Additionally, 180 X-ray images of COVID-19 were selected from the Actualmed COVID-19 chest X-ray data initiative [35]. Originally, the data set consisted of 237 scans.
d. Finally, the X-ray images of both the normal and COVID-19 categories were selected from the COVID-19 radiography database [36], which contains 219 COVID-19, 1341 normal, and 1345 viral pneumonia X-ray images. In our study, we selected 195 X-rays for COVID-19 and 862 images for the normal category, a total of 1057 images.
The total number of images per category used for training, testing, and validation is shown in Table 1, while Figure 1 indicates the number of images per category (normal and COVID-19). The distribution of the data set used in the study was stratified in order to alleviate the data imbalance issue. Figure 2 shows sample COVID-19 and normal X-rays from the data set.
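The paper does not provide the splitting code; a minimal NumPy sketch of a stratified split (the function name and `test_frac` parameter are illustrative, not the authors' implementation) shows how each class keeps the same train/test ratio:

```python
import numpy as np

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices per class so train and test keep the class ratio."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # all samples of class c
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_frac))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)
```

With an 80/20 class distribution and `test_frac=0.25`, the test set receives exactly 25% of each class, which is the property the stratified distribution in Table 1 relies on.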

Figure 2. Chest X-ray sample images (COVID-19 and normal) from the data set used in the study.

Methodology
The model consists of two steps: (1) preprocessing and data augmentation and (2) transfer-learning using pretrained deep-learning models (ResNet50, VGG16, VGG19, and DenseNet121). This study classified the chest X-ray images into two classes: normal and COVID-19. Each stage is described below.

Data Preprocessing and Augmentation
During this stage, a data augmentation technique was applied with the aim of alleviating model overfitting. Due to the depth of the pretrained models, there is a high risk of overfitting when the data set is small. To circumvent this drawback, additional images were generated using data augmentation, which increases the generalization of the data, specifically for X-ray data sets [55,56]. Augmentation was applied in three steps: resizing, flipping, and rotation. Images were resized to a dimension of 224 × 224 × 3. Moreover, random horizontal flips were used to increase the generalization of the model for all possible locations of COVID-19 symptoms in the X-rays. Finally, some images were generated by applying a rotation of 15 degrees. These augmentation techniques aimed to boost the generalization of the proposed model and were applied only to the X-ray training set.
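The paper does not name the framework used for augmentation; the three steps can be sketched in plain NumPy, with nearest-neighbour resizing and rotation standing in for whatever library routines were actually used (all function names here are illustrative):

```python
import numpy as np

def resize_nn(img, size=(224, 224)):
    """Nearest-neighbour resize to the 224 x 224 network input size."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row for each target row
    cols = np.arange(size[1]) * w // size[1]   # source col for each target col
    return img[rows][:, cols]

def rotate_nn(img, deg=15):
    """Nearest-neighbour rotation about the image centre (same output size)."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    th = np.deg2rad(deg)
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.round(cy + (yy - cy) * np.cos(th) - (xx - cx) * np.sin(th)).astype(int)
    src_x = np.round(cx + (yy - cy) * np.sin(th) + (xx - cx) * np.cos(th)).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out

def augment(img, rng):
    """Resize, random horizontal flip, then a 15-degree rotation (training set only)."""
    img = resize_nn(img)
    if rng.random() < 0.5:
        img = img[:, ::-1]          # horizontal flip
    return rotate_nn(img, deg=15)
```

In practice the same three operations are one-liners in any image-augmentation library; the point of the sketch is only to make the pipeline order (resize, flip, rotate) concrete.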

Deep Neural Networks and Transfer-Learning
The emergence of Convolutional Neural Networks (CNNs), or deep-learning, has enhanced the image classification task. Training a deep neural network requires a large training data set: the performance of a deep-learning model depends highly on the number of images used to train it, because the model has the innate ability to extract features (temporal and spatial) using filters. However, deep-learning can also be employed in domains where the data set is not huge by using the concept of transfer-learning. In transfer-learning, the features a CNN has learned on one data set are transferred to solve a related task on new data (a small data set), where building a CNN from scratch is unsuitable [57]. Among the widely used approaches for transfer-learning in the medical domain is taking a model pretrained on a huge data set, i.e., ImageNet [58], for object detection and classification. The choice of deep-learning model for transfer-learning depends on the ability of the model to extract the features relevant to the domain.
Transfer-learning is implemented in two steps: feature extraction and parameter tuning (the optimization strategy). During feature extraction, the pretrained model learns new features from the training data. Secondly, to optimize the performance of the model in the target domain, the model architecture is reconstructed and updated along with parameter tuning. Using a pretrained model alleviates the drawback of a small data set and reduces the computational cost.
The pretrained models used in this study were DenseNet121 [59], ResNet50 [60], VGG16, and VGG19 [61]. These models were pretrained using the ImageNet data set and were further trained using the X-ray data set.
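As a toy illustration of the two transfer-learning steps above, the sketch below stands a fixed random projection in for the frozen ImageNet-pretrained backbone and trains only a logistic-regression head on top; every name and dimension is illustrative, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen pretrained backbone (in the paper's setting:
# VGG16/VGG19/ResNet50/DenseNet121 convolutional layers with ImageNet weights).
W_backbone = rng.normal(size=(64, 16)) / 8.0

def extract_features(x):
    # Feature-extraction step: backbone weights are never updated.
    return np.maximum(x @ W_backbone, 0.0)

# Trainable head (parameter-tuning step): logistic regression.
w, b = np.zeros(16), 0.0
X = rng.normal(size=(200, 64))                  # toy stand-in for X-ray inputs
y = (X @ W_backbone[:, 0] > 0).astype(float)    # toy binary labels

F = extract_features(X)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    w -= 0.1 * F.T @ (p - y) / len(y)           # only the head's weights move
    b -= 0.1 * (p - y).mean()

accuracy = ((1.0 / (1.0 + np.exp(-(F @ w + b))) > 0.5) == y).mean()
```

The design point is that the gradient updates touch only `w` and `b`, which is what makes transfer-learning cheap on a small data set: the expensive backbone is reused as-is.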

a. DenseNet121: The dense convolutional neural network (DenseNet) is a feed-forward, densely connected network in which the feature map of each layer serves as an input to every subsequent layer. Among the advantages of DenseNet is that it requires fewer parameters. The number of filters or feature maps used in DenseNet is 12. A traditional convolutional neural network with L layers contains L connections, while in DenseNet, the number of direct connections is L(L + 1)/2 [59]. The dense connectivity of the model circumvents redundant learning. In addition, DenseNet decreases the chance of model overfitting on a small training data set by acting as a form of regularization.
b. ResNet50: ResNet, also known as the deep residual network, was initially proposed in 2015 with the motivation of an "identity shortcut connection". It is also among the models pretrained on ImageNet. ResNet skips one or more layers and handles the vanishing-gradient issue. Among the key advantages of ResNet is easier optimization; moreover, the accuracy of the model can be enhanced by increasing its depth [60]. The ResNet model skips one, two, or more layers, connecting directly to a later (not necessarily adjacent) layer, using a ReLU nonlinear activation function, and is trained with standard forward and backward propagation.
c. VGG: VGG, also known as a very deep convolutional network, was first introduced in 2014. VGG is an advanced version of AlexNet with an increased number of layers; the increase in the number of layers increases the generalization of the model [61]. A benefit of VGG is the use of only 3 × 3 convolutional filters. The only difference between VGG16 and VGG19 is the number of layers. We used both models for COVID-19 X-ray classification.
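The "identity shortcut connection" described in (b) can be written in a few lines. This toy NumPy block (names illustrative) shows how the input is added back to the transformed output, so that gradients can bypass the stacked layers:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """out = ReLU(F(x) + x): the shortcut adds the input back unchanged."""
    out = relu(x @ W1)      # first stacked layer
    out = out @ W2          # second stacked layer (no activation yet)
    return relu(out + x)    # identity shortcut, then the nonlinearity
```

With zero weights the block reduces to the identity map (for nonnegative input), which is precisely why adding more residual layers cannot make the network harder to optimize than a shallower one.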
Finally, for training the networks, an input image size of 224 × 224 × 3 was used, and the initial learning rate (1 × 10−3) was kept fixed for all models. The number of epochs was kept at 30 for each model, and the ReLU activation function was used in order to make the feature-extraction range of the neurons more extensive.

Model Evaluation
For evaluating the performance of the proposed models, several standard evaluation parameters were used. In the current study, we compared the performance of all the developed models in terms of Accuracy (ACC), Sensitivity (SEN), Specificity (SPE), Positive Predicted Value (PPV), F1 Score (F1), False-Negative Rate (FNR), and False-Positive Rate (FPR) [62]. Accuracy (ACC) represents the number of X-ray samples classified correctly as COVID-19 or healthy divided by the total number of X-rays in the data set. Sensitivity is among the widely used measures in the health domain; it indicates the true positive rate of the proposed technique. As mentioned in the first section, the sensitivity of the RT-PCR test is low, so in the current study, sensitivity is among the key evaluation measures. It represents the ratio of the number of patients predicted as COVID-19 by the model to the overall number of COVID-19 cases in the data set. Specificity, also known as the true negative rate, represents the ratio of the number of patients predicted as normal to all patients that were normal in the data set.
Moreover, the proposed models were evaluated in terms of the positive predicted value (PPV), also known as precision. It indicates, when an X-ray image is predicted as COVID-19, how well that prediction represents the actual presence of the disease.
In addition, the other measures used were F1, FNR, and FPR. F1 is a combination of the positive predicted value (precision) and sensitivity (true positive rate, also known as recall). The false-negative and false-positive rates measure the number of X-ray samples mistakenly predicted as healthy or as COVID-19 positive, respectively. The smaller the false-negative and false-positive rates, the higher the significance of the proposed model.
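All seven measures follow directly from the four binary confusion-matrix counts; a small sketch (function name illustrative), with label 1 taken as COVID-19 positive and label 0 as normal:

```python
import numpy as np

def metrics(y_true, y_pred):
    """Binary classification metrics; label 1 = COVID-19, label 0 = normal."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / len(y_true)
    sen = tp / (tp + fn)           # sensitivity / recall / true positive rate
    spe = tn / (tn + fp)           # specificity / true negative rate
    ppv = tp / (tp + fp)           # positive predicted value / precision
    f1 = 2 * ppv * sen / (ppv + sen)
    fnr = fn / (fn + tp)           # = 1 - sensitivity
    fpr = fp / (fp + tn)           # = 1 - specificity
    return dict(ACC=acc, SEN=sen, SPE=spe, PPV=ppv, F1=f1, FNR=fnr, FPR=fpr)
```

Note that FNR and FPR are the complements of sensitivity and specificity, which is why a model with 100% sensitivity (as VGG19 reports below) necessarily has an FNR of 0.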

Experimental Results
Experiments were conducted using the four trained models: DenseNet121, ResNet50, VGG16, and VGG19. Figures 3-6 show the training accuracy, training loss, testing accuracy, and testing loss of the implemented models against the number of epochs. Figure 3 plots accuracy against the number of epochs for all the implemented deep-learning techniques during training, while Figure 4 shows the training loss. These figures demonstrate the significance of VGG19 and VGG16 in the training phase: VGG19 had the highest accuracy and the lowest loss. Similarly, the testing accuracy curve (see Figure 5) and loss curve (see Figure 6) showed the effectiveness of both the VGG16 and VGG19 deep-learning models. However, DenseNet121 was much less stable and produced the worst outcome when compared with the other applied models.
In our study, a total of four models were developed, and the performance of each model was evaluated in terms of the measures discussed in the previous section. The comparative analysis of all four models is shown in Table 2. Our results demonstrate that VGG16 and VGG19 outperformed the other two models. The highest accuracy achieved in the study was 99.33%; both VGG16 and VGG19 achieved similar accuracy. However, the sensitivity of VGG19 was higher than that of VGG16. The main motivation for using X-ray images for the early diagnosis of COVID-19 was alleviating the impediment of RT-PCR, i.e., its lower sensitivity; the main challenge in controlling the outbreak of the pandemic is that the gold-standard test used for the diagnosis of COVID-19 has a lower sensitivity. Higher sensitivity indicates that only a few COVID-19 patients will go undetected, which will ultimately reduce the spread of the disease. VGG19 achieved the highest sensitivity of 100%. Similarly, specificity indicates the true negative rate; VGG16 achieved the highest specificity of 99.38%.
Consequently, the false positive rate is also among the key measures, because if a patient is wrongly predicted as positive and treated alongside actual COVID-19 patients, that might expose the patient to the virus. Moreover, due to the exponential growth of the COVID-19 pandemic, most countries are facing difficulties in managing patients, as the number of patients is very high and the resources are not enough to treat them. A high FPR increases the burden on the health care system due to the increased number of required resources (RT-PCR test kits, other medical resources), which will sometimes fail to accommodate the actual positive patients.
VGG16 achieved the lowest false positive rate of 0.62%, while VGG19 achieved the best false negative rate. Similarly, VGG16 achieved the highest outcomes in terms of positive predicted value and F1 score.

Comparison with Existing Studies
In the proposed study, four deep-learning models (DenseNet121, ResNet50, VGG16, and VGG19) were used. The data set consisted of a total of 1272 X-ray images (642 normal, 630 COVID-19). To assess the performance of the proposed techniques, the outcome of the study was compared with benchmark studies; the criterion for inclusion in the benchmark was the use of X-ray radiology images for the diagnosis of COVID-19. Table 3 compares the proposed technique with the benchmark studies in the literature.
Based on Table 3, the proposed study outperformed the studies in the benchmark. Most of the previous studies had a very limited number of COVID-19 X-ray radiology images. Novel coronavirus (COVID-19) is a new pandemic, and limited open-source X-ray radiology images are available for developing a deep-learning-based automated diagnosis model; nevertheless, a huge number of X-ray images is available for other respiratory diseases. Most of the previous studies suffered from data imbalance issues. Widely used deep-learning models in the literature for COVID-19 diagnosis using X-ray radiology images were VGG19, ResNet, Inception, DenseNet, and SqueezeNet. However, in our study, Xception and SqueezeNet did not provide good outcomes.
The main contributions of the current study are:
1. The study does not suffer from data imbalance.
2. The model was trained using a large number of COVID-19 X-ray radiology images when compared to the previous studies.
3. The proposed model is a fully automated diagnosis method and does not require any separate feature extraction or annotation prior to the diagnosis.
4. Data augmentation was applied to increase the generalization of the proposed model.
5. The model outperforms the benchmark studies.
Despite the above-mentioned advantages, the study also suffers from some limitations:
1. The proposed system needs to be trained for other respiratory diseases. The current model only distinguishes COVID-19 from healthy individuals and is unable to diagnose other kinds of pneumonia and respiratory infections.
2. The number of COVID-19 X-ray radiology images needs to be increased for better model training. The deep-learning model's performance can be further enhanced by increasing the size of the data set.
3. The current study was based on a data set curated from several open-source chest X-ray image collections. These samples were collected from various research publications or uploaded by volunteers; therefore, the X-ray images were not collected in a rigorous manner.
To alleviate the above-mentioned limitations, there is a need to develop a model using X-ray data samples collected directly from hospitals.

Conclusions
In this study, we used transfer-learning for automated COVID-19 diagnosis using X-ray images. The motivation for using X-ray images for the diagnosis of COVID-19 is the lower sensitivity of the gold-standard RT-PCR diagnosis test. The proposed system achieved the highest sensitivity of 100% and a specificity of 99.38% when compared to the studies in the benchmark. The system can assist radiologists in the early diagnosis of COVID-19. Generalization of the model was improved by generating additional data using augmentation. Moreover, the study attempted to use a large number of COVID-19 X-ray images by combining several open-source data sets. Despite combining multiple open-source data sets, there is still a need for an increased number of COVID-19-positive X-ray sample images; an increased number of COVID-19 X-ray samples will enhance the model's performance.
Author Contributions: Conceptualization, investigation and software, I.U.K.; data curation, formal analysis, writing-original draft and writing-review & editing, N.A. All authors have read and agreed to the published version of the manuscript.