A Deep Learning Approach to Detect COVID-19 Patients from Chest X-ray Images †

Abstract: Deep Learning has improved multi-fold in recent years and plays a major role in image classification, including medical imaging. Convolutional Neural Networks (CNNs) have performed well in detecting many diseases, including coronary artery disease, malaria, Alzheimer's disease, various dental diseases, and Parkinson's disease. As in those cases, CNNs show substantial promise for detecting COVID-19 patients from medical images such as chest X-rays and CT scans. COVID-19 has been declared a global pandemic by the World Health Organization (WHO). As of 8 August 2020, there were 19.18 M confirmed COVID-19 cases and 0.716 M deaths worldwide. Detecting Coronavirus-positive patients is very important in preventing the spread of the virus. To this end, a CNN model is proposed to detect COVID-19 patients from chest X-ray images. Two more CNN models with different numbers of convolutional layers and three other models based on pretrained ResNet50, VGG-16 and VGG-19 are evaluated in a comparative analysis. All six models are trained and validated with Dataset 1 and Dataset 2. Dataset 1 has 201 normal and 201 COVID-19 chest X-rays, whereas Dataset 2 is comparatively larger, with 659 normal and 295 COVID-19 chest X-ray images. With Dataset 2, the proposed model achieves an accuracy of 98.3%, a precision of 96.72%, an area under the Receiver Operating Characteristic (ROC) curve of 0.983, and an F1-score of 98.3. Moreover, this work shows a comparative analysis of how the number of convolutional layers and the size of the dataset affect classification performance.


Introduction
Deep Learning (DL) is a branch of Machine Learning (ML) inspired by the way the human brain works. DL is capable of unsupervised learning, i.e., learning from unlabeled examples. Features such as the use of unlabeled data, operation without feature engineering, and prediction with high accuracy and precision make DL very popular in Artificial Intelligence (AI) and Big Data analysis [1,2]. DL has been widely used in industry, self-driving cars, face recognition, object detection, image classification and many other fields [3]. The Convolutional Neural Network (CNN) is a DL algorithm that has performed very well on problems such as document analysis, various kinds of image classification, pose detection and action recognition [4]. Medical imaging is another field where CNNs have shown promising results in recent years [5].
The rest of the paper is organized as follows. Related works are described in Section 2, the proposed and pretrained models are presented in Section 3, the results and comparative analysis are given in Section 4, and Section 5 concludes the paper.

Related Works
Extensive research is under way on classifying COVID-19 patient image data. Some researchers have proposed DL models for classifying chest X-ray images, whereas others have considered CT images. Narin et al. proposed three pretrained CNN models based on ResNet50, InceptionV3 and Inception-ResNetV2 for detecting COVID-19 patients from chest X-ray radiographs [29]. ResNet50 gives a classification accuracy of 98%, whereas InceptionV3 and Inception-ResNetV2 achieve accuracies of 97% and 87% respectively. However, these models were trained on only 100 images (50 COVID-19 and 50 normal chest X-rays), so the accuracy might decline with a larger number of training images. Zhang et al. propose a DL model for screening Coronavirus patients using their chest X-ray images [30]. This research group used 100 chest X-ray images of 70 COVID-19 patients and 1431 X-ray images of other pneumonia patients, classified as COVID-19 and non-COVID-19 respectively. The model has three main parts: a backbone network, a classification head, and an anomaly detection head. The backbone network is an 18-layer residual CNN pre-trained on the ImageNet dataset, which provides a very large general-purpose dataset for image classification. This model can diagnose COVID-19 and non-COVID-19 patients with an accuracy of 96% and 70.65% respectively. Hall et al. also worked on finding COVID-19 patients from a small set of chest X-ray images with DL [31]. They used a pre-trained ResNet50, which yields an overall accuracy of 89.2%. Sethy and Behera also utilized deep features for Coronavirus disease detection [32]. Their model is based on ResNet50 plus SVM and achieved an accuracy and F1-score of 95.38% and 91.41% respectively. Apostolopoulos and Mpesiana utilized CNN transfer learning for detecting COVID-19 with X-ray images [33].
This work has considered 224 chest X-ray images of COVID-19 [35]. This work classifies the dataset into three categories (normal, COVID-19 and pneumonia) with 1525 X-ray images in each category. This CNN-LSTM based model achieved an overall accuracy of 99.4% and an F1-score of 98.9. Hemdan et al. introduced a deep learning framework named COVIDX-Net to classify COVID-19 X-ray images [36]. This model is based on only 25 chest X-ray images in each of the two classes, normal and COVID-19. To classify the images, this model uses seven different pre-trained models: VGG19, DenseNet121, InceptionV3, ResNetV2, Inception-ResNetV2, Xception, and MobileNetV2 [14,37-43]. This research group achieved the best performance with VGG19, with an overall accuracy of 90% and an F1-score of 90.94. Chowdhury et al. used transfer learning with image augmentation to detect COVID-19 from chest X-ray images [44]. This work performs the classification under two schemes: (i) COVID-19 and normal, and (ii) normal, viral pneumonia and COVID-19. They used 423 COVID-19, 1485 viral pneumonia, and 1579 normal chest X-ray images for training and validation. This group achieved an excellent result in binary classification, with an accuracy and F1-score of 99.70% and 99.70 respectively. Ozturk et al. also perform binary and multi-class classification with their proposed DarkCovidNet model, which is based on the Darknet-19 model [45]. They used 500 normal and 127 COVID-19 chest X-ray images for training and validation. For binary classification, this model achieved an average overall accuracy of 98.08%, whereas this decreases to 87.02% for multi-class classification. [46-50]. Other researchers have also worked on detecting COVID-19 patients from CT scans [51,52].
Most of the works discussed above use pretrained models in their architecture. These models are pretrained on more general datasets such as ImageNet. Here, a sequential CNN model is proposed that is computationally efficient owing to its simple architecture and is trained from scratch on the relevant dataset. Moreover, this model is trained and validated with a smaller dataset, as in [53], and also with a comparatively larger dataset, to analyze how the model performs as the dataset grows and the number of convolutional layers changes, which is novel with respect to the discussed literature.

Proposed CNN Model for COVID-19 Detection
The whole system for detecting COVID-19 from chest X-ray images comprises a few important steps: collecting the dataset, pre-processing the data, categorizing the dataset, training the models, and evaluating and analyzing the models. The complete system architecture for detecting COVID-19 with CNN is depicted in Figure 2. First, the dataset needed for training and validating the model is collected and sorted. The collected data are then shuffled, resized and normalized to maintain uniformity. After this step, all the data are categorized according to the classes of the model. Then all the models are trained and validated with the same dataset in the same environment. Lastly, the trained models are analyzed using a few important metrics: accuracy, recall, precision, F1-score, and the ROC curve. The rest of this section discusses the dataset modeling and the proposed CNN modeling in detail.
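The shuffling and normalization steps can be illustrated with a minimal pure-Python sketch. The text does not specify the normalization, so scaling 8-bit pixel values to [0, 1] is an assumption; the function names and the tiny 2 × 2 arrays are hypothetical stand-ins, and the resizing step is omitted.

```python
import random

def normalize(image):
    """Scale 8-bit pixel values (0-255) to floats in [0, 1] (assumed scheme)."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def shuffle_together(images, labels, seed=0):
    """Shuffle images and labels with one permutation so pairs stay aligned."""
    order = list(range(len(images)))
    random.Random(seed).shuffle(order)
    return [images[i] for i in order], [labels[i] for i in order]

# Two tiny 2 x 2 stand-ins for X-ray images (hypothetical values).
images = [[[0, 255], [51, 102]], [[255, 255], [0, 0]]]
labels = ["normal", "COVID-19"]

images, labels = shuffle_together([normalize(img) for img in images], labels)
```

Shuffling a shared index list, rather than the two lists separately, keeps every image paired with its own label.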

Dataset Collection and Modeling
For training and validating the models, 201 chest X-ray images of COVID-19 patients are used, obtained from the open GitHub repository by Cohen et al. [54]. This repository contains patients' chest X-ray images of COVID-19, SARS, ARDS, pneumocystis, streptococcus, chlamydophila, E. coli, legionella, varicella, lipoid, bacterial, pneumonia, mycoplasma bacterial pneumonia, klebsiella and influenza. For training, only the COVID-19 positive X-rays are used; the patients' ages range from 12 to 93 years. The training also needs normal (non-COVID-19) chest X-rays, which are obtained from the Kaggle dataset named "Chest X-ray Images (Pneumonia)" [55]. This repository contains 5863 images in two categories, normal and pneumonia. However, we have taken 201 normal chest X-ray images (the same number as the COVID-19 chest X-ray images) for training and validation. The whole dataset is split into training and validation sets in a ratio of 80% to 20%. Each of the training and validation sets contains two subcategories, 'normal' and 'COVID-19', holding the respective types of X-ray images. So, for training, the 'normal' and 'COVID-19' categories contain 161 chest X-ray images each, whereas the validation set contains 40 images for each of the 'normal' and 'COVID-19' subcategories. This is termed Dataset 1 for the rest of the paper.
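The 80%/20% split arithmetic can be checked with a short sketch. The helper name `split_counts` is hypothetical, and rounding the training count to the nearest integer is an assumption that happens to reproduce the stated 161/40 split.

```python
def split_counts(n_images, train_fraction=0.8):
    """Return (train, validation) counts for one class.

    Rounding to the nearest integer is an assumption; the text only
    states the 80%/20% ratio and the resulting counts.
    """
    n_train = round(n_images * train_fraction)
    return n_train, n_images - n_train

dataset1 = {label: split_counts(201) for label in ("normal", "COVID-19")}
print(dataset1)  # {'normal': (161, 40), 'COVID-19': (161, 40)}
```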
Another COVID-19 chest X-ray dataset is created by combining the GitHub repositories [54,56]. This yields 295 COVID-19 chest X-ray images, which are used for training and validation of the model. Similarly, a larger set of 659 normal chest X-ray images is collected randomly from [55]. The whole dataset is divided into training and validation sets. The training set contains 236 COVID-19 and 600 normal chest X-ray images. The validation set, by contrast, contains 59 chest X-ray images for each of the categories, COVID-19 and normal, thus keeping the data balanced for performance analysis. This whole dataset contains 954 chest X-ray images divided into two classes and is termed Dataset 2. Table 1 depicts the categorization of Dataset 1 and Dataset 2. To maintain uniformity while preserving image quality, all the images are resized to 224 × 224 pixels. Moreover, all the X-ray images used for training and validation are in posteroanterior (PA) chest view. Figures 3 and 4 present sample PA-view X-ray images of both COVID-19 positive and normal cases from the training and validation datasets respectively. All the models are trained and validated on these two datasets, and their performances are analyzed to observe how they perform as the dataset size increases.
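As a quick consistency check, the per-class counts stated in the text for both datasets can be tallied. The dictionary layout and the `totals` helper are illustrative only; the numbers are those given above.

```python
# Image counts as stated in the text: (training, validation) per class.
DATASETS = {
    "Dataset 1": {"COVID-19": (161, 40), "normal": (161, 40)},
    "Dataset 2": {"COVID-19": (236, 59), "normal": (600, 59)},
}

def totals(classes):
    """Return (training, validation, overall) image counts for a dataset."""
    train = sum(t for t, _ in classes.values())
    val = sum(v for _, v in classes.values())
    return train, val, train + val

for name, classes in DATASETS.items():
    train, val, total = totals(classes)
    print(f"{name}: train={train}, validation={val}, total={total}")
```

The totals come to 402 images for Dataset 1 and 954 for Dataset 2, matching the figures quoted in the text. The Dataset 2 training set is imbalanced (236 COVID-19 against 600 normal), while its validation set is balanced.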

CNN Modeling
CNNs have been playing a great role in classifying images, in particular medical images. This has opened new windows of opportunity and made disease detection much more convenient. CNNs also detect the novel Coronavirus with high accuracy. One of the constraints that researchers encounter is a limited dataset for training their model. As COVID-19 is a novel disease, the chest X-ray dataset of COVID-19 positive patients is also limited. Therefore, to avoid overfitting, a sequential CNN model is proposed, as in the authors' earlier work [53], for classifying X-ray images. Figure 5 depicts the proposed CNN model for COVID-19 detection. This model has four main components: (i) input layers, (ii) convolutional layers, (iii) fully connected layers, and (iv) output layers.
The tuned dataset is fed into the input layers of the model. The model has four convolutional layers; the first is a 2D convolutional layer with 3 × 3 kernels and the Rectified Linear Unit (ReLU) activation function. ReLU is one of the most popular and effective activation functions and is widely used in DL. ReLU does not activate all the neurons at the same time, which makes it computationally efficient in comparison with other activation functions such as tanh.
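The sparsity property of ReLU can be seen in a small sketch that applies ReLU and tanh to the same pre-activation values. The input values are arbitrary illustrations, not taken from the model.

```python
import math

def relu(x):
    """Rectified Linear Unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)

pre_activations = [-2.0, -0.5, 0.0, 0.5, 2.0]  # arbitrary example inputs

relu_out = [relu(x) for x in pre_activations]
tanh_out = [math.tanh(x) for x in pre_activations]

# Neurons whose output is exactly zero contribute nothing downstream.
relu_inactive = sum(1 for y in relu_out if y == 0.0)
tanh_inactive = sum(1 for y in tanh_out if y == 0.0)
```

Here ReLU leaves three of the five units inactive, whereas tanh yields a non-zero output for every input except exactly zero. ReLU also needs only a comparison, while tanh requires evaluating exponentials.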
The next three layers are 2D convolutional layers, each with the ReLU activation function and a Max pooling layer. Max pooling condenses the features of the convolutional layer by sliding a window over them and keeping the maximum value in each window. It reduces the computational cost, as it shrinks the feature maps and hence the number of parameters in the following layers, which helps to avoid overfitting. In each of these three layers, a 2 × 2 Max pooling layer is added after the convolutional layer to avoid overfitting and to make the model computationally efficient. In the next step, the output of the convolutional layers is converted to a long 1D feature vector by a flatten layer. The output of the flatten layer is fed to the fully connected layer with dropout. In a fully connected layer, every input neuron is connected to every activation unit of the next layer. All the input features are passed through the ReLU activation function, and this layer categorizes the images to the assigned labels. The Sigmoid activation function makes the classification decision depending on the classification label of the neurons. Finally, the output layer declares whether the input X-ray image is COVID-19 positive or normal. This model is termed 'Model 1'. For comparative analysis, two more CNN models are developed with 3 and 5 convolutional layers instead of the 4 layers of Model 1. These models are termed 'Model 2' and 'Model 3' respectively. Model 2 has one 3 × 3 convolutional layer with ReLU and 32 channels, and two more 3 × 3 convolutional layers with ReLU and 2 × 2 Max pooling, with 64 channels each. Model 3 has a 3 × 3 convolutional layer with ReLU and 2 × 2 Max pooling with 128 channels as its fifth layer.
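To make the effect of convolution and pooling on feature-map size concrete, the following sketch traces the shapes through Model 2 for a 224 × 224 input. The channel counts come from the text; unpadded ("valid") convolutions with stride 1 and non-overlapping 2 × 2 pooling are assumptions, since the text does not state the padding or pooling stride.

```python
def conv_out(size, kernel=3, stride=1):
    """Output width/height of an unpadded ('valid') convolution (assumed)."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2):
    """Output width/height of non-overlapping pooling (assumed stride = window)."""
    return size // window

# (channels, followed_by_max_pooling) for Model 2 as described in the text.
MODEL_2 = [(32, False), (64, True), (64, True)]

def trace(layers, size=224):
    """Return the (height, width, channels) shape after each conv block."""
    shapes = []
    for channels, pooled in layers:
        size = conv_out(size)
        if pooled:
            size = pool_out(size)
        shapes.append((size, size, channels))
    return shapes

shapes = trace(MODEL_2)
height, width, channels = shapes[-1]
flatten_length = height * width * channels
print(shapes, flatten_length)
```

Under these assumptions the final feature map is 54 × 54 × 64, so the flatten layer would hand a vector of 186,624 values to the fully connected layer. With "same" padding the numbers would differ.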
This work also considers a few pretrained models and their performance on COVID-19 image classification. Three pretrained models based on ResNet50, VGG-16 and VGG-19 are developed and tuned to detect COVID-19 cases from the same chest X-ray datasets [14,39]. ResNet is trained on ImageNet, on which it achieved excellent results with an error of only 3.57%. It has five stages, each having one convolution block and one identity block. Each of the convolution and identity blocks has 3 convolutional layers. VGG-16 is a CNN model that is 16 layers deep, as its name suggests. It is one of the best-performing CNN architectures for image classification. Rather than having a large number of hyperparameters, it uses 3 × 3 convolutional layers with a stride of 1 and 2 × 2 max pooling layers with a stride of 2. The whole architecture is built on this consistent arrangement of convolution and max pooling layers. VGG-19 has the same architecture as VGG-16, except that it is 19 layers deep instead of 16. The pretrained versions of these three CNN architectures are used to extract features, and their outputs are fed to a 2 × 2 average pooling layer. A flatten layer converts the outputs to a 1D feature vector. The output of the flatten layer is fed to a fully connected layer with dropout, which has the same architecture as in Model 1, Model 2 and Model 3. Figure 6 depicts the workflow diagram of the pretrained models.
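The size of the feature vector reaching the fully connected layer can be estimated with a small sketch. The backbone output shapes below are the standard ones for a 224 × 224 input with the original classifier layers removed; using the backbones without their classifier and a pooling stride equal to the 2 × 2 window are assumptions, as the text does not state them.

```python
# Final feature-map shapes (height, width, channels) of the convolutional
# bases for a 224 x 224 input, with the original classifier removed.
BACKBONE_OUTPUT = {
    "ResNet50": (7, 7, 2048),
    "VGG-16": (7, 7, 512),
    "VGG-19": (7, 7, 512),
}

def head_input_length(shape, pool=2):
    """Length of the flattened vector after pool x pool average pooling.

    Assumes non-overlapping pooling without padding, so a 7 x 7 map
    becomes 3 x 3.
    """
    height, width, channels = shape
    return (height // pool) * (width // pool) * channels

lengths = {name: head_input_length(shape)
           for name, shape in BACKBONE_OUTPUT.items()}
print(lengths)
```

Under these assumptions the flatten layer yields 18,432 features for ResNet50 and 4,608 for the two VGG models.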

Results and Analysis
For Dataset 1, the overall accuracy is 97.5%, 93.75%, and 95% for Model 1, Model 2, and Model 3 respectively, whereas the pretrained models ResNet50, VGG-16 and VGG-19 achieve accuracies of 88.5%, 78.75% and 60% respectively. This clearly shows that the proposed model (Model 1) performs better than the other models in terms of accuracy. The performance of the models is more evident from metrics such as precision, recall, and F1-score. These performance metrics are calculated from the outcomes on the validation dataset, which are summarized in a confusion matrix. A confusion matrix has four different outcomes: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). In this case, TP denotes the number of Coronavirus-positive patients detected as positive, TN denotes the number of negative cases detected as negative, FP is the number of cases that are actually negative but detected as positive, and FN is the number of cases that are actually positive but detected as negative. The Receiver Operating Characteristic (ROC) curve represents the performance of the classifier at different threshold values by plotting the TP rate against the FP rate. The confusion matrices and ROC curves for all six models are depicted in Figures 9 and 10 respectively.
With Dataset 1, Model 1 detects 39 TP and 39 TN cases, Model 2 finds 35 TP and 40 TN cases, and Model 3 detects 40 TP and 36 TN cases. The pretrained models, on the other hand, perform very well in detecting the TN cases, with 40 for each model, whereas the TP cases detected are 31, 23 and 8 for ResNet50, VGG-16 and VGG-19 respectively. The ROC curve area of Model 1 is 0.975, which is higher than that of the other discussed models. By contrast, VGG-19 achieves the lowest ROC curve area, 0.60. It is evident from the confusion matrices that Model 1 performs better in terms of case detection.
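The relationship between the confusion-matrix counts and the ROC curve area can be sketched as follows. When only a single operating point (one TP rate and FP rate pair) is available, the ROC curve reduces to two line segments and its area is (TPR + 1 − FPR) / 2. Whether the reported areas were computed this way is not stated in the text; the sketch only shows that this calculation is consistent with the reported values. FN and FP are derived from the 40 validation images per class in Dataset 1.

```python
def single_point_auc(tp, fn, tn, fp):
    """Area under the ROC polyline (0,0) -> (FPR,TPR) -> (1,1)."""
    tpr = tp / (tp + fn)  # true positive rate (recall)
    fpr = fp / (fp + tn)  # false positive rate
    # Triangle under the first segment plus trapezoid under the second.
    return 0.5 * fpr * tpr + 0.5 * (1 - fpr) * (tpr + 1)

# Dataset 1 counts from the text; FN = 40 - TP and FP = 40 - TN.
auc_model_1 = single_point_auc(tp=39, fn=1, tn=39, fp=1)
auc_vgg_19 = single_point_auc(tp=8, fn=32, tn=40, fp=0)
```

These evaluate to approximately 0.975 and 0.60, the areas quoted for Model 1 and VGG-19. A curve traced over many thresholds of the sigmoid output could give a different area.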
Accuracy is the fraction of all cases that are classified correctly, whereas precision is the fraction of the cases predicted as positive that are actually positive. Recall, or sensitivity, is another important factor for evaluating a CNN model. It is the fraction of the actual positive cases that the model classifies correctly. The F1-score combines precision and recall and is defined as their harmonic mean. Equations (1)-(4) represent accuracy, precision, recall, and F1-score respectively.
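In terms of the confusion-matrix outcomes TP, TN, FP and FN, the standard definitions of these four metrics are given below. The numbering follows the reference in the text; the exact typeset form of Equations (1)-(4) is assumed to match these standard definitions.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1}\\
\text{Precision} &= \frac{TP}{TP + FP} \tag{2}\\
\text{Recall}    &= \frac{TP}{TP + FN} \tag{3}\\
\text{F1-score}  &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}
\end{align}
```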
Model 1 achieves the highest F1-score, 97.5; by contrast, VGG-19 has the lowest F1-score, 33.33. The overall performance and the F1-score of the proposed model (Model 1) are better than those of the other models. The accuracy of the proposed model is 97.5%, with precision and recall both equal to 97.5%. The overall performance, including accuracy and F1-score, can be improved further by training the model with a larger dataset.
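These figures can be reproduced from the Dataset 1 confusion-matrix counts with a short sketch using the standard metric definitions. The function name is illustrative; FN and FP are derived from the 40 validation images per class (FN = 40 − TP, FP = 40 − TN).

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Dataset 1 counts stated in the text.
model_1 = classification_metrics(tp=39, tn=39, fp=1, fn=1)
vgg_19 = classification_metrics(tp=8, tn=40, fp=0, fn=32)

for name, result in (("Model 1", model_1), ("VGG-19", vgg_19)):
    print(name, {k: round(100 * v, 2) for k, v in result.items()})
```

For Model 1 all four metrics come to 97.5 when expressed as percentages. For VGG-19 the precision is 100% but the recall is only 20%, which gives the F1-score of 33.33 and the accuracy of 60%.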
This model also performs better than the other models with Dataset 2. VGG-19 performs worst compared with the other models: its TN and FP counts are 59 and 0 respectively, which is perfect, but it detects only 17 TP cases. Figures 11 and 12 show the confusion matrices and ROC curves for all the models trained and validated with Dataset 2.
The comparison of the models with Dataset 1 and Dataset 2 shows that Model 1 (the proposed model), VGG-16 and VGG-19 perform better with the larger dataset (Dataset 2) than with Dataset 1. The accuracies of Model 1, VGG-16 and VGG-19 with Dataset 2 are 98.3%, 80.5% and 64.4% respectively, whereas with Dataset 1 these models achieved comparatively lower accuracies of 97.5%, 78.75% and 60% respectively. Model 3 and ResNet50 perform almost the same with Dataset 1 and Dataset 2. Model 2, on the other hand, performs worse with Dataset 2 than with Dataset 1, owing to its having fewer convolutional layers than the other models. In terms of F1-score as well, the proposed model (Model 1), VGG-16 and VGG-19 perform better with Dataset 2 than with Dataset 1. Table 2 shows the comparative results of all the models with Dataset 1 and 2.
The performances can be analyzed better with comparative bar graphs of two important metrics, accuracy and F1-score, for Dataset 1 and 2, which are depicted in Figure 13a,b respectively.
It clearly shows that the best result is achieved by Model 1 (the proposed model) trained and validated with Dataset 2, which contains 295 COVID-19 and 659 normal chest X-ray images, with an accuracy and F1-score of 98.3% and 98.3 respectively. It performs better than the other models.
Model 3 has five convolutional layers and starts to overfit slightly with Dataset 1 and 2. VGG-16, VGG-19 and ResNet50, on the other hand, have many more layers than these three models and thus easily overfit on these small training and validation datasets. Accordingly, the performances of VGG-16 and VGG-19 improve with Dataset 2 compared to the smaller Dataset 1. Their performances can be improved further by enlarging the dataset. Moreover, approaches like data augmentation and cross layer validation can be adopted to improve the performances of these pre-trained models.
Even though this is a good result for the proposed model, a few researchers have achieved better results with binary classification. However, this work shows how the number of convolutional layers and the number of images in the dataset affect the performance of the models. Table 3 shows a comparative analysis of how the proposed model performs against other prominent models by different researchers.
As a CNN classifies images by extracting features from them, it can differentiate between images with very minute and subtle changes. A chest X-ray of a COVID-19 patient at an early stage would naturally show differences from an X-ray of the same patient at the middle and late stages. Given the necessary dataset, it would be possible to detect the stage of a COVID-19 patient, which would in turn allow doctors to treat patients at different stages accordingly. This would require chest X-ray datasets classified by stage or by day, such as X-rays from the first, third or eighth day.
Unfortunately, to the best of the authors' knowledge, no such dataset with time labelling is available. As this is a very new disease and reliable data classified by stage are lacking, stage classification is not addressed in this work, but it is hoped to address this challenge in future work.

Conclusions
Mass testing and early detection of COVID-19 play an important role in preventing the spread of this global pandemic. Time, cost, and accuracy are among the major factors in any disease detection process, especially for COVID-19. To address these issues, a CNN-based model is proposed in this paper for detecting COVID-19 cases from patients' chest X-rays. The CNN models are trained with Dataset 1, which has a total of 402 chest X-ray images divided into two classes, and also with a comparatively larger dataset (Dataset 2), which contains a total of 954 chest X-ray images. Of the six models discussed, the proposed model outperforms the others with both datasets. The accuracy and F1-score of the proposed model are 98.3% and 98.3 with Dataset 2. Moreover, this work compares the achieved results with other prominent works in the field. This work can be improved further with multi-class classification and the availability of a larger dataset. Finally, CNNs have great prospects for detecting COVID-19 with very limited time, resources, and costs. Though the proposed model shows promising results, it has not been clinically tested. However, with such high accuracy, the proposed model can play an important role in the early and fast detection of COVID-19, thus reducing testing time and cost.