COVID-XNet: A Custom Deep Learning System to Diagnose and Locate COVID-19 in Chest X-ray Images

Featured Application: This work could be used to aid radiologists in the screening process, contributing to the fight against COVID-19.

Abstract: The COVID-19 pandemic caused by the new coronavirus SARS-CoV-2 has changed the world as we know it. An early diagnosis is crucial in order to prevent new outbreaks and control its rapid spread. Medical imaging techniques, such as X-ray or chest computed tomography, are commonly used for this purpose due to their reliability for COVID-19 diagnosis. Computer-aided diagnosis systems could play an essential role in aiding radiologists in the screening process. In this work, a novel Deep Learning-based system, called COVID-XNet, is presented for COVID-19 diagnosis in chest X-ray images. The proposed system applies a set of preprocessing algorithms to the input images for variability reduction and contrast enhancement, which are then fed to a custom Convolutional Neural Network in order to extract relevant features and perform the classification between COVID-19 and normal cases. The system is trained and validated using a 5-fold cross-validation scheme, achieving an average accuracy of 94.43% and an AUC of 0.988. The output of the system can be visualized using Class Activation Maps, highlighting the main findings for COVID-19 in X-ray images. These promising results indicate that COVID-XNet could be used as a tool to aid radiologists and contribute to the fight against COVID-19.


Introduction
COVID-19 is the disease caused by the new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was recently declared a pandemic [1]. Coronaviruses are an extensive family of viruses that may affect both humans and animals, causing problems to the respiratory system [2]. Other well-known human coronaviruses identified in the past are SARS-CoV and the Middle East respiratory syndrome-related coronavirus (MERS-CoV), which have had around 8100 and 2500 confirmed cases, with a case fatality rate of around 9.2% and 37.1%, respectively [3,4].
Globally, as of 12 August 2020, the number of confirmed deaths worldwide caused by COVID-19 had surpassed 700 k, with more than 215 countries, areas or territories affected. Many previous works on automatic COVID-19 detection relied on very small datasets, some using as few as 10 X-ray images for the COVID-19 class. Moreover, works that proposed not only COVID-19 detection, but also the localization of affected lung areas, did not include any ground truth comparison or medical supervision of the obtained results.
In this work, we present a DL-based CAD system, named COVID-XNet, which classifies between COVID-19 and normal frontal chest X-ray images. The network focuses on specific regions of the lungs in order to perform a prediction and detect whether the patient has COVID-19. The output of the system can then be represented in a heatmap-like plot by applying the Class Activation Map (CAM) algorithm, which locates the affected areas. The high reliability of the results, which were supervised by a lung specialist, indicates that this system could be used to aid expert radiologists as a screening test for COVID-19 diagnosis in patients with clinical manifestations, supporting them throughout this stage and helping to overcome this situation.
The rest of the paper is structured as follows: first, materials and methods are presented in Section 2, focusing on the dataset and the CNN model. Subsequently, results and discussion are presented in Section 3, where a quantitative evaluation is performed, along with the visualization of the pulmonary areas affected by COVID-19. Finally, the conclusions drawn from the obtained results are presented in Section 4.

Materials and Methods
In this section, the dataset used for this work and the custom CNN model are described, detailing each of them in different subsections.

Materials
In this work, different publicly-available datasets were taken into account to build a diverse and large collection of chest X-ray images from healthy patients and COVID-19 cases. Both posteroanterior (PA) and anteroposterior (AP) projections were considered, discarding lateral X-ray images.
For the COVID-19 class, chest X-ray images were obtained from the BIMCV-COVID19+ dataset, provided by the Medical Imaging Databank of the Valencia Region (BIMCV) [25], and from the COVID-19 image data collection from Cohen et al. [26]. On the other hand, for healthy patients, images were obtained from the PadChest dataset, also provided by BIMCV [27]. Of the total number of images labeled as normal in this dataset, around the first 10% were used, since otherwise the imbalance between the number of COVID-19 cases and healthy patients would have been very high. Therefore, a total of 2589 images from 1429 patients and 4337 images from 4337 patients were considered for the COVID-19 and normal classes, respectively.

Preprocessing
A preprocessing step, which included different techniques, was applied to the original images in order to reduce their large variability.
Firstly, all of the images were converted to grayscale. Because the original images came from different hospitals and, consequently, from different X-ray machines, a histogram matching process was applied to every image, taking one of them as a reference [28]. Therefore, all images in the dataset were similar in terms of histogram distribution.
Subsequently, rib shadows were suppressed from the X-ray images with a pretrained autoencoder model developed by Chuong M. Huynh, which is publicly available on GitHub (www.github.com/hmchuong/ML-BoneSuppression). This makes it easier for the network to focus on relevant information within the lungs. Rib shadow suppression has been applied in other works related to lung cancer, pulmonary nodules, and pneumonia detection in chest radiography, proving to be a useful approach to help radiologists and machine learning systems when diagnosing lung-related diseases [29][30][31][32][33].
After this process, a contrast enhancement method, called Contrast Limited Adaptive Histogram Equalization (CLAHE) [34], was used to improve local contrast and enhance image definition. Figure 1 shows the whole preprocessing phase, where each algorithm's output is presented for three different examples.
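The grayscale conversion and histogram matching steps above can be sketched with NumPy alone. The snippet below is a minimal illustration, not the authors' code: in practice, libraries such as scikit-image (`match_histograms`) or OpenCV (`createCLAHE` for the CLAHE step) would be used, and the reference image chosen by the authors is not reproduced here.

```python
import numpy as np

def match_histograms(src, ref):
    """Map the gray levels of `src` so its histogram matches that of `ref`.

    A minimal NumPy sketch of the histogram-matching preprocessing step.
    """
    s_vals, s_counts = np.unique(src.ravel(), return_counts=True)
    r_vals, r_counts = np.unique(ref.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / src.size   # CDF of the source image
    r_cdf = np.cumsum(r_counts) / ref.size   # CDF of the reference image
    # For each source CDF value, find the reference gray level at the same CDF.
    mapped = np.interp(s_cdf, r_cdf, r_vals)
    # Replace every source gray level with its matched reference level.
    return mapped[np.searchsorted(s_vals, src)].astype(src.dtype)
```

Applying this mapping to every image against one fixed reference yields the histogram-aligned dataset described above; CLAHE is then applied on top of the matched images.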

Convolutional Neural Network
After applying the preprocessing step, the obtained images were used as input to a custom CNN model that was trained from scratch to classify between COVID-19 and normal cases. This model consists of the following set of layers: five convolutions, four max poolings, a Global Average Pooling (GAP), and a final softmax layer (see Figure 2). This custom model was selected by means of an exhaustive grid search over the number of layers and kernel sizes, prioritizing accuracy and computational complexity. The number of layers was explored from one up to the maximum that allowed having feature maps of over 1 × 1 pixels before the GAP layer. Kernel sizes were explored from 3 × 3 up to 11 × 11. The best configuration over all the different possibilities was selected.

Figure 2. Diagram of COVID-XNet. It consists of five convolutional layers (Conv), four max pooling layers (MaxPool), a GAP layer, and a softmax layer. Conv1, Conv2, and Conv3 use 5 × 5 kernels, while Conv4 and Conv5 use 3 × 3. All MaxPool layers use 2 × 2 kernels.
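The grid-search constraint that feature maps stay above 1 × 1 pixels before the GAP layer can be checked with a simple shape trace. The sketch below assumes 128 × 128 inputs (Section 2.2.3), stride-1 convolutions, and one 2 × 2 max pool after each of the first four convolutions; the exact padding mode and pool placement are not stated in the text, so both are assumptions.

```python
def feature_map_size(input_size=128, kernels=(5, 5, 5, 3, 3),
                     n_pools=4, padding="same"):
    """Trace the spatial size through the assumed Conv/MaxPool stack."""
    size = input_size
    for i, k in enumerate(kernels):
        if padding == "valid":
            size -= k - 1          # a valid convolution shrinks the map
        if i < n_pools:            # assumed: a 2x2 max pool follows each of
            size //= 2             # the first four convolutions
    return size
```

Under these assumptions, 'same' padding leaves an 8 × 8 map before the GAP layer and 'valid' padding a 3 × 3 map; both satisfy the over-1 × 1 requirement.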

Training and Testing the Network
To ensure that our model was generalizing well with data that it had not been trained with, a stratified five-fold cross-validation was used to train and validate the network with all the images. This allowed obtaining more robust results on the different metrics that were used, which are presented in Section 2.2.4. For this approach, the images were split in five different sets, taking into account that images from the same patient were only present in a single set. Subsequently, the model was trained five times, where four out of the five sets were used for training and the remaining one for validation. Therefore, for each fold, 80% of the dataset was considered when training the system and the remaining 20% when validating it.
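The patient-level split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it groups image indices by a hypothetical patient-ID list and distributes whole patients round-robin across folds, so no patient ever spans two folds (class stratification is omitted for brevity).

```python
import random
from collections import defaultdict

def patient_level_folds(image_patient_ids, k=5, seed=0):
    """Split image indices into k folds, keeping each patient in one fold."""
    by_patient = defaultdict(list)
    for idx, pid in enumerate(image_patient_ids):
        by_patient[pid].append(idx)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)      # deterministic shuffle
    folds = [[] for _ in range(k)]
    for i, pid in enumerate(patients):
        folds[i % k].extend(by_patient[pid])   # all of a patient's images together
    return folds
```

Training then iterates k times, holding out one fold for validation and training on the remaining k − 1, exactly as in the 80%/20% scheme above.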
Data augmentation techniques were used in order to increase the variability of the dataset. Random rotations (up to a maximum of 15 degrees), width shifts (up to 20%), height shifts (up to 20%), shear (up to 20%), zoom (up to 20%), and horizontal flips were applied to the input images.
In this work, Tensorflow (www.tensorflow.org) and Keras (www.keras.io) were used to train and test the CNN model. The Adadelta optimizer [35] and a batch size of 32 were set for the learning phase. Input images were resized to 128 × 128 pixels to reduce the computational complexity. Because the dataset that we used is unbalanced (i.e., there are more images corresponding to the normal class than to COVID-19), the class_weight parameter was set accordingly in Keras in order to give more importance to the COVID-19 class when training the network.
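The exact class_weight values are not given in the paper; one common heuristic (the "balanced" weighting also used by scikit-learn's `compute_class_weight`) derives them from the image counts in Section 2.1, weighting each class inversely to its frequency:

```python
def balanced_class_weights(counts):
    """Weight each class as total / (n_classes * class_count)."""
    total = sum(counts.values())
    n = len(counts)
    return {label: total / (n * c) for label, c in counts.items()}

# Image counts from Section 2.1 (2589 COVID-19, 4337 normal).
weights = balanced_class_weights({"covid": 2589, "normal": 4337})
```

With these counts, the COVID-19 class receives a weight of roughly 1.34 and the normal class roughly 0.80, so errors on COVID-19 images weigh more heavily in the training loss, as intended.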

Performance Metrics
The following metrics were used to measure the performance of the system for the COVID-19 detection task: sensitivity (1), specificity (2), precision (3), and F1-score (4); since the dataset is unbalanced (see Sections 2.1 and 2.2.3), the balanced accuracy (5) was also used. In addition, the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) was calculated.

Sensitivity = TP / (TP + FN) (1)
Specificity = TN / (TN + FP) (2)
Precision = TP / (TP + FP) (3)
F1-score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity) (4)
Balanced accuracy = (Sensitivity + Specificity) / 2 (5)

where TP and FP refer to true positive cases (when the system diagnoses a COVID-19 case correctly) and false positive cases (the system detects a normal X-ray image as a COVID-19 case), respectively. On the other hand, TN and FN refer to true negative cases (the system detects a normal case correctly) and false negative cases (the system diagnoses a COVID-19 case as normal), respectively.

However, we believe that only reporting the metrics of the system is not the best option to validate a model that performs COVID-19 detection in X-ray images, since the system could be learning patterns that are not related to COVID-19. This becomes a greater problem when working with small datasets that do not capture the variability of these images, which is caused by different factors such as the X-ray machine used and the patient's constitution, position, or condition. For this reason, we proposed the application of CAMs to visualize what the system is focusing on when performing the prediction. In this way, CAMs can be used by specialists to validate the system's output.
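These standard definitions follow directly from the TP/FP/TN/FN counts and can be sketched as:

```python
def covid_metrics(tp, fp, tn, fn):
    """Compute the Section 2.2.4 metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)     # fraction of COVID-19 cases detected
    specificity = tn / (tn + fp)     # fraction of normal cases detected
    precision = tp / (tp + fp)       # fraction of COVID-19 calls that are right
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated by the larger normal class the way plain accuracy would be.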

Class Activation Maps
Zhou et al. demonstrated that, even without providing any information of the object location inside an input image, convolutional units of CNNs work as unsupervised object detectors [36,37]. With this idea in mind, CAMs can be generated. A CAM for a particular class highlights the regions of the input image that the CNN considered relevant to perform the prediction. With this method, which allows visualizing the output of a CNN, the network cannot have any fully connected layer. Therefore, a GAP layer is applied to the feature maps obtained after the last convolution or pooling layer of the network, and the features obtained are then used to perform the classification with a SoftMax activation function.
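Concretely, a CAM is a weighted sum of the last feature maps, using the softmax-layer weights of the target class. The NumPy sketch below assumes the feature maps and weights have already been extracted from a trained network; the shapes in the comments are illustrative, not the paper's.

```python
import numpy as np

def class_activation_map(feature_maps, softmax_weights, class_idx):
    """Weighted sum of the last conv feature maps for one class.

    feature_maps:    (H, W, C) activations before the GAP layer
    softmax_weights: (C, n_classes) weights of the softmax layer
    """
    cam = feature_maps @ softmax_weights[:, class_idx]  # (H, W)
    cam -= cam.min()                                    # normalize to [0, 1]
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```

The resulting map is upsampled to the input resolution and overlaid as a heatmap, which is how the Figure 4 visualizations are produced.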

Post-Processing
Because the relevant information for COVID-19 detection in frontal X-ray images only lies inside the lung area [30,32], lungs were segmented from the original images in order to discard surrounding regions. With this process, CAMs (see Section 2.2.5) only focus on this area and, therefore, clearer results in terms of visualization of the system's output are provided. This lung segmentation step was performed using a CNN based on the U-Net model [38], which was used to solve the Radiological Society of North America (RSNA R ) Pneumonia Detection Challenge (www.kaggle.com/eduardomineo/u-net-lung-segmentation-montgomery-shenzhen).
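Once a binary lung mask is available from the U-Net step, restricting the CAM to the lung area reduces to an element-wise product. The mask in this sketch is a hypothetical array; producing it with the U-Net model is outside the scope of the snippet.

```python
import numpy as np

def mask_cam(cam, lung_mask):
    """Zero out CAM activations outside the segmented lung area."""
    return np.where(lung_mask > 0, cam, 0.0)
```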

Results and Discussion
In this section, a quantitative evaluation of the system was performed following the performance metrics presented in Section 2.2.4. In addition, a qualitative evaluation was carried out, in which the output of the system was compared with the corresponding ground truth descriptions, and also verified by a lung specialist, in order to validate the results.

Quantitative Evaluation
The results achieved by the network after training and validating the CNN using the five-fold cross-validation are summarized in Table 1. Figure 3 presents the ROC curve for each of the cross-validation folds, along with their corresponding AUC values. As can be seen, the achieved results demonstrate that the system is able to generalize well, obtaining similar and stable results across the different folds. Each fold achieved a balanced accuracy greater than 91% and an AUC value above 0.97, which confirms that the system is very reliable when performing the classification. Averaging the metrics over all of the cross-validation folds, the system achieved 92.53% sensitivity, 96.33% specificity, 93.76% precision, 93.14% F1-score, 94.43% balanced accuracy, and an AUC value of 0.988.

Qualitative Evaluation
CAMs are used to visualize what the network is focusing on when performing the classification, as introduced in Section 2.2.5. Figure 4 shows different input images and their corresponding CAM heatmaps obtained with COVID-XNet. The most relevant information that the network considered when performing the prediction for the COVID-19 class is highlighted in red, while regions that were not relevant for COVID-19 detection (considered as normal) are presented in dark blue. The examples shown in Figure 4 present different cases corresponding to true positives (A-H), true negatives (I-K), and a false positive (L). The heatmaps obtained for the true positive cases were compared to the ground truth descriptions provided in the datasets in order to verify whether the system was highlighting the correct regions inside the lung area. It is important to mention that these results were also validated by a lung specialist.
The ground truth corresponding to Figure 4A reports patchy ground-glass opacities in the right upper and lower lung zones and patchy consolidation in the left middle to lower lung zones. Furthermore, several calcified granulomas were incidentally noted in the left upper lung zone. Figure 4B shows right paracardiac interstitial thickening with a tendency to cavitation in its most cranial portion, along with mild right hilar enlargement. Figure 4C presents consolidations in the base of the right hemithorax and an interstitial pattern that affects most of that lung. Moreover, a small pseudonodular consolidation is present in the left paracardiac region, which could suggest another affected area. In Figure 4D, the ground truth describes the presence of a right upper lobe opacity. The report for Figure 4E details the existence of alveolar infiltrates in the right upper and lower lobes, and also in the left parahilar area. As can be seen in the corresponding heatmaps for these cases (A-E), the relevant areas described in the ground truths were detected by the system. In the case shown in Figure 4F, the patient is reported to present opacities in the base of the right lung and in the left middle and lower lung zones. The output heatmap matches this description, along with a smaller region in the left upper area which is not mentioned in the report. Lower and middle to upper right lobe consolidations are reported in Figure 4G, together with a mild small consolidation in the left lower lobe. In this case, the system was not able to detect the consolidations in the middle to upper right lung area. Finally, the ground truth of Figure 4H reports COVID-19 pneumonia manifesting as a single nodular lesion: the AP chest radiograph shows a single nodular consolidation (black arrows) in the left lower lung zone. In the latter case, the system detected the consolidation marked by the ground truth arrows, but it also mistakenly highlighted upper areas in both lungs.
For normal cases (I-L), the system did not detect any relevant COVID-19 area, except for Figure 4L, where two small regions were highlighted.
These promising results prove that, even when training the system with a large unbalanced dataset obtained from different sources, our custom model is learning specific characteristics and patterns appropriately.

Conclusions
In this work, the authors have presented a novel CAD system, COVID-XNet, for detecting COVID-19 in frontal chest X-ray images. The system, which consists of a custom CNN, was trained and validated from scratch with X-ray images that were obtained from publicly-available datasets. These images were preprocessed with different methods (histogram matching, rib suppression, and CLAHE) in order to enhance the relevant information. Using a five-fold cross-validation scheme, COVID-XNet achieved 92.53% sensitivity, 96.33% specificity, 93.76% precision, 93.14% F1-score, 94.43% balanced accuracy, and an AUC value of 0.988 on average over the different folds.
CAMs were used to visualize the output of the CNN, where the relevant features that the system considered for COVID-19 detection were highlighted. The obtained heatmaps were compared and verified with their corresponding ground truths from the radiologists that diagnosed these cases, and were also validated by a lung specialist.
In this work, we combined X-ray images from different publicly-available sources in order to obtain an updated dataset with a larger quantity of images than other state-of-the-art works, making the system more robust, since a greater variability of cases were explored when training and validating the model. In addition, as this work is proposed as a supporting tool to aid specialists in the screening process, we believe that it is important to verify the results with a ground truth. For this reason, a lung specialist supervised and validated our work, which is an aspect that most state-of-the-art works did not consider.
The proposed system could be useful as a screening test for COVID-19 diagnosis in combination with patients' clinical manifestations and/or laboratory results to discard severe cases and decide whether the patient should be hospitalized. The performance of the system when predicting new unseen images shows that the model generalizes well, proving that COVID-XNet could be the first step for developing a universal CAD system for COVID-19 diagnosis in X-ray images.
This work by no means presents a solution that is ready for production. More tests and improvements should be performed before considering the use of any deep learning solution in hospitals. COVID-XNet was never conceived as a replacement for human radiologists, but as a tool to aid them and contribute to the fight against COVID-19. Future research will focus on training and testing the model with more images, since, currently, only a few small datasets are available.

Funding: This work was partially supported by the Spanish grant (with support from the European Regional Development Fund) COFNET (TEC2016-77785-P), and by the Andalusian Regional Project PAIDI2020 (with FEDER support) PROMETEO (AT17_5410_USE).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: