Fast COVID-19 and Pneumonia Classification Using Chest X-ray Images

Abstract: As of the end of 2019, the world has suffered from a disease caused by the SARS-CoV-2 virus, which has become the COVID-19 pandemic. This aggressive disease deteriorates the human respiratory system. Patients with COVID-19 can develop symptoms that belong to the common flu, pneumonia, and other respiratory diseases in the first four to ten days after they have been infected. As a result, it can cause misdiagnosis between patients with COVID-19 and typical pneumonia. Some deep-learning techniques can help physicians obtain an effective pre-diagnosis. This article presents a deep-learning model, specifically a convolutional neural network with pre-trained weights, which allows us to use transfer learning to obtain new retrained models to classify COVID-19, pneumonia, and healthy patients. One of the main findings of this article is the following relevant result obtained on the dataset used for the experiments: all the patients infected with SARS-CoV-2 and all the patients infected with pneumonia were correctly classified. These results allow us to conclude that the method proposed in this article may be useful to help physicians decide on diagnoses related to COVID-19 and typical pneumonia.


Introduction
COVID-19 is a newly known disease that can be misdiagnosed as common pneumonia. In 2020, COVID-19 became a major pandemic due to its easy propagation through the air and through contact with contaminated objects and people. According to the World Health Organization (WHO), as of 30 June 2020, there had been more than 10 million cases of COVID-19 and more than half a million confirmed deaths [1]. COVID-19 is caused by the SARS-CoV-2 virus infecting the lungs and the respiratory system. This is a major concern for the medical field due to the ambiguous symptoms that people present in the early contagious stage. Patients with COVID-19 can develop symptoms that belong to the common flu, pneumonia, and other respiratory diseases in the first four to ten days after they were infected [2].
It is possible to support the diagnosis of respiratory and lung diseases through computer-assisted diagnosis (CAD). Within CAD, we can find techniques that obtain images from the internals of the body, such as chest X-ray (CXR) images.
The use of computed tomography (CT) as well as magnetic resonance imaging (MRI) is also useful. Nowadays, the most effective image examination for diagnosing COVID-19 in infected patients is CT imaging, due to its high sensitivity compared to CXRs. Nonetheless, only patients with severe complications are hospitalized, and CT scan units are limited in hospitals. On the other hand, CXRs can also provide useful images to help visualize COVID-19-infected patients. Radiologists have found radiological features that could be useful for screening COVID-19 in CXR images [3]. However, these features can be confused with atypical pneumonia and other pulmonary manifestations.
Even though CXRs are not the most accurate examination, CXR images can be widely used because they are not as expensive as CT units and do not require extensive patient preparation to perform an examination. Furthermore, there are portable X-ray units that can be moved to noncritical hospital areas, even to specific facilities to analyze suspicious COVID-19 cases.
According to the Radiological Society of North America (RSNA), there are some specific manifestations that can be found in CT scans, such as peripheral and bilateral consolidations with a "crazy paving" pattern and peripheral and bilateral ground-glass opacification (GGO) signs [4].
Moreover, radiological features can be observed in CXRs from patients five days before acquiring COVID-19. The findings on CXRs are airspace opacities, such as consolidations or GGO, either unilateral or bilateral [5]. Two examples of CXR images are shown in Figure 1: a healthy patient and a person infected with COVID-19. On the other hand, pneumonia is an infectious disease that also affects the lungs, and according to the WHO, it is considered a leading cause of death in children [6]. Pneumonia can be caused by a fungal, bacterial, or viral attack.
Pneumonia causes pain in the chest and limits the oxygen intake of the infected patient. In the same way, pneumonia presents radiological features, such as consolidations by fluid accumulation [6]. Two examples of CXR images are shown in Figure 2: a healthy patient and a person with pneumonia caused by bacteria.
There are several machine-learning (ML) techniques that can help develop CAD tools. One of the best current methods for computer vision (CV) tasks is the convolutional neural network (CNN) [7]. Many techniques have been used in the medical field to assist CAD tasks, such as lesion segmentation [8], brain tumor segmentation [9,10], automatic size calculation of the heart [11], and classification among several thorax diseases [12-15].
In this research, we propose to use the Xception [16] CNN with pretrained weights on ImageNet. The use of pretrained models allows us to use transfer learning (TL) to obtain new retrained models to classify, in this case, among COVID-19, pneumonia, and healthy patients.

State of the Art
CNNs have proved to be one of the best techniques to solve CV problems. Ever since a CNN won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 [17], many new models for CV tasks have been developed, establishing new milestones in the ILSVRC.

Convolutional Neural Networks and Chest Diseases
The medical field and CAD have made great advances thanks to deep learning (DL) and CNNs. Moreover, TL has been widely used in order to reuse models pre-trained on ImageNet [22].
CAD applications have been actively published since the release of the ChestX-ray14 dataset (CX14) in 2017 by Wang et al. [12]. The CX14 contains posterior-anterior images of the chest, with a total of 108,948 images. The CX14 represented a big milestone for CAD, CV, and DL applications for the classification of multiple chest diseases [13,23,24].

Convolutional Neural Networks, Pneumonia, and COVID-19
Pneumonia classification has also been addressed by multiple research groups. For example, Rajpurkar et al. used a DenseNet model on CX14 to achieve radiologist-level diagnosis of pneumonia [28].
Kermany et al. [29] presented a pneumonia dataset with only 5232 images and established a baseline result for it. Additionally, multiple works have been presented on Kermany's dataset; the three latest top results are by Lian and Zheng [28], Luján et al. [30], and Chouhan et al. [31]. In the RSNA Challenge [32], the top result was presented by Sirazitdinov et al. [33].
Figure 2. Examples of CXR images: (a) shows a healthy patient without any anomaly within the thorax; (b) shows a pneumonia-infected patient; fluid and consolidations are found in the right lung. Original images are from (https://github.com/ieee8023/covid-chestxray-dataset) and (https://data.mendeley.com/datasets/rscbjbr9sj/2), respectively. Annotations were manually performed by us with a doctor's assistance.

By July 2020, a large number of scientific papers related to COVID-19 had been published. Most of the investigations used CT images in conjunction with CNN models to classify sick patients and try to differentiate them from typical pneumonia. Nonetheless, the main problem was the lack of public datasets that would allow us to perform diverse experiments with different techniques. Furthermore, the few datasets that contain images from patients with COVID-19 lack a great number of samples. This represents a big challenge for many state-of-the-art classification models related to DL and other ML algorithms that need several examples to work effectively. Therefore, it is a great challenge to present alternatives to classify and screen new COVID-19 cases, even using traditional techniques.
One of the first attempts to classify COVID-19 patients was presented by El-Din et al. [34], in which they compared multiple CNN baselines to select the best model for classification. Butt et al. [35] presented a ResNet variant to classify and localize the patterns of the SARS-CoV-2 virus in COVID-19. Ardakani et al. [36] implemented and compared 10 of the best-known CNN models to classify CT images, obtaining outstanding results compared with medical radiologists.
A similar approach to the proposed work was presented by Ozturk et al. [37], in which they used the Darknet model with pretrained weights from the YOLO system [38], achieving the best results using CXR images until the publication of their article. Other approaches, presented by Zhang et al. [39], include the use of a CNN as a feature extractor and two different multilayer perceptrons (MLPs) for anomaly detection and confidence prediction; a DenseNet121 model presented by Cohen et al. [40] established a baseline for an early version of the same dataset used in this research.
The proposed method aims to use the Xception network pre-trained on ImageNet and to combine two datasets, of pneumonia and COVID-19, to classify among healthy and sick patients. We use preprocessing techniques and cost-sensitive learning (CSL), as in previous research, to avoid the undesired effects produced by data imbalance problems.

Materials and Methods
In this section, we describe the datasets used to carry out the experiments of this research. The class imbalance problem is presented, and both the classification algorithm and the screening technique are described. Finally, the performance measures are defined.

Pneumonia Dataset
The pneumonia dataset selected to perform the experiments in this article is the one published in 2018 by the team of Kermany et al. [29]. This dataset contains 5856 CXR images of children up to five years old (https://data.mendeley.com/datasets/rscbjbr9sj/2).
The total set of images is made up of two disjoint sets: a training set with 5232 images and a test set containing 624 images. The training set includes 3883 images of pneumonia-infected patients and 1349 images of healthy patients. In turn, the test set contains 390 images of pneumonia-infected patients and 234 images of healthy patients.

COVID-19 Dataset
Due to the recent outbreak of COVID-19, there are not many publicly available datasets that allow researchers to try different classification and segmentation methods on this new disease. Nonetheless, the University of Montreal has opened a regularly updated dataset for COVID-19 and other respiratory diseases to contribute to diagnosis tools.
In this dataset, only 287 CXRs were specifically from patients with COVID-19. Most of them are from adult patients, including people from 12 to 87 years old and of different nationalities. Only the COVID-19 images of this dataset were used in this research (https://github.com/ieee8023/covid-chestxray-dataset).

Class Imbalance
Considering the two datasets, we used the images of each one to build a new dataset with three different classes: COVID-19, HEALTHY, and PNEUMONIA. Moreover, we can observe that the number of examples of each class is not the same, which tells us how imbalanced our new dataset is. We computed the imbalance ratio (IR) as in Equation (1), that is, the number of instances of the majority class divided by the number of instances of the minority class:

IR = N_majority / N_minority (1)

If IR > 1.5, we can consider a dataset imbalanced [42]. Then, the IR for our dataset is computed in Equation (2):
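As a concrete check, the IR can be computed directly from the per-class image counts given in this section (287 COVID-19 images; 1349 + 234 HEALTHY; 3883 + 390 PNEUMONIA). The result below is illustrative; the official value is the one reported in Equation (2):

```python
def imbalance_ratio(class_counts):
    """Imbalance ratio (IR): majority-class count over minority-class count."""
    return max(class_counts.values()) / min(class_counts.values())

# Class counts assembled from the dataset descriptions above
# (train + test images per class).
counts = {
    "COVID-19": 287,
    "HEALTHY": 1349 + 234,
    "PNEUMONIA": 3883 + 390,
}

ir = imbalance_ratio(counts)
print(f"IR = {ir:.2f}")          # well above the 1.5 threshold
print("imbalanced:", ir > 1.5)
```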

Convolutional Network
François Chollet published, in 2016, one of the most effective CNN models known to date: the Xception model [16], which we have chosen as the basis for the experimental section of this article. This CNN is structured in 36 convolutional layers, and its main difference is the use of depth-wise separable convolutions (DWSC) and residual connections, just as in ResNet models. DWSC obtain the same results as traditional convolutions but perform fewer operations when using large filter sizes.
Since it requires fewer operations than models such as VGG, Inception, and ResNet, deeper configurations can be used to extract features from the images. The original Xception model contained a Softmax layer of 1000 neurons and a global average pooling layer. We adapted the Xception model for three-class classification, as shown in Figure 3.
Our new network needs to include an initialization step because we use transfer learning. We have chosen the ImageNet image set in order to get the pretrained weights. We removed the top layer in order to add a global average layer as the pooling method, a dropout layer as a regularization method, and finally, three units with a Softmax activation function for multinomial logistic regression, outputting probabilistic predictions [43]. The Softmax function is defined in Equation (3):

softmax(z)_j = exp(z_j) / Σ_{i=1}^{k} exp(z_i), (3)

in which z_j = x^T w_j for j = 1, 2, …, k and z = (z_1, z_2, …, z_k) ∈ R^k.
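Equation (3) can be implemented directly; the max-subtraction below is a standard numerical-stability trick and not part of the paper's formulation:

```python
import numpy as np

def softmax(z):
    """Equation (3): softmax(z)_j = exp(z_j) / sum_i exp(z_i)."""
    z = np.asarray(z, dtype=float)
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits z_j = x^T w_j for the three output units
# (COVID-19, HEALTHY, PNEUMONIA).
probs = softmax([2.0, 0.5, -1.0])
print(probs, probs.sum())        # probabilities sum to 1
```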

Screening and Localization
We have implemented the Gradient-Weighted Class Activation Mapping (Grad-CAM) [44] algorithm in order to visualize the activation of the last convolutional layer of the Xception network. The last convolutional layer is the one that provides the final feature values for the logistic layer to compute a probabilistic output. The gradients of this layer are used to generate the Grad-CAM of an input image. Therefore, Grad-CAM provides a coarse localization map that indicates the most important regions in the image (radiological features) for a specific concept, in this case, the COVID-19 or PNEUMONIA class.
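The map-combination step of Grad-CAM is framework-agnostic: given the last convolutional layer's activations and the gradients of the class score with respect to them (obtained from the deep-learning framework, e.g., with TensorFlow's GradientTape), each feature map is weighted by its average gradient, the maps are summed, and a ReLU keeps only the positively contributing regions. A minimal NumPy sketch of this step, under those assumptions:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Core Grad-CAM step [44].

    activations: (H, W, C) feature maps of the last conv layer
    gradients:   (H, W, C) gradients of the class score w.r.t. those maps
    """
    alpha = gradients.mean(axis=(0, 1))          # one weight per channel
    cam = (activations * alpha).sum(axis=-1)     # gradient-weighted sum
    cam = np.maximum(cam, 0)                     # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                    # normalize to [0, 1]
    return cam
```

The resulting low-resolution map is then upsampled to the input size and overlaid on the CXR as a heatmap.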

Performance Measures
The confusion matrix allows us to compute metrics of the performance of our classification task. In a binary classification problem, the confusion matrix includes [45]: true positives (tp), true negatives (tn), false positives (fp), and false negatives (fn). Figure 4 represents an example of a confusion matrix.

On the other hand, in a multiclass problem, a common solution is to evaluate each class separately.
That is, each class needs to be evaluated considering the tp and fn of the class of interest and the tn and fp from the other classes. In this research, when evaluating a single class like COVID-19, the tn and fp are the sum of the instances of HEALTHY and PNEUMONIA classes.
Although there are a large number of performance measures in pattern classification problems, one of the most used is accuracy, which is defined in Equation (4) for multiclass problems:

Accuracy = (1/l) Σ_{i=1}^{l} (tp_i + tn_i) / (tp_i + tn_i + fp_i + fn_i), (4)

where l is the total number of classes. Individual Accuracy is computed per class without dividing the result by l.
In contrast, when we have imbalanced datasets, it is recommended to use performance measures that are not biased toward the general result of the classification algorithm. Some of these measures include sensitivity (also called recall) and precision, which are derived directly from the confusion matrix, in addition to the F1-Score, which is defined from recall and precision. The quality of the classification can be assessed as either a macro or micro average. In general, the macro average treats classes as equal [45]. Therefore, both individual and macro-average measures will be used in this research. Precision, recall, and F1-Score are detailed in Equations (5)-(7), respectively.
Individual measures are computed similarly to the last equations but without dividing the values by l. In addition, it is common to analyze the classification results with graphical tools, such as the receiver operating characteristic curve (ROC curve) and its general score, the area under the curve (AUC). Therefore, in Section 5, we will present the performance measures of our classification results and the ROC curves associated with them.
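The one-vs-rest evaluation described above can be sketched directly from a multiclass confusion matrix: the per-class tp values are its diagonal, fp and fn fall out of the column and row sums, and the macro average is the unweighted mean over classes. The confusion matrix below is hypothetical, for illustration only:

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, recall, and F1 (Equations (5)-(7)) from a
    multiclass confusion matrix cm, where cm[i, j] counts examples of
    true class i predicted as class j (one-vs-rest)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp     # predicted as the class but wrong
    fn = cm.sum(axis=1) - tp     # belong to the class but missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical 3-class confusion matrix (COVID-19, HEALTHY, PNEUMONIA)
cm = [[30, 0, 2],
      [1, 200, 33],
      [0, 5, 385]]
p, r, f1 = per_class_metrics(cm)
print("macro precision:", p.mean())
print("macro recall:   ", r.mean())
print("macro F1:       ", f1.mean())
```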

Image Preprocessing
CNNs commonly use square images as input. Our CXRs from the two different datasets come in a variety of sizes that depend on the original equipment and preprocessing of the X-ray machines. Consequently, in order to feed the network, we would need to resize the images to a square form, causing a distortion of the images, as shown in Figure 5.
We wanted to avoid distortion of the input images in order to prevent the suppression of useful data. As a result, we used a preprocessing technique presented by Pasa et al. [46] to extract the central region and eliminate black bars. Figure 6 shows a preprocessed image to which the following operations have been applied:
1. If black bands appear at the edges of the image, they are removed.
2. The image is resized until its smallest side measures 299 pixels.
3. The central region of 299 × 299 pixels is extracted.

Finally, normalization was performed over all the sets of images. We transformed the distribution of the dataset to a normal distribution with mean µ = 0 and standard deviation σ = 1. We achieved normalization by computing the mean and standard deviation values from the training set partition, and then subtracting the mean and dividing the training, validation, and test sets by the computed standard deviation. Partitions will be explained in Section 4.4.
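The cropping steps above can be sketched in NumPy. In a real pipeline the resize would use OpenCV's cv2.resize with proper interpolation; the nearest-neighbor resize here just keeps the sketch self-contained. The black-band threshold is an assumption of ours:

```python
import numpy as np

def strip_black_bands(img, thresh=8):
    """Step 1: drop rows/columns that are entirely (near-)black."""
    rows = np.where(img.max(axis=1) > thresh)[0]
    cols = np.where(img.max(axis=0) > thresh)[0]
    return img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

def resize_shortest_side(img, target=299):
    """Step 2: nearest-neighbor resize so the smallest side is `target`."""
    h, w = img.shape
    s = target / min(h, w)
    rows = np.minimum((np.arange(round(h * s)) / s).astype(int), h - 1)
    cols = np.minimum((np.arange(round(w * s)) / s).astype(int), w - 1)
    return img[rows][:, cols]

def center_crop(img, size=299):
    """Step 3: extract the central size x size region."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def preprocess(img):
    return center_crop(resize_shortest_side(strip_black_bands(img)))

# Standardization then uses training-set statistics only:
#   x_norm = (x - train_mean) / train_std
```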

Cost Sensitive Learning
We can apply a penalty to the function that scores each class in a classification task; this process is known as cost-sensitive learning (CSL) [47]. In this research, the cost function is the categorical cross-entropy [48]. CSL helps avoid bias in classification when using CNNs. Therefore, CSL serves as a method for solving the imbalance problem presented in Section 3.3. In this case, the weights were obtained considering the number of training examples of each class, using a Sci-Kit Learn [49] function that obtains the numbers needed to balance the number of examples, based on the principles of logistic regression. Table 1 includes the weights.
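Sci-Kit Learn's "balanced" heuristic assigns each class the weight w_k = n_samples / (n_classes * n_k), so minority classes get proportionally larger weights. It can be reproduced without the library; the counts below are hypothetical stand-ins for the training partition (the actual counts and resulting weights are in Table 3 and Table 1):

```python
def balanced_class_weights(counts):
    """Replicates Sci-Kit Learn's 'balanced' class-weight heuristic:
    w_k = n_samples / (n_classes * n_k)."""
    n = sum(counts.values())
    k = len(counts)
    return {c: n / (k * nk) for c, nk in counts.items()}

# Hypothetical per-class training counts, for illustration only.
train_counts = {"COVID-19": 204, "HEALTHY": 1079, "PNEUMONIA": 3106}
print(balanced_class_weights(train_counts))
```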

Data Augmentation and Hyperparameter Tuning
Deep-learning models, such as CNNs, often present better results when trained with as many images as possible. As a result, data augmentation is useful to generate more data based on the original training images. We performed the following data augmentation operations at training time:


• Random rotation of ±10 degrees.
• Zoom in a range of ±10%.
• Horizontal flipping.
On the other hand, the hyperparameters used for training a neural network are not obtained by a specific rule [50]. Moreover, it is necessary to adjust them considering the validation data. Based on previous experience, we chose the hyperparameters listed in Table 2, obtaining the trained model presented here. We also implemented early stopping to prevent the network from overfitting; we stopped the training when the score of the loss function did not improve after 10 epochs. The categorical cross-entropy is a generalization of the cross-entropy loss function, used when we have a multiclass classification task. The loss is computed for each class independently, and then the results are summed. Categorical cross-entropy is defined in Equation (8) [48]:

CE = −Σ_{k=1}^{l} y_k log(q_k), (8)

where k is the number of the class, y_k is the true label, and q_k is the Softmax function output, or predicted probability, of class k. Moreover, if we apply the weights for CSL, presented in the last subsection, we will have Equation (9):

CE_w = −Σ_{k=1}^{l} W_k y_k log(q_k), (9)

where W_k are the weights for the COVID-19, HEALTHY, and PNEUMONIA classes, respectively, as shown in Table 1.
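Equations (8) and (9) can be sketched in NumPy as follows; the weight values used here are illustrative, not the ones from Table 1:

```python
import numpy as np

def weighted_categorical_crossentropy(y_true, y_pred, weights):
    """Equation (9): CE_w = -sum_k W_k * y_k * log(q_k), averaged over the
    batch. y_true is one-hot encoded; y_pred holds Softmax probabilities.
    With weights all equal to 1, this reduces to Equation (8)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 1e-7, 1.0)
    per_example = -(weights * y_true * np.log(y_pred)).sum(axis=1)
    return per_example.mean()

weights = np.array([8.0, 1.5, 0.5])   # illustrative CSL weights (cf. Table 1)
y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
print(weighted_categorical_crossentropy(y_true, y_pred, weights))
```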

Dataset Partition
A simple validation strategy was followed to obtain the training, validation, and test sets. First, 32 randomly selected examples were taken from the COVID-19 class, which represents 10% of the complete data. Then, with the remaining images, a hold-out 80-20 validation method was applied. Similarly, with the original training images of HEALTHY and PNEUMONIA patients, hold-out 80-20 was performed. However, the test examples for both HEALTHY and PNEUMONIA were the official ones presented by Kermany et al. [51] in the original dataset. The final partitions are summarized in Table 3.
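The partition procedure can be sketched as follows. The function shape and the seed value are our own assumptions for illustration; the paper only states that a fixed seed was used:

```python
import random

def partition(covid, healthy_train, pneu_train, seed=42):
    """Hold out 32 COVID-19 images for testing, then apply an 80-20
    train/validation hold-out to the rest. The HEALTHY and PNEUMONIA
    test sets are the official ones from the original dataset [51]."""
    rng = random.Random(seed)             # fixed seed for replication
    covid = covid[:]
    rng.shuffle(covid)
    covid_test, covid_rest = covid[:32], covid[32:]
    splits = {"test": {"COVID-19": covid_test}, "train": {}, "val": {}}
    for name, images in [("COVID-19", covid_rest),
                         ("HEALTHY", healthy_train[:]),
                         ("PNEUMONIA", pneu_train[:])]:
        rng.shuffle(images)
        cut = int(0.8 * len(images))      # 80-20 hold-out
        splits["train"][name] = images[:cut]
        splits["val"][name] = images[cut:]
    return splits

s = partition(list(range(287)), list(range(1349)), list(range(3883)))
print({k: {c: len(v) for c, v in d.items()} for k, d in s.items()})
```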

Results
In this section, we present the general methodology followed in the experiments. The performance on the validation set is presented. Finally, the classification performance and the screening of the diseases are shown.

Experimental Framework
All experiments were implemented using Python 3.6 programming language, using Jupyter Notebook and OpenCV [52] as our main graphic processor software. We preprocessed all the available images and, then, performed hold-out validation to obtain the partitions presented in Table 3. We have used a fixed seed for replication purposes of the algorithm. Hold-out validation was only performed once, without changes on the partitions. In carrying out the research whose results are presented in this article, we have taken advantage of the free online Linux platform called Google Collaboratory.

Therefore, at the time of conducting the experiments, we had 25 GB of RAM, in addition to an Nvidia Tesla K80 GPU with 12 GB of GDDR VRAM. A summary of the proposed methodology is as follows:
- Resizing and cropping of all images with the proposed method.
- Hold-out as the validation method to obtain the training, validation, and test sets.
- Normalization of the images.
- Model selection by Xception training and validation.
- Performance evaluation of the model on the test set.
- Grad-CAM generation for test examples.

Validation Set Results
We trained the Xception network with weights pretrained on ImageNet and then selected the best model using the score of the loss function over the validation set. We also computed the Accuracy, Precision, Recall, and ROC AUC over the validation set; these scores are shown in Figure 7. In addition, we compared the proposed method with some of the most important CNN baselines, namely VGG16, ResNet50, and DenseNet121. Figure 8 shows the score of the loss function for each model. Table 4 shows the time measures and the validation loss score for all CNN models; the proposed method took less than 12 min to obtain a competitive model to classify COVID-19.
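The transfer-learning setup described above can be sketched in Keras roughly as follows. This is an illustrative reconstruction, not the authors' code: the optimizer and loss are assumptions, and the backbone is built with `weights=None` here only so the sketch runs without downloading the ImageNet checkpoint (the experiments use `weights="imagenet"`):

```python
# Minimal sketch of an Xception backbone with a 3-way softmax head.
import tensorflow as tf

backbone = tf.keras.applications.Xception(
    weights=None,              # "imagenet" in the actual experiments
    include_top=False,
    input_shape=(299, 299, 3))

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),        # yields the 2048-feature vector
    tf.keras.layers.Dense(3, activation="softmax"),  # HEALTHY / PNEUMONIA / COVID-19
])

# Optimizer and loss are assumptions, not reported settings.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 3)
```

The global-average-pooled backbone output is the same 2048-dimensional feature vector later projected with t-SNE in Figure 11.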

Test Set Classification Results
Our best model, the one with the lowest validation loss score, was selected to classify the test set. The classification results are shown in the confusion matrix of Figure 9, where the values for each individual class can be observed. In the same way, the ROC curves and their AUC were computed; individual and macro-average (MA) metrics are condensed in Table 5. Moreover, we also computed the macro-averaged metrics and plotted the ROC curves for each CNN model; these metrics are presented in Table 6, and the ROC curves are shown in Figure 10. Finally, we applied t-distributed Stochastic Neighbor Embedding (t-SNE) [53] using Sci-Kit Learn [49] to represent the high-dimensional final feature vector of the Xception network, which contains 2048 features for each example. Figure 11 shows the prediction of the test set in a two-dimensional plane.
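The t-SNE projection of Figure 11 can be reproduced in outline with Sci-Kit Learn; the feature matrix below is a synthetic stand-in for the real 2048-dimensional Xception outputs, and the two initializations match the two panels of the figure:

```python
# Sketch of the t-SNE step: embed 2048-d feature vectors in 2-D,
# once with random initialization and once with PCA initialization.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(150, 2048))  # stand-in: one vector per test image

emb_random = TSNE(n_components=2, init="random", random_state=0,
                  perplexity=30).fit_transform(features)
emb_pca = TSNE(n_components=2, init="pca", random_state=0,
               perplexity=30).fit_transform(features)
print(emb_random.shape, emb_pca.shape)  # (150, 2) (150, 2)
```

PCA initialization tends to give more stable, globally structured embeddings than random initialization, which is why both variants are shown.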

Figure 11. t-SNE of predicted outputs for the test set: (a) shows the instances using random initialization; (b) shows the instances using principal component analysis (PCA) for initialization.

Disease Screening
As mentioned in Section 3.5, Grad-CAM was used to provide the possible localization of the manifestations of both COVID-19 and PNEUMONIA. The maps are shown as heatmaps, where intense red represents the most important area (the extracted features) from which the classification decision is taken by the network. Figures 12 and 13 show several examples of correctly classified images of the COVID-19 and PNEUMONIA classes, respectively. We did not perform further comparison of the localization of the diseases with other state-of-the-art papers due to the daily updates of the dataset; therefore, the number of examples on different dates would not be the same.
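A common way to implement the Grad-CAM step for a Keras model is sketched below. The function is a generic reconstruction of the technique (Selvaraju et al.), not the authors' code, and the convolutional layer name passed to it is an assumption that depends on the concrete model:

```python
# Generic Grad-CAM sketch for a Keras classifier.
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index):
    """Return a normalized heatmap of the regions driving `class_index`."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])  # add batch axis
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))        # global-average-pooled grads
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                            # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

The resulting map is upsampled to the input resolution and overlaid on the CXR as the red-intensity heatmap shown in Figures 12 and 13.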


Discussion
In this section, we highlight the advantages of the proposed method and evaluate the performance of both the classification algorithm and the visualization technique.
On the validation set (Table 4), the proposed method obtained the lowest loss score, 0.05619, among all CNN models, achieved at the sixth epoch of the training phase. Our method trained over the whole dataset and obtained the best model in an average of 700 s; that is, we were able to obtain a good model for the classification of pneumonia, healthy, and COVID-19 patients in less than 12 min. In addition, the total training time of the proposed model was under 27 min, thanks to the early-stopping condition. On the other hand, even though DenseNet121 finished training before our method, it did not obtain the best classification result, and the difference in training time for a full CNN model for COVID-19 classification is less than six minutes. Therefore, with the hyperparameters presented, we can generate up to two solutions to the classification of CXRs in one hour. Furthermore, the proposed method takes approximately 22.6 ms to process a single example (inference of one CXR), making our model time-efficient for the classification of new CXR images.
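The early-stopping condition mentioned above can be expressed with a standard Keras callback; monitoring the validation loss matches the model-selection criterion used in the paper, while the patience value here is an assumption, not a reported setting:

```python
# Early-stopping sketch: stop training when validation loss stalls
# and keep the weights of the best epoch.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # select the model with the lowest validation loss
    patience=5,                  # assumed patience; stop after 5 stagnant epochs
    restore_best_weights=True)   # keep the best model, as done for model selection

# Passed to model.fit(..., callbacks=[early_stop]) during training.
```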
On the other hand, for the test set, we computed the performance measures using the confusion matrix (Figure 9a) and condensed the results in Table 5. For the COVID-19 class, the proposed method obtained an accuracy of 1.00, a precision of 0.96, a sensitivity (or recall) of 1.00, an F1-score of 0.97, and an AUC of 1.00. For the HEALTHY class, our model obtained an accuracy of 0.86, a precision of 0.99, a recall of 0.62, an F1-score of 0.76, and an AUC of 0.97. For the PNEUMONIA class, we obtained an accuracy of 0.86, a precision of 0.82, a recall of 1.00, an F1-score of 0.90, and an AUC of 0.97. Finally, the macro-average scores were an accuracy of 0.91, a precision of 0.92, a recall of 0.87, an F1-score of 0.88, and an AUC of 0.98.
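The macro-averaged values are simply the unweighted means of the per-class scores reported in Table 5, which can be verified directly:

```python
# Quick consistency check of the macro-averaged metrics
# from the per-class scores quoted in the text (Table 5).
per_class = {
    "COVID-19":  {"precision": 0.96, "recall": 1.00, "f1": 0.97, "auc": 1.00},
    "HEALTHY":   {"precision": 0.99, "recall": 0.62, "f1": 0.76, "auc": 0.97},
    "PNEUMONIA": {"precision": 0.82, "recall": 1.00, "f1": 0.90, "auc": 0.97},
}

# Macro average = unweighted mean over the three classes.
macro = {m: round(sum(c[m] for c in per_class.values()) / len(per_class), 2)
         for m in ("precision", "recall", "f1", "auc")}
print(macro)  # {'precision': 0.92, 'recall': 0.87, 'f1': 0.88, 'auc': 0.98}
```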
When comparing with other baseline CNN models, we found (from Table 6) that our method achieved the best scores among all models, making our proposal a better option than the most common architectures, such as DenseNet121, in computer vision tasks for medical applications. On the other hand, we would highlight that, despite the fact that most pneumonia images are from children while most COVID-19 images were taken from adults, the proposed method does not discriminate by the "age" feature. We could observe from the experiments and comparisons with the baseline CNNs that the proposed method extracts useful features from the texture and patterns within the CXRs to achieve a better classification score. In contrast, the baseline CNN models misclassified many patients from the children-only dataset, which contains the samples for the HEALTHY and PNEUMONIA classes. Moreover, we showed that our method finds an evident linear separability for the COVID-19 class with respect to HEALTHY and PNEUMONIA (Figure 11). Although some overlap exists between HEALTHY and PNEUMONIA patients, the proposed method effectively classifies most instances of these two classes from the children's image dataset.
From the scores of Table 5, we can observe that we achieved more than 90% precision and more than 80% recall. Moreover, among the individual measurements, we obtained 100% recall on both COVID-19 and pneumonia-infected patients. One of the main findings of this article is that the following relevant result was obtained on the dataset that we used for the experiments: all the patients infected with SARS-CoV-2 and all the patients infected with pneumonia were correctly classified. The results show more than 85% F1-score, which indicates a good balance between precision and recall. The AUC value indicates that the model under study generally avoids false classifications; we obtained a good AUC score of more than 97% over the three classes, and 100% on COVID-19 individually. Again, the proposed method obtained the best scores compared with the other baseline CNN models.
On the other hand, we argue that we obtained favorable results from the Grad-CAM representations for screening and visualization of the diseases. From Figure 12, we can observe that, in the first example, the heatmap indicates the bilateral consolidations on both pulmonary fields. Moreover, the second and third examples show consolidations on both pulmonary fields with a GGO pattern in the basal region of the left lung of both patients, which is a characteristic COVID-19 pattern. However, the visualization is not perfect: although the three examples of Figure 12 present bilateral GGO patterns, our network sometimes only detected unilateral manifestations. Furthermore, the network sometimes generates the heatmaps around the main bronchi area due to inflammation and dilatation, which are not always exclusive manifestations of COVID-19.
In the same way, Grad-CAM maps were generated for the PNEUMONIA class. In Figure 13, we can observe that the three patients show fluid accumulation and consolidations typical of pneumonia, and our model correctly identified manifestations of pneumonia in the right lung of each patient. Therefore, the generated heatmaps proved to be a useful tool for fast screening of pneumonia from CXR images.

Conclusions
In this paper, we have presented an effective and fast method to automatically classify three types of chest X-ray images: healthy people, patients with pneumonia, and patients infected with the SARS-CoV-2 virus. We used two different datasets to train the proposed method: for COVID-19, most of the samples were obtained from adult patients, including people from 12 to 87 years old; for the HEALTHY and PNEUMONIA instances, we used a dataset that contains images from children under five years old. The proposed method proved effective at extracting features of the diseases despite these age differences. Furthermore, our method proved to be fast, obtaining a good model in under 12 min, in addition to achieving good (and, in some cases, excellent) values in performance measures such as macro-average precision, recall, F1-score, and AUC compared with other state-of-the-art baseline models. When the performance measures were calculated for the three classes individually, the results were excellent. Specifically, we were successful in correctly classifying all the patients infected with SARS-CoV-2 and all the patients infected with pneumonia. This allows us to conclude that the proposed method may be useful to help physicians with the classification and visualization of COVID-19 and typical pneumonia.