A Deep Batch Normalized Convolution Approach for Improving COVID-19 Detection from Chest X-ray Images

Pre-trained machine learning models have recently been widely used to detect COVID-19 automatically from X-ray images. Although these models can selectively retrain their layers for the desired task, the output remains biased due to the massive number of pre-trained weights and parameters. This paper proposes a novel batch normalized convolutional neural network (BNCNN) model to identify COVID-19 cases from chest X-ray images in binary and multi-class frameworks, with a dual aim: to extract salient features that improve model performance over pre-trained image analysis networks, and to reduce computational complexity. The BNCNN model has three phases: data pre-processing to normalize and resize X-ray images, feature extraction to generate feature maps, and classification to predict labels based on the feature maps. Feature extraction uses four repetitions of a block comprising a convolution layer to learn suitable kernel weights for the feature maps, a batch normalization layer to mitigate the internal covariate shift of feature maps, and a max-pooling layer to find the highest-level patterns by increasing the convolution span. The classifier section uses two repetitions of a block comprising a dense layer to learn complex feature maps, a batch normalization layer to standardize internal feature maps, and a dropout layer to avoid overfitting while aiding model generalization. Comparative analysis shows that, when applied to an open-access dataset, the proposed BNCNN model performs better than four pre-trained models on both three-way and two-way classification. Moreover, the BNCNN requires fewer parameters than the pre-trained models, suggesting better suitability for deployment on low-resource devices.


Introduction
The coronavirus disease (COVID-19) remains a global health problem that negatively impacts our lives and the global economy. The initial infection was reported in January 2020 in twenty-seven patients with pneumonia and an epidemiological link to a live wild animal

Proposed Model
The proposed model is inspired by the pre-trained VGG models and aims to reduce computational complexity while increasing the classification accuracy of COVID-19 detection. The state-of-the-art VGG models have been explored for various tasks because of their excellent capability for feature extraction. These models can be understood as two sections: feature extraction and classification. The first section embeds the raw input into low-dimensional vectors, which the classifier then uses to generate the desired class labels. The proposed BNCNN borrows the repeated block structure of the VGG feature extractor. The proposed BNCNN-based COVID-19 detection system has three primary phases: data pre-processing, feature extraction, and classification. We explain each phase in the following subsections.

Data Pre-Processing
The proposed model uses chest X-ray images of COVID-19 patients and other subjects. Researchers from the University of Doha and the University of Dhaka collected this dataset, and it is publicly available with metadata on Kaggle [15]. The dataset contains three classes: COVID-19, Normal, and Viral Pneumonia images. Samples of X-ray images from each of the three classes in the dataset are provided in Figure 1. For experiments and evaluation, the dataset is partitioned into three mutually exclusive and exhaustive subsets for training (80%), validation (10%), and testing (10%). These subsets are summarised in Table 1. The X-ray images in the dataset are uniformly pre-processed to facilitate the learning process. Each X-ray image is resized to 150 × 150 pixels, rather than 224 × 224 as in VGG models. This reduces the network's input dimension and the number of trainable and non-trainable weights of the BNCNN model. Each pixel value is divided by 255, resulting in a normalized image in the range 0-1, which facilitates weight learning by avoiding vanishing and exploding gradients. Data augmentation strategies are used to simulate real-life scenarios and avoid the risks of overfitting.
All subsets are augmented independently with rotation ranging from −10 to 10 degrees, zooming of 0-10%, shearing of 0-10%, horizontal shifting of 0-10%, vertical shifting of 0-10%, and horizontal flipping to improve generalization and increase the diversity seen by the models during learning. Pixel values that are unavailable in the augmented image are replaced by the nearest pixel values. Examples of pre-processed chest X-ray images from each of the three classes are shown in Figure 1. Vertical flipping is avoided because the vertical orientation of a chest X-ray image is easy for even an ordinary user to identify. The class label for each image is encoded using one-hot encoding, which converts the label into a three-element vector with all zeros except for a one at the position corresponding to the image class. It can be noted that the actual order of the classes in these three dimensions does not affect the classifier performance [16].
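The normalization and label-encoding steps described above can be sketched in plain Python. This is a minimal illustration rather than the authors' code: the function names `normalize_pixels` and `one_hot` and the class ordering in `CLASSES` are assumptions made for the example.

```python
# Hypothetical class order; the text notes that the order does not matter.
CLASSES = ["COVID-19", "Normal", "Viral Pneumonia"]


def normalize_pixels(image):
    """Scale 8-bit pixel intensities (0-255) into the range 0-1."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def one_hot(label, classes=CLASSES):
    """Return a vector of zeros with a single one at the index of `label`."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1
    return vector


# A tiny 2 x 2 stand-in for a resized 150 x 150 X-ray image.
toy_image = [[0, 51], [102, 255]]
print(normalize_pixels(toy_image))  # [[0.0, 0.2], [0.4, 1.0]]
print(one_hot("Normal"))            # [0, 1, 0]
```

In practice the same scaling would be applied to every image in the training, validation, and testing subsets.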

Feature Extraction
The feature extraction phase comprises the first twelve layers, while the remaining layers form the classifier phase. Architectural details of the BNCNN model, the output dimension at each layer, and the number of trainable/non-trainable parameters are provided in Table 2. Inspired by the VGG models, the feature extraction phase of the proposed BNCNN uses four repetitions of a block of similar layers. Each block comprises a convolution layer to learn kernel weights suitable for the feature maps, followed by a batch normalization layer to mitigate the internal covariate shift of feature maps [13] and a max-pooling layer to find the highest-level patterns in the input images by increasing the span of the convolution calculation [16].
where h = m + s and v = n + s. Herein, m and n are horizontal and vertical indices of the image, and s is the stride. A detailed illustration can be found in [17]. Each convolution layer comprises 3 × 3 filters with a stride and padding of 1 and a ReLU activation function. The number of filters in the convolution layers increases from the input to the output layers. A batch normalization layer after every convolution layer increases the model's generalization capability. It standardizes the output of the previous layer to have a mean of zero and a standard deviation of one. These layers track the statistics of their inputs during training and use them to standardize inputs during testing. The standardized values are then scaled and shifted by learned parameters, which are updated during training and held fixed for testing. Hence, these layers have an equal number of trainable and non-trainable parameters. A max-pooling layer reduces the dimension of the input feature map without any parameters. All max-pooling layers use a stride and pool size of two, halving the dimension of the input feature map at each occurrence.
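The standardization performed by a batch normalization layer, and the effect of repeated pooling on the spatial size, can be sketched as follows. This is an illustrative sketch under stated assumptions: `batch_norm` and `pooled_size` are hypothetical helper names, the batch statistics stand in for the tracked statistics, and pooling is assumed to use no padding, so odd sizes are rounded down. Table 2 remains the reference for the actual output dimensions.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize a batch of activations, then apply a learned scale and shift.

    gamma and beta play the role of the trainable parameters; the batch mean
    and variance play the role of the non-trainable tracked statistics.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]


def pooled_size(size, pool=2, stride=2):
    """Spatial size after max-pooling without padding (rounds down)."""
    return (size - pool) // stride + 1


normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
print(abs(sum(normalized) / len(normalized)) < 1e-9)  # mean is numerically zero

# Spatial size after each of the four blocks, starting from a 150 x 150 input.
size = 150
for block in range(1, 5):
    # A 3 x 3 convolution with stride 1 and padding 1 keeps the size unchanged.
    size = pooled_size(size)
    print(f"block {block}: {size} x {size}")
```

Under these assumptions the spatial size shrinks from 150 to 75, 37, 18, and finally 9, which is why the flattened vector passed to the classifier stays small.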

Classification
The classifier starts with a flattening layer that converts the output of the previous phase (i.e., feature extraction) into a vector. This phase replaces the convolution and max-pooling layers with dense and dropout layers. It comprises two repetitions of a block consisting of a dense layer for feature mapping, a batch normalization layer to standardize internal feature maps, and a dropout layer to avoid overfitting while aiding model generalization. A final softmax layer generates a probabilistic output for each class label.
The number of neurons in the dense layers decreases from 256 to 128 from input to output to remove redundant features. This also reduces the model's computational complexity, because the number of trainable parameters of a dense layer grows with the product of its input and output sizes. All dropout layers use a drop factor of 0.2, meaning that a randomly selected 20% of the units are deactivated in each training iteration to increase the model's generalization.
The architecture of the BNCNN remains unchanged for three-way and two-way classification except for the softmax layer. In three-way classification, the softmax layer uses three output nodes with 387 trainable parameters, resulting in 2,786,435 trainable parameters in total. In two-way classification, two output nodes with 258 trainable parameters are used, resulting in 2,786,306 trainable parameters in total.
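The parameter counts quoted for the softmax layer follow from the usual formula for a fully connected layer (one weight per input-output pair plus one bias per output). The short check below assumes the softmax layer receives the 128 outputs of the last dense block; `dense_params` is a hypothetical helper name.

```python
def dense_params(inputs, units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return inputs * units + units


three_way = dense_params(128, 3)
two_way = dense_params(128, 2)
print(three_way, two_way)  # 387 258

# The two totals quoted in the text differ only by the size of the output layer.
print(2_786_435 - three_way + two_way)  # 2786306
```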
The Adaptive Moment Estimation (Adam) optimizer iteratively updates network weights using the training data. This optimizer combines ideas from Stochastic Gradient Descent (SGD) with momentum and Root Mean Square Propagation (RMSP). Adam adapts the learning rate of each weight in the network using running estimates of the first and second moments of the gradient, and these history-based updates lead to faster convergence. Moreover, it shows the best accuracy compared to the SGD and RMSP optimization algorithms [17].
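A single Adam update for one scalar weight can be sketched as below. The function name `adam_step`, the toy objective, and the default hyper-parameters (the commonly used values 0.9, 0.999, and 1e-8) are assumptions for illustration; the settings actually used for the BNCNN are those in Table 3.

```python
import math


def adam_step(weight, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar weight and return (weight, m, v)."""
    m = beta1 * m + (1 - beta1) * grad         # first moment (momentum-like)
    v = beta2 * v + (1 - beta2) * grad * grad  # second moment (RMSProp-like)
    m_hat = m / (1 - beta1 ** t)               # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    weight -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return weight, m, v


# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for step in range(1, 2001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, step, lr=0.01)
print(round(w, 2))  # ends close to the minimizer w = 3
```

Because the update divides the first moment by the square root of the second, the very first step has a magnitude of roughly `lr` regardless of how large the gradient is.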
For the nth training example, let $y_{pred}(n, i)$ be the ith scalar value corresponding to the predicted class probability ($\sum_i y_{pred}(n, i) = 1, \forall n$) and $y(n, i)$ be the ith scalar value corresponding to the actual one-hot-encoded class label. The cross-entropy loss (L), averaged over the training examples, is calculated as follows:
$$L = -\frac{1}{N_{train}} \sum_{n=1}^{N_{train}} \sum_{i=1}^{N_{class}} y(n, i) \, \log y_{pred}(n, i)$$
where $N_{train}$ is the total number of training examples and $N_{class}$ is the number of classes (i.e., 3 for three-way and 2 for two-way classification). The number of non-trainable parameters remains unaltered in both classification types. The major contributors to the number of trainable parameters, namely the convolution and dense layers, are kept small in the proposed BNCNN model. Figure 2 illustrates the overall process of the BNCNN model.
A callback is tailored to reduce the learning rate when the validation accuracy stops improving. It monitors the validation accuracy after every epoch and reduces the learning rate by a factor of 0.3 if no improvement is achieved for three consecutive epochs. After reducing the learning rate, it waits for at least five epochs before applying the reduction again.
Another callback is tailored for early stopping of the training by monitoring the training and validation accuracies after each epoch. The callback stores the model weights if the current training and validation accuracies exceed those of the earlier epochs. This second callback avoids model overfitting without requiring the exact number of epochs to be fixed in advance. The proposed BNCNN steps are represented in Algorithm 1.
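The learning-rate callback described above can be sketched as a small state machine. This is a simplified illustration, not the callback used by the authors or by any particular library: the class name `ReduceOnPlateau` is hypothetical, and "reduce by a factor of 0.3" is assumed to mean multiplying the learning rate by 0.3. The early-stopping callback is not shown.

```python
class ReduceOnPlateau:
    """Lower the learning rate when validation accuracy stops improving."""

    def __init__(self, lr, factor=0.3, patience=3, cooldown=5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.cooldown = cooldown
        self.best = float("-inf")
        self.wait = 0           # consecutive epochs without improvement
        self.cooldown_left = 0  # epochs remaining before monitoring resumes

    def step(self, val_accuracy):
        """Call once per epoch; returns the learning rate for the next epoch."""
        in_cooldown = self.cooldown_left > 0
        if in_cooldown:
            self.cooldown_left -= 1
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.wait = 0
        elif not in_cooldown:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor            # assumed: new_lr = lr * 0.3
                self.wait = 0
                self.cooldown_left = self.cooldown
        return self.lr


# Hypothetical validation accuracies that plateau after the second epoch.
scheduler = ReduceOnPlateau(lr=1.0)
history = [scheduler.step(acc) for acc in [0.90, 0.92] + [0.92] * 11]
print(history)
```

With this input the rate drops at epoch 5 (three stagnant epochs), stays fixed during the five cooldown epochs, and drops again at epoch 13.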

Experiments and Results
This section provides the implementation details of the BNCNN model, followed by the results of the experiments.
The BNCNN model is trained for 100 epochs using the Adam optimizer with an initial learning rate decay of 0.0001, and training runs through all the epochs without interruption. The cross-entropy loss function minimizes the distance between the predicted and actual probability distributions.
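To illustrate how the cross-entropy loss measures the gap between predicted and actual distributions, the sketch below computes the mean categorical cross-entropy for one-hot labels. The function name `cross_entropy`, the averaging over examples, the clipping constant `eps`, and the toy predictions are assumptions for the example.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over a batch of one-hot labels."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        for target, prob in zip(true_row, pred_row):
            total -= target * math.log(max(prob, eps))  # eps guards against log(0)
    return total / len(y_true)


labels = [[1, 0, 0], [0, 1, 0]]
confident = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
uncertain = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]]
print(cross_entropy(labels, confident) < cross_entropy(labels, uncertain))  # True
```

Only the probability assigned to the true class contributes to the loss, so predictions that place more mass on the correct label are penalized less.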
The hyper-parameter settings of the Adam optimizer for the BNCNN model and the other pre-trained models are provided in Table 3. These settings were chosen after experiments showed them to be the best settings for training the models. All the models are implemented in Python and executed on a 12 GB NVIDIA Tesla P100 GPU and an Intel Xeon CPU @ 2.00 GHz with 13 GB RAM.

Evaluation Measures
For model evaluation, accuracy, sensitivity, Positive Predictive Value (PPV), and F1-score are used to assess the performance of the BNCNN and the other models. The equations for deriving the values of these metrics are provided in (3)-(6).
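As a worked illustration, the four measures can be computed from confusion-matrix counts using their standard definitions. The sketch below uses the two-way test counts reported in the results (361 of 369 COVID-19 images and 1002 of 1004 Normal images classified correctly) and assumes COVID-19 is the positive class; the function name `binary_metrics` is hypothetical, and the printed values are computed here rather than taken from the paper's tables.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, PPV and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)  # also called recall
    ppv = tp / (tp + fp)          # also called precision
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return accuracy, sensitivity, ppv, f1


# 8 COVID-19 images are missed (fn) and 2 Normal images are flagged (fp).
acc, sens, ppv, f1 = binary_metrics(tp=361, fn=8, fp=2, tn=1002)
print(f"accuracy={acc:.4f} sensitivity={sens:.4f} ppv={ppv:.4f} f1={f1:.4f}")
# accuracy=0.9927 sensitivity=0.9783 ppv=0.9945 f1=0.9863
```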

Results and Discussion
This section discusses the comparative performance of the BNCNN against the other pre-trained models.

Results of the Proposed BNCNN Model
The comparative performance results of the BNCNN model and the pre-trained models for three-way classification are provided in Table 4. Although the VGG-16 model performs slightly better than the proposed BNCNN model during training, the latter performs better during the validation and testing phases.
A confusion matrix is utilized to examine further the distribution of the predicted X-ray images across the classes. The confusion matrices for the three-way and two-way classification of the BNCNN model are shown in Figure 3. The BNCNN model is tested on the test dataset, including 384 COVID-19 X-ray images, 981 Normal images, and 153 Viral Pneumonia images for three-way classification. For two-way classification, 369 COVID-19 images and 1004 Normal images are used. Figure 3a shows that 360 of 384 COVID-19 images, 979 of 981 Normal images, and 131 of 153 Viral Pneumonia images are correctly classified for three-way classification. Figure 3b shows that 361 of 369 COVID-19 images and 1002 of 1004 Normal images are correctly classified for two-way classification.

Convergence Analysis
The convergence analysis is performed to study the stability of the learning patterns of the BNCNN model over the number of epochs. Figure 4 plots the BNCNN model accuracy and loss on the training and validation datasets over the training epochs. Figure 4a is for three-way classification, while Figure 4b is for two-way classification. It can be observed from Figure 4 that the BNCNN model shows a good fit and convergence for both three-way and two-way classification.

Comparison with Existing Models
Several ML models have been developed to diagnose COVID-19 automatically from X-ray images. Ozturk et al. (2020) [21] suggested the DarkCovidNet model for automatic COVID-19 detection from X-ray images. The model used 17 convolution layers with different numbers of filters in each layer. The DarkCovidNet model reported high reliability with an accuracy of 98% for two-way classification (i.e., COVID-19 and Normal) and 87% for three-way classification (i.e., COVID-19, Normal, and Viral Pneumonia). In another work, Khan et al. (2020) [22] reported a CoroNet model based on the pre-trained Inception architecture for COVID-19 detection from X-ray images. CoroNet reported an accuracy of 95% for three-way classification.
Apostolopoulos and Mpesiana (2020) [23] suggested a novel convolution neural network (CNN) architecture and examined VGG-19 for COVID-19 detection from X-ray images. The model reported an accuracy of 93.48% for three-way classification. Wang et al. (2020) [24] reported a COVID-Net model for three-way classification with an accuracy of 92.4%. Sethy and Behera (2020) [25] combined transfer learning (TL) based on ResNet-50 with a support vector machine to diagnose COVID-19 from X-ray images. Their combined model achieved 95.38% accuracy for three-way classification.
Horry et al. (2020) [26] used a pre-trained VGG-19 model for COVID-19 detection using a dataset comprising 115 COVID-19, 60361 Normal, and 322 Pneumonia X-ray images. The results showed that the pre-trained model attained 81% accuracy for three-way classification. In another work, Rahimzadeh and Attar (2020) [27] detected COVID-19 from X-ray images using several deep neural networks and reported an accuracy of 91.4% for three-way classification. Song et al. (2021) [28] developed a computer-aided method to classify images into COVID-19, bacterial Pneumonia, and Normal cases from a dataset collected from two provinces in China. Experimental results showed that the reported model could identify COVID-19 cases with an accuracy of 86%.

The work in [29] suggested a CoroDet model for COVID-19 detection from X-ray images. The results confirmed that the CoroDet model could effectively identify COVID-19 cases with an accuracy of 94.2% for three-way and 99.1% for two-way classification. Chen (2021) [30] employed a CNN model to detect COVID-19 cases from X-ray images, and the results showed an accuracy of 85% for three-way classification. Vinod et al. (2021) [31] suggested DeepCovix-net to diagnose COVID-19 from X-ray and CT medical images and reported an accuracy of 96.8% for three-way classification. Anter et al. (2021) [32] proposed a model for COVID-19 diagnosis from X-ray images called AFCM-LSMA. Their suggested model achieved an accuracy of 96% for two-way classification. Basha et al. (2021) [33] reported a neutrosophic model for COVID-19 diagnosis from chest X-ray images with an accuracy of 98.7% for two-way classification. Table 6 summarizes the accuracy achieved by the existing models. Most studies in Table 6 used different datasets to validate the efficiency of their proposed models. The dataset used in this work is collected from existing studies and is publicly available [34].
A direct comparison between the BNCNN and the other models is not strictly fair because the size and characteristics of the datasets differ; the reported accuracies nevertheless give an indicative comparison. Table 6 lists the accuracy of 13 existing models against the proposed BNCNN model for three-way and two-way classification. As per the results in Table 6, the proposed BNCNN model provides higher accuracy for three-way and two-way classification than the existing models.
Table 6. Accuracy comparison between the existing models and the proposed BNCNN model.

Statistical Analysis
To further evaluate the significance of the BNCNN results, Friedman's test is performed [35]. Partially trained models are used to analyze learning speed based on testing accuracy. The testing accuracy of the BNCNN and of the other pre-trained models is also evaluated after 100 epochs.
Friedman's test is performed with the null hypothesis that the testing accuracy samples for the BNCNN and the other pre-trained models originate from the same distribution, i.e., all models under comparison have equal testing accuracy. The alternative hypothesis assumes that at least one of the models yields a different testing accuracy from the other COVID-19 detection models, at a significance level of 0.05. The samples for testing accuracy are taken from partially trained models at every ten epochs. It should be noted that higher rankings indicate improved performance. The average ranks of all the models for three-way and two-way classification are shown in Figure 5. The p-value calculated using Friedman's test is 0.0146 for three-way classification and 0.0053 for two-way classification, both of which are below the 0.05 significance level. The test therefore indicates that testing accuracy differs among the compared models. From Figure 5, BNCNN is ranked first for both three-way and two-way classification.
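The Friedman statistic can be sketched in a few lines: each checkpoint ranks the models by testing accuracy, and the statistic measures how far the average ranks are from what equal performance would give. The accuracies below are hypothetical and are not the paper's data; `friedman_test` and `chi2_sf_even` are illustrative helper names, ties are assumed absent, and the closed-form p-value applies only when the number of models minus one is even.

```python
import math


def friedman_test(scores):
    """Friedman chi-square statistic and average ranks.

    `scores` has one row per measurement block (e.g. an epoch checkpoint)
    and one column per model. Ties are assumed absent in this sketch.
    """
    n_blocks, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])  # ascending accuracy
        for rank, j in enumerate(order, start=1):       # best model gets rank k
            rank_sums[j] += rank
    avg_ranks = [s / n_blocks for s in rank_sums]
    stat = 12 * n_blocks / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    return stat, avg_ranks


def chi2_sf_even(x, dof):
    """Upper-tail chi-square probability; closed form valid for even dof only."""
    if dof % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(dof // 2))


# Hypothetical accuracies: rows are checkpoints, columns are three models.
toy_scores = [[0.95, 0.93, 0.90], [0.96, 0.94, 0.91], [0.97, 0.95, 0.92]]
stat, ranks = friedman_test(toy_scores)
p_value = chi2_sf_even(stat, dof=len(toy_scores[0]) - 1)
print(ranks, round(stat, 3), round(p_value, 4))  # [3.0, 2.0, 1.0] 6.0 0.0498
```

With so few blocks the chi-square approximation is rough; the example is only meant to show how the ranks feed into the statistic.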
Holm's post hoc test is used to confirm the differences in behavior between the BNCNN (control model) and the other comparative models, as provided in Table 7. It uses the BNCNN as the control model because it has the highest rank in Friedman's test. The results in Table 7 show a significant difference between the BNCNN and the other pre-trained models. This supports the efficiency of the BNCNN as an alternative model for COVID-19 detection.
In addition, the BNCNN and the other models are compared in terms of the Area under the Receiver Operating Curve (AUC). We used the DeLong test to assess differences between the AUCs of the models (p ≤ 0.05 is considered statistically significant) [36]. Figure 6 shows the discriminative ability of all the models used. The proposed BNCNN model showed better discriminative ability, with an AUC of 0.92 (95% CI: 0.87-0.95) and 0.94 (95% CI: 0.89-0.96) for three-way and two-way classification, respectively. Discriminative analysis of the other models compared to the proposed BNCNN is shown in Table 8. No competing model has a statistically similar AUC to the proposed BNCNN model. It can be observed that all models have smaller AUCs than the BNCNN model, indicating the better discriminative ability of the developed BNCNN model.
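Holm's step-down procedure, used above for the post hoc comparisons against the control model, can be sketched as follows. The p-values are hypothetical and are not those of Table 7; `holm_reject` is an illustrative helper name.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure; returns a reject flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # most significant first
    reject = [False] * m
    for step, idx in enumerate(order):
        # The threshold loosens from alpha/m to alpha as hypotheses are rejected.
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are too
    return reject


# Hypothetical p-values for four control-versus-model comparisons.
print(holm_reject([0.010, 0.040, 0.030, 0.005]))  # [True, False, False, True]
```

Compared with a plain Bonferroni correction, the step-down thresholds reject at least as many hypotheses while still controlling the family-wise error rate.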

Discussion
This work aimed to introduce a novel model for COVID-19 detection from X-ray medical images as an intelligent platform, which can provide updates on the patient's health conditions and then guide further treatment. The performance and efficacy of the proposed BNCNN model are investigated using several evaluation measures and compared to the VGG-16, VGG-19, Inception-V3, and ResNet-50 pre-trained models. The comparison of performance measures shows that the suggested model achieves better results than the tested pre-trained models. In addition, the statistical tests of significance support the superiority of the proposed model over the other pre-trained models, which reflects the reliability of the developed BNCNN model. A likely reason is that the batch normalization layers in the proposed BNCNN model improve feature extraction, while the max-pooling and dropout layers reduce the computational complexity of the structure.
The main limitation of our study is the dataset, which is restricted to publicly available chest X-ray images. X-ray images are not recommended as the first-line imaging test for diagnosing COVID-19 due to the low positive detection rate at the early stages, which may be related to the insensitivity of X-ray images to the density of Ground Glass Opacity (GGO) [37]. Conversely, CT showed significant advantages in monitoring disease progression and served as an effective clinical diagnostic tool for early screening and diagnosis of COVID-19 [38]. CT has proved to be a good choice for early detection, severity assessment, and timely evaluation of therapeutic effects for COVID-19, with or without laboratory confirmation [39]. X-ray and CT are medical imaging techniques widely used to assess and diagnose COVID-19 pneumonia patients. However, CT shows greater sensitivity than X-ray for early pneumonic change, disease progression, and alternative diagnosis [40]. ML can analyze irregular symptoms and other 'red flags' of infected cases at an early stage by using advanced algorithms [41]. These methods show a promising way of optimizing healthcare and improving the results of diagnostic and therapeutic procedures. Therefore, the proposed BNCNN model could be extended using a large CT dataset to build an intelligent, accurate, and cost-effective platform for COVID-19.

Conclusions and Future Work
The COVID-19 disease is becoming increasingly significant as infected cases rapidly increase. Many researchers have devoted their efforts to developing ML models for COVID-19 detection that would benefit radiologists and health experts. This paper proposed an improved model named BNCNN to detect COVID-19 from chest X-ray images. The BNCNN uses a VGG-inspired repetitive block structure, in which each block comprises a convolution layer followed by batch normalization and max-pooling layers to improve the model's generalization and reduce the feature maps. To assess the reliability of the proposed BNCNN against the VGG-16, VGG-19, Inception-V3, and ResNet-50 pre-trained models, a dataset from Kaggle is employed. The results show that the proposed BNCNN model significantly outperforms the comparative pre-trained models. Hence, the proposed BNCNN can be used to detect COVID-19 accurately from chest X-ray images. In the future, the performance of the proposed BNCNN model on CT imaging needs to be investigated.