In Search of an Efficient and Reliable Deep Learning Model for Identification of COVID-19 Infection from Chest X-ray Images

The virus responsible for COVID-19 is mutating day by day into more infectious variants. With limited healthcare resources and overburdened medical practitioners, it is almost impossible to contain this virus. Automatic identification of this viral infection from chest X-ray (CXR) images is now in high demand, as it is a cheaper and less time-consuming diagnostic option. To that end, we have applied deep learning (DL) approaches to the four-class classification of CXR images comprising COVID-19, normal, lung opacity, and viral pneumonia. First, we extracted features of CXR images by applying a local binary pattern (LBP) operator and pre-trained convolutional neural networks (CNNs). Afterwards, we applied pattern recognition network (PRN), support vector machine (SVM), decision tree (DT), random forest (RF), and k-nearest neighbors (KNN) classifiers to the extracted features to classify the aforementioned four classes of CXR images. The proposed methods have been analyzed rigorously in terms of classification performance and classification speed. Among the different methods applied to the four-class test images, the best method achieved 97.41% accuracy, 94.94% precision, 94.81% recall, 98.27% specificity, and a 94.86% F1 score. The results indicate that the proposed method can offer an efficient and reliable framework for COVID-19 detection from CXR images, which could be immensely conducive to the effective diagnosis of COVID-19-infected patients.

In recent times, a few investigations have emerged with four-class classification capabilities for COVID-19 identification from CXR images. Hussain et al. used a small CXR dataset with normal (138), bacterial pneumonia (145), non-COVID-19 viral pneumonia (145), and COVID-19 (130) images and achieved 79.52% accuracy [59]. The deep CNN termed CoroNet produced 89.6% accuracy over a CXR dataset composed of normal (310), bacterial pneumonia (330), viral pneumonia (327), and COVID-19 (284) images [60]. The use of a deep ResNet improved the accuracy of four-class classification to 92.1% on a very small dataset consisting of 450 CXR images [61]. Attaullah et al. combined patients' symptoms with a total of 800 four-class CXR images and obtained an accuracy of 78.88% [62]. The estimation of the uncertainty in a deep CNN with a Bayesian approach can improve the reliability of accuracy measurements in a four-class classification problem [63]. A pre-trained and fine-tuned ResNet-50 architecture has been shown to achieve 96.23% accuracy on a four-class CXR dataset containing normal (1203), non-COVID-19 viral pneumonia (660), bacterial pneumonia (931), and COVID-19 pneumonia (68) images [64]. It is hard to find rigorous and extensive studies of COVID-19 identification within the framework of four-class classification problems using different ML algorithms on a relatively large dataset in the existing literature.
To that end, here we used different ML algorithms on a comparatively large dataset of 5360 CXR images containing four different classes, i.e., COVID-19, normal, lung opacity, and viral pneumonia, each of which contains 1340 images. CXR image feature extraction was performed using the local binary pattern (LBP) operator and pre-trained CNNs. We used LBP-based PRN, LBP-based SVM, LBP-based DT, LBP-based RF, LBP-based KNN, and CNN-based SVM for image class identification. For a reliable performance analysis of the LBP-PRN, six different training algorithms were used. The performance of SVM classifiers was assessed with nineteen different pre-trained CNNs for feature extraction. The classification performance of the ensemble configuration of the three best CNN-based SVM classifiers selected from these nineteen CNNs was also evaluated in this study. Overall, we believe that the results presented here establish an efficient and reliable CNN-based SVM framework for COVID-19 detection from CXR images.

Materials and Methods
Here, we classified four-class CXR images of COVID-19-infected patients (COVID-19 class), healthy persons (normal class), persons with lung opacity (lung opacity class), and viral pneumonia-infected patients (viral pneumonia class) using LBP- and CNN-based feature extraction from the CXR images. The LBP-extracted features were subsequently used to train the PRN, SVM, DT, RF, and KNN-based machine learning classifiers. Moreover, the pre-trained CNN-derived features were subjected to SVM-based classification. The functional diagram of such feature extraction-based classifiers is depicted in Figure 1.
In both the training and testing phases of the classification process, feature extraction algorithms were used to provide the necessary CXR image features. The classifiers were trained with the features obtained in the training phase, and the trained classifiers were then used to classify the CXR images based on the image features obtained in the testing phase.

Dataset of CXR Images
The dataset of CXR images used in this study has been collected from a public source [65]. Table 1 outlines a brief description of the number of CXR images in the dataset along with the number of images used in this study.

Table 1. Total number of chest X-ray images per-class and per-fold.

Class              Total CXR Images in Dataset    Used    Training per-Fold    Testing per-Fold
COVID-19           3616                           1340    1072                 268
Normal             10,192                         1340    1072                 268
Lung opacity       6012                           1340    1072                 268
Viral pneumonia    1345                           1340    1072                 268

We have utilized a total of 5360 CXR images from the four different classes comprising COVID-19, normal, lung opacity, and viral pneumonia, each of which contains an equal number of 1340 images, as shown in Table 1. Four sample CXR images from each class are shown in Figure 2.
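Since 1340 = 5 × 268, the per-fold counts in Table 1 correspond to a 5-fold split of each class. A minimal sketch of such a per-class split is given below; the function name and seed are illustrative, not from the paper:

```python
import random

def five_fold_indices(n_per_class=1340, n_folds=5, seed=0):
    """Split the image indices of one class into 5 train/test folds.

    With 1340 images per class, each fold holds 268 test images and the
    remaining 1072 images are used for training, matching Table 1.
    """
    idx = list(range(n_per_class))
    random.Random(seed).shuffle(idx)  # shuffle once for reproducibility
    fold_size = n_per_class // n_folds
    folds = []
    for f in range(n_folds):
        test = idx[f * fold_size:(f + 1) * fold_size]
        train = idx[:f * fold_size] + idx[(f + 1) * fold_size:]
        folds.append((train, test))
    return folds

folds = five_fold_indices()
print(len(folds[0][0]), len(folds[0][1]))  # 1072 268
```

The same split is repeated independently for each of the four classes, keeping the folds balanced.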


Extraction of Features from CXR Images
The feature extraction process maps the most significant information of an image to a feature vector of much reduced size. In this work, we have demonstrated the use of the LBP operator and pre-trained CNNs to extract features from CXR images.

Extraction of Features from CXR Images Using LBP Operator
The extraction of image features using the LBP operator has found extensive applications in the field of image processing [66-69]. The basic principle of the LBP operator was first presented in Ref. [66] to describe the texture of an image. This operator works by thresholding the gray levels of neighborhood pixels against that of their central pixel in a local circular region. The thresholded values are then weighted by powers of 2 and summed in a clockwise direction to obtain the LBP value of the central pixel. The LBP value of a given pixel is given by [66]:

LP,R = Σ (m = 0 to P − 1) s(gm − gc) 2^m, where s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise. (1)

In Equation (1), P is the total number of neighbors of the central pixel in a region of the image having radius R, gm stands for the gray level of the m-th neighborhood pixel, and gc represents the gray level of the central pixel within the considered region.
The mechanism of feature extraction using LBP is depicted in Figure 3. In this illustration, a local region having radius R = 1 is considered to obtain an image segment of 3×3 pixels. There are 8 pixels (P = 8) in this region excluding the central pixel. Then, the thresholding operation is performed, through which the pixel value is set to 0 if the gray level of a neighboring pixel is less than that of the center pixel, which is 90 in this illustration; otherwise, it is set to 1. The binary values obtained through this thresholding operation are then weighted by powers of 2 sequentially in a clockwise fashion. These weighted values are finally summed up to obtain the LBP value of the central pixel. The process is repeated to obtain the LBP values of the central pixels of the other local regions in the whole image.
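The thresholding and weighting steps can be sketched in a few lines of Python; the neighborhood gray levels below are hypothetical (only the center value of 90 follows the illustration in Figure 3):

```python
def lbp_value(center, neighbors):
    """LBP code of a pixel per Equation (1): threshold each neighbor
    against the center gray level, then weight the resulting bits by
    powers of 2 in the order the neighbors are listed (clockwise here)."""
    return sum((1 if g >= center else 0) * 2 ** m
               for m, g in enumerate(neighbors))

# Hypothetical 3x3 neighborhood around a center pixel of gray level 90
print(lbp_value(90, [80, 95, 100, 60, 120, 40, 90, 110]))  # 214
```

Bits equal to 1 appear where a neighbor is at least as bright as the center, exactly as in the thresholding step described above.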

For an M × N image, the local binary patterns LP,R obtained from Equation (1), each taking one of 2^P possible values, are represented as a histogram vector I indexed by 0 ≤ l ≤ (2^P − 1), as given by

I(l) = Σ (x = 1 to M) Σ (y = 1 to N) δ(LP,R(x, y), l), where δ(a, l) = 1 if a = l and 0 otherwise. (2)

The LP,R operator given by Equation (1) generates 2^P different local binary patterns, and these patterns vary with the rotation of the image. To avoid the effect of this rotation, the rotation invariant uniform (riu) LBP operator is used in this study, which is defined by [66]:

LP,R^riu2 = Σ (m = 0 to P − 1) s(gm − gc) if U(LP,R) ≤ 2, and P + 1 otherwise, (3)

where U(LP,R) is a measure of uniformity, which corresponds to the number of bitwise transitions from 0 to 1 or 1 to 0 in the circular pattern LP,R. The superscript 2 on the left side of Equation (3) signifies the utilization of riu patterns having a U value of no more than 2.
All non-uniform LP,R patterns obtained through Equation (1) are grouped as one pattern. As a result, the use of Equation (3) gives a total of P + 2 different riu LBPs [66,69]. In our study, we have considered R = 1, for which P = 8. As a result, the length of the histogram vector (i.e., the length of the feature vector) for each CXR image is only 10.
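Under the stated assumptions (R = 1, P = 8), the riu2 mapping and the resulting 10-bin histogram can be sketched as follows; the bit patterns are illustrative:

```python
def riu2_code(bits):
    """Rotation invariant uniform LBP per Equation (3): if the circular
    bit pattern has at most 2 transitions (U <= 2), the code is the
    number of 1-bits; all non-uniform patterns share the code P + 1."""
    P = len(bits)
    U = sum(bits[m] != bits[(m + 1) % P] for m in range(P))
    return sum(bits) if U <= 2 else P + 1

def riu2_histogram(codes, P=8):
    """Length P + 2 feature vector: one bin per riu2 code 0..P+1."""
    hist = [0] * (P + 2)
    for c in codes:
        hist[c] += 1
    return hist

codes = [riu2_code(b) for b in ([0] * 8, [1, 1, 0, 0, 0, 0, 0, 0], [1, 0] * 4)]
print(codes)              # [0, 2, 9]
print(len(riu2_histogram(codes)))  # 10
```

The alternating pattern has U = 8, so it falls into the single non-uniform bin (code 9), matching the grouping described above.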

Extraction of Features from CXR Images Using CNN
The convolutional neural network (CNN) is a dominant tool widely used for extracting features from images using deep learning algorithms [30-34]. A CNN can effectively extract the spatial and temporal characteristics of an image by utilizing the shared-weight structure of its convolution filters to provide crucial image features. The CNN architecture consists of a large number of convolutional layers, batch normalization layers, rectified linear units (ReLU), and pooling layers [30,39]. The organization of this architecture differs among different pre-trained CNNs. The layered architecture of a general CNN is given in Figure 4.

As shown in Figure 4, the CXR images are applied to the input layer of the CNN.
The image input layer is followed by a repeated, sequential arrangement of convolutional layers, batch normalization layers, rectified linear units (ReLUs), and max pooling layers. In each of the convolutional layers, padding is added to the input feature map to ensure that the output size equals the input size. In the layered arrangement, the convolutional layer is followed by the batch normalization layer to normalize the activations and gradients propagating through the CNN. The ReLU is then used to perform nonlinear activation. Such ReLU layers also help to speed up the training of the CNN and reduce the sensitivity to parameter initialization. Next, the max pooling layer is used, whose function is to perform down-sampling to reduce the size of the feature map and eliminate redundant information. The features of the CXR images are provided by the fully connected layer at the end of the CNN as feature vectors. The CNNs used in this study are listed in Table 2 along with the fully connected feature layer from which feature vectors have been extracted.
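The conv → ReLU → max-pool pipeline described above can be illustrated with a toy, pure-Python sketch; the 4 × 4 image and all-ones kernel are hypothetical, and a real pre-trained CNN stacks many such layers with learned filters:

```python
def conv2d_valid(image, kernel):
    """2-D convolution (valid padding) of a gray image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

def relu(fmap):
    """Nonlinear activation: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in fmap]

def maxpool2(fmap):
    """2x2 max pooling: down-sample by keeping the maximum of each block."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 1, 1]] * 3
features = maxpool2(relu(conv2d_valid(image, kernel)))
print(features)  # [[99]]
```

Each stage shrinks or rectifies the feature map exactly as the layer descriptions above state; the fully connected layer of a real CNN would then flatten such maps into a feature vector.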

Classification of CXR Images
In this study, the classification of CXR images has been performed based on the extracted features of CXR images. For such classification, we have utilized several widely used classifiers, i.e., PRN, SVM, DT, RF, and KNN.

Classification of CXR Images Using PRN Classifier
The pattern recognition network (PRN) used in this study is a feedforward neural network comprising the input layer, a single hidden layer and the output layer. The PRN has been trained first by using known input-output patterns to optimize the weights of interconnection between the neurons in its different layers. The basic architecture of the PRN used is shown in Figure 5.

The output of the ith neuron in the input layer equals the input to the ith neuron of the same layer, i.e.,

Oi = Ii. (4)

The input Ij to the jth neuron in the hidden layer is calculated as the sum of the outputs Oi weighted by Wij, where Wij is the weight connecting the ith neuron in the input layer and the jth neuron in the hidden layer. This Ij is then passed through the activation function; in this work, the hyperbolic tangent sigmoid function is used for the neurons in the hidden layer. Thus, the output of the jth neuron in the hidden layer yields

Oj = tanh(Ij) = tanh(Σi Wij Oi). (5)

Similarly, the input Ik to the kth neuron in the output layer is calculated as the sum of the outputs Oj weighted by Wjk, where Wjk is the weight connecting the jth neuron in the hidden layer and the kth neuron in the output layer. Initially, the output of the kth neuron in the output layer is computed by

Ok = Ik = Σj Wjk Oj. (6)

This Ok is then passed through the softmax activation function. Under this activation, the output of the neuron in the output layer whose value calculated by Equation (6) is the maximum is set to 1, and the outputs of all other neurons are set to 0.
The training of the PRN is accomplished via the backpropagation learning algorithm [70-72]. The feature vectors of the CXR images and their corresponding attributes (i.e., COVID-19, normal, lung opacity, and viral pneumonia) are used as the known input-output pairs. Once the process of training is over, the trained PRN is tested on the feature vectors of unknown CXR images to determine their attributes. In this work, we have analyzed the classification performances of the PRN for six different training algorithms, as listed in Table 3.

Table 3. Training algorithms used to train the PRN classifier.
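The forward pass described above, with a tanh hidden layer and a winner-take-all output rule, can be sketched as follows; the weights are illustrative, not trained values from the paper:

```python
import math

def prn_forward(x, W_ih, W_ho):
    """Forward pass of a PRN: tanh hidden layer, then a winner-take-all
    output layer (the neuron with the largest weighted sum outputs 1,
    all others 0). W_ih holds one weight row per hidden neuron; W_ho
    holds one weight row per output neuron."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(weights, x)))
              for weights in W_ih]
    out = [sum(w * h for w, h in zip(weights, hidden)) for weights in W_ho]
    winner = out.index(max(out))
    return [1 if i == winner else 0 for i in range(len(out))]

# Two inputs, two hidden neurons, two output classes (illustrative weights)
print(prn_forward([1, 0], [[1, 0], [0, 1]], [[0, 1], [2, 0]]))  # [0, 1]
```

Backpropagation training would adjust W_ih and W_ho from the known input-output pairs; only the inference step is sketched here.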

Classification of CXR Images Using SVM Classifier
SVM is a popular supervised learning algorithm widely used for data classification in machine learning [73-75]. The ultimate function of SVM in data classification applications is to create the best decision boundary, called a hyperplane, that facilitates the classification among different classes. The hyperplane performs well for linearly separable data [73,74]. For linearly inseparable data, SVM utilizes a method called the kernel trick, by which the linearly inseparable data are transformed into linearly separable data to be classified by a linear classifier. In this study, the Gaussian radial basis function (RBF) has been employed as the SVM kernel, as it provides better performance in many machine-learning applications [75,76]. Such an RBF is defined by [76]:

κ(x, xc) = exp(−‖x − xc‖² / (2σ²)) (7)

In Equation (7), ‖x − xc‖ is the Euclidean distance of x from the center xc of the function, and the parameter σ controls the smoothness of the function. Since the value of κ decreases as the Euclidean distance increases, Equation (7) can approximate the local characteristics of a nonlinear function close to xc. The RBF kernel nonlinearly projects the two-dimensional original features onto a three-dimensional space. As a result, the linearly inseparable data can be separated by using an appropriate hyperplane. The data grouping approach of SVM via the RBF kernel is depicted in Figure 6.
In SVM, the known input pairs are employed for the optimization of the parameters of the classification model. This optimized model is then applied to classify unknown samples. Since SVM can match the various data groups acquired during the training phase, it can identify images of the same categories.
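A small sketch of the Gaussian RBF kernel of Equation (7); the 2σ² normalization is the common parameterization and is assumed here:

```python
import math

def rbf_kernel(x, x_c, sigma=1.0):
    """Gaussian RBF: decays with the Euclidean distance between a sample
    x and the center x_c; sigma controls the smoothness of the kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_c))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0 at zero distance
print(rbf_kernel([3.0, 4.0], [0.0, 0.0]) <
      rbf_kernel([1.0, 0.0], [0.0, 0.0]))  # True: decays with distance
```

The kernel's local character (near-zero response far from x_c) is what lets the SVM carve nonlinear decision boundaries.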

Classification of CXR Images Using DT Classifier
The DT classifier used in this study is based on the classification and regression tree (CART) algorithm, which employs a binary tree structure [77-80]. In such a CART-based DT classifier, the tree structure consists of nodes linked via branches, as depicted in Figure 7. The topmost node in Figure 7, called the root node, utilizes all samples to be classified based on various features to create sub-groups. A sub-group is further split at a decision node to create more sub-groups to be split by other decision nodes. Alternatively, such splitting results in final nodes (terminal nodes), which represent the labels of the classes. The grouping of all samples in the root node and in the decision nodes is performed based on a predefined criterion. In our study, we have utilized the Gini index criterion to construct the trees. For a group of samples D having c classes, the Gini index is calculated by [79]

Gini(D) = 1 − Σ (i = 1 to c) Pi² (8)

where Pi corresponds to the probability of class i = 1, 2, ..., c in D.
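The Gini computation of Equation (8) can be sketched directly; the class labels below are illustrative:

```python
def gini_index(labels):
    """Gini index of Equation (8): 1 minus the sum of squared class
    probabilities. 0 for a pure node; larger values mean a more mixed
    group of samples."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(p ** 2 for p in probs)

print(gini_index(["COVID-19"] * 4))                              # 0.0 (pure)
print(gini_index(["COVID-19", "normal", "COVID-19", "normal"]))  # 0.5
```

A CART split is chosen to minimize the weighted Gini index of the resulting sub-groups.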

Classification of CXR Images Using RF Classifier
Random forest classifiers utilize multiple decision trees to form a forest-like structure by employing randomly selected subsets of features from the feature set of the samples [78,80]. In this ensemble learning procedure, multiple DT-based classifiers are fitted on different subsets of features. Each tree in the RF classifier provides a class label. The class labels provided by the individual trees go through a voting process, and the label obtained through majority voting is considered the final class label. The working principle of a generalized RF classifier is illustrated in Figure 8.

As shown in Figure 8, the RF classifier randomly picks subsets of features to construct random decision trees T1, T2, ..., TN with corresponding labels Li of class i = 1, 2, ..., c. Then, the process of majority voting is utilized to determine the final class label L of the RF classifier.
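The two RF ingredients described above, random feature subsets per tree and majority voting over the tree labels, can be sketched as follows; the feature names and counts are illustrative:

```python
import random
from collections import Counter

def random_feature_subsets(feature_names, n_trees, subset_size, seed=0):
    """Pick a random feature subset for each tree, as an RF does when
    growing its individual decision trees."""
    rng = random.Random(seed)
    return [rng.sample(feature_names, subset_size) for _ in range(n_trees)]

def rf_vote(tree_labels):
    """Majority vote over the class labels emitted by the individual trees."""
    return Counter(tree_labels).most_common(1)[0][0]

subsets = random_feature_subsets([f"f{i}" for i in range(10)],
                                 n_trees=3, subset_size=4)
print(len(subsets), len(subsets[0]))                  # 3 4
print(rf_vote(["normal", "COVID-19", "COVID-19"]))    # COVID-19
```

Growing the trees themselves would use the CART/Gini procedure of the previous subsection on each feature subset.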

Classification of CXR Images Using KNN Classifier
The KNN is a non-parametric machine learning classifier that utilizes the similarity between the available training samples and the new sample to be classified [80-82]. The class labels for the KNN classifier are determined by calculating the closeness between each test sample and the training samples in an n-dimensional space. The classification of a new test sample using the KNN classifier is illustrated in Figure 9. For the classification of the new test sample, the number (k) of neighbors is fixed first. Then, the Euclidean distance between the new test sample x and each training sample y is determined by Equation (9) to select the k nearest neighbors [82]:

d(x, y) = √(Σ (i = 1 to n) (xi − yi)²) (9)

Next, the number of training samples belonging to each class among these k nearest neighbors is counted. Finally, the label of the new test sample is assigned to the class for which the number of neighbors among these k nearest neighbors is maximum. For k = 3, the new example sample in Figure 9 is categorized as class B by the KNN classifier. In practical applications of KNN classifiers, k is usually selected to be an odd number, which can minimally be k = 1 [82].
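The KNN procedure above can be sketched as follows, mirroring the class-B example for k = 3; the 2-D samples are hypothetical:

```python
import math
from collections import Counter

def knn_classify(new_sample, training_data, k=3):
    """Classify new_sample by majority vote among its k nearest training
    samples, with closeness measured by Euclidean distance (Equation (9)).
    training_data is a list of (feature_vector, class_label) pairs."""
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    neighbors = sorted(training_data, key=lambda s: dist(new_sample, s[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Illustrative 2-D samples: the 3 nearest neighbors of (5, 4) are class B
train = [((0, 0), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_classify((5, 4), train))  # B
```

No model is fitted beforehand; all computation happens at classification time, which is why KNN is called non-parametric.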


Classification of CXR Images Using Ensemble-CNN Based SVM Classifier
The ensemble technique combines diverse models to yield better performance than any of the constituent models. Although CNN-based classifiers offer good classification performance on their own, they can also be utilized in an ensemble configuration to build a robust and highly reliable classification model [83][84][85][86]. In this study, we have explored the performance of CNN-based SVMs in such an ensemble configuration to further improve the classification accuracy of CXR image classification. To do so, we have identified the 3 best CNNs among the nineteen different pre-trained CNNs listed in Table 2 based on their classification performances. The CXR images have been applied directly to each of the 3 CNNs to extract feature vectors. Three SVMs are then utilized separately to classify the CXR images based on the extracted feature vectors. The mode of the SVM-derived CXR image class labels is taken as the ultimate image class, as depicted in Figure 10.

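The mode-voting step of the ensemble can be expressed compactly. The following is an illustrative Python sketch (the study itself was implemented in MATLAB), and the per-image labels below are hypothetical examples:

```python
from collections import Counter

def ensemble_vote(pred_lists):
    """Combine per-image class labels from several classifiers by
    taking the mode (majority) label for each image."""
    final = []
    for labels in zip(*pred_lists):  # labels of one image, one per model
        final.append(Counter(labels).most_common(1)[0][0])
    return final

# Hypothetical predictions of the three CNN-based SVMs on four images
svm1 = ["COVID-19", "normal", "lung opacity", "viral pneumonia"]
svm2 = ["COVID-19", "normal", "normal",       "viral pneumonia"]
svm3 = ["normal",   "normal", "lung opacity", "COVID-19"]
print(ensemble_vote([svm1, svm2, svm3]))
# prints: ['COVID-19', 'normal', 'lung opacity', 'viral pneumonia']
```

With three voters, a strict majority exists whenever at least two SVMs agree, which is why an odd number of constituent models is convenient for this configuration.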

Figure 10. Classification of CXR images using ensemble-CNN-based SVM classifier.

Performance Evaluation of Classifiers
In this study, we have adopted five-fold cross-validation to generalize the performance of the classifiers. Each of the five folds contains a different combination of 4288 (i.e., 1072 from each class) CXR images for training and 1072 (i.e., 268 from each class) CXR images for testing the classifier, as listed in Table 1. The processes of data splitting for five-fold cross-validation and performance evaluation of the classifiers are illustrated in Figure 11. To evaluate the performance of the classifiers, we first computed the per-class performance scores in a particular fold. For this, the four confusion matrix parameters TP (true positive), TN (true negative), FP (false positive), and FN (false negative) were estimated to calculate four performance scores, namely accuracy, precision, recall, and specificity, by applying Equations (10)-(13) for each of the four image classes (COVID-19, normal, lung opacity, and viral pneumonia) in that fold.
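The derivation of TP, TN, FP, and FN from a multi-class confusion matrix, and the scores of Equations (10)-(13), can be sketched as follows. This is an illustrative Python sketch; the confusion-matrix counts below are hypothetical (each class has 268 test images, matching the fold size, but the individual entries are invented):

```python
import numpy as np

def per_class_scores(cm):
    """Derive TP, TN, FP, FN and the four scores of Equations (10)-(13)
    for every class from a square confusion matrix cm[true, predicted]."""
    scores = {}
    total = cm.sum()
    for c in range(cm.shape[0]):
        tp = cm[c, c]                 # correctly predicted as class c
        fn = cm[c, :].sum() - tp      # class c predicted as something else
        fp = cm[:, c].sum() - tp      # other classes predicted as class c
        tn = total - tp - fn - fp     # everything else
        scores[c] = {
            "accuracy":    (tp + tn) / total,   # Eq. (10)
            "precision":   tp / (tp + fp),      # Eq. (11)
            "recall":      tp / (tp + fn),      # Eq. (12)
            "specificity": tn / (tn + fp),      # Eq. (13)
        }
    return scores

# Hypothetical 4x4 confusion matrix (268 test images per class)
cm = np.array([[260,   4,   3,   1],
               [  5, 255,   6,   2],
               [  2,   7, 256,   3],
               [  1,   2,   4, 261]])
scores = per_class_scores(cm)
```

For class 0 in this example, TP = 260, FN = 8, FP = 8, and TN = 796, from which the four per-class scores follow directly.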
The per-fold performance score is the average of the four per-class scores, and the ultimate performance score is the average over the five folds. In addition to the four performance metrics (i.e., accuracy, precision, recall, and specificity), the ultimate performances of the classifiers have also been assessed in terms of F1 score as given by Equation (14), in which the precision and recall used are the ultimate performance scores of the classifier.
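Equation (14) can be checked numerically: plugging in the ultimate precision (77.72%) and recall (79.80%) reported later for the LBP-based SVM reproduces its 78.75% F1 score. The sketch below is illustrative Python:

```python
def f1(precision, recall):
    """Equation (14): harmonic mean of the ultimate precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Ultimate precision and recall reported for the LBP-based SVM
ultimate_precision = 0.7772
ultimate_recall = 0.7980
print(round(f1(ultimate_precision, ultimate_recall), 4))  # prints: 0.7875
```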
In this study, we have extracted the image features of the training images offline. The classifiers have also been trained on the extracted features offline during the training phase. However, the feature extraction and classification of the different trained classifiers in the testing phase are performed online to facilitate a comparative analysis of their runtimes. The whole system was implemented in MATLAB R2021a on a workstation with an Intel® Core™ i7-11700 @ 2.50 GHz (8 cores, 16 logical processors), an HP Ex900 M.2 500 GB PCIe NVMe internal SSD, a Gigabyte GeForce RTX 2060 OC 6 GB graphics card, and 16 GB of DDR4 RAM.

Results and Discussion
In our demonstration of four-class (i.e., COVID-19, normal, lung opacity, and viral pneumonia) classification, we have extracted the feature vectors of CXR images by using the LBP operator as well as pre-trained CNNs. Then, the classification has been performed by PRN, SVM, DT, RF, and KNN classifiers based on those extracted feature vectors. We have also evaluated the classification performances of LBP-based PRN, LBP-based SVM, LBP-based DT, LBP-based RF, LBP-based KNN, CNN-based SVM, and ensemble-CNNbased SVM by adopting the process of five-fold cross-validation as depicted in Figure 11. For instance, the confusion matrix for the testing images in a particular fold (i.e., Fold 1) is shown in Figure 12 for LBP-based SVM.
The confusion matrix in Figure 12 corroborates LBP-based classifiers' success and is used to compute the performances of LBP-based SVM for each of the four different classes in Fold 1 by using Equations (10)-(13) as shown in Table 4.
It is observed in Table 4 that per-class classification accuracy for the test CXR images of each of the four different classes is around 90%. Now, we have averaged these per-class accuracies obtained for this fold to compute fold accuracy. For this Fold 1, LBP-based SVM yields 90.81% accuracy. We have also calculated the fold precision (81.62%), fold recall (81.68%), and fold specificity (93.90%) by averaging per-class precision, recall, and specificity, respectively, for Fold 1. In a similar fashion, we have calculated the per-fold accuracy, precision, recall, and specificity for the other four folds in the case of LBP-based SVM. The results for each of the five folds are listed in Table 5.
The overall performance of this classification model is computed by averaging the five per-fold performances listed in Table 5. Consequently, the overall accuracy, precision, recall, and specificity of the LBP-based SVM have been found to be 88.86%, 77.72%, 79.80%, and 92.58%, respectively. The overall F1 score for LBP-based SVM classification turned out to be 78.75% following Equation (14).
Next, we have applied a pattern recognition network (PRN) to classify four classes of CXR images utilizing the image feature vectors obtained from the LBP operator. In this classification process, we have also employed five-fold cross-validation to generalize the overall performance of LBP-based PRN. In this study, we have also explored the effects of applying six different training algorithms to train the PRN as listed in Table 3. The overall classification performances of LBP-based PRN for adopting each of the six training algorithms are shown in Figure 13.
The performances of LBP-based PRN vary with the training algorithm used to train the PRN, as seen in Figure 13. It is evident that the gradient descent ("traingd") and gradient descent with momentum ("traingdm") training algorithms failed to perform well when used with LBP-based PRN. It can also be observed in Figure 13 that the performances of LBP-based PRN are comparable when the PRN is trained with the variable learning rate gradient descent ("traingdx"), Levenberg-Marquardt ("trainlm"), resilient backpropagation ("trainrp"), and scaled conjugate gradient ("trainscg") learning algorithms. However, training the LBP-based PRN with the Levenberg-Marquardt algorithm provides the best performances, with accuracy, precision, recall, specificity, and F1 score of 88.61%, 77.28%, 79.60%, 92.44%, and 78.42%, respectively. It is worth mentioning that all six algorithms mentioned in Figure 13 have very similar runtimes in the testing phase.
In a similar fashion, we have determined the classification performances of LBP-based DT, LBP-based RF, and LBP-based KNN. For instance, the overall accuracies provided by LBP-based DT, LBP-based RF, and LBP-based KNN are computed to be 83.77%, 87.43%, and 84.58%, respectively. The SVM yields the best accuracy in all cases of LBP-based machine learning classifiers used in this study. Consequently, we have only considered the SVM classifier for classifying CXR images in the next stage.
In this stage, we have extracted feature vectors from the CXR images by applying a deep learning algorithm where features are taken from the fully connected layer at the end of the CNN. To accomplish this feature extraction using CNN, we have employed a total of nineteen different pre-trained CNNs as listed in Table 2. After extracting the feature vectors from the CXR images with CNNs, we have applied the SVM classifier in this stage for the four-class classification of CXR images as described previously. In this stage, we have also applied five-fold cross-validation to generalize the performance of CNN-based SVM classifiers. The overall classification performances of CNN-based SVM for adopting each of the nineteen different pre-trained CNNs along with the LBP-based different machine learning classifiers are shown in Figure 14.
It is easily observed in Figure 14 that each of the nineteen CNN-based SVM classifiers outperforms the different LBP-based classifiers (i.e., DT, KNN, PRN, SVM, and RF). This is due to the fact that CNNs utilize a deep learning algorithm that is extremely powerful for extracting feature vectors from the CXR images [60][61][62][63].
For instance, the lowest accuracy in Figure 14 provided by the pre-trained CNN model of the NasNet-Mobile-based SVM is 92.74%, which is even better than that of LBP-based DT (83.77%), LBP-based KNN (84.58%), LBP-based PRN (88.61%), LBP-based SVM (88.86%), and LBP-based RF (87.43%) classifiers. However, the classification performances of SVM utilizing image feature vectors extracted with EfficientNet-b0 (model 13 in Figure 14) are the best among the nineteen pre-trained CNN architectures as can be seen from Figure 14. Such "EfficientNet-b0" pre-trained CNN-based SVM can achieve overall accuracy, precision, recall, specificity, and F1 score of 96.39%, 92.86%, 93.04%, 97.59%, and 92.95%, respectively.
To further improve the classification performance of CNN-based SVM, we have finally utilized the ensemble-CNN-based SVM as described in Section 2.3.6. To effectively utilize such an ensemble configuration, we have selected the best three pre-trained CNNs (i.e., EfficientNet-b0, DenseNet-201, and DarkNet-53) among the nineteen different CNN architectures used in this study based on their classification metrics. The classification performances of the ensemble-CNN-based SVM have also been plotted in Figure 14 (model 25) for the purpose of comparison. The topmost classification performances provided by the different feature-extraction-based classifiers are listed in Table 6. It is seen in Table 6 that the classification performances attained using the ensemble-CNN-based SVM are the highest among the 25 different classifiers adopted in this study. For instance, the ensemble-CNN-based SVM improves the classification accuracy by ~1% as compared to the best single CNN-based SVM (i.e., EfficientNet-b0-based SVM). These overall performances of the EfficientNet-b0-based SVM and ensemble-CNN-based SVM are more promising than some recently published results, as listed in Table 7. As observed in Table 7, the performances attained by applying the EfficientNet-b0-based SVM classifier are much better than those achieved in Ref. [60] for four-class classification using 1251 CXR images. The overall accuracy of the EfficientNet-b0-based SVM classifier is also comparable to that achieved in Ref. [64]. It is to be noted that the dataset used in Ref. [64] is imbalanced, as there is a big difference in the number of images in each of the four classes (with only 68 CXR images in the COVID-19 class).
However, the ensemble-CNN-based SVM classifier used in this study for four-class classification provides much-improved classification performances as compared to the other methods listed in Table 7, with overall accuracy, precision, recall, specificity, and F1 score of 97.41%, 94.91%, 94.81%, 98.27%, and 94.86%, respectively. To the best of our knowledge, these classification performances rank the best among all other reported values for four-class classification of COVID-19, normal, lung opacity, and viral pneumonia CXR images in the existing literature. Now we focus on a comparative analysis of the relative runtimes of the different classifiers that use different feature extraction algorithms, as shown in Figure 14. It is evident that LBP-based DT requires the lowest runtime. Thus, the relative runtime of a particular technique is normalized with respect to the runtime taken by LBP-based DT. The relative runtimes of the different LBP-based machine learning classifiers are nearly uniform, with LBP-based RF being the slowest. However, the relative runtimes of CNN-based SVMs vary in accordance with the depth of the layered architecture of the pre-trained CNNs. Among them, the SqueezeNet-based SVM yields the lowest runtime while the NasNet-Large-based SVM requires the highest relative runtime, as can be seen in Figure 14. However, the relative runtime of the single CNN (i.e., EfficientNet-b0)-based SVM, which provides the best classification performances, is moderately low as compared to the other CNN architectures used in this study. To be specific, the EfficientNet-b0-based SVM is 4.63 times slower than LBP-based DT. However, such an EfficientNet-b0-based SVM provides significantly improved classification performances compared to those of the LBP-based machine learning classifiers, as shown in Figure 14.
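The relative-runtime normalization described above is straightforward to express. In this illustrative Python sketch, the measured runtimes are hypothetical, chosen only so that the ratios match the factors reported in the paper (4.63 and 40.68):

```python
def relative_runtimes(runtimes):
    """Normalize each classifier's measured runtime by the fastest one
    (LBP-based DT in this study), giving relative runtimes as in Figure 14."""
    fastest = min(runtimes.values())
    return {name: t / fastest for name, t in runtimes.items()}

# Hypothetical measured runtimes in seconds (illustrative only)
measured = {"LBP-DT": 0.50, "EfficientNet-b0-SVM": 2.315, "ensemble-CNN-SVM": 20.34}
print(relative_runtimes(measured))
```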
It is also observed in Figure 14 and Table 6 that the ensemble-CNN-based SVM provides the highest classification performance (e.g., 97.41% accuracy) among all the classifiers used in this study. However, to achieve such high performance, this classifier requires a relative runtime of 40.68 (i.e., 40.68 times higher than LBP-based DT), which is ~8.72 times larger than that of the EfficientNet-b0-based SVM.

Conclusions
This paper presents a rigorous study on the identification of COVID-19 infection from CXR images based on machine learning approaches. The feature vectors of CXR images have been extracted successfully by utilizing the LBP operator and pre-trained CNNs of nineteen different architectures. Then, PRN, SVM, DT, RF, and KNN classifiers have been applied to classify four-class CXR images comprising COVID-19, normal, lung opacity, and viral pneumonia by utilizing the extracted feature vectors of the CXR images. The performances of LBP-based PRN, LBP-based SVM, LBP-based DT, LBP-based RF, LBP-based KNN, CNN-based SVM, and ensemble-CNN-based SVM classifiers have been investigated in detail on the four-class test images and analyzed in terms of accuracy, precision, recall, specificity, F1 score, and relative runtime. The effects of the six different learning algorithms used to train the LBP-based PRN are analyzed in detail, and the results indicate that the Levenberg-Marquardt learning algorithm provides the best classification performance for the LBP-based PRN in this study. The results also show that the classification performances of the LBP-based classifiers are not up to the mark and are significantly lower than those of the CNN-based SVM. Among the nineteen different single pre-trained CNN-based SVM classifiers, the one using the EfficientNet-b0 CNN architecture performs best in our study. This architecture achieves overall classification performances of 96.39% accuracy, 92.86% precision, 93.04% recall, 97.59% specificity, and 92.95% F1 score with a moderately low relative runtime. To further improve the classification performance of the CNN-based SVM, we have also utilized the ensemble-CNN-based SVM.
Such an ensemble configuration, consisting of three pre-trained CNNs (i.e., EfficientNet-b0, DenseNet-201, and DarkNet-53), has provided improved classification performances with 97.41% accuracy, 94.91% precision, 94.81% recall, 98.27% specificity, and 94.86% F1 score, but required the highest runtime to classify CXR images. We believe that the strategy suggested in this paper will provide doctors and physicians with a complementary tool for the diagnosis and prognosis of COVID-19-infected patients. Moreover, the proposed framework can be integrated into a decision support system that diagnoses COVID-19 based on CXR images, thus considerably minimizing both human and machine error.

Data Availability Statement:
The dataset used in this study is available at the following link: https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database (accessed on 12 May 2022).