A Self-Activated CNN Approach for Multi-Class Chest-Related COVID-19 Detection

Abstract: Chest diseases can be dangerous and deadly. They include many chest infections such as pneumonia, asthma, edema, and, lately, COVID-19. COVID-19 shares many symptoms with pneumonia, such as difficulty breathing and chest burden, which makes it challenging to differentiate COVID-19 from other chest diseases. Several related studies proposed computer-aided systems for single-class COVID-19 detection, which may be misleading because other chest diseases present similar symptoms. This paper proposes a framework for the detection of 15 types of chest diseases, including COVID-19, from chest X-ray images. Classification is performed in two ways in the proposed framework. First, a deep learning-based convolutional neural network (CNN) architecture with a soft-max classifier is proposed. Second, transfer learning is applied using the fully-connected layer of the proposed CNN to extract deep features, which are fed to classical machine learning (ML) classification methods. The proposed framework improves the accuracy of COVID-19 detection and increases the predictability rates for other chest diseases. The experimental results show that the proposed framework, when compared to other state-of-the-art models for diagnosing COVID-19 and other chest diseases, is more robust, and the results are promising.


Introduction
Society is increasingly confronted with new and unique diseases. In this industrial era, different types of pollution have given rise to different diseases, including pulmonological diseases. The recent rise of the deadly coronavirus caused a global pandemic. COVID-19 manifests as pneumonia or chest infection. Different chest diseases are experienced; however, to discover the exact problem or disease in the chest, multiple tests or procedures must be performed. Separate tests and procedures are usually done at one time to identify different chest problems, because many chest diseases cannot be detected with a single test or procedure. X-ray images and convolutional neural networks (CNNs) are used to diagnose different diseases. For example, detecting the coronavirus requires an antibody test, a swab test, and other tests, but determining whether the level of chest infection is due to the same virus requires X-rays in conjunction with other tests for different diseases.

• The proposed study considers multimodal chest diseases, including COVID-19.
• The improved accuracy of the Self-CNN, combined with a multi-validation method, enhances the robustness of the proposed framework.
This paper shows the feasibility of detecting abnormalities in X-ray images using DL approaches that depend on non-clinical methods. Deep CNNs learn more significant image representations.
The custom CNN design comprises five types of layers: a convolutional layer, a pooling layer, an activation layer, a fully connected layer, and a soft-max activation function that outputs the likelihood for each type of chest disease. Furthermore, transfer learning is applied to the proposed DL model, which increases the accuracy of chest disease detection using ML classifiers.
The rest of the article is divided into four sections. Section 2 describes previous state-of-the-art work related to chest diseases. Section 3 discusses the proposed system, including the working methodology. Section 4 presents the experimental setup and results. The study is concluded in Section 5.

Background and Literature Review
The purpose of this study is to provide a more efficient and robust solution to the current problem of chest disease detection. An overview of previously proposed studies indicates solutions proposed by other authors. Many automatic techniques are used for solving different problems. AI systems are becoming the core of many real-time and data science-related solutions, including those for COVID-19 [19,20]. Machine learning (ML) methods can be found in science, technology, health care, manufacturing, education, policing, and marketing [21]. DL is a sub-type of ML that works on deep neural network data representations in a supervised, semi-supervised, or unsupervised way [22,23]. DL techniques are used for the execution of different medical imaging-related tasks [24,25]. In radiology, deep learning can improve the precision of X-ray image segmentation and disease diagnostics. DL methods combined with transfer learning from pre-trained models such as ResNet18, InceptionV3, Xception, MobileNetV3, and DenseNet121 can be used for chest X-ray image segmentation, detection, recognition, and classification tasks [26].
In chest imaging, there has been an effort to create and apply computer-aided detection (CAD) frameworks for the identification of lung lesions on chest radiographs [27,28] and chest tomography [29]. A scan of a patient's lungs using CT/X-ray is used to identify COVID-19, which can cause significant chest abnormalities that require a multimodal chest detection system for chest disease diagnosis.
CNN-based approaches have acquired prominence because of their capacity to learn mid-level and high-level image representations. In [30], the authors show the possibility of recognizing different chest pathologies in chest X-rays using convolutional and DL methods. Different CNNs are introduced for the analysis of chest abnormalities. Backpropagation neural networks are used with the supervised learning backend phenomenon, and competitive neural networks with a backend of unsupervised learning have been built for identifying chest infections.
There are a couple of works focused on multi-class pathological X-ray images, where COVID-19 also needs to be included [31]. Abnormal CT signs such as those of COVID-19 patients at our medical centers need to be analyzed. Recognition of these features by medical specialists must be reinforced by such signs, which will help them make quick and precise decisions [32]. Remarkably, only 56% of early patients of COVID-19 had a typical CT-X-ray test. However, some time after the onset of symptoms, CT findings were more frequent, including consolidation, bilateral and peripheral disease, greater total lung involvement, severe opacities, "crazy-paving" patterns, and "reverse halo" signs. Bilateral lung involvement was found in 28% of early patients, 76% of intermediate patients, and 88% of late patients, with infected lungs diagnosed at different stages [33].
The aim of one study was to examine chest CT images of confirmed COVID-19 patients and to assess their relationship with clinical findings. This study considered 80 patients with COVID-19 diagnoses from January to February 2020. The chest CT images and other diagnostic information were reported, and the relation between them was examined [34]. Another author presented chest CT findings from five patients with COVID-19 infection who had initially negative reverse transcription polymerase chain reaction (RT-PCR) reports. Each of the five patients had typical imaging findings, including ground-glass opacity, a blended type, and a mixed combination of chest abnormalities [35]. The role of CT images as an aid to or substitute for RT-PCR in diagnosing COVID-19 and pneumonia has been a subject of debate [36]. There are many diseases of the chest, and machine learning can be used to identify them. One way to identify asthma endotypes is to use advances in data-driven methods, with the assumption that patterns of symptoms or potential biomarkers were assessed either longitudinally (e.g., in birth cohorts) or cross-sectionally (e.g., in studies of patients with asthma) [37]. In another paper, the aim was to improve the medical use and to extend the precision level of the forced oscillation technique (FOT) for identifying asthma. The researchers used various methods such as k-nearest neighbor (KNN), AdaBoost, random forest (RF), and the feature-based dissimilarity space classifier [38].
The paper in [39] characterized records of patients with chronic obstructive pulmonary disease (COPD) who were hospitalized for intensive care, in order to assess the expense of such hospitalizations. For high-risk patient subgroups, other distinguishing factors were possibly connected to a risk of rehospitalization. An AI model was used to consider the variables related to the risk of rehospitalization using decision tree analysis. A direct cost analysis was also performed from the point of view of public medical insurance.
A systematic review of articles that use AI techniques to distinguish clinically significant COPD phenotypes was performed in [40]. Lately, the growing use of AI algorithms, cluster analysis in particular, has shown the potential to refine this grouping by combining other explanatory attributes, comorbidities, genomic data, and biomarkers. This combination will permit scientists to more dependably recognize new COPD phenotypes, to better describe existing ones, and to improve diagnosis and create novel treatments.
The aim of [41] was to find potential relationships involving cellular breakdown in the lungs and thereby help clinicians, and consequently patients, to identify it using routine tests. Random forest was adopted to build an identification model between routine blood records and cellular breakdown in the lungs to decide whether they were closely connected. A few recent studies have also used AI-driven approaches based on regression [42] and classification methods to detect COVID-19 and other lung infections using time-series, pathological, CT, and X-ray data [43][44][45][46][47]. Some of them are discussed here. The time-series data used in many studies [48][49][50] are converted into normalized form using a regression method, after which a multi-layer perceptron is used for training [51]; this approach is also used in many other studies [52,53]. Similarly, regression on the Indian COVID-19 pandemic is performed on Kaggle data [54]; US data are used in another study [55], as are data from Mexico [56] and Indonesia [57]. Some studies provide health measures for effective prevention of COVID-19 [58]. Moreover, the system behavior after governments applied COVID-19 policies has been analyzed [59]. Evolutionary algorithms have also been used in various studies to estimate COVID-19 [60], and optimization algorithms have been used to tune the proposed methods of COVID-19 identification [61]. Table 1 shows a summary of previous work.

Methodology
In this study, a framework is proposed for the detection of chest diseases, including COVID-19. First, we train our proposed 32-layer CNN and classify the chest diseases using soft-max activations. After that, transfer learning is applied to the fully-connected layer of the trained CNN to extract deep features, which are fed to ML classification methods. We then perform 10-fold and 5-fold validation on the seven best-performing machine learning classifiers. In its first stage, the proposed framework takes an image as input and applies preprocessing to normalize the data. The primary steps of the proposed framework are shown in Figure 1.

Dataset
There are many datasets used for the identification of chest diseases, but we selected these two datasets because our proposed system identifies chest diseases and COVID-19 issues. We selected the NIH and Open COVID-19 X-ray datasets. The datasets of X-ray modality for both chest and COVID-19 diseases have been merged to obtain single-unit data. Due to the size difference, the data were resized in a preprocessing step to normalize it.

NIH Chest X-Ray Dataset
This study uses the publicly released, extended version of the dataset (with six more disease categories), which contains a much larger number of frontal chest X-ray images. Building clinically relevant CAD systems for detection and diagnosis across all chest X-ray settings at clinical sites remains difficult; however, it becomes feasible when large numbers of images are available for a study. The dataset is extracted from the clinical PACS database of the National Institutes of Health (NIH) Clinical Center and comprises 60% of all frontal chest X-rays in the hospital. Data for 14 different chest diseases are used in the proposed study.

COVID-19 Chest X-Ray Image Dataset
Chest X-rays are a significant part of the analysis of COVID-19 infection, as annotated image datasets are available. Multi-class chest disease identification that includes COVID-19 is needed. Therefore, the data from NIH and COVID-19, available at Kaggle, were collected and normalized. The number of images for each class is kept equal to avoid overfitting and bias. The normalized images with their selected number and dimension are shown in Table 2. As shown in Table 2, the size of the images in the dataset is 1024 × 1024, the file format is Portable Network Graphics (PNG), and there are 200 images per class.

Data Preprocessing
Preprocessing is a significant step in the data mining process. The expression "garbage in, garbage out" is especially applicable to data mining and machine learning projects. Data gathering methods are often loosely controlled, resulting in out-of-range values, impossible data combinations, missing values, etc. Data preprocessing prepares raw data and makes it suitable for a machine learning model; it is the first and crucial step in making a model robust. Data cleaning and normalization are techniques used to eliminate anomalies and normalize the data so that it takes a form that can be readily used to build a model. Normalization is a data design technique that decreases data redundancy and eliminates undesirable characteristics such as insertion, deletion, and update anomalies. The described normalization rules separate larger tables into smaller tables and connect them using relationships. Mathematically, the normalization equation is represented as given in Equation (1):
$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{1}$$
where x is the individual input value, x_min and x_max are the minimum and maximum values of that particular feature, and x_norm is the normalized output value.
However, images consist of pixels that are taken as intensity values, so normalization of images differs somewhat from normalization of numeric tables. For this purpose, different interpolation methods are used for matrix normalization, so that as little information as possible is lost when the input image dimensions are increased or decreased. The bilinear interpolation method has been used to preprocess the images. In image processing tasks, the neighboring pixels are used in most cases to compute the value of the pixel of interest. Similarly, in the bilinear method, a 2 × 2 neighborhood is used to obtain a weighted average.
The horizontal and vertical interpolations are performed using the corner points of the given input image as reference points for further operations. Assume the four corner points C_(1,1), C_(2,1), C_(1,2), and C_(2,2), as given in Equations (2)-(5).
The corner points C_(1,1) and C_(2,1) are given in Equations (2) and (3), where α is the x-coordinate of the interpolated point, lying between its neighbors α_1 and α_2, and β is its y-coordinate, lying between its neighbors β_1 and β_2 in the 2 × 2 neighborhood. Q1 and Q2 are the quadrants of the four points of the image.
Equations (4) and (5) give the corner points at the bottom of the given digital lattice. These points are shown in Figure 2, together with their corresponding 2 × 2 neighbors used for the final point calculation, for a better understanding and interpretation of the equations of the bilinear interpolation method.
After the two remaining corner points C_(1,2) and C_(2,2) are determined in the same way as in Equations (2) and (3), the final point P_final is calculated as the summation of all four terms, as given in Equation (6).
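The two preprocessing operations described in this section can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the image is a list of rows of intensity values, and the output grid is mapped onto the input grid so that the corners coincide (one of several common conventions).

```python
def min_max_normalize(values):
    """Scale numbers to [0, 1] with min-max normalization, as in Equation (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no range; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bilinear_resize(image, out_h, out_w):
    """Resize a 2D grayscale image using a weighted average of 2 x 2 neighbors."""
    in_h, in_w = len(image), len(image[0])
    # ASSUMPTION: corner-aligned mapping between output and input grids.
    scale_y = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    scale_x = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * scale_y
        y1 = int(y)
        y2 = min(y1 + 1, in_h - 1)
        dy = y - y1
        row = []
        for j in range(out_w):
            x = j * scale_x
            x1 = int(x)
            x2 = min(x1 + 1, in_w - 1)
            dx = x - x1
            # Interpolate horizontally on the top and bottom rows, then vertically.
            top = (1 - dx) * image[y1][x1] + dx * image[y1][x2]
            bottom = (1 - dx) * image[y2][x1] + dx * image[y2][x2]
            row.append((1 - dy) * top + dy * bottom)
        out.append(row)
    return out


small = [[0, 10], [20, 30]]
resized = bilinear_resize(small, 3, 3)
normalized = min_max_normalize([0, 128, 255])
```

In this toy case the center of the 3 × 3 result is the average of the four corner intensities, which is the weighted-average behavior the text describes.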

Classification
The proposed study uses two methods of classification for multi-chest disease detection: (1) deep learning and (2) machine learning. For deep learning, the study uses a proposed CNN architecture. For machine learning-based classification, transfer learning is applied to the fully-connected layer of the proposed CNN, which returns deep features that are then fed as input data to machine learning classifiers.

Deep Learning Based Classification
There are many ways of doing classification and other tasks using DL methods such as multi-layer perceptrons, autoencoders, etc., but the most commonly used DL method for image-based classification is the CNN, which uses the convolution operation in its layers. Accordingly, we propose a CNN architecture that is discussed in detail below.

Proposed CNN
CNN stands for Convolutional Neural Network, a specific neural network for handling information that has a 2D input shape. CNNs are ordinarily used for image detection and classification. In this stage, we train our CNN. The proposed CNN is based on a 32-layer architecture. The size of the input images is set to 1024 × 1024. The images are fed to convolutional blocks, each of which is a sequence of 4 layers (convolution, batch normalization, ReLU, and max-pooling) with different parameters, as explained in Table 3. For the first convolutional block, we use a 3 × 3 window and convolve the image with the kernels; the number of filters is set to 16, and the number of filters increases in the subsequent convolutional layers. The layer-by-layer visualization of weights is shown in Figure 3.
After the convolution layer, we apply batch normalization. Batch normalization is a technique for training deep neural networks that normalizes the inputs to a layer for every mini-batch. This stabilizes the learning process and reduces data variation. The following subsections describe how the individual CNN layers work.
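Before turning to the individual layers, the block structure can be traced with a short sketch. It uses only figures stated in the paper (1024 × 1024 input, 16 filters in the first block, seven blocks of four layers, each block halving the spatial size). Two points are assumptions made purely for illustration: that the filter count doubles from block to block (the text only says it increases), and that the 32 layers are counted as one input layer, 28 block layers, and three final layers.

```python
def trace_conv_blocks(input_size=1024, num_blocks=7, first_filters=16):
    """Return (block, height, width, filters) after each convolutional block."""
    shapes = []
    size, filters = input_size, first_filters
    for block in range(1, num_blocks + 1):
        size //= 2  # each block downsamples height and width by half
        shapes.append((block, size, size, filters))
        filters *= 2  # ASSUMPTION: the filter count doubles per block
    return shapes


for block, height, width, filters in trace_conv_blocks():
    print(f"block {block}: {height} x {width} x {filters}")

# ASSUMPTION: input + 7 blocks * 4 layers + fully-connected + softmax + output.
total_layers = 1 + 7 * 4 + 3
```

Under these assumptions the spatial size shrinks from 1024 to 8 over the seven blocks; the actual filter counts and layer list should be read from Table 3.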

The Input Layers
The input layer defines the shape of the tensor that is passed on to the subsequent layers. Its arguments specify the size of the data: the width, the height, and the number of channels. For the proposed CNN, the input size is 1024 × 1024 × 1, i.e., a 2D image with 1024 rows and 1024 columns, where 1 is the number of color channels.

Convolutional Layer
The convolution layer contains at least one convolutional operation. For the first convolutional layer, the kernel size is 3 × 3 with same padding, so the output tensor and the input tensor have the same width and height; zeros are added to the rows and columns to ensure the same size. Convolutional blocks indicate how many times the image is passed through the 4-layer sequence in the proposed study. Without padding, an N × N input convolved with an m × m kernel gives an output of size (N − m + 1) × (N − m + 1). The output of the l-th convolution layer, denoted Y_i^(l) as in [30], consists of feature maps. It is computed as shown in Equation (7):
$$Y_i^{(l)} = B_i^{(l)} + \sum_{j} K_{i,j}^{(l)} * Y_j^{(l-1)} \tag{7}$$
where B_i^(l) is the bias matrix and K_{i,j}^(l) is the convolution filter or kernel of size a × a that connects the j-th feature map in layer (l − 1) with the i-th feature map in layer l. The input to the first convolutional layer is the input image. Our first convolutional block uses 3 × 3 kernels, and the number of filters used is 16.
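The size relations above can be checked with a small sketch. The helper names are hypothetical, and the convolution is written as the sliding-window sum of products (cross-correlation) that most deep learning libraries compute under the name convolution; this is an illustration of the operation, not the code used in the study.

```python
def conv_output_size(n, m, padding=0, stride=1):
    """Output width/height for an n x n input and an m x m kernel."""
    return (n + 2 * padding - m) // stride + 1


def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image without padding and sum the products."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(n_rows - k_rows + 1):
        row = []
        for c in range(n_cols - k_cols + 1):
            total = bias
            for i in range(k_rows):
                for j in range(k_cols):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        out.append(row)
    return out


no_padding = conv_output_size(1024, 3)               # (N - m + 1)
same_padding = conv_output_size(1024, 3, padding=1)  # size preserved
feature_map = conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 1], [1, 1]])
```

With a 3 × 3 kernel, one ring of zero padding is what keeps a 1024 × 1024 input at 1024 × 1024, whereas the unpadded formula gives 1022 × 1022.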

Batch Normalization Layer
Batch normalization layers are used between the convolution and the ReLU layers to normalize the inputs x_i by estimating the mean µ_B and the variance σ_B^2 over a mini-batch, which accelerates CNN training and reduces the sensitivity to network initialization. The normalized activations are computed as in Equation (8) [68]:
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{8}$$
where x̂_i is the normalized output for input instance x_i, i is the index of the corresponding instance in the mini-batch, and ε is a small constant added for numerical stability. After batch normalization, the ReLU activation function is applied, after which max-pooling is applied.
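As a minimal sketch of this normalization step, the function below standardizes one mini-batch of scalar activations using the batch mean and variance. The scale and shift parameters `gamma` and `beta` and the value of `eps` are assumptions following the usual formulation of batch normalization; the paper does not state them.

```python
import math


def batch_norm(batch, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize a mini-batch of scalars to roughly zero mean and unit variance."""
    n = len(batch)
    mean = sum(batch) / n
    variance = sum((x - mean) ** 2 for x in batch) / n
    # eps keeps the division stable when the batch variance is close to zero.
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta for x in batch]


normalized_batch = batch_norm([1.0, 2.0, 3.0, 4.0])
```

In a convolutional network the same computation is applied per channel, with the statistics taken over all spatial positions and all images of the mini-batch.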

Max Pooling Layer
Max pooling is the next step. Pooling reduces the spatial dimensions of the data. In this stage, the max-pooling2D module with a size of 3 × 3 and a stride of 2 is used. For a pooling layer, one can specify only the filter/kernel size (F) and the stride (S).
The pooling layer has no learnable parameters, but it has two hyperparameters: the filter size (F) and the stride (S). In general, if we have input dimensions of W_1 × H_1 × D_1, then [69]
$$W_2 = \frac{W_1 - F}{S} + 1, \qquad H_2 = \frac{H_1 - F}{S} + 1, \qquad D_2 = D_1$$
where W_1 is the width of the input over which the pooling window is moved and W_2 is the width of the result. The number of filters in the proposed CNN changes in each block in the convolutional layers, whereas the max-pooling filter window size remains the same at 2 × 2 with a stride of 1. The image is then downsampled to max-pooled data that are further processed by the subsequent layers.
In Equation (11), the expression for H_2, F is the spatial extent of the filter and H_1 is the height of the given image, measured along its columns. The output height is obtained by subtracting the spatial extent F from the input height, dividing the result by the stride S, and adding 1.
If the volume of an input image is W_1 × H_1 × D_1, then an output of size W_2 × H_2 × D_2 is produced by a pooling layer, where W_2, H_2, and D_2 are the width, height, and depth of the output, respectively, as given by the equations above.
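The pooling size relation and the max-pooling operation itself can be illustrated as follows. The function names are hypothetical and no padding is assumed; the three size calculations simply evaluate the formula for window and stride settings mentioned in the text, plus the 2 × 2, stride-2 setting that halves the input exactly.

```python
def pool_output_size(w1, f, s):
    """Output width (or height) of a pooling layer without padding."""
    return (w1 - f) // s + 1


def max_pool2d(image, f, s):
    """Take the maximum over each f x f window, moving s pixels at a time."""
    out_h = pool_output_size(len(image), f, s)
    out_w = pool_output_size(len(image[0]), f, s)
    return [
        [
            max(image[r * s + i][c * s + j] for i in range(f) for j in range(f))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


halved = pool_output_size(1024, 2, 2)        # 2 x 2 window, stride 2
three_by_three = pool_output_size(1024, 3, 2)  # 3 x 3 window, stride 2
stride_one = pool_output_size(1024, 2, 1)    # 2 x 2 window, stride 1
pooled = max_pool2d(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], 2, 2
)
```

Only the depth is left unchanged by pooling; width and height follow the formula independently.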

Rectified Linear Unit (ReLU)
ReLU refers to the Rectified Linear Unit, the most commonly deployed activation function for the outputs of CNN neurons. Unfortunately, the ReLU function is not differentiable at the origin, which complicates its use with backpropagation training. In this layer, negative values in the filtered image are removed and replaced with zero. The function is activated only when the node input is above a certain threshold: when the input is below zero, the output is zero.
Therefore, to cover any incoming and outgoing range of pixel values, the activation function is applied, and this normalizes incoming values. It is summarized in Equation (13) [1]:
$$f(x) = \max(0, x) \tag{13}$$
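A short sketch makes the behavior at and around zero concrete. The derivative helper is an addition for illustration: returning 0 at the origin is a common convention in practice, assumed here rather than taken from the paper.

```python
def relu(x):
    """Rectified linear unit: pass positive inputs, clamp the rest to zero."""
    return max(0.0, x)


def relu_grad(x):
    """Slope of ReLU; the value at exactly 0 is a convention (here 0)."""
    return 1.0 if x > 0 else 0.0


activations = [relu(v) for v in [-2.0, 0.0, 3.5]]
slopes = [relu_grad(v) for v in [-2.0, 0.0, 3.5]]
```

Choosing a fixed slope at zero is how backpropagation sidesteps the missing derivative at the origin.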

Softmax Layer
Softmax is a mathematical function that converts a vector of numbers into a vector of probabilities, where the probability of each value is proportional to its relative scale in the vector. Convolutional layers are layers where filters are applied to the original image or to other feature maps in a deep CNN. We use 32 layers for training. The CNN layers are listed in Table 3.
Table 3 shows the fine-tuned CNN details for 15 categories of chest disease detection, where padding, stride, and other details of the CNN show how the fine features are collected and passed on to the next block. There are seven Conv-Blocks in total, each a 4-layer combination that downsamples its input to half the size of the previous layer's input. A training and validation graph of the proposed CNN is shown in Figure 4, where the validation accuracy remains consistent. However, the accuracy graph saturates after a few epochs. Various hyperparameter optimization approaches, such as changing the learning rate (LR), the activation function, and the number of epochs, were considered. Finally, the fine-tuned parameters for the case study are shown in Table 4.
As shown in Table 4, the maximum number of epochs was set to 500, but training was stopped manually once the change in validation accuracy saturated. The initial learning rate is set to 0.0004, and the frequency of iteration is taken as 65. Of 13,000 iterations in total, the accuracy rate stayed consistent from iteration 1000 to beyond 6000, so training was manually stopped after 6764 iterations, at which point the epoch count had reached 105. Because the classification results on the validation and testing data were poor, the state-of-the-art deep features obtained from self-activations of the CNN are applied.
They are called self-activations because no other pre-trained model is used to extract the deep features; instead, transfer learning is performed on the fully-connected layer of the proposed CNN itself.
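The softmax function described at the start of this subsection can be written in a few lines. Subtracting the maximum before exponentiating is a standard numerical-stability step and does not change the result; the example inputs are made-up scores, not outputs of the proposed network.

```python
import math


def softmax(scores):
    """Convert a list of real-valued scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
uniform = softmax([0.0] * 15)  # 15 equal scores, one per disease class
```

With 15 equal scores every class receives probability 1/15, and the predicted class is always the one with the largest score.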

ML-Based Classification
Many state-of-the-art machine learning classifiers are used so that algorithm performance can be compared. Support Vector Machine (SVM) is an algorithm suited to linearly separable data. We consider training features and training labels. The training labels are already defined, and every machine learning classifier, including SVM, reads the numeric values row by row from the feature table. The training features form a 2100 × 15 matrix: there is one row for each of the 2100 training images, and each row holds 15 features. In other words, we build a matrix of rows and columns in which each row is the feature vector of a single image and each column is one feature. Testing is performed by passing the testing features to the trained model. The prediction function is applied, and the predicted labels are compared with the testing labels to calculate the accuracy, which reaches a maximum of 99.98%. The testing features form a 900 × 15 array, where 900 is the number of testing instances and 15 is the number of features. Different evaluation measures are used to evaluate the predictions of the multiple trained models. All testing measures are discussed in Section 4.
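The train/predict/score workflow on row-wise feature vectors can be sketched with a one-nearest-neighbor classifier. This is a toy stand-in, not the classifiers or data of the study: the three-dimensional feature vectors are invented for illustration (the paper uses 15 features per image), and treating "KNN-fine" as a single-neighbor classifier is an assumption.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(train_features, train_labels, query):
    """Return the label of the training row closest to the query row."""
    best = min(
        range(len(train_features)),
        key=lambda i: euclidean(train_features[i], query),
    )
    return train_labels[best]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: one row per image, one column per feature.
train_features = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.15, 0.8]]
train_labels = ["COVID-19", "Pneumonia", "Edema"]
test_features = [[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]]
test_labels = ["COVID-19", "Edema"]

predicted = [predict_1nn(train_features, train_labels, q) for q in test_features]
score = accuracy(predicted, test_labels)
```

The same shape convention applies at full scale: a 2100 × 15 training matrix and a 900 × 15 testing matrix, each with one label per row.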

Deep Features Extraction
Transfer learning is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For instance, knowledge gained while learning to recognize cars can be applied when trying to recognize trucks. We train our CNN mainly with 106 epochs, using the preprocessed training data. First, we take two datasets, the NIH Chest X-ray Dataset and the COVID-19 Chest X-ray Image Dataset, normalize them, and rescale them so that the image sizes of both datasets are the same. After normalization, a dataset with 200 images for each of 15 different diseases is created. Our proposed CNN achieved a validation accuracy of 87.89% in 105 epochs with 6764 iterations. We also checked the 105 epochs with 13,000 iterations, but the result was the same. In the training graph, the black dots are validation data. To check the accuracy, we load the model; the accuracy is reported in two forms: (1) the model accuracy and (2) the validation accuracy. Training takes a very long time and requires an extensive validation process, yet the accuracy remains low, at 87.89%. Therefore, we use the fully-connected layer of our proposed CNN to obtain deep features and perform machine learning classification, such as SVM or decision tree, using 10-fold and 5-fold validation methods, where accuracy reaches 95, 98, and 99%. We use the machine learning process because it is highly time-efficient compared to deep learning. We compute the activations of the CNN at the fully-connected layer, which has 15 outputs, one per class. These activations give training features on our training data and testing features on testing data.

Results and Discussions
Using the two selected datasets, the proposed work includes the CNN and the CNN activations fed to machine learning techniques, evaluated with fivefold and tenfold validation. We show the results of the proposed work in the form of tables and graphs, where the training set is used to fit the model, while the validation set is used only to assess the model's performance. We made a table of CNN activation results for the tenfold technique and calculated the accuracy of each individual class. In Tables 3 and 4, detailed CNN layers and training parameters are shown. The CNN validation accuracy, as shown in Figure 4, saturates at 87.89%.
The CNN results on the testing data are further discussed for each class; the testing accuracy is higher than the validation accuracy. Furthermore, other important evaluation measures are also used and shown in Table 5. The data were split at a 70:15:15 ratio: 70% of the data is used for training, 15% for validation, and 15% for testing. Although the validation accuracy was lower, the overall results of classification are improved. The F1-score is an important measure of evaluation as it combines the effect of precision and recall. The precision is the ratio of true positives to the sum of true positives and false positives. The 96.67% value is a good measure of predictivity. Similarly, a kappa value of more than 60% shows a moderate level of agreement on testing data. However, to increase the overall performance on the evaluation measures, the transfer learning-based deep features are used and tested via classical machine learning methods, which improved the accuracy and other results as well.
Although the CNN testing data results were good enough, the transfer learning concept was used to increase accuracy. To improve the testing data results, deep transfer learning is applied using self-activation on the trained fully connected layer of the proposed CNN. These features are fed to classical ML classification methods. The best classification results, selected among 23 different methods, are shown in Tables 6 and 7 with fivefold and tenfold validation methods.
Table 6. Classification validation accuracy of fivefold validation using self-activated features.
K-fold cross-validation is a technique that attempts to maximize the use of available data for training and then testing a model. It is especially valuable for assessing model performance, as it gives a range of accuracy scores across (to some degree) different datasets. In K-fold CV, a given data set is split into K sections/folds, where each fold is eventually used as a testing set. Consider fivefold cross-validation (K = 5): the input data are randomly partitioned into five folds. In the first iteration, one fold is held out for testing and the model is trained on the four other folds, and the predictions are saved. In the second iteration, a different fold is held out, and the results of the predictions are again saved. This cycle is repeated until each of the five folds has been used as the testing set. At last, when training and testing have been performed for all folds, the mean of all prediction results is taken and reported as the fivefold prediction result. Cross-fold validation of this kind substantially reduces the bias in the estimated training and prediction performance. Therefore, the proposed study uses the five- and tenfold methods to make the proposed results more reliable.
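The fold rotation described above can be sketched as an index generator. The function name and the fixed random seed are illustrative choices, and the sample count of 3000 simply reflects 15 classes with 200 images each; how the study actually assigned images to folds is not stated.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) so each fold is the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # randomize fold membership
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(3000, 5))
fold_sizes = [(len(train), len(test)) for train, test in splits]
```

A model would be fitted on each `train` list and scored on the matching `test` list, and the K scores averaged to give the reported cross-validation accuracy.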
In Table 6, we can see almost all methods perform more than 99% of the results, except for the KNN-coarse, KNN-medium, and subspacediscriminant methods. The other results of these methods in terms of sensitivity, specificity, precision, F1−score, and the statistical method of evaluation (kappa-cohen) index are used. Generally, the validation accuracy is the ratio between the summation of true positives, true negatives with the summation of true positives, true negatives, false positives, and false negatives. This tells us how many true results for positive and negative classes are found. The evaluation measures operational calculations are shown in Equations (14)- (19) [70].
Recall, or sensitivity, measures the fraction of actual positives that are predicted as positive, i.e., the true positives over the sum of true positives and false negatives.
Specificity measures the fraction of actual negatives that are predicted as negative. Precision is similar to sensitivity, but its denominator differs: sensitivity divides the true positives by the sum of true positives and false negatives, whereas precision divides them by the sum of true positives and false positives. The proposed study uses both to cross-check the true positive predictions.
The F1-score is the harmonic mean of precision and recall, i.e., twice the product of precision and recall divided by their sum. Accuracy alone may give a misleading impression when the data are imbalanced, because the true positive and true negative counts are dominated by the majority class. The F1-score gives weight to both false positives and false negatives and is therefore more informative in that case.
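The measures described in this section can be computed from the four confusion counts as in the following sketch. The function name and the example counts are hypothetical; the formulas are the standard textbook definitions, which are assumed to correspond to the measures used in this study. The sketch also assumes that no denominator is zero.

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard evaluation measures from the four confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall: share of actual positives found
    specificity = tn / (tn + fp)   # share of actual negatives found
    precision = tp / (tp + fp)     # share of positive predictions that are right
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class problem, the same function can be applied per class by treating that class as positive and all other classes as negative (one-vs-rest).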
Similarly, the Cohen's kappa index is a statistical measure calculated from the confusion matrix of the predictions; it uses all four counts (TP, TN, FP, and FN) to give a confidence value. The value is interpreted as follows: 0-20% = None, 21-39% = Minimal, 40-59% = Weak, 60-79% = Moderate, 80-90% = Strong, and above 90% = Almost Perfect confidence in the classification model.

$$\text{Agreement level} = \frac{\dfrac{x_{cm} \cdot x_{rm}}{n} + \dfrac{y_{cm} \cdot y_{rm}}{n}}{n} \qquad (19)$$
In Equation (19), $x_{cm}$ and $x_{rm}$ are the column 1 and row 1 values of the confusion matrix, and $y_{cm}$ and $y_{rm}$ are the column 2 and row 2 values. This form covers two classes only; the proposed study uses 15 classes, so there are 15 such column and row values, one pair per class.
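The following sketch shows one way to extend the agreement term to any number of classes and to map the resulting kappa value to the confidence bands listed earlier. It rests on assumptions not stated in the text: the agreement term is read as the chance-agreement term of the standard Cohen's kappa, n is taken to be the total number of instances, the column and row values are taken to be column and row totals, and the function names and example matrix are hypothetical.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    Rows are actual classes and columns are predicted classes.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[r][c] for r in range(k)) for c in range(k)]

    # Observed agreement: share of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: one (column total * row total / n) term per class,
    # summed and divided by n. With k = 2 this has the two-term form.
    expected = sum(col_totals[i] * row_totals[i] / n for i in range(k)) / n
    return (observed - expected) / (1 - expected)


def confidence_level(kappa_percent):
    """Map a kappa value in percent to the bands given in the text.

    Values that fall between two listed bands (e.g. 20.5) are assigned
    to the higher band; that choice is an assumption of this sketch.
    """
    if kappa_percent <= 20:
        return "None"
    if kappa_percent < 40:
        return "Minimal"
    if kappa_percent < 60:
        return "Weak"
    if kappa_percent < 80:
        return "Moderate"
    if kappa_percent <= 90:
        return "Strong"
    return "Almost perfect"


# Made-up two-class confusion matrix, for illustration only.
example = [[45, 5],
           [10, 40]]
kappa = cohen_kappa(example)
print(f"kappa = {kappa:.4f}, confidence = {confidence_level(100 * kappa)}")
```

The same `cohen_kappa` function accepts a 15 x 15 matrix unchanged, since the sum simply runs over all classes.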
Table 6 covers seven classification models, four of which yield validation accuracies above 99%. The bag-ensemble, KNN-fine, LP-boost ensemble, and total-boost methods yield 99.33%, 99.40%, 99.1%, and 99.17% validation accuracy, respectively. It is further noticed that the accuracy, sensitivity, and precision values are nearly identical for each method, so the choice of denominator (false negatives or false positives) has little effect on these measures. Using several evaluation measures nevertheless checks the results from different angles. The F1-score, which combines precision and sensitivity, is again 99.33% for the bag-ensemble method, and the same pattern holds for the other six methods used in this study. The specificity, which is calculated over the true negatives, differs across the methods in Table 6 and is higher than the sensitivity for every method, which indicates that the true negative predictions are more accurate than the true positive predictions. The precision, whose denominator is the sum of TP and FP, is slightly higher than the validation accuracy, which shows that the true positive predictions far outnumber the false positives.
The kappa statistic takes the TP, FP, FN, and TN values into account to yield a confidence value. KNN-fine has the highest accuracy, but not the highest kappa index. The highest kappa value, 96.64%, belongs to the bag-ensemble method, which makes it the best of all the classification methods. To cross-check this result and remove any remaining bias, validation with more folds can be used. Tenfold validation with the same classification methods is therefore also performed, and the results are shown in Table 7. The results are slightly higher than the fivefold results. The validation accuracy of the bag-ensemble method increased from 99.33% to 99.77%, and the validation accuracy of the lower-scoring algorithms increased as well. This supports the proposed method, since the results hold up under more randomized folding, which is also favorable for large datasets. As the validation accuracy increases, the other two values, sensitivity and F1-score, remain the same, except for the specificity of the bag-ensemble method, which decreased from 99.77% to 99.73%. The specificity remains above 99% in both the five- and tenfold methods and, in general, increases in the decimal places under the tenfold method. Overall, the Cohen's kappa index increases for all methods, indicating higher confidence in the predicted results of each algorithm.
The method with the best kappa value under fivefold validation was bag-ensemble, with 96.64%, which increased to 97.86% under tenfold validation. The 2nd best kappa value under fivefold validation was 95.18% for KNN-fine, which increased to 97.32%. The 3rd and 4th best kappa values, for LP-boost and total-boost, were 92.77% and 93.30%, respectively, and increased from 92.37% to 95.71% and from 93.30% to 97.05%. The improvement in all evaluation measures under tenfold validation makes the proposed results more promising for chest disease identification.
Here, we report the results of the classifiers trained on the CNN activations with the fivefold technique for each class separately, calculating the individual accuracy for each class. The results are shown in Table 8.
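A minimal sketch of how an individual accuracy per class can be read from a multi-class confusion matrix is given below. It assumes that the individual class accuracy means the share of a class's instances that are predicted correctly (the diagonal entry over the row total); the text does not state the exact definition. The function name, the three-class subset, and the counts are made up for the example.

```python
def per_class_accuracy(confusion, class_names):
    """Share of correctly predicted instances for each actual class.

    Rows of `confusion` are actual classes and columns are predicted classes.
    """
    return {
        name: confusion[i][i] / sum(confusion[i])
        for i, name in enumerate(class_names)
    }


# Made-up counts for three of the classes, for illustration only.
class_names = ["COVID-19", "Pneumonia", "Atelectasis"]
confusion = [
    [99, 1, 0],
    [0, 100, 0],
    [2, 1, 97],
]

class_accuracy = per_class_accuracy(confusion, class_names)
for name, value in class_accuracy.items():
    print(f"{name}: {100 * value:.1f}%")
```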
The individual disease detection rates allow a more confident model to be proposed for the detection of a particular disease. COVID-19 needs to be detected quickly and cheaply. In the proposed study, under fivefold validation, the method with the best kappa value, the bag-ensemble, achieved a COVID-19 detection accuracy of 99.0%. For the other algorithms, i.e., the KNN-fine, LP-boost, and total-boost methods, the COVID-19 classification accuracy was 100%, 100%, and 98.0%, respectively. This shows that COVID-19 detection is highly accurate using the proposed framework. The COVID-19 results are used as a point of reference for examining the detection results for the other chest diseases, most of which were detected with 100% or 99% accuracy. For the bag-ensemble method, which attained the best kappa value, every class was detected with 99% or 100% accuracy except Atelectasis, which was classified with 97% accuracy.
Table 8. Self-activated features based on classification using the fivefold technique results on individual class accuracy.
Similarly, the KNN-fine method was 100% accurate in COVID-19 detection, and its Atelectasis detection accuracy was higher than that of the bag-ensemble method (99% against 97%). Its results for emphysema, nodule, and pneumothorax were lower, decreasing from 100% to 98%, from 100% to 99%, and from 99% to 98%, respectively. The third best model under fivefold validation was the LP-boost method, which also achieves 100% COVID-19 detection and raises the Atelectasis result, compared with the bag-ensemble method, from 97% to 100%. However, other classification results decreased slightly: Cardiomegaly from 100% to 99%, Nodule from 100% to 97%, Pleural Thickening from 100% to 99%, and Pneumothorax from 99% to 96.50%. The detection rate thus decreased for some diseases but increased for others.
The fourth best model, total-boost, detected COVID-19 with 98% accuracy. Its classification results for Effusion, Emphysema, and Pneumothorax were lower, whereas for the other diseases the detection rate either increased or was maintained. Overall, all of the ML classification methods perform promisingly with the proposed method for multi-class chest disease detection. The tenfold validation method proved to be more accurate than the fivefold method; its individual class detection results are shown in Table 9.

Table 9 shows that the COVID-19 detection accuracies achieved by the best algorithms were 100%, 99%, 99%, and 99% for the bag-ensemble, KNN-fine, LP-boost, and total-boost methods, respectively. The bag-ensemble results can be taken as a frame of reference for disease detection; the other methods are 1% less accurate than the bag-ensemble method on COVID-19. The bag-ensemble method also achieves the highest kappa confidence value. It not only achieves 100% for COVID-19 detection but also increases the classification accuracy for Atelectasis to 99%, from 97% under fivefold validation. Furthermore, it increases the detection results for fibrosis, mass, pneumonia, and pneumothorax from 99% to 100%. However, its detection of infiltration decreased from 100% to 98%. This decrease is negligible in view of the improvement for the other six diseases. With the best method under tenfold validation, the detection rate is 100% for all diseases except three, two of which (Atelectasis and Consolidation) are detected with 99% accuracy and one (Infiltration) with 98%. The 2nd best method also improved its detection results, which now range from 99% to 100% for all diseases. Ten classes (cardiomegaly, consolidation, edema, emphysema, fibrosis, hernia, mass, nodule, pleural thickening, and pneumonia) reach 100% accuracy and five (atelectasis, COVID-19, effusion, infiltration, and pneumothorax) reach 99%, so these results are improved compared with the fivefold validation approach. The 3rd best model, LP-boost, also improves its detection results and reaches 100% for eight diseases (cardiomegaly, edema, fibrosis, hernia, infiltration, mass, nodule, and pleural thickening), 99% for five diseases (atelectasis, consolidation, COVID-19, pneumonia, and pneumothorax), and 98% for two diseases (effusion and emphysema).
The 4th best model also improved its results, giving 100% accurate detection for nine diseases (cardiomegaly, consolidation, edema, emphysema, fibrosis, hernia, infiltration, pleural thickening, and pneumonia) and 99% for the six other diseases (atelectasis, COVID-19, effusion, mass, nodule, and pneumothorax).
Table 9. Self-activated features based on classification using the ten-fold technique results on individual class accuracy.
It is observed that the fivefold method gives the less accurate results and that the kappa confidence value is improved under the tenfold method. Therefore, tenfold cross-validation, with more randomization and more folds, is the more robust and confident method for classifying chest diseases. A comparison with state-of-the-art studies on COVID-19 detection and other chest diseases is given in the next section.

Comparison with Previous Studies
The proposed study addresses the multi-class classification problem for chest diseases. Few studies cover multiple chest diseases, and the problem must now include COVID-19 among them. By including COVID-19, the proposed method covers both the single-class detection problems for other chest diseases and the single-class COVID-19 detection studies. The comparison with recent state-of-the-art methods is summarized in Table 10, which compares the proposed models with various ML- and DL-based studies. The 1st compared study addresses the staging of smokers' diseases and achieves up to 74.95% accuracy. The 2nd reports DL-based COVID-19 detection results that reach 90.13%. The 3rd used a DL method and achieved 87%. The 4th reports that a chest CT scan-based analysis with a GGO classifier achieves 98% accuracy. Another chest disease detection method achieved 87.94%. The 7th achieved 97% using a radiographic method and an image modality for chest disease detection. The last used the X-ray modality with ML classification methods and achieved 95.8%. The proposed study shows more accurate results than all of the compared studies. Many other studies focus on a single disease, such as pneumonia, COVID-19, or some other chest disease, but most do not address the multi-class problem of classifying chest diseases.

Conclusions
The proposed study uses a combined X-ray dataset for chest disease detection that includes a COVID-19 X-ray imaging dataset. Data augmentation is performed to normalize the data, which removes data bias, if there is any. A novel CNN is proposed for multi-class chest disease detection. Its learning saturates during training at a validation accuracy of 87.89%. To increase the prediction results and reduce the prediction time, deep transfer learning is applied: self-activated deep features are extracted from the fully connected layer of the proposed CNN and fed to seven different ML algorithms. To make the results more reliable, two validation methods (5- and 10-fold) are used. The results show that more folds make the results more accurate. The proposed study can therefore be used for multi-class chest disease detection. It not only increases accuracy, but also reduces the prediction time for a given testing sample. Finally, a comparison shows the improvement of the proposed study over both single-class and multi-class chest disease studies, including those on COVID-19. In the future, multi-class decision-making CAD systems should be used in different areas of the medical domain. However, data normalization needs to be considered to make the data reliable. Large data samples are encouraged for more confident results on chest diseases, as is the use of deep transfer learning features.