An Ensemble Learning Model for COVID-19 Detection from Blood Test Samples

Current research endeavors in the application of artificial intelligence (AI) methods to the diagnosis of the COVID-19 disease have proven indispensable, with very promising results. Despite these promising results, real-time detection of COVID-19 using reverse transcription polymerase chain reaction (RT-PCR) test data still faces limitations, such as limited datasets, imbalanced classes, a high misclassification rate of models, and the need for specialized research to identify the best features and thus improve prediction rates. This study investigates and applies the ensemble learning approach to develop prediction models for effective detection of COVID-19 using routine laboratory blood test results. Hence, an ensemble machine learning-based COVID-19 detection system is presented to help clinicians diagnose this virus effectively. The experiment was conducted using custom convolutional neural network (CNN) models as a first-stage classifier and 15 supervised machine learning algorithms as a second-stage classifier: K-Nearest Neighbors, Support Vector Machine (Linear and RBF), Naive Bayes, Decision Tree, Random Forest, MultiLayer Perceptron, AdaBoost, ExtraTrees, Logistic Regression, Linear and Quadratic Discriminant Analysis (LDA/QDA), Passive, Ridge, and Stochastic Gradient Descent Classifier. Our findings show that an ensemble learning model based on DNN and ExtraTrees achieved a mean accuracy of 99.28% and an area under curve (AUC) of 99.4%, while AdaBoost gave a mean accuracy of 99.28% and an AUC of 98.8% on the San Raffaele Hospital dataset. The comparison of the proposed COVID-19 detection approach with other state-of-the-art approaches using the same dataset shows that the proposed method outperforms several other COVID-19 diagnostic methods.


Introduction
The Internet of Things (IoT) and smart home technologies enable the monitoring of people in their homes without interfering with their daily routines [1]. Advances in artificial intelligence (AI) and machine learning (ML) can enable faster patient monitoring, management, and treatment, as well as convert a hospital-only treatment pathway into cost-effective combined home-hospital or even outpatient alternatives, improving overall quality of health care and paving the way for personalized medicine [2]. Digital health signals recorded at home by sensors provide a wealth of clinical data. Such data could be sent to cloud computing infrastructure and analyzed remotely, which is especially useful in the case of contagious diseases such as COVID-19 [3]. However, analyzing real-time data collected from heterogeneous IoT sensors presents several challenges: the data contain significant artifacts caused by transmission and recording limitations, are highly imbalanced and incomplete owing to subject variability and resource limitations, and involve multiple modalities [4].

1. The proposed algorithm was able to effectively provide the preliminary classification of COVID-19 using relevant feature parameters.
2. The proposed algorithm has a lower computational intensity, and the detection time was in a few seconds.
3. Based on the effectiveness of our proposed model, it can improve pathologist efficiency and aid effective laboratory examination in pathology departments.
The remaining parts of the paper are prepared and sectioned as follows: an extensive review of the literature is discussed in Section 2, while Section 3 presents the framework and detailed description of our proposed methods. The experimental results and the discussion are presented in Section 4, and the last part of the paper is the conclusion and future recommendation, as presented in Section 5.

Materials and Methods
This section describes in detail the progress and contributions, including the state-of-the-art methods, presented in previous studies on the detection of COVID-19. To further understand the state of existing research and the contribution of AI methods, especially the ML algorithms used, we reviewed the literature that uses blood test results in the detection of COVID-19, highlighting the methods, contributions, and limitations.
The authors of [32] evaluated the results of the blood tests to perform an initial screening of likely patients with COVID-19 using the dataset of 598 blood samples from Albert Einstein Hospital, Brazil. The dataset consists of 81 cases of COVID-19. The authors based their experiment on 14 blood features using ML models based on random forest, logistic regression, artificial neural network (ANN), and Lasso elastic-net regularized generalized linear network (GLMNET). The best-performance model gave an accuracy of 87% for ANN.
A study [33] presented a COVID-19 detection approach based on some ML models, which are XGBoost, LDA, LR, RF, and Decision Tree. The authors investigated the impact of feature/variable selection and dimensionality reduction in features from 12 variables to 4. They concluded that the best accuracies of 89.6% and 85.9% were achieved by XGBoost for 12-variable and 4-variable models, respectively. The later study [25] was conducted using blood test results from Oxford University Hospitals, UK. The XGBoost classifier achieved the accuracy, sensitivity, and specificity of 92.3%, 77.4%, and 95.7%, respectively.
Recent work by [34] carried out an analysis using two ML algorithms in the detection of COVID-19 on routine blood tests. The ML algorithms used by the authors are RF and SVM on a small data set of 294 blood samples obtained from Wuhan Union Hospital and Kunshan People's Hospital, China. Fifteen characteristics were selected for analysis and the experimental results showed that SVM outperformed random forest classifiers with accuracy, precision, sensitivity, and specificity of 84%, 92%, 88%, and 80%, respectively.
Five ML algorithms, namely gradient boost trees, neural networks, logistic regression, random forest, and SVM, were proposed by authors in [35] in the diagnosis of COVID-19. A dataset containing a total number of 235 blood samples with 102 established cases of COVID-19 was gathered from Albert Einstein Hospital in Brazil and 15 relevant characteristics were selected. SVM gave the best classification result with very little significance compared to previous work reviewed in this study on AUC, sensitivity, and specificity of 85%, 68%, and 85%, respectively.
Another dataset consisting of 279 cases from San Raffaele Hospital, Milan, Italy, was analyzed for the early detection of COVID-19 by Brinati et al. [27]. Among seven ML models (KNN, DT, NB, extremely randomized trees (ET), LR, RF, and SVM), the experimental results showed that the RF model outperformed the other classifiers with an accuracy of 86% and a sensitivity of 95%. Another study [37] presented an LR-based ML classifier to detect COVID-19 using three major component counts. The training set consists of 390 cases, including established COVID-19 cases from Stanford Health Care, and a different dataset was used for validation.
Further studies from [38] analyzed and applied six state-of-the-art methods including MLP, SVM, RT, NB, RF, and Bayesian Networks (BN). The study was carried out using a dataset consisting of 564 samples, including 559 established COVID-19 samples from Albert Einstein Hospital in Brazil. The authors performed oversampling using the SMOTE technique because of the limited data size and, for feature selection, a manual method and two algorithms based on PSO and evolutionary search were utilized. The performance model with the highest results was obtained from the BN model with an accuracy, precision, specificity, and sensitivity of 95.159%, 93.8%, 93.6%, and 96.8%, respectively.
The authors of [39] presented a neural network model for the detection of the severity of COVID-19 in small data samples from the Tongji Medical College of Huazhong University of Science and Technology, Hubei, in collaboration with the Tumor Center of Union Hospital, China. The authors evaluated the severity of COVID-19 on 151 images after selecting features.
An extreme gradient boosting (XGBoost) model was applied by the authors in Kukar et al. [40] to identify COVID-19. A total of 5333 blood samples, including 160 established COVID-19 samples, were obtained from the University Medical Center Ljubljana, Slovenia. Thirty-five relevant characteristics were selected for further analysis and the experimental results showed an improved AUC of 97%, 81.9% sensitivity, and a specificity of 97.9%.
A robust model for oversampling and ensemble learning based on the integration of the SVM and SMOTEBoost methods was proposed in [41]. The results of 10 SVM-SMOTEBoost models were used for the ensemble learning, and the overall performance was determined using the average results of the 10 models. The proposed model was able to achieve an AUC of 86.78%, a sensitivity of 70.25%, and a specificity of 85.98%.
Aljame et al. [42] proposed an ensemble learning model for the initial screening of patients with COVID-19 from routine blood tests. The model used the dataset obtained from 564 patients of the Albert Einstein Israelita Hospital located in Sao Paulo, Brazil, and achieved an accuracy of 99.88% in discriminating COVID-19 positive cases.
In Wu et al. [43], a mixed dynamic ensemble selection (DES) approach for unbalanced data is suggested to identify COVID-19 from a complete blood count. This approach combines data preparation with enhanced DES. First, the authors balance the data and reduce noise using the hybrid synthetic minority oversampling approach and edited nearest neighbor (SMOTE-ENN). Second, a hybrid multiple clustering and bagging classifier generation (HMCBCG) approach is presented to enhance the variety and local regional competency of candidate classifiers and thus improve DES performance. With 99.81% accuracy, HMCBCG with k-nearest oracles eliminate (KNE) achieves the best performance for COVID-19 screening.
AlJame et al. [44] propose an ML prediction model for the diagnosis of COVID-19 based on clinical and regular laboratory data. The model uses an ensemble-based strategy known as deep forest (DF), which employs numerous classifiers in several layers to foster variety and increase performance. The cascade level uses layer-by-layer processing and is made up of three separate classifiers: extra trees, XGBoost, and LightGBM. The DF model has an accuracy of 99.5% on two publicly accessible datasets.
In Babaei Rikan et al. [45], to diagnose positive instances of COVID-19 from three regular laboratory blood test datasets, seven ML, and four deep learning models were presented. To illustrate the relevance among samples, Pearson, Spearman, and Kendall correlation coefficients were used. The suggested models were trained, validated, and tested using a four-fold cross-validation procedure. The deep neural network (DNN) model earned the highest accuracy values in all three datasets.
Buturovic et al. [46] sought to build a blood-based host gene expression classifier for the severity of viral infections, including COVID-19. They created a logistic regression-based classifier for viral infection severity and validated it in a variety of viral infection situations, including COVID-19. In patients with confirmed COVID-19, the classifier exhibited area under curve (AUC) values of 0.89 and 0.87 to detect patients with severe respiratory failure or 30-day mortality, respectively.
Hu et al. [48] proposed a framework based on enhanced binary Harris hawk optimization (HHO) in conjunction with a kernel extreme learning machine (KELM). They used specular reflection learning to improve the original HHO algorithm. The experimental findings reveal that the indicators selected by the proposed feature selection method, such as age, partial oxygen pressure, oxygen saturation, sodium ion concentration, and lactic acid, are critical for the early correct evaluation of COVID-19.
Kukar et al. [40] built an ML model for the detection of COVID-19 based on regular blood tests from 5333 patients with various bacterial and viral illnesses, as well as 160 COVID-19-positive patients using the extreme gradient boost machine (XGBoost) and achieved the AUC value of 0.97. According to the significance score of the XGBoost feature, the most beneficial routine blood parameters for the diagnosis of COVID-19 were MCHC, eosinophil count, albumin, INR, and prothrombin activity.
Rahman et al. [49] used a stacking machine learning model to propose a biomarker-based COVID-19 detection system. This study trained and validated the proposed model using seven different publicly available datasets. White blood cell count, monocyte and lymphocyte percentage, and age were found to be important biomarkers for COVID-19 disease prediction. The overall accuracy of the stacking model was 91.45%.
Qu et al. [50] used a logistic regression model to analyze the results of the blood test. The best prognostic indications for severe COVID-19 were lymphocyte count, hemoglobin, and ferritin levels.
The summary of related studies, with an emphasis on the significant methods used and the contributions of the studies with their evaluation metrics and values, is given in Table 1. The results of the previous studies show the applications of single-level and ensemble classifiers. However, the shortcomings of existing studies include limited dataset samples and imbalanced datasets [51], datasets dominated by results from aged and male patients [52], insufficient clinical data that are useful to improve model classification, reliance on a single data source, which could restrict model generalizability [53], and incomplete or inadequate data [54].
Therefore, the need to explore some of the existing feature selection methods for dimensionality reduction is important for an effective classification model [42]. Besides, research focus should be targeted toward analyzing the integrated performance of the new test data using various ML algorithms [35]. Based on some of the existing pitfalls of the previous study, this study presents a unique ensemble method using an automatic feature selection method based on PCA, thus improving the classification of models for efficient COVID-19 detection.

Proposed Methodology
This section discusses in detail the description of the proposed experimental model and the visual summary of the proposed methodology is depicted in Figure 1. Our study applied and investigated the performance of different state-of-the-art ML algorithms, including single and ensemble learning, for effective detection of COVID-19. The proposed system is divided into four categories, and they are fully described in the subsections.

Dataset Description
The dataset used in this study contains 279 cases of patients from San Raffaele Hospital, Milan, Italy [27]. It was made accessible by the Italian Scientific Institute for Research, Hospitalization and Healthcare (IRCCS) and annotated with 16 hematochemical values from routine blood tests. The dataset consists of the results of the respiratory tract rRT-PCR test of the samples for 177 positively established cases of COVID-19 and 102 non-COVID-19 cases based on the nasopharyngeal swab. The dataset is summarized in Table 2.

Data Preprocessing
This is the first phase of our proposed system, and the concept of data preprocessing has been considered as an important aspect of the generalization performance of supervised ML algorithms [59]. First, we replaced the categorical variable of gender with numerical values (0 for 'male' and 1 for 'female'). We also manually checked all datasets and corrected data typing errors (such as the '0-4' value entered instead of 0.4). After data cleaning, further pre-processing was done to remove outliers. We used the Median Absolute Deviation (MAD)-based outlier removal, which removed the samples that differed by more than three standard deviations from the median value of the variable across the dataset.
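The MAD-based outlier rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the "three standard deviations" threshold is applied to the MAD scaled by 1.4826 (the usual constant that makes the MAD comparable to a standard deviation for normally distributed data), and the function name `mad_outlier_mask` is hypothetical.

```python
import numpy as np

def mad_outlier_mask(X, threshold=3.0):
    """Return a boolean mask marking rows that contain at least one outlier.

    Assumption: a value is an outlier when it lies more than `threshold`
    robust standard deviations (1.4826 * MAD) from the column median.
    """
    X = np.asarray(X, dtype=float)
    median = np.nanmedian(X, axis=0)
    mad = np.nanmedian(np.abs(X - median), axis=0)
    scale = 1.4826 * mad
    scale[scale == 0] = np.inf  # constant columns never flag a sample
    robust_z = np.abs(X - median) / scale
    return np.any(robust_z > threshold, axis=1)

# Toy data: the last row has an implausible value in the first column.
X_demo = np.array([[1.0, 10.0], [1.1, 11.0], [0.9, 9.0], [1.0, 10.5], [50.0, 10.0]])
mask_demo = mad_outlier_mask(X_demo)
X_clean = X_demo[~mask_demo]  # keep only the rows without outliers
```

Using the median and MAD rather than the mean and standard deviation keeps the threshold itself from being distorted by the very outliers it is meant to detect.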
In the data preprocessing phase, handling missing values within the dataset is extremely important; thus, we applied the KNN imputation method [60], which imputes each missing value from the five closest neighbors, using the mean of their non-missing values. We further explored data rebalancing, since the dataset is imbalanced in the ratio of the positive class to the negative class. Previous studies on the impact of class imbalance have shown that an imbalanced dataset could lead to classifier bias and hence to an increased misclassification rate and degraded classification models. Based on this, we integrated the synthetic minority oversampling technique (SMOTE) [61] to balance the data by oversampling the minority class.
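The imputation and rebalancing steps can be illustrated with the sketch below. It assumes a recent scikit-learn (`KNNImputer` is available from version 0.22) and, to stay self-contained, uses a simplified NumPy version of the SMOTE interpolation idea for binary labels; the helper names `knn_impute` and `smote_oversample` are hypothetical, and a full implementation is available as `SMOTE` in the imbalanced-learn package.

```python
import numpy as np
from sklearn.impute import KNNImputer

def knn_impute(X, n_neighbors=5):
    """Fill each missing value with the mean of that feature over the
    nearest neighbors that have the feature observed."""
    return KNNImputer(n_neighbors=n_neighbors).fit_transform(X)

def smote_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE for two classes: create synthetic minority samples
    by interpolating between a minority sample and one of its k nearest
    minority neighbors until both classes have the same size.
    Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    X_min = X[y == minority]
    # Pairwise Euclidean distances inside the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    X_bal = np.vstack([X, synthetic])
    y_bal = np.concatenate([y, np.full(n_new, minority)])
    return X_bal, y_bal
```

To avoid leaking information, oversampling of this kind is normally applied to the training portion only, after the test data have been set aside.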

Feature Selection
The feature selection phase is a crucial stage necessary to select the most appropriate feature representation and improve an ML model. Previous studies have shown that reducing the dimensionality of the data helps reduce data redundancy, avoid noisy data, and improve the performance [62]. This study applied an unsupervised linear transformation technique based on Principal Component Analysis (PCA) to select features with the largest eigenvalues that represent 95% of the variability. The correlation matrix of the different features in the selected datasets is depicted in Figure 2.
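A minimal sketch of the PCA step with scikit-learn is shown below. Passing a fraction as `n_components` keeps the smallest number of components whose cumulative explained variance reaches that fraction. Standardizing the features first is an assumption of this sketch (the text does not state it), and the helper name `reduce_with_pca` is hypothetical. Note that PCA builds new components as linear combinations of the original variables rather than picking a subset of them.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reduce_with_pca(X_train, X_test, variance=0.95):
    """Fit scaling and PCA on the training data only, then project both sets."""
    scaler = StandardScaler().fit(X_train)  # assumed preprocessing step
    pca = PCA(n_components=variance, svd_solver="full")
    Z_train = pca.fit_transform(scaler.transform(X_train))
    Z_test = pca.transform(scaler.transform(X_test))
    return Z_train, Z_test, pca

# Synthetic stand-in for 16 correlated blood-test features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 4))
X_demo = latent @ rng.normal(size=(4, 16)) + 0.05 * rng.normal(size=(100, 16))
Z_train, Z_test, pca_model = reduce_with_pca(X_demo[:80], X_demo[80:])
retained = pca_model.explained_variance_ratio_.sum()
```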

Cross-Validation Methods
For this study, we applied holdout cross-validation to evaluate the performance of our model as follows. The train-test split function from the scikit-learn library was used to randomly divide the dataset into 80% for training and 20% for testing. We further partitioned the training dataset into a train/validate split, using 75% for training and 25% for validation. Thus, the data samples used for training comprise 106 COVID-19 and 62 non-COVID-19 samples, while the validation data comprise 35 COVID-19 and 20 non-COVID-19 samples. To test the performance of our model, the initial 20% holdout was used, which consists of 35 COVID-19 and 20 non-COVID-19 samples. The experimental procedure was repeated 10 times, and the performance of each model was measured by calculating the mean of the recorded scores.
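The two-step holdout split can be sketched with `train_test_split` as follows. The stratification and the seed handling are assumptions of this sketch, and the helper name `train_val_test_split` is hypothetical; the data are synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def train_val_test_split(X, y, seed=0):
    """80/20 train-test split, then 75/25 train-validation split of the 80%."""
    X_trval, X_test, y_trval, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_trval, y_trval, test_size=0.25, stratify=y_trval, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test

# Synthetic example with 200 samples (120 positive, 80 negative).
X_demo = np.arange(400, dtype=float).reshape(200, 2)
y_demo = np.array([1] * 120 + [0] * 80)
X_tr, X_va, X_te, y_tr, y_va, y_te = train_val_test_split(X_demo, y_demo)
```

With these proportions the data end up divided 60/20/20 into training, validation, and test portions; repeating the call with different seeds and averaging the scores mirrors the repeated procedure described above.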

Ensemble Learning
Ensemble learning is an ML approach in which numerous models (dubbed "weak learners") are trained to tackle the same problem and then combined to achieve superior results [63]. Weak learners (or base models, also known as first-stage models) can be used to create more complicated models by merging several of them. Most of the time, these base models do not perform well on their own, either because they contain too much bias or too much variance to be robust. The concept behind ensemble techniques is to lessen the bias and variance of such weak learners by merging many of them to form a strong learner with superior outcomes. We can generate more accurate or reliable models by combining weak models in the proper way. A stacking ensemble model is designed from base models and a meta-learner (or second-stage model) that uses the base-model predictions. The base models are trained on the training data and are used to produce predictions. The meta-learner is then trained on the decisions made by the base models on previously unseen data in order to aggregate the base-model predictions. This is done by feeding the meta-learner with the input and output pairs of data from the base learners while aiming to predict the correct output. Therefore, the stacking algorithm has three stages:

1. Construct an ensemble:
• Select base learners B, which must be different,
• Select a meta-learner M.

2. Train the ensemble:
• Train each base model on the training dataset D,
• Cross-validate each base model,
• Combine the predictions from the base models to form a new training dataset $\tilde{D} = \{X_{tr}, B_1(X_{tr}), B_2(X_{tr}), \ldots, B_k(X_{tr})\}$, which consists of the training inputs $X_{tr}$ and the corresponding predictions by the k base models $B_i(X_{tr})$, i = 1 . . . k,
• Train the meta-learner M on the new dataset $\tilde{D}$ to generate more accurate predictions on previously unseen data.

3. Test on new data:
• Record output decisions from the base models B,
• Feed base-model decisions into the meta-learner M to make the final decision.

The ensemble learning algorithm is summarized in Figure 3. Stacking exploits the capabilities of the best-performing learners. The largest improvement in performance is usually obtained when the base classifiers used for stacking have high variability and uncorrelated outputs.
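The three stacking stages can be sketched with scikit-learn's `StackingClassifier` (available from version 0.22). This is only an illustration under stated assumptions: simple scikit-learn estimators stand in for the CNN first stage used in this study, the data are synthetic, and the helper name `build_stacking_model` is hypothetical. Setting `passthrough=True` gives the meta-learner both the original inputs and the base-model predictions, and `cv=5` means the meta-learner is trained on out-of-fold base predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def build_stacking_model(seed=0):
    # Stage 1: choose different base learners and a meta-learner.
    base_learners = [
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=seed)),
        ("nb", GaussianNB()),
    ]
    meta_learner = ExtraTreesClassifier(n_estimators=100, random_state=seed)
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=meta_learner,
        cv=5,              # base predictions for the meta-learner are out-of-fold
        passthrough=True,  # meta-learner also sees the original inputs
    )

# Synthetic binary classification data (stand-in for the blood-test features).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           class_sep=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Stage 2: train base models and meta-learner. Stage 3: test on new data.
model = build_stacking_model().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```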

Machine Learning Models
We experimented with 15 ML models, namely KNN, Linear SVM, RBF SVM, Random Forest, Decision Tree, Neural Network (MultiLayer Perceptron), AdaBoost, Extremely randomized trees (ExtraTrees), Naïve Bayes, LDA, QDA, Logistic Regression, Passive Classifier, Ridge Classifier, and Stochastic Gradient Descent Classifier (SGDC). These ML algorithms have been used in other classification domains and have achieved strong prediction performance, and ensemble methods combine the benefits of several different algorithms into a more powerful model. To improve generalizability and robustness compared to a single ML algorithm, we applied three different ensemble learners.
Some of the ML algorithms used in this study are described as follows:

1. The K-Nearest Neighbor (KNN) model has been used effectively in previous studies, especially in solving non-linear problems. It assigns the class label according to the smallest distance between the target point and the training point(s) in the feature space. The Euclidean distance (ED) is widely used to determine the distance between the target point x and the training point y:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where n is the number of features.

3. Naive Bayes (NB) is used for classification where the instances of a dataset are differentiated using specified features. This model is a probabilistic classifier based on strong independence assumptions between features. The NB classifier predicts the value of x with the highest posterior probability P(x|t):

$$P(x \mid t) = \frac{P(t \mid x)\, P(x)}{P(t)}$$

where P(x) and P(t) are the prior probabilities, P(x|t) is the posterior probability, and P(t|x) is the likelihood.

4. Logistic Regression (LR): We used a logistic regression model and searched for the optimal regularization strength to prevent overfitting of the model.

5. Random Forest (RF) is an ensemble algorithm that combines tree predictors with the same distribution for all trees in the forest. Considering the ensemble of classifiers h_1(x), h_2(x), . . . , h_k(x), with the training set drawn at random from the distribution of the random vector X, Y, the margin function is expressed in Equation (5):

$$mg(X, Y) = \frac{1}{k}\sum_{i=1}^{k} I\big(h_i(X) = Y\big) - \max_{j \neq Y} \frac{1}{k}\sum_{i=1}^{k} I\big(h_i(X) = j\big) \quad (5)$$

The generalization error is given in Equation (6):

$$PE^{*} = P_{X,Y}\big(mg(X, Y) < 0\big) \quad (6)$$

where I(·) is the indicator function, and P_{X,Y} is the probability over X, Y.

6. Linear Discriminant Analysis (LDA) is a Bayes optimal classifier that is used in many classification problems. LDA finds a one-dimensional subspace in which the classes are separated well. The discriminant function is given by Equation (7).

The parameters of these models are summarized in Table 3.
Table 3. Default parameter values for the machine learning models.
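As a small illustration of the nearest-neighbor rule described for KNN, the sketch below classifies a point by majority vote among its k closest training points under the Euclidean distance. The function name `knn_predict` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest
    training points, using the Euclidean distance."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    diffs = X_train - np.asarray(x, dtype=float)
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    labels, votes = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(votes)]

# Two well-separated toy clusters with labels 0 and 1.
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
label_a = knn_predict(X_toy, y_toy, [0.2, 0.1])
label_b = knn_predict(X_toy, y_toy, [5.5, 5.2])
```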

Performance Metrics
The performance of the ML algorithms was evaluated using accuracy, false positive rate (FPR), false negative rate (FNR), area under curve (AUC), Matthews correlation coefficient (MCC), and Cohen's kappa. The description of the performance metrics used in this study is summarized in Table 4.
Table 4. Mathematical definition of performance metrics.
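For reference, the sketch below computes the confusion-matrix-based metrics named above from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), using their standard textbook definitions. It is not a reproduction of Table 4, and the function name `classification_metrics` is hypothetical. AUC is omitted because it requires the predicted scores rather than the four counts.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Standard definitions of accuracy, FPR, FNR, MCC, and Cohen's kappa."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    fpr = fp / (fp + tn)  # share of negatives predicted as positive
    fnr = fn / (fn + tp)  # share of positives predicted as negative
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0
    return {"accuracy": accuracy, "fpr": fpr, "fnr": fnr, "mcc": mcc, "kappa": kappa}

# Hypothetical counts, not results from this study.
metrics_demo = classification_metrics(tp=40, tn=40, fp=10, fn=10)
```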

Software and Hardware
The ML algorithms were implemented in Python 3.7 (Python Software Foundation, Wilmington, DE, USA) using the scikit-learn 0.19.1 and Keras 2.1.6 packages. We performed all computations on a personal computer with a 64-bit Windows 10 operating system (Microsoft, Redmond, WA, USA), an Intel(R) Core(TM) i5-5300U CPU @ 2.30 GHz (Intel Corporation, Santa Clara, CA, USA), and 8.0 GB of RAM.

Results
This section provides the details of our findings with respect to the performance of each model along with experimental values of our evaluation metrics.

Convolutional Neural Network (First Stage of Ensemble Learning)
We built several neural network models with one and two hidden layers, each trained with two different numbers of epochs (10 and 50). The activation function used in this study is ReLU, and we used Keras with the TensorFlow library. We used the Sequential class from the Keras library and applied the Adam (Adaptive Moment Estimation) optimizer. The train-test split function from the scikit-learn library was used to randomly split the data into train/test samples.
The dataset was trained using a simple CNN architecture comprising 10 sequential layers: two 1D convolutional layers followed by 1D max-pooling, and again two 1D convolutional layers followed by 1D max-pooling. The final stage has one flattening layer and two dense layers, with ReLU and Softmax activation functions, respectively, and a dropout layer between the two dense layers. The initial input size is the maximum input size for this dataset, which is 12 × 20. We used two different numbers of epochs (10 and 50) to train the network in all experiments. Training progress was at its best within 50 epochs. The hyperparameters of the CNN model are presented in Table 5.
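The 10-layer arrangement described above could look roughly like the following Keras sketch. Only the layer order, the ReLU and Softmax activations, the Adam optimizer, and the 12 × 20 input size come from the text; the filter counts, kernel size, pooling size, dense width, dropout rate, loss function, and the reading of 12 × 20 as 12 steps with 20 channels are all assumptions made for illustration, and `build_cnn` is a hypothetical helper name.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(12, 20), n_classes=2):
    """Sequential 1D CNN: 2x Conv1D, pool, 2x Conv1D, pool, flatten,
    dense (ReLU), dropout, dense (softmax) = 10 layers."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv1D(32, 3, padding="same", activation="relu"),  # assumed sizes
        layers.Conv1D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(64, 3, padding="same", activation="relu"),
        layers.Conv1D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),  # assumed rate
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # assumed loss
                  metrics=["accuracy"])
    return model

cnn = build_cnn()
```

Training would then call `cnn.fit(...)` with `epochs=10` or `epochs=50`, matching the two settings compared in the text.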

Ablation Study of Machine Learning Algorithms (Second Stage of Ensemble Learning)
We carried out an ablation study to determine the best second-stage ML algorithm. Classification was carried out using 15 ML algorithms; the results of 10 experimental runs were analyzed, and the summary of the accuracy of each model using the training-validation-test (TVT) method is presented in Table 6. The classifier performance is depicted in Figure 4. Throughout the 10 runs, the ensemble classifiers showed consistently improved results in the detection of COVID-19. The best five performances were achieved by the AdaBoost, ExtraTrees, Decision Tree, QDA, and Random Forest models, with mean accuracies of 99.28%, 99.28%, 98.5%, 94.6%, and 92.9%, respectively.

Computational Complexity
The computational complexity of the entire framework is dominated by backpropagation training of the convolutional neural network used in the first stage of the ensemble learning model. The computational cost of the 2D direct convolution is $O(M \times N \times m \times n \times F_I \times F_O)$, where M and N are the size of the input feature map, m and n are the size of the spatial two-dimensional kernels, and $F_I$ and $F_O$ are the numbers of input and output channels within a layer, respectively [65]. The computational complexity of the best machine learning algorithm used in the second stage of the ensemble learning model (i.e., the ExtraTrees classifier) depends linearly on the number of attributes, which is not high. Formally, it is equal to $O(n \times p \times n_{trees})$, where n is the number of training samples, p is the number of features, and $n_{trees}$ is the number of trees.
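To give a feel for the magnitudes involved, the sketch below evaluates the two cost expressions for one hypothetical configuration. The 12 × 20 input map, the 168 training samples, and the 16 features appear in this paper; the 3 × 3 kernel, the channel counts, and the 100 trees are illustrative assumptions, and both function names are hypothetical.

```python
def conv2d_cost(M, N, m, n, f_in, f_out):
    """Operation count proportional to M * N * m * n * F_I * F_O."""
    return M * N * m * n * f_in * f_out

def extratrees_cost(n_samples, n_features, n_trees):
    """Operation count proportional to n * p * n_trees."""
    return n_samples * n_features * n_trees

# One convolutional layer: 12 x 20 input, 3 x 3 kernel, 1 -> 32 channels (assumed).
conv_ops = conv2d_cost(M=12, N=20, m=3, n=3, f_in=1, f_out=32)

# ExtraTrees: 168 training samples, 16 features, 100 trees (tree count assumed).
tree_ops = extratrees_cost(n_samples=168, n_features=16, n_trees=100)
```

Both counts are upper-bound style estimates that ignore constant factors, so they are useful for comparing configurations rather than for predicting run time.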

Statistical Analysis
To rank the methods, we applied the non-parametric Friedman test and the post hoc Nemenyi test. The Nemenyi test returns the critical difference (CD), which is used to evaluate the significance of the difference between the mean ranks of the methods, as presented in Figure 5. If the difference between the mean ranks of two methods is smaller than the CD value, then it is considered not statistically significant. The results of the Nemenyi test show that the ExtraTrees and AdaBoost final-stage classifiers achieved the best performance of 99.28% in accuracy. The result is significantly better than the performance of all other classifiers, except for Decision Tree, which achieved an accuracy of 94.64%.
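The ranking procedure can be sketched as follows, using SciPy for the Friedman test and the usual critical-difference formula CD = q_alpha * sqrt(k(k + 1) / (6N)) for k methods compared over N runs. The scores are synthetic, the helper name `friedman_nemenyi` is hypothetical, and the value q_alpha = 2.343 is the tabulated Studentized-range-based constant for k = 3 at the 0.05 level; a different number of methods needs the corresponding tabulated value.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_nemenyi(scores, q_alpha):
    """scores: array of shape (n_runs, n_methods), higher is better.
    Returns the Friedman p-value, the mean rank of each method
    (rank 1 = best), and the Nemenyi critical difference."""
    scores = np.asarray(scores, dtype=float)
    n_runs, n_methods = scores.shape
    _, p_value = friedmanchisquare(*scores.T)
    # Rank methods within each run; ties receive the average rank.
    ranks = np.vstack([rankdata(-row) for row in scores])
    mean_ranks = ranks.mean(axis=0)
    cd = q_alpha * np.sqrt(n_methods * (n_methods + 1) / (6.0 * n_runs))
    return p_value, mean_ranks, cd

# Synthetic accuracies of three methods over 10 runs (not results from this study).
demo_scores = np.tile([0.99, 0.95, 0.90], (10, 1))
p_value, mean_ranks, cd = friedman_nemenyi(demo_scores, q_alpha=2.343)
```

Two methods are then judged significantly different when their mean ranks differ by more than `cd`.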

Comparison with Previous Studies
For further evaluation of our proposed ensemble learning-based method, we benchmarked the results of our models against previous studies that used the same dataset and the same performance metrics. The proposed model shows a significant improvement compared to existing state-of-the-art studies [27,66–68], including a study that applied a hybrid fuzzy inference engine and DNN [66] and a similar study by Brinati et al. [27], which used a three-way random forest classifier to predict COVID-19 using the RT-PCR dataset. In another study, Chadaga et al. [68] used SMOTE for oversampling and then evaluated four machine learning algorithms (Random Forest, Logistic Regression, KNN, and XGBoost), with hyperparameters optimized using grid search. Their best accuracy, 92%, was achieved with the Random Forest classifier. A summary of the related studies, with the model type and performance metrics, is shown in Figure 6. Our proposed model is compared with a hybrid fuzzy inference engine and deep neural network (HDS) approach [66], a three-way random forest classifier (TWFR) approach [27], and Random Forest (RF) [67,68].

Figure 6. Comparison of results with previous studies. Our proposed model is compared with a three-way random forest classifier (TWFR) approach [27], Random Forest (RF) [67], SMOTE + RF [68], and a hybrid fuzzy inference engine and deep neural network (HDS) [66].

Discussion and Conclusions
Early and effective methods for the detection of COVID-19 are extremely important in this era of global pandemic, and the application of artificial intelligence methods can significantly improve prediction and assist physicians in the decision-making process. In this paper, we showed the viability and clinical soundness of using blood sample test analysis and machine learning as an alternative to the commonly used RT-PCR test for classifying COVID-19-positive patients. This is particularly useful in countries suffering from a scarcity of RT-PCR reagents and specialist laboratories, such as developing countries.
This paper presents a simple, staged approach to the detection of the COVID-19 disease using a small dataset. In addition to the small size of the dataset, the problems of missing values, outliers, and class imbalance were also addressed. We explored and analyzed 15 machine learning algorithms, and the experiments were run 10 times on the train-validate-test (TVT) datasets. After the 10 runs, we computed the mean metrics and the TVT cross-validation accuracy. The five best models, in descending order, were AdaBoost, ExtraTrees, Decision Tree, QDA, and Random Forest, with 99.28%, 99.28%, 98.5%, 94.6%, and 92.9%, respectively. In addition, the mean AUC value was 99.48% for ExtraTrees, 98.88% for AdaBoost, and 93.72% for Decision Tree.
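The repeated train-validate-test procedure can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used in this study: the 70/15/15 split proportions, the per-run seeding, and the function names `tvt_split`, `repeated_tvt`, and `fit_and_score` are all hypothetical.

```python
import random
from statistics import mean


def tvt_split(n_samples, seed, train=0.70, validate=0.15):
    """Shuffle sample indices and cut them into train/validate/test parts.

    The 70/15/15 proportions are an assumption made for this sketch.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    a = round(n_samples * train)
    b = a + round(n_samples * validate)
    return idx[:a], idx[a:b], idx[b:]


def repeated_tvt(n_samples, fit_and_score, n_runs=10):
    """Call fit_and_score(train, validate, test) once per run and average.

    fit_and_score is a user-supplied callback that trains a model on the
    given index lists and returns a single metric such as test accuracy.
    """
    scores = [fit_and_score(*tvt_split(n_samples, seed))
              for seed in range(n_runs)]
    return mean(scores), scores
```

A different seed per run gives a different partition each time, so the reported mean reflects variation across splits and not the outcome of a single favorable split.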
On the basis of our study, we can argue that our proposed ensemble model outperforms the state-of-the-art methods, as shown in the comparison with previous studies (Figure 6). The COVID-19 early detection ML system based on blood tests offers a quick, simple, and cheaper alternative to detection based on imaging scans. Our results show the great potential of machine learning for the detection of the COVID-19 disease. We intend to explore other medical disease domains using some of our best-performing models in combination with deep models in clinical settings.
This study has several limitations, which also suggest future directions. First, only one feature selection technique was applied; exploring other feature selection methods could improve the results of other machine learning models and thereby increase classification accuracy. Second, adopting data augmentation methods could aid the training of machine learning methods, with a focus on improving other state-of-the-art single models. Finally, more effective deep learning methods need to be explored to reduce overfitting.

Data Availability Statement:
The dataset used in this study is available from https://zenodo.org/record/3886927#.Yc6feGiOmUk (accessed on 9 February 2022).

Conflicts of Interest:
The authors declare no conflict of interest.