# Semi-Supervised KPCA-Based Monitoring Techniques for Detecting COVID-19 Infection through Blood Tests


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Kernel PCA Model

#### 2.2. One-Class SVM

- Radial basis function (RBF) kernel:$$\mathcal{K}(\mathbf{x},{\mathbf{x}}^{\prime})=\langle \mathsf{\Psi}\left(\mathbf{x}\right),\mathsf{\Psi}\left({\mathbf{x}}^{\prime}\right)\rangle ={e}^{-\alpha \parallel \mathbf{x}-{\mathbf{x}}^{\prime}{\parallel}^{2}},\quad \alpha > 0,$$
- Linear kernel:$$K(\mathbf{x},{\mathbf{x}}^{\prime})={\mathbf{x}}^{T}\cdot {\mathbf{x}}^{\prime}.$$
- Polynomial kernel:$$K(\mathbf{x},{\mathbf{x}}^{\prime})={(\gamma \cdot {\mathbf{x}}^{T}\cdot {\mathbf{x}}^{\prime}+r)}^{d},$$
- Sigmoid kernel:$$K(\mathbf{x},{\mathbf{x}}^{\prime})=\tanh(\gamma \cdot {\mathbf{x}}^{T}\cdot {\mathbf{x}}^{\prime}+r),$$
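For concreteness, the four kernels above can be evaluated directly in NumPy. This is a minimal sketch; the parameter values ($\alpha$, $\gamma$, $r$, $d$) are illustrative defaults, not the settings tuned in this study:

```python
import numpy as np

def rbf_kernel(x, xp, alpha=1.0):
    # K(x, x') = exp(-alpha * ||x - x'||^2); the negative exponent makes
    # the kernel decay with distance, so K(x, x) = 1 is the maximum.
    return np.exp(-alpha * np.sum((x - xp) ** 2))

def linear_kernel(x, xp):
    # K(x, x') = x^T x'
    return x @ xp

def poly_kernel(x, xp, gamma=1.0, r=1.0, d=3):
    # K(x, x') = (gamma * x^T x' + r)^d
    return (gamma * (x @ xp) + r) ** d

def sigmoid_kernel(x, xp, gamma=0.01, r=0.0):
    # K(x, x') = tanh(gamma * x^T x' + r), always in (-1, 1)
    return np.tanh(gamma * (x @ xp) + r)

x = np.array([1.0, 2.0])
xp = np.array([2.0, 0.0])
values = [rbf_kernel(x, xp), linear_kernel(x, xp),
          poly_kernel(x, xp), sigmoid_kernel(x, xp)]
```

In an OCSVM these functions replace the explicit mapping $\mathsf{\Psi}$: only inner products in feature space are ever needed.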

#### 2.3. The Proposed KPCA-OCSVM Anomaly Detection Approach

#### 2.4. Benchmark Methods

#### Dimensionality Reduction Methods

**PCA model:** PCA is a linear dimensionality reduction technique that seeks the linear combinations of the input variables that explain the most variance in the data [28]. PCA identifies the directions in the data that carry the most information and projects the data onto these directions to reduce the dimensionality [29]. PCA has a time complexity of O(${n}^{3}$) for both the training and testing phases, because it relies on computing a covariance matrix and its eigenvalues/eigenvectors. Specifically, in the training phase, PCA calculates the $n \times n$ covariance matrix of the input data and then computes its eigenvectors and eigenvalues; both steps have O(${n}^{3}$) time complexity. In the testing phase, once the PCA model is trained, projecting new data points onto the principal components requires a matrix multiplication, which in the worst case also has O(${n}^{3}$) time complexity.
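The training steps just described (covariance matrix, eigendecomposition, projection) can be sketched with plain NumPy. The toy data and the number of retained components below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # toy data: 200 samples, 5 features
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]  # make two features correlated

# Training phase: covariance matrix + eigendecomposition (the costly steps)
Xc = X - X.mean(axis=0)                  # center the data
C = np.cov(Xc, rowvar=False)             # covariance matrix
vals, vecs = np.linalg.eigh(C)           # eigenvalues/eigenvectors
order = np.argsort(vals)[::-1]           # sort by explained variance
vals, vecs = vals[order], vecs[:, order]

# Testing phase: projection is a single matrix multiplication
k = 2                                    # number of retained components
scores = Xc @ vecs[:, :k]
```

The leading eigenvalues measure how much variance each retained direction explains, which is how the number of components $k$ is usually chosen in practice.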

**ICA model:** ICA is also a linear method, but it seeks a linear combination of the original variables such that the resulting components are statistically independent [30]. ICA is often used for blind source separation, i.e., separating a multivariate signal into independent, non-Gaussian components. Essentially, ICA uses kurtosis, which measures the peakedness of a distribution, to find non-Gaussian and independent components [31]: it assumes that the underlying sources of the data are non-Gaussian and independent, and kurtosis is a robust measure of non-Gaussianity. The time complexity of ICA depends on the algorithm used. Popular algorithms, such as FastICA, have a time complexity of $O\left({n}^{2}p\right)$, where $n$ is the number of samples and $p$ is the number of features; others, such as Infomax, have a time complexity of $O(np\log n)$. It is worth noting that the cost can also depend on the implementation and on dataset details, such as the number of independent components being estimated. Table 1 compares the main features of the three investigated dimensionality reduction methods: KPCA, PCA, and ICA.
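To make the kurtosis-based search concrete, here is a minimal one-unit FastICA fixed-point iteration with the kurtosis nonlinearity $g(u)=u^{3}$, run on a synthetic two-source mixture. The sources, mixing matrix, and iteration count are illustrative, not taken from this study:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
s1 = np.sign(rng.normal(size=n)) * rng.exponential(size=n)  # super-Gaussian source
s2 = rng.uniform(-1, 1, size=n)                             # sub-Gaussian source
A = np.array([[1.0, 0.5], [0.5, 1.0]])                      # mixing matrix
X = np.c_[s1, s2] @ A.T                                     # observed mixtures

# Whiten the observations (zero mean, identity covariance)
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
d, E = np.linalg.eigh(C)
Z = Xc @ E @ np.diag(d ** -0.5)

# One-unit FastICA fixed point: w <- E[z (w^T z)^3] - 3w, then renormalize
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    u = Z @ w
    w_new = (Z * (u ** 3)[:, None]).mean(axis=0) - 3 * w
    w_new /= np.linalg.norm(w_new)
    if abs(abs(w_new @ w) - 1.0) < 1e-9:  # converged (up to sign)
        w = w_new
        break
    w = w_new

y = Z @ w  # one recovered independent component
```

After convergence, the recovered component should align (up to sign and scale) with the most non-Gaussian source, which is exactly the behavior the kurtosis criterion targets.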

#### 2.5. Semi-Supervised Anomaly Detection Methods

**Elliptical envelope (EE):** EE is a density-based anomaly detection algorithm that assumes the data are generated from a Gaussian distribution. The algorithm fits an ellipse to the data, and any point outside this ellipse is considered an anomaly [32]. EE is sensitive to the shape of the data distribution and is not suitable for data that are not Gaussian [33]. The time complexity of EE is $O\left({n}^{3}\right)$ for both the training and testing phases.
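EE's decision rule amounts to thresholding a Mahalanobis distance under a fitted Gaussian. A minimal NumPy sketch (the training data are synthetic, and the cutoff 9.35 is the chi-square 97.5% quantile for 3 features, an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 3))      # "normal" operating data only

# Fit the Gaussian envelope: mean and covariance of the training data
mu = X_train.mean(axis=0)
cov = np.cov(X_train, rowvar=False)
cov_inv = np.linalg.inv(cov)

def mahalanobis2(x):
    # Squared Mahalanobis distance of x from the fitted Gaussian
    d = x - mu
    return d @ cov_inv @ d

# Points whose squared distance exceeds the threshold fall outside the
# ellipse and are flagged as anomalies.
threshold = 9.35  # chi-square(3 dof) 97.5% quantile
inlier = mahalanobis2(np.zeros(3)) <= threshold
outlier = mahalanobis2(np.full(3, 6.0)) > threshold
```

A robust implementation would estimate `mu` and `cov` with the minimum covariance determinant [32] rather than the plain sample moments used here.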

**Local outlier factor (LOF):** LOF is a density-based anomaly detection algorithm that compares the local density of a point with that of its surrounding points [34]. A point is considered an anomaly if its local density is significantly lower than the density of its neighbors [35]. The time complexity of LOF is $O\left({n}^{2}\right)$ for both the training and testing phases.
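The local-density comparison can be computed directly from pairwise distances. This is a compact sketch of the standard LOF definition; the neighborhood size `k = 5` and the toy data are illustrative:

```python
import numpy as np

def lof_scores(X, k=5):
    # Pairwise Euclidean distances, with self-distances excluded
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :k]                          # k nearest neighbours
    kdist = np.take_along_axis(D, knn[:, -1:], axis=1).ravel()  # k-distance of each point
    # Reachability distance: max(k-distance of the neighbour, actual distance)
    reach = np.maximum(kdist[knn], np.take_along_axis(D, knn, axis=1))
    lrd = 1.0 / reach.mean(axis=1)                              # local reachability density
    # LOF: average neighbour density divided by the point's own density;
    # values well above 1 indicate a locally sparse (anomalous) point.
    return lrd[knn].mean(axis=1) / lrd

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(size=(50, 2)), [[8.0, 8.0]]])  # cluster + one far outlier
scores = lof_scores(X)
```

Points inside the cluster get scores near 1, while the isolated point receives a much larger score, which is the ratio interpretation that makes LOF robust to clusters of differing density.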

**Isolation forest (iForest):** iForest is a tree-based anomaly detection algorithm. It builds a forest of isolation trees, where each tree splits the data on a randomly selected feature at a random split value [36]. The goal is to isolate anomalies by creating shorter paths for abnormal points and longer paths for normal points [37]. The iForest algorithm has a computational cost of $O(tl\log l)$ during the training phase and $O(ntl\log l)$ during the testing phase, where $l$ is the subsampling size of the dataset, $n$ is the number of samples in the dataset, and $t$ is the number of trees in the forest, as reported in [38]. It is worth noting that for optimal detection performance, $l$ should be kept small and consistent across different datasets. Table 2 lists the main advantages and shortcomings of the four investigated anomaly detection methods: OCSVM, LOF, iForest, and EE.
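A minimal isolation tree/forest can be sketched as follows; $c(l)$ is the standard normalization constant that turns average path lengths into anomaly scores in $(0,1)$. The subsampling size, tree count, and depth limit below are illustrative choices, not the settings used in this study:

```python
import numpy as np

def c(n):
    # Average path length of an unsuccessful BST search; normalises depths
    if n <= 1:
        return 0.0
    return 2.0 * (np.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def grow(X, rng, depth=0, max_depth=8):
    # Recursively split on a random feature at a random value
    n = len(X)
    if n <= 1 or depth >= max_depth:
        return ("leaf", n)
    f = rng.integers(X.shape[1])
    lo, hi = X[:, f].min(), X[:, f].max()
    if lo == hi:
        return ("leaf", n)
    s = rng.uniform(lo, hi)
    mask = X[:, f] < s
    return ("node", f, s,
            grow(X[mask], rng, depth + 1, max_depth),
            grow(X[~mask], rng, depth + 1, max_depth))

def path_len(tree, x, depth=0):
    if tree[0] == "leaf":
        return depth + c(tree[1])  # credit unexpanded subtrees their average depth
    _, f, s, left, right = tree
    return path_len(left if x[f] < s else right, x, depth + 1)

def score(forest, x, l):
    # Anomaly score 2^(-E[h(x)] / c(l)); values near 1 indicate anomalies
    h = np.mean([path_len(t, x) for t in forest])
    return 2.0 ** (-h / c(l))

rng = np.random.default_rng(4)
X = rng.normal(size=(256, 2))
forest = [grow(X[rng.choice(256, 128, replace=False)], rng) for _ in range(50)]
s_in = score(forest, np.zeros(2), 128)           # central, "normal" point
s_out = score(forest, np.array([6.0, 6.0]), 128)  # far outlier
```

Because the outlier sits outside the bulk of the data, random splits isolate it after only a few levels, so its average path length is short and its score is high.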

## 3. Results and Discussion

#### 3.1. Description of the Used Data

#### 3.1.1. Dataset 1

#### 3.1.2. Dataset 2

#### 3.2. Detection Results

#### 3.3. Comparison with the Existing Methods

#### 3.4. Feature Importance Identification

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Kistenev, Y.V.; Vrazhnov, D.A.; Shnaider, E.E.; Zuhayri, H. Predictive models for COVID-19 detection using routine blood tests and machine learning. Heliyon **2022**, 8, e11185.
2. Day, M. COVID-19: Identifying and isolating asymptomatic people helped eliminate virus in Italian village. BMJ Br. Med. J. **2020**, 368, m1165.
3. Rikan, S.B.; Azar, A.S.; Ghafari, A.; Mohasefi, J.B.; Pirnejad, H. COVID-19 diagnosis from routine blood tests using artificial intelligence techniques. Biomed. Signal Process. Control **2022**, 72, 103263.
4. Chadaga, K.; Prabhu, S.; Vivekananda Bhat, K.; Umakanth, S.; Sampathila, N. Medical diagnosis of COVID-19 using blood tests and machine learning. J. Phys. Conf. Ser. **2022**, 2161, 012017.
5. Lee, Y.; Kim, Y.S.; Lee, D.I.; Jeong, S.; Kang, G.H.; Jang, Y.S.; Kim, W.; Choi, H.Y.; Kim, J.G.; Choi, S.H. The application of a deep learning system developed to reduce the time for RT-PCR in COVID-19 detection. Sci. Rep. **2022**, 12, 1234.
6. Loddo, A.; Meloni, G.; Pes, B. Using Artificial Intelligence for COVID-19 Detection in Blood Exams: A Comparative Analysis. IEEE Access **2022**, 10, 119593–119606.
7. Wang, B.; Zhao, Y.; Chen, C.P. Hybrid Transfer Learning and Broad Learning System for Wearing Mask Detection in the COVID-19 Era. IEEE Trans. Instrum. Meas. **2021**, 70, 5009612.
8. Sharma, R.R.; Kumar, M.; Maheshwari, S.; Ray, K.P. EVDHM-ARIMA-based time series forecasting model and its application for COVID-19 cases. IEEE Trans. Instrum. Meas. **2020**, 70, 6502210.
9. Lam, C.; Tso, C.F.; Green-Saxena, A.; Pellegrini, E.; Iqbal, Z.; Evans, D.; Hoffman, J.; Calvert, J.; Mao, Q.; Das, R.; et al. Semisupervised deep learning techniques for predicting acute respiratory distress syndrome from time-series clinical data: Model development and validation study. JMIR Form. Res. **2021**, 5, e28028.
10. Wu, W.; Shi, J.; Yu, H.; Wu, W.; Vardhanabhuti, V. Tensor gradient L${}_{0}$-norm minimization-based low-dose CT and its application to COVID-19. IEEE Trans. Instrum. Meas. **2021**, 70, 4503012.
11. Han, C.H.; Kim, M.; Kwak, J.T. Semi-supervised learning for an improved diagnosis of COVID-19 in CT images. PLoS ONE **2021**, 16, e0249450.
12. Khobahi, S.; Agarwal, C.; Soltanalian, M. CoroNet: A deep network architecture for semi-supervised task-based identification of COVID-19 from chest X-ray images. MedRxiv **2020**.
13. Brunese, L.; Mercaldo, F.; Reginelli, A.; Santone, A. Explainable deep learning for pulmonary disease and coronavirus COVID-19 detection from X-rays. Comput. Methods Programs Biomed. **2020**, 196, 105608.
14. Dairi, A.; Harrou, F.; Sun, Y. Deep generative learning-based 1-SVM detectors for unsupervised COVID-19 infection detection using blood tests. IEEE Trans. Instrum. Meas. **2021**, 71, 2500211.
15. Alves, M.A.; Castro, G.Z.; Oliveira, B.A.S.; Ferreira, L.A.; Ramírez, J.A.; Silva, R.; Guimarães, F.G. Explaining machine learning based diagnosis of COVID-19 from routine blood tests with decision trees and criteria graphs. Comput. Biol. Med. **2021**, 132, 104335.
16. AlJame, M.; Ahmad, I.; Imtiaz, A.; Mohammed, A. Ensemble learning model for diagnosing COVID-19 from routine blood tests. Inform. Med. Unlocked **2020**, 21, 100449.
17. Sargiani, V.; De Souza, A.A.; De Almeida, D.C.; Barcelos, T.S.; Munoz, R.; Da Silva, L.A. Supporting Clinical COVID-19 Diagnosis with Routine Blood Tests Using Tree-Based Entropy Structured Self-Organizing Maps. Appl. Sci. **2022**, 12, 5137.
18. Brinati, D.; Campagner, A.; Ferrari, D.; Locatelli, M.; Banfi, G.; Cabitza, F. Detection of COVID-19 infection from routine blood exams with machine learning: A feasibility study. J. Med. Syst. **2020**, 44, 135.
19. de Freitas Barbosa, V.A.; Gomes, J.C.; de Santana, M.A.; Albuquerque, J.E.d.A.; de Souza, R.G.; de Souza, R.E.; dos Santos, W.P. Heg.IA: An intelligent system to support diagnosis of COVID-19 based on blood tests. Res. Biomed. Eng. **2022**, 38, 99–116.
20. Aktar, S.; Ahamad, M.M.; Rashed-Al-Mahfuz, M.; Azad, A.; Uddin, S.; Kamal, A.; Alyami, S.A.; Lin, P.I.; Islam, S.M.S.; Quinn, J.M.; et al. Machine Learning Approach to Predicting COVID-19 Disease Severity Based on Clinical Blood Test Data: Statistical Analysis and Model Development. JMIR Med. Inform. **2021**, 9, e25884.
21. Choi, S.W.; Lee, C.; Lee, J.M.; Park, J.H.; Lee, I.B. Fault detection and identification of nonlinear processes based on kernel PCA. Chemom. Intell. Lab. Syst. **2005**, 75, 55–67.
22. Hejazi, M.; Singh, Y.P. One-class support vector machines approach to anomaly detection. Appl. Artif. Intell. **2013**, 27, 351–366.
23. Harrou, F.; Sun, Y.; Hering, A.S.; Madakyaru, M.; Dairi, A. Unsupervised deep learning-based process monitoring methods. In Statistical Process Monitoring Using Advanced Data-Driven and Deep Learning Approaches; Elsevier: Amsterdam, The Netherlands, 2021; pp. 193–223.
24. Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution. Neural Comput. **2001**, 13, 1443–1471.
25. Alam, S.; Sonbhadra, S.K.; Agarwal, S.; Nagabhushan, P. One-class support vector classifiers: A survey. Knowl.-Based Syst. **2020**, 196, 105754.
26. Sebald, D.J.; Bucklew, J.A. Support vector machine techniques for nonlinear equalization. IEEE Trans. Signal Process. **2000**, 48, 3217–3226.
27. Schölkopf, B.; Smola, A.; Müller, K.R. Kernel principal component analysis. In Artificial Neural Networks—ICANN'97: 7th International Conference, Lausanne, Switzerland, 8–10 October 1997, Proceedings; Springer: Berlin/Heidelberg, Germany, 2005; pp. 583–588.
28. Harrou, F.; Kadri, F.; Khadraoui, S.; Sun, Y. Ozone measurements monitoring using data-based approach. Process. Saf. Environ. Prot. **2016**, 100, 220–231.
29. Harrou, F.; Nounou, M.N.; Nounou, H.N.; Madakyaru, M. Statistical fault detection using PCA-based GLR hypothesis testing. J. Loss Prev. Process. Ind. **2013**, 26, 129–139.
30. Kong, X.; Yang, Z.; Luo, J.; Li, H.; Yang, X. Extraction of Reduced Fault Subspace Based on KDICA and Its Application in Fault Diagnosis. IEEE Trans. Instrum. Meas. **2022**, 71, 3505212.
31. Hyvärinen, A.; Oja, E. Independent component analysis: Algorithms and applications. Neural Netw. **2000**, 13, 411–430.
32. Rousseeuw, P.J.; Driessen, K.V. A fast algorithm for the minimum covariance determinant estimator. Technometrics **1999**, 41, 212–223.
33. Dairi, A.; Zerrouki, N.; Harrou, F.; Sun, Y. EEG-Based Mental Tasks Recognition via a Deep Learning-Driven Anomaly Detector. Diagnostics **2022**, 12, 2984.
34. Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 15–18 May 2000; pp. 93–104.
35. Dairi, A.; Harrou, F.; Sun, Y. Efficient Driver Drunk Detection by Sensors: A Manifold Learning-Based Anomaly Detector. IEEE Access **2022**, 10, 119001–119012.
36. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
37. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data **2012**, 6, 3.
38. Chabchoub, Y.; Togbe, M.U.; Boly, A.; Chiky, R. An in-depth study and improvement of Isolation Forest. IEEE Access **2022**, 10, 10219–10237.
39. Data4u, E. Diagnosis of COVID-19 and Its Clinical Spectrum AI and Data Science Supporting Clinical Decisions (from 28 March to 3 April). Kaggle. Available online: https://www.kaggle.com/einsteindata4u/covid19 (accessed on 24 July 2021).
40. Banerjee, A.; Ray, S.; Vorselaars, B.; Kitson, J.; Mamalakis, M.; Weeks, S.; Baker, M.; Mackenzie, L.S. Use of machine learning and artificial intelligence to predict SARS-CoV-2 infection from full blood counts in a population. Int. Immunopharmacol. **2020**, 86, 106705.
41. Cabitza, F.; Campagner, A.; Ferrari, D.; Di Resta, C.; Ceriotti, D.; Sabetta, E.; Colombini, A.; De Vecchi, E.; Banfi, G.; Locatelli, M.; et al. Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests. Clin. Chem. Lab. Med. **2021**, 59, 421–431.
42. de Moraes Batista, A.F.; Miraglia, J.L.; Donato, T.H.R.; Chiavegatto Filho, A.D.P. COVID-19 diagnosis prediction in emergency care patients: A machine learning approach. MedRxiv **2020**.
43. Kukar, M.; Gunčar, G.; Vovko, T.; Podnar, S.; Černelč, P.; Brvar, M.; Zalaznik, M.; Notar, M.; Moškon, S.; Notar, M. COVID-19 diagnosis by routine blood tests using machine learning. Sci. Rep. **2021**, 11, 10738.
44. Wang, D.; Hu, B.; Hu, C.; Zhu, F.; Liu, X.; Zhang, J.; Wang, B.; Xiang, H.; Cheng, Z.; Xiong, Y.; et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus–infected pneumonia in Wuhan, China. JAMA **2020**, 323, 1061–1069.
45. Chen, N.; Zhou, M.; Dong, X.; Qu, J.; Gong, F.; Han, Y.; Qiu, Y.; Wang, J.; Liu, Y.; Wei, Y.; et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: A descriptive study. Lancet **2020**, 395, 507–513.
46. Zhang, C.; Shi, L.; Wang, F.S. Liver injury in COVID-19: Management and challenges. Lancet Gastroenterol. Hepatol. **2020**, 5, 428–430.
47. Lippi, G.; Plebani, M. Laboratory abnormalities in patients with COVID-2019 infection. Clin. Chem. Lab. Med. **2020**, 58, 1131–1134.
48. Kadri, F.; Dairi, A.; Harrou, F.; Sun, Y. Towards accurate prediction of patient length of stay at emergency department: A GAN-driven deep learning framework. J. Ambient. Intell. Humaniz. Comput. **2022**, 1–15.

**Figure 6.** Histogram of 400 bootstrapped outcomes with a 95% confidence interval for each evaluation metric: (**a**) accuracy, (**b**) precision, (**c**) recall, (**d**) F1-score, and (**e**) AUC, calculated from the testing Dataset 1.

**Figure 7.** Histogram of 400 bootstrapped outcomes with a 95% confidence interval for each evaluation metric: (**a**) accuracy, (**b**) precision, (**c**) recall, (**d**) F1-score, and (**e**) AUC, calculated from the testing Dataset 2.

**Table 1.** Comparison of the three investigated dimensionality reduction methods.

| Method | Time Complexity | Linear/Nonlinear | Data Distribution |
|---|---|---|---|
| KPCA | O(${n}^{3}$) or O(${n}^{2}$) | Nonlinear | Non-Gaussian |
| PCA | O(${n}^{3}$) or O(${n}^{2}$) | Linear | Gaussian |
| ICA | O(${n}^{3}$) | Linear | Non-Gaussian |

**Table 2.** Advantages and shortcomings of the four investigated anomaly detection methods.

| Approach | Advantages | Shortcomings |
|---|---|---|
| OCSVM | Assumption-free about the data distribution | Sensitive to the choice of the kernel function and its parameters |
| iForest | Supports high-dimensional data and large datasets | Prone to overfitting and may not handle circular patterns well |
| LOF | Can handle multi-dimensional data | Sensitive to the choice of parameters and may not handle noisy data well |
| EE | Effective when normal data follow a Gaussian distribution | Sensitive to the choice of parameters and assumes the data are Gaussian |

| Feature | Abbreviation |
|---|---|
| Hemoglobin | HGB |
| Platelets | PLT1 |
| White blood cells | WBC |
| Lymphocyte count | LYT |
| Basophils count | BAT |
| Eosinophil count | EOT |
| Neutrophil count | NET |
| Monocyte count | MOT |
| Urea | Urea |
| Alanine aminotransferase | ALT |
| Aspartate aminotransferase | AST |

| Approach | TPR | FPR | Accuracy | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|---|---|---|
| KPCA-OCSVM${}_{\mathrm{RBF}}$ | 1.00 | 0.03 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 |
| KPCA-OCSVM${}_{\mathrm{Poly}}$ | 0.37 | 0.40 | 0.51 | 0.35 | 0.37 | 0.36 | 0.48 |
| KPCA-OCSVM${}_{\mathrm{Sig}}$ | 0.34 | 0.44 | 0.45 | 0.43 | 0.34 | 0.38 | 0.45 |
| KPCA-OCSVM${}_{\mathrm{Lin}}$ | 0.46 | 0.35 | 0.58 | 0.44 | 0.46 | 0.45 | 0.56 |
| KPCA-iForest | 0.93 | 0.83 | 0.68 | 0.70 | 0.93 | 0.80 | 0.55 |
| KPCA-LOF | 0.95 | 0.45 | 0.91 | 0.95 | 0.95 | 0.95 | 0.75 |
| KPCA-EE | 1.00 | 0.47 | 0.91 | 0.90 | 1.00 | 0.95 | 0.76 |
| PCA-OCSVM${}_{\mathrm{RBF}}$ | 1.00 | 0.03 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 |
| PCA-OCSVM${}_{\mathrm{Poly}}$ | 0.39 | 0.39 | 0.52 | 0.41 | 0.39 | 0.40 | 0.50 |
| PCA-OCSVM${}_{\mathrm{Sig}}$ | 0.33 | 0.45 | 0.44 | 0.43 | 0.33 | 0.37 | 0.44 |
| PCA-OCSVM${}_{\mathrm{Lin}}$ | 0.20 | 0.61 | 0.29 | 0.28 | 0.20 | 0.23 | 0.29 |
| PCA-iForest | 0.91 | 0.87 | 0.66 | 0.69 | 0.91 | 0.79 | 0.52 |
| PCA-LOF | 0.94 | 0.52 | 0.90 | 0.94 | 0.94 | 0.94 | 0.71 |
| PCA-EE | 1.00 | 0.48 | 0.91 | 0.90 | 1.00 | 0.95 | 0.76 |
| ICA-OCSVM${}_{\mathrm{RBF}}$ | 1.00 | 0.03 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 |
| ICA-OCSVM${}_{\mathrm{Poly}}$ | 0.39 | 0.39 | 0.51 | 0.43 | 0.39 | 0.41 | 0.50 |
| ICA-OCSVM${}_{\mathrm{Sig}}$ | 0.39 | 0.39 | 0.48 | 0.56 | 0.39 | 0.46 | 0.50 |
| ICA-OCSVM${}_{\mathrm{Lin}}$ | 0.14 | 0.80 | 0.16 | 0.22 | 0.14 | 0.17 | 0.17 |
| ICA-iForest | 0.91 | 0.87 | 0.73 | 0.78 | 0.91 | 0.84 | 0.52 |
| ICA-LOF | 0.94 | 0.48 | 0.90 | 0.95 | 0.94 | 0.95 | 0.73 |
| ICA-EE | 1.00 | 0.36 | 0.94 | 0.94 | 1.00 | 0.97 | 0.82 |

| Approach | TPR | FPR | Accuracy | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|---|---|---|
| KPCA-OCSVM${}_{\mathrm{RBF}}$ | 0.97 | 0.00 | 0.99 | 1.00 | 0.97 | 0.99 | 0.99 |
| KPCA-OCSVM${}_{\mathrm{Poly}}$ | 0.71 | 0.98 | 0.23 | 0.24 | 0.71 | 0.36 | 0.36 |
| KPCA-OCSVM${}_{\mathrm{Sig}}$ | 0.90 | 0.90 | 0.51 | 0.51 | 0.90 | 0.65 | 0.50 |
| KPCA-OCSVM${}_{\mathrm{Lin}}$ | 0.77 | 0.96 | 0.29 | 0.29 | 0.77 | 0.43 | 0.41 |
| KPCA-iForest | 0.55 | 0.23 | 0.66 | 0.71 | 0.55 | 0.62 | 0.66 |
| KPCA-LOF | 0.81 | 0.10 | 0.86 | 0.85 | 0.81 | 0.83 | 0.85 |
| KPCA-EE | 0.73 | 0.10 | 0.82 | 0.87 | 0.73 | 0.79 | 0.82 |
| PCA-OCSVM${}_{\mathrm{RBF}}$ | 0.97 | 0.00 | 0.99 | 1.00 | 0.97 | 0.99 | 0.99 |
| PCA-OCSVM${}_{\mathrm{Poly}}$ | 0.73 | 0.96 | 0.23 | 0.22 | 0.73 | 0.34 | 0.38 |
| PCA-OCSVM${}_{\mathrm{Sig}}$ | 0.89 | 0.91 | 0.50 | 0.50 | 0.89 | 0.64 | 0.49 |
| PCA-OCSVM${}_{\mathrm{Lin}}$ | 0.62 | 0.99 | 0.16 | 0.16 | 0.62 | 0.26 | 0.32 |
| PCA-iForest | 0.60 | 0.20 | 0.70 | 0.74 | 0.60 | 0.66 | 0.70 |
| PCA-LOF | 0.78 | 0.10 | 0.85 | 0.85 | 0.78 | 0.82 | 0.84 |
| PCA-EE | 0.71 | 0.10 | 0.81 | 0.87 | 0.71 | 0.78 | 0.81 |
| ICA-OCSVM${}_{\mathrm{RBF}}$ | 0.94 | 0.01 | 0.97 | 0.99 | 0.94 | 0.96 | 0.97 |
| ICA-OCSVM${}_{\mathrm{Poly}}$ | 0.70 | 0.98 | 0.22 | 0.22 | 0.70 | 0.34 | 0.36 |
| ICA-OCSVM${}_{\mathrm{Sig}}$ | 0.87 | 0.93 | 0.49 | 0.51 | 0.87 | 0.64 | 0.47 |
| ICA-OCSVM${}_{\mathrm{Lin}}$ | 0.96 | 0.71 | 0.79 | 0.79 | 0.96 | 0.87 | 0.62 |
| ICA-iForest | 0.60 | 0.20 | 0.71 | 0.72 | 0.60 | 0.66 | 0.70 |
| ICA-LOF | 0.75 | 0.09 | 0.84 | 0.88 | 0.75 | 0.81 | 0.83 |
| ICA-EE | 0.71 | 0.10 | 0.81 | 0.87 | 0.71 | 0.78 | 0.81 |

| Refs | Dataset | Model | Metrics |
|---|---|---|---|
| [40] | Dataset 1 | RF, LR, GLMNET, and ANN | AUC = 95 |
| [42] | Dataset 1 | NN, RF, GBT, LR, and SVM | AUC = 85 |
| [19] | Dataset 1 | MLP, SVM, RT, RF, BN, and NB | Acc = 95.15%, Sens = 96.8%, Spec = 93.6% |
| [16] | Dataset 1 | XGBoost | AUC = 99.38 |
| KPCA-OCSVM | Dataset 1 | KPCA-OCSVM | AUC = 99 |
| [41] | Dataset 2 | DT-XGBoost | AUC = 85 |
| KPCA-OCSVM | Dataset 2 | KPCA-OCSVM | AUC = 99 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Harrou, F.; Dairi, A.; Dorbane, A.; Kadri, F.; Sun, Y.
Semi-Supervised KPCA-Based Monitoring Techniques for Detecting COVID-19 Infection through Blood Tests. *Diagnostics* **2023**, *13*, 1466.
https://doi.org/10.3390/diagnostics13081466
