Utilizing Machine Learning Techniques for Computer-Aided COVID-19 Screening Based on Clinical Data
Abstract
1. Introduction
2. Data Description and Preprocessing
3. Methodology
3.1. Logistic Regression
3.2. Classification and Regression Trees
3.3. Bootstrap Aggregation (Bagging)
1. Draw a bootstrap sample from the training set;
2. Fit a CART classifier to the bootstrap sample;
3. Repeat steps 1 and 2 a preselected number of times (say, 500);
4. Combine the resulting CART classifiers by majority voting to produce the final decision rule.
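The four bagging steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions:

- The small Gini-based tree is a simplified stand-in for a full CART implementation. It uses no pruning and only numeric splits of the form x[j] <= t.
- The names `grow_tree`, `bagging_fit`, and `bagging_predict` are hypothetical.
- The `max_depth` and `min_node_size` defaults and the toy data are also hypothetical choices made for this example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (impurity, feature, threshold) of the lowest weighted-Gini split x[j] <= t, or None."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # a split must produce two non-empty child nodes
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow_tree(X, y, max_depth=5, min_node_size=2, depth=0):
    """Grow a simplified CART-style classification tree; a leaf is the majority class label."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_node_size or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], max_depth, min_node_size, depth + 1),
            grow_tree([X[i] for i in right], [y[i] for i in right], max_depth, min_node_size, depth + 1))


def predict_tree(node, x):
    """Follow the splits from the root until a leaf (class label) is reached."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def bagging_fit(X, y, n_trees=500, seed=0, **tree_args):
    """Steps 1-3: fit one tree per bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], **tree_args))
    return trees


def bagging_predict(trees, x):
    """Step 4: majority vote over the individual tree predictions."""
    return Counter(predict_tree(tree, x) for tree in trees).most_common(1)[0][0]


# Toy usage: one informative feature (class 1 iff x0 >= 10) plus a constant column.
X = [[i, 0] for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]
trees = bagging_fit(X, y, n_trees=25)
print(bagging_predict(trees, [2, 0]), bagging_predict(trees, [17, 0]))
```

Averaging many trees (the 500 of step 3) mainly reduces the variance of a single, unstable tree. The toy call uses only 25 trees to keep the pure-Python example fast.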
3.4. Random Forests
1. Draw bootstrap samples from the training data and grow a decision tree on each sample;
2. When growing each tree, select m variables at random from the p available variables at each split;
3. Among the m randomly selected variables, choose the best splitting variable;
4. Split the node into two daughter nodes, and continue splitting until the minimum node size is reached;
5. Output the ensemble of trees.
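The random-forest steps differ from bagging mainly in step 2. The sketch below is a hypothetical pure-Python illustration, not the authors' code. It draws a fresh random subset of m of the p variables at every node. The following details are assumptions: they are common conventions for classification forests, not settings reported in this section.

- The default m = floor(sqrt(p)).
- The stopping rule `len(y) <= min_node_size`.
- The `max_depth` safeguard.
- Majority-vote prediction.

All function names are illustrative.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Step 3: lowest weighted-Gini split x[j] <= t, searching only the candidate features."""
    best = None  # (impurity, feature index, threshold)
    n = len(y)
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow_tree(X, y, m, rng, min_node_size=1, max_depth=20, depth=0):
    """Grow one randomized tree; a leaf is the majority class label of its node."""
    majority = Counter(y).most_common(1)[0][0]
    # Step 4 stopping rule (assumed reading): stop at small, pure, or very deep nodes.
    if len(y) <= min_node_size or len(set(y)) == 1 or depth >= max_depth:
        return majority
    features = rng.sample(range(len(X[0])), m)  # step 2: m of the p variables at random
    split = best_split(X, y, features)
    if split is None:  # no valid split among the sampled variables
        return majority
    _, j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    args = (m, rng, min_node_size, max_depth, depth + 1)
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], *args),
            grow_tree([X[i] for i in right], [y[i] for i in right], *args))


def predict_tree(node, x):
    """Route x through the splits until a leaf (class label) is reached."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def random_forest_fit(X, y, n_trees=500, m=None, min_node_size=1, seed=0):
    """Steps 1 and 5: one tree per bootstrap sample; the list of trees is the ensemble."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    if m is None:
        m = max(1, math.isqrt(p))  # assumed default: floor(sqrt(p)) variables per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng, min_node_size))
    return forest


def random_forest_predict(forest, x):
    """Classify x by majority vote over the trees in the ensemble."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


# Toy usage: both features separate the classes (class 1 iff x0 >= 10, equivalently x1 <= 10).
X = [[i, 20 - i] for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]
forest = random_forest_fit(X, y, n_trees=25)
print(random_forest_predict(forest, [2, 18]), random_forest_predict(forest, [17, 3]))
```

Setting m = p recovers bagging, since every variable is then eligible at every split. A smaller m decorrelates the trees, which is the motivation for step 2.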
3.5. Support Vector Machines
3.6. Artificial Neural Networks
3.7. Summary
4. Empirical Results and Discussion
4.1. Full Dataset/Model
4.2. Reduced Dataset/Model
4.2.1. Variable Selection
4.2.2. Results for the Reduced Model
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest









| Variable | Description | Type |
|---|---|---|
| sex | Gender (male or female) | Binary |
| ethnicity | Whether a patient is Hispanic or Latino | Binary |
| age | Patient’s age | Numerical |
| pulse | Number of pulse beats per minute | Numerical |
| pulse oximetry | Blood oxygen level (oxygen saturation) | Numerical |
| respirations | Number of breaths per minute | Numerical |
| temperature | Body temperature | Numerical |
| BP systolic | Systolic blood pressure (top number): the force the heart exerts on the walls of the arteries each time it beats | Numerical |
| BP diastolic | Diastolic blood pressure (bottom number): the force the heart exerts on the walls of the arteries between beats | Numerical |
| BMI | Measure of under-/overweight | Numerical |
| ICD-10 codes | Binary indicator (yes/no) for each ICD-10 code (for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or diseases) | Binary |

| Variable | Coefficient |
|---|---|
| sex | |
| ethnicity | |
| I10 code | |
| pulse | |
| pulse oximetry | |
| R52 code | |
| temperature | |
| E11.9 code | |
| BP diastolic | |
| BMI | |
| I50.9 code | |

| Method | Accuracy | Sensitivity | Specificity | AUC | F1 Score |
|---|---|---|---|---|---|
| Random forest | 0.718 | 0.656 | 0.746 | 0.7529 | 0.6585 |
| Bagging | 0.710 | 0.632 | 0.751 | 0.7473 | 0.6443 |
| SVM linear | 0.715 | 0.837 | 0.517 | 0.7651 | 0.6759 |
| SVM radial basis | 0.743 | 0.862 | 0.550 | 0.7828 | 0.7014 |
| SVM polynomial | 0.746 | 0.850 | 0.576 | 0.7797 | 0.7049 |
| ANN | 0.708 | 0.748 | 0.642 | 0.7295 | 0.6732 |
| CART | 0.680 | 0.586 | 0.732 | 0.6821 | 0.6038 |
| Logistic regression | 0.725 | 0.797 | 0.609 | 0.7621 | 0.6886 |
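The accuracy, sensitivity, specificity, F1, and AUC columns follow the standard confusion-matrix definitions. F1 is the harmonic mean of precision and sensitivity, 2PR/(P + R). The sketch below makes these definitions concrete under three assumptions not stated in this section:

- Labels are coded 1 for COVID-19 positive and 0 otherwise.
- F1 is taken with respect to the positive class.
- AUC is computed from continuous classifier scores.

The function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN), treating the label `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, and F1 from hard class predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc(y_true, scores, positive=1):
    """ROC AUC as the Mann-Whitney probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The Mann-Whitney form of the AUC equals the area under the empirical ROC curve. This is why AUC, unlike the other four columns, does not depend on a single classification threshold.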
| Variable | Coefficient |
|---|---|
| ethnicity | |
| pulse oximetry | |
| pulse | |
| temperature | |
| BMI | |

| Method | Accuracy | Sensitivity | Specificity | AUC | F1 Score |
|---|---|---|---|---|---|
| Random forest | 0.710 | 0.632 | 0.747 | 0.7339 | 0.6183 |
| Bagging | 0.710 | 0.629 | 0.755 | 0.7320 | 0.6203 |
| SVM linear | 0.698 | 0.615 | 0.740 | 0.7436 | 0.6034 |
| SVM radial basis | 0.723 | 0.652 | 0.760 | 0.7529 | 0.6383 |
| SVM polynomial | 0.713 | 0.643 | 0.746 | 0.7512 | 0.6252 |
| ANN | 0.708 | 0.817 | 0.530 | 0.7444 | 0.6327 |
| CART | 0.673 | 0.723 | 0.577 | 0.6322 | 0.5993 |
| Logistic regression | 0.683 | 0.768 | 0.543 | 0.7416 | 0.6114 |

| Method | Full Model: Sensitivity (Specificity = 0.8) | Full Model: AUC | Reduced Model: Sensitivity (Specificity = 0.8) | Reduced Model: AUC |
|---|---|---|---|---|
| Random forest | 0.5759 | 0.7529 | 0.5611 | 0.7339 |
| Bagging | 0.5687 | 0.7473 | 0.5501 | 0.7219 |
| SVM linear | 0.5572 | 0.7651 | 0.5203 | 0.7436 |
| SVM radial basis | 0.6032 | 0.7828 | 0.5616 | 0.7529 |
| SVM polynomial | 0.5917 | 0.7797 | 0.5504 | 0.7512 |
| ANN | 0.5275 | 0.7295 | 0.5579 | 0.7444 |
| CART | 0.4641 | 0.6821 | 0.4869 | 0.6322 |
| Logistic regression | 0.4863 | 0.7621 | 0.4947 | 0.7416 |
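The last table compares the models at a common operating point by reporting sensitivity with specificity held at 0.8. One simple way to read such a value off an empirical ROC curve is sketched below. It assumes the rule "predict positive when score >= threshold" and returns the best sensitivity among thresholds whose specificity is at least 0.8. The function name is hypothetical, and the procedure used in the paper (for example, interpolating the ROC curve) may differ.

```python
def sensitivity_at_specificity(y_true, scores, target_specificity=0.8, positive=1):
    """Largest sensitivity over thresholds whose specificity is at least the target.

    Rule: predict positive when score >= threshold; every observed score (plus +inf) is tried.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    best = 0.0
    for thr in sorted(set(scores)) + [float("inf")]:
        specificity = sum(s < thr for s in neg) / len(neg)  # negatives correctly below threshold
        if specificity >= target_specificity:
            sensitivity = sum(s >= thr for s in pos) / len(pos)  # positives flagged
            best = max(best, sensitivity)
    return best


y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.55, 0.3, 0.6, 0.5, 0.4, 0.2, 0.1]
print(sensitivity_at_specificity(y_true, scores))  # 0.8
```

Fixing specificity puts every model at the same false-positive rate. Sensitivities in this table are therefore directly comparable across methods, whereas the sensitivity columns of the earlier tables reflect each model's own default threshold.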
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Xu, H.; Anum, A.T.; Pokojovy, M.; Madathil, S.C.; Wen, Y.; Rahman, M.F.; Tseng, T.-L.; Moen, S.; Walser, E. Utilizing Machine Learning Techniques for Computer-Aided COVID-19 Screening Based on Clinical Data. COVID 2026, 6, 17. https://doi.org/10.3390/covid6010017
Chicago/Turabian Style: Xu, Honglun, Andrews T. Anum, Michael Pokojovy, Sreenath Chalil Madathil, Yuxin Wen, Md Fashiar Rahman, Tzu-Liang (Bill) Tseng, Scott Moen, and Eric Walser. 2026. "Utilizing Machine Learning Techniques for Computer-Aided COVID-19 Screening Based on Clinical Data" COVID 6, no. 1: 17. https://doi.org/10.3390/covid6010017

