COVID-19 Prediction Using Machine Learning
Abstract
1. Introduction
2. Literature Review
3. Methodology
3.1. Data Collection and Preprocessing
3.2. Split Dataset
3.3. Feature Selection
3.4. Model Development
- Logistic Regression: Logistic regression served as the baseline against which all other models were compared. It is widely used for binary classification.
- K-Nearest Neighbors (KNN): KNN was applied with k = 1 and k = 3 to see which number of nearest neighbors classified the data points better. KNN is a nonparametric method that makes no assumption about the underlying data distribution.
- Random Forest: This is an ensemble learning method that combines many decision trees. It is well suited to tuning with GridSearchCV, which systematically searches combinations of hyperparameters to find the configuration that gives the best accuracy.
- Decision Tree: This is a nonparametric model that does not assume any particular distribution of the data.
- Naïve Bayes: This is a parametric model that assumes the features are conditionally independent given the class label.
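The model list above can be sketched with scikit-learn, the library that provides GridSearchCV. This is a minimal illustration under stated assumptions, not the authors' code: the synthetic data, the 80/20 stratified split, the `rf_grid` values, and the choice of `GaussianNB` as the Naïve Bayes variant are all hypothetical stand-ins.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): the study's dataset is not reproduced here.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search grid; the values actually searched are not given in the text.
rf_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),  # baseline
    "KNN (k = 1)": KNeighborsClassifier(n_neighbors=1),
    "KNN (k = 3)": KNeighborsClassifier(n_neighbors=3),
    # GridSearchCV refits the best configuration on the full training split.
    "Random Forest": GridSearchCV(
        RandomForestClassifier(random_state=0), rf_grid, cv=3
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),  # assumed variant
}

# Fit every model on the same split and record held-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Evaluating every model on the same held-out split keeps the comparison against the logistic regression baseline like for like.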
3.5. Ensemble Method
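The results table labels the ensemble as "Ensemble (Soft)", which suggests soft voting: the class probabilities predicted by the base models are averaged and the class with the highest mean probability is chosen. The sketch below illustrates that idea in plain Python. The function name `soft_vote`, the equal default weights, and the example probabilities are assumptions made for illustration, not details taken from the study.

```python
def soft_vote(probabilities, weights=None):
    """Average per-model class probabilities.

    probabilities: one list per model, each holding one probability per class.
    Returns (predicted_class_index, averaged_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(probabilities)  # equal weights (assumption)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: averaged[c])
    return label, averaged


# Hypothetical outputs of three base models for one patient: [P(negative), P(positive)]
model_probs = [[0.90, 0.10], [0.45, 0.55], [0.45, 0.55]]
label, averaged = soft_vote(model_probs)
print(label, averaged)
```

In this example two of the three models lean positive, so a majority (hard) vote would predict the positive class, while soft voting predicts the negative class because the first model is far more confident. This sensitivity to each model's confidence is the main difference between the two voting schemes.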
4. Results and Discussion
5. Conclusions
- We will increase the size of the datasets to better understand the factors affecting COVID-19 outcomes in different groups.
- We will perform hyperparameter optimization and feature selection to improve model performance and reduce errors.
- To address class imbalance, we will use more advanced techniques such as oversampling, undersampling, and algorithmic tuning to make the model more robust and accurate for under-represented groups.

Our findings have wide-ranging implications for improving patient care. This work serves as a foundation on which more powerful diagnostic tools can be developed, allowing practitioners to make better-informed decisions from accurate, data-driven predictions.
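Of the class-imbalance techniques named among the future directions, random oversampling is the simplest to illustrate: minority-class samples are duplicated at random until every class is as large as the majority class. The sketch below uses only the standard library. The helper name `random_oversample` and the toy data are hypothetical, and the text does not say which oversampling variant the authors intend to use.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (assumption): four negative samples and one positive sample.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

Resampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the test set and inflate the reported metrics.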
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Moulaei, K.S.; Zohreh, M.-T.; Al, K.-A.H. Comparing machine learning algorithms for predicting COVID-19 mortality. BMC Med. Inform. Decis. Mak. 2022, 22, 2.
- Solayman, S.A.; Mamun, M.C.; Mahmud, M.; Khan, K.R. Automatic COVID-19 prediction using explainable machine learning techniques. Int. J. Cogn. Comput. Eng. 2023, 4, 36–46.
- El-Ush, L.A.; Ahmad, S.A.; Choi, C.C.; Muhammad, M.I. Supervised machine learning models for prediction of COVID-19 infection using epidemiology dataset. SN Comput. Sci. 2021, 2, 11.
- Hassan, A.A.; Kwekha-Rashid, A.B. Coronavirus disease (COVID-19) cases analysis using machine-learning applications. Appl. Nanosci. 2023, 13, 2013–2025.
- Kumar, R.; Dutta, F.; Bera, S.; Maji, S.; Ahmad, A.; Kumar, I.E. Recurrent neural network and reinforcement learning model for COVID-19 prediction. Front. Public Health 2021, 9, 744100.
- Shaikh, A.A.; Bhat, B.; Ali, K.A.; Nazeer, A.; Akhtar, J.M. COVID-19 detection from CBC using machine learning techniques. Int. J. Technol. Innov. Manag. 2021, 1, 65–78.
- Alballa, N.; Al-Turki, A.-T.I. Machine learning approaches in COVID-19 diagnosis, mortality, and severity risk prediction: A review. Inform. Med. Unlocked 2021, 24, 100564.
- Tofan, G.G.; Almasan, R.E.; Mohammadi, M.F.; Zainuddin, K.M.; Ulucay, I.M.; Lazaroiu, C.M.; Razvan, S. The economics of deep and machine learning-based algorithms for COVID-19 prediction. Oeconomia Copernic. 2024, 15, 27–58.
- Jimenez, C.M.; Jiang, I.X.; Jiang, J.; Villavicencio, H.J. COVID-19 prediction applying supervised machine learning algorithms with comparative analysis using Weka. Algorithms 2021, 14, 201.
- Painuli, D.M.; Bansal, D.; Painuli, A.M. Forecast and prediction of COVID-19 using machine learning. In Data Science for COVID-19; Academic Press: Cambridge, MA, USA, 2021; pp. 381–397.
- Shaikh, I.H.; Mahmud, S.A.-E.; Ali, A.-K.M.; Arpaci, P.M. Predicting the COVID-19 infection with fourteen clinical features using machine learning classification algorithms. Multimed. Tools Appl. 2021, 80, 11943–11957.
- El-Ebiary, Z.A.; Dwedar, E.A.; Ghaleb, G.; Abdelrazek, O.M.; Mansour, A.-D.; Malki, G.I. The COVID-19 pandemic: Prediction study based on machine learning models. Environ. Sci. Pollut. Res. 2021, 28, 40496–40506.
- Ahmed, S.E.; Sayed, M.S. Applying different machine learning techniques for prediction of COVID-19 severity. IEEE Access 2021, 9, 135697–135707.
- Ezzat, A.A.; Booth, M.P. Development of a prognostic model for mortality in COVID-19 infection using machine learning. Mod. Pathol. 2021, 34, 522–531.
- Ikemura, K.B.; Yamaguchi, E.Y.; Huynh, B.H.; Sharafoddini, S.M.; Li, S.K.; Johnson, L.S.; Doran, J.G.; Gonzalez, R.G. Using automated machine learning to predict the mortality of patients with COVID-19: Prediction model development. J. Med. Internet Res. 2021, 23, e23458.
- Diwaker, C.; Tomar, P.; Solanki, A.; Nayyar, A.; Jhanjhi, N.Z.; Abdullah, A.; Supramaniam, M.A. A New Model for Predicting Component Based Software Reliability Using Soft Computing. IEEE Access 2019, 7, 147191–147203.
- Kok, S.H.; Abdullah, A.; Jhanjhi, N.Z.; Supramaniam, M.A. A review of intrusion detection system using machine learning approach. Int. J. Eng. Res. Technol. 2019, 12, 8–15.
- Ahmed, S.; Hossain, M.A.; Bhuiyan, M.M.I.; Ray, S.K. A Comparative Study of Machine Learning Algorithms to Predict Road Accident Severity. In Proceedings of the 2021 20th International Conference on Ubiquitous Computing and Communications (IUCC/CIT/DSCI/SmartCNS), London, UK, 20–22 December 2021; pp. 390–397.
Paper | Model Used | Accuracy/Performance |
---|---|---|
[1] | Random Forest (RF) | 95.03% (Precision: 94.23%) |
[2] | CNN-LSTM, Random Forest, Logistic Regression, Decision Trees | CNN-LSTM: 96.34% (F1 Score: 0.98) |
[3] | ML Model using Binary Features (Age, Gender, Symptoms) | AUROC: 0.90 |
[4] | Supervised Learning (Logistic Regression, ANN, CNN) | Up to 92.9% accuracy |
[5] | MLSTM + Deep Reinforcement Learning (DRL) | High correlation, low error rates |
Model | Accuracy | Precision | Recall |
---|---|---|---|
Logistic Regression | 73.00% | 77.28% | 67.82% |
KNN (k = 1) | 63.00% | 65.56% | 58.08% |
KNN (k = 3) | 61.67% | 63.85% | 59.36% |
Random Forest | 79.33% | 81.59% | 71.67% |
SVM | 87.00% | 91.54% | 79.15% |
Naive Bayes | 49.33% | 51.61% | 41.03% |
Decision Tree | 59.93% | 51.49% | 44.23% |
Ensemble (Soft) | 51.70% | 53.20% | 49.85% |
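The accuracy, precision, and recall columns in the table above are presumably the standard confusion-matrix metrics; the text does not state how they were computed, so the sketch below shows the usual definitions under that assumption. The function name and the toy labels are illustrative only and do not reproduce any value in the table.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels (assumption): 1 = COVID-19 positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Here the toy predictions contain 2 true positives, 1 false positive, 2 false negatives, and 3 true negatives, giving an accuracy of 5/8, a precision of 2/3, and a recall of 1/2. In a diagnostic setting recall is often watched most closely, because a false negative corresponds to a missed case.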
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Raza, A.; Rehman, A.U.; Sanjaya, I. COVID-19 Prediction Using Machine Learning. Eng. Proc. 2025, 107, 60. https://doi.org/10.3390/engproc2025107060