Predictive Attributes for Developing Long COVID—A Study Using Machine Learning and Real-World Data from Primary Care Physicians in Germany
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Set
2.2. Study Population
2.3. Feature Preparation
2.4. Training
2.5. Feature Importance
3. Results
3.1. Model Performance
3.2. Feature Importance
3.2.1. SARS-CoV-2 Variants
3.2.2. Sociodemographic and Practice Effects, and General Diagnosis and Medication Counts
3.2.3. ICD-10 Classes
3.2.4. ATC Classes
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kessler, R.; Philipp, J.; Wilfer, J.; Kostev, K. Predictive Attributes for Developing Long COVID—A Study Using Machine Learning and Real-World Data from Primary Care Physicians in Germany. J. Clin. Med. 2023, 12, 3511. https://doi.org/10.3390/jcm12103511