Open Access Article
Developing a Long COVID Case Definition: Using Machine Learning to Distinguish Long COVID Based on Symptom Presentation
by Leonard A. Jason 1,*, Jacob Furst 2, Lauren Ruesink 1 and Ben Z. Katz 3
1 Center for Community Research, DePaul University, Chicago, IL 60614, USA
2 Jarvis College of Computing and Digital Media, DePaul University, Chicago, IL 60614, USA
3 Ann and Robert H. Lurie Children’s Hospital of Chicago, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
* Author to whom correspondence should be addressed.
COVID 2025, 5(12), 205; https://doi.org/10.3390/covid5120205 (registering DOI)
Submission received: 4 November 2025 / Revised: 10 December 2025 / Accepted: 11 December 2025 / Published: 14 December 2025
Abstract
Efforts to develop a case definition for Long COVID have differed on whether the definition should be specific and exclusive or broad and easily generalizable, and each approach has limitations. Because most efforts have focused on symptoms, inclusion criteria have often relied on the binary occurrence of a symptom. The current study uses a more detailed measure that considers the frequency and severity of symptoms in a sample of individuals with Long COVID and matched controls who recovered from acute SARS-CoV-2 infection. Patients were diagnosed with Long COVID through a systematic process involving quantitative questionnaires, qualitative interviews, a physical examination, and general laboratory testing to rule out other diagnoses. Because the samples were small relative to the number of symptoms investigated, Leave-One-Out Cross-Validation (LOOCV) was used to develop LASSO regression models to determine which symptoms best distinguished Long COVID from recovered controls. An optimal threshold for classifying Long COVID based on symptomatology was derived from a receiver operating characteristic (ROC) curve. The resulting model identified Long COVID with high accuracy. In the current study, smell/taste symptoms were less important and gastrointestinal symptoms were more prominent. High accuracy in differentiating those with Long COVID from those who have recovered is therefore achievable. Specifying criteria for Long COVID and measuring symptoms comprehensively are important for identifying those with the condition. Reliably identifying those who have developed Long COVID will help in the formulation of treatment strategies.
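The analysis pipeline named in the abstract (LOOCV, LASSO regression, and an ROC-based threshold) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a plain linear LASSO fitted by coordinate descent and Youden's J statistic as the threshold criterion, neither of which the abstract specifies. The function names (`lasso_fit`, `loocv_scores`, `best_threshold`) and the toy data are hypothetical and are not study data.

```python
# Illustrative sketch only: linear LASSO + leave-one-out CV + ROC threshold.
# Assumptions (not from the article): coordinate-descent solver, Youden's J
# as the threshold criterion, and made-up toy symptom scores.

def soft_threshold(value, amount):
    """Shrink value toward zero by amount (the LASSO proximal step)."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def lasso_fit(X, y, alpha, n_iter=200):
    """Minimize (1/2n)*||y - Xw - b||^2 + alpha*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered features
    yc = [v - y_mean for v in y]                                # centered outcome
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # prediction using every feature except j
                pred_wo_j = sum(Xc[i][k] * w[k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - pred_wo_j)
                z += Xc[i][j] ** 2
            w[j] = soft_threshold(rho, alpha * n) / z if z > 0 else 0.0
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return w, intercept

def loocv_scores(X, y, alpha):
    """Score each participant with a model trained on all other participants."""
    scores = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        w, b = lasso_fit(X_train, y_train, alpha)
        scores.append(b + sum(wj * xj for wj, xj in zip(w, X[i])))
    return scores

def best_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J."""
    pos = sum(labels)
    neg = len(labels) - pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
        sens, spec = tp / pos, tn / neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (t, sens, spec)
    return best

# Hypothetical toy data: column 0 mimics a discriminating symptom score
# (for example frequency x severity), column 1 is uninformative noise.
X_toy = [[0, 1], [1, 0], [0, 0], [1, 1], [8, 1], [9, 0], [8, 0], [9, 1]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = Long COVID, 0 = recovered control

if __name__ == "__main__":
    held_out = loocv_scores(X_toy, y_toy, alpha=0.01)
    print(best_threshold(held_out, y_toy))
```

Each participant is scored by a model that never saw that participant, which is the point of leave-one-out validation when the sample is small relative to the number of symptoms. The L1 penalty drives coefficients of uninformative symptoms to exactly zero, so the surviving nonzero coefficients indicate which symptoms carry the classification.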