Developing a Long COVID Case Definition: Using Machine Learning to Distinguish Long COVID Based on Symptom Presentation

Jason, Leonard A.; Furst, Jacob; Ruesink, Lauren; Katz, Ben Z.

doi:10.3390/covid5120205

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.


by Leonard A. Jason ¹,*, Jacob Furst ², Lauren Ruesink ¹ and Ben Z. Katz ³

¹ Center for Community Research, DePaul University, Chicago, IL 60614, USA
² Jarvis College of Computing and Digital Media, DePaul University, Chicago, IL 60614, USA
³ Ann and Robert H. Lurie Children’s Hospital of Chicago, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
* Author to whom correspondence should be addressed.

COVID 2025, 5(12), 205; https://doi.org/10.3390/covid5120205

Submission received: 4 November 2025 / Revised: 10 December 2025 / Accepted: 11 December 2025 / Published: 14 December 2025

(This article belongs to the Special Issue Long COVID: Pathophysiology, Symptoms, Treatment, and Management)


Abstract

Efforts to develop a case definition for Long COVID have differed on whether the definition should be specific and exclusive or broad and easily generalizable, and each approach has limitations. Because most efforts have focused on symptoms, inclusion criteria have often relied on the binary occurrence of a symptom (present or absent). The current study uses a more detailed measure that considers the frequency and severity of symptoms in a sample of individuals with Long COVID and matched controls who recovered from acute SARS-CoV-2 infection. Patients were diagnosed with Long COVID through a systematic process in which they completed quantitative questionnaires and qualitative interviews, underwent a physical examination, and received general laboratory testing to rule out other diagnoses. Because the samples were small relative to the number of symptoms investigated, Leave-One-Out Cross-Validation (LOOCV) was used to develop LASSO regression models to determine which symptoms best distinguished Long COVID from recovered controls. An optimal threshold for classifying Long COVID based on symptomatology was derived from a receiver operating characteristic (ROC) curve. The model presented in this article identified Long COVID with high accuracy. Smell/taste symptoms were less important in the current study, whereas gastrointestinal symptoms took on greater prominence. It is possible to differentiate those with Long COVID from those who have recovered with high accuracy. Specifying criteria for Long COVID and measuring symptoms comprehensively are both important for identifying those with Long COVID. Reliably identifying those who have developed Long COVID will help in formulating treatment strategies.
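The abstract names three analysis steps: symptom scores that combine frequency and severity, LASSO regression evaluated with LOOCV, and a classification cut-off chosen from an ROC curve. The following is a minimal, standard-library Python sketch of such a pipeline under stated assumptions, not the authors' code. Assumptions: frequency and severity are each rated 0 to 4 and combined by rescaling to 0 to 100 and averaging (`composite_score`); the LASSO is a plain linear model on a 0/1 outcome fitted by coordinate descent with a fixed, hypothetical penalty `alpha=0.05` (the abstract does not say whether a logistic link was used or how the penalty was tuned); and the "ideal" threshold is taken to be the one maximizing Youden's J. All function names are hypothetical, and the data are synthetic, not study data.

```python
import random

def composite_score(frequency, severity):
    """Hypothetical scoring (assumption): each rating 0-4, rescaled to 0-100, then averaged."""
    return (frequency * 25 + severity * 25) / 2

def standardize_params(X):
    """Column means and population SDs (a constant column gets SD 1)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 or 1.0
           for j in range(p)]
    return means, sds

def lasso_fit(X, y, alpha, max_sweeps=500, tol=1e-8):
    """Coordinate descent for (1/2n)||y - b0 - Zw||^2 + alpha*||w||_1, where Z
    is X standardized with this training set's own means and SDs."""
    n, p = len(X), len(X[0])
    means, sds = standardize_params(X)
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    b0 = sum(y) / n                # optimal intercept because Z is centred
    resid = [yi - b0 for yi in y]  # y - b0 - Zw while all weights are zero
    w = [0.0] * p
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(p):
            # (1/n) * column j dotted with the residual that excludes feature j;
            # every non-constant column of Z has mean square 1.
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + w[j]
            new = max(abs(rho) - alpha, 0.0) * (1.0 if rho > 0 else -1.0)  # soft-threshold
            step = new - w[j]
            if step:
                for i in range(n):
                    resid[i] -= Z[i][j] * step
                w[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b0, w, means, sds

def predict(model, x):
    b0, w, means, sds = model
    return b0 + sum(wj * (xj - m) / s for wj, xj, m, s in zip(w, x, means, sds))

def loocv_scores(X, y, alpha):
    """Score each person with a model fitted to everyone else."""
    scores = []
    for i in range(len(X)):
        model = lasso_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], alpha)
        scores.append(predict(model, X[i]))
    return scores

def auc(scores, labels):
    """ROC AUC as P(case score > control score), ties counted as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(scores, labels):
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

def make_synthetic(n_per_group=20, n_noise=3, seed=0):
    """Synthetic data, NOT study data: symptoms 0 and 1 run higher in cases."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, 0):
        lo, hi = (2, 4) if label == 1 else (0, 1)
        for _ in range(n_per_group):
            row = [composite_score(rng.randint(lo, hi), rng.randint(lo, hi))
                   for _ in range(2)]
            row += [composite_score(rng.randint(0, 4), rng.randint(0, 4))
                    for _ in range(n_noise)]
            X.append(row)
            y.append(label)
    return X, y

X, y = make_synthetic()
scores = loocv_scores(X, y, alpha=0.05)          # held-out score per person
threshold, j = youden_threshold(scores, y)
print(f"LOOCV AUC = {auc(scores, y):.2f}; cut-off = {threshold:.3f} (J = {j:.2f})")
_, weights, _, _ = lasso_fit(X, y, alpha=0.05)  # refit on all data
retained = [k for k, wk in enumerate(weights) if wk != 0]
print("symptoms retained:", retained)
```

Each person's score comes from a model that never saw that person, which is what makes LOOCV attractive when samples are small. Choosing the cut-off on those same held-out scores, as done here for brevity, still gives a somewhat optimistic accuracy estimate; a nested scheme that also selects the threshold inside each training fold would avoid that.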

Keywords: long COVID; case definition; assessment; LASSO regression

Share and Cite

MDPI and ACS Style

Jason, L.A.; Furst, J.; Ruesink, L.; Katz, B.Z. Developing a Long COVID Case Definition: Using Machine Learning to Distinguish Long COVID Based on Symptom Presentation. COVID 2025, 5, 205. https://doi.org/10.3390/covid5120205

