Article

Developing a Long COVID Case Definition: Using Machine Learning to Distinguish Long COVID Based on Symptom Presentation

1 Center for Community Research, DePaul University, Chicago, IL 60614, USA
2 Jarvis College of Computing and Digital Media, DePaul University, Chicago, IL 60614, USA
3 Ann and Robert H. Lurie Children’s Hospital of Chicago, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
* Author to whom correspondence should be addressed.
COVID 2025, 5(12), 205; https://doi.org/10.3390/covid5120205
Submission received: 4 November 2025 / Revised: 10 December 2025 / Accepted: 11 December 2025 / Published: 14 December 2025
(This article belongs to the Special Issue Long COVID: Pathophysiology, Symptoms, Treatment, and Management)

Abstract

Efforts have been made to develop a case definition for Long COVID, with results differing on whether the case definition should be specific and exclusive, or broad and easily generalizable. Each of these approaches has limitations. As most efforts have focused on symptoms, inclusion criteria have often relied on the binary occurrence of a symptom. The current study uses a more detailed measure that considers the frequency and severity of symptoms in a sample of individuals with Long COVID and matched controls who recovered from acute SARS-CoV-2 infection. Patients were diagnosed with Long COVID in a systematic process involving their completion of quantitative questionnaires, qualitative interviews, a physical examination, and general laboratory testing to rule out other diagnoses. Since samples were comparatively small given the number of symptoms investigated, Leave One Out Cross-Validation (LOOCV) was used to develop LASSO regression models to determine which symptoms best distinguished Long COVID from recovered controls. An ideal threshold for classifying Long COVID based on symptomatology was developed using a receiver operating characteristic (ROC) curve. The model presented in this article identified Long COVID with high accuracy (90.91%). Compared with an earlier symptom-based definition, smell/taste was less important in the current study, and gastrointestinal symptoms took on greater prominence. It is possible to achieve high accuracy in differentiating those with Long COVID from those who have recovered. It is important to specify criteria for Long COVID and to measure symptoms comprehensively to identify those with Long COVID. Reliably identifying those who have developed Long COVID will help in the formulation of treatment strategies.

1. Introduction

Many who have contracted coronavirus disease (COVID-19) caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) have not fully recovered or have persisting or new symptoms, in a condition termed Long COVID [1]. By February 2024, 17.1 to 18.2% of U.S. adults reported having experienced Long COVID [2]. As of March 2024, about 7% of all U.S. adults had Long COVID, which is roughly 17 million people [3]. Those with a more severe initial infection have a higher risk of developing Long COVID, but Long COVID can develop even after a mild SARS-CoV-2 infection [4]. Other terms for Long COVID include “post-COVID-19 condition” or “long haul COVID”. The CDC uses the term post-acute sequelae of COVID-19 (PASC). Stephenson et al. [5] found that those with Long COVID have more symptoms than those who tested negative for SARS-CoV-2; other studies have corroborated these findings [2,6].
There have been multiple efforts to define Long COVID. The World Health Organization’s [7] case definition of what they referred to as “post-COVID-19 condition” states that this condition “occurs in individuals with a history of probable or confirmed SARS-CoV-2 infection, usually 3 months from the onset of COVID-19 with symptoms that last for at least 2 months and cannot be explained by an alternative diagnosis”. Another more recent broad definition by the National Academies of Sciences, Engineering, and Medicine [8] involves having continuous, relapsing, and remitting symptoms affecting one or more organ systems for at least three months following acute SARS-CoV-2 infection.
Thaweethai et al. [9] proposed a narrower definition of Long COVID and evaluated their definition using a national data set of those infected versus those not infected with SARS-CoV-2. They identified 12 symptoms (i.e., loss of or change in smell or taste, post-exertional malaise, chronic cough, brain fog, thirst, heart palpitations, chest pain, fatigue, change in sexual desire or capacity, dizziness, gastrointestinal symptoms, and abnormal movements). Each symptom was assigned a score (created from LASSO coefficient magnitudes), and individuals whose scores summed to 12 or greater were classified as having Long COVID. The occurrence of “loss of or change in smell or taste” was the best discriminator between those with and without Long COVID and was associated with 8 points. Therefore, if a respondent had this symptom, it counted for 8 of the needed 12 points, or about 67% of the score required to meet their Long COVID definition. They found that among those who were ultimately classified as having Long COVID, 41% had this symptom. Dorri and Jason [10] looked at the same item, “loss of or change in smell and/or taste,” in another Long COVID data set and found that just 12.6% of patients satisfied the requirements for this symptom, using criteria that require symptoms to occur at least half the time and have at least moderate severity. Thus, when frequency and severity criteria were applied, the prevalence of this symptom among patients with Long COVID was 12.6% rather than 41%. Other studies have also demonstrated a lower prevalence of this symptom among patients with COVID-19 across the U.S. [11].
Occurrence measures (i.e., whether a symptom occurs or does not occur) may be too coarse on their own to characterize Long COVID. As an example, Oliveira et al. [12] used frequency and severity ratings with a machine learning algorithm and applied an adaptive LASSO approach for feature selection, choosing tuning parameters and penalization with cross-validation. They found that “unrefreshing sleep” and “flu-like symptoms” were the best discriminators of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) versus Long COVID. In the current study, we applied a similar analysis to patients with Long COVID to distinguish that group from recovered controls. We used a symptom measure that incorporated both frequency and severity with a sample of individuals with Long COVID and those without. LASSO procedures were used to first develop and then test models based on symptom presentation to best differentiate those with Long COVID from recovered controls. A scale was developed using an ROC curve to determine the scores that best distinguished patients with Long COVID from recovered controls. The scale and threshold score could then be used to define those with Long COVID.

2. Methods

2.1. Participants

COVID data were collected from 2023 to 2024. Questionnaires were first distributed to patients who self-reported having Long COVID on multiple online Long COVID support group forums and through recruitment efforts at local universities. Patients were then identified as having Long COVID using quantitative questionnaires, qualitative interviews, a medical examination, and laboratory screening to rule out other conditions (see below). Comprehensive clinical evaluations determined who had been infected with SARS-CoV-2, continued to experience significant symptoms for at least three months following infection, and did not have any other medical conditions that could explain their symptoms. Interviews and questionnaires were worded so that the language used would not bias the patient’s answers. For example, we asked questions about their infection with SARS-CoV-2 and what symptoms persisted. Through the medical examination, surveys, and interviews, we were able to determine which participants had continuing symptoms since their SARS-CoV-2 infection. We included (a) any person with a positive Nucleic Acid Amplification Test (NAAT) OR positive SARS-CoV-2 antibody test (this definition modifies the WHO criterion to add detectable SARS-CoV-2 antibody as a qualifying test); (b) any person with a positive SARS-CoV-2 Antigen-rapid test (including home-administered rapid test) AND meeting either the probable or suspected case definition; (c) an asymptomatic person with a positive SARS-CoV-2 Antigen-rapid diagnostic test (RDT) (including home-administered RDT) who is a contact of a probable or confirmed case; (d) any person with a positive SARS-CoV-2 nucleocapsid protein antibody test OR a positive SARS-CoV-2 spike protein antibody test in a non-vaccinated individual. We attempted to identify relative weights that could be applied to symptoms through an algorithm to best distinguish cases of Long COVID from recovered controls.
Participants included 55 patients with Long COVID and 55 recovered controls, matched to patients with Long COVID by sex, age, race, and ethnicity.

2.2. Measures

Participants completed the DePaul Symptom Questionnaire (DSQ), a 54-item self-report survey that measures ME/CFS symptomatology [13], and the DSQ-COVID, a questionnaire developed to measure Long COVID symptoms [10,14]. The DSQ-COVID collected data on COVID-related symptoms, created by identifying the most common symptoms across the COVID-19 research literature. A literature search used the following terms: “exploratory factor analysis of COVID”, “exploratory factor analysis of Long COVID”, “factor analysis of COVID”, and “factor analysis of Long COVID” across several databases (DePaul University Library system, PubMed, Google Search, and Google Scholar). In addition, possible symptoms were presented to patient communities for their feedback. The information patients provided was adjudicated among the research team, and then a revised list was created and shared with patients for additional feedback. This feedback was evaluated, and a final list of 38 symptoms was developed. All symptoms were measured on a 5-point Likert scale for frequency (e.g., 0 = None of the time, 1 = A little of the time, 2 = About half the time, 3 = Most of the time, and 4 = All of the time) and severity (e.g., 0 = Absent, 1 = Mild, 2 = Moderate, 3 = Severe, and 4 = Very severe). Composite scores were calculated for each symptom by averaging their corresponding frequency and severity and then multiplying each result by 25 to create values ranging from 0 to 100, with higher scores indicating more frequent/severe symptoms. In the past, we have used a frequency score of 2 or higher (at least about half the time) and a severity score of 2 or higher (at least moderate) to indicate a significant symptom, but in this study, we allowed the scores to vary from 0 to 100 as we were attempting to develop an algorithm for defining Long COVID.
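The composite scoring described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated in the text (average the 0 to 4 frequency and severity ratings, then multiply by 25); the function names `composite_score` and `meets_prior_cutoff` are hypothetical and are not taken from the DSQ materials.

```python
def composite_score(frequency: int, severity: int) -> float:
    """Average the 0-4 frequency and severity ratings and rescale to 0-100."""
    for label, rating in (("frequency", frequency), ("severity", severity)):
        if not 0 <= rating <= 4:
            raise ValueError(f"{label} must be between 0 and 4, got {rating}")
    return (frequency + severity) / 2 * 25


def meets_prior_cutoff(frequency: int, severity: int) -> bool:
    """Earlier binary rule: at least 'about half the time' AND at least 'moderate'."""
    return frequency >= 2 and severity >= 2


# A symptom present most of the time (3) but mild (1) gets a mid-range composite
# even though it would not have counted under the earlier binary cutoff.
print(composite_score(3, 1), meets_prior_cutoff(3, 1))
```

The example shows why the continuous composite retains information that the binary cutoff discards.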

2.3. Preprocessing

Missing values were imputed depending on what values were missing. For simplicity, the frequency and severity of answers for a particular question were referred to as pair values. For cases where one paired value (either frequency or severity) for a symptom was missing, we imputed the missing value based on the available information as follows: If the missing value was paired with a value of zero, the missing value was assumed to be zero. If the missing value had a non-zero pair value, the mode of the score for other participants who had the same score for the pair value was imputed. For instance, if the severity of a symptom was missing, and its pair value frequency score was 3, the mode of other participants with a frequency of 3 was imputed as the missing value. If both values for a symptom were missing, the median value for the symptom was imputed for both values.
Our choice of imputation method was guided by the structure of the symptom-rating system. In the DSQ framework, frequency and severity scores are conceptually linked paired dimensions of the same underlying symptom. In practice, these two values tend to cluster together because respondents who report a symptom as frequent generally also report higher severity, while those who report a symptom as infrequent typically report low severity. Imputing from participants with the same paired value preserves this empirical dependency more effectively than alternative simple imputations (e.g., mean substitution or global modal imputation), which would ignore the known correspondence between the two ratings. More complex approaches (e.g., multiple imputation) were not suitable for this dataset because of (a) the ordinal, highly skewed distribution of the items, and (b) the fact that symptom frequency and severity scores are not independent variables, but components of a scoring algorithm used to derive overall symptom burden. Multiple imputation algorithms treating these scores as independent predictors tend to produce unstable or clinically nonsensical values.
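A minimal sketch of the paired imputation rules follows, for one symptom at a time. Several details are assumptions made for illustration and are not stated in the text: missing ratings are represented as `None`, the both-missing case uses the median of each dimension separately, ties in the mode are resolved by `statistics.mode` (first value encountered), and a value is left missing when no other participant shares the paired value. The function name `impute_pair` is hypothetical.

```python
from statistics import median, mode
from typing import List, Optional, Tuple

Pair = Tuple[Optional[float], Optional[float]]  # (frequency, severity)


def impute_pair(pair: Pair, observed: List[Pair]) -> Pair:
    """Impute missing ratings for one symptom from other participants' pairs."""
    freq, sev = pair
    if freq is not None and sev is not None:
        return pair  # nothing to impute
    complete = [(f, s) for f, s in observed if f is not None and s is not None]
    if freq is None and sev is None:
        # Both missing: use the symptom's median (assumed: one median per dimension).
        return (median(f for f, _ in complete), median(s for _, s in complete))
    if freq is None:
        if sev == 0:
            return (0, 0)  # a zero pair value implies the symptom is absent
        matches = [f for f, s in complete if s == sev]
        return (mode(matches), sev) if matches else pair
    # Otherwise severity is the missing value.
    if freq == 0:
        return (0, 0)
    matches = [s for f, s in complete if f == freq]
    return (freq, mode(matches)) if matches else pair


others = [(3, 2), (3, 2), (3, 3), (1, 1), (0, 0)]
print(impute_pair((3, None), others))  # severity taken from others with frequency 3
```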

2.4. Statistical Analysis

We used composite variables combining the frequency and severity of symptoms on a scale from 0 to 100, as that provided the most pertinent symptom information for developing a predictive model for a definition of Long COVID. Since our dataset was relatively small, the Least Absolute Shrinkage and Selection Operator (LASSO) analysis was trained using Leave One Out Cross-Validation (LOOCV), which generated 110 LASSO models, one for each participant held out. Because LASSO variable selection can be unstable in small samples, we averaged the coefficients for each symptom across these 110 models to create an average model with more stable estimates of predictor importance. This approach aligns with established methods in stability selection and model aggregation, which use repeated regularization fits to improve robustness in datasets with correlated predictors [15,16]. The LASSO coefficient is a regression coefficient that indicates the strength of a variable’s influence on the outcome. In the LASSO process, each symptom coefficient is penalized by a standardized value. When the penalty is larger than a particular coefficient’s value, that coefficient is shrunk to zero and the symptom is eliminated from the model. We also made a set of simplified integer scores for each symptom by multiplying the small LASSO coefficients by 1000, rounding the results to whole numbers, and then removing any symptoms whose integer scores were less than 1. We tested the model using a threshold based on the simplified integer scores.
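The leave-one-out averaging and integer scoring can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it assumes a linear LASSO with squared-error loss fitted by coordinate descent, a fixed penalty of 2.0 chosen only for the toy data, and hypothetical names (`fit_lasso`, `loocv_average`, `integer_scores`, `symptom_a`, `symptom_b`). The text does not state how the penalty was tuned or which software was used.

```python
from typing import Dict, List, Sequence


def soft_threshold(value: float, penalty: float) -> float:
    """Shrink value toward zero; anything within the penalty becomes exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso(X: Sequence[Sequence[float]], y: Sequence[float],
              penalty: float, sweeps: int = 200) -> List[float]:
    """Coordinate-descent LASSO on centered data (assumed squared-error loss)."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residuals while all coefficients are 0
    col_var = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            if col_var[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_var[j] * beta[j]
            updated = soft_threshold(rho, penalty) / col_var[j]
            shift = updated - beta[j]
            if shift:
                for i in range(n):
                    resid[i] -= Xc[i][j] * shift
                beta[j] = updated
    return beta


def loocv_average(X, y, penalty: float) -> List[float]:
    """Fit one model per held-out participant and average the coefficients."""
    n, p = len(X), len(X[0])
    totals = [0.0] * p
    for held_out in range(n):
        X_train = [row for i, row in enumerate(X) if i != held_out]
        y_train = [v for i, v in enumerate(y) if i != held_out]
        for j, b in enumerate(fit_lasso(X_train, y_train, penalty)):
            totals[j] += b
    return [t / n for t in totals]


def integer_scores(names, avg_coefs, multiplier: int = 1000) -> Dict[str, int]:
    """Multiply by 1000, round, and drop symptoms scoring below 1."""
    scores = {name: round(c * multiplier) for name, c in zip(names, avg_coefs)}
    return {name: s for name, s in scores.items() if s >= 1}


# Toy data (hypothetical): symptom_a separates the groups, symptom_b does not.
X = [[60, 10], [70, 20], [50, 10], [80, 20], [65, 15],
     [5, 10], [0, 20], [10, 10], [0, 20], [5, 15]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
avg = loocv_average(X, y, penalty=2.0)
print(integer_scores(["symptom_a", "symptom_b"], avg))
```

In this toy example the uninformative symptom is shrunk to zero in every fold and therefore drops out of the integer scores, which mirrors the elimination step described above.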
Each LASSO coefficient score was multiplied by a participant’s composite score, which was calculated from the frequency and severity of their symptoms according to the DSQ. The resulting products were then summed for each participant, giving the participants a final score predicting their likelihood of having Long COVID. The participants’ final scores were put along an ROC curve to evaluate the best threshold for distinguishing participants with Long COVID from recovered controls. The threshold with the best accuracy was selected and used to formulate a definition of Long COVID.
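The threshold search can be sketched as a scan over candidate cutoffs, computing sensitivity, specificity, and accuracy at each one and keeping the most accurate. The scores and labels below are made up for illustration, the function names are hypothetical, and resolving ties toward the lowest cutoff is an assumption rather than something the text specifies.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float, float, float]]:
    """Return (cutoff, sensitivity, specificity, accuracy) for each candidate cutoff."""
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    points = []
    for cutoff in sorted(set(scores)):
        # A participant is classified as Long COVID when score >= cutoff.
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < cutoff and l == 0)
        points.append((cutoff, tp / positives, tn / negatives,
                       (tp + tn) / len(labels)))
    return points


def most_accurate_cutoff(scores, labels):
    """Pick the cutoff with the highest accuracy (lowest cutoff wins ties)."""
    return max(roc_points(scores, labels), key=lambda point: point[3])


toy_scores = [700, 600, 540, 400, 300, 100]   # hypothetical total scores
toy_labels = [1, 1, 1, 0, 0, 0]               # 1 = Long COVID, 0 = recovered
print(most_accurate_cutoff(toy_scores, toy_labels))
```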

3. Results

The participants, who averaged about 19 and a half years of age, were primarily white females who were not Latinx. There were no significant differences between the groups in age [Long COVID (M = 19.8, SD = 2.31) versus Recovered (M = 19.5, SD = 1.30), t(88.32) = −0.77, p = 0.45], race [Long COVID: White (69.1%), Other (16.4%), Asian (7.3%), Black (7.3%) versus Recovered: White (70.9%), Other (9.1%), Asian (12.7%), Black (7.3%), χ²(3, N = 110) = 1.97, p = 0.58], Latinx ethnicity [Long COVID (25.5%) versus Recovered (20%), χ²(1, N = 110) = 0.47, p = 0.50], or gender [Long COVID women (78.0%) versus Recovered women (64.2%), χ²(1, N = 110) = 2.39, p = 0.12].
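As a worked check of one of the group comparisons above, the Latinx chi-square statistic can be recomputed from a 2 × 2 table. The cell counts are an assumption: they are inferred from the reported percentages with 55 participants per group (25.5% of 55 is about 14, and 20% of 55 is 11), and the Pearson statistic is computed without a continuity correction.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Assumed counts: Long COVID 14 Latinx / 41 not; Recovered 11 Latinx / 44 not.
statistic = chi_square_2x2(14, 41, 11, 44)
print(round(statistic, 2))  # 0.47
```

Under these assumed counts the statistic agrees with the reported value of 0.47.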
Table 1 shows each of the symptoms with associated values created from their LASSO coefficients. Among the highest-scored symptoms (i.e., those that were most discriminatory between the two groups) were shortness of breath or trouble catching your breath, gastrointestinal symptoms, loss of/change in smell or taste, dizziness or fainting, heavy legs and/or swelling of legs, physically drained or sick after mild activity, nose congestion, muscle aches, vision problems, no appetite, and absent-mindedness or forgetfulness. Figure 1 shows the means of composite scores for the key symptoms, on a 100-point scale, with higher scores indicating higher frequency/severity of symptoms.
Figure 2 provides the optimal threshold for identifying Long COVID: total scores of 530 or higher yielded a diagnosis of Long COVID with 90.91% accuracy, 89.09% sensitivity, and 92.73% specificity (see Table 2). This formula can thus be used as the basis for a case definition of Long COVID. The equation for the total score is: Likelihood of Having Long COVID = 6 × (shortness of breath composite) + 5 × (gastrointestinal composite) + 3 × (loss of smell and taste composite) + 2 × (dizziness composite) + 2 × (heavy legs composite) + 2 × (physically drained composite) + 2 × (nose congestion composite) + 1 × (muscle aches composite) + 1 × (vision problems composite) + 1 × (no appetite composite) + 1 × (absentmindedness composite).
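The scoring equation and threshold can be written as a short function. The weights and the cutoff of 530 come from the text; the dictionary keys, the function names, and the choice to treat an unreported symptom as a composite of 0 are illustrative assumptions.

```python
from typing import Dict

# Integer weights from the scoring equation; key names are hypothetical labels.
WEIGHTS: Dict[str, int] = {
    "shortness_of_breath": 6,
    "gastrointestinal": 5,
    "loss_smell_taste": 3,
    "dizziness": 2,
    "heavy_legs": 2,
    "physically_drained": 2,
    "nose_congestion": 2,
    "muscle_aches": 1,
    "vision_problems": 1,
    "no_appetite": 1,
    "absentmindedness": 1,
}
THRESHOLD = 530  # total scores at or above this value are classified as Long COVID


def long_covid_score(composites: Dict[str, float]) -> float:
    """Weighted sum of 0-100 symptom composites (missing symptoms count as 0)."""
    return sum(weight * composites.get(symptom, 0.0)
               for symptom, weight in WEIGHTS.items())


def meets_definition(composites: Dict[str, float]) -> bool:
    return long_covid_score(composites) >= THRESHOLD


# Hypothetical participant with two moderate symptoms (composite 50 each).
participant = {"shortness_of_breath": 50, "gastrointestinal": 50}
print(long_covid_score(participant), meets_definition(participant))
```

Because the weights sum to 26, the total score can range from 0 to 2600.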

4. Discussion

The current study found that by using data mining strategies, it is possible to achieve high accuracy in differentiating those with Long COVID versus those who have recovered using precise measures of frequency and severity. In contrast to definitions of Long COVID that use occurrence measures, such as those of the National Academies of Sciences, Engineering, and Medicine [8], our study identifies key symptoms that best differentiate those who have recovered from SARS-CoV-2 infection versus those who have not. When Thaweethai et al. [9] employed a narrower criterion to define Long COVID, they found that among their participants first infected on or after 1 December 2021, and enrolled within 30 days of infection, 10% were Long COVID positive at 6 months; however, among those not infected, 4.6% still met their case definition of Long COVID. Our study used more precise frequency/severity measures but was not able to determine what percentage of uninfected patients would satisfy our formulaic definition of Long COVID as we do not have an uninfected control group. Our study also utilized a high threshold for predicting Long COVID, at 530, which according to our ROC analysis was the threshold that would yield the most accuracy; however, it did cause 6 of our 55 participants (10.9%) with Long COVID to be falsely classified as recovered controls.
Our study found that shortness of breath or trouble catching your breath, gastrointestinal symptoms, and loss of/change in smell or taste were the three highest-rated items for identifying Long COVID. The other high-scoring items were autonomic symptoms (dizziness or fainting, heavy legs and/or swelling of legs, vision problems), post-exertional malaise (physically drained or sick after mild activity), a respiratory symptom (nose congestion), another gastrointestinal symptom (no appetite), muscle aches, and a cognitive item (absent-mindedness or forgetfulness). In contrast, Thaweethai et al. [9] found smell/taste and post-exertional malaise to be the highest-rated items, with chronic cough, brain fog, and thirst being the next highest-rated items. The importance of smell/taste was lessened in the current study, and gastrointestinal symptoms took on greater prominence, as was also seen in our previous analysis of patients with ME/CFS following infectious mononucleosis [17]. It appears that using more precise measures of frequency/severity rather than just using occurrence measures resulted in a different selection of symptoms as being the most important.
A key question involves how broad or narrow a case definition might be. Long COVID has more than 200 possible symptoms, and criteria that are very broad will have good sensitivity so that all those who have the condition will be identified. However, such an approach will have poor specificity, so many will be inaccurately diagnosed with Long COVID. If a person can meet Long COVID criteria by merely having a few minor symptoms for 3 months following COVID infection, the prevalence of Long COVID will be extremely high. For example, a large percentage of primary care patients with psychogenic causes have unexplained symptoms [18], and they might fit a broad case definition of Long COVID. Therefore, a broader case definition might also lead to the symptoms of those with Long COVID being incorrectly attributed to psychogenic causes.
Those with another post-viral illness, ME/CFS, have sometimes been re-traumatized by the reaction of healthcare workers, friends, and even family members to their disease [19]. This same type of stigma could occur for those with Long COVID. With ME/CFS, because about 20% of the general population experiences fatigue [20], it is not uncommon for people to feel their fatigue is comparable to ME/CFS, and if they can cope with their symptoms, they expect others to cope with what they believe to be similar symptoms. Yet these attitudes trivialize the experience of ME/CFS, because common fatigue is not the same as the debilitating fatigue (and other associated symptoms) of ME/CFS. The consequences are that 95% of individuals seeking medical treatment for ME/CFS report feelings of estrangement [21], 90% of patients with ME/CFS report delegitimizing experiences by physicians [22], and most cannot find a knowledgeable and sympathetic physician to care for them [23]. Avoiding similar trauma for patients with Long COVID should be a priority [24].
There are several methodological limitations to the current study. Sample sizes were small given the number of variables; however, we used leave-one-out procedures to help mitigate this issue. The highly specific demographic profile of our sample, predominantly young adult white females, limits the generalization of the findings to broader Long COVID populations. Finally, as mentioned above, we did not have an uninfected control group, and biological measures were not used in the current study. Hopefully, biomarkers will be identified in future studies to provide a better understanding of different presentation types of Long COVID and their relation to organ systems impacted by Long COVID.

5. Conclusions

This study has found that it is possible to achieve high accuracy in differentiating those with Long COVID from those who have recovered. This can be accomplished by specifying the criteria of Long COVID and measuring symptoms comprehensively. Reliably identifying those who have developed Long COVID is often the first step that needs to be taken to formulate treatment strategies. To identify cases of Long COVID, it is critical to use surveys or questionnaires with adequate reliability and validity. More work needs to be done following patients over time to determine which patients who are affected continue to be impaired.

Author Contributions

Conceptualization: L.A.J. and B.Z.K.; data curation: L.A.J. and B.Z.K.; formal analysis: L.A.J., L.R. and J.F.; methodology: L.A.J., B.Z.K., L.R. and J.F.; project administration: L.A.J. and B.Z.K.; supervision: L.A.J. and B.Z.K.; writing—original draft: L.A.J. and B.Z.K., L.R. and J.F.; writing—review and editing: L.A.J., B.Z.K., L.R. and J.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Institute of Neurological Disorders and Stroke (R01NS111105; MPIs: Jason & Katz). The content is the sole responsibility of the authors and does not necessarily reflect the official views of the National Institutes of Health.

Institutional Review Board Statement

This study was approved by the DePaul University Institutional Review Board, code: RB-2022-775, approved on 21 February 2023.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are available upon reasonable request to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Alwan, N.A. Surveillance is underestimating the burden of the COVID-19 pandemic. Lancet 2020, 396, E24.
  2. Long COVID Household Pulse Survey. 2024. Available online: https://www.cdc.gov/nchs/covid19/pulse/long-covid.htm (accessed on 1 April 2024).
  3. CDC. Nationwide COVID-19 Infection- and Vaccination-Induced Antibody Seroprevalence (Blood Donations). 2024b. 2022–2023. Available online: https://covid.cdc.gov/covid-data-tracker/#nationwideblood-donor-seroprevalence-2022 (accessed on 23 April 2024).
  4. Choutka, J.; Jansari, V.; Hornig, M.; Iwasaki, A. Unexplained post-acute infection syndromes. Nat. Med. 2022, 28, 911–923.
  5. Stephenson, T.; Pinto Pereira, S.M.; Shafran, R.; De Stavola, B.L.; Rojas, N.; McOwat, K.; Simmons, R.; Zavala, M.; O’Mahoney, L.; Chalder, T.; et al. Physical and mental health 3 months after SARS-CoV-2 infection (long covid) among adolescents in England (CLoCk): A national matched cohort study. Lancet Child. Adolesc. Health 2022, 6, 230–239.
  6. Buonsenso, D.; Munblit, D.; Pazukhina, E.; Ricchiuto, A.; Sinatti, D.; Zona, M.; De Matteis, A.; D’Ilario, F.; Gentili, C.; Lanni, R.; et al. Post-COVID condition in adults and children living in the same household in Italy. Front. Pediatr. 2022, 10, 834875.
  7. World Health Organization. A Clinical Case Definition of Post COVID-19 Condition by a Delphi Consensus; World Health Organization, Inc.: Geneva, Switzerland, 2021; Available online: https://www.who.int/publications/i/item/WHO-2019-nCoV-Post_COVID-19_condition-Clinical_case_definition-2021.1 (accessed on 1 November 2021).
  8. National Academies of Sciences, Engineering, and Medicine. A Long COVID Definition: A Chronic, Systemic Disease State with Profound Consequences; The National Academies Press: Washington, DC, USA, 2024.
  9. Thaweethai, T.; Jolley, S.E.; Karlson, E.W.; Levitan, E.B.; Levy, B.; McComsey, G.A.; McCorkell, L.; Nadkarni, G.N.; Parthasarathy, S.; Singh, U.; et al. Development of a definition of Postacute Sequelae of SARS-CoV-2 Infection. JAMA 2023, 329, 1934.
  10. Dorri, J.A.; Jason, L.A. An exploratory factor analysis of Long-COVID. Cent. Asian J. Med. Hypotheses Ethics 2022, 3, 245–256.
  11. Reiter, E.R.; Coelho, D.H.; French, E.; Costanzo, R.M.; N3C Consortium. COVID-19-Associated chemosensory loss continues to decline. Otolaryngol. Head Neck Surg. 2023, 169, 1386–1389.
  12. Oliveira, C.R.; Jason, L.A.; Unutmaz, D.; Bateman, L.; Vernon, S.D. Improvement of Long COVID symptoms over one year. Front. Med. 2022, 9, 1065620.
  13. Jason, L.A.; Sunnquist, M. The development of the DePaul Symptom Questionnaire: Original, expanded, brief and pediatric versions. Front. Pediatr. 2018, 6, 330.
  14. Jason, L.A.; Dorri, J. Predictors of impaired functioning among Long COVID patients. Work. J. Prev. Assess. Rehabil. 2023, 74, 1215–1224.
  15. Chatterjee, A.; Lahiri, S.N. Bootstrapping Lasso Estimators. J. Am. Stat. Assoc. 2011, 106, 608–625.
  16. Meinshausen, N.; Bühlmann, P. Stability selection. J. R. Stat. Soc. Ser. B 2010, 72, 417–473.
  17. Jason, L.A.; Cotler, J.; Islam, M.; Furst, J.; Katz, B.Z. Predictors for developing severe myalgic encephalomyelitis/chronic fatigue syndrome following infectious mononucleosis. J. Rehabil. Ther. 2022, 4, 1–5.
  18. Jadhakhan, F.; Lindner, O.C.; Blakemore, A.; Guthrie, E. Prevalence of medically unexplained symptoms in adults who are high users of health care services: A systematic review and meta-analysis protocol. BMJ Open 2019, 9, e027922.
  19. Froehlich, L.; Hattesohl, D.B.; Cotler, J.; Jason, L.A.; Scheibenbogen, C.; Behrends, U. Causal attributions and perceived stigma for myalgic encephalomyelitis/chronic fatigue syndrome. J. Health Psychol. 2022, 27, 2291–2304.
  20. Yoon, J.-H.; Park, N.-H.; Kang, Y.-E.; Ahn, Y.-C.; Lee, E.-J.; Son, C.-G. The demographic features of fatigue in the general population worldwide: A systematic review and meta-analysis. Front. Public Health 2023, 11, 1192121.
  21. Green, J.; Romei, J.; Natelson, B.H. Stigma and Chronic Fatigue Syndrome. J. Chronic Fatigue Syndr. 1999, 5, 63–75.
  22. Ware, N.C. Suffering and the Social Construction of Illness: The Delegitimation of Illness Experience in Chronic Fatigue Syndrome. Med. Anthropol. Q. 1992, 6, 347–361.
  23. Tidmore, T.; Jason, L.A.; Chapo-Kroger, L.; So, S.; Brown, A.; Silverman, M. Lack of knowledgeable healthcare access for patients with neuro-endocrine-immune diseases. Front. Clin. Med. 2015, 2, 46–54.
  24. Komaroff, A.L.; Bateman, L. Will COVID-19 lead to myalgic encephalomyelitis/chronic fatigue syndrome? Front. Med. 2021, 7, 606824.
Figure 1. Mean symptom composites defining Long COVID symptoms.
Figure 2. Optimal threshold for identifying Long COVID with scored model.
Table 1. Symptoms that define Long COVID using Frequency and Severity Measures.
| Symptom | LASSO Coefficient | Final Score * | Long COVID, M (SD) | Recovered Control, M (SD) |
|---|---|---|---|---|
| Shortness of breath or trouble catching your breath | 5.90 | 6 | 38.41 (20.53) | 5.45 (11.97) |
| Gastrointestinal (belly) Symptoms (pain, feeling full or vomiting after eating, nausea, diarrhea, constipation) | 4.83 | 5 | 36.82 (23.19) | 6.14 (14.41) |
| Loss of/change in smell or taste | 2.70 | 3 | 7.50 (20.36) | 1.36 (5.73) |
| Dizziness or fainting | 2.27 | 2 | 30.68 (25.66) | 5.91 (12.47) |
| Heavy legs and/or swelling of legs | 2.13 | 2 | 6.59 (15.38) | 1.14 (4.97) |
| Physically drained or sick after mild activity | 2.08 | 2 | 39.77 (26.14) | 9.09 (15.49) |
| Nose congestion | 1.57 | 2 | 46.36 (25.65) | 23.41 (13.19) |
| Muscle Aches | 1.15 | 1 | 24.77 (18.56) | 4.56 (11.12) |
| Vision problems (blurry, light sensitivity, difficulty reading or focusing, floaters, flashing light) | 0.96 | 1 | 22.95 (17.14) | 4.32 (12.55) |
| No appetite | 0.63 | 1 | 34.77 (26.43) | 14.77 (20.28) |
| Absentmindedness or forgetfulness | 0.60 | 1 | 47.72 (29.47) | 24.77 (22.50) |
| Need to nap daily | 0.46 | 0 | 40.45 (27.32) | 18.86 (22.55) |
| Nerve Problems (tremor, shaking, abnormal movements, numbness, tingling, burning, cannot move part of body, new seizures) | 0.18 | 0 | 10.45 (19.80) | 0.91 (4.72) |
| Fever, chills, and or sweating | 0.10 | 0 | 17.95 (26.33) | 9.55 (16.83) |
| Next day soreness or fatigue after non-strenuous, everyday activities | 0.07 | 0 | 39.55 (28.44) | 12.27 (16.75) |
| Fatigue | 0.03 | 0 | 55.91 (24.16) | 27.27 (20.28) |
| Difficulty finding the right word to say or expressing thoughts | 0.02 | 0 | 47.50 (32.04) | 27.95 (24.53) |
| Color changes in your skin such as red, white or purple | 0.01 | 0 | 6.36 (17.33) | 0.91 (6.74) |
| Gynecological symptoms (e.g., change in menstruation or menopause) | 0.01 | 0 | 6.82 (17.48) | 2.73 (11.45) |
| Problems remembering things | 0.01 | 0 | 45.91 (32.28) | 22.50 (25.05) |
* A symptom score was assigned by rounding coefficients to the nearest integer.
Table 2. Success of scored model in identifying Long COVID.
| Actual Condition | Predicted: Recovered Control | Predicted: Long COVID |
|---|---|---|
| Recovered Control (N = 55) | 51 | 4 |
| Long COVID (N = 55) | 6 | 49 |
Accuracy = 90.91%. Sensitivity = 89.09%. Specificity = 92.73%
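The summary statistics under Table 2 follow directly from the four cell counts. The short check below recomputes them; the function name `classification_metrics` is illustrative, and Long COVID is treated as the positive class.

```python
from typing import Dict


def classification_metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity (as percentages) from a 2x2 table."""
    total = tn + fp + fn + tp
    return {
        "accuracy": round((tp + tn) / total * 100, 2),
        "sensitivity": round(tp / (tp + fn) * 100, 2),  # Long COVID correctly flagged
        "specificity": round(tn / (tn + fp) * 100, 2),  # controls correctly cleared
    }


# Counts from Table 2: 51 and 4 for recovered controls, 6 and 49 for Long COVID.
table_2 = classification_metrics(tn=51, fp=4, fn=6, tp=49)
print(table_2)
```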
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jason, L.A.; Furst, J.; Ruesink, L.; Katz, B.Z. Developing a Long COVID Case Definition: Using Machine Learning to Distinguish Long COVID Based on Symptom Presentation. COVID 2025, 5, 205. https://doi.org/10.3390/covid5120205
