Long COVID Classification: Findings from a Clustering Analysis in the Predi-COVID Cohort Study

The increasing number of people living with Long COVID requires the development of more personalized care; currently, limited treatment options and rehabilitation programs adapted to the variety of Long COVID presentations are available. Our objective was to design an easy-to-use Long COVID classification to help stratify people with Long COVID. Individual characteristics, a detailed set of 62 self-reported persisting symptoms, and quality of life indexes were collected 12 months after the initial COVID-19 infection in a cohort of SARS-CoV-2-infected people in Luxembourg. A hierarchical ascendant classification (HAC) was used to identify clusters of people. We identified three patterns of Long COVID symptoms with a gradient in disease severity. Cluster-Mild encompassed almost 50% of the study population and was composed of participants with less severe initial infection, fewer comorbidities, and fewer persisting symptoms (mean = 2.9). Cluster-Moderate was characterized by a mean of 11.5 persisting symptoms and poor sleep and respiratory quality of life. Compared to the other clusters, Cluster-Severe was characterized by a higher proportion of women and smokers and a higher number of Long COVID symptoms, in particular vascular, urinary, and skin symptoms. Our study showed that Long COVID can be stratified into three subcategories in terms of severity. If replicated in other populations, this simple classification will help clinicians improve the care of people with Long COVID.


Introduction
It is now estimated that 10 to 20% of people infected with SARS-CoV-2 experience persisting and fluctuating symptoms more than 12 weeks after the acute infection [1,2]. This syndrome, named "Long COVID" by patients themselves, has a high impact on the quality of life of affected people and, as a consequence, on the whole healthcare system. The WHO has defined Long COVID as a condition that usually occurs 3 months after infection with SARS-CoV-2, with symptoms that last at least 2 months and cannot be explained by an alternative diagnosis [3]; however, this definition does not account for the substantial variability in the presentations of Long COVID.
Many studies have described Long COVID in post-hospitalization cohorts [4][5][6] and, with similar results, in population-based studies of less severe forms of COVID-19 [7,8]. The most commonly reported symptoms are fatigue, shortness of breath, and cognitive dysfunction, which usually have a major impact on daily life [3,7,8]. Long COVID affects many organs, with pulmonary, cardiac, thromboembolic, neurologic, and renal sequelae. However, the distribution and intensity of these sequelae in the general population are highly heterogeneous [9].
A one-size-fits-all care strategy for people with Long COVID is therefore not possible. A better understanding of the subforms of Long COVID would allow the development of personalized care and could also serve as a screening tool for future clinical trials [10]. To date, few studies have used clustering analysis to identify and characterize different Long COVID phenotypes [8,11,12].
In this study, we hypothesized that Long COVID can be stratified into different clinically relevant subgroups. To test this hypothesis, we applied hierarchical clustering to participants with Long COVID from the Predi-COVID cohort study.

Study Population
We used data from the Predi-COVID study, a prospective cohort study of persons in Luxembourg with a PCR-confirmed diagnosis of COVID-19. The study design and objectives have been published previously [13]. Participants were followed up at 12 months with a self-reported questionnaire to update their general health status, persisting symptoms, and quality of life. The Predi-COVID study was approved in April 2020 by the National Research Ethics Committee of Luxembourg (study number 202003/07) and by the Luxembourg Ministry of Health as the authorizing body.
Persisting symptoms were collected using a list of 62 symptoms [10], grouped into 8 categories: ear/nose/throat symptoms, neurological and ocular symptoms, general symptoms, cardiorespiratory symptoms or diseases, gastrointestinal symptoms, vascular and ganglionic symptoms or diseases, urinary symptoms, and skin symptoms (see online Supplementary Table S1 for the full list).
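To make this aggregation concrete, the Python sketch below shows how a participant's checklist answers could be turned into the two quantities used in the results, the number of persisting symptoms and the symptom categories involved. The eight category names come from the text; the symptom-to-category entries are a hypothetical excerpt (the real 62-item mapping is in Supplementary Table S1), and `summarize` is an illustrative helper, not part of the study code.

```python
# The eight symptom categories named in the text (labels shortened).
CATEGORIES = (
    "ear/nose/throat",
    "neurological and ocular",
    "general",
    "cardiorespiratory",
    "gastrointestinal",
    "vascular and ganglionic",
    "urinary",
    "skin",
)

# Hypothetical excerpt of the 62-item checklist; the real mapping is in Table S1.
SYMPTOM_CATEGORY = {
    "fatigue": "general",
    "irritability": "general",
    "back pain": "general",
    "headache": "neurological and ocular",
    "shortness of breath": "cardiorespiratory",
    "loss of smell": "ear/nose/throat",
    "skin rash": "skin",
}
assert set(SYMPTOM_CATEGORY.values()) <= set(CATEGORIES)


def summarize(symptoms):
    """Return (number of persisting symptoms, categories with at least one symptom)."""
    symptoms = set(symptoms)
    unknown = symptoms - set(SYMPTOM_CATEGORY)
    if unknown:
        raise ValueError(f"not in the checklist: {sorted(unknown)}")
    return len(symptoms), {SYMPTOM_CATEGORY[s] for s in symptoms}


count, categories = summarize(["fatigue", "back pain", "headache"])
print(count, sorted(categories))  # 3 ['general', 'neurological and ocular']
```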
Sleep quality was assessed using the Pittsburgh Sleep Quality Index [16]. The respiratory quality of life was assessed with the VQ11 questionnaire (global score and 3 subscores) [17]. Finally, participants were asked whether they could envisage coping with their current health status in the long term (yes/no).
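As an illustration of how a binary "poor sleep quality" flag can be derived from the Pittsburgh Sleep Quality Index, the sketch below sums the seven PSQI component scores (each scored 0 to 3, giving a global score of 0 to 21) and applies the conventional cutoff of a global score above 5. The cutoff actually used in this analysis is not stated in the text, so it is an assumption exposed as the `cutoff` parameter; the function names are hypothetical, and VQ11 scoring is not sketched.

```python
def psqi_global(components):
    """PSQI global score: sum of the 7 component scores, each 0-3 (range 0-21)."""
    if len(components) != 7 or any(c not in (0, 1, 2, 3) for c in components):
        raise ValueError("expected 7 component scores, each between 0 and 3")
    return sum(components)


def poor_sleep(components, cutoff=5):
    """Flag poor sleep quality as a global score above `cutoff` (assumed conventional value 5)."""
    return psqi_global(components) > cutoff


# Hypothetical participant; components in the standard PSQI order:
# subjective quality, latency, duration, efficiency, disturbances,
# sleeping medication, daytime dysfunction.
answers = [2, 1, 1, 0, 2, 0, 1]
print(psqi_global(answers), poor_sleep(answers))  # 7 True
```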
Participants were included in this analysis if they were adults, had a complete 12-month questionnaire and available baseline data, and declared at least one persisting symptom.

Clustering and Statistical Analysis
The clustering was based on the following features: sociodemographic characteristics, initial classification of COVID-19 disease severity, comorbidities, symptoms at inclusion, and quality of life (see online Supplementary Table S2 for the full list).
A hierarchical ascendant classification (HAC) was used to construct clusters [8]. The optimal number of clusters was determined using the "elbow" method, which plots the distortion against the number of clusters; the final choice also aimed to maintain clinical interpretability and sufficient cluster size. Cluster stability was assessed with the Jaccard similarity index. Variables with less than 5% missing data were imputed by simple imputation (median for quantitative variables and most frequent category for categorical variables); the other variables were imputed by multiple imputation with the mice R package. Data were described with numbers and percentages for categorical variables and with means and standard deviations for numerical variables. Analyses were performed with R software v4.1.2 [18], and figures were generated with the ggplot2 R package [19].
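The following Python sketch illustrates the steps described above on toy data: the simple-imputation rule for variables with less than 5% missing values, a naive ascendant (bottom-up) clustering, and the distortion values behind an elbow curve. It is not the study's R code. Ward's criterion, Euclidean distance, and purely numeric, standardized features are assumptions (the text does not state the linkage or distance, and real data mixing categorical and numeric variables would need a suitable encoding or dissimilarity); the multiple-imputation branch with mice is not reproduced; all function names are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def simple_impute(column, categorical, max_missing=0.05):
    """Fill missing values (None) with the median or the most frequent category.

    Returns None when the missing fraction is not below max_missing: such a
    variable would go to multiple imputation instead (not sketched here).
    """
    observed = [v for v in column if v is not None]
    if (len(column) - len(observed)) / len(column) >= max_missing:
        return None
    fill = Counter(observed).most_common(1)[0][0] if categorical else median(observed)
    return [fill if v is None else v for v in column]


def sse(points, members):
    """Within-cluster sum of squared Euclidean distances to the centroid."""
    dim = len(points[0])
    centroid = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
    return sum(math.dist(points[i], centroid) ** 2 for i in members)


def agglomerate(points):
    """Naive ascendant clustering with Ward's criterion (assumed linkage).

    Starts from singletons and repeatedly merges the pair of clusters whose
    union increases the total within-cluster SSE the least.
    Returns {number of clusters k: list of clusters (lists of point indices)}.
    """
    clusters = [[i] for i in range(len(points))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(
            pairs,
            key=lambda p: sse(points, clusters[p[0]] + clusters[p[1]])
            - sse(points, clusters[p[0]])
            - sse(points, clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions


def elbow_curve(points, max_k):
    """Total distortion (within-cluster SSE) for k = 1..max_k clusters."""
    partitions = agglomerate(points)
    return {k: sum(sse(points, c) for c in partitions[k]) for k in range(1, max_k + 1)}


# Hypothetical, already standardized 2-D features forming three separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
curve = elbow_curve(points, max_k=5)
for k, distortion in curve.items():
    print(k, round(distortion, 2))
```

On these toy points the distortion falls steeply up to three clusters and only marginally afterwards, which is the "elbow" pattern; as the text notes, the final choice in the study also weighed clinical interpretability and cluster size. This naive search scales roughly cubically with the number of participants, which is fine for a sketch but not for large datasets.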

Study Population Characteristics
We initially considered 545 participants, included between May 2020 and May 2021, who had a follow-up questionnaire available 12 months after their primary infection. We excluded participants with incomplete questionnaires (N = 54), participants aged less than 18 years (N = 1), participants without any information about their study inclusion (N = 19) or about their initial COVID-19 severity classification (N = 3), and participants who did not report any symptoms at 12 months (N = 180). Finally, 288 participants were included in the analysis (see online Supplementary Figure S1).
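The participant flow can be checked with simple arithmetic. The short Python snippet below, illustrative only, subtracts the exclusions reported in the text from the 545 participants initially considered and prints the running total after each step.

```python
# Counts reported in the text; the snippet only re-derives the final sample size.
initial = 545
exclusions = {
    "incomplete questionnaire": 54,
    "aged less than 18 years": 1,
    "no information about study inclusion": 19,
    "no initial COVID-19 severity classification": 3,
    "no persisting symptom at 12 months": 180,
}

analysed = initial
for reason, n in exclusions.items():
    analysed -= n
    print(f"after excluding {reason} (N = {n}): {analysed} remaining")
```

The running totals are 491, 490, 471, 468, and finally 288, matching the analysed sample.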

Clusters
Based on the elbow curve (see Figure 1), we determined the optimal number of clusters to be three, which simultaneously allowed good cluster stability (Cluster-Mild, Jaccard = 0.5707; Cluster-Moderate, Jaccard = 0.7556; and Cluster-Severe, Jaccard = 0.8297), clinical interpretability, and sufficient size for each cluster. We labeled the clusters according to their distinguishing characteristics. The characteristics of the overall study population and of the three clusters are shown in Table 1. Cluster-Mild contains 139 participants (48.26%). Only 24% of Cluster-Mild members had an initial disease severity classified as moderate/severe, a lower proportion than in the overall study population. Quality of life was less affected in this cluster than in the overall study population: only 7.9% declared that they could not envisage coping with their symptoms in the long term, 40% had poor sleep quality, and 5.8% had poor respiratory quality of life. Participants in Cluster-Mild also had fewer comorbidities (8.6%). At 12 months, they declared fewer symptoms overall (mean = 2.89, sd = 2.15), mostly in the following categories: general symptoms (58%), neurological and ocular symptoms (37%), and cardiorespiratory symptoms or diseases (24%).
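The Jaccard values reported above are per-cluster stability scores. One common way to obtain such scores is to re-cluster bootstrap resamples and, for each original cluster, average its best Jaccard similarity (intersection over union) with any resampled cluster; the R function `clusterboot` in the fpc package follows this idea, but the text does not say which resampling scheme was used. In the Python sketch below, one-dimensional single-linkage clustering (cutting at the largest gaps) stands in for the study's HAC purely to keep the example short; the data and function names are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def gap_clustering(values, idx, k):
    """Single-linkage HAC on 1-D data cut at k clusters (split at the k-1 largest gaps)."""
    order = sorted(set(idx), key=lambda i: values[i])
    gaps = sorted(range(1, len(order)),
                  key=lambda p: values[order[p]] - values[order[p - 1]],
                  reverse=True)
    bounds = [0] + sorted(gaps[:k - 1]) + [len(order)]
    return [set(order[s:e]) for s, e in zip(bounds, bounds[1:])]


def bootstrap_jaccard(values, k, n_boot=200, seed=1):
    """Per-cluster stability: mean best-match Jaccard over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(values)
    original = gap_clustering(values, range(n), k)
    totals, counts = [0.0] * k, [0] * k
    for _ in range(n_boot):
        sample = set(rng.choices(range(n), k=n))  # resample participants with replacement
        boot = gap_clustering(values, sample, k)
        for c, cluster in enumerate(original):
            present = cluster & sample  # compare only participants drawn in this resample
            if present:
                totals[c] += max(jaccard(present, b) for b in boot)
                counts[c] += 1
    return [t / m if m else float("nan") for t, m in zip(totals, counts)]


# Hypothetical 1-D scores: three well-separated groups of five participants.
values = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4, 10.0, 10.1, 10.2, 10.3, 10.4]
stability = bootstrap_jaccard(values, k=3)
print([round(s, 3) for s in stability])
```

Well-separated groups give values close to 1; clusters whose membership shifts between resamples score lower.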
Cluster-Moderate contains 106 participants (36.81%). Compared with the overall study population, its members were slightly more often women (62%) and more often had a moderate/severe initial illness (39%). Quality of life was more affected: 23% of Cluster-Moderate members declared that they could not envisage coping with their symptoms in the long term, 78% had poor sleep quality, and 48% had poor respiratory quality of life. Comorbidities were similar in Cluster-Moderate and in the overall study population, but participants declared a higher number of symptoms at 12 months (mean = 11.5, sd = 5.7). All participants had general symptoms (100%), and a large majority also had neurological and ocular symptoms (95%) and cardiorespiratory symptoms or diseases (82%). Most participants also had ear/nose/throat (ENT) symptoms (61%).
Cluster-Severe contains 43 participants (14.93%). Compared with the overall study population, its members were mostly women (72%), were more often smokers (33%), and 47% had a moderate/severe initial acute illness. As in Cluster-Moderate, quality of life in Cluster-Severe was highly affected, with 84% of participants having poor sleep quality and 51% having poor respiratory quality of life. Participants in Cluster-Severe more often presented comorbidities at inclusion (28%), hypertension being the most frequent one (28%). At 12 months, participants had a high number of symptoms (mean = 18, sd = 9). The distribution of general, neurological, and cardiorespiratory symptoms was similar to that of Cluster-Moderate: all participants had general symptoms (100%), 84% had neurological and ocular symptoms, and 91% had cardiorespiratory symptoms or diseases. Cluster-Severe was characterized by high frequencies of vascular (86%), skin (86%), and urinary (33%) symptoms.
Figure 2 shows the distribution of symptom categories in the three clusters and highlights the differences among them.
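Percentages such as those compared across clusters (the share of participants in each cluster with at least one symptom in a given category) amount to a simple cross-tabulation. The Python sketch below assumes each participant is represented by a cluster label and the set of categories in which they reported a symptom; the toy data and the `category_prevalence` name are hypothetical.

```python
from collections import defaultdict


def category_prevalence(participants):
    """Percent of participants in each cluster with >= 1 symptom in each category.

    participants: iterable of (cluster_label, set_of_categories) pairs.
    Categories never reported in a cluster are simply absent (0%).
    """
    size = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for cluster, categories in participants:
        size[cluster] += 1
        for category in categories:
            hits[cluster][category] += 1
    return {cl: {cat: 100 * n / size[cl] for cat, n in hits[cl].items()} for cl in size}


# Hypothetical toy data (four participants), not study data.
toy = [
    ("Cluster-Mild", {"general"}),
    ("Cluster-Mild", {"neurological and ocular"}),
    ("Cluster-Severe", {"general", "skin"}),
    ("Cluster-Severe", {"general"}),
]
prevalence = category_prevalence(toy)
print(prevalence["Cluster-Severe"])
```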

Discussion
In this study, we identified three clusters of Long COVID among people with persisting symptoms 12 months after acute infection, with a clear gradient in Long COVID severity. Cluster-Mild represented almost half of the study population and was composed of participants with less severe initial infection, fewer comorbidities, and few persisting symptoms (mean = 2.9), mainly in the general, neurological, or cardiorespiratory categories. Individuals in Cluster-Moderate declared a mean of 11.5 persisting symptoms and had poor sleep quality and poor respiratory quality of life. Cluster-Severe was characterized by a higher proportion of women and smokers and a higher frequency of pre-existing comorbidities than Cluster-Mild and Cluster-Moderate. Strikingly, participants from Cluster-Severe declared more persisting symptoms in total than those from Cluster-Moderate (mean = 18), with a similar pattern of general, neurological, and cardiorespiratory symptoms, but were distinguished by more frequent vascular, urinary, and skin symptoms.
General symptoms were predominant in all three clusters. This is in line with previous findings showing that general symptoms were the most frequently reported symptoms in people with persisting symptoms at 12 months, with a predominance of fatigue (34.3%), irritability (18%), anxiety (15.9%), muscle or joint pain in the lower limbs (15.6%), and back pain (14.9%) [10].
Few studies have applied clustering analysis to people with Long COVID. Kenny et al. applied similar clustering methods to a prospective cohort of 233 COVID-19-infected patients with ongoing symptoms at least 4 weeks after acute infection and also described three clusters: the largest consisted of participants with fewer persisting symptoms (mean = 2), and the other two were characterized by more persisting symptoms (mean = 4 and 6) and more functional impairment. As in our study, the distribution of persisting symptoms differed between the two most severe clusters, with one cluster grouping cardiorespiratory and general symptoms and the other dominated by pain-related symptoms. However, the timing and method of symptom evaluation differed: symptoms were assessed in person during a clinic visit, at a median symptom duration of 18 weeks [12]. Another study identified three clusters in a cohort of 1969 post-hospitalized COVID-19 patients in Spain [11]. One cluster grouped patients with fewer comorbidities and fewer symptoms at inclusion during hospitalization, fewer persisting symptoms, and a preserved quality of life. The other two clusters consisted of patients with more pre-existing comorbidities, more symptoms during the acute phase, more persisting symptoms, and a greater impact on quality of life (higher levels of anxiety and altered sleep quality). One cluster was also characterized by respiratory symptoms (dyspnea at rest, 73.4%) and particularly high limitations in daily activities (92.1% for social activities and 93.3% for instrumental daily activities). The overall number of symptoms in each cluster was lower than in our clusters because their clustering also included participants without persisting symptoms.
Another study, conducted in the United Kingdom in 2022, also described groups of people with Long COVID. It recruited more participants (N = 2550) via an online survey, with a mean illness duration of 7.2 months (sd = 1.8). The mean age was similar to that of our participants, as were the higher proportions of women and of participants with comorbidities. The most common first symptoms (fatigue, headache, chest pain, shortness of breath, and cough), persistent symptoms (fatigue, cognitive dysfunction, chest pain, shortness of breath, headache, and muscle pain), number of symptoms experienced, and organ systems affected were also similar. Participants were asked to report the presence or absence of 35 symptoms, and two groups were identified: the first group (88.8%) had mainly cardiopulmonary, cognitive, and fatigue symptoms, and the second group had more multisystem symptoms [8], which aligns relatively well with our findings.
Reese et al. applied an adapted Phenomizer algorithm to classify patients with Long COVID, identified by the ICD-10 diagnosis code U09.9 for post-COVID-19 condition, and found six clusters [20]. Although the clustering method was different and based on medical record data, this study also identified two "severe" clusters with more pre-existing comorbidities, more severe initial illness, and a wide range of Long COVID symptoms.
The larger representation of women in the most severe cluster is consistent with findings from other studies [8,12].
Finally, despite different analysis time points, these studies reported similar results, which supports the relevance of our findings despite the fluctuating nature of Long COVID.

Strengths and Limitations
This study has several strengths. First, a large list of 62 symptoms was considered, grouped into eight categories that cover the complex symptomatology of Long COVID. Second, participants with different levels of initial illness severity were represented. Third, all participants had a documented initial COVID-19 infection confirmed by a PCR test, and their symptoms were assessed 12 months after the acute infection.
This study also has some limitations. The analyses were based on a moderate sample size and, as in any selected study population, the results may not be directly extrapolated to all people with Long COVID. External validation in a larger population would be highly valuable to confirm these results. Information on symptoms present before COVID-19 infection was missing and symptoms were self-reported, which could bias the estimated number of persisting symptoms attributable to COVID-19. However, this may not affect the main message of our findings. The participants in the present study were included before the Omicron wave; thus, we cannot be sure that our results extend to Long COVID following infection with the Omicron variant. Recent studies have shown that infection with Omicron variants is associated with a 24 to 50% reduction in the risk of developing Long COVID; however, the distribution of Long COVID symptoms did not differ, and the risk of neurological and psychiatric sequelae remained the same after Omicron infection [21][22][23].

Conclusions
Our study highlighted three clinically relevant subgroups of people with Long COVID, of increasing severity and with different patterns of symptoms. Such a stratification of Long COVID could help healthcare professionals improve the triage and care of people with Long COVID.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijerph192316018/s1, Figure S1: Flowchart of participants included in the analyses (N = 288); Table S1: Full list of persisting symptoms considered in the 12-month questionnaire; Table S2: Full list of features included in the clustering.
Author Contributions: N.B., A.F., and G.F. wrote the manuscript. All the authors interpreted the data, critically revised the manuscript, and approved the final version. All authors have read and agreed to the published version of the manuscript.

Funding:
The Predi-COVID study is supported by the Luxembourg National Research Fund (FNR) (Predi-COVID, grant number 14716273), the André Losch Foundation, and the Luxembourg Institute of Health.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the National Research Ethics Committee of Luxembourg (study number 202003/07).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data are available from the corresponding author upon reasonable request.