Clusters of Pregnant Women with Severe Acute Respiratory Syndrome Due to COVID-19: An Unsupervised Learning Approach

COVID-19 has been widely explored in relation to its symptoms, outcomes, and risk profiles for the severe form of the disease. Our aim was to identify clusters of pregnant and postpartum women with severe acute respiratory syndrome (SARS) due to COVID-19 by analyzing data available in the Influenza Epidemiological Surveillance Information System of Brazil (SIVEP-Gripe) between March 2020 and August 2021. The study’s population comprised 16,409 women aged between 10 and 49 years old. Multiple correspondence analyses were performed to summarize information from 28 variables related to symptoms, comorbidities, and hospital characteristics into a set of continuous principal components (PCs). The population was segmented into three clusters based on an agglomerative hierarchical cluster analysis applied to the first 10 PCs. Cluster 1 had a higher frequency of younger women without comorbidities and with flu-like symptoms; cluster 2 was represented by women who reported mainly ageusia and anosmia; cluster 3 grouped older women with the highest frequencies of comorbidities and poor outcomes. The defined clusters revealed different levels of disease severity, which can contribute to the initial risk assessment of the patient, assisting the referral of these women to health services with an appropriate level of complexity.


Introduction
Since its emergence as a pandemic in 2020, COVID-19 has been widely explored in relation to its epidemiological characteristics, symptoms, and outcomes, as well as comorbidities and risk factors that may predispose an individual to a severe condition. In the general population, approximately 80% of cases are asymptomatic or mild, with flu-like symptoms, such as fever, myalgia, headache, malaise, cough, and sore throat [1]. Of the remaining cases, 15% are severe and 5% critical, generally requiring supplemental oxygen and mechanical ventilation, respectively [2]. Nevertheless, there are groups at greater risk of developing the severe form of the disease, such as elderly individuals, individuals with comorbidities, and pregnant women [3,4].
The pregnancy-puerperal period deserves special attention as it involves immunophysiological, hormonal, and cardiopulmonary changes that can predispose an individual to complications resulting from respiratory infections such as COVID-19 [5,6]. Similarly to the general population, pregnant women are also frequently asymptomatic [7]. However, some studies have reported more severe cases of COVID-19 in the third trimester of pregnancy [7,8], possibly due to the greater overload of the cardiopulmonary system observed at the end of pregnancy, as well as among pregnant women who have comorbidities [7,9,10] or an unfavorable socioeconomic condition [11], which could result in difficulty in accessing quality health services in a timely manner.
Severe morbidity and mortality effects from COVID-19 are directly related to the evolution of the disease from a flu-like condition to a severe acute respiratory syndrome [12], which can result in poor outcomes for mothers and their babies, such as emergency cesarean section procedures and preterm birth [13,14]. Brazil is the country with the highest rates of maternal mortality from COVID-19 [6]. According to the Observatório Obstétrico Brasileiro (OOBr), maternal deaths from COVID-19 in the first half of 2021 (911 deaths until 31 May) exceeded the number recorded in the entire year of 2020 (544 deaths) [15]. Delays in initial care and hospitalization [16] and barriers to accessing resources available in specialized intensive care services, such as mechanical ventilators [11], were frequently experienced by this group and may be the main reason for the fatal outcome.
Unsupervised learning approaches are used when several characteristics are observed for each individual, but there is no previously defined outcome of interest, responsible for guiding the analysis. Therefore, the main objective is to understand relationships between variables or observations and to simplify or reduce data or identify target groups to which specific actions can be proposed [17]. Principal component analysis, multiple correspondence analysis, and cluster analysis are some examples of techniques that may be used for these purposes.
In COVID-19 research specifically, these approaches have been applied to reveal profiles of disease severity in the general population based on symptoms, laboratory measures, and comorbidities [18–22] and to detect vaccine hesitancy segments using data on intentions to take the COVID-19 vaccine, beliefs about the vaccine and the disease, and adherence to non-pharmacological interventions [23]. In this setting, identifying clusters of pregnant and postpartum women with different risk profiles for the severe form of COVID-19 can contribute to the proposal of preventive measures, to guidelines for seeking care in services of different levels of complexity, and to the organization of specific care for this population. This study aimed to identify clusters of pregnant or postpartum women with SARS resulting from COVID-19, according to symptoms, comorbidities, and hospital characteristics, using data routinely generated by healthcare services during the COVID-19 pandemic in Brazil.

Study Design and Participants
A nationwide cross-sectional study was carried out using public data from the Influenza Epidemiological Surveillance Information System (Sistema de Informação da Vigilância Epidemiológica da Gripe, SIVEP-Gripe) from March 2020 to August 2021. SIVEP-Gripe is the official Brazilian system for recording cases of and deaths from severe acute respiratory syndrome (SARS). It has been coordinated by the Ministry of Health since the Influenza A (H1N1) pandemic of 2009. The system started recording hospitalized SARS cases resulting from COVID-19 in February 2020. For the present study, we considered women aged between 10 and 49 years old (fertile period) whose gestational stage was classified as the first (n = 1092), second (n = 3276), or third (n = 7455) trimester of pregnancy; pregnant women whose gestational age was not informed (n = 587); and those with an affirmative answer in the puerperium field, which refers to women whose childbirth occurred at most 42 days earlier (n = 3999), resulting in a total of 16,409 women.
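As a quick consistency check, the subgroup counts reported above can be summed to confirm the cohort size. The snippet below is only an illustrative sketch: the dictionary keys are labels chosen here, not SIVEP-Gripe field names.

```python
# Subgroup counts as reported in the text; the key names are illustrative labels.
subgroups = {
    "first trimester": 1092,
    "second trimester": 3276,
    "third trimester": 7455,
    "gestational age not informed": 587,
    "puerperium": 3999,
}

# Total cohort size and the share of each subgroup, in percent.
total = sum(subgroups.values())
shares = {name: round(100 * n / total, 1) for name, n in subgroups.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share}%")
```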

Study Variables
The following variables were selected from the SIVEP-Gripe:

1. Demographic and obstetric characteristics: maternal age group (
The SIVEP-Gripe form has an open field for the description of other symptoms and comorbidities. We analyzed this field to retrieve symptoms and comorbidities that had not been selected in the corresponding pre-defined fields, and we assessed characteristics that are frequently described for pregnant and postpartum women but have no pre-defined field in the form (inappetence, nasal congestion, headache, chest pain, myalgia, malaise, and preeclampsia/eclampsia). With the exception of the use of mechanical ventilation (MV), symptoms, comorbidities, and hospital characteristics were transformed into dichotomous variables ("yes" or "non-occurrence"). Furthermore, since in most cases a variable was filled in only when the characteristic was observed, empty fields were grouped into the non-occurrence category.
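The recoding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the value "1" as the affirmative code, the example field names, and the choice to place every non-affirmative answer (not only empty fields) in the non-occurrence category are all assumptions made here.

```python
# Assumption: "1" marks an affirmative answer in the raw export.
YES_CODE = "1"


def dichotomize(raw_value):
    """Map a raw field value to 'yes' or 'non-occurrence'.

    Empty or missing fields fall into 'non-occurrence', as described in the
    text. Treating every other non-affirmative code the same way is an
    assumption of this sketch.
    """
    if raw_value is None:
        return "non-occurrence"
    value = str(raw_value).strip()
    return "yes" if value == YES_CODE else "non-occurrence"


# Hypothetical records; the field names are illustrative only.
records = [
    {"fever": "1", "cough": "", "diabetes": None},
    {"fever": "2", "cough": "1", "diabetes": "9"},
]

recoded = [{field: dichotomize(v) for field, v in r.items()} for r in records]
print(recoded)
```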

Statistical Analysis
Multiple correspondence and agglomerative hierarchical clustering analyses were performed to identify segments within the pregnant and postpartum women population (Figure S1). These exploratory approaches together can provide relevant information for the clinical management of each specific subgroup in order to make better use of available resources.
As our first step, we applied multiple correspondence analysis (MCA) to the categorical variables representing symptoms, comorbidities, and hospital characteristics as a preprocessing step to derive a set of continuous and uncorrelated principal components (PCs) [24]. This method summarizes the information contained in categorical variables in a lower-dimensional space that explains the maximum amount of data variability [25]. The interpretation of MCA is based on the estimated eigenvalues and eigenvectors. The former are positive quantities related to inertia, which represents the amount of variability explained by a given PC (percentage of inertia), while the latter represent the PC orientation [24]. The eigenvalues order the PCs from the one that explains the most variability in the data to the one that explains the least. The PCs that together accounted for at least 50% of the explained variability were retained for the agglomerative hierarchical cluster analysis.
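The retention rule can be illustrated with a short sketch: given the eigenvalues of the PCs, count how many are needed for the cumulative share of inertia to reach the threshold. The function name and the eigenvalues below are hypothetical and are not the values estimated in this study.

```python
def n_components_for_inertia(eigenvalues, threshold=0.50):
    """Return (k, share): the smallest number of leading PCs whose cumulative
    share of inertia reaches `threshold`, and the share actually reached."""
    total = sum(eigenvalues)
    if not eigenvalues or total <= 0:
        raise ValueError("eigenvalues must be non-empty with a positive sum")
    # PCs are ordered from the largest to the smallest eigenvalue.
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for k, eigenvalue in enumerate(ordered, start=1):
        cumulative += eigenvalue
        if cumulative / total >= threshold:
            return k, cumulative / total
    return len(ordered), 1.0


# Hypothetical eigenvalues, for illustration only.
example_eigenvalues = [0.30, 0.25, 0.15, 0.10, 0.10, 0.06, 0.04]
k, share = n_components_for_inertia(example_eigenvalues)
print(k, round(100 * share, 2))
```

With real MCA output, the same function would be applied to the full vector of estimated eigenvalues.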
Cluster analysis includes a set of methods suitable for exploring data sets aimed at segmenting a heterogeneous population into clusters of individuals who present a high degree of homogeneity regarding the studied variables, as well as a high degree of heterogeneity between the defined clusters. These methods are divided into hierarchical and partitioning procedures, the former being of great practical interest since it does not require a prior definition of the number of clusters that has to be obtained. Hierarchical procedures can be further subdivided into agglomerative and divisive according to whether the initial step of the clustering algorithm is related to each individual being considered as a cluster or whether all individuals are a single cluster, respectively [26].
There are many agglomerative hierarchical clustering procedures, but all of them can be described according to the following basic steps: first, we have n clusters, each containing a single p-dimensional entity (individual) and an n × n symmetric matrix of distances. The task consists of searching for the most similar pair of clusters that will be merged; next, the distance matrix entries are updated, and this task is repeated until all individuals are in a single cluster; finally, we can display the procedure's results in a hierarchical tree and choose where to cross-section it to define the most suitable number of clusters [27].
We identified segments within the pregnant and postpartum women population by applying hierarchical clustering to the retained PCs in an agglomerative manner, using Ward's algorithm and Euclidean distance. This approach joins the clusters whose combination implies the minimum loss of information, i.e., the smallest increase in an error sum of squares criterion. Thus, at each step of the algorithm, all possible pairs of clusters are evaluated, and the pair whose combination results in the minimum loss of information is joined [27]. The number of clusters was determined by visual inspection of the hierarchical tree, by the gain in within-cluster homogeneity shown in an inertia plot, and by cluster interpretability [24]. The defined clusters were presented on a map formed by the first two PCs.
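The agglomerative steps and Ward's criterion described above can be sketched in a few lines. This is a naive illustration written for clarity, not the FactoMineR implementation used in the study, and it is far too slow for 16,409 individuals. It relies on the standard result that merging clusters A and B increases the error sum of squares by n_A n_B / (n_A + n_B) times the squared Euclidean distance between their centroids. The function name and the toy points are assumptions.

```python
def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    Starts with one cluster per point and repeatedly merges the pair whose
    union gives the smallest increase in the error sum of squares.
    Returns the clusters as sorted lists of point indices.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")

    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in the error sum of squares if clusters a and b are merged.
        na, nb = len(a[0]), len(b[0])
        sq_dist = sum((x - y) ** 2 for x, y in zip(a[1], b[1]))
        return na * nb / (na + nb) * sq_dist

    while len(clusters) > n_clusters:
        best = None
        # Evaluate all possible pairs of clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        na, nb = len(a[0]), len(b[0])
        # The centroid of the merged cluster is the size-weighted mean.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(a[1], b[1])]
        merged = (a[0] + b[0], centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return sorted(sorted(members) for members, _ in clusters)


# Toy coordinates standing in for individuals' scores on two PCs.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
toy_clusters = ward_clustering(toy_points, n_clusters=3)
print(toy_clusters)
```

Recording the merge cost at each step would give the quantities that a hierarchical tree and an inertia plot display.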
To validate and interpret these clusters, we estimated the absolute and relative frequencies of symptoms and comorbidities, as well as hospital, demographic, and obstetric characteristics. Furthermore, to assess whether clusters had different COVID-19 severity profiles, we compared these characteristics between the clusters by estimating relative risks (RR) and corresponding 95% confidence intervals (95% CI). Analyses were performed with R software version 4.1.3 by using the 'ca' and 'FactoMineR' packages.
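For illustration, a relative risk and its confidence interval for one dichotomous characteristic can be computed as sketched below. The text does not state which interval method was used, so the log-scale (Katz) interval shown here is an assumption, and the counts are hypothetical rather than taken from Table 3.

```python
import math


def relative_risk(events_cluster, n_cluster, events_reference, n_reference, z=1.96):
    """Relative risk of a characteristic in one cluster versus the reference
    cluster, with an approximate 95% CI built on the log scale.

    Assumes at least one event in each group.
    """
    risk_cluster = events_cluster / n_cluster
    risk_reference = events_reference / n_reference
    rr = risk_cluster / risk_reference
    # Standard error of log(RR).
    se_log_rr = math.sqrt(
        1 / events_cluster - 1 / n_cluster
        + 1 / events_reference - 1 / n_reference
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 20 of 100 women in a cluster versus 10 of 100 in the reference.
rr, lower, upper = relative_risk(20, 100, 10, 100)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1 indicates that the frequency of the characteristic differs between the cluster and the reference group at the corresponding confidence level.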

Multiple Correspondence Analysis
Between March 2020 and August 2021, 16,409 pregnant or postpartum women were registered in SIVEP-Gripe with SARS due to COVID-19. We retained 10 PCs from the MCA applied to 28 categorical variables related to symptoms, comorbidities, and hospital characteristics. These PCs summarized 51.18% of the total inertia (Table 1). Overall, the first PC distinguished women with mild symptoms related to a flu-like condition (positive responses to 16: headache; 17: malaise; and 18: nasal congestion) and without comorbidities (upper left quadrant) from symptomatic women with comorbidities. The second PC seems to differentiate moderate from severe cases, the former represented by women with positive responses for ageusia (1), anosmia (2), abdominal pain (3), diarrhea (4), and vomiting (5) (upper right quadrant) and the latter by poor hospital outcomes (26a: invasive MV; 27: ICU admission; and 28: death, in the lower right quadrant) (Figure 1).
In Figure 1, the black dots are "yes" categories, numbered according to the variable labels presented in the methods section. Gray dots correspond to the "non-occurrence" category for symptoms, comorbidities, or hospital characteristics.

Agglomerative Hierarchical Clustering
Hierarchical clustering was applied to the top 10 PCs retained from the MCA. A visual inspection of the hierarchical tree (Figure S2) and inertia plot (Figure S3) suggested a population segmentation into three clusters (Figure 2).
In Table 2, we present the frequency of affirmative answers for the analyzed variables according to clusters. Cluster 1 (n = 7866; 48%) grouped younger women with the lowest frequencies of comorbidities and poor hospital outcomes (i.e., MV, ICU admission, and death), while cluster 3 (n = 6327; 38.5%) aggregated women with the highest frequencies of these characteristics, indicating segments of mild and severe disease, respectively. Cluster 2 (n = 2216; 13.5%), on the other hand, presented higher frequencies of anosmia and ageusia, as well as gastrointestinal symptoms. Additionally, this group is situated between cluster 1 and cluster 3 regarding positive answers for symptoms, comorbidities, and poor hospital outcomes, suggesting a segment of patients with moderate cases of the disease.
Table 3 shows the comparison of the frequencies of symptoms, comorbidities, and hospital, demographic, and obstetric characteristics between clusters, considering cluster 1 as the reference group.
Table 3. Relative Risk (RR) with 95% confidence interval (CI) for the comparison of symptoms, comorbidities, and hospital characteristics between clusters (cluster 1: reference group) of pregnant and postpartum women with SARS due to COVID-19.
Table 3 shows a gradual increase in frequencies from the first to the third cluster for variables related to hospital characteristics, lower airway respiratory symptoms (except fatigue), and comorbidities (except eclampsia). Furthermore, cluster 2 presented the highest relative risks for anosmia, ageusia, and gastrointestinal symptoms. Flu-like symptoms presented the most similar frequencies between the clusters (RR close to 1). In addition, cluster 1, despite being the one with a mild risk profile, had a higher frequency of pregnant women who were Indigenous, mixed, Black, or without race/skin color information and who were aged less than 35 years old, suggesting a group with an unfavorable socioeconomic condition.

Discussion
In this study, we analyzed data from pregnant and postpartum women with SARS due to COVID-19 in Brazilian health services between March 2020 and August 2021. We identified three clusters in this population, corresponding to different disease severity profiles. In summary, cluster 1 was represented by younger women with the lowest frequencies of comorbidities and poor hospital outcomes. This group presented flu-like symptoms, suggesting a segment of mild COVID-19 cases. Cluster 2 was represented by women who reported a variety of symptoms, mainly nasopharyngeal (ageusia and anosmia) and gastrointestinal. Finally, cluster 3 showed a more severe profile of COVID-19, grouping older women with the highest frequencies of comorbidities, mainly obesity, who most often required invasive MV and ICU admission. Mortality was also higher in this segment of the population.
Women in the pregnancy-puerperal period are routinely followed up by specialized health services, both during prenatal care and at the time of childbirth. However, given the diagnosis of viral infections, especially those that affect the airways, this population needs priority care, due to the implications of the disease for women and their babies. In Brazil, the COVID-19 pandemic highlighted pregnant women's difficulties in accessing health services, which, consequently, led to a delay in providing care to this population, which may have contributed to the high rate of maternal mortality from COVID-19 [6,16].
The population segmentation that we performed identified a group (cluster 1) with a higher frequency of younger women who were Indigenous, mixed, Black, or without race/skin color information and who, despite having symptoms consistent with mild disease, required care at health facilities. This group matches the expected profile for most cases of COVID-19, representing a segment of pregnant/postpartum women with symptoms similar to a flu-like condition. A systematic review [28] highlighted that the course of COVID-19 in pregnant women is similar to that of the general population, with most cases being asymptomatic and those with symptoms presenting a mild condition. Dashraath et al. (2021) [29] also reported no differences between pregnant and non-pregnant women regarding symptoms of COVID-19, with a predominance of fever, cough, dyspnea, and lymphopenia. Importantly, as these symptoms are commonly seen in colds and flu, this profile can confuse the diagnosis of COVID-19 and delay proper management. Additionally, the condition of social vulnerability requires close monitoring, as it is indicative of greater difficulty in accessing healthcare services.
Cluster 2 aggregated women who most frequently reported gastrointestinal symptoms, ageusia, and anosmia. Teixeira et al. (2021) [30] reported ageusia and anosmia as the most common clinical symptoms of COVID-19 in pregnant women with comorbidities in a study carried out in Rio de Janeiro. These are the most specific symptoms in COVID-19 infections, with great potential to be used as "flag symptoms" in the context of initial risk assessments in health facilities [31]. Mutiawati et al. (2021) [32] performed a meta-analysis aiming to estimate the prevalence of anosmia and dysgeusia in patients with COVID-19 and concluded that these symptoms are more prevalent in patients with COVID-19 than in those with other respiratory diseases.
On the other hand, gastrointestinal symptoms are also common in several viral infections and represent a warning sign for the worsening of the general condition of patients, especially as a result of dehydration. A review study focusing on the gastrointestinal symptoms of COVID-19 [31] concluded that, similarly to the general population, pregnant women may experience gastrointestinal manifestations such as diarrhea, abdominal pain, nausea, and a loss of appetite. One hypothesis is that SARS-CoV-2 binds to human ACE-2 receptors that are also present in intestinal cells, hepatocytes, and cholangiocytes, making the gastrointestinal tract a potential route for infection. Thus, pregnant women with gastrointestinal symptoms should be closely monitored in appropriate health facilities, as these signs can indicate both SARS-CoV-2 and other infections and can also lead to secondary conditions such as dehydration.
Studies that analyzed data from SIVEP-Gripe to describe characteristics related to maternal morbidity and mortality due to SARS from COVID-19 infection highlighted characteristics similar to those of the cluster 3 profile [33,34]. The study in [33] described a fatality rate of 12.7% among 978 pregnant and postpartum women with SARS from COVID-19 until 18 June 2020. Among the fatal cases, the authors highlighted the frequency of at least one comorbidity (48.4%), as well as ICU admission (58.9%) and invasive MV (53.2%). Additionally, the main risk factors for maternal death were being postpartum at the onset of SARS, being obese, and having diabetes or cardiovascular disease. Menezes et al. (2020) [34] analyzed data up to 14 July 2020 and identified 2,475 pregnant and postpartum women with SARS due to COVID-19 and 204 deaths. The authors emphasized that age over 35 years, obesity, diabetes, black skin color, living in a peri-urban area, not having access to the family health strategy, or living more than 100 km from the reporting hospital were risk factors for adverse outcomes (death, ICU admission, and MV).
According to our analysis, older women, women in the puerperium period, and women with comorbidities were more frequent in cluster 3. This cluster also had a higher frequency of invasive MV, ICU admission, and death. In addition, symptoms indicative of more severe cases of the disease, such as dyspnea, fatigue, oxygen saturation less than 95%, and respiratory distress, were also more prevalent in this group. Lassi et al. (2021) [35] systematically reviewed studies comparing pregnant women with severe and non-severe COVID-19 and highlighted an increased risk of severe COVID-19 for older pregnant women (over 35 years old) with comorbidities (obesity, diabetes, and preeclampsia). In addition, the authors reported a high risk of adverse neonatal outcomes, such as preterm birth, among women with a severe case of the disease.
Among the comorbidities analyzed in our study, obesity was the most frequently observed in cluster 3. In the general population, this condition has been recognized as one of the main risk factors for severe COVID-19 cases [3]. Additionally, among pregnant women, obesity may also imply a higher risk of cesarean section [36]. According to a review of studies of pregnant women with COVID-19, obesity doubled the risk of death [37]. Furthermore, a multicenter cohort study of pregnant women with and without a diagnosis of COVID-19 [13] revealed a higher frequency of overweight in early pregnancy, as well as preeclampsia/eclampsia, ICU admission, maternal mortality, and preterm birth, among pregnant women diagnosed with COVID-19. These results suggest that the presence of comorbidities, especially obesity, increases the risk of a worse prognosis of COVID-19.
Finally, it is important to emphasize that, in addition to the direct impacts of the COVID-19 pandemic, indirect consequences related to delays in obstetric care due to suspensions of consultations or changes in birth plans, as well as worsening socioeconomic status and increased mental suffering due to fear of illness, implied reduced access to and quality of obstetric care during the pregnancy-puerperal period, thus favoring poor outcomes [38,39].
Our study has limitations that need to be mentioned. First, we analyzed a public database, so the number of cases we identified may be underestimated due to the underreporting of cases, for example, from patients who had symptoms and did not undergo tests or were misdiagnosed. Another limitation refers to the absence of data on neonatal outcomes in this dataset. Finally, we considered data from the beginning of the pandemic to August 2021, since we chose to analyze a setting at the moment immediately before the start of vaccination of pregnant women (July 2021).
Despite the advances achieved in maternal and child health during the last decades, with the adoption of public policies aimed at monitoring prenatal care and childbirth, there are still several barriers to be faced for access to quality health services in Brazil. The alarming number of maternal deaths from COVID-19 points to delays in health care access for this population that are exacerbated by socioeconomic and spatial inequalities [34,40,41].

Conclusions
Our results highlighted the importance of continuously monitoring the population of pregnant and postpartum women, as well as of properly referring them to specialized care services prepared to attend this specific population. The clusters identified in this study can contribute to the initial assessment of the patient, differentiating general conditions from those most likely to be related to COVID-19, and those indicative of severe cases, assisting in the referral of these women to health services with the appropriate level of complexity. Moreover, different profiles reinforce key variables that could be used in the initial risk assessment and in the prognostic evaluation, helping to improve the clinical management of each specific subgroup and, thus, implying the more efficient use of scarce resources currently available in healthcare services.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijerph192013522/s1. Figure S1: Flowchart of the Multiple Correspondence Analysis-Agglomerative Hierarchical Clustering procedure. Figure S2: Agglomerative Hierarchical Clustering of the pregnant and postpartum women with SARS due to COVID-19 on the top 10 Principal Components. Population was segmented into three clusters. Hierarchical tree (cut off value: 0.04). Figure S3: Inertia plot. Since the increase in homogeneity within the cluster from three to four clusters is small when compared to that observed from two to three clusters, the population was segmented into three clusters.
Author Contributions: I.C.R.C., S.G.F., G.F.S. and H.G.d.S. contributed to conceptualization, data curation, methodology design, and formal analysis. I.C.R.C. and H.G.d.S. wrote the original draft. S.G.F., G.F.S. and A.D.P.C.F. reviewed and edited the original draft. All authors contributed to the manuscript's revision. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

SARS
Severe