Sequential Multiple Imputation for Real-World Health-Related Quality of Life Missing Data after Bariatric Surgery

One of the main challenges for the successful implementation of health-related quality of life (HRQoL) assessments is missing data. The current study examined the feasibility and validity of a sequential multiple imputation (MI) method to deal with missing values in the longitudinal HRQoL data from the Scandinavian Obesity Surgery Registry (SOReg). All patients in the SOReg who received bariatric surgery between 1 January 2011 and 31 March 2019 (n = 47,653) were included for the descriptive analysis and missingness pattern exploration. The patients who had completed the short-form 36 (SF-36) at baseline (year 0) and at the one-, two-, and five-year follow-ups (n = 3957) were included for the missingness pattern simulation and the sequential MI analysis. Eleven items of the SF-36 were selected to create the six domains of SF-6D, and the SF-6D utility index of each patient was calculated accordingly. The multiply-imputed variables from the previous year were used as input to impute the missing values in later years. The performance of the sequential MI was evaluated by comparing the actual values with the imputed values of the selected SF-36 items and index at all four time points. At the baseline and year 1, where missing proportions were about 20% and 40%, respectively, there were no statistically significant discrepancies between the distributions of the actual and imputed responses (all p-values > 0.05). In year 2, where the missing proportion was about 60%, distributions of the actual and imputed responses were consistent in 9 of the 11 SF-36 items. However, in year 5, where the missing proportion was about 80%, no consistency was found between the actual and imputed responses in any of the SF-36 items. Relatively high missing proportions in HRQoL data are common in clinical registries, which poses a challenge for analyzing the HRQoL of longitudinal cohorts. The experimental sequential multiple imputation method adopted in the current study might be an ideal strategy for handling missing data (even when the follow-up survey had a missing proportion of 60%), avoiding significant information waste in the multivariate analysis. However, the imputations for data with higher missing proportions warrant more research.


Introduction
Health-related quality of life (HRQoL) represents the subjective evaluation of a patient's health status, providing complementary information to survival, cures, and biological responses to treatment [1]. HRQoL data have been increasingly collected in clinical trials, population health surveys, and clinical registers in many countries [2][3][4][5]. However, one of the main challenges for the successful implementation of HRQoL assessments is missing data, which can occur at the item level, i.e., respondents do not provide answers to certain items in an HRQoL questionnaire, or at the form level, i.e., entire forms are missing due to loss to follow-up [1]. The missing data may lead to biased conclusions if unattended. Therefore, it is important to understand missingness patterns and handle missing data properly when analyzing HRQoL data. In economic evaluations, HRQoL instruments that can be used to generate health utility data, also known as preference-based measures, are applied. The most commonly applied preference-based measures are EQ-5D [6], short-form-6D (SF-6D) [7,8], and the health utilities index [9]. Missing data are also an important issue when it comes to health utility calculations since these measures require complete answers to all of the relevant items [1,10].
Both missing items and missing forms in HRQoL are rather common in clinical trials or observational studies, which may reduce statistical power and present a challenge for research in this field [11,12]. During the data collection phases, strategies could be integrated into the study design to minimize the incidence of missing data. However, once the trial/registry has started, as an analyst, one has little influence on how data are collected but primarily relies on analytical methods to account for missing data [13]. The development of sound strategies for handling missing data includes imputation methods. The current practices of handling missing data in HRQoL studies include list-wise deletion, single imputation by replacing the missing value with one previously observed value or mean value, multiple imputation(s) (MI), and model-based approaches [14,15]. Among them, MI methods are widely recommended because they may incorporate uncertainty around the missing values [16]. However, this is often poorly applied in reality [17].
Managing missing real-world data, including HRQoL data, has become a challenging issue with the rapidly increasing applications to real-world data in recent years. Real-world data are derived from a number of sources that document outcomes in a heterogeneous patient population in real-world settings, including (but not limited to) electronic health records, health insurance claims, and patient surveys [18]. Real-world data provide insights beyond those that can be derived from clinical trials as they follow patients with different characteristics in real-life situations, and often for longer periods than clinical trials [19]. Compared with well-conducted randomized controlled trials (RCTs), missing data are more pronounced in real-world data because data can be missing for exposures, known confounders, and outcomes [13]. However, existing guidance and standards for handling missing data most often only concern RCTs. Currently, there are no standards or formal guidelines on how to deal with missing real-world HRQoL data [13].
In this study, we demonstrated and simulated the missingness of real-world HRQoL data from the Scandinavian Obesity Surgery Registry (SOReg) [20], and examined whether a sequential MI procedure is a practical strategy for handling missing values in the short-form 36 (SF-36) and SF-6D forms. Because the HRQoL data were repeatedly collected at four time points, the sequential MI procedure imputed the missing values chronologically, i.e., the missing data in a later follow-up were imputed using the multiply-imputed datasets of a prior follow-up.

Materials and Methods
This research applied data from existing registers in Sweden. Data retrieval, analyses, and presentation results were performed in accordance with the Declaration of Helsinki. The research work was approved by the Swedish Ethical Review Agency (Etikprövningsmyndigheten; approval numbers: 2019-03666 and 2019-05713).

Data Sources
The Scandinavian Obesity Surgery Registry (SOReg) is a Swedish national quality registry for bariatric surgery management and research. It has a coverage of >98% nationwide, its internal validity is evaluated regularly, and it has high data quality [21]. Data on patients' sociodemographic information, hospital characteristics, and detailed information regarding the surgeries and post-surgery outcomes, including HRQoL assessed by SF-36 and the Obesity Problem Scale [22,23], were obtained from the SOReg. Patients included in the study reported their HRQoL data at baseline (i.e., prior to surgery) and years 1, 2, and 5 postoperatively by filling out a questionnaire. Specialized nurses collected anthropometric data and completed questionnaires. Data entry was performed by trained persons (participating surgeons, plus dedicated nurses in each center).
In the current study, all patients who received bariatric surgery between 1 January 2011 and 31 March 2019 (n = 47,653) were included for descriptive analysis and missingness pattern explorations, while only patients who had completed SF-36 at baseline (year zero), and at the one-, two-, and five-year follow-ups were included (n = 3957) as part of an analytical dataset to evaluate the multiple imputation process.

SF-36 and SF-6D
SF-36 measures HRQoL with 36 items, which can be grouped into 8 domains (physical function, role-physical, bodily pain, general health, vitality, social function, role-emotional, and mental health), and each item contains 2-6 severity levels [24]. In order to elicit the health utility, 11 items of the SF-36 (Supplementary Material Table S1) were selected to create SF-6D, including 6 domains (pain, mental health, physical functioning, social functioning, role participation, and vitality). Each domain described four to six severity levels (Supplementary Material Table S2) [7]. The SF-6D utility index of each patient in the current study was calculated using Formula (1), in which $y_i$ ($i = 1, 2, \ldots, 6$) indicates the SF-6D domains, each of which can take $m$ levels ($m = 2, 3, \ldots, 5$ or $6$); $X_{y_i=m}$ represents the dummy variables that indicate levels 2 to 5 or 6, and $\beta_{y_i=m}$ is the associated coefficient; $\beta_{\mathrm{constant}}$ corresponds to the constant deviating from full health; and Most is a dummy variable indicating that there is at least one dimension at levels 5 or 6. Missing information on any of the 11 items would lead to missingness in the SF-6D domains (the right-hand side of the equation), which in turn would lead to missingness in the SF-6D index score. Two methods can be applied for index imputation: (1) impute the items first and then calculate the index based on the above formula; (2) impute the index directly. The results may differ between the two methods. In the current study, we applied the second method, as it is also useful even when information from the items is missing.
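The dependence of the index on complete item information can be illustrated with a minimal Python sketch. It assumes an additive scoring form that is merely consistent with the symbols defined above (full health scored as 1, a constant, level-specific coefficients, and a Most term); the function name `sf6d_index`, its arguments, and all coefficient values are hypothetical placeholders and are not the published SF-6D value set.

```python
from typing import Dict, Optional, Tuple


def sf6d_index(levels: Dict[str, Optional[int]],
               coef: Dict[Tuple[str, int], float],
               constant: float,
               most_coef: float) -> Optional[float]:
    """Additive index sketch; returns None if any domain level is missing."""
    if any(level is None for level in levels.values()):
        return None  # one missing domain makes the whole index missing
    score = 1.0 + constant  # assumption: full health is scored as 1
    for domain, level in levels.items():
        if level >= 2:  # level 1 is the reference and has no dummy variable
            score += coef[(domain, level)]
    if any(level >= 5 for level in levels.values()):
        score += most_coef  # "Most" dummy: at least one domain at level 5 or 6
    return score


# Placeholder coefficients (NOT the published value set): -0.02 per level step.
domains = ["pain", "mental", "physical", "social", "role", "vitality"]
coef = {(d, m): -0.02 * (m - 1) for d in domains for m in range(2, 7)}

complete = {d: 2 for d in domains}
partial = dict(complete, vitality=None)  # one domain unanswered

print(sf6d_index(complete, coef, constant=-0.05, most_coef=-0.07))
print(sf6d_index(partial, coef, constant=-0.05, most_coef=-0.07))  # None
```

The second call returns `None` because a single unanswered domain is enough to leave the index undefined, which is why the choice between imputing items and imputing the index matters.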

Missingness Mechanism and Missingness Pattern Simulation
The widely used missingness mechanisms in simulation studies on multiple imputations are: missing completely at random, missing at random (MAR), and missing not at random [16]. In the current study, we simulated missingness in the analytical dataset according to a MAR mechanism, which assumes that the probability that a value is missing does not depend on the unobserved data, but only on the observed data.
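As a minimal illustration of the MAR assumption, the Python sketch below masks a score with a probability that depends only on an observed covariate (age) and never on the score itself. The variable names, the age cut-off of 45, and the probabilities 0.4 and 0.1 are hypothetical choices for the example, not values from the study.

```python
import random


def mask_mar(rows, rng):
    """Mask 'score' with a probability that depends only on observed 'age'."""
    masked = []
    for row in rows:
        p_missing = 0.4 if row["age"] >= 45 else 0.1  # hypothetical MAR rule
        score = None if rng.random() < p_missing else row["score"]
        masked.append({"age": row["age"], "score": score})
    return masked


def missing_rate(subset):
    """Proportion of rows in which the score is missing."""
    return sum(r["score"] is None for r in subset) / len(subset)


rng = random.Random(42)
rows = [{"age": rng.randint(20, 70), "score": rng.uniform(0.3, 1.0)}
        for _ in range(4000)]
masked = mask_mar(rows, rng)

older = [r for r in masked if r["age"] >= 45]
younger = [r for r in masked if r["age"] < 45]
print(round(missing_rate(older), 2), round(missing_rate(younger), 2))
```

Because age is always observed, the difference in missingness between the two groups can be modeled from the observed data, which is the condition under which multiple imputation is expected to be valid.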
To ensure that the missingness patterns of the analytical dataset used for multiple imputations may reflect the patterns found in the real-world data, we explored missingness patterns of the selected 11 SF-36 items and SF-6D index at baseline and one-, two-, and five-year follow-ups for the real-world data. The missingness pattern explorations were conducted using the package mice in the statistical software R 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria). The overall missing proportion of the SF-36 items at baseline was 19.6%. The missing proportions of the 11 SF-36 items at baseline from the highest to the lowest are shown in Figure 1 (left) and a total of 163 missingness patterns were found (Figure 1, right). Each row in the right panel of Figure 1 is a missingness pattern that indicates where the missing values (red colored) are located in the 11 SF-36 items.
The overall missing proportions of the 11 SF-36 items in the one-, two-, and five-year follow-ups were 40.7%, 62.9%, and 83.8%, respectively, and the missingness patterns are shown in Supplemental Figures S1-S3. In total, 150, 105, and 64 missingness patterns of the 11 SF-36 items were found in the one-, two-, and five-year follow-ups, respectively.
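The pattern exploration in the study was run with the R package mice; the Python sketch below only illustrates the underlying idea on toy data. A missingness pattern is taken to be the set of items that are missing for a respondent, and the overall missing proportion is computed here as the share of missing cells, which is one possible definition and an assumption of this example.

```python
from collections import Counter


def missingness_patterns(rows):
    """Count distinct missingness patterns; True marks a missing item."""
    return Counter(tuple(v is None for v in row) for row in rows)


def overall_missing_proportion(rows):
    """Share of missing cells among all item cells (one possible definition)."""
    cells = [v for row in rows for v in row]
    return sum(v is None for v in cells) / len(cells)


# Toy data: 5 respondents x 3 items (the study used 11 SF-36 items).
rows = [
    [3, 2, 4],
    [None, 2, 4],
    [None, 1, 5],
    [None, None, None],  # a missing form: every item is missing
    [2, 3, None],
]
patterns = missingness_patterns(rows)
print(len(patterns))                     # 4 distinct patterns
print(overall_missing_proportion(rows))  # 6 of 15 cells -> 0.4
```

In this sketch the fully observed pattern is counted as one of the patterns.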
To evaluate the performance of the proposed MI procedure using the analytical dataset, there was a need to simulate the missingness in the data by masking some known values in the analytical dataset. The missingness patterns (right panels of Figure 1, Supplemental Figures S1-S3) detected in the real-world data were applied to mask the values in the 11 SF-36 items at the four time points of the analytical dataset. The masking of the known values was conducted using the package mice as well.
The simulated missingness of the analytical dataset for the selected 11 SF-36 items at baseline is shown in Figure 2, with an overall missing proportion of 20.0%. The missing proportions of the 11 items in the analytical dataset (Figure 2, left) were similar to those in the real-world data (Figure 1, left). The simulated missingness of the analytical dataset for the selected 11 SF-36 items in the one-, two-, and five-year follow-ups is shown in Supplemental Figures S4-S6, with overall missing proportions of 40.4%, 63.8%, and 83.7%, respectively.
In general, the number of missingness patterns decreased with fewer observations. Because the analytical dataset has far fewer observations than the real-world data (3957 vs. 47,653), the analytical dataset had fewer missingness patterns than the real-world data; however, the percentages of the top missingness patterns of both datasets were similar.
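The masking step can be sketched in Python as follows. This simplified version draws, for every complete row, one pattern from a donor dataset so that pattern frequencies are approximately preserved; unlike the MAR-based masking performed with mice in the study, it does not make the choice of pattern depend on observed covariates. The function name `transplant_patterns` and the toy data are hypothetical.

```python
import random


def transplant_patterns(complete_rows, donor_rows, rng):
    """Mask complete rows using patterns sampled from a donor dataset."""
    patterns = [tuple(v is None for v in row) for row in donor_rows]
    masked = []
    for row in complete_rows:
        pattern = rng.choice(patterns)  # drawn in proportion to donor frequency
        masked.append([None if miss else v for v, miss in zip(row, pattern)])
    return masked


rng = random.Random(7)
# Hypothetical donor data in which 20% of the forms are entirely missing.
donor = [[None, None] if i % 5 == 0 else [1, 2] for i in range(1000)]
complete = [[3, 4] for _ in range(2000)]
masked = transplant_patterns(complete, donor, rng)

share_missing = sum(v is None for row in masked for v in row) / (2 * len(masked))
print(round(share_missing, 2))
```

Because the original complete rows are left untouched, the masked values remain available as the "actual" values against which imputed values can later be compared.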

Process of the Sequential Multiple Imputation
We applied a sequential method to impute the missing values at baseline (year 0), and years 1, 2, and 5 in order. The process of the sequential multiple imputation is shown in Figure 5, the "Sequential multiple imputation" step, and described in detail as follows:
Firstly, the missing values of the selected 11 SF-36 items at baseline (year 0) were multiply-imputed (five imputations were used in the current study) using all baseline variables, including age, sex, BMI, pregnancy, and comorbidities, including sleep apnea, hypertension, diabetes, dyslipidemia, dyspepsia, diarrhea, depression, and other illnesses that may have contributed to the surgical decisions. Five imputed datasets were generated for the baseline data.
Secondly, for each imputed baseline dataset, the missing values of the selected 11 SF-36 items and comorbidities at the one-year follow-up were multiply-imputed based on all the baseline variables, as well as the previously imputed SF-36 items. Five imputed datasets were generated for each imputed baseline (year 0) dataset; therefore, in total, 5 × 5 = 25 imputed datasets were generated for the one-year follow-up.
Similarly, for missing values in the two- and five-year follow-ups, they were imputed five times for each previously imputed dataset based on all the variables in the previous years. Therefore, in total, 125 (5 × 25 imputed datasets of year 1) and 625 (5 × 125 imputed datasets of year 2) imputed datasets were generated for the two- and five-year follow-ups, respectively.
When conducting the multiple imputations within each year, the multivariate imputation using chained equations was used, with predictive mean matching, logistic regression, and proportional odds regression for continuous, binary, and ordered variables, respectively [25].
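The nesting of the sequential procedure can be sketched in Python as below. The sketch reproduces only the bookkeeping (each imputed dataset of one wave spawns five imputed datasets of the next wave); the imputation model itself is replaced by a crude nearest-donor draw on the previous wave, which is a stand-in chosen for brevity and not the chained-equations models used in the study. All function names and the toy data are hypothetical.

```python
import random

M = 5  # imputations per step, as in the text


def impute_wave(rows, t, rng, k=3):
    """Return a copy of rows with wave t filled by a crude donor draw.

    Stand-in for chained equations: a missing value is copied from one of
    the k observed rows closest on the previous (already complete) wave.
    """
    out = [list(r) for r in rows]
    donors = [r for r in out if r[t] is not None]
    for r in out:
        if r[t] is None:
            if t == 0:
                pool = donors  # no earlier wave to match on
            else:
                pool = sorted(donors, key=lambda d: abs(d[t - 1] - r[t - 1]))[:k]
            r[t] = rng.choice(pool)[t]
    return out


def sequential_mi(rows, n_waves, m=M, seed=0):
    """Impute wave by wave; every dataset spawns m datasets at the next wave."""
    rng = random.Random(seed)
    levels, current = [], [rows]
    for t in range(n_waves):
        current = [impute_wave(ds, t, rng) for ds in current for _ in range(m)]
        levels.append(current)
    return levels


# Toy data: 40 patients x 4 waves, with missingness rising over time.
rng = random.Random(1)
rows = []
for _ in range(40):
    base = rng.uniform(0.4, 0.9)
    rows.append([round(base + 0.02 * t + rng.gauss(0, 0.03), 3) for t in range(4)])
for r in rows[5:]:  # the first 5 rows stay complete so donors always exist
    for t, rate in enumerate([0.2, 0.4, 0.6, 0.8]):
        if rng.random() < rate:
            r[t] = None

levels = sequential_mi(rows, n_waves=4)
print([len(level) for level in levels])  # [5, 25, 125, 625]
```

The growth from 5 to 625 datasets is the practical cost of the design: uncertainty from earlier waves is carried forward, but so is the computational and storage burden.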

Assessment of Performance
The performance of the sequential multiple imputation approach was evaluated by comparing the actual values with the imputed values of the selected 11 SF-36 items and index at all four time points (baseline, one-, two-, and five-year follow-ups). The SF-36 items were compared using frequency distributions of the actual and imputed item scores, and the agreement of the distributions was tested using the chi-squared test controlled for the false discovery rate [26,27]. The mean absolute percentage error (MAPE), one of the most common metrics used to measure accuracy for continuous variables, was calculated to assess the agreement between the actual and imputed values for the SF-6D index [28][29][30]. MAPE is the mean of the absolute difference between the actual and imputed values divided by the actual values [31]. MAPE < 10% is excellent, <20% is good, 20-50% is fine, and >50% is poor [32].
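The two ingredients of this assessment can be sketched in Python. The MAPE function follows the definition given above and the rating uses the quoted thresholds (a value of exactly 50% is assigned to "fine" here, which is a choice made for this sketch). For the false discovery rate, the Benjamini-Hochberg adjustment is assumed as the procedure, since the text does not name one; the p-values in the example are illustrative and are not results from the study.

```python
def mape(actual, imputed):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs(a - b) / abs(a) for a, b in zip(actual, imputed)]
    return 100 * sum(errors) / len(errors)


def rate_mape(value):
    """Qualitative rating using the thresholds quoted in the text."""
    if value < 10:
        return "excellent"
    if value < 20:
        return "good"
    if value <= 50:
        return "fine"
    return "poor"


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


actual = [0.5, 0.8]
imputed = [0.52, 0.76]
value = mape(actual, imputed)
print(round(value, 2), rate_mape(value))  # 4.5 excellent

# Illustrative p-values for four items (not study results).
print([round(p, 4) for p in bh_adjust([0.01, 0.04, 0.03, 0.20])])
```

A MAPE of 4.5% means that, on average, an imputed value deviates from the actual value by 4.5% of the actual value.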
The intraclass correlation coefficients (ICCs) were also provided to indicate the agreement between the actual values and the imputed values. An ICC value below 0.50, between 0.50 and 0.75, between 0.75 and 0.90, or above 0.90 indicates poor, moderate, good, or excellent agreement, respectively [33].
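A minimal Python sketch of an ICC between actual and imputed values is given below. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is assumed here; the function names and the example values are hypothetical.

```python
def icc_2_1(actual, imputed):
    """ICC(2,1) for two measurements per subject (assumed ICC form)."""
    n, k = len(actual), 2
    pairs = list(zip(actual, imputed))
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(actual) / n, sum(imputed) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for pair in pairs for x in pair)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def rate_icc(value):
    """Agreement rating using the thresholds quoted in the text."""
    if value < 0.50:
        return "poor"
    if value < 0.75:
        return "moderate"
    if value <= 0.90:
        return "good"
    return "excellent"


actual = [0.5, 0.6, 0.7, 0.8, 0.9]
imputed = [0.55, 0.58, 0.72, 0.79, 0.88]
icc = icc_2_1(actual, imputed)
print(round(icc, 3), rate_icc(icc))
```

Unlike MAPE, which summarizes the average size of the deviations, the ICC also reflects whether patients keep their relative ordering after imputation, so the two measures can rate the same imputation differently.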
In the current study, MAPE and ICC were averaged across the imputations. All statistical analyses were conducted in R 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria) and Stata 17.0 (College Station, Texas, USA). A two-sided p-value < 0.05 was considered statistically significant.

Characteristics of the Patients
The demographic characteristics of the patients at baseline are shown in Table 1. Statistically significant differences were found in most variables between the patients included in the analytical dataset and those excluded (with at least one missing form). In general, the patients in the analytical dataset were older and a smaller proportion of them had comorbidities. Descriptive analyses of the selected 11 SF-36 items and SF-6D index at baseline are shown in Table 2. Similarly, statistically significant differences in proportions of the SF-36 item scores and mean values of the SF-6D index were found between the patients included and excluded. In general, the respondents in the analytical dataset reported far fewer missing items and a slightly higher SF-6D index, compared to those excluded. Demographics and comorbidities, SF-36 item scores, and SF-6D indices in the one-, two-, and five-year follow-ups are shown in Supplemental Tables S3-S8. At all three time points, the included respondents had far fewer missing values on characteristics and SF-36 items, and were relatively healthier, compared to those excluded.

Imputation Results for the Selected SF-36 Items
The comparisons of distributions between the actual and imputed responses of the patients in the analytical dataset are shown in Table 3. At the baseline and year 1, where missing proportions were about 20% and 40%, respectively, there were no statistically significant discrepancies between the distributions of the actual and imputed responses (all p-values > 0.05). In year 2, where the missing proportion was about 60%, distributions of the actual and imputed responses were consistent in most SF-36 items, except for PF2 and PF10. However, in year 5, where the missing proportion rose to about 80%, no consistency was found between the actual and imputed responses in any of the SF-36 items. The results indicate that the imputation based on the previous demographic and comorbidity information works well for the SF-36 items even when the missing proportion is as high as 60%. According to the ICC values presented in Table 3, the agreements between the actual values and the imputed values of the SF-36 items were good at baseline and in year 1 but moderate and poor in years 2 and 5, respectively.
Table 3. Accuracy of multiple imputations for the selected SF-36 item scores.

Imputation Results for SF-6D Index
In general, the imputed SF-6D index values had similar means and standard errors to the actual ones (Table 4). For the baseline, the MAPE of the imputed index was smaller than 5%, which means that, on average, the imputed index values ranged between 95% and 105% of the actual value. Even for the follow-up in postoperative year 2, the deviation of the imputed index values from the actual values was smaller than 10% (Table 4). The results indicate that the imputation method may provide relatively accurate values for the continuous index in terms of the mean absolute percentage error when the missing proportion is around 60%. According to the ICC values presented in Table 4, the agreement between the actual and imputed values of the SF-6D index was good at baseline but moderate in the one-, two-, and five-year follow-ups.

Discussion
Real-world data collections, compared to RCTs, face more challenges. Firstly, a high proportion of missing forms is possible due to long follow-ups, especially when the follow-up time extends beyond two years [11,12]. Moreover, the follow-up time points of most clinical registers are based on the need for care and are not standardized, which brings additional challenges in handling data missingness and analysis [34]. Secondly, missing or incomplete confounder measures are more common in real-world data than in clinical trials, which might distort the inference. For example, patients might be lost to follow-up due to characteristics that cannot be randomized, such as deteriorating health, which would introduce bias in MI and later inferential statistics. However, almost all of the guidelines regarding how to handle missingness with HRQoL data are for clinical trials only [13,14,35]. Therefore, our study might contribute to the development of guidance for good practices for the prevention and handling of missing data in real-world HRQoL data.

Main Findings
In the current study, we explored the missing data problem in the HRQoL data of a clinical register (SOReg) and examined a sequential multiple imputation method as a potential solution for the missing item and form problem in the repeated data collection using the SF-36 questionnaire. To the best of our knowledge, this was the first time that the sequential multiple imputation method was applied to impute HRQoL data. This method is preferred as it takes into consideration the longitudinal nature of HRQoL data collected in clinical studies; that is, a patient's HRQoL at follow-ups is determined by his/her previous HRQoL, i.e., at the baseline and previous follow-ups. Although the missing proportion was high for the self-reported HRQoL questionnaire, the sequential multiple imputation method may still provide quite similar distributions for the dataset with missing values even when the missing proportion is as high as 60%. The method has provided a potential solution to handle missing data for multivariable analyses in HRQoL studies when missingness is substantial.
Estimation of the SF-6D index requires complete answers on all 11 items from SF-36 or SF-12 [7,36]. Knowledge of missingness patterns, especially items associated with high missing proportions, is crucial to prevent missing data and select appropriate imputation methods. Knowledge of missingness patterns (in terms of which characteristics of patients and providers are associated with missing data) might enable one to use appropriate strategies to reduce missing data during the process of data collection [12]. In the current study, although we found that there were many different combinations in missingness patterns for SF-6D, there was no dominating pattern, suggesting that the missingness on the 11 items used for SF-6D was independent across items.
The overall missing proportions of the 11 SF-6D items increased over time, which might suggest that the missingness is associated with the extension of the follow-up. Based on the missing at random mechanism of the 11 items, the sequential multiple imputation method achieved satisfactory agreement between the actual data and the imputed data even when the missing proportion was as high as 63% at the two-year follow-up. However, as expected, the imputation could not approximate the actual data at the five-year follow-up, where the missing proportion was >80%. One reason for the worse performance of the sequential MI procedure in the two- and five-year follow-ups might be the effect of propagation of uncertainty (or propagation of error) embedded in the procedure [37]. Because the MI for a particular year has already incorporated uncertainty regarding the missing data of the year, the sequential MI for data in later years could propagate the uncertainty due to the uncertainty of the parameters in the function for imputation, which are inherited from the previously multiply-imputed data [38].

Strengths and Limitations
In the current study, we adopted and evaluated the sequential multiple imputation method for four waves of HRQoL data collected from around four thousand patients in a registry. Although the performance of the items and index cannot be compared directly in our study, both imputations presented a high agreement between the imputed data and the real-world data when the missing proportion was < 50%, showing great application potential. We hope our study may stimulate more research on missingness in real-world HRQoL data. Efficient imputation methods would help improve the translation of HRQoL data into complete, accurate, and reliable evidence for healthcare decision-making [13].
There are limitations in the current study. Firstly, because the actual values of the missing SF-36 items and SF-6D index in the real-world data were unknown, it was impossible to identify the mechanism of the missingness in the current study; therefore, we applied the missing at random mechanism for the missing values. However, if the probability of missingness for an item was dependent on what would have been true or the item's non-response was 'missing not at random', for example, patients with worse health were more likely to have missing HRQoL items and/or forms, the current multiple imputation method would not be sufficient and other missing data models with different assumptions should be investigated [39]. Secondly, the proposed sequential MI and its performance were evaluated based on the simulation using the complete respondents, i.e., those who had all four HRQoL forms during the 5-year follow-up. However, we observed statistically significant differences between the patients with missing forms and complete forms in the current study. Although the differences were minor and the statistical significance might have been due to the large sample size, a possible bias introduced by the imputation based on the data of the complete respondents cannot be ruled out. Thirdly, in the current study, we investigated missing data concerning SF-36 and SF-6D in a clinical registry for obese patients. Further investigations on missingness based on other HRQoL instruments, such as EQ-5D, the Health Utility Index, and other patient groups, are also needed. It might be that HRQoL instruments with more items are more likely associated with higher missing proportions, which should be considered when designing the data collection strategy.

Conclusions
Relatively high missing proportions in HRQoL data, especially after long-term follow-ups, are common in clinical registries, which brings challenges to analyzing the HRQoL of longitudinal cohorts. The sequential multiple imputation method adopted in the current study might provide an ideal imputation for the missing data (even when the follow-up survey had a missing proportion of 60%), avoiding significant information waste in the multivariable analysis. However, the performance of imputations for data with higher missing proportions (above 60%) remains unclear. To prevent and handle missing data in HRQoL studies, researchers should apply rigorous methodology and practices. Guidance for preventing and handling missing data in observational studies is needed, and studies that use real-world data should be prioritized.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/ijerph191710827/s1, Table S1: SF-6D domains and the 11 selected SF-36 items that construct the SF-6D domains, Table S2: The short-form-6D (SF-6D), Table S3: Demographic characteristics of the patients in the one-year follow-up, Table S4: Scores for the selected SF-36 items and SF-6D index in the one-year follow-up, Table S5: Demographic characteristics of the patients in the two-year follow-up, Table S6: Scores for the selected SF-36 items and SF-6D index in the two-year follow-up, Table S7: Demographic characteristics of the patients in the five-year follow-up, Table S8: Scores for the selected SF-36 items and SF-6D index in the five-year follow-up, Figure S1: Missingness pattern of the selected 11 SF-36 items in the one-year follow-up for the real-world dataset (red cells indicating missing), Figure S2: Missingness pattern of the selected 11 SF-36 items in the two-year follow-up for the real-world dataset (red cells indicating missing), Figure S3: Missingness pattern of the selected 11 SF-36 items in the five-year follow-up for the real-world dataset (red cells indicating missing), Figure S4: Simulated missingness of the selected 11 SF-36 items in the one-year follow-up for the analytical dataset (red cells indicating missing), Figure S5: Simulated missingness of the selected 11 SF-36 items in the two-year follow-up for the analytical dataset (red cells indicating missing), Figure S6: Simulated missingness of the selected 11 SF-36 items in the five-year follow-up for the analytical dataset (red cells indicating missing).

Author Contributions:
The project was conceived by Y.C., S.S. and E.S. and developed with critical input from N.L., L.L. and K.-G.S.; K.A.F., S.S. and E.S. performed the data acquisition. Y.C. and S.S. performed all analyses; interpretation with guidance and feedback: N.L., E.S., L.L., K.-G.S. and K.A.F.; S.S. and Y.C. drafted the manuscript and coordinated the revisions. All authors critically revised the manuscript and approved the final version to be submitted. All authors have read and agreed to the published version of the manuscript.
Funding: This study was supported by a grant from the Swedish Research Council for Health, Working Life, and Welfare (FORTE, 2018-00896). FORTE had no role in the study design, plans for data collection and analysis, decision to publish, or preparation of this manuscript.
Institutional Review Board Statement: The study was approved by the Swedish Ethical Review Agency (Etikprövningsmyndigheten, https://etikprovningsmyndigheten.se/, accessed on 25 August 2022; approval numbers: 2019-03666 and 2019-05713). All data used in the study were retrieved from existing Swedish registers.

Informed Consent Statement:
Informed consent to use the data in a legally secure manner (https://www.scb.se/en/About-us/personal-data/, accessed on 25 August 2022) was obtained from the patients when the registration took place.

Data Availability Statement:
The data that support the study are not publicly available because they contain information that could compromise the privacy and confidentiality of the research participants.