Inverse Probability of Treatment Weighting in 5-Year Quality-of-Life Comparison among Three Surgical Procedures for Hepatocellular Carcinoma

Simple Summary
In patients who had undergone resection for hepatocellular carcinoma (HCC), scores for most quality-of-life (QOL) subscales were significantly improved by 6 months after resection; QOL scores then remained stable for the rest of the 5-year period of this study. The QOL improvements after laparoscopic surgery or robotic surgery were much larger than improvements after open surgery. Between the 2nd and 5th year postsurgery, however, QOL improvements were larger in robotic surgery patients than in laparoscopic surgery patients.

Abstract
This prospective longitudinal cohort study analyzed long-term changes in individual subscales of quality-of-life (QOL) measures and explored whether these changes were related to effective QOL predictors after hepatocellular carcinoma (HCC) surgery. All 520 HCC patients in this study had completed QOL surveys before surgery and at 6 months, 2 years, and 5 years after surgery. Generalized estimating equation models were used to compare the 5-year QOL among the three HCC surgical procedures. The QOL was significantly (p < 0.05) improved at 6 months after HCC surgery but plateaued at 2–5 years after surgery. In postoperative surveys, the effect size was largest in the nausea and vomiting subscales in patients who had received robotic surgery, and the effect size was smallest in the dyspnea subscale in patients who had received open surgery. The models revealed the following explanatory variables for postoperative QOL: surgical procedure type, gender, age, hepatitis C, smoking, tumor stage, postoperative recurrence, and preoperative QOL. The comparisons revealed that, when evaluating QOL after HCC surgery, several factors other than the surgery itself should be considered. The analysis results also implied that postoperative quality of life might depend not only on the success of the surgical procedure, but also on preoperative quality of life.


Introduction
Hepatic resection is the mainstay curative treatment for hepatocellular carcinoma (HCC) patients. Generally, the three surgical options for HCC (open surgery, laparoscopic surgery, and robotic surgery) are equally effective in terms of medical outcomes [1,2]. Therefore, patients typically select the procedure that optimizes their quality of life (QOL) [1][2][3].
As it obtains comparable long-term clinical outcomes, laparoscopic surgery is considered the standard alternative to conventional open surgery [4,5]. Previous studies have compared outcomes between robotic surgery and laparoscopic surgery and between robotic surgery and open liver resection [4,5]. However, very little surgical outcome data are available to guide surgeons in selecting among these three surgical approaches [6,7]. In Taiwan, the few reports of complications and QOL outcomes after HCC surgery have been limited to robotic surgical procedures [8]. The available literature indicates that robotic surgery provides HCC patients with better clinical outcomes and higher overall satisfaction compared to both laparoscopic surgery and open surgery [6][7][8]. However, none of the three procedures has consistently demonstrated superior outcomes for other aspects of QOL. Until now, most QOL studies of surgical resection of HCC have evaluated patients only once, in a single postoperative assessment at 3 to 6 months after surgery [3]. Additionally, studies of the efficacy of HCC resection have been limited to surgical procedures performed in only one medical institution. Hence, empirical studies using patient-reported QOL are needed to quantify the effectiveness of clinical treatments for HCC. Data obtained in QOL assessments can be used to improve care quality for cancer patients. To our knowledge, this is the first study in Taiwan to apply generalized estimating equation (GEE) analysis in a large-scale and long-term prospective cohort study of QOL change after HCC resection.
Many studies have used a natural experimental design to examine the impacts of surgical procedures on cancer outcomes [9,10]. However, a major criticism of natural experimental design is that it does not randomly assign patients to different surgical procedures. For example, studies show that HCC patients with specific demographic attributes and clinical attributes tend to prefer a specific surgical procedure, which introduces the potential for selection bias [11]. To our knowledge, the present study is the first to apply inverse probability of treatment weighting (IPTW) in a natural experimental design for a long-term comparison of QOL outcomes among different HCC surgery types. Therefore, the purposes of this study were to couple a natural experimental design with IPTW in a comparison of 5-year QOL among three HCC surgery types and to explore predictors of QOL after HCC resection.

Study Design and Population
The subjects were patients who had undergone surgical resection of HCC performed at one of three southern Taiwan medical centers between January 2012 and December 2015. Inclusion criteria were the following: (1) a histologic or combined radiographic and laboratory diagnosis of HCC; (2) ability to communicate in Chinese or Taiwanese; (3) agreement to participate in a questionnaire survey performed in the hospital ward or by telephone. For accurate assessment of postoperative outcome measures, only patients who had been treated by highly experienced surgeons were analyzed [12]. Thus, the analysis excluded 21 HCC procedures performed by low-volume surgeons (defined as surgeons who had performed three or fewer surgeries within the previous year). Other major exclusion criteria were concurrent malignancy and participation in another QOL study that might have interfered with this study. Figure 1 shows that, during the sample selection period, 676 subjects were eligible to participate. Of these, 156 were excluded because they did not meet the enrollment criteria, declined to participate, or had already died. Therefore, 520 subjects completed the baseline preoperative survey, and 273 subjects completed the preoperative survey and the 6-month, 2-year, and 5-year postoperative surveys. Baseline demographic and clinical data were collected through questionnaire surveys and chart reviews. This study was approved by the Institutional Review Board of Kaohsiung Medical University Hospital (KMUH-IRB-20110002), and written informed consent was obtained from each participant.

Measuring Instruments
The Functional Assessment of Cancer Therapy-Hepatobiliary (FACT-Hep) and the European Organization for Research and Treatment of Cancer (EORTC) QLQ-C30 questionnaires were used to assess QOL. Subscales of the FACT-Hep included three 7-item subscales with score ranges of 0-28 points (physical well-being, PWB; social/family wellbeing, SWB; functional well-being, FWB), one 6-item subscale with score ranges of 0-24 points (emotional well-being, EWB), and one 18-item subscale with score ranges of 0-72 points (hepatobiliary cancer subscale, HCS) [13]. The PWB, SWB, EWB, and FWB subscales were combined into the FACT-G total subscale. The FACT-G total and HCS subscales were combined into the FACT-Hep total subscale. Higher scores on all subscales of the FACT-Hep were interpreted as better QOL and fewer symptoms.
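The way the FACT-Hep subscales roll up into the FACT-G and FACT-Hep totals can be illustrated with a short sketch. It assumes that each item is scored 0 to 4 (inferred from the score ranges given above) and that item scores have already been oriented so that higher values mean better QOL; the function name and the input layout are hypothetical, and this is not the scoring program used in the study.

```python
# Number of items per FACT-Hep subscale, as described in the text.
ITEMS = {"PWB": 7, "SWB": 7, "EWB": 6, "FWB": 7, "HCS": 18}


def fact_hep_scores(responses):
    """Sum item scores into subscale, FACT-G, and FACT-Hep totals.

    responses: dict mapping subscale name to a list of item scores.
    Assumption: items are scored 0-4 and already oriented so that
    higher values mean better QOL.
    """
    scores = {}
    for name, n_items in ITEMS.items():
        items = responses[name]
        if len(items) != n_items or any(not 0 <= v <= 4 for v in items):
            raise ValueError(f"unexpected responses for {name}")
        scores[name] = sum(items)
    # FACT-G combines the four well-being subscales.
    scores["FACT-G"] = sum(scores[k] for k in ("PWB", "SWB", "EWB", "FWB"))
    # FACT-Hep total adds the hepatobiliary cancer subscale.
    scores["FACT-Hep"] = scores["FACT-G"] + scores["HCS"]
    return scores


# Hypothetical respondent who gives the best possible answer to every item.
best_case = fact_hep_scores({name: [4] * n for name, n in ITEMS.items()})
```

Under these assumptions the FACT-G total ranges from 0 to 108 and the FACT-Hep total from 0 to 180.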
The EORTC QLQ-C30 is a generic QOL measure and consists of 30 items organized into five areas of functioning subscales (physical functioning, PF; role functioning, RF; emotional functioning, EF; cognitive functioning, CF; social functioning, SF), nine symptom subscales (fatigue, FA; nausea/vomiting, NV; pain, PA; dyspnoea, DY; insomnia, SL; appetite loss, AP; constipation, CO; diarrhea, DI; financial difficulties, FI) and a global health subscale (QL) [14]. Likert scales (from 1 to 7 in the global health subscale and from 1 to 4 in other subscales) are linearly transformed into scores of 0-100 where higher scores indicate better functional status or worse symptomatic problems.
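The linear transformation to a 0-100 score can be sketched as follows. The sketch follows the standard EORTC approach of averaging the items of a subscale into a raw score and rescaling it by the item range (3 for the 1-4 items, 6 for the 1-7 global health items); the function names are hypothetical and handling of missing items is omitted.

```python
def raw_score(items):
    """Mean of the item responses that make up one subscale."""
    return sum(items) / len(items)


def functional_scale(items, item_range=3):
    """Functioning subscales: higher transformed score = better functioning."""
    return (1 - (raw_score(items) - 1) / item_range) * 100


def symptom_scale(items, item_range=3):
    """Symptom subscales: higher transformed score = worse symptoms."""
    return (raw_score(items) - 1) / item_range * 100


def global_health(items):
    """Global health items use a 1-7 scale, so the item range is 6."""
    return (raw_score(items) - 1) / 6 * 100


# Hypothetical responses: two role-functioning items answered "not at all" (1)
# and two nausea/vomiting items answered 1 and 2.
rf = functional_scale([1, 1])
nv = symptom_scale([1, 2])
```

In this example the respondent scores 100 on RF (no limitation) and roughly 16.7 on NV (mild symptoms), which shows why the direction of a score must be read together with the subscale type.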
The Chinese versions of the FACT-Hep and EORTC QLQ-C30 have been validated in HCC patients in Taiwan. In all subjects, the QOL measures were administered by the same three research assistants before and after surgery.
The following study characteristics obtained by records reviews and questionnaire interviews were tested as independent variables in this study: surgical procedure, gender, age, education, living with family, marital status, body mass index (BMI), smoking, drinking, Charlson comorbidity index (CCI), hepatitis B, hepatitis C, tumor stage, postoperative average length of stay (ALOS), postoperative 30-day readmission, postoperative recurrence, and preoperative QOL (FACT-Hep and EORTC QLQ-C30 subscales). Co-morbidities were identified by ICD-9-CM codes, which were used to calculate the Deyo-Charlson co-morbidity index [15]. The dependent variables were the postoperative QOL subscales.

Statistical Analysis
The unit of analysis in the present study was the individual patient who had undergone HCC resection. After determining the distribution of observed subjects and the numbers of subjects excluded at different time points due to loss to follow-up or refusal to participate, the baseline data for the study population were first compared by surgery type. Continuous variables were tested for statistical significance by one-way analysis of variance (ANOVA), and categorical variables were tested by Fisher's exact test.
A propensity score approach was applied to enable the use of IPTW to balance the baseline characteristics of patients among the three surgery types. Each observation was weighted by the inverse of the probability of the patient receiving the type of HCC surgery that was actually performed, given observed confounders identified up to the index date. Stabilized inverse probability weights were used to mitigate the influence of very low probabilities estimated by the propensity score model [16]. Weights were derived to obtain estimates representing population-average treatment effects to enable a balanced comparison among the three groups. Treatment was defined as the method chosen at the time of consent to this study. Regression models were used to make final inferences, which enabled adjustment for any covariate that remained unbalanced after IPTW [17].
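The weighting idea can be illustrated with a deliberately small sketch. It assumes a single categorical confounder (an age group) and estimates the propensity score as the observed share of each surgery type within each stratum, whereas the study would have used a propensity model over many covariates; all counts, names, and strata below are hypothetical. The stabilized weight is the marginal probability of the received treatment divided by its probability given the confounder.

```python
from collections import Counter


def stabilized_iptw(rows):
    """Return one stabilized weight per (stratum, treatment) row.

    weight = P(T = t) / P(T = t | X = x), with both probabilities
    estimated as simple observed proportions (an assumption made
    for illustration only).
    """
    n = len(rows)
    n_t = Counter(t for _, t in rows)   # patients per treatment
    n_x = Counter(x for x, _ in rows)   # patients per confounder stratum
    n_xt = Counter(rows)                # patients per (stratum, treatment) cell
    weights = []
    for x, t in rows:
        marginal = n_t[t] / n
        propensity = n_xt[(x, t)] / n_x[x]
        weights.append(marginal / propensity)
    return weights


def weighted_share(rows, weights, treatment, stratum):
    """Weighted share of one stratum among patients given one treatment."""
    num = sum(w for (x, t), w in zip(rows, weights) if t == treatment and x == stratum)
    den = sum(w for (x, t), w in zip(rows, weights) if t == treatment)
    return num / den


# Hypothetical cohort: older patients receive open surgery more often.
rows = (
    [("<65", "open")] * 10 + [("<65", "laparoscopic")] * 8 + [("<65", "robotic")] * 6
    + [(">=65", "open")] * 20 + [(">=65", "laparoscopic")] * 4 + [(">=65", "robotic")] * 2
)
weights = stabilized_iptw(rows)
```

After weighting, the share of older patients is the same (26 of 50) in all three surgery groups of this toy cohort, which is the sense in which IPTW balances a measured confounder. The sketch requires every stratum to contain every treatment; unmeasured confounders are not addressed.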
The radar diagrams present the mean score for each QOL subscale in different surgery groups. The GEE approach was performed to explore longitudinal changes in each QOL subscale at different time points in comparison with reference data, i.e., data obtained in surveys performed before surgery and at 6 months, 2 years, and 5 years after surgery. Each QOL subscale was used as a dependent variable as a function of time and effective covariates, which included surgical procedure, gender, age, education, cohabitation with family, marital status, BMI, smoking, drinking, CCI, hepatitis B, hepatitis C, tumor stage, postoperative ALOS, postoperative 30-day readmission, postoperative recurrence, and preoperative QOL. Effective covariates that significantly correlated with each QOL subscale were identified by univariate analysis and entered in the GEE model for multivariate regression analysis. The GEE approach is similar to a repeated-measure ANOVA but is more powerful because it can accommodate incomplete data for individual subjects at one or more assessment points without compromising their remaining data [18]. This approach is also recommended when analyzing incomplete data in longitudinal studies with continuous outcomes.
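How a GEE uses every available observation from a patient with incomplete follow-up can be shown with a minimal sketch. It assumes the simplest case only: an identity link, an independence working correlation, and one binary covariate (preoperative versus 6-month survey), so the coefficients coincide with ordinary least squares and the standard error comes from a sandwich estimator clustered on patient. The working correlation structure used with XTGEE in this study is not stated, and the data and function name below are hypothetical.

```python
from collections import defaultdict


def gee_independence(records):
    """records: list of (patient_id, x, y). Returns (intercept, slope, slope_se)."""
    n = len(records)
    sx = sum(x for _, x, _ in records)
    sxx = sum(x * x for _, x, _ in records)
    sy = sum(y for _, _, y in records)
    sxy = sum(x * y for _, x, y in records)
    det = n * sxx - sx * sx
    # Inverse of X'X for the design [1, x]: the "bread" of the sandwich.
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]
    intercept = bread[0][0] * sy + bread[0][1] * sxy
    slope = bread[1][0] * sy + bread[1][1] * sxy

    # Per-patient score vectors X_i' r_i, summed over that patient's visits.
    score = defaultdict(lambda: [0.0, 0.0])
    for pid, x, y in records:
        r = y - intercept - slope * x
        score[pid][0] += r
        score[pid][1] += x * r

    # "Meat": sum over patients of the outer product of the score vectors.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in score.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    # Robust variance of the slope: element [1][1] of bread * meat * bread.
    var_slope = sum(
        bread[1][i] * meat[i][j] * bread[j][1] for i in range(2) for j in range(2)
    )
    return intercept, slope, var_slope ** 0.5


# Hypothetical scores: x = 0 before surgery, x = 1 at 6 months.
# Patient 4 missed the 6-month survey but still contributes baseline data.
records = [
    (1, 0, 50), (1, 1, 62),
    (2, 0, 54), (2, 1, 58),
    (3, 0, 46), (3, 1, 60),
    (4, 0, 50),
]
intercept, slope, slope_se = gee_independence(records)
```

Here the intercept is the mean preoperative score and the slope is the mean change at 6 months; the clustered standard error accounts for repeated measures on the same patient.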
Effect size (ES) was directly calculated for comparisons of the relative magnitude of change as measured by the QOL measures. That is, ES was calculated as the difference between the mean scores for two time intervals divided by the standard deviation in the score for the previous (or former) time-interval [19]. Using this method of standardizing the extent of change measured by an instrument enabled comparisons between the QOL measures. An ES of 1.0 is equivalent to a change of one standard deviation (SD) in the sample. Effect sizes of 0.2, 0.5, and 0.8 are typically considered small, medium, and large changes, respectively.
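The ES definition above amounts to a one-line calculation, sketched here with hypothetical scores; the function name is illustrative and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def effect_size(earlier, later):
    """Change in mean score divided by the SD of the earlier time point."""
    return (mean(later) - mean(earlier)) / stdev(earlier)


# Hypothetical subscale scores for three patients at two surveys.
preoperative = [50, 60, 70]   # mean 60, sample SD 10
six_months = [58, 68, 78]     # mean 68
es = effect_size(preoperative, six_months)
```

In this example the mean rises by 8 points against a baseline SD of 10, giving an ES of 0.8, which would conventionally be read as a large change.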
As no studies have quantified the uncertainty in estimated responsiveness, the precision of reported ES values remains unknown, and the statistical results of different studies are difficult to compare across different populations or activity measures. When repeated measures are used, matters are further complicated by dependent observations of the same patient. To address these issues, differences in ES and associated 95% confidence intervals were calculated using bias-corrected and accelerated bootstrapping with 1000 replications [20]. For the statistical analyses in this study, Stata, version 12.0 (StataCorp, College Station, TX, USA) was used to perform GEE in XTGEE. All tests were two-sided, and p values less than 0.05 were considered statistically significant.
Table 1 shows the characteristics of the 520 HCC surgery patients (362 with open surgery, 112 with laparoscopic surgery, and 46 with robotic surgery) in this study. The average age was 65.94 years, and 70.4% were male. After adjustment by IPTW, all covariates were well balanced. In all QOL subscales, subjects who continuously participated in the study throughout the 5 years did not significantly differ from those who died or dropped out during the observation period of the study (data not shown). In all HCC surgery patients, QOL improvements after laparoscopic surgery or robotic surgery were much larger than QOL improvements after open surgery. However, QOL improvements after robotic surgery were larger than those after laparoscopic surgery. The radar diagram clearly distinguishes QOL outcomes among the different surgery types at different time points.

Difference and 95% Confidence Interval (CI) in ES by Using Bootstrapping Method
We compared the difference and 95% CI in ES values for all QOL subscales among the three surgery types at different time points (see Supplementary Table S1). A difference is considered statistically significant at the 0.05 level if the 95% CI does not include zero. Between the 6-month and preoperative surveys, both laparoscopic surgery patients and robotic surgery patients compared to open surgery patients had significantly larger absolute differences in ES values for the QOL subscales, but robotic surgery patients compared to laparoscopic surgery patients had significantly smaller absolute differences in ES values for the QOL subscales (p < 0.05). Furthermore, both laparoscopic surgery patients and robotic surgery patients compared to open surgery patients showed significantly larger absolute differences in ES values for the QOL subscales between the 5-year and 2-year postoperative surveys (p < 0.05), but the absolute differences were smaller than the absolute differences between the 2-year and 6-month postoperative surveys. Additionally, robotic surgery patients compared to laparoscopic surgery patients revealed significantly larger absolute differences in ES values for the QOL subscales during the same time period (p < 0.05).
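The logic of judging a difference in ES by whether its bootstrap 95% CI excludes zero can be sketched as follows. The sketch uses a plain percentile bootstrap, which is a simpler substitute for the bias-corrected and accelerated method used in the study, and it resamples whole patients so that each pair of repeated measurements stays together. The group data, function names, and seed are hypothetical.

```python
import random
from statistics import mean, stdev


def effect_size(pairs):
    """pairs: list of (earlier, later) scores for the same patients."""
    earlier = [a for a, _ in pairs]
    later = [b for _, b in pairs]
    sd = stdev(earlier)
    if sd == 0:
        return None  # undefined when the resampled baseline has no spread
    return (mean(later) - mean(earlier)) / sd


def bootstrap_es_difference(group_a, group_b, replications=1000, seed=0):
    """Percentile 95% CI for ES(group_a) - ES(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(replications):
        # Resample patients with replacement within each group.
        sample_a = [rng.choice(group_a) for _ in group_a]
        sample_b = [rng.choice(group_b) for _ in group_b]
        es_a, es_b = effect_size(sample_a), effect_size(sample_b)
        if es_a is None or es_b is None:
            continue  # skip degenerate resamples
        diffs.append(es_a - es_b)
    diffs.sort()
    lower = diffs[int(0.025 * len(diffs))]
    upper = diffs[int(0.975 * len(diffs)) - 1]
    return lower, upper


# Hypothetical (preoperative, 6-month) scores for two surgery groups.
robotic = [(40, 60), (42, 61), (44, 64), (46, 65), (48, 68), (50, 69), (52, 72), (54, 73)]
open_surgery = [(40, 41), (42, 42), (44, 45), (46, 46), (48, 49), (50, 50), (52, 53), (54, 54)]
lower, upper = bootstrap_es_difference(robotic, open_surgery)
```

With these illustrative data the whole interval lies above zero, so the difference in ES between the two groups would be called significant at the 0.05 level under the rule stated above.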

Multivariate Analysis
The GEE models of effective QOL predictors in the HCC surgery patients in this study indicated that each time point was significantly related to the QOL subscales throughout the 5 years (p < 0.05) (Tables 2 and 3). Compared to open surgery patients, laparoscopic surgery patients had significantly higher scores for FACT-Hep SWB and EORTC-QLQ-C30 EF after controlling for related variables; compared to open surgery patients, robotic surgery patients also had significantly higher scores for FACT-G and EORTC-QLQ-C30 CF after controlling for related variables (p < 0.05). Compared to open surgery patients, however, robotic surgery patients had significantly lower symptom subscale scores for EORTC-QLQ-C30 PA, SL, AP, and DI (p < 0.05). The QOL subscale scores had significantly negative associations with female gender, advanced age, hepatitis C, smoking, high tumor stage, and postoperative recurrence (p < 0.05). Additionally, all preoperative QOL subscale scores had significant associations with each subscale score of the FACT-Hep and EORTC-QLQ-C30 throughout the 5-year follow-up surveys (p < 0.05).

Discussion
Comparisons of QOL improvements between different time points indicated that the FACT-Hep and EORTC QLQ-C30 subscale scores for HCC patients were significantly improved by 6 months after resection (p < 0.05) and then remained stable for the rest of the 5-year period of the study. The QOL improvements at 6 months postsurgery were also much larger for both laparoscopic surgery and robotic surgery compared to open surgery. Between the 2nd and the 5th years postoperatively, however, QOL was higher in the robotic surgery patients compared to the laparoscopic surgery patients, which is consistent with the literature [3,4,21]. This prospective study of real-world registry data from multiple institutions in Taiwan over a 5-year period found that surgical procedures, gender, age, hepatitis C, smoking, tumor stage, recurrence after surgery, and preoperative QOL subscale scores were significantly associated with QOL subscale scores after hepatocellular carcinoma surgery. This study has several strengths. To our knowledge, this is the first population-based prospective cohort study designed to assess changing trends in postoperative QOL subscale scores in HCC patients after surgery and to evaluate predictors of these scores. Notably, however, the results of this study were obtained in a specific setting: three large tertiary academic hospitals in Taiwan. Nevertheless, the subjects were a representative sample of HCC patients who had received resection performed by high-volume surgeons in Taiwan [22]. Additionally, we used the IPTW method to obtain comparison groups that were balanced in all baseline characteristics.
Any observational study is potentially subject to measured and unmeasured confounding effects. However, this study used IPTW to balance the baseline patient attributes and clinical attributes in a pragmatic nonrandomized cluster design. The results obtained in this approach mimicked those obtained in a randomized clinical trial for a primary outcome and provided a robust sample size while avoiding the individual selection bias present in registries created for observational studies [16]. Additionally, our novel use of IPTW also provided a methodology for maximizing accuracy in comparisons of the three surgical procedures in terms of factors that were potentially related to confounding variables. The results suggest that our comparisons were well balanced and improve our confidence that the estimates were appropriately adjusted for potential confounding variables measured in previous studies and included in our analyses.
A systematic review by Muzellec et al. found that all instruments conventionally used to measure QOL in cancer patients have had adequate development and validation processes, including EORTC QLQ-C30, EORTC QLQ-HCC18, FACT-Hep, FACT Hepatobiliary Symptom Index (FHSI), and Quality of Life-liver cancer (QOL-LC) [2]. The EORTC QLQ-C30 and FACT-Hep are well-developed measures that have been tested extensively in patients with HCC [13,14]. Luckett et al. further compared the EORTC QLQ and FACT cancer-specific measures for the purpose of informing the choice between them [23]. They concluded that, while further psychometric evidence is still needed, important differences in the subscale structures and social domains of the two measures should inform the choice of which measure to use in a particular study.
However, they also recommended that individual items on the FACT should be reviewed to ensure that symptoms and issues included within each subscale do not produce bias in the context of specific research objectives in a given study. Thus, the current study used the EORTC QLQ-C30 and FACT-Hep measures to survey patient-reported QOL.
The magnitude of improvement in QOL subscales was larger between the preoperative and postoperative 6-month surveys than between the postoperative 2-year and 5-year surveys. In all HCC surgery patients, however, the ES for EORTC QLQ-C30 NV was larger in comparisons between the preoperative and 6-month postoperative surveys than in comparisons between time periods. The magnitude of improvement in the RF and CF subscale scores was smaller than that in other subscale scores, probably because low QOL substantially reduced RF. After undergoing surgery that eliminated their physical and emotional problems, and after completing adjunct treatment, the patients substantially improved in functions that had been previously limited by nausea and vomiting [24]. Thus, alleviating nausea and vomiting can enhance functioning in other subscales of health (e.g., social functioning) and ultimately enhance overall QOL. This improvement may explain why, for most QOL subscales, the absolute ES for the period between the preoperative survey and the postoperative 5-year survey remained positive, whereas the ES for EORTC QLQ-C30 NV was larger in the period between the preoperative survey and postoperative 6-month survey than between the postoperative 2-year survey and postoperative 5-year survey.
For almost all FACT-Hep and EORTC QLQ-C30 subscales related to HCC cancer-symptom functioning (including EF, FA, NV, DY, SL, AP, CO, DI, and FI), the improvement was smaller in the 5-year survey than in both the 6-month survey and 2-year survey. The only exception was the future perspective subscale. Emotional dysfunction in HCC patients may result from adjuvant systemic therapy after surgery [25,26]. In the current study, physical and emotional functioning tended to be highest in patients with short follow-up intervals, whereas overall well-being tended to be highest in patients with long follow-up intervals, which is in line with other studies [2,3,25,26]. One explanation for these observations is the use of coping mechanisms by patients to deal with the stress caused by adverse diagnoses and the potentially life-threatening effects of primary therapy. Gradual decreases in psychological stress may also explain consistent improvements in function symptoms [27,28].
Our findings demonstrated that robotic surgery outperforms laparoscopic surgery and open surgery in terms of PF, RF, EF, CF, SF, and symptoms. Compared with the laparoscopic surgery and open surgery groups, the robotic surgery group showed significantly larger subjective improvements in these subscales, which is consistent with previous studies [2,3,25]. A possible explanation for relatively larger improvements in the robotic surgery group is that patients in this group tended to have better socioeconomic status and tended to have a higher education level compared to those in the laparoscopic surgery and open surgery groups. Thus, the larger improvements in the robotic surgery group may have at least partially resulted from factors such as their greater capability to acquire and comprehend health-related information as well as the financial means to optimize their post-surgery recovery conditions. Finally, the most important predictors of postoperative QOL subscale scores throughout the 5-year study were preoperative QOL subscale scores, which is consistent with reports that preoperative QOL subscale scores are the best predictors of postoperative QOL [29,30]. Therefore, effective counseling is essential for apprising patients of expected postoperative impairments. If QOL outcomes are considered benchmarks, then preoperative QOL subscale scores, which are accurate predictors of postoperative outcome, are crucial.
For further validation of the significant association observed between risk factors and postoperative QOL after HCC surgery, Table 4 lists selected studies that have identified QOL trends after HCC surgery and risk factors for decreased QOL [30][31][32][33][34]. Notably, our cohort study had a larger population and a longer duration of longitudinal analysis compared with the selected studies. As in these previous works, our study demonstrated that QOL was significantly (p < 0.05) improved at 6 months after HCC surgery and then plateaued at 2-5 years after surgery. Our study also revealed that QOL after HCC surgery is significantly associated with surgery type, gender, age, hepatitis C, smoking, tumor stage, postoperative recurrence, and preoperative QOL (p < 0.05). After major hepatectomy, FACT-physical and functional scores were significantly decreased at the first postoperative visit and at the 6-week postoperative visit (p = 0.04) but returned to baseline at the 3-month postoperative visit.
For minor hepatectomy, the nadir for most QOL scores occurred at the first postoperative visit with a return to baseline at the 6-week postoperative visit.

Study Limitations
Although all research questions were adequately and satisfactorily addressed, some limitations of this study are noted. First, this study collected data from patients who had received HCC resection under the supervision of a surgeon in one of three medical centers. Each surgeon had performed the highest volume of HCC resection in their respective hospital. This sample selection procedure was intended to limit the effect of the learning curve on QOL outcomes. As this study focused on high-volume surgeons in three different institutions, the results obtained are more representative of all patients with HCC resection than those of a study that analyzes patients treated by a single surgeon. However, a notable limitation is that, in this prospective cohort study, the first patient was enrolled in 2012. Therefore, depending on their inclusion dates, the duration of follow-up differed among patients, which may have caused selection bias.

Conclusions
In conclusion, most QOL subscale scores for patients with HCC resection were significantly (p < 0.05) improved by 6 months postsurgery and then remained stable for the rest of the 5-year study period. Additionally, the QOL improvements at 6 months postsurgery were much larger for laparoscopic surgery and robotic surgery compared to open surgery. Between 2 years and 5 years postsurgery, however, QOL improvement was larger for robotic surgery than for laparoscopic surgery. Nevertheless, a forecast of QOL after HCC resection must consider several factors other than the surgery type itself. All predictors analyzed in this study could be addressed in preoperative and postoperative consultations so that candidates for HCC resection and patients who have already completed HCC resection are adequately educated in the expected course of recovery and expected QOL outcomes. Patients should also be advised that their postoperative quality of life might depend not only on the success of their operations, but also on their preoperative quality of life.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers15010252/s1, Table S1: Differences and 95% confidence intervals in effect sizes for each FACT-Hep and EORTC QLQ-C30 subscale scores in hepatocellular carcinoma patients compared among three surgery types at different time points.
Informed Consent Statement: Written informed consent was obtained from each participant.
Data Availability Statement: Data and study materials can be made available for non-commercial use upon reasonable request to the corresponding author.

Conflicts of Interest: The authors have no conflict of interest in this study.