External Validation of the Individualized Prediction of Breast Cancer Survival (IPBS) Model for Estimating Survival after Surgery for Patients with Breast Cancer in Northern Thailand

Simple Summary
Recently, a prediction model was developed specifically for predicting the probability of survival and disease progression five years after diagnosis for Thai female patients with breast cancer. The model was composed of twelve routinely available clinical predictors. Even though it seemed to provide accurate predictions in the development dataset, it has never been tested in other external datasets. This study validated this new prediction model, entitled IPBS (Individualized Prediction of Breast cancer Survival), in another dataset of Northern Thai female patients with breast cancer and found that the model carried an acceptable discriminative ability comparable to when it was developed. Nonetheless, model recalibration to each specific context is encouraged, as it may overestimate the probability of events when the underlying baseline survival of the patient cohort is different from the development dataset.

Abstract
The individualized prediction of breast cancer survival (IPBS) model was recently developed. Although the model showed acceptable performance during derivation, its external performance remained unknown. This study aimed to validate the IPBS model using the data of breast cancer patients in Northern Thailand. An external validation study was conducted based on female patients with breast cancer who underwent surgery at Maharaj Nakorn Chiang Mai hospital from 2005 to 2015. Data on IPBS predictors were collected. The endpoints were 5-year overall survival (OS) and disease-free survival (DFS). The model performance was evaluated in terms of discrimination and calibration. Missing data were handled with multiple imputation. Of all 3581 eligible patients, 1868 were included. The 5-year OS and DFS were 85.2% and 81.9%. The IPBS model showed acceptable discrimination: C-statistics 0.706 to 0.728 for OS and 0.675 to 0.689 for DFS at 5 years. However, the IPBS model minimally overestimated both OS and DFS predictions. These overestimations were corrected after model recalibration. In this external validation study, the IPBS model exhibited good discriminative ability. Although it may provide minimal overestimation, recalibrating the model to the local context is a practical solution to improve the model calibration.


Introduction
Breast cancer has recently become one of the most prevalent malignancies worldwide, with over 2 million new cases diagnosed in 2020 [1]. With continuous population growth, an improvement in accessibility to early mammographic screening, and a higher fraction of the population exposed to common risk factors, breast cancer incidence is projected to double the current figure by the year 2070 [2]. In Thailand, the age-standardized incidence rate of breast cancer has shown a steady rise from 18 per 100,000 women-years in 1998 to almost 40 per 100,000 women-years in 2020 [3]. This rising pattern of age-standardized incidence rates has been similarly observed in both developed and developing parts of the world [4].
However, there were still significant disparities in terms of breast cancer survival. From the recent Global Surveillance report, the five-year net survival was over 85% in the developed countries, whereas in the developing countries where healthcare resources (e.g., the number of healthcare personnel and access to effective therapy) are generally limited, and diagnosis is often delayed, the survival probability was much lower at only 65% [5]. As more than half of all patients are diagnosed within these less developed countries, breast cancer is certainly a significant burden to both individuals and public health [4].
During the past decades, the survival rates of patients with breast cancer have significantly improved, owing to both an advance in breast cancer therapy and an early screening strategy [6]. Nonetheless, the survival rates still varied largely across patients with different baseline prognoses [4]. Accurate prognostication of patient survival after a cancer diagnosis is paramount to clinicians and patients during shared decision-making about optimal treatment planning [7]. The development of multivariable clinical prediction models that provide absolute survival prediction for each individual is a current rising trend [8,9]. Over the years, several prognostic tools for breast cancer have been developed, validated, and implemented, such as the Nottingham prognostic index (NPI) [10], Adjuvant! [11], OPTIONS [12], and PREDICT [13]. Even though most of the tools have been proven to provide accurate predictions in many settings, one validation study from Thailand reported an underestimation of the predicted survival by some of these western-derived models and suggested that a prognostic model specific for Thai breast cancer patients be developed [14].
In 2021, the individualized prediction of breast cancer survival (IPBS) model was derived using the cohort data of breast cancer patients registered in the Network of National Cancer Institutes of Thailand [15]. The IPBS model is composed of twelve clinically available predictors and is able to provide survival predictions with an acceptable discriminative ability and a good calibration. However, as no validation study has been conducted, the external performance of the IPBS remains unknown. This study aimed to externally validate the IPBS model in predicting the five-year overall survival (OS) and disease-free survival (DFS) using the cohort data of breast cancer patients who underwent surgical operations at Maharaj Nakorn Chiang Mai hospital.

Study Design
An external validation study of the IPBS model was conducted using a retrospective observational cohort design. Female patients with breast cancer who were registered in the Chiang Mai Cancer Registry and underwent surgical treatment at Maharaj Nakorn Chiang Mai hospital from 1 January 2005 to 31 December 2015 were eligible for inclusion. The Institutional Review Board of the Faculty of Medicine, Chiang Mai University, approved the study protocol (FAM-2563-07779). In addition, we followed the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement for reporting this study [9].

Study Patients
Patients who met the following criteria were included: (1) female patients aged between 18 and 90 years with primary invasive breast cancer based on histopathologic diagnosis, and (2) patients who underwent definitive surgical treatment for breast cancer. We excluded patients who met any of the following criteria: (1) patients with stage IV breast cancer or metastatic breast cancer, (2) patients with a history of receiving neoadjuvant chemotherapy, pre-operative hormonal therapy, or radiotherapy, (3) patients with synchronous tumor, (4) patients with inflammatory breast cancer or Paget's disease, and (5) patients with incomplete data regarding the date of operation or national identification number. The investigators verified the diagnosis of every included patient and cross-checked it with the electronic medical database.

Data Collection
Patient clinical data (age at diagnosis, nationality, type of medical insurance, marital status, menopausal status), tumor factors (tumor size, location, clinical stage, histological type, tumor grade, number of nodes positive, number of examined lymph nodes, lymphovascular invasion (LVI), pathological stage, Ki-67 proliferation index, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER-2) status), and treatment factors (surgical procedure, adjuvant chemotherapy status, adjuvant targeted therapy (e.g., trastuzumab), adjuvant radiotherapy status, adjuvant hormonal therapy status) were retrieved from medical records. Tumor factor data were extracted from associated pathological reports. The ER or PR positivity was defined as 1% or more positive tumor cells with nuclear staining. The HER-2 positivity was defined as either a score of 2+ or 3+ by immunohistochemistry.
We also extracted and collected the baseline clinicopathologic characteristics data from the development study by Pongnikorn et al. [15] to assess the degree of relatedness between the two datasets.

The IPBS Model
The IPBS model consists of twelve practical predictors, including age at surgical treatment, menopausal status (premenopause or postmenopause), pathological staging (stage I, II, or III), tumor type (ductal carcinoma or others), histological grading (grade I, II, or III), tumor size in millimeters, lymphovascular invasion status (presence or absence), number of positive axillary lymph nodes (0, 1-3, or ≥4), ER (positive or negative), PR (positive or negative), HER-2 status (positive or negative), and type of surgical treatment (mastectomy or breast conserving surgery). According to the development dataset, the baseline 5-year OS and 5-year DFS were 0.893 and 0.889, respectively. The details on how to calculate the prognostic index of 5-year OS and 5-year DFS by the IPBS model are shown in Appendix A (Table A1).
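To illustrate how a baseline survival probability and a prognostic index combine into an individual prediction, the following Python sketch applies the usual proportional hazards relation S(5 | x) = S0(5)^exp(PI). The baseline values 0.893 and 0.889 are those quoted above; the function name and the example prognostic index values are hypothetical, and the actual coefficients (and any centering of the prognostic index) are those given in Table A1, which are not reproduced here.

```python
import math

# Baseline 5-year survival probabilities reported for the IPBS development dataset.
BASELINE_5Y = {"OS": 0.893, "DFS": 0.889}


def predicted_survival(prognostic_index: float, outcome: str = "OS") -> float:
    """Return S(5 | x) = S0(5) ** exp(PI) under a proportional hazards model.

    `prognostic_index` is the linear predictor (sum of coefficient * predictor);
    computing it requires the coefficients from Table A1 (not shown here).
    """
    s0 = BASELINE_5Y[outcome]
    return s0 ** math.exp(prognostic_index)


if __name__ == "__main__":
    # Hypothetical prognostic index values, for illustration only.
    for pi in (-0.5, 0.0, 0.8):
        print(pi, round(predicted_survival(pi, "OS"), 3))
```

A prognostic index of zero returns the baseline survival itself, and higher values give lower predicted survival.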

Study Outcomes
The primary endpoints for prediction were 5-year OS and 5-year DFS. OS event was defined as death from any cause during follow-up, whereas DFS event was defined similarly to the IPBS development study, which was any invasive relapse (including ipsilateral recurrence), any appearance of a second primary cancer (including contralateral breast cancer), any appearance of distant metastasis, and death from any cause, whichever occurred first [16]. The survival time for both OS and DFS events started at the date of surgery. All patients included in the analysis were followed up until 5 years after their index date of surgery. Patients who had no event during the follow-up period were censored at 5 years.

Statistical Analyses

Study Size Estimation
Our study size estimation was based on the previous guidance on sample size for designing an external validation study by Collins et al., which suggests that a minimum of 100 to 200 events are required for external validation of a prognostic model [17]. According to our preliminary review of patients diagnosed with breast cancer registered in the Chiang Mai Cancer Registry between 2005 and 2015, there were at least 820 deaths from 4096 patients. Provided that about half of these might be excluded, the remaining number of death events would still be sufficient.

Handling of Missing Data
As we anticipated that there would be missing data on several prognostic variables of the IPBS model in our validation dataset, we used multiple imputation by chained equations (MICE) to replace the missing values. We followed a two-stage calculation using a quadratic rule to estimate the number of imputations required for MICE to achieve replicable parameters and their standard error estimates [18]. External validations were performed within each imputed dataset. The estimated parameters were then pooled with Rubin's rules [19].
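The two calculations mentioned above can be sketched in Python as follows. The pooling function implements Rubin's rules in their standard form (pooled estimate as the mean, total variance as within-imputation variance plus (1 + 1/m) times between-imputation variance). The second function reflects our reading of the quadratic rule as M = 1 + 0.5 (FMI / CV)^2, where FMI is the fraction of missing information from a pilot imputation and CV is the tolerated coefficient of variation of the standard error; this form, the function names, and the example inputs are assumptions for illustration and should be checked against [18]. The analyses in this study were run in Stata, not with this code.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def imputations_needed(fmi, cv=0.05):
    """Quadratic rule (assumed form): M = 1 + 0.5 * (FMI / CV)^2, rounded up."""
    return math.ceil(1 + 0.5 * (fmi / cv) ** 2)


if __name__ == "__main__":
    # Hypothetical estimates from three imputed datasets, for illustration only.
    est, var = pool_rubin([0.70, 0.72, 0.71], [0.0004, 0.0004, 0.0004])
    print(round(est, 3), round(math.sqrt(var), 4))
    print(imputations_needed(fmi=0.5, cv=0.125))
```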
In this study, three analytic approaches for handling missing data were performed. The first analytic approach used multiple imputations that included the data on the observed outcomes as independent variables during MICE modeling. As the analysis concerned survival outcomes, Nelson-Aalen cumulative hazard estimates were used. The second approach was the use of multiple imputation that did not include the data on the cumulative hazard estimates. This approach would represent the performance of the model in real-life settings where the values of the observed outcomes were unknown [20]. The last approach was the complete-case analysis, where only patient records with complete data on IPBS predictors were included for analysis.
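For readers unfamiliar with the Nelson-Aalen estimate used in the first approach, the sketch below computes it in plain Python: at each distinct event time, the number of events is divided by the number of patients still at risk, and these increments are accumulated. The function name and the toy data are hypothetical; in practice the resulting cumulative hazard would be entered, together with the event indicator, as a covariate in the imputation model.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event occurred at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    cumulative = 0.0
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        cumulative += deaths[t] / at_risk
        curve.append((t, cumulative))
    return curve


if __name__ == "__main__":
    # Toy data (years of follow-up), for illustration only.
    print(nelson_aalen([1, 2, 2, 3, 5], [1, 1, 0, 1, 0]))
```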

Descriptive and Comparative Analysis
Statistical analyses were performed with Stata version 17 (StataCorp, College Station, TX, USA). Continuous data were summarized by mean and standard deviation or median and interquartile range (IQR) based on the underlying distribution. Frequency and percentage were used for the description of categorical data. Missing data was labeled and presented as an unknown dummy variable. Survival probability was estimated using Kaplan-Meier methods. Log-rank test was used to explore the association between each prognostic factor and OS.
The clinical characteristics and the association of prognostic factors for OS in the validation dataset were comparatively tabulated with those in the development dataset. The degree of relatedness was evaluated using the standardized difference (STD). Characteristics with an absolute STD value of more than 10% were considered significant differences between datasets [21].
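As an illustration of the standardized difference, the sketch below uses the commonly used formulas for a continuous variable and for a binary proportion, in which the difference between groups is divided by the pooled standard deviation. We assume these are the forms intended by [21]; multi-category variables require a generalized version that is not shown. The function names and the example inputs are hypothetical.

```python
import math


def std_diff_continuous(mean1, sd1, mean2, sd2):
    """Standardized difference for a continuous variable."""
    return (mean1 - mean2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


def std_diff_binary(p1, p2):
    """Standardized difference for a binary variable given two proportions."""
    pooled = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    return (p1 - p2) / math.sqrt(pooled)


if __name__ == "__main__":
    # Hypothetical inputs; an absolute value above 0.10 (10%) would be flagged.
    d = std_diff_binary(0.50, 0.60)
    print(round(d, 3), abs(d) > 0.10)
```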

Evaluation of External Performance
The model performance was assessed in two aspects: discriminative ability and calibration. Harrell's C-statistic was used to represent the model discrimination. Calibration was evaluated using calibration plots, the expected-to-observed ratio (E:O ratio), and the calibration slope. An E:O ratio of less than 1.0 indicated underestimation of the event probability, whereas a ratio greater than 1.0 indicated overestimation.
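The two summary measures above can be sketched in Python as follows. The first function is a simplified Harrell's C: among pairs in which the patient with the shorter follow-up had the event, it counts how often that patient also had the higher predicted risk, with ties in predicted risk counted as one half and pairs with tied times skipped. The second function computes the E:O ratio as the mean predicted event probability divided by the observed event probability (for example, one minus the Kaplan-Meier estimate at 5 years). Function names and toy data are hypothetical, and this is not the Stata routine used in the study.

```python
def harrell_c(times, events, risk):
    """Simplified Harrell's C-statistic for right-censored data."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


def eo_ratio(predicted_risks, observed_risk):
    """Expected-to-observed ratio: mean predicted risk over observed risk."""
    expected = sum(predicted_risks) / len(predicted_risks)
    return expected / observed_risk


if __name__ == "__main__":
    # Toy data, for illustration only.
    print(harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1]))
    print(round(eo_ratio([0.10, 0.20, 0.30], 0.20), 3))
```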
For comparative purposes, the external performance of the updated version of the PREDICT model (PREDICT v2) for predicting 5-year OS was estimated [22]. However, as the specific model equation and baseline probability for DFS outcomes were not reported for the PREDICT v2, we did not evaluate the performance in terms of DFS prediction. The details on how to calculate the prognostic index of 5-year OS by the PREDICT v2 model are shown in Appendix A (Table A2).
In case the IPBS or the PREDICT v2 model showed poor calibration in this validation dataset, model updating would be performed to improve the calibration of these prediction models in the validation set. In this study, model recalibration would be conducted by readjusting the model intercept or the baseline survival probability to that of our validation population while all other coefficients remained the same as originally proposed. This was achieved by refitting the Cox model in the validation dataset, while the linear predictors were included as an offset term [23]. This approach would allow us to correct calibration-in-the-large of the model, which is a common issue during external validation due to a mismatch in the overall observed event rate and the predicted risk [24].
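When the linear predictor enters a Cox model as an offset with its coefficient fixed at one, no regression coefficient is re-estimated and only the baseline hazard is refitted. The Python sketch below shows one way to obtain the recalibrated baseline survival in that situation, using a Breslow-type estimator: at each event time, the number of events is divided by the sum of exp(linear predictor) over patients still at risk. This is an illustration of the idea under that assumption, not the Stata procedure used in the study; the function name and toy data are hypothetical.

```python
import math


def recalibrated_baseline_survival(times, events, linear_predictor, horizon):
    """Breslow-type baseline survival S0(horizon) with coefficients held fixed.

    times            : follow-up time for each patient
    events           : 1 if the event occurred, 0 if censored
    linear_predictor : prognostic index from the original model (the offset)
    horizon          : time point of interest (e.g., 5 years)
    """
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    cumulative_hazard = 0.0
    for t in event_times:
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        risk_sum = sum(math.exp(lp) for u, lp in zip(times, linear_predictor) if u >= t)
        cumulative_hazard += n_events / risk_sum
    return math.exp(-cumulative_hazard)


if __name__ == "__main__":
    # Toy data with hypothetical prognostic index values, for illustration only.
    times = [1, 2, 2, 3, 5]
    events = [1, 1, 0, 1, 0]
    lp = [0.4, 0.1, -0.2, 0.3, -0.5]
    print(round(recalibrated_baseline_survival(times, events, lp, horizon=5), 3))
```

The recalibrated baseline would then replace the development baseline in the prediction formula, while the prognostic index is computed exactly as before.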
An exploratory subgroup analysis of model performance was performed based on the histological subtypes of breast cancer and the pathological staging. The estimated subgroup C-statistics and calibration slopes were based on a randomly selected sample of multiple imputed datasets, including the cumulative hazard of events.

Patient Characteristics
According to the Chiang Mai Cancer Registry, a total of 3581 female patients were diagnosed with breast cancer and underwent surgical operations at Maharaj Nakorn Chiang Mai hospital during the study period. Of these patients, 1713 were excluded from the analysis (Figure 1). The most common reasons for exclusion were patients with metastatic breast cancer and patients who received neoadjuvant chemotherapy. Finally, 1868 patients were included in the validation dataset. After a median follow-up of 5 years, there were 270 events for OS analysis and 332 events for DFS analysis. The 5-year OS and DFS of the patients in the validation dataset were 85.2% (95% confidence interval (CI) 83.5%, 86.8%) and 81.9% (95% CI 80.1%, 83.6%), respectively.
In the validation dataset, the mean age at the time of surgery was 52.9 years, and around half of the patients were postmenopausal (49.6%). Regarding the characteristics of breast cancer, most of the patients were diagnosed with invasive ductal carcinoma (73.3%), with a pathological stage of I or II (54.2%) and a histological grade of II or III (80.6%). A larger proportion of patients had a tumor size smaller than 30 mm (59.9%), no lymphovascular invasion (46.0%), no nodal involvement (50.9%), positive ER and PR status (57.1% and 49.8%), and negative HER-2 status (51.4%). The majority of the patients underwent surgical mastectomy (75.6%) and received adjuvant chemotherapy (77.3%). About half of the patients received hormonal therapy (56.6%) and adjuvant radiotherapy (54.5%). However, less than 10% of the patients were prescribed targeted therapy.
The details on patient characteristics in both the validation and development datasets are shown in Table 1. There were significant differences between the validation and the development dataset of the IPBS model in almost all of the presented features, except for the proportion of patients receiving radiotherapy (Table 1). The three clinicopathological characteristics with the highest standardized difference values were pathological staging (STD = 0.933), histological type (STD = 0.629), and histological grading (STD = 0.562).

Predictor-Outcome Associations
Both the validation and development datasets showed similar direction and statistical significance of predictor-outcome association patterns (Table 1), except for histological subtype, HER-2 status, type of surgery, and adjuvant chemotherapy. In the development dataset, there were no statistically significant differences in the 5-year OS between patients who were and were not prescribed adjuvant chemotherapy. In contrast, patients who did not receive chemotherapy were more likely to survive at 5 years in the validation dataset.

The median C-statistics of the PREDICT v2 model for predicting 5-year OS were 0.658 (range 0.638-0.672) and 0.644 (range 0.633-0.656) for multiple imputed datasets that included and did not include cumulative hazard of events, respectively. For complete-case analysis, the C-statistic of the PREDICT v2 model was estimated at 0.582 (95% CI 0.499, 0.664).

External Calibration
Regarding calibration, the IPBS model exhibited good agreement between the predicted survival probability and the observed proportion of 5-year OS and 5-year DFS for both multiple imputation approaches. Figure 2 visualizes the agreement between the predicted survival curves by the IPBS model and the observed Kaplan-Meier estimates across four risk quantiles for both MI approaches. For both 5-year OS and 5-year DFS, the IPBS model showed an apparent underestimation of the probability of events in the third risk quantile (Figure 2). The comparison of predicted survival curves and the observed Kaplan-Meier estimates for the complete-case analysis is shown in Appendix B (Figure A1).
According to the E:O ratio, the IPBS model modestly overestimated the 5-year risk of death by 4.6 to 5.2% while inversely underestimating the 5-year risk of disease progression by 10.7 to 11.2% for both MI approaches (Table 2). In the complete-case analysis, the IPBS model seriously overestimated the probability of death and disease progression by 2.47 and 1.54 times the proportions of the observed death and progression events, respectively (Table 2).
According to the MI approaches, the PREDICT v2 model provided minimal underestimation of 5-year overall survival by 9.3% and 10.1% (Table 2). In contrast, the model's predicted 5-year overall risk of mortality was significantly overestimated by almost two times the actual observed value in the complete-case analysis (Table 2). The figures visualizing the agreement between the observed and predicted survival curves across the 5-year follow-up period of the PREDICT v2 model for all three analytic approaches are provided in Appendix B (Figure A2).

Model Recalibration
After recalibration of the IPBS model, the overall calibration of the IPBS model improved for both multiple imputation approaches and complete-case analysis (Table 2). Figure 3 compares the external calibration of the IPBS model for 5-year OS and 5-year DFS before and after model recalibration for the MI approach that includes cumulative hazard of events. The corresponding comparison for the second MI approach is shown in Figure 4. The calibration plots of the IPBS model, before and after model recalibration from the complete-case analysis, are also presented in Appendix B (Figure A3).
Improvements in model calibration were also observed for the PREDICT v2 model after recalibration, as observed through the changes in the E:O ratio toward 1.0. This was, however, more obvious in the complete-case analysis than in the MI approaches (Table 2). The calibration plots of the PREDICT v2 model (before and after recalibrating the baseline survival probability) are presented in Appendix B (Figure A4).

Exploratory Subgroup Analysis of Model Performance
The C-statistics and the calibration slopes of the IPBS and PREDICT v2 models for predicting 5-year survival outcomes stratified by histological subtypes and pathological stage are shown in Appendix A (Table A3). It was observed that the performance of the IPBS model did not vary by the histological subtype of breast cancer. However, the IPBS model tended to show poor discrimination in patients with higher pathological stages. The PREDICT v2 model's performance was affected by the histological subtype and pathological stage.

Discussion
To our knowledge, this study is the first to externally validate the performance of the IPBS model after its introduction in 2021. The IPBS model showed acceptable discriminative ability for predicting both outcomes in the validation dataset (median C-statistics 0.706 to 0.728 for 5-year OS and 0.675 to 0.689 for 5-year DFS). Based on our findings, the external discriminative ability of the IPBS model was somewhat similar to that of the development study (C-statistics 0.72 for OS prediction and 0.70 for DFS prediction) [15]. However, in terms of external calibration, IPBS minimally overestimated the probability of mortality and progression events (4.6 to 5.2% for 5-year OS and 10.7 to 11.2% for 5-year DFS). The external performance of the PREDICT v2 model in predicting 5-year OS was inferior to that of the IPBS model in terms of discriminative ability (median C-statistics 0.644 to 0.658 for 5-year OS).
To evaluate whether the results of our study represent the reproducibility or the transportability of the IPBS model, we assessed the relatedness between the validation and the development dataset [25]. Compared to the development data, our validation data showed significant differences in most of the clinicopathologic characteristics at diagnosis, such as the pathological staging and histologic grading. This heterogeneity in prognostic characteristics between the two datasets could also explain the differences in outcome occurrence and survival rates. The 5-year OS and 5-year DFS were obviously higher in the validation dataset than in the development dataset (OS 85.2% vs. 77.9% and DFS 81.9% vs. 74.0%) [15]. In spite of these differences, the direction of predictor-outcome associations was generally the same for both datasets. Overall, it might be concluded that the validation study was only partly related to the development study and that the results of this validation study would reflect the transportability of the IPBS model over reproducibility.
Despite the unrelatedness between the two datasets, the IPBS model still showed an acceptably robust performance in providing survival prediction for surgically-treated breast cancer patients. Nonetheless, there was still evidence that the IPBS model may provide a modest overestimation of mortality and disease progression events, which could be explained by the lower baseline survival probability in the development dataset [8]. When the IPBS model was recalibrated in the validation data, the overestimation of event rates disappeared. This important finding suggests that, while the discriminative ability of the IPBS model is well-preserved and that the derivation of new prognostic models or addition of other important features might not be necessary, the baseline survival probability should be recalibrated or tailored for each specific group of patients to achieve accurate predictions [26].
The observed superiority in external discriminative performance over the recently proposed PREDICT v2 model in our dataset further strengthened the potential clinical applicability of the IPBS model for the Thai population. One previous study was conducted to validate the performance of the PREDICT v2 model in Thai patients with breast cancer in 2020 [14]. It was found that PREDICT v2 underestimated the overall survival probability, which was concordant with our findings. However, it was unclear whether recalibrating the model would correct this issue. The discriminative ability of the PREDICT v2 model in that study was estimated at a C-statistic of 0.78 for 5-year OS, which was modestly higher than that of our study. This discrepancy could be explained by differences in study populations and the types of concordance statistics used for validation.
Our study carries both strengths and limitations. The main strength was the inclusion of a sufficiently large number of OS and DFS events that would be adequate for validating a prognostic model. Another important point was the use of real-world routinely collected data as the data source, which improved the generalizability of the validation results [27]. However, the use of real-world data also led to several issues that threatened the validity of our findings. First, less than half of the included patients had complete data on all twelve IPBS predictors. As a complete-case analysis may be subject to a serious risk of selection bias, we employed two MICE approaches to handle the missing data in the analyses. We also performed and reported the performance measures from a complete-case analysis. Even though the results of the complete-case analysis were different from those of the MICE approaches, we believe this finding had no strong clinical implication, as a growing body of evidence has found that complete-case analysis often leads to errors, result misinterpretations, and impaired generalizability [28]. Second, a large proportion of eligible patients were excluded due to incomplete or unverifiable data on relevant dates. Finally, this validation study was based only on the data of breast cancer patients who were diagnosed and treated at a single tertiary care center in Northern Thailand. Although our results should be considered transportable to other distant populations, given the unrelatedness of the datasets, further broader external validation studies are still encouraged. In addition, as the data on how other well-known western-derived prediction models perform in the Thai population are still limited, an independent validation study of these models would be of value to clinical practice.

Conclusions
The IPBS was externally validated using the cohort data of female breast cancer patients in one tertiary care center in Northern Thailand. The validation dataset was found to be largely unrelated to the development dataset. Hence, the validation results should be considered in terms of transportability rather than reproducibility. The model was able to provide both OS and DFS predictions at 5 years after diagnosis with an acceptable discriminative ability comparable to when it was developed. However, it was apparent from our results that the model provided a minimal overestimation of event probability for both OS and DFS. Thus, recalibrating the IPBS model to the local context is suggested before clinical implementation.

Informed Consent Statement:
Patient consent was waived due to the retrospective nature of anonymous data collection.

Data Availability Statement:
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request. The data are not publicly available, due to their containing information that could compromise the privacy of research patients.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.