External Validation of the Dutch SOURCE Survival Prediction Model in Belgian Metastatic Oesophageal and Gastric Cancer Patients.

The SOURCE prediction model predicts individualised survival conditional on various treatments for patients with metastatic oesophageal or gastric cancer. The aim of this study was to validate SOURCE in an external cohort from the Belgian Cancer Registry. Data of Belgian patients diagnosed with metastatic disease between 2004 and 2014 were extracted (n = 4097). Model calibration and discrimination (c-indices) were determined. A total of 2514 patients with oesophageal cancer and 1583 patients with gastric cancer with a median survival of 7.7 and 5.4 months, respectively, were included. The oesophageal cancer model showed poor calibration (intercept: 0.30, slope: 0.42) with an absolute mean prediction error of 14.6%. The mean difference between predicted and observed survival was −2.6%. The concordance index (c-index) of the oesophageal model was 0.64. The gastric cancer model showed good calibration (intercept: 0.02, slope: 0.91) with an absolute mean prediction error of 2.5%. The mean difference between predicted and observed survival was 2.0%. The c-index of the gastric cancer model was 0.66. The SOURCE gastric cancer model was well calibrated and had a similar performance in the Belgian cohort compared with the Dutch internal validation. However, this was not the case for the oesophageal cancer model. Our findings underscore the importance of evaluating the performance of prediction models in other populations.


Introduction
Oesophagogastric cancer has a dismal prognosis. Patients diagnosed with metastatic disease face a median overall survival (OS) time of three to five months with best supportive care (BSC) [1,2]. Survival is dependent on various prognostic factors and treatment type [3].
Patients with a relatively good Eastern Cooperative Oncology Group Performance Status (PS) of 0-2 may be eligible for chemotherapy, targeted therapy, or even palliative surgery [4,5].
Brachytherapy, external radiotherapy, or stent placement may be deployed to relieve symptoms, such as dysphagia, and/or to reduce tumour growth [6,7].
Palliative treatments often have uncertain and limited benefit while the treatment burden can be high. Ideally, shared decision-making should be applied where patient preferences and values are taken into account during decision making [8]. Accurate and balanced information about treatment options tailored to the individual patient should be provided. However, oncologists were found to rarely discuss the potential pros and cons of palliative treatment and the BSC option [9][10][11][12]. This may, at least in part, be due to the complexity of predicting outcomes for individual patients [13].
Prediction models can aid such individual risk estimation. Additionally, they can help quantify risks and benefits in an understandable manner to patients which allows them to more actively participate in the decision-making process [14,15]. Such prediction models will only live up to their potential if they have the required model performance qualities. A recent review investigated published risk prediction models regarding oesophagogastric cancer and concluded that model performance is often poorly described and external validation limited [16]. In addition, no models in the metastatic setting were of sufficient quality for use in clinical practice.
We therefore developed the SOURCE model (stimulating evidence-based, personalised and tailored information provision to improve decision-making after oesophageal-gastric cancer diagnosis) [17]. The model makes OS predictions based on prognostic factors for metastatic oesophagogastric cancer patients. The SOURCE model was developed on a nationwide Dutch population-based cohort selected from the Netherlands Cancer Registry. Predictions regarding OS are conditional on various treatment types. Details on the input parameters, development and internal validation of the model were previously published [17].
External validation is needed to investigate the performance of the original Dutch model and to justify its use for other populations. The Belgian population was selected because Belgium and the Netherlands are neighbouring countries that both have an extensive population-based national cancer registry. Therefore, the aim of this study was to validate the SOURCE model on an external population-based cohort selected from the Belgian Cancer Registry (BCR).

Results
Overall, 4097 patients diagnosed between 2004 and 2014 registered by the BCR were included. Figure 1 depicts the selection process stratified by oesophageal and gastric cancer patients.

Oesophageal Cancer Patients
In total, 2514 oesophageal cancer patients were analysed, of whom 97.1% died during follow-up. Most patients were male (80.8%), had a PS of 1 (65.0%) and were diagnosed with adenocarcinoma (67.3%). The median observed OS was 7.7 months. An overview of patient, tumour and treatment characteristics is given in Table 1. Compared to the Dutch SOURCE population, the median OS time was longer for Belgian patients (7.7 vs. 5.1 months, p < 0.0001), see Table 1. cT3 tumours were more frequently observed in Belgian patients (45.5% vs. 22.7%), whereas the Dutch population had a cTX status in 49.9% of patients. Relative to adenocarcinoma, squamous cell carcinoma was diagnosed more frequently in Belgium than in the Netherlands. Topography was not further specified in 33.1% of Belgian patients. Half of the Belgian patients were treated with chemotherapy, 10.6% received BSC and 5.8% received radiotherapy. Dutch patients received less treatment: 27.7% received chemotherapy, 26.6% BSC and 26% radiotherapy.

SOURCE Oesophageal Cancer Model Validation
Model discrimination for the oesophageal cancer population amounted to a c-index of 0.64 (0.63-0.66), see Table 2. Model calibration at six months for the overall oesophageal cancer population corresponded to an intercept and calibration slope of 0.30 (0.28-0.31) and 0.42 (0.39-0.45), respectively. The mean difference between predicted and observed survival was −2.6% with a mean absolute prediction error of 14.6% (Table 2). The corresponding calibration plot (Figure 2) shows an underestimation of OS for patients with a predicted six-month OS of ≤46% with the most prominent deviations in the lowest tertile of the plot. Overestimation of six-month OS was present for patients with a relatively good prognosis, with larger deviations on the higher end of the scale (60-80%), see Figure 2.

Gastric Cancer Patients
In total, 1583 patients with gastric cancer were analysed of whom 98.0% died during follow-up. Details of patient, tumour and treatment characteristics are given in Table 3. More than half of the patients were male (59.8%) and had a PS of 1 (59.6%). The median observed OS was 5.4 months.
Compared to the original Dutch SOURCE population, the median OS time was longer for Belgian patients (5.4 vs. 3.9 months, p < 0.0001), see Table 3. The primary tumour location was not further specified in 60.1% of Belgian patients versus 8.4% of Dutch patients, and topography was assessed as an overlapping lesion in 0.5% of Belgian versus 34.5% of Dutch patients. Half (52.1%) of the Belgian patients were treated with chemotherapy versus 34.6% of Dutch patients, and 33.5% of [...]
Figure 3. Mean differences between predicted and observed six-month overall survival for oesophageal cancer patients by patient subgroups. Values > 0% indicate an overestimation and values < 0% indicate an underestimation in overall survival. The grey band represents the mean difference between predicted and observed six-month overall survival for the entire oesophageal cancer cohort.

SOURCE Gastric Cancer Model Validation
Model discrimination amounted to a c-index of 0.66 (0.64-0.68). Model calibration at six months for the overall gastric cancer population corresponded to an intercept and calibration slope of 0.02 (0.02-0.02) and 0.91 (0.90-0.91), respectively. The mean difference between predicted and observed survival was 2.0% with a mean absolute prediction error of 2.5% (Table 2). The corresponding calibration plot showed good calibration with no differences greater than 5% between predicted and observed survival along all prediction estimates, see Figure 2.
Differences between predicted and observed survival were greatest in terms of overestimation for patients aged 80-89 (+6.1%), with a PS score of 3 (+5.8%) and with a cN3 status (+5.5%). The majority of patient subgroups (59%) showed similar or smaller differences between predicted and observed OS compared to the overall cohort (+2.0%), see Figure 4.

Discussion
External validation of prediction models is essential for use in clinical practice [18]. This external validation study of the Dutch SOURCE model demonstrated that the oesophageal model had low transportability to the Belgian population, given its poor calibration and c-index. However, the gastric cancer model transported adequately.
The original development report of SOURCE noted a calibration slope of one and an intercept of zero for both models during internal validation. C-indices were 0.71 and 0.68 for the oesophageal and gastric cancer model, respectively [17]. In this external validation study, we did not expect a superior performance compared to the internal validation, given the different nationality and healthcare settings. Our results showed that the oesophageal model performed worse in the Belgian population, with poor calibration and a c-index of 0.64, whereas the performance of the gastric cancer model was close to the original internal validation, with good calibration and a c-index of 0.66.
Calibration of the gastric cancer model showed that differences between predicted and observed OS for the entire cohort were no greater than 5% along the calibration line, indicating a well-calibrated model. Differences between predicted and observed OS were small (<5%) for most patient subgroups. Older patients aged 80-89 had the largest difference (+6.1%), which we still interpreted as fair.
C-indices were relatively low according to our classification, indicating that the models had difficulties in making higher prediction estimates for patients who actually survived longer versus patients who had a shorter lifespan. Since the calibration was good for the gastric cancer population and the variation between prediction estimates was small, one might argue that the model had difficulties in ranking patients' OS.
The poor fit of the oesophageal cancer model might be explained by overfitting during model development. The oesophageal cancer model has more input parameters and interaction terms compared to the gastric cancer model (see Table S1). Such complex models with a high number of parameters might lead to good fit for the sample population (in this case the Dutch), but predictions might not generalise to new subjects outside the sample, such as the Belgians [19].
Missing data in the Belgian cohort might be another explanation for the poor fit. In this study, >40% of data regarding the location of metastases was missing and therefore multiply imputed to avoid selection bias. This, however, is always suboptimal in comparison to having observed values. The oesophageal model compared to the gastric model contains more input parameters regarding the location of metastases (see Table S1). Therefore, the oesophageal model validation was more subject to multiple imputation and thus uncertainty, which might explain the poorer fit.
Furthermore, adenocarcinoma and squamous cell carcinoma were combined into the same oesophageal cancer model, despite their differential biological features. Although the oesophageal cancer model contained histology as an input parameter, it is unclear to what extent this combination contributed to the poor model fit. Patient subgroup analysis showed that mean differences between predicted and observed survival for adenocarcinoma, squamous cell carcinoma, and the entire cohort were −2.9%, +1.7% and −2.6%, respectively. These mean differences did not substantially differ (see Figure 3). For the re-estimated model based on BCR data and its calibration and discrimination, see Table S1, Figure S1 and Table S2.

Differences between Development and Validation Datasets
Several differences in patient, tumour and treatment characteristics were observed between the Dutch and Belgian population. These include topography, cT-category and tumour differentiation grade, which might be due to missing data and/or differing cancer registration policies. In the Netherlands, data managers are centrally trained to interpret and register data in a standardised fashion. In Belgium, data collection is decentralised, with clinical and pathological data obtained by oncological care programmes and laboratories [20]. Although data managers are trained and data cleaning is performed according to specific guidelines, differences in registration might thus be due to varying registration practices and/or interpretations [21]. In addition, BCR data regarding treatment types have been sufficiently validated. Data regarding the location of metastases were derived from Belgian hospital discharge data. This is the first study to use discharge records for this purpose. It is, however, unknown to what extent these data might deviate from patients' medical records.
Taking patient selection into account, the proportion of patients with cM1 tumours at diagnosis in the BCR was substantially lower compared to the Netherlands (22.1% vs. 40.1%). Additionally, the proportion of Belgian patients with a cTXNXMX status was considerably higher (28.6% vs. 1.9%) (personal communication, 29 May 2019). So, one might argue that this Belgian cTXNXMX patient group is quite heterogeneous and that a portion of these patients had true cM1 tumours at diagnosis. These patients, however, were not included in our analysis due to lack of detail in the clinical TNM classification. It might be the case that including these patients affects case mix and survival, which could lead to a dataset more similar to the Dutch.
When looking at the use of treatment modalities in the Belgian sample, the Belgians administered chemotherapy more frequently than the Dutch. Dutch oncologists more frequently offered BSC and radiotherapy, a more conservative approach that may explain the shorter median survival [20][21][22].
Lastly, SOURCE was developed to aid decision-making between BSC and (some form of) active treatment. During model development, 26.6% (n = 2131) and 47.6% (n = 2266) of Dutch oesophageal and gastric cancer patients received BSC (no treatment). Although this relatively large cohort could aid survival estimation on BSC, it should be pointed out that these estimates may have an inherent selection bias. Patients who received BSC most likely had worse PS scores or comorbidities compared to patients who did undergo treatment. Therefore, survival estimates for a relatively fit patient considering BSC may be underestimated. Although this effect could be partially corrected by other input parameters in the model, there may still be bias in the survival predictions [17].

Materials and Methods
This manuscript was written in accordance with the TRIPOD statement [23]. The SOURCE model aims to stimulate evidence-based, personalised and tailored information provision to improve decision-making after oesophageal-gastric cancer diagnosis. The model predicts overall survival for patients with metastatic oesophageal or gastric carcinoma (cM1), who did not die within 14 days after diagnosis. Patients with only distant metastases located in the head or neck region fall outside the target population of SOURCE.
Input parameters of the model include: age, cT-category, cN-category, tumour differentiation grade, number of metastatic sites, distant lymph node metastasis only, intra-thoracic and intra-abdominal lymph node metastasis, and initial treatment. The gastric cancer model also includes gender as an input parameter, and the oesophageal cancer model also includes peritoneal, liver and head and neck metastases, morphology and topography. Input parameters were measured at diagnosis, before the start of treatment.
SOURCE is integrated into a web-interface and will be made freely available after extensive assessments in clinical practice. Physicians can use the model together with patients during the clinical consultation. Since medical terminology is present in the web-interface, it is recommended that physicians discuss the results from the model with the patient, in a way that is tailored to the patient's level of understanding. It should be noted that SOURCE is developed to be a decision-aid to stimulate shared and informed decision-making. It should not and cannot replace the expertise and clinical judgement of physicians.

Data Source
The BCR covers more than 95% of the Belgian cancer population [24]. Patient and tumour characteristics were collected from the standard cancer registration database, which relies on notifications from both the clinical (oncology care programmes) and pathological (laboratories for pathological anatomy) network. Data regarding treatment were derived from reimbursement claims of health insurance companies. A detailed description of the BCR data and data sources is given in the Supplementary Methods. The use of BCR data for scientific purposes is regulated by Belgian law, excluding the need for written informed consent for this study.

Patients
All patients diagnosed between 2004 and 2014 with a primary tumour in the oesophagus/gastroesophageal junction or stomach (ICD-10: C15.0-C16.9) and a cM1 status were identified in the BCR. Analyses were restricted to patients with a Belgian residence at time of diagnosis. Inclusion and exclusion criteria were in accordance with the criteria used to develop the SOURCE model [17]. As this study took place entirely within the legal framework of the Belgian Cancer Registry, no ethical approval of concerned patients was needed. We more concretely refer to the privacy law of 08/12/1992 Chapter III Art 9 §2, 2e a) and 2e b) which refers to the Health Law of 2006.

Procedures
Treatment type was classified as for the original SOURCE model [17]. Input parameters for initial treatment were: BSC (registered as "no treatment" or if no anti-cancer or symptom relief treatment was registered), radiotherapy (aimed at primary tumour or metastases), chemotherapy, chemoradiotherapy, chemotherapy plus short-term (≤28 days) radiotherapy, resection (aimed at primary tumour or metastases), stent placement or other treatment (all other treatments not mentioned above, like targeted therapy only).
Missing data regarding input parameters were handled using multiple imputation by chained equations [25]. Tumour staging was based on the 7th edition of the TNM staging system. However, patients diagnosed prior to 2010 were staged according to the TNM 6th edition. Conditional multiple imputation was used to align the definitions. This procedure has been described previously for SOURCE [17]. Conditional multiple imputation based on the original SOURCE dataset was also used to impute data regarding the target location (primary tumour or metastases) if patients underwent radiotherapy, since this level of detail was not given.

Statistical Analyses
The primary endpoint was prediction of six-month overall survival. Overall survival was defined as the time between the date of diagnosis and death, or the date of last follow-up if a patient was censored. Differences in median survival between the development and validation cohort were assessed using Cox regression. To assess model performance, a concordance index (c-index) was calculated, as well as a calibration slope, intercept, absolute error and differences between predicted and observed survival outcomes.
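As an illustration of how observed six-month overall survival can be estimated when some patients are censored, the sketch below uses the Kaplan-Meier estimator. This is an assumption made for illustration: the text does not state which estimator was used. The function name `km_survival_at` and the toy cohort are hypothetical and are not study data.

```python
from typing import Sequence


def km_survival_at(times: Sequence[float], events: Sequence[bool], horizon: float) -> float:
    """Kaplan-Meier estimate of the probability of surviving beyond `horizon`.

    times  -- months from diagnosis to death or to last follow-up
    events -- True if the patient died at that time, False if censored
    """
    # Distinct death times up to and including the horizon, in increasing order.
    death_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    survival = 1.0
    for t in death_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - deaths / at_risk
    return survival


# Hypothetical toy cohort (months, death indicator); not study data.
times = [2.0, 3.5, 5.0, 6.5, 8.0, 4.0, 12.0, 7.0]
events = [True, True, True, True, True, False, True, False]
six_month_os = km_survival_at(times, events, horizon=6.0)
print(f"Observed six-month OS: {six_month_os:.2f}")
```

The patient censored at four months contributes to the risk set until that time and is then dropped, which is why censored follow-up does not count as a death.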
Model calibration was assessed by measuring the goodness-of-fit and is described by the agreement between predicted and observed outcomes. In case of a perfect prediction, the calibration line has a slope of one and an intercept of zero (x = y). A linear model was applied to assess the calibration slope and intercept of the model. The model was evaluated for the entire cohort and pre-defined patient subgroups based on the model's input parameters [17]. Mean differences between predicted and observed survival were calculated only for patient subgroups greater than 50 patients.
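The calibration measures described above can be sketched as follows. This is a minimal sketch, assuming that patients are first grouped by predicted six-month survival and that the observed survival proportion per group is regressed on the mean predicted survival with ordinary least squares; the grouping and the direction of the regression are assumptions, since the text only states that a linear model was applied. Differences are taken as predicted minus observed, so that positive values indicate overestimation, as in the caption of Figure 3. All names and numbers are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def calibration_line(predicted: Sequence[float], observed: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of observed = intercept + slope * predicted."""
    mean_pred, mean_obs = mean(predicted), mean(observed)
    sxx = sum((p - mean_pred) ** 2 for p in predicted)
    sxy = sum((p - mean_pred) * (o - mean_obs) for p, o in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_obs - slope * mean_pred
    return intercept, slope


def prediction_errors(predicted: Sequence[float], observed: Sequence[float]) -> Tuple[float, float]:
    """Mean difference (predicted - observed) and mean absolute prediction error."""
    diffs: List[float] = [p - o for p, o in zip(predicted, observed)]
    return mean(diffs), mean([abs(d) for d in diffs])


# Hypothetical per-group six-month survival probabilities; not study data.
predicted = [0.2, 0.4, 0.6, 0.8]
observed = [0.3, 0.4, 0.5, 0.6]

intercept, slope = calibration_line(predicted, observed)
mean_diff, mean_abs_error = prediction_errors(predicted, observed)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"mean difference={mean_diff:+.2f}, mean absolute error={mean_abs_error:.2f}")
```

In this toy example the slope is below one: survival is underestimated in the group with the lowest prediction and overestimated in the groups with the highest predictions. The mean difference is smaller than the mean absolute error because errors of opposite sign partly cancel.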
A c-index was calculated to assess the discriminatory ability of SOURCE. The c-index estimate is the probability that, for a random pair of patients, the patient who survives longer indeed has a higher predicted survival estimate than the other patient. A value of 0.5 indicates that the model does not perform better than chance. A value of 1 indicates perfect discrimination. C-indices <0.7 were rated as poor, 0.7-0.79 as fair, 0.8-0.89 as good and 0.9-1 as excellent [16].
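The pairwise definition of the c-index can be made concrete with a short sketch. It assumes the common pairwise estimator for censored data (often attributed to Harrell), in which a pair is comparable only when the patient with the shorter follow-up time died; the text does not specify the exact estimator or software, so this is illustrative only. The rating function follows the thresholds quoted above; the function names and the toy data are hypothetical.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[bool],
                      predicted_survival: Sequence[float]) -> float:
    """Fraction of comparable patient pairs that the model ranks correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the shorter time in a pair must be an observed death
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_survival[i] < predicted_survival[j]:
                    concordant += 1.0  # shorter survivor has the lower prediction
                elif predicted_survival[i] == predicted_survival[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rate_c_index(c: float) -> str:
    """Rating according to the classification used in the text."""
    if c < 0.7:
        return "poor"
    if c < 0.8:
        return "fair"
    if c < 0.9:
        return "good"
    return "excellent"


# Hypothetical toy data (months, death indicator, predicted survival); not study data.
times = [2.0, 4.0, 6.0, 8.0, 10.0]
events = [True, True, False, True, True]
predicted_survival = [0.2, 0.5, 0.4, 0.6, 0.9]

c_index = concordance_index(times, events, predicted_survival)
print(f"c-index = {c_index:.3f} ({rate_c_index(c_index)})")
```

Of the eight comparable pairs in the toy data, seven are ranked correctly. Under the same classification, the c-indices of 0.64 and 0.66 reported in this study both fall in the lowest category.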
The SOURCE model was re-estimated using the input of the Belgian dataset with the method that was applied to create the original Dutch SOURCE model. Model performance for the re-estimated model was also assessed by means of c-indices, calibration slopes, intercepts and absolute prediction errors.

Data Availability
The data that support the findings of our study are available in the Belgian Cancer Registry.

Conclusions
In conclusion, the SOURCE oesophageal model had low transportability to the Belgian population, but the gastric cancer model did transport adequately. Future studies should investigate the differences in diagnostics, treatment and survival between the populations, and the potential underlying causes. Model updating, in which newly available predictors can be incorporated to improve model performance, remains important. Furthermore, future versions of SOURCE should guard against overfitting by including fewer input parameters. Lastly, for usage of the model in the Belgian clinical setting, model updating would be preferable, ideally incorporating PS and more details regarding treatment.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/4/834/s1, Figure S1: Calibration plot of predicted versus observed six-month overall survival for patients with oesophageal cancer (red line) and gastric cancer (blue line). Results are shown from the re-estimated Belgian model, Table S1: Re-estimation of the SOURCE prediction model for overall survival in Belgium patients with metastatic oesophageal and gastric cancer, Table S2: Calibration and discriminative ability of the re-estimated model for the Belgium population.