External Validation of a Nomogram to Predict Survival and Benefit of Concurrent Chemoradiation for Stage II Nasopharyngeal Carcinoma

Simple Summary
Whether stage II nasopharyngeal carcinoma (NPC) should be treated with concurrent chemoradiation (CCRT) or radiotherapy alone in the intensity-modulated radiotherapy (IMRT) era is controversial, and guidelines differ. Sun et al. published a nomogram to predict the overall survival (OS) benefit of CCRT from a patient's clinical parameters. Using the cohort from the Hong Kong NPC1301 study, we evaluated the external validity of the nomogram and the associations between the proposed clinical factors and OS among patients with stage II NPC. Use of CCRT was not a significant predictor of OS. The nomogram lacked predictive accuracy and should be interpreted with caution.

Abstract
Sun et al. recently published a nomogram to predict overall survival (OS) and the additional benefit of concurrent chemoradiation (CCRT) vs. radiotherapy (RT) alone in stage II nasopharyngeal carcinoma (NPC) treated with conventional RT. We aimed to assess the predictors of OS and to externally validate the nomogram in the intensity-modulated radiotherapy (IMRT) era. We analyzed patients with stage II NPC treated with definitive RT alone or CCRT between 2001 and 2011 in the territory-wide Hong Kong NPC Study Group 1301 study. Clinical parameters were studied using the Cox proportional hazards model to estimate OS. The nomogram by Sun et al. was applied with 1000 bootstrap resamples to calculate the concordance index, and we compared the nomogram-predicted and observed 5-year OS. A total of 482 patients were included. The 5-year OS was 89.0%. In the multivariable analysis, age > 45 years was the only significant predictor of OS (HR, 1.98; 95% CI, 1.15–3.44). Other clinical parameters were not significant, including the use of CCRT (HR, 0.99; 95% CI, 0.62–1.58). The nomogram yielded a concordance index of 0.55 (95% CI, 0.49–0.62), indicating a lack of clinically meaningful discriminative power. The nomogram proposed by Sun et al. should be interpreted with caution when applied to patients with stage II NPC in the IMRT era. The benefit of CCRT remains controversial.


Introduction
Nasopharyngeal carcinoma (NPC) is endemic in Southeast Asia. The age-standardized incidence rates (per 100,000 persons) were 5 in Southeast Asia and 1.6 globally [1]. According to the Hong Kong Cancer Registry, the crude incidence rate in Hong Kong was 11.2 per 100,000 persons in 2018. Stage II NPC accounts for 11.5% and 14.1% of all cases under the seventh and eighth editions of the AJCC staging system, respectively [2,3]. While radiotherapy (RT) is the mainstay of definitive treatment for stage II NPC, the additional benefit of concurrent chemoradiation (CCRT) in the intensity-modulated radiotherapy (IMRT) era remains controversial [4–7]. The National Comprehensive Cancer Network (NCCN) guideline [8] recommends CCRT with induction or adjuvant chemotherapy for stage II–IVB NPC. On the other hand, the latest CSCO/ASCO guideline [9] recommends basing the decision on CCRT on the TN subcategory and risk assessment, whereas the ESMO/EURACAN guideline [10] suggests that RT alone could be considered if IMRT is used. Treatment outcomes in early-stage NPC have improved remarkably in the IMRT era [5,11,12]. In contrast to traditional two-dimensional radiotherapy, IMRT uses multiple radiation beams with modulated intensities to deliver an adequate, conformal dose to the tumor with high precision while minimizing radiation spillage to adjacent critical organs [13].
Recently, Sun et al. proposed a nomogram [14] to estimate 5-year and 10-year overall survival (OS) in stage II NPC. The nomogram predicts the additional benefit of CCRT and was built on data from a landmark 2011 randomized controlled trial (RCT) [15], which showed that CCRT improved OS, progression-free survival, and distant metastasis-free survival in stage II NPC treated with a conventional RT technique.
This nomogram is easy to use and consists of clinical parameters commonly reported in cancer staging. It could potentially inform clinicians and patients about disease prognosis and the expected clinical benefit of CCRT, supporting shared treatment decisions. In this study, we externally validated the proposed nomogram, assessing its discrimination and accuracy, to determine whether it should be widely adopted in the modern treatment era.

Patients and Treatment
We retrospectively reviewed data from the Hong Kong NPC Study Group (HKNPCSG) 1301 study [16]. The data were derived from the Hong Kong Cancer Registry. All clinical data and treatment records were retrieved through the electronic patient record system of the six oncology centers in public hospitals in Hong Kong. As part of the pretreatment evaluation, all patients underwent physical examination, fiberoptic nasopharyngoscopy, and MRI of the nasopharynx and neck (or CT if MRI was contraindicated). Patients were retrospectively staged according to the seventh edition of the AJCC/UICC staging system [17], and the clinical information was validated by the principal investigator in the previous study. The inclusion criteria were similar to those of the original studies [14,15]. Treatment-naïve patients with stage II NPC (seventh edition AJCC/UICC) and WHO type II or III histology, aged 70 years or younger, who underwent definitive IMRT alone or CCRT were selected from the database. We excluded patients without staging MRI and those who received neoadjuvant and/or adjuvant chemotherapy. All patients were treated with IMRT according to their institutional practice; RT details have been reported previously [16]. Common concurrent chemotherapy regimens were cisplatin 30–40 mg/m² weekly, cisplatin 100 mg/m² three-weekly, or, for selected patients, carboplatin. The use of concurrent chemotherapy was based on patient factors, the clinician's discretion, and the individual center's protocol. Fiberoptic nasopharyngoscopy was performed 6–16 weeks after RT completion, and subsequent follow-up schedules followed institutional policies. This study was approved by the institutional review boards of all participating institutions.
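The selection rules above can be expressed as a single predicate over patient records. The sketch below is illustrative only: the `PatientRecord` fields and the `is_eligible` helper are hypothetical and do not reflect the actual structure of the HKNPCSG 1301 database.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical record layout; field names are illustrative, not the HKNPCSG 1301 schema."""
    ajcc7_stage: str                     # overall stage, AJCC/UICC 7th edition, e.g. "II"
    who_type: str                        # WHO histological type: "I", "II" or "III"
    age: int                             # age in years
    treatment: str                       # "IMRT" (RT alone) or "CCRT"
    treatment_naive: bool                # no prior anticancer treatment
    had_staging_mri: bool                # staging MRI of nasopharynx and neck available
    neoadjuvant_or_adjuvant_chemo: bool  # any induction or adjuvant chemotherapy


def is_eligible(p: PatientRecord) -> bool:
    """True if the record meets every inclusion criterion and no exclusion criterion."""
    included = (
        p.ajcc7_stage == "II"
        and p.treatment_naive
        and p.who_type in {"II", "III"}
        and p.age <= 70
        and p.treatment in {"IMRT", "CCRT"}
    )
    excluded = (not p.had_staging_mri) or p.neoadjuvant_or_adjuvant_chemo
    return included and not excluded


# Usage: filter a list of (hypothetical) records down to the study cohort.
records = [
    PatientRecord("II", "III", 52, "CCRT", True, True, False),  # eligible
    PatientRecord("II", "I", 48, "IMRT", True, True, False),    # WHO type I
    PatientRecord("II", "II", 74, "IMRT", True, True, False),   # older than 70
    PatientRecord("II", "III", 60, "CCRT", True, True, True),   # received adjuvant chemo
]
cohort = [p for p in records if is_eligible(p)]
```

Keeping inclusion and exclusion as two separate boolean expressions mirrors the way the criteria are stated and makes it easy to report how many records each rule removes.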
The nomogram under study was published by Sun et al. [14]. For each patient, we assigned nomogram points for four clinical and treatment factors (age, T category, N category, and treatment group) and summed them to obtain the total points. The predicted probability of OS was then read from the nomogram, and the observed OS was derived from survival analyses of our study cohort.
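The scoring step can be sketched as a points lookup followed by interpolation along the nomogram's probability axis. Every number below is a hypothetical placeholder, not a value from Sun et al.: the factor-level points, the two-level age coding, and the points-to-probability axis (`FACTOR_POINTS`, `POINTS_TO_OS5`) are assumptions for illustration only, and the real values would have to be read off the published nomogram.

```python
from bisect import bisect_right

# HYPOTHETICAL placeholder points per factor level. These are NOT the values
# published by Sun et al.; the two-level age coding is also an assumption.
FACTOR_POINTS = {
    "age": {"<=45": 0.0, ">45": 100.0},
    "t": {"T1": 0.0, "T2": 30.0},
    "n": {"N0": 0.0, "N1": 40.0},
    "treatment": {"CCRT": 0.0, "RT alone": 60.0},
}

# HYPOTHETICAL total-points axis -> predicted 5-year OS (monotone decreasing).
POINTS_TO_OS5 = [(0.0, 0.98), (100.0, 0.93), (200.0, 0.85), (300.0, 0.72)]


def total_points(age_group: str, t_cat: str, n_cat: str, treatment: str) -> float:
    """Sum the points contributed by each factor level."""
    return (FACTOR_POINTS["age"][age_group] + FACTOR_POINTS["t"][t_cat]
            + FACTOR_POINTS["n"][n_cat] + FACTOR_POINTS["treatment"][treatment])


def predicted_os5(points: float) -> float:
    """Read predicted 5-year OS off the axis by linear interpolation,
    clamping to the ends of the axis."""
    xs = [x for x, _ in POINTS_TO_OS5]
    ys = [y for _, y in POINTS_TO_OS5]
    if points <= xs[0]:
        return ys[0]
    if points >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, points)  # xs[k - 1] <= points < xs[k]
    x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
    return y0 + (y1 - y0) * (points - x0) / (x1 - x0)


# Usage: hypothetical CCRT benefit for one patient, other factors held fixed.
rt_alone = predicted_os5(total_points(">45", "T2", "N1", "RT alone"))
ccrt = predicted_os5(total_points(">45", "T2", "N1", "CCRT"))
estimated_benefit = ccrt - rt_alone
```

Under this design, the nomogram-estimated benefit of CCRT is simply the difference in predicted OS between the two treatment levels for an otherwise identical patient.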

Statistical Analysis
The primary endpoint was OS, defined as the time from the start of RT to death from any cause, with patients censored at the date of last follow-up. Baseline characteristics were compared using the chi-square test for categorical variables and the t-test (or the Mann-Whitney U test, where appropriate) for continuous variables. Survival was estimated with the Kaplan-Meier method, and the log-rank test was used to compare survival between treatment groups and stages. The Cox proportional hazards model was used for univariable and multivariable analyses and to estimate adjusted hazard ratios with 95% confidence intervals. A p value < 0.05 was considered statistically significant. The total nomogram points were calculated for each patient to predict 5-year OS. Discrimination and calibration were evaluated using Harrell's concordance index (C-index) and a calibration plot, with 1000 bootstrap resamples. To evaluate accuracy, the calibration plot was constructed by grouping patients according to their total points and the corresponding nomogram-predicted 5-year OS probabilities and comparing these with the observed Kaplan-Meier 5-year OS; with perfect calibration, all points lie on the diagonal line. The C-index and calibration plot were calculated with Stata (version 16.1, StataCorp, College Station, TX, USA) and R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria). Other statistical analyses were performed using SPSS software (version 21, SPSS Inc., Chicago, IL, USA).
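Harrell's C-index and a bootstrap confidence interval for it can be illustrated in plain Python. This is a minimal sketch, not the Stata or R routines used in the study: it assumes a percentile bootstrap over patients, scores ties in predicted risk as 0.5, and takes a hypothetical `risk` input such as 1 minus the nomogram-predicted 5-year OS (or the total points), where a higher value means a worse prognosis.

```python
import random


def harrell_c(time, event, risk):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when patient i died (event[i] == 1) before
    patient j's last observed time. The pair is concordant when i also has
    the higher predicted risk; ties in predicted risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_c_ci(time, event, risk, n_boot=1000, alpha=0.05, seed=2021):
    """Percentile bootstrap CI for the C-index (patients resampled with replacement).

    Assumption: the percentile method; the study does not state which
    bootstrap interval was used.
    """
    rng = random.Random(seed)
    n = len(time)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(harrell_c([time[k] for k in idx],
                                   [event[k] for k in idx],
                                   [risk[k] for k in idx]))
        except ValueError:
            continue  # skip resamples that contain no comparable pairs
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

A C-index of 0.5 corresponds to random ordering of patients and 1.0 to perfect ordering, which is why a value of 0.55 is read as weak discrimination. The O(n²) pair loop is adequate for a cohort of a few hundred patients.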

Results
A total of 589 patients treated between 2001 and 2011 were identified from the HKNPCSG 1301 study. Of these, 482 patients were included in this study based on the inclusion and exclusion criteria (Supplementary Table S1). Baseline characteristics are shown in Table 1. Of the 482 patients, 447 (92.7%) had nonkeratinizing undifferentiated carcinoma.
Compared with the RT alone group, patients in the CCRT group tended to be younger (p = 0.02) and were more likely to have N1 rather than N0 disease (p < 0.01) and a more advanced TN category (p < 0.01), whereas sex and T category distributions were similar. The median follow-up duration was 86 months. The mean total nomogram points were 198.53 (standard deviation, 58.92); the RT alone group had significantly higher total points than the CCRT group (230.20 vs. 143.47; p < 0.01).
A total of 79 deaths occurred during the study period: 28 in the CCRT group and 51 in the RT alone group. The 5-year and 8-year OS of the overall cohort were 89.0% and 84.1%, respectively. The 5-year and 8-year OS were 90.8% and 84.9% in the CCRT group and 88.4% and 83.7% in the RT alone group, respectively. The log-rank test showed no significant difference between the two groups (χ² = 0.02; p = 0.90) (Table 1; Figure 1b). The clinical factors included in the nomogram and other relevant factors were analyzed for their association with OS. In the univariable analysis, older age was a poor prognostic factor for OS (p < 0.01); patients aged > 45 years had shorter OS (hazard ratio (HR), 1.97; 95% confidence interval (CI), 1.13–3.40; p = 0.02). The T category, N category, and TN category were not significant predictors of OS (Table 2). In the multivariable analysis, age > 45 years remained the only independent predictor of OS (HR, 1.98; 95% CI, 1.15–3.44; p = 0.02). None of the other factors in the nomogram were significant (Table 3). Figure 1 shows the OS curves by factor category.
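The Kaplan-Meier estimate at a fixed time point and the two-group log-rank test reported above can be sketched as follows. This is an illustrative implementation, not the SPSS procedure used in the study; the helper names are hypothetical, `time` is assumed to be in months (so 5-year OS corresponds to `t_star=60`), and patients censored at a death time are counted as still at risk at that time.

```python
import math


def km_survival_at(time, event, t_star):
    """Kaplan-Meier estimate of S(t_star) from right-censored data
    (event = 1 for death, 0 for censored)."""
    s = 1.0
    death_times = sorted({t for t, e in zip(time, event) if e and t <= t_star})
    for t in death_times:
        at_risk = sum(1 for u in time if u >= t)
        deaths = sum(1 for u, e in zip(time, event) if e and u == t)
        s *= 1.0 - deaths / at_risk
    return s


def logrank_test(time, event, group):
    """Two-group log-rank test; group is 0/1. Returns (chi-square, p) with 1 df."""
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    var = 0.0        # hypergeometric variance, summed over death times
    for t in sorted({u for u, e in zip(time, event) if e}):
        n = sum(1 for u in time if u >= t)
        n1 = sum(1 for u, g in zip(time, group) if u >= t and g == 1)
        d = sum(1 for u, e in zip(time, event) if e and u == t)
        d1 = sum(1 for u, e, g in zip(time, event, group) if e and u == t and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

The p value uses the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail equals `erfc(sqrt(x / 2))` and no statistics library is needed.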

Discrimination and Accuracy
The C-index of the study nomogram was 0.55 (95% CI, 0.49–0.62). A calibration plot (Figure 2) was constructed by plotting the observed 5-year OS probability against the nomogram-predicted 5-year OS. The points deviated from the diagonal line, with a calibration slope of 0.27 (95% CI, 0.23–0.32) and an intercept of −1.51 (95% CI, −1.64 to −1.38).
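The grouping behind the calibration plot described in the Statistical Analysis section can be sketched as below. The number of groups, equal-size grouping, and the helper names are assumptions; the source does not state how many groups were used or how the reported calibration slope and intercept were estimated, so this sketch stops at the grouped predicted-vs-observed comparison. Sorting by predicted OS is used here as a stand-in for sorting by total points, since the predicted probability is a monotone function of the points.

```python
def km_survival_at(time, event, t_star):
    """Kaplan-Meier estimate of S(t_star); event = 1 for death, 0 for censored."""
    s = 1.0
    for t in sorted({u for u, e in zip(time, event) if e and u <= t_star}):
        at_risk = sum(1 for u in time if u >= t)
        deaths = sum(1 for u, e in zip(time, event) if e and u == t)
        s *= 1.0 - deaths / at_risk
    return s


def calibration_table(pred_os5, time, event, n_groups=4, t_star=60.0):
    """Group patients by predicted 5-year OS and pair each group's mean
    prediction with its observed Kaplan-Meier 5-year OS.

    Assumptions (not from the study): equal-size groups after sorting by
    prediction, and time measured in months.
    """
    order = sorted(range(len(pred_os5)), key=lambda i: pred_os5[i])
    size = len(order) / n_groups
    rows = []
    for g in range(n_groups):
        idx = order[round(g * size): round((g + 1) * size)]
        mean_pred = sum(pred_os5[i] for i in idx) / len(idx)
        observed = km_survival_at([time[i] for i in idx],
                                  [event[i] for i in idx], t_star)
        rows.append((mean_pred, observed))
    return rows
```

Each `(mean predicted, observed)` pair is one point on the calibration plot; points close to the diagonal indicate that predicted and observed 5-year OS agree for that group.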

Discussion
We retrieved data from our territory-wide Hong Kong electronic health database to externally validate a recently published nomogram that predicts 5-year OS after curative treatment in patients with stage II NPC. Our results did not support the use of this nomogram to predict OS or to guide clinical decisions on the use of CCRT in the modern IMRT era.
We reviewed patients selected with inclusion criteria similar to those of the original cohorts used for nomogram construction and validation. Our sample size (N = 482) was comparable to those of the original cohorts (N = 199 and 306 for internal and external validation, respectively [14]). Only younger patients (i.e., aged ≤ 70 years) were selected, as CCRT, especially a platinum-based regimen, is generally considered less well tolerated in older patients. Major trials of chemoradiation have also enrolled only younger patients [18–21]. Our analysis showed that age was the only significant predictor of OS; neither T category, N category, TN category, nor concurrent chemotherapy predicted OS. The concordance index of the nomogram indicated a lack of clinically meaningful discriminative power. The calibration plot also did not support prediction accuracy, raising concerns about overestimation and overfitted risk estimates.
A phase III RCT demonstrated the survival benefit of CCRT in stage II NPC treated with conventional 2D RT and staged according to the Chinese 1992 staging system [15]. The nomogram evaluated in this paper was derived from the patient cohort of that trial after excluding 31 patients who would be classified as having stage III disease under the seventh edition of the AJCC/UICC TNM staging system [14]. Subsequently, IMRT brought a substantial improvement in RT technique, further improving local control and survival [22–24]. Whether concurrent chemotherapy confers an additional survival benefit in stage II NPC in the IMRT era has been much debated. While Luo et al. found improved survival with CCRT compared with IMRT alone in their cohort with predominantly WHO type II histology [25], other phase II trials [26], retrospective cohorts [7,27–31], and meta-analyses [6,32] have failed to replicate a benefit in survival or progression-free survival in the IMRT era. In addition, CCRT has been shown to increase toxicities, including grade 3 or 4 neutropenia [6,28], mucositis [15], nausea/vomiting [15], and weight loss [28]. To our knowledge, no prospective phase III RCT has yet been published in the IMRT era to confirm the findings of these retrospective analyses.
Stage II NPC has the caveat of being a heterogeneous group that includes patients with and without lymph node metastasis. Clinicians are generally more inclined to offer concurrent chemotherapy to patients with T2N1 and/or T1N1 disease, and studies have suggested that the T2N1 subgroup has poorer survival outcomes [25,29,33]. The CSCO/ASCO guideline published in late 2020 suggested considering RT alone for T2N0 patients and CCRT for N1 patients, particularly those with T2N1 disease (eighth edition AJCC). In our cohort, OS did not differ significantly between TN categories, and neither the T category nor the N category was a useful parameter for making clinical decisions on CCRT. Furthermore, no significant interaction was found between the treatment group and either the TN category or the N category in predicting OS. Our results do not agree with those of other retrospective series [11,29,31,33], which found that node positivity predicts worse survival. Although our cohort is one of the largest, the series differ notably in the proportion of patients in each treatment group and in length of follow-up. Moreover, as in other similar retrospective series, the imbalance in baseline characteristics across N categories, TN categories, and treatment groups is an important source of confounding, and prospective studies are needed to confirm our findings. Other studies have evaluated additional parameters, such as Epstein-Barr virus (EBV) DNA [34–36] and gross tumor volume [37], to refine risk stratification for patients with stage II NPC [36]. The results of a phase III noninferiority trial comparing CCRT and RT alone in intermediate-risk NPC with low EBV DNA copy numbers in the IMRT era (ClinicalTrials.gov identifier: NCT02135042) are eagerly awaited to guide treatment decisions.
Staging in NPC has evolved over the past decades. The differences between the staging systems used in the training cohort of Sun et al. (Chinese 1992 staging [38], restaged to the seventh edition AJCC), their validation cohort (seventh edition AJCC), and the current cohort (seventh edition AJCC) should be carefully considered. The major difference between the two systems lies in the N category. First, patients with bilateral upper neck lymph nodes (LNs) were restaged to N2 under the seventh edition AJCC and were excluded from the analyses. Second, patients with lower neck LNs or an LN measuring 4–6 cm were classified as N2 under the Chinese 1992 staging but as N1 under the seventh edition AJCC. Extra caution should be exercised when applying the nomogram to these populations.
Nomograms are valuable tools for estimating individualized risk based on patient, disease, and treatment factors, but careful appraisal and application are vital. Many nomograms published in oncology have been validated only internally, or externally in patients from the same institution. Issues including over-interpretation, overfitting, and generalizability across populations must be addressed before clinical use [39]. Moreover, rapid advances in oncological treatment can render previously validated nomograms inaccurate in the modern era. External validation at other institutions is therefore needed to ensure the accuracy and reliability of such decision tools.
A major limitation of this study is that our median follow-up of 86 months was shorter than the 120 months in the original trial. Because of this data limitation, we did not evaluate the 10-year predictions of the nomogram. However, the survival curves had already plateaued in our data, and we believe that any influence of the clinical predictors under study on survival would have become apparent within the study period. Despite this limitation, our study has several strengths. We analyzed a reasonably large and homogeneous cohort similar to those of Chen et al. [15] and Sun et al. [14] in terms of patient selection and staging. Moreover, this territory-wide study covered more than 90% of secondary and tertiary care in Hong Kong, so incomplete medical information and loss to follow-up were of minor concern. Our cohort and outcomes are therefore representative of the Hong Kong population in real-world practice.

Conclusions
The nomogram under study lacked predictive discrimination and accuracy in the modern IMRT era and should be used with caution. The benefit of concurrent chemoradiation vs. radiotherapy alone among patients with stage II NPC remains controversial. Future research is needed to identify subgroups of patients with stage II NPC who may benefit from CCRT.

Data Availability Statement:
Due to the nature of this research, participants of this study did not consent to their data being shared publicly, so supporting data are not available.

Conflicts of Interest:
The authors declare no conflict of interest.