MRI Radiomics for Predicting Survival in Patients with Locally Advanced Hypopharyngeal Cancer Treated with Concurrent Chemoradiotherapy

Simple Summary
MRI radiomic models outperformed traditional clinical parameters in predicting survival in patients with hypopharyngeal cancer who had undergone concurrent chemoradiotherapy. By combining the identified radiomic signature with independent traditional clinical variables, we devised new nomograms that predicted survival outcomes in this patient group.
Abstract
A reliable prognostic stratification of patients with locally advanced hypopharyngeal cancer treated with concurrent chemoradiotherapy (CCRT) is crucial for informing tailored management strategies. The purpose of this retrospective study was to develop robust and objective magnetic resonance imaging (MRI) radiomics-based models for predicting overall survival (OS) and progression-free survival (PFS) in this patient population. The study participants included 198 patients (median age: 52.25 years [interquartile range = 46.88–59.53 years]; 95.96% men), who were randomly divided into a training cohort (n = 132) and a testing cohort (n = 66). Radiomic features were extracted from post-contrast T1-weighted MR images. Features for model construction were selected in the training cohort using least absolute shrinkage and selection operator (LASSO)–Cox regression models. Prognostic performance was assessed by calculating the integrated area under the time-dependent receiver operating characteristic curve (iAUC). The ability of the radiomic models to predict OS (iAUC = 0.580, 95% confidence interval (CI): 0.558–0.591) and PFS (iAUC = 0.625, 95% CI: 0.600–0.633) was validated in the testing cohort. Combining radiomic signatures with traditional clinical parameters outperformed clinical variables alone in predicting survival outcomes (observed iAUC increments = 0.279 [95% CI = 0.225–0.334] and 0.293 [95% CI = 0.232–0.351] for OS and PFS, respectively).
In summary, MRI radiomics has value for predicting survival outcomes in patients with hypopharyngeal cancer treated with CCRT, especially when combined with clinical prognostic variables.


Introduction
Hypopharyngeal cancer represents a distinct clinical entity; according to the most recent update of the Taiwan Cancer Registry, its crude incidence rate is 5.15 per 100,000 persons. Among head and neck malignancies, hypopharyngeal cancer continues to have unfavorable survival outcomes [1]. In addition, a large proportion of patients (70–85%) present with advanced-stage disease at diagnosis because early symptoms and signs are often occult [2,3]. While primary surgery remains a treatment option for advanced hypopharyngeal cancer, concurrent chemoradiotherapy (CCRT) has increasingly emerged as a non-surgical alternative that allows organ preservation [4]. Unfortunately, approximately 50% of patients with advanced hypopharyngeal cancer who receive non-surgical primary treatment ultimately experience disease recurrence [5], and the outcomes of salvage surgery are generally unsatisfactory [6,7]. In this scenario, reliable prognostic stratification of patients treated with primary CCRT is crucial for informing tailored management strategies.
By virtue of its excellent soft tissue contrast, magnetic resonance imaging (MRI) delineates head and neck tumors better than computed tomography (CT) and is commonly applied for head and neck cancer staging. Although radiomic features show growing potential as prognostic biomarkers in patients with head and neck malignancies [8][9][10][11][12][13], previous research has relied mainly on CT. These studies have described the robustness of CT radiomic features and their potential usefulness for predicting various clinical endpoints; however, CT radiomic features are intrinsically limited by the low soft tissue contrast of CT. It is therefore important to investigate whether MRI-based radiomics can assist prognostic stratification. Starting from these premises, the purpose of this retrospective study was to develop robust and objective MRI radiomics-based models for predicting overall survival (OS) and progression-free survival (PFS) in patients with locally advanced hypopharyngeal cancer who had undergone CCRT. By combining the identified radiomic signature with independent clinical variables, we devised new nomograms that outperformed traditional prediction models.

Materials and Methods
This retrospective study was approved by the Institutional Review Board of the Chang Gung Medical Foundation, Taiwan (reference number: 201901900B0) and received a waiver of patient consent. All procedures complied with the tenets outlined in the Declaration of Helsinki and the Good Clinical Practice Guidelines.

Study Patients
We retrospectively reviewed the clinical records of patients with newly diagnosed hypopharyngeal cancer who presented at the Chang Gung Memorial Hospital, Taoyuan, Taiwan, between August 2006 and September 2015. Inclusion criteria were as follows: (1) pathologically proven diagnosis of advanced hypopharyngeal cancer, (2) availability of pretreatment contrast-enhanced head and neck MRI, and (3) curative-intent treatment with primary CCRT according to the National Comprehensive Cancer Network guidelines [14]. Patients with non-squamous cell histology, second primary tumors or synchronous cancers, or metastatic disease were excluded. Demographic data (age and sex), tumor differentiation, and clinical stage information (T stage, N stage, and overall stage) were collected for all participants. Disease was staged according to the American Joint Committee on Cancer (AJCC) Staging Manual, Seventh Edition. A detailed description of the CCRT protocol is reported in Appendix A. For model training and validation, the study participants were randomly divided (2:1 ratio) into a training cohort and a testing cohort.

Follow-Up and Survival
All patients were followed up clinically with physical and pharyngoscopic examinations every 1–3 months during the first two years, every 3–6 months during the third year, and every 6–12 months thereafter. Imaging follow-up was performed with either CT or MRI every 3 months during the first post-treatment year, every 6 months during the subsequent three years, and annually thereafter. All imaging examinations were scheduled in advance and were generally performed in the week preceding the clinical follow-up visits. OS was defined as the interval between the date of initial pathologic diagnosis and the date of death or last follow-up. Patients who were lost to follow-up or were alive on the date of last follow-up were treated as censored observations. PFS was calculated as the interval between the date of initial pathologic diagnosis and the date of the first sign of progression, death, or last follow-up, whichever occurred first.
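The OS and PFS definitions above can be sketched as a small helper that turns dates into (time, event) pairs, the usual input for Cox and Kaplan-Meier analyses. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the use of days as the time unit, and passing a missing date as `None` are all hypothetical choices.

```python
from datetime import date
from typing import Optional, Tuple


def overall_survival(diagnosis: date, death: Optional[date],
                     last_follow_up: date) -> Tuple[int, int]:
    """Return (time in days, event flag) for OS.

    event = 1 if death was observed; otherwise the patient is censored
    (event = 0) at the date of last follow-up.
    """
    if death is not None:
        return (death - diagnosis).days, 1
    return (last_follow_up - diagnosis).days, 0


def progression_free_survival(diagnosis: date, progression: Optional[date],
                              death: Optional[date],
                              last_follow_up: date) -> Tuple[int, int]:
    """Return (time in days, event flag) for PFS.

    The event is the earlier of first progression or death; if neither
    occurred, the patient is censored at the date of last follow-up.
    """
    observed = [d for d in (progression, death) if d is not None]
    if observed:
        return (min(observed) - diagnosis).days, 1
    return (last_follow_up - diagnosis).days, 0


# Hypothetical patient: progression on 2010-07-01, alive at last follow-up.
print(overall_survival(date(2010, 1, 1), None, date(2011, 1, 1)))  # (365, 0)
print(progression_free_survival(date(2010, 1, 1), date(2010, 7, 1),
                                None, date(2011, 1, 1)))  # (181, 1)
```

Taking the earlier of progression and death encodes the "whichever occurred first" rule, so a patient who progresses and later dies contributes a single PFS event at progression.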
Using a slice-by-slice approach, all tumor volumes of interest were manually contoured on axial (transverse) sections using an open-source platform (ITK-SNAP, version 3.8.0; http://www.itksnap.org, accessed on 12 June 2019). All segmentations were performed by a senior head and neck radiologist (S.H.N.; 35 years of experience). To assess interobserver reproducibility, images from a randomly selected subset of patients (n = 30) in the training cohort were also segmented by an independent radiologist (T.Y.S.; 8 years of experience in head and neck imaging). Both radiologists were blinded to clinical information during tumor contouring. Intraclass correlation coefficients (ICCs) were used to quantify the interobserver reproducibility of the extracted radiomic features, with reproducibility defined as an ICC ≥ 0.75.
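The reproducibility criterion can be illustrated with a two-way random-effects, absolute-agreement, single-rater ICC (Shrout and Fleiss ICC(2,1)). The text does not state which ICC form was computed with the irr package, so this choice is an assumption; the function names and the data layout (one row per patient, one column per reader) are hypothetical.

```python
from typing import Dict, List, Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the feature value for subject i measured by rater j.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC undefined: no variance in the ratings")
    return (ms_rows - ms_error) / denom


def reproducible_features(feature_ratings: Dict[str, List[List[float]]],
                          threshold: float = 0.75) -> List[str]:
    """Keep features whose interobserver ICC reaches the threshold."""
    return [name for name, r in feature_ratings.items()
            if icc_2_1(r) >= threshold]


# Hypothetical two-reader measurements for two features on three patients.
ratings = {"a": [[1, 1], [2, 2], [3, 3]],   # identical readings
           "b": [[1, 2], [2, 3], [3, 4]]}   # reader 2 systematically higher
print(round(icc_2_1(ratings["b"]), 3))  # 0.667
print(reproducible_features(ratings))   # ['a']
```

Because this ICC form penalizes systematic offsets between readers, feature "b" fails the 0.75 threshold even though the two readers rank patients identically.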
Prior to feature extraction, a pre-processing pipeline was applied to fat-saturated gadolinium-enhanced T1-weighted MR images to normalize signal intensity and geometric variations. The detailed procedure is described in Appendix B. An open-source platform (PyRadiomics, version 3.0.1) was used for image pre-processing and radiomic feature extraction. Most of the features extracted with PyRadiomics conform to the definitions of the Image Biomarker Standardization Initiative [15].

Model Construction and Data Analysis
Model construction and data analysis were carried out in the R environment (version 3.6.3; http://www.r-project.org/, accessed on 6 March 2020). The R packages used in the study are listed in Appendix C.

Machine-Learning-Based Radiomic Model
Feature selection and model building were carried out in the training cohort, whereas model performance was examined in the testing cohort. The feature selection strategy included four steps: reproducibility assessment, redundancy reduction, univariate outcome analysis, and least absolute shrinkage and selection operator (LASSO)-Cox regression modeling. We first discarded all features with low reproducibility (i.e., ICC < 0.75) and then removed radiomic features showing a high degree of collinearity (i.e., Pearson's |r| > 0.9) using the findCorrelation function of the caret package in R. The retained features were subsequently entered into univariate Cox analysis to preselect significant (p < 0.05) prognostic factors. Finally, the LASSO-Cox regression model was applied in the training set to identify the strongest predictors. Depending on the regularization parameter (λ), LASSO shrinks the regression coefficients towards zero and removes irrelevant features by setting their coefficients exactly to zero. The optimal λ value was identified by ten-fold cross-validation using the minimum criterion. We then constructed a radiomic score (termed RadScore) for outcome prediction as a linear combination of the selected features weighted by their non-zero LASSO coefficients. The workflow used for radiomic model construction is summarized in Figure 1.
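Two of these steps, the redundancy filter and the RadScore, can be sketched in plain Python. The filter below follows the idea behind caret's findCorrelation (for a highly correlated pair, drop the feature with the larger mean absolute correlation), but it is a simplified greedy re-implementation, not caret's exact algorithm. LASSO-Cox fitting itself (done with glmnet in the study) is not reproduced; the feature names, values, and coefficients are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features: Dict[str, List[float]],
                    cutoff: float = 0.9) -> List[str]:
    """Greedy redundancy filter in the spirit of caret::findCorrelation.

    While some pair has |r| > cutoff, take the most correlated pair and drop
    the member with the larger mean absolute correlation to the other
    remaining features.
    """
    kept = list(features)

    def mean_abs_corr(name: str) -> float:
        others = [o for o in kept if o != name]
        return sum(abs(pearson(features[name], features[o]))
                   for o in others) / len(others)

    while True:
        worst = None  # (|r|, name_a, name_b) of the most correlated pair
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                r = abs(pearson(features[kept[i]], features[kept[j]]))
                if r > cutoff and (worst is None or r > worst[0]):
                    worst = (r, kept[i], kept[j])
        if worst is None:
            return kept
        _, a, b = worst
        kept.remove(a if mean_abs_corr(a) >= mean_abs_corr(b) else b)


def rad_score(values: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """RadScore: selected feature values weighted by their LASSO coefficients."""
    return sum(coef * values[name] for name, coef in coefficients.items())


# Hypothetical training-cohort features (one value per patient).
feats = {"f1": [1, 2, 3, 4, 5], "f2": [2, 4, 6, 8, 10.5], "f3": [5, 3, 4, 1, 2]}
print(drop_correlated(feats))  # ['f2', 'f3']
# Hypothetical non-zero coefficients; features absent from the dict are ignored.
print(rad_score({"f1": 2.0, "f2": -1.0, "f3": 5.0}, {"f1": 0.5, "f2": -0.25}))  # 1.25
```

Features whose LASSO coefficient is exactly zero simply do not appear in the coefficient dictionary, which is how shrinkage to zero translates into feature removal in the final score.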

Development of Clinical and Combined Radiomic-Clinical Models
Clinical characteristics, including age, sex, histologic grade, T stage, N stage, and overall clinical stage, were collected from medical records and entered into a multivariate Cox regression model, with the exception of histologic grade, which was excluded because of a large amount of missing data. Survival outcomes served as dependent variables. The combination of parameters with the lowest Akaike information criterion (AIC) value was selected to construct the clinical model. The radiomic-clinical model was then built by combining the RadScore with the variables selected in the clinical model, and the resulting estimates were plotted as nomograms. Calibration curves were used to illustrate the agreement between predicted and observed survival.
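The AIC-based choice of the clinical model reduces to computing AIC = 2k − 2 log L for each candidate covariate set and keeping the minimum. The sketch below assumes each candidate's maximized Cox partial log-likelihood is already available (in R it would come from a fitted coxph model) and that every covariate is a binary indicator contributing one coefficient. The log-likelihood values are invented for illustration and are not results from this study.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(candidates: Dict[Tuple[str, ...], float]) -> Tuple[str, ...]:
    """Return the variable set with the lowest AIC.

    candidates maps a tuple of binary covariates (one Cox coefficient each)
    to the maximized partial log-likelihood of the corresponding model.
    """
    return min(candidates, key=lambda vs: aic(candidates[vs], len(vs)))


# Hypothetical partial log-likelihoods, for illustration only.
candidates = {
    ("T4a",): -300.0,              # AIC = 602
    ("T4a", "T4b"): -299.5,        # AIC = 603
    ("T4a", "T4b", "N2c"): -296.0, # AIC = 598
}
print(select_by_aic(candidates))  # ('T4a', 'T4b', 'N2c')
```

Adding T4b alone raises the AIC because its small likelihood gain does not offset the penalty of an extra parameter, whereas adding N2c improves the fit enough to lower it.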

Statistics
The training cohort was dichotomized into high- and low-risk groups according to the median value of the predicted risk scores, and the same cutoffs were then applied to the testing cohort. Survival outcomes (OS and PFS) were compared between groups using the log-rank test. The prognostic performance of each model was assessed using the iAUC of the predicted risk. The iAUC integrates the area under the cumulative/dynamic time-dependent ROC curve over time, weighted by the probability density function of the time-to-event outcome [16]; higher iAUC values reflect better prognostic ability. Differences in iAUC between models were estimated from 1000 bootstrap replicates. All hypothesis tests were two-tailed, and statistical significance was defined as p < 0.05.
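The risk-group analysis can be sketched as a median cutoff learned on the training scores, followed by a two-group log-rank test. This is a plain-Python illustration, not the survival package's `survdiff`. Assigning scores strictly above the median to the high-risk group is an assumption, because the text does not state how ties at the cutoff were handled. The function names and toy data are hypothetical.

```python
import math
from statistics import median
from typing import List, Tuple


def split_by_median(train_scores: List[float],
                    scores: List[float]) -> Tuple[float, List[int]]:
    """Cutoff = median training risk score; 1 = high risk (score > cutoff)."""
    cutoff = median(train_scores)
    return cutoff, [1 if s > cutoff else 0 for s in scores]


def log_rank(times: List[float], events: List[int],
             groups: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test (groups coded 0/1); returns (chi2, p), 1 df."""
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # subjects at risk in group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events, group 1
        if n > 1:  # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# The training median is reused unchanged on the testing cohort.
cutoff, test_groups = split_by_median([0.1, 0.5, 0.9], [0.2, 0.6])
print(cutoff, test_groups)  # 0.5 [0, 1]
chi2, p = log_rank([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(round(chi2, 4))  # 2.8824
```

Reusing the training median on the testing cohort, rather than recomputing it, keeps the testing-cohort comparison free of information from the testing data.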

MRI Radiomic Models
Of the 1223 radiomic features extracted from each patient, 858 were reproducible (ICC ≥ 0.75), and 702 were retained for further analysis after redundant features were removed. Univariate analysis identified 125 and 131 features significantly associated with OS and PFS, respectively. After LASSO selection, four features (denoted f1–f4) and nine features (denoted f′1–f′9) were retained for the prediction of OS and PFS, respectively (Table 2; Table 3). The ability of the resulting radiomic scores to predict OS (model 1a: iAUC = 0.580; 95% CI = 0.558–0.591) and PFS (model 2a: iAUC = 0.625; 95% CI = 0.600–0.633) was validated in the testing cohort (Table 4). Using the median values of RadScore_OS (0.50) and RadScore_PFS (−3.28) as cutoffs, the radiomic models dichotomized patients in the training and testing cohorts into low- and high-risk groups (OS in the training cohort, Figure 4A, p = 0.009; OS in the testing cohort, Figure 4B, p = 0.004; PFS in the training cohort, Figure 5A, p < 0.001; PFS in the testing cohort, Figure 5B, p = 0.003; all log-rank tests).
Abbreviations: glszm, gray-level size zone matrix; glcm, gray-level co-occurrence matrix; ngtdm, neighboring gray tone difference matrix. In wavelet filter names (e.g., LHH, LHL, and LLL), each letter denotes the high-pass (H) or low-pass (L) filter applied along the x, y, and z dimensions, respectively.

Clinical Models
According to the lowest AIC values, the optimal clinical models for predicting both OS and PFS included T4a stage, T4b stage, and N2c stage. In the testing cohort, multivariate Cox proportional hazards analysis revealed no independent associations of T4a or T4b stage with either OS or PFS (Table 3). In contrast, N2c stage was significantly associated with OS (HR = 2.54 [95% CI = 1.21–5.33], p = 0.01) and showed a marginally significant association with PFS (HR = 1.93 [95% CI = 0.96–3.90], p = 0.07) (Table 3). These findings should nevertheless be interpreted with caution given the limited number of patients in the testing cohort. In the testing cohort, the clinical models showed poor ability to predict both OS (model 1b: iAUC = 0.392 [95% CI = 0.322–0.447]) and PFS (model 2b: iAUC = 0.381 [95% CI = 0.308–0.433]; Table 4), with iAUC values below 0.5. When applied to the training cohort, the clinical models revealed a marginally significant difference in OS (p = 0.06, log-rank test; Figure 4C) and a statistically significant difference in PFS (p = 0.04, log-rank test; Figure 5C) between high- and low-risk patients. However, no significant differences were observed in the testing cohort (OS: p = 0.27, Figure 4D; PFS: p = 0.18, Figure 5D; log-rank tests).

Comparison of Model Performances
Differences in iAUC between the predictive models are reported in Table 5.
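The bootstrap comparison of two models can be sketched by resampling patients with replacement and recomputing the performance difference in each replicate. This sketch swaps in Harrell's concordance index as a simpler stand-in for the iAUC, which the study computed with the survAUC package and which is not reproduced here. The 95% percentile interval, the seed, and the function names are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def harrell_c(times: Sequence[float], events: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index: higher risk should mean earlier events."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an event
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def bootstrap_c_difference(times, events, risk_a, risk_b,
                           n_boot: int = 1000, seed: int = 0
                           ) -> Tuple[float, Tuple[float, float]]:
    """Observed C(a) - C(b) and a 95% percentile bootstrap interval."""
    rng = random.Random(seed)
    n = len(times)
    observed = harrell_c(times, events, risk_a) - harrell_c(times, events, risk_b)
    diffs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients
        t = [times[i] for i in idx]
        e = [events[i] for i in idx]
        try:
            diffs.append(harrell_c(t, e, [risk_a[i] for i in idx])
                         - harrell_c(t, e, [risk_b[i] for i in idx]))
        except ZeroDivisionError:
            continue  # resample without any comparable pair
    if not diffs:
        raise ValueError("no usable bootstrap resamples")
    diffs.sort()
    lo = diffs[int(0.025 * (len(diffs) - 1))]
    hi = diffs[math.ceil(0.975 * (len(diffs) - 1))]
    return observed, (lo, hi)


# Toy data: model a ranks patients perfectly, model b in reverse.
times, events = [1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1]
print(bootstrap_c_difference(times, events, [6, 5, 4, 3, 2, 1], [1, 2, 3, 4, 5, 6]))
# (1.0, (1.0, 1.0))
```

Resampling whole patients keeps both models' predictions paired within each replicate, so the interval reflects the difference between models rather than two independent uncertainties.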

Construction of Nomograms from Radiomic-Clinical Models
To provide visual tools for predicting both OS and PFS, nomograms combining clinical factors and radiomic signatures were constructed (Figure 6A,B). Calibration curves (Figure 6C,D) showed good agreement between predicted and observed survival at 2 and 3 years for both OS and PFS. However, the observed outcomes deviated slightly from the predicted curves during the first year of follow-up.
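A calibration curve pairs, within groups of patients, the mean predicted survival at a time point with the Kaplan-Meier estimate observed in that group. The sketch below is a simple grouped version of that idea, assuming predicted survival probabilities at time t are already available from the nomogram. It does not reproduce the rms package implementation used in the study, and the grouping scheme and function names are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int], t: float) -> float:
    """Kaplan-Meier estimate of the survival probability S(t)."""
    s = 1.0
    for u in sorted({ti for ti, e in zip(times, events) if e and ti <= t}):
        n_at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, e in zip(times, events) if ti == u and e)
        s *= 1.0 - deaths / n_at_risk
    return s


def calibration_points(pred_surv: List[float], times: List[float],
                       events: List[int], t: float,
                       n_groups: int = 3) -> List[Tuple[float, float]]:
    """(mean predicted S(t), Kaplan-Meier observed S(t)) per risk group.

    Patients are sorted by predicted survival at time t and cut into
    n_groups groups of near-equal size (n_groups must not exceed the
    number of patients).
    """
    order = sorted(range(len(pred_surv)), key=lambda i: pred_surv[i])
    points = []
    for g in range(n_groups):
        idx = order[g * len(order) // n_groups:(g + 1) * len(order) // n_groups]
        mean_pred = sum(pred_surv[i] for i in idx) / len(idx)
        observed = kaplan_meier([times[i] for i in idx],
                                [events[i] for i in idx], t)
        points.append((mean_pred, observed))
    return points


print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 0], 3))  # 0.375
```

Points lying on the diagonal indicate good calibration at time t. Repeating the computation at 1, 2, and 3 years would show where predicted and observed survival diverge.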

Discussion
Using a combination of MRI radiomic signatures and clinical parameters, we devised and validated prognostic models that predicted OS and PFS in patients with hypopharyngeal cancer who had undergone CCRT. Using the LASSO-Cox machine learning algorithm, we found a total of 13 radiomic features extracted from fat-saturated post-contrast T1-weighted MR images to be associated with survival outcomes. Notably, adding radiomic features improved the predictive capacity of the clinical models, and the combined radiomic-clinical models showed the highest ability to predict both OS and PFS. Our prediction tools may therefore support prognostic assessment in clinical practice, and our nomograms have the potential to help tailor treatment at the individual level.
Our study confirms and expands previous data on the prognostic utility of radiomic features in patients with hypopharyngeal cancer [17]. However, prior studies were conducted in samples that were heterogeneous in terms of disease stage, and most participants had been treated with surgical excision [17]. In the current investigation, we specifically focused on patients with advanced-stage hypopharyngeal cancer who had undergone CCRT. A strength of our study therefore lies in its survival prediction tailored to this specific subgroup.
Three first-order and seven second-order MRI radiomic features showed the highest discriminative power for prognostic purposes. Because they were extracted from post-contrast T1-weighted images, the prognostic features identified in our study reflect the extent of contrast enhancement and are therefore likely related to tumor vascularity. There is ample evidence that angiogenesis has adverse prognostic significance in several solid malignancies [18], including head and neck cancer [19][20][21]. A prior radiomic study of patients with hypopharyngeal cancer treated with chemoradiation demonstrated that two first-order features derived from post-contrast CT images (wavelet-LLH_firstorder_Maximum and wavelet-HLL_firstorder_Median) were independently associated with PFS [22]. Another study reported that two features extracted from post-contrast CT images, wavelet-LHL_firstorder_Maximum and wavelet-LHL_firstorder_Kurtosis, successfully predicted PFS in patients with locally advanced hypopharyngeal cancer who had undergone induction chemotherapy [23]. Finally, Li et al. [24] developed a CT radiomic signature based on first-order features (i.e., minimum, skewness, and total energy) for predicting early recurrence of hypopharyngeal cancer in the preoperative setting. Second-order radiomic features, also termed texture features, reflect the statistical relationships between gray levels within an image and serve as a proxy for intratumor heterogeneity. Aerts et al. [25] previously showed that, among different CT radiomic features, those related to tumor heterogeneity had the highest value for predicting survival in lung cancer and head and neck cancer. This signature was subsequently validated in an independent cohort of patients with oropharyngeal squamous cell carcinoma, in which its prognostic significance was unaffected by the presence of CT artifacts [26].
Clinical decision-making in patients with malignancies is generally guided by the AJCC staging system. While TNM stage can be considered a suitable proxy of overall disease status, staging variables are not quantitative and might not accurately reflect underlying differences in tumor biology. In this scenario, radiomic markers have substantially expanded our capacity to characterize highly diverse phenotypic tumor characteristics [27], and they may therefore be expected to complement traditional TNM staging for prognostic purposes. A previous study showed that radiomic models can predict the risk of progression in hypopharyngeal cancer more effectively than clinical variables alone [22], an observation in line with our current data. However, a significant limitation that our investigation shares with prior studies is its retrospective design. Because our patients were staged according to the seventh edition of the AJCC manual, it remains unclear whether the predictive value of our combined radiomic-clinical models also applies to the eighth edition of the AJCC TNM Staging Manual [28].
When examining the prognostic value of the nomograms devised in our study, we found good agreement between predicted and observed 2- and 3-year OS and PFS; however, a slight deviation was evident for outcomes during the first year. Previous studies have shown that several clinical factors other than oncologic status may be associated with early treatment failure in patients with locally advanced head and neck cancer who have completed CCRT. These variables, which include comorbidities [29,30], poor performance status [29,31], low body mass index [29,31,32], anemia [29,30], malnutrition [29,31], and low total lymphocyte count [31], are associated with impaired immune defenses and increase patient vulnerability to infectious complications during treatment. The absence of these parameters from our nomograms may explain their limited ability to predict survival outcomes during the first year of follow-up.
Our study has several limitations that merit consideration. First, manual slice-by-slice segmentation made the extraction of MRI radiomic features labor-intensive, time-consuming, and prone to intra- and inter-observer variability [33,34]. Deep-learning-based automated segmentation may help address this limitation in future studies. Second, technical variability and image artifacts remain a concern in the field of radiomics. Previous studies have shown that image noise and texture may be affected by variations in MRI acquisition parameters [35][36][37][38]. Additionally, head and neck MRI is prone to swallowing-related motion artifacts. Collectively, these potential confounders may affect the prognostic value of the extracted radiomic features. Third, we focused solely on features extracted from post-contrast T1-weighted images, and other MRI sequences were not taken into account. Future research should include additional MRI sequences or multiple imaging modalities to examine a larger number of features. Fourth, the small sample size may have limited the power to detect significant associations; larger prospective cohort studies are therefore required. Finally, the single-center design might have limited the external validity of the results. The prognostic value of our tools needs to be independently tested in larger, longitudinal investigations.

Conclusions
In conclusion, combined radiomic-clinical models and the related nomograms predicted survival outcomes in patients with hypopharyngeal cancer treated with CCRT. Our results suggest that radiomic features extracted from MR images may improve the prognostic stratification provided by traditional clinical variables. Integrating clinical and radiomic signatures may help tailor treatment at the individual level.

Informed Consent Statement:
This retrospective study received a waiver of patient consent.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Concurrent Chemoradiotherapy Protocol
All patients received intensity-modulated radiotherapy using 6-MV photon beams. The initial prophylactic field, which covered the gross tumor with margins of at least 1 cm and the at-risk neck lymph nodes, received 46–56 Gy, followed by a cone-down boost to the gross tumor area up to 72 Gy. Concurrent chemotherapy consisted of intravenous cisplatin (50 mg/m² on day 1) plus oral tegafur (800 mg/day) and leucovorin (60 mg/day) from day 1 to day 14. This regimen was administered every 14 days.

Appendix B. Image Pre-Processing and Radiomic Features Extraction
Gadolinium-enhanced T1-weighted MRIs in Digital Imaging and Communications in Medicine (DICOM) format were exported to a local console through a picture archiving and communication system (GE Centricity RA1000, GE Healthcare, Barrington, IL, USA). These images were converted to the Neuroimaging Informatics Technology Initiative (NIfTI) format for tumor segmentation and image pre-processing. A pre-processing pipeline was applied to the fat-saturated gadolinium-enhanced T1-weighted MRI to normalize signal intensity and geometric variations. Low-frequency intensity nonuniformity was first removed using the N4 bias field correction function in SimpleITK (version 2.0.2; https://simpleitk.org). The MRI signal intensity was then normalized by centering the image at its mean and scaling by its standard deviation. Outlier voxels, defined as voxels whose signal intensity differed from the mean by more than 3 standard deviations, were removed from the analysis. Normalized MRIs were then isotropically resampled using B-spline interpolation to a voxel size of 1 mm × 1 mm × 1 mm. A fixed bin number of 64 was used for gray-level discretization. Radiomic features were extracted from the original image and from filtered images, including Laplacian of Gaussian (LoG) filtered images (σ = 0.5, 1.0, 1.5, and 2.0 mm) and wavelet filtered images (LLL, LLH, LHL, LHH, HLL, HLH, HHL, and HHH, with a Coiflet 1 mother wavelet). A total of 1223 radiomic features were extracted, characterizing tumor shape (14 features), first-order metrics (18 features), texture patterns (75 features), LoG features (372 = 93 × 4 features), and wavelet features (744 = 93 × 8 features). Image pre-processing and radiomic feature extraction were performed using the PyRadiomics platform (version 3.0.1; https://www.radiomics.io/pyradiomics.html) implemented in Python version 3.7.4.
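The intensity steps above (z-score normalization, exclusion of values beyond 3 standard deviations, and discretization into a fixed number of 64 bins) can be sketched on a flat list of voxel intensities. The sketch also checks the feature-count arithmetic. It is an illustration of the described steps, not PyRadiomics internals. Using the population standard deviation and taking the bin range from the values passed in are assumptions. N4 bias correction and B-spline resampling, done with SimpleITK in the study, are not reproduced.

```python
import math
from typing import List

# Feature-count bookkeeping from the text: 18 first-order + 75 texture features
# are computed on the original image and on every filtered image (4 LoG sigmas,
# 8 wavelet decompositions); the 14 shape features are computed once.
FEATURES_PER_IMAGE = 18 + 75
TOTAL_FEATURES = 14 + FEATURES_PER_IMAGE + FEATURES_PER_IMAGE * 4 + FEATURES_PER_IMAGE * 8


def zscore(values: List[float]) -> List[float]:
    """Center at the mean and scale by the (population) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def drop_outliers(z: List[float], n_sd: float = 3.0) -> List[float]:
    """Exclude normalized values lying more than n_sd SDs from the mean."""
    return [v for v in z if abs(v) <= n_sd]


def discretize_fixed_bin_number(values: List[float], n_bins: int = 64) -> List[int]:
    """Fixed bin number discretization: map values to gray levels 1..n_bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)
    # The maximum value would land in bin n_bins + 1, so it is clamped to n_bins.
    return [min(math.floor(n_bins * (v - lo) / (hi - lo)) + 1, n_bins)
            for v in values]


print(TOTAL_FEATURES)                               # 1223
print(discretize_fixed_bin_number([0.0, 0.5, 1.0]))  # [1, 33, 64]
```

A fixed bin number fixes the count of gray levels regardless of each tumor's intensity range, which suits z-scored MR intensities that lack an absolute scale.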

Appendix C. Statistical Analysis in R
The following R packages were used in this study: (1) "stats" package for the Student t test and chi-square tests, (2) "irr" package for calculating ICC, (3) "survival" package for building Cox proportional hazards model and Kaplan-Meier analysis, (4) "glmnet" package for least absolute shrinkage and selection operator (LASSO)-Cox analysis, (5) "survAUC" package for calculating integrated area under the time-dependent receiver operating characteristic (ROC) curve (iAUC), (6) "boot" and "boot.pval" packages for performing bootstrapping estimates, and (7) "rms" package for nomograms and calibration curves. Survival curves were plotted using the ggplot2 package for R.