Radiomics Nomogram: Prediction of 2-Year Disease-Free Survival in Young Age Breast Cancer

Simple Summary
Breast cancer in young women under 40 years of age has a poor prognosis, and its treatment is complicated by premenopausal status and the need for fertility preservation. Early prediction of the prognosis of young age breast cancer would help in planning treatment and postoperative surveillance. In this study, a radiomics-based nomogram for predicting recurrence showed good predictive ability, especially for 2-year disease-free survival after surgery. Several radiomics features presumed to be unique imaging features of young age breast cancer were also identified, reflecting tumor homogeneity and tumor sphericity. Because these radiomics features are quantitative parameters extracted through texture analysis of breast MRI, they may reflect tumor characteristics, such as the tumor microenvironment, more objectively. Furthermore, these results may serve as a basis for identifying the unique biology of young age breast cancer through multi-omics studies such as radiogenomics.
Abstract
This study aimed to predict early breast cancer recurrence in women under 40 years of age using a radiomics signature and clinicopathologic information. We retrospectively investigated 155 patients under 40 years of age with invasive breast cancer who underwent MRI and surgery. Through stratified random sampling, 111 patients were assigned to the training set and 44 to the validation set. Recurrence-associated factors were investigated based on recurrence within 5 years during the total follow-up period. A Rad-score was generated from texture analysis (3D slicer, ver. 4.8.0) of breast MRI using a least absolute shrinkage and selection operator Cox regression model. In Kaplan–Meier analysis, the Rad-score showed a significant association with disease-free survival (DFS) in both the training set (p = 0.003) and the validation set (p = 0.020). A nomogram was generated using Cox proportional hazards models, and its predictive ability was validated. The nomogram included the Rad-score and estrogen receptor (ER) negativity as predictive factors and showed fair DFS predictive ability in both the training and validation sets (C-index 0.63, 95% CI 0.45–0.79). In conclusion, the Rad-score can predict disease recurrence of invasive breast cancer in women under 40 years of age, and the Rad-score-based nomogram showed reasonably high DFS predictive ability, especially within 2 years of surgery.


Introduction
For decades, advances in breast cancer treatment and screening have contributed to reducing breast cancer mortality [1][2][3]. Despite these improvements, the incidence of breast cancer and cancer-associated mortality in Asia have increased rapidly, and in 2020 the incidence and mortality rates of breast cancer were the highest in Asia among the five continents [4,5]. Moreover, breast cancer in Asia affects a distinct population.

Materials and Methods
This retrospective study was approved by the Institutional Review Board of our institution. The requirement for informed consent was waived by the ethics committee due to the retrospective design. All assessments were carried out according to the Declaration of Helsinki of 1975, revised in 2013.
From January 2011 to February 2019, among 4451 patients who underwent breast cancer surgery at our institution, 320 consecutive female patients under 40 years of age diagnosed with invasive breast cancer were included. Because pretreatment breast MRI was used to create the radiomics signature, patients without pretreatment MRI (n = 21) or with incomplete MR data (n = 86) were excluded. To investigate DFS rates two years after surgery, patients with less than two years of follow-up (n = 58) were also excluded, leaving 155 patients. Patients who had not been diagnosed with recurrent breast cancer by the end of five years were censored at that time point.
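The five-year censoring rule can be expressed as administrative censoring of each patient's (time, event) pair. The sketch below is a minimal illustration, assuming each record stores the months from surgery to recurrence or last follow-up; the `Patient` class and `censor_at` helper are hypothetical and are not the study's code.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Patient:
    followup_months: float  # months from surgery to recurrence or last follow-up (hypothetical field)
    recurred: bool          # True if recurrence was observed at followup_months

def censor_at(patient: Patient, horizon: float = 60.0) -> Tuple[float, int]:
    """Return (time, event) with administrative censoring at `horizon` months.

    A recurrence observed after the horizon is not counted as an event;
    the patient is treated as event-free and censored at the horizon.
    """
    if patient.followup_months > horizon:
        return horizon, 0
    return patient.followup_months, int(patient.recurred)

# A recurrence at 72 months becomes a censored observation at 60 months.
print(censor_at(Patient(72, True)))
print(censor_at(Patient(13, True)))
```

This is how the three recurrences observed after 60 months enter the survival analysis as censored observations rather than events.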
Disease recurrence was defined as newly diagnosed breast cancer in the ipsilateral or contralateral breast, or axillary or distant metastasis. Recurrence was confirmed pathologically through biopsy; when biopsy was not feasible, recurrence was diagnosed using imaging modalities such as positron emission tomography and computed tomography.

MRI Protocol
All enrolled patients underwent dynamic contrast-enhanced MRI, including diffusion-weighted imaging, on 3-Tesla MRI systems (Verio, Siemens Healthcare, Erlangen, Germany; Ingenia, Philips Medical Systems, Best, The Netherlands). MRI was performed in the prone position using a dedicated bilateral breast surface coil. Details of each sequence are described in Appendix A.1.

Texture Analysis for Radiomics Feature Extraction
Two expert breast radiologists with 8 and 24 years of experience in breast MRI retrospectively reviewed the pretreatment breast MRI scans on a picture archiving and communication system (PACS) workstation monitor. The target lesion was defined as the largest enhancing lesion, regardless of whether it showed mass or non-mass enhancement. One radiologist segmented the target lesion three-dimensionally with a semiautomatic tool in an open-source software package (3D slicer, ver. 4.10.2; available at: https://slicer.org/, accessed on 1 June 2020) on early-phase contrast-enhanced T1-weighted subtraction images and apparent diffusion coefficient (ADC) maps. If the target lesion was not clearly delineated on the ADC map, the volume of interest segmented on the early-phase contrast-enhanced T1 subtraction images was applied to the ADC map and modified as needed. After segmentation, radiomics features were extracted using the open-source PyRadiomics software. A total of 107 features were extracted from each of the early-phase contrast-enhanced T1 subtraction images and the ADC maps.
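Radiomics features are computed only from voxels inside the segmented volume of interest (VOI). As a dependency-free illustration, and not PyRadiomics itself, the sketch below computes one first-order feature, entropy, from intensities inside a 3D mask after a simple fixed-bin-width discretization. The function name `masked_entropy` and the bin width of 25 are assumptions for illustration; the study's extraction settings are not reported here. Lower entropy corresponds to a more homogeneous tumor.

```python
import math
from collections import Counter
from typing import Sequence

def masked_entropy(image: Sequence[Sequence[Sequence[float]]],
                   mask: Sequence[Sequence[Sequence[int]]],
                   bin_width: float = 25.0) -> float:
    """Shannon entropy (bits) of binned intensities inside a 3D VOI mask."""
    # Collect intensities of voxels inside the segmented volume of interest.
    values = [v
              for img_slice, m_slice in zip(image, mask)
              for img_row, m_row in zip(img_slice, m_slice)
              for v, m in zip(img_row, m_row) if m]
    if not values:
        raise ValueError("mask contains no voxels")
    # Fixed-bin-width discretization relative to the minimum VOI intensity.
    lo = min(values)
    bins = Counter(int((v - lo) // bin_width) for v in values)
    n = len(values)
    # H = -sum p(i) * log2 p(i) over occupied bins.
    return -sum((c / n) * math.log2(c / n) for c in bins.values())

# One 2x2 slice: two dark and two bright voxels, all inside the mask.
print(masked_entropy([[[0, 0], [100, 100]]], [[[1, 1], [1, 1]]]))
```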

Clinicopathologic Information and Conventional MRI Analysis
Clinicopathologic information, including age, treatment methods, stage and follow-up period after surgery, was obtained from medical records. Pathologic characteristics of the tumors were collected from postoperative pathology reports (Appendix A).
Morphologic features of the tumor, including shape, margin, internal enhancement pattern and enhancement kinetics, were assessed on PACS by expert consensus in a conventional MRI analysis according to the fifth edition of the Breast Imaging Reporting and Data System MRI lexicon. In addition, two previously established poor prognostic factors, peritumoral edema and ipsilateral vascularity around the tumor [21,22], were reviewed on T2-weighted images and maximum intensity projection images, respectively.

Statistical Analysis
Two-tailed p values < 0.05 were considered statistically significant. All statistical analyses were performed using R software version 4.0.2 (Ihaka and Gentleman, 1996). The least absolute shrinkage and selection operator (LASSO) Cox regression model was fitted using the "glmnet" package, and the "rms" and "hdnom" packages were used to produce Kaplan-Meier curves, the nomogram and calibration plots.

Creation and Validation of Rad-Score
We randomly divided patients into a training set (n = 111) and a validation set (n = 44) to create a radiomics signature (Rad-score). To compare patient characteristics between the two cohorts, we used the Wilcoxon rank-sum test for continuous variables and Fisher's exact test for categorical variables. We first used univariate Cox proportional hazards models to screen out nonsignificant variables. We then applied LASSO regularization [23] to select radiomics features from the training set. Each patient's Rad-score was calculated as a combination of the selected features weighted by their estimated coefficients. We analyzed the association between the Rad-score and DFS in the training set and then assessed it in the validation set. Patients were classified as high- or low-risk based on the Rad-score, with the cut-off value set using Youden's index [24] in receiver operating characteristic (ROC) curve analysis. DFS in high- and low-risk patients was analyzed using Kaplan-Meier survival analysis, and the log-rank test was applied to evaluate the difference in DFS between the two groups.
A radiomics-based prediction model for recurrence was constructed using the Rad-score and clinicopathologic and morphologic information. Univariate Cox proportional hazards models were used to exclude nonsignificant variables, and a multivariate Cox proportional hazards model was then fitted using all significant variables from the univariate analysis. To avoid multicollinearity, we performed stepwise variable selection based on the Akaike information criterion. Finally, we compared the predictive performance of models with several combinations of significant variables in the validation set using the C-index and assessed their calibration.
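Choosing a cut-off with Youden's index means scanning candidate thresholds and keeping the one that maximizes J = sensitivity + specificity − 1. The sketch below assumes a binary label per patient (for example, recurrence within 24 months); how the ROC outcome was defined is an assumption, since the text does not state it. The function name `youden_cutoff` is hypothetical, and the study itself used R.

```python
from typing import Sequence, Tuple

def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cut-off, J) maximizing Youden's J = sensitivity + specificity - 1.

    A patient is called high-risk when score >= cut-off; labels are 1 for
    recurrence and 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both outcome classes")
    best_cut, best_j = None, -1.0
    # Every observed score is a candidate threshold.
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical Rad-scores for four patients, two of whom recurred.
print(youden_cutoff([-0.5, -0.2, 0.1, 0.3], [0, 0, 1, 1]))
```

With tied J values, this version keeps the lowest threshold; other tie-breaking rules are equally defensible.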

Patient Characteristics
In total, 155 patients were enrolled in the study (Figure 1). The mean age was 35 years (SD ± 4.8), and the mean follow-up period was 55 months (SD ± 26.55). Of the 155 patients, 138 were diagnosed with invasive ductal carcinoma, 4 with invasive lobular carcinoma and 13 with other histologies, including mucinous carcinoma (8 cases), adenoid cystic carcinoma (2 cases), invasive micropapillary carcinoma (1 case), metaplastic carcinoma (1 case), and mixed mucinous and invasive micropapillary carcinoma (1 case). There were 50 patients with the luminal A subtype (32.2%), 61 with the luminal B subtype (39.4%), 10 with the HER2-positive subtype (6.5%) and 34 with the triple-negative subtype (21.9%). Furthermore, 52 patients underwent neoadjuvant chemotherapy after diagnosis, while the remaining 103 did not. During the mean follow-up period of 55 months (range, 6-118 months), 42 patients were diagnosed with recurrence. Of these 42 patients, 3 experienced recurrence more than five years (60 months) after surgery and were censored. Of the remaining 39 patients with recurrence, 20 (13 in the training set and 7 in the validation set) experienced recurrence within 24 months, with a median recurrence interval of 13 months (range, 3-23 months). Of these 20 recurrences, 13 manifested as local recurrence in the ipsilateral breast, 6 as distant metastases to the lung, brain and contralateral supraclavicular lymph nodes, 1 as contralateral breast recurrence, 1 as both ipsilateral breast recurrence and distant metastasis, and 1 as both contralateral breast recurrence and distant metastasis. The characteristics of patients in each cohort are listed in Table 1. Except for delayed enhancement kinetics, there was no difference between the training and validation sets.

Creation of Rad-Score & Assessment of Disease-Free Survival
A total of 214 texture features (107 from each image type) were extracted. For each image type, the 107 features comprised 14 shape, 18 first-order, 24 gray-level co-occurrence matrix, 16 gray-level run-length matrix, 16 gray-level size zone matrix, 14 gray-level dependence matrix and 5 neighboring gray-tone difference matrix features. These features were standardized and used to generate the radiomics-based scoring equation (Rad-score). Based on the LASSO Cox regression model fitted on the training set, the Rad-score for predicting disease recurrence was created as follows:
Using the equation above, the Rad-score of each patient was calculated, and patients in the training set were classified as high- or low-risk for disease recurrence. The optimal cut-off value for separating high- and low-risk patients was −0.016 according to Youden's index. The characteristics of the patients according to risk group are presented in Table 2. In the training set, the Rad-score, mastectomy rate, T stage, N stage, overall stage, mean tumor size, proportion of non-mass enhancement or combined patterns, ipsilateral vascularity and the presence of peritumoral edema were all positively correlated with high risk. In Kaplan-Meier analysis of DFS by Rad-score, the high-risk group in the training set showed significantly lower DFS within 2 years of surgery (p = 0.003). Similarly, the high-risk group showed significantly lower DFS in the validation set (p = 0.020) (Figure 2).
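The study produced its survival curves with the R packages named in the Statistical Analysis section. As a dependency-free illustration only, the sketch below implements the Kaplan-Meier product-limit estimator and a two-group log-rank test in Python. The function names are hypothetical; patients observed at the same time are grouped together, and the p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math
from typing import List, Sequence, Tuple

def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: returns (t, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t, deaths, censored = data[i][0], 0, 0
        # Group all patients whose event or censoring happened at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored  # everyone observed at t leaves the risk set
    return curve

def logrank_p(times_a, events_a, times_b, events_b) -> float:
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)               # total at risk
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)  # group A at risk
        d = sum(1 for s, e, _ in pooled if s == t and e)         # total events at t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = observed_minus_expected ** 2 / variance
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

print(kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1]))
```

In the study's setting, `times` would be months from surgery and the two groups would be patients above and below the Rad-score cut-off.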

Rad-Score-Based Recurrence Prediction Model: Radiomics Nomogram
Univariate analysis showed that a higher Rad-score (p < 0.001) and ER-negativity (p = 0.044) were associated with recurrence. Multivariate analysis confirmed the independent association of the Rad-score (hazard ratio 5.87, 95% CI 2.87-11.99, p < 0.001) and ER-negativity (hazard ratio 0.41, 95% CI 0.19-0.86, p = 0.019) with disease recurrence (Table 3). Using the Rad-score and ER-negativity, a Rad-score-based nomogram for predicting disease recurrence within 2 years of surgery was created (Figure 3). We compared the predictive performance of three models in the validation set: the radiomics nomogram, the ER-negativity-only model and the Rad-score-only model. The C-index of the radiomics nomogram was 0.63 (95% CI 0.45-0.80), indicating fair predictive ability for disease recurrence, whereas the C-index of the ER-negativity-only model was 0.51 (95% CI 0.39-0.66) and that of the Rad-score-only model was 0.71 (95% CI 0.51-0.86). The calibration curves of these models in the validation set are shown in Figure 4, and those in the training set are shown in Figure A1, Appendix B. Representative recurrent and non-recurrent cases are shown in Figures 5 and 6.
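The C-index measures how well a model ranks patients by risk: 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which 0.63 is described as fair. The sketch below is a simplified Harrell's C for right-censored data; it assumes pairs with tied follow-up times are not comparable, and the function name `harrell_c` is hypothetical rather than the R implementation used in the study.

```python
from typing import Sequence

def harrell_c(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data (simplified).

    A pair (i, j) is comparable when patient i had an event before patient j's
    follow-up time; it is concordant when i also has the higher risk score.
    Ties in risk count as half-concordant; ties in time are skipped.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Earlier recurrences with higher risk scores give perfect concordance.
print(harrell_c([1, 2, 3], [1, 1, 1], [3, 2, 1]))
```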

Discussion
In this study, we developed a radiomics-based nomogram for predicting early disease recurrence in young age breast cancer (YABC). The Rad-score and ER negativity were associated with early recurrence within two years of surgery. Among the three prediction models, those employing the Rad-score demonstrated higher predictive ability for recurrence than the model that included only ER status. Moreover, the Rad-score-only model showed higher predictive performance than the model combining the Rad-score and ER negativity. Because a relatively small sample size can increase the variability of predictive performance, a sparser model is often preferred in this setting. However, as discussed in the limitations below, the sample size in this study still yields a margin of error close to the threshold suggested in [25].
The Rad-score was generated from an equation based on six features from tumor texture analysis. Some of these six features were consistent with those reported in a previous study on YABC, in which a low surface-area-to-volume ratio, indicating tumor sphericity, and texture parameters indicating tumor homogeneity were associated with cancer recurrence [14]. In this study, a low surface-area-to-volume ratio and a high cluster tendency were likewise correlated with a high Rad-score. Most malignant masses have a high surface-area-to-volume ratio because of their irregular shapes and non-circumscribed margins, whereas a low surface-area-to-volume ratio indicates a more spherical shape. This result is consistent not only with previous studies but also with the tendency of the triple-negative subtype to be more spherical [26,27], given the high proportion of the triple-negative subtype in the recurrence group in the current study.
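Why a low surface-area-to-volume ratio signals sphericity can be checked numerically: at a fixed volume, a sphere has the smallest possible surface area. The sketch below compares a sphere of radius 10 mm with a cube of equal volume, using analytic areas rather than the mesh-based areas a radiomics package would compute. The sphericity formula (36πV²)^(1/3)/A is the standard definition that equals 1 for a sphere; the function names are hypothetical. Because the surface-area-to-volume ratio also depends on absolute size, the comparison is made at equal volume.

```python
import math

def surface_to_volume(area: float, volume: float) -> float:
    """Surface-area-to-volume ratio (1/mm when area is in mm^2 and volume in mm^3)."""
    return area / volume

def sphericity(area: float, volume: float) -> float:
    """(36*pi*V^2)^(1/3) / A: equals 1.0 for a perfect sphere and is < 1 otherwise."""
    return (36 * math.pi * volume ** 2) ** (1 / 3) / area

# Compare a sphere of radius 10 mm with a cube of the same volume.
radius = 10.0
volume = 4 / 3 * math.pi * radius ** 3
sphere_area = 4 * math.pi * radius ** 2
side = volume ** (1 / 3)
cube_area = 6 * side ** 2

print(round(sphericity(sphere_area, volume), 3), round(sphericity(cube_area, volume), 3))
print(round(surface_to_volume(sphere_area, volume), 3), round(surface_to_volume(cube_area, volume), 3))
```

An irregular, spiculated mass of the same volume would have an even larger surface area than the cube, hence a higher ratio and lower sphericity.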
Generally, tumor heterogeneity on texture analysis is a poor prognostic factor because it reflects aggressive tumor biology [28,29]. However, several previous studies suggested that lower entropy or higher tumor uniformity on contrast-enhanced T1 subtraction images, as well as tumor heterogeneity on T2-weighted images, is associated with poor breast cancer outcomes [29,30]. These studies hypothesized that tumor vascular permeability leads to increased parenchymal enhancement, resulting in less heterogeneity on texture analysis. However, the present and previous studies consistently showed an association between tumor homogeneity on ADC maps and lower DFS. In this study, cluster tendency from the ADC map was positively correlated with disease recurrence, and in a previous study, the inverse difference moment from the ADC map was associated with disease recurrence. ADC values generally reflect tumor cellularity, and low ADC values are associated with high-grade tumors [31] or high tumor proliferation [32]. However, because the association between tumor cellularity and the texture parameters of ADC maps has not yet been evaluated, tumor homogeneity has not been confirmed as a recurrence-associated factor unique to YABC. In addition, differences between MRI vendors can affect tumor homogeneity measured by texture analysis. Therefore, further studies are warranted to verify tumor homogeneity on ADC maps as a recurrence-associated factor in YABC.
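Cluster tendency and the inverse difference moment (IDM) are both computed from a gray-level co-occurrence matrix (GLCM), whose entry p(i, j) is the probability that gray levels i and j occur in neighboring voxels. The sketch below uses a single 2D pixel offset on a toy image, which is a simplification: radiomics software typically aggregates over several 3D directions after discretization. Cluster tendency is taken as Σ (i + j − μx − μy)² p(i, j) and IDM as Σ p(i, j) / (1 + (i − j)²); the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Glcm = Dict[Tuple[int, int], float]

def glcm(img: List[List[int]], dx: int = 1, dy: int = 0) -> Glcm:
    """Symmetric, normalized GLCM for one pixel offset (dx, dy)."""
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    rows, cols = len(img), len(img[0])
    for y in range(rows):
        for x in range(cols):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                a, b = img[y][x], img[y2][x2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # symmetric: count the pair in both directions
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def cluster_tendency(p: Glcm) -> float:
    """High when the image has large regions of similar (and extreme) gray levels."""
    mu_x = sum(i * v for (i, _), v in p.items())
    mu_y = sum(j * v for (_, j), v in p.items())
    return sum((i + j - mu_x - mu_y) ** 2 * v for (i, j), v in p.items())

def idm(p: Glcm) -> float:
    """Inverse difference moment: close to 1 when neighbors share gray levels."""
    return sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())

blocky = glcm([[1, 1], [3, 3]])    # homogeneous rows
checker = glcm([[1, 3], [3, 1]])   # alternating gray levels
print(cluster_tendency(blocky), idm(blocky), idm(checker))
```

The homogeneous, blocky image scores high on both features, matching the interpretation of these parameters as markers of tumor homogeneity.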
Of the 20 cases of disease recurrence, 10 were ER-negative: 2 were HER2-positive and 8 were triple-negative. Because we investigated DFS within two years of surgery, a high proportion of the triple-negative subtype is expected given its aggressive nature [33,34]. In our study, the overall rate of the triple-negative subtype (22%, 34/155) was higher than that in the general breast cancer population, consistent with the observation that YABC has a higher rate of the triple-negative subtype [35]. Thus, ER-negativity as a recurrence-associated factor is not necessarily unique to YABC. Further studies are therefore needed to compare the clinical, imaging and genetic features of ER-negative breast cancer between young- and average-age patients.
This study has several limitations. First, selection bias in YABC cases was inevitable because this was a retrospective study conducted at a single institution. Second, a relatively small sample size was used to develop and validate the nomogram. Moreover, the Rad-score-only model showed higher predictive performance than the radiomics nomogram, which includes both the Rad-score and ER negativity as predictors. A relatively small sample size may increase the variability of predictions, in which case a model with fewer predictors often performs better on the test dataset. Therefore, we calculated the margin of error and compared it with the guideline provided in [25]. For a sample of 155 patients with a 12.9% recurrence rate, the margin of error is 0.053, which is close to 0.05, the threshold suggested in [25]. This implies that the sample size in this study can yield robust prediction models, although a larger sample would reduce the variability of predictions. Details of the margin of error calculation are provided in Appendix C. In addition, considering the small proportion of young patients among all breast cancer patients, the number of patients enrolled in our study is not small. Third, although we validated the nomogram in a separate cohort, we did not perform external validation using more independent data, such as prospective patient groups or patients from other institutions; we therefore plan to validate this nomogram in a prospective cohort at our institution. In addition, the overall survival of YABC patients should be investigated with a follow-up period of more than 10 years. Finally, information regarding family history or BRCA mutations was lacking in our study group. However, only 10% of YABC patients have a first-degree family history of breast cancer or BRCA mutations [10,11], and a family history of breast cancer or the presence of BRCA mutations has not been shown to affect breast cancer mortality [36,37].
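The reported margin of error can be reproduced with the usual approximate (Wald) 95% confidence interval half-width for a proportion, z·√(p(1 − p)/n). The sketch below assumes that formula and takes p = 20/155 (the 12.9% recurrence rate); inverting the same formula gives the sample size at which the margin would reach 0.05. The function names are hypothetical.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the approximate 95% CI for a proportion: z * sqrt(p(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)

def min_sample_size(p: float, moe: float = 0.05, z: float = 1.96) -> int:
    """Smallest n whose margin of error is at most `moe` (the formula above, inverted)."""
    return math.ceil((z / moe) ** 2 * p * (1 - p))

rate = 20 / 155  # recurrences within 24 months / enrolled patients
print(round(margin_of_error(rate, 155), 3))  # 0.053
print(min_sample_size(rate))
```

Under this formula, roughly 170 patients would bring the margin to 0.05, which is why 155 patients is described as close to the threshold.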

Conclusions
In conclusion, our nomogram based on the radiomics signature and clinicopathologic information showed reasonably high predictive ability for disease-free survival, especially within 2 years of surgery. Future prospective studies should be conducted to validate the predictive ability of the radiomics nomogram for YABC. Furthermore, it is crucial to determine the relationship between tumor biology, including genetic mutations, and the imaging phenotype of YABC through multi-omics studies.

Appendix C. Discussion
The margin of error quantifies the amount of random sampling error in an estimate obtained from sample data. Riley RD et al. [25] recommend that the margin of error be smaller than or equal to 0.05 to build robust clinical prediction models. In particular, for a binary outcome (recurrence/no recurrence), the margin of error from an approximate 95% confidence interval is