Nomogram to Predict the Overall Survival of Colorectal Cancer Patients: A Multicenter National Study

Background: Colorectal cancer (CRC) is the third leading cause of cancer-related death and the fourth most commonly diagnosed cancer globally. This study aimed to evaluate survival predictors using the Cox proportional hazards (CPH) model and to establish a novel nomogram for predicting the overall survival (OS) of CRC patients. Materials and methods: A historical cohort study including 1868 patients with CRC was performed using medical records gathered from three tertiary colorectal referral centers in Iran from 2006 to 2019. Data from two centers were used as the train set and data from the third as the test set. First, candidate prognostic factors for survival were selected using univariable CPH models. Then, independent prognostic factors were identified with a multivariable CPH regression model and used to construct the nomogram. Nomogram performance was assessed by the concordance index (C-index) and the time-dependent area under the ROC curve. Results: Age, body mass index (BMI), family history, tumor grade, tumor stage, primary site, diabetes history, T stage, N stage, and type of treatment were significant predictors of survival in the univariable CPH models (p < 0.2). The multivariable CPH model revealed that BMI, family history, tumor grade, and tumor stage were significant (p < 0.05). The C-index was 0.692 (95% CI, 0.650–0.734) in the train data and 0.627 (0.670, 0.686) in the test data. Conclusion: We developed a novel nomogram based on these factors for predicting OS in CRC patients, which could assist clinical decision-making and prognosis prediction in patients with CRC.


Introduction
According to GLOBOCAN 2020 data, CRC is the fourth most commonly diagnosed cancer globally [1]. In the USA, about 130,000 CRC cases and over 50,000 deaths have been reported [2]. In European Union countries, CRC is the second most common cause of cancer death, with 215,000 deaths, and the second most common cancer site, with 447,000 cases [3]. In Singapore, CRC ranks first among cancers and second as a cause of cancer death [4]. According to the cancer registry program in Iran, CRC is the third most common cancer in the country, following only breast and stomach cancer [5][6][7]. CRC is the fourth most commonly diagnosed cancer in Iranian men and the second in Iranian women [1][8][9][10]. Although many investigations have revealed remarkable variability around the world, with almost 60% of cases occurring in developed countries, the overall incidence rate shows a slow but steady increase (approximately 2% per year) in developed nations. In developing societies and many Asian countries, the annual incidence is also anticipated to rise during the next two decades [11].
A nomogram is a simple graphical representation of a statistical prediction model that generates a numerical probability of a clinical event. Nomograms have recently been applied in prognosis-related clinical studies with comparable results [12][13][14]. For example, nomograms that include histology, tumor grade, history of polyps, and the number of involved lymph nodes can be used clinically to predict survival among patients with CRC [15,16].
To the best of our knowledge, this is the first study in Iran to use nomogram visualization of the predictive and prognostic factors for OS in CRC. It is also the first Iranian multicenter study to survey the demographic and clinical traits of patients with CRC. The large sample size (n = 1868) allows a wide range of relationships to be examined with sufficient statistical power in both the train and test sets.
With this large sample, the goal of this historical cohort study was to apply Cox regression to assess the influence of significant factors on the survival of CRC patients registered at three tertiary referral centers in Iran between 2006 and 2019. A nomogram was then constructed to estimate the probability of survival in CRC patients. The C-index was used for validation on the train and test datasets.

Materials and Methods
In this study, we gathered demographic information and clinical characteristics of 1868 patients who were diagnosed with CRC and referred to three tertiary hospitals in Iran from 2006 to 2019. Patients from Shahid Faghihi Hospital in Shiraz and Taleghani Hospital in Tehran formed the train set, and patients from Imam Khomeini Hospital in Mazandaran formed the test set.
The response variable was the time (in months) from cancer diagnosis until death. Several important clinical factors were included in the model: tumor size, the number of involved lymph nodes, distant metastasis, histology, type of treatment, history of polyps and CRC, comorbid colon diseases (inflammatory bowel disease and irritable bowel syndrome), diabetes mellitus, tumor stage, and tumor location. Demographic variables were sex, age, education level, smoking and alcohol consumption status, marital status, and BMI. Some variables had missing data. Patients who had a history of colorectal surgery for any reason other than colorectal cancer were excluded. The Ethics Committee of the Iran University of Medical Sciences approved the project (IR.IUMS.REC.1399.1223). The Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement is a 22-item checklist that aims to improve the reporting of studies developing, validating, or updating a prediction model, whether for diagnostic or prognostic purposes; the completed checklist is presented in supplementary material 1. Figure 1 shows the flowchart of patient selection for the train and test sets.

Statistical Analysis
The participants' clinical features were summarized as mean with SD for continuous measures and as frequency with proportion for categorical ones. Univariable CPH models were fitted to evaluate the effect of each candidate factor on the survival of CRC patients. Variables with p < 0.2 in the univariable analysis were candidates for the multivariable regression analysis. The result of the multivariable Cox model was presented as a nomogram. To assess model performance, the concordance index (C-index) and the time-dependent AUC (area under the ROC curve) at different time points were calculated.
The significance level for the statistical analysis was set at 0.05. Statistical analysis was performed in R 4.1.0 (http://www.r-project.org) with the survival and rms packages. The DynNom package was used to construct the dynamic nomogram [26].
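For reference, the Cox proportional hazards model described above has the following standard form. The symbols ($h_0$, $\beta_j$, $x_j$, $S_0$) are generic textbook notation and are assumptions here, not notation taken from the study's tables.

```latex
% Hazard for a patient with covariates x_1, ..., x_p
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right)

% Hazard ratio for a one-unit change in x_j (or for a category versus its reference level)
\mathrm{HR}_j = \exp(\beta_j)

% Survival probability implied by the model, with baseline survival S_0(t)
S(t \mid \mathbf{x}) = S_0(t)^{\exp\left(\beta_1 x_1 + \dots + \beta_p x_p\right)}
```

Under this form, an HR above 1 means a higher hazard of death than in the reference category, and an HR below 1 means a lower hazard.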

Results
A total of 1649 CRC patients from Shiraz and Tehran were included in the study as the train set. A separate dataset from Mazandaran was used as the test set (n = 219). Overall, 59.7% (n = 988) were male and 40.3% (n = 666) were female. The median follow-up time was 21.86 months (IQR: 9-37.2; range: 1-179 months). The mean (SD) age of patients was 54 (14) years. The detailed demographic and clinical characteristics of all the CRC patients, according to survival status, are summarized in Table 1, which also lists the factors associated with survival in the univariable Cox regression. Age, BMI, family history, tumor grade, tumor stage, primary site, diabetes history, T stage, N stage, and type of treatment were significant in the univariable Cox models. Variables with p < 0.2 in the univariable analysis were entered into the multivariable Cox model given in Table 2. In the multivariable Cox model, BMI, family history, tumor grade, and tumor stage were statistically significant (p < 0.05).
The hazard of death for patients with BMI < 18 (underweight) was 94% higher than for overweight patients (HR = 1.94, p < 0.05). The hazard in normal-weight patients was 42% higher than in overweight patients (HR = 1.42, p < 0.05). Family history of cancer was also significantly associated with survival, with a 42% lower hazard relative to the reference group (HR = 0.58, p = 0.002).
The HRs for the tumor grade categories indicated that both moderately and poorly differentiated tumors had a worse prognosis than well-differentiated tumors (HR = 1.5 and HR = 2.67, respectively; p < 0.05).
The HR increased significantly with advancing tumor stage in CRC patients: the higher the stage, the higher the HR. The hazard in patients with stage IV CRC was about 3.2 times that of patients with stage I (HR = 3.24, p = 0.005).
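The percentage statements above follow directly from the hazard ratios. A minimal sketch of that arithmetic is given below; the function name and dictionary labels are illustrative assumptions, and the HR values are the ones quoted in the text.

```python
def hr_to_percent_change(hr: float) -> float:
    """Relative change in hazard (in percent) implied by a hazard ratio."""
    if hr <= 0:
        raise ValueError("hazard ratio must be positive")
    return (hr - 1.0) * 100.0


# Hazard ratios quoted in the Results; the labels are shorthand, not table headings.
reported = {
    "underweight vs overweight": 1.94,
    "normal weight vs overweight": 1.42,
    "family history (as reported)": 0.58,
    "stage IV vs stage I": 3.24,
}

for label, hr in reported.items():
    change = hr_to_percent_change(hr)
    direction = "higher" if change > 0 else "lower"
    print(f"{label}: HR = {hr:.2f}, hazard about {abs(change):.0f}% {direction}")
```

An HR of 3.24 corresponds to a hazard 224% higher than the reference, which is the same as saying the hazard is about 3.2 times that of the reference group.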
Based on the results of the multivariable analysis, we established a dynamic web-based nomogram to calculate the survival probability (Dynamic Nomogram (shinyapps.io), https://nbshiny.shinyapps.io/DynNomColorectal/). It can be used to predict the long-term survival of patients with CRC (Figure 2). This statistical tool combines all prognostic indexes into a graphical model that calculates the individualized overall survival probability for CRC patients.
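To illustrate how a Cox-based nomogram turns covariates into a survival probability, here is a rough sketch, not the published model. It uses only the HRs quoted in the text and assumes overweight, well-differentiated grade, and stage I as reference levels. The baseline survival value is hypothetical, and stages II and III are omitted because their HRs are not quoted. The point scaling (largest single effect set to 100 points) is a common nomogram convention, assumed here rather than taken from the paper.

```python
import math

# Log hazard ratios from the HRs quoted in the text.
# Reference levels (assumed: overweight, well differentiated, stage I) contribute 0.
LOG_HR = {
    "bmi": {"overweight": 0.0, "normal": math.log(1.42), "underweight": math.log(1.94)},
    "grade": {"well": 0.0, "moderate": math.log(1.5), "poor": math.log(2.67)},
    "stage": {"I": 0.0, "IV": math.log(3.24)},  # stages II and III are not quoted in the text
}

# HYPOTHETICAL baseline survival at a fixed horizon for a patient at all
# reference levels. This number is NOT from the study.
BASELINE_SURVIVAL = 0.80


def linear_predictor(patient):
    """Sum of log hazard ratios for the patient's covariate levels."""
    return sum(LOG_HR[var][level] for var, level in patient.items())


def survival_probability(patient, baseline=BASELINE_SURVIVAL):
    """Cox survival probability: S(t | x) = S0(t) ** exp(linear predictor)."""
    return baseline ** math.exp(linear_predictor(patient))


def nomogram_points(patient, max_points=100.0):
    """Scale each contribution so the largest single effect gets max_points."""
    largest = max(max(levels.values()) for levels in LOG_HR.values())
    return {var: max_points * LOG_HR[var][level] / largest
            for var, level in patient.items()}


reference = {"bmi": "overweight", "grade": "well", "stage": "I"}
high_risk = {"bmi": "underweight", "grade": "poor", "stage": "IV"}

for name, patient in [("reference", reference), ("high risk", high_risk)]:
    print(name, nomogram_points(patient), round(survival_probability(patient), 3))
```

A real nomogram would take its coefficients and baseline survival from the fitted multivariable model, so the printed probabilities here only show the mechanics.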

Validation of Nomogram
The C-index of the nomogram was calculated for the train and test datasets. The C-index in the train set was 0.692 (95% CI, 0.650-0.734). The demographic and clinical characteristics of the CRC patients in the test set, according to survival status, are summarized in Table 3. The C-index in the test set was estimated as 0.627 (0.670, 0.686), which showed that the nomogram provided good discrimination. In addition, to assess model performance internally, the time-dependent AUC was calculated at different time points. The results are presented in Figure 3.
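The C-index reported above can be illustrated with a small sketch of Harrell's concordance index for right-censored data. The function name and toy data below are hypothetical and only show the pairwise definition; the study itself used the survival and rms packages in R. Tied event times are simply treated as non-comparable in this sketch.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has the shorter time and an
    observed event (events[i] == 1). The pair is concordant when subject i
    also has the higher risk score; tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in months, 1 = death observed, 0 = censored.
times = [5, 12, 20, 30, 41]
events = [1, 1, 0, 1, 0]
risk = [2.1, 0.4, 1.7, 0.9, 0.2]  # higher score = higher predicted hazard

print(harrell_c_index(times, events, risk))  # 6 concordant of 8 comparable pairs -> 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which gives a scale for reading the reported train and test values.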

Table 3 (excerpt). Follow-up duration, median (IQR): alive (n = 111), 55.0 (37.0-70.0); dead (n = 108), 26.5 (13.5-42.

Discussion
In the present study, univariable and multivariable Cox regression models were applied, and a nomogram was then constructed to predict OS and provide individualized estimates of potential survival benefit. The significant factors in the multivariable model were BMI, family history of cancer, tumor grade, and tumor stage. The C-index in the train and test datasets was estimated at 0.692 and 0.627, respectively. The time-dependent AUC was also evaluated at separate time points.
Many modeling techniques in survival analysis have been suggested for proportional and non-proportional hazards [20,27,28]. Cox models in these studies showed that tumor size and tumor grade are important for the survival of CRC patients. Similar to our study, previous surveys have reported a relationship between age at diagnosis and 5-year survival [21]. Zhao et al. (2020) applied machine learning to predict OS more accurately in colon cancer patients and presented the predictive model as nomograms for patients and clinicians [21]. They also used the Cox regression model to find the predictive factors. Variables such as age, highest CEA level, primary tumor site, treatment type, and the number of involved lymph nodes were significant. In our study, CEA level was not available; moreover, the number of involved lymph nodes and the type of treatment were not statistically significant.
A notable result of our study was a significant relationship between the survival of CRC patients and marital status, consistent with the study by Zhang et al. [29]. In their study, sex, race, CEA status, tumor size, tumor site, marital status, histology, grade, tumor stage, the extent of surgery, and metastasis were significant prognostic factors for CRC. In our study, histology, grade, and tumor stage were significant, which is compatible with their findings [29][30][31].
Li et al. showed that age, sex, depth of invasion, and tumor location were significant prognostic factors [32]. In that study, the C-indexes of the nomogram for the prediction of OS were 0.723 and 0.716 in the training and testing groups, respectively. In another survey, tumor size and involved lymph nodes were important predictors, whereas these variables were not significant prognostic factors in Yu's study [14]. In our study, the C-index in the train and test sets was estimated at 0.692 and 0.627. Similar to our results, Li et al. showed that tumor size and the number of involved lymph nodes were significant prognostic factors in CRC [33]. In their study, several serum tumor biomarkers, including CA19-9, CA242, CA72-4, CA50, and CA125, were studied in association with prognosis. They used univariable and multivariable Cox regression models to evaluate the relationship between these markers and survival outcomes, and drew nomograms for OS based on the multivariable Cox regression analysis. The C-indexes in their study were 0.772 and 0.715. In our investigation, the number of involved lymph nodes was significant in the univariable Cox regression model but was not retained as a main factor in the multivariable CPH model.
Another study found that age, depth of invasion, number of involved lymph nodes, and treatment type were significant in CRC, consistent with our study [34]. Univariable and multivariable Cox analyses were conducted to predict the individual risk of metachronous peritoneal carcinomatosis after surgery for non-metastatic CRC. The depth of invasion and the pathology of the primary tumor were identified as risk factors for CRC patients' survival, which is compatible with our study. In that study, the C-index in the train and test datasets was 80% and 70%, respectively, while in our study these values were 0.692 and 0.627.
Li et al. performed a survival analysis to assess an effective prognostic model for predicting survival in resected colorectal cancer patients [18]. They applied multivariable Cox regression analysis to identify significant prognostic factors. Their results demonstrated that age, CEA level, the number of involved lymph nodes, tumor stage, histological type, tumor grade, tumor location, treatment type, and lymph-vascular invasion were significant. In our study, the stage and grade of cancer were significant, which is consistent with the findings of Li et al.

Strengths and Limitations
The first key strength of the present study is the large sample size of a multicenter cohort together with a small amount of missing data. The second is the long-term follow-up period. The limitation of the study is that some