Nomogram for Predicting Distant Metastasis of Pancreatic Ductal Adenocarcinoma: A SEER-Based Population Study

(1) Background: The aim of this study was to identify risk factors for distant metastasis of pancreatic ductal adenocarcinoma (PDAC) and to develop a valid predictive model to guide clinical practice; (2) Methods: We screened 14328 PDAC patients from the Surveillance, Epidemiology, and End Results (SEER) database between 2010 and 2015. Lasso regression analysis combined with logistic regression analysis was used to determine the independent risk factors for PDAC with distant metastasis. A nomogram predicting the risk of distant metastasis in PDAC was constructed. A receiver operating characteristic (ROC) curve and the concordance index (C-index) were used to determine the accuracy and discriminative ability of the nomogram. A calibration curve was used to assess the agreement between the predicted probability of the model and the actual probability. Additionally, decision curve analysis (DCA) and a clinical impact curve were employed to assess the clinical utility of the nomogram; (3) Results: Multivariate logistic regression analysis revealed that risk factors for distant metastasis of PDAC included age, primary site, histological grade, and lymph node status. A nomogram was successfully constructed, with an area under the ROC curve (AUC) of 0.871 and a C-index of 0.871 (95% CI: 0.860–0.882). The calibration curve showed that the predicted probability of the model was in high agreement with the actual probability. The DCA and clinical impact curve showed that the model had great potential clinical utility; (4) Conclusions: The risk model established in this study has good predictive performance and promising potential for application, and it can support personalized clinical decisions in future clinical work.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive malignant tumor characterized by a propensity for metastasis and high mortality, with a rising incidence in both economically developed and developing countries [1]. Worldwide, PDAC was the 12th most commonly diagnosed cancer and the seventh leading cause of cancer death in 2020 [2]. In the United States, PDAC was estimated to be the fourth leading cause of cancer-related deaths in 2022, and its mortality rate continues to rise, with the disease projected to rank second in mortality among all cancer types by 2030 [3,4]. The five-year relative overall survival (OS) rate for the disease is about 11%, and most patients die from peritoneal spread or distant metastases [3]. For metastatic PDAC, the five-year OS rate is only 2% [5]. The most common site of metastasis in PDAC is the liver, followed by the lung, bone, and brain [6–8]. Although several previous studies have evaluated the prognostic role and predictors of single-organ metastasis of PDAC, studies predicting the overall risk of distant metastasis of PDAC are still lacking [6,9]. Metastasis of PDAC is affected by a variety of clinicopathological factors such as gender, age, race, primary site, grade, tumor size, and lymph node status [10]. Population-level estimates for the risk of distant metastases in PDAC patients are lacking, and the relationship between clinical factors and distant metastasis has not been well elucidated. Therefore, in this study, we used data from patients diagnosed with PDAC in the Surveillance, Epidemiology, and End Results (SEER) cancer registry between 2010 and 2015 to analyze the impact of distant metastasis on the prognosis of PDAC patients, to identify risk factors for distant metastasis in PDAC, and to develop an effective predictive model that provides a guiding strategy for clinical practice.

Patient Selection
A retrospective cohort study was performed by extracting data from the SEER database, an authoritative cancer database in the United States that collects cancer data from 18 population-based cancer registries in 14 states. "Incidence-SEER Research Plus Data, 13 Registries, November 2020 sub (2000-2018)" was employed as the data source. We selected cases through "Site code ICD-O-3/WHO 2008" and specified "Pancreas" as the disease site to determine the patient cohort of this study and extract patient-related information. Based on previous studies using the same database, we identified PDAC by the codes defined by "ICD-O-3 Hist/Behav, Malignant", including 8140 and 8500 [11] (8140: Adenocarcinoma, 8500: Infiltrating duct carcinoma). Demographic and clinical data were extracted for every patient, including age, gender, race, primary site (defined encoding: C25.0: Head of pancreas, C25.1: Body of pancreas, C25.2: Tail of pancreas, C25.3: Pancreatic duct, C25.4: Islets of Langerhans, C25.7: Other specified parts of pancreas, C25.8: Overlapping lesion of pancreas, C25.9: Pancreas), tumor size, histologic grade, TNM stage (AJCC 7th edition), surgery, regional lymph node dissection, marital status, radiotherapy, chemotherapy, and survival data. Inclusion criteria were as follows: (1) age ≥ 18 years; (2) histologically confirmed disease; (3) definite distant metastasis status; (4) diagnosed in 2010-2015. The exclusion criteria are shown in Figure 1. Ultimately, 5564 PDAC patients were eligible for subsequent analysis. SEER*Stat software (version 8.3.9.2) was used to extract data. This study complies with the requirements of the Declaration of Helsinki. Since SEER data are publicly available, this study did not require approval from the institutional ethics committee and informed consent was waived.

Statistical Analysis
The software used for statistical analysis in this study included IBM SPSS (version 22.0), GraphPad Prism (version 8.0.1), and RStudio (version 1.4.1717.0). The R packages used included rms, readr, corrplot, glmnet, pROC, rmda, and ResourceSelection. The chi-square test or Fisher's exact test was used to compare categorical variables between groups. Lasso regression was employed to reduce the risk of over-fitting the model. The penalty parameter was selected by 10-fold cross-validation according to the minimum partial likelihood deviance. As a result, none of the independent variables were eliminated in the optimal case. Univariate logistic regression analyses were then performed, and variables with p values less than 0.05 were entered into a multivariate logistic regression model to determine the independent predictors of distant metastasis of PDAC. The nomogram was drawn based on the independent predictors. The Hosmer-Lemeshow (HL) test was used to assess the goodness-of-fit of the model. The accuracy and discrimination of the model were assessed by the area under the curve (AUC) of the ROC curve and the concordance index (C-index). The agreement between the predicted probability of the model and the actual probability was evaluated by a calibration curve. Furthermore, decision curve analysis (DCA) and a clinical impact curve were used to assess the clinical utility of the model. Finally, we analyzed the survival differences between patients with and without distant metastasis. In the survival analysis, the primary endpoints were OS and cancer-specific survival (CSS). Survival curves were plotted using the Kaplan-Meier method, and differences were compared using the log-rank test. All p values were two-sided, and p < 0.05 was considered statistically significant.
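To illustrate how a lasso penalty can remove a variable from a model, the following is a minimal Python sketch of the soft-thresholding operator that lasso solvers apply to each coefficient during coordinate descent. It is not the glmnet routine used in this study. The coefficient values, variable names, and the penalty `lam = 0.1` are hypothetical and chosen only for illustration.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients (not estimates from this study).
coefficients = {
    "strong_predictor": 0.90,
    "weak_predictor": 0.04,
    "negative_predictor": -0.50,
}
lam = 0.1  # hypothetical penalty strength

shrunk = {name: soft_threshold(beta, lam) for name, beta in coefficients.items()}
retained = [name for name, beta in shrunk.items() if beta != 0.0]
```

In this toy case the weak predictor is set exactly to zero and dropped, while the two larger coefficients are only shrunk. A penalty small enough that no coefficient reaches zero retains every variable, which corresponds to the outcome described above.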

Characteristics of PDAC Patients
A total of 5564 patients with PDAC who met our criteria were included in the study. Among them, 1242 patients (22.3%) had distant metastasis. The rate of distant metastasis in PDAC patients decreased gradually with increasing age and was significantly lower in patients over 80 years of age than in those under 60 years of age. The incidence of distant metastasis was higher in men than in women and was significantly higher in Black patients than in other ethnic groups. The incidence of distant metastasis of PDAC originating from the body and tail of the pancreas was more than twice that of PDAC originating from the head of the pancreas. A high incidence of distant metastasis of PDAC was significantly associated with worse tumor differentiation and larger tumor size. More invasive surgery and radiotherapy were associated with a significantly lower incidence of distant metastases of PDAC. Detailed demographic and clinicopathological characteristics of PDAC patients are summarized in Table 1.
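The group comparisons above rely on the chi-square test. As a minimal sketch of the underlying arithmetic, the Python code below recomputes the overall metastasis rate from the reported counts and runs a Pearson chi-square test (without continuity correction) on a 2x2 table. The 2x2 counts are hypothetical, since the per-group counts of Table 1 are not reproduced here, and the function name `chi_square_2x2` is an illustrative choice.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test for the table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Overall rate from the counts reported in the text.
overall_rate = round(100 * 1242 / 5564, 1)

# Hypothetical counts: rows are two patient groups,
# columns are (with metastasis, without metastasis).
chi2, p_value = chi_square_2x2(300, 700, 200, 800)
```

With these hypothetical counts (30% versus 20% metastasis), the statistic is about 26.7 and the p value is far below 0.05.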

Risk Factors for Distant Metastasis of PDAC
We used the lasso regression method, which shrinks some regression coefficients to zero, to reduce the risk of over-fitting of the developed model. We performed 100 calculations using the glmnet package in RStudio, and the results converged to the optimal solution at the 65th calculation (lambda = 0.00571). The number of degrees of freedom corresponding to the optimal solution was 13, which means that none of the 13 independent variables included in this study were eliminated (Figure 2A). The mean squared error of the model at different lambda values was then checked through 10-fold cross-validation (Figure 2B). The lambda value within one standard error of the minimum mean squared error was taken as the optimal solution, and the regression coefficients of all independent variables at this value were checked. Subsequently, we included the 13 independent variables in the univariate logistic regression analysis, and the variables with p < 0.05 were included in the multivariate logistic regression analysis. Finally, multivariate analysis showed that age < 60 years (OR = 2.481, 95% CI 1.906-3.238, p < 0.001), age 60-69 years (OR = 2.076, 95% CI 1.626-2.657, p < 0.001), age 70-79 years (OR = 1.790, 95% CI 1.397-2.299, p < 0.001), tumor location in the pancreatic body/tail (OR = 2.520, 95% CI 2.091-3.038, p < 0.001), tumor location in other parts of the pancreas (OR = 1.622, 95% CI 1.293-2.033, p < 0.001), moderately differentiated histological grade (OR = 1.652, 95% CI 1.247-2.201, p < 0.001), poorly differentiated and undifferentiated histological grade (OR = 1.732, 95% CI 1.309-2.304, p < 0.001), and N1 stage (AJCC_N) (OR = 1.708, 95% CI 1.431-2.041, p < 0.001) were independent risk factors for distant metastases of PDAC. Partial pancreatectomy (OR = 0.083, 95% CI 0.045-0.149, p < 0.001), total pancreatectomy (OR = 0.069, 95% CI 0.032-0.140, p < 0.001), extended pancreatectomy (OR = 0.076, 95% CI 0.026-0.195, p < 0.001), and radiotherapy (OR = 0.303, 95% CI 0.195-0.456, p < 0.001) were independent protective factors for distant metastases.
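The odds ratios above come from exponentiating logistic regression coefficients. The Python sketch below shows that relationship under the assumption that the intervals are Wald-type (symmetric on the log-odds scale), which the text does not state explicitly. It back-computes the coefficient and standard error for the body/tail location from the reported OR and interval, then rebuilds the interval. The function names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def coefficient_from_or(odds_ratio: float, lower: float, upper: float, z: float = Z_95):
    """Recover the log-odds coefficient and its standard error from an OR and its CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


def odds_ratio_ci(beta: float, se: float, z: float = Z_95):
    """Wald interval: exponentiate beta and beta +/- z * se."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Reported values for tumors located in the pancreatic body/tail.
beta, se = coefficient_from_or(2.520, 2.091, 3.038)
odds_ratio, ci_low, ci_high = odds_ratio_ci(beta, se)
```

The rebuilt interval agrees with the reported 2.091-3.038 to within rounding, which is consistent with, but does not prove, a Wald-type interval.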

Construction and Validation of the Nomogram
A nomogram was established according to the significant variables determined by multivariate logistic regression analysis (Figure 3). The HL test showed a p-value of 0.394 (greater than 0.05), indicating a good fit of the model. The ROC curve of the nomogram was constructed, and the results revealed that the AUC of the model was 0.871, its cut-off value was 0.195, and the corresponding specificity and sensitivity were 0.780 and 0.855, indicating that this model had good predictive performance (Figure 4A). The C-index of the nomogram was 0.871 (95% CI 0.860-0.882), suggesting a good discriminatory ability of the model. We further employed a calibration curve, based on 1000 bootstrap resamples, to evaluate the agreement between the probability of events predicted by the model and the actual probability of events (Figure 4B). Finally, we plotted the DCA and clinical impact curves to assess the clinical utility of the model. DCA indicated that the nomogram had a positive net benefit over a wide range of threshold probabilities (Figure 4C). In the clinical impact curve (Figure 4D), the red curve represents the number of people classified as positive by the model at each threshold probability and the blue curve represents the number of true positives at each threshold probability, showing that as the threshold probability increased, the number of patients predicted to have distant metastasis of PDAC came closer to the true number.
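For a binary outcome the C-index and the AUC are the same quantity, which is why both are reported as 0.871. The following is a minimal Python sketch of that pairwise definition, together with sensitivity, specificity, and the net benefit used in decision curve analysis. It is not the pROC or rmda code used in this study. The toy outcomes and predicted probabilities are hypothetical; only the cut-off of 0.195 is taken from the text.

```python
def c_index_binary(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; tied pairs count as one half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                score += 1.0
            elif p_event == p_non:
                score += 0.5
    return score / (len(events) * len(non_events))


def confusion_counts(y_true, y_prob, cutoff):
    """Count TP, FP, TN, FN when probabilities >= cutoff are called positive."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cutoff)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= cutoff)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cutoff)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < cutoff)
    return tp, fp, tn, fn


def net_benefit(y_true, y_prob, threshold):
    """Net benefit = TP/n - FP/n * threshold / (1 - threshold)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = distant metastasis), not patients from this study.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.18, 0.12, 0.60, 0.80]

auc = c_index_binary(y_true, y_prob)
tp, fp, tn, fn = confusion_counts(y_true, y_prob, 0.195)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
nb = net_benefit(y_true, y_prob, 0.195)
```

A decision curve repeats the net benefit calculation over a grid of threshold probabilities and compares the model with the treat-all and treat-none strategies.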
Curr. Oncol. 2022, 29

Survival Analysis
Survival analysis was performed according to distant metastasis status and the different independent risk factors. The Kaplan-Meier curves of CSS and OS are shown in Figure 5. Median OS was 18 and 6 months and median CSS was 19 and 6 months in PDAC patients without and with distant metastases, respectively, with statistically significant differences between groups (Figure 5A,B; p < 0.01). Comparison of different age subgroups showed that advanced age reduced the survival of patients with PDAC. The median survival for both OS and CSS was 9 months in patients with PDAC aged 80 years or older, whereas in the age group below 60 years, the median survival for both OS and CSS was 18 months (Figure 5C,D). There was no significant difference in the survival of patients with PDAC originating from different sites of the pancreas (Figure 5E,F). Figure 5G,H shows that the survival of patients with well differentiated PDAC was significantly better than that of patients with poorly differentiated and undifferentiated PDAC, with median OS of 20 and 11 months and median CSS of 22 and 11 months, respectively (p < 0.01). In addition, there was no statistically significant difference in survival between PDAC patients with and without lymph node metastasis (Figure 5I,J).
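The median survival times above are read from Kaplan-Meier curves. As a minimal sketch of how such a median is obtained, the Python code below implements the Kaplan-Meier product-limit estimator and returns the first time at which estimated survival falls to 50% or below. The follow-up times and event indicators are hypothetical and are not SEER records.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.
    events: 1 = death observed, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up in months; 0 marks a censored observation.
months = [2, 4, 6, 6, 9, 12, 18, 24]
died = [1, 1, 1, 0, 1, 1, 0, 1]

curve = kaplan_meier(months, died)
median = median_survival(curve)
```

Subjects censored at the same time as a death are still counted as at risk for that death, which is the usual convention. Comparing two such curves, as in Figure 5, additionally requires a log-rank test, which is not shown here.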

Discussion
PDAC accounts for approximately 90% of all pancreatic tumors and is an extremely aggressive malignancy [12]. Although patients with PDAC receive anticancer treatment, the outcome is not satisfactory, and most PDAC patients die from peritoneal spread or distant metastasis [13]. Identifying PDAC patients with a high probability of distant metastasis allows the development of more aggressive and precise treatment measures, which is of great relevance for improving survival. Therefore, the present study investigated risk factors and developed a valid prediction model for distant metastasis of PDAC. The findings showed that risk factors for distant metastasis of PDAC included young age, worse histological differentiation, tumors originating from the body and tail of the pancreas, and lymph node metastasis. The AUC of the ROC curve and the C-index showed that the model had good predictive performance and discriminatory ability. The calibration curve showed that the predicted probability of the model was in high agreement with the actual probability. The DCA and clinical impact curve revealed that the model had great potential clinical utility.
The relationship between age and the aggressive progression of cancer has been demonstrated. Previous studies on thyroid cancer, breast cancer, and colorectal cancer revealed an inverse association between age and the risk of distant metastasis of malignant tumors [14–17]. In this study, we found that the risk of PDAC metastasis decreased with increasing age. A possible reason for the association of younger age with distant metastasis is the progressive degeneration of B- and T-lymphocytes with advancing age [18]. The adaptive immune system also undergoes significant disruption through more complex pathways, and age-related deterioration of the immune system may actually be protective by depriving the metastatic process of key immune cellular components [19]. In addition, aging inhibits cancer cell metastasis by altering the extracellular matrix through non-enzymatic glycosylation and reducing the activity of matrix-modifying proteases [18]. Survival analyses showed that younger patients with PDAC had longer survival times than older patients. A possible explanation is that younger patients usually have a more positive attitude toward therapy and are in better physical condition to tolerate various treatment modalities. However, this study did not consider whether older patients received less systemic therapy than younger patients. In addition, since elderly PDAC patients are in poorer physical condition, the probability of non-neoplastic death is higher, resulting in shorter survival and thus not enough time for distant metastases to occur, which is also a possible factor contributing to the low risk of distant metastases in elderly patients. Therefore, clinicians should pay special attention to whether patients under 60 years of age are at high risk of distant metastases, which may affect their prognosis. Yamaguchi et al. [20] showed that precancerous pancreatic cells could undergo latent metastasis at an early stage and remain latent in the host organ. Although these cells did not appear malignant at an early stage, such distant metastases surprisingly appeared before the development of the primary tumor site. Such early disseminated cells can develop in parallel with the primary tumor [20,21]. Therefore, the combined effect of age and latent metastasis of precancerous cells should be considered in young PDAC patients at an early stage, and aggressive management strategies should be adopted to minimize the chances of metastasis after treatment.
Due to the anatomical location, patients with PDAC in the body and tail of the pancreas usually develop symptoms caused by biliary obstruction later than patients with PDAC in the head of the pancreas. Consequently, in clinical practice, PDAC in the body and tail of the pancreas is often detected at a more advanced stage than PDAC in the head of the pancreas [22]. Previous studies have also demonstrated differences in biological behavior and molecular characteristics between PDAC in different locations of the pancreas [23]. PDAC originating from the body and tail of the pancreas is more aggressive since it is enriched for gene programs involved in tumor cell invasion and epithelial-to-mesenchymal transition and is characterized by a poor antitumor immune response [24]. A single-center retrospective study showed that the pattern of distant metastasis differed significantly between PDAC of the body and tail and PDAC of the head of the pancreas [25]. PDAC located in the body and tail of the pancreas was larger in diameter and more prone to distant metastases [25]. A multicenter study similarly demonstrated that PDAC from the body and tail of the pancreas was significantly more aggressive than PDAC from the head of the pancreas [26]. The risk model in this study suggested that PDAC in the body and tail of the pancreas had a higher risk score for distant metastasis than PDAC in the head of the pancreas. Our data demonstrated that there was no significant difference in survival time for patients with PDAC from different parts of the pancreas. It is important to note that the survival analysis in this study was based on all patients with PDAC, including those with and without distant metastasis. For patients with advanced PDAC, PDAC in the body and tail of the pancreas was more malignant than PDAC in the head of the pancreas [27]. In patients with stage I PDAC, survival rates were higher for cancers of the body and tail of the pancreas than for cancers of the head of the pancreas, but the opposite was true in patients with stage II to IV PDAC [22]. As a result, patients with intermediate to advanced PDAC in the body and tail of the pancreas have a lower survival rate and are more prone to metastasis. More attention should be paid to these patients.
Tumor grade correlates with the malignant biological behavior of cancer. Several studies have demonstrated that the degree of tumor differentiation is a determinant of distant metastasis [28,29]. In the present study, the proportion of patients with distant metastasis was significantly higher among poorly differentiated and undifferentiated tumors (26.8%) than among well differentiated (17.3%) and moderately differentiated (19.7%) tumors. Multivariate logistic regression analysis showed that the risk of distant metastasis of PDAC increased as the degree of differentiation decreased. Survival analysis showed that patients with better differentiated tumors had significantly longer survival times.
The lymphatic system is an important route for tumor cell metastasis. In colorectal cancer, more than one-third of distant metastases may arise from lymph node metastasis, which shares a common origin with distant metastasis [30,31]. Some tumor cells infiltrate the lymphatic vessels, colonize distant organs or tissues through the lymphatic circulation, and continue to grow into metastases [32]. Han et al. [33] detected stem-like lymphatic circulating tumor cells in the thoracic duct, which empties lymph directly into the circulation, suggesting a key role for the lymphatic system in mediating distant metastasis of cancer. The relationship between lymph node metastasis and distant metastasis has also been demonstrated in mouse cancer models by injecting tumor cells directly into lymphatic vessels and by using photoswitchable tumor cell models [34,35]. Our multivariate analysis also showed that lymph node metastasis leads to an increased risk of distant metastasis of PDAC. Inconsistent with this finding, however, our data also show that patients with lymph node metastases had a significantly lower proportion of distant metastases than patients without lymph node metastases. Accordingly, further large cohort studies are needed to clarify the relationship between lymph node metastasis and distant metastasis. Although regional lymph node dissection was not statistically significant in multivariate logistic regression, the descriptive statistics showed that it was associated with a significantly lower proportion of patients with distant metastasis.
We acknowledge several limitations of this research. First, the risk model for predicting distant metastasis of PDAC was based on patient information screened from a public database. Therefore, its performance in external populations has yet to be tested. Second, the SEER data do not provide the specific chemotherapy regimen received by PDAC patients or whether they received neoadjuvant chemotherapy. This means that we could not assess how neoadjuvant chemotherapy might change these results. Third, the database only included information on synchronous distant metastasis, which means that patients who developed metachronous metastatic lesions were not included in our study. Furthermore, the database only recorded a limited set of metastatic sites, namely the bone, liver, lung, and brain, and thus other sites of metastasis are not known. Finally, the nomogram was established based on a retrospective study and needs to be further validated in multicenter prospective cohorts and clinical trials. Despite these limitations, the nomogram may have practical utility in different populations. Nomograms have been shown to be efficient and instructive models that can effectively assist clinicians in providing personalized management.

Conclusions
In conclusion, our findings suggest that young age, poorer histological differentiation, tumors originating from the body and tail of the pancreas, and lymph node metastasis are risk factors for distant metastasis of PDAC. Furthermore, the present study successfully constructed a risk model with promising predictive value for distant metastasis of PDAC, which provides guidance for clinical practice.

Institutional Review Board Statement:
This study complies with the requirements of the Declaration of Helsinki. Since SEER data are publicly available, this study did not require approval from the institutional ethics committee.
Informed Consent Statement: Patient consent was waived due to the fact that this is a retrospective study.

Data Availability Statement:
The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest:
All authors have no conflict of interest to declare.