Development and Validation of a Nomogram for Preoperative Prediction of Early Recurrence after Upfront Surgery in Pancreatic Ductal Adenocarcinoma by Integrating Deep Learning and Radiological Variables

Simple Summary: Early recurrence is common after curative resection for pancreatic ductal adenocarcinoma (PDAC). Patients with a high risk of early recurrence may benefit from a neoadjuvant-first approach instead of upfront surgery. In our study, a deep learning model for predicting early recurrence was developed and validated. The results showed that the deep learning model output was an independent risk factor associated with early recurrence. Additionally, higher deep learning model outputs were significantly associated with worse recurrence-free survival in various subgroups and with more advanced tumor behavior. The comprehensive nomogram that integrated the deep learning model outputs and independent radiological factors further improved the predictive performance. Our findings show that the deep learning-based nomogram could noninvasively predict early recurrence in PDAC patients, which may support clinical decision-making about upfront resection or neoadjuvant treatment strategies.

Abstract: Around 80% of pancreatic ductal adenocarcinoma (PDAC) patients experience recurrence after curative resection. We aimed to develop a deep learning model based on preoperative CT images to predict early recurrence (recurrence within 12 months) in PDAC patients. The retrospective study included 435 patients with PDAC from two independent centers. A modified 3D-ResNet18 network was used to construct the deep learning model. A nomogram was constructed by incorporating deep learning model outputs and independent preoperative radiological predictors. The deep learning model provided area under the receiver operating characteristic curve (AUC) values of 0.836, 0.736, and 0.720 in the development, internal, and external validation datasets for early recurrence prediction, respectively. Multivariate logistic analysis revealed that higher deep learning model outputs (odds ratio [OR]: 1.675; 95% CI: 1.467, 1.950; p < 0.001), cN1/2 stage (OR: 1.964; 95% CI: 1.036, 3.774; p = 0.040), and arterial involvement (OR: 2.207; 95% CI: 1.043, 4.873; p = 0.043) were independent risk factors associated with early recurrence and were used to build an integrated nomogram. The nomogram yielded AUC values of 0.855, 0.752, and 0.741 in the development, internal, and external validation datasets. In conclusion, the proposed nomogram may help predict early recurrence in PDAC patients.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal malignancies, with a five-year survival rate of less than 10% [1,2]. Radical resection with adjuvant chemo(radio)therapy is considered the mainstay of PDAC treatment. Even in patients with resectable PDAC, recurrence occurs in approximately 80%, with 50% occurring within one year [3,4]. Neoadjuvant chemo(radio)therapy has been reported to decrease recurrence and improve survival, especially for borderline resectable (BRPC) and locally advanced PDAC (LAPC) [5][6][7]. Therefore, identifying patients with a high recurrence risk is essential, as these patients may benefit from a neoadjuvant-first approach instead of upfront surgery.
Currently, predicting the recurrence of PDAC is mainly based on multiple clinicopathological factors. Postoperative pathological factors, such as lymph node metastases and tumor differentiation, are the most frequently reported independent predictors of recurrence [8][9][10]. However, pathological indicators can only be acquired after surgery and therefore cannot inform preoperative clinical decision-making. Several preoperative score indicators or clinical factors, such as the Glasgow prognostic score, carbohydrate antigen 19-9 (CA19-9), and platelet-to-lymphocyte ratio, have been reported to be associated with postoperative recurrence [11][12][13]. Nevertheless, these factors have not yet been widely recognized or validated.
Imaging methods, including computed tomography (CT) and magnetic resonance imaging (MRI), are widely used for PDAC diagnosis, staging, and resectability evaluation. Furthermore, several studies [14][15][16] have reported that imaging characteristics, such as suspicious metastatic lymph nodes, hypodense tumor in the portal venous phase, peripancreatic tumor infiltration, tumor necrosis, and the presence of pancreatitis or pseudocyst, are associated with postoperative tumor recurrence. Recently, there has been growing interest in applying deep learning for prognosis prediction from cancer imaging. Deep learning is a powerful approach for extracting information from medical images and has shown promise for survival prediction in PDAC patients [17][18][19]. However, to the best of our knowledge, deep learning methods have not been well evaluated for predicting recurrence in PDAC patients.
Therefore, we aimed to develop a deep learning model based on preoperative contrast-enhanced CT (CECT) images for the prediction of early recurrence (ER) after upfront surgery in patients with PDAC. Moreover, a comprehensive preoperative nomogram was established by integrating the deep learning model outputs and radiological variables.

Patients
Our study recruited patients from Zhejiang University School of Medicine Affiliated Second Hospital (Center 1, for model development and internal validation) and Southern Medical University Affiliated Zhujiang Hospital (Center 2, for independent external validation). The inclusion criteria were histologically confirmed PDAC and contrast-enhanced CT performed within 1 month before surgery. The exclusion criteria were as follows: use of neoadjuvant therapy, including radiotherapy, chemotherapy, or other treatments; unavailability of preoperative computed tomography (CT) or suboptimal image quality; 90-day postoperative mortality; a coexisting malignancy; no visible mass on CT; and multiple synchronous PDACs. Patients were also excluded if their records were incomplete or if they had fewer than 12 months of follow-up without recurrence or death. The detailed selection process is shown in Figure S1.

Outcomes and Data Collection
Baseline characteristics, including age, sex, liver function tests, and serum CA19-9 level, were collected. Preoperative CT imaging was used to assess vascular involvement and tumor size. Venous involvement comprised the portal vein, superior mesenteric vein, and splenic vein. Arterial involvement comprised the coeliac trunk, superior mesenteric artery, common hepatic artery, and splenic artery. The T and N stages were determined from preoperative CT images according to the 8th AJCC TNM staging system. R0 resection was defined as the absence of identifiable tumor cells within 1 mm of the resection margin. All patients were followed every month for the first six months after surgery for adjuvant chemo(radio)therapy, every three months for the following 1.5 years, and once a year after that. At each follow-up, serum CA19-9 levels were measured, and imaging (contrast-enhanced CT or MRI) was performed. ER was defined as recurrence within 12 months after surgery.
Figure 1 shows the workflow of this study. All patients received a contrast-enhanced CT scan prior to surgery. This study used portal venous phase images for deep learning model construction. CT acquisition protocols of the two centers can be found in the Supplementary Materials.

CT Acquisition and Image Processing
Figure 1. The workflow of this study. (A) The primary tumor was cropped and resized to a uniform size (50 × 50 × 50) as the input to a 3D deep learning network. The deep learning model was constructed using a modified 3D-ResNet18 framework, and the outputs were used for integrated nomogram construction. (B) Preoperative factors such as tumor size, lymph node metastasis, venous invasion, artery invasion, CA19-9 level, and baseline clinical characteristics were entered into univariable and multivariate logistic regression to select independent factors for nomogram and clinical modeling. (C) High-risk group patients showed significantly worse recurrence-free survival in the Kaplan-Meier analysis. A nomogram was created by incorporating independent radiological factors and deep learning model outputs. The ROC curve was used to compare the predictive performance of the developed models. The nomogram can support shared decision-making regarding upfront resection or neoadjuvant treatment strategies.
The image intensity values were truncated from −125 to 225 HU (window width: 250 HU, window level: 50 HU) and then resampled to a resolution of 1 × 1 × 3 mm³ using spline interpolation to decrease the variability between scans. Finally, each pixel value was standardized to the range of [0, 1]. The 3D primary tumor was manually segmented using the ITK-SNAP software (version 3.6.0) on the portal venous-phase CT images by two radiologists (X.M.L. and X.C.Z., with 10 and 20 years of work experience) in consensus. The primary tumor was then cropped and resized to a uniform size (50 × 50 × 50) as the input to a 3D deep learning network.
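The intensity and cropping steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, the clipping bounds are taken from the stated −125 to 225 HU range, the voxel-spacing resampling step is omitted because it needs scan metadata, and nearest-neighbour index sampling is used as a dependency-free stand-in for spline-based resizing.

```python
import numpy as np

HU_MIN, HU_MAX = -125.0, 225.0   # truncation range stated in the text
TARGET_SHAPE = (50, 50, 50)      # network input size stated in the text


def clip_and_scale(volume_hu):
    """Truncate HU values to [HU_MIN, HU_MAX], then rescale to [0, 1]."""
    clipped = np.clip(volume_hu.astype(np.float32), HU_MIN, HU_MAX)
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)


def crop_to_mask(volume, mask):
    """Crop the volume to the bounding box of a binary tumor mask."""
    coords = np.argwhere(mask > 0)
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]


def resize_nearest(volume, shape=TARGET_SHAPE):
    """Resize by nearest-neighbour sampling (assumed stand-in for spline resizing)."""
    idx = [
        np.minimum((np.arange(n) * s / n).astype(int), s - 1)
        for n, s in zip(shape, volume.shape)
    ]
    return volume[np.ix_(*idx)]


def preprocess(volume_hu, tumor_mask):
    """Clip and scale intensities, crop to the tumor, and resize to 50 x 50 x 50."""
    scaled = clip_and_scale(volume_hu)
    return resize_nearest(crop_to_mask(scaled, tumor_mask))
```

In practice a spline-based resampler from an imaging library would replace `resize_nearest`; the sketch only shows the order of operations and the resulting tensor shape.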

Deep Learning Model Development
We used a modified 3D-ResNet-18 to develop the CT-based deep learning model. The first convolutional layer of the network was modified from three input channels to a single channel so that the network can accept grayscale images as input. In addition, a 3D convolutional kernel of size 3 × 3 × 3 was used instead of 7 × 7 × 7 because of the relatively small input size. Two ResNet layers with 2 and 3 basic blocks were then appended to increase network depth. Finally, the output layer was modified to classify patients into two classes (with or without ER). The output conditional probabilities indicated the individual recurrence risk and were used for integrated nomogram construction (the code can be found at https://github.com/fatfeifei/PDAC_recurrence_prediction (accessed on 15 May 2023)).
During model training, all input 3D images were augmented using TorchIO (version 0.18.86) with operations such as translation, rotation, and shearing of varying magnitude. Patients in Center 1 were randomly split into development and internal validation datasets (7:3 ratio). To minimize the loss, the Adam optimizer was used with a learning rate of 1 × 10⁻⁴. The loss function was binary cross-entropy. Training was stopped when the loss in the validation dataset did not decrease for 10 epochs.
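The 7:3 split and the stopping rule can be sketched in plain Python as follows. The function names, the random seed, and the use of floor rounding are assumptions made for illustration; the actual training loop is in the authors' repository.

```python
import random


def split_patients(patient_ids, dev_fraction=0.7, seed=0):
    """Randomly split patient IDs into development and internal validation sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_dev = int(len(ids) * dev_fraction)  # floor rounding is an assumption
    return ids[:n_dev], ids[n_dev:]


def early_stopping_epoch(val_losses, patience=10):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    the index of the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1
```

With 368 patients, `split_patients` yields 257 and 111 IDs, which matches the dataset sizes reported for Center 1.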

Performance Evaluation in Different Subgroups
Patients were classified into high-risk and low-risk groups, using the median output probability of the deep learning model in the development dataset as the cutoff. Clinicopathological characteristics and surgery details were compared between the high- and low-risk groups.
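A minimal sketch of this grouping rule is given below. The function name and the example values are hypothetical; the cutoff comes from the development dataset only, and outputs at or above the median are labelled high-risk, as stated in the results.

```python
from statistics import median


def risk_groups(dev_outputs, outputs):
    """Label each output as 'high' or 'low' risk using the development-set median."""
    cutoff = median(dev_outputs)
    labels = ["high" if value >= cutoff else "low" for value in outputs]
    return cutoff, labels
```

Applying the development-set cutoff unchanged to the validation datasets keeps the grouping free of information from those datasets.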

Nomogram and Clinical Model Construction
Significant risk factors for ER were selected using logistic regression analysis. First, the deep learning model outputs and preoperative clinical factors were analyzed using univariable logistic regression. Factors with a p-value of less than 0.1 were included in the multivariate logistic regression analysis. Next, risk factors were selected using stepwise backward elimination based on the Akaike information criterion (AIC), with a lower AIC indicating a better model fit. The selected variables were then used to generate the nomogram. The multivariate logistic regression was then repeated without deep learning model outputs to develop the clinical model. Model discrimination was assessed and compared using the area under the receiver operating characteristic curve (AUC). Calibration curves and Hosmer-Lemeshow tests were used to assess the agreement between the nomogram prediction and the actual observed rate.
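To illustrate how a logistic model is turned into nomogram points, the following sketch uses the odds ratios and the deep learning output range reported in this paper. Several parts are assumptions rather than reported values: the intercept is hypothetical, the deep learning odds ratio is treated as applying per one-unit increase in the model output, and the 0 to 100 point scaling follows the common nomogram convention rather than the authors' exact plot.

```python
import math

# Odds ratios reported in the abstract; the per-unit interpretation of the
# deep learning odds ratio is an assumption.
ODDS_RATIOS = {"dl_output": 1.675, "cN1_2": 1.964, "arterial": 2.207}
# The deep learning range is the one reported for the development dataset;
# the other two predictors are binary indicators.
RANGES = {"dl_output": (-1.68, 1.01), "cN1_2": (0, 1), "arterial": (0, 1)}
INTERCEPT = -0.5  # hypothetical: the fitted intercept is not reported

BETAS = {name: math.log(value) for name, value in ODDS_RATIOS.items()}
MAX_EFFECT = max(BETAS[name] * (hi - lo) for name, (lo, hi) in RANGES.items())


def points(name, value):
    """Nomogram points: the predictor with the largest effect spans 0 to 100."""
    lo, _ = RANGES[name]
    return 100.0 * BETAS[name] * (value - lo) / MAX_EFFECT


def er_probability(patient):
    """Predicted ER probability from the logistic model (hypothetical intercept)."""
    logit = INTERCEPT + sum(BETAS[name] * patient[name] for name in BETAS)
    return 1.0 / (1.0 + math.exp(-logit))
```

Under these assumptions the deep learning output spans the full 100-point axis, which agrees with the paper's observation that it is the strongest predictor in the nomogram.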

Patient Characteristics
A total of 368 PDAC patients in Center 1 were selected and randomized to the development (n = 257) and internal validation (n = 111) datasets. The external validation dataset consisted of 67 patients from Center 2. Patient characteristics are summarized in Table 1.

Development and Validation of Deep Learning Model
The deep learning model outputs ranged from −1.68 to 1.01, with a median value of 0.18, in the development dataset. According to the median value of deep learning model outputs, all patients were assigned to high-risk (≥median value) and low-risk (<median value) groups. Kaplan-Meier analyses showed that patients in the high-risk group had lower recurrence-free survival (RFS) than those in the low-risk group in the development, internal, and external validation datasets (Figure 2A-C). Meanwhile, across the development, internal, and external validation datasets, patients who experienced ER had higher deep learning model outputs than those without (Figure 2D-F). The ROC curve analysis demonstrated that the deep learning model provided AUC values of 0.836, 0.736, and 0.720 in the development, internal, and external validation datasets for predicting ER, respectively (Figure 2G).
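For readers unfamiliar with the metric, the AUC can be computed directly from model outputs and ER labels as the probability that a randomly chosen ER patient receives a higher output than a randomly chosen non-ER patient, with ties counted as one half. The sketch below is a generic pairwise implementation with a hypothetical function name, not the statistical software used in the study.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count pairs where the ER patient scores higher; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of patients, which is acceptable for cohorts of a few hundred cases.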
Clinicopathological characteristics were also compared between the high-risk and low-risk groups. The high-risk group showed more aggressive tumor behavior, including higher CA19-9 level, advanced T stage, lymph node metastasis, larger tumor size, and poorer tumor differentiation (all p < 0.005). Additionally, in patients with vascular involvement, adjacent organ invasions occurred more frequently in high-risk groups (all p < 0.005). The detailed comparison results are provided in Table S1.
Moreover, we tested the prognostic value of the binary deep learning model outputs within each subgroup of patients. The results demonstrated significant differences in RFS between high- and low-risk patients in all subgroups (Figure 3).

Nomogram and Clinical Modeling
The multivariable logistic analysis of ER with deep learning model outputs and preoperative clinical factors is shown in Table 2. The results revealed three independent preoperative predictors of ER: deep learning model outputs, cN1/2 stage, and arterial involvement. Table 3 summarizes the predictive performance of all developed models. The nomogram yielded AUC values of 0.855, 0.752, and 0.741 in the development, internal, and external validation datasets, respectively (Figure 4B-D). DeLong's test showed that the nomogram outperformed the clinical model in the development dataset (p < 0.001) but was comparable in the internal (p = 0.293) and external (p = 0.364) validation datasets. The calibration curves of the nomogram showed good agreement in the development, internal, and external validation datasets (Figure 4E-G). The Hosmer-Lemeshow test further indicated good calibration of the nomogram for all datasets (p = 0.915, 0.797, and 0.367 for the development, internal, and external validation datasets, respectively).

Discussion
This study developed a CT-based deep learning model for predicting ER after upfront surgery in patients with PDAC. A subgroup analysis demonstrated that the median value of deep learning model outputs could stratify PDAC patients into high- and low-risk groups with significantly different prognoses. In addition, the combined nomogram integrating the deep learning model outputs and radiological variables further enhanced predictive ability.
Biomedical images contain information that reflects underlying tumor pathophysiology. Machine learning models based on image features have been increasingly explored for identifying high-risk PDAC patients. Radiological characteristics, such as suspicious lymph node metastasis and peripancreatic tumor infiltration, are well-known high-risk imaging features associated with early recurrence in PDAC patients [14,21]. Quantitative image analyses, such as radiomics and deep learning, represent a novel approach to decision support in oncology [22,23]. Radiomics features, such as kurtosis and grey-level non-uniformity, have the potential to indicate the presence of tumor heterogeneity. Studies [16,24] indicated that high values of these two features were linked to ER in PDAC. In addition, Sandrasegaran et al. [25] reported that high kurtosis and the mean value of positive pixels were predictors of worse overall survival in PDAC patients. Unlike radiomics, which relies on domain expertise to define features and requires separate feature extraction, feature selection, and machine learning modeling steps, deep learning is an end-to-end, one-step process that automatically learns effective features and simultaneously outputs predicted probability values. Lee et al. [19] developed an ensemble model integrating a series of deep learning and machine learning models that outperformed the AJCC staging system in predicting one-year RFS in PDAC patients. Yao et al. [18] proposed a 3D convolutional LSTM network using multiphase CECT data for predicting overall survival in PDAC patients. The multivariable analysis revealed that the deep learning score strongly predicted PDAC survival.
In our study, the modified 3D-ResNet-18 was used as the backbone network for deep learning model construction. The ResNet architecture allows the network depth to increase without degradation and can therefore improve its representation learning ability [26]. The multivariable logistic regression and the nomogram plot showed that the deep learning model outputs were the strongest predictor among all preoperative clinical variables. We further dichotomized all patients into high- and low-risk groups based on the median deep learning model output. The discrimination ability of the binary deep learning model outputs was validated in various clinical subgroups. In addition, when clinicopathological variables were compared between the high- and low-risk groups, the high-risk group demonstrated more aggressive tumor behavior. These results suggest that the deep learning model outputs may be a useful preoperative prognostic biomarker.
The multivariable logistic regression also found that artery involvement and cN stage were independent factors that synergistically predicted ER. PDAC with artery involvement has been reported to be associated with advanced tumor characteristics, a lower R0 resection rate, and a higher recurrence rate [27][28][29]. With respect to the cN stage, a number of studies [14,21,30] reported that the detection of lymph node metastasis on preoperative CT scans (cN1/2) was a useful predictor of ER. The nomogram integrating the deep learning model outputs and these two radiological variables showed improved predictive performance over the clinical model. Notably, the nomogram construction variables were all derived from preoperative CT imaging. The results indicate that combining deep learning and conventional radiological variables can integrate the advantages of both.
Our study has some limitations. First, the deep learning model was trained and validated on a relatively small dataset. Second, the retrospective design may have introduced selection bias and unknown confounding factors. Third, the primary tumor was manually segmented. Although manual segmentation is more accurate than automatic segmentation, the process is laborious and time-consuming. Therefore, an automatic segmentation and prediction deep learning model trained on a large dataset may improve efficiency and prediction performance.

Conclusions
We proposed a preoperative deep learning model and an integrated nomogram based on preoperative CT images that can noninvasively predict ER in PDAC patients. This may aid clinical decision-making regarding upfront resection or neoadjuvant treatment strategies in PDAC patients.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers15143543/s1, Figure S1: Inclusion criteria and exclusion criteria; Supplementary Method S1: CT image acquisition protocol of two centers; Table S1: The comparison results between high-risk and low-risk groups; Table S2: Univariate and multivariate logistic analyses of risk factors for recurrence without deep learning model outputs; Table S3: Selection process for the clinical model.

Conflicts of Interest:
The authors declare no conflict of interest.