Validation of the Surgical Outcome Risk Tool (SORT) and SORT v2 for Predicting Postoperative Mortality in Patients with Pancreatic Cancer Undergoing Surgery

Background: Pancreatic cancer surgery is associated with significant mortality, thus necessitating the accurate assessment of perioperative risk to enhance treatment decision-making. The Surgical Outcome Risk Tool (SORT) and SORT v2 have been developed to provide enhanced risk stratification. Our aim was to validate the accuracy of SORT and SORT v2 in pancreatic cancer surgery. Method: Two hundred and twelve patients who underwent pancreatic surgery for cancer were included. The surgeries were performed by a single surgical team in a single tertiary hospital (2016–2022). We assessed a total of four risk models: SORT, SORT v2, POSSUM (Physiology and Operative Severity Score for the enUmeration of Mortality and Morbidity), and P-POSSUM (Portsmouth-POSSUM). The accuracy of each model was evaluated using the observed-to-expected (O:E) ratio and the area under the curve (AUC). Results: The 30-day mortality rate was 3.3% (7 patients). Both SORT and SORT v2 demonstrated excellent discrimination (AUC: 0.98 and AUC: 0.98, respectively) and provided the best-performing calibration in the total analysis. However, both tools underestimated the 30-day mortality. Furthermore, both reported a high level of calibration and discrimination in the subgroup of patients undergoing pancreaticoduodenectomy, with previous ERCP, and CA19-9 ≥ 500 U/mL. Conclusions: SORT and SORT v2 are efficient risk-assessment tools that should be adopted in the perioperative pathway, shared decision-making (SDM) process, and counseling of patients with pancreatic cancer undergoing surgery.


Introduction
Pancreatic cancer (PC) represents a major cancer-related cause of death and is currently the fourth most common cause of cancer-related mortality in the USA [1,2]. Most cases of PC are adenocarcinomas (PDAC) and are commonly located in the pancreatic head or neck [3,4]. Despite important advances in anticancer research, PC-associated mortality continues to rise and the prognosis remains poor. It is projected that by 2030, PC will represent the second-highest cancer-related cause of mortality [5,6], with most patients undergoing potentially curative surgery. The treatment strategy for pancreatic cancer should be multidisciplinary, including regimens of chemo- and radiotherapy in conjunction with surgery [7]. On this basis, there is an urgent need for an accurate assessment of the patient's perioperative risk to facilitate shared decision-making (SDM) and the informed consent process while raising the standards of clinical practice quality on the perioperative pathway. In addition, the adoption of a specific and sensitive risk-stratification tool allows for the accurate comparative evaluation of surgical results among institutions, departments, and surgeons for either service evaluation or clinical audit. Several such tools have been implemented into clinical practice [8]. Despite the increasing interest in more advanced risk-stratification tools, risk prediction models remain the most easily accessible choice for this purpose. Nonetheless, they are not frequently employed in everyday practice, potentially due to poor awareness amongst clinicians and concerns about their accuracy and complexity [9].
The Surgical Outcome Risk Tool (SORT) was proposed following the 2011 National Confidential Enquiry into Patient Outcome and Death (NCEPOD) report [9]. It was developed with the goal of providing an enhanced level of risk stratification for surgical patients in a user-friendly manner [9]. To remain user-friendly, SORT utilizes only six clinical data variables [9]. It has compared favorably with other previously validated risk-stratification tools, such as the ASA physical status (ASA PS) grade, and has been externally validated in groups of patients undergoing hip fracture surgery [10] and colorectal surgery [11]. In both groups [10,11], SORT was associated with acceptable discrimination and calibration levels.
Our previous study, which reported preliminary outcomes [12], was the first to validate SORT in patients undergoing surgery for pancreatic cancer, but we did not perform a comparison with other traditional risk-stratification tools. Furthermore, the number of patients included in that study [12] was limited. In addition, an updated version of SORT (SORT v2) has been developed that takes into consideration the physician's risk estimation of the surgery [13]. In this context, the present study aimed to validate the SORT and SORT v2 models in adult patients undergoing surgery for pancreatic cancer and compare them with other traditional risk prediction models.

Data Extraction Strategy
The current study was performed according to a protocol designed and agreed upon by all authors. Data were extracted from a prospectively maintained database of consecutive patients with pancreatic cancer who underwent surgery between 1 January 2015 and 31 August 2022. All procedures were performed by a single surgical team led by the senior author (D.Z.) at the Department of Surgery, University Hospital of Larissa, Greece. Ethical approval was obtained from the Scientific Committee of the hospital (Protocol number: 50271/30- [10][11][12][13][14][15][16][17][18][19]. Informed consent was waived based on the retrospective nature of the present study. We extracted and included data regarding age, gender, body mass index (BMI), ASA (American Society of Anesthesiologists) grade, history of previous operations, operative priority, surgical severity, malignancy status, staging, and type of procedure. We defined mortality as any patient death that occurred during the first 30 days or during the hospital stay if longer than 30 days. The predicted risk of mortality was determined using the SORT and SORT v2 models. Moreover, the predicted mortality was calculated by employing POSSUM and P-POSSUM for all patients. No imputation methods were employed for missing data; patients with incomplete data were excluded from the analysis.
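The mortality definition above can be expressed as a small rule. The following is a minimal sketch, not the study's actual extraction code: the function name, its arguments, and the convention that a missing discharge date means the patient was never discharged are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def is_perioperative_death(
    surgery: date,
    death: Optional[date],
    discharge: Optional[date],
) -> bool:
    """Return True if a death counts as perioperative mortality.

    Rule (from the text): death within 30 days of surgery, or death
    during the index hospital stay when that stay exceeds 30 days.
    Assumption: discharge=None means the patient was never discharged.
    """
    if death is None:
        return False  # patient is alive, no event
    days_to_death = (death - surgery).days
    if days_to_death < 0:
        raise ValueError("death date precedes surgery date")
    if days_to_death <= 30:
        return True  # death within the 30-day window
    # Beyond day 30: counts only if the patient was still an inpatient.
    return discharge is None or death <= discharge


# Hypothetical example: death on day 45 of a prolonged admission.
print(is_perioperative_death(date(2020, 1, 1), date(2020, 2, 15), date(2020, 2, 15)))
```

Under this sketch, a death on day 45 after an uneventful discharge on day 20 would not be counted, whereas the same death during a continuous admission would be.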
To assess the accuracy of each model, we performed separate sensitivity analyses. These additional analyses were performed to evaluate the discrimination and calibration traits of each model in predicting the perioperative mortality risk based on (1) a procedure-related variable: surgical operation (pancreaticoduodenectomy, total pancreatectomy, or distal pancreatectomy); (2) cancer-related variables: CA19-9 levels (≥500 U/mL vs. <500 U/mL) and neoadjuvant treatment (received or not); and (3) patient-related variables: age (≥70 vs. <70 years), pre-operative ERCP (yes or no), and postoperative pancreatic fistula (POPF) (yes or no). The risk for POPF was assessed using the formula described by Weng et al. [14]. We employed these variables given that they might affect postoperative mortality.

Primary and Secondary Endpoints
The validation of the SORT and SORT v2 models in adult patients with PC undergoing surgery was set as the primary endpoint of the present study. Secondary endpoints included (1) the comparison of SORT and SORT v2 with the POSSUM and P-POSSUM models regarding their discrimination and calibration traits in predicting perioperative mortality and (2) a subgroup sensitivity analysis.

Statistical Analysis
The SORT score was calculated using the method and web platform developed and proposed by Protopapa et al. [9], together with the updated version incorporating subjective information to calculate the SORT v2 score [13]. The SORT and SORT v2 models implement six variables: ASA physical status, operative priority level (elective, urgent, immediate), surgical specialty (gastrointestinal, thoracic, or vascular surgery), surgical severity (major/complex), malignancy status, and age (65-79 or ≥80 years). Surgical severity is calculated automatically upon the entry of procedure details. According to the developers' guidelines, if the procedure performed is not listed, the nearest available procedure is used for calculation [13]. The procedures from the list we used were "total pancreatectomy" and "distal pancreatectomy", both associated with major severity. SORT v2 also implements the physician's perceived mortality risk [13]. The POSSUM and P-POSSUM scores were calculated by employing the methods proposed by Copeland [15] and Prytherch [16], respectively.
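For orientation, SORT was published as a logistic regression model, so the predicted risk takes the general logistic form sketched below. The symbols are introduced here for illustration only: the x_i stand for the indicator-coded clinical variables, and the coefficients are those of the published models [9,13], which are not reproduced here.

```latex
\hat{p}_{\text{death}} = \frac{1}{1 + e^{-z}},
\qquad
z = \beta_0 + \sum_{i=1}^{k} \beta_i x_i
```

In practice, the web platform performs this calculation once the clinical variables have been entered.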
Discrimination (the ability to distinguish patients who died from patients who did not die) and calibration (the ability to successfully predict the mortality rate) traits of the SORT and SORT v2 models were assessed. Discrimination was assessed by producing receiver-operating characteristic (ROC) curves and calculating the area under the ROC curve (AUC). Each AUC was reported with its 95% confidence interval, and AUCs were compared by employing nonparametric paired tests, as described by DeLong [17]. The model discrimination was defined as poor, fair, or excellent when the AUC was <0.70, 0.70-0.79, and 0.80-1.00, respectively [17].
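As an illustration of the discrimination measure, the AUC can be computed as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below is not the GraphPad Prism procedure used in the study; the function names and the toy data are hypothetical, and the confidence interval and DeLong comparison are omitted.

```python
from typing import Sequence


def auc_mann_whitney(risks: Sequence[float], died: Sequence[bool]) -> float:
    """AUC via pairwise comparison of deaths against survivors."""
    dead = [r for r, d in zip(risks, died) if d]
    alive = [r for r, d in zip(risks, died) if not d]
    if not dead or not alive:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = 0.0
    for p in dead:
        for q in alive:
            if p > q:
                wins += 1.0   # death ranked above survivor
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(dead) * len(alive))


def discrimination_label(auc: float) -> str:
    """Categories as defined in the text: <0.70, 0.70-0.79, 0.80-1.00."""
    if auc < 0.70:
        return "poor"
    if auc < 0.80:
        return "fair"
    return "excellent"


# Hypothetical predicted risks and outcomes for four patients.
toy_risks = [0.10, 0.40, 0.35, 0.80]
toy_died = [False, False, True, True]
auc = auc_mann_whitney(toy_risks, toy_died)
print(auc, discrimination_label(auc))
```

The pairwise formulation is quadratic in the number of patients, which is acceptable for a cohort of this size and keeps the definition transparent.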
The calibration was calculated for each included model by measuring the expected mortality and then comparing it with the observed mortality. An observed-to-expected ratio of 1 represented perfect accuracy, a ratio < 1 represented an overestimation of mortality rate, and a ratio of >1 demonstrated an underprediction. Furthermore, calibration was also assessed by employing the Hosmer-Lemeshow (H-L) goodness of fit test, with a lack of fit defined as a p-value ≤ 0.05 [18]. In cases where the outcome variable separated the predictor variable completely, a perfect separation was described.
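The two calibration measures described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the software routine used in the study: the function names, the equal-size grouping by sorted risk, and the toy data are hypothetical. Only the H-L statistic is computed; the p-value would be obtained from a chi-square distribution, conventionally with (groups - 2) degrees of freedom.

```python
from typing import Sequence


def observed_to_expected(risks: Sequence[float], died: Sequence[bool]) -> float:
    """O:E ratio: observed deaths divided by the sum of predicted risks."""
    expected = sum(risks)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return sum(1 for d in died if d) / expected


def interpret_oe(ratio: float) -> str:
    """Interpretation used in the text: >1 under-, <1 overestimation."""
    if ratio > 1:
        return "underestimates mortality"
    if ratio < 1:
        return "overestimates mortality"
    return "perfectly calibrated"


def hosmer_lemeshow_statistic(
    risks: Sequence[float], died: Sequence[bool], groups: int = 10
) -> float:
    """H-L chi-square statistic over groups of patients sorted by risk."""
    pairs = sorted(zip(risks, died))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        exp_deaths = sum(r for r, _ in chunk)
        obs_deaths = sum(1 for _, d in chunk if d)
        exp_alive = len(chunk) - exp_deaths
        obs_alive = len(chunk) - obs_deaths
        if exp_deaths > 0:
            stat += (obs_deaths - exp_deaths) ** 2 / exp_deaths
        if exp_alive > 0:
            stat += (obs_alive - exp_alive) ** 2 / exp_alive
    return stat


# Hypothetical cohort: 50 patients at 2% predicted risk, 2 observed deaths.
toy_risks = [0.02] * 50
toy_died = [True, True] + [False] * 48
ratio = observed_to_expected(toy_risks, toy_died)
print(round(ratio, 2), interpret_oe(ratio))
```

In the toy cohort, one death is expected and two are observed, so the ratio is approximately 2 and the model underestimates mortality.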
All extracted data were tabulated using Microsoft® Excel 16.61 (Microsoft, Redmond, WA, USA) and were analyzed by employing GraphPad Prism® 9.3.1 for Mac (GraphPad Software, San Diego, CA, USA).

Baseline Patient Characteristics
The findings of the current study are presented in accordance with the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines [19]. The study flowchart, which demonstrates the data extraction strategy, is presented in Figure 1. In total, 252 patients were screened, and 212 patients were finally included. The patients' baseline characteristics are presented in Table 1. The cohort included 78 (36.8%) female patients, and the mean age was 67.2 (standard deviation (SD): 10.5) years. Most of the cases presented with a resectable tumor (71.7%) and underwent an elective procedure (91.5%). The tumor was located primarily in the pancreatic head (180 patients, 84.9%). Most of the cases were PDAC (190 patients, 89.6%), with a mean CA19-9 of 502.9 (SD: 1136) U/mL. A total of 178 (84%) patients underwent pancreaticoduodenectomy, sixteen (7.5%) a total pancreatectomy, and eighteen (8.5%) a distal pancreatectomy. Finally, the overall 30-day mortality rate was 3.3%.

Performance of SORT and SORT v2 Models in the Total Dataset
The performance of SORT is presented in Table 2.

Performance of Mortality Prediction Models in Subgroups
The outcomes derived from the subgroup analysis are shown in Table 3 and Figure 3. The SORT and SORT v2 models demonstrated an excellent discrimination level in predicting perioperative mortality in all subgroups. In certain subgroups, SORT and SORT v2 models demonstrated a perfect separation, which is translated into a perfect prediction of mortality (Table 3). Furthermore, POSSUM and P-POSSUM were inferior in terms of the discrimination level in most of the subgroups when compared with SORT and SORT v2. In addition, SORT demonstrated a high level of calibration in all subgroups, with the lowest value reported in patients undergoing pancreaticoduodenectomy with high levels of CA19-9 and a previous ERCP. In all subgroup analyses except "ERCP or No ERCP", SORT and SORT v2 underestimated the perioperative mortality.
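Perfect separation, as reported for certain subgroups, means that every patient who died received a higher predicted risk than every survivor, which corresponds to an AUC of 1. With few deaths in a small subgroup, this can occur more readily than in the total dataset. The check below is a minimal sketch; the function name and the toy data are hypothetical.

```python
from typing import Sequence


def is_perfectly_separated(risks: Sequence[float], died: Sequence[bool]) -> bool:
    """True if the lowest risk among deaths exceeds the highest among survivors."""
    dead = [r for r, d in zip(risks, died) if d]
    alive = [r for r, d in zip(risks, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    return min(dead) > max(alive)


# Hypothetical subgroup: one death with the highest predicted risk.
print(is_perfectly_separated([0.01, 0.02, 0.05, 0.30], [False, False, False, True]))
```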

Discussion
The current study represents the first attempt to (1) validate the SORT and SORT v2 models in PC surgery, (2) compare them with traditional risk models such as POSSUM and P-POSSUM, and (3) perform a sensitivity subgroup analysis. This study also represents the first external validation of SORT v2 currently provided in the literature, and specifically in patients undergoing surgery for PC. The outcomes provided by the present study directly affect daily clinical practice, suggesting the potential value of SORT and SORT v2 in the perioperative pathway and during the counseling and shared decision-making (SDM) processes for patients with PC scheduled for surgery.
SORT remains a useful and probably the most user-friendly risk-stratification tool. It was developed by Protopapa et al. [9], who aimed to accurately predict the 30-day mortality in an objective manner. Their study demonstrated that six pre-operatively available clinical variables could efficiently predict postoperative mortality with a higher accuracy compared to other traditional risk assessment tools, such as ASA-PS [9]. In the same context, SORT v2 was proposed as an enhanced version of the original SORT as it implements the physician's perception of the perioperative mortality risk [13]. Other risk-stratification tools that have been implemented in clinical practice and were included for comparison in the current study are POSSUM and P-POSSUM. Given that these tools have been used in the SDM process, it was important to compare them with SORT and SORT v2. In addition, according to recent evidence [15], traditional risk-stratification tools, such as POSSUM and P-POSSUM, were associated with poor accuracy, while new models are required to provide enhanced calibration and discrimination traits, according to findings derived from prospectively collected data [15]. Our outcomes provide a response to this call for enhanced risk-stratification models in the setting of PC surgery. SORT and SORT v2 demonstrated the best-performing discrimination and calibration characteristics compared with all other risk-stratification models assessed in the present study. Our outcomes not only agree with the preliminary outcomes of our previous study [12] but also highlight the superiority of both tools compared with POSSUM and P-POSSUM and validate SORT v2 for the first time. In this context, the outcomes of this study have direct implications for the SDM process of patients with PC regarding their postoperative mortality risk, thus helping patients to co-shape their treatment strategy.
The efficiency of both SORT and SORT v2 was also demonstrated in the sensitivity subgroup analyses. SORT and SORT v2 were associated with excellent discrimination traits and enhanced calibration. However, we should further stress our comparative outcomes regarding patients undergoing pancreaticoduodenectomy with raised levels of CA19-9 and pre-operative ERCP. In this group, SORT and SORT v2 demonstrated excellent calibration and discrimination traits and showed significantly lower H-L values compared to POSSUM and P-POSSUM. Patients with these baseline characteristics represent the most difficult cases faced by our HPB multi-disciplinary teams. These are commonly symptomatic patients, diagnosed through a thorough diagnostic workup after presenting with jaundice. At that stage, they commonly present CA19-9 levels over 500 U/mL, thus demonstrating an aggressive tumor biology, although the tumor is borderline resectable in most of these cases. They also commonly undergo ERCP stenting to alleviate jaundice prior to surgery, especially in cases in which neoadjuvant treatment is chosen. In this context, it is of great importance to have access to such an effective and reliable risk-stratification tool during the MDT meetings when such complex cases are discussed, in addition to during the patients' counseling process.
We did not find a significant difference between SORT and SORT v2, suggesting that the physicians' estimation of perioperative mortality risk does not significantly affect the original SORT outcomes. Nonetheless, in all analyses, SORT v2 demonstrated slightly better discrimination and calibration compared with SORT. Consequently, it would be interesting to investigate whether there is a discrepancy between SORT and SORT v2 in patients undergoing pancreaticoduodenectomy with vascular reconstruction. Despite our original intention to perform such a subgroup analysis, too few patients underwent pancreaticoduodenectomy with major vascular reconstruction to allow further analyses. This clinically relevant question therefore requires further investigation by a future study mainly focusing on complex cases. Moreover, the findings of the current study regarding the value of clinical variables employed by SORT remain in accordance with the evidence provided by administrative datasets [20]. Finally, according to our outcomes, SORT and SORT v2 are associated with higher accuracy compared with other pre-operative (BH 2009-Barwon Health 2009) [21] and intraoperative risk-stratification tools (SAS-Surgical Apgar Score) [22], while remaining user-friendly as they implement six clinical variables.
Although POSSUM and P-POSSUM have been extensively validated [2], SORT and SORT v2 have certain advantages. To begin, both tools incorporate only six pre-operative variables, significantly fewer compared with the eighteen perioperative variables of POSSUM and P-POSSUM. They are thus significantly easier to implement in real-life clinical practice. Moreover, POSSUM and P-POSSUM include intra- and postoperative variables that are not available during the pre-operative assessment. Finally, (P-)POSSUM contains certain subjective variables, thus increasing the interobserver variability and heterogeneity and posing a certain bias.
The current study is associated with certain limitations. One limitation is associated with the study design, given that it is a single-institution retrospective study. Nonetheless, it should be noted that all data were prospectively collected, the patients were consecutive, the surgical team remained the same, and the surgeon's bias regarding patient or surgical approach selection was minimized as this was decided based on MDT suggestions and patients' choices after extensive counseling. In addition, given that one of the most important postoperative complications associated with high morbidity and mortality in pancreatic surgery is POPF, there is a certain limitation related to the lack of this variable in the formulas of all the risk-stratification tools implemented in the present study.
The current outcomes demonstrate that SORT and SORT v2 are feasible, user-friendly, and efficient risk-stratification tools that should be implemented in the pre-operative counseling and SDM process of patients with PC undergoing surgery, thus enhancing clinical quality in a cost-effective manner. In addition, they are useful instruments to be taken into consideration during multidisciplinary meetings when examining complex cases associated with comorbidities and frailty.

Conclusions
In the present study, we validated the SORT and SORT v2 risk-stratification models in adult patients undergoing surgery for pancreatic cancer. Both tools demonstrated the best-performing discrimination and calibration compared with POSSUM and P-POSSUM. The value of SORT and SORT v2 was further confirmed by sensitivity subgroup analyses. Both tools are associated with excellent discrimination and calibration, especially in patients with PC undergoing pancreaticoduodenectomy with pre-operative ERCP and CA19-9 levels over 500 U/mL. SORT represents a feasible and efficient risk-stratification tool that can be easily implemented in the perioperative pathway of patients with PC.