1. Introduction
Peptic ulcer disease (PUD) is one of the most prevalent gastrointestinal diseases worldwide and is characterized by a break in the mucosa that extends through the muscularis mucosae. It was traditionally attributed to acid hypersecretion, dietary factors and stress, although several other risk factors are now known to contribute, namely smoking, alcohol abuse, Helicobacter pylori infection, advanced age and prolonged nonsteroidal anti-inflammatory drug (NSAID) use [1,2,3,4]. Despite major advances in the management of PUD over the past decades, including the widespread use of proton pump inhibitors (PPIs) and eradication therapy for Helicobacter pylori, its lifetime prevalence remains approximately 10% [5,6], and complications such as bleeding, perforation and gastric outlet obstruction continue to place a significant burden on health systems worldwide [2,3,4].
Hemorrhage is by far the most common of these complications, followed by perforation, at a ratio of 6:1. Nevertheless, perforation is the most feared complication and the most common reason for emergency surgery in patients with PUD [1]. It occurs in approximately 2–10% of patients with PUD, carries a lifetime risk of 5% and accounts for up to 70% of ulcer-related deaths [1,7]. Perforated peptic ulcer (PPU) is a life-threatening surgical emergency; patients usually present with sudden-onset severe epigastric pain, a rigid abdomen and radiological evidence of free intraperitoneal air and fluid. The reported 30-day mortality for PPU ranges from 9% to 30% [8,9,10,11,12,13], and morbidity is considerably higher, at 31–63% [8,10,13]. Mortality increases markedly in elderly patients and in those with delayed presentation, severe comorbidities, septic shock or organ failure [14,15].
Over recent decades, the surgical management of PPU has shifted significantly towards minimally invasive approaches such as laparoscopic repair, which reduces postoperative pain, length of hospital stay and wound complications without increasing morbidity or mortality. Nonetheless, the choice between open and laparoscopic repair depends heavily on the patient's physiological status, surgeon expertise and institutional resources. Advances in perioperative care and sepsis control have also contributed to improved survival and fewer complications [1,16,17,18].
Despite these advances in critical care and minimally invasive surgery, PPU outcomes remain strongly influenced by patient-specific factors (age, comorbidities, sex, lifestyle) and ulcer-related factors (site, size, duration) [19,20]. Prompt diagnosis and treatment are therefore key to keeping morbidity and mortality to a minimum [21,22], and recognizing which patients are at greatest risk of complications or death is of great importance. Several scoring systems have been developed to predict prognosis in this setting, of which the Boey score, the PULP (peptic ulcer perforation) score and the ASA (American Society of Anesthesiologists) physical status classification are among the most widely used. The Boey score, one of the earliest and most widely used of these systems, uses only three variables and thus stands out for its simplicity [23,24]. Conversely, the more recently developed PULP score comprises eight variables and may offer a more comprehensive assessment of the patient's status [19,25]. Meanwhile, the ASA classification, although not specific to PPU, is one of the most widely used systems for evaluating preoperative physical status in surgical patients [23].
Evidence on the utility of these scoring systems for predicting mortality in PPU is limited, and studies of morbidity prediction are even scarcer and smaller in scale.
This study aims to compare the predictive value of these three scores for morbidity and mortality in patients with PPU at our hospital. A better understanding of their potential and limitations may help clinicians improve risk stratification and decision-making, optimize treatment strategies and, ultimately, improve patient outcomes.
2. Materials and Methods
This single-center retrospective cohort study was undertaken at Hospital do Divino Espírito Santo de Ponta Delgada, EPE, in Portugal. With 450 beds, it is the largest hospital in the nine-island archipelago of the Azores, serving a population of over 150,000 people.
We included all patients who underwent surgery for PPU at our center over a 5-year period, from 1 January 2020 to 31 December 2024. Patients were identified using surgical logbooks and the electronic clinical records of the General Surgery Department database, which includes all patients operated on by any member of the department. Seventy-eight patients who underwent surgery for a perforated ulcer were identified; 2 were excluded because the ulcer was malignant, leaving 76 patients for analysis. Patients managed non-operatively were not considered, as they are not recorded in the surgical logbooks. There was no age restriction.
Diagnosis was based mainly on clinical signs, laboratory values and imaging (radiography or computed tomography (CT)). The standard procedure for PPU at our institution is primary closure with interrupted sutures covered by a pedicled omentoplasty, although this may vary according to surgeon preference and intraoperative findings. Occasionally, when primary closure is not possible because of tissue tension, a pedicled omental patch is used instead. Most procedures are performed laparoscopically; however, in selected cases where local conditions prove too difficult, conversion to laparotomy may be required. In hemodynamically unstable patients, an open approach may be chosen from the outset. All patients received intravenous antibiotics and high-dose PPI (40 mg twice daily).
The main outcome measures were 30-day mortality and morbidity.
Patients' electronic records were reviewed for demographics, comorbidities, vital signs in the emergency room, time from symptom onset to surgery, preoperative laboratory tests, ASA classification, complications and mortality.
From these data, the Boey, PULP and ASA scores were calculated for each patient. All clinical and laboratory values required for the scores were available, and no data were missing.
The Boey score assigns one point each for the presence of shock, a delay from symptom onset to surgery > 24 h and the presence of a major medical illness, giving a total of 0–3 (Table 1).
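Because the Boey score is an unweighted count of three yes/no criteria, it can be computed mechanically once each criterion has been dichotomized. The Python sketch below is illustrative only: the class and function names are hypothetical, and it assumes each input has already been assessed using the study's definitions of shock and major medical illness.

```python
from dataclasses import dataclass


@dataclass
class BoeyInputs:
    shock: bool                  # shock on admission, per the study's definition
    delay_over_24h: bool         # symptom onset to surgery > 24 h
    major_medical_illness: bool  # any comorbidity counted as a major medical illness


def boey_score(p: BoeyInputs) -> int:
    """Return the Boey score (0-3): one point per risk factor present."""
    return int(p.shock) + int(p.delay_over_24h) + int(p.major_medical_illness)


# Hypothetical patient: delayed presentation with a major illness, no shock
example = BoeyInputs(shock=False, delay_over_24h=True, major_medical_illness=True)
print(boey_score(example))  # 2
```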
The PULP score was determined based on age > 65 years, cirrhosis, AIDS/active cancer, concomitant use of steroids, presence of shock, time from symptom onset to surgery > 24 h, serum creatinine > 1.5 mg/dL and ASA classification [19] (Table 2).
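Unlike the Boey score, the PULP score weights its criteria unequally. Table 2 is not reproduced in this section, so the Python sketch below ASSUMES point weights matching the original PULP publication by Møller et al. [19] (age > 65 years: 3; cirrhosis: 2; AIDS/active cancer: 1; steroids: 1; shock: 1; delay > 24 h: 1; raised creatinine: 2; ASA II/III/IV/V: 1/3/5/7, for a range of 0–18). These weights should be checked against Table 2 before any use. The creatinine threshold of 1.5 mg/dL is the one stated above; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: weights taken to match the original PULP score (Moller et al.);
# verify against Table 2 before use.
ASA_POINTS = {1: 0, 2: 1, 3: 3, 4: 5, 5: 7}


@dataclass
class PulpInputs:
    age: int
    cirrhosis: bool
    aids_or_active_cancer: bool
    steroid_use: bool
    shock: bool
    delay_over_24h: bool
    creatinine_mg_dl: float
    asa_class: int  # ASA I-V entered as 1-5


def pulp_score(p: PulpInputs) -> int:
    """Return the PULP score (0-18) under the assumed weights above."""
    score = 3 if p.age > 65 else 0
    score += 2 if p.cirrhosis else 0
    score += 1 if p.aids_or_active_cancer else 0
    score += 1 if p.steroid_use else 0
    score += 1 if p.shock else 0
    score += 1 if p.delay_over_24h else 0
    score += 2 if p.creatinine_mg_dl > 1.5 else 0  # threshold used in this study
    score += ASA_POINTS[p.asa_class]
    return score


# Hypothetical patient: 70 years old, ASA III, no other risk factors
patient = PulpInputs(age=70, cirrhosis=False, aids_or_active_cancer=False,
                     steroid_use=False, shock=False, delay_over_24h=False,
                     creatinine_mg_dl=1.0, asa_class=3)
print(pulp_score(patient))  # 6
```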
In contrast to the original Boey score, which defined shock solely as a blood pressure < 100 mmHg [24], the present study defined shock as both a blood pressure < 100 mmHg and a heart rate > 100 beats per minute. This definition aligns with the criteria used in the PULP score and reflects current clinical practice. Major medical illnesses included active malignant disease, acquired immunodeficiency syndrome (AIDS), chronic obstructive pulmonary disease (COPD), diabetes mellitus, heart disease, liver cirrhosis or other chronic disease, whereas the original Boey definition included only cardiorespiratory disease, renal failure, diabetes or hepatic pre-coma [24]. This change was made to reflect the current clinical understanding of conditions that significantly affect surgical outcomes and recovery. Both modifications also served to standardize the variables shared by the Boey and PULP scores.
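In practice, the modified shock definition is stricter: because it requires both hypotension and tachycardia, a hypotensive patient without tachycardia counts as shocked under the original Boey criterion but not under the definition used here. A minimal Python sketch with hypothetical function names, assuming both vital signs are the values recorded in the emergency room:

```python
def shock_original_boey(bp_mmhg: float) -> bool:
    """Original Boey criterion: blood pressure < 100 mmHg alone."""
    return bp_mmhg < 100


def shock_this_study(bp_mmhg: float, heart_rate_bpm: float) -> bool:
    """Definition used in this study (as in PULP): BP < 100 mmHg AND HR > 100 bpm."""
    return bp_mmhg < 100 and heart_rate_bpm > 100


# Hypothetical patient: hypotensive (90 mmHg) but not tachycardic (80 bpm)
print(shock_original_boey(90), shock_this_study(90, 80))  # True False
```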
The ASA classification considers the patient's pre-existing comorbidities and present clinical condition, classifying patients as ASA I (a normal healthy patient), ASA II (mild systemic disease), ASA III (severe systemic disease), ASA IV (severe systemic disease that is a constant threat to life) or ASA V (a moribund patient who is not expected to survive without the operation) [26] (Table 3).
Complications were defined as any deviation from the expected postoperative course within 30 days of the index surgery, graded according to the Clavien–Dindo classification [27], and mortality as death within 30 days of the index surgery.
Statistical analyses were performed using IBM SPSS 29.0.1.0 (171), and a p-value < 0.05 was considered statistically significant. Continuous variables were assessed for normality using the Shapiro–Wilk test.
The predictive performance of the Boey, PULP and ASA scores for 30-day mortality and morbidity was evaluated using receiver operating characteristic (ROC) curve analysis. The area under the ROC curve (AUC) and its 95% confidence interval (CI) were calculated to assess discriminatory ability. ROC analysis assumptions were verified by ensuring binary outcome variables, independence of observations and appropriate ordering of test values. ROC analysis was considered appropriate despite the small number of mortality events, and this limitation was taken into account when interpreting AUC estimates. Formal comparisons of AUCs were considered but not performed, owing to the small sample size and the low number of mortality events; differences in predictive performance between scores were therefore assessed descriptively, based on AUC values and diagnostic performance metrics.
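For an ordinal score, the nonparametric AUC has a simple interpretation: the probability that a randomly chosen patient with the outcome has a higher score than a randomly chosen patient without it, with tied scores counted as one half. This equals the area under the empirical (trapezoidal) ROC curve. The Python sketch below illustrates the calculation on hypothetical toy data; it is not SPSS's implementation and does not compute the 95% CI.

```python
from itertools import product


def auc_mann_whitney(scores, outcomes):
    """AUC = P(score of a random event > score of a random non-event),
    with ties counted as 1/2 (equal to the empirical trapezoidal ROC area)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every event/non-event pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: integer scores and outcome (1 = death, 0 = survival)
scores = [0, 0, 1, 1, 2, 2, 3, 0, 1, 2]
outcomes = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(auc_mann_whitney(scores, outcomes), 3))  # 0.952
```

Integer scores produce many ties, which is why the half-credit rule matters here; ignoring ties would understate the AUC.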
The optimal cutoff point for each score was defined as the value that maximized Youden's index (J = sensitivity + specificity − 1), i.e., the highest sum of sensitivity and specificity. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were then calculated at these cutoffs.
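Because the scores take integer values, candidate thresholds lie halfway between adjacent observed values, which presumably explains why the cutoffs reported in this study end in .5 (e.g., 1.5 for Boey, 5.5 for PULP). The Python sketch below, on hypothetical toy data, scans these midpoints, keeps the one with the highest Youden's J and reports the four metrics at that cutoff. It assumes that a score at or above the cutoff counts as a positive (high-risk) result; the function names are illustrative.

```python
def metrics_at(cutoff, scores, outcomes):
    """Sensitivity, specificity, PPV and NPV when score >= cutoff is 'positive'."""
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
    }


def youden_cutoff(scores, outcomes):
    """Return (cutoff, metrics) maximizing J = sensitivity + specificity - 1.

    Candidates are midpoints between adjacent observed score values;
    on ties in J, max() keeps the first (lowest) candidate.
    """
    values = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def j(c):
        m = metrics_at(c, scores, outcomes)
        return m["sensitivity"] + m["specificity"] - 1

    best = max(candidates, key=j)
    return best, metrics_at(best, scores, outcomes)


# Hypothetical toy data: integer scores and outcome (1 = event, 0 = no event)
scores = [0, 0, 1, 1, 2, 2, 3, 0, 1, 2]
outcomes = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]
cutoff, m = youden_cutoff(scores, outcomes)
# cutoff 1.5: sensitivity 1.0, specificity 6/7, PPV 0.75, NPV 1.0
print(cutoff, m)
```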
4. Discussion
In the present study, the 30-day mortality rate was 7% and the morbidity rate 32%. Mortality was therefore below, and morbidity at the lower end of, the ranges reported in the literature: approximately 9–30% for mortality [8,9,10,11,12,13] and 31–63% for morbidity [8,10,13]. Studies with higher mortality and morbidity rates often attribute them to patient-related factors, including advanced age, shock, severe comorbidities (such as renal dysfunction or malignancy) or a predominantly female population (especially postmenopausal women, in whom the loss of estrogen's protective effect on the gastric mucosa and greater exposure to chronic NSAID use for osteoporosis-related and rheumatologic pain contribute to PUD and its severity) [8,9,10,13]. The low mortality rate observed in this study, even though shock, major medical illness and other comorbidities were present in a significant subset of patients, may reflect a younger population (median age 49.9 years), as elderly patients are at higher risk of complications and death. The predominance of male patients likely also contributed. These findings may further reflect the quality of care, including a high rate of minimally invasive surgery (87% laparoscopy), which has been shown to reduce complications and recovery time. This rate contrasts with that reported in mainland Portugal by Pereira et al., in a study at the country's second-largest center, where 29% of 169 patients treated between 2012 and 2019 underwent a fully laparoscopic approach [16]. It is also higher than the rates reported in Italy by Costa et al., where 44% of patients underwent fully laparoscopic primary repair [17], and in South Korea by Kim et al., where laparoscopy was performed in 59% of 598 patients [18]. On the other hand, our laparoscopy rate may also reflect a selection bias inherent to our population, which consists mainly of younger patients who are likely in better clinical condition and have greater physiological reserve. Overall, these results suggest that mortality can be minimized in promptly treated populations managed with minimally invasive surgery. However, even in a lower-risk setting, clinical scores remain important for identifying patients at risk of poor outcomes and acting accordingly.
Some small-scale studies have already investigated the ability of these scores to predict mortality, but few have addressed morbidity. In the original study, Boey et al. [24] showed a progressive increase in mortality with higher scores: 1.5%, 14.4%, 32.1% and 100% for scores of 0, 1, 2 and 3, respectively. These results were further validated by Lee et al. [21] in a study of 436 patients, who reported mortality rising with increasing Boey scores. However, the score did not predict morbidity in Lee et al.'s cohort, suggesting that while it effectively identifies patients at risk of death, postoperative complications are likely influenced by factors the score does not capture. Agarwal et al. reported similar results in a cohort of 180 patients in northwestern India [28]. More recent studies continue to support these conclusions, including that of Ghobashy et al. [29], who reported an AUC of 0.866 for mortality prediction with the Boey score in a cohort of 52 Egyptian patients.
The PULP score, the more recent of the two PPU-specific scores, has also shown promising results. Developed by Møller et al. [19] specifically to predict 30-day mortality in PPU patients, it incorporates a laboratory marker (serum creatinine) in addition to the physiological variables that form the basis of other clinical scores. In their cohort of 2668 patients surgically treated for PPU, the 30-day mortality rate was 26.6%, and the PULP score achieved an AUC of 0.83, outperforming both the Boey (AUC ≈ 0.70) and ASA (AUC ≈ 0.78) scores. Subsequently, Søreide et al. [23] compared the Boey, ASA and PULP scores in 172 PPU patients and found slightly better discrimination for PULP than for the Boey and ASA scores. Similarly, Nichakankitti et al., in a Thai cohort of 140 patients, reported an AUC of 0.784 for PULP versus 0.728 for Boey and 0.776 for ASA [30]. Ghobashy et al. [29] also evaluated the PULP score and reported even higher performance, with an AUC of 0.935 (versus 0.866 for Boey).
Several subsequent studies have also explored the ability of the PULP score to predict postoperative morbidity. Although its performance was weaker than for mortality, the PULP score showed moderate but meaningful predictive value. In the study by Nichakankitti et al., the AUC for morbidity prediction was 0.727, outperforming the Boey (AUC = 0.671) and ASA (AUC = 0.684) scores [30]. Likewise, Saafan et al., in a retrospective study of 152 patients with perforated duodenal ulcers, found that a PULP score ≥ 8 was significantly associated with postoperative complications and discriminated morbidity better than the Boey or ASA scores [31]. However, not all studies report equally strong findings. Ghobashy et al. [29] reported only modest performance of the PULP score for morbidity prediction (AUC = 0.694), comparable to Boey (AUC = 0.698) and slightly better than ASA (AUC = 0.624). These differences may stem from variation in how morbidity was defined, in healthcare infrastructure and in the prevalence of risk factors across populations.
With these findings in mind, we evaluated the predictive accuracy of the Boey, PULP and ASA scores for 30-day mortality and morbidity at our hospital. Both the Boey and PULP scores performed well in predicting mortality, with AUCs > 0.950, and both achieved 100% sensitivity at their optimal cutoff points (1.5 for Boey and 5.5 for PULP). In our population, both scores also achieved 100% NPV: no patient with a Boey score ≤ 1 or a PULP score ≤ 5 died, which suggests they may be reliable for ruling out mortality. The Boey score had higher specificity (87.3%) than the PULP score (81.7%). Because sensitivity was identical, this translates into a higher Youden's index and a lower false-positive rate for the Boey score, making it the more robust of the two. On the other hand, the PPVs of the Boey (35.7%) and PULP (27.8%) scores were low: although both scores identified every patient who eventually died, they also flagged a considerable number of patients who survived, overestimating mortality.
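To make this trade-off concrete, the reported percentages can be traced back to 2×2 tables. In the Python sketch below, the counts are an ASSUMPTION, back-calculated from the reported figures (76 patients and 5 deaths, which reproduce the 7% mortality and the reported specificities and PPVs); they are not taken from the study's results tables, and the helper name is hypothetical.

```python
def two_by_two(tp, fn, fp, tn):
    """Diagnostic metrics from a 2x2 table of score result vs. outcome."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "youden_j": sens + spec - 1,
    }


# ASSUMED counts, back-calculated from reported percentages (76 patients, 5 deaths)
boey = two_by_two(tp=5, fn=0, fp=9, tn=62)   # Boey >= 2 (cutoff 1.5)
pulp = two_by_two(tp=5, fn=0, fp=13, tn=58)  # PULP >= 6 (cutoff 5.5)

for name, m in (("Boey", boey), ("PULP", pulp)):
    # Boey: specificity 87.3, PPV 35.7; PULP: specificity 81.7, PPV 27.8
    print(name, {k: round(100 * v, 1) for k, v in m.items()})
```

With no false negatives, both NPVs are 100%, and the only difference between the scores lies in the false positives (9 versus 13), which drives both the specificity and the PPV gap.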
The ASA score, by contrast, also showed a good AUC (0.930) but a different profile. At an optimal cutoff of 4.5 (i.e., only ASA V patients were classified as high risk), it reached a specificity of 100%, but at the cost of a much lower sensitivity (60%). This limitation makes the ASA score much less reliable for identifying all patients at risk of death. Its NPV of 97.3% was slightly lower than that of the other scores but still high, which nonetheless underscores the utility of the ASA classification in the risk stratification of surgical patients.
For morbidity prediction, the Boey score was the best overall performer, with an AUC of 0.914, indicating good discriminative ability. At its optimal cutoff (0.5), sensitivity was 95.8% and specificity 75.0%, giving the highest Youden's index and making it the most balanced score. It also had the highest NPV (97.5%), indicating a strong ability to rule out morbidity when the score is 0. The PULP score also performed well, with the highest specificity (88.5%) for values ≥ 4, albeit with a slightly lower AUC (0.880). Lastly, the ASA score performed worst (lowest AUC and Youden's index); with a PPV of 51.3%, only about half of the patients it flagged actually developed complications.
This study attempts to fill a gap in the literature regarding the prediction of mortality, and especially morbidity, in PPU patients, while also studying a pathology relevant to our population and department. AUC values > 0.95 for mortality suggest that both the Boey and PULP scores can be effective predictors in our population, and the Boey score was also a good predictor of morbidity. These scores, especially Boey, stand out for their simplicity and can help surgeons make quick yet effective clinical decisions. No patient with a Boey score ≤ 1 or a PULP score ≤ 5 died in this study. These thresholds might therefore help distinguish low-risk patients from those in need of emergent surgery and support the reallocation of resources, such as ICU admission, to the patients who will benefit most in a resource-limited setting such as our institution.
While the Boey and PULP scores performed well in predicting 30-day mortality, the small sample size and the relatively low number of deaths in this study may have led to an overestimation of their predictive power. These results should therefore be interpreted with care, and validation in larger cohorts with more mortality events is needed to confirm our findings.
It is also important to acknowledge the relatively low PPV of the Boey and PULP scores for predicting mortality, despite their 100% sensitivity and NPV. Although the scores were effective in ruling out mortality, they also flagged a substantial number of patients who ultimately survived. Both scoring systems may therefore lead to overtriage and, potentially, to unnecessary resource allocation.
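With 100% sensitivity, a low PPV is largely a consequence of the low event rate. Bayes' theorem expresses PPV in terms of sensitivity (Se), specificity (Sp) and prevalence p. As an illustration, assuming 5 deaths among the 76 patients (a count inferred from the reported 7% mortality, not stated in this section) and taking the Boey specificity of 87.3% as 62/71 under the same assumption, the formula reproduces the reported Boey PPV of 35.7%:

```latex
\begin{align*}
\mathrm{PPV} &= \frac{\mathrm{Se}\,p}{\mathrm{Se}\,p + (1-\mathrm{Sp})(1-p)} \\
&= \frac{1 \times \tfrac{5}{76}}{1 \times \tfrac{5}{76} + \tfrac{9}{71} \times \tfrac{71}{76}}
 = \frac{5}{5 + 9} \approx 0.357
\end{align*}
```

At this prevalence, the roughly 9 expected false positives among 71 survivors outnumber the 5 true positives, so even a fairly specific score yields a PPV well below 50%.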
Youden's index-derived cutoffs allowed us to identify optimal thresholds for predicting mortality and morbidity. However, these cutoffs were derived from a small, single-center dataset, which increases the risk of overfitting: performance estimates at these cutoffs may be inflated and may not be universally applicable. They should be validated in larger, independent cohorts to ensure their generalizability.
Altogether, this study has several limitations. The relatively small cohort (n = 76) limits its statistical power. Its retrospective design introduces the potential for several biases. As a single-center study, its results may not be reproducible in other populations and healthcare settings. Follow-up was limited to 30 days, so long-term complications may have been underestimated. Our modified variable definitions may also have reduced comparability with previous studies. Moreover, confounding factors such as patients' nutritional status or surgeon expertise were not evaluated and may have influenced the results. The results should therefore be interpreted cautiously.
Future studies should use larger, multi-center, prospective cohorts to reduce potential biases and produce higher-quality evidence. Incorporating additional variables, such as other clinical or laboratory markers, may also reveal further prognostic factors for stratifying patient severity.
With this in mind, we believe this study could be the starting point for further research on this topic, particularly a prospective study evaluating differences in outcomes according to ulcer location and size, type of procedure and other patient-related factors, which could help guide our clinical practice in the future.