Machine Learning Models for Predicting Postoperative Complications and Hospitalization After Percutaneous Nephrolithotomy

Shalabayeva, Laura; Bahílo Mateu, Pilar; Romeu Ferras, Marc; Díaz-Carnicero, Javier; Budía, Alberto; Vivas-Consuelo, David

doi:10.3390/a18090558


by Laura Shalabayeva ¹,*, Pilar Bahílo Mateu ², Marc Romeu Ferras ¹, Javier Díaz-Carnicero ¹, Alberto Budía ² and David Vivas-Consuelo ¹,*

¹ Research Unit for Health Economics and Management, Polytechnic University of Valencia, Camí de Vera, s/n, 46022 Valencia, Spain
² Department of Lithotripsy, La Fe University and Polytechnic Hospital, 46026 Valencia, Spain
* Authors to whom correspondence should be addressed.

Algorithms 2025, 18(9), 558; https://doi.org/10.3390/a18090558

Submission received: 29 July 2025 / Revised: 26 August 2025 / Accepted: 28 August 2025 / Published: 4 September 2025

(This article belongs to the Special Issue Mathematical Modelling in Engineering and Human Behaviour (3rd Edition))


Abstract

Percutaneous nephrolithotomy (PCNL) is often associated with complications of hemorrhagic or infectious origin, which can result in prolonged hospitalization. This study aims to develop predictive models using machine learning (ML) techniques to anticipate these outcomes. Multiple ML algorithms, including Logistic Regression, Decision Tree, Random Forest, and Extreme Gradient Boosting, were evaluated on separate validation and test datasets. The Random Forest model achieved the best overall predictive performance for hospitalization need (validation/test AUC 0.726/0.736) and infectious complications (validation/test AUC 0.799/0.735). Threshold adjustment was applied to increase sensitivity, reducing false negatives. The interpretability of the models was supported by SHAP analysis, which identified clinically meaningful variables. Risk factors in both the hospitalization and infectious complications models included nephrostomy drainage, a neutrophil percentage higher than 80, Guy’s score of grade 4, a leukocyte level higher than 15 or lower than 4.5, and balloon dilation, while protective features included tubeless intervention, easy localization of the stone, and negative culture and microorganism results. However, no model achieved acceptable performance for predicting hemorrhagic complications, likely due to limited data. These results suggest that AI-based models can contribute to risk stratification after PCNL. Further experiments with larger, multi-center datasets are needed to confirm these findings and improve the generalizability of the models.

Keywords: percutaneous nephrolithotomy; PCNL; machine learning; prediction; hospitalization stay; complications

1. Introduction

Nephrolithiasis is the medical term used to describe the presence of kidney stones, which are crystalline mineral formations composed of one or more substances found in urine, such as calcium, oxalate, phosphate, or uric acid [1]. The prevalence of this disease among the Spanish population is 5.06%, corresponding to a total of 2,233,214 cases [2].

Among the possible treatments for this disease are pharmacological therapy and different types of lithotripsy, such as extracorporeal shock wave lithotripsy (ESWL), which uses shock waves generated by an external source; retrograde intrarenal surgery (RIRS), in which a catheter is introduced through the urinary tract; and percutaneous nephrolithotomy (PCNL), which is considered the most invasive of the treatments mentioned above [3,4]. According to the European Association of Urology guidelines, PCNL is indicated for patients with renal stones larger than 2 cm and lower pole stones larger than 1.5 cm. It is also recommended by the American Urological Association (AUA) for patients with staghorn calculi [5]. Although PCNL is still classed as a minimally invasive intervention, it can cause a variety of complications of hemorrhagic or infectious origin. Several studies estimate the complication rate of this surgery to be between 4% and 50.8% [6].

Recently, there has been a significant rise in the digitalization of data within the healthcare field, which has also led to an increased use of artificial intelligence (AI), here referring to computational systems capable of learning from data, recognizing patterns, and making autonomous decisions [7]. One subfield of AI is machine learning (ML), which identifies patterns in categorized datasets and can deal with nonlinear relationships that traditional statistical methods may fail to capture [7,8]. This approach has already been applied, for example, to identify antimicrobial compounds with potential therapeutic properties [7]. Moreover, several ML-based studies in the PCNL field have been conducted to predict outcomes such as transfusion needs, stone-free status, the need for adjuvant treatment, and other postoperative complications [8,9]. Building on these advances, the present study aims to develop a Clinical Decision Support System specifically designed to identify patients at risk of post-surgery complications and hospitalization, using machine learning algorithms.

2. Materials and Methods

2.1. Database

The database contains data on 901 patients who underwent PCNL at the Department of Lithotripsy of La Fe University and Polytechnic Hospital in Valencia, Spain, between March 2011 and December 2024. As shown in Figure 1, each patient’s data fall into three groups: preoperative variables, gathered during the outpatient consultation before the intervention; intraoperative variables, collected during the PCNL procedure; and postoperative variables, obtained from a blood test performed approximately 4 h after surgery.

This database is subject to confidentiality restrictions and cannot be shared publicly. Access to the data may be requested from the Department of Lithotripsy of La Fe University and Polytechnic Hospital, subject to institutional approval and compliance with applicable ethical guidelines.

2.2. Exclusion Criteria

Cancer patients are hospitalized after surgery by protocol, regardless of whether they present a complication, and were therefore excluded from the study. Pediatric patients were also excluded. After these exclusions, 883 patients remained.

2.3. Predictors

The preoperative predictors comprise demographic data (sex, age, and Body Mass Index (BMI)); microbiology test results (culture and microorganism); medical history (diabetes and anesthesia risk class according to the American Society of Anesthesiologists (ASA)); treatment type; and a set of variables describing the stone and its location (side, major diameter, minor diameter, number, location, Guy’s score, Hounsfield Units (HU), and distance).

During the PCNL intervention, the following variables are gathered: patient position, guidance of access, catheterization, contrast, dilation method, multi-trajectory, sheath caliber, fragmentation source, localization ease, duration, drains, and tubeless.

From the blood test results, levels of procalcitonin, leukocytes, and percentage of neutrophils (%neutrophils) are included in the study. The %neutrophils variable was calculated according to

$$\%\,\mathrm{neutrophils} = \frac{\mathrm{neutrophils}}{\mathrm{leukocytes}} \times 100 \tag{1}$$

2.4. Variables of Interest

The variables of interest, also called labels, are hospitalization need, presence of a complication of hemorrhagic origin, and presence of a complication of infectious origin. All of these are binary. The hospitalization need variable is defined as positive when a patient presents at least one complication from the following list: hematoma, hematuria, sepsis, infection, perforation, reintervention, bleeding, embolization, or transfusion. Hemorrhagic complication is equal to one when a patient presents one of the following complications: hematoma, hematuria, perforation, bleeding, embolization or transfusion, while the infectious complication category includes sepsis or infection.
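The label definitions above can be sketched as set operations. This is only an illustration: the complication names are taken from this section, while the function name `derive_labels` and the idea of storing each patient's complications as a list of strings are assumptions.

```python
# Complication groups as listed in Section 2.4.
HEMORRHAGIC = {"hematoma", "hematuria", "perforation",
               "bleeding", "embolization", "transfusion"}
INFECTIOUS = {"sepsis", "infection"}
# Hospitalization need covers both groups plus reintervention.
HOSPITALIZATION = HEMORRHAGIC | INFECTIOUS | {"reintervention"}


def derive_labels(complications):
    """Return the three binary labels for one patient.

    `complications` is a hypothetical list of complication names
    recorded for the patient (empty if none occurred).
    """
    present = set(complications)
    return {
        "hospitalization_need": int(bool(present & HOSPITALIZATION)),
        "hemorrhagic": int(bool(present & HEMORRHAGIC)),
        "infectious": int(bool(present & INFECTIOUS)),
    }
```

Under these definitions a patient with only a reintervention is positive for hospitalization need but negative for both complication-specific labels.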

2.5. Exploratory Data Analysis

Within the framework of Exploratory Data Analysis (EDA), the Kolmogorov–Smirnov test is first applied to the numerical features to determine which of them are normally distributed. Then, to analyze their association with the label, the t-test is used for normally distributed variables, while the Mann–Whitney test is used for those not normally distributed. The association of categorical variables with the label is evaluated by means of the Chi-squared test.

To perform the EDA, the statistical software IBM SPSS Statistics 30.0 was used.
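For readers without SPSS, the same test-selection logic can be approximated with SciPy. The sketch below is not the authors' procedure: the function names, the 0.05 cut-off for the normality decision, and the use of the sample mean and standard deviation as the reference normal distribution are all assumptions.

```python
import numpy as np
from scipy import stats


def compare_numeric(values, labels, alpha=0.05):
    """Pick t-test or Mann-Whitney U depending on a KS normality check."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels).astype(bool)
    # KS test against a normal distribution with the sample's own
    # mean and SD (an assumption; this makes the test conservative).
    ks = stats.kstest(values, "norm",
                      args=(values.mean(), values.std(ddof=1)))
    pos, neg = values[labels], values[~labels]
    if ks.pvalue > alpha:
        return "t-test", stats.ttest_ind(pos, neg).pvalue
    result = stats.mannwhitneyu(pos, neg, alternative="two-sided")
    return "Mann-Whitney U", result.pvalue


def compare_categorical(categories, labels):
    """Chi-squared test on the category-by-label contingency table."""
    levels = sorted(set(categories))
    table = [[sum(1 for c, y in zip(categories, labels)
                  if c == level and y == outcome)
              for outcome in (0, 1)]
             for level in levels]
    _, p_value, _, _ = stats.chi2_contingency(table)
    return p_value
```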

2.6. Missing Data Treatment

Patients with more than 30% missing values (NaN) and patients without labels were excluded from the study. The remaining missing values were imputed with the median for numerical variables and the mode for categorical variables.
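A minimal pandas sketch of this step is given below. The function name `clean_and_impute` and the column names in any example data are hypothetical, and the source does not state whether the fill values were computed before or after the data split, so the sketch simply uses whatever frame it receives.

```python
import numpy as np
import pandas as pd


def clean_and_impute(df, label_cols, max_missing=0.30):
    """Drop sparse or unlabeled rows, then impute median / mode."""
    features = [c for c in df.columns if c not in label_cols]
    # Fraction of missing feature values per patient.
    missing_fraction = df[features].isna().mean(axis=1)
    labeled = df[label_cols].notna().all(axis=1)
    out = df.loc[(missing_fraction <= max_missing) & labeled].copy()
    for col in features:
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out[col].median()          # numerical: median
        else:
            fill = out[col].mode(dropna=True).iloc[0]  # categorical: mode
        out[col] = out[col].fillna(fill)
    return out
```

Computing the fill values on the training partition only and reusing them for validation and test data is a common way to avoid information leaking between partitions.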

2.7. Treatment of Numerical Variables

The numerical blood test variables were binarized by thresholding. Procalcitonin was recoded as one if its value was higher than 0.5 and as zero if it was lower than 0.5. %neutrophils was recoded as one when the original value was higher than 80 and as zero for all other values. The leukocytes variable was set to one if its original value was higher than 15 or lower than 4.5, and to zero otherwise.
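The recoding rules, together with Equation (1), can be written as two small functions. The cut-offs are those given in this section; the function and key names are hypothetical, units are left as in the source, and treating a value exactly at a cut-off as zero is an assumption (Section 3.5 reports some thresholds with "≥").

```python
def neutrophil_percentage(neutrophils, leukocytes):
    """Equation (1): neutrophils as a percentage of leukocytes."""
    return neutrophils / leukocytes * 100


def binarize_blood_test(procalcitonin, neutrophils_pct, leukocytes):
    """Recode the three blood test variables as 0/1 flags."""
    return {
        # 1 if procalcitonin is higher than 0.5
        "procalcitonin_high": int(procalcitonin > 0.5),
        # 1 if the neutrophil percentage is higher than 80
        "neutrophils_high": int(neutrophils_pct > 80),
        # 1 if leukocytes are higher than 15 or lower than 4.5
        "leukocytes_abnormal": int(leukocytes > 15 or leukocytes < 4.5),
    }
```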

2.8. Treatment of Categorical Variables

With respect to categorical variables, one-hot encoding was applied to nominal variables.
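As a brief illustration, one-hot encoding of a nominal column with pandas might look as follows. The column names and values are hypothetical, and leaving the ordinal Guy’s score as a single numeric column is an assumption, since the source only states that nominal variables were one-hot encoded.

```python
import pandas as pd

# Hypothetical example rows: one nominal and one ordinal variable.
patients = pd.DataFrame({
    "side": ["Right", "Left", "Right"],
    "guy_score": [1, 4, 2],
})

# Each category of the nominal column becomes its own 0/1 column.
encoded = pd.get_dummies(patients, columns=["side"], dtype=int)
```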

2.9. Data Splitting

The database was divided into three parts: training, validation, and test sets, in the proportion of 70%/15%/15%, respectively. The database is imbalanced: patients requiring hospitalization after surgery represent about one quarter of the total, patients with hemorrhagic complications 13.5%, and patients with infectious complications 14.2%. Stratification by label was therefore especially important.
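A stratified 70%/15%/15% split can be obtained with two successive calls to scikit-learn's `train_test_split`. This is a sketch under the assumption that scikit-learn was used; the function name `stratified_split` and the seed are illustrative.

```python
from sklearn.model_selection import train_test_split


def stratified_split(X, y, seed=0):
    """Split into 70% train, 15% validation, 15% test, stratified by y."""
    # First hold out 30% of the data, keeping the class ratio.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    # Then halve the held-out part into validation and test sets.
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```

Because each label has a different prevalence, a split stratified on one label is not automatically stratified on the others; one split per label is one possible design.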

2.10. Model Development and Evaluation

To find the most suitable predictive model, Logistic Regression and three tree-based models were studied: Decision Tree, Random Forest, and Extreme Gradient Boosting. These models were chosen for their potential for explainability, which is very important in the clinical context. For all models, incorrect predictions of the minority class were penalized via the “class_weight” parameter to deal with the imbalance of the label. In the case of Logistic Regression, the features were normalized beforehand. The models were then evaluated by means of Accuracy, Sensitivity, Specificity, and Area Under the Curve (AUC), with the greatest importance given to AUC.
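The comparison can be sketched with scikit-learn on synthetic data. Everything specific here is an assumption: the synthetic dataset, the `"balanced"` weighting, and the hyperparameters are illustrative and not the study's settings. Extreme Gradient Boosting is omitted to avoid an extra dependency; in the `xgboost` package the comparable imbalance setting is, as far as we are aware, `scale_pos_weight` rather than `class_weight`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with roughly 25% positives (not study data).
X, y = make_classification(n_samples=600, n_features=12,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    # Logistic Regression gets feature scaling in the same pipeline.
    "Logistic Regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(
        class_weight="balanced", max_depth=3, random_state=0),
    "Random Forest": RandomForestClassifier(
        class_weight="balanced", n_estimators=200, max_depth=4,
        random_state=0),
}

val_auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUC is computed from the predicted probability of the positive class.
    val_auc[name] = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
```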

Python programming language (version 3.11.9) and Visual Studio Code software (version 1.103.1) were used to build and evaluate the models.

2.11. Threshold Optimization

Once the most suitable model had been chosen according to the AUC metric, the decision threshold was tuned. The priority was to correctly detect patients with a low probability of hospitalization need or infectious complications, so that they could be excluded from those needing healthcare resources. Such an exclusion is only safe if few true positives fall below the threshold, so the threshold was lowered to achieve higher sensitivity. To evaluate the impact of this change, the Sensitivity, Specificity, Positive Predictive Value (PPV), and Negative Predictive Value (NPV) metrics were calculated.
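The source does not describe how the new threshold values were selected. One simple possibility, shown here purely as an assumption, is to take the highest threshold that still reaches a target sensitivity on the validation set; the function names and the 0.90 default target are illustrative.

```python
def sensitivity_at(threshold, probs, labels):
    """Share of true positives predicted positive at this threshold."""
    positives = sum(labels)
    detected = sum(1 for p, y in zip(probs, labels)
                   if y == 1 and p >= threshold)
    return detected / positives


def highest_threshold_for_sensitivity(probs, labels, target=0.90):
    """Largest observed probability usable as a threshold at the target."""
    if sum(labels) == 0:
        raise ValueError("at least one positive case is required")
    # Walk the candidate thresholds from strictest to most permissive.
    for threshold in sorted(set(probs), reverse=True):
        if sensitivity_at(threshold, probs, labels) >= target:
            return threshold
    return min(probs)  # not reached: the lowest candidate detects everyone
```

A patient is then ruled out only when the predicted probability falls below the returned threshold, which trades specificity for fewer missed positives.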

2.12. Explainability

In order to deduce the most important set of features for the prediction and how they influence the prediction of a class, the SHAP package, which is based on cooperative game theory, was used. The resulting plot highlights the most relevant features, where each dot corresponds to a patient. The x-axis position indicates the magnitude and direction of the feature’s impact on the predicted outcome, while the color gradient reflects the feature value, with blue representing low values and red representing high values.
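To make the game-theoretic basis concrete, the sketch below computes exact Shapley values for a toy risk function by averaging each feature's marginal contribution over all feature orderings. It is not the SHAP package's algorithm, which uses much more efficient estimators, and the feature names and risk increments are invented for illustration only.

```python
from itertools import permutations


def shapley_values(value, players):
    """Exact Shapley values by enumerating every ordering of players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when added to this coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_risk(present):
    """Hypothetical risk score; all numbers are made up."""
    risk = 0.10
    if "nephrostomy" in present:
        risk += 0.20
    if "neutrophils_high" in present:
        risk += 0.10
    if "easy_localization" in present:
        risk -= 0.05
    if {"nephrostomy", "neutrophils_high"} <= present:
        risk += 0.10  # interaction term, shared equally by both features
    return risk


features = ["nephrostomy", "neutrophils_high", "easy_localization"]
phi = shapley_values(toy_risk, features)
```

The attributions sum to the difference between the risk with all features present and the baseline risk, which is the additivity property that SHAP plots rely on; a negative value marks a protective feature.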

3. Results

3.1. Exploratory Data Analysis Results

The gender distribution of patients is balanced: 51.9% females and 48.1% males. The overall mean age is 54.5 ± 15.4 years, and the average Body Mass Index is 27.6 ± 5.8 kg/m², which corresponds to slightly overweight individuals. Further results of the exploratory data analysis for numerical and categorical variables can be found in Table 1 and Table 2, respectively. The p-values for each feature in relation to the prediction labels were interpreted at a 95% confidence level (significance threshold of 0.05).

3.2. Hospitalization Need Predictive Model

Table 3 compares the performance of the different model types. The Random Forest model achieved the highest test AUC (0.736) and a similar validation AUC (0.726), whereas Extreme Gradient Boosting reached a slightly higher validation AUC (0.734) but dropped to 0.657 on the test set. Random Forest is therefore considered the most suitable approach. The ROC curves of the Random Forest model, together with their corresponding AUC values for the three dataset partitions, are shown in Figure 2.

The threshold of the Random Forest model was then moved from 0.5 to 0.29 to increase sensitivity, and the resulting confusion matrices and updated metrics can be observed in Figure 3 and Table 4, respectively. With this adjustment, a sensitivity of at least 90% was achieved. On average across the two sets, 10 patients not requiring hospitalization could be correctly identified, potentially reducing the use of hospital resources. However, even with the lower threshold, some patients were still misclassified as not having complications: one in the validation set and two in the test set.
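The four metrics in Table 4 follow directly from the confusion matrix counts. In the sketch below the counts are not read from Figure 3; they are inferred from the rates reported in Table 4 and the false negatives mentioned above, and should be treated as an assumption that happens to reproduce the published values.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # detected positives / all positives
        "specificity": tn / (tn + fp),  # detected negatives / all negatives
        "ppv": tp / (tp + fp),          # true positives / predicted positives
        "npv": tn / (tn + fn),          # true negatives / predicted negatives
    }


# Inferred counts (assumption): 28 positives and 68 negatives in validation,
# 21 positives and 64 negatives in test.
validation = confusion_metrics(tp=27, fn=1, tn=11, fp=57)
test = confusion_metrics(tp=19, fn=2, tn=9, fp=55)
```

With these counts, 11 and 9 patients are correctly ruled out in the validation and test sets, which matches the average of 10 quoted in the text.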

3.3. Hemorrhagic Complications Predictive Model

Table 5 compares the performance of the models for detecting patients with hemorrhagic complications. Since none of these models achieved an AUC of at least 0.70 in both the validation and test sets, it was concluded that, with the currently available data and the models selected for this study, the objective of developing a model for predicting hemorrhagic complications cannot be accomplished.

3.4. Infectious Complications Predictive Model

Table 6 compares the performance of the different models for predicting infectious complications. As in the case of the hospitalization need model, the most suitable architecture is Random Forest, with AUC values of 0.799 and 0.735 for the validation and test sets, respectively. Logistic Regression reached a higher test AUC (0.763), but its validation AUC was lower (0.684) and its sensitivity at the default threshold was zero in all three partitions. The ROC curves of the Random Forest model, together with their corresponding AUC values for the three dataset partitions, are shown in Figure 4.

Once the best model was identified, the optimal decision threshold was adjusted. In this case, it was lowered from the default value of 0.5 to 0.15 in order to prioritize the correct identification of patients without infectious complications. The results can be observed in Figure 5 and Table 7. Similarly to the hospitalization need prediction model, lowering the threshold significantly reduced the number of patients incorrectly predicted as not having complications when, in fact, they did, which is highly critical in a clinical context. However, some false negatives still persist.

3.5. Explainability of the Models

To interpret the internal workings of the predictive models for the detection of hospitalization need and infectious complications, the SHAP (SHapley Additive exPlanations) library in Python was used. The results are shown in Figure 6.

In the model for predicting the need for hospitalization (Figure 6a), the following characteristics were identified as indicators of hospitalization: nephrostomy drainage; %neutrophils ≥ 80; Guy’s stone score grade 4; medium or difficult localization of the stone; intervention on the right side; leukocyte level ≥ 15 or ≤ 4.5; balloon dilation; staghorn-shaped stone; and the use of a holmium laser as the fragmentation source. Protective factors include tubeless intervention, easy localization of the stone, and negative culture and microorganism results. Additionally, two numerical variables showed a positive relationship with the label, with higher values increasing the predicted risk: the largest diameter of the stone and BMI.

For the prediction of infectious complications (Figure 6b), the indicators of complications of infectious origin are as follows: procalcitonin > 0.5, nephrostomy drainage, %neutrophils ≥ 80, Guy’s stone score grade 4, ASA of grade 5, positive culture result, leukocytes ≥ 15 or ≤ 4.5, and the balloon dilation method. The protective factors were the same as in the hospitalization model, with the addition of ureteral catheterization. In this case, a positive relationship with the outcome was observed only for the variable representing the largest diameter of the stone.

4. Discussion

Among the different model architectures tested, the Random Forest algorithm gave the best overall performance, with AUC values of 0.726 and 0.736 in the validation and test sets for the prediction of hospitalization need and 0.799 and 0.735 for infectious complications. These findings suggest that Random Forest is an effective approach for modeling clinical outcomes of the PCNL intervention. In contrast, the models for detecting hemorrhagic complications did not reach acceptable predictive performance. This could be due to insufficient data or to the need for a more sophisticated model. Data enrichment through the recruitment of additional centers is therefore needed, in terms of both patient numbers and features specifically related to hemorrhagic complications. Furthermore, other ensemble modeling methods could be explored to better capture complex patterns. Notably, Yang et al. [9], who compared different models for the prediction of transfusion need, one of the most critical hemorrhagic complications, also obtained the highest AUC with Random Forest, thus supporting its relevance in this context.

Moreover, Geraghty et al. [8] carried out the largest study in the field of PCNL with a total of 12,810 patients. In their work, the model developed for predicting postoperative infection achieved AUC values ranging from 0.78 to 0.86, while the model for predicting postoperative complications—comparable to our hospitalization need model—achieved AUC values between 0.81 and 0.89. This indicates that recruiting a higher number of patients into the study may contribute to improved model performance.

Additionally, the decision threshold was adjusted to improve sensitivity, as false negatives in this context could have serious clinical consequences. This led to the development of an elimination-based model to discard patients who are unlikely to have a complication.

Among the strengths of our study are the evaluation of several machine learning models, the use of separate validation and test datasets, and application of interpretable AI methods such as SHAP to understand the contribution of clinical features.

However, there are several limitations. The dataset was gathered from a single center, which may limit the generalizability of the findings. Future research should focus on multi-center validation to ensure that the predictive models perform consistently across different populations and clinical settings. Additionally, the dataset included a relatively small number of samples for rare outcomes, and class imbalance was also a challenge.

Future work on larger, multi-institutional datasets should also incorporate a wider range of predictive variables. Moreover, if a sufficiently large sample size can be obtained, the development of complication-specific predictive models should be considered, as each type of complication may have distinct pathophysiological and procedural risk factors.

5. Conclusions

This study highlights the potential of machine learning, particularly Random Forest, in predicting key complications following PCNL. The models identified patients at risk of hospitalization or infection with AUC values between 0.73 and 0.80 on the validation and test sets, while no acceptable model was obtained for hemorrhagic complications. These findings suggest that AI-based tools could support clinical decision-making and resource optimization in the PCNL field.

Author Contributions

Conceptualization, L.S., P.B.M. and M.R.F.; methodology, L.S., M.R.F. and J.D.-C.; software, L.S.; validation, L.S.; formal analysis, L.S.; investigation, P.B.M. and A.B.; resources, P.B.M. and D.V.-C.; data curation, L.S.; writing—original draft preparation, L.S.; writing—review and editing, L.S. and D.V.-C.; visualization, L.S.; supervision, D.V.-C. and A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of La Fe University and Polytechnic Hospital (protocol code: NLP-Ing, registration number: 2025-0540-1) on 27 May 2025.

Data Availability Statement

Access to the database may be requested from the Department of Lithotripsy of La Fe University and Polytechnic Hospital, subject to institutional approval and compliance with applicable ethical guidelines. The Python code used is available from the authors upon request.

Acknowledgments

The authors would like to thank John Wright for his assistance in improving the language and readability of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PCNL	Percutaneous Nephrolithotomy
ESWL	Extracorporeal Shock Wave Lithotripsy
RIRS	Retrograde Intrarenal Surgery
AUA	American Urological Association
AI	Artificial Intelligence
ML	Machine Learning
BMI	Body Mass Index
ASA	American Society of Anesthesiologists
HU	Hounsfield Units
EDA	Exploratory Data Analysis
AUC	Area Under the Curve
PPV	Positive Predictive Value
NPV	Negative Predictive Value

References

1. Liangos, O.; Jaber, B.L. Kidney Stones. In Humana Press eBooks; Humana Press: Totowa, NJ, USA, 2008; pp. 513–527.
2. Sánchez-Martín, F.; Rodríguez, F.M.; Fernández, S.E.; Tomás, J.S.; Barón, F.R.; Martínez-Rodríguez, R.; Mavrich, H.V. Incidencia y prevalencia de la urolitiasis en España: Revisión de los datos originales disponibles hasta la actualidad. Actas Urológicas Españolas 2007, 31, 511–520.
3. Del Pilar Alcoba García, M.; Serrano, G.B.; Jiménez, J.T.; López, R.G.; Ruíz, L.C.; Enguita, C.G. Is extracorporeal shock wave lithotripsy a treatment option for renal colic? Arch. Españoles De Urol. 2023, 76, 175.
4. Zong, D.; Shao, P. Flexible Ureteroscopic Holmium Laser Lithotripsy vs Percutaneous Nephrolithotomy for Renal Stones. Int. J. Pharmacol. 2024, 20, 593–601.
5. Ghani, K.R.; Andonian, S.; Bultitude, M.; Desai, M.; Giusti, G.; Okhunov, Z.; Preminger, G.M.; De La Rosette, J. Percutaneous nephrolithotomy: Update, trends, and future directions. Eur. Urol. 2016, 70, 382–396.
6. Baltar, C.F.; Corral, M.E.M.; Fentes, D.P. Predicting and Avoiding complications in percutaneous nephrolithotomy in the Era of Personalized Medicine: A scoping review. J. Pers. Med. 2024, 14, 962.
7. Ghaderzadeh, M.; Shalchian, A.; Irajian, G.; Sadeghsalehi, H.; Bialvaei, A.Z.; Sabet, B. Artificial Intelligence in Drug Discovery and Development Against Antimicrobial Resistance: A Narrative review. Iran. J. Med. Microbiol. 2024, 18, 135–147.
8. Geraghty, R.M.; Thakur, A.; Howles, S.; Finch, W.; Fowler, S.; Rogers, A.; Sriprasad, S.; Smith, D.; Dickinson, A.; Gall, Z.; et al. Use of Temporally Validated Machine Learning Models To Predict Outcomes of Percutaneous Nephrolithotomy Using Data from the British Association of Urological Surgeons Percutaneous Nephrolithotomy Audit. Eur. Urol. Focus 2024, 10, 290–297.
9. Yang, Y.; Cao, Z.; Wang, W.; Yang, C.; Wang, K.; Qiu, X. Application of Machine-learning models in Predicting Transfusion Among Complex Renal Stones Patients Receiving Percutaneous Nephrolithotomy: A Retrospective Study. Res. Sq. 2022.

Figure 1. Data collection flow.

Figure 2. ROC curves for Random Forest model predicting hospitalization.

Figure 3. Hospitalization need confusion matrices after applying new threshold over: (a) validation set; (b) test set.

Figure 4. ROC curves for Random Forest model predicting infectious complications.

Figure 5. Infectious complications confusion matrices after applying new threshold over: (a) validation set; (b) test set.

Figure 6. Most important features ranked by SHAP values for (a) hospitalization need predictive model; (b) infectious complications predictive model. Each dot represents a patient. The position on the x-axis (SHAP value) indicates the impact on the predicted outcome (negative values = decreased risk of complication, positive values = increased risk). The color scale reflects the feature value, with blue representing low values and red representing high values. For binary features, only red and blue points are shown (red = 1, blue = 0). Features are ranked by their overall importance in the model.

Table 1. Exploratory data analysis for numerical variables.

Variable	Mean ± SD	p-Value (Hospitalization Need)	p-Value (Hemorrhagic Complications)	p-Value (Infectious Complications)
BMI	27.6 ± 5.8	0.218	0.374	0.119
Largest diameter	25 ± 13.3	<0.001	0.131	<0.001
Smallest diameter	15.2 ± 9.3	0.136	0.603	0.027
HU	1010.7 ± 347	0.448	0.299	0.009
Distance	28.4 ± 36.9	0.073	0.329	0.398
Sheath caliber	25.3 ± 3.8	0.03	0.065	0.080
Duration	107.9 ± 38.9	0.055	0.001	0.721
Age	54.5 ± 15.4	0.824	0.882	0.816

Table 2. Exploratory data analysis for categorical variables.

Variable	Frequency per Possible Value	p-Value (Hospitalization Need)	p-Value (Hemorrhagic Complications)	p-Value (Infectious Complications)
Gender	Male: 48.1%; Female: 51.9%	0.040	0.127	<0.001
Diabetes	No: 84.3%; Type 1: 3.9%; Type 2: 11.8%	0.191	0.075	0.955
ASA	ASA I: 22.8%; ASA II: 60.7%; ASA III: 14.5%; ASA IV: 2%	0.041	0.172	<0.001
Culture result	Negative: 65.3%; Positive: 33.7%; Contaminated: 1.1%	<0.001	0.478	<0.001
Microorganism	Negative: 66.1%; Urease negative: 20.4%; Urease positive: 13.4%	<0.001	0.427	<0.001
Treatment type	Primary: 77.2%; Post-ESWL: 8.4%; Post-URS: 5.8%; Post-NLP: 8.7%	0.32	0.334	0.371
Side	Right: 47.9%; Left: 51.4%; Bilateral: 0.4%; Transplant: 0.2%	0.221	0.547	0.169
Guy score	I: 24.5%; II: 31.9%; III: 25.9%; IV: 17.7%	<0.001	0.086	<0.001
Position	Prone: 1.2%; Supine: 98.8%	0.864	0.653	0.706
Access	Fluoroscopy: 2.7%; Ultrasound: 1%; Both: 96.3%	0.033	0.858	0.009
Catheterization	No: 11.3%; Yes: 88.7%	0.855	0.257	0.269
Contrast	No: 12.4%; Yes: 87.6%	0.53	0.388	0.945
Dilation method	Amplatz: 7.7%; Dilation balloon: 44.6%; Both: 44.4%; Metallic: 3.3%	0.279	0.120	0.766
Multi-trajectory	No: 93.9%; Yes: 6.1%	0.867	0.339	0.583
Localization ease	Easy: 43.4%; Medium: 41.9%; Difficult: 14.7%	<0.001	0.007	0.244
Tubeless	No: 60.9%; Yes: 39.1%	<0.001	<0.001	0.001
Procalcitonin	Normal range: 88.8%; Out of normal range: 11.2%	<0.001	<0.001	<0.001
Leukocytes	Normal range: 85.3%; Out of normal range: 14.7%	<0.001	0.081	<0.001
%Neutrophils	Normal range: 51.2%; Out of normal range: 48.8%	<0.001	0.018	<0.001
Location	Pelvis: 26.2%; Superior calyx: 3%; Middle calyx: 0.6%; Inferior calyx: 9.4%; Pseudocoraliform: 11.5%; Coraliform: 26.3%; Renal and ureteral: 2.9%; Calyceal diverticulum: 0.9%; Pelvis + upper calyceal group: 12%; Multiple calyces: 3.6%; Proximal ureter: 0.9%; Distal ureter: 0.7%; Pelvis + lower calyceal group: 2.1%	0.063	0.549	<0.001
Fragmentation source	Lithoclast: 59.5%; Ultrasound: 0.3%; Lithoclast + ultrasound: 0.6%; Holmium laser: 16.8%; Basket: 2.8%; Forceps: 4.9%; Lithoclast + laser: 14.5%; Irrigation: 0.5%	<0.001	0.016	0.003
Drains	No: 0.9%; Nephrostomy: 17.8%; Double-J stent: 36.9%; Double-J stent + nephrostomy: 42.6%; 24 h Ureteral catheter: 1%; 24 h Ureteral catheter + nephrostomy: 0.9%	<0.001	<0.001	0.002

Table 3. Summary of evaluation metrics of the models for hospitalization need prediction.

Model Type	Subset	AUC	Accuracy	Sensitivity	Specificity
Logistic Regression	Train	0.786	0.723	0.694	0.733
	Validation	0.684	0.656	0.5	0.721
	Test	0.682	0.718	0.619	0.75
Decision Tree	Train	0.718	0.797	0.306	0.956
	Validation	0.675	0.75	0.286	0.941
	Test	0.593	0.729	0.19	0.906
Random Forest	Train	0.781	0.710	0.647	0.731
	Validation	0.726	0.708	0.607	0.75
	Test	0.736	0.741	0.619	0.781
Extreme Gradient Boosting	Train	0.771	0.788	0.218	0.973
	Validation	0.734	0.708	0.071	0.971
	Test	0.657	0.753	0.095	0.969

Table 4. Updated performance of Random Forest model predicting hospitalization after moving the threshold.

Metric	Validation Set	Test Set
Sensitivity	0.964	0.905
Specificity	0.162	0.141
PPV	0.321	0.257
NPV	0.917	0.818

Table 5. Summary of evaluation metrics of the models for hemorrhagic complications prediction.

Model Type	Subset	AUC	Accuracy	Sensitivity	Specificity
Logistic Regression	Train	0.797	0.862	0.179	0.956
	Validation	0.705	0.857	0.1	0.959
	Test	0.644	0.859	0.091	0.973
Decision Tree	Train	0.812	0.877	0.262	0.961
	Validation	0.726	0.833	0.2	0.919
	Test	0.542	0.859	0.091	0.973
Random Forest	Train	0.866	0.888	0.107	0.995
	Validation	0.716	0.893	0.1	1.00
	Test	0.633	0.882	0.091	1.00
Extreme Gradient Boosting	Train	0.972	0.912	0.94	0.909
	Validation	0.708	0.774	0.3	0.838
	Test	0.671	0.824	0.273	0.905

Table 6. Summary of evaluation metrics of the models for infectious complications prediction.

Model Type	Subset	AUC	Accuracy	Sensitivity	Specificity
Logistic Regression	Train	0.778	0.856	0.00	1.00
	Validation	0.684	0.854	0.00	1.00
	Test	0.763	0.855	0.00	1.00
Decision Tree	Train	0.899	0.855	0.646	0.89
	Validation	0.727	0.833	0.5	0.89
	Test	0.656	0.795	0.25	0.887
Random Forest	Train	0.91	0.887	0.646	0.927
	Validation	0.799	0.885	0.286	0.988
	Test	0.735	0.843	0.333	0.93
Extreme Gradient Boosting	Train	0.976	0.917	0.939	0.913
	Validation	0.716	0.885	0.571	0.939
	Test	0.658	0.795	0.417	0.859

Table 7. Updated performance of Random Forest model predicting infectious complications after moving the threshold.

Metric	Validation Set	Test Set
Sensitivity	0.927	1.00
Specificity	0.122	0.07
PPV	0.153	0.154
NPV	0.909	1.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
