Data-Leakage-Aware Preoperative Prediction of Postoperative Complications from Structured Data and Preoperative Clinical Notes
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Description
2.2. Data Preprocessing and Feature Engineering
2.2.1. Deterministic Link Between ‘Complications’ and ‘Outcome’
2.2.2. Data Cleaning and Initial Processing
2.2.3. Categorical Variable Encoding
2.2.4. Text Feature Engineering
2.2.5. Feature Scaling and Final Matrix Construction
2.2.6. Train/Test Splitting and Cross-Validation
2.2.7. Handling Class Imbalance
2.2.8. Summary of Preprocessing Pipeline
2.3. Summary of ML Algorithms for Anesthesia Complication Prediction
2.3.1. Tree-Based Algorithms
2.3.2. Deep Neural Network Algorithms
2.3.3. Other Algorithms
2.3.4. Modeling Pipeline Overview
3. Results
3.1. Overview of Model Performance
3.2. Classical ML Algorithms
3.3. Tree-Based Ensemble Methods
3.4. Deep Learning and Transformer-Based Models
3.5. Comparison of Feature Sets and Algorithms
3.6. Summary of Findings
4. Discussion
4.1. Leakage-Aware ML
4.2. Comparison with Existing Studies
4.3. Study Contributions
- Leakage-Aware Experimental Design: We present a transparent, leakage-aware workflow for preoperative risk prediction that explicitly identifies and excludes deterministically linked features (such as the ‘Complications’ column), as well as proxy variables that would not be available at prediction time. This approach directly addresses a pervasive methodological flaw in the literature: including such features can artificially inflate performance metrics and produce models that do not generalize.
- Clinician-Guided Feature Selection and Oversight: The study demonstrates the essential role of clinician engagement throughout the ML pipeline, from feature selection to model interpretation. By systematically involving clinical experts, we ensure that only information available at the time of prediction is used. We also illustrate how clinical oversight changes both the model inputs and the model results compared with purely engineering-driven approaches.
- Comparative Evaluation of Text Representations: We systematically compare classical text representations (TF-IDF with PCA) and domain-adapted transformer embeddings (ClinicalBERT) under strict leakage control. This head-to-head evaluation clarifies the relative benefits and limitations of each approach for extracting predictive signal from preoperative anesthesia notes.
- Reproducible, Educational Benchmark: Rather than aiming for maximal predictive performance, we provide an explicit, reproducible baseline intended as an educational benchmark for future perioperative ML studies. The baseline includes:
  - models trained with and without SMOTE;
  - models with and without PCA-based dimensionality reduction;
  - standard evaluation outputs (AUC, accuracy, ROC curves, and confusion matrices);
  - detailed workflow diagrams.

  Our results highlight the challenges of real-world prediction and the need for rigorous methodology over superficial accuracy gains.
- Promotion of Methodological Transparency: By thoroughly documenting our preprocessing, feature engineering, model selection, and evaluation steps, we set a standard for methodological transparency. This enables meaningful comparison with future studies and supports the development of clinically actionable, trustworthy ML tools in perioperative care.
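Leakage control also applies to fitted transformations. The TF-IDF vocabulary, the IDF weights and the PCA projection should be learned from training notes alone and then applied unchanged to held-out notes.

The sketch below illustrates this under stated assumptions. It is written from scratch in NumPy for transparency. The following are all hypothetical and are not the study's code:
- the whitespace tokenizer;
- the smoothed IDF formula, modelled on scikit-learn's default;
- the helper names;
- the example notes.

```python
import math
from collections import Counter

import numpy as np


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def fit_tfidf(train_docs, max_features=500):
    """Learn the vocabulary and IDF weights from TRAINING notes only."""
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(tokenize(doc)))
    # Keep the terms that occur in the most notes (ties broken alphabetically).
    vocab = sorted(doc_freq, key=lambda t: (-doc_freq[t], t))[:max_features]
    n_docs = len(train_docs)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = np.array([math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab])
    return {term: i for i, term in enumerate(vocab)}, idf


def transform_tfidf(docs, index, idf):
    """Map notes to L2-normalized TF-IDF rows; terms unseen in training are ignored."""
    X = np.zeros((len(docs), len(index)))
    for row, doc in enumerate(docs):
        for term, count in Counter(tokenize(doc)).items():
            if term in index:
                X[row, index[term]] = count
    X *= idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def fit_pca(X_train, n_components):
    """Return the training mean and the top principal axes (as rows) via SVD."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]


def transform_pca(X, mean, components):
    """Project rows onto the principal axes learned from the training data."""
    return (X - mean) @ components.T


# Hypothetical preoperative notes, for illustration only.
train_notes = [
    "history of hypertension on beta blocker",
    "asthma uses inhaler daily",
    "no known allergies healthy",
    "hypertension and diabetes",
]
test_notes = ["asthma and hypertension"]

index, idf = fit_tfidf(train_notes, max_features=500)   # fitted on training notes only
X_train = transform_tfidf(train_notes, index, idf)
X_test = transform_tfidf(test_notes, index, idf)        # reuses training vocabulary and IDF
mean, components = fit_pca(X_train, n_components=2)     # fitted on training rows only
Z_train = transform_pca(X_train, mean, components)
Z_test = transform_pca(X_test, mean, components)
print(Z_train.shape, Z_test.shape)  # (4, 2) (1, 2)
```

Fitting `fit_tfidf` or `fit_pca` on the combined training and test notes would let test-set vocabulary and variance shape the features. Within cross-validation, the same rule means refitting both transformations inside each training fold.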
4.4. The Importance of Clinical Guidance in ML Model Development
4.5. Limitations and Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
AUC | Area under the receiver operating characteristic curve |
BMI | Body Mass Index |
CatBoost | Categorical Boosting |
CSV | Comma-Separated Values |
EHR | Electronic Health Record |
KNN | K-Nearest Neighbors |
LLM | Large Language Model |
LSBoost | Least Squares Boosting |
ML | Machine Learning |
NLP | Natural Language Processing |
PCA | Principal Component Analysis |
RF | Random Forest |
ROC | Receiver Operating Characteristic |
RUSBoost | Random Undersampling Boosting |
SMOTE | Synthetic Minority Oversampling Technique |
TF-IDF | Term Frequency–Inverse Document Frequency |
XGBoost | Extreme Gradient Boosting |
Appendix A
Algorithm A1. Generalized ML Pipeline for Anesthesia Complication Prediction
References
Field | Description | Data Type | Possible Values/Notes |
---|---|---|---|
PatientID | Unique patient identifier | Integer | – |
Age | Age of patient (years) | Integer | – |
Gender | Gender of patient | String | Male, Female |
BMI | Body Mass Index (kg/m²) | Integer | – |
SurgeryType | Type of surgery | String | Cardiovascular, Orthopedic, Neurological, Cosmetic |
SurgeryDuration | Duration of surgery | String | e.g., “120 min”, “180 min” |
AnesthesiaType | Type of anesthesia | String | General, Local |
PreoperativeNotes | Pre-surgery clinical notes | String | Unstructured text |
PostoperativeNotes | Post-surgery clinical notes | String | Unstructured text |
PainLevel | Postoperative pain level (1–10) | Integer | 1 to 10 |
Complications | Postoperative complications | String | None, Nausea, mild bleeding, Respiratory distress, Delayed recovery |
Outcome | Complication outcome label | Integer | 0 (No complications), 1 (Complications present) |
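The fields in the table above allow the deterministic link between ‘Complications’ and ‘Outcome’ to be checked directly. When the check holds, any model given ‘Complications’ can reproduce the label exactly, which is why that column, like the other postoperative fields, has to be dropped before modelling.

The plain-Python sketch below illustrates both steps. The following are assumptions for illustration, not the study's code or data:
- the grouping of fields into postoperative and identifier sets;
- the helper names;
- the two example records.

It also assumes that ‘SurgeryDuration’ records a scheduled duration. If it records the actual duration, that value is only known after surgery and should be excluded as well.

```python
# Field names follow the dataset table; the grouping below is an assumption.
POSTOPERATIVE_FIELDS = {"PostoperativeNotes", "PainLevel", "Complications"}
IDENTIFIER_FIELDS = {"PatientID"}
LABEL = "Outcome"


def outcome_is_determined_by_complications(rows):
    """True if Outcome == 1 exactly when Complications != 'None' in every record."""
    return all((row["Complications"] != "None") == (row[LABEL] == 1) for row in rows)


def parse_duration_minutes(value):
    """Convert a string such as '120 min' to an integer number of minutes."""
    number, unit = value.split()
    if unit != "min":
        raise ValueError(f"unexpected duration unit: {unit!r}")
    return int(number)


def preoperative_features(row):
    """Drop the identifier, the label, and postoperative fields; parse the duration."""
    excluded = POSTOPERATIVE_FIELDS | IDENTIFIER_FIELDS | {LABEL}
    features = {key: value for key, value in row.items() if key not in excluded}
    # Assumes SurgeryDuration is a planned duration known before surgery.
    features["SurgeryDuration"] = parse_duration_minutes(features["SurgeryDuration"])
    return features


# Two hypothetical records, for illustration only.
rows = [
    {"PatientID": 1, "Age": 64, "Gender": "Male", "BMI": 29,
     "SurgeryType": "Cardiovascular", "SurgeryDuration": "180 min",
     "AnesthesiaType": "General", "PreoperativeNotes": "hypertension on beta blocker",
     "PostoperativeNotes": "stable", "PainLevel": 4,
     "Complications": "None", "Outcome": 0},
    {"PatientID": 2, "Age": 41, "Gender": "Female", "BMI": 23,
     "SurgeryType": "Cosmetic", "SurgeryDuration": "120 min",
     "AnesthesiaType": "Local", "PreoperativeNotes": "no significant history",
     "PostoperativeNotes": "nausea overnight", "PainLevel": 3,
     "Complications": "Nausea", "Outcome": 1},
]

print(outcome_is_determined_by_complications(rows))  # True
print(sorted(preoperative_features(rows[1])))
# ['Age', 'AnesthesiaType', 'BMI', 'Gender', 'PreoperativeNotes', 'SurgeryDuration', 'SurgeryType']
```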
Model/Feature Set | With SMOTE | Without SMOTE | With RUSBoost | Without RUSBoost |
---|---|---|---|---|
XGBoost (TF-IDF+One-hot) | ✓ | × | × | ✓ |
Random Forest (TF-IDF+PCA) | × | ✓ | × | ✓ |
Random Forest (TF-IDF+One-hot) | ✓ | × | × | ✓ |
CatBoost (TF-IDF+One-hot) | ✓ | × | × | ✓ |
RUSBoost (TF-IDF+PCA) | × | ✓ | ✓ | × |
KNN (Numeric+One-hot+TF-IDF+PCA) | × | ✓ | × | ✓ |
XGBoost (ClinicalBERT+Tabular) | ✓ | × | × | ✓ |
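The table above marks which models were trained on SMOTE-resampled data. SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. To stay leakage-aware, it should be applied only to the training split (or training folds), never to test data.

The NumPy code below is a minimal from-scratch sketch of that idea for binary labels. Its function names and defaults (k = 5 neighbours, balancing to parity) are hypothetical. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn library would normally be used.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Minimal SMOTE sketch: interpolate between minority samples and their k nearest minority neighbours.

    X_min must contain only minority-class rows taken from the TRAINING split.
    """
    rng = np.random.default_rng(rng)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # indices of the k nearest neighbours
    base = rng.integers(0, n, size=n_synthetic)    # randomly chosen seed samples
    partner = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))             # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def balance_training_set(X_train, y_train, k=5, rng=0):
    """Oversample the minority class of a binary training split up to the majority count."""
    classes, counts = np.unique(y_train, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = int(counts.max() - counts.min())
    if n_needed == 0:
        return X_train, y_train
    X_new = smote(X_train[y_train == minority], n_needed, k=k, rng=rng)
    X_bal = np.vstack([X_train, X_new])
    y_bal = np.concatenate([y_train, np.full(n_needed, minority)])
    return X_bal, y_bal


# Hypothetical, already-split training data: 14 negatives, 6 positives.
data_rng = np.random.default_rng(42)
X_train = data_rng.normal(size=(20, 3))
y_train = np.array([0] * 14 + [1] * 6)
X_bal, y_bal = balance_training_set(X_train, y_train, k=3, rng=1)
print(np.bincount(y_bal))  # [14 14]
# The held-out test split is left untouched and evaluated as-is.
```

The synthetic points are built from training samples. If resampling ran before the train/test split, information about test patients could shape the training set, which is why the sketch takes an already-split training set.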
Features | Algorithm | Test AUC | Test Accuracy |
---|---|---|---|
TF-IDF (500) + tabular | LogitBoost | 0.563 | 50.0% |
TF-IDF + PCA | RUSBoost | 0.563 | 50.0% |
Numeric + One-hot + TF-IDF + PCA | Naïve Bayes | 0.625 | 54.4% |
TF-IDF + PCA | Random Forest | 0.589 | 56.7% |
TF-IDF (100) + PCA | LSBoost | 0.563 | 50.0% |
TF-IDF + PCA + tabular | Stacked Ensemble | 0.456 | 44.4% |
Numeric + One-hot + TF-IDF + PCA | KNN | 0.644 | 60.0% |
TF-IDF (500) + One-hot + SMOTE | XGBoost | 0.575 | 56.5% |
TF-IDF + One-hot + SMOTE | Random Forest | 0.556 | 56.7% |
TF-IDF + One-hot + SMOTE | CatBoost | 0.581 | 56.7% |
TF-IDF + One-hot + SMOTE | Stacked Ensemble | 0.456 | 44.4% |
ClinicalBERT + tabular + SMOTE | XGBoost | 0.600 | 56.7% |
ClinicalBERT (notes only) | Transformers | 0.539 | 52.2% |
Joint ClinicalBERT + tabular | Transformers | 0.450 | 44.0% |
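The AUC and accuracy values above measure different things:
- AUC is threshold-free. It equals the probability that a randomly chosen patient with complications receives a higher score than one without, so 0.5 corresponds to uninformative scores.
- Accuracy depends on the decision threshold.

With test AUCs between 0.450 and 0.644, the models range from slightly below to modestly above that 0.5 reference.

The plain-Python sketch below computes both metrics and the confusion matrix from hypothetical labels and scores. The 0.5 threshold and the helper names are assumptions for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tp = pairs.count((1, 1))
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of correct predictions, computed from the confusion counts."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tn + tp) / (tn + fp + fn + tp)


# Hypothetical test labels (1 = complications present) and model scores.
y_test = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 decision threshold

print(roc_auc(y_test, scores))           # 0.75
print(confusion_counts(y_test, y_pred))  # (2, 0, 1, 1)
print(accuracy(y_test, y_pred))          # 0.75
print(roc_auc(y_test, [0.5] * 4))        # 0.5 (constant scores carry no ranking signal)
```

Changing the threshold changes `y_pred`, the confusion counts and the accuracy, but leaves the AUC unchanged. This is why the two columns are reported separately.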
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Amanatidis, A.; Egan, K.; Nio, K.; Toma, M. Data-Leakage-Aware Preoperative Prediction of Postoperative Complications from Structured Data and Preoperative Clinical Notes. Surgeries 2025, 6, 87. https://doi.org/10.3390/surgeries6040087