This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
Data-Leakage-Aware Preoperative Prediction of Postoperative Complications from Structured Data and Preoperative Clinical Notes
by Anastasia Amanatidis 1, Kyle Egan 1, Kusuma Nio 2 and Milan Toma 1,*
1 Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
2 Department of Surgery, Icahn School of Medicine at Mount Sinai, 1428 Madison Avenue, Atran Berg Building, 8th Floor, New York, NY 10029, USA
* Author to whom correspondence should be addressed.
Surgeries 2025, 6(4), 87; https://doi.org/10.3390/surgeries6040087 (registering DOI)
Submission received: 29 August 2025 / Revised: 4 October 2025 / Accepted: 8 October 2025 / Published: 9 October 2025
Abstract
Background/Objectives: Machine learning has been proposed as a way to improve the prediction of anesthesia-related complications after surgery. However, many studies report overly optimistic results because of problems such as data leakage, and they often make incomplete use of the information contained in clinical notes. This study provides a transparent comparison of machine learning models built on both structured data and preoperative notes, with a focus on avoiding data leakage and involving clinicians throughout. We show how the high metrics reported in the literature can result from methodological pitfalls and may not be clinically meaningful. Methods: We used a dataset containing structured patient and surgery information together with preoperative clinical notes. To avoid data leakage, we excluded any variables that could directly reveal the outcome. The data were cleaned and preprocessed, and information from the clinical notes was summarized into features suitable for modeling. We tested a range of machine learning methods, including simple, tree-based, and modern language-model-based approaches. Models were evaluated using a standard train/test split and cross-validation, and class imbalance was addressed with resampling techniques. Results: All models showed only a modest ability to distinguish between patients with and without complications. The best performance was achieved by a simple model using both structured and summarized text features, with an area under the curve (AUC) of 0.644 and an accuracy of 60%. Other models, including those using advanced language techniques, performed similarly or slightly worse. Adding information from clinical notes yielded small improvements, but no single data type dominated. Overall, the results did not reach the high levels reported in some previous studies. Conclusions: In this analysis, machine learning models using both structured and unstructured preoperative data achieved only modest predictive performance for postoperative complications.
These findings highlight the importance of transparent methodology and clinical oversight to avoid data leakage and inflated results. Future progress will require better control of data leakage, richer data sources, and external validation to develop clinically useful prediction tools.
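The Methods summary above names three safeguards without giving implementation details: excluding outcome-revealing variables, evaluating with cross-validation, and correcting class imbalance by resampling. The library-free Python sketch below shows one way to order these steps so that resampling cannot leak into held-out data. It is an illustration under stated assumptions, not the authors' pipeline: random oversampling stands in for the unspecified "sampling techniques", the column names in `LEAKY_COLUMNS` and the `train_and_score` callback are hypothetical, and AUC is assumed to mean the ROC AUC. It is computed here as the probability that a randomly chosen patient with a complication receives a higher score than one without, so an AUC of 0.644 corresponds to correct ranking in roughly 64% of such pairs.

```python
import random
from collections import defaultdict

# HYPOTHETICAL placeholders for variables recorded after surgery; the study's
# actual excluded variables are not listed in the abstract.
LEAKY_COLUMNS = {"postop_length_of_stay", "postop_icu_transfer"}


def drop_leaky_columns(record):
    """Keep only variables that would be known before the operation."""
    return {k: v for k, v in record.items() if k not in LEAKY_COLUMNS}


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def oversample(train_idx, labels, seed=0):
    """Randomly duplicate minority-class training indices until the classes balance.

    Only called on a training fold here, so duplicated patients never appear
    in the held-out fold (oversampling before splitting would leak).
    """
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(train_idx)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(train_idx) + extra


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(labels, k, train_and_score, seed=0):
    """Mean held-out AUC; resampling happens inside each training fold only.

    train_and_score(train_idx, test_idx) is a HYPOTHETICAL callback that fits
    any model on train_idx and returns one risk score per index in test_idx.
    """
    folds = stratified_folds(labels, k, seed)
    aucs = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        train_idx = oversample(train_idx, labels, seed + f)
        scores = train_and_score(train_idx, test_idx)
        aucs.append(roc_auc([labels[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)
```

A scorer that simply returns each test patient's true label yields a mean AUC of 1.0, which makes a quick check that the folds and metric are wired correctly. In practice, library components such as scikit-learn's `StratifiedKFold` combined with imbalanced-learn's `Pipeline` (which applies samplers only during fitting) would replace these hand-written helpers; they are spelled out here only to make the order of operations explicit.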
MDPI and ACS Style
Amanatidis, A.; Egan, K.; Nio, K.; Toma, M. Data-Leakage-Aware Preoperative Prediction of Postoperative Complications from Structured Data and Preoperative Clinical Notes. Surgeries 2025, 6, 87. https://doi.org/10.3390/surgeries6040087