Early Sepsis Prediction Using Publicly Available Data: High-Performance AI/ML Models with First-Hour Clinical Information
Abstract
1. Introduction
2. Methods
2.1. Study Setting and Design
2.2. Inclusion and Exclusion Criteria
2.3. Primary Outcome
2.4. Data Preprocessing
2.5. Key Features
2.6. Model Development
2.7. Performance Accuracy
2.8. Model Interpretability
2.9. Statistical Analysis
3. Results
3.1. Identification of the Best AI/ML Model for Early Sepsis Prediction
3.2. AI/ML Early Sepsis Prediction Model Interpretation
4. Discussion
5. Limitations
6. Future Directions
7. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References


| Characteristic | Sepsis Group (n = 2245) | Non-Sepsis Group (n = 9067) | p |
|---|---|---|---|
| Age (years) | | | |
| Mean (SD) | 62 (17) | 61 (17) | 0.0129 |
| Median (IQR) | 65 [53–75] | 64 [51–74] | 0.0145 |
| Sex, n (row %) | | | 0.4548 |
| Male | 1322 (20.22) | 5217 (79.78) | |
| Female | 923 (19.34) | 3849 (80.66) | |
| Others | 0 (0) | 1 (100) | |
| Race, n (row %) | | | 0.6539 |
| White | 1596 (19.75) | 6483 (80.25) | |
| Black | 266 (19.22) | 1118 (80.78) | |
| Asian | 81 (21.83) | 290 (78.17) | |
| Others | 302 (20.43) | 1176 (79.57) | |
| Ethnicity, n (row %) | | | 0.2593 |
| Hispanic/Latino | 133 (19.28) | 557 (80.72) | |
| Non-Hispanic/Latino | 1950 (20.07) | 7765 (79.93) | |
| Others | 162 (17.86) | 745 (82.14) | |
| Marital status, n (row %) | | | <0.001 |
| Partnered/Married | 58 (35.58) | 105 (64.42) | |
| Divorced/Separated | 16 (43.24) | 21 (56.76) | |
| Single | 32 (26.67) | 88 (73.33) | |
| Widowed | 13 (39.39) | 20 (60.61) | |
| Missing * | 2126 (19.40) | 8833 (80.60) | |
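
The percentages in this table are row percentages: each category's count is divided by that category's total across both groups, so the sepsis and non-sepsis percentages in a row sum to 100. The minimal Python sketch below recomputes the Sex rows from the tabulated counts and checks that the counts add up to the group sizes in the header. The helper `row_percentages` is hypothetical and is not taken from the study's code.

```python
"""Recompute row percentages for the Sex rows of the demographics table.

Illustrative sketch only: the counts are copied from the table; the helper
`row_percentages` is hypothetical and not part of the study's pipeline.
"""

# (sepsis count, non-sepsis count) per category, copied from the table.
SEX_COUNTS = {
    "Male": (1322, 5217),
    "Female": (923, 3849),
    "Others": (0, 1),
}


def row_percentages(sepsis: int, non_sepsis: int) -> tuple[float, float]:
    """Return (sepsis %, non-sepsis %) within one row, rounded to 2 decimals."""
    total = sepsis + non_sepsis
    if total == 0:
        return (0.0, 0.0)
    return (round(100 * sepsis / total, 2), round(100 * non_sepsis / total, 2))


if __name__ == "__main__":
    for category, (s, n) in SEX_COUNTS.items():
        print(category, row_percentages(s, n))
    # Column totals should equal the group sizes in the header (2245 and 9067).
    print(sum(s for s, _ in SEX_COUNTS.values()),
          sum(n for _, n in SEX_COUNTS.values()))
```

The same helper applied to another categorical row, for example `row_percentages(1596, 6483)` for White patients, returns (19.75, 80.25), matching the table.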

| Metric | XGBoost Training | XGBoost Testing | LightGBM Training | LightGBM Testing | HistGB Training | HistGB Testing |
|---|---|---|---|---|---|---|
| EHR structured data (demographics and laboratory tests) | | | | | | |
| Accuracy | 0.739 | 0.738 | 0.986 | 0.968 | 0.977 | 0.970 |
| Precision | 0.093 | 0.086 | 0.771 | 0.399 | 0.586 | 0.428 |
| Recall | 0.967 | 0.901 | 0.714 | 0.351 | 0.522 | 0.370 |
| F1 score | 0.169 | 0.156 | 0.741 | 0.374 | 0.552 | 0.397 |
| AUROC | 0.951 | 0.913 | 0.977 | 0.911 | 0.942 | 0.914 |
| Waveform data | | | | | | |
| Accuracy | 0.836 | 0.826 | 0.883 | 0.883 | 0.932 | 0.920 |
| Precision | 0.196 | 0.173 | 0.212 | 0.211 | 0.366 | 0.287 |
| Recall | 0.833 | 0.740 | 0.575 | 0.571 | 0.657 | 0.502 |
| F1 score | 0.318 | 0.281 | 0.310 | 0.308 | 0.470 | 0.365 |
| AUROC | 0.915 | 0.871 | 0.857 | 0.851 | 0.931 | 0.875 |
| Combination of EHR structured and waveform data | | | | | | |
| Accuracy | 0.868 | 0.858 | 0.915 | 0.913 | 0.945 | 0.934 |
| Precision | 0.245 | 0.218 | 0.290 | 0.278 | 0.441 | 0.363 |
| Recall | 0.905 | 0.811 | 0.586 | 0.564 | 0.722 | 0.571 |
| F1 score | 0.386 | 0.344 | 0.388 | 0.372 | 0.547 | 0.443 |
| AUROC | 0.953 | 0.922 | 0.906 | 0.895 | 0.957 | 0.920 |
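
The F1 score in this table is the harmonic mean of precision and recall, F1 = 2 × precision × recall / (precision + recall), so it can be cross-checked against the two rows above it. The Python sketch below recomputes F1 for several testing-split cells. The helper `f1_from_pr` is illustrative, not the authors' code, and gaps of about 0.001 are expected because the tabulated precision and recall are themselves rounded to three decimals.

```python
"""Cross-check the reported F1 scores against precision and recall.

Illustrative sketch: values are copied from the testing columns of the
metrics table; `f1_from_pr` is a hypothetical helper, not the authors' code.
"""

# (precision, recall, reported F1) on the testing split.
TESTING = {
    ("XGBoost", "EHR structured"): (0.086, 0.901, 0.156),
    ("LightGBM", "EHR structured"): (0.399, 0.351, 0.374),
    ("HistGB", "EHR structured"): (0.428, 0.370, 0.397),
    ("XGBoost", "Waveform"): (0.173, 0.740, 0.281),
    ("HistGB", "EHR structured + waveform"): (0.363, 0.571, 0.443),
}


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for (model, features), (p, r, reported) in TESTING.items():
        recomputed = f1_from_pr(p, r)
        print(f"{model:<8} {features:<26} recomputed={recomputed:.3f} "
              f"reported={reported:.3f} diff={abs(recomputed - reported):.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, XGBoost's high recall on EHR structured data (0.901 on testing) still yields a low F1 (0.156) once its precision (0.086) is taken into account.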

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, H.; Pounds, D.; Zhang, W.; Mokbel, A.Y.; Kabir, M.N.; Lin, X.Y.; Highlander, A.; Dehzangi, I. Early Sepsis Prediction Using Publicly Available Data: High-Performance AI/ML Models with First-Hour Clinical Information. Diagnostics 2025, 15, 2727. https://doi.org/10.3390/diagnostics15212727