Leakage-Aware Federated Learning for ICU Sepsis Early Warning: Fixed Alert-Rate Evaluation on PhysioNet/CinC 2019 and MIMIC-IV
Abstract
1. Introduction
- A two-stage, leakage-aware evaluation framework for sepsis early warning: Stage-1 benchmarks cross-hospital generalization on PhysioNet/CinC 2019 via a bidirectional evaluation over the official training sets A and B (A→B and B→A), and Stage-2 evaluates a Sepsis-3-aligned task derived from MIMIC-IV with simulated federated clients.
- A workload-matched operational evaluation strategy using fixed alert-rate thresholding (α = 5%) with stay-level detection rates and lead-time distributions to improve clinical interpretability beyond AUROC/AUPRC.
- Supplementary trustworthiness analyses, including leakage stress tests (proper-vs-leaky splitting and contamination sweeps), calibration/decision-curve analysis, and privacy/security stress tests (membership inference, DP-inspired clipping/noise, and model poisoning).
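As a concrete illustration of the membership-inference stress test mentioned above, the following is a minimal sketch of a loss-score attack in the style of Yeom et al.: training-set members tend to have lower per-example loss, so the AUC of a loss-based discriminator quantifies leakage (AUC near 0.5 means members are indistinguishable). The helper name and interface are illustrative, not the paper's code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def loss_based_mia_auc(member_losses, nonmember_losses):
    """Loss-score membership inference: score = negative per-example loss
    (members tend to have lower loss). Returns attack AUC."""
    scores = np.concatenate([-np.asarray(member_losses, dtype=float),
                             -np.asarray(nonmember_losses, dtype=float)])
    labels = np.concatenate([np.ones(len(member_losses)),
                             np.zeros(len(nonmember_losses))])
    return roc_auc_score(labels, scores)
```

An attack-model variant (a classifier trained on loss features) can replace the raw loss score; the table in Section 4.4 reports both.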
2. Related Work
2.1. Sepsis Early Warning from EHR Time Series
2.2. Federated Learning in Healthcare
2.3. Operational Evaluation, Calibration, and Robustness
3. Materials and Methods
3.1. Study Overview and Leakage-Aware Evaluation Pipeline
3.2. Datasets and Cohort Construction
3.3. Prediction Task, Windowing, and Label Definitions
| Algorithm 1. Stage-2 Sepsis-3 proxy labeling and windowing rules (summary) |
| (1) Suspected infection time t_inf: identify the first qualifying culture–antibiotics pair. If antibiotics occur first, require a culture within 24 h; if the culture occurs first, require antibiotics within 72 h. Set t_inf to the earlier event time. (2) Baseline SOFA_base: assume SOFA_base = 0 when pre-existing organ dysfunction is not documented. (3) Compute hourly SOFA(t) from ICU charted variables and labs. (4) Sepsis onset t_sepsis: the earliest time t within [t_inf − 12 h, t_inf + 12 h] such that SOFA(t) − SOFA_base ≥ 2. (5) Window labeling: for each window ending at time t, set y_{t,H} = 1 if t_sepsis ∈ (t, t + H]; exclude windows with end time ≥ t_sepsis. (6) Missingness: carry forward within stay; remaining missing SOFA components are set to 0 (normal organ function). Notation. For each ICU stay, let x_{t−T:t} denote the multivariate measurements within the observation window of length T ending at time t. The prediction target is y_t = 1 if sepsis onset occurs within the future horizon (t, t + H] and y_t = 0 otherwise. Models output a risk score p_t = f_θ(x_{t−T:t}) ∈ [0, 1]. Candidate times t are generated on an hourly grid in principle; in this retrospective evaluation, we score a sampled subset of candidate times per stay (mean ≈ 2.74 windows/stay). Stay-level partitioning ensures that no windows from the same ICU stay appear in multiple splits. |
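The pairing rule in step (1) and the window labeling in step (5) can be sketched as follows. This is a minimal sketch, not the paper's implementation: times are hours since ICU admission, the helper names are hypothetical, and the horizon `horizon_h` is a parameter rather than the study's fixed value.

```python
def suspected_infection_time(culture_hours, abx_hours):
    """Step (1): first qualifying culture/antibiotics pair.
    Antibiotics first -> culture within 24 h; culture first ->
    antibiotics within 72 h. Returns t_inf (earlier event of the
    first qualifying pair) or None if no pair qualifies."""
    candidates = []
    for c in culture_hours:
        for a in abx_hours:
            if a <= c <= a + 24 or c < a <= c + 72:
                candidates.append(min(a, c))  # t_inf = earlier event
    return min(candidates) if candidates else None

def window_label(t_end, t_sepsis, horizon_h):
    """Step (5): y = 1 if onset falls in (t_end, t_end + H];
    windows ending at or after onset are excluded (None)."""
    if t_sepsis is None:
        return 0
    if t_end >= t_sepsis:
        return None  # excluded window
    return 1 if t_sepsis <= t_end + horizon_h else 0
```

For example, a culture at hour 20 with antibiotics at hour 5 qualifies (antibiotics first, culture within 24 h), so t_inf = 5.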
3.4. Models
3.5. Federated Learning Algorithms
3.6. Evaluation Metrics and Fixed Alert-Rate Lead-Time Analysis
| Algorithm 2. Fixed alert-rate evaluation and stay-level lead-time metrics |
| Input: monitored validation scores; target alert rate α; test scores per stay; sepsis onset times t_sepsis. 1. Select the threshold τ as the (1 − α) quantile of the monitored validation scores, so that alerts fire on a fraction α of monitored windows. 2. For each sepsis-positive stay, define the first alert time t_alert = min{t : p_t ≥ τ}; if no alert exists, mark the stay as undetected. 3. Detection indicator: D = 1 if an alert exists, else 0. 4. Lead time (defined only if detected): L = t_sepsis − t_alert. 5. Report the detection rate and the distribution of L (median, IQR). 6. Capped timeliness over detected stays: P(L ≤ 24 h), P(L ≤ 48 h), and the conditional median of L given L ≤ 48 h. |
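A minimal sketch of the fixed alert-rate procedure above, assuming (per Algorithm 1) that all scored windows of a sepsis-positive stay end before onset, so every alert precedes t_sepsis. Function and variable names are illustrative.

```python
import numpy as np

def fixed_alert_rate_eval(monitored_val_scores, alpha, stays):
    """stays: list of (times, scores, t_onset) for sepsis-positive stays.
    Returns threshold tau, stay-level detection rate, and lead times (h)."""
    # Step 1: threshold at the (1 - alpha) quantile of monitored scores
    tau = float(np.quantile(monitored_val_scores, 1.0 - alpha))
    lead_times = []
    for times, scores, t_onset in stays:
        alerts = [t for t, s in zip(times, scores) if s >= tau]
        if alerts:  # detected: lead time L = t_sepsis - t_alert
            lead_times.append(t_onset - min(alerts))
    detection_rate = len(lead_times) / len(stays)
    return tau, detection_rate, lead_times
```

Capped timeliness follows directly, e.g. `np.mean(np.asarray(lead_times) <= 24)` for P(L ≤ 24 h) over detected stays.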
3.7. Privacy and Security Stress Tests (Supplementary)
4. Results
4.1. Stage-1 Cross-Hospital Benchmarking on PhysioNet 2019
4.2. Stage-2 Full-SOFA Early Warning on MIMIC-IV with Operational Fixed Alert-Rate Evaluation
4.3. Stage-2 Sensitivity Analyses and Leakage Stress Tests (SOFA-Proxy Label; Reference CIs for the Main Full-SOFA Task)
4.4. Summary of Privacy and Security Stress Tests (Supplementary)
5. Discussion
5.1. Interpreting the Cross-Hospital FL Benchmark
5.2. Why Fixed Alert-Rate Evaluation Matters
5.3. Calibration, Decision-Analytic Utility, and Non-IID Heterogeneity
5.4. Leakage, Privacy, Security, and Limitations
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Feature Sets and Model I/O Details
| Feature Group | Variables (Hourly Unless Noted) |
|---|---|
| Vital signs | HR, O2Sat, Temp, SBP, MAP, DBP, Resp, EtCO2 |
| Laboratory + demographics | BaseExcess, HCO3, FiO2, pH, PaCO2, SaO2, AST, BUN, Alkalinephos, Calcium, Chloride, Creatinine, Bilirubin_direct, Glucose, Lactate, Magnesium, Phosphate, Potassium, Bilirubin_total, TroponinI, Hct, Hgb, PTT, WBC, Fibrinogen, Platelets; Age, Gender, Unit1, Unit2, HospAdmTime |
| SOFA Component | Raw Variables Used | Example Window Summary |
|---|---|---|
| Respiratory/Coagulation/Liver/CNS/Renal | PaO2, FiO2; Platelets; Bilirubin_total; GCS; Creatinine, UrineOutput | Worst value within window (e.g., PaO2 min, FiO2 max, platelets min, bilirubin max, GCS min, creatinine max, urine output min) |
| Cardiovascular | MAP; Dopamine; Dobutamine; Epinephrine; Norepinephrine | MAP min; maximum infusion rate per drug within window (0 if not administered) |
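The worst-value window summaries in the table above can be sketched with plain Python records. The treatment of absent vasopressor rates (0 if never administered) follows the table; the helper interface, field names, and record layout are assumptions for illustration.

```python
def window_worst_values(rows, t_end, window_h=6):
    """rows: list of dicts with 'hour' and raw variables (absent if not
    measured). Returns worst-case summaries over (t_end - window_h, t_end]."""
    w = [r for r in rows if t_end - window_h < r["hour"] <= t_end]

    def agg(key, fn, default=None):
        vals = [r[key] for r in w if r.get(key) is not None]
        return fn(vals) if vals else default

    return {
        "platelets_min": agg("platelets", min),
        "bilirubin_max": agg("bilirubin_total", max),
        "gcs_min": agg("gcs", min),
        "creatinine_max": agg("creatinine", max),
        "map_min": agg("map", min),
        # vasopressors: max infusion rate, 0 if never administered
        "norepinephrine_max": agg("norepinephrine", max, default=0.0),
    }
```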
References
- Singer, M.; Deutschman, C.S.; Seymour, C.W.; Shankar-Hari, M.; Annane, D.; Bauer, M.; Bellomo, R.; Bernard, G.R.; Chiche, J.-D.; Coopersmith, C.M.; et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA 2016, 315, 801–810.
- Vincent, J.-L.; Moreno, R.; Takala, J.; Willatts, S.; De Mendonça, A.; Bruining, H.; Reinhart, C.K.; Suter, P.M.; Thijs, L.G. The SOFA (Sepsis-related Organ Failure Assessment) Score to Describe Organ Dysfunction/Failure. Intensive Care Med. 1996, 22, 707–710.
- Sherman, E.; Szolovits, P.; Ieva, A.; Patel, V.; Ghassemi, M. Leveraging Clinical Time-Series Data for Prediction: A Cautionary Tale. arXiv 2018, arXiv:1811.12520.
- Moor, M.; Bennett, N.; Plečko, D.; Horn, M.; Rieck, B.; Meinshausen, N.; Bühlmann, P.; Borgwardt, K. Predicting Sepsis Using Deep Learning across International Sites: A Retrospective Development and Validation Study. EClinicalMedicine 2023, 62, 102124.
- Boussina, A.; Shashikumar, S.P.; Malhotra, A.; Owens, R.L.; El-Kareh, R.; Longhurst, C.A.; Quintero, K.; Donahue, A.; Chan, T.C.; Nemati, S.; et al. Impact of a Deep Learning Sepsis Prediction Model on Quality of Care and Survival. npj Digit. Med. 2024, 7, 14.
- Valan, B.; Prakash, A.; Ratliff, W.; Gao, M.; Muthya, S.; Thomas, A.; Eaton, J.L.; Gardner, M.; Nichols, M.; Revoir, M.; et al. Evaluating Sepsis Watch Generalizability through Multisite External Validation of a Sepsis Machine Learning Model. npj Digit. Med. 2025, 8, 350.
- Gupta, A.; Chauhan, R.; G, S.; Shreekumar, A. Improving Sepsis Prediction in Intensive Care with SepsisAI: A Clinical Decision Support System with a Focus on Minimizing False Alarms. PLoS Digit. Health 2024, 3, e0000569.
- Rich, R.L.; Montero, J.M.; Dillon, K.E.; Condon, P.; Vadaparampil, M. Evaluation of an Intensive Care Unit Sepsis Alert in Critically Ill Medical Patients. Am. J. Crit. Care 2024, 33, 212–216.
- McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Aguera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
- Reyna, M.A.; Josef, C.S.; Jeter, R.; Shashikumar, S.P.; Westover, M.B.; Nemati, S.; Clifford, G.D.; Sharma, A. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019. Crit. Care Med. 2020, 48, 210–217.
- Johnson, A.E.W.; Bulgarelli, L.; Shen, L.; Gayles, A.; Shammout, A.; Horng, S.; Pollard, T.J.; Hao, S.; Moody, B.; Gow, B.; et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 2023, 10, 1.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems; The Neural Information Processing Systems Foundation: San Diego, CA, USA, 2017; pp. 5998–6008.
- Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450.
- Rieke, N.; Hancox, J.; Li, W.; Milletarì, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The Future of Digital Health with Federated Learning. npj Digit. Med. 2020, 3, 119.
- Alam, M.U.; Rahmani, R. FedSepsis: A Federated Multi-Modal Deep Learning-Based Internet of Medical Things Application for Early Detection of Sepsis from Electronic Health Records Using Raspberry Pi and Jetson Nano Devices. Sensors 2023, 23, 970.
- Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and Open Problems in Federated Learning. arXiv 2019, arXiv:1912.04977.
- Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated Learning: Challenges, Methods, and Future Directions. IEEE Signal Process. Mag. 2020, 37, 50–60.
- Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1175–1191.
- Kaissis, G.A.; Makowski, M.R.; Rückert, D.; Braren, R.F. Secure, Privacy-Preserving and Federated Machine Learning in Medical Imaging. Nat. Mach. Intell. 2020, 2, 305–311.
- Vickers, A.J.; Elkin, E.B. Decision Curve Analysis: A Novel Method for Evaluating Prediction Models. Med. Decis. Mak. 2006, 26, 565–574.
- Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership Inference Attacks against Machine Learning Models. In 2017 IEEE Symposium on Security and Privacy (SP); IEEE: New York, NY, USA, 2017; pp. 3–18.
- Yeom, S.; Giacomelli, I.; Fredrikson, M.; Jha, S. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF); IEEE: New York, NY, USA, 2018; pp. 268–282.
- Melis, L.; Song, C.; De Cristofaro, E.; Shmatikov, V. Exploiting Unintended Feature Leakage in Collaborative Learning. In 2019 IEEE Symposium on Security and Privacy (SP); IEEE: New York, NY, USA, 2019.
- Bhagoji, A.N.; Chakraborty, S.; Mittal, P.; Calo, S. Model Poisoning Attacks in Federated Learning. arXiv 2019, arXiv:1811.02645.
- Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; Shmatikov, V. How to Backdoor Federated Learning. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Virtual, 26–28 August 2020; pp. 2938–2948.
- Blanchard, P.; El Mhamdi, E.M.; Guerraoui, R.; Stainer, J. Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent. In Advances in Neural Information Processing Systems; The Neural Information Processing Systems Foundation: San Diego, CA, USA, 2017.
- Yin, D.; Chen, Y.; Ramchandran, K.; Bartlett, P. Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates. In Proceedings of the 35th International Conference on Machine Learning; PMLR: Stockholm, Sweden, 2018; pp. 5650–5659.
- Davis, S.E.; Matheny, M.E.; Balu, S.; Sendak, M.P. A Framework for Understanding Label Leakage in Machine Learning for Health Care. J. Am. Med. Inform. Assoc. 2024, 31, 274–280.
- Kapoor, S.; Narayanan, A. Leakage and the Reproducibility Crisis in Machine-Learning-Based Science. Patterns 2023, 4, 100804.
- Collins, G.S.; Moons, K.G.M.; Dhiman, P.; Riley, R.D.; Beam, A.L.; Van Calster, B.; Ghassemi, M.; Liu, X.; Reitsma, J.B.; van Smeden, M.; et al. TRIPOD+AI Statement: Updated Guidance for Reporting Clinical Prediction Models that Use Regression or Machine Learning Methods. BMJ 2024, 385, e078378.
- Collins, G.S.; Dhiman, P.; Ma, J.; Schlussel, M.M.; Archer, L.; Van Calster, B.; Harrell, F.E.; Martin, G.P.; Moons, K.G.M.; van Smeden, M.; et al. Evaluation of Clinical Prediction Models (Part 1): From Development to External Validation. BMJ 2024, 384, e074819.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In KDD'16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Saito, T.; Rehmsmeier, M. The Precision–Recall Plot Is More Informative than the ROC Plot when Evaluating Binary Classifiers on Imbalanced Datasets. PLoS ONE 2015, 10, e0118432.
- Van Calster, B.; Collins, G.S.; Vickers, A.J.; Wynants, L.; Kerr, K.F.; Barreñada, L.; Varoquaux, G.; Singh, K.; Moons, K.G.M.; Hernandez-Boussard, T.; et al. Evaluation of Performance Measures in Predictive Artificial Intelligence Models to Support Medical Decisions: Overview and Guidance. Lancet Digit. Health 2025, 7, e100916.
- Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy. In CCS'16: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2016; pp. 308–318.
- Wong, A.; Otles, E.; Donnelly, J.P.; Krumm, A.; McCullough, J.; DeTroyer-Cooley, O.; Pestrue, J.; Phillips, M.; Konye, J.; Penoza, C.; et al. External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Intern. Med. 2021, 181, 1065–1070.
- Cvach, M. Monitor Alarm Fatigue: An Integrative Review. Biomed. Instrum. Technol. 2012, 46, 268–277.
- Sendelbach, S.; Funk, M. Alarm Fatigue: A Patient Safety Concern. AACN Adv. Crit. Care 2013, 24, 378–386.
- Evans, L.; Rhodes, A.; Alhazzani, W.; Antonelli, M.; Coopersmith, C.M.; French, C.; Machado, F.R.; McIntyre, L.; Ostermann, M.; Prescott, H.C.; et al. Surviving Sepsis Campaign: International Guidelines for Management of Sepsis and Septic Shock 2021. Intensive Care Med. 2021, 47, 1181–1247.
- Carlini, N.; Chien, S.; Nasr, M.; Song, S.; Terzis, A.; Tramèr, F. Membership Inference Attacks from First Principles. In 2022 IEEE Symposium on Security and Privacy (SP); IEEE: New York, NY, USA, 2022; pp. 1897–1914.
- Van Calster, B.; McLernon, D.J.; van Smeden, M.; Wynants, L.; Steyerberg, E.W. Calibration: The Achilles Heel of Predictive Analytics. BMC Med. 2019, 17, 230.
- Fleuren, L.M.; Klausch, T.L.T.; Zwager, C.L.; Schoonmade, L.J.; Guo, T.; Roggeveen, L.F.; Swart, E.L.; Girbes, A.R.J.; Thoral, P.; Ercole, A.; et al. Machine Learning for the Prediction of Sepsis: A Systematic Review and Meta-Analysis. Intensive Care Med. 2020, 46, 383–400.
- Nemati, S.; Holder, A.; Razmi, F.; Stanley, M.D.; Clifford, G.D.; Buchman, T.G. An Interpretable Machine Learning Model for Accurate Prediction of Sepsis in the ICU. Sci. Transl. Med. 2018, 10, eaas9557.
| Stage | Dataset | Records | Split Unit | Client/Domain | Evaluation Target |
|---|---|---|---|---|---|
| Stage-1 | PhysioNet/CinC 2019 (https://doi.org/10.13026/v64v-d857) | Training sets A + B: 40,336 patients (A: 20,336; B: 20,000) | patient (group split) | Hospital systems (A/B) | Fixed-horizon benchmark (main) |
| Stage-2 | MIMIC-IV v3.1 (https://doi.org/10.13026/kpb9-mt58) | Total: 36,193 stays; 87,460 windows (T = 6 h). Test: 3620 stays; 8743 windows. Monitored test stream: 4022 stays; 9927 windows. | stay (group split) | Care unit (multi-client) + pooled-1 (single-client control) | Operational evaluation (fixed alert-rate α = 5%) |
| Split | ICU Stays (n) | Evaluation Windows (n) | Positive Evaluation Windows (n) | Prevalence |
|---|---|---|---|---|
| Train | 28,954 | 70,012 | 3387 | 4.84% |
| Validation | 3619 | 8705 | 486 | 5.58% |
| Test | 3620 | 8743 | 500 | 5.72% |
| Total | 36,193 | 87,460 | 4373 | 5.00% |
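The stay-level partitioning behind these splits can be realized with scikit-learn's `GroupShuffleSplit`, which guarantees that all windows of a stay land in exactly one partition. A minimal sketch, assuming windows indexed 0..n−1 with a parallel array of stay identifiers; the helper name and fractions are illustrative.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def stay_level_split(n_windows, stay_ids, test_size=0.1, seed=0):
    """Group split at the ICU-stay level: no stay contributes windows
    to more than one partition, preventing within-stay leakage."""
    gss = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(gss.split(np.zeros(n_windows), groups=stay_ids))
    return train_idx, test_idx
```

A nested call on the training indices yields the train/validation split in the same leakage-aware way.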
| Method | A-Test AUROC (95% CI) | A-Test AUPRC (95% CI) | B-Test AUROC (95% CI) | B-Test AUPRC (95% CI) |
|---|---|---|---|---|
| LSTM Local(A) | 0.769 [0.726, 0.806] | 0.255 [0.188, 0.331] | 0.715 [0.667, 0.766] | 0.121 [0.083, 0.190] |
| LSTM Local(B) | 0.712 [0.670, 0.756] | 0.193 [0.150, 0.253] | 0.810 [0.766, 0.851] | 0.170 [0.124, 0.247] |
| LSTM-FedAvg(r10) | 0.738 [0.697, 0.780] | 0.185 [0.143, 0.243] | 0.774 [0.724, 0.824] | 0.207 [0.144, 0.315] |
| LSTM FedProx(r8) | 0.736 [0.694, 0.777] | 0.207 [0.159, 0.271] | 0.779 [0.736, 0.825] | 0.171 [0.113, 0.247] |
| TF Local(A) | 0.783 [0.742, 0.821] | 0.300 [0.231, 0.376] | 0.818 [0.778, 0.861] | 0.199 [0.144, 0.294] |
| TF Local(B) | 0.666 [0.628, 0.704] | 0.121 [0.099, 0.161] | 0.806 [0.752, 0.854] | 0.219 [0.155, 0.316] |
| TF-FedAvg(r4) | 0.748 [0.705, 0.785] | 0.226 [0.178, 0.303] | 0.850 [0.806, 0.887] | 0.312 [0.225, 0.421] |
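The 95% CIs above suggest a percentile bootstrap. Because windows from the same stay are correlated, one common choice is to resample stays rather than windows; this sketch makes that assumption, and the exact resampling scheme used in the paper may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc_ci(y, scores, stay_ids, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for AUROC, resampling at the stay
    level so correlated windows from one stay move together."""
    rng = np.random.default_rng(seed)
    y, scores, stay_ids = map(np.asarray, (y, scores, stay_ids))
    stays = np.unique(stay_ids)
    idx_by_stay = {s: np.flatnonzero(stay_ids == s) for s in stays}
    stats = []
    for _ in range(n_boot):
        sample = rng.choice(stays, size=len(stays), replace=True)
        idx = np.concatenate([idx_by_stay[s] for s in sample])
        if y[idx].min() == y[idx].max():
            continue  # degenerate resample (single class); skip
        stats.append(roc_auc_score(y[idx], scores[idx]))
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return lo, hi
```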
| Setting | Training | Test AUROC | Test AUPRC |
|---|---|---|---|
| Centralized-XGB | Centralized | 0.6592 | 0.1443 |
| Centralized-RF | Centralized | 0.6646 | 0.0899 |
| Centralized-LogReg | Centralized | 0.6693 | 0.1174 |
| SimFL FedAvg-LogReg (client = careunit) | SimFL | 0.6150 | 0.1160 |
| SimFL FedAvg-LogReg (client = admission-type) | SimFL | 0.6319 | 0.1230 |
| Pooled-1 FedAvg-LogReg (client = pooled-1) | Pooled | 0.5802 | 0.0708 |
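The FedAvg aggregation underlying the SimFL rows is, per McMahan et al., an example-count-weighted average of client parameters computed each round. A minimal sketch of one aggregation step; the interface (flat weight vectors plus client sizes) is an assumption.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg round: weighted average of client parameter vectors,
    weighted by each client's local example count."""
    sizes = np.asarray(client_sizes, dtype=float)
    W = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    return (W * (sizes / sizes.sum())[:, None]).sum(axis=0)
```

For example, clients with weights [0, 0] and [1, 1] and sizes 1 and 3 aggregate to [0.75, 0.75].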
(a) Contamination sweep (mean ± std over three seeds)

| Leak_frac | AUROC (Mean ± std) | AUPRC (Mean ± std) |
|---|---|---|
| 0 | 0.6410 ± 0.0063 | 0.0888 ± 0.0018 |
| 0.05 | 0.6313 ± 0.0265 | 0.0880 ± 0.0058 |
| 0.1 | 0.6474 ± 0.0113 | 0.0925 ± 0.0073 |
| 0.3 | 0.6493 ± 0.0166 | 0.0916 ± 0.0032 |
| 0.5 | 0.6514 ± 0.0081 | 0.0984 ± 0.0034 |

(b)

| Method | Test AUROC | AUROC lo | AUROC hi | Test AUPRC | AUPRC lo | AUPRC hi | n Test Windows | n Test Stays |
|---|---|---|---|---|---|---|---|---|
| Centralized-LogReg | 0.6693 | 0.6203 | 0.7161 | 0.1174 | 0.0854 | 0.1569 | 8743 | 3620 |
| SimFL FedAvg-LogReg (client = careunit) | 0.615 | 0.5612 | 0.6649 | 0.116 | 0.0816 | 0.1596 | 8743 | 3620 |
| SimFL FedAvg-LogReg (client = admission-type) | 0.6319 | 0.5798 | 0.6816 | 0.123 | 0.0851 | 0.1688 | 8743 | 3620 |
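One plausible implementation of the contamination sweep in panel (a): a fraction `leak_frac` of test windows is injected into the training index before refitting, quantifying how much such leakage inflates apparent performance. The helper name and the exact leakage mechanism are assumptions, not the paper's code.

```python
import numpy as np

def contaminate_train(train_idx, test_idx, leak_frac, seed=0):
    """Leakage stress test: move a random fraction of test windows into
    the training index. Returns (contaminated_train_idx, leaked_idx)."""
    rng = np.random.default_rng(seed)
    test_idx = np.asarray(test_idx)
    n_leak = int(round(leak_frac * len(test_idx)))
    leaked = rng.choice(test_idx, size=n_leak, replace=False)
    return np.concatenate([np.asarray(train_idx), leaked]), leaked
```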
| Setting | Test AUROC | Test AUPRC | MIA AUC (Attack Model) | MIA AUC (Loss Score) |
|---|---|---|---|---|
| Pooled LogReg (central) | 0.5571 | 0.0598 | 0.4979 | 0.5069 |
| Careunit LogReg (FedAvg) | 0.4873 | 0.0625 | 0.4942 | 0.4973 |
| FedAvg + clip (C = 1.0), σ = 0.01 | 0.5450 | 0.0594 | 0.4924 | 0.4987 |
| FedAvg poisoning (scale = 20) | 0.4467 | 0.0490 | — | — |
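The clip/noise defense in the table can be sketched as update-level L2 clipping followed by Gaussian noise. As framed in the paper this is DP-inspired rather than a formal (ε, δ) guarantee; the interface below is illustrative.

```python
import numpy as np

def clip_and_noise(update, clip_norm=1.0, sigma=0.01, seed=None):
    """DP-inspired defense on a client update: scale the update so its
    L2 norm is at most clip_norm (C), then add Gaussian noise (sigma)."""
    rng = np.random.default_rng(seed)
    u = np.asarray(update, dtype=float)
    scale = min(1.0, clip_norm / max(np.linalg.norm(u), 1e-12))
    return u * scale + rng.normal(0.0, sigma, size=u.shape)
```

Applied server-side to each client update before the FedAvg average, this corresponds to the C = 1.0, σ = 0.01 row above.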
(a)

| Model | Training | Detection Rate | Median Lead Time (h) | IQR (25–75) | Threshold τ (α = 5%) |
|---|---|---|---|---|---|
| XGB | Centralized | 54.8% | 44.4 | 19.7–105.9 | 0.632 |
| Pooled-1 LogReg | FedAvg | 37.2% | 33.7 | 8.1–77.8 | 0.240 |
| Careunit LogReg | FedAvg | 31.4% | 23.5 | 4.5–62.2 | 0.999 |

(b)

| Model | Training | P(L ≤ 24 h) | P(L ≤ 48 h) | Median (L \| L ≤ 48 h) (h) |
|---|---|---|---|---|
| XGB | Centralized | 34.0% | 52.4% | 21.4 |
| Pooled-1 LogReg | FedAvg | 40.0% | 60.0% | 14.1 |
| Careunit LogReg | FedAvg | 50.8% | 69.5% | 6.5 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Jin, H.; Lee, H. Leakage-Aware Federated Learning for ICU Sepsis Early Warning: Fixed Alert-Rate Evaluation on PhysioNet/CinC 2019 and MIMIC-IV. Appl. Sci. 2026, 16, 2735. https://doi.org/10.3390/app16062735