Machine Learning Model for Sepsis Prediction in Prolonged and Chronic Critical Illness: Development and Validation Using Retrospective Real-World ICU Data

Yadgarov, Mikhail Ya.; Rebrova, Olga Yu.; Berikashvili, Levan B.; Polyakov, Petr A.; Kadantseva, Kristina K.; Yakovlev, Alexey A.; Grechko, Andrey V.; Likhvantsev, Valery V.

doi:10.3390/jcm15020777

by Mikhail Ya. Yadgarov ¹,*, Olga Yu. Rebrova ², Levan B. Berikashvili ¹, Petr A. Polyakov ¹, Kristina K. Kadantseva ¹, Alexey A. Yakovlev ¹, Andrey V. Grechko ¹ and Valery V. Likhvantsev ¹,³

¹ Federal Research and Clinical Center of Intensive Care Medicine and Rehabilitology, Moscow 107031, Russia

² Pirogov Russian National Research Medical University, Moscow 117997, Russia

³ I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia

* Author to whom correspondence should be addressed.

J. Clin. Med. 2026, 15(2), 777; https://doi.org/10.3390/jcm15020777

Submission received: 16 December 2025 / Revised: 13 January 2026 / Accepted: 16 January 2026 / Published: 18 January 2026

(This article belongs to the Section General Surgery)

Abstract

Background: No machine learning (ML) models for sepsis prediction have been specifically developed for patients with prolonged or chronic critical illness (PCI/CCI). Objective: This study aimed to develop and validate an ML-based sepsis prediction model for this cohort. Methods: We analyzed ICU admissions from the Russian Intensive Care Dataset (RICD, 575 patients with PCI/CCI) and two public ICU datasets from PhysioNet (>40,000 patients with acute critical illness). Models were trained within a right-aligned prediction framework using a case–crossover–control sampling approach and a 6 h prediction window. Two strategies were evaluated: (1) a PCI/CCI-focused model trained on RICD with external testing on PhysioNet data and (2) a universal model trained on combined RICD and PhysioNet cohorts. Models were developed with tree-based algorithms (XGBoost, LightGBM, Random Forest, AdaBoost), with internal and external validation. The primary outcome was model discrimination (AUROC). Subgroup analyses were performed for sepsis phenotypes. Results: The PCI/CCI-focused XGBoost model achieved an AUROC of 0.82 in the RICD cohort but failed to generalize to external ICU populations (AUROC 0.47). A universal model trained on mixed data demonstrated reduced discrimination in PCI/CCI patients (AUROC mean difference 0.02, p = 0.0012). Respiratory rate, heart rate, body temperature, and age were among the most important features. Predictive performance was higher in the hypoinflammatory sepsis phenotype (AUROC 0.84 vs. 0.81 for hyperinflammatory, p < 0.001). Despite a low positive predictive value (up to 21%) for the PCI/CCI-focused model, the negative predictive value exceeded 97%. Conclusions: A right-aligned ML model tailored to PCI/CCI demonstrated strong internal performance for sepsis exclusion but limited cross-population generalizability, underscoring the need for population-specific prediction models and prospective validation before clinical application.

Keywords:

sepsis prediction; machine learning; chronic critical illness; intensive care unit; right-aligned model; SHAP; real-world data

1. Introduction

Sepsis is a life-threatening condition characterized by organ dysfunction due to a dysregulated host response to infection, remaining a critical global health challenge despite advances in intensive care [1]. Recent estimates indicate that sepsis affects approximately 31.5 million individuals worldwide each year, with over 19 million cases progressing to severe sepsis and accounting for approximately 5.3 million deaths annually [2].

Timely recognition and initiation of treatment within the so-called “golden hour” have been consistently associated with improved clinical outcomes [3,4,5]. However, early detection remains a persistent challenge. Traditional clinical tools, such as the Sequential Organ Failure Assessment (SOFA), Systemic Inflammatory Response Syndrome (SIRS), and quick SOFA (qSOFA), have demonstrated limited efficacy for sepsis prediction in real-world settings, often resulting in delayed diagnosis and treatment initiation [6,7,8,9,10]. Recent evidence supports the systematic use of screening tools for early sepsis detection, with the 2021 Surviving Sepsis Campaign (SSC) guidelines highlighting the potential role of machine learning (ML) algorithms in enhancing screening accuracy [5]. ML-based models, particularly those employing real-time prediction strategies with right-aligned data structures, have emerged as promising tools for anticipating sepsis onset hours before clinical manifestation [11]. This approach supports clinical decision-making by enabling timely interventions, such as the early initiation of antibiotic or fluid therapy [11].

Over the past five years, several systematic reviews and meta-analyses have confirmed the efficacy of ML-based right-aligned sepsis prediction models, demonstrating their superiority over traditional clinical scoring systems [10,11,12]. Nevertheless, these models have been developed and validated in general intensive care unit (ICU) cohorts or general ward populations, limiting their applicability to specific patient subgroups. One such subgroup comprises patients with prolonged or chronic critical illness (PCI/CCI)—a distinct clinical condition characterized by persistent organ dysfunction, extended ICU stays, and increased risk of complications, including healthcare-associated infections and sepsis [13,14]. Despite growing clinical interest, no universally accepted definition of PCI/CCI exists to date [15], and no sepsis prediction models have been developed for this population.

The aim of this study was to develop and validate a right-aligned ML model for sepsis prediction in ICU patients with PCI/CCI using real-world data.

2. Materials and Methods

2.1. Source of Data

The Federal Research and Clinical Center of Intensive Care Medicine and Rehabilitology (FRCC ICMR) is one of the largest tertiary centers in the Russian Federation, specializing in the management and research of PCI/CCI. The primary data source for this study was the institutional Russian Intensive Care Dataset (RICD) v2.0 [16,17], an anonymized dataset comprising 8420 ICU admissions from 3404 unique patients treated at FRCC ICMR, totalling 252,836 patient-days. In addition, two publicly available datasets from the PhysioNet/Computing in Cardiology Challenge 2019 (Sepsis Prediction Challenge) [18,19,20] were used both for model development and external validation. Challenge-1 dataset contains data from 20,336 ICU patients admitted to Beth Israel Deaconess Medical Center (Boston, MA, USA), and Challenge-2 dataset includes data from 20,000 ICU patients admitted to Emory University Hospital (Atlanta, GA, USA). Unlike the RICD cohort, the Challenge datasets reflect ICU populations with acute critical conditions. Both Challenge datasets have been widely used as benchmarks for the development and validation of ML models in sepsis prediction research [21,22,23,24,25,26]. They were selected for this study due to their large size, structured time-series format, availability of sepsis onset annotations, and the absence of other publicly accessible datasets representing patients with PCI/CCI beyond the institutional RICD dataset.

All datasets provide structured hourly time-series data on vital signs, laboratory values, and demographic characteristics, with sepsis onset annotated according to the Sepsis-3 criteria [5].

2.2. Study Design and Setting

For the purpose of this real-world study, we screened all ICU admissions recorded in the RICD dataset between December 2017 and September 2024. Patients were eligible for inclusion if their ICU length of stay was ≥24 h and if at least one hourly sepsis assessment based on the Sepsis-3 criteria was available during their ICU stay. Patients were excluded if no data on continuously monitored vital signs or therapeutic interventions were recorded throughout the ICU admission. To avoid duplication, only the first ICU admission for each patient was included.

The external Challenge datasets from the PhysioNet/Computing in Cardiology Challenge 2019, covering ICU admissions from 2009 to 2019, were used according to their original structure and inclusion criteria, and no additional exclusion criteria were applied.

All eligible cases from the datasets covering an 8–10-year period were included; no formal sample size calculation was required. The study protocol was approved by the FRCC ICMR local ethics committee (approval No. 1/24/1, 24 April 2024). The study protocol was not prospectively registered. The study was conducted in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines [27,28]. The TRIPOD + AI checklist is provided in Supplemental Materials S1.

2.3. Data Management

Data extraction and preprocessing were performed using DB Browser for SQLite v.3.13.1 and Python v.3.12. All data processing scripts are available at GitHub (https://github.com/MikhailYadgarov/RICDv2-sql-code, accessed on 15 October 2025).

In the institutional RICD dataset, Sepsis-3 assessment was performed on a daily basis. In the absence of SOFA measurements at ICU admission, baseline SOFA was considered unavailable, and Sepsis-3 assessment was possible only if at least one SOFA score was available within the preceding two days. SOFA scores were not recalculated but were used directly as recorded in the clinical data. Suspected infection was based on the presence of microbiological cultures and/or administration of antibiotic therapy. The end of a sepsis episode was defined as the cessation of antibiotic therapy. For the PhysioNet/Computing in Cardiology Challenge 2019 datasets (Challenge-1 and Challenge-2), the original Sepsis-3 labels provided by the dataset authors were used without modification.

The initial feature set was defined using a hybrid approach, combining expert knowledge, precedent-based inclusion of features from prior sepsis prediction studies [11,12], and hand-crafted features derived from raw time-series data, which have been shown to improve model performance in sepsis prediction tasks [29]. Feature selection across vital signs, laboratory values, demographics, and comorbidities was performed using mutual information analysis, followed by a Spearman rank correlation analysis to reduce multicollinearity. Features with strong pairwise correlations (r ≥ 0.9) were excluded, favoring those with higher information gain. Final predictor importance was confirmed using SHapley Additive exPlanations (SHAP) values [30]. A complete list of features is provided in Supplemental Materials S2.
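The redundancy-reduction step described above can be sketched as a greedy filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `drop_correlated`, the feature names, and all mutual information and correlation values are hypothetical, and in practice both inputs would be computed from the training data.

```python
def drop_correlated(mi_scores, corr, threshold=0.9):
    """Greedy filter: visit features from highest to lowest mutual information
    and keep a feature only if its absolute correlation with every feature
    already kept is below the threshold."""
    kept = []
    for name in sorted(mi_scores, key=mi_scores.get, reverse=True):
        if all(abs(corr[frozenset((name, k))]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical inputs (illustrative values only, not taken from the study).
mi_scores = {"hr_avg": 0.12, "hr_max": 0.10, "rr_avg": 0.15}
corr = {
    frozenset(("hr_avg", "hr_max")): 0.95,  # strongly correlated pair
    frozenset(("hr_avg", "rr_avg")): 0.30,
    frozenset(("hr_max", "rr_avg")): 0.28,
}
selected = drop_correlated(mi_scores, corr)
print(selected)
```

In this toy case `hr_max` is dropped because it is strongly correlated with `hr_avg`, which carries the higher mutual information score.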

The following variables were extracted and analyzed: (1) the occurrence/absence of sepsis for all prediction windows; (2) baseline characteristics, including sex and age; (3) severity scores at ICU admission (available for RICD only); (4) laboratory parameters at admission and dynamically during the ICU stay; (5) comorbidities (not available in PhysioNet/Computing in Cardiology Challenge 2019); and (6) outcomes and complications, including all-cause mortality, ICU and hospital length of stay, septic shock (according to the Sepsis-3 criteria [31]), vasopressor and/or inotrope use, mechanical ventilation, and nosocomial pneumonia (available only in RICD).

Considering sepsis heterogeneity, we classified episodes as hyperinflammatory if a systemic inflammatory response syndrome (SIRS) score ≥ 2 was recorded at any time during the sepsis episode, and as hypoinflammatory otherwise. No universal or consensus-based definition of sepsis phenotypes currently exists [32,33,34]. The choice of the present phenotyping approach was driven by its reproducibility and feasibility in a retrospective setting.

In all datasets, vital signs, including heart rate (HR), respiratory rate (RR), body temperature, systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial blood pressure (MBP), and oxygen saturation (SpO₂), were recorded hourly. For each parameter, we calculated the average (avg), minimum (min), maximum (max), and standard deviation (sd) for each 1 h period (calculated only when two or more values were available), as well as the 3 h difference (delta_3h).
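The hand-crafted vital-sign features listed above might be computed along the following lines. This is a sketch under assumptions the text does not spell out: that sd is the sample standard deviation and is the only statistic requiring two or more values, and that delta_3h is the hourly average minus the hourly average three hours earlier. The helper names and the readings are hypothetical.

```python
from statistics import mean, stdev


def summarize_hour(values):
    """Summarise the measurements of one parameter within a 1 h period.
    The standard deviation is reported only when two or more values exist."""
    return {
        "avg": mean(values),
        "min": min(values),
        "max": max(values),
        "sd": stdev(values) if len(values) >= 2 else None,
    }


def delta_3h(hourly_avg, t):
    """Hourly average at hour t minus the hourly average at hour t - 3
    (None when either value is unavailable)."""
    if t < 3 or hourly_avg[t] is None or hourly_avg[t - 3] is None:
        return None
    return hourly_avg[t] - hourly_avg[t - 3]


hr_hour = [88, 92, 90]              # hypothetical heart-rate readings within one hour
stats = summarize_hour(hr_hour)
hr_avg = [80.0, 82.0, 85.0, 90.0]   # hypothetical hourly averages for hours 0..3
change = delta_3h(hr_avg, 3)
print(stats, change)
```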

For the right-aligned (real-time) prediction approach, datasets were structured relative to the time of sepsis onset in patients who developed sepsis, allowing estimation of sepsis risk within a predefined prediction window [35]. Negative windows were obtained both from earlier time points in patients who later developed sepsis and from patients who did not develop sepsis, following a case–crossover–control (full-window) approach, which accurately reflects real-world conditions and is currently recommended for the development and validation of real-time sepsis prediction models [29,36]. The models were trained to predict sepsis within the next 6 h (prediction window) based on data collected during the preceding 3 h for vital signs and 12 h for laboratory parameters (observation window), using sliding windows with a 1 h shift. This sampling approach was consistently applied across all datasets. The prediction framework is illustrated in Figure 1.
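A minimal sketch of the sliding-window labelling may help make the sampling scheme concrete. The boundary conventions here are assumptions rather than details confirmed by the text: a window at hour t is labelled positive when onset falls within the following 6 h, and hours at or after onset are dropped. The function names and the example stays are hypothetical.

```python
def label_windows(stay_hours, onset_hour=None, horizon=6):
    """Return (t, label) for each hourly prediction time t.
    label = 1 if sepsis onset falls within (t, t + horizon], else 0.
    Hours at or after onset are dropped (an assumption of this sketch)."""
    samples = []
    for t in range(stay_hours):
        if onset_hour is not None and t >= onset_hour:
            break
        positive = onset_hour is not None and onset_hour - t <= horizon
        samples.append((t, int(positive)))
    return samples


def observation_window(t, length):
    """Inclusive hour range [t - length + 1, t] feeding the prediction at t."""
    return (max(0, t - length + 1), t)


septic = label_windows(48, onset_hour=10)   # hypothetical patient, onset at hour 10
control = label_windows(5)                  # hypothetical patient without sepsis
vitals_span = observation_window(9, 3)      # 3 h of vital signs
labs_span = observation_window(9, 12)       # 12 h of laboratory values
print(septic, control, vitals_span, labs_span)
```

The septic patient contributes both negative windows (hours 0 to 3) and positive windows (hours 4 to 9), while the control contributes only negatives, which is the case–crossover–control idea in miniature.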

2.4. Statistical Analysis and Model Development

Continuous variables were summarized using medians with interquartile ranges (IQR), while categorical variables were reported as absolute numbers and percentages. Normality was assessed using the Shapiro-Wilk test. Continuous variables were compared using non-parametric tests: the Mann-Whitney U test for two groups and the Kruskal-Wallis test for three or more groups. Categorical variables were compared using the χ² test or Fisher’s exact test. Two-sided p-values < 0.05 were considered statistically significant. Bonferroni correction was applied by multiplying p-values by the number of comparisons while retaining the significance threshold.
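The Bonferroni adjustment described above amounts to a one-line transformation. The sketch below is illustrative only: the p-values are hypothetical, and capping adjusted values at 1.0 is a common convention assumed here rather than something the text states.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


raw = [0.004, 0.020, 0.300, 0.600]          # hypothetical raw p-values
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]  # threshold itself stays at 0.05
print(adjusted, significant)
```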

Two strategies were applied for predictive modeling. The first approach, focused on the prediction of sepsis in patients with PCI/CCI, used the RICD dataset randomly split into training (60%), validation (20%), and internal test (20%) subsets. The second approach aimed to develop a universal sepsis prediction model by combining 80% of RICD with the Challenge-1 dataset (acute critically ill populations), followed by an 80:20 split for training and validation, using the same 20% of RICD for internal validation as in the first approach. External validation was performed on the Challenge-2 dataset in both approaches.
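A 60/20/20 split of the kind described for the first strategy could look like the sketch below. The text does not say whether the split was made at the patient level or the window level; splitting by patient is an assumption made here because it keeps all windows of one patient in a single subset. The function name, the seed, and the integer IDs are hypothetical.

```python
import random


def split_patients(patient_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle unique patient IDs and cut them into train/validation/test parts,
    so that all windows from one patient land in a single subset."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


# 575 placeholder IDs, matching the size of the RICD cohort.
train, val, test = split_patients(range(575))
print(len(train), len(val), len(test))
```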

All ML models—Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Random Forest, and LightGBM—were trained within the right-aligned prediction framework, using the same observation and prediction windows, feature engineering pipeline, and sampling strategy across all datasets. The XGBoost and LightGBM models incorporated class imbalance correction by applying the scale_pos_weight parameter, which was calculated as the ratio of negative to positive samples in the training dataset, and these weights were accounted for in the loss function during model training. For Random Forest, class imbalance was addressed using the class_weight = ‘balanced’ option, which scales class weights inversely to class prevalence. Hyperparameter optimization was performed using grid search with predefined parameters and early stopping criteria. The optimal hyperparameters were selected based on the area under the receiver operating characteristic curve (AUROC) on the validation set.
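The two imbalance corrections can be illustrated with plain arithmetic. The sketch below computes the negative-to-positive ratio used for scale_pos_weight and the weights implied by scikit-learn's documented 'balanced' heuristic, n_samples / (n_classes × class count). The label vector is hypothetical and the helper names are not from the authors' code.

```python
from collections import Counter


def scale_pos_weight(labels):
    """Ratio of negative to positive samples, as passed to the
    scale_pos_weight parameter of XGBoost or LightGBM."""
    counts = Counter(labels)
    return counts[0] / counts[1]


def balanced_class_weights(labels):
    """Weights n_samples / (n_classes * n_samples_in_class), the formula
    behind scikit-learn's class_weight='balanced'."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


y_train = [0] * 95 + [1] * 5   # hypothetical window labels (5% positive)
spw = scale_pos_weight(y_train)
weights = balanced_class_weights(y_train)
print(spw, weights)
```

Both schemes up-weight the positive class by the same relative factor (19 in this toy example); they differ only in how the weights are normalised.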

All predictors were used in their original scale without standardization or transformation, as all applied tree-based models are scale-invariant. Missing values were replaced with a constant placeholder (−999) prior to model training to ensure compatibility across all applied algorithms, including those unable to handle missing values natively (e.g., AdaBoost, Random Forest) [37]. XGBoost and LightGBM natively handle missing values by assigning them to the child node (left or right) that minimizes the loss function at each split, with the learned default direction stored in the model.
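The placeholder imputation is simple enough to show in a few lines. This is only a sketch: the helper name and the example row are hypothetical, and the only detail taken from the text is the constant −999.

```python
import math

MISSING = -999  # constant placeholder reported in the text


def fill_missing(row, placeholder=MISSING):
    """Replace None/NaN entries of one feature vector with the placeholder."""
    return [
        placeholder if v is None or (isinstance(v, float) and math.isnan(v)) else v
        for v in row
    ]


row = [72.0, None, float("nan"), 36.6]  # hypothetical HR, lactate, CRP, temperature
filled = fill_missing(row)
print(filled)
```

Because −999 lies far outside the plausible range of the vital signs and laboratory values, a tree split can separate imputed entries from observed ones.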

Model performance was evaluated on both internal and external test sets. The primary evaluation metric was the area under the receiver operating characteristic curve (AUROC), reported with 95% confidence intervals (CIs) and standard deviations (SDs). Optimal cut-off values were determined using Youden’s index, with additional thresholds identified based on the maximization of the positive likelihood ratio (LR+). Secondary metrics included sensitivity, specificity, positive and negative predictive values (PPV, NPV), accuracy (Acc) (adjusted for event prevalence), and F1 score. The F1 score was defined as the harmonic mean of precision and recall: F1 = 2 × (precision × recall)/(precision + recall).
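Threshold selection by Youden's index and the F1 definition given above can be sketched as follows. The labels and scores are hypothetical, the helper names are illustrative, and the code assumes both classes are present and that at least one sample is predicted positive at the chosen threshold.

```python
def confusion(y_true, scores, threshold):
    """Counts of TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    return tp, fp, tn, fn


def youden_threshold(y_true, scores):
    """Threshold maximising J = sensitivity + specificity - 1."""
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(y_true, scores, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [0, 0, 0, 0, 1, 0, 1, 1]                   # hypothetical labels
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]   # hypothetical model outputs
threshold, j = youden_threshold(y_true, scores)
tp, fp, tn, fn = confusion(y_true, scores, threshold)
f1 = f1_score(tp, fp, fn)
print(threshold, j, f1)
```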

Model robustness was assessed using a leave-one-patient-out (LOPO) sensitivity analysis. Comparative analysis of AUROC between models was conducted using the DeLong method [38]. In the absence of statistically significant differences between AUROC values, the highest value was selected based on simple arithmetic comparison. A predefined subgroup analysis was conducted to evaluate model performance separately in patients with hyperinflammatory and hypoinflammatory sepsis phenotypes. Decision curve analysis (DCA) was also conducted to assess the clinical utility of the model across a range of threshold probabilities.
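For readers unfamiliar with DCA, the quantity it plots is the net benefit at each threshold probability. The sketch below uses the standard net-benefit formula; the counts are hypothetical and are not results from this study.

```python
def net_benefit(tp, fp, n, pt):
    """Net benefit of acting on positive predictions at threshold probability pt:
    true positives per patient minus false positives weighted by pt / (1 - pt)."""
    return tp / n - (fp / n) * (pt / (1 - pt))


def net_benefit_treat_all(prevalence, pt):
    """Reference strategy that treats every patient."""
    return prevalence - (1 - prevalence) * (pt / (1 - pt))


# Hypothetical counts: 1000 windows, 50 septic; the model flags 40 true
# and 150 false positives at a threshold probability of 0.2.
nb_model = net_benefit(tp=40, fp=150, n=1000, pt=0.2)
nb_all = net_benefit_treat_all(prevalence=0.05, pt=0.2)
nb_none = 0.0  # treating nobody yields zero net benefit by definition
print(nb_model, nb_all, nb_none)
```

A model is considered useful at a given threshold when its net benefit exceeds that of both reference strategies, as it does in this toy example.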

Statistical analyses and model development were performed using IBM SPSS Statistics v. 29.0 (IBM Corp., Armonk, NY, USA) and Python (v3.12). PPV, NPV, and Acc were assessed using the MedCalc web application. SHAP summary plots, calibration curves, and force plots were used to visualize and interpret model predictions.

Detailed model-building code, preprocessing scripts, and hyperparameter configurations are available on GitHub (https://github.com/MikhailYadgarov/Sepsis-prediction, accessed on 15 October 2025).

3. Results

3.1. Patient Characteristics

A total of 575 patients (388,914 patient–hours) from the RICD dataset met the eligibility criteria, of whom 336 (58.4%) developed sepsis during their ICU stay (Figure 2). The RICD cohort was predominantly composed of patients with PCI/CCI [13], with a median ICU length of stay of 42 days (IQR 30–59); mechanical ventilation was required in over 95% of patients, and more than 98% were transferred from other ICUs. Sepsis onset occurred markedly later in RICD (median 233 h) compared to Challenge datasets (35 h) (Figure S1). Moreover, the datasets differed in terms of sex and age distribution, with the RICD cohort exhibiting lower levels of leukocytes and higher platelet counts compared to the Challenge datasets. Detailed comparative data across datasets are provided in Tables S1–S5. Among septic patients in the RICD cohort, 243 (72%) episodes were classified as hyperinflammatory. Compared to the hypoinflammatory phenotype, these patients developed sepsis earlier, had higher C-reactive protein, leucocyte, fibrinogen, and procalcitonin levels on admission, and more frequently presented with septic shock and nosocomial pneumonia (Table S6).

3.2. ML Sepsis Prediction Models

Comparative baseline characteristics of training, validation, and test datasets are presented in Tables S7 and S8.

In the PCI/CCI-focused approach, the XGBoost model demonstrated the best performance, with 26 selected predictors and AUROC values of 0.875 (train), 0.711 (validation), and 0.753 (internal test) (Figure 3). Performance on the full RICD cohort reached AUROC 0.819 (Table S9). However, this model failed to generalize to external data from acute critically ill populations, yielding an AUROC of 0.474 on Challenge-2 (Figure S2). In contrast, the best-performing model in the universal approach was LightGBM, based on 25 predictors. It achieved AUROC values of 0.756 (train), 0.698 (validation), and 0.754 (internal test), with a full RICD AUROC of 0.802 and an external test AUROC of 0.655 (Figure 3, Table S9, Figure S2).

The XGBoost model from the PCI/CCI approach outperformed the LightGBM model from the universal approach for sepsis prediction in the RICD cohort (AUROC 0.819 vs. 0.802; p = 0.0012, Figure S3).

SHAP analysis highlighted RR, age, HR, and body temperature as key contributors (Figure 4). Examples of patient-specific predictor impact are shown in Figure S4; the calibration curve is presented in Figure S5.

Multiple cut-off points were evaluated for the XGBoost model, showing high specificity (75–87%) and NPV (>97%) but low-to-moderate sensitivity (60–87%) and PPV (16–21%) across thresholds, suggesting limited reliability of positive predictions (Table S10).

Subgroup analysis revealed superior performance of the XGBoost model in patients with a hypoinflammatory phenotype (AUROC 0.843 vs. 0.810; p < 0.001, Figure S6), with distinct patterns of predictor contributions (Figure S7). Model robustness was confirmed by a LOPO cross-validation (SD 0.0005; 95% CI 0.818–0.820). DCA confirmed the clinical utility of the model across relevant thresholds (Figure S8). An example of predicted sepsis score dynamics is provided in Figure S9.

4. Discussion

4.1. Key Findings

In this study, we developed and validated a right-aligned ML model for early sepsis prediction using a large institutional dataset of ICU patients with PCI/CCI (RICD) and two publicly available ICU datasets. The analysis included 575 patients (388,914 patient–hours) from the RICD dataset and over 40,000 ICU patients (1,535,484 patient–hours) from the PhysioNet Challenge datasets, representing critically ill patients.

Two modeling strategies were evaluated. The PCI/CCI-focused model, trained exclusively on institutional data, demonstrated robust discrimination within this cohort (AUROC 0.819) but limited generalizability to acute critically ill populations (AUROC 0.474 in Challenge-2). In contrast, the universal model, trained on both institutional and external data, yielded more balanced performance across datasets (AUROC 0.802 in RICD; AUROC 0.655 in Challenge-2), but showed a relative loss of specificity and discrimination within the PCI/CCI subgroup.

Given these findings, further evaluation was performed using the XGBoost model derived from the PCI/CCI-focused strategy, which demonstrated the most favorable trade-off between discrimination, specificity, and alignment with the clinical characteristics of the target cohort. The final model included 26 predictors. SHAP analysis identified RR, age, HR, and body temperature as the most important features. While PPVs were low (16–21%), the model consistently achieved high specificity (up to 87%) and NPVs exceeding 97%.

Subgroup analysis showed significantly better model performance in patients with a hypoinflammatory phenotype (AUROC 0.843) compared to the hyperinflammatory phenotype (AUROC 0.810). Clinical utility was supported by DCA, which demonstrated a net clinical benefit across a wide range of threshold probabilities, with a maximum net benefit of 0.237, corresponding to approximately 24 additional correct decisions per 100 patients compared to strategies without model use. Model robustness was confirmed by LOPO cross-validation.

4.2. Relationship with Previous Studies

The findings of this study regarding the performance of sepsis prediction models are generally consistent with previously published results. Decision tree-based algorithms, including those used in our work, have demonstrated predictive performance comparable to that of neural networks, while offering better interpretability for clinical application [12]. Several studies utilizing the PhysioNet Challenge datasets reported strong predictive performance of tree-based models for sepsis prediction within a 6-h window. Chen et al. (2022) developed a random forest model achieving an AUROC of 0.850 [39]; Li et al. (2020) reported a LightGBM model with an AUROC of 0.850 [22]; Rangan et al. (2022) developed an XGBoost model with an AUROC of 0.940 [40]; and Yang et al. (2020) reported an XGBoost model with an AUROC of 0.850 [25]. However, direct comparison is limited by the absence of publicly available datasets representing PCI/CCI populations similar to RICD. Furthermore, our attempt to build a universal model applicable to both PCI/CCI and acute critically ill populations did not yield satisfactory results. This finding, together with the observation that a model with high performance in the PCI/CCI cohort failed to generalize to acute critically ill patients, can be attributed to several key factors.

First, patient characteristics in the PCI/CCI cohort differed substantially from those in the Challenge datasets. Patients with PCI/CCI had a higher incidence of sepsis and markedly longer ICU stays. Prolonged ICU stays in this population are associated with progressive anemia and a high need for transfusions, particularly in chronic critical illness [41]. Low lymphocyte and leukocyte counts are commonly observed in the course of PCI/CCI and are considered hallmarks of immune dysregulation [42]. Although elevated platelet counts are not typically associated with chronic critical illness, in our cohort, they may reflect reactive thrombocytosis secondary to systemic inflammation, likely driven by a high burden of infection—particularly pneumonia, which was documented in over 65% of patients at ICU admission.

Second, the timing and trajectory of sepsis onset differed markedly. In the RICD cohort, sepsis episodes occurred later and without a distinct peak (median 233 h), whereas in the Challenge datasets, sepsis typically developed within the first 24–48 h of ICU admission (median 35 h).

Third, the predictors of sepsis development in PCI/CCI patients differed markedly from those observed in acute critically ill populations. In acute critical care settings, sepsis is commonly associated with increased HR, RR, body temperature, and decreased blood pressure and SpO₂ [25,40]. In contrast, in the PCI/CCI cohort, SHAP analysis identified increased HR as a shared risk factor, while lower RR, lower body temperature, higher diastolic and mean blood pressures, and higher SpO₂ were associated with sepsis risk. Elevated blood pressure in PCI/CCI patients likely reflects the effects of vasopressor and fluid therapy administered to maintain adequate perfusion and hemodynamic stability. Lower body temperature may reflect a reduced systemic inflammatory response and impaired thermoregulation associated with immune dysfunction in chronic critical illness [43]. Decreased RR and increased SpO₂ likely reflect the impact of mechanical ventilation. Thus, vital signs in PCI/CCI patients may serve as surrogate markers reflecting both physiological status and the effects of intensive care interventions. Other risk factors identified in our model, including patient age (although no linear association was observed) and comorbidities, are consistent with previously established predictors of sepsis [44,45,46]. Laboratory variables, including low hemoglobin, hypoalbuminemia, elevated C-reactive protein, and increased lactate levels, were also associated with sepsis risk and are supported by existing evidence [47,48,49,50]. Male sex was additionally linked to higher risk, consistent with prior reports [51,52].

Fourth, sepsis is a heterogeneous clinical syndrome characterized by marked biological and phenotypic variability. Depending on the classification approach, previous studies have identified between two and four distinct sepsis phenotypes, differing in immune response profiles, clinical course, and outcomes [32,33,34]. In our study, we stratified patients into two phenotypes based on SIRS criteria: among septic patients in the RICD cohort, 243 (72%) episodes were classified as hyperinflammatory. The predictors of sepsis onset differed between the two phenotypes in the PCI/CCI population. In the hypoinflammatory phenotype, lower CRP levels were associated with an increased risk of sepsis, likely reflecting immune suppression and insufficient inflammatory response to infection. In contrast, in the hyperinflammatory phenotype, higher CRP levels were associated with increased risk, probably indicating an exaggerated cytokine-mediated inflammatory response [53]. Similarly, the association between MBP and sepsis risk differed between groups. In the hypoinflammatory phenotype, lower MBP was associated with increased risk, possibly reflecting inadequate vascular compensatory mechanisms and evolving hypoperfusion [54]. In contrast, in the hyperinflammatory phenotype, higher MBP values may reflect the use of high-dose vasopressors required to maintain hemodynamics, and were associated with increased risk. These findings underscore the high degree of heterogeneity in sepsis and support the concept that universal prediction models are unlikely to perform adequately across different populations. Instead, prediction models should be developed with consideration of the specific demographics and clinical characteristics of the target population [55].

Fifth, differences in data structure and outcome definitions between the institutional real-world dataset and the curated PhysioNet Challenge datasets may have additionally contributed to the observed performance reduction, as the latter rely on a fixed, preprocessed feature set and predefined sepsis labels that may not fully capture the temporal complexity and clinical context of PCI/CCI populations.

The PPVs obtained at optimal cut-off points remained low (up to 21%), meaning that for every 100 model-generated alerts, only approximately 21 would correspond to true sepsis cases, while the remaining positive predictions would represent false positives. Low PPV is a well-known limitation of predictive models, particularly when applied to clinical scenarios with a sepsis prevalence below 50%, which may increase the rate of false alerts and limit their clinical applicability [56]. Notably, even models tested under prospective-like conditions have demonstrated similarly modest PPV values. For example, in the study by Yu et al. (2022), an ML model predicting sepsis six hours before onset achieved a PPV of 29.1% in a pseudo-prospective evaluation conducted in a general ward setting [57]. These findings highlight the need for careful interpretation of positive alerts in clinical practice. In the ICU environment, where clinicians are already exposed to a high burden of physiologic and device-related alarms, additional predictive alerts without a clearly defined action pathway may exacerbate alarm fatigue and reduce trust in decision support systems [58]. Prior studies have shown that alerts lacking clear actionability or evidence-backed response protocols are frequently ignored or overridden, limiting their potential clinical impact [59,60,61,62]. Therefore, future implementation of our sepsis prediction model must be preceded by prospective validation and explicitly address alert thresholding, expected alert burden, and integration with predefined clinical workflows to ensure that alerts prompt meaningful clinical actions.
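The dependence of PPV on prevalence follows directly from Bayes' rule and can be illustrated numerically. In the sketch below, the sensitivity and specificity are taken from the ends of the ranges reported for the XGBoost model, but pairing them is an illustrative choice rather than a reported operating point, and the 5% window-level prevalence is a hypothetical value.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from sensitivity, specificity, and prevalence."""
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tn / (tn + fn)


sens, spec = 0.60, 0.87                         # illustrative pairing (see lead-in)
ppv_low = ppv(sens, spec, prevalence=0.05)      # hypothetical 5% prevalence
npv_low = npv(sens, spec, prevalence=0.05)
ppv_high = ppv(sens, spec, prevalence=0.50)     # same classifier, 50% prevalence
print(round(ppv_low, 3), round(npv_low, 3), round(ppv_high, 3))
```

With identical sensitivity and specificity, the PPV is roughly 0.20 at 5% prevalence and roughly 0.82 at 50% prevalence, while the NPV at low prevalence stays above 0.97.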

4.3. Significance of the Study Findings

Our findings confirm the feasibility of developing a clinically applicable ML model for early sepsis prediction in ICU patients with PCI/CCI, a population defined by high clinical complexity, unique risk factors, and extended ICU stays.

The applicability of right-aligned prediction models has been supported by multiple studies, which demonstrated that clinical validation and subsequent integration of such models into electronic health records as components of decision support tools can facilitate earlier initiation of antimicrobial therapy and improve patient outcomes, including reduced hospital length of stay and decreased mortality rates in patients with sepsis [11,63,64,65,66].

Despite the potential advantages of ML-based predictive models, their implementation in real-world clinical practice is often challenged by concerns over algorithm transparency and interpretability, frequently referred to as the “black box” phenomenon [67]. In our study, the use of SHAP summary plots for model interpretation and SHAP force plots for individual patient-level explanations provided transparency regarding the contribution of each predictor to the model’s output. This approach may enhance clinician trust in ML-driven tools and support their acceptance in critical care settings [68].

In addition, our findings underscore the importance of population-specific model development. The significant heterogeneity in patient characteristics, sepsis trajectories, and predictor relevance observed between PCI/CCI patients and acute critically ill populations suggests that predictive models trained on generalized cohorts may underperform when applied to specialized subgroups. Consequently, tailored models, calibrated to the clinical and physiological features of target populations, are likely to offer superior clinical utility and reliability. At the current stage, the proposed model should be viewed neither as a diagnostic instrument nor as a tool for autonomous clinical decision-making or treatment initiation. Its practical role is limited to exploratory risk stratification in selected ICU populations, and use outside this context—particularly in patients with established sepsis or ongoing antibiotic therapy—may result in misinterpretation and inappropriate clinical reliance. Following successful prospective validation, the model could theoretically be used as a clinical decision support tool to assist in decisions regarding initiation or modification of antibiotic therapy, with a pragmatic threshold range of approximately 0.4–0.7, which demonstrated the greatest net clinical benefit in DCA.
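For reference, the net benefit used in DCA at a threshold probability p_t is commonly defined as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below computes it for a small set of made-up labels and risk scores and compares it with a treat-all strategy. The data and the function names `net_benefit` and `net_benefit_treat_all` are illustrative assumptions and do not reproduce the study's code or results.

```python
def net_benefit(y_true, y_score, threshold):
    """Net benefit of acting on patients whose score is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    # Odds of the threshold: how heavily a false positive is penalized
    # relative to the benefit of a true positive.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of acting on every patient regardless of score."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Made-up example data (assumption): 1 = sepsis, 0 = no sepsis.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.55, 0.6, 0.3, 0.2, 0.1, 0.45, 0.35, 0.05]

for pt in (0.4, 0.5, 0.6, 0.7):
    nb_model = net_benefit(y_true, y_score, pt)
    nb_all = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f}  model={nb_model:.3f}  treat-all={nb_all:.3f}")
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).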

4.4. Strengths and Limitations

To our knowledge, this is the first study to develop and validate an ML-based sepsis prediction model specifically tailored for ICU patients with PCI/CCI. To account for potential heterogeneity, the study included diverse ICU cohorts from institutional and public datasets. The case–crossover–control sampling approach (also known as the full-window strategy) used in our study has been recognized as one of the most clinically relevant methodologies for model development, providing sampling conditions closest to real-life clinical application [29,36]. Another strength is the use of external testing, which is still rarely performed for AI-based ICU models (reported in only 14.7% of studies [69]). However, a full external test in PCI/CCI cohorts was not feasible because no comparable publicly available datasets represent this patient group.

Nevertheless, several limitations should be considered when interpreting our results.

First, the model was developed on a single-center cohort of PCI/CCI patients, which may affect its generalizability, although an external test was performed on multicenter datasets of acute critically ill populations. Moreover, the prolonged ICU stays, frequent inter-hospital transfers, prolonged immobilization, and delayed ICU liberation in our cohort differ from care pathways in other centers, potentially affecting model transportability to settings with shorter ICU trajectories. Second, only the first sepsis episode per patient was analyzed; recurrent episodes and their timing were not assessed. Third, although an external test was performed, the models have not been prospectively tested in clinical practice; therefore, the reported retrospective performance should be interpreted as hypothesis-generating, and prospective, outcome-based evaluation is required before any clinical deployment. Fourth, the analysis was limited to a 6-h prediction window, and the applicability of the model for longer prediction horizons remains unexplored. Fifth, two of the four applied ML algorithms (AdaBoost and Random Forest) lack native mechanisms for handling missing values, necessitating constant-value imputation, which is not an optimal strategy and may introduce bias. In the ICU setting, missingness may be informative, and our approach may have influenced both model learning and SHAP-based interpretation; no sensitivity analysis with alternative imputation strategies was performed. Sixth, the occurrence of in-hospital complications and the potential impact of therapeutic interventions were not accounted for in the prognostic model. Seventh, SHAP was used as a post-hoc attribution tool to visualize model behavior; however, SHAP values may be unstable under predictor collinearity, do not imply causality, and should not be interpreted as clinically actionable or definitive explanations of underlying pathophysiology [70,71]. Moreover, the SIRS-based sepsis phenotype stratification was used solely as an exploratory tool to evaluate heterogeneity of model performance and was not intended to redefine sepsis phenotypes, which remain an area without consensus definitions. Finally, the model demonstrated relatively low PPV and a high false-positive rate, which may affect clinical acceptance and requires further evaluation in future studies.
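One way to probe the imputation limitation noted above would be to compare constant-value imputation with an alternative, such as median imputation combined with explicit missingness indicators. The sketch below is a hypothetical illustration in plain Python. The feature names, the sentinel value of -1, and both helper functions are assumptions and do not reproduce the study's preprocessing.

```python
from statistics import median


def impute_constant(rows, fill_value=-1.0):
    """Replace every missing value (None) with a fixed sentinel."""
    return [
        {name: (fill_value if value is None else value) for name, value in row.items()}
        for row in rows
    ]


def impute_median_with_flags(rows):
    """Replace missing values with the feature median and add 0/1 missingness flags.

    Assumes every feature has at least one observed value.
    """
    features = list(rows[0].keys())
    medians = {
        name: median(row[name] for row in rows if row[name] is not None)
        for name in features
    }
    imputed = []
    for row in rows:
        new_row = {}
        for name in features:
            is_missing = row[name] is None
            new_row[name] = medians[name] if is_missing else row[name]
            new_row[name + "_missing"] = int(is_missing)
        imputed.append(new_row)
    return imputed


# Made-up hourly records (assumption); None marks an unmeasured value.
rows = [
    {"heart_rate": 88.0, "lactate": None},
    {"heart_rate": None, "lactate": 2.1},
    {"heart_rate": 104.0, "lactate": 1.3},
]

print(impute_constant(rows))
print(impute_median_with_flags(rows))
```

Training the same model on both versions and comparing discrimination and feature attributions would be one simple form of the sensitivity analysis that was not performed here.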

4.5. Future Studies and Prospects

The findings of this study highlight several directions for future research. First, the development and open availability of dedicated PCI/CCI datasets is essential to facilitate external validation, reproducibility, and benchmarking of predictive models in this population. Second, prospective clinical trials are warranted to assess the real-world effectiveness and clinical impact of sepsis prediction models in PCI/CCI settings. Third, the absence of a standardized definition for PCI/CCI remains a major barrier to model generalizability and cross-study comparison.

5. Conclusions

In this study, a right-aligned machine learning model was developed and validated for early sepsis prediction in ICU patients with prolonged or chronic critical illness, demonstrating robust discrimination within this population but limited generalizability to acute critically ill patients. The model’s performance underscores the critical importance of population-specific prediction strategies and highlights the need for tailored approaches in heterogeneous ICU populations. Further prospective studies and the development of dedicated datasets are warranted to validate these findings and to facilitate the integration of sepsis prediction models into clinical practice.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm15020777/s1, Supplemental Materials S1. TRIPOD + AI checklist; Supplemental Materials S2. List of features; Table S1. Comparative characteristics of the three datasets; Table S2. Detailed characteristics of patients (RICD dataset); Table S3. Characteristics of vital and lab parameters (RICD dataset); Table S4. Characteristics of vital and lab parameters (Challenge-1 dataset); Table S5. Characteristics of vital and lab parameters (Challenge-2 dataset); Table S6. Comparative characteristics of patients with hypo- and hyperinflammatory sepsis phenotypes (RICD dataset); Table S7. Comparative baseline characteristics of the train set, validation set, internal and external test sets (PCI/CCI sepsis prediction model); Table S8. Comparative baseline characteristics of the train set, validation set, internal and external test sets (Universal sepsis prediction model); Table S9. AUROC values (and 95% CIs) of machine learning models for 6-h sepsis prediction across training, validation, and test sets; Table S10. Performance characteristics of best machine learning model for 6-h sepsis prediction (RICD dataset); Figure S1. Distribution of time to sepsis onset after ICU admission in three datasets; Figure S2. ROC curves of the best-performing machine learning models for early sepsis prediction (6-h window, external validation); Figure S3. ROC curves of the best-performing machine learning models for early sepsis prediction (6-h window, RICD dataset); Figure S4. Force plot illustrating predictor contributions for two patients from the RICD dataset; Figure S5. Calibration curve for XGBoost model (train set); Figure S6. ROC curves of the XGBoost model for hyperinflammatory and hypoinflammatory sepsis phenotypes (RICD dataset); Figure S7. SHAP summary plots of the XGBoost model for hyperinflammatory and hypoinflammatory sepsis phenotypes (RICD dataset); Figure S8. Decision curve analysis for the XGBoost model (RICD dataset, balanced 1:1 sample); Figure S9. Example of sepsis score dynamics over time predicted by the XGBoost model (RICD dataset, individual patient trajectory).

Author Contributions

Conceptualization: M.Y.Y., A.A.Y., A.V.G. and V.V.L.; Writing the original draft: M.Y.Y., O.Y.R., L.B.B., P.A.P. and K.K.K.; Revision of original draft: M.Y.Y., O.Y.R., L.B.B., P.A.P., K.K.K., A.A.Y., A.V.G. and V.V.L.; Data extraction: M.Y.Y., P.A.P. and L.B.B.; Data analysis: M.Y.Y., O.Y.R. and P.A.P.; Supervision: A.A.Y., A.V.G. and V.V.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study received approval from the FRCC ICMR local ethics committee (approval No. 1/24/1, 24 April 2024).

Informed Consent Statement

Informed consent was obtained from all patients in the study.

Data Availability Statement

Publicly and partially available datasets were analyzed in this study. The PhysioNet/Computing in Cardiology Challenge 2019 dataset is available at https://physionet.org/content/challenge-2019/1.0.0/ (accessed on 8 August 2025). The RICD dataset can be obtained upon request at https://fnkcrr-database.ru/ (accessed on 8 August 2025). The source code is available on GitHub (https://github.com/MikhailYadgarov/RICDv2-sql-code (accessed on 8 August 2025); https://github.com/MikhailYadgarov/Sepsis-prediction (accessed on 8 August 2025)) and upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Acc	accuracy
AUROC	area under the receiver operating characteristic curve
CCI	chronic critical illness
CI	confidence interval
CRP	C-reactive protein
DCA	decision curve analysis
DBP	diastolic blood pressure
FRCC ICMR	Federal Research and Clinical Center of Intensive Care Medicine and Rehabilitology
HR	heart rate
ICU	intensive care unit
IQR	interquartile range
LOPO	leave-one-patient-out
MBP	mean blood pressure
ML	machine learning
NB	net benefit
NPV	negative predictive value
PCI	prolonged critical illness
PPV	positive predictive value
RICD	Russian Intensive Care Dataset
ROC	receiver operating characteristic
RR	respiratory rate
SBP	systolic blood pressure
SD	standard deviation
SHAP	SHapley Additive exPlanations
SIRS	systemic inflammatory response syndrome
SOFA	Sequential Organ Failure Assessment
SpO₂	oxygen saturation
SSC	Surviving Sepsis Campaign
TRIPOD	Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis

References

1. Bomrah, S.; Uddin, M.; Upadhyay, U.; Komorowski, M.; Priya, J.; Dhar, E.; Hsu, S.C.; Syed-Abdul, S. A Scoping Review of Machine Learning for Sepsis Prediction-Feature Engineering Strategies and Model Performance: A Step towards Explainability. Crit. Care 2024, 28, 180.
2. Fleischmann, C.; Scherag, A.; Adhikari, N.K.J.; Hartog, C.S.; Tsaganos, T.; Schlattmann, P.; Angus, D.C.; Reinhart, K. Assessment of Global Incidence and Mortality of Hospital-Treated Sepsis. Current Estimates and Limitations. Am. J. Respir. Crit. Care Med. 2016, 193, 259–272.
3. Raghavan, M.; Marik, P.E. Management of Sepsis during the Early “Golden Hours”. J. Emerg. Med. 2006, 31, 185–199.
4. Im, Y.; Kang, D.; Ko, R.E.; Lee, Y.J.; Lim, S.Y.; Park, S.; Na, S.J.; Chung, C.R.; Park, M.H.; Oh, D.K.; et al. Time-to-Antibiotics and Clinical Outcomes in Patients with Sepsis and Septic Shock: A Prospective Nationwide Multicenter Cohort Study. Crit. Care 2022, 26, 19.
5. Evans, L.; Rhodes, A.; Alhazzani, W.; Antonelli, M.; Coopersmith, C.M.; French, C.; Machado, F.R.; Mcintyre, L.; Ostermann, M.; Prescott, H.C.; et al. Surviving Sepsis Campaign: International Guidelines for Management of Sepsis and Septic Shock 2021. Intensive Care Med. 2021, 47, 1181–1247.
6. Adegbite, B.R.; Edoa, J.R.; Ndzebe Ndoumba, W.F.; Dimessa Mbadinga, L.B.; Mombo-Ngoma, G.; Jacob, S.T.; Rylance, J.; Hänscheid, T.; Adegnika, A.A.; Grobusch, M.P. A Comparison of Different Scores for Diagnosis and Mortality Prediction of Adults with Sepsis in Low-and-Middle-Income Countries: A Systematic Review and Meta-Analysis. eClinicalMedicine 2021, 42, 101184.
7. Konjety, P.; Chakole, V.G. Beyond the Horizon: A Comprehensive Review of Contemporary Strategies in Sepsis Management Encompassing Predictors, Diagnostic Tools, and Therapeutic Advances. Cureus 2024, 16, e64249.
8. He, R.R.; Yue, G.L.; Dong, M.L.; Wang, J.Q.; Cheng, C. Sepsis Biomarkers: Advancements and Clinical Applications—A Narrative Review. Int. J. Mol. Sci. 2024, 25, 9010.
9. Sakib, N.; Ahamed, S.I.; Khan, R.A.; Griffin, P.M.; Haque, M.M. Unpacking Prevalence and Dichotomy in Quick Sequential Organ Failure Assessment and Systemic Inflammatory Response Syndrome Parameters: Observational Data–driven Approach Backed by Sepsis Pathophysiology. JMIR Med. Inform. 2020, 8, e18352.
10. Islam, M.M.; Nasrin, T.; Walther, B.A.; Wu, C.C.; Yang, H.C.; Li, Y.C. Prediction of Sepsis Patients Using Machine Learning Approach: A Meta-Analysis. Comput. Methods Programs Biomed. 2019, 170, 1–9.
11. Fleuren, L.M.; Klausch, T.L.T.; Zwager, C.L.; Schoonmade, L.J.; Guo, T.; Roggeveen, L.F.; Swart, E.L.; Girbes, A.R.J.; Thoral, P.; Ercole, A.; et al. Machine Learning for the Prediction of Sepsis: A Systematic Review and Meta-Analysis of Diagnostic Test Accuracy. Intensive Care Med. 2020, 46, 383–400.
12. Yadgarov, M.Y.; Landoni, G.; Berikashvili, L.B.; Polyakov, P.A.; Kadantseva, K.K.; Smirnova, A.V.; Kuznetsov, I.V.; Shemetova, M.M.; Yakovlev, A.A.; Likhvantsev, V.V. Early Detection of Sepsis Using Machine Learning Algorithms: A Systematic Review and Network Meta-Analysis. Front. Med. 2024, 11, 1491358.
13. Likhvantsev, V.V.; Berikashvili, L.B.; Yadgarov, M.Y.; Yakovlev, A.A.; Kuzovlev, A.N. The Tri-Steps Model of Critical Conditions in Intensive Care: Introducing a New Paradigm for Chronic Critical Illness. J. Clin. Med. 2024, 13, 3683.
14. Hawkins, R.B.; Raymond, S.L.; Stortz, J.A.; Horiguchi, H.; Brakenridge, S.C.; Gardner, A.; Efron, P.A.; Bihorac, A.; Segal, M.; Moore, F.A.; et al. Chronic Critical Illness and the Persistent Inflammation, Immunosuppression, and Catabolism Syndrome. Front. Immunol. 2018, 9, 1511.
15. Ohbe, H.; Satoh, K.; Totoki, T.; Tanikawa, A.; Shirasaki, K.; Kuribayashi, Y.; Tamura, M.; Takatani, Y.; Ishikura, H.; Nakamura, K. Definitions, Epidemiology, and Outcomes of Persistent/Chronic Critical Illness: A Scoping Review for Translation to Clinical Practice. Crit. Care 2024, 28, 435.
16. Grechko, A.V.; Yadgarov, M.Y.; Yakovlev, A.A.; Berikashvili, L.B.; Kuzovlev, A.N.; Polyakov, P.A.; Kuznetsov, I.V.; Likhvantsev, V.V. Russian Intensive Care Dataset—RICD. Obs. Reanimatol. 2024, 20, 22–31.
17. RICD—Open Dataset. Available online: https://fnkcrr-database.ru (accessed on 14 July 2025).
18. Reyna, M.A.; Josef, C.S.; Jeter, R.; Shashikumar, S.P.; Westover, M.B.; Nemati, S.; Clifford, G.D.; Sharma, A. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019. Crit. Care Med. 2020, 48, 210–217.
19. Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 2000, 101, E215–E220.
20. Reyna, M.A.; Josef, C.; Seyedi, S.; Jeter, R.; Shashikumar, S.P.; Brandon Westover, M.; Sharma, A.; Nemati, S.; Clifford, G.D. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019. Available online: https://physionet.org/content/challenge-2019/1.0.0/ (accessed on 14 July 2025).
21. Abromavičius, V.; Plonis, D.; Tarasevičius, D.; Serackis, A. Two-Stage Monitoring of Patients in Intensive Care Unit for Sepsis Prediction Using Non-Overfitted Machine Learning Models. Electronics 2020, 9, 1133.
22. Li, X.; Xu, X.; Xie, F.; Xu, X.; Sun, Y.; Liu, X.; Jia, X.; Kang, Y.; Xie, L.; Wang, F.; et al. A Time-Phased Machine Learning Model for Real-Time Prediction of Sepsis in Critical Care. Crit. Care Med. 2020, 48, E884–E888.
23. Rafiei, A.; Rezaee, A.; Hajati, F.; Gheisari, S.; Golzan, M. SSP: Early Prediction of Sepsis Using Fully Connected LSTM-CNN Model. Comput. Biol. Med. 2021, 128, 104110.
24. Zargoush, M.; Sameh, A.; Javadi, M.; Shabani, S.; Ghazalbash, S.; Perri, D. The Impact of Recency and Adequacy of Historical Information on Sepsis Predictions Using Machine Learning. Sci. Rep. 2021, 11, 20869.
25. Yang, M.; Liu, C.; Wang, X.; Li, Y.; Gao, H.; Liu, X.; Li, J. An Explainable Artificial Intelligence Predictor for Early Detection of Sepsis. Crit. Care Med. 2020, 48, E1091–E1096.
26. Prithula, J.; Islam, K.R.; Kumar, J.; Tan, T.L.; Reaz, M.B.I.; Rahman, T.; Zughaier, S.M.; Khan, M.S.; Murugappan, M.; Chowdhury, M.E.H. A Novel Classical Machine Learning Framework for Early Sepsis Prediction Using Electronic Health Record Data from ICU Patients. Comput. Biol. Med. 2025, 184, 109284.
27. Collins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G.M. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD Statement. BMC Med. 2015, 13, 1.
28. Collins, G.S.; Moons, K.G.M.; Dhiman, P.; Riley, R.D.; Beam, A.L.; Van Calster, B.; Ghassemi, M.; Liu, X.; Reitsma, J.B.; Van Smeden, M.; et al. TRIPOD+AI Statement: Updated Guidance for Reporting Clinical Prediction Models That Use Regression or Machine Learning Methods. BMJ 2024, 385, e078378.
29. Wang, Z.; Wang, W.; Sun, C.; Li, J.; Xie, S.; Xu, J.; Zou, K.; Jin, Y.; Yan, S.; Liao, X.; et al. A Methodological Systematic Review of Validation and Performance of Sepsis Real-Time Prediction Models. npj Digit. Med. 2025, 8, 190.
30. Nohara, Y.; Matsumoto, K.; Soejima, H.; Nakashima, N. Explanation of Machine Learning Models Using Shapley Additive Explanation and Application for Real Data in Hospital. Comput. Methods Programs Biomed. 2022, 214, 106584.
31. Shankar-Hari, M.; Phillips, G.S.; Levy, M.L.; Seymour, C.W.; Liu, V.X.; Deutschman, C.S.; Angus, D.C.; Rubenfeld, G.D.; Singer, M. Developing a New Definition and Assessing New Clinical Criteria for Septic Shock: For the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA—J. Am. Med. Assoc. 2016, 315, 775–787.
32. Seymour, C.W.; Kennedy, J.N.; Wang, S.; Chang, C.C.H.; Elliott, C.F.; Xu, Z.; Berry, S.; Clermont, G.; Cooper, G.; Gomez, H.; et al. Derivation, Validation, and Potential Treatment Implications of Novel Clinical Phenotypes for Sepsis. JAMA—J. Am. Med. Assoc. 2019, 321, 2003–2017.
33. DeMerle, K.M.; Kennedy, J.N.; Chang, C.C.H.; Delucchi, K.; Huang, D.T.; Kravitz, M.S.; Shapiro, N.I.; Yealy, D.M.; Angus, D.C.; Calfee, C.S.; et al. Identification of a Hyperinflammatory Sepsis Phenotype Using Protein Biomarker and Clinical Data in the ProCESS Randomized Trial. Sci. Rep. 2024, 14, 6234.
34. Papathanakos, G.; Andrianopoulos, I.; Xenikakis, M.; Papathanasiou, A.; Koulenti, D.; Blot, S.; Koulouras, V. Clinical Sepsis Phenotypes in Critically Ill Patients. Microorganisms 2023, 11, 2165.
35. Lauritsen, S.M.; Thiesson, B.; Jørgensen, M.J.; Riis, A.H.; Espelund, U.S.; Weile, J.B.; Lange, J. The Framing of Machine Learning Risk Prediction Models Illustrated by Evaluation of Sepsis in General Wards. npj Digit. Med. 2021, 4, 158.
36. Schvetz, M.; Fuchs, L.; Novack, V.; Moskovitch, R. Outcomes Prediction in Longitudinal Data: Study Designs Evaluation, Use Case in ICU Acquired Sepsis. J. Biomed. Inform. 2021, 117, 103734.
37. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
38. DeLong, E.R.; DeLong, D.M.; Clarke-Pearson, D.L. Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics 1988, 44, 837–845.
39. Chen, M.; Hernández, A. Towards an Explainable Model for Sepsis Detection Based on Sensitivity Analysis. IRBM 2022, 43, 75–86.
40. Rangan, E.S.; Pathinarupothi, R.K.; Anand, K.J.S.; Snyder, M.P. Performance Effectiveness of Vital Parameter Combinations for Early Warning of Sepsis—An Exhaustive Study Using Machine Learning. JAMIA Open 2022, 5, ooac080.
41. Akbaş, T. Long Length of Stay in the ICU Associates with a High Erythrocyte Transfusion Rate in Critically Ill Patients. J. Int. Med. Res. 2019, 47, 1948–1957.
42. Liang, C.; Rijin, C.; Jinli, W.; Xingwen, L.; En, M. Analysis of Clinical Characteristics of Patients with Chronic Critical Illness after Sepsis. Chin. Crit. Care Med. 2021, 33, 1414–1417.
43. Garami, A.; Steiner, A.A.; Romanovsky, A.A. Fever and Hypothermia in Systemic Inflammation. Handb. Clin. Neurol. 2018, 157, 565–597.
44. Englert, N.C.; Ross, C. The Older Adult Experiencing Sepsis. Crit. Care Nurs. Q. 2015, 38, 175–181.
45. Beutz, M.A.; Abraham, E. Community-Acquired Pneumonia and Sepsis. Clin. Chest Med. 2005, 26, 19–28.
46. Kang, C.; Choi, S.; Jang, E.J.; Joo, S.; Jeong, J.H.; Oh, S.Y.; Ryu, H.G.; Lee, H. Prevalence and Outcomes of Chronic Comorbid Conditions in Patients with Sepsis in Korea: A Nationwide Cohort Study from 2011 to 2016. BMC Infect. Dis. 2024, 24, 184.
47. Liu, Z.; Meng, Z.; Li, Y.; Zhao, J.; Wu, S.; Gou, S.; Wu, H. Prognostic Accuracy of the Serum Lactate Level, the SOFA Score and the QSOFA Score for Mortality among Adults with Sepsis. Scand. J. Trauma. Resusc. Emerg. Med. 2019, 27, 51.
48. Kaya, P.K.; Kaya, M.; Girgin, N.K.; Kahveci, F.; Akalın, E.H.; İşçimen, R. Sepsis Episodes Caused by Pressure Injuries in Critical Illness: A Retrospective Observational Cohort Study. Wound Manag. Prev. 2023, 69, 4–9.
49. Jiang, X.; Zhang, C.; Pan, Y.; Cheng, X.; Zhang, W. Effects of C-Reactive Protein Trajectories of Critically Ill Patients with Sepsis on in-Hospital Mortality Rate. Sci. Rep. 2023, 13, 15223.
50. Qi, D.; Peng, M. Early Hemoglobin Status as a Predictor of Long-Term Mortality for Sepsis Patients in Intensive Care Units. Shock 2021, 55, 215–223.
51. Zhou, J.X.; Luo, X.Y.; Chen, G.Q.; Li, H.L.; Xu, M.; Liu, S.; Yang, Y.L.; Shi, G.; Zhou, J.X.; Zhang, L. Incidence, Risk Factors and Outcomes of Sepsis in Critically Ill Post-Craniotomy Patients: A Single-Center Prospective Cohort Study. Front. Public Health 2022, 10, 895991.
52. Vetter, P.; Niggli, C.; Hambrecht, J.; Pape, H.C.; Mica, L. Sex-Specific Differences in Sepsis Development in Polytrauma Patients Undergoing Stand-Alone Definitive Surgery. Medicina 2025, 61, 183.
53. Bosmann, M.; Ward, P.A. The Inflammatory Response in Sepsis. Trends Immunol. 2013, 34, 129–136.
54. Chen, Q.; Li, W.; Wang, Y.; Chen, X.; He, D.; Liu, M.; Yuan, J.; Xiao, C.; Li, Q.; Chen, L.; et al. Investigating the Association Between Mean Arterial Pressure on 28-Day Mortality Risk in Patients With Sepsis: Retrospective Cohort Study Based on the MIMIC-IV Database. Interact. J. Med. Res. 2025, 14, e63291.
55. Pettinati, M.J.; Chen, G.; Rajput, K.S.; Selvaraj, N. Practical Machine Learning-Based Sepsis Prediction. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 4986–4991.
56. Liu, Z.; Khojandi, A.; Mohammed, A.; Li, X.; Chinthala, L.K.; Davis, R.L.; Kamaleswaran, R. HeMA: A Hierarchically Enriched Machine Learning Approach for Managing False Alarms in Real Time: A Sepsis Prediction Case Study. Comput. Biol. Med. 2021, 131, 104255.
57. Yu, S.C.; Gupta, A.; Betthauser, K.D.; Lyons, P.G.; Lai, A.M.; Kollef, M.H.; Payne, P.R.O.; Michelson, A.P. Sepsis Prediction for the General Ward Setting. Front. Digit. Health 2022, 4, 848599.
58. Xu, D.; Liu, F.; Ding, X.; Ma, J.; Suo, Y.; Peng, Y.-Y.; Li, J.; Fu, X. Exploring ICU Nurses’ Response to Alarm Management and Strategies for Alleviating Alarm Fatigue: A Meta-Synthesis and Systematic Review. BMC Nurs. 2025, 24, 412.
59. Wong, A.; Amato, M.G.; Seger, D.L.; Slight, S.P.; Beeler, P.E.; Dykes, P.C.; Fiskio, J.M.; Silvers, E.R.; Orav, E.J.; Eguale, T.; et al. Evaluation of Medication-Related Clinical Decision Support Alert Overrides in the Intensive Care Unit. J. Crit. Care 2017, 39, 156–161.
60. Ng, H.J.H.; Kansal, A.; Abdul Naseer, J.F.; Hing, W.C.; Goh, C.J.M.; Poh, H.; D’souza, J.L.A.; Lim, E.L.; Tan, G. Optimizing Best Practice Advisory Alerts in Electronic Medical Records with a Multi-Pronged Strategy at a Tertiary Care Hospital in Singapore. JAMIA Open 2023, 6, ooad056.
61. Chaparro, J.D.; Beus, J.M.; Dziorny, A.C.; Hagedorn, P.A.; Hernandez, S.; Kandaswamy, S.; Kirkendall, E.S.; McCoy, A.B.; Muthu, N.; Orenstein, E.W. Clinical Decision Support Stewardship: Best Practices and Techniques to Monitor and Improve Interruptive Alerts. Appl. Clin. Inform. 2022, 13, 560–568.
62. Romare, C.; Anderberg, P.; Sanmartin Berglund, J.; Skär, L. Burden of Care Related to Monitoring Patient Vital Signs during Intensive Care; a Descriptive Retrospective Database Study. Intensive Crit. Care Nurs. 2022, 71, 103213.
63. Amland, R.C.; Haley, J.M.; Lyons, J.J. A Multidisciplinary Sepsis Program Enabled by a Two-Stage Clinical Decision Support System: Factors That Influence Patient Outcomes. Am. J. Med. Qual. 2016, 31, 501–508.
64. Narayanan, N.; Gross, A.K.; Pintens, M.; Fee, C.; Macdougall, C. Effect of an Electronic Medical Record Alert for Severe Sepsis among ED Patients. Am. J. Emerg. Med. 2016, 34, 185–188.
65. Tafelski, S.; Nachtigall, I.; Deja, M.; Tamarkin, A.; Trefzer, T.; Halle, E.; Wernecke, K.D.; Spies, C. Computer-Assisted Decision Support for Changing Practice in Severe Sepsis and Septic Shock. J. Int. Med. Res. 2010, 38, 1605–1616.
66. Liu, V.X.; Fielding-Singh, V.; Greene, J.D.; Baker, J.M.; Iwashyna, T.J.; Bhattacharya, J.; Escobar, G.J. The Timing of Early Antibiotics and Hospital Mortality in Sepsis. Am. J. Respir. Crit. Care Med. 2017, 196, 856–863.
67. Islam, K.R.; Prithula, J.; Kumar, J.; Tan, T.L.; Reaz, M.B.I.; Sumon, M.S.I.; Chowdhury, M.E.H. Machine Learning-Based Early Prediction of Sepsis Using Electronic Health Records: A Systematic Review. J. Clin. Med. 2023, 12, 5658.
68. Liu, Z.; Shu, W.; Li, T.; Zhang, X.; Chong, W. Interpretable Machine Learning for Predicting Sepsis Risk in Emergency Triage Patients. Sci. Rep. 2025, 15, 887.
69. Rockenschaub, P.; Akay, E.M.; Carlisle, B.G.; Hilbert, A.; Wendland, J.; Meyer-Eschenbach, F.; Näher, A.F.; Frey, D.; Madai, V.I. External Validation of AI-Based Scoring Systems in the ICU: A Systematic Review and Meta-Analysis. BMC Med. Inform. Decis. Making 2025, 25, 5.
70. Bienefeld, N.; Boss, J.M.; Lüthy, R.; Brodbeck, D.; Azzati, J.; Blaser, M.; Willms, J.; Keller, E. Solving the Explainable AI Conundrum by Bridging Clinicians’ Needs and Developers’ Goals. npj Digit. Med. 2023, 6, 94.
71. Huang, X.; Marques-Silva, J. On the Failings of Shapley Values for Explainability. Int. J. Approx. Reason. 2024, 171, 109112.

Figure 1. Right-aligned prediction framework with sliding time windows and defined observation and prediction windows for sepsis modeling in this study. Adapted from Lauritsen S.M. et al., npj Digital Medicine. 2021;4:158. CC BY 4.0 [35].

Figure 2. Flowchart and data allocation into training, validation, and test datasets.

Figure 3. ROC curves for ML models (6 h sepsis prediction window). (A–C): PCI/CCI-focused approach; (D–F): universal sepsis prediction approach.

Figure 4. SHAP (SHapley Additive exPlanations) summary plots: impact of individual predictors on the model output.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
