Background: Pre-existing diabetes mellitus is prevalent among critically ill adults and can influence initial glycemic targets, therapeutic decisions, and early risk stratification in the intensive care unit (ICU). However, diabetes status may be distributed across heterogeneous electronic health record (EHR) sources and may be incomplete at the time of ICU admission, particularly for inter-facility transfers. Methods: Using the public WiDS Datathon 2021 tabular release derived from the Global Open-Source Severity of Illness Score (GOSSIS) initiative, we conducted a retrospective machine-learning benchmarking study for admission-time identification of documented diabetes status in ICU patients. Candidate predictors included demographics, admission characteristics, anthropometrics, day-1 physiologic and laboratory summaries, APACHE-related variables, comorbidity indicators, and site descriptors. We compared CatBoost, random forest, tuned XGBoost, tuned LightGBM, histogram-based gradient boosting, and a soft-voting ensemble combining XGBoost, LightGBM, and histogram-based gradient boosting. Because class imbalance was a central concern, the final workflow emphasized model-intrinsic class weighting and threshold-aware evaluation rather than synthetic oversampling. Results: In the primary leakage-mitigated random validation split, the voting ensemble achieved the highest overall balance, with AUROC 0.8539, precision 0.5671, recall 0.6690, and F1-score 0.6138. Tuned LightGBM was the most sensitivity-oriented individual model, achieving recall 0.7677 and AUROC 0.8537, although with lower precision and a less favorable Brier score. 
Ablation analyses clarified the source of this performance: removing leakage-prone and APACHE-related variables caused only modest decreases in discrimination, whereas the strict reduced model that also excluded glucose-like predictors produced a marked decline, with LightGBM AUROC falling to 0.7432 and the voting ensemble AUROC falling to 0.7448. These findings, together with SHAP analyses identifying day-1 glucose maximum, day-1 glucose minimum, BMI, age, hemoglobin, and related clinical variables as major contributors, indicate that glucose-related admission variables remained the dominant predictive signal. In grouped hospital validation, tuned LightGBM maintained recall of 0.7684 while AUROC decreased modestly to 0.8443, indicating preserved case detection under stricter site separation but reduced precision. Precision–recall analysis further showed that average precision decreased from 0.622 under random validation to 0.551 under grouped validation; at a high-sensitivity grouped-site operating point, a probability threshold of 0.4537 achieved recall of 0.8001 with precision of 0.4314. Calibration curves and Brier scores showed that predicted probabilities were imperfectly calibrated. Conclusions: Although the dominance of glucose-related predictors is clinically plausible for identifying documented diabetes status, early glycemic measurements in critically ill patients may also partly capture acute stress physiology, treatment-related effects, monitoring intensity, or other forms of acute dysglycemia rather than chronic diabetes status alone. Therefore, these findings support gradient-boosted and ensemble models as reproducible tools for ICU admission-time phenotyping of documented diabetes status, but the proposed system should be interpreted primarily as a screening-oriented phenotyping aid for chart review, cohort enrichment, or workflow support, not as a stand-alone diagnostic tool. 
Further external validation, recalibration, threshold selection matched to intended use, and clinical review are needed before deployment.
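The threshold-aware evaluation described above (choosing an operating point that reaches a target sensitivity and then reading off the resulting precision) can be illustrated with a short sketch. This is not the authors' code: the function name `threshold_for_recall` is hypothetical, the rule "highest threshold whose recall meets the target" is an assumption about how such an operating point might be chosen, and the labels and probabilities are toy values invented for illustration, not study data.

```python
from typing import Sequence, Tuple


def threshold_for_recall(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    target_recall: float,
) -> Tuple[float, float, float]:
    """Return (threshold, recall, precision) for the highest threshold whose
    recall is at least target_recall. A case is predicted positive when its
    probability is >= threshold. Hypothetical helper, for illustration only."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("y_true contains no positive cases")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")

    # Candidate thresholds are the distinct predicted probabilities,
    # scanned from highest to lowest so recall can only grow.
    # (Quadratic in the number of cases; acceptable for a small sketch.)
    for thr in sorted(set(y_prob), reverse=True):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= thr)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= thr)
        recall = tp / n_pos
        if recall >= target_recall:
            precision = tp / (tp + fp)
            return thr, recall, precision

    # The lowest threshold flags every case, so recall reaches 1.0 above.
    raise RuntimeError("no threshold reached the target recall")


# Toy example (invented values): 4 positives among 8 cases.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
thr, rec, prec = threshold_for_recall(toy_labels, toy_probs, target_recall=0.75)
print(f"threshold={thr}, recall={rec}, precision={prec}")
```

In practice such a threshold would presumably be selected on held-out data that matches the intended deployment setting (for example, the grouped-hospital folds), since a threshold tuned on one split may not reach the same recall on unseen sites.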