Abstract
Background/Objectives: Acute respiratory distress syndrome (ARDS) is a critical complication in polytrauma patients, characterized by diffuse lung inflammation and bilateral pulmonary infiltrates, with mortality rates reaching 45% in intensive care units (ICUs). The heterogeneous nature of ARDS and its complex clinical presentation in severely injured patients pose substantial diagnostic challenges, so early prediction tools are needed to guide timely intervention. Machine learning (ML) algorithms have emerged as promising approaches for clinical decision support, outperforming traditional scoring systems in capturing complex patterns within high-dimensional medical data. To address gaps in early ARDS prediction for polytrauma populations, our study aimed to: (i) develop a balanced random forest (BRF) ML model for early ARDS prediction in critically ill polytrauma patients, (ii) identify the most predictive clinical features using ANOVA-based feature selection, and (iii) evaluate model performance using metrics that account for class imbalance. Methods: This retrospective cohort study analyzed 407 polytrauma patients admitted to the ICU of the Center of Traumatology and Major Burns of Ben Arous, Tunisia, between 2017 and 2021. We implemented an ML pipeline comprising Tomek Links undersampling, ANOVA F-test feature selection of the 10 most predictive variables, and SMOTE oversampling with a conservative sampling rate of 0.3. The BRF classifier was trained with class weighting and evaluated using stratified 5-fold cross-validation. Performance metrics included AUROC, PR-AUC, sensitivity, specificity, F1-score, and Matthews correlation coefficient (MCC). Results: Among 407 patients, 43 developed ARDS according to the Berlin definition, an incidence of 10.57%.
The BRF model achieved strong predictive performance, with an AUROC of 0.98, a sensitivity of 0.91, a specificity of 0.80, an F1-score of 0.84, and an MCC of 0.70. The precision–recall AUC reached 0.86, supporting robust performance despite class imbalance. Across stratified cross-validation folds, AUROC ranged from 0.93 to 0.99, indicating consistent model stability. The top 10 selected features were procalcitonin, PaO2 at ICU admission, 24-h pH, massive transfusion, total fluid resuscitation, pneumothorax, alveolar hemorrhage, pulmonary contusion, hemothorax, and flail chest. Conclusions: Our BRF model provides a robust, clinically applicable tool for early prediction of ARDS in polytrauma patients using readily available clinical parameters. The two-step resampling approach (Tomek Links undersampling followed by SMOTE oversampling), combined with ANOVA-based feature selection, addressed class imbalance while maintaining high predictive accuracy. These findings support integrating ML approaches into critical care decision-making to improve patient outcomes and resource allocation. External validation in diverse populations remains essential to confirm generalizability before clinical implementation.
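The abstract names two computational steps that are easy to make concrete: ANOVA F-test ranking of candidate features and the threshold-based metrics (sensitivity, specificity, F1-score, MCC). The following is a minimal plain-Python sketch of both under stated assumptions; it is not the authors' code. The function names (`anova_f_score`, `select_top_k`, `binary_metrics`) and all toy inputs are hypothetical. The F statistic follows the standard one-way ANOVA definition, which is the quantity scikit-learn's `f_classif` computes. In a full pipeline, feature selection and both resampling steps would normally be fit inside each training fold only, to avoid leaking information into the validation fold. imbalanced-learn's `Pipeline`, `TomekLinks`, `SMOTE(sampling_strategy=0.3)` and `BalancedRandomForestClassifier` are one way to do this, but the abstract does not state which library the study used.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def anova_f_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F statistic of a single feature across class labels.

    Assumes at least two classes and more samples than classes.
    """
    groups: Dict[int, List[float]] = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = 0.0  # spread of class means around the grand mean
    ss_within = 0.0   # spread of samples around their own class mean
    for members in groups.values():
        class_mean = sum(members) / len(members)
        ss_between += len(members) * (class_mean - grand_mean) ** 2
        ss_within += sum((v - class_mean) ** 2 for v in members)
    if ss_within == 0.0:
        # No within-class spread: F is unbounded if the means differ, else 0.
        return math.inf if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(rows: Sequence[Sequence[float]], labels: Sequence[int], k: int = 10) -> List[int]:
    """Return the column indices of the k features with the largest F scores."""
    n_features = len(rows[0])
    scores = [anova_f_score([row[j] for row in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-based metrics from a 2x2 confusion matrix (positive class = ARDS)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 = harmonic mean of precision and sensitivity = 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "mcc": mcc}


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
toy_rows = [[1, 1], [2, 2], [3, 3], [4, 1], [5, 2], [6, 3]]
toy_labels = [0, 0, 0, 1, 1, 1]
print(select_top_k(toy_rows, toy_labels, k=1))  # [0]

# Hypothetical confusion matrix for illustration only, not the study's results.
print(binary_metrics(tp=8, fp=10, tn=80, fn=2))

# Cross-check of the reported incidence: 43 ARDS cases among 407 patients.
print(round(100 * 43 / 407, 2))  # 10.57
```

The hypothetical confusion matrix shows why the abstract reports MCC and PR-AUC alongside AUROC. With few positives, a classifier can reach a sensitivity of 0.80 and a specificity near 0.89 while precision is only about 0.44, and F1 and MCC reflect that gap.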