Article

Machine Learning-Based Prediction of Mortality in Geriatric Traumatic Brain Injury Patients

1 Department of Industrial and Systems Engineering, University of Southern California, 3715 McClintock Ave. GER 240, Los Angeles, CA 90087, USA
2 Colorectal Research Center, Iran University of Medical Sciences, Tehran Hemat Highway Next to Milad Tower, Tehran 14535, Iran
3 Department of Health Science, California State University, Long Beach (CSULB), 1250 Bellflower Blvd., Long Beach, CA 90840, USA
4 Department of Industrial and Manufacturing Engineering, California State Polytechnic University, Pomona, 3801 W Temple Ave., Pomona, CA 91768, USA
* Author to whom correspondence should be addressed.
BioMedInformatics 2026, 6(2), 17; https://doi.org/10.3390/biomedinformatics6020017
Submission received: 20 February 2026 / Revised: 17 March 2026 / Accepted: 24 March 2026 / Published: 30 March 2026

Abstract

Traumatic Brain Injury (TBI) is a major contributor to mortality among older adults, with geriatric patients facing disproportionately high risk due to age-related physiological vulnerability and comorbidities. Early and accurate prediction of mortality is essential for guiding clinical decision-making and optimizing ICU resource allocation. In this study, we utilized the MIMIC-III database and identified a final analytic cohort of 667 geriatric TBI patients, on which we developed a machine learning framework for 30-day mortality prediction. A rigorous preprocessing pipeline—including Random Forest-based imputation, feature engineering, and hybrid selection—was implemented to refine predictors from 69 to 9 clinically meaningful variables. CatBoost emerged as the top-performing model, achieving an AUROC of 0.867 (95% CI: 0.809–0.922), with a sensitivity of 0.752 and a specificity of 0.888 on the independent test set. SHAP analysis confirmed the importance of the GCS score, oxygen saturation, and prothrombin time as dominant predictors. These findings highlight the potential value of interpretable machine learning tools for early mortality risk stratification in elderly TBI patients and support further validation for future clinical use.

1. Introduction

Traumatic Brain Injury (TBI) arises from external mechanical forces—such as blunt trauma, acceleration–deceleration, or penetrating injuries—that disrupt neurological function [1,2,3]. Its pathophysiology involves primary structural damage and secondary cascades (e.g., neuroinflammation, axonal injury), driving heterogeneous clinical outcomes [3,4,5]. Globally, an estimated 69 million people sustain a TBI each year, making it a leading cause of disability and death [6,7,8]. The economic burden of TBI is substantial, with annual healthcare costs in the U.S. alone exceeding $76 billion, encompassing acute care, rehabilitation, and lost productivity [9,10].
Older adults are disproportionately vulnerable to TBI: falls account for over 50% of geriatric TBI cases and are driven by age-related risk factors such as polypharmacy, gait instability, and osteoporosis [11,12,13]. In the U.S., annual TBI-related healthcare costs for older adults exceed $10 billion, reflecting acute care, rehabilitation, and long-term disability [14,15]. In particular, older adults with TBI face disproportionately high mortality rates, with studies reporting a 2–3-fold increased risk of death compared to younger cohorts, driven by complications such as sepsis, respiratory failure, and neurodegenerative exacerbations [16]. Given the global aging population—projected to double by 2050—this mortality burden is poised to escalate, straining healthcare systems [17]. In light of this trend, accurate mortality prediction tools (e.g., machine learning-driven models or biomarker panels) are critical to identify high-risk patients, tailor interventions (e.g., early palliative care), and prioritize resource allocation. Concurrently, age-specific prevention (e.g., fall reduction programs) and rapid diagnostic protocols remain essential to mitigate TBI incidence and improve outcomes in this vulnerable population [18]. Compared with younger adults, geriatric TBI patients may exhibit clinically distinct characteristics, including lower physiological reserve, greater multimorbidity, frequent exposure to anticoagulant or antiplatelet therapy, and age-related structural brain changes. These differences can influence presentation, deterioration patterns, and mortality risk, highlighting the importance of age-specific prognostic assessment.
Fu et al. (2017) [19] analyzed trends in TBI hospitalizations and mortality among elderly adults (65+) in Canada from 2006 to 2011 using a population-based database. Advanced age, comorbidities, and injury severity were independent predictors of both falls and in-hospital mortality. The authors suggest that prevention efforts should focus on the “older old” (85+) and those with multiple comorbidities, and recommend that healthcare facilities be prepared to manage this growing, complex patient population.
Bobeff et al. (2019) [20] developed the Elderly Traumatic Brain Injury Score (eTBI Score) to predict 30-day mortality or vegetative state in geriatric TBI patients. The study analyzed data from 214 patients aged ≥ 65 years, focusing on demographics, medical history, and clinical factors. Key predictors identified through logistic regression included Glasgow Coma Scale (GCS) motor score (OR 0.17), comorbid cardiac, pulmonary, or renal dysfunction or malignancy (OR 2.86), platelet count ≤ 100 × 10⁹ cells/L (OR 13.60), and red blood cell distribution width ≥ 14.5% (OR 2.91). The eTBI Score provides a practical tool for clinical decision-making and risk stratification in elderly TBI patients, offering a reliable basis for treatment planning [20].
Huang et al. (2024) [21] assessed the utility of the Geriatric Trauma Outcome Score (GTOS) in predicting mortality in older adults with isolated moderate to severe TBI. The study included 5543 patients and found that higher GTOS was significantly associated with increased mortality, with the optimal cutoff value for mortality prediction identified as 121.5 (AUC = 0.813). Patients with GTOS ≥ 121.5 had higher odds of death (OR 2.64; 95% CI 1.93–3.61) and longer hospital stays. These findings suggest that GTOS is an effective tool for risk stratification in TBI patients, though further refinement is needed for broader clinical application [21].
While traditional models such as eTBI and GTOS have been commonly used to predict outcomes in geriatric TBI patients, they have certain limitations. These models typically focus on a narrow range of clinical factors and may not fully account for the complexities of older patients, such as comorbidities and frailty. Furthermore, they often struggle with incorporating high-dimensional data, such as laboratory results and imaging findings, which can reduce their accuracy in predicting mortality. Consequently, there is a critical need for a more comprehensive and adaptable predictive tool that can improve the accuracy and reliability of mortality predictions in elderly TBI patients, addressing the shortcomings of existing models.
In recent years, machine learning techniques have gained significant traction in predicting clinical outcomes, with the CatBoost algorithm emerging as a highly effective tool. CatBoost was included among the candidate models because its design may be advantageous for structured clinical data analysis: (1) it incorporates robust regularization mechanisms that help prevent overfitting, particularly in datasets with high-dimensional features, (2) it efficiently handles missing values, a common challenge in clinical data, and (3) it provides superior feature importance quantification, which enhances the interpretability of the model for clinical decision-making. These characteristics make CatBoost a promising approach for clinical outcome prediction [22]. For example, Li et al. (2023) [23] developed a CatBoost-based model to predict hospital mortality in ICU patients receiving mechanical ventilation (MV) using the MIMIC-III database. The model showed strong discriminative performance, achieving an AUROC of 0.862 (95% Confidence Interval (CI): 0.850–0.874) in internal validation, surpassing the best previously reported AUROC of 0.821. Additionally, the model demonstrated improved accuracy (0.789), F1-score (0.747), and calibration, outperforming other machine learning models, including XGBoost, Random Forest, and Support Vector Machine (SVM) [23]. Safaei et al. (2022) [24] developed an optimized CatBoost-based model, E-CatBoost, to predict ICU mortality status upon discharge using data from the first 24 h of admission. The model was trained and validated using the eICU-CRD v2.0 dataset, which includes over 200,000 ICU admissions. The E-CatBoost model achieved AUROC scores ranging from 0.86 to 0.92 across twelve disease groups, outperforming baseline models by 7 to 18 percent.
The study identified key features in mortality prediction, including age, heart rate, respiratory rate, blood urea nitrogen, and creatinine levels, providing valuable insights into critical patient conditions for early intervention [24]. These successful cases in developing machine learning-based models for clinical outcome prediction illustrate a feasible and promising future for this approach.
This study introduces several key innovations that enhance the predictive performance and clinical applicability of machine learning models for mortality prediction in geriatric TBI patients.
  • Hybrid Feature Selection Strategy: The study utilized a combination of Random Forest-based importance and Recursive Feature Elimination (RFE) to refine an initial pool of candidate variables, resulting in a set of clinically relevant features. This method effectively reduced dimensionality and ensured the retention of features most critical for mortality prediction.
  • Data Imputation and Preprocessing: To address common challenges in clinical datasets, such as missing data and heterogeneity in measurements, the study implemented Random Forest-based imputation. This approach preserved the interdependencies of variables, enhancing the robustness of the model.
  • Model Selection and Evaluation: A range of machine learning models, including CatBoost, LightGBM, and XGBoost, were evaluated. CatBoost demonstrated the best performance, achieving an AUROC of 0.867 (95% CI: 0.809–0.922), outperforming other models such as LightGBM and XGBoost, which had AUROCs of 0.852 and 0.855, respectively. This indicates that CatBoost achieved the highest point estimate for AUROC in this cohort, with a favorable balance between sensitivity and specificity.
  • Interpretability and Clinical Integration: To ensure clinical transparency and utility, SHAP (SHapley Additive exPlanations) analysis was applied to interpret the feature contributions. Key predictors such as GCS score, oxygen saturation, and prothrombin time were identified as major contributors to the model’s predictions, providing actionable insights for clinicians.

2. Methods

2.1. Data Source and Study Design

This study utilized the Medical Information Mart for Intensive Care III (MIMIC-III) database, a publicly available and extensively curated critical care dataset developed by the Laboratory for Computational Physiology at the Massachusetts Institute of Technology (MIT). MIMIC-III comprises de-identified health records of over 40,000 patients admitted to the Beth Israel Deaconess Medical Center (BIDMC) between 2001 and 2012. The database complies with the Health Insurance Portability and Accountability Act (HIPAA) through systematic de-identification procedures, including date shifting and removal of personally identifiable information.
MIMIC-III offers a wide range of structured and unstructured clinical data, including patient demographics, clinical notes, physiological time series, laboratory results, medication administration, and treatment information [25].
This study implemented a structured and clinically grounded machine learning pipeline to predict 30-day mortality in geriatric TBI patients. The workflow integrated rigorous data selection, preprocessing, feature engineering, feature selection, model development, and statistical validation, as illustrated in Algorithm 1.
Algorithm 1 Machine Learning Pipeline for 30-Day Mortality Prediction in Geriatric TBI Patients
Require: Final analytic cohort with extracted clinical features from the first 24 h of ICU admission
Ensure: Trained machine learning models with evaluated predictive performance
 1: Feature Construction
 2: for all numerical variables do
 3:     Aggregate minimum, maximum, and mean values over the first 24 h
 4: end for
 5: for all categorical variables do
 6:     Apply target encoding
 7: end for
 8: Missing Data Handling
 9: Impute missing values using Random Forest-based imputation
10: Feature Scaling
11: Normalize all continuous variables using z-score standardization
12: Feature Selection
13: Remove features with high missingness (>80%)
14: Rank features using Random Forest importance
15: Select a parsimonious subset of predictive features
16: Model Development
17: Perform stratified 70/30 train–test split
18: for all models ∈ {CatBoost, LightGBM, XGBoost, Logistic Regression, KNN, Gaussian Naïve Bayes, Neural Network} do
19:     Tune hyperparameters via grid search with 5-fold cross-validation
20:     Train model on training set
21:     Evaluate performance on test set
22: end for
23: Model Evaluation
24: Compute AUROC, accuracy, sensitivity, specificity, and F1-score
25: Estimate 95% confidence intervals for AUROC using bootstrap resampling
26: Plot calibration curves and compute Brier scores
27: Model Interpretation
28: Apply SHAP to assess global and local feature importance
29: Conduct ablation study by iteratively removing features and re-evaluating AUROC
Patients were identified from the MIMIC-III database using standardized ICD-9 head trauma codes, with exclusion criteria applied to remove individuals under 65 years of age, ICU stays under 24 h, and cases with missing vital signs. Patients with ICU stays shorter than 24 h were excluded to ensure a consistent observation window for extracting early physiologic and laboratory variables from the first 24 h of ICU admission. This criterion was intended to improve comparability of model inputs across patients and reduce instability arising from incomplete early ICU data. The final analytic cohort consisted of 667 patients.
Preprocessing steps included aggregation of numerical features across the first 24 h (min, max, mean), unit harmonization, and z-score standardization. Missing values were imputed using Random Forest-based methods to preserve feature interdependencies.
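The first-24 h aggregation step can be sketched with pandas; the long-format table layout and column names below are illustrative stand-ins, not the study's actual schema:

```python
import pandas as pd

# Hypothetical long-format chart events: one row per measurement.
events = pd.DataFrame({
    "icustay_id": [1, 1, 1, 2, 2],
    "variable":   ["heart_rate"] * 5,
    "value":      [88.0, 112.0, 96.0, 70.0, 74.0],
})

# Aggregate min / max / mean per stay and variable over the first 24 h
# (the time-window filter is omitted here for brevity).
agg = (events
       .groupby(["icustay_id", "variable"])["value"]
       .agg(["min", "max", "mean"])
       .unstack("variable"))

# Flatten the (statistic, variable) column MultiIndex into single names
# such as "heart_rate_min".
agg.columns = [f"{var}_{stat}" for stat, var in agg.columns]
```

Each ICU stay then contributes one row of aggregated features, matching the wide format expected by the downstream models.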
From an initial 69 clinically relevant features informed by prior studies, we removed variables with more than 80% missingness and applied a hybrid feature selection strategy using Random Forest importance (via Gini impurity). This process yielded 17 candidate features for subsequent refinement, spanning demographics, comorbidities, vital signs, and functional scores. These 17 features served as an intermediate candidate set, from which a final parsimonious subset was selected.
Seven machine learning models were trained and tuned using 5-fold cross-validation with stratified sampling. CatBoost was retained as the primary model for downstream interpretation because it achieved the highest point estimate on the internal evaluation metrics while remaining clinically interpretable. Hyperparameters were optimized to jointly maximize discrimination and calibration.
Statistical validation procedures were used to assess internal model robustness. Cohort comparability was confirmed using independent t-tests. Feature-level contributions were assessed through ablation studies, and interpretability was achieved via SHAP, revealing clinically plausible predictors such as GCS, heart rate, and oxygen saturation.
This comprehensive and reproducible framework supports the development of clinically interpretable models for ICU risk stratification in elderly TBI patients.

2.2. Patient Selection

Patients diagnosed with TBI were retrospectively identified from the publicly available MIMIC-III database [25] using International Classification of Diseases, Ninth Revision (ICD-9) codes indicative of head trauma. Specifically, admissions were screened using the ICD-9 code ranges 80000–80199, 80300–80499, and 85000–85419, resulting in an initial pool of 3025 unique ICU admissions. Structured Query Language (SQL) queries were executed in a PostgreSQL environment to support reproducibility and adherence to the predefined cohort selection logic. A stepwise summary of the cohort extraction workflow and ICD-9 case-definition ranges is provided in the Supplementary Materials.
To focus on a geriatric population, only patients aged 65 years or older at the time of ICU admission were included. Patients were excluded if they had an ICU length of stay shorter than 24 h, lacked admission Glasgow Coma Scale (GCS) records, or had missing key early physiological variables required for model development, including mean arterial pressure (MAP), heart rate, oxygen saturation (SpO2), and body temperature. These variables were treated as essential early predictors in the mortality modeling framework because they were used to characterize neurologic and physiologic status during the first 24 h of ICU admission.
For patients with multiple ICU admissions, only the first ICU admission was retained to avoid intra-patient correlation. We acknowledge that excluding patients with missing GCS or early physiological measurements may have introduced selection bias; however, these variables were considered core components of the early mortality prediction framework, and their absence would have limited comparability across patients during feature construction. After applying all inclusion and exclusion criteria, a total of 667 unique geriatric ICU patients with confirmed TBI were included in the final analytic cohort.
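Retaining only the earliest ICU admission per patient is a simple sort-and-deduplicate operation; the table below is a hypothetical stand-in for the MIMIC-III ICUSTAYS table:

```python
import pandas as pd

# Hypothetical ICU stay table: multiple stays per patient are possible.
icustays = pd.DataFrame({
    "subject_id": [10, 10, 11],
    "icustay_id": [101, 102, 201],
    "intime": pd.to_datetime(["2101-05-01", "2101-07-12", "2102-01-03"]),
})

# Keep only the earliest ICU admission per patient to avoid
# intra-patient correlation between repeated stays.
first_stays = (icustays
               .sort_values("intime")
               .drop_duplicates("subject_id", keep="first"))
```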
The patient extraction workflow is illustrated in Figure 1, and additional details regarding ICD-9 case identification and stepwise cohort construction are provided in the Supplementary Materials.

2.3. Data Preprocessing

A rigorous, multi-stage data preprocessing strategy was implemented to ensure data integrity, internal consistency, and analytical validity prior to model development. This step was crucial for addressing common challenges in clinical datasets, including missingness, heterogeneous measurement units, and feature scale imbalances—each of which can significantly compromise model accuracy, generalizability, and interpretability if left unaddressed [26,27].
To ensure data quality, we first excluded variables with more than 80% missingness, which resulted in the removal of features such as arterial lactate, AST/ALT levels, and non-invasive systolic blood pressure—variables that, although clinically relevant, lacked sufficient coverage for robust modeling. This threshold was chosen to balance information retention with data reliability, as variables with extremely high missingness contribute limited stable information and increase uncertainty during imputation. For the remaining variables, missing values were handled using a Random Forest-based imputation approach, selected because it can accommodate nonlinear relationships and interactions among variables. Alternative missingness thresholds and imputation specifications were not systematically compared in the present study, which we acknowledge as a methodological limitation. Unlike conventional univariate imputation techniques such as mean or median substitution, Random Forests model each variable with missing values as a function of all other available features, thereby preserving nonlinear inter-variable relationships and enhancing the clinical realism of the imputed dataset. This method is particularly advantageous for ICU data, where multivariate dependencies among vital signs and laboratory variables are critical for reliable risk modeling. Additionally, its compatibility with both categorical and continuous data types enabled comprehensive treatment of the dataset's heterogeneous feature space.
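The paper does not name a specific implementation of Random Forest-based imputation; one common realization (similar in spirit to missForest) is scikit-learn's IterativeImputer with a Random Forest estimator, sketched here on synthetic data:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] = 0.8 * X[:, 0] + 0.1 * rng.normal(size=200)  # correlated columns
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan  # ~10% missing at random

# Each variable with missing values is modeled as a function of the
# others using a Random Forest, preserving inter-variable structure.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=25, random_state=0),
    max_iter=5, random_state=0)
X_imputed = imputer.fit_transform(X_missing)
```

In a leakage-safe pipeline the imputer would be fit on the training split only and then applied to the test split.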
Post-imputation, unit consistency was enforced across the dataset. For example, temperature values recorded in both Fahrenheit and Celsius were converted to a common scale. Furthermore, variables such as patient age and ICU length of stay were calculated using precise timestamp differentials to ensure temporal accuracy and data reproducibility.
The nominal categorical features, including ethnicity and marital status, were processed using target encoding because they lacked intrinsic ordinal relationships but exhibited potential associations with the outcome variable (mortality). In this method, each category was replaced with the mean 30-day mortality rate observed among patients within that category in the training set.
$$\mathrm{ETHNICITY}_i = \frac{\sum_{j \in \mathrm{category}_i} \mathrm{MORTALITY}_j}{\left|\mathrm{category}_i\right|}$$
where category_i represents all patients sharing the same ethnicity value, and MORTALITY_j denotes the 30-day mortality outcome for patient j in the training set. Similar target encoding was applied to marital status, resulting in MARITAL_STATUS. To reduce the risk of data leakage, target encoding was derived exclusively from the training set and then applied to the test set using the training-derived mappings only. No outcome information from the test set was used during the encoding process. We note, however, that a cross-validated or smoothed target encoding strategy was not implemented in the present study, which we acknowledge as a methodological limitation.
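The train-only encoding described above can be sketched as follows; the fallback to the global training mortality rate for categories unseen in training is our assumption, not a detail stated in the paper:

```python
import pandas as pd

train = pd.DataFrame({"ethnicity": ["A", "A", "B", "B", "B"],
                      "mortality": [1, 0, 0, 0, 1]})
test = pd.DataFrame({"ethnicity": ["A", "B", "C"]})  # "C" unseen in training

# Mean 30-day mortality per category, computed on the training set only.
mapping = train.groupby("ethnicity")["mortality"].mean()
global_mean = train["mortality"].mean()

train["ethnicity_enc"] = train["ethnicity"].map(mapping)
# Assumed fallback: unseen categories get the global training rate.
test["ethnicity_enc"] = test["ethnicity"].map(mapping).fillna(global_mean)
```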
To mitigate the risk of scale-driven model bias, all continuous variables were normalized using z-score standardization:
$$z_i = \frac{x_i - \mu}{\sigma}$$
where μ and σ denote the mean and standard deviation of each variable, computed exclusively from the training dataset. This transformation ensured that all features contributed equitably to scale-sensitive learning algorithms such as K-Nearest Neighbors and neural networks; tree-based models are largely insensitive to variable magnitude but were trained on the standardized data for consistency.
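A minimal sketch of train-only standardization, using scikit-learn's StandardScaler as one possible implementation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[60., 100.], [80., 120.], [70., 110.]])
X_test = np.array([[75., 130.]])

# mu and sigma are estimated on the training set only and then
# reused unchanged to transform the held-out test set.
scaler = StandardScaler().fit(X_train)
X_train_z = scaler.transform(X_train)
X_test_z = scaler.transform(X_test)
```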
To address the significant class imbalance in the dataset, the Synthetic Minority Oversampling Technique (SMOTE) was applied exclusively to the training set following the stratified 70/30 split. SMOTE synthetically generates new samples for the minority class by interpolating between existing observations, thereby improving class representation without duplicating data. This technique mitigates bias toward the majority class, enhances the model's sensitivity to minority outcomes, and helps maintain the validity of performance metrics. SMOTE was applied uniformly across all baseline models to ensure a fair comparison under balanced training conditions. For CatBoost, class-weighted loss functions were additionally evaluated to confirm robustness.
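For illustration, a from-scratch sketch of SMOTE-style interpolation on synthetic data; the study would more plausibly have used a library implementation such as imbalanced-learn, and this simplified version assumes a binary 0/1 label with class 1 in the minority:

```python
import numpy as np

def smote_sketch(X, y, k=3, seed=0):
    """Minimal SMOTE-style oversampling: interpolate between a minority
    sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    minority = X[y == 1]
    n_needed = (y == 0).sum() - (y == 1).sum()
    synthetic = []
    for _ in range(n_needed):
        i = rng.integers(len(minority))
        # Distances from sample i to every minority sample (self included).
        d = np.linalg.norm(minority - minority[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # k nearest, excluding self
        j = rng.choice(neighbors)
        gap = rng.random()  # random point on the segment between i and j
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    X_bal = np.vstack([X, synthetic])
    y_bal = np.concatenate([y, np.ones(n_needed, dtype=int)])
    return X_bal, y_bal

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = np.array([0] * 40 + [1] * 10)  # 4:1 imbalance
X_bal, y_bal = smote_sketch(X, y)
```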
Collectively, these preprocessing procedures yielded a clean, harmonized, and temporally enriched dataset with preserved clinical structure and minimized noise. This high-quality analytical input formed the foundation for interpretable internal modeling of 30-day mortality among geriatric TBI patients.

2.4. Feature Selection

A supervised feature selection strategy was adopted to identify the most informative variables for predicting 30-day mortality in geriatric TBI patients. The full pipeline, including variable filtering, importance ranking, and final selection, is illustrated in Figure 2.
To establish a clinically interpretable and statistically robust input space for model development, we began by compiling a set of 69 candidate features. These variables were selected based on prior literature in TBI prognosis modeling and consultations with clinical experts specializing in neurocritical care [28,29]. The initial features were systematically categorized into three primary domains:
  • Demographics and Administrative Indicators, such as patient age, marital status, ethnicity, and type of insurance, were included to account for population-level heterogeneity and care access disparities.
  • Comorbidities, including cardiovascular disease, diabetes, chronic respiratory illness, and dementia, were incorporated to reflect pre-existing conditions known to influence short-term outcomes after TBI.
  • Clinical Measurements, which encompassed vital signs (heart rate, respiratory rate, temperature, oxygen saturation), neurological assessment (GCS score), and laboratory values (e.g., prothrombin time), captured the acute physiological status within the first 24 h of ICU admission.
Subsequently, we employed a model-driven feature selection strategy leveraging Random Forest-based importance scores. Random Forest was chosen over other ensemble classifiers (e.g., XGBoost, AdaBoost) due to its demonstrated stability across variable scales and its ability to handle mixed data types with minimal preprocessing. Feature importance was quantified using the Gini impurity reduction criterion aggregated across all trees in the forest, defined as:
$$I(x_i) = \sum_{t \in T} p(t) \cdot \Delta i(t) \cdot f(t)$$
where I(x_i) denotes the importance score of feature x_i, T is the set of all nodes in all trees of the random forest, p(t) is the proportion of samples reaching node t, Δi(t) is the decrease in Gini impurity at node t, and f(t) is the frequency with which feature x_i appears in tree splits.
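In scikit-learn, a closely related impurity-decrease ranking is exposed as feature_importances_ (mean Gini-impurity decrease per feature, normalized to sum to 1); a sketch on synthetic data standing in for the candidate pool:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the candidate feature pool: 10 features,
# 3 of which carry signal.
X, y = make_classification(n_samples=400, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank features by mean impurity decrease and keep the top k.
ranking = np.argsort(rf.feature_importances_)[::-1]
top_k = ranking[:5]
```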
This process enabled the identification of variables that consistently contributed to accurate classification while discarding those with marginal or redundant information. For instance, several comorbidities (e.g., metastatic cancer, paraplegia), administrative indicators (e.g., insurance type), and common chronic conditions (e.g., cardiovascular disease, diabetes) were removed due to low predictive importance. Additionally, redundant physiological indicators (e.g., both maximum and mean temperature) were excluded to reduce multicollinearity. This process yielded 17 candidate features with strong predictive relevance.
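The Introduction describes Recursive Feature Elimination as the second stage of the hybrid strategy; a hedged sketch of RFE reducing 17 synthetic candidates to 9 with a Random Forest estimator (the study's exact estimator settings are not specified):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# 17 synthetic candidates stand in for the intermediate feature set.
X, y = make_classification(n_samples=300, n_features=17, n_informative=5,
                           random_state=0)

# Recursively drop the least important feature until 9 remain,
# refitting the forest after each elimination.
selector = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
               n_features_to_select=9, step=1).fit(X, y)
selected = selector.support_  # boolean mask over the 17 candidates
```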
Feature selection was initially guided by expert clinical opinion to identify nine clinically relevant features for ICU mortality prediction. This expert-driven selection was subsequently evaluated through cross-validation analysis. Although both the 9-feature and 15-feature models achieved perfect training accuracy (AUC = 1.000), such near-perfect training performance suggests possible overfitting. Compared with the 15-feature model, the expert-selected 9-feature model showed a slightly smaller overfitting gap (0.204 vs. 0.207) and more stable cross-validation performance (standard deviation: 0.044 vs. 0.057). Given their similar cross-validation performance (CV-AUC: 0.796 vs. 0.793), the 9-feature model was retained because it offered a more parsimonious and clinically interpretable feature set, and the cross-validation results likely provide a more realistic estimate of internal model performance than the perfect training results alone.
The selection process also incorporated input from clinical experts, who reviewed the retained features to ensure clinical plausibility and relevance. The final set of features was determined by jointly considering model-driven importance scores, domain knowledge, and the practical need to maintain a concise and interpretable feature set. In particular, we aimed to minimize the number of predictors to reduce model complexity and improve internal robustness. A parsimonious model may be less prone to overfitting, easier to implement within hospital systems, and better aligned with the operational constraints of intensive care units where time and data availability are often limited.
In total, 9 features were selected for model training. These final predictors strike a balance between parsimony and clinical representativeness, covering essential dimensions such as neurological status (e.g., GCS_SCORE), physiological stability (e.g., PT, Oxygen Saturation), and baseline demographics (e.g., AGE, ETHNICITY). In MIMIC-III, the dates of birth of patients older than 89 years are shifted for de-identification in compliance with HIPAA regulations, resulting in recorded ages greater than 300 years. To address this database-specific limitation and avoid distortion from artificially shifted age values, we recoded all ages ≥ 300 years as 90. However, we acknowledge that this approach reduces age granularity among the oldest patients and may limit the ability to distinguish risk differences within the very elderly subgroup. This feature set served as the foundation for all downstream modeling and interpretability analyses. The final nine selected features are summarized in Table 1, and their detailed definitions and encoding methods are provided in Table 2.

2.5. Model Development and Evaluation

To predict 30-day mortality among elderly ICU patients with TBI, we implemented a supervised learning framework using the 9 selected features. The dataset was randomly partitioned into a training set (70%) and a holdout test set (30%) using stratified sampling to preserve outcome distributions.
We developed and compared seven machine learning classifiers with complementary algorithmic principles and empirical advantages: CatBoost, LightGBM, XGBoost, Logistic Regression (LR), K-Nearest Neighbors (KNN), Gaussian Naïve Bayes (NB), and a fully connected Neural Network (NeuralNet). Each model was trained using grid search with stratified 5-fold cross-validation to ensure robust hyperparameter selection. Final evaluations were conducted exclusively on the unseen test set to assess generalization performance. Figure 3 provides a schematic overview of the modeling pipeline.
CatBoost was chosen for its ability to effectively handle categorical features and small datasets through ordered boosting and symmetric tree structures. Its unique approach to encoding and its inherent resistance to overfitting made it particularly well-suited for clinical scenarios where variable interdependence and missingness are common. Similarly, LightGBM offered a highly efficient gradient boosting solution, employing histogram-based binning and leaf-wise tree growth to accelerate training without sacrificing performance. This made it advantageous for scenarios involving high-dimensional or partially sparse input vectors. XGBoost, with its regularized learning objective and second-order optimization, served as a robust and well-established benchmark, particularly adept at capturing subtle nonlinearities and interactions across features. For all boosting models, extensive grid search was conducted to optimize key hyperparameters, including learning rate, tree depth, regularization penalties, and feature sampling ratios, thereby ensuring stable convergence and generalizability.
To provide an interpretable linear baseline, we incorporated Logistic Regression, which retains clinical utility due to its direct interpretability and well-understood statistical foundations. In this model, both L1 (Lasso) and L2 (Ridge) penalties were evaluated to manage multicollinearity and enhance model sparsity, with the inverse regularization strength (C) tuned to balance complexity and fit. K-Nearest Neighbors was included to assess the impact of local neighborhood structures on classification accuracy. As a non-parametric and instance-based learner, KNN is particularly sensitive to the spatial distribution of samples and thus provides insight into the geometric separability of the feature space. Hyperparameters such as the number of neighbors, distance metric, and weighting schemes were systematically tuned.
Gaussian Naïve Bayes, despite its strong independence assumptions, was incorporated as a lightweight probabilistic benchmark due to its resilience in high-dimensional settings and computational efficiency. Its generative nature and closed-form estimators allowed for rapid evaluation, offering a contrast to more complex discriminative models. Finally, a shallow feedforward Neural Network was employed to evaluate the expressive capacity of flexible, nonlinear function approximators. The network, consisting of a single hidden layer with ReLU activation and a sigmoid output, was trained using the Adam optimizer. Hyperparameters including neuron count, learning rate, batch size, and dropout rate were optimized to mitigate overfitting while enabling the model to capture latent interactions.
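A minimal sketch of such a shallow feedforward network, using scikit-learn's `MLPClassifier`: the layer size and data are illustrative assumptions, and since `MLPClassifier` has no dropout layer, L2 regularization via `alpha` stands in for the dropout regularization described above.

```python
# Sketch of a single-hidden-layer network with ReLU activation and a
# logistic (sigmoid) output, trained with Adam (illustrative settings).
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=9, random_state=0)

net = make_pipeline(
    StandardScaler(),                        # scale inputs before the net
    MLPClassifier(hidden_layer_sizes=(16,),  # single hidden layer
                  activation="relu",
                  solver="adam",
                  alpha=1e-3,                # L2 penalty in lieu of dropout
                  max_iter=2000,
                  random_state=0),
)
net.fit(X, y)
proba = net.predict_proba(X)[:, 1]           # sigmoid-like probabilities
print(round(float(proba.mean()), 3))
```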
All models were trained using stratified 5-fold cross-validation with grid search optimization on the training set. The independent test set was withheld from all training and validation procedures and used exclusively for final evaluation, providing an internal estimate of each model’s generalization performance.
Model performance was evaluated using the AUROC on the test set. Robustness was assessed by computing 95% confidence intervals via 2000 bootstrap replicates. Letting $\hat{y}_i$ denote predicted probabilities and $y_i$ the ground-truth labels, models were trained to minimize a regularized loss function:
$$L(\theta) = \sum_{i=1}^{n} \ell(y_i, \hat{y}_i) + \Omega(f),$$
where $\ell$ is the binary cross-entropy loss and $\Omega(f)$ penalizes model complexity.
This structured modeling pipeline was designed to support a fair comparison of diverse classifiers, reduce overfitting risk, and produce an interpretable predictive model for early mortality risk stratification.
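The bootstrap confidence-interval procedure described above (2000 resamples of the test set, percentile bounds) can be sketched as follows; the labels and predicted probabilities here are synthetic placeholders, not the study's data.

```python
# Percentile-bootstrap 95% CI for AUROC over 2000 resamples
# (synthetic labels/probabilities for illustration).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                     # hypothetical labels
y_prob = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 200), 0, 1)

boot_aucs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))  # resample w/ replacement
    if len(np.unique(y_true[idx])) < 2:                   # AUROC needs both classes
        continue
    boot_aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))

lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
print(f"AUROC 95% CI: ({lo:.3f}, {hi:.3f})")
```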

2.6. Statistical Evaluation and Interpretability Framework

To evaluate discrimination, robustness, and potential clinical relevance of the proposed machine learning model for 30-day mortality risk prediction in patients with TBI, we implemented a structured three-part statistical evaluation strategy: (1) assessing the comparability of training and test cohorts via t-test, (2) quantifying feature contribution through ablation analysis, and (3) enhancing transparency via SHAP.
First, we evaluated cohort equivalence to confirm that the training and test cohorts were statistically comparable. Two-sided Student’s t-tests were conducted on continuous clinical variables such as age, vital signs, and laboratory results:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where $\bar{X}_1$ and $\bar{X}_2$ represent the group means and $s_p^2$ is the pooled variance. Welch's correction was applied when variance homogeneity was not met. Establishing baseline comparability in this way improves the interpretability of the subsequent performance comparison between the training and test cohorts.
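A sketch of this comparability test, with Levene's test deciding between the pooled-variance and Welch variants; the values below are synthetic stand-ins, not the study's cohort data.

```python
# Two-sided t-test for cohort comparability, applying Welch's
# correction when variance homogeneity fails (synthetic example).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
age_train = rng.normal(78, 8, size=466)   # hypothetical training-cohort ages
age_test = rng.normal(78, 8, size=200)    # hypothetical test-cohort ages

# Levene's test checks variance homogeneity; if rejected, use Welch.
_, p_var = stats.levene(age_train, age_test)
t_stat, p_val = stats.ttest_ind(age_train, age_test,
                                equal_var=(p_var >= 0.05))
print(round(float(t_stat), 3), round(float(p_val), 3))
```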
Second, we performed an ablation analysis to assess the relative importance of individual features to the model’s predictive power. This involved iteratively removing each feature from the input set and measuring the change in model performance:
$$\Delta_i = \mathrm{AUROC}(f_{\mathrm{full}}) - \mathrm{AUROC}(f_{-i})$$
where $f_{\mathrm{full}}$ and $f_{-i}$ represent the models with and without feature $x_i$, respectively. This approach provides an intuitive and quantifiable way to identify which features are most influential to prediction outcomes. Such insights support feature prioritization for clinicians and inform the design of simplified risk scores or targeted monitoring protocols.
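The ablation loop can be sketched as follows; the model, data, and feature count are illustrative stand-ins for the study's CatBoost pipeline, not its actual configuration.

```python
# Leave-one-feature-out ablation: retrain with each feature removed
# and record the drop in test AUROC (illustrative model and data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=9, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def auroc(cols):
    """Fit on the given feature columns and score AUROC on the test set."""
    m = GradientBoostingClassifier(random_state=0).fit(X_tr[:, cols], y_tr)
    return roc_auc_score(y_te, m.predict_proba(X_te[:, cols])[:, 1])

full = auroc(list(range(9)))
# Delta_i = AUROC(full model) - AUROC(model without feature i)
deltas = {i: full - auroc([j for j in range(9) if j != i]) for i in range(9)}
print({i: round(d, 3) for i, d in sorted(deltas.items())})
```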
Third, we employed SHAP to improve interpretability at both global and individual levels. SHAP computes additive contributions of each feature to the final model output for a given patient:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left[ f(S \cup \{i\}) - f(S) \right]$$
where N is the set of all input features. Unlike traditional importance rankings, SHAP allows real-time interpretation of model predictions for individual patients, offering clinicians actionable insight into which clinical variables contributed most to the estimated risk. This transparency may improve trust in future clinical evaluation of the model. Together, these validation strategies and interpretability analyses supported the interpretation of the model outputs in a clinically relevant context.
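The Shapley formula above can be illustrated by brute-force enumeration over a toy four-feature linear model, with features outside the coalition fixed at their mean, which is one common SHAP-style baseline. Everything here is synthetic and for illustration only; SHAP libraries compute these values far more efficiently for tree models.

```python
# Brute-force Shapley values for one "patient" under a toy linear model,
# with value function v(S) = model output when features outside S are
# fixed at their mean (a common SHAP baseline; all values synthetic).
import math
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
w = np.array([2.0, -1.0, 0.5, 0.0])   # toy model: f(x) = w @ x
baseline = X.mean(axis=0)
x = X[0]                              # the patient being explained

def v(S):
    masked = baseline.copy()
    masked[list(S)] = x[list(S)]      # features in S take the patient's values
    return float(w @ masked)

N = range(4)
phi = np.zeros(4)
for i in N:
    others = [j for j in N if j != i]
    for r in range(len(others) + 1):
        for S in combinations(others, r):
            weight = (math.factorial(len(S)) * math.factorial(4 - len(S) - 1)
                      / math.factorial(4))
            phi[i] += weight * (v(S + (i,)) - v(S))

# For a linear model, phi_i = w_i * (x_i - baseline_i), and by the
# additivity property the phis sum to f(x) - f(baseline):
print(np.round(phi, 3), round(phi.sum() - (v(tuple(N)) - v(())), 6))
```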

3. Results

3.1. Cohort Characteristics and Statistical Comparison

The dataset analyzed in this study comprises hospitalized patients diagnosed with TBI. Patients were randomly assigned to a training cohort (70%, n = 466) and a test cohort (30%, n = 200) to develop and evaluate machine learning models for predicting 30-day mortality. Separately, patients were grouped by survival outcome: 506 patients survived (Group 1) and 160 did not survive (Group 0) within 30 days.
The cohort split was designed to ensure that the machine learning models could generalize effectively to unseen patients. To validate this, it is critical that the training and test sets maintain similar clinical profiles. Table 3 summarizes statistical comparisons in eight key clinical characteristics, presenting means, standard deviations, and associated p-values. A significance threshold of 0.05 was adopted. Most baseline variables did not differ significantly between the training and test sets, suggesting that the random split generally preserved the overall data structure. However, a modest but statistically significant difference in ethnicity was observed between the two groups (p = 0.048), which should be acknowledged when interpreting model generalizability.
Furthermore, Table 4 presents comparisons between patients who survived versus those who did not. Several features exhibited statistically significant differences. In particular, patients who survived had significantly higher GCS scores (11.94 vs. 7.20, p < 0.001). Oxygen saturation also differed modestly between the groups (97.68% in survivors vs. 98.37% in non-survivors, p = 0.0448), although the magnitude of this difference was small and should be interpreted cautiously. In contrast, patients who did not survive tended to have a prolonged prothrombin time (14.51 vs. 13.83, p = 0.0176), which may indicate underlying coagulopathy or systemic injury. Other variables, including heart rate and temperature, showed marginal differences between the groups. Interestingly, emergency department length of stay was significantly longer in the survival group than in the non-survival group. This finding may reflect differences in early clinical workflow rather than a direct protective effect, as more critically ill patients may have been transferred to intensive care more rapidly, whereas survivors may have undergone longer evaluation and stabilization in the emergency department.
These findings have two implications. First, the overall similarity between the training and test cohorts provides some support for internal consistency of the evaluation, although it does not eliminate the possibility of overfitting or guarantee broader generalizability. Second, identifying specific characteristics such as the GCS score, oxygen saturation, and prothrombin time that differ significantly between survival groups improves the clinical interpretability of the model and aligns with the established medical understanding of the severity and prognosis of TBI.
By combining robust statistical validation with meaningful clinical interpretation, these findings suggest that the selected predictors capture clinically relevant patterns in mortality risk within this internal cohort and support further evaluation of the model’s potential clinical utility. Structured comparison of feature distributions provides transparency and increases confidence among clinicians and researchers, particularly those less familiar with machine learning methodologies, that the models are grounded in real-world patient characteristics and clinical reasoning.

3.2. Ablation Study and Feature Contribution Analysis

To further assess the robustness and clinical relevance of our predictive model for 30-day mortality in TBI patients, we performed an ablation analysis, as illustrated in Figure 4. In this experiment, each feature was systematically removed from the input set, and the model was retrained on the modified dataset. Performance was evaluated over ten bootstrapped iterations to ensure statistical stability. The baseline model was a fully trained CatBoost classifier, achieving an AUROC of 0.8669 on the test set when using all available features.
Figure 4 presents the distribution of the AUROC scores after the removal of each clinical characteristic. Each boxplot represents the AUROC variability across the ten resampling rounds, with the median indicated by the central line and the interquartile range captured by the bounds of the box. The whiskers extend 1.5 times the interquartile range, and a red dashed line marks the baseline AUROC achieved with all included features, serving as a comparative reference.
The exclusion of key features, particularly the GCS score, resulted in the most substantial decline in AUROC, underscoring its critical role in model performance. Other variables such as ED length of stay, prothrombin time, and oxygen saturation also demonstrated noticeable impacts when removed, suggesting their significant contribution to the prediction of mortality in patients with TBI. Even features like marital status and ethnicity, which might appear less significant in univariate analyses, influenced performance in the full multivariate context, emphasizing the importance of complex feature interactions.
Notably, the general trend that removing any single feature led to a decrease in AUROC suggests that the model does not depend excessively on one dominant predictor; rather, it integrates a broad range of physiological and demographic inputs within this internal cohort.
From a clinical perspective, the importance of characteristics such as the GCS score, oxygen saturation, and prothrombin time is consistent with established understandings of neurological function, respiratory status, and coagulopathy in the prognosis of TBI. The model’s reliance on these clinically intuitive variables supports its interpretability and potential relevance for future clinical risk stratification.
In summary, the ablation study highlights the complementary contributions of multiple clinical factors. Rather than relying on isolated variables, model performance is derived from the collective synthesis of diverse clinical signals, improving interpretability of the model within this internal cohort.

3.3. Model Performance Evaluation and Comparative Analysis

To comprehensively assess the performance and generalizability of various machine learning models in predicting 30-day mortality among patients with TBI, we systematically compared seven classifiers across both the training and test sets. Figure 5 and Figure 6 display the ROC curves for the training and test cohorts, respectively, while Table 5 and Table 6 present detailed performance metrics, including AUROC, accuracy, sensitivity, specificity, PPV, NPV, and F1 score with corresponding 95% confidence intervals.
On the training set, ensemble models such as CatBoost, XGBoost, and LightGBM exhibited superior learning ability. XGBoost achieved an AUROC of 0.9991, followed closely by CatBoost at 0.9967 and LightGBM at 0.9781. These models demonstrated excellent sensitivity and specificity, suggesting a strong capacity to model complex, nonlinear interactions in high-dimensional clinical datasets. In contrast, Logistic Regression, Naïve Bayes, and the Neural Network showed relatively lower training performance, underscoring potential limitations in capturing the intricate physiological patterns seen in TBI patients. Notably, the tree-based algorithms, particularly CatBoost and XGBoost, demonstrated near-perfect discrimination on the training set, with AUROC values approaching 1.0. This marked gap between training and test performance suggests possible overfitting and indicates that these models may have captured patterns specific to the training data that did not fully generalize to the independent test set.
However, as performance on the training set alone can be misleading due to potential overfitting, evaluation on the independent test set was critical. CatBoost achieved the highest test AUROC at 0.8669 (95% CI: 0.8088–0.9222), along with a sensitivity of 0.7517 and a specificity of 0.8879. The relatively wide confidence interval also suggests some uncertainty in the true level of performance, likely reflecting the modest size of the independent test set. LightGBM and XGBoost also performed competitively, achieving AUROCs of 0.8519 and 0.8548, respectively. Notably, Logistic Regression also performed strongly, with a test AUROC of 0.8635, which was very close to that of CatBoost. This finding suggests that a simpler and more interpretable linear model remained highly competitive in this cohort, and that the incremental discriminative advantage of CatBoost should be interpreted cautiously.
The comparatively favorable internal test performance of CatBoost, LightGBM, and XGBoost may be related to their tree-based boosting architectures, which can flexibly model nonlinearities and feature interactions without relying on strict parametric assumptions. These methods inherently accommodate the heterogeneity and missingness often encountered in ICU datasets, making them well-suited for structured ICU data in this study setting.
Among the models evaluated, CatBoost showed several practical advantages. Its ordered boosting mechanism and handling of categorical variables may have contributed to stable learning under class imbalance. In addition to achieving the highest point estimate for AUROC, CatBoost also showed slightly higher NPV (0.9182) and PPV (0.6793) than the other tree-based models, which may be relevant in settings where minimizing false negatives and improving risk stratification are important. However, because Logistic Regression achieved a very similar test AUROC, the advantage of CatBoost in this study should be interpreted as modest rather than absolute. Taken together, these findings suggest that while CatBoost may offer incremental benefits in modeling nonlinear interactions and maintaining balanced predictive performance, simpler models such as Logistic Regression remain strong and clinically interpretable alternatives.
From a clinical standpoint, CatBoost’s ability to maintain high sensitivity and NPV is especially valuable in mortality prediction, where the cost of missing a high-risk patient can be substantial. The balanced performance across sensitivity and specificity, combined with clinically interpretable feature contributions demonstrated in prior analyses, suggests potential relevance for future clinical risk stratification pending further validation.
In summary, CatBoost achieved the highest point estimate for AUROC among the evaluated models and remained clinically interpretable for 30-day mortality prediction in TBI patients. Its balanced performance, handling of minority-class outcomes, and compatibility with the clinical data structure support its potential for future evaluation in clinical decision-support settings.

3.4. SHAP Analysis and Clinical Interpretability

To enhance the interpretability of the CatBoost model and ensure its alignment with clinical reasoning, SHAP values were employed to quantify the marginal contribution of each predictor to the model’s output. The SHAP summary plot in Figure 7 displays the distribution of SHAP values for each variable in the prediction of 30-day mortality among patients with TBI.
Each row in the plot represents a feature, and individual points indicate SHAP values for each patient. The horizontal axis denotes the SHAP value—how much a feature shifts the prediction toward mortality or survival—while the color reflects the original feature value, ranging from low (blue) to high (red). Features positioned further to the right with high SHAP values exert a stronger influence in increasing the predicted mortality risk.
The GCS score emerged as the most influential predictor in the model. Low GCS values (blue) strongly shifted the prediction toward increased mortality, in line with the clinical understanding that impaired neurological status is a hallmark of poor prognosis in TBI. ETHNICITY and PT (Prothrombin Time) followed, suggesting that coagulopathy and demographic-related disparities may significantly affect outcomes.
Temperature and age also played an important role. Lower temperatures and older age were associated with higher predicted mortality, which could reflect impaired thermoregulation and physiological reserve in critically ill patients. MARITAL_STATUS demonstrated modest but consistent effects, potentially capturing psychosocial factors that influence recovery and survival trajectories.
Additional contributors such as heart rate, oxygen saturation, and ED length of stay exhibited secondary influences. In particular, decreased oxygen saturation (blue points on the right) was associated with elevated mortality predictions, highlighting the importance of maintaining adequate oxygen delivery in the management of TBI.
Compared with earlier iterations of the feature set, ICU length of stay was excluded, yet the model retained its predictive strength, suggesting that the core physiological and demographic variables adequately capture mortality risk. These findings are also consistent with results from the ablation study, where key predictors such as GCS score and PT yielded marked performance changes upon exclusion, validating their central role.
From a clinical perspective, SHAP values enable patient-specific transparency, allowing clinicians to visualize and interpret the most impactful drivers of model predictions. The demographic variables ETHNICITY and MARITAL_STATUS require careful interpretation because they likely act as proxies rather than direct causal factors. ETHNICITY may capture multiple dimensions of inequity, including unequal access to specialized neurocritical care, pre-injury health status differences driven by social determinants of health, variation in family advocacy and care engagement, and potential biases in clinical decision-making. MARITAL_STATUS may reflect social support networks, healthcare decision-making patterns, access to post-acute care, and stress-buffering effects during recovery from critical illness. These findings underscore how social determinants interact with clinical outcomes in TBI care and suggest that predictive models should account not only for physiological factors but also for the structural and social elements that shape care delivery and patient outcomes.
This supports precision medicine by informing customized interventions, such as increased neurologic monitoring for patients with low GCS or early hemostatic support in those with coagulopathy.
Overall, the SHAP analysis indicates that the CatBoost model integrates biologically and clinically plausible patterns rather than functioning as a black box. This interpretability supports its potential applicability for decision support in the management of traumatic brain injury, pending further validation.

4. Discussion

4.1. Summary of Existing Model Compilation

This study developed a robust and interpretable machine learning model for predicting 30-day mortality in geriatric patients with TBI, leveraging data from the MIMIC-III critical care database. Utilizing a structured pipeline involving data imputation, hybrid feature selection, and multi-model comparison, the CatBoost algorithm achieved the highest performance, with an AUROC of 0.867 (95% CI: 0.809–0.922), outperforming XGBoost and LightGBM. Nine features were ultimately selected, encompassing GCS score, age, oxygen saturation, prothrombin time, and vital signs—all clinically plausible predictors of short-term mortality. Notably, SHAP analysis confirmed the central role of GCS and oxygen saturation in driving predictions, reinforcing their critical importance in early TBI risk stratification. The high performance across multiple evaluation metrics and stratified cross-validation demonstrates the model’s discriminative power and potential generalizability. From a clinical perspective, the model is not intended to replace physician judgment, but rather to function as an early risk-stratification tool for elderly TBI patients admitted to the ICU. For example, patients identified as high risk by the model may warrant closer neurologic monitoring, earlier multidisciplinary escalation, more cautious hemodynamic and respiratory management, and more timely communication regarding prognosis. In this context, model performance should not be interpreted solely through AUROC, but also through its potential utility in prioritizing attention toward patients at increased risk of short-term mortality. At the same time, any such application would require prospective validation and careful assessment of how the model performs within real clinical workflows. We also acknowledge that, in many clinical settings, simpler and more interpretable models are often preferred. 
In the present study, Logistic Regression achieved a test AUROC very close to that of CatBoost, indicating that a linear model remained highly competitive in this cohort. For this reason, the advantage of CatBoost should not be interpreted as absolute. Rather, its justification lies in its ability to flexibly capture nonlinear relationships and interactions among structured clinical variables while maintaining balanced performance across multiple evaluation metrics. At the same time, our findings support the view that simpler models such as Logistic Regression may remain reasonable alternatives when ease of implementation and transparency are prioritized. Before any clinical implementation, several additional steps would be necessary. First, the model should be externally validated in independent datasets and healthcare settings to assess transportability beyond the current single-database retrospective cohort. Second, prospective evaluation would be needed to determine how model predictions perform in real-time clinical workflows and whether they improve decision-making without introducing unintended consequences. Third, calibration, threshold selection, and integration with existing ICU documentation or decision-support systems would need to be carefully assessed. Finally, any clinical deployment should occur with appropriate physician oversight, as the model is intended to support rather than replace clinical judgment.
Although discrimination metrics such as AUROC are useful for comparing model performance, their clinical significance depends on how such predictions can inform real-world care decisions. The findings of this study hold significant clinical and practical value. First, the use of data within the first 24 h of ICU admission enables early identification of patients at elevated mortality risk, providing a critical window for initiating targeted interventions, such as intensified monitoring, early palliative care discussions, or transfer to specialized neurocritical units. Second, the model’s interpretability via SHAP enhances clinician trust and supports its integration into bedside decision-making processes. For instance, when a patient presents with a low GCS score and prolonged prothrombin time, the model flags elevated mortality risk with transparent justification, potentially triggering timely resuscitative measures or anticoagulant review. Third, from a systems-level perspective, the implementation of this model can assist in triaging ICU resources more efficiently—an essential consideration given the projected rise in elderly TBI cases and the increasing pressure on critical care services. Furthermore, because the model operates on routinely collected clinical variables, it can be readily implemented across different healthcare settings without requiring additional testing or infrastructure investment. It is important to note that our mortality predictions do not distinguish between deaths from medical complications and those following comfort care transitions, which represent different clinical pathways requiring distinct decision-support approaches.
The study has several notable strengths. It leverages a large, publicly available critical care database with granular, time-stamped data, ensuring reproducibility and wide applicability. The hybrid feature selection strategy, combining Random Forest-based importance with clinical validation, enhances both model accuracy and interpretability. Additionally, the model’s robustness was affirmed through stratified 5-fold cross-validation and statistical validation, including SHAP-based explanation and ablation studies.

4.2. Comparison with Prior Studies

Several prior studies have explored prognostic modeling in geriatric TBI populations, but most rely on traditional statistical frameworks with limited predictive capacity or clinical granularity. Our findings extend and improve upon this prior work in both methodological design and outcome performance.
Fu et al. (2017) [19] conducted a population-based analysis of TBI hospitalizations in older adults using administrative data from Canada. While their study identified advanced age, comorbidities, and injury severity as independent predictors of mortality, it did not develop or validate a formal risk prediction model. Furthermore, the lack of ICU-specific physiological data limited its applicability to acute care triage or early mortality prediction. In contrast, our model leveraged granular ICU data from MIMIC-III, including vital signs, laboratory values, and early GCS scores, all within the first 24 h of admission. This allows for more immediate and actionable risk estimation.
Bobeff et al. (2019) [20] developed the eTBI Score to predict 30-day mortality or vegetative state in patients aged ≥ 65. Using logistic regression, their model identified key predictors such as GCS motor score (OR 0.17), comorbid organ dysfunction or malignancy (OR 2.86), platelet count < 100 × 10⁹/L (OR 13.60), and RDW ≥ 14.5%. While the eTBI Score offered a parsimonious and clinically interpretable tool, it was developed from a relatively small sample (n = 214), with limited validation and no external test cohort. Our study, in contrast, included 667 patients and utilized stratified training and testing with 5-fold cross-validation to ensure robust model generalization.
Huang et al. (2024) [21] evaluated the GTOS in a multicenter cohort of 5543 older adults with moderate to severe isolated TBI. GTOS demonstrated an AUC of 0.813, with a cutoff score of 121.5 identifying patients at elevated mortality risk (OR 2.64; 95% CI 1.93–3.61). While their model benefited from a large sample size, it did not incorporate ICU-specific physiologic features (e.g., oxygen saturation, prothrombin time), and was based on linear scoring systems. In comparison, our machine learning approach (CatBoost) captured non-linear interactions among clinical features, resulting in superior predictive accuracy and better calibration. Importantly, our inclusion of SHAP analysis provided an interpretable framework for identifying the most influential predictors—namely GCS, oxygen saturation, and prothrombin time—which aligns with known TBI pathophysiology but enhances bedside decision-making by quantifying their individual impact.
In summary, while prior TBI-focused models such as eTBI and GTOS have provided valuable foundations for risk stratification, our study advances the field by integrating a high-resolution ICU dataset, robust machine learning methodology, and interpretability tools into a unified framework that offers improved accuracy and practical clinical utility in geriatric TBI care.

4.3. Clinical Integration and Operationalization

To transition from high model performance to bedside utility, practical deployment pathways must be clearly articulated. In a real-world ICU setting, our model could be integrated into the hospital’s electronic health record (EHR) system through an automated risk dashboard. Predictions would be generated every 6–12 h using the most recent available vital signs and lab values, especially within the first 24–48 h. For patients flagged as high risk, the dashboard could trigger automatic alerts to attending physicians and case managers, prompting early multidisciplinary huddles, advanced care planning discussions, or transfer to specialized neurocritical units. Intermediate-risk patients could be monitored more closely for physiological deterioration and flagged for additional diagnostic workup. Low-risk patients could be prioritized for early step-down care, freeing up ICU beds.
The clinical impact of such a system could include reduced in-hospital mortality through timely interventions, improved ICU resource allocation, and enhanced alignment with patient and family goals. For example, if the model aids in initiating palliative care for 20% of high-risk patients within 48 h, the estimated ICU bed savings could be substantial given current average daily ICU costs of $4000–$5000 in the U.S. Furthermore, by flagging risk early, the model may reduce moral distress among providers and avoid futile interventions in patients with extremely poor prognosis.

4.4. Limitations and Future Work

Despite these strengths, our study has limitations. It is based on retrospective, single-center data from the MIMIC-III database, which may limit external generalizability. Additionally, we restricted our input features to data available within the first 24 h of ICU admission. While this design supports early risk stratification, incorporating time-series data could further improve predictive performance by capturing clinical deterioration trajectories. This exclusion criterion may also have introduced selection bias, since patients with ICU stays shorter than 24 h may represent either rapidly deteriorating individuals or those who stabilized quickly and were discharged from intensive care early. Therefore, some uncertainty remains regarding the robustness of feature retention and imputation choices. Also, although most baseline characteristics were similar between the training and test sets, ethnicity differed modestly between the two groups. Because ethnicity was retained as a predictive feature, this imbalance may have influenced model behavior and may limit generalizability. Although variables with more than 80% missingness were excluded and the remaining missing values were handled using a Random Forest-based imputation strategy, alternative missingness thresholds and imputation specifications were not systematically compared in the present study. In addition, because MIMIC-III masks ages above 89 years for de-identification, very elderly patients could not be represented with full age granularity. Although ages ≥ 300 were recoded to 90 to address this issue, residual loss of information may have affected the contribution of age in the model. The longer emergency department stay observed among survivors may reflect differences in triage and stabilization processes rather than a direct association with improved outcomes. 
Therefore, this variable should be interpreted cautiously and not as evidence of a beneficial effect of prolonged emergency department exposure. Furthermore, although CatBoost achieved the highest point estimate for test AUROC, its confidence interval was relatively wide, indicating uncertainty in the true level of model performance. This likely reflects the modest size of the independent test set and suggests that the results should be interpreted with appropriate caution. Moreover, external validation on diverse ICU datasets, including prospective and multi-center cohorts, is necessary before deployment in routine clinical practice. Although the model was evaluated using an internal train-test split and cross-validation, all data were derived from the MIMIC-III database, which reflects a specific critical care environment and patient population. Therefore, the generalizability of the model to other institutions, healthcare systems, geographic regions, or clinical populations remains uncertain. In addition, the near-perfect training performance observed in some tree-based models suggests possible overfitting despite the more conservative performance seen in cross-validation and the independent test set. Future work should externally validate the model in additional datasets, such as MIMIC-IV, eICU-CRD, or other multi-institutional critical care cohorts, before broader clinical applicability can be assumed. An additional point of context is the relationship between the present model and established prognostic tools for traumatic brain injury, such as IMPACT, CRASH, GTOS, and eTBI. These tools have made important contributions to risk stratification, but they were developed in different patient populations, with different outcome definitions and predictor sets, and in some cases were not specifically designed for elderly ICU patients with structured high-granularity electronic health record data. 
In the present study, we referenced these prior approaches to motivate the need for geriatric-specific prognostic modeling; however, we did not perform a formal head-to-head benchmark comparison in the current retrospective analysis. Therefore, the incremental value of the proposed machine learning framework relative to established clinical scoring systems should be interpreted cautiously. Future work should directly compare this model with established prognostic tools in matched cohorts using harmonized predictors and outcome definitions.
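The width of such test-set confidence intervals can be illustrated with a percentile bootstrap over held-out predictions. The sketch below is a minimal illustration only, not the interval-estimation procedure used in this study (which is not specified in this section); `auroc` uses the Mann–Whitney formulation of the AUROC, and `bootstrap_auroc_ci` resamples the test set with replacement, so smaller test sets naturally produce wider intervals.

```python
import random

def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive case outranks a randomly chosen negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for AUROC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        sb = [y_score[i] for i in idx]
        if 0 < sum(yb) < n:  # resample must contain both classes
            stats.append(auroc(yb, sb))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

Running this on a test set of a few hundred patients makes the dependence of interval width on sample size straightforward to demonstrate.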
An important limitation of our study is the inability to distinguish between deaths resulting from comfort care measures versus deaths from direct medical complications. A substantial proportion of TBI mortalities within 30 days, particularly in geriatric populations, result from transitions to comfort care following goals-of-care discussions with families. These decisions are often based on prognostic uncertainty and perceived quality of life considerations rather than acute medical deterioration. The MIMIC-III database does not provide specific documentation on comfort care transitions, withdrawal of life-sustaining treatments, or do-not-resuscitate orders that preceded death. This limitation means our model predictions encompass both potentially preventable deaths and those resulting from decisions to limit treatment intensity. Future studies should incorporate palliative care consultation timing, code status changes, and family conference documentation to better distinguish between these mortality pathways and potentially develop separate models for each outcome type.
Several additional points warrant discussion. Restricting the cohort to ICU patients introduces selection bias, because not all geriatric TBI patients require ICU admission. Although this focus provides the detailed physiological information needed for early mortality prediction, the restricted population may not represent the entire geriatric TBI population and might lead to overestimated mortality risk, since ICU admission itself indicates higher acuity.
The selection of in-hospital mortality as the primary outcome measure might not capture the complete range of meaningful outcomes for geriatric TBI patients and their families. Predicting long-term functional trajectories—including cognitive recovery, independence in daily activities, and quality of life—would provide richer guidance for clinical decision-making and goals-of-care discussions. However, these outcomes are not captured in the MIMIC-III dataset, given its focus on acute care episodes, limiting our ability to forecast the results that may matter most to patients and caregivers when making treatment decisions.
In addition, the database does not provide detailed documentation on the circumstances surrounding in-hospital deaths, such as whether mortality followed a transition to comfort care after goals-of-care discussions. In geriatric TBI, a substantial proportion of early deaths may occur after such decisions, which are shaped by both patient/family preferences and clinicians’ prognostic judgments. Without this context, the model cannot distinguish between deaths driven by irreversible physiological decline and those associated with treatment-limiting choices, potentially leading to overestimation of mortality risk for some patients. Future research should integrate datasets that record palliative care transitions and decision-making context to refine prognostic accuracy and ensure predictions align with both clinical and patient-centered outcomes.
The sociodemographic predictors also warrant careful interpretation. ETHNICITY and MARITAL_STATUS emerged as predictive, but these variables may reflect historical inequities in healthcare delivery rather than biological risk factors. Although target encoding is statistically effective, it could perpetuate such disparities if model recommendations reinforce existing biased patterns of care. Similarly, PT is a coagulation marker with predictive value but limited specificity for the complex coagulopathy patterns found in TBI patients; viscoelastic testing, the current standard for coagulation monitoring in many trauma centers, is not available in the MIMIC-III database.
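As context for this concern, target encoding (Table 2) replaces each category with the mean 30-day mortality rate of that category in the training data, so the encoded value directly memorizes group-level outcome differences. The following minimal sketch is illustrative only; the example data are hypothetical, and any smoothing or cross-fold scheme used in the original pipeline is not reproduced here.

```python
def fit_target_encoding(categories, outcomes):
    """Map each category to its mean outcome rate in the *training* data.
    Unseen categories fall back to the overall rate. Note how the mapping
    memorizes group outcomes, which is how encoded bias can propagate."""
    totals, counts = {}, {}
    for c, y in zip(categories, outcomes):
        totals[c] = totals.get(c, 0) + y
        counts[c] = counts.get(c, 0) + 1
    overall = sum(outcomes) / len(outcomes)
    mapping = {c: totals[c] / counts[c] for c in totals}
    return lambda c: mapping.get(c, overall)

# Hypothetical example: encode a marital-status column.
train_status = ["married", "single", "married", "widowed", "single"]
train_died = [0, 1, 0, 1, 0]  # 30-day mortality labels
encode = fit_target_encoding(train_status, train_died)
```

Because the mapping is fitted on training outcomes only, applying it to the test set avoids target leakage; it cannot, however, avoid encoding historical disparities already embedded in those outcomes.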
Additionally, our dataset did not include reliable information on pre-injury use of anticoagulant or antiplatelet medications, which are known to influence outcomes in geriatric TBI patients. This omission may underestimate bleeding risk or severity in certain subgroups. Furthermore, while several individual comorbidities were included as features (e.g., cardiovascular disease, chronic pulmonary disease), we did not use a validated comorbidity index such as the Elixhauser or Charlson score.
To address these limitations, future work will focus on three strategic areas. In the clinical domain, studies should prioritize developing models that predict long-term functional outcomes at 3, 6, and 12 months post-injury, requiring linkage with post-acute care databases and patient-reported outcome tools. Additionally, parallel models should be developed for non-ICU TBI patients to provide risk stratification across all care settings. Future work will also benefit from incorporating composite indices to enhance risk adjustment and improve comparability across studies.
Clinical implementations should integrate bias-detection mechanisms, auditing processes across demographic groups, and equity safeguards in their recommendations. We also plan to explore temporal deep learning models, including LSTMs and transformers, to detect clinical deterioration patterns, and to integrate imaging data for more complete risk assessment.
Finally, the implementation phase will focus on user-friendly interfaces that embed the model within electronic health record systems for real-time clinical decision support, with built-in equity monitoring and bias-detection functionality.
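One concrete form of such auditing is to stratify held-out performance by demographic group and flag large gaps. The sketch below is a hypothetical illustration rather than a protocol used in this study; the choice of sensitivity and specificity as audit metrics and the 0.10 tolerance in `flag_gaps` are assumptions for demonstration.

```python
def group_metrics(y_true, y_pred, groups):
    """Per-group sensitivity and specificity for a binary classifier."""
    out = {}
    for g in set(groups):
        yt = [y for y, gg in zip(y_true, groups) if gg == g]
        yp = [p for p, gg in zip(y_pred, groups) if gg == g]
        tp = sum(1 for y, p in zip(yt, yp) if y == 1 and p == 1)
        fn = sum(1 for y, p in zip(yt, yp) if y == 1 and p == 0)
        tn = sum(1 for y, p in zip(yt, yp) if y == 0 and p == 0)
        fp = sum(1 for y, p in zip(yt, yp) if y == 0 and p == 1)
        out[g] = {
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            "specificity": tn / (tn + fp) if tn + fp else None,
            "n": len(yt),
        }
    return out

def flag_gaps(metrics, metric="sensitivity", max_gap=0.10):
    """Flag when the best-worst gap in a metric exceeds a tolerance."""
    vals = [m[metric] for m in metrics.values() if m[metric] is not None]
    return (max(vals) - min(vals)) > max_gap if vals else False
```

An audit of this kind could run at deployment time on each refreshed evaluation cohort, with flagged gaps triggering human review rather than automated correction.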

5. Conclusions

In this study, we developed a clinically grounded machine learning framework to predict 30-day mortality among geriatric patients with traumatic brain injury using structured ICU data from the MIMIC-III database. A structured pipeline was applied, incorporating patient selection, Random Forest-based data imputation, and hybrid feature selection. From an initial pool of 69 candidate variables, nine clinically relevant predictors—including GCS score, prothrombin time, and oxygen saturation—were retained to support model interpretability and internal risk stratification.
Among the evaluated models, CatBoost achieved the highest point estimate for AUROC (0.867; 95% CI: 0.809–0.922), with a favorable balance between sensitivity and specificity in the internal test cohort. Feature ablation and SHAP analyses suggested that neurological, respiratory, and coagulation-related variables contributed meaningfully to mortality risk prediction. At the same time, the findings should be interpreted cautiously in light of the retrospective single-database design, the lack of external validation, the possibility of overfitting, and the uncertainty reflected in the test-set confidence intervals.
Clinically, the model may have value as an early risk-stratification tool rather than as a replacement for physician judgment. Predictions generated from the first 24 h of ICU data may help identify patients who warrant closer monitoring or earlier multidisciplinary attention, but any such use would require prospective evaluation within real clinical workflows.
Future work should focus on external validation in independent and multi-institutional cohorts, prospective assessment of workflow utility, and further refinement through additional modalities such as imaging or frailty-related variables. Accordingly, this framework should be viewed as a promising step toward clinically useful mortality prediction in geriatric TBI, rather than as a model ready for bedside implementation.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biomedinformatics6020017/s1, Supplementary Methods S1. ICD-9-Based TBI Case Identification; Supplementary Methods S2. Stepwise Cohort Extraction Workflow; Supplementary Methods S3. Rationale for Excluding Missing Early Physiologic Variables.

Author Contributions

Y.S. conceptualized the study, developed the methodological framework, conducted the experiments, performed data analysis, and drafted the original manuscript. S.C., J.F. and L.S. contributed to data preprocessing, model development, and manuscript preparation. E.P., K.A. and G.P. provided critical feedback on the study design and interpretation of results. M.P. supervised the project, coordinated research efforts, and provided strategic guidance throughout the study. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data analyzed in this study are available from the MIMIC-III database at PhysioNet to credentialed users who complete the required training and data use agreement. No new public dataset was generated in this study.

Acknowledgments

The authors would like to acknowledge the Laboratory for Computational Physiology at the Massachusetts Institute of Technology for maintaining the MIMIC-III database.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Corrigan, J.D.; Selassie, A.W.; Orman, J.A.L. The epidemiology of traumatic brain injury. J. Head Trauma Rehabil. 2010, 25, 72–80.
  2. Mckee, A.C.; Daneshvar, D.H. The neuropathology of traumatic brain injury. In Handbook of Clinical Neurology; Elsevier: Amsterdam, The Netherlands, 2015; Volume 127, pp. 45–66.
  3. Ng, S.Y.; Lee, A.Y.W. Traumatic brain injuries: Pathophysiology and potential therapeutic targets. Front. Cell. Neurosci. 2019, 13, 528.
  4. Hume, P.A.; Bradshaw, E.J.; Brueggemann, G.P. Biomechanics: Injury mechanisms and risk factors. In Gymnastics; International Olympic Committee: Paris, France, 2013; pp. 75–84.
  5. de Macedo Filho, L.; Figueredo, L.F.; Villegas-Gomez, G.A.; Arthur, M.; Pedraza-Ciro, M.C.; Martins, H.; Kanawati Neto, J.; Hawryluk, G.J.; Amorim, R.L.O. Pathophysiology-based management of secondary injuries and insults in TBI. Biomedicines 2024, 12, 520.
  6. Dewan, M.C.; Rattani, A.; Gupta, S.; Baticulon, R.E.; Hung, Y.C.; Punchak, M.; Agrawal, A.; Adeleye, A.O.; Shrime, M.G.; Rubiano, A.M.; et al. Estimating the global incidence of traumatic brain injury. J. Neurosurg. 2018, 130, 1080–1097.
  7. GBD 2016 Traumatic Brain Injury and Spinal Cord Injury Collaborators. Global, regional, and national burden of traumatic brain injury and spinal cord injury, 1990–2016: A systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol. 2019, 18, 56–87.
  8. Hyder, A.A.; Wunderlich, C.A.; Puvanachandra, P.; Gururaj, G.; Kobusingye, O.C. The impact of traumatic brain injuries: A global perspective. NeuroRehabilitation 2007, 22, 341–353.
  9. Taylor, C.A. Traumatic brain injury–related emergency department visits, hospitalizations, and deaths—United States, 2007 and 2013. MMWR Surveill. Summ. 2017, 66, 1–16.
  10. Finkelstein, E.; Corso, P.S.; Miller, T.R. The Incidence and Economic Burden of Injuries in the United States; Oxford University Press: Oxford, UK, 2006.
  11. Thomas, K.E.; Stevens, J.A.; Sarmiento, K.; Wald, M.M. Fall-related traumatic brain injury deaths and hospitalizations among older adults—United States, 2005. J. Saf. Res. 2008, 39, 269–272.
  12. Cusimano, M.D.; Saarela, O.; Hart, K.; Zhang, S.; McFaull, S.R. A population-based study of fall-related traumatic brain injury identified in older adults in hospital emergency departments. Neurosurg. Focus 2020, 49, E20.
  13. Harvey, L.A.; Close, J.C. Traumatic brain injury in older adults: Characteristics, causes and consequences. Injury 2012, 43, 1821–1826.
  14. Thompson, H.J.; Weir, S.; Rivara, F.P.; Wang, J.; Sullivan, S.D.; Salkever, D.; MacKenzie, E.J. Utilization and costs of health care after geriatric traumatic brain injury. J. Neurotrauma 2012, 29, 1864–1871.
  15. Florence, C.S.; Bergen, G.; Atherly, A.; Burns, E.; Stevens, J.; Drake, C. Medical costs of fatal and nonfatal falls in older adults. J. Am. Geriatr. Soc. 2018, 66, 693–698.
  16. McIntyre, A.; Mehta, S.; Aubut, J.; Dijkers, M.; Teasell, R.W. Mortality among older adults after a traumatic brain injury: A meta-analysis. Brain Inj. 2013, 27, 31–40.
  17. Grinin, L.; Grinin, A.; Korotayev, A. Global aging and our futures. World Futur. 2023, 79, 536–556.
  18. Maas, A.I.; Lingsma, H.F.; Roozenbeek, B. Chapter 29—Predicting outcome after traumatic brain injury. In Handbook of Clinical Neurology; Traumatic Brain Injury, Part II; Grafman, J., Salazar, A.M., Eds.; Elsevier: Amsterdam, The Netherlands, 2015; Volume 128, pp. 455–474.
  19. Fu, W.W.; Fu, T.S.; Jing, R.; McFaull, S.R.; Cusimano, M.D. Predictors of falls and mortality among elderly adults with traumatic brain injury: A nationwide, population-based study. PLoS ONE 2017, 12, e0175868.
  20. Bobeff, E.J.; Fortuniak, J.; Bryszewski, B.; Wiśniewski, K.; Bryl, M.; Kwiecień, K.; Stawiski, K.; Jaskólski, D.J. Mortality after traumatic brain injury in elderly patients: A new scoring system. World Neurosurg. 2019, 128, e129–e147.
  21. Huang, C.Y.; Yen, Y.H.; Tsai, C.H.; Hsu, S.Y.; Tsai, P.L.; Hsieh, C.H. Geriatric Trauma Outcome Score as a Mortality Predictor in Isolated Moderate to Severe Traumatic Brain Injury: A Single-Center Retrospective Study. Healthcare 2024, 12, 1680.
  22. Hancock, J.T.; Khoshgoftaar, T.M. CatBoost for big data: An interdisciplinary review. J. Big Data 2020, 7, 94.
  23. Li, H.; Ashrafi, N.; Kang, C.; Zhao, G.; Chen, Y.; Pishgar, M. A machine learning-based prediction of hospital mortality in mechanically ventilated ICU patients. PLoS ONE 2024, 19, e0309383.
  24. Safaei, N.; Safaei, B.; Seyedekrami, S.; Talafidaryani, M.; Masoud, A.; Wang, S.; Li, Q.; Moqri, M. E-CatBoost: An efficient machine learning framework for predicting ICU mortality using the eICU Collaborative Research Database. PLoS ONE 2022, 17, e0262895.
  25. Johnson, A.E.; Pollard, T.J.; Shen, L.; Lehman, L.w.H.; Feng, M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Celi, L.A.; Mark, R.G. MIMIC-III, a freely accessible critical care database. Sci. Data 2016, 3, 160035.
  26. Si, Y.; Abdollahi, A.; Ashrafi, N.; Placencia, G.; Pishgar, E.; Alaei, K.; Pishgar, M. Optimized Feature Selection and Advanced Machine Learning for Stroke Risk Prediction in Revascularized Coronary Artery Disease Patients. medRxiv 2025. Available online: https://www.medrxiv.org/content/early/2025/03/15/2025.03.14.25324005.full.pdf (accessed on 23 March 2026).
  27. Fan, J.; Chen, S.; Pan, H.; Pishgar, E.; Placencia, G.; Alaei, K.; Pishgar, M. Predicting 28-Day Mortality in First-Time ICU Patients with Heart Failure and Hypertension Using LightGBM: A MIMIC-IV Study. medRxiv 2025.
  28. Wang, R.; Zeng, X.; Long, Y.; Zhang, J.; Bo, H.; He, M.; Xu, J. Prediction of Mortality in Geriatric Traumatic Brain Injury Patients Using Machine Learning Algorithms. Brain Sci. 2023, 13, 94.
  29. Sun, L.; Ashrafi, N.; Pishgar, M. Optimizing Urban Mobility Through Complex Network Analysis and Big Data from Smart Cards. arXiv 2025, arXiv:2502.17054.
Figure 1. Flowchart Illustrating the Patient Selection Process from the MIMIC-III Database.
Figure 2. Flowchart Illustrating the Full Data Preprocessing and Feature Selection Pipeline.
Figure 3. Flowchart of Multi-model Development and Evaluation.
Figure 4. Impact of Feature Removal on CatBoost Model Performance.
Figure 5. AUROC Curves for Model Performance in the Training Set. The gray dotted line indicates the line of no discrimination.
Figure 6. AUROC Curves for Model Performance in the Test Set. The gray dotted line indicates the line of no discrimination.
Figure 7. SHAP Summary Plot Showing Feature Value Distributions and Their Impact on Model Output.
Table 1. Final 9 Features Used for 30-day Mortality Prediction in Geriatric TBI Patients.
Category | Features
Demographics | Age (AGE), Marital Status (MARITAL_STATUS), Ethnicity (ETHNICITY)
Administrative | Emergency Department Length of Stay (ED_LENGTH_OF_STAY)
Clinical Measurements | Glasgow Coma Scale Score (GCS_SCORE), Temperature (Temperature_Combined), Prothrombin Time (PT), Heart Rate, Oxygen Saturation
Table 2. Detailed Feature Definitions and Encoding Methods.
Feature | Definition | Units/Type | Source/Calculation
AGE | Patient age at ICU admission | Years | Calculated from date of birth
ETHNICITY | Categorical variable encoded using target encoding | Continuous (0–1) | Mean 30-day mortality rate per ethnicity group
MARITAL_STATUS | Categorical variable encoded using target encoding | Continuous (0–1) | Mean 30-day mortality rate per marital status group
Temperature_Combined | Mean body temperature | Fahrenheit | Averaged over 24 h; original Fahrenheit values retained
ED_LENGTH_OF_STAY | Time spent in Emergency Department | Hours | EDOUTTIME − EDREGTIME, calculated in hours
GCS_SCORE | Glasgow Coma Scale score | Points (3–15) | Neurological assessment within first 24 h
PT | Prothrombin time | Seconds | Laboratory measurement within first 24 h
Heart Rate | Heart rate | Beats per minute | Mean over first 24 h
Oxygen Saturation | Oxygen saturation | Percentage | Mean SpO2 over first 24 h
Table 3. T-test Comparison of Feature Distributions between Training and Test Sets.
Feature | Training Set | Test Set | p-Value
GCS_SCORE | 10.80 (4.28) | 11.05 (4.32) | 0.489
Temperature | 97.88 (4.90) | 97.76 (2.01) | 0.647
ED_LOS | 5.31 (4.89) | 4.84 (3.43) | 0.156
PT | 13.99 (2.78) | 15.04 (10.17) | 0.153
Heart Rate | 81.22 (16.96) | 80.30 (15.99) | 0.506
AGE | 81.18 (19.20) | 84.22 (27.80) | 0.104
Oxygen Saturation | 97.85 (3.14) | 97.92 (2.61) | 0.739
MARITAL_STATUS (Target Encoded) | 0.23 (0.06) | 0.22 (0.05) | 0.262
ETHNICITY (Target Encoded) | 0.25 (0.07) | 0.24 (0.07) | 0.048
Table notes: The table summarizes differences between the training and test cohorts across multiple clinical variables. Mean and standard deviation (SD) are reported. ETHNICITY and MARITAL_STATUS represent target-encoded categorical variables where values indicate mean mortality rates for each category. p-values are computed using appropriate statistical tests with a significance threshold of 0.05. In MIMIC-III, patients older than 89 years may appear with shifted ages due to de-identification; in this study, ages ≥ 300 years were recoded to 90.
Table 4. T-test Comparison of Feature Distributions between Survival and Non-Survival Sets.
Feature | Survival | Non-Survival | p-Value
GCS_SCORE | 11.94 (3.68) | 7.20 (4.02) | <0.001
Temperature | 98.04 (5.45) | 97.36 (2.33) | 0.063
ED_LOS | 5.62 (5.23) | 4.32 (3.44) | 0.003
PT | 13.83 (2.82) | 14.51 (2.58) | 0.018
Heart Rate | 80.37 (16.11) | 83.89 (19.25) | 0.080
AGE | 80.29 (7.36) | 82.66 (25.06) | 0.238
Oxygen Saturation | 97.68 (3.14) | 98.37 (3.13) | 0.045
MARITAL_STATUS (Target Encoded) | 0.22 (0.05) | 0.24 (0.08) | 0.119
ETHNICITY (Target Encoded) | 0.24 (0.07) | 0.27 (0.08) | <0.001
Table notes: This table compares patients who survived or died within 30 days. Differences in mean values of key clinical variables are shown along with p-values. ETHNICITY and MARITAL_STATUS represent target-encoded categorical variables where values indicate mean mortality rates for each category. Statistical significance set at 0.05 threshold. In MIMIC-III, patients older than 89 years may appear with shifted ages due to de-identification; in this study, ages ≥ 300 years were recoded to 90.
Table 5. Performance Comparison of Different Models in the Training Set.
Model | AUROC (95% CI) | Accuracy | F1-Score | Sensitivity | Specificity | PPV | NPV
CatBoost | 0.997 (0.994–0.999) | 0.974 | 0.945 | 0.929 | 0.988 | 0.964 | 0.978
LightGBM | 0.978 (0.967–0.988) | 0.927 | 0.849 | 0.855 | 0.949 | 0.842 | 0.955
XGBoost | 0.999 (0.998–1.000) | 0.985 | 0.968 | 0.965 | 0.992 | 0.973 | 0.989
LogisticRegression | 0.827 (0.785–0.865) | 0.746 | 0.582 | 0.749 | 0.745 | 0.481 | 0.904
KNN | 0.939 (0.916–0.959) | 0.814 | 0.708 | 0.928 | 0.780 | 0.572 | 0.972
NaiveBayes | 0.757 (0.706–0.808) | 0.643 | 0.522 | 0.811 | 0.590 | 0.384 | 0.909
NeuralNet | 0.800 (0.755–0.845) | 0.683 | 0.542 | 0.787 | 0.650 | 0.414 | 0.905
Table 6. Performance Comparison of Different Models in the Test Set.
Model | AUROC (95% CI) | Accuracy | F1-Score | Sensitivity | Specificity | PPV | NPV
CatBoost | 0.867 (0.809–0.922) | 0.855 | 0.710 | 0.752 | 0.888 | 0.679 | 0.918
LightGBM | 0.852 (0.785–0.905) | 0.831 | 0.678 | 0.752 | 0.855 | 0.618 | 0.914
XGBoost | 0.855 (0.790–0.912) | 0.830 | 0.653 | 0.663 | 0.881 | 0.637 | 0.893
LogisticRegression | 0.864 (0.801–0.921) | 0.812 | 0.668 | 0.814 | 0.808 | 0.577 | 0.931
KNN | 0.702 (0.612–0.779) | 0.711 | 0.499 | 0.604 | 0.743 | 0.428 | 0.855
NaiveBayes | 0.730 (0.624–0.826) | 0.674 | 0.515 | 0.731 | 0.657 | 0.401 | 0.885
NeuralNet | 0.800 (0.721–0.873) | 0.709 | 0.587 | 0.855 | 0.662 | 0.445 | 0.935
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Si, Y.; Fan, J.; Sun, L.; Chen, S.; Pishgar, E.; Alaei, K.; Placencia, G.; Pishgar, M. Machine Learning-Based Prediction of Mortality in Geriatric Traumatic Brain Injury Patients. BioMedInformatics 2026, 6, 17. https://doi.org/10.3390/biomedinformatics6020017
