Article

Development and Validation of a CatBoost-Based Model for Predicting Significant Creatinine Elevation in ICU Patients Receiving Vancomycin Therapy

Department of Industrial and Systems Engineering, University of Southern California, 3715 McClintock Ave GER 240, Los Angeles, CA 90089, USA
*
Author to whom correspondence should be addressed.
BioMedInformatics 2025, 5(4), 71; https://doi.org/10.3390/biomedinformatics5040071
Submission received: 29 October 2025 / Revised: 4 December 2025 / Accepted: 9 December 2025 / Published: 10 December 2025
(This article belongs to the Special Issue Feature Papers on Methods in Biomedical Informatics)

Abstract

Vancomycin remains a cornerstone for severe Gram-positive infections in the ICU, yet creatinine elevation, a sensitive marker of early renal stress, occurs frequently and complicates therapy. We developed a machine learning model to predict vancomycin-associated creatinine elevation using routinely available clinical data, enabling preemptive risk stratification. In this retrospective MIMIC-IV cohort study (n = 10,288 ICU adults aged 18–80 receiving vancomycin), the primary outcome was creatinine elevation per KDIGO criteria (≥0.3 mg/dL within 48 h or ≥50% within 7 d). A two-stage feature selection (SelectKBest + Random Forest) identified 15 predictors from 30 candidates. Six algorithms were compared via 5-fold cross-validation. CatBoost was selected for final modeling; interpretability was assessed using SHAP values and Accumulated Local Effects (ALE) plots. Creatinine elevation occurred in 2903 patients (28.2%). CatBoost achieved AUROC 0.818 (95% CI: 0.801–0.834), sensitivity 0.800, specificity 0.681, and NPV 0.900. Top predictors were serum phosphate, total bilirubin, magnesium, Charlson Comorbidity Index, and APSIII score. SHAP analysis confirmed hyperphosphatemia as the strongest driver; ALE plots revealed non-linear, clinically plausible thresholds (e.g., phosphate >4.5 mg/dL sharply increased risk). This interpretable model accurately predicts vancomycin-associated creatinine elevation using standard ICU monitoring data. With high negative predictive value, it supports early exclusion of low-risk patients and targeted interventions (e.g., intensified TDM, nephrotoxin avoidance) in high-risk cases, facilitating precision antimicrobial stewardship in critical care.

1. Introduction

Vancomycin, a glycopeptide antibiotic first isolated from Amycolatopsis orientalis in the 1950s, remains one of the most important treatments for severe infections caused by Gram-positive pathogens, particularly methicillin-resistant Staphylococcus aureus (MRSA) and Enterococcus species [1,2,3]. Its bactericidal activity stems from its ability to inhibit bacterial cell wall synthesis by binding to the D-Ala-D-Ala terminus of peptidoglycan precursors, thus preventing cell wall formation [4]. Despite the emergence of vancomycin-resistant Enterococcus (VRE) and other resistant strains, vancomycin continues to be a cornerstone in the treatment of serious hospital-acquired infections, especially in settings where multidrug resistance is prevalent [5,6,7].
While essential in treating severe infections, vancomycin is associated with several side effects, the most concerning of which is nephrotoxicity. Multiple in vivo and in vitro studies have demonstrated that high intracellular concentrations lead to oxidative stress, mitochondrial dysfunction, and resultant proximal tubular apoptosis and necrosis [8,9,10,11]. Lysosomal accumulation of the drug also contributes to lysosomal–mitochondrial crosstalk and cell death. Moreover, allergic mechanisms such as acute interstitial nephritis, together with tubular cast formation, further mediate renal injury [10,12,13]. However, despite advances in understanding these molecular pathways, the overall pathogenesis of vancomycin-induced nephrotoxicity remains incompletely understood, as significant individual variability and complex interactions between different injury mechanisms continue to be observed.
Numerous studies have sought to narrow this gap. For example, Aljefri et al. (2019) performed a systematic review analyzing the relationship between vancomycin AUC and acute kidney injury, synthesizing data from eight observational studies involving 2491 patients [14]. Their meta-analysis demonstrated that maintaining vancomycin AUC below approximately 650 mg·h/L within the first 48 h was associated with a significantly lower risk of AKI, and that AUC-guided monitoring strategies resulted in less nephrotoxicity compared to traditional trough-guided approaches, highlighting the importance of pharmacokinetic monitoring in reducing vancomycin-associated kidney injury. Kim et al. (2022) constructed a risk-scoring system for vancomycin-associated acute kidney injury by retrospectively analyzing clinical data with logistic regression and machine learning methods, including random forest and support vector machine, achieving AUROC values between 0.72 and 0.74 [15]. Their study identified low body weight, higher Charlson comorbidity index, elevated vancomycin trough levels, and concomitant use of multiple nephrotoxic agents as important predictors, providing a practical tool for risk assessment in clinical settings.
Given the complex and multifactorial mechanisms underlying vancomycin-induced nephrotoxicity, there is an urgent need for more precise clinical risk stratification—especially in critically ill ICU patients who are inherently more vulnerable to kidney injury. Serum creatinine elevation is a key indicator of kidney injury and is closely monitored during vancomycin therapy. By leveraging machine learning approaches to analyze the relationship between vancomycin administration and creatinine dynamics, our study aims to identify which variables most strongly predict creatinine elevation and subsequent renal complications in ICU populations. This approach not only enables the identification of high-risk patients and modifiable risk factors (such as drug exposure levels, concomitant medications, and patient-specific characteristics) but also provides a data-driven foundation for optimizing vancomycin use. Ultimately, these insights can improve individualized dosing strategies, minimize the incidence of vancomycin-associated kidney injury, and enhance patient safety in the ICU setting.
This study makes several significant methodological and clinical contributions to vancomycin nephrotoxicity prediction in critical care settings:
  • Temporal precision in outcome definition: Unlike previous studies using pre-existing AKI diagnosis labels, we employed time-stamped serum creatinine trajectories to define vancomycin-associated renal injury. This addresses critical data leakage where AKI diagnoses lack temporal precision, making it impossible to confirm kidney injury occurred after vancomycin administration. Our KDIGO-based criteria (≥0.3 mg/dL within 48 h or ≥50% increase within 7 days post-vancomycin) ensure strict temporal validity and eliminate reverse causality—where pre-existing renal dysfunction is misclassified as drug-induced injury.
  • Comprehensive feature engineering with clinical validation: We developed a multi-domain feature extraction pipeline capturing clinical state immediately prior to vancomycin across three data sources: real-time monitoring (chartevents), laboratory evaluation (labevents), and procedural interventions (procedureevents). Each feature represents the latest measurement before drug initiation, mirroring real-world clinical scenarios while maintaining strict temporal boundaries to prevent data leakage and enhance deployability.
  • Posterior-based risk stratification with uncertainty quantification: We leveraged the DREAM algorithm to estimate full nephrotoxicity risk distributions rather than point predictions. By conditioning on priors from patients with creatinine elevation, DREAM outputs individualized probability curves with explicit uncertainty bounds. This allows clinicians to assess both risk level and prediction confidence, supporting threshold-free decisions where wide confidence intervals suggest monitoring while narrow intervals support definitive interventions.
  • Systematic model benchmarking with clinical interpretability: We evaluated six diverse machine learning paradigms spanning linear models, ensemble methods, and neural networks, identifying CatBoost as optimal. Multiple interpretability techniques—SHAP analysis, Accumulated Local Effects, and ablation studies—create a transparent, clinically auditable framework supporting evidence-based bedside decision-making through physiologically plausible predictors.

2. Materials and Methods

2.1. Data Source and Study Design

The study was conducted retrospectively using MIMIC-IV (v2.2) [16], a large-scale, de-identified database of ICU admissions curated by the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center. MIMIC-IV contains comprehensive, time-stamped data on demographics, vital signs, laboratory tests, medication exposures, and procedures for ICU patients admitted between 2008 and 2019. Leveraging this database, we constructed a binary classification framework to predict renal injury risk using only clinical information available prior to the first vancomycin dose. The entire experimental pipeline—including data preprocessing, feature selection, model development, performance evaluation, and explainability—was grounded in clinical relevance and designed for translational applicability to ICU workflows.
To operationalize the entire machine learning pipeline, we constructed a modular, step-wise framework encompassing cohort definition, data preprocessing, feature selection, model training, and interpretation. The full workflow is outlined below in pseudocode format:
This structured design ensured that each methodological step—from cohort assembly to final interpretability—was aligned with clinical reasoning, statistically rigorous, and transparent for deployment in real-world ICU settings, as shown in Algorithm 1.
Algorithm 1. Machine Learning Framework for Predicting Vancomycin-Associated Renal Injury
Require: MIMIC-IV ICU data (2008–2019)
Ensure: Binary prediction of creatinine elevation risk following vancomycin exposure
Step 1: Study Cohort Construction
    Identify adult ICU patients who received IV vancomycin
    Apply inclusion criteria: age between 18 and 80, no active cancer
    Retain first ICU stay only
    Define outcome using KDIGO-based thresholds: ΔCr ≥ 0.3 mg/dL within 48 h or ≥ 50% rise within 7 days
Step 2: Data Preprocessing
    Extract most recent values prior to vancomycin initiation
    Impute continuous variables with median; categorical with mode
    Normalize continuous variables using min-max scaling
    Retain binary indicators as 0/1
Step 3: Feature Selection
    Group features by source (chart events, lab events, procedure events)
    Perform univariate statistical testing via SelectKBest with F-statistics (discarding features with p > 0.05)
    Select top 30 features with highest F-statistics from univariate analysis
    Apply Random Forest importance ranking to identify final 15 most predictive variables
    Combine selected features with admission-level characteristics
Step 4: Imbalance Handling
    Apply SMOTE oversampling within training folds to address class imbalance
Step 5: Model Development
    for all models ∈ {CatBoost, LightGBM, XGBoost, Logistic Regression, Naïve Bayes, Shallow Neural Net} do
        Perform stratified 5-fold cross-validation
        Optimize hyperparameters via grid search
        Evaluate AUROC, F1-score, sensitivity, specificity, PPV, NPV
    end for
Step 6: Post hoc Analysis and Interpretation
    Assess statistical equivalence of training/test sets via t-tests
    Quantify marginal utility via ablation analysis
    Interpret global/local effects using SHAP and ALEs
    Quantify predictive uncertainty via DREAM posterior sampling

2.2. Study Population

We retrospectively identified a cohort of ICU patients who received intravenous vancomycin to investigate risk factors associated with drug-induced renal injury. A stepwise screening process was applied to ensure clinical relevance and to minimize potential confounding.
The initial cohort included 25,916 ICU stays involving adult patients who had received at least one dose of intravenous vancomycin. This broad inclusion criterion ensured that all relevant exposures were captured. To reduce physiological heterogeneity and age-related renal confounding, we restricted the cohort to patients aged between 18 and 80 years, excluding those outside this range. Pediatric patients differ in renal maturity, pharmacokinetics, and treatment protocols, while elderly patients often have baseline renal impairment, increasing the likelihood of non–drug-related creatinine fluctuations. This step resulted in 21,925 eligible patients.
Next, we excluded patients with active malignancy or metastatic cancer, as these conditions may independently affect renal function through paraneoplastic processes, chemotherapy, or end-of-life care, which could confound the renal effects attributed to vancomycin. After removing 2720 such cases, 19,205 patients remained. To avoid duplicate representations and ensure that extracted features reflected the initial stage of critical illness, we retained only the first ICU admission per patient. This yielded a final study population of 10,288 unique ICU stays.
The prediction target was defined based on clinical consensus criteria from the Kidney Disease: Improving Global Outcomes (KDIGO) guidelines and widely adopted nephrotoxicity thresholds [17,18]. Specifically, a patient was considered to have vancomycin-associated renal injury if either of the following criteria was met after the initial vancomycin dose: (1) an absolute increase in serum creatinine of at least 0.3 mg/dL within 48 h, or (2) a relative increase of 50% or more in peak creatinine within 7 days compared to the most recent pre-vancomycin baseline. These criteria reflect clinically meaningful renal impairment and align with pharmacovigilance standards for nephrotoxic agents. A total of 2903 patients met at least one of these criteria and were labeled positive, while the remaining 7385 patients served as the negative group, as shown in Figure 1.
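The two labeling criteria can be illustrated with a minimal Python sketch. It is not the authors' extraction code: the function name, the input layout (hours since the first dose paired with a creatinine value), and the small floating-point tolerance are assumptions made for illustration, and criterion (1) is assumed to use the same pre-vancomycin baseline as criterion (2).

```python
from typing import Iterable, Tuple

EPS = 1e-9  # tolerance for floating-point comparisons (illustrative choice)


def creatinine_elevation(baseline: float,
                         post_dose: Iterable[Tuple[float, float]]) -> bool:
    """Return True if either outcome criterion is met.

    baseline  : most recent pre-vancomycin serum creatinine (mg/dL)
    post_dose : (hours since first dose, creatinine in mg/dL) pairs
    """
    for hours, value in post_dose:
        if hours < 0:
            continue  # ignore anything measured before the first dose
        # Criterion 1: absolute rise of at least 0.3 mg/dL within 48 h
        if hours <= 48 and value - baseline >= 0.3 - EPS:
            return True
        # Criterion 2: relative rise of at least 50% within 7 days (168 h)
        if hours <= 168 and value >= 1.5 * baseline - EPS:
            return True
    return False
```

For example, a baseline of 1.0 mg/dL followed by 1.4 mg/dL at 24 h satisfies criterion (1), whereas the same value first observed at 72 h satisfies neither criterion.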

2.3. Data Preprocessing

In this study, we developed a preprocessing pipeline specifically tailored to the structure of ICU data and the clinical context of vancomycin use. All predictor variables were extracted based on the most recent available measurement prior to the first vancomycin dose, ensuring strict temporal validity for prospective risk modeling.
We applied different imputation strategies to continuous and categorical variables. For continuous variables, including laboratory and physiological measures such as Phosphate and Anion Gap, missing values were imputed using the median from the training set [19]. This method is robust to outliers and skewed distributions, which are common in critical care laboratory data. For categorical variables such as Richmond-RAS Scale, Braden Mobility, and presence of an Arterial Line, mode imputation was used to preserve the most frequent clinical state and reduce noise from documentation inconsistencies. To ensure data quality and avoid unreliable imputations, variables with more than 20% missingness were removed during preprocessing.
Because our study aimed to predict renal outcomes based on patient status at the time of drug initiation, we did not compute temporal summary statistics (e.g., max, min, or mean over 24 h). Instead, each variable was represented by a single value—the latest recorded measurement prior to vancomycin administration. This approach mirrors clinical practice and enhances the interpretability of predictions in real-time applications.
To ensure scale comparability across variables, min–max normalization was applied to all continuous variables. This transformation mapped raw values to the [0, 1] interval and prevented features with large numerical ranges, such as lactate or platelet count, from dominating learning algorithms. Binary variables, including indicator flags for devices, procedures, and comorbidities, were retained in their native 0/1 form to preserve clinical meaning and compatibility with tree-based and linear models.
The dataset exhibited a moderate class imbalance, with substantially fewer patients meeting the criteria for vancomycin-associated renal injury. To address this, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to the training folds during cross-validation [20]. SMOTE generated synthetic minority-class samples by interpolating between existing observations, helping the model learn boundary regions more effectively while avoiding overfitting. Specifically, for a minority-class instance $x_{\text{minority}}$ and one of its k-nearest neighbors $x_{\text{neighbor}}$, a synthetic sample $x_{\text{new}}$ is created using:
$$x_{\text{new}} = x_{\text{minority}} + \delta \cdot (x_{\text{neighbor}} - x_{\text{minority}}), \quad \delta \sim U(0, 1)$$
where δ is a random scalar drawn from a uniform distribution. This oversampling step was restricted to the training data and excluded from test folds to ensure unbiased evaluation. We tested alternative imbalance-handling strategies, including class weighting and focal loss; however, these approaches produced less stable performance across validation folds, likely due to the small number of positive nephrotoxicity cases and resultant probability compression effects. SMOTE was therefore selected as the primary strategy because it generated the most consistent improvement in AUROC while preserving physiologically plausible feature relationships by operating directly within the minority-class feature space.
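The interpolation step can be sketched in a few lines of standard-library Python. This is an illustrative reimplementation of the formula only, not the SMOTE library routine used in the study; the function name `smote_sample` and its arguments are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote_sample(minority: Sequence[Sequence[float]], k: int,
                 rng: random.Random) -> List[float]:
    """Create one synthetic point by interpolating toward a random k-nearest neighbor."""
    i = rng.randrange(len(minority))
    x = minority[i]
    # Rank the other minority points by Euclidean distance to x
    others = [p for j, p in enumerate(minority) if j != i]
    others.sort(key=lambda p: math.dist(x, p))
    neighbor = rng.choice(others[:k])
    delta = rng.random()  # delta drawn from U(0, 1)
    # x_new = x_minority + delta * (x_neighbor - x_minority), per coordinate
    return [a + delta * (b - a) for a, b in zip(x, neighbor)]
```

Because each synthetic point lies on the segment between two real minority observations, it stays inside the range spanned by the minority class, which is the property the text relies on when it describes the samples as physiologically plausible.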
All preprocessing steps—including imputation, normalization, encoding, and oversampling—were conducted independently on training data and subsequently applied to the corresponding validation sets. This design avoided information leakage and ensured that model evaluation reflected performance on unseen data. The resulting feature matrix provided a temporally aligned, numerically stable, and clinically interpretable basis for predictive modeling.
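The fit-on-training, apply-to-validation pattern described above can be sketched for a single continuous feature as follows. This is a simplified illustration under stated assumptions: missing values are represented as `None`, and the helper names `fit_preprocessor` and `apply_preprocessor` are hypothetical rather than taken from the study's code.

```python
from statistics import median
from typing import List, Optional, Tuple


def fit_preprocessor(train: List[Optional[float]]) -> Tuple[float, float, float]:
    """Learn the imputation median and min-max bounds from training values only."""
    observed = [v for v in train if v is not None]
    return median(observed), min(observed), max(observed)


def apply_preprocessor(values: List[Optional[float]],
                       params: Tuple[float, float, float]) -> List[float]:
    """Impute with the training median, then scale with the training bounds."""
    med, lo, hi = params
    span = (hi - lo) or 1.0  # guard against a constant feature
    return [((med if v is None else v) - lo) / span for v in values]


# Parameters come from the training fold and are reused unchanged on validation data
params = fit_preprocessor([1.0, None, 3.0, 5.0])
train_scaled = apply_preprocessor([1.0, None, 3.0, 5.0], params)
valid_scaled = apply_preprocessor([None, 7.0], params)
```

One consequence of this design is that a validation value outside the training range maps outside [0, 1]. That is expected when the scaler is never refit on held-out data.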

2.4. Feature Selection

We initially compiled a comprehensive set of candidate predictors from MIMIC-IV’s three primary clinical data tables, each representing distinct aspects of ICU patient monitoring and care. This systematic approach ensured comprehensive coverage of physiological systems associated with vancomycin-induced nephrotoxicity while maintaining clinical interpretability.
The chartevents table contains real-time physiological measurements and clinical assessments documented at the bedside, representing continuous patient monitoring data. This table captures vital signs, neurological assessments, and point-of-care laboratory values that reflect immediate patient status. From this domain, we extracted features including Richmond-RAS Scale, Total Bilirubin, Arterial Base Excess, AST, Braden Mobility, and Mean Airway Pressure, along with additional real-time observations such as Heart Rate, Non-Invasive Blood Pressure, and SpO2 that were initially considered. These variables represent the dynamic physiological state and acute illness severity that may predispose patients to vancomycin nephrotoxicity.
The labevents table encompasses formal laboratory test results processed in hospital laboratories, providing biochemical markers of organ function and metabolic status. This systematic laboratory evaluation offers objective measures of renal function, electrolyte balance, coagulation status, and metabolic derangements. Key features from this domain included Phosphate, Anion Gap, Magnesium, Lactate, PTT, Platelet Count, White Blood Cells, and Glucose, as well as additional candidates like BUN, INR, and Calcium that were reviewed during initial screening [21]. These laboratory parameters are critical for assessing baseline renal vulnerability and identifying patients at higher risk for drug-induced nephrotoxicity.
The procedureevents table documents invasive procedures and interventions performed during ICU care, reflecting both illness severity and therapeutic intensity. Procedural data indicates the level of medical intervention required and serves as a proxy for critical illness severity. From this domain, we included features such as the presence of an Arterial Line, central line insertion and mechanical ventilation initiation [22]. These interventional markers provide insights into hemodynamic monitoring needs, vascular access requirements, and respiratory support intensity, all of which correlate with nephrotoxicity risk in critically ill patients receiving vancomycin.
This clinically grounded three-domain approach reflects the fundamental pillars of ICU care—continuous physiologic monitoring, systematic biochemical evaluation, and therapeutic intervention intensity—enabling structured feature selection while preserving alignment with clinical decision-making processes for renal risk assessment.
Our feature selection employed a two-stage approach combining statistical filtering with machine learning-based importance ranking to identify the most predictive variables for vancomycin-associated nephrotoxicity while minimizing overfitting and maintaining clinical interpretability.
Stage 1—Statistical filtering (SelectKBest): The initial stage utilized univariate statistical testing to reduce dimensionality and eliminate non-informative features. We applied the F-statistic from ANOVA F-test via SelectKBest with f_classif scoring function to evaluate each feature’s individual association with nephrotoxicity outcome [23]:
$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{\sum_{i} n_i (\bar{X}_i - \bar{X})^2 / (k - 1)}{\sum_{i,j} (X_{ij} - \bar{X}_i)^2 / (N - k)}$$
where k represents the number of groups (nephrotoxicity vs. no nephrotoxicity), $n_i$ is the sample size of group i, and N is the total sample size. This approach selected the top 30 features with the highest F-statistics, effectively filtering variables that demonstrated significant group differences in vancomycin-treated ICU patients.
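As a worked illustration of the F-statistic above, the following standard-library sketch computes it directly from grouped values of one feature. It mirrors the formula term by term and is not the SelectKBest implementation itself; the function name `anova_f` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F-statistic: MS_between / MS_within."""
    k = len(groups)                       # number of groups
    n_total = sum(len(g) for g in groups)  # N
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: sum_i n_i * (mean_i - grand_mean)^2
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: sum_ij (x_ij - mean_i)^2
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

With two toy groups, [1, 2, 3] and [2, 3, 4], the between-group sum of squares is 1.5 and the within-group sum of squares is 4, so F = (1.5 / 1) / (4 / 4) = 1.5.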
Stage 2—Machine learning-based ranking (Random Forest importance): From the 30 statistically significant features, we applied Random Forest feature importance to identify the most predictive subset for nephrotoxicity risk [24]. Random Forest importance quantifies each feature’s contribution to prediction accuracy by measuring the mean decrease in node impurity across all trees [25]:
$$\text{Importance}(X_j) = \frac{1}{T} \sum_{t=1}^{T} \sum_{v \in V_t} p(v) \cdot \Delta I(v, X_j)$$
where T is the number of trees, $V_t$ represents nodes in tree t that split on feature $X_j$, $p(v)$ is the proportion of samples reaching node v, and $\Delta I(v, X_j)$ is the Gini impurity decrease from splitting on feature $X_j$ at node v. This method has proven effective in medical prediction tasks, particularly for identifying key risk factors in critical care settings [26].
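The terms of the importance formula can be made concrete with a small sketch for binary labels. It computes the weighted impurity decrease for one split and averages node contributions over trees; it is an illustration of the definition, not the Random Forest library's internal code, and all function names are hypothetical.

```python
from typing import List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def weighted_decrease(parent: Sequence[int], left: Sequence[int],
                      right: Sequence[int], n_total: int) -> float:
    """p(v) * delta I for one split, where p(v) = len(parent) / n_total."""
    n = len(parent)
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(parent) - children)


def forest_importance(per_tree: List[List[float]]) -> float:
    """Average over trees of the summed contributions from nodes splitting on X_j."""
    return sum(sum(tree) for tree in per_tree) / len(per_tree)
```

For instance, a node holding half of all samples that separates labels [0, 0, 1, 1] perfectly has impurity decrease 0.5 and therefore contributes 0.5 × 0.5 = 0.25 to the feature's score in that tree.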
This two-stage approach selected the final 15 most important features, balancing statistical significance with predictive utility while reducing overfitting risk. The combination of univariate filtering and ensemble-based ranking ensures both statistical validity and clinical relevance for vancomycin nephrotoxicity prediction, following established practices in medical machine learning applications [27].
The ordered importance of the selected variables is shown in Table 1.
In addition to the three clinical event categories, we incorporated four key admission-level characteristics that establish baseline patient risk profiles. Age serves as a fundamental predictor of vancomycin nephrotoxicity due to reduced renal reserve and altered drug clearance in older patients [28]. Emergency Department (ED) duration reflects clinical acuity and complexity prior to ICU admission, potentially indicating hemodynamic instability that may predispose to renal injury. Charlson Comorbidity Index quantifies pre-existing chronic disease burden using admission diagnoses, providing standardized baseline health status assessment that influences nephrotoxicity susceptibility [29]. Acute Physiology Score III (APSIII) measures acute illness severity during the first 24 h of ICU admission, capturing physiological derangement that correlates with organ dysfunction risk [30]. These admission-level features complement dynamic clinical measurements by establishing foundational risk context, enabling the model to account for both baseline vulnerability and acute physiological changes.
The final selected features are summarized in Table 2.

2.5. Modeling

To predict vancomycin-associated renal injury in ICU patients, we developed a supervised machine learning framework incorporating six representative classification algorithms, each selected for its theoretical strengths, suitability for clinical data, and complementary modeling capabilities. The dataset was randomly split into stratified training (70%) and test (30%) sets to preserve outcome distribution. All model development—including hyperparameter tuning and performance validation—was conducted using five-fold stratified cross-validation within the training set to avoid information leakage and ensure generalizability.
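A stratified 70/30 split of the kind described can be sketched with the standard library alone. The function name, the seed, and the rounding rule for the per-class test size are assumptions for illustration; the study's actual splitting utility is not specified in this section.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_frac: float,
                     seed: int) -> Tuple[List[int], List[int]]:
    """Return (train indices, test indices) with class proportions preserved."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        # Shuffle the indices of each class separately, then carve off the test share
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx
```

Splitting each class separately is what keeps the outcome rate roughly equal in both partitions. The same idea, applied within the training set, yields the stratified folds used for cross-validation.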
We employed random stratified splitting instead of temporal splitting based on admission dates for medical and data science considerations. From a medical perspective, vancomycin nephrotoxicity mechanisms involve fundamental physiological processes—oxidative stress, tubular injury, and mitochondrial dysfunction—that remain biologically consistent over time, making temporal evolution less relevant than comprehensive patient representation. From a data science perspective, random splitting maximizes model performance by incorporating the most recent clinical practices and contemporary protocols in training, while temporal splitting would force the model to learn from outdated 2008–2014 data and exclude valuable 2015–2019 insights. Our approach ensures optimal clinical applicability for current deployment while maintaining statistical rigor through robust cross-validation techniques.
We prioritized CatBoost, LightGBM, and XGBoost as core modeling algorithms due to their proven effectiveness in handling structured clinical datasets. These tree-based ensemble methods leverage gradient boosting to sequentially improve prediction accuracy while capturing non-linear feature interactions and hierarchical patterns. CatBoost was particularly advantageous for its ordered boosting strategy and native support for categorical variables, reducing overfitting risks without extensive preprocessing. LightGBM offered efficiency through histogram-based binning and leaf-wise growth, which accelerated training on high-dimensional data. XGBoost provided granular control over regularization, tree complexity, and sampling strategies, allowing for flexible bias–variance trade-offs.
For interpretability, we included logistic regression with both L1 (lasso) and L2 (ridge) penalties to serve as a transparent baseline model. Its linear structure allowed for direct inspection of coefficient weights, enabling clinicians to interpret variable effects in a familiar framework. Regularization was applied to control multicollinearity and overfitting, with hyperparameters optimized via grid search.
A Gaussian Naïve Bayes classifier was also evaluated as a probabilistic baseline. This lightweight model assumes conditional independence among predictors and models continuous variables using parametric likelihoods. While simplistic, its efficiency and interpretability make it a useful benchmark for gauging the added value of more expressive algorithms.
Lastly, we implemented a shallow feedforward neural network to explore non-linear, high-capacity representations. The architecture included a single hidden layer with ReLU activation and a sigmoid output node. Training was performed using the Adam optimizer and binary cross-entropy loss, with hyperparameters such as learning rate, dropout ratio, and batch size selected via nested tuning. Although more data-intensive and less interpretable, neural networks offer flexible function approximation that may be valuable in larger-scale or multimodal extensions of this work.
Model performance was primarily evaluated using the area under the receiver operating characteristic curve (AUROC), which provides a threshold-independent measure of discrimination. Given the class imbalance in renal injury outcomes, AUROC served as a robust metric for comparing overall model performance. To quantify variability and assess stability, 95% confidence intervals for AUROC were calculated using 2000 bootstrap replicates.
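AUROC and a bootstrap confidence interval can be sketched as follows. The percentile method and the resampling of patients with replacement are assumptions, since the text states only that 2000 bootstrap replicates were used; the pairwise AUROC below is a simple quadratic-time illustration suitable for small examples, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true: Sequence[int], scores: Sequence[float],
                 n_boot: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Percentile 95% interval for AUROC from resampled cases."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a replicate needs both classes to define AUROC
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]
```

For labels [0, 0, 1, 1] with scores [0.1, 0.4, 0.35, 0.8], three of the four positive-negative pairs are ranked correctly, giving an AUROC of 0.75.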
In addition to AUROC, we reported a set of complementary metrics to ensure clinical applicability. Accuracy provided a general overview of classification performance, while the F1-score balanced precision and recall—important for minimizing both false positives and false negatives. Sensitivity and specificity were included to evaluate the model’s ability to detect high-risk cases and avoid overtreatment, respectively. Positive predictive value (PPV) and negative predictive value (NPV) were also calculated to aid in clinical interpretation of the model’s predictions. This comprehensive evaluation strategy ensured a balanced assessment of both statistical performance and real-world utility.
Together, these six models span a diverse spectrum of complexity, interpretability, and inductive bias, allowing for a comprehensive assessment of machine learning paradigms in predicting early renal complications related to vancomycin administration.

2.6. Statistical Analyses

To ensure the clinical relevance, interpretability, and methodological robustness of our vancomycin-associated renal outcome prediction framework, we implemented a suite of statistical analyses tailored to five key objectives: (1) to validate the comparability of training and test cohorts; (2) to quantify the marginal contribution of each feature through ablation; (3) to uncover non-linear or threshold effects using Accumulated Local Effects; (4) to interpret individual-level predictions via SHAP; and (5) to estimate predictive uncertainty through posterior sampling. These analyses were designed to support both the statistical credibility and bedside applicability of our model for predicting early renal injury and creatinine elevation after vancomycin exposure.
We first evaluated the statistical equivalence of the training and test sets using two-sided independent-sample t-tests across core clinical variables [31] including laboratory results and physiologic scores. Welch’s correction was applied when unequal variances were detected. The t-statistic was computed using:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$
This ensured that the stratified sampling strategy did not introduce distributional bias, thereby supporting valid model generalization and downstream inference. Clinically, it confirmed that both sets of patients were comparable at baseline so that observed differences in predicted renal outcomes could be attributed to model signals rather than sampling artifacts.
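The Welch statistic used for the train/test comparison can be sketched in a few lines. This is an illustrative implementation using only the standard library; the function name is hypothetical, and the Welch-Satterthwaite degrees of freedom are included as an assumption about how the p-values would be obtained.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.fmean(sample1), statistics.fmean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)  # sample variances
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df
```

A two-sided p-value would then come from the t distribution with `df` degrees of freedom, for example through a statistics library.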
To assess the individual predictive utility of each feature, an ablation analysis was conducted by iteratively removing one variable at a time from the final model and retraining a logistic regression classifier [32]. The impact of removal was measured by changes in AUROC, offering direct insight into each variable’s marginal contribution. This analysis helped highlight which physiologic and laboratory factors were most strongly associated with the risk of vancomycin-related renal impairment, operationalized as significant post-administration creatinine elevation.
Formally, the ablation effect $\Delta \mathrm{AUC}(x_i)$ for a given feature $x_i$ was computed using:
$$\Delta \mathrm{AUC}(x_i) = \mathrm{AUC}_{\text{full}} - \mathrm{AUC}_{-x_i}$$
where $\mathrm{AUC}_{\text{full}}$ denotes the AUROC of the complete model including all features, and $\mathrm{AUC}_{-x_i}$ denotes the AUROC after removing feature $x_i$. A larger $\Delta \mathrm{AUC}(x_i)$ indicates greater marginal importance of the feature in predicting renal risk.
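The leave-one-feature-out loop behind this definition can be sketched as follows. The callable `fit_and_score` is a hypothetical stand-in for "retrain the logistic regression on the given features and return its AUROC"; the toy scorer and its numbers are invented purely so the example runs and are not study results.

```python
def ablation_deltas(feature_names, fit_and_score):
    """Return {feature: AUC_full - AUC_without_feature} for each feature."""
    auc_full = fit_and_score(list(feature_names))
    deltas = {}
    for name in feature_names:
        reduced = [f for f in feature_names if f != name]
        deltas[name] = auc_full - fit_and_score(reduced)
    return deltas

# Toy scorer (assumption): each feature adds a fixed amount to the AUROC.
TOY_GAIN = {"feature_a": 0.020, "feature_b": 0.007, "feature_c": 0.000}

def toy_fit_and_score(features):
    return 0.70 + sum(TOY_GAIN[f] for f in features)

deltas = ablation_deltas(list(TOY_GAIN), toy_fit_and_score)
ranking = sorted(deltas, key=deltas.get, reverse=True)  # most important first
```

In a real pipeline the scorer would refit the classifier for every reduced feature set, so the cost grows linearly with the number of features.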
To characterize non-linear associations and clinically relevant thresholds, we applied Accumulated Local Effects for top-ranked continuous variables. ALE curves provide unbiased estimates of a feature’s local impact on model predictions while addressing multicollinearity. Formally, the ALE for a given feature $x_j$ is defined as:
$$\mathrm{ALE}_j(z) = \int_{z_0}^{z} \mathbb{E}_{X_{-j}}\!\left[ \frac{\partial f(X)}{\partial x_j} \,\middle|\, x_j = s \right] ds.$$
These visualizations revealed interpretable patterns—such as saturation effects in phosphate and U-shaped trends in magnesium—that aligned with known renal physiology, particularly in the context of drug-induced tubular stress and hemodynamic injury. These insights help clinicians define physiologic thresholds where risk escalates sharply, enabling more timely monitoring or adjustment of nephrotoxic therapies.
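In practice the ALE integral is usually approximated with finite differences over bins of the feature. The sketch below shows that estimator for a single feature under stated assumptions: `predict` is any function mapping one row to a predicted value (a stand-in for the fitted model), bins are built from empirical quantiles, and the curve is centered so that its data-weighted mean is zero.

```python
def ale_1d(predict, X, j, n_bins=10):
    """First-order ALE of feature j.

    X is a list of rows (lists of floats). Returns (edges, ale), where ale[k]
    is the centered accumulated effect at edges[k].
    """
    values = sorted(row[j] for row in X)
    n = len(values)
    # Quantile-based edges taken from observed values, duplicates removed.
    edges = sorted({values[(k * (n - 1)) // n_bins] for k in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("feature has a single unique value")
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for row in X:
        # Locate the bin (edges[b], edges[b+1]] that holds this row's value.
        b = 0
        while b < len(edges) - 2 and row[j] > edges[b + 1]:
            b += 1
        low, high = list(row), list(row)
        low[j], high[j] = edges[b], edges[b + 1]
        sums[b] += predict(high) - predict(low)  # local effect within the bin
        counts[b] += 1
    # Accumulate the mean local effect bin by bin.
    ale = [0.0]
    for s, c in zip(sums, counts):
        ale.append(ale[-1] + (s / c if c else 0.0))
    # Center: subtract the data-weighted average of the curve.
    mean_ale = sum(c * (ale[b] + ale[b + 1]) / 2 for b, c in enumerate(counts)) / sum(counts)
    return edges, [a - mean_ale for a in ale]
```

Because each difference is evaluated only on rows that actually fall in the bin, the estimate avoids extrapolating to unrealistic feature combinations, which is the property that makes ALE suitable for correlated predictors.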
To enhance patient-level interpretability and clinical auditability, we employed SHAP to decompose predictions into additive feature contributions [33]. Each SHAP value $\phi_i$ represents the marginal contribution of a feature relative to all possible feature coalitions:
$$\phi_i = \sum_{S \subseteq F \setminus \{x_i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f(S \cup \{x_i\}) - f(S) \right].$$
This allowed us to generate both global importance rankings and patient-specific explanation plots, improving model transparency and facilitating clinical trust. From a medical standpoint, SHAP enables clinicians to trace a patient’s predicted renal risk back to underlying contributing features—e.g., elevated lactate or low platelet count—offering rationale for interventions and supporting explainable AI in nephrotoxic drug management.
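The Shapley formula can be evaluated exactly for a handful of features by enumerating coalitions, which makes the weighting explicit. The sketch below is illustrative only: the toy model, the patient vector, and the baseline-replacement value function are assumptions, and this brute-force enumeration is not how SHAP libraries handle tree ensembles in practice.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset_of_indices)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [p for p in range(n_features) if p != i]
        for size in range(len(others) + 1):
            # |S|! (|F| - |S| - 1)! / |F|!
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Toy model with one interaction term (hypothetical, for illustration).
def toy_model(x):
    return 0.1 + 0.3 * x[0] + 0.2 * x[1] * x[2]

patient = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]

def value_fn(coalition):
    # Features outside the coalition are replaced by their baseline values.
    masked = [patient[k] if k in coalition else baseline[k] for k in range(len(patient))]
    return toy_model(masked)

phi = shapley_values(value_fn, len(patient))
```

The contributions sum to the difference between the patient's prediction and the baseline prediction, which is the additivity property that patient-level SHAP explanation plots rely on.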
Finally, to quantify uncertainty in individual-level predictions, we implemented Bayesian posterior sampling using the DREAM algorithm. Unlike point estimates, this method generates a full posterior distribution for the predicted probability of vancomycin-associated creatinine elevation, allowing explicit quantification of prediction uncertainty. DREAM was incorporated as a distributional wrapper around CatBoost to capture the variability arising from model stochasticity and input perturbations, providing clinically meaningful credible intervals that help distinguish high-risk patients from those with uncertain or unstable predictions.
The posterior predictive distribution was estimated as:
$$p(y \mid X) = \frac{1}{N \times C} \sum_{i=1}^{C} \sum_{j=1}^{N} p(y \mid \theta_{ij}, X)$$
where $N$ is the number of iterations per chain and $C$ is the number of parallel chains. Each $\theta_{ij}$ represents sampled model parameters from chain $i$ at iteration $j$. This multi-chain, multi-iteration approach improves parameter space exploration and sampling robustness.
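The averaging over chains and iterations amounts to a plain mean of the per-sample predicted probabilities. A minimal sketch, assuming the sampled probabilities have already been collected into a nested list (the name `chain_probs` and its layout are hypothetical):

```python
def posterior_predictive_mean(chain_probs):
    """Average p(y | theta_ij, X) over C chains and N iterations per chain.

    chain_probs[i][j] holds the predicted probability for chain i, iteration j.
    """
    n_chains = len(chain_probs)
    n_iter = len(chain_probs[0])
    if any(len(chain) != n_iter for chain in chain_probs):
        raise ValueError("all chains must have the same number of iterations")
    total = sum(sum(chain) for chain in chain_probs)
    return total / (n_iter * n_chains)
```

Burn-in removal and convergence checks across chains are not shown here and would normally precede this step.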
DREAM is particularly suited for clinical prediction due to its adaptive proposal mechanism and efficient sampling in complex, high-dimensional ICU data. Unlike basic MCMC, DREAM dynamically adjusts its sampling strategy, ensuring more reliable posterior estimation without model retraining.
In ICU practice, this uncertainty-aware prediction supports patient-specific decisions. For example, a high predicted risk with a narrow credible interval may prompt early intervention, while wide intervals may suggest close monitoring instead of immediate treatment changes. This enables more personalized and balanced renal risk management, especially for vancomycin-treated patients.
In summary, Bayesian posterior sampling with DREAM provides an efficient, interpretable way to communicate model uncertainty, enhancing the practical value of creatinine elevation risk prediction in real-time ICU settings.
Together, these statistical analyses strengthened the model’s validity, interpretability, and real-world relevance. They ensured that our framework is not only predictive but also transparent and clinically actionable for forecasting early creatinine elevation and renal risk in ICU patients receiving vancomycin.

3. Results

3.1. Cohort Characteristics and Statistical Comparison

The study cohort consists of critically ill patients who received vancomycin during their ICU stay. This study specifically focuses on identifying patient profiles that are more prone to developing significant creatinine elevation following vancomycin exposure. Although the creatinine changes observed meet the diagnostic thresholds typically used for AKI, the research target is the creatinine response associated with vancomycin, not the broader clinical syndrome of AKI. This distinction emphasizes the interest in vancomycin-induced renal effects rather than all-cause AKI.
The dataset was randomly divided using stratified sampling into a training set (70%) and a test set (30%), ensuring that the distribution of creatinine elevation events was balanced across both subsets. This partitioning minimizes sampling bias and enhances the reliability and generalizability of the model.
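A stratified 70/30 split can be sketched by shuffling indices within each outcome class and taking the same fraction from each. The implementation below is an illustrative assumption about the mechanics (the function name and seed are hypothetical); the label counts in the usage lines are the cohort totals reported in this article.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved in both."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# 2903 creatinine elevation events among 10,288 patients.
labels = [1] * 2903 + [0] * (10288 - 2903)
train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=42)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

Both subsets end up with an event rate close to the cohort-wide 28.2%, which is what the stratification is meant to guarantee.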
As shown in Table 3, none of the 19 clinical variables demonstrated statistically significant differences ($p > 0.05$) between the training and test sets, confirming the internal consistency and comparability of the two cohorts.
Table 4 presents the baseline characteristics of the creatinine elevation and non-elevation groups. Patients with creatinine elevation exhibited higher levels of phosphate, lactate, PTT, anion gap, AST, and total bilirubin, along with lower platelet counts, more negative arterial base excess, and reduced Richmond-RAS scores. These differences suggest a combination of metabolic, coagulative, and neurologic disturbances. Additionally, the creatinine elevation group tended to be older and had higher Charlson Comorbidity Index scores, indicating a greater chronic disease burden.
These findings are consistent with established mechanisms of vancomycin-associated nephrotoxicity, including oxidative stress, tubular injury, and impaired renal perfusion. By focusing specifically on creatinine elevation linked to vancomycin use, this study offers a targeted perspective on medication-associated renal risk rather than the broader context of ICU-related AKI. The internal consistency of the dataset, together with the biological plausibility of the identified predictors, supports the clinical relevance and reliability of the proposed model for assessing the risk of creatinine elevation following vancomycin administration.

3.2. Feature Contribution Analysis

To evaluate individual feature contributions for predicting vancomycin-associated creatinine elevation in critically ill patients, we conducted an ablation analysis. As illustrated in Figure 2, each feature was systematically removed, and a logistic regression classifier was retrained using bootstrap sampling to assess its marginal impact on AUROC. The red dashed line represents the baseline AUROC (0.800) achieved when all features were included. This analysis aimed to quantify the predictive utility of each variable independently of specific model architectures, providing information on which clinical factors most strongly contribute to the evaluation of nephrotoxicity risk.
Notably, the exclusion of phosphate led to the most substantial decline in model performance (AUROC dropping to approximately 0.780), indicating that phosphate plays the dominant role in identifying patients at higher risk of creatinine elevation following vancomycin exposure. This is consistent with clinical observations that abnormal phosphate levels may reflect early renal stress or tubular dysfunction, which are critical in the development of drug-associated nephrotoxicity.
Other features with considerable impact included APSIII and magnesium, which showed notable performance decreases when removed (AUROC approximately 0.793 and 0.792 respectively), highlighting the importance of acute illness severity and electrolyte balance in nephrotoxicity risk assessment. Charlson Comorbidity Index and arterial line presence also demonstrated moderate contributions to predictive performance. In contrast, features such as ED duration, anion gap, and several laboratory values showed minimal impact when excluded, suggesting these variables provide limited unique predictive information in this clinical context.
It is important to emphasize that this ablation analysis serves a different purpose than traditional model ablation studies. Rather than evaluating model robustness, this analysis specifically quantifies individual feature contributions using logistic regression as a standardized baseline classifier. This approach provides model-agnostic insights into feature utility that may differ from contributions in more complex architectures, as non-linear interactions and feature substitution effects are not captured by the linear baseline.
Overall, the ablation results demonstrate clear hierarchical importance among clinical variables, with phosphate, acute illness severity, and key electrolyte markers providing the strongest individual contributions to vancomycin nephrotoxicity prediction. These findings support the biological plausibility of the feature set and provide valuable insights for clinical risk factor prioritization.

3.3. Model Performance on Creatinine Elevation Risk Prediction

To evaluate the capacity of different algorithms to predict vancomycin-associated creatinine elevation among ICU patients, we tested six widely used machine learning models. Performance metrics on the test set—including AUROC, sensitivity, specificity, F1-score, and predictive values—are summarized in Table 5. ROC curves for the test set are illustrated in Figure 3.
Our dataset exhibited moderate class imbalance with 28.2% nephrotoxicity cases (2903 of 10,288 patients). To ensure fair comparison across algorithms, sensitivity was manually fixed at 0.800 for all models, allowing direct evaluation of specificity and precision trade-offs at this clinically relevant threshold.
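Fixing sensitivity at a common value means choosing, for each model, the score threshold at which the target fraction of true events is flagged, and then reading off specificity at that threshold. The sketch below shows one way this could be done; the function names and the tie-handling rule are assumptions, since the text does not describe the exact procedure.

```python
import math

def threshold_for_sensitivity(y_true, y_score, target=0.80):
    """Largest threshold t such that flagging score >= t reaches the target sensitivity."""
    positives = sorted((s for y, s in zip(y_true, y_score) if y == 1), reverse=True)
    if not positives:
        raise ValueError("no positive cases")
    k = max(1, math.ceil(target * len(positives)))  # positives that must be flagged
    return positives[k - 1]

def sensitivity_specificity(y_true, y_score, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        flagged = s >= threshold
        if y == 1:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return tp / (tp + fn), tn / (tn + fp)
```

Applying the same target to every model puts the comparison on equal footing: differences in specificity and PPV then reflect ranking quality rather than an arbitrary operating point.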
Among the models evaluated, CatBoost achieved the highest AUROC of 0.818 (95% CI: 0.801–0.834), indicating strong discriminatory ability. It also delivered the best overall accuracy (0.714), the highest F1-score (0.605), and a specificity of 0.681 at the fixed sensitivity threshold, so the model captures most high-risk patients without overwhelming clinicians with false alarms. Critically, CatBoost’s performance demonstrates genuine predictive skill beyond baseline rates: its PPV of 0.486 is roughly 72% higher in relative terms than the 0.282 baseline prevalence, while the NPV of 0.900 substantially exceeds the 0.718 rate that a naive “always safe” classifier would achieve. Put differently, of every 1000 patients the model labels low-risk, about 900 are truly low-risk, compared with about 718 per 1000 under the naive rule, a gain of roughly 182 correct low-risk calls per 1000 negative predictions.
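The relationship between these figures can be checked with Bayes' rule. The sketch below derives PPV, NPV, accuracy, and F1 from the reported sensitivity, specificity, and cohort-wide prevalence; the function name is hypothetical. The implied values land close to the reported test-set figures, and small differences would be expected because the exact test-set prevalence and rounding are not reproduced here.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV, NPV, accuracy and F1 implied by sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": tp + tn,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Reported operating point and cohort-wide event rate.
implied = predictive_values(sensitivity=0.800, specificity=0.681, prevalence=0.282)

# Arithmetic behind the two comparisons quoted in the text.
relative_ppv_gain = 0.486 / 0.282 - 1                    # about 0.72
extra_true_negatives_per_1000 = (0.900 - 0.718) * 1000   # about 182
```

Because PPV and NPV depend on prevalence, the same model would show different predictive values in an ICU population with a different baseline rate of creatinine elevation.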
From a clinical perspective, this level of performance is particularly valuable in real-world ICU settings where vancomycin is commonly used to treat severe infections but carries well-documented nephrotoxic potential. Even a modest rise in creatinine can signal the early stages of renal injury in ICU patients, where rapid clinical deterioration is possible. In practice, this model can support timely risk stratification by identifying patients who may benefit from intensified renal monitoring, dose adjustment, or consideration of alternative therapies. For example, correctly flagging 80% of future creatinine elevation cases while still safely ruling out nearly 70% of low-risk patients provides actionable guidance that can directly inform bedside decisions.
The high NPV (0.900) is particularly valuable for clinical decision-making, enabling two complementary treatment strategies: confidently continuing vancomycin therapy in patients predicted as low-risk while implementing enhanced monitoring protocols for high-risk patients. This dual approach helps avoid both unnecessary treatment interruptions in safe patients and delayed intervention in vulnerable patients, optimizing antimicrobial stewardship without compromising patient safety.
CatBoost was ultimately selected as the final model not only for its superior test performance but also for its robustness to missing data and its ability to handle heterogeneous ICU feature sets. Its interpretability is strengthened by its reliance on physiologically meaningful predictors, such as phosphate, bilirubin, and comorbidity burden, which are strongly associated with renal stress mechanisms linked to vancomycin use. This ensures that the model’s outputs are transparent and clinically intuitive, even for readers without a background in machine learning.
In summary, by prioritizing high sensitivity and carefully balancing specificity, this framework provides a practical, explainable tool to support early detection of vancomycin-associated creatinine elevation and renal injury risk in ICU patients. It offers meaningful, real-time clinical support without compromising interpretability or safety.

3.4. SHAP Analysis and Feature Attribution

To interpret the contribution of individual variables in predicting vancomycin-associated creatinine elevation among ICU patients, SHAP analysis was applied to the CatBoost model. Figure 4 presents a SHAP summary plot, where each point represents an individual patient. The x-axis shows the SHAP value, which reflects the degree to which a feature influences the model’s prediction for that patient. Colors indicate the feature value: red points correspond to high values, and blue points represent low values.
Phosphate was the most influential predictor, with high phosphate levels (red points) consistently positioned on the right side of the plot, indicating that elevated phosphate levels substantially increase the predicted risk of creatinine elevation. This pattern is clinically plausible, as phosphate retention may signal impaired renal handling and early kidney stress.
Total bilirubin and magnesium also ranked among the top contributors. For these features, red and blue points are more evenly spread across the SHAP axis, suggesting complex and potentially non-linear relationships. For example, the impact of magnesium appears bidirectional, where both low and high magnesium levels may influence creatinine trajectories, which is reflected in red and blue points appearing on both sides of the SHAP axis.
Charlson Comorbidity Index and APSIII also demonstrated significant impact. Higher scores (red points) typically increased predicted risk, reflecting the influence of chronic disease burden and acute illness severity on renal vulnerability. The more concentrated distribution of red points on the right side for these variables suggests a relatively straightforward, monotonic relationship.
The colored point distribution across the SHAP axis offers insight into both the importance and the detailed effect patterns of each variable. Features where red points consistently shift predictions to the right imply a clear risk-enhancing role, while mixed distributions suggest subtler, context-dependent effects. The model’s reliance on physiologically meaningful predictors, as reflected in these SHAP patterns, supports its clinical interpretability and enhances its potential for integration into real-world ICU risk management, where early identification of vancomycin-related renal injury risk can inform timely monitoring and intervention.

3.5. ALE Analysis and Clinical Interpretability

To further investigate the local interpretability and clinical plausibility of our CatBoost model in predicting vancomycin-associated creatinine elevation among ICU patients, we conducted an ALE analysis focusing on four high-impact features: phosphate, APSIII, magnesium, and total bilirubin. The ALE plots, shown in Figure 5, illustrate the marginal effect of each variable on the model’s output while accounting for the influence of other features.
Phosphate exhibited a steep positive ALE effect between values of 2 and 5 mg/dL, after which the curve plateaus and slightly decreases. This suggests that increasing phosphate levels strongly raise the predicted risk of creatinine elevation up to a certain threshold, beyond which the effect saturates. The sharp rise at lower levels aligns with clinical understanding that even moderate phosphate elevation may reflect early renal stress or impaired clearance, especially in vancomycin-treated patients.
For APSIII, a widely used severity score, the ALE curve shows a continuous upward trend up to approximately 70, indicating that patients with more severe physiological disturbances are more likely to experience creatinine elevation following vancomycin exposure. The flattening of the curve at higher APSIII scores likely reflects the model’s conservative adjustments in data-sparse regions, which is a desirable trait for maintaining prediction stability.
Magnesium demonstrated a non-linear effect: the ALE sharply increases from 1.5 to approximately 2.5 mEq/L, then stabilizes or slightly decreases. This pattern may suggest that both low and excessively high magnesium levels are clinically relevant, but the model is most sensitive to moderate elevations in magnesium, which may be linked to systemic imbalance or renal susceptibility.
Total bilirubin’s ALE plot shows a rapid positive effect starting at low levels, which continues to rise gradually. This indicates that elevated bilirubin consistently increases the predicted risk, possibly reflecting the impact of hepatic-renal interactions or systemic inflammation that can exacerbate renal stress in critically ill patients.
Overall, the ALE analysis confirms that the model’s risk estimations are biologically coherent and sensitive to clinically relevant ranges of key predictors. The smooth, interpretable curves indicate that the model does not overfit to outliers and responds appropriately to gradual physiological changes. These results support the model’s potential for real-time risk stratification in ICU settings, where early identification of vancomycin-associated creatinine elevation can guide timely monitoring and intervention to mitigate renal injury risk.

3.6. Posterior Distribution and Prediction of Vancomycin-Related Creatinine Elevation

To incorporate uncertainty into the prediction of vancomycin-associated creatinine elevation among ICU patients, we applied the DREAM algorithm to the trained CatBoost model. Unlike traditional deterministic predictions, this approach generates a posterior distribution over the predicted probability, allowing clinicians to assess not only the most likely risk estimate but also the associated confidence range for individual patients. This is particularly valuable in ICU settings, where misjudging renal risk can lead to either delayed intervention or unnecessary treatment changes.
In this study, we set the number of iterations to 2000 and used 38 parallel chains, following the recommended practice of employing at least twice the number of model parameters (19 variables) to ensure sufficient posterior exploration. This configuration allows the DREAM algorithm to efficiently traverse the parameter space, reduce the risk of local convergence, and produce stable, reliable posterior distributions. Adequate iterations and chains are essential to capture uncertainty accurately, particularly in high-stakes ICU risk prediction tasks.
As shown in Figure 6, the posterior distribution for a representative high-risk patient is skewed toward higher predicted probabilities, with a mean risk of 60.5% and a 95% credible interval ranging from 16.8% to 89.4%. In comparison to the overall cohort creatinine elevation rate of 28.22%, this patient demonstrates substantially increased risk. The wide credible interval highlights clinical uncertainty, indicating that while the model identifies this patient as high-risk, variability in clinical features could significantly influence the actual outcome.
The high-risk patient profile used in the posterior simulation was constructed based on the creatinine elevation subgroup described in Table 4, which includes elevated phosphate, bilirubin, magnesium, and higher Charlson comorbidity index scores. This ensures that the sampling process is grounded in real-world ICU scenarios and reflects clinically plausible risk patterns for vancomycin-associated renal injury, rather than relying on theoretical or averaged inputs.
This probabilistic framework provides meaningful clinical nuance. Two patients may have similar mean predicted risks but differ in the width of their credible intervals—one presenting with high certainty and another with considerable uncertainty. For patients with wide intervals, clinicians may choose to prioritize enhanced monitoring over immediate medication adjustments, recognizing the potential variability in risk trajectories.
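The contrast between two patients with similar mean risk but different certainty can be illustrated by summarizing posterior samples with an equal-tailed 95% credible interval. The sketch below uses the standard library's quantile function; both sample sets are synthetic and purely hypothetical, chosen only so that the means coincide while the spreads differ.

```python
import statistics

def credible_interval_95(samples):
    """Equal-tailed 95% credible interval from posterior samples."""
    cuts = statistics.quantiles(samples, n=40, method="inclusive")
    return cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles

# Synthetic posterior samples (assumption): same mean, different spread.
narrow = [0.55 + 0.001 * k for k in range(101)]  # spans 0.55 to 0.65
wide = [0.20 + 0.008 * k for k in range(101)]    # spans 0.20 to 1.00

narrow_mean, wide_mean = statistics.fmean(narrow), statistics.fmean(wide)
narrow_ci = credible_interval_95(narrow)
wide_ci = credible_interval_95(wide)
narrow_width = narrow_ci[1] - narrow_ci[0]
wide_width = wide_ci[1] - wide_ci[0]
```

Reporting the interval width alongside the mean gives a simple numeric handle on the distinction drawn above between confident and uncertain predictions.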
Importantly, DREAM operates without retraining the CatBoost model. It conditions posterior sampling on prior distributions derived from observed creatinine elevation cases, enabling computationally efficient uncertainty quantification that can be applied in real-time. This makes the approach practical for bedside decision support.
In summary, integrating uncertainty-aware prediction with CatBoost and DREAM offers a clinically interpretable and statistically robust framework to assess the risk of creatinine elevation during vancomycin treatment in ICU patients. This provides a clinical decision-support tool rather than establishing causal conclusions, enhancing physicians’ ability to incorporate uncertainty estimates into their clinical judgment when managing nephrotoxic therapy and optimizing renal risk surveillance.

4. Discussion

4.1. Summary of Existing Model Compilation

In this study, we developed a machine learning pipeline using the MIMIC-IV critical care database to predict vancomycin-associated renal injury in ICU patients, defined by significant elevations in serum creatinine. Our approach encompassed data extraction of ICU stays where patients received vancomycin, rigorous preprocessing and feature engineering, and the training/evaluation of multiple candidate algorithms. Notably, a set of both linear and non-linear models (logistic regression, Gaussian Naïve Bayes, three gradient boosting variants, and a shallow neural network) was explored. Among these, the CatBoost gradient boosting model emerged as the top performer, yielding the highest discrimination for predicting creatinine elevation. CatBoost’s superior performance can be attributed to its ability to handle heterogeneous clinical features (including categorical variables and missing values) and capture complex non-linear interactions without extensive tuning. On the held-out test set, CatBoost achieved the highest AUROC, accuracy, and F1-score among the compared models; calibration was not formally assessed (see Section 4.3). The final CatBoost model demonstrated robust predictive ability for early identification of patients at risk of vancomycin-associated creatinine elevation. We also examined feature importance to interpret the model: key predictors included serum phosphate, total bilirubin, magnesium, the Charlson Comorbidity Index, and the APSIII score. These findings underscore that a data-driven ML approach can synthesize many risk factors into an effective predictor, potentially enabling early warnings for impending renal impairment during vancomycin therapy.

4.2. Comparison with Prior Studies

Our results align with and extend the observations of prior clinical studies on vancomycin’s nephrotoxic effects. It is well established that vancomycin has a narrow therapeutic window and can precipitate kidney injury, especially in critically ill patients; reported incidences of AKI in ICU patients on vancomycin range up to 40% [34]. Earlier investigations primarily focused on identifying risk factors for vancomycin-induced nephrotoxicity. For instance, high vancomycin trough concentrations (15–20 mg/L) have consistently been associated with increased rates of creatinine elevation and AKI. van Hal et al. [35] conducted a meta-analysis and found that maintaining trough levels in the 15–20 mg/L range significantly heightens nephrotoxicity risk compared to lower targets. Likewise, Bosso et al. [36] reported a clear relationship between elevated vancomycin troughs and subsequent acute renal dysfunction in a prospective multicenter trial. These studies support the paradigm that aggressive vancomycin dosing can be harmful to the kidneys, corroborating our model’s inclusion of vancomycin exposure metrics as important predictors.
Beyond drug levels, patient-specific factors identified previously are in line with our findings. Age and critical illness severity, for example, are known to modulate vancomycin’s nephrotoxic risk. One study observed that patients over 80 years old experienced higher rates of vancomycin-related renal injury [37]. Concomitant nephrotoxic agents also play a role: the combination of vancomycin with piperacillin–tazobactam has been associated with a significantly higher incidence of AKI than vancomycin alone [38]. Our model implicitly captured such effects—for example, the presence of nephrotoxin co-administration featured among the most influential predictors contributing to risk stratification.
Notably, few earlier efforts have used machine learning for this problem. A recent study by Aghamirzaei et al. [39] developed a stacking ensemble model for vancomycin-induced AKI prediction in 314 ICU patients, reporting an impressive AUC of 0.94. Their model highlighted variables like serum creatinine trends and glucose variability. Our study builds on this with a substantially larger cohort from MIMIC-IV and demonstrates that even a single advanced algorithm (CatBoost) can achieve high predictive performance. In contrast to rule-based clinical scoring or traditional regression, our ML approach offers improved sensitivity for complex, non-linear interactions and patient heterogeneity.
A recent study by Bao et al. published in Frontiers in Pharmacology [40] also explored vancomycin-associated nephrotoxicity using the MIMIC-IV database. However, their method employed AKI diagnosis labels without access to exact onset times, a limitation stemming from MIMIC-IV’s lack of time-stamped AKI events. This hinders the ability to confirm that AKI occurred after vancomycin administration. In contrast, our study uses time-stamped serum creatinine trajectories to define kidney injury, ensuring that elevated creatinine levels follow vancomycin initiation. This temporal precision enhances causal interpretability and clinical relevance. Moreover, our focus on routine creatinine monitoring supports real-time risk prediction at the bedside—an advantage over AKI-label-based approaches that lack precise timing and are less actionable.
Additionally, pharmacologic monitoring supports the value of early prediction. Zamoner et al. [34] found that an abnormal vancomycin serum level could predict impending AKI roughly 48 h before diagnosis in septic ICU patients. Such findings resonate with our goal of early detection. By deploying a predictive model that continuously ingests routine EHR data—including labs, vitals, and drug administration—our work contributes a practical approach for real-time clinical integration. In summary, whereas prior studies established the risk factors and incidence of vancomycin nephrotoxicity, our model integrates those insights into a comprehensive, interpretable, and prospective prediction tool, enabling earlier intervention and improved patient safety.

4.3. Limitations and Future Work

This study has several important limitations. First, our retrospective design using MIMIC-IV data from a single academic medical center may limit generalizability to other ICU populations with different patient demographics, clinical protocols, or resource constraints. The temporal span (2008–2019) may not reflect current clinical practices, potentially affecting model performance in contemporary settings.
Second, our nephrotoxicity definition relies solely on serum creatinine elevation, which is a delayed marker that may miss subclinical renal injury or cases where elevation is masked by clinical factors. The absence of more sensitive biomarkers (NGAL, cystatin C) or functional assessments limits our ability to detect early nephrotoxicity.
Third, methodological constraints include potential imputation bias from missing data patterns, unmeasured confounding factors (genetic predisposition, subclinical kidney disease, concomitant nephrotoxic medications such as piperacillin-tazobactam), and feature selection limitations that may have excluded clinically relevant variables such as detailed vancomycin pharmacokinetics or hemodynamic parameters.
Fourth, while our CatBoost model achieved strong performance, its ensemble nature creates interpretability challenges for clinical adoption. The evaluation focused primarily on discrimination metrics without extensive exploration of calibration or clinical utility measures, and performance thresholds were chosen arbitrarily rather than through clinical consensus.
Fifth, comprehensive model calibration assessment using metrics such as Brier score, reliability plots, and Expected Calibration Error was not performed, which would provide additional insights into the model’s predictive reliability across different probability ranges.
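For reference, two of the calibration metrics named above can be computed from predicted probabilities and observed outcomes as in the minimal sketch below. It uses equal-width probability bins for the Expected Calibration Error; the bin count of 10 and the function names are illustrative assumptions, and no such analysis was performed in this study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean of |observed event rate - mean predicted probability| per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((y, p))

    total = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / total) * abs(observed - predicted)
    return ece


# Toy example with made-up values (not study data)
outcomes = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.6, 0.9]
print(brier_score(outcomes, probabilities))
print(expected_calibration_error(outcomes, probabilities))
```

The per-bin pairs of observed rate and mean predicted probability are also the points that a reliability plot would display.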
Moreover, the present study extracted only the most recent measurement prior to the first vancomycin administration and did not incorporate longitudinal laboratory or hemodynamic trajectories (e.g., creatinine slope, electrolyte variability, or hypotension duration). These temporal dynamics may contain clinically meaningful signals for early kidney stress, but incorporating them would require a fundamentally different modeling framework—such as sequence-based or time-series architectures—beyond the scope of the current single-time-point design. Future work will integrate longitudinal feature engineering to capture dynamic trends that may further enhance predictive performance.
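One simple form of longitudinal feature engineering is a per-patient trend summary, for example a creatinine slope. The sketch below fits an ordinary least-squares line to repeated measurements; the function name and the choice of hours as the time unit are assumptions for illustration, and this feature was not part of the present model.

```python
def ols_slope(times_h, values):
    """Least-squares slope of values against time (units of value per hour).

    Returns None when fewer than two points or no variation in time
    makes the slope undefined.
    """
    n = len(times_h)
    if n < 2:
        return None
    mean_t = sum(times_h) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        return None  # all measurements share one timestamp
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_h, values))
    return sxy / sxx


# Toy example with made-up values: creatinine (mg/dL) at 0, 12, and 24 h
slope = ols_slope([0, 12, 24], [1.0, 1.2, 1.4])
print(slope)  # mg/dL per hour
```

A summary of this kind stays compatible with a tabular, single-row-per-patient design, whereas full sequence models would need the different framework described above.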
Finally, clinical implementation faces significant barriers including workflow integration challenges, regulatory validation requirements, and potential algorithmic bias across demographic groups. Our model requires real-time access to multiple data streams that may not be consistently available in all clinical settings.
Future work should prioritize multi-center external validation across diverse healthcare systems to establish broader generalizability. Prospective randomized controlled trials comparing model-guided versus standard monitoring strategies are essential to demonstrate clinical utility and cost-effectiveness using meaningful endpoints such as time to nephrotoxicity detection and clinical outcomes.
Methodological enhancements should include integration of advanced biomarkers when available, development of longitudinal time-series models to capture dynamic risk evolution, and incorporation of detailed vancomycin pharmacokinetic data for more mechanistically informed predictions. Advanced analytical approaches should employ causal inference techniques, systematic fairness evaluation across demographic subgroups, and federated learning approaches for collaborative model development while preserving data privacy.
Implementation research should investigate human-AI collaboration patterns, conduct comprehensive health economic evaluations, and examine organizational barriers to clinical adoption. The development of bias mitigation strategies and systematic evaluation of model performance across diverse patient populations will be crucial for ensuring equitable clinical application.
Future success depends on addressing these limitations through rigorous validation studies, enhanced methodological approaches, and careful attention to clinical implementation challenges. Collaborative efforts among data scientists, clinicians, regulatory bodies, and healthcare systems will be essential to ensure these tools meaningfully improve patient outcomes while maintaining safety and equity in clinical care.

5. Conclusions

This study demonstrates that machine learning can successfully predict creatinine elevation during vancomycin treatment using routinely collected ICU data, achieving clinically meaningful performance while maintaining interpretability and supporting real-world implementation. The framework represents a significant step toward personalized, data-driven approaches to drug safety monitoring in critical care. However, the transition from research demonstration to clinical implementation requires continued commitment to rigorous validation, careful attention to clinical workflow integration, and systematic consideration of ethical implications. The ultimate measure of success will not be algorithmic performance alone, but demonstrable improvement in patient outcomes, clinical efficiency, and healthcare equity. By bridging the gap between retrospective risk identification and prospective, actionable prediction, this work underscores the transformative potential of interpretable machine learning for enhancing drug safety and optimizing clinical care in high-risk populations. The path forward demands collaborative efforts among researchers, clinicians, healthcare systems, and regulatory bodies to ensure that these technological advances translate into meaningful improvements in patient care and clinical outcomes. Future research must balance methodological sophistication with clinical practicality, ensuring that the promise of precision medicine in drug safety monitoring becomes a reality that serves all patients effectively and equitably. The foundation established by this work provides a robust platform for continued innovation in machine learning applications for healthcare, with the ultimate goal of improving patient safety and clinical decision-making in critical care environments.

Author Contributions

J.F. conceptualized the study, developed the methodological framework, conducted the experiments, performed data analysis, and drafted the original manuscript. L.S. and S.C. contributed to data preprocessing, model development, and manuscript preparation. Y.S. and M.A. provided technical support for machine learning implementation and model validation. M.P. supervised the project, coordinated research efforts, and provided strategic guidance throughout the study. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. This study used de-identified data from the publicly available MIMIC-IV database (PhysioNet), which has received ethics approval from the Institutional Review Board at Beth Israel Deaconess Medical Center and Massachusetts Institute of Technology. No additional ethical review was required for this secondary analysis.

Informed Consent Statement

Not applicable. This study used de-identified patient data from the MIMIC-IV database. No identifiable human subjects were involved.

Data Availability Statement

The datasets used in this study are available through PhysioNet, a repository of freely available medical research data managed by the MIT Laboratory for Computational Physiology. The MIMIC-IV database is available online at https://physionet.org/content/mimiciv/ (accessed on 4 December 2025) after completing the required training course and signing the data use agreement. Access requires registration and approval but is free for researchers. Due to the sensitive nature of clinical data, direct sharing of the processed datasets is not permitted under the data use agreement. Our analysis code and methodology are fully described to enable reproduction of results. The source code for this study is publicly available at https://github.com/JunyiTim/CatBoost-Model-for-Predicting-Creatinine-Elevation-in-Vancomycin-Treated-ICU-Patients (accessed on 4 December 2025).

Acknowledgments

The authors would like to acknowledge the Laboratory for Computational Physiology at the Massachusetts Institute of Technology for maintaining the MIMIC-IV database.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ICU: Intensive Care Unit
CatBoost: Categorical Boosting
MIMIC-IV: Medical Information Mart for Intensive Care IV
KDIGO: Kidney Disease: Improving Global Outcomes
AUROC: Area Under the Receiver Operating Characteristic
SHAP: SHapley Additive exPlanations
ALE: Accumulated Local Effects
AKI: Acute Kidney Injury
AUC: Area Under the Curve
MRSA: Methicillin-Resistant Staphylococcus Aureus
VRE: Vancomycin-Resistant Enterococcus
CCI: Charlson Comorbidity Index
APSIII: Acute Physiology Score III
NPV: Negative Predictive Value
PPV: Positive Predictive Value
ED: Emergency Department
SMOTE: Synthetic Minority Over-sampling Technique
RASS: Richmond Agitation-Sedation Scale
AST: Aspartate Aminotransferase
SpO2: Peripheral Oxygen Saturation
BUN: Blood Urea Nitrogen
INR: International Normalized Ratio
PTT: Partial Thromboplastin Time
ANOVA: Analysis of Variance
LightGBM: Light Gradient Boosting Machine
XGBoost: eXtreme Gradient Boosting
DREAM: Differential Evolution Adaptive Metropolis

References

  1. Griffith, R.S. Introduction to vancomycin. Rev. Infect. Dis. 1981, 3, S200–S204.
  2. Levine, D.P. Vancomycin: A history. Clin. Infect. Dis. 2006, 42, S5–S12.
  3. Moellering, R.C., Jr. Vancomycin: A 50-year reassessment. Clin. Infect. Dis. 2006, 42, S3–S4.
  4. Liu, C.; Bayer, A.; Cosgrove, S.E.; Daum, R.S.; Fridkin, S.K.; Gorwitz, R.J.; Kaplan, S.L.; Karchmer, A.W.; Levine, D.P.; Murray, B.E.; et al. Clinical practice guidelines by the Infectious Diseases Society of America for the treatment of methicillin-resistant Staphylococcus aureus infections in adults and children. Clin. Infect. Dis. 2011, 52, e18–e55.
  5. Stevens, D.L. The role of vancomycin in the treatment paradigm. Clin. Infect. Dis. 2006, 42, S51–S57.
  6. He, N.; Su, S.; Ye, Z.; Du, G.; He, B.; Li, D.; Liu, Y.; Yang, K.; Zhang, X.; Zhang, Y.; et al. Evidence-based guideline for therapeutic drug monitoring of vancomycin: 2020 update by the division of therapeutic drug monitoring, Chinese pharmacological society. Clin. Infect. Dis. 2020, 71, S363–S371.
  7. Kirst, H.A.; Thompson, D.G.; Nicas, T.I. Historical yearly usage of vancomycin. Antimicrob. Agents Chemother. 1998, 42, 1303–1304.
  8. Kwiatkowska, E.; Domański, L.; Dziedziejko, V.; Kajdy, A.; Stefańska, K.; Kwiatkowski, S. The mechanism of drug nephrotoxicity and the methods for preventing kidney damage. Int. J. Mol. Sci. 2021, 22, 6109.
  9. Campbell, R.E.; Chen, C.H.; Edelstein, C.L. Overview of antibiotic-induced nephrotoxicity. Kidney Int. Rep. 2023, 8, 2211–2225.
  10. Kan, W.C.; Chen, Y.C.; Wu, V.C.; Shiao, C.C. Vancomycin-associated acute kidney injury: A narrative review from pathophysiology to clinical application. Int. J. Mol. Sci. 2022, 23, 2052.
  11. Džidić-Krivić, A.; Sher, E.K.; Kusturica, J.; Farhat, E.K.; Nawaz, A.; Sher, F. Unveiling drug induced nephrotoxicity using novel biomarkers and cutting-edge preventive strategies. Chem.-Biol. Interact. 2024, 388, 110838.
  12. Hammond, D.A.; Smith, M.N.; Li, C.; Hayes, S.M.; Lusardi, K.; Bookstaver, P.B. Systematic review and metaanalysis of acute kidney injury associated with concomitant vancomycin and piperacillin/tazobactam. Clin. Infect. Dis. 2017, 64, 666–674.
  13. Venugopalan, V.; Maranchick, N.; Hanai, D.; Hernandez, Y.J.; Joseph, Y.; Gore, A.; Desear, K.; Peloquin, C.; Neely, M.; Felton, T.; et al. Association of piperacillin and vancomycin exposure on acute kidney injury during combination therapy. JAC-Antimicrob. Resist. 2024, 6, dlad157.
  14. Aljefri, D.M.; Avedissian, S.N.; Rhodes, N.J.; Postelnick, M.J.; Nguyen, K.; Scheetz, M.H. Vancomycin area under the curve and acute kidney injury: A meta-analysis. Clin. Infect. Dis. 2019, 69, 1881–1887.
  15. Kim, J.Y.; Yee, J.; Yoon, H.Y.; Han, J.M.; Gwak, H.S. Risk factors for vancomycin-associated acute kidney injury: A systematic review and meta-analysis. Br. J. Clin. Pharmacol. 2022, 88, 3977–3989.
  16. Johnson, A.; Bulgarelli, L.; Pollard, T.; Horng, S.; Celi, L.A.; Mark, R. MIMIC-IV. PhysioNet. 2020, pp. 49–55. Available online: https://physionet.org/content/mimiciv/1.0/ (accessed on 23 August 2021).
  17. Khwaja, A. KDIGO clinical practice guidelines for acute kidney injury. Nephron Clin. Pract. 2012, 120, c179–c184.
  18. Kellum, J.A.; Lameire, N.; Group, K.A.G.W. Diagnosis, evaluation, and management of acute kidney injury: A KDIGO summary (Part 1). Crit. Care 2013, 17, 1–15.
  19. Fan, J.; Chen, S.; Sun, L.; Si, Y.; Pishgar, E.; Alaei, K.; Placencia, G.; Pishgar, M. Predicting Short-Term Mortality in Elderly ICU Patients with Diabetes and Heart Failure: A Distributional Inference Framework. arXiv 2025, arXiv:2506.15058.
  20. Chen, S.; Si, Y.; Fan, J.; Sun, L.; Placencia, G.; Pishgar, E.; Alaei, K.; Pishgar, M. Interpretable Machine Learning Model for Early Prediction of 30-Day Mortality in ICU Patients With Coexisting Hypertension and Atrial Fibrillation: A Retrospective Cohort Study. arXiv 2025, arXiv:2506.15036.
  21. Du, H.; Li, Z.; Yang, Y.; Li, X.; Wei, Y.; Lin, Y.; Zhuang, X. New insights into the vancomycin-induced nephrotoxicity using in vitro metabolomics combined with physiologically based pharmacokinetic modeling. J. Appl. Toxicol. 2020, 40, 897–907.
  22. Soenksen, L.R.; Ma, Y.; Zeng, C.; Boussioux, L.; Villalobos Carballo, K.; Na, L.; Wiberg, H.M.; Li, M.L.; Fuentes, I.; Bertsimas, D. Integrated multimodal artificial intelligence framework for healthcare applications. npj Digit. Med. 2022, 5, 149.
  23. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  24. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  25. Louppe, G.; Wehenkel, L.; Sutera, A.; Geurts, P. Understanding variable importances in forests of randomized trees. Adv. Neural Inf. Process. Syst. 2013, 26, 431–439.
  26. Chen, J.H.; Asch, S.M. Machine learning and prediction in medicine—Beyond the peak of inflated expectations. N. Engl. J. Med. 2017, 376, 2507–2509.
  27. Rajkomar, A.; Oren, E.; Chen, K.; Dai, A.M.; Hajaj, N.; Hardt, M.; Liu, P.J.; Liu, X.; Marcus, J.; Sun, M.; et al. Scalable and accurate deep learning with electronic health records. npj Digit. Med. 2018, 1, 18.
  28. Lodise, T.P.; Graves, J.; Evans, A.; Graffunder, E.; Helmecke, M.; Lomaestro, B.M.; Stellrecht, K. Relationship between vancomycin MIC and failure among patients with methicillin-resistant Staphylococcus aureus bacteremia treated with vancomycin. Antimicrob. Agents Chemother. 2008, 52, 3315–3320.
  29. Charlson, M.E.; Pompei, P.; Ales, K.L.; MacKenzie, C.R. A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. J. Chronic Dis. 1987, 40, 373–383.
  30. Knaus, W.A.; Wagner, D.P.; Draper, E.A.; Zimmerman, J.E.; Bergner, M.; Bastos, P.G.; Sirio, C.A.; Murphy, D.J.; Lotring, T.; Damiano, A.; et al. The APACHE III prognostic system: Risk prediction of hospital mortality for critically ill hospitalized adults. Chest 1991, 100, 1619–1636.
  31. Collaborative, S.; Drake, T.M.; Cheung, L.K.; Gaba, F.; Glasbey, J.; Griffiths, N.; Helliwell, R.J.; Huq, T.; Khaw, R.; Mayes, J.; et al. Association between peri-operative angiotensin-converting enzyme inhibitors and angiotensin-2 receptor blockers and acute kidney injury in major elective non-cardiac surgery: A multicentre, prospective cohort study. Anaesthesia 2018, 73, 1214–1222.
  32. Chen, S.; Si, Y.; Fan, J.; Sun, L.; Pishgar, E.; Alaei, K.; Placencia, G.; Pishgar, M. Predicting ICU Readmission in Acute Pancreatitis Patients Using a Machine Learning-Based Model with Enhanced Clinical Interpretability. medRxiv. 2025. Available online: https://www.medrxiv.org/content/early/2025/05/12/2025.05.11.25327405.full.pdf (accessed on 1 December 2025).
  33. Yang, J.; Peng, H.; Luo, Y.; Zhu, T.; Xie, L. Explainable ensemble machine learning model for prediction of 28-day mortality risk in patients with sepsis-associated acute kidney injury. Front. Med. 2023, 10, 1165129.
  34. Zamoner, W.; Eid, K.Z.C.; de Almeida, L.M.B.; Pierri, I.G.; dos Santos, A.; Balbi, A.L.; Ponce, D. The serum concentration of vancomycin as a diagnostic predictor of nephrotoxic acute kidney injury in critically ill patients. Antibiotics 2022, 11, 112.
  35. van Hal, S.J.; Paterson, D.L.; Lodise, T.P. Systematic review and meta-analysis of vancomycin-induced nephrotoxicity associated with dosing schedules that maintain troughs between 15 and 20 milligrams per liter. Antimicrob. Agents Chemother. 2013, 57, 734–744.
  36. Bosso, J.A.; Nappi, J.; Rudisill, C.; Wellein, M.; Bookstaver, P.B.; Swindler, J. Relationship between vancomycin trough concentrations and nephrotoxicity: A prospective multicenter trial. Antimicrob. Agents Chemother. 2011, 55, 5475–5482.
  37. Li, L.; Zhang, L.; Li, S.; Xu, F.; Li, L.; Li, S.; Lyu, J.; Yin, H. Effect of first trough vancomycin concentration on the occurrence of acute kidney injury in critically ill patients: A retrospective study of the MIMIC-IV database. Front. Med. 2022, 9, 879861.
  38. Blair, M.; Côté, J.M.; Cotter, A.; Lynch, B.; Redahan, L.; Murray, P.T. Nephrotoxicity from vancomycin combined with piperacillin-tazobactam: A comprehensive review. Am. J. Nephrol. 2021, 52, 85–97.
  39. Aghamirzaei, F.; Abin, A.A.; Futuhi, F. An ensemble machine learning model for early prediction of vancomycin-induced acute kidney injury in ICU patients. Arch. Acad. Emerg. Med. 2025, 13, e45.
  40. Bao, P.; Sun, Y.; Qiu, P.; Li, X. Development and validation of a nomogram to predict the risk of vancomycin-related acute kidney injury in critical care patients. Front. Pharmacol. 2024, 15, 1389140.
Figure 1. Cohort selection process for vancomycin-associated renal injury analysis.
Figure 2. Impact of Feature Removal on LR Model Performance.
Figure 3. AUROC Curves for Model Performance in the Test Set.
Figure 4. SHAP summary plot showing feature contributions to predicted creatinine elevation in vancomycin patients.
Figure 5. ALE plots for top features in vancomycin ICU patients.
Figure 6. Posterior distribution of vancomycin-associated creatinine elevation risk for a high-risk ICU patient.
Table 1. Final 15 selected features (ordered by Random Forest importance).
| Rank | Feature | Importance (Normalized) |
|---|---|---|
| 1 | Phosphate | 0.168 |
| 2 | APS III | 0.151 |
| 3 | Magnesium | 0.123 |
| 4 | Lactate | 0.110 |
| 5 | PTT | 0.094 |
| 6 | Total Bilirubin | 0.078 |
| 7 | Platelet Count | 0.066 |
| 8 | AST | 0.059 |
| 9 | Arterial Base Excess | 0.041 |
| 10 | Richmond-RAS Scale | 0.038 |
| 11 | Mean Airway Pressure | 0.030 |
| 12 | Anion Gap | 0.027 |
| 13 | White Blood Cells | 0.025 |
| 14 | Charlson Comorbidity Index | 0.021 |
| 15 | Age | 0.019 |
Table 2. Final selected features used for predicting vancomycin-associated renal injury.
| Category | Selected Features |
|---|---|
| chartevents | Richmond-RAS Scale, Total Bilirubin, Arterial Base Excess, AST, Braden Mobility, Mean Airway Pressure |
| procedureevents | Arterial Line |
| labevents | Phosphate, Anion Gap, Magnesium, Lactate, PTT, Platelet Count, White Blood Cells, Glucose, APS III |
| admission | Age, ED duration, Magnesium, Charlson Comorbidity Index |
Table 3. T-test Comparison of Feature Distributions between Training and Test Sets.
| Feature | Unit | Training Set | Test Set | p-Value |
|---|---|---|---|---|
| Phosphate | mg/dL | 3.61 (1.19) | 3.62 (1.20) | 0.555 |
| APS III | | 51.99 (22.52) | 51.63 (22.05) | 0.456 |
| Magnesium | mg/dL | 2.09 (0.30) | 2.09 (0.30) | 0.958 |
| Lactate | mmol/L | 2.20 (1.46) | 2.19 (1.41) | 0.654 |
| PTT | s | 40.05 (16.09) | 40.66 (16.81) | 0.089 |
| Anion Gap | mmol/L | 14.17 (3.59) | 14.06 (3.47) | 0.154 |
| Platelet Count | ×10³/µL | 197.33 (108.19) | 196.41 (107.81) | 0.690 |
| Arterial Base Excess | mmol/L | −1.33 (3.72) | −1.27 (3.86) | 0.413 |
| Richmond-RAS Scale | | −1.31 (1.35) | −1.31 (1.39) | 0.923 |
| White Blood Cells | ×10³/µL | 13.03 (9.21) | 12.63 (7.63) | 0.023 |
| Glucose | mg/dL | 144.82 (49.59) | 145.62 (48.87) | 0.447 |
| Mean Airway Pressure | cmH2O | 10.50 (3.28) | 10.52 (3.29) | 0.821 |
| Total Bilirubin | mg/dL | 2.36 (4.23) | 2.26 (4.11) | 0.287 |
| AST | U/L | 319.67 (1000.05) | 281.29 (743.46) | 0.061 |
| Age | years | 61.27 (14.28) | 61.16 (14.31) | 0.701 |
| Braden Mobility | score | 2.39 (0.59) | 2.40 (0.59) | 0.580 |
| ED Duration | h | 3.47 (4.60) | 3.42 (4.27) | 0.582 |
| Charlson Comorbidity Index | score | 4.61 (2.76) | 4.66 (2.81) | 0.473 |
| Arterial Line | binary | 0.61 (0.49) | 0.60 (0.49) | 0.219 |
Note: This table summarizes statistical comparisons between the training and test cohorts. Continuous variables are expressed as mean (standard deviation). p-values are derived from two-sided t-tests, with significance set at p < 0.05.
Table 4. T-test Comparison of Feature Distributions Between Non-Elevation and Elevation.
| Feature | Unit | Non-Elevation Set | Elevation Set | p-Value |
|---|---|---|---|---|
| Phosphate | mg/dL | 3.40 (1.07) | 4.13 (1.31) | <0.001 |
| APS III | | 48.66 (21.05) | 60.44 (23.89) | <0.001 |
| Magnesium | mg/dL | 2.06 (0.28) | 2.16 (0.32) | <0.001 |
| Lactate | mmol/L | 2.05 (1.26) | 2.58 (1.83) | <0.001 |
| PTT | s | 38.65 (15.22) | 43.62 (17.64) | <0.001 |
| Anion Gap | mmol/L | 13.75 (3.34) | 15.23 (3.96) | <0.001 |
| Platelet Count | ×10³/µL | 202.85 (106.86) | 183.31 (110.30) | <0.001 |
| Arterial Base Excess | mmol/L | −0.99 (3.57) | −2.21 (3.96) | <0.001 |
| Richmond-RAS Scale | | −1.17 (1.26) | −1.65 (1.50) | <0.001 |
| White Blood Cells | ×10³/µL | 12.83 (9.72) | 13.52 (7.72) | 0.001 |
| Glucose | mg/dL | 142.96 (48.24) | 149.55 (52.59) | <0.001 |
| Mean Airway Pressure | cmH2O | 10.24 (3.07) | 11.17 (3.69) | <0.001 |
| Total Bilirubin | mg/dL | 1.99 (3.15) | 3.28 (6.08) | <0.001 |
| AST | U/L | 250.16 (758.82) | 496.48 (1427.11) | <0.001 |
| Age | years | 60.80 (14.53) | 62.47 (13.55) | <0.001 |
| Braden Mobility | score | 2.44 (0.58) | 2.25 (0.59) | <0.001 |
| ED Duration | h | 3.51 (4.09) | 3.36 (5.68) | 0.263 |
| Charlson Comorbidity Index | score | 4.37 (2.72) | 5.24 (2.75) | <0.001 |
| Arterial Line | binary | 0.57 (0.50) | 0.71 (0.45) | <0.001 |
Note: This table compares patients with and without vancomycin-associated creatinine elevation. Continuous features are presented as mean (standard deviation). p-values were calculated using two-sided t-tests with a significance threshold of p < 0.05. The creatinine elevation events included in this study meet established diagnostic thresholds commonly used for AKI but specifically represent vancomycin-associated renal response.
Table 5. Performance Comparison of Different Models in the Test Set.
| Model | AUC (95% CI) | Accuracy | F1-Score | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|---|
| CatBoost | 0.818 (0.801–0.834) | 0.714 | 0.605 | 0.800 | 0.681 | 0.486 | 0.900 |
| LightGBM | 0.806 (0.791–0.825) | 0.690 | 0.587 | 0.800 | 0.648 | 0.462 | 0.896 |
| XGBoost | 0.802 (0.782–0.816) | 0.677 | 0.576 | 0.800 | 0.631 | 0.450 | 0.893 |
| Logistic Regression | 0.804 (0.787–0.821) | 0.694 | 0.589 | 0.800 | 0.653 | 0.466 | 0.897 |
| Naive Bayes | 0.782 (0.764–0.801) | 0.647 | 0.554 | 0.800 | 0.589 | 0.423 | 0.886 |
| Neural Net | 0.748 (0.728–0.767) | 0.596 | 0.521 | 0.800 | 0.519 | 0.386 | 0.873 |
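The PPV and NPV columns in Table 5 depend on outcome prevalence as well as on sensitivity and specificity. The sketch below shows that relationship using Bayes' rule. The function name is illustrative, and the prevalence used is the full-cohort figure from the abstract (2903 of 10,288); the prevalence within the test split is an unknown here, so the result should only approximate the tabulated values.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# CatBoost row of Table 5, with full-cohort prevalence as an assumption
ppv, npv = predictive_values(0.800, 0.681, 2903 / 10288)
print(round(ppv, 3), round(npv, 3))
```

Under this assumption the computed values land close to, though not exactly on, the CatBoost row (roughly 0.50 and 0.90). At a fixed sensitivity and specificity, a lower prevalence lowers PPV and raises NPV, which is why the high NPV reported here should be re-examined in populations with a different event rate.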
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fan, J.; Sun, L.; Chen, S.; Si, Y.; Ahmadi, M.; Pishgar, M. Development and Validation of a CatBoost-Based Model for Predicting Significant Creatinine Elevation in ICU Patients Receiving Vancomycin Therapy. BioMedInformatics 2025, 5, 71. https://doi.org/10.3390/biomedinformatics5040071

