Peer-Review Record

Prognostic Value of Blood Urea Nitrogen for Acute Kidney Injury and Mortality in Vasculitis: A Large Cohort Study Using Multivariate Joint Model and Machine Learning

Diagnostics 2026, 16(5), 665; https://doi.org/10.3390/diagnostics16050665

by Si Chen, Rongfeng Liu, Yongzhi Zhang, Yan Wang, Haixia Luan, Xiaoli Zeng* and Hui Yuan*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 25 January 2026 / Revised: 18 February 2026 / Accepted: 19 February 2026 / Published: 25 February 2026

(This article belongs to the Special Issue Artificial Intelligence in Biomedical Diagnostics and Analysis 2025)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear authors,

This is a wonderful work titled 'Multivariate Joint Model and Machine Learning Study: Prognostic Value of Blood Urea Nitrogen for Acute Kidney Injury and Mortality in Vasculitis.' The introduction is descriptive, and the results are well presented, with analytical tables and enlightening diagrams that aid understanding. The discussion is adequate and inclusive without being tiring. The conclusion summarizes the authors' thinking wonderfully. The references are contemporary and sufficient in number. I support the publication of this manuscript.

Author Response

We sincerely appreciate the reviewer’s generous and constructive comments. We are encouraged by the recognition of the analytical rigor, clarity of presentation, and overall coherence of the manuscript. Your positive feedback affirms the value of our work and motivates us to continue refining our research.

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for the opportunity to review this manuscript. The authors present a study on predicting mortality factors in patients with vasculitis-associated acute kidney injury (AKI). The topic is novel and holds academic merit. Below are my detailed comments and suggestions for further improvement.

Major Comments:

1. The higher prevalence of hypertension in the non-AKI group contradicts clinical intuition and requires explanation, potentially through stratification by vasculitis subtype (e.g., GCA). The timing of BUN measurement relative to AKI onset is unclear, jeopardizing its causal interpretation as a "predictor." Reliance solely on creatinine criteria may miss non-oliguric AKI, a limitation whose potential bias should be discussed more thoroughly.

2. Inconsistent BUN tertile cut-offs across analyses (Table 4 vs. Table 6) are confusing. Significant performance drop in ML models (XGBoost, LightGBM) from training to test sets suggests overfitting; the rationale for selecting logistic regression as the final model needs clarification. The substantial sample attrition in long-term survival analysis (e.g., 365 days) undermines the reliability of conclusions at those time points.

3. Clinical Significance and Generalizability: The multiple, disparate BUN risk thresholds identified (e.g., 32, 56, 60 mg/dL) are difficult to translate into clinical practice. A unified, actionable risk stratification tool is recommended. The cohort is dominated by GCA patients, and results may not generalize well across other vasculitis types (e.g., GPA, MPA) with distinct AKI mechanisms. Subgroup analysis by vasculitis type would strengthen the findings.

4. Please refer to Lima, Camila; Macedo, Etienne. Biomarkers in acute kidney injury and cirrhosis. Journal of Translational Critical Care Medicine 6(2):e23-00014, June 2024. DOI: 10.1097/JTCCM-D-23-00014

Minor Comments:

1. Content and Presentation: The introduction on BUN physiology could cite more recent literature. The ICD code used for Behçet's disease appears incorrect and requires verification. Some figures (e.g., Figure 4B) have poor readability and should be optimized.

2. Methods and Data: Variables excluded due to high missingness should be listed. The reported immunosuppressant use rate (10.7%) seems unusually low; stratification by vasculitis type and a note on potential data recording limitations are warranted. The definition of the "Overlap" syndrome patient category is unclear.

3. Reporting Standards: Adherence to the TRIPOD reporting guideline should be stated. The bootstrap validation used only 10 iterations, which is typically insufficient for stable estimates; justification is needed. Decision Curve Analysis (DCA) should report net benefit values at clinically relevant threshold probabilities.

4. Formatting and Language: Terminology should be consistent (e.g., use "vasculitis"). Abbreviations (e.g., "GCs") must be defined at first use. Typographical errors exist in tables (e.g., "tertile" vs. "quartile") and require correction.

Author Response

Major Comments:

The higher prevalence of hypertension in the non-AKI group contradicts clinical intuition and requires explanation, potentially through stratification by vasculitis subtype (e.g., GCA). The timing of BUN measurement relative to AKI onset is unclear, jeopardizing its causal interpretation as a "predictor." Reliance solely on creatinine criteria may miss non-oliguric AKI, a limitation whose potential bias should be discussed more thoroughly.

Response: We sincerely thank the reviewer for this thoughtful and constructive comment. First, regarding the higher prevalence of hypertension in the non-AKI group, we agree that this finding may appear counterintuitive. To address this concern, we have added clarification in the Results and Limitations sections. Specifically, we note that this pattern may reflect differences in vasculitis subtype distribution and age-related comorbidity. For example, certain subtypes such as giant cell arteritis predominantly affect elderly patients, among whom hypertension is common but renal involvement is less frequent compared with small-vessel vasculitides. Therefore, hypertension in this cohort may represent age-related comorbidity rather than a direct contributor to AKI development.

Second, we appreciate the reviewer’s important comment regarding the temporal relationship between BUN measurement and AKI onset. We have revised the Methods section to explicitly clarify that baseline BUN values were extracted from measurements obtained at hospital admission. Furthermore, we have expanded the Limitations section to acknowledge that, due to the retrospective nature of the database, the precise temporal relationship between BUN elevation and subsequent AKI onset cannot be fully determined, and reverse causation cannot be completely excluded. We agree that this temporal ambiguity limits causal interpretation. Third, as suggested, we have strengthened the discussion of AKI definition in the Limitations section. We now explicitly state that defining AKI solely according to KDIGO serum creatinine criteria, owing to incomplete urine output data, may have resulted in underestimation of non-oliguric AKI and potential misclassification bias, which could influence AKI incidence estimates and observed associations.

Inconsistent BUN tertile cut-offs across analyses (Table 4 vs. Table 6) are confusing. Significant performance drop in ML models (XGBoost, LightGBM) from training to test sets suggests overfitting; the rationale for selecting logistic regression as the final model needs clarification. The substantial sample attrition in long-term survival analysis (e.g., 365 days) undermines the reliability of conclusions at those time points.

Response: We sincerely thank the reviewer for these important methodological comments. First, regarding the apparent inconsistency in BUN tertile cut-offs between Table 4 and Table 6, we have clarified in the Methods section that tertile cut-offs were determined separately within each analytical population because BUN distributions differed between the overall vasculitis cohort and the subgroup of patients with AKI. We have added explicit statements in the Results section to avoid confusion. Second, we appreciate the reviewer’s observation regarding potential overfitting in certain machine learning models. While ensemble models such as XGBoost and LightGBM achieved high AUC values in the training set, their performance declined in the testing set, suggesting possible overfitting. In contrast, the logistic regression model demonstrated more stable performance across datasets and achieved the highest AUC in the testing set. We have revised the manuscript to clarify that logistic regression was selected as the final model due to its robustness, generalizability, and clinical interpretability. Third, we acknowledge that the number of long-term mortality events was smaller than that of short-term events. We have added a statement in the Limitations section to note that the relatively limited number of 365-day events may reduce statistical power and affect estimate stability.
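The first point of this response, that tertile cut-offs were computed separately within each analytical population, can be illustrated with a minimal sketch. The BUN values below are made up for illustration, the subgroup rule is a placeholder rather than the study's AKI definition, and the quantile method is an assumption; the record does not state which method the authors used.

```python
from statistics import quantiles

def tertile_cutoffs(values):
    """Return the two cut points that split `values` into three groups."""
    low, high = quantiles(values, n=3, method="inclusive")
    return low, high

def assign_tertile(value, cutoffs):
    """Map a single value to tertile 1, 2, or 3 given (low, high) cut points."""
    low, high = cutoffs
    if value <= low:
        return 1
    if value <= high:
        return 2
    return 3

# Hypothetical BUN values (mg/dL); not taken from the study.
overall_cohort = [8, 10, 12, 14, 16, 18, 20, 25, 30, 40, 55, 70]
# Hypothetical subgroup with a higher BUN distribution (placeholder rule).
aki_subgroup = [v for v in overall_cohort if v >= 18]

overall_cuts = tertile_cutoffs(overall_cohort)
aki_cuts = tertile_cutoffs(aki_subgroup)

# The same BUN value can fall in different tertiles in each population.
print(overall_cuts, assign_tertile(30, overall_cuts))
print(aki_cuts, assign_tertile(30, aki_cuts))
```

Because the subgroup's distribution is shifted upward, its cut points are higher, which is why tables built on different populations can legitimately report different tertile boundaries.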

Clinical Significance and Generalizability: The multiple, disparate BUN risk thresholds identified (e.g., 32, 56, 60 mg/dL) are difficult to translate into clinical practice. A unified, actionable risk stratification tool is recommended. The cohort is dominated by GCA patients, and results may not generalize well across other vasculitis types (e.g., GPA, MPA) with distinct AKI mechanisms. Subgroup analysis by vasculitis type would strengthen the findings.

Response: We sincerely thank the reviewer for these valuable comments regarding clinical applicability and generalizability. First, regarding the identification of multiple BUN thresholds (32, 56, and 60 mg/dL), we have clarified in the revised manuscript that these values were derived from different statistical approaches (restricted cubic spline modeling and survival-based optimization) and reflect method-specific risk inflection or discrimination points rather than definitive clinical decision thresholds. Importantly, the consistent finding across analyses was a monotonic increase in risk with rising BUN levels. We have emphasized that BUN should primarily be interpreted as a continuous risk marker, and risk stratification may be better guided by model-based probability estimation rather than rigid categorical cut-offs. Second, we acknowledge that the cohort included a predominance of patients with giant cell arteritis. We have added a statement in the Discussion to clarify that different vasculitis subtypes may involve distinct renal injury mechanisms, and therefore caution is warranted when generalizing these findings across all vasculitis categories. Future subtype-specific validation studies are needed to confirm the robustness of these associations.

Please refer to Lima, Camila; Macedo, Etienne. Biomarkers in acute kidney injury and cirrhosis. Journal of Translational Critical Care Medicine 6(2):e23-00014, June 2024. DOI: 10.1097/JTCCM-D-23-00014

Response: We sincerely thank the reviewer for this valuable suggestion. We have carefully reviewed the cited article by Lima and Macedo (2024) and incorporated it into the Discussion section. Specifically, we expanded the discussion regarding the limitations of serum creatinine as a biomarker of renal function in complex systemic conditions and highlighted the growing interest in multimarker approaches combining filtration and tubular injury biomarkers. We also clarified how these considerations support the rationale for evaluating BUN as a pragmatic and widely available marker in real-world vasculitis populations.

Minor Comments:

Content and Presentation: The introduction on BUN physiology could cite more recent literature. The ICD code used for Behçet's disease appears incorrect and requires verification. Some figures (e.g., Figure 4B) have poor readability and should be optimized.

Response: We sincerely thank the reviewer for these helpful comments. First, we have updated the Introduction section to incorporate more recent literature supporting the physiological and prognostic relevance of BUN in acute and critical illness settings. Second, we carefully rechecked the ICD coding for Behçet’s disease in both MIMIC-III and MIMIC-IV databases. In our extracted cohort, Behçet’s syndrome was identified using ICD-9 code 136.1. No additional cases were identified using ICD-10 code M35.2 in the study population. We have clarified this point in the Methods section to avoid confusion. Third, the readability of Figure 4B has been improved by optimizing the heatmap layout, prioritizing the highest-ranking variables, enlarging font size, and enhancing color contrast in the revised version.

Methods and Data: Variables excluded due to high missingness should be listed. The reported immunosuppressant use rate (10.7%) seems unusually low; stratification by vasculitis type and a note on potential data recording limitations are warranted. The definition of the "Overlap" syndrome patient category is unclear.

Response: We sincerely thank the reviewer for these important comments. First, we have clarified in the Methods section that no variables exceeded the predefined 15% missingness threshold; therefore, no variables were excluded based on missingness. The manuscript has been revised accordingly. Second, we acknowledge that the reported immunosuppressant use rate appears relatively low. This likely reflects limitations of inpatient medication documentation in the MIMIC databases and the inability to capture outpatient or pre-admission immunosuppressive therapy. We have added clarification in the Results and Limitations sections. Third, we have clarified the definition of the “Overlap” category in the Methods section. Overlap was defined as the presence of ICD codes corresponding to more than one vasculitis subtype during the same hospitalization.

Reporting Standards: Adherence to the TRIPOD reporting guideline should be stated. The bootstrap validation used only 10 iterations, which is typically insufficient for stable estimates; justification is needed. Decision Curve Analysis (DCA) should report net benefit values at clinically relevant threshold probabilities.

Response: We thank the reviewer for these constructive suggestions. A statement confirming adherence to the TRIPOD reporting guideline has now been added to the Methods section to improve transparency and reporting quality. We agree that 10 iterations may be insufficient for stable internal validation. Therefore, the bootstrap validation has been repeated using 100 iterations. The revised results are now presented in Figure 6, which summarizes the distribution of AUC values across resamples. The model demonstrated stable discrimination with a mean AUC of 0.775 (range: 0.610–0.860), and the 95% confidence interval analysis confirmed performance substantially above random chance. We have now explicitly reported net benefit values at clinically relevant threshold probabilities. At threshold probabilities around 0.2–0.3, the model achieved a net benefit of approximately 0.15–0.20. These results are described in the revised Results section and illustrated in Supplementary Figures 3A–3C.
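For readers unfamiliar with the net benefit values mentioned in this response, the sketch below shows the standard decision-curve definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The labels and predicted probabilities are invented for illustration, and the function names are hypothetical; this is not the authors' code.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes (1 = event) and predicted probabilities.
y_example = [1, 1, 0, 0, 0]
p_example = [0.9, 0.4, 0.35, 0.1, 0.05]

for pt in (0.2, 0.3):
    print(pt, net_benefit(y_example, p_example, pt),
          net_benefit_treat_all(y_example, pt))
```

A model is usually considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).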

Formatting and Language: Terminology should be consistent (e.g., use "vasculitis"). Abbreviations (e.g., "GCs") must be defined at first use. Typographical errors exist in tables (e.g., "tertile" vs. "quartile") and require correction.

Response: We thank the reviewer for these careful observations. Terminology has been standardized throughout the manuscript, and “vasculitis” is now used consistently. All abbreviations, including glucocorticoids (GCs) and creatinine (Cr), are now defined at first use. In addition, typographical inconsistencies in the tables, including the correction of “quartile” to “tertile,” have been addressed. We have carefully reviewed the entire manuscript to ensure consistency and accuracy.

Reviewer 3 Report

Comments and Suggestions for Authors

BUN is already a known marker of renal dysfunction and mortality risk in critical illness. The manuscript demonstrates statistical confirmation of expected associations rather than uncovering a new mechanistic or predictive paradigm.

AKI was defined using serum creatinine criteria only, excluding urine output (Methods, p. 17). This introduces misclassification bias.

Seven models were implemented, and logistic regression achieved the highest AUC (0.904). Bootstrap validation shows an AUC CI of 0.725–0.786 (Figure 6), which appears inconsistent with the reported testing AUC of 0.904. Only 10 bootstrap cycles were used, which is insufficient for stable resampling inference.

Provide calibration curves (Brier score, calibration slope).

Report SHAP values for interpretability.

Provide confusion matrices and sensitivity/specificity.

LASSO + logistic regression were used to preselect features before ML modeling. It is unclear whether feature selection was performed inside cross-validation folds or on the full dataset prior to split (which would cause data leakage).

Several tables contain typographical errors (“Medel” instead of “Model”).

To improve contextual positioning and incorporate advanced modeling approaches in kidney disease prediction, the authors should consider giving a read to Convnext-PCA: A Parameter-Efficient Model for Accurate Kidney Abnormality Classification.

Spelling: “nitorgeh” in Figure 3 caption.

“Medel” repeated in tables.

Improve figure resolution and label clarity.

Expand limitations regarding fluid status and catabolic state confounding BUN.

Comments on the Quality of English Language

Grammar inconsistencies.

Author Response

BUN is already a known marker of renal dysfunction and mortality risk in critical illness. The manuscript demonstrates statistical confirmation of expected associations rather than uncovering a new mechanistic or predictive paradigm.

Response: We appreciate the reviewer’s thoughtful comment.

We fully acknowledge that BUN has long been recognized as a marker of renal dysfunction and mortality risk in general critical illness. However, to our knowledge, its prognostic role has not been systematically investigated in patients with vasculitis, a heterogeneous inflammatory disease with distinct pathophysiological mechanisms of renal injury. Importantly, vasculitis-related AKI differs from typical ICU-associated AKI in that immune-mediated vascular inflammation, systemic catabolic stress, and hemodynamic instability may interact in unique ways. Our study is the first, to our knowledge, to evaluate BUN specifically in a vasculitis cohort using multivariable modeling, nonlinear restricted cubic spline analysis, and machine learning–based risk prediction. Beyond confirming an association, we demonstrate:

1) A nonlinear relationship between BUN and AKI/mortality risk;

2) A threshold effect at clinically relevant levels;

3) Robust predictive performance validated through bootstrap resampling;

4) The added value of BUN in an integrated risk prediction framework.

Therefore, rather than merely confirming an expected association, this study contextualizes BUN within a disease-specific inflammatory setting and provides a validated, clinically interpretable risk stratification tool for vasculitis patients. We have clarified this point in the revised Discussion to better emphasize the study’s incremental and disease-specific contribution.

AKI was defined using serum creatinine criteria only, excluding urine output (Methods, p. 17). This introduces misclassification bias.

Response: We thank the reviewer for highlighting this important methodological issue. As noted in the Methods section and further discussed in the Limitations, AKI was defined using the KDIGO serum creatinine criteria because urine output data were frequently incomplete in the MIMIC databases. We acknowledge that the absence of urine output information may have led to underestimation of non-oliguric or transient AKI cases, thereby introducing potential misclassification bias. We have clarified in the revised manuscript that this limitation may have resulted in under-detection of mild AKI and could bias the estimated associations. This issue is inherent to many retrospective analyses using large critical care databases and reflects data availability constraints rather than selective outcome definition.

Seven models were implemented, and logistic regression achieved the highest AUC (0.904). Bootstrap validation shows an AUC CI of 0.725–0.786 (Figure 6), which appears inconsistent with the reported testing AUC of 0.904. Only 10 bootstrap cycles were used, which is insufficient for stable resampling inference.

Response: We thank the reviewer for this insightful comment. First, regarding the apparent discrepancy between the testing-set AUC (0.904) and the bootstrap-estimated AUC (mean 0.775), these values were derived from different validation strategies. The AUC of 0.904 represents performance on a single train–test split, whereas the bootstrap estimate reflects the average discrimination across 100 resampled datasets. The slightly lower bootstrap AUC is expected due to optimism correction and provides a more conservative and robust estimate of internal validity.

Second, we agree that 10 bootstrap iterations would be insufficient for stable inference. Accordingly, we have repeated the bootstrap validation using 100 iterations. The updated results are presented in Figure 6, demonstrating consistent performance across resamples. We have clarified these points in the revised manuscript to improve transparency and methodological rigor.
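The difference between a single-split AUC and a bootstrap distribution of AUCs, as discussed in this response, can be sketched as follows. This is a simple percentile-style bootstrap over fixed predictions and is an assumption for illustration; the record does not specify whether the authors refit the model in each resample or applied an optimism correction. The data and function names are hypothetical.

```python
import random

def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative case."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(y_true, scores, n_iter=100, seed=0):
    """Resample cases with replacement and collect one AUC per resample."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(auc(ys, [scores[i] for i in idx]))
    return aucs

# Hypothetical outcomes and model scores.
y_demo = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
s_demo = [0.1, 0.3, 0.35, 0.4, 0.45, 0.6, 0.65, 0.7, 0.2, 0.9]

aucs = sorted(bootstrap_auc(y_demo, s_demo, n_iter=100, seed=0))
print("single estimate:", auc(y_demo, s_demo))
print("bootstrap mean:", sum(aucs) / len(aucs))
print("approx. 95% interval:", aucs[2], aucs[97])
```

With only 10 resamples, the interval endpoints would rest on one or two values each, which is the instability both reviewers pointed out.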

Provide calibration curves (Brier score, calibration slope).

Response: Thank you for this valuable suggestion. We have now added calibration analyses to comprehensively evaluate model performance. Specifically, calibration curves for the training and testing sets have been added (Supplementary Figures 3C and 3D), and internal bootstrap calibration results are presented in Supplementary Figures 3E and 3F. The calibration plots demonstrate good agreement between predicted and observed probabilities. These additions have been incorporated into both the Methods (Section 2.5) and Results (Section 3.7). We believe these analyses substantially strengthen the evaluation of model reliability.
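As a rough illustration of the calibration quantities requested by the reviewer, the sketch below computes a Brier score and the points of a binned calibration curve. It does not compute the calibration slope, which would require fitting a logistic model to the logit of the predictions. The equal-width binning, the data, and the function names are assumptions for illustration only.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

def calibration_bins(y_true, probs, n_bins=5):
    """Return (mean predicted, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows

# Hypothetical outcomes and predicted probabilities.
y_cal = [0, 1, 1, 0]
p_cal = [0.1, 0.9, 0.8, 0.15]

print(brier_score(y_cal, p_cal))
for row in calibration_bins(y_cal, p_cal, n_bins=2):
    print(row)
```

Plotting observed rate against mean predicted probability for each bin gives the calibration curve; points near the diagonal indicate good agreement.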

Report SHAP values for interpretability.

Response: We appreciate the reviewer’s suggestion regarding model interpretability.

To enhance transparency and clinical interpretability, we conducted Shapley Additive Explanations (SHAP) analysis for the final logistic regression model. The SHAP summary plot is now provided in Supplementary Figure 4C. Relevant descriptions have been added to both the Methods (Section 2.5) and Results (Section 3.7).

Provide confusion matrices and sensitivity/specificity.

Response: Thank you for highlighting the importance of threshold-dependent classification metrics. We have now determined the optimal probability cutoff using the Youden index based on the testing set ROC curve. At this threshold, sensitivity, specificity, and overall accuracy were calculated, and the corresponding confusion matrix has been provided (Supplementary Figures 4A and 4B). These results are now explicitly reported in the Results section (Section 3.7). This addition complements the ROC-based discrimination assessment by providing clinically interpretable performance metrics.
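The Youden-index cutoff described in this response can be sketched as follows: for each candidate threshold, compute sensitivity + specificity − 1 and keep the threshold that maximizes it. The data and function names are hypothetical, and the sketch assumes both outcome classes are present.

```python
def confusion(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    return tp, fp, fn, tn

def youden_threshold(y_true, scores):
    """Pick the observed score that maximizes sensitivity + specificity - 1."""
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp, fp, fn, tn = confusion(y_true, scores, t)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

# Hypothetical outcomes and predicted probabilities.
y_roc = [0, 0, 0, 1, 1]
s_roc = [0.1, 0.2, 0.6, 0.7, 0.9]

cutoff, j_stat = youden_threshold(y_roc, s_roc)
print(cutoff, j_stat, confusion(y_roc, s_roc, cutoff))
```

The four counts returned by `confusion` are the cells of the confusion matrix, from which sensitivity, specificity, and accuracy follow directly.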

LASSO + logistic regression were used to preselect features before ML modeling. It is unclear whether feature selection was performed inside cross-validation folds or on the full dataset prior to split (which would cause data leakage).

Response: We thank the reviewer for this important methodological comment. Feature selection using LASSO regression and multivariable logistic regression was conducted exclusively within the training set prior to model development to prevent data leakage. The testing set was not involved in any stage of feature selection, preprocessing parameter estimation, class rebalancing, or hyperparameter tuning. All preprocessing procedures, including imputation, standardization, and synthetic minority oversampling, were also restricted to the training set. The selected predictors identified within the training set were subsequently applied to the independent testing set for performance evaluation. We have revised the Methods section (Section 2.5) to clarify this workflow explicitly.
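The workflow described in this response, in which every data-dependent step is estimated on the training set and then applied unchanged to the testing set, can be sketched with standardization as a stand-in for the full pipeline. The split fraction, data, and function names are assumptions for illustration; the same pattern would apply to imputation, oversampling, and feature selection.

```python
import random
from statistics import mean, pstdev

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_scaler(train_values):
    """Estimate standardization parameters from the training set only."""
    return mean(train_values), pstdev(train_values)

def apply_scaler(values, params):
    """Apply previously fitted parameters without re-estimating them."""
    mu, sd = params
    return [(v - mu) / sd for v in values]

# Hypothetical predictor values.
values = [float(v) for v in range(10)]

train, test = train_test_split(values, test_fraction=0.3, seed=0)
params = fit_scaler(train)               # the testing set is never seen here
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)  # reuses the training parameters
print(params, test_scaled)
```

Fitting the scaler on all ten values before splitting would let information from the testing set influence the training data, which is the leakage the reviewer asked about.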

Several tables contain typographical errors (“Medel” instead of “Model”). Spelling: “nitorgeh” in Figure 3 caption. “Medel” repeated in tables. Improve figure resolution and label clarity.

Response: We thank the reviewer for carefully checking the manuscript. All typographical errors (including “Medel” corrected to “Model” and “nitorgeh” corrected in the Figure 3 caption) have been carefully revised throughout the manuscript and tables. In addition, the resolution and readability of Figure 4 have been improved to enhance label clarity and overall visual quality.

To improve contextual positioning and incorporate advanced modeling approaches in kidney disease prediction, the authors should consider giving a read to Convnext-PCA: A Parameter-Efficient Model for Accurate Kidney Abnormality Classification.

Response: We thank the reviewer for suggesting the ConvNext-PCA study. We have carefully reviewed the cited work. The ConvNext-PCA model represents an advanced deep learning framework for image-based kidney abnormality classification and demonstrates impressive performance in CT imaging tasks. In our study, however, the predictive modeling was based on structured clinical and laboratory data in critically ill vasculitis patients rather than imaging features. Therefore, simpler and more interpretable models such as logistic regression may be more suitable for clinical risk stratification in this context. We have incorporated a brief discussion in the revised manuscript acknowledging recent deep learning advances in kidney disease prediction and highlighting the distinction between image-based diagnostic models and tabular clinical prognostic modeling.

Expand limitations regarding fluid status and catabolic state confounding BUN.

Response: We thank the reviewer for this important comment.

We agree that BUN levels can be influenced by fluid status, protein intake, gastrointestinal bleeding, and catabolic conditions, which may act as potential confounders. We have revised the Introduction section to explicitly acknowledge that elevated BUN reflects not only renal clearance impairment but also systemic metabolic and hemodynamic stress. This multifactorial nature may partly explain its strong prognostic value in critically ill vasculitis patients. In addition, we have noted in the limitations that fluid status and catabolic conditions were not fully captured in the database, which may introduce residual confounding.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

I would like to thank the author for going to such great lengths in the revisions.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have made sufficient updates.
