Article
Peer-Review Record

Comparison of Two Risk Calculators Based on Clinical Variables (MAGGIC and BCN Bio-HF) in Prediction of All-Cause Mortality After Acute Heart Failure Episode

by Alejandro Gallego-Cuenca 1, Esperanza Bueno-Juana 1,2, Amelia Campos-Sáenz de Santamaría 1,2, Vanesa Garcés-Horna 2, Marta Sánchez-Marteles 2, Juan I. Pérez-Calvo 1,2, Ignacio Giménez-López 1,2,3,* and Jorge Rubio-Gracia 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 24 August 2025 / Revised: 28 September 2025 / Accepted: 23 October 2025 / Published: 30 October 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for the opportunity to review your manuscript. Your study addresses an important and clinically relevant question regarding mortality risk stratification in heart failure patients discharged after ADHF. Attached are suggestions to strengthen the manuscript and improve its clarity, validity, and relevance.

1)The study uses data from 2013–2018, a period before the widespread use of therapies such as SGLT2 inhibitors, ARNIs, and other guideline-directed medical therapies that are now standard in HF care.

Please expand your discussion of how the absence of these treatments may impact the generalizability of your findings and the performance of both risk models in a modern clinical context.

2)The assumption that all patients were non-smokers due to missing data could introduce systematic bias, particularly in models where smoking is a prognostic variable (e.g., MAGGIC).

Please explain whether smoking status is an input in the calculators used and discuss the potential impact of this assumption on model accuracy. If a sensitivity analysis is not feasible, clearly state this limitation in the discussion and conclusion.

3)The study’s retrospective, single-center nature limits its applicability to broader populations.

I suggest including a stronger statement about the need for external, multicenter, or prospective validation of these findings in diverse populations.

4)Consider streamlining the discussion to reduce repetition and improve clarity. For example, distinguish more clearly between short-term (1-year) and long-term (3-year) performance, and organize points about discrimination, calibration, and reclassification under clear subheadings or distinct paragraphs.

 

 

Author Response

REVIEWER 1
Thank you for the opportunity to review your manuscript. Your study addresses an important and
clinically relevant question regarding mortality risk stratification in heart failure patients discharged
after ADHF. Attached are suggestions to strengthen the manuscript and improve its clarity, validity,
and relevance.

We thank the Reviewer for her/his careful review and are glad she/he found our study relevant. We
have done our best to address the Reviewer's suggestions for improvement. Thank you very much for
your helpful review.


1)The study uses data from 2013–2018, a period before the widespread use of therapies such as SGLT2
inhibitors, ARNIs, and other guideline-directed medical therapies that are now standard in HF care.
Please expand your discussion of how the absence of these treatments may impact the generalizability
of your findings and the performance of both risk models in a modern clinical context.

We agree with the Reviewer that the absence of modern treatments may limit the generalizability
of our results. Hence, we have expanded the Limitations section (lines 434-441) to emphasize how
this issue may affect the predictions and limit their generalization to current populations.
MAGGIC does not include the new therapies, but it is still used as the reference model in most recent
studies, suggesting that the clinical variables it includes capture patient prognosis well regardless of
treatment. Risk models developed before the introduction of ARNI and SGLT2i may show calibration
issues because they do not capture the effect of such therapies; however, they retain discriminatory
ability because a large fraction of the ACM risk is contributed by variables such as systolic blood
pressure, ejection fraction, serum sodium, eGFR or hemoglobin levels. It is noteworthy that
miscalibration has been a constant for HF risk models even before the introduction of new
therapies. This may be due to the application of generalized models to small cohorts with particular
clinical characteristics (one size does not fit all).
On the other hand, the BCN-bioHF score was calculated using the latest iteration (3.0), which includes
therapies such as ARNI and SGLT2i, both having moderate protective effects. This model was thus
developed from a modern clinical cohort. However, compared with the previous iteration (v2.0), the
coefficients contributed to the algorithm by each variable did not change significantly after the
addition of SGLT2i, suggesting again that a small group of variables retains the largest effects on
mortality risk.


2)The assumption that all patients were non-smokers due to missing data could introduce systematic
bias, particularly in models where smoking is a prognostic variable (e.g., MAGGIC).
Please explain whether smoking status is an input in the calculators used and discuss the potential
impact of this assumption on model accuracy. If a sensitivity analysis is not feasible, clearly state this
limitation in the discussion and conclusion.

We thank the Reviewer for pointing out this issue, which we had insufficiently addressed in the original
manuscript. MAGGIC includes current smoking as a predictor, which has a relative weight in the total
predicted mortality between 0-4-4% for 1-year and 1.2-12% at 3-year follow-up. MAGGIC also includes
COPD, a pathophysiologically related factor, with twice as much weight. So, while we think we
may have captured the long-term effects of smoking through COPD, it is true that we have underestimated
the risk in an unknown fraction of patients (current smokers) by as much as 12%. According to 2016
statistics, 27.6% of men and 13.8% of women were active smokers. Hence, not reflecting this variable in
the calculation is a relatively modest cause of the calibration issue observed with MAGGIC.
MAGGIC is the only popular HF calculator to include current smoking. BCN-bioHF does not include
COPD either.
In any case, the fact that omitting current smoking from the calculation could have biased MAGGIC
risk downward among true smokers is now better noted in the manuscript. We were unable to
perform a sensitivity analysis because smoking was not captured in the source records; we now state
these limitations more explicitly (lines 136-139 & 452-455).


3)The study’s retrospective, single-center nature limits its applicability to broader populations.
I suggest including a stronger statement about the need for external, multicenter, or prospective
validation of these findings in diverse populations.

We agree with the Reviewer and have strengthened the call for external, multicenter, and prospective
validation, including cohorts with current treatment and diverse care settings (lines 457-460).


4)Consider streamlining the discussion to reduce repetition and improve clarity. For example,
distinguish more clearly between short-term (1-year) and long-term (3-year) performance, and
organize points about discrimination, calibration, and reclassification under clear subheadings or
distinct paragraphs.

Following the Reviewer's suggestion, we reorganized the Discussion adding subheadings
(Discrimination; Calibration; Stratification, Reclassification and Clinical Utility) and explicitly separated
1-year from 3-year performance to reduce repetition and improve readability.

Reviewer 2 Report

Comments and Suggestions for Authors

1. Title and Abstract

The abstract is concise and informative, but it would benefit from inclusion of key statistics on heart failure (HF) prevalence and mortality burden in the introduction section of the abstract to highlight clinical importance.

Please clarify whether both 1-year and 3-year outcomes are all-cause mortality or HF-specific mortality, as this distinction has important clinical implications.

2. Introduction

The introduction provides context but is somewhat weak in justifying the comparative value of MAGGIC and BCN-bioHF. Please elaborate on why these two models were chosen, considering other validated HF prognostic tools (e.g., Seattle Heart Failure Model, GWTG-HF risk score).

Incorporating recent systematic reviews or meta-analyses (2023–2025) on prognostic model comparisons in HF would strengthen the rationale.

3. Methods

The retrospective design is appropriate, but the sample size (n=229) is relatively modest. Please discuss how this may affect the power of calibration/reclassification analyses.

More detail is required on handling of missing data (e.g., imputation method, exclusion criteria).

Specify the time frame of patient enrollment and whether treatment strategies (pharmacological or device-based) were consistent during that period, since changes could affect outcomes.

Clarify whether independent validation or bootstrapping was performed, as this would strengthen the robustness of model performance evaluation.

4. Results

Results are presented clearly, but the calibration findings need further exploration. Please provide numerical goodness-of-fit measures (e.g., Hosmer–Lemeshow, Brier score) in addition to calibration plots.

It would be useful to present Kaplan–Meier curves stratified by model risk categories to visually compare survival differences.

The reclassification metrics (NRI, IDI) are important, but please indicate whether clinical NRI (event vs. nonevent improvement) was considered, as opposed to statistical NRI only.

5. Discussion

The discussion appropriately highlights complementary strengths of the models, but it would benefit from:

A more critical analysis of why calibration was poor and whether recalibration or updating (e.g., refitting coefficients) could improve performance.

Positioning findings in the context of real-world HF management, particularly in resource-constrained settings.

A brief comment on how biomarkers in BCN-bioHF (e.g., NT-proBNP) add incremental prognostic value compared to purely clinical models like MAGGIC.

The limitations section should be expanded to explicitly include:

Small, single-center sample size.

Retrospective design bias.

Absence of external validation.

Evolving HF therapies during follow-up.

6. Figures and Tables

Figures would benefit from higher resolution calibration plots and KM curves, ensuring clarity at publication standards (≥300 dpi).

Consider adding a comparative summary table of model strengths and weaknesses (sensitivity, specificity, PPV, NRI, etc.) for quick clinical interpretation.

7. Conclusion

The conclusion is well written but slightly overstated. The complementary utility of the models should be framed as hypothesis-generating rather than definitive, pending validation in larger, multicenter cohorts.

Author Response

REVIEWER 2
We thank the Reviewer for her/his constructive and helpful review, which helped us convey our
message more clearly. We hope we have satisfactorily answered all the Reviewer's requests.


1. Title and Abstract
The abstract is concise and informative, but it would benefit from inclusion of key statistics on heart
failure (HF) prevalence and mortality burden in the introduction section of the abstract to highlight
clinical importance.
Please clarify whether both 1-year and 3-year outcomes are all-cause mortality or HF-specific
mortality, as this distinction has important clinical implications.

We have added concise, quantitative context on HF burden (lines 20-22) and state explicitly that
outcomes were all-cause mortality at both 1 and 3 years. Please find modified text for clarifying this
issue under Background, Methods, and Results. We have also added this clarification in the title.


2. Introduction
The introduction provides context but is somewhat weak in justifying the comparative value of
MAGGIC and BCN-bioHF. Please elaborate on why these two models were chosen, considering other
validated HF prognostic tools (e.g., Seattle Heart Failure Model, GWTG-HF risk score).
Incorporating recent systematic reviews or meta-analyses (2023–2025) on prognostic model
comparisons in HF would strengthen the rationale.

The goal of the study was to test whether the more contemporary risk model BCN-bioHF 3.0 was
superior to MAGGIC in our clinical setting. Both models were developed in chronic HF populations
and either lacked direct comparison in the setting of acute heart failure or had been compared only
by reporting AUC values.
MAGGIC was chosen among established HF prognostic tools because it is available online and requires
a small number of common variables. More importantly, MAGGIC is the best validated model and is
still considered a benchmark for comparison of HF risk models. For instance, during the period
2023-25, MAGGIC appears in 67 papers, compared with 12 using SHFM and 33 using GWTG-HF. The 2022
AHA guideline also includes ADHERE, EFFECT and ESCAPE as risk scores for ADHF. However, these
models, like GWTG, were developed for predicting in-hospital mortality, are older than MAGGIC and
BCN-bioHF 3.0, or lack online calculators (EFFECT, ESCAPE).
BCN-bioHF 3.0 is the current iteration and considers ARNI and SGLT2i therapies, which are not
included in other established models. It is also flexible, allowing the addition of biomarkers. Our goal
was to compare its performance in our cohort with the best validated risk score, in the absence of
biomarkers.
Following the Reviewer's suggestion, we expanded the rationale in the Introduction to better justify
the use of MAGGIC among other established HF risk calculators (lines 63-93).
Papers dealing with risk prediction in HF (all settings, >150 papers) published in 2023-25 were
reviewed again. Their contents do not change the perspectives on the strengths and limitations of risk
prediction tools in the ADHF context, which were already considered in the Introduction. We have
added two new references, including a recent systematic review of HF risk predictors (Skoularigkis et
al, J Pers Med 2025). We would appreciate any suggestion from the Reviewer for a relevant paper in
the setting of ADHF that we might have missed.

3. Methods
The retrospective design is appropriate, but the sample size (n=229) is relatively modest. Please
discuss how this may affect the power of calibration/reclassification analyses.
More detail is required on handling of missing data (e.g., imputation method, exclusion criteria).
Specify the time frame of patient enrollment and whether treatment strategies (pharmacological or
device-based) were consistent during that period, since changes could affect outcomes.
Clarify whether independent validation or bootstrapping was performed, as this would strengthen the
robustness of model performance evaluation.

These are indeed relevant points that needed clarification. New text has been added to the Methods
section to clarify the handling of missing data (lines 137-140), specify the enrollment time frame and
therapy consistency (lines 120-121), and state the absence of independent validation or bootstrapping
(lines 140-142; lines 432-434).
This study aims to validate the use of BCN-bioHF for risk prediction in our clinical setting, using the
well-established MAGGIC as benchmark. The small sample size reflects the single-center design and
limited resources of our research group. Even so, the number of patients and events in our study is
comparable with most of the reports we have reviewed in the literature applying these risk models
to local real-world clinical contexts. Mortality rates are in line with previous and present registries in
our clinical context. It is true that our cohort yields a relatively small number of events
(n=111 at 3 years, 48.5%). Applying the Events Per Variable rule, this would allow for safely including
11 variables in a regression model, although the model may remain stable even with 22 variables.
MAGGIC was computed with 12 variables (smoking was left blank) and BCN-bioHF included 19 variables.
Moreover, the observed miscalibration suggests there is no overfitting. Still, we have now made
clearer the need for confirmation in larger, multicenter cohorts (lines 461-464).
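
The Events Per Variable arithmetic quoted above can be checked with a short sketch. The response does not state which thresholds were applied; the values of 10 and 5 events per variable used here are assumptions, chosen because they are conventional and reproduce the quoted figures of 11 and 22 variables. The function name is illustrative only.

```python
def max_predictors(n_events: int, events_per_variable: int) -> int:
    """Largest number of predictors allowed by an events-per-variable rule."""
    if events_per_variable <= 0:
        raise ValueError("events_per_variable must be positive")
    return n_events // events_per_variable


n_patients = 229    # cohort size quoted by Reviewer 2
n_events_3y = 111   # deaths at 3 years quoted in the response

event_rate = n_events_3y / n_patients
conservative = max_predictors(n_events_3y, 10)  # assumed threshold: 10 events per variable
relaxed = max_predictors(n_events_3y, 5)        # assumed threshold: 5 events per variable

print(f"3-year event rate: {event_rate:.1%}")
print(f"EPV = 10 allows up to {conservative} variables")
print(f"EPV = 5 allows up to {relaxed} variables")
```

Under these assumed thresholds, the 12 variables used for MAGGIC and the 19 used for BCN-bioHF both fall between the two limits.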


4. Results
Results are presented clearly, but the calibration findings need further exploration. Please provide
numerical goodness-of-fit measures (e.g., Hosmer–Lemeshow, Brier score) in addition to calibration
plots.
It would be useful to present Kaplan–Meier curves stratified by model risk categories to visually
compare survival differences.
The reclassification metrics (NRI, IDI) are important, but please indicate whether clinical NRI (event vs.
nonevent improvement) was considered, as opposed to statistical NRI only.

Numerical calibration metrics were already reported: Hosmer–Lemeshow with χ² and p-values (lines
257-260) and Brier scores plus Brier Skill Score (Table 2). Kaplan–Meier curves by strata are already
included (Figure 5).
We now clarify the NRI type and provide event/non-event components (lines 326-328). Thank you very
much for your suggestion; it is relevant to identify the source of the improvement.
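
For readers unfamiliar with the event/non-event split of the NRI, the following is a minimal sketch assuming a categorical NRI over ordered risk categories. The response does not say which NRI type was used, so this is not necessarily the authors' calculation. The function name and the eight-patient data set are invented for illustration and are not study data.

```python
from typing import Sequence, Tuple


def nri_components(
    old_category: Sequence[int],
    new_category: Sequence[int],
    event: Sequence[int],
) -> Tuple[float, float, float]:
    """Return (event NRI, non-event NRI, total NRI) for ordered risk categories.

    A higher category number means higher predicted risk. Moving up is an
    improvement for patients who had the event; moving down is an improvement
    for patients who did not.
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, had_event in zip(old_category, new_category, event):
        if had_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    if n_e == 0 or n_ne == 0:
        raise ValueError("need at least one event and one non-event")
    nri_events = (up_e - down_e) / n_e
    nri_nonevents = (down_ne - up_ne) / n_ne
    return nri_events, nri_nonevents, nri_events + nri_nonevents


# Made-up toy data: risk tertiles 0/1/2 under a reference and a comparison model.
old_category = [0, 1, 1, 2, 0, 1, 2, 1]
new_category = [1, 2, 1, 1, 0, 0, 1, 2]
died = [1, 1, 1, 1, 0, 0, 0, 0]

nri_e, nri_ne, nri_total = nri_components(old_category, new_category, died)
print(f"event NRI={nri_e:.2f}, non-event NRI={nri_ne:.2f}, total NRI={nri_total:.2f}")
```

Reporting the two components separately shows whether a net improvement comes from better classification of patients who died, of survivors, or of both.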


5. Discussion
The discussion appropriately highlights complementary strengths of the models, but it would benefit
from:
A more critical analysis of why calibration was poor and whether recalibration or updating (e.g.,
refitting coefficients) could improve performance.
Positioning findings in the context of real-world HF management, particularly in resource-constrained
settings.
A brief comment on how biomarkers in BCN-bioHF (e.g., NT-proBNP) add incremental prognostic value
compared to purely clinical models like MAGGIC.

The Discussion section has been substantially revised to accommodate the Reviewer's useful suggestions.
Addition of biomarkers is a common approach to increase the prognostic value of a risk model, but it
remains controversial. For instance, in the original derivation of BCN-HF [Lupon, PLoS One 2014] all
three biomarkers (NT-proBNP, hsTNT or sST2) showed moderate or null effects when added alone
(NRI not significant). However, in a validation study carried out on a different cohort in the same
ambulatory clinic, addition of NT-proBNP increased discrimination power and improved reclassification
[Codina, Eur J Heart Fail 2021]. To our knowledge, addition of NT-proBNP alone in the ADHF setting
has not been tested. Addition of several biomarkers (NT-proBNP, GDF-15 and hsCRP) improved the
performance of both MAGGIC and BCN-bioHF in an ADHF cohort [Alvarez-Garcia, Front Physiol 2021].
On the other hand, a recent study in acute HF patients showed that the addition of sST2 to MAGGIC
reduced its prognostic value while increasing that of BCN-bioHF [Pérez-Sanz, Heart Vessels 2023].
The manuscript already discusses the issues with adding biomarkers (lines 444-450). A comment has
been added to note that the addition of biomarkers can increase BCN-bioHF performance.


The limitations section should be expanded to explicitly include:
Small, single-center sample size.
Retrospective design bias.
Absence of external validation.
Evolving HF therapies during follow-up.

The Limitations section has been expanded to ensure that all the specific issues identified by the
Reviewer are mentioned and discussed (lines 429-463).


6. Figures and Tables
Figures would benefit from higher resolution calibration plots and KM curves, ensuring clarity at
publication standards (≥300 dpi).
Consider adding a comparative summary table of model strengths and weaknesses (sensitivity,
specificity, PPV, NRI, etc.) for quick clinical interpretation.

High-resolution figures (>300 dpi) have been provided as individual files through the Journal's
application.
A concise comparative summary table is already present (Table 2) with AUROC, sensitivity, specificity
(via 1-specificity), PPV, NPV, Brier, NRI, IDI. A slight formatting change was made to add an explicit
“Specificity” column for quicker clinical interpretation.
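
As a reminder of how several of the Table 2 quantities relate to one another, here is a minimal sketch using their standard definitions. The eight-patient data set and the 0.5 risk threshold are invented for illustration; they are not study data and not the cut-offs used in the manuscript.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, 1-specificity, PPV and NPV from binary labels.

    Assumes each confusion-matrix margin is non-empty (true for the toy data).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": specificity,
        "one_minus_specificity": 1 - specificity,  # false-positive rate
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def brier_score(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((r - y) ** 2 for y, r in zip(y_true, risk)) / len(y_true)


def brier_skill_score(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """1 - Brier / Brier of a reference model that always predicts the event rate."""
    base_rate = sum(y_true) / len(y_true)
    reference = brier_score(y_true, [base_rate] * len(y_true))
    return 1 - brier_score(y_true, risk) / reference


# Made-up toy data: observed deaths and predicted mortality risks.
died = [1, 1, 1, 0, 0, 0, 0, 1]
risk = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
predicted = [1 if r >= 0.5 else 0 for r in risk]  # assumed 0.5 threshold

print(classification_metrics(died, predicted))
print(f"Brier score: {brier_score(died, risk):.3f}")
print(f"Brier skill score: {brier_skill_score(died, risk):.3f}")
```

The threshold-based metrics depend on the chosen cut-off, whereas the Brier score is computed directly from the predicted probabilities.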


7. Conclusion
The conclusion is well written but slightly overstated. The complementary utility of the models should
be framed as hypothesis-generating rather than definitive, pending validation in larger, multicenter
cohorts.
We thank the Reviewer for the constructive suggestion to improve the conclusion. The Conclusion
section has been shortened to focus on the main findings and to avoid too much speculation (lines
465-474).
