Short-Duration Monofractal Signals for Heart Failure Characterization Using CNN-ELM Models
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript proposes a CNN–ELM hybrid model for short-duration monofractal signal analysis to distinguish between healthy subjects and heart failure patients (NYHA I–IV). The study is innovative and methodologically sound, and the results align with clinical patterns, showing potential for low-cost, real-time cardiac monitoring. However, several issues should be addressed before publication.
1) The experiments rely mainly on the PhysioNet database, with limited sample size and class imbalance. Additional datasets or cross-database validation are needed to confirm robustness.
2) Since the Hurst exponent is strongly correlated with disease severity, why not classify directly using the exponent with CNN or ELM? Please provide experiments to justify the added value of CNN-based feature extraction.
3) The study is restricted to monofractal signals, while real HRV often exhibits multifractality and non-stationarity. Extending the analysis to multifractal signals and comparing results would strengthen the work; see, e.g., doi.org/10.1016/j.rineng.2025.104469.
4) The choice of sample lengths (128, 256, 512) needs clearer justification. Are these values physiologically meaningful? Could more biologically grounded lengths be considered? You can refer to doi.org/10.1016/j.aei.2023.101877.
5) The hybrid CNN–ELM design should be clarified. Why not use CNN directly for both feature extraction and classification?
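The baseline suggested in point 2 is cheap to try: estimate the Hurst exponent directly, e.g. with first-order detrended fluctuation analysis (whose scaling exponent approximates H for monofractal noise), and classify on that single feature. A minimal sketch assuming NumPy; the window sizes are illustrative choices, not taken from the manuscript:

```python
import numpy as np

def dfa_hurst(x, scales=(8, 16, 32, 64)):
    """Estimate H for a monofractal noise series via first-order DFA.

    For fractional Gaussian noise the DFA scaling exponent
    approximates the Hurst exponent; the window sizes are arbitrary.
    """
    y = np.cumsum(x - np.mean(x))              # integrated profile
    flucts = []
    for n in scales:
        n_seg = len(y) // n
        segs = y[: n_seg * n].reshape(n_seg, n)
        t = np.arange(n)
        # RMS of each window after removing its linear trend
        rms = [np.sqrt(np.mean((s - np.polyval(np.polyfit(t, s, 1), t)) ** 2))
               for s in segs]
        flucts.append(np.mean(rms))
    # slope of log F(n) vs log n is the scaling exponent alpha ~ H
    return np.polyfit(np.log(scales), np.log(flucts), 1)[0]

# Sanity check: white noise should give an estimate close to H = 0.5
rng = np.random.default_rng(0)
h = dfa_hurst(rng.standard_normal(512))
```

Benchmarking a simple classifier on this single H estimate against the full CNN feature extractor would directly quantify the added value the reviewer asks about.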
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The work is interesting and well-structured, but I recommend a major revision. The comments are below:
- The model is mainly trained on synthetic monofractal signals; please clarify how these reflect real RR variability and whether performance was tested on real or mixed datasets.
- The study acknowledges that real RR signals may include multifractal and nonstationary components. Please discuss in more detail how these properties might affect classification performance.
- The link between H values < 0.5 and diseased states is interesting; please expand on the physiological rationale, particularly how autonomic regulation changes or loss of complexity leads to shifts in the Hurst exponent.
- The comparison is limited to SVM; please consider benchmarking against recent deep architectures such as CNN-LSTM hybrids, transformers, or attention-based models to strengthen the study.
- The reported accuracy drops significantly for 128-sample signals (Table 2). Since short signals are critical for real-time monitoring, how do the authors envision overcoming this limitation? Would ensemble methods, augmentation, or domain adaptation be feasible?
- Beyond accuracy and confusion matrices, please include precision, recall, and F1-scores per class to provide a more comprehensive evaluation of performance.
- Some methodological sections (CNN/ELM mathematical derivations) are overly detailed for an applied sciences journal. These could be summarized or moved to supplementary material. The discussion section would also benefit from highlighting clinical implications more explicitly.
- The manuscript states that the code is available on GitHub, but the provided link appears to be inaccessible; please ensure that the repository is publicly available and functional for reproducibility.
- Several references are outdated; please update the literature review to include recent works from the past five years. The field has recently seen approaches that integrate convolutional layers with transformer mechanisms for short physiological signals. Positioning your work against such advances would help contextualize the novelty: doi: 10.1109/ACCESS.2025.3573870.
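The per-class precision, recall, and F1 scores requested above can be derived directly from the same counts that populate a confusion matrix. A stdlib-only sketch; the toy labels below are invented purely for illustration:

```python
from collections import Counter

def per_class_metrics(y_true, y_pred, labels):
    """Precision, recall and F1 per class, computed from the
    (true, predicted) pair counts of a confusion matrix."""
    pairs = Counter(zip(y_true, y_pred))
    out = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(v for (t, p), v in pairs.items() if p == c and t != c)
        fn = sum(v for (t, p), v in pairs.items() if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1)
    return out

# Toy example: 0 = healthy, 1-2 = two hypothetical NYHA classes
metrics = per_class_metrics(
    [0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], labels=[0, 1, 2])
```

Libraries such as scikit-learn provide the same numbers via `classification_report`, but the point is that no new experiments are needed: the existing confusion matrices suffice.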
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Thank you for the opportunity to review this paper, "Short-Duration Monofractal Signals for Heart Failure Characterization Using CNN-ELM Models".
This paper has two primary aims. First, the authors create random noise (with varying Hurst exponents) and then try to classify the Hurst exponent correctly. Subsequently, the trained CNN-ELM model is applied to real-world short time series of normal sinus rhythm and congestive heart failure, and evaluated for correct prediction of NYHA classification.
The paper is especially interesting since it is known that a low heart rate variability is associated with decreased cardiac autonomic regulation, and so a fractal analysis with Hurst exponent as the result could simplify the multitude of different HRV-parameters. It is interesting since training is quicker and easier, and also since other authors have tried to classify data based on models pre-trained on completely different data (transfer learning).
I do not have the foundational knowledge to assess the mathematics behind the paper, and am more used to working with entropy measures than the Hurst exponent.
I have focused on the specific aspects of ECG classification in this review, and I have commented to the editor that reviewing the mathematics behind this paper is beyond my scope.
The discussion and conclusion read too positively: "The proposed CNN-ELM model successfully distinguished between healthy individuals and patients across the NYHA classification spectrum, even with time series as short as 128 samples".
While the model's performance in classifying synthetic monofractal signals (H-index classification) is quantified with accuracy rates (e.g., 92.48% for 512 samples, 68.95% for 128 samples), explicit formal calculations of precision, recall, or F1-scores for the NYHA classification results are notably absent. The assessment of "successful distinction" for NYHA classes relies primarily on visual interpretation of cumulative confusion matrices (Figures 4 and 5).
- I would recommend providing detailed quantitative performance metrics (e.g., precision, recall, F1-score, AUC) for each NYHA class and overall for the classification of heart failure severity.
The sampling rate for the PhysioNet resource was 128 Hz, and the minimum sample length is 128 samples ("Short time series of lengths 2^k data points, where k takes values of 9, 8, and 7 were extracted"). If I understand correctly, the signals of length 128 are 1 second long? (Or do I misunderstand, and each signal is 128 NN-intervals long?) If relevant, please comment on how it is reasonable to classify ECG signals containing potentially only a single QRS complex, or part of one.
Figure 4, length 512, has a total of 27,210 signals in the "healthy" class and a total of 27,210 signals in the "disease" class. Is it intentional that these numbers are identical? Also, in Figure 5, there are likewise 27,210 signals in the healthy class, but a total of 108,840 signals in NYHA classes I–IV.
- Please clarify these numbers.
I do feel that some very simple things are missing. Information on the number of patients included in the study is not presented (29 persons, as shown in the PhysioNet database). How many signals were available per patient? Were the samples single-lead ECG (in that case, which lead), or were they 12-lead?
- Please add this information to Section 3.2, as well as how the data were processed and how signals were selected.
Minor:
The authors propose "exploring its integration into intelligent diagnostic systems" as future work. However, that is two or three steps down the line. PhysioNet is easy and accessible; real-world data is almost defined by its messiness.
First, the authors would need to perform external validation with real-world data, compare model architectures, and test workflow integration.
512 samples (2^9 data points) correspond to 4 seconds of signal at 128 Hz. This could be stated explicitly in the methods section to aid reader understanding, avoid unnecessary mathematical complexity, and enhance readability.
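The sample-length-to-duration mapping at the 128 Hz sampling rate noted earlier is simple arithmetic:

```python
fs = 128  # Hz, the PhysioNet sampling rate cited in this review
durations_s = {2 ** k: 2 ** k / fs for k in (7, 8, 9)}
# 128 samples -> 1.0 s, 256 -> 2.0 s, 512 -> 4.0 s
```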
This sentence is unclear: "In particular, (we?) [33] worked with the CNN-SVM approach (neural networks and support vector machines)".
"Monofractal research in the health domain plays a crucial role, particularly in the diagnosis of heart disease." Does it? Already?
Comments on the Quality of English Language
Consider whether it is necessary to complicate the manuscript with detailed descriptions of the mathematical expressions for a convolutional layer or the softmax function. People who read your manuscript will have a basic understanding of the foundations and can look up the calculations in more specific papers if needed. Focus on what the novelty of this method specifically is.
I only managed to find two spelling errors (e.g., "4.2. Classification with manofractal signals" and "symptomfree").
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I have no further comment.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Dear authors,
I thank you for this first revised version of the manuscript. I think the addition of a demographics table, as well as accuracy metrics, makes this a robust contribution to the heart rate variability literature!
However, one key methodological aspect still needs clarification. It remains unclear how the RR intervals were calculated. Could you please specify whether you used a particular package, algorithm, or custom method to derive RR intervals from the raw ECG data? This step is central to the reproducibility of HRV-based analyses and should be explicitly described, especially since references 33 and 34 also do not provide these details.
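To make the request concrete: any reproducible description would pin down a peak-detection step of roughly this shape. A deliberately naive sketch (NumPy only); real pipelines use an established detector such as Pan-Tompkins, and the threshold and refractory values here are invented:

```python
import numpy as np

def rr_intervals(ecg, fs, thresh_ratio=0.6, refractory_s=0.25):
    """Naive R-peak picker: threshold at a fraction of the signal
    maximum, enforce a refractory period, then difference the peak
    times. Illustrative only, not a clinical-grade detector."""
    ecg = np.asarray(ecg, dtype=float)
    thresh = thresh_ratio * ecg.max()
    refractory = int(refractory_s * fs)
    peaks, last = [], -refractory
    for i in range(1, len(ecg) - 1):
        if (ecg[i] >= thresh and ecg[i] >= ecg[i - 1]
                and ecg[i] > ecg[i + 1] and i - last >= refractory):
            peaks.append(i)
            last = i
    return np.diff(peaks) / fs  # RR intervals in seconds

# Synthetic "ECG": one spike per second at fs = 128 Hz
fs = 128
sig = np.zeros(5 * fs)
sig[fs // 2 :: fs] = 1.0
rr = rr_intervals(sig, fs)
```

Even a one-sentence statement of which detector (or package) was used, with its key parameters, would resolve the reproducibility concern.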
I would also recommend adding some metrics to the abstract instead of text. In particular, this sentence in the abstract needs metrics to justify it: "The model successfully classified both binary (healthy vs. sick) and multiclass (NYHA I–IV) scenarios, with performance validated across multiple time window lengths".
From a clinical perspective, it would also be useful to discuss how the model’s balance between false positives and false negatives should be interpreted. For instance, the relatively low sensitivity (0.1–25.6%) for predicting heart failure suggests that this method may function best as a rule-in test — where a positive result increases confidence in disease presence, but a negative result cannot safely exclude it. Please comment on how you would envision this tool being used in a real-world diagnostic workflow.
Overall, the paper is clearly written and the results are promising. With some additional methodological transparency and a few clarifications regarding clinical interpretation, I believe the manuscript would be well-suited for publication after a second round of minor revisions.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf

