Article
Peer-Review Record

Prediagnosis of Heart Failure (HF) Using Deep Learning and the Korotkoff Sound

Appl. Sci. 2022, 12(20), 10322; https://doi.org/10.3390/app122010322
by Huanyu Zhang 1, Ruwei Wang 1, Hong Zhou 1,*, Shudong Xia 2,*, Sixiang Jia 2 and Yiteng Wu 2
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Submission received: 7 September 2022 / Revised: 9 October 2022 / Accepted: 10 October 2022 / Published: 13 October 2022
(This article belongs to the Special Issue Advanced Medical Signal Processing and Visualization)

Round 1

Reviewer 1 Report

It's an interesting topic to address, but the manuscript needs more work, and perhaps cross-references to the bio-signal, biomedical engineering, and bioinformatics literature, to better present the state of the art of your research.

Research contributions are clearly stated.

The problem statement is equivocal.

I am struggling somewhat to understand the problem statement, the research purpose, and how the problem is tackled. The research story in the introduction needs to be clearer and more concise, especially regarding the problem statement and research purpose. Please also add more conventional research methods.

This paper needs to present the research gap and novelty clearly and briefly in the introduction, using a comprehensive, conventional research story.

This paper should present the research methodology in detail; several parts of the method are not yet explained.

Fig. 1 fails to describe an overview of the system. Please provide a better illustration.

The schematic diagrams in Figs. 4, 5, 6, and 7 are unusual.

Please redraw the schematic diagrams for deep learning with better illustrations.

There are too many performance comparisons of NNs.

What is the difference between Fig. 8 and Fig. 9?

Can you reduce the number of figures?

 

 

Comments for author File: Comments.pdf

Author Response

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Classification of chronic heart failure (CHF) using deep learning and the Korotkoff sound”. These comments are all valuable and very helpful for revising and improving our paper, as well as providing important guidance for our research. We have studied these comments carefully and made corrections for each one, which we hope meet with approval.

 

Here are the responses:

  1. It's an interesting topic to address, but the manuscript needs more work, and perhaps cross-references to the bio-signal, biomedical engineering, and bioinformatics literature, to better present the state of the art of your research.

Response

Thank you for your advice, which was extremely helpful. In the revised introduction, we now discuss the most recent developments in related fields.

 

  2. I am struggling somewhat to understand the problem statement, the research purpose, and how the problem is tackled. The research story in the introduction needs to be clearer and more concise, especially regarding the problem statement and research purpose. Please also add more conventional research methods.

Response

In response to the referee's concern, we revised the introduction, including the problem statement and research objectives, and added a discussion of conventional research methods as you suggested.

 

  3. This paper needs to present the research gap and novelty clearly and briefly in the introduction, using a comprehensive, conventional research story.

Response

As for the referee's concern, we have made significant changes to the introduction and hope to receive your approval.

 

  4. This paper should present the research methodology in detail; several parts of the method are not yet explained.

Response

We very much appreciate your advice. We significantly expanded the System Overview to include the critical information.

  5. Fig. 1 fails to describe an overview of the system. Please provide a better illustration.

Response

As for the referee's concern, we have revised Figure 1.

 

  6. The schematic diagrams in Figs. 4, 5, 6, and 7 are unusual.

Response

As for the referee's concern, we have revised Fig. 4-7.

 

  7. Please redraw the schematic diagrams for deep learning with better illustrations.

Response

As for the referee's concern, we have revised Fig. 4-7.

 

  8. There are too many performance comparisons of NNs.

Response

As for the referee's concern, we have reduced performance metrics.

 

  9. What is the difference between Fig. 8 and Fig. 9?

Response

Thank you for your inquiry.

In fact, both figures compare the ROC curves of different classifiers on the CWT and MFCC datasets. Figure 8 depicts the ROC curve comparison of the segmented dataset classification results, while Figure 9 depicts the unsegmented dataset classification results.

The results show that, for both the CWT and MFCC datasets, the ROC curve obtained by each classifier on segmented KS data is fundamentally worse than the ROC curve obtained on unsegmented KS data. (The AUC is used to evaluate the ROC curves, but looking at the ROC curves directly is more intuitive.)
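
For readers unfamiliar with how such curves are built, the following is a minimal sketch only, not the authors' code. It assumes binary labels (1 for HF, 0 for healthy) and one classifier score per recording; the names `roc_points` and `trapezoid_auc` are hypothetical. Each distinct score is used as a decision threshold, and the area under the resulting curve is obtained with the trapezoid rule.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: list of 0/1 ground-truth values (1 = positive class).
    scores: list of classifier scores, higher means more likely positive.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    points = [(0.0, 0.0)]
    # Sweep thresholds from strictest to most permissive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy example with invented scores, for illustration only.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_curve = roc_points(toy_labels, toy_scores)
toy_auc = trapezoid_auc(toy_curve)
```

Plotting `toy_curve` for two datasets on the same axes gives the kind of visual comparison the response describes, while `toy_auc` condenses each curve into a single number.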

 

  10. Can you reduce the number of figures?

Response

As for the referee's concern, we've reduced the number of figures.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript entitled “Classification of chronic heart failure (CHF) using deep learning and the Korotkoff sound” studies chronic heart failure (CHF) classification based on the deep convolutional neural network (CNN) and the Korotkoff sound (KS), considering continuous wavelet transform (CWT) features, Mel Frequency Cepstrum Coefficient (MFCC) features, and signal segmentation. Some well-known CNN architectures were used for this purpose, including AlexNet, VGG19, ResNet50, and Xception. Their performance was analyzed using several evaluation metrics, indicating successful classification based on the MFCC datasets.

The manuscript is well-written and easy to follow. The presented study is interesting and has some potential practical application.

However, here are some comments I would like the authors to address before the manuscript is considered for publication:

1.      Please put the reference numbers within the text into square brackets [].

2.      The text in lines 143-148 and 149-154 is almost identical. Please correct.

3.      The literature review is well done, considering important and recent studies and placing the study within the narrow research field. However, the application of deep CNNs with various two-dimensional signal representations has become a hot research topic recently. Therefore, I would like to suggest that the authors supplement the introductory part with some recent studies on this topic to briefly illustrate the state-of-the-art performance of CNNs and alternative time-frequency representations in many different applications today and to provide an interested reader with examples. Please consider briefly mentioning the following papers for illustration purposes: 10.1007/s10044-020-00921-5, 10.1109/ACCESS.2021.3139850, 10.1109/TNNLS.2020.3008938.

4.      Line 176 is unnecessary.

5.      Please elaborate on how the scaling of CWT coefficients to a 224x224 image is done (line 223).

6.      The manuscript does not use the term convolutional neural network (CNN), although utilized models are deep CNNs.

7.      The text in Figure 5 is hardly visible. Please modify the figure by considering alternative network structure representation.

8.      Please elaborate precisely on which data augmentation procedures were utilized in this study and how. Please also elaborate on how they affected dataset size and diversity.

9.      Did the authors analyze the statistical significance of the obtained classification results?

10.  In lines 314-315, the authors state that the learning rate and batch size were set to the same values for each model. This is not a technically correct approach. Namely, these training parameters should be optimized for each model (including different CNN architecture and different dataset) to allow fair comparison of their performances.

11.  Line 344: the AUC of Xception is higher than that of VGG19, despite Xception being ranked as the second best.

12.  Is accuracy an appropriate metric for ranking models’ performances in the Discussion section, as the dataset is unbalanced?

 

 

 

Author Response

Thank you for your comments concerning our manuscript entitled “Classification of chronic heart failure (HF) using deep learning and the Korotkoff sound”. These comments are extremely helpful in improving the quality of our papers, as well as providing important guidance for our research. We have studied these comments carefully and made corrections for each one, which we hope meet with approval.

 

Here are the responses:

 

  1. Please put the reference numbers within the text into square brackets [].

Response

As for the referee's concern, the format of reference numbers has been revised.

 

  2. The text in lines 143-148 and 149-154 is almost identical. Please correct.

Response

Thank you very much for your advice. This paragraph has been revised.

 

  3. The literature review is well done, considering important and recent studies and placing the study within the narrow research field. However, the application of deep CNNs with various two-dimensional signal representations has become a hot research topic recently. Therefore, I would like to suggest that the authors supplement the introductory part with some recent studies on this topic to briefly illustrate the state-of-the-art performance of CNNs and alternative time-frequency representations in many different applications today and to provide an interested reader with examples. Please consider briefly mentioning the following papers for illustration purposes: 10.1007/s10044-020-00921-5, 10.1109/ACCESS.2021.3139850, 10.1109/TNNLS.2020.3008938.

Response

These documents are excellent, and play an important role in improving the quality of our manuscript. We appreciate your suggestions.

 

  4. Line 176 is unnecessary.

Response

As for the referee's concern, Line 176 has been removed.

 

  5. Please elaborate on how the scaling of CWT coefficients to a 224x224 image is done (line 223).

Response

As for the referee's concern, the procedure for generating 224x224 images from CWT coefficients has been thoroughly described.
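
This record does not state which scaling method the revised manuscript describes. Purely as an illustration of what such a step can look like, the sketch below resizes a small coefficient matrix (stored as a list of rows) to a fixed image size with bilinear interpolation. The function name `resize_bilinear`, the choice of bilinear interpolation, and the 13x40 input shape are all assumptions, not details taken from the paper.

```python
def resize_bilinear(matrix, out_rows, out_cols):
    """Resize a 2-D list of numbers with bilinear interpolation.

    The corner values of the input are mapped exactly onto the corners
    of the output (the "align corners" convention).
    """
    in_rows, in_cols = len(matrix), len(matrix[0])
    resized = []
    for r in range(out_rows):
        # Fractional source row for this output row.
        y = r * (in_rows - 1) / (out_rows - 1) if out_rows > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_rows - 1)
        wy = y - y0
        row = []
        for c in range(out_cols):
            # Fractional source column for this output column.
            x = c * (in_cols - 1) / (out_cols - 1) if out_cols > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_cols - 1)
            wx = x - x0
            top = matrix[y0][x0] * (1 - wx) + matrix[y0][x1] * wx
            bottom = matrix[y1][x0] * (1 - wx) + matrix[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        resized.append(row)
    return resized


# Hypothetical 13x40 feature matrix scaled to the 224x224 network input size.
features = [[float(i + j) for j in range(40)] for i in range(13)]
image = resize_bilinear(features, 224, 224)
```

Interpolation of this kind adds no information to the original coefficients; it only makes the matrix compatible with the input layer of a network designed for 224x224 images.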

 

  6. The manuscript does not use the term convolutional neural network (CNN), although utilized models are deep CNNs.

Response

As for the referee's concern, the term Convolutional Neural Network (CNN) has been added.

 

  7. The text in Figure 5 is hardly visible. Please modify the figure by considering alternative network structure representation.

Response

As for the referee's concern, Fig. 5 has been revised.

 

  8. Please elaborate precisely on which data augmentation procedures were utilized in this study and how. Please also elaborate on how they affected dataset size and diversity.

Response

As for the referee's concern, we revised the image augmentation section's content.

 

  9. Did the authors analyze the statistical significance of the obtained classification results?

Response

Thank you for your patient questions.

In fact, we conducted extensive statistical analysis, and this work provided a solid foundation for the research in this paper.

  1. Height, weight, BMI, left ventricular ejection fraction, brachial artery flow velocity, brachial artery flow, and other independent variables were used to examine their relationship with patients as well as the differences between patients and healthy people. Objectively, the statistical analysis results are not ideal. These variables' sensitivity to patients is insufficient, and there is no significant specificity.
  2. Based on the energy characteristics of the KS signal as determined by wavelet analysis in different frequency bands (energy ratio extraction method reference https://doi.org/10.1155/2022/3226655), we discovered that these signals have a clear correlation with the patient.
  3. We focused on the relationship between brachial artery flow and human left ventricular outflow and found a clear proportional relationship between them: 1.23% (https://doi.org/10.1155/2021/1251199).

We do not list the statistical significance of the analysis results in this paper for the reasons below:

  1. We have conducted corresponding discussions in published articles, which can demonstrate the reliability of the classification results.
  2. Because of the limitations of the article's structure and length, we did not include any relevant content.

However, if you feel the need to add relevant content, we will do so.

 

  10. In lines 314-315, the authors state that the learning rate and batch size were set to the same values for each model. This is not a technically correct approach. Namely, these training parameters should be optimized for each model (including different CNN architecture and different dataset) to allow fair comparison of their performances.

Response

Thank you for your advice. We re-optimized the learning rate and batch size for each deep CNN in order to maximize the performance of each network. Your recommendation improved the overall quality of our manuscript. Thank you very much.
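
The record does not say how this re-optimization was carried out. One common approach is an exhaustive grid search per model, sketched below under stated assumptions: `evaluate` is a hypothetical callable that trains one network with the given settings and returns a validation score (higher is better), and the candidate values are invented for illustration.

```python
from itertools import product


def grid_search(evaluate, learning_rates, batch_sizes):
    """Try every (learning rate, batch size) pair and keep the best one.

    evaluate: callable(learning_rate, batch_size) -> validation score.
    Returns a tuple (best_score, best_learning_rate, best_batch_size).
    """
    best = None
    for learning_rate, batch_size in product(learning_rates, batch_sizes):
        score = evaluate(learning_rate, batch_size)
        if best is None or score > best[0]:
            best = (score, learning_rate, batch_size)
    return best


def toy_evaluate(learning_rate, batch_size):
    """Stand-in for real training: the score peaks at lr=0.001, batch=32."""
    return -abs(learning_rate - 0.001) * 1000 - abs(batch_size - 32) / 32


# In practice the search would be repeated for each architecture and dataset.
best_score, best_lr, best_batch = grid_search(
    toy_evaluate,
    learning_rates=[0.01, 0.001, 0.0001],
    batch_sizes=[16, 32, 64],
)
```

Running the search separately for each architecture and each dataset is what lets every model be compared at its own best settings, which is the point of the reviewer's comment.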

 

  11. Line 344: the AUC of Xception is higher than that of VGG19, despite Xception being ranked as the second best.

Response

As for the referee's concern, this section has been updated.

  12. Is accuracy an appropriate metric for ranking models’ performances in the Discussion section, as the dataset is unbalanced?

Response

Thank you very much for your question; objectively speaking, I was hesitant for a long time about evaluating results with unbalanced data. In fact, I believe that the ROC curve and the AUC value are important bases for measuring the model's performance under unbalanced data. However, I noticed that the majority of the literature focuses on the calculation results of the accuracy value. Inspired by you, I removed the ambiguity and focused more on the ROC curve and AUC value. This section of the manuscript has undergone extensive revision. Thank you once more for your thorough guidance.
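
The concern behind this exchange can be shown with a small sketch on invented data; the numbers below are not from the study. A model that assigns the same low score to every recording reaches high accuracy on an unbalanced set, while its AUC stays at chance level. The AUC here is computed as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions at a fixed decision threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def pairwise_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Invented unbalanced set: 9 negatives and 1 positive.
unbalanced_labels = [0] * 9 + [1]
# A trivial "model" that scores every sample the same.
constant_scores = [0.1] * 10

trivial_accuracy = accuracy(unbalanced_labels, constant_scores)
trivial_auc = pairwise_auc(unbalanced_labels, constant_scores)
```

Here `trivial_accuracy` is 0.9 although the model never detects the positive case, whereas `trivial_auc` is 0.5, which is why the AUC is the safer ranking criterion when classes are unbalanced.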

Author Response File: Author Response.docx

Reviewer 3 Report

1. Summary:
This paper applied deep learning to classify CHF on KS. The authors selected several mainstreamed CNN architectures and compared segmented and unsegmented inputs. From the result, the CNNs performed well on segmented inputs with their experimental data.


2. Strength:
+ This research includes their own database.
+ The paper is well-structured and easy to understand.

3. Weakness:

- The English writing is poor, with many grammatical errors and statements that lack reference to support.

What is QKD? It is not defined. Furthermore, even if an abbreviation is defined in the abstract, it is better to define it again in the main text. The reviewer suggests carefully proofreading this paper.

- The novelty of this research seems insufficient. Specifically, the approaches used are CNN models and PCG features that are commonly used in related research. To the reviewer’s knowledge, the selected features CWT and MFCC are quite small-size inputs. VGG19, ResNet50, and Xception are absolutely overused. Therefore, the contributions are debatable.

- The methodology is not explained in detail. What were the filter settings? What were the feature settings, e.g., window size, MFCC of 13? What was the CWT mother wavelet? How were the data augmented from 365 to 1692? What were the optimizer and max epoch? How were the models pre-trained? Batch size 20 was definitely unreasonable. The reviewer suggests double-checking and adding this key information in the methodology rather than irrelevant basic concepts.

 

 

Author Response

Thank you for your comments concerning our manuscript entitled “Classification of chronic heart failure (HF) using deep learning and the Korotkoff sound”. These comments are extremely helpful and have significantly improved the quality of our manuscript. We thoroughly read and studied your comments. We have revised the manuscript in response to each comment after careful consideration, and we hope the revisions meet with your approval.

 

Here are the responses:

 

  1. The English writing is poor, with many grammatical errors and statements that lack reference to support.

Response

Thank you for your comments. We have made every effort to correct the manuscript's grammatical errors and add as many references to the description as possible.

 

  2. What is QKD? It is not defined. Furthermore, even if an abbreviation is defined in the abstract, it is better to define it again in the main text. The reviewer suggests carefully proofreading this paper.

Response

As for the referee's concern, these issues have been revised in the manuscript.

 

  3. The novelty of this research seems insufficient. Specifically, the approaches used are CNN models and PCG features that are commonly used in related research. To the reviewer’s knowledge, the selected features CWT and MFCC are quite small-size inputs. VGG19, ResNet50, and Xception are absolutely overused. Therefore, the contributions are debatable.

Response

Thank you for your comments. The article employs commonly used CNN models and feature extraction methods such as CWT and MFCC, so it offers no notable theoretical innovation. In fact, there are many similarities between heart sounds and KS, so the analysis of these signals is very similar.

The purpose of this paper is to investigate a more convenient and noninvasive method for early diagnosis of CHF, to improve the detection probability of potential risk patients, and to aid in the prevention and control of CHF. As far as we know, our team is the first to pay attention to the correlation between KS and CHF and put it into practice in the field of CHF prevention and treatment; our work is more focused on expanding the application fields of AI to benefit those in urgent need of assistance.

So far, we have made some progress in studying the statistical patterns of CHF patients and in recognition with traditional machine learning (https://doi.org/10.1155/2021/1251199, https://doi.org/10.1155/2022/3226655). The purpose of this manuscript is to identify a deep learning algorithm that can improve the early detection probability of CHF and then apply that algorithm to model optimization and embedded development.

With your valuable suggestions and our work plan, we still have a lot of work to do in the future, and we believe that some new findings will be published in the near future.

  4. The methodology is not explained in detail. What were the filter settings? What were the feature settings, e.g., window size, MFCC of 13? What was the CWT mother wavelet? How were the data augmented from 365 to 1692? What were the optimizer and max epoch? How were the models pre-trained? Batch size 20 was definitely unreasonable. The reviewer suggests double-checking and adding this key information in the methodology rather than irrelevant basic concepts.

Response

As for the referee's concern, these issues have been revised in the manuscript. Our first draft did indeed only pile up basic concepts, with no description of the relevant details, resulting in an unprofessional article; your recommendations have greatly improved the professionalism and logic of our article. Thank you very much!

Author Response File: Author Response.docx

Reviewer 4 Report

The paper is very interesting, but the main problem comes from the fact that the HF subjects we are dealing with are not precisely characterized and classified. So we do not know precisely the meaning and the potential of the classification the Authors are performing and what kind of Heart Failure (HF) they are identifying. In fact, in the title they are suggesting a "classification of chronic heart failure", but throughout the text they are referring to "an innovative method for pre-diagnosis of HF" (for example, line 83). This is not clear.

To make the meaning and the potential of their work understandable, the Authors should characterize precisely how they assessed both healthy subjects and Heart Failure patients. It is also not clear what they call Chronic Heart Failure (they use the abbreviation CHF in the text, but HCF in Table 1). This abbreviation is very equivocal, since CHF can easily be interpreted as Congestive Heart Failure instead of Chronic Heart Failure. So we suggest using HF (Heart Failure) as the sole abbreviation and term in the Title, Tables, and Text to avoid misunderstanding. HF patients should also be defined according to the ESC 2021 Guidelines as HFrEF, HFmrEF, or HFpEF: McDonagh TA, Metra M, Adamo M, Gardner RS, Baumbach A, Böhm M, et al. 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J. 2021 Sep 21;42(36):3599-3726. The patients should also have BNP and/or NT-proBNP measurements performed at the time of this study. The duration of HF disease should be indicated, as well as current drug treatment, since all of this could influence the results, being specific to the population under study. All this information would also make it possible to analyze and compare the performance of the proposed methodology when applied to any new, different HF population, leading to a more generalizable approach. The presence and possible influence of comorbidities, such as diabetes, hypertension, coronary artery disease, renal failure, anemia, and so on, should also be indicated. It should also be indicated whether all the patients were consecutive, and what their body mass index (BMI) and sex were. In fact, we also suggest a separate analysis according to BMI and sex, to assess the influence of weight and breast on the KS recording.

The first part of the introduction appears too long and not pertinent to this study. Throughout the paper, reference numbers should be indicated in parentheses (1,2....) to help the reader.

Lines 59-62 are not clear: try to better explain.

Paragraph 2. Related works. It could be perhaps part of the discussion. In any case it appears too long and detailed. 

Paragraph 3. Materials and methods: It should be specified in detail what kind of sound recording machine was used, its brand, and its technical characteristics, standardization, and settings. It should also be described how the recording was actually performed in a representative subject, along with any tips and tricks for optimizing the recording. It is also not clear on which storage medium (hard disc?) the recording was actually stored and how this recording was then interfaced with the deep learning system. The technical characteristics of the computer hardware should also be reported.

Lines 171-173: this is not clear.

Line 171: why can you assume "that the silence period has no effect on CHF classification"?

In the text, "table..." should be "Table...", for example in lines 392 and 398.

The Discussion should take into account all the suggestions reported above. The Conclusions should also be changed accordingly, mostly lines 449-455.

Author Response

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Classification of chronic heart failure (CHF) using deep learning and the Korotkoff sound”. These comments are all valuable and very helpful for revising and improving our paper, as well as providing important guidance for our research. We have studied these comments carefully and made corrections for each one, which we hope meet with approval.

 

Here are the responses:

 

  1. The paper is very interesting, but the main problem comes from the fact that the HF subjects we are dealing with are not precisely characterized and classified. So we do not know precisely the meaning and the potential of the classification the Authors are performing and what kind of Heart Failure (HF) they are identifying. In fact, in the title they are suggesting a "classification of chronic heart failure", but throughout the text they are referring to "an innovative method for pre-diagnosis of HF" (for example, line 83). This is not clear.

Response

Your suggestion is very pertinent and instructive. HF can only be diagnosed with the assistance of professional equipment and a professional medical team, which is a problem we face in our work and a difficult problem that we strive to solve. Our objective is to find a more convenient marker for the pre-diagnosis and initial screening of HF, and to prompt at-risk groups to seek a hospital examination in time. At the same time, as a physiological signal measured far from the heart, the KS signal is much less rich than heart sound or ECG, which limits its use to primary screening. We apologize for any confusion caused by the article's title, which we have changed accordingly.

 

  2. To make the meaning and the potential of their work understandable, the Authors should characterize precisely how they assessed both healthy subjects and Heart Failure patients. It is also not clear what they call Chronic Heart Failure (they use the abbreviation CHF in the text, but HCF in Table 1). This abbreviation is very equivocal, since CHF can easily be interpreted as Congestive Heart Failure instead of Chronic Heart Failure. So we suggest using HF (Heart Failure) as the sole abbreviation and term in the Title, Tables, and Text to avoid misunderstanding. HF patients should also be defined according to the ESC 2021 Guidelines as HFrEF, HFmrEF, or HFpEF: McDonagh TA, Metra M, Adamo M, Gardner RS, Baumbach A, Böhm M, et al. 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J. 2021 Sep 21;42(36):3599-3726. The patients should also have BNP and/or NT-proBNP measurements performed at the time of this study. The duration of HF disease should be indicated, as well as current drug treatment, since all of this could influence the results, being specific to the population under study. All this information would also make it possible to analyze and compare the performance of the proposed methodology when applied to any new, different HF population, leading to a more generalizable approach. The presence and possible influence of comorbidities, such as diabetes, hypertension, coronary artery disease, renal failure, anemia, and so on, should also be indicated. It should also be indicated whether all the patients were consecutive, and what their body mass index (BMI) and sex were. In fact, we also suggest a separate analysis according to BMI and sex, to assess the influence of weight and breast on the KS recording.

Response

Thank you very much for your valuable advice. In fact, we carry out the screening process for patients and ordinary people in the manner you suggest. A team of professional physicians thoroughly evaluated all volunteers, including height, weight, age, gender, BMI, left ventricular ejection fraction, underlying disease, and NT-proBNP. When the experimental data was collected, all of the patients had just been admitted to the hospital and were not receiving drug treatment. The lack of a thorough investigation of how medications and underlying illnesses affect pre-diagnosis outcomes is due in part to our current concerns about the method's viability and our slight concern that it may not be as sensitive as we would want. This will undoubtedly play a significant role in our ongoing studies.

Additionally, patients are followed continuously, and our primary subjects are individuals with chronic heart failure, whose left ventricular ejection fraction has consistently been around 50%. In fact, LVEF is our primary reference for selecting patient volunteers, and the method for obtaining it is described in the revised manuscript.

 

  3. The first part of the introduction appears too long and not pertinent to this study. Throughout the paper, reference numbers should be indicated in parentheses (1,2....) to help the reader.

Response

As for the referee's concern, the introduction and reference list have been revised.

 

  4. Lines 59-62 are not clear: try to better explain.

Response

As for the referee's concern, we have revised lines 59-62.

 

  5. Related works. It could be perhaps part of the discussion. In any case it appears too long and detailed.

Response

As for the referee's concern, we have revised related works.

  6. Materials and methods: It should be specified in detail what kind of sound recording machine was used, its brand, and its technical characteristics, standardization, and settings. It should also be described how the recording was actually performed in a representative subject, along with any tips and tricks for optimizing the recording. It is also not clear on which storage medium (hard disc?) the recording was actually stored and how this recording was then interfaced with the deep learning system. The technical characteristics of the computer hardware should also be reported.

Response

As for the referee's concern, these sections have been thoroughly revised.

  7. Lines 171-173: this is not clear. Line 171: why can you assume "that the silence period has no effect on CHF classification"?

Response

Thank you for your professional and helpful questions.

In fact, this is just a hypothesis in our experiment, and it derives from the KS blood pressure test. In KS blood pressure monitoring, the first audible sound is usually taken to mark systolic blood pressure, and the last audible sound is usually taken to mark diastolic blood pressure. The sound during the silent period, on the other hand, receives little attention. In this paper's KS segmentation experiment, we simply assumed that only the KS audible to the human ear is useful, so we segmented out the sound during the silent period (a common-sense assumption). Furthermore, from the standpoint of signal segmentation, it is frequently difficult to accurately recognize weak-amplitude KS signals, which is another consideration (a technical limitation).

As you pointed out, our hypothesis is therefore not rigorous. The results demonstrate that the KS signal segmentation technique is insufficient and that sound in the silent period is still very useful for HF recognition.
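
To make the segmentation idea concrete, the sketch below keeps only the span between the first and the last sample whose amplitude reaches an audibility threshold, discarding the silent periods before and after. It is an assumption-laden illustration rather than the authors' procedure: the record does not describe their algorithm, and the function name `trim_silent_periods`, the simple amplitude threshold, and the sample values are all hypothetical.

```python
def trim_silent_periods(samples, threshold):
    """Keep the span from the first to the last 'audible' sample.

    samples: list of signal amplitudes.
    threshold: minimum absolute amplitude treated as audible (assumed).
    Returns an empty list if no sample reaches the threshold.
    """
    audible = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not audible:
        return []
    first, last = audible[0], audible[-1]
    # Quiet samples between two audible ones are kept; only the leading
    # and trailing silence is removed.
    return samples[first:last + 1]


# Invented amplitudes, for illustration only.
recording = [0.0, 0.01, 0.5, -0.02, 0.7, 0.0]
segment = trim_silent_periods(recording, threshold=0.1)
```

A fixed threshold of this kind also illustrates the difficulty mentioned in the response: weak-amplitude KS components that fall below the threshold are lost along with the silence.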

  8. In the text, "table..." should be "Table...", for example in lines 392 and 398.

Response

As for the referee's concern, the table's name has been modified.

  9. The Discussion should take into account all the suggestions reported above. The Conclusions should also be changed accordingly, mostly lines 449-455.

Response

As for the referee's concern, the Discussion and Conclusions have been rewritten.

 

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors have addressed most of my comments, and the manuscript has been significantly improved after revisions.

The only comment that has not been adequately addressed is putting reference numbers within the manuscript's text into square brackets, e.g., [1], [2-3],... (according to the official MDPI template).

After this comment is addressed, I believe the manuscript is ready for publication.

Author Response

As for the referee's concern, the format of reference numbers has been revised. (Refer to line 47, line 59, lines 65-66, line 70, and line 268.)

Author Response File: Author Response.docx

Reviewer 3 Report

Thanks for the revision. This reviewer still thinks that the manuscript needs major revision.

1.       In line 33, ‘women in half of the world and men in 3/4 of the world face premature death from cardiovascular disease’, this statement needs reference to support.

2.       It is suggested to briefly describe what is Korotkoff sound and the difference to the heart sound (HS), and why KS is not HS in this study.

3.       Abbreviation issues. Please define the ‘KS’ in the main body before using it. QKD is short for ‘timing of KS’? It seems not matched.  Please define ‘ST/UST’ before Figure 1, or readers can only know it in Results.

4.       In line 57, you claimed that traditional ML requires signal processing and relies on feature extraction, DL can avoid these issues. However, in the paper, signal processing and feature extraction were all applied with DL, which seems contradictory.

5.       In line 68, your statement after the survey is not convincing to the reviewer.

6.       It is not very clear to the reviewer what the segmentation exactly mean in your methodology. Normally, in heart sound segmentation, it is to break the signal into heart cycles. Did you do the same? Does the unsegmented mean raw data? It is confusing what is the difference between segmented and unsegmented.

7.       It is quite necessary to describe what is the data acquisition duration. Besides, it is better to include a figure for the acquisition settings (how is the participant positioned, where the device is placed, etc).

8.       In line 230, the statement is not correct. CNN supports 1D input.

9.       What is the output size of CWT or MFCC features before resizing to 224×224? I assume it is magnified over 20 times, which is actually information redundancy. Properly adjusting the input size is fine, but as I assumed, it is unreasonable.

10.   In the reviewer’s opinion, if you want to compare the CNN models, the settings should be as consistent as possible. The pre-trained or transfer learning will directly affect the performance due to different CNN capacities. Besides, Batch size, learning rate and max epoch are all different for the CNN models, which makes the comparison not convincing.

 

11.   It is good to compare your study with previous research in the Discussion, however, you work on different signals and different tasks, which is a bit confusing in your comparison.

Author Response

Thank you for your comments concerning our manuscript entitled “Classification of chronic heart failure (HF) using deep learning and the Korotkoff sound”. Your comments on this article are professional and detailed, and we have benefited greatly from them. We have read and studied your comments thoroughly. The manuscript has been revised in response to each comment after careful consideration, and we hope the revisions meet with your approval.

 

Here are the responses:

 

  1. In line 33, ‘women in half of the world and men in 3/4 of the world face premature death from cardiovascular disease’, this statement needs reference to support.

Response

Thank you for your comments. This was an inaccurate statement; it has been corrected, and the corresponding references have been added. (Refer to lines 32-34)

 

  2. It is suggested to briefly describe what is Korotkoff sound and the difference to the heart sound (HS), and why KS is not HS in this study.

Response

As for the referee's concern, these contents have been revised in the manuscript. (Refer to lines 75-83)

 

3 Abbreviation issues. Please define the ‘KS’ in the main body before using it. QKD is short for ‘timing of KS’? It seems not matched.  Please define ‘ST/UST’ before Figure 1, or readers can only know it in Results.

Response

Thank you for your suggestion; we have made changes to the manuscript. (Refer to line 42, lines 48-52, and lines 159-161)

4 In line 57, you claimed that traditional ML requires signal processing and relies on feature extraction, DL can avoid these issues. However, in the paper, signal processing and feature extraction were all applied with DL, which seems contradictory.

Response

Thank you for your inquiry. This section of the text has also piqued my interest. In fact, to achieve satisfactory classification results in traditional machine learning, we must manually extract dozens or even hundreds of signal features, such as energy features, wavelet features, entropy features, MFCC features, statistical features, and so on. This process may be accompanied by a large number of correlation analyses to reduce feature matrix redundancy and improve analysis efficiency.

The main goal of the CWT and MFCC feature extraction in this paper is to convert one-dimensional signals into two-dimensional representations so that the classification task can be performed with a CNN. These feature extraction steps are far simpler than the feature engineering that traditional ML requires, and they save us a significant amount of pre-processing time.

5 In line 68, your statement after the survey is not convincing to the reviewer.

Response

As for the referee's concern, we revised the manuscript. (Refer to lines 72-74)

6 It is not very clear to the reviewer what the segmentation exactly mean in your methodology. Normally, in heart sound segmentation, it is to break the signal into heart cycles. Did you do the same? Does the unsegmented mean raw data? It is confusing what is the difference between segmented and unsegmented.

Response

Thank you for your inquiry. The goal of KS signal segmentation differs from that of HS segmentation. Overall, HS is a relatively stable set of signals, with relatively small differences between the sounds of successive heartbeat cycles, so the heart sound can be divided into heartbeat cycles for comparative analysis. KS, however, is a collection of pulse signals whose strength and characteristics vary greatly over time. Moreover, the duration of the KS signal (14-25 s) and the number of KS pulses (6-20) vary greatly between individuals.

The KS signal is not always present. It appears only under certain cuff pressures. It takes approximately 4-15 seconds from the start of cuff pressure release to the first KS signal. There is almost no KS sound during this period, which we call the silent period. The purpose of KS segmentation is to separate the effective KS pulse signals from the silent period, removing weak signals and other noise. This is useful in traditional feature extraction because it can improve feature correlation and sensitivity. In this paper, we did not perform extensive feature analysis on the segmented KS signals; instead, we fed them to the DL model to examine the effect of signal segmentation on DL classification.

The unsegmented KS signal, as you mentioned, is the original signal, and we retain almost all of the information recorded during the KS test; that is, the signals during the KS silent period are treated as valid signals. Through comparative experiments, we can examine the effect of the silent period signal on HF classification.
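
The difference between the segmented and unsegmented inputs described above can be illustrated with a minimal sketch. It assumes a simple short-time energy threshold to locate the first and last audible KS pulses; the function name `trim_silent_period`, the frame length, and the threshold are hypothetical choices for illustration and are not the segmentation method used in the manuscript.

```python
def trim_silent_period(signal, frame_len=4, threshold=0.1):
    """Return the part of `signal` between the first and last frame whose
    mean energy exceeds `threshold`.

    Illustrative assumption only: a plain energy threshold, not the
    segmentation algorithm of the manuscript.
    """
    # Split into non-overlapping frames; the last frame may be shorter.
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    # Mean energy of each frame.
    energies = [sum(x * x for x in frame) / len(frame) for frame in frames]
    # Indices of frames considered "audible".
    active = [i for i, energy in enumerate(energies) if energy > threshold]
    if not active:
        return []
    start = active[0] * frame_len
    end = min((active[-1] + 1) * frame_len, len(signal))
    return signal[start:end]


# Toy recording: silent period, two pulses separated by a quiet gap, silence.
unsegmented = (
    [0.0] * 8
    + [1.0, -1.0, 0.5, -0.5]
    + [0.0] * 4
    + [0.8, -0.8, 0.4, -0.4]
    + [0.0] * 8
)
segmented = trim_silent_period(unsegmented)
```

In this toy case the unsegmented input keeps the leading and trailing quiet samples, whereas the segmented output keeps only the span from the first to the last above-threshold frame.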

7 It is quite necessary to describe what is the data acquisition duration. Besides, it is better to include a figure for the acquisition settings (how is the participant positioned, where the device is placed, etc).

Response

Thank you for your suggestion; the relevant description and figure have been added to the revised manuscript. (Refer to lines 199-201, lines 205-206)

8 In line 230, the statement is not correct. CNN supports 1D input.

Response

Thank you for your careful guidance; I have revised the manuscript. (Refer to lines 263-264)

In fact, we also tried a 1D CNN, but the results were less than ideal. We believe that the primary causes are the inconsistent length of the KS signals and their strongly non-stationary characteristics.

 

9 What is the output size of CWT or MFCC features before resizing to 224×224? I assume it is magnified over 20 times, which is actually information redundancy. Properly adjusting the input size is fine, but as I assumed, it is unreasonable.

Response

Thank you very much for your thorough feedback on our manuscript. The number of center frequencies in the CWT feature calculation is approximately 600, and the signal length is approximately 42,000; thus, the size of the CWT coefficient matrix is approximately [600×42000]. Because the length of each signal varies, the size of the wavelet coefficient matrix differs slightly between signals. The size of the MFCC matrix is [167×26], where 26 is the number of filters and 167 is the number of coefficients for each filter. We use unified [224×224] images primarily to reduce computational load and improve computational efficiency; however, as you mentioned, such processing also faces the problem of data redundancy.
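
The scaling implied by these sizes can be checked with simple arithmetic. The sketch below uses the approximate matrix sizes quoted above and the [224×224] target; the helper name `resize_factors` is hypothetical and is introduced only for this illustration.

```python
def resize_factors(rows, cols, target=224):
    """Return (row scale, column scale, element-count ratio) for resizing
    a rows x cols matrix to target x target."""
    row_scale = target / rows
    col_scale = target / cols
    count_ratio = (target * target) / (rows * cols)
    return row_scale, col_scale, count_ratio


cwt = resize_factors(600, 42000)  # approximate CWT coefficient matrix
mfcc = resize_factors(167, 26)    # MFCC matrix
```

With these figures, the CWT matrix is reduced along both axes (the element count falls to roughly 0.2% of the original), while the MFCC matrix is enlarged along both axes, by a factor of about 11.6 in element count. Only the MFCC input is therefore magnified by the resizing step.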

10 In the reviewer’s opinion, if you want to compare the CNN models, the settings should be as consistent as possible. The pre-trained or transfer learning will directly affect the performance due to different CNN capacities. Besides, Batch size, learning rate and max epoch are all different for the CNN models, which makes the comparison not convincing.

Response

We sincerely appreciate your advice; this issue has troubled me for a while.

At first, I used the same training parameters for all models: learning rate = 1e-4, batch size = 20, max epoch = 10. As you suggest, this limits the independent variables across models as much as possible and keeps the comparison fair. However, during the first round of review, some experts suggested that different parameters be used for different models to ensure the fairness of the comparison. After careful consideration, I re-optimized each model, with the goal of achieving the best classification performance for each one. As a result, different models use different parameters (as shown in the revised version), and the reported results are the best performance that each model achieved on these datasets.

Your suggestion makes me realize that both methods may be rational, which leaves me perplexed. I have a strong impression that you have extensive experience in ML and DL research based on your many high-quality suggestions for this article. I hope to receive additional assistance from you; thank you very much!

11 It is good to compare your study with previous research in the Discussion, however, you work on different signals and different tasks, which is a bit confusing in your comparison.

Response

Thank you for taking the time to read our article thoroughly. It is true that the signals differ: the listed studies use HS as markers, whereas we use KS signals. However, both are sound signals produced by the human body. At the same time, the objectives are the same: classification or pre-diagnosis of heart failure. The ultimate goal is to encourage the at-risk population to seek medical attention as soon as possible in order to avoid more dangerous situations. This is why I compare the methods in these studies to our own.

Author Response File: Author Response.docx

Reviewer 4 Report

References are written in a confusing manner throughout the text. They should be more clearly separated from the text (parentheses?). If two or more references are reported, this is even more confusing, because the numbers are written sequentially. See for example 1819 or 2324, instead of (18,19) and (23,24).

Lines 154-156: the Authors should characterize precisely both how they assessed and defined healthy subjects, and also Heart Failure patients. It should be reported if HF patients have been defined according to the ESC 2021 Guidelines as HFrEF, HFmrEF, HFpEF.

McDonagh TA, Metra M, Adamo M, Gardner RS, Baumbach A, Böhm M, et al. 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J. 2021 Sep 21;42(36):3599-3726.

The guidelines used by the Authors should be reported in the references. The Authors should also report BNP and/or NTpro-BNP measurements performed at the time of this study. The duration of HF disease should be indicated, as well as the actual drug treatment, since all of this could influence the results, being specific to the population under study. All this information would also allow the performance of the proposed methodology to be analyzed and compared when it is applied to a different HF population. The presence and possible influence of comorbidities, such as diabetes, hypertension, coronary artery disease, renal failure, and anemia, should also be indicated. It should also be indicated whether all the patients were consecutive. We also suggest a separate analysis by BMI and sex to assess the influence of weight and breast on the KS recording. If this information is not available, this should be clearly acknowledged in the text as an important limitation.

In the conclusions it should be also reported that this is a preliminary study in a sample population and that the method should be tested  and validated again prospectively in a test population in a blind manner. 

 

Author Response

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Classification of chronic heart failure (CHF) using deep learning and the Korotkoff sound”. Your comments on our manuscript have inspired us greatly, allowing us to better understand our research topic and improve the quality of our article. We have studied these comments carefully and made corrections for each one, which we hope meet with your approval.

 

Here are the responses:

 

  1. References are written in a confusing manner throughout the text. They should be more clearly separated from the text (parentheses?). If two or more references are reported, this is even more confusing, because the numbers are written sequentially. See for example 1819 or 2324, instead of (18,19) and (23,24).

Response

Thank you for your detailed comments; we have updated the reference format. (Refer to line 47, line 59, lines 65-66, line 70, line 268)

 

  2. Lines 154-156: the Authors should characterize precisely both how they assessed and defined healthy subjects, and also Heart Failure patients. It should be reported if HF patients have been defined according to the ESC 2021 Guidelines as HFrEF, HFmrEF, HFpEF. McDonagh TA, Metra M, Adamo M, Gardner RS, Baumbach A, Böhm M, et al. 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J. 2021 Sep 21;42(36):3599-3726.

Response

Thank you once more for your patient guidance; the omission in this section was our oversight. The cardiac status was evaluated in accordance with the ESC 2021 Guidelines. The manuscript has since been revised. (Refer to lines 170-171)

 

  3. The guidelines used by the Authors should be reported in the references. The Authors should also report BNP and/or NTpro-BNP measurements performed at the time of this study. The duration of HF disease should be indicated, as well as the actual drug treatment, since all of this could influence the results, being specific to the population under study. All this information would also allow the performance of the proposed methodology to be analyzed and compared when it is applied to a different HF population. The presence and possible influence of comorbidities, such as diabetes, hypertension, coronary artery disease, renal failure, and anemia, should also be indicated. It should also be indicated whether all the patients were consecutive. We also suggest a separate analysis by BMI and sex to assess the influence of weight and breast on the KS recording. If this information is not available, this should be clearly acknowledged in the text as an important limitation.

Response

Thank you for your courteous and patient reminder. The NTpro-BNP test results in Table 1 (refer to lines 183-184), HF patient complications, and the influence of BMI and sex (weight and breast) on the KS recording were not thoroughly considered. We have emphasized the significance of this in the manuscript, making it clear that it will be a key direction for our future study. (Refer to lines 207-213)

 

  4. In the conclusions it should be also reported that this is a preliminary study in a sample population and that the method should be tested and validated again prospectively in a test population in a blind manner.

Response

As for the referee's concern, the Discussion and Conclusions have been rewritten. (Refer to lines 525-531)

Author Response File: Author Response.docx
