
Open Access Article

Peer-Review Record

Synthesis of Normal Heart Sounds Using Generative Adversarial Networks and Empirical Wavelet Transform

Appl. Sci. 2020, 10(19), 7003; https://doi.org/10.3390/app10197003

by Pedro Narváez* and Winston S. Percybrooks

Reviewer 1: Anonymous

Reviewer 2: Ganbayar Batchuluun


Submission received: 31 July 2020 / Revised: 26 August 2020 / Accepted: 12 September 2020 / Published: 8 October 2020

(This article belongs to the Special Issue Deep Learning for Applications in Acoustics: Modeling, Synthesis, and Listening)

Round 1

Reviewer 1 Report

The authors consider the problem of data augmentation of cardiac signals acquired using a stethoscope.
They propose to use generative adversarial networks (GANs) followed by the empirical wavelet transform (EWT).
Only examples of normal (non-pathological) signals were generated.
The authors claim that, with the use of EWT, a smaller number of epochs is needed to train the GAN. According to the authors, this is because the GAN's generator is able to generate a noisy version of a plausible cardiac signal.
The quality of the signals generated by the GAN was assessed by comparing them with real signals and measuring the average MCD (mel-cepstral distortion). Additionally, an experiment was performed in which 50 generated examples were classified using various classifiers proposed in other works. The authors checked how many of these generated samples were correctly classified as normal cardiac signals.
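For readers unfamiliar with the metric, the sketch below computes one commonly used form of mel-cepstral distortion between two cepstral vectors, MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), and averages it over a set of generated signals. This is an illustration under assumptions, not the authors' Eq. 3: the constant, whether c_0 is excluded, and pairing every generated signal with every real one (the hypothetical `average_mcd`) all vary across the literature.

```python
import numpy as np

def mcd(c_ref: np.ndarray, c_syn: np.ndarray, skip_c0: bool = True) -> float:
    """Mel-cepstral distortion in dB between two cepstral vectors (one common definition)."""
    if skip_c0:
        # c_0 mostly reflects overall energy, so it is often left out
        c_ref, c_syn = c_ref[1:], c_syn[1:]
    diff = c_ref - c_syn
    return float(10.0 / np.log(10.0) * np.sqrt(2.0 * np.sum(diff ** 2)))

def average_mcd(generated: np.ndarray, real: np.ndarray) -> float:
    """Mean MCD over every (generated, real) pair; each row is one cepstral vector.
    Using all pairs is an assumption made for illustration."""
    scores = [mcd(r, g) for g in generated for r in real]
    return float(np.mean(scores))
```

Lower values indicate generated signals whose spectral envelopes are closer to the real ones.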

The paper has the following flaws:
1) In the Introduction, it is mentioned that data augmentation is needed to enlarge training datasets and to improve the generalization capabilities of machine learning models. However, in the experimental section, the synthesized signals were compared with 400 real signals and tested for whether they are correctly classified as normal cardiac signals. These experiments cannot show that data augmentation improves the generalization of the classifiers. The authors did not try to experimentally exclude the possibility that the signals used to train the proposed model were memorized. In the context of the general aim, the work is unfinished.
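One simple way to probe the memorization concern (not a procedure from the manuscript) is to compute, for each generated signal, the distance to its nearest training signal and compare it with the distances between distinct training signals: generated samples that sit much closer to a training example than real recordings sit to each other suggest copying. The sketch assumes equal-length arrays (raw signals or fixed-length feature vectors such as MFCCs) and plain Euclidean distance; the function names are hypothetical.

```python
import numpy as np

def nearest_training_distances(generated: np.ndarray, training: np.ndarray) -> np.ndarray:
    """For each row of `generated`, Euclidean distance to the closest row of `training`."""
    # Broadcast to shape (n_generated, n_training, length); fine for small sets
    diffs = generated[:, None, :] - training[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    return dists.min(axis=1)

def within_training_distances(training: np.ndarray) -> np.ndarray:
    """Baseline: distance from each training row to its closest *other* training row."""
    diffs = training[:, None, :] - training[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore each row's zero distance to itself
    return dists.min(axis=1)
```

Comparing the two distributions (for example, their medians) gives a rough indication; near-zero nearest-training distances would warrant closer inspection.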

2) The use of EWT is not sufficiently justified. The paper contains one example in which it can be visually verified that the synthesized signal after applying EWT is more similar to the real signal than the one obtained without EWT. There is no discussion of why a GAN trained for a small number of epochs can generate a signal that contains a typical cardiac signal plus noise.

3) Information about the experiments is missing. For example, what dataset was used to train the GAN? What parameters were used to extract the MFCCs? Additionally, information about the experiments should be moved to the section describing the experiments.

Line 19: Please explain more clearly which model is compared.
Line 35: these names do not need to start with capital letters.
Line 55: Timing -> timing

Introduction: please clarify whether you synthesize only normal signals.
Line 101: "As a result of" what is the discriminator able to correctly classify the input data as real or false? Please rewrite this sentence (the first sentence of the paragraph) to make it more readable.

113: I think you should state that D(x) represents the probability, estimated by the discriminator, that x comes from the real data.
114: Please state what \rho_{data}(x) means.
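For reference, the standard GAN minimax objective is shown below, assuming the manuscript follows it. Here D(x) is the probability, estimated by the discriminator, that x comes from the real data; \rho_{data}(x) is the distribution of the real training signals; and \rho_z(z), the distribution of the generator's input noise z, is notation introduced here for illustration.

```latex
\min_G \max_D V(D,G) =
  \mathbb{E}_{x \sim \rho_{data}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim \rho_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```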

130-148: This is mainly information about the experimental setup. I suggest moving it to the "Experiment" section.
130: Please explain which noise you are referring to. Is it the noise provided at the input of the generator? In that case, I suggest clarifying that a 2000-dimensional vector is provided at the input of the generator. What were the mean and the standard deviation of the distribution?
141: It is very important to state which datasets the signals were taken from. What procedure was used to select the signals from these datasets?
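As a concrete illustration of what such a specification covers, the sketch below draws the generator input as a batch of 2000-dimensional vectors whose mean and standard deviation are explicit parameters. The Gaussian distribution and the default values (mean 0, standard deviation 1) are assumptions, not values from the manuscript; `sample_latent` is a hypothetical helper.

```python
import numpy as np

LATENT_DIM = 2000  # generator input size mentioned in the comment above

def sample_latent(batch_size: int, mean: float = 0.0, std: float = 1.0, seed=None) -> np.ndarray:
    """Draw generator input noise z ~ N(mean, std^2), one LATENT_DIM vector per row.
    The default mean and std are placeholders, not the authors' settings."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mean, scale=std, size=(batch_size, LATENT_DIM))
```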

148: Please provide more information about the noise source.

167-171: Why do you need to model frequencies above 200 Hz if there are no heart sounds there?

178: Why do you illustrate how EWT works using Gaussian noise? Please explain.
219: Which system of ordinary differential equations? Please clarify.
Table 3: Please state the units of f_i.

230: There is no information about the MFCC parameters, such as:
framing (hop size, window length),
the applied window function,
the number of points in the FFT,
the number of mel filters,
how the DCT was applied.

Eq. 3: Did you use only one frame spanning the whole signal in the MFCC calculation? This equation suggests that this is the case.
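To make the requested parameters concrete, the sketch below computes MFCCs with each choice listed above exposed as an argument: frame length and hop, window function, FFT size, number of mel filters, and the DCT (here an orthonormal DCT-II). Setting the frame length equal to the signal length yields the single frame spanning the whole signal that Eq. 3 appears to imply. All defaults are hypothetical placeholders, not the authors' settings, and the filterbank uses the HTK-style mel formula, one of several conventions.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters from 0 Hz to sr/2, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / (right - center)
    return fb

def dct_ii_matrix(n):
    """Orthonormal DCT-II basis, shape (n, n); row k is the k-th cosine."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    mat[0] /= np.sqrt(2.0)
    return mat

def mfcc(signal, sr, frame_length=1024, hop_length=256, window="hamming",
         n_fft=1024, n_mels=40, n_mfcc=13):
    """MFCCs of a 1-D signal, shape (n_frames, n_mfcc)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_length:
        signal = np.pad(signal, (0, frame_length - len(signal)))
    # Framing: overlapping frames of frame_length samples, hop_length apart
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([signal[i * hop_length:i * hop_length + frame_length]
                       for i in range(n_frames)])
    win = {"hamming": np.hamming, "hann": np.hanning}[window](frame_length)
    power = np.abs(np.fft.rfft(frames * win, n=n_fft, axis=-1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # DCT across the mel bands, keeping the first n_mfcc coefficients
    return log_mel @ dct_ii_matrix(n_mels)[:n_mfcc].T
```

For example, `mfcc(x, sr, frame_length=len(x), hop_length=len(x), n_fft=len(x))` returns a single row, which is the whole-signal case the question about Eq. 3 refers to.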

Line 244: Please explain what you mean by "normalizing of duration". Were the signals time-aligned in some way? If so, how was this done?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

A method for generating heart sounds is proposed. The method is composed of a GAN and the empirical wavelet transform. I suggest that the authors revise the paper according to my comments.

I would like Fig. 1 to include axes, and its caption should read "the wave or curve or signal or data of normal heart sound" or another appropriate explanation.
The phrase "Proposed method" used in Tables 1 and 2 is confusing. "Proposed method" should refer only to the method proposed in this paper; the other methods should be described as "previous methods" or "existing methods". The captions of Tables 1 and 2 should be corrected accordingly. In addition, it would be better to include the proposed method in Table 2 as well, for comparison.
Perhaps I am confused, but the following sentence says that EWT decreases the number of training epochs and the computational cost of the GAN: "Additionally, a denoising algorithm is implemented using the Empirical Wavelet Transform (EWT), allowing a decrease in the number of epochs and the computational cost that the GAN model requires." However, in Fig. 3, EWT is not performed during the training process of the GAN but after the GAN model. So if EWT is performed after the GAN process finishes, how does EWT decrease the computational cost of the GAN?
In the figures, the authors wrote "Padding: same". This looks like Keras API notation. The authors should explain these representations, because "Padding: same" is not easy for readers to understand.
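For readers unfamiliar with this notation, the padding setting of a Keras convolution layer only determines the output length. The sketch below follows the output-length rules of Keras/TensorFlow 1-D convolutions: with "same", the input is zero-padded so the output length is ceil(n / stride), equal to the input length when the stride is 1; with "valid", no padding is added and the output length is floor((n - kernel) / stride) + 1. Dilation is ignored, `conv1d_output_length` is a hypothetical helper, and the sizes in the usage lines are illustrative, not the paper's layer sizes.

```python
import math

def conv1d_output_length(n: int, kernel_size: int, stride: int = 1, padding: str = "same") -> int:
    """Output length of a 1-D convolution under Keras/TensorFlow padding rules (no dilation)."""
    if padding == "same":
        # Zero-padding is added so that every stride position yields an output
        return math.ceil(n / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input
        return (n - kernel_size) // stride + 1
    raise ValueError(f"unsupported padding: {padding!r}")

print(conv1d_output_length(2000, 5, 1, "same"))   # 2000
print(conv1d_output_length(2000, 5, 1, "valid"))  # 1996
```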
The sentence "Subsequently, the difference between generator and discriminator losses is analyzed. If this difference is greater than 0.5," says that the switch is activated if the difference between G_loss and D_loss is greater than 0.5. This means that the switch is activated whenever either loss exceeds the other by more than 0.5. However, in Fig. 4 the condition is written as "Loss_G>>Loss_D", where ">>" means "much greater than". This means that the switch is activated only if Loss_G is greater than Loss_D, not when Loss_D is greater than Loss_G.
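The two readings of the switching condition can be contrasted in a few lines. The sketch assumes the losses are scalar values compared once per training step and interprets ">>" as "greater by more than the threshold"; the 0.5 threshold comes from the quoted sentence, while the function names are hypothetical.

```python
def switch_if_gap(loss_g: float, loss_d: float, threshold: float = 0.5) -> bool:
    """Reading of the text: switch when the losses differ by more than the threshold, in either direction."""
    return abs(loss_g - loss_d) > threshold

def switch_if_generator_behind(loss_g: float, loss_d: float, threshold: float = 0.5) -> bool:
    """Reading of Fig. 4 (Loss_G >> Loss_D): switch only when the generator loss is the larger one."""
    return loss_g - loss_d > threshold

# The two readings disagree when the discriminator loss is the larger one.
print(switch_if_gap(0.2, 1.0))               # True
print(switch_if_generator_behind(0.2, 1.0))  # False
```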
In Fig. 8, please include an original or target signal for comparison; otherwise, it is difficult to judge whether the results are good or bad. Please do the same for all results.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Reply to comment 3:

So you mean that Fig. 3 is a flowchart of the training phase. For clarity, please change the flowchart or its caption, which is still confusing for readers, because the flowchart does not show that EWT is included in the training phase. Please revise Fig. 3.
