Singing-Voice Timbre Evaluations Based on Transfer Learning
Round 1
Reviewer 1 Report
This paper presents a transfer learning model to evaluate singing voice timbre. It can be accepted for publication once the following corrections are addressed:
1- Section 1, "Introduction", is lengthy and reads like a research proposal. It would be better to merge Sections 1.1, 1.2 and 1.3 into one section. Also, the paper's contributions should be listed at the end of the Introduction, before the paper organization.
2- Be consistent in terminology: choose one of "this research", "this study", "this paper" or "this article" and use it throughout.
3- In Section 3, "Materials and Data Preprocessing", it would be better to include a diagram of the research methodology. It would show readers the systematic phases followed in this paper. This section also requires more elaboration.
4- Please limit every figure caption to a single sentence (see Figure 4, for example).
5- Please share your dataset publicly (e.g., on Kaggle or GitHub).
7- Please try to avoid personal pronouns in academic writing; I noticed many instances of “we”.
8- “The dataset was divided into a ratio of 80% for the training set and 10% for validation and testing sets each”? 80% and 10%: what about the remaining records of the dataset?
9- The experiment was run for 200 epochs only. What happens with more or fewer epochs?
10- Showing results for different activation functions and learning rates would make a significant contribution to your paper.
11- Compare the experimental results with existing methods.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
This manuscript is concerned with the automatic evaluation of singing voice timbre, building upon instrument timbre evaluation.
The authors outline their main contributions as:
1) Constructing a singing voice timbre evaluation architecture.
2) Discussing the applicability of instrument timbre evaluation knowledge to machine learning of singing voice timbre on a theoretical level and, further, conducting experiments to validate the theory using a transfer learning model.
However, the theoretical level is barely touched upon; the work is more consistent with what the authors themselves describe as follows:
"This paper explores the association between instrument timbre and singing voice timbre FROM THE PERSPECTIVE OF EXPERIMENTAL ANALYSIS, hoping to provide RESEARCH INSPIRATION for future timbre studies and related auditory-type tasks."
What is "vocal voice timbre"?
Please explain the rationale behind the following statement of yours: "if different musical instruments can produce similar timbre auditory effects, perhaps the timbre evaluation metrics for different vibrational sources are similar." This seems to be at odds with the basic notion about timbre being characteristic of each instrument.
What are "PESnQ scores"?
Please revise Eq. (1), where no alpha parameter is found even though there is a reference to it in the text preceding this equation.
Timbre features are not clearly defined. They appear abruptly in the text after the general description of filtering.
Overall, the manuscript helps to disclose the multidimensional nature of timbre, even though it falls short of providing clear definitions of the dimensions involved.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
I am satisfied with all corrections except:
1- The figure of the research methodology (point 3). Please add this figure; it will give a full picture of your research process.
2- Also, regarding point 10 (activation functions), please add these results to your paper with justifications.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf