Investigation of Text-Independent Speaker Verification by Support Vector Machine-Based Machine Learning Approaches
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors primarily explored the application of Support Vector Machines (SVM) in text-independent speaker verification and validated the effectiveness and advantages of SVM in this domain. However, the paper has the following issues:
1. The introduction is poorly written and logically disorganized. It is recommended to structure it into three parts: problem background, research methodology, and main conclusions, to clarify the innovation and contributions of the paper.
2. The innovation points are not clearly defined and should be explicitly stated.
3. The review of domestic and international research is insufficient and outdated.
4. While the paper mentions the selection of the LibriSpeech dataset, it lacks a detailed comparison with other datasets (e.g., TIMIT, ELSDSR). It is recommended to supplement the rationale for dataset selection, particularly in terms of data diversity and noise levels.
5. Using the default SVM hyperparameters may not guarantee optimal model performance. It is suggested to include hyperparameter tuning experiments or provide a detailed explanation of the rationale behind using default parameters.
6. Deep learning has become dominant in this field. It is recommended to include a simple deep learning baseline model for comparison to better highlight the advantages and limitations of SVM and its applicable scenarios.
Comments on the Quality of English Language
The English could be improved to more clearly express the research.
Author Response
Please see attached file
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The purpose of speaker verification is to confirm or deny that the input speaker is the claimed speaker. This is an important security topic worthy of further research. Because text-independent speaker verification extracts characteristics of the target’s voice that are present across different words and sounds, it avoids some of the limitations of text-dependent speaker verification.
In this paper, the authors investigated text-independent speaker verification. As noted, the topic itself is worthy of further study. The authors’ background discussion of the application of SVMs to this topic is good and comprehensive enough. The authors then compared the performance of their own text-independent speaker verification SVM against four other previously proposed SVM applications. The results are interesting and confirm that SVMs are useful for this application.
Overall, the strength of this paper is its comparison of the five SVM models (including the authors’ own model). My concern is that the results are not yet significant enough for publication in a top journal, because the topic has also been addressed by methods other than SVM. The paper would be more convincing with additional focus on comparing SVM with those other methods; it would then be comprehensive enough for publication in a top journal.
Author Response
Please see attached file
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Refer to the attached file for comments.
Comments for author File: Comments.pdf
Author Response
Please see attached file
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
In the introduction section, the authors state: The first reason is to demonstrate that SVMs can and have been successfully used for text-independent speaker verification; this is important because, in a field seemingly dominated by deep learning techniques, it is important to show that machine learning techniques can work and should be used when possible because of the aforementioned advantages they have over deep learning techniques.
1. However, there is no comparative analysis of SVM versus deep learning on this dataset. It is important to justify why SVM is better than deep learning on this dataset. The authors need to provide experimental results to support the above claim. Please include these results in the revised version.
Related work section
2. ELSDSR stands for English Language Speech Database for Speaker Recognition; please give the abbreviation ELSDSR in parentheses (in the related work section). Also, please italicize k and N (in the same section).
The results seem impressive (in Table 4):
3. For the ROC curve in Figure 4, please add a grid, e.g., with plt.grid(). Please update the figure (with the grid included) in the revised version.
4. I would like to see the SNR (signal-to-noise ratio) and MSE (mean square error) values in the revised version.
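Comments 3 and 4 could be addressed along the lines of the sketch below. It is a minimal illustration and not the authors' code: the helper names (mse, snr_db, plot_roc) are hypothetical, and it assumes that a clean reference signal and a noisy version of the same length are available, so that their difference can be treated as the noise. The review does not say which signals the SNR and MSE should be computed on, so that choice is an assumption as well. The plotting helper needs matplotlib, which is imported only when the function is called.

```python
import math


def mse(reference, estimate):
    """Mean squared error between two equal-length sample sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return total / len(reference)


def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise.

    Assumption: a clean reference is available; other SNR estimates
    (e.g., from silent segments) would need a different computation.
    """
    if len(clean) != len(noisy) or len(clean) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = mse(clean, noisy)  # mean power of the difference signal
    if noise_power == 0:
        return math.inf  # identical signals: no noise at all
    return 10.0 * math.log10(signal_power / noise_power)


def plot_roc(fpr, tpr, path="roc.png"):
    """Save an ROC curve with a grid (comment 3). Requires matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, writes to file only
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(fpr, tpr, label="ROC")
    ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.grid(True)  # the grid requested by the reviewer
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

For example, snr_db(clean, noisy) and mse(clean, noisy) could be reported per utterance or averaged over the test set; the false and true positive rates passed to plot_roc would come from whatever ROC computation the manuscript already uses.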
Author Response
Please see attached file
Author Response File: Author Response.pdf
Reviewer 5 Report
Comments and Suggestions for Authors
The presentation has been very well structured and analyzed, while sufficient details are given for understanding the concepts of the intended application. The experimental and analysis methodology is sound, while sufficient details are given to replicate the proposed experimental procedures and analysis. The results are very interesting as well.
Yet, there are a few points that can (and should) be improved:
1) The documentation (references) is poor, both in terms of quantity and recency. I am sure that extra recent (after 2019) references can be added in the Introduction section and, e.g., to subsection 3.3, in order to reach a total of at least a dozen (i.e., 3-4 more references).
2) The paper seems to end "suddenly". Please consider changing the heading of section 5 to "Discussion & Conclusion".
3) Erase the second "would" at the end of line 249.
You suggest future work; I hope that you will continue towards this direction.
Author Response
Please see attached file
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The references cited by the authors are outdated, and the review of the current state of domestic and international research is insufficient. These shortcomings suggest that the paper is significantly lacking in advancement, novelty, and innovation.
Author Response
Please see attached
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Minor revision. One more minor revision, described in the attached PDF, is needed before the manuscript can be accepted and published. I hope that the authors will make it. This is an interesting piece of work and deserves to be published. Well done, authors.
Comments for author File: Comments.pdf
Author Response
Please see attached
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The authors have addressed all the questions and concerns. However, Figure 2 shows only a gray image and conveys no meaningful information; please change it.
Author Response
Please see attached
Author Response File: Author Response.pdf
Round 3
Reviewer 1 Report
Comments and Suggestions for Authors
- When replying, avoid directly copying LaTeX format, as it greatly diminishes the reading experience.
- The discussion of recent related research is too limited. It lacks research on SVM-based text-independent speaker verification; related studies, such as those on SVM techniques and other speaker verification methods, should be discussed. The outdated research currently relied on is insufficient to support the novelty of this paper. If relevant research is lacking, it only suggests that this issue may no longer be worth studying.
- The proposed method in this paper is not superior to the CNN method. Please clarify the contribution of the method proposed in this paper.
Author Response
Please see attachment
Author Response File: Author Response.pdf
Round 4
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have addressed all the issues raised.