Article
Peer-Review Record

Individualized Interaural Feature Learning and Personalized Binaural Localization Model

Appl. Sci. 2019, 9(13), 2682; https://doi.org/10.3390/app9132682
by Xiang Wu 1,*, Dumidu S. Talagala 2, Wen Zhang 3 and Thushara D. Abhayapala 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 15 May 2019 / Revised: 20 June 2019 / Accepted: 25 June 2019 / Published: 30 June 2019
(This article belongs to the Special Issue Mobile Spatial Audio)

Round 1

Reviewer 1 Report

This is a very well written paper with a detailed outline of the simulation and experimentation.

Author Response

We thank the reviewer for the positive comments.

A response letter has been uploaded. 

Author Response File: Author Response.pdf

Reviewer 2 Report

In general, this is quite an interesting paper that deserves to be published in this journal (after some changes and explanations). It is well written and addresses an interesting (and difficult) question.

I have some minor questions, comments and suggestions for the authors:

There are many unexplained acronyms, for instance HRIR, HRTF, BRIR, PPAM, DP-RTF and some others. Although they may be obvious to the authors, it is better practice to define them the first time they are used.

On p. 9, you propose an STFT technique (again, not previously defined) to obtain the spectral behavior of the sounds. Have you considered using wavelet transforms? Can you offer any reason for not considering them?

Additionally, many parameters associated with the STFT are described but not justified (16 ms window length, 8 ms shift, ...). How were they selected? Please provide some justification. Similarly, the window used in the STFT is not mentioned (Hamming?). Please specify.
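To make the parameters in question concrete, the following is a minimal sketch of an STFT with a 16 ms window and an 8 ms shift. It is not the authors' implementation: the function name `stft_frames`, the 16 kHz sample rate used in the usage lines, and the Hamming window are all assumptions made for illustration (the review itself notes that the window type is not specified).

```python
import numpy as np

def stft_frames(signal, sample_rate, win_ms=16.0, hop_ms=8.0):
    """Return the one-sided STFT of `signal` as an array (n_frames, n_bins).

    win_ms / hop_ms mirror the 16 ms window and 8 ms shift mentioned in the
    review. The Hamming window is an ASSUMPTION, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    win_len = int(round(sample_rate * win_ms / 1000.0))  # samples per frame
    hop_len = int(round(sample_rate * hop_ms / 1000.0))  # samples per shift
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one analysis window")

    window = np.hamming(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop_len

    # Slice overlapping frames and apply the window to each of them.
    frames = np.stack([
        signal[i * hop_len : i * hop_len + win_len] * window
        for i in range(n_frames)
    ])
    # Real-input FFT along the time axis of each frame.
    return np.fft.rfft(frames, axis=1)

# Usage with a hypothetical 16 kHz sample rate: 16 ms -> 256 samples,
# 8 ms -> 128 samples, so each frame yields 129 frequency bins.
fs = 16000
noise = np.random.default_rng(0).standard_normal(fs)  # 1 s of white noise
spectrum = stft_frames(noise, fs)
print(spectrum.shape)
```

With these assumed values the frequency resolution is 16000 / 256 = 62.5 Hz per bin and consecutive frames overlap by 50%, which is the kind of trade-off a justification of the window length would need to address.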

Some parameters "are determined based on an empirical approach" (p. 9). What do you mean? Why not use some kind of validation approach, as is common with machine learning techniques? And why not select some other parameters (for instance, window length) using validation as well?

Apparently you have not used cross validation to increase the significance of your experimental results. Any reason for that?
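For reference, the kind of cross-validation asked about here can be sketched as a k-fold split. This is a generic illustration, not a procedure taken from the paper: the function name `k_fold_indices`, the choice of k = 5, and the fixed seed are assumptions.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Samples are shuffled once, split into k nearly equal folds, and each
    fold serves as the held-out test set exactly once.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Usage: average a score over the folds. The score function is a
# placeholder standing in for "train a model and evaluate it".
def placeholder_score(train_idx, test_idx):
    return len(test_idx) / (len(train_idx) + len(test_idx))

scores = [placeholder_score(tr, te) for tr, te in k_fold_indices(100, k=5)]
print(np.mean(scores), np.std(scores))
```

Reporting the mean and spread over folds, rather than a single train/test split, is what would lend the experimental results the added significance the question refers to. The same loop could be used to choose parameters such as the window length on held-out data.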

Author Response

We thank the reviewer for the positive and detailed comments. The uploaded response letter details our responses to the questions and comments from the reviewer.


Author Response File: Author Response.pdf

Reviewer 3 Report

Dear authors,


It was a pleasure to review your manuscript, which I read with great interest. The problem you deal with is urgent, and the solution and the experiment are scientifically elegant. Despite the high quality of your study, several remarks and questions arose while reading the text.

1. In line 12, the term Direct Path-Relative Transfer Function should not be written with capital letters. Also, it is not clear what exactly should be understood by "By connecting" at the beginning of the sentence. Please make the phrase more direct.

Similar upper- and lower-case confusion can be found throughout the manuscript. For instance, in line 332 the header begins with a lowercase letter. The caption of Figure 6 suffers from the same defect.

2. You introduce the abbreviation HRTF without explaining what it stands for. In the Conclusion, you use the abbreviation RF. Please do not use abbreviations in the abstract and conclusion.

3. It is not clear exactly which signals you use for the analysis and visualization in Figure 1. Could almost the same result be obtained with a variety of signals? Were the data simulated or measured in a real-world experiment?

4. Please emphasize the physics behind the differences between the left- and right-channel sounds that make it possible to determine the sound position using only acoustic data. Are spectral ITD and ILD sufficient? Are special features of the Type 40AG microphone important, such as those reflected in the free-field corrections diagram (see p. 5 of the datasheet)?

5. As I understand it, you used only white noise as a signal in your experiment. Would your method work equally well when other signals are used?

6. In the conclusion, please emphasize the differences between your method and existing methods.


I recommend publishing your paper after revision.


Author Response

We thank the reviewer for the positive comments. The uploaded response letter details our responses to the questions and comments from the reviewer.


Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Dear authors!


Thank you for the thorough revision of your paper and the exhaustive answers to my remarks. With the reviewers' comments taken into account, the manuscript is now a good-quality scientific article. I recommend that it be accepted in its present form.

