M2ASR-KIRGHIZ: A Free Kirghiz Speech Database and Accompanied Baselines
Round 1
Reviewer 1 Report
This work presents the collection of a Kirghiz speech database aimed at building ASR systems.
Apart from some English grammatical mistakes, the paper is well organized and the work is clearly presented. The first part reviews the particularities of the Kirghiz language, while the second part details the collection process. I have only a minor comment on the manuscript: in the paragraph "Speaker selection", it is stated that the speakers were "selected to reflect diversity of gender, age, geography and education". However, the next sentence reports that the speakers are all university students (63% male and 37% female), with ages ranging from 19 to 25 years. The set of speakers therefore does not seem representative of the Kirghiz population. Please correct this sentence accordingly.
My main concern is that, contrary to what is claimed in the paper, the data is not currently available (it is "coming soon" according to the link provided in the paper). Therefore, for the time being, it is difficult to judge the outcome of this work. The same goes for the Kaldi and WeNet recipes: I could not find a relevant project at the link provided.
Consequently, I think this paper cannot be accepted for publication as long as the data and recipes are not provided.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
This paper provides a free database of Kirghiz speech and related linguistic resources. It is the largest open Kirghiz speech database (128 transcribed hours from 163 speakers). The background knowledge of Kirghiz is presented in detail, and baselines are provided.
Author Response
Point 1: This paper provides a free database of Kirghiz speech and related linguistic resources. It is the largest open Kirghiz speech database (128 transcribed hours from 163 speakers). The background knowledge of Kirghiz is presented in detail, and baselines are provided.
Response 1: We appreciate the positive comments from the reviewer.
Reviewer 3 Report
All comments are given in the attached review document.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
The paper describes a new speech database for training ASR systems for the Kirghiz language, a minority language spoken in a region of China. In addition, it provides ASR results (in terms of Letter Error Rate) for five baseline systems implemented with state-of-the-art technology.
I find the production of new resources to foster research and technological development for minority languages very interesting. In this regard, the free availability of these new datasets and of the baseline systems developed for validation is key for future research.
Minor issues
The sequences of vowels and consonants shown in Table 6 are inverted; that is, V should be replaced by C, and C by V.
In Section 4.1, the authors say that speakers were selected to provide diversity of gender, age, geography, and education. However, given the information provided in the paper, there is little diversity in age and education, because all speakers are students aged 19 to 25. Diversity is therefore restricted to gender and perhaps geography.
Finally, while the English writing is reasonably good, it requires some proofreading. I have attached my own notes and suggestions to this review as a PDF file.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The remarks of my previous review have been adequately addressed:
1. The paragraph on speaker selection has been corrected.
2. A link to the database is provided (the data is accessible on request), and the authors describe the recipes they used for Kaldi and WeNet (removing the need to provide a link to their own repository).