Open Access Article

The Usefulness of Imperfect Speech Data for ASR Development in Low-Resource Languages

by J. Badenhorst 1,*,† and F. de Wet 1,2,†
1 Human Technologies Research Group, CSIR Next Generation Enterprises and Institutions Cluster, P.O. Box 395, Pretoria 0001, South Africa
2 Department of Electrical & Electronic Engineering, Stellenbosch University, Private Bag X1, Stellenbosch 7602, South Africa
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Information 2019, 10(9), 268; https://doi.org/10.3390/info10090268
Received: 29 June 2019 / Revised: 29 July 2019 / Accepted: 8 August 2019 / Published: 28 August 2019
(This article belongs to the Special Issue Computational Linguistics for Low-Resource Languages)
The release of the National Centre for Human Language Technology (NCHLT) Speech corpus created various opportunities for speech technology development in the 11 official, but critically under-resourced, languages of South Africa. Since then, the substantial improvements in acoustic modeling that deep architectures achieved for well-resourced languages have ushered in a new data requirement: developing such models requires hundreds of hours of speech. A suitable strategy for enlarging the speech resources of the South African languages is therefore required. A first possibility is to look for data that has already been collected but has not been included in an existing corpus. Additional data was collected during the NCHLT project that was not included in the official corpus, which contains only a curated but limited subset of the data. In this paper, we first analyze the additional resources that could be harvested from the auxiliary NCHLT data. We also measure the effect of this data on acoustic modeling. The analysis incorporates recent factorized time-delay neural networks (TDNN-F). These models significantly reduce phone error rates for all languages. In addition, data augmentation and cross-corpus validation experiments for a number of the datasets illustrate the utility of the auxiliary NCHLT data.
Keywords: automatic speech recognition; low-resource languages; speech data; speech technology; Kaldi; time-delay neural networks
MDPI and ACS Style

Badenhorst, J.; de Wet, F. The Usefulness of Imperfect Speech Data for ASR Development in Low-Resource Languages. Information 2019, 10, 268.
