

Open Access Article

Advanced Convolutional Neural Network-Based Hybrid Acoustic Models for Low-Resource Speech Recognition

1 School of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan 430074, China
2 Center of Network & Computation, Huazhong University of Science & Technology, Wuhan 430074, China
* Author to whom correspondence should be addressed.
Computers 2020, 9(2), 36; https://doi.org/10.3390/computers9020036
Received: 27 March 2020 / Revised: 24 April 2020 / Accepted: 30 April 2020 / Published: 2 May 2020
(This article belongs to the Special Issue Artificial Neural Networks in Pattern Recognition)
Deep neural networks (DNNs) have achieved great success in acoustic modeling for speech recognition. Among these networks, the convolutional neural network (CNN) is effective at representing the local properties of speech formants. However, CNNs are not well suited to modeling long-term context dependencies between speech signal frames. Recurrent neural networks (RNNs) have recently shown a strong ability to model such long-term dependencies. However, RNNs perform poorly on low-resource speech recognition tasks, even worse than conventional feed-forward neural networks. Moreover, these networks often overfit the training corpus severely in low-resource speech recognition tasks. This paper presents our contributions to combining CNNs and conventional RNNs with gated, highway, and residual networks to mitigate these problems. We explore the optimal network structures and training strategies for the proposed models. Experiments were conducted on the Amharic and Chaha datasets, as well as on the 10-hour limited language packs of the benchmark datasets released under the Intelligence Advanced Research Projects Activity (IARPA) Babel Program. The proposed models achieve relative performance improvements of 0.1–42.79% over their corresponding feed-forward DNN, CNN, bidirectional RNN (BRNN), or bidirectional gated recurrent unit (BGRU) baselines across six language collections. These approaches are promising candidates for building better-performing acoustic models for low-resource speech recognition.
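The abstract names highway and residual connections and reports relative improvements, but it does not give the exact equations the authors use. The sketch below is a minimal illustration assuming the standard formulations: a residual connection computes y = H(x) + x, and a highway connection computes y = T(x) * H(x) + (1 - T(x)) * x with a sigmoid transform gate T(x). It also includes a relative-improvement helper that assumes the reported metric is an error rate (such as WER) where lower is better. All function names (`dense`, `residual`, `highway`, `relative_improvement`) are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]


def sigmoid(z: float) -> float:
    """Logistic function used for the (assumed) highway transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map W x + b for a small, hypothetical dense layer."""
    return [sum(w * xj for w, xj in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def residual(transform: Callable[[Vector], Vector], x: Vector) -> Vector:
    """Residual connection (assumed standard form): y = H(x) + x."""
    h = transform(x)
    return [hi + xi for hi, xi in zip(h, x)]


def highway(transform: Callable[[Vector], Vector],
            gate_weights: Matrix, gate_bias: Vector, x: Vector) -> Vector:
    """Highway connection (assumed standard form):
    y = T(x) * H(x) + (1 - T(x)) * x, with T(x) = sigmoid(W_T x + b_T)."""
    h = transform(x)
    t = [sigmoid(z) for z in dense(gate_weights, gate_bias, x)]
    # An open gate (t near 1) passes H(x); a closed gate (t near 0) carries x.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def relative_improvement(baseline_err: float, proposed_err: float) -> float:
    """Relative error-rate reduction in percent (assumes lower is better)."""
    return 100.0 * (baseline_err - proposed_err) / baseline_err
```

For example, a model that lowers a hypothetical baseline error rate from 50.0% to 40.0% gives `relative_improvement(50.0, 40.0)`, a 20% relative improvement. The gate in `highway` is what lets a deep stack fall back to passing its input through unchanged, which is one common argument for why such connections help when training data is scarce.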
Keywords: speech recognition; low-resource languages; acoustic models; neural network models
MDPI and ACS Style

Fantaye, T.G.; Yu, J.; Hailu, T.T. Advanced Convolutional Neural Network-Based Hybrid Acoustic Models for Low-Resource Speech Recognition. Computers 2020, 9, 36.

