Open Access Article

Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning

Department of Electrical Engineering, University of Engineering and Technology Peshawar, Institute of Communication Technologies (ICT) Campus, Islamabad 44000, Pakistan
Department of ICT and Natural Sciences, Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology, 6001 Alesund, Norway
Malaysia-Japan International Institute of Technology (M-JIIT), Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia
Department of Computer Sciences, COMSATS University Islamabad, Abbottabad Campus, Abbottabad 22010, Pakistan
Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(9), 1956
Received: 3 April 2019 / Revised: 22 April 2019 / Accepted: 30 April 2019 / Published: 13 May 2019

Automatic Speech Recognition (ASR) has achieved its best results for English with end-to-end, neural-network-based supervised models. These supervised models need large amounts of labeled speech data to generalize well, which is difficult to obtain for low-resource languages like Urdu. Most models proposed for Urdu ASR are based on Hidden Markov Models (HMMs). This paper proposes an end-to-end neural network model for Urdu ASR, regularized with dropout, ensemble averaging, and Maxout units. Dropout and ensembles are averaging techniques over multiple neural network models, while Maxout units are neural network units that adapt their activation functions. Because labeled data is limited, Semi-Supervised Learning (SSL) techniques are also incorporated to improve model generalization. Speech features are transformed into a lower-dimensional manifold using an unsupervised dimensionality-reduction technique called Locally Linear Embedding (LLE). The transformed data, along with the higher-dimensional features, is used to train the neural networks. The proposed model also uses label-propagation-based self-training of the initially trained models and achieves a Word Error Rate (WER) 4% lower than the benchmark reported on the same Urdu corpus using HMMs. The decrease in WER after incorporating SSL is more significant with an increased validation data size.
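
As a rough illustration of two terms used in the abstract, the sketch below shows a single Maxout unit and a WER computation in plain Python. It is not the authors' implementation: the function names `maxout` and `word_error_rate`, the toy weights, and the example sentences are all assumptions made for illustration. The Maxout unit follows the standard definition (the maximum over several affine functions of the input), and WER follows the usual definition (word-level edit distance divided by the number of reference words).

```python
from typing import Sequence


def maxout(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           biases: Sequence[float]) -> float:
    """One Maxout unit: the maximum over k affine pieces w_j . x + b_j.

    Because the pieces are learned, the unit's activation shape is learned too.
    (Hypothetical helper, not taken from the paper.)
    """
    return max(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # With pieces (w=1, b=0) and (w=0, b=0), the unit behaves like a ReLU.
    print(maxout([-3.0], [[1.0], [0.0]], [0.0, 0.0]))
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

With two pieces, a Maxout unit can reproduce a ReLU or an absolute-value activation depending on its weights, which is one way to read the abstract's statement that Maxout units adapt their activation functions.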
Keywords: speech recognition; locally linear embedding; label propagation; Maxout; low resource languages

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Ali Humayun, M.; Hameed, I.A.; Muslim Shah, S.; Hassan Khan, S.; Zafar, I.; Bin Ahmed, S.; Shuja, J. Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning. Appl. Sci. 2019, 9, 1956.

Appl. Sci., EISSN 2076-3417, published by MDPI AG, Basel, Switzerland.