
Automatic Language Identification Using Speech Rhythm Features for Multi-Lingual Speech Recognition

by Hwamin Kim 1 and Jeong-Sik Park 2,*
1 Department of English Linguistics, Hankuk University of Foreign Studies, Seoul 02450, Korea
2 Department of English Linguistics and Language Technology, Hankuk University of Foreign Studies, Seoul 02450, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(7), 2225; https://doi.org/10.3390/app10072225
Received: 21 February 2020 / Revised: 14 March 2020 / Accepted: 20 March 2020 / Published: 25 March 2020
(This article belongs to the Section Computing and Artificial Intelligence)
Conventional speech recognition systems handle input speech in a single, specific language. To realize multi-lingual speech recognition, the language of the input speech must first be identified. This study proposes an efficient Language IDentification (LID) approach for multi-lingual systems. Standard LID approaches depend on the common acoustic features used in speech recognition. However, these features may convey insufficient language-specific information, as they aim to discriminate the general tendency of phonemic information. This study investigates another type of feature that characterizes language-specific properties, with computational complexity in mind. We focus on speech rhythm features, which capture the prosodic characteristics of speech signals. The rhythm features represent the tendency of consonants and vowels in a language, so consonants and vowels must first be classified from the speech signal. For rapid classification, we employ Gaussian Mixture Model (GMM)-based learning, in which two GMMs corresponding to consonants and vowels are first trained and then used to classify them. Using the classification results, we estimate the tendencies of the two phonemic groups, such as the durations of consonantal and vocalic intervals, and calculate a set of rhythm metrics called the R-vector. In experiments on several speech corpora, the automatically extracted R-vector showed language tendencies similar to those reported in conventional linguistic studies. In addition, the proposed R-vector-based LID approach demonstrated LID performance superior or comparable to that of conventional approaches despite its low computational complexity.
Keywords: language identification; rhythm metrics; Gaussian mixture model; linear mixed effect model; i-vector; convolutional neural network
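
The step from consonant/vowel classification to rhythm metrics can be illustrated with a minimal sketch. It assumes that each speech frame has already been labeled 'C' or 'V' (here a hand-written label string stands in for the output of the two GMMs) and that frames are 10 ms apart. The three metrics computed (%V, ΔV, ΔC) are common rhythm metrics from the linguistics literature and are used only as an illustration; the abstract does not list the actual components of the R-vector, and the function names are hypothetical.

```python
from itertools import groupby
from statistics import pstdev

FRAME_SEC = 0.01  # assumed frame shift of 10 ms (not stated in the abstract)


def intervals(labels, frame_sec=FRAME_SEC):
    """Collapse per-frame 'C'/'V' labels into (label, duration_in_seconds) runs."""
    return [(lab, sum(1 for _ in run) * frame_sec) for lab, run in groupby(labels)]


def rhythm_metrics(labels, frame_sec=FRAME_SEC):
    """Compute illustrative rhythm metrics from frame-level C/V labels.

    percent_v: share of total duration that is vocalic (%V)
    delta_v:   standard deviation of vocalic interval durations
    delta_c:   standard deviation of consonantal interval durations
    """
    runs = intervals(labels, frame_sec)
    vocalic = [d for lab, d in runs if lab == "V"]
    consonantal = [d for lab, d in runs if lab == "C"]
    if not vocalic or not consonantal:
        raise ValueError("need at least one vocalic and one consonantal interval")
    total = sum(vocalic) + sum(consonantal)
    return {
        "percent_v": 100.0 * sum(vocalic) / total,
        "delta_v": pstdev(vocalic),
        "delta_c": pstdev(consonantal),
    }


if __name__ == "__main__":
    # Hypothetical frame labels standing in for the GMM classification output.
    frame_labels = "CCVVVCVVCCCC"
    print(intervals(frame_labels))
    print(rhythm_metrics(frame_labels))
```

In a full system, the resulting values would form one feature vector per utterance that a language classifier could consume; that stage is not shown here.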
MDPI and ACS Style

Kim, H.; Park, J.-S. Automatic Language Identification Using Speech Rhythm Features for Multi-Lingual Speech Recognition. Appl. Sci. 2020, 10, 2225.

