Open Access Article

Subunits Inference and Lexicon Development Based on Pairwise Comparison of Utterances and Signs

1 Idiap Research Institute, 1920 Martigny, Switzerland
2 Ecole polytechnique fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
* Author to whom correspondence should be addressed.
Information 2019, 10(10), 298; https://doi.org/10.3390/info10100298
Received: 22 July 2019 / Revised: 7 September 2019 / Accepted: 24 September 2019 / Published: 26 September 2019
(This article belongs to the Special Issue Computational Linguistics for Low-Resource Languages)
Communication languages convey information through the use of a set of symbols or units. Typically, this unit is the word. When developing language technologies, there may not be sufficient training data to model each word, because words in a language do not have the same prior probability: rare words occur only a few times. Furthermore, the training data may not cover all possible words in the language. Due to these data sparsity and word-unit coverage issues, language technologies model subword units, or subunits, which are defined using prior linguistic knowledge. For instance, the development of speech technologies such as automatic speech recognition systems presumes that a phonetic dictionary, or at least a writing system, exists for the target language. Such knowledge is not available for all languages in the world. In that direction, this article develops an abstract, hidden Markov model-based methodology to extract subword units given only pairwise comparisons between utterances (realizations of words in the mode of communication), i.e., whether or not two utterances correspond to the same word. We validate the proposed methodology through investigations on spoken language and sign language. In the case of spoken language, we demonstrate that the proposed methodology can lead to the discovery of a phone set and the development of a phonetic dictionary. In the case of sign language, we demonstrate how hand movement information can be effectively modeled for sign language processing and synthesized back to gain insight into the derived subunits.
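The abstract does not spell out the algorithm's internals, so the following is only a minimal illustrative sketch of the general idea under stated assumptions, not the authors' procedure. It assumes (1) that positive pairwise judgments ("these two utterances are the same word") are used to group utterances into word classes, here via union-find, and (2) that each word-level HMM state is summarized by a categorical distribution, with states from different words merged into shared subunits by agglomerative clustering under a symmetric Kullback-Leibler divergence. The function names `group_utterances`, `sym_kl` and `cluster_states`, the toy distributions, and the threshold are all hypothetical.

```python
import math
from itertools import combinations


def group_utterances(n, same_pairs):
    """Hypothetical step: group n utterances into word classes using only
    pairwise 'same word' judgments (transitive closure via union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in same_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def sym_kl(p, q, eps=1e-10):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between categorical distributions."""
    return sum((pi - qi) * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cluster_states(states, threshold):
    """Hypothetical step: merge HMM states (given as categorical distributions)
    into shared subunits, repeatedly joining the two clusters whose mean
    distributions are closest, until the closest pair exceeds `threshold`."""
    clusters = [[i] for i in range(len(states))]

    def centroid(c):
        k = len(states[0])
        return [sum(states[i][d] for i in c) / len(c) for d in range(k)]

    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sym_kl(centroid(clusters[a]), centroid(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


if __name__ == "__main__":
    # Toy data: utterances 0, 2, 4 are judged the same word; 1 and 3 another.
    print(group_utterances(5, [(0, 2), (2, 4), (1, 3)]))  # [[0, 2, 4], [1, 3]]

    # Toy state distributions: states 0-1 from word A, states 2-3 from word B.
    states = [
        [0.80, 0.10, 0.10],
        [0.10, 0.80, 0.10],
        [0.78, 0.12, 0.10],
        [0.10, 0.10, 0.80],
    ]
    print(cluster_states(states, 0.1))  # [[0, 2], [1], [3]]
```

In this toy run, state 0 of word A and state 2 of word B end up sharing one subunit because their distributions nearly coincide. The other states stay distinct. The sketch uses only the "same word" judgments; "different word" judgments would be needed in any setting that learns the comparison itself, which is outside what is shown here.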
Keywords: subword units; phone set; pronunciation lexicon; hidden Markov model; under-resourced; speech processing; sign language processing

MDPI and ACS Style

Tornay, S.; Magimai.-Doss, M. Subunits Inference and Lexicon Development Based on Pairwise Comparison of Utterances and Signs. Information 2019, 10, 298.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
