MDPI - Publisher of Open Access Journals

32 pages, 5224 KB

Open Access Article

Functional Networks in Developmental Dyslexia: Auditory Discrimination of Words and Pseudowords

by Tihomir Taskov and Juliana Dushanova

NeuroSci 2026, 7(1), 21; https://doi.org/10.3390/neurosci7010021 - 3 Feb 2026

Viewed by 686

Developmental dyslexia (DD) often involves difficulties in phonological processing of speech. Objectives: While underlying neural changes have been identified in terms of stimulus- and task-related responses within specific brain regions and their neural connectivity, there is still limited understanding of how these changes affect the overall organization of brain networks. Methods: This study used EEG and functional network analysis, focusing on small-world propensity across various frequency bands (from δ to γ), to explore the global brain organization during the auditory discrimination of words and pseudowords in children with DD. Results: The main finding revealed a systemic inefficiency in the functional network of individuals with DD, which did not achieve the optimal small-world propensity. This inefficiency arises from a fundamental trade-off between localized specialization and global communication. During word listening, the δ-/γ1-networks (related to impaired syllabic and phonemic processing of words) and the θ-/β-networks (related to pseudoword listening) in the DD group showed lower local clustering and connectivity compared to the control group, resulting in reduced functional segregation. In particular, the θ-/β-networks for words in the DD group exhibited a less optimal balance between specialized local processing and effective global communication. Centralized midline hubs, such as the postcentral gyrus (PstCG) and inferior frontal gyrus (IFG), which are crucial for global coordination, attention, and executive control, were either absent or inconsistent in individuals with DD. Consequently, the DD network adopted a constrained, motor-compensatory, and left-lateralized strategy. This led to the redirection of information flow and processing effort toward the left PstCG/IFG loop, interpreted as a compensatory effort to counteract automatic processing failures. 
Additionally, the γ1-network, which is involved in phonetic feature binding, lacked engagement from posterior sensory hubs, forcing this critical process into a slow and effortful motor loop. The γ2-network exhibited unusual activation of right-hemisphere posterior areas during word processing, while it employed a simpler, less mature routing strategy for pseudoword listening, which further diminished global communication. Conclusions: This functionality highlights the core phonological and temporal processing deficits characteristic of dyslexia.
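
The abstract above relies on small-world propensity but does not state how it is computed. As a rough sketch only, the function below follows a commonly used definition in which the observed clustering coefficient and path length are compared with lattice and random reference networks. The formula, the function name, and all parameter names are assumptions for illustration and are not taken from the article.

```python
import math


def small_world_propensity(c_obs, l_obs, c_latt, c_rand, l_latt, l_rand):
    """Sketch of a small-world propensity score in [0, 1].

    Assumed definition (not confirmed by the abstract):
        delta_c = (c_latt - c_obs) / (c_latt - c_rand)
        delta_l = (l_obs - l_rand) / (l_latt - l_rand)
        phi     = 1 - sqrt((delta_c**2 + delta_l**2) / 2)
    Both deviations are clipped to [0, 1] before they are combined.
    """
    def clip01(x):
        return min(1.0, max(0.0, x))

    # Deviation of clustering from the lattice reference (0 = lattice-like).
    delta_c = clip01((c_latt - c_obs) / (c_latt - c_rand))
    # Deviation of path length from the random reference (0 = random-like).
    delta_l = clip01((l_obs - l_rand) / (l_latt - l_rand))
    return 1.0 - math.sqrt((delta_c ** 2 + delta_l ** 2) / 2.0)


if __name__ == "__main__":
    # Hypothetical values: high clustering with a short path length.
    print(small_world_propensity(0.5, 2.5, 0.6, 0.1, 5.0, 2.0))
```

Under this assumed definition, a network that clusters like a lattice while keeping the path length of a random graph scores 1, and lower clustering or longer paths pull the score toward 0.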

24 pages, 17983 KB

Open Access Article

Inheritance and Contact in the Development of Lateral Obstruents in Nguni Languages (S40)

by Nina van der Vlugt and Hilde Gunnink

Languages 2025, 10(5), 90; https://doi.org/10.3390/languages10050090 - 24 Apr 2025

Cited by 1 | Viewed by 2300

Abstract

This study investigates the development of the lateral fricatives and affricates, to which we jointly refer as ‘lateral obstruents’, in Nguni (S40) languages of Southern Africa. These lateral obstruents, which include /ɬ, ⁿɬ, ɮ, ⁿɮ, k͡ʟ̝̊/, are rare in the Bantu language family, and are not reconstructed for Proto-Bantu. Lateral obstruents are also rare cross-linguistically. They do occur, however, in four sub-branches of Southern Bantu: Shona, Sotho-Tswana, Nguni, and Tsonga. In this paper, we study how Southern Bantu could have acquired such a large inventory of cross-linguistically rare phonemes by investigating their development in Nguni languages, a large but closely related cluster of languages in which lateral obstruents are very frequent. We analyze published data from nine Nguni languages, including languages for which the only available descriptions are dated or of limited scope, in which case we carefully assess the data and their analysis. On the basis of this large database, we show which lateral obstruents are used in Nguni, and the vocabulary in which they occur. Applying the Comparative Method, we show that alveolar lateral obstruents can be reconstructed to Proto-Nguni, where they are the regular reflex of Proto-Bantu palatals *c and *j. The velar lateral affricate, in contrast, cannot be reconstructed to Proto-Nguni, and finds its origin in loanwords, for example, from Khoe languages, where it is used as a click replacement strategy. As a result, we conclude that both inheritance and contact played a role in the development of lateral obstruents in Nguni, likely combined in the case of alveolar lateral obstruents. In order to better understand the contact history, we evaluate existing hypothesized contact scenarios to account for the presence of lateral obstruents in Southern Bantu or Nguni. 
Given that alveolar lateral obstruents result from a regular sound change, contact does not seem to be as prominent in the development of lateral obstruents as has been proposed before in the literature. This study lays the groundwork for future research into lateral obstruents in Southern Bantu.

(This article belongs to the Special Issue Recent Developments on the Diachrony and Typology of Bantu Languages)

18 pages, 4858 KB

Open Access Article

Enhancing Dysarthric Voice Conversion with Fuzzy Expectation Maximization in Diffusion Models for Phoneme Prediction

by Wen-Shin Hsu, Guang-Tao Lin and Wei-Hsun Wang

Diagnostics 2024, 14(23), 2693; https://doi.org/10.3390/diagnostics14232693 - 29 Nov 2024

Cited by 1 | Viewed by 2433

Abstract

Introduction: Dysarthria, a motor speech disorder caused by neurological damage, significantly hampers speech intelligibility, creating communication barriers for affected individuals. Voice conversion (VC) systems have been developed to address this, yet accurately predicting phonemes in dysarthric speech remains a challenge due to its variability. This study proposes a novel approach that integrates Fuzzy Expectation Maximization (FEM) with diffusion models for enhanced phoneme prediction, aiming to improve the quality of dysarthric voice conversion. Methods: The proposed method combines FEM clustering with Diffusion Probabilistic Models (DPM). Diffusion models simulate noise addition and removal to enhance the robustness of speech signals, while FEM iteratively optimizes phoneme boundaries, reducing uncertainty. The system was trained using the Saarland University Voice Disorder dataset, consisting of dysarthric and normal speech samples, with the conversion process represented in the Mel-spectrogram domain. The framework employs both subjective (Mean Opinion Score, MOS) and objective (Word Error Rate, WER) metrics for evaluation, complemented by ablation studies. Results: Experimental results showed that the proposed method significantly improved phoneme prediction accuracy and overall voice conversion quality. It achieved higher MOSs for naturalness, intelligibility, and speaker similarity compared to existing models like StarGAN-VC and CycleGAN-VC. Additionally, the proposed method demonstrated a lower WER for both mild and severe dysarthria cases, indicating better performance in producing intelligible speech. Discussion: The integration of FEM with diffusion models offers substantial improvements in handling the irregularities of dysarthric speech. The method’s robustness, as evidenced by the ablation studies, shows that it can maintain speech naturalness and intelligibility even without a speaker-encoder. 
These findings suggest that the proposed approach can contribute to the development of more reliable assistive communication technologies for individuals with dysarthria, providing a promising foundation for future advancements in personalized speech therapy.
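
The abstract above reports Word Error Rate (WER) as its objective metric. For readers unfamiliar with it, a minimal sketch of the standard word-level edit-distance formulation follows. The function name and the whitespace tokenization are assumptions made for illustration, and the article's own evaluation pipeline may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Words are obtained by whitespace splitting, which is an assumption;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.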

(This article belongs to the Special Issue Classification of Diseases Using Machine Learning Algorithms)

19 pages, 595 KB

Open Access Article

Word-Final /s/-/z/ Omission in Vietnamese English

by Stephen J. Disney and Le Nu Cam Le

Languages 2024, 9(10), 327; https://doi.org/10.3390/languages9100327 - 14 Oct 2024

Viewed by 4733

Abstract

Southeast Asian learners of English, including those from Vietnam, frequently omit word-final consonants in their English speech. Previous work on Vietnamese learners of English is limited, and errors are typically attributed to first-language transfer effects. No large-scale empirical study on Vietnamese learners has been carried out to aid the development of an evidence-based pedagogy. This study uses authentic spoken data to compare lexical and morphological word-final /s/ and /z/ in the speech of sixteen Vietnamese adult learners of English. We discuss the relative impact of frequency of use, whether the instance of a target /s/ or /z/ is in a root or bound morpheme, and whether the preceding phoneme is a consonant or vowel. An overall omission rate of 28.4% of expected instances was found. Morphological {-s} preceded by a consonant has the highest error rate (50.7%). A multilevel binary logistic regression was performed to ascertain the relative effects. Morphological words containing /s/ or /z/ were significantly more likely to be pronounced with the /s/ or /z/ absent than lexical words containing /s/ or /z/, as were those in clusters compared to those with a preceding vowel. The results indicate that phonological effects and morphological effects are stacked and not multiplicative and that the observed omission rates are not solely attributable to L1 transfer effects. Frequency of use is also highly correlated with accuracy.

(This article belongs to the Special Issue Investigating L2 Phonological Acquisition from Different Perspectives)

13 pages, 471 KB

Open Access Article

Exploring Verbal Fluency Strategies among Individuals with Normal Cognition, Amnestic and Non-Amnestic Mild Cognitive Impairment, and Alzheimer’s Disease

by Styliani Bairami, Vasiliki Folia, Ioannis Liampas, Eva Ntanasi, Panayiotis Patrikelis, Vasileios Siokas, Mary Yannakoulia, Paraskevi Sakka, Georgios Hadjigeorgiou, Nikolaos Scarmeas, Efthimios Dardiotis and Mary H. Kosmidis

Medicina 2023, 59(10), 1860; https://doi.org/10.3390/medicina59101860 - 19 Oct 2023

Cited by 9 | Viewed by 3770

Abstract

Background and Objectives: The present study explored the utilization of verbal fluency (VF) cognitive strategies, including clustering, switching, intrusions, and perseverations, within both semantic (SVF) and phonemic (PVF) conditions, across a continuum of neurocognitive decline, spanning from normal cognitive ageing (NC) to mild cognitive impairment (MCI) and its subtypes, amnestic (aMCI) and non-amnestic (naMCI), as well as AD. Materials and Methods: The study sample was derived from the Hellenic Longitudinal Investigation of Aging and Diet (HELIAD) cohort. The sample included 1607 NC individuals, 146 with aMCI (46 single-domain and 100 multi-domain), 92 with naMCI (41 single-domain and 51 multi-domain), and 79 with AD. Statistical analyses, adjusting for sex, age, and education, employed multivariate general linear models to probe differences among these groups. Results: Results showed that AD patients exhibited poorer performance in switching in both VF tasks and SVF clustering compared to NC. Similarly, the aMCI group performed worse than the NC in switching and clustering in both tasks, with aMCI performing similarly to AD, except for SVF switching. In contrast, the naMCI subgroup performed similarly to those with NC across most strategies, surpassing AD patients. Notably, the aMCI subgroup’s poor performance in SVF switching was mainly due to the subpar performance of the multi-domain aMCI subgroup. This subgroup was outperformed in switching in both VF tasks by the single-domain naMCI, who also performed better than the multi-domain naMCI in SVF switching. No significant differences emerged in terms of perseverations and intrusions. Conclusions: Overall, these findings suggest a continuum of declining switching ability in the SVF task, with NC surpassing both aMCI and AD, and aMCI outperforming those with AD. 
The challenges in SVF switching suggest executive function impairment associated with multi-domain MCI, particularly driven by the multi-domain aMCI.

30 pages, 3143 KB

Open Access Article

A Phonological Study of Rongpa Choyul

by Jingyao Zheng

Languages 2023, 8(2), 133; https://doi.org/10.3390/languages8020133 - 26 May 2023

Cited by 2 | Viewed by 3284

Abstract

This paper presents a detailed description of the phonology of the Rongpa variety of Choyul, an understudied Tibeto-Burman language spoken in Lithang (理塘) County, Dkarmdzes (甘孜) Tibetan Autonomous Prefecture of Sichuan Province, China. Based on firsthand fieldwork data, this paper lays out Rongpa phonology in detail, examining its syllable canon, initial and rhyme systems, and word prosody. Peculiar characteristics of this phonological system are as follows: First, Rongpa has a substantial phonemic inventory, which comprises 43 consonants, 13 vowels, and 2 tones. A total of 84 consonant clusters are observed to serve as the initial of a syllable. Secondly, the phonemic contrast between plain and uvularized vowels is attested. In addition, regressive vowel harmony on uvularization, height, and lip-roundedness can be clearly observed in various constructions including prefixed verb stems. Finally, regarding word prosody, two tones in monosyllabic words, /H/ and /L/, are observed to distinguish lexical meanings, and disyllabic words exhibit four surface pitch patterns. Pitch patterns in verb morphology are also examined. The findings and analyses presented in this paper could form a foundation for future research on Rongpa Choyul.

(This article belongs to the Special Issue New Directions for Sino-Tibetan Linguistics in the Mid-21st Century)

18 pages, 2587 KB

Open Access Feature Paper Article

Brain Lateralization for Language, Vocabulary Development and Handedness at 18 Months

by Delphine Potdevin, Parvaneh Adibpour, Clémentine Garric, Eszter Somogyi, Ghislaine Dehaene-Lambertz, Pia Rämä, Jessica Dubois and Jacqueline Fagard

Symmetry 2023, 15(5), 989; https://doi.org/10.3390/sym15050989 - 27 Apr 2023

Cited by 4 | Viewed by 6325

Abstract

Is hemisphere lateralization for speech processing linked to handedness? To answer this question, we compared hemisphere lateralization for speech processing and handedness in 18-month-old infants, the age at which infants start to produce words and reach a stable pattern of handedness. To assess hemisphere lateralization for speech perception, we coupled event-related potential (ERP) recordings with a syllable-discrimination paradigm and measured response differences to a change in phoneme or voice (different speaker) in the left and right clusters of electrodes. To assess handedness, we gave a 15-item grasping test to infants. We also evaluated infants’ range of vocabulary to assess whether it was associated with direction and degree of handedness and language brain asymmetries. Brain signals in response to a change in phoneme and voice were left- and right-lateralized, respectively, indicating functional brain lateralization for speech processing in infants. Handedness and brain asymmetry for speech processing were not related. In addition, there were no interactions between the range of vocabulary and asymmetry in brain responses, even for a phoneme change. Together, a high degree of right-handedness and greater vocabulary range were associated with an increase in ERP amplitudes in voice condition, irrespective of hemisphere side, suggesting that they influence discrimination during voice processing.

(This article belongs to the Special Issue Early Laterality in Behaviour and Brain)

11 pages, 489 KB

Open Access Article

Qualitative Verbal Fluency Components as Prognostic Factors for Developing Alzheimer’s Dementia and Mild Cognitive Impairment: Results from the Population-Based HELIAD Cohort

by Ioannis Liampas, Vasiliki Folia, Elli Zoupa, Vasileios Siokas, Mary Yannakoulia, Paraskevi Sakka, Georgios Hadjigeorgiou, Nikolaos Scarmeas, Efthimios Dardiotis and Mary H. Kosmidis

Medicina 2022, 58(12), 1814; https://doi.org/10.3390/medicina58121814 - 9 Dec 2022

Cited by 13 | Viewed by 3057

Abstract

Background and Objectives: The aim of the present study was to investigate the prognostic value of the qualitative components of verbal fluency (clustering, switching, intrusions, and perseverations) on the development of mild cognitive impairment (MCI) and dementia. Materials and Methods: Participants were drawn from the multidisciplinary, population-based, prospective HELIAD (Hellenic Longitudinal Investigation of Aging and Diet) cohort. Two participant sets were separately analysed: those with normal cognition and MCI at baseline. Verbal fluency was assessed via one category and one letter fluency task. Separate Cox proportional hazards regressions adjusted for important sociodemographic parameters were performed for each qualitative semantic and phonemic verbal fluency component. Results: There were 955 cognitively normal (CN), older (72.9 years ±4.9), predominantly female (~60%) individuals with available follow-up assessments after a mean of 3.09 years (±0.83). Among them, 34 developed dementia at follow-up (29 of whom progressed to Alzheimer’s dementia (AD)), 160 developed MCI, and 761 remained CN. Each additional perseveration on the semantic condition increased the risk of developing all-cause dementia and AD by 52% and 55%, respectively. Of note, participants with two or more perseverations on the semantic task presented a much more prominent risk for incident dementia compared to those with one or no perseverations. Among the remaining qualitative indices, none were associated with the hazard of developing all-cause dementia, AD, and MCI at follow-up. Conclusions: Perseverations on the semantic fluency condition were related to an increased risk of incident all-cause dementia or AD in older, CN individuals.

(This article belongs to the Special Issue Preclinical Markers Preluding the Onset of Cognitive Impairment and Dementia)

17 pages, 6208 KB

Open Access Article

Selecting the Most Important Features for Predicting Mild Cognitive Impairment from Thai Verbal Fluency Assessments

by Suppat Metarugcheep, Proadpran Punyabukkana, Dittaya Wanvarie, Solaphat Hemrungrojn, Chaipat Chunharas and Ploy N. Pratanwanich

Sensors 2022, 22(15), 5813; https://doi.org/10.3390/s22155813 - 3 Aug 2022

Cited by 5 | Viewed by 4415

Abstract

Mild cognitive impairment (MCI) is an early stage of cognitive decline or memory loss, commonly found among the elderly. A phonemic verbal fluency (PVF) task is a standard cognitive test in which participants are asked to produce words starting with given letters, such as “F” in English and “ก” /k/ in Thai. With state-of-the-art machine learning techniques, features extracted from the PVF data have been widely used to detect MCI. The PVF features, including acoustic features, semantic features, and word grouping, have been studied in many languages but not Thai. However, applying the same PVF feature extraction methods used in English to Thai yields poor results due to different language characteristics. This study performs analytical feature extraction on Thai PVF data to classify MCI patients. In particular, we propose novel approaches to extract features based on phonemic clustering (ability to cluster words by phonemes) and switching (ability to shift between clusters) for the Thai PVF data. A comparison of three classifiers revealed that the support vector machine performed the best, with an area under the receiver operating characteristic curve (AUC) of 0.733 (N = 100). Furthermore, our implemented guidelines extracted efficient features, which support the machine learning models regarding MCI detection on Thai PVF data.
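
The clustering and switching features described above can be illustrated with a deliberately simplified scorer. This sketch treats consecutive words as one cluster when a relatedness rule holds between neighbours and counts a switch at every cluster boundary. The `share_prefix` rule, the function names, and the English example words are hypothetical stand-ins; the article's Thai-specific phonemic rules are not given in the abstract and are not reproduced here.

```python
def share_prefix(a: str, b: str, n: int = 2) -> bool:
    """Hypothetical relatedness rule: same first n characters."""
    return a[:n].lower() == b[:n].lower()


def score_fluency(words, related=share_prefix):
    """Group consecutive related words into clusters and count switches.

    Returns a dict with the clusters (lists of words), the number of
    switches (cluster boundaries), and the mean cluster size.
    """
    if not words:
        return {"clusters": [], "switches": 0, "mean_cluster_size": 0.0}

    clusters = [[words[0]]]
    for prev, word in zip(words, words[1:]):
        if related(prev, word):
            clusters[-1].append(word)   # stay in the current cluster
        else:
            clusters.append([word])     # a switch starts a new cluster

    mean_size = sum(len(c) for c in clusters) / len(clusters)
    return {
        "clusters": clusters,
        "switches": len(clusters) - 1,
        "mean_cluster_size": mean_size,
    }


if __name__ == "__main__":
    result = score_fluency(["flag", "flat", "flow", "fish", "fist", "fun"])
    print(result["switches"], result["mean_cluster_size"])
```

Scoring conventions vary between studies (for example, whether single words count as clusters), so the numbers from this sketch are not comparable with published norms.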

(This article belongs to the Special Issue Advanced Computational Intelligence for Object Detection, Feature Extraction and Recognition in Smart Sensor Environments 2022-2023)

17 pages, 3285 KB

Open Access Article

Pseudo-Phoneme Label Loss for Text-Independent Speaker Verification

by Mengqi Niu, Liang He, Zhihua Fang, Baowei Zhao and Kai Wang

Appl. Sci. 2022, 12(15), 7463; https://doi.org/10.3390/app12157463 - 25 Jul 2022

Cited by 3 | Viewed by 2785

Abstract

Compared with text-independent speaker verification (TI-SV) systems, text-dependent speaker verification (TD-SV) counterparts often have better performance for their efficient utilization of speech content information. On this account, some TI-SV methods tried to boost performance by incorporating an extra automatic speech recognition (ASR) component to explore content information, such as c-vector. However, the introduced ASR component requires a large amount of annotated data and consumes high computation resources. In this paper, we propose a pseudo-phoneme label (PPL) loss for the TI-SR task by integrating content cluster loss at the frame level and speaker recognition loss at the segment level in a unified network by multitask learning, without additional data requirement and exhausting computation. By referring to HuBERT, we generate pseudo-phoneme labels to adjust a frame level feature distribution by deep cluster to ensure each cluster corresponds to an implicit pronunciation unit in the feature space. We compare the proposed loss with the softmax loss, center loss, triplet loss, log-likelihood-ratio cost loss, additive margin softmax loss and additive angular margin loss on the VoxCeleb database. Experimental results demonstrate the effectiveness of our proposed method.

(This article belongs to the Section Computing and Artificial Intelligence)

17 pages, 755 KB

Open Access Article

Hierarchical Phoneme Classification for Improved Speech Recognition

by Donghoon Oh, Jeong-Sik Park, Ji-Hwan Kim and Gil-Jin Jang

Appl. Sci. 2021, 11(1), 428; https://doi.org/10.3390/app11010428 - 4 Jan 2021

Cited by 21 | Viewed by 10470

Abstract

Speech recognition consists of converting input sound into a sequence of phonemes, then finding text for the input using language models. Therefore, phoneme classification performance is a critical factor for the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics is still a challenging problem even for state-of-the-art classification methods, and the classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method to apply more suitable recognition models to different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix from a baseline speech recognition model. Using automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. According to the results of a number of phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% and 71.7% for the baseline and proposed hierarchical models, showing a 2.2% overall improvement.
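
The idea of deriving phoneme groups from a baseline confusion matrix can be sketched as a small agglomerative procedure: phonemes that the baseline confuses with each other most often are merged first. The similarity measure, the average-linkage merging, the function name, and the toy counts below are all assumptions for illustration. The abstract does not specify the clustering algorithm the authors used.

```python
def cluster_by_confusion(labels, confusion, n_groups):
    """Greedy average-linkage grouping of phonemes by mutual confusability.

    confusion[i][j] is the count of phoneme i classified as phoneme j.
    Similarity of i and j is the row-normalized confusion in both
    directions added together (an assumed, illustrative measure).
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    row_total = [sum(row) or 1 for row in confusion]

    def sim(i, j):
        return confusion[i][j] / row_total[i] + confusion[j][i] / row_total[j]

    groups = [[i] for i in range(len(labels))]
    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                pairs = [(i, j) for i in groups[a] for j in groups[b]]
                score = sum(sim(i, j) for i, j in pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]  # merge the most confusable pair
        del groups[b]
    return [[labels[i] for i in g] for g in groups]


if __name__ == "__main__":
    # Toy counts, not TIMIT results: rows are true phonemes.
    toy = [
        [80, 15, 3, 2],   # s
        [18, 78, 2, 2],   # z
        [1, 2, 70, 27],   # m
        [2, 1, 30, 67],   # n
    ]
    print(cluster_by_confusion(["s", "z", "m", "n"], toy, 2))
```

In a hierarchical classifier, a first-stage model would then pick the group and a group-specific model would resolve the phoneme within it.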

(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

14 pages, 1795 KB

Open Access Article

Acoustic Data-Driven Subword Units Obtained through Segment Embedding and Clustering for Spontaneous Speech Recognition

by Jeong-Uk Bang, Sang-Hun Kim and Oh-Wook Kwon

Appl. Sci. 2020, 10(6), 2079; https://doi.org/10.3390/app10062079 - 19 Mar 2020

Cited by 4 | Viewed by 2960

Abstract

We propose a method to extend a phoneme set by using a large amount of broadcast data to improve the performance of Korean spontaneous speech recognition. In the proposed method, we first extract variable-length phoneme-level segments from broadcast data and then convert them into fixed-length embedding vectors based on a long short-term memory architecture. We use decision tree-based clustering to find acoustically similar embedding vectors and then build new acoustic subword units by gathering the clustered vectors. To update the lexicon of a speech recognizer, we build a lookup table between the tri-phone units and the units derived from the decision tree. Finally, the proposed lexicon is obtained by updating the original phoneme-based lexicon by referencing the lookup table. To verify the performance of the proposed unit, we compare the proposed unit with the previous units obtained by using the segment-based k-means clustering method or the frame-based decision-tree clustering method. As a result, the proposed unit is shown to produce better performance than the previous units in both spontaneous and read Korean speech recognition tasks.

(This article belongs to the Section Computing and Artificial Intelligence)

10 pages, 1976 KB

Open Access Article

Categorization of Mouse Ultrasonic Vocalizations Using Machine Learning Techniques

by Spyros Kouzoupis, Andreas Neocleous and Irene Athanassakis

Acoustics 2019, 1(4), 837-846; https://doi.org/10.3390/acoustics1040050 - 4 Nov 2019

Cited by 1 | Viewed by 4961

Abstract

This study examines the ultrasonic vocalizations of several adult male BALB/c mice in the presence of a female. A total of 179 distinct ultrasonic syllables referred to as “phonemes” are isolated, and in the resulting dataset, k-means and agglomerative clustering algorithms are implemented to group the ultrasonic vocalizations into clusters based on features extracted from their pitch contours. In order to find the optimal number of clusters, the elbow method was used, and nine distinct categories were obtained. Results when the k-means method was applied are presented through a matching matrix, while clustering results when the agglomerative technique was applied are presented as a dendrogram. The results of both methods are in line with the manual annotations made by the authors, as well as with the ones presented in the literature. The two methods of unsupervised analysis applied to 14-element feature vectors provide evidence that vocalizations can be grouped into nine clusters, which translates into the claim that there is a distinct repertoire of “syllables” or “phonemes”.
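
To make the k-means and elbow-method step concrete, here is a small dependency-free sketch. It runs k-means for a range of k, records the within-cluster sum of squares (inertia), and picks the k where the curve bends most sharply. The farthest-point initialization, the second-difference elbow heuristic, the function names, and the toy 2-D points are assumptions chosen for determinism and brevity; the article's feature vectors and exact procedure are not reproduced.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        farthest = max(points,
                       key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(tuple(farthest))
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia


def elbow(inertias):
    """k (1-based) with the largest second difference of the inertia curve."""
    best_k, best_bend = None, None
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if best_bend is None or bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


if __name__ == "__main__":
    # Toy data: three well-separated groups of three points each.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
           (20, 0), (20, 1), (21, 0)]
    curve = [kmeans(pts, k)[2] for k in range(1, 6)]
    print(elbow(curve))
```

In practice the elbow is often judged by eye from a plot of inertia against k, and the heuristic here is only one of several ways to automate that judgment.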

26 pages, 1993 KB

Open Access Article

Alternative Visual Units for an Optimized Phoneme-Based Lipreading System

by Helen L. Bear and Richard Harvey

Appl. Sci. 2019, 9(18), 3870; https://doi.org/10.3390/app9183870 - 15 Sep 2019

Cited by 13 | Viewed by 4660

Abstract

Lipreading is understanding speech from observed lip movements. An observed series of lip motions is an ordered sequence of visual lip gestures. These gestures are commonly known, but as yet are not formally defined, as ‘visemes’. In this article, we describe a structured approach which allows us to create speaker-dependent visemes with a fixed number of visemes within each set. We create sets of visemes for sizes two to 45. Each set of visemes is based upon clustering phonemes, thus each set has a unique phoneme-to-viseme mapping. We first present an experiment using these maps and the Resource Management Audio-Visual (RMAV) dataset which shows the effect of changing the viseme map size in speaker-dependent machine lipreading and demonstrate that word recognition with phoneme classifiers is possible. Furthermore, we show that there are intermediate units between visemes and phonemes which are better still. Second, we present a novel two-pass training scheme for phoneme classifiers. This approach uses our new intermediary visual units from our first experiment in the first pass as classifiers; before using the phoneme-to-viseme maps, we retrain these into phoneme classifiers. This method significantly improves on previous lipreading results with RMAV speakers.

(This article belongs to the Section Computing and Artificial Intelligence)

11 pages, 1628 KB

Open Access Article

Enhanced Sensitivity to Subphonemic Segments in Dyslexia: A New Instance of Allophonic Perception

by Willy Serniclaes and M’ballo Seck

Brain Sci. 2018, 8(4), 54; https://doi.org/10.3390/brainsci8040054 - 26 Mar 2018

Cited by 9 | Viewed by 5840

Abstract

Although dyslexia can be individuated in many different ways, it has only three discernable sources: a visual deficit that affects the perception of letters, a phonological deficit that affects the perception of speech sounds, and an audio-visual deficit that disturbs the association of letters with speech sounds. However, the very nature of each of these core deficits remains debatable. The phonological deficit in dyslexia, which is generally attributed to a deficit of phonological awareness, might result from a specific mode of speech perception characterized by the use of allophonic (i.e., subphonemic) units. Here we will summarize the available evidence and present new data in support of the “allophonic theory” of dyslexia. Previous studies have shown that the dyslexia deficit in the categorical perception of phonemic features (e.g., the voicing contrast between /t/ and /d/) is due to the enhanced sensitivity to allophonic features (e.g., the difference between two variants of /d/). Another consequence of allophonic perception is that it should also give rise to an enhanced sensitivity to allophonic segments, such as those that take place within a consonant cluster. This latter prediction is validated by the data presented in this paper.

(This article belongs to the Special Issue Dyslexia, Dysgraphia and Related Developmental Disorders)

Search Results (15)
