MDPI - Publisher of Open Access Journals

22 pages, 771 KB

Open Access Article

Evaluation of Glottal Inverse Filtering Techniques on OPENGLOT Synthetic Male and Female Vowels

by Marc Freixes, Luis Joglar-Ongay, Joan Claudi Socoró and Francesc Alías-Pujol

Appl. Sci. 2023, 13(15), 8775; https://doi.org/10.3390/app13158775 - 29 Jul 2023

Cited by 3 | Viewed by 3292

Current articulatory-based three-dimensional source–filter models, which allow the production of vowels and diphthongs, still present very limited expressiveness. Glottal inverse filtering (GIF) techniques can be instrumental in identifying specific characteristics of both the glottal source signal and the vocal tract transfer function to resemble expressive speech. Several GIF methods have been proposed in the literature; however, their comparison becomes difficult due to the lack of common and exhaustive experimental settings. In this work, first, a two-phase analysis methodology for the comparison of GIF techniques based on a reference dataset is introduced. Next, state-of-the-art GIF techniques based on iterative adaptive inverse filtering (IAIF) and quasi closed phase (QCP) approaches are thoroughly evaluated on OPENGLOT, an open database specifically designed to evaluate GIF, computing well-established GIF error measures after extending male vowels with their female counterparts. The results show that GIF methods obtain better results on male vowels. The QCP-based techniques significantly outperform IAIF-based methods for almost all error metrics and scenarios and are, at the same time, more stable across sex, phonation type, F0, and vowels. The IAIF variants improve the original technique for most error metrics on male vowels, while QCP with spectral tilt compensation achieves a lower spectral tilt error for male vowels than the original QCP.

(This article belongs to the Special Issue IberSPEECH 2022: Speech and Language Technologies for Iberian Languages)

13 pages, 595 KB

Open Access Article

Potential Add-On Effects of Manual Therapy Techniques in Migraine Patients: A Randomised Controlled Trial

by Elena Muñoz-Gómez, Pilar Serra-Añó, Sara Mollà-Casanova, Núria Sempere-Rubio, Marta Aguilar-Rodríguez, Gemma V. Espí-López and Marta Inglés

J. Clin. Med. 2022, 11(16), 4686; https://doi.org/10.3390/jcm11164686 - 11 Aug 2022

Cited by 16 | Viewed by 4568

Abstract

Objective: To ascertain whether the combination of soft tissue and articulatory manual techniques is more effective than either one of these techniques alone for reducing migraine impact; Methods: Seventy-five participants with migraine were randomly divided into three groups (n = 25 per group): (i) soft tissue (STG), (ii) articulatory (AG), and (iii) combined treatment (STAG). Pain, frequency of occurrence, duration, disability and impact, depression and anxiety levels, and perception of change were analysed at baseline, post-intervention (T2) and at four-week follow-up (T3); Results: STAG showed a significantly greater reduction in pain versus STG and AG at T2 (p < 0.001; p = 0.014) and at T3 (p < 0.001; p = 0.01). Furthermore, STAG achieved a significantly greater reduction in pain duration versus STG at T2 (p = 0.020) and T3 (p = 0.026) and a greater impression of change versus STG (p = 0.004) and AG (p = 0.037) at T3. Similar effects were observed in all groups for frequency of occurrence, migraine disability, impact, and depression and anxiety levels; Conclusions: A combined manual therapy protocol including soft tissue and articulatory techniques yields larger improvements on pain and perception of change than either technique alone, yet the three therapeutic approaches show similar benefits for reducing pain, disability and impact caused by the migraine, depression or anxiety levels.

(This article belongs to the Special Issue Migraine Pathophysiology and Treatment: Current Approaches and New Horizons)

24 pages, 3512 KB

Open Access Article

Mispronunciation Detection and Diagnosis with Articulatory-Level Feedback Generation for Non-Native Arabic Speech

by Mohammed Algabri, Hassan Mathkour, Mansour Alsulaiman and Mohamed A. Bencherif

Mathematics 2022, 10(15), 2727; https://doi.org/10.3390/math10152727 - 2 Aug 2022

Cited by 26 | Viewed by 6278

Abstract

A high-performance versatile computer-assisted pronunciation training (CAPT) system that provides the learner immediate feedback as to whether their pronunciation is correct is very helpful in learning correct pronunciation and allows learners to practice this at any time and with unlimited repetitions, without the presence of an instructor. In this paper, we propose deep learning-based techniques to build a high-performance versatile CAPT system for mispronunciation detection and diagnosis (MDD) and articulatory feedback generation for non-native Arabic learners. The proposed system can locate the error in pronunciation, recognize the mispronounced phonemes, and detect the corresponding articulatory features (AFs), not only in words but even in sentences. We formulate the recognition of phonemes and corresponding AFs as a multi-label object recognition problem, where the objects are the phonemes and their AFs in a spectral image. Moreover, we investigate the use of cutting-edge neural text-to-speech (TTS) technology to generate a new corpus of high-quality speech from predefined text that has the most common substitution errors among Arabic learners. The proposed model and its various enhanced versions achieved excellent results. We compared the performance of the different proposed models with the state-of-the-art end-to-end technique of MDD, and our system had a better performance. In addition, we proposed using fusion between the proposed model and the end-to-end model and obtained a better performance. Our best model achieved a 3.83% phoneme error rate (PER) in the phoneme recognition task, a 70.53% F1-score in the MDD task, and a detection error rate (DER) of 2.6% for the AF detection task.

(This article belongs to the Special Issue Recent Advances in Artificial Intelligence and Machine Learning)

17 pages, 3951 KB

Open Access Article

Deep-Learning-Based Automated Classification of Chinese Speech Sound Disorders

by Yao-Ming Kuo, Shanq-Jang Ruan, Yu-Chin Chen and Ya-Wen Tu

Children 2022, 9(7), 996; https://doi.org/10.3390/children9070996 - 1 Jul 2022

Cited by 11 | Viewed by 4302

Abstract

This article describes a system for analyzing acoustic data to assist in the diagnosis and classification of children’s speech sound disorders (SSDs) using a computer. The analysis concentrated on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a speech corpus containing 2540 stopping, backing, final consonant deletion process (FCDP), and affrication samples from 90 children aged 3–6 years with normal or pathological articulatory features. Each recording was accompanied by a detailed diagnostic annotation by two speech–language pathologists (SLPs). Classification of the speech samples was accomplished using three well-established neural network models for image classification. The feature maps were created using three sets of MFCC (Mel-frequency cepstral coefficients) parameters extracted from speech sounds and aggregated into a three-dimensional data structure as model input. We employed six techniques for data augmentation to augment the available dataset while avoiding overfitting. The experiments examine the usability of four different categories of Chinese phrases and characters. Experiments with different data subsets demonstrate the system’s ability to accurately detect the analyzed pronunciation disorders. The best multi-class classification using a single Chinese phrase achieves an accuracy of 74.4 percent.

(This article belongs to the Special Issue Current Research on Developmental Speech and Language Delays and Disorders)

17 pages, 469 KB

Open Access Review

A Situational Analysis of Current Speech-Synthesis Systems for Child Voices: A Scoping Review of Qualitative and Quantitative Evidence

by Camryn Terblanche, Michal Harty, Michelle Pascoe and Benjamin V. Tucker

Appl. Sci. 2022, 12(11), 5623; https://doi.org/10.3390/app12115623 - 1 Jun 2022

Cited by 9 | Viewed by 4736

Abstract

(1) Background: Speech synthesis has customarily focused on adult speech, but with the rapid development of speech-synthesis technology, it is now possible to create child voices with a limited amount of child-speech data. This scoping review summarises the evidence base related to developing synthesised speech for children. (2) Method: The included studies were those that were (1) published between 2006 and 2021 and (2) included child participants or voices of children aged 2–16 years. (3) Results: 58 studies were identified. They were discussed based on the languages used, the speech-synthesis systems and/or methods used, the speech data used, the intelligibility of the speech and the ages of the voices. Based on the reviewed studies, developing child-speech synthesis is notably more challenging than adult-speech synthesis. Child speech often presents with acoustic variability and articulatory errors. To account for this, researchers have most often attempted to adapt adult-speech models, using a variety of adaptation techniques. (4) Conclusions: Adapting adult speech has proven successful in child-speech synthesis. It appears that the resulting quality can be improved by training on a large amount of pre-selected speech data, aided by a neural-network classifier, to better match the children’s speech. We encourage future research surrounding individualised synthetic speech for children with CCN, with special attention to children who make use of low-resource languages.

(This article belongs to the Special Issue Applications of Speech and Language Technologies in Healthcare)

26 pages, 2025 KB

Open Access Review

Automatic Speech Recognition (ASR) Systems for Children: A Systematic Literature Review

by Vivek Bhardwaj, Mohamed Tahar Ben Othman, Vinay Kukreja, Youcef Belkhier, Mohit Bajaj, B. Srikanth Goud, Ateeq Ur Rehman, Muhammad Shafiq and Habib Hamam

Appl. Sci. 2022, 12(9), 4419; https://doi.org/10.3390/app12094419 - 27 Apr 2022

Cited by 111 | Viewed by 15747

Abstract

Automatic speech recognition (ASR) is one of the ways used to transform acoustic speech signals into text. Over the last few decades, an enormous amount of research has been done in the area of speech recognition (SR). However, most studies have focused on building ASR systems based on adult speech. The recognition of children’s speech was neglected for some time, which means that the field of children’s SR research is wide open. Children’s SR is a challenging task due to the large variations in children’s articulatory, acoustic, physical, and linguistic characteristics compared to adult speech. Thus, the field has become a very attractive area of research, and it is important to understand where the main center of attention is and which methods are most widely used for extracting acoustic features, along with the acoustic models, speech datasets, SR toolkits used during the recognition process, and so on. ASR systems or interfaces are extensively used and integrated into various real-life applications, such as search engines, the healthcare industry, biometric analysis, car systems, the military, aids for people with disabilities, and mobile devices. A systematic literature review (SLR) is presented in this work by extracting the relevant information from 76 research papers published from 2009 to 2020 in the field of ASR for children. The objective of this review is to shed light on the trends of research in children’s speech recognition and analyze the potential of trending techniques to recognize children’s speech.

(This article belongs to the Special Issue Automatic Speech Recognition)

14 pages, 592 KB

Open Access Article

Non-Parallel Articulatory-to-Acoustic Conversion Using Multiview-Based Time Warping

by Jose A. Gonzalez-Lopez, Alejandro Gomez-Alanis, José L. Pérez-Córdoba and Phil D. Green

Appl. Sci. 2022, 12(3), 1167; https://doi.org/10.3390/app12031167 - 23 Jan 2022

Cited by 2 | Viewed by 2567

Abstract

In this paper, we propose a novel algorithm called multiview temporal alignment by dependence maximisation in the latent space (TRANSIENCE) for the alignment of time series consisting of sequences of feature vectors with different lengths and different feature dimensionalities. The proposed algorithm, which is based on the theory of multiview learning, can be seen as an extension of the well-known dynamic time warping (DTW) algorithm but, as mentioned, it allows the sequences to have different dimensionalities. Our algorithm attempts to find an optimal temporal alignment between pairs of nonaligned sequences by first projecting their feature vectors into a common latent space where both views are maximally similar. To do this, powerful, nonlinear deep neural network (DNN) models are employed. Then, the resulting sequences of embedding vectors are aligned using DTW. Finally, the alignment paths obtained in the previous step are applied to the original sequences to align them. In the paper, we explore several variants of the algorithm that mainly differ in the way the DNNs are trained. We evaluated the proposed algorithm on an articulatory-to-acoustic (A2A) synthesis task involving the generation of audible speech from motion data captured from the lips and tongue of healthy speakers using a technique known as permanent magnet articulography (PMA). In this task, our algorithm is applied during the training stage to align pairs of nonaligned speech and PMA recordings that are later used to train DNNs able to synthesise speech from PMA data. Our results show that the quality of speech generated in the nonaligned scenario is comparable to that obtained in the parallel scenario.

(This article belongs to the Special Issue IberSPEECH 2020: Speech and Language Technologies for Iberian Languages)

15 pages, 5546 KB

Open Access Article

Domain-Adversarial Based Model with Phonological Knowledge for Cross-Lingual Speech Recognition

by Qingran Zhan, Xiang Xie, Chenguang Hu, Juan Zuluaga-Gomez, Jing Wang and Haobo Cheng

Electronics 2021, 10(24), 3172; https://doi.org/10.3390/electronics10243172 - 20 Dec 2021

Cited by 4 | Viewed by 4038

Abstract

Phonological-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) to extract reliable AFs, and different multi-stream techniques are used for cross-lingual speech recognition. First, a novel universal phonological attributes definition is proposed for Mandarin, English, German and French. Then a DANN-based AFs detector is trained using source languages (English, German and French). When doing the cross-lingual speech recognition, the AFs detectors are used to transfer the phonological knowledge from source languages (English, German and French) to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features and cross-lingual AFs. In addition, the monolingual AFs system (i.e., the AFs are directly extracted from the target language) is also investigated. Experiments show that the performance of the AFs detector can be improved by using convolutional neural networks (CNN) with a domain-adversarial learning method. The multi-head attention (MHA) based multi-stream can reach the best performance compared to the baseline, cross-lingual adaptation approach, and other approaches. More specifically, the MHA-mode with cross-lingual AFs yields significant improvements over monolingual AFs when the training data size is restricted, and it can be easily extended to other low-resource languages.

(This article belongs to the Special Issue Applications of Neural Networks for Speech and Language Processing)

23 pages, 7440 KB

Open Access Article

Deep Learning-Based Detection of Articulatory Features in Arabic and English Speech

by Mohammed Algabri, Hassan Mathkour, Mansour M. Alsulaiman and Mohamed A. Bencherif

Sensors 2021, 21(4), 1205; https://doi.org/10.3390/s21041205 - 9 Feb 2021

Cited by 9 | Viewed by 3467

Abstract

This study proposes using object detection techniques to recognize sequences of articulatory features (AFs) from speech utterances by treating AFs of phonemes as multi-label objects in a speech spectrogram. The proposed system, called AFD-Obj, recognizes sequences of multi-label AFs in a speech signal and localizes them. AFD-Obj consists of two main stages: firstly, we formulate the problem of AFs detection as an object detection problem and prepare the data to fulfill the requirements of object detectors by generating a spectral three-channel image from the speech signal and creating the corresponding annotation for each utterance. Secondly, we use annotated images to train the proposed system to detect sequences of AFs and their boundaries. We test the system by feeding spectrogram images to the system, which will recognize and localize multi-label AFs. We investigated using these AFs to detect the utterance phonemes. The YOLOv3-tiny detector is selected because of its real-time property and its support for multi-label detection. We test our AFD-Obj system on Arabic and English languages using KAPD and TIMIT corpora, respectively. Additionally, we propose using YOLOv3-tiny as an Arabic phoneme detection system (i.e., PD-Obj) to recognize and localize a sequence of Arabic phonemes from whole speech utterances. The proposed AFD-Obj and PD-Obj systems achieve excellent results for the Arabic corpus and results comparable to the state-of-the-art method for the English corpus. Moreover, we showed that using only one-scale detection is suitable for AFs detection or phoneme recognition.

(This article belongs to the Section Intelligent Sensors)

12 pages, 911 KB

Open Access Article

Chatting While Walking Does Not Interfere with Topographical Working Memory

by Laura Piccardi, Alessia Bocchi, Massimiliano Palmiero, Maddalena Boccia, Simonetta D’Amico and Raffaella Nori

Brain Sci. 2020, 10(11), 811; https://doi.org/10.3390/brainsci10110811 - 2 Nov 2020

Cited by 3 | Viewed by 2992

Abstract

In the present study, we employed the dual task technique to explore the role of language in topographical working memory when landmarks are present along the path. We performed three experiments to mainly test the effects of language but also motor, spatial motor and spatial environment interferences on topographical working memory. We aimed to clarify both the role of language in navigational working memory per se and the extent to which spatial language interferes with the main task more than the other types of interference. Specifically, in the three experiments we investigated the differences due to different verbal interference sources (i.e., articulatory suppression of nonsense syllables; right and left, up and bottom; and north, south, east and west). The main hypothesis was that the use of spatial language affected landmark-based topographical working memory more than both the verbalization of nonsense syllables and other types of interference. Results show no effect of spatial language; only spatial environmental interference affected the navigational working memory performance. In general, this might depend on the scarce role of spatial language in online navigational working memory tasks. Specifically, language is more important for learning and retrieval of the cognitive map. Implications and future research directions are discussed.

(This article belongs to the Special Issue Effects of Individual Differences on Spatial Cognition)

7 pages, 2189 KB

Open Access Review

The Efficacy of Oral Myofunctional and Coarticulation Therapy

by Forrest G. Umberger and Robert G. Johnston

Int. J. Orofac. Myol. Myofunct. Ther. 1997, 23(1), 3-9; https://doi.org/10.52010/ijom.1997.23.1.3 - 1 Nov 1997

Cited by 2 | Viewed by 346

Abstract

Summary: The authors have attempted to summarize the current state of knowledge about the relationships between oral myofunctional therapy and articulation therapy. Considerable evidence has been obtained that indicates that oral myofunctional therapy techniques can improve articulation of sibilant sounds. These findings present an optimistic direction for future research and successful use of myofunctional therapy and coarticulation. There remains the need for clinical practitioners and laboratory scientists to continue to investigate the commonalities and differences of oral myofunctional and articulation disorders. We have suggested that the SLP can use techniques that are fundamentally sound for modifying both biological and articulatory behaviors. The use of coarticulation assessment and intervention processes combined with oral myofunctional retraining can coexist in a program designed to retract the resting and ballistic movements of the tongue.
