Search Results (12)

Search Parameters:
Keywords = mispronunciation

21 pages, 3698 KB  
Article
Child-Centric Robot Dialogue Systems: Fine-Tuning Large Language Models for Better Utterance Understanding and Interaction
by Da-Young Kim, Hyo Jeong Lym, Hanna Lee, Ye Jun Lee, Juhyun Kim, Min-Gyu Kim and Yunju Baek
Sensors 2024, 24(24), 7939; https://doi.org/10.3390/s24247939 - 12 Dec 2024
Cited by 1 | Viewed by 2673
Abstract
Dialogue systems must understand children’s utterance intentions by considering their unique linguistic characteristics, such as syntactic incompleteness, pronunciation inaccuracies, and creative expressions, to enable natural conversational engagement in child–robot interactions. Because of these distinctive features, even state-of-the-art large language models (LLMs) for language understanding and contextual awareness cannot comprehend children’s intent as accurately as humans can. An LLM-based dialogue system should therefore learn how humans interpret children’s speech in order to improve its intention reasoning in verbal interactions with children. To this end, we propose a fine-tuning methodology that uses LLM–human judgment discrepancy data and interactive response data. The former represent cases in which the LLM’s and a human’s judgments of the contextual appropriateness of a child’s answer to a robot’s question diverge. The latter consist of LLM-generated robot responses suited to children’s utterance intentions. We fine-tuned a dialogue system on these datasets to achieve human-like interpretations of children’s utterances and to respond adaptively. The system was evaluated through human assessment using the Robotic Social Attributes Scale (RoSAS) and Sensibleness and Specificity Average (SSA) metrics. It supports effective interpretation of children’s utterance intentions and enables natural verbal interaction, even with syntactically incomplete and mispronounced utterances.
(This article belongs to the Special Issue Challenges in Human-Robot Interactions for Social Robotics)
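A minimal sketch of how the two data types described above (LLM–human judgment discrepancies and interactive responses) might be serialized into chat-format fine-tuning examples. The record fields, file name, and system prompt are illustrative assumptions, not the authors' actual schema.

```python
import json

# Hypothetical discrepancy record: the LLM judged a child's answer
# "inappropriate" for the robot's question, while a human rater, tolerant
# of syntactic gaps and mispronunciations, judged it "appropriate".
discrepancy_cases = [
    {
        "robot_question": "What did you play at the playground today?",
        "child_answer": "swing... uppy up high!",  # incomplete, creative
        "human_judgment": "appropriate",
        "human_reading": "The child says they went high on the swing.",
    },
]

def to_chat_example(case):
    """Convert one discrepancy case into a chat-format fine-tuning example
    that teaches the model the human interpretation."""
    return {
        "messages": [
            {"role": "system",
             "content": "Interpret the child's utterance tolerantly, "
                        "allowing for incomplete syntax and mispronunciations."},
            {"role": "user",
             "content": f"Robot: {case['robot_question']}\n"
                        f"Child: {case['child_answer']}\n"
                        "Is the answer contextually appropriate, and what does it mean?"},
            {"role": "assistant",
             "content": f"{case['human_judgment']}. {case['human_reading']}"},
        ]
    }

# Write one JSONL line per case, the usual input format for fine-tuning jobs.
with open("discrepancy_finetune.jsonl", "w", encoding="utf-8") as f:
    for case in discrepancy_cases:
        f.write(json.dumps(to_chat_example(case), ensure_ascii=False) + "\n")
```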

22 pages, 461 KB  
Article
(Mis)pronunciations of Hispanic Given Names in the U.S.: Positionalities and Discursive Strategies at Play
by Paola Enríquez Duque
Languages 2023, 8(3), 199; https://doi.org/10.3390/languages8030199 - 28 Aug 2023
Cited by 3 | Viewed by 3517
Abstract
This qualitative study examines the indexical nature of given names and their role in self-positioning within diverse social contexts. The study centers on the pronunciation of Hispanic given names in the United States. The analysis is grounded in interviews with six young adults who recognize that their names have Spanish and English variants, and it demonstrates that bearers’ phonological awareness plays a critical role in distinguishing name variants and mispronunciations, as evidenced through metalinguistic comments. These distinctions are additionally shaped by personal criteria. By examining the participants’ narratives and one participant’s discursive strategies in particular, I show that the pronunciation of given names constitutes a significant linguistic resource intentionally mobilized and managed to negotiate social positionings. Moreover, this research highlights that conferring Hispanic given names in the U.S. constitutes a sociocultural strategy that extends beyond an indexical ethnocultural naming practice across generations. This practice is found to be a means of fostering and maintaining intergenerational relationships.
(This article belongs to the Special Issue Social Meanings of Language Variation in Spanish)
20 pages, 6480 KB  
Article
Arabic Mispronunciation Recognition System Using LSTM Network
by Abdelfatah Ahmed, Mohamed Bader, Ismail Shahin, Ali Bou Nassif, Naoufel Werghi and Mohammad Basel
Information 2023, 14(7), 413; https://doi.org/10.3390/info14070413 - 16 Jul 2023
Cited by 12 | Viewed by 3100
Abstract
The Arabic language has always attracted people of many ethnicities by virtue of its significant linguistic legacy, and a multitude of people from all over the world want to learn it. However, learners from different mother tongues and cultural backgrounds may struggle with articulation because certain sounds exist only in Arabic and are absent from their native languages, which can hinder the learning process. We therefore implemented an efficient speaker-independent, text-dependent system to detect articulation disorders. The proposed system emphasizes the role of speech signal processing in diagnosing Arabic mispronunciation, using Mel-frequency cepstral coefficients (MFCCs) as the extracted features and long short-term memory (LSTM) networks for classification. The analytical framework also incorporates a gender recognition model to perform two-level classification. Our results show that the LSTM network significantly enhances mispronunciation detection along with gender recognition: the LSTM models attained an average accuracy of 81.52%, a strong performance compared with previous mispronunciation detection systems.
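A minimal sketch of the MFCC-plus-LSTM pipeline the abstract describes, using librosa and PyTorch. The feature dimensions, network size, and file name are illustrative assumptions rather than the paper's configuration.

```python
import librosa
import torch
import torch.nn as nn

def mfcc_features(wav_path, n_mfcc=13, sr=16000):
    """Load audio and return a (frames, n_mfcc) MFCC matrix, time-major."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return torch.tensor(mfcc.T, dtype=torch.float32)

class MispronunciationLSTM(nn.Module):
    """Binary correct/mispronounced classifier over an MFCC sequence."""
    def __init__(self, n_mfcc=13, hidden=128, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):           # x: (batch, time, n_mfcc)
        _, (h, _) = self.lstm(x)    # h: (layers, batch, hidden)
        return self.head(h[-1])     # logits from the final hidden state

model = MispronunciationLSTM()
feats = mfcc_features("utterance.wav")   # hypothetical learner recording
logits = model(feats.unsqueeze(0))       # add batch dimension
print(logits.softmax(-1))                # P(correct), P(mispronounced)
```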

17 pages, 1757 KB  
Article
End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning
by Linkai Peng, Yingming Gao, Rian Bao, Ya Li and Jinsong Zhang
Appl. Sci. 2023, 13(11), 6793; https://doi.org/10.3390/app13116793 - 2 Jun 2023
Cited by 8 | Viewed by 5702
Abstract
As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) techniques have attracted much attention from academia and industry over the past decade. Training robust MDD models requires massive human-annotated speech recordings, which are expensive and often hard to acquire. In this study, we propose transfer learning to tackle this data scarcity from two directions. First, in the audio modality, we use the pretrained wav2vec2.0 model to learn robust general acoustic representations for MDD tasks. Second, in the text modality, we transfer prior texts into MDD by learning associations between the acoustic and textual modalities. We propose textual modulation gates that assign more importance to relevant text information while suppressing irrelevant text information. Moreover, given the transcriptions, we add a contrastive loss to reduce the gap between the learning objectives of the phoneme recognition and MDD tasks. Experiments on the L2-Arctic dataset showed that our wav2vec2.0-based models outperformed conventional methods. The proposed textual modulation gate and contrastive loss further improved the F1-score by more than 2.88%, and our best model achieved an F1-score of 61.75%.
(This article belongs to the Special Issue Advances in Speech and Language Processing)
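A hedged sketch of the audio-modality transfer described above: load pretrained wav2vec2.0 with a fresh CTC head for phoneme recognition, via Hugging Face Transformers. The checkpoint name and phoneme vocabulary size are assumptions; the paper's textual modulation gates and contrastive loss are not reproduced here.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC

N_PHONES = 45  # assumed phoneme inventory plus CTC blank, not the paper's figure
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-base",    # pretrained acoustic representation
    vocab_size=N_PHONES,         # fresh, randomly initialized CTC head
    ignore_mismatched_sizes=True,
)
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")

waveform = torch.randn(16000)    # stand-in for 1 s of L2 learner speech
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
logits = model(inputs.input_values).logits   # (1, frames, N_PHONES)
phone_path = logits.argmax(-1).squeeze()     # greedy CTC path; mispronunciations
                                             # surface as mismatches against the
                                             # canonical phone sequence
```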

24 pages, 5307 KB  
Article
Rule-Based Embedded HMMs Phoneme Classification to Improve Qur’anic Recitation Recognition
by Ammar Mohammed Ali Alqadasi, Mohd Shahrizal Sunar, Sherzod Turaev, Rawad Abdulghafor, Md Sah Hj Salam, Abdulaziz Ali Saleh Alashbi, Ali Ahmed Salem and Mohammed A. H. Ali
Electronics 2023, 12(1), 176; https://doi.org/10.3390/electronics12010176 - 30 Dec 2022
Cited by 9 | Viewed by 4823
Abstract
Phoneme classification performance is a critical factor in the successful implementation of a speech recognition system. A mispronunciation of Arabic short or long vowels can change the meaning of a complete sentence. However, correctly distinguishing phonemes with vowels in Qur’anic recitation (of the Holy book of Muslims) remains challenging even for state-of-the-art classification methods, because phoneme duration is an essential feature of recitation: phoneme lengthening, called Medd, is governed by strict rules. Phoneme classification based on Arabic language characteristics alone is therefore insufficient to recognize Tajweed rules, including the rules of Medd. This paper introduces a rule-based phoneme duration algorithm to improve phoneme classification in Qur’anic recitation. The phonemes of the Qur’anic dataset comprise 21 Ayats collected from 30 reciters and are carefully analyzed against a baseline HMM-based speech recognition model. Using a hidden Markov model with tied-state triphones, a set of duration-optimized phoneme classification models is constructed and integrated into a Qur’anic phoneme classification method. The proposed algorithm achieved outstanding accuracy, ranging from 99.87% to 100% depending on the Medd type. These results will contribute significantly to Qur’anic recitation recognition models.
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)
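A minimal sketch of a rule-based duration check in the spirit of the algorithm described above. The harakah counts per Medd type follow common Tajweed convention (2, 4, 6), but the base harakah duration and tolerance are illustrative assumptions, not the paper's tuned values; in the paper, phoneme durations come from HMM tied-state triphone alignments.

```python
# Map a measured vowel duration to the closest Medd (lengthening) rule.
MEDD_RULES = {
    "medd_tabee": 2,      # natural Medd: 2 counts (harakat)
    "medd_munfasil": 4,   # permissible lengthening: commonly 4 counts
    "medd_lazim": 6,      # obligatory lengthening: 6 counts
}

def classify_medd(phone_duration_s, harakah_s=0.25, tolerance=0.5):
    """Classify a vowel duration (seconds, e.g. from forced alignment)
    as the nearest Medd rule within +/- tolerance counts, else no Medd."""
    counts = phone_duration_s / harakah_s
    name, target = min(MEDD_RULES.items(), key=lambda kv: abs(kv[1] - counts))
    return name if abs(target - counts) <= tolerance else "no_medd"

print(classify_medd(0.52))   # ~2.1 counts -> medd_tabee
print(classify_medd(1.45))   # ~5.8 counts -> medd_lazim
```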

18 pages, 972 KB  
Article
Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection
by Md. Anwar Hussen Wadud, Mohammed Alatiyyah and M. F. Mridha
Appl. Sci. 2023, 13(1), 109; https://doi.org/10.3390/app13010109 - 22 Dec 2022
Cited by 15 | Viewed by 4303
Abstract
A crucial element of computer-assisted pronunciation training (CAPT) systems is the mispronunciation detection and diagnosis (MDD) technique. Provided transcriptions can act as a teacher when evaluating the pronunciation quality of finite speech. Conventional approaches, such as forced alignment and extended recognition networks, have employed such prior texts in full for model development or to enhance system performance. More recently, end-to-end (E2E) approaches have attempted to incorporate prior texts into model training, with promising preliminary results. However, attention-based end-to-end models show lower speech recognition performance because multi-pass left-to-right forward computation constrains their practical applicability in beam search. In addition, end-to-end neural approaches are typically data-hungry, and a lack of non-native training data frequently impairs their effectiveness in MDD. To solve this problem, we present an MDD technique that uses non-autoregressive (NAR) end-to-end neural models to greatly reduce estimation time while maintaining accuracy comparable to traditional E2E neural models. NAR models generate token sequences in parallel by accepting parallel inputs instead of performing left-to-right forward computation. To further enhance MDD, we construct a pronunciation model superimposed on our NAR end-to-end models. Evaluated against several of the best end-to-end models on the publicly available L2-ARCTIC and SpeechOcean English datasets, the proposed model achieves the best results.
(This article belongs to the Special Issue Deep Learning for Speech Processing)
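A toy contrast between autoregressive and non-autoregressive decoding, which is the latency argument the abstract makes: AR decoding needs one dependent step per output position, while NAR decoding scores all positions in a single parallel pass. The scoring function is a random stand-in, not the proposed network.

```python
import torch

T, VOCAB = 8, 50
torch.manual_seed(0)
enc = torch.randn(T, VOCAB)              # stand-in encoder outputs, one per position

def score(pos, prefix):
    """Stand-in decoder: logits for `pos` given the already-decoded prefix."""
    return enc[pos] + 0.1 * len(prefix)  # trivial placeholder dependency

# Autoregressive: T sequential calls, each waiting on the previous token
# (beam search multiplies this sequential cost further).
prefix = []
for t in range(T):
    prefix.append(int(score(t, prefix).argmax()))

# Non-autoregressive: every position decoded simultaneously, one forward pass.
nar_tokens = enc.argmax(dim=-1).tolist()

print(prefix, nar_tokens)
```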

24 pages, 3512 KB  
Article
Mispronunciation Detection and Diagnosis with Articulatory-Level Feedback Generation for Non-Native Arabic Speech
by Mohammed Algabri, Hassan Mathkour, Mansour Alsulaiman and Mohamed A. Bencherif
Mathematics 2022, 10(15), 2727; https://doi.org/10.3390/math10152727 - 2 Aug 2022
Cited by 20 | Viewed by 5663
Abstract
A high-performance, versatile computer-assisted pronunciation training (CAPT) system that gives the learner immediate feedback on whether their pronunciation is correct is very helpful for learning correct pronunciation: it allows learners to practice at any time, with unlimited repetitions, without the presence of an instructor. In this paper, we propose deep learning-based techniques to build such a CAPT system for mispronunciation detection and diagnosis (MDD) and articulatory feedback generation for non-native Arabic learners. The proposed system can locate the error in pronunciation, recognize the mispronounced phonemes, and detect the corresponding articulatory features (AFs), not only in words but also in sentences. We formulate the recognition of phonemes and their corresponding AFs as a multi-label object recognition problem, where the objects are the phonemes and their AFs in a spectral image. Moreover, we investigate the use of cutting-edge neural text-to-speech (TTS) technology to generate a new corpus of high-quality speech from predefined text containing the most common substitution errors among Arabic learners. The proposed model and its enhanced versions achieved excellent results: our system outperformed the state-of-the-art end-to-end MDD technique, and a fusion of the proposed model with the end-to-end model performed better still. Our best model achieved a 3.83% phoneme error rate (PER) in the phoneme recognition task, a 70.53% F1-score in the MDD task, and a 2.6% detection error rate (DER) in the AF detection task.
(This article belongs to the Special Issue Recent Advances in Artificial Intelligence and Machine Learning)
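A minimal sketch of the multi-label formulation described above: one spectral image, several simultaneously active labels (a phoneme plus its articulatory features), trained with independent per-label sigmoids. The inventory sizes and the tiny CNN are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_PHONEMES, N_AF = 34, 10          # assumed Arabic phoneme and AF inventories
N_LABELS = N_PHONEMES + N_AF

net = nn.Sequential(               # tiny CNN over a 1 x 128 x 128 spectrogram crop
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, N_LABELS),
)

x = torch.randn(2, 1, 128, 128)    # batch of spectrogram crops
target = torch.zeros(2, N_LABELS)  # multi-hot: a phoneme and its AFs are all 1
target[0, 5] = target[0, N_PHONEMES + 2] = 1.0

loss = nn.BCEWithLogitsLoss()(net(x), target)  # independent sigmoid per label
loss.backward()
```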

28 pages, 8971 KB  
Article
An Acoustic Feature-Based Deep Learning Model for Automatic Thai Vowel Pronunciation Recognition
by Niyada Rukwong and Sunee Pongpinigpinyo
Appl. Sci. 2022, 12(13), 6595; https://doi.org/10.3390/app12136595 - 29 Jun 2022
Cited by 5 | Viewed by 3419
Abstract
In Thai, a vowel mispronunciation can completely change the meaning of a word, so effective, standardized practice is essential for pronouncing words correctly as a native speaker does. Since the COVID-19 pandemic, online learning has become increasingly popular; for example, online pronunciation applications now offer virtual teachers and intelligent student evaluation comparable to standardized training by a teacher in a real classroom. This research presents an online automatic computer-assisted pronunciation training (CAPT) system that uses deep learning to recognize Thai vowels in speech. The automatic CAPT addresses the shortage of instruction specialists and the complexity of vowel teaching, integrating computational techniques with linguistic theory. The deep learning model that recognizes pronounced vowels is its most significant component, and the major challenge is the correct identification of Thai vowels spoken in real-world situations. A convolutional neural network (CNN) is developed to classify pronounced Thai vowels, trained on a new Thai vowel dataset designed, collected, and examined by linguists. The optimal CNN model with Mel spectrogram (MS) input achieves the highest accuracy of 98.61%, compared with 94.44% for Mel frequency cepstral coefficients (MFCC) with a baseline long short-term memory (LSTM) model and 90.00% for MS with the baseline LSTM model.
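A minimal sketch of the Mel spectrogram (MS) front end the optimal model uses, via librosa; the sampling rate, number of Mel bands, and file name are assumptions, not the paper's settings.

```python
import librosa
import numpy as np

def mel_spectrogram(wav_path, sr=16000, n_mels=64):
    """Return a log-scaled Mel spectrogram (n_mels x frames) as CNN input."""
    y, sr = librosa.load(wav_path, sr=sr)
    ms = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(ms, ref=np.max)   # dB scale, 0 at the peak

spec = mel_spectrogram("vowel_aa.wav")           # hypothetical recording
cnn_input = spec[np.newaxis, np.newaxis, :, :]   # (batch, channel, mels, frames)
```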

19 pages, 2070 KB  
Article
An Approach for Pronunciation Classification of Classical Arabic Phonemes Using Deep Learning
by Amna Asif, Hamid Mukhtar, Fatimah Alqadheeb, Hafiz Farooq Ahmad and Abdulaziz Alhumam
Appl. Sci. 2022, 12(1), 238; https://doi.org/10.3390/app12010238 - 27 Dec 2021
Cited by 29 | Viewed by 7282
Abstract
A mispronunciation of Arabic short vowels can change the meaning of a complete sentence. For this reason, both students and teachers of Classical Arabic (CA) require extra practice to correct students’ pronunciation of Arabic short vowels, which makes teaching and learning cumbersome for both parties. An intelligent student evaluation process can make learning and teaching easier for both. Given that online learning has become a norm these days, modern learning requires assessment by virtual teachers; in our case, the task is recognizing the exact pronunciation of Arabic alphabet letters according to the standards. A major challenge in recognizing the precise pronunciation of the Arabic alphabet is the correct identification of a large number of short vowels, which cannot be handled by traditional statistical audio processing techniques and machine learning models. We therefore developed a model that classifies Arabic short vowels using deep neural networks (DNN). The model was constructed from scratch by (i) collecting a new audio dataset, (ii) developing a neural network architecture, and (iii) optimizing and fine-tuning the model over several iterations to achieve high classification accuracy. On a set of unseen audio samples of uttered short vowels, the proposed model reached a testing accuracy of 95.77%. These results can help experts and researchers build better intelligent learning support systems for Arabic speech processing.
(This article belongs to the Special Issue Women in Artificial intelligence (AI))
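A minimal sketch of a DNN classifier over fixed-length acoustic feature vectors for the three Arabic short vowels (fatha, damma, kasra), in PyTorch. The feature dimension, layer sizes, and training details are illustrative, not the paper's fine-tuned architecture.

```python
import torch
import torch.nn as nn

N_FEATS, N_VOWELS = 120, 3                   # assumed feature dim; 3 short vowels

dnn = nn.Sequential(
    nn.Linear(N_FEATS, 256), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, N_VOWELS),
)
opt = torch.optim.Adam(dnn.parameters(), lr=1e-3)

x = torch.randn(32, N_FEATS)                 # stand-in feature batch
y = torch.randint(0, N_VOWELS, (32,))        # vowel labels

loss = nn.CrossEntropyLoss()(dnn(x), y)      # one training step
opt.zero_grad()
loss.backward()
opt.step()
```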

18 pages, 1878 KB  
Article
Breakdowns in Informativeness of Naturalistic Speech Production in Primary Progressive Aphasia
by Jeanne Gallée, Claire Cordella, Evelina Fedorenko, Daisy Hochberg, Alexandra Touroutoglou, Megan Quimby and Bradford C. Dickerson
Brain Sci. 2021, 11(2), 130; https://doi.org/10.3390/brainsci11020130 - 20 Jan 2021
Cited by 19 | Viewed by 4621
Abstract
“Functional communication” refers to an individual’s ability to communicate effectively in his or her everyday environment, and thus is a paramount skill to monitor and target therapeutically in people with aphasia. However, traditional controlled-paradigm assessments commonly used in both research and clinical settings often fail to adequately capture this ability. In the current study, facets of functional communication were measured from picture-elicited speech samples from 70 individuals with mild primary progressive aphasia (PPA), including the three variants, and 31 age-matched controls. Building upon methods recently used by Berube et al. (2019), we measured the informativeness of speech by quantifying the content of each patient’s description that was relevant to a picture relative to the total amount of speech they produced. Importantly, form-based errors, such as mispronunciations of words, unusual word choices, or grammatical mistakes, are not penalized in this approach. We found that the relative informativeness, or efficiency, of speech was preserved in non-fluent variant PPA patients as compared with controls, whereas the logopenic and semantic variant PPA patients produced significantly less informative output. Furthermore, reduced informativeness in the semantic variant is attributable to a lower production of content units and a propensity for self-referential tangents, whereas for the logopenic variant, a lower production of content units and relatively “empty” speech and false starts contribute to this reduction. These findings demonstrate that functional communication impairment does not uniformly affect all the PPA variants and highlight the utility of naturalistic speech analysis for measuring the breakdown of functional communication in PPA.
(This article belongs to the Special Issue Advances in Primary Progressive Aphasia)
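A toy version of the informativeness measure described above: content units relevant to the picture divided by total output. The content-unit lexicon here is a stand-in for the Berube et al. (2019) scoring scheme; note that form-based errors are simply ignored, as in the paper's approach.

```python
# Hypothetical content-unit lexicon for a single picture stimulus.
CONTENT_UNITS = {"boy", "cookie", "jar", "stool", "falling", "sink"}

def informativeness(transcript: str) -> float:
    """Relevant content units per word produced; mispronunciations, odd word
    choices, and grammatical errors carry no penalty."""
    words = transcript.lower().split()
    relevant = sum(1 for w in words if w in CONTENT_UNITS)
    return relevant / len(words) if words else 0.0

print(informativeness("the boy um the boy is falling off the stool"))  # 0.4
```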

17 pages, 633 KB  
Article
Improving Mispronunciation Detection of Arabic Words for Non-Native Learners Using Deep Convolutional Neural Network Features
by Shamila Akhtar, Fawad Hussain, Fawad Riasat Raja, Muhammad Ehatisham-ul-haq, Naveed Khan Baloch, Farruh Ishmanov and Yousaf Bin Zikria
Electronics 2020, 9(6), 963; https://doi.org/10.3390/electronics9060963 - 9 Jun 2020
Cited by 34 | Viewed by 5515
Abstract
Computer-Aided Language Learning (CALL) is growing because learning new languages is essential for communicating with people of different linguistic backgrounds. Mispronunciation detection is an integral part of CALL, used to automatically point out errors to non-native speakers. In this paper, we investigate mispronunciation detection of Arabic words using a deep convolutional neural network (CNN). For automated pronunciation error detection, we propose a CNN features-based model, extracting features from different layers of AlexNet (layers 6, 7, and 8) to train three machine learning classifiers: K-nearest neighbor (KNN), support vector machine (SVM), and random forest (RF). We also use a transfer learning-based model in which feature extraction and classification are performed automatically. To evaluate the proposed method, we provide a comprehensive comparison against a traditional machine learning baseline that uses Mel frequency cepstral coefficient (MFCC) features with the same three classifiers (KNN, SVM, and RF). Experimental results show that the handcrafted-feature baseline, the transfer learning-based method, and classification based on deep features extracted from AlexNet achieved average accuracies of 73.67%, 85%, and 93.20% on Arabic words, respectively. The proposed method with feature selection thus achieved the best average accuracy, 93.20%, of all the methods.
(This article belongs to the Special Issue Deep Learning for Multiple-Level Visual Feature Extraction)
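A minimal sketch of the deep-feature pipeline the abstract describes: activations from an AlexNet fully connected layer (layer 6 here) feed a classical classifier. torchvision's pretrained AlexNet and the random spectrogram-image batch are stand-ins for the paper's setup.

```python
import torch
import torchvision.models as models
from sklearn.ensemble import RandomForestClassifier

# Pretrained AlexNet; keep everything up to and including the first FC layer
# (classifier[:3] = Dropout, Linear(9216 -> 4096), ReLU), i.e. "layer 6".
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
fc6 = torch.nn.Sequential(alexnet.features, alexnet.avgpool,
                          torch.nn.Flatten(), *alexnet.classifier[:3])

with torch.no_grad():
    imgs = torch.randn(8, 3, 224, 224)   # stand-in spectrogram images
    feats = fc6(imgs).numpy()            # (8, 4096) deep features

labels = [0, 1, 0, 1, 0, 1, 0, 1]        # correct vs. mispronounced (toy)
clf = RandomForestClassifier(n_estimators=100).fit(feats, labels)
print(clf.predict(feats[:2]))
```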

25 pages, 2960 KB  
Article
Consonant and Vowel Processing in Word Form Segmentation: An Infant ERP Study
by Katie Von Holzen, Leo-Lyuki Nishibayashi and Thierry Nazzi
Brain Sci. 2018, 8(2), 24; https://doi.org/10.3390/brainsci8020024 - 31 Jan 2018
Cited by 21 | Viewed by 7245
Abstract
Segmentation skill and the preferential processing of consonants (C-bias) develop during the second half of the first year of life, and it has been proposed that both facilitate language acquisition. We used event-related brain potentials (ERPs) to investigate the neural bases of early word form segmentation and of the early processing of onset consonants, medial vowels, and coda consonants, exploring how differences in these early skills might relate to later language outcomes. Our results with French-learning eight-month-old infants primarily support previous findings that the word familiarity effect in segmentation is developing from a positive to a negative polarity at this age. Although the group as a whole exhibited an anterior-localized negative effect, inspection of individual results revealed that a majority of infants showed a negative-going response (Negative Responders), while a minority showed a positive-going response (Positive Responders). Furthermore, all infants demonstrated sensitivity to onset consonant mispronunciations, while Negative Responders showed no sensitivity to vowel mispronunciations, a developmental pattern similar to the previous literature. Responses to coda consonant mispronunciations revealed neither sensitivity nor a clear lack of it. Infants showing a more mature, negative response to newly segmented words compared with control words (evaluating segmentation skill) and to mispronunciations (evaluating phonological processing) also showed greater growth in word production over the second year of life than infants showing a more positive response. These results establish a relationship between early segmentation skill, phonological processing (not modulated by the type of mispronunciation), and later lexical skills.
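A toy numpy sketch of the ERP contrast behind the word familiarity effect: average over epochs time-locked to newly segmented versus control words, then inspect the polarity of the difference wave. Epoch counts, sampling, and the analysis window are illustrative assumptions, not the study's recording parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_samples = 40, 300          # one channel, stand-in epoch matrices

familiar = rng.normal(0, 5, (n_epochs, n_samples))   # newly segmented words
control = rng.normal(0, 5, (n_epochs, n_samples))    # control words

# Average across epochs, then take the familiar-minus-control difference wave.
difference_wave = familiar.mean(axis=0) - control.mean(axis=0)

# A negative-going mean difference in the window of interest would mark a
# "Negative Responder"; a positive-going one, a "Positive Responder".
window = slice(150, 250)
print("responder:",
      "negative" if difference_wave[window].mean() < 0 else "positive")
```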
