Search Results (4)

Search Parameters:
Keywords = dysarthric speech recognition

25 pages, 492 KiB  
Article
A Study on Model Training Strategies for Speaker-Independent and Vocabulary-Mismatched Dysarthric Speech Recognition
by Jinzi Qi and Hugo Van hamme
Appl. Sci. 2025, 15(4), 2006; https://doi.org/10.3390/app15042006 - 14 Feb 2025
Viewed by 932
Abstract
Automatic speech recognition (ASR) systems often struggle to recognize speech from individuals with dysarthria, a speech disorder with neuromuscular causes, with accuracy declining further for unseen speakers and content. Achieving robustness in such situations requires ASR systems to address speaker-independent and vocabulary-mismatched scenarios while minimizing user adaptation effort. This study focuses on comprehensive training strategies and methods to tackle these challenges, leveraging the transformer-based Wav2Vec2.0 model. Unlike prior research, which often focuses on limited datasets, we systematically explore training data selection strategies across diverse source types (languages, canonical vs. dysarthric, and generic vs. in-domain) in a speaker-independent setting. For the under-explored vocabulary-mismatched scenarios, we evaluate conventional methods, identify their limitations, and propose a solution that uses phonological features as intermediate representations for phone recognition. Experimental results demonstrate that this approach enhances recognition across dysarthric datasets in both speaker-independent and vocabulary-mismatched settings. By integrating advanced transfer learning techniques with the innovative use of phonological features, this study addresses key challenges for dysarthric speech recognition, setting a new benchmark for robustness and adaptability in the field.
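The abstract's central idea, inserting phonological features as an intermediate layer between the acoustic encoder and the phone predictions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature inventory size, phone set size, and the two linear heads are assumptions, and only the encoder-to-features-to-phones structure reflects the abstract.

```python
# Minimal sketch (PyTorch + Hugging Face transformers): predict frame-wise
# phonological features from Wav2Vec2.0 embeddings, then decode phones from
# those features. Inventory sizes and head shapes are illustrative guesses.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

N_PHONOLOGICAL_FEATURES = 24   # assumed size of the feature inventory
N_PHONES = 40                  # assumed phone set size

class PhonologicalPhoneRecognizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        hidden = self.encoder.config.hidden_size
        # Intermediate representation: per-frame phonological feature posteriors.
        self.feature_head = nn.Linear(hidden, N_PHONOLOGICAL_FEATURES)
        # Phones are decoded from the feature vectors rather than directly
        # from the acoustic embedding.
        self.phone_head = nn.Linear(N_PHONOLOGICAL_FEATURES, N_PHONES)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        frames = self.encoder(waveform).last_hidden_state    # (B, T, H)
        features = torch.sigmoid(self.feature_head(frames))  # (B, T, F)
        return self.phone_head(features)                     # phone logits

model = PhonologicalPhoneRecognizer()
logits = model(torch.randn(1, 16000))  # one second of 16 kHz audio
print(logits.shape)                    # (batch, frames, N_PHONES)
```

For the phone-recognition task the abstract describes, a CTC loss over the phone logits would be the natural training objective; the phonological bottleneck is plausibly what supports transfer to mismatched vocabulary, since unseen phone sequences still decompose into features seen during training.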

17 pages, 3114 KiB  
Article
Real-Time Communication Aid System for Korean Dysarthric Speech
by Kwanghyun Park and Jungpyo Hong
Appl. Sci. 2025, 15(3), 1416; https://doi.org/10.3390/app15031416 - 30 Jan 2025
Viewed by 1331
Abstract
Dysarthria is a speech disorder characterized by difficulties in articulation and vocalization due to impaired control of the articulatory system. Around 30% of individuals with speech disorders have dysarthria and face significant communication challenges. Existing assistive tools for dysarthria either require additional manipulation or provide only word-level speech support, limiting their usefulness for effective communication in real-world situations. This paper therefore proposes a real-time communication aid system that converts sentence-level Korean dysarthric speech to non-dysarthric normal speech. The proposed system consists of two main components arranged in a cascade. First, a Korean automatic speech recognition (ASR) model is trained on dysarthric utterances using a conformer-based architecture and the graph transducer network–connectionist temporal classification algorithm, significantly improving recognition performance over previous models. Second, a Korean text-to-speech (TTS) model based on JETS (Jointly Training FastSpeech2 and HiFi-GAN for end-to-end Text-to-Speech) is pipelined to synthesize high-quality non-dysarthric normal speech. These models are integrated into a single system on an app server, which receives 5–10 s of dysarthric speech and returns the converted normal speech within 2–3 s, providing a practical communication aid for people with dysarthria.
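The cascade the abstract describes (dysarthric speech to ASR transcript to TTS resynthesis) can be sketched structurally as below. The two model classes are stand-in stubs, not the paper's conformer ASR or JETS TTS; only the cascading data flow reflects the abstract.

```python
# Structural sketch of the cascade: dysarthric speech -> ASR transcript
# -> TTS resynthesis as non-dysarthric speech. Model internals are stubs.
from dataclasses import dataclass

@dataclass
class AudioClip:
    samples: list       # raw waveform samples
    sample_rate: int    # e.g., 16000 Hz

class DysarthricASR:
    """Stub standing in for the conformer-based Korean ASR model."""
    def transcribe(self, clip: AudioClip) -> str:
        return "recognized sentence"  # placeholder transcript

class NormalSpeechTTS:
    """Stub standing in for the JETS-based Korean TTS model."""
    def synthesize(self, text: str) -> AudioClip:
        return AudioClip(samples=[0.0] * 16000, sample_rate=16000)

def convert(clip: AudioClip, asr: DysarthricASR, tts: NormalSpeechTTS) -> AudioClip:
    # Cascade: the TTS consumes only the ASR transcript, so each stage
    # can be trained, evaluated, and replaced independently.
    return tts.synthesize(asr.transcribe(clip))

out = convert(AudioClip([0.0] * 80000, 16000), DysarthricASR(), NormalSpeechTTS())
print(len(out.samples) / out.sample_rate, "s of synthesized speech")
```

The design choice worth noting is the decoupling: because the interface between the stages is plain text, the app server can retrain or swap either model without touching the other.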

23 pages, 3932 KiB  
Review
A Survey of Automatic Speech Recognition for Dysarthric Speech
by Zhaopeng Qian and Kejing Xiao
Electronics 2023, 12(20), 4278; https://doi.org/10.3390/electronics12204278 - 16 Oct 2023
Cited by 15 | Viewed by 5898
Abstract
Dysarthric speech has several pathological characteristics that distinguish it from healthy speech, such as discontinuous pronunciation, uncontrolled volume, slow speech, explosive pronunciation, improper pauses, excessive nasality, and air-flow noise during pronunciation. Automatic speech recognition (ASR) can be very helpful for speakers with dysarthria. Our research provides a scoping review of ASR for dysarthric speech, covering papers in this field from 1990 to 2022. Our survey found that research on the acoustic features and the acoustic models of dysarthric speech has developed nearly in parallel. During the 2010s, deep learning technologies were widely applied to improve the performance of ASR systems. In the deep learning era, advanced methods such as convolutional neural networks, deep neural networks, and recurrent neural networks have been applied to design acoustic models as well as lexical and language models for dysarthric speech recognition, and deep learning methods have also been used to extract acoustic features from dysarthric speech. Additionally, this scoping review found that speaker dependence seriously limits the generalizability of acoustic models, and the available dysarthric speech data remain far too scarce to train models at the scale that modern data-driven methods require.
(This article belongs to the Special Issue Advanced Natural Language Processing Technology and Applications)

16 pages, 8210 KiB  
Article
A Speech Command Control-Based Recognition System for Dysarthric Patients Based on Deep Learning Technology
by Yu-Yi Lin, Wei-Zhong Zheng, Wei Chung Chu, Ji-Yan Han, Ying-Hsiu Hung, Guan-Min Ho, Chia-Yuan Chang and Ying-Hui Lai
Appl. Sci. 2021, 11(6), 2477; https://doi.org/10.3390/app11062477 - 10 Mar 2021
Cited by 29 | Viewed by 4652
Abstract
Voice control is an important way of controlling mobile devices; however, using it remains a challenge for dysarthric patients. Many approaches, such as automatic speech recognition (ASR) systems, are currently used to help dysarthric patients control mobile devices, but the large computational power required by ASR systems increases implementation costs. To alleviate this problem, this study proposed a convolutional neural network (CNN) with phonetic posteriorgram (PPG) speech features, called CNN–PPG, to recognize speech commands; a CNN model with Mel-frequency cepstral coefficients (CNN–MFCC) and an ASR-based system were used for comparison. The experimental results show that the CNN–PPG system achieved 93.49% accuracy, better than the CNN–MFCC (65.67%) and ASR-based (89.59%) systems. Additionally, the CNN–PPG model was smaller, with only 54% as many parameters as the ASR-based system; hence, the proposed system could reduce implementation costs for users. These findings suggest that the CNN–PPG system could augment a communication device to help dysarthric patients control mobile devices via speech commands in the future.
(This article belongs to the Special Issue Machine Learning and Signal Processing for IOT Applications)
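A minimal sketch of a CNN command classifier over PPG inputs, in the spirit of the CNN–PPG system described above. The PPG extractor itself (an acoustic model emitting frame-wise phone posteriors) is assumed to exist upstream, and all layer sizes, the phone count, and the command vocabulary size are illustrative assumptions, not the paper's configuration.

```python
# Sketch of a small CNN that classifies a phonetic posteriorgram (PPG)
# into a fixed command vocabulary. All dimensions are assumed values.
import torch
import torch.nn as nn

N_PHONES = 40      # assumed PPG dimensionality (phone posteriors per frame)
N_COMMANDS = 19    # assumed command vocabulary size

class PPGCommandCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),   # pool to a fixed grid
        )
        self.classifier = nn.Linear(32 * 4 * 4, N_COMMANDS)

    def forward(self, ppg: torch.Tensor) -> torch.Tensor:
        # ppg: (batch, frames, N_PHONES); add a channel axis for Conv2d
        x = self.conv(ppg.unsqueeze(1))
        return self.classifier(x.flatten(1))

model = PPGCommandCNN()
logits = model(torch.rand(2, 120, N_PHONES))  # two 120-frame PPGs
print(logits.shape)                           # torch.Size([2, 19])
```

The adaptive pooling layer makes the classifier independent of utterance length, which matters for spoken commands of varying duration; the real system's topology may of course differ.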