Search Results (19)

Search Parameters:
Keywords = transcription speech intelligibility

15 pages, 856 KiB  
Article
Automated Assessment of Word- and Sentence-Level Speech Intelligibility in Developmental Motor Speech Disorders: A Cross-Linguistic Investigation
by Micalle Carl and Michal Icht
Diagnostics 2025, 15(15), 1892; https://doi.org/10.3390/diagnostics15151892 - 28 Jul 2025
Viewed by 174
Abstract
Background/Objectives: Accurate assessment of speech intelligibility is necessary for individuals with motor speech disorders. Transcription or scaled rating methods by naïve listeners are the most reliable tasks for these purposes; however, they are often resource-intensive and time-consuming within clinical contexts. Automatic speech recognition (ASR) systems, which transcribe speech into text, have been increasingly utilized for assessing speech intelligibility. This study investigates the feasibility of using an open-source ASR system to assess speech intelligibility in Hebrew and English speakers with Down syndrome (DS). Methods: Recordings from 65 Hebrew- and English-speaking participants were included: 33 speakers with DS and 32 typically developing (TD) peers. Speech samples (words, sentences) were transcribed using Whisper (OpenAI) and by naïve listeners. The proportion of agreement between ASR transcriptions and those of naïve listeners was compared across speaker groups (TD, DS) and languages (Hebrew, English) for word-level data. Further comparisons for Hebrew speakers were conducted across speaker groups and stimuli (words, sentences). Results: The strength of the correlation between listener and ASR transcription scores varied across languages, and was higher for English (r = 0.98) than for Hebrew (r = 0.81) for speakers with DS. A higher proportion of listener–ASR agreement was demonstrated for TD speakers, as compared to those with DS (0.94 vs. 0.74, respectively), and for English, in comparison to Hebrew speakers (0.91 for English DS speakers vs. 0.74 for Hebrew DS speakers). Listener–ASR agreement for single words was consistently higher than for sentences among Hebrew speakers. Speakers’ intelligibility influenced word-level agreement among Hebrew- but not English-speaking participants with DS. Conclusions: ASR performance for English closely approximated that of naïve listeners, suggesting potential near-future clinical applicability within single-word intelligibility assessment. In contrast, a lower proportion of agreement between human listeners and ASR for Hebrew speech indicates that broader clinical implementation may require further training of ASR models in this language. Full article
(This article belongs to the Special Issue Evaluation and Management of Developmental Disabilities)
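
The word-level listener–ASR agreement described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ pipeline: the Whisper model size, the file names, the listener transcriptions, and the exact-match scoring rule are all assumptions.

```python
# A minimal sketch: transcribe single-word recordings with Whisper and
# compare the ASR output against a naive listener's transcription.
# Model size, file paths, and exact-match scoring are illustrative assumptions.
import string

import whisper  # pip install openai-whisper

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so 'Ball.' and 'ball' count as a match."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()

def word_agreement(asr_words: list[str], listener_words: list[str]) -> float:
    """Proportion of single-word items where ASR and listener transcriptions agree."""
    matches = sum(normalize(a) == normalize(b) for a, b in zip(asr_words, listener_words))
    return matches / len(asr_words)

model = whisper.load_model("small")  # hypothetical choice of model size

# Hypothetical stimulus list: one recording and one listener transcription per word.
recordings = ["speaker01_word01.wav", "speaker01_word02.wav"]
listener_transcripts = ["ball", "window"]

asr_transcripts = [model.transcribe(path)["text"] for path in recordings]
print(f"listener-ASR agreement: {word_agreement(asr_transcripts, listener_transcripts):.2f}")
```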

17 pages, 1351 KiB  
Article
Automated Speech Intelligibility Assessment Using AI-Based Transcription in Children with Cochlear Implants, Hearing Aids, and Normal Hearing
by Vicky W. Zhang, Arun Sebastian and Jessica J. M. Monaghan
J. Clin. Med. 2025, 14(15), 5280; https://doi.org/10.3390/jcm14155280 - 25 Jul 2025
Viewed by 279
Abstract
Background/Objectives: Speech intelligibility (SI) is a key indicator of spoken language development, especially for children with hearing loss, as it directly impacts communication and social engagement. However, due to logistical and methodological challenges, SI assessment is often underutilised in clinical practice. This study aimed to evaluate the accuracy and consistency of an artificial intelligence (AI)-based transcription model in assessing SI in young children with cochlear implants (CI), hearing aids (HA), or normal hearing (NH), in comparison to naïve human listeners. Methods: A total of 580 speech samples from 58 five-year-old children were transcribed by three naïve listeners and the AI model. Word-level transcription accuracy was evaluated using Bland–Altman plots, intraclass correlation coefficients (ICCs), and word error rate (WER) metrics. Performance was compared across the CI, HA, and NH groups. Results: The AI model demonstrated high consistency with naïve listeners across all groups. Bland–Altman analyses revealed minimal bias, with fewer than 6% of sentences falling outside the 95% limits of agreement. ICC values exceeded 0.9 in all groups, with particularly strong agreement in the NH and CI groups (ICCs > 0.95). WER results further confirmed this alignment and indicated that children with CIs showed better SI performance than those using HAs. Conclusions: The AI-based method offers a reliable and objective solution for SI assessment in young children. Its agreement with human performance supports its integration into clinical and home environments for early intervention and ongoing monitoring of speech development in children with hearing loss. Full article
(This article belongs to the Special Issue The Challenges and Prospects in Cochlear Implantation)
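
Word error rate (WER), one of the metrics reported above, is the word-level edit distance between a reference transcription and a hypothesis, normalised by the reference length. A generic implementation (not the authors’ code) is shown below.

```python
# A generic word error rate (WER) implementation: edit distance over words,
# normalised by the length of the reference transcription.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)

# Example: one deletion and one substitution against a six-word reference -> 2 / 6 ~ 0.33.
print(wer("the boy is riding a bike", "the boy riding a trike"))
```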

15 pages, 1359 KiB  
Article
Phoneme-Aware Hierarchical Augmentation and Semantic-Aware SpecAugment for Low-Resource Cantonese Speech Recognition
by Lusheng Zhang, Shie Wu and Zhongxun Wang
Sensors 2025, 25(14), 4288; https://doi.org/10.3390/s25144288 - 9 Jul 2025
Viewed by 441
Abstract
Cantonese Automatic Speech Recognition (ASR) is hindered by tonal complexity, acoustic diversity, and a lack of labelled data. This study proposes a phoneme-aware hierarchical augmentation framework that enhances performance without additional annotation. A Phoneme Substitution Matrix (PSM), built from Montreal Forced Aligner alignments and Tacotron-2 synthesis, injects adversarial phoneme variants into both transcripts and their aligned audio segments, enlarging pronunciation diversity. Concurrently, a semantic-aware SpecAugment scheme exploits wav2vec 2.0 attention heat maps and keyword boundaries to adaptively mask informative time–frequency regions; a reinforcement-learning controller tunes the masking schedule online, forcing the model to rely on a wider context. On the Common Voice Cantonese 50 h subset, the combined strategy reduces the character error rate (CER) from 26.17% to 16.88% with wav2vec 2.0 and from 38.83% to 23.55% with Zipformer. At 100 h, the CER further drops to 4.27% and 2.32%, yielding relative gains of 32–44%. Ablation studies confirm that phoneme-level and masking components provide complementary benefits. The framework offers a practical, model-independent path toward accurate ASR for Cantonese and other low-resource tonal languages. This paper presents an intelligent sensing-oriented modeling framework for speech signals, which is suitable for deployment on edge or embedded systems to process input from audio sensors (e.g., microphones) and shows promising potential for voice-interactive terminal applications. Full article
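
The semantic-aware SpecAugment component extends the basic idea of masking time and frequency bands of a spectrogram. The sketch below shows only the plain, randomly placed masking step using NumPy; the attention-guided mask placement and the reinforcement-learning schedule described above are not reproduced.

```python
# Plain SpecAugment-style masking on a (freq_bins, time_steps) spectrogram.
# The paper places masks using wav2vec 2.0 attention and keyword boundaries;
# here mask positions are simply drawn at random for illustration.
import numpy as np

def spec_augment(spec: np.ndarray, freq_mask: int = 8, time_mask: int = 20,
                 rng: np.random.Generator | None = None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape

    # Zero out a random band of frequency bins.
    f = rng.integers(0, freq_mask + 1)
    f0 = rng.integers(0, n_freq - f + 1)
    out[f0:f0 + f, :] = 0.0

    # Zero out a random span of time frames.
    t = rng.integers(0, time_mask + 1)
    t0 = rng.integers(0, n_time - t + 1)
    out[:, t0:t0 + t] = 0.0
    return out

spectrogram = np.random.rand(80, 300)  # e.g. 80 mel bins x 300 frames
augmented = spec_augment(spectrogram, rng=np.random.default_rng(0))
```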

21 pages, 3912 KiB  
Article
Advancing Healthcare: Intelligent Speech Technology for Transcription, Disease Diagnosis, and Interactive Control of Medical Equipment in Smart Hospitals
by Ahmed Elhadad, Safwat Hamad, Noha Elfiky, Fulayjan Alanazi, Ahmed I. Taloba and Rasha M. Abd El-Aziz
AI 2024, 5(4), 2497-2517; https://doi.org/10.3390/ai5040121 - 26 Nov 2024
Cited by 4 | Viewed by 2696
Abstract
Intelligent Speech Technology (IST) is revolutionizing healthcare by enhancing transcription accuracy, disease diagnosis, and medical equipment control in smart hospital environments. This study introduces an innovative approach employing federated learning with Multi-Layer Perceptron (MLP) and Gated Recurrent Unit (GRU) neural networks to improve IST performance. The study leverages the “Medical Speech, Transcription, and Intent” dataset from Kaggle, which comprises a variety of speech recordings and corresponding medical symptom labels; noise reduction was applied using a Wiener filter to improve audio quality. Feature extraction through MLP and sequence classification with GRU highlighted the model’s robustness and capacity for detailed medical understanding. The federated learning framework enabled collaborative model training across multiple hospital sites, preserving patient privacy by avoiding raw data exchange. This distributed approach allowed the model to learn from diverse, real-world data while ensuring compliance with strict data protection standards. Through rigorous five-fold cross-validation, the proposed Fed MLP-GRU model demonstrated an accuracy of 98.6%, with consistently high sensitivity and specificity, highlighting its reliable generalization across multiple test conditions. In real-time applications, the model effectively performed medical transcription, provided symptom-based diagnostic insights, and facilitated hands-free control of healthcare equipment, reducing contamination risks and enhancing workflow efficiency. These findings indicate that IST, powered by federated neural networks, can significantly improve healthcare delivery, accuracy in patient diagnosis, and operational efficiency in clinical settings. This research underscores the transformative potential of federated learning and advanced neural networks for addressing pressing challenges in modern healthcare and setting the stage for future innovations in intelligent medical technology. Full article
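
Federated learning, as used above, keeps raw recordings at each hospital and shares only model parameters. The following sketch shows a FedAvg-style aggregation step with NumPy; the MLP-GRU architecture is not reproduced, and weighting by local sample count is an assumption about the aggregation rule rather than a detail taken from the paper.

```python
# Minimal federated averaging (FedAvg-style) sketch: each hospital trains locally
# and uploads parameter arrays; the server averages them weighted by sample count.
# The actual MLP-GRU model and training loop from the paper are not shown.
import numpy as np

def federated_average(site_weights: list[list[np.ndarray]],
                      site_sizes: list[int]) -> list[np.ndarray]:
    total = sum(site_sizes)
    averaged = []
    for layer_idx in range(len(site_weights[0])):
        # Weighted sum of the same layer across all participating sites.
        layer = sum(w[layer_idx] * (n / total)
                    for w, n in zip(site_weights, site_sizes))
        averaged.append(layer)
    return averaged

# Two hypothetical hospital sites, each holding one weight matrix and one bias vector.
site_a = [np.random.rand(16, 8), np.random.rand(8)]
site_b = [np.random.rand(16, 8), np.random.rand(8)]
global_weights = federated_average([site_a, site_b], site_sizes=[120, 80])
```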

17 pages, 1801 KiB  
Article
Toward Effective Aircraft Call Sign Detection Using Fuzzy String-Matching between ASR and ADS-B Data
by Mohammed Saïd Kasttet, Abdelouahid Lyhyaoui, Douae Zbakh, Adil Aramja and Abderazzek Kachkari
Aerospace 2024, 11(1), 32; https://doi.org/10.3390/aerospace11010032 - 29 Dec 2023
Cited by 4 | Viewed by 2504
Abstract
Recently, artificial intelligence and data science have witnessed dramatic progress and rapid growth, especially Automatic Speech Recognition (ASR) technology based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs). Consequently, new end-to-end Recurrent Neural Network (RNN) toolkits were developed with higher speed and accuracy that can often achieve a Word Error Rate (WER) below 10%. These toolkits can nowadays be deployed, for instance, within aircraft cockpits and Air Traffic Control (ATC) systems in order to identify aircraft and display recognized voice messages related to flight data, especially for airports not equipped with radar. Hence, the performance of air traffic controllers and pilots can ultimately be improved by reducing workload and stress and enforcing safety standards. Our experiment conducted at Tangier’s International Airport ATC aimed to build an ASR model that is able to recognize aircraft call signs in a fast and accurate way. The acoustic and linguistic models were trained on the Ibn Battouta Speech Corpus (IBSC), resulting in an unprecedented speech dataset with approved transcription that includes real weather aerodrome observation data and flight information with a call sign captured by an ADS-B receiver. All of these data were synchronized with voice recordings in a structured format. We calculated the WER to evaluate the model’s accuracy and compared different methods of dataset training for model building and adaptation. Despite the high interference in the VHF radio communication channel and fast-speaking conditions that increased the WER level to 20%, our standalone and low-cost ASR system with a trained RNN model, supported by the Deep Speech toolkit, was able to achieve call sign detection rate scores up to 96% in air traffic controller messages and 90% in pilot messages while displaying related flight information from ADS-B data using the Fuzzy string-matching algorithm. Full article
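
Matching an ASR hypothesis against the call signs currently visible in ADS-B data can be illustrated with any fuzzy string similarity. The sketch below uses Python’s standard-library difflib as a stand-in for the paper’s fuzzy string-matching algorithm; the call signs and the similarity threshold are invented for illustration.

```python
# Fuzzy matching of an ASR-recognised call sign against ADS-B call signs.
# difflib stands in for the paper's fuzzy string-matching algorithm;
# the call sign list and threshold are illustrative assumptions.
from difflib import SequenceMatcher

def best_callsign(asr_text: str, adsb_callsigns: list[str],
                  threshold: float = 0.6) -> str | None:
    """Return the ADS-B call sign most similar to the ASR output, if close enough."""
    scored = [(SequenceMatcher(None, asr_text.upper(), cs.upper()).ratio(), cs)
              for cs in adsb_callsigns]
    score, callsign = max(scored)
    return callsign if score >= threshold else None

adsb = ["RAM1234", "AFR905", "BAW62M"]   # call signs captured by the ADS-B receiver
# ASR output, assumed already normalised from spoken digits, with one digit swap.
print(best_callsign("RAM1243", adsb))    # -> RAM1234
```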

33 pages, 2312 KiB  
Article
Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding
by Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Driss Khalil, Srikanth Madikeri, Allan Tart, Igor Szoke, Vincent Lenders, Mickael Rigault and Khalid Choukri
Aerospace 2023, 10(10), 898; https://doi.org/10.3390/aerospace10100898 - 20 Oct 2023
Cited by 11 | Viewed by 7711
Abstract
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). The handling of these voice communications requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts aim at integrating artificial intelligence (AI) into ATC communications in order to lessen ATCos’ workload. However, the development of data-driven AI systems for understanding spoken ATC communications demands large-scale annotated datasets, which are currently lacking in the field. This paper explores the lessons learned from the ATCO2 project, which aimed to develop a unique platform to collect, preprocess, and transcribe large amounts of ATC audio data from airspace in real time. This paper reviews (i) robust automatic speech recognition (ASR), (ii) natural language processing, (iii) English language identification, and (iv) contextual ASR biasing with surveillance data. The pipeline developed during the ATCO2 project, along with the open-sourcing of its data, encourages research in the ATC field, while the full corpus can be purchased through ELDA. The ATCO2 corpora are suitable for developing ASR systems when little or almost no transcribed ATC audio data are available. For instance, the proposed ASR system trained with ATCO2 reaches a WER as low as 17.9% on public ATC datasets, 6.6% absolute WER better than a system trained on “out-of-domain” data with gold transcriptions. Finally, the release of 5000 h of ASR-transcribed speech, covering more than 10 airports worldwide, is a step forward towards more robust automatic speech understanding systems for ATC communications. Full article

14 pages, 2523 KiB  
Article
AI Enhancements for Linguistic E-Learning System
by Jueting Liu, Sicheng Li, Chang Ren, Yibo Lyu, Tingting Xu, Zehua Wang and Wei Chen
Appl. Sci. 2023, 13(19), 10758; https://doi.org/10.3390/app131910758 - 27 Sep 2023
Cited by 3 | Viewed by 3143
Abstract
E-learning systems have developed considerably since the COVID-19 pandemic. In our previous work, we developed a linguistic interactive E-learning system for phonetic transcription learning. In this paper, we propose three artificial-intelligence-based enhancements to this system from different aspects. Compared with the original system, the first enhancement is a disordered speech classification module; this module is driven by the MFCC-CNN model, which aims to distinguish disordered speech from nondisordered speech. The accuracy of the classification is about 83%. The second enhancement is a grapheme-to-phoneme converter. This converter is based on the transformer model and is designed for teachers to better generate IPA words from regular written text. Compared with other G2P models, our transformer-based G2P model provides outstanding PER and WER performance. The last part of this paper focuses on a Tacotron2-based IPA-to-speech synthesis system; this deep learning-based TTS system can help teachers generate high-quality speech sounds from IPA characters, which significantly improves the functionality of our original system. All three of these enhancements are related to the phonetic transcription process, and this work not only provides a better experience for the users of this system but also explores the utilization of artificial intelligence technologies in the E-learning and linguistic fields. Full article

19 pages, 3742 KiB  
Article
Analysis of Backchannel Inviting Cues in Dyadic Speech Communication
by Stanislav Ondáš, Eva Kiktová, Matúš Pleva and Jozef Juhár
Electronics 2023, 12(17), 3705; https://doi.org/10.3390/electronics12173705 - 1 Sep 2023
Cited by 1 | Viewed by 2387
Abstract
The paper aims to study speaker and listener behavior in dyadic speech communication. A multimodal (speech and video) corpus of dyadic face-to-face conversations on various topics was created. The corpus was manually labeled on several layers (text transcription, backchannel modality and function, POS tags, prosody, and gaze). The statistical analysis was done on the proposed corpus. We focused on backchannel inviting cues on the speaker side and backchannels on the listener side and their patterns. We aimed to study interlocutor backchannel behavior and backchannel-related signals. The results of the analysis show similar patterns in the case of backchannel inviting cues between Slovak and English data and highlight the importance of gaze direction in a face-to-face speech communication scenario. The described corpus and results of the analysis are one of the first steps leading towards natural artificial intelligence-driven human–computer speech conversation. Full article
(This article belongs to the Special Issue Human Computer Interaction in Intelligent System)

18 pages, 603 KiB  
Article
The Influence of Stimulus Composition and Scoring Method on Objective Listener Assessments of Tracheoesophageal Speech Accuracy
by Philip C. Doyle, Natasha Goncharenko and Jeff Searl
Appl. Sci. 2023, 13(17), 9701; https://doi.org/10.3390/app13179701 - 28 Aug 2023
Viewed by 1072
Abstract
Introduction: This study investigated the influence of stimulus composition for three speech intelligibility word lists and two scoring methods on the speech accuracy judgments of five tracheoesophageal (TE) speakers. This was achieved through phonemic comparisons across TE speakers’ productions of stimuli from the three intelligibility word lists, including the (1) Consonant Rhyme Test, (2) Northwestern Intelligibility Test, and (3) the Weiss and Basili list. Methodology: Fifteen normal-hearing young adults served as listeners; all listeners were trained in phonetic transcription (IPA), but none had previous exposure to any mode of postlaryngectomy alaryngeal speech. Speaker stimuli were presented to all listeners through headphones, and all stimuli were transcribed phonetically using an open-set response paradigm. Data were analyzed for individual speakers by stimulus list. Phonemic scoring was compared to a whole-word scoring method, and the types of errors observed were quantified by word list. Results: Individual speaker variability was noted, and its effect on the assessment of speech accuracy was identified. The phonemic scoring method was found to be a more sensitive measure of TE speech accuracy. The W&B list yielded the lowest accuracy scores of the three lists. This finding may indicate its increased sensitivity and potential clinical value. Conclusions: Overall, this study supports the use of open-set, phonemic scoring methods when evaluating TE speaker intelligibility. Future research should aim to assess the specificity of assessment tools on a larger sample of TE speakers who vary in their speech proficiency. Full article
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
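
The contrast between whole-word and phonemic scoring examined above can be illustrated by scoring each transcribed item either as a single correct/incorrect decision or as the proportion of target phonemes recovered. The sketch below uses invented IPA targets and one simple convention for counting recovered phonemes; it is not the scoring protocol from the study.

```python
# Whole-word vs phonemic scoring of listener transcriptions against IPA targets.
# Targets and responses are invented; the phoneme-counting convention is one
# simple possibility, not the study's protocol.
from difflib import SequenceMatcher

def whole_word_score(target: str, response: str) -> float:
    """1 if the whole word was transcribed exactly, else 0."""
    return 1.0 if target == response else 0.0

def phonemic_score(target: str, response: str) -> float:
    """Proportion of target phonemes that appear, in order, in the response."""
    matched = sum(block.size
                  for block in SequenceMatcher(None, target, response).get_matching_blocks())
    return matched / len(target)

pairs = [("kæt", "kæt"), ("pɪg", "bɪg"), ("splæʃ", "plæʃ")]  # (target, listener transcription)
for target, response in pairs:
    print(target, response,
          f"word={whole_word_score(target, response):.0f}",
          f"phoneme={phonemic_score(target, response):.2f}")
```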

20 pages, 18911 KiB  
Article
Chiroscript: Transcription System for Studying Hand Gestures in Early Modern Painting
by Temenuzhka Dimova
Arts 2023, 12(4), 179; https://doi.org/10.3390/arts12040179 - 21 Aug 2023
Cited by 2 | Viewed by 2637
Abstract
The main goal of this article is to introduce a new method for the analysis of depicted gestures in painting, namely a transcription system called chiroscript. Based on the model of transcription and annotation systems used in linguistics of co-speech gestures and sign languages, it is intended to provide a more systematic and objective study of pictorial gestures, revealing their modes of combination inside chirographic accords. The place of chirograms (depicted hand gestures) within pictorial semiotics will be briefly discussed in order to better explain why a transcription system is very much needed and how it could expand art historical perspectives. Pictorial gestures form an understudied language-like system which has the potential to increase the intelligibility of paintings. We argue that even though transcription is not a common practice in art history, it may contribute and even transform semiotic analyses of figurative paintings. Full article
(This article belongs to the Special Issue Studies on Semiotics of Art)

8 pages, 2842 KiB  
Article
A Novel Intelligent Rebound Hammer System Based on Internet of Things
by Zongqiang Pang, Qing Wang, Yong Wang and Zhiyin Gong
Micromachines 2023, 14(1), 148; https://doi.org/10.3390/mi14010148 - 6 Jan 2023
Cited by 2 | Viewed by 2370
Abstract
To improve the efficiency of concrete strength testing and ensure the reliability of measured data, we present a novel intelligent rebound hammer system based on the Internet of Things (IoT) and speech recognition technology. The system uses an STM32F103C8T6 microcontroller as the Main Control Unit (MCU) and a BC26 module as the communication unit, combined with an LD3320 voice recognition module and a TOF050H laser ranging sensor to provide phonetic transcription and laser ranging. Without the need for traditional multi-person collaboration and burdensome data transfer, the system can collect rebound values and location information and send them automatically, in real time, to the remote cloud information management system. The test results show that the system offers high measuring accuracy, good data transmission stability, and convenient operation, and it could provide guidance for the design of other types of non-destructive testing equipment. Full article

19 pages, 4838 KiB  
Article
ODIN IVR-Interactive Solution for Emergency Calls Handling
by Bogdan-Costel Mocanu, Ion-Dorinel Filip, Remus-Dan Ungureanu, Catalin Negru, Mihai Dascalu, Stefan-Adrian Toma, Titus-Constantin Balan, Ion Bica and Florin Pop
Appl. Sci. 2022, 12(21), 10844; https://doi.org/10.3390/app122110844 - 26 Oct 2022
Cited by 14 | Viewed by 3667
Abstract
Human interaction in natural language with computer systems has been a prime focus of research, and the field of conversational agents (including chatbots and Interactive Voice Response (IVR) systems) has evolved significantly since 2009, with a major boost in 2016, especially for industrial solutions. Emergency systems are crucial elements of today’s societies that can benefit from the advantages of intelligent human–computer interaction systems. In this paper, we present two solutions for human-to-computer emergency systems with critical deadlines that use a multi-layer FreeSwitch IVR solution and the Botpress chatbot platform. We are the first in Romania to design and implement such a solution, which we evaluated in terms of performance and resource management with respect to Quality of Service (QoS). Additionally, we assessed our Proof of Concept (PoC) with real data as part of the system for real-time Romanian transcription of speech and recognition of emotional states within emergency calls. Based on our feasibility research, we concluded that the telephony IVR best fits the requirements and specifications of the national 112 system, with the presented PoC ready to be integrated into the Romanian emergency system. Full article

14 pages, 377 KiB  
Article
The Influence of Social Information on Speech Intelligibility within the Spanish Heritage Community
by Cecelia Staggs, Melissa Baese-Berk and Charlie Nagle
Languages 2022, 7(3), 231; https://doi.org/10.3390/languages7030231 - 7 Sep 2022
Cited by 2 | Viewed by 3066
Abstract
Previous research in speech perception has shown that perception is influenced by social factors that can result in behavioral consequences such as reduced intelligibility (i.e., a listener’s ability to transcribe the speech they hear). However, little is known about these effects regarding Spanish speakers’ perception of heritage Spanish, that is, Spanish spoken by individuals who have an ancestral and cultural connection to the Spanish language. Given that ideologies within the U.S. Latino community often equate Latino identity with speaking Spanish “correctly” and proficiently, there is a clear need to understand the potential influence these ideologies have on speech perception. Using a matched-guise methodology, we analyzed the influence of speaker social background information and listener social background information on speech perception. Participants completed a transcription task in which four different Spanish heritage speakers were paired with different social guises to determine if the speakers were perceived as equally intelligible under each guise condition. The results showed that social guise and listener social variables did not significantly predict intelligibility scores. We argue that the unique socio-political culture within the U.S. Latino community may lead to different effects of language ideology and social expectation on speech perception than what has been documented in previous work. Full article
17 pages, 1933 KiB  
Article
The Reliability and Validity of Speech-Language Pathologists’ Estimations of Intelligibility in Dysarthria
by Micah E. Hirsch, Austin Thompson, Yunjung Kim and Kaitlin L. Lansford
Brain Sci. 2022, 12(8), 1011; https://doi.org/10.3390/brainsci12081011 - 30 Jul 2022
Cited by 12 | Viewed by 4547
Abstract
This study examined the reliability and validity of speech-language pathologists’ (SLP) estimations of speech intelligibility in dysarthria, including a visual analog scale (VAS) method and a percent estimation method commonly used in clinical settings. Speech samples from 20 speakers with dysarthria of varying etiologies were used to collect orthographic transcriptions from naïve listeners (n = 70) and VAS ratings and percent estimations of intelligibility from SLPs (n = 21). Intra- and interrater reliability for the two SLP intelligibility measures were evaluated, and the relationship between these measures was assessed. Finally, linear regression was used to evaluate the relationship between the naïve listeners’ orthographic transcription scores and the two SLP intelligibility measures. The results indicated that the intrarater reliability for both SLP intelligibility measures was strong, and the interrater reliability between the SLP ratings was moderate to excellent. A moderate positive relationship between SLPs’ VAS ratings and percent estimations was also observed. Finally, both SLPs’ percent estimations and VAS ratings were predictive of naïve listeners’ orthographic transcription scores, with SLPs’ percent estimations being the strongest predictor. In conclusion, the average SLP percent estimations and VAS ratings are valid and reliable intelligibility measures. However, the validity and reliability of these measures vary between SLPs. Full article
(This article belongs to the Special Issue Profiles of Dysarthria: Clinical Assessment and Treatment)
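
The validity analysis described above regresses naïve-listener transcription scores on the SLPs’ intelligibility estimates. A minimal single-predictor version with NumPy is sketched below; the data values are invented and the study’s actual regression models are not reproduced.

```python
# Simple linear regression of transcription intelligibility scores on SLP percent
# estimations (one predictor). Data values are invented for illustration only.
import numpy as np

slp_percent_estimates = np.array([35.0, 52.0, 60.0, 71.0, 88.0])  # per speaker
transcription_scores = np.array([30.0, 48.0, 63.0, 69.0, 90.0])   # naive-listener % words correct

slope, intercept = np.polyfit(slp_percent_estimates, transcription_scores, deg=1)
predicted = slope * slp_percent_estimates + intercept
r = np.corrcoef(slp_percent_estimates, transcription_scores)[0, 1]

print(f"transcription ~ {slope:.2f} * estimate + {intercept:.2f}, r = {r:.2f}")
```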

14 pages, 955 KiB  
Article
Utility of the Intelligibility in Context Scale for Predicting Speech Intelligibility of Children with Cerebral Palsy
by Jennifer U. Soriano, Abby Olivieri and Katherine C. Hustad
Brain Sci. 2021, 11(11), 1540; https://doi.org/10.3390/brainsci11111540 - 20 Nov 2021
Cited by 6 | Viewed by 3862
Abstract
The Intelligibility in Context Scale (ICS) is a widely used, efficient tool for describing a child’s speech intelligibility. Few studies have explored the relationship between ICS scores and transcription intelligibility scores, which are the gold standard for clinical measurement. This study examined how well ICS composite scores predicted transcription intelligibility scores among children with cerebral palsy (CP), how well individual questions from the ICS differentially predicted transcription intelligibility scores, and how well the ICS composite scores differentiated between children with and without speech motor impairment. Parents of 48 children with CP, who were approximately 13 years of age, completed the ICS. Ninety-six adult naïve listeners provided orthographic transcriptions of children’s speech. Transcription intelligibility scores were regressed on ICS composite scores and individual item scores. Dysarthria status was regressed on ICS composite scores. Results indicated that ICS composite scores were moderately strong predictors of transcription intelligibility scores. One individual ICS item differentially predicted transcription intelligibility scores, and dysarthria severity influenced how well ICS composite scores differentiated between children with and without speech motor impairment. Findings suggest that the ICS has potential clinical utility for children with CP, especially when used with other objective measures of speech intelligibility. Full article
(This article belongs to the Special Issue Profiles of Dysarthria: Clinical Assessment and Treatment)
