MDPI - Publisher of Open Access Journals

22 pages, 318 KB

Open Access Article

Teacher and Speech-Language Therapist Perceptions of Classroom Listening in Innovative Learning Environment Classrooms

by Vanessa Yeardley, Nuzhat Sultana and Suzanne C. Purdy

Educ. Sci. 2026, 16(6), 949; https://doi.org/10.3390/educsci16060949 (registering DOI) - 16 Jun 2026

Viewed by 196

The shift towards collaborative teaching practices and the development of Innovative Learning Environments (ILEs) have brought significant changes to educational settings, particularly regarding acoustic demands and classroom listening. The experiences of teachers and other professionals working within ILEs are under-researched. Therefore, the aim of this study was to explore how educators and professionals perceive and manage classroom listening and learning within these spaces. Using a qualitative approach, semi-structured interviews were conducted with eight primary school teachers and eight speech-language therapists working in various school ILEs. Transcripts were analysed using inductive thematic analysis. Three major themes were identified, namely experiences, creating order, and opportunities, with each theme encompassing two subthemes. Experiences included difficulties and positive experiences; creating order comprised teaching approaches and making it work; and opportunities involved collaboration and new roles. Key factors influencing the effectiveness of ILEs included collaboration, strategic resource use, use of scaffolding techniques, flexibility, and student engagement. The study highlighted the importance of mentorship for beginner teachers in order to foster a dynamic and supportive learning environment. Understanding these elements can help educators and professionals working together in ILE classrooms to shape their practices to enhance student outcomes. Full article

(This article belongs to the Special Issue Barriers to Learning and Participation in Educational Settings: Lights and Shadows Towards Inclusive Education)

17 pages, 3686 KB

Open Access Article

Aspects of Use of the Modern Lesbian Dialect in the Linguistic Landscape of Mytilene

by Costas Canakis and Irene Kouniarelli

Languages 2026, 11(6), 122; https://doi.org/10.3390/languages11060122 - 12 Jun 2026

Viewed by 869

Abstract

We focus on the use of the Modern Lesbian dialect in the linguistic landscape (LL), highlighting its diverse forms and functions. Since LL research primarily investigates written language in public space, emphasizing the dynamic relationship between language, place, and historicity, the growing visibility of the dialect in both physical and digital contexts (cf. the online–offline nexus) is particularly noteworthy. The presence of non-standard varieties in public discourse has been widely studied, revealing that aspects of language choice and use are related to the sustainability of minority languages, the shaping of linguistic attitudes and stereotypes, and the commodification of language as a cultural and economic resource. Within this framework, the data analyzed here illustrate positive attitudes toward Modern Lesbian, expressions of pride and comfort among its speakers, efforts to destigmatize dialectal speech, and indications of broader acceptance of Modern Lesbian. Meanwhile, the increasing commodification of the dialect is evident in its use for the promotion of products and services, capitalizing on its distinctiveness, despite its historical stigmatization vis-à-vis the standard. This development does not dissolve entrenched beliefs on the incompatibility of dialects with written discourse; rather, it capitalizes on the surprise (and humor) generated by their written presence in promotional contexts without resorting to humorous stereotyping. Full article

(This article belongs to the Special Issue The Modern Dialect of Lesbos: Selected Topics)

18 pages, 1777 KB

Open Access Article

DeepFakeX: A Comprehensive Multimodal Deepfake Dataset for Research and Analysis

by Sonia Salman, Jawwad Ahmed Shamsi and Rizwan Qureshi

Data 2026, 11(6), 141; https://doi.org/10.3390/data11060141 - 11 Jun 2026

Viewed by 489

Abstract

The expanding capabilities of deep learning-based media synthesis have intensified concerns regarding the authenticity of digital content and the reliability of forensic analysis tools. In response to these challenges, this work introduces DeepFakeX, a collection of 800 synthetically generated videos available under controlled access for research purposes. The dataset encompasses four distinct categories of AI-driven synthesis: facial identity replacement, audio track substitution, neural voice cloning, and combined audiovisual alteration. Unlike existing deepfake datasets that predominantly focus on facial synthesis, DeepFakeX covers a broader range of manipulation modalities, reflecting the diversity of synthetic media encountered in real-world settings. All deepfakes were generated using state-of-the-art, publicly available tools. Standardized post-processing procedures were applied to each video to ensure uniformity in terms of quality, duration and encoding format. DeepFakeX also emphasizes diversity in gender, age, ethnicity, and language. Video contexts span speeches, informational videos, movie clips, news broadcasts, and interviews that reflect content scenarios commonly encountered in real-world online environments. The dataset includes videos in both English and Urdu. The dataset’s quality and structural variability were assessed through visual and audio analyses using the Structural Similarity Index Measure (SSIM), Mel-Frequency Cepstral Coefficients (MFCCs), and Principal Component Analysis (PCA). The evaluation results revealed substantial variability within each manipulation category, along with clearly distinguishable patterns specific to each modality. DeepFakeX has been developed to facilitate rigorous and transparent research in deepfake detection, cross-modal forensic analysis, and AI-driven media forensics. It is hosted on Zenodo under controlled access for research use. Full article

25 pages, 478 KB

Open Access Article

A CEFR-Graded Lexicon and Morphology-Aware Benchmarks for Kazakh Lexical Complexity Prediction

by Gulnur Yerkebulan, Akerke Akanova, Zhantore Galymzhan and Nazira Ospanova

Technologies 2026, 14(6), 346; https://doi.org/10.3390/technologies14060346 - 9 Jun 2026

Viewed by 246

Abstract

Graded lexical resources aligned with the Common European Framework of Reference for Languages (CEFR) and lexical complexity prediction remain limited for low-resource Turkic languages, and the extent to which existing predictive models generalize to agglutinative morphology is unresolved. We introduce the first CEFR-graded lexicon for Kazakh, containing 4561 lemma–part-of-speech (POS) entries across A1–C1, and use it to test whether explicit morphology improves lexical complexity prediction. We compare handcrafted morphological features, XLM-RoBERTa contextual embeddings, and fusion models that combine both signal types on held-out CEFR classification. Our best model, a gated fusion of contextual embeddings with morphological features, achieves a macro-averaged F1 score of 0.360 and a mean absolute error of 1.125 on the held-out test set. Morphology provides useful information beyond character-level cues, contextual representations are strong on their own, and combining them yields the best supervised performance for this task. The paper therefore contributes a new CEFR resource for Turkic languages and evidence that morphology-aware modeling is useful for Kazakh lexical difficulty prediction. The results support Sustainable Development Goal 4 (Quality Education) by enabling objective assessment of learning-material complexity and adaptive Kazakh language learning. The derived lexicon and code are publicly available. Full article
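
The two reported metrics can be illustrated with a minimal sketch. It assumes that CEFR levels are mapped to consecutive integers (A1 = 0 through C1 = 4) before the mean absolute error is computed, which the abstract does not state; the function names and the toy labels are hypothetical and are not taken from the paper's code.

```python
# Illustrative only: toy labels, not data from the paper.
LEVELS = ["A1", "A2", "B1", "B2", "C1"]  # level range named in the abstract


def macro_f1(gold, pred, labels=LEVELS):
    """Unweighted mean of per-class F1, so rare levels count as much as common ones."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_absolute_error(gold, pred, labels=LEVELS):
    """Average distance in CEFR steps (assumed integer encoding of the levels)."""
    rank = {label: i for i, label in enumerate(labels)}
    return sum(abs(rank[g] - rank[p]) for g, p in zip(gold, pred)) / len(gold)


gold = ["A1", "A2", "B1", "B2", "C1"]
pred = ["A1", "A2", "B2", "B2", "A1"]
print(macro_f1(gold, pred), mean_absolute_error(gold, pred))
```

Under this encoding, a mean absolute error of 1.125 would correspond to predictions that are off by slightly more than one CEFR level on average.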

18 pages, 10628 KB

Open Access Article

From Speech to Summary in Turkmen: A Parameter-Efficient Neural Pipeline

by Ualsher Tukeyev and Maksim Ocheretin

Appl. Sci. 2026, 16(12), 5734; https://doi.org/10.3390/app16125734 - 6 Jun 2026

Viewed by 260

Abstract

This paper presents the development of a neural model pipeline for automatic speech recognition (ASR) and text summarization in Turkmen, a low-resource language with agglutinative morphology. For the ASR task, the MMS-1b-all model (Meta) was employed with LoRA adaptation and CTC decoding, fine-tuned on the Common Voice corpus (2733 samples). For summarization, the mBART-50-large model was used with Turkmen-specific tokenization and was trained on a news text corpus (10,248 samples). The following results were achieved: WER = 17.59% for ASR (baseline model: 107.33%) and ROUGE-L = 0.4255 for summarization (zero-shot baseline: 0.2294). The scientific contribution is the creation of a parameter-efficient neural pipeline for speech-to-summary for Turkmen. The developed system can be applied to automated meeting transcription and text data processing in the Turkmen language. Full article

(This article belongs to the Section Computing and Artificial Intelligence)
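
A baseline WER above 100% can look like a typo, so a short sketch of the usual definition may help. It assumes the standard word-level formula (substitutions + deletions + insertions, divided by the number of reference words); the function name and the placeholder sentences are hypothetical, and the paper's own scoring script may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Placeholder English tokens, not Turkmen data from the paper.
print(word_error_rate("a b c", "a x c"))      # one substitution in three words
print(word_error_rate("a b", "x y z w e"))    # exceeds 1.0 because of insertions
```

Because insertions are counted against a fixed reference length, a model that outputs many spurious words can score above 100%, which is consistent with the reported 107.33% baseline.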

22 pages, 982 KB

Open Access Article

Context-Oriented Method for Resolving Lexical Ambiguities in Speech Synthesis for a Low-Resource Language

by Elisa Izrailova, Andrey Ronzhin, Salaudin Umarkhadzhiev, Arslanbek Astemirov, Aleksandra Figurek and Zelimkhan Sultanov

Big Data Cogn. Comput. 2026, 10(6), 181; https://doi.org/10.3390/bdcc10060181 - 1 Jun 2026

Viewed by 302

Abstract

Ambiguity resolution is one of the main challenges in text-to-speech conversion. Machine learning methods and artificial neural networks have been successfully applied to this problem in synthesis systems for English, Spanish, and other common languages. For low-resource languages, the available data are insufficient to train artificial neural networks, so heuristic methods that analyze context and select the correct homonym for polysemantic words should be used instead. The purpose of this study is to develop a word sense disambiguation (WSD) method for the low-resource Chechen language and to introduce it into a speech synthesis system. The study presents the developed method and three algorithms for word sense disambiguation: AWEN (based on Euclidean distance), AWA (weighted average), and AWN (weighted normalized distance). A corpus of Chechen texts, CheWSData, was compiled, containing 15,035 manually selected sentences derived from 5 million annotated words and reflecting the natural frequency of polysemy across grammatical categories. Experimental results show that the proposed AWN method achieves the best performance, with an F1-score of 0.78 and an accuracy of 0.80, outperforming AWA (F1: 0.74) and AWEN (F1: 0.40). For specific parts of speech, AWN reaches F1-scores of 0.82 for nouns, 0.83 for verbs, and 0.85 for adverbs. Comparative analysis with existing WSD methods for low-resource languages (Kashmiri, Hausa, Assamese, Urdu, and Vietnamese) demonstrates that AWN is competitive, ranking second after ViConBERT (F1: 0.87) and ahead of XLM-R for Hausa (F1: 0.79). The developed software module for homonym recognition was integrated into the Chechen speech synthesis system, contributing to more natural synthesized speech. Full article

(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)

14 pages, 917 KB

Open Access Article

Early Language Development in Infants and Toddlers with Hemato-Oncological Diseases: Preliminary Outcomes of a Shared Reading Intervention

by Giusy Melcarne, Roberta Maria Incardona, Giulia Marangon, Silvia Sorbara, Alessandra Biffi and Marta Tremolada

Diseases 2026, 14(6), 193; https://doi.org/10.3390/diseases14060193 - 29 May 2026

Viewed by 200

Abstract

Introduction: Children diagnosed with hemato-oncological cancers need intensive medical treatments and prolonged hospitalizations, which are associated with increased risk of impairment across multiple neurodevelopmental domains, particularly when exposure occurs within the first three years of life. Objective: This pilot study aimed to explore early language performance in children under 36 months of age hospitalized in a pediatric hemato-oncology unit and to preliminarily investigate changes over time and potential associations with an early speech–language stimulation intervention based on shared reading. Specifically, the study investigated differences between language comprehension and production, as well as variations in linguistic outcomes according to gender and pathology type (liquid vs. solid). Methods: The study employed a sample of 29 children aged 2 to 36 months (M = 20.76, SD = 9.52). Baseline linguistic assessment was conducted using observational measures across multiple language domains, including lexical development (PinG), morphosyntactic abilities (PCGO, GALS), and articulatory skills (BAMF), together with an evaluation of general cognitive functioning (Griffiths). The intervention consisted of in-person shared reading sessions combined with concurrent parental counseling. Exploratory analyses were performed to examine changes over time and group-related differences. Results: Among the sample, lexical comprehension exceeded production (p < 0.001). Sex differences emerged only for lexical comprehension, with males performing worse than females (p = 0.048). Comparisons by diagnosis showed that children with solid tumors had significantly better articulation than those with hematologic malignancies (p = 0.018), with a trend toward higher morphosyntactic production. Longitudinal analyses on 13 children re-evaluated after six months of weekly shared reading intervention showed potential improvements in articulation, lexical production, and morphosyntactic production (all p < 0.01). Conclusions: These preliminary findings suggest that children in pediatric hemato-oncology settings may be vulnerable to expressive language difficulties. Shared reading interventions may represent a promising supportive approach for early language stimulation; however, further studies with larger samples and controlled designs are needed to better understand their potential contribution and effectiveness. Full article

(This article belongs to the Section Oncology)

19 pages, 6501 KB

Open Access Article

Urdu–English Perceptual Confusions in Bilingual Children with Normal Hearing and Cochlear Implants: An Analysis of Place, Manner, and Voicing Features

by Amina Asif Siddiqui, Cila Umat, Farheen Naz Anis, Ayesha Butt and Kehkashan Kanwal

Audiol. Res. 2026, 16(3), 84; https://doi.org/10.3390/audiolres16030084 - 29 May 2026

Viewed by 190

Abstract

Background and Aims: Accuracy in speech perception in bilingual children is influenced by two phonological systems. This study compares phonological development in bilingual Urdu–English (UE) children with cochlear implants (CIs) with that of their hearing-age-matched peers with normal hearing (NH), by investigating whether bilingualism or any spectral limitations of CIs impact perception of UE phonemes. Method and Procedures: Children (n = 57) aged 3;0–6;11 years (28 CI, 29 NH) were assessed for speech perception using a custom-designed UE Speech Perception Test (UE-SPT), in quiet and noise (+5 dB SNR). Responses were analysed using confusion matrices, across phonological parameters of place, manner, and voicing to determine error patterns. Outcomes and Results: Significant deficits in children with CIs were found across all features, with voicing discrimination showing the largest errors (effect sizes d > 6), exacerbated by noise, especially for Urdu aspirated stops. Children with CIs mastered only 8.3% of Urdu aspirated consonants at 6;11 years compared to 91.7% mastered by NH peers, indicating critical language-specific vulnerabilities. Backing and substitution errors were particularly evident in the speech of children with CIs, whilst manner was preserved. Conclusion and Implications: UE bilingual phonological complexity, compounded by inadequate speech processing abilities, challenges children with CIs, underscoring the urgent need for targeted speech therapy interventions focusing on voicing contrasts and aspirated consonants, as well as environmental accommodations that reduce noise interference and enhance listening through CIs, to optimise educational outcomes. This research contributes vital clinical guidance for supporting bilingual children with cochlear implants, addressing environmental, technological, and linguistic challenges. Full article

(This article belongs to the Collection Cochlear Implants: Challenges and Opportunities in Hearing Rehabilitation)

14 pages, 595 KB

Open Access Article

Validation of the Adaptive Danish Sentence Test (DAST): Normative Data from a Template-Based, Linguistically Rich Sentence-in-Noise Test

by Abigail Anne Kressner, Kirsten Maria Jensen-Rico, Anja Kofoed Pedersen, Lars Bramsløw and Brent Kirkwood

Audiol. Res. 2026, 16(3), 75; https://doi.org/10.3390/audiolres16030075 - 19 May 2026

Viewed by 199

Abstract

Background/Objectives: This study describes the development and validation of the Danish Sentence Test (DAST), a Danish-language, adaptive speech-in-noise test constructed from a linguistically balanced corpus using a template-based method. This approach enables controlled linguistic variation while maintaining lexical consistency and may serve as a model for developing similar speech materials in other languages. Methods: Sentences spoken by one female talker from the DAST corpus were sorted into 44 balanced lists of 20 sentences using a psychometric optimization procedure. Speech reception thresholds (SRTs) were measured in 20 normal-hearing participants using headphone playback with speech-shaped noise. Results: Across the 44 sentence lists, the mean SRT was −5.3 dB SNR, with list means within ±0.5 dB of the grand average under the tested configuration. The average within-subject standard deviation was 0.7 dB, and the grand-average psychometric slope was 18.5%/dB. A statistically significant within-session training effect of approximately 0.02 dB per measurement was observed. Conclusions: This study provides normative speech reception threshold (SRT) data for the adaptive Danish Sentence Test (DAST) in normal-hearing listeners under a defined headphone-based speech-in-noise paradigm and demonstrates that the resulting sentence lists yield comparable performance across lists. The template-based construction and optimization approach offers a framework for developing linguistically rich sentence-in-noise tests in other languages. Full article
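
To make the reported SRT and slope concrete, the sketch below plugs them into a logistic psychometric function. This is an assumption for illustration: the abstract does not say which function shape was fitted, and the sketch further assumes that the SRT marks the 50% intelligibility point and that the slope is measured at that point. The function and constant names are hypothetical.

```python
import math

SRT_DB = -5.3          # mean SRT reported in the abstract (dB SNR)
SLOPE_PER_DB = 0.185   # 18.5 %/dB, written as a proportion per dB


def intelligibility(snr_db, srt_db=SRT_DB, slope=SLOPE_PER_DB):
    """Assumed logistic curve: 0.5 at the SRT, with the given slope at that point."""
    # The factor 4 makes the derivative at snr_db == srt_db equal to `slope`.
    return 1.0 / (1.0 + math.exp(-4.0 * slope * (snr_db - srt_db)))


for snr in (-8.0, -5.3, -3.0):
    print(f"{snr:+.1f} dB SNR -> {intelligibility(snr):.2f}")
```

Under these assumptions, a steep slope means that a change of one or two dB in SNR around the SRT moves predicted intelligibility substantially, which is why list means within ±0.5 dB matter for comparability across lists.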

36 pages, 10012 KB

Open Access Review

Long Short-Term Memory Networks Since Their Inception: Mapping 25 Years of Scientific Development via Bibliometric Analysis

by Subhashree Mohapatra, Jai Govind Singh, Subham Pankaj Samantaray and Manohar Mishra

Algorithms 2026, 19(5), 390; https://doi.org/10.3390/a19050390 - 14 May 2026

Viewed by 373

Abstract

In 1997, Long Short-Term Memory (LSTM) networks were proposed, which significantly changed the landscape of sequential data analysis by resolving the critical issue of the vanishing gradient problem in recurrent neural networks (RNNs). Over the last 25 years, LSTM has advanced from its inception as an innovative solution to its widespread adoption as an essential tool in various fields, including natural language processing (NLP), speech recognition, financial prediction, and healthcare analytics. The present study is a bibliometric review of the evolution of LSTMs. The evolution of LSTM is discussed in terms of its theoretical advancements, architectural developments, and its applications. The study is based on data obtained from the Scopus database, which is then analyzed to identify publication patterns, prominent authors, prominent institutions, and global contributions to the field. The present study is an insightful review of the evolution of LSTM, highlighting its developments and advancements, as well as its applications, to identify its future scope. Full article

33 pages, 1423 KB

Open Access Review

Non-Prosthetic Assistive Technologies for Persons with Hearing Losses: A Survey

by Reemas Alsubaiei, Farah AlHayek, Mariam Alsahhaf, Ghadah Alajmi, Aliah Almutairi, Karim Youssef, Ghina El Mir, Sherif Said, Taha Beyrouthy and Samer Al Kork

Technologies 2026, 14(5), 302; https://doi.org/10.3390/technologies14050302 - 13 May 2026

Viewed by 611

Abstract

Millions of persons worldwide experience varying degrees of hearing loss, traditionally addressed through prosthetic solutions such as hearing aids and cochlear implants. However, a significant proportion of individuals cannot benefit from these technologies, cannot access them, or choose not to use them. In this context, non-prosthetic assistive technologies have emerged as a complementary paradigm, leveraging advances in sensing, artificial intelligence, and wearable computing to transform acoustic information into alternative perceptual representations rather than restoring auditory function. This survey provides a review of such systems, focusing on technologies that enhance environmental awareness, communication, and social interaction. Existing approaches are categorized along two main dimensions: the tasks they perform and the platforms on which they operate. Task-oriented analysis includes sound recognition (speech and non-speech), sound source localization, emotion recognition, sign language recognition, and related emerging functionalities. Platform-based analysis emphasizes wearable devices and mobile solutions enabling real-time and context-aware assistance. The survey further highlights key research trends, including real-time auditory scene analysis, portable processing, and artificial intelligence. It shows that recent studies increasingly demonstrate that combining auditory, visual, and haptic modalities improves robustness and usability in real-world conditions, particularly in noisy and dynamic environments. Finally, open challenges such as energy efficiency, latency, evaluation methodologies, and user acceptance are discussed. By synthesizing existing work and identifying open research directions, this survey aims to provide a structured foundation for future developments in intelligent, non-prosthetic assistive systems that redefine how auditory information is accessed and interpreted. Full article

(This article belongs to the Section Assistive Technologies)

10 pages, 363 KB

Open Access Article

Mapping Speech-Language Pathology and Audiology Rehabilitation Services Across Saudi Arabia: A Retrospective Cross-Sectional Study

by Mohammed F. Alharbi and Ahmad A. Alanazi

Audiol. Res. 2026, 16(3), 69; https://doi.org/10.3390/audiolres16030069 - 10 May 2026

Viewed by 392

Abstract

Background: Speech-language pathology (SLP) and audiology services are essential components of multidisciplinary rehabilitation, particularly for individuals with developmental, neurological, and communication-related disorders. National-level data describing the distribution and utilization of these services in Saudi Arabia remain limited. This study aimed to examine national patterns of rehabilitation service utilization, with a focus on SLP and audiology services in comparison to other rehabilitation specialties. Methods: A retrospective cross-sectional analysis was conducted using publicly available national open data released by the Saudi Ministry of Health (MOH). Aggregated rehabilitation service encounters (n = 1,872,328 to 1,930,695) from 2023–2024 were analyzed by specialty, geographic region, sector (MOH clusters versus private sector), and pediatric age groups. Descriptive statistics were used to characterize utilization patterns and regional variation. Results: Rehabilitation services were widely delivered across both public and private sectors, with physiotherapy representing the largest share of encounters. SLP and audiology services contributed a smaller proportion of total rehabilitation encounters compared to other specialties. Service distribution varied regionally, with higher volumes concentrated in major urban areas including Riyadh, Makkah, and the Eastern Region. Pediatric service encounters were highest in early childhood (ages 3–7), with SLP and audiology services forming a consistent component of rehabilitation during this period. Conclusions: This study provides a descriptive overview of rehabilitation service utilization in Saudi Arabia, highlighting the distribution of SLP and audiology services relative to other specialties and across regions. Findings emphasize the importance of addressing regional variation, supporting workforce development, and enhancing national rehabilitation data systems to inform planning and ensure comprehensive access to communication and hearing services. Full article

29 pages, 2632 KB

Open Access Article

AI-Based Framework for Arabic Language Proficiency Assessment: A Deep Learning ASR Model with Enhanced Similarity Measures

by Sufian A. Badawi, Maen Takruri, Khouloud Salameh, Mohammad Al-Badawi, Nowar Alani, Isam ElBadawi, Aws Al-Qaisi and Ghaleb Aldoboni

Future Internet 2026, 18(5), 251; https://doi.org/10.3390/fi18050251 - 9 May 2026

Viewed by 377

Abstract

This work presents an innovative approach to Arabic language proficiency assessment via Automatic Speech Recognition (ASR), based on improving the Whisper model's accuracy in transcribing Arabic speech. The core of our research involved fine-tuning the Whisper model using a large-scale Arabic speech corpus, with a specific focus on Modern Standard Arabic. This process used a 2000-h Arabic-labeled speech corpus, the QASR dataset, and improved the model’s Word Error Rate (WER). After optimization, the fine-tuned Whisper model’s WER was reduced from 35% to 7% on the QASR dataset, corresponding to an absolute reduction of 28 percentage points (approximately 80% relative reduction). These results demonstrate the strong generalization ability of the fine-tuned model across multiple Arabic ASR benchmarks. A key component of our methodology was the development of a sophisticated scoring system. This system integrates various similarity metrics, such as cosine similarity, the Jaccard index, and the Levenshtein distance, with a machine learning regression model. This multifaceted system provides a comprehensive assessment of reading proficiency, proposing a practical automated assessment method that contributes to the field of AI language transcription and to its application in the assessment of students’ reading. Our research also introduces the ICONET dataset, an augmented Arabic speech corpus comprising 3160 h of diverse and tailored audio–text pairs designed for fine-tuning ASR models. This study demonstrates the potential of fine-tuning pretrained models for specific linguistic contexts (Arabic), establishing a foundation for future research in ASR and language technology. Full article

(This article belongs to the Topic Learning to Live with Gen-AI)
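
The arithmetic behind the WER figures and the three similarity metrics named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: cosine similarity and the Jaccard index are computed over whitespace-separated words and the Levenshtein distance over characters, which may differ from the paper's choices, and the regression model that combines the metrics is not reproduced. All function names are hypothetical.

```python
import math
from collections import Counter


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error that was removed."""
    return (before - after) / before


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between word-count vectors (assumed tokenization)."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def jaccard_index(a: str, b: str) -> float:
    """Shared distinct words divided by all distinct words."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (char_a != char_b)))
        prev = curr
    return prev[-1]


# WER of 35% reduced to 7%: 28 points absolute, 28 / 35 relative.
print(35 - 7, relative_reduction(35, 7))
```

In a scoring pipeline of the kind described, each metric would compare the ASR transcript of a student's reading with the target text, and the resulting values would serve as input features to the regression model.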

26 pages, 2700 KB

Open Access Study Protocol

A Speech Analytics-Based Methodological Protocol for Monitoring Orthopedic Rehabilitation in the Brazilian Unified Health System

by Rafael Baena Neto and Vicente Idalberto Becerra Sablón

Int. J. Environ. Res. Public Health 2026, 23(5), 626; https://doi.org/10.3390/ijerph23050626 - 8 May 2026

Viewed by 380

Abstract

The digital transformation of health systems and the increasing adoption of data-driven public health strategies have intensified the need for methods capable of capturing, structuring, and analyzing information derived from clinical interactions. In the Brazilian Unified Health System (SUS), orthopedic rehabilitation and therapeutic exercise prescription rely heavily on communication between healthcare professionals and patients, particularly with regard to understanding instructions, reporting symptoms, and identifying barriers to treatment continuity. However, much of this information remains embedded in unstructured spoken interactions, limiting its use for monitoring and evaluation purposes. This study presents a prospective methodological protocol for the future development and validation of a speech analytics architecture designed to analyze verbal interactions in orthopedic rehabilitation within the SUS. The proposed framework integrates automatic speech recognition, speaker diarization, semantic processing with large language models (LLMs), biomedical entity extraction, and retrieval-grounded analytical components to generate structured indicators from clinical speech. In addition, the manuscript includes an illustrative simulation based on administrative proxy data converted into synthetic narratives in order to exemplify the expected structure of downstream analytical outputs. This simulation does not constitute validation of the full audio-based pipeline, but rather serves to clarify the proposed analytical workflow. Overall, the protocol establishes a structured methodological basis for future empirical studies aimed at evaluating the technical performance, semantic validity, and potential public health utility of speech analytics in rehabilitation monitoring, under appropriate ethical, regulatory, and data protection safeguards. Full article

(This article belongs to the Special Issue The Physiological Effects of Sports and Exercise)

16 pages, 4498 KB

Open AccessArticle

Decoding Mandarin Action Verbs from EEG Using a Dual-LSTM Network: Towards Practical Assistive Brain–Computer Interfaces

by Binshuo Liu, Gengbiao Chen, Lairong Yin and Jing Liu

Sensors 2026, 26(9), 2749; https://doi.org/10.3390/s26092749 - 29 Apr 2026

Viewed by 398

Abstract

Electroencephalogram (EEG)-based brain–computer interfaces (BCIs) offer a promising pathway for restoring communication. Decoding tonal languages like Mandarin from EEG remains challenging due to homophones and complex temporal dynamics. This study investigates the decoding of six high-frequency Mandarin action verbs, namely Chi (eat), He (drink), Chuan (wear), Na (take), Kan (look), and Dai (put on), from EEG signals. We designed a visual-cue-based overt speech production experiment and collected EEG data from 30 participants as they read the cued verbs aloud. A recurrent neural network framework incorporating dual Long Short-Term Memory (LSTM) layers was implemented to model the long-range temporal dependencies in EEG patterns. The proposed model was compared against a traditional baseline that combines Common Spatial Pattern with a Support Vector Machine (CSP-SVM). Our LSTM-based model achieved an average classification accuracy of 69.93% ± 3.07% for the six-class task, significantly outperforming the CSP-SVM baseline (36.53% ± 3.17%). Accuracy exceeded 75% under specific training conditions, including more than 15 training repetitions with approximately 38% of the available trial data used for training, which demonstrates data efficiency. The inference latency of the trained model, quantified as the post-training per-trial testing time, was under 2 s, supporting near-real-time applications. The results indicate that the LSTM architecture can effectively capture the neural signatures associated with Mandarin verb processing, providing a foundation for developing practical EEG-based assistive communication technologies. Full article

(This article belongs to the Special Issue Sensor-Based EEG Brain–Computer Interfaces: Technologies and Applications)

Search Results (583)
