Search Results (742)

Search Parameters:
Keywords = speech signals

21 pages, 2789 KiB  
Article
BIM-Based Adversarial Attacks Against Speech Deepfake Detectors
by Wendy Edda Wang, Davide Salvi, Viola Negroni, Daniele Ugo Leonzio, Paolo Bestagini and Stefano Tubaro
Electronics 2025, 14(15), 2967; https://doi.org/10.3390/electronics14152967 - 24 Jul 2025
Viewed by 203
Abstract
Automatic Speaker Verification (ASV) systems are increasingly employed to secure access to services and facilities. However, recent advances in speech deepfake generation pose serious threats to their reliability. Modern speech synthesis models can convincingly imitate a target speaker’s voice and generate realistic synthetic audio, potentially enabling unauthorized access through ASV systems. To counter these threats, forensic detectors have been developed to distinguish between real and fake speech. Although these models achieve strong performance, their deep learning nature makes them susceptible to adversarial attacks, i.e., carefully crafted, imperceptible perturbations in the audio signal that make the model unable to classify correctly. In this paper, we explore adversarial attacks targeting speech deepfake detectors. Specifically, we analyze the effectiveness of Basic Iterative Method (BIM) attacks applied in both time and frequency domains under white- and black-box conditions. Additionally, we propose an ensemble-based attack strategy designed to simultaneously target multiple detection models. This approach generates adversarial examples with balanced effectiveness across the ensemble, enhancing transferability to unseen models. Our experimental results show that, although crafting universally transferable attacks remains challenging, it is possible to fool state-of-the-art detectors using minimal, imperceptible perturbations, highlighting the need for more robust defenses in speech deepfake detection. Full article
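As context for the attack studied above, here is a minimal time-domain BIM sketch, assuming a PyTorch `detector` module that maps a waveform batch to real/fake logits; the perturbation budget, step size, and iteration count are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def bim_attack(detector, waveform, target_label, eps=1e-3, alpha=2e-4, n_iter=10):
    """Targeted Basic Iterative Method (BIM) in the time domain.

    detector:     torch.nn.Module mapping (batch, samples) waveforms to class logits.
    waveform:     clean audio, values assumed in [-1, 1].
    target_label: label the attacker wants the detector to output.
    eps:          L-infinity bound on the total perturbation (kept small so it
                  stays imperceptible).
    alpha:        per-iteration step size.
    """
    x_orig = waveform.detach()
    x_adv = x_orig.clone()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        logits = detector(x_adv)
        # Targeted attack: descend the loss toward the attacker's label.
        loss = F.cross_entropy(logits, target_label)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()
            # Project back into the eps-ball around the clean signal
            # and into the valid amplitude range.
            x_adv = torch.clamp(x_adv, x_orig - eps, x_orig + eps)
            x_adv = torch.clamp(x_adv, -1.0, 1.0)
    return x_adv.detach()
```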

23 pages, 3741 KiB  
Article
Multi-Corpus Benchmarking of CNN and LSTM Models for Speaker Gender and Age Profiling
by Jorge Jorrin-Coz, Mariko Nakano, Hector Perez-Meana and Leobardo Hernandez-Gonzalez
Computation 2025, 13(8), 177; https://doi.org/10.3390/computation13080177 - 23 Jul 2025
Viewed by 247
Abstract
Speaker profiling systems are often evaluated on a single corpus, which complicates reliable comparison. We present a fully reproducible evaluation pipeline that trains Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models independently on three speech corpora representing distinct recording conditions—studio-quality TIMIT, crowdsourced Mozilla Common Voice, and in-the-wild VoxCeleb1. All models share the same architecture, optimizer, and data preprocessing; no corpus-specific hyperparameter tuning is applied. We perform a detailed preprocessing and feature extraction procedure, evaluating multiple configurations and validating their applicability and effectiveness in improving the results. A feature analysis shows that Mel spectrograms benefit CNNs, whereas Mel-Frequency Cepstral Coefficients (MFCCs) suit LSTMs, and that the optimal Mel-bin count grows with corpus signal-to-noise ratio (SNR). With this fixed recipe, EfficientNet achieves 99.82% gender accuracy on Common Voice (+1.25 pp over the previous best) and 98.86% on VoxCeleb1 (+0.57 pp). MobileNet attains 99.86% age-group accuracy on Common Voice (+2.86 pp) and a 5.35-year MAE for age estimation on TIMIT using a lightweight configuration. The consistent, near-state-of-the-art results across three acoustically diverse datasets substantiate the robustness and versatility of the proposed pipeline. Code and pre-trained weights are released to facilitate downstream research. Full article
(This article belongs to the Section Computational Engineering)
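The Mel-spectrogram-versus-MFCC comparison above can be illustrated with a short librosa sketch; the frame settings and bin counts below are placeholders, not the paper's tuned configuration.

```python
import librosa
import numpy as np

def extract_features(path, sr=16000, n_mels=64, n_mfcc=40):
    """Compute the two feature types compared in the paper:
    log-Mel spectrograms (CNN input) and MFCCs (LSTM input)."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                         hop_length=160, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)                       # (n_mels, frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    # Per-utterance normalization, applied identically to both branches.
    log_mel = (log_mel - log_mel.mean()) / (log_mel.std() + 1e-8)
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return log_mel.astype(np.float32), mfcc.astype(np.float32)
```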

10 pages, 857 KiB  
Proceeding Paper
Implementation of a Prototype-Based Parkinson’s Disease Detection System Using a RISC-V Processor
by Krishna Dharavathu, Pavan Kumar Sankula, Uma Maheswari Vullanki, Subhan Khan Mohammad, Sai Priya Kesapatnapu and Sameer Shaik
Eng. Proc. 2025, 87(1), 97; https://doi.org/10.3390/engproc2025087097 - 21 Jul 2025
Viewed by 160
Abstract
Parkinson’s disease (PD) has a high incidence among human diseases, according to a recent survey by the World Health Organization (WHO). According to WHO records, this chronic disease has affected approximately 10 million people worldwide. Patients who do not receive an early diagnosis may develop an incurable neurological disorder. PD is a degenerative disorder of the brain characterized by impairment of the nigrostriatal system and is accompanied by a wide range of motor and non-motor symptoms. In this work, PD is detected from patients’ speech signals using a fifth-generation reduced instruction set computing (RISC-V) processor. The RISC-V microcontroller unit (MCU) was designed for a voice-controlled human–machine interface (HMI). Signal processing and feature extraction methods are applied to speech signals affected by the impairment of the nigrostriatal system, and classifier modules are then used to classify the signals as normal or abnormal to identify PD. We use Matrix Laboratory (MATLAB R2021a_v9.10.0.1602886) to analyze the data, develop algorithms, create modules, and develop the RISC-V processor for embedded implementation. Machine learning (ML) techniques are also used to extract features such as pitch, tremor, and Mel-frequency cepstral coefficients (MFCCs). Full article
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)
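The classification stage described above (pitch- and MFCC-type features fed to a classifier) can be sketched as follows. The paper prototypes in MATLAB, so this Python/scikit-learn version with an SVM, placeholder file names, and simplified pitch statistics is only an illustrative analogue, not the paper's pipeline.

```python
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_features(path, sr=16000, n_mfcc=13):
    """Summary statistics over MFCCs plus a simple pitch estimate."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]
    pitch_stats = [f0.mean(), f0.std()] if f0.size else [0.0, 0.0]
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1), pitch_stats])

# Placeholder file names and labels: 1 = PD, 0 = healthy control.
X = np.stack([utterance_features(p) for p in ["pd_001.wav", "hc_001.wav"]])
y = np.array([1, 0])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(X))
```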

21 pages, 1118 KiB  
Review
Integrating Large Language Models into Robotic Autonomy: A Review of Motion, Voice, and Training Pipelines
by Yutong Liu, Qingquan Sun and Dhruvi Rajeshkumar Kapadia
AI 2025, 6(7), 158; https://doi.org/10.3390/ai6070158 - 15 Jul 2025
Viewed by 1262
Abstract
This survey provides a comprehensive review of the integration of large language models (LLMs) into autonomous robotic systems, organized around four key pillars: locomotion, navigation, manipulation, and voice-based interaction. We examine how LLMs enhance robotic autonomy by translating high-level natural language commands into low-level control signals, supporting semantic planning and enabling adaptive execution. Systems like SayTap improve gait stability through LLM-generated contact patterns, while TrustNavGPT achieves a 5.7% word error rate (WER) under noisy voice-guided conditions by modeling user uncertainty. Frameworks such as MapGPT, LLM-Planner, and 3D-LOTUS++ integrate multi-modal data—including vision, speech, and proprioception—for robust planning and real-time recovery. We also highlight the use of physics-informed neural networks (PINNs) to model object deformation and support precision in contact-rich manipulation tasks. To bridge the gap between simulation and real-world deployment, we synthesize best practices from benchmark datasets (e.g., RH20T, Open X-Embodiment) and training pipelines designed for one-shot imitation learning and cross-embodiment generalization. Additionally, we analyze deployment trade-offs across cloud, edge, and hybrid architectures, emphasizing latency, scalability, and privacy. The survey concludes with a multi-dimensional taxonomy and cross-domain synthesis, offering design insights and future directions for building intelligent, human-aligned robotic systems powered by LLMs. Full article

15 pages, 1359 KiB  
Article
Phoneme-Aware Hierarchical Augmentation and Semantic-Aware SpecAugment for Low-Resource Cantonese Speech Recognition
by Lusheng Zhang, Shie Wu and Zhongxun Wang
Sensors 2025, 25(14), 4288; https://doi.org/10.3390/s25144288 - 9 Jul 2025
Viewed by 403
Abstract
Cantonese Automatic Speech Recognition (ASR) is hindered by tonal complexity, acoustic diversity, and a lack of labelled data. This study proposes a phoneme-aware hierarchical augmentation framework that enhances performance without additional annotation. A Phoneme Substitution Matrix (PSM), built from Montreal Forced Aligner alignments and Tacotron-2 synthesis, injects adversarial phoneme variants into both transcripts and their aligned audio segments, enlarging pronunciation diversity. Concurrently, a semantic-aware SpecAugment scheme exploits wav2vec 2.0 attention heat maps and keyword boundaries to adaptively mask informative time–frequency regions; a reinforcement-learning controller tunes the masking schedule online, forcing the model to rely on a wider context. On the Common Voice Cantonese 50 h subset, the combined strategy reduces the character error rate (CER) from 26.17% to 16.88% with wav2vec 2.0 and from 38.83% to 23.55% with Zipformer. At 100 h, the CER further drops to 4.27% and 2.32%, yielding relative gains of 32–44%. Ablation studies confirm that phoneme-level and masking components provide complementary benefits. The framework offers a practical, model-independent path toward accurate ASR for Cantonese and other low-resource tonal languages. This paper presents an intelligent sensing-oriented modeling framework for speech signals, which is suitable for deployment on edge or embedded systems to process input from audio sensors (e.g., microphones) and shows promising potential for voice-interactive terminal applications. Full article
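A simplified version of the masking idea above is sketched below: standard SpecAugment-style masks whose time positions are sampled according to a saliency map, standing in for the paper's wav2vec 2.0 attention heat maps and reinforcement-learning-tuned schedule. All mask sizes and counts are illustrative.

```python
import numpy as np

def saliency_weighted_specaugment(spec, saliency, n_freq_masks=2, n_time_masks=2,
                                  max_f=8, max_t=20, rng=None):
    """Mask time/frequency regions, preferring frames the saliency map marks
    as informative (a simplification of the attention-guided scheme).

    spec:     (freq_bins, frames) log-Mel spectrogram.
    saliency: (frames,) non-negative importance scores (e.g. attention mass).
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_freq, n_frames = out.shape
    # Frequency masks: uniform, as in vanilla SpecAugment.
    for _ in range(n_freq_masks):
        f = rng.integers(1, max_f + 1)
        f0 = rng.integers(0, max(1, n_freq - f))
        out[f0:f0 + f, :] = out.mean()
    # Time masks: sample the mask centre proportionally to saliency, so the
    # model is forced to rely on context around informative regions.
    p = saliency + 1e-8
    p = p / p.sum()
    for _ in range(n_time_masks):
        t = rng.integers(1, max_t + 1)
        centre = rng.choice(n_frames, p=p)
        t0 = int(np.clip(centre - t // 2, 0, max(0, n_frames - t)))
        out[:, t0:t0 + t] = out.mean()
    return out
```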

7 pages, 404 KiB  
Brief Report
A Signal for Voice and Speech Abnormalities in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome
by Stephanie L. Grach, Jaime Seltzer and Diana M. Orbelo
J. Clin. Med. 2025, 14(14), 4847; https://doi.org/10.3390/jcm14144847 - 8 Jul 2025
Viewed by 2072
Abstract
Background/Objectives: Patients with myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) may report abnormalities in voice and speech; however, no formal research has been conducted in this area. Methods: An online mixed-methods survey was completed by 685 people with ME/CFS. A total of 302 respondents completed the qualitative component (44.09%). Questions assessed disease experience with ME/CFS and post-exertional malaise without prompting on specific symptoms. Within the qualitative results, a search of the terms “speech, voice,” “words,” and “speak” was conducted. Results: Excluding neurocognitive associations, colloquial phrases, and “speech therapy,” there were 38 mentions of the terms in the context of voice or speech changes across 28 unique qualitative survey responses (9.27%). Conclusions: A notable portion of respondents reported voice or speech changes when responding to open-ended qualitative questions about their disease experience. More research is needed regarding the implications of voice and speech anomalies in ME/CFS pathology and disease monitoring. Full article
(This article belongs to the Special Issue POTS, ME/CFS and Long COVID: Recent Advances and Future Direction)

15 pages, 815 KiB  
Article
Tests of the Influence of DAF (Delayed Auditory Feedback) on Changes in Speech Signal Parameters
by Dominika Kanty and Piotr Staroniewicz
Appl. Sci. 2025, 15(13), 7524; https://doi.org/10.3390/app15137524 - 4 Jul 2025
Viewed by 250
Abstract
Contemporary phonetics and speech therapy continuously seek new techniques and methods that could contribute to improving verbal communication for individuals with speech disorders. One such phenomenon, Delayed Auditory Feedback (DAF), involves the speaker hearing their own voice with a specific delay relative to real-time speech. Although the research presented in this study was conducted on healthy individuals, it offers valuable insights into the mechanisms controlling speech, which may also apply to individuals with speech disorders. This article introduces a novel method and measurement setup, focusing on selected key speech signal parameters. To characterize the impact of Delayed Auditory Feedback (DAF) on fluent speakers, speech signal parameters were measured in 5 women and 5 men during spontaneous speech and reading. Parameters such as speech rate, fundamental frequency, formants, speech duration, jitter, and shimmer were analyzed both during and prior to the application of DAF. The results of this study may find practical applications in the field of telecommunications, especially in improving the efficiency and quality of human communication. Full article
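A rough sketch of how a few of the measured parameters (speech duration, mean fundamental frequency, and a jitter-like measure) could be extracted with librosa is given below. Jitter and shimmer are normally computed from period-level measurements (e.g., in Praat), so the frame-based proxy and placeholder file names here are illustrative only, not the study's measurement setup.

```python
import numpy as np
import librosa

def daf_parameters(path, sr=16000):
    """Per-recording speech parameters: duration, mean F0, and a frame-level
    jitter proxy (relative variation of successive F0 periods)."""
    y, _ = librosa.load(path, sr=sr)
    duration_s = len(y) / sr
    f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
    f0 = f0[voiced & ~np.isnan(f0)]
    periods = 1.0 / f0
    jitter_proxy = (np.mean(np.abs(np.diff(periods))) / periods.mean()
                    if f0.size > 1 else np.nan)
    return {
        "duration_s": duration_s,
        "mean_f0_hz": float(f0.mean()) if f0.size else np.nan,
        "jitter_proxy": float(jitter_proxy),
    }

# Compare the same speaker without and with DAF (placeholder file names).
baseline = daf_parameters("reading_no_daf.wav")
with_daf = daf_parameters("reading_daf_200ms.wav")
print(baseline, with_daf)
```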

23 pages, 1945 KiB  
Article
Spectro-Image Analysis with Vision Graph Neural Networks and Contrastive Learning for Parkinson’s Disease Detection
by Nuwan Madusanka, Hadi Sedigh Malekroodi, H. M. K. K. M. B. Herath, Chaminda Hewage, Myunggi Yi and Byeong-Il Lee
J. Imaging 2025, 11(7), 220; https://doi.org/10.3390/jimaging11070220 - 2 Jul 2025
Viewed by 349
Abstract
This study presents a novel framework that integrates Vision Graph Neural Networks (ViGs) with supervised contrastive learning for enhanced spectro-temporal image analysis of speech signals in Parkinson’s disease (PD) detection. The approach introduces a frequency band decomposition strategy that transforms raw audio into three complementary spectral representations, capturing distinct PD-specific characteristics across low-frequency (0–2 kHz), mid-frequency (2–6 kHz), and high-frequency (6 kHz+) bands. The framework processes mel multi-band spectro-temporal representations through a ViG architecture that models complex graph-based relationships between spectral and temporal components, trained using a supervised contrastive objective that learns discriminative representations distinguishing PD-affected from healthy speech patterns. Comprehensive experimental validation on multi-institutional datasets from Italy, Colombia, and Spain demonstrates that the proposed ViG-contrastive framework achieves superior classification performance, with the ViG-M-GELU architecture achieving 91.78% test accuracy. The integration of graph neural networks with contrastive learning enables effective learning from limited labeled data while capturing complex spectro-temporal relationships that traditional Convolutional Neural Network (CNN) approaches miss, representing a promising direction for developing more accurate and clinically viable speech-based diagnostic tools for PD. Full article
(This article belongs to the Section Medical Imaging)
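The supervised contrastive objective named above can be sketched as a Khosla-style SupCon loss in PyTorch. The ViG backbone is omitted; any encoder producing embeddings could supply `features`, and the temperature is an illustrative default.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss over a batch of encoder embeddings.

    features: (batch, dim) embeddings (e.g. from a ViG backbone).
    labels:   (batch,) class labels (PD vs. healthy).
    """
    z = F.normalize(features, dim=1)
    sim = torch.matmul(z, z.T) / temperature                    # (B, B)
    # Exclude self-similarity from the softmax denominator.
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)             # diagonal never used
    # Positives: other samples in the batch that share the label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss.mean()

# Illustrative usage with random embeddings.
feats = torch.randn(8, 128)
labs = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
print(supervised_contrastive_loss(feats, labs))
```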

27 pages, 4853 KiB  
Review
Robotic Systems for Cochlear Implant Surgeries: A Review of Robotic Design and Clinical Outcomes
by Oneeba Ahmed, Mingfeng Wang, Bin Zhang, Richard Irving, Philip Begg and Xinli Du
Electronics 2025, 14(13), 2685; https://doi.org/10.3390/electronics14132685 - 2 Jul 2025
Viewed by 575
Abstract
Sensorineural hearing loss occurs when cochlear hair cells fail to convert mechanical sound waves into electrical signals transmitted via the auditory nerve. Cochlear implants (CIs) restore hearing by directly stimulating the auditory nerve with electrical impulses, often while preserving residual hearing. Over the past two decades, robotic-assisted techniques in otologic surgery have gained prominence for improving precision and safety. Robotic systems support critical procedures such as mastoidectomy, cochleostomy drilling, and electrode array (EA) insertion. These technologies aim to minimize trauma and enhance hearing preservation. Despite the outpatient nature of most CI surgeries, surgeons still face challenges, including anatomical complexity, imaging demands, and rising costs. Robotic systems help address these issues by streamlining workflows, reducing variability, and improving electrode placement accuracy. This review evaluates robotic systems developed for cochlear implantation, focusing on their design, surgical integration, and clinical outcomes. It concludes that robotic systems enable lower insertion speeds, which lead to reduced insertion forces and lower intracochlear pressure. However, their impact on trauma, long-term hearing preservation, and speech outcomes remains uncertain. Further research is needed to assess clinical durability, cost-effectiveness, and patient-reported outcomes. Full article
(This article belongs to the Special Issue Emerging Biomedical Electronics)

14 pages, 1112 KiB  
Article
Individual Noise-Tolerance Profiles and Neural Signal-to-Noise Ratio: Insights into Predicting Speech-in-Noise Performance and Noise-Reduction Outcomes
by Subong Kim, Susan Arzac, Natalie Dokic, Jenn Donnelly, Nicole Genser, Kristen Nortwich and Alexis Rooney
Audiol. Res. 2025, 15(4), 78; https://doi.org/10.3390/audiolres15040078 - 2 Jul 2025
Viewed by 265
Abstract
Background/Objectives: Individuals with similar hearing sensitivity exhibit varying levels of tolerance to background noise, a trait tied to unique individual characteristics that affect their responsiveness to noise reduction (NR) processing in hearing aids. The present study aimed to capture such individual characteristics by employing electrophysiological measures and subjective noise-tolerance profiles, and both were analyzed in relation to speech-in-noise performance and NR outcomes. Methods: From a sample of 42 participants with normal hearing, the neural signal-to-noise ratio (SNR)—a cortical index comparing the amplitude ratio between auditory evoked responses to target speech onset versus noise onset—was calculated, and individual noise-tolerance profiles were also derived using k-means cluster analysis to classify participants into distinct subgroups. Results: The neural SNR showed significant correlations with speech-in-noise performance and NR outcomes with varying strength. In contrast, noise-tolerance subgroups did not show meaningful group-level differences in either speech-in-noise or NR outcomes. The neural SNR and noise-tolerance profiles were found to be statistically independent. Conclusions: While the neural SNR reliably predicted perceptual performance in background noise and NR outcomes, our noise-tolerance profiles lacked sufficient sensitivity. Still, subjective ratings of individual noise tolerance are clinically accessible, and thus, integrating both physiology and subjective measures in the same cohort is a valuable strategy. Full article
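The two analysis ingredients named above can be sketched as follows, assuming the neural SNR is taken as a peak-amplitude ratio of averaged evoked responses to speech onset versus noise onset, and that noise-tolerance profiles are questionnaire ratings clustered with k-means; array shapes, window choices, and cluster count are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def neural_snr(evoked_speech, evoked_noise):
    """Amplitude ratio between averaged cortical responses to target-speech
    onset and to noise onset for one participant.

    Both inputs: (n_trials, n_samples) evoked responses in a peri-onset window.
    """
    speech_amp = np.abs(evoked_speech.mean(axis=0)).max()
    noise_amp = np.abs(evoked_noise.mean(axis=0)).max()
    return speech_amp / noise_amp

# Noise-tolerance profiles: one row per participant, columns = subjective
# ratings at several noise levels (placeholder random data, 42 participants).
rng = np.random.default_rng(0)
profiles = rng.normal(size=(42, 6))
subgroups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profiles)
print(np.bincount(subgroups))
```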

25 pages, 2093 KiB  
Article
Deep Learning-Based Speech Enhancement for Robust Sound Classification in Security Systems
by Samuel Yaw Mensah, Tao Zhang, Nahid AI Mahmud and Yanzhang Geng
Electronics 2025, 14(13), 2643; https://doi.org/10.3390/electronics14132643 - 30 Jun 2025
Viewed by 733
Abstract
Deep learning has emerged as a powerful technique for speech enhancement, particularly in security systems where audio signals are often degraded by non-stationary noise. Traditional signal processing methods struggle in such conditions, making it difficult to detect critical sounds like gunshots, alarms, and unauthorized speech. This study investigates a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs) to enhance speech quality and improve sound classification accuracy in noisy security environments. The proposed model is trained and validated using real-world datasets containing diverse noise distortions, including VoxCeleb for benchmarking speech enhancement and UrbanSound8K and ESC-50 for sound classification. Performance is evaluated using industry-standard metrics such as Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Signal-to-Noise Ratio (SNR). The architecture includes multi-layered neural networks, residual connections, and dropout regularization to ensure robustness and generalizability. Additionally, the paper addresses key challenges in deploying deep learning models for security applications, such as computational complexity, latency, and vulnerability to adversarial attacks. Experimental results demonstrate that the proposed DNN + GAN-based approach significantly improves speech intelligibility and classification performance in high-interference scenarios, offering a scalable solution for enhancing the reliability of audio-based security systems. Full article
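The evaluation metrics listed above are typically computed as sketched below: SNR directly in NumPy, with PESQ and STOI via the community `pesq` and `pystoi` packages. Those packages are an assumption here, not tools named by the paper.

```python
import numpy as np
from pesq import pesq          # pip install pesq   (assumed available)
from pystoi import stoi        # pip install pystoi (assumed available)

def snr_db(reference, enhanced):
    """SNR of the enhanced signal measured against the clean reference."""
    noise = reference - enhanced
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12))

def evaluate(reference, enhanced, fs=16000):
    """PESQ, STOI, and SNR for one utterance; inputs are 1-D float arrays."""
    return {
        "pesq_wb": pesq(fs, reference, enhanced, "wb"),
        "stoi": stoi(reference, enhanced, fs, extended=False),
        "snr_db": snr_db(reference, enhanced),
    }
```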

11 pages, 670 KiB  
Article
LLM-Enhanced Chinese Morph Resolution in E-Commerce Live Streaming Scenarios
by Xiaoye Ouyang, Liu Yuan, Xiaocheng Hu, Jiahao Zhu and Jipeng Qiang
Entropy 2025, 27(7), 698; https://doi.org/10.3390/e27070698 - 29 Jun 2025
Viewed by 345
Abstract
E-commerce live streaming in China has become a major retail channel, yet hosts often employ subtle phonetic or semantic “morphs” to evade moderation and make unsubstantiated claims, posing risks to consumers. To address this, we study the Live Auditory Morph Resolution (LiveAMR) task, which restores morphed speech transcriptions to their true forms. Building on prior text-based morph resolution, we propose an LLM-enhanced training framework that mines three types of explanation knowledge—predefined morph-type labels, LLM-generated reference corrections, and natural-language rationales constrained for clarity and comprehensiveness—from a frozen large language model. These annotations are concatenated with the original morphed sentence and used to fine-tune a lightweight T5 model under a standard cross-entropy objective. In experiments on two test sets (in-domain and out-of-domain), our method achieves substantial gains over baselines, improving F0.5 by up to 7 pp in-domain (to 0.943) and 5 pp out-of-domain (to 0.799) compared to a strong T5 baseline. These results demonstrate that structured LLM-derived signals can be mined without fine-tuning the LLM itself and injected into small models to yield efficient, accurate morph resolution. Full article
(This article belongs to the Special Issue Natural Language Processing and Data Mining)
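The fine-tuning step described above (LLM-mined explanation knowledge concatenated with the morphed sentence, trained with cross-entropy) can be sketched with Hugging Face Transformers as follows; the checkpoint, input template, and field names are illustrative assumptions rather than the paper's configuration.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")          # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def training_step(morphed, morph_type, reference, rationale, target):
    """One cross-entropy update on a (morphed sentence -> restored sentence) pair,
    with the three kinds of LLM-derived explanation knowledge prepended as context."""
    source = (f"resolve morph: {morphed} | type: {morph_type} "
              f"| reference: {reference} | rationale: {rationale}")
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
    outputs = model(**inputs, labels=labels)   # labels trigger the CE loss internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```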

25 pages, 2054 KiB  
Article
Perception and Interpretation of Contrastive Pitch Accent During Spoken Language Processing in Autistic Children
by Pumpki Lei Su, Duane G. Watson, Stephen Camarata and James Bodfish
Languages 2025, 10(7), 161; https://doi.org/10.3390/languages10070161 - 28 Jun 2025
Viewed by 451
Abstract
Although prosodic differences in autistic individuals have been widely documented, little is known about their ability to perceive and interpret specific prosodic features, such as contrastive pitch accent—a prosodic signal that places emphasis and helps listeners distinguish between competing referents in discourse. This study addresses that gap by investigating the extent to which autistic children can (1) perceive contrastive pitch accent (i.e., discriminate contrastive pitch accent differences in speech); (2) interpret contrastive pitch accent (i.e., use prosodic cues to guide real-time language comprehension); and (3) the extent to which their ability to interpret contrastive pitch accent is associated with broader language and social communication skills, including receptive prosody, pragmatic language, social communication, and autism severity. Twenty-four autistic children and 24 neurotypical children aged 8 to 14 completed an AX same–different task and a visual-world paradigm task to assess their ability to perceive and interpret contrastive pitch accent. Autistic children demonstrated the ability to perceive and interpret contrastive pitch accent, as evidenced by comparable discrimination ability to neurotypical peers on the AX task and real-time revision of visual attention based on prosodic cues in the visual-world paradigm. However, autistic children showed significantly slower reaction time during the AX task, and a subgroup of autistic children with language impairment showed significantly slower processing of contrastive pitch accent during the visual-world paradigm task. Additionally, speed of contrastive pitch accent processing was significantly associated with pragmatic language skills and autism symptom severity in autistic children. Overall, these findings suggest that while autistic children as a group are able to discriminate prosodic forms and interpret the pragmatic function of contrastive pitch accent during spoken language comprehension, differences in prosody processing in autistic children may be reflected not in accuracy, but in speed of processing measures and in specific subgroups defined by language ability. Full article
(This article belongs to the Special Issue Advances in the Acquisition of Prosody)

26 pages, 478 KiB  
Article
Physical Disabilities and Impediments to the Priesthood According to Orthodox Canon Law, with a Case Study of the Romanian Orthodox Church
by Răzvan Perșa
Religions 2025, 16(6), 789; https://doi.org/10.3390/rel16060789 - 17 Jun 2025
Viewed by 744
Abstract
This study examines, within the broader context of historical and cultural influences from Byzantine and Western canonical traditions, the canonical and theological treatment of physical disabilities as impediments to the priesthood within modern Orthodox Canon Law. It shows how traditional Orthodox Canon Law, particularly influenced by medieval Roman Catholic canonical understanding, has historically emphasised physical integrity as a requirement for ordination. The study critically examines historical and contemporary canonical attitudes towards candidates with hearing, speech, or visual impairments or with locomotor disability through the analysis of Apostolic canons, Canons of Ecumenical Councils, and later canonical sources. The methods include a critical canonical and historical analysis of primary sources such as the Canons, patristic writings, and synodal legislation, with particular reference to the initiatives of the Romanian Orthodox Church in the modern cultural and pastoral context. The study observes that, although such impairments continue to be recognised as canonical impediments according to traditional Orthodox law, contemporary ecclesial practice increasingly reflects a pastoral sensitivity that allows, in certain contexts, for the inclusion of persons with disabilities in ordained ministry. This is typically achieved through adaptations that preserve the integrity of liturgical function, such as assistance from co-ministers or specialised training. These developments, while not amounting to a formal canonical revision, signal a broader pastoral and ecclesiological openness toward the integration of persons with disabilities within the life of the Church. Full article
20 pages, 1481 KiB  
Article
Analysis and Research on Spectrogram-Based Emotional Speech Signal Augmentation Algorithm
by Huawei Tao, Sixian Li, Xuemei Wang, Binkun Liu and Shuailong Zheng
Entropy 2025, 27(6), 640; https://doi.org/10.3390/e27060640 - 15 Jun 2025
Viewed by 368
Abstract
Data augmentation techniques are widely applied in speech emotion recognition to increase the diversity of data and enhance the performance of models. However, existing research has not deeply explored the impact of these data augmentation techniques on emotional data. Inappropriate augmentation algorithms may distort emotional labels, thereby reducing the performance of models. To address this issue, in this paper we systematically evaluate the influence of common data augmentation algorithms on emotion recognition from three dimensions: (1) we design subjective auditory experiments to intuitively demonstrate the impact of augmentation algorithms on the emotional expression of speech; (2) we jointly extract multi-dimensional features from spectrograms based on the Librosa library and analyze the impact of data augmentation algorithms on the spectral features of speech signals through heatmap visualization; and (3) we objectively evaluate the recognition performance of the model by means of indicators such as cross-entropy loss and introduce statistical significance analysis to verify the effectiveness of the augmentation algorithms. The experimental results show that “time stretching” may distort speech features, affect the attribution of emotional labels, and significantly reduce the model’s accuracy. In contrast, “reverberation” (RIR) and “resampling” within a limited range have the least impact on emotional data, enhancing the diversity of samples. Moreover, their combination can increase accuracy by up to 7.1%, providing a basis for optimizing data augmentation strategies. Full article
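The three augmentations compared above can be sketched as follows: librosa covers time stretching and limited-range resampling, while a synthetic exponentially decaying impulse response stands in for the measured room impulse responses (RIR) used for reverberation; all parameters are illustrative.

```python
import numpy as np
import librosa
from scipy.signal import fftconvolve

def time_stretch(y, rate=0.9):
    """Time stretching; the paper finds this can distort emotional cues."""
    return librosa.effects.time_stretch(y, rate=rate)

def limited_resample(y, sr=16000, factor=1.05):
    """Resampling within a limited range (a few percent tempo/pitch shift)."""
    return librosa.resample(y, orig_sr=sr, target_sr=int(sr * factor))

def reverberate(y, sr=16000, rt60=0.3, rng=None):
    """Reverberation via a crude exponentially decaying noise impulse response,
    a stand-in for measured RIRs."""
    rng = rng or np.random.default_rng(0)
    n = int(rt60 * sr)
    ir = rng.standard_normal(n) * np.exp(-6.9 * np.arange(n) / n)  # ~60 dB decay
    wet = fftconvolve(y, ir)[: len(y)]
    return wet / (np.max(np.abs(wet)) + 1e-8)
```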
