364 Results Found

  • Article
  • Open Access
812 Views
34 Pages

The Interaction of Target and Masker Speech in Competing Speech Perception

  • Sheyenne Fishero,
  • Joan A. Sereno and
  • Allard Jongman

Background/Objectives: Speech perception typically takes place against a background of other speech or noise. The present study investigates the effectiveness of segregating speech streams within a competing speech signal, examining whether cues such...

  • Feature Paper
  • Article
  • Open Access
8 Citations
2,828 Views
14 Pages

26 June 2021

Pathological speech such as Oesophageal Speech (OS) is difficult to understand due to the presence of undesired artefacts and lack of normal healthy speech characteristics. Modern speech technologies and machine learning enable us to transform pathol...

  • Article
  • Open Access
3 Citations
7,702 Views
17 Pages

3 July 2023

The Demucs-Denoiser model has been recently shown to achieve a high level of performance for online speech enhancement, but assumes that only one speech source is present in the fed mixture. In real-life multiple-speech-source scenarios, it is not ce...

  • Article
  • Open Access
18 Citations
3,948 Views
13 Pages

29 September 2022

The “cocktail party” problem—how a listener perceives speech in noisy environments—is typically studied using speech (multi-talker babble) or noise maskers. However, realistic cocktail party scenarios often include background...

  • Article
  • Open Access
10 Citations
3,436 Views
17 Pages

23 August 2018

Enhancing speech captured by distant microphones is a challenging task. In this study, we investigate the multichannel signal properties of the single acoustic vector sensor (AVS) to obtain the inter-sensor data ratio (ISDR) model in the time-frequen...

  • Article
  • Open Access
1 Citation
2,111 Views
24 Pages

14 October 2024

The selection of a target when training deep neural networks for speech enhancement is an important consideration. Different masks have been shown to exhibit different performance characteristics depending on the application and the conditions. This...

  • Article
  • Open Access
6 Citations
4,951 Views
15 Pages

Target Speaker Extraction by Fusing Voiceprint Features

  • Shidan Cheng,
  • Ying Shen and
  • Dongqing Wang

15 August 2022

It is a critical problem to accurately separate clean speech in the multispeaker scenario for different speakers. However, in most cases, smart devices such as smart phones interact with only one specific user. As a consequence, the speech separation...

  • Article
  • Open Access
4 Citations
3,935 Views
24 Pages

Target Speaker Extraction Using Attention-Enhanced Temporal Convolutional Network

  • Jian-Hong Wang,
  • Yen-Ting Lai,
  • Tzu-Chiang Tai,
  • Phuong Thi Le,
  • Tuan Pham,
  • Ze-Yu Wang,
  • Yung-Hui Li,
  • Jia-Ching Wang and
  • Pao-Chi Chang

When recording conversations, there may be multiple people talking at once. While our human ears can filter out unwanted sounds, this can be challenging for automatic speech recognition (ASR) systems, leading to reduced accuracy. To address this issu...

  • Hypothesis
  • Open Access
2 Citations
3,039 Views
36 Pages

30 December 2024

Speech is a highly skilled motor activity that shares a core problem with other motor skills: how to reduce the massive degrees of freedom (DOF) to the extent that the central nervous control and learning of complex motor movements become possible. I...

  • Article
  • Open Access
6 Citations
8,267 Views
18 Pages

14 October 2013

Cochlear implants (CIs) require efficient speech processing to maximize information transmission to the brain, especially in noise. A novel CI processing strategy was proposed in our previous studies, in which sparsity-constrained non-negative matrix...

  • Article
  • Open Access
6 Citations
4,436 Views
19 Pages

Recently, supervised learning methods have shown promising performance, especially deep neural network-based (DNN) methods, in the application of single-channel speech enhancement. Generally, those approaches extract the acoustic features directly fr...

  • Article
  • Open Access
778 Views
22 Pages

18 November 2025

Background/objectives: Speech sound disorder (SSD) and developmental language disorder (DLD) are common childhood disorders of communication that can also co-occur. This study investigated the reported content, format and delivery of UK speech and la...

  • Article
  • Open Access
7 Citations
3,855 Views
18 Pages

Image and Speech Recognition Technology in the Development of an Elderly Care Robot: Practical Issues Review and Improvement Strategies

  • Chin-Shyurng Fahn,
  • Szu-Chieh Chen,
  • Po-Yuan Wu,
  • Tsung-Lan Chu,
  • Cheng-Hung Li,
  • Deng-Quan Hsu,
  • Hsiu-Hung Wang and
  • Hsiu-Min Tsai

10 November 2022

As the world’s population is aging and there is a shortage of sufficient caring manpower, the development of intelligent care robots is a feasible solution. At present, plenty of care robots have been developed, but humanized care robots that c...

  • Article
  • Open Access
1 Citation
2,553 Views
12 Pages

Current target-speaker extraction (TSE) models have achieved good performance in separating target speech from highly overlapped multi-talker speech. However, in real-world applications, multi-talker speech is often sparsely overlapped, and the targe...

  • Article
  • Open Access
2 Citations
2,975 Views
10 Pages

Background: Spatial release from masking (SRM) is the improvement in speech intelligibility when the masking signals are spatially separated from the target signal. Young, normal-hearing listeners have a robust auditory system that is capable of us...

  • Article
  • Open Access
5 Citations
2,780 Views
19 Pages

An Electroglottograph Auxiliary Neural Network for Target Speaker Extraction

  • Lijiang Chen,
  • Zhendong Mo,
  • Jie Ren,
  • Chunfeng Cui and
  • Qi Zhao

29 December 2022

The extraction of a target speaker from mixtures of different speakers has attracted extensive amounts of attention and research. Previous studies have proposed several methods, such as SpeakerBeam, to tackle this speech extraction problem using clea...

  • Article
  • Open Access
1 Citation
3,130 Views
19 Pages

22 November 2022

How speech prosody is processed in the brain during language production remains an unsolved issue. The present work used the phrase-recall paradigm to analyze brain oscillation underpinning rhythmic processing in speech production. Participants were...

  • Article
  • Open Access
7 Citations
6,746 Views
15 Pages

22 June 2020

In the context of human assistance, identifying and enhancing non-stationary target speech in various noise environments, such as a cocktail party, is an important issue for real-time speech separation. Previous studies mostly used microphone s...

  • Article
  • Open Access
2,615 Views
18 Pages

The quality of air traffic control speech is crucial. However, internal and external noise can impact air traffic control speech quality. Clear speech instructions and feedback help optimize flight processes and responses to emergencies. The traditio...

  • Article
  • Open Access
10 Citations
7,356 Views
24 Pages

21 October 2020

In this paper, we propose a preprocessing strategy for denoising of speech data based on speech segment detection. A design of computationally efficient speech denoising is necessary to develop a scalable method for large-scale data sets. Furthermore...

  • Article
  • Open Access
6 Citations
5,237 Views
18 Pages

The BioVisualSpeech Corpus of Words with Sibilants for Speech Therapy Games Development

  • Sofia Cavaco,
  • Isabel Guimarães,
  • Mariana Ascensão,
  • Alberto Abad,
  • Ivo Anjos,
  • Francisco Oliveira,
  • Sofia Martins,
  • Nuno Marques,
  • Maxine Eskenazi and
  • Margarida Grilo
  • + 1 author

2 October 2020

In order to develop computer tools for speech therapy that reliably classify speech productions, there is a need for speech production corpora that characterize the target population in terms of age, gender, and native language. Apart from including...

  • Article
  • Open Access
6 Citations
4,736 Views
15 Pages

22 October 2019

Tonal languages make use of pitch variation for distinguishing lexical semantics, and their melodic richness seems comparable to that of music. The present study investigated a novel priming effect of melody on the pitch processing of Mandarin speech...

  • Article
  • Open Access
4 Citations
4,818 Views
19 Pages

In Time with the Beat: Entrainment in Patients with Phonological Impairment, Apraxia of Speech, and Parkinson’s Disease

  • Ingrid Aichert,
  • Katharina Lehner,
  • Simone Falk,
  • Mona Späth,
  • Mona Franke and
  • Wolfram Ziegler

18 November 2021

In the present study, we investigated if individuals with neurogenic speech sound impairments of three types, Parkinson’s dysarthria, apraxia of speech, and aphasic phonological impairment, accommodate their speech to the natural speech rhythm of an...

  • Article
  • Open Access
2 Citations
2,037 Views
14 Pages

Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora

  • Yuan Zong,
  • Hailun Lian,
  • Hongli Chang,
  • Cheng Lu and
  • Chuangao Tang

5 September 2022

In this paper, we focus on a challenging, but interesting, task in speech emotion recognition (SER), i.e., cross-corpus SER. Unlike conventional SER, a feature distribution mismatch may exist between the labeled source (training) and target (testing)...

  • Article
  • Open Access
4 Citations
4,318 Views
16 Pages

There is a large interest in the annotation of speech addressed to infants. Infant-directed speech (IDS) has acoustic properties that might pose a challenge to automatic speech recognition (ASR) tools developed for adult-directed speech (ADS). While...

  • Article
  • Open Access
7 Citations
3,415 Views
14 Pages

Effect of Distracting Background Speech in an Auditory Brain–Computer Interface

  • Álvaro Fernández-Rodríguez,
  • Ricardo Ron-Angevin,
  • Ernesto J. Sanz-Arigita,
  • Antoine Parize,
  • Juliette Esquirol,
  • Alban Perrier,
  • Simon Laur,
  • Jean-Marc André,
  • Véronique Lespinet-Najib and
  • Liliana Garcia

Studies so far have analyzed the effect of distractor stimuli in different types of brain–computer interface (BCI). However, the effect of a background speech has not been studied using an auditory event-related potential (ERP-BCI), a convenient opti...

  • Article
  • Open Access
15 Citations
2,720 Views
10 Pages

This study examined the effects of orofacial myofunctional therapy (OMT) on speech intelligibility in adults with persistent articulation impairments. Six adults in the age range of 18–23 years were selected to receive orofacial myofunctional therapy...

  • Article
  • Open Access
2,848 Views
35 Pages

Multi-Channel Speech Enhancement Using Labelled Random Finite Sets and a Neural Beamformer in Cocktail Party Scenario

  • Jayanta Datta,
  • Ali Dehghan Firoozabadi,
  • David Zabala-Blanco and
  • Francisco R. Castillo-Soria

8 March 2025

In this research, a multi-channel target speech enhancement scheme is proposed that is based on deep learning (DL) architecture and assisted by multi-source tracking using a labeled random finite set (RFS) framework. A neural network based on minimum...

  • Article
  • Open Access
6 Citations
2,165 Views
13 Pages

Implicitly Aligning Joint Distributions for Cross-Corpus Speech Emotion Recognition

  • Cheng Lu,
  • Yuan Zong,
  • Chuangao Tang,
  • Hailun Lian,
  • Hongli Chang,
  • Jie Zhu,
  • Sunan Li and
  • Yan Zhao

31 August 2022

In this paper, we investigate the problem of cross-corpus speech emotion recognition (SER), in which the training (source) and testing (target) speech samples belong to different corpora. This case thus leads to a feature distribution mismatch betwee...

  • Article
  • Open Access
4 Citations
2,796 Views
18 Pages

30 December 2022

Electro-laryngeal (EL) speech has poor intelligibility and naturalness, which hampers the popular use of the electro-larynx. Voice conversion (VC) can enhance EL speech. However, if the EL speech to be enhanced is with complicated tone variation rule...

  • Article
  • Open Access
22 Citations
4,741 Views
21 Pages

Adaptive Data Boosting Technique for Robust Personalized Speech Emotion in Emotionally-Imbalanced Small-Sample Environments

  • Jaehun Bang,
  • Taeho Hur,
  • Dohyeong Kim,
  • Thien Huynh-The,
  • Jongwon Lee,
  • Yongkoo Han,
  • Oresti Banos,
  • Jee-In Kim and
  • Sungyoung Lee

2 November 2018

Personalized emotion recognition provides an individual training model for each target user in order to mitigate the accuracy problem when using general training models collected from multiple users. Existing personalized speech emotion recognition r...

  • Article
  • Open Access
5 Citations
2,858 Views
15 Pages

Speech Recognition for Task Domains with Sparse Matched Training Data

  • Byung Ok Kang,
  • Hyeong Bae Jeon and
  • Jeon Gue Park

4 September 2020

We propose two approaches to handle speech recognition for task domains with sparse matched training data. One is an active learning method that selects training data for the target domain from another general domain that already has a significant am...

  • Article
  • Open Access
16 Citations
3,453 Views
19 Pages

Extended Preoperative Audiometry for Outcome Prediction and Risk Analysis in Patients Receiving Cochlear Implants

  • Jan-Henrik Rieck,
  • Annika Beyer,
  • Alexander Mewes,
  • Amke Caliebe and
  • Matthias Hey

Background: The outcome of cochlear implantation has improved over the last decades, but there are still patients with less benefit. Despite numerous studies examining the cochlear implant (CI) outcome, variations in speech comprehension with CI rema...

  • Article
  • Open Access
17 Citations
4,194 Views
27 Pages

30 September 2020

For reliable speech recognition, it is necessary to handle the usage environments. In this study, we target voice-driven multi-unmanned aerial vehicles (UAVs) control. Although many studies have introduced several systems for voice-driven UAV control...

  • Article
  • Open Access
2 Citations
3,586 Views
15 Pages

Laser monitoring has received more and more attention in many application fields thanks to its essential advantages. The analysis shows that the target speech in the laser monitoring signals is often interfered with by echoes, resulting in a decline i...

  • Article
  • Open Access
4 Citations
5,361 Views
13 Pages

Speech Identification and Comprehension in the Urban Soundscape

  • Letizia Marchegiani,
  • Xenofon Fafoutis and
  • Sahar Abbaspour

Urban environments are characterised by the presence of copious and unstructured noise. This noise continuously challenges speech intelligibility both in normal-hearing and hearing-impaired individuals. In this paper, we investigate the impact of urb...

  • Article
  • Open Access
26 Citations
4,075 Views
37 Pages

CNN-Based Identification of Parkinson’s Disease from Continuous Speech in Noisy Environments

  • Paul Faragó,
  • Sebastian-Aurelian Ștefănigă,
  • Claudia-Georgiana Cordoș,
  • Laura-Ioana Mihăilă,
  • Sorin Hintea,
  • Ana-Sorina Peștean,
  • Michel Beyer,
  • Lăcrămioara Perju-Dumbravă and
  • Robert Radu Ileșan

Parkinson’s disease is a progressive neurodegenerative disorder caused by dopaminergic neuron degeneration. Parkinsonian speech impairment is one of the earliest presentations of the disease and, along with tremor, is suitable for pre-diagnosis...

  • Article
  • Open Access
2 Citations
3,299 Views
16 Pages

22 December 2022

Online multi-microphone speech enhancement aims to extract target speech from multiple noisy inputs by exploiting the spatial information as well as the spectro-temporal characteristics with low latency. Acoustic parameters such as the acoustic trans...

  • Article
  • Open Access
11 Citations
3,118 Views
15 Pages

29 July 2022

Cross-corpus speech emotion recognition (SER) is a challenging task, and its difficulty lies in the mismatch between the feature distributions of the training (source domain) and testing (target domain) data, leading to the performance degradation wh...

  • Article
  • Open Access
15 Citations
7,785 Views
26 Pages

Infants Segment Words from Songs—An EEG Study

  • Tineke M. Snijders,
  • Titia Benders and
  • Paula Fikkert

Children’s songs are omnipresent and highly attractive stimuli in infants’ input. Previous work suggests that infants process linguistic–phonetic information from simplified sung melodies. The present study investigated whether infa...

  • Article
  • Open Access
4 Citations
2,536 Views
13 Pages

Prepulse inhibition (PPI) is the reduction in the acoustic startle reflex (ASR) when the startling stimulus (pulse) is preceded by a weaker, non-startling stimulus. This can be enhanced by facilitating selective attention to the prepulse against a noi...

  • Review
  • Open Access
14 Citations
6,822 Views
28 Pages

Processing of Degraded Speech in Brain Disorders

  • Jessica Jiang,
  • Elia Benhamou,
  • Sheena Waters,
  • Jeremy C. S. Johnson,
  • Anna Volkmer,
  • Rimona S. Weil,
  • Charles R. Marshall,
  • Jason D. Warren and
  • Chris J. D. Hardy

The speech we hear every day is typically “degraded” by competing sounds and the idiosyncratic vocal characteristics of individual speakers. While the comprehension of “degraded” speech is normally automatic, it depends on dynamic and adaptive proces...

  • Article
  • Open Access
1 Citation
3,079 Views
17 Pages

Speech intelligibility is a concern for public health, especially in non-ideal listening conditions where listeners often listen to the target speech in the presence of background noise. With advances in technology, synthetic speech has been increasi...

  • Article
  • Open Access
1 Citation
1,910 Views
21 Pages

BIM-Based Adversarial Attacks Against Speech Deepfake Detectors

  • Wendy Edda Wang,
  • Davide Salvi,
  • Viola Negroni,
  • Daniele Ugo Leonzio,
  • Paolo Bestagini and
  • Stefano Tubaro

Automatic Speaker Verification (ASV) systems are increasingly employed to secure access to services and facilities. However, recent advances in speech deepfake generation pose serious threats to their reliability. Modern speech synthesis models can c...

  • Review
  • Open Access
139 Citations
33,722 Views
22 Pages

Online toxic discourses could result in conflicts between groups or harm to online communities. Hate speech is complex and multifaceted harmful or offensive content targeting individuals or groups. Existing literature reviews have generally focused o...

  • Article
  • Open Access
2 Citations
5,487 Views
26 Pages

28 December 2022

The implementation of an intervention protocol aimed at increasing vocal complexity in three pre-linguistic children with cerebral palsy (two males, starting age 15 months, and one female, starting age 16 months) was evaluated utilising a repeated AB...

  • Article
  • Open Access
4 Citations
3,071 Views
17 Pages

11 October 2022

In multi-lingual, multi-speaker environments (e.g., international conference scenarios), speech, language, and background sounds can overlap. In real-world scenarios, source separation techniques are needed to separate target sounds. Downstream tasks...

  • Article
  • Open Access
954 Views
25 Pages

14 October 2025

This longitudinal study examines the acquisition of target-like patterns of phonological variation by 17 second language (L2) French learners during a semester or year of study abroad (SA) in France. In this study, speech data from sociolinguistic in...

  • Article
  • Open Access
1 Citation
2,410 Views
19 Pages

7 March 2025

It has been demonstrated that interactive speech and noise modeling outperforms traditional speech modeling-only methods for speech enhancement (SE). With a dual-branch topology that simultaneously predicts target speech and noise signals and employs...