Search Results (4)

Search Parameters:
Keywords = interaural phase difference (IPD)

27 pages, 3313 KiB  
Article
Big-Delay Estimation for Speech Separation in Assisted Living Environments
by Swarnadeep Bagchi and Ruairí de Fréin
Future Internet 2025, 17(4), 184; https://doi.org/10.3390/fi17040184 - 21 Apr 2025
Viewed by 444
Abstract
Phase wraparound due to large inter-sensor spacings in multi-channel demixing renders the DUET and AdRess source separation algorithms—known for their low computational complexity and effective speech demixing performance—unsuitable for hearing-assisted living applications, where such configurations are needed. DUET is limited to relative delays of up to 7 samples at a sampling rate of Fs = 16 kHz in anechoic scenarios, while AdRess is constrained to instantaneous mixing problems. The aim of this paper is to improve the performance of DUET-type time–frequency (TF) masks when microphones are placed far apart. A significant challenge in assistive hearing scenarios is phase wraparound caused by large relative delays. We evaluate the performance of a large-relative-delay estimation method, called the Elevatogram, in the presence of significant phase wraparound, and present extensions of DUET and AdRess, termed Elevato-DUET and Elevato-AdRess, which remain effective at relative delays of up to 200 samples. The findings demonstrate that Elevato-AdRess not only outperforms Elevato-DUET on objective separation quality metrics—BSS_Eval and PEASS—but also achieves higher intelligibility, as measured by Perceptual Evaluation of Speech Quality (PESQ) Mean Opinion Scores (MOS). These findings suggest that the phase wraparound limitations of DUET and AdRess in assistive hearing scenarios involving large inter-microphone spacing can be addressed by the Elevatogram-based Elevato-DUET and Elevato-AdRess algorithms, which improve both separation quality and intelligibility, with Elevato-AdRess demonstrating the best overall performance.
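The wraparound limit the abstract cites is easy to reproduce: a phase measured at one frequency is only known modulo 2π, so any delay beyond half a period at that frequency aliases to a smaller one. The sketch below (the function name, the 2 kHz analysis tone, and the delays are illustrative choices, not the paper's Elevatogram method) shows a 10-sample delay collapsing onto a 2-sample estimate at Fs = 16 kHz:

```python
import numpy as np

def delay_from_ipd(x_left, x_right, fs, f_hz):
    """Estimate a relative delay (in samples) from the interaural phase
    difference at a single frequency, the way DUET-style TF masks do.
    The phase is only known modulo 2*pi, so the estimate wraps."""
    X_l = np.fft.rfft(x_left)
    X_r = np.fft.rfft(x_right)
    k = round(f_hz * len(x_left) / fs)            # FFT bin of f_hz
    ipd = np.angle(X_r[k] * np.conj(X_l[k]))      # wrapped to (-pi, pi]
    return -ipd * fs / (2 * np.pi * f_hz)         # delay of right vs. left

fs, n, f_hz = 16000, 4096, 2000.0                 # toy analysis tone
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f_hz * t)

# A 2-sample delay sits inside the unambiguous range (half a period is
# 4 samples at 2 kHz), so it is recovered correctly ...
y_small = np.sin(2 * np.pi * f_hz * (t - 2 / fs))
est_small = delay_from_ipd(x, y_small, fs, f_hz)

# ... but a 10-sample delay wraps and is mistaken for 2 samples.
y_large = np.sin(2 * np.pi * f_hz * (t - 10 / fs))
est_large = delay_from_ipd(x, y_large, fs, f_hz)
```

This is exactly why the 200-sample delays of widely spaced microphones defeat a naive IPD reading, motivating a dedicated big-delay estimator.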
15 pages, 2051 KiB  
Article
DIFFBAS: An Advanced Binaural Audio Synthesis Model Focusing on Binaural Differences Recovery
by Yusen Li, Ying Shen and Dongqing Wang
Appl. Sci. 2024, 14(8), 3385; https://doi.org/10.3390/app14083385 - 17 Apr 2024
Cited by 1 | Viewed by 1516
Abstract
Binaural audio synthesis (BAS) aims to restore binaural audio from mono signals obtained from the environment to enhance users’ immersive experiences, and it plays an essential role in building Augmented Reality and Virtual Reality environments. Existing deep neural network (DNN)-based BAS systems synthesize binaural audio by modeling the overall sound propagation process from the source to the left and right ears, which encompasses early decay, room reverberation, and head/ear-related filtering. However, this end-to-end modeling approach introduces overfitting when BAS models are trained on a small, homogeneous data set, and existing losses cannot adequately supervise the training process. As a consequence, the accuracy of the synthesized binaural audio is far from satisfactory with respect to binaural differences. In this work, we propose a novel DNN-based BAS method, DIFFBAS, to improve the accuracy of synthesized binaural audio from the perspective of the interaural phase difference. Specifically, DIFFBAS is trained using the average signals of the left and right channels. To make the model learn the binaural differences, we propose a new loss, named the Interaural Phase Difference (IPD) loss, to supervise model training. Extensive experiments have been performed, and the results demonstrate the effectiveness of the DIFFBAS model and the IPD loss.
(This article belongs to the Section Acoustics and Vibrations)
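The abstract names the IPD loss but not its formula. A plausible minimal sketch, assuming the loss penalizes the per-bin difference between predicted and reference IPDs on complex STFT coefficients (the function and shapes here are hypothetical, not DIFFBAS's actual definition), must wrap the phase error so that angles near ±π are not treated as far apart:

```python
import numpy as np

def ipd_loss(pred_l, pred_r, ref_l, ref_r):
    """Mean absolute *wrapped* difference between predicted and reference
    interaural phase differences, per time-frequency bin."""
    ipd_pred = np.angle(pred_l * np.conj(pred_r))
    ipd_ref = np.angle(ref_l * np.conj(ref_r))
    # re-wrap the error into (-pi, pi] via the unit circle
    err = np.angle(np.exp(1j * (ipd_pred - ipd_ref)))
    return float(np.mean(np.abs(err)))

rng = np.random.default_rng(0)
shape = (257, 100)                            # (freq bins, frames), illustrative
l = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
r = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

perfect = ipd_loss(l, r, l, r)                # matching IPDs: zero loss
shifted = ipd_loss(l * np.exp(0.5j), r, l, r) # constant 0.5 rad phase error
```

The re-wrapping step is the design point: a plain L1 difference of angles would assign a spurious error of nearly 2π to two almost-identical phases on opposite sides of the ±π branch cut.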
17 pages, 1244 KiB  
Article
Sound Source Localization Using a Convolutional Neural Network and Regression Model
by Tan-Hsu Tan, Yu-Tang Lin, Yang-Lang Chang and Mohammad Alkhaleefah
Sensors 2021, 21(23), 8031; https://doi.org/10.3390/s21238031 - 1 Dec 2021
Cited by 25 | Viewed by 7321
Abstract
In this research, a novel sound source localization model is introduced that integrates a convolutional neural network with a regression model (CNN-R) to estimate the sound source angle and distance based on the acoustic characteristics of the interaural phase difference (IPD). The IPD features of the sound signal are first extracted in the time–frequency domain using the short-time Fourier transform (STFT). The IPD feature map is then fed to the CNN-R model as an image for sound source localization. The Pyroomacoustics platform and the multichannel impulse response database (MIRD) are used to generate simulated and real room impulse response (RIR) datasets, respectively. The experimental results show that average accuracies of 98.96% and 98.31% are achieved by the proposed CNN-R for angle and distance estimation, respectively, in the simulation scenario at SNR = 30 dB and RT60 = 0.16 s. Moreover, in the real environment, the average accuracies of angle and distance estimation are 99.85% and 99.38% at SNR = 30 dB and RT60 = 0.16 s, respectively. The performance obtained in both scenarios is superior to that of existing models, indicating the potential of the proposed CNN-R model for real-life applications.
(This article belongs to the Special Issue Intelligent Acoustic Sensors and Its Applications)
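Turning STFT-domain IPDs into a CNN-ready "image" can be sketched as follows (a minimal version under stated assumptions: the function name, window length, and the cos/sin two-channel encoding are illustrative choices, not necessarily the paper's exact feature design):

```python
import numpy as np
from scipy.signal import stft

def ipd_feature_map(x_left, x_right, fs, nperseg=256):
    """Per-bin interaural phase differences arranged as a
    (channel, freq, time) array for a CNN; cos/sin channels
    sidestep the 2*pi discontinuity of raw wrapped phase."""
    _, _, X_l = stft(x_left, fs=fs, nperseg=nperseg)
    _, _, X_r = stft(x_right, fs=fs, nperseg=nperseg)
    ipd = np.angle(X_l * np.conj(X_r))       # wrapped IPD per TF bin
    return np.stack([np.cos(ipd), np.sin(ipd)])

fs = 16000
rng = np.random.default_rng(1)
x = rng.standard_normal(fs)                  # 1 s of noise as a stand-in source
y = np.roll(x, 3)                            # right channel lags by 3 samples
feat = ipd_feature_map(x, y, fs)             # shape (2, freq bins, frames)
```

Encoding the angle as cosine and sine rather than raw radians is a common trick when phase is consumed by a convolutional model, since it keeps neighboring phases numerically close across the ±π boundary.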
15 pages, 5941 KiB  
Article
Room Volume Estimation Based on Ambiguity of Short-Term Interaural Phase Differences Using Humanoid Robot Head
by Ryuichi Shimoyama and Reo Fukuda
Robotics 2016, 5(3), 16; https://doi.org/10.3390/robotics5030016 - 21 Jul 2016
Cited by 3 | Viewed by 7474
Abstract
Humans can recognize approximate room size using binaural audition alone. However, sound reverberation is not negligible in most environments, and it causes temporal fluctuations in the short-term interaural phase differences (IPDs) of sound pressure. This study proposes a novel method by which a binaural humanoid robot head can estimate room volume, based on the statistical properties of the short-term IPDs of sound pressure. The humanoid robot turns its head toward a sound source, recognizes the source, and then estimates the ego-centric distance using its stereo vision. The relations between room volume, the average standard deviation of the short-term IPDs, and ego-centric distance were obtained experimentally for various rooms and stored in a database; by interpolating these relations at the estimated distance, the robot estimates the room volume from binaural audition alone.