Search Results (4)

Search Parameters:
Keywords = interaural phase difference (IPD)

27 pages, 3313 KiB  
Article
Big-Delay Estimation for Speech Separation in Assisted Living Environments
by Swarnadeep Bagchi and Ruairí de Fréin
Future Internet 2025, 17(4), 184; https://doi.org/10.3390/fi17040184 - 21 Apr 2025
Viewed by 444
Abstract
Phase wraparound due to large inter-sensor spacings in multi-channel demixing renders the DUET and AdRess source separation algorithms—known for their low computational complexity and effective speech demixing performance—unsuitable for hearing-assisted living applications, where such configurations are needed. DUET is limited to relative delays of up to 7 samples at a sampling rate of Fs = 16 kHz in anechoic scenarios, while AdRess is constrained to instantaneous mixing problems. The aim of this paper is to improve the performance of DUET-type time–frequency (TF) masks when microphones are placed far apart. A significant challenge in assistive hearing scenarios is phase wraparound caused by large relative delays. We evaluate the performance of a large-relative-delay estimation method, called the Elevatogram, in the presence of significant phase wraparound, and present extensions of DUET and AdRess, termed Elevato-DUET and Elevato-AdRess, which remain effective at relative delays of up to 200 samples. The findings demonstrate that Elevato-AdRess not only outperforms Elevato-DUET on objective separation quality metrics—BSS_Eval and PEASS—but also achieves higher intelligibility, as measured by Perceptual Evaluation of Speech Quality (PESQ) Mean Opinion Scores (MOS). These findings suggest that the phase wraparound limitations of DUET and AdRess in assistive hearing scenarios involving large inter-microphone spacing can be addressed by the Elevatogram-based Elevato-DUET and Elevato-AdRess algorithms, which improve both separation quality and intelligibility, with Elevato-AdRess demonstrating the best overall performance.
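The wraparound limit the abstract cites is easy to reproduce: a phase measured at one frequency is only known modulo 2π, so any delay beyond half a period at that frequency aliases to a smaller one. The sketch below (the function name, the 2 kHz analysis tone, and the delays are illustrative choices, not the paper's Elevatogram method) shows a 10-sample delay collapsing onto a 2-sample estimate at Fs = 16 kHz:

```python
import numpy as np

def delay_from_ipd(x_left, x_right, fs, f_hz):
    """Estimate a relative delay (in samples) from the interaural phase
    difference at a single frequency, the way DUET-style TF masks do.
    The phase is only known modulo 2*pi, so the estimate wraps."""
    X_l = np.fft.rfft(x_left)
    X_r = np.fft.rfft(x_right)
    k = round(f_hz * len(x_left) / fs)            # FFT bin of f_hz
    ipd = np.angle(X_r[k] * np.conj(X_l[k]))      # wrapped to (-pi, pi]
    return -ipd * fs / (2 * np.pi * f_hz)         # delay of right vs. left

fs, n, f_hz = 16000, 4096, 2000.0                 # toy analysis tone
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f_hz * t)

# A 2-sample delay sits inside the unambiguous range (half a period is
# 4 samples at 2 kHz), so it is recovered correctly ...
y_small = np.sin(2 * np.pi * f_hz * (t - 2 / fs))
est_small = delay_from_ipd(x, y_small, fs, f_hz)

# ... but a 10-sample delay wraps and is mistaken for 2 samples.
y_large = np.sin(2 * np.pi * f_hz * (t - 10 / fs))
est_large = delay_from_ipd(x, y_large, fs, f_hz)
```

This is exactly why the 200-sample delays of widely spaced microphones defeat a naive IPD reading, motivating a dedicated big-delay estimator.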
15 pages, 2051 KiB  
Article
DIFFBAS: An Advanced Binaural Audio Synthesis Model Focusing on Binaural Differences Recovery
by Yusen Li, Ying Shen and Dongqing Wang
Appl. Sci. 2024, 14(8), 3385; https://doi.org/10.3390/app14083385 - 17 Apr 2024
Cited by 1 | Viewed by 1516
Abstract
Binaural audio synthesis (BAS) aims to restore binaural audio from mono signals obtained from the environment to enhance users’ immersive experiences, and it plays an essential role in building Augmented Reality and Virtual Reality environments. Existing deep neural network (DNN)-based BAS systems synthesize binaural audio by modeling the overall sound propagation process from the source to the left and right ears, which encompasses early decay, room reverberation, and head/ear-related filtering. However, this end-to-end modeling approach introduces overfitting when BAS models are trained on a small, homogeneous data set, and existing losses cannot adequately supervise the training process. As a consequence, the accuracy of the synthesized binaural audio is far from satisfactory with respect to binaural differences. In this work, we propose a novel DNN-based BAS method, DIFFBAS, to improve the accuracy of synthesized binaural audio from the perspective of the interaural phase difference. Specifically, DIFFBAS is trained using the average signals of the left and right channels. To make the model learn the binaural differences, we propose a new loss, named the Interaural Phase Difference (IPD) loss, to supervise model training. Extensive experiments have been performed, and the results demonstrate the effectiveness of the DIFFBAS model and the IPD loss.
(This article belongs to the Section Acoustics and Vibrations)
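The abstract names the IPD loss but not its formula. A plausible minimal sketch, assuming the loss penalizes the per-bin difference between predicted and reference IPDs on complex STFT coefficients (the function and shapes here are hypothetical, not DIFFBAS's actual definition), must wrap the phase error so that angles near ±π are not treated as far apart:

```python
import numpy as np

def ipd_loss(pred_l, pred_r, ref_l, ref_r):
    """Mean absolute *wrapped* difference between predicted and reference
    interaural phase differences, per time-frequency bin."""
    ipd_pred = np.angle(pred_l * np.conj(pred_r))
    ipd_ref = np.angle(ref_l * np.conj(ref_r))
    # re-wrap the error into (-pi, pi] via the unit circle
    err = np.angle(np.exp(1j * (ipd_pred - ipd_ref)))
    return float(np.mean(np.abs(err)))

rng = np.random.default_rng(0)
shape = (257, 100)                            # (freq bins, frames), illustrative
l = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
r = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

perfect = ipd_loss(l, r, l, r)                # matching IPDs: zero loss
shifted = ipd_loss(l * np.exp(0.5j), r, l, r) # constant 0.5 rad phase error
```

The re-wrapping step is the design point: a plain L1 difference of angles would assign a spurious error of nearly 2π to two almost-identical phases on opposite sides of the ±π branch cut.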
17 pages, 1244 KiB  
Article
Sound Source Localization Using a Convolutional Neural Network and Regression Model
by Tan-Hsu Tan, Yu-Tang Lin, Yang-Lang Chang and Mohammad Alkhaleefah
Sensors 2021, 21(23), 8031; https://doi.org/10.3390/s21238031 - 1 Dec 2021
Cited by 25 | Viewed by 7321
Abstract
In this research, a novel sound source localization model is introduced that integrates a convolutional neural network with a regression model (CNN-R) to estimate the sound source angle and distance based on the acoustic characteristics of the interaural phase difference (IPD). The IPD features of the sound signal are first extracted in the time–frequency domain using the short-time Fourier transform (STFT). The IPD feature map is then fed to the CNN-R model as an image for sound source localization. The Pyroomacoustics platform and the multichannel impulse response database (MIRD) are used to generate simulated and real room impulse response (RIR) datasets, respectively. The experimental results show that average accuracies of 98.96% and 98.31% are achieved by the proposed CNN-R for angle and distance estimation, respectively, in the simulation scenario at SNR = 30 dB and RT60 = 0.16 s. Moreover, in the real environment, the average accuracies of angle and distance estimation are 99.85% and 99.38% at SNR = 30 dB and RT60 = 0.16 s, respectively. The performance obtained in both scenarios is superior to that of existing models, indicating the potential of the proposed CNN-R model for real-life applications.
(This article belongs to the Special Issue Intelligent Acoustic Sensors and Its Applications)
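Turning STFT-domain IPDs into a CNN-ready "image" can be sketched as follows (a minimal version under stated assumptions: the function name, window length, and the cos/sin two-channel encoding are illustrative choices, not necessarily the paper's exact feature design):

```python
import numpy as np
from scipy.signal import stft

def ipd_feature_map(x_left, x_right, fs, nperseg=256):
    """Per-bin interaural phase differences arranged as a
    (channel, freq, time) array for a CNN; cos/sin channels
    sidestep the 2*pi discontinuity of raw wrapped phase."""
    _, _, X_l = stft(x_left, fs=fs, nperseg=nperseg)
    _, _, X_r = stft(x_right, fs=fs, nperseg=nperseg)
    ipd = np.angle(X_l * np.conj(X_r))       # wrapped IPD per TF bin
    return np.stack([np.cos(ipd), np.sin(ipd)])

fs = 16000
rng = np.random.default_rng(1)
x = rng.standard_normal(fs)                  # 1 s of noise as a stand-in source
y = np.roll(x, 3)                            # right channel lags by 3 samples
feat = ipd_feature_map(x, y, fs)             # shape (2, freq bins, frames)
```

Encoding the angle as cosine and sine rather than raw radians is a common trick when phase is consumed by a convolutional model, since it keeps neighboring phases numerically close across the ±π boundary.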
15 pages, 5941 KiB  
Article
Room Volume Estimation Based on Ambiguity of Short-Term Interaural Phase Differences Using Humanoid Robot Head
by Ryuichi Shimoyama and Reo Fukuda
Robotics 2016, 5(3), 16; https://doi.org/10.3390/robotics5030016 - 21 Jul 2016
Cited by 3 | Viewed by 7474
Abstract
Humans can recognize approximate room size using binaural audition alone. However, sound reverberation is not negligible in most environments, and it causes temporal fluctuations in the short-term interaural phase differences (IPDs) of sound pressure. This study proposes a novel method by which a binaural humanoid robot head can estimate room volume, based on the statistical properties of the short-term IPDs of sound pressure. The humanoid robot turns its head toward a sound source, recognizes the source, and then estimates the ego-centric distance using its stereo vision. The relations between room volume, the average standard deviation of the short-term IPDs, and ego-centric distance were obtained experimentally for various rooms and stored in a database; by interpolating these relations at the estimated distance, the robot estimates the room volume from binaural audition alone.