Search Results (9)

Search Parameters:
Keywords = head-tracked binaural

12 pages, 1782 KiB  
Article
The Accuracy of Dynamic Sound Source Localization and Recognition Ability of Individual Head-Related Transfer Functions in Binaural Audio Systems with Head Tracking
by Vedran Planinec, Jonas Reijniers, Marko Horvat, Herbert Peremans and Kristian Jambrošić
Appl. Sci. 2023, 13(9), 5254; https://doi.org/10.3390/app13095254 - 23 Apr 2023
Cited by 3 | Viewed by 2990
Abstract
The use of audio systems that employ binaural synthesis with head tracking has become increasingly popular, particularly in virtual reality gaming systems. The binaural synthesis process uses the Head-Related Transfer Functions (HRTF) as an input required to assign the directions of arrival to sounds coming from virtual sound sources in the created virtual environments. Generic HRTFs are often used for this purpose to accommodate all potential listeners. The hypothesis of the research is that the use of individual HRTF in binaural synthesis instead of generic HRTF leads to improved accuracy and quality of virtual sound source localization, thus enhancing the user experience. A novel methodology is proposed that involves the use of dynamic virtual sound sources. In the experiments, the test participants were asked to determine the direction of a dynamic virtual sound source in both the horizontal and vertical planes using both generic and individual HRTFs. The gathered data are statistically analyzed, and the accuracy of localization is assessed with respect to the type of HRTF used. The individual HRTFs of the test participants are measured using a novel and efficient method that is accessible to a broad range of users.
(This article belongs to the Section Acoustics and Vibrations)
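The binaural synthesis step this abstract describes, assigning a direction of arrival by filtering a source signal with an HRTF, can be sketched as a time-domain convolution with the corresponding head-related impulse response (HRIR) pair. The function name and array shapes below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def binaural_synthesize(mono, hrir_left, hrir_right):
    """Render a mono source at one fixed direction by convolving it
    with the HRIR pair (the time-domain HRTF) measured for that
    direction. Returns a 2-channel signal: row 0 = left ear,
    row 1 = right ear."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])
```

In a head-tracked system this convolution is repeated per audio block, with the HRIR pair re-selected from the listener's (generic or individual) HRTF set according to the current head orientation.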

26 pages, 10336 KiB  
Article
The Performance of Inertial Measurement Unit Sensors on Various Hardware Platforms for Binaural Head-Tracking Applications
by Petar Franček, Kristian Jambrošić, Marko Horvat and Vedran Planinec
Sensors 2023, 23(2), 872; https://doi.org/10.3390/s23020872 - 12 Jan 2023
Cited by 14 | Viewed by 5654
Abstract
Binaural synthesis with head tracking is often used in spatial audio systems. The devices used for head tracking must provide data on the orientation of the listener’s head. These data need to be highly accurate, and they need to be provided as fast and as frequently as possible. Therefore, head-tracking devices need to be equipped with high-quality inertial measurement unit (IMU) sensors. Since IMUs readily include triaxial accelerometers, gyroscopes, and magnetometers, it is crucial that all of these sensors perform well, as the head orientation is calculated from all sensor outputs. This paper discusses the challenges encountered in the process of the performance assessment of IMUs through appropriate measurements. Three distinct hardware platforms were investigated: five IMU sensors either connected to Arduino-based embedded systems or being an integral part of one, five smartphones across a broad range of overall quality with integrated IMUs, and a commercial virtual reality unit that utilizes a headset with integrated IMUs. An innovative measurement method is presented and proposed for comparing the performance of sensors on all three platforms. The results of the measurements performed using the proposed method show that all three investigated platforms are adequate for the acquisition of the data required for calculating the orientation of a device as the input to the binaural synthesis process. Some limitations that have been observed during the measurements, regarding data acquisition and transfer, are discussed.
(This article belongs to the Special Issue Smart Sensor Integration in Wearables)
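The orientation computation the abstract refers to, fusing gyroscope and accelerometer outputs, is commonly done with a sensor-fusion filter. As a minimal illustration (a one-axis complementary filter, not the method used in the paper), one update step might look like:

```python
def complementary_filter_step(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """One update of a one-axis complementary filter: integrate the
    gyroscope rate (accurate short-term, but drifts) and blend in the
    accelerometer-derived angle (noisy, but drift-free) to correct it.
    Angles in degrees, gyro_rate in degrees/second, dt in seconds."""
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle
```

Practical head trackers typically fuse all three triaxial sensors (including the magnetometer for yaw) with quaternion-based filters such as Madgwick's or a Kalman filter; the blend coefficient `alpha` here is an assumed value.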

18 pages, 3158 KiB  
Article
Creating Audio Object-Focused Acoustic Environments for Room-Scale Virtual Reality
by Constantin Popp and Damian T. Murphy
Appl. Sci. 2022, 12(14), 7306; https://doi.org/10.3390/app12147306 - 20 Jul 2022
Cited by 10 | Viewed by 5562
Abstract
Room-scale virtual reality (VR) affordance in movement and interactivity causes new challenges in creating virtual acoustic environments for VR experiences. Such environments are typically constructed from virtual interactive objects that are accompanied by an Ambisonic bed and an off-screen (“invisible”) music soundtrack, with the Ambisonic bed, music, and virtual acoustics describing the aural features of an area. This methodology can become problematic in room-scale VR as the player cannot approach or interact with such background sounds, contradicting the player’s motion aurally and limiting interactivity. Written from a sound designer’s perspective, the paper addresses these issues by proposing a musically inclusive novel methodology that reimagines an acoustic environment predominately using objects that are governed by multimodal rule-based systems and spatialized in six degrees of freedom using 3D binaural audio exclusively while minimizing the use of Ambisonic beds and non-diegetic music. This methodology is implemented using off-the-shelf, creator-oriented tools and methods and is evaluated through the development of a standalone, narrative, prototype room-scale VR experience. The experience’s target platform is a mobile, untethered VR system based on head-mounted displays, inside-out tracking, head-mounted loudspeakers or headphones, and hand-held controllers. The authors apply their methodology to the generation of ambiences based on sound-based music, sound effects, and virtual acoustics. The proposed methodology benefits the interactivity and spatial behavior of virtual acoustic environments but may be constrained by platform and project limitations.

23 pages, 20456 KiB  
Article
Spatial Audio Scene Characterization (SASC): Automatic Localization of Front-, Back-, Up-, and Down-Positioned Music Ensembles in Binaural Recordings
by Sławomir K. Zieliński, Paweł Antoniuk and Hyunkook Lee
Appl. Sci. 2022, 12(3), 1569; https://doi.org/10.3390/app12031569 - 1 Feb 2022
Cited by 3 | Viewed by 2343
Abstract
The automatic localization of audio sources distributed symmetrically with respect to coronal or transverse planes using binaural signals still poses a challenging task, due to the front–back and up–down confusion effects. This paper demonstrates that the convolutional neural network (CNN) can be used to automatically localize music ensembles panned to the front, back, up, or down positions. The network was developed using the repository of the binaural excerpts obtained by the convolution of multi-track music recordings with the selected sets of head-related transfer functions (HRTFs). They were generated in such a way that a music ensemble (of circular shape in terms of its boundaries) was positioned in one of the following four locations with respect to the listener: front, back, up, and down. According to the obtained results, CNN identified the location of the ensembles with the average accuracy levels of 90.7% and 71.4% when tested under the HRTF-dependent and HRTF-independent conditions, respectively. For HRTF-dependent tests, the accuracy decreased monotonically with the increase in the ensemble size. A modified image occlusion sensitivity technique revealed selected frequency bands as being particularly important in terms of the localization process. These frequency bands are largely in accordance with the psychoacoustical literature.

21 pages, 752 KiB  
Article
Dynamic Binaural Rendering: The Advantage of Virtual Artificial Heads over Conventional Ones for Localization with Speech Signals
by Mina Fallahi, Martin Hansen, Simon Doclo, Steven van de Par, Dirk Püschel and Matthias Blau
Appl. Sci. 2021, 11(15), 6793; https://doi.org/10.3390/app11156793 - 23 Jul 2021
Cited by 3 | Viewed by 2860
Abstract
As an alternative to conventional artificial heads, a virtual artificial head (VAH), i.e., a microphone array-based filter-and-sum beamformer, can be used to create binaural renderings of spatial sound fields. In contrast to conventional artificial heads, a VAH enables one to individualize the binaural renderings and to incorporate head tracking. This can be achieved by applying complex-valued spectral weights—calculated using individual head related transfer functions (HRTFs) for each listener and for different head orientations—to the microphone signals of the VAH. In this study, these spectral weights were applied to measured room impulse responses in an anechoic room to synthesize individual binaural room impulse responses (BRIRs). In the first part of the paper, the results of localizing virtual sources generated with individually synthesized BRIRs and measured BRIRs using a conventional artificial head, for different head orientations, were assessed in comparison with real sources. Convincing localization performances could be achieved for virtual sources generated with both individually synthesized and measured non-individual BRIRs with respect to azimuth and externalization. In the second part of the paper, the results of localizing virtual sources were compared in two listening tests, with and without head tracking. The positive effect of head tracking on the virtual source localization performance confirmed a major advantage of the VAH over conventional artificial heads.
(This article belongs to the Special Issue Psychoacoustics for Extended Reality (XR))
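The filter-and-sum operation at the core of the VAH can be sketched in the frequency domain: each microphone spectrum is multiplied by a complex weight (computed elsewhere from the listener's HRTFs for one ear and one head orientation) and the products are summed. The array shapes and the conjugation convention here are assumptions for illustration, not the authors' code:

```python
import numpy as np

def vah_ear_signal(mic_spectra, weights):
    """Filter-and-sum beamformer output for one ear.
    mic_spectra: complex array (n_mics, n_bins), one STFT frame per mic.
    weights:     complex array (n_mics, n_bins), HRTF-derived weights.
    Returns the synthesized ear-signal spectrum, shape (n_bins,)."""
    return np.sum(np.conj(weights) * mic_spectra, axis=0)
```

Head tracking then amounts to swapping in the weight set precomputed for the current head orientation before each frame is summed.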

20 pages, 5260 KiB  
Article
Creation of Auditory Augmented Reality Using a Position-Dynamic Binaural Synthesis System—Technical Components, Psychoacoustic Needs, and Perceptual Evaluation
by Stephan Werner, Florian Klein, Annika Neidhardt, Ulrike Sloma, Christian Schneiderwind and Karlheinz Brandenburg
Appl. Sci. 2021, 11(3), 1150; https://doi.org/10.3390/app11031150 - 27 Jan 2021
Cited by 19 | Viewed by 5328
Abstract
For a spatial audio reproduction in the context of augmented reality, a position-dynamic binaural synthesis system can be used to synthesize the ear signals for a moving listener. The goal is the fusion of the auditory perception of the virtual audio objects with the real listening environment. Such a system has several components, each of which help to enable a plausible auditory simulation. For each possible position of the listener in the room, a set of binaural room impulse responses (BRIRs) congruent with the expected auditory environment is required to avoid room divergence effects. Adequate and efficient approaches are methods to synthesize new BRIRs using very few measurements of the listening room. The required spatial resolution of the BRIR positions can be estimated by spatial auditory perception thresholds. Retrieving and processing the tracking data of the listener’s head-pose and position as well as convolving BRIRs with an audio signal needs to be done in real-time. This contribution presents work done by the authors including several technical components of such a system in detail. It shows how the single components are affected by psychoacoustics. Furthermore, the paper also discusses the perceptive effect by means of listening tests demonstrating the appropriateness of the approaches.
(This article belongs to the Special Issue Psychoacoustics for Extended Reality (XR))

14 pages, 251 KiB  
Review
Ecological Validity of Immersive Virtual Reality (IVR) Techniques for the Perception of Urban Sound Environments
by Chunyang Xu, Tin Oberman, Francesco Aletta, Huan Tong and Jian Kang
Acoustics 2021, 3(1), 11-24; https://doi.org/10.3390/acoustics3010003 - 25 Dec 2020
Cited by 33 | Viewed by 7552
Abstract
Immersive Virtual Reality (IVR) is a simulated technology used to deliver multisensory information to people under different environmental conditions. When IVR is generally applied in urban planning and soundscape research, it reveals attractive possibilities for the assessment of urban sound environments with higher immersion for human participation. In virtual sound environments, various topics and measures are designed to collect subjective responses from participants under simulated laboratory conditions. Soundscape or noise assessment studies during virtual experiences adopt an evaluation approach similar to in situ methods. This paper aims to review the approaches that are utilized to assess the ecological validity of IVR for the perception of urban sound environments and the necessary technologies during audio–visual reproduction to establish a dynamic IVR experience that ensures ecological validity. The review shows that, through the use of laboratory tests including subjective response surveys, cognitive performance tests and physiological responses, the ecological validity of IVR can be assessed for the perception of urban sound environments. The reproduction system with head-tracking functions synchronizing spatial audio and visual stimuli (e.g., head-mounted displays (HMDs) with first-order Ambisonics (FOA)-tracked binaural playback) represents the prevailing trend to achieve high ecological validity. These studies potentially contribute to the outcomes of a normalized evaluation framework for subjective soundscape and noise assessments in virtual environments.
(This article belongs to the Collection Featured Position and Review Papers in Acoustics Science)
16 pages, 5101 KiB  
Article
Binaural Rendering with Measured Room Responses: First-Order Ambisonic Microphone vs. Dummy Head
by Markus Zaunschirm, Matthias Frank and Franz Zotter
Appl. Sci. 2020, 10(5), 1631; https://doi.org/10.3390/app10051631 - 29 Feb 2020
Cited by 26 | Viewed by 5891
Abstract
To improve the limited degree of immersion of static binaural rendering for headphones, an increased measurement effort to obtain multiple-orientation binaural room impulse responses (MOBRIRs) is reasonable and enables dynamic variable-orientation rendering. We investigate the perceptual characteristics of dynamic rendering from MOBRIRs and test for the required angular resolution. Our first listening experiment shows that a resolution between 15° and 30° is sufficient to accomplish binaural rendering of high quality, regarding timbre, spatial mapping, and continuity. A more versatile alternative considers the separation of the room-dependent (RIR) from the listener-dependent head-related (HRIR) parts, and an efficient implementation thereof involves the measurement of a first-order Ambisonic RIR (ARIR) with a tetrahedral microphone. A resolution-enhanced ARIR can be obtained by an Ambisonic spatial decomposition method (ASDM) utilizing instantaneous direction of arrival estimation. ASDM permits dynamic rendering in higher-order Ambisonics, with the flexibility to render either using dummy-head or individualized HRIRs. Our comparative second listening experiment shows that 5th-order ASDM outperforms the MOBRIR rendering with resolutions coarser than 30° for all tested perceptual aspects. Both listening experiments are based on BRIRs and ARIRs measured in a studio environment.
(This article belongs to the Special Issue Sound and Music Computing -- Music and Interaction)
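Dynamic rendering from MOBRIRs requires mapping the tracked head yaw to one of the measured orientations at each audio block. A minimal nearest-neighbour selection with wrap-around at 360° (the grid values and function name are illustrative, not taken from the paper) could look like:

```python
import numpy as np

def nearest_brir_index(yaw_deg, measured_yaws_deg):
    """Return the index of the measured BRIR orientation closest to
    the tracked head yaw, with angular differences wrapped to the
    range (-180°, +180°] so that e.g. 350° is near 0°."""
    diffs = (np.asarray(measured_yaws_deg, dtype=float) - yaw_deg + 180.0) % 360.0 - 180.0
    return int(np.argmin(np.abs(diffs)))
```

In practice the selected BRIR pair would be crossfaded with the previous one to avoid audible switching artifacts when the head moves across grid boundaries.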

16 pages, 579 KiB  
Article
Optimization of Virtual Loudspeakers for Spatial Room Acoustics Reproduction with Headphones
by Otto Puomio, Jukka Pätynen and Tapio Lokki
Appl. Sci. 2017, 7(12), 1282; https://doi.org/10.3390/app7121282 - 9 Dec 2017
Cited by 11 | Viewed by 5543
Abstract
The use of headphones in reproducing spatial sound is becoming more and more popular. For instance, virtual reality applications often use head-tracking to keep the binaurally reproduced auditory environment stable and to improve externalization. Here, we study one spatial sound reproduction method over headphones, in particular the positioning of the virtual loudspeakers. The paper presents an algorithm that optimizes the positioning of virtual reproduction loudspeakers to reduce the computational cost in head-tracked real-time rendering. The listening test results suggest that listeners could discriminate the optimized loudspeaker arrays for renderings that reproduced relatively simple acoustic conditions, but the optimized array was not significantly different from an equally spaced array for the reproduction of a more complex case. Moreover, the optimization seems to change the perceived openness and timbre, according to the verbal feedback of the test subjects.
(This article belongs to the Special Issue Sound and Music Computing)
