Search Results (34)

Search Parameters:
Keywords = Ambisonics

19 pages, 805 KB  
Article
Antiphonal to Ambisonics: A Practice-Based Investigation of Spatial Choral Composition Through Built Environment Materiality
by Declan Tuite
Arts 2025, 14(6), 135; https://doi.org/10.3390/arts14060135 - 4 Nov 2025
Viewed by 844
Abstract
This paper presents Macalla, a practice-based research project investigating how architectural spaces function as co-creative instruments in Ambisonic choral composition. Comprising four original compositions, Macalla employed Nelson’s praxis model, integrating creative practice with critical reflection through iterative cycles of composition, anechoic vocal recording, and site-specific re-recording. The project explored six contrasting architecturally significant spaces, including a gaol, churches, and civic offices. Using a stop-motion stem playback methodology, studio-recorded vocals were reintroduced to architectural spaces, revealing emergent sonic properties that challenged compositional intentions and generated new musical possibilities. The resulting Ambisonic works were disseminated through multiple formats, including VR/360 video via YouTube, octophonic concert performance, and immersive headphone experiences, to maximize accessibility. Analysis of listener behaviours identified distinct engagement patterns: seekers actively hunting optimal positions and dwellers settling into meditative reception, suggesting spatial compositions contain multiple potential works activated through listener choice. The project contributes empirical evidence of acoustic agency, with documented sonic transformations demonstrating that architectural spaces actively participate in composition rather than passively containing it. This research offers methodological frameworks for site-specific spatial audio creation while advancing understanding of how Ambisonic technology can transform the composer-performer-listener relationship in contemporary musical practice.
(This article belongs to the Special Issue Creating Musical Experiences)

27 pages, 1533 KB  
Article
Sound Source Localization Using Hybrid Convolutional Recurrent Neural Networks in Undesirable Conditions
by Bastian Estay Zamorano, Ali Dehghan Firoozabadi, Alessio Brutti, Pablo Adasme, David Zabala-Blanco, Pablo Palacios Játiva and Cesar A. Azurdia-Meza
Electronics 2025, 14(14), 2778; https://doi.org/10.3390/electronics14142778 - 10 Jul 2025
Cited by 2 | Viewed by 1957
Abstract
Sound event localization and detection (SELD) is a fundamental task in spatial audio processing that involves identifying both the type and location of sound events in acoustic scenes. Current SELD models often struggle with low signal-to-noise ratios (SNRs) and high reverberation. This article addresses SELD by reformulating direction of arrival (DOA) estimation as a multi-class classification task, leveraging deep convolutional recurrent neural networks (CRNNs). We propose and evaluate two modified architectures: M-DOAnet, an optimized version of DOAnet for localization and tracking, and M-SELDnet, a modified version of SELDnet designed for joint SELD. Both modified models were rigorously evaluated on the STARSS23 dataset, which comprises real-world indoor scenes spanning 13 sound classes and totaling over 7 h of audio, using spectrograms and acoustic intensity maps from first-order Ambisonics (FOA) signals. M-DOAnet achieved exceptional localization (6.00° DOA error, 72.8% F1-score) and perfect tracking (100% MOTA with zero identity switches). It also demonstrated high computational efficiency, training in 4.5 h (164 s/epoch). In contrast, M-SELDnet delivered strong overall SELD performance (0.32 rad DOA error, 0.75 F1-score, 0.38 error rate, 0.20 SELD score), but with significantly higher resource demands, training in 45 h (1620 s/epoch). Our findings underscore a clear trade-off between model specialization and multifunctionality, providing practical insights for designing SELD systems in real-time and computationally constrained environments.
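To make the reformulation concrete — regressing a continuous angle becomes picking a cell on a discretized angular grid — here is a minimal sketch; the 10° grid resolution and helper names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def doa_to_class(azimuth_deg, elevation_deg, az_step=10, el_step=10):
    """Map a continuous DOA to a class index on a uniform angular grid."""
    az_bin = int((azimuth_deg % 360) // az_step)       # azimuth wraps to [0, 360)
    el_bin = int((elevation_deg + 90) // el_step)      # elevation in [-90, 90]
    el_bin = min(el_bin, (180 // el_step) - 1)         # clamp +90° into top bin
    n_az = 360 // az_step
    return el_bin * n_az + az_bin

def class_to_doa(class_idx, az_step=10, el_step=10):
    """Recover the grid-cell-center DOA for a predicted class index."""
    n_az = 360 // az_step
    el_bin, az_bin = divmod(class_idx, n_az)
    azimuth = az_bin * az_step + az_step / 2
    elevation = el_bin * el_step - 90 + el_step / 2
    return azimuth, elevation
```

The network then outputs one softmax score per grid cell, and localization error is bounded below by half the grid step — the resolution/complexity trade-off such models tune.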

16 pages, 4815 KB  
Article
Minimum Audible Angle in 3rd-Order Ambisonics in Horizontal Plane for Different Ambisonic Decoders
by Katarzyna Sochaczewska, Karolina Prawda, Paweł Małecki, Magdalena Piotrowska and Jerzy Wiciak
Appl. Sci. 2025, 15(12), 6815; https://doi.org/10.3390/app15126815 - 17 Jun 2025
Viewed by 1436
Abstract
As immersive audio is gaining popularity, the perceptual aspects of spatial sound reproduction become relevant. The authors investigate a measure related to spatial resolution, the Minimum Audible Angle (MAA), which is understudied in the context of Ambisonics. This study examines MAA thresholds in the horizontal plane for three ambisonic decoders—the Sample Ambisonic Decoder (SAD), Energy-Preserving Ambisonic Decoder (EPAD), and All-Round Ambisonic Decoder (AllRAD). The results demonstrate that the decoder type influences spatial resolution, with the EPAD exhibiting superior performance in MAA thresholds (1.24° at 0° azimuth) compared to the SAD and AllRAD. These differences reflect the discrepancies in the decoders’ energy vector distribution and angular error. The MAA values remain consistent between decoders up to 30° azimuth but diverge significantly beyond this range, especially in the 60–135° region corresponding to the cone of confusion. The findings of this study provide valuable insights for spatial audio applications based on ambisonic technology.
(This article belongs to the Special Issue Musical Acoustics and Sound Perception)

17 pages, 16419 KB  
Article
A Wearable Microphone Array Helmet for Automotive Applications
by Daniel Pinardi, Andrea Toscani, Marco Binelli, Angelo Farina and Jong-Suh Park
Appl. Sci. 2025, 15(6), 3202; https://doi.org/10.3390/app15063202 - 14 Mar 2025
Cited by 1 | Viewed by 4336
Abstract
Growing interest in microphone array technology has been observed in the automotive industry, specifically for Active Noise Control (ANC) systems. However, human presence limits the usage of microphone arrays at the driver’s seat in driving conditions. Since this is often the most important position in the car cabin, a wearable microphone array is particularly interesting. In this paper, a wearable helmet microphone array is presented, featuring 32 microphones arranged over the surface of a helmet, which also integrates a specially designed Analog-to-Digital (A/D) converter delivering digital signals over the Automotive Audio Bus (A2B). Digital signals are collected by a control unit located in the passenger compartment. The control unit can either deliver digital signals to a personal computer or analog signals to an external acquisition system, by means of Digital-to-Analog (D/A) converters. A prototype was built and acoustically characterized to calculate the beamforming filter matrix required to convert the recordings (pressure signals) into Ambisonics signals (a spatial audio format). The proposed solution was compared to a reference spherical microphone array of the last decade, demonstrating better performance in sound source localization at low frequencies, where ANC systems are most effective.

33 pages, 46059 KB  
Article
Real and Virtual Lecture Rooms: Validation of a Virtual Reality System for the Perceptual Assessment of Room Acoustical Quality
by Angela Guastamacchia, Riccardo Giovanni Rosso, Giuseppina Emma Puglisi, Fabrizio Riente, Louena Shtrepi and Arianna Astolfi
Acoustics 2024, 6(4), 933-965; https://doi.org/10.3390/acoustics6040052 - 30 Oct 2024
Cited by 2 | Viewed by 3760
Abstract
Enhancing the acoustical quality in learning environments is necessary, especially for hearing aid (HA) users. When in-field evaluations cannot be performed, virtual reality (VR) can be adopted for acoustical quality assessments of existing and new buildings, contributing to the acquisition of subjective impressions in lab settings. To ensure an accurate spatial reproduction of the sound field in VR for HA users, multi-speaker-based systems can be employed to auralize a given environment. However, most such systems demand considerable effort in terms of cost, size, and construction. This work deals with the validation of a VR system based on a 16-speaker array synced with a VR headset, arranged to be easily replicated in small non-anechoic spaces and suitable for HA users. Both objective and subjective validations are performed against a real university lecture room of 800 m³ with 2.3 s of reverberation time at mid-frequencies. Comparisons of binaural and monaural room acoustic parameters are performed between measurements in the real lecture room and its lab reproduction. To validate the audiovisual experience, 32 normal-hearing subjects were administered the Igroup Presence Questionnaire (IPQ) on the overall sense of perceived presence. The outcomes confirm that the system is a promising and feasible tool to predict the perceived acoustical quality of a room.
(This article belongs to the Special Issue Acoustical Comfort in Educational Buildings)

15 pages, 4269 KB  
Article
Assessing Ambisonics Sound Source Localization by Means of Virtual Reality and Gamification Tools
by Esaú Medina, Rhoddy Viveros-Muñoz and Felipe Otondo
Appl. Sci. 2024, 14(17), 7986; https://doi.org/10.3390/app14177986 - 6 Sep 2024
Cited by 2 | Viewed by 3506
Abstract
Sound localization is a key area of interest in auditory research, especially in complex acoustic environments. This study evaluates the impact of incorporating higher-order Ambisonics (HOA) with virtual reality (VR) and gamification tools on sound source localization. The research addresses the current limitations in VR audio systems, particularly the lack of native support for HOA in game engines like Unreal Engine (UE). A novel framework was developed, combining UE for VR graphics rendering and Max for HOA audio processing. Participants performed sound source localization tasks in two VR environments using a head-mounted display (HMD). The assessment included both horizontal and vertical plane localization. Gamification elements were introduced to improve engagement and task comprehension. Results showed significant improvements in horizontal localization accuracy, although challenges remained in back localization. The findings underscore the potential of VR and gamification to enhance auditory tests, reducing test duration and participant fatigue. This research contributes to the development of immersive and interactive audio experiences, highlighting the broader applications of VR beyond entertainment.

23 pages, 11519 KB  
Article
A Quantitative and Qualitative Experimental Framework for the Evaluation of Urban Soundscapes: Application to the City of Sidi Bou Saïd
by Mohamed Amin Hammami and Christophe Claramunt
ISPRS Int. J. Geo-Inf. 2024, 13(5), 152; https://doi.org/10.3390/ijgi13050152 - 1 May 2024
Cited by 1 | Viewed by 3678
Abstract
This research introduces an experimental framework based on 3D acoustic and psycho-acoustic sensors, supplemented with ambisonics and sound morphological analysis, whose objective is to study urban soundscapes. A questionnaire that highlights the differences between what has been measured and what has been perceived by humans complements the quantitative approach with a qualitative evaluation. The comparison of the measurements with the questionnaire provides a global vision of the perception of these soundscapes, as well as their differences and similarities. The approach is applied experimentally in the historical center of the Tunisian city of Sidi Bou Saïd, demonstrating that, with a range of complementary protocols, a soundscape environment can be qualified. This framework provides an additional dimension to urban planning studies.

19 pages, 31042 KB  
Article
Room Impulse Response Dataset of a Recording Studio with Variable Wall Paneling Measured Using a 32-Channel Spherical Microphone Array and a B-Format Microphone Array
by Grace Chesworth, Amy Bastine and Thushara Abhayapala
Appl. Sci. 2024, 14(5), 2095; https://doi.org/10.3390/app14052095 - 2 Mar 2024
Cited by 1 | Viewed by 4273
Abstract
This paper introduces RSoANU, a dataset of real multichannel room impulse responses (RIRs) obtained in a recording studio. Compared to the current publicly available datasets, RSoANU distinguishes itself by featuring RIRs captured using both a 32-channel spherical microphone array (mh acoustics em32 Eigenmike) and a B-format soundfield microphone array (Rode NT-SF1). The studio incorporates variable wall panels in felt and wood options, with measurements conducted for two configurations: all panels set to wood or felt. Three source positions that emulate typical performance locations were considered. RIRs were collected over a planar receiver grid spanning the room, with the microphone array centered at a height of 1.7 m. The paper includes an analysis of acoustic parameters derived from the dataset, revealing notable distinctions between felt and wood panel environments. Felt panels exhibit faster decay, higher clarity, and superior definition in mid-to-high frequencies. The analysis across the receiver grid emphasizes the impact of room geometry and source–receiver positions on reverberation time and clarity. The study also notes spatial variations in parameters obtained from the two microphone arrays, suggesting potential for future research into their specific capabilities for room acoustic characterization.
(This article belongs to the Section Acoustics and Vibrations)
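Clarity, one of the acoustic parameters analysed above, is a standard early-to-late energy ratio computed directly from an RIR. A minimal sketch of the common C50 definition (the function name and 50 ms split follow the textbook convention, not the dataset's own tooling):

```python
import numpy as np

def clarity(rir, fs, t_ms=50):
    """Clarity index C_t in dB: ratio of early to late energy in an RIR.

    rir  -- room impulse response samples (1D array)
    fs   -- sampling rate in Hz
    t_ms -- early/late split after the direct sound (50 ms gives C50)
    """
    onset = int(np.argmax(np.abs(rir)))        # direct-sound arrival
    split = onset + int(fs * t_ms / 1000)
    early = np.sum(rir[onset:split] ** 2)
    late = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(early / late)
```

Higher C50 (more early energy) corresponds to the "higher clarity" reported for the felt-panel configuration.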

17 pages, 3780 KB  
Article
Recording, Processing, and Reproduction of Vibrations Produced by Impact Noise Sources in Buildings
by Franz Dolezal, Andreas Reichenauer, Armin Wilfling, Maximilian Neusser and Rok Prislan
Acoustics 2024, 6(1), 97-113; https://doi.org/10.3390/acoustics6010006 - 17 Jan 2024
Cited by 2 | Viewed by 4209
Abstract
Several studies on the perception of impact sounds question the correlation of standardized approaches with perceived annoyance, while more recent studies have come to inconsistent conclusions. All of these studies neglected whole-body vibrations, which are known to be relevant for the perception of low-frequency sound and are especially perceptible in lightweight constructions. The contribution of vibrations to impact sound annoyance is still unknown and could be the reason for the contradictory results. To investigate this aspect, we measured vibrations on different types of floors under laboratory conditions and in situ. For this purpose, a vibration-sensing device was developed to record vibrations more cost-effectively and independently of commercial recording instruments. The vibrations of predefined impact sequences were recorded together with the sound field using a higher-order ambisonics microphone. In addition, a vibration exposure device was developed to expose test subjects to the exact vibrations that occur in the built environment. The vibration exposure device is integrated into the ambisonics reproduction system, which consists of a large number of loudspeakers in a spherical configuration. The article presents the development and performance of the vibration-sensing unit and the vibration exposure device. The study is relevant for conducting future impact sound listening tests under laboratory conditions, which can be extended to include the reproduction of vibrations.
(This article belongs to the Special Issue Building Materials and Acoustics)

16 pages, 1632 KB  
Article
Upmix B-Format Ambisonic Room Impulse Responses Using a Generative Model
by Jiawei Xia and Wen Zhang
Appl. Sci. 2023, 13(21), 11810; https://doi.org/10.3390/app132111810 - 29 Oct 2023
Cited by 3 | Viewed by 3120
Abstract
Ambisonic room impulse responses (ARIRs) are recorded to capture the spatial acoustic characteristics of specific rooms, with widespread applications in virtual and augmented reality. While the first-order Ambisonics (FOA) microphone array is commonly employed for three-dimensional (3D) room acoustics recording due to its easy accessibility, higher spatial resolution necessitates using higher-order Ambisonics (HOA) in applications such as binaural rendering and sound field reconstruction. This paper introduces a novel approach, leveraging generative models to upmix ARIRs. The evaluation results validate the model’s effectiveness at upmixing first-order ARIRs to higher-order representations, surpassing the aliasing frequency limitations. Furthermore, the spectral errors observed in the Binaural Room Transfer Functions (BRTFs) indicate the potential benefits of using upmixed ARIRs for binaural rendering, significantly improving rendering accuracy.
(This article belongs to the Section Acoustics and Vibrations)
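For orientation, first-order (B-format) encoding itself is compact: a mono source is weighted by four spherical-harmonic gains. A minimal sketch using the traditional FuMa convention (W attenuated by √2); the function name is illustrative, and this is the plain encoding equation, not the paper's generative upmixing model:

```python
import numpy as np

def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal into first-order B-format (FuMa W, X, Y, Z).

    azimuth/elevation are the source direction in radians.
    Returns an array of shape (4, len(signal)).
    """
    w = signal / np.sqrt(2.0)                          # omni, -3 dB FuMa weight
    x = signal * np.cos(azimuth) * np.cos(elevation)   # front-back figure-8
    y = signal * np.sin(azimuth) * np.cos(elevation)   # left-right figure-8
    z = signal * np.sin(elevation)                     # up-down figure-8
    return np.stack([w, x, y, z])
```

Higher orders add further spherical-harmonic channels ((N+1)² in total for order N), which is exactly the information an FOA-to-HOA upmixer must estimate.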

27 pages, 7639 KB  
Article
Virtual Urban Field Studies: Evaluating Urban Interaction Design Using Context-Based Interface Prototypes
by Robert Dongas, Kazjon Grace, Samuel Gillespie, Marius Hoggenmueller, Martin Tomitsch and Stewart Worrall
Multimodal Technol. Interact. 2023, 7(8), 82; https://doi.org/10.3390/mti7080082 - 18 Aug 2023
Cited by 5 | Viewed by 3434
Abstract
In this study, we propose the use of virtual urban field studies (VUFS) through context-based interface prototypes for evaluating the interaction design of auditory interfaces. Virtual field tests use mixed-reality technologies to combine the fidelity of real-world testing with the affordability and speed of testing in the lab. In this paper, we apply this concept to rapidly test sound designs for autonomous vehicle (AV)–pedestrian interaction with a high degree of realism and fidelity. We also propose the use of psychometrically validated measures of presence to validate the verisimilitude of VUFS. Using mixed qualitative and quantitative methods, we analysed users’ perceptions of presence in our VUFS prototype and their relationship to the prototype’s effectiveness. We also examined the use of higher-order ambisonic spatialised audio and its impact on presence. Our results provide insights into how VUFS can be designed to facilitate presence, as well as design guidelines for how this can be leveraged.

21 pages, 6026 KB  
Article
Flying a Quadcopter—An Audio Entertainment and Training Game for the Visually Impaired
by Silviu Ivascu, Florica Moldoveanu, Alin Moldoveanu, Anca Morar, Ana-Maria Tugulea and Victor Asavei
Appl. Sci. 2023, 13(11), 6769; https://doi.org/10.3390/app13116769 - 2 Jun 2023
Cited by 4 | Viewed by 3015
Abstract
With the increase in the number of sensory substitution devices, the engineering community is confronted with a new challenge: ensuring user training in safe virtual environments before using these devices in real-life situations. We developed a game that uses an original sonification model, which, although not specific to a certain substitution device, can be an effective means of training for orientation in space based on audio stimuli. Thus, the game is not only a means of entertainment for visually impaired (VI) people but also one of training for the use of assistive devices. The game design and audio design are original contributions by the authors. The sonification model, which is crucial for a game dedicated to visually impaired people, is described in detail, both at the user and the implementation level. For better immersion, special sound design techniques have been used, such as ambisonic recordings and impulse response (IR) recordings. The game has been improved gradually, especially the sonification model, based on users’ feedback.

28 pages, 11874 KB  
Article
Speaker Counting Based on a Novel Hive Shaped Nested Microphone Array by WPT and 2D Adaptive SRP Algorithms in Near-Field Scenarios
by Ali Dehghan Firoozabadi, Pablo Adasme, David Zabala-Blanco, Pablo Palacios Játiva and Cesar Azurdia-Meza
Sensors 2023, 23(9), 4499; https://doi.org/10.3390/s23094499 - 5 May 2023
Viewed by 2510
Abstract
Sound source localization (SSL), speech enhancement, and speaker tracking are among the main fields of speech processing, and most of their algorithms require knowing the number of speakers for real implementation. In this article, a novel method for estimating the number of speakers is proposed, based on a hive-shaped nested microphone array (HNMA), wavelet packet transform (WPT), and 2D sub-band adaptive steered response power (SB-2DASRP) with phase transform (PHAT) and maximum likelihood (ML) filters, followed by agglomerative clustering with the elbow criterion to obtain the number of speakers in near-field scenarios. The proposed HNMA is presented for aliasing and imaging elimination and for preparing the proper signals for the speaker counting method. The Blackman–Tukey spectral estimation method is selected for detecting the proper frequency components of the recorded signal. The WPT is considered for smart sub-band processing by focusing on the frequency bins of the speech signal. In addition, the SRP method is implemented in 2D format and adaptively with ML and PHAT filters on the sub-band signals. The SB-2DASRP peak positions are extracted over various time frames based on the standard deviation (SD) criterion, and the final number of speakers is estimated by unsupervised agglomerative clustering and the elbow criterion. The proposed HNMA-SB-2DASRP method is compared with the frequency-domain magnitude squared coherence (FD-MSC), i-vector probabilistic linear discriminant analysis (i-vector PLDA), ambisonics features of the correlational recurrent neural network (AF-CRNN), and speaker counting by density-based classification and clustering decision (SC-DCCD) algorithms in noisy and reverberant environments, demonstrating the superiority of the proposed method for real implementation.
(This article belongs to the Special Issue Localising Sensors through Wireless Communication)
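The PHAT weighting used above is easiest to see in the two-microphone case (GCC-PHAT), where the cross-power spectrum is whitened so that only phase — i.e., delay — information survives, improving robustness to reverberation. A minimal sketch of the generic textbook form, not the paper's 2D sub-band adaptive variant:

```python
import numpy as np

def gcc_phat(sig, ref, fs=16000):
    """Estimate the time delay of `sig` relative to `ref` via GCC-PHAT."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12            # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # center zero lag
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                 # delay in seconds
```

Steering this delay estimate over a grid of candidate source positions, and summing across microphone pairs, yields the steered response power (SRP-PHAT) map whose peaks the counting method clusters.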

21 pages, 1923 KB  
Article
Binaural Auralization of Room Acoustics with a Highly Scalable Wave-Based Acoustics Simulation
by Takumi Yoshida, Takeshi Okuzono and Kimihiro Sakagami
Appl. Sci. 2023, 13(5), 2832; https://doi.org/10.3390/app13052832 - 22 Feb 2023
Cited by 4 | Viewed by 3822
Abstract
This paper proposes an efficient binaural room-acoustics auralization method, addressing an essential goal of room-acoustics modeling. The method uses a massively parallel wave-based room-acoustics solver based on a dispersion-optimized explicit time-domain finite element method (TD-FEM). The binaural room-acoustics auralization uses a hybrid technique of first-order Ambisonics (FOA) and head-related transfer functions. Ambisonics encoding uses room impulse responses computed by a parallel wave-based room-acoustics solver that can model sound absorbers with complex-valued surface impedance. Details are given of the novel procedure for computing the expansion coefficients of the spherical harmonics composing the FOA signal. This report is the first to present a parallel wave-based solver able to simulate room impulse responses within practical computational times using an HPC cloud environment. A meeting room problem and a classroom problem, having 35 million and 100 million degrees of freedom (DOF), respectively, are used to test the parallel performance of up to 6144 CPU cores. Then, the potential of the proposed binaural room-acoustics auralization method is demonstrated via an auditorium acoustics simulation of up to 5 kHz having 750 million DOF. Room-acoustics auralization is performed with two acoustic treatment scenarios, and room-acoustics evaluations use an FOA signal, binaural room impulse responses, and four room acoustical parameters. The auditorium acoustics simulation showed that the proposed method enables binaural room-acoustics auralization within 13,000 s using 6144 cores.

14 pages, 3232 KB  
Article
Deep Learning-Based Acoustic Echo Cancellation for Surround Sound Systems
by Guoteng Li, Chengshi Zheng, Yuxuan Ke and Xiaodong Li
Appl. Sci. 2023, 13(3), 1266; https://doi.org/10.3390/app13031266 - 17 Jan 2023
Cited by 3 | Viewed by 6845
Abstract
Surround sound systems that play back multi-channel audio signals through multiple loudspeakers can improve augmented reality and have been widely used in many multimedia communication systems. It is common for a hands-free speech communication system to suffer from the acoustic echo problem, and the echo needs to be canceled or suppressed completely. This paper proposes a deep learning-based acoustic echo cancellation (AEC) method to recover the desired near-end speech from the microphone signals in surround sound systems. The ambisonics technique was adopted to record the surround sound for reproduction. To achieve better generalization across different loudspeaker layouts, the compressed complex spectra of the first-order ambisonic signals (B-format) were sent to the neural network directly as input features, instead of using the ambisonic decoded signals (D-format). Experimental results in both simulated and real acoustic environments showed that the proposed algorithm is effective in surround AEC and outperforms competing methods in terms of speech quality and the amount of echo reduction.
(This article belongs to the Special Issue Advances in Speech and Language Processing)
