Search Results (18)

Search Parameters:
Keywords = sound source localization (SSL)

28 pages, 7844 KB  
Article
Three-Dimensional Sound Source Localization with Microphone Array Combining Spatial Entropy Quantification and Machine Learning Correction
by Guangneng Li, Feiyu Zhao, Wei Tian and Tong Yang
Entropy 2025, 27(9), 942; https://doi.org/10.3390/e27090942 - 9 Sep 2025
Viewed by 892
Abstract
In recent years, with the popularization of intelligent scene monitoring, sound source localization (SSL) has become a major means for indoor monitoring and target positioning. However, existing sound source localization solutions are difficult to extend to multi-source and three-dimensional scenarios. To address this, this paper proposes a three-dimensional sound source localization technology based on eight microphones. Specifically, the method employs a rectangular eight-microphone array and captures Direction-of-Arrival (DOA) information via the direct path relative transfer function (DP-RTF). It introduces spatial entropy to quantify the uncertainty caused by the exponentially growing DOA combinations as the number of sound sources increases, while further reducing the spatial entropy of sound source localization through geometric intersection. This solves the problem that traditional sound source localization methods cannot be applied to multi-source and three-dimensional scenarios. On the other hand, machine learning is used to eliminate coordinate deviations caused by DOA estimation errors of the direct path relative transfer function (DP-RTF) and deviations in microphone geometric parameters. Both simulation experiments and real-scene experiments show that the positioning error of the proposed method in three-dimensional scenarios is about 10.0 cm. Full article
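The entropy argument above can be illustrated with a toy calculation (an illustrative sketch only; the paper's exact entropy definition, DOA model, and pruning rule are not reproduced here). With several sources, the number of ways to associate per-array DOA estimates with sources grows combinatorially, and geometric intersection of the DOA rays prunes inconsistent associations, lowering the entropy of the hypothesis set:

```python
import numpy as np

def spatial_entropy(p):
    """Shannon entropy (bits) over candidate DOA-combination hypotheses."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log2(p)).sum())

# Three sources observed by two sub-arrays: 3! = 6 ways to pair up the DOAs.
before = spatial_entropy(np.ones(6))   # all pairings equally likely
# Geometric intersection: suppose rays from the two sub-arrays only meet
# (within tolerance) for two of the pairings, shrinking the hypothesis set.
after = spatial_entropy(np.ones(2))
```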

51 pages, 15030 KB  
Review
A Review on Sound Source Localization in Robotics: Focusing on Deep Learning Methods
by Reza Jalayer, Masoud Jalayer and Amirali Baniasadi
Appl. Sci. 2025, 15(17), 9354; https://doi.org/10.3390/app15179354 - 26 Aug 2025
Cited by 1 | Viewed by 1578
Abstract
Sound source localization (SSL) adds a spatial dimension to auditory perception, allowing a system to pinpoint the origin of speech, machinery noise, warning tones, or other acoustic events, capabilities that facilitate robot navigation, human–machine dialogue, and condition monitoring. While existing surveys provide valuable historical context, they typically address general audio applications and do not fully account for robotic constraints or the latest advancements in deep learning. This review addresses these gaps by offering a robotics-focused synthesis, emphasizing recent progress in deep learning methodologies. We start by reviewing classical methods such as time difference of arrival (TDOA), beamforming, steered-response power (SRP), and subspace analysis. Subsequently, we delve into modern machine learning (ML) and deep learning (DL) approaches, discussing traditional ML and neural networks (NNs), convolutional neural networks (CNNs), convolutional recurrent neural networks (CRNNs), and emerging attention-based architectures. The data and training strategy that are the two cornerstones of DL-based SSL are explored. Studies are further categorized by robot types and application domains to facilitate researchers in identifying relevant work for their specific contexts. Finally, we highlight the current challenges in SSL works in general, regarding environmental robustness, sound source multiplicity, and specific implementation constraints in robotics, as well as data and learning strategies in DL-based SSL. Also, we sketch promising directions to offer an actionable roadmap toward robust, adaptable, efficient, and explainable DL-based SSL for next-generation robots. Full article
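To make the classical baseline concrete, here is a minimal GCC-PHAT time-difference-of-arrival estimator (a sketch with a made-up sampling rate and synthetic signals, not code from the review):

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """TDOA of `sig` relative to `ref` via generalized cross-correlation with PHAT."""
    n = sig.size + ref.size                     # zero-pad to avoid circular wrap-around
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12                      # PHAT: keep phase, flatten magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)                   # signal at the reference microphone
y = np.zeros_like(x)
y[25:] = x[:-25]                                # same signal, delayed by 25 samples
tau = gcc_phat(y, x, fs)                        # recovered TDOA in seconds
```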

19 pages, 3044 KB  
Review
Deep Learning-Based Sound Source Localization: A Review
by Kunbo Xu, Zekai Zong, Dongjun Liu, Ran Wang and Liang Yu
Appl. Sci. 2025, 15(13), 7419; https://doi.org/10.3390/app15137419 - 2 Jul 2025
Viewed by 1805
Abstract
As a fundamental technology in environmental perception, sound source localization (SSL) plays a critical role in public safety, marine exploration, and smart home systems. However, traditional methods such as beamforming and time-delay estimation rely on manually designed physical models and idealized assumptions, which struggle to meet practical demands in dynamic and complex scenarios. Recent advancements in deep learning have revolutionized SSL by leveraging its end-to-end feature adaptability, cross-scenario generalization capabilities, and data-driven modeling, significantly enhancing localization robustness and accuracy in challenging environments. This review systematically examines the progress of deep learning-based SSL across three critical domains: marine environments, indoor reverberant spaces, and unmanned aerial vehicle (UAV) monitoring. In marine scenarios, complex-valued convolutional networks combined with adversarial transfer learning mitigate environmental mismatch and multipath interference through phase information fusion and domain adaptation strategies. For indoor high-reverberation conditions, attention mechanisms and multimodal fusion architectures achieve precise localization under low signal-to-noise ratios by adaptively weighting critical acoustic features. In UAV surveillance, lightweight models integrated with spatiotemporal Transformers address dynamic modeling of non-stationary noise spectra and edge computing efficiency constraints. Despite these advancements, current approaches face three core challenges: the insufficient integration of physical principles, prohibitive data annotation costs, and the trade-off between real-time performance and accuracy. Future research should prioritize physics-informed modeling to embed acoustic propagation mechanisms, unsupervised domain adaptation to reduce reliance on labeled data, and sensor-algorithm co-design to optimize hardware-software synergy. These directions aim to propel SSL toward intelligent systems characterized by high precision, strong robustness, and low power consumption. This work provides both theoretical foundations and technical references for algorithm selection and practical implementation in complex real-world scenarios. Full article

25 pages, 7065 KB  
Article
A Planer Moving Microphone Array for Sound Source Localization
by Chuyang Wang, Karhang Chu and Yatsze Choy
Appl. Sci. 2025, 15(12), 6777; https://doi.org/10.3390/app15126777 - 16 Jun 2025
Viewed by 4012
Abstract
Sound source localization (SSL) equips service robots with the ability to perceive sound similarly to humans, which is particularly valuable in complex, dark indoor environments where vision-based systems may not work. From a data collection perspective, increasing the number of microphones generally improves SSL performance. However, a large microphone array such as a 16-microphone array configuration may occupy significant space on a robot. To address this, we propose a novel framework that uses a structure of four planar moving microphones to emulate the performance of a 16-microphone array, thereby saving space. Because of its unique design, this structure can dynamically form various spatial patterns, enabling 3D SSL, including estimation of angle, distance, and height. For experimental comparison, we also constructed a circular 6-microphone array and a planar 4 × 4 microphone array, both capable of rotation to ensure fairness. Three SSL algorithms were applied across all configurations. Experiments were conducted in a standard classroom environment, and the results show that the proposed framework achieves approximately 80–90% accuracy in angular estimation and around 85% accuracy in distance and height estimation, comparable to the performance of the 4 × 4 planar microphone array. Full article
(This article belongs to the Special Issue Noise Measurement, Acoustic Signal Processing and Noise Control)

23 pages, 7047 KB  
Article
UaVirBASE: A Public-Access Unmanned Aerial Vehicle Sound Source Localization Dataset
by Gabriel Jekateryńczuk, Rafał Szadkowski and Zbigniew Piotrowski
Appl. Sci. 2025, 15(10), 5378; https://doi.org/10.3390/app15105378 - 12 May 2025
Cited by 1 | Viewed by 1425
Abstract
This article presents UaVirBASE, a publicly available dataset for the sound source localization (SSL) of unmanned aerial vehicles (UAVs). The dataset contains synchronized multi-microphone recordings captured under controlled conditions, featuring variations in UAV distances, altitudes, azimuths, and orientations relative to a fixed microphone array. UAV orientations include front, back, left, and right-facing configurations. UaVirBASE addresses the growing need for standardized SSL datasets tailored for UAV applications, filling a gap left behind by existing databases that often lack such specific variations. Additionally, we describe the software and hardware employed for data acquisition and annotation alongside an analysis of the dataset’s structure. With its well-annotated and diverse data, UaVirBASE is ideally suited for applications in artificial intelligence, particularly in developing and benchmarking machine learning and deep learning models for SSL. Controlling the dataset’s variations enables the training of AI systems capable of adapting to complex UAV-based scenarios. We also demonstrate the architecture and results of the deep neural network (DNN) trained on this dataset, evaluating model performance across different features. Our results show an average Mean Absolute Error (MAE) of 0.5 m for distance and height, an average azimuth error of around 1 degree, and side errors under 10 degrees. UaVirBASE serves as a valuable resource to support reproducible research and foster innovation in UAV-based acoustic signal processing by addressing the need for a standardized and versatile UAV SSL dataset. Full article
(This article belongs to the Special Issue AI in Audio Analysis: Spectrogram-Based Recognition)
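Azimuth errors like the roughly 1 degree reported above must be computed with 360-degree wrap-around in mind; a minimal sketch of such a metric (this is not the dataset's actual evaluation code, and the sample angles are made up):

```python
import numpy as np

def angular_mae(pred_deg, true_deg):
    """Mean absolute azimuth error with 360-degree wrap-around."""
    diff = np.abs((np.asarray(pred_deg) - np.asarray(true_deg) + 180.0) % 360.0 - 180.0)
    return float(diff.mean())

# a naive |pred - true| would report 358 degrees here instead of 2
err = angular_mae([359.0], [1.0])
```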

18 pages, 10372 KB  
Article
Acoustic Fabry–Perot Resonance Detector for Passive Acoustic Thermometry and Sound Source Localization
by Yan Yue, Zhifei Dong and Zhi-mei Qi
Sensors 2025, 25(8), 2445; https://doi.org/10.3390/s25082445 - 12 Apr 2025
Viewed by 584
Abstract
Acoustic temperature measurement (ATM) and sound source localization (SSL) are two important applications of acoustic sensors. The development of novel acoustic sensors capable of both ATM and SSL is an innovative research topic with great interest. In this work, an acoustic Fabry-Perot resonance detector (AFPRD) and its cross-shaped array were designed and fabricated, and the passive ATM function of the AFPRD and the SSL capability of the AFPRD array were simulated and experimentally verified. The AFPRD consists of an acoustic waveguide and a microphone with its head inserted into the waveguide, which can significantly enhance the microphone’s sensitivity via the FP resonance effect. As a result, the frequency response curve of AFPRD can be easily measured using weak ambient white noise. Based on the measured frequency response curve, the linear relationship between the resonant frequency and the resonant mode order of the AFPRD can be determined, the slope of which can be used to calculate the ambient sound velocity and air temperature. The AFPRD array was prepared by using four bent acoustic waveguides to expand the array aperture, which combined with the multiple signal classification (MUSIC) algorithm can be used for distant multi-target localization. The SSL accuracy can be improved by substituting the sound speed measured in real time into the MUSIC algorithm. The AFPRD’s passive ATM function was verified in an anechoic room with white noise as low as 17 dB, and the ATM accuracy reached 0.4 °C. The SSL function of the AFPRD array was demonstrated in the outdoor environment, and the SSL error of the acoustic target with a sound pressure of 35 mPa was less than 1.2°. The findings open up a new avenue for the development of multifunctional acoustic detection devices and systems. Full article
(This article belongs to the Special Issue Recent Advances in Optical and Optoelectronic Acoustic Sensors)
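The slope-to-temperature idea can be sketched numerically, assuming half-wave resonances f_n = n * c / (2L) and the common approximation c = 331.3 * sqrt(T / 273.15 K); the waveguide length and temperature below are hypothetical, not the paper's values:

```python
import numpy as np

L = 0.17                                   # waveguide length in metres (hypothetical)
c_true = 346.1                             # sound speed at roughly 25 degrees C
orders = np.arange(1, 6)                   # resonant mode orders n = 1..5
freqs = orders * c_true / (2 * L)          # half-wave resonances: f_n = n * c / (2L)

slope = np.polyfit(orders, freqs, 1)[0]    # linear fit of f_n vs n gives c / (2L)
c_est = 2 * L * slope                      # recover the ambient sound speed
T_celsius = 273.15 * (c_est / 331.3) ** 2 - 273.15   # invert c = 331.3*sqrt(T/273.15)
```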

23 pages, 1181 KB  
Article
Diffusion-Based Sound Source Localization Using a Distributed Network of Microphone Arrays
by Davide Albertini, Alberto Bernardini, Gioele Greco and Augusto Sarti
Sensors 2025, 25(7), 2078; https://doi.org/10.3390/s25072078 - 26 Mar 2025
Viewed by 749
Abstract
Traditionally, microphone array networks for 3D sound source localization rely on centralized data processing, which can limit scalability and robustness. In this article, we recast the task of sound source localization (SSL) with networks of acoustic arrays as a distributed optimization problem. We then present two resolution approaches of such a problem; one is computationally centralized, while the other is computationally distributed and based on an Adapt-Then-Combine (ATC) diffusion strategy. In particular, we address 3D SSL with a network of linear microphone arrays, each of which estimates a stream of 2D directions of arrival (DoAs) and they cooperate with each other to localize a single sound source. We develop adaptive cooperation strategies to penalize the arrays with the most detrimental effects on localization accuracy and improve performance through error-based and distance-based penalties. The performance of the method is evaluated using increasingly complex DoA stream models and simulated acoustic environments characterized by various levels of reverberation and signal-to-noise ratio (SNR). Furthermore, we investigate how the performance is related to the connectivity of the network and show that the proposed approach maintains high localization accuracy and stability even in sparsely connected networks. Full article
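A toy Adapt-Then-Combine diffusion loop on a ring network conveys the two-phase idea (uniform combination weights over neighbors; the paper's adaptive error-based and distance-based penalty weights and its DoA stream models are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
true_pos = np.array([2.0, 3.0, 1.5])       # 3D source position to localize
n_nodes = 6
# ring topology: each node communicates with itself and its two neighbors
neighbors = [[(i - 1) % n_nodes, i, (i + 1) % n_nodes] for i in range(n_nodes)]

est = rng.standard_normal((n_nodes, 3))    # each array's initial position estimate
mu = 0.3                                   # adaptation step size
for _ in range(200):
    meas = true_pos + 0.05 * rng.standard_normal((n_nodes, 3))  # noisy local fixes
    psi = est + mu * (meas - est)          # adapt: local LMS-style update
    est = np.array([psi[nb].mean(axis=0) for nb in neighbors])  # combine: average

err = np.linalg.norm(est - true_pos, axis=1).max()   # worst node's residual error
```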

23 pages, 8182 KB  
Article
Sound Source Localization Using Deep Learning for Human–Robot Interaction Under Intelligent Robot Environments
by Hong-Min Jo, Tae-Wan Kim and Keun-Chang Kwak
Electronics 2025, 14(5), 1043; https://doi.org/10.3390/electronics14051043 - 6 Mar 2025
Cited by 2 | Viewed by 1966
Abstract
In this paper, we propose Sound Source Localization (SSL) using deep learning for Human–Robot Interaction (HRI) under intelligent robot environments. The proposed SSL method consists of three steps. The first step preprocesses the sound source to minimize noise and reverberation in the robotic environment. Excitation source information (ESI), which contains only the original components of the sound source, is extracted from a sound source in a microphone array mounted on a robot to minimize background influence. Here, the linear prediction residual is used as the ESI. Subsequently, the cross-correlation signal between each adjacent microphone pair is calculated by using the ESI signal of each sound source. To minimize the influence of noise, a Generalized Cross-Correlation with the phase transform (GCC-PHAT) algorithm is used. In the second step, we design a single-channel, multi-input convolutional neural network that can independently learn the calculated cross-correlation signal between each adjacent microphone pair and the location of the sound source using the time difference of arrival. The third step classifies the location of the sound source after training with the proposed network. Previous studies have primarily used various features as inputs and stacked them into multiple channels, which made the algorithm complex. Furthermore, multi-channel inputs may not be sufficient to clearly train the interrelationship between each sound source. To address this issue, the cross-correlation signal between each sound source alone is used as the network input. The proposed method was verified on the Electronics and Telecommunications Research Institute-Sound Source Localization (ETRI-SSL) database acquired from the robotic environment. The experimental results revealed that the proposed method showed an 8.75% higher performance in comparison to the previous works. Full article
(This article belongs to the Special Issue Control and Design of Intelligent Robots)
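Once a TDOA is read off a GCC-PHAT peak for one adjacent pair, a far-field DOA follows from theta = arcsin(c * tau / d); the spacing and delay below are hypothetical, not values from the paper:

```python
import numpy as np

c = 343.0      # speed of sound in m/s
d = 0.10       # adjacent-microphone spacing in metres (hypothetical)
tau = 1.7e-4   # TDOA read off the GCC-PHAT peak, in seconds (hypothetical)

# far-field model: the wavefront reaches the two mics with path difference c * tau
theta = np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0)))
```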

21 pages, 1104 KB  
Article
Advancing Applications of Robot Audition Systems: Efficient HARK Deployment with GPU and FPGA Implementations
by Zirui Lin, Hideharu Amano, Masayuki Takigahira, Naoya Terakado, Katsutoshi Itoyama, Haris Gulzar and Kazuhiro Nakadai
Chips 2025, 4(1), 2; https://doi.org/10.3390/chips4010002 - 27 Dec 2024
Cited by 1 | Viewed by 1798
Abstract
This paper proposes efficient implementations of robot audition systems, specifically focusing on deployments using HARK, an open-source software (OSS) platform designed for robot audition. Although robot audition systems are versatile and suitable for various scenarios, efficiently deploying them can be challenging due to their high computational demands and extensive processing times. For scenarios involving intensive high-dimensional data processing with large-scale microphone arrays, our generalizable GPU-based implementation significantly reduced processing time, enabling real-time Sound Source Localization (SSL) and Sound Source Separation (SSS) using a 60-channel microphone array across two distinct GPU platforms. Specifically, our implementation achieved speedups of 23.3× for SSL and 3.0× for SSS on a high-performance server equipped with an NVIDIA A100 80 GB GPU. Additionally, on the Jetson AGX Orin 32 GB, which represents embedded environments, it achieved speedups of 14.8× for SSL and 1.6× for SSS. For edge computing scenarios, we developed an adaptable FPGA-based implementation of HARK using High-Level Synthesis (HLS) on M-KUBOS, a Multi-Access Edge Computing (MEC) FPGA Multiprocessor System on a Chip (MPSoC) device. Utilizing an eight-channel microphone array, this implementation achieved a 1.2× speedup for SSL and a 1.1× speedup for SSS, along with a 1.1× improvement in overall energy efficiency. Full article

14 pages, 1309 KB  
Article
Combined Keyword Spotting and Localization Network Based on Multi-Task Learning
by Jungbeom Ko, Hyunchul Kim and Jungsuk Kim
Mathematics 2024, 12(21), 3309; https://doi.org/10.3390/math12213309 - 22 Oct 2024
Viewed by 1600
Abstract
The advent of voice assistance technology and its integration into smart devices has facilitated many useful services, such as texting and application execution. However, most assistive technologies lack the capability to enable the system to act as a human who can localize the speaker and selectively spot meaningful keywords. Because keyword spotting (KWS) and sound source localization (SSL) are essential and must operate in real time, the efficiency of a neural network model is crucial for memory and computation. In this paper, a single neural network model for KWS and SSL is proposed to overcome the limitations of sequential KWS and SSL, which require more memory and inference time. The proposed model uses multi-task learning to utilize the limited resources of the device efficiently. A shared encoder is used as the initial layer to extract common features from the multichannel audio data. Subsequently, the task-specific parallel layers utilize these features for KWS and SSL. The proposed model was evaluated on a synthetic dataset with multiple speakers, and a 7-module shared encoder structure was identified as optimal in terms of accuracy, direction of arrival (DOA) accuracy, DOA error, and latency. It achieved a KWS accuracy of 94.51%, DOA error of 12.397°, and DOA accuracy of 89.86%. Consequently, the proposed model requires significantly less memory owing to the shared network architecture, which enhances the inference time without compromising KWS accuracy, DOA error, and DOA accuracy. Full article
(This article belongs to the Special Issue Computational Intelligence and Machine Learning with Applications)
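The shared-encoder-plus-task-heads layout can be sketched as a single forward pass (random weights, one ReLU layer standing in for the 7-module encoder, and made-up layer sizes; this is purely structural, not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 64))             # features from the stacked GCC signals

W_shared = rng.standard_normal((64, 32)) * 0.1
h = np.maximum(x @ W_shared, 0.0)            # shared encoder: common representation

W_kws = rng.standard_normal((32, 10)) * 0.1  # KWS head: 10 keyword classes (made up)
W_ssl = rng.standard_normal((32, 36)) * 0.1  # SSL head: 36 DOA sectors (made up)
kws_logits = h @ W_kws                       # task-specific parallel outputs
doa_logits = h @ W_ssl
```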

20 pages, 3915 KB  
Article
A Study of Improved Two-Stage Dual-Conv Coordinate Attention Model for Sound Event Detection and Localization
by Guorong Chen, Yuan Yu, Yuan Qiao, Junliang Yang, Chongling Du, Zhang Qian and Xiao Huang
Sensors 2024, 24(16), 5336; https://doi.org/10.3390/s24165336 - 18 Aug 2024
Cited by 1 | Viewed by 1517
Abstract
Sound Event Detection and Localization (SELD) is a comprehensive task that aims to solve the subtasks of Sound Event Detection (SED) and Sound Source Localization (SSL) simultaneously. The task of SELD lies in the need to solve both sound recognition and spatial localization problems, and different categories of sound events may overlap in time and space, making it more difficult for the model to distinguish between different events occurring at the same time and to locate the sound source. In this study, the Dual-conv Coordinate Attention Module (DCAM) combines dual convolutional blocks and Coordinate Attention, and based on this, the network architecture based on the two-stage strategy is improved to form the SELD-oriented Two-Stage Dual-conv Coordinate Attention Model (TDCAM) for SELD. TDCAM draws on the concepts of Visual Geometry Group (VGG) networks and Coordinate Attention to effectively capture critical local information by focusing on the coordinate space information of the feature map and dealing with the relationship between the feature map channels to enhance the feature selection capability of the model. To address the limitation of a single-layer Bi-directional Gated Recurrent Unit (Bi-GRU) in the two-stage network in terms of timing processing, we add to the structure of the two-layer Bi-GRU and introduce the data enhancement techniques of the frequency mask and time mask to improve the modeling and generalization ability of the model for timing features. Through experimental validation on the TAU Spatial Sound Events 2019 development dataset, our approach significantly improves the performance of SELD compared to the two-stage network baseline model. Furthermore, the effectiveness of DCAM and the two-layer Bi-GRU structure is confirmed by performing ablation experiments. Full article
(This article belongs to the Special Issue Sensors and Techniques for Indoor Positioning and Localization)
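The frequency-mask and time-mask augmentation mentioned above can be sketched as follows (mask widths and spectrogram size are arbitrary choices, not the paper's settings):

```python
import numpy as np

def mask_spec(spec, f_width=8, t_width=20, rng=None):
    """Zero one frequency band and one time span of a (freq, time) spectrogram."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    f0 = rng.integers(0, spec.shape[0] - f_width)
    out[f0:f0 + f_width, :] = 0.0               # frequency mask
    t0 = rng.integers(0, spec.shape[1] - t_width)
    out[:, t0:t0 + t_width] = 0.0               # time mask
    return out

aug = mask_spec(np.ones((64, 100)), rng=np.random.default_rng(0))
```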

21 pages, 6251 KB  
Article
A High-Resolution Time Reversal Method for Target Localization in Reverberant Environments
by Huiying Ma, Tao Shang, Gufeng Li and Zhaokun Li
Sensors 2024, 24(10), 3196; https://doi.org/10.3390/s24103196 - 17 May 2024
Cited by 1 | Viewed by 1602
Abstract
Reverberation in real environments is an important factor affecting the high resolution of target sound source localization (SSL) methods. Broadband low-frequency signals are common in real environments. This study focuses on the localization of this type of signal in reverberant environments. Because the time reversal (TR) method can overcome multipath effects and realize adaptive focusing, it is particularly suitable for SSL in a reverberant environment. On the basis of the significant advantages of the sparse Bayesian learning algorithm in the estimation of wave direction, a novel SSL is proposed in reverberant environments. First, the sound propagation model in a reverberant environment is studied and the TR focusing signal is obtained. We then use the sparse Bayesian framework to locate the broadband low-frequency sound source. To validate the effectiveness of the proposed method for broadband low-frequency targeting in a reverberant environment, simulations and real data experiments were performed. The localization performance under different bandwidths, different numbers of microphones, signal-to-noise ratios, reverberation times, and off-grid conditions was studied in the simulation experiments. The practical experiment was conducted in a reverberation chamber. Simulation and experimental results indicate that the proposed method can achieve satisfactory spatial resolution in reverberant environments and is robust. Full article
(This article belongs to the Collection Sensors and Systems for Indoor Positioning)

28 pages, 11874 KB  
Article
Speaker Counting Based on a Novel Hive Shaped Nested Microphone Array by WPT and 2D Adaptive SRP Algorithms in Near-Field Scenarios
by Ali Dehghan Firoozabadi, Pablo Adasme, David Zabala-Blanco, Pablo Palacios Játiva and Cesar Azurdia-Meza
Sensors 2023, 23(9), 4499; https://doi.org/10.3390/s23094499 - 5 May 2023
Viewed by 2277
Abstract
Sound source localization (SSL), speech enhancement, and speaker tracking are among the main fields of speech processing, and most speech processing algorithms require knowing the number of speakers for real implementation. In this article, a novel method for estimating the number of speakers is proposed based on the hive shaped nested microphone array (HNMA) by wavelet packet transform (WPT) and 2D sub-band adaptive steered response power (SB-2DASRP) with phase transform (PHAT) and maximum likelihood (ML) filters, and, finally, the agglomerative classification and elbow criteria for obtaining the number of speakers in near-field scenarios. The proposed HNMA is presented for aliasing and imaging elimination and preparing the proper signals for the speaker counting method. Next, the Blackman–Tukey spectral estimation method is selected for detecting the proper frequency components of the recorded signal. The WPT is considered for smart sub-band processing by focusing on the frequency bins of the speech signal. In addition, the SRP method is implemented in 2D format and adaptively by ML and PHAT filters on the sub-band signals. The SB-2DASRP peak positions are extracted on various time frames based on the standard deviation (SD) criteria, and the final number of speakers is estimated by unsupervised agglomerative clustering and elbow criteria. The proposed HNMA-SB-2DASRP method is compared with the frequency-domain magnitude squared coherence (FD-MSC), i-vector probabilistic linear discriminant analysis (i-vector PLDA), ambisonics features of the correlational recurrent neural network (AF-CRNN), and speaker counting by density-based classification and clustering decision (SC-DCCD) algorithms in noisy and reverberant environments; the results demonstrate the superiority of the proposed method for real implementation. Full article
(This article belongs to the Special Issue Localising Sensors through Wireless Communication)

17 pages, 3474 KB  
Communication
Listen to the Brain–Auditory Sound Source Localization in Neuromorphic Computing Architectures
by Daniel Schmid, Timo Oess and Heiko Neumann
Sensors 2023, 23(9), 4451; https://doi.org/10.3390/s23094451 - 2 May 2023
Cited by 4 | Viewed by 2815
Abstract
Conventional processing of sensory input often relies on uniform sampling leading to redundant information and unnecessary resource consumption throughout the entire processing pipeline. Neuromorphic computing challenges these conventions by mimicking biology and employing distributed event-based hardware. Based on the task of lateral auditory sound source localization (SSL), we propose a generic approach to map biologically inspired neural networks to neuromorphic hardware. First, we model the neural mechanisms of SSL based on the interaural level difference (ILD). Afterward, we identify generic computational motifs within the model and transform them into spike-based components. A hardware-specific step then implements them on neuromorphic hardware. We exemplify our approach by mapping the neural SSL model onto two platforms, namely the IBM TrueNorth Neurosynaptic System and SpiNNaker. Both implementations have been tested on synthetic and real-world data in terms of neural tunings and readout characteristics. For synthetic stimuli, both implementations provide a perfect readout (100% accuracy). Preliminary real-world experiments yield accuracies of 78% (TrueNorth) and 13% (SpiNNaker), RMSEs of 41 and 39, and MAEs of 18 and 29, respectively. Overall, the proposed mapping approach allows for the successful implementation of the same SSL model on two different neuromorphic architectures paving the way toward more hardware-independent neural SSL. Full article
(This article belongs to the Special Issue Advanced Technology in Acoustic Signal Processing)
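The interaural level difference (ILD) cue the model builds on reduces, in its simplest form, to a level ratio in decibels (a sketch with a synthetic signal and an arbitrary attenuation factor, not the neuromorphic implementation):

```python
import numpy as np

def ild_db(left, right):
    """Interaural level difference in dB; positive means louder at the left ear."""
    rms = lambda s: np.sqrt(np.mean(s ** 2))
    return 20.0 * np.log10(rms(left) / rms(right))

rng = np.random.default_rng(0)
src = rng.standard_normal(8000)
left, right = src, 0.5 * src        # head shadow attenuates the far ear by half
ild = ild_db(left, right)           # about +6 dB, lateralized to the left
```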

14 pages, 1564 KB  
Article
Sound Source Localization Indoors Based on Two-Level Reference Points Matching
by Shuopeng Wang, Peng Yang and Hao Sun
Appl. Sci. 2022, 12(19), 9956; https://doi.org/10.3390/app12199956 - 3 Oct 2022
Cited by 3 | Viewed by 1811
Abstract
A dense sample point layout is the conventional approach to ensure the positioning accuracy for fingerprint-based sound source localization (SSL) indoors. However, mass reference point (RPs) matching of online phases may greatly reduce positioning efficiency. In response to this compelling problem, a two-level matching strategy is adopted to shrink the adjacent RPs searching scope. In the first-level matching process, two different methods are adopted to shrink the search scope of the online phase in a simple scene and a complex scene. According to the global range of high similarity between adjacent samples in a simple scene, a greedy search method is adopted for fast searching of the sub-database that contains the adjacent RPs. Simultaneously, in accordance with the specific local areas’ range of high similarity between adjacent samples in a complex scene, the clustering method is used for database partitioning, and the RPs search scope can be compressed by sub-database matching. Experimental results show that the two-level RPs matching strategy can effectively improve the RPs matching efficiency for the two different typical indoor scenes on the premise of ensuring the positioning accuracy. Full article
(This article belongs to the Special Issue Audio and Acoustic Signal Processing)
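The two-level idea (offline partitioning of the fingerprint database, online matching against one sub-database only) can be sketched with a toy k-means partition; the fingerprints, dimensions, and cluster count below are synthetic, and the paper's greedy-search variant for simple scenes is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
rps = rng.standard_normal((200, 8))     # acoustic fingerprints of 200 reference points

# offline: partition the RP database into sub-databases (toy k-means)
k = 4
cent = rps[rng.choice(len(rps), k, replace=False)].copy()
for _ in range(10):
    lab = np.argmin(((rps[:, None, :] - cent) ** 2).sum(-1), axis=1)
    cent = np.array([rps[lab == j].mean(0) if np.any(lab == j) else cent[j]
                     for j in range(k)])
lab = np.argmin(((rps[:, None, :] - cent) ** 2).sum(-1), axis=1)

# online: the first level matches the query to a sub-database; the second level
# searches only that sub-database instead of all 200 reference points
q = rps[17] + 0.01 * rng.standard_normal(8)     # query near reference point 17
j = int(np.argmin(((q - cent) ** 2).sum(-1)))
sub = np.where(lab == j)[0]
best = int(sub[np.argmin(((rps[sub] - q) ** 2).sum(-1))])
```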
