Search Results (18)

Search Parameters:
Keywords = sound source localization (SSL)

28 pages, 7844 KB  
Article
Three-Dimensional Sound Source Localization with Microphone Array Combining Spatial Entropy Quantification and Machine Learning Correction
by Guangneng Li, Feiyu Zhao, Wei Tian and Tong Yang
Entropy 2025, 27(9), 942; https://doi.org/10.3390/e27090942 - 9 Sep 2025
Viewed by 892
Abstract
In recent years, with the popularization of intelligent scene monitoring, sound source localization (SSL) has become a major means for indoor monitoring and target positioning. However, existing sound source localization solutions are difficult to extend to multi-source and three-dimensional scenarios. To address this, this paper proposes a three-dimensional sound source localization technology based on eight microphones. Specifically, the method employs a rectangular eight-microphone array and captures Direction-of-Arrival (DOA) information via the direct path relative transfer function (DP-RTF). It introduces spatial entropy to quantify the uncertainty caused by the exponentially growing DOA combinations as the number of sound sources increases, while further reducing the spatial entropy of sound source localization through geometric intersection. This solves the problem that traditional sound source localization methods cannot be applied to multi-source and three-dimensional scenarios. On the other hand, machine learning is used to eliminate coordinate deviations caused by DOA estimation errors of the direct path relative transfer function (DP-RTF) and deviations in microphone geometric parameters. Both simulation experiments and real-scene experiments show that the positioning error of the proposed method in three-dimensional scenarios is about 10.0 cm. Full article
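The entropy argument above can be illustrated with a toy calculation (an illustrative sketch only; the paper's exact entropy definition, DOA model, and pruning rule are not reproduced here). With several sources, the number of ways to associate per-array DOA estimates with sources grows combinatorially, and geometric intersection of the DOA rays prunes inconsistent associations, lowering the entropy of the hypothesis set:

```python
import numpy as np

def spatial_entropy(p):
    """Shannon entropy (bits) over candidate DOA-combination hypotheses."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log2(p)).sum())

# Three sources observed by two sub-arrays: 3! = 6 ways to pair up the DOAs.
before = spatial_entropy(np.ones(6))   # all pairings equally likely
# Geometric intersection: suppose rays from the two sub-arrays only meet
# (within tolerance) for two of the pairings, shrinking the hypothesis set.
after = spatial_entropy(np.ones(2))
```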

51 pages, 15030 KB  
Review
A Review on Sound Source Localization in Robotics: Focusing on Deep Learning Methods
by Reza Jalayer, Masoud Jalayer and Amirali Baniasadi
Appl. Sci. 2025, 15(17), 9354; https://doi.org/10.3390/app15179354 - 26 Aug 2025
Cited by 1 | Viewed by 1578
Abstract
Sound source localization (SSL) adds a spatial dimension to auditory perception, allowing a system to pinpoint the origin of speech, machinery noise, warning tones, or other acoustic events, capabilities that facilitate robot navigation, human–machine dialogue, and condition monitoring. While existing surveys provide valuable historical context, they typically address general audio applications and do not fully account for robotic constraints or the latest advancements in deep learning. This review addresses these gaps by offering a robotics-focused synthesis, emphasizing recent progress in deep learning methodologies. We start by reviewing classical methods such as time difference of arrival (TDOA), beamforming, steered-response power (SRP), and subspace analysis. Subsequently, we delve into modern machine learning (ML) and deep learning (DL) approaches, discussing traditional ML and neural networks (NNs), convolutional neural networks (CNNs), convolutional recurrent neural networks (CRNNs), and emerging attention-based architectures. The data and training strategy that are the two cornerstones of DL-based SSL are explored. Studies are further categorized by robot types and application domains to facilitate researchers in identifying relevant work for their specific contexts. Finally, we highlight the current challenges in SSL works in general, regarding environmental robustness, sound source multiplicity, and specific implementation constraints in robotics, as well as data and learning strategies in DL-based SSL. Also, we sketch promising directions to offer an actionable roadmap toward robust, adaptable, efficient, and explainable DL-based SSL for next-generation robots. Full article
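To make the classical baseline concrete, here is a minimal GCC-PHAT time-difference-of-arrival estimator (a sketch with a made-up sampling rate and synthetic signals, not code from the review):

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """TDOA of `sig` relative to `ref` via generalized cross-correlation with PHAT."""
    n = sig.size + ref.size                     # zero-pad to avoid circular wrap-around
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12                      # PHAT: keep phase, flatten magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)                   # signal at the reference microphone
y = np.zeros_like(x)
y[25:] = x[:-25]                                # same signal, delayed by 25 samples
tau = gcc_phat(y, x, fs)                        # recovered TDOA in seconds
```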

19 pages, 3044 KB  
Review
Deep Learning-Based Sound Source Localization: A Review
by Kunbo Xu, Zekai Zong, Dongjun Liu, Ran Wang and Liang Yu
Appl. Sci. 2025, 15(13), 7419; https://doi.org/10.3390/app15137419 - 2 Jul 2025
Viewed by 1805
Abstract
As a fundamental technology in environmental perception, sound source localization (SSL) plays a critical role in public safety, marine exploration, and smart home systems. However, traditional methods such as beamforming and time-delay estimation rely on manually designed physical models and idealized assumptions, which struggle to meet practical demands in dynamic and complex scenarios. Recent advancements in deep learning have revolutionized SSL by leveraging its end-to-end feature adaptability, cross-scenario generalization capabilities, and data-driven modeling, significantly enhancing localization robustness and accuracy in challenging environments. This review systematically examines the progress of deep learning-based SSL across three critical domains: marine environments, indoor reverberant spaces, and unmanned aerial vehicle (UAV) monitoring. In marine scenarios, complex-valued convolutional networks combined with adversarial transfer learning mitigate environmental mismatch and multipath interference through phase information fusion and domain adaptation strategies. For indoor high-reverberation conditions, attention mechanisms and multimodal fusion architectures achieve precise localization under low signal-to-noise ratios by adaptively weighting critical acoustic features. In UAV surveillance, lightweight models integrated with spatiotemporal Transformers address dynamic modeling of non-stationary noise spectra and edge computing efficiency constraints. Despite these advancements, current approaches face three core challenges: the insufficient integration of physical principles, prohibitive data annotation costs, and the trade-off between real-time performance and accuracy. Future research should prioritize physics-informed modeling to embed acoustic propagation mechanisms, unsupervised domain adaptation to reduce reliance on labeled data, and sensor-algorithm co-design to optimize hardware-software synergy. These directions aim to propel SSL toward intelligent systems characterized by high precision, strong robustness, and low power consumption. This work provides both theoretical foundations and technical references for algorithm selection and practical implementation in complex real-world scenarios. Full article

25 pages, 7065 KB  
Article
A Planer Moving Microphone Array for Sound Source Localization
by Chuyang Wang, Karhang Chu and Yatsze Choy
Appl. Sci. 2025, 15(12), 6777; https://doi.org/10.3390/app15126777 - 16 Jun 2025
Viewed by 4012
Abstract
Sound source localization (SSL) equips service robots with the ability to perceive sound similarly to humans, which is particularly valuable in complex, dark indoor environments where vision-based systems may not work. From a data collection perspective, increasing the number of microphones generally improves SSL performance. However, a large microphone array such as a 16-microphone array configuration may occupy significant space on a robot. To address this, we propose a novel framework that uses a structure of four planar moving microphones to emulate the performance of a 16-microphone array, thereby saving space. Because of its unique design, this structure can dynamically form various spatial patterns, enabling 3D SSL, including estimation of angle, distance, and height. For experimental comparison, we also constructed a circular 6-microphone array and a planar 4 × 4 microphone array, both capable of rotation to ensure fairness. Three SSL algorithms were applied across all configurations. Experiments were conducted in a standard classroom environment, and the results show that the proposed framework achieves approximately 80–90% accuracy in angular estimation and around 85% accuracy in distance and height estimation, comparable to the performance of the 4 × 4 planar microphone array. Full article
(This article belongs to the Special Issue Noise Measurement, Acoustic Signal Processing and Noise Control)

23 pages, 7047 KB  
Article
UaVirBASE: A Public-Access Unmanned Aerial Vehicle Sound Source Localization Dataset
by Gabriel Jekateryńczuk, Rafał Szadkowski and Zbigniew Piotrowski
Appl. Sci. 2025, 15(10), 5378; https://doi.org/10.3390/app15105378 - 12 May 2025
Cited by 1 | Viewed by 1425
Abstract
This article presents UaVirBASE, a publicly available dataset for the sound source localization (SSL) of unmanned aerial vehicles (UAVs). The dataset contains synchronized multi-microphone recordings captured under controlled conditions, featuring variations in UAV distances, altitudes, azimuths, and orientations relative to a fixed microphone array. UAV orientations include front, back, left, and right-facing configurations. UaVirBASE addresses the growing need for standardized SSL datasets tailored for UAV applications, filling a gap left behind by existing databases that often lack such specific variations. Additionally, we describe the software and hardware employed for data acquisition and annotation alongside an analysis of the dataset’s structure. With its well-annotated and diverse data, UaVirBASE is ideally suited for applications in artificial intelligence, particularly in developing and benchmarking machine learning and deep learning models for SSL. Controlling the dataset’s variations enables the training of AI systems capable of adapting to complex UAV-based scenarios. We also demonstrate the architecture and results of the deep neural network (DNN) trained on this dataset, evaluating model performance across different features. Our results show an average Mean Absolute Error (MAE) of 0.5 m for distance and height, an average azimuth error of around 1 degree, and side errors under 10 degrees. UaVirBASE serves as a valuable resource to support reproducible research and foster innovation in UAV-based acoustic signal processing by addressing the need for a standardized and versatile UAV SSL dataset. Full article
(This article belongs to the Special Issue AI in Audio Analysis: Spectrogram-Based Recognition)
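Azimuth errors like the roughly 1 degree reported above must be computed with 360-degree wrap-around in mind; a minimal sketch of such a metric (this is not the dataset's actual evaluation code, and the sample angles are made up):

```python
import numpy as np

def angular_mae(pred_deg, true_deg):
    """Mean absolute azimuth error with 360-degree wrap-around."""
    diff = np.abs((np.asarray(pred_deg) - np.asarray(true_deg) + 180.0) % 360.0 - 180.0)
    return float(diff.mean())

# a naive |pred - true| would report 358 degrees here instead of 2
err = angular_mae([359.0], [1.0])
```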

18 pages, 10372 KB  
Article
Acoustic Fabry–Perot Resonance Detector for Passive Acoustic Thermometry and Sound Source Localization
by Yan Yue, Zhifei Dong and Zhi-mei Qi
Sensors 2025, 25(8), 2445; https://doi.org/10.3390/s25082445 - 12 Apr 2025
Viewed by 584
Abstract
Acoustic temperature measurement (ATM) and sound source localization (SSL) are two important applications of acoustic sensors. The development of novel acoustic sensors capable of both ATM and SSL is an innovative research topic with great interest. In this work, an acoustic Fabry-Perot resonance detector (AFPRD) and its cross-shaped array were designed and fabricated, and the passive ATM function of the AFPRD and the SSL capability of the AFPRD array were simulated and experimentally verified. The AFPRD consists of an acoustic waveguide and a microphone with its head inserted into the waveguide, which can significantly enhance the microphone’s sensitivity via the FP resonance effect. As a result, the frequency response curve of AFPRD can be easily measured using weak ambient white noise. Based on the measured frequency response curve, the linear relationship between the resonant frequency and the resonant mode order of the AFPRD can be determined, the slope of which can be used to calculate the ambient sound velocity and air temperature. The AFPRD array was prepared by using four bent acoustic waveguides to expand the array aperture, which combined with the multiple signal classification (MUSIC) algorithm can be used for distant multi-target localization. The SSL accuracy can be improved by substituting the sound speed measured in real time into the MUSIC algorithm. The AFPRD’s passive ATM function was verified in an anechoic room with white noise as low as 17 dB, and the ATM accuracy reached 0.4 °C. The SSL function of the AFPRD array was demonstrated in the outdoor environment, and the SSL error of the acoustic target with a sound pressure of 35 mPa was less than 1.2°. The findings open up a new avenue for the development of multifunctional acoustic detection devices and systems. Full article
(This article belongs to the Special Issue Recent Advances in Optical and Optoelectronic Acoustic Sensors)
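The slope-to-temperature idea can be sketched numerically, assuming half-wave resonances f_n = n * c / (2L) and the common approximation c = 331.3 * sqrt(T / 273.15 K); the waveguide length and temperature below are hypothetical, not the paper's values:

```python
import numpy as np

L = 0.17                                   # waveguide length in metres (hypothetical)
c_true = 346.1                             # sound speed at roughly 25 degrees C
orders = np.arange(1, 6)                   # resonant mode orders n = 1..5
freqs = orders * c_true / (2 * L)          # half-wave resonances: f_n = n * c / (2L)

slope = np.polyfit(orders, freqs, 1)[0]    # linear fit of f_n vs n gives c / (2L)
c_est = 2 * L * slope                      # recover the ambient sound speed
T_celsius = 273.15 * (c_est / 331.3) ** 2 - 273.15   # invert c = 331.3*sqrt(T/273.15)
```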

23 pages, 1181 KB  
Article
Diffusion-Based Sound Source Localization Using a Distributed Network of Microphone Arrays
by Davide Albertini, Alberto Bernardini, Gioele Greco and Augusto Sarti
Sensors 2025, 25(7), 2078; https://doi.org/10.3390/s25072078 - 26 Mar 2025
Viewed by 749
Abstract
Traditionally, microphone array networks for 3D sound source localization rely on centralized data processing, which can limit scalability and robustness. In this article, we recast the task of sound source localization (SSL) with networks of acoustic arrays as a distributed optimization problem. We then present two resolution approaches of such a problem; one is computationally centralized, while the other is computationally distributed and based on an Adapt-Then-Combine (ATC) diffusion strategy. In particular, we address 3D SSL with a network of linear microphone arrays, each of which estimates a stream of 2D directions of arrival (DoAs) and they cooperate with each other to localize a single sound source. We develop adaptive cooperation strategies to penalize the arrays with the most detrimental effects on localization accuracy and improve performance through error-based and distance-based penalties. The performance of the method is evaluated using increasingly complex DoA stream models and simulated acoustic environments characterized by various levels of reverberation and signal-to-noise ratio (SNR). Furthermore, we investigate how the performance is related to the connectivity of the network and show that the proposed approach maintains high localization accuracy and stability even in sparsely connected networks. Full article
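A toy Adapt-Then-Combine diffusion loop on a ring network conveys the two-phase idea (uniform combination weights over neighbors; the paper's adaptive error-based and distance-based penalty weights and its DoA stream models are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
true_pos = np.array([2.0, 3.0, 1.5])       # 3D source position to localize
n_nodes = 6
# ring topology: each node communicates with itself and its two neighbors
neighbors = [[(i - 1) % n_nodes, i, (i + 1) % n_nodes] for i in range(n_nodes)]

est = rng.standard_normal((n_nodes, 3))    # each array's initial position estimate
mu = 0.3                                   # adaptation step size
for _ in range(200):
    meas = true_pos + 0.05 * rng.standard_normal((n_nodes, 3))  # noisy local fixes
    psi = est + mu * (meas - est)          # adapt: local LMS-style update
    est = np.array([psi[nb].mean(axis=0) for nb in neighbors])  # combine: average

err = np.linalg.norm(est - true_pos, axis=1).max()   # worst node's residual error
```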

23 pages, 8182 KB  
Article
Sound Source Localization Using Deep Learning for Human–Robot Interaction Under Intelligent Robot Environments
by Hong-Min Jo, Tae-Wan Kim and Keun-Chang Kwak
Electronics 2025, 14(5), 1043; https://doi.org/10.3390/electronics14051043 - 6 Mar 2025
Cited by 2 | Viewed by 1966
Abstract
In this paper, we propose Sound Source Localization (SSL) using deep learning for Human–Robot Interaction (HRI) under intelligent robot environments. The proposed SSL method consists of three steps. The first step preprocesses the sound source to minimize noise and reverberation in the robotic environment. Excitation source information (ESI), which contains only the original components of the sound source, is extracted from a sound source in a microphone array mounted on a robot to minimize background influence. Here, the linear prediction residual is used as the ESI. Subsequently, the cross-correlation signal between each adjacent microphone pair is calculated by using the ESI signal of each sound source. To minimize the influence of noise, a Generalized Cross-Correlation with the phase transform (GCC-PHAT) algorithm is used. In the second step, we design a single-channel, multi-input convolutional neural network that can independently learn the calculated cross-correlation signal between each adjacent microphone pair and the location of the sound source using the time difference of arrival. The third step classifies the location of the sound source after training with the proposed network. Previous studies have primarily used various features as inputs and stacked them into multiple channels, which made the algorithm complex. Furthermore, multi-channel inputs may not be sufficient to clearly train the interrelationship between each sound source. To address this issue, the cross-correlation signal between each sound source alone is used as the network input. The proposed method was verified on the Electronics and Telecommunications Research Institute-Sound Source Localization (ETRI-SSL) database acquired from the robotic environment. The experimental results revealed that the proposed method showed an 8.75% higher performance in comparison to the previous works. Full article
(This article belongs to the Special Issue Control and Design of Intelligent Robots)
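Once a TDOA is read off a GCC-PHAT peak for one adjacent pair, a far-field DOA follows from theta = arcsin(c * tau / d); the spacing and delay below are hypothetical, not values from the paper:

```python
import numpy as np

c = 343.0      # speed of sound in m/s
d = 0.10       # adjacent-microphone spacing in metres (hypothetical)
tau = 1.7e-4   # TDOA read off the GCC-PHAT peak, in seconds (hypothetical)

# far-field model: the wavefront reaches the two mics with path difference c * tau
theta = np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0)))
```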

21 pages, 1104 KB  
Article
Advancing Applications of Robot Audition Systems: Efficient HARK Deployment with GPU and FPGA Implementations
by Zirui Lin, Hideharu Amano, Masayuki Takigahira, Naoya Terakado, Katsutoshi Itoyama, Haris Gulzar and Kazuhiro Nakadai
Chips 2025, 4(1), 2; https://doi.org/10.3390/chips4010002 - 27 Dec 2024
Cited by 1 | Viewed by 1798
Abstract
This paper proposes efficient implementations of robot audition systems, specifically focusing on deployments using HARK, an open-source software (OSS) platform designed for robot audition. Although robot audition systems are versatile and suitable for various scenarios, efficiently deploying them can be challenging due to their high computational demands and extensive processing times. For scenarios involving intensive high-dimensional data processing with large-scale microphone arrays, our generalizable GPU-based implementation significantly reduced processing time, enabling real-time Sound Source Localization (SSL) and Sound Source Separation (SSS) using a 60-channel microphone array across two distinct GPU platforms. Specifically, our implementation achieved speedups of 23.3× for SSL and 3.0× for SSS on a high-performance server equipped with an NVIDIA A100 80 GB GPU. Additionally, on the Jetson AGX Orin 32 GB, which represents embedded environments, it achieved speedups of 14.8× for SSL and 1.6× for SSS. For edge computing scenarios, we developed an adaptable FPGA-based implementation of HARK using High-Level Synthesis (HLS) on M-KUBOS, a Multi-Access Edge Computing (MEC) FPGA Multiprocessor System on a Chip (MPSoC) device. Utilizing an eight-channel microphone array, this implementation achieved a 1.2× speedup for SSL and a 1.1× speedup for SSS, along with a 1.1× improvement in overall energy efficiency. Full article

14 pages, 1309 KB  
Article
Combined Keyword Spotting and Localization Network Based on Multi-Task Learning
by Jungbeom Ko, Hyunchul Kim and Jungsuk Kim
Mathematics 2024, 12(21), 3309; https://doi.org/10.3390/math12213309 - 22 Oct 2024
Viewed by 1600
Abstract
The advent of voice assistance technology and its integration into smart devices has facilitated many useful services, such as texting and application execution. However, most assistive technologies lack the capability to enable the system to act as a human who can localize the speaker and selectively spot meaningful keywords. Because keyword spotting (KWS) and sound source localization (SSL) are essential and must operate in real time, the efficiency of a neural network model is crucial for memory and computation. In this paper, a single neural network model for KWS and SSL is proposed to overcome the limitations of sequential KWS and SSL, which require more memory and inference time. The proposed model uses multi-task learning to utilize the limited resources of the device efficiently. A shared encoder is used as the initial layer to extract common features from the multichannel audio data. Subsequently, the task-specific parallel layers utilize these features for KWS and SSL. The proposed model was evaluated on a synthetic dataset with multiple speakers, and a 7-module shared encoder structure was identified as optimal in terms of accuracy, direction of arrival (DOA) accuracy, DOA error, and latency. It achieved a KWS accuracy of 94.51%, DOA error of 12.397°, and DOA accuracy of 89.86%. Consequently, the proposed model requires significantly less memory owing to the shared network architecture, which enhances the inference time without compromising KWS accuracy, DOA error, and DOA accuracy. Full article
(This article belongs to the Special Issue Computational Intelligence and Machine Learning with Applications)
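The shared-encoder-plus-task-heads layout can be sketched as a single forward pass (random weights, one ReLU layer standing in for the 7-module encoder, and made-up layer sizes; this is purely structural, not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 64))             # features from the stacked GCC signals

W_shared = rng.standard_normal((64, 32)) * 0.1
h = np.maximum(x @ W_shared, 0.0)            # shared encoder: common representation

W_kws = rng.standard_normal((32, 10)) * 0.1  # KWS head: 10 keyword classes (made up)
W_ssl = rng.standard_normal((32, 36)) * 0.1  # SSL head: 36 DOA sectors (made up)
kws_logits = h @ W_kws                       # task-specific parallel outputs
doa_logits = h @ W_ssl
```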

20 pages, 3915 KB  
Article
A Study of Improved Two-Stage Dual-Conv Coordinate Attention Model for Sound Event Detection and Localization
by Guorong Chen, Yuan Yu, Yuan Qiao, Junliang Yang, Chongling Du, Zhang Qian and Xiao Huang
Sensors 2024, 24(16), 5336; https://doi.org/10.3390/s24165336 - 18 Aug 2024
Cited by 1 | Viewed by 1517
Abstract
Sound Event Detection and Localization (SELD) is a comprehensive task that aims to solve the subtasks of Sound Event Detection (SED) and Sound Source Localization (SSL) simultaneously. The task of SELD lies in the need to solve both sound recognition and spatial localization problems, and different categories of sound events may overlap in time and space, making it more difficult for the model to distinguish between different events occurring at the same time and to locate the sound source. In this study, the Dual-conv Coordinate Attention Module (DCAM) combines dual convolutional blocks and Coordinate Attention, and based on this, the network architecture based on the two-stage strategy is improved to form the SELD-oriented Two-Stage Dual-conv Coordinate Attention Model (TDCAM) for SELD. TDCAM draws on the concepts of Visual Geometry Group (VGG) networks and Coordinate Attention to effectively capture critical local information by focusing on the coordinate space information of the feature map and dealing with the relationship between the feature map channels to enhance the feature selection capability of the model. To address the limitation of a single-layer Bi-directional Gated Recurrent Unit (Bi-GRU) in the two-stage network in terms of timing processing, we add to the structure of the two-layer Bi-GRU and introduce the data enhancement techniques of the frequency mask and time mask to improve the modeling and generalization ability of the model for timing features. Through experimental validation on the TAU Spatial Sound Events 2019 development dataset, our approach significantly improves the performance of SELD compared to the two-stage network baseline model. Furthermore, the effectiveness of DCAM and the two-layer Bi-GRU structure is confirmed by performing ablation experiments. Full article
(This article belongs to the Special Issue Sensors and Techniques for Indoor Positioning and Localization)
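The frequency-mask and time-mask augmentation mentioned above can be sketched as follows (mask widths and spectrogram size are arbitrary choices, not the paper's settings):

```python
import numpy as np

def mask_spec(spec, f_width=8, t_width=20, rng=None):
    """Zero one frequency band and one time span of a (freq, time) spectrogram."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    f0 = rng.integers(0, spec.shape[0] - f_width)
    out[f0:f0 + f_width, :] = 0.0               # frequency mask
    t0 = rng.integers(0, spec.shape[1] - t_width)
    out[:, t0:t0 + t_width] = 0.0               # time mask
    return out

aug = mask_spec(np.ones((64, 100)), rng=np.random.default_rng(0))
```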

21 pages, 6251 KB  
Article
A High-Resolution Time Reversal Method for Target Localization in Reverberant Environments
by Huiying Ma, Tao Shang, Gufeng Li and Zhaokun Li
Sensors 2024, 24(10), 3196; https://doi.org/10.3390/s24103196 - 17 May 2024
Cited by 1 | Viewed by 1602
Abstract
Reverberation in real environments is an important factor affecting the high resolution of target sound source localization (SSL) methods. Broadband low-frequency signals are common in real environments. This study focuses on the localization of this type of signal in reverberant environments. Because the time reversal (TR) method can overcome multipath effects and realize adaptive focusing, it is particularly suitable for SSL in a reverberant environment. On the basis of the significant advantages of the sparse Bayesian learning algorithm in the estimation of wave direction, a novel SSL is proposed in reverberant environments. First, the sound propagation model in a reverberant environment is studied and the TR focusing signal is obtained. We then use the sparse Bayesian framework to locate the broadband low-frequency sound source. To validate the effectiveness of the proposed method for broadband low-frequency targeting in a reverberant environment, simulations and real data experiments were performed. The localization performance under different bandwidths, different numbers of microphones, signal-to-noise ratios, reverberation times, and off-grid conditions was studied in the simulation experiments. The practical experiment was conducted in a reverberation chamber. Simulation and experimental results indicate that the proposed method can achieve satisfactory spatial resolution in reverberant environments and is robust. Full article
(This article belongs to the Collection Sensors and Systems for Indoor Positioning)

28 pages, 11874 KB  
Article
Speaker Counting Based on a Novel Hive Shaped Nested Microphone Array by WPT and 2D Adaptive SRP Algorithms in Near-Field Scenarios
by Ali Dehghan Firoozabadi, Pablo Adasme, David Zabala-Blanco, Pablo Palacios Játiva and Cesar Azurdia-Meza
Sensors 2023, 23(9), 4499; https://doi.org/10.3390/s23094499 - 5 May 2023
Viewed by 2277
Abstract
Sound source localization (SSL), speech enhancement, and speaker tracking are among the main fields of speech processing, and most speech processing algorithms require knowing the number of speakers for real implementation. In this article, a novel method for estimating the number of speakers is proposed based on the hive shaped nested microphone array (HNMA) by wavelet packet transform (WPT) and 2D sub-band adaptive steered response power (SB-2DASRP) with phase transform (PHAT) and maximum likelihood (ML) filters, and, finally, the agglomerative classification and elbow criteria for obtaining the number of speakers in near-field scenarios. The proposed HNMA is presented for aliasing and imaging elimination and preparing the proper signals for the speaker counting method. Next, the Blackman–Tukey spectral estimation method is selected for detecting the proper frequency components of the recorded signal. The WPT is considered for smart sub-band processing by focusing on the frequency bins of the speech signal. In addition, the SRP method is implemented in 2D format and adaptively by ML and PHAT filters on the sub-band signals. The SB-2DASRP peak positions are extracted on various time frames based on the standard deviation (SD) criteria, and the final number of speakers is estimated by unsupervised agglomerative clustering and elbow criteria. The proposed HNMA-SB-2DASRP method is compared with the frequency-domain magnitude squared coherence (FD-MSC), i-vector probabilistic linear discriminant analysis (i-vector PLDA), ambisonics features of the correlational recurrent neural network (AF-CRNN), and speaker counting by density-based classification and clustering decision (SC-DCCD) algorithms in noisy and reverberant environments; the results demonstrate the superiority of the proposed method for real implementation. Full article
(This article belongs to the Special Issue Localising Sensors through Wireless Communication)

17 pages, 3474 KB  
Communication
Listen to the Brain–Auditory Sound Source Localization in Neuromorphic Computing Architectures
by Daniel Schmid, Timo Oess and Heiko Neumann
Sensors 2023, 23(9), 4451; https://doi.org/10.3390/s23094451 - 2 May 2023
Cited by 4 | Viewed by 2815
Abstract
Conventional processing of sensory input often relies on uniform sampling leading to redundant information and unnecessary resource consumption throughout the entire processing pipeline. Neuromorphic computing challenges these conventions by mimicking biology and employing distributed event-based hardware. Based on the task of lateral auditory sound source localization (SSL), we propose a generic approach to map biologically inspired neural networks to neuromorphic hardware. First, we model the neural mechanisms of SSL based on the interaural level difference (ILD). Afterward, we identify generic computational motifs within the model and transform them into spike-based components. A hardware-specific step then implements them on neuromorphic hardware. We exemplify our approach by mapping the neural SSL model onto two platforms, namely the IBM TrueNorth Neurosynaptic System and SpiNNaker. Both implementations have been tested on synthetic and real-world data in terms of neural tunings and readout characteristics. For synthetic stimuli, both implementations provide a perfect readout (100% accuracy). Preliminary real-world experiments yield accuracies of 78% (TrueNorth) and 13% (SpiNNaker), RMSEs of 41 and 39, and MAEs of 18 and 29, respectively. Overall, the proposed mapping approach allows for the successful implementation of the same SSL model on two different neuromorphic architectures paving the way toward more hardware-independent neural SSL. Full article
(This article belongs to the Special Issue Advanced Technology in Acoustic Signal Processing)
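The interaural level difference (ILD) cue the model builds on reduces, in its simplest form, to a level ratio in decibels (a sketch with a synthetic signal and an arbitrary attenuation factor, not the neuromorphic implementation):

```python
import numpy as np

def ild_db(left, right):
    """Interaural level difference in dB; positive means louder at the left ear."""
    rms = lambda s: np.sqrt(np.mean(s ** 2))
    return 20.0 * np.log10(rms(left) / rms(right))

rng = np.random.default_rng(0)
src = rng.standard_normal(8000)
left, right = src, 0.5 * src        # head shadow attenuates the far ear by half
ild = ild_db(left, right)           # about +6 dB, lateralized to the left
```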

14 pages, 1564 KB  
Article
Sound Source Localization Indoors Based on Two-Level Reference Points Matching
by Shuopeng Wang, Peng Yang and Hao Sun
Appl. Sci. 2022, 12(19), 9956; https://doi.org/10.3390/app12199956 - 3 Oct 2022
Cited by 3 | Viewed by 1811
Abstract
A dense sample point layout is the conventional approach to ensure the positioning accuracy for fingerprint-based sound source localization (SSL) indoors. However, mass reference point (RPs) matching of online phases may greatly reduce positioning efficiency. In response to this compelling problem, a two-level matching strategy is adopted to shrink the adjacent RPs searching scope. In the first-level matching process, two different methods are adopted to shrink the search scope of the online phase in a simple scene and a complex scene. According to the global range of high similarity between adjacent samples in a simple scene, a greedy search method is adopted for fast searching of the sub-database that contains the adjacent RPs. Simultaneously, in accordance with the specific local areas’ range of high similarity between adjacent samples in a complex scene, the clustering method is used for database partitioning, and the RPs search scope can be compressed by sub-database matching. Experimental results show that the two-level RPs matching strategy can effectively improve the RPs matching efficiency for the two different typical indoor scenes on the premise of ensuring the positioning accuracy. Full article
(This article belongs to the Special Issue Audio and Acoustic Signal Processing)
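The two-level idea (offline partitioning of the fingerprint database, online matching against one sub-database only) can be sketched with a toy k-means partition; the fingerprints, dimensions, and cluster count below are synthetic, and the paper's greedy-search variant for simple scenes is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
rps = rng.standard_normal((200, 8))     # acoustic fingerprints of 200 reference points

# offline: partition the RP database into sub-databases (toy k-means)
k = 4
cent = rps[rng.choice(len(rps), k, replace=False)].copy()
for _ in range(10):
    lab = np.argmin(((rps[:, None, :] - cent) ** 2).sum(-1), axis=1)
    cent = np.array([rps[lab == j].mean(0) if np.any(lab == j) else cent[j]
                     for j in range(k)])
lab = np.argmin(((rps[:, None, :] - cent) ** 2).sum(-1), axis=1)

# online: the first level matches the query to a sub-database; the second level
# searches only that sub-database instead of all 200 reference points
q = rps[17] + 0.01 * rng.standard_normal(8)     # query near reference point 17
j = int(np.argmin(((q - cent) ** 2).sum(-1)))
sub = np.where(lab == j)[0]
best = int(sub[np.argmin(((rps[sub] - q) ** 2).sum(-1))])
```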
