Three-Dimensional Sound Source Localization with Microphone Array Combining Spatial Entropy Quantification and Machine Learning Correction
Abstract
1. Introduction
2. Sound Source Localization Based on DOA and an 8-Microphone Array
2.1. DOA Estimation Based on DP-RTF
2.1.1. Principle of Single-Source DP-RTF Estimation
2.1.2. Extension of Multi-Source DP-RTF Estimation
2.1.3. Multi-Source DOA Estimation
2.2. Quantitative Model of Spatial Entropy in 3D Multi-Sound-Source Scenarios
2.3. First-Order Entropy Reduction Based on DOA Sorting: Constraining the Combination Space
2.4. Second-Order Entropy Reduction Based on Geometric Intersection
2.4.1. Step-I
2.4.2. Step-II
2.4.3. Step-III
2.5. Regression Model for Localization Correction
1. In Solution-I, we aim to establish a regression model such that
2. In Solution-II, we establish a regression model such that
3. In Solution-III, we establish a regression model such that
3. Simulation Research
3.1. Simulation Conditions
3.2. Results of Each Solution
3.3. Performance of Solution-III-SVR Model
3.3.1. Environments with Different Reverberation Levels
3.3.2. Results of Different Microphone Configurations
3.3.3. Results of Different DOA Algorithms
4. A Practical Case of Our Method
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
SSL | Sound Source Localization |
DOA | Direction of Arrival |
GCC | Generalized Cross-Correlation |
MUSIC | Multiple Signal Classification |
GCC-PHAT | Generalized Cross-Correlation with Phase Transform |
DP-RTF | Direct Path Relative Transfer Function |
RF | Random Forest |
GBR | Gradient Boosting Regression |
KRR | Kernel Ridge Regression |
RR | Ridge Regression |
SVR | Support Vector Regression |
RT | Reverberation Time |
EDE | Euclidean Distance Error |
SR | Success Rate |
MEDE | Mean Euclidean Distance Error |
MAE | Mean Absolute Error |
Solution-I-KRR | Solution-I based on KRR |
Solution-II-SVR | Solution-II based on SVR |
Solution-III-SVR | Solution-III based on SVR |
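The error metrics abbreviated above (EDE, MEDE, MAE, SR) can be illustrated with a minimal sketch. It assumes positions are given as (x, y, z) tuples in metres, that MAE is reported per coordinate axis, and that SR is the fraction of estimates whose EDE falls within a threshold. The function names and the 0.2 m default threshold are hypothetical choices for illustration, not values taken from the paper.

```python
import math


def euclidean_distance_error(estimate, truth):
    """EDE: straight-line distance (m) between an estimated and a true 3D position."""
    return math.dist(estimate, truth)


def localization_metrics(estimates, truths, success_threshold=0.2):
    """Return MEDE, per-axis MAE, and SR for paired lists of (x, y, z) positions.

    success_threshold is a hypothetical cut-off in metres (an assumption,
    not a value from the paper): an estimate counts as a success when its
    EDE does not exceed it.
    """
    if not estimates or len(estimates) != len(truths):
        raise ValueError("need two non-empty lists of equal length")
    n = len(estimates)

    # EDE for every estimate/ground-truth pair.
    edes = [euclidean_distance_error(e, t) for e, t in zip(estimates, truths)]

    # MEDE: mean of the per-sample Euclidean distance errors.
    mede = sum(edes) / n

    # MAE: mean absolute error, computed separately for x, y, and z.
    mae = tuple(
        sum(abs(e[axis] - t[axis]) for e, t in zip(estimates, truths)) / n
        for axis in range(3)
    )

    # SR: share of estimates whose EDE is within the threshold.
    sr = sum(1 for d in edes if d <= success_threshold) / n

    return {"MEDE": mede, "MAE": mae, "SR": sr}


if __name__ == "__main__":
    truths = [(3.0, 1.0, 1.0), (5.0, 2.0, 1.5)]
    estimates = [(3.0, 1.0, 1.1), (5.3, 2.4, 1.5)]
    print(localization_metrics(estimates, truths))
```

Keeping MAE per axis makes it possible to see whether the error is concentrated in one coordinate, while MEDE summarizes the overall 3D error in a single number.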
References
- Hou, X.; Bergmann, J. Pedestrian Dead Reckoning With Wearable Sensors: A Systematic Review. IEEE Sens. J. 2021, 21, 143–152.
- Ravindran, R.; Santora, M.J.; Jamali, M.M. Multi-Object Detection and Tracking, Based on DNN, for Autonomous Vehicles: A Review. IEEE Sens. J. 2021, 21, 5668–5677.
- Garcia, N.; Wymeersch, H.; Larsson, E.G.; Haimovich, A.M.; Coulon, M. Direct Localization for Massive MIMO. IEEE Trans. Signal Process. 2017, 65, 2475–2487.
- Alsmadi, L.; Kong, X.; Sandrasegaran, K.; Fang, G. An Improved Indoor Positioning Accuracy Using Filtered RSSI and Beacon Weight. IEEE Sens. J. 2021, 21, 18205–18213.
- Yoo, J. Change Detection of RSSI Fingerprint Pattern for Indoor Positioning System. IEEE Sens. J. 2020, 20, 2608–2615.
- Hajiakhondi-Meybodi, Z.; Mohammadi, A.; Hou, M.; Plataniotis, K.N. DQLEL: Deep Q-learning for energy-optimized LoS/NLoS UWB node selection. IEEE Trans. Signal Process. 2022, 70, 2532–2547.
- Liaquat, M.U.; Munawar, H.S.; Rahman, A.; Qadir, Z.; Kouzani, A.Z.; Mahmud, M.A.P. Localization of Sound Sources: A Systematic Review. Energies 2021, 14, 3910.
- Jekateryńczuk, G.; Piotrowski, Z. A Survey of Sound Source Localization and Detection Methods and Their Applications. Sensors 2024, 24, 68.
- Chen, B.; Hei, C.; Luo, M.; Ho, M.S.C.; Song, G. Pipeline two-dimensional impact location determination using time of arrival with instant phase (TOAIP) with piezoceramic transducer array. Smart Mater. Struct. 2018, 27, 105003.
- Wang, Y.W.; Li, J.G.; Yang, J. Acoustic Localization Based on the D-S Evidence Theory for Pressurized Gas Leakage Detection. In Proceedings of the 2023 5th International Conference on Industrial Artificial Intelligence (IAI), Shenyang, China, 21–24 August 2023; pp. 1–6.
- Luo, Z.; Liu, W.; Wang, Z.; Ao, S. Monitoring of laser welding using source localization and tracking processing by microphone array. Int. J. Adv. Manuf. Technol. 2016, 86, 21–28.
- Nishikawa, A.; Hattori, K.; Tanaka, M.; Muranami, H.; Nishi, H. Anomalous Sound Detection, Extraction, and Localization for Refrigerator Units Using a Microphone Array. In Proceedings of the IECON 2022—48th Annual Conference of the IEEE Industrial Electronics Society, Brussels, Belgium, 17–20 October 2022; pp. 1–6.
- He, W.; Motlicek, P.; Odobez, J.M. Deep Neural Networks for Multiple Speaker Detection and Localization. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 74–79.
- Bingol, M.; Aydogmus, O. Performing predefined tasks using the human–robot interaction on speech recognition for an industrial robot. Eng. Appl. Artif. Intell. 2020, 95, 103903.
- Meza, I.; Rascon, C.; Fuentes-Pineda, G.; Pineda, L. On Indexicality, Direction of Arrival of Sound Sources, and Human-Robot Interaction. J. Robot. 2016, 2016, 3081048.
- Wu, S.; Zheng, Y.; Ye, K.; Cao, H.; Zhang, X.; Sun, H. Sound Source Localization for Unmanned Aerial Vehicles in Low Signal-to-Noise Ratio Environments. Remote Sens. 2024, 16, 1847.
- Dang, X.; Ma, W.; Habets, E.A.P.; Zhu, H. TDOA-Based Robust Sound Source Localization With Sparse Regularization in Wireless Acoustic Sensor Networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 1108–1123.
- Alexandridis, A.; Mouchtaris, A. Multiple Sound Source Location Estimation in Wireless Acoustic Sensor Networks Using DOA Estimates: The Data-Association Problem. IEEE/ACM Trans. Audio Speech Lang. Process. 2018, 26, 342–356.
- Xenaki, A.; Boldt, J.; Christensen, M. Sound source localization and speech enhancement with sparse Bayesian learning beamforming. J. Acoust. Soc. Am. 2018, 143, 3912–3921.
- Avots, E.; Vecvanags, A.; Filipovs, J.; Brauns, A.; Skudrins, G.; Done, G.; Ozolins, J.; Anbarjafari, G.; Jakovels, D. Towards Automated Detection and Localization of Red Deer Cervus elaphus Using Passive Acoustic Sensors during the Rut. Remote Sens. 2022, 14, 2464.
- Qiu, Y.; Li, B.; Huang, J.; Jiang, Y.; Wang, B.; Huang, Z. An Analytical Method for 3-D Sound Source Localization Based on a Five-Element Microphone Array. IEEE Trans. Instrum. Meas. 2022, 71, 1–14.
- Qin, B.; Zhang, H.; Fu, Q.; Yan, Y. Subsample time delay estimation via improved GCC PHAT algorithm. In Proceedings of the 2008 9th International Conference on Signal Processing, Beijing, China, 26–29 October 2008; pp. 2579–2582.
- Chung, M.A.; Lin, C.W.; Chou, H.C. Combined Multisensor-Based Angle Clipping Algorithm and Multichannel Noise Removal Method for Multichannel Sound Localization. IEEE Sens. J. 2024, 24, 700–709.
- Padois, T. Acoustic source localization based on the generalized cross-correlation and the generalized mean with few microphones. J. Acoust. Soc. Am. 2018, 143, EL393–EL398.
- Knapp, C.; Carter, G. The generalized correlation method for estimation of time delay. IEEE Trans. Acoust. Speech Signal Process. 1976, 24, 320–327.
- Schmidt, R. Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 1986, 34, 276–280.
- Li, X.; Ban, Y.; Girin, L.; Alameda-Pineda, X.; Horaud, R. Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments. IEEE J. Sel. Top. Signal Process. 2019, 13, 88–103.
- Li, X.; Girin, L.; Horaud, R.; Gannot, S. Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization. IEEE/ACM Trans. Audio Speech Lang. Process. 2016, 24, 2171–2186.
- Grobler, C.J.; Kruger, C.P.; Silva, B.J.; Hancke, G.P. Sound based localization and identification in industrial environments. In Proceedings of the IECON 2017—43rd Annual Conference of the IEEE Industrial Electronics Society, Beijing, China, 29 October–1 November 2017; pp. 6119–6124.
- Grondin, F.; Létourneau, D.; Godin, C.; Lauzon, J.S.; Vincent, J.; Michaud, S.; Faucher, S.; Michaud, F. ODAS: Open embeddeD Audition System. Front. Robot. AI 2022, 9, 854444.
- Zhuo, D.B.; Cao, H. Fast Sound Source Localization Based on SRP-PHAT Using Density Peaks Clustering. Appl. Sci. 2021, 11, 445.
- Kraljević, L.; Russo, M.; Stella, M.; Sikora, M. Free-Field TDOA-AOA Sound Source Localization Using Three Soundfield Microphones. IEEE Access 2020, 8, 87749–87761.
- Krause, D.A.; García-Barrios, G.; Politis, A.; Mesaros, A. Binaural Sound Source Distance Estimation and Localization for a Moving Listener. IEEE/ACM Trans. Audio Speech Lang. Process. 2024, 32, 996–1011.
- Thakur, S.; Singh, S. An Improved 3-D Sound Source Localization of Varied Sources Using Oblique Square Pyramid Array. IEEE Sens. J. 2024, 24, 1772–1783.
- Padois, T.; Berry, A. Two and Three-Dimensional Sound Source Localization with Beamforming and Several Deconvolution Techniques. Acta Acust. United Acust. 2017, 103, 392–400.
- Yang, X.; Xing, H.; Ji, X. Sound Source Omnidirectional Positioning Calibration Method Based on Microphone Observation Angle. Complexity 2018, 2018, 2317853.
- Fu, Y.; Ge, M.; Yin, H.; Qian, X.; Wang, L.; Zhang, G.; Dang, J. Iterative Sound Source Localization for Unknown Number of Sources. arXiv 2022, arXiv:2206.12273.
- Li, X.; Girin, L.; Horaud, R.; Gannot, S. Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization With Spatial Sparsity Regularization. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 1997–2012.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
- Kim, G.; Park, B.; Kim, A. 1-Day Learning, 1-Year Localization: Long-Term LiDAR Localization Using Scan Context Image. IEEE Robot. Autom. Lett. 2019, 4, 1948–1955.
- Zhang, S.; Xie, L.; Adams, M. Entropy based feature selection scheme for real time simultaneous localization and map building. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; pp. 1175–1180.
- Rickard, S.; Yilmaz, O. On the approximate W-disjoint orthogonality of speech. In Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA, 13–17 May 2002; Volume 1, pp. I-529–I-532.
- Yilmaz, O.; Rickard, S. Blind separation of speech mixtures via time-frequency masking. IEEE Trans. Signal Process. 2004, 52, 1830–1847.
- Sabine, W.C.; Egan, M.D. Collected Papers on Acoustics. J. Acoust. Soc. Am. 1994, 95, 3679–3680.
- Davidbain. carEngineStart.wav. 2024. License: Attribution 4.0; Freesound Website; Audio File. Available online: https://freesound.org/s/209864/ (accessed on 10 December 2024).
- Robinhood76. 00773 Leaking Gas 1.wav. 2024. License: Attribution NonCommercial 4.0; Freesound Website; Audio File. Available online: https://freesound.org/s/66248/ (accessed on 10 December 2024).
- Tosha73. Welding Machine.wav. 2024. License: Creative Commons 0; Freesound Website; Audio File. Available online: https://freesound.org/s/496210/ (accessed on 10 December 2024).
- Dobroide. 20060422.sewing.machine.wav. 2024. License: Attribution 4.0; Freesound Website; Audio File. Available online: https://freesound.org/s/18455/ (accessed on 10 December 2024).
- Nebyoolae. Sensor Beeps. 2024. License: Attribution 4.0; Freesound Website; Audio File. Available online: https://freesound.org/s/250285/ (accessed on 10 December 2024).
- GowlerMusic. Alarm Clock. 2024. License: Attribution 4.0; Freesound Website; Audio File. Available online: https://freesound.org/s/264863/ (accessed on 10 December 2024).
- Firoozabadi, A.D.; Irarrazaval, P.; Adasme, P.; Zabala-Blanco, D.; Palacios-Játiva, P.; Durney, H.; Sanhueza, M.; Azurdia-Meza, C. Three-dimensional sound source localization by distributed microphone arrays. In Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 23–27 August 2021; pp. 196–200.
| | x/m | y/m | z/m |
|---|---|---|---|
| | 3.6 | 0.6 | 1.0 |
| | 4.2 | 0.6 | 1.0 |
| | 3.6 | 0.6 | 2.0 |
| | 4.2 | 0.6 | 2.0 |
| Sound | 2.5–6.5 | 0.5–4.0 | 0.5–2.2 |
Name | Description | Number of Samples |
---|---|---|
Engine noises [45] | Car engine being started, left idling, then stopped. | 112 |
Gas leakage sound [46] | Hissing of leaking gas, recorded with a Zoom H2. | 107 |
Welding noises [47] | Noise of a welding machine. | 114 |
Sewing machine sounds [48] | Sound of an old sewing machine. | 98 |
Buzzer tones [49] | A weird sensor alarm that talks and beeps at you. | 113 |
Alarm signals [50] | Alarm clock sound effect recorded in Ableton Live. | 104 |
Method | |||||||
---|---|---|---|---|---|---|---|
RF | 0.993 | 0.989 | 0.993 | 0.993 | 0.983 | 0.856 | 0.849 |
GBR | 0.995 | 0.996 | 0.996 | 0.996 | 0.983 | 0.858 | 0.829 |
KRR | 0.997 | 0.997 | 0.997 | 0.994 | 0.985 | 0.895 | 0.882 |
RR | 0.993 | 0.995 | 0.993 | 0.989 | 0.982 | 0.804 | 0.761 |
SVR | 0.995 | 0.996 | 0.995 | 0.991 | 0.985 | 0.833 | 0.751 |
PGM | 0.991 | 0.988 | 0.989 | 0.701 | 0.811 | 0.691 | 0.688 |
Method | |||||
---|---|---|---|---|---|
RF | 0.846 | 0.855 | 0.996 | 0.792 | 0.844 |
GBR | 0.872 | 0.889 | 0.995 | 0.835 | 0.871 |
KRR | 0.879 | 0.859 | 0.993 | 0.830 | 0.838 |
RR | 0.669 | 0.614 | 0.961 | 0.457 | 0.581 |
SVR | 0.905 | 0.914 | 0.997 | 0.870 | 0.939 |
Method | () | ||
---|---|---|---|
RF | 0.996 | 0.853 | 0.840 |
GBR | 0.995 | 0.846 | 0.758 |
KRR | 0.993 | 0.895 | 0.813 |
RR | 0.961 | 0.562 | 0.484 |
SVR | 0.997 | 0.894 | 0.934 |
Input Combination | MAE (m) | ||
---|---|---|---|
, , , | 0.02 | 0.08 | 0.09 |
, , , | 0.04 | 0.27 | 0.41 |
, , , | 0.06 | 0.31 | 0.49 |
, , , ⊗ | 0.41 | 2.27 | 1.15 |
Method | 2D/3D | MEDE | Applicable to Reverberation | Applicable to Multiple Sound Sources | Number of Microphones |
---|---|---|---|---|---|
Thakur & Singh [34] | 3D | 19 cm | No | No | 5 |
Li et al. [38] | 2D | / | Yes | Yes | 2 |
Krause et al. [33] | 2D | 1.6 m | Yes | No | 2 |
Qiu et al. [21] | 3D | 3 cm | No | No | 5 |
Yang et al. [36] | 3D | 7 cm | No | No | 7 |
Wang et al. [10] | 2D | 10 cm | No | No | / |
Luo et al. [11] | 2D | / | No | No | 8 |
Dehghan Firoozabadi et al. [51] | 3D | 30–40 cm | No | Yes | 38 |
Proposed | 3D | 5–15 cm | Yes | Yes | 8 |
| | x/m | y/m | z/m |
|---|---|---|---|
| | 1.6 | 0.6 | 1.0 |
| | 2.2 | 0.6 | 1.0 |
| | 1.6 | 0.6 | 2.0 |
| | 2.2 | 0.6 | 2.0 |
| Sound | 0.6–3.0 | 0.5–2.0 | 0.6–2.1 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, G.; Zhao, F.; Tian, W.; Yang, T. Three-Dimensional Sound Source Localization with Microphone Array Combining Spatial Entropy Quantification and Machine Learning Correction. Entropy 2025, 27, 942. https://doi.org/10.3390/e27090942