Research on Traffic Sound Event Detection Based on Multi-Scale Feature Fusion
Abstract
1. Introduction
2. Feature Extraction Method
3. TSED-CNN Model
3.1. Sound Event Detection Model
3.2. Improvements to the Network Structure
3.2.1. Multi-Scale Convolutional Block
3.2.2. Hybrid Multi-Scale Attention Module
4. Experimental Analysis
4.1. Dataset
4.2. Experimental Environment and Evaluation Metrics
4.3. Analysis of Experimental Results
4.4. Ablation Test
4.5. Analysis and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

| Model | Accuracy/% | Precision/% | Recall/% | F1-Score | Param/MB |
|---|---|---|---|---|---|
| CNN6 | 94.176 | 94.279 | 94.186 | 0.94201 | 18.29 |
| CNN10 | 94.673 | 94.811 | 94.678 | 0.94689 | 19.82 |
| CNN14 | 95.384 | 95.390 | 95.389 | 0.95383 | 318.81 |
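
These tables report accuracy, precision, recall and F1 side by side, but this section does not show which averaging scheme produced them. There is one clue. The listed F1 is not the harmonic mean of the listed precision and recall. For CNN6, 2 × 94.279 × 94.186 / (94.279 + 94.186) ≈ 94.23%, while the table gives 0.94201. That gap fits macro averaging, where F1 is computed per class and then averaged, but it does not prove it. The sketch below assumes macro averaging and uses only the Python standard library. The function names `macro_metrics` and `harmonic_mean` and the toy labels are hypothetical and are not the authors' code.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Each per-class score is computed from true positives, predicted
    counts and true counts, then averaged with equal weight per class.
    Undefined ratios (a class never predicted or never present) are set
    to 0.0, a common convention assumed here.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    classes = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        rec = true_pos[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    accuracy = sum(true_pos.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def harmonic_mean(p, r):
    """F1-style harmonic mean of two scores."""
    return 2 * p * r / (p + r)


# Hypothetical toy labels, not taken from the paper's dataset.
y_true = ["car", "car", "horn", "horn"]
y_pred = ["car", "horn", "horn", "horn"]
acc, prec, rec, f1 = macro_metrics(y_true, y_pred)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))  # -> 0.75 0.8333 0.75 0.7333
print(round(harmonic_mean(prec, rec), 4))  # -> 0.7895

# CNN6 row from the baseline table: harmonic mean of its precision and recall (%).
print(round(harmonic_mean(94.279, 94.186), 2))  # -> 94.23
```

The harmonic mean is concave, so the mean of per-class F1 scores can never exceed the harmonic mean of the averaged precision and recall. This is the pattern seen in every row of the tables, and the toy example shows it as well (0.7333 against 0.7895).
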

| Model | Accuracy/% | Precision/% | Recall/% | F1-Score | Param/MB |
|---|---|---|---|---|---|
| CNN10 | 94.673 | 94.811 | 94.678 | 0.94689 | 19.82 |
| CNN10 + MSCB | 95.241 | 95.332 | 95.245 | 0.95247 | 20.59 |
| CNN10 + HMSA | 95.952 | 96.080 | 95.955 | 0.95980 | 19.87 |
| CNN10 + SA | 95.312 | 95.432 | 95.319 | 0.95340 | 19.82 |
| TSED-CNN | 96.378 | 96.394 | 96.380 | 0.96382 | 20.63 |
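
The ablation rows are easiest to read as deltas against the CNN10 baseline. The short script below copies the accuracy and Param/MB columns from the ablation table above and subtracts the baseline values. It adds no new measurements. The names `ABLATION` and `deltas_vs_baseline` are illustrative only.

```python
# Accuracy (%) and Param/MB values transcribed from the ablation table.
ABLATION = {
    "CNN10": (94.673, 19.82),
    "CNN10 + MSCB": (95.241, 20.59),
    "CNN10 + HMSA": (95.952, 19.87),
    "CNN10 + SA": (95.312, 19.82),
    "TSED-CNN": (96.378, 20.63),
}


def deltas_vs_baseline(table, baseline="CNN10"):
    """Return {model: (accuracy gain in pp, size change in MB, size change in %)}."""
    base_acc, base_size = table[baseline]
    result = {}
    for name, (acc, size) in table.items():
        if name == baseline:
            continue  # the baseline has no delta against itself
        result[name] = (
            round(acc - base_acc, 3),
            round(size - base_size, 2),
            round(100 * (size - base_size) / base_size, 1),
        )
    return result


for model, (d_acc, d_mb, d_pct) in deltas_vs_baseline(ABLATION).items():
    print(f"{model:14s} +{d_acc:.3f} pp  +{d_mb:.2f} MB ({d_pct:+.1f}%)")
```

These numbers show three things. HMSA gives the largest single-module gain (+1.279 pp) for about 0.05 MB. MSCB accounts for most of the full model's size increase (+0.77 of +0.81 MB, about +4.1% overall). The gain of TSED-CNN (+1.705 pp) is smaller than the sum of the two individual gains (0.568 + 1.279 = 1.847 pp), so the two modules' effects are not simply additive.
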

| Model | Accuracy/% | Precision/% | Recall/% | F1-Score | Param/MB |
|---|---|---|---|---|---|
| Res2Net | 94.602 | 94.672 | 94.608 | 0.94599 | 22.46 |
| CAM++ | 95.526 | 95.591 | 95.533 | 0.95525 | 28.73 |
| TDNN | 94.176 | 94.204 | 94.181 | 0.94184 | 11.09 |
| EcapaTDNN | 94.815 | 94.828 | 94.823 | 0.94802 | 24.78 |
| TSED-CNN | 96.378 | 96.394 | 96.380 | 0.96382 | 20.63 |
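
The comparison table trades accuracy against model size. One way to summarize it is a Pareto check. This assumes only these two columns matter, since latency and FLOPs are not reported here. Under the check, a model is dominated if another model is at least as accurate and no larger, and strictly better on at least one of the two. The sketch below applies that rule to the values copied from the table. `COMPARISON` and `pareto_front` are illustrative helpers and are not part of the paper.

```python
# Accuracy (%) and Param/MB values transcribed from the comparison table.
COMPARISON = {
    "Res2Net": (94.602, 22.46),
    "CAM++": (95.526, 28.73),
    "TDNN": (94.176, 11.09),
    "EcapaTDNN": (94.815, 24.78),
    "TSED-CNN": (96.378, 20.63),
}


def pareto_front(models):
    """Return sorted names of models not dominated on (higher accuracy, smaller size)."""
    front = []
    for name, (acc, size) in models.items():
        dominated = any(
            other != name
            and o_acc >= acc
            and o_size <= size
            and (o_acc > acc or o_size < size)  # strictly better on at least one axis
            for other, (o_acc, o_size) in models.items()
        )
        if not dominated:
            front.append(name)
    return sorted(front)


print(pareto_front(COMPARISON))  # -> ['TDNN', 'TSED-CNN']
```

On these values, TSED-CNN dominates Res2Net, CAM++ and EcapaTDNN. TDNN stays on the front because it is about half the size (11.09 MB against 20.63 MB), at the cost of 2.202 pp lower accuracy.
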
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zheng, Y.; Yao, L. Research on Traffic Sound Event Detection Based on Multi-Scale Feature Fusion. Appl. Sci. 2026, 16, 2359. https://doi.org/10.3390/app16052359
