Enhancing Emergency Vehicle Detection: A Deep Learning Approach with Multimodal Fusion
Abstract
1. Introduction
- This study proposes a stacking ensemble method to fuse acoustic and visual information, aiming to improve emergency vehicle detection under adverse conditions.
- This study incorporates a multi-level spatial fusion technique into YOLO to capture the deep-level semantic information required for multimodal fusion.
- This study proposes an attention-based temporal spectrum network to aid in extracting semantic features for siren sound classification.
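The difference between hard and soft fusion of the two modalities can be illustrated with a minimal sketch. It is not the stacking ensemble itself: a stacking meta-learner would learn how to combine the base outputs from data, whereas here the combination is a fixed vote. The probabilities, the equal weights, the 0.5 threshold, and the function names `hard_vote` and `soft_vote` are all hypothetical.

```python
from typing import Sequence, Tuple


def hard_vote(labels: Sequence[int]) -> int:
    """Majority vote over binary labels; a tie resolves to 1 (emergency vehicle)."""
    positives = sum(labels)
    return 1 if 2 * positives >= len(labels) else 0


def soft_vote(probs: Sequence[float], weights: Sequence[float],
              threshold: float = 0.5) -> Tuple[float, int]:
    """Weighted average of class probabilities, then a single threshold."""
    fused = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
    return fused, int(fused >= threshold)


# Hypothetical outputs: the visual detector is unsure (e.g. fog),
# while the acoustic classifier hears a clear siren.
p_visual = 0.35
p_audio = 0.90

# Hard fusion only sees the thresholded labels [0, 1].
hard = hard_vote([int(p_visual >= 0.5), int(p_audio >= 0.5)])

# Soft fusion keeps the confidence of each modality.
fused, soft = soft_vote([p_visual, p_audio], weights=[0.5, 0.5])

print(hard, round(fused, 3), soft)
```

Hard fusion discards how confident each model was, so the result here depends entirely on the tie-breaking rule. Soft fusion lets a confident modality outweigh an uncertain one.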
2. Materials and Methods
2.1. Detection of Emergency Vehicles Using Visual Cues
Proposing the MLSF-YOLO Model for Visual-Based EVD
2.2. Classifying Emergency Vehicles Using Acoustic Signals
2.2.1. Proposed Framework for Acoustic Classification
2.2.2. Feature Processing
2.2.3. Harmonic–Percussive Source Separation
2.2.4. Creating a Time–Frequency Attention Mechanism
2.2.5. Network Structure
2.2.6. Decision Approach
2.3. Harmonizing Hard and Soft Predictions
2.3.1. Hard Prediction
2.3.2. Soft Prediction
3. Experiments and Results
3.1. Hyperparameters Used
3.2. Data for Visual EVD Experiments
3.3. Data for Acoustic EVD Experiments
3.4. Visual EVD Experiments
Visual EVD with Different Object Detectors
3.5. Acoustic EVD Experiments
Results of Attention-Based Temporal Spectrum Network
3.6. Results of Multi-Level Spatial Fusion YOLO
3.7. Ensemble Learning Results
4. Conclusions
5. Limitations and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ghazi, M.U.; Khattak, M.A.K.; Shabir, B.; Malik, A.W.; Ramzan, M.S. Emergency message dissemination in vehicular networks: A review. IEEE Access 2020, 8, 38606–38621.
- Damaševičius, R.; Bacanin, N.; Misra, S. From sensors to safety: Internet of Emergency Services (IoES) for emergency response and disaster management. J. Sens. Actuator Netw. 2023, 12, 41.
- Wang, X.; Liu, Q.; Guo, F.; Xu, X.; Chen, X. Causation analysis of crashes and near crashes using naturalistic driving data. Accid. Anal. Prev. 2022, 177, 106821.
- Razalli, H.; Ramli, R.; Alkawaz, M.H. Emergency vehicle recognition and classification method using HSV color segmentation. In Proceedings of the 2020 16th IEEE International Colloquium on Signal Processing & Its Applications (CSPA), Langkawi, Malaysia, 28–29 February 2020; pp. 284–289.
- Sarda, A.; Dixit, S.; Bhan, A. Object detection for autonomous driving using yolo [you only look once] algorithm. In Proceedings of the 2021 Third IEEE International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), Tirunelveli, India, 4–6 February 2021; pp. 1370–1374.
- Kherraki, A.; El Ouazzani, R. Deep convolutional neural networks architecture for an efficient emergency vehicle classification in real-time traffic monitoring. IAES Int. J. Artif. Intell. 2022, 11, 110.
- Sorour, S.E.; Hany, A.A.; Elredeny, M.S.; Sedik, A.; Hussien, R.M. An Automatic Dermatology Detection System Based on Deep Learning and Computer Vision. IEEE Access 2023, 11, 137769–137778.
- Goel, S.; Baghel, A.; Srivastava, A.; Tyagi, A.; Nagrath, P. Detection of emergency vehicles using modified YOLO algorithm. In Proceedings of the Intelligent Communication, Control and Devices (ICICCD 2018); Springer: Berlin/Heidelberg, Germany, 2020; pp. 671–687.
- Berwo, M.A.; Khan, A.; Fang, Y.; Fahim, H.; Javaid, S.; Mahmood, J.; Abideen, Z.U.; Syam, M.S. Deep Learning Techniques for Vehicle Detection and Classification from Images/Videos: A Survey. Sensors 2023, 23, 4832.
- Baghel, A.; Srivastava, A.; Tyagi, A.; Goel, S.; Nagrath, P. Analysis of Ex-YOLO algorithm with other real-time algorithms for emergency vehicle detection. In Proceedings of the First International Conference on Computing, Communications, and Cyber-Security (IC4S 2019); Springer: Berlin/Heidelberg, Germany, 2020; pp. 607–618.
- Farid, A.; Hussain, F.; Khan, K.; Shahzad, M.; Khan, U.; Mahmood, Z. A Fast and Accurate Real-Time Vehicle Detection Method Using Deep Learning for Unconstrained Environments. Appl. Sci. 2023, 13, 3059.
- Pan, M.; Liu, Y.; Cao, J.; Li, Y.; Li, C.; Chen, C.H. Visual recognition based on deep learning for navigation mark classification. IEEE Access 2020, 8, 32767–32775.
- Tahir, N.U.A.; Zhang, Z.; Asim, M.; Chen, J.; ELAffendi, M. Object Detection in Autonomous Vehicles under Adverse Weather: A Review of Traditional and Deep Learning Approaches. Algorithms 2024, 17, 103.
- Tran, V.T.; Tsai, W.H. Acoustic-based emergency vehicle detection using convolutional neural networks. IEEE Access 2020, 8, 75702–75713.
- Pramanick, D.; Ansar, H.; Kumar, H.; Pranav, S.; Tengshe, R.; Fatimah, B. Deep learning based urban sound classification and ambulance siren detector using spectrogram. In Proceedings of the 2021 12th IEEE International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 6–8 July 2021; pp. 1–6.
- Fatimah, B.; Preethi, A.; Hrushikesh, V.; Singh, A.; Kotion, H.R. An automatic siren detection algorithm using Fourier Decomposition Method and MFCC. In Proceedings of the 2020 11th IEEE International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–6.
- Mateen, A.; Hanif, M.Z.; Khatri, N.; Lee, S.; Nam, S.Y. Smart roads for autonomous accident detection and warnings. Sensors 2022, 22, 2077.
- Tang, M.; Zhao, Q.; Ding, S.X.; Wu, H.; Li, L.; Long, W.; Huang, B. An improved lightGBM algorithm for online fault detection of wind turbine gearboxes. Energies 2020, 13, 807.
- Mu, W.; Yin, B.; Huang, X.; Xu, J.; Du, Z. Environmental sound classification using temporal-frequency attention based convolutional neural network. Sci. Rep. 2021, 11, 21552.
- Mahlous, A.R. Cyber security challenges in self-driving cars. Comput. Fraud Secur. 2022, 1873–7056.
- Li, Q.; Garg, S.; Nie, J.; Li, X.; Liu, R.W.; Cao, Z.; Hossain, M.S. A highly efficient vehicle taillight detection approach based on deep learning. IEEE Trans. Intell. Transp. Syst. 2020, 22, 4716–4726.
- Yu, J.; Zhang, W. Face mask wearing detection algorithm based on improved YOLO-v4. Sensors 2021, 21, 3263.
- Wu, D.; Lv, S.; Jiang, M.; Song, H. Using channel pruning-based YOLO v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments. Comput. Electron. Agric. 2020, 178, 105742.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Li, S.; Li, Y.; Li, Y.; Li, M.; Xu, X. Yolo-firi: Improved yolov5 for infrared image object detection. IEEE Access 2021, 9, 141861–141875.
- Huang, Z.; Wang, J.; Fu, X.; Yu, T.; Guo, Y.; Wang, R. DC-SPP-YOLO: Dense connection and spatial pyramid pooling based YOLO for object detection. Inf. Sci. 2020, 522, 241–258.
- Hu, X.; Liu, Y.; Zhao, Z.; Liu, J.; Yang, X.; Sun, C.; Chen, S.; Li, B.; Zhou, C. Real-time detection of uneaten feed pellets in underwater images for aquaculture using an improved YOLO-V4 network. Comput. Electron. Agric. 2021, 185, 106135.
- Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157.
- Ansari, S.; Alnajjar, K.A.; Khater, T.; Mahmoud, S.; Hussain, A. A Robust Hybrid Neural Network Architecture for Blind Source Separation of Speech Signals Exploiting Deep Learning. IEEE Access 2023, 11, 100414–100437.
- Rehman, A.; Alam, T.; Mujahid, M.; Alamri, F.S.; Al Ghofaily, B.; Saba, T. RDET stacking classifier: A novel machine learning based approach for stroke prediction using imbalance data. PeerJ Comput. Sci. 2023, 9, e1684.
- Golchoubian, M.; Ghafurian, M.; Dautenhahn, K.; Azad, N.L. Pedestrian trajectory prediction in pedestrian-vehicle mixed environments: A systematic review. IEEE Trans. Intell. Transp. Syst. 2023, 24, 11544–11567.
- Guzhov, A.; Raue, F.; Hees, J.; Dengel, A. Audioclip: Extending clip to image, text and audio. In Proceedings of the ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 976–980.
- Nanni, L.; Maguolo, G.; Brahnam, S.; Paci, M. An ensemble of convolutional neural networks for audio classification. Appl. Sci. 2021, 11, 5796.
- Gatto, R.C.; Forster, C.H.Q. Audio-based machine learning model for traffic congestion detection. IEEE Trans. Intell. Transp. Syst. 2020, 22, 7200–7207.
- Abdallah, M.; An Le Khac, N.; Jahromi, H.; Delia Jurcut, A. A hybrid CNN-LSTM based approach for anomaly detection systems in SDNs. In Proceedings of the 16th International Conference on Availability, Reliability and Security, Vienna, Austria, 17–20 August 2021; pp. 1–7.
- Kaushik, S.; Raman, A.; Rao, K.R. Leveraging computer vision for emergency vehicle detection-implementation and analysis. In Proceedings of the 2020 11th IEEE International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–6.
- Raj, V.S.; Sai, J.V.M.; Yogesh, N.L.; Preetha, S.K.; Lavanya, R. Smart Traffic Control for Emergency Vehicles Prioritization using Video and Audio Processing. In Proceedings of the 2022 6th IEEE International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 25–27 May 2022; pp. 1588–1593.
- Shatnawi, M.; Audat, A.; Saraireh, M. Intelligent Requirements Engineering: Applying Machine Learning for Requirements Classification. In Proceedings of the 2023 14th IEEE International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 21–23 November 2023; pp. 1–6.
- Zhao, J.; Hao, S.; Dai, C.; Zhang, H.; Zhao, L.; Ji, Z.; Ganchev, I. Improved vision-based vehicle detection and classification by optimized YOLOv4. IEEE Access 2022, 10, 8590–8603.
- Zhao, Y.; Zhao, H.; Zhang, X.; Liu, W. Vehicle classification based on audio-visual feature fusion with low-quality images and noise. J. Intell. Fuzzy Syst. 2023, 45, 1–14.
- Jiang, K.; Su, D.; Zheng, Y. Intelligent acquisition model of traffic congestion information in the vehicle networking environment based on multi-sensor fusion. Int. J. Veh. Inf. Commun. Syst. 2019, 4, 155–169.
- Al-Batat, R.; Angelopoulou, A.; Premkumar, S.; Hemanth, J.; Kapetanios, E. An end-to-end automated license plate recognition system using YOLO based vehicle and license plate detection with vehicle classification. Sensors 2022, 22, 9477.
- Middya, A.I.; Nag, B.; Roy, S. Deep learning based multimodal emotion recognition using model-level fusion of audio–visual modalities. Knowl.-Based Syst. 2022, 244, 108580.
- Bolboacă, R. Adaptive ensemble methods for tampering detection in automotive aftertreatment systems. IEEE Access 2022, 10, 105497–105517.
Model | Hyperparameters | Optimizer | Activation Function |
---|---|---|---|
ATSN | Loss function: BCE; learning rate: 0.001; regularization: dropout | Adam | ReLU |
MLSF-YOLO | Batch size: 16; learning rate: 0.01; weight decay: 0.0005 | SGD with momentum | Mish |
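For readers unfamiliar with the entries in this table, the sketch below writes out the standard textbook definitions of the two activation functions (ReLU and Mish) and of binary cross-entropy (BCE) in plain Python. It only illustrates the formulas. The function names and the clipping constant `eps` are assumptions made for the example, not details taken from the models.

```python
import math


def relu(x: float) -> float:
    """ReLU: passes positive inputs, zeroes out negative ones."""
    return max(0.0, x)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x)); smooth, and slightly negative for x < 0."""
    return x * math.tanh(softplus(x))


def binary_cross_entropy(y_true: int, p: float, eps: float = 1e-7) -> float:
    """BCE for one sample; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


for x in (-1.0, 0.0, 1.0):
    print(x, relu(x), round(mish(x), 4))

print(round(binary_cross_entropy(1, 0.5), 4))
```

Unlike ReLU, Mish lets a small negative signal through for negative inputs.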
Division | Data Collection | Frame Count | EV Samples | NEV Samples | Average Instances per Frame |
---|---|---|---|---|---|
Training | VEVD | 11,930 | 12,705 | 24,522 | 3.12 |
Training | KITTI | 3995 | 0 | 18,960 | 4.74 |
Validation | VEVD | 3902 | 3998 | 6519 | 2.43 |
Validation | KITTI | 998 | 0 | 5750 | 5.77 |
Testing | VEVD | 4008 | 4432 | 7230 | 3.41 |
Testing | KITTI | 1349 | 0 | 6448 | 4.78 |
Total | VEVD | 19,144 | 23,448 | 39,895 | 3.25 |
Total | KITTI | 6850 | 0 | 32,752 | 4.78 |
Total | VEVD+KITTI | 25,994 | 23,448 | 72,647 | 4.45 |
Subset | Noise | Siren Sound | Total |
---|---|---|---|
Training | 11,020 | 5290 | 16,310 |
Validation | 3518 | 1744 | 5262 |
Test | 3409 | 1724 | 5133 |
Total | 17,947 | 8758 | 26,705 |
Detector | Architecture | Resolution | mAP@[0.5:0.95] | mAP@0.5 | mAP@0.75 | Time Cost (ms) |
---|---|---|---|---|---|---|
SSD-300 | VGG-16 | 300 × 300 | 51.96 | 83.1 | 63.9 | 19.8 |
SSD-512 | VGG-16 | 512 × 512 | 54.98 | 84.8 | 64.8 | 39.8 |
YOLOv4 | CSPDarknet-53 | 512 × 512 | 67.8 | 92.7 | 81.1 | 3.5 |
YOLOv4 | CSPDarknet-53 | 608 × 608 | 68.9 | 92.8 | 81.9 | 4.3 |
YOLOv3 | Darknet-53 | 512 × 512 | 67.1 | 90.9 | 80.1 | 5.9 |
YOLOv3 | Darknet-53 | 608 × 608 | 67.1 | 90.8 | 80.7 | 7.8 |
EfficientDet-D0 | EfficientNet-B0 | 512 × 512 | 61.9 | 86.7 | 73.2 | 26.1 |
EfficientDet-D1 | EfficientNet-B1 | 640 × 640 | 62.7 | 87.8 | 74.9 | 29.7 |
EfficientDet-D2 | EfficientNet-B2 | 768 × 768 | 63.9 | 89.8 | 78.3 | 35.2 |
EfficientDet-D3 | EfficientNet-B3 | 896 × 896 | 67.0 | 91.8 | 81.5 | 44.1 |
EfficientDet-D4 | EfficientNet-B4 | 1024 × 1024 | 66.3 | 91.0 | 80.7 | 67.3 |
MLSF-YOLO | CSPDarknet-53 | 608 × 608 | 71.1 | 95.2 | 86.2 | 4.9 |
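The three mAP columns differ only in the intersection-over-union (IoU) threshold at which a predicted box counts as a match. In the common COCO-style convention, mAP@[0.5:0.95] averages over ten thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes the IoU for one hypothetical box pair and lists the thresholds it would satisfy. The box coordinates are invented for illustration, and the exact evaluation protocol behind the table is assumed rather than confirmed.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds: 0.50, 0.55, ..., 0.95
thresholds: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical prediction shifted 2 px to the right of the ground truth.
ground_truth: Box = (0.0, 0.0, 10.0, 10.0)
prediction: Box = (2.0, 0.0, 12.0, 10.0)

overlap = iou(prediction, ground_truth)
matched = [t for t in thresholds if overlap >= t]

print(round(overlap, 3), matched)
```

A detection that matches at 0.5 can still be rejected at 0.75, which is why mAP@0.75 and mAP@[0.5:0.95] are lower than mAP@0.5 for every detector in the table.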
Input Length | Original Data | +20 dB | +10 dB | 0 dB | −10 dB | −20 dB | −30 dB | Time |
---|---|---|---|---|---|---|---|---|
1.5 s | 93.47 | 93.76 | 93.75 | 93.37 | 92.19 | 93.32 | 91.58 | 8 ms |
1.0 s | 93.24 | 93.68 | 93.68 | 93.17 | 93.46 | 93.19 | 89.24 | 7 ms |
0.5 s | 93.29 | 92.81 | 93.47 | 91.75 | 91.88 | 91.39 | 89.32 | 4 ms |
0.25 s | 93.19 | 92.47 | 92.29 | 93.42 | 90.72 | 91.27 | 88.12 | 2 ms |
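Assuming the dB column headings are signal-to-noise ratios (SNR) at which noise was mixed into the clips, the sketch below shows one common way to scale a noise signal so that the mixture reaches a target SNR. The synthetic sine "siren", the Gaussian noise, the 8 kHz sample rate, and the function names are hypothetical stand-ins, not the data or procedure used in the experiments.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power equals the target SNR."""
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled_noise)]
    return mixture, scaled_noise


def measure_snr_db(signal, noise):
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


random.seed(0)
sample_rate = 8000  # hypothetical
# 0.25 s of a 700 Hz tone standing in for a siren clip.
signal = [math.sin(2 * math.pi * 700 * t / sample_rate)
          for t in range(sample_rate // 4)]
noise = [random.gauss(0.0, 1.0) for _ in signal]

mixture, scaled = mix_at_snr(signal, noise, snr_db=-10.0)
print(round(measure_snr_db(signal, scaled), 6))
```

At −10 dB the noise carries ten times the power of the signal, and at −30 dB a thousand times.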
Class | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
1 | 93.47% | 0.93 | 0.94 | 0.93 |
2 | 92.98% | 0.94 | 0.93 | 0.94 |
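The four metrics in this table follow from the counts of a binary confusion matrix. The sketch below spells out the standard definitions. The counts are invented for illustration and are not the counts behind the table.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 90 sirens found, 10 false alarms, 5 sirens missed.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall and sits closer to the smaller of the two.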
Resolution (h × w) | mAP@[0.5:0.95] | mAP@0.5 | mAP@0.75 | Time (ms) |
---|---|---|---|---|
320 × 320 | 66.9 | 93.4 | 81.1 | 2.2 |
416 × 416 | 69.2 | 94.5 | 84.3 | 3.1 |
512 × 512 | 69.8 | 94.8 | 84.7 | 4.2 |
608 × 608 | 71.1 | 95.2 | 86.2 | 4.9 |
640 × 640 | 71.1 | 94.9 | 85.9 | 5.0 |
768 × 768 | 71.0 | 94.7 | 86.2 | 6.9 |
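To read the time column as throughput, the per-frame latency can be converted to frames per second. The sketch below does this for the values in the table, under the assumption that the reported time is the per-frame inference cost alone, with no image loading or post-processing overhead.

```python
# Per-frame latency in milliseconds, keyed by square input resolution (from the table).
latency_ms = {320: 2.2, 416: 3.1, 512: 4.2, 608: 4.9, 640: 5.0, 768: 6.9}


def frames_per_second(ms_per_frame: float) -> float:
    """Throughput implied by a per-frame latency given in milliseconds."""
    return 1000.0 / ms_per_frame


fps = {res: frames_per_second(ms) for res, ms in latency_ms.items()}

for res, value in fps.items():
    print(f"{res} x {res}: {value:.1f} FPS")
```

Under that assumption, 4.9 ms per frame at 608 × 608 corresponds to roughly 204 frames per second.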
Study | Method | Accuracy | Limitations |
---|---|---|---|
Ref. [36] | RCNN | 92% | Lighting challenges |
Ref. [37] | YOLOv5 | 88% | High computation cost |
Ref. [38] | CNN | 85% | Difficulty in classification |
Ref. [39] | YOLOv4_AF | 83% | Limited accuracy compared to other models |
Ref. [25] | YOLOv4_FIRI | 94% | Slow speed and limited adaptability |
Proposed | MLSF-YOLO | 95% | Lacks practical implications |
Class | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
1 | 96.19% | 0.97 | 0.95 | 0.97 |
2 | 95.93% | 0.95 | 0.97 | 0.96 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zohaib, M.; Asim, M.; ELAffendi, M. Enhancing Emergency Vehicle Detection: A Deep Learning Approach with Multimodal Fusion. Mathematics 2024, 12, 1514. https://doi.org/10.3390/math12101514