Urban Road Anomaly Monitoring Using Vision–Language Models for Enhanced Safety Management
Abstract
1. Introduction
2. Methods
2.1. Data Collection and Annotation
2.2. Model Selection
2.3. Training and Optimization
2.3.1. URA-VLMs
2.3.2. ResNet34
2.4. Performance Evaluation
2.4.1. Model Performance
- Road anomaly classification
- Road waterlogging depth estimation
- Road safety level
2.4.2. Real-Time System Architecture
- Response time
- Robustness under diverse environmental conditions
3. Results
3.1. Data Annotation
3.2. Model Classification Performance Comparison
3.3. Performance Evaluation of URA-VLMs
3.3.1. Model Performance
- Road anomaly classification
- Road waterlogging depth estimation
- Road safety level
3.3.2. Real-Time Performance
- Response time
- Robustness under diverse environmental conditions
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A

Category | Component/Feature | Height (mm) | Description |
---|---|---|---|
Buses, coaches, and large trucks | Overall Height | 3000–3500 | The total height of the bus |
Buses, coaches, and large trucks | Wheel Height | 900–1000 | Height of the bus wheels |
Cars | Overall Height | 1400–1600 | The total height of the car |
Cars | Wheel Height | 600–800 | Height of the car wheels |
Bicycle | Overall Height | 1000 | The total height of a bicycle |
Bicycle | Pedal Height | 120–150 | Height of the pedals from the ground |
Bicycle | Tire Height | 650–700 | Height of the tires |
Bicycle | Tire Thickness | 30 | Thickness of the tire |
People (adult measurements) | Ankle Height | 70–90 | Height of an adult’s ankle |
People (adult measurements) | Knee Height | 400–500 | Height of an adult’s knee |
People (adult measurements) | Hip Height | 800–1000 | Height of an adult’s hip |
Other facilities | Traffic Cone Height | 750 | Height of a traffic cone used to direct vehicle flow and improve safety |
Other facilities | Railing Top Rail Height | 900–1200 | Height of the top rail |
Other facilities | Footpath (Curb/Sidewalk) Height | 50–150 | Height of curbs and sidewalks |
Other facilities | Sign Post Base Height | 800 | Base height of a sign post |
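
The reference heights above lend themselves to a simple lookup: if an image shows water reaching a known feature, that feature's height range brackets the water depth. The sketch below is a hypothetical helper, not the authors' method; the dictionary keys and `depth_from_reference` are illustrative names, and only a subset of the Appendix A rows is encoded.

```python
# Reference heights in millimetres, copied from a subset of the Appendix A rows.
# The keys are hypothetical identifiers chosen for this sketch.
REFERENCE_HEIGHTS_MM = {
    "curb": (50, 150),
    "adult_ankle": (70, 90),
    "bicycle_pedal": (120, 150),
    "adult_knee": (400, 500),
    "car_wheel": (600, 800),
    "adult_hip": (800, 1000),
    "bus_wheel": (900, 1000),
}


def depth_from_reference(feature: str, fraction: float = 1.0) -> tuple:
    """Return (low, high, midpoint) water depth in mm when the water reaches
    `fraction` of the named feature's height (1.0 = water at its top)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    low, high = REFERENCE_HEIGHTS_MM[feature]
    low_d, high_d = low * fraction, high * fraction
    return low_d, high_d, (low_d + high_d) / 2


print(depth_from_reference("adult_knee"))      # (400.0, 500.0, 450.0)
print(depth_from_reference("car_wheel", 0.5))  # (300.0, 400.0, 350.0)
```

Returning an interval rather than a single value keeps the uncertainty of the reference object visible; the midpoint is only a convenient point estimate.
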
References
1. WHO. Global Status Report on Road Safety 2023; World Health Organization: Geneva, Switzerland, 2023.
2. Silva, N.; Soares, J.; Shah, V.; Santos, M.Y.; Rodrigues, H. Anomaly Detection in Roads with a Data Mining Approach. Procedia Comput. Sci. 2017, 121, 415–422.
3. Xing, H.; Yang, F.; Qiao, X.; Li, F.; Huang, X. Enhanced End-to-End Regression Algorithm for Autonomous Road Damage Detection. J. Supercomput. 2025, 81, 380.
4. UNDRR Annual Report (2021); United Nations Office for Disaster Risk Reduction: Geneva, Switzerland, 2021.
5. Devitt, L.; Neal, J.; Coxon, G.; Savage, J.; Wagener, T. Flood Hazard Potential Reveals Global Floodplain Settlement Patterns. Nat. Commun. 2023, 14, 2801.
6. Lin, Z.; Guan, S.; Zhang, W.; Zhang, H.; Li, Y.; Zhang, H. Towards Trustworthy LLMs: A Review on Debiasing and Dehallucinating in Large Language Models. Artif. Intell. Rev. 2024, 57, 243.
7. The Office of the National Disaster Prevention, Reduction and Relief Committee. Natural Disasters in China in the First Half of 2024; The Office of the National Disaster Prevention, Reduction and Relief Committee: Beijing, China, 2024.
8. Garcia, V.M.; Granados, R.P.; Medina, M.E.; Ochoa, L.; Mondragon, O.A.; Cheu, R.L.; Villanueva-Rosales, N.; Rosillo, V.M.L. Management of Real-Time Data for a Smart Flooding Alert System. In Proceedings of the 2020 IEEE International Smart Cities Conference (ISC2), Piscataway, NJ, USA, 28 September–1 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–8.
9. Li, Z.; Wang, C.; Emrich, C.T.; Guo, D. A Novel Approach to Leveraging Social Media for Rapid Flood Mapping: A Case Study of the 2015 South Carolina Floods. Cart. Geogr. Inf. Sci. 2018, 45, 97–110.
10. Matini, N.; Qiao, Y.; Sias, J.E. Development of Time–Depth–Damage Functions for Flooded Flexible Pavements. J. Transp. Eng. Part B Pavements 2022, 148, 04022011.
11. Henao Salgado, M.J.; Zambrano Nájera, J. Assessing Flood Early Warning Systems for Flash Floods. Front. Clim. 2022, 4, 787042.
12. Kuller, M.; Schoenholzer, K.; Lienert, J. Creating Effective Flood Warnings: A Framework from a Critical Review. J. Hydrol. 2021, 602, 126708.
13. Chaudhary, P.; D’Aronco, S.; Leitão, J.P.; Schindler, K.; Wegner, J.D. Water Level Prediction from Social Media Images with a Multi-Task Ranking Approach. ISPRS J. Photogramm. Remote Sens. 2020, 167, 252–262.
14. Shah, S.A.; Seker, D.Z.; Hameed, S.; Draheim, D. The Rising Role of Big Data Analytics and IoT in Disaster Management: Recent Advances, Taxonomy and Prospects. IEEE Access 2019, 7, 54595–54614.
15. Lopez, T.; Al Bitar, A.; Biancamaria, S.; Güntner, A.; Jäggi, A. On the Use of Satellite Remote Sensing to Detect Floods and Droughts at Large Scales. Surv. Geophys. 2020, 41, 1461–1487.
16. Munawar, H.S.; Hammad, A.W.A.; Waller, S.T. Remote Sensing Methods for Flood Prediction: A Review. Sensors 2022, 22, 960.
17. Soomro, S.; Boota, M.W.; Zwain, H.M.; Soomro, G.-Z.; Shi, X.; Guo, J.; Li, Y.; Tayyab, M.; Aamir Soomro, M.H.A.; Hu, C.; et al. How Effective Is Twitter (X) Social Media Data for Urban Flood Management? J. Hydrol. 2024, 634, 131129.
18. Li, J.; Cai, R.; Tan, Y.; Zhou, H.; Sadick, A.-M.; Shou, W.; Wang, X. Automatic Detection of Actual Water Depth of Urban Floods from Social Media Images. Measurement 2023, 216, 112891.
19. Pathan, A.I. An IoT and AI Based Flood Monitoring and Rescue System. Int. J. Eng. Res. 2020, 9, 564–567.
20. Benoudjit, A.; Guida, R. A Novel Fully Automated Mapping of the Flood Extent on SAR Images Using a Supervised Classifier. Remote Sens. 2019, 11, 779.
21. Campolo, M.; Andreussi, P.; Soldati, A. River Flood Forecasting with a Neural Network Model. Water Resour. Res. 1999, 35, 1191–1197.
22. Rani, D.S.; Jayalakshmi, G.N.; Baligar, V.P. Low Cost IoT Based Flood Monitoring System Using Machine Learning and Neural Networks: Flood Alerting and Rainfall Prediction. In Proceedings of the 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, 5–7 March 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 261–267.
23. Wang, P.; Hu, Y.; Dai, Y.; Tian, M. Asphalt Pavement Pothole Detection and Segmentation Based on Wavelet Energy Field. Math. Probl. Eng. 2017, 2017, 1604130.
24. Banharnsakun, A. Hybrid ABC-ANN for Pavement Surface Distress Detection and Classification. Int. J. Mach. Learn. Cybern. 2017, 8, 699–710.
25. Shahabi, H.; Shirzadi, A.; Ghaderi, K.; Omidvar, E.; Al-Ansari, N.; Clague, J.J.; Geertsema, M.; Khosravi, K.; Amini, A.; Bahrami, S.; et al. Flood Detection and Susceptibility Mapping Using Sentinel-1 Remote Sensing Data and a Machine Learning Approach: Hybrid Intelligence of Bagging Ensemble Based on K-Nearest Neighbor Classifier. Remote Sens. 2020, 12, 266.
26. Alizadeh Kharazi, B.; Behzadan, A.H. Flood Depth Mapping in Street Photos with Image Processing and Deep Neural Networks. Comput. Environ. Urban Syst. 2021, 88, 101628.
27. Prakash, G.; Gupta, P.K.; Rao, G.V.; Pratap, D. Flood Inundation Mapping and Depth Modelling Using Machine Learning Algorithms and Microwave Data. J. Geomat. 2021, 15, 221–229.
28. Hocini, N.; Payrastre, O.; Bourgin, F.; Gaume, E.; Davy, P.; Lague, D.; Poinsignon, L.; Pons, F. Performance of Automated Methods for Flash Flood Inundation Mapping: A Comparison of a Digital Terrain Model (DTM) Filling and Two Hydrodynamic Methods. Hydrol. Earth Syst. Sci. 2021, 25, 2979–2995.
29. Song, S.; Zhang, C.; Zhang, P.; Li, P.; Song, F.; Zhang, L. Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter. In Proceedings of the ECCV 2024, Milan, Italy, 29 September–4 October 2024.
30. Munappy, A.R.; Bosch, J.; Olsson, H.H.; Arpteg, A.; Brinne, B. Data Management for Production Quality Deep Learning Models: Challenges and Solutions. J. Syst. Softw. 2022, 191, 111359.
31. Iivanainen, S.; Lagus, J.; Viertolahti, H.; Sippola, L.; Koivunen, J. Investigating Large Language Model (LLM) Performance Using in-Context Learning (ICL) for Interpretation of ESMO and NCCN Guidelines for Lung Cancer. J. Clin. Oncol. 2024, 42, e13637.
32. Sony, S.; Laventure, S.; Sadhu, A. A Literature Review of Next-Generation Smart Sensing Technology in Structural Health Monitoring. Struct. Control Health Monit. 2019, 26, e2321.
33. Sony, S.; Dunphy, K.; Sadhu, A.; Capretz, M. A Systematic Review of Convolutional Neural Network-Based Structural Condition Assessment Techniques. Eng. Struct. 2021, 226, 111347.
34. Haldar, A.; Al-Hussein, A. Recent Developments in Structural Health Monitoring and Assessment–Opportunities and Challenges; World Scientific: Singapore, 2022; ISBN 978-981-12-4300-4.
35. Shahriar, S.; Lund, B.D.; Mannuru, N.R.; Arshad, M.A.; Hayawi, K.; Bevara, R.V.K.; Mannuru, A.; Batool, L. Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency. Appl. Sci. 2024, 14, 7782.
36. Floridi, L.; Chiriatti, M. GPT-3: Its Nature, Scope, Limits, and Consequences. Minds Mach. 2020, 30, 681–694.
37. Friha, O.; Amine Ferrag, M.; Kantarci, B.; Cakmak, B.; Ozgun, A.; Ghoualmi-Zine, N. LLM-Based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness. IEEE Open J. Commun. Soc. 2024, 5, 5799–5856.
38. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large Language Models. arXiv 2024, arXiv:2303.18223.
39. Feng, Y.; Brenner, C.; Sester, M. Flood Severity Mapping from Volunteered Geographic Information by Interpreting Water Level from Images Containing People: A Case Study of Hurricane Harvey. ISPRS J. Photogramm. Remote Sens. 2020, 169, 301–319.
40. Shafiq, S.; Awan, H.M.; Khan, A.A.; Amin, W. Driving Like Humans: Leveraging Vision Large Language Models for Road Anomaly Detection. In Proceedings of the 2024 3rd International Conference on Emerging Trends in Electrical, Control, and Telecommunication Engineering (ETECTE), Lahore, Pakistan, 26–27 November 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
41. Akinboyewa, T.; Ning, H.; Lessani, M.N.; Li, Z. Automated Floodwater Depth Estimation Using Large Multimodal Model for Rapid Flood Mapping. Comput. Urban Sci. 2024, 4, 12.
42. Li, Z.; Ning, H. Autonomous GIS: The Next-Generation AI-Powered GIS. Int. J. Digit. Earth 2023, 16, 4668–4686.
43. Wang, W.; Chen, Z.; Wang, W.; Cao, Y.; Liu, Y.; Gao, Z.; Zhu, J.; Zhu, X.; Lu, L.; Qiao, Y.; et al. Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization. arXiv 2024, arXiv:2411.10442.
44. Cian, F.; Marconcini, M.; Ceccato, P.; Giupponi, C. Flood Depth Estimation by Means of High-Resolution SAR Images and Lidar Data. Nat. Hazards Earth Syst. Sci. 2018, 18, 3063–3084.
45. Beijing Water Authority. Beijing Urban Waterlogging Risk Map Report; Beijing Water Authority: Beijing, China, 2022.
46. GB 51222-2017; Technical Specification for Urban Waterlogging Prevention and Control. Ministry of Housing and Urban-Rural Development of the People’s Republic of China: Beijing, China, 2017.
47. GB 50014-2021; Outdoor Drainage Design Standards. Ministry of Housing and Urban-Rural Development of the People’s Republic of China: Beijing, China, 2021.
48. Tian, Y.; Wang, Q.; Guo, Z.; Zhao, H.; Khan, S.; Mao, W.; Yasir, M.; Zhao, J. A Hybrid Deep Learning and Ensemble Learning Mechanism for Damaged Power Line Detection in Smart Grids. Soft Comput. 2022, 26, 10553–10561.
49. Sarda, A.; Dixit, S.; Bhan, A. Object Detection for Autonomous Driving Using YOLO [You Only Look Once] Algorithm. In Proceedings of the 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), Tirunelveli, India, 4–6 February 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1370–1374.
50. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
51. Bharati, P.; Pramanik, A. Deep Learning Techniques–R-CNN to Mask R-CNN: A Survey. In Computational Intelligence in Pattern Recognition; Springer: Singapore, 2020; pp. 657–668.
52. Huang, Y.; Chen, J.; Huang, D. UFPMP-Det: Toward Accurate and Efficient Object Detection on Drone Imagery. Proc. AAAI Conf. Artif. Intell. 2022, 36, 1026–1033.
53. Chen, Z.; Wu, J.; Wang, W.; Su, W.; Chen, G.; Xing, S.; Zhong, M.; Zhang, Q.; Zhu, X.; Lu, L.; et al. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024.
54. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked Autoencoders Are Scalable Vision Learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
55. PyTorch Team. Deep Residual Networks Pre-Trained on ImageNet. Available online: https://pytorch.org/hub/pytorch_vision_resnet/ (accessed on 19 February 2025).
56. Yik, Y.K.; Alias, N.E.; Yusof, Y.; Isaak, S. A Real-Time Pothole Detection Based on Deep Learning Approach. J. Phys. Conf. Ser. 2021, 1828, 012001.
57. Aslan, O.D.; Gultepe, E.; Ramaji, I.J.; Kermanshachi, S. Using Artificial Intelligence for Automating Pavement Condition Assessment. In Proceedings of the International Conference on Smart Infrastructure and Construction 2019 (ICSIC), Cambridge, UK, 8–10 July 2019; ICE Publishing: London, UK, 2019; pp. 337–341.
58. Arjapure, S.; Kalbande, D.R. Deep Learning Model for Pothole Detection and Area Computation. In Proceedings of the 2021 International Conference on Communication Information and Computing Technology (ICCICT), Mumbai, India, 25–27 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
59. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
60. Bai, Z.; Wang, Y.; Zhang, A.; Wei, H.; Pan, G. Road Surface Condition Monitoring in Extreme Weather Using a Feature-Learning Enhanced Mask–RCNN. J. Transp. Eng. Part B Pavements 2024, 150, 04024030.
61. Jiang, J.; Xu, G.; Wang, H.; Yang, Z.; Sun, B.; Guan, C.; Feng, J.; Ma, Y.; Chen, X. High-Accuracy Road Surface Condition Detection through Multi-Sensor Information Fusion Based on WOA-BP Neural Network. Sens. Actuators A Phys. 2024, 378, 115829.
62. Li, J.; Chen, X.; Zhang, A.; Li, C.; Lin, D. Survey of Garbage Classification Methods Based on Deep Learning. Comput. Eng. 2022, 48, 4–9.
63. Ahmed Khan, H.; Naqvi, S.S.; Alharbi, A.A.K.; Alotaibi, S.; Alkhathami, M. Enhancing Trash Classification in Smart Cities Using Federated Deep Learning. Sci. Rep. 2024, 14, 11816.
64. Nahiduzzaman, M.; Ahamed, M.F.; Naznine, M.; Karim, M.J.; Kibria, H.B.; Ayari, M.A.; Khandakar, A.; Ashraf, A.; Ahsan, M.; Haider, J. An Automated Waste Classification System Using Deep Learning Techniques: Toward Efficient Waste Recycling and Environmental Sustainability. Knowl.-Based Syst. 2025, 310, 113028.
65. Martin, K.D.; Zimmermann, J. Artificial Intelligence and Its Implications for Data Privacy. Curr. Opin. Psychol. 2024, 58, 101829.
66. Wang, Y.; Ding, Y.; Wu, Q.; Wei, Y.; Qin, B.; Wang, H. Privacy-Preserving Cloud-Based Road Condition Monitoring With Source Authentication in VANETs. IEEE Trans. Inf. Forensics Secur. 2019, 14, 1779–1790.
67. Narayanan, K.L.; Naresh, R. Privacy-Preserving Dual Interactive Wasserstein Generative Adversarial Network for Cloud-Based Road Condition Monitoring in VANETs. Appl. Soft Comput. 2024, 154, 111367.
68. Krishna, S.; Siri, S.; Kamalsha, S.; Amruth, S.; Jadon, S. PRIVATE-AI: A Hybrid Approach to Privacy-Preserving AI. In Proceedings of the 2023 IEEE/ACIS 8th International Conference on Big Data, Cloud Computing, and Data Science (BCD), Ho Chi Minh City, Vietnam, 14–16 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 170–175.
69. Zhang, J.; Fang, H.; Zhong, H.; Cui, J.; He, D. Blockchain-Assisted Privacy-Preserving Traffic Route Management Scheme for Fog-Based Vehicular Ad-Hoc Networks. IEEE Trans. Netw. Serv. Manag. 2023, 20, 2854–2868.
70. Cheng, H.; Yang, J.; Shojafar, M.; Cao, J.; Jiang, N.; Liu, Y. VFAS: Reliable and Privacy-Preserving V2F Authentication Scheme for Road Condition Monitoring System in IoV. IEEE Trans. Veh. Technol. 2023, 72, 7958–7972.
71. Radanliev, P. AI Ethics: Integrating Transparency, Fairness, and Privacy in AI Development. Appl. Artif. Intell. 2025, 39, 2463722.
72. Ye, X.; Yan, Y.; Li, J.; Jiang, B. Privacy and Personal Data Risk Governance for Generative Artificial Intelligence: A Chinese Perspective. Telecommun. Policy 2024, 48, 102851.
73. Saeed, M.M.; Alsharidah, M. Security, Privacy, and Robustness for Trustworthy AI Systems: A Review. Comput. Electr. Eng. 2024, 119, 109643.
74. Aravindkumar, S.; Varalakshmi, P.; Alagappan, C. Automatic Road Surface Crack Detection Using Deep Learning Techniques. In Artificial Intelligence and Technologies; Springer: Singapore, 2022; pp. 37–44.
75. Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What We Know and What Is Left to Attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805.
76. Kabudi, T.; Pappas, I.; Olsen, D.H. AI-Enabled Adaptive Learning Systems: A Systematic Mapping of the Literature. Comput. Educ. Artif. Intell. 2021, 2, 100017.
77. Yan, F.; Zhang, H.; Li, Y.; Yang, Y.; Liu, Y. End-to-End: A Simple Template for the Long-Tailed-Recognition of Transmission Line Clamps via a Vision-Language Model. Appl. Sci. 2023, 13, 3287.

No. | Types | Description |
---|---|---|
1 | Bad road | The road exhibits significant cracks and potholes, posing potential hazards to vehicle safety and traffic flow. |
2 | Fallen tree | A fallen tree obstructs the roadway, creating a safety hazard for vehicles. |
3 | Fire | A fire is present on the road and poses a risk to nearby traffic. |
4 | Flooding | The road is subject to varying levels of flooding, which may impede vehicle traffic flow. |
5 | Garbage | Accumulated garbage is evident on the road, presenting obstacles and safety risks to traffic. |
6 | Normal | The road appears in good condition with no noticeable hazards, facilitating normal traffic flow. |
7 | Traffic accident | A traffic accident has occurred on the road, potentially disrupting traffic and posing safety concerns. |
8 | Traffic jam | The road is congested with slow-moving or stationary vehicles, reducing traffic flow. |
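
The eight types in this table form a closed answer set for the vision–language model. The sketch below shows one way a prompt could enumerate them and how a free-text answer could be mapped back to exactly one label. The prompt wording, `build_prompt`, and `parse_label` are hypothetical and are not the paper's preliminary or optimized prompts.

```python
import re
from typing import Optional

# The eight road-scene types defined in the table above.
LABELS = [
    "Bad road", "Fallen tree", "Fire", "Flooding",
    "Garbage", "Normal", "Traffic accident", "Traffic jam",
]


def build_prompt() -> str:
    """Hypothetical classification prompt that lists the allowed labels."""
    options = ", ".join(LABELS)
    return (
        "Classify the road scene in this image as exactly one of the "
        f"following types: {options}. Answer with the type name only."
    )


def parse_label(response: str) -> Optional[str]:
    """Map a free-text answer to a label; the earliest whole-word match wins."""
    best: Optional[str] = None
    best_pos = len(response) + 1
    for label in LABELS:
        pattern = r"\b" + re.escape(label) + r"\b"
        match = re.search(pattern, response, flags=re.IGNORECASE)
        if match and match.start() < best_pos:
            best, best_pos = label, match.start()
    return best


print(build_prompt())
print(parse_label("The image shows a fallen tree blocking the road."))  # Fallen tree
```

Returning `None` for answers that name no allowed type makes off-list outputs easy to count as errors instead of silently forcing them into a class.
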

No. | Safety Levels | Types | Description |
---|---|---|---|
1 | GREEN | Normal; Flooding (depth ≤ 10 mm) | The road is in good condition, with no noticeable hazards and waterlogging no deeper than 10 mm, allowing normal traffic flow. |
2 | LIGHT GREEN | Garbage; Traffic jam; Flooding (10 mm < depth ≤ 150 mm) | Traffic congestion, garbage accumulation, and minor flooding (10 mm < depth ≤ 150 mm) have a negligible impact on vehicular traffic. |
3 | YELLOW | Bad road; Fallen tree; Fire; Traffic accident; Flooding (depth > 150 mm) | Road damage, fires, substantial flooding (depth > 150 mm), or traffic accidents pose a significant threat to vehicular traffic flow. |
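
The rules in this table amount to a deterministic mapping from an anomaly type (plus a water depth for flooding) to a safety level. The sketch below encodes that mapping, assuming the type and depth have already been estimated; `safety_level` is a hypothetical helper, and this section does not state whether the system applies such rules explicitly or asks the model for the level directly.

```python
from typing import Optional

# Non-flooding types assigned to each level in the table above.
LIGHT_GREEN_TYPES = {"Garbage", "Traffic jam"}
YELLOW_TYPES = {"Bad road", "Fallen tree", "Fire", "Traffic accident"}


def safety_level(anomaly_type: str, depth_mm: Optional[float] = None) -> str:
    """Map an anomaly type (and water depth for flooding) to a safety level."""
    if anomaly_type == "Flooding":
        if depth_mm is None:
            raise ValueError("Flooding requires an estimated depth in mm")
        if depth_mm <= 10:        # depth <= 10 mm
            return "GREEN"
        if depth_mm <= 150:       # 10 mm < depth <= 150 mm
            return "LIGHT GREEN"
        return "YELLOW"           # depth > 150 mm
    if anomaly_type == "Normal":
        return "GREEN"
    if anomaly_type in LIGHT_GREEN_TYPES:
        return "LIGHT GREEN"
    if anomaly_type in YELLOW_TYPES:
        return "YELLOW"
    raise ValueError(f"unknown anomaly type: {anomaly_type!r}")


print(safety_level("Flooding", 120))  # LIGHT GREEN
print(safety_level("Fire"))           # YELLOW
```
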
No. | Types | Number of Data Points (Total) | Number of Data Points (Training Dataset) | Number of Data Points (Testing Dataset) |
---|---|---|---|---|
1 | Bad road | 1068 | 854 | 214 |
2 | Fallen tree | 1480 | 1184 | 296 |
3 | Fire | 1523 | 1218 | 305 |
4 | Flooding | 1598 | 1288 | 310 |
5 | Garbage | 1819 | 1455 | 364 |
6 | Normal | 3375 | 2700 | 675 |
7 | Traffic accident | 2163 | 1730 | 433 |
8 | Traffic jam | 2185 | 1748 | 437 |
Total | | 15,211 | 12,177 | 3034 |
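
The training/testing counts above are close to an 80/20 split within each class. The sketch below is a per-class (stratified) random split that holds out `round(n * 0.2)` items per class; this procedure is an assumption, not the authors' documented method. It reproduces the testing counts for seven of the eight classes, while for Flooding it yields 320 testing and 1278 training items instead of the reported 310 and 1288, so the actual split evidently differed for that class.

```python
import random

# Per-class totals from the dataset table above.
CLASS_TOTALS = {
    "Bad road": 1068, "Fallen tree": 1480, "Fire": 1523, "Flooding": 1598,
    "Garbage": 1819, "Normal": 3375, "Traffic accident": 2163, "Traffic jam": 2185,
}


def stratified_split(items_by_class, test_fraction=0.2, seed=0):
    """Shuffle each class independently and hold out round(n * test_fraction)."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test[label] = shuffled[:n_test]
        train[label] = shuffled[n_test:]
    return train, test


# Stand-in item IDs; in practice these would be image paths.
items = {label: range(n) for label, n in CLASS_TOTALS.items()}
train_set, test_set = stratified_split(items)
for label in CLASS_TOTALS:
    print(f"{label:17s} train={len(train_set[label]):5d} test={len(test_set[label]):4d}")
```

Splitting within each class keeps the class proportions of the testing set equal to those of the full dataset, which matters here because Normal is roughly three times larger than Bad road.
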

No. | Level | Types | Number of Data Points (Testing Dataset) |
---|---|---|---|
1 | GREEN | Normal | 675 |
1 | GREEN | Flooding (depth ≤ 10 mm) | 39 |
1 | GREEN | Subtotal | 714 |
2 | LIGHT GREEN | Garbage | 364 |
2 | LIGHT GREEN | Traffic jam | 437 |
2 | LIGHT GREEN | Flooding (10 mm < depth ≤ 150 mm) | 43 |
2 | LIGHT GREEN | Subtotal | 844 |
3 | YELLOW | Bad road | 214 |
3 | YELLOW | Fallen tree | 296 |
3 | YELLOW | Fire | 305 |
3 | YELLOW | Traffic accident | 433 |
3 | YELLOW | Flooding (depth > 150 mm) | 228 |
3 | YELLOW | Subtotal | 1476 |
Total | | | 3034 |
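
The level subtotals follow from summing the per-type testing counts, and the three flooding rows add up to the 310 flooding images in the testing dataset. A quick consistency check using only the numbers in this table:

```python
from collections import Counter

# (level, type, testing count) rows from the table above.
ROWS = [
    ("GREEN", "Normal", 675),
    ("GREEN", "Flooding (depth <= 10 mm)", 39),
    ("LIGHT GREEN", "Garbage", 364),
    ("LIGHT GREEN", "Traffic jam", 437),
    ("LIGHT GREEN", "Flooding (10 mm < depth <= 150 mm)", 43),
    ("YELLOW", "Bad road", 214),
    ("YELLOW", "Fallen tree", 296),
    ("YELLOW", "Fire", 305),
    ("YELLOW", "Traffic accident", 433),
    ("YELLOW", "Flooding (depth > 150 mm)", 228),
]

level_totals = Counter()
for level, _type, count in ROWS:
    level_totals[level] += count

flooding_total = sum(count for _, t, count in ROWS if t.startswith("Flooding"))

print(dict(level_totals))          # {'GREEN': 714, 'LIGHT GREEN': 844, 'YELLOW': 1476}
print(sum(level_totals.values()))  # 3034
print(flooding_total)              # 310
```
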

No. | Model | Main Features | Suitability for Real-Time Monitoring | Suitability for Urban Road Anomaly Classification | Suitability for Urban Road Waterlogging Monitoring |
---|---|---|---|---|---|
1 | ResNet34 (image classification model) | A fast, simple convolutional neural network for image classification. | Medium: designed for image classification rather than real-time processing. | High: designed for image classification. | Medium-Low: limited in estimating flood depth; should be combined with other models. |
2 | YOLO (You Only Look Once) (object detection model) | Real-time processing with rapid inference, well suited to detecting multiple objects of specific classes [49]. | High: fast inference and efficient object detection. | High: suitable for detecting and classifying road anomalies. | Medium-Low: should be combined with other models for optimal results. |
3 | Faster R-CNN (object detection model) | High detection accuracy using Region Proposal Networks (RPN), suited to detailed, accuracy-focused detection tasks [50]. | Low: slower inference than single-stage detectors. | High: well suited to accurate anomaly detection and classification. | Medium-Low: applicable, but shares YOLO's limitation in estimating water depth. |
4 | Mask R-CNN (object detection model) | Instance segmentation with pixel-level accuracy [51]. | Medium-Low: the additional segmentation overhead slows processing. | High: well suited to detailed anomaly detection and pixel-level segmentation. | Low: requires integration with other models or methods for depth estimation or event detection. |
5 | EfficientDet (object detection model) | Balances accuracy and speed using a compound scaling method [52]. | High: good balance between speed and accuracy. | High: suitable for detecting and classifying road anomalies. | Medium-Low: requires integration with other models or machine learning methods for depth estimation. |
6 | InternVL (vision–language model) | Integrates visual and language understanding for context-aware visual monitoring [53]. | Medium-High: suitable, balancing speed and accuracy. | High: suitable for detecting and classifying road anomalies. | Medium-High: applicable to this scenario; can be trained and optimized for the context. |

No. | Types | Number of Data Points (Testing Dataset) | Accuracy (%): ResNet34, First Approach (Linear Probing) | Accuracy (%): ResNet34, Second Approach (Not Fine-Tuned, Training) | Accuracy (%): ResNet34, Third Approach (Fine-Tuned: Last FC Layer at 10× LR, Other Layers at 1× LR) |
---|---|---|---|---|---|
1 | Bad road | 214 | 79.44 | 78.04 | 79.91 |
2 | Fallen tree | 296 | 81.08 | 93.58 | 90.54 |
3 | Fire | 305 | 91.8 | 84.92 | 82.3 |
4 | Flooding | 310 | 62.42 | 53.73 | 64.91 |
5 | Garbage | 364 | 67.03 | 70.88 | 73.08 |
6 | Normal | 675 | 89.93 | 75.7 | 86.81 |
7 | Traffic accident | 433 | 92.15 | 66.74 | 63.97 |
8 | Traffic jam | 437 | 79.44 | 78.04 | 79.91 |
Overall accuracy (%) | | 3034 | 81.48 | 75.31 | 77.38 |

Servers | A800 | NVIDIA Jetson |
---|---|---|
GPU | 80 GB × 8 | 64 GB |
CPU | 58 G | |
Memory | | |
GPU Memory | 61.4 G | |

No. | Types | Number of Data Points (Testing Dataset) | Accuracy (%): InternVL-2.5 MPO (Preliminary Classification Prompt) | Accuracy (%): ResNet34 (First Approach: Linear Probing) |
---|---|---|---|---|
1 | Bad road | 214 | 90.19 | 79.44 |
2 | Fallen tree | 296 | 96.62 | 81.08 |
3 | Fire | 305 | 96.72 | 91.80 |
4 | Flooding | 310 | 93.48 | 62.42 |
5 | Garbage | 364 | 84.62 | 67.03 |
6 | Normal | 675 | 92.00 | 89.93 |
7 | Traffic accident | 433 | 92.84 | 92.15 |
8 | Traffic jam | 437 | 79.18 | 79.44 |
Overall accuracy (%) | | 3034 | 90.35 | 81.48 |

No. | Types | Number of Data Points (Testing Dataset) | Accuracy (%): InternVL-2.5 MPO (Preliminary Classification Prompt) | Accuracy (%): InternVL-2.5 MPO (Optimized Prompt) |
---|---|---|---|---|
1 | Bad road | 214 | 90.19 | 95.33 |
2 | Fallen tree | 296 | 96.62 | 96.62 |
3 | Fire | 305 | 96.72 | 97.05 |
4 | Flooding | 310 | 93.48 | 92.86 |
5 | Garbage | 364 | 84.62 | 87.64 |
6 | Normal | 675 | 92.00 | 92.00 |
7 | Traffic accident | 433 | 92.84 | 93.53 |
8 | Traffic jam | 437 | 79.18 | 93.59 |
Overall accuracy (%) | | 3034 | 90.35 | 93.20 |
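
The overall accuracy is the test-count-weighted mean of the per-class accuracies, so Traffic jam's jump from 79.18% to 93.59% on 437 images accounts for much of the overall gain. Recomputing from the rounded per-class figures in the table gives 90.34% and 93.21%, within about 0.01 percentage points of the reported 90.35% and 93.20%; the residual presumably comes from rounding of the per-class values.

```python
# Per-class test counts and accuracies (%) from the table above.
COUNTS = [214, 296, 305, 310, 364, 675, 433, 437]
PRELIMINARY = [90.19, 96.62, 96.72, 93.48, 84.62, 92.00, 92.84, 79.18]
OPTIMIZED = [95.33, 96.62, 97.05, 92.86, 87.64, 92.00, 93.53, 93.59]


def weighted_accuracy(counts, accuracies):
    """Overall accuracy as the test-count-weighted mean of per-class accuracies."""
    total = sum(counts)
    return sum(n * a for n, a in zip(counts, accuracies)) / total


for name, accs in [("preliminary", PRELIMINARY), ("optimized", OPTIMIZED)]:
    print(f"{name}: {weighted_accuracy(COUNTS, accs):.2f}%")
# preliminary: 90.34%
# optimized: 93.21%
```
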

Number of Data Points (Testing Dataset) | Depth Error (%): InternVL-2.5 MPO (Preliminary Classification Prompt) | Depth Error (%): InternVL-2.5 MPO (Optimized Prompt) | MAE (mm): InternVL-2.5 MPO (Preliminary Classification Prompt) | MAE (mm): InternVL-2.5 MPO (Optimized Prompt) |
---|---|---|---|---|
310 | 45.99 | 33.57 | 238.79 | 97.21 |
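
The table reports two depth metrics: the mean absolute error (MAE) in millimetres and a percentage "Depth Error". This section does not define the percentage metric; the sketch below assumes it is the mean absolute percentage error relative to the annotated depth, which is only one plausible reading. `depth_metrics` and the sample depths are hypothetical, not data from the study.

```python
def depth_metrics(true_mm, pred_mm):
    """Return (MAE in mm, mean absolute percentage error in %).

    Assumes every annotated depth is strictly positive so that the
    percentage error is defined.
    """
    if len(true_mm) != len(pred_mm) or not true_mm:
        raise ValueError("need two non-empty sequences of equal length")
    if any(t <= 0 for t in true_mm):
        raise ValueError("percentage error needs positive reference depths")
    abs_err = [abs(p - t) for t, p in zip(true_mm, pred_mm)]
    mae = sum(abs_err) / len(abs_err)
    mape = 100 * sum(e / t for e, t in zip(abs_err, true_mm)) / len(abs_err)
    return mae, mape


# Hypothetical annotated vs. predicted depths (mm).
truth = [50, 200, 400]
pred = [60, 150, 500]
mae, mape = depth_metrics(truth, pred)
print(f"MAE = {mae:.1f} mm, depth error = {mape:.1f}%")  # MAE = 53.3 mm, depth error = 23.3%
```

The two metrics weight errors differently: MAE is dominated by deep-water cases, whereas a relative error penalizes small absolute mistakes on shallow water heavily, so reporting both gives a fuller picture.
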

No. | Safety Level | Number of Data Points (Testing Dataset) | Safety Level Accuracy (%): InternVL-2.5 MPO (Preliminary Classification Prompt) | Safety Level Accuracy (%): InternVL-2.5 MPO (Optimized Prompt) |
---|---|---|---|---|
1 | GREEN | 714 | 91.81 | 92.42 |
2 | LIGHT GREEN | 844 | 79.86 | 88.19 |
3 | YELLOW | 1476 | 95.7 | 96.45 |

No. | Tasks | (1) Cloud Computing Response Time (ms), InternVL2.5 26B | Edge Computing Response Time (ms), InternVL2.5 7B | (2) Edge Cloud Computing Response Time (ms) |
---|---|---|---|---|
1 | Road anomaly classification | 2656.59 | 5450.09 | 5450.09 |
2 | Road waterlogging depth estimation | 2884.02 | 8319.64 | 2884.02 |
3 | Road safety level | 3763.04 | 8903.92 | 3763.04 |
Total | | 9303.65 | 22,673.66 | 12,097.15 |
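
The Edge Cloud column takes the edge (7B) time for anomaly classification and the cloud (26B) times for depth estimation and safety level, and each total is the sum of the chosen per-task times. The sketch below reproduces the totals from the per-task values; `PLACEMENT` and `total_ms` are illustrative names. Summing the rounded edge values gives 22,673.65 ms rather than the tabulated 22,673.66 ms, presumably a rounding artifact.

```python
# Per-task response times (ms) from the table above.
CLOUD_26B = {"classification": 2656.59, "depth": 2884.02, "safety": 3763.04}
EDGE_7B = {"classification": 5450.09, "depth": 8319.64, "safety": 8903.92}

# Task placement read off column (2): classification on the edge,
# depth estimation and safety level in the cloud.
PLACEMENT = {"classification": EDGE_7B, "depth": CLOUD_26B, "safety": CLOUD_26B}


def total_ms(times_by_task):
    """Sum the per-task response times of a configuration."""
    return sum(times_by_task.values())


hybrid = {task: source[task] for task, source in PLACEMENT.items()}
for name, times in [("cloud", CLOUD_26B), ("edge", EDGE_7B), ("edge-cloud", hybrid)]:
    print(f"{name:10s} {total_ms(times):9.2f} ms")
```

Expressing the placement as a table of task-to-backend choices makes it easy to evaluate other assignments, for example moving depth estimation to the edge, by changing a single entry.
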
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ding, H.; Du, Y.; Xia, Z. Urban Road Anomaly Monitoring Using Vision–Language Models for Enhanced Safety Management. Appl. Sci. 2025, 15, 2517. https://doi.org/10.3390/app15052517