QHAWAY: An Instance Segmentation and Monocular Distance Estimation ADAS for Vulnerable Road Users in Informal Andean Urban Corridors
Abstract
1. Introduction
2. Related Work
2.1. Instance Segmentation in ADAS Applications
2.2. Monocular Distance Estimation for ADAS
2.3. Speed Bump Detection
2.4. Pedestrian Detection and Crosswalk Analysis
2.5. Local Vehicle Classes and Data Localisation
3. Materials and Methods
3.1. QHAWAY ADAS System Architecture
3.2. Study Area
3.3. Informal Speed Bump Field Survey
3.4. Hybrid Dataset Construction
3.4.1. Data Sources and Collection Methods
3.4.2. Class Definition
3.4.3. Final Hybrid Dataset
3.5. Annotation with LabelMe and SAM2
3.6. YOLOv8L-seg Architecture
3.7. Three-Phase Progressive Training Strategy
3.8. Grad-CAM Explainability Analysis
4. Results
4.1. Training Progress
4.2. Three-Phase Ablation Study
4.3. Overall Detection and Segmentation Performance
4.4. Per-Class Performance
4.5. Precision–Recall Curves
4.6. Multi-Architecture Comparison: YOLOv8L-seg vs. YOLO26 Family
4.7. Real-Time Performance
4.8. Representative Detections and Operational Validation
4.9. Distance Estimation: Geometric Validation
4.10. Grad-CAM Explainability Results
5. Discussion
5.1. Informal Infrastructure: A Quantifiable Hazard
5.2. Mototaxi: First Instance Segmentation Benchmark
5.3. Operational Reliability of Distance Estimation
5.4. Architectural Insights from the YOLO26 Comparison
6. Limitations
7. Conclusions
Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| ADAS | Advanced Driver Assistance System |
| FCW | Forward Collision Warning |
| FPS | Frames per Second |
| Grad-CAM | Gradient-weighted Class Activation Mapping |
| IMU | Inertial Measurement Unit |
| MAE | Mean Absolute Error |
| mAP | Mean Average Precision |
| MTC | Ministry of Transport and Communications |
| NHTSA | National Highway Traffic Safety Administration |
| STAL | Small-Target-Aware Label Assignment |
| TTC | Time-to-Collision |
| TTS | Text-to-Speech |
| YOLO | You Only Look Once |
References
- World Health Organization. Global Status Report on Road Safety 2023; WHO: Geneva, Switzerland, 2023.
- World Health Organization. Leaders Make New Road Safety Commitments, Endorse New Declaration on Action to Reduce Road Deaths. In Proceedings of the 4th Global Ministerial Conference on Road Safety, Marrakech, Morocco, 18–20 February 2025.
- Hornberger, N.H.; Coronel-Molina, S.M. Quechua Language Shift, Maintenance, and Revitalization in the Andes: The Case for Language Planning. Int. J. Sociol. Lang. 2004, 167, 9–67.
- Cerrón-Palomino, R. Quechua Sureño: Diccionario Unificado; Biblioteca Nacional del Perú: Lima, Peru, 1994.
- Observatorio Nacional de Seguridad Vial (ONSV). Estadísticas de Siniestralidad Vial 2023; Ministerio de Transportes y Comunicaciones: Lima, Peru, 2024.
- Observatorio Nacional de Seguridad Vial (ONSV). Informe de Víctimas Fatales en Siniestros de Tránsito e Identificación de Puntos de Alta Siniestralidad en la Región Ayacucho, 2021–2024; MTC: Lima, Peru, 2024.
- Policía Nacional del Perú (PNP). Anuario Estadístico de Accidentes de Tránsito 2023; PNP: Lima, Peru, 2024.
- Tenorio-Huarancca, D.O.; Lizarbe-Alarcon, H.; Ayala-Bizarro, R.G.; Tenorio-Palomino, M.G.; Bravo-Anaya, R.G.; Ircañaupa-Huamaní, A.S.; Leon-Palacios, E.; Bellido-Aedo, V. Intelligent Traffic Light Optimization System Using Convolutional Neural Networks for Historic City Centers in Complex Scenarios. Math. Model. Eng. Probl. 2025, 12, 3247–3264.
- Shah, S.A.; Ahmed, F.; Rani, P.; Ahmad, K.; Ahmad, B.; Khan, A.; Mehmood, I. A Comprehensive Survey on Advanced Driver Assistance Systems Based on Deep Learning Techniques. IEEE Access 2025, 13, 48712–48740.
- Ayachi, R.; Said, Y.; Afif, M.; Alshammari, A.; Hleili, M.; Abdelali, A.B. Assessing YOLO Models for Real-Time Object Detection in Urban Environments for Advanced Driver-Assistance Systems. Alex. Eng. J. 2025, 123, 530–549.
- Unar, S.; Wang, W.; Zhang, C.; Zhang, Y. Visual Attention Models for Pixel-Level Scene Understanding in Autonomous Driving. Sensors 2026, 26, 1021.
- Dai, X.; Yin, Z.; Fang, C.; Chen, T.; Tian, Y. Multi-Task Learning for Simultaneous Monocular Depth Estimation and Object Detection. IEEE Intell. Transp. Syst. Mag. 2021, 13, 218–232.
- Chaman, A.; Iqbal, M.; Ul-Hassan, M.; Siddique, A. Benchmarking YOLO-Based Monocular ADAS for Low-Resource Deployment. Sensors 2026, 26, 889.
- National Highway Traffic Safety Administration. Event Data Recorders and Forward Collision Warning Systems; NHTSA: Washington, DC, USA, 2013.
- Park, S.; Kim, B.; Kim, C.; Kim, J. Pedestrian Detection Based on Monocular Camera Using Geometry Constraint. Int. J. Adv. Robot. Syst. 2014, 11, 153.
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022; pp. 1–21.
- Linardi, R.; Soo, N.; Nabua, A.M.; Adhi, C.G.S.; Engel, V.J.L. Traffic Sign Detection Using YOLO: Evaluating Failures on Localized Data. Procedia Comput. Sci. 2025, 269, 1494–1501.
- Du, Z.; Li, H.; Chen, X.; Wang, M. ML-E-YOLO: Multi-Label Enhanced YOLO for Heterogeneous Urban Vehicle Detection. Sensors 2026, 26, 1147.
- Bao, W.; Zhang, J.; Liu, Y.; Xiao, T. EMD-YOLOv8: Enhanced Multi-Class Detection for Diverse Vehicle Types. IEEE Access 2026, 14, 18234–18249.
- Wang, R.; Luo, X.; Ye, Q.; Jiang, Y.; Liu, W. Research on Visual Perception of Speed Bumps Based on Lightweight FPNet. Sensors 2024, 24, 2130.
- Kaya, Ö.; Çodur, M.Y.; Mustafaraj, E. Automatic Detection of Pedestrian Crosswalk with Faster R-CNN and YOLOv7. Buildings 2023, 13, 1070.
- Russon, D.; Guennec, A.; Naredo-Turrado, J.; Xu, B.; Boussuge, C.; Battaglia, V.; Hiron, B.; Lagarde, E. Evaluating Pedestrian Crossing Safety: CNN Model Trained on Paired Aerial and Subjective Perspective Images. Heliyon 2025, 11, e42428.
- Kulawik, J.; Kuczyński, Ł. AI-Based Detection and Classification of Horizontal Road Markings Dedicated to Driver Assistance Systems. Appl. Sci. 2025, 15, 12189.
- Jocher, G.; Qiu, J. Ultralytics YOLO26. 2026. Available online: https://github.com/ultralytics/ultralytics/tree/main (accessed on 6 March 2026).
- Pan, Z.; Wu, Z.; Ma, W.; Li, C.; Zhang, X. Pedestrian Detection Based on Instance Segmentation for Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2024, 25, 3714–3725.
- Peralta-López, J.-E.; Morales-Viscaya, J.A.; Lazaro-Mata, D.; Villasenor-Aguilar, M.J.; Prado-Olivarez, J.; Perez-Pinal, F.J.; Padilla-Medina, J.A.; Martínez-Nolasco, J.J.; Barranco-Gutierrez, A.I. Speed Bump and Pothole Detection Using Deep Neural Network with ZED Camera. Appl. Sci. 2023, 13, 8349.
- Zhumadillayeva, A.; Ahanger, T.A.; Matkarimov, B. An Intelligent Hybrid YOLO–CNN–LSTM Framework for Real-Time Road Infrastructure Monitoring. Measurement 2025, 242, 119561.
- Han, Z.; Ma, X.; Zheng, J. Urban Road Pedestrian Detection System Integrating IoT Technology and Multi-Sensor Data Fusion. Array 2025, 28, 100563.
- Zhang, Z.-D.; Tan, M.L.; Lan, Z.C.; Liu, H.C.; Pei, L.; Yu, W.X. CDNet: A Real-Time and Robust Crosswalk Detection Network on Jetson Nano Based on YOLOv5. Neural Comput. Appl. 2022, 34, 10717–10737.
- Wang, G.; Lin, T.; Dong, X.; Wang, L.; Leng, Q.; Shin, S.Y. CGADNet: A Lightweight, Real-Time Crosswalk and Guide Arrow Detection Network for Complex Scenes. Appl. Sci. 2024, 14, 9445.
- Asgari, S.; Luo, Y.; Bhattacharya, S.; Akbari, A.; Belbin, G.M.; Li, X.; Harris, D.N.; Selig, M.; Bartell, E.; Calderon, R.; et al. A Positively Selected FBN1 Missense Variant Reduces Height in Peruvian Individuals. Nature 2020, 582, 234–239.
- Microsoft Corporation. Edge TTS: Text-to-Speech via Microsoft Azure Neural Voices. 2024. Available online: https://github.com/rany2/edge-tts (accessed on 6 March 2026).
- Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In Proceedings of the IEEE/CVF CVPR; IEEE: Piscataway, NJ, USA, 2020; pp. 2636–2645.
- Behrendt, K.; Novak, L.; Botros, R. A Deep Learning Approach to Traffic Lights: Detection, Tracking, and Classification. In Proceedings of the IEEE ICRA; IEEE: Piscataway, NJ, USA, 2017; pp. 1370–1377.
- Zou, Q.; Jiang, H.; Dai, Q.; Yue, Y.; Chen, L.; Wang, Q. Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks. IEEE Trans. Veh. Technol. 2020, 69, 41–54.
- Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A Database and Web-Based Tool for Image Annotation. Int. J. Comput. Vis. 2008, 77, 157–173.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.; et al. Segment Anything. In Proceedings of the IEEE/CVF ICCV; IEEE: Piscataway, NJ, USA, 2023; pp. 4015–4026.
- Ravi, N.; Gabeur, V.; Hu, Y.-T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. SAM 2: Segment Anything in Images and Videos. arXiv 2024, arXiv:2408.00714.
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 6 March 2026).
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 740–755.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE ICCV; IEEE: Piscataway, NJ, USA, 2017; pp. 618–626.
- Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks. In Proceedings of the IEEE WACV; IEEE: Piscataway, NJ, USA, 2018; pp. 839–847.
- Muhammad, M.B.; Yeasin, M. Eigen-CAM: Class Activation Map using Principal Components. In Proceedings of the International Joint Conference on Neural Networks (IJCNN); IEEE: Piscataway, NJ, USA, 2020; pp. 1–7.
- Sapkota, R.; Bhattarai, B.; Davison, A.J.; Kim, D. YOLO26: Key Architectural Enhancements and Performance Benchmarking for Real-Time Object Detection. arXiv 2026, arXiv:2509.25164v4.
- Urbina-Dominguez, E.J.; Lizarbe-Alarcon, H.; Galvez-Gastelu, Y.; Porras-Flores, E.E.; Moncada-Sosa, W.; Estrada-Cardenas, J.E.; Leon-Palacios, E.; Tenorio-Huarancca, D.O. ADAS-TSR: A Deep Learning-Based Traffic Sign Recognition System with Voice Alerts for Andean Historic City Centers. Appl. Sci. 2026, 16, 2664.

| ID | Class | Reference dimension (m) | Source |
|---|---|---|---|
| 0 | Speed bump (device height) | 0.10 | MTC field survey |
| 1 | Crosswalk (stripe width) | 0.30 | MTC standard |
| 4 | Person | 1.65 | Peruvian adult anthropometry [31] |
| 5 | Motorcycle | 1.10 | Field measurement |
| 6 | Private car | 1.50 | Field measurement |
| 7 | Truck | 3.20 | MTC regulation |
| 8 | Bus | 3.00 | MTC regulation |
| 10 | Traffic light (lens diameter) | 0.30 | MTC standard |
| 11 | Mototaxi (with roof) | 1.60 | Field measurement |
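
The reference dimensions above are the kind of per-class prior that a monocular range estimate relies on. The following is a minimal sketch, assuming the standard pinhole similar-triangles relation d = f · H_ref / h_px; the exact formulation used by QHAWAY is not reproduced here, and `focal_px` and `extent_px` are hypothetical names for the calibrated focal length in pixels and the object's measured extent in pixels.

```python
# Reference dimensions (m) copied from the table above, keyed by class ID.
REFERENCE_DIMENSION_M = {
    0: 0.10,   # Speed bump (device height)
    1: 0.30,   # Crosswalk (stripe width)
    4: 1.65,   # Person
    5: 1.10,   # Motorcycle
    6: 1.50,   # Private car
    7: 3.20,   # Truck
    8: 3.00,   # Bus
    10: 0.30,  # Traffic light (lens diameter)
    11: 1.60,  # Mototaxi (with roof)
}


def estimate_distance_m(class_id: int, extent_px: float, focal_px: float) -> float:
    """Pinhole estimate d = focal_px * H_ref / extent_px (assumed model).

    focal_px is a hypothetical calibrated focal length in pixels;
    extent_px is the object's extent in the image along the reference dimension.
    """
    if extent_px <= 0:
        raise ValueError("extent_px must be positive")
    return focal_px * REFERENCE_DIMENSION_M[class_id] / extent_px
```

Under this model the estimate scales linearly with the assumed reference dimension, so an object that is smaller than its class prior is placed farther away than it really is. Class IDs without an entry in the table raise `KeyError` in this sketch.
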

| Criterion | MTC Standard | Observed (Mean) | Compliance (%) |
|---|---|---|---|
| Height | ≤0.08 m | 0.11 m | 4 |
| Cross-section | Trapezoidal | Semicircular | 0 |
| Vertical sign (≤100 m) | Mandatory | Absent | 4 |
| Surface marking | Mandatory | Absent or deteriorated | 8 |
| Overall compliance | – | – | 4 |

| ID | Class Name | Instances | % |
|---|---|---|---|
| 0 | Speed bump | 1847 | 17.3 |
| 1 | Crosswalk | 412 | 3.9 |
| 2 | Straight line | 1203 | 11.2 |
| 3 | Straight-right line | 634 | 5.9 |
| 4 | Person | 2918 | 27.3 |
| 5 | Motorcycle | 891 | 8.3 |
| 6 | Private car | 1074 | 10.0 |
| 7 | Truck | 213 | 2.0 |
| 8 | Bus | 287 | 2.7 |
| 9 | Green traffic light | 344 | 3.2 |
| 10 | Red traffic light | 521 | 4.9 |
| 11 | Mototaxi | 357 | 3.3 |
| Total | – | 10,701 | 100.0 |

| Source | Images | Instances | Classes |
|---|---|---|---|
| Original Ayacucho (local) | 4598 | 10,701 | 12 |
| BDD100K (subset) | 14,211 | 78,932 | 8 |
| BSTLD | 5093 | 30,517 | 2 |
| RLMD | 1700 | 7375 | 2 |
| Hybrid total | 25,602 | 127,525 | 12 |

| Phase | Epochs | Resolution (px) | Batch Size | Learning Rate | Initial Weights |
|---|---|---|---|---|---|
| Phase 1 | 100 | 640 × 640 | 16 | 0.01 | COCO pre-trained |
| Phase 2 | 150 | 800 × 800 | 8 | 0.001 | Best from Phase 1 |
| Phase 3 | 223 | 1024 × 1024 | 4 | 0.0005 | Best from Phase 2 |
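
The schedule above can be expressed as a small configuration plus a loop that hands each phase the best checkpoint of the previous one. This is a hedged sketch using the Ultralytics Python API, not the authors' training script: mapping the table's "Learning Rate" to the `lr0` argument is an assumption, `data_yaml` is a hypothetical dataset file, and the checkpoint is read from `model.trainer.best`, which is assumed to point at the phase's `best.pt`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Phase:
    name: str
    epochs: int
    imgsz: int
    batch: int
    lr0: float  # assumed to correspond to the table's "Learning Rate"


# Values copied from the phase table above.
PHASES = [
    Phase("phase1", 100, 640, 16, 0.01),
    Phase("phase2", 150, 800, 8, 0.001),
    Phase("phase3", 223, 1024, 4, 0.0005),
]


def train_progressively(data_yaml: str) -> Path:
    """Train each phase, starting from the previous phase's best weights."""
    from ultralytics import YOLO  # lazy import; needs `pip install ultralytics`

    weights = "yolov8l-seg.pt"  # COCO pre-trained segmentation checkpoint
    for phase in PHASES:
        model = YOLO(weights)
        model.train(
            data=data_yaml,
            epochs=phase.epochs,
            imgsz=phase.imgsz,
            batch=phase.batch,
            lr0=phase.lr0,
            name=phase.name,
        )
        # Assumption: the trainer exposes the path of the best checkpoint.
        weights = str(model.trainer.best)
    return Path(weights)
```

Calling `train_progressively("qhaway.yaml")` (a hypothetical file name) would run the three phases in order. Batch size falls as resolution rises, which keeps memory use roughly bounded as the input grows from 640 px to 1024 px.
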

| Phase | Resolution (px) | mAP50 Box | mAP50 Mask | Δ Box | Δ Mask |
|---|---|---|---|---|---|
| Phase 1 (640 px) | 640 | 0.682 | 0.529 | – | – |
| Phase 2 (800 px) | 800 | 0.741 | 0.633 | +5.9 pp | +10.4 pp |
| Phase 3 (1024 px) | 1024 | 0.810 | 0.778 | +6.9 pp | +14.5 pp |
| Total gain, Phase 1 → Phase 3 | – | – | – | +12.8 pp | +24.9 pp |
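
The percentage-point gains in the ablation table are plain differences of the mAP50 columns. The short sketch below recomputes them from the tabulated values; the function and variable names are illustrative only.

```python
# mAP50 values copied from the ablation table above.
MAP50 = {
    "Phase 1": {"box": 0.682, "mask": 0.529},
    "Phase 2": {"box": 0.741, "mask": 0.633},
    "Phase 3": {"box": 0.810, "mask": 0.778},
}


def gain_pp(before: float, after: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round((after - before) * 100, 1)


def phase_gains(head: str) -> dict[str, float]:
    """Gain of each phase over the previous one, plus the overall total."""
    names = list(MAP50)
    gains = {
        f"{prev} -> {cur}": gain_pp(MAP50[prev][head], MAP50[cur][head])
        for prev, cur in zip(names, names[1:])
    }
    gains["total"] = gain_pp(MAP50[names[0]][head], MAP50[names[-1]][head])
    return gains
```

For both heads the per-phase gains sum to the total, and the mask head gains about twice as much as the box head over the full schedule (24.9 pp versus 12.8 pp).
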

| Metric | mAP50 | Precision | Recall | F1 |
|---|---|---|---|---|
| Detection (Box) | 0.810 | 0.885 | 0.724 | 0.796 |
| Segmentation (Mask) | 0.778 | 0.876 | 0.690 | 0.772 |
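
The F1 column follows from precision and recall through the usual harmonic mean, F1 = 2PR / (P + R). A minimal check against the tabulated values, with illustrative names:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, tabulated F1) copied from the table above.
REPORTED = {
    "box": (0.885, 0.724, 0.796),
    "mask": (0.876, 0.690, 0.772),
}


def matches_table(head: str) -> bool:
    """True when the recomputed F1 rounds to the tabulated value."""
    precision, recall, tabulated = REPORTED[head]
    return round(f1_score(precision, recall), 3) == tabulated
```

Both rows reproduce the tabulated F1 to three decimals. In each case recall is the lower of the two inputs, so it is the term that pulls F1 below precision.
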

| Class | Val. Instances | mAP50 Box | mAP50 Mask |
|---|---|---|---|
| Speed bump | 379 | 0.923 | 0.897 |
| Straight-right line | 72 | 0.898 | 0.898 |
| Red traffic light | 255 | 0.892 | 0.785 |
| Person | 4608 | 0.866 | 0.826 |
| Green traffic light | 251 | 0.861 | 0.828 |
| Private car | 2256 | 0.828 | 0.791 |
| Motorcycle | 1025 | 0.791 | 0.786 |
| Mototaxi | 566 | 0.769 | 0.738 |
| Crosswalk | 563 | 0.765 | 0.771 |
| Bus | 440 | 0.762 | 0.728 |
| Truck | 177 | 0.756 | 0.659 |
| Straight line | 109 | 0.610 | 0.628 |
| All classes | 10,701 | 0.810 | 0.778 |
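
One way to read the "All classes" row is as an unweighted (macro) mean over the twelve classes. The sketch below recomputes that mean from the per-class rows and contrasts it with an instance-weighted mean; the names are illustrative, and the weighted figure is a derived comparison, not a value reported in the table.

```python
# class -> (validation instances, mAP50 box, mAP50 mask), copied from the table.
PER_CLASS = {
    "Speed bump": (379, 0.923, 0.897),
    "Straight-right line": (72, 0.898, 0.898),
    "Red traffic light": (255, 0.892, 0.785),
    "Person": (4608, 0.866, 0.826),
    "Green traffic light": (251, 0.861, 0.828),
    "Private car": (2256, 0.828, 0.791),
    "Motorcycle": (1025, 0.791, 0.786),
    "Mototaxi": (566, 0.769, 0.738),
    "Crosswalk": (563, 0.765, 0.771),
    "Bus": (440, 0.762, 0.728),
    "Truck": (177, 0.756, 0.659),
    "Straight line": (109, 0.610, 0.628),
}
COLUMN = {"box": 1, "mask": 2}


def macro_mean(head: str) -> float:
    """Unweighted mean of per-class mAP50 (every class counts equally)."""
    values = [row[COLUMN[head]] for row in PER_CLASS.values()]
    return sum(values) / len(values)


def weighted_mean(head: str) -> float:
    """Mean of per-class mAP50 weighted by validation instance count."""
    total = sum(row[0] for row in PER_CLASS.values())
    return sum(row[0] * row[COLUMN[head]] for row in PER_CLASS.values()) / total
```

The macro means round to 0.810 (box) and 0.778 (mask), consistent with the "All classes" row. The instance-weighted box mean comes out higher, at roughly 0.834, because Person and Private car dominate the instance count and both score above the macro average.
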

| Model | Best Epoch | mAP50 Box | mAP50 Mask | mAP50-95 Box | mAP50-95 Mask | Precision (Box) | Recall (Box) |
|---|---|---|---|---|---|---|---|
| YOLOv8L-seg | 173 | 0.810 | 0.778 | – | – | 0.885 | 0.724 |
| YOLO26n-seg | 222 | 0.665 | 0.631 | 0.436 | 0.386 | 0.737 | 0.602 |
| YOLO26s-seg | 242 | 0.759 | 0.730 | 0.541 | 0.470 | 0.831 | 0.676 |
| YOLO26L-seg | 179 | 0.829 | 0.788 | 0.614 | 0.510 | 0.868 | 0.751 |

| Class | mAP50 Box (v8L) | mAP50 Box (26L) | Δ Box (pp) | mAP50 Mask (v8L) | mAP50 Mask (26L) | Δ Mask (pp) |
|---|---|---|---|---|---|---|
| Red traffic light | 0.892 | 0.935 | +4.3 | 0.785 | 0.795 | +1.0 |
| Person | 0.866 | 0.889 | +2.3 | 0.826 | 0.846 | +2.0 |
| Green traffic light | 0.861 | 0.899 | +3.8 | 0.828 | 0.824 | −0.4 |
| Speed bump | 0.923 | 0.883 | −4.0 | 0.897 | 0.882 | −1.5 |
| Private car | 0.828 | 0.853 | +2.5 | 0.791 | 0.793 | +0.2 |
| Truck | 0.756 | 0.850 | +9.4 | 0.659 | 0.850 | +19.1 |
| Mototaxi | 0.769 | 0.821 | +5.2 | 0.738 | 0.710 | −2.8 |
| Motorcycle | 0.791 | 0.820 | +2.9 | 0.786 | 0.796 | +1.0 |
| Straight line | 0.610 | 0.697 | +8.7 | 0.628 | 0.697 | +6.9 |
| Straight-right line | 0.898 | 0.773 | −12.5 | 0.898 | 0.773 | −12.5 |
| Crosswalk | 0.765 | 0.769 | +0.4 | 0.771 | 0.776 | +0.5 |
| Bus | 0.762 | 0.763 | +0.1 | 0.728 | 0.709 | −1.9 |
| All classes | 0.810 | 0.829 | +1.9 | 0.778 | 0.788 | +1.0 |

| Frame/Object | Reference distance (m) | Estimated distance (m) | Absolute error (m) | Primary Cause of Deviation |
|---|---|---|---|---|
| (a) Child crouching | 2.50 | 5.60 | 3.10 | Height-reference violation ( m) |
| (a) Standing pedestrian | 2.69 † | 3.80 | 1.11 | Lateral displacement () |
| (b) Speed bump | ≈2.00 | 1.70 | 0.30 | ; road curvature |
| (c) Standing pedestrian | 3.90 | 4.10 | 0.20 | Minor height deviation ( m vs. 1.65 m) |
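
The error column of the validation table can be recomputed from the two distance columns, and the crouching-child row illustrates how a violated height prior inflates a pinhole estimate. The sketch below assumes the first numeric column is the reference distance and the second is the system's estimate, and it assumes the relation d = f · H_ref / h_px; the implied height it returns is a derived illustration, not a figure from the source.

```python
from statistics import mean

# (label, reference distance m, estimated distance m, tabulated error m)
ROWS = [
    ("(a) Child crouching", 2.50, 5.60, 3.10),
    ("(a) Standing pedestrian", 2.69, 3.80, 1.11),
    ("(b) Speed bump", 2.00, 1.70, 0.30),  # reference tabulated as approx. 2.00
    ("(c) Standing pedestrian", 3.90, 4.10, 0.20),
]


def absolute_errors(rows) -> list[float]:
    """Absolute difference between estimate and reference, per row."""
    return [abs(est - ref) for _, ref, est, _ in rows]


def implied_height_m(ref_height_m: float, true_distance_m: float,
                     estimated_distance_m: float) -> float:
    """Object height that would explain the estimate under d = f * H / h.

    With h_px = f * H_true / d_true and d_est = f * H_ref / h_px,
    it follows that H_true = H_ref * d_true / d_est.
    """
    return ref_height_m * true_distance_m / estimated_distance_m


MAE_M = mean(absolute_errors(ROWS))
```

Over these four rows the mean absolute error is about 1.18 m, dominated by the crouching child. Under the stated assumptions, `implied_height_m(1.65, 2.50, 5.60)` gives roughly 0.74 m, which is the apparent height that would turn a 2.50 m range into a 5.60 m estimate when the 1.65 m adult prior is applied.
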

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Cruz-Moran, A.D.l.; Lizarbe-Alarcon, H.; Moncada, W.; Bellido-Aedo, V.; Carrasco-Badajoz, C.; Rayme-Chalco, C.; Aldana, C.; Saavedra, Y.; Saavedra, E.; Pereda, A. QHAWAY: An Instance Segmentation and Monocular Distance Estimation ADAS for Vulnerable Road Users in Informal Andean Urban Corridors. Sensors 2026, 26, 2569. https://doi.org/10.3390/s26082569

