Research on Road Surface Distress Detection Algorithm in UAV Images with Multi-Scale Feature Fusion
Highlights
- An improved YOLOv8 algorithm is proposed for UAV-based road surface defect detection. It incorporates four novel modules (FFDPN, IIDH, EIEM, and WaveletPool) to address insufficient feature fusion, detail loss, and small-target aliasing distortion, raising mAP@0.50 by 12.2 percentage points (83.8% → 96.0%).
- With only 2.41 × 10⁶ parameters, the improved model achieves the highest Precision (93.7%) and mAP@0.50 (96.0%) among the compared detectors, including Faster R-CNN, YOLOv9, and YOLOv11n. Its Recall (89.6%) is second only to YOLOv9 (90.8%), which uses 50.70 × 10⁶ parameters. The model also reduces duplicate and missed detections across four defect categories.
- The proposed lightweight, high-accuracy model provides a practical solution for automated UAV-based highway pavement inspection, supporting the digital transformation of road maintenance by reducing reliance on manual labor and lowering inspection costs and safety risks.
- The design principles of FFDPN and WaveletPool offer transferable methodological insights for multi-scale feature fusion and anti-aliasing downsampling in small-target detection tasks, with broad applicability to other UAV remote sensing object detection scenarios.
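The anti-aliasing idea behind wavelet-based downsampling, mentioned in the last highlight, can be illustrated with a single-level 2D Haar transform. The sketch below is not the paper's WaveletPool module, whose exact design is not reproduced here. The function names `haar_pool` and `strided_subsample` are hypothetical, and the input is assumed to be a plain 2D list with even height and width.

```python
def haar_pool(x):
    """Single-level 2D Haar transform of a 2D list with even height and width.

    Returns four half-resolution maps: the low-pass band and three detail bands.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    low, d_cols, d_rows, d_diag = [], [], [], []
    for i in range(0, h, 2):
        r_low, r_cols, r_rows, r_diag = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            r_low.append((a + b + c + d) / 2)    # scaled local average
            r_cols.append((a - b + c - d) / 2)   # left column minus right column
            r_rows.append((a + b - c - d) / 2)   # top row minus bottom row
            r_diag.append((a - b - c + d) / 2)   # diagonal difference
        low.append(r_low)
        d_cols.append(r_cols)
        d_rows.append(r_rows)
        d_diag.append(r_diag)
    return low, d_cols, d_rows, d_diag


def strided_subsample(x):
    """Naive stride-2 downsampling: keep every second row and column."""
    return [row[::2] for row in x[::2]]


# Alternating columns, i.e. the highest horizontal frequency the grid can hold.
stripes = [[0, 1, 0, 1] for _ in range(4)]
low, d_cols, d_rows, d_diag = haar_pool(stripes)

print(strided_subsample(stripes))  # every kept sample is 0, so the stripes vanish
print(low)                         # the average intensity survives in the low band
print(d_cols)                      # the stripe pattern is retained in a detail band
```

Plain strided sampling discards three of every four pixels, so thin structures such as narrow cracks can disappear or alias. The Haar split keeps the same information in four half-resolution bands, which a network can then weight or recombine.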
Abstract
1. Introduction
2. Materials and Methods
2.1. Theoretical Overview of the YOLOv8 Algorithm
2.2. Improving the YOLOv8 Road Surface Defect Detection Algorithm
2.2.1. Design of Feature-Focusing Diffusion Pyramid Network
2.2.2. Design of Information Interaction Detection Head
2.2.3. Design of Edge Information Extraction Module
2.2.4. WaveletPool
3. Experiments and Analysis of Results
3.1. Experimental Setup and Evaluation Criteria
3.2. Ablation Experiment
3.3. Comparison Experiment
3.4. Detection Results
4. Discussion
4.1. Why the Proposed Modules Improve Detection
4.2. Comparison with State-of-the-Art Detectors
4.3. Distinguishing Distresses from Visually Similar Non-Distress Artifacts
4.4. Generalization, Failure Modes, and Limitations
4.5. On Novelty and the Engineering–Theory Trade-Off
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| UAV | Unmanned Aerial Vehicle |
| FPN | Feature Pyramid Network |
| WASPP | Waterfall Atrous Spatial Pyramid Pooling |
| SE | Squeeze and Excitation |
| IoU | Intersection over Union |
| MIoU | Mean Intersection over Union |
| Params | Parameters |

| Component | Configuration |
|---|---|
| Operating System | Windows 10 |
| Processor | AMD Ryzen 9 5950X |
| Graphics Card | NVIDIA GeForce RTX 3090 Ti |
| RAM | 32 GB |
| Programming Language | Python 3.9 |
| Development Environment | PyCharm 2021 |
| Deep Learning Framework | PyTorch 1.10 |
| CUDA Toolkit | CUDA 11.3 |

| Hyperparameter | Value |
|---|---|
| Input image size | 1920 × 1080 × 3 |
| Optimizer | SGD |
| Initial learning rate | 0.01 |
| Weight decay coefficient | 5 × 10⁻⁴ |
| Momentum | 0.937 |
| Batch size | 4 |
| Training epochs | 300 |
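
To show how the optimizer settings in the table above interact, here is a minimal sketch of one SGD update with momentum and L2 weight decay. It follows the common PyTorch-style formulation. The function name `sgd_step` and the sample gradients are hypothetical, and the actual training pipeline may add learning-rate scheduling or warm-up that the table does not list.

```python
def sgd_step(weights, grads, velocity, lr=0.01, momentum=0.937, weight_decay=5e-4):
    """One SGD update with momentum and L2 weight decay.

    Defaults are the values from the hyperparameter table. All three inputs are
    equal-length lists of floats; new lists are returned, nothing is mutated.
    """
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w   # fold the L2 penalty into the gradient
        v = momentum * v + g       # update the momentum buffer
        new_velocity.append(v)
        new_weights.append(w - lr * v)  # step against the buffered gradient
    return new_weights, new_velocity


# Made-up weights and gradients, for illustration only.
weights, velocity = [1.0, -2.0], [0.0, 0.0]
grads = [0.5, 0.1]
weights, velocity = sgd_step(weights, grads, velocity)
print(weights, velocity)
```

With a momentum of 0.937, most of the previous update direction carries over to the next step, while the small weight decay coefficient only gently pulls the weights toward zero.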

| Group | FFDPN | IIDH | WaveletPool | EIEM | P/% | R/% | mAP@0.50/% | Params/10⁶ | GFLOPs | FPS/(frame·s⁻¹) |
|---|---|---|---|---|---|---|---|---|---|---|
| ① | − | − | − | − | 79.3 | 76.9 | 83.8 | 3.01 | 8.1 | 57.1 |
| ② | √ | − | − | − | 85.1 | 75.5 | 87.0 | 3.04 | 9.4 | 38.3 |
| ③ | − | √ | − | − | 87.6 | 79.7 | 86.7 | 2.24 | 8.6 | 59.2 |
| ④ | − | − | √ | − | 89.9 | 74.9 | 84.5 | 2.70 | 7.5 | 53.7 |
| ⑤ | − | − | − | √ | 93.2 | 79.9 | 89.7 | 3.02 | 8.8 | 52.9 |
| ⑥ | √ | √ | − | − | 86.0 | 79.0 | 88.3 | 2.61 | 10.0 | 38.1 |
| ⑦ | √ | √ | √ | − | 88.1 | 83.7 | 90.5 | 2.31 | 9.4 | 38.3 |
| ⑧ | √ | √ | √ | √ | 93.7 | 89.6 | 96.0 | 2.41 | 10.4 | 30.3 |
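
The trade-off between the baseline (group ①) and the full model (group ⑧) can be made explicit with a little arithmetic on the table values. The sketch below separates absolute gains in percentage points from relative changes. The dictionary keys and helper names are illustrative choices, not identifiers from the paper.

```python
# Values copied from ablation groups 1 (YOLOv8n baseline) and 8 (all four modules).
baseline = {"P": 79.3, "R": 76.9, "mAP50": 83.8, "params_M": 3.01, "gflops": 8.1, "fps": 57.1}
full = {"P": 93.7, "R": 89.6, "mAP50": 96.0, "params_M": 2.41, "gflops": 10.4, "fps": 30.3}


def point_gain(key):
    """Absolute difference, meaningful for metrics already expressed in percent."""
    return full[key] - baseline[key]


def relative_change(key):
    """Change relative to the baseline value, in percent."""
    return (full[key] - baseline[key]) / baseline[key] * 100


for key in ("P", "R", "mAP50"):
    print(f"{key}: {point_gain(key):+.1f} points")
for key in ("params_M", "gflops", "fps"):
    print(f"{key}: {relative_change(key):+.1f}% relative to baseline")
```

Read this way, the accuracy gain of 12.2 points in mAP@0.50 comes with about 20% fewer parameters, but also with roughly 28% more GFLOPs and about 47% lower frame rate.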

| Model | P/% | R/% | mAP@0.50/% | Params/10⁶ | GFLOPs | FPS/(frame·s⁻¹) |
|---|---|---|---|---|---|---|
| Faster R-CNN | 82.7 | 44.1 | 74.5 | 28.48 | 15,411.79 | 10.32 |
| YOLOv7-tiny | 82.9 | 55.2 | 70.4 | 10.80 | 8.2 | 56.9 |
| DETR | 80.5 | 70.9 | 78.7 | 19.89 | 57.0 | 12.6 |
| YOLOv9 | 83.9 | 90.8 | 92.4 | 50.70 | 236.7 | 5.8 |
| YOLOv11n | 80.5 | 79.3 | 87.0 | 2.58 | 6.3 | 41.0 |
| YOLO-World | 82.6 | 76.2 | 83.5 | 4.05 | 9.6 | 41.1 |
| YOLOv10n | 56.7 | 73.0 | 67.8 | 2.27 | 6.5 | 66.7 |
| YOLOv8n | 79.3 | 76.9 | 83.8 | 3.01 | 8.1 | 57.1 |
| Ours | 93.7 | 89.6 | 96.0 | 2.41 | 10.4 | 30.3 |
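
The P, R, and mAP@0.50 columns all rest on one matching rule: a predicted box counts as a true positive when its IoU with an unmatched ground-truth box is at least 0.50. The sketch below illustrates that rule for a single image and a single class. It assumes boxes in `(x1, y1, x2, y2)` format and predictions already sorted by descending confidence. The function names are hypothetical, and it does not compute mAP itself, which additionally averages precision over recall levels and classes.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, iou_thr=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for pred in preds:
        best_iou, best_idx = 0.0, -1
        for idx, truth in enumerate(truths):
            if idx in matched:
                continue
            overlap = iou(pred, truth)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= iou_thr:
            matched.add(best_idx)
            tp += 1
    fp = len(preds) - tp    # unmatched predictions, including duplicates
    fn = len(truths) - tp   # ground-truth boxes that no prediction matched
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


# Toy example: one correct detection, one duplicate of it, one false alarm,
# and one ground-truth defect that is never detected.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0, 0, 10, 9), (1, 1, 10, 10), (50, 50, 60, 60)]
print(precision_recall(preds, truths))
```

Because each ground-truth box can be matched only once, a duplicate detection of the same defect lowers precision, while a missed defect lowers recall.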

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Guo, D.; Cai, W.; Shuai, H.; Wei, Z.; Chen, G. Research on Road Surface Distress Detection Algorithm in UAV Images with Multi-Scale Feature Fusion. Remote Sens. 2026, 18, 1461. https://doi.org/10.3390/rs18101461

