Transformer–CNN Hybrid Framework for Pavement Pothole Segmentation
Highlights
- Developed PoFormer, a hybrid Transformer–CNN model for pavement pothole segmentation.
- Achieved the highest recall (72.20%) and F1-score (78.43%) among the compared segmentation models (FCN, E-Net, LRASPP, U-Net, AttuNet) under diverse environmental conditions.
- Enables more reliable and efficient pavement condition monitoring for intelligent transportation systems.
- Provides an open-source dataset to support further research and model development in road surface analysis.
Abstract
1. Introduction
2. Methods
2.1. Pavement Distress Data Acquisition Vehicle
2.2. PoFormer
2.3. Overall Evaluation Procedure
2.4. Evaluation Metrics
3. Results
3.1. Dataset Characteristics for Open Source Data Access
3.2. Model Performance
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

| Model | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|
| FCN | 79.25 | 60.80 | 68.79 |
| E-Net | 83.77 | 66.87 | 74.31 |
| LRASPP | 86.08 | 71.31 | 78.00 |
| U-Net | 86.78 | 59.68 | 70.71 |
| AttuNet | 86.09 | 68.47 | 76.27 |
| PoFormer | 85.92 | 72.20 | 78.43 |
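To make the three metrics in the table concrete, the sketch below counts true positives, false positives and false negatives over a pair of binary masks and derives precision, recall and F1 from them. It assumes pixel-level counting with the pothole class as the positive label and masks given as nested lists of 0/1; the function names `pixel_prf` and `f1_from_pr` are hypothetical and this is not the authors' evaluation code. Recomputing the harmonic mean from the rounded precision and recall in the table reproduces each reported F1-score to within 0.07 points; the excerpt does not state how scores were aggregated across images, so small differences are expected.

```python
from typing import Sequence, Tuple


def pixel_prf(
    pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]
) -> Tuple[float, float, float]:
    """Pixel-level precision, recall and F1 (in percent) for one mask pair.

    Assumption: 1 marks a pothole pixel, 0 marks background.
    """
    if len(pred) != len(truth) or any(len(a) != len(b) for a, b in zip(pred, truth)):
        raise ValueError("prediction and ground-truth masks must have the same shape")

    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # pothole pixel correctly predicted
            elif p:
                fp += 1  # background pixel predicted as pothole
            elif t:
                fn += 1  # pothole pixel missed by the model

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from_pr(precision, recall)
    return 100 * precision, 100 * recall, 100 * f1


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall (any consistent scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
    print(pixel_prf([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]))
    # Harmonic mean of PoFormer's rounded precision and recall from the table.
    print(round(f1_from_pr(85.92, 72.20), 2))  # 78.46
```

Because F1 is a harmonic mean, it rewards balanced precision and recall: U-Net's high precision (86.78%) cannot offset its low recall (59.68%), which is why its F1-score trails PoFormer's despite the higher precision.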
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, T.; Liu, Z.; Cui, B.; Gu, X.; Lu, Y. Transformer–CNN Hybrid Framework for Pavement Pothole Segmentation. Sensors 2025, 25, 6756. https://doi.org/10.3390/s25216756

