Transformer–CNN Hybrid Framework for Pavement Pothole Segmentation

Zhang, Tianjie; Liu, Zhen; Cui, Bingyan; Gu, Xingyu; Lu, Yang

doi:10.3390/s25216756

by Tianjie Zhang ^1,2, Zhen Liu ^3, Bingyan Cui ^1, Xingyu Gu ^4 and Yang Lu ^2,*

^1 Center for Advanced Infrastructure and Transportation, Rutgers University, Piscataway, NJ 08854, USA
^2 Department of Engineering, Boise State University, Boise, ID 83752, USA
^3 Institute of Space and Earth Information Science, Fok Ying Tung Remote Sensing Science Building, The Chinese University of Hong Kong, Hong Kong SAR, China
^4 School of Transportation, Southeast University, Nanjing 211189, China
^* Author to whom correspondence should be addressed.

Sensors 2025, 25(21), 6756; https://doi.org/10.3390/s25216756 (registering DOI)

Submission received: 30 September 2025 / Revised: 26 October 2025 / Accepted: 3 November 2025 / Published: 4 November 2025

(This article belongs to the Special Issue Convolutional Neural Network Technology for 3D Imaging and Sensing)

Abstract

Pavement surface defects such as potholes pose significant safety risks and accelerate infrastructure deterioration. Accurate and automated detection of such defects requires both advanced sensing technologies and robust deep learning models. In this study, we propose PoFormer, a Transformer–CNN hybrid framework designed for precise segmentation of pavement potholes from heterogeneous image datasets. The architecture combines the global feature extraction ability of Transformers with the fine-grained localization capability of CNNs, achieving higher segmentation accuracy than state-of-the-art models. To construct a representative dataset, we combined open-source images with high-resolution field data acquired using a multi-sensor pavement inspection vehicle equipped with a line-scan camera and infrared/laser-assisted lighting. This sensing system provides millimeter-level resolution and continuous 3D surface imaging under diverse environmental conditions, ensuring robust training inputs for deep learning. Experimental results demonstrate that PoFormer achieves a mean IoU of 77.23% and a mean pixel accuracy of 84.48%, outperforming existing CNN-based models. By integrating multi-sensor data acquisition with advanced hybrid neural networks, this work highlights the potential of 3D imaging and sensing technologies for intelligent pavement condition monitoring and automated infrastructure maintenance.

Keywords: transformer; pothole; image segmentation; CNN; deep learning
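
The abstract reports a mean IoU and a mean pixel accuracy without defining them. As a minimal sketch, assuming the conventional per-class definitions (IoU = TP / (TP + FP + FN) and pixel accuracy = TP / (TP + FN), each averaged over the classes present), both metrics can be derived from a confusion matrix. The function names and the toy labels below are illustrative assumptions and are not taken from the paper, whose exact evaluation protocol may differ.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou_and_mpa(cm: List[List[int]]) -> Tuple[float, float]:
    """Return (mean IoU, mean pixel accuracy) averaged over present classes."""
    n = len(cm)
    ious: List[float] = []
    accs: List[float] = []
    for c in range(n):
        tp = cm[c][c]
        gt_pixels = sum(cm[c])                       # TP + FN
        pred_pixels = sum(cm[r][c] for r in range(n))  # TP + FP
        union = gt_pixels + pred_pixels - tp         # TP + FP + FN
        if union > 0:
            ious.append(tp / union)
        if gt_pixels > 0:
            accs.append(tp / gt_pixels)
    # Guard against an empty matrix (no pixels at all).
    miou = sum(ious) / len(ious) if ious else 0.0
    mpa = sum(accs) / len(accs) if accs else 0.0
    return miou, mpa


if __name__ == "__main__":
    # Hypothetical flattened masks: 0 = background, 1 = pothole.
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    pred = [0, 0, 0, 1, 1, 1, 1, 0]
    miou, mpa = mean_iou_and_mpa(confusion_matrix(truth, pred, 2))
    print(f"mIoU={miou:.2f} mPA={mpa:.2f}")  # mIoU=0.60 mPA=0.75
```

In this toy case each class has 3 correctly labeled pixels, 4 ground-truth pixels, and a union of 5, so the per-class IoU is 0.60 and the per-class accuracy is 0.75. The gap between the two illustrates why a reported mean pixel accuracy is typically higher than the corresponding mean IoU.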
