The Fine Feature Extraction and Attention Re-Embedding Model Based on the Swin Transformer for Pavement Damage Classification
Abstract
1. Introduction
2. Related Works
2.1. Pavement Distress Classification
2.2. Swin Transformer
2.3. Feature Re-Embedding
3. Methods
3.1. FFEAR-Swin Transformer
3.2. Fine Feature Extraction Module
3.3. Multi-Head Self-Attention Re-Embedding Module
4. Experiments
4.1. Experimental Environment
4.2. Datasets and Evaluation Metrics
4.3. Experimental Analysis
4.4. Visualization Analysis
4.5. Ablation Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
FFE | Fine feature extraction |
MHAR | Multi-head self-attention re-embedding |
R-CNN | Region-based convolutional neural network |
GAN | Generative Adversarial Network |
SE | Squeeze and Expand |
MC | Multi-Scale Convolution |
DSC | Depthwise separable convolution |
WSPLINs | Weakly supervised patch label inference networks |
PLIN | Patch label inference network |
FCN | Fully convolutional network |
LVE | Linear viscoelastic |
MLET | Multi-layered elastic theory |
FWD | Falling Weight Deflectometer |
GPR | Ground-Penetrating Radar |
ViT | Vision Transformer |
Swin-T | Swin Transformer |
RRT | Re-embedding region Transformer |
W-MSA | Window-based multi-head self-attention mechanism |
SW-MSA | Shifted window-based multi-head self-attention mechanism |
VGG16 | Visual Geometry Group 16-layer network |
FFVT | Feature Fusion Vision Transformer |
TransFG | Transformer Architecture for Fine-Grained Recognition |
DMTC | Dense Multiscale Feature Learning Transformer |
DDACDN | Deep Domain Adaptation for Pavement Crack Detection |
PicT | Pavement image classification Transformer |
Parameter | Value |
---|---|
Learning rate | 0.0001 |
Weight decay | 0.0005 |
Batch size | 32 |
Epoch | 50 |
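The settings in the table above can be gathered into a single configuration object so that every run uses the same values. The following is a minimal sketch in plain Python, not the authors' training code: the class name `TrainConfig`, its helper methods, and the example training-set size of 10,000 images are hypothetical, and only the four hyperparameter values come from the table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the table; field names are illustrative."""
    learning_rate: float = 1e-4
    weight_decay: float = 5e-4
    batch_size: int = 32
    epochs: int = 50

    def steps_per_epoch(self, num_train_images: int) -> int:
        # The last, possibly smaller, batch still counts as one step.
        return math.ceil(num_train_images / self.batch_size)

    def total_steps(self, num_train_images: int) -> int:
        return self.epochs * self.steps_per_epoch(num_train_images)


cfg = TrainConfig()
# 10_000 is a placeholder dataset size, not a figure from the paper.
print(cfg.steps_per_epoch(10_000), cfg.total_steps(10_000))
```

Freezing the dataclass keeps the values from being changed by accident between the baseline and ablation runs.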
Model | Accuracy | Precision | F1 Score |
---|---|---|---|
MobileNetV3 | 82.31% | 79.91% | 79.90% |
MobileNetV2 | 83.40% | 79.55% | 79.71% |
AlexNet | 73.43% | 75.16% | 73.48% |
VGG16 | 77.23% | 84.63% | 79.45% |
Swin-T-S | 87.43% | 88.65% | 87.50% |
Swin-T-B | 87.88% | 88.34% | 87.85% |
ViT-B-16 | 86.08% | 85.96% | 85.76% |
ViT-B-32 | 84.39% | 82.37% | 82.99% |
Vision Mamba | 87.12% | 88.21% | 87.33% |
MambaOut | 87.37% | 88.61% | 87.63% |
FFVT | 81.98% | 81.85% | 81.26% |
TransFG | 80.88% | 80.87% | 80.22% |
DMTC | 84.32% | 83.92% | 82.68% |
DDACDN | 83.62% | 83.73% | 83.57% |
PicT | 88.58% | 88.14% | 87.85% |
FFEAR | 89.24% | 88.36% | 88.35% |
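For readers who want to reproduce the three columns reported above, the sketch below computes accuracy, precision, and F1 score from a confusion matrix. It assumes macro averaging over classes, which is an assumption on our part; the averaging scheme is not stated in the table. The function name `macro_metrics` and the toy matrix are illustrative only.

```python
from typing import Dict, List


def macro_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Accuracy, macro precision, and macro F1 from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    precisions, f1s = [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[i][c] for i in range(n))  # column sum
        actual = sum(confusion[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        f1s.append(2 * p * r / (p + r) if (p + r) else 0.0)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "f1": sum(f1s) / n,
    }


# Toy 3-class matrix with made-up counts, for illustration only.
toy = [[8, 1, 1],
       [2, 7, 1],
       [0, 2, 8]]
print(macro_metrics(toy))
```

Because macro averaging weights every class equally, precision and F1 can move in a different direction from accuracy on imbalanced data, which is one possible reading of rows such as VGG16 where precision exceeds accuracy.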
Model | Accuracy | Precision | F1 Score |
---|---|---|---|
Swin-T | 85.63% | 84.72% | 83.91% |
DMTC | 84.32% | 83.92% | 82.68% |
DDACDN | 81.62% | 81.37% | 80.93% |
FFEAR | 89.24% | 88.36% | 88.35% |
Dataset | Model | Accuracy | Precision | F1 Score | Param (M) |
---|---|---|---|---|---|
CQU-BPDD | Swin-T | 87.43% | 88.65% | 87.50% | 48 |
CQU-BPDD | MHAR | 87.11% | 85.51% | 85.61% | 58 |
CQU-BPDD | FFE | 88.14% | 88.30% | 87.66% | 48 |
CQU-BPDD | FFEAR | 89.24% | 88.36% | 88.35% | 58 |
Crack500 | Swin-T | 76.82% | 76.64% | 76.37% | 48 |
Crack500 | MHAR | 76.52% | 76.77% | 76.42% | 58 |
Crack500 | FFE | 77.24% | 77.10% | 76.50% | 48 |
Crack500 | FFEAR | 77.71% | 77.65% | 76.80% | 58 |
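The ablation results are easier to compare as differences from the Swin-T baseline. The sketch below copies the table values and computes, for each variant, the change in percentage points and in parameter count; the names `ABLATION` and `deltas_vs_baseline` are hypothetical helpers introduced for illustration.

```python
# Rows copied from the ablation table: (accuracy %, precision %, F1 %, params in M).
ABLATION = {
    "CQU-BPDD": {
        "Swin-T": (87.43, 88.65, 87.50, 48),
        "MHAR": (87.11, 85.51, 85.61, 58),
        "FFE": (88.14, 88.30, 87.66, 48),
        "FFEAR": (89.24, 88.36, 88.35, 58),
    },
    "Crack500": {
        "Swin-T": (76.82, 76.64, 76.37, 48),
        "MHAR": (76.52, 76.77, 76.42, 58),
        "FFE": (77.24, 77.10, 76.50, 48),
        "FFEAR": (77.71, 77.65, 76.80, 58),
    },
}


def deltas_vs_baseline(dataset, baseline="Swin-T"):
    """Return each variant's difference from the baseline row, rounded to 2 places."""
    base = ABLATION[dataset][baseline]
    return {
        model: tuple(round(v - b, 2) for v, b in zip(row, base))
        for model, row in ABLATION[dataset].items()
        if model != baseline
    }


for dataset in ABLATION:
    for model, (acc, prec, f1, params) in deltas_vs_baseline(dataset).items():
        print(f"{dataset:9s} {model:6s} acc {acc:+.2f}  prec {prec:+.2f}  "
              f"F1 {f1:+.2f}  params {params:+d}M")
```

On CQU-BPDD, for example, the full FFEAR model gains 1.81 points of accuracy and 0.85 points of F1 over Swin-T at a cost of 10 M additional parameters, while its precision is 0.29 points lower.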
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, S.; Wang, K.; Liu, Z.; Huang, M.; Huang, S. The Fine Feature Extraction and Attention Re-Embedding Model Based on the Swin Transformer for Pavement Damage Classification. Algorithms 2025, 18, 369. https://doi.org/10.3390/a18060369