CONTI-CrackNet: A Continuity-Aware State-Space Network for Crack Segmentation
Abstract
1. Introduction
- CONTI-CrackNet architecture: a cascaded and lightweight crack segmentation network that effectively represents complex crack morphology while markedly reducing computation.
- Multi-Directional Selective Scanning Strategy (MD3S): efficient long-range modeling that captures crack continuity and shape from multiple scan directions; its outputs are combined by the Bidirectional Gated Fusion (BiGF) module to alleviate directional bias.
- Dual-Branch Pixel-Level Global–Local Fusion (DBPGL) module: a Pixel-Adaptive Pooling (PAP) mechanism that balances max-pooled and average-pooled features at each pixel, preserving edge fidelity while improving global connectivity.
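The multi-directional scanning idea can be sketched concretely: a 2-D feature map is serialized into 1-D sequences along several directions before being fed to a selective-scan (state-space) block. The sketch below is illustrative only; the exact scan orders and their number in MD3S may differ from these four (row-major, column-major, and the two diagonal orders).

```python
import numpy as np

def scan_orders(h, w):
    """Return index arrays that serialize an h*w grid along four directions:
    row-major, column-major, diagonal, and anti-diagonal.
    Illustrative sketch -- the paper's MD3S orders are assumptions here."""
    idx = np.arange(h * w).reshape(h, w)
    horizontal = idx.reshape(-1)       # left-to-right, top-to-bottom
    vertical = idx.T.reshape(-1)       # top-to-bottom, column by column
    # diagonals grouped by offset k, from bottom-left to top-right
    diag = np.concatenate([idx.diagonal(k) for k in range(-h + 1, w)])
    anti = np.concatenate([np.fliplr(idx).diagonal(k) for k in range(-h + 1, w)])
    return horizontal, vertical, diag, anti

# serialize a flattened feature map along one direction:
# seq = feat.reshape(-1)[order]
```

Each order yields a different 1-D sequence of the same pixels; running a state-space scan over several such sequences and fusing the results is what mitigates the directional bias of a single raster scan.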
2. Methods
2.1. Network Architecture
2.2. Multi-Directional Selective Scanning Strategy (MD3S)
2.3. Dual-Branch Pixel-Level Global–Local Fusion (DBPGL) Module
2.4. Pixel-Adaptive Pooling (PAP) Mechanism
3. Results
3.1. Datasets
- TUT [42]: Unlike datasets with simple backgrounds, TUT contains dense and cluttered scenes with diverse crack shapes. It includes 1408 RGB images from eight representative scenes: bitumen, cement, bricks, runways, tiles, metal, generator blades, and underground pipelines. Of these, 1270 images were collected in-house using mobile phones, and 138 images were sourced from the internet.
- CRACK500 [43]: Proposed by Yang et al., the original dataset contains 500 bitumen crack images at 2000 × 1500 resolution, all captured with mobile phones. Because the dataset is small, each image is cropped into 16 non-overlapping patches, and only patches containing more than 1000 crack pixels are retained; each retained patch has a resolution of 640 × 320. Data augmentation increases the total to 3368 images, and each sample is paired with a per-pixel binary mask.
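The crop-and-filter preprocessing described for CRACK500 can be sketched as follows. The 4 × 4 grid is an assumption (the text states only that each image yields 16 non-overlapping patches); the 1000-pixel threshold comes from the text.

```python
import numpy as np

def crop_and_filter(image, mask, rows=4, cols=4, min_crack_pixels=1000):
    """Split an image/mask pair into rows*cols non-overlapping patches and
    keep only patches whose mask contains more than min_crack_pixels
    crack pixels. The 4x4 grid is an assumption, not stated in the paper."""
    h, w = mask.shape
    ph, pw = h // rows, w // cols
    kept = []
    for i in range(rows):
        for j in range(cols):
            sl = (slice(i * ph, (i + 1) * ph), slice(j * pw, (j + 1) * pw))
            if (mask[sl] > 0).sum() > min_crack_pixels:
                kept.append((image[sl], mask[sl]))
    return kept
```

Filtering by crack-pixel count discards near-empty background patches, which is what keeps the retained patches informative despite the small source set.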
3.2. Evaluation Metrics
3.3. Implementation Details and Training
3.4. Experimental Results
3.4.1. Experimental Results on Dataset TUT
3.4.2. Experimental Results on Dataset CRACK500
3.5. Ablation Study
3.5.1. Ablation Study of Components
3.5.2. Ablation on Scanning Strategies
3.5.3. Ablation Study of the Attention Mechanism in the PAP Module
3.6. Complexity Analysis
3.7. Analysis of Failure Cases and Limitations
4. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wu, S.; Withers, P.J.; Beretta, S.; Kang, G. Editorial: Tomography traces the growing cracks and defects. Eng. Fract. Mech. 2023, 292, 109628. [Google Scholar] [CrossRef]
- Matarneh, S.; Elghaish, F.; Edwards, D.J.; Rahimian, F.P.; Abdellatef, E.; Ejohwomu, O. Automatic crack classification on asphalt pavement surfaces using convolutional neural networks and transfer learning. J. Inf. Technol. Constr. 2024, 29, 1239–1256. [Google Scholar] [CrossRef]
- Singh, V.; Baral, A.; Kumar, R.; Tummala, S.; Noori, M.; Yadav, S.V.; Kang, S.; Zhao, W. A Hybrid Deep Learning Model for Enhanced Structural Damage Detection: Integrating ResNet50, GoogLeNet, and Attention Mechanisms. Sensors 2024, 24, 7249. [Google Scholar] [CrossRef] [PubMed]
- Gao, J.; Gui, Y.; Ji, W.; Wen, J.; Zhou, Y.; Huang, X.; Wang, Q.; Wei, C.; Huang, Z.; Wang, C.; et al. EU-Net: A segmentation network based on semantic fusion and edge guidance for road crack images. Appl. Intell. 2024, 54, 12949–12963. [Google Scholar] [CrossRef]
- Zhou, Z.; Zhou, S.; Zheng, Y.; Yan, L.; Yang, H. Clustering and diagnosis of crack images of tunnel linings via graph neural networks. Georisk Assess. Manag. Risk Eng. Syst. Geohazards 2024, 18, 825–837. [Google Scholar] [CrossRef]
- Lei, Q.; Zhong, J.; Wang, C. Joint optimization of crack segmentation with an adaptive dynamic threshold module. IEEE Trans. Intell. Transp. Syst. 2024, 25, 6902–6916. [Google Scholar] [CrossRef]
- Liu, Y.; Yeoh, J.K.W. Robust pixel-wise concrete crack segmentation and properties retrieval using image patches. Autom. Constr. 2021, 123, 103535. [Google Scholar] [CrossRef]
- Yamaguchi, T.; Mizutani, T. Road crack detection interpreting background images by convolutional neural networks and a self-organizing map. Comput.-Aided Civ. Infrastruct. Eng. 2024, 39, 1616–1640. [Google Scholar] [CrossRef]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 22–25 July 2017; pp. 2881–2890. [Google Scholar] [CrossRef]
- Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable ConvNets V2: More Deformable, Better Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9308–9316. [Google Scholar] [CrossRef]
- Guo, M.H.; Lu, C.Z.; Hou, Q.; Liu, Z.; Cheng, M.M.; Hu, S.M. SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 1140–1156. [Google Scholar]
- Li, W.; Xue, L.; Wang, X.; Li, G. ConvTransNet: A CNN-Transformer Network for Change Detection with Multiscale Global-Local Representations. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5610315. [Google Scholar] [CrossRef]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. arXiv 2021, arXiv:2105.15203. [Google Scholar] [CrossRef]
- Wang, J.; Zeng, Z.; Sharma, P.K.; Alfarraj, O.; Tolba, A.; Zhang, J.; Wang, L. Dual-path network combining CNN and transformer for pavement crack segmentation. Autom. Constr. 2024, 158, 105217. [Google Scholar] [CrossRef]
- Shao, Z.; Wang, Z.; Yao, X.; Bell, M.G.H.; Gao, J. ST-MambaSync: Complement the power of Mamba and Transformer fusion for less computational cost in spatial–temporal traffic forecasting. Inf. Fusion 2025, 117, 102872. [Google Scholar] [CrossRef]
- Zunair, H.; Ben Hamza, A. Masked Supervised Learning for Semantic Segmentation. arXiv 2022, arXiv:2210.00923. [Google Scholar] [CrossRef]
- Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. arXiv 2024, arXiv:2401.09417. [Google Scholar] [CrossRef]
- Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2024, arXiv:2312.00752. [Google Scholar]
- Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. VMamba: Visual State Space Model. arXiv 2024, arXiv:2401.10166. [Google Scholar]
- Yang, C.; Chen, Z.; Espinosa, M.; Ericsson, L.; Wang, Z.; Liu, J.; Crowley, E.J. PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition. arXiv 2024, arXiv:2403.17695. [Google Scholar]
- Guo, H.; Li, J.; Dai, T.; Ouyang, Z.; Ren, X.; Xia, S.T. MambaIR: A Simple Baseline for Image Restoration with State-Space Model. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; pp. 222–241. [Google Scholar] [CrossRef]
- Liu, M.; Dan, J.; Lu, Z.; Yu, Y.; Li, Y.; Li, X. CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation. arXiv 2024, arXiv:2405.10530. [Google Scholar]
- Luan, X.; Fan, H.; Wang, Q.; Yang, N.; Liu, S.; Li, X.; Tang, Y. FMambaIR: A Hybrid State-Space Model and Frequency Domain for Image Restoration. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4201614. [Google Scholar] [CrossRef]
- Li, B.; Zhao, H.; Wang, W.; Hu, P.; Gou, Y.; Peng, X. MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 10–17 June 2025; pp. 7491–7501. [Google Scholar] [CrossRef]
- Ding, H.; Xia, B.; Liu, W.; Zhang, Z.; Zhang, J.; Wang, X.; Xu, S. A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation. Remote Sens. 2024, 16, 2620. [Google Scholar] [CrossRef]
- Zuo, X.; Sheng, Y.; Shen, J.; Shan, Y. Topology-aware Mamba for Crack Segmentation in Structures. Autom. Constr. 2024, 168, 105845. [Google Scholar] [CrossRef]
- Han, C.; Yang, H.; Yang, Y. Enhancing Pixel-Level Crack Segmentation with Visual Mamba and Convolutional Networks. Autom. Constr. 2024, 168, 105770. [Google Scholar] [CrossRef]
- Zhang, T.; Wang, D.; Lu, Y. ECSNet: An Accelerated Real-Time Image Segmentation CNN Architecture for Pavement Crack Detection. IEEE Trans. Intell. Transp. Syst. 2023, 24, 15105–15112. [Google Scholar] [CrossRef]
- Wang, W.; Su, C. Automatic concrete crack segmentation model based on transformer. Autom. Constr. 2022, 139, 104275. [Google Scholar] [CrossRef]
- Fan, Y.; Hu, Z.; Li, Q.; Sun, Y.; Chen, J.; Zhou, Q. CrackNet: A Hybrid Model for Crack Segmentation with Dynamic Loss Function. Sensors 2024, 24, 7134. [Google Scholar] [CrossRef]
- He, T.; Shi, F.; Zhao, M.; Yin, Y. A Lightweight Selective Feature Fusion and Irregular-Aware Network for Crack Detection Based on Federated Learning. In Proceedings of the International Conference on High Performance Big Data and Intelligent Systems (HDIS), Tianjin, China, 10–11 December 2022; pp. 294–298. [Google Scholar] [CrossRef]
- Tao, H.; Liu, B.; Cui, J.; Zhang, H. A Convolutional-Transformer Network for Crack Segmentation with Boundary Awareness. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 8–11 October 2023; pp. 86–90. [Google Scholar] [CrossRef]
- Rahman, M.M.; Tutul, A.A.; Nath, A.; Laishram, L.; Jung, S.K.; Hammond, T. Mamba in Vision: A Comprehensive Survey of Techniques and Applications. arXiv 2024, arXiv:2410.03105. [Google Scholar] [CrossRef]
- Liu, H.; Jia, C.; Shi, F.; Cheng, X.; Chen, S. SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 10–17 June 2025; pp. 29406–29416. [Google Scholar] [CrossRef]
- You, H.; Li, Z.; Wei, Z.; Zhang, L.; Bi, X.; Bi, C.; Li, X.; Duan, Y. A Blueberry Maturity Detection Method Integrating Attention-Driven Multi-Scale Feature Interaction and Dynamic Upsampling. Horticulturae 2025, 11, 600. [Google Scholar] [CrossRef]
- Zhu, Q.; Fang, Y.; Cai, Y.; Chen, C.; Fan, L. Rethinking Scanning Strategies With Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 18223–18234. [Google Scholar] [CrossRef]
- Qu, H.; Ning, L.; An, R.; Fan, W.; Derr, T.; Liu, H.; Xu, X.; Li, Q. A Survey of Mamba. arXiv 2025, arXiv:2408.01129. [Google Scholar] [CrossRef]
- Zhao, S.; Chen, H.; Zhang, X.; Xiao, P.; Bai, L.; Ouyang, W. RS-Mamba for Large Remote Sensing Image Dense Prediction. arXiv 2024, arXiv:2404.02668. [Google Scholar] [CrossRef]
- Sun, H.; Li, S.; Zheng, X.; Lu, X. Remote Sensing Scene Classification by Gated Bidirectional Network. IEEE Trans. Geosci. Remote Sens. 2020, 58, 82–96. [Google Scholar] [CrossRef]
- Nirthika, R.; Manivannan, S.; Ramanan, A.; Wang, R. Pooling in convolutional neural networks for medical image analysis: A survey and an empirical study. Neural Comput. Appl. 2022, 34, 5321–5347. [Google Scholar] [CrossRef]
- Xiao, J.; Guo, H.; Yao, Y.; Zhang, S.; Zhou, J.; Jiang, Z. Multi-Scale Object Detection with the Pixel Attention Mechanism in a Complex Background. Remote Sens. 2022, 14, 3969. [Google Scholar] [CrossRef]
- Liu, H.; Jia, C.; Shi, F.; Cheng, X.; Wang, M.; Chen, S. CrackSCF: Lightweight Cascaded Fusion Network for Robust and Efficient Structural Crack Segmentation. arXiv 2025, arXiv:2408.12815. [Google Scholar]
- Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1525–1535. [Google Scholar] [CrossRef]
- Zhang, H.; Zhang, A.A.; Dong, Z.; He, A.; Liu, Y.; Zhan, Y.; Wang, K.C.P. Robust Semantic Segmentation for Automatic Crack Detection Within Pavement Images Using Multi-Mixing of Global Context and Local Image Features. IEEE Trans. Intell. Transp. Syst. 2024, 25, 11282–11303. [Google Scholar] [CrossRef]
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar] [CrossRef]

| Dataset | Resolution | Images | Training | Validation | Test |
|---|---|---|---|---|---|
| TUT [42] | — | 1408 | 986 | 141 | 281 |
| CRACK500 [43] | 640 × 320 | 3368 | 2358 | 337 | 673 |
| Value | ODS | OIS | F1 | P | R | mIoU |
|---|---|---|---|---|---|---|
| 0 | 0.8021 | 0.8059 | 0.8279 | 0.8176 | 0.8384 | 0.8368 |
| 0.1 | 0.8049 | 0.8116 | 0.8292 | 0.8185 | 0.8402 | 0.8371 |
| 0.2 | 0.8133 | 0.8165 | 0.8332 | 0.8220 | 0.8447 | 0.8436 |
| 0.3 | 0.8110 | 0.8123 | 0.8313 | 0.8196 | 0.8434 | 0.8420 |
| 0.5 | 0.8002 | 0.8017 | 0.8283 | 0.8201 | 0.8366 | 0.8359 |
| Methods | ODS | OIS | F1 | P | R | mIoU |
|---|---|---|---|---|---|---|
| SFIAN (2022) [31] | 0.7290 | 0.7513 | 0.7473 | 0.7715 | 0.7247 | 0.7756 |
| SegNeXt (2022) [11] | 0.7312 | 0.7435 | 0.7517 | 0.7812 | 0.7245 | 0.7785 |
| Crackmer (2024) [14] | 0.7429 | 0.7501 | 0.7578 | 0.7501 | 0.7656 | 0.7966 |
| SegFormer (2021) [13] | 0.7532 | 0.7612 | 0.7670 | 0.7654 | 0.7688 | 0.8078 |
| CT-crackseg (2023) [32] | 0.7940 | 0.7996 | 0.8199 | 0.8202 | 0.8195 | 0.8301 |
| CSMamba (2024) [22] | 0.7879 | 0.7946 | 0.8146 | 0.7947 | 0.8353 | 0.8263 |
| PlainMamba (2024) [20] | 0.7889 | 0.7954 | 0.8154 | 0.7955 | 0.8365 | 0.8316 |
| Ours | 0.8133 | 0.8165 | 0.8332 | 0.8220 | 0.8447 | 0.8436 |
| Methods | ODS | OIS | F1 | P | R | mIoU |
|---|---|---|---|---|---|---|
| SFIAN (2022) [31] | 0.6473 | 0.6941 | 0.7204 | 0.6983 | 0.7441 | 0.7315 |
| SegNeXt (2022) [11] | 0.6488 | 0.6762 | 0.7334 | 0.7134 | 0.7546 | 0.7345 |
| Crackmer (2024) [14] | 0.6933 | 0.7097 | 0.7267 | 0.6985 | 0.7572 | 0.7421 |
| SegFormer (2021) [13] | 0.6998 | 0.7134 | 0.7245 | 0.7067 | 0.7434 | 0.7456 |
| CT-crackseg (2023) [32] | 0.6941 | 0.7059 | 0.7322 | 0.6940 | 0.7748 | 0.7591 |
| CSMamba (2024) [22] | 0.6931 | 0.7162 | 0.7315 | 0.6858 | 0.7823 | 0.7592 |
| PlainMamba (2024) [20] | 0.6574 | 0.6870 | 0.7422 | 0.7318 | 0.7530 | 0.7579 |
| Ours | 0.7104 | 0.7301 | 0.7587 | 0.7333 | 0.7860 | 0.7760 |
| MD3S | ElemAdd | SKNet | DBPGL | PAP | ODS | OIS | F1 | P | R | mIoU |
|---|---|---|---|---|---|---|---|---|---|---|
| ✗ | ✓ | ✗ | ✗ | ✗ | 0.7845 | 0.7921 | 0.8049 | 0.7945 | 0.8156 | 0.8078 |
| ✓ | ✓ | ✗ | ✗ | ✗ | 0.8008 | 0.8074 | 0.8268 | 0.8161 | 0.8377 | 0.8350 |
| ✓ | ✗ | ✓ | ✗ | ✗ | 0.7944 | 0.8064 | 0.8200 | 0.8123 | 0.8279 | 0.8268 |
| ✓ | ✗ | ✗ | ✓ | ✗ | 0.8094 | 0.8168 | 0.8297 | 0.8142 | 0.8459 | 0.8406 |
| ✓ | ✗ | ✗ | ✓ | ✓ | 0.8133 | 0.8165 | 0.8332 | 0.8220 | 0.8447 | 0.8436 |
| Method | ODS | OIS | F1 | P | R | mIoU |
|---|---|---|---|---|---|---|
| ParaSna | 0.7880 | 0.7926 | 0.8063 | 0.7898 | 0.8236 | 0.8041 |
| DiagSna | 0.7897 | 0.7979 | 0.8077 | 0.7891 | 0.8273 | 0.8059 |
| ParaSna + DiagSna | 0.7970 | 0.8071 | 0.8127 | 0.7909 | 0.8359 | 0.8260 |
| MD3S | 0.8008 | 0.8074 | 0.8268 | 0.8161 | 0.8377 | 0.8350 |
| Methods | ODS | OIS | F1 | P | R | mIoU |
|---|---|---|---|---|---|---|
| CBAM | 0.8016 | 0.8023 | 0.8250 | 0.8245 | 0.8256 | 0.8277 |
| PAM | 0.8133 | 0.8165 | 0.8332 | 0.8220 | 0.8447 | 0.8436 |
| Methods | GFLOPs | Params | FPS |
|---|---|---|---|
| SFIAN (2022) [31] | 84.57 G | 13.63 M | 32 |
| SegNeXt (2022) [11] | 31.80 G | 27.52 M | 22 |
| Crackmer (2024) [14] | 14.94 G | 5.90 M | 33 |
| SegFormer (2021) [13] | 30.80 G | 28.20 M | 21 |
| CT-crackseg (2023) [32] | 39.47 G | 22.88 M | 28 |
| CSMamba (2024) [22] | 145.84 G | 35.95 M | 19 |
| PlainMamba (2024) [20] | 73.36 G | 16.72 M | 33 |
| Ours | 24.22 G | 6.01 M | 42 |
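The parameter and FPS figures in the table can be reproduced for any PyTorch model along these lines (GFLOPs additionally need a profiler such as fvcore or thop, not shown). The 512 × 512 input size is an assumption; this table does not state the measurement resolution.

```python
import time
import torch
import torch.nn as nn

def count_params_m(model):
    """Trainable parameter count in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

def measure_fps(model, input_size=(1, 3, 512, 512), runs=10):
    """Rough frames per second for repeated forward passes.
    Warm-up and GPU synchronization are omitted for brevity."""
    model.eval()
    x = torch.randn(*input_size)
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return runs / (time.perf_counter() - start)
```

Reported FPS depends heavily on hardware, batch size, and input resolution, so such numbers are comparable only within one measurement setup.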
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Song, W.; Zhao, M.; Xu, X. CONTI-CrackNet: A Continuity-Aware State-Space Network for Crack Segmentation. Sensors 2025, 25, 6865. https://doi.org/10.3390/s25226865
