Channel Segmentation Proofreading Network for Crack Counting with Imbalanced Samples
Abstract
1. Introduction
- (1) This paper pioneers the task of crack counting and proposes the first solution for it, CSPNet.
- (2) The crack counting dataset annotated and verified by us will support the development of crack counting, as well as other counting and multi-task crack detection domains.
- (3) The proposed AowFormer introduces a new way to optimize the Transformer computation process, while the channel segmentation-based construction of positive and negative features in CPM makes the proofreading module novel, provides accurate weight information for proofreading crack regions, and ultimately improves crack quantity predictions.
- (4) Extensive experiments confirm the effectiveness of the proposed CSPNet and the rationality of its internal component design.
2. Related Work
- (1)
- Traditional Crack Detection Methods: Subirats et al. [11] apply a separable 2D continuous wavelet transform at different scales, analyze the maximum wavelet coefficients, and use post-processing to obtain a binary map indicating the presence of cracks. Refs. [12,13,14,15] are all threshold-based methods. Among them, to address gaps in cracks after image preprocessing, Huang et al. [12] propose an algorithm to connect these cracks, achieving accurate surface crack detection. Xu et al. [13] present an unsupervised crack detection method based on saliency and statistical features to handle the complex and diverse noise in large image regions. Other methods [16,17,18] rely on manually designed features and classification. Quintana et al. [17] employ a three-step process of shoulder detection, unit candidate proposal, and crack classification. Zakeri et al. [18] combine wavelet modulus and 3DRT for knowledge generation, then train and test an artificial neural network classifier using peak features and parameters. Yan et al. [19] enhance grayscale road surface images by redesigning a median filtering algorithm with four structural elements, and combine morphological gradient and closure operators to extract crack edges and fill crack gaps. Amhaz et al. [20] propose an automatic crack detection algorithm based on localizing the minimal path formed by a series of adjacent pixels within each image, introducing two post-processing steps to enhance detection quality.
- (2)
- Deep Learning-Based Crack Detection Methods: Deep learning-based crack detection methods can be categorized into patch-level [8,21,22,23,24,25,26,27] and pixel-level [1,2,3,4,5,6,9,28,29,30,31,32,33,34] detection. Patch-level methods are generally inferior to pixel-level methods in detection accuracy. Pauly et al. [22] use a deeper neural network to differentiate between crack and non-crack patches, demonstrating the superior performance of deep neural networks. MOD-YOLO [26] enhances crack detection by introducing MODSConv for better channel interaction, Global Receptive Field-Space Pooling Pyramid-Fast for scale handling, and DAF-CA for precise feature extraction, all while maintaining dimensional integrity. Li et al. [8] propose a semi-supervised method for road defect detection based on deep transfer learning, which achieves performance comparable to supervised learning with less annotated data and accurately determines crack dimensions across different scenarios. Since the limited accuracy of patch-level detection is not conducive to crack counting, pixel-level crack detection is the more relevant field. DeepCrack [1] aggregates multi-scale and multi-level features, applies deep supervision directly to features at each stage, and refines the final prediction with guided filtering and conditional random fields. FPHBN [3] incorporates a feature pyramid into an edge detection algorithm for crack detection and uses hierarchical nested sample reweighting to balance the contribution of hard samples to the loss. Inoue et al. [6] formulate crack detection as a weakly supervised problem and propose a two-branch framework that maintains high detection accuracy even with low-quality labels.
TCDNet [30] embeds channel and position information in a mixed attention module to capture long-range dependencies of crack features, and inserts this module into a traditional U-shaped network with multi-scale feature fusion. DcsNet [31] combines a morphology branch and a shallow detail branch to balance crack detection speed and accuracy. Luo et al. [32] combine the advantages of traditional machine vision methods and semantic segmentation methods to improve pavement crack detection accuracy. For long and complex pavement cracks, CT [4] uses Swin Transformer [35] as the encoder together with a decoder composed entirely of multi-layer perceptron (MLP) layers, forming an innovative solution. Based on the design tenet of simultaneously learning cracks and crack-related information, Sun et al. [9] construct a multi-task semi-supervised learning framework consisting of crack region detection, crack and noise edge classification, and crack counting.
3. Crack Counting Dataset
- (1)
- Annotation workflow: Six public benchmark datasets for crack detection (Crack500, CFD, GAPS384, CrackTree200, AEL, and NCD) were integrated to yield 3040 raw images. A connectivity-based judgment algorithm preliminarily counted the cracks in the ground-truth annotations, and manual verification completed the primary annotation. Subsequently, specialists in road engineering and computer vision jointly examined and corrected the annotations. The dataset was then randomly partitioned into training and test subsets at a 9:1 ratio. To counter the imbalanced sample distribution, data augmentation operations including flipping and rotation were applied to all sample categories other than the two-crack class, thereby balancing the number of samples per category in the training set.
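The connectivity-based preliminary counting described above can be sketched as connected-component labeling on a binary ground-truth mask. This is an illustrative reconstruction, not the authors' code; the 8-connectivity rule and the `min_pixels` noise threshold are assumptions.

```python
# Hedged sketch: preliminary crack counting via connected-component
# labeling on a binary mask. 8-connectivity and the noise threshold
# are assumed, tunable choices, not the paper's exact algorithm.
from collections import deque

def count_cracks(mask, min_pixels=20):
    """Count connected crack regions in a binary mask (list of rows)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                # Flood-fill one component with 8-connectivity.
                size, queue = 0, deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                # Tiny components are treated as annotation noise.
                if size >= min_pixels:
                    count += 1
    return count

# Two separate diagonal cracks in a 64x64 mask:
mask = [[0] * 64 for _ in range(64)]
for i in range(30):
    mask[i][i] = 1            # crack 1
    mask[i + 20][60 - i] = 1  # crack 2
print(count_cracks(mask, min_pixels=10))  # 2
```

Counts produced this way still require the manual verification step described above, since thin cracks broken by occlusion would be over-counted.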
- (2)
- Validation of inter-annotator agreement: Three professional researchers, each with at least two years of relevant experience in crack detection, independently performed blind annotation. Post hoc verification showed that the individual annotations were highly consistent overall. For the samples with inconsistent annotations, two senior domain experts conducted collective arbitration to confirm the final labels, ensuring the accuracy and consistency of the whole annotation process.
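The agreement-then-arbitration protocol above can be sketched as follows. The data layout and the function name `split_by_agreement` are illustrative assumptions.

```python
# Hedged sketch of the described consistency check: three independent
# crack-count annotations per image; unanimous samples are accepted,
# the rest are flagged for senior-expert arbitration.

def split_by_agreement(annotations):
    """annotations: dict image_id -> (count_a, count_b, count_c)."""
    agreed, disputed = {}, []
    for img, counts in annotations.items():
        if len(set(counts)) == 1:
            agreed[img] = counts[0]  # unanimous label accepted
        else:
            disputed.append(img)     # sent to collective arbitration
    return agreed, disputed

ann = {"img_001": (2, 2, 2), "img_002": (1, 2, 1), "img_003": (3, 3, 3)}
agreed, disputed = split_by_agreement(ann)
print(agreed)    # {'img_001': 2, 'img_003': 3}
print(disputed)  # ['img_002']
```

The fraction of unanimous samples gives a simple exact-match agreement rate across the three annotators.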
- (3)
- Potential biases in the dataset: The dataset suffers from three categories of inherent biases. First, original sample distribution bias: samples with one to three cracks account for more than 85% of the dataset. Although data augmentation achieved sample size balance, the feature diversity of augmented samples is slightly inferior to that of original samples. Second, scene coverage bias: all samples are restricted to pavement cracks, with no crack images from other civil engineering scenarios such as bridges and tunnels included, which limits the scene generalization ability. Third, annotation subjectivity bias: the connectivity judgment of some low-contrast and narrow cracks is affected by subjective factors, which may introduce minor uncertainty into the annotation of a small number of samples.
4. Method
4.1. AowFormer
4.2. CPM
4.3. Structure Design of CSPNet
5. Experiment
5.1. Experimental Settings
5.2. Evaluation Criteria
5.3. Comparison Experiment
5.4. Ablation Study
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Liu, Y.; Yao, J.; Lu, X.; Xie, R.; Li, L. DeepCrack: A Deep Hierarchical Feature Learning Architecture for Crack Segmentation. Neurocomputing 2019, 338, 139–153. [Google Scholar] [CrossRef]
- Schmugge, S.J.; Rice, L.; Lindberg, J.; Grizziy, R.; Joffey, C.; Shin, M.C. Crack segmentation by leveraging multiple frames of varying illumination. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017; IEEE: New York, NY, USA, 2017; pp. 1045–1053. [Google Scholar]
- Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1525–1535. [Google Scholar] [CrossRef]
- Guo, F.; Qian, Y.; Liu, J.; Yu, H. Pavement crack detection based on transformer network. Autom. Constr. 2023, 145, 104646. [Google Scholar] [CrossRef]
- Sun, M.; Zhao, H.; Li, J. Road crack detection network under noise based on feature pyramid structure with feature enhancement (road crack detection under noise). IET Image Process. 2022, 16, 809–822. [Google Scholar] [CrossRef]
- Inoue, Y.; Nagayoshi, H. Crack detection as a weakly-supervised problem: Towards achieving less annotation-intensive crack detectors. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: New York, NY, USA, 2021; pp. 65–72. [Google Scholar]
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 4015–4026. [Google Scholar]
- Li, J.; Yuan, C.; Wang, X.; Chen, G.; Ma, G. Semi-supervised crack detection using segment anything model and deep transfer learning. Autom. Constr. 2025, 170, 105899. [Google Scholar] [CrossRef]
- Sun, M.; Zhao, H.; Liu, P.; Zhou, J. A multi-task mean teacher with two stage decoder for semi-supervised crack detection. Multimed. Tools Appl. 2024, 83, 59519–59536. [Google Scholar] [CrossRef]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Subirats, P.; Dumoulin, J.; Legeay, V.; Barba, D. Automation of pavement surface crack detection using the continuous wavelet transform. In Proceedings of the 2006 International Conference on Image Processing, Las Vegas, NV, USA, 26–29 June 2006; IEEE: New York, NY, USA, 2006; pp. 3037–3040. [Google Scholar]
- Huang, W.; Zhang, N. A novel road crack detection and identification method using digital image processing techniques. In Proceedings of the 2012 7th International Conference on Computing and Convergence Technology (ICCCT), Seoul, Republic of Korea, 3–5 December 2012; IEEE: New York, NY, USA, 2012; pp. 397–400. [Google Scholar]
- Xu, W.; Tang, Z.; Zhou, J.; Ding, J. Pavement crack detection based on saliency and statistical features. In Proceedings of the 2013 IEEE International Conference on Image Processing, Melbourne, Australia, 15–18 September 2013; IEEE: New York, NY, USA, 2013; pp. 4093–4097. [Google Scholar]
- Zou, Q.; Cao, Y.; Li, Q.; Mao, Q.; Wang, S. CrackTree: Automatic crack detection from pavement images. Pattern Recognit. Lett. 2012, 33, 227–238. [Google Scholar] [CrossRef]
- Tang, J.; Gu, Y. Automatic crack detection and segmentation using a hybrid algorithm for road distress analysis. In Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics, Manchester, UK, 13–16 October 2013; IEEE: New York, NY, USA, 2013; pp. 3026–3030. [Google Scholar]
- Kapela, R.; Śniatała, P.; Turkot, A.; Rybarczyk, A.; Pożarycki, A.; Rydzewski, P.; Wyczałek, M.; Błoch, A. Asphalt surfaced pavement cracks detection based on histograms of oriented gradients. In Proceedings of the 2015 22nd International Conference Mixed Design of Integrated Circuits & Systems (MIXDES), Toruń, Poland, 25–27 June 2015; IEEE: New York, NY, USA, 2015; pp. 579–584. [Google Scholar]
- Quintana, M.; Torres, J.; Menéndez, J.M. A simplified computer vision system for road surface inspection and maintenance. IEEE Trans. Intell. Transp. Syst. 2015, 17, 608–619. [Google Scholar] [CrossRef]
- Zakeri, H.; Nejad, F.M.; Fahimifar, A.; Torshizi, A.D.; Zarandi, M.F. A multi-stage expert system for classification of pavement cracking. In Proceedings of the 2013 Joint IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), Edmonton, AB, Canada, 24–28 June 2013; IEEE: New York, NY, USA, 2013; pp. 1125–1130. [Google Scholar]
- Maode, Y.; Shaobo, B.; Kun, X.; Yuyao, H. Pavement crack detection and analysis for high-grade highway. In Proceedings of the 2007 8th International Conference on Electronic Measurement and Instruments, Xi’an, China, 16–18 August 2007; IEEE: New York, NY, USA, 2007; pp. 4–548. [Google Scholar]
- Amhaz, R.; Chambon, S.; Idier, J.; Baltazart, V. Automatic crack detection on two-dimensional pavement images: An algorithm based on minimal path selection. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2718–2729. [Google Scholar] [CrossRef]
- Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; IEEE: New York, NY, USA, 2016; pp. 3708–3712. [Google Scholar]
- Pauly, L.; Hogg, D.; Fuentes, R.; Peel, H. Deeper networks for pavement crack detection. In Proceedings of the 34th ISARC, Taipei, Taiwan, 28 June–1 July 2017; IAARC: Oulu, Finland, 2017; pp. 479–485. [Google Scholar]
- Feng, C.; Liu, M.Y.; Kao, C.C.; Lee, T.Y. Deep active learning for civil infrastructure defect detection and classification. Comput. Civ. Eng. 2017, 2017, 298–306. [Google Scholar]
- Eisenbach, M.; Stricker, R.; Seichter, D.; Amende, K.; Debes, K.; Sesselmann, M.; Ebersbach, D.; Stoeckert, U.; Gross, H.M. How to get pavement distress detection ready for deep learning? A systematic approach. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; IEEE: New York, NY, USA, 2017; pp. 2039–2047. [Google Scholar]
- Chen, Z.; Zhang, J.; Lai, Z.; Zhu, G.; Liu, Z.; Chen, J.; Li, J. The devil is in the crack orientation: A new perspective for crack detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6653–6663. [Google Scholar]
- Su, P.; Han, H.; Liu, M.; Yang, T.; Liu, S. MOD-YOLO: Rethinking the YOLO architecture at the level of feature information and applying it to crack detection. Expert Syst. Appl. 2024, 237, 121346. [Google Scholar] [CrossRef]
- Dong, X.; Liu, Y.; Dai, J. Concrete surface crack detection algorithm based on improved YOLOv8. Sensors 2024, 24, 5252. [Google Scholar] [CrossRef] [PubMed]
- Lau, S.L.; Chong, E.K.; Yang, X.; Wang, X. Automated pavement crack segmentation using u-net-based convolutional neural network. IEEE Access 2020, 8, 114892–114899. [Google Scholar] [CrossRef]
- Xu, H.; Su, X.; Wang, Y.; Cai, H.; Cui, K.; Chen, X. Automatic bridge crack detection using a convolutional neural network. Appl. Sci. 2019, 9, 2867. [Google Scholar] [CrossRef]
- Zhou, Q.; Qu, Z.; Li, Y.X.; Ju, F.R. Tunnel crack detection with linear seam based on mixed attention and multiscale feature fusion. IEEE Trans. Instrum. Meas. 2022, 71, 1–11. [Google Scholar] [CrossRef]
- Pang, J.; Zhang, H.; Zhao, H.; Li, L. DcsNet: A real-time deep network for crack segmentation. Signal Image Video Process. 2022, 16, 911–919. [Google Scholar] [CrossRef]
- Luo, J.; Lin, H.; Wei, X.; Wang, Y. Adaptive Canny and Semantic Segmentation Networks Based on Feature Fusion for Road Crack Detection. IEEE Access 2023, 11, 51740–51753. [Google Scholar] [CrossRef]
- Khan, M.A.M.; Kee, S.H.; Nahid, A.A. Vision-based concrete-crack detection on railway sleepers using dense U-Net model. Algorithms 2023, 16, 568. [Google Scholar] [CrossRef]
- Ai, W.; Zou, J.; Liu, Z.; Wang, S.; Teng, S. Light propagation and multi-scale enhanced DeepLabV3+ for underwater crack detection. Algorithms 2025, 18, 462. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; PMLR: London, UK, 2019; pp. 6105–6114. [Google Scholar]
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 11976–11986. [Google Scholar]
- Trockman, A.; Kolter, J.Z. Patches are all you need? arXiv 2022, arXiv:2201.09792. [Google Scholar]
- Lou, M.; Yu, Y. OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 128–138. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 568–578. [Google Scholar]
- Mehta, S.; Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar]
- Liu, J.; Huang, X.; Liu, Y.; Li, H. Mixmim: Mixed and masked image modeling for efficient visual representation learning. arXiv 2022, arXiv:2205.13137. [Google Scholar]
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR: London, UK, 2021; pp. 10347–10357. [Google Scholar]
- Shi, D. Transnext: Robust foveal visual perception for vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–18 June 2024; pp. 17773–17783. [Google Scholar]
- Fan, Q.; Huang, H.; Chen, M.; Liu, H.; He, R. Rmt: Retentive networks meet vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–18 June 2024; pp. 5641–5651. [Google Scholar]
- Su, L.; Ma, X.; Zhu, X.; Niu, C.; Lei, Z.; Zhou, J.Z. Can we get rid of handcrafted feature extractors? sparsevit: Nonsemantics-centered, parameter-efficient image manipulation localization through spare-coding transformer. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 3 March 2025; Volume 39, pp. 7024–7032. [Google Scholar]
- Wang, Z.; Liu, Y.; Tian, Y.; Liu, Y.; Wang, Y.; Ye, Q. Building Vision Models upon Heat Conduction. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 9707–9717. [Google Scholar]
- Xu, J.; Le, H.; Nguyen, V.; Ranjan, V.; Samaras, D. Zero-shot object counting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 15548–15557. [Google Scholar]
- Zhu, H.; Yuan, J.; Yang, Z.; Guo, Y.; Wang, Z.; Zhong, X.; He, S. Zero-shot object counting with good exemplars. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Cham, Switzerland, 2024; pp. 368–385. [Google Scholar]
- Liu, X.; Li, G.; Qi, Y.; Yan, Z.; Zhang, W.; Qing, L.; Huang, Q. Dynamic example network for class-agnostic object counting. Pattern Recognit. 2026, 170, 111998. [Google Scholar] [CrossRef]
- He, J.; Li, P.; Geng, Y.; Xie, X. Fastinst: A simple query-based model for real-time instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 23663–23672. [Google Scholar]
- Wei, Z.; Chen, P.; Yu, X.; Li, G.; Jiao, J.; Han, Z. Semantic-aware sam for point-prompted instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–18 June 2024; pp. 3585–3594. [Google Scholar]
- Akramin, M.; Marizi, M.; Husnain, M.; Shamil Shaari, M. Analysis of surface crack using various crack growth models. J. Phys. Conf. Ser. 2020, 1529, 042074. [Google Scholar] [CrossRef]
- Dai, Q.; Ishfaque, M.; Khan, S.U.R.; Luo, Y.L.; Lei, Y.; Zhang, B.; Zhou, W. Image classification for sub-surface crack identification in concrete dam based on borehole CCTV images using deep dense hybrid model. Stoch. Environ. Res. Risk Assess. 2025, 39, 4637–4654. [Google Scholar] [CrossRef]
- Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic road crack detection using random structured forests. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3434–3445. [Google Scholar] [CrossRef]
- Li, J.; Wen, Y.; He, L. SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 6153–6162. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 1314–1324. [Google Scholar]
- Ding, M.; Xiao, B.; Codella, N.; Luo, P.; Wang, J.; Yuan, L. Davit: Dual attention vision transformers. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part XXIV; Springer: Cham, Switzerland, 2022; pp. 74–92. [Google Scholar]
- Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31x31: Revisiting large kernel design in cnns. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11963–11975. [Google Scholar]
- Mangalam, K.; Fan, H.; Li, Y.; Wu, C.Y.; Xiong, B.; Feichtenhofer, C.; Malik, J. Reversible vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10830–10840. [Google Scholar]
- Han, D.; Li, T.; Wang, Z.; Huang, G. Vision Transformers are Circulant Attention Learners. arXiv 2025, arXiv:2512.21542. [Google Scholar] [CrossRef]
- Wang, F.; Ren, S.; Zhang, T.; Neskovic, P.; Bhattad, A.; Xie, C.; Yuille, A. ViT-5: Vision Transformers for The Mid-2020s. arXiv 2026, arXiv:2602.08071. [Google Scholar] [CrossRef]




| Comparison Dimension | Typical Transformer | AowFormer |
|---|---|---|
| Core Design Concept | Self-attention core | Self-attention approximation |
| Self-Attention Computation | Calculate weights, then weight features | Feature fusion + query vector |
| Attention Scope | ViT: Global; Swin: Windowed with feature isolation | Overlapping windows |
| Convolution Usage | Auxiliary to self-attention | Core of self-attention approximation |
| Nonlinear Operations | FFN uses ReLU/BN | No nonlinearity in Conv stages |
| Channel Dimension | Aggregate then restore | Preserved throughout; richer feature expression |
| Complexity and Parameters | Quadratic complexity; high parameters | Conv-level complexity; far fewer parameters |
| Feature Extraction Focus | Global dependencies; noise-prone | Local overlapping windows; noise-suppressed |
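The comparison table's key ideas (overlapping local windows, a purely linear fusion stage without BN/ReLU, a query vector that reweights channels, and a preserved channel dimension) can be illustrated with a minimal sketch. This is illustrative only and is not the authors' AowFormer implementation; all shapes and the name `aow_like_mixing` are assumptions.

```python
# Illustrative-only sketch of the table's design ideas: approximate
# attention by fusing features inside overlapping kxk windows with a
# linear (no nonlinearity) box filter, then weighting each channel by
# a learned query vector. The channel dimension is preserved throughout.
import numpy as np

def aow_like_mixing(x, query, k=3):
    """x: (C, H, W) feature map; query: (C,) per-channel weights."""
    C, H, W = x.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    fused = np.zeros_like(x)
    # Stride-1 kxk windows overlap, so neighboring positions share
    # pixels, unlike Swin's non-overlapping window partitions.
    for dy in range(k):
        for dx in range(k):
            fused += xp[:, dy:dy + H, dx:dx + W]
    fused /= k * k                        # purely linear fusion, no BN/ReLU
    return fused * query[:, None, None]   # channel dim stays C

x = np.random.rand(8, 16, 16).astype(np.float32)
out = aow_like_mixing(x, np.ones(8, dtype=np.float32))
print(out.shape)  # (8, 16, 16)
```

The cost is a fixed number of shifted adds per position, i.e., convolution-level complexity rather than the quadratic cost of full self-attention.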

| Method | Top1 Acc. | Mean AP | Mean Recall | Mean F1 |
|---|---|---|---|---|
| ViT [43] | 45.83 | 50.82 | 45.70 | 43.99 |
| MobileNetv3 [61] | 55.21 | 55.60 | 54.41 | 54.54 |
| MobileViT [45] | 52.43 | 55.73 | 51.78 | 50.19 |
| ResNet [37] | 48.26 | 48.94 | 47.74 | 48.25 |
| Swin Transformer [35] | 43.06 | 47.91 | 40.96 | 32.24 |
| ConvNext [40] | 37.85 | 51.29 | 36.91 | 29.50 |
| DeiT [47] | 51.04 | 51.07 | 51.51 | 50.12 |
| ConvMixer [41] | 51.74 | 51.39 | 52.08 | 51.22 |
| MixMim [46] | 35.42 | 38.69 | 35.83 | 32.31 |
| EfficientNet [39] | 54.51 | 55.95 | 53.75 | 54.15 |
| TransNext [48] | 53.47 | 53.04 | 54.52 | 51.49 |
| DaViT [62] | 54.51 | 54.48 | 54.63 | 54.48 |
| RepLKNet [63] | 53.12 | 53.98 | 51.91 | 52.25 |
| Rev-Vit [64] | 53.12 | 53.16 | 52.61 | 52.80 |
| RMT [49] | 54.17 | 52.80 | 54.65 | 50.15 |
| SparseViT [50] | 54.17 | 54.38 | 53.86 | 53.89 |
| vHeat [51] | 53.82 | 53.39 | 53.44 | 53.42 |
| OverLoCK [42] | 54.17 | 54.06 | 54.30 | 53.92 |
| CA-Deit [65] | 52.78 | 52.68 | 52.63 | 52.61 |
| ViT-5 [66] | 52.78 | 52.69 | 52.54 | 52.14 |
| CSPNet (ours) | 56.25 | 56.70 | 55.92 | 56.02 |

| Components | Top1 Acc. | Mean AP | Mean Recall | Mean F1 |
|---|---|---|---|---|
| AowFormer | 54.51 | 55.12 | 54.10 | 54.37 |
| CPM | 56.25 | 56.70 | 55.92 | 56.02 |

| Operations | Top1 Acc. | Mean AP | Mean Recall | Mean F1 |
|---|---|---|---|---|
| +ReLU | 55.21 | 55.93 | 55.20 | 55.25 |
| +BN and ReLU | 54.17 | 54.87 | 53.60 | 54.03 |
| Num 1 | 54.17 | 55.11 | 54.04 | 54.49 |
| Num 5 | 53.47 | 53.66 | 53.55 | 53.52 |
| Direct | 54.86 | 55.44 | 55.39 | 54.63 |
| QKV G = 1 | 54.51 | 54.45 | 54.10 | 54.26 |
| QKV G = 4 | 55.21 | 55.91 | 54.93 | 55.34 |

| Ratio | Top1 Acc. | Mean AP | Mean Recall | Mean F1 |
|---|---|---|---|---|
| 1:2 | 54.51 | 53.95 | 54.45 | 53.88 |
| 1:1 | 53.82 | 54.41 | 53.79 | 53.72 |
| 2:1 | 56.25 | 56.70 | 55.92 | 56.02 |
| 3:1 | 55.56 | 56.32 | 55.18 | 55.60 |
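The channel split ratios compared above can be sketched as dividing a C-channel feature map into "positive" and "negative" channel groups (2:1 performs best in the table). The splitting rule and the name `split_channels` are illustrative assumptions, not the CPM implementation.

```python
# Hedged sketch: segment channels of a (C, H, W) feature map into
# positive and negative groups at a given ratio, as in the ratio study.
import numpy as np

def split_channels(x, pos, neg):
    """Return the pos:neg channel split of x along its first axis."""
    n_pos = round(x.shape[0] * pos / (pos + neg))
    return x[:n_pos], x[n_pos:]

feat = np.zeros((48, 8, 8))
positive, negative = split_channels(feat, 2, 1)
print(positive.shape[0], negative.shape[0])  # 32 16
```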

| Imbalance Conditions | Top1 Acc. | Mean AP | Mean Recall | Mean F1 |
|---|---|---|---|---|
| Raw ratio | 43.06 | 29.58 | 40.79 | 30.86 |
| Imbalanced Sample 1 | 50.69 | 69.96 | 50.22 | 45.83 |
| Imbalanced Sample 2 | 52.43 | 58.88 | 51.78 | 50.69 |
| Balanced ratio | 56.25 | 56.70 | 55.92 | 56.02 |
Sun, M.; Xu, F.; Zhang, F.; Zhao, J.; Zhao, H. Channel Segmentation Proofreading Network for Crack Counting with Imbalanced Samples. Algorithms 2026, 19, 236. https://doi.org/10.3390/a19030236

