AFQSeg: An Adaptive Feature Quantization Network for Instance-Level Surface Crack Segmentation

Fang, Shaoliang; Lu, Lu; Lin, Zhu; Yang, Zhanyu; Wang, Shaosheng

doi:10.3390/computers14050182

by Shaoliang Fang ¹,², Lu Lu ¹,*, Zhu Lin ², Zhanyu Yang ¹ and Shaosheng Wang ¹

¹ School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China

² Guangdong Science and Technology Infrastructure Center, Guangzhou 510033, China

* Author to whom correspondence should be addressed.

Computers 2025, 14(5), 182; https://doi.org/10.3390/computers14050182

Submission received: 19 March 2025 / Revised: 22 April 2025 / Accepted: 7 May 2025 / Published: 9 May 2025

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)


Abstract

Concrete surface crack detection plays a crucial role in infrastructure maintenance and safety. Deep learning-based methods have shown great potential in this task. However, under real-world conditions such as poor image quality, environmental interference, and complex crack patterns, existing models still struggle to detect fine cracks and often rely on large numbers of parameters, limiting their practicality in complex environments. To address these issues, this paper proposes a crack detection model based on adaptive feature quantization, which primarily consists of a maximum soft pooling module, an adaptive crack feature quantization module, and a trainable crack post-processing module. Specifically, the maximum soft pooling module improves the continuity and integrity of detected cracks. The adaptive crack feature quantization module enhances the contrast between cracks and background features and strengthens the model’s focus on critical regions through spatial feature fusion. The trainable crack post-processing module incorporates edge-guided post-processing algorithms to correct false predictions and refine segmentation results. Experiments conducted on the Crack500 Road Crack Dataset show that the proposed model achieves notable improvements in detection accuracy and efficiency, with an average F1-score improvement of 2.81% and a precision gain of 2.20% over the baseline methods. In addition, the model significantly reduces computational cost, achieving a 78.5–88.7% reduction in parameter size and up to 96.8% improvement in inference speed, making it more efficient and deployable for real-world crack detection applications.

Keywords:

surface crack detection; adaptive feature quantization; deep learning model; accuracy enhancement

1. Introduction

Concrete surface crack detection is a crucial component of road maintenance and infrastructure safety management. Early and accurate identification of surface cracks helps prevent potential structural damage and traffic accidents [1,2]. Among various approaches, vision-based automatic crack detection has attracted increasing attention due to its potential for high efficiency, scalability, and cost-effectiveness [3,4].

Recent crack detection models are predominantly based on supervised deep learning techniques, often adopting encoder–decoder architectures [5]. These models, such as UNet [6], BiSeNetV2 [7], and DeeplabV3+ [8], typically rely on convolutional backbones to extract local features, followed by upsampling strategies to produce dense segmentation results. With the development of attention mechanisms, models like PCNet have improved the continuity of detection by integrating spatial and channel attention [2], while high-resolution segmentation network (HrSegNet) leverages multi-resolution branches to address the scale imbalance of slender cracks [9]. Additionally, loss functions such as Focal Loss [10] enhance model focus on hard-to-detect crack features. Some studies have further explored the use of generative adversarial networks (GANs) for image reconstruction to improve feature extraction indirectly [11,12,13].

Despite these advancements, existing models still face key challenges under real-world road conditions. First, fine cracks with elongated and irregular shapes are easily lost during downsampling, resulting in low recall and incomplete segmentation [1,14]. Second, models often require large parameter sizes to achieve high accuracy, which limits their deployment in resource-constrained scenarios. Lastly, diverse road textures, occlusions such as stains and shadows, and inconsistent manual annotations introduce noise into training, reducing model robustness and generalization [3,4]. In addition, although post-processing algorithms have been developed to refine segmentation results based on surrounding pixel predictions [15,16,17], they often lack robustness across datasets and require manual parameter tuning.

To address these challenges, this paper proposes a novel crack detection model based on adaptive feature quantization. The model comprises three main components: (1) a maximum soft pooling module to preserve the continuity and completeness of crack structures; (2) an adaptive feature quantization module that highlights key crack features through enhanced spatial fusion; and (3) a trainable, edge-guided post-processing module that corrects prediction errors and refines segmentation boundaries. Experimental results on the Crack500 Road Crack Dataset demonstrate that our model achieves higher accuracy and precision than the baseline methods while significantly reducing computational cost, with fewer parameters and faster inference.

The contributions of this article are as follows:

We propose AFQSeg, an adaptive feature quantization network for instance-level surface crack segmentation. It integrates three key modules (SMP for downsampling, AFQ for feature enhancement, and CR for refined segmentation) to improve robustness and accuracy in crack detection.
We design the SoftMax Pooling (SMP) module to enhance the continuity of fine crack detection. By preserving all pixel information within pooling windows and adaptively adjusting weights based on pixel distributions, the module reduces feature loss during downsampling while achieving efficient information compression.
We introduce the AFQ (Adaptive Feature Quantization) module, which enhances feature representation by leveraging a pre-trained vector quantized generative adversarial network (VQGAN) codebook and a spatial feature fusion strategy. This combination allows the model to extract more representative crack prototypes and adaptively integrate them with current feature maps, improving detection performance across different domains.
We propose the crack refinement (CR) module, which fuses edge information with high-level features and introduces an edge-aware loss function to enhance the accuracy of boundary segmentation.
We significantly reduce model complexity, achieving up to 88.7% fewer parameters and 96.8% faster inference compared to baseline models, making AFQSeg more suitable for real-world deployment.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the proposed method. Section 4 outlines the experimental setup. Section 5 presents the results and analysis. Section 6 discusses the method’s strengths and limitations. Finally, Section 7 concludes the paper and suggests future work.

2. Related Works

2.1. Crack Detection Under Challenging Conditions

Recent studies have addressed the challenges of crack detection under complex environmental conditions, such as varying illumination and underwater settings. For instance, Fan et al. proposed a shadow-removal-oriented crack detection approach that effectively mitigates the interference of shadows in pavement crack images, enhancing detection accuracy [18]. Similarly, Pal et al. provided an overview of the difficulties associated with automatic detection of concrete cracks in the presence of shadows, highlighting the limitations of models trained on ideal conditions [19].

In underwater environments, where visibility is often compromised, Orinaitė et al. developed a deep learning-based approach for detecting concrete cracks below the waterline, demonstrating the feasibility of machine learning techniques in such challenging conditions [20]. Additionally, a study introduced an improved YOLOv8 network tailored for underwater crack detection, incorporating image enhancement methods to address issues like low contrast and blurriness [21].

2.2. Encoder–Decoder Architectures

Research on concrete surface crack detection has progressed significantly in recent years, particularly with the adoption of deep learning-based approaches. Most existing methods follow an encoder–decoder architecture, where convolutional layers extract features and upsampling layers restore spatial resolution for pixel-wise segmentation [6,22,23]. Among them, UNet remains a foundational baseline due to its simplicity and effectiveness [6].

2.3. Attention Mechanisms

To improve performance, many researchers have explored attention mechanisms. For instance, channel and spatial attention modules have been integrated to enhance the model’s capacity to focus on critical regions. PCNet introduces both types of attention to address crack continuity issues [2], while HrSegNet incorporates multi-resolution branches to capture detailed crack structures and mitigate scale imbalance [9]. Further improvements have been achieved using transformer-based designs and more advanced attention configurations [24,25,26].

2.4. Post-Processing Techniques

Loss function optimization is another area of focus. The Focal Loss has been shown to help models pay more attention to hard-to-detect crack features, thereby improving accuracy in imbalanced scenarios [10]. Additionally, some researchers have introduced generative adversarial networks (GANs) for image-level reconstruction before detection, which indirectly enhances the quality of feature representations [11,12,13].

To further refine results, post-processing methods have been adopted to improve segmentation accuracy. These techniques often rely on the contextual consistency of predicted pixels, adjusting outputs based on neighboring information [15,16,17]. However, such methods can be sensitive to parameter settings and lack robustness across datasets.

2.5. Existing Challenges

While the incorporation of attention mechanisms, advanced loss functions, and post-processing steps has led to notable performance improvements, existing methods still face limitations in robustness, parameter efficiency, and their ability to handle challenging real-world conditions.

3. Method

In light of the aforementioned challenges, this paper proposes a targeted AFQSeg surface crack detection model based on adaptive feature quantization. This model is primarily composed of three key modules: the maximum soft pooling module (SMP), the adaptive crack feature quantization module (AFQ), and the trainable crack post-processing module (CR). The SMP module is designed to address the limitations of continuous recognition of small cracks caused by feature information loss during dimensionality reduction in traditional max pooling operations. SMP efficiently preserves high-resolution features, minimizing information loss during down-sampling and enhancing the capture of fine crack details. The AFQ module is designed to extract more representative crack prototype features through VQGAN [27,28,29,30]. It dynamically adjusts the fusion strategy to ensure optimal alignment and complementarity between features, which in turn improves the model’s detection accuracy and robustness across various domains. The CR module is designed to solve the common problems of inaccurate segmentation and misjudgment caused by annotation errors in practical applications. By accurately locating edge pixels and guiding the segmentation pixels to diffuse towards the target edge, it effectively compensates for the segmentation deviation caused by inaccurate annotation, ensuring the accuracy and reliability of detection results.

3.1. Framework

The proposed AFQSeg surface crack detection model, which is based on adaptive feature quantization, introduces three key modules: the SMP (Soft Max Pooling), AFQ (Adaptive Crack Feature Quantization), and CR (Crack Post-processing) modules. These modules are integrated into the conventional encoder-decoder architecture for image defect detection, enabling efficient recognition and precise localization of crack areas in road images, as illustrated in Figure 1.

3.2. AFQSeg Surface Crack Detection Model

In the encoding stage, the model first processes the input crack image through a feature extraction module, reducing the size of the feature map to reduce computational complexity. It then employs a four-layer Attention Block module to extract features. This module merges traditional convolutional layers with attention mechanisms, encoding block features first and then enhancing the focus on global contextual information through self-attention mechanisms. This design enhances the model’s understanding of complex textures and structures, especially suitable for capturing slender and complex targets such as cracks. To mitigate computational complexity while retaining crucial information, the model introduces the SMP module with a probabilistic down-sampling method, effectively avoiding key information loss and enhancing the model’s robustness.

In the encoding-decoding stage, the AFQ module is extensively utilized, employing pre-trained codebooks for feature enhancement during the encoding phase. The vectors in these codebooks represent the key visual features in the image library. By computing feature distances and quantizing the features accordingly, the model learns more abstract and generalized crack features, thereby enhancing the quality of feature representation and strengthening the model’s crack identification capabilities. To integrate the quantized and extracted features, the model uses the DF (Dynamic Fusion) module, which dynamically adjusts fusion weights at each feature map position, achieving efficient consolidation of features across different scales and types. This mechanism enhances the model’s ability to discern cracks against complex backgrounds, thereby improving the precision and robustness of its predictions. During the decoding process, multiple SegBlocks are used to fuse and segment the features of each layer. SegBlock consists of multiple convolutional layers, each containing multiple CBAM (Convolutional Block Attention Module) and MLP (Multi-Layer Perceptron) modules. These modules further extract and enhance crack features to ensure that the model can accurately identify and locate crack areas. The model uses the CBAM module to fuse and enhance the features of skip layer concatenation. Then, linear interpolation is used to up-sample the segmentation features.

In the decoding stage, in order to further improve the precision of prediction, especially in the definition of crack edges, a CR module is integrated for crack refinement. In this stage, multi-scale separable convolution modules are used to enhance feature extraction, and a reprediction module is employed to perform secondary segmentation on the edge regions in the preliminary prediction results. This strategy significantly reduces false positives and false negatives, and improves the accuracy of crack detection.

3.3. SoftMax Pooling Module

Max Pooling is a common downsampling operation in convolutional neural networks, which reduces the feature scale by preserving the maximum value within a specified window size. However, this method is prone to losing feature details because it directly ignores non-maximum elements in the window [31]. To counteract this issue, this paper proposes the Soft Max Pooling method, aimed at preserving the information of all pixels in the window, not just the maximum value, as depicted in Figure 2. The main steps of Soft Max Pooling are as follows:

Conduct traditional max pooling to identify the index of the maximum value within each window.
Utilize tensors, where all elements are set to 1, along with the recorded index of the maximum value, to reconstruct a weight map equivalent to the current feature scale.
In this weight map, assign a weight of 1 to the position of the maximum value and 0 to all other positions.
To ensure a smoother transition and to retain more information, fine-tune the weight map.
Input the adjusted weight map into a 3 × 3 convolutional layer to calculate new pooling weights via convolution.
Assign a weight to each pixel based on the distribution of pixel values within the window, instead of selecting only the maximum value.
Apply the calculated pooling weights to perform weighted down-sampling on the original feature map, resulting in an output where each position is the weighted sum of all pixels in the window.
Integrate this process with a branch that employs adaptive down-sampling through convolution, merging its output with that of the Soft Max Pooling branch to achieve the final Soft Max Pooling result.

SoftMax Pooling retains the information of all pixels in the window, not just the maximum value, thereby reducing the loss of feature details. By calculating pooling weights through convolution, the weights can be adaptively adjusted based on the distribution of pixel values, making the down-sampling results more reasonable. Combining direct convolution down-sampling branches increases the flexibility and adaptability of the model.
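The idea of keeping every pixel in the window, weighted by the value distribution, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the learned 3 × 3 convolution that produces the pooling weights and the parallel convolutional down-sampling branch are replaced here by a fixed softmax over each window, and the names `soft_weighted_pool` and `beta` are hypothetical.

```python
import numpy as np

def soft_weighted_pool(x, k=2, beta=1.0):
    """Softmax-weighted pooling over non-overlapping k x k windows.

    x    : 2-D feature map of shape (H, W), with H and W divisible by k.
    beta : sharpness (assumed parameter); beta = 0 gives average pooling,
           large beta approaches max pooling.
    """
    h, w = x.shape
    # Rearrange into windows: (H/k, W/k, k*k)
    win = (x.reshape(h // k, k, w // k, k)
             .transpose(0, 2, 1, 3)
             .reshape(h // k, w // k, k * k))
    # Numerically stable softmax weights inside each window
    z = beta * win
    z = z - z.max(axis=-1, keepdims=True)
    weights = np.exp(z)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted sum of ALL pixels in its window
    return (weights * win).sum(axis=-1)

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
avg_like = soft_weighted_pool(x, k=2, beta=0.0)   # uniform weights
max_like = soft_weighted_pool(x, k=2, beta=50.0)  # nearly one-hot weights
```

With `beta = 0` every pixel contributes equally, and as `beta` grows the result approaches ordinary max pooling, so intermediate values keep some contribution from the non-maximum pixels that max pooling would discard.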

3.4. Adaptive Feature Quantization Module

The core components of this module include a feature space conversion module and a quantized feature encoder [32]. The module takes the fourth-layer features as input and applies a 1 × 1 convolutional layer to map them nonlinearly into the feature space of the current codebook. The features are then quantized using a pre-trained codebook. Finally, the quantized features are mapped to the feature space through a 1 × 1 convolution to obtain effective quantized features, as shown in Figure 3a.

The codebook is initialized through a pre-training process so that its parameters can adapt to different task requirements. The codebook is mainly responsible for quantizing the features extracted by the model. These quantized features reflect the category of each pixel, thereby enhancing the model’s ability to restore image details. To further optimize the effects of style reconstruction and local detail reconstruction, this module introduces a local discriminator in addition to the global discriminator, as shown in Figure 3b. This discriminator can classify both local and global features of an image simultaneously, ensuring consistency between overall style and local details during the reconstruction process. Through this mechanism, the codebook not only achieves image feature extraction and clustering optimization, but also plays an important role in feature enhancement.

During feature quantization, the model uses the codebook to quantize the extracted features. Quantization aims to improve the processing efficiency and accuracy of the model by reducing the complexity of the feature space and encoding the image with a limited number of features. At the same time, the quantized features express the category information of pixels more effectively, providing a valuable reference for subsequent crack segmentation. After the spatial mapping of the fourth-layer features, cosine similarity is used to find, for each pixel feature, the codebook feature at the smallest distance. Cosine similarity reflects how similar a pixel feature is to each feature in the codebook, so the best-matching feature representation can be selected for each pixel, providing a quantization target.
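The nearest-entry lookup described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `quantize_by_cosine`, the toy codebook, and the flattening of pixels into rows are assumptions, and the 1 × 1 convolutions before and after quantization are omitted.

```python
import numpy as np

def quantize_by_cosine(features, codebook, eps=1e-12):
    """Replace each feature by its most cosine-similar codebook entry.

    features : (N, D) array, one row per pixel feature.
    codebook : (K, D) array of (pre-trained) code vectors.
    Returns (indices, quantized) with shapes (N,) and (N, D).
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    c = codebook / (np.linalg.norm(codebook, axis=1, keepdims=True) + eps)
    similarity = f @ c.T                 # (N, K) cosine similarities
    indices = similarity.argmax(axis=1)  # highest similarity = smallest cosine distance
    return indices, codebook[indices]

# Toy example: three 2-D code vectors and three pixel features
codebook = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
features = np.array([[2.0, 0.1], [0.2, 3.0], [-5.0, 1.0]])
indices, quantized = quantize_by_cosine(features, codebook)
```

Because cosine similarity ignores vector length, the first feature maps to the code vector `[1, 0]` even though its magnitude is twice as large; only direction decides the match.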

After completing the feature mapping, in order to further improve the generalization ability and robustness of the model, this study performed an additional spatial transformation on the obtained features. This conversion process aims to weaken the model’s excessive dependence on feature encoding, allowing the model to adapt more flexibly to different input data. Through this mechanism, the model can reduce its dependence on specific codebooks while maintaining high performance, improving its generality and scalability.

To effectively integrate the quantized features with the extracted features, this paper proposes an adaptive spatial fusion module, as shown in Figure 4. This module first evaluates the importance of each pixel by calculating the channel accumulation value and performing convolution operations. Then, the softmax function is used to normalize these importance levels and obtain the weight of each pixel in the feature fusion process. Finally, based on these weights, the features are accumulated to fuse the quantized features with the extracted features. This fusion process can fully utilize the advantages of both features, improving the model’s expressive power and reconstruction quality. Moreover, because the module is adaptive, it can automatically adjust the fusion strategy based on different input data, giving it high flexibility and generalization ability.
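A rough NumPy sketch of this per-pixel weighting follows. It is a simplification under stated assumptions rather than the module in Figure 4: the convolution applied to the channel-accumulated map is replaced by a single scalar `scale`, and the name `adaptive_spatial_fusion` is hypothetical.

```python
import numpy as np

def adaptive_spatial_fusion(extracted, quantized, scale=1.0):
    """Fuse two (C, H, W) feature maps with per-pixel softmax weights.

    The importance of each branch at each pixel is its channel sum,
    multiplied by `scale` (a stand-in for the learned convolution).
    Returns (fused, weights) with shapes (C, H, W) and (2, H, W).
    """
    scores = np.stack([extracted.sum(axis=0), quantized.sum(axis=0)]) * scale
    scores = scores - scores.max(axis=0, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted accumulation of the two branches at every pixel
    fused = weights[0] * extracted + weights[1] * quantized
    return fused, weights

a = np.arange(8.0).reshape(2, 2, 2)
fused_same, w_same = adaptive_spatial_fusion(a, a)          # identical branches
fused_mix, w_mix = adaptive_spatial_fusion(a, np.zeros_like(a))
```

When both branches are identical, each receives weight 0.5 and the fused map equals the input; when they differ, the branch with the larger accumulated response dominates at that pixel.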

3.5. Post-Processing Module

In crack detection tasks, the appearance of cracks is often highly similar to that of the background, which leads to uncertainty in the predicted class probabilities. In addition, during model training, predictions at the original resolution are obtained by up-sampling the segmentation results. However, this up-sampling may not accurately capture crack details, especially when cracks occupy only a small proportion of the image pixels, thereby reducing the accuracy of segmentation [33]. To overcome this challenge, we incorporate the re-segmentation strategy from PointRend into our model, combining it with edge-aware optimization to refine the final crack segmentation outputs. Specifically, the extracted second-layer features and the current prediction result are used as inputs for re-segmentation, and a specialized re-segmentation module is trained, as shown in Figure 5.

First, multiple points are randomly oversampled on the feature map, and linear interpolation is applied to each input feature independently to obtain features upsampled to their original spatial positions. This ensures that the model can fully utilize the information in the original image during re-segmentation, improving the accuracy of crack detection. However, cracks occupy only a small proportion of the image pixels, and the prediction of crack edges is crucial for overall detection accuracy [34]. Therefore, this article further introduces edge detection operators, focusing on re-segmentation of edge pixels. The model concatenates the second-layer features and classification features corresponding to each pixel and inputs them into a multi-layer perceptron (MLP) network to predict the classification of the features. In this way, the model can more accurately identify the edges of cracks and further improve the accuracy of crack detection.
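The text does not name the edge detection operator, so the following NumPy sketch uses one simple possibility as an assumption: a foreground pixel counts as an edge pixel if at least one of its 4-neighbours is background. The function name `boundary_pixels` is hypothetical.

```python
import numpy as np

def boundary_pixels(mask):
    """Return a boolean map of foreground pixels that touch background.

    mask : 2-D binary array (1 = predicted crack, 0 = background).
    Pixels outside the image are treated as background (an assumption).
    """
    m = np.asarray(mask).astype(bool)
    p = np.pad(m, 1, constant_values=False)
    up, down = p[:-2, 1:-1], p[2:, 1:-1]
    left, right = p[1:-1, :-2], p[1:-1, 2:]
    # Interior pixels have all four neighbours in the foreground
    interior = m & up & down & left & right
    return m & ~interior

mask = np.zeros((5, 5), dtype=int)
mask[1:4, 1:4] = 1                 # a 3 x 3 foreground block
edges = boundary_pixels(mask)      # the ring around the block centre
edge_coords = np.argwhere(edges)   # (row, col) candidates for re-segmentation
```

The resulting coordinates are the positions at which second-layer features and coarse predictions would be gathered and passed to the re-segmentation MLP.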

In the inference stage, sampling points are selected mainly according to classification uncertainty. Points with low classification probability and high uncertainty are prioritized for further segmentation, so that the model focuses on difficult-to-distinguish regions, further improving the accuracy and robustness of crack detection.
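One common way to express this selection rule, used here as an assumption, is to rank pixels by how close their predicted crack probability is to 0.5 and keep the `k` closest. The helper name `select_uncertain_points` is hypothetical.

```python
import numpy as np

def select_uncertain_points(prob, k):
    """Pick the k pixels whose crack probability is closest to 0.5.

    prob : 2-D array of predicted foreground probabilities in [0, 1].
    Returns a (k, 2) integer array of (row, col) coordinates,
    ordered from most to least uncertain.
    """
    flat = prob.ravel()
    # Small distance to 0.5 means high uncertainty
    order = np.argsort(np.abs(flat - 0.5), kind="stable")[:k]
    return np.stack(np.unravel_index(order, prob.shape), axis=1)

prob = np.array([[0.90, 0.52],
                 [0.10, 0.45]])
points = select_uncertain_points(prob, k=2)
```

In this toy map the confident pixels (0.90 and 0.10) are skipped, and the two ambiguous pixels are returned for re-segmentation.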

4. Experiment

4.1. Dataset Description

To verify the effectiveness of the proposed method, we adopt the Crack500 Road Crack Dataset, a widely used benchmark in the field of computer vision and deep learning for road crack detection. This dataset is of significant value for applications such as intelligent transportation systems, urban infrastructure maintenance, and predictive maintenance. Owing to its reliability and generality, Crack500 has been adopted in numerous prior studies.

The dataset consists of 250 training images, 50 validation images, and 200 test images, covering various types of cracks including horizontal, vertical, and irregular patterns—each presenting different levels of detection difficulty. Although the dataset primarily features road surface images, the proposed AFQSeg model is designed to generalize across different concrete structures. With minimal domain-specific adaptation, it can be extended to other scenarios such as bridges, walls, and building foundations.

4.2. Experimental Environment

To evaluate the performance of the proposed method, experiments were implemented using the PyTorch (https://pytorch.org/) deep learning framework, with code developed in Python 3.6. All input images were uniformly resized to a fixed resolution to reduce computational cost and maintain consistency. Batch normalization was applied after each convolutional layer to accelerate convergence during training. Xavier initialization was used to initialize the model weights, and parameters were updated using stochastic gradient descent (SGD).

Training and testing were conducted on a server equipped with an Intel(R) Xeon(R) Silver 4214 CPU (4 cores) and an NVIDIA Tesla A100-SXM4-40GB GPU.

4.3. Evaluation Indicators

This article uses four key evaluation metrics that are widely used in machine learning: Intersection over Union (IoU), Recall (R_ec), Precision (P_re), and F₁ score. They are calculated as follows:

$$\mathrm{IoU} = \frac{\mathrm{area}(C) \cap \mathrm{area}(G)}{\mathrm{area}(C) \cup \mathrm{area}(G)} \tag{1}$$

$$R_{ec} = \frac{T_{p}}{T_{p} + F_{n}} \tag{2}$$

$$P_{re} = \frac{T_{p}}{T_{p} + F_{p}} \tag{3}$$

$$F_{1} = 2 \times \frac{P_{re} \times R_{ec}}{P_{re} + R_{ec}} \tag{4}$$

In these formulas, T_p is the number of positive samples correctly classified, F_p is the number of negative samples misclassified as positive, F_n is the number of positive samples misclassified as negative, C is the candidate box, and G is the original marked (ground-truth) box. Intersection over Union (IoU) is the overlap rate between the candidate box generated by detection and the ground-truth box, that is, the ratio of their intersection to their union. The ideal situation is complete overlap, with a ratio of 1.
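For pixel-wise segmentation masks, Equations (1)–(4) can be computed directly from the counts of true positives, false positives, and false negatives. The following NumPy sketch assumes binary masks and treats IoU as T_p / (T_p + F_p + F_n), which is the pixel-level form of the intersection-over-union ratio; the function name and the zero-division conventions are assumptions.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute IoU, recall, precision, and F1 for two binary masks."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tp = int(np.sum(pred & gt))    # crack pixels correctly predicted
    fp = int(np.sum(pred & ~gt))   # background predicted as crack
    fn = int(np.sum(~pred & gt))   # crack pixels that were missed
    # Zero-division convention (assumed): return 0.0 when undefined
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"iou": iou, "recall": recall, "precision": precision, "f1": f1}

pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
metrics = segmentation_metrics(pred, gt)
```

In this toy example there are two true positives, one false positive, and one false negative, so IoU is 2/4 while recall, precision, and F1 are each 2/3.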

5. Results and Analysis

To better validate the accuracy, efficiency, and experimental feasibility of the proposed method, comparisons were made with five commonly used reference algorithms: UNet [25], BiSeNetV2 [35], DeeplabV3+ [36], HRNet [13], and SwinUNet [37]. In addition to evaluation metrics such as Intersection over Union (IoU), Recall, Precision, and F1-score, the parameter size (Params) and GFLOPs were also calculated to better verify the practicality and efficiency of the method. The specific experimental results are shown in Table 1.

Experiments show that the proposed method has good detection performance, with an Intersection over Union (IoU) of 75.93%, a recall of 88.49%, a precision of 82.81%, and an F1-score of 84.45%. Compared with the five reference algorithms (UNet, BiSeNetV2, DeeplabV3+, HRNet, and SwinUNet), all four evaluation metrics improve. Precision reflects how accurate the detected cracks are; compared with the reference algorithms, the maximum improvement is 6.67% and the minimum improvement is 0.62%. IoU reflects the degree of overlap between the target and the candidate and therefore also reflects the crack detection quality; compared with the reference algorithms, it shows a maximum improvement of 6.13% and a minimum improvement of 1.14%.

Recall reflects the comprehensiveness of detection, which is especially important in transportation: to prevent traffic accidents, even subtle cracks sometimes need to be detected in a timely manner. The maximum increase in recall is 6.6%, and the minimum increase is 0.96%. The F1-score reflects how well positive and negative examples are recognized, with a maximum increase of 6.95% and a minimum increase of 0.96%. In summary, the method comprehensively improves the precision, efficiency, and practicality of crack detection.

This method can effectively detect cracks in various scene images. The detection results on scenes of the same type as the experimental images are shown in Figure 6, where each row represents a type of scene, each column represents a detection model, green marks the defect itself, and red marks the predicted result. It can be seen that the proposed method has good detection performance.

While improving the evaluation metrics, the method also greatly reduces the model size and computational cost. The model has 6.683 MB of parameters and a computational cost of 7.081 GFLOPs. Compared with the reference algorithms, this method greatly reduces the model parameters and improves the computational speed. The specific experimental results are shown in Table 2.

The model parameters are reduced by 49.7–88.7%, with the largest reduction relative to DeeplabV3+ and a 49.7% reduction relative to BiSeNetV2, which itself has a parameter size of only 13.294 MB. This shows that the method is substantially optimized at the parameter level. The method also has good computational performance, with an improvement range of 0.92–96.8%; the largest improvement is relative to UNet.
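The 49.7% figure can be checked from the two parameter sizes quoted in the text (6.683 for AFQSeg and 13.294 for BiSeNetV2), assuming both are expressed in the same unit. The helper name `relative_reduction_pct` is hypothetical.

```python
def relative_reduction_pct(baseline, ours):
    """Percentage by which `ours` is smaller than `baseline`."""
    return (baseline - ours) / baseline * 100.0

bisenetv2_params = 13.294  # parameter size quoted for BiSeNetV2
afqseg_params = 6.683      # parameter size quoted for AFQSeg

reduction = relative_reduction_pct(bisenetv2_params, afqseg_params)
print(f"{reduction:.1f}%")  # prints "49.7%"
```

The same formula applies to the other baselines once their parameter sizes are read from Table 2.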

The model integrates the SMP, AFQ, and CR modules into the encoder–decoder architecture, giving the method good accuracy, recall, and practicality. To assess the stability and simplicity of the model, one or more modules were removed in ablation experiments, and the resulting prediction performance was compared and analyzed. The experimental results show that the three modules complement each other and that their combination performs better than using them separately. The specific experimental results are shown in Table 3.

6. Discussion

The proposed AFQSeg model demonstrates several notable advantages over existing crack detection methods. First, it achieves consistent improvements across all key evaluation metrics—IoU, Recall, Precision, and F1-score—when compared to five representative deep learning models (UNet, BiSeNetV2, DeeplabV3+, HRNet, and SwinUNet). Notably, AFQSeg achieves an F1-score of 84.45%, outperforming the best baseline (SwinUNet) by 1.75%, and yields a Recall improvement of up to 6.6%, which is particularly critical in safety-sensitive applications such as road maintenance, where even subtle cracks must be identified reliably.

Another significant strength lies in the model’s lightweight architecture. AFQSeg achieves these accuracy gains while using only 6.683 MB of parameters, with a parameter reduction of up to 88.7% compared to heavier models like DeeplabV3+, and up to 96.8% improvement in inference speed over UNet. These improvements make the model highly practical for real-time or resource-constrained scenarios such as UAV-based inspections or mobile deployment.

From a design perspective, AFQSeg integrates three complementary modules—SMP for feature-preserving downsampling, AFQ for semantic feature quantization and fusion, and CR for post-refinement guided by edge information. Ablation studies confirm that these modules contribute synergistically to the final performance, with the full integration yielding superior results over any partial configuration.

However, the method also has certain limitations. Unlike some recent approaches that report extremely high accuracy (up to 99%), the performance metrics reported here are relatively modest. This discrepancy is primarily due to the choice of dataset. The Crack500 dataset used in our experiments contains more diverse, complex, and challenging real-world images than many previously used datasets, making it a more rigorous testbed. In addition, no extensive pretraining or domain-specific tuning was applied to artificially boost performance. Thus, while the numerical performance may appear lower than some previous works, the evaluation here reflects realistic robustness and generalization.

Furthermore, the current model is validated mainly on road surface cracks. While the design is generic and potentially applicable to other types of concrete surfaces (e.g., bridges, tunnels, walls), additional optimization may be needed to handle drastically different image characteristics or environmental conditions such as underwater imagery.

In summary, AFQSeg offers a strong balance between detection accuracy, computational efficiency, and architectural simplicity. The proposed method demonstrates strong potential for real-world applications and future adaptation to a wider range of surface types or more challenging detection environments.

7. Conclusions

This paper addresses the challenge of detecting fine surface cracks, which are often difficult to identify due to their narrow width, irregular shapes, and susceptibility to visual interference. To tackle this problem, we propose a novel crack detection model based on adaptive feature quantization. The model integrates a feature-preserving pooling module, a prototype-driven quantization mechanism, and an edge-aware refinement component. Experimental results on the Crack500 dataset demonstrate that the proposed method outperforms several baselines in terms of IoU, precision, recall, and F1-score, while also significantly reducing model parameters and computational cost. By enhancing the contrast between cracks and background features and preserving high-resolution details, the model effectively improves the detection of subtle and elongated cracks. Moreover, the adaptive fusion strategy enhances the model’s robustness across complex scenarios.

In future work, we plan to explore the application of AFQSeg to more diverse concrete surfaces beyond roads, such as bridge decks and structural walls. Given the model’s adaptability and modular design, we expect that, with appropriate domain-specific fine-tuning, AFQSeg can be effectively extended to a broader range of crack detection scenarios.

Author Contributions

Conceptualization, S.F., Z.Y. and S.W.; methodology, S.F. and Z.Y.; software, S.F.; validation, L.L., Z.L. and L.L.; formal analysis, S.F. and L.L.; investigation, S.F.; resources; data curation; writing—original draft preparation, S.F.; writing—review and editing, S.F. and S.W.; visualization, S.F.; supervision, L.L. and Z.L.; project administration, L.L. and Z.L.; funding acquisition, L.L. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Areas Research and Development Project of Guangdong Province, grant numbers 2022B0101070001 and 2022B0101070002.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Overall model framework.

Figure 2. Maximum pooling module.

Figure 3. Adaptive feature quantization module. (a) adaptive feature quantization module (feature space conversion). (b) adaptive feature quantization module (quantized feature encoder).

Figure 4. Adaptive spatial fusion module.

Figure 5. Post processing module.

Figure 6. Crack detection effect.

Table 1. Model evaluation index results.

Model	IoU	Recall	F1-Score	Precision
UNet	73.69%	84.96%	81.75%	82.19%
BiSeNetV2	74.27%	87.03%	82.42%	81.23%
DeeplabV3+	73.73%	86.22%	81.86%	81.05%
HRNet	69.80%	81.89%	77.49%	76.14%
SwinUNet	74.79%	87.53%	82.70%	80.46%
AFQSeg (ours)	75.93%	88.49%	84.45%	82.81%
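
The margins quoted in the Discussion can be re-derived directly from Table 1. The short script below is a sketch of that arithmetic; the helper name `margins` and the data layout are illustrative choices, while the numbers are copied from the table.

```python
# Table 1 values in percent, ordered as IoU, Recall, F1-score, Precision.
METRICS = ("IoU", "Recall", "F1-score", "Precision")
BASELINES = {
    "UNet":       (73.69, 84.96, 81.75, 82.19),
    "BiSeNetV2":  (74.27, 87.03, 82.42, 81.23),
    "DeeplabV3+": (73.73, 86.22, 81.86, 81.05),
    "HRNet":      (69.80, 81.89, 77.49, 76.14),
    "SwinUNet":   (74.79, 87.53, 82.70, 80.46),
}
AFQSEG = (75.93, 88.49, 84.45, 82.81)

def margins(baselines, ours, metrics=METRICS):
    """Per metric: (strongest baseline, margin over it, weakest baseline, margin over it)."""
    result = {}
    for i, metric in enumerate(metrics):
        scores = {name: row[i] for name, row in baselines.items()}
        strongest = max(scores, key=scores.get)
        weakest = min(scores, key=scores.get)
        result[metric] = (
            strongest, round(ours[i] - scores[strongest], 2),
            weakest, round(ours[i] - scores[weakest], 2),
        )
    return result

table1_margins = margins(BASELINES, AFQSEG)
for metric, row in table1_margins.items():
    print(metric, row)
```

All margins are differences in percentage points. The F1-score margin over the strongest baseline (SwinUNet) and the Recall margin over the weakest baseline (HRNet) correspond to the 1.75 and 6.6 figures cited in the text.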

Table 2. Model parameters and computational cost. Decreases and rates are relative to AFQSeg (ours).

Model	UNet	BiSeNetV2	DeeplabV3+	HRNet	SwinUNet	Ours
Params (MB)	31.038	13.294	59.339	29.533	27.146	6.683
Parameter decrease (MB)	24.355	6.611	52.656	22.850	20.463	-
Parameter reduction rate	78.5%	49.7%	88.7%	77.4%	75.4%	-
GFLOPs	218.969	13.812	88.954	90.898	30.922	7.081
GFLOPs decrease	211.888	6.731	81.873	83.817	23.841	-
Speed improvement rate (GFLOPs reduction)	96.8%	48.7%	92.0%	92.2%	77.1%	-
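
The rate rows in Table 2 appear to be plain relative reductions, (baseline - ours) / baseline, applied to the parameter sizes and to the GFLOPs. The sketch below recomputes them on that assumption; the function name `reduction_rate` is illustrative, and the inputs are the table's own values.

```python
# Parameter sizes (MB) and GFLOPs of the baselines, copied from Table 2.
PARAMS_MB = {"UNet": 31.038, "BiSeNetV2": 13.294, "DeeplabV3+": 59.339,
             "HRNet": 29.533, "SwinUNet": 27.146}
GFLOPS = {"UNet": 218.969, "BiSeNetV2": 13.812, "DeeplabV3+": 88.954,
          "HRNet": 90.898, "SwinUNet": 30.922}
OURS_PARAMS_MB = 6.683
OURS_GFLOPS = 7.081

def reduction_rate(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline

param_rates = {m: round(reduction_rate(v, OURS_PARAMS_MB), 1) for m, v in PARAMS_MB.items()}
gflops_rates = {m: round(reduction_rate(v, OURS_GFLOPS), 1) for m, v in GFLOPS.items()}

for model in PARAMS_MB:
    print(f"{model:11s} params -{param_rates[model]}%  GFLOPs -{gflops_rates[model]}%")
```

Under this reading, the DeeplabV3+ GFLOPs reduction works out to about 92%, in line with HRNet. Note that a GFLOPs reduction measures computational cost and is only a proxy for wall-clock inference speed.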

Table 3. Ablation experiment results (%).

Configuration	IoU	Recall	F1-Score	Precision
Remove CR module	75.67	86.26	84.22	81.27
Remove AFQ and CR modules	72.47	78.49	81.36	85.19
Remove SMP, AFQ, CR modules	72.36	82.12	81.31	80.70
AFQSeg (ours)	75.93	88.49	84.45	82.81
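
One way to read Table 3 is as a cumulative build-up, from the configuration with all three modules removed to the full model. The sketch below computes the change in each metric at every step. The stage labels are an interpretation of the row names (for example, "Remove AFQ and CR modules" is taken to mean that only SMP is present), not terminology from the paper.

```python
METRICS = ("IoU", "Recall", "F1-score", "Precision")

# Table 3 rows in percent, reordered from fewest to most modules.
STAGES = [
    ("no modules",       (72.36, 82.12, 81.31, 80.70)),  # Remove SMP, AFQ, CR modules
    ("+ SMP",            (72.47, 78.49, 81.36, 85.19)),  # Remove AFQ and CR modules
    ("+ SMP + AFQ",      (75.67, 86.26, 84.22, 81.27)),  # Remove CR module
    ("+ SMP + AFQ + CR", (75.93, 88.49, 84.45, 82.81)),  # AFQSeg (ours)
]

def stepwise_gains(stages):
    """Change in each metric when moving from one configuration to the next."""
    gains = []
    for (_, prev), (label, cur) in zip(stages, stages[1:]):
        gains.append((label, tuple(round(c - p, 2) for p, c in zip(prev, cur))))
    return gains

gains = stepwise_gains(STAGES)
for label, delta in gains:
    print(label, dict(zip(METRICS, delta)))
```

Read this way, IoU and F1-score rise at every step, with the largest jump when AFQ is added, whereas Recall and Precision move in opposite directions when SMP is added on its own.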

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
