SDCrackSeg: A Frequency- and Spatial Geometry-Aware Topology-Preserving Network for Building Crack Segmentation

Huang, Zepeng; Liu, Liuyang; He, Tao; Ma, Ye; Shan, Jinhuan

doi:10.3390/buildings16050971

Zepeng Huang ¹, Liuyang Liu ²,*, Tao He ², Ye Ma ² and Jinhuan Shan ³

¹ School of Highway, Chang’an University, Xi’an 710064, China
² China Academy of Transportation Sciences, Beijing 100029, China
³ School of Data Science and Artificial Intelligence, Chang’an University, Xi’an 710064, China
* Author to whom correspondence should be addressed.

Buildings 2026, 16(5), 971; https://doi.org/10.3390/buildings16050971

Submission received: 30 December 2025 / Revised: 3 February 2026 / Accepted: 4 February 2026 / Published: 2 March 2026

(This article belongs to the Special Issue AI in Construction: Automation, Optimization, and Safety)

Abstract

Crack segmentation on building surfaces is challenging due to the thin, curvilinear crack morphology and background interference from textures and illumination variations. This study proposes SDCrackSeg, a U-shaped network combining frequency-domain enhancement with geometry-adaptive convolution. The core Frequency Spatial Convolution module integrates two branches: Adaptive Frequency Convolution enhances high-frequency crack details, while Dynamic Snake Convolution adapts sampling to curvilinear structures. A topology-aware loss based on persistent homology further regularizes structural connectivity. Experiments on CHCrack5K demonstrate state-of-the-art performance with Precision 0.900, mIoU 0.816, F1-score 0.888, and Dice 0.675, while maintaining nearly 200 FPS inference speed. Results confirm that frequency–spatial fusion with topology regularization effectively improves crack detection reliability for practical building inspection.

Keywords:

building crack segmentation; structural health monitoring; Dynamic Snake Convolution; Adaptive Frequency Convolution; frequency–spatial fusion; topology-aware loss; persistent homology

1. Introduction

Cracks are among the most common and critical forms of damage in building and civil infrastructure systems [1]. They frequently occur during the service life of structures as a consequence of material aging, environmental effects, construction defects, or excessive loading conditions. As visible indicators of internal deterioration, cracks may significantly compromise structural safety, durability, and serviceability if left untreated [2,3,4]. Consequently, accurate and timely crack detection plays a vital role in structural health monitoring, condition assessment, and maintenance decision-making [5].

Traditional crack detection methods rely on manual visual inspection, ultrasonic testing, ground-penetrating radar, or infrared thermography [6,7,8]. While manual inspection remains widely adopted due to its simplicity, it is labor-intensive, time-consuming, and highly dependent on inspector expertise, leading to inconsistent results [9,10]. Contact-based techniques such as crack gauges provide precise localized measurements but require prior knowledge of crack locations [11]. Non-contact non-destructive testing (NDT) methods offer improved detection capability but involve specialized equipment and high operational costs, restricting their deployment for routine large-scale inspections [12]. With the advancement of computer vision, vision-based approaches using handcrafted features such as edge detection, morphological operations, and texture analysis have emerged as cost-effective alternatives [13,14,15,16,17,18]. However, their robustness is limited under the varying illumination, surface texture, and background interference commonly encountered in real-world environments. Classical machine learning methods that combine histogram of oriented gradients (HOG), local binary pattern (LBP), and Gabor features with classifiers such as support vector machines (SVMs) and random forests improve robustness to some extent [19,20,21,22,23,24,25], yet still rely on manually designed representations that may not generalize well across diverse crack patterns and imaging conditions.

Deep learning, particularly convolutional neural networks, has significantly advanced crack detection by automatically learning hierarchical feature representations from raw images through end-to-end training [26]. Existing CNN-based methods can be categorized into three frameworks. Classification-based approaches divide images into patches and assign binary labels [27], but cannot provide pixel-accurate delineation essential for quantitative crack analysis such as width measurement. Object detection frameworks predict bounding boxes for efficient localization [28,29], yet fail to capture the fine-grained, curvilinear structure of cracks that span large areas with irregular shapes. Segmentation-based methods generate dense pixel-wise predictions and have been widely adopted for crack analysis [30,31]. Representative works include Cao et al.’s VGG16-based encoder–decoder network for autonomous crack detection [32], Wu et al.’s enhanced DeepLabV3 framework with Dice loss for fine crack segmentation [33], and Choi and Cha’s SDDNet employing densely connected separable convolutions for real-time segmentation [34]. Recent studies have explored generative AI-based approaches such as diffusion models for crack detection and data augmentation [35]. Although the above segmentation-based methods offer pixel-wise predictions, they still rely on fixed-grid convolutions [36] and region-based losses that are not specifically designed for the curvilinear geometry and topological connectivity requirements of crack structures. In terms of feature extraction, standard convolutions sample features at fixed spatial intervals, which cannot adapt to the curvilinear and tortuous geometry of cracks, leading to broken skeletons at turning points and incomplete recovery of thin branches. 
Regarding training objectives, cross-entropy and Dice losses penalize pixel-wise or area-wise mismatch but do not explicitly enforce topological constraints, allowing fragmented predictions to achieve acceptable overlap scores while exhibiting severe structural discontinuities at crack junctions. From a representation perspective, existing methods operate exclusively in the spatial domain, treating all frequency components equally, which limits their ability to enhance crack-related high-frequency details while suppressing low-frequency background interference.

These challenges arise from the unique characteristics of crack structures. Cracks typically exhibit distinctive geometric characteristics that pose significant challenges for standard convolution [37]. Their widths range from sub-pixel hairline fractures to multi-pixel structural cracks, and their trajectories are often tortuous with curvature radii varying from sharp corners to gentle bends. Branching patterns frequently form Y-shaped and T-shaped junctions, while discontinuous segments may be interrupted by occlusions or surface wear. Practical imaging conditions further complicate detection. Uneven illumination casts shadows that mimic crack appearance or obscures genuine cracks in dark regions. Surface stains, weathering marks, and repair patches create false edges difficult to distinguish from real cracks. Textured backgrounds including exposed aggregate, mortar joints, and formwork imprints generate high-frequency patterns easily confused with fine cracks [38]. The key problem that remains unsolved is how to simultaneously achieve geometric adaptivity for curvilinear crack structures and topological consistency for connected crack networks. This study aims to bridge this gap by proposing a unified framework that integrates geometry-aware convolution with topology-preserving regularization.

The main contributions of this paper are as follows. First, SDCrackSeg is proposed as a crack-oriented segmentation network that integrates geometry-aware, frequency-aware, and topology-aware components. Second, building on Dynamic Snake Convolution (DSConv) [39], a shape-adaptive convolution is employed to deform sampling locations along curvilinear crack trajectories. Third, Frequency Spatial Convolution (FSConv) is developed to fuse frequency-domain enhancement from Adaptive Frequency Convolution (AFConv) [40] with the spatial geometric adaptivity of DSConv. Fourth, a topology-aware loss based on persistent homology [41] is incorporated to regularize structural connectivity and reduce fragmentation. The overall architecture of SDCrackSeg is illustrated in Figure 1, which shows how the proposed components are integrated within the encoder–decoder framework.

2. Methodology

2.1. Overall Network Architecture

The proposed SDCrackSeg is tailored for crack segmentation with thin, elongated, and topology-sensitive structures. As depicted in Figure 1, the model employs an encoder–decoder framework while incorporating specialized geometry and frequency modules to enhance local detail extraction and global structural consistency.

The encoder progressively learns multi-scale representations through convolution and downsampling. To alleviate the limitations of fixed-grid convolutions, each stage employs a hybrid feature extraction scheme that integrates spatial deformation modeling and frequency-domain enhancement, improving robustness to complex crack morphologies and background interference.

The decoder mirrors the encoder and reconstructs high-resolution predictions through upsampling and feature fusion. Skip connections are used to recover fine details, and hybrid modules further refine boundaries and improve the recovery of thin or low-contrast branches.

In addition, a topology-aware supervision strategy is incorporated to encourage continuous, connected crack masks and to reduce fragmentation. Details of DSConv, AFConv, FSConv, and the topology-aware loss are presented in the following subsections.

2.2. Dynamic Snake Convolution (DSConv)

Crack patterns on building surfaces typically exhibit slender, curved, discontinuous, and highly irregular geometries. Conventional convolutions use fixed-grid sampling and cannot align their receptive fields with such anisotropic structures. Deformable convolutions relax the grid, but they learn each sampling offset independently and have no mechanism for maintaining continuity along elongated structures.

To overcome these limitations, a shape-adaptive convolutional operator inspired by Dynamic Snake Convolution (DSConv) [39] is introduced; its sampling locations form a smooth, snake-like trajectory that follows the geometry of line-shaped structures. DSConv was originally proposed for tubular structure segmentation, with roads in remote sensing imagery among its target applications. Because roads and cracks are both elongated, curvilinear, and topology-sensitive, DSConv is well suited to adapting convolutional sampling to crack morphology. DSConv consists of three tightly coupled components:

1. Offset branch: learns $2K$ displacement values for $K$ sampling points.
2. DSC module: converts the offsets into a continuous deformable sampling curve.
3. Directional convolution kernel: a $K \times 1$ (horizontal) or $1 \times K$ (vertical) kernel that performs convolution along the snake trajectory.

This design enables DSConv to dynamically align the receptive field with crack geometry.

Given an input feature map $f \in \mathbb{R}^{B \times C \times H \times W}$, DSConv first predicts spatial offsets through a convolutional layer and constrains them using batch normalization and a hyperbolic tangent activation:

$$\Delta = \tanh\left(\mathrm{BN}\left(\mathrm{Conv}_{3 \times 3}(f)\right)\right), \qquad (1)$$

where $\Delta \in \mathbb{R}^{B \times 2K \times H \times W}$ and each offset is constrained to the range $[-1, 1]$.

Unlike conventional deformable convolutions that apply offsets independently, DSConv employs an iterative offset propagation mechanism that ensures continuity in the sampling path, as illustrated in Figure 2a. The center position remains fixed while offsets propagate iteratively from the center to both ends, forming a continuous snake-like trajectory. The final sampling coordinates are computed as:

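$$p_{\mathrm{final}} = p_{\mathrm{center}} + p_{\mathrm{grid}} + \alpha \cdot p_{\mathrm{offset}}, \qquad (2)$$

where $p_{\mathrm{center}}$ is the kernel center, $p_{\mathrm{grid}}$ is the regular displacement of each kernel point from the center, $p_{\mathrm{offset}}$ is the offset accumulated along the trajectory, and $\alpha$ is a hyperparameter controlling the deformation magnitude (typically set to 1).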
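The iterative propagation behind Equation (2) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it follows a single horizontal snake kernel with an odd number of points $K$, assumes the deformation acts perpendicular to the kernel axis, and the function name `snake_coordinates` is hypothetical.

```python
import numpy as np

def snake_coordinates(center_xy, raw_offsets, alpha=1.0):
    """Sampling coordinates of one horizontal snake kernel (illustrative sketch).

    center_xy   : (x, y) position of the kernel centre on the feature map.
    raw_offsets : K unconstrained offsets (K odd); the centre entry is ignored.
    alpha       : deformation magnitude, the alpha of Equation (2).
    Returns an array of shape (K, 2) holding (x, y) for each sampling point.
    """
    raw_offsets = np.asarray(raw_offsets, dtype=float)
    k = raw_offsets.shape[0]
    c = k // 2
    delta = np.tanh(raw_offsets)          # each step constrained to [-1, 1], as in Eq. (1)
    cum = np.zeros(k)                     # the centre point stays fixed (offset 0)
    for i in range(c + 1, k):             # propagate from the centre to the right end
        cum[i] = cum[i - 1] + delta[i]
    for i in range(c - 1, -1, -1):        # propagate from the centre to the left end
        cum[i] = cum[i + 1] + delta[i]
    x = center_xy[0] + (np.arange(k) - c)     # p_center + p_grid along the kernel axis
    y = center_xy[1] + alpha * cum            # + alpha * accumulated perpendicular offset
    return np.stack([x, y], axis=1)
```

Because tanh bounds every step below one pixel, neighbouring sampling points can never drift a full pixel apart perpendicular to the kernel axis, which is what keeps the trajectory continuous rather than scattered as in independent deformable offsets.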

Features are extracted via bilinear interpolation at the deformed coordinates, as shown in Figure 2b, and a directional convolution ($K \times 1$ for horizontal or $1 \times K$ for vertical orientation) is applied along the snake trajectory. The complete mathematical derivations, including the iterative offset propagation formulas, coordinate grid generation, and bilinear interpolation details, are provided in Appendix A.
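Bilinear interpolation is what makes fractional snake coordinates usable. The sketch below, built around a hypothetical helper `bilinear_sample`, samples a single-channel map at arbitrary $(x, y)$ positions; clamping out-of-range points to the border is an assumption, and other padding conventions are possible.

```python
import numpy as np

def bilinear_sample(feature, xs, ys):
    """Sample a 2-D feature map of shape (H, W) at fractional (x, y) positions.

    Points outside the map are clamped to the border (an assumption of this sketch).
    """
    h, w = feature.shape
    xs = np.clip(np.asarray(xs, dtype=float), 0, w - 1)
    ys = np.clip(np.asarray(ys, dtype=float), 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = xs - x0      # fractional distance from the left neighbour
    wy = ys - y0      # fractional distance from the top neighbour
    top = (1 - wx) * feature[y0, x0] + wx * feature[y0, x1]
    bottom = (1 - wx) * feature[y1, x0] + wx * feature[y1, x1]
    return (1 - wy) * top + wy * bottom
```

In a full DSConv layer this sampling would be applied per channel at the $K$ snake coordinates, after which the $K$ samples are combined by the directional kernel weights, much like an ordinary $K \times 1$ or $1 \times K$ convolution over a straight line of pixels.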

2.3. Adaptive Frequency Convolution (AFConv)

Cracks appear as thin, high-frequency structures, whereas background variation caused by illumination and shading is dominated by low-frequency components. Standard convolution has no explicit mechanism for weighting frequency bands differently, which limits its ability to enhance crack details while suppressing background interference. To complement DSConv’s geometry-adaptive modeling, we introduce the AFConv module [40] to enhance crack-sensitive high-frequency cues, as illustrated in Figure 3. Given input features $X \in \mathbb{R}^{B \times C \times H \times W}$, AFConv operates as follows:

(1) Adaptive routing: the input is projected to an expanded channel space $X'$, and input-dependent routing coefficients are computed via global average pooling and an MLP:

$$R = \mathrm{Softmax}\left(f_{\mathrm{mlp}}\left(\mathrm{GAP}(X')\right)\right), \quad R \in \mathbb{R}^{B \times F \times C'}, \qquad (3)$$

where $F$ is the number of learnable frequency filters $\{K_f\}_{f=1}^{F}$.

(2) Frequency modulation: the adaptive spectral weight $W = \sum_{f=1}^{F} R_f \cdot K_f$ is applied in the frequency domain:

$$\tilde{X} = \mathcal{F}(X') \odot W \odot g\left(\mathcal{F}(X')\right), \qquad (4)$$

where $\mathcal{F}(\cdot)$ denotes the fast Fourier transform (FFT), $\odot$ denotes element-wise multiplication, and $g(\cdot)$ models inter-frequency dependencies.

(3) Output: the inverse FFT transforms the features back to the spatial domain, $\hat{X} = \mathcal{F}^{-1}(\tilde{X})$.
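A single-sample NumPy sketch of the three steps above is given below. It is a simplified reading of the text rather than the AFConv implementation of [40]: the one-layer stand-in for $f_{\mathrm{mlp}}$, the real-valued filter layout, taking the softmax over the $F$ filters, and the default $g(\cdot) = 1$ are all assumptions, and `afconv_spectral` is a hypothetical name.

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def afconv_spectral(x, filters, mlp_weight, g=None):
    """Single-sample sketch of Equations (3)-(4) followed by the inverse FFT.

    x          : (C, H, W) features, assumed to be the projected input X'.
    filters    : (F, C, H, W // 2 + 1) real spectral filters K_f (hypothetical
                 layout matching the output of numpy's rfft2).
    mlp_weight : (F * C, C) weight of a one-layer stand-in for f_mlp.
    g          : optional callable for inter-frequency dependencies; None means g = 1.
    """
    c, h, w = x.shape
    n_filters = filters.shape[0]
    pooled = x.mean(axis=(1, 2))                          # GAP(X'), shape (C,)
    logits = (mlp_weight @ pooled).reshape(n_filters, c)
    r = softmax(logits, axis=0)                           # routing R, normalised over filters
    weight = np.einsum("fc,fchw->chw", r, filters)        # W = sum_f R_f * K_f
    spec = np.fft.rfft2(x, axes=(1, 2))                   # F(X')
    gate = np.ones_like(spec) if g is None else g(spec)   # g(F(X'))
    return np.fft.irfft2(spec * weight * gate, s=(h, w), axes=(1, 2))  # inverse FFT
```

With all filters set to one, the routed weight equals one everywhere and the block reduces to an identity FFT round trip. Zeroing each filter's DC coefficient instead removes every channel's mean, a simple illustration of how spectral weighting can suppress the lowest-frequency background component.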

As illustrated in Figure 1b, the proposed FSConv module combines the AFConv and DSConv branches through residual fusion. The DSConv branch applies two orthogonal snake convolutions (horizontal and vertical) to capture crack patterns in different orientations. The branch outputs are concatenated and projected through a $1 \times 1$ convolution followed by batch normalization:

$$Y = X + \phi\left(\left[\mathrm{AFConv}(X), \mathrm{DSConv}(X)\right]\right), \qquad (5)$$

where $[\cdot, \cdot]$ denotes channel-wise concatenation and $\phi(\cdot)$ is the $1 \times 1$ convolution followed by batch normalization. The residual connection supports stable gradient flow, while batch normalization balances the feature magnitudes of the two heterogeneous branches so that neither dominates the fused representation. This design enhances high-frequency crack details while preserving curvilinear geometry.
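Equation (5) can be sketched in NumPy with the two branch outputs treated as given. The $1 \times 1$ convolution is written as per-pixel channel mixing, batch normalization omits its learnable scale and shift, and `fsconv_fuse` is a hypothetical name; these are simplifications for illustration only.

```python
import numpy as np

def batch_norm(z, eps=1e-5):
    """Per-channel normalisation over (N, H, W); learnable affine terms omitted."""
    mean = z.mean(axis=(0, 2, 3), keepdims=True)
    var = z.var(axis=(0, 2, 3), keepdims=True)
    return (z - mean) / np.sqrt(var + eps)

def fsconv_fuse(x, af_out, ds_out, proj):
    """Y = X + phi([AFConv(X), DSConv(X)]) with phi = 1x1 conv + BN (Equation 5).

    x, af_out, ds_out : (N, C, H, W) arrays; branch outputs are assumed precomputed
                        with C channels each so the residual sum is well defined.
    proj              : (C, 2C) weight of the 1x1 convolution (bias omitted).
    """
    cat = np.concatenate([af_out, ds_out], axis=1)   # [., .]: shape (N, 2C, H, W)
    mixed = np.einsum("oc,nchw->nohw", proj, cat)    # 1x1 conv = per-pixel channel mixing
    return x + batch_norm(mixed)                     # residual connection
```

Normalizing the projected features before the residual sum gives each fused channel zero mean and unit variance over the batch, which is one way to keep either branch from dominating the update to $X$.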

2.4. Topology-Aware Loss

Pixel- and region-based losses, such as cross-entropy and Dice loss, penalize per-pixel or area mismatch but not structural discontinuities, and therefore often yield fragmented masks. To preserve crack connectivity, a topology-aware loss based on persistent homology [41] is introduced, as illustrated in Figure 4.

Given a predicted probability map $P \in [0, 1]^{H \times W}$ and a ground-truth mask $M \in \{0, 1\}^{H \times W}$, we first binarize the prediction as $\hat{M} = \mathbb{I}(P > \tau)$. The topology loss then uses the Hausdorff distance to compare the persistence diagrams of the two masks, which capture connected components and loops:

$$L_{\mathrm{Topo}} = \lambda_0 \, d_H\left(\mathcal{P}_0(\hat{M}), \mathcal{P}_0(M)\right) + \lambda_1 \, d_H\left(\mathcal{P}_1(\hat{M}), \mathcal{P}_1(M)\right), \qquad (6)$$

where $\mathcal{P}_0$ and $\mathcal{P}_1$ denote the 0-dimensional (connected components) and 1-dimensional (loops) persistence diagrams, $d_H$ is the Hausdorff distance, and $\lambda_0$, $\lambda_1$ balance the two terms. In practice, we set $\lambda_0 = \lambda_1 = 1$ to weight components and loops equally, and use $w_{tc}$ to control the overall strength of topology regularization. Minimizing $L_{\mathrm{Topo}}$ encourages continuous crack masks with reduced fragmentation.
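The 0-dimensional term of Equation (6) can be illustrated with a union-find pass over a superlevel filtration, followed by the Hausdorff distance between two diagrams. This is a minimal sketch under assumptions, not the authors' code. It covers only connected components, since the 1-dimensional loop term would need a cubical-complex implementation such as GUDHI's `CubicalComplex`. It also uses the $L_\infty$ ground metric and closes the oldest component at the map minimum. The helper names `persistence_0d` and `hausdorff` are hypothetical.

```python
import numpy as np

def persistence_0d(values):
    """0-dimensional persistence diagram of a 2-D map under a superlevel filtration.

    Pixels enter in decreasing order of value and are merged with their already
    present 4-neighbours via union-find. By the elder rule, when two components
    meet, the younger one (born at the lower value) dies at the current value.
    The oldest component is closed at the map minimum so every point is finite,
    and zero-persistence pairs (birth == death) are dropped.
    Returns an array of shape (n, 2) with rows (birth, death).
    """
    values = np.asarray(values, dtype=float)
    h, w = values.shape
    parent, birth, pairs = {}, {}, []

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    order = sorted(((values[i, j], (i, j)) for i in range(h) for j in range(w)), reverse=True)
    for v, p in order:
        parent[p], birth[p] = p, v         # the new pixel starts its own component
        i, j = p
        for q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if q not in parent:            # neighbour not yet in the filtration
                continue
            rp, rq = find(p), find(q)
            if rp == rq:
                continue
            young, old = (rp, rq) if birth[rp] < birth[rq] else (rq, rp)
            pairs.append((birth[young], v))   # younger component dies at value v
            parent[young] = old
    pairs.append((birth[find(order[0][1])], values.min()))
    finite = [pt for pt in pairs if pt[0] > pt[1]]
    return np.array(finite, dtype=float).reshape(-1, 2)

def hausdorff(a, b):
    """Hausdorff distance between two non-empty diagrams, L-infinity ground metric."""
    d = np.abs(a[:, None, :] - b[None, :, :]).max(axis=2)  # pairwise distances
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

For the one-row map [0.9, 0.2, 0.7], the two peaks are born at 0.9 and 0.7, the younger one dies when the valley at 0.2 joins them, and the diagram is {(0.9, 0.2), (0.7, 0.2)}. This NumPy sketch only evaluates the quantity; it does not provide gradients for training.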

2.5. Overall Training Objective

The total loss combines pixel-wise supervision with topology regularization:

$$L_{\mathrm{total}} = L_{\mathrm{CE}} + w_{\mathrm{dice}} L_{\mathrm{Dice}} + 0.5 \, L_{\mathrm{aux}} + w_{tc} L_{\mathrm{Topo}}, \qquad (7)$$

where $L_{\mathrm{CE}}$ is the cross-entropy loss, $L_{\mathrm{Dice}}$ is the Dice loss that handles class imbalance, $L_{\mathrm{aux}}$ supervises the auxiliary head, and $L_{\mathrm{Topo}}$ enforces topological consistency.
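Equation (7) can be sketched in NumPy for a single crack-probability map. Binary cross-entropy on that map stands in for two-class cross-entropy, the soft Dice form is one common choice, $w_{\mathrm{dice}} = 1$ is a placeholder because its value is not stated here, and $L_{\mathrm{aux}}$ and $L_{\mathrm{Topo}}$ are passed in as precomputed numbers. All function names are hypothetical.

```python
import numpy as np

def cross_entropy(prob, target, eps=1e-7):
    """Binary pixel-wise cross-entropy between crack probabilities and a {0, 1} mask."""
    p = np.clip(prob, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def soft_dice_loss(prob, target, eps=1e-6):
    """1 - soft Dice on the crack class; eps avoids division by zero on empty masks."""
    inter = np.sum(prob * target)
    return float(1 - (2 * inter + eps) / (np.sum(prob) + np.sum(target) + eps))

def total_loss(prob, target, l_aux=0.0, l_topo=0.0, w_dice=1.0, w_tc=0.1):
    """Equation (7); w_dice = 1.0 is a placeholder, w_tc = 0.1 follows Section 3.2."""
    return (cross_entropy(prob, target) + w_dice * soft_dice_loss(prob, target)
            + 0.5 * l_aux + w_tc * l_topo)
```

A perfect prediction drives both pixel terms to (nearly) zero, so any remaining loss comes from the weighted auxiliary and topology terms; a uniform 0.5 prediction costs about $\ln 2 + 0.5$ on a mask with equal crack and background pixels.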

3. Experiments and Evaluation Metrics

3.1. Datasets

CHCrack5K is a large-scale benchmark dataset designed for building wall crack detection and segmentation [42]. It was constructed by integrating 11 publicly available crack datasets that mainly capture surface cracks on common building materials, such as concrete and mortar walls, as shown in Figure 5. To reduce the domain shift caused by heterogeneous data sources and to enable a fair comparison across methods, all images and corresponding annotations were standardized to a unified resolution of $480 \times 480$ pixels through a careful preprocessing pipeline.

Specifically, images with non-square aspect ratios or missing borders were processed using padding so that the original spatial structure and crack geometry were preserved. For datasets that provided higher-resolution patches, cropping and resizing were applied to match the target size while retaining representative crack patterns. For sources with diverse image scales, padding and resizing were combined to achieve consistent spatial dimensions and to minimize distortion of thin crack morphology. Through these procedures, CHCrack5K forms a unified yet diverse and challenging benchmark, supporting robust evaluation of building crack segmentation models under variations in illumination, background texture, and crack width.

3.2. Experimental Setup

All experiments were implemented in PyTorch (version 2.1.0) using a unified training pipeline for semantic segmentation. Model training was performed on a single NVIDIA vGPU equipped with 48 GB memory.

The proposed method was trained for 200 epochs with a batch size of 8. Stochastic gradient descent was used as the optimizer, with an initial learning rate of 0.01, a momentum of 0.9, and a weight decay of $1 \times 10^{-4}$. A polynomial learning-rate decay strategy was adopted throughout training, defined as

$$lr = lr_0 \left(1 - \frac{e}{E}\right)^{0.9}, \qquad (8)$$

where $lr_0$ denotes the initial learning rate, $e$ is the current epoch index, and $E$ is the total number of training epochs.
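Equation (8) with the settings above ($lr_0 = 0.01$, $E = 200$, power 0.9) is a one-line function; the name `poly_lr` is hypothetical.

```python
def poly_lr(epoch, total_epochs=200, base_lr=0.01, power=0.9):
    """Polynomial decay of Equation (8): lr = lr0 * (1 - e / E) ** power."""
    return base_lr * (1 - epoch / total_epochs) ** power

# Print the learning rate at a few points of the 200-epoch schedule.
for e in (0, 50, 100, 150, 199):
    print(e, round(poly_lr(e), 6))
```

In PyTorch, the same per-epoch schedule could be attached to the SGD optimizer through `torch.optim.lr_scheduler.LambdaLR` with the multiplier `lambda e: (1 - e / 200) ** 0.9`, stepping the scheduler once per epoch to match Equation (8).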

For data preprocessing and augmentation, each input image was first resized using a base size of 520 and then randomly cropped to a resolution of $480 \times 480$. Random horizontal flipping was further applied with a probability of 0.5 to increase variability in crack orientation. The dataloader used four worker processes. During validation, the batch size was set to 1 to ensure stable evaluation and to avoid memory-related constraints.
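The augmentation pipeline can be sketched as follows. The exact resize rule behind "a base size of 520" is not specified, so this sketch assumes the shorter side is scaled to 520 before a random $480 \times 480$ crop. Nearest-neighbour resizing is used for both arrays only to keep the example dependency-free; images would normally use bilinear interpolation and masks nearest-neighbour. All names are hypothetical.

```python
import numpy as np

def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = arr.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return arr[rows][:, cols]

def train_transform(image, mask, rng, base_size=520, crop_size=480):
    """Scale the short side to base_size, random-crop, then random horizontal flip.

    The same crop window and flip decision are applied to image and mask so the
    annotation stays aligned with the pixels.
    """
    h, w = mask.shape
    scale = base_size / min(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    image = resize_nearest(image, new_h, new_w)
    mask = resize_nearest(mask, new_h, new_w)
    top = rng.integers(0, new_h - crop_size + 1)
    left = rng.integers(0, new_w - crop_size + 1)
    image = image[top:top + crop_size, left:left + crop_size]
    mask = mask[top:top + crop_size, left:left + crop_size]
    if rng.random() < 0.5:                # horizontal flip with probability 0.5
        image, mask = image[:, ::-1], mask[:, ::-1]
    return image, mask
```

The key design point is that image and mask share one crop window and one flip decision; sampling them independently would silently misalign thin crack annotations by several pixels.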

The training objective was composed of a standard pixel-wise cross-entropy loss and optional regularization terms. A Dice loss term was included by default to better handle the class imbalance between crack and background pixels. In addition, an optional topology continuity constraint loss was incorporated to regularize structural connectivity. We set $\lambda_0 = \lambda_1 = 1$ in $L_{\mathrm{Topo}}$ and used $w_{tc} = 0.1$ to control its overall contribution; these values were finalized after multiple rounds of tuning on the validation set. When the topology term was enabled, prediction maps were binarized using a threshold of $\tau = 0.5$.

3.3. Evaluation Metrics

This study evaluates crack segmentation performance using five commonly adopted metrics: Precision, Recall, F1-score, Dice, and mIoU. Let $TP$, $FP$, and $FN$ denote the numbers of true-positive, false-positive, and false-negative pixels for the crack (foreground) class, respectively. Specifically, $TP$ counts crack pixels correctly predicted as cracks, $FP$ counts background pixels incorrectly predicted as cracks, and $FN$ counts crack pixels missed by the model.

3.3.1. Precision and Recall

Precision measures the proportion of correctly predicted crack pixels among all pixels predicted as cracks, while Recall measures the proportion of correctly predicted crack pixels among all crack pixels in the ground truth. They are defined as

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (9)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (10)$$

3.3.2. F1-Score

F1-score provides a balanced summary of Precision and Recall, and is defined as

$$\text{F1-score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (11)$$

3.3.3. Dice Coefficient

Dice evaluates the similarity between the predicted crack region and the ground truth. A higher Dice value (ranging from 0 to 1) indicates better segmentation quality:

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN} \qquad (12)$$

3.3.4. Mean Intersection over Union (mIoU)

The Intersection over Union (IoU) measures the overlap between prediction and ground truth relative to their union. For binary crack segmentation, mIoU is equivalent to the IoU of the crack class:

$$\mathrm{mIoU} = \frac{TP}{TP + FP + FN} \qquad (13)$$
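Under the pooled pixel-count definitions of Equations (9) to (13), F1-score and Dice reduce to the same expression $2TP/(2TP + FP + FN)$, and IoU equals $\mathrm{Dice}/(2 - \mathrm{Dice})$; the sketch below makes both identities explicit. It assumes counts are pooled over the whole evaluation set, which is an assumption: per-image averaging or averaging over classes would generally give different numbers. The helper names are hypothetical.

```python
import numpy as np

def confusion_counts(pred, gt):
    """Pixel counts (TP, FP, FN) for the crack class from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = int(np.sum(pred & gt))     # crack pixels predicted as crack
    fp = int(np.sum(pred & ~gt))    # background pixels predicted as crack
    fn = int(np.sum(~pred & gt))    # crack pixels missed
    return tp, fp, fn

def crack_metrics(tp, fp, fn):
    """Equations (9)-(13) evaluated on pooled crack-class counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "dice": dice, "iou": iou}
```

For example, with $TP = 80$, $FP = 10$ and $FN = 20$, both F1-score and Dice equal $160/190 \approx 0.842$, and IoU equals $80/110 \approx 0.727$.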

4. Results and Discussion

4.1. Quantitative Comparison and Discussion

Table 1 reports the quantitative performance of representative segmentation models on the crack segmentation task, including U-Net [31], FCN variants [30], DeepLabv3 variants [43], SegNet [44], and the proposed SDCrackSeg. Overall, SDCrackSeg achieves the best comprehensive performance across the key metrics, with a Precision of 0.900, an mIoU of 0.816, an F1-score of 0.888, and a Dice coefficient of 0.675. These results suggest that SDCrackSeg not only improves the agreement between predictions and ground-truth crack masks as reflected by overlap-based metrics, but also enhances the reliability of crack identification by effectively suppressing false positives in cluttered building-surface backgrounds, as reflected by the highest Precision.

A direct comparison with the strongest overall baseline in this setting highlights the practical improvements brought by SDCrackSeg. Relative to SegNet, SDCrackSeg increases Precision from 0.856 to 0.900, which indicates a marked reduction in false alarms caused by stains, rough textures, and crack-like background patterns. At the same time, mIoU increases from 0.777 to 0.816 and Dice increases from 0.597 to 0.675. The gains in overlap-based metrics are particularly relevant for crack segmentation because cracks often occupy a small fraction of pixels and exhibit thin, elongated shapes. In such cases, even small local discontinuities, missing branches, or boundary offsets can substantially degrade topology and connectivity, while region-level averages may appear only moderately affected. The consistent improvement across mIoU and Dice therefore suggests that SDCrackSeg recovers crack regions more completely and with higher boundary fidelity, which is aligned with the objective of preserving thin branches and maintaining continuity.
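The gains over SegNet quoted above can be checked with simple arithmetic; the sketch uses only the Table 1 values cited in this paragraph, and the helper name `gains` is hypothetical.

```python
# Values for SegNet and SDCrackSeg as quoted in the paragraph above.
segnet = {"precision": 0.856, "miou": 0.777, "dice": 0.597}
sdcrackseg = {"precision": 0.900, "miou": 0.816, "dice": 0.675}

def gains(base, new):
    """Absolute and relative (%) gain of each metric over the baseline."""
    return {k: (new[k] - base[k], 100 * (new[k] - base[k]) / base[k]) for k in base}

for name, (abs_gain, rel_gain) in gains(segnet, sdcrackseg).items():
    print(f"{name}: +{abs_gain:.3f} ({rel_gain:.1f}% relative)")
```

This gives relative gains of about 5.1% in Precision, 5.0% in mIoU, and 13.1% in Dice, so the largest relative improvement is in the overlap-based Dice coefficient.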

The Precision and Recall patterns across methods provide additional insights into different model behaviors. DeepLabv3 variants achieve the highest Recall values, reaching 0.896 and 0.905, implying that multi-scale context modeling helps reduce missed detections in faint or partially occluded cracks. However, this improved sensitivity is accompanied by lower Precision, with values of 0.805 and 0.795, indicating that context aggregation and strong semantic activation can also amplify crack-like background structures. This trade-off is common in building surface scenes where shadows, mortar lines, texture edges, and stains resemble crack trajectories. In contrast, SDCrackSeg maintains a competitive Recall of 0.878 while achieving substantially higher Precision. This balance indicates that the proposed feature design improves discriminability between true cracks and confusing background patterns, which is crucial in engineering practice because excessive false positives increase manual verification workload and may lead to overestimation of damage severity.

For classical encoder–decoder baselines, the results show stable yet limited performance. U-Net achieves a relatively high Recall of 0.887, but its mIoU and Dice remain lower than those of SDCrackSeg. This suggests that U-Net can detect many crack pixels but may produce masks with imprecise boundaries, local fragmentation, or incomplete branching, which reduces overlap consistency. The FCN variants exhibit comparable Precision and Recall, but their mIoU and Dice are noticeably lower, reflecting the difficulty of accurately recovering thin, curvilinear crack structures using coarse upsampling and standard fixed-grid convolution alone. These observations support the notion that, for topology-sensitive targets, accurate segmentation requires not only semantic recognition but also fine-scale structural modeling to maintain narrow branches and continuous skeletons.

The F1-score trends further confirm the overall superiority of SDCrackSeg. Since F1-score jointly reflects Precision and Recall, the best F1-score of 0.888 indicates that SDCrackSeg achieves a favorable compromise between sensitivity (Recall) and Precision. In crack inspection tasks, this compromise is especially important because missed crack segments can break connectivity and bias length measurements, whereas false positives can contaminate crack networks and distort derived indicators such as density, branching degree, and orientation statistics. The combination of high Precision and high F1-score therefore implies that SDCrackSeg yields more trustworthy crack masks for downstream structural assessment.

From an application-oriented perspective, the improvements achieved by SDCrackSeg can be interpreted as enhanced robustness to three common sources of performance degradation. First, low-contrast and extremely thin cracks are better preserved, which contributes to higher Dice and mIoU by reducing missing branches and local gaps. Second, background interference is suppressed more effectively, improving Precision by avoiding spurious activations along crack-like textures. Third, the overall mask quality is more coherent, which is reflected in the consistent gains across overlap-based and classification-based metrics rather than in a single indicator. This consistency is important because it suggests that the performance benefits are not limited to a specific operating point but generalize across different evaluation criteria.

In summary, the quantitative evaluation demonstrates that SDCrackSeg provides consistent and meaningful improvements over representative baselines. The highest Precision indicates stronger resistance to false positives in cluttered backgrounds, while the best mIoU and Dice indicate more accurate overlap and better preservation of fine-scale crack regions. Together with the strong F1-score, these results suggest that SDCrackSeg is well suited for reliable building crack inspection in challenging real-world conditions, where both detailed structural representation and robust background suppression are required.

4.2. Feature Map Visualization and Qualitative Discussion

To provide an intuitive understanding of how different networks perceive crack structures, we visualize and compare the feature maps produced by SDCrackSeg and several representative baselines, including U-Net, SegNet, FCN with ResNet backbones, and DeepLabv3 variants. The qualitative results are shown in Figure 6. Overall, SDCrackSeg exhibits more crack-aligned, compact, and continuous responses, while the baselines tend to produce broader activations or spurious responses on background textures.

As illustrated in Figure 6, SDCrackSeg produces responses that closely follow the crack centerlines and remain compact near the crack boundaries. For thin cracks and weak-contrast segments, the highlighted bands remain visible and continuous, indicating strong sensitivity to fine details. In contrast, U-Net and SegNet often exhibit more diffuse activations around crack neighborhoods. Such diffusion may visually correspond to thicker predicted boundaries and increased ambiguity, especially when cracks are narrow or partially occluded by surface texture.

In addition, SDCrackSeg shows improved stability at crack junctions and branching structures. For Y-shaped and T-shaped intersections, SDCrackSeg maintains coherent responses across multiple directions and preserves the connectivity of branches. Several baselines show blurred or widened responses near junctions, and some responses weaken along minor branches, which can lead to local discontinuities in the final segmentation. Since crack inspection often relies on connected crack paths and complete branch recovery, this qualitative advantage is practically important.

A further observation is the stronger background suppression ability of SDCrackSeg. Under challenging backgrounds such as rough textures, stains, illumination variations, and structural edges, SDCrackSeg demonstrates reduced activation spillover to non-crack regions. The compared FCN and DeepLabv3 variants, which emphasize large receptive fields and contextual aggregation, are more likely to activate on textured patterns or intensity transitions. Although such context modeling benefits large objects, it can be less suitable for cracks because the target is thin and topology-sensitive, and background patterns frequently mimic crack appearance.

These qualitative results align with the design motivation of SDCrackSeg. The frequency enhancement pathway strengthens crack-related high-frequency cues that support boundary delineation and fine-branch visibility. The geometry-adaptive sampling pathway better follows tortuous crack trajectories and reduces the mismatch introduced by fixed-grid convolution when modeling curvilinear structures. Together, they contribute to more selective and structurally consistent responses, which helps reduce false alarms and preserve crack connectivity in complex building-surface scenes. The visual evidence in Figure 6 therefore provides an intuitive explanation for the superior quantitative performance reported in the benchmark experiments.

4.3. Qualitative Comparison on Challenging Crack Patterns

Figure 7 presents a qualitative comparison of crack segmentation results produced by U-Net, FCN with ResNet50 and ResNet101 backbones, DeepLabv3 with ResNet50 and ResNet101 backbones, SegNet, and the proposed SDCrackSeg. For each example, the first row shows the raw image, followed by the ground truth and the predicted masks. The red and green boxes indicate representative regions in which the compared methods yield clearly different outputs. The selected cases reflect common yet challenging scenarios in building crack inspection, including crack junctions, thin and low-contrast branches, tortuous crack trajectories, and ambiguous background textures.

Across all samples, SDCrackSeg generates predictions that are more consistent with the ground truth, particularly in terms of structural continuity, branch preservation, and boundary reliability. In the junction regions highlighted by the green boxes, several baseline methods exhibit discontinuities around intersections or fail to recover secondary branches, producing fragmented crack structures. These errors are important in practice because junction connectivity influences subsequent morphology analysis, including length estimation and branching characterization. In comparison, SDCrackSeg better preserves connectivity at intersections, maintaining coherent links between the main crack and its branches and producing more structurally plausible crack networks.

In the thin-branch regions highlighted by the red boxes, most baselines show reduced sensitivity to faint or narrow cracks. U-Net and FCN variants frequently yield incomplete responses along thin segments, resulting in small gaps and truncated endpoints, which suggests limitations in representing highly curvilinear patterns using fixed-grid convolutions and standard upsampling. DeepLabv3 variants are often more responsive to crack pixels, but this increased sensitivity can be accompanied by reduced precision. In several cases, they generate locally thicker masks or spurious activations on crack-like textures, leading to over-segmentation in the highlighted areas. SegNet provides comparatively stable predictions in some samples, yet missed detections and discontinuities remain evident when cracks become extremely thin or exhibit weak contrast.

Background interference is another recurring challenge. In the presence of rough surface texture, stains, or low-frequency intensity variations, some baseline methods produce false positives or irregular boundaries, especially when true cracks appear close to edges, shadows, or texture gradients. Compared with these approaches, SDCrackSeg yields cleaner masks with fewer visually implausible artifacts while retaining fine details without excessively widening the predicted crack regions. This behavior indicates improved discrimination between true cracks and crack-like background patterns.

Overall, the qualitative results confirm that SDCrackSeg is more robust on difficult crack patterns. The advantages are most apparent in topology-critical regions, such as junctions and branching structures, and in thin or low-contrast segments, where conventional methods often produce fragmented predictions or incomplete branches. These visual observations are consistent with the quantitative improvements reported in Table 1. Despite these improvements, segmentation errors may still occur when multiple challenging factors coexist, such as when extremely fine cracks traverse textured regions and approach junctions. Future work could incorporate multi-scale topological constraints or uncertainty estimation to further enhance robustness in these complex scenarios.

4.4. Accuracy and Runtime Efficiency Assessment

Figure 8 presents the relationship between segmentation accuracy and inference efficiency by reporting mIoU and FPS for different models. This comparison provides a practical perspective on deployability, where a higher mIoU indicates better pixel-level agreement with the ground truth and a higher FPS indicates stronger real-time capability. Overall, SDCrackSeg occupies a competitive region in the accuracy–efficiency space, indicating a favorable balance between segmentation quality and runtime speed.

In terms of accuracy, SDCrackSeg achieves the highest mIoU among all evaluated methods, 0.816 (Table 1). It surpasses the classical encoder–decoder baseline U-Net (0.786) and also outperforms the FCN and DeepLabv3 variants, whose mIoU values range from 0.756 to 0.767. The improvement suggests that SDCrackSeg provides more stable overlap with crack masks, especially for thin, curvilinear, and branching patterns that are prone to fragmentation or over-smoothing when using conventional convolutional backbones. Importantly, this accuracy gain is obtained without a marked reduction in inference speed, which is essential for large-scale inspection scenarios.
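The gaps quoted above can be checked directly against the mIoU column of Table 1. The short Python script below uses only the tabulated values (the dictionary and function names are illustrative) and prints the absolute gain of SDCrackSeg over each baseline.

```python
# mIoU values copied from Table 1 (CHCrack5K results reported in this paper).
MIOU = {
    "U-Net": 0.786,
    "FCN_Resnet50": 0.767,
    "FCN_Resnet101": 0.761,
    "Deeplabv3_Resnet50": 0.760,
    "Deeplabv3_Resnet101": 0.756,
    "SegNet": 0.777,
}
SDCRACKSEG_MIOU = 0.816


def miou_gains(baselines, ours):
    """Absolute mIoU gain (ours - baseline) for each baseline, rounded to 3 decimals."""
    return {name: round(ours - value, 3) for name, value in baselines.items()}


if __name__ == "__main__":
    gains = miou_gains(MIOU, SDCRACKSEG_MIOU)
    # Print from the smallest to the largest gain.
    for name, gain in sorted(gains.items(), key=lambda kv: kv[1]):
        print(f"{name:<20s} +{gain:.3f}")
```

With these values, the smallest gain is +0.030 over U-Net and the largest is +0.060 over Deeplabv3_Resnet101; these are absolute differences in mIoU, not relative percentages.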

With respect to efficiency, SDCrackSeg maintains an inference speed close to 200 FPS under the adopted evaluation setting. Although SegNet and FCN_Resnet50 report higher throughput, exceeding 240 FPS, their mIoU values remain below 0.78, indicating that faster decoding or simplified feature processing may reduce the ability to preserve fine crack details. Conversely, DeepLabv3_Resnet101 exhibits the lowest efficiency, close to 160 FPS, while also delivering the lowest mIoU among the compared models (0.756). This observation suggests that increasing backbone depth and relying on multi-scale context aggregation does not necessarily lead to improved segmentation of topology-sensitive crack structures, but it does increase computational cost.
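Throughput can also be read as per-image latency, which is often the more tangible quantity for inspection pipelines. The sketch below converts the approximate FPS figures quoted in this paragraph into milliseconds per image. It assumes images are processed one at a time, so that latency is simply the reciprocal of throughput (with batched inference the two can differ), and the helper name is hypothetical.

```python
def latency_ms(fps: float) -> float:
    """Average per-image latency (ms) implied by a throughput in frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Approximate throughputs as quoted in the text (read from Figure 8, not exact measurements).
APPROX_FPS = {
    "SDCrackSeg": 200.0,
    "SegNet / FCN_Resnet50 (lower bound)": 240.0,
    "Deeplabv3_Resnet101": 160.0,
}

if __name__ == "__main__":
    for name, fps in APPROX_FPS.items():
        print(f"{name:<37s} {fps:5.0f} FPS  ->  {latency_ms(fps):.2f} ms/image")
```

At these rates, the per-image difference between SDCrackSeg (about 5.0 ms) and the fastest baselines (at most about 4.2 ms) is under 1 ms, whereas Deeplabv3_Resnet101 needs about 6.25 ms per image.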

A closer inspection reveals clear differences across model families. FCN and SegNet achieve relatively high throughput, yet their accuracy is limited, which can be attributed to coarse upsampling and insufficient recovery of high-frequency boundaries. DeepLabv3 variants provide stronger contextual modeling but remain less efficient, and this does not translate into better overlap: their mIoU (0.760 and 0.756) is slightly below that of both FCN variants, as their high recall (0.896 and 0.905) is offset by the lowest precision among all models (0.805 and 0.795). This pattern implies that background interference and false positives still limit their overlap quality. U-Net represents a stronger overall baseline with a more balanced profile, but it remains inferior to SDCrackSeg at similar runtime, reflecting the difficulty of capturing complex crack geometry with fixed-grid convolutions.

In summary, Figure 8 indicates that SDCrackSeg offers a superior balance between accuracy and runtime efficiency. It delivers the highest mIoU while retaining high inference throughput, which makes it suitable for practical applications that require both reliable segmentation and rapid processing, such as facade inspection using unmanned aerial vehicles, mobile imaging platforms, and long-term structural health monitoring.

5. Conclusions

This paper presented SDCrackSeg, a segmentation network designed for building cracks, which are thin and elongated and whose segmentation is highly sensitive to structural continuity. The method targets two common failure modes in crack segmentation, namely blurred or over-thickened boundaries caused by background textures and broken connectivity caused by weak contrast, occlusions, and complex crack junctions.

A key contribution is the Frequency Spatial Convolution module, which integrates complementary cues from frequency enhancement and geometry adaptivity. The Adaptive Frequency Convolution branch emphasizes crack-sensitive high-frequency details that are essential for delineating narrow branches and weak-contrast segments. The Dynamic Snake Convolution branch improves the ability of convolutional sampling to follow curved crack paths, reducing the mismatch introduced by fixed-grid convolution when modeling tortuous structures. By combining these branches through feature fusion, SDCrackSeg produces representations that are more crack-aligned and less affected by irrelevant background patterns.

In addition to architectural design, this work incorporated a topology-aware loss based on persistent homology to explicitly regularize the structural consistency of predictions. Unlike region-based objectives that primarily measure overlap, the topology constraint penalizes fragmentation and encourages connectivity preservation, which is crucial for downstream crack morphology analysis and for obtaining reliable crack networks at junctions and turning points.

Extensive experiments on CHCrack5K validated the effectiveness of the proposed approach. SDCrackSeg achieved a precision of 0.900, an mIoU of 0.816, an F1-score of 0.888, and a Dice coefficient of 0.675, and it maintained an inference speed close to 200 FPS, indicating a favorable balance between accuracy and efficiency for practical inspection workloads. Together with the qualitative evidence from feature response visualizations, the results suggest that the proposed frequency enhancement, geometry-adaptive sampling, and topology regularization jointly improve boundary reliability, suppress false activations on crack-like textures, and reduce discontinuities in challenging regions.

Despite these promising results, several limitations should be acknowledged. First, although CHCrack5K integrates 11 diverse datasets, validation under extreme imaging conditions such as strong shadows, overexposure, or night-time acquisition remains limited and requires further investigation. Second, highly discontinuous cracks with large gaps may still produce fragmented predictions, as the topology-aware loss encourages connectivity but cannot hallucinate missing segments. Third, dense parallel crack patterns may occasionally merge due to the connectivity-favoring regularization. Fourth, the persistent homology computation increases training time by approximately 15%, although inference speed remains unaffected.

Future work will pursue the following specific directions. First, cross-domain generalization will be evaluated by testing SDCrackSeg on pavement crack datasets such as CrackForest and DeepCrack, as well as in industrial inspection scenarios, to determine whether the frequency–geometry fusion approach transfers effectively across different crack types and imaging conditions. Second, a lightweight variant will be developed by replacing standard convolutions in DSConv with depthwise separable convolutions, targeting inference speeds exceeding 300 FPS on edge devices while maintaining mIoU above 0.80. Third, multi-threshold topology consistency will be investigated by computing the topology-aware loss at multiple binarization thresholds during training, with the hypothesis that this will reduce sensitivity to post-processing threshold selection. Fourth, joint crack attribute estimation will be explored by extending the decoder to predict crack width and orientation maps alongside segmentation masks in a multi-task learning framework.
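The parameter saving behind the proposed lightweight variant can be estimated with the standard counting formulas for a convolution and for a depthwise separable convolution (a depthwise convolution with the same kernel, followed by a 1 × 1 pointwise convolution). The sketch below ignores bias terms, and the kernel length K = 9 and the 64 input and output channels are hypothetical values chosen for illustration, not SDCrackSeg's actual configuration.

```python
def standard_conv_params(k_h, k_w, c_in, c_out):
    """Weights of a standard k_h x k_w convolution (bias ignored)."""
    return k_h * k_w * c_in * c_out


def depthwise_separable_params(k_h, k_w, c_in, c_out):
    """Depthwise k_h x k_w conv (one filter per input channel) + 1x1 pointwise conv (bias ignored)."""
    depthwise = k_h * k_w * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    K, C = 9, 64  # hypothetical snake-kernel length and channel width
    std = standard_conv_params(K, 1, C, C)
    dws = depthwise_separable_params(K, 1, C, C)
    print(f"standard (K,1) conv:       {std} weights")
    print(f"depthwise separable (K,1): {dws} weights")
    print(f"reduction factor:          {std / dws:.2f}x")
```

For these illustrative sizes the weight count falls by a factor of about 7.9. A smaller parameter count does not by itself guarantee the targeted speed-up, since runtime on edge devices also depends on memory access and on how well depthwise kernels are supported by the hardware.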

Author Contributions

Conceptualization, Z.H., L.L. and J.S.; methodology, Z.H. and J.S.; software, L.L., T.H. and Y.M.; validation, Z.H. and J.S.; formal analysis, J.S.; investigation, Z.H. and J.S.; resources, L.L., T.H. and Y.M.; data curation, Z.H.; writing—original draft preparation, Z.H.; writing—review and editing, J.S.; visualization, Z.H.; supervision, L.L., T.H. and Y.M.; project administration, L.L., T.H. and Y.M.; funding acquisition, L.L., T.H. and Y.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, China Academy of Transportation Sciences, PRC (Grant No. 2025B1205).

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from Hanshen Chen and are available at https://github.com/hanshenchen/CHCrack5K (accessed on 15 December 2025) with the permission of Hanshen Chen.

Acknowledgments

The authors would like to express their gratitude to the members of the research group for their support.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Mathematical Details of Dynamic Snake Convolution

Appendix A.1. Offset Field Learning

This appendix provides the complete mathematical derivations for DSConv. Given an input $f \in \mathbb{R}^{B \times C \times H \times W}$, spatial offsets are predicted as

$$\Delta_{\mathrm{offset}} = \mathrm{Conv}_{3 \times 3}(f), \quad \Delta_{\mathrm{offset}} \in \mathbb{R}^{B \times 2K \times H \times W} \tag{A1}$$

The offsets are then normalized as $\Delta = \tanh(\mathrm{BN}(\Delta_{\mathrm{offset}}))$, which constrains them to $[-1, 1]$.
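A minimal NumPy sketch of the normalization step $\Delta = \tanh(\mathrm{BN}(\Delta_{\mathrm{offset}}))$ follows. It assumes training-mode batch normalization computed per channel over the batch and spatial axes, without the learnable scale and shift, and it replaces the 3 × 3 offset convolution of Eq. (A1) with random numbers of the same shape $[B, 2K, H, W]$; the function name and the sizes (including K = 9) are illustrative assumptions.

```python
import numpy as np


def normalize_offsets(offset, eps=1e-5):
    """Compute tanh(BN(offset)) for a raw offset tensor of shape [B, 2K, H, W].

    BN is modeled as training-mode batch normalization per channel (statistics over the batch
    and spatial axes) without learnable scale/shift; tanh then bounds every offset to [-1, 1].
    """
    mean = offset.mean(axis=(0, 2, 3), keepdims=True)
    var = offset.var(axis=(0, 2, 3), keepdims=True)
    normalized = (offset - mean) / np.sqrt(var + eps)
    return np.tanh(normalized)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, K, H, W = 2, 9, 8, 8  # illustrative sizes
    raw = 5.0 * rng.standard_normal((B, 2 * K, H, W))  # stand-in for Conv3x3(f) in Eq. (A1)
    delta = normalize_offsets(raw)
    print(delta.shape, float(delta.min()), float(delta.max()))
```

Assuming the y offsets occupy the first K channels, as the ordering $\Delta = [\Delta_y, \Delta_x]$ in Appendix A.2 suggests, the two components can be separated with `delta[:, :K]` and `delta[:, K:]`.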

Appendix A.2. Iterative Offset Propagation

The offset tensor is split into $\Delta = [\Delta_y, \Delta_x]$, each of shape $[B, K, H, W]$. With the center index $c = \lfloor K/2 \rfloor$, the offsets are accumulated outward from the center:

$$y_{\mathrm{new}}[c] = 0, \tag{A2}$$

$$y_{\mathrm{new}}[c+i] = y_{\mathrm{new}}[c+i-1] + \Delta_y[c+i], \quad i = 1, \dots, \lfloor K/2 \rfloor, \tag{A3}$$

$$y_{\mathrm{new}}[c-i] = y_{\mathrm{new}}[c-i+1] + \Delta_y[c-i], \quad i = 1, \dots, \lfloor K/2 \rfloor. \tag{A4}$$
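Equations (A2)–(A4) can be written as a short loop that walks outward from the kernel center in both directions. The NumPy sketch below (function name hypothetical) assumes an odd kernel length K, so that $c \pm \lfloor K/2 \rfloor$ stays within the kernel, and operates on the last axis; a $[B, K, H, W]$ offset tensor would first need its K axis moved to the end, e.g., with `np.moveaxis`.

```python
import numpy as np


def propagate_offsets(delta_y):
    """Accumulate offsets outward from the kernel centre, following Eqs. (A2)-(A4).

    delta_y: array of shape [..., K] with K odd; the last axis indexes kernel positions.
    Returns y_new of the same shape, with y_new[..., c] = 0 for c = K // 2.
    """
    K = delta_y.shape[-1]
    c = K // 2
    y_new = np.zeros_like(delta_y, dtype=float)  # (A2): the centre position is fixed
    for i in range(1, c + 1):
        # (A3): step to the right of the centre and add that position's offset
        y_new[..., c + i] = y_new[..., c + i - 1] + delta_y[..., c + i]
        # (A4): step to the left of the centre and add that position's offset
        y_new[..., c - i] = y_new[..., c - i + 1] + delta_y[..., c - i]
    return y_new


if __name__ == "__main__":
    d = np.array([0.5, -0.2, 0.9, 0.1, 0.3])  # K = 5; the centre offset (0.9) is never used
    print(propagate_offsets(d))
```

Because each entry is a partial sum of the offsets between the center and that position, the loop is equivalent to two cumulative sums, one per direction, and the center offset $\Delta_y[c]$ does not enter the result.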

Appendix A.3. Coordinate Grid Generation

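For the horizontal snake (morph = 0), the kernel grid is

$$x_{\mathrm{grid}}[k] = k - \lfloor K/2 \rfloor, \quad y_{\mathrm{grid}}[k] = 0, \quad k \in \{0, \dots, K-1\} \tag{A5}$$

and the base coordinates are $y_{\mathrm{center}}[k, i, j] = j$ and $x_{\mathrm{center}}[k, i, j] = i$.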
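The grid of Eq. (A5) and the base coordinates can be generated as follows. This NumPy sketch (hypothetical function name) follows the appendix convention literally, with $x_{\mathrm{center}}$ taken from the first spatial index $i$ and $y_{\mathrm{center}}$ from the second index $j$, and it covers only the horizontal snake (morph = 0).

```python
import numpy as np


def horizontal_snake_grid(K, H, W):
    """Kernel grid (Eq. (A5)) and base coordinates for the horizontal snake (morph = 0).

    Returns x_grid, y_grid of shape [K] and x_center, y_center of shape [K, H, W], with
    x_center[k, i, j] = i and y_center[k, i, j] = j as in Appendix A.3.
    """
    k = np.arange(K)
    x_grid = (k - K // 2).astype(float)  # x_grid[k] = k - floor(K/2)
    y_grid = np.zeros(K)                 # y_grid[k] = 0
    i_idx, j_idx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Repeat the per-pixel indices for every kernel position k (astype copies the broadcast view).
    x_center = np.broadcast_to(i_idx, (K, H, W)).astype(float)
    y_center = np.broadcast_to(j_idx, (K, H, W)).astype(float)
    return x_grid, y_grid, x_center, y_center


if __name__ == "__main__":
    xg, yg, xc, yc = horizontal_snake_grid(K=5, H=3, W=4)
    print(xg, yg, xc.shape, yc.shape)
```

In Eq. (A6), the per-position grid values are added to the per-pixel centers by broadcasting, e.g., `xc + xg[:, None, None]`.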

Appendix A.4. Deformed Sampling and Bilinear Interpolation

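The final sampling coordinates are

$$y_{\mathrm{final}} = y_{\mathrm{center}} + y_{\mathrm{grid}} + \alpha \cdot y_{\mathrm{new}}, \quad x_{\mathrm{final}} = x_{\mathrm{center}} + x_{\mathrm{grid}} + \alpha \cdot x_{\mathrm{new}} \tag{A6}$$

These coordinates are normalized to $[-1, 1]$, e.g., $y_{\mathrm{norm}} = 2\, y_{\mathrm{final}} / (W - 1) - 1$. Features at the resulting non-integer positions $P = (x, y)$ are obtained by bilinear interpolation over the set $\mathcal{N}(P)$ of the four nearest integer neighbors:

$$f_{\mathrm{deformed}}(P) = \sum_{q \in \mathcal{N}(P)} w_q \cdot f(q) \tag{A7}$$

where the neighbor $q = (\lfloor x \rfloor + i, \lfloor y \rfloor + j)$ receives the weight $w_{ij} = (1 - |\Delta x - i|)(1 - |\Delta y - j|)$ for $i, j \in \{0, 1\}$, with $\Delta x = x - \lfloor x \rfloor$ and $\Delta y = y - \lfloor y \rfloor$.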
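The normalization and the bilinear weights of Eq. (A7) can be checked on a small array. The sketch below (helper names hypothetical) indexes the feature map as `f[x, y]`, consistent with $x_{\mathrm{center}} = i$ and $y_{\mathrm{center}} = j$ in Appendix A.3, uses zero padding for neighbors outside the map (an assumption; the appendix does not specify border handling), and samples one point at a time for clarity rather than speed.

```python
import math

import numpy as np


def normalize_coord(v, size):
    """Map a pixel coordinate in [0, size - 1] to [-1, 1], as in y_norm = 2 * y_final / (W - 1) - 1."""
    return 2.0 * v / (size - 1) - 1.0


def bilinear_sample(f, x, y):
    """Sample a 2-D map f[x, y] at a non-integer point P = (x, y), following Eq. (A7).

    Neighbour (floor(x) + i, floor(y) + j), i, j in {0, 1}, gets the weight
    (1 - |dx - i|) * (1 - |dy - j|), where dx, dy are the fractional parts of x and y.
    Neighbours outside the map contribute zero (zero padding).
    """
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0
    out = 0.0
    for i in (0, 1):
        for j in (0, 1):
            qx, qy = x0 + i, y0 + j
            if 0 <= qx < f.shape[0] and 0 <= qy < f.shape[1]:
                w = (1 - abs(dx - i)) * (1 - abs(dy - j))
                out += w * f[qx, qy]
    return out


if __name__ == "__main__":
    f = np.arange(12.0).reshape(3, 4)
    print(bilinear_sample(f, 1, 2), bilinear_sample(f, 0.5, 0.5), normalize_coord(4, 5))
```

Sampling at an integer position returns the stored value, and sampling midway between four pixels returns their average, which is a quick sanity check on the weights. Frameworks provide vectorized equivalents; for example, PyTorch's `torch.nn.functional.grid_sample` with `align_corners=True` interprets −1 and 1 as the centers of the corner pixels, which matches the normalization $2\, y_{\mathrm{final}} / (W - 1) - 1$.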

Appendix A.5. Directional Convolution

For horizontal snakes, a $(K, 1)$ convolution with stride $(K, 1)$ is applied:

$$O[i, j] = \sum_{k=0}^{K-1} W[k] \cdot f_{\mathrm{deformed}}[i, K \cdot j + k], \quad O_{\mathrm{out}} = \mathrm{ReLU}(\mathrm{BN}(O)) \tag{A8}$$
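Equation (A8) amounts to grouping each run of K consecutive deformed samples and taking a weighted sum, which is what a length-K kernel applied with stride K along the sampled axis does. The NumPy sketch below assumes a single input and output channel and models BN in training mode without learnable parameters; the function names are hypothetical.

```python
import numpy as np


def directional_response(f_deformed, weight):
    """O[i, j] = sum_k weight[k] * f_deformed[i, K*j + k] (Eq. (A8), single channel).

    f_deformed: array [H, K * W_out]; the K deformed samples for output column j are stored
                contiguously in columns K*j, ..., K*j + K - 1.
    weight:     array [K] holding the directional kernel.
    """
    K = weight.shape[0]
    H, total = f_deformed.shape
    if total % K != 0:
        raise ValueError("the sampled axis must have a length divisible by K")
    # Group every K consecutive samples (stride K), then contract each group with the kernel.
    return f_deformed.reshape(H, total // K, K) @ weight


def bn_relu(O, eps=1e-5):
    """O_out = ReLU(BN(O)), with BN modeled as training-mode normalization without affine terms."""
    O_bn = (O - O.mean()) / np.sqrt(O.var() + eps)
    return np.maximum(O_bn, 0.0)


if __name__ == "__main__":
    f_def = np.arange(12.0).reshape(2, 6)  # H = 2, K = 3, W_out = 2
    O = directional_response(f_def, np.ones(3))
    print(O)
    print(bn_relu(O))
```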

Appendix B. Pseudocode for Key Components

Appendix B.1. Adaptive Frequency Convolution

Algorithm A1 describes the forward pass of the AFConv module, which performs adaptive frequency-domain filtering to enhance crack-related high-frequency features.

Algorithm A1 Adaptive Frequency Convolution Forward Pass

Require: Input feature map $X \in \mathbb{R}^{B \times C \times H \times W}$, learnable frequency filters $\{K_f\}_{f=1}^{F}$, expansion ratio $r$
Ensure: Output feature map $Y \in \mathbb{R}^{B \times C \times H \times W}$
1: $X \leftarrow \mathrm{ReLU}(\mathrm{BatchNorm}(X))$
2: $X' \leftarrow \mathrm{Linear}(X, C \to r \cdot C)$ # Channel expansion
3: $X' \leftarrow \mathrm{StarReLU}(X')$
4: # Compute adaptive routing weights
5: $g \leftarrow \mathrm{GlobalAvgPool}(X')$, $g \in \mathbb{R}^{B \times rC}$
6: $R \leftarrow \mathrm{Softmax}(\mathrm{MLP}(g))$, $R \in \mathbb{R}^{B \times F \times rC}$
7: # Transform to frequency domain
8: $F_X \leftarrow \mathrm{RFFT2D}(X')$
9: # Compute adaptive spectral weights
10: $W \leftarrow \sum_{f=1}^{F} R_f \cdot K_f$
11: # Apply frequency modulation
12: $F_{\mathrm{weighted}} \leftarrow F_X \odot W$
13: $G \leftarrow \mathrm{ComplexSELU}(\mathrm{ComplexBN}(\mathrm{ComplexConv}(F_X)))$
14: $F_{\mathrm{out}} \leftarrow F_{\mathrm{weighted}} \odot G$ # Complex multiplication
15: # Transform back to spatial domain
16: $X'' \leftarrow \mathrm{IRFFT2D}(F_{\mathrm{out}})$
17: $Y \leftarrow \mathrm{Linear}(X'', r \cdot C \to C)$
18: return $Y$
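The core spectral path of Algorithm A1 (steps 5–12 and 16) can be sketched in NumPy for a single sample. This is a simplified illustration, not the authors' implementation: it omits the BatchNorm/ReLU and StarReLU activations, the channel-expansion and projection layers (steps 1–3 and 17), and the complex gating branch G (steps 13–14), and it replaces the MLP in step 6 with a single linear map. Taking the softmax over the F filters is an assumption about the normalization axis, and all names are hypothetical.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_frequency_filter(x, filters, routing_proj):
    """Simplified spectral path of Algorithm A1 for one sample.

    x:            real array [C, H, W] (the expanded features X').
    filters:      complex array [F, C, H, W // 2 + 1] (the learnable spectral filters K_f).
    routing_proj: real array [F * C, C], a single linear layer standing in for the MLP.
    """
    C, H, W = x.shape
    F = filters.shape[0]
    g = x.mean(axis=(1, 2))                                  # step 5: global average pooling -> [C]
    R = softmax((routing_proj @ g).reshape(F, C), axis=0)    # step 6: routing weights over F filters
    Fx = np.fft.rfft2(x, axes=(1, 2))                        # step 8: real 2-D FFT over H, W
    W_spec = np.einsum("fc,fchw->chw", R, filters)           # step 10: per-channel filter mixture
    F_weighted = Fx * W_spec                                 # step 12: element-wise modulation
    return np.fft.irfft2(F_weighted, s=(H, W), axes=(1, 2))  # step 16: back to the spatial domain


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, F = 3, 8, 6, 4
    x = rng.standard_normal((C, H, W))
    filters = rng.standard_normal((F, C, H, W // 2 + 1)) + 1j * rng.standard_normal((F, C, H, W // 2 + 1))
    proj = rng.standard_normal((F * C, C))
    print(adaptive_frequency_filter(x, filters, proj).shape)
```

Because the routing weights sum to one over the F filters, setting every filter to one makes this path an identity map, and keeping only the zero-frequency coefficient replaces each channel by its spatial mean; both properties are convenient checks when experimenting with learned filters.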

Appendix B.2. Topology-Aware Loss

Algorithm A2 describes the computation of the topology-aware loss based on persistent homology.

Algorithm A2 Topology-Aware Loss Computation

Require: Predicted probability map $P \in [0, 1]^{H \times W}$, ground-truth mask $M \in \{0, 1\}^{H \times W}$, threshold $\tau$, weights $\lambda_0, \lambda_1$
Ensure: Topology loss $L_{\mathrm{Topo}}$
1: $\hat{M} \leftarrow \mathbb{I}(P > \tau)$ # Binarize prediction
2: $P_0^{\mathrm{pred}}, P_1^{\mathrm{pred}} \leftarrow \mathrm{PersistentHomology}(\hat{M})$
3: $P_0^{\mathrm{gt}}, P_1^{\mathrm{gt}} \leftarrow \mathrm{PersistentHomology}(M)$
4: $L_{\mathrm{Topo}} \leftarrow \lambda_0 \cdot d_H(P_0^{\mathrm{pred}}, P_0^{\mathrm{gt}}) + \lambda_1 \cdot d_H(P_1^{\mathrm{pred}}, P_1^{\mathrm{gt}})$
5: return $L_{\mathrm{Topo}}$
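To make the diagram matching in step 4 concrete, the NumPy sketch below computes 0-dimensional persistence (connected components) with a union-find over a superlevel-set filtration and compares diagrams with the Hausdorff distance. It differs from Algorithm A2 in two labeled ways: it applies the filtration directly to the probability map $P$ instead of the thresholded mask $\hat{M}$ (a swapped-in choice that keeps the probability values visible in the diagram), and it handles only dimension 0, so the $\lambda_1$ term for loops is omitted. The $L_\infty$ point metric, 4-connectivity, and all function names are assumptions, and the code returns loss values only, not gradients.

```python
import numpy as np


def persistence_diagram_dim0(img):
    """0-dim persistence of the superlevel-set filtration of a 2-D array (4-connectivity).

    Pixels enter in decreasing order of value. When two components meet, the younger one
    (lower birth value) dies at the current value (elder rule); zero-persistence pairs are
    dropped, and the surviving component is closed at img.min(). Returns (birth, death) pairs.
    """
    _, W = img.shape
    parent, birth, pairs = {}, {}, []

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for flat in np.argsort(-img, axis=None, kind="stable"):
        r, c = divmod(int(flat), W)
        v = float(img[r, c])
        parent[(r, c)], birth[(r, c)] = (r, c), v
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb not in parent:
                continue  # outside the image or not yet in the filtration
            a, b = find((r, c)), find(nb)
            if a == b:
                continue
            young, old = (a, b) if birth[a] < birth[b] else (b, a)
            if birth[young] > v:
                pairs.append((birth[young], v))
            parent[young] = old
    roots = {find(p) for p in list(parent)}
    pairs.extend((birth[root], float(img.min())) for root in roots)
    return pairs


def hausdorff_distance(d1, d2):
    """Hausdorff distance between two non-empty diagrams, L-infinity metric on (birth, death)."""
    a, b = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    dist = np.abs(a[:, None, :] - b[None, :, :]).max(axis=2)
    return float(max(dist.min(axis=1).max(), dist.min(axis=0).max()))


def topo_loss_dim0(prob, gt, lam0=1.0):
    """lam0 * d_H between the 0-dim diagrams of the probability map and the ground-truth mask."""
    return lam0 * hausdorff_distance(persistence_diagram_dim0(prob), persistence_diagram_dim0(gt))


if __name__ == "__main__":
    gt = np.zeros((3, 7))
    gt[1, :] = 1.0                 # one continuous crack
    broken = gt * 0.9
    broken[1, 3] = 0.2             # same crack with a weak gap in the prediction
    intact = gt * 0.9              # same crack predicted without a gap
    print(topo_loss_dim0(broken, gt), topo_loss_dim0(intact, gt))
```

In this toy case, the weak gap creates a second component that dies only at 0.2, so the fragmented prediction incurs a larger loss (0.2) than the intact one (about 0.1), which is the kind of fragmentation the topology term is meant to penalize.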

References

1. Chen, L.Z.; Zhou, L.Y.; Li, L.; Luo, M.Z. CrackDiffusion: Crack inpainting with denoising diffusion models and crack segmentation perceptual score. Smart Mater. Struct. 2023, 32, 054001.
2. Zhang, L.X.; Shen, J.K.; Zhu, B.J. A research on an improved Unet-based concrete crack detection algorithm. Struct. Health Monit. Int. J. 2021, 20, 1864–1879.
3. Li, H.W.; Wu, X.M.; Nie, Q.K.; Yu, J.C.; Zhang, L.; Wang, Q.J.; Gao, Q.Y. Lifetime prediction of damaged or cracked concrete structures: A review. Structures 2025, 71, 108095.
4. Lin, Q.; Jiang, Y.; Sugimoto, S. Research on Strength Degradation and Crack Development in Defective Concrete. GeoHazards 2025, 6, 50–59.
5. Chen, Q.; Huang, Y.C.; Weng, X.X.; Liu, W.J. Curve-based crack detection using crack information gain. Struct. Control Health Monit. 2021, 28, e2764.
6. Yang, Y.S.; Yang, C.M.; Huang, C.W. Thin crack observation in a reinforced concrete bridge pier test using image processing and analysis. Adv. Eng. Softw. 2015, 83, 99–108.
7. Fujita, Y.; Hamamoto, Y. A robust automatic crack detection method from noisy concrete surfaces. Mach. Vis. Appl. 2011, 22, 245–254.
8. Dhital, D.; Lee, J.R. A fully non-contact ultrasonic propagation imaging system for closed surface crack evaluation. Exp. Mech. 2012, 52, 1111–1122.
9. Chow, J.K.; Liu, K.F.; Tan, P.S.; Su, Z.Y.; Wu, J.; Li, Z.F.; Wang, Y.H. Automated defect inspection of concrete structures. Autom. Constr. 2021, 132, 103959.
10. Zhou, H.; Xu, C.; Tang, X.; Wang, S.; Zhang, Z. A Review of Vision-Laser-Based Civil Infrastructure Inspection and Monitoring. Sensors 2022, 22, 5882.
11. Yan, J.; Downey, A.; Cancelli, A.; Laflamme, S.; Chen, A.; Li, J.; Ubertini, F. Concrete Crack Detection and Monitoring Using a Capacitive Dense Sensor Array. Sensors 2019, 19, 1843.
12. Shrestha, P.; Avci, O.; Rifai, S.; Abla, F.; Seek, M.; Barth, K.; Halabe, U. A Review of Infrared Thermography Applications for Civil Infrastructure. SDHM Struct. Durab. Health Monit. 2025, 19, 193–231.
13. Yuan, Q.; Shi, Y.; Li, M. A Review of Computer Vision-Based Crack Detection Methods in Civil Infrastructure: Progress and Challenges. Remote Sens. 2024, 16, 2910.
14. Koch, C.; Georgieva, K.; Kasireddy, V.; Akinci, B.; Fieguth, P. A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure. Adv. Eng. Inform. 2015, 29, 196–210.
15. Avendaño, J.C.; Leander, J.; Karoumi, R. Image-Based Concrete Crack Detection Method Using the Median Absolute Deviation. Sensors 2024, 24, 2736.
16. Talab, A.M.; Huang, Z.C.; Xi, F.; Liu, H.M. Detection crack in image using Otsu method and multiple filtering in image processing techniques. Optik 2016, 127, 1030–1033.
17. Tian, F.; Zhao, Y.; Che, X.; Zhao, Y.; Xin, D. Concrete Crack Identification and Image Mosaic Based on Image Processing. Appl. Sci. 2019, 9, 4826.
18. Medina, R.; Llamas, J.; Gómez-García-Bermejo, J.; Zalama, E.; Segarra, M.J. Crack Detection in Concrete Tunnels Using a Gabor Filter Invariant to Rotation. Sensors 2017, 17, 1670.
19. Sun, Z.; Caetano, E.; Pereira, S.; Moutinho, C. Employing histogram of oriented gradient to enhance concrete crack detection performance with classification algorithm and Bayesian optimization. Eng. Fail. Anal. 2023, 150, 107351.
20. Chen, C.; Seo, H.; Jun, C.H.; Zhao, Y. Pavement crack detection and classification based on fusion feature of LBP and PCA with SVM. Int. J. Pavement Eng. 2022, 23, 3274–3283.
21. de León, G.; Fiorentini, N.; Leandri, P.; Losa, M. A New Region-Based Minimal Path Selection Algorithm for Crack Detection and Ground Truth Labeling Exploiting Gabor Filters. Remote Sens. 2023, 15, 2722.
22. Wang, P.; Qiao, H.; Feng, Q.; Xue, C. Internal corrosion cracks evolution in reinforced magnesium oxychloride cement concrete. Adv. Cem. Res. 2023, 36, 15–30.
23. Fan, C.L. Detection of multidamage to reinforced concrete using support vector machine-based clustering from digital images. Struct. Control Health Monit. 2021, 28, e2841.
24. Jia, H.; Lin, J.; Liu, J. Bridge seismic damage assessment model applying artificial neural networks and the random forest algorithm. Adv. Civ. Eng. 2020, 2020, 6548682.
25. Omar, I.; Khan, M.; Starr, A. Comparative Analysis of Machine Learning Models for Predicting Crack Propagation under Coupled Load and Temperature. Appl. Sci. 2023, 13, 7212.
26. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
27. Cha, Y.-J.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 361–378.
28. Gao, X.; Cao, C.; Yi, X. Using the improved YOLOv11 model to enhance computer vision applications for building crack detection algorithms. Sci. Rep. 2025, 15, 38843.
29. He, X.; Tang, Z.; Deng, Y.; Zhou, G.; Wang, Y.; Li, L. UAV-based road crack object-detection algorithm. Autom. Constr. 2023, 154, 105014.
30. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
31. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
32. Dung, C.V.; Anh, L.D. Autonomous concrete crack detection using deep fully convolutional neural network. Autom. Constr. 2019, 99, 52–58.
33. Wu, Z.; Tang, Y.; Hong, B.; Liang, B.; Liu, Y. Enhanced Precision in Dam Crack Width Measurement: Leveraging Advanced Lightweight Network Identification for Pixel-Level Accuracy. Int. J. Intell. Syst. 2023, 9940881, 16.
34. Choi, W.; Cha, Y.-J. SDDNet: Real-Time Crack Segmentation. IEEE Trans. Ind. Electron. 2020, 67, 8016–8025.
35. Shao, Y.; Li, L.; Li, J.; Yao, X.; Li, Q.; Hao, H. Advancing crack detection with generative AI for structural health monitoring. Struct. Health Monit. 2025, 14759217251369000.
36. Cohen, T.; Welling, M. Group Equivariant Convolutional Networks. Int. Conf. Mach. Learn. 2016, 48, 2990–2999.
37. Sohaib, M.; Hasan, M.J.; Shah, M.A.; Zheng, Z. A robust self-supervised approach for fine-grained crack detection in concrete structures. Sci. Rep. 2024, 14, 12646.
38. Zou, Q.; Zhang, Z.; Li, Q.; Qi, X.; Wang, Q.; Wang, S. DeepCrack: Learning Hierarchical Convolutional Features for Crack Detection. IEEE Trans. Image Process. 2019, 28, 1498–1512.
39. Qi, Y.; He, X.; Qi, Y.; Zhang, Y.; Yang, G. Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation. In 2023 IEEE/CVF International Conference on Computer Vision; IEEE: New York, NY, USA, 2023; pp. 6047–6056.
40. Huang, Z.; Zhang, Z.; Lan, C.; Wang, Y.; Zhang, X.; Yang, G. Adaptive Frequency Filters as Efficient Global Token Mixers. In IEEE/CVF International Conference on Computer Vision; IEEE: New York, NY, USA, 2023; pp. 6026–6036.
41. Aktas, M.E.; Akbas, E.; Fatmaoui, A.E. Persistence homology of networks: Methods and applications. Appl. Netw. Sci. 2019, 4, 61.
42. CHCrack5K: A Comprehensive Crack Detection Dataset. GitHub, 2024. Available online: https://github.com/hanshenchen/CHCrack5K (accessed on 25 December 2025).
43. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Eur. Conf. Comput. Vis. 2018, 11219, 801–818.
44. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.

Figure 1. Overall Network Architecture. Architecture of SDCrackSeg: (a) Overall U-shaped encoder–decoder segmentation framework with multi-scale skip connections; (b) Detailed structure of the FS-Conv block/FS_Module, featuring dual-branch feature fusion (AF-Conv and DS-Conv) followed by FFN/BN/GELU refinement; (c) Basic residual convolutional unit consisting of Conv $3 \times 3$ + Batch Normalization + GELU.


Figure 2. Illustration of the Dynamic Snake Convolution principle, adapted from [39]. (a) Iterative offset propagation mechanism: the center position remains fixed while offsets accumulate iteratively toward both ends, forming a continuous snake-like sampling trajectory. (b) Bilinear interpolation for feature extraction at non-integer deformed coordinates, including coordinate normalization, four nearest neighbor identification, weight calculation, and weighted summation.

Figure 3. Architecture of the AFConv module.

Figure 4. Persistent Homology and Persistence-Diagram Matching: (a) multi-scale filtration showing barcode evolution of connected components ($B_0$) and loops ($B_1$); (b) persistence diagrams P, Q and their Hausdorff distance $d_H(P, Q)$.


Figure 5. Examples from the CHCrack5K dataset for building wall crack segmentation (raw images and corresponding ground-truth masks).

Figure 6. Qualitative comparison of feature responses produced by different crack segmentation networks. From top to bottom are the raw images and the corresponding activation maps of SDCrackSeg and representative baselines, illustrating that SDCrackSeg yields more crack-aligned, compact, and structurally coherent responses while suppressing background interference. Warmer colors (red/yellow) indicate higher activation values, while cooler colors (blue) represent lower activation.

Figure 7. Qualitative comparison of crack segmentation results. Results are shown for U-Net, FCN (ResNet50/101), DeepLabv3 (ResNet50/101), SegNet, and the proposed SDCrackSeg. The highlighted regions indicate typical challenging patterns, including thin and low-contrast segments, tortuous trajectories, crack junctions, and background textures that resemble cracks.

Figure 8. Accuracy and efficiency comparison of crack segmentation models: mIoU versus FPS for SDCrackSeg and representative baselines.

Table 1. Quantitative Performance Comparison on CHCrack5K Dataset.

Model	Precision	Recall	mIoU	F1-Score	Dice
U-Net	0.846	0.887	0.786	0.865	0.622
FCN_Resnet50	0.845	0.855	0.767	0.850	0.558
FCN_Resnet101	0.839	0.851	0.761	0.845	0.544
Deeplabv3_Resnet50	0.805	0.896	0.760	0.844	0.551
Deeplabv3_Resnet101	0.795	0.905	0.756	0.841	0.547
SegNet	0.856	0.859	0.777	0.858	0.597
SDCrackSeg	0.900	0.878	0.816	0.888	0.675

Note: The best value in each metric is achieved by SDCrackSeg for Precision, mIoU, F1-Score, and Dice, and by Deeplabv3_Resnet101 for Recall.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, Z.; Liu, L.; He, T.; Ma, Y.; Shan, J. SDCrackSeg: A Frequency- and Spatial Geometry-Aware Topology-Preserving Network for Building Crack Segmentation. Buildings 2026, 16, 971. https://doi.org/10.3390/buildings16050971


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
