1. Introduction
Fracture information exposed on the tunnel face constitutes a direct indicator of rock-mass integrity, structural discontinuity, and potential instability during excavation. Accurate identification of these fractures is therefore indispensable for assessing surrounding-rock conditions, evaluating excavation safety, and informing timely support and reinforcement measures [1,2]. From the perspective of geoscience and engineering geology, tunnel-face fractures represent important discontinuity information that is routinely used in geological logging, rock-mass structure description, surrounding-rock classification, and excavation-support adjustment. Parameters such as fracture orientation, persistence, spacing, aperture, and connectivity are closely related to rock-mass quality evaluation and are commonly considered in field geological mapping and tunnel engineering assessment. Therefore, automated fracture recognition should not only focus on image segmentation accuracy but also provide interpretable geological information that can support practical engineering decisions. In real construction environments, however, reliable fracture recognition remains difficult because tunnel imagery is frequently degraded by uneven illumination, dust interference, heterogeneous rock textures, and the intrinsically slender, irregular, and discontinuous morphology of fracture traces. These factors collectively make robust automated detection a challenging task.
Early studies on fracture extraction were dominated by handcrafted image-processing techniques, including threshold segmentation, edge detection, and morphological filtering. Jiang et al. developed an adaptive thresholding approach based on the Canny operator for extracting fractures from tunnel-face images [1]. Talab et al. combined grayscale transformation, Sobel edge detection, and Otsu thresholding to identify fractures [2], whereas Hoang improved the Otsu method to enhance segmentation performance for surface fractures [3]. Nnolim further proposed a fully adaptive segmentation algorithm to suppress noise and textural interference on concrete surfaces [4]. Although such methods are computationally efficient and straightforward to implement, their effectiveness depends heavily on manually designed features and empirical parameter adjustment, which substantially constrains their robustness in complex tunnel settings.
With the rapid advancement of deep learning, convolutional neural network (CNN)-based methods have emerged as a powerful alternative for fracture recognition [5,6,7,8,9,10,11,12]. Unlike conventional image-processing pipelines, deep models can learn hierarchical feature representations directly from data and are therefore better suited to capturing fracture patterns under visually complex backgrounds. Bai et al. investigated the identification of tunnel defects from visible-spectrum images acquired in harsh environments, demonstrating the feasibility of deep models for tunnel fracture recognition [5]. Dai et al. improved YOLOv5 for tunnel fracture detection and achieved favorable performance [6]. Man et al. combined transfer learning with CNNs to identify tunnel water leakage and fractures [7]. Xu et al. compared Faster R-CNN and Mask R-CNN for fracture detection [11], while Wu and Zhang integrated improved Retinex preprocessing with deep learning to further enhance detection accuracy [12]. Collectively, these studies indicate that deep learning has become a major technical route for automated fracture identification in tunnel engineering.
Despite this progress, tunnel-face fracture recognition remains particularly challenging because fracture traces are often thin, low in contrast, ambiguous at the boundaries, and highly variable in scale and orientation. For such fine-grained targets, semantic segmentation is generally more suitable than bounding-box-based detection because it enables pixel-level localization while preserving structural details. Classical segmentation architectures, including U-Net, SegNet, PSPNet, and DeepLabv3+, have been widely adopted in engineering image analysis and have shown strong capability in boundary extraction and multiscale contextual modeling [13,14,15,16,17]. Among them, DeepLabv3+ combines an encoder–decoder architecture with atrous separable convolution, thereby enabling effective multiscale feature aggregation and boundary refinement, which makes it particularly attractive for fracture segmentation tasks [15,16].
In parallel, attention mechanisms have been increasingly introduced into segmentation networks to enhance salient feature representation and suppress irrelevant background responses. Coordinate Attention and Squeeze-and-Excitation are representative examples that improve feature discrimination by modeling channel-wise and positional dependencies [18,19]. Meanwhile, lightweight backbones such as MobileNetV2 and MobileNetV3 provide practical compromises between computational cost and predictive accuracy, thereby improving the feasibility of deployment in engineering scenarios [20,21]. In this study, MobileNetV2 is adopted as the backbone network of the proposed MF-DeepLabv3+ framework. Nevertheless, existing methods still exhibit notable limitations when applied to tunnel-face fracture segmentation, including insufficient sensitivity to thin fracture structures, inadequate boundary recovery, and excessive computational burden in field applications.
Recently, transformer-based segmentation models, such as SETR, SegFormer, Swin Transformer-based networks, and Mask2Former, have shown strong capability in global-context modeling and have achieved promising performance in semantic segmentation tasks. In addition, lightweight and hybrid architectures, including BiSeNet, Fast-SCNN, MobileViT, and lightweight DeepLab variants, have been developed to balance segmentation accuracy and computational efficiency. However, directly applying these general segmentation models to tunnel-face fracture recognition remains challenging. Transformer-based models usually require large-scale training data and substantial computational resources, whereas lightweight models may lose fine fracture details during feature compression. More importantly, existing attention-based segmentation methods are not specifically designed to address the simultaneous challenges of thin-fracture representation, geological noise suppression, fracture-continuity preservation, and field-deployment efficiency. Therefore, a task-specific segmentation framework is still needed for robust tunnel-face fracture recognition.
To overcome these limitations, this study proposes MF-DeepLabv3+, an improved DeepLabv3+-based framework specifically designed for tunnel-face fracture recognition. The proposed framework introduces a Multi-Scale Cross Attention module to enhance feature interaction across different receptive fields and improve the representation of slender fracture traces under complex geological backgrounds. A Feature Smoothing Module is further incorporated to suppress isolated noise responses and improve the continuity of discontinuous fracture predictions. In addition, a MobileNetV2 backbone is adopted to reduce model complexity and improve deployment feasibility. The overarching objective is to achieve a more effective balance among segmentation accuracy, robustness, fracture-continuity preservation, and computational efficiency, thereby providing a practical solution for intelligent tunnel-face fracture detection. In terms of deployment, the lightweight model is intended for near-real-time on-site analysis on low-power edge devices, such as the NVIDIA Jetson Orin series, and on field laptops equipped with mobile GPUs, such as the NVIDIA GeForce RTX 4060 Laptop GPU.
2. Model Modification
During tunnel excavation and subsequent operation and maintenance, accurate assessment of tunnel-face stability is essential for ensuring construction safety and long-term structural reliability. Because fractures directly reflect rock-mass discontinuity and local instability, their accurate identification and effective extraction are of central importance in tunnel engineering. Conventional detection approaches, however, often fail to simultaneously satisfy the requirements of high accuracy and high efficiency under the complex, dynamic, and interference-prone conditions of tunnel faces. With the rapid development of deep learning, semantic segmentation has opened new possibilities for tunnel-face fracture detection. Among existing architectures, DeepLabv3+ has shown considerable potential owing to its strong multiscale representation capability and encoder–decoder design. Nevertheless, to better accommodate the specific characteristics of tunnel-face fractures, including thin morphology, scale variability, and ambiguous boundaries, further adaptation of the model remains necessary.
2.1. Overall Architecture
MF-DeepLabv3+ is an enhanced semantic segmentation framework developed on the basis of DeepLabv3+ and specifically tailored for tunnel-face fracture detection. To improve the extraction of fractures with diverse scales, shapes, and spatial distributions, the model incorporates a Multi-Scale Cross Attention (MSCA) module. By employing convolutional operations with different dilation rates, the module simultaneously captures low-, medium-, and high-scale feature responses, and then performs adaptive fusion through a cross-scale attention mechanism. This design strengthens the model’s ability to represent fracture patterns across multiple receptive fields and is particularly effective for identifying fine fractures and structurally complex fracture distributions.
In addition, a Feature Smoothing Module is introduced to refine the extracted feature maps. This module combines depthwise separable convolution with residual connections to suppress noise and redundant activations while preserving discriminative fracture information. By smoothing unstable responses without sacrificing essential structural cues, the module improves feature quality and enhances model robustness under visually complex tunnel conditions.
To further reduce computational cost and satisfy the practical requirements of on-site engineering applications, MobileNetV2 [21] is adopted as the backbone network of MF-DeepLabv3+. MobileNetV2 is well known for its lightweight architecture and high efficiency. Its design couples depthwise separable convolution with an inverted residual structure, thereby substantially reducing computational complexity while maintaining effective feature extraction capability. Specifically, depthwise separable convolution factorizes standard convolution into depthwise and pointwise operations, whereas the inverted residual design alleviates information loss during feature transformation and improves parameter efficiency.
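The saving from factorizing a standard convolution can be illustrated with a simple weight count. The following is a minimal sketch, not the configuration of MF-DeepLabv3+: the layer sizes (256 input channels, 256 output channels, a 3 × 3 kernel) are hypothetical values chosen only for illustration, and bias terms are omitted.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k spatial filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 256, 256, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For these illustrative sizes the factorized layer needs roughly one ninth of the weights, which is the general pattern for 3 × 3 kernels when the number of output channels is large.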
Beyond computational advantages, MobileNetV2 also contributes favorable generalization performance, enabling the proposed framework to maintain stable detection capability across tunnel environments characterized by varying geological conditions, illumination changes, and dust interference. By integrating the MSCA module, the Feature Smoothing Module, and a MobileNetV2 backbone within the DeepLabv3+ architecture, MF-DeepLabv3+ achieves a more effective balance between segmentation accuracy, robustness, and computational efficiency. The overall architecture of the proposed model is shown in Figure 1.
2.2. Multi-Scale Cross Attention (MSCA) Module
The Multi-Scale Cross Attention (MSCA) module [22] is introduced to strengthen the model’s ability to represent fracture features across different spatial scales. It should be noted that the MSCA module is adapted from the multi-scale convolutional attention design of SegNeXt [22], and this study focuses on its task-oriented integration into DeepLabv3+ for tunnel-face fracture segmentation rather than proposing it as a completely new attention mechanism. In tunnel-face images, fractures exhibit substantial variation in width, length, continuity, and morphological complexity; therefore, effective multiscale feature extraction is essential for reliable segmentation. As illustrated in Figure 2, the MSCA module processes the input feature map using parallel convolutional branches with a kernel size of 3 and dilation rates of 1, 3, and 5, thereby generating low-, medium-, and high-scale feature responses. This design enables the network to simultaneously capture local detail, intermediate structural patterns, and broader contextual information associated with fractures of different sizes.
To further exploit the complementarity among these scale-specific representations, the module incorporates a cross-scale attention mechanism. The output feature of the MSCA module is formulated as
$$F_{\mathrm{out}} = W \odot \mathcal{F}(F_1, F_2, F_3)$$
where $F_1$, $F_2$, and $F_3$ are feature representations extracted at different scales, $\mathcal{F}(\cdot)$ denotes the fusion operation, $W$ is the learned attention map, and $\odot$ denotes element-wise multiplication.
These weights dynamically recalibrate the relative importance of features at different scales, allowing the network to emphasize the most informative responses according to image content. Consequently, fine fractures can be enhanced through the detail-sensitive features of the small-dilation branch, whereas larger or more continuous fracture structures can be better characterized through the contextual features of the larger-dilation branches.
By coupling multiscale feature extraction with adaptive cross-scale weighting, the MSCA module substantially improves the sensitivity of the network to complex fracture patterns. However, the enriched feature responses may also introduce redundant activations and high-frequency noise, which motivates the subsequent incorporation of a feature-smoothing strategy.
To justify the selection of dilation rates in the MSCA module, different dilation-rate combinations were considered. The tested settings included compact receptive-field combinations such as 1, 2, and 3 and 1, 3, and 5, as well as larger ASPP-like combinations such as 1, 6, and 12. The choice of 1, 3, and 5 was motivated by the morphological characteristics of tunnel-face fractures, which are typically thin, irregular, and discontinuous. Larger dilation rates enlarge the receptive field but may introduce redundant background texture and weaken the response to fine fracture boundaries. In contrast, the dilation rates of 1, 3, and 5 allow the model to capture local fracture details and medium-scale contextual features simultaneously, thereby improving the segmentation of slender and low-contrast fractures.
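The trade-off between the candidate dilation settings can be made concrete by computing the spatial extent covered by a single dilated kernel. The sketch below uses the standard relation k_eff = k + (k - 1)(d - 1) for a k × k kernel with dilation d; the grouping labels are taken from the text, while the function name is a hypothetical helper introduced only for illustration.

```python
def effective_kernel_size(kernel: int, dilation: int) -> int:
    """Spatial extent (in pixels) covered by one dilated convolution kernel."""
    return kernel + (kernel - 1) * (dilation - 1)


# Dilation-rate combinations mentioned in the text, all with 3 x 3 kernels.
combinations = {
    "compact (1, 2, 3)": (1, 2, 3),
    "adopted (1, 3, 5)": (1, 3, 5),
    "ASPP-like (1, 6, 12)": (1, 6, 12),
}

for name, rates in combinations.items():
    extents = [effective_kernel_size(3, d) for d in rates]
    print(f"{name}: effective extents {extents}")
```

With 3 × 3 kernels, the adopted rates cover 3, 7, and 11 pixels per branch, whereas the ASPP-like setting reaches 25 pixels in its widest branch, which is consistent with the concern that very large dilation rates draw in more background texture around thin fracture traces.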
2.3. Feature Smoothing Module
The Feature Smoothing Module is introduced to refine feature-map quality and improve the network’s ability to preserve subtle fracture characteristics in tunnel-face images. Its architecture is shown in Figure 3. In practical tunnel environments, the extracted features are often contaminated by uneven illumination, complex rock-surface textures, dust interference, and redundant responses generated during network inference. Such disturbances can obscure weak fracture signals and substantially impair the accurate segmentation of thin and discontinuous fractures. To address this problem, the proposed module integrates depthwise separable convolution, pointwise convolution, Batch Normalization, ReLU activation, and residual connections into a compact feature-refinement unit.
A central advantage of the module lies in its use of depthwise separable convolution, which decomposes standard convolution into a depthwise operation and a pointwise 1 × 1 convolution. The depthwise stage performs channel-wise spatial filtering to capture local structural patterns, while the pointwise stage enables inter-channel information fusion. This factorized design markedly reduces computational burden while retaining strong representational capacity, thereby supporting the lightweight objective of the overall network. Batch Normalization and ReLU are further incorporated to stabilize feature distribution, accelerate convergence, and enhance the discriminability of fracture-related responses.
Dropout with a probability of 0.1 was further incorporated into the Feature Smoothing Module to improve the robustness and generalization capability of the model. Since tunnel-face fracture features are often weak, slender, and easily disturbed by rock textures, illumination variation, and dust interference, Dropout helps reduce feature co-adaptation and suppress redundant or unstable responses during feature refinement. The dropout probability was set to 0.1 because a larger value may remove useful fine-scale fracture information and impair boundary continuity, whereas a smaller value provides limited regularization. Therefore, a probability of 0.1 was adopted to achieve a balance between noise suppression, model generalization, and preservation of subtle fracture features.
In addition, the module adopts a residual connection that directly adds the input features to the processed output. From a functional perspective, the Feature Smoothing Module can be interpreted as a local feature-regularization unit. Isolated activations caused by rock-surface texture, blasting marks, water stains, shadows, and illumination variations are usually spatially inconsistent and tend to appear as high-frequency noise, whereas true fracture traces generally exhibit locally continuous and elongated structural patterns. By combining depthwise separable convolution with residual refinement, the proposed module suppresses unstable local responses while preserving the dominant structural information of fractures, thereby improving both noise robustness and fracture continuity in the final segmentation results. This design preserves essential low-level structural information, mitigates feature degradation during transformation, and helps maintain sensitivity to weak fracture boundaries. By suppressing noise and redundant activations while retaining informative structural cues, the Feature Smoothing Module improves feature consistency and robustness, enabling the model to operate more reliably under the complex and variable conditions encountered in tunnel-face detection.
3. Dataset Construction
3.1. Data Acquisition
At present, research on tunnel-face fracture identification is constrained by the lack of publicly available, high-quality datasets, which has substantially limited the training, validation, and optimization of relevant algorithmic models. To address this deficiency and support more systematic investigation in this area, this study constructed a dedicated tunnel-face fracture dataset. Image data were collected from two representative engineering projects, namely the Qingdao Jiaozhou Bay Second Undersea Tunnel and the Yantai Urban Rapid Road Tunnel. Owing to their complex geological settings and diverse construction conditions, these projects provided abundant and highly representative tunnel-face images, thereby offering a reliable basis for investigating fracture characteristics and developing intelligent recognition methods.
To ensure data quality, image acquisition was performed using a Nikon D7000 camera (Nikon Corporation, Tokyo, Japan), assisted by two 32 W LED fill lights. The field acquisition setup is shown in Figure 4. For tunnels excavated by the drill-and-blast method, each excavation cycle typically includes drilling, charging, blasting, mucking, and arch-frame erection. Among these stages, the drilling and charging processes are generally unsuitable for image acquisition because construction trolleys and equipment severely obstruct the tunnel face, making it difficult to obtain clear and complete images.
After comprehensive evaluation and on-site verification, the optimal acquisition window was determined to be the period following the completion of mucking and scaling and preceding arch-frame erection. During this interval, the tunnel face is relatively unobstructed and less affected by construction activities, which creates favorable conditions for acquiring high-quality images without interfering with normal site operations.
In practical implementation, the two LED fill lights were positioned approximately 6 m in front of the tunnel face, one on each side, to provide more uniform illumination and improve the visibility of rock-mass surface features. For each tunnel face, multiple images were captured from different positions, viewing angles, and heights to ensure comprehensive coverage of fracture morphology and surrounding textural characteristics. Through this carefully designed acquisition strategy, a diverse and representative image dataset was established, providing a solid data foundation for subsequent fracture segmentation and quantitative analysis.
3.2. Dataset Annotation
High-quality annotation is a prerequisite for effective training of deep learning models for tunnel-face fracture segmentation. In this study, all 2153 tunnel-face images were annotated using Labelme v3.16.7, a widely adopted tool for image segmentation tasks. Because the target task belongs to semantic segmentation, the annotation process required pixel-level delineation of fracture boundaries, with annotators tracing each visible fracture contour point by point until the enclosed region was fully defined.
Two label categories were defined for the dataset: background and fracture. The background class corresponds to all non-fracture regions in the image, whereas the fracture class represents the rock-mass fracture areas of interest. During annotation, only the fracture contours were manually delineated, while the remaining pixels were automatically assigned to the background class by the annotation system, thereby improving labeling efficiency without compromising the semantic integrity of the dataset.
To ensure annotation accuracy, completeness, and geological validity, the entire labeling procedure was conducted under the supervision of professional geological engineers from the tunnel construction site. Based on their extensive field experience and expert understanding of fracture morphology, distribution patterns, and boundary characteristics, the annotated results were rigorously reviewed and corrected where necessary. This expert-guided workflow ensured close consistency between the labels and actual geological conditions, thereby enhancing the reliability and practical relevance of the dataset. Through this process, a high-quality tunnel-face fracture dataset was established, providing a robust foundation for model training and subsequent fracture-recognition research.
To further assess the reliability of the ground-truth labels, an inter-annotator agreement analysis was conducted. Two annotators independently labeled a representative subset of tunnel-face fracture images, and Cohen’s kappa coefficient was calculated based on pixel-level agreement between their binary annotation masks. The obtained Cohen’s kappa coefficient was 0.73, indicating substantial agreement between the annotators. For ambiguous regions, including blasting traces, rock joints, weak fracture boundaries, and texture-induced pseudo-fractures, the labels were further re-examined by trained annotators and geological engineers. Discrepant or ambiguous fracture boundaries were reviewed and corrected through joint discussion with an experienced domain expert before generating the final ground-truth labels. This quality-control procedure reduced subjective labeling bias and improved the reliability and reproducibility of the dataset.
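Pixel-level agreement between two binary masks can be computed directly from the definition of Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The following is a minimal sketch on flattened toy masks; the mask values are invented for illustration and are not taken from the dataset.

```python
def cohens_kappa(mask_a, mask_b):
    """Cohen's kappa for two flattened binary masks (0 = background, 1 = fracture)."""
    if len(mask_a) != len(mask_b) or not mask_a:
        raise ValueError("masks must be non-empty and of equal length")
    n = len(mask_a)
    # Observed agreement: fraction of pixels given the same label by both annotators.
    p_o = sum(1 for a, b in zip(mask_a, mask_b) if a == b) / n
    # Chance agreement from each annotator's marginal fracture rate.
    pa1 = sum(mask_a) / n
    pb1 = sum(mask_b) / n
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if p_e == 1.0:  # both masks are constant and identical
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy masks, invented for illustration only.
annotator_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
annotator_b = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(f"kappa = {cohens_kappa(annotator_a, annotator_b):.3f}")
```

In this toy case the annotators agree on 80% of pixels, yet kappa is considerably lower because most of that agreement is expected by chance when background dominates. This is why kappa is a more demanding check than raw agreement for sparse fracture masks.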
It should be noted that tunnel-face fracture segmentation is affected by a severe class imbalance problem. Examination of the annotated masks confirmed that fracture pixels occupy only a small portion of the image area, whereas most pixels belong to the background rock surface. This pixel-level imbalance may cause the model to favor the background class and reduce its sensitivity to thin and discontinuous fractures. Consequently, overall pixel Accuracy may be biased toward the dominant background class and is reported only as an auxiliary metric in this study. More segmentation-oriented metrics, including mAP, mIoU, per-class IoU, and fracture-class Precision, Recall, and F1-score, are adopted to evaluate the recognition performance of sparse fracture regions more reliably.
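The bias of pixel Accuracy under class imbalance can be seen with a trivial predictor that labels every pixel as background. The sketch below uses an invented toy mask in which 3% of pixels are fracture; this proportion is a hypothetical value for illustration and is not a statistic of the dataset.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)


def fracture_iou(pred, truth):
    """Intersection over union for the fracture class (label 1)."""
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, truth) if p == 1 or t == 1)
    return inter / union if union else 1.0


# Toy ground truth: 30 fracture pixels out of 1000 (hypothetical proportion).
truth = [1] * 30 + [0] * 970
all_background = [0] * 1000  # predictor that never detects a fracture

print(f"accuracy     = {pixel_accuracy(all_background, truth):.2f}")
print(f"fracture IoU = {fracture_iou(all_background, truth):.2f}")
```

The predictor reaches 97% Accuracy while detecting no fracture pixels at all, which is the reason class-specific overlap metrics are more informative for this task.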
3.3. Dataset Preprocessing
To reduce computational cost and storage demand while standardizing the dataset for deep learning applications, a systematic preprocessing pipeline was applied to all collected images. First, the raw image files were loaded from the input directory and converted from high-resolution 24-bit RGB images (4928 × 3264) to 8-bit grayscale images, as illustrated in Figure 5. This transformation was performed using a weighted average scheme to preserve the relative luminance contributions of different color channels and to retain the most informative structural content for fracture recognition.
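A weighted-average grayscale conversion can be sketched as follows. The text does not state which weights were used, so the ITU-R BT.601 luma coefficients (0.299, 0.587, 0.114) are assumed here because they are the common default in image libraries; the function names are hypothetical.

```python
# Assumed weights: ITU-R BT.601 luma coefficients (not stated in the text).
WEIGHTS = (0.299, 0.587, 0.114)


def rgb_to_gray(pixel):
    """Convert one 8-bit (R, G, B) pixel to an 8-bit gray level."""
    r, g, b = pixel
    value = WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b
    return min(255, max(0, int(round(value))))


def image_to_gray(image):
    """Convert an image given as rows of (R, G, B) tuples to rows of gray levels."""
    return [[rgb_to_gray(px) for px in row] for row in image]


# Tiny toy image, invented for illustration only.
toy = [[(255, 255, 255), (255, 0, 0)],
       [(0, 255, 0), (0, 0, 255)]]
print(image_to_gray(toy))
```

The green channel receives the largest weight because it contributes most to perceived luminance, so edges defined mainly by brightness differences survive the conversion.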
The conversion from RGB images to grayscale images was adopted for both engineering and computational considerations. In tunnel-face fracture recognition, fracture identification mainly depends on structural characteristics, including edge discontinuity, intensity contrast, geometric morphology, and spatial continuity, rather than color information. In addition, tunnel images collected in field environments are often affected by uneven illumination, dust, shadows, wet rock surfaces, and artificial lighting, which may introduce unstable color variations and interfere with model training. Grayscale conversion can reduce color-related disturbance, standardize the input data, and decrease computational cost.
Nevertheless, we acknowledge that RGB images may contain useful texture and lithological information. Therefore, the possible loss of color-related texture cues is a limitation of the current preprocessing strategy. In future work, multi-channel image input or adaptive color-space fusion will be investigated to further improve fracture representation under complex geological conditions.
In this study, grayscale preprocessing was adopted for all collected tunnel-face images. This choice was based on the observation that tunnel-face fractures are mainly reflected by luminance contrast, edge discontinuity, and local texture variation, whereas stable color information contributes relatively limited discriminative value. Moreover, RGB color responses in tunnel construction environments are easily affected by uneven artificial illumination, dust interference, water stains, camera exposure, and variations in shooting angle. Grayscale conversion therefore helps reduce redundant color variations and allows the model to emphasize fracture-related structural and textural information.
To further assess the effect of preprocessing strategy on segmentation performance, RGB input, grayscale input, and histogram-equalised grayscale input were compared under the same network architecture and training settings. The grayscale input achieved the highest overall performance, with an Accuracy of 92.47%, mAP of 82.56%, and mIoU of 62.99%. The RGB input obtained comparable but slightly lower results, with an Accuracy of 92.31%, mAP of 82.14%, and mIoU of 62.71%, indicating that color information provided a limited additional benefit for this task. The histogram-equalised grayscale input achieved an Accuracy of 91.86%, mAP of 80.92%, and mIoU of 61.84%. This decrease may be attributed to the fact that histogram equalisation enhances not only fracture contrast but also rock-surface texture, illumination noise, and background interference, resulting in more false fracture responses in complex tunnel-face images. Therefore, grayscale preprocessing was finally adopted as it provided a favorable balance between segmentation performance, robustness, and computational efficiency.
Following grayscale conversion, bilinear interpolation was used to resize the images from (4928 × 3264) to (512 × 512). This step substantially reduced the number of pixels and, consequently, the computational burden associated with model training and inference, while preserving the principal geometric and textural characteristics relevant to fracture segmentation. The processed images were then stored in PNG format in the output directory.
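The resizing step can be illustrated with a small pure-Python bilinear interpolation. This is a sketch under stated assumptions rather than the pipeline actually used: it adopts the pixel-centre sampling convention, which is one common choice, and the toy image is invented for illustration. The final lines compute the pixel-count reduction implied by the two resolutions given in the text.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D list of gray levels with bilinear interpolation.

    Assumes the pixel-centre convention: output pixel centres are mapped
    back to input coordinates and clamped to the image border.
    """
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Toy 4 x 4 horizontal gradient, invented for illustration.
gradient = [[0, 10, 20, 30] for _ in range(4)]
small = bilinear_resize(gradient, 2, 2)
print(small)

# Pixel-count reduction for the resolutions stated in the text.
reduction = (4928 * 3264) / (512 * 512)
print(f"pixels reduced by a factor of about {reduction:.2f}")
```

Because the source images are not square, a direct resize to 512 × 512 also changes the aspect ratio; the text does not say whether cropping or padding was applied, so the sketch leaves that choice open.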
Through this preprocessing procedure, the dataset was converted into a standardized and computationally efficient form suitable for network training. The resulting reduction in data volume improved processing efficiency and provided a consistent input format for subsequent model development, evaluation, and deployment.
4. Comparison and Analysis of Experimental Results
4.1. Experimental Setup, Training Procedure, and Evaluation Metrics
The experiments were conducted using the tunnel-face fracture dataset described in Section 3 to train, validate, and test the proposed MF-DeepLabv3+ framework. The dataset contained 2153 annotated tunnel-face images and was divided into training, validation, and testing sets according to a ratio of 7:2:1. Specifically, 1507 images were used for training, 431 images were used for validation, and 215 images were reserved as an independent test set for final performance evaluation. The training set was used for model optimization, the validation set was used for monitoring convergence and parameter selection, and the testing set was used only for the final evaluation of segmentation performance.
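A reproducible 7:2:1 split that yields the stated subset sizes can be sketched as follows. The file names are hypothetical placeholders, and the rounding rule (round the training and validation counts, assign the remainder to testing) is an assumption that happens to reproduce the reported 1507/431/215 partition; the actual splitting code is not described in the text.

```python
import random


def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle items with a fixed seed and split them into train/val/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical file names standing in for the 2153 annotated images.
names = [f"face_{i:04d}.png" for i in range(2153)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))
```

Using a dedicated `random.Random(seed)` instance keeps the split independent of any other random calls made elsewhere in a training script.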
During training, the batch size was set to 8, the learning rate was initialized at 0.001 and kept fixed throughout the 300 training epochs, and no early-stopping strategy was used. The cross-entropy loss function was adopted for pixel-wise fracture/background classification. To improve experimental repeatability, the random seed was fixed at 42 for parameter initialization, dataset splitting, and data loading. For all comparative experiments, the compared models were trained and evaluated under the same experimental conditions to ensure fairness. Specifically, all models used the same dataset split, image preprocessing procedure, batch size, learning rate, number of training epochs, loss function, optimizer setting, random seed, and hardware/software environment. No additional data or model-specific training strategy was introduced for any compared model.
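The pixel-wise cross-entropy loss for the two-class case can be written out explicitly. The sketch below is a plain-Python illustration of the definition, not the training code: each pixel carries a pair of logits (background, fracture), and the loss is the negative log of the softmax probability assigned to the true class, averaged over pixels. The toy logits and labels are invented for illustration.

```python
import math


def pixel_cross_entropy(logits, label):
    """Cross-entropy for one pixel given (background, fracture) logits."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(logit_map, label_map):
    """Average pixel-wise cross-entropy over flattened logits and labels."""
    losses = [pixel_cross_entropy(z, y) for z, y in zip(logit_map, label_map)]
    return sum(losses) / len(losses)


# Toy example: two pixels, both classified confidently and correctly.
logit_map = [(2.0, -1.0), (0.0, 3.0)]
label_map = [0, 1]
print(f"mean loss = {mean_cross_entropy(logit_map, label_map):.4f}")
```

Because every pixel contributes equally to the average, the loss is dominated by background pixels when fractures are sparse, which is one reason the evaluation relies on fracture-specific metrics.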
Considering the limited size of the constructed dataset and the complex visual conditions of tunnel faces, data augmentation was applied during model training to improve generalization ability. The augmentation operations included random horizontal flipping, random rotation, random scaling, random cropping, and brightness/contrast adjustment. These operations were used to simulate variations in fracture orientation, image scale, shooting distance, and illumination conditions encountered in real tunnel environments. No augmentation was applied to the validation or testing sets to ensure an objective evaluation of model performance.
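In segmentation, geometric augmentations have to be applied identically to the image and its label mask, whereas photometric changes apply to the image only. The following is a minimal sketch of that pairing for two of the listed operations; the flip probability and brightness factor are hypothetical values, since the text does not report the augmentation parameters.

```python
import random


def random_hflip(image, mask, rng, p=0.5):
    """Flip image and mask together so labels stay aligned with pixels."""
    if rng.random() < p:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    return image, mask


def adjust_brightness(image, factor):
    """Scale gray levels and clip to the 8-bit range; the mask is untouched."""
    return [[min(255, max(0, int(round(v * factor)))) for v in row] for row in image]


# Toy 2 x 3 image and mask, invented for illustration.
image = [[100, 150, 250], [90, 80, 70]]
mask = [[0, 0, 1], [0, 1, 0]]

rng = random.Random(42)                                 # fixed seed for repeatability
aug_image, aug_mask = random_hflip(image, mask, rng, p=1.0)  # p=1.0 forces the flip
aug_image = adjust_brightness(aug_image, 1.2)           # hypothetical factor
print(aug_image)
print(aug_mask)
```

Keeping the mask out of the brightness step ensures that label values remain valid class indices after augmentation.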
All model training and testing experiments were conducted on a laptop workstation equipped with an Intel(R) Core(TM) i7-13700H CPU, 16 GB RAM, and an NVIDIA GeForce RTX 4060 Laptop GPU. The Intel(R) Arc(TM) Graphics processor was the integrated/auxiliary graphics device of the laptop and was not used for model training or evaluation. The software environment included Python 3.7.1, PyTorch 1.10.2 + cu113, and CUDA 11.3.
In addition, the imbalance between fracture and background pixels was considered during training. Since fracture regions are usually slender and sparse, background pixels dominate most tunnel-face images. This imbalance may weaken the learning of fracture features. To alleviate this problem, data augmentation was used to increase the diversity of fracture samples, and mIoU and mAP were adopted as important evaluation metrics because they are more sensitive to segmentation quality than overall pixel accuracy alone. Considering the severe class imbalance between fracture and background pixels, overall Accuracy was used only as an auxiliary metric. The main evaluation metrics included mAP, mIoU, per-class IoU, and fracture-class Precision, Recall, and F1-score. For binary fracture segmentation, the IoU values of the fracture class and background class were calculated separately, and mIoU was obtained by averaging the two class IoU values. Precision, Recall, and F1-score were calculated specifically for the fracture class to evaluate fracture detection reliability and completeness.
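The class-wise metrics described above follow directly from the binary confusion counts. The sketch below computes per-class IoU, mIoU as the average of the two class IoU values, and fracture-class Precision, Recall, and F1-score on flattened masks; the toy prediction is invented for illustration, and mAP is omitted because its exact computation is not specified in the text.

```python
def segmentation_metrics(pred, truth):
    """Binary segmentation metrics from flattened masks (1 = fracture)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    def safe_div(a, b):
        return a / b if b else 0.0

    iou_fracture = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "iou_fracture": iou_fracture,
        "iou_background": iou_background,
        "miou": (iou_fracture + iou_background) / 2,
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Toy masks, invented for illustration only.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = segmentation_metrics(pred, truth)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case one missed and one spurious fracture pixel lower the fracture IoU to 0.6 while Accuracy stays at 0.8, showing how the overlap metrics penalize errors on the minority class more visibly.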
The mIoU curve of the MF-DeepLabv3+ model over the training epochs is shown in
Figure 6. As a critical metric for evaluating semantic segmentation performance, mIoU reflects the overlap between the predicted fracture regions and the ground-truth annotations. In the initial training stage, the proposed architecture rapidly captured salient structural information from the tunnel-face images, resulting in a pronounced increase in mIoU. Between 50 and 150 epochs, once the major discriminative features had been learned, further performance improvement increasingly depended on the extraction of subtle and fine-scale fracture characteristics. Under the combined influence of data noise, feature ambiguity, and model complexity, the mIoU curve exhibited a fluctuating but overall upward trend during this interval. After approximately 150 epochs, the mIoU curve gradually stabilized, indicating that the segmentation performance approached a steady state.
The training dynamics of the MF-DeepLabv3+ model are further illustrated in
Figure 7, which presents the evolution of the training loss, validation loss, and smoothed loss over 300 epochs. As training progressed, the loss values generally decreased, indicating effective optimization of the model parameters. In the early stage, the model exhibited rapid learning and fast loss reduction, suggesting efficient extraction of fundamental fracture-related features. After approximately 50 epochs, the rate of decline gradually slowed, and the loss curves began to flatten. By around 150 epochs, the optimization process had entered a relatively stable phase. After 200 epochs, both the training and validation losses remained relatively stable, while the smoothed loss curves showed a high degree of consistency, suggesting that the model had converged to a stable solution without evident optimization instability.
4.2. Ablation Study
To evaluate the effectiveness of the proposed modifications, this study systematically compared three improved variants derived from the DeepLabv3+ framework with the original DeepLabv3+ model on the tunnel-face fracture segmentation task. The comparison focused on both segmentation performance and model complexity. Specifically, Accuracy, mAP (mean Average Precision), and mIoU (mean Intersection over Union) were adopted to assess recognition and segmentation performance, whereas GFLOPs and parameter count were used to characterize computational cost and model scale. The quantitative results of the ablation study are summarized in
Table 1, and representative visual comparisons of fracture-segmentation performance are presented in
Figure 8.
In addition to the ablation study, a sensitivity analysis was conducted to evaluate the influence of different dilation-rate combinations in the MSCA module. The tested settings included compact receptive-field combinations, such as (1, 2, 3) and (1, 3, 5), as well as larger ASPP-like combinations, such as (1, 6, 12). The combination (1, 3, 5) achieved the most favorable overall segmentation performance, indicating that compact dilation rates are more suitable for tunnel-face fracture segmentation than larger ASPP-like dilation rates. A plausible reason is that tunnel-face fractures are generally thin, irregular, and discontinuous, so excessively large receptive fields may introduce redundant rock textures and illumination interference, thereby weakening the response to fine fracture boundaries.
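The difference between compact and ASPP-like settings can be made concrete by computing the spatial extent of a single dilated kernel, k + (k - 1)(d - 1) for kernel size k and dilation rate d. The sketch below assumes 3 × 3 kernels, which is an assumption for illustration because the kernel size of the MSCA branches is not restated here.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent (in pixels) covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

# Dilation-rate combinations from the sensitivity analysis; 3 x 3 kernels assumed.
for rates in [(1, 2, 3), (1, 3, 5), (1, 6, 12)]:
    extents = [effective_kernel_size(3, d) for d in rates]
    print(rates, extents)
# (1, 2, 3)  -> [3, 5, 7]
# (1, 3, 5)  -> [3, 7, 11]
# (1, 6, 12) -> [3, 13, 25]
```

Under this assumption, the largest branch of the (1, 6, 12) setting spans 25 pixels on the feature map, more than twice the 11 pixels of the (1, 3, 5) setting, which is consistent with the interpretation that wide branches mix in more background texture around thin fracture traces.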
Figure 8 provides a qualitative comparison between the original field image and the manually annotated fracture image. The comparison shows that the manually identified fracture traces correspond well to the visible fracture features in the field image, providing a visual reference for evaluating the segmentation and subsequent geometric characterisation results. However, this comparison is qualitative and does not constitute a full quantitative validation of the extracted fracture lengths and widths. The fracture lengths and widths were derived from the image-based segmentation results and subsequent geometric analysis. The corresponding quantitative results are presented later in the manuscript. Since independently measured field-scale fracture dimensions were not available, these values should be interpreted as approximate image-based geometric indicators rather than fully validated ground-truth measurements. Their accuracy may be affected by segmentation quality, image resolution, scale calibration, and the assumptions adopted in the length and width estimation procedures.
This section provides a more detailed analysis of the ablation results from the perspectives of segmentation accuracy, feature representation, qualitative visualization, and computational complexity. As shown in
Table 1, the MSCA-DeepLabv3+ variant achieved an Accuracy of 92.49%, an mAP of 80.71%, and an mIoU of 63.80%, showing clear improvement over the original DeepLabv3+ model. This indicates that the MSCA module effectively enhances multi-scale fracture feature extraction by integrating contextual information from different receptive fields. Such a design is particularly beneficial for tunnel-face fracture segmentation because fractures usually exhibit significant variations in width, length, continuity, and orientation.
The Feature Smoothing-DeepLabv3+ variant achieved an mAP of 80.81%, which is higher than that of the original DeepLabv3+. This suggests that the Feature Smoothing module can suppress noisy and redundant responses in the feature maps, thereby reducing false-positive predictions and improving the precision-related evaluation metric. However, its mIoU decreased to 57.09%. This phenomenon may be attributed to the fact that, when used alone, the smoothing operation may weaken some fine, discontinuous, or low-contrast fracture responses while suppressing noise. As a result, certain fracture pixels may be missed, leading to incomplete fracture-region coverage and a reduction in the overlap between the predicted masks and the ground-truth annotations. Therefore, the Feature Smoothing module improves detection precision to some extent but may reduce segmentation completeness when it is not supported by sufficiently strong multi-scale feature representation.
When the MSCA module and the Feature Smoothing module are jointly integrated, the resulting MF-DeepLabv3+ model achieves an Accuracy of 92.47%, an mAP of 82.56%, and an mIoU of 62.99%. Its Accuracy and mIoU are slightly lower than those of MSCA-DeepLabv3+ (92.49% and 63.80%, respectively), but its mIoU remains higher than that of the original DeepLabv3+ model and is accompanied by the highest mAP. This suggests that the combined architecture achieves a more favorable balance between fracture detection precision and segmentation-region completeness. The MSCA module first strengthens the representation of fracture features across multiple scales, while the Feature Smoothing module further refines these enhanced responses by suppressing background interference and redundant activations. These complementary effects offer a plausible explanation for why MF-DeepLabv3+ achieves the most balanced overall segmentation performance.
The qualitative segmentation comparisons in
Figure 8 further support the quantitative results of the ablation study. Compared with the original DeepLabv3+ model, the MSCA-enhanced variant shows improved sensitivity to slender and discontinuous fracture structures. The final MF-DeepLabv3+ model preserves more continuous fracture predictions while reducing scattered background misclassifications. These visual observations are consistent with the improvements in mAP and mIoU reported in
Table 1 and further demonstrate the complementary effects of multi-scale feature enhancement and feature smoothing.
It should be noted that
Figure 8 presents qualitative segmentation visualizations rather than explicit attention maps or learned feature-response maps. Attention-map and feature-response visualization would provide additional interpretability for the proposed modules. In future work, more systematic model-interpretability analyses, including attention visualization and feature-response mapping, will be conducted to further investigate the internal response mechanism of the proposed network.
From the perspective of model complexity, clear differences were observed among the compared architectures. The original DeepLabv3+ and Feature Smoothing-DeepLabv3+ exhibited relatively lower GFLOPs and parameter counts, whereas MSCA-DeepLabv3+ and MF-DeepLabv3+ required higher computational cost and larger model capacity. The increased complexity of MF-DeepLabv3+ is mainly introduced by the MSCA and Feature Smoothing modules. Considering that the mIoU improvement over the original DeepLabv3+ is modest, the additional computational cost should be interpreted cautiously.
Overall, the ablation results reveal a trade-off among fracture recognition capability, region-level segmentation overlap, and computational efficiency. The Feature Smoothing-only variant is not recommended as a standalone configuration because, although it improves mAP, it causes a clear decrease in Accuracy and mIoU. Its role should therefore be interpreted as a complementary refinement component that is more effective when combined with MSCA-based multi-scale feature enhancement. When the objective is to improve fracture recognition and localization capability, MF-DeepLabv3+ provides the most favorable mAP and a modest mIoU improvement over the baseline. However, when computational resources are highly constrained, the original DeepLabv3+ may be preferable because of its lower cost, and when the primary evaluation criterion is mIoU alone, MSCA-DeepLabv3+ may be preferable because it achieved the highest mIoU (63.80%). These findings provide a more balanced basis for selecting appropriate segmentation architectures under different engineering constraints and evaluation priorities.
4.3. Comparative Experiments
To further evaluate the effectiveness of the proposed method, comparative experiments were conducted using U-Net, Channel-UNet [23], PSPNet, SegNet, DeepLabv3, and the proposed MF-DeepLabv3+ model. For fair comparison, both DeepLabv3 and MF-DeepLabv3+ employed MobileNetV2 as the backbone network. The evaluation metrics included Accuracy, mAP, mIoU (mean Intersection over Union), GFLOPs, and parameter count, enabling a joint assessment of segmentation performance and computational complexity. It should be noted that the reported mAP in this study was calculated from pixel-level fracture probability maps rather than object-level detections. Specifically, each pixel was treated as an individual prediction sample, and the precision–recall curve was obtained by varying the confidence threshold of the predicted fracture probability map. Therefore, no bounding boxes, object-level detections, or object-level IoU thresholds were involved in the calculation. In this study, mAP is interpreted only as an auxiliary pixel-level indicator, while mIoU and fracture-specific metrics are emphasized as the main semantic segmentation evaluation metrics.
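The pixel-level average precision described above can be sketched as follows. The sketch treats every pixel as one prediction sample and sweeps the threshold over the sorted fracture probabilities. The step-wise summation of precision at each true fracture pixel is one common convention and is an assumption here, since the exact integration rule and any averaging over classes are not specified in the text; the function name and toy values are likewise illustrative.

```python
def pixel_average_precision(probs, truth):
    """Average precision of the fracture class over individual pixels.

    probs: predicted fracture probability per pixel (flat sequence).
    truth: ground-truth label per pixel, 1 for fracture and 0 for background.
    """
    total_positive = sum(truth)
    if total_positive == 0:
        return 0.0
    # Lowering the threshold step by step is equivalent to walking through
    # the pixels in order of decreasing probability (ties keep input order).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    true_positive = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if truth[i] == 1:
            true_positive += 1
            precision_sum += true_positive / rank  # precision at this recall step
    return precision_sum / total_positive

# Toy example with made-up probabilities for five pixels.
probs = [0.9, 0.8, 0.7, 0.6, 0.2]
truth = [1, 0, 1, 0, 0]
ap = pixel_average_precision(probs, truth)  # (1/1 + 2/3) / 2
```

No bounding boxes or object-level IoU thresholds appear anywhere in this computation, which is the distinction from detection-style mAP drawn in the text.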
As shown in Table 2, the proposed MF-DeepLabv3+ model achieved the best overall performance among the compared methods. MF-DeepLabv3+ reached an Accuracy of 92.47%, an improvement over the baseline DeepLabv3 model. Its pixel-level mAP increased to 82.56%, a substantial gain relative to DeepLabv3, while its mIoU reached 62.99%. Because mAP is more commonly associated with object detection, it is used only as an auxiliary indicator in this study, and the main evaluation and discussion focus on standard semantic segmentation metrics, especially mIoU and fracture-specific indicators. Taken together, these results support the effectiveness of the multi-scale feature fusion strategy. By integrating contextual information at different receptive-field scales, the proposed method is better able to capture the heterogeneous morphological characteristics of fractures in complex tunnel-face scenes, thereby improving both recognition precision and segmentation consistency. Although Channel-UNet also exhibited comparatively strong performance in terms of mAP, it remained inferior to MF-DeepLabv3+, suggesting that channel attention alone is insufficient to match the representational advantages provided by multi-scale feature fusion.
From the perspective of computational efficiency, MF-DeepLabv3+ maintained a favorable balance between segmentation performance and model complexity. Although its computational cost increased relative to the original DeepLabv3, the parameter count remained at a moderate level and was still substantially lower than that of conventional architectures such as PSPNet and U-Net. This modest increase in computational burden resulted in marked gains in segmentation quality, particularly in mAP, highlighting the strong cost-effectiveness of the proposed design. These results suggest that MF-DeepLabv3+ achieves a desirable compromise between computational efficiency and feature representation capacity, making it well suited for fracture segmentation tasks that demand both accuracy and practical deployability.
It should be noted that the current comparative experiments mainly focus on classical convolutional semantic segmentation models. Recent transformer-based and lightweight hybrid segmentation architectures, such as SegFormer, Mask2Former, Swin-Unet, and MobileViT-based segmentation networks, have shown strong feature representation capabilities in semantic segmentation tasks. However, these models were not included in the current comparison due to the limited size of the constructed tunnel-face fracture dataset, the severe class imbalance between fracture and background pixels, the available hardware resources, and the main objective of this study, which is to improve the DeepLabv3+ framework through multi-scale feature enhancement and feature smoothing for fracture segmentation. In addition, fair comparison with these recent models usually requires model-specific training strategies, careful hyperparameter tuning, and more extensive computational resources. Therefore, the current comparison is intended to evaluate the proposed modifications against representative CNN-based segmentation baselines rather than to claim exhaustive superiority over all recent state-of-the-art segmentation models. Comparisons with more recent transformer-based and lightweight hybrid models will be conducted in future work.
Although inference latency was not systematically benchmarked for all compared models or on dedicated edge devices in the present study, the relatively low parameter count and moderate GFLOPs of the proposed MF-DeepLabv3+ model, together with the lightweight MobileNetV2 backbone, indicate favorable potential for near-real-time deployment in practical tunnel engineering applications. It should be noted that GFLOPs and parameter count are only hardware-independent indicators of model complexity and cannot fully represent actual inference speed. Runtime performance may also be affected by hardware architecture, memory access, software implementation, batch size, and deployment optimization. Therefore, systematic inference-time evaluation, including FPS and milliseconds per image for all compared models on field-deployable hardware, will be conducted in future work.
Although MF-DeepLabv3+ introduces additional computational complexity compared with the original DeepLabv3+ model, the increase remains within an acceptable range for the tunnel-face fracture segmentation task. The additional GFLOPs and parameters are mainly introduced by the MSCA module and the Feature Smoothing module, which are designed to enhance multi-scale fracture representation and suppress redundant background responses. From the perspective of segmentation performance, the proposed model achieves the highest mAP among the compared models and also improves mIoU over the baseline DeepLabv3+. The improvement in pixel-level mAP suggests that the proposed framework produces more reliable fracture probability predictions at the pixel level, while the improvement in mIoU indicates better spatial overlap between the predicted fracture regions and the ground-truth annotations. Nevertheless, mAP is interpreted only as an auxiliary metric in this study, and the main conclusions are drawn primarily from mIoU and fracture-specific segmentation metrics. Therefore, the additional computational cost is considered acceptable given the improved segmentation reliability and fracture recognition performance.
Considering the severe class imbalance between fracture and background pixels, additional class-specific metrics were further introduced, as shown in
Table 3.
As shown in
Table 3, the background class maintains a high IoU due to its dominant pixel proportion, whereas the fracture class shows a much lower IoU under severe class imbalance. Compared with DeepLabv3, the proposed MF-DeepLabv3+ improves fracture-class IoU, fracture-class Precision, and fracture-class Recall, indicating that the proposed model is more sensitive to thin, irregular, and discontinuous fracture regions while maintaining stable background segmentation performance. These fracture-specific metrics provide a more direct evaluation of fracture segmentation performance than overall pixel Accuracy. Fracture-class IoU reflects the spatial overlap between predicted fracture regions and ground-truth annotations, Precision reflects the reliability of predicted fracture pixels, and Recall reflects the completeness of fracture detection. Therefore, in this study, overall Accuracy and pixel-level mAP are interpreted only as supplementary indicators, whereas mIoU, per-class IoU, and fracture-specific Precision and Recall are emphasized as the main indicators for evaluating segmentation quality under class imbalance.
4.4. Limitations of Experimental Repeatability and Statistical Significance
Although the experimental results indicate that the proposed MF-DeepLabv3+ model improves fracture recognition performance, this study still has limitations in terms of statistical repeatability and significance analysis. The reported quantitative results are based on a single fixed dataset split and random seed setting. Due to the limited size of the constructed tunnel-face fracture dataset and the relatively high computational cost of repeatedly training multiple segmentation models, multi-seed repeated experiments and confidence-interval analysis were not conducted in the present study.
Therefore, the statistical significance of small metric differences, particularly the mIoU improvement from 62.20% to 62.99%, cannot be fully confirmed based on the current single-run results. Statistical methods such as repeated experiments with different random seeds, bootstrapped confidence intervals, or McNemar’s test would be required to determine whether such a small improvement is statistically significant.
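One of the procedures mentioned above, a bootstrapped confidence interval, can be sketched as follows. The sketch resamples paired per-image IoU differences between two models and reports a percentile interval for the mean difference. The per-image values are made-up placeholders, not results from this study, and the use of per-image IoU (rather than the dataset-level mIoU reported in the tables) is a simplifying assumption.

```python
import random

def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample the test images with replacement.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical paired differences (proposed minus baseline IoU) on 8 test images.
iou_differences = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02, 0.04, 0.01]
lower, upper = bootstrap_mean_ci(iou_differences)
# If the interval contains 0, the improvement cannot be distinguished from noise.
contains_zero = lower <= 0.0 <= upper
```

This procedure captures only the variability due to the choice of test images. Variability due to random initialization would still require repeated training runs with different seeds.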
To improve experimental transparency, the loss function, learning-rate setting, random seed, and early-stopping strategy are described in Section 4.1. In future work, the dataset will be further expanded, and repeated experiments under different random seeds will be conducted to report the mean values, standard deviations, and confidence intervals of the main evaluation metrics. This will further verify the statistical robustness and generalization ability of the proposed method under different initialization and data-splitting conditions.
5. Fracture Post-Processing
The output of a semantic segmentation model provides only an initial delineation of fracture regions and does not, by itself, fully satisfy the needs of engineering practice. Raw segmentation results cannot comprehensively characterize the structural condition of the surrounding rock mass. In contrast, the extraction of higher-level fracture attributes, including grouping, length, and width, provides more actionable information for engineering interpretation and decision-making.
In practical geoscience and tunnelling applications, these fracture attributes are closely related to standard field geological mapping and engineering geological assessment. For example, fracture orientation and grouping can be used to identify dominant discontinuity sets, fracture length can reflect persistence, and fracture width can provide an indication of aperture development. These parameters are commonly considered in rock-mass quality evaluation, surrounding-rock classification, stability assessment, and the selection or adjustment of tunnel support measures. Therefore, the post-processing results in this study are designed to provide quantitative and visually interpretable information that can be used as auxiliary evidence for engineering geological logging and tunnel-face assessment.
From the perspective of stability evaluation, fracture grouping reveals the spatial distribution patterns of discontinuities and allows engineers to identify structurally weak zones within the rock mass. Fracture length and width, meanwhile, are critical geometric indicators for quantitative risk assessment. Fractures with large apertures and extended persistence often imply reduced rock-mass integrity, lower load-bearing capacity, and elevated construction risk. Such information is essential for support design, excavation management, and safety control, as it enables more rational planning of reinforcement strategies and construction schedules. In addition, these quantitative descriptors provide a basis for establishing long-term tunnel health monitoring datasets, thereby supporting preventive maintenance and the long-term operational stability of underground infrastructure.
5.1. Fracture Grouping
Fracture grouping aims to classify complex fracture systems according to specific structural criteria, thereby facilitating a more systematic analysis of fracture distribution characteristics. In this study, two principal strategies were adopted for fracture grouping.
- (1) Fracture Grouping Based on Edge–Parameter Space Mapping
Edge detection is the first critical step in extracting fracture features from tunnel-face images. In this work, the Canny operator was employed to process fracture images of the tunnel face. Specifically, Gaussian filtering was first applied to smooth the image and suppress noise, thereby improving the stability of subsequent edge extraction. The gradient magnitude and gradient direction were then computed to enhance potential boundary information. Next, non-maximum suppression (NMS) was used to refine the detected edges and eliminate spurious responses. Finally, a double-threshold strategy was applied to distinguish true edges from weak and noise-induced responses.
After obtaining high-quality fracture-edge information, the Hough transform was introduced to detect fracture line segments. As a classical algorithm in image processing, the Hough transform operates by establishing a mapping relationship between image space and parameter space, thereby converting line structures that are difficult to identify directly in the image domain into prominent peak features in parameter space.
In practical implementation, the Hough transform identifies edge points that may belong to the same line through a voting mechanism. The process begins with voting in parameter space according to the gradient-direction information of each edge point, thereby accumulating evidence for candidate lines. Subsequently, a peak-search procedure is performed in parameter space. Because collinear edge points vote for the same neighborhood of parameter space and therefore generate pronounced peaks, accurate localization of these peaks enables the detection of candidate fracture lines supported by a large number of aligned edge pixels. Finally, to further improve the precision and robustness of line detection, the least-squares method was employed to refine the fitting of the extracted line segments.
This fitting procedure effectively reduces the influence of edge fluctuations caused by noise and local disturbances, resulting in smoother and more accurate fracture-line representations. Such refined line-segment extraction provides a reliable basis for subsequent analyses of fracture length and other geometric parameters. The results of edge detection and line detection are presented in
Figure 9.
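The voting and peak-search steps of the Hough transform described above can be illustrated with a minimal accumulator in the normal form ρ = x cos θ + y sin θ. The sketch is a simplification and not the implementation used in this study: each edge point votes for all discretized angles rather than using gradient-direction information, the least-squares refinement is omitted, and the function name, resolutions, and synthetic points are assumptions.

```python
import math

def hough_peak(edge_points, n_theta=180):
    """Return (rho, theta_degrees, votes) of the strongest line.

    Lines are parameterized as rho = x * cos(theta) + y * sin(theta),
    with theta sampled in n_theta steps over [0, 180) degrees and rho
    rounded to whole pixels.
    """
    votes = {}
    for x, y in edge_points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, k)] = votes.get((rho, k), 0) + 1
    # Peak search: the accumulator cell with the most votes.
    (rho, k), count = max(votes.items(), key=lambda item: item[1])
    return rho, 180.0 * k / n_theta, count

# Synthetic edge map: 60 collinear points on the row y = 5 plus 3 noise points.
edge_points = [(x, 5) for x in range(60)] + [(3, 17), (40, 2), (12, 30)]
rho, theta_deg, count = hough_peak(edge_points)
# The peak is the cell (rho = 5, theta = 90 degrees) with 60 votes.
```

Note that θ here is the direction of the line normal, so a peak at 90° corresponds to a horizontal line in the image. The three noise points spread their votes over many cells and do not form a competing peak.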
Further orientation-based grouping was performed on the basis of the line segments detected by the Hough transform. For each extracted line, the inclination angle was calculated from the endpoint coordinates using the arctangent function. To classify the detected fractures by orientation, the full angular range was divided into intervals of 60°, with each interval assigned a distinct color. Each fracture line was then mapped to its corresponding group according to the interval containing its calculated angle.
Subsequently, the grouped line segments were rendered onto the image using the cv2.line function, allowing fractures with different orientations to be visualized in different colors. This procedure provides an intuitive representation of orientation-dependent fracture grouping and facilitates the analysis of dominant fracture sets within the tunnel face. The resulting grouping effect is illustrated in
Figure 10.
To improve the geological interpretability of the extracted fracture orientations, the orientation-based grouping was refined from the original 60° intervals to 30° angular bins. Specifically, the extracted fracture traces were classified into six orientation ranges: 0–30°, 30–60°, 60–90°, 90–120°, 120–150°, and 150–180°. In addition, a rose diagram was introduced to statistically visualise the orientation distribution of the extracted fracture traces. This refinement provides a more detailed representation of fracture orientation characteristics and helps identify dominant orientation trends more clearly.
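The orientation computation and binning described in this subsection can be sketched as follows. The function names and the example segments are hypothetical. Orientations are folded into [0°, 180°) because a line segment has no preferred direction, and the bin width is a parameter so that both 60° and 30° intervals can be reproduced.

```python
import math
from collections import Counter

def orientation_deg(x1, y1, x2, y2):
    """Orientation of a segment in [0, 180) degrees from its endpoints.

    Image rows grow downward, so angles measured in pixel coordinates are
    mirrored relative to the usual mathematical convention.
    """
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

def orientation_bin(angle_deg, bin_width=30.0):
    """Index of the angular bin that contains angle_deg."""
    n_bins = int(round(180.0 / bin_width))
    # The min() guards against a value rounding up to exactly 180.0.
    return min(int(angle_deg // bin_width), n_bins - 1)

# Hypothetical segments given as (x1, y1, x2, y2) in pixel coordinates.
segments = [
    (0, 0, 10, 1), (0, 0, 10, 10), (0, 0, 1, 10), (0, 0, -1, 10),
    (0, 0, -10, 10), (0, 0, -10, 1), (10, 10, 0, 0),
]
bins_30 = [orientation_bin(orientation_deg(*s), 30.0) for s in segments]
bins_60 = [orientation_bin(orientation_deg(*s), 60.0) for s in segments]
rose_counts = Counter(bins_30)  # segment count per 30-degree bin
```

The per-bin counts in `rose_counts` are the quantities a rose diagram displays. Weighting each segment by its length instead of counting it once is a common alternative that is not evaluated here.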
- (2) Grouping Based on Connected Component Analysis
The second fracture-grouping strategy is based on connected component analysis, which may also be interpreted as a region-based hierarchical aggregation approach. Before connected regions are identified, the image undergoes a sequence of preprocessing operations to enhance structural continuity and suppress interference. First, Gaussian blurring is applied to reduce noise and smooth local intensity variations. Next, adaptive thresholding is performed to convert the image into a binary representation, thereby improving the separation between fracture targets and the surrounding background. A subsequent dilation operation is then introduced to connect fragmented or locally discontinuous responses, allowing spatially adjacent fracture features to be merged into more coherent regions.
After these preprocessing steps, connected component analysis is conducted, as illustrated in
Figure 11. This procedure assigns labels to individual connected regions in the binary image and outputs statistical information for each component, including attributes such as area and bounding box. By analyzing these regional statistics, the component with the largest area can be identified, which in most cases corresponds to the principal fracture region. A rectangular bounding box is then generated around the selected component to achieve fracture grouping and visualization.
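The labeling and largest-component selection can be sketched in a few lines. The sketch operates on an already binarized and dilated mask, uses 8-connectivity, and reports bounding boxes as (x, y, width, height). These choices, the function name, and the toy mask are assumptions, since the connectivity and output layout used in this study are not stated.

```python
from collections import deque

def connected_components(binary):
    """Return area and bounding box for each 8-connected foreground region."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            area, top, bottom, left, right = 0, r, r, c, c
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            bbox = (left, top, right - left + 1, bottom - top + 1)
            components.append({"area": area, "bbox": bbox})
    return components

# Toy mask: a 2-pixel blob and a 4-pixel diagonal trace.
binary = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
components = connected_components(binary)
largest = max(components, key=lambda comp: comp["area"])
```

With 4-connectivity the diagonal trace in this example would split into four single-pixel components, which shows why the connectivity choice matters for thin, oblique fractures.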
Fracture grouping in this study primarily relies on the first strategy, namely the edge–parameter space mapping-based method. This approach is more effective for identifying dominant fracture sets and representing their orientation-dependent distribution. By contrast, the connected component analysis-based method is mainly used to improve contour extraction, thereby providing more reliable geometric boundaries for the subsequent calculation of fracture length and width.
5.2. Fracture Length and Width Calculation
Fracture length and width are fundamental parameters for describing fracture geometry and are essential for the quantitative assessment of rock-mass structural conditions. In this study, fracture geometry was quantified using the following procedures.
- (1) Length Calculation Based on Line Fitting
The cv2.fitLine function was first employed to fit a straight line to the extracted fracture contour. This function returns the direction vector of the fitted line together with a reference point located on the line. Based on these parameters, the projection of all contour points onto the fitted-line direction was computed. The difference between the maximum and minimum projection values was then taken as the fracture length along the fitted direction.
Because image scaling may be introduced during image acquisition and preprocessing, the calculated length in pixel units was further converted into meters using the corresponding image scale factor. This pixel-to-meter conversion serves only as an approximate engineering reference in the present study, and its field-scale accuracy requires further validation against manual measurements.
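The projection-based length calculation can be sketched without any imaging library. In the sketch, an orthogonal least-squares fit (centroid plus principal direction) stands in for the line-fitting call; it is an illustrative substitute rather than the routine used in this study, and the function names, contour points, and scale factor of 0.002 m per pixel are all hypothetical.

```python
import math

def fit_line(points):
    """Orthogonal least-squares line: returns (unit direction, point on line)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Principal axis of the 2 x 2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), (cx, cy)

def fracture_length(points, meters_per_pixel):
    """Extent of the contour along the fitted direction, converted to meters."""
    (vx, vy), (cx, cy) = fit_line(points)
    projections = [(x - cx) * vx + (y - cy) * vy for x, y in points]
    length_px = max(projections) - min(projections)
    return length_px * meters_per_pixel

# Hypothetical contour: corners of a 100 x 10 pixel strip along direction (0.6, 0.8).
contour = [(-4, 3), (4, -3), (56, 83), (64, 77)]
length_m = fracture_length(contour, meters_per_pixel=0.002)  # about 0.2 m
```

Because the length is measured along a single fitted direction, a strongly curved trace is shortened to its chord, which corresponds to the local-linearity assumption of this post-processing step.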
- (2) Width Calculation Based on the Distance from Contour Points to the Fitted Line
Fracture width was estimated from the perpendicular distances between contour points and the fitted line. After obtaining the line parameters, the perpendicular distance from each contour point to the fitted line was calculated mathematically. Statistical analysis was then performed on the distance values of all contour points. The maximum distance was defined as the maximum fracture width, whereas the mean distance was taken as the average fracture width.
Similarly, the width values were converted from pixel units into meters according to the same image scale factor. However, the converted metric values should be interpreted as approximate estimates rather than rigorously validated field measurements.
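The width statistics follow directly from the point-to-line distance. The sketch below takes the fitted line as a direction vector and a reference point, which mirrors the description of the fitted-line output given earlier. The function name, the contour, the line parameters, and the scale factor are hypothetical values chosen for illustration.

```python
import math

def fracture_width(points, direction, point_on_line, meters_per_pixel):
    """Maximum and mean perpendicular distance to the fitted line, in meters."""
    vx, vy = direction
    norm = math.hypot(vx, vy)
    x0, y0 = point_on_line
    # |cross product| / |direction| is the perpendicular point-to-line distance.
    distances = [abs((x - x0) * vy - (y - y0) * vx) / norm for x, y in points]
    max_width = max(distances) * meters_per_pixel
    mean_width = sum(distances) / len(distances) * meters_per_pixel
    return max_width, mean_width

# Hypothetical contour: corners of a 100 x 10 pixel strip along direction (0.6, 0.8).
contour = [(-4, 3), (4, -3), (56, 83), (64, 77)]
max_w, mean_w = fracture_width(
    contour, direction=(0.6, 0.8), point_on_line=(30.0, 40.0),
    meters_per_pixel=0.002,
)
# Every corner lies 5 pixels from the line, so both values are about 0.01 m.
```

The distances are measured from the fitted center line, so for a contour that is symmetric about the line they equal half of the edge-to-edge extent (5 pixels for the 10-pixel-wide strip above). The sketch follows the definition given in the text without rescaling.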
The output format of the system for local fitted-line endpoint coordinates, fracture orientation angles, and geometric parameters is presented in
Table 4. It should be noted that the listed endpoint coordinates are local fitted-line coordinates generated after coordinate transformation and line fitting during post-processing, rather than raw pixel coordinates in the original image or georeferenced world coordinates. Therefore, negative coordinate values may appear when the local coordinate origin is shifted relative to the original image coordinate system or when the fitted line is extended for geometric representation.
The extracted fracture parameters can provide a reference for subsequent engineering analysis and tunnel-face stability evaluation.
Although the proposed fracture post-processing procedure improves the engineering applicability of the segmentation results, several limitations should be acknowledged. First, the current procedure assumes that fracture segments can be locally approximated as linear structures. This assumption is reasonable for many elongated tunnel-face fractures, but it may not fully describe curved, intersecting, or branching fractures. For complex fracture morphologies, the fitted line segments should be regarded as local linear approximations rather than complete geometric representations. Second, the extracted fracture length and width are mainly used to demonstrate the feasibility of converting segmentation masks into engineering-related geometric descriptors. Due to the lack of independently measured geological ground-truth data or manual engineering measurements, these geometric parameters were not quantitatively validated against field measurements in the present study. Third, segmentation errors may propagate into the post-processing stage. False-positive pixels may introduce redundant connected components or spurious line segments, missed fracture pixels may cause discontinuity and length underestimation, and boundary inaccuracies may affect width estimation and line fitting. Finally, the 60° orientation interval was initially adopted as a coarse engineering-oriented grouping strategy to identify dominant fracture trends while avoiding excessive fragmentation of orientation groups, and it was subsequently refined to 30° bins. Both intervals are empirical settings rather than universal thresholds. Future work will incorporate manual or geological reference measurements, investigate adaptive orientation grouping strategies, and explore skeleton-based tracing, curve fitting, graph-based modeling, or branch-aware topology analysis for more accurate characterization of curved and branching fractures.
6. Conclusions
In this study, an improved DeepLabv3+-based framework, termed MF-DeepLabv3+, was proposed for fracture identification in tunnel face images. By integrating the MSCA module and the Feature Smoothing module into the original architecture, the proposed model strengthened multiscale feature extraction and enhanced the continuity and boundary representation of fracture regions. In addition, MobileNetV2 was adopted as the backbone network to achieve a lightweight architecture while preserving effective segmentation capability.
A dataset comprising 2153 tunnel face images collected from the Qingdao Jiaozhou Bay Second Subsea Tunnel and the Yantai Urban Rapid Road Tunnel was constructed and used for model training and validation. Experimental results demonstrated that MF-DeepLabv3+ achieved an Accuracy of 92.47%, an mAP of 82.56%, and an mIoU of 62.99%. Compared with the original DeepLabv3+ model, the proposed method showed a clear improvement in mAP and a modest improvement in mIoU, indicating enhanced fracture recognition capability while only slightly improving region-level segmentation overlap. The proposed method showed particular advantages in the identification of slender fractures and in maintaining structural continuity.
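The gap between pixel accuracy and mIoU reported above is typical for thin structures and can be illustrated with the conventional two-class definitions. The sketch below is not the evaluation code of this study; the function names and the toy masks are assumptions, and mAP is omitted because it requires per-pixel confidence scores.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two binary masks given as nested lists."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def pixel_accuracy_and_miou(pred, truth):
    """Pixel accuracy and mean IoU over the background and fracture classes."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # A class absent from both masks is counted as perfectly matched.
    iou_fracture = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    iou_background = tn / (tn + fp + fn) if (tn + fp + fn) else 1.0
    return accuracy, (iou_fracture + iou_background) / 2.0


# Toy example: one fracture pixel in the reference, two in the prediction.
truth = [[1, 0, 0, 0], [0, 0, 0, 0]]
pred = [[1, 1, 0, 0], [0, 0, 0, 0]]
accuracy, miou = pixel_accuracy_and_miou(pred, truth)
```

In this toy case a single false-positive pixel leaves the accuracy high because background pixels dominate, while the fracture-class IoU drops to one half. The same imbalance explains why accuracy alone is a weak indicator for slender fractures.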
Furthermore, in combination with post-processing techniques including Canny edge detection, Hough transform, connected component analysis, and line fitting, the proposed framework enabled the extraction of key fracture parameters, including orientation, length, maximum width, and average width. Therefore, the method not only improves fracture segmentation accuracy but also provides quantitative geometric information that can support tunnel engineering assessment and structural interpretation.
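Two of the post-processing steps listed above, connected component analysis and line fitting, can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in this study: it uses 8-connectivity, omits the Canny and Hough stages in favour of a direct principal-axis (total least squares) fit, and reports length in pixels between the extreme pixel centres along the fitted direction. All function names are hypothetical.

```python
import math
from collections import deque


def connected_components(mask):
    """Return 8-connected foreground components as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def fit_line(pixels):
    """Principal-axis fit: (centroid, unit direction, orientation_deg, length_px)."""
    n = len(pixels)
    cy = sum(p[0] for p in pixels) / n
    cx = sum(p[1] for p in pixels) / n
    sxx = sum((p[1] - cx) ** 2 for p in pixels)
    syy = sum((p[0] - cy) ** 2 for p in pixels)
    sxy = sum((p[1] - cx) * (p[0] - cy) for p in pixels)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # direction of largest variance
    dx, dy = math.cos(theta), math.sin(theta)
    # Project every pixel onto the fitted direction to measure its extent.
    proj = [(p[1] - cx) * dx + (p[0] - cy) * dy for p in pixels]
    return (cx, cy), (dx, dy), math.degrees(theta) % 180.0, max(proj) - min(proj)


# Toy mask with one horizontal and one vertical fracture-like component.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
results = [fit_line(component) for component in connected_components(mask)]
```

Converting the pixel length to a physical length would additionally require an image scale, which is not specified here.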
The proposed MF-DeepLabv3+ framework demonstrates potential for tunnel-face fracture segmentation and quantitative characterization by combining segmentation results with fracture post-processing analysis. The extracted fracture orientation, length, width, and grouping information can provide useful references for subsequent engineering interpretation. However, the present study is mainly validated on images collected from two drill-and-blast tunnel projects located in coastal Shandong Province, China. Therefore, the generalisability of the proposed model to other geological settings, rock types, imaging conditions, and excavation methods, such as TBM tunnelling, has not yet been fully verified. Further validation using datasets from different regions, lithologies, tunnel construction methods, and environmental conditions is necessary before large-scale engineering application.
Future work will focus on further improving the robustness, generalisation ability, and engineering applicability of the proposed framework. First, transformer-based or hybrid CNN-transformer architectures will be explored to enhance global contextual feature representation for complex fracture patterns. Second, uncertainty quantification will be introduced to evaluate the confidence and reliability of fracture segmentation results, especially in ambiguous boundary regions and low-contrast images. Third, larger-scale datasets will be constructed by collecting tunnel-face images from different regions, lithologies, geological settings, and excavation methods, including drill-and-blast and TBM tunnelling, to improve cross-site and cross-method generalisation. Finally, model compression, lightweight network design, and inference acceleration strategies will be investigated to support real-time or near-real-time deployment in practical tunnel construction scenarios.
It should be noted that the fracture width estimation method used in this study has certain limitations. The width was estimated as the perpendicular distance from contour points to a fitted line, which assumes that the fracture is approximately linear and that the fitted line centrally bisects the fracture. This assumption is reasonable for relatively straight and isolated fracture segments, but may not be valid for curved, branching, or en-echelon fractures. For curved fractures, a single fitted line may not follow the local fracture trajectory, which can lead to overestimation or underestimation of the actual width. For branching fractures, the fitted line can be influenced by multiple fracture arms and may not represent a physically meaningful centreline. For en-echelon fractures, discontinuous segments may be incorrectly treated as a single linear feature, thereby introducing bias in the estimated width. Therefore, the width values reported in this study should be interpreted as approximate image-based indicators rather than precise field-scale aperture measurements. In future work, local skeleton-based width estimation, distance transform methods, and segment-wise centreline fitting will be considered to improve the robustness of width estimation for complex fracture geometries.
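The width estimate described above, and its sensitivity to curvature, can be illustrated with a short sketch. It assumes that the contour points and the fitted line (a point on the line plus a direction vector) are already available, and that the width is taken as twice the perpendicular distance because the line is assumed to bisect the fracture. The factor of two, the function names, and the synthetic contours are assumptions for illustration and are not taken from the implementation of this study.

```python
import math


def perpendicular_distances(points, origin, direction):
    """Distance from each (x, y) point to the line through origin along direction."""
    ox, oy = origin
    dx, dy = direction
    norm = math.hypot(dx, dy)
    # |cross product| / |direction| is the point-to-line distance.
    return [abs((x - ox) * dy - (y - oy) * dx) / norm for x, y in points]


def width_estimates(points, origin, direction):
    """Return (maximum width, average width) under the bisecting-line assumption."""
    d = perpendicular_distances(points, origin, direction)
    return 2.0 * max(d), 2.0 * sum(d) / len(d)


# Straight synthetic fracture: contour points 1 px on either side of y = 0.
straight = [(x, s) for x in range(11) for s in (-1.0, 1.0)]
straight_max, straight_mean = width_estimates(straight, (0.0, 0.0), (1.0, 0.0))

# Curved synthetic fracture with the same nominal vertical thickness (2 px),
# measured against a single horizontal line through its mean height (y = 2).
curved = [(x, 0.2 * (x - 5) ** 2 + s) for x in range(11) for s in (-1.0, 1.0)]
curved_max, curved_mean = width_estimates(curved, (0.0, 2.0), (1.0, 0.0))
```

For the straight contour both estimates equal the nominal thickness, whereas for the curved contour the single line no longer follows the fracture and both estimates are inflated. This is the overestimation effect that segment-wise centreline fitting or distance-transform methods are intended to reduce.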
Author Contributions
Q.G.: Conceptualization, methodology, validation, investigation, formal analysis, and writing—original draft. J.F.: Methodology, validation, investigation, data annotation, and data curation. N.Z.: Validation, investigation, data curation, and engineering data support. He contributed to the collection, organization, verification, and management of tunnel engineering image data, and participated in validating the dataset and experimental results. H.L.: Software, model implementation, model training, validation, and formal analysis. X.J.: Conceptualization, resources, supervision, project administration, funding acquisition, and writing—review and editing. C.C.: Data curation, formal analysis, investigation, and visualization support. W.T.: Resources, supervision, project administration, engineering background support, and writing—review and editing. Y.C.: Investigation, visualization, figure preparation, and organization of graphical materials. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Key R&D Program of Shandong Province, China (Grant No. 2024CXPT079) and the National Natural Science Foundation of China (Grant No. 52579106). The project was also supported by the Taishan Scholars Program (Grant Nos. tsqn202408338 and tsqn202312053) and the Shandong Provincial Natural Science Foundation (Grant No. ZR2024ME028).
Institutional Review Board Statement
Ethical review and approval were waived for this study, as this work only involves numerical simulation, image processing and engineering data analysis, and does not involve human participants or animal experiments.
Informed Consent Statement
Not applicable.
Data Availability Statement
The dataset used in this study is not publicly available due to strict corporate confidentiality agreements, proprietary data ownership, and safety regulations associated with the investigated tunnel infrastructure projects. However, the methodological details, model architecture, preprocessing procedures, and experimental settings are provided in the manuscript to support reproducibility to the greatest extent possible.
Acknowledgments
The authors would like to express their sincere gratitude to the editors and anonymous reviewers for their valuable comments and constructive suggestions, which have greatly improved the quality of this manuscript. We also thank the colleagues and research team members for their technical support and helpful discussions during the research work. In addition, we appreciate all the institutions and individuals who have provided support and assistance for this study.
Conflicts of Interest
Author Ning Zhang was employed by the company Shandong Hi-Speed Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| MF-DeepLabv3+ | Multi-Scale Feature Fusion DeepLabv3+ |
| MSCA | Multi-Scale Cross-Attention |
| ASPP | Atrous Spatial Pyramid Pooling |
| mIoU | Mean Intersection over Union |
| mAP | Mean Average Precision |
| CNN | Convolutional Neural Network |
| ReLU | Rectified Linear Unit |
References
- Jiang, F.; Wang, G.; He, P.; Zheng, C.; Xiao, Z.; Wu, Y. Application of Canny operator threshold adaptive segmentation algorithm combined with digital image processing in tunnel face crevice extraction. J. Supercomput. 2022, 78, 11601–11620.
- Talab, A.M.A.; Huang, Z.; Xi, F.; HaiMing, L. Detection crack in image using Otsu method and multiple filtering in image processing techniques. Optik 2016, 127, 1030–1033.
- Hoang, N.D. Detection of surface crack in building structures using image processing technique with an improved Otsu method for image thresholding. Adv. Civ. Eng. 2018, 2018, 3924120.
- Nnolim, U.A. Fully adaptive segmentation of cracks on concrete surfaces. Comput. Electr. Eng. 2020, 83, 106561.
- Bai, R.; Gao, J.; Li, Z.; Liu, D.; Shangguan, X. Research on crack disease identification based on visible spectrum in harsh tunnel environment. IEEE Access 2023, 11, 123268–123278.
- Dai, Q.; Xie, Y.; Xu, J.; Xia, Y.; Sheng, C.; Tian, C.; Ou, W. Tunnel crack identification based on improved YOLOv5. In Proceedings of the 2022 7th International Conference on Automation, Control and Robotics Engineering (CACRE); IEEE: Piscataway, NJ, USA, 2022; pp. 302–307.
- Man, K.; Liu, R.; Liu, X.; Song, Z.; Liu, Z.; Cao, Z.; Wu, L. Water leakage and crack identification in tunnels based on transfer-learning and convolutional neural networks. Water 2022, 14, 1462.
- Lan, M.L.; Yang, D.; Zhou, S.X.; Ding, Y. Crack detection based on attention mechanism with YOLOv5. Eng. Rep. 2025, 7, e12899.
- Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
- Chen, J.; Huang, H.; Cohn, A.G.; Zhou, M.; Zhang, D.; Man, J. A hierarchical DCNN-based approach for classifying imbalanced water inflow in rock tunnel faces. Tunn. Undergr. Space Technol. 2022, 122, 104399.
- Xu, X.; Zhao, M.; Shi, P.; Ren, R.; He, X.; Wei, X.; Yang, H. Crack detection and comparison study based on Faster R-CNN and Mask R-CNN. Sensors 2022, 22, 1215.
- Wu, J.; Zhang, X. Tunnel crack detection method and crack image processing algorithm based on improved Retinex and deep learning. Sensors 2023, 23, 9140.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2018; pp. 801–818.
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2017; pp. 2881–2890.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2021; pp. 13713–13722.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2018; pp. 7132–7141.
- Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.-C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2019; pp. 1314–1324.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520.
- Guo, M.H.; Lu, C.Z.; Hou, Q.; Liu, Z.; Cheng, M.M.; Hu, S.M. SegNeXt: Rethinking convolutional attention design for semantic segmentation. Adv. Neural Inf. Process. Syst. 2022, 35, 1140–1156.
- Chen, Y.; Wang, K.; Liao, X.; Qian, Y.; Wang, Q.; Yuan, Z.; Heng, P.A. Channel-UNet: A Spatial Channel-Wise Convolutional Neural Network for Liver and Tumors Segmentation. Front. Genet. 2019, 10, 1110.