Real-Time Crack Segmentation and Geometric Parameter Calculation of Mandrel Bars Based on an Improved YOLO Framework

Cao, Jianzhao; Sun, Zhu; Ding, Jingguo; Li, Xu

doi:10.3390/met16060657


¹ School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, China

² State Key Laboratory of Digital Steel, Northeastern University, Shenyang 110819, China

* Author to whom correspondence should be addressed.

Metals 2026, 16(6), 657; https://doi.org/10.3390/met16060657 (registering DOI)

Submission received: 20 May 2026 / Revised: 10 June 2026 / Accepted: 11 June 2026 / Published: 14 June 2026

(This article belongs to the Special Issue Recent Progress in Metal Rolling Processes)


Abstract

Surface cracks on mandrel bars affect product quality and production stability in seamless steel pipe manufacturing. Existing vision-based methods mainly rely on bounding-box detection, which is insufficient for precise crack delineation and geometric characterization. This study proposes a lightweight segmentation framework for online mandrel bar crack inspection using grayscale industrial images. Based on YOLO11n-seg, the framework incorporates single-channel input adaptation, lightweight network reconfiguration, and crack-oriented feature enhancement to improve the extraction of weak, thin, and irregular cracks while reducing computational cost. A dedicated industrial dataset and a sample-balancing strategy are introduced to alleviate severe crack–background imbalance. Based on the predicted pixel-level masks, crack area, projected length, maximum width, and average width are calculated for online evaluation. Experimental results show that the proposed method achieves a mask mAP@0.5 of 88.5%, a false negative rate of 1.72%, and real-time inference at 204 FPS with 3.01 GFLOPs. Field deployment further demonstrates the effectiveness of the proposed framework for online crack inspection and geometric parameter calculation of mandrel bars.

Keywords: mandrel bar cracks; YOLO11n-seg; pixel-level segmentation; single-channel adaptation

1. Introduction

Seamless steel tubes are used extensively in demanding sectors such as energy, petrochemical, and heavy equipment manufacturing, which require a high degree of process stability and product consistency [1,2]. During continuous rolling, stable control of product quality is essential, and the condition of critical tooling affects both production safety and tube performance. The mandrel bar, the core tool of the mandrel mill, operates under coupled conditions of high temperature, heavy load, and severe friction during long-term service. Under these conditions, wear and cracking are inevitable [3]. Such damage directly affects the inner-surface forming quality and dimensional accuracy of steel pipes [4]. If a crack propagates along the circumference or in the rolling direction, it can cause the entire mandrel bar to fracture, and such failures can lead to unplanned production stoppages [5]. Therefore, for mandrel bars, online crack inspection is not only a defect identification problem but also a safety-critical task under continuous production conditions [6]. In practical applications, the inspection system is expected to respond in real time, resist strong background interference, and provide geometric parameter information for maintenance decision-making rather than only coarse defect alarms. More broadly, studies on complex industrial separation systems have emphasized the importance of stability evaluation [7], while real-time optimization studies have highlighted the role of measurement feedback and adaptive decision-making in reliable industrial operation [8].

As in metallic material studies and industrial surface defect detection, mandrel bar crack inspection methods can be divided into traditional methods, conventional machine vision methods, and deep-learning-based methods [9,10]. Traditional methods rely on manual inspection or non-destructive testing (NDT), such as infrared inspection, magnetic flux leakage, and eddy current testing [11]. However, manual inspection is inherently subjective, and many NDT methods depend on dedicated instruments and are not well suited to continuous online monitoring under production conditions [12]. To address these limitations, conventional machine vision methods have been introduced for automated online inspection. These methods typically combine handcrafted features, such as local binary patterns (LBP) or histograms of oriented gradients (HOG), with a classifier to detect cracks [13]. Because such features are shallow, these methods are sensitive to illumination variation, surface contamination, and complex background textures [14]. As a result, they often provide only coarse defect localization rather than precise crack delineation, making subsequent geometric parameter calculation difficult.

In recent years, deep-learning-based defect detection has been widely applied to the identification of surface defects in steel products, including sheets and bars [15]. These methods have improved both detection accuracy and real-time performance. Existing deep-learning-based detectors can generally be categorized into two-stage and one-stage frameworks. Two-stage methods such as Faster R-CNN can achieve strong accuracy, but their more complex design often incurs a higher computational cost, which limits their use in real-time industrial applications [16]. Alternatively, one-stage detectors such as the You Only Look Once (YOLO) series achieve a strong trade-off between accuracy and speed and have been widely used for online industrial defect detection [17,18,19,20]. However, many current studies still formulate crack inspection as an object detection task [21]. For slender and irregular cracks, rectangular bounding boxes inevitably include large amounts of background pixels and are not well suited to describing crack boundaries, continuity, or local width variation. As a result, detection-based methods are insufficient when the engineering objective is not only to identify the presence of a crack but also to calculate geometric parameters related to its morphology and size. In such cases, pixel-level segmentation is more suitable because it provides a more detailed representation of crack shape and supports subsequent geometric parameter analysis [22]. Segmentation-based approaches, including semantic segmentation and instance segmentation [23,24,25,26], are therefore better suited than detection-based methods for fine crack-region representation and mask-based parameter calculation.

Recent crack-inspection studies have shifted from coarse localization to pixel-level segmentation and mask-based quantification. These studies have explored detection–segmentation–quantification pipelines, lightweight crack segmentation, and feature-fusion-based segmentation, but they mainly focus on dam, concrete, or pavement crack inspection scenarios [27,28,29]. In contrast, mandrel bar crack segmentation under hot-rolling production conditions remains less explored, especially for grayscale metallic images with surface textures and crack-like background interference. Under such task-specific imaging and deployment conditions, existing lightweight crack segmentation approaches still have several limitations for mandrel bar inspection. First, online industrial deployment requires a balance between inference efficiency and segmentation reliability. Although lightweight models reduce computational cost, maintaining reliable segmentation under background-dominated production conditions remains difficult. Second, mandrel bar crack images acquired in practice are typically grayscale, whereas most mainstream deep networks are designed for three-channel input. A typical workaround is to repeat the grayscale image in all three channels [30]. However, this does not introduce any new information and increases the computational burden. Another approach uses fixed filtering to construct pseudo multi-channel inputs [31]; however, this may introduce edge blurring and increase system complexity. Third, mandrel bar cracks are often thin, elongated, locally low-contrast, and sometimes discontinuous, making weak crack cues difficult to preserve during lightweight feature extraction. Under such industrial imaging conditions, weak crack cues are also easily disturbed by surface textures and other crack-like interference. 
Finally, many existing studies mainly report detection or segmentation accuracy, while paying limited attention to alarm-level missed detections, false alarms, and mask-based geometric parameter output in online inspection. For production-line applications, such parameters are mainly used as rapid size-related references for defect warning and maintenance decisions.

To address these limitations, this study develops a lightweight segmentation-oriented framework for crack extraction and geometric parameter calculation on an actual mandrel bar production line. The framework first reformulates mandrel bar crack inspection from coarse bounding-box localization to pixel-level segmentation, enabling more detailed representation of crack boundaries and morphology. For grayscale industrial images, a single-channel input adaptation strategy is adopted to reduce redundant channel computation. To meet real-time deployment requirements, a lightweight inference-oriented design is introduced. To improve the extraction of weak, thin, elongated, low-contrast, and discontinuous cracks, crack-oriented feature enhancement is incorporated to strengthen multi-scale contextual representation, local detail preservation, and continuity perception. At the application level, a dedicated industrial dataset and a sample-balancing strategy are used to improve crack–background discrimination under background-dominated conditions. Based on the predicted masks, crack area, projected length, maximum width, and average width are further calculated as rapid size-related indicators for online warning and maintenance decision-making. The scientific contribution of this study lies in the task-oriented adaptation of a lightweight YOLO-based segmentation framework for online mandrel bar crack inspection. By integrating single-channel grayscale input adaptation, lightweight inference-oriented reconfiguration, crack-oriented feature enhancement, alarm-level evaluation, and mask-based geometric parameter calculation, the proposed framework provides a practical solution for real-time crack warning and size-related assessment under actual hot-rolling production-line conditions.

2. Proposed FGD-YOLO Model

2.1. Baseline Model: YOLO11n-Seg

YOLO11n-seg is a single-stage instance segmentation network that predicts class labels, bounding boxes, and segmentation masks. Inheriting the efficient, lightweight characteristics of the YOLO family, its architecture comprises a CSP backbone, a PAN–FPN [32,33] neck, and a three-branch prediction head. Specifically, the backbone extracts multi-scale feature representations from the input image, and the neck then fuses these features to enhance defect segmentation across different scales. The segmentation branch is dedicated to estimating pixel-level masks. This design facilitates unified detection and segmentation within a single-stage framework, while pixel-level mask prediction improves mandrel bar crack delineation and supports subsequent geometric parameter calculation based on crack masks.

2.2. FGD-YOLO Model Improvement

On the basis of adapting YOLO11n-seg to a single-channel, lightweight configuration, module-level refinement is conducted. First, a FAST structure is designed. Then, the C3k2 module at the 8th backbone stage is replaced with the Global-to-Local Controllable Receptive Module (G2L-CRM). The C2PSA module at the 10th backbone stage is replaced with a C3k2 module featuring a Dilated-Wide-Residual (DWR) architecture, denoted as C3k2-DWR. In this study, the proposed framework is denoted as FAST–G2L–DWR YOLO, abbreviated as FGD-YOLO, where F denotes the FAST lightweight configuration, G denotes G2L-CRM, and D denotes C3k2-DWR. The resulting FGD-YOLO architecture is shown in Figure 1.

2.2.1. Single-Channel Input Adaptation

Mandrel bar images acquired in industrial environments are typically grayscale, while the background often contains complex noise. To reduce redundant computation from a three-channel input in the baseline YOLO11n-seg, the model was modified for end-to-end single-channel adaptation. Because crack information in this scenario is mainly conveyed by intensity variation rather than color cues, replicating a grayscale image into three channels introduces extra computation without adding useful discriminative information. As shown in Figure 2, the original model (three-channel input) replicates the grayscale image into C1–C3, fusing them in Conv1. In contrast, the single-channel input scheme feeds the grayscale image directly into Conv1 for initial feature extraction. Conv1 is configured to accept only a single-channel input, while data loading, augmentation, training, and inference are adjusted accordingly.

This adaptation eliminates redundant channel processing, thereby reducing computational cost and memory bandwidth at the input stage and improving inference speed. It also helps the backbone focus on crack-relevant structural information, which is beneficial for preserving weak crack responses. This adaptation therefore provides an efficient and task-consistent input configuration for grayscale mandrel bar crack segmentation.
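The input-stage saving can be illustrated with a rough count for the first convolution alone. The sketch below is only an estimate under stated assumptions: the stem is taken to be a 3 × 3 convolution with stride 2 and 16 output channels applied to a 512 × 512 image, and the helper name `conv_cost` is hypothetical. None of these layer details are given in the text above.

```python
def conv_cost(c_in, c_out, k, out_h, out_w):
    """Return (weight_count, multiply_accumulates) for one k x k convolution, bias ignored."""
    weights = k * k * c_in * c_out
    macs = weights * out_h * out_w
    return weights, macs

# Assumed stem: 3x3 conv, stride 2, 16 output channels, 512x512 input -> 256x256 output.
OUT_H = OUT_W = 256

w3, m3 = conv_cost(3, 16, 3, OUT_H, OUT_W)  # grayscale image replicated to three channels
w1, m1 = conv_cost(1, 16, 3, OUT_H, OUT_W)  # grayscale image fed directly

print(f"3-channel stem: {w3} weights, {m3 / 1e6:.2f} MMACs")
print(f"1-channel stem: {w1} weights, {m1 / 1e6:.2f} MMACs")
print(f"stem cost ratio: {m1 / m3:.3f}")
```

Under these assumptions only the first layer becomes cheaper, by a factor of three; every later layer sees the same feature-map shapes, so the effect on total model cost is much smaller than the stem ratio suggests.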

2.2.2. FAST Lightweight Configuration

YOLO11n-seg is optimized through lightweight design and configuration tuning to meet real-time constraints for online mandrel bar crack segmentation. The adjustments are: (1) reducing the number of stacked C3k2 blocks in the neck from 2 to 1; (2) changing the segmentation head output channels from [32,256] to [16,192]; and (3) disabling the residual connection in the C3k2 module at the sixth layer of the backbone and setting the channel compression ratio to 0.25. The input size was kept consistent across all experiments to avoid introducing additional resolution-related effects. This lightweight design is based on the observation that crack targets occupy only a limited image area; therefore, appropriately reducing redundant feature fusion and head capacity can improve inference efficiency while maintaining crack-region representation. Together, these changes constitute the FAST configuration of YOLO11n-seg for more efficient inference.

2.2.3. G2L-CRM

In this study, the G2L-CRM is designed as a task-oriented lightweight receptive-field enhancement module for mandrel bar crack segmentation, aiming to improve deployment efficiency, preserve thin-crack continuity, and suppress background interference. Within G2L-CRM, a multi-branch receptive-field modulation operation is adopted as a feature-refinement step in the local modeling branch. As schematically illustrated in Figure 3, the input tensor X is processed by parallel branches with different dilation rates to enhance multi-scale contextual representation. The outputs of these branches are summed to form a preliminary fused feature (\hat{F}). This preliminary feature is then passed through a gating convolution (Conv_{gate}) followed by a Sigmoid activation, which generates a gating mask M. The preliminary fused feature is then multiplied element-wise by M to produce a refined representation. Finally, a shortcut connection adds this refined representation (M \odot \hat{F}) back to the original input tensor (X), yielding the output of this operation. This operation provides a useful basis for crack-oriented feature modeling. The parallel dilated branches support multi-scale context aggregation, while the gating mechanism helps suppress background interference and enhance informative crack responses.
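The data flow of this operation (parallel dilated branches, summation, sigmoid gate, element-wise refinement, shortcut) can be sketched on a one-dimensional toy signal. This is an illustration only, not the module itself: the averaging kernels, the dilation rates 1, 2, and 3, and the scalar gate parameters `gate_w` and `gate_b` are all hypothetical stand-ins for learned two-dimensional convolutions.

```python
import math

def dilated_conv1d(x, kernel, dilation):
    """Same-length 1D convolution with zero padding; kernel length must be odd."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def gated_multi_dilation(x, kernels, dilations, gate_w, gate_b):
    # Parallel branches with different dilation rates, summed into the preliminary feature.
    branches = [dilated_conv1d(x, k, d) for k, d in zip(kernels, dilations)]
    f_hat = [sum(vals) for vals in zip(*branches)]
    # Gate: a pointwise map followed by a sigmoid yields a mask with values in (0, 1).
    mask = [1.0 / (1.0 + math.exp(-(gate_w * f + gate_b))) for f in f_hat]
    # Refine the preliminary feature with the mask, then add the shortcut from the input.
    return [xi + m * f for xi, m, f in zip(x, mask, f_hat)]

x = [0.0, 0.0, 1.0, 0.0, 0.0, 0.9, 0.0]  # two thin responses separated by a gap
avg = [1 / 3, 1 / 3, 1 / 3]               # hypothetical fixed kernel
y = gated_multi_dilation(x, [avg, avg, avg], [1, 2, 3], gate_w=4.0, gate_b=-2.0)
print([round(v, 3) for v in y])
```

In this toy setting the larger dilations let positions inside the gap receive contributions from both neighbouring responses, which is the intuition behind using multi-dilation context for locally interrupted cracks.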

To meet the real-time requirements of mandrel bar crack segmentation, the above receptive-field modulation operation is organized within a simplified and lightweight partial-aggregation module. As depicted in Figure 4, the G2L-CRM module adopts a CSP-style architecture with two branches: a bypass branch and a local modeling branch. The bypass branch helps preserve original crack cues during feature propagation, which is important for thin and weak crack responses. In the local modeling branch, multi-dilation convolutions enlarge the contextual receptive field for crack regions with different spatial spans, while cascaded stacking strengthens multi-scale representation. Compared with directly applying receptive-field modulation to all feature channels, G2L-CRM reformulates the module into a lightweight partial-aggregation structure, which reduces structural redundancy while retaining crack-oriented receptive-field modeling. This makes it more suitable for real-time mandrel bar crack segmentation and helps preserve thin crack continuity, adapt to cross-scale crack variation, and better capture locally interrupted crack regions.

2.2.4. C3k2-DWR

The C3k2-DWR module integrates channel compression with parallel multi-dilation context expansion as a task-oriented refinement within the lightweight YOLO framework, thereby enhancing multi-scale crack representation while maintaining real-time efficiency. Its design is motivated by the observation that weak or discontinuous crack segments are easily lost after repeated down-sampling and convolutional compression in noisy industrial backgrounds. In this study, C3k2-DWR embeds a DWR branch [34] and an SIR branch into the C3k2 framework to strengthen crack-context representation while maintaining lightweight feature refinement. As shown in Figure 5a, the DWR branch first applies a 3 × 3 convolution. It then performs channel compression and feature extraction, followed by batch normalization (BN) and ReLU. The features are then fed into parallel 3 × 3 convolution branches with dilation rates of 1, 3, and 5 to enlarge the receptive field and enhance multi-scale context modeling. The branch outputs are concatenated and fused by a 1 × 1 convolution, with the result being subsequently added to the branch input to produce the final DWR branch output. This design provides a basis for crack-oriented context modeling. The parallel multi-dilation branches aggregate contextual cues from different spatial ranges, which is beneficial when local crack evidence is weak, incomplete, or contaminated by background noise.

As shown in Figure 5b, the SIR branch first applies a 3 × 3 convolution, followed by BN and ReLU, and then a 1 × 1 convolution. The resulting features are added to the original input to obtain the SIR branch output. Compared with the DWR branch, the SIR branch is more compact because it uses a single convolution branch, thereby reducing complexity while supporting local refinement for weak crack responses. Finally, the outputs of the DWR and SIR branches are fused within the C3k2 framework. This forms the output features of C3k2-DWR. The task-oriented value of C3k2-DWR lies in its integration within the C3k2 partial aggregation pathway. In this design, the DWR branch provides contextual compensation for discontinuous or low-contrast crack regions, while the SIR branch preserves local details with low computational cost. By combining context enhancement and compact residual refinement in a lightweight framework, C3k2-DWR is more suitable for mandrel bar crack segmentation. It helps recover discontinuous cracks, retain weak crack cues, and improve discrimination in low-contrast or indistinct-edge crack regions under real-time constraints.

3. Experimental

3.1. Dataset Construction

The experimental data were collected on a hot-rolled seamless steel tube production line under real industrial conditions. The raw mandrel bar images were cropped from 2048 × 2048-pixel images into 512 × 512-pixel patches. The samples were then divided into crack and background categories based on the presence of crack defects. Crack images exhibit diverse morphologies. The background images comprise crack-free images that feature various interfering surface textures, including oxide scales and water stains. To increase data diversity, these images were collected from different mandrel bars and production periods to cover a wide range of surface conditions and imaging variations. All crack images were pixel-wise annotated using LabelMe. Figure 6 shows crack images, annotated images, and background images.

The dataset contains 2900 crack images and 11,320 background images. The crack images were randomly split into training, validation, and test sets at an 8:1:1 ratio, and the three subsets were kept strictly disjoint. For background images, disjoint training, validation, and test subsets were also constructed. In practical online inspection, the model is required to detect a small number of cracks from a large volume of background inputs. To better simulate this scenario, additional background images that were not used in training or validation were further incorporated into the test set. Therefore, the final test set consists of two parts: (1) crack samples that are strictly disjoint from the training and validation sets, and (2) a large number of previously unseen background images. This design yields an input distribution closer to real production conditions while ensuring a strictly independent evaluation for crack samples.
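A random 8:1:1 split with a disjointness check could look like the sketch below. The file identifiers, the fixed seed, and the function name `split_8_1_1` are placeholders introduced for illustration; the actual splitting script is not described in the text.

```python
import random

def split_8_1_1(items, seed=0):
    """Shuffle and split into disjoint train/val/test subsets at an 8:1:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

crack_ids = [f"crack_{i:04d}" for i in range(2900)]  # placeholder file IDs
train, val, test = split_8_1_1(crack_ids)

# The three subsets must not share any image.
assert not (set(train) & set(val))
assert not (set(train) & set(test))
assert not (set(val) & set(test))
print(len(train), len(val), len(test))  # 2320 290 290
```

Background images can be partitioned the same way, with the remaining unseen background images appended to the test set as described above.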

3.2. Experimental Environment

To ensure the reproducibility of the experimental results, the hardware and software configurations are summarized in Table 1 and Table 2.

All experiments were conducted under the same environment for fair comparison. The batch size was 16, and the weight decay was 0.0005. Training ran for 200 epochs with an initial learning rate of 0.01. No pre-trained weights were used during training. For YOLO-based models, other hyperparameters followed the default YOLO11n-seg (Ultralytics) settings. The semantic segmentation baselines were trained and evaluated under the same dataset split, input size, and evaluation protocol. To characterize run-to-run variability, the key experiments in Section 4.2.3 and Section 4.3 were independently repeated three times using the same dataset split, training settings, and evaluation protocol, and the results are reported as mean ± standard deviation.
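Reporting a metric as mean ± standard deviation over three runs can be sketched as follows. The run values are hypothetical and are not results from this study, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
import statistics

def mean_std(values):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(values), statistics.stdev(values)

fps_runs = [190.0, 194.0, 198.0]  # hypothetical values, not taken from the paper
m, s = mean_std(fps_runs)
print(f"{m:.0f} ± {s:.0f}")  # 194 ± 4
```

With only three runs, the choice between `statistics.stdev` and `statistics.pstdev` changes the reported spread noticeably, so the estimator is worth stating alongside the numbers.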

3.3. Evaluation Metrics

For mandrel bar crack segmentation, average precision (AP) is used for performance evaluation. Precision and recall are defined in Equations (1) and (2). By varying the confidence threshold, a PR curve is constructed. Average precision (AP) is defined as the area under the PR curve over the range of recall values. Since this is a single-class problem, mean average precision (mAP) is equivalent to AP, as shown in Equation (3):

Precision = \frac{TP}{TP + FP}

(1)

Recall = \frac{TP}{TP + FN}

(2)

mAP = \frac{1}{n} \sum_{i = 1}^{n} \int_{0}^{1} P_i (R) dR

(3)

Here, TP denotes true positives, FP false positives, TN true negatives, and FN false negatives. At an IoU threshold of 0.5, mAP@0.5(bbox) is calculated for bounding boxes and mAP@0.5(mask) for segmentation masks.
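As a rough illustration of Equations (1) to (3), AP can be computed from predictions that have already been matched to ground truth at IoU 0.5. The sketch below uses a monotone precision envelope and sums the area under the resulting curve. This is one common convention and is not necessarily the exact interpolation used by the evaluation toolkit; the function name and input format are hypothetical.

```python
def average_precision(scored_hits, num_gt):
    """scored_hits: list of (confidence, is_true_positive); num_gt > 0 ground-truth instances."""
    ranked = sorted(scored_hits, key=lambda t: t[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)         # Equation (2): TP / (TP + FN)
        precisions.append(tp / (tp + fp))   # Equation (1): TP / (TP + FP)
    # Make precision non-increasing with recall (envelope interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the PR curve: rectangles wherever recall increases.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls)))

hits = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision(hits, num_gt=2)
print(round(ap, 4))  # 0.8333
```

For a single class, as here, this AP value is also the mAP.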

The False Positive Rate (FPR) is used to evaluate image-level false alarms on mandrel background images. It is defined as the ratio of background samples misclassified as cracks to all true background samples, as follows:

FPR = \frac{FP}{FP + TN}

(4)

The False Negative Rate (FNR) is used to evaluate image-level missed crack alarms on true mandrel bar crack images. It is defined as the ratio of crack images without predicted crack regions to all true crack images, as follows:

FNR = \frac{FN}{FN + TP}

(5)

In the implementation of Equations (4) and (5), all compared models were evaluated using the same image-level crack-presence protocol. The output of each model was converted into a binary crack map. For YOLO-based models, predictions were retained using a confidence threshold of 0.25 unless otherwise specified, and all retained masks were merged into one binary map. For semantic segmentation models, including U-Net, DeepLabv3+, and BiSeNetV2, the model output was converted into a probability map using a sigmoid function and then binarized using the same fixed decision threshold of 0.25. After thresholding, an image with at least one non-empty predicted crack region was regarded as triggering a crack alarm. A crack image with any predicted crack region was counted as TP and otherwise as FN; a background image with any predicted crack region was counted as FP and otherwise as TN. FNR and FPR were calculated from these image-level TP, TN, FP, and FN values. This image-level alarm definition matches the practical requirement of online industrial inspection: in production-line applications, the system first needs to determine whether the current image should trigger a crack alarm for further checking and maintenance decision-making.
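The image-level alarm protocol and Equations (4) and (5) can be sketched as below. The input format, in which each image carries a list of (confidence, mask pixel count) pairs, and the function names are hypothetical; treating a confidence equal to the threshold as retained is also an assumption.

```python
def image_alarm(pred_masks, conf_thres=0.25):
    """pred_masks: list of (confidence, mask_pixel_count) for one image."""
    # An alarm is raised if any retained prediction has a non-empty mask.
    return any(conf >= conf_thres and pixels > 0 for conf, pixels in pred_masks)

def alarm_rates(samples, conf_thres=0.25):
    """samples: list of (is_crack_image, pred_masks). Returns (FNR, FPR)."""
    tp = fn = fp = tn = 0
    for is_crack, preds in samples:
        alarm = image_alarm(preds, conf_thres)
        if is_crack:
            tp += alarm
            fn += not alarm
        else:
            fp += alarm
            tn += not alarm
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # Equation (5)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # Equation (4)
    return fnr, fpr

samples = [
    (True, [(0.90, 120)]),   # crack image, alarm raised -> TP
    (True, [(0.10, 50)]),    # crack image, prediction below threshold -> FN
    (False, []),             # background image, no prediction -> TN
    (False, [(0.60, 30)]),   # background image, alarm raised -> FP
    (False, [(0.30, 0)]),    # background image, empty mask -> TN
]
fnr, fpr = alarm_rates(samples)
print(fnr, round(fpr, 4))
```

For a semantic segmentation model the same counting applies after the probability map has been binarized; the per-image input then reduces to whether the binary map contains any crack pixel.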

All FPS values were measured on the workstation specified in Table 1 using 512 × 512 input images. Frames per second (FPS) reflects real-time processing capability and is calculated from the average inference latency per image, denoted as Speed (ms/image), with the batch size set to 1 and excluding image loading, manual preprocessing, visualization, and result saving overhead, as follows:

FPS = \frac{1000}{Speed (ms)}

(6)
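Equation (6) and a batch-size-1 latency measurement can be sketched as follows. The timing helper is a generic wall-clock harness with hypothetical names and warm-up counts, not the measurement code used in this study; on a GPU, asynchronous execution would additionally require synchronizing the device before each clock reading.

```python
import time

def fps_from_latency(speed_ms):
    """Equation (6): frames per second from the average latency in ms/image."""
    return 1000.0 / speed_ms

def mean_latency_ms(infer, n_warmup=5, n_runs=50):
    """Average wall-clock latency of infer() in ms, one image per call, after warm-up."""
    for _ in range(n_warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    return (time.perf_counter() - start) * 1000.0 / n_runs

print(fps_from_latency(5.0))  # 200.0
```

Only the forward pass should be wrapped in `infer`, so that image loading, visualization, and result saving stay outside the timed region, in line with the definition above.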

The Dice coefficient is used to evaluate the overlap between the predicted crack region and the ground-truth crack region, with a higher value indicating better pixel-level segmentation quality.
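For binary masks P (prediction) and G (ground truth), the Dice coefficient is 2|P ∩ G| / (|P| + |G|). A minimal sketch on nested lists is given below; the small `eps` term, which makes two empty masks score 1.0, is a convention assumed here and not taken from the text.

```python
def dice(pred, truth, eps=1e-9):
    """pred, truth: equally sized 2D lists of 0/1 values."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return (2 * inter + eps) / (total + eps)

pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
print(round(dice(pred, truth), 4))  # 0.6667
```

Here two of the three predicted crack pixels overlap the three ground-truth pixels, giving 2 · 2 / (3 + 3) ≈ 0.667.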

Parameters (M) denotes the total number of trainable parameters, in millions, and reflects the model’s size and its storage and deployment costs.

GFLOPs represents the number of giga floating-point operations required for a single forward inference, quantifying computational complexity and hardware demand.

4. Results and Discussion

All results reported in this section were obtained under the final independent test protocol described in Section 3.1 and the unified evaluation protocol described in Section 3.3. Unless otherwise specified, FNR and FPR are reported at the alarm level throughout this section.

4.1. Baseline Determination

In online industrial inspection, the alarm-level FPR directly affects alarm reliability and the cost of manual verification and maintenance. Background image features often resemble cracks, leading to false alarms when discrimination is weak. Incorporating real-texture background images into training strengthens crack–background discrimination.

4.1.1. Effect of Background Ratio

In this experiment, the crack-image split was kept unchanged, while the proportion of background images in the training and validation sets was adjusted. The background images were partitioned into disjoint pools for the two splits, ensuring no overlap. The background-to-defect ratio was set to 0, 0.3, 0.5, 1, and 2 for the training split, and 0.3, 0.5, and 1 for the validation split, yielding 15 data configurations. The analysis was conducted on the original YOLO11n-seg baseline to determine the background-ratio setting used in the subsequent experiments.
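The 15 configurations follow from crossing the five training ratios with the three validation ratios. The sketch below enumerates them and derives background-image counts, assuming the ratio is the number of background images divided by the number of crack images in the same split, and taking 2320 training and 290 validation crack images from the 8:1:1 split of 2900. Both assumptions are interpretations of the text rather than stated details.

```python
from itertools import product

TRAIN_RATIOS = [0, 0.3, 0.5, 1, 2]
VAL_RATIOS = [0.3, 0.5, 1]
N_TRAIN_CRACK, N_VAL_CRACK = 2320, 290  # assumed from the 8:1:1 split of 2900 crack images

configs = [
    {
        "train_ratio": tr,
        "val_ratio": vr,
        "train_bg": round(tr * N_TRAIN_CRACK),  # background images in the training split
        "val_bg": round(vr * N_VAL_CRACK),      # background images in the validation split
    }
    for tr, vr in product(TRAIN_RATIOS, VAL_RATIOS)
]
print(len(configs))  # 15
```

Under these assumptions the largest configuration needs 4640 training and 290 validation background images, which fits within the 11,320 available background images while leaving unseen ones for the test set.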

As shown in Table 3, increasing the proportion of background images in the training set generally reduces alarm-level false positives and improves mask-level segmentation performance. However, the optimal setting cannot be identified from a single metric alone. Without background images in the training set, the model achieves the lowest alarm-level FNR of 1.38%, but the alarm-level FPR remains excessively high, which would cause frequent false alarms in online inspection. In contrast, when the training background ratio is increased to 2, the FPR is further reduced and the mAP is improved, but the FNR rises to 2.07%. This is unfavorable for the present task, because missed crack samples may directly affect defect warning and maintenance decisions, whereas false alarms mainly increase manual verification cost. Among the remaining settings, the configuration with a training background ratio of 1 and a validation background ratio of 0.5 provides the most suitable overall performance. It keeps the FNR at 1.72%, reduces the FPR to 3.35%, and achieves strong segmentation accuracy. Compared with the setting of train 1 and val 1, it maintains the same FNR while yielding a lower FPR and higher FPS. Therefore, the setting with a training background ratio of 1 and a validation background ratio of 0.5 was selected for the subsequent experiments.

4.1.2. Effect of Input Channel Configuration

To further examine the effect of input channel configuration, the same 15 background-ratio settings were also evaluated under the single-channel YOLO11n-seg setting, with all other settings unchanged.

In each subfigure, FPR and FNR are shown on the left and right vertical axes, respectively, and the horizontal axis denotes the training background ratio. Figure 7a–c correspond to validation background ratios of 0.3, 0.5, and 1, respectively.

As shown in Figure 7, a similar trend is observed under the single-channel setting. Increasing the proportion of background images in the training set markedly reduces FPR, while the change in FNR is relatively smaller. This indicates that the background-to-crack ratio mainly affects false-alarm suppression, whereas the missed-alarm behavior changes more moderately. The configuration with a training background ratio of 1 and a validation background ratio of 0.5 still shows good performance under single-channel input, suggesting that the selected background-ratio setting remains applicable after the input is changed to a single-channel configuration. Under this setting, the single-channel YOLO11n-seg achieved an alarm-level FNR of 1.38% and an FPS of 181 in the baseline-determination experiment. The results reported in Section 4.1 correspond to representative single-run experiments used for baseline and setting determination, whereas the repeated-run statistics in the subsequent ablation and comparative experiments were independently obtained under the same final evaluation protocol. Therefore, the numerical values are not expected to be completely identical.

4.2. Model Improvements and Results

4.2.1. Layer Selection of G2L-CRM

To determine the most suitable insertion position of G2L-CRM, the module was placed at different backbone or neck layers under the selected experimental setting. Since feature maps at different depths differ in spatial resolution and semantic abstraction, the effect of G2L-CRM may vary with the insertion position. Shallow layers preserve more local details. However, they provide a weaker semantic context for modeling weak crack continuity. Deeper layers contain stronger semantic information, but repeated downsampling may weaken fine crack details. Therefore, a layer-wise comparison was conducted to identify the position that best balances crack continuity modeling, background suppression, and computational efficiency.

As shown in Table 4, the placement of G2L-CRM has a clear influence on alarm-level behavior, mask-level segmentation quality, and deployment efficiency. When inserted at the shallow P4 layer, the model yields a relatively low FPR, but FNR rises to 3.10%, indicating insufficient enhancement of weak crack responses and continuity modeling. At P6, FNR is reduced to 1.72%, but FPR increases to 5.53%, and both mask and bbox mAP decrease, suggesting that the response becomes stronger but the segmentation becomes less accurate. By comparison, P8, P10, and P13 all achieve the lowest FNR of 1.38%, showing that middle-to-deep placement is more effective for missed-alarm control. Among these candidates, P8 gives the lowest FPR, the highest mask mAP, and the lowest computational cost. Although P10 achieves a slightly higher bbox mAP, its FPR is higher than that of P8. In addition, the dual-layer setting P4 + P8 does not bring further improvement. Therefore, P8 provides the best overall trade-off and was selected as the final placement of G2L-CRM.

4.2.2. Layer Selection of C3k2-DWR

To determine the most suitable insertion position of C3k2-DWR, layer-wise experiments were conducted under the selected experimental setting. In this comparison, G2L-CRM was fixed at P8, and only the placement of C3k2-DWR was varied. All other settings were kept unchanged to ensure a fair comparison.

As shown in Table 5, although P6 achieves the lowest FNR of 1.72%, its FPR remains relatively high at 5.42%, and its mask mAP is lower than that of P10. This suggests that placing C3k2-DWR at P6 enhances crack response, but also introduces more interference from complex backgrounds. By comparison, P10 slightly increases FNR to 2.07%, but reduces FPR to 4.47%, improves mask mAP to 89.8%, and gives higher FPS. Although FNR was treated as the primary criterion, the final placement was not determined by FNR alone. Under the condition that the missed-alarm risk remained at a low level, false-alarm suppression, pixel-level segmentation quality, and deployment efficiency were also considered. From this perspective, P10 provides a more favorable overall trade-off and was selected as the final placement of C3k2-DWR.

4.2.3. Ablation Study

To further clarify the formation of the proposed framework, an ablation study was conducted under the selected experimental setting. The original YOLO11n-seg was first modified from its default three-channel input to a single-channel input configuration. On this single-channel baseline, FAST, G2L-CRM, and C3k2-DWR were then added step by step to assess their respective contributions and the overall effectiveness of their combination. All other settings were kept unchanged to ensure a fair comparison. The ablation results are summarized in Table 6.

As shown in Table 6, the proposed framework is formed through progressive and complementary improvements. Replacing the original baseline with the single-channel baseline reduces the alarm-level FNR from 2.41 ± 0.7% to 1.95 ± 0.5% and increases FPS from 166 ± 3 to 185 ± 6, indicating that single-channel input is more suitable for grayscale crack images. However, the alarm-level FPR also rises from 3.56% to 5.06%, which means that the stronger crack response is accompanied by more false alarms on crack-like background patterns. Although the single-channel adaptation removes redundant input-channel processing, the accompanying implementation adjustments in the single-channel baseline slightly increase the overall parameter count. After FAST is introduced, FNR is further reduced to 1.61 ± 0.2%, while GFLOPs decrease from 3.70 to 3.00 and FPS increases to 191 ± 6, showing that FAST mainly improves efficiency and missed-alarm control. After G2L-CRM is added, FPR decreases slightly, which suggests better background suppression. However, FNR becomes higher, and mAP becomes lower, so this module alone does not improve the overall result. A similar situation is observed when C3k2-DWR is added alone. Although mask mAP increases slightly, the overall performance gain remains limited. By contrast, when G2L-CRM and C3k2-DWR are used together, the full model achieves lower FNR and FPR than either single-module variant, while also providing higher mask mAP and the highest FPS, indicating improved pixel-level segmentation quality and deployment efficiency. This result suggests that the two modules have complementary effects, and their combination provides a more balanced improvement in crack continuity perception, local feature representation, and background suppression. The small standard deviations of the full model indicate stable alarm and inference performance across the three independent runs.

4.3. Comparative Experiment

To further evaluate the effectiveness of the proposed method, comparative experiments were conducted using representative semantic segmentation and YOLO-based instance segmentation models. This design was adopted because the main concern of this study is crack-region segmentation and subsequent geometric information extraction, rather than the prediction form itself. Considering their different prediction mechanisms, the results are interpreted separately by model type: FNR and FPR are used as common image-level alarm indicators, Dice is used as the common pixel-level metric, and mAP@0.5(mask) and mAP@0.5(bbox) are discussed only for YOLO-based instance segmentation models. Specifically, U-Net, DeepLabv3+, and BiSeNetV2 were selected as semantic segmentation baselines, representing a classical segmentation framework, a model with stronger feature representation capability, and a lightweight real-time segmentation method, respectively. YOLOv5n-seg, YOLOv8n-seg, YOLO12n-seg, and YOLO11n-seg were selected as lightweight YOLO-based instance segmentation baselines to evaluate the performance of the proposed method under similar real-time design objectives. To ensure a fair comparison, all models were tested on the same independent test set under the same image resolution, inference environment, and evaluation protocol. The proposed single-channel input was retained only for FGD-YOLO because it is one of the intended improvements of the proposed framework for grayscale mandrel bar images. Since missed-alarm control and real-time deployment are both critical in this task, the compared models are assessed from the perspective of overall engineering trade-off.

Low FNR and FPR values alone do not necessarily indicate accurate crack-region segmentation, because these image-level indicators only reflect whether a crack warning is triggered. Since the predicted masks are further used for geometric parameter calculation, pixel-level segmentation quality is also important. Therefore, Dice is used as a common pixel-level metric across all compared models, while mAP@0.5(mask) and mAP@0.5(bbox) are used only for YOLO-based instance segmentation models to avoid direct comparison using incompatible metrics.
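For reference, the standard Dice coefficient between a predicted mask P and a ground-truth mask T is 2|P ∩ T| / (|P| + |T|). The sketch below is a minimal pure-Python illustration on nested lists. The function name and the convention that two empty masks score 1.0 are assumptions made here, not details reported in the study.

```python
def dice_coefficient(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")
    intersection = sum(
        p & t
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks: the prediction misses the rightmost column of the crack.
pred = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
target = [[0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
score = dice_coefficient(pred, target)  # 2 * 4 / (4 + 6) = 0.8
```

In this toy case an image-level alarm would still be triggered correctly, while Dice falls to 0.8 because part of the crack region is missing from the mask.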

For the semantic segmentation baselines, the results in Table 7 show that these models do not provide the most favorable overall performance for the present task.

Results of FNR, FPR, mAP, and FPS are reported as mean ± standard deviation over three independent runs. FNR and FPR are used as image-level alarm indicators. Due to different prediction forms, semantic segmentation baselines and YOLO-based instance segmentation models are interpreted separately. mAP@0.5(mask) and mAP@0.5(bbox) are reported only for YOLO-based instance segmentation models, while Dice is used as the common pixel-level segmentation quality metric across all compared models. Params and GFLOPs are used to evaluate model size and computational complexity, respectively.

Although U-Net, DeepLabv3+, and BiSeNetV2 achieve favorable FNR or FPR in some cases, their Dice scores are clearly lower than those of the YOLO-based models, indicating less complete crack-region representation. U-Net gives the lowest FNR, but its FPR is as high as 17.82%, and its model complexity is also much higher, making it unfavorable for online deployment. DeepLabv3+ controls false alarms better, but both configurations still require large numbers of parameters and GFLOPs, while their Dice scores remain around 81%. BiSeNetV2 is lighter and faster, but its Dice is only 75%. Overall, under the present engineering trade-off involving alarm behavior, pixel-level segmentation quality, model complexity, and inference speed, the semantic segmentation baselines do not present a clear advantage over FGD-YOLO.

Among the YOLO-based instance segmentation models, the differences lie mainly in missed-alarm control, false-alarm suppression, and mask-level segmentation quality. YOLOv5n-seg achieves a relatively low FNR, but its FPR is the highest and its mask mAP is much lower than those of the newer YOLO variants. This suggests that it responds more aggressively to crack-like features, which helps reduce missed crack alarms but also leads to insufficient background suppression and poorer segmentation quality. YOLOv8n-seg and YOLO12n-seg show stronger overall segmentation performance, but their FNR remains higher than that of FGD-YOLO, indicating that weak or locally faint crack cues are still not preserved sufficiently. The original YOLO11n-seg yields a relatively low FPR, indicating better suppression of crack-like background interference, but its FNR is also higher than that of FGD-YOLO, which points to the same loss of faint crack cues. Overall, the YOLO-based baselines exhibit different characteristics in crack sensitivity and background suppression, whereas FGD-YOLO places greater emphasis on reducing missed crack alarms while keeping false alarms at an acceptable level under practical inspection conditions.

By comparison, FGD-YOLO achieves a more favorable overall trade-off while maintaining high segmentation performance. Relative to the original YOLO11n-seg baseline, it reduces the mean alarm-level FNR from 2.41 ± 0.7% to 1.72 ± 0.3% and increases FPS from 166 ± 3 to 204 ± 1, while mAP@0.5(mask) remains comparable, changing from 88.2 ± 0.4% to 88.5 ± 1.1%. The FPR of FGD-YOLO is higher than that of the original YOLO11n-seg baseline, indicating a sensitivity–false-alarm trade-off rather than a uniform improvement in all indicators. Overall, FGD-YOLO mainly improves real-time inference efficiency while maintaining competitive segmentation performance. Because only three independent runs were conducted, the confidence intervals are used as descriptive stability information rather than definitive statistical significance evidence. In particular, the FPS interval of FGD-YOLO ranges from 201.5 to 206.5, whereas that of YOLO11n-seg ranges from 158.6 to 173.5, indicating a stable advantage in inference efficiency. For FNR, FGD-YOLO shows a lower mean value, although the corresponding confidence intervals still overlap to some extent: the interval ranges from 0.97 to 2.47 for FGD-YOLO and from 0.67 to 4.15 for YOLO11n-seg. Therefore, the FNR difference is interpreted as a favorable tendency in missed-alarm control under the current evaluation protocol, rather than as a statistically conclusive improvement. Similarly, the mAP@0.5(mask) result is interpreted as comparable segmentation performance rather than a statistically significant increase. To better illustrate the relationship between missed-alarm control and real-time deployment efficiency among the YOLO-based models, the FNR-FPS trade-off is shown in Figure 8. FGD-YOLO occupies a more favorable position in this plot, indicating a lower mean missed-alarm risk with higher real-time inference efficiency.
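The interval construction is not restated in this section. As a hedged check, the quoted intervals are approximately reproduced by a two-sided 95% Student's t interval, mean ± t·s/√n, with n = 3 runs and s taken as the reported standard deviation. This convention is an assumption, the helper names are illustrative, and small discrepancies are expected because the reported standard deviations are rounded.

```python
import math

# Two-sided 95% critical value of Student's t with 2 degrees of freedom (n = 3).
T_CRIT_95_DF2 = 4.302653


def t_interval(mean, std, n=3, t_crit=T_CRIT_95_DF2):
    """Return (low, high) = mean -/+ t_crit * std / sqrt(n).

    t_crit must correspond to n - 1 degrees of freedom.
    """
    half_width = t_crit * std / math.sqrt(n)
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Means and standard deviations as reported in the text.
fps_fgd = t_interval(204, 1)
fps_base = t_interval(166, 3)
fnr_fgd = t_interval(1.72, 0.3)
fnr_base = t_interval(2.41, 0.7)

fps_overlap = intervals_overlap(fps_fgd, fps_base)
fnr_overlap = intervals_overlap(fnr_fgd, fnr_base)
```

Under this assumption the FPS intervals are disjoint while the FNR intervals overlap, which matches the cautious interpretation given above.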

To further examine model behavior under different confidence thresholds, the FNR and FPR of FGD-YOLO and the original YOLO11n-seg were compared, as shown in Figure 9a,b. As shown in Figure 9a, FNR increases with increasing confidence threshold for both models, indicating that stricter thresholding suppresses weak predictions and makes weak, thin, or locally faint crack responses more likely to be missed. By comparison, FGD-YOLO maintains lower alarm-level FNR over most of the practical threshold range, especially from 0.2 to 0.5, showing better retention of weak crack cues. As shown in Figure 9b, FPR decreases as the confidence threshold increases for both models, which is consistent with the suppression of low-confidence false alarms. However, under most identical threshold settings, FGD-YOLO shows slightly higher FPR than YOLO11n-seg. This indicates that the proposed method reduces missed crack alarms by strengthening the response to weak, thin, and discontinuous crack features, but also becomes more sensitive to crack-like background structures. Taken together, these results show that, over most of the practical threshold range, FGD-YOLO achieves lower alarm-level FNR at the cost of a limited increase in alarm-level FPR. For practical mandrel bar inspection, where the cost of missed crack alarms is higher than that of false alarms, this behavior is more consistent with engineering requirements. Overall, the above results suggest that FGD-YOLO shows practical value for mandrel bar crack inspection because it provides lower missed-alarm risk, competitive segmentation quality, and higher real-time applicability under consistent evaluation settings.
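The threshold dependence described above can be sketched as follows. The sketch assumes that an image-level alarm is raised when the highest prediction confidence in an image reaches the threshold, that FNR is the share of crack images without an alarm, and that FPR is the share of crack-free images with an alarm. The scores and labels are hypothetical and are not data from the study.

```python
def alarm_rates(max_confidences, has_crack, threshold):
    """Image-level (FNR, FPR) at one confidence threshold.

    max_confidences[k] is the highest prediction confidence in image k
    (0.0 if nothing was predicted); has_crack[k] is the ground-truth label.
    """
    fn = fp = n_pos = n_neg = 0
    for conf, crack in zip(max_confidences, has_crack):
        alarm = conf >= threshold
        if crack:
            n_pos += 1
            fn += not alarm  # crack image without an alarm
        else:
            n_neg += 1
            fp += alarm      # crack-free image with an alarm
    fnr = fn / n_pos if n_pos else 0.0
    fpr = fp / n_neg if n_neg else 0.0
    return fnr, fpr


# Hypothetical per-image maximum confidences and labels.
scores = [0.91, 0.62, 0.35, 0.18, 0.05, 0.27, 0.55, 0.00]
labels = [True, True, True, True, False, False, False, False]

sweep = {t: alarm_rates(scores, labels, t) for t in (0.1, 0.3, 0.5)}
```

On this toy data, raising the threshold from 0.1 to 0.5 increases FNR from 0 to 0.5 and decreases FPR from 0.5 to 0.25, reproducing the opposing trends seen in Figure 9.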

4.4. Qualitative Analysis Under Challenging Industrial Conditions

To further evaluate model performance, representative crack cases under challenging industrial conditions are compared in Figure 10. The compared models include YOLOv8n-seg, YOLO12n-seg, YOLO11n-seg, and the proposed FGD-YOLO. The selected examples cover different visual difficulties together with a representative false-positive case.

Group (a) shows a crack with uneven visual saliency along its path. The coarse part is relatively clear, whereas the thinner extension is much weaker under background interference. YOLOv8n-seg detects the obvious segment but misses the thinner part, while YOLO11n-seg and YOLO12n-seg produce weaker responses. By comparison, FGD-YOLO yields a clearer and more continuous prediction. This shows better preservation of weak and thin crack cues.

Group (b) presents a slender and elongated crack, with local width variation along its path. All models detect the crack and cover its main region well. However, FGD-YOLO gives a tighter detection box that more closely matches the crack extent. This result indicates better localization compactness for slender crack structures.

Group (c) shows discontinuous cracks, which make crack perception and localization more difficult. In the first sample, YOLO11n-seg only responds to a local segment and fails to represent the crack completely, indicating the limited ability of the original baseline to handle interrupted crack responses. YOLOv8n-seg and YOLO12n-seg detect the crack, but they also include the non-crack feature below the crack region, resulting in relatively large prediction boxes. FGD-YOLO, in contrast, gives a tighter prediction in the first sample and is the only model that successfully detects the crack in the second sample. These results indicate that the proposed improvements are more effective for discontinuous crack features and can better suppress interference from similar background structures.

Group (d) corresponds to a low-contrast crack located in a dark background region. Uneven illumination further reduces the grayscale difference between the crack and the surrounding surface, making the crack boundary obscure and difficult to distinguish from the background texture. Under this condition, the compared baselines fail to detect the crack, whereas FGD-YOLO still identifies it effectively. This highlights the stronger adaptability of the proposed framework to low-contrast cracks under dark and unevenly illuminated backgrounds.

Group (e) presents a crack under a complex industrial background containing oxide peeling, stain-like interference, and mandrel surface textures. These background features generate crack-like responses and increase the difficulty of crack–background discrimination. Since the crack itself is also weak, it is easily submerged by the surrounding interference. Under this condition, only FGD-YOLO detects the crack successfully. This suggests better resistance to complex background interference and better preservation of weak crack cues.

Group (f) shows a typical crack with clear visual characteristics. Although all models detect it, YOLOv8n-seg and YOLO11n-seg produce duplicate boxes, while YOLO12n-seg gives a relatively large prediction box. In contrast, FGD-YOLO detects the crack with a single prediction box and without redundant responses. This reflects more stable prediction behavior for common crack samples.

Group (g) gives a representative false-positive example. Oxide peeling with a crack-like appearance is misclassified as a crack by FGD-YOLO, while the other models do not produce a detection. This suggests that the stronger sensitivity of the proposed framework to crack-like structures may also increase the risk of alarm-level false positives on similar background patterns, which is consistent with its slightly higher FPR in some comparisons.

5. Application

5.1. Online Detection System

Here, detection refers to online crack warning, crack localization, mask segmentation, and geometric parameter calculation. The FGD-YOLO model was deployed on a hot-rolled seamless steel pipe production line to examine its on-site feasibility for mandrel bar crack detection. As shown in Figure 11, the detection system is placed in the mandrel bar return area, after extraction from the mandrel mill and before the cooling bed. As shown in the actual deployment in Figure 12, the system primarily comprises line-scan industrial cameras, mounting brackets, protective components, a non-contact laser velocimeter, and an image-acquisition and processing unit. The laser velocimeter is used to record the running speed of the mandrel bar and provides positional information for calculating the crack-head distance in the online geometric-parameter interface.
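The text does not detail how the crack-head distance is derived from the velocimeter signal. One plausible approach, sketched here purely as an assumption, is to integrate the sampled bar speed from the moment the bar head passes the cameras until the crack is imaged. The function name, sampling instants, and speed values are hypothetical.

```python
def distance_travelled(timestamps_s, speeds_m_per_s):
    """Integrate sampled speed over time with the trapezoidal rule (metres)."""
    if len(timestamps_s) != len(speeds_m_per_s):
        raise ValueError("timestamps and speeds must have the same length")
    distance = 0.0
    for k in range(1, len(timestamps_s)):
        dt = timestamps_s[k] - timestamps_s[k - 1]
        distance += 0.5 * (speeds_m_per_s[k] + speeds_m_per_s[k - 1]) * dt
    return distance


# Hypothetical samples: the bar head passes at t = 0 s, the crack is imaged at t = 2 s.
t = [0.0, 0.5, 1.0, 1.5, 2.0]
v = [1.0, 1.2, 1.2, 1.1, 1.0]
crack_head_distance_m = distance_travelled(t, v)  # about 2.25 m
```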

To satisfy the requirements of large-area coverage and high-speed inspection, line-scan cameras are adopted for image acquisition. As shown in Figure 13, four line-scan cameras are evenly distributed around the circumference of the mandrel bar at a height of 500 mm to ensure full surface coverage. The captured surface images are streamed in real time to the acquisition and processing unit, which then feeds them into the FGD-YOLO model for crack detection. The main mandrel bar parameters and camera specifications are summarized in Table 8 and Table 9, respectively.

5.2. Crack Geometric Parameter Calculation

The crack segmentation results from the FGD-YOLO network model provide binary mask images for the mandrel bar cracks. Based on these binary mask images, several geometric parameters of each crack region are computed online, including area, length, maximum width, and average width. In the current production-line inspection system, the main requirement is to rapidly identify crack locations and provide size-related indicators for defect warning and maintenance decisions, rather than to perform high-precision offline metrological measurement. Therefore, the segmented crack region is used for a simple and stable geometric calculation process. Compared with more detailed path-based or contour-based measurement methods, this formulation requires less post-processing and is more suitable for real-time online extraction. Meanwhile, segmentation quality, crack morphology, and pixel-to-physical size conversion should be considered when interpreting the calculated parameters. The specific method is as follows:

First, the obtained crack binary mask M is represented in the form of a binary matrix:

M = \{M_{i, j} |1 \leq i \leq H, 1 \leq j \leq W\}, M_{i, j} \in \{0, 1\}

(7)

Here, H and W denote the height and width of the mask image, respectively, and i and j represent the row and column indices. When M_{i, j} = 1, the corresponding pixel belongs to the mandrel bar crack region. Connected component analysis is then performed on the binary mask M. The crack pixel coordinate set S is constructed based on the spatial coordinates of the pixels in the mask, as follows:

S = \{(i, j) \mid M_{i, j} = 1\}

(8)
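A minimal sketch of the connected component step and the construction of S is given below. It assumes 8-connectivity, which is not specified in the text, and returns one coordinate set per crack region using the 1-based (i, j) indexing of Equation (7). The function name is illustrative.

```python
from collections import deque


def connected_components(mask):
    """Split a binary mask into 8-connected components (assumed connectivity).

    Returns a list of coordinate sets S, one per component, with 1-based
    (row i, column j) indices as in Equations (7) and (8).
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), set()
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.add((y + 1, x + 1))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


M = [
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
regions = connected_components(M)
```

Here the toy mask yields two regions, so the geometric parameters would be computed once per region rather than once per image.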

Geometric analysis is performed based on the binary mask M and the corresponding crack pixel coordinate set S. Then, the crack area, length, maximum width, and average width are calculated as follows:

1. The area A of the crack is defined as the number of crack pixels (pixels with a value of 1) in the binary mask M, as follows:

A = \sum_{i = 1}^{H} \sum_{j = 1}^{W} M_{i, j}

(9)

2. The length L of the crack is defined as the span along the column index j in the segmented mask image, as follows:

L = |j_{r} - j_{l}|

(10)

The length in Equation (10) represents the horizontal span of the crack region rather than the true crack-path length. Here, j_{r} represents the column coordinate of the rightmost crack pixel in the crack region, and j_{l} represents the column coordinate of the leftmost crack pixel.

3. As shown in Figure 14, the maximum width W_{\max} of the crack is defined as the diameter of the largest inscribed circle within the crack region. For the crack pixel set S and its boundary set \partial S, the minimum Euclidean distance from any pixel s_{k} \in S to the boundary is calculated by Equation (11). This distance is defined as the radius of the inscribed circle r_{k}:

r_{k} = \min_{b \in \partial S} \| s_{k} - b \|_{2}

(11)

The maximum radius r_{\max} is determined by selecting the largest value among all inscribed circle radii. Consequently, the maximum crack width W_{\max} is defined as twice this maximum radius, calculated as W_{\max} = 2 r_{\max}.

4. Since the crack region generally exhibits an elongated shape, its area can be approximately represented by the product of the projected crack length and the average width. Accordingly, the average width is estimated as follows:

W_{\mathrm{avg}} = A / L

(12)
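Equations (9) to (12) can be sketched in pixel units as follows. This is a brute-force illustration rather than the deployed implementation. The boundary set is assumed here to consist of crack pixels with at least one 4-neighbour outside S, and the function name is hypothetical.

```python
import math


def crack_geometry(S):
    """Pixel-unit area, projected length, maximum and average width of one region.

    S is a non-empty set of (i, j) crack-pixel coordinates as in Equation (8).
    """
    area = len(S)                                    # Eq. (9)
    cols = [j for _, j in S]
    length = max(cols) - min(cols)                   # Eq. (10)
    # Assumed boundary set: crack pixels with a 4-neighbour outside S.
    boundary = [
        (i, j) for i, j in S
        if any((i + di, j + dj) not in S
               for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    ]
    # Eq. (11): distance from each crack pixel to its nearest boundary pixel.
    r_max = max(
        min(math.hypot(i - bi, j - bj) for bi, bj in boundary)
        for i, j in S
    )
    w_max = 2 * r_max
    # Eq. (12); undefined when the region spans a single column.
    w_avg = area / length if length > 0 else float("nan")
    return {"area": area, "length": length, "w_max": w_max, "w_avg": w_avg}


# Toy region: a rectangle 5 pixels high and 21 pixels wide.
S = {(i, j) for i in range(1, 6) for j in range(1, 22)}
g = crack_geometry(S)
```

For this 5 × 21 rectangle the sketch gives A = 105, L = 20, W_max = 4, and W_avg = 5.25. Both widths deviate slightly from the nominal 5 pixels: the column span is one less than the number of columns, and the distance is measured to boundary pixels that themselves belong to the crack. A Euclidean distance transform such as `scipy.ndimage.distance_transform_edt` or `cv2.distanceTransform` would be the usual faster alternative, although it measures the distance to the nearest background pixel and therefore differs by about one pixel.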

The above geometric quantities are first calculated in the image coordinate system based on the number of pixels. Then, using the pixel-to-physical size calibration value, the results are converted from pixel units to actual physical dimensions. Although the converted values are expressed in physical units, they are still derived from the predicted crack mask and the corresponding geometric definitions. Therefore, this calculation is based on the assumption that the predicted binary mask preserves the main crack region with acceptable continuity and boundary integrity, and that the crack region generally exhibits an elongated morphology. This allows the projected length and area-based width estimation to be used as stable size-related descriptors for online inspection, while avoiding complex skeleton extraction or contour tracking. However, for cracks with strong curvature, inclination, branching, local expansion, or fragmented segmentation, the calculated parameters may deviate from more detailed path-based or contour-based measurements. In addition, mask over-segmentation, under-segmentation, boundary noise, false-positive crack-like regions, and the approximate pixel-to-physical size conversion may introduce deviations in the final geometric parameters. Therefore, these parameters are mainly used as online size-related indicators for rapid inspection and maintenance decision-making under the current production-line conditions.
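The unit conversion can be sketched as below, assuming a single isotropic calibration value in millimetres per pixel. The value 0.1 mm/pixel and the pixel-unit inputs are hypothetical. In a line-scan setup the scale along the travel direction depends on line rate and bar speed, so separate scales for the two image axes may be needed in practice.

```python
def to_physical(params_px, mm_per_pixel):
    """Convert pixel-unit crack parameters to millimetres.

    Lengths scale with mm_per_pixel and areas with its square.
    Assumption: one isotropic scale for both image axes.
    """
    if mm_per_pixel <= 0:
        raise ValueError("mm_per_pixel must be positive")
    return {
        "area_mm2": params_px["area"] * mm_per_pixel ** 2,
        "length_mm": params_px["length"] * mm_per_pixel,
        "w_max_mm": params_px["w_max"] * mm_per_pixel,
        "w_avg_mm": params_px["w_avg"] * mm_per_pixel,
    }


# Hypothetical pixel-unit values and calibration.
params_px = {"area": 105, "length": 20, "w_max": 4.0, "w_avg": 5.25}
params_mm = to_physical(params_px, mm_per_pixel=0.1)
```

Because the area scales with the square of the calibration value, a calibration error of a few percent roughly doubles in relative terms for the area while affecting length and width linearly.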

5.2.1. Online Visualization

For field applications, the system visualizes crack geometric parameters in real time. As shown in Figure 15, each crack region is highlighted with a green rectangle and a red mask, while the geometric parameter outputs are displayed, providing an intuitive representation of the crack’s shape and size.

5.2.2. Field Comparison

To further illustrate the field applicability of the proposed method, a typical circumferential crack detected on a mandrel bar was selected for preliminary field comparison. Figure 16 shows the relationship among the ruler-referenced field image, industrial camera images, cropped crack regions, and online detection outputs. Panel (a) shows a field image captured by a mobile phone with a ruler. The red dashed boxes c1, c2, and c3 indicate the approximate positions corresponding to the cropped crack regions. Since the mandrel bar is cylindrical and the crack extends along the circumferential direction, a single mobile phone view can only record a local visible region. It cannot cover the whole circumferential crack.

In the online system, four industrial cameras are arranged around the mandrel bar to cover the circumferential surface. Panels (b) and (c) are 2048 × 2048-pixel images captured by the second and third cameras, respectively. They correspond to adjacent fields of view around the mandrel bar and record different circumferential parts of the same crack. It should be noted that panels (b) and (c) are raw images from different camera views. They were not subjected to cylindrical surface unwrapping, spatial registration, or panoramic stitching. Therefore, although these images correspond to continuous regions of the same crack in physical space, the crack segments may not appear visually continuous in the two-dimensional layout.

To meet the input size of the model, each original 2048 × 2048-pixel image was cropped into several 512 × 512-pixel sub-images. Panels (d), (e), and (f) show the cropped crack regions. They approximately correspond to c1, c2, and c3 in panel (a), respectively. Panels (g), (h), and (i) show the corresponding online detection outputs. In these results, the green rectangle indicates the detected crack region. The red text shows the geometric parameters output by the system, including crack type, crack-head distance, maximum width, and length. The yellow point indicates the center of the maximum inscribed circle calculated from the crack mask. It is used as the local reference point for maximum-width calculation. Since the system is deployed in a Chinese factory, the online interface is configured in Chinese according to field operating requirements. This language setting only affects the interface display and does not affect model inference or geometric parameter calculation. The crack-head distance is displayed to meet field deployment requirements and is obtained from the laser velocimeter integrated into the inspection system. It is mainly used for field localization and is therefore not further analyzed in this study.
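The cropping step can be illustrated with a minimal sketch that assumes a non-overlapping grid, which gives 16 sub-images per 2048 × 2048 frame. The text does not state whether the crops overlap, and the function names and the example box are hypothetical.

```python
def tile_origins(image_size=2048, tile_size=512):
    """Top-left (row, col) of each tile in a non-overlapping grid."""
    if image_size % tile_size:
        raise ValueError("image_size must be a multiple of tile_size")
    steps = range(0, image_size, tile_size)
    return [(r, c) for r in steps for c in steps]


def tile_to_image(box, origin):
    """Map an (x1, y1, x2, y2) box from tile to full-image coordinates."""
    x1, y1, x2, y2 = box
    row0, col0 = origin
    return (x1 + col0, y1 + row0, x2 + col0, y2 + row0)


origins = tile_origins()
# Hypothetical detection in the tile whose origin is (512, 512).
full_box = tile_to_image((100, 40, 300, 90), origins[5])
```

Mapping detections back to full-frame coordinates in this way would allow crack segments that fall in adjacent tiles of the same camera image to be related to one another.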

As shown in Figure 16, the system can locate the main crack regions in actual industrial images and output the corresponding size parameters. For the longer circumferential crack regions in c1 and c3, the detection boxes cover the main visible crack regions in the cropped images. For the more complex crack morphology in c2, the system still provides an effective localization result. This suggests that the proposed method can support crack localization and visual output under field conditions.

Table 10 further compares the ruler-referenced field measurements with the online system outputs. For maximum width, the differences between the online results and the reference values for c1, c2, and c3 are all 0.1 mm. This is because the maximum width is mainly determined by the local widest part of the crack and is less dependent on the circumferential extension of the crack and the image coverage range. In contrast, the length parameter is more affected by imaging viewpoint, curved-surface projection, crack endpoint judgment, and segmentation completeness. For c1 and c2, the length differences are 3.0 mm and 2.4 mm, respectively, with an average difference of 2.7 mm. This shows that the online system can reflect the main scale of the crack, but the length result should be regarded as a field size reference rather than a strict metrological value. For c3, the manual length was measured only from the locally visible crack segment in the mobile-phone image, whereas the online result was calculated from the crack span detected in the industrial camera image. Since the two measurements do not cover exactly the same crack range, the length difference in c3 was not included in the average value. Overall, this preliminary field comparison shows that the proposed method can provide width information that is generally close to the ruler reference. It can also output projected crack length information as a practical online reference for crack warning, defect screening, and maintenance decision-making. This comparison is intended as a representative field demonstration rather than a statistically comprehensive metrological error evaluation, and larger-scale physically measured crack samples will be required for more robust error statistics in future work.

5.3. Limitations and Implications

Although the proposed geometric parameter calculation method can be used for rapid online evaluation, several limitations should be noted. First, the crack length used in this study is defined as the horizontal span of the crack region in the segmented mask rather than the true crack-path length. Therefore, it should be interpreted as a projected size-related indicator rather than a complete curvilinear crack length. This simplified definition is suitable for stable and efficient online evaluation, but bias may occur when the crack exhibits significant curvature, inclination, branching, or local expansion. In these cases, the projected length may underestimate the actual crack-path length, and the error level is morphology-dependent, depending on crack curvature, inclination, branching pattern, endpoint definition, and mask continuity. Second, while the proposed method showed favorable performance in the current production environment, its performance stability has mainly been evaluated within a single production-line inspection scenario. Thus, the reported results should be understood as evidence of effectiveness and stability under the current industrial conditions, rather than as a complete demonstration of cross-condition generalization capability. Cross-factory and cross-device validation involving different rolling mills, imaging systems, lighting conditions, and mandrel bar types has not yet been conducted. The possible influence of more complex image noise, broader lighting variations, different surface states, camera settings, or domain shifts under other production conditions still deserves further investigation. In addition, mask quality, crack morphology, and pixel-to-physical size conversion may introduce deviations in the calculated geometric parameters. 
In the current implementation, geometric-parameter extraction failure mainly occurs when no valid crack mask is generated, while heavily fragmented masks may reduce the reliability of projected length and width estimation.
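The underestimation for inclined cracks can be made concrete with a small sketch. For a straight crack inclined at an angle θ to the image horizontal, the projected length equals the path length multiplied by cos θ. The toy region below is a hypothetical one-pixel-wide crack at 45°, and the endpoint distance is used as a stand-in for the path length, which is valid only for a straight crack.

```python
import math


def projected_length(S):
    """Horizontal span of the region, as in Equation (10)."""
    cols = [j for _, j in S]
    return max(cols) - min(cols)


def endpoint_distance(S):
    """Euclidean distance between the leftmost and rightmost crack pixels.

    Equals the crack-path length only for a straight crack.
    """
    left = min(S, key=lambda p: p[1])
    right = max(S, key=lambda p: p[1])
    return math.hypot(right[0] - left[0], right[1] - left[1])


# Hypothetical straight crack inclined at 45 degrees, 41 pixels long.
S = {(k, k) for k in range(1, 42)}
proj = projected_length(S)   # 40 pixels
path = endpoint_distance(S)  # 40 * sqrt(2), about 56.6 pixels
ratio = proj / path          # cos(45 degrees), about 0.71
```

In this case the projected length captures only about 71% of the crack extent, and the shortfall grows as the crack approaches the vertical direction of the image.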

From an engineering perspective, the proposed method still provides a practical balance between computational efficiency and geometric parameter calculation capability. It can support real-time crack localization, visualization, and geometric parameter output on the production line, which is valuable for rapid defect screening and on-site decision-making. Future work will examine the cross-condition generalization capability and practical applicability of the proposed framework under more diverse and complex industrial conditions, for example, by collecting multi-source production-line data, introducing stronger illumination and noise augmentation, and exploring domain-adaptation or fine-tuning strategies. In addition, more refined geometry extraction methods, such as skeleton-based crack length calculation and contour-based width estimation, will be investigated for curved, inclined, or circumferential cracks. These improvements may provide more flexible geometric characterization for complex crack morphologies while maintaining the current framework as a rapid online evaluation approach.

In practical applications, the proposed system is expected to reduce the workload of manual visual inspection, improve the timeliness of crack warning, and provide crack size-related information for mandrel bar maintenance decisions. By supporting earlier identification of damaged mandrel bars, the system may help lower the risk of unplanned downtime and reduce quality losses associated with crack-induced inner-surface defects in seamless steel tubes.

6. Conclusions

From an industrial application perspective, the proposed segmentation-based inspection framework is designed for online mandrel bar inspection in seamless tube rolling lines. The main conclusions are summarized as follows:

1. The proposed FGD-YOLO framework reformulates mandrel bar crack inspection from bounding-box-based localization to pixel-level segmentation. By introducing single-channel grayscale input adaptation, lightweight network reconfiguration, and crack-oriented feature enhancement, the proposed method improves the extraction of weak, thin, irregular, and discontinuous cracks while maintaining real-time inference capability. Experimental results show that FGD-YOLO achieves a favorable balance among alarm reliability, pixel-level segmentation quality, and inference efficiency compared with the original YOLO11n-seg baseline.

2. A dedicated industrial dataset and a background-sample balancing strategy were introduced to improve crack–background discrimination under background-dominated production conditions. By incorporating crack-free images containing oxide scales, water stains, and surface textures, the training and evaluation process better reflects the practical online inspection scenario, where a large number of non-defective images are continuously encountered.

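3. Based on the predicted pixel-level masks, crack area, projected length, maximum width, and average width can be rapidly calculated online as size-related indicators. In the field comparison, the online maximum width differed from the ruler-referenced value by 0.1 mm for all three samples, and the online length differed by 2.7 mm on average for the two samples covering the same crack range (Table 10). These results indicate that the proposed framework can provide practical size-related information for crack warning, defect screening, and mandrel bar maintenance decision-making.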
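
The size-related indicators in conclusion 3 can be illustrated with a minimal sketch. It assumes, hypothetically, that the bar axis runs along the image columns, that the projected length is the column extent of the mask, and that the average width is the area divided by the projected length; these conventions, the function name `crack_geometry`, and the calibration factor `mm_per_px` are illustrative and are not taken from the paper.

```python
from typing import Dict, List

Mask = List[List[int]]  # 1 = crack pixel, 0 = background


def crack_geometry(mask: Mask, mm_per_px: float) -> Dict[str, float]:
    """Area, projected length and average width of one binary crack mask.

    Assumptions (illustrative, not from the paper): the bar axis runs along
    image columns, projected length is the column extent of the mask, and
    average width is area divided by projected length.
    """
    # Column index of every crack pixel; its count is the area in pixels.
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not cols:
        return {"area_mm2": 0.0, "length_mm": 0.0, "avg_width_mm": 0.0}
    area_px = len(cols)
    length_px = max(cols) - min(cols) + 1
    return {
        "area_mm2": area_px * mm_per_px ** 2,
        "length_mm": length_px * mm_per_px,
        "avg_width_mm": (area_px / length_px) * mm_per_px,
    }
```

Because each quantity needs only one pass over the mask pixels, this kind of calculation adds little cost on top of segmentation, which is consistent with its use as a rapid online indicator.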

Overall, the proposed method offers a feasible technical option for online mandrel bar crack inspection and geometric parameter calculation under actual production-line conditions. However, the present study is still limited to crack defects under the current production-line scenario. Although the framework was developed for mandrel bar cracks, its single-channel input adaptation and lightweight segmentation design may provide a transferable basis for other grayscale steel surface defect inspection tasks after category-specific annotation and retraining or fine-tuning. Future work will focus on validating the method under more diverse production conditions, extending it to other steel surface defects, and improving geometric measurement for curved, inclined, or fragmented cracks.

Author Contributions

Conceptualization, J.C., Z.S., J.D. and X.L.; methodology, Z.S.; software, Z.S.; validation, Z.S. and J.C.; formal analysis, J.C., Z.S., J.D. and X.L.; investigation, Z.S.; resources, J.C., J.D. and X.L.; data curation, J.C., Z.S., J.D. and X.L.; writing—original draft preparation, Z.S.; writing—review and editing, J.C., Z.S., J.D. and X.L.; visualization, Z.S.; supervision, J.C., J.D. and X.L.; project administration, J.C., J.D. and X.L.; funding acquisition, J.C., J.D. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Liaoning Provincial Science and Technology Program (Joint Program Project) under Grant No. 2025JH2/101800461, and by the Key Research Projects of the Education Department of Liaoning Province under Grant No. JYTZD2023159.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to industrial confidentiality and restrictions related to production-line image data.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Akiyama, M.; Tsubouchi, K.; Tsumura, M.; Hori, H. A study of the mechanism of crack initiation on a mandrel bar for mandrel mill rolling in a seamless tube. Proc. Inst. Mech. Eng. Part J. 2000, 214, 179–187.
Li, Z.L.; Zhang, R.; Chen, D.; Xie, Q.; Kang, J.; Yuan, G.; Wang, G.D. Quenching stress of hot-rolled seamless steel tubes under different cooling intensities based on simulation. Metals 2022, 12, 1363.
Derazkola, H.A.; Fauconnier, D.; Kalácska, A.; Garcia, E.; Murillo-Marrodán, A.; Baets, P.D. Tribological behaviour of DIN 1.2740 hot working tool steel during mandrel mill stretching process. Tribol. Int. 2025, 202, 110361.
Li, C.; Shuang, Y.H.; Chen, J.X.; Zhou, Y.; Wang, C.; Chen, C.; Gou, Y.J.; Dong, B. Research on the impact of mandrels in titanium tubes during tube continuous rolling. Mater. Res. Express 2023, 10, 086512.
He, Q.X.; Dosbaeva, G.K. Microstructural evaluation and failure analysis of flow forming mandrels: Case studies. Eng. Fail. Anal. 2026, 184, 110322.
Zhang, Z.; Liu, D.; Zhang, R.Q.; Yang, Y.H.; Pang, Y.H.; Wang, J.G.; Wang, H. Experimental and numerical analysis of rotary tube piercing process for producing thick-walled tubes of nickel-base superalloy. J. Mater. Process. Technol. 2020, 279, 116557.
Ahmadi, S.; Khormali, A. Petroleum emulsion stability and separation strategies: A comprehensive review. ChemEngineering 2025, 9, 113.
Chachuat, B.; Srinivasan, B.; Bonvin, D. Adaptation strategies for real-time optimization. Comput. Chem. Eng. 2009, 33, 1557–1567.
Tang, B.; Chen, L.; Sun, W.; Lin, Z.K. Review of surface defect detection of steel products based on machine vision. IET Image Process. 2023, 17, 303–322.
Wen, X.; Shan, J.; He, Y.; Song, K.C. Steel surface defect recognition: A survey. Coatings 2023, 13, 17.
Luo, Q.W.; Fang, X.X.; Liu, L.; Yang, C.H.; Sun, Y.C. Automated visual defect detection for flat steel surface: A survey. IEEE Trans. Instrum. Meas. 2020, 69, 626–644.
He, Y.; Song, K.C.; Meng, Q.G.; Yan, Y.H. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Trans. Instrum. Meas. 2020, 69, 1493–1504.
Demir, K.; Ay, M.; Cavas, M.; Demir, F. Automated steel surface defect detection and classification using a new deep learning-based approach. Neural Comput. Appl. 2023, 35, 8389–8406.
Zhang, T.; Pan, P.F.; Zhang, J.; Zhang, X.C. Steel surface defect detection algorithm based on improved YOLOv8n. Appl. Sci. 2024, 14, 5325.
Ameri, R.; Hsu, C.C.; Band, S.S. A systematic review of deep learning approaches for surface defect detection in industrial applications. Eng. Appl. Artif. Intell. 2024, 130, 107717.
Liu, G.H.; Chu, M.X.; Gong, R.F.; Zheng, Z.H. DLF-YOLOF: An improved YOLOF-based surface defect detection for steel plate. J. Iron Steel Res. Int. 2024, 31, 442–451.
Zhang, L.M.; Wang, Z.K.; Ma, Y.; Li, G.W. Steel surface defect detection algorithm based on improved YOLOv10. Sci. Rep. 2025, 15, 32827.
Lu, J.B.; Yu, M.M.; Liu, J.Y. Lightweight strip steel defect detection algorithm based on improved YOLOv7. Sci. Rep. 2024, 14, 13267.
Han, J.F.; Cui, G.Q.; Li, Z.W.; Zhao, J.X. DBCW-YOLO: A modified YOLOv5 for the detection of steel surface defects. Appl. Sci. 2024, 14, 4594.
Chen, X.C.; Lv, J.; Fang, Y.L.; Du, S.C. Online detection of surface defects based on improved YOLOV3. Sensors 2022, 22, 817.
Song, H.R. RSTD-YOLOv7: A steel surface defect detection based on improved YOLOv7. Sci. Rep. 2025, 15, 19649.
Wang, X.; Yue, Q.R.; Liu, X.G. SBDNet: A deep learning-based method for the segmentation and quantification of fatigue cracks in steel bridges. Adv. Eng. Inform. 2025, 65, 103186.
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. Med. Image Comput. Comput.-Assist. Interv. (MICCAI). Lect. Notes Comput. Sci. 2015, 9351, 234–241.
Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV) 2018; Springer: Cham, Switzerland, 2018; pp. 833–851.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068.
He, W.; Xiang, Y.; Zhu, Y.T.; Yan, T.Y.; Xu, L.F.; Li, Y.T.; Lu, J.H. A three-stage identification and quantification for underwater cracks in dams using hybrid feature learning. Eng. Struct. 2025, 339, 120625.
Wang, Y.; Li, S.Q.; Zhang, Y.; Li, Y.C. Lightweight concrete crack segmentation network for drone image with complex backgrounds using multi-scale feature fusion and optimized architecture. Constr. Build. Mater. 2025, 495, 143667.
Dong, J.X.; Wang, N.N.; Fang, H.Y.; Guo, W.T.; Li, B.; Zhai, K.J. MFAFNet: An innovative crack intelligent segmentation method based on multi-layer feature association fusion network. Adv. Eng. Inform. 2024, 62, 102584.
Mi, Z.Z.; Gao, Y.; Xu, X.Y.; Tang, J. Steel strip surface defect detection based on multiscale feature sensing and adaptive feature fusion. AIP Adv. 2024, 14, 045005.
Liu, X.; Liu, H.B.; Zhou, D.Q. Multi-scale defect detection on product surface in grayscale images using YOLOv8-Ms algorithm. Nondestruct. Test. Eval. 2026, 41, 1067–1091.
Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2017; pp. 2117–2125.
Liu, S.; Qi, L.; Qin, H.F.; Shi, J.P.; Jia, J.Y. Path aggregation network for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2018; pp. 8759–8768.
Zhang, J.; Zhu, B.; Qiu, H.S. Real-time Concrete Crack Segmentation for Bridge Structural Health Monitoring: A Lightweight YOLOv11-based Approach with Multi-scale Feature Fusion. Results Eng. 2025, 28, 108004.

Figure 1. FGD-YOLO network architecture. The arrows indicate the feature propagation direction, and different colors are used to distinguish different functional modules.

Figure 2. Comparison between the original three-channel input and the single-channel input in YOLO11n-seg: first convolution and segmentation output. The stacked blocks schematically represent feature maps after convolution. The arrows indicate data flow, different colors distinguish the input channels and their corresponding convolution kernels, and the plus sign denotes feature fusion.
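
The relationship sketched in Figure 2 can be checked numerically. When a grayscale image is replicated across three input channels, summing the three per-channel convolution responses equals a single convolution with the summed kernel. The following minimal sketch illustrates this identity for one output filter; whether the single-channel first layer of the proposed model is initialized in this way is not stated here and is an assumption, and all function names are hypothetical.

```python
from typing import List

Image = List[List[float]]
Kernel = List[List[float]]


def conv2d_valid(img: Image, k: Kernel) -> Image:
    """Plain 2D cross-correlation without padding (stride 1)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv2d_three_channel(gray: Image, kernels: List[Kernel]) -> Image:
    """Three-channel path: the same gray image feeds every channel and the
    per-channel responses are summed (the plus sign in the schematic)."""
    per_channel = [conv2d_valid(gray, k) for k in kernels]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [
        [sum(ch[i][j] for ch in per_channel) for j in range(cols)]
        for i in range(rows)
    ]


def collapse_kernels(kernels: List[Kernel]) -> Kernel:
    """Single-channel path: sum the per-channel kernels once, offline."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    return [[sum(k[a][b] for k in kernels) for b in range(kw)] for a in range(kh)]
```

Under this assumption the single-channel path produces the same response with one third of the first-layer multiply-accumulate operations, since the channel replication and two of the three per-channel convolutions are avoided.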

Figure 3. Schematic diagram of multi-branch receptive-field modulation.

Figure 4. Illustration of G2L-CRM.

Figure 5. Illustration of the proposed C3k2-DWR module: (a) DWR branch for multi-scale contextual feature extraction using parallel dilated convolutions; (b) SIR branch for compact residual feature refinement. The arrows indicate the feature propagation direction. BN denotes batch normalization, ReLU denotes rectified linear unit, DConv denotes dilated convolution, and D-3 and D-5 denote dilation rates of 3 and 5, respectively.
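
To make the effect of the dilation rates in the caption concrete, the standard relation for the spatial extent covered by a dilated kernel, k_eff = k + (k - 1)(d - 1), can be evaluated directly. The sketch below assumes a 3 × 3 base kernel, which is not stated in the caption; the function name is hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated convolution kernel.

    Standard relation: k_eff = k + (k - 1) * (d - 1).
    """
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive")
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Assumed 3 x 3 base kernel with the dilation rates named in the caption.
extents = {d: effective_kernel_size(3, d) for d in (1, 3, 5)}
```

With these assumptions, the parallel branches cover 3-, 7-, and 11-pixel extents using the same number of weights per kernel, which is the usual motivation for using dilation to enlarge context for thin, elongated structures.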

Figure 6. Dataset examples. The red markings in the annotation row indicate the pixel-wise annotated crack regions.

Figure 7. FNR and FPR under different background-ratio settings for single-channel input. (a) Validation background ratio of 0.3; (b) validation background ratio of 0.5; (c) validation background ratio of 1.

Figure 8. FNR-FPS trade-off of the compared YOLO models.

Figure 9. Sensitivity of FGD-YOLO and YOLO11n-seg to different confidence thresholds: (a) FNR; (b) FPR.

Figure 10. Qualitative comparison of representative crack cases on the crack dataset. (a) Crack with uneven visual saliency and weak thin extension; (b) slender and elongated crack with local width variation; (c) discontinuous cracks with interrupted crack responses; (d) low-contrast crack in a dark background region; (e) weak crack under complex industrial background interference; (f) typical crack with clear visual characteristics; (g) representative false-positive example caused by crack-like oxide peeling.

Figure 11. Mandrel bar handling in seamless pipe production.

Figure 12. Detection system layout.

Figure 13. Camera arrangement: (a) axonometric view; (b) front view.

Figure 14. Maximum crack width calculation. The gray region represents the crack mask, the black pixels indicate the crack boundary, the red circle denotes the maximum inscribed circle used to calculate the maximum width, and the blue circle represents an example candidate inscribed circle.
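
The maximum-inscribed-circle idea in Figure 14 can be sketched with a brute-force search: for every crack pixel, find the distance to the nearest background pixel, and take the largest such distance as the radius. This is an illustrative sketch, not the deployed implementation. The conversion from radius to width (2d - 1 pixels, a pixel-center convention that is accurate to about one pixel), the calibration factor `mm_per_px`, and the function name are assumptions.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]  # 1 = crack pixel, 0 = background


def max_inscribed_width(mask: Mask, mm_per_px: float) -> Tuple[float, Tuple[int, int]]:
    """Estimate maximum crack width from the largest inscribed circle.

    Returns (width in mm, (row, col) of the circle center). Pixels outside
    the image are not treated as background in this sketch.
    """
    crack = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    background = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if not v]
    if not crack or not background:
        return 0.0, (-1, -1)
    best_d, best_px = 0.0, crack[0]
    for r, c in crack:
        # Distance from this crack pixel to the nearest background pixel.
        d = min(math.hypot(r - br, c - bc) for br, bc in background)
        if d > best_d:
            best_d, best_px = d, (r, c)
    # Assumed convention: a band w pixels wide (w odd) has d = (w + 1) / 2.
    width_px = 2.0 * best_d - 1.0
    return width_px * mm_per_px, best_px
```

The brute-force search scales with the product of crack and background pixel counts; an online system would more plausibly use a Euclidean distance transform (for example, OpenCV's `cv2.distanceTransform`), which yields the same per-pixel distances far more efficiently.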

Figure 15. Groups (a,c) present the original crack images collected on site, while groups (b,d) present the corresponding detection results produced by the proposed system. The green rectangle indicates the detected crack region, the red mask indicates the segmented crack area, and the blue circle marks the center of the maximum inscribed circle used for maximum-width calculation.

Figure 16. Comparison of the field ruler-referenced image, industrial camera images, cropped crack regions, and online detection outputs. (a) Field image of a mandrel bar crack captured by a mobile phone with a ruler, where the red dashed boxes c1, c2, and c3 indicate the approximate positions corresponding to the cropped crack regions; (b,c) 2048 × 2048-pixel images captured by the second and third industrial cameras, respectively; (d–f) 512 × 512-pixel cropped crack regions from the industrial camera images, corresponding to c1, c2, and c3, respectively; (g–i) online detection outputs corresponding to (d–f), respectively. The green rectangle indicates the detected crack region, the yellow point indicates the center of the maximum inscribed circle, and the red Chinese text shows the geometric parameters output by the field-deployed online system: crack type, crack-head distance, maximum width, and length.

Table 1. Hardware and operating system specifications.

Component	Specifications
Operating System	Windows 11 Pro (Microsoft, Redmond, WA, USA) (64-bit)
CPU	Intel Core i9-13900K (Intel, Santa Clara, CA, USA) (13th Gen)
Memory	128 GB DDR5
GPU	NVIDIA GeForce RTX 4080 (NVIDIA, Santa Clara, CA, USA)

Table 2. Software and deep learning environment.

Platform	Configuration
Programming Language	Python 3.9.23
Deep Learning Framework	PyTorch 2.0.0
CUDA	NVIDIA CUDA 11.8
GPU Driver	NVIDIA Driver 561.17

Table 3. Effect of background-to-defect ratio on the three-channel YOLO11n-seg baseline.

Background-to-Defect Ratio in Training Set	Background-to-Defect Ratio in Validation Set	FNR/%	FPR/%	mAP@0.5 (mask)/%	mAP@0.5 (bbox)/%	FPS
0	0.3	1.38	20.15	68.5	67.8	148
	0.5	1.38	21.22	72.8	73.5	157
	1	1.38	16.34	76.4	73.7	161
0.3	0.3	1.72	6.71	84.9	83.6	162
	0.5	2.07	5.96	86.8	87.0	162
	1	1.72	5.91	88.6	87.7	119
0.5	0.3	2.07	4.30	85.4	83.8	161
	0.5	1.72	4.90	88.3	87.0	162
	1	2.07	5.52	86.1	83.2	120
1	0.3	1.72	4.24	86.2	85.1	162
	0.5	1.72	3.35	89.1	87.9	165
	1	1.72	4.00	90.0	88.3	123
2	0.3	2.07	3.21	89.3	89.2	167
	0.5	2.07	3.17	88.6	88.1	168
	1	2.07	3.16	91.6	90.9	119
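
The background-to-defect ratios in Table 3 can be read as the number of crack-free images added per crack image. A minimal sketch of building such a split is given below; this reading of the ratio, the function name `build_split`, and the use of file-name lists are assumptions for illustration only.

```python
import random
from typing import List, Sequence


def build_split(
    defect: Sequence[str],
    background: Sequence[str],
    ratio: float,
    seed: int = 0,
) -> List[str]:
    """Mix all defect images with ratio * len(defect) background images.

    ratio = 0 gives a defect-only split; ratio = 1 gives equal counts.
    """
    n_bg = round(ratio * len(defect))
    if n_bg > len(background):
        raise ValueError("not enough background images for the requested ratio")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    chosen = rng.sample(list(background), n_bg)
    mixed = list(defect) + chosen
    rng.shuffle(mixed)
    return mixed
```

For example, with 10 crack images, ratios of 0.3, 0.5, 1, and 2 would add 3, 5, 10, and 20 crack-free images, respectively.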

Table 4. Layer-wise comparison of G2L-CRM placement.

Layer	FNR/%	FPR/%	mAP@0.5 (mask)/%	mAP@0.5 (bbox)/%	Params/M	GFLOPs	FPS
P4	3.10	3.94	87.0	85.3	2.76	3.39	193
P6	1.72	5.53	85.2	83.0	2.84	3.14	186
P8	1.38	4.79	87.9	88.5	2.56	2.97	204
P10	1.38	5.32	86.7	88.7	2.63	2.99	204
P13	1.38	5.33	87.8	87.3	2.62	2.98	200
P4 + P8	1.72	4.88	85.3	82.0	2.68	3.36	198

Pn indicates that the original n-th backbone or neck layer is replaced by G2L-CRM, while P4 + P8 indicates that both the 4th and 8th backbone layers are replaced.

Table 5. Layer-wise comparison of C3k2-DWR placement.

Layer	FNR/%	FPR/%	mAP@0.5 (mask)/%	mAP@0.5 (bbox)/%	Params/M	GFLOPs	FPS
P4	2.07	5.27	87.5	87.7	2.56	3.00	203
P6	1.72	5.42	88.7	88.3	2.56	2.98	198
P10	2.07	4.47	89.8	88.3	2.69	3.01	205
P13	2.07	4.87	88.6	87.1	2.63	3.01	203
P4 + P10	2.41	4.93	89.9	89.3	2.69	3.03	185

Pn indicates that the original n-th backbone or neck layer is replaced by C3k2-DWR, while P4 + P10 indicates that both the 4th and 10th backbone layers are replaced.

Table 6. Ablation study of the proposed framework.

Configuration	FNR/%	FPR/%	mAP@0.5 (mask)/%	mAP@0.5 (bbox)/%	Params/M	GFLOPs	FPS
O	2.41 ± 0.7	3.56 ± 0.2	88.2 ± 0.4	87.3 ± 1.6	2.84	3.31	166 ± 3
B	1.95 ± 0.5	5.06 ± 0.5	88.7 ± 1.3	87.6 ± 1.6	3.16	3.70	185 ± 6
B + F	1.61 ± 0.2	5.23 ± 0.4	87.4 ± 0.5	86.9 ± 0.7	2.66	3.00	191 ± 6
B + F + G	2.53 ± 0.8	5.11 ± 0.1	87.1 ± 0.8	86.2 ± 1.9	2.56	2.97	199 ± 4
B + F + D	2.07 ± 0.7	5.49 ± 0.4	87.8 ± 1.3	87.1 ± 0.7	2.78	3.04	200 ± 4
B + F + G + D	1.72 ± 0.3	4.85 ± 0.3	88.5 ± 1.1	87.9 ± 1.7	2.69	3.01	204 ± 1

O denotes the original YOLO11n-seg baseline; B denotes the single-channel YOLO11n-seg baseline; F denotes FAST; G denotes G2L-CRM; D denotes C3k2-DWR. Results are reported as mean ± standard deviation over three independent runs.
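
For readers who want to reproduce the alarm-type metrics in the table, the sketch below computes FNR and FPR at the image level: FNR as the share of crack images that raise no alarm, and FPR as the share of crack-free images that raise one. This image-level reading and the function name are assumptions made for illustration; the formal definitions are those given with the evaluation metrics.

```python
from typing import Sequence, Tuple


def alarm_rates(has_crack: Sequence[bool], alarmed: Sequence[bool]) -> Tuple[float, float]:
    """Return (FNR %, FPR %) from per-image ground truth and alarm flags."""
    if len(has_crack) != len(alarmed):
        raise ValueError("inputs must have the same length")
    n_pos = sum(1 for t in has_crack if t)
    n_neg = len(has_crack) - n_pos
    # Missed cracks: crack present but no alarm raised.
    fn = sum(1 for t, p in zip(has_crack, alarmed) if t and not p)
    # False alarms: no crack present but an alarm raised.
    fp = sum(1 for t, p in zip(has_crack, alarmed) if not t and p)
    fnr = 100.0 * fn / n_pos if n_pos else 0.0
    fpr = 100.0 * fp / n_neg if n_neg else 0.0
    return fnr, fpr
```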

Table 7. Comparative results of representative segmentation models.

Model	FNR/%	FPR/%	mAP@0.5 (mask)/%	mAP@0.5 (bbox)/%	Dice/%	Params/M	GFLOPs	FPS
U-Net (original)	0.89 ± 0.7	17.82 ± 7.1	-	-	82	31.10	218.9	68 ± 1
DeepLabv3 + ResNet34	1.20 ± 0.2	1.40 ± 0.5	-	-	81	22.43	31.7	132 ± 6
DeepLabv3 + ResNet50	1.00 ± 1.3	2.20 ± 1.6	-	-	81	26.67	36.9	125 ± 16
BiSeNetV2	1.38 ± 0.1	1.65 ± 0.1	-	-	75	4.20	12.3	154 ± 6
YOLOv5n-seg	1.84 ± 0.2	6.29 ± 0.1	77.3 ± 0.2	78.7 ± 0.1	93	1.88	4.31	132 ± 4
YOLOv8n-seg	2.07 ± 0.4	4.32 ± 0.2	88.1 ± 3.2	87.7 ± 1.6	95	3.26	3.87	176 ± 33
YOLO11n-seg	2.41 ± 0.7	3.56 ± 0.2	88.2 ± 0.4	87.3 ± 1.6	95	2.84	3.31	166 ± 3
YOLO12n-seg	2.07 ± 0.0	4.59 ± 0.3	87.8 ± 0.5	87.5 ± 0.4	95	2.82	3.33	129 ± 4
FGD-YOLO	1.72 ± 0.3	4.85 ± 0.3	88.5 ± 1.1	87.9 ± 1.7	95	2.69	3.01	204 ± 1
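
The Dice column compares predicted and annotated masks at the pixel level using the standard definition Dice = 2|A ∩ B| / (|A| + |B|). A minimal sketch for a single pair of binary masks follows; the function name and the convention of returning 1.0 when both masks are empty are assumptions.

```python
from typing import List

Mask = List[List[int]]  # 1 = crack pixel, 0 = background


def dice_coefficient(pred: Mask, target: Mask) -> float:
    """Dice overlap between two binary masks of identical shape."""
    inter = sum(
        1
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
        if p and t
    )
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total
```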

Table 8. Mandrel bar parameters.

Parameter Category	Specific Parameters
Mandrel bar diameter range (mm)	60–400
Mandrel bar speed (m/s)	0–5
Mandrel bar surface temperature (°C)	Room temperature to 850
Circumferential coverage	360° full coverage
Longitudinal range	No length limitation

Table 9. Camera parameters.

Parameter Category	Specific Parameters
Image type	Black and white
Effective pixels	4096 × 8
Pixel size (μm)	7 × 7
Exposure mode	Global exposure
Frame buffer (MB)	128
Data interface	Gigabit ethernet

Table 10. Comparison between ruler-referenced field measurements and online geometric parameter outputs.

	Max Width/mm			Length/mm
Sample	Reference	Online	Difference	Reference	Online	Difference
c1	0.5	0.4	0.1	28.5	25.5	3.0
c2	0.5	0.6	0.1	23	25.4	2.4
c3	1.0	0.9	0.1	16	25.6	-
Mean	-	-	0.1	-	-	2.7

The length difference in c3 was not included in the mean because the reference and online results do not cover the same crack range.
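
The difference and mean rows of Table 10 can be reproduced from the tabulated values. The short calculation below uses only the numbers in the table and follows the stated exclusion of the c3 length; the variable names are illustrative.

```python
# Values transcribed from Table 10 as (reference, online), in mm.
max_width = {"c1": (0.5, 0.4), "c2": (0.5, 0.6), "c3": (1.0, 0.9)}
length = {"c1": (28.5, 25.5), "c2": (23.0, 25.4), "c3": (16.0, 25.6)}

# c3 is excluded from the length mean because the reference and online
# results do not cover the same crack range (table footnote).
excluded_from_length_mean = {"c3"}


def abs_diff(pair):
    """Absolute reference-versus-online difference, rounded to 0.1 mm."""
    return round(abs(pair[0] - pair[1]), 1)


width_diffs = {k: abs_diff(v) for k, v in max_width.items()}
length_diffs = {
    k: abs_diff(v) for k, v in length.items() if k not in excluded_from_length_mean
}

mean_width_diff = round(sum(width_diffs.values()) / len(width_diffs), 1)
mean_length_diff = round(sum(length_diffs.values()) / len(length_diffs), 1)
```

The calculation gives a mean maximum-width difference of 0.1 mm over three samples and a mean length difference of 2.7 mm over two samples, matching the table.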

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
