1. Introduction
With the continuous expansion of offshore wind power capacity, wind turbines in coastal and nearshore areas have become an important component of the clean energy system [1,2]. However, the marine environment accelerates blade-coating corrosion, surface deterioration, and crack growth, thereby affecting structural safety and power-generation efficiency [3,4]. If blade defects are not detected and addressed promptly, they may aggravate structural fatigue and even lead to serious safety accidents [5]. Therefore, constructing a high-precision and highly robust visual detection method to achieve automatic identification and localization of corrosion defects on coastal wind turbine blades is of great engineering significance and practical value [6].
With the widespread adoption of UAV inspection technology, high-resolution blade images can be collected in batches within a short period, significantly reducing the risks associated with manual high-altitude operations [7]. However, the surge in image quantity also makes manual screening inefficient and costly. In recent years, deep learning-based object detection methods have achieved significant progress in the field of industrial defect inspection [8,9,10]. Single-stage detectors, such as the YOLO series, are widely used in real-time inspection scenarios due to their concise architecture, fast inference speed, and ease of deployment. Their objective is to achieve stable recognition and precise localization of multiple types of defects while maintaining real-time performance. However, compared with conventional industrial surface inspection scenarios, coastal environments pose more stringent challenges to visual detection.
First, salt spray scattering and high-humidity environments significantly reduce image contrast and introduce fog-like scattering noise, weakening the clarity of local texture boundaries. In this context, corrosion commonly observed in the dataset usually manifests as localized dark depressions or pit-like textures, which are small in scale and irregular in shape, and occur more frequently in salt-rich environments, posing challenges to the model’s ability to perceive weak high-frequency features.
Second, blade erosion is typically caused by long-term impacts from raindrops, hail, and other factors, accompanied by material stress accumulation and coating degradation, and manifests as rough edges, local peeling, and fragmented textures. Such defects are often confused with background coating textures or illumination reflections, making it difficult for the model to distinguish real defects from natural texture variations [4].
Third, cracks usually appear as slender, low-contrast, cross-scale structures, with widths far smaller than their lengths, often spanning multiple texture regions. In addition, fiber textures, coating particles, and stain interference on the blade surface further cause the detector to suffer from missed detections or poorly fitted bounding boxes in small-object recognition and bounding box regression.
Through systematic observation and experimental analysis of coastal blade defect images, we find that the performance bottlenecks of current detection models in this scenario mainly stem from three capability limitations: (1) shallow spatial details are difficult to transmit effectively to deep semantic representations during progressive downsampling, resulting in localization drift of small-scale corrosion or fine crack targets during feature fusion; (2) the boundaries of erosion and crack regions are often blurred, and the model lacks mechanisms to strengthen local edge structures, making precise regression difficult; and (3) the dependency between channel features and spatial features is insufficiently modeled, leading to significant fluctuations in class confidence under complex background interference.
To address the above issues, this study proposes CoastCor-Net, a multi-scale dense prediction detection network for coastal salt spray environments. The method enhances detection performance through three complementary mechanisms: spatial–semantic alignment, dilated boundary enhancement, and channel-transposed attention, implemented via CFAM, DDEB, and CTCA within a unified convolutional framework. The three modules form a collaborative optimization mechanism at the levels of feature alignment, boundary enhancement, and dependency modeling, thereby improving the detection accuracy and robustness of corrosion, erosion, and crack defects in complex environments.
Experiments are conducted on the public Wind Turbine Blade Damage Dataset [11], with comparative validation across different module combinations and multiple mainstream detection methods. The results show that the proposed framework outperforms existing detection methods in detection accuracy, recall, and boundary stability, with the clearest advantages in fine-grained corrosion and erosion target recognition.
The main contributions of this paper are threefold. First, an improved single-stage object detection framework for coastal wind turbine blade corrosion detection is constructed to achieve collaborative optimization of spatial and semantic features. Second, three key modules (CFAM, DDEB, and CTCA) are proposed and integrated to systematically address feature mismatch, limited receptive field, and insufficient channel dependency modeling. Third, the effectiveness and synergistic gain of each module are verified through systematic comparative experiments and ablation analysis. With these improvements, CoastCor-Net demonstrates superior accuracy and stability over existing detection methods in complex coastal wind turbine blade defect detection tasks, providing a practical visual inspection solution for intelligent inspection and structural health monitoring of offshore wind turbines.
The remainder of this paper is organized as follows: Section 2 introduces related research progress; Section 3 describes the overall detection framework and key module design; Section 4 presents the experimental results; Section 5 discusses the findings; and Section 6 concludes the paper and outlines future work.
2. Related Work
Blade coating-related defects, including aging, blistering, peeling, and corrosion, are characterized by weak textures, blurred boundaries, significant scale variations, and complex background interference.
In the field of wind turbine blade surface defect detection, studies commonly adopt UAV inspection images as data sources and enhance sensitivity to small-scale and weak-texture defects by improving feature fusion and attention mechanisms [12]. To address the problem of missed detections of blade surface defects under complex backgrounds, Liu et al. enhanced the interaction between multi-scale semantic and detailed information by introducing attention mechanisms and bidirectional feature fusion, thereby improving the detection rate of blade surface defects and robustness under occlusion scenarios [13]. Furthermore, in the broader context of industrial surface defect inspection, Meng et al. proposed a coordinate modeling attention module to capture long-range dependencies while maintaining spatial interpretability, facilitating direct integration into convolutional backbones to improve performance in fine-grained defect localization tasks [14]. In machine vision evaluation of wind turbine blades, Xu et al. emphasized automated diagnostic processes for multiple types of damage, combining signal processing and feature engineering to enhance usability across operating conditions, thus providing another reference approach for engineering deployment [15].
In corrosion detection, corrosion recognition methods based on single-stage object detection frameworks have attracted extensive attention in marine engineering and industrial structural fields, particularly with continuous improvements around the YOLO series models. To address issues such as weak textures, small scales, and irregular boundaries of corrosion regions in complex environments, researchers have explored feature enhancement, loss optimization, and attention modeling from multiple perspectives. Yu et al. proposed an improved YOLOv5-GOLD-NWD model for corrosion detection on coated metal surfaces. By introducing normalized Wasserstein distance loss and structural optimization strategies, the localization accuracy of small-scale corrosion regions was improved, and the effectiveness of the model was validated in coastal metal surface corrosion assessment tasks [16]. Subsequently, Yu et al. further constructed an improved detection framework integrating EfficientViT, NWD, and channel attention mechanisms for corrosion recognition and coating performance evaluation in marine exposure environments, emphasizing the importance of lightweight structures and multi-scale feature enhancement under complex salt spray conditions [17]. These two studies indicate that optimizing loss functions and introducing efficient attention structures can effectively improve the accuracy and robustness of corrosion detection. Cheng and Kang proposed a detection and grading method based on an improved YOLOv10 series architecture for hydraulic metal structure corrosion detection and classification. While maintaining real-time performance, the method enhanced the recognition capability for irregular corrosion regions [18]. This study combined corrosion detection with corrosion grade discrimination, providing a new perspective for quantitative assessment of corrosion states. Chen et al. focused on metal surface corrosion grade recognition and proposed an improved YOLOv8 model. By integrating attention mechanisms and data augmentation strategies, precise classification and localization of multi-level corrosion states were achieved. This work demonstrated that collaborative modeling of channel and spatial information is crucial for improving detection stability in fine-grained corrosion classification scenarios [19]. Yu et al. also conducted a systematic comparative analysis of multiple improved YOLOv5 models in coastal corrosion detection scenarios, indicating that different attention modules and loss designs have significant impacts on recall and boundary fitting performance for small-scale corrosion targets, thus providing experimental evidence for subsequent structural optimization [20].
Existing corrosion detection methods primarily focus on three directions: (1) strengthening spatial and channel feature representations through attention mechanisms [21], (2) enhancing the recognition capability of small-scale corrosion regions through multi-scale feature fusion [22], and (3) balancing accuracy and real-time performance through structural optimization and lightweight design. However, in the scenario of coastal wind turbine blade corrosion, problems such as mismatch between spatial details and semantic information, insufficient cross-scale information interaction, and unstable boundary localization still persist. Therefore, existing detection frameworks require systematic structural improvements to achieve high-precision corrosion detection under complex salt spray environments.
3. Materials and Methods
3.1. Overall Architecture
The proposed CoastCor-Net is constructed based on structural improvements to the YOLOv11 single-stage object detection framework. The overall network still adopts a multi-scale dense prediction mechanism, while three enhancement modules—DDEB, CFAM, and CTCA—are embedded at key positions in the backbone and detection head to improve the detection accuracy and robustness of blade corrosion, erosion, and crack defects under coastal salt spray environments. In practical deployment, the UAV-captured images are transmitted to a remote workstation for online processing, where the proposed network performs defect detection in near real-time.
UAV inspection images first pass through the Stem convolutional layer for initial feature extraction and downsampling, producing base feature maps. The network then enters the backbone stage, which is composed of multiple stacked Conv and C3k2 structures, generating multi-scale feature representations through progressive downsampling. Compared with the original YOLOv11 architecture, this study embeds DDEB into the C3k2 module of Stage1, replacing the original Bottleneck structure to enhance early boundary and fine-grained texture representation capability. In Stage2 and Stage3, the original C3k2 modules are replaced with CFAM to achieve complementary alignment between spatial details and high-level semantics. Finally, a CTCA module is inserted before the first Conv2d layer of both the classification branch and regression branch in the detection head, strengthening the coupled modeling ability of channel and spatial features, and improving class confidence stability and boundary regression accuracy.
Figure 1 illustrates the overall architecture of CoastCor-Net.
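The placement of the three modules described above can be summarized as a small lookup structure. This is only an illustrative sketch: the dictionary keys, field names, and the helper `where_is` are hypothetical and do not correspond to any configuration format used by the authors or by a YOLO code base.

```python
# Hypothetical summary of where each module sits in CoastCor-Net,
# following the textual description of Section 3.1.
COASTCOR_LAYOUT = {
    "stem": {"block": "Conv"},
    "stage1": {"block": "C3k2", "inner": "DDEB"},    # DDEB replaces the Bottleneck inside C3k2
    "stage2": {"block": "CFAM"},                      # CFAM replaces the original C3k2
    "stage3": {"block": "CFAM"},                      # CFAM replaces the original C3k2
    "head_cls": {"pre": "CTCA", "block": "Conv2d"},  # CTCA before the first Conv2d
    "head_reg": {"pre": "CTCA", "block": "Conv2d"},  # CTCA before the first Conv2d
}


def where_is(module, layout=COASTCOR_LAYOUT):
    """Return the names of all positions whose description mentions `module`."""
    return [name for name, spec in layout.items() if module in spec.values()]


if __name__ == "__main__":
    for module in ("DDEB", "CFAM", "CTCA"):
        print(module, "->", where_is(module))
```

Reading the layout this way makes the division of labor explicit: one module in the shallow stage, one in the two deeper backbone stages, and one in each branch of the decoupled head.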
3.2. DDEB Design
In the overall network, Stage1 is mainly responsible for extracting shallow spatial features, and its output retains relatively rich edge and texture information. However, under salt spray scattering and low-contrast conditions, corrosion pit-like textures and fine cracks often exhibit blurred boundaries. The receptive field of traditional convolution kernels is limited, making it difficult to simultaneously account for local sharpening and contextual modeling. To enhance the representation capability of corrosion and crack boundaries in coastal environments, this paper embeds the DDEB module into the C3k2 module of Stage1 in the YOLOv11 backbone. Its structure is shown in Figure 2.
Let the input feature be $X \in \mathbb{R}^{C \times H \times W}$. The input is first normalized and mapped by convolution, and then multi-scale spatial responses are extracted through three parallel convolutions with different dilation rates. The spatial enhancement process can be uniformly expressed as
$$X' = X + \mathcal{F}_{\mathrm{sp}}(\mathrm{Norm}(X)),$$
where $\mathcal{F}_{\mathrm{sp}}(\cdot)$ denotes a composite mapping function including multi-scale depthwise dilated convolutions, Spatial Gating (SG), and Spatial–Channel Attention (SCA) [23]. The multi-scale depthwise dilated convolutions adopt dilation rates of 1, 4, and 9 to simultaneously model local textures, erosion band structures, and cross-scale crack information. SG generates spatial response weights to suppress salt spray scattering noise regions, while SCA further models the dependency between channel and spatial dimensions, thereby enhancing the response intensity to real defect regions.
After completing spatial enhancement, the module enters the gated feed-forward stage. To improve nonlinear representation capability while maintaining computational efficiency, channel expansion and gating mechanisms are adopted for feature recombination, and the overall formulation is
$$Y = X' + \mathcal{F}_{\mathrm{ffn}}(X'),$$
where $\mathcal{F}_{\mathrm{ffn}}(\cdot)$ denotes the gated feed-forward mapping function. This branch first performs a 1 × 1 convolution to expand the channel dimension, then completes feature selection and fusion through a gating unit, and finally compresses back to the original number of channels and forms a second residual connection with its input.
Combining the above two-stage structure, the overall mapping of DDEB can be simplified as
$$Y = X + \mathcal{F}_{\mathrm{sp}}(\mathrm{Norm}(X)) + \mathcal{F}_{\mathrm{ffn}}(X').$$
This dual-residual structure ensures training stability while significantly enhancing boundary sensitivity at the shallow stage. Since salt spray and humidity in coastal environments reduce image contrast, corrosion and crack regions often present blurred boundaries and fine-grained texture degradation. By enlarging the receptive field and introducing a spatial–channel collaborative modeling mechanism, DDEB effectively alleviates the problems of localization drift for small targets and poorly fitted boundaries.
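To make the receptive-field argument concrete, the sketch below computes the effective extent of a dilated kernel for the dilation rates 1, 4, and 9 named above. The 3 × 3 kernel size is an assumption (the text does not state it), and the one-dimensional `dilated_conv1d` helper is a hypothetical stand-in for the depthwise dilated convolutions, used only to show how the taps are spaced.

```python
def effective_kernel_size(kernel_size, dilation):
    """Extent covered by a dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, weights, dilation):
    """Valid-mode 1D dilated cross-correlation (no padding, stride 1)."""
    span = effective_kernel_size(len(weights), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(weights))
        for start in range(len(signal) - span + 1)
    ]


# Assumed 3-tap kernels with the dilation rates stated in the text.
DILATION_RATES = (1, 4, 9)
EXTENTS = {d: effective_kernel_size(3, d) for d in DILATION_RATES}

if __name__ == "__main__":
    print(EXTENTS)  # extent per dilation rate
    ramp = list(range(20))
    # A difference kernel with dilation 4 compares samples 8 positions apart.
    print(dilated_conv1d(ramp, [1, 0, -1], dilation=4))
```

Under the 3 × 3 assumption, the three branches cover 3, 9, and 19 pixels per axis with the same number of weights, which is the sense in which the parallel branches respond to local texture, band-like erosion, and longer crack structures at once.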
3.3. CFAM Design
To alleviate the spatial–semantic imbalance of corrosion defects in deep features under coastal environments, this paper replaces the original C3k2 structures in Stage2 and Stage3 of the backbone network with CFAM (Figure 3). The module follows the overall design philosophy of channel splitting, directional transformation, complementary mapping, and fused output. By explicitly constructing the interaction between spatial and semantic dual branches, it achieves effective alignment and compensation of shallow spatial information toward deep semantic features.
Let the input feature be $X \in \mathbb{R}^{C \times H \times W}$. CFAM first performs proportional splitting along the channel dimension, dividing the feature into two parts:
$$[X_s, X_c] = \mathrm{Split}(X), \quad X_s \in \mathbb{R}^{\alpha C \times H \times W}, \quad X_c \in \mathbb{R}^{(1-\alpha) C \times H \times W},$$
where $X_s$ focuses on preserving spatial positional information, and $X_c$ emphasizes semantic representation capability, with α denoting the splitting coefficient. Since Stage2 and Stage3 already possess a certain degree of semantic abstraction capability, a “spatial-priority in deeper layers” proportional strategy is adopted, allowing deeper layers to retain more spatial details to enhance localization stability.
In the directional transformation stage, the spatial branch adopts pointwise convolution to preserve the original structural information and avoid excessive destruction of low-level textures; the semantic branch adopts a 3 × 3 convolution to enhance cross-channel correlation modeling capability. The two branches produce
$$F_s = \mathrm{Conv}_{1 \times 1}(X_s), \quad F_c = \mathrm{Conv}_{3 \times 3}(X_c).$$
The module then enters the complementary mapping stage. Unlike simple concatenation or weighted summation, CFAM establishes bidirectional dependency through dual interaction mechanisms of channel guidance and spatial guidance.
First, channel mapping is performed on the semantic branch. A channel-wise convolution and global average pooling are applied to $F_c$ to obtain a channel weight vector $w_c$. After Sigmoid activation, it is mapped to the spatial branch, achieving the modulation effect of “strong semantics guiding weak spatial features.”
Subsequently, spatial mapping is constructed on the spatial branch. Through GCConv and a normalization layer, a spatial attention map $A_s$ is obtained from $F_s$. After Sigmoid processing, it is mapped to the semantic branch, achieving the effect of “strong structural information inversely correcting semantics” [24].
The final output feature is expressed as
$$Y = \mathrm{Concat}\big(F_s \odot \sigma(w_c),\; F_c \odot \sigma(A_s)\big),$$
where ⊙ denotes element-wise multiplication and σ denotes the Sigmoid function. This expression only retains the core complementary relationship without introducing additional complex formulations, ensuring structural conciseness and implementability.
From a structural perspective, CFAM maintains a lightweight convolution-based implementation and does not introduce global self-attention computation. Therefore, the computational complexity mainly stems from two lightweight convolutions and one global pooling operation, resulting in extremely low additional parameter overhead. Its core advantage lies in embedding shallow spatial localization information into deep semantic representations through weight modulation, thereby enhancing boundary fitting capability and small-target recall for fine-grained defects such as corrosion, erosion, and cracks.
CFAM is positioned at Stage2 and Stage3, serving as a mid-level alignment module. It forms front–back collaboration with DDEB in Stage1: the former is responsible for spatial–semantic structural alignment, while the latter focuses on deblurring and boundary enhancement. After their coordination, the network maintains stable localization and classification performance under complex environments.
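The cross-modulation idea behind CFAM (channel weights from the semantic branch scale the spatial branch, and a spatial map from the spatial branch scales the semantic branch) can be sketched in plain Python. This is a deliberately reduced illustration under several assumptions that are not taken from the paper: the learned convolutions (pointwise, 3 × 3, channel-wise, GCConv) and the normalization layer are replaced by identity or mean operations, the split is even (α = 0.5) so that the two branches have matching channel counts, and the two modulated branches are concatenated. The function name `cfam_sketch` is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cfam_sketch(x, alpha=0.5):
    """x: list of C channels, each an H x W nested list of floats.

    Returns C channels: the modulated spatial branch followed by the
    modulated semantic branch.
    """
    n_spatial = int(alpha * len(x))
    x_s, x_c = x[:n_spatial], x[n_spatial:]  # channel split
    if len(x_s) != len(x_c):
        raise ValueError("this sketch assumes an even channel split")
    h, w = len(x_s[0]), len(x_s[0][0])

    # Channel guidance: global average pooling on each semantic channel.
    w_c = [sigmoid(sum(map(sum, ch)) / (h * w)) for ch in x_c]

    # Spatial guidance: per-position mean over the spatial-branch channels
    # (a stand-in for the learned spatial attention map).
    a_s = [
        [sigmoid(sum(ch[i][j] for ch in x_s) / len(x_s)) for j in range(w)]
        for i in range(h)
    ]

    out_s = [[[v * w_c[k] for v in row] for row in ch] for k, ch in enumerate(x_s)]
    out_c = [[[ch[i][j] * a_s[i][j] for j in range(w)] for i in range(h)] for ch in x_c]
    return out_s + out_c  # concatenation along the channel axis


if __name__ == "__main__":
    ones = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(4)]
    y = cfam_sketch(ones)
    print(len(y), len(y[0]), len(y[0][0]), y[0][0][0])
```

Because only a pooling step, a per-position mean, and two element-wise products are involved, the sketch also reflects why the module adds little computation compared with global self-attention.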
3.4. CTCA Design
To enhance the modeling capability of channel–spatial coupling relationships at the detection head stage, this paper embeds the CTCA module before the convolutions of the two decoupled branches in the detection head (Figure 4). While maintaining the lightweight characteristics of the convolutional framework, this module establishes long-range inter-channel dependencies through cross-dimensional mapping and interactively couples them with spatial features, thereby stabilizing class confidence and suppressing background interference.
Let the input feature from the feature fusion layer be $X \in \mathbb{R}^{C \times H \times W}$. It is first normalized by LNorm to obtain $\hat{X} = \mathrm{LNorm}(X)$. Unlike conventional spatial self-attention that unfolds computation along the $HW$ dimension, CTCA adopts a channel-transposition strategy, rearranging the feature as $\hat{X} \in \mathbb{R}^{C \times HW}$, that is, treating channels as token units and constructing correlation modeling along the channel dimension.
Subsequently, linear projections are applied to generate queries, keys, and values:
$$Q = \hat{X} W_Q, \quad K = \hat{X} W_K, \quad V = \hat{X} W_V.$$
Attention computation is then performed along the channel dimension:
$$A = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\tau}\right) V,$$
where $\tau$ is the scaling factor. This operation captures the global response relationships among different channels, enabling the detection head to obtain cross-semantic dependency modeling capability before entering the classification and regression branches.
To reduce computational complexity and avoid noise amplification caused by excessive globalization, CTCA does not directly output $A$ but instead introduces a dual-projection coupling structure. First, the Channel Branch extracts enhanced channel features $F_{\mathrm{ch}}$ from $A$, while the Spatial Branch extracts spatially enhanced features $F_{\mathrm{sp}} = \mathrm{DWConv}(X)$ from the original feature. The two are cross-weighted through a Sigmoid gating function:
$$Y = F_{\mathrm{ch}} \odot \sigma(F_{\mathrm{sp}}) + F_{\mathrm{sp}} \odot \sigma(F_{\mathrm{ch}}),$$
where $\sigma$ denotes the Sigmoid activation function, and DWConv represents depthwise separable convolution. This structure realizes a bidirectional coupling mechanism of “channel guiding spatial features and spatial feedback to channels.”
The module is embedded before the Conv2d layers of the two decoupled branches in the detection head, enabling the classification branch to obtain more stable discriminative features and the regression branch to obtain more consistent structural responses, thereby improving the localization accuracy of slender and low-contrast defects such as corrosion, erosion, and cracks. Compared with conventional spatial attention mechanisms, its cross-dimensional coupling strategy can significantly alleviate class confidence oscillation under complex coastal backgrounds.
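The channel-transposition step, in which channels rather than spatial positions act as tokens, can be illustrated with a minimal pure-Python sketch. It is a simplification under stated assumptions: the query, key, and value projections are taken as identity, the scaling factor is set to the square root of the number of spatial positions, and the subsequent gating with the spatial branch is omitted. The function name `channel_attention` is hypothetical.

```python
import math


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(x):
    """x: C x N nested list (C channels as tokens, N = H * W flattened positions).

    Returns (weights, out), where weights is the C x C channel affinity matrix
    and out is the C x N re-weighted feature.
    """
    n_positions = len(x[0])
    scale = math.sqrt(n_positions)  # assumed scaling factor
    # Affinity between every pair of channels: a C x C matrix, independent of H * W.
    scores = [[sum(a * b for a, b in zip(q, k)) / scale for k in x] for q in x]
    weights = [softmax(row) for row in scores]
    out = [
        [sum(w_ij * x[j][p] for j, w_ij in enumerate(w_row)) for p in range(n_positions)]
        for w_row in weights
    ]
    return weights, out


if __name__ == "__main__":
    feature = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
    ]
    weights, out = channel_attention(feature)
    print(len(weights), len(weights[0]), len(out), len(out[0]))
```

The attention matrix has size C × C regardless of image resolution, which is what keeps this form of attention affordable in a detection head compared with attention over all HW positions.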
3.5. Dataset
The experimental evaluation uses the UAV-based wind turbine inspection dataset constructed by Shihavuddin [11]. The dataset contains 2995 high-resolution images (586 × 371 pixels), capturing key components such as blades, rotors, and towers. To ensure robust generalization, the dataset is divided into training, validation, and test sets according to a ratio of 8:1:1.
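As a worked example of the 8:1:1 division, the sketch below splits 2995 image indices into three disjoint subsets. The shuffling seed, the rounding rule (integer division, with the remainder assigned to the test set), and the function name `split_indices` are assumptions; the paper does not describe how the split was drawn.

```python
import random


def split_indices(n_items, ratios=(8, 1, 1), seed=0):
    """Shuffle indices 0..n_items-1 and cut them into train/val/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed for a repeatable split
    total = sum(ratios)
    n_train = n_items * ratios[0] // total
    n_val = n_items * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_indices(2995)
    print(len(train), len(val), len(test))
```

With this rounding rule, 2995 images yield 2396 training, 299 validation, and 300 test images; a different rounding convention would shift one or two images between subsets.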
The annotated defects are categorized into dust and damage. As shown in Figure 5, the defects exhibit diverse forms. The dust category mainly consists of large-scale targets, whereas the damage category is characterized by small-scale and irregular features, such as corrosion, cracks, and surface peeling. Improving the detection accuracy of these damage targets is the primary objective of this study.
4. Results
4.1. Experimental Environment and Data Settings
To ensure the reproducibility and fairness of the reported experimental results, all models were trained and evaluated under identical hardware and software settings. The experimental platform and training parameter settings are shown in Table 1.
The dataset was divided according to an 8:1:1 ratio to ensure independence among training, validation, and testing processes. The evaluation metrics include Precision, Recall, mAP@0.5, and mAP@0.5:0.95. Among them, mAP@0.5 is used to measure the overall detection capability of the model, while mAP@0.5:0.95 evaluates localization stability under different IoU thresholds, thereby comprehensively reflecting the adaptability of the model to complex coastal corrosion scenarios.
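For reference, the quantities underlying these metrics can be written out in a few lines. The sketch below shows the standard definitions of box IoU, precision, and recall, and the ten IoU thresholds over which mAP@0.5:0.95 is averaged. The function names and the dictionary-based averaging helper are illustrative; this is not the evaluation code used in the experiments.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 used by mAP@0.5:0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_over_thresholds(map_per_threshold):
    """Average a {threshold: mAP} mapping over the ten IoU thresholds."""
    return sum(map_per_threshold[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(IOU_THRESHOLDS)
```

A prediction counts as a true positive at a given threshold only if its IoU with a ground-truth box reaches that threshold, so the averaged metric rewards tight boxes far more than mAP@0.5 does.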
4.2. Comparison with Mainstream Detection Models
To verify the effectiveness of the proposed method, CoastCor-Net is compared with mainstream object detection models under the same training strategy. The results are shown in Table 2.
All experiments were conducted under fixed random seeds to ensure reproducibility. The performance improvements were consistently observed across repeated runs.
As Table 2 shows, traditional two-stage methods perform poorly in complex coastal environments, with relatively low accuracy on fine-grained corrosion targets in particular. Lightweight YOLO-series models demonstrate relatively stable overall performance; however, they still suffer from insufficient recall for small-scale corrosion and fine crack targets.
The proposed method achieves 84.7% in mAP@0.5, representing an improvement of 3.2 percentage points over YOLOv13n and a 5.2 percentage point increase in AP_damage, indicating that the proposed multi-module collaborative structure significantly enhances corrosion region recognition capability.
Figure 6 illustrates the relationship between parameter scale and detection performance (mAP@0.5) for different models. YOLOv5, YOLOv8, and YOLOv11 all exhibit performance improvements as parameter scale increases, but the growth rate gradually diminishes. The improvement from the n to s variants is relatively significant, whereas the increase from m to l becomes noticeably smaller, indicating that simply enlarging model scale does not yield proportional performance gains. Under similar parameter budgets, YOLOv11 achieves overall better performance than YOLOv5 and YOLOv8, suggesting that structural optimization is more effective than merely stacking parameters. The proposed CoastCor-Net maintains stable leading performance under a comparable parameter budget, with a reasonable and controlled advantage margin, demonstrating that its performance improvement primarily stems from structural enhancement and feature augmentation mechanisms rather than excessive parameter increase.
4.3. Overall Module Ablation Study
To verify the independent contribution of the three core modules, experiments were conducted by adding each module individually to the baseline model. The results are shown in Table 3.
The results indicate that all three modules individually improve detection accuracy, among which CFAM contributes the most significant performance gain. When combined, the three modules achieve the maximum improvement, demonstrating a clear synergistic effect.
First, after introducing DDEB, mAP@0.5 increases from 81.0% to 82.6%, representing an improvement of 1.6 percentage points. Multi-scale dilated convolutions and spatial enhancement structures effectively enlarge the receptive field and enhance responsiveness to fine cracks and erosion boundaries. In coastal salt spray environments, corrosion regions often exhibit blurred boundaries and low contrast. Without boundary enhancement mechanisms, shallow features are prone to localization drift. Therefore, the enhancement effect of DDEB at the early stage is evident.
Second, when only CFAM is introduced, performance increases to 82.9%, yielding a gain of 1.9 percentage points, which is the largest improvement among the single modules. This phenomenon indicates that under complex background interference, the mismatch between spatial details and deep semantics is one of the main bottlenecks limiting detection accuracy. CFAM establishes bidirectional dependency through channel guidance and spatial inverse correction mechanisms, enabling shallow localization information to effectively compensate deep semantic representation, thereby significantly improving recall for small corrosion targets.
When only the CTCA module is added, mAP@0.5 increases to 82.3%. Although the improvement is relatively smaller, its contribution is mainly reflected in stabilizing confidence scores and improving boundary regression consistency at the detection head stage. By modeling cross-channel dependencies through channel transposition, CTCA enables the classification and regression branches to obtain more stable feature representations under complex backgrounds. Therefore, its contribution can be regarded as back-end optimization enhancement.
Further analysis of the combined experiment shows that when both DDEB and CFAM are introduced, mAP@0.5 reaches 83.8%, further improving over the single-module cases. This indicates a clear synergistic relationship between shallow boundary enhancement and mid-level feature alignment. DDEB provides high-quality spatial detail foundations, while CFAM effectively transfers this information to deep semantic representations, forming a functional closed loop across different structural stages.
The three modules form a structured collaborative mechanism across the shallow layer, mid-level layer, and detection head stage of the network, thereby systematically improving the stability and accuracy of corrosion and crack detection under complex coastal environments.
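The gains quoted in this subsection follow from simple differences against the 81.0% baseline, which the sketch below reproduces. The single-module and DDEB + CFAM values are those stated in the text; the full-model value of 84.7% is taken from Section 4.2 on the assumption that it corresponds to the complete three-module configuration in Table 3. The variable names are illustrative.

```python
BASELINE_MAP50 = 81.0  # baseline mAP@0.5 (%), as stated in the text

# mAP@0.5 (%) per configuration; the last entry assumes the 84.7% from
# Section 4.2 is the full three-module model.
ABLATION_MAP50 = {
    "DDEB": 82.6,
    "CFAM": 82.9,
    "CTCA": 82.3,
    "DDEB+CFAM": 83.8,
    "DDEB+CFAM+CTCA": 84.7,
}

# Gain over the baseline in percentage points, rounded to one decimal.
GAINS = {name: round(value - BASELINE_MAP50, 1) for name, value in ABLATION_MAP50.items()}

# Sum of the three single-module gains, for comparison with the combined gain.
SINGLE_SUM = round(GAINS["DDEB"] + GAINS["CFAM"] + GAINS["CTCA"], 1)

if __name__ == "__main__":
    for name, gain in GAINS.items():
        print(f"{name}: +{gain} points")
    print("sum of single-module gains:", SINGLE_SUM)
```

Listing the gains side by side allows the combined improvement to be compared directly with the sum of the single-module improvements when judging how the modules interact.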
4.4. Comparison of Different Backbone Networks
To analyze the adaptability of different feature extraction structures to coastal corrosion scenarios, we replaced only the backbone network while keeping the detection head and training strategy consistent to evaluate structural adaptability. The results are shown in Table 4.
As Table 4 shows, although the lightweight backbone MobileNetV3 has fewer parameters, it performs noticeably worse on the mAP@0.5:0.95 metric, indicating weaker boundary regression stability under high IoU thresholds. CSPDarknet and PartialNet show relatively stable overall detection capability, but still suffer from insufficient recall for fine-grained targets under complex salt spray backgrounds.
In contrast, the improved backbone proposed in this paper achieves the best results on both mAP@0.5 and mAP@0.5:0.95, with more pronounced improvement in the high-IoU range. These results suggest that the constructed feature enhancement and alignment mechanisms not only improve detection accuracy but also enhance localization consistency. Since coastal corrosion and crack targets typically exhibit slender and blurred boundary characteristics, the performance gain in the high-IoU range demonstrates substantial improvement in boundary fitting capability. Although the model size has increased slightly, the FPS remains high and does not affect the real-time performance of the detector.
Considering both accuracy and parameter scale, the proposed backbone achieves superior feature representation efficiency while maintaining a lightweight structure, reflecting better structural adaptability.
4.5. Convolution Structure Comparison Experiment
To verify the advantages of the multi-scale dilated convolution structure in complex coastal environments, comparative experiments were conducted by replacing only the convolution type in Stage1 while keeping the overall network structure consistent. The experimental results are shown in Table 5.
As Table 5 shows, standard convolution and DWConv differ only slightly in overall detection accuracy, and neither yields much improvement in recall, indicating that single-scale convolutions are insufficient for modeling cross-scale defect structures. SWConv enhances spatial representation capability through structural reorganization, leading to improved performance, but it still does not achieve the optimal result.
The multi-scale dilated convolution achieves the highest performance in both mAP@0.5 and Recall, demonstrating that enlarging the receptive field and integrating responses from different scales play a critical role in modeling corrosion pit-like textures and fine crack structures. In particular, the significant improvement in recall suggests this structure effectively reduces the probability of missed detections, which is of important engineering significance for practical inspection tasks.
4.6. Comparison of Different Attention Mechanisms in the Detection Head
To verify the effectiveness of the proposed CTCA module at the detection head stage, comparative experiments were conducted by replacing only the attention mechanism before the detection head while keeping the Backbone and Neck structures completely consistent. All models were trained and tested under the same training strategy. The comparison results are shown in Table 6.
4.7. Robustness Evaluation Under Composite Coastal Perturbation
To evaluate the robustness of the proposed model under practical coastal imaging disturbances, a composite augmentation setting was constructed. In this setting, Gaussian noise, brightness variation, and image rotation were simultaneously applied to the test images to approximate sensor interference, illumination instability, and UAV attitude variation encountered during offshore inspection (Figure 7). All transformations were conducted while preserving the original annotations to ensure fair comparison.
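The composite setting can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the noise level, brightness gain, and 90-degree rotation are assumed values, the image is a plain grayscale nested list, and all helper names are hypothetical. It also assumes that box coordinates are remapped together with the image, so that each annotation still covers the same defect after rotation.

```python
import random


def rotate90_ccw(img):
    """Rotate a row-major 2-D list by 90 degrees counter-clockwise."""
    h, w = len(img), len(img[0])
    return [[img[j][w - 1 - i] for j in range(h)] for i in range(w)]


def rotate_box_ccw(box, width):
    """Remap a box (x1, y1, x2, y2) for the same rotation.

    `width` is the width of the image before rotation.
    """
    x1, y1, x2, y2 = box
    return (y1, width - x2, y2, width - x1)


def perturb(img, box, sigma=8.0, gain=1.2, seed=0):
    """Apply brightness gain, Gaussian noise, and rotation in one pass."""
    rng = random.Random(seed)  # fixed seed so every model sees the same noise
    width = len(img[0])
    noisy = [
        [min(255.0, max(0.0, gain * p + rng.gauss(0.0, sigma))) for p in row]
        for row in img
    ]
    return rotate90_ccw(noisy), rotate_box_ccw(box, width)


# Toy 4 x 6 image with one annotated region at columns 4-5, rows 1-2.
image = [[100.0] * 6 for _ in range(4)]
perturbed, new_box = perturb(image, (4, 1, 6, 3))
print(len(perturbed), len(perturbed[0]), new_box)  # 6 4 (1, 0, 3, 2)
```

Fixing the random seed is one way to keep the comparison fair, since every detector is then evaluated on identical perturbed images.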
Table 7 presents the mAP@0.5 results under clean and composite disturbance conditions. As expected, all models exhibit performance degradation when subjected to compounded perturbations, indicating sensitivity to environmental variability. YOLOv8n decreases from 80.4% to 77.2%, and YOLOv11n drops from 81.0% to 78.0%, corresponding to a reduction of approximately 3 percentage points in each case. In contrast, CoastCor-Net maintains 82.6% mAP@0.5 under the same composite disturbance, showing a comparatively smaller degradation.
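The reported drops can be checked with a few lines of arithmetic. The sketch below uses only the two baseline pairs quoted above; the function name is hypothetical, and the relative drop is an additional view that the text does not report.

```python
def degradation(clean, perturbed):
    """Return (absolute drop in points, relative drop in percent)."""
    abs_drop = clean - perturbed
    rel_drop = 100.0 * abs_drop / clean
    return round(abs_drop, 1), round(rel_drop, 1)


baselines = {"YOLOv8n": (80.4, 77.2), "YOLOv11n": (81.0, 78.0)}
for name, (clean, perturbed) in baselines.items():
    print(name, degradation(clean, perturbed))
# YOLOv8n (3.2, 4.0)
# YOLOv11n (3.0, 3.7)
```

The absolute drops are 3.2 and 3.0 points, while the drops relative to each model's clean score are about 4.0% and 3.7%.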
These results suggest that the proposed spatial alignment and channel coupling mechanisms contribute to more stable feature representation under complex visual perturbations. From an engineering perspective, the enhanced robustness of CoastCor-Net indicates improved adaptability to real-world coastal UAV inspection scenarios.
4.8. Visualization Results Analysis
To further validate the capability of CoastCor-Net in recognizing fine-grained corrosion and erosion defects under complex coastal environments, this paper presents comparative visualization results between the baseline model and different improved structures, as shown in Figure 8. From left to right, the figure displays the original image, the detection results of YOLOv5, YOLOv11, and the proposed model, respectively. The red dashed regions indicate enlarged areas.
In the overall detection results, all models are able to localize the main structural regions from long-distance perspectives. However, they differ significantly in the number of fine-grained corrosion regions detected and in localization accuracy. The baseline model responds weakly to small-scale corrosion points, resulting in missed detections, and some of its predicted bounding boxes deviate from the true boundaries. In the medium-scale enlarged region, corrosion patches exhibit irregular shapes with blurred edges and are mixed with coating background textures. The baseline model identifies only some of the prominent corrosion patches in this region and fails to respond adequately to small-scale fragmented damage near the edges. In contrast, the proposed model captures corrosion details more completely, detecting more of the defect regions and fitting their boundaries more tightly.
To further explore the interpretability of the proposed model, we conducted Grad-CAM visualization analysis to compare the feature activation regions among YOLOv8, YOLOv11, and CoastCor-Net. As shown in Figure 9, the activation responses of YOLOv8 and YOLOv11 were relatively scattered, with some attention being diverted to non-defective areas. In contrast, CoastCor-Net exhibited a more concentrated and structurally consistent activation pattern around the corrosion and crack regions, especially near the blurred boundaries and fine defect areas. This demonstrates that the collaborative enhancement of the spatial alignment and channel coupling mechanisms enables the model to capture more discriminative defect features while suppressing background interference. The visualization results further validate the effectiveness of CFAM, DDEB, and CTCA in improving the reliability of defect localization.
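For readers unfamiliar with Grad-CAM, its core combination step can be sketched in a few lines. This is an illustration under stated assumptions, not the visualization code used in the paper: the feature maps and their gradients are supplied as plain nested lists, the function name is hypothetical, and the choice of target score for a detector (which the text does not specify) is left outside the sketch. Each channel weight is the spatial mean of its gradient, and the heatmap is the ReLU of the weighted sum of feature maps.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps (each H x W) into one normalized heatmap.

    activations[k][i][j]: feature map k at position (i, j).
    gradients[k][i][j]:   gradient of the target score w.r.t. that value.
    """
    # Channel weights: global average of each gradient map.
    weights = [sum(map(sum, g)) / (len(g) * len(g[0])) for g in gradients]
    h, w = len(activations[0]), len(activations[0][0])
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(wk * a[i][j] for wk, a in zip(weights, activations)))
            for j in range(w)
        ]
        for i in range(h)
    ]
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Two toy channels: the first supports the target, the second opposes it.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

In the toy case, the region activated by the opposing channel is suppressed by the ReLU, which corresponds to the concentrated versus scattered patterns discussed above.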
5. Discussion
This study focuses on the complexity of wind turbine blade corrosion detection under coastal salt spray environments and constructs a multi-module collaborative enhancement detection framework. CoastCor-Net consistently outperforms mainstream lightweight detection models on both mAP@0.5 and mAP@0.5:0.95 metrics, with more pronounced improvements in fine-grained defect recognition for the “damage” category. Under complex low-contrast environments, standard convolution structures fail to fully capture blurred corrosion boundaries, whereas spatial enhancement and channel coupling mechanisms effectively mitigate this limitation.
From a structural perspective, the DDEB module expands the receptive field through multi-scale dilated convolutions at the shallow stage, making the model more sensitive to the spatial distribution of slender cracks and irregular corrosion patches. After embedding into Stage1, the module strengthens boundary information before feature downsampling, laying a foundation for subsequent semantic representation. Ablation experiments show that introducing DDEB alone brings stable performance improvement, indicating that shallow boundary enhancement is particularly critical in coastal environmental scenarios.
CFAM achieves complementary alignment between spatial and semantic features through a bidirectional guidance mechanism. Experimental results demonstrate that this module yields the largest gain when added individually, suggesting that the primary performance bottleneck under complex background interference stems from spatial–semantic imbalance. By explicitly constructing cross-branch modulation relationships, the model maintains more stable localization capability in deeper layers, thereby improving recall for small-scale defects.
At the detection head stage, the CTCA module models cross-channel dependencies through channel transposition, enabling the classification and regression branches to obtain more consistent responses under high salt spray backgrounds. Although its individual gain is relatively modest, it plays an important role in stabilizing back-end outputs within the overall structure. In particular, the improvement in the mAP@0.5:0.95 metric reflects enhanced boundary regression accuracy.
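The link between mAP@0.5:0.95 and boundary regression can be made concrete with an IoU calculation. The sketch below is illustrative only: the box coordinates are made-up values, the function names are hypothetical, and the threshold grid follows the common COCO-style convention of ten IoU thresholds from 0.50 to 0.95 in steps of 0.05.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# COCO-style grid: 0.50, 0.55, ..., 0.95
THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]


def thresholds_passed(pred, truth):
    """Count how many IoU thresholds a single prediction satisfies."""
    score = iou(pred, truth)
    return sum(score >= t for t in THRESHOLDS)


truth = (0, 0, 100, 100)
loose = (10, 10, 110, 110)   # shifted by 10 px, IoU about 0.68
tight = (2, 2, 102, 102)     # shifted by 2 px, IoU about 0.92
print(thresholds_passed(loose, truth), thresholds_passed(tight, truth))  # 4 9
```

Both predictions count as correct at the 0.5 threshold, but only the tighter box remains a match at the stricter thresholds, so a gain in mAP@0.5:0.95 is consistent with more accurate box boundaries.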
From the perspective of scaling trends, as the number of model parameters increases, all models exhibit diminishing marginal performance gains. Compared with traditional YOLO-series models whose performance gradually saturates with scale expansion, the proposed model maintains higher accuracy under similar parameter budgets, indicating that structural optimization is more effective than simple parameter stacking. This characteristic is of practical engineering significance in resource-constrained UAV inspection scenarios.
Beyond visual detection performance, the proposed framework can support structural health monitoring (SHM) of offshore wind turbines by providing quantitative information on defect location, size, and category. These outputs can be integrated into fatigue assessment and reliability evaluation models to estimate structural risk and remaining service life, thereby assisting condition-based maintenance decisions. In large-scale wind farms, defect severity and spatial distribution can further be used to prioritize inspection and repair tasks, improving resource allocation efficiency and reducing operational risk.
However, the proposed method still has limitations. First, model training relies on high-quality annotated data. Second, although the multi-module structure keeps computational cost under control, it still adds a slight inference overhead compared with extremely lightweight models. Future research may further optimize deployment efficiency by integrating lightweight convolution alternatives or knowledge distillation strategies.
6. Conclusions
To address the problems of blurred boundaries, spatial–semantic imbalance, and insufficient channel dependency modeling in wind turbine blade corrosion, erosion, and crack detection under coastal salt spray environments, this study proposes CoastCor-Net, a multi-scale dense prediction detection framework. By introducing three enhancement modules—DDEB, CFAM, and CTCA—at key positions in the backbone network and detection head, the proposed method achieves collaborative optimization of shallow boundary enhancement, mid-level feature alignment, and back-end channel coupling modeling.
Experimental results on the Wind Turbine Blade Damage Dataset demonstrate that the proposed model outperforms mainstream single-stage detection methods on both mAP@0.5 and mAP@0.5:0.95 metrics, with notably higher recall and more stable localization in fine-grained target recognition for the “damage” category. At the same time, the model remains lightweight, and its inference speed satisfies the requirements of real-time UAV inspection.
Further ablation experiments verify the independent contributions and synergistic gains of the three modules, demonstrating that structural improvements are more effective than simple model scale expansion. The analysis of the parameter scale–performance relationship also reveals a clear diminishing marginal return trend, highlighting the parameter efficiency advantage of the proposed method.
Future work will focus on improving cross-scenario generalization capability and optimizing lightweight deployment, including the introduction of adaptive data augmentation strategies, cross-domain training mechanisms, and model compression techniques, to further enhance the practicality and stability of offshore wind power intelligent inspection systems.