Robust Transmission Line Defect Detection in Fog via Structure-Preserving and Degradation-Aware Enhancement

Li, Jiayin; Lang, Yue; Shen, Jingfei; Ma, Binbin; Li, Shuang

doi:10.3390/electronics15102136

Open Access Article

by Jiayin Li, Yue Lang *, Jingfei Shen, Binbin Ma and Shuang Li

School of Electric and Information Engineering, Hebei University of Technology, Tianjin 300401, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(10), 2136; https://doi.org/10.3390/electronics15102136

Submission received: 14 April 2026 / Revised: 11 May 2026 / Accepted: 13 May 2026 / Published: 16 May 2026

(This article belongs to the Special Issue Applications of Artificial Intelligence in Electric Power Systems)


Abstract

Unmanned aerial vehicle (UAV)-based inspection is essential for transmission line maintenance, where object detection enables reliable identification of component states and defects. However, fog-induced degradation reduces image contrast and suppresses fine structural cues, thereby significantly degrading detection performance. To address this issue, we propose a robust detection framework, termed FogTLD-YOLO, for defect detection under foggy conditions. The proposed model adopts a degradation-adaptive enhancement strategy to mitigate feature deterioration. A fog-aware gated compensation module leverages frequency-domain priors to selectively compensate degraded regions, while a structural-positional enhancement pyramid preserves geometric continuity and positional sensitivity during feature aggregation. Together, these designs improve the representation of slender structures and small objects. Extensive experiments show that FogTLD-YOLO achieves 82.1% mAP50, outperforming the best competing algorithm by 2.8% with comparable efficiency. Comprehensive analyses, including module insertion strategies, gating design variants, convolutional branch configurations, and cross-architecture evaluations, further validate the effectiveness and general applicability of the proposed design for robust defect detection in foggy inspection scenarios.

Keywords: transmission line defect detection; foggy inspection scenario; structural–positional enhancement; slender structure; Haar wavelet decomposition

1. Introduction

With the rapid advancement of modern smart grids, unmanned aerial vehicles (UAVs) have become a core technology for automated power line inspection to ensure the safe and stable operation of power systems. Compared to conventional manual methods, intelligent UAV-based inspection schemes have distinct advantages, such as wide coverage, high operational efficiency, flexible deployment, and low overall costs. Consequently, they have been widely applied to state recognition and defect detection for various power components, including insulators, power fittings, anti-vibration hammers, and conductor accessories [1,2]. Therefore, UAV-based intelligent detection supports the maintenance of digital transmission lines and helps shift grid inspections from manual reliance toward data-driven intelligence.

Deep-learning-based approaches have fundamentally reshaped power inspection in recent years. Early efforts relied on traditional image processing and handcrafted features, such as edge extraction and geometric analysis, to identify critical power lines and components [3,4]. However, these methods often lacked robustness when faced with complex backgrounds and varying illumination. With the rise of large-scale datasets, CNN-based frameworks have become the dominant approach due to their superior feature extraction capabilities. This progress encompasses both two-stage detectors, such as Faster R-CNN [5] and Mask R-CNN [6], and one-stage models, most notably RetinaNet [7] and the YOLO series [8,9,10,11,12,13]. Current research focuses on specialized architectures [14], feature enhancement for complex backgrounds [15], lightweight designs for efficient deployment [16], and adaptation to adverse weather [17]. These methods have demonstrated strong potential for engineering applications under clear-weather conditions or when the training and testing distributions are relatively well matched. However, their ability to generalize to unseen foggy target domains remains limited.

In practical applications, UAVs often encounter complex atmospheric conditions, particularly foggy weather. Atmospheric-optics studies have shown that suspended particles in foggy or bad-weather environments attenuate light during propagation, with the extinction process being closely related to wavelength, visibility, and meteorological conditions [18]. Consequently, fog-induced atmospheric scattering causes a significant degradation in image contrast, the loss of high-frequency details, and noticeable color shifts [19,20]. These degradation phenomena obscure the fine structural features of power lines and their components, severely compromising the performance of defect detection models. Overcoming these weather-induced limitations is therefore pivotal to enabling robust UAV-based inspection under clear-to-fog cross-domain conditions.

Existing studies on foggy inspection detection mainly follow two directions. The first mitigates the effects of fog through image preprocessing or restoration, aiming to enhance the visibility of input images before they are fed into a downstream detector. Within this direction, several studies rely on physics-based atmospheric scattering and light-extinction models, using transmission estimation, ambient light modeling, visibility-aware extinction estimation, spectral band-dependent attenuation analysis, or prior constraints to restore clear images or infer scene information under degraded weather conditions [21,22]. Another line of work uses deep learning to learn a direct mapping from degraded images to clear ones, often improving both defogging quality and fine-detail reconstruction [23,24,25]. While such methods can improve image quality, their optimization objectives are typically geared toward visual enhancement rather than the downstream task of object detection. For fine-grained tasks such as transmission line inspection, where small objects rely heavily on structural details, front-end image recovery alone is often insufficient to ensure robust generalization across unseen foggy target domains. The restoration process also risks introducing artifacts, over-enhancement, or local structural distortions [26,27], and such errors can propagate to the detector and undermine downstream defect recognition. Additionally, employing defogging as an independent preprocessing step leaves image restoration and object detection decoupled, hindering end-to-end optimization for the final task.

The second direction emphasizes robust feature representation learning at the detector level. Instead of treating foggy degradation merely as an input quality issue, these approaches aim to enhance the detector’s resilience to weather-induced perturbations at the feature level. Some methods focus on improving the robustness of power-component features through architectural design. In particular, attention mechanisms, multi-scale modeling, and context aggregation have been introduced to strengthen object saliency and discriminative feature extraction under adverse weather conditions [28,29,30]. Beyond power component detection, related feature extraction and fusion strategies have also been investigated in broader remote sensing perception tasks, including hyperspectral imaging, infrared–visible fusion, and multi-source visual sensing. For example, hyperspectral anomaly detection methods have employed local contrast modeling, spatial–spectral gradient feature fusion, and spectral–spatial information fusion to enhance low-contrast anomalous regions and suppress complex background interference [31,32]. Hyperspectral video tracking methods further exploit spectral–spatial angle mapping and material–motion cue fusion to improve target–background discrimination and tracking robustness [33,34]. In addition, infrared–visible fusion and multi-source remote sensing detection studies have explored detection-guided fusion, multi-branch feature extraction, and cross-modal complementary feature interaction to improve perception robustness under complex imaging conditions [35,36,37]. These studies suggest that robust perception in degraded or cluttered scenes benefits from explicit modeling of local contrast, gradient cues, complementary modality information, and spatial structural relationships. However, most of these methods rely on multi-band or multi-sensor inputs, whereas foggy transmission line inspection in this study is addressed under an RGB-based clear-to-fog setting.

Meanwhile, context aggregation modules, such as SPPCSPC, SPPELAN, and ASPP, are commonly used in modern detection and segmentation frameworks. These modules adopt spatial pyramid pooling and its variants to expand receptive fields and integrate multi-scale semantic information [11,38,39]. However, while convolutions, pooling, and aggressive downsampling help capture high-level semantics, they may also suppress fine-grained local details. This issue becomes particularly evident in low-resolution images and small-object scenarios [40,41]. These findings suggest that relying solely on large receptive fields and enriched high-level semantics is insufficient to fully compensate for the loss of fine-grained structural cues caused by foggy conditions.

A further line of work attempts to reduce the gap between clear and foggy scenes through cross-domain learning. In particular, domain adaptation methods introduce unlabeled foggy target-domain images during training and apply strategies such as adversarial alignment, statistical distribution matching, or pseudo-label self-training to improve detection performance on the target domain [42,43,44]. While such methods have proven effective in generic cross-domain scenarios, their application in power inspection, particularly for foggy defect detection, remains limited [45,46]. The scarcity of real-world foggy images has prompted some studies to adopt domain generalization (DG) instead [47,48,49]. This approach trains models exclusively on labeled fog-free power line images and aims to generalize to unseen foggy target domains without access to target-domain training samples. Despite its potential, DG has seldom been explicitly employed for foggy condition detection in power systems. In 2025, a model named YOLOv8-eRFD-AP [50] demonstrated the feasibility of DG in power inspection, yet it centers on general architectural enhancements, and its explicit modeling of fog-induced degradation remains limited.

Even with these advances, one issue remains unresolved in foggy transmission line inspection. Under foggy conditions, high-frequency information, including edges, textures, and fine structures, is suppressed far more severely than low-frequency components such as overall luminance and coarse contours. For transmission line components, this implies that critical discriminative evidence, such as local edge continuity, slender structural contours, and tiny connecting regions, is already compromised at the imaging stage. Moreover, these weakened responses can be further smoothed during feature extraction and context aggregation, especially in pooling-dominated designs. Existing approaches, however, lack a unified treatment of fog-induced high-frequency attenuation and structural over-smoothing.

To address this issue, we propose a robust detection model for foggy transmission line inspection that incorporates frequency cues related to fog-induced degradation. The proposed model is built upon the vanilla YOLOv7 framework in this study, which provides an efficient one-stage, multi-scale detection pipeline. The model is trained exclusively on clear-weather source-domain samples without accessing any foggy target-domain images during training, and is evaluated directly on the foggy target domain. To address the loss of local discriminative evidence caused by fog-induced high-frequency attenuation, we design a fog-aware gated compensation (FAGC) module. This module integrates discrete wavelet transform (DWT) into high-resolution detection features. It constructs modulation coefficients based on the energy distribution of high- and low-frequency components that are sensitive to fog-induced degradation. These coefficients adaptively control the compensation strength of local residual information. Furthermore, by combining the wavelet statistics maps with a low-frequency-dominant appearance proxy to generate a spatial gating map, the proposed model achieves selective enhancement of degradation-sensitive regions. This mechanism more effectively compensates for the loss of evidence in small objects and edge-sensitive structures.

However, compensating for damaged edges only at the high-resolution stage remains insufficient. If the subsequent context aggregation module relies on pooling-dominated structures, the compensated weak responses may again be smoothed out during deep feature propagation. For this reason, we replace SPPCSPC with a structural-positional enhancement pyramid (SPEP). Because power line components exhibit strongly directional geometric characteristics, SPEP uses direction-sensitive convolutions to enhance the modeling capacity for orientation-specific gradients. It first performs pooling-free structural enhancement through a structure-aware directional aggregation (SADA) unit composed of parallel branches with direction-sensitive and dilated convolutions, and then applies a coordinate-guided positional refinement (CGPR) unit. This design strengthens the continuity representation of slender structures and improves localization and discriminability under complex backgrounds.

The main contributions of this paper are summarized as follows:

A framework for transmission line defect detection in fog is proposed. This framework is trained exclusively on clear-weather data and tested on foggy scenes with varying fog densities.
A fog-aware gated compensation (FAGC) module is proposed to explicitly introduce fog-related frequency priors into the detection process. By leveraging frequency band energy statistics and a spatial gating mechanism, local residual information is selectively compensated, thereby improving model adaptability to fog-induced local evidence degradation.
A structural–positional enhancement pyramid (SPEP) is designed to replace traditional pooling-dominated context aggregation with a pooling-free, multi-branch architecture. By integrating direction-sensitive convolutions and coordinate attention, it mitigates the over-smoothing of slender structures and small objects under foggy conditions.
Extensive experiments are conducted on a public transmission line inspection dataset. Results demonstrate that the proposed method outperforms various mainstream models across multiple fog density conditions, exhibiting particularly strong robustness in detecting slender structures.

The remainder of this paper is organized as follows. Section 2 introduces the proposed FogTLD-YOLO framework in detail. Section 3 presents the experimental settings and a comprehensive evaluation of the model performance, including comparative and ablation experiments. Section 4 provides further discussion and analysis of the proposed design. Section 5 summarizes the main findings of this study and discusses future work.

2. Methodology

2.1. Problem Formulation

In practical transmission line inspection, foggy conditions vary widely in density and coverage, making it impractical to collect and annotate exhaustive foggy data in advance. Detectors trained on clear-weather images alone tend to degrade significantly under fog due to domain shifts. Therefore, in this work, the model is trained solely on clear-weather source-domain data without exposure to foggy target-domain images and is expected to be tested directly on unseen foggy target domains with varying fog densities.

Unlike general domain shifts, performance degradation in foggy object detection stems not only from statistical distribution discrepancies but also from structured degradation governed by the physics of atmospheric scattering. Specifically, fog often suppresses high-frequency components, such as edges, local textures, and fine-grained geometric details [51,52,53]. For critical transmission line components, especially the insulator, stockbridge, and spacer categories, this degradation implies that essential discriminative cues, particularly local edge continuity and slender structures, are already weakened at the imaging stage. This environment-induced, frequency-asymmetric degradation is further exacerbated by the architectural design of the detector. Operations such as multi-scale pooling, downsampling, and strided convolutions can collectively act as low-pass filters, thereby imposing cumulative smoothing effects. Consequently, these effects result in greater localization bias and a higher rate of missed objects.

Therefore, the model is required to address two challenges, namely the environmental suppression of high-frequency discriminative features and the architectural over-smoothing during the aggregation of deep features.

Figure 1 illustrates the configuration of the source and target domains in this study. During the training phase, the model is exclusively trained on labeled samples from the clear-weather source domain. In the inference stage, the model is evaluated across multiple target-domain test sets with various fog densities. Notably, during training, the proposed method does not rely on any target-domain samples and performs no explicit distribution alignment between the source and target domains.

The clear-weather source-domain training set is denoted by

$$D_{s} = \{(x_{i}^{s}, y_{i}^{s})\}_{i = 1}^{N_{s}}, \tag{1}$$

where $x_{i}^{s} \in \mathbb{R}^{3 \times H_{0} \times W_{0}}$ denotes the $i$-th clear-weather image, $y_{i}^{s}$ denotes the corresponding object detection annotation, including category labels and bounding-box information, and $N_{s}$ is the number of source-domain samples.

For the target domain, the $k$-th fog density subset is defined as

$$D_{t}^{(k)} = \{(x_{j}^{t,k}, y_{j}^{t,k})\}_{j = 1}^{N_{t}^{(k)}}, \quad k = 1, \dots, K, \tag{2}$$

and the multi-density foggy target domain is denoted by

$$D_{t} = \{D_{t}^{(k)}\}_{k = 1}^{K}. \tag{3}$$

Under this formulation, the objective of this study is to develop a fog-robust detector using only supervised information from the clear-weather source domain while maintaining stable detection performance across multiple unseen foggy target domains with different fog densities.

2.2. Overall Framework

Figure 2 illustrates the overall architecture of the proposed FogTLD-YOLO under the clear-to-fog setting. FogTLD-YOLO is an end-to-end one-stage detector developed based on the YOLOv7 detection framework, consisting of a backbone, a neck, and detection heads. Within this framework, the backbone extracts hierarchical feature representations from the input image, the neck performs multi-scale feature fusion and propagation, and the detection heads generate bounding-box regression, objectness, and category predictions at different detection scales. As shown in Figure 2, the three prediction branches are denoted as P3, P4, and P5, corresponding to the high-, medium-, and low-resolution detection branches, respectively. For an input image of $640 \times 640$ pixels, P3, P4, and P5 correspond to feature-map spatial sizes of $80 \times 80$, $40 \times 40$, and $20 \times 20$, respectively. To improve robustness under foggy conditions, two dedicated modules are incorporated into the detection pipeline. First, the fog-aware gated compensation (FAGC) module is inserted into the high-resolution prediction branch P3 to compensate for fog-weakened local structural evidence. Second, the original deep aggregation module is replaced by the structural–positional enhancement pyramid (SPEP), which strengthens structure-aware feature aggregation and alleviates the loss of fine structural cues during deep feature propagation. As shown in Figure 2, these two modules are integrated into different parts of the network while the overall detection pipeline remains unchanged.
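As a quick check of the spatial sizes quoted above, the following sketch derives the P3, P4, and P5 grid sizes from the input resolution. The strides of 8, 16, and 32 are not stated in the text; they are inferred here from the ratio of 640 to 80, 40, and 20, and the function name is purely illustrative.

```python
def grid_sizes(input_size=640, strides=None):
    """Return the square feature-map size of each detection branch."""
    if strides is None:
        # Assumed strides, inferred from 640 -> 80 / 40 / 20.
        strides = {"P3": 8, "P4": 16, "P5": 32}
    return {name: input_size // s for name, s in strides.items()}


if __name__ == "__main__":
    for name, size in grid_sizes(640).items():
        print(f"{name}: {size} x {size}")
```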

Accordingly, FogTLD-YOLO is optimized with the standard detection objective, which consists of box regression, objectness, and classification terms.

For positive samples, the bounding-box regression loss is defined as

$$L_{\mathrm{box}} = \frac{1}{N_{\mathrm{pos}}} \sum_{n = 1}^{N_{\mathrm{pos}}} \left(1 - \mathrm{CIoU}(b_{n}, \hat{b}_{n})\right), \tag{4}$$

where $b_{n}$ and $\hat{b}_{n}$ denote the ground-truth box and the predicted box of the $n$-th positive sample, respectively, and $N_{\mathrm{pos}}$ is the number of positive samples.
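To make Equation (4) concrete, the sketch below evaluates the box loss for axis-aligned boxes in pure Python. It uses the commonly cited CIoU definition (IoU minus a normalized center-distance term and an aspect-ratio term); the corner-coordinate box format `(x1, y1, x2, y2)`, the `eps` guard, and the function names are assumptions made for illustration, not details taken from the paper.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU between two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU from the intersection and union areas.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    iou = inter / (wa * ha + wb * hb - inter + eps)

    # Squared center distance over the squared diagonal of the enclosing box.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (
        math.atan(wa / (ha + eps)) - math.atan(wb / (hb + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v


def box_loss(gt_boxes, pred_boxes):
    """Equation (4): mean of 1 - CIoU over the positive samples."""
    terms = [1.0 - ciou(b, b_hat) for b, b_hat in zip(gt_boxes, pred_boxes)]
    return sum(terms) / len(terms)
```

A perfectly matched box contributes a loss close to zero, while two disjoint boxes give a loss above 1 because the center-distance penalty is added on top of the zero overlap.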

The objectness loss is defined as

$$L_{\mathrm{obj}} = \frac{1}{N_{\mathrm{obj}}} \sum_{m = 1}^{N_{\mathrm{obj}}} \mathrm{BCE}(o_{m}, \hat{o}_{m}), \tag{5}$$

where $o_{m}$ and $\hat{o}_{m}$ denote the ground-truth objectness label and the predicted objectness score of the $m$-th prediction, respectively, and $N_{\mathrm{obj}}$ is the number of predictions contributing to the objectness loss.

For positive samples, the classification loss is defined as

$$L_{\mathrm{cls}} = \frac{1}{N_{\mathrm{pos}}} \sum_{n = 1}^{N_{\mathrm{pos}}} \mathrm{BCE}(c_{n}, \hat{c}_{n}), \tag{6}$$

where $c_{n}$ and $\hat{c}_{n}$ denote the ground-truth class vector and the predicted class vector of the $n$-th positive sample, respectively.

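Accordingly, the overall detection loss is given by

$$L_{\mathrm{det}} = \lambda_{\mathrm{box}} L_{\mathrm{box}} + \lambda_{\mathrm{obj}} L_{\mathrm{obj}} + \lambda_{\mathrm{cls}} L_{\mathrm{cls}}, \tag{7}$$

where $\lambda_{\mathrm{box}}$, $\lambda_{\mathrm{obj}}$, and $\lambda_{\mathrm{cls}}$ are the balancing coefficients.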
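The binary cross-entropy terms and the weighted sum in Equation (7) can be sketched in a few lines of pure Python. The helper names are hypothetical, scalar targets and probabilities stand in for the per-class vectors of Equation (6), and the balancing coefficients are left as explicit arguments because their values are not given in this section.

```python
import math


def bce(target, prob, eps=1e-7):
    """Binary cross-entropy for one target in [0, 1] and one probability."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def mean_bce(targets, probs):
    """Average BCE over paired targets and predictions, as in Eqs. (5), (6)."""
    return sum(bce(t, p) for t, p in zip(targets, probs)) / len(targets)


def detection_loss(l_box, l_obj, l_cls, lam_box, lam_obj, lam_cls):
    """Equation (7): weighted sum of the three loss terms."""
    return lam_box * l_box + lam_obj * l_obj + lam_cls * l_cls


if __name__ == "__main__":
    l_obj = mean_bce([1.0, 0.0, 0.0], [0.9, 0.2, 0.1])
    l_cls = mean_bce([1.0, 0.0], [0.7, 0.1])
    # The weights below are placeholders chosen only for this demo.
    print(detection_loss(0.3, l_obj, l_cls, lam_box=1.0, lam_obj=1.0, lam_cls=1.0))
```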

Under the clear-to-fog setting, FogTLD-YOLO is trained only with labeled source-domain data. The trained parameter set $\theta^{*}$ is obtained by

$$\theta^{*} = \arg\min_{\theta} \frac{1}{N_{s}} \sum_{i = 1}^{N_{s}} L_{\mathrm{det}}\left(f_{\theta}(x_{i}^{s}), y_{i}^{s}\right), \tag{8}$$

where $x_{i}^{s}$ and $y_{i}^{s}$ denote the $i$-th source-domain image and its corresponding annotation, respectively, and $N_{s}$ is the number of source-domain training samples.

After training, the learned detector $f_{\theta^{*}}$ is directly evaluated on the unseen foggy target-domain test subsets $D_{t}^{(k)}$ under different fog density conditions, where $k = 1, \dots, K$.

The detailed structures and operating mechanisms of FAGC and SPEP are described in the following subsections.

2.3. Fog-Aware Gated Compensation Module

To compensate for fog-induced attenuation of local evidence in high-resolution features, the proposed fog-aware gated compensation (FAGC) module is incorporated into the P3 prediction branch, as illustrated in Figure 3. Figure 4 presents the overall workflow of FAGC. Rather than indiscriminately amplifying feature responses, FAGC performs position-sensitive residual compensation guided by frequency priors that are consistent with the physical characteristics of fog-induced degradation. Specifically, fixed Haar wavelet decomposition [54] is employed to extract frequency-domain representations, from which statistical mappings are constructed to provide interpretable degradation-related cues. The fixed Haar wavelet transform is adopted because it provides a lightweight frequency decomposition while preserving local discontinuities related to edges and fine structures. This property is compatible with the present task, where fog-induced degradation mainly affects local structural evidence in high-resolution features. Based on these cues, the module adaptively modulates the compensation strength across spatial locations. It is worth noting that only the convolutional, gating, and fusion layers are learnable, and all parameters are optimized end-to-end using source-domain detection supervision. Consequently, FAGC does not explicitly model the distribution of foggy target domains. Instead, it encourages the network to exploit high-frequency edges, local textures, and fine-grained structures in a more robust manner under clear-weather training conditions.

2.3.1. Fog-Aware Wavelet-Guided Gating Mechanism

To enable spatially adaptive compensation under foggy conditions, a fog-aware wavelet-guided gating mechanism is developed to estimate degradation severity and localize compensation-worthy regions based on frequency-domain cues. This mechanism consists of three parts. First, the input feature is decomposed into multi-frequency subbands to characterize fog-related frequency variations. Second, frequency-domain statistical maps are constructed to describe the distribution of high- and low-frequency responses. Third, these cues are used to generate a spatial gating signal for subsequent local residual compensation.

Fog-induced atmospheric scattering degrades visual quality by attenuating high-frequency components associated with edges, local textures, and fine structures, while relatively amplifying low-frequency components corresponding to smooth regions. For transmission line inspection objects, this frequency shift means that critical local evidence, such as edge continuity and fine structural cues, is weakened under foggy conditions. Therefore, characterizing the relative distribution of high- and low-frequency components provides a physically grounded way to describe fog-induced feature degradation. Motivated by this observation, a fixed Haar wavelet transform [54] is employed to decompose the input feature $X \in \mathbb{R}^{C \times H \times W}$ into multi-frequency subbands, enabling explicit characterization of fog-induced distortions:

$$\mathrm{DWT}(X) = (LL, LH, HL, HH), \quad LL, LH, HL, HH \in \mathbb{R}^{C \times \frac{H}{2} \times \frac{W}{2}}, \tag{9}$$

where $LL$ denotes the low-frequency approximation subband, and $LH$, $HL$, and $HH$ denote the high-frequency detail subbands in the horizontal, vertical, and diagonal directions, respectively.

The low-frequency and high-frequency energies are then defined as

$$\begin{aligned} E_{LL} &= \mathrm{Mean}(|LL|), \\ E_{HF} &= \mathrm{Mean}(|LH| + |HL| + |HH|), \end{aligned} \tag{10}$$

where $|\cdot|$ denotes the element-wise absolute value and $\mathrm{Mean}(\cdot)$ denotes averaging over all channels and spatial locations.

Based on these statistics, the fog-aware coefficient is defined as

$$f_{raw} = \frac{E_{LL}}{E_{LL} + E_{HF} + \varepsilon}, \tag{11}$$

where $\varepsilon$ is a small constant for numerical stability. The coefficient $f_{raw}$ reflects the relative dominance of low-frequency energy over high-frequency energy. As fog density increases, high-frequency responses tend to be suppressed more strongly, and the value of $f_{raw}$ correspondingly becomes larger. In this way, this coefficient serves as a degradation-related modulation signal for the subsequent compensation process.
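A minimal NumPy sketch of Equations (9) to (11) is given below. It assumes an orthonormal single-level Haar transform (each subband is a signed sum of the four samples in a 2 × 2 block divided by 2) and one particular LH/HL labelling, since neither the scaling nor the naming convention is specified in the text. The "hazy" input is only a crude contrast-reduction stand-in for fog, used to show that the coefficient grows when high-frequency energy is suppressed.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform of a (C, H, W) array with even H and W.

    Assumes orthonormal scaling (division by 2). Subband naming conventions
    differ between libraries, so the LH/HL labels are one possible choice.
    """
    a = x[:, 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0  # difference between the two rows
    hl = (a - b + c - d) / 2.0  # difference between the two columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def fog_coefficient(x, eps=1e-6):
    """Equations (10) and (11): share of low-frequency energy in x."""
    ll, lh, hl, hh = haar_dwt2(x)
    e_ll = np.mean(np.abs(ll))
    e_hf = np.mean(np.abs(lh) + np.abs(hl) + np.abs(hh))
    return float(e_ll / (e_ll + e_hf + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sharp = rng.random((4, 16, 16))
    # Crude stand-in for fog: blend every value toward the global mean,
    # which lowers contrast and therefore high-frequency energy.
    hazy = 0.3 * sharp + 0.7 * sharp.mean()
    print(fog_coefficient(sharp), fog_coefficient(hazy))
```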

Based on the decomposed components, statistical descriptors are constructed to capture the distribution of high- and low-frequency responses. These descriptors further provide spatial cues for identifying regions that require compensation.

While global degradation estimation provides informative guidance, spatially adaptive refinement remains necessary due to the non-uniform distribution of fog effects and the distinct local statistics between component regions and the background. Thus, a spatial gating strategy is introduced to further identify regions that require compensation. As shown in Figure 3, multiple wavelet-domain statistical maps are extracted, including high-frequency mean and maximum response maps, as well as low-frequency mean and standard deviation maps. Here, $\mathrm{Avg}_{c}(\cdot)$ and $\mathrm{Std}_{c}(\cdot)$ denote the channel-wise mean and the channel-wise standard deviation, respectively, and $\mathrm{Up}(\cdot)$ denotes bilinear upsampling to the same spatial resolution as $X$. Accordingly,

$$M_{hf}^{avg} = \mathrm{Up}\left(\frac{1}{3}\left[\mathrm{Avg}_{c}(|LH|) + \mathrm{Avg}_{c}(|HL|) + \mathrm{Avg}_{c}(|HH|)\right]\right), \tag{12}$$

$$M_{hf}^{\max} = \mathrm{Up}\left(\max\left\{\mathrm{Avg}_{c}(|LH|), \mathrm{Avg}_{c}(|HL|), \mathrm{Avg}_{c}(|HH|)\right\}\right), \tag{13}$$

$$M_{ll}^{avg} = \mathrm{Up}\left(\mathrm{Avg}_{c}(|LL|)\right), \tag{14}$$

$$M_{ll}^{std} = \mathrm{Up}\left(\mathrm{Std}_{c}(LL)\right). \tag{15}$$
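The four statistical maps of Equations (12) to (15) can be sketched with NumPy as follows. The subbands are taken as given inputs of shape `(C, H/2, W/2)`, and the channel-wise statistics are read as reductions over the channel axis. Nearest-neighbour replication is substituted for the bilinear `Up(·)` purely to keep the example dependency-free, so the values differ slightly from a bilinear implementation.

```python
import numpy as np


def upsample2(m):
    """2x nearest-neighbour upsampling of an (H, W) map.

    Stand-in for the bilinear Up(.) in the text, chosen for brevity.
    """
    return np.repeat(np.repeat(m, 2, axis=0), 2, axis=1)


def wavelet_stat_maps(ll, lh, hl, hh):
    """Equations (12)-(15) for subbands of shape (C, H/2, W/2)."""
    # Channel-wise means of the absolute high-frequency subbands.
    a_lh, a_hl, a_hh = (np.abs(s).mean(axis=0) for s in (lh, hl, hh))
    m_hf_avg = upsample2((a_lh + a_hl + a_hh) / 3.0)
    m_hf_max = upsample2(np.maximum(np.maximum(a_lh, a_hl), a_hh))
    m_ll_avg = upsample2(np.abs(ll).mean(axis=0))
    m_ll_std = upsample2(ll.std(axis=0))
    return m_hf_avg, m_hf_max, m_ll_avg, m_ll_std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    subbands = [rng.standard_normal((3, 4, 5)) for _ in range(4)]
    for name, m in zip(("hf_avg", "hf_max", "ll_avg", "ll_std"),
                       wavelet_stat_maps(*subbands)):
        print(name, m.shape)
```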

Specifically, $M_{hf}^{avg}$ is used to characterize the overall strength of high-frequency responses, $M_{hf}^{\max}$ is used to highlight the most salient local high-frequency activations, and $M_{ll}^{avg}$ together with $M_{ll}^{std}$ reflect the low-frequency-dominant luminance distribution and local smoothness, respectively. Together, these maps provide complementary cues for spatial selection. The high-frequency maps are more closely related to local structural responses, whereas the low-frequency maps reflect smoothness and luminance-dominant context. Their combination provides a more suitable basis for identifying regions that require compensation.

While wavelet-based statistics provide degradation-aware structural cues, spatial selection also depends on local appearance context, particularly for suppressing over-compensation in homogeneous foggy regions. To this end, a low-frequency-dominant appearance proxy is incorporated as an auxiliary cue, capturing slowly varying contextual patterns. The low-frequency-dominant appearance proxy map is defined as

$$M_{app} = \phi\left(\frac{|X - \mu(X)|}{\sigma(X) + \varepsilon}\right), \tag{16}$$

where $\mu(X)$ and $\sigma(X)$ denote the channel-wise mean map and standard-deviation map of the input feature, respectively, and $\phi(\cdot)$ denotes a lightweight mapping function used to compress the normalized deviation into a single-channel representation.

The wavelet-statistics maps and the low-frequency-dominant appearance proxy map are then normalized and concatenated. The resulting fused descriptor is transformed by a lightweight convolution followed by a Sigmoid activation to generate the spatial gating map $G \in [0, 1]^{1 \times H \times W}$. By integrating frequency-aware statistics and appearance-aware cues, the proposed gating mechanism achieves more discriminative spatial selection, thereby enabling targeted compensation of degraded regions while avoiding unnecessary amplification of background noise.
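The gating step can be illustrated with the NumPy sketch below, which is a reading of Equation (16) and the paragraph above under several stated assumptions rather than the authors' implementation: $\mu$ and $\sigma$ are taken as per-channel spatial statistics, $\phi$ is replaced by a plain mean over channels, the normalization is min-max, and the lightweight convolution is modeled as a 1 × 1 convolution (a per-pixel weighted sum) with hypothetical weights.

```python
import numpy as np


def appearance_proxy(x, eps=1e-6):
    """Equation (16) under two assumptions: mu and sigma are per-channel
    spatial statistics, and phi is a plain mean over channels."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    return (np.abs(x - mu) / (sigma + eps)).mean(axis=0)


def min_max(m, eps=1e-6):
    """Rescale a map to [0, 1]; the normalization type is an assumption."""
    return (m - m.min()) / (m.max() - m.min() + eps)


def gating_map(stat_maps, m_app, weights, bias=0.0):
    """Normalize and stack the cue maps, mix them with a 1x1 convolution
    (a per-pixel weighted sum), and squash the result with a sigmoid."""
    stacked = np.stack([min_max(m) for m in (*stat_maps, m_app)])  # (5, H, W)
    logits = np.tensordot(weights, stacked, axes=1) + bias          # (H, W)
    return (1.0 / (1.0 + np.exp(-logits)))[None]                    # (1, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 8))
    # Random placeholders for the four wavelet-statistics maps.
    stats = [rng.random((8, 8)) for _ in range(4)]
    g = gating_map(stats, appearance_proxy(x), weights=rng.standard_normal(5))
    print(g.shape, float(g.min()), float(g.max()))
```

In a trained network the weights would be learned from the source-domain detection loss; here they are random and serve only to show the shapes and value range of the gate.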

2.3.2. Gated Residual Compensation Module

Once the regions requiring compensation are identified, local residual information should be re-introduced in a controlled manner to compensate for degraded local details. Direct amplification of the input feature maps often leads to noise accumulation and numerical instability in the representation space. To overcome these issues, the proposed FAGC employs a lightweight local residual compensation strategy, which selectively re-injects high-frequency components essential for structural cue compensation.

To extract multi-scale structural cues, we first apply local smoothing operations. Given an input feature map $X$, the smoothed representations $X_{n}$ at different scales are obtained via average pooling:

$$X_{n} = \mathrm{AvgPool}_{n \times n}(X), \quad n \in \{3, 7\}. \tag{17}$$

The band-pass component and the high-pass component are then defined as

$$B = X_{3} - X_{7}, \quad H = X - X_{3}. \tag{18}$$

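Finally, the fog-aware coefficient and the spatial gating map jointly modulate the residual injection, and the output of FAGC is formulated as

$$Y = X + f_{raw} \left(F_{bh} \odot G\right), \tag{19}$$

where $F_{bh}$ denotes the local residual feature obtained by fusing the band-pass component $B$ and the high-pass component $H$, and $\odot$ denotes the element-wise product.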
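The residual compensation of Equations (17) to (19) can be sketched with NumPy as follows. Stride-1 pooling with edge padding is assumed so that the pooled maps keep the shape of $X$, and the fused residual is replaced by the simple sum of the band-pass and high-pass components; the learnable fusion layer is not specified in this section, so that sum is only a placeholder.

```python
import numpy as np


def avg_pool(x, k):
    """Stride-1 k x k average pooling on (C, H, W) with edge padding."""
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), mode="edge")
    h, w = x.shape[1:]
    out = np.zeros(x.shape, dtype=float)
    # Sum the k*k shifted views of the padded input, then average.
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def fagc_output(x, f_raw, gate):
    """Equations (17)-(19) with a hypothetical fusion F_bh = B + H."""
    x3, x7 = avg_pool(x, 3), avg_pool(x, 7)
    band = x3 - x7      # B: band-pass component
    high = x - x3       # H: high-pass component
    f_bh = band + high  # placeholder for the learnable fusion layer
    return x + f_raw * (f_bh * gate)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 8, 8))
    gate = rng.random((1, 8, 8))  # broadcast over channels
    y = fagc_output(x, f_raw=0.8, gate=gate)
    print(y.shape, float(np.abs(y - x).mean()))
```

Two properties follow directly from the formulation: a perfectly flat feature map has no band-pass or high-pass content and passes through unchanged, and a gate of zero disables the compensation entirely.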

2.4. Structural-Positional Enhancement Pyramid

To mitigate structural over-smoothing caused by pooling-dominated aggregation, a structural–positional enhancement pyramid (SPEP) is proposed to enhance feature representation under foggy conditions. The proposed SPEP aims to enlarge the receptive field while preserving structural continuity and positional sensitivity, which are critical for accurately representing slender structures and small objects. In particular, it alleviates the dilution of fine-grained geometric details during deep feature aggregation.

The SPEP is composed of two complementary components. The first is a Structure-Aware Directional Aggregation (SADA) unit, which adopts pooling-free parallel branches to capture contextual information while preserving directional structural cues. The second is a Coordinate-Guided Positional Refinement (CGPR) unit, which enhances feature responses along horizontal and vertical directions to improve spatial sensitivity. Together, these components enable more structure-preserving and position-aware feature encoding under fog-induced degradation. The overall architecture of SPEP is illustrated in Figure 5.

2.4.1. Structure-Aware Directional Aggregation

This component aims to enlarge the receptive field while preserving local structural continuity. To this end, a multi-branch directional aggregation unit is designed to aggregate contextual information without relying on pooling operations, which are known to induce structural over-smoothing.

Given an input feature

F \in R^{C \times H \times W}

, channel compression is first applied via a

1 \times 1

convolution:

F_{r} = ϕ_{1 \times 1} (F),

(20)

where

ϕ_{1 \times 1} (\cdot)

denotes the

1 \times 1

convolutional mapping.

Based on

F_{r}

, three parallel branches are constructed to capture complementary structural and contextual information, yielding

F_{1} = ϕ_{1} (F_{r})

,

F_{2} = ϕ_{2} (F_{r})

, and

F_{3} = ϕ_{3} (F_{r})

. The first two branches are designed to encode directional structural cues, while the third branch captures broader contextual information. Compared with standard isotropic convolution, direction-sensitive convolution is more suitable for representing directional continuity in elongated structures. This property is particularly relevant to transmission line components, such as wires, insulator strings, and connectors, whose discriminative features rely more strongly on consistent local geometry.

Specifically,

ϕ_{1} (\cdot)

successively applies

1 \times 3

and

3 \times 1

convolutional modules, followed by a

3 \times 3

convolutional module,

ϕ_{2} (\cdot)

successively applies

3 \times 1

and

1 \times 3

convolutional modules, followed by a

3 \times 3

convolutional module, and

ϕ_{3} (\cdot)

combines a dilated

3 \times 3

convolution with two standard

3 \times 3

convolutions to introduce contextual information with a larger receptive field. By combining directional-sensitive and dilation-based operations, the proposed design effectively balances structural preservation and contextual aggregation.

The outputs of the three branches are concatenated and fused through a

1 \times 1

convolution, followed by a residual connection with the input feature to produce the enhanced representation.

Unlike pooling-based aggregation, which tends to blur fine geometric details, the proposed aggregation mechanism preserves directional continuity during receptive field expansion. As a result, it alleviates structural over-smoothing and improves feature fidelity for slender and strip-like objects under challenging conditions.
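The contrast between pooling-based and direction-aligned aggregation can be shown with a toy example. The sketch below is only an illustration under simplifying assumptions: fixed averaging kernels stand in for the learned convolutions, and a one-pixel-wide horizontal line stands in for a wire. The helper box_filter is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_filter(img: np.ndarray, kh: int, kw: int) -> np.ndarray:
    """Average over a kh x kw window (stride 1, zero padding)."""
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    return sliding_window_view(padded, (kh, kw)).mean(axis=(-2, -1))


img = np.zeros((9, 9))
img[4, :] = 1.0  # one-pixel-wide horizontal "wire"

pooled = box_filter(img, 3, 3)       # isotropic 3x3 average pooling
directional = box_filter(img, 1, 3)  # 1x3 averaging aligned with the wire

print("on the wire   :", pooled[4, 4], directional[4, 4])
print("one row above :", pooled[3, 4], directional[3, 4])
```

In this toy case the isotropic window reduces the response on the wire to one third and leaks it into the neighboring rows, whereas the aligned 1 × 3 window keeps the full response and leaves the neighboring rows at zero.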

2.4.2. Coordinate-Guided Positional Refinement

Power inspection components often exhibit strong directional layouts and spatial continuity, for which channel-wise attention alone is insufficient to capture their positional dependencies. To address this limitation, a coordinate-guided positional refinement unit is introduced to recalibrate feature responses along the horizontal and vertical directions after structural enhancement.

The enhanced feature produced by the directional aggregation stage is denoted as

F_{SADA} \in R^{C \times H \times W}

, where H and W denote the height and width of the feature map. To encode positional information, global average aggregation is performed independently along the width and height dimensions, yielding two directional descriptors,

A_{h}

and

A_{w}

, respectively. These descriptors capture long-range dependencies along horizontal and vertical directions, thereby preserving coordinate-aware contextual information.

To ensure stable feature recalibration, the two directional responses are combined through averaging rather than direct summation:

G_{coord} = \frac{1}{2} (A_{h} + A_{w}) .

(21)

This mitigates the response imbalance caused by excessive amplification along a single direction. The refined feature is then obtained as

F_{SPEP} = F_{SADA} ⊙ G_{coord} .

(22)

Conventional coordinate attention may introduce overly strong directional bias, whereas the proposed design adopts a more balanced modulation strategy, making it more suitable as a post-aggregation recalibration module. This design reinforces spatial continuity of component structures while suppressing interference from diffuse foggy backgrounds. In particular, for slender and small objects with blurred boundaries and reduced contrast, the coordinate-guided refinement enhances positional consistency and strengthens spatial constraints, thereby improving the stability of localization representations and benefiting downstream bounding-box regression.
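Equations (21) and (22) can be sketched in a few lines of NumPy. This is a simplified illustration: any learned transformation or activation applied to the directional descriptors is omitted because it is not specified in this section, and the function name coordinate_refine is hypothetical.

```python
import numpy as np


def coordinate_refine(f_sada: np.ndarray):
    """Sketch of Eqs. (21)-(22) for a (C, H, W) feature map."""
    a_h = f_sada.mean(axis=2, keepdims=True)  # (C, H, 1): averaged over width
    a_w = f_sada.mean(axis=1, keepdims=True)  # (C, 1, W): averaged over height
    g_coord = 0.5 * (a_h + a_w)               # broadcasts to (C, H, W)
    return f_sada * g_coord, g_coord


rng = np.random.default_rng(0)
f_sada = rng.uniform(size=(8, 20, 20))
f_spep, g_coord = coordinate_refine(f_sada)
print(f_spep.shape, g_coord.shape)
```

Because the two descriptors are averaged, the modulation at each position stays within the range spanned by its row and column descriptors, whereas a direct sum would double that scale.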

3. Experiments and Analysis

3.1. Experimental Settings

The experiments were conducted on the PTL-AI Furnas Dataset [55], an aerial-image dataset designed for UAV-based power transmission line inspection. The dataset contains 6295 images covering five component categories, namely baliser, bird nest, insulator, spacer, and stockbridge. Representative annotated examples and label definitions are shown in Figure 6. Among these categories, insulator, baliser, and spacer are further divided into substates (for example, insulator_ok and insulator_nok).

The dataset was split into training, validation, and test sets in an 8:1:1 ratio using stratified sampling to preserve the original class distribution. Under the clear-to-fog setting considered in this study, the clear-weather training and validation subsets constituted the source domain. The target domain was constructed from the original clear-weather test set by applying the synthetic fog generation algorithm proposed in [56]. Specifically, three fog density levels were generated by setting the thickness parameter to 0.05, 0.07, and 0.09, respectively, and the foggy target domain was formed by evenly mixing samples synthesized under these three settings, with each level accounting for one-third of the target-domain samples. Figure 7 shows the visual differences between representative clear-weather reference images and their fog-synthesized counterparts under different fog density levels.
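The even mixing of the three fog levels in the target domain can be sketched as follows. The image identifiers, the test-set size of 630 (roughly one tenth of 6295), and the helper name are hypothetical; the fog synthesis of [56] itself is not reproduced here.

```python
import random
from collections import Counter

FOG_LEVELS = (0.05, 0.07, 0.09)  # thickness parameters used in the paper


def assign_fog_levels(image_ids, seed=0):
    """Shuffle the test images and assign fog levels round-robin,
    so each level covers one third of the samples (up to rounding)."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    return {img_id: FOG_LEVELS[i % len(FOG_LEVELS)] for i, img_id in enumerate(ids)}


# Hypothetical identifiers for the clear-weather test images.
test_ids = [f"img_{i:04d}" for i in range(630)]
assignment = assign_fog_levels(test_ids)
counts = Counter(assignment.values())
print(counts)
```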

Standard object-detection metrics were adopted in this study, including precision (P), recall (R), average precision (AP), and mean average precision (mAP). In all experiments, mAP50 was adopted as the main evaluation metric and is referred to as ‘mAP’ for brevity. In addition, mAP50-95 was reported in Table 1 to evaluate detection performance under stricter localization thresholds. These metrics are defined as follows:

\mathrm{Precision} = \frac{TP}{TP + FP}

(23)

\mathrm{Recall} = \frac{TP}{TP + FN}

(24)

AP = \int_{0}^{1} P (R) \, dR

(25)

mAP = \frac{1}{n} \sum_{i = 1}^{n} AP_{i}

(26)

Here, true positive (TP) denotes a predicted bounding box that correctly matches a ground-truth object of the same category under the specified IoU threshold, false positive (FP) denotes an incorrect prediction, and false negative (FN) denotes a missed ground-truth object. AP denotes the area under the precision–recall curve for a given category, and mAP is the average AP over all categories.
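Equations (23)–(26) can be made concrete with a short sketch. The paper does not state how the integral in Equation (25) is approximated; the code below assumes the all-point interpolation with a monotone precision envelope that is common in detection evaluators. The toy detections and function names are hypothetical.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Eqs. (23) and (24), with zero returned for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(detections, num_gt: int) -> float:
    """Area under the precision-recall curve for one category.

    detections: iterable of (confidence, is_true_positive) pairs, where the
    TP/FP decision was already made at the chosen IoU threshold.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Monotone envelope: precision at recall r is the best precision at >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_values) -> float:
    """Eq. (26): mean of per-category AP values."""
    ap_values = list(ap_values)
    return sum(ap_values) / len(ap_values)


ap_a = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
ap_b = average_precision([(0.6, False), (0.5, True)], num_gt=1)
print(ap_a, ap_b, mean_average_precision([ap_a, ap_b]))
```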

All experiments were conducted on a workstation equipped with an NVIDIA GeForce RTX 4080 GPU (16 GB memory) running Ubuntu 22.04.4 LTS. The proposed model and all the comparative algorithms were implemented in PyTorch 2.9.1 with CUDA 12.6 acceleration, using Python 3.11.14.

To ensure a fair and reproducible comparison, all detectors, including YOLO-series one-stage detectors and two-stage detectors, were trained and evaluated using the same dataset and training protocol and were reproduced based on their official implementations [5,6,7,8,9,10,11,12,13,16,17,50]. Specifically, all detectors were trained for 200 epochs with a batch size of 16 and an input size of

640 \times 640

pixels. The comparative algorithms used the default optimization settings of their official implementations, while the proposed model was trained using stochastic gradient descent (SGD) with Nesterov momentum and an initial learning rate of 0.01. No additional foggy images were used for model training or fine-tuning. For evaluation, the checkpoint that achieved the best performance on the source-domain validation set during training was selected for testing.

3.2. Experimental Results

Comparative Experiments and Model Performance Validation

To comprehensively evaluate the effectiveness of the proposed method, we compared FogTLD-YOLO with a set of representative detectors under unified training and testing protocols. The comparative algorithms consist of representative generic one-stage detectors, namely RetinaNet [7], YOLOv5s [8], YOLOv8s [10], YOLOv9s [11], YOLOv10n [12], and YOLOv11n [13]. To mitigate the influence of model-scale differences, larger YOLO variants, including YOLOv5l [8], YOLOv8l [10], and YOLOv11l [13], are also incorporated into the comparison. In addition, two representative two-stage detectors, namely Faster R-CNN [5] and Mask R-CNN [6], are evaluated to provide a broader assessment across different detection paradigms. Finally, several task-oriented transmission line inspection models are included, namely Lite-YOLO-ID [16], CACS-YOLO [17], and YOLOv8-eRFD-AP [50]. The quantitative results are summarized in Table 1.

In Table 1, FogTLD-YOLO achieves the best overall detection performance in the mixed-fog target domain, with 78.7% precision, 79.7% recall, 82.1% mAP, and 55.1% mAP50-95. Among all comparative algorithms, YOLOv7 provides the most competitive baseline performance, achieving 79.3% mAP and 53.6% mAP50-95. Compared with YOLOv7, FogTLD-YOLO improves mAP and mAP50-95 by 2.8 and 1.5 percentage points, respectively, while reducing the number of parameters from 36.53 M to 34.64 M and slightly decreasing FLOPs from 103.30 G to 102.70 G. Figure 8 provides an intuitive comparison of the performance–complexity trade-off across different detectors. In terms of inference speed, FogTLD-YOLO reaches 131.58 FPS, ranking in the middle-to-upper range among the compared detectors and remaining suitable for real-time processing.
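The comparison with YOLOv7 can be checked with simple arithmetic on the values quoted above. The per-image time is merely the reciprocal of the reported FPS, not a separately measured latency; the dictionary keys are hypothetical names.

```python
# Values quoted from Table 1 in the text above.
yolov7 = {"mAP50": 79.3, "mAP50_95": 53.6, "params_M": 36.53, "flops_G": 103.30}
fogtld = {"mAP50": 82.1, "mAP50_95": 55.1, "params_M": 34.64, "flops_G": 102.70}
fogtld_fps = 131.58

# Absolute differences (percentage points for mAP, M and G for complexity).
delta = {k: round(fogtld[k] - yolov7[k], 2) for k in yolov7}

# Relative parameter reduction and per-image time implied by the FPS.
param_reduction_pct = 100 * (yolov7["params_M"] - fogtld["params_M"]) / yolov7["params_M"]
ms_per_image = 1000 / fogtld_fps

print(delta)
print(f"parameter reduction: {param_reduction_pct:.1f}%")
print(f"time per image: {ms_per_image:.1f} ms")
```

Under these figures, the accuracy gain is obtained with about 5% fewer parameters and an implied processing time of roughly 7.6 ms per image.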

General detectors still retain competitive values on some individual metrics, but their overall performance decreases in the unseen foggy target domain. YOLOv5s reaches a precision of 80.0%, whereas its recall and mAP are 53.1% and 56.9%, respectively. YOLOv8s, YOLOv9s, YOLOv10n, and YOLOv11n also exhibit relatively low overall performance in the clear-to-fog setting. The results of larger YOLO variants further suggest that increasing model capacity alone may not be sufficient to ensure improved foggy-domain robustness. Specifically, YOLOv5l and YOLOv8l obtain mAP values of 48.6% and 54.3%, respectively, and do not outperform their smaller-scale counterparts. Although YOLOv11l improves upon YOLOv11n, its mAP remains at 64.1%, still considerably lower than the 82.1% achieved by FogTLD-YOLO. This performance gap is more evident in defect-related categories such as insulator_nok and spacer_nok. For spacer_nok, the AP values of YOLOv8s, YOLOv9s, YOLOv10n, and YOLOv11n are only 28.7%, 38.6%, 42.5%, and 25.6%, respectively. RetinaNet, Mask R-CNN, and Faster R-CNN show a similar tendency, especially on slender and small structures. Faster R-CNN obtains 37.9% mAP and 19.8% mAP50-95 in the foggy target domain. For spacer_nok, the AP values of Mask R-CNN and Faster R-CNN are only 5.45% and 8.33%, respectively. This indicates that proposal-based two-stage detectors are less stable for defect-sensitive categories under foggy degradation. Overall, these results indicate that categories relying on thin structures, local boundaries, and small abnormal regions are more affected under foggy conditions.

A similar trend is found in task-oriented transmission line inspection detectors. Lite-YOLO-ID and CACS-YOLO obtain recall values of 33.0% and 39.9% and mAP values of 38.9% and 44.3%, respectively. Their category-level performance is also low. For insulator_nok, the AP values are 44.7% and 46.8%. For spacer_nok, they further decrease to 25.1% and 30.8%. These results indicate difficulty in preserving local defect cues and slender component structures in the present clear-to-fog setting.

YOLOv8-eRFD-AP performs better than the other task-oriented detectors. Its recall reaches 53.3%, and its mAP reaches 58.1%. However, the AP values on insulator_nok and spacer_nok remain at 51.4% and 48.4%, respectively. Compared with FogTLD-YOLO, the gap appears not only in overall mAP but also in the categories that depend more strongly on local continuity and small structural deviations.

Figure 9 shows qualitative comparison results under foggy conditions. Ground Truth (GT) denotes the annotated reference results. Red, yellow, and green boxes denote false positives, missed detections, and correct detections, respectively. The compared models include YOLOv7, YOLOv9, CACS-YOLO, and YOLOv8-eRFD-AP. In each row of Figure 9, the three columns correspond to a background-induced false-positive case, a small-object missed-detection case under heavy fog, and a category-confusion case with adjacent structures, respectively.

The image-level results help explain where the quantitative differences come from. Around the upper Stockbridge dampers, several comparative detectors produce extra responses, and some predictions extend to nearby background regions. Under heavy fog, missed detections increase after the visibility of insulator strings and neighboring slender parts decreases. In densely arranged insulator structures and connection regions, some detectors generate redundant boxes or assign incorrect categories. In these regions, FogTLD-YOLO produces fewer false positives and missed detections, and its predicted boxes are closer to the annotated object locations.

Figure 10 presents the corresponding feature activation heatmaps [57,58,59]. In each row of Figure 10, the three columns correspond to a background interference scene, a dense slender-component scene under fog, and a critical-connection region scene, respectively. The color changes from blue to red as the feature response increases. In YOLOv7, high-response regions extend beyond the objects and spread to the tower body, the sky boundary, and other background areas. YOLOv9 shows stronger responses on part of the objects, but the activations remain scattered in regions containing adjacent slender structures. CACS-YOLO and YOLOv8-eRFD-AP place more response on insulator strings and component bodies, but the separation between neighboring slender structures is still limited, and the response on connection nodes is not always concentrated.

For FogTLD-YOLO, high-response regions are more concentrated on insulator strings, spacer bodies, and connection nodes. High responses in background regions, including tower structures, sky, and vegetation, are more limited. Along elongated component paths, the responses are also less fragmented. This observation is consistent with the visual results in Figure 9, where missed detections, redundant boxes, and category confusion occur more often in the locations where other detectors show response spreading or response interruption.

3.3. Ablation Study

Ablation studies were conducted to assess the effectiveness of each proposed component, using the following three variants:

Variant I: The vanilla YOLOv7 model, where neither the fog-aware gated compensation module (FAGC) nor the structural–positional enhancement pyramid (SPEP) is introduced.

Variant II: Without the structural–positional enhancement pyramid (SPEP).

Variant III: Without the fog-aware gated compensation (FAGC) module.

As shown in Table 2, the proposed FogTLD-YOLO achieves the best overall performance, reaching 82.1% mAP in the foggy target domain. Relative to the complete model, removing either FAGC or SPEP reduces mAP from 82.1% to 81.0%, whereas removing both modules further decreases mAP to 79.3%. These results indicate that both FAGC and SPEP make effective contributions to the detection task, while their joint integration yields the strongest overall performance.

The precision and recall values reflect different effects after removing the two modules. Without SPEP, precision drops from 78.7% to 71.0%, whereas recall increases slightly from 79.7% to 81.1%. This change suggests reduced suppression of structurally ambiguous responses during deep feature aggregation. Without FAGC, precision rises from 78.7% to 80.9%, but recall decreases from 79.7% to 76.4%. This result is more closely related to the loss of compensation for weakened local evidence under foggy conditions. When the two modules are used together, the model reaches 78.7% precision and 79.7% recall and achieves the highest overall mAP among all variants, with a more balanced precision–recall trade-off.
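The precision–recall balance described above can be summarized with the F1 score, i.e., the harmonic mean of precision and recall. F1 is not reported in the paper; it is computed here only as an illustrative summary of the quoted precision and recall values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) in percent, as quoted in the ablation discussion.
variants = {
    "FogTLD-YOLO (full)": (78.7, 79.7),
    "without SPEP": (71.0, 81.1),
    "without FAGC": (80.9, 76.4),
}

f1 = {name: f1_score(p, r) for name, (p, r) in variants.items()}
for name, value in f1.items():
    print(f"{name}: F1 = {value:.2f}")
```

On these values, the complete model obtains the highest F1 (about 79.2), compared with about 75.7 without SPEP and about 78.6 without FAGC, which is consistent with the more balanced trade-off noted above.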

The category-wise results further support the different effects of the two modules. Compared with YOLOv7, FogTLD-YOLO improves five of the nine object categories, namely baliser_aok, insulator_ok, insulator_nok, spacer_ok, and spacer_nok. Among them, the most evident gains appear on insulator_nok and spacer_nok, with AP increases of 3.3% and 23.0%, respectively. These two categories are more dependent on local abnormal cues, edge continuity, and fine structural details, which are more easily weakened under foggy conditions. For categories with already high baseline AP or weaker dependence on local defect evidence, the changes are relatively small. For example, insulator_ok changes from 95.5% to 95.6%, and bird nest changes from 82.7% to 82.4%. This indicates that the proposed modules mainly improve fog-sensitive and defect-sensitive categories, rather than producing uniform AP increases across all categories. Relative to the complete model, Variant III shows a larger decrease on degradation-sensitive categories, with the largest drop of 3.5% on spacer_nok. Variant II shows more evident reductions on categories that depend on structural continuity and localization stability, including decreases of 2.8% on spacer_nok and 1.2% on spacer_ok. These category-level changes suggest that FAGC is more closely related to preserving degradation-sensitive local cues, whereas SPEP contributes more to structural continuity and localization stability during deep aggregation. Although minor decreases are observed in a few categories, the complete model still achieves the highest overall mAP, increasing from 79.3% to 82.1%. Together, these results support the complementary effects of the two modules in the final model.

4. Discussion

4.1. Impact of the Insertion Position of FAGC

FAGC was inserted separately into the P3, P4, and P5 detection branches, and also jointly into all three branches, to examine how the insertion position affects detection performance. In the following discussion, the latter is denoted as the joint P3–P5 configuration.

As shown in Figure 11, P3 gives the highest mAP, reaching 81.0%. When FAGC is inserted into the deeper branches, the performance decreases slightly. The joint P3–P5 configuration yields 78.7% mAP, which is 2.3 percentage points lower than that of P3.

This result is related to the role of FAGC in local feature modulation under foggy conditions. In the high-resolution prediction branch P3, local edge continuity, slender structures, and small abnormal regions are still represented more explicitly, and the effect of frequency-aware local compensation is therefore easier to retain at this stage. Since fog-induced degradation primarily suppresses high-frequency components, such as edges and fine structural details, FAGC can produce a more evident compensation effect in P3. In P4 and P5, these local structural cues become weaker and more abstract as the spatial resolution decreases, and the gain brought by FAGC correspondingly becomes smaller.

A similar tendency can also be seen at the category level. The categories insulator_nok and spacer_nok are taken as representative examples because both are associated with slender structures or small abnormal regions. When FAGC is inserted only into P3, the AP of insulator_nok reaches 91.3%, and the AP of spacer_nok reaches 69.8%. These are the highest values among the compared insertion settings. Extending FAGC to all three levels leads to lower results, indicating that direct multi-level insertion does not provide additional benefit in the present setting.

These results support placing FAGC in the high-resolution P3 branch, where fog-weakened local cues are still retained more clearly. In deeper branches, the effect of FAGC becomes less evident.

4.2. Ablation Study on the Gating Designs in FAGC

Three gating designs are compared in Table 3. One uses only wavelet-statistics maps, one uses only the low-frequency-dominant appearance proxy, and the third combines both cues.

The combined design yields the highest overall mAP at 81.0%. The wavelet-only and appearance-only settings reach 79.5% and 79.7%, respectively. This pattern indicates that the gate is more effective when wavelet-statistics maps and the low-frequency-dominant appearance proxy are used together.

Differences become clearer at the category level. With wavelet statistics alone, spacer_nok reaches 74.5%, the highest value among the three designs. The corresponding results for bird nest and stockbridge_ok, however, are 79.1% and 75.6%. When only the appearance proxy is used, the distribution across categories becomes more even. In this case, insulator_nok, bird nest, and stockbridge_ok rise to 90.7%, 83.1%, and 77.8%, respectively, whereas spacer_nok falls to 60.9%, well below the wavelet-only setting.

These category-level shifts point to different roles of the two cues. The wavelet-only design shows a stronger response on spacer_nok, a category that depends more heavily on fine structures and local geometric abnormalities. The appearance-only design gives more balanced results on insulator_nok, bird nest, and stockbridge_ok, which suggests that the appearance proxy contributes more to local context discrimination and suppression of irrelevant background responses.

The joint design retains the advantages of both cues to a greater extent. In this design, insulator_nok, bird nest, and stockbridge_ok reach 91.3%, 83.3%, and 79.0%, respectively, which are the most favorable results among the three designs. Although the result for spacer_nok does not exceed that of the wavelet-only design, it remains high at 69.8%. These results indicate that using wavelet-statistics maps together with the low-frequency-dominant appearance proxy provides a more suitable gating design for FAGC. The former highlights regions with weakened structural responses. The latter helps suppress ineffective compensation in smooth foggy backgrounds.

4.3. Effect of Convolutional Branch Design in SADA

Figure 12 and Figure 13 illustrate how detection performance varies with the convolutional branch design within the SADA module. In this analysis, the CGPR module and the dilated-context branch remain unchanged, while variations are introduced only in the first two structural branches. The compared settings are Variant I (an isotropic design), Variant II (a large-kernel isotropic design), Variant III (a strong-directional design), and the adopted SADA setting, which employs asymmetric

1 \times 3

and

3 \times 1

convolutional branches. The three variants are designed as follows:

Variant I: The two structural branches adopt the same configuration, namely successive

3 \times 3

and

3 \times 3

convolutional modules followed by a

3 \times 3

convolutional module.

Variant II: The two structural branches adopt the same configuration, namely successive

5 \times 5

and

1 \times 1

convolutional modules followed by a

3 \times 3

convolutional module.

Variant III: The two structural branches adopt different directional configurations. The first branch uses successive

1 \times 5

and

5 \times 1

convolutional modules followed by a

3 \times 3

convolutional module, whereas the second branch uses successive

5 \times 1

and

1 \times 5

convolutional modules followed by a

3 \times 3

convolutional module.

As shown in Figure 12, the adopted SADA setting achieves the highest mAP of 81.0%, whereas the mAP values for Variant I, Variant II, and Variant III are 78.0%, 80.2%, and 79.2%, respectively. These results suggest that the effectiveness of SADA may not be fully explained by receptive field enlargement alone, but is more closely related to the joint effect of directional sensitivity and appropriate receptive field construction. Specifically, the larger-kernel isotropic design and the stronger directional design provide larger spatial or directional receptive fields than the adopted 1 × 3/3 × 1 branches. However, their performance does not further exceed that of the adopted design, indicating that increasing the convolutional range alone may be insufficient for fog-degraded slender-component detection. Conventional isotropic convolutions provide limited structural discrimination for slender objects, while larger kernels may introduce additional neighboring responses and weaken fine structural details. Meanwhile, an overly strong directional constraint may reduce the flexibility needed to describe local connection regions and subtle defects. Therefore, the adopted 1 × 3/3 × 1 branches provide a more suitable balance between direction-sensitive local modeling and receptive field construction, enabling SADA to better preserve structural continuity under foggy conditions.
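The receptive fields of the compared structural branches can be computed directly from the kernel sizes listed above. The sketch assumes stride-1 convolutions without dilation in the structural branches, which the text does not state explicitly; the function and dictionary names are hypothetical.

```python
def receptive_field(kernels):
    """Receptive field (height, width) of stacked stride-1 convolutions.

    Each (kh, kw) layer enlarges the field by (kh - 1, kw - 1).
    """
    rf_h = rf_w = 1
    for kh, kw in kernels:
        rf_h += kh - 1
        rf_w += kw - 1
    return rf_h, rf_w


def kernel_weights(kernels):
    """Spatial weights per input/output channel pair, summed over layers."""
    return sum(kh * kw for kh, kw in kernels)


branches = {
    "SADA branch 1": [(1, 3), (3, 1), (3, 3)],
    "SADA branch 2": [(3, 1), (1, 3), (3, 3)],
    "Variant I": [(3, 3), (3, 3), (3, 3)],
    "Variant II": [(5, 5), (1, 1), (3, 3)],
    "Variant III branch 1": [(1, 5), (5, 1), (3, 3)],
    "Variant III branch 2": [(5, 1), (1, 5), (3, 3)],
}

for name, kernels in branches.items():
    print(name, receptive_field(kernels), kernel_weights(kernels))
```

Under these assumptions, the adopted 1 × 3/3 × 1 branches cover a 5 × 5 region, whereas all three variants cover 7 × 7, so the variants do not lose accuracy for lack of spatial coverage.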

The same difference can also be seen in the category-level results in Figure 13. Under the adopted SADA setting, insulator_ok, insulator_nok, and spacer_nok reach 95.8%, 91.1%, and 69.1%, respectively, which are the highest values among the evaluated settings. Variant II, which uses larger isotropic kernels, still gives relatively high results on insulator categories. Its spacer_nok performance, however, drops to 65.9%. Variant III obtains 95.0% on insulator_ok, but its results on insulator_nok and spacer_nok are lower than those of the adopted SADA setting. These category-level changes indicate that an excessively strong directional bias is less favorable for preserving local defect sensitivity together with the continuity of slender structures.

These findings further elucidate the role of branch design within SADA. Its contribution is related not only to receptive-field expansion, but also to the way directional sensitivity is introduced into structural representation. The adopted asymmetric

1 \times 3

and

3 \times 1

convolutional design achieves a more suitable balance between structural continuity and local defect sensitivity, which is particularly important for slender structures and defect-sensitive categories in foggy scenes.

4.4. Impact of Proposed Modules on Different Detector Architectures

Table 4 compares the effects of FAGC and SPEP across the evaluated detector architectures. From single-module insertion to joint use of the two modules, overall mAP improves consistently across the evaluated YOLO-based one-stage detectors. The joint configuration reaches 60.9% for YOLOv5, 71.1% for YOLOv9, 57.1% for YOLOv10, and 82.1% for YOLOv7. To further examine the transferability of the proposed modules to a different detection paradigm, Faster R-CNN is additionally introduced as a representative two-stage detector. The results show that direct transfer to the two-stage framework leads to less consistent performance gains. Specifically, FAGC slightly improves the mAP of Faster R-CNN from 37.88% to 38.93%, suggesting that fog-aware local compensation can still provide some benefit for weakened local evidence. However, SPEP decreases the mAP to 36.52%.

Category-level results further separate the roles of the two modules. FAGC brings larger gains on baliser_nok and insulator_nok, where recognition depends more strongly on local defects and fine structural evidence. This effect appears in all four detector groups and is more consistent with the role of FAGC in frequency-aware compensation of fog-weakened local cues. SPEP contributes more on bird nest, spacer_ok, and spacer_nok, categories that rely more on structural continuity and direction-aware deep aggregation. The difference appears more clearly in YOLOv5 and YOLOv7.

The joint configuration produces the largest overall gains among the evaluated YOLO-based one-stage detectors. Across the evaluated architectures, it yields the best overall mAP and keeps competitive performance on defect-sensitive and structure-sensitive categories. FAGC mainly compensates local discriminative cues under foggy conditions, whereas SPEP better preserves structural continuity during deep feature aggregation. Used together, the two modules show complementary effects and address fog-induced degradation more broadly across YOLO-based one-stage detector architectures. In contrast, the results on Faster R-CNN indicate that FAGC and SPEP are not directly transferable to all detection paradigms in a plug-and-play manner. One possible reason is that YOLO-based one-stage detectors directly use multi-scale neck features for classification and localization, whereas two-stage detectors rely on region proposal generation and RoI-level feature extraction. When directly integrated into a proposal-based pipeline, the proposed modules may alter the features used for proposal generation and RoI-level refinement. Therefore, architecture-specific adaptation may be required when extending FAGC and SPEP to two-stage detectors.

4.5. Evaluation Across Fog Densities and Training Protocols

To analyze the influence of fog density and training protocol on detection performance, two training protocols are compared. The first follows the basic domain generalization protocol of this study, where only clear-weather images are used for training. The second adopts supervised mixed-fog training, where the original clear-weather training images are converted into synthetic-fog images at fog densities of 0.05, 0.07, and 0.09. Each density accounts for one-third of the generated training samples. The proposed model is then evaluated under clear-weather and fog density-specific test conditions.

As shown in Figure 14, the two training settings exhibit different performance tendencies across test conditions. Under clear-weather training, FogTLD-YOLO achieves 90.9% mAP on the clear-weather test set. As the fog density increases from 0.05 to 0.07 and 0.09, the mAP decreases from 88.7% to 82.1% and 76.1%, respectively. The result on the mixed-fog test set is 82.1%. This trend indicates that the original training protocol preserves favorable performance under clear-weather conditions, but becomes increasingly affected by stronger fog degradation. Under supervised mixed-fog training, the performance trend is different. The mAP on the clear-weather test set decreases to 79.3%, whereas the mAP values on fog densities of 0.05, 0.07, and 0.09 reach 86.9%, 89.6%, and 90.3%, respectively. The mAP on the mixed-fog test set is 90.1%. This result shows that supervised mixed-fog training improves adaptation to synthetic foggy inputs, especially under higher fog density test conditions, but reduces compatibility with clear-weather images.

To further analyze the feature-distribution discrepancy between clear-weather and foggy domains, we calculate the maximum mean discrepancy (MMD) [60] under different fog density conditions. Specifically, features from the high-resolution P3 branch before the detection head are extracted to calculate MMD between clear-weather-domain samples and foggy-domain samples. A smaller MMD value indicates a smaller feature distribution discrepancy. As shown in Table 5, FogTLD-YOLO consistently yields lower MMD values than YOLOv7 under all evaluated fog density conditions, reducing the MMD by at least 0.077 in absolute terms and by at least 38.1% in relative terms. This result indicates that FogTLD-YOLO reduces the feature-level gap between the clear-weather source domain and foggy target domains, thereby improving the robustness of clear-weather-trained features under different fog density test conditions.
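A minimal sketch of an MMD estimate between two sets of feature vectors is given below. The kernel, its bandwidth, and the way P3 feature maps are reduced to vectors are not specified in this section; the Gaussian (RBF) kernel, the gamma value, and the toy two-dimensional features are assumptions made only for illustration.

```python
import math


def rbf_kernel(x, y, gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(xs, ys, gamma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


# Toy feature vectors standing in for pooled P3 features (hypothetical).
clear = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
foggy_small_shift = [(0.1, 0.0), (1.1, 0.0), (0.1, 1.0)]
foggy_large_shift = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]

print(mmd_squared(clear, foggy_small_shift))
print(mmd_squared(clear, foggy_large_shift))
```

In this toy setting, the estimate is zero for identical samples and grows as the foggy features drift away from the clear-weather features, which is the sense in which a smaller value indicates a smaller feature-distribution discrepancy.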

Figure 15 further shows the feature activation heatmaps of the mixed-fog-trained FogTLD-YOLO under different fog density inputs. In each row, the four images from left to right correspond to clear weather and to fog densities of 0.05, 0.07, and 0.09, respectively. As the fog density changes, the main high-response regions remain located around the annotated transmission line objects, while diffuse responses in surrounding vegetation, tower structures, and cluttered backgrounds are not noticeably amplified. This observation suggests that the model maintains object-oriented spatial responses under different synthetic fog density conditions.

Overall, the MMD and feature activation heatmap analyses provide complementary evidence for the behavior of FogTLD-YOLO under different fog density conditions. MMD quantitatively measures the feature distribution discrepancy between clear-weather and foggy domains, while the feature activation heatmaps qualitatively show the spatial response behavior under different fog density inputs. Together with the performance results, these analyses show that the test performance of FogTLD-YOLO is affected by both fog density and training protocol. Clear-weather training better preserves the original-domain performance and still provides clear-to-fog robustness, whereas supervised mixed-fog training improves performance on synthetic foggy test subsets but sacrifices clear-weather compatibility.

5. Conclusions

This study proposes FogTLD-YOLO, an end-to-end framework for transmission line defect detection in UAV-based inspection under foggy conditions. By jointly modeling fog-induced degradation and structure-preserving feature aggregation, FogTLD-YOLO enhances weakened local evidence while maintaining the continuity and positional sensitivity of slender structures. In this way, the proposed model provides a targeted detection framework for UAV-based transmission line inspection when only clear-weather training data are available.

Extensive experiments under mixed fog density conditions demonstrate the effectiveness of FogTLD-YOLO. On the foggy target domain constructed with density levels of 0.05, 0.07, and 0.09, FogTLD-YOLO achieves 78.7% precision, 79.7% recall, and 82.1% mAP. Specifically, the proposed model achieves 92.0% AP on insulator_nok and 72.6% AP on spacer_nok, showing clear advantages on defect-sensitive and slender structure categories. Moreover, FogTLD-YOLO outperforms YOLOv7, the best-performing generic detector, by 2.8 percentage points in mAP (82.1% versus 79.3%). Compared with representative task-oriented detectors, the gain is larger still: the best of them, YOLOv8-eRFD-AP, reaches 58.1% mAP, 24.0 percentage points below FogTLD-YOLO. These results further indicate the robustness and practical value of the proposed design for foggy transmission line inspection.

Overall, the results indicate that explicit enhancement of degradation-sensitive local cues together with structure-aware aggregation is effective for robust defect detection in foggy inspection scenarios. Although the synthetic fog protocol enables controlled evaluation of clear-to-fog robustness under different fog densities, it cannot fully reproduce real foggy UAV inspection conditions. Real scenarios may involve spatially nonuniform fog, illumination variation, depth-dependent visibility changes, wind-induced motion, and sensor noise. Therefore, further evaluation on real foggy transmission line inspection data remains an important direction for future work.

Author Contributions

Conceptualization, J.L. and Y.L.; Data curation, J.L. and J.S.; Formal analysis, Y.L.; Funding acquisition, Y.L.; Investigation, J.L.; Methodology, J.L. and Y.L.; Project administration, Y.L., B.M., and S.L.; Resources, Y.L.; Software, J.L.; Supervision, Y.L.; Validation, J.S., B.M., and S.L.; Visualization, J.S. and B.M.; Writing—original draft, J.L.; Writing—review and editing, J.L. and B.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the National Natural Science Foundation of China under Grant 62401195 and in part by the Hebei Key Talent Program (Platform for Returnees from Overseas) under Grant C2024002.

Data Availability Statement

The PTL-AI Furnas dataset used in this study is publicly available as detailed in Section 3.1.

Acknowledgments

The authors would like to thank the creators of the PTL-AI Furnas dataset for making this dataset publicly available and for providing valuable image resources for research on fault detection in power transmission lines.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Hao, K.; Chen, G.; Zhao, L.; Li, Z.; Liu, Y.; Wang, C. An Insulator Defect Detection Model in Aerial Images Based on Multiscale Feature Pyramid Network. IEEE Trans. Instrum. Meas. 2022, 71, 3522412.
2. Zheng, J.; Wu, H.; Zhang, H.; Wang, Z.; Xu, W. Insulator-Defect Detection Algorithm Based on Improved YOLOv7. Sensors 2022, 22, 8801.
3. Shuang, F.; Chen, X.; Li, Y.; Wang, Y.; Miao, N.; Zhou, Z. PLE: Power Line Extraction Algorithm for UAV-Based Power Inspection. IEEE Sens. J. 2022, 22, 19941–19952.
4. Mei, H.; Jiang, H.; Yin, F.; Wang, L.; Farzaneh, M. Terahertz Imaging Method for Composite Insulator Defects Based on Edge Detection Algorithm. IEEE Trans. Instrum. Meas. 2021, 70, 4504310.
5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
6. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
7. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
8. Jocher, G. Ultralytics YOLOv5. Computer Software. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 24 March 2026).
9. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
10. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. Computer Software. 2023. Available online: https://docs.ultralytics.com/models/yolov8/ (accessed on 24 March 2026).
11. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In Computer Vision–ECCV 2024; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Springer Nature: Cham, Switzerland, 2025; pp. 1–21.
12. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011.
13. Jocher, G.; Qiu, J. Ultralytics YOLO11. Computer Software, Version 11.0.0. 2024. Available online: https://docs.ultralytics.com/models/yolo11/ (accessed on 24 March 2026).
14. Lv, X.-L.; Chiang, H.-D. Visual clustering network-based intelligent power lines inspection system. Eng. Appl. Artif. Intell. 2024, 129, 107572.
15. He, M.; Qin, L.; Deng, X.; Liu, K. MFI-YOLO: Multi-Fault Insulator Detection Based on an Improved YOLOv8. IEEE Trans. Power Deliv. 2024, 39, 168–179.
16. Li, D.; Lu, Y.; Gao, Q.; Li, X.; Yu, X.; Song, Y. LiteYOLO-ID: A Lightweight Object Detection Network for Insulator Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 5023812.
17. Cao, Z.; Chen, K.; Chen, J.; Chen, Z.; Zhang, M. CACS-YOLO: A Lightweight Model for Insulator Defect Detection Based on Improved YOLOv8m. IEEE Trans. Instrum. Meas. 2024, 73, 3530710.
18. Zhao, D.; Asano, Y.; Gu, L.; Sato, I.; Zhou, H. City-Scale Distance Sensing via Bispectral Light Extinction in Bad Weather. Remote Sens. 2020, 12, 1401.
19. Fang, W.; Zhang, G.; Zheng, Y.; Chen, Y. Multi-Task Learning for UAV Aerial Object Detection in Foggy Weather Condition. Remote Sens. 2023, 15, 4617.
20. Dave, C.; Patel, H.; Kumar, A. Unsupervised single image dehazing—A contour approach. J. Vis. Commun. Image Represent. 2024, 100, 104119.
21. Yan, W.; Cui, L. Image Dehaze Algorithm Based on Improved Atmospheric Scattering Models. IEEE Access 2024, 12, 98971–98976.
22. Zhao, D.; Tang, L.; Arun, P.V.; Asano, Y.; Zhang, L.; Xiong, Y.; Tao, X.; Hu, J. City-Scale Distance Estimation via Near-Infrared Trispectral Light Extinction in Bad Weather. Infrared Phys. Technol. 2023, 128, 104507.
23. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. DehazeNet: An End-to-End System for Single Image Haze Removal. IEEE Trans. Image Process. 2016, 25, 5187–5198.
24. Wang, Y.; Yan, X.; Wang, F.L.; Xie, H.; Yang, W.; Zhang, X.-P.; Qin, J.; Wei, M. UCL-Dehaze: Toward Real-World Image Dehazing via Unsupervised Contrastive Learning. IEEE Trans. Image Process. 2024, 33, 1361–1374.
25. Zhang, J.; Tao, D. FAMED-Net: A Fast and Accurate Multi-Scale End-to-End Dehazing Network. IEEE Trans. Image Process. 2020, 29, 72–84.
26. Praveen, N.; Samridhi; Chahande, M. Comparative Study of Various Dehazing Algorithms. In Intelligent Systems; Sheth, A., Sinhal, A., Shrivastava, A., Pandey, A.K., Eds.; Springer: Singapore, 2021; pp. 231–243.
27. Yi, W.; Dong, L.; Liu, M.; Hui, M.; Kong, L.; Zhao, Y. Priors-assisted dehazing network with attention supervision and detail preservation. Neural Netw. 2024, 173, 106165.
28. Zhao, Y.; Ju, Z.; Sun, T.; Dong, F.; Li, J.; Yang, R.; Fu, Q.; Lian, C.; Shan, P. TGC-YOLOv5: An Enhanced YOLOv5 Drone Detection Model Based on Transformer, GAM & CA Attention Mechanism. Drones 2023, 7, 446.
29. Liu, H.; Cheng, X.; Wang, C. MHP-DETR: A Vehicle Detection System Based on Multimodal Attention and Hypergraph Fusion for Complex Weather Conditions. IEEE Sens. J. 2025, 25, 39014–39026.
30. Peng, B.; Ma, C.; Chen, Y.; Zhu, M.; Liao, N. MTW-DETR: A multi-task collaborative optimization model for adverse weather object detection. Pattern Recognit. Lett. 2026, 199, 7–12.
31. Zhao, D.; Xu, X.; You, M.; Arun, P.V.; Zhao, Z.; Ren, J.; Wu, L.; Zhou, H. Local Sub-Block Contrast and Spatial–Spectral Gradient Feature Fusion for Hyperspectral Anomaly Detection. Remote Sens. 2025, 17, 695.
32. Liu, S.; Li, Z.; Wang, G.; Qiu, X.; Liu, T.; Cao, J.; Zhang, D. Spectral–Spatial Feature Fusion for Hyperspectral Anomaly Detection. Sensors 2024, 24, 1652.
33. Chen, Y.; Yuan, Q.; Tang, Y.; Xiao, Y.; He, J.; Liu, Z. SENSE: Hyperspectral Video Object Tracker via Fusing Material and Motion Cues. Inf. Fusion 2024, 109, 102395.
34. Zhao, D.; Zhang, H.; Arun, P.V.; Jiao, C.; Zhou, H.; Xiang, P.; Cheng, K. SiamSTU: Hyperspectral Video Tracker Based on Spectral Spatial Angle Mapping Enhancement and State Aware Template Update. Infrared Phys. Technol. 2025, 150, 105919.
35. Zhao, W.; Xie, S.; Zhao, F.; He, Y.; Lu, H. MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding from Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 13955–13965.
36. Zhao, W.; Zhao, Z.; Xu, M.; Ding, Y.; Gong, J. Differential Multimodal Fusion Algorithm for Remote Sensing Object Detection through Multi-Branch Feature Extraction. Expert Syst. Appl. 2025, 265, 125826.
37. Zhang, Y.; Chen, J.; Wang, J.; Shi, D.; Han, S.; Deng, L. C²DFF-Net for Object Detection in Multimodal Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1–16.
38. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
39. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
40. Sunkara, R.; Luo, T. No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects. In Machine Learning and Knowledge Discovery in Databases; Amini, M.-R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G., Eds.; Springer Nature: Cham, Switzerland, 2023; pp. 443–459.
41. Zhang, H.; An, L.; Chu, V.W.; Stow, D.A.; Liu, X.; Ding, Q. Learning Adjustable Reduced Downsampling Network for Small Object Detection in Urban Environments. Remote Sens. 2021, 13, 3608.
42. Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial Discriminative Domain Adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7167–7176.
43. Li, C.-L.; Chang, W.-C.; Cheng, Y.; Yang, Y.; Poczos, B. MMD GAN: Towards Deeper Understanding of Moment Matching Network. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 2203–2213. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/file/dfd7468ac613286cdbb40872c8ef3b06-Paper.pdf (accessed on 24 March 2026).
44. Lee, D.-H. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. In Proceedings of the ICML 2013 Workshop on Challenges in Representation Learning, Atlanta, GA, USA, 21 June 2013.
45. Li, J.; Zhou, H.; Lv, G.; Chen, J. A2MADA-YOLO: Attention Alignment Multiscale Adversarial Domain Adaptation YOLO for Insulator Defect Detection in Generalized Foggy Scenario. IEEE Trans. Instrum. Meas. 2025, 74, 5011419.
46. Liu, Q.; Liu, Y.; Yan, Y.; Jiang, Q.; Jiang, X. Addressing Domain Shift in Insulator Defect Data: A Generalization Framework for Cross-Domain Detection of Broken and Self-Blast Insulator Defect. IEEE Trans. Instrum. Meas. 2025, 74, 5037614.
47. Xu, X.; Yang, J.; Chong, W.; Shi, W.; Sun, S.; Xing, J.; Liu, J. Boosting Single-Domain Generalized Object Detection via Vision–Language Knowledge Interaction. In Proceedings of the 33rd ACM International Conference on Multimedia (MM ’25), Dublin, Ireland, 27–31 October 2025; pp. 131–140.
48. Vidit, V.; Engilberge, M.; Salzmann, M. CLIP the Gap: A Single Domain Generalization Approach for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 3219–3229.
49. Li, B.; Li, J.; Liu, X.; Xu, R.; Tu, Z.; Guo, J.; Zou, Q.; Li, X.; Yu, H. V2X-DGW: Domain Generalization for Multi-Agent Perception Under Adverse Weather Conditions. In Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 19–23 May 2025; pp. 974–980.
50. Benelmostafa, B.-E.; Aitelhaj, R.; Medromi, H. YOLOv8-eRFD-AP: A Novel Domain Generalization Model for UAV-Based Insulator Inspection Under Adverse Weather Conditions. IEEE Access 2025, 13, 135336–135358.
51. Yu, H.; Zheng, N.; Zhou, M.; Huang, J.; Xiao, Z.; Zhao, F. Frequency and Spatial Dual Guidance for Image Dehazing. In Computer Vision–ECCV 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 181–198.
52. Dong, W.; Wang, C.; Sun, H.; Teng, Y.; Liu, H.; Zhang, Y.; Zhang, K.; Li, X.; Xu, X. End-to-End Detail-Enhanced Dehazing Network for Remote Sensing Images. Remote Sens. 2024, 16, 225.
53. Wang, M.; Liao, L.; Huang, D.; Fan, Z.; Zhuang, J.; Zhang, W. Frequency and content dual stream network for image dehazing. Image Vis. Comput. 2023, 139, 104820.
54. Mallat, S.G. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693.
55. de Oliveira, F.S.; de Carvalho, M.; Campos, P.H.T.; da Silva Soares, A.; Júnior, A.C.; da Silva Quirino, A.C.R. PTL-AI Furnas Dataset: A Public Dataset for Fault Detection in Power Transmission Lines Using Aerial Images. In Proceedings of the 35th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Natal, Brazil, 24–27 October 2022; pp. 7–12.
56. Zhang, Z.-D.; Zhang, B.; Lan, Z.-C.; Liu, H.-C.; Li, D.-Y.; Pei, L.; Yu, W.-X. FINet: An Insulator Dataset and Detection Benchmark Based on Synthetic Fog and Improved YOLOv5. IEEE Trans. Instrum. Meas. 2022, 71, 6006508.
57. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
58. Petsiuk, V.; Jain, R.; Manjunatha, V.; Morariu, V.I.; Mehra, A.; Ordonez, V.; Saenko, K. Black-Box Explanation of Object Detectors via Saliency Maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 11438–11447.
59. Zhao, C.; Hsiao, J.H.; Chan, A.B. Gradient-Based Instance-Specific Visual Explanations for Object Specification and Object Discrimination. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5967–5985.
60. Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A Kernel Two-Sample Test. J. Mach. Learn. Res. 2012, 13, 723–773.

Figure 1. Clear-to-fog setting considered in this study.

Figure 2. Overall architecture of the proposed FogTLD-YOLO.

Figure 3. Architecture of the proposed FAGC module.

Figure 4. Workflow of the proposed FAGC module.

Figure 5. Architecture of the proposed SPEP module, consisting of SADA and CGPR.

Figure 6. Representative annotated samples from the PTL-AI Furnas dataset used in this study: (a) insulator; (b) stockbridge; (c) spacer; (d) baliser.

Figure 7. Representative clear-weather images and their fog-synthesized counterparts under different fog densities. Each subfigure shows a clear-weather image on the left and its corresponding fog-synthesized image on the right. (a) Fog density 0.05; (b) fog density 0.07; (c) fog density 0.09.

Figure 8. Performance–complexity trade-off among different detectors.

Figure 9. Qualitative comparison of detection results in the foggy target domain: (a) GT; (b) FogTLD-YOLO; (c) YOLOv7; (d) YOLOv9; (e) CACS-YOLO; and (f) YOLOv8-eRFD-AP.

Figure 10. Feature activation heatmap comparison of different detectors in the foggy target domain: (a) GT; (b) FogTLD-YOLO; (c) YOLOv7; (d) YOLOv9; (e) CACS-YOLO; and (f) YOLOv8-eRFD-AP. Red boxes denote ground-truth object regions.

Figure 11. Effect of the insertion position of FAGC on detection performance.

Figure 12. Effect of convolutional branch design in the SADA module of SPEP on overall mAP.

Figure 13. Effect of convolutional branch design in the SADA module of SPEP on category-wise AP.

Figure 14. mAP comparison of FogTLD-YOLO under different fog density test conditions and training protocols.

Figure 15. Feature activation heatmaps of the mixed-fog-trained FogTLD-YOLO under clear-weather and fog density conditions of 0.05, 0.07, and 0.09. (a) multi-object tower scene; (b) complex-background scene; (c) slender-component scene. Red boxes denote ground-truth object regions.

Table 1. Comparison of the proposed FogTLD-YOLO with mainstream generic detectors and task-oriented transmission line inspection detectors in the foggy target domain. Bold values denote the best results.

Model	P (%)	R (%)	mAP (%)	mAP50-95 (%)	insulator_nok (%)	spacer_nok (%)	Params (M)	FLOPs (G)	FPS
YOLOv5s [8]	80.0	53.1	56.9	35.9	66.7	49.5	7.08	8.18	222.22
YOLOv5l [8]	78.0	45.2	48.6	30.9	58.3	24.5	46.65	114.10	161.29
YOLOv7 [9]	78.6	79.2	79.3	53.6	88.7	49.6	36.53	103.30	140.85
YOLOv8s [10]	72.0	50.8	59.5	40.6	66.3	28.7	11.14	14.33	344.83
YOLOv8l [10]	68.8	45.4	54.3	37.1	61.6	43.5	43.61	164.90	130.60
YOLOv9s [11]	73.4	67.0	67.7	46.7	83.3	38.6	9.60	38.80	112.36
YOLOv10n [12]	63.3	50.3	52.5	35.0	59.1	42.5	2.28	6.50	294.12
YOLOv11n [13]	64.8	48.9	52.3	34.6	62.5	25.6	2.59	6.30	277.78
YOLOv11l [13]	78.5	53.0	64.1	44.7	72.5	43.0	25.29	86.60	121.95
RetinaNet [7]	47.6	67.5	47.6	30.2	52.2	57.1	32.41	129.97	39.63
Mask R-CNN [6]	27.5	67.8	45.3	25.7	53.1	5.45	44.02	134.54	42.63
Faster R-CNN [5]	74.6	48.0	37.9	19.8	24.9	8.33	41.35	181.71	96.26
LiteYOLO-ID [16]	59.5	33.0	38.9	22.6	44.7	25.1	4.65	10.60	303.03
CACS-YOLO [17]	66.9	39.9	44.3	29.7	46.8	30.8	27.40	66.40	166.67
YOLOv8-eRFD-AP [50]	75.3	53.3	58.1	37.7	51.4	48.4	18.40	51.10	208.33
FogTLD-YOLO (ours)	78.7	79.7	82.1	55.1	92.0	72.6	34.64	102.70	131.58

Table 2. Ablation results of different YOLOv7-based variants in the foggy target domain. Bold values denote the best results.

Evaluation Item	Variant I	Variant II	Variant III	FogTLD-YOLO
FAGC	–	✓	–	✓
SPEP	–	–	✓	✓
P (%)	78.6	71.0	80.9	78.7
R (%)	79.2	81.1	76.4	79.7
mAP (%)	79.3	81.0	81.0	82.1
baliser_ok (%)	89.9	89.5	88.4	88.9
baliser_aok (%)	83.0	79.9	80.3	83.2
baliser_nok (%)	77.4	71.9	71.6	76.1
insulator_ok (%)	95.5	95.8	95.8	95.6
insulator_nok (%)	88.7	91.3	91.1	92.0
bird nest (%)	82.7	83.3	84.4	82.4
stockbridge_ok (%)	78.0	79.0	78.2	77.5
spacer_ok (%)	69.1	69.1	70.2	70.3
spacer_nok (%)	49.6	69.8	69.1	72.6
Params (M)	36.53	36.60	34.57	34.64
FLOPs (G)	103.30	104.20	101.80	102.70

Note: ✓ indicates that the corresponding module is used, while – indicates that it is not used.

Table 3. Performance comparison of different FAGC gating designs. Bold values denote the best results.

Wavelet	Appearance	mAP (%)	insulator_nok (%)	bird nest (%)	stockbridge_ok (%)	spacer_nok (%)
✓	–	79.5	89.4	79.1	75.6	74.5
–	✓	79.7	90.7	83.1	77.8	60.9
✓	✓	81.0	91.3	83.3	79.0	69.8

Note: ✓ indicates that the corresponding gating design is used, while – indicates that it is not used.

Table 4. Impact of FAGC and SPEP on different detector architectures in the foggy target domain. Bold values denote the best results within each detector group.

Detector	FAGC	SPEP	mAP (%)	baliser_ok (%)	baliser_aok (%)	baliser_nok (%)	insulator_ok (%)	insulator_nok (%)	bird nest (%)	stockbridge_ok (%)	spacer_ok (%)	spacer_nok (%)
YOLOv5 [8]			56.9	72.3	64.5	51.2	79.2	66.7	24.8	52.9	50.5	49.5
	✓		58.7	77.8	61.3	55.5	81.7	70.9	22.8	58.0	50.7	49.5
		✓	58.5	71.8	61.1	51.3	80.2	66.3	31.0	53.4	55.1	56.2
	✓	✓	60.9	77.5	67.0	54.3	80.2	67.7	25.4	53.7	47.8	74.5
YOLOv9 [11]			67.7	83.4	70.5	62.0	91.7	83.3	60.9	63.8	54.8	38.6
	✓		68.6	81.6	71.2	66.2	91.0	79.5	58.3	63.7	56.1	50.2
		✓	69.6	83.2	75.1	62.3	90.7	80.6	64.7	63.4	51.2	55.5
	✓	✓	71.1	84.4	71.6	67.9	91.6	83.9	60.4	63.6	51.8	65.0
YOLOv10 [12]			52.5	72.4	63.3	47.0	77.9	59.1	22.7	49.9	37.9	42.5
	✓		56.3	79.8	58.6	64.5	83.8	69.2	30.6	51.8	42.0	26.1
		✓	54.2	78.4	62.2	54.6	81.3	59.5	30.6	51.8	38.1	31.1
	✓	✓	57.1	71.1	60.9	58.5	84.7	67.3	25.1	53.6	39.9	52.5
YOLOv7 [9]			79.3	89.9	83.0	77.4	95.5	88.7	82.7	78.0	69.1	49.6
	✓		81.0	89.5	79.9	71.9	95.8	91.3	83.3	79.0	69.1	69.8
		✓	81.0	88.4	80.3	71.6	95.8	91.1	84.4	78.2	70.2	69.1
	✓	✓	82.1	88.9	83.2	76.1	95.6	92.0	82.4	77.5	70.3	72.6
Faster R-CNN [5]			37.9	61.3	66.5	60.9	70.0	24.9	5.56	22.2	21.3	8.33
	✓		38.9	64.8	54.2	56.4	65.7	31.8	5.98	24.5	22.0	25.0
		✓	36.5	57.2	53.9	55.8	66.3	37.1	3.09	22.7	20.0	12.5
	✓	✓	35.9	53.7	41.7	51.2	64.6	29.0	5.67	27.8	24.1	25.0

Note: ✓ indicates that the corresponding module is used, while a blank cell indicates that it is not used.
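The cross-architecture effect of the two modules can be summarized as the mAP change from each baseline to its FAGC + SPEP variant. The sketch below uses only the mAP values from Table 4; the dictionary layout and variable names are illustrative.

```python
# (baseline mAP, mAP with FAGC + SPEP) in %, copied from Table 4.
table4_map = {
    "YOLOv5": (56.9, 60.9),
    "YOLOv9": (67.7, 71.1),
    "YOLOv10": (52.5, 57.1),
    "YOLOv7": (79.3, 82.1),
    "Faster R-CNN": (37.9, 35.9),
}

# Change in percentage points when both modules are inserted.
gains = {name: round(full - base, 1) for name, (base, full) in table4_map.items()}

for name, gain in gains.items():
    print(f"{name:>12}: {gain:+.1f}")
```

Under these values, the four YOLO-family detectors gain between 2.8 and 4.6 points, whereas the Faster R-CNN variant with both modules is 2.0 points below its baseline.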

Table 5. Feature distribution discrepancy in terms of MMD (↓). Bold values denote the best results.

Model	Fog Density 0.05	Fog Density 0.07	Fog Density 0.09	Mixed-Fog Setting
YOLOv7 [9] (baseline)	0.116	0.190	0.265	0.173
FogTLD-YOLO (ours)	0.039	0.084	0.164	0.080

Note: ↓ indicates that lower values are better.
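The absolute and relative MMD reductions implied by Table 5 can be recomputed directly from its entries. The following sketch uses only the tabulated values; the variable names are illustrative.

```python
# (YOLOv7 baseline MMD, FogTLD-YOLO MMD), copied from Table 5.
table5_mmd = {
    "fog_0.05": (0.116, 0.039),
    "fog_0.07": (0.190, 0.084),
    "fog_0.09": (0.265, 0.164),
    "mixed_fog": (0.173, 0.080),
}

abs_reduction = {k: round(base - ours, 3) for k, (base, ours) in table5_mmd.items()}
rel_reduction = {k: round(100.0 * (base - ours) / base, 1)
                 for k, (base, ours) in table5_mmd.items()}

for k in table5_mmd:
    print(f"{k:>9}: -{abs_reduction[k]:.3f} ({rel_reduction[k]:.1f}%)")

min_abs = min(abs_reduction.values())
min_rel = min(rel_reduction.values())
```

The smallest absolute reduction (0.077) occurs at fog density 0.05 and the smallest relative reduction (38.1%) at fog density 0.09, so the two lower bounds come from different test conditions.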

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

