Article

Improving Object Detection in Generalized Foggy Conditions of Insulator Defect Detection Based on Drone Images

by
Abdulrahman Kariri
and
Khaled Elleithy
*
Department of Computer Science and Engineering, University of Bridgeport, Bridgeport, CT 06604, USA
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(5), 979; https://doi.org/10.3390/electronics15050979
Submission received: 20 January 2026 / Revised: 22 February 2026 / Accepted: 26 February 2026 / Published: 27 February 2026
(This article belongs to the Special Issue Feature Papers in Networks: 2025–2026 Edition)

Abstract

Routine evaluation of insulator performance is important for maintaining the reliability and safety of power system operations. The use of unmanned aerial vehicles (UAVs) has been a significant advancement in transmission line monitoring, effectively replacing traditional manual inspection methods. With the rapid advancement of deep learning techniques, methods based on these models for detecting insulator defects have attracted increasing research interest and achieved notable progress. Nevertheless, existing approaches primarily emphasize constructing sophisticated and intricate network architectures, which leads to greater inference complexity in practical scenarios. In addition, foggy scenarios pose challenges for learning algorithms because foggy samples are difficult to obtain and label, and detectors trained on clear-weather samples perform poorly under fog. This study proposes an adaptive-enhancement framework based on YOLO that provides robustness and domain generalization under fog-induced distribution shifts. It optimizes enhancement at multiple scales and feeds the enhanced images to the detector within a single pipeline. Experimental results on the public UPID and SFID insulator defect datasets demonstrate improved insulator defect detection precision without increased computational complexity or inference resources, which is of great significance for advancing object detection in adverse weather. The proposed method achieves real-time performance, with an end-to-end inference speed exceeding 25 FPS and a model-only speed of approximately 38 FPS on 678 images from UPID, demonstrating both practical applicability and computational efficiency.

1. Introduction

Reliable detection of insulator defects is a crucial component of power transmission line maintenance, as insulator failures can lead to severe outages, equipment damage, and safety hazards. With the expansion of modern power grids, the inspection workload has increased significantly, prompting a shift from traditional manual inspections to automated, intelligent visual inspection systems. Unmanned aerial vehicle (UAV)-based inspection has emerged as a cost-effective and efficient solution for monitoring large-scale transmission infrastructures. Coupled with the recent advances in deep learning, UAV imagery has enabled rapid progress in computer vision-based defect detection of power system components [1,2]. Therefore, fast detection and accurate location of insulator flaws are critical for ensuring transmission line operation [3,4]. Detecting insulator defects, fractures, and faults is particularly challenging in the complex, harsh environments where they are deployed, as illustrated in Figure 1.
UAV-acquired images are used to locate electrical components and to support autonomous inspection, navigation, camera focusing, and defect identification in transmission line components, thereby enhancing inspection efficiency and accuracy. Because UAV inspections generate large volumes of images, recognition and defect detection algorithms for electrical equipment in aerial photos must be studied to increase the level of automation in transmission line inspection. In ref. [5], a list of popular deep learning-based insulator defect detection algorithms is provided. Many effective detection approaches are evaluated only on self-built datasets, which complicates replication, limits model applicability, and results in low generalization ability. Insulator defect detection models trained on public datasets often perform well only on the dataset reported in the corresponding study. Moreover, because the training datasets cover a single image scale, they fail to account for the impact of weather changes on model accuracy in real-world and adverse weather scenarios.
Recent advances in UAV-based automated inspection technologies have significantly transformed insulator defect detection, improving operational efficiency and safety [1,2]. These systems enable real-time acquisition and analysis of fault information, supporting timely maintenance decisions and reducing manual intervention. Zhou et al. [6] proposed an improved YOLOv5 by optimizing its feature extraction and fusion mechanisms to enhance small-object detection. They also incorporated rotated bounding boxes to mitigate background interference, achieving improved detection accuracy under clear weather conditions. With high-quality labeled datasets and advanced detection algorithms, insulator defect detection in favorable environments has become increasingly reliable. Nevertheless, challenging weather conditions, such as fog, pose considerable obstacles to visual detection systems. This difficulty arises primarily from domain shift, characterized by substantial discrepancies in feature distributions between samples from clear and foggy environments [7,8]. Consequently, developing detection models that maintain high performance in complex environmental conditions, such as foggy scenarios, remains a critical research objective.
To address the challenges posed by adverse weather conditions, several deep learning-based dehazing approaches [9,10] have been proposed to restore visual information degraded by fog. These methods can partially recover lost details, enhance image visibility, and thereby improve target detection accuracy. However, their considerable computational demands make them unsuitable for real-time deployment in UAV-based inspection systems. Moreover, these techniques primarily enhance image quality rather than fundamentally resolving the domain shift problem that arises from substantial discrepancies in feature distributions between clear and foggy weather conditions. An alternative approach is to train detection models directly on foggy datasets, which can improve performance within that domain. Yet collecting and annotating large-scale foggy data is both time-consuming and costly [11]. To mitigate these limitations, domain adaptation (DA) has emerged as a promising solution [12]. DA seeks to align the feature representations of the source domain (clear-weather data) and the target domain (foggy-weather data), thereby enhancing the model’s adaptability to the target environment. Conventional deep learning methods often assume a data distribution similar to the target domain, limiting their generalization capacity in foggy conditions. Domain generalization (DG) aims to learn domain-invariant features from multiple source domains [13], but constructing diverse fog-related datasets presents challenges. Piva et al. [14] highlighted that unlabeled data can enhance generalization to unseen domains, a capability typically lacking in DG approaches but exploited by unsupervised DA (UDA) frameworks.
In object detection, numerous studies have investigated cross-domain adaptation methods based on the RCNN family of frameworks. For instance, Chen et al. [15] improved cross-domain object detection performance by incorporating an adversarial domain adaptation module into Faster R-CNN, enabling more effective feature alignment between source and target domains. Building on this idea, Zhou et al. [16] integrated Faster R-CNN with a self-adversarial disentanglement module to extract domain-invariant representations, further enhancing domain adaptation capability. Despite these advances, research on real-time object detection frameworks, particularly those derived from the YOLO architecture, remains comparatively limited. While domain adversarial learning (DAL) [17] is a key approach for learning domain-invariant features, its effectiveness depends heavily on the discriminative power of the domain classifier, often leading to unstable training and optimization challenges.
Motivated by this research gap, our previous work [18] proposed a lightweight adaptive-enhancement framework with a YOLO model as the detector for object detection in challenging environments. The proposed model achieved high detection precision with minimal computational overhead, demonstrating its suitability for challenging environments. The present study extends our previous framework by investigating its performance on a newly collected dataset that encompasses varied environmental and regional conditions. Specifically, we evaluate the model’s robustness and domain generalization capability when confronted with unseen data distributions, a scenario that commonly occurs in practical UAV inspection tasks. We extend the evaluation of AEA-YOLO to the new UPID and SFID datasets, which exhibit diverse environmental characteristics, focusing on robustness and generalization under synthetic fog and real-world variability. The work emphasizes experimental validation and robust analysis without introducing new algorithms. Unlike our earlier work, which primarily focused on network design and efficiency, this study emphasizes the analysis of cross-domain performance and detection stability. The main contributions of this paper can be summarized as follows:
  • Publicly available object detection datasets are systematically augmented using a custom degradation strategy that generates multiple synthetic adverse-weather variants per image, facilitating controlled evaluation of cross-domain robustness under diverse environmental conditions.
  • A YOLO-based detection framework is enhanced to enable systematic evaluation of robustness under domain shifts, without increasing inference resource requirements.
  • An enhanced training and evaluation pipeline incorporating domain generalization techniques is proposed to improve detection accuracy in adverse conditions.
  • Extensive experimental evaluations and comparative analyses are conducted to reveal both the strengths and limitations of existing detectors in practical UAV-based inspection environments.
This study offers new insights into the practical deployment of deep learning-based insulator inspection systems and contributes to advancing robust and efficient detection methods for intelligent power transmission line maintenance, which is of great significance for detecting objects in adverse weather. Although the architectural backbone remains consistent with the previously proposed AEA-YOLO framework, this study provides several new technical insights. First, the enhancement strategy is systematically evaluated under controlled synthetic fog generation with explicitly defined degradation parameters. Second, a structured ablation analysis is conducted to disentangle the contributions of individual enhancement components. Third, the framework is repositioned from domain adaptation to domain generalization under fog-induced distribution shifts, supported by transparent data-split protocols and leakage control. These additions extend the previous work by formalizing the degradation modeling process and providing a deeper empirical understanding of enhancement–detection interactions under environmental shift.

2. Related Work

2.1. Object Detection

Insulator defect detection, a specific application of object detection, has evolved from traditional image processing techniques to deep learning-based frameworks [5]. Deep learning-based object detectors are generally categorized into two-stage and one-stage approaches. Two-stage detectors, such as the R-CNN family, typically involve a region proposal step followed by object classification, resulting in higher computational complexity and slower inference speeds that limit their suitability for real-time applications [19,20]. In contrast, the one-stage detection paradigm, represented by the YOLO (You Only Look Once) series, performs both localization and classification within a single network pass, enabling real-time detection [21]. The YOLO architecture inherently supports multiscale feature learning through its backbone and neck modules, allowing it to capture objects of varying sizes efficiently. Over time, the YOLO family has undergone continuous enhancement, leading to more advanced versions. The proven effectiveness of YOLO in numerous computer vision tasks, including insulator defect detection, demonstrates its ability to maintain high accuracy while offering substantial gains in detection speed [22,23]. Nonetheless, despite these advancements, the performance of existing object detection models under cross-domain conditions remains limited, highlighting the need for further research to enhance their robustness and generalization across varying environments.

2.2. Domain Adaptation and Domain Generalization

Domain adaptation (DA) and domain generalization (DG) are two key strategies for mitigating the domain shift problem, which arises when a model trained on one distribution encounters data from a different distribution during testing. Both approaches aim to reduce the decline in detection accuracy caused by discrepancies between source and target domains. In DA, it is typically assumed that the distribution of the target domain used during adaptation is consistent with that of the test domain, enabling the model to learn transferable representations that bridge this gap. Among DA techniques, unsupervised domain adaptation (UDA) has gained prominence because it eliminates the need for labeled samples in the target domain [24]. On the other hand, DG seeks to train models using multiple labeled source domains that collectively capture diverse characteristics, allowing the model to generalize effectively to unseen test domains [25].
Although training models on real fog-domain data could substantially enhance robustness, collecting and annotating such data across multiple environmental conditions is prohibitively challenging. Consequently, synthetic fog datasets are often generated using fog-simulation algorithms [26]. Despite their utility, synthetic datasets struggle to capture the full complexity of natural fog, which depends on intricate factors such as humidity, illumination, and atmospheric pollutants. Thus, the fundamental challenge is to learn generalizable representations from limited, condition-specific fog data. Both DA and DG aim to derive domain-invariant features as a shared training objective. However, DG methods rely exclusively on labeled data and cannot leverage the rich information contained in unlabeled samples, a limitation that is increasingly recognized. In contrast, UDA can exploit unlabeled data, offering a promising yet underexplored pathway for enhancing generalization in object detection under complex weather conditions [14].

2.3. Domain Adaptive Object Detection

Research on DG for object detection remains limited, and the capabilities of UDA have yet to be fully explored. This study focuses on Domain Adaptive Object Detection (DAOD) using the AEA-YOLO model [18]. One of the earliest efforts toward domain-adaptive detection was proposed by Chen et al. [15], who attempted to align feature distributions across domains to mitigate domain shifts. However, their method focused primarily on aligning a single feature layer, which proved inadequate for addressing complex variations between domains. Later investigations revealed that domain-adversarial learning (DAL) can enhance cross-domain detection, particularly when applied within R-CNN-based frameworks and their extended variants [27,28].
Hnewa and Radha [29] integrate DAL into the YOLO architecture to enhance cross-domain detection performance. Their findings show that simple image-level adaptation networks struggle to compete with the YOLO backbone’s complex multi-scale feature representations, resulting in insufficient adaptation and limited performance improvement. They also proposed four multiscale architectures based on YOLOv4, incorporating multiple domain classifiers at intermediate layers to strengthen cross-domain feature alignment [30]. Despite these advancements, DAL research within the YOLO family remains considerably less developed than in R-CNN-based models [31,32]. Nevertheless, existing DAL approaches still face several limitations. Their domain adaptation performance depends heavily on the domain classifier, which uses domain classification loss as the sole optimization criterion. When the domain classifier is undertrained or poorly optimized, feature alignment becomes inaccurate, reducing the model’s adaptability to the target domain. Furthermore, gradient reversal layers introduce training instability [33] and make it challenging to achieve equilibrium between the feature extractor and the domain classifier during adversarial learning [34]. These issues are compounded by the high-dimensional parameter spaces and complex dynamics characteristic of deep detection networks [35].

3. Methods

This section describes how the proposed model is applied to achieve enhanced cross-domain recognition for insulator detection and defect detection in foggy conditions. The proposed method includes improvements and strategies to enhance detection accuracy. This paper extends the framework by testing it on a new dataset with different environmental characteristics and evaluating its cross-domain generalization performance. The proposed method has three main components, the image enhancement module (IEM), the parameter prediction network (PPN), and the detection network (DN), as shown in Figure 2. These modules help improve object detection performance in challenging situations. Our goal is to use domain generalization to make the model robust to domain shifts in the new experiments and datasets, allowing the learned representations to converge toward domain invariance during training. The subsections that follow provide more information about each module and how it fits within the framework.
This strategy promotes domain generalization by combining the PPN, IEM, and DN in one pipeline that dynamically adjusts enhancement parameters, instead of relying on adversarial domain alignment. Domain generalization is pursued through hybrid training with varied degradation levels, which helps the detector learn domain-invariant representations applicable to unseen fog conditions. Empirical validation assesses performance transfer from clear to fog-degraded images, evaluating robustness across different fog densities and datasets as evidence of DG capability. The model improves robustness under fog-related domain shifts through exposure to synthetically generated fog degradations during training.

3.1. Image Enhancement Module (IEM)

Gradient-based optimization has become an effective paradigm in computational imaging, enabling both image enhancement and the integration of learnable processing operations within neural networks. Using differentiable transformations, image attributes such as contrast, sharpness, gamma, white balance, and tonal distribution can be adaptively adjusted during training. In this work, the image enhancement module (IEM) is constructed using a set of resolution-independent and fully differentiable filters, allowing seamless gradient backpropagation regardless of input image size. The IEM is designed to perform adaptive enhancement on input images prior to detection, thereby improving visual quality under diverse weather conditions before features are extracted by the detection network (DN). By embedding the enhancement process within the learning pipeline, the module supports end-to-end optimization and plays a key role in mitigating degradation caused by adverse environments.
The composition and ordering of the enhancement filters are determined based on both imaging theory and empirical observations. Specifically, the enhancement process begins with visibility restoration through defogging, followed by chromatic correction via white balance. Photometric refinement is then applied using gamma and tone adjustments, after which local details are emphasized through contrast enhancement and sharpening. This structured sequence yields more stable perceptual improvements and detection performance than arbitrary or reversed filter arrangements. Although the study does not exhaustively evaluate alternative orderings or subsets, this selected sequence aligns with best practices in computational photography and guarantees stable end-to-end optimization. For clarity, the filters within the IEM can be grouped into three functional categories: pixel-wise operations for global intensity adjustment, sharpening operations for detail enhancement, and defogging operations for visibility restoration in fog-affected scenes.
  • Pixel-wise Filters:
These filters map an input pixel value to an output pixel value, as shown below. The input pixel value is denoted by $P_i$ and the output pixel value by $P_o$, each comprising the three color channels:
$P_i = (R_i, G_i, B_i)$
$P_o = (R_o, G_o, B_o)$
White Balance: The purpose of this filter is to correct color imbalances induced by lighting variances. The equation of the filter contains three scaling factors for Red, Green, and Blue.
$P_o = (W_R, W_G, W_B) \cdot P_i$
Gamma Correction: The gamma correction filter adjusts the global brightness of the image through a nonlinear intensity transformation. It is mathematically expressed as follows:
$P_o = P_i^{\gamma}$
where γ is a tunable or learnable gamma coefficient. This operation enhances visibility in dark regions without saturating brighter areas, contributing to more stable contrast representation in low-light conditions.
Contrast Adjustment: The contrast filter improves visual distinction between objects and background by stretching or compressing the luminance range. The luminance of each pixel is computed as follows:
$\mathrm{Lum}(P_i) = 0.27\,R_i + 0.67\,G_i + 0.06\,B_i$
The enhanced pixel value is then obtained using a weighted blend of the original and enhanced intensities:
$P_o = \alpha \cdot \mathrm{En}(P_i) + (1 - \alpha) \cdot P_i$
The symbol α represents a weight that controls the blend of the original and enhanced pixel values. This nonlinear formulation amplifies mid-tone luminance levels while preserving highlights and shadows, thereby producing perceptually balanced contrast enhancement. The parameter α in the equation is adaptively predicted by the parameter prediction network (PPN) rather than being manually fixed. Its value is learned during end-to-end training via backpropagation and dynamically adjusted according to image content and degradation characteristics. The enhanced intensity is computed as follows:
$\mathrm{En}(P_i) = P_i \cdot \dfrac{0.5\,\left(1 - \cos\left(\pi \cdot \mathrm{Lum}(P_i)\right)\right)}{\mathrm{Lum}(P_i)}$
Tone Adjustment: The tone filter modifies the global tone curve of the image to refine color representation and dynamic range. The operation is modeled as a piecewise linear function dividing the normalized intensity range [0, 1] into L discrete tone levels, each controlled by a learnable parameter $t_j$. Equation (8) follows a standard piecewise linear tone-mapping formulation, where the tone curve is parameterized by learnable coefficients and constrained via a clipping function to ensure numerical stability and valid pixel ranges.
$P_o = \dfrac{1}{L} \sum_{j=0}^{L-1} \mathrm{Clip}\left(L \cdot P_i - j,\, 0,\, 1\right) \cdot t_j$
The clipping function ensures that output intensities remain within valid pixel ranges. By adapting the slope of the tone curve in each interval, this filter improves color harmony and tonal balance across diverse lighting environments.
  • Sharpen Filter:
The sharpening filter enhances edge details and fine textures, improving the clarity of object boundaries critical for detection accuracy. This operation is formulated similarly to an unsharp masking technique:
$F(x) = I(x) + \lambda \left( I(x) - G(I(x)) \right)$
where I(x) denotes the input image, G(I(x)) is the result of Gaussian smoothing, and λ is a positive scalar that determines the sharpening intensity. The difference term I(x) − G(I(x)) extracts high-frequency edge information, which is then scaled and added back to the original image. Since this operation is fully differentiable, the sharpening strength λ can be learned during training to optimize downstream detection performance.
  • Defog Filter:
The defog filter restores scene visibility in hazy or foggy images using the atmospheric scattering model, which describes the interaction between scene radiance and light transmission through the atmosphere. The observed hazy image I(x) can be expressed as follows:
$I(x) = J(x)\, t(x) + A \left(1 - t(x)\right)$
where J(x) is the scene radiance (clean image), A is the global atmospheric light, and t(x) is the transmission map, defined as follows:
$t(x) = e^{-\beta d(x)}$
Here, d(x) represents the scene depth, and β is the atmospheric scattering coefficient. To enhance adaptability, the transmission map can be estimated with an additional learnable fog-removal parameter ω. Equation (12) is derived from the classical dark channel prior with an additional learnable parameter ω to adaptively control haze removal strength.
$t(x, \omega) = 1 - \omega \min_{C} \left( \min_{y \in \Omega(x)} \dfrac{I^{C}(y)}{A^{C}} \right)$
where Ω(x) denotes the neighborhood of pixel x, $I^{C}(y)$ is the intensity of channel C at position y, and $A^{C}$ is the atmospheric light for channel C. The learnable parameter ω governs the degree of haze removal, allowing the model to balance defogging strength adaptively. Since the entire process is differentiable, it supports backpropagation, enabling end-to-end optimization within the detection framework. The restored image J(x) is computed as follows:
$J(x) = \dfrac{I(x) - A}{t(x)} + A$
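To make the filter formulations above concrete, the following sketch implements the pixel-wise, sharpening, and defog operations as differentiable PyTorch tensor functions. It is a minimal illustration, not the authors’ released implementation; the function names, the (N, 3, H, W) tensor layout with values in [0, 1], the rough atmospheric-light estimate, and the fixed Gaussian blur kernel are assumptions made for exposition.

```python
import torch
import torch.nn.functional as F

# Images are float tensors of shape (N, 3, H, W) scaled to [0, 1].

def white_balance(img, gains):                       # Eq. (3): per-channel scaling
    return img * gains.view(-1, 3, 1, 1)

def gamma_correction(img, gamma):                     # Eq. (4): P_o = P_i ** gamma
    return img.clamp(min=1e-6) ** gamma.view(-1, 1, 1, 1)

def contrast(img, alpha):                             # Eqs. (5)-(7): luminance-based blend
    lum = 0.27 * img[:, 0:1] + 0.67 * img[:, 1:2] + 0.06 * img[:, 2:3]
    enhanced = img * 0.5 * (1.0 - torch.cos(torch.pi * lum)) / lum.clamp(min=1e-6)
    a = alpha.view(-1, 1, 1, 1)
    return a * enhanced + (1.0 - a) * img

def tone(img, t):                                     # Eq. (8): piecewise-linear tone curve
    L = t.shape[1]                                    # t has shape (N, L), one weight per level
    out = torch.zeros_like(img)
    for j in range(L):
        out = out + torch.clamp(L * img - j, 0.0, 1.0) * t[:, j].view(-1, 1, 1, 1)
    return out / L

def sharpen(img, lam, blur_kernel):                   # Eq. (9): unsharp masking
    # blur_kernel: depthwise Gaussian kernel of shape (3, 1, k, k)
    smoothed = F.conv2d(img, blur_kernel, padding=blur_kernel.shape[-1] // 2, groups=3)
    return img + lam.view(-1, 1, 1, 1) * (img - smoothed)

def defog(img, omega, patch=15):                      # Eqs. (10)-(13): dark channel prior
    A = img.amax(dim=(2, 3), keepdim=True)            # rough per-channel atmospheric light
    norm = (img / A.clamp(min=1e-6)).min(dim=1, keepdim=True).values
    dark = -F.max_pool2d(-norm, kernel_size=patch, stride=1, padding=patch // 2)
    t = (1.0 - omega.view(-1, 1, 1, 1) * dark).clamp(0.1, 1.0)   # Eq. (12), bounded
    return (img - A) / t + A                                      # Eq. (13)
```

Because every operation is composed of standard differentiable tensor functions, gradients from the detection loss can flow back through the whole filter chain to the predicted parameters.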

3.2. Parameter Prediction Network (PPN)

The parameter prediction network (PPN) is designed as a compact convolutional regression model that estimates the parameters required by the image enhancement module (IEM) from a low-resolution RGB representation of the input image. Its primary function is to infer a concise set of filter coefficients that adapt the enhancement process to the prevailing environmental conditions. As illustrated in Figure 3, the PPN consists of five convolutional stages that progressively reduce spatial resolution while increasing feature dimensionality. This design allows the network to capture global scene statistics efficiently without relying on max-pooling operations, thereby maintaining contextual information while keeping the parameter count low. Compared to pooling-based alternatives, this approach is more suitable for predicting compact parameter vectors.
To ensure stable training and fast convergence, batch normalization and Leaky-ReLU activations are employed throughout the network. The extracted features are compressed into a 32-dimensional scene representation, which is subsequently projected through a fully connected layer with 128 units. The final output corresponds to N continuous enhancement parameters, resulting in approximately 6 k learnable parameters in total. To ensure stable optimization and physically plausible image transformations, each predicted parameter is constrained within a predefined range and linearly rescaled for compatibility with photographic conventions and differentiable optimization. Specifically, the defog transmittance parameter is bounded within [0.1, 1.0]; the per-channel white balance gains are restricted to [0.91, 1.10] (approximately ±10% chromatic adjustment); the gamma is controlled by a range factor of 3; the tone-mapping control points are constrained within [0.5, 2.0]; the contrast scaling is bounded by a range factor of 3.5; and the unsharp mask strength is restricted to [0, 5]. In total, the PPN predicts 15 enhancement parameters under these bounded constraints, which maintains numerical stability and avoids over-enhancement artifacts during end-to-end training. Through convolutional feature learning, the PPN captures environmental cues such as luminance attenuation, color distortion, texture degradation, and depth-related veiling effects. By associating degraded inputs with corresponding clean-image statistics, the network learns to map different degradation patterns to appropriate regions in the enhancement parameter space, enabling adaptive adjustment of the enhancement strategy under adverse conditions.
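A minimal PyTorch sketch of a parameter prediction network matching the description above (five strided convolutional stages, batch normalization with Leaky-ReLU, a 32-dimensional scene code, a 128-unit fully connected layer, and bounded outputs). The layer widths, the sigmoid-plus-rescaling trick, and the exact bound tensor are illustrative assumptions mirroring the text rather than the authors’ released code.

```python
import torch
import torch.nn as nn

class PPN(nn.Module):
    """Predicts bounded IEM filter parameters from a low-resolution RGB input."""
    def __init__(self, bounds):
        # bounds: tensor of shape (n_params, 2) with per-parameter [lo, hi] rows,
        # e.g. defog omega in [0.1, 1.0], white-balance gains in [0.91, 1.10],
        # unsharp strength in [0, 5].
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(0.1, inplace=True),
            )
        # Five strided stages: spatial size halves at each stage, no max pooling.
        self.features = nn.Sequential(
            block(3, 16), block(16, 32), block(32, 32), block(32, 32), block(32, 32),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)          # compress to a 32-d scene representation
        self.head = nn.Sequential(
            nn.Linear(32, 128), nn.LeakyReLU(0.1, inplace=True),
            nn.Linear(128, bounds.shape[0]),
        )
        self.register_buffer("lo", bounds[:, 0])
        self.register_buffer("hi", bounds[:, 1])

    def forward(self, low_res_img):
        z = self.pool(self.features(low_res_img)).flatten(1)
        raw = torch.sigmoid(self.head(z))            # squash raw outputs to (0, 1)
        return self.lo + raw * (self.hi - self.lo)   # linearly rescale into predefined ranges
```

In this sketch the bounds are baked into the module as buffers, so the predicted vector can be split and routed directly to the corresponding IEM filters during the forward pass.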

3.3. Detection Network (DN)

In this study, the YOLO (You Only Look Once) architecture is used as the one-stage detection network because of its proven efficiency and widespread adoption in real-world visual inspection and object detection applications. YOLO’s design enables simultaneous localization and classification in a single forward pass, delivering high inference speed without compromising accuracy. This real-time capability makes it especially suitable for aerial inspection scenarios where rapid analysis of streaming imagery is essential for timely decision-making.
Building on earlier versions such as YOLOv3, recent YOLO iterations integrate the Darknet-53 backbone, which enhances feature extraction through a hierarchical convolutional structure. The backbone alternates 3 × 3 and 1 × 1 convolutional kernels within a ResNet-inspired residual architecture. This configuration facilitates deeper network training by mitigating the vanishing gradient problem and improving gradient flow across layers. The residual connections also enable efficient reuse of learned features, strengthening the model’s capacity to represent both low-level textures and high-level semantic information. Consequently, YOLOv3, which uses Darknet-53 as its backbone, provides a robust and computationally balanced foundation for object detection under diverse and challenging visual conditions. The loss function incorporates the standard YOLO detection loss, which includes components like bounding box regression loss, object-ness loss, and classification loss, adhering to the default YOLOv3 settings. DG occurs without explicit labels or adversarial signals, relying instead on an adaptive enhancement mechanism and hybrid data exposure. The PPN and IEM are jointly trained with the DN detector, with gradients propagated end-to-end via differentiable image filters, optimizing enhancement parameters through the detection loss.
The YOLO detector incorporates a learnable image pre-processing step that enhances images by extracting filter-parameter vectors and applying filters. This enhanced image is processed by the detection network (DN), resulting in three feature routes for identifying objects of varying sizes. Outputs are decoded into bounding boxes, converted to absolute coordinates, and duplicates are eliminated using Non-Maximum Suppression (NMS). The training loss is calculated by summing three key loss terms: Generalized Intersection over Union (GIoU) Loss, confidence loss, and classification loss.
$L_{total} = L_{GIoU} + L_{conf} + L_{cls}$
where $L_{GIoU}$ denotes the bounding box regression loss based on Generalized Intersection over Union (GIoU), $L_{conf}$ is the object-ness confidence loss, and $L_{cls}$ is the classification loss. The bounding box loss is computed as $1 - \mathrm{GIoU}$, with a scale factor $\left(2 - \frac{w \cdot h}{S^{2}}\right)$ applied to emphasize small objects, where w and h denote the bounding box width and height and S is the input resolution. The confidence loss is computed using sigmoid cross-entropy with focal modulation, $\alpha \cdot |y - p|^{\gamma} \cdot \mathrm{BCE}(y, p)$, where $y \in \{0, 1\}$ denotes the ground-truth object-ness label, p is the predicted object-ness probability after sigmoid activation, α = 1, and γ = 2, and background samples are selected using an IoU threshold of 0.5. The classification loss is computed using sigmoid cross-entropy between predicted class probabilities and ground-truth labels. All three loss components are summed without additional weighting. All experiments are conducted with a fixed random seed of 42 to ensure reproducibility. The optimizer, learning rate schedule, batch size, and other training configurations are specified in Section 4.1.
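The sketch below spells out how the three loss terms described above can be combined, with the small-object scale factor on the box term and focal modulation on the object-ness term. The tensor shapes, the pre-computed GIoU input, and the omission of prediction-to-target matching are simplifying assumptions; the actual training follows the default YOLOv3 pipeline.

```python
import torch
import torch.nn.functional as F

def detection_loss(giou, tgt_wh, obj_logits, obj_targets, cls_logits, cls_targets,
                   input_size, alpha=1.0, gamma=2.0):
    """Total loss = GIoU box loss + focal object-ness loss + classification loss."""
    # Box term: (1 - GIoU) weighted by (2 - w*h / S^2) so small boxes contribute more.
    scale = 2.0 - tgt_wh[:, 0] * tgt_wh[:, 1] / float(input_size ** 2)
    l_giou = (scale * (1.0 - giou)).mean()

    # Object-ness term: sigmoid cross-entropy with focal modulation alpha * |y - p| ** gamma.
    p = torch.sigmoid(obj_logits)
    bce = F.binary_cross_entropy_with_logits(obj_logits, obj_targets, reduction="none")
    l_conf = (alpha * (obj_targets - p).abs() ** gamma * bce).mean()

    # Classification term: plain sigmoid cross-entropy over class labels.
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)

    return l_giou + l_conf + l_cls   # summed without additional weighting
```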
Compared with two-stage detectors such as Faster R-CNN, which separate region proposal and classification into distinct processes, the YOLO framework unifies these operations within a single network, resulting in significantly lower computational overhead. Two-stage models, while generally achieving high accuracy in static or laboratory environments, often incur latency and resource constraints that make them impractical for real-time applications. In contrast, YOLO’s end-to-end detection paradigm eliminates explicit region proposal stages, enabling rapid object localization and recognition in a single pass. This architectural simplicity not only enhances inference efficiency but also facilitates deployment on platforms with limited computational capacity, such as unmanned aerial vehicles (UAVs) or edge devices. Furthermore, YOLO’s dense spatial prediction strategy ensures that small and distant targets, which are common in transmission line and infrastructure inspection, are detected with higher responsiveness and precision. These characteristics collectively make YOLO a compelling choice for adaptive, real-time object detection in complex environmental scenarios.
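A condensed sketch of one end-to-end training step ties the three modules together, showing how gradients from the detection loss flow back through the differentiable IEM filters into the PPN. The `ppn`, `iem`, and `detector` handles, the 256 × 256 low-resolution input, and the `compute_loss` method are placeholders standing in for the components described in Sections 3.1, 3.2 and 3.3, not a definitive implementation.

```python
import torch
import torch.nn.functional as F

def train_step(batch_images, batch_targets, ppn, iem, detector, optimizer):
    """One joint optimization step: PPN -> IEM -> DN, all in a single differentiable graph."""
    # 1. Predict bounded enhancement parameters from a downsampled copy of the batch.
    low_res = F.interpolate(batch_images, size=(256, 256),
                            mode="bilinear", align_corners=False)
    params = ppn(low_res)

    # 2. Apply the differentiable filters at full resolution
    #    (defog -> white balance -> gamma/tone -> contrast -> sharpen).
    enhanced = iem(batch_images, params)

    # 3. Run the detector and compute the standard YOLO loss on the enhanced images.
    predictions = detector(enhanced)
    loss = detector.compute_loss(predictions, batch_targets)   # GIoU + conf + cls

    # 4. Backpropagate through the detector, the IEM filters, and the PPN together.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```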

4. Experiments and Results

This section presents the experimental design, implementation details, and performance evaluation of the proposed framework. The experiments were conducted to assess the effectiveness of the adaptive detection model across diverse environmental conditions and to validate its ability to address domain shift between clear and degraded imagery. All evaluations were performed on both synthetic and real-world datasets to provide a comprehensive analysis of the model’s generalization ability. The results are presented alongside several state-of-the-art detection methods to provide contextual performance comparison in terms of precision, robustness, and computational efficiency. In addition, quantitative metrics such as mean Average Precision (mAP), Precision, Recall, and FPS are used to provide objective performance evaluation, while qualitative results are included to illustrate the visual enhancement and detection accuracy achieved by the proposed method.

4.1. Experimental Environment and Configuration

To ensure a fair and consistent assessment, the experimental framework in this study is based on our previously published detection model, which serves as the foundational architecture for all subsequent evaluations. While the model design remains unchanged, the present work introduces two datasets, the Unifying Public Insulator Dataset (UPID) and the Synthetic Foggy Insulator Dataset (SFID), to investigate the model’s adaptability to unseen domains. UPID comprises aerial imagery of transmission line insulators captured under diverse environmental conditions, including varying illumination and background complexity. In contrast, SFID simulates adverse weather by applying synthetic fog layers of varying densities to clear images, providing a controlled setting for domain shift analysis. All datasets were resized and split into training and testing subsets to maintain uniform evaluation protocols. This experimental design enables a systematic comparison of model behavior across distinct visual domains and demonstrates its capacity to generalize effectively without architectural modification.
UPID, which contains 6860 images, was divided at the image level into 6182 training images and 678 testing images, approximately 90% and 10%, respectively. SFID, which contains 13,718 images, was divided into 10,975 training images and 2743 testing images, approximately 80% and 20%, respectively. Dataset splits were generated using a fixed random seed of 42 to ensure reproducibility. The train/test division was performed prior to any synthetic degradation, and fog-augmented variants were generated exclusively from training images. No original image or any of its derived fog variants appears in both the training and testing sets, thereby preventing data leakage. No separate validation set was used; hyperparameters follow the default YOLOv3 configuration, and the test set was used solely for final evaluation after training convergence. During training, ten fog variants were generated per image using the atmospheric scattering model, and three variants were randomly sampled during training to increase diversity.
Aligned with our previous experimental framework, the proposed evaluation employs a hybrid training strategy that integrates both clear and synthetically degraded imagery to enhance model robustness across diverse visual conditions. Adverse environments were simulated by applying fog-based image degradation generated through the atmospheric scattering model, in which each original image was converted into ten fog variants of varying densities, as shown in Figure 4. The fog parameters are uniformly sampled within predefined ranges to simulate mild to dense fog. Synthetic fog images were generated using an atmospheric scattering model, with fixed atmospheric light parameter A = 0.5. The scattering coefficient β was varied from 0.05 to 0.14 to create ten variants per training image, simulating increasing fog densities while ensuring visual plausibility. The transmission map was computed from the image center, and variants were restricted to training images to avoid data leakage. Degraded variants are included only in the training set, while test sets remain fixed to ensure fair evaluation. This approach ensures that the detector is regularly exposed to varying levels of visibility loss during training, thereby improving its ability to generalize to real-world foggy scenarios. In the current experiments, this hybrid strategy was extended to both UPID and SFID to maintain consistency with the earlier methodology while broadening the evaluation scope. All training images were augmented with geometric transformations, including random flips, rotations, and color jittering, to prevent overfitting and enhance data diversity.
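The fog synthesis procedure described above (atmospheric scattering model with A = 0.5 and β stepped from 0.05 to 0.14 over ten variants per training image) can be summarized by the sketch below. The distance-from-center pseudo-depth and its scaling are illustrative assumptions standing in for the paper’s exact transmission map; file handling is omitted, and the function is applied only to training images so that the test set remains untouched.

```python
import numpy as np

def synthesize_fog_variants(image, n_variants=10, A=0.5,
                            beta_min=0.05, beta_max=0.14):
    """Return fog-degraded copies of a clear image via I = J*t + A*(1 - t), t = exp(-beta*d)."""
    J = image.astype(np.float32) / 255.0                       # (H, W, 3) in [0, 1]
    h, w = J.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]

    # Pseudo-depth proxy (assumption): normalized distance from the image center,
    # rescaled so that beta * d spans mild to dense fog over the chosen beta range.
    d = np.sqrt((ys - h / 2.0) ** 2 + (xs - w / 2.0) ** 2)
    d = d / d.max() * 10.0

    variants = []
    for beta in np.linspace(beta_min, beta_max, n_variants):
        t = np.exp(-beta * d)[..., None]                        # transmission map, (H, W, 1)
        I = J * t + A * (1.0 - t)                               # atmospheric scattering model
        variants.append((I * 255.0).clip(0, 255).astype(np.uint8))
    return variants
```

In training, the ten variants are generated once per training image after the seed-42 split, and three of them are randomly sampled per epoch to diversify the exposure to visibility loss.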
The training and evaluation were performed on an NVIDIA GeForce RTX 4090 24 GB GPU workstation with CUDA acceleration. The model was trained for 200 epochs with a batch size of 6, using the Adam optimizer and an initial learning rate of 0.0001, which was reduced by a factor of 0.1 once learning plateaued. The training objective follows the YOLOv3 formulation, employing Generalized IoU (GIoU) loss for bounding box regression, sigmoid cross-entropy with focal modulation for object-ness confidence, and sigmoid cross-entropy for multi-class classification. This setup provides a robust and fair basis for evaluating the adaptability of the previously proposed model across distinct environmental domains and datasets.

4.2. Evaluation Metrics

In this experiment, mean Average Precision (mAP) is used as the performance evaluation metric to assess the effectiveness of detection algorithms. mAP is essential in object detection tasks and is calculated by matching model predictions with true labels using intersection over union (IoU). A threshold, typically set at 0.5, determines whether a predicted box matches a true box. For each category, Average Precision (AP) is computed by ordering predictions by confidence and calculating Precision (P) and Recall (R), which are computed at the dataset level by aggregating all detections across the test set. The overall mAP is the average of these category averages, with higher mAP indicating better accuracy in target identification. The calculation formula is as follows:
$P = \dfrac{\mathrm{True\ positives}}{\mathrm{True\ positives} + \mathrm{False\ positives}}$
$R = \dfrac{\mathrm{True\ positives}}{\mathrm{True\ positives} + \mathrm{False\ negatives}}$
$AP = \sum_{n} \left( R_n - R_{n-1} \right) P_n$
$mAP = \dfrac{1}{N} \sum_{i=1}^{N} AP_i$
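The following NumPy sketch illustrates how the formulas above are evaluated, assuming each detection has already been matched to ground truth at IoU ≥ 0.5 and marked as a true or false positive, and that detections are sorted by descending confidence. The function names and the per-class result dictionary are illustrative conventions, not part of the original evaluation code.

```python
import numpy as np

def average_precision(tp_flags, n_ground_truth):
    """AP = sum_n (R_n - R_{n-1}) * P_n over detections sorted by descending confidence."""
    tp_flags = np.asarray(tp_flags, dtype=np.float64)
    tp = np.cumsum(tp_flags)                       # cumulative true positives
    fp = np.cumsum(1.0 - tp_flags)                 # cumulative false positives
    recall = tp / max(n_ground_truth, 1)
    precision = tp / np.maximum(tp + fp, 1e-9)
    prev_r, ap = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_r) * p                     # rectangle under the precision-recall curve
        prev_r = r
    return ap

def mean_average_precision(per_class_results):
    """mAP = mean of per-class APs; per_class_results maps class -> (tp_flags, n_gt)."""
    aps = [average_precision(flags, n_gt) for flags, n_gt in per_class_results.values()]
    return float(np.mean(aps)) if aps else 0.0
```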

4.3. Ablation Study

4.3.1. Effect of Hybrid Training and IEM

This section examines, in Table 1, the contributions of hybrid training utilizing synthetic fog images and of the image enhancement module (IEM). Three configurations were assessed using UPID: (1) DN + Hybrid, a baseline detector trained with synthetic fog images without IEM; (2) DN + IEM, a detector with IEM integrated but trained solely on clean images; and (3) DN + IEM + Hybrid, which incorporates both hybrid fog exposure and adaptive enhancement as part of the complete framework.
These findings in Table 1 indicate that hybrid training with synthetic foggy images (DN + Hybrid) improves robustness to visibility variation during training but does not provide optimal performance by itself. Integrating IEM without hybrid exposure (DN + IEM) fails to significantly improve accuracy and may decrease mAP@50:95. The combination of hybrid training and IEM (DN + IEM + Hybrid) significantly boosts performance metrics, showing mAP@50 improvements from 92.94% to 99.79% and mAP@50:95 from 67.20% to 81.09%. This highlights that performance gains are derived from the interaction of hybrid exposure and IEM, rather than from either method alone. Since the parameter prediction network (PPN) is embedded within the IEM and learns filter parameters conditioned on both clean and synthetically degraded images during training, enhancement and hybrid exposure function as a unified learning mechanism. This architectural coupling explains why the full configuration yields the strongest performance.

4.3.2. IEM Component and Hybrid Training Analysis

In this study, the results, measured by mean Average Precision (mAP), indicate that the proposed method consistently outperformed the baseline detector (DN). Furthermore, qualitative visualizations of detection outputs show that the model successfully identifies small and partially occluded defects, even in foggy or low-contrast regions. These findings, presented in Table 2 and Figure 5, collectively validate the generalization capability of the previously proposed model when applied to new domains and demonstrate its suitability for real-world UAV-based inspection tasks under challenging environmental conditions. The observed performance improvements arise from the complementary interaction between synthetic fog exposure and adaptive enhancement. Synthetic fog augmentation increases the diversity of degradation patterns encountered during training, promoting robustness to distribution shifts. However, augmentation alone does not fully compensate for visibility loss or photometric distortion. The IEM further refines degraded inputs through adaptive enhancement prior to detection, improving feature separability under low-contrast conditions. The cumulative ablation results indicate that performance gains are not solely attributable to data augmentation but are strengthened by the integrated enhancement mechanism.
A more detailed analysis of Table 2 reveals the incremental contribution of individual enhancement components. When only a single filter is added to the baseline DN, performance improvements remain marginal, indicating that visibility restoration alone is insufficient to fully compensate for domain-induced degradation. However, combining complementary filters produces substantially larger gains. For example, the joint application of contrast and sharpening significantly increases mAP@50 from 92.94% to 98.76%, suggesting that local detail enhancement plays a critical role in improving small-defect detection under reduced visibility. Similarly, multi-filter combinations that include white balance and gamma correction improve robustness to color distortion and global luminance shifts. As additional filters are integrated, performance increases progressively, culminating in the full IEM configuration, which achieves the highest Recall and mAP. These results indicate that no single filter is solely responsible for the observed gains. Instead, the improvement arises from the complementary interaction of visibility restoration, photometric correction, and detail enhancement. The full module design therefore provides a balanced enhancement strategy that jointly addresses multiple degradation factors present in foggy environments. Although alternative filter orderings were not exhaustively evaluated, the progressive performance improvements observed in the cumulative ablation study support the effectiveness of the adopted sequence. These findings demonstrate that the observed performance gains are not attributable to a single enhancement operation, but rather to the integrated design of the IEM.
The comparative results of baseline DN and IEM + DN presented in Table 1 reveal that the proposed enhancement significantly improves overall detection performance. The Recall rate increased from 90.21% to 99.68%, indicating that the enhanced model successfully detected nearly all insulator targets, including those affected by fog or low contrast. While precision decreased marginally from 97.34% to 96.38%, this slight reduction is expected due to the increased detection sensitivity, which may produce a few additional false positives. The P and R trade-off shows slightly increased sensitivity, resulting in higher R without systematic or category-specific false detections. Nevertheless, both mAP@50 and mAP@50:95 improved substantially from 92.94% to 99.79%, and 67.20% to 81.09%, respectively, demonstrating enhanced localization and classification consistency across multiple IoU thresholds. These results confirm that the proposed enhancement module effectively increases detection robustness and adaptability without compromising overall accuracy.
Regarding inference speed, our proposed model achieved 37 Frames Per Second (FPS) with a batch size of one. This performance is noteworthy in real-time object detection, especially compared with dehazing-based detection methods that incur significantly greater computational overhead. In many practical applications, such as UAV inspection, surveillance systems, and industrial visual monitoring, an inference rate above approximately 25 FPS is generally considered sufficient for real-time responsiveness [36,37]. Thus, real-time systems require algorithms capable of operating at 25 FPS or higher on low-power edge devices. For example, recent studies of lightweight real-time detectors indicate that speeds in the 20–30 FPS range enable practical deployment on video streams [37]. While original YOLO architectures may reach high speeds under ideal conditions, such as on a high-end GPU without additional enhancement modules, our result shows that incorporating the proposed enhancement module does not compromise real-time operability. This confirms that the system remains viable for real-time deployment even when balanced with improved detection robustness and DG. Figure 6 summarizes the inference performance of the proposed YOLO-based model on 678 test images: the raw model achieved 37.39 FPS for the forward pass (model-only), while the complete pipeline, which includes preprocessing, inference, and postprocessing, achieved 26.29 FPS end-to-end. The average model inference time per image was 0.0267 s, and the average total processing time per image was 0.0380 s, demonstrating real-time capability on a single GPU.
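The reported speeds (37.39 FPS model-only and 26.29 FPS end-to-end on 678 test images) correspond to a simple wall-clock benchmark of the form sketched below. The `model`, `preprocess`, and `postprocess` handles are placeholders, and the CUDA synchronization calls are included so that GPU kernel completion is counted rather than asynchronous launch time; this is a sketch of the measurement protocol, not the exact script used.

```python
import time
import torch

def _sync(device):
    if str(device).startswith("cuda"):
        torch.cuda.synchronize()

@torch.no_grad()
def benchmark(model, image_paths, preprocess, postprocess, device="cuda"):
    model.eval().to(device)
    model_time, total_time = 0.0, 0.0
    for path in image_paths:
        t0 = time.perf_counter()
        x = preprocess(path).to(device)            # load, resize, normalize, add batch dim
        _sync(device)
        t1 = time.perf_counter()
        y = model(x)                               # forward pass only (model-only timing)
        _sync(device)
        t2 = time.perf_counter()
        _ = postprocess(y)                         # decode boxes and apply NMS
        t3 = time.perf_counter()
        model_time += t2 - t1
        total_time += t3 - t0
    n = len(image_paths)
    return {"model_fps": n / model_time, "end_to_end_fps": n / total_time,
            "avg_model_s": model_time / n, "avg_total_s": total_time / n}
```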

4.4. UPID and SFID Comparison Experiment

When compared with reported results of state-of-the-art (SOTA) detection frameworks, the proposed model achieves competitive performance under the evaluated settings. It is acknowledged that different methods exhibit varying generalization depending on dataset characteristics, training strategies, and experimental protocols. Table 3 presents a comparative performance summary of the proposed model and several recent state-of-the-art insulator detection methods on UPID. Under the reported results, the proposed approach attains a mAP@50 of 99.79%, which is numerically higher than HRGA-Net (99.56%) and FINet-YOLOv5 (99.30%). These results indicate that the proposed framework achieves strong detection accuracy while maintaining stable performance across evaluation metrics. Although the mAP@50:95 (81.09%) is slightly lower than MMA multi-Domain (82.82%) and HRGA-Net (84.11%), the marginal difference indicates that the proposed model maintains competitive localization precision at stricter IoU thresholds. Overall, these results confirm that the proposed model achieves competitive detection accuracy while sustaining comparable fine-grained localization performance, highlighting its potential for practical UAV-based insulator inspection tasks.
The findings reveal that although deep dehazing networks can enhance detection accuracy under hazy conditions, these improvements come at the cost of significantly increased computational complexity, leading to a noticeable decline in inference speed. In contrast, the model achieves detection precision at a level comparable to reported methods while maintaining real-time efficiency. Moreover, while standard YOLO models exhibit pronounced performance degradation when applied to unseen target domains, the proposed framework effectively mitigates this domain gap. This demonstrates its enhanced robustness and adaptability, enabling stable detection performance across diverse and previously unobserved visual environments.
To further evaluate the detection capability and generalization performance of the proposed algorithm, additional comparative experiments were conducted using the public Synthetic Foggy Insulator Dataset (SFID). This dataset was designed to simulate adverse atmospheric conditions and assess model robustness under fog-induced domain shifts. The comparative performance of several general object detection algorithms on SFID is summarized in Table 4. To evaluate the effectiveness of the proposed model, we compare it with recent methods, including FINet, ChainNet, and ODNet. The performance of these baselines is reported by Liao et al., while the performance of IDD-YOLO and MMA multi-Domain is taken from their respective original publications. Among the reported approaches, HRGA-Net reports a mAP@50 of 99.82% and mAP@50:95 of 85.29%. IDD-YOLO and MMA multi-Domain also demonstrate strong localization performance. FINet and ChainNet report Recall values above 99%, although with relatively lower precision in complex backgrounds. ODNet shows comparatively lower performance across the reported metrics. ChainNet achieves Recall of 99.23% and Precision of 89.71%, while IDD-YOLO reports mAP@50:95 of 87.20%. Under the same dataset, the proposed method attains Precision of 99.25%, Recall of 99.11%, and mAP@50 of 99.26%. While mAP@50:95 (80.93%) is lower than some recently reported methods, the results indicate a balanced trade-off between precision and overall detection performance. It should be noted that comparisons are based on reported results in the literature and may involve differences in training strategies and experimental configurations.
The results clearly show that the proposed model achieves competitive accuracy and stability across all evaluation metrics compared with standard detection frameworks, demonstrating its ability to handle visual degradation while preserving detection precision. Notably, SFID primarily comprises large, well-defined insulator targets and contains relatively few small or highly occluded objects. Consequently, overall detection complexity is moderate, with fewer localization ambiguities than datasets with multiple small-scale or overlapping targets. Nevertheless, the experiment remains significant for validating the model’s performance under fog-affected conditions, where contrast attenuation and color distortion often undermine the reliability of traditional detectors. The consistent improvement observed on SFID confirms that the proposed algorithm not only performs effectively under clear-weather scenarios but also maintains robust feature representation and detection accuracy when applied to foggy or low-visibility domains, underscoring its potential for practical deployment in real-world aerial inspection applications.
Overall, the proposed framework demonstrates outstanding detection performance across diverse datasets, including the highly challenging UPID developed in this study and augmented with synthetic foggy conditions, as well as the publicly available large-scale SFID. The model consistently achieves high detection accuracy while preserving the lightweight nature of its image enhancement module, thereby maintaining computational efficiency. This balance between accuracy and efficiency makes the model particularly well suited for deployment on resource-limited platforms, such as unmanned aerial vehicles (UAVs), where real-time, reliable insulator defect detection is essential.

4.5. Visualizations and Qualitative Analysis

To further demonstrate the effectiveness of the proposed detection framework, qualitative visualization results are presented in Figure 7. The comparisons are organized into three columns: (1) results from the baseline detector network (DN) on real foggy images, (2) results from the enhanced IEM + DN model on the same foggy images, and (3) IEM + DN model outputs after incorporating the proposed data-augmentation strategy. Although the visual differences among detection results may appear limited, the proposed method consistently yields higher confidence and more stable detections under domain shift, as confirmed by the quantitative results. The proposed method demonstrates a significant increase in detection accuracy for fault insulators, achieving 99% compared to 84% with the baseline YOLOv3 (Darknet-53 backbone). Even with augmentations like flip or foggier images, the method maintains high performance, showing values of 98%. This indicates enhanced robustness and effectiveness in detection without added computational costs.
The proposed framework achieves strong localization and recognition accuracy in fog-affected scenes, demonstrating robustness under visibility degradation. The integration of the image enhancement module (IEM) effectively restores visual clarity by amplifying texture and contrast features, enabling more reliable detection of both intact and defective insulators. When the augmentation strategy is further applied during training, the IEM + DN model demonstrates the most robust performance, maintaining precise bounding boxes and stable confidence scores even under severe atmospheric degradation. These visual results confirm that the combination of enhancement and augmentation substantially improves the detector’s resilience to real-world visibility challenges and enhances its generalization capability for UAV-based inspection tasks.

5. Discussion and Analysis

The experimental results demonstrate the effectiveness of the proposed framework in improving object detection accuracy and robustness across varying environmental conditions. Comprehensive evaluations on both UPID and SFID validate the model’s cross-domain generalization. Transferability to real fog conditions is assessed by testing on fog-impacted images from UPID, showing consistent performance improvements over the baseline DN. While pixel-level statistical matching between synthetic and real fog is not provided, the performance gains on actual fog images indicate effective transfer. On UPID, which captures natural environmental variations, the model achieved consistent, high-precision detection across diverse lighting and background conditions. Meanwhile, the evaluation on SFID highlights the model’s resilience in adverse visual environments, where synthetic fog was introduced to simulate reduced visibility. The proposed approach maintained stable performance without architectural modifications, thereby confirming the adaptability of the hybrid training strategy and the effectiveness of image-level enhancement filters in mitigating domain shift effects. Reported SOTA results are taken from publicly available publications, so variations in data splits and training settings may occur. The comparison aims to offer contextual performance references rather than a strictly controlled benchmark.
Quantitative comparisons further substantiate the advantages of the proposed model over existing object detection models. On UPID, the framework achieved a mean Average Precision (mAP) of 99.79%, a strong result relative to previously reported methods, and confirmed the model's ability to maintain detection accuracy across varying illumination and background complexity. On SFID, which evaluates performance under synthetic fog conditions, the proposed method achieves a Precision of 99.25%, a Recall of 99.11%, and an mAP@50 of 99.26%, demonstrating strong detection accuracy under fog-induced visibility degradation. It should be noted that comparisons with other methods are based on results reported in the literature and are provided for contextual reference rather than strict benchmarking under a unified experimental protocol. The slightly lower Recall relative to Precision suggests a conservative detection strategy: a small fraction of true objects may be missed, but the predictions that are made are almost always correct. Together, these results demonstrate that our approach prioritizes detection accuracy without substantially compromising coverage, making it particularly suitable for applications where false positives are costly or undesirable. The proposed method achieves real-time performance of 37 FPS (model-only) on a single GPU. HRGA-Net [43] prioritizes accuracy through complex feature aggregation, which generally results in lower inference speed and higher memory consumption, making it less suitable for real-time UAV deployment. Although FPS values for HRGA-Net are not consistently reported in the original publication, our method explicitly emphasizes the speed-accuracy balance that is critical for practical aerial inspection systems.
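To make this Precision/Recall trade-off concrete, the short sketch below computes both metrics from detection counts; the counts used here are hypothetical, chosen only to illustrate a conservative detector, and are not taken from the reported experiments.

# Illustrative only: hypothetical counts, not the paper's raw detection data.
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# A conservative detector: very few false positives, a few missed objects.
p, r = precision_recall(tp=992, fp=7, fn=9)
print(f"Precision = {p:.2%}, Recall = {r:.2%}")  # about 99.3% precision, 99.1% recall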
These results confirm that integrating adaptive enhancement filters with the hybrid training strategy significantly improves feature robustness under domain shifts. The proposed model shows greater tolerance to contrast degradation and color distortion caused by fog or uneven illumination, conditions that typically degrade the detection accuracy of standard models. Moreover, the model’s lightweight IEM design ensures these gains are achieved without additional inference cost, maintaining computational efficiency suitable for real-time UAV-based inspection tasks. Collectively, these outcomes verify that the proposed approach not only achieves higher detection precision but also enhances DG compared to conventional YOLO architectures. Hybrid training and standard geometric and photometric augmentations are essential parts of the training protocol and are not considered separate architectural contributions. To prevent over-fragmentation of results, no further experimental decomposition of these components is conducted.
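As a concrete illustration of the kind of pixel-wise filtering the enhancement module performs, the minimal sketch below applies gamma correction and contrast stretching with NumPy. It is a simplified stand-in: the function names and parameter values are chosen here for illustration only, whereas the actual IEM combines several filters (white balance, gamma, tone, contrast, sharpening, and defog) whose settings are determined adaptively by the framework.

import numpy as np

def gamma_correction(img: np.ndarray, gamma: float = 0.8) -> np.ndarray:
    """Pixel-wise gamma correction on an image normalized to [0, 1]."""
    return np.clip(img, 0.0, 1.0) ** gamma

def contrast_stretch(img: np.ndarray, alpha: float = 1.2) -> np.ndarray:
    """Scale pixel deviations from the per-channel mean to increase contrast."""
    mean = img.mean(axis=(0, 1), keepdims=True)
    return np.clip(mean + alpha * (img - mean), 0.0, 1.0)

# Example: brighten and add contrast to a (stand-in) foggy UAV frame before detection.
foggy = np.random.rand(480, 640, 3).astype(np.float32)
enhanced = contrast_stretch(gamma_correction(foggy, gamma=0.8), alpha=1.2)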

6. Conclusions

This study proposed an adaptive enhancement algorithm based on the YOLO framework for robust and efficient insulator defect detection under diverse and challenging environmental conditions. Building on our previously published model, the present work extended its evaluation to new domains using two complementary datasets. UPID captures real-world variations in illumination and background complexity, while SFID introduces controlled fog levels to simulate degraded visibility. Through these experiments, the proposed model was systematically assessed for domain generalization and detection accuracy across both natural and synthetic visual conditions.
To enhance generalization in adverse environments, a hybrid training strategy was adopted, integrating clear images with synthetically fogged images generated via the atmospheric scattering model. This approach exposed the model to multiple degradation levels during training, enabling more robust feature learning and effective DG without architectural modification. In addition, an image enhancement module, composed of pixel-wise and feature-level filters such as white balance, gamma correction, contrast, tone, sharpening, and defog filters, was incorporated to mitigate image degradation effects. These components contribute to enhanced feature representation and stable detection behavior, particularly under foggy or low-contrast conditions that commonly challenge conventional object detectors.
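For reference, the standard atmospheric scattering model used for fog synthesis is I(x) = J(x) t(x) + A (1 - t(x)), where J is the clear image, A is the atmospheric light, and the transmission t(x) = exp(-beta d(x)) depends on the scene depth d and the fog density beta. The sketch below applies this model with the distance from the image center used as a simple depth proxy; the depth proxy, beta values, and atmospheric light are illustrative assumptions and may differ from the exact settings used to generate the fogged training images in this work.

import numpy as np

def synthesize_fog(clear: np.ndarray, beta: float = 0.08, atmospheric_light: float = 0.9) -> np.ndarray:
    """Apply I = J * t + A * (1 - t) to a clear H x W x 3 image in [0, 1]."""
    h, w = clear.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Pseudo-depth: normalized distance from the image center, scaled to roughly [0, 50].
    d = np.sqrt((ys - h / 2) ** 2 + (xs - w / 2) ** 2)
    d = 50.0 * d / d.max()
    t = np.exp(-beta * d)[..., None]          # transmission map, broadcast over channels
    return np.clip(clear * t + atmospheric_light * (1.0 - t), 0.0, 1.0)

# Hybrid training exposes the model to several fog densities, e.g. light, medium, and heavy fog.
clear_img = np.random.rand(480, 640, 3).astype(np.float32)
foggy_variants = [synthesize_fog(clear_img, beta=b) for b in (0.05, 0.1, 0.2)]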
Experimental evaluations confirmed that the proposed method demonstrates strong robustness and competitive performance under fog-related domain shifts. On UPID, the model achieved high detection precision across all insulator categories, while on SFID, it maintained robust detection performance under fog-induced visibility degradation. Furthermore, the model achieved real-time inference at 38 FPS, exceeding the generally accepted 20 FPS threshold for real-time performance in visual detection systems. This demonstrates that the proposed method successfully balances detection accuracy with processing efficiency.
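FPS figures of this kind are typically obtained by averaging wall-clock time over a fixed test set. The sketch below shows one common way to separate end-to-end throughput (including image loading and preprocessing) from model-only throughput; the load_and_preprocess and model callables are placeholders for illustration, not the authors' actual pipeline.

import time

def measure_fps(paths, load_and_preprocess, model):
    """Return (end_to_end_fps, model_only_fps) averaged over all images in paths."""
    total_time, model_time = 0.0, 0.0
    for p in paths:
        t0 = time.perf_counter()
        x = load_and_preprocess(p)       # I/O, resizing, normalization
        t1 = time.perf_counter()
        _ = model(x)                     # forward pass and post-processing
        t2 = time.perf_counter()
        total_time += t2 - t0
        model_time += t2 - t1
    n = len(paths)
    return n / total_time, n / model_time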
Overall, whether evaluated on the challenging UPID with synthetic fog augmentation or the large-scale public SFID, the updated model delivered outstanding detection performance and strong domain generalization. These results confirm that the proposed framework provides a practical, efficient, and generalizable solution for automated insulator defect detection under complex environmental conditions. Future research will focus on expanding the model’s adaptability to additional weather effects, such as rain and low-light conditions, and on exploring multi-domain learning strategies to further enhance its generalization across diverse inspection environments.

Author Contributions

Methodology, A.K. and K.E.; conceptualization, A.K. and K.E.; software, A.K. and K.E.; formal analysis, A.K. and K.E.; validation, A.K. and K.E.; investigation, A.K. and K.E.; data curation, A.K. and K.E.; resources, A.K.; writing—original draft preparation, A.K. and K.E.; writing—review and editing, A.K. and K.E.; supervision, K.E.; visualization, A.K. and K.E.; project administration, A.K. and K.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Public datasets were used; references are available in this manuscript. All materials related to our study are publicly available: https://drive.google.com/drive/folders/1MK0VZnhgwrOysMBWppqZwnFEHp20Gcm8?usp=sharing (accessed on 20 January 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
YOLO: You Only Look Once
UAV: unmanned aerial vehicle
DG: domain generalization
DN: detector network
IEM: image enhancement module
UPID: Unifying Public Insulator Dataset
SFID: Synthetic Foggy Insulator Dataset

References

  1. Jain, N.; Bedi, J.; Anand, A.; Godara, S. A transfer learning architecture to detect faulty insulators in powerlines. IEEE Trans. Power Deliv. 2024, 39, 1002–1011. [Google Scholar] [CrossRef]
  2. Yang, Z.; Xu, Z.; Wang, Y. Bidirection-fusion-YOLOv3: An improved method for insulator defect detection using UAV image. IEEE Trans. Instrum. Meas. 2022, 71, 3201499. [Google Scholar] [CrossRef]
  3. Gouda, O.E.; Darwish, M.M.; Mahmoud, K.; Lehtonen, M.; Elkhodragy, T.M. Pollution severity monitoring of high voltage transmission line insulators using wireless device based on leakage current bursts. IEEE Access 2022, 10, 53713–53723. [Google Scholar] [CrossRef]
  4. Song, Z.; Huang, X.; Ji, C.; Zhang, Y. Deformable YOLOX: Detection and rust warning method of transmission line connection fittings based on image processing technology. IEEE Trans. Instrum. Meas. 2023, 72, 3238742. [Google Scholar] [CrossRef]
  5. Liu, Y.; Liu, D.; Huang, X.; Li, C. Insulator defect detection with deep learning: A survey. IET Gener. Transm. Distrib. 2023, 17, 3541–3558. [Google Scholar] [CrossRef]
  6. Zhou, M.; Li, B.; Wang, J.; He, S. Fault detection method of glass insulator aerial image based on the improved YOLOv5. IEEE Trans. Instrum. Meas. 2023, 72, TIM-2023. [Google Scholar] [CrossRef]
  7. Wang, X.; Jiang, P.; Li, Y.; Hu, M.; Gao, M.; Cao, D.; Ding, R. Progressive critical region transfer for cross-domain visual object detection. IEEE Trans. Intell. Transp. Syst. 2024, 25, 9427–9441. [Google Scholar] [CrossRef]
  8. Guo, Y.; Yu, H.; Xie, S.; Ma, L.; Cao, X.; Luo, X. Dsca: A dual semantic correlation alignment method for domain adaptation object detection. Pattern Recognit. 2024, 150, 110329. [Google Scholar] [CrossRef]
  9. Chen, Z.; He, Z.; Lu, Z.M. DEA-Net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE Trans. Image Process. 2024, 33, 1002–1015. [Google Scholar] [CrossRef]
  10. Cui, Y.; Knoll, A. Dual-domain strip attention for image restoration. Neural Netw. 2024, 171, 429–439. [Google Scholar] [CrossRef]
  11. He, Y.; Liu, Z. A feature fusion method to improve the driving obstacle detection under foggy weather. IEEE Trans. Transp. Electrif. 2021, 7, 2505–2515. [Google Scholar]
  12. Oza, P.; Sindagi, V.A.; Patel, V.M. Unsupervised domain adaptation of object detectors: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 4018–4040. [Google Scholar] [CrossRef] [PubMed]
  13. Wang, J.; Lan, C.; Liu, C.; Ouyang, Y.; Qin, T.; Lu, W.; Chen, Y.; Zeng, W.; Yu, P.S. Generalizing to unseen domains: A survey on domain generalization. IEEE Trans. Knowl. Data Eng. 2022, 35, 8052–8072. [Google Scholar] [CrossRef]
  14. Piva, F.J.; De Geus, D.; Dubbelman, G. Empirical generalization study: Unsupervised domain adaptation vs. domain generalization methods for semantic segmentation in the wild. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; IEEE: New York, NY, USA, 2023; pp. 499–508. [Google Scholar]
  15. Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive faster r-cnn for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2018; pp. 3339–3348. [Google Scholar]
  16. Zhou, Q.; Gu, Q.; Pang, J.; Lu, X.; Ma, L. Self-adversarial disentangling for specific domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 8954–8968. [Google Scholar]
  17. Wang, M.; Deng, W. Deep visual domain adaptation: A survey. Neurocomputing 2018, 312, 135–153. [Google Scholar] [CrossRef]
  18. Kariri, A.; Elleithy, K. AEA-YOLO: Adaptive Enhancement Algorithm for Challenging Environment Object Detection. AI 2025, 6, 132. [Google Scholar] [CrossRef]
  19. Zhao, W.; Xu, M.; Cheng, X.; Zhao, Z. An insulator in transmission lines recognition and fault detection model based on improved faster RCNN. IEEE Trans. Instrum. Meas. 2021, 70, 3112227. [Google Scholar] [CrossRef]
  20. Dong, C.; Zhang, K.; Xie, Z.; Shi, C. An improved cascade RCNN detection method for key components and defects of transmission lines. IET Gener. Transm. Distrib. 2023, 17, 4277–4292. [Google Scholar] [CrossRef]
  21. Chen, C.; Zheng, Z.; Xu, T.; Guo, S.; Feng, S.; Yao, W.; Lan, Y. Yolo-based uav technology: A review of the research and its applications. Drones 2023, 7, 190. [Google Scholar] [CrossRef]
  22. Yang, K.; Gao, S.; Yu, L.; Zhang, D.; Wang, J.; Song, C. A real-time Siamese network based on knowledge distillation for insulator defect detection of overhead contact lines. IEEE Trans. Instrum. Meas. 2024, 73, TIM-2024. [Google Scholar] [CrossRef]
  23. Zhao, Y.; Wu, J.; Chen, W.; Wang, Z.; Tian, Z.; Yu, F.R.; Leung, V.C. A small object real-time detection method for power line inspection in low-illuminance environments. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 3936–3950. [Google Scholar] [CrossRef]
  24. Fang, Y.; Yap, P.T.; Lin, W.; Zhu, H.; Liu, M. Source-free unsupervised domain adaptation: A survey. Neural Netw. 2024, 174, 106230. [Google Scholar] [CrossRef]
  25. Zhou, K.; Liu, Z.; Qiao, Y.; Xiang, T.; Loy, C.C. Domain generalization: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4396–4415. [Google Scholar] [CrossRef]
  26. Ju, M.; Ding, C.; Ren, W.; Yang, Y.; Zhang, D.; Guo, Y.J. IDE: Image dehazing and exposure using an enhanced atmospheric scattering model. IEEE Trans. Image Process. 2021, 30, 2180–2192. [Google Scholar] [CrossRef] [PubMed]
  27. Vs, V.; Poster, D.; You, S.; Hu, S.; Patel, V.M. Meta-uda: Unsupervised domain adaptive thermal object detection using meta-learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; IEEE: New York, NY, USA, 2022; pp. 1412–1423. [Google Scholar]
  28. Wang, T.; Zhang, X.; Yuan, L.; Feng, J. Few-shot adaptive faster r-cnn. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2019; pp. 7173–7182. [Google Scholar]
  29. Hnewa, M.; Radha, H. Multiscale domain adaptive yolo for cross-domain object detection. In 2021 IEEE International Conference on Image Processing (ICIP); IEEE: New York, NY, USA, 2021; pp. 3323–3327. [Google Scholar]
  30. Hnewa, M.; Radha, H. Integrated multiscale domain adaptive YOLO. IEEE Trans. Image Process. 2023, 32, 1857–1867. [Google Scholar] [CrossRef]
  31. Zhang, S.; Tuo, H.; Hu, J.; Jing, Z. Domain adaptive yolo for one-stage cross-domain detection. In Asian Conference on Machine Learning; PMLR: South San Francisco, CA, USA, 2021; pp. 785–797. [Google Scholar]
  32. Wei, J.; Wang, Q.; Zhao, Z. YOLO-G: Improved YOLO for cross-domain object detection. PLoS ONE 2023, 18, e0291241. [Google Scholar] [CrossRef] [PubMed]
  33. Yang, J.; Zou, H.; Zhou, Y.; Xie, L. Robust adversarial discriminative domain adaptation for real-world cross-domain visual recognition. Neurocomputing 2021, 433, 28–36. [Google Scholar] [CrossRef]
  34. Jing, M.; Meng, L.; Li, J.; Zhu, L.; Shen, H.T. Adversarial mixup ratio confusion for unsupervised domain adaptation. IEEE Trans. Multimed. 2022, 25, 2559–2572. [Google Scholar] [CrossRef]
  35. Saxena, D.; Cao, J. Generative adversarial networks (GANs) challenges, solutions, and future directions. ACM Comput. Surv. (CSUR) 2021, 54, 1–42. [Google Scholar] [CrossRef]
  36. Lee, J.; Hwang, K.I. YOLO with adaptive frame control for real-time object detection applications. Multimed. Tools Appl. 2022, 81, 36375–36396. [Google Scholar] [CrossRef]
  37. Gheorghe, C.; Duguleana, M.; Boboc, R.; Postelnicu, C. Analyzing real-time object detection with yolo algorithm in automotive applications: A review. Comput. Model. Eng. Sci. 2024, 141, 1939. [Google Scholar] [CrossRef]
  38. Zhang, Z.D.; Zhang, B.; Lan, Z.C.; Liu, H.C.; Li, D.Y.; Pei, L.; Yu, W.X. FINet: An insulator dataset and detection benchmark based on synthetic fog and improved YOLOv5. IEEE Trans. Instrum. Meas. 2022, 71, 3194909. [Google Scholar] [CrossRef]
  39. Chen, J.; Fu, Z.; Cheng, X.; Wang, F. A method for power lines insulator defect detection with attention feedback and double spatial pyramid. Electr. Power Syst. Res. 2023, 218, 109175. [Google Scholar] [CrossRef]
  40. Liu, Y.; Huang, X. Efficient cross-modality insulator augmentation for multi-domain insulator defect detection in UAV images. Sensors 2024, 24, 428. [Google Scholar]
  41. Li, J.; Zhou, H.; Lv, G.; Chen, J. A2MADA-YOLO: Attention Alignment Multiscale Adversarial Domain Adaptation YOLO for Insulator Defect Detection in Generalized Foggy Scenario. IEEE Trans. Instrum. Meas. 2025, 74, 3541814. [Google Scholar]
  42. Tian, Y.; Ahmad, R.B.; Abdullah, N.A.B. Accurate and efficient insulator maintenance: A DETR algorithm for drone imagery. PLoS ONE 2025, 20, e0318225. [Google Scholar] [CrossRef]
  43. Liao, Y.; Peng, C.; Li, X.; Wang, X.; Deng, Y. HRGA-Net: Hierarchical rotation Gaussian attention network for accurate insulator detection from UAV images. IEEE Trans. Power Deliv. 2025, 40, 2593–2610. [Google Scholar] [CrossRef]
  44. Wei, N.; Li, X.; Jin, J.; Chen, P.; Sun, S. Detecting insulator strings as linked chain structure in smart grid inspection. IEEE Trans. Ind. Inform. 2022, 19, 9019–9027. [Google Scholar]
  45. Liu, X.; Rao, Z.; Zhang, Y.; Zheng, Y. UAVs images based real-time insulator defect detection with transformer deep learning. In 2023 IEEE International Conference on Robotics and Biomimetics (ROBIO); IEEE: New York, NY, USA, 2023; pp. 1–6. [Google Scholar]
  46. Lu, Y.; Li, D.; Li, D.; Li, X.; Gao, Q.; Yu, X. A lightweight insulator defect detection model based on drone images. Drones 2024, 8, 431. [Google Scholar] [CrossRef]
  47. Shi, W.; Lyu, X.; Han, L. An Object Detection Model for Power Lines with Occlusions Combining CNN and Transformer. IEEE Trans. Instrum. Meas. 2025, 74, 3529073. [Google Scholar] [CrossRef]
Figure 1. Defective insulator (fault/broken).
Figure 2. The proposed method.
Figure 3. PPN diagram.
Figure 4. Example of the degradation strategy showing ten adverse-weather variants for one image.
Figure 5. Comparison of confusion matrices for the baseline DN and the proposed IEM + DN framework.
Figure 6. FPS of the proposed method.
Figure 7. Visualization results in real foggy images from UPID. Column 1: DN results; Column 2: IEM + DN results; Column 3: IEM + DN with augmentation.
Table 1. Effect of hybrid training and IEM on UPID.
DN | IEM | Hybrid | Recall | Precision | mAP@50 | mAP@50:95
90.21% | 97.34% | 92.94% | 67.20%
92.84% | 96.61% | 92.42% | 64.77%
99.68% | 96.38% | 99.79% | 81.09%
Table 2. Overall detection performance on UPID (metrics are overall dataset-level).
Mode/Filter | Defog | White Balance | Gamma | Tone | Contrast | Sharpen | Recall | Precision | mAP@50 | mAP@50:95
DN | 90.21% | 97.34% | 92.94% | 67.20%
+1 Filter | 90.61% | 97.27% | 93.09% | 67.52%
+2 Filters | 97.61% | 96.84% | 98.76% | 75.37%
+3 Filters | 93.87% | 97.20% | 96.60% | 75.48%
+3 Filters | 97.69% | 96.69% | 98.81% | 75.52%
+5 Filters | 99.60% | 96.31% | 99.74% | 80.82%
IEM + DN | 99.68% | 96.38% | 99.79% | 81.09%
Table 3. Comparing experimental results using UPID.
Model | mAP@50 | mAP@50:95
FINet-YOLOv5 [38] 2022 | 99.30% | -
AF-DSP method [39] 2023 | 97.10% | -
MMA multi-Domain [40] 2024 | 97.38% | 82.82%
A2MADA-YOLOv9 [41] 2025 | 95.00% | -
DETR algorithm [42] 2025 | 96.30% | -
HRGA-Net [43] 2025 | 99.56% | 84.11%
Proposed Method (ours) | 99.79% | 81.09%
Table 4. Overall detection performance comparison on SFID (metrics are overall dataset-level).
Model | Recall | Precision | mAP@50 | mAP@50:95
FINet [38] 2022 | 99.06% | 86.52% | 99.12% | 81.02%
ChainNet [44] 2022 | 99.23% | 89.71% | 99.42% | 83.37%
RT-DETR Transformer [45] 2023 | - | - | 99.50% | 74.50%
IDD-YOLO [46] 2024 | - | - | 99.40% | 87.20%
MMA multi-Domain [40] 2024 | - | - | 98.48% | 86.33%
ODNet [47] 2025 | 95.54% | 85.65% | 92.53% | 77.70%
HRGA-Net [43] 2025 | 99.71% | 91.33% | 99.82% | 85.29%
Proposed Method (ours) | 99.11% | 99.25% | 99.26% | 80.93%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
