LAI-YOLO: Towards Lightweight and Accurate Insulator Anomaly Detection via Selective Weighted Feature Fusion

Qu, Jianan; Zhu, Zhiliang; Jiang, Ziang; Wen, Congjie; Weng, Yijian

doi:10.3390/app151910780


by Jianan Qu ¹,², Zhiliang Zhu ¹,²,*, Ziang Jiang ¹,², Congjie Wen ¹,² and Yijian Weng ¹,²

¹ College of Electrical and Electronic Engineering, Wenzhou University, Wenzhou 325000, China

² Wenzhou Key Laboratory of Unmanned Intelligent Systems and Equipment, Wenzhou University, Wenzhou 325000, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(19), 10780; https://doi.org/10.3390/app151910780

Submission received: 4 September 2025 / Revised: 2 October 2025 / Accepted: 3 October 2025 / Published: 7 October 2025

(This article belongs to the Special Issue Advances in Wireless Networks and Mobile Communication)


Abstract

While insulator integrity is critical for power grid stability, prevailing detection algorithms often rely on computationally intensive models that are incompatible with resource-constrained edge devices such as unmanned aerial vehicles (UAVs). Key limitations, including redundant feature interference, inadequate sensitivity to small targets, rigid fusion weights, and sample imbalance, further restrict practical deployment. To address these problems, this study presents a lightweight insulator anomaly detection algorithm, LAI-YOLO. First, the SqueezeGate-C3k2 (SG-C3k2) module, equipped with an adaptive gating mechanism, is incorporated into the Backbone network to reduce redundant information during feature extraction. Second, we propose a High-level Screening–Feature Weighted Feature Pyramid Network (HS-WFPN) that replaces FPN+PAN and uses selective weighted feature fusion to enable dynamic cross-scale integration and improve small-target detection. Third, a reconstructed lightweight detection head coupled with Slide Weighted Focaler Loss (SWFocalerLoss) mitigates the performance degradation caused by sample imbalance. Finally, layer-adaptive sparsity for magnitude-based pruning (LAMP) reduces computational demands without sacrificing detection performance. Experimental results on our insulator anomaly dataset show that the improved model identifies insulator anomalies more effectively, with mAP@0.5 increasing from 88.2% to 91.1%, while model parameters and FLOPs are reduced to 45.7% and 53.9% of the baseline, respectively. This efficiency facilitates deployment on edge devices and highlights the method's considerable application potential.

Keywords:

YOLO; insulator; lightweight; anomaly detection; weighted feature fusion

1. Introduction

The rapid development of modern power grids demands more intelligent operation and maintenance of transmission equipment. Insulators, as core components of transmission lines, provide mechanical support and electrical insulation, and their operational integrity directly affects grid stability [1,2]. However, because they are exposed to harsh natural environments for long periods, insulators are particularly vulnerable to faults such as aging, defects, and dirt, induced by the combined effects of wind, ultraviolet radiation, extreme temperature variations, and persistent mechanical stress [3,4]. With the rapid evolution of UAV aerial photography, UAV inspection has replaced manual inspection as the primary detection method, making UAV-based insulator anomaly detection a critical research focus [5].

Current insulator detection methods fall into two main categories: traditional image processing methods and deep learning techniques. Traditional approaches generally rely on spatial morphology, texture, and color attributes [6]. For instance, Liao et al. [7] proposed a detector based on local features and spatial ordering, using multi-scale descriptors to represent features and training spatial order features for enhanced robustness. He et al. [8] designed a non-contact faulty insulator detection method based on infrared image matching, combining an improved SIFT (scale-invariant feature transform) with RANSAC (random sample consensus) for efficient inspection of outdoor ceramic insulator strings. Wei [9] devised a hybrid insulator defect identification model based on the randomized Hough transform, in which Canny edge curves are fitted with ellipses and defects are identified by comparing the fitted contours with the actual contours. While these traditional methods improve detection efficiency, their reliance on manual feature engineering restricts them to specific scenarios and limits generalization under complex real-world conditions.

Advances in deep learning theory and object detection algorithms have made deep learning-based insulator anomaly detection the dominant research focus. These methods are classified by the number of detection stages. The first type comprises two-stage detection algorithms, exemplified by R-CNN [10], Fast R-CNN [11], and Faster R-CNN [12], which have been studied extensively. Chen et al. [13] combined an EfficientNet backbone, depthwise separable convolutions, and transfer learning with a lightweight Faster R-CNN, reducing parameters while maintaining accuracy for efficient defect identification. Zhou et al. [14] augmented Mask R-CNN with attention mechanisms, rotation augmentation, and genetic-algorithm-based hyperparameter optimization to improve small-target identification. Wang et al. [15] designed multiscale local feature aggregation and global feature alignment modules to boost Faster R-CNN accuracy and mitigate sample scarcity and annotation challenges. Although these methods achieve excellent detection accuracy, their structural redundancy and slow inference create computational bottlenecks on UAV edge devices, hindering real-time deployment.

The second type comprises one-stage detectors, dominated by YOLO (You Only Look Once) [16] and SSD (Single-Shot MultiBox Detector) [17], which are valued for their faster inference. Akella et al. [18] integrated deep convolutional generative adversarial networks (DCGAN) and super-resolution generative adversarial networks (SRGAN) with YOLOv3 to improve the processing of low-resolution insulator imagery. Zeng et al. [19] introduced a lightweight Ghost-SSD architecture that replaces the VGG-16 backbone with GhostNet, incorporates the scSE attention mechanism and DIoU-NMS to improve recognition under occlusion, and integrates pruning and distillation to balance precision and efficiency. Zhang et al. [20] augmented YOLOv7 with ECA attention and PConv, employing the Normalized Wasserstein Distance to reduce small-target misdetections. Ji et al. [21] proposed an enhanced YOLO11 method that integrates the detection heads with adaptive spatial feature fusion to strengthen feature recognition, replaces the original Neck with a Bidirectional Feature Pyramid Network (BiFPN), and incorporates ShuffleNetV2 to markedly decrease the model's computational cost. Souza et al. [22] deployed an image-based power line inspection method that mounts a Hybrid-YOLO model and a ResNet-18 classifier on a drone to quickly identify and inspect faulty power system components in hard-to-reach areas. While one-stage detectors satisfy real-time demands better than two-stage approaches, they still face challenges in accuracy, model compression, vulnerability to redundant information, small-target sensitivity, rigid fusion weights, and sample imbalance.

To tackle these issues, we introduce LAI-YOLO, a lightweight insulator anomaly detection model derived from YOLO11, featuring the following primary contributions:

This paper introduces SqueezeGate-C3k2 (SG-C3k2), equipped with an adaptive gating mechanism, to alleviate the influence of ambient noise on feature extraction. The integration of a SqueezeGate (SG) layer facilitates the dynamic selection of essential feature channels, thus mitigating redundant input and enhancing the capacity of the model to identify small target attributes.
To resolve cross-scale information conflicts caused by fixed weights in conventional feature fusion, this paper presents the High-level Screening-Feature Weighted Fusion Pyramid Network (HS-WFPN). The trainable Weighted Select Feature Fusion (WSFF) module enables HS-WFPN to discern the contribution discrepancies of multi-scale features, optimize the fusion of deep semantic and low-level spatial data, and significantly mitigate the challenges associated with detecting small objects.
To better balance model performance and computational complexity, this paper restructures the detection head by replacing standard convolutions with depthwise separable convolutions and embedding an Efficient Channel Attention (ECA) mechanism. Additionally, Slide Weighted Focaler Loss (SWFocalerLoss) aims to alleviate the impact of class imbalance on accuracy and bolster the detection’s robustness.
This paper utilizes layer-adaptive sparsity for magnitude-based pruning (LAMP) to remove superfluous channels, thereby optimizing the model for the constrained hardware resources of edge devices like UAVs.

2. Related Works

This section reviews the key algorithmic frameworks and architectural components that serve as the direct foundation for our proposed LAI-YOLO model. By discussing the baseline YOLO11 structure, feature fusion strategies, and lightweight convolution designs, we establish the necessary technical context and highlight the specific areas within these general methodologies that our work seeks to advance.

2.1. YOLO11 Algorithm

YOLO11 is the newest official iteration of the YOLO series from Ultralytics. Its architecture builds on the successes of its predecessors (e.g., YOLOv5 and YOLOv8) and retains the three fundamental components of the YOLO series: Backbone, Neck, and Head. The principal structural components are convolution blocks (Conv2d + BatchNorm + SiLU), C3k2, SPPF, and C2PSA; the whole architecture is depicted in Figure 1. Compared with YOLOv8, YOLO11 introduces the C3k2 module, an evolution of the C2f module, which improves feature extraction efficiency through parallel multi-scale convolutions and adjustable kernel configurations. This refinement reduces memory consumption while improving gradient flow during training. At the end of the Backbone, YOLO11 adds the C2PSA module, which employs multi-head attention and feedforward networks to enhance the perception of salient features and thus improve feature learning efficiency. Moreover, replacing the standard convolutions in the classification branch of the Head with depthwise separable convolutions reduces computational complexity without compromising accuracy, striking a favorable balance between efficiency and cost.

In the field of insulator anomaly detection, YOLO11 serves as a powerful baseline due to its efficient architecture. However, its fixed feature extraction and fusion strategies are not optimally tailored to the specific challenges of this domain, such as the prevalence of small anomalies and the complex, cluttered backgrounds of aerial transmission line imagery. These limitations motivate our subsequent enhancements to better capture insulator-specific features.

2.2. High-Level Screening–Feature Fusion Pyramid Networks

As deep learning-based object detection algorithms advance, model accuracy has improved significantly. However, effective multi-scale feature fusion remains a primary obstacle to further performance gains. To address this, Chen et al. [23] designed the High-level Screening–Feature Fusion Pyramid Network (HS-FPN) for multi-scale feature fusion. As Figure 2 illustrates, this architecture comprises two primary components: the Feature Selection Module and the Select Feature Fusion (SFF) module. After feature extraction, the Feature Selection Module uses the Coordinate Attention (CA) mechanism to evaluate the feature maps at each scale and screen the information in each channel; a standard convolution then aligns the channel dimensions across scales.

SFF uses high-level semantic features as weights to screen key information from low-level features, thereby achieving selective feature integration. Given the input high-level features $f_{high} \in \mathbb{R}^{C \times H \times W}$ and low-level features $f_{low} \in \mathbb{R}^{C \times H_{1} \times W_{1}}$, SFF first upsamples the high-level features with a transposed convolution (T-Conv) and bilinear interpolation (BL) to match the dimensions of the low-level features, yielding $f_{att} \in \mathbb{R}^{C \times H_{1} \times W_{1}}$. The CA mechanism then converts the dimension-aligned high-level features into attention weights, which adaptively filter the low-level features. Finally, the filtered low-level features are merged with the high-level features, improving feature expression and yielding $f_{out} \in \mathbb{R}^{C \times H_{1} \times W_{1}}$. Equations (1) and (2) describe the fusion process in SFF:

f_{att} = BL(\mathrm{T\text{-}Conv}(f_{high})),

(1)

f_{out} = f_{low} \times CA (f_{att}) + f_{att} .

(2)
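To make the broadcasting in Equation (2) concrete, the following pure-Python sketch fuses one already-aligned pair of features. It assumes the upsampling of Equation (1) has been done, so $f_{att}$ and $f_{low}$ share the shape $C \times H_{1} \times W_{1}$, and it replaces CA with a hypothetical stand-in (`channel_gate`: global average pooling plus a sigmoid). It illustrates the fusion arithmetic only and is not the HS-FPN reference implementation.

```python
import math


def channel_gate(f_att):
    """Hypothetical stand-in for CA: global average pooling followed by a sigmoid.

    Real Coordinate Attention also encodes position along the height and width
    axes; this simplification keeps only one weight per channel.
    """
    return [1.0 / (1.0 + math.exp(-sum(ch) / len(ch))) for ch in f_att]


def sff_fuse(f_low, f_att):
    """Equation (2): f_out = f_low * CA(f_att) + f_att.

    Both inputs are lists of C channels, each a flat list of H1*W1 values;
    f_att is assumed to be already upsampled to the low-level resolution.
    """
    if len(f_low) != len(f_att):
        raise ValueError("channel counts must match")
    weights = channel_gate(f_att)
    return [
        [lo * w + hi for lo, hi in zip(ch_low, ch_att)]
        for w, ch_low, ch_att in zip(weights, f_low, f_att)
    ]
```

Each channel of $f_{low}$ is scaled by a single weight derived from the matching channel of $f_{att}$, and $f_{att}$ is then added back as a residual path.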

The multi-scale feature fusion capability of HS-FPN is particularly relevant for insulator anomaly detection. Insulator images captured by UAVs often contain targets at varying scales: entire insulator strings represent large targets, while specific defects constitute small targets. Effectively integrating high-level semantic information with low-level spatial details is essential. While HS-FPN provides a basic framework for this integration, its fixed fusion strategy may not optimally balance the contributions of different scales, especially for subtle anomalies, leading us to propose a more adaptive weighting mechanism in Section 3.2.

2.3. Characteristics of GSConv

GSConv [24] is a lightweight convolution design, as depicted in Figure 3. A standard convolution first compresses the input features to half the number of output channels. A depthwise convolution then learns local feature details while keeping the channels independent. The two sets of features are concatenated along the channel axis, exploiting their complementary characteristics to strengthen the discriminative power of the representation. Finally, a simple and effective channel shuffle interleaves the standard-convolution features with the depthwise-convolution features, so that the output representation approaches that of a standard convolution. Because edge devices impose stringent resource constraints, the lightweight design of GSConv is particularly well suited to anomaly detection on such platforms, where reducing FLOPs and parameters without significant accuracy loss is paramount for real-time inference. We therefore use GSConv as a fundamental building block of our neck design to construct an accurate insulator anomaly detection model suited to edge deployment.
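The parameter savings and the channel shuffle described above can be checked with a few lines of Python. This is a counting sketch under stated assumptions: bias and BatchNorm parameters are ignored, the depthwise kernel size `dw_k` is assumed to be 5 × 5 (the text does not state it), and `channel_shuffle` is a ShuffleNet-style two-group shuffle whose exact ordering may differ from the reference GSConv code.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias and BatchNorm ignored)."""
    return c_in * c_out * k * k


def gsconv_params(c_in, c_out, k, dw_k):
    """Weights of a GSConv-style block: a standard conv to c_out // 2 channels,
    then a depthwise conv (one dw_k x dw_k filter per channel) on that output."""
    half = c_out // 2
    return conv_params(c_in, half, k) + half * dw_k * dw_k


def channel_shuffle(channels, groups=2):
    """ShuffleNet-style shuffle: view the channel list as (groups, n // groups),
    transpose, and flatten, so channels from the two halves alternate."""
    n = len(channels)
    if n % groups:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


# Channel labels: "s*" come from the standard conv, "d*" from the depthwise conv.
print(channel_shuffle(["s0", "s1", "d0", "d1"]))  # ['s0', 'd0', 's1', 'd1']
print(conv_params(64, 128, 3), gsconv_params(64, 128, 3, 5))  # 73728 38464
```

With these example sizes, the GSConv-style block needs roughly 52% of the weights of a standard 3 × 3 convolution.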

3. Proposed Method and Experimental Setup

The LAI-YOLO architecture is shown in Figure 4. Compared with the baseline YOLO11n, this algorithm incorporates the SqueezeGate-C3k2 (SG-C3k2) module into the Backbone, enhancing the model's capacity to discern feature details through an adaptive gating mechanism. The High-level Screening–Feature Weighted Feature Pyramid Network (HS-WFPN) in the Neck employs selective weighted feature fusion to integrate cross-scale features appropriately, thereby avoiding the information conflicts associated with conventional multi-scale fusion. The Head is restructured into a lighter and more efficient LA-Head and coupled with an improved Slide Weighted Focaler Loss (SWFocalerLoss), jointly optimizing model performance and complexity while alleviating sample imbalance. Finally, we introduce layer-adaptive sparsity for magnitude-based pruning (LAMP) [25] to eliminate superfluous channels in the model, greatly reducing the parameters and computational complexity. The following sections explain the SG-C3k2 module, HS-WFPN, the Head improvements, LAMP channel pruning, and the experimental setup in detail.

3.1. SG-C3k2 Module

The adaptable and well-considered design of the C3k2 module in YOLO11 strikes a good balance between computational efficiency and model accuracy. However, in complex and variable detection scenarios such as insulator detection, its fixed channel interaction fails to adequately suppress redundant information, obstructing the model's ability to acquire essential features. This study proposes SG-C3k2, which incorporates an adaptive feature selection layer to boost the model's capacity to distinguish target characteristics.

As depicted in Figure 5, SG-C3k2 differs from the original Bottleneck by embedding a SqueezeGate (SG) layer within its SG-Bottleneck. This layer first uses a 1 × 1 convolution to condense the input into a lower-dimensional representation, which is then aggregated spatially into a per-channel descriptor. A sigmoid function maps this descriptor into channel-wise gating signals, which recalibrate the original features channel by channel. This lets the model prioritize essential feature channels and suppress redundant information, markedly improving feature extraction efficiency. Functionally, the SqueezeGate mechanism differs from attention modules such as ECA or CA in that it performs channel-wise gating without spatially aware encoding. To further optimize the SG-Bottleneck, we use depthwise convolution instead of standard convolution. Research [26] indicates that depthwise convolution significantly reduces parameters and FLOPs, lowering model complexity. Its channel-independent mechanism captures fine-grained spatial information within each channel, improving feature learning efficiency and preserving detailed features.
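A minimal pure-Python sketch of the gating steps above follows. The text does not say how the lower-dimensional descriptor is mapped back to one gate per input channel, so `w_expand` is a hypothetical expansion matrix; the depthwise convolution of the SG-Bottleneck is omitted, and all names are illustrative rather than taken from the authors' code.

```python
import math


def conv1x1(x, weight):
    """1 x 1 convolution without bias.

    x: list of C_in channels, each a flat list of H*W values.
    weight: C_out rows, each holding C_in coefficients.
    """
    n = len(x[0])
    return [[sum(w * x[c][p] for c, w in enumerate(row)) for p in range(n)] for row in weight]


def squeeze_gate(x, w_reduce, w_expand):
    """Channel gating in the spirit of the SG layer (hypothetical sketch).

    w_reduce: (C / r) x C matrix of the 1 x 1 reduction conv.
    w_expand: C x (C / r) matrix mapping the descriptor back to C gates;
              this expansion step is an assumption, not stated in the text.
    Returns the recalibrated channels and the gate values.
    """
    z = conv1x1(x, w_reduce)                      # condense to C / r channels
    desc = [sum(ch) / len(ch) for ch in z]        # spatial aggregation (global average pooling)
    logits = [sum(w * d for w, d in zip(row, desc)) for row in w_expand]
    gates = [1.0 / (1.0 + math.exp(-v)) for v in logits]  # sigmoid: one gate per channel
    # Recalibrate each original input channel with its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)], gates
```

Since the 1 × 1 convolution and average pooling are both linear, their order does not affect the descriptor in this sketch.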

3.2. High-Level Screening–Feature Weighted Feature Pyramid Network

As shown in Figure 6a, transmission insulator images often contain small abnormal targets. In the early stages of the Backbone (Figure 6b), low-level features retain high-resolution spatial information, which is critical for localizing small targets. As the network deepens, the feature maps pass through many convolutions: deeper feature maps gain abstract semantic information but lose essential location information. The comparison of the feature maps in Figure 6c,d illustrates this trend.

Specifically, high-level feature maps encompass abundant semantic information yet exhibit low spatial resolution, whereas low-level feature maps preserve intricate local features but lack semantic coherence. The feature fusion module of HS-FPN combines cross-scale features using an attention-guided element-wise addition method; nevertheless, this direct addition operation neglects the differing significance of various features. The disparity in feature weight distribution results in conflicts among features of varying scales, hence impairing performance in small object detection. This study introduces a High-level Screening–Feature Weighted Fusion Pyramid Network (HS-WFPN) to overcome this issue. HS-WFPN builds on the advantages of the HS-FPN framework by introducing a Weighted Select Feature Fusion (WSFF) module. This design strengthens the transmission path of small object features while enabling adaptive weighting for cross-scale feature fusion, effectively alleviating information conflicts arising from multi-scale feature discrepancies.

WSFF abandons the original bilinear interpolation and retains transposed convolution as the sole upsampling method. Bilinear interpolation uses fixed weights to interpolate neighboring pixels; it is computationally efficient but struggles to adapt to the geometric diversity of power transmission insulators. This inflexibility blurs edges in the reconstructed features, as illustrated in the red dashed region of Figure 7c. Furthermore, Coordinate Attention (CA) refines the low-level features under high-level semantic guidance, and the refined features are then merged with the high-level features via weighted fusion. The details are as follows:

Suppose the set of input features is $\{F_{i} \in \mathbb{R}^{C \times H \times W}\}_{i=1}^{N}$, which includes both high-level features and low-level features calibrated by the CA module. The CA mechanism operates channel-wise, enhancing or suppressing individual channels based on semantic relevance. The WSFF module then assigns a scalar weight $\alpha_{i}$ to each entire feature map $F_{i}$, reflecting its relative importance at the scale level. The trainable weight parameters are denoted as $W = [w_{1}, w_{2}, \dots, w_{N}]^{T} \in \mathbb{R}^{N}$, with each $w_{i}$ initialized to 1.0 and optimized by backpropagation. The fusion process can be expressed as follows:

α_{i} = \frac{ϕ (w_{i})}{\sum_{j = 1}^{N} ϕ (w_{j}) + ϵ} .

(3)

In Equation (3), $\alpha_{i}$ is the normalized weight coefficient used to adjust the overall contribution of the feature maps at different scales; $\alpha_{i} \in (0, 1)$ whenever all $w_{j}$ are positive, as they are at initialization. $\epsilon$ is a smoothing factor that guarantees a non-zero denominator and maintains gradient stability. $\phi(\cdot)$ is the Swish activation function, which mitigates weight polarization through its smooth nonlinearity. Its specific formulation is

ϕ (w_{i}) = w_{i} \cdot σ (w_{i}),

(4)

where $\sigma(\cdot)$ is the sigmoid function, which provides a smooth nonlinear mapping. The output features $F_{fusion}$ are finally obtained by scaling each feature map by its weight and summing the results, as shown in Equation (5).

F_{fusion} = \sum_{i = 1}^{N} α_{i} ⊙ F_{i} .

(5)

Because $\alpha_{i}$ is a scalar, it is shared across all channels of its feature map $F_{i}$ and acts as a uniform scale factor. Channel-wise discrimination is already handled by the preceding CA mechanism, so the WSFF module focuses on adaptively balancing the contributions of different feature scales. By learning the relative importance of features through trainable weights, WSFF dynamically adjusts the fusion process and thus alleviates the feature conflicts inherent in conventional fixed-weight fusion.
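Equations (3) to (5) translate directly into code. The sketch below assumes the feature maps already share the same shape and flattens each to a list; the value of $\epsilon$ (`eps=1e-4`) is an assumption, since the text does not give it. Note that Swish is negative for negative inputs, so the normalized weights stay in $(0, 1)$ only while the raw weights remain positive.

```python
import math


def swish(w):
    """Equation (4): phi(w) = w * sigmoid(w)."""
    return w / (1.0 + math.exp(-w))


def wsff_weights(w, eps=1e-4):
    """Equation (3): alpha_i = phi(w_i) / (sum_j phi(w_j) + eps). eps is an assumed value."""
    phi = [swish(wi) for wi in w]
    denom = sum(phi) + eps
    return [p / denom for p in phi]


def wsff_fuse(features, w, eps=1e-4):
    """Equation (5): weighted sum of N same-shaped feature maps.

    features: N feature maps, each flattened to a list of C*H*W values.
    Each alpha_i is one scalar shared by every element of its map.
    """
    alphas = wsff_weights(w, eps)
    size = len(features[0])
    return [sum(a * f[k] for a, f in zip(alphas, features)) for k in range(size)]
```

With both weights at their initial value of 1.0, the fusion starts as a near-equal average of the inputs; training then shifts the balance toward the more useful scale.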

Additionally, to balance model performance and computational efficiency, a lightweight VoVGSCSPC [24] module based on GSConv constitutes the output stage of the HS-WFPN network, as shown in Figure 8. By integrating GSConv into the GSBottleneck, the module enhances nonlinear representation capacity and feature information reuse. Depthwise convolution is applied to residual connections, significantly reducing computational costs while improving gradient flow propagation, thereby substantially boosting detection performance.

3.3. Improvements of the Head

3.3.1. LA-Head

In the insulator anomaly detection task, the detection head utilizes extracted and fused features to predict target locations and anomaly categories. To further reduce model complexity, we design a lightweight detection head named LA-Head, as shown in Figure 9. By substituting standard convolutions with depthwise convolutions in both the regression and classification branches, computing costs are markedly decreased. An ECA [27] attention module is integrated in parallel within both branches to mitigate accuracy degradation resulting from the lightweight architecture. This module leverages global average pooling and one-dimensional convolutions to enhance channel-wise interactions, strengthening global feature representation without substantially increasing parameters, thus balancing accuracy and efficiency.
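The ECA step in LA-Head can be sketched as follows. The adaptive kernel-size rule ($\gamma = 2$, $b = 1$) follows the original ECA-Net formulation and is an assumption about the configuration used here; zero padding and the absence of a bias are likewise assumptions.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size from ECA-Net: |(log2 C + b) / gamma|, rounded up to odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(x, kernel):
    """Efficient Channel Attention on x: list of C channels, each a flat list of H*W values.

    1. global average pooling -> one descriptor per channel
    2. 1-D convolution across neighbouring channels (zero padding, no bias)
    3. sigmoid -> per-channel weights that rescale the input
    """
    desc = [sum(ch) / len(ch) for ch in x]
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + desc + [0.0] * pad
    logits = [sum(kernel[j] * padded[c + j] for j in range(k)) for c in range(len(desc))]
    weights = [1.0 / (1.0 + math.exp(-v)) for v in logits]
    return [[w * v for v in ch] for w, ch in zip(weights, x)]
```

For 64 channels the rule gives a kernel of 3, and for 256 channels a kernel of 5, so each channel weight depends on only a few neighbouring channels and the module adds only a handful of parameters.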

3.3.2. Slide Weighted Focaler Loss

The loss function of YOLO11 comprises two components: a classification component based on BCEWithLogitsLoss, and a localization component that combines Distribution Focal Loss (DFL) with CIoU. Current mainstream regression losses focus on the geometric relationships between boxes, which improves regression to some extent, but they do not address the bias caused by imbalanced distributions of sample sizes. In practical detection tasks, the proportions of differently sized samples vary significantly, leading to imbalanced gradient contributions during training. This imbalance compromises the model's ability to learn multi-scale representations, causing missed detections of certain anomalous targets and significantly reducing recognition accuracy. To address sample imbalance, we introduce a joint loss function named Slide Weighted Focaler Loss (SWFocalerLoss), which integrates a Focaler-CIoU [28] regression mechanism with a Slide [29] weighted classification strategy in a unified loss framework. This design corrects size-related imbalance while refining bounding box predictions, improving model robustness and accuracy across target sizes.

To rectify the sample imbalance caused by size discrepancies in detection tasks, SWFocaler-IoU reformulates IoU through a linear interval mapping. This allows the model to prioritize the sample intervals that matter most for the target size distribution, improving regression efficiency and detection performance. The reconstruction formula is as follows:

\mathrm{IoU}^{\mathrm{SWFocaler}} = \begin{cases} 0, & \mathrm{IoU} < d \\ \dfrac{\mathrm{IoU} - d}{u - d}, & d \leq \mathrm{IoU} \leq u \\ 1, & \mathrm{IoU} > u \end{cases}

(6)

where $\mathrm{IoU}^{\mathrm{SWFocaler}}$ is the reconstructed IoU, and the thresholds $d$ and $u$ are constrained to the interval $[0, 1]$. Adjusting $d$ and $u$ makes $\mathrm{IoU}^{\mathrm{SWFocaler}}$ concentrate on the regression samples within the core interval, and the loss function is defined as follows:

L_{\mathrm{SWFocaler\text{-}IoU}} = 1 - \mathrm{IoU}^{\mathrm{SWFocaler}}.

(7)

Integrating SWFocaler-IoU into CIoU yields the loss function $L_{\mathrm{SWFocaler\text{-}CIoU}}$:

L_{\mathrm{SWFocaler\text{-}CIoU}} = L_{\mathrm{CIoU}} + \mathrm{IoU} - \mathrm{IoU}^{\mathrm{SWFocaler}},

(8)

where $L_{\mathrm{CIoU}}$ is the CIoU loss.
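Equations (6) to (8) can be combined into a single box-level loss. The sketch below uses the standard CIoU definition (IoU, normalized centre distance, and aspect-ratio term) for axis-aligned boxes in `(x1, y1, x2, y2)` format; the thresholds `d` and `u` are left as arguments because their values are not stated here, and the small `eps` guard is an assumption.

```python
import math


def iou_and_ciou_loss(box, gt, eps=1e-7):
    """Return (IoU, CIoU loss) for two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Intersection and union.
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    w1, h1 = box[2] - box[0], box[3] - box[1]
    w2, h2 = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # Squared centre distance over the squared diagonal of the smallest enclosing box.
    rho2 = ((box[0] + box[2] - gt[0] - gt[2]) ** 2 + (box[1] + box[3] - gt[1] - gt[3]) ** 2) / 4
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(w2 / (h2 + eps)) - math.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou, 1 - iou + rho2 / c2 + alpha * v


def swfocaler_iou(iou, d, u):
    """Equation (6): map IoU linearly from [d, u] onto [0, 1], clamping outside."""
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)


def swfocaler_ciou_loss(box, gt, d, u):
    """Equation (8): L_CIoU + IoU - IoU^SWFocaler."""
    iou, ciou_loss = iou_and_ciou_loss(box, gt)
    return ciou_loss + iou - swfocaler_iou(iou, d, u)
```

For a prediction shifted half a box width from its target, IoU is 1/3; with d = 0.2 and u = 0.8 the reconstructed IoU is about 0.22, so Equation (8) adds roughly 0.11 to the plain CIoU loss.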

To counteract classification bias induced by sample imbalance, SWFocalerLoss implements a Slide weighting strategy. This strategy establishes a weight function that prioritizes hard-to-classify instances during training. The formula for weight can be articulated as follows:

f(x_{i}) = \begin{cases} 1, & \mathrm{IoU}^{\mathrm{SWFocaler}} \leq \mu - 0.1 \\ e^{1 - \mu}, & \mu - 0.1 < \mathrm{IoU}^{\mathrm{SWFocaler}} < \mu \\ e^{1 - \mathrm{IoU}^{\mathrm{SWFocaler}}}, & \mathrm{IoU}^{\mathrm{SWFocaler}} \geq \mu \end{cases}

(9)

Here, $\mu$ is the mean of $\mathrm{IoU}^{\mathrm{SWFocaler}}$ across all samples and serves as the threshold separating hard from easy samples. Samples whose $\mathrm{IoU}^{\mathrm{SWFocaler}}$ lies close to $\mu$ are ambiguous and therefore difficult to classify. To encourage the model to learn from them, SWFocalerLoss amplifies their weights exponentially, while the remaining easy samples receive weights equal or close to 1, which reduces the model's over-concentration on simple cases. The classification weights are applied to the original BCEWithLogitsLoss as follows:

L_{\mathrm{SWFocaler}} = f(X) \odot L_{\mathrm{BCEWithLogits}}.

(10)
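Equations (9) and (10) can be sketched per sample as below. The numerically stable BCE-with-logits form is standard; averaging the weighted per-sample losses and computing $\mu$ over the samples passed in are assumptions about the reduction, which the text does not specify.

```python
import math


def slide_weight(x, mu):
    """Equation (9): weight for a sample whose IoU^SWFocaler is x, given the mean mu."""
    if x <= mu - 0.1:
        return 1.0
    if x < mu:
        return math.exp(1.0 - mu)
    return math.exp(1.0 - x)


def bce_with_logits(z, y):
    """Numerically stable binary cross-entropy on a raw logit z with target y."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def swfocaler_cls_loss(logits, targets, ious):
    """Equation (10): per-sample BCE scaled by the Slide weight, then averaged (assumed)."""
    mu = sum(ious) / len(ious)  # mean IoU^SWFocaler over the given samples
    losses = [slide_weight(x, mu) * bce_with_logits(z, y) for z, y, x in zip(logits, targets, ious)]
    return sum(losses) / len(losses)
```

Samples in the band just below $\mu$ (and at $\mu$ itself) receive the largest weight, $e^{1-\mu}$; the weight decays toward 1 as the IoU approaches 1, and samples at or below $\mu - 0.1$ keep a weight of 1.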

3.4. LAMP Channel Pruning

Pruning is a widely adopted model compression technique for neural networks. It assesses the importance of structural components and removes redundant ones, substantially reducing computational overhead with little or no loss of accuracy. To further reduce the complexity of our improved model, we adopt layer-adaptive sparsity for magnitude-based pruning (LAMP), which quantifies channel significance within each layer using a LAMP score and then prunes the less critical channels to compress the model efficiently.

Specifically, it first computes the LAMP score based on the weights of the channels in each layer to assess their relative importance. The formula is as follows:

\mathrm{Score}(u; W) = \frac{(W[u])^{2}}{\sum_{v \geq u} (W[v])^{2}},

(11)

where $W$ is the flattened one-dimensional weight vector of the layer, indexed so that its entries are sorted in ascending order of magnitude (i.e., $|W[u]| \leq |W[v]|$ whenever $u < v$); $W[u]$ is the channel weight term mapped to index $u$; $\sum_{v \geq u} (W[v])^{2}$ is the cumulative sum of squared weights with indices $v \geq u$; and $\mathrm{Score}(u; W)$ is the LAMP score of channel $u$ in that layer.
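Equation (11) can be computed per layer with one sort and a suffix sum. In this sketch each entry of `weights` stands for one prunable unit; treating it as a per-channel magnitude (for example, the L2 norm of that channel's filter) is our assumption about how channel-level scores are formed, and the helper `prune_masks` (a hypothetical name) applies a single global threshold to the scores.

```python
def lamp_scores(weights):
    """Equation (11) for one layer; returns one score per unit in the original order.

    weights: one magnitude per prunable unit (assumed here to be a per-channel
    value such as the L2 norm of that channel's filter).
    """
    n = len(weights)
    order = sorted(range(n), key=lambda i: weights[i] ** 2)  # ascending magnitude
    squares = [weights[i] ** 2 for i in order]
    # suffix[r] = sum of squares from rank r to the end, i.e. sum_{v >= u} W[v]^2
    suffix = [0.0] * (n + 1)
    for r in range(n - 1, -1, -1):
        suffix[r] = suffix[r + 1] + squares[r]
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        scores[idx] = squares[rank] / suffix[rank] if suffix[rank] > 0 else 0.0
    return scores


def prune_masks(layer_weights, threshold):
    """Per-layer keep masks: True keeps a unit, False marks it for zeroing out."""
    return [[score >= threshold for score in lamp_scores(w)] for w in layer_weights]
```

Because each denominator includes its own numerator, every score lies in $(0, 1]$ and the largest-magnitude unit of each layer always scores exactly 1; a single global threshold therefore yields different sparsity levels in different layers without ever removing a layer entirely.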

After the scores are computed layer by layer, channels whose LAMP scores fall below a predefined threshold are pruned: their weights are zeroed out, excluding them from model inference. The pruned model then undergoes fine-tuning to mitigate performance degradation. The overall pruning ratio is controlled through the computational compression ratio $Pr$, expressed as follows:

Pr = \frac{C_{pre}}{C_{pst}},

(12)

where $C_{pre}$ is the amount of computation before pruning, and $C_{pst}$ is the amount of computation after pruning.
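Equation (12) fixes the target, but the text does not detail how the pruning loop reaches it. The sketch below is a hypothetical greedy procedure: units are removed in ascending LAMP-score order until $C_{pre}/C_{pst} \geq Pr$, with a made-up per-unit `cost` standing in for each channel's share of the computation.

```python
def prune_to_ratio(units, pr):
    """Hypothetical greedy loop: drop the lowest-LAMP-score units until C_pre / C_pst >= Pr.

    units: list of (name, lamp_score, cost) tuples with positive costs.
    Returns the names removed and the compression ratio actually achieved.
    """
    if pr < 1.0:
        raise ValueError("Pr must be at least 1")
    c_pre = sum(cost for _, _, cost in units)
    remaining = sorted(units, key=lambda unit: unit[1])  # ascending LAMP score
    removed = []
    c_pst = c_pre
    # Always keep at least one unit so C_pst stays positive.
    while len(remaining) > 1 and c_pre / c_pst < pr:
        name, _, cost = remaining.pop(0)
        removed.append(name)
        c_pst -= cost
    return removed, c_pre / c_pst


def compute_reduction(pr):
    """Fraction of computation removed when C_pst = C_pre / Pr exactly."""
    return 1.0 - 1.0 / pr
```

Because units are removed whole, the achieved ratio can overshoot the target. As a rule of thumb, $Pr = 1.2$ corresponds to removing $1 - 1/1.2 \approx 16.7\%$ of the computation.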

3.5. Dataset

The insulator anomaly dataset consists of images from the Chinese Power Line Insulator Dataset (CPLID), supplemented by publicly available insulator images gathered from the internet. It covers three anomaly categories, namely aging insulator, insulator defect, and dirty insulator, with representative samples illustrated in Figure 10. The original dataset of 1306 images was split into training, validation, and test sets at a 7:2:1 ratio to allow an unbiased evaluation. Because abnormal insulator samples are scarce in real-world scenarios, data augmentation was employed to enhance the model's robustness and prevent overfitting. Crucially, augmentation was applied only to the original training set. The augmented training samples were generated with a combination of techniques: vertical flip (probability of 80%), horizontal flip (probability of 80%), translation (random shift of ±15 pixels along both the X and Y axes), scaling (random scaling to between 80% and 95% of the original size), rotation (random rotation between −30° and +30°), brightness adjustment (brightness values multiplied by a factor between 1.2 and 1.5), and Gaussian blur (fixed sigma of 3.0). The final augmented training set contained 3132 images, while the validation and test sets consisted only of original images. All images were resized to 640 × 640 pixels for model input.
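The split and augmentation settings above can be written down as a small Python configuration. This is a sketch under stated assumptions: the text does not say whether the split was random or how counts were rounded, nor how the listed transforms were combined per image, so the hypothetical `split_7_2_1` shuffles with a fixed seed and rounds, and `sample_augmentation` simply draws every parameter independently. Applying the parameters to pixels would require an image library and is omitted.

```python
import random

# Augmentation parameters as listed in the text; applied to training images only.
AUGMENT = {
    "vflip_p": 0.8,
    "hflip_p": 0.8,
    "translate_px": 15,          # +/- pixels on both axes
    "scale": (0.80, 0.95),       # fraction of the original size
    "rotate_deg": (-30.0, 30.0),
    "brightness": (1.2, 1.5),    # multiplicative factor
    "blur_sigma": 3.0,           # fixed Gaussian sigma
}


def split_7_2_1(items, seed=0):
    """Shuffle once and split into train/val/test at 7:2:1 (seed and rounding are assumptions)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * 0.7)
    n_val = round(len(items) * 0.2)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def sample_augmentation(rng):
    """Draw one set of augmentation parameters for a single training image."""
    a = AUGMENT
    return {
        "vflip": rng.random() < a["vflip_p"],
        "hflip": rng.random() < a["hflip_p"],
        "dx": rng.randint(-a["translate_px"], a["translate_px"]),
        "dy": rng.randint(-a["translate_px"], a["translate_px"]),
        "scale": rng.uniform(*a["scale"]),
        "angle": rng.uniform(*a["rotate_deg"]),
        "brightness": rng.uniform(*a["brightness"]),
        "blur_sigma": a["blur_sigma"],
    }
```

With this rounding, 1306 images split into 914, 261, and 131 images; these counts are illustrative and are not reported in the text.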

Additionally, the 'defect' category (Figure 10a), which includes flaws such as cracks and breakages, consists predominantly of targets that are small relative to the entire insulator. Accurately detecting these small defect instances is a significant challenge, so the performance metrics for this category are reported separately in our ablation and comparative studies.

3.6. Experimental Environment

To ensure a fair comparison, all models use the same input size of 640 × 640 pixels and a batch size of 16. The same hyperparameters were applied in all phases, including standard training and fine-tuning. Stochastic Gradient Descent (SGD) is used as the optimizer with an initial learning rate of 0.01, a momentum of 0.937, and a weight decay of 0.0005. Mosaic augmentation is used to improve model generalization. The complete experimental platform environment is presented in Table 1.
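For reproduction, the stated hyperparameters map naturally onto Ultralytics training arguments. The dictionary below is a sketch: the key names are standard Ultralytics `train()` arguments, but treating Mosaic as always on (`mosaic=1.0`) is an assumption, and the dataset file in the commented usage line is hypothetical. Values the text does not give, such as the number of epochs, are deliberately omitted.

```python
# Hyperparameters stated in the text, keyed by Ultralytics train() argument names.
# Epochs, dataset path, and device are not given in this section and are left out.
TRAIN_ARGS = {
    "imgsz": 640,
    "batch": 16,
    "optimizer": "SGD",
    "lr0": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "mosaic": 1.0,  # assumption: "Mosaic augmentation is utilized" read as always on
}

# Hypothetical usage (requires the ultralytics package and a dataset YAML):
# from ultralytics import YOLO
# YOLO("yolo11n.yaml").train(data="insulator.yaml", **TRAIN_ARGS)
```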

3.7. Evaluation Metrics

This paper evaluates algorithms along two dimensions: model performance and complexity. For performance evaluation, we employ recall, precision, and mean average precision (mAP), calculated as shown in Equations (13) to (16).

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{13}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{14}$$

$$AP = \int_{0}^{1} P \, \mathrm{d}R \tag{15}$$

$$mAP = \frac{1}{n} \sum_{i=1}^{n} AP_{i} \tag{16}$$

Here, $TP$ is the number of positive samples correctly identified as positive; $FN$ is the number of positive samples erroneously classified as negative; $FP$ is the number of negative samples mistakenly classified as positive by the model; in Equation (15), precision $P$ is integrated over recall $R$; $AP_{i}$ is the average precision for class $i$; and $n$ is the total number of classes, so mAP is the mean of the AP values across all categories. mAP@0.5 denotes the mAP at an IoU threshold of 0.5, whereas mAP@0.5:0.95 denotes the mAP averaged over IoU thresholds from 0.5 to 0.95. A higher mAP indicates better model performance. All performance measures reported in this study are computed on the test set.
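Equations (13) to (16) can be made concrete with a short sketch. Here the integral in Equation (15) is approximated by all-point interpolation: precision is made monotonically non-increasing before the area is summed over the recall steps. This is one common choice; the paper does not state which approximation its toolchain uses. Matching detections to ground-truth boxes at a given IoU threshold is assumed to have been done already, and the function names are illustrative.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Equations (13)-(14): precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(is_tp: list[bool], num_gt: int) -> float:
    """Equation (15), approximated by all-point interpolation.

    is_tp: detections of one class sorted by descending confidence; True if the
    detection was matched to a ground-truth box at the chosen IoU threshold.
    num_gt: number of ground-truth boxes of that class.
    """
    if num_gt == 0:
        return 0.0
    recalls, precisions = [0.0], [1.0]
    tp = fp = 0
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area of each rectangle where recall increases.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls)))


def mean_average_precision(ap_per_class: list[float]) -> float:
    """Equation (16): mAP is the arithmetic mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


print(precision_recall(tp=8, fp=2, fn=2))  # (0.8, 0.8)
print(average_precision([True, True, False, True], num_gt=4))  # 0.6875
```

For mAP@0.5, the matching uses an IoU threshold of 0.5. For mAP@0.5:0.95, the same computation is repeated at each threshold in that range and the resulting mAP values are averaged.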

Additionally, this research uses three metrics to evaluate the computational complexity of the model: the number of model parameters (Params), the number of floating-point operations (FLOPs) required for one forward pass, and the model size.
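The sketch below shows how Params and FLOPs are counted for a single standard 2-D convolution. It assumes the common convention of counting FLOPs as twice the number of multiply-accumulate operations (MACs). By default it ignores bias, and it always ignores normalization and activation costs. The paper does not name its counting tool, so numbers from other tools may differ, for example by reporting MACs instead of FLOPs. The depthwise case illustrates why grouped and depthwise convolutions are favored in lightweight designs.

```python
def conv2d_cost(c_in: int, c_out: int, k: int, h_out: int, w_out: int,
                groups: int = 1, bias: bool = False) -> tuple[int, int]:
    """Return (params, FLOPs) for one k x k Conv2d layer.

    Each output element needs (c_in / groups) * k * k multiply-accumulates (MACs);
    FLOPs are counted here as 2 * MACs (one multiply plus one add).
    """
    weights_per_filter = (c_in // groups) * k * k
    params = weights_per_filter * c_out + (c_out if bias else 0)
    macs = weights_per_filter * c_out * h_out * w_out
    return params, 2 * macs


# 3x3 convolution, 16 -> 32 channels, on an 80 x 80 output map.
print(conv2d_cost(16, 32, 3, 80, 80))             # (4608, 58982400)
# Depthwise 3x3 convolution on 32 channels (groups = channels), same map size.
print(conv2d_cost(32, 32, 3, 80, 80, groups=32))  # (288, 3686400)
```

FLOPs grow with the output resolution while Params do not, so the same layer costs far more on high-resolution feature maps. This is one reason the two metrics are reported separately.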

4. Results and Discussion

4.1. LAMP Result Analysis

This research assesses the efficacy of the LAMP technique on the enhanced model by evaluating performance across various $P_r$ values. Table 2 summarizes the primary experimental findings.

Table 2 indicates that with $P_r$ = 1.2, mAP@0.5 improves by 0.5% relative to the unpruned model (from 90.6% to 91.1%), the largest gain among all settings (matched only by $P_r$ = 1.1), and mAP@0.5:0.95 rises from 58.2% to 61.2%. At the same time, parameters, FLOPs, and model size are reduced by 19.8%, 17.1%, and 16.5%, respectively. Although network pruning usually trades performance for efficiency, this slight gain may reflect a regularization effect. As $P_r$ increases further and the computational overhead of the model decreases, precision, recall, and mAP show a generally downward trend, suggesting that excessive pruning degrades the model's feature representation. In summary, at the optimal pruning ratio, this approach compresses the model while preserving, and even slightly improving, detection performance.
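The compression figures quoted above can be checked against the rounded Table 2 entries for $P_r$ = 1.0 (unpruned) and $P_r$ = 1.2. With these rounded values, the parameter reduction comes out at 19.7% rather than 19.8%, presumably because the text uses unrounded parameter counts. The sketch also computes the ratio of FLOPs before and after pruning, about 1.21. This appears consistent with $P_r$ acting as a target FLOPs speed-up ratio, but that reading is an inference from the table, not something the text states.

```python
# Rounded values from Table 2: unpruned model (Pr = 1.0) and selected model (Pr = 1.2).
unpruned = {"params_M": 1.47, "flops_G": 4.1, "size_MB": 3.16}
pruned = {"params_M": 1.18, "flops_G": 3.4, "size_MB": 2.64}


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction in percent: (before - after) / before * 100."""
    return (before - after) / before * 100.0


reductions = {key: round(reduction_pct(unpruned[key], pruned[key]), 1) for key in unpruned}
flops_ratio = unpruned["flops_G"] / pruned["flops_G"]

print(reductions)             # {'params_M': 19.7, 'flops_G': 17.1, 'size_MB': 16.5}
print(round(flops_ratio, 2))  # 1.21
```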

Figure 11 compares channel counts before and after pruning at $P_r$ = 1.2. Dark blue marks the pruned redundant channels and light blue the retained channels, confirming that LAMP pruning effectively removes redundant structures from the model.
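To illustrate how LAMP decides what to remove, the sketch below computes the layer-adaptive magnitude-based pruning score of Lee et al. [25]. Within each layer, entries are sorted by magnitude, and each entry's squared magnitude is divided by the sum of the squared magnitudes of all entries at or above it in that order. The largest entry in every layer therefore scores 1, and scores become comparable across layers. Applying the score to per-channel magnitudes (for example, the L2 norm of each output filter) and pruning the globally lowest-scoring channels is an assumption about how the score is adapted to channel pruning. The layer names and magnitudes below are hypothetical.

```python
def lamp_scores(magnitudes: list[float]) -> list[float]:
    """LAMP score of every entry in one layer.

    With the layer's magnitudes sorted in ascending order, the score of the
    u-th entry is m_u^2 / sum_{v >= u} m_v^2, so the largest (non-zero) entry scores 1.
    """
    order = sorted(range(len(magnitudes)), key=lambda i: magnitudes[i])
    scores = [0.0] * len(magnitudes)
    tail = 0.0  # running sum of squared magnitudes from the largest entry downward
    for idx in reversed(order):
        sq = magnitudes[idx] ** 2
        tail += sq
        scores[idx] = sq / tail if tail > 0 else 0.0
    return scores


def lowest_lamp_channels(layers: dict[str, list[float]], n_prune: int) -> list[tuple[str, int]]:
    """Return the n_prune (layer, channel) pairs with the globally lowest scores."""
    candidates = [
        (score, name, idx)
        for name, mags in layers.items()
        for idx, score in enumerate(lamp_scores(mags))
    ]
    candidates.sort()
    return [(name, idx) for _, name, idx in candidates[:n_prune]]


# Hypothetical per-channel L2 norms for two layers with very different scales.
layers = {"conv_a": [0.1, 0.5, 1.0], "conv_b": [2.0, 2.1, 0.2]}
print(lowest_lamp_channels(layers, n_prune=3))
# [('conv_b', 2), ('conv_a', 0), ('conv_a', 1)]
```

Although its raw norm of 0.2 is larger than conv_a's 0.1, channel 2 of conv_b is ranked first for removal. Each score is normalized within its own layer, so no layer is stripped simply because its weights happen to be small in absolute terms.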

4.2. Comparative Analysis of Various Feature Fusion Networks

To further validate the multi-scale feature aggregation capability of HS-WFPN, we replaced the Neck with several representative lightweight feature fusion networks under identical experimental settings. The results are presented in Table 3.

Compared with HS-FPN, HS-WFPN improves recall by 0.1%, mAP@0.5 by 0.8%, and mAP@0.5:0.95 by 0.6% with the same parameter count (1.85M) and slightly lower FLOPs (5.5G vs. 5.7G), indicating that the WSFF module in HS-WFPN strengthens feature fusion. Among the other fusion methods, HS-WFPN outperforms Bi-FPN on every accuracy metric while using fewer parameters and FLOPs. Although its complexity is marginally higher than that of CCFF (1.85M vs. 1.80M parameters, 5.5G vs. 5.3G FLOPs), it delivers considerably better detection, with gains of 3.7%, 2.4%, and 1.0% in recall, mAP@0.5, and mAP@0.5:0.95, respectively. Relative to the baseline FPN+PAN neck, HS-WFPN raises mAP@0.5 from 88.2% to 89.3% while reducing parameters from 2.58M to 1.85M and FLOPs from 6.3G to 5.5G, at the cost of lower precision (89.4% vs. 94.0%). Overall, HS-WFPN achieves a better balance between detection performance and model complexity.
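To illustrate learnable fusion weights, the following sketch implements the fast normalized fusion used by Bi-FPN in EfficientDet [30]. Each input receives a learnable scalar weight, kept non-negative by a ReLU and normalized by the sum of all weights. It is shown only as a familiar reference point for weighted fusion; the internal design of the WSFF module in HS-WFPN is described in Section 3.2 and may differ. Feature maps are flattened to Python lists to keep the example dependency-free.

```python
def fast_normalized_fusion(features: list[list[float]], raw_weights: list[float],
                           eps: float = 1e-4) -> tuple[list[float], list[float]]:
    """Fuse same-shaped (flattened) feature maps with non-negative learnable weights.

    out = sum_i (w_i / (eps + sum_j w_j)) * x_i, with w_i = max(0, raw_i).
    Returns the fused map and the normalized weights actually applied.
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature map")
    w = [max(0.0, r) for r in raw_weights]  # ReLU keeps every weight non-negative
    total = sum(w) + eps
    norm = [wi / total for wi in w]
    fused = [sum(n * f[k] for n, f in zip(norm, features)) for k in range(len(features[0]))]
    return fused, norm


# Two resized, same-shape inputs (e.g. a lateral map and an upsampled deeper map).
fused, weights = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0], eps=0.0)
print(weights, fused)  # [0.25, 0.75] [2.5, 3.5]
```

Unlike plain addition or concatenation, where every input contributes equally, the weights are trained with the rest of the network. The model can therefore learn how much each scale should contribute at each fusion node.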

4.3. Comparative Analysis of Various Loss Functions

To benchmark its performance, the proposed SWFocalerLoss is systematically compared with existing state-of-the-art loss functions in Table 4.

Compared with the original Focaler-CIoU, SWFocalerLoss improves recall by 2.5%, mAP@0.5 by 1.2%, and mAP@0.5:0.95 by 1.3%, supporting the effectiveness of its weighted classification strategy. Compared with the other mainstream losses, SWFocalerLoss has the lowest precision in Table 4 (87.6%) but the highest recall (86.1%), mAP@0.5 (89.2%), and mAP@0.5:0.95 (61.3%). Its higher recall means fewer missed detections, which matters in insulator inspection, where an overlooked anomaly is costlier than a false alarm. Because SWFocalerLoss achieves the best results on both recall and mAP, the metrics most relevant to this task, the experiments indicate that it is effective at improving detection performance.
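The two ingredients named in SWFocalerLoss can be sketched from their original publications. The first is the Focaler-IoU remapping of Zhang and Zhang [28], which linearly rescales IoU on an interval [d, u] to focus regression on a chosen range of sample difficulty. The second is the slide weighting function of YOLO-FaceV2 [29], which up-weights samples whose IoU lies near the mean IoU μ. How LAI-YOLO combines them, and its values of d and u, are defined in Section 3.3.2. The code below reproduces only the published component formulas and is not the authors' implementation; the parameter values in the example are illustrative.

```python
import math


def focaler_iou(iou: float, d: float, u: float) -> float:
    """Focaler-IoU: map IoU linearly from [d, u] onto [0, 1], clamping outside."""
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)


def focaler_ciou_loss(ciou_loss: float, iou: float, d: float, u: float) -> float:
    """Focaler-CIoU as defined in Focaler-IoU: L_CIoU + IoU - IoU_focaler."""
    return ciou_loss + iou - focaler_iou(iou, d, u)


def slide_weight(iou: float, mu: float) -> float:
    """Slide weighting from YOLO-FaceV2; mu is the mean IoU of the samples.

    Samples well below mu (IoU <= mu - 0.1) keep weight 1, samples just below mu
    get e^(1 - mu), and samples at or above mu get e^(1 - IoU), which falls to 1 at IoU = 1.
    """
    if iou <= mu - 0.1:
        return 1.0
    if iou < mu:
        return math.exp(1.0 - mu)
    return math.exp(1.0 - iou)


# Illustrative values only: interval [d, u] = [0.0, 0.95], mean IoU mu = 0.6.
print(focaler_iou(0.5, d=0.0, u=0.95))
print([round(slide_weight(x, mu=0.6), 3) for x in (0.3, 0.55, 0.6, 1.0)])
# [1.0, 1.492, 1.492, 1.0]
```

The slide weights peak around the mean IoU, where samples sit on the boundary between easy and hard. Weighting the classification loss this way is one plausible route to the higher recall reported in Table 4.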

4.4. Ablation Experiments

We performed an ablation study on LAI-YOLO to evaluate the contribution of each component, with the results presented in Table 5.

Introducing SG-C3k2, HS-WFPN, LA-Head, or SWFocalerLoss individually raises mAP@0.5 by 0.6%, 1.1%, 0.5%, and 1.0%, respectively. SG-C3k2, HS-WFPN, and LA-Head also reduce parameters and FLOPs on their own, demonstrating their advantages in balancing accuracy and efficiency. When the three structural modules are combined, the model reaches 90.0% mAP@0.5 with 1.47M parameters and 4.1G FLOPs (versus 2.58M and 6.3G for the baseline YOLO11n), confirming their structural compatibility, although mAP@0.5:0.95 dips slightly to 58.5%. Adding SWFocalerLoss to this architecture raises mAP@0.5 by a further 0.6% to 90.6%, showing that the loss adapts well to the enhanced design. Finally, applying LAMP channel pruning to the optimized model increases mAP@0.5 and mAP@0.5:0.95 by 0.5% and 3.0%, respectively, while cutting parameters to 1.18M and FLOPs to 3.4G. LAI-YOLO thus improves accuracy while substantially streamlining the architecture, which greatly eases deployment on resource-limited edge devices such as UAVs.

4.5. Comparative Analysis of Various Lightweight YOLO Models

To comprehensively assess the effectiveness of the improved model, we compared it with representative lightweight YOLO models under identical hyperparameters and settings; the results are presented in Table 6. LAI-YOLO achieves 61.2% mAP@0.5:0.95, a 2.2% improvement over the baseline YOLO11n, and its mAP@0.5 improves by 2.9%, validating its performance in the insulator anomaly detection scenario. It also attains the highest precision among all models (95.0%), and its recall (83.1%) is 2.9% higher than that of the baseline and second only to YOLOv8n (83.6%). The reduction in missed detections relative to the baseline better satisfies the high-recall requirements of practical insulator anomaly detection tasks.

LAI-YOLO is also the most lightweight model in Table 6, with the fewest parameters, the lowest FLOPs, and the smallest model size. Compared with the widely adopted YOLOv8n, its parameter count and FLOPs decrease by 56.2% and 50%, respectively, significantly alleviating storage and computational demands on edge devices. By balancing detection accuracy against computational cost, LAI-YOLO offers a robust and efficient solution for automated insulator anomaly detection.

FPS was benchmarked on an NVIDIA RTX 3090 GPU with a batch size of 1 and a 640 × 640 input resolution to simulate real-time deployment. Under this setting, LAI-YOLO runs at 49.8 FPS on this desktop GPU, which meets the inspection requirements of UAVs. This result also highlights the trade-off between computational complexity and inference speed: although LAI-YOLO has the fewest FLOPs and parameters of all models in Table 6, its FPS is lower than that of YOLOv3t, YOLOv5n, YOLOv6n, YOLOv8n, and the baseline YOLO11n. This apparent discrepancy arises because theoretical FLOPs do not correlate perfectly with practical latency on parallel hardware such as GPUs. The components introduced in LAI-YOLO, such as the attention mechanisms and the adaptive feature fusion module, increase representational power at the cost of more sequential operations, which reduces parallel efficiency. This reflects a deliberate design choice: we prioritize parameter efficiency and high accuracy, which are critical for storage- and power-constrained edge devices, while keeping a frame rate suitable for real-time UAV inspection. The substantial reductions in model size and FLOPs make LAI-YOLO particularly well suited to embedded deployment.
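The batch-size-1 FPS protocol described above can be sketched as a small timing harness. This is an illustration under stated assumptions, not the authors' benchmark script. `infer` stands for any callable that runs the model on one frame, a few warm-up calls are excluded from timing, and `sync` is a hook that should block until queued GPU work has finished. With PyTorch on CUDA, `sync` would typically be `torch.cuda.synchronize`, because kernel launches are asynchronous.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object], frames: Sequence[object],
                warmup: int = 10, sync: Callable[[], None] = lambda: None) -> float:
    """Frames per second at batch size 1: one frame per call, timed end to end."""
    for frame in frames[:warmup]:  # warm-up runs (caches, kernel selection) are not timed
        infer(frame)
    sync()
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    sync()  # make sure all queued work has finished before stopping the clock
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


def frame_time_ms(fps: float) -> float:
    """Per-frame latency in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps


# Stand-in workload; a real benchmark would pass the detector's forward call.
fps = measure_fps(lambda frame: sum(range(10_000)), frames=list(range(200)))
print(f"{fps:.1f} FPS")
print(round(frame_time_ms(49.8), 1))  # 20.1
```

Converted to latency, 49.8 FPS corresponds to about 20.1 ms per frame. Because this harness times the whole call sequentially, a model with fewer FLOPs but more small sequential operations can still show a longer per-frame time, which is the effect discussed above.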

Figure 12 plots the performance metrics of the different YOLO models during training. The pre-pruning LAI-YOLO (blue curve) converges faster and surpasses the other models on several key metrics, indicating superior training efficiency and detection performance.

4.6. Visualization and Heatmap Comparative Experiments

We used Grad-CAM++ to visually compare the attention regions of LAI-YOLO with those of the YOLO11n and YOLOv8n baselines. The activation heatmaps taken before the detection head reveal distinct behavioral differences, as shown in Figure 13.

In the heatmaps, the activations of YOLO11n are widely dispersed and fail to fully cover the abnormal regions. YOLOv8n is susceptible to background interference, with weak activation contrast between critical and non-critical areas; this indicates insufficient discriminative feature learning during training and results in missed detections and false positives. In contrast, LAI-YOLO produces concentrated responses precisely localized on insulator anomalies, with a more uniform activation intensity. Owing to this more accurate focus, LAI-YOLO detects abnormal areas with higher confidence and stability, enabling accurate identification of aging insulators, insulator defects, and dirty insulators.
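The sketch below shows how a single-layer Grad-CAM++ heatmap is formed from a layer's activations $A^k$ and the gradients of a target score with respect to them. It uses the closed-form pixel weights common in open-source Grad-CAM++ implementations, which assume an exponential class score so that the higher-order derivatives reduce to powers of the first-order gradient. For a detector, the target score (for example, the confidence of one predicted box) is left abstract here. Real pipelines also upsample and normalize the map before overlaying it on the image.

```python
def grad_cam_pp(activations: list[list[list[float]]],
                gradients: list[list[list[float]]],
                eps: float = 1e-7) -> list[list[float]]:
    """Grad-CAM++ heatmap for one image from one layer.

    activations[k][i][j] is A^k_ij; gradients[k][i][j] is dY/dA^k_ij for the
    chosen target score Y. Returns an H x W map (not yet upsampled or normalized).
    """
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for a_k, g_k in zip(activations, gradients):
        sum_a = sum(sum(row) for row in a_k)  # spatial sum of A^k
        weight = 0.0
        for i in range(h):
            for j in range(w):
                g = g_k[i][j]
                if g == 0.0:
                    continue  # alpha is taken as 0 where the gradient vanishes
                alpha = g ** 2 / (2 * g ** 2 + sum_a * g ** 3 + eps)
                weight += alpha * max(g, 0.0)  # only positive gradients contribute
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * a_k[i][j]
    # Final ReLU keeps only regions that increase the target score.
    return [[max(v, 0.0) for v in row] for row in cam]


# One 1 x 2 channel whose left cell both fires and has a positive gradient.
heatmap = grad_cam_pp([[[1.0, 0.0]]], [[[1.0, 0.0]]])
print(heatmap)
```

A channel contributes to the heatmap only where its gradients are positive, so background regions that fire strongly but do not raise the target score are suppressed. A concentrated heatmap therefore suggests that the model relies on the anomaly region itself.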

5. Conclusions

This research presents LAI-YOLO, a lightweight algorithm based on YOLO11, to address the issues in edge-deployed insulator anomaly detection, including computational redundancy, limited sensitivity to small targets, cross-scale feature conflicts, and sample imbalance. The core of our approach lies in a series of synergistic architectural contributions: an adaptive gating mechanism (SG-C3k2) in the Backbone for dynamic feature recalibration; a weighted feature fusion neck (HS-WFPN) that enables adaptive cross-scale integration to enhance small-target detection; a reconstructed detection head (LA-Head) coupled with a tailored loss function (SWFocalerLoss) to balance performance and complexity while mitigating sample imbalance; and finally, LAMP pruning for substantial model compression. Experiments confirm that LAI-YOLO achieves a strong balance between accuracy and efficiency, reaching 91.1% mAP@0.5 (a 2.9% improvement over the baseline) with only 1.18M parameters and 3.4G FLOPs.

The development of LAI-YOLO has direct implications for modern power grid maintenance. By providing an accurate and computationally efficient anomaly detection model, this work supports the broader deployment of fully automated UAV-based inspection systems. Such systems can help shift grid maintenance from reactive or schedule-based practices to proactive, condition-based maintenance. Frequent, low-cost, and comprehensive inspections that do not endanger human workers can enable earlier detection of faulty insulators, helping to prevent costly power outages, minimize downtime, and ultimately enhance the resilience and operational safety of the transmission network.

While the results are promising, this research has several limitations. The inference performance reported here is based on desktop GPU benchmarks, which may not fully reflect the model's real-time capabilities when deployed on a UAV platform. In a complete UAV pipeline, factors such as image acquisition time, data transmission latency, and the computational capacity of embedded hardware could affect the overall frame rate and responsiveness. Future work will therefore integrate LAI-YOLO into a UAV system for end-to-end performance evaluation, including real-time image processing under varying flight altitudes, lighting conditions, and transmission scenarios. We also plan to extend the comparison to additional representative lightweight detectors (e.g., EfficientDet-Lite) and emerging architectures (e.g., YOLO12) to further verify the generalization ability of LAI-YOLO. Moreover, because the current dataset has limited variability, the generalization of LAI-YOLO under more diverse and challenging conditions, such as fog, night, and substantially different UAV altitudes, requires further investigation. Finally, this evaluation focused on the deployment constraints of edge devices; how LAI-YOLO scales on more powerful GPU platforms, and how it compares with larger state-of-the-art models when computational resources are less constrained, remain open questions.

Author Contributions

Conceptualization, J.Q. and Z.Z.; methodology, J.Q. and Z.Z.; software, J.Q.; validation, J.Q., Z.J. and C.W.; formal analysis, J.Q., Z.J. and Y.W.; investigation, Z.Z.; resources, Z.Z.; writing—original draft preparation, J.Q.; writing—review and editing, Z.Z.; visualization, J.Q. and Z.J.; supervision, Z.Z.; project administration, J.Q. and Z.Z.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the “Vanguard Leading Geese+X” technology project of Zhejiang Province under Grant 2025C01033 and the scientific research project of Wenzhou under Grant ZF2022003.

Data Availability Statement

GitHub repository link: https://github.com/QuJianan729/LAI-YOLO (accessed on 1 October 2025).

Acknowledgments

The authors gratefully acknowledge the meticulous participation of all laboratory members and the insightful recommendations of the anonymous reviewers of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Liu, J.; Hu, M.M.; Dong, J.Y.; Lu, X. Summary of insulator defect detection based on deep learning. Electr. Power Syst. Res. 2023, 224, 109688.
2. Liu, Y.; Liu, D.C.; Huang, X.B.; Li, C.J. Insulator defect detection with deep learning: A survey. IET Gener. Transm. Distrib. 2023, 17, 3541–3558.
3. Cao, Y.; Xu, H.; Su, C.; Yang, Q. Accurate Glass Insulators Defect Detection in Power Transmission Grids Using Aerial Image Augmentation. IEEE Trans. Power Deliv. 2023, 38, 956–965.
4. Tao, X.; Zhang, D.P.; Wang, Z.H.; Liu, X.L.; Zhang, H.Y.; Xu, D. Detection of Power Line Insulator Defects Using Aerial Images Analyzed With Convolutional Neural Networks. IEEE Trans. Syst. Man-Cybern.-Syst. 2020, 50, 1486–1498.
5. Ren, Z.H.; Fang, F.Z.; Yan, N.; Wu, Y. State of the Art in Defect Detection Based on Machine Vision. Int. J. Precis. Eng.-Manuf.-Green Technol. 2022, 9, 661–691.
6. Zhai, Y.J.; Chen, R.; Yang, Q.; Li, X.X.; Zhao, Z.B. Insulator Fault Detection Based on Spatial Morphological Features of Aerial Images. IEEE Access 2018, 6, 35316–35326.
7. Liao, S.L.; An, J.B. A Robust Insulator Detection Algorithm Based on Local Features and Spatial Orders for Aerial Images. IEEE Geosci. Remote Sens. Lett. 2015, 12, 963–967.
8. He, H.Y.; Hu, Z.; Wang, B.Z.; Luo, D.S.; Lee, W.J.; Li, J.M. A Contactless Zero-Value Insulators Detection Method Based on Infrared Images Matching. IEEE Access 2020, 8, 133882–133889.
9. Wei, Z.x. Composite Insulator Defect Identification and Quantitative Method Based on Random Hough Transform Ellipse Detection. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2022; Volume 2170.
10. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
11. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
12. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
13. Chen, Y.; Deng, C.; Sun, Q.; Wu, Z.; Zou, L.; Zhang, G.; Li, W. Lightweight detection methods for insulator self-explosion defects. Sensors 2024, 24, 290.
14. Zhou, M.; Wang, J.; Li, B. ARG-Mask RCNN: An Infrared Insulator Fault-Detection Network Based on Improved Mask RCNN. Sensors 2022, 22, 4720.
15. Wang, Y.; Qu, Z.; Hu, Z.; Yang, C.; Huang, X.; Zhao, Z.; Zhai, Y. Cross-Domain Multilevel Feature Adaptive Alignment R-CNN for Insulator Defect Detection in Transmission Lines. IEEE Trans. Instrum. Meas. 2025, 74, 6001112.
16. Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677.
17. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector; Springer: Cham, Switzerland, 2016.
18. Akella, R.; Gunturi, S.K.; Sarkar, D. Enhancing Power Line Insulator Health Monitoring with a Hybrid Generative Adversarial Network and YOLO3 Solution. Tsinghua Sci. Technol. 2024, 29, 1796–1809.
19. Zeng, B.; Zhou, Y.; He, D.; Zhou, Z.; Hao, S.; Yi, K.; Li, Z.; Zhang, W.; Xie, Y. Research on Lightweight Method of Insulator Target Detection Based on Improved SSD. Sensors 2024, 24, 5910.
20. Zhang, J.; Wei, X.; Zhang, L.; Yu, L.; Chen, Y.; Tu, M. YOLO v7-ECA-PConv-NWD Detects Defective Insulators on Transmission Lines. Electronics 2024, 12, 3969.
21. Ji, Y.; Zhang, D.; He, Y.; Zhao, J.; Duan, X.; Zhang, T. Improved YOLO11 Algorithm for Insulator Defect Detection in Power Distribution Lines. Electronics 2025, 14, 1201.
22. Souza, B.J.; Stefenon, S.F.; Singh, G.; Freire, R.Z. Hybrid-YOLO for classification of insulators defects in transmission lines based on UAV. Int. J. Electr. Power Energy Syst. 2023, 148, 108982.
23. Chen, Y.F.; Zhang, C.Y.; Chen, B.; Huang, Y.Y.; Sun, Y.F.; Wang, C.M.; Fu, X.J.; Dai, Y.X.; Qin, F.W.; Peng, Y.; et al. Accurate leukocyte detection based on deformable-DETR and multi-level feature fusion for aiding diagnosis of blood diseases. Comput. Biol. Med. 2024, 170, 107917.
24. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A lightweight-design for real-time detector architectures. J. Real-Time Image Process. 2024, 21, 62.
25. Lee, J.; Park, S.; Mo, S.; Ahn, S.; Shin, J. Layer-adaptive sparsity for the magnitude-based pruning. arXiv 2020, arXiv:2010.07611.
26. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
27. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11531–11539.
28. Zhang, H.; Zhang, S. Focaler-iou: More focused intersection over union loss. arXiv 2024, arXiv:2401.10525.
29. Yu, Z.P.; Huang, H.B.; Chen, W.J.; Su, Y.X.; Liu, Y.H.; Wang, X.Y. YOLO-FaceV2: A scale and occlusion aware face detector. Pattern Recognit. 2024, 155, 62.
30. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 10778–10787.
31. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-time Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
32. Zhang, H.; Zhang, S. Shape-iou: More accurate metric considering bounding box shape and scale. arXiv 2023, arXiv:2312.17663.
33. Zhang, H.; Xu, C.; Zhang, S. Inner-IoU: More effective intersection over union loss with auxiliary bounding box. IEEE Access 2023, arXiv:2311.02877.
34. Ma, S.; Xu, Y. Mpdiou: A loss for efficient and accurate bounding box regression. arXiv 2023, arXiv:2307.07662.

Figure 1. The basic architecture of YOLO11.

Figure 2. The architecture of High-level Screening–Feature Fusion Pyramid Networks (HS-FPN) [23].

Figure 3. The architecture of the GSConv [24] module.

Figure 4. The overall architecture of the proposed LAI-YOLO model.

Figure 5. Architectures of the Bottleneck. (a) Original; (b) Improved SG-Bottleneck in SG-C3k2.

Figure 6. Visualization of feature maps for the Backbone network. (a) Original; (b) Low-level feature map; (c) Middle-level feature map; (d) High-level feature map.

Figure 7. Upsampling feature map comparison: (a) original; (b) without bilinear interpolation; (c) both methods used.

Figure 8. Architectures of the (a) VoVGSCSPC and (b) GSBottleneck.

Figure 9. Architecture of the LA-Head.

Figure 10. Representative samples of the dataset. (a) insulator defect; (b) aging insulator; (c) dirty insulator.

Figure 11. Comparison of channel quantities before and after model pruning when $P_r$ = 1.2.


Figure 12. Performance comparison curves during the training process.

Figure 13. Visual comparison of predictions and Grad-CAM++ heatmaps across different models.

Table 1. Experimental platform environment.

Environmental Configuration	Version
GPU	NVIDIA GeForce RTX 3090
CPU	Intel i9-10900X 3.70 GHz
RAM	32 GB
System	Windows 11
CUDA	11.8
Python	3.10
Torch	2.0.1

Table 2. Performance of LAI-YOLO under different LAMP pruning ratios ($P_r$).


$P_r$	Precision (%)	Recall (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)	Params (M)	FLOPs (G)	Size (MB)
1.0	92.5	86.7	90.6	58.2	1.47	4.1	3.16
1.1	93.6	84.7	91.1	60.3	1.31	3.7	2.88
1.2	95.0	83.1	91.1	61.2	1.18	3.4	2.64
1.3	93.6	84.1	90.2	60.6	1.09	3.1	2.45
1.4	94.8	83.0	89.5	60.3	0.99	2.9	2.27
1.5	92.9	83.3	88.8	60.7	0.90	2.7	2.10
1.6	93.6	81.8	88.6	60.4	0.84	2.6	1.98
1.7	91.9	84.0	88.3	59.4	0.77	2.4	1.85
1.8	90.6	82.3	88.2	58.9	0.71	2.2	1.72

Table 3. Comparative experiment of various feature fusion networks.

Method	Precision (%)	Recall (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)	Params (M)	FLOPs (G)	Size (MB)
Baseline (FPN+PAN)	94.0	80.2	88.2	59.0	2.58	6.3	5.24
Bi-FPN [30]	88.2	81.3	84.9	57.5	2.61	6.9	5.35
CCFF [31]	91.0	80.3	86.9	58.6	1.80	5.3	3.76
HS-FPN	90.0	83.9	88.5	59.0	1.85	5.7	3.83
HS-WFPN	89.4	84.0	89.3	59.6	1.85	5.5	3.86

Note: Values in bold signify the optimal results.

Table 4. Comparative experiment of various loss functions.

Loss	Precision (%)	Recall (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)
Baseline (CIoU)	94.0	80.2	88.2	59.0
DIoU	92.9	83.4	88.6	59.2
EIoU	90.2	78.9	85.9	57.8
GIoU	89.8	81.1	89.1	60.1
Shape-IoU [32]	91.6	82.8	87.5	58.4
inner-CIoU [33]	91.8	82.3	88.5	59.9
MPDIoU [34]	92.0	82.5	88.7	59.3
Focaler-CIoU [28]	90.0	83.6	88.0	60.0
SWFocalerLoss	87.6	86.1	89.2	61.3

Note: Values in bold signify the optimal results.

Table 5. Ablation experiment results.

SG-C3k2	HS-WFPN	LA-Head	SWFocalerLoss	LAMP	mAP@0.5 (%)	mAP@0.5:0.95 (%)	Params (M)	FLOPs (G)
-	-	-	-	-	88.2	59.0	2.58	6.3
✓	-	-	-	-	88.8	60.5	2.39	5.9
-	✓	-	-	-	89.3	59.6	1.85	5.5
-	-	✓	-	-	88.7	59.5	2.26	5.1
-	-	-	✓	-	89.2	61.3	2.58	6.3
✓	✓	-	-	-	89.9	59.4	1.66	5.1
✓	-	✓	-	-	88.9	60.9	2.07	4.7
-	✓	✓	-	-	89.5	59.6	1.66	4.5
✓	✓	✓	-	-	90.0	58.5	1.47	4.1
✓	✓	✓	✓	-	90.6	58.2	1.47	4.1
✓	✓	✓	✓	✓	91.1	61.2	1.18	3.4

Note: ✓ signifies that the module is included, whereas - denotes that the module is excluded. Values in bold signify the optimal results.

Table 6. Comparative experiment of various lightweight YOLO models.

Model	Precision (%)	Recall (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)	Params (M)	FLOPs (G)	Size (MB)	FPS
YOLOv3t	87.2	78.5	82.0	47.4	9.52	14.3	18.30	81.3
YOLOv5n	85.2	82.7	86.7	55.5	2.18	5.8	4.45	89.7
YOLOv6n	88.3	76.9	84.7	54.9	4.16	11.5	8.18	113.0
YOLOv8n	91.9	83.6	88.7	58.5	2.69	6.8	5.39	63.5
YOLOv9t	88.9	81.2	86.3	58.2	1.73	6.4	4.00	30.1
YOLOv10n	89.5	81.6	87.9	58.5	2.70	8.2	5.51	36.6
YOLO11n	94.0	80.2	88.2	59.0	2.58	6.3	5.24	61.7
LAI-YOLO	95.0	83.1	91.1	61.2	1.18	3.4	2.64	49.8

Note: Values in bold signify the optimal results.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Qu, J.; Zhu, Z.; Jiang, Z.; Wen, C.; Weng, Y. LAI-YOLO: Towards Lightweight and Accurate Insulator Anomaly Detection via Selective Weighted Feature Fusion. Appl. Sci. 2025, 15, 10780. https://doi.org/10.3390/app151910780

AMA Style

Qu J, Zhu Z, Jiang Z, Wen C, Weng Y. LAI-YOLO: Towards Lightweight and Accurate Insulator Anomaly Detection via Selective Weighted Feature Fusion. Applied Sciences. 2025; 15(19):10780. https://doi.org/10.3390/app151910780

Chicago/Turabian Style

Qu, Jianan, Zhiliang Zhu, Ziang Jiang, Congjie Wen, and Yijian Weng. 2025. "LAI-YOLO: Towards Lightweight and Accurate Insulator Anomaly Detection via Selective Weighted Feature Fusion" Applied Sciences 15, no. 19: 10780. https://doi.org/10.3390/app151910780

APA Style

Qu, J., Zhu, Z., Jiang, Z., Wen, C., & Weng, Y. (2025). LAI-YOLO: Towards Lightweight and Accurate Insulator Anomaly Detection via Selective Weighted Feature Fusion. Applied Sciences, 15(19), 10780. https://doi.org/10.3390/app151910780
