A Lightweight Model for Insulator Defect Detection Based on Vision–Language Modeling and Prior Knowledge in Power Systems

Shanfeng Liu; Weijian Zhang; Shaoguang Yuan; Hua Bao; Wandeng Mao; Shengzhe Xi

doi:10.3390/pr13113714

¹ Electrical Power Research Institute of Henan Electric Power Corporation, Zhengzhou 450052, China

² State Grid Henan Electric Power Company, Zhengzhou 450052, China

³ School of Artificial Intelligence, Anhui University, Hefei 230039, China

* Authors to whom correspondence should be addressed.

Processes 2025, 13(11), 3714; https://doi.org/10.3390/pr13113714
(registering DOI)

This article belongs to the Special Issue AI-Driven Innovations for Enhancing Power System Stability and Operational Efficiency


Abstract

Insulators are critical insulating components in power transmission lines, and their defects are one of the primary causes of power outages in power grids. Power companies widely use unmanned aerial vehicle (UAV) inspections to collect image data of power transmission lines. However, existing methods face two core challenges. At the data level, insulator defect samples are extremely scarce in massive image datasets, leading to severe data imbalance. At the algorithm level, deep learning-based defect detection methods rely on data-driven feature extraction and ignore quantifiable prior knowledge such as insulator installation specifications and mechanical structure, which results in low localization efficiency and poor robustness in complex scenarios. To address these issues, this paper proposes an insulator defect detection method based on vision–language models and prior knowledge. The method extracts prior knowledge about the physical characteristics of insulators, quantifies spatial structure and installation specifications as prior constraints, and embeds this prior knowledge into the vision–language model’s feature space to generate insulator defect samples, which addresses the data imbalance issue. Insulator defects are then detected using an improved You Only Look Once (YOLO) algorithm. This approach reduces model parameters while maintaining detection accuracy, yielding a lightweight model for insulator defect detection. The experimental results show that, compared with the PP-YOLOE-m and RT-DETR-R18 models, the proposed method significantly improves detection accuracy, with a mean average precision of 95.7%.

Keywords:

insulator defect detection; deep learning; UAV; YOLO; lightweight model

1. Introduction

Overhead power lines are a critical component of the power transmission network and are responsible for transmitting electrical energy [,,]. They primarily consist of power poles, conductors, insulators, and other hardware. Insulators play a dual role, providing both electrical insulation and mechanical support, which makes them key components for the safe and stable operation of overhead power lines. However, insulators are constantly exposed to outdoor environments and are highly susceptible to adverse environmental conditions and pollution [,,]. Over time, insulators inevitably undergo material degradation. These factors can lead to defects such as spontaneous breakage in insulators []. If such defects are not detected promptly by power companies, they can reduce the insulation performance of transmission lines [], potentially causing large-scale power outages and resulting in significant economic losses and social impacts []. Therefore, regular defect detection of insulators on transmission lines is indispensable for ensuring the reliability of the power supply [].

In the early days, insulator defect detection primarily relied on manual inspections []. Power company maintenance personnel used telescopes and other equipment to observe whether insulators on transmission lines had defects []. However, overhead lines cover large areas and span long distances, making this inspection method inefficient []. The detection of insulator defects largely depended on the experience of maintenance personnel, significantly reducing the accuracy of insulator defect detection [,]. In recent years, power companies have gradually adopted unmanned aerial vehicle (UAV) inspections [,,], manned helicopter inspections [], robotic inspections [], and remote sensing satellite inspections [] in order to improve inspection efficiency and accuracy of insulator defect identification. Among these methods, using UAV-mounted cameras to photograph transmission lines and identify defects in insulators and other components has significantly improved inspection efficiency [,,]. This method has become the preferred inspection approach adopted by many power companies.

The use of UAV inspections generates a large amount of image data. Improving inspection efficiency while ensuring detection accuracy has become a major issue that needs to be addressed urgently. Traditional insulator defect detection algorithms primarily rely on manually constructed insulator defect features. The traditional insulator defect detection algorithm workflow mainly includes image preprocessing, feature extraction, and a defect classification model []. Defects are located using the insulators’ mechanical structure, color distribution, and texture features. This primarily includes edge detection algorithms based on the Canny operator or Sobel operator, as well as frequency domain analysis methods that extract periodic texture features of insulators using the Fourier transform or wavelet analysis. These algorithms achieve high accuracy and detection efficiency when detecting images with simple backgrounds, good lighting conditions, and distinct defect features. However, the aforementioned methods heavily rely on manually designed feature sets, resulting in insulator defect detection models with extremely weak generalization capabilities. When processing UAV inspection images, detection accuracy drops sharply. This is because images captured by UAVs often have complex backgrounds, and various factors such as farmland, weather, forests, and buildings are recorded in the images. The complex background interferes with the feature extraction of traditional insulator defect detection algorithms. Additionally, the types of insulator defects based on manually constructed feature sets are limited and cannot cover all kinds of insulator defects in actual working conditions, making it difficult for traditional insulator defect detection algorithms to locate insulators and defect types in images accurately []. Thanks to advancements in computer technology, deep learning algorithms have achieved remarkable success across various industries. 
By leveraging self-learning mechanisms to uncover patterns within data, deep learning algorithms significantly reduce reliance on manually constructed feature sets. In recent years, they have emerged as the mainstream algorithm in insulator defect detection.

In 2012, deep learning achieved a breakthrough in the ImageNet Large Scale Visual Recognition Challenge. This showed that deep learning algorithms, represented by the convolutional neural network (CNN) model, can be effectively applied to image recognition. The complex structure of deep learning models gives them strong feature capture capabilities. Deep learning-based image recognition algorithms capture the features contained in image data through self-learning and abstract image features into high-dimensional feature vectors, no longer relying on human experience []. When trained on sufficiently large datasets, such algorithms can capture key features within the dataset, significantly enhancing their generalization capabilities.

Deep learning-based insulator defect detection algorithms are divided into two-stage and single-stage detection algorithms []. Two-stage detection algorithms generate candidate regions in the first stage and then classify the candidate regions in the second stage. Such algorithms mainly include the Faster R-CNN model, the Cascade R-CNN model, and their improved variants. Taking the insulator defect detection algorithm based on the Faster R-CNN [] model as an example, in the first stage, the region proposal network (RPN) generates a series of region proposals. In this stage, the model mainly uses global features to mine potential region proposals that contain defective insulators. In the second stage, ROI Align is used to pool the region proposals into feature maps of a fixed size, and the positions of the region proposals are refined using local image features to further improve detection accuracy. The insulator defect detection algorithm based on the Faster R-CNN model has a complex structure, low recognition efficiency, and requires significant computational resources. However, it provides rich insulator defect information and achieves high detection accuracy.

However, in the massive aerial images captured by UAVs, only a few insulators have defects. Using a two-stage detection algorithm to screen out defective insulators consumes a significant amount of computational resources and cannot meet the requirements for real-time detection. Therefore, single-stage detection algorithms have gradually become a research hotspot in the field of insulator defect detection []. Single-stage detection algorithms abandon the RPN model, merging candidate region generation and precise detection into a single forward propagation, directly outputting the bounding boxes and defect categories of insulators in the image. Such algorithms primarily include the You Only Look Once (YOLO) algorithm, Single Shot MultiBox Detector (SSD) algorithm, RetinaNet algorithm, and others. Using the insulator defect detection algorithm based on the YOLO model as an example [], the YOLO model performs multiple samplings of the original image to generate various feature maps. Feature maps with large receptive fields are used to locate insulator strings, those with medium receptive fields contain defect features of multiple insulators, and those with small receptive fields contain defect features of single insulators []. By fusing features at different scales, the detection results are output. Single-stage detection algorithms have a clear workflow, a simple model structure, and high recognition speed, making them well-suited for real-time detection requirements. To enable single-stage detection algorithms to fully capture various insulator defect features, a large amount of training data with defective insulators is required. However, in UAV aerial images, normal insulators and defective insulators are extremely imbalanced, significantly reducing the accuracy of single-stage detection algorithms.

To address the scarcity of defective insulator samples, researchers have developed data augmentation algorithms at the pixel, geometric, hybrid, and simulation levels [,,]. Pixel-level augmentation injects noise into images to enhance the robustness of the insulator detection model against noise, and adjusts image saturation to simulate insulator images under different lighting conditions and seasons. Geometric-level augmentation expands the set of viewpoints through random flipping and random rotation to simulate tilted image capture by a UAV, while scaling simulates images taken at different flight heights. Hybrid-level augmentation stitches multiple images into a single image to increase the density of insulators in the image and to construct combinations of various defect features. Simulation-level augmentation uses physical models to overlay rain, fog, and other factors on the original images, enhancing the detection model’s robustness to weather conditions. While the above methods can significantly increase the number of defective insulator samples, they merely involve linear combinations and local transformations of existing defect features. References [,,] combine data augmentation techniques with deep learning methods to construct defect detection models; however, such approaches do not incorporate prior knowledge of the physical structure of insulator strings. The generated defective insulator samples may not match the actual shapes of insulators in real-world environments, so the detection model fails to recognize new defect features, which reduces the accuracy of the insulator defect detection model.
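The pixel-, geometric-, and hybrid-level operations described above can be sketched in a few lines of NumPy. This is a minimal illustration only: the function names, the noise level, the restriction to 90-degree rotations, and the 2x2 mosaic layout are assumptions made for the example and are not taken from the cited works.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Pixel level: inject zero-mean Gaussian noise, then clip back to [0, 255]."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def random_flip_rotate(image, rng):
    """Geometric level: random horizontal flip plus a random multiple of 90 degrees."""
    if rng.random() < 0.5:
        image = image[:, ::-1]           # flip along the width axis
    k = int(rng.integers(0, 4))          # number of quarter turns (illustrative choice)
    return np.rot90(image, k=k, axes=(0, 1))

def mosaic_2x2(images):
    """Hybrid level: stitch four equally sized images into one 2x2 mosaic."""
    top = np.concatenate([images[0], images[1]], axis=1)
    bottom = np.concatenate([images[2], images[3]], axis=1)
    return np.concatenate([top, bottom], axis=0)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in for a UAV image

noisy = add_gaussian_noise(img, sigma=10.0, rng=rng)
rotated = random_flip_rotate(img, rng)
mosaic = mosaic_2x2([img, noisy, img, noisy])
```

All three operations only rearrange or perturb pixels that already exist, which is the limitation the paragraph above points out: no new defect structure is created.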

To address the above issues, this paper proposes a lightweight model for insulator defect detection based on Vision–language modeling and prior knowledge. The model generates defective insulator samples based on prior knowledge and Vision–language modeling to ensure the authenticity of the samples. An improved YOLO model is used to construct a lightweight model for insulator defect detection, which greatly reduces the computational cost while ensuring detection accuracy, thereby meeting the requirements of real-time detection. The main contributions of this paper are as follows:

(1): This paper proposes a lightweight model for insulator defect detection based on vision–language models and prior knowledge. This method significantly reduces computational costs while ensuring detection accuracy, meeting the requirements of real-time detection scenarios.
(2): This paper uses the vision–language model Stable Diffusion to generate defective insulator samples. Prior knowledge such as insulator installation specifications and mechanical structures guides the vision–language model to generate defective insulator samples that comply with that prior knowledge.
(3): This paper improves the YOLOv8m model with the coordinate attention mechanism to build a lightweight model for insulator defect detection, reducing the computational cost of the detection stage and meeting the requirements of real-time detection of insulator defects.

2. Methodology

To significantly reduce computational complexity while maintaining defect detection accuracy and addressing the scarcity of insulator defect samples, this paper proposes a lightweight model for insulator defect detection based on vision–language modeling and prior knowledge. The model primarily consists of a data augmentation module and an insulator defect detection module. The training-phase workflow is illustrated in Figure 1.

As shown in Figure 1, this paper generates insulator defect samples consistent with prior knowledge based on a small number of real defect samples and GPT-4 prompts. By employing a coordinate attention mechanism to reduce the model parameters of YOLOv8m, a lightweight insulator defect detection model is constructed.

Figure 1. Model flowchart during training phase.

2.1. Data Augmentation Methods Based on Vision–Language Modeling and Prior Knowledge

In recent years, large language models (LLMs) represented by GPT have achieved remarkable results in many fields. LLMs based on the Transformer model have billions of parameters. They mine complex knowledge patterns from massive amounts of text, which gives them good generalization and feature expression capabilities and enables them to process multimodal data. Take the vision–language model Stable Diffusion (SD) as an example [,]. The SD model has cross-modal feature extraction and reasoning capabilities from text to images. This paper uses insulator installation specifications and mechanical structures as prior knowledge to guide the SD model in generating defective insulator samples and solving the data imbalance problem. The primary defect types of insulator strings include missing insulators, breakage, cracks, pollution, mechanical deformation, and pin loss. Among all defect types, missing insulators significantly reduce the insulating strength and mechanical strength of insulator strings. Therefore, this paper selects the missing insulator defect type as the primary analysis object.

UAV aerial images are captured at a certain flight altitude and from varying angles, which results in complex backgrounds. Figure 2 shows various complex backgrounds, such as farmland and buildings, in real UAV aerial images. Insulators are small and have few visible features in these images, so complex backgrounds pose a major challenge for insulator defect detection.

Figure 2. UAV aerial image with complex backgrounds.

If complete UAV aerial images are generated based on the SD model, the complex image background significantly interferes with the authenticity of the generated images. The reason is that insulator images are relatively small in UAV aerial images, and generating complex image backgrounds affects the authenticity of the generated insulator defect samples. To enable the insulator defect detection model to capture effective insulator defect features fully, this paper first segments insulator images in the original images, generates defect insulator samples, and then re-superimposes them onto the original background.
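The segment, generate, and re-superimpose workflow described above can be sketched as a crop-and-paste operation. This is only an illustration under stated assumptions: the bounding box, the helper names `crop_region` and `paste_region`, and the white stripe that stands in for a generated defect are all hypothetical, and no segmentation or diffusion model is invoked here.

```python
import numpy as np

def crop_region(image, box):
    """Cut out the insulator region given as (x0, y0, x1, y1) pixel coordinates."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1].copy()

def paste_region(background, patch, box):
    """Re-superimpose an edited patch onto a copy of the original background."""
    x0, y0, x1, y1 = box
    out = background.copy()
    out[y0:y1, x0:x1] = patch
    return out

aerial = np.zeros((100, 200, 3), dtype=np.uint8)   # stand-in for a UAV aerial image
box = (50, 20, 90, 80)                             # hypothetical insulator bounding box

patch = crop_region(aerial, box)
edited = patch.copy()
edited[10:20, :] = 255                             # placeholder for the generated defect
result = paste_region(aerial, edited, box)
```

Because only the cropped region is edited, every background pixel outside the box is carried over unchanged, which is the property the paragraph relies on to preserve authenticity.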

This paper proposes a lightweight model for insulator defect detection based on vision–language modeling and prior knowledge in power systems. During the training stage, the main process of the proposed model is shown in the figure below. The proposed model extracts prior knowledge about the physical characteristics of insulators, quantifies spatial structure and installation specifications as prior constraints, embeds prior knowledge into the vision–language model’s feature space to generate insulator defect samples, addresses the data imbalance issue, and detects insulator defects using an improved You Only Look Once (YOLO) algorithm.

The SD model can generate high-quality images based on text prompts. The SD model [,] mainly consists of three parts: Contrastive Language-Image Pre-training (CLIP), variational auto-encoder (VAE), and Diffusion model. The CLIP model builds a bridge between textual information and image information. The CLIP model converts textual information containing prior knowledge into high-dimensional vectors. It injects them into the diffusion process through a cross-attention mechanism, guiding the SD model to generate new images that conform to prior knowledge. The VAE model significantly compresses images, reducing computational complexity and cost. The Diffusion model generates feature information for new image samples during inference. This paper employs SD [], a latent diffusion model (LDM), as the main model for generating images. The main equations are shown below.

$$L_{LDM} := \mathbb{E}_{\mathcal{E}(x),\ \epsilon \sim \mathcal{N}(0, 1),\ t} \left[ \left\| \epsilon - \epsilon_{\theta}(z_{t}, t) \right\|_{2}^{2} \right] \tag{1}$$

In the above equation, $z_{t}$ represents the noised latent variable at time step $t$. $L_{LDM}$ represents the loss function; the optimization objective is to minimize the difference between the model’s predicted noise and the actual noise. $\mathbb{E}_{\mathcal{E}(x),\ \epsilon \sim \mathcal{N}(0, 1),\ t}[\cdot]$ represents the expected value over all random variables. $\mathcal{E}(x)$ represents the process by which encoder $\mathcal{E}$ encodes image $x$. $\epsilon \sim \mathcal{N}(0, 1)$ denotes noise $\epsilon$, which follows a standard normal distribution with mean 0 and variance 1. This loss function is used in a denoising setting, where the model is trained to predict and remove the noise added to the image during the forward diffusion process. $\theta$ represents the model parameters. $\epsilon_{\theta}$ denotes the noise predicted by the model based on $z_{t}$.
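As a rough numerical illustration of Equation (1), the sketch below estimates the loss over a small batch with NumPy. The forward-noising formula in `add_noise` is the standard DDPM form and is an assumption here, since the text does not spell it out; the latent shape and the value of `alpha_bar_t` are arbitrary, and no actual denoising network is involved.

```python
import numpy as np

def add_noise(z0, eps, alpha_bar_t):
    """Forward diffusion in latent space (standard DDPM form, assumed for this sketch)."""
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps

def ldm_loss(eps, eps_pred):
    """Batch estimate of Eq. (1): mean over samples of the squared L2 noise error."""
    diff = (eps - eps_pred).reshape(eps.shape[0], -1)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

rng = np.random.default_rng(0)
z0 = rng.normal(size=(4, 4, 8, 8))       # batch of encoded latents; shape is illustrative
eps = rng.normal(size=z0.shape)          # epsilon ~ N(0, 1)
z_t = add_noise(z0, eps, alpha_bar_t=0.5)

perfect = ldm_loss(eps, eps)                     # a perfect noise predictor
zero_guess = ldm_loss(eps, np.zeros_like(eps))   # a predictor that always outputs zero
```

A predictor that recovers the injected noise exactly drives the loss to zero, while any other prediction yields a strictly positive value.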

In the SD model [], the encoded text vectors are injected into the diffusion process through a cross-attention mechanism. The main expressions are as follows.

$$A = \mathrm{softmax}\left( \frac{Q K^{T}}{\sqrt{d}} \right) \cdot V, \quad A \in \mathbb{R}^{P \times P \times N} \tag{2}$$

$$Q = W_{Q}^{(i)} \cdot \varphi_{i}(z_{t}), \quad K = W_{K}^{(i)} \cdot \tau(c), \quad V = W_{V}^{(i)} \cdot \tau(c) \tag{3}$$

In the above equations, $A$ represents the attention matrices, $Q$ represents the query matrices, $K$ represents the key matrices, and $V$ represents the value matrices. $W^{(i)}$ represents the learnable projection matrices, $d$ represents the dimension of the input data, the superscript $T$ denotes the matrix transpose, $P$ represents the spatial resolution dimension, typically the height and width of an image, and $N$ represents the number of tokens in the text description. $\varphi_{i}(z_{t})$ represents the intermediate feature, and $\tau(c)$ represents the embedded vector of the text prompt $c$.
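The cross-attention step of Equations (2) and (3) can be sketched with NumPy as follows. The feature sizes, the random projection matrices, and the function name `cross_attention` are illustrative assumptions. The sketch returns both the attended features and the softmax map; it is the softmax map that has one weight per spatial position and text token, so it is the quantity reshaped to $P \times P \times N$ here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def cross_attention(phi, tau, W_q, W_k, W_v):
    """Queries come from image features phi, keys and values from text embeddings tau."""
    Q = phi @ W_q                                    # (P*P, d)
    K = tau @ W_k                                    # (N, d)
    V = tau @ W_v                                    # (N, d)
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)    # (P*P, N), each row sums to 1
    return attn @ V, attn

rng = np.random.default_rng(0)
P, N, c_img, c_txt, d = 16, 7, 32, 24, 8             # illustrative sizes
phi = rng.normal(size=(P * P, c_img))                # flattened intermediate features
tau = rng.normal(size=(N, c_txt))                    # text token embeddings
W_q = rng.normal(size=(c_img, d))
W_k = rng.normal(size=(c_txt, d))
W_v = rng.normal(size=(c_txt, d))

out, attn = cross_attention(phi, tau, W_q, W_k, W_v)
attn_map = attn.reshape(P, P, N)                     # one spatial map per text token
```

Each slice `attn_map[:, :, j]` shows where in the latent grid token `j` is attended to, which is the kind of map the later attention-guided optimization operates on.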

It is worth noting that the general distribution of the SD model differs significantly from the normal distribution in the insulator defect detection scenario. Using the general distribution of the SD model will reduce the authenticity of the generated samples []. Therefore, this paper uses masks to constrain the location of defect generation. The main equations are shown below.

$$z_{t} = \mathrm{mask} \odot z_{t} + (1 - \mathrm{mask}) \odot z_{t}^{normal} \tag{4}$$

The above equation ensures the global consistency of the generated samples. To enable the model to effectively generate defect features based on text information within the specified area, this paper introduces attention-guided anomaly optimization to enhance the model’s attention to text information []. The main equations are shown below.

$$\bar{A}_{t} = \mathrm{Gaussian}\left( \mathrm{softmax}\left( \bar{A}_{t} \right) \right), \quad \bar{A}_{t} \in \mathbb{R}^{16 \times 16 \times N} \tag{5}$$

In the above equation, $\bar{A}_{t}$ represents the averaged attention map. $\mathrm{Gaussian}(\cdot)$ performs Gaussian smoothing on the attention map to eliminate noise and fill in gaps, making the attention area more coherent in space and avoiding artifacts caused by excessive focus on isolated pixels []. The above equation represents the process of refining the attention map through normalization and smoothing. The loss function in the optimization process is shown below.

$$L_{att} = 1 - \max\left( \bar{A}_{t}^{j} \odot \mathrm{mask} \right) \tag{6}$$

$$z_{t} \leftarrow z_{t} - \alpha_{t} \cdot \nabla_{z_{t}} L_{att} \odot \mathrm{mask} \tag{7}$$

In the above equations, $\alpha_{t}$ represents the dynamic step size. $L_{att}$ represents the loss function. This loss function ensures that the model has at least one pixel within the specified area with the highest possible attention score for the text information.
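The mask-constrained operations in Equations (4) to (7) can be sketched with NumPy. This is a simplified illustration: maps are two-dimensional, the Gaussian smoothing is a hand-written separable filter with an assumed `sigma` and `radius`, and the gradient passed to `masked_update` is a placeholder array, because computing the true gradient would require automatic differentiation through the diffusion model.

```python
import numpy as np

def blend_latents(z_t, z_normal, mask):
    """Eq. (4): keep generated content inside the mask, normal content outside."""
    return mask * z_t + (1.0 - mask) * z_normal

def smooth_map(attn_2d, sigma=1.0, radius=2):
    """Gaussian smoothing as in Eq. (5), via a separable kernel with edge padding."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k = k / k.sum()
    padded = np.pad(attn_2d, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def attention_loss(attn_map, mask):
    """Eq. (6): one minus the largest attention score inside the mask."""
    return 1.0 - float(np.max(attn_map * mask))

def masked_update(z_t, grad, mask, step):
    """Eq. (7): gradient step on the latent, applied only inside the mask."""
    return z_t - step * grad * mask

mask = np.zeros((16, 16))
mask[4:8, 4:8] = 1.0                     # hypothetical defect region
attn = np.zeros((16, 16))
attn[5, 5] = 0.9                         # peak inside the mask
attn[12, 12] = 1.0                       # stronger peak outside the mask (ignored)

loss = attention_loss(attn, mask)
smoothed = smooth_map(attn)
blended = blend_latents(np.ones_like(mask), np.zeros_like(mask), mask)
updated = masked_update(np.zeros_like(mask), np.ones_like(mask), mask, step=0.1)
```

The example shows why the mask matters: the peak outside the region does not lower the loss, so the optimization can only be satisfied by raising attention inside the specified area.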

To generate rich defect features, this paper uses GPT-4 to generate more detailed defect feature text information. For example, the initial text description “A photo of an insulator string that is damaged” is converted to “A photo of an insulator string with multiple elliptical insulators missing from the bottom of an insulator string”. Since the defect feature text information generated by GPT-4 is relatively long, this paper introduces a CLIP-based image generation loss in the final denoising steps to ensure the model fully considers this information []. The main expression is shown below.

$$L_{img} = 1.0 - \mathrm{cosine}\left( \Phi^{T}(c'), \Phi^{V}(\tilde{x}_{t}) \right) \tag{8}$$

In the above equation, $\Phi^{T}(\cdot)$ and $\Phi^{V}(\cdot)$ represent the CLIP text encoder and visual encoder, respectively. $c'$ represents the detailed anomaly text description provided by GPT-4. The value range of $\mathrm{cosine}(\cdot, \cdot)$ is [−1, 1]. At this point, the joint optimization function [] of $z_{t}$ is shown below.

$$L = L_{img} + \alpha_{t} \cdot L_{att}, \quad z_{t} \leftarrow z_{t} - \nabla_{z_{t}} L \odot \mathrm{mask} \tag{9}$$

$$L_{prompt} = 1.0 - \mathrm{cosine}\left( \tau(c), \tau(c') \right) \tag{10}$$

$$\tau(c) \leftarrow \tau(c) - \nabla_{\tau(c)} \left( L_{prompt} + L_{img} \right) \tag{11}$$

In the above equations, $\tau(c)$ represents the embedded vector of prompts. This paper employs GPT to generate prompt-guided text. Based on the text generated by GPT, the defect types of insulators are fine-tuned to enrich the defect features of insulators. The data augmentation framework proposed in this paper is shown in Figure 3.

As can be seen from Figure 3, based on normal insulator images and prior knowledge about defects, the proposed model generates new defect samples on normal insulator images. GPT is used to adjust the prior knowledge to increase the diversity of defect samples. This paper evaluates the validity of synthetic defects by determining whether segments in insulator images match the original images and whether complete insulators are missing at different locations.

Figure 3. Structure of Stable Diffusion Model.
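The cosine-based losses in Equations (8) and (10) share one form, which the following NumPy sketch illustrates. The three-dimensional vectors are toy stand-ins for CLIP text and image embeddings; no CLIP model is loaded, and the helper names are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_loss(u, v):
    """Shared form of Eq. (8) and Eq. (10): 1.0 - cosine(u, v), in [0, 2]."""
    return 1.0 - cosine(u, v)

text_emb = np.array([1.0, 0.0, 0.0])        # stand-in for a text embedding
aligned = np.array([2.0, 0.0, 0.0])         # same direction, different norm
orthogonal = np.array([0.0, 3.0, 0.0])      # unrelated content
opposite = np.array([-1.0, 0.0, 0.0])       # opposite direction

loss_aligned = cosine_loss(text_emb, aligned)
loss_orthogonal = cosine_loss(text_emb, orthogonal)
loss_opposite = cosine_loss(text_emb, opposite)
```

Since cosine similarity ignores vector norms, the loss depends only on direction: it is 0 for aligned embeddings, 1 for orthogonal ones, and 2 for opposite ones.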

2.2. Lightweight Algorithm Based on Coordinate Attention Mechanism

This paper selects a single-stage detection algorithm as the main technical approach to meet the requirements of real-time detection of insulator defects. Considering the small insulator images in UAV aerial photography, this paper selects the YOLO model as the backbone structure for a lightweight model for insulator defect detection. The YOLO model converts small object detection problems into regression problems, directly extracts features from the entire image through a fully convolutional network, integrates features of different scales, and outputs detection results in a single forward propagation calculation, greatly reducing computational costs and improving inference efficiency [].

This paper mixes the defective insulator samples generated in the previous section with real defective insulator samples to form a mixed dataset. To adapt to the small size of insulators, this paper selects YOLOv8m as the main research object and improves YOLOv8m using the coordinate attention mechanism []. Among the YOLO family, YOLOv8m is the medium-depth variant. YOLOv8m provides the best accuracy-to-speed-to-parameter ratio and a future-proof codebase, making it the most suitable starting point for lightweight enhancement tailored to real-time insulator defect detection on edge devices. Then, training is performed on the mixed dataset. The main insulator defect features detected in this paper are shown in Figure 4.

Figure 4. Typical defect insulator images.

The coordinate attention mechanism (CAM) is an attention module designed for lightweight models. CAM embeds spatial coordinate information into channel attention while retaining position information and channel dependencies. Using CAM to improve YOLOv8m can enhance its ability to identify insulator strings and defects in complex backgrounds and significantly reduce computational complexity. The CAM model architecture is shown in Figure 5, where the label ‘X pool’ refers to pooling along the horizontal dimension of the input feature map, the label ‘Y pool’ indicates pooling along the vertical dimension, and the label ‘Conv2D’ denotes a two-dimensional convolutional layer.

Figure 5. Coordinate attention mechanism structure.

CAM also retains position information and inter-channel dependencies, which allows the insulator defect detection model to maintain high accuracy even in complex contexts. The expressions of CAM in the height and width directions are shown below [].

$$z_{c}^{h}(h) = \frac{1}{W} \sum_{i = 0}^{W - 1} x_{c}(h, i) \tag{12}$$

$$z_{c}^{w}(w) = \frac{1}{H} \sum_{j = 0}^{H - 1} x_{c}(j, w) \tag{13}$$

In the above equations, $x_{c}$ represents the input feature map of channel $c$, where $c$ indexes the depth dimension of the feature map. $W$ and $H$ represent the width and height of the feature map, i.e., its spatial dimensions. $z_{c}^{h}$ represents the pooling result in channel $c$ at height $h$. $z_{c}^{w}$ represents the pooling result in channel $c$ at width $w$. This enables the generated attention weights to enhance features along the main direction of the insulator while suppressing interference from complex backgrounds. The theory behind this mechanism is rooted in the ability of attention models to focus on relevant regions of an image while ignoring irrelevant parts. This is achieved by computing attention weights that amplify features in the main direction of the insulator. Next, a concatenation operation is performed []. The detailed equation is as follows:

$$f = \delta\left( F_{1}\left( \left[ z_{c}^{h}, z_{c}^{w} \right] \right) \right) \tag{14}$$

In the above equation, $F_{1}$ represents a fully connected layer. $\delta$ represents the rectified linear unit (ReLU) activation function. The above equation is the fusion step of the CAM. Next, the one-dimensional height feature $f^{h}$ and the one-dimensional width feature $f^{w}$ are mapped to the attention weight vectors $g^{h}$ and $g^{w}$ in the height direction and width direction, respectively []. The main expressions are shown below.

$$g^{h} = \sigma\left( F_{h}\left( f^{h} \right) \right) \tag{15}$$

$$g^{w} = \sigma\left( F_{w}\left( f^{w} \right) \right) \tag{16}$$

In the above equations, $\sigma$ represents the Sigmoid activation function, which compresses the output to (0, 1) to obtain normalized attention weights. In UAV aerial images, insulator strings often appear as long strips distributed vertically or horizontally. CAM assigns greater weight to the height corresponding to vertical insulators and greater weight to the width corresponding to horizontal insulators []. The final output equation for CAM is shown below.

$$y_{c}(i, j) = x_{c}(i, j) \times g_{c}^{h}(i) \times g_{c}^{w}(j) \tag{17}$$

In the above equation, $x_{c}(i, j)$ represents the original feature value of channel $c$ of the input feature map at position $(i, j)$. $g_{c}^{h}(i)$ represents the height attention weight of channel $c$ and row $i$. $g_{c}^{w}(j)$ represents the width attention weight of channel $c$ and column $j$. $y_{c}(i, j)$ represents the output feature value weighted by attention. The above equation generates an enhanced output feature map by applying the per-channel height and width attention weights to the original feature map simultaneously. To maintain the accuracy of the insulator defect detection model, this paper constructs a bottleneck structure [], the specific expression of which is shown below.

$$F(x_{c}) = x_{c} + F_{c}(x_{c}) \tag{18}$$

In the above equation, $F_{c}(x_{c})$ represents the feature map after CAM processing. $F(x_{c})$ represents the output feature map after the residual connection. To account for the characteristics of both insulator strings and insulator defects, this paper uses identity residual connections to add the original unfiltered features and the coordinate attention-enhanced features element by element. This not only retains the fine-grained details of insulator defects but also prevents the attention mechanism from overly suppressing features, thereby significantly improving detection accuracy while keeping the model lightweight []. The main model structure is shown in Figure 6.

Figure 6. Structure of the CAM with the bottleneck.
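Equations (12) to (18) can be traced end to end with a small NumPy sketch. Several simplifications are assumed here and are not taken from the paper: the layers $F_{1}$, $F_{h}$, and $F_{w}$ are modeled as plain channel-mixing matrices applied at every position, there is no normalization layer, weights are random, and the reduced channel count `C_mid` is arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, W1, Wh, Ww):
    """x: (C, H, W); W1: (C_mid, C); Wh and Ww: (C, C_mid)."""
    C, H, W = x.shape
    z_h = x.mean(axis=2)                          # Eq. (12): average over width  -> (C, H)
    z_w = x.mean(axis=1)                          # Eq. (13): average over height -> (C, W)
    z = np.concatenate([z_h, z_w], axis=1)        # concatenation -> (C, H + W)
    f = np.maximum(W1 @ z, 0.0)                   # Eq. (14): shared layer + ReLU
    f_h, f_w = f[:, :H], f[:, H:]                 # split back into height and width parts
    g_h = sigmoid(Wh @ f_h)                       # Eq. (15): height weights -> (C, H)
    g_w = sigmoid(Ww @ f_w)                       # Eq. (16): width weights  -> (C, W)
    return x * g_h[:, :, None] * g_w[:, None, :]  # Eq. (17)

def residual_block(x, W1, Wh, Ww):
    """Eq. (18): identity shortcut plus the attention-weighted features."""
    return x + coordinate_attention(x, W1, Wh, Ww)

rng = np.random.default_rng(0)
C, C_mid, H, W = 8, 4, 6, 10                      # illustrative sizes
x = rng.normal(size=(C, H, W))
W1 = rng.normal(size=(C_mid, C))
Wh = rng.normal(size=(C, C_mid))
Ww = rng.normal(size=(C, C_mid))

y = coordinate_attention(x, W1, Wh, Ww)
out = residual_block(x, W1, Wh, Ww)
```

Because both gates lie in (0, 1), the attention branch can only attenuate features; the identity shortcut in `residual_block` is what guarantees that the original features still reach the output.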

At the same time, this paper performs convolution on only half of the channels to reduce the number of calculations while keeping the feature information of the remaining channels unchanged []. The specific expression is shown below.

$$P_{c} = \left[ C_{1}, C_{2}, \dots, C_{c - 1}, C_{c} \right] \tag{19}$$

$$P_{\frac{c}{2}}^{1} = \left[ C_{1}, C_{2}, \dots, C_{\frac{c}{2}} \right] \tag{20}$$

$$P_{\frac{c}{2}}^{2} = \left[ C_{\frac{c}{2} + 1}, C_{\frac{c}{2} + 2}, \dots, C_{c} \right] \tag{21}$$

$$\mathrm{output} = F(P_{c}) = \left[ P_{\frac{c}{2}}^{1}, F_{c}\left( P_{\frac{c}{2}}^{2} \right) \right] \tag{22}$$

In the above equations, $P_{c}$ represents all channels of the entire feature map. $C_{k}$ represents the feature slice corresponding to channel $k$. $P_{\frac{c}{2}}^{1}$ skips convolution and retains the original features. $P_{\frac{c}{2}}^{2}$ undergoes convolution and CAM calculations for feature extraction and attention weighting. Finally, the unprocessed half of the channels is concatenated along the channel dimension with the half that has undergone convolution and attention processing.
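The half-channel scheme of Equations (19) to (22) amounts to a split, a transform on one half, and a concatenation. In the sketch below, the `transform` argument is a stand-in for the convolution and CAM processing; a simple scaling function is used so that the effect on each half is easy to see. The function name and tensor sizes are illustrative assumptions.

```python
import numpy as np

def partial_channel_block(x, transform):
    """x: (C, H, W). Process only the second half of the channels, then concatenate."""
    half = x.shape[0] // 2
    untouched = x[:half]                  # Eq. (20): passed through unchanged
    processed = transform(x[half:])       # Eq. (21): stand-in for convolution + CAM
    return np.concatenate([untouched, processed], axis=0)   # Eq. (22)

x = np.arange(4 * 2 * 2, dtype=np.float64).reshape(4, 2, 2)
out = partial_channel_block(x, lambda t: 2.0 * t)
```

Only half of the channels pass through the expensive branch, which is where the reduction in computation comes from, while the output keeps the full channel count.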

2.3. Insulator Defect Detection Method Based on an Improved YOLO Model

Based on the YOLOv8m model, this paper improves the YOLOv8m model using the CAM. CAM encodes direction-sensitive positional information into attention weights through one-dimensional global pooling along the height and width directions. To avoid attention weight suppression of small defects, a residual structure is adopted to ensure the original features of the insulator string are transmitted without loss, while also taking into account the local features of the insulator defects. This allows the improved insulator defect detection model to maintain high detection accuracy while significantly reducing the computational cost. The architecture of the model proposed in this paper is shown in Figure 7.

Figure 7. Structure of the lightweight model for insulator defect detection.

This paper retains the main structure of YOLOv8m and improves it using CAM to enhance the model’s ability to extract multi-scale features of insulator strings, effectively reducing the amount of computation and improving the efficiency of insulator defect detection.
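The directional pooling and residual connection described in this section can be sketched for a single channel as follows. This is a simplified illustration under stated assumptions, not the model's actual layer: the learned 1 x 1 convolutions, normalization, and channel reduction of coordinate attention are omitted, and the attention weights are taken directly as the sigmoid of the pooled values. All function names are hypothetical.

```python
from math import exp
from typing import List, Tuple

Channel = List[List[float]]  # one H x W feature slice


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))


def directional_pool(x: Channel) -> Tuple[List[float], List[float]]:
    """One-dimensional average pooling along the width and along the height."""
    h, w = len(x), len(x[0])
    row_desc = [sum(row) / w for row in x]                              # one value per row
    col_desc = [sum(x[i][j] for i in range(h)) / h for j in range(w)]   # one value per column
    return row_desc, col_desc


def coordinate_attention_residual(x: Channel) -> Channel:
    """Return x + x * a_h * a_w, where a_h and a_w are the directional attention weights."""
    row_desc, col_desc = directional_pool(x)
    a_h = [sigmoid(v) for v in row_desc]  # assumption: no learned layers before the sigmoid
    a_w = [sigmoid(v) for v in col_desc]
    return [
        [x[i][j] + x[i][j] * a_h[i] * a_w[j] for j in range(len(x[0]))]
        for i in range(len(x))
    ]


# A single bright pixel stands in for a small defect on a flat background.
x = [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
y = coordinate_attention_residual(x)
```

Since each attention weight lies between 0 and 1, the identity term guarantees that the output at the defect location is never smaller than the input, which is the purpose of the residual path described above.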

3. Example Analysis

To verify the detection accuracy of the proposed algorithm, we generated defective insulator samples based on the public Chinese Power Line Insulator Dataset (CPLID) [] and private datasets. The real insulator samples were mixed with the generated insulator samples to form a mixed dataset. The PP-YOLOE-m model [] and the RT-DETR-R18 model [] were selected as baselines, and the two baselines and the proposed model were trained on the same training dataset. All processing of insulator images in this paper was performed on an NVIDIA 3090 GPU. The main software used includes Python 3.8 and PyTorch 1.13.1.

This paper selects Precision, Recall, and mAP@.5:.95 (%) as the main evaluation metrics, following references [,,,]. The mAP@.5:.95 is the mean Average Precision averaged over intersection-over-union (IoU) thresholds from 0.5 to 0.95 in steps of 0.05, and it is a key metric for object detection models. The detection accuracy of each model is shown in Table 1.

Table 1. The detection accuracy of each model.
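The three metrics reported in Table 1 can be made concrete with a short sketch. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` form and assumes that the average precision at each IoU threshold has already been computed elsewhere. The function names are illustrative and are not taken from the paper or from any evaluation toolkit.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# The ten IoU thresholds behind mAP@.5:.95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def map_50_95(ap_per_threshold: List[float]) -> float:
    """Average the class-averaged AP values over the ten IoU thresholds."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)
```

A detection counts as a true positive at a given threshold only if its IoU with a ground-truth box reaches that threshold, so mAP@.5:.95 rewards tight localization more strongly than Precision and Recall at a single threshold do.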

PP-YOLOE-m and RT-DETR-R18 were released by PaddlePaddle and Baidu, respectively, and are currently widely used in industry as relatively advanced benchmark models. As shown in Table 1, the lightweight model for insulator defect detection proposed in this paper achieves the highest detection accuracy, with a precision of 99.8%, a recall of 99.3%, and an mAP@.5:.95 of 95.7%, compared with an mAP@.5:.95 of 83.1% for PP-YOLOE-m and 90.3% for RT-DETR-R18. The PP-YOLOE-m model is a single-stage object detection algorithm, primarily consisting of a backbone, neck, and head. It adopts an anchor-free method, which significantly reduces model parameters and improves training efficiency. However, the PP-YOLOE-m model has poor robustness and is easily affected by complex backgrounds, resulting in low detection accuracy for insulator defects. The RT-DETR-R18 model adopts a four-stage architecture, primarily comprising a backbone, hybrid encoder, decoder, and detection head. Its simplified structure can meet the requirements of real-time detection scenarios. However, insulators are small in UAV aerial images, and the self-attention mechanism of the RT-DETR-R18 model focuses on constructing global relationships in images. This results in poor feature modeling for small targets such as insulator strings and insulator defects, and therefore in low detection accuracy. This paper optimizes YOLOv8m with the coordinate attention mechanism to construct a lightweight model for insulator defect detection, which reduces the amount of computation while maintaining detection accuracy. The detection results of the proposed method are shown in Figure 8.

Figure 8. Defect detection results of insulators.

To measure the suitability of the proposed method for real-time detection scenarios, we compared the number of model parameters (Params), the floating-point operations (FLOPs), and the frames per second (FPS) of the PP-YOLOE-m model, the RT-DETR-R18 model, and the proposed model in the same scenario. Params is the number of parameters in the model, expressed in millions (M). FLOPs is the number of floating-point operations required by the model, expressed in billions (G), and measures its computational load. FPS is the number of images processed per second and measures inference speed. The results are shown in Table 2.

Table 2. The computational expense of each model.

As shown in Table 2, the parameter count and computational complexity of the proposed model (29.6 M parameters, 70.2 G FLOPs) are slightly higher than those of the PP-YOLOE-m model (25.1 M, 53.9 G) and the RT-DETR-R18 model (23.8 M, 68.1 G). However, the real-time inference speed of the proposed method (139 FPS) is significantly higher than that of the two baselines (45 FPS and 70 FPS), which improves the efficiency of insulator defect detection and allows large volumes of UAV aerial images to be processed effectively. The term "lightweight" in this paper is therefore not judged by model size alone, but by inference time in relation to model size. The proposed model is trained on a high-quality dataset and improves YOLOv8m with a coordinate attention mechanism, both of which contribute to its performance advantage.
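The trade-off discussed above can be checked with simple arithmetic on the values in Table 2. The sketch below uses only those published numbers. The per-frame latency is derived as 1000 / FPS, which assumes one image per forward pass; the paper does not state the batch size.

```python
# Values copied from Table 2 of this paper.
TABLE_2 = {
    "PP-YOLOE-m": {"params_m": 25.1, "flops_g": 53.9, "fps": 45},
    "RT-DETR-R18": {"params_m": 23.8, "flops_g": 68.1, "fps": 70},
    "Proposed method": {"params_m": 29.6, "flops_g": 70.2, "fps": 139},
}


def compare_to_proposed(table, proposed_key="Proposed method"):
    """Return, per baseline, the proposed model's figures relative to that baseline."""
    proposed = table[proposed_key]
    summary = {}
    for name, row in table.items():
        if name == proposed_key:
            continue
        summary[name] = {
            "params_ratio": proposed["params_m"] / row["params_m"],
            "flops_ratio": proposed["flops_g"] / row["flops_g"],
            "fps_ratio": proposed["fps"] / row["fps"],
            # Assumption: one image per forward pass, so latency (ms) = 1000 / FPS.
            "latency_saved_ms": 1000.0 / row["fps"] - 1000.0 / proposed["fps"],
        }
    return summary


summary = compare_to_proposed(TABLE_2)
for name, s in summary.items():
    print(
        f"{name}: params x{s['params_ratio']:.2f}, FLOPs x{s['flops_ratio']:.2f}, "
        f"FPS x{s['fps_ratio']:.2f}, {s['latency_saved_ms']:.1f} ms saved per frame"
    )
```

By this calculation, the proposed model has about 18% and 24% more parameters than PP-YOLOE-m and RT-DETR-R18, respectively, while running at roughly 3.1 and 2.0 times their frame rates.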

The insulator defect detection model proposed in this paper primarily targets missing insulators. For defects such as breakage, cracks, pollution, mechanical deformation, and pin loss, the model performs poorly. This is attributed to two factors. First, the complex characteristics of these defects make it impossible to generate high-quality training samples based on vision–language modeling. Second, detecting these defect types would substantially increase model parameters and computational costs, so the model would no longer meet lightweight standards.

4. Conclusions

This paper proposes a lightweight model for insulator defect detection based on vision–language models and prior knowledge to meet the real-time requirements of insulator defect detection. This model significantly reduces model parameters while ensuring detection accuracy, thereby lowering computational costs during the detection phase. Prior knowledge, such as insulator installation specifications and mechanical structures, is first extracted, and the extracted textual information guides the SD model to generate defective insulator samples. To reduce the impact of complex environments on the authenticity of the generated samples, this paper first segments the insulators from the original image and then pastes the generated defective insulator samples back into the original background, which preserves the authenticity of the generated samples and addresses the imbalance between normal and defective samples. To reduce the inference cost of the detection model, improve detection efficiency, and meet the requirements of real-time detection scenarios, this paper adopts a single-stage detection algorithm, with the YOLO model as the main detector. The YOLO model is optimized with the coordinate attention mechanism, which reduces two-dimensional pooling to one-dimensional computation and thereby reduces model parameters and computational complexity. Finally, this paper mixes real and generated defect samples. The experimental results show that the proposed method achieves high detection accuracy and that, after training on a high-quality dataset, the proposed model reaches 139 FPS, far exceeding the benchmark models. The lightweight model can capture the defect features of insulators and maintain high detection accuracy at low computational cost, meeting the requirements on detection accuracy and computational resources in real-time detection scenarios.

Insulator strings exhibit many types of defects, including breakage, cracks, and pollution. In future work, we will further explore the characteristics of complex insulator defects and improve the detection accuracy of the model for micro-defects in insulators.

Author Contributions

Conceptualization, S.L., W.Z., S.Y., H.B., W.M. and S.X.; Methodology, S.L., W.Z., S.Y., H.B., W.M. and S.X.; Software, S.L., W.Z., S.Y., H.B., W.M. and S.X.; Writing—original draft, S.L., W.Z., S.Y., H.B., W.M. and S.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the State Grid Henan Electric Power Company Science and Technology Project: Key Technologies for Lightweight Training of Multi-modal Large Models for Power Grids (Project Number: 521702240015).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

We would like to express our sincere gratitude to the reviewers for their professional opinions and valuable suggestions, and our heartfelt thanks to the editorial board of the journal for their efficient work in supporting the rigorous presentation of the scholarly results. We would like to pay tribute to the academic community.

Conflicts of Interest

Authors Shanfeng Liu, Shaoguang Yuan, Wandeng Mao and Shengzhe Xi were employed by the Electrical Power Research Institute of Henan Electric Power Corporation. Author Weijian Zhang was employed by State Grid Henan Electric Power Company. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Zhang, T.; Zhong, S.; Xu, W.; Yan, L.; Zou, X. Catenary Insulator Defect Detection: A Dataset and an Unsupervised Baseline. IEEE Trans. Instrum. Meas. 2024, 73, 6006015.
Dai, W.; Li, H.; Liu, H.; Goh, H.H.; Yuan, X.; Liu, Y.; Chen, B. An Efficient Affine Arithmetic-Based Optimal Dispatch Method for Active Distribution Networks with Uncertainties of Electric Vehicles. IEEE Trans. Sustain. Energy 2025, 16, 1021–1036.
Cheng, Y.; Liu, D. AdIn-DETR: Adapting Detection Transformer for End-to-End Real-Time Power Line Insulator Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 3528511.
Liu, Q.; Liu, Y.; Yan, Y.; Jiang, Q.; Jiang, X. Addressing Domain Shift in Insulator Defect Data: A Generalization Framework for Cross-Domain Detection of Broken and Self-Blast Insulator Defect. IEEE Trans. Instrum. Meas. 2025, 74, 5037614.
Dai, W.; Li, D.; Liu, H.; Liu, Y. A Cost Surrogate Model for TSO-DSO Coordination based on Polynomial Chaos Expansion. IEEE Trans. Power Syst. 2025, 40, 3580–3583.
Yang, K.; Gao, S.; Yu, L.; Zhang, D.; Wang, J.; Song, C. A Real-Time Siamese Network Based on Knowledge Distillation for Insulator Defect Detection of Overhead Contact Lines. IEEE Trans. Instrum. Meas. 2024, 73, 2529916.
Li, C.; Shi, Y.; Lu, M.; Zhou, S.; Xie, C.; Chen, Y. A Composite Insulator Overheating Defect Detection System Based on Infrared Image Object Detection. IEEE Trans. Power Deliv. 2025, 40, 203–214.
Xu, J.; Liao, H.; Li, K.; Jiang, C.; Li, D. Multiscale Feature Fusion Transformer with Hybrid Attention for Insulator Defect Detection. IEEE Trans. Instrum. Meas. 2025, 74, 3539813.
Zhang, N.; Yang, G.; Wang, D.; Hu, F.; Yu, H.; Fan, J. A Defect Detection Method for Substation Equipment Based on Image Data Generation and Deep Learning. IEEE Access 2024, 12, 105042–105054.
Liu, B.; Jiang, W. DFKD: Dynamic Focused Knowledge Distillation Approach for Insulator Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 5039916.
Lu, Z.; Li, Y.; Shuang, F.; Han, C. InsDef: Few-Shot Learning-Based Insulator Defect Detection Algorithm with a Dual-Guide Attention Mechanism and Multiple Label Consistency Constraints. IEEE Trans. Power Deliv. 2023, 38, 4166–4178.
Zhang, Q.; Zhang, J.; Li, Y.; Zhu, C.; Wang, G. ID-YOLO: A Multimodule Optimized Algorithm for Insulator Defect Detection in Power Transmission Lines. IEEE Trans. Instrum. Meas. 2025, 74, 3505611.
Dai, W.; Xu, J.; Goh, H.H.; Shi, T.; Zeng, Z. Small Signal Equivalent Modeling for Large ES-Embedded DFIG Wind Farm with Dynamic Frequency Response. IEEE Trans. Power Syst. 2025, 40, 2324–2335.
Wen, L.; Huang, X.; Cao, B.; Wang, L. An Impulse Voltage-Based Compact Device for Porcelain Insulator Defect Detection. IEEE Trans. Dielectr. Electr. Insul. 2024, 31, 3185–3192.
Cao, Y.; Xu, H.; Su, C.; Yang, Q. Accurate Glass Insulators Defect Detection in Power Transmission Grids Using Aerial Image Augmentation. IEEE Trans. Power Deliv. 2023, 38, 956–965.
Li, Y.; Zhang, W.; Li, P.; Ning, Y.; Suo, C. A method for autonomous navigation and positioning of UAV based on electric field array detection. Sensors 2021, 21, 1146.
Ma, Y.; Li, Q.; Chu, L.; Zhou, Y.; Xu, C. Real-time detection and spatial localization of insulators for UAV inspection based on binocular stereo vision. Remote Sens. 2021, 13, 230.
Li, Y.; Zhu, C.; Zhang, Q.; Zhang, J.; Wang, G. IF-YOLO: An Efficient and Accurate Detection Algorithm for Insulator Faults in Transmission Lines. IEEE Access 2024, 12, 167388–167403.
Tong, H.; Zeng, X.; Yu, K.; Zhou, Z. A Fault Identification Method for Animal Electric Shocks Considering Unstable Contact Situations in Low-Voltage Distribution Grids. IEEE Trans. Ind. Inform. 2025, 21, 4039–4050.
Li, Z.; Guo, R.; Lai, Q.; Yang, J.; Yong, M.; Wang, L.; Fu, S.Y. Survey of inspection technology of overhead transmission line robot based on computer vision. Electr. Power 2018, 51, 139–146.
Xia, Y.; Li, Z.; Xi, Y.; Wu, G.; Peng, W.; Mu, L. Accurate Fault Location Method for Multiple Faults in Transmission Networks Using Travelling Waves. IEEE Trans. Ind. Inform. 2024, 20, 8717–8728.
Yi, W.; Ma, S.; Li, R. Insulator and Defect Detection Model Based on Improved Yolo-S. IEEE Access 2023, 11, 93215–93226.
Ding, L.; Rao, Z.Q.; Ding, B.; Li, S.J. Research on Defect Detection Method of Railway Transmission Line Insulators Based on GC-YOLO. IEEE Access 2023, 11, 102635–102642.
Wang, M.; Du, W.; Sun, H.; Zhang, J. Transmission line fault diagnosis method based on infrared image recognition. Infrared Technol. 2017, 39, 383–386.
Li, D.; Lu, Y.; Gao, Q.; Li, X.; Yu, X.; Song, Y. LiteYOLO-ID: A Lightweight Object Detection Network for Insulator Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 5023812.
Lu, G.; Li, B.; Chen, Y.; Qu, S.; Cheng, T.; Zhou, J. Precision in Aerial Surveillance: Integrating YOLOv8 with PConv and CoT for Accurate Insulator Defect Detection. IEEE Access 2025, 13, 49062–49075.
Li, Z.; Jiang, C.; Li, Z. An Insulator Location and Defect Detection Method Based on Improved YOLOv8. IEEE Access 2024, 12, 106781–106792.
Wang, Y.; Qu, Z.; Hu, Z.; Yang, C.; Huang, X.; Zhao, Z.; Zhai, Y. Cross-Domain Multilevel Feature Adaptive Alignment R-CNN for Insulator Defect Detection in Transmission Lines. IEEE Trans. Instrum. Meas. 2025, 74, 6001112.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
Shen, W.; Fang, M.; Wang, Y.; Xiao, J.; Chen, H.; Zhang, W.; Li, X. AE-YOLOv5 for Detection of Power Line Insulator Defects. IEEE Open J. Comput. Soc. 2024, 5, 468–479.
Lei, D.; Zhu, H.; Ren, H.; Lin, T.; Lin, K. A novel lightweight MT-YOLO detection model for identifying defects in permanent magnet tiles of electric vehicle motors. Expert Syst. Appl. 2025, 288, 128247.
Mei, Z.; Xu, H.; Yan, L.; Wang, K. IALF-YOLO: Insulator defect detection method combining improved attention mechanism and lightweight feature fusion network. Measurement 2025, 253, 117701.
Sánchez, V.; Güven, Ç.; Nápoles, G.; Postma, M.Š. Data Augmentation Techniques for fMRI Data: A Technical Survey. IEEE Access 2025, 13, 66529–66556.
Lee, H.; Jin, Z.; Woo, J.; Noh, B. SaliencyMix+: Noise-Minimized Image Mixing Method with Saliency Map in Data Augmentation. IEEE Access 2025, 13, 21734–21743.
Im, J.; Kasahara, J.Y.L.; Maruyama, H.; Asama, H.; Yamashita, A. Blend AutoAugment: Automatic Data Augmentation for Image Classification Using Linear Blending. IEEE Access 2024, 12, 68770–68784.
Bakirci, M.; Bayraktar, I. Synthetic Data-Enhanced Microcrack Detection in Photovoltaic (PV) Cells Using YOLO11 for Smart Industry Applications. In Proceedings of the 2025 International Russian Smart Industry Conference (SmartIndustryCon), Sochi, Russia, 24–28 March 2025; pp. 46–51.
Boikov, A.; Payor, V.; Savelev, R.; Kolesnikov, A. Synthetic Data Generation for Steel Defect Detection and Classification Using Deep Learning. Symmetry 2021, 13, 1176.
Lu, Y.; Li, D.; Li, D.; Li, X.; Gao, Q.; Yu, X. A Lightweight Insulator Defect Detection Model Based on Drone Images. Drones 2024, 8, 431.
Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695.
Sun, H.; Cao, Y.; Dong, H.; Fink, O. Anomaly Anything: Promptable Unseen Visual Anomaly Generation. arXiv 2025, arXiv:2406.01078.
Wang, Y.; Song, X.; Feng, L.; Zhai, Y.; Zhao, Z.; Zhang, S.; Wang, Q. MCI-GLA Plug-In Suitable for YOLO Series Models for Transmission Line Insulator Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 9002912.
Cao, Z.; Chen, K.; Chen, J.; Chen, Z.; Zhang, M. CACS-YOLO: A Lightweight Model for Insulator Defect Detection Based on Improved YOLOv8m. IEEE Trans. Instrum. Meas. 2024, 73, 3530710.
Wang, Z. Insulator Data Set—Chinese Power Line Insulator Dataset (CPLID). GitHub. 2021. Available online: https://github.com/InsulatorData/InsulatorDataSet (accessed on 7 January 2025).
Chen, K.; Sun, D.; Zhang, M. Research on remote sensing object detection based on improved PP-YOLOE-R. In Proceedings of the International Conference on Pattern Recognition and Image Analysis (PRIA 2024), Nanjing, China, 18–20 October 2024; SPIE: Bellingham, WA, USA, 2025; Volume 1364, pp. 8–16.
Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Table 1. The detection accuracy of each model.
Model	Precision (%)	Recall (%)	mAP@.5:.95 (%)
PP-YOLOE-m	98.3	97.8	83.1
RT-DETR-R18	99.3	98.9	90.3
Proposed method	99.8	99.3	95.7

Table 2. The computational expense of each model.
Model	Params (M)	FLOPs (G)	FPS
PP-YOLOE-m	25.1	53.9	45
RT-DETR-R18	23.8	68.1	70
Proposed method	29.6	70.2	139