Insulator Defect Detection in Complex Environments Based on Improved YOLOv8

Qin, Yuxin; Zeng, Ying; Wang, Xin

doi:10.3390/e27060633

by Yuxin Qin ^1,2, Ying Zeng ^3 and Xin Wang ^1,*

^1 School of Electrical and Information Engineering, Hunan University of Technology, Zhuzhou 412007, China
^2 School of Computer Science, University of Glasgow, Glasgow G12 8QQ, UK
^3 School of Computer Science, Hunan University of Technology, Zhuzhou 412007, China
^* Author to whom correspondence should be addressed.

Entropy 2025, 27(6), 633; https://doi.org/10.3390/e27060633

Submission received: 27 March 2025 / Revised: 8 May 2025 / Accepted: 9 June 2025 / Published: 13 June 2025

(This article belongs to the Section Signal and Data Analysis)

Abstract

Insulator defect detection is important for the safe and stable operation of power systems. To address the low accuracy, high latency, and large model size of existing detectors in complex environments, an improved YOLOv8 target detection network for insulator defects based on bidirectional weighted feature fusion is proposed, following the principle of progressive extraction from high-entropy details to low-entropy semantics. A C2f_DSC feature extraction module was designed to capture more features of tubular insulator structures; an EMA (efficient multi-scale attention) mechanism in the backbone network and a BiFPN (bidirectional weighted feature pyramid network) fusion layer were introduced to extract diverse features in complex environments; and EIOU (efficient intersection over union) was used as the loss function to accelerate model convergence. The proposed algorithm was evaluated on the CPLID (China Power Line Insulator Dataset). The results show that the model size is only 6.40 M and that the mean average precision on CPLID reaches 98.6%, 0.8% higher than that of YOLOv8n. Compared with other lightweight models, such as YOLOv8s, YOLOv6, YOLOv5s, and YOLOv3 Tiny, the proposed algorithm reduces the model size while improving accuracy, demonstrating its practicality and feasibility for edge devices.

Keywords:

insulator defect detection; improved YOLOv8; C2f_DSC network; feature fusion; entropy

1. Introduction

Insulators are used to fix conductors and ensure smooth electricity transmission. However, they are easily damaged by exposure to adverse outdoor weather conditions [1]. Traditional inspection usually involves staff operating drones to capture images and videos of insulators and then manually analyzing them to identify defects and potential hazards [2]. AI-driven and information theory-based solutions have become indispensable in modern industries over the past decade [3,4,5]. Automatic defect detection is an efficient method that has been applied in many areas; it not only improves production efficiency but also protects employees from potential hazards [6,7,8]. Many deep-learning methods for insulator defect detection have now been proposed [9]: one-step algorithms with regression, such as SSD (single-shot multibox detector) [10,11] and the YOLO series [12,13,14,15,16]; two-step algorithms with candidate regions, such as R-CNN [17], Faster R-CNN [18,19], and Mask R-CNN [20]; and hybrid methods, such as YOLO-HMC [6] and YOLOu-Quasi-ProtoPNet [21]. The YOLO series has become one of the main algorithm families in object detection applications in recent years. Wu et al. [22] proposed an improved YOLOv3 algorithm for insulator defect detection that accelerates computation by replacing the Darknet-53 structure in the backbone feature extraction network with the MobileNetV1 structure. However, its accuracy is relatively low. Li et al. [23] added a SimAM attention mechanism to YOLOv3, effectively improving image feature extraction without changing the original feature pyramid network. However, this method cannot accurately recognize targets in complex environments. To better capture effective information, Xiao et al. [24] integrated a GAM attention mechanism and an ASFF adaptive feature fusion mechanism into YOLOv5, but its object detection accuracy in complex environments is relatively low.
Wang et al. [25] proposed an insulator defect detection method based on AMC-YOLOX-s, inserting a CA coordinate attention mechanism into CSPDarkNet to improve the classification and localization ability of insulators. Zou et al. [26] integrated a CA attention mechanism and BiFPN into the YOLOv7 algorithm and used data augmentation to improve the model performance in detecting insulator defects in complex environments, but the detection speed was slow. Jia et al. [27] integrated depth-wise separable convolution, point-wise convolution, and an ECA attention mechanism into YOLOv5 to improve the shallow network’s ability, greatly reducing the model’s complexity and solving its embedding problems in mobile devices. However, the model’s accuracy is still low.

Although insulators can be detected well using the above algorithms, it remains difficult to recognize insulator defect targets in complex environments in real time. Doing so requires balancing model size, detection speed, and accuracy so that the model can run in embedded applications on edge devices [28]. To address these problems, an improved YOLOv8 target detection network for insulator defects based on bidirectional weighted feature fusion is proposed. The main contributions of this paper are as follows:

(1): To tackle the intricate morphology of insulators, a new C2f_DSC network module was designed, combining a dynamic snake convolution (DSConv) kernel [29] with entropy-regulated feature compression. This convolution kernel structure can better capture the basic features of insulator defect areas, improve the perception ability of subtle defect targets, and enhance the robustness and accuracy of the algorithm.
(2): To address the entropy imbalance and feature fusion in insulator defect detection, BiFPN [30] was improved by adjusting its parameters and connections for multi-scale feature fusion, prioritizing defect-related features, and thereby enhancing recognition accuracy.
(3): The EMA mechanism [31] was integrated into the model, incorporating high-information–content features and performing weighted average processing on feature maps during training, highlighting key information, and improving the model’s attention to insulator defect areas. This integration enhances the model’s self-adaptive feature adjustment ability, enabling it to maintain good performance under different conditions.
(4): The EIOU loss function [32] was applied to YOLOv8. Compared with conventional CIOU (Complete Intersection over Union) [33], it explicitly incorporated geometric discrepancies between the target and the anchor boxes and further minimized geometric differences (e.g., center distance and aspect ratio) in a statistically guided manner, thereby improving the model’s convergence speed, accuracy, and stability.

These improvements form a unified entropy-driven framework, achieving synergistic optimization and enhancing feature extraction capability, accurate recognition, and localization ability for insulator defect targets in complex environments.

The remainder of this article is organized as follows: An improved YOLOv8 is proposed in Section 2. The performance of the improved YOLOv8 and its competitors is compared in Section 3. Finally, conclusions are drawn in Section 4.

2. Materials and Methods

2.1. YOLOv8 Algorithm

The YOLOv8 algorithm was released by Ultralytics (Frederick, MD, USA) in 2023 [15]. Its structure includes a backbone network, an anchor-free network, and a new loss function, and it achieves entropy-regulated feature optimization through three key mechanisms: information compression in the backbone network, entropy balancing within the feature pyramid, and entropy minimization at the detection head. Therefore, YOLOv8 is extremely efficient and can run on various platforms, from CPU to GPU. It is easy for users to switch between different YOLO versions.

YOLOv8’s backbone network is similar to that of YOLOv5. It is based on the CSP network structure and the ELAN structure in YOLOv7 [14]. The C2f module is formed by combining C3 and ELAN (efficient layer aggregation network), which enables YOLOv8 to capture richer features while remaining lightweight. The backbone network of YOLOv8 still uses the popular SPPF (spatial pyramid pooling fast) module, which passes the features sequentially through three 5 × 5 max-pooling layers and then concatenates the outputs of the layers. This preserves detection accuracy for objects at different scales while keeping the module lightweight.
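The sequential pooling in SPPF can be illustrated with a minimal pure-Python sketch. It assumes stride-1 pooling with "same" padding and omits the 1 × 1 convolutions that surround the pooling layers in the real module; the function names are hypothetical. It shows why three chained 5 × 5 pools are an economical design: the second and third outputs equal single 9 × 9 and 13 × 13 pools of the input.

```python
NEG_INF = float("-inf")


def max_pool_2d(grid, k):
    """Stride-1 max pool with an odd k x k window; out-of-range cells are ignored."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            best = NEG_INF
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        best = max(best, grid[y][x])
            row.append(best)
        out.append(row)
    return out


def sppf_branches(grid):
    """Return the input and the outputs of three successive 5 x 5 max pools.

    In SPPF these four maps are concatenated along the channel axis.
    """
    p1 = max_pool_2d(grid, 5)
    p2 = max_pool_2d(p1, 5)
    p3 = max_pool_2d(p2, 5)
    return [grid, p1, p2, p3]


# Small deterministic 12 x 12 map used only for illustration.
feature_map = [[(7 * i + 13 * j * j) % 31 for j in range(12)] for i in range(12)]
branches = sppf_branches(feature_map)
```

Because each stage reuses the previous result, the larger receptive fields are obtained without ever evaluating a 9 × 9 or 13 × 13 window directly.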

In the neck, PAN [34] + FPN [35] remains the main feature fusion method used by YOLOv8; the neck includes two upsampling operations and multiple C2f modules and is followed by a decoupled head structure. This architecture achieves multi-scale entropy balancing through complementary integration of high-level (low-entropy) and low-level (high-entropy) features while optimizing entropy-regulated information flow across network hierarchies. Following YOLOX [36], YOLOv8 applies a decoupled head after the neck. YOLOv8 currently has five versions, which differ in the depth and width of their networks. The YOLOv8n network structure is shown in Figure 1. It includes four core components: input, backbone feature extraction network, feature fusion network (neck), and prediction (head). The input mainly includes image data augmentation, image resizing, and adaptive anchors. The neck mainly includes the backbone network, feature pyramid, and loss function. The head mainly includes prediction box generation, prediction box filtering, and output.

2.2. Improved YOLOv8 Target Detection Network for Insulator Defects

An improved YOLOv8 target detection network for insulator defects based on bidirectional weighted feature fusion is proposed, as shown in Figure 2. Data augmentation was first implemented to enhance dataset diversity. A C2f_DSC module was specifically designed on the original backbone network to better adapt to complex insulator morphological variations. Concurrently, a bidirectional weighted feature fusion mechanism and a novel composite loss function were integrated to optimize target detection efficiency while balancing computational demands and memory constraints. These architectural enhancements enable precise capture of subtle defect signatures, thereby achieving accurate identification of both insulators and defect targets in challenging real-world scenarios.

2.2.1. Entropy-Driven Information-Guided Augmentation

To address the limited defect samples in the CPLID dataset, we developed an information–theoretic augmentation framework that generates semantically meaningful synthetic samples. The core principle involves selecting augmentation strategies (e.g., CutMix, Mosaic) to maximize mutual information (MI) between input images and defect classes:

I(X; Y) = H(X) - H(X \mid Y)

(1)

where X is the input image and Y is the defect category.
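As a small worked illustration of Equation (1), the sketch below computes I(X; Y) from a discrete joint probability table. Treating X as a discretized image statistic rather than a full image is an assumption made only to keep the example small, and the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X; Y) = H(X) - H(X | Y), where joint[x][y] = P(X = x, Y = y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_x = entropy(p_x)
    # H(X | Y) is the P(y)-weighted average of the entropies of P(X | Y = y).
    h_x_given_y = 0.0
    for y, py in enumerate(p_y):
        if py > 0:
            conditional = [joint[x][y] / py for x in range(len(joint))]
            h_x_given_y += py * entropy(conditional)
    return h_x - h_x_given_y


# X and Y independent: observing the class tells nothing about X.
mi_independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
# X fully determined by Y: one full bit is shared.
mi_dependent = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

Under this reading, an augmentation strategy is preferred when the samples it generates keep the image statistic informative about the defect class, that is, when I(X; Y) stays high.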

2.2.2. C2f_DSC Module

To extract local weak structural features, as well as diverse global morphological features at various scales, DSConv [29] was introduced and stacked multiple times to expand the receptive field ranges. DSConv is a convolutional neural network (CNN) technique in deep learning aimed at addressing certain limitations in convolution operations, leveraging entropy-driven adaptive mechanisms to enhance the network’s capacity for irregular data. Specifically, it dynamically adjusts the shape and size of convolutional kernels based on local feature entropy and data irregularity, optimizing kernel configurations through information–theoretic prioritization of high-information regions. The principle of increasing the receptive field through DSConv convolution is as follows:

In the standard 2D convolution coordinate system K with center coordinate K_i = (x_i, y_i), a conventional 3 × 3 convolution kernel with a dilation rate of 1 is defined as K = {(x − 1, y − 1), (x − 1, y), …, (x + 1, y + 1)}. Inspired by deformable convolution [37], we introduce an entropy-optimized deformation offset ∆ to the dynamic snake convolution, where the elastic deformation of kernels along the x–y axes is guided by minimizing the feature spatial entropy H(p). Specifically, the selection process for each grid position K_{i±c} = (x_{i±c}, y_{i±c}) (where c ∈ {0, 1, 2, 3, 4} is the horizontal distance from the center grid) incorporates the principle of maximizing local mutual information I(X;Y), ensuring the dynamic adjustment of the offset ∆ = {δ | δ ∈ [−1, 1]} and following the information bottleneck theory. This entropy-constrained deformation mechanism guarantees that the accumulation process K_{i+1} = K_i + ∆ consistently focuses on low-entropy, high-information-density regions. Not only does this approach enable each pixel in deeper network layers to cover wider input areas for capturing global features, but it also significantly improves information extraction efficiency by suppressing computational redundancy in high-entropy noise regions. As shown in Figure 3, compared to traditional convolutions, the resulting receptive field demonstrates distinct entropy-adaptive characteristics, exhibiting superior information-capturing capability in complex scenarios.
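The accumulation of offsets can be sketched as follows for a kernel unrolled along the x-axis. This is a geometric illustration only: it assumes one vertical offset δ per step away from the center, takes the offsets as given rather than learned, and does not model the entropy-based guidance described above. The function name and arguments are hypothetical.

```python
def snake_kernel_x(center, offsets_right, offsets_left):
    """Sampling coordinates of a snake kernel unrolled along the x-axis.

    center: (x_i, y_i) of the central grid position.
    offsets_right / offsets_left: vertical offsets delta in [-1, 1], one per
    step away from the center (c = 1, 2, ...). Each position adds its offset
    to the previous one, so the kernel bends but stays connected.
    """
    x0, y0 = center
    for delta in list(offsets_right) + list(offsets_left):
        if not -1.0 <= delta <= 1.0:
            raise ValueError("each offset must lie in [-1, 1]")

    coords = [(x0, y0)]
    y = y0
    for c, delta in enumerate(offsets_right, start=1):
        y += delta                      # accumulate: K_{i+1} = K_i + delta
        coords.append((x0 + c, y))
    y = y0
    for c, delta in enumerate(offsets_left, start=1):
        y += delta
        coords.insert(0, (x0 - c, y))
    return coords


# Nine sampling positions (c = 0..4 on each side of the center).
coords = snake_kernel_x(
    center=(0, 0),
    offsets_right=[0.5, 0.5, -1.0, 0.25],
    offsets_left=[0.0, -0.5, -0.5, 1.0],
)
```

The resulting y-coordinates are generally fractional, so an actual layer would need to interpolate feature values at these positions (for example bilinearly), as deformable convolution does.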

Compared with other convolution methods, dynamic snake convolution performs better in extracting features of tubular structures, as its design helps capture their feature information more accurately and comprehensively. The final C2f_DSC module structure is shown in Figure 4. A 1 × 1 convolution was employed to alter the number of channels in the input features, and a split operation then replaced a further 1 × 1 convolution for feature segmentation. More skip connections were also used to reduce the number of parameters while extracting richer multi-scale insulator features, enabling recognition of the complex structural and morphological features of insulators.

2.2.3. EMA Mechanism

The attention mechanism plays an important role in computer vision, as it helps models focus on the most critical local information in images, especially target-related features. High-information features can therefore be selected through mutual information to optimize the network architecture or to design attention mechanisms. Here, an EMA mechanism was incorporated into the final stage of the backbone network to help the model concentrate on salient visual features, thereby improving its performance. Moreover, EMA establishes local cross-channel interactions in each parallel sub-network without channel dimensionality reduction, which helps the network avoid excessive sequential processing and increased depth. The EMA module structure [31] is shown in Figure 5, where “G” is the number of groups into which the channels are divided, “X Avg Pool” is one-dimensional horizontal global average pooling, “Y Avg Pool” is one-dimensional vertical global average pooling, “Avg Pool” is two-dimensional global average pooling, C is the number of input channels, and H and W are the spatial dimensions of the input features. To gather multi-scale spatial structural information and respond quickly, EMA places 3 × 3 and 1 × 1 branches in parallel. At the same time, the grouped features and multi-scale structure effectively establish both short-term and long-term dependencies. The process of the EMA mechanism is as follows:

Firstly, the features are grouped. For any feature map X ∈ R^{C × H × W}, X = [X_0, X_i, …, X_{G−1}], where X_i ∈ R^{C//G × H × W}. Here, X is split into G sub-feature groups along the channel dimension. This division helps the model learn different semantics. Without loss of generality, G ≪ C is taken, and the learned attention-weighted descriptors are used to focus on the regions of interest in each sub-feature.

Secondly, parallel subnetworks are utilized. Multi-scale spatial information can be collected owing to the neurons’ large local receptive fields. EMA therefore extracts attention-weighted descriptors for the grouped feature maps through three parallel routes. One route is a 3 × 3 branch, and the other two are 1 × 1 branches, which capture dependencies among all channels while also reducing computational complexity. Specifically, the channels are encoded along two spatial directions in the 1 × 1 branches through dual 1D global average pooling operations, while localized cross-channel dependencies are captured via a 3 × 3 convolution in the 3 × 3 branch to enlarge the feature space.

Finally, cross-spatial learning is conducted. EMA introduces two tensors: one is the 1 × 1 branch output, and the other is the 3 × 3 branch output. The global spatial information in the 1 × 1 branch output is encoded using two-dimensional global average pooling. Before the joint activation mechanism of channel features, the smallest branch output is directly transformed into the corresponding dimensional shape to preserve the precise spatial position information. Within each group, the output feature maps are fused into dual spatial attention weights, and pixel-level pair-wise relationships are captured through the Sigmoid function, while the EMA module’s final output size remains the same as its input size.
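The first two steps (channel grouping and the two one-dimensional average pools) can be sketched in pure Python as below. This is a deliberately reduced illustration: the shared 1 × 1 convolution, the 3 × 3 branch, and the cross-spatial learning step are all omitted, and the reading that “X Avg Pool” yields one value per row and “Y Avg Pool” one value per column is an assumption. All function names are hypothetical.

```python
import math


def split_groups(channels, g):
    """Split a list of C channel maps into G groups of C // G channels each."""
    c = len(channels)
    if c % g != 0:
        raise ValueError("C must be divisible by G")
    size = c // g
    return [channels[k * size:(k + 1) * size] for k in range(g)]


def x_avg_pool(fmap):
    """Average across the width: one descriptor per row (length H)."""
    return [sum(row) / len(row) for row in fmap]


def y_avg_pool(fmap):
    """Average across the height: one descriptor per column (length W)."""
    return [sum(col) / len(fmap) for col in zip(*fmap)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def directional_gate(fmap):
    """Reweight each pixel by sigmoid(row descriptor) * sigmoid(column descriptor)."""
    rows = [sigmoid(v) for v in x_avg_pool(fmap)]
    cols = [sigmoid(v) for v in y_avg_pool(fmap)]
    return [[fmap[i][j] * rows[i] * cols[j] for j in range(len(cols))]
            for i in range(len(rows))]


# C = 8 channels of size H x W = 2 x 3, split into G = 4 groups.
channels = [[[float(c + i + j) for j in range(3)] for i in range(2)] for c in range(8)]
groups = split_groups(channels, 4)
first_map = groups[0][0]
gated = directional_gate(first_map)
```

The gate leaves the spatial size unchanged, which is consistent with the statement that the module’s output has the same size as its input.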

2.2.4. Improved Feature Fusion Layer

The traditional YOLO series employs PAN + FPN for feature fusion, integrating multi-scale features through horizontal connections and hierarchical pyramid structures. However, targets in complex scenes often exhibit multi-scale variation, occlusion, and information entropy imbalance, issues that conventional fusion methods address poorly. To enhance detection performance without inflating model size, we adopted a bidirectional weighted feature fusion network (BiFPN). BiFPN has the following advantages:

(1): Entropy-constrained adaptive weighting is introduced to optimize feature transmission. BiFPN maximizes mutual information flow across scales by dynamically learning feature weights based on local information density, improving feature expressiveness, and reducing information entropy loss during fusion.
(2): Fine-grained features are preserved by pruning low-information pathways (high entropy noise) while reinforcing high-information channels. This entropy-driven selection elevates detection accuracy.
(3): Feature weights are adjusted dynamically using entropy-minimized criteria, ensuring fusion prioritizes semantically rich (low entropy) regions. This flexibility adapts to target scale variations and occlusions in complex scenes.

The model achieves higher detection accuracy and stability by integrating BiFPN, especially for targets with entropy-diverse scales (e.g., small objects in cluttered backgrounds). Different feature fusion structures are shown in Figure 6.
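The weighted fusion at a single BiFPN node can be sketched with the fast normalized fusion rule from the original BiFPN design, O = Σ_i w_i / (ε + Σ_j w_j) · I_i with each w_i kept non-negative. Whether the improved network uses exactly this rule is not stated in the text, so this is an assumption; the features are flattened to plain lists and the function name is hypothetical.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-sized feature vectors with learnable non-negative weights.

    features: list of equally long lists (flattened feature maps).
    weights:  one scalar per input; negative values are clipped to 0 (ReLU).
    """
    if len(features) != len(weights):
        raise ValueError("one weight is required per input feature")
    clipped = [max(0.0, w) for w in weights]
    total = eps + sum(clipped)          # eps avoids division by zero
    norm = [w / total for w in clipped]
    length = len(features[0])
    return [sum(norm[k] * features[k][i] for k in range(len(features)))
            for i in range(length)]


# The second input carries three times the weight of the first.
fused = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
# A negative weight is clipped, so only the second input contributes.
fused_clipped = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [-2.0, 1.0])
```

Normalizing by the sum of the weights instead of applying a softmax keeps each contribution between 0 and 1 while avoiding an exponential per weight, which suits lightweight models.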

2.2.5. Improved Loss Function

To better evaluate the similarity between the target and the predicted box and to measure more accurately how well the predictions match the actual targets, EIOU was adopted as the target loss function. It consists of an overlap loss (L_{IOU}), a center distance loss (L_{dis}), and a width-and-height loss (L_{asp}). The overlap and center distance losses assess similarity and matching by considering both the degree of overlap between the two boxes and the distance separating their center points. In addition, EIOU directly minimizes the differences in width and height. These optimization strategies make the model converge more quickly and effectively improve its performance. The loss function equation for CIOU [33] is as follows:

L_{CIOU} = 1 - IOU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha \nu

(2)

\alpha = \frac{\nu}{(1 - IOU) + \nu}

(3)

\nu = \frac{4}{\pi^{2}} \left( \arctan \frac{w^{gt}}{h^{gt}} - \arctan \frac{w}{h} \right)^{2}

(4)

The loss function equation for EIOU [32] is as follows:

L_{EIOU} = L_{IOU} + L_{dis} + L_{asp} = 1 - IOU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \frac{\rho^{2}(w, w^{gt})}{c_{w}^{2}} + \frac{\rho^{2}(h, h^{gt})}{c_{h}^{2}}

(5)

where b is the center point of the predicted box, b^{gt} is the center point of the real box, \rho^{2}(w, w^{gt}) is the width difference between the predicted box and the real box, \rho^{2}(h, h^{gt}) is the height difference between the predicted box and the real box, and c_{w} and c_{h} are the width and height, respectively, of the smallest bounding box that covers the predicted box and the real box.
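To make the difference between Equations (2) and (5) concrete, the following sketch evaluates both losses for axis-aligned boxes given as (x1, y1, x2, y2). It assumes that c is the diagonal length of the smallest enclosing box, as in the CIOU definition, and that ρ² denotes a squared Euclidean distance; the function names are hypothetical.

```python
import math


def iou_terms(box, gt):
    """Return IoU, squared center distance, and enclosing-box width and height."""
    inter_w = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    inter_h = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = inter_w * inter_h
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_box + area_gt - inter)
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])
    rho2 = (((box[0] + box[2]) - (gt[0] + gt[2])) ** 2
            + ((box[1] + box[3]) - (gt[1] + gt[3])) ** 2) / 4.0
    return iou, rho2, cw, ch


def ciou_loss(box, gt):
    """Equation (2): 1 - IoU + center-distance term + alpha * nu."""
    iou, rho2, cw, ch = iou_terms(box, gt)
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    nu = (4.0 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = nu / ((1.0 - iou) + nu) if nu > 0 else 0.0
    return 1.0 - iou + rho2 / (cw ** 2 + ch ** 2) + alpha * nu


def eiou_loss(box, gt):
    """Equation (5): 1 - IoU + center-distance term + width term + height term."""
    iou, rho2, cw, ch = iou_terms(box, gt)
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    return (1.0 - iou + rho2 / (cw ** 2 + ch ** 2)
            + (w - wg) ** 2 / cw ** 2 + (h - hg) ** 2 / ch ** 2)


# Same center and aspect ratio, but the prediction is half the size.
gt_box = (0.0, 0.0, 4.0, 4.0)
pred_box = (1.0, 1.0, 3.0, 3.0)
ciou = ciou_loss(pred_box, gt_box)   # aspect-ratio term vanishes
eiou = eiou_loss(pred_box, gt_box)   # width and height terms remain
```

In this example the CIOU aspect-ratio term is zero because both boxes are square, so only the overlap term penalizes the size mismatch, whereas EIOU adds explicit width and height penalties. This is one way to see why EIOU can provide a stronger gradient toward the correct box size.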

3. Results and Discussion

3.1. Experimental Dataset

To assess the effectiveness of the proposed model, we used the China Power Line Insulator Dataset (CPLID) [38], which provides images of normal insulators and synthetic defective insulators captured by drones. It contains 600 images of normal insulators and 248 images of insulators with defects. Because of the limited amount of data, we adopted data-augmentation methods [39,40] to increase the number of image samples. Following the idea of maximizing information gain, the data-augmentation methods included the following: (1) adding noise; (2) adjusting the brightness; (3) cutout; (4) rotation; (5) cropping; (6) translation; and (7) mirroring. These seven methods were randomly combined to expand the data to seven times its original size. Some sample cases are shown in Figure 7. Finally, the dataset was divided into a training set and a test set in a 9:1 ratio, with the training set consisting of 6106 augmented images and some original images, and the test set consisting of 678 original images.
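The figures quoted above can be cross-checked with a few lines of arithmetic. The interpretation in the comments, that the total corresponds to each original image plus seven augmented variants, is an assumption that happens to match the numbers; the text itself does not state it.

```python
normal_images = 600
defect_images = 248
original_images = normal_images + defect_images      # 848

train_images = 6106
test_images = 678
total_images = train_images + test_images            # 6784

# Assumption: originals are kept alongside seven augmented copies of each.
copies_per_original = total_images / original_images  # 8.0
train_fraction = train_images / total_images          # about 0.90, i.e. a 9:1 split
```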

3.2. Experimental Environment

The experimental environment is shown in Table 1, using PyTorch 1.8 as the deep-learning framework and a GPU for training. Training was run for 200 iterations. The starting learning rate was 0.01, the batch size was 64, and the weight decay coefficient was 0.0005.
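For readers who wish to reproduce a comparable setup, the hyperparameters above can be collected in a dictionary keyed by the argument names used by the Ultralytics training interface. Mapping the "200 iterations" to the `epochs` argument is an assumption, and the file names in the trailing comment are hypothetical placeholders.

```python
# Hyperparameters quoted in the text, keyed by Ultralytics-style argument names.
train_config = {
    "epochs": 200,           # "200 iterations" in the text; epochs is an assumption
    "lr0": 0.01,             # starting learning rate
    "batch": 64,             # batch size
    "weight_decay": 0.0005,  # weight decay coefficient
}

# Possible use with the Ultralytics package (not executed here; the model and
# dataset file names are hypothetical):
#   from ultralytics import YOLO
#   YOLO("improved_yolov8.yaml").train(data="cplid.yaml", **train_config)
```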

3.3. Evaluation Indicators

To evaluate the image detection model’s performance, multiple evaluation metrics were used for a comprehensive analysis. These indicators included mean average precision (mAP), precision, recall, model parameter count (Params), weight file size (Size), and inference time (Inference Time). Precision was calculated as the proportion of correctly predicted positive samples among all samples predicted as positive, whereas recall was determined as the proportion of correctly predicted positive samples relative to the total actual positive samples, as illustrated in Equations (6) and (7), respectively.

Precision = \frac{TP}{TP + FP}

(6)

Recall = \frac{TP}{TP + FN}

(7)

where TP is the number of positive samples correctly detected as positive, FP is the number of negative samples incorrectly detected as positive, and FN is the number of positive samples that were not detected.
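A minimal sketch of the precision and recall definitions follows; the counts in the example are illustrative and are not results from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Illustrative counts: 90 correct detections, 10 false alarms, 30 missed defects.
precision, recall = precision_recall(tp=90, fp=10, fn=30)
```

With these counts, precision is 0.9 (90 of the 100 detections are correct) and recall is 0.75 (90 of the 120 real defects are found).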

AP = \int_{0}^{1} P(R) \, dR

(8)

mAP = \frac{1}{m} \sum_{i=1}^{m} AP_{i}

(9)

where AP is the average precision of each category, and mAP is the mean of the average precision over all m categories, as shown in Equations (8) and (9), respectively. “mAP50” is the mAP when the intersection-over-union (IOU) threshold is 50%. “mAP50:90” is the mAP when the IOU threshold changes from 50% to 90%. The IOU threshold is used to measure the overlap between the predicted bounding boxes and the actual bounding boxes. A threshold of 50% indicates that if the predicted bounding box overlaps with the actual bounding box by more than 50%, the predicted bounding box is considered correct [41].
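The integral for AP is evaluated numerically in practice. The sketch below uses one common convention, the monotone precision envelope summed over recall steps; the text does not say which convention was used, so this choice is an assumption, and the function names and sample points are illustrative.

```python
def average_precision(recalls, precisions):
    """Approximate AP as the area under the precision-recall curve.

    recalls must be sorted in ascending order. Precision is first replaced by
    its running maximum from the right (the monotone envelope), then the area
    is summed as rectangles over each recall step.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Two operating points: precision 1.0 at recall 0.5, precision 0.5 at recall 1.0.
ap = average_precision([0.5, 1.0], [1.0, 0.5])
map_value = mean_average_precision([ap, 0.25])
```

Here the area is 0.5 × 1.0 + 0.5 × 0.5 = 0.75, and averaging it with a second class whose AP is 0.25 gives an mAP of 0.5.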

3.4. Ablation Experiment

To verify the robustness of the proposed algorithm, ablation experiments were conducted on the CPLID dataset [38]. An ablation experiment quantitatively analyzes the regulatory role of a specific component (e.g., module, layer, or connection) in an image information system by systematically removing or modifying it. The ablation experiment results are shown in Table 2, which demonstrates that the improved YOLOv8 algorithm outperformed the previous YOLO series algorithms. Although the weight size and time consumption increased slightly, the accuracy, robustness, and perceptual ability improved significantly; more accurate target localization and recognition were therefore achieved at the expense of very few resources.

The training results of the improved YOLOv8 model are shown in Figure 8. The model was trained for 200 iterations. As the training iterations increased, the losses on the training and testing sets rapidly decreased and eventually stabilized, while the precision, recall, mAP50, and mAP50:90 all rapidly increased and eventually stabilized. The mAP50 value remained stable at about 98.6%, demonstrating excellent performance during the training process.

3.5. Comparative Experiment

To further validate the reliability of the proposed algorithm, multiple common YOLO series object detection models (YOLOv3 Tiny, YOLOv5n, YOLOv5s, YOLOv6, YOLOv8n, and YOLOv8s) were compared using the indicators mentioned above, and they were trained and tested on the same CPLID dataset. The comparison results are shown in Table 3. Higher accuracy was achieved while the parameter number and model size were reduced through the proposed algorithm, with a mAP value of 98.6%. Compared with YOLOv3 Tiny, YOLOv5n, YOLOv5s, YOLOv6, YOLOv8n, and YOLOv8s, the average detection accuracy was improved by 1.5%, 0.8%, 0.7%, 1%, 0.8%, 0.2%, and 0.8%, respectively, indicating that the improved model performs well in insulator defect target detection tasks. The comparative effects of the various models shown in Figure 9 verify that the enhanced YOLOv8 algorithm exhibits higher efficiency and accuracy and can correctly identify insulator targets even if they are in obscured states. Therefore, it is suitable for insulator defect detection applications in complex environmental scenarios.

Experiments on steel surface defect detection were conducted to further demonstrate the practical applicability of the proposed model. The data are from the NEU-DET dataset (publicly released by Northeastern University in China), which contains 1800 images covering six major categories of steel surface defects: “crazing”, “inclusion”, “patches”, “pitted_surface”, “rolled-in-scale”, and “scratches”. Each category has 300 samples, of which 270 form the training set and 30 form the testing set. The main results are shown in Table 4 and Figure 10. Although the model was designed for insulator defect detection, it also adapts well to this task, with the snake convolution being particularly effective at recognizing tubular objects. The improved C2f_DSC, BiFPN, and EMA modules enhance target detection ability. Therefore, the model is expected to find good applications after further development.

4. Conclusions

Aiming at the challenges of low accuracy, high latency, and large model size in insulator defect detection under complex environments, an improved YOLOv8 insulator defect detection algorithm based on bidirectional weighted feature fusion is proposed. A novel C2f_DSC structural module is designed to expand the receptive field for tubular insulators, and an EMA mechanism is incorporated at the end of the backbone network to extract multi-scale features from complex environments. Moreover, feature fusion is achieved using BiFPN, and the EIOU loss function is employed to accelerate model convergence. Experimental evaluations on the CPLID dataset demonstrate that the proposed model occupies only 6.40 M of storage space and achieves an average detection accuracy of 98.6%. Compared to YOLOv3 Tiny, YOLOv5n, YOLOv5s, YOLOv6, YOLOv8n, and YOLOv8s, the average detection accuracy of the proposed algorithm was improved by 1.5%, 0.8%, 0.7%, 1%, 0.8%, 0.2%, and 0.8%, respectively, while the model size and detection speed were maintained. To further demonstrate the practical applicability of the improved YOLOv8, experiments on steel surface defect detection were conducted. The results validate its excellent practicality and feasibility for edge devices, providing valuable experience for other object detection tasks. Future research will focus on the model’s lightweight design, further reducing its computational complexity and storage requirements, and balancing its real-time performance to adapt to a wider range of edge devices.

Author Contributions

Conceptualization, Y.Q. and X.W.; methodology, Y.Q.; software, Y.Z.; validation, Y.Z.; formal analysis, Y.Q.; investigation, Y.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Q. and Y.Z.; writing—review and editing, X.W.; visualization, Y.Z.; supervision, X.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

Financial support was provided in part by the Natural Science Foundation of Hunan Province (grant number 2025JJ70017) and the Hunan Engineering Research Center of Electric Drive and Regenerative Energy Storage and Utilization.

Data Availability Statement

The experimental data for this study come from the China Power Line Insulator Dataset (CPLID), from images of insulator defects collected in actual production, and from the NEU-DET dataset.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Liu, J.; Hu, M.; Dong, J.; Lu, X. Summary of insulator defect detection based on deep learning. Electr. Power Syst. Res. 2023, 224, 109688.
2. Liu, S.; Xiao, J.; Hu, X.; Pan, L.; Liu, L.; Long, F. Defect insulator detection method based on deep learning. In Proceedings of the 2022 IEEE 17th Conference on Industrial Electronics and Applications (ICIEA), Chengdu, China, 16–19 December 2022; pp. 1622–1627.
3. Wang, X.; Feng, J.; Qin, Y. An Improved Multi-objective artificial hummingbird algorithm for capacity allocation of Supercapacitor energy storage systems in urban rail transit. J. Bionic Eng. 2025, 22, 866–883.
4. Wang, X.; Deng, C.; Qin, B. Working condition recognition based on lightweight network and knowledge distillation for rotary kilns. J. Electron. Meas. Instrum. 2023, 37, 149–159.
5. Kong, C.; Chen, B.; Li, H.; Wang, S.; Rocha, A.; Kwong, S. Detect and Locate: Exposing Face Manipulation by Semantic- and Noise-level Telltales. arXiv 2022, arXiv:2107.05821.
6. Yuan, Y.; Zhou, Y.; Ren, X.; Zhi, H.; Zhang, J.; Chen, H. YOLO-HMC: An Improved Method for PCB Surface Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 2001611.
7. Chen, J.; Wen, Y.; Nanehkaran, Y.A.; Zhang, D.; Zeb, A. Multiscale attention networks for pavement defect detection. IEEE Trans. Instrum. Meas. 2023, 72, 2522012.
8. Wang, C.; Wei, X.; Jiang, X. An automated defect detection method for optimizing industrial quality inspection. Eng. Appl. Artif. Intell. 2024, 127, 107387.
9. Liu, Y.; Liu, D.; Huang, X.; Li, C. Insulator defect detection with deep learning: A survey. IET Gener. Transm. Distrib. 2023, 17, 3541–3558.
10. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; Part I, pp. 21–37.
11. Miao, X.; Liu, X.; Chen, J.; Zhuang, S.; Fan, J.; Jiang, H. Insulator detection in aerial images for transmission line inspection using single shot multibox detector. IEEE Access 2019, 7, 9945–9956.
12. Hussain, M. YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection. Machines 2023, 11, 677.
13. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
14. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–23 June 2023; pp. 7464–7475.
15. Sohan, M.; Sai Ram, T.; Reddy, R.; Venkata, C. A review on yolov8 and its advancements. In International Conference on Data Intelligence and Cognitive Informatics; Springer: Singapore, 2024; pp. 529–545.
16. Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524.
17. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
18. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
19. Zhao, W.; Xu, M.; Cheng, X.; Zhao, Z. An insulator in transmission lines recognition and fault detection model based on improved faster RCNN. IEEE Trans. Instrum. Meas. 2021, 70, 5016408.
20. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
21. Stefenon, S.F.; Singh, G.; Souza, B.J.; Freire, R.Z.; Yow, K.C. Optimized hybrid YOLOu-Quasi-ProtoPNet for insulators classification. IET Gener. Transm. Distrib. 2023, 17, 3501.
22. Tao, W.; Weibin, W.; Li, Y.; Beimin, X.; Weiwei, Y.; Hongyu, W. Lightweight YOLOV3 insulator defect detection method. Comput. Eng. 2019, 45, 275–280.
Li, J.; Liu, L.; Niu, Y.; Li, L.; Peng, Y. YOLOv3 insulator string recognition method incorporating attention. High Volt. Appar. 2022, 58, 67–74. [Google Scholar]
Xiao, C.; Pan, R.; Li, C.; Huang, J. Research on Improved YOLOv5s Insulator Defect Detection Technology. Electron. Meas. Technol. 2022, 45, 137–144. [Google Scholar]
Wang, Y.; Feng, T.; Sun, N.; Yang, C.; Yu, H.; Cui, H. A defective detection method for power insulators integrating attention and multi-scale features. High Volt. Technol. 2024, 50, 1933–1942. [Google Scholar]
Zou, H.; Chen, J.; Chai, Y.; Yang, Q. Detection method for insulators and their self-explosion defects in foggy scenarios based on improved YOLOv7. Foreign Electron. Meas. Technol. 2023, 42, 1–11. [Google Scholar]
Jia, X.; Wu, X.; Zhao, B. Lightweight detection network DE-YOLO for insulator self-explosion defects. J. Electron. Meas. Instrum. 2023, 37, 28–35. [Google Scholar]
Oksuz, K.; Cam, B.C.; Kalkan, S.; Akbas, E. Imbalance problems in object detection: A review. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3388–3415. [Google Scholar] [CrossRef] [PubMed]
Qi, Y.; He, Y.; Qi, X.; Zhang, Y.; Yang, G. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 6070–6079. [Google Scholar]
Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000. [Google Scholar] [CrossRef]
Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 July 2018; pp. 8759–8768. [Google Scholar]
Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
Ge, Z. Yolox: Exceeding Yolo series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
InsulatorData. InsulatorDataSet. [EB/OL]. (26 January 2018). Available online: https://github.com/InsulatorData/InsulatorDataSet (accessed on 20 January 2025).
Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
Yang, S.; Xiao, W.; Zhang, M.; Guo, S.; Zhao, J.; Shen, F. Image data augmentation for deep learning: A survey. arXiv 2022, arXiv:2204.08610. [Google Scholar]
Qin, B.; Zeng, Y.; Wang, X.; Peng, J.; Li, T.; Wang, T.; Qin, Y. Lightweight DB-YOLO Facemask Intelligent Detection and Android Application Based on Bidirectional Weighted Feature Fusion. Electronics 2023, 12, 4936. [Google Scholar] [CrossRef]

Figure 1. YOLOv8 network structure.

Figure 2. Improved YOLOv8 network structure for insulator defect detection.

Figure 3. Receptive fields of different convolutions (Green dots: sampling point positions; red dot: center position).

Figure 4. Designed C2f_DSC module structure.

Figure 5. EMA mechanism module structure.

Figure 6. Different feature fusion structures. (a) FPN; (b) PAN; (c) BiFPN.

Figure 7. Insulator defect detection dataset augmentation.

Figure 8. Training results of the improved YOLOv8 model for insulator defect detection (top/bottom rows: evaluation metrics obtained on the training/test set).

Figure 9. Comparative results with different YOLO algorithms for insulator defect detection.

Figure 10. Precision/recall curves of the improved YOLOv8.

Table 1. Experimental environment for insulator defect detection.

Experimental Environment	Environment Configuration
Framework	PyTorch 1.8
Language	Python 3.8
Operating system	Windows 10
Processor	Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.4 GHz (Intel, Santa Clara, CA, USA)
GPU	NVIDIA Tesla M40, 24 GB (NVIDIA, Santa Clara, CA, USA)

Table 2. Ablation experiment results with the CPLID dataset.

Methods	Data Augmentation	C2f_DSC	EMA	BiFPN	EIOU	mAP50
1						94.7%
2	√					97.8%
3	√	√				98.0%
4	√	√	√			98.3%
5	√	√	√	√		98.5%
6	√	√	√	√	√	98.6%
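
The contribution of each ablation step can be read off Table 2 as the difference between consecutive rows. The short sketch below does that arithmetic; the mAP50 values are transcribed from the table, while the step labels and the helper name `incremental_gains` are illustrative shorthand rather than anything defined in the paper.

```python
# mAP50 values (%) transcribed from Table 2; labels are shorthand for each row.
ablation = [
    ("baseline", 94.7),
    ("+ data augmentation", 97.8),
    ("+ C2f_DSC", 98.0),
    ("+ EMA", 98.3),
    ("+ BiFPN", 98.5),
    ("+ EIOU", 98.6),
]


def incremental_gains(rows):
    """Return (label, gain) pairs, where gain is the mAP50 change in
    percentage points relative to the previous row."""
    return [
        (label, round(cur - prev, 1))
        for (_, prev), (label, cur) in zip(rows, rows[1:])
    ]


gains = incremental_gains(ablation)
total_gain = round(ablation[-1][1] - ablation[0][1], 1)

for label, gain in gains:
    print(f"{label:22s} {gain:+.1f} pp")
print(f"{'total':22s} {total_gain:+.1f} pp")
```

Read this way, data augmentation accounts for 3.1 of the 3.9 percentage points gained over the baseline, and the four architectural and loss changes together add the remaining 0.8.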

Table 3. Comparative experimental results for insulator defect detection.

Name	Params	Size	Precision	Recall	mAP50	mAP50:95	Inference time
YOLOv3-Tiny	11.5 M	23.2 M	98.2%	94.9%	97.1%	82.0%	4.1 ms
YOLOv5n	2.39 M	5.02 M	97.5%	96.4%	97.8%	81.2%	3.9 ms
YOLOv5s	8.69 M	17.6 M	98.0%	96.3%	97.9%	82.4%	4.0 ms
YOLOv6	4.03 M	8.28 M	97.7%	96.6%	97.6%	81.0%	3.8 ms
YOLOv8n	2.87 M	5.97 M	98.3%	97.2%	97.8%	89.8%	4.0 ms
YOLOv8s	10.65 M	22.5 M	99.2%	97.5%	98.4%	90.6%	4.2 ms
Ours	3.05 M	6.40 M	99.2%	97.7%	98.6%	89.5%	4.1 ms
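
As a quick check on how the proposed model compares with the two YOLOv8 baselines in Table 3, the following sketch computes the accuracy differences and the model-size ratio. The numbers are copied from the table; the `Row` class and the `compare` helper are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    params_m: float   # parameters, in millions (Table 3, "Params")
    size_m: float     # model size as reported in Table 3 ("Size")
    map50: float      # mAP50, in percent
    map50_95: float   # mAP50:95, in percent


# Values transcribed from Table 3.
table3 = {
    "YOLOv8n": Row(2.87, 5.97, 97.8, 89.8),
    "YOLOv8s": Row(10.65, 22.5, 98.4, 90.6),
    "Ours": Row(3.05, 6.40, 98.6, 89.5),
}


def compare(ours, base):
    """Differences in percentage points and the size ratio ours/base."""
    return {
        "map50_gain_pp": round(ours.map50 - base.map50, 1),
        "map50_95_gain_pp": round(ours.map50_95 - base.map50_95, 1),
        "size_ratio": round(ours.size_m / base.size_m, 2),
    }


vs_n = compare(table3["Ours"], table3["YOLOv8n"])
vs_s = compare(table3["Ours"], table3["YOLOv8s"])
print("vs YOLOv8n:", vs_n)
print("vs YOLOv8s:", vs_s)
```

On these figures the proposed model gains 0.8 points of mAP50 over YOLOv8n at about 1.07 times its size, and 0.2 points over YOLOv8s at about 0.28 times its size. Its mAP50:95 is slightly lower than both baselines (by 0.3 and 1.1 points), so the improvement reported in the table is specific to the mAP50 metric.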

Table 4. Comparative experimental results for steel surface defect detection.

Name	Params	Size	Precision	Recall	mAP50	mAP50:95	GFLOPs
YOLOv5s	2.5 M	17.6 M	62.0%	63.6%	68.3%	35.2%	7.2
YOLOv8n	3.01 M	5.97 M	71.1%	61.9%	68.3%	35.2%	8.2
YOLOv8s	11.65 M	22.5 M	64.0%	65.4%	71.1%	38.3%	28
Ours	3.12 M	6.19 M	71.2%	69.9%	76.7%	42.0%	8.3


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
