1. Introduction
In recent years, China’s electric power industry has grown remarkably in tandem with the country’s economic development. According to statistics, China’s industrial electricity consumption was expected to reach 5509 billion kWh in 2021, an increase of 9.1% year on year, while urban and rural residential electricity consumption was expected to reach 1174.3 billion kWh, an increase of 7.3% year on year [1]. However, the manual inspection of transmission lines has become increasingly burdensome and cannot fully guarantee the normal operation of high-voltage equipment. As a result, transmission line faults have become frequent, placing higher demands on the intelligent maintenance of power systems [2].
Insulators are special insulating components that support and fix conductors to ensure the smooth transmission of electricity, and they play an important role in overhead high-voltage transmission lines. Because they are exposed to outdoor environments and adverse weather conditions, insulators are prone to damage and require regular inspection and maintenance. Inspection methods have grown more advanced, and one effective approach is the automatic detection of insulator defects, which can enhance productivity, protect staff from potential hazards, and mitigate safety risks. Therefore, the automatic defect detection of insulators is of great significance for improving maintenance efficiency [3].
Both domestically and internationally, the current approaches to insulator defect detection in transmission lines can be broadly classified into traditional image processing techniques, machine learning algorithms, and deep learning methods.
Traditional image processing methods, such as filtering [4], edge detection [5], and morphological processing [6], have been widely used. For example, Chen Guocui et al. [7] applied an improved fast guided filtering algorithm to insulator images, effectively removing noise while preserving the edge details of the insulators. Zhao Le et al. [3] utilized an edge detection algorithm with a noise suppression module for power line feature extraction. Mei Xin et al. [8] employed morphological processing to optimize images of composite insulator surfaces covered with water, mitigating the influence of lighting conditions. These traditional methods are simple and easy to use, with high computational efficiency and clear mathematical models and algorithmic formulas. However, they are susceptible to interference from background pixels and external noise and therefore lack robustness.
Machine learning methods, on the other hand, leverage spatial and color feature information for insulator detection. Wu Yang et al. [9] applied the AdaBoost algorithm to insulator detection and recognition, demonstrating good robustness and laying the foundation for subsequent insulator fault diagnosis. Huang Huihang et al. [10] integrated machine learning modules into an insulator anomaly detection system, enabling the automatic detection of insulator anomalies.
Compared with traditional image processing methods, machine learning methods are more robust, can automatically analyze large amounts of data to extract valuable information, and offer some capacity for handling noise and detecting insulator defects. However, these methods are more suitable for offline data analysis and may require human intervention to achieve better detection results. In addition, machine learning methods have limited adaptability and generalizability in complex environments. In summary, both traditional image processing methods and machine learning methods have limitations in solving complex problems and might not meet the practical requirements for detection accuracy and speed.
With the continuous advancements in hardware technology, new deep learning detection methods have emerged, addressing challenges that traditional image processing methods and machine learning approaches struggle with. These deep learning methods can be generally classified into two-stage detection methods and single-stage detection methods.
Two-stage detection methods utilize a two-step process to detect targets [11]. First, they generate candidate regions that potentially contain the insulators being detected. Then, each candidate region is classified and refined for recognition and localization. Popular two-stage detection methods include R-CNN [12] and Faster R-CNN [13]. For instance, Zheng Ruojun [14] employed cropped R-CNN to extract insulator features and performed validation on Raspberry Pi, eliminating the need to transfer images to remote GPU servers for processing. Yi Jiyu et al. [15] incorporated multi-scale images and introduced an adversarial generation strategy to enhance the accuracy of Faster R-CNN when detecting occluded insulators. Tian Zijian et al. [16] proposed a two-stage target enhancement network specifically designed for low-illumination environments, effectively detecting insulator faults.
The two-stage detection method offers greater accuracy when locating insulators and achieves higher overall detection accuracy, particularly on large-scale datasets. However, this method is more complex, demands more computational resources, and poses challenges in deployment on mobile devices.
Single-stage detection methods offer faster inference than two-stage detection methods and have significant engineering application value. Classic single-stage detection methods include the YOLO series [17,18,19,20,21] and the SSD series [22].
For instance, Zhu Youchan et al. [23] utilized Darknet-53 as the feature extraction network in YOLOv3 to successfully detect and identify normal insulators. Song Libo et al. [24] employed Resblock-D+CSPDarknet53-tiny as the backbone network for YOLOv4-tiny and achieved successful deployment on Jetson NANO, showcasing its engineering usability. Wang Jianye [25] developed a lightweight multi-scale feature fusion SSD model to detect insulator self-detonation faults.
Unlike two-stage detection methods, single-stage detection methods output the defect detection results directly from the input image without a separate candidate-region stage. This simplifies the overall inspection system, improves the efficiency of the model, and facilitates insulator defect detection and model deployment. However, single-stage detection methods are better suited to ordinary images and might struggle with high-resolution images. Insulator images often have complex backgrounds, the insulators vary in size, and the defect area is relatively small, which can lead to suboptimal detection results with single-stage methods.
Therefore, this paper presents a novel approach for detecting defects in transmission line insulators using an enhanced version of YOLO v7. This method aims to achieve the efficient and accurate detection of insulator faults while maintaining a high detection speed. The major contributions of this paper are outlined as follows:
- (1) Data Enhancement: The standard insulator dataset TISLTR and the high-resolution tiny target faulty insulator dataset FISLTR are augmented through various techniques, including image enhancement, flipping, cropping, blurring, and random transformations, which increase the diversity and quality of the datasets.
- (2) Attention Mechanism: To address the challenge of varying insulator sizes and occlusion caused by transmission line towers, the Efficient Channel Attention (ECA) mechanism is incorporated into the backbone of YOLO v7. This mechanism dynamically learns the significance of the different feature channels, effectively reducing the impact of pole occlusion and enhancing the detection algorithm’s accuracy.
- (3) Partial Convolution: The YOLO v7 network model has a complex network structure and many parameters, which slows it down. To tackle this issue, partial convolution (PConv) is introduced as a replacement for traditional convolution in the YOLO v7 network model. PConv reduces the number of parameters while maintaining both efficiency and detection accuracy.
- (4) Normalized Wasserstein Distance: Due to the complex background of insulator images and the presence of small and dense insulators in the datasets, insulator features can easily be lost during feature extraction, leading to missed detections. The improved YOLO v7 network model uses the Normalized Wasserstein Distance (NWD) metric instead of the traditional Intersection over Union (IoU) for target detection. This avoids the sensitivity of IoU to positional deviations of small targets and thus achieves the effective detection of small insulators.
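To make the contrast between IoU and NWD in contribution (4) concrete, the following is a minimal sketch rather than the implementation used in this paper. It assumes boxes are given as (cx, cy, w, h) in pixels, models each box as a 2D Gaussian with mean (cx, cy) and standard deviations (w/2, h/2), and uses a placeholder normalizing constant `c` that would need to be chosen for the dataset. The function names `iou` and `nwd` are illustrative.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes.

    c is a placeholder constant (an assumption, not a value from this paper).
    """
    # Squared 2-Wasserstein distance between the two box Gaussians.
    dist_sq = (
        (box_a[0] - box_b[0]) ** 2
        + (box_a[1] - box_b[1]) ** 2
        + ((box_a[2] - box_b[2]) / 2) ** 2
        + ((box_a[3] - box_b[3]) / 2) ** 2
    )
    # Exponential normalization maps the distance into (0, 1].
    return math.exp(-math.sqrt(dist_sq) / c)


# The same 3-pixel shift applied to a tiny box and to a large box.
tiny_a, tiny_b = (10.0, 10.0, 6.0, 6.0), (13.0, 10.0, 6.0, 6.0)
large_a, large_b = (100.0, 100.0, 60.0, 60.0), (103.0, 100.0, 60.0, 60.0)
```

Under these assumptions, the 3-pixel shift drops the IoU of the 6 × 6 box to about 0.33 while the IoU of the 60 × 60 box stays near 0.90, whereas the NWD is identical for both pairs because it depends only on the differences in center and size.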
3. Experimental Results and Analysis
In order to comprehensively evaluate the performance of the enhanced YOLO v7 algorithm, several experiments were conducted, including attention experiments, different convolution experiments, ablation experiments, and comparisons with classical network models. The details of these experiments are as follows:
3.1. Experimental Platform
The experimental environment used in this paper is shown in Table 1. The experiments were run on a Windows system with an Intel i5-12400F CPU, an NVIDIA GeForce RTX3060 graphics card, and 16 GB of memory. The input image size was 640 × 640 px, the batch size was 8, and the number of epochs was 150.
The YOLO v7 network model was trained using the PyTorch framework. Training started from a pre-trained weight model, which was then updated on the insulator data. The model was trained for 150 epochs to generate the final network weight model.
The hyperparameter settings used to train the improved YOLO v7 network model proposed in this paper are presented in Table 2. Momentum reduces the oscillations and noise that are common in traditional gradient descent algorithms. The learning rate affects the speed of convergence. Weight decay reduces the risk of overfitting in the model.
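The roles of the three hyperparameters can be illustrated with a single update step of stochastic gradient descent with momentum. This is a minimal sketch on one scalar weight, assuming the common form in which weight decay is added to the gradient as an L2 penalty. The default values below are placeholders for illustration, not the settings listed in Table 2, and the function name is hypothetical.

```python
def sgd_momentum_step(weight, grad, velocity,
                      lr=0.01, momentum=0.9, weight_decay=5e-4):
    """One SGD-with-momentum update for a single scalar weight.

    All default hyperparameter values are illustrative placeholders.
    """
    # Weight decay pulls the weight toward zero, which limits overfitting.
    grad = grad + weight_decay * weight
    # Momentum accumulates past gradients, damping oscillations and noise.
    velocity = momentum * velocity + grad
    # The learning rate scales the step and so controls convergence speed.
    weight = weight - lr * velocity
    return weight, velocity
```

Calling the function repeatedly with the returned `velocity` carries the accumulated gradient history from one step to the next.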
3.2. Data Description
To evaluate the generalizability of the algorithm, experiments were conducted using two different datasets: the TISLTR dataset for normal insulators and the FISLTR dataset for high-resolution small target fault insulators.
The TISLTR dataset consists of 976 normal insulator images with varying sizes and backgrounds. The resolution of these images is 1152 × 864 pixels. The FISLTR dataset, on the other hand, contains 1231 faulty insulator images with a resolution of 3216 × 2136 pixels. The faulty insulators in this dataset exhibit flashover and breakage faults. The data are labeled using the Labeling tool and divided into three main categories: normal insulators, flashover insulators, and broken insulators.
The original dataset is too small to ensure that the experimental results generalize; therefore, it is expanded by random color transformation, cropping, and blurring. The results of image data enhancement and expansion are shown in Figure 6.
To improve the generalization capability of the network model, the TISLTR and FISLTR datasets were divided into training, validation, and test sets in a ratio of 8:1:1.
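An 8:1:1 split of this kind can be sketched as follows. This is an assumed procedure (a seeded shuffle followed by slicing), since the paper does not describe how the split was performed; the function name and file names are hypothetical.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into train/val/test by integer ratios."""
    items = list(items)
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    # The test set takes the remainder so that no item is dropped.
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 976 original TISLTR images.
train, val, test = split_dataset([f"img_{i}.jpg" for i in range(976)])
```

For 976 items, integer division gives 780 training, 97 validation, and 99 test images; the counts for the augmented datasets would differ.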
3.3. Evaluation Indicators
To evaluate the improved YOLO v7 algorithm in an objective and accurate manner, precision, recall, and mean average precision (mAP) are employed as metrics. Precision is the proportion of samples predicted as positive by the model that are actually positive. Recall is the proportion of actually positive samples that the model correctly predicts as positive. The mAP is an evaluation metric in the target detection task that averages the average precision (AP) across multiple categories. Additionally, the frame rate (frames per second, FPS) indicates the number of images that the model can process per second. The higher the frame rate, the faster the model can detect targets in the images.
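These definitions can be written out as a short sketch. It assumes that each detection has already been matched against the ground truth and marked as a true or false positive (for example, by an overlap threshold, which is not specified here), and it computes AP as the area under the interpolated precision-recall curve. The function names are illustrative and are not taken from the paper's code.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scored_hits, num_gt):
    """AP for one class.

    scored_hits: list of (confidence, is_true_positive) per detection.
    num_gt: number of ground-truth objects of this class.
    """
    ranked = sorted(scored_hits, key=lambda item: item[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each point is the best precision to its right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Integrate precision over recall.
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


def frames_per_second(num_images, elapsed_seconds):
    """FPS is the number of images processed per second."""
    return num_images / elapsed_seconds
```

For the three categories used in this paper (normal, flashover, and broken insulators), `mean_average_precision` would be called with three per-class AP values.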
3.4. Experimental Results and Analysis
3.4.1. Experimental Comparison of Different Attentional Mechanisms
This study aims to verify the effectiveness of the ECA mechanism in detecting insulator targets. To achieve this objective, two other attention mechanisms, namely the Squeeze-and-Excitation networks (SE) and Convolutional Block Attention Module (CBAM), were introduced and experimentally compared with the ECA mechanism within the YOLO v7 algorithm.
The SE mechanism learns the significance of each channel’s feature map through self-learning and assigns distinctive weights according to the specific characteristics. CBAM, in addition to channel attention, incorporates spatial attention by assigning varying weights to different objects or background information corresponding to feature channels and spatial locations. However, CBAM is less sensitive to smaller feature maps and is, therefore, less suitable for detecting small target faults in insulators.
Both SE and CBAM have certain limitations. ECA, which is built upon the SE mechanism, extracts useful insulator features more effectively by using a one-dimensional convolution to model the interaction between neighboring channels. The performance metrics of these three attention modules trained on both the TISLTR dataset (normal insulator dataset) and the FISLTR dataset (high-resolution small target faulty insulator dataset) are presented in Table 3.
The table demonstrates that the YOLO v7 algorithm with the ECA attention mechanism outperforms both the SE and CBAM mechanisms in terms of detection results. This superiority can be observed not only in the normal insulator dataset (TISLTR) but also in the high-resolution small target fault dataset (FISLTR). These findings indicate that the ECA attention mechanism exhibits strong feature extraction capabilities for insulators within the dataset used in this study.
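The channel-attention idea behind ECA can be sketched without any deep learning framework. The example below is a simplified illustration, not the module used in this paper: the feature map is a nested list of C channels, the 1D convolution kernel is passed in rather than learned, and the adaptive kernel-size rule uses gamma = 2 and b = 1, which are assumed defaults that the paper does not state.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size derived from the channel count (assumed rule)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def eca(feature_map, kernel):
    """Apply ECA-style channel attention.

    feature_map: list of C channels, each a list of rows of floats.
    kernel: odd-length list of 1D convolution weights (learned in practice).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Global average pooling: one descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # 1D convolution across neighboring channels, zero-padded at the ends.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    gates = [
        sigmoid(sum(w * padded[i + j] for j, w in enumerate(kernel)))
        for i in range(len(pooled))
    ]
    # Rescale every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in ch]
        for ch, gate in zip(feature_map, gates)
    ]
    return scaled, gates
```

Because each gate depends only on a channel and its nearest neighbors, the number of attention weights equals the kernel size and does not grow with the number of channels.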
3.4.2. Comparison of Different Convolution Experiments
To evaluate the performance of partial convolution (PConv) in insulator target detection, this study compared it with conventional convolution (Conv) within the YOLO v7 network model. PConv achieves feature extraction for insulators by leveraging the similarity between channel features and extracting features from only a subset of the channels.
The TISLTR dataset (normal insulator dataset) and the FISLTR dataset (high-resolution small target fault insulator dataset) have been employed to evaluate the performance of both Conv and PConv.
Table 4 shows that, on both datasets, PConv achieves a higher detection accuracy than Conv while reducing the parameter size by 4.3 MB. The utilization of PConv not only decreases the number of parameters but also significantly improves the detection speed.
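The saving from convolving only a subset of channels can be estimated with a back-of-the-envelope calculation. The sketch below compares a regular k × k convolution with a partial convolution that applies the same operation to a fraction of the channels and passes the rest through unchanged. The partial ratio of 1/4, the layer dimensions, and the function names are assumptions for illustration, biases are ignored, and the figures are not those reported in Table 4.

```python
def conv_cost(height, width, c_in, c_out, k=3):
    """Parameters and multiply-accumulates of a regular k x k convolution."""
    params = k * k * c_in * c_out
    macs = height * width * params
    return params, macs


def pconv_cost(height, width, channels, k=3, ratio=0.25):
    """Cost of a partial convolution over the first ratio * channels channels.

    The remaining channels are passed through untouched, so they add no cost.
    """
    c_partial = int(channels * ratio)
    return conv_cost(height, width, c_partial, c_partial, k)


# Hypothetical layer: 80 x 80 feature map with 256 channels.
full_params, full_macs = conv_cost(80, 80, 256, 256)
part_params, part_macs = pconv_cost(80, 80, 256)
```

With a ratio of 1/4, both the parameter count and the multiply-accumulate count of this single layer fall to 1/16 of the regular convolution, which is the source of the speed-up described above.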
3.4.3. A Comparison of Ablation Experiments
In order to assess the accuracy and effectiveness of the algorithms proposed in this paper and examine the impact of each module on the model’s performance metrics, we conducted ablation experiments on both the TISLTR and FISLTR datasets. We used the YOLO v7 model as the base model and gradually added different modules to evaluate the performance metrics of the network model.
Table 5 presents the results of the ablation experiments.
The experiments showed that incorporating the ECA attention mechanism into the YOLO v7 model increased the mAP by 1.9% on the TISLTR dataset and by 5.4% on the FISLTR dataset. This suggests that the ECA attention mechanism allows the network model to concentrate on the most informative channels and to extract insulator features more effectively from images with complex backgrounds.
When comparing the performance metrics of YOLO v7+ECA and YOLO v7+ECA+PConv, the results indicate that the inclusion of PConv enhances Precision and Recall slightly, while significantly improving the average detection accuracy. These findings suggest that PConv improves the network model’s ability to extract features of insulators by utilizing channel similarity in conjunction with the channel attention provided by ECA.
In conclusion, the proposed YOLO v7+ECA+PConv+NWD network structure combines the strengths of the ECA attention mechanism, partial convolution (PConv), and normalized Wasserstein distance (NWD) in insulator target detection. This combination significantly enhances the performance metrics of the single-stage network model YOLO v7 in power system insulator detection, including higher Precision, Recall, and mAP scores.
To further demonstrate the ability of the improved network model to extract insulator features, heat maps were generated for images from both the TISLTR and FISLTR datasets using the best trained weights (best.pt). As shown in Figure 7, the heat maps demonstrate that the improved network model effectively highlights the relevant features of insulators, which supports the detection of insulator defects within power systems.
3.4.4. Experimental Comparison of Different Network Models
The improved YOLO v7 algorithm proposed in this paper was evaluated using the TISLTR dataset, and its performance was compared against several mainstream target detection network models: the single-stage network models SSD, YOLO v3, YOLO v4, YOLO v5, YOLOX, and YOLO v7-tiny, and the two-stage network model Faster R-CNN.
Table 6 displays the relevant performance metrics obtained from the experiments.
As seen in Table 6, both the single-stage network models and the two-stage network model performed well in power system insulator target detection. The mAP of all network models exceeded 75%, with the improved YOLO v7 network model in this paper achieving an mAP of 96.2%, which is higher than that of the other network models. SSD and YOLO v4 had the lowest Recall, and Figure 8 shows instances of missed detections. In contrast, the improved network model in this paper achieved a Recall of 93.7% and a Precision of 96.8%, both of which surpass the other network models. YOLO v7-tiny, being a lighter version of YOLO v7, demonstrated a detection performance second only to that of the improved YOLO v7 network model in this paper. The detection results of YOLO v3, YOLO v5, YOLOX, and Faster R-CNN are all weaker than those of the improved network model in this paper. Therefore, the improved network model proposed in this paper outperforms the compared models on this dataset.
Figure 8 illustrates that insulator detection is challenging due to the presence of utility poles in the background. The middle vertical insulator, in particular, almost blends into the utility poles behind it, resulting in a complex background for insulator detection. Additionally, Figure 9 presents the detection results for different insulator sizes. The other algorithms missed some insulators, while the improved network model proposed in this paper successfully detected insulators in complex backgrounds. The improved network model thus demonstrated better results both for insulators in complex backgrounds and for insulators of different sizes.