High-Accuracy Insulator Defect Detection for Overhead Transmission Lines Based on Improved YOLOv5

Abstract: As a key component of overhead transmission lines, insulators play an important role. However, during insulator inspection, background interference, small fault areas, the limitations of manual detection, and other factors make detection difficult and inaccurate, and prone to missed and false detections. To detect insulator defects more accurately, an insulator defect detection algorithm based on You Only Look Once version 5 (YOLOv5) is proposed. A backbone network was built with lightweight modules to reduce the computing overhead of the network. A small-scale detection layer was added to improve the accuracy of the network on small targets. A receptive field module was designed to replace the original spatial pyramid pooling (SPP) module so that the network can obtain richer feature information and improve its performance. Finally, experiments were carried out on an insulator image dataset. The experimental results show that the average accuracy of the algorithm is 97.4%, which is 7% higher than that of the original YOLOv5 network, and that the detection speed is increased by 10 fps, improving both the accuracy and the speed of insulator detection.


Introduction
In order to ensure the safe and reliable operation of high-voltage transmission lines, power utilities need to regularly patrol and maintain the transmission lines and substation systems to reduce faults and hidden dangers. With the rapid development of China's market economy, higher technical requirements are being placed on the proper and safe operation of major facilities such as power transmission network equipment. Given China's vast territory, the lines in the power transmission system are widely distributed and the layout of the equipment is complicated. Insulators, as special insulating devices in transmission lines, are required to withstand the electrical and mechanical loads imposed by the lines during operation [1]. Because of long-term exposure to the natural environment, insulators are subject to dirt, lightning, strong winds, bird damage, and other external factors, which cause them to gradually age and break [2]. In the operation of transmission lines, insulator defects often include zero value, broken strings, corrosion, etc. The main defect studied in this paper is the broken insulator string. The main causes of this defect are as follows: poor quality of the insulator itself, which makes it prone to cracking during long-term operation and results in broken strings; impulse voltage during lightning, where multiple lightning strikes damage insulators and result in cracks and broken strings; and rainy and snowy weather, which overloads the insulator mechanically and easily leads to cracking and broken strings. If these potential hazards are not detected and eliminated in time, they eventually develop into a variety of serious failures that pose a serious threat to the safe operation of the power system [3], and the insulator's working condition directly affects the safety and stability of the power grid [4]. The traditional insulator inspection method is mostly manual inspection, which is labor-intensive and has low safety [5]. The detection process is susceptible to environmental and human factors. For these reasons, unmanned aerial vehicles (UAVs) are now used to inspect insulators and other electrical equipment in transmission lines [6]. The main methods commonly used for insulator detection are artificial (hand-crafted feature) detection methods and machine learning-based detection methods.
Among the artificial detection algorithms, Zheng [7] used ultrasonic technology to detect the density of basin insulators. Li et al. [8] proposed a new data augmentation method to reduce the adverse effect of unbalanced dataset distribution on detection performance during network model training, and optimized the parameters of the support vector machine with a genetic algorithm. The authors of [9,10] extracted feature information for insulator detection based on different color models. Zhai et al. [11] combined the spatial and color characteristics of insulators to segment the insulators from the background in the image and used a morphological algorithm [12] to locate the insulator defect area. The authors of [13][14][15] detected insulators according to their texture and shape characteristics. Yu et al. [16] extracted the shape and texture information of the insulator as a priori conditions and combined them with the active contour model to segment the insulator in a complex background. In artificial detection algorithms, the color and shape of the insulator in the image change with illumination, shooting distance, and angle when the UAV captures the image. The detection results of these methods are susceptible to interference from the surrounding environment, the experience of inspectors also affects the detection accuracy, and the methods are limited to specific conditions.
Among the algorithms based on machine learning, Zhou et al. [17], based on the mask region convolutional neural network (Mask R-CNN) model, changed the network structure according to the size of the target insulator and used a genetic algorithm to optimize the hyperparameters of the network. Based on fast R-CNN [18], Hu et al. [19] replaced the original Visual Geometry Group16 (VGG16) with a more complex feature extraction network to improve the ability of the network model to obtain image feature information, and reduced information redundancy by adding an attention module. In 2016, Redmon et al. [19] first proposed a single-stage object detection algorithm, which pioneered the You Only Look Once (YOLO) [20][21][22][23] series. In [24], an end-to-end YOLO network model is used, and a more accurate position of component defects in the transmission line is obtained by adding a coordinate attention module. In [25], the features of insulators with different specifications were extracted based on a deep neural network; the INSU-YOLO detection method was proposed, and an insulator defect dataset was constructed to avoid the network overfitting caused by insufficient data. During the training of the selected target detection network model, the training results are easily influenced by the dataset. The one-stage network model improves the detection speed compared to the two-stage network model, but its detection accuracy needs to be improved.
To solve the problems of insufficient accuracy and lack of robustness in insulator defect detection, this paper proposes an improved YOLOv5 based on a receptive field module and multiscale detection. The main work is as follows: anchor frames that match the size of the detected targets are obtained by k-means clustering to improve the detection accuracy of the network for targets of different proportions; low-level detail features are extracted from the network and fused with the deepest semantic features in the small-scale detection layer designed in this paper to improve the detection performance of the network model for small-area targets; a lightweight backbone network is built using the GhostNet [26] lightweight network to reduce convolution operations and improve the real-time performance of the model while ensuring detection accuracy; and a channel receptive field block (CRF) is designed at the network head to replace the original SPP module [27], integrating channel information, fusing multiscale feature information, and using dilated convolution to reduce the calculation of redundant information.

Appl. Sci. 2022, 12

Original YOLOv5 Algorithm
The YOLO series is widely used in many fields owing to its fast speed and strong portability. The YOLOv5 network model [28] is mainly composed of five parts: Input, Backbone, Neck, Prediction, and Output. As shown in Figure 1, the image fed into the network is first sliced, which achieves downsampling while preserving the complete image information. The backbone network mainly completes the feature extraction of the image through the convolution module and the cross-stage partial module with residual structure. The neck of the network mainly fuses the image feature information extracted by the backbone network through the spatial pyramid pooling module, retaining rich image feature information for subsequent target detection. The detection part of the network obtains the category of the target object, the category confidence, and the coordinates of the object position, and the target area is marked by the anchor frame. In the original YOLOv5 network, an input image containing the insulators to be detected passes through a series of operations such as slicing, convolution, and sampling to extract image information, and finally a detection result map with detection frames is output, realizing end-to-end insulator region recognition.

Improved YOLOv5 Algorithm
In this paper, based on the YOLOv5 network architecture, we introduce the Ghost lightweight module to reduce the network parameters and add a detection layer to the network to increase the number of detection scales. The CRF receptive field module is designed to obtain more detailed feature information and improve detection accuracy.

Backbone Network
In the original backbone network, the input is sliced, and feature maps at three scales, 128 × 80 × 80, 256 × 40 × 40, and 512 × 20 × 20, are obtained by three downsampling operations. The Ghost lightweight module is added to the backbone network to replace the cross-stage partial (CSP) [29] convolution module of the original backbone network and reduce the computational overhead of the network model. The Ghost module mainly uses linear operations with less computation to replace the original convolution operation while maintaining the detection accuracy of the network. A feature map is first obtained through a 3 × 3 convolution kernel, and a depthwise convolution then performs a linear operation on each channel of this feature map to expand the number of channels, which is equivalent to a hierarchical convolution of the input feature map.
The backbone network structure is shown in Figure 2. The network designed in this paper makes full use of the feature maps generated during downsampling in the backbone network. In addition to the three detection layers of the original YOLOv5, the feature map with a scale of 64 × 160 × 160 generated by downsampling is combined with the feature map of the same scale in the head to form the smallest-scale detection layer. The internal structure of each module in the backbone network of Figure 2 is explained in detail in Figure 3.
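The saving offered by the Ghost module can be illustrated with a rough parameter count. The following is a minimal sketch rather than the authors' implementation: the function names are hypothetical, biases and normalization layers are ignored, and the ratio of 2 between output and intrinsic channels and the 3 × 3 depthwise kernel are assumptions taken from the usual GhostNet configuration.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k=3, ratio=2, dw_k=3):
    """Number of weights in a Ghost-style module (bias ignored).

    A primary k x k convolution produces c_out // ratio intrinsic maps.
    The remaining maps are generated by cheap depthwise dw_k x dw_k
    filters, one filter per generated map. ratio and dw_k are assumed.
    """
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = (c_out - intrinsic) * dw_k * dw_k
    return primary + cheap


if __name__ == "__main__":
    standard = conv_params(128, 256, 3)
    ghost = ghost_params(128, 256)
    print(standard, ghost, round(standard / ghost, 2))
```

With these assumed settings the Ghost-style layer needs roughly half the weights of the standard convolution it replaces, because only the primary convolution mixes information across input channels.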
Figure 3 shows the internal structure of each module of the backbone network. As shown in Figure 3a, the 1 in the Ghost1_X module is the convolution step size, and X is the number of times the module is repeated. GM in the Ghost1_X module represents the Ghost lightweight module in Figure 3d, and BN represents batch normalization, which speeds up network training. RU represents the ReLU activation function, which alleviates network overfitting. The add module combines the output of the previous layer with the output of this layer in the form of residual edges. As shown in Figure 3b, the CBL module is composed of three network layers: Conv, batch normalization, and Leaky ReLU. The LRU in the CBL module refers to Leaky ReLU. As shown in Figure 3c, the 2 in the Ghost2_X module is the convolution step size, and X is the number of times the module is repeated. DWConv refers to depthwise separable convolution. As shown in Figure 3d, GConv in the Ghost module refers to group convolution. As shown in Figure 3e, in the squeeze-and-excitation (SE) module, global average pooling is used to obtain global features, and the sigmoid activation function is used to introduce the nonlinear relationship between channels; FC refers to fully connected layers.
The weight matrix of the SE attention mechanism is calculated as follows [30]. The calculation is divided into two parts. The first is to compress the feature map to obtain a matrix containing only channel information:
$$z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i, j) \tag{1}$$
In the formula, H and W are the height and width of the input feature map, respectively, and u_c is the c-th channel of the input feature map.
The second is to weight each channel:
$$s = F_{ex}(z, W) = \sigma\left(W_2\,\delta(W_1 z)\right) \tag{2}$$
In the formula, z is the 1 × 1 × c output of the F_sq operation with elements z_c, W_1 and W_2 are the two fully connected operations, δ is the ReLU activation between them as in [30], and σ is the sigmoid function.
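The two steps of the SE calculation, compressing each channel to a single value and then weighting the channels through two fully connected operations and a sigmoid, can be sketched in plain Python. This is an illustrative sketch only: the function name and the weight matrices `w1` and `w2` are hypothetical, and the ReLU between the two fully connected operations is assumed from the standard SE design.

```python
import math


def se_weights(feature_map, w1, w2):
    """Return (z, s) for a feature map given as C channels of H x W lists.

    w1 has shape (C/r) x C and w2 has shape C x (C/r); both are
    hypothetical weights supplied by the caller.
    """
    # Squeeze: global average pooling reduces each channel to one value.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_map]
    # Excitation, first fully connected operation followed by ReLU (assumed).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Second fully connected operation followed by the sigmoid function.
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
         for row in w2]
    return z, s


def se_scale(feature_map, s):
    """Multiply every value of channel c by its weight s[c]."""
    return [[[v * s_c for v in row] for row in ch]
            for ch, s_c in zip(feature_map, s)]


if __name__ == "__main__":
    fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
    z, s = se_weights(fmap, w1=[[1.0, 1.0]], w2=[[0.0], [1.0]])
    print(z, s)
```

The weights returned in `s` lie between 0 and 1, so `se_scale` can only attenuate a channel relative to the others, never amplify it.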

CRF Receptive Field Module
A small receptive field in the shallow feature maps is not conducive to large-target detection, and a large receptive field in the deep feature maps is not conducive to small-target detection. In this paper, we design the channel receptive field block (CRF) to enlarge the receptive field of the network, and introduce a channel attention mechanism in the form of residual edges so that deep features can be obtained in a lightweight convolutional network.
As shown in Figure 4, the number of channels of the input feature map is first reduced by a 1 × 1 convolution layer to reduce the computational overhead. Then, convolution kernels of sizes 1 × 1, 3 × 3, and 5 × 5 form three convolution branches that obtain receptive fields of different scales and more detailed feature information. Each branch is followed by a 3 × 3 dilated convolution, with the expansion rate set to 1, 3, and 5, respectively. By setting different expansion rates on receptive fields of different scales, the eccentricity of each branch is obtained. The receptive field is thus increased while the resolution is preserved, the discrimination ability of the feature information is improved, and all branches are joined by a Concat operation.
The designed channel attention is embedded into the receptive field module in the form of residual edges. The channel attention mechanism first performs downsampling through adaptive pooling to compress the feature map. The embedded channel attention weighs the features captured by the convolution kernels from different channels, effectively retaining data information and reducing calculation parameters.
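The effect of the three expansion rates on a 3 × 3 kernel can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the output-size expression is the usual convolution arithmetic, and a padding equal to the expansion rate with a stride of 1 is assumed here as one way of preserving the resolution.

```python
def effective_kernel(k, d):
    """Span covered by a k x k kernel with expansion (dilation) rate d."""
    return d * (k - 1) + 1


def conv_out_size(n, k, stride=1, padding=0, dilation=1):
    """Output width of a convolution applied to an input of width n."""
    return (n + 2 * padding - dilation * (k - 1) - 1) // stride + 1


if __name__ == "__main__":
    for rate in (1, 3, 5):
        span = effective_kernel(3, rate)
        # Padding equal to the rate is an assumption, not a stated setting.
        out = conv_out_size(20, 3, padding=rate, dilation=rate)
        print(rate, span, out)
```

Under these assumptions the three branches cover spans of 3, 7, and 11 pixels with the same nine weights per kernel, and a 20 × 20 input keeps its size, so the branch outputs can be concatenated directly.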
The receptive field size of each layer is calculated as follows:
$$R_i = R_{i-1} + (K_i - 1) \times S_i \tag{3}$$
In the formula, R is the receptive field size of the convolution layer, i is the index of the convolution layer, S_i is the step size of the convolution kernel of layer i, and K_i is the size of the convolution kernel of layer i.
The CRF receptive field module designed in this paper uses dilated convolution, whose receptive field size is calculated as follows:
$$R_i = R_{i-1} + D \times (K_i - 1) \times S_i \tag{4}$$
In the formula, D is the dilation rate of the dilated (hole) convolution. The layers from the output of the network model to its input are indexed 0 to i, and the receptive field of the final output layer is recorded as R_0. The receptive field of each layer is obtained by recursion layer by layer. When R_0 is 1, the convolution kernel size is 3 × 3, the step size is 2, and the dilation rate is 2, the receptive field of the first layer, R_1, is 9.

Multiscale Detection Layer
In insulator image detection, large-scale targets occupy a large proportion of the image, carry rich feature information, and are easy to detect, whereas the insulator defect area is small, carries little feature information, and accounts for a small proportion of the overall image. To reduce the impact of this imbalance between target categories, a small-scale detection layer is added, increasing the number of detection scales. The improved overall network structure is shown in Figure 5.
As shown in Figure 5, by combining the feature maps generated during upsampling in the head of the network with the feature maps of the same scale generated during downsampling in the backbone network, the original three detection layers are extended to four. An insulator image with a size of 640 × 640 is input, and an image of size 320 × 320 is obtained by focus slicing. After four successive downsampling operations, the network obtains feature maps at four scales, denoted P2, P3, P4, and P5. The scale of P2 is 160 × 160; downsampling P2 gives P3 with a scale of 80 × 80, and P4 and P5 are obtained in the same way. The network head obtains C5 with a scale of 20 × 20; upsampling C5 gives C4 with a scale of 40 × 40, and C3 and C2 are obtained similarly. During upsampling, the four feature maps downsampled by the backbone network are connected with the feature maps of the same scale in the head. Through the Concat connection, the texture information extracted from the lower layers is combined with the semantic information of the higher layers to improve the overall detection performance of the network.
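The scales quoted for the four detection layers follow from repeated halving and doubling, which the short sketch below reproduces. The function names are hypothetical, and only the spatial sizes are tracked, not the channel counts.

```python
def backbone_scales(input_size=640, levels=("P2", "P3", "P4", "P5")):
    """Spatial size of each backbone feature map after focus slicing."""
    size = input_size // 2  # focus slicing halves the resolution
    scales = {}
    for name in levels:
        size //= 2          # each downsampling step halves it again
        scales[name] = size
    return scales


def head_scales(deepest=20, levels=("C5", "C4", "C3", "C2")):
    """Spatial size of each head feature map obtained by 2x upsampling."""
    scales = {}
    size = deepest
    for name in levels:
        scales[name] = size
        size *= 2
    return scales


if __name__ == "__main__":
    p = backbone_scales()
    c = head_scales(p["P5"])
    # A Concat connection requires matching spatial sizes at each level.
    for level in (2, 3, 4, 5):
        print(level, p[f"P{level}"], c[f"C{level}"])
```

Each backbone map P2 to P5 has the same spatial size as the head map C2 to C5 at the same level, which is what allows the Concat connections at all four scales.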

Experiment Setting
This experiment was run on the Windows 10 operating system with an i5 four-core CPU, an RTX 3060 GPU with 12 GB of memory, Python 3.8, and CUDA 11.1. Because the anchor frames of the original YOLOv5 were set based on the targets in the public COCO2017 dataset, the anchor frames were reclustered on the insulator dataset to obtain anchor frames that better match the detection targets. The k-means clustering algorithm is used to obtain the new anchor frame sizes for the insulator dataset. The number of anchor frames is set according to the number of detection scales: in this paper, four detection layers are used, so four size groups of anchor frames are set, each with three aspect ratios.
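Anchor reclustering of this kind can be sketched as k-means over the labeled box widths and heights. The following is a minimal illustration and not the authors' code: the function names are hypothetical, the similarity measure (IoU between boxes aligned at a common corner) is an assumption borrowed from common YOLO practice, and a deterministic initialization is used only to keep the example reproducible.

```python
def iou_wh(box, centroid):
    """IoU of two (w, h) boxes aligned at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100):
    """Cluster (w, h) pairs into k anchors, returned sorted by area."""
    # Deterministic start: k boxes evenly spaced in order of area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centroids = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Assign each box to the centroid with the highest IoU.
            best = max(range(k), key=lambda j: iou_wh(b, centroids[j]))
            clusters[best].append(b)
        new = []
        for j, members in enumerate(clusters):
            if members:
                new.append((sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members)))
            else:
                new.append(centroids[j])  # keep the centroid of an empty cluster
        if new == centroids:
            break
        centroids = new
    return sorted(centroids, key=lambda c: c[0] * c[1])


def group_by_layer(anchors, per_layer=3):
    """Split area-sorted anchors into consecutive groups, one per layer."""
    return [anchors[i:i + per_layer] for i in range(0, len(anchors), per_layer)]


if __name__ == "__main__":
    boxes = [(10, 10), (12, 11), (11, 12), (100, 50), (110, 55), (105, 52)]
    print(kmeans_anchors(boxes, k=2))
```

For the four detection layers described here, one would call `kmeans_anchors` with `k=12` on the box sizes of the dataset and pass the result to `group_by_layer`, giving three anchors per layer from the smallest to the largest scale.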

Experimental Datasets
The dataset used in this experiment is an expanded version of the open-source Chinese Power Line Insulator Dataset (CPLID). The open-source LabelImg annotation tool is used to annotate the dataset. The dataset is divided into two categories, N-insulator (normal insulator) and D-insulator (defective insulator), and the labels are saved in YOLO label format. The dataset is divided into a training set and a test set at a ratio of 8:2. The original insulator images have a size of 1152 × 864. During network training, the images are first scaled to the standard size of 640 × 640 and then input into the backbone network for processing.
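Saving labels in YOLO format and splitting the data 8:2 can be sketched as follows. This is an illustration under assumptions: the function names, the class index order, and the random seed are hypothetical, and the label line follows the usual YOLO convention of a class index followed by the normalized box center, width, and height.

```python
import random

# The index assigned to each category is an assumption.
CLASSES = {"N-insulator": 0, "D-insulator": 1}


def to_yolo_line(cls_name, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-coordinate box to one normalized YOLO label line."""
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{CLASSES[cls_name]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle a list of samples and split it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    print(to_yolo_line("D-insulator", 288, 216, 864, 648, 1152, 864))
    train, test = split_dataset(range(100))
    print(len(train), len(test))
```

Because the coordinates are normalized by the image width and height, the same label file remains valid after the 1152 × 864 image is rescaled, provided the whole image is resized without cropping.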

Evaluating Indicator
To evaluate the effectiveness of the modified network more objectively, it is tested from two aspects: detection accuracy and detection speed. In this paper, precision (P), recall (R), mean average precision (mAP), and the number of frames per second (FPS) are selected as evaluation indicators of network performance. P measures the proportion of detections that are correct, R measures the proportion of targets in the full dataset that are found, and mAP is the average accuracy over all categories. The calculation formulas are as follows [31]:
$$P = \frac{TP}{TP + FP} \tag{5}$$
$$R = \frac{TP}{TP + FN} \tag{6}$$
$$AP = \int_0^1 P(R)\,dR \tag{7}$$
$$mAP = \frac{1}{K}\sum_{i=1}^{K} AP_i \tag{8}$$
Two types of positive samples are set in this paper: normal insulators and defective insulators. In Formulas (5) and (6), taking the normal insulator as an example, TP is the number of normal insulators correctly predicted as normal, and FP is the number of abnormal insulators detected as normal insulators, that is, wrong predictions. FN is the number of normal insulators that are predicted as defective or are not detected, which are also wrong predictions. Formula (7) is the average precision (AP), which averages the precision over the recall range from 0 to 1. AP_i represents the average accuracy for category i. In Formula (8), K is the number of sample categories in the dataset, and K = 2 in this paper. mAP@0.5 denotes the mean of the average accuracies of the two categories when the intersection-over-union threshold is set to 0.5. These evaluation metrics provide an objective description of the test results of the various models on the insulator dataset.
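The four indicators can be computed with a few short functions. This is a hedged sketch: the function names are hypothetical, and the AP integral is approximated with all-point interpolation over a monotonically decreasing precision envelope, which is one common convention and may differ from the procedure used in the experiments.

```python
def precision(tp, fp):
    """Fraction of detections that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of ground-truth targets that are found."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(recalls, precisions):
    """Area under the P-R curve; recalls must be in increasing order."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(aps):
    """Mean of the per-category AP values (K = len(aps))."""
    return sum(aps) / len(aps)


if __name__ == "__main__":
    print(precision(90, 10), recall(90, 30))
    ap = average_precision([0.5, 1.0], [1.0, 0.5])
    print(ap, mean_average_precision([ap, 0.25]))
```

In practice the recall and precision lists come from sweeping the confidence threshold over the detections of one category, with a detection counted as TP when its intersection-over-union with a ground-truth box reaches 0.5.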

Discussion
To evaluate the performance of the algorithm more objectively and reasonably, two kinds of experiments are designed: one verifies the contribution of each improved module to the network, and the other compares the algorithm with other detection algorithms. Firstly, different feature extraction modules are compared. The CSP module and the Ghost module are each placed at the same position of the backbone network and their parameters are compared: the parameter size of the CSP module is 0.567 MB, while that of the Ghost module is 0.033 MB. The data show that when the Ghost module is used to build the backbone network of the detection model in place of the convolution module of the original network, the number of parameters is significantly reduced, and memory and resource usage are lower.

Comparison of Different Receptive Field Modules
At the end of the model backbone network, a receptive field module is added to fuse the feature information of each scale. The CRF receptive field module designed in this paper is compared with SPP and the receptive field block (RFB) [32] to verify the effectiveness of the CRF module. The results are shown in Table 1. The three receptive field modules were compared in terms of the accuracy rate and recall rate for normal and defective insulators and the average accuracy of the two categories at a threshold of 0.5. As can be seen from the data in the table, for normal insulators the CRF module achieves an accuracy rate of 0.915, while the RFB module shows a better recall rate. For defective insulator areas, the detection accuracy of the RFB module is the highest, the recall rate of the CRF module is 1, and the overall detection accuracy of the CRF module is the highest. In summary, compared with the other two receptive field modules, the designed CRF module takes into account the proportion of the target in the image, sets convolution kernels of different sizes, better obtains global and local feature information, and enhances the network's fusion of semantic and texture information.

Ablation Experiment
In order to verify the effectiveness of the improved algorithm proposed in this paper, different experimental groups are set up in which modules are replaced and added in turn. The experimental results are shown in Table 2. Starting from the original network, the Ghost lightweight module is introduced, the detection layer is added, and the CRF receptive field module is introduced, one after another. Comparing method 1 with method 2, the Ghost lightweight module decreases the detection accuracy of the model by 0.5% on average, but it improves the detection speed and reduces the amount of network computation. Comparing method 3 with method 2, the added detection layer reduces the recall for the D-insulator category but increases the accuracy, owing to the additional detection scale for the small-area targets of defective insulators. The new model architecture with the added detection layer improves the accuracy for both categories, and the overall performance of the network improves. Comparing method 4 with method 3, replacing the SPP module with the CRF receptive field module improves the detection accuracy of the model by 1.9%. The experimental results demonstrate the effectiveness of the modules designed in this paper.
As shown in Figure 6, the light blue curve is the P-R curve of the N-insulator category, the orange curve is the P-R curve of the D-insulator category, and the dark blue curve is the P-R curve of the overall average accuracy of the network. In the P-R graph, the abscissa is the recall rate and the ordinate is the precision rate. The larger the area enclosed by the curve and the two axes, that is, the closer the curve is to the upper right corner, the better the network performance. The four experimental groups of the ablation experiment produced their corresponding P-R curves, which are arranged in order in Figure 6. Comparing the four graphs, the area enclosed by the curve and the horizontal and vertical axes is largest in image (d), where the accuracy and recall rates are higher and the network model performs better than in the other three.
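The area under a P-R curve can be computed numerically. The following is a minimal sketch of one common convention (the monotone precision envelope followed by a rectangle sum); the exact interpolation used to produce the curves in Figure 6 is not stated in this paper, and the function name and sample points are hypothetical.

```python
def average_precision(recalls, precisions):
    """Area under a P-R curve using the monotone precision envelope.

    recalls must be sorted in increasing order; precisions[i] is the
    precision measured at recalls[i].
    """
    # Add sentinel points at recall 0 and recall 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Sweep right to left so that precision never increases with recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas over each recall interval.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Hypothetical curves for the two categories (not the paper's data).
ap_n = average_precision([0.5, 1.0], [1.0, 0.5])  # 0.75
ap_d = average_precision([1.0], [1.0])            # 1.0, curve in the upper right corner
mean_ap = (ap_n + ap_d) / 2
print(ap_n, ap_d, mean_ap)  # 0.75 1.0 0.875
```

A curve that reaches the upper right corner encloses the whole unit square and gives an area of 1, which is why a larger enclosed area indicates better performance.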

Contrast Experiment
The algorithm in this paper is compared with Faster R-CNN [33], a typical representative of two-stage detection algorithms, with the anchor-free CenterNet [34], and with the original YOLOv5 algorithm.
As can be seen from Table 3, Faster R-CNN has a higher recall rate for normal insulators, but a lower detection accuracy for small-area defective insulators and a slower detection speed. The overall performance of CenterNet is significantly lower than that of the other algorithms: although its detection accuracy for defective insulators is high, it fails to detect many of the defective insulators in the dataset. Both of these network models detect the D-insulator category poorly. Compared with the two-stage detection model, the proposed algorithm improves the detection accuracy by 14.2% and the detection speed by 50 FPS. Compared with the anchor-free detection model, it improves the detection accuracy by 19.8% and also improves the detection speed. Compared with the original model, it improves the detection accuracy by 7.3% and the detection speed by 9 FPS. The improved network also has the best detection accuracy for normal insulators.
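Detection speed in FPS is typically obtained by timing the detector over a set of images. The sketch below shows one way this could be done; `measure_fps`, `fps_to_latency_ms`, and the stand-in detector are hypothetical names, and the timing protocol actually used for Table 3 is not described in this paper.

```python
import time


def measure_fps(detect, frames, warmup=2):
    """Return the number of frames per second processed by `detect`."""
    for frame in frames[:warmup]:      # warm-up runs are not timed
        detect(frame)
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


def fps_to_latency_ms(fps):
    """Convert a frame rate into per-image latency in milliseconds."""
    return 1000.0 / fps


def dummy_detect(frame):
    """Stand-in for a real detector; it just burns a little CPU time."""
    return sum(range(10000))


print(f"{measure_fps(dummy_detect, list(range(50))):.0f} FPS (dummy detector)")
print(f"{fps_to_latency_ms(62):.1f} ms per image at 62 FPS")  # 16.1 ms per image at 62 FPS
```

Expressed as latency, the 62 FPS reported for the improved network corresponds to roughly 16 ms per image.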
Figure 7 shows the detection results under the different network models, where columns (a), (b), and (c) are three different insulator images to be detected and the detection results of the four network models are shown in turn. The figure makes it clear that Faster R-CNN in column (a) misdetects the defective insulator as a normal insulator. The CenterNet network model misdetects the enclosure in the background as a normal insulator, and in column (b) it does not detect the defective insulator, resulting in a missed detection. The original YOLOv5 model also shows a false detection of the wall in column (a) and a false detection of the defective insulator. For the defective insulator in column (a), the method presented in this paper therefore has more advantages. For the normal insulators in column (c), all four models give good detection results. Combining the network detection indicators with the visual results, it can be seen that the proposed method has higher detection accuracy and faster detection speed.

Conclusions
Based on the YOLOv5 model architecture, this paper designs an algorithm for insulator defect detection of overhead transmission lines. First, the anchor frame sizes are obtained by k-means clustering of the label files corresponding to the images in the training set, so that the network can obtain more accurate positioning. The lightweight Ghost module is used to replace the original convolution operation to construct a lightweight backbone network, which reduces the network computation and improves the detection speed of the network to 62 FPS. A small-scale detection layer is added to reduce the loss of small-scale target features. At the same time, the CRF receptive field module is introduced to extract more effective feature information, which improves the detection accuracy of the network by 7.3%, and the average detection accuracy reaches 97.4%. The algorithm realizes a more intelligent detection of insulator defects, which reduces the manual input and improves the detection accuracy of defective insulators.
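The k-means step for anchor frame sizes can be sketched as follows. This is a minimal illustration under stated assumptions: it uses 1 - IoU between (width, height) pairs as the distance, which is a common choice for YOLO-style anchors but is not confirmed by this paper, and the function names and toy box sizes are hypothetical.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), aligned at one corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (width, height) pairs into k anchors using 1 - IoU distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)     # initialise from k distinct label boxes
    for _ in range(iterations):
        # Assign each box to the anchor with the highest IoU.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda j: iou_wh(box, anchors[j]))
            clusters[best].append(box)
        # Move each anchor to the mean width and height of its cluster.
        new_anchors = []
        for j, members in enumerate(clusters):
            if not members:            # an empty cluster keeps its old anchor
                new_anchors.append(anchors[j])
                continue
            w = sum(b[0] for b in members) / len(members)
            h = sum(b[1] for b in members) / len(members)
            new_anchors.append((w, h))
        if new_anchors == anchors:     # assignments no longer change
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Toy label sizes in pixels (hypothetical): small defect areas and large insulators.
toy_boxes = [(10, 12), (11, 13), (12, 11), (100, 200), (110, 190), (95, 210)]
anchors = kmeans_anchors(toy_boxes, k=2)
print(anchors)
```

On this toy data the two anchors settle on the small and the large group of boxes, which mirrors the motivation for matching anchor sizes to both small defect areas and whole insulators.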

Figure 2. Improving the backbone network of YOLOv5.

Figure 5. Improving the overall structure of YOLOv5.

Figure 6. P-R curve. (a) P-R curve of method 1 in ablation experiment; (b) P-R curve of method 2 in ablation experiment; (c) P-R curve of method 3 in ablation experiment; (d) P-R curve of method 4 in ablation experiment.

Figure 7. Detection results of different network models. (a) Detection results for image A under the different network models; (b) detection results for image B under the different network models; (c) detection results for image C under the different network models.

Table 1. Comparison of different receptive field modules.

Table 2. Performance index comparison of ablation experiment.

Table 3. Performance comparison of different models.