Detection of Insulators on Power Transmission Line Based on an Improved Faster Region-Convolutional Neural Network

Abstract: Detecting insulators on power transmission lines is of great importance for the safe operation of power systems. The original feature extraction network of the faster region-convolutional neural network (R-CNN), VGG16, is prone to missed detections and misjudgments when insulators appear at different sizes. To improve the accuracy of insulator detection on power transmission lines, an improved faster R-CNN algorithm is proposed. The improved algorithm replaces the original backbone feature extraction network VGG16 in faster R-CNN with the Resnet50 network, which has deeper layers and a more complex structure, and adds an efficient channel attention module based on the channel attention mechanism. Experimental results show that the improved backbone extracts features more effectively. The network model is trained on a training set of 6174 insulator pictures and tested on a testing set of 686 pictures. Compared with the traditional faster R-CNN, the mean average precision of the improved faster R-CNN increases to 89.37%, an improvement of 1.63%.


Introduction
Insulators are widely used in power systems to provide electrical insulation and mechanical support for high-voltage transmission lines [1]. However, under the effects of long-term switching and lightning overvoltage, thermal strain, and natural aging, insulators fail due to cracks or surface pollution, which hinders their safe operation and can cause huge economic losses and casualties in the power transmission system. Therefore, it is crucial to detect insulators against complicated backgrounds to ensure the safe running of a power system. Traditional manual inspection methods are time-consuming and highly dangerous, so, with the rapid development of Unmanned Aerial Vehicles (UAVs), UAV inspection is becoming popular. As massive numbers of aerial images become increasingly easy to access, an accurate and real-time method for locating broken insulators is urgently needed.
At present, the methods for detecting insulators can be divided into two categories, according to the development stage. One is the traditional methods, which combine human-designed features and classifiers [2][3][4][5]. The other is the methods based on deep learning networks [6][7][8][9]. As an example of the traditional methods, in [10], Xiaotong Yao et al. first combined the Canny operator with the SURF algorithm to extract the edge feature points of the insulator in the image, then used the Haar wavelet to obtain the description information of the feature points and the Euclidean distance ratio to match the feature points of the target insulator, and finally used the RANSAC algorithm to obtain the correctly matched feature points and thereby identify and detect insulators. Although traditional methods provided a new idea for insulator detection, their drawbacks are gradually emerging. The detection results depend on the quality of image segmentation, which seriously affects the speed of subsequent feature extraction and classification, and traditional methods do not segment insulator images with complex backgrounds well.
In recent years, because deep learning networks have strong learning and representation ability and extract features that generalize well, experts and scholars at home and abroad have gradually applied them to insulator detection. There are two main directions in deep-learning-based insulator detection: single-stage target detection algorithms based on regression [5] and two-stage target detection algorithms based on candidate regions. The main representatives of the single-stage approach are the YOLO series and the SSD algorithm. For example, in [11], an insulator identification method combines traditional methods, such as edge detection and line detection, with the YOLO-V2 algorithm (You Only Look Once V2). However, the loss function and anchor design of YOLO-V2 differ considerably from those of YOLO-V3 (You Only Look Once V3), and its detection performance is not as good as that of YOLO-V3. Therefore, in [12], the focal loss function and the balanced cross-entropy function were introduced into YOLO-V3 to address the imbalance of positive and negative samples in the training data set. However, since YOLO-V2 and YOLO-V3 are single-stage target detection algorithms, their detection accuracy is lower than that of two-stage target detection algorithms. Liu et al. applied the YOLO-V3 target detection network to the localization and recognition of power insulator equipment, providing a new concept for power equipment inspection, but the recognition accuracy is still far from what practical applications require [13].
Compared with the single-stage target detection algorithm based on regression, the two-stage target detection algorithm based on candidate regions is superior in detection accuracy and positioning accuracy. Aiming at the low accuracy of YOLO-V3 in detecting insulator images, Yan Hongwen et al. used the focal loss function and the balanced cross-entropy function to improve the loss function of the model and thereby the insulator identification accuracy [14]. Ji Chao et al. proposed a saliency calculation method that feeds salient regions into the fast region candidate network Fast R-CNN, which avoids time-consuming candidate region traversal. In addition, a residual block was introduced into the Fast R-CNN feature extraction network, which ensures the integrity of the feature information transfer and improves the detection efficiency [15]. Two convolutional neural networks were compared on a limited insulator data set in [16]. Experimental results showed that the faster regional convolutional neural network (faster R-CNN) achieves a higher AP value (average precision) than the region-based fully convolutional network (R-FCN). In [17], the faster R-CNN algorithm was used to extract features of insulators, and adaptive image preprocessing, area-based non-maximum suppression, and segmentation detection were introduced to detect insulators effectively, but the insulator positioning accuracy and model training efficiency were not high. In [18], an insulator detection method combining an attention mechanism and faster R-CNN was proposed. This method introduced a squeeze-and-excitation network structure in the convolutional feature extraction network.
In order to further improve the performance of faster R-CNN, we propose a faster R-CNN based on Resnet50 and efficient channel attention (ECA)-net. The contributions are summarized as follows.
(1) The Resnet50 network is used as the backbone feature extraction network. The improved algorithm uses the Resnet50 network to replace the original VGG16 network, which results in more comprehensive features being extracted;
(2) An efficient channel attention (ECA)-net based on the channel attention mechanism is added. The addition of ECA-net helps to extract useful information and suppress useless information, which helps to improve the overall performance of faster R-CNN.
The structure of the paper is arranged as follows. The development process from R-CNN to faster R-CNN and how to improve the faster R-CNN algorithm will be introduced in Section 2. In Section 3, experimental results and analysis are discussed. Finally, conclusions are described in Section 4.

Target Detection Method
This section first briefly describes the development process from R-CNN to faster R-CNN in Section 2.1, and then describes the network structure of faster R-CNN in Section 2.2. Finally, Section 2.3 describes in detail the focus of this part, namely how the faster R-CNN algorithm is improved.

The Development History of Faster R-CNN
R-CNN is a milestone in the application of CNNs to the target detection problem. It was proposed by Ross Girshick (RBG) et al. in 2014. Building on the good feature extraction and classification performance of CNNs, it tackles the target detection problem through the Region Proposal method. The steps of using R-CNN for target detection are as follows:
1. Input the image;
2. Use the selective search algorithm to extract about 2000 Region Proposals, from top to bottom, in the image;
3. Warp each Region Proposal to a size of 227 × 227 and input it to the CNN. The output of the fc7 layer is used as the feature;
4. Input the CNN features extracted from each Region Proposal into an SVM for classification;
5. Perform bounding box regression for the Region Proposals classified by the SVM, using the bounding box regression values to correct the original suggestion window and generate the prediction window coordinates.
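The correction in step 5 can be illustrated with a small sketch. It assumes the center/size parameterization commonly used for R-CNN-style bounding box regression, in which the regressor outputs four offsets (dx, dy, dw, dh); the function name and the corner-coordinate box format are illustrative choices, not taken from this paper.

```python
import math

def apply_box_deltas(proposal, deltas):
    """Refine a proposal (xmin, ymin, xmax, ymax) with offsets (dx, dy, dw, dh).

    Assumed parameterization: the center moves in proportion to the
    proposal size, and width/height are scaled in log space.
    """
    xmin, ymin, xmax, ymax = proposal
    dx, dy, dw, dh = deltas

    # Convert the corner representation to center/size form.
    width = xmax - xmin
    height = ymax - ymin
    center_x = xmin + 0.5 * width
    center_y = ymin + 0.5 * height

    # Shift the center and rescale the size.
    new_center_x = center_x + dx * width
    new_center_y = center_y + dy * height
    new_width = width * math.exp(dw)
    new_height = height * math.exp(dh)

    # Convert back to corner form (the prediction window coordinates).
    return (
        new_center_x - 0.5 * new_width,
        new_center_y - 0.5 * new_height,
        new_center_x + 0.5 * new_width,
        new_center_y + 0.5 * new_height,
    )
```

With all offsets equal to zero the proposal is returned unchanged, which is why a well-placed suggestion window needs only small corrections.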
On the Pascal VOC 2012 data set, R-CNN raised the mean average precision (MAP) for target detection to 53.3%, which is a full 30% improvement over the previous best results. However, R-CNN also has some obvious problems. Training consists of three separate stages (fine-tuning the network, training the SVM, and training the bounding box regressor), which is cumbersome, time-consuming, and occupies a large amount of storage. Even with GPU acceleration, processing one image with the VGG16 [19] model requires 47 s. In addition, the CNN features are not learned and updated during the support vector machine (SVM) and regression training.
In response to these problems of R-CNN, fast R-CNN [20] was proposed in 2015 as an improvement on R-CNN. Compared with R-CNN, the main difference of fast R-CNN is that a RoI pooling layer is added after the last convolutional layer so that each suggestion window generates a fixed-size feature map. The loss function is a multi-task loss. The bounding box regression is added directly to the CNN for training, and the target classification and bounding box regression are performed at the same time after the fully connected layer. Its target detection steps are as follows:
1. Input the image;
2. Use the selective search algorithm to extract about 2000 proposal windows (Region Proposals), from top to bottom, in the image;
3. Input the entire picture into the CNN for feature extraction;
4. Map the suggestion window to the last convolutional feature map of the CNN;
5. Use the RoI pooling layer to generate a fixed-size feature map for each suggestion window;
6. Use Softmax Loss (probability of detection classification) and Smooth L1 Loss (bounding box regression) to jointly train the classification probability and the bounding box regression.
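The joint training in step 6 can be sketched as a single loss that adds a softmax classification term and a Smooth L1 regression term. This is a minimal illustration under stated assumptions: class index 0 is treated as background (the usual Fast R-CNN convention), the balancing weight `lam` defaults to 1, and all function names are hypothetical.

```python
import math

def softmax_cross_entropy(logits, true_class):
    """Negative log-probability of the true class under a softmax."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[true_class]

def smooth_l1(x):
    """Quadratic near zero, linear for large errors."""
    magnitude = abs(x)
    return 0.5 * x * x if magnitude < 1.0 else magnitude - 0.5

def multi_task_loss(logits, true_class, pred_deltas, target_deltas, lam=1.0):
    """Classification loss plus box regression loss for one RoI."""
    cls_loss = softmax_cross_entropy(logits, true_class)
    # Assumption: class 0 is background, which has no box target,
    # so the regression term is skipped for background RoIs.
    if true_class == 0:
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_deltas, target_deltas))
    return cls_loss + lam * loc_loss
```

Because Smooth L1 grows only linearly for large errors, a few badly localized proposals do not dominate the gradient the way they would under a squared loss.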
Fast R-CNN normalizes the entire image, sends it directly to the CNN, and adds the suggestion box information to the feature map output by the final convolutional layer, so that the preceding CNN computation is shared and testing is accelerated. During training, an image needs to be sent to the network only once, and its CNN features and suggested regions are extracted in a single pass. The training data enter the loss layer directly in GPU memory, so the first few layers of features do not need to be recalculated for each candidate region, which speeds up training. Fast R-CNN implements both category judgment and location regression with deep networks, so no additional storage is needed and memory space is saved. Because of RoI pooling, the input no longer needs crop and warp operations, which avoids pixel loss and neatly solves the problem of scale variation.
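To make the role of RoI pooling concrete, the following sketch max-pools an arbitrary rectangular region of a single-channel feature map into a fixed-size grid. It is a simplified illustration, not the implementation used in the experiments: the feature map is a plain list of lists, the RoI is given in inclusive feature-map indices, and the bin boundaries use a floor/ceil rule chosen here for simplicity.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool roi = (x0, y0, x1, y1), inclusive indices, to out_h x out_w."""
    x0, y0, x1, y1 = roi
    roi_h = y1 - y0 + 1
    roi_w = x1 - x0 + 1

    pooled = []
    for i in range(out_h):
        # Bin rows: floor for the start, ceil for the (exclusive) end,
        # so every bin is non-empty and the bins cover the whole RoI.
        row_start = y0 + (i * roi_h) // out_h
        row_end = y0 + -((-(i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            col_start = x0 + (j * roi_w) // out_w
            col_end = x0 + -((-(j + 1) * roi_w) // out_w)
            row.append(max(
                feature_map[y][x]
                for y in range(row_start, row_end)
                for x in range(col_start, col_end)
            ))
        pooled.append(row)
    return pooled
```

Whatever the size of the suggestion window, the output always has `out_h` × `out_w` entries, which is what allows windows of different sizes to feed the same fully connected layers without cropping or warping the input image.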
Although fast R-CNN made a big leap in speed, it still takes more than 2 s to detect a picture, which cannot meet real-time requirements. This is mainly because the region proposal stage consumes a lot of time and does not run well on the GPU. For this reason, faster R-CNN [6] was proposed in 2017, and it has two main differences from fast R-CNN. One is the use of an RPN (Region Proposal Network) instead of the original selective search method to generate the suggestion windows; the other is that the CNN that generates the suggestion windows is shared with the CNN for target detection. Faster R-CNN lets the convolutional network generate the suggestion frames itself and shares that network with the target detection network, so that the number of suggestion frames is reduced from about 2000 to 300 and their quality is substantially improved. Moreover, by introducing the RPN and anchor boxes on the basis of fast R-CNN, faster R-CNN integrates the region proposal generation process into network training, which effectively reduces the time needed to generate RoIs and also significantly improves accuracy.
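The anchor box idea can be sketched as follows. At every position of the shared feature map, the RPN considers a fixed set of reference boxes of different scales and aspect ratios. The values used below (three scales, three aspect ratios, and a feature stride of 16) follow the defaults commonly quoted for faster R-CNN and are assumptions here, since this paper does not state its anchor settings; the function names are illustrative.

```python
import math

def generate_anchors(center_x, center_y,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors (xmin, ymin, xmax, ymax) centered on one location."""
    anchors = []
    for scale in scales:
        area = float(scale * scale)
        for ratio in ratios:  # ratio = height / width
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((
                center_x - width / 2, center_y - height / 2,
                center_x + width / 2, center_y + height / 2,
            ))
    return anchors

def anchors_for_feature_map(feat_h, feat_w, stride=16):
    """Place the anchor set at the image location of every feature-map cell."""
    all_anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of the image patch that this feature-map cell covers.
            center_x = (col + 0.5) * stride
            center_y = (row + 0.5) * stride
            all_anchors.extend(generate_anchors(center_x, center_y))
    return all_anchors
```

The RPN then scores each anchor as positive or negative and regresses offsets for it, and only the best-scoring proposals (300 in the description above) are passed on to the detection head.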

Faster R-CNN Network Structure and Detection Steps
The network structure diagram of faster R-CNN is shown in Figure 1. The detection process is described as follows:
1. Input the image to be tested;
2. Use the VGG16 network to extract feature maps from the entire input image. The feature maps are shared by the subsequent RPN layer and fully connected layer;
3. The RPN network is used to generate region proposals. This layer uses softmax to determine whether the anchors are positive or negative, and then uses bounding box regression to correct the anchors to obtain accurate proposals;
4. The RoI pooling layer collects the input feature maps and proposals, combines the information to extract the proposal feature maps, and sends them to the subsequent fully connected layer to determine the target category;
5. The proposal feature maps are used to calculate the category of each proposal, and, at the same time, bounding box regression is applied again to obtain the final precise position of the detection frame.

Improved Faster R-CNN
In order to improve the performance of the faster R-CNN network model for insulator recognition, this paper proposes an improved faster R-CNN based on the faster R-CNN network model, by replacing the VGG16 backbone feature extraction network with a deeper network and a more complex model structure, namely, the Resnet50 network. In addition, an ECA-net module based on the channel attention mechanism has been added, which makes the Resnet50 network much better at the extraction of insulator features.

Resnet50 Network Replacing VGG16 Network
The original faster R-CNN model uses the VGG16 network as the feature extraction backbone, and the features output by the final convolutional layer of VGG16 are used as the shared features of the RPN network and the RoI pooling layer. This algorithm is used mainly to detect everyday objects, and it is suitable for the COCO data set. When it is applied to the insulator data set, it has the following main shortcomings:
1. VGG16 outputs features from a single layer, which suits the detection of single-sized targets. Because insulators appear at different sizes in the image, missed detections and misjudgments occur easily;
2. Due to the different scales of aerial insulator images, many insulators are small targets relative to the entire picture.
In order to identify the insulators more accurately, the feature extraction backbone network needs to be improved.
The Resnet50 network [21] consists of 49 convolutional layers and 1 fully connected layer. The introduction of residual blocks allows the network depth to increase, so that deeper features can be extracted while gradient vanishing and gradient explosion are effectively avoided. The authors in [21] reported that a residual network with a depth of up to 152 layers on the ImageNet data set still has lower complexity than the VGG16 network. Therefore, compared with VGG16, Resnet50 increases the depth of the network without increasing its complexity.
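The effect of a residual block on gradient flow can be written compactly. The notation below is introduced here for illustration and is not taken from the paper: $x$ is the block input, $\mathcal{F}$ is the stack of convolutional layers inside the block, and $I$ is the identity.

```latex
\[
y = \mathcal{F}(x) + x,
\qquad
\frac{\partial y}{\partial x} = \frac{\partial \mathcal{F}(x)}{\partial x} + I
\]
```

Because of the identity term, the gradient that reaches $x$ always contains an unattenuated copy of the gradient arriving at $y$, even when $\partial \mathcal{F} / \partial x$ is small. This is the usual explanation of why stacking many such blocks does not lead to vanishing gradients.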

ECA-Net Module Joining the Resnet50 Network
Recently, the channel attention mechanism has been shown to have great potential for improving the performance of convolutional neural networks (CNNs). In order to make CNNs have better performance, more complex attention modules have been developed, such as SANet [22], SKNet [23], and ResNeSt [24]. Although the performance of the network is improved, it also indirectly increases the complexity of the model. An ECA module to improve the channel attention mechanism has been proposed in [25]. It introduced a local cross-channel interaction strategy without dimensionality reduction and a method of adaptively selecting the size of the one-dimensional convolution kernel, thereby achieving performance improvement and balancing the contradiction between performance and complexity well. This module generates channel attention through fast one-dimensional convolution, and the size of the convolution kernel can be adaptively determined by the nonlinear mapping of the channel dimension. The complexity of the model added by this module is small, and the improvement effect is significant.
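The ECA computation described above can be sketched in plain Python. The sketch assumes the kernel-size rule given in the ECA paper [25], namely the nearest odd number to (log2(C) + b) / γ with γ = 2 and b = 1; the uniform convolution weights are placeholders for weights that would be learned in a real network, and the function names are illustrative. Where the module is placed inside Resnet50 is not specified here.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: an odd number that grows with log2(channels)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def eca_forward(feature_maps, weights=None):
    """Apply ECA to a list of C channels, each an H x W list of lists."""
    channels = len(feature_maps)
    k = eca_kernel_size(channels)
    if weights is None:
        weights = [1.0 / k] * k  # placeholder; learned during training

    # 1. Global average pooling: one descriptor per channel.
    pooled = [
        sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
        for fm in feature_maps
    ]

    # 2. 1D convolution across neighbouring channels (zero padding).
    #    No dimensionality reduction: one output per input channel.
    pad = k // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(weights[j] * padded[c + j] for j in range(k))
        for c in range(channels)
    ]

    # 3. Sigmoid gate per channel.
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4. Rescale every channel by its gate.
    scaled = [
        [[value * gate for value in row] for row in fm]
        for fm, gate in zip(feature_maps, gates)
    ]
    return scaled, gates
```

The module adds only `k` parameters per insertion point, which illustrates why the added model complexity is small: for 256 or 512 channels the rule above gives a kernel of size 5.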

Experimental Results and Analysis
This section will elaborate on the three aspects of experimental configuration and data set, evaluation index, and result analysis.

Experiments
The hardware and software configuration is as follows: AMD Ryzen 9 3950X 16-Core Processor at 3.50 GHz, 64 GB RAM, GeForce RTX 3090, and Windows 10. The environment is configured with CUDA 11.1, Python 3.7.10, and PyTorch 1.8.0. The selected insulator data set is a public one with 6860 images, available at https://github.com/heitorcfelix/public-insulator-datasets (accessed on 12 April 2021). The data set occupies a storage space of 3.3 GB. It consists of real images and synthetic images, and has been enhanced. This experiment uses 6174 pictures as the training set and 686 pictures as the testing set. An example image is shown in Figure 2. Since the data set is provided in COCO format, the first step is to write a script to convert it to VOC2012 format. Although the insulator data set used is public, no literature related to this data set is available. Therefore, we compare the performance of three networks on this public data set: the proposed faster R-CNN with Resnet50+ECA-net, the original faster R-CNN, and the faster R-CNN with Resnet50.
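The core of the format conversion and the 6174/686 split can be sketched as follows. This is not the authors' script: it only shows the box conversion, relying on the fact that COCO stores a box as [x, y, width, height] while VOC stores corner coordinates (xmin, ymin, xmax, ymax). The random 90/10 split with a fixed seed is an assumption, since the paper does not say how the split was made, and the function names are hypothetical.

```python
import random

def coco_bbox_to_voc(bbox):
    """Convert a COCO box [x, y, width, height] to VOC (xmin, ymin, xmax, ymax)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)

def split_dataset(image_ids, test_fraction=0.1, seed=0):
    """Shuffle image ids reproducibly and split them into train and test lists."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]

# Usage: a 10% test fraction of 6860 images reproduces the sizes in the text.
train_ids, test_ids = split_dataset(range(6860))
print(len(train_ids), len(test_ids))
```

A complete converter would additionally write one VOC-style XML annotation file per image; that part is omitted here.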

Evaluation Index
Insulator detection belongs to single-label classification learning. This experiment uses mean average precision (MAP) as the performance evaluation index of the model. Before introducing MAP, we need to review the concepts of intersection over union (IOU), precision, recall, and average precision (AP) [21,26–30].
IOU is the intersection over union, which measures the degree of overlap of two regions: it is the ratio of the overlapping area of the two regions to the total area covered by the two. As shown in Figure 3, the green box is the ground-truth box of the insulator, the red box is the prediction box, and the IOU of the two rectangular boxes is the ratio of their intersection area to their union area.
Precision and recall are described as follows. In the field of target detection, we first assume that there is a set of pictures to be detected. Precision represents the proportion of real target objects among the targets detected by the model; recall is the proportion of real targets detected by the model among all target objects in the images to be detected, that is, how many of all real targets are successfully detected by the model. Precision and recall are calculated as follows:
\[ \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} \]
where TP, TN, FP, and FN represent a true positive, true negative, false positive, and false negative, respectively, and their meanings are described in Table 1.

Table 1.
The IOU value of a target box in the data set is greater than 0.5 (True): TP, TN
The IOU value of all target boxes in the data set is less than 0.5, repeated detection (False): FP, FN
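The definitions above can be illustrated with a short sketch that computes IOU for two boxes and then counts true positives, false positives, and false negatives for one image at an IOU threshold of 0.5. The greedy matching rule (each ground-truth box can be matched by at most one prediction, and predictions are assumed to be sorted by descending confidence) is a common simplification and is an assumption here, as are the function names.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0

def precision_recall(predictions, ground_truths, iou_threshold=0.5):
    """Precision and recall for predictions sorted by descending confidence."""
    matched = set()
    tp = fp = 0
    for pred in predictions:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue  # this ground-truth box is already detected
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou > iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1  # low overlap or repeated detection
    fn = len(ground_truths) - len(matched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In this sketch a second detection of an already matched insulator counts as a false positive, which corresponds to the "repeated detection" case in Table 1.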
The precision-recall (PR) curve is shown in Figure 4. The higher the precision and recall, the better the detection performance of the model, so we expect high values for both, but in some cases they conflict. For example, in an extreme case where only one result is detected and it is correct, precision is 100% but recall is very small. If all candidate results are returned, recall is very large but precision is very small. Therefore, whether higher precision or higher recall is preferred depends on the application. Usually, the precision-recall curve is drawn to show this trade-off. As the name implies, AP is the average precision. Simply put, it is the average of the precision values on the PR curve, that is, the area under the PR curve. As shown in Figure 4, the shaded area is the value of AP. AP is computed for a single category, and the average of the AP values over all categories is MAP.
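The area under the PR curve can be computed with a short sketch. It assumes that every detection has already been labelled as a true or false positive and uses the all-point interpolation commonly applied to Pascal VOC style evaluation, in which precision is first made monotonically non-increasing; the exact interpolation used in the experiments is not stated in this paper, and the function names are illustrative.

```python
def average_precision(scored_hits, num_ground_truths):
    """AP from a list of (confidence, is_true_positive) detections."""
    if num_ground_truths == 0:
        return 0.0
    ranked = sorted(scored_hits, key=lambda item: item[0], reverse=True)

    # Walk down the ranking and record one PR point per detection.
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truths)
        precisions.append(tp / (tp + fp))

    # Precision envelope: make precision non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles under the PR curve.
    ap = 0.0
    previous_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap

def mean_average_precision(ap_per_class):
    """MAP is the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

When a data set contains a single category, as with the insulator class here, the mean over categories has one term, so MAP and AP take the same value.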

Analysis of Experimental Results
This article replaces faster R-CNN's backbone feature network VGG16 with a Resnet50 network to which the ECA-net module based on the channel attention mechanism is added. The three networks (original faster R-CNN with VGG16, faster R-CNN with Resnet50, and faster R-CNN with Resnet50+ECA-net) are trained independently on the same insulator data set, and their insulator detection performance is compared on the same testing set. Experimental results show that the improved faster R-CNN outperforms the original faster R-CNN with respect to MAP.

Comparisons of Insulator Target Detection Training Results
The model parameters are set as follows: the number of iterations, the initial learning rate, and the batch size are set to 150, 0.00005, and 8, respectively. Figure 5 shows the training loss curve, the validation loss curve, and their smoothed versions for the faster R-CNN with the VGG16 network as the backbone feature extraction network. These are the loss curves of transfer learning from a model pre-trained on ImageNet. After 126 epochs, the training loss converges to around 0.535, and the validation loss converges to around 0.716.
In order to verify the effectiveness of Resnet50 and Resnet50+ECA-net, we compare the loss curves of the faster R-CNN with Resnet50 and of the faster R-CNN with Resnet50+ECA-net against the loss curve of the original faster R-CNN with VGG16. Figures 6 and 7 show the training loss curve, the validation loss curve, and their smoothed versions for the faster R-CNN with the Resnet50 backbone and for the faster R-CNN with the Resnet50 backbone and ECA-net, respectively. For the faster R-CNN with the VGG16 backbone, at epoch 140 the training and validation losses are 0.536 and 0.718, respectively. Compared with this baseline, the training loss of the faster R-CNN with the Resnet50 backbone and of the faster R-CNN with the Resnet50 backbone and ECA-net decreases by 16.6% and 20.1%, respectively, and the validation loss decreases by 9.6% and 10.7%, respectively. These results indicate that the Resnet50 backbone and ECA-net help to reduce the loss value.

Comparisons of Insulator Target Detection Testing Results
Testing experiments are performed to compare the improved faster R-CNN with the original faster R-CNN. To show the performance before and after the improvement more directly, experimental results are shown in Figures 8-10 and Table 2. The final APs of the PR curves of the original faster R-CNN, the first improved faster R-CNN, and the second improved faster R-CNN on the testing set are 87.74%, 87.91%, and 89.37%, respectively, with an improvement of 0.19% and 1.63%. As Table 2 shows, compared with the original faster R-CNN, the first and second improved faster R-CNN have fewer parameters, lower training and testing loss values, and higher accuracy. In summary, the addition of the Resnet50 backbone and ECA-net decreases the losses and raises the AP, which leads to the improved performance of the faster R-CNN.

Display of Actual Test Results
The faster R-CNN target detection model, using the Resnet50 network with the ECA-net module as the backbone feature extraction network, is tested on the testing set, and the insulator detection results for aerial transmission line images are shown in Figure 11. It can be seen that good results are achieved for the multi-target detection of insulators of different sizes.

Conclusions
This paper proposes an insulator detection method based on an improved faster R-CNN. To address the missed detections and misjudgments of the original feature extraction network VGG16 when insulators appear at different sizes, the ECA-net+Resnet50 network is used to replace the original VGG16 backbone network. The deeper network has a larger field of view, which facilitates the detection of targets at different scales. Experimental results indicate the effectiveness of the introduced method for detecting insulators. Compared with the original faster R-CNN, the improved faster R-CNN has fewer parameters, lower training and testing loss values, and higher accuracy.
Our future work will focus on the further improvement of insulator detection, e.g., the direction-aware defect detection network in [31] can be considered; optimization spiking neural P systems [32][33][34] can be introduced to optimize the structure and parameters of the ECA-net+Resnet50 network used in this paper; learning spiking neural P systems or fuzzy reasoning spiking neural P systems [35,36] can also be used to enhance the insulator detection [37,38].

Data Availability Statement:
The selected insulator data set is a public one with 6860 images https://github.com/heitorcfelix/public-insulator-datasets (accessed on 12 April 2021).

Conflicts of Interest:
The authors declare no conflict of interest.