Search Like an Eagle: A Cascaded Model for Insulator Missing Faults Detection in Aerial Images

Abstract: Insulator missing fault is a serious accident on high-voltage transmission lines, which can cause abnormal energy supply. Recently, many vision-based methods have been proposed for detecting insulator missing faults in aerial images. However, these methods usually lack efficiency and robustness due to complex background interferences in the aerial images. More importantly, most of these methods cannot address insulator multi-fault detection. To solve these challenges, this paper proposes a novel cascaded model to detect insulator multi-fault in aerial images. Firstly, a total of 764 images are adopted to create a novel insulator missing faults dataset, 'IMF-detection'. Secondly, a new network is proposed to locate the insulator string against the complex background. Then, the located region that contains the insulator string is set to be an RoI (region of interest). Finally, the YOLOv3-tiny network is trained and then used to detect the insulator missing faults in the RoI. Experimental results and analysis validate that the proposed method is more efficient and robust than some previous works. Most importantly, the average running time of the proposed method is about 30 ms, which demonstrates that it has the potential to be adopted for the on-line detection of insulator missing faults.


Introduction
In the past decades, human demand for electric energy has been increasing with the development of industrial civilization [1]. As a result, many countries have begun to build a large number of high-voltage transmission lines. Typically, a transmission line will develop various types of faults after running for a while, such as insulator missing [2,3], bird's nest invasion [4,5], and transmission line faults [6,7]. Therefore, it is quite important to conduct regular inspections of high-voltage transmission lines. However, transmission lines usually span a large geographical area, which makes traditional human-vision inspection error-prone and time-consuming [8], as shown in the first column of Figure 1a. To address these problems, current power grid companies use manned helicopters equipped with various sensors [9], such as cameras, infrared sensors, and ultraviolet sensors [10], for transmission line inspection, as shown in the second column of Figure 1a. However, this method is costly, because the helicopter pilot must have a wealth of flying experience and the infrared and ultraviolet sensors are usually expensive.
Recently, using a UAV (unmanned aerial vehicle) to carry a camera to inspect key components of transmission lines has received increasing attention due to its advantages over the human-vision-based and helicopter-based methods, including flexibility, small size, and low cost [11][12][13], as shown in the third column of Figure 1a. For insulator string detection and insulator fault detection in aerial images, many vision-based methods have been proposed to pursue automatic detection that could be used in on-line or off-line applications [14]. Some typical insulator faults, such as missing [3], flashover [15], and contamination [16], have attracted widespread attention. Specifically, missing faults are considered to be the most severe insulator fault if they cannot be detected and repaired in time. From the perspective of a third party, a UAV searching for insulator missing faults looks like an eagle searching for prey, as shown in Figure 1b.
To detect the insulator string, the method [17] proposes three binary shape priors for detecting the insulator string in aerial images. Firstly, the contour of the insulator string is extracted and then the RANSAC (RANdom SAmple Consensus) algorithm [18] is adopted to estimate the possible orientation of an insulator string. Secondly, three binary shape priors are designed to search for each insulator string candidate. Finally, the insulator string candidates with small areas are removed and the real insulator string can then be located in the aerial image. However, the shapes of the insulator pieces vary noticeably with changes in filming angle and distance, which would cause this method to fail; this situation will also degrade the performances of [19][20][21][22][23]. In Wang [24], the aerial image is converted from RGB (Red, Green, and Blue) space to Lab (Luminance, a channel, and b channel) space to avoid the influence of lighting changes. Subsequently, an improved Otsu algorithm [25] is proposed for obtaining the insulator string candidates. Finally, a morphological operation is applied to locate the real region of the insulator string. However, it can be expected that these methods can only achieve excellent performance in aerial images where the insulator string has a salient color, since aerial images contain various complex background interferences, such as grass, trees, and roofs, whose surface colors are sometimes similar to that of the insulator string. To enhance the accuracy of insulator string detection, some researchers try to adopt machine learning-based methods [26][27][28] to detect the insulator string. In Liao [26] and Tian [27], the SVM (Support Vector Machine) model is employed to train classifiers by using the fusion features of the insulator strings.
However, in the case of limited training samples, only using the color and the texture features to train an SVM classifier cannot effectively distinguish the insulator strings from the background interferences. To solve this problem, the method [28] uses SketchUp software to create a synthetic insulator string dataset for training an AdaBoost classifier. Although the authors only create hundreds of insulator string images, it can be expected that this method can effectively augment the insulator string dataset, which is quite useful for the training of machine-learning-based methods. Recently, with increasing attention paid to deep neural networks, researchers have devoted great efforts to the development of deep learning-based insulator string detection methods. Specifically, the strategies of these methods fall into two categories: object detection based [29][30][31][32] and semantic segmentation based [33][34][35][36]. Although these methods achieve good performances when compared with the previous works, they have to use data augmentation methods, such as rotation and flipping, to expand datasets to meet the training needs of deep neural networks. In particular, the work [33] considers it very hard and costly to collect rich and varied insulator string samples. Therefore, they propose a novel method to create synthetic insulator string images. Firstly, real insulator string patches are segmented from the aerial images as the basic seeds. Then, HSV (Hue, Saturation, and Value) space adjustment and affine transforms are performed to expand the basic seeds. Finally, the expanded basic seeds are pasted onto aerial images that do not contain insulator strings to create a synthetic insulator string dataset. Experiments with multiple neural networks verify that their synthetic aerial images generalize well to real-world images.
To detect the insulator missing fault, earlier methods usually locate the coarse position of the insulator string as the first step. After that, multiple feature priors such as color [3], shape [22], and spatial distribution [8] are explored to detect the insulator missing fault. However, the earlier methods can only achieve good performances in some specific situations, while most of them are severely affected by background interferences, filming angles, and distances. Besides, these methods are time-consuming, which makes them far from suitable for real-time applications. In recent years, this research field has benefited from the development of deep learning theory, which makes insulator missing fault detection more accurate and efficient. In Gao [37], the Faster-RCNN model [38] is exploited for locating the insulator string. Subsequently, the FCN (Fully Convolutional Networks) model [39] is used to segment each insulator piece from the detected insulator string. Finally, the mid-point of each insulator piece is calculated, and then a midpoints-based criterion is proposed for judging the position of the insulator missing fault. However, the performance of this method depends entirely on the performance of the FCN model. Moreover, this method will degrade in cases where insulator pieces overlap. To compensate for this shortcoming, the method [40] uses an improved Faster-RCNN to train a classifier that can directly detect the insulator missing fault. However, the insulator string dataset consists of thousands of images, while the original missing fault dataset is composed of only 120 images in total; this situation will lead to a class imbalance problem. In Ling [41], the Faster-RCNN model and the U-net model are combined to create a new cascaded structure for insulator missing fault detection. Firstly, the insulator string is located by the Faster-RCNN model. Subsequently, the located insulator string is set to be an RoI (Region of Interest).
Finally, the U-net model is used to segment the contour of the missing fault. However, labeling the ground truth of a missing fault dataset for the semantic segmentation task (i.e., for U-net training) is quite time-consuming and laborious; moreover, it is worth noting that the methods [42][43][44] also utilize a semantic segmentation model to perform insulator missing fault detection. In [45], a new saliency-based network is proposed to detect the contour of the insulator string. Subsequently, a mathematical model of insulator pieces is developed to analyze the spatial feature of each insulator piece. Finally, an area-based principle is proposed to judge the position of the missing fault. Similar to [37], this method will degrade in cases where the insulator pieces overlap. To enhance the robustness of missing fault detection, the method [46] adopts an SSD model [47] to perform multi-level perception for both insulator string and missing fault detection. The experimental results validate the effectiveness and efficiency of that method. However, this method also suffers from the limited number of images, which might lead to over-fitting of the network. To solve this problem, a novel data augmentation strategy is proposed in [48]. Firstly, the affine transformation algorithm is applied to the original image. Subsequently, the TV-based algorithm and the U-net model are used to segment the insulator string from the original background, and the segmented insulator string is then merged with a new background. Finally, Gaussian blur and brightness transformation are performed to change the blurriness and brightness of the image for simulating real imaging environments.
In general, most of the previous works lack efficiency and robustness, and the existing challenges can be summarized as follows:
1. There has been no publicly available dataset for insulator string detection in UAV aerial images so far. Moreover, the number of images of insulator missing faults is even smaller than that of the insulator string.
2. The aerial images usually contain lots of background interferences, and the shapes of the insulator string and insulator missing faults change significantly due to the changes in filming angle and distance.
3. Most of the previous works can only detect insulator missing one fault, while they cannot recognize insulator missing multi-fault. In addition, most of the previous works usually lack efficiency.
On the one hand, the first challenge makes it difficult to obtain a fair evaluation of the performances of the existing methods. Meanwhile, this challenge also makes it difficult to apply deep learning methods in the field of insulator faults detection. On the other hand, the first and the second challenge both lead to the fact that most previous methods usually lack robustness, as they are only designed by analyzing few aerial images of insulator missing faults while the features of the insulator missing faults may be quite distinctive in different aerial images. It is worth noting that the third challenge is considered as a common challenge for most of the previous works.
To address the existing challenges, we cooperate with the Chinese Power Grid Companies to collect insulator string aerial images. Moreover, since natural images that contain insulator missing faults are scarce, we create simulated insulator missing faults images as a key step. After that, we create a novel insulator missing faults dataset, which is named 'IMF-detection' (Insulator Missing Fault, IMF) in this paper. Based on the above works, we propose a novel cascaded model to detect the insulator missing faults in aerial images. The proposed model contains two cascaded steps. Firstly, we improve the network structure used in our previous work [2] by integrating a novel SPP (Spatial Pyramid Pooling) model. The improved network can more accurately detect the insulator string and then remove most of the background interferences in aerial images. Secondly, the detected region of the insulator string is set to be an RoI, and the YOLOv3-tiny network (the abbreviated version of YOLOv3 [49]) is then adopted to detect the insulator missing faults. Our proposed method is simple but effective when compared with the previous methods. Moreover, the proposed method can detect not only the insulator missing one fault, but also the insulator missing multi-fault. Most importantly, the average running time of the proposed method is only about 30 ms, which can meet the requirements of real-time applications.
The remainder of this paper is organized as follows. Section 1 reports the related works of insulator string detection and insulator missing fault detection. Section 2 gives a detailed description of the proposed method. Section 3 exhibits and discusses the experimental results. Finally, Section 4 presents the conclusion and future work. Most of the experimental results are shown in figures; please zoom in for a better view.

Proposed Method
Recently, deep-learning-based methods have been widely used for object detection in the computer vision community. Some research fields, such as autonomous driving [50], three-dimensional (3D) scene representation [51], and person re-identification [52], have developed rapidly. Inspired by these pioneering works, it is worth investigating how to use deep-learning-based models for insulator string and missing fault detection to address the existing challenges. In this section, the proposed cascaded model is divided into two parts for a step-by-step analysis.

Insulator String Detection
Observation of a large number of aerial images shows that they contain not only insulator strings, but also background interferences. These background interferences are usually quite complex and affect the accuracy of insulator string detection. To solve this problem, our previous work [2] proposed a novel network to obtain accurate insulator string positions; the experimental results show that it achieved an AP (Average Precision) value of 89.96% on the 'InST_detection' dataset, which was higher than that of YOLOv2 [53], ResnetV2 [54], and YOLOv3-tiny, but lower than that of YOLOv3 [49]. In practical applications, the accuracy of insulator string detection affects the accuracy of the subsequent insulator missing faults detection. Therefore, although the experimental results validated that our previous network can effectively and efficiently distinguish the insulator strings from the background interferences in aerial images, in this work we propose a new network that is more accurate than the method [2].
Previous deep-learning-based models usually require input images of a fixed resolution during the training and testing process. Some researchers adopt cropping or warping operations to meet this requirement. However, these operations often reduce the recognition accuracy of the models. To tackle this challenge, the work of [55] proposes an embeddable SPP (Spatial Pyramid Pooling) model with multiple pooling sizes to enhance the performance of deep-learning-based models. Experimental results demonstrate that the performance of CNNs (Convolutional Neural Networks) can be improved by incorporating that SPP model. In addition, the experimental results in the work of [56,57] also validate the effectiveness of the SPP model. Motivated by these previous works, a novel three-structure SPP model is proposed in this work to create multi-scale local region features, as shown in Figure 2. The proposed SPP model can be divided into three parts. The first part is inspired by the work of [56], which uses a four-level spatial pyramid (1×1, 5×5, 9×9, 13×13) to max-pool the feature maps. The second part and the third part adopt two new combinations of max-pooling sizes to extract the feature maps, i.e., 1×1, 5×5, 7×7, 11×11 and 1×1, 3×3, 5×5, 7×7, respectively.
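The three pooling combinations can be illustrated with a small, dependency-free sketch. It assumes that each level is a stride-1 max-pooling with border clipping (so every pooled map keeps the input resolution) and that the pooled maps are stacked as extra channels; these choices, and the names `max_pool_same`, `spp`, and `SPP_KERNELS`, are illustrative assumptions rather than details confirmed by the text.

```python
# Kernel sizes of the three SPP parts as listed in the text.
SPP_KERNELS = {
    1: (1, 5, 9, 13),   # first detection branch
    2: (1, 5, 7, 11),   # second detection branch
    3: (1, 3, 5, 7),    # third detection branch
}


def max_pool_same(fmap, k):
    """Stride-1 max-pooling of a 2D feature map with a k x k window.

    The window is clipped at the borders, so the output has the same
    height and width as the input (assumption, see lead-in).
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    return [
        [
            max(
                fmap[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def spp(fmap, kernels):
    """Return one pooled map per kernel size (to be stacked as channels)."""
    return [max_pool_same(fmap, k) for k in kernels]
```

With a 1×1 kernel the pooled map equals the input, so the first level simply passes the original features through alongside the coarser ones.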
To ensure that the multi-scale local region features created in one detection branch can be used by the next detection branch, we integrate the proposed SPP model into our previous network between the 8th and 9th, the 19th and 20th, and the 30th and 31st convolutional layers, in front of the 1st, 2nd, and 3rd detection headers, respectively, to form the proposed network, as shown in Figure 3. The reasons that we propose the novel SPP model can be summarized as follows:
1. In the real world, the objects' sizes can be roughly divided into three categories, i.e., large size, middle size, and small size. Moreover, each category can be further sub-divided into different sizes. Therefore, it is quite important to use multi-scale local region features with different receptive fields in the same pipeline for insulator strings detection. Unfortunately, most of the existing insulator strings detection networks [58][59][60][61] neglect this important information.
2. With the increase of the network depth, the max-pooling filter size should be decreased to prevent the loss of the features of insulator strings. Therefore, the sizes of the last two max-pooling filters in the second branch are reduced, while the sizes of the last three max-pooling filters in the third branch are further reduced, as compared with the SPP structure used in the first branch.
The effectiveness of the proposed SPP model is verified in the experiment, as shown in Section 3. To further improve the performance of the network, the GIoU loss [62] is introduced in the training phase of the proposed network, which is given as follows:
$$IoU = \frac{|B_p \cap B_g|}{|B_p \cup B_g|}, \qquad GIoU = IoU - \frac{A_c - |B_p \cup B_g|}{A_c}, \qquad L_{GIoU} = 1 - GIoU \tag{1}$$
where $IoU$ measures the intersection over union between the predicted bounding box $B_p$ and the ground-truth box $B_g$, $A_c$ indicates the area of the smallest box containing the predicted bounding box and the ground-truth box, and $L_{GIoU}$ denotes the loss value of GIoU. The AP value of the proposed network can be further boosted by incorporating the GIoU during the training phase.
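As a concrete illustration of the GIoU loss [62], the following minimal sketch computes it for two axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the function name `giou_loss` are assumptions made for this example; the training code of the proposed network is not shown in the text.

```python
def giou_loss(pred, gt):
    """GIoU loss for two boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    # Intersection area (zero when the boxes do not overlap).
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih

    # Union area.
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_p + area_g - inter
    iou = inter / union

    # Area of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    area_c = cw * ch

    giou = iou - (area_c - union) / area_c
    return 1.0 - giou
```

Unlike a plain IoU loss, the value keeps changing when the boxes are disjoint (it grows as the enclosing box grows), which gives the optimizer a useful signal even before the prediction overlaps the ground truth.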

Insulator Missing Faults Detection
In this work, a cascaded model is adopted to detect the insulator missing faults. First of all, the proposed network that is described in Section 2.1 is adopted to locate the insulator string's position in an aerial image. After that, the located insulator string's position is set to be an RoI region. Finally, the YOLOv3-tiny network is employed to detect the insulator missing faults in the RoI region.
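The two-step flow can be sketched as follows. The two detectors are passed in as plain callables, since the networks themselves are outside the scope of this example; `detect_strings`, `detect_faults`, the nested-list image representation, and the integer pixel box format `(x1, y1, x2, y2)` are all hypothetical placeholders, not the authors' interfaces.

```python
def cascade_detect(image, detect_strings, detect_faults):
    """Run a two-stage cascade and return fault boxes in image coordinates.

    image          : 2D list of pixel rows (placeholder representation)
    detect_strings : callable(image) -> list of RoI boxes (x1, y1, x2, y2)
    detect_faults  : callable(roi)   -> list of boxes relative to the RoI
    """
    faults = []
    for (x1, y1, x2, y2) in detect_strings(image):
        # Crop the RoI so the second stage never sees the background.
        roi = [row[x1:x2] for row in image[y1:y2]]
        for (fx1, fy1, fx2, fy2) in detect_faults(roi):
            # Shift RoI-relative coordinates back into the full image.
            faults.append((fx1 + x1, fy1 + y1, fx2 + x1, fy2 + y1))
    return faults
```

Because the second stage only depends on the cropped region, it can be swapped for a different fault detector without touching the first stage.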
Although the insulator missing faults are not real objects, they still have a range of dimensions in the fixed-size images. To make the YOLOv3-tiny network easier to train, the k-means algorithm is adopted to obtain the bounding box priors of insulator missing faults; Figure 4 shows the results. Figure 4 shows that the average IoU increases only slowly beyond k = 6. Therefore, the number of bounding box priors of insulator missing faults is set to be 6 (IoU = 83.13%) in this work, and the corresponding bounding box priors are listed in ascending order of size: 30
Energies 2020, 13, 713
The reasons that we choose the two-step model (i.e., the proposed network in Section 2.1 combined with the YOLOv3-tiny network) to detect insulator missing faults can be summarized as follows:
1. The number of insulator missing faults aerial images is usually smaller than that of insulator strings aerial images, since insulator missing fault samples are scarce. If the insulator missing fault is regarded as a new class that has equal status with the insulator string when training the network, it will inevitably result in a class imbalance problem in some practical applications.
2. The YOLOv3-tiny network detects objects quickly. For an aerial image of 416×416 pixels, the YOLOv3-tiny network can process the image in real time, which means that it can be adopted for real-time applications.
3. Most on-UAV embedded vision processing units have limited memory capacity. The final weights of the YOLOv3-tiny network occupy only 33 MB, which is less than most one-stage deep learning networks.
4. Insulator missing fault is just one type of insulator string fault; there are also other faults, such as flashover and bird dung pollution. With the proposed two-step strategy, multi-type insulator faults detection can be easily realized by replacing the YOLOv3-tiny network with other specialized networks.
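The k-means selection of bounding box priors described in this subsection can be sketched as below. The sketch assumes the commonly used YOLO-style variant in which boxes are compared by the IoU of their widths and heights (as if they shared a corner); the text does not state the distance measure, so this choice, the random initialization, and the names `wh_iou` and `kmeans_priors` are assumptions.

```python
import random


def wh_iou(a, b):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_priors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs; return (priors by area, average IoU)."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        # Assign every box to the center it overlaps most.
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda j: wh_iou(b, centers[j]))
            clusters[best].append(b)
        # Move each center to the mean width/height of its members.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort(key=lambda c: c[0] * c[1])
    avg_iou = sum(max(wh_iou(b, c) for c in centers) for b in boxes) / len(boxes)
    return centers, avg_iou
```

Running `kmeans_priors` for increasing `k` and plotting the returned average IoU gives a curve of the kind used to choose the number of priors: the value of `k` is picked where the curve starts to flatten.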

Dataset Preparation
In this work, the proposed network in Section 2.1 is trained on the 'InST_detection' dataset for a fair comparison with the work of Han [2]. The 'InST_detection' dataset contains 4031 insulator string aerial images, of which 2675 images are assigned to the training set, while the other 1356 images make up the testing set [2]. All of the images are resized to 416×416 pixels.
Since insulator missing faults aerial images are scarce, we follow the idea proposed in our previous work [2] to create simulated insulator missing faults samples using Photoshop software [63]. Our dataset is named 'IMF-detection', and its details are shown in Table 1. The 'IMF-detection' dataset contains 764 images, of which 573 images are randomly selected as the training set, while the other 191 images are assigned to the testing set. Moreover, there are a total of 1194 insulator missing faults in the 'IMF-detection' dataset. The images in the training set are resized to 416×416 pixels, while all of the images for testing are 800×530 pixels. The labelImg tool [64] is employed to label the missing fault positions to obtain the ground truth of insulator missing faults. In particular, motivated by the work of [48], each labeled missing fault position contains not only the missing position, but also the two adjacent insulator pieces, as shown in Figure 5.

Experimental Results and Discussion
In this work, the experiments are implemented in the Darknet framework [54], and they were conducted on a PC with an Intel(R) Core(TM) i7-8700K 3.7 GHz CPU, 32 GB of RAM, and an NVIDIA GeForce TITAN XP (12 GB). In addition, it should be noted that parts of the source code of our previous work are run in the Visual Studio environment.

Performances of the Insulator Strings Detection
To verify the effectiveness and the robustness of the proposed network, we compare it with five one-stage networks, i.e., YOLOv3, YOLOv3-tiny, YOLOv2, ResnetV2, and method [2]. For fair comparisons, all six networks are trained and then tested on the 'InST_detection' dataset. The proposed network employs the ResNet50 network pre-trained on the ImageNet dataset as the backbone, while the parameters of the other layers are randomly initialized. During the training phase, the maximum number of iterations is set to 35,000, and the learning rates are initialized to 0.001. Moreover, the learning rates decrease by a factor of 90% after the 20,000th and 28,000th iterations. To improve the generalization ability of the networks, we apply the data augmentation method used in [2,53], in which the hue, saturation, and exposure of the images are randomly changed during the training phase of both the proposed network and the five compared networks.
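The step schedule of the learning rate can be written down explicitly. The sketch below assumes that "decrease by a factor of 90%" means the rate is multiplied by 0.1 at each step and that the change takes effect once the iteration count reaches the step; both are interpretations, and the function name `learning_rate` and the `scale` parameter are illustrative so that the other reading (multiplying by 0.9) can be tried by changing one argument.

```python
def learning_rate(iteration, base_lr=0.001, steps=(20000, 28000), scale=0.1):
    """Piecewise-constant learning rate for a given training iteration."""
    lr = base_lr
    for step in steps:
        if iteration >= step:
            lr *= scale  # assumed: a 90% decrease leaves 10% of the rate
    return lr
```

Under these assumptions the rate is 0.001 up to iteration 20,000, 0.0001 up to iteration 28,000, and 0.00001 for the remaining iterations up to 35,000.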
The AP (Average Precision) value is introduced in the experiment to measure the performances of the six networks. The higher the AP value, the more accurate the detection result. Specifically, the Precision value and Recall value can be calculated as follows:
$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}, \qquad \text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{2}$$
In formula (2), True Positive gives the number of insulator strings that have been correctly identified. False Positive denotes the number of background interferences that are identified as insulator strings, while False Negative indicates the number of insulator strings that are regarded as background interferences. Figure 6a shows the Precision-Recall curves for the six networks tested on the testing set of the 'InST_detection' dataset, and the AP value for each network is listed at the bottom left of Figure 6a. The X-axis and the Y-axis indicate the recall value and the precision value, respectively. Figure 6a shows that the AP value of the proposed network (90.31%) is higher than that of YOLOv3 (90.05%), YOLOv3-tiny (52.78%), YOLOv2 (89.83%), ResnetV2 (85.92%), and method [2] (89.96%), which means that the proposed method is more accurate than the five compared methods in insulator strings detection. In particular, the AP value of the proposed network is higher than that of our previous work, which indicates that our improvement strategy is effective.
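For readers who want to reproduce this kind of evaluation, the following sketch computes Precision, Recall, and an AP value from scored detections. The text does not say which AP variant is used, so the all-point interpolation below (area under the precision-recall curve with a monotone precision envelope) is an assumption, as are the function names and the `(confidence, is_true_positive)` input format.

```python
def precision_recall(tp, fp, fn):
    """Precision and Recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scored, num_gt):
    """AP from a list of (confidence, is_true_positive) detections.

    num_gt is the number of ground-truth objects in the test set.
    """
    ranked = sorted(scored, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) after each detection
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))

    ap, prev_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(points):
        # Interpolated precision: best precision at this recall or higher.
        p_interp = max(p for _, p in points[i:])
        ap += (recall - prev_recall) * p_interp
        prev_recall = recall
    return ap
```

Sweeping the confidence threshold from high to low is what traces out a Precision-Recall curve; the AP is the area under that curve.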
To further validate the contributions of the proposed SPP model and of the GIoU introduced in the training phase of the proposed network, we design two other networks, Model A and Model B, and test them on the testing set of the 'InST_detection' dataset; the results are shown in Figure 6b. Model A integrates the same SPP model, i.e., 1×1, 5×5, 9×9, 13×13, in all three detection branches of our previous network, and it also adopts the GIoU during the training phase. Model B adopts the proposed SPP model, i.e., 1×1, 5×5, 9×9, 13×13; 1×1, 5×5, 7×7, 11×11; and 1×1, 3×3, 5×5, 7×7, to improve our previous network, but it does not employ the GIoU in the training phase. As shown in Figure 6b, the proposed network achieves an AP value of 90.31%, which is higher than that of Model A (90.16%). This means that the proposed SPP model can effectively improve the performance of our previous network. Model B achieves an AP value of 90.24%, which is lower than that of the proposed network. This means that, by integrating the GIoU strategy, the network recognizes the insulator strings more accurately.
Some typical images are selected and divided into four common conditions to further visualize the performance of the proposed network, as shown in Figure 7. Specifically, Figure 7a shows two conditions that are caused by the filming angles; the left image shows an aerial image with a clear view of the insulator strings, while the insulator string in the right one is truncated. Figure 7b exhibits two aerial images with different lighting conditions; the insulator strings in the left image appear white under the interference of strong light, while the right image is filmed under common lighting conditions. In Figure 7c, the background of the left image is a lake, which is simpler than the background of the right image, which contains a lot of leaves that are similar to the insulator pieces. Figure 7d gives two aerial images with different sizes of insulator strings. In particular, the insulator strings in the right image are much smaller than the insulator strings in the other seven images. Figure 7 shows that the proposed network achieves good performance in all four common conditions of UAV-based transmission line inspection, which means that it can provide accurate insulator string positions for the subsequent insulator missing faults detection.

Performances of the Insulator Missing Faults Detection
The average precision rate and the average recall rate are adopted in this section to validate the performances of the proposed method on the testing set of the 'IMF-detection' dataset, as shown in formula (3).
$$\bar{P} = \frac{1}{N}\sum_{i=1}^{N} P_i, \qquad \bar{R} = \frac{1}{N}\sum_{i=1}^{N} R_i \tag{3}$$
Specifically, $P_i$ and $R_i$ denote the precision rate and the recall rate on the detection of the ith image, respectively, and $i \in [1, N]$ is the image number, where $N$ is the number of images in the testing set. $\bar{P}$ indicates the average precision rate, while $\bar{R}$ gives the average recall rate on the testing set of the 'IMF-detection' dataset. We follow the rounding principle to keep only one digit after the decimal point.
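The per-image averaging can be sketched in a few lines. The input format (a list of `(precision, recall)` pairs in the range 0 to 1, one per test image) and the function name `average_rates` are assumptions; Python's built-in `round` is used as a stand-in for the rounding principle mentioned in the text.

```python
def average_rates(per_image):
    """Average per-image precision and recall, returned as percentages.

    per_image: list of (precision, recall) pairs, one pair per test image.
    """
    n = len(per_image)
    avg_precision = sum(p for p, _ in per_image) / n
    avg_recall = sum(r for _, r in per_image) / n
    # Keep one digit after the decimal point.
    return round(100 * avg_precision, 1), round(100 * avg_recall, 1)
```

Note that this averages the rates image by image, so every image contributes equally regardless of how many missing faults it contains.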
To verify the accuracy and robustness of the proposed method, we compare it with method [2], which is considered to be an excellent method that can detect both insulator missing one fault and insulator missing multi-fault in aerial images. For a fair comparison, the two methods are tested on the testing set of the 'IMF-detection' dataset. Figure 8 shows the average precision rates and the average recall rates of the two methods. As shown in Figure 8, the average precision rate of the method [2] is 64.4%, which is 27.7 percentage points lower than that of the proposed method (92.1%). This indicates that a large number of false positive samples of insulator missing faults are generated when using the method [2] for missing faults detection, while the proposed method generates fewer false positive samples. Moreover, the average recall rate of the method [2] is 59.8%, which is 32.4 percentage points lower than that of the proposed method (92.2%). This means that most of the insulator missing faults are successfully detected by the proposed method, while the method [2] fails to detect some insulator missing faults. To further validate the robustness of the networks, the testing set of the 'IMF-detection' dataset is divided into two parts: one contains only insulator missing one fault images (98 images), while the other contains only insulator missing multi-fault images (93 images). Figure 9 and Figure 10 show the experimental results of one-fault detection and multi-fault detection, respectively. Figures 9 and 10 show that both the average precision rate and the average recall rate of the proposed method are higher than those of the method [2]. Therefore, it can be concluded that the proposed method outperforms the method [2] on the 'IMF-detection' dataset.
Some typical images with different aerial scenes are selected to evaluate the performance of the proposed method on a range of aerial scenes, as shown in Figure 11. Specifically, some of the images contain insulator missing one fault, while the others contain missing multi-fault. During transmission line inspection, the UAVs usually film the insulator strings under proper lighting conditions to make it easier to observe whether there are missing faults in insulator strings. However, the quality of some collected images is not as good as expected due to the limitations of the flying surroundings, as shown in the first row of Figure 11a. Figure 11a contains two images, of which one is filmed under strong lighting interference, while the other one is taken under proper lighting conditions. Although it is possible for human eyes to find the missing fault under strong lighting interference, it is still difficult for vision-based algorithms to accurately detect the missing fault due to the loss of the special features of the insulator missing fault. As shown in Figure 11a, although the method [2] can locate the missing fault in the image with proper lighting conditions, it gives the wrong position of the insulator missing fault under strong light interference, as expected. In contrast, the proposed method can accurately detect the missing faults in both images in Figure 11a with different lighting conditions. Similarly, the insulator pieces in the aerial images sometimes overlap due to the limitations of filming angles and distances. For instance, the first row of Figure 11b exhibits an image with a clear view of each insulator piece, while the second row of Figure 11b gives an image in which the insulator pieces overlap. Most importantly, the missing fault positions in the second image of Figure 11b are truncated by the insulator pieces.
Although this situation is even difficult for the human eyes to capture the missing faults, the proposed method can still give accurate missing faults' positions. On the contrary, the method [2] can only detect one of the missing faults while missing the other one. Figure 11c shows the conditions that made most of the previous works suffer poor performances, i.e., the complex interferences in the backgrounds of aerial images. The background of the first image of Figure 11c is a lake, which is more simple than that of most of the aerial images. However, although the method [2] recognizes the missing fault, the detected fault position deviates from the real position. The background of the second image of Figure 11c contains lots of grass patches, which are similar with insulator pieces; the detection result of the method [2] reports this insulator string works normally without insulator missing fault, while the proposed method distinguishes the missing fault from the complex background. Figure 11d shows two images that contain insulator missing multi-fault. All of the insulator pieces in the first image are of the same type, while there are five insulator pieces types in the second image that are different from that of the other insulator pieces. Based on the observations of Figure 11d, the method [2] can effectively identify all of the missing faults in the first images, while it regards the positions of the five special insulator pieces as the insulator missing faults' positions. In contrast, the proposed method achieves good performances in both two images in Figure  11d. We think that the possible reasons for the good performances of the proposed method can be concluded, as follows: (1) The proposed cascaded model contains two parts, in which the proposed network in the first part can accurately locate the insulator strings, while the complex background interferences are removed. 
Therefore, the accuracy of the subsequent missing fault detection is potentially improved. (2) The second part of the proposed cascaded model adopts a CNN-based module as the missing fault detection strategy, while method [2] still adopts traditional algorithms as the key strategy in the fault detection task [65]. In comparison with traditional algorithms, the CNN-based model can directly capture the high-level representation of missing faults. In summary, it can be concluded that the proposed method is more accurate and robust than method [2] in the task of insulator missing fault detection.
Figure 11. Experiments in different aerial scenes. (c) First-row image: simple background; second-row image: complex background. (d) First-row image: insulator pieces are of the same type; second-row image: insulator pieces are not of the same type. In each sub-image, the bounding box detected by method [2] is marked in green, while that of the proposed method is marked in pink.
Finally, for a method to be applicable to on-line detection of insulator missing faults, it should be robust under diverse aerial scenes and run in real time. Therefore, the real-time performance of the proposed method is also evaluated in this section, as shown in Figure 12 (values are rounded). According to Figure 12, the average running time of the proposed method is about 30 ms, which is faster than that of method [2] (130 ms), method [3] (530 ms), and method [8] (680 ms). Most importantly, the proposed method can achieve real-time detection, which means that it has the potential for real-time applications in insulator missing fault detection.
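To put these running times in perspective, the sketch below converts the reported per-image latencies into approximate frame rates and relative speedups, and shows one common way to measure an average latency. The helper names and the timing procedure are assumptions made for illustration; the text does not describe how the running times in Figure 12 were measured.

```python
import time


def average_latency_ms(fn, inputs, warmup=1):
    """Average wall-clock time of fn per input, in milliseconds.
    A few warm-up calls are made first and excluded from the timing."""
    for x in inputs[:warmup]:
        fn(x)
    start = time.perf_counter()
    for x in inputs:
        fn(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(inputs)


def frames_per_second(latency_ms):
    """Frame rate implied by a per-image latency in milliseconds."""
    return 1000.0 / latency_ms


# Average running times quoted in the text (ms per image).
reported_ms = {"proposed": 30, "method [2]": 130,
               "method [3]": 530, "method [8]": 680}

fps = {name: frames_per_second(ms) for name, ms in reported_ms.items()}
speedup = {name: ms / reported_ms["proposed"]
           for name, ms in reported_ms.items()}

for name in reported_ms:
    print(f"{name}: {fps[name]:.1f} FPS, {speedup[name]:.1f}x the proposed latency")
```

Under these figures, a latency of about 30 ms corresponds to roughly 33 images per second, whereas 130 ms corresponds to fewer than 8.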

Conclusions and Future Works
In this paper, a novel cascaded model is proposed to detect insulator missing faults in aerial images. First of all, we collect and create a novel dataset, 'IMF-detection'. This dataset contains 764 aerial images of insulator missing faults that are filmed in different aerial scenes by UAVs. After that, a new spatial pyramid pooling module is introduced into our previously proposed network to improve the accuracy of insulator string detection. Based on the above steps, the detected region of the insulator string is set to be an RoI, and then the YOLO-v3 tiny network is adopted to detect the insulator missing faults in the RoI. The experimental results verify that the proposed model achieves better performance than our previous method on both the 'InST_detection' and 'IMF-detection' datasets. Most importantly, the average running time of the proposed cascaded model is approximately 30 ms. Therefore, it can be concluded that the proposed method has the potential to be used in real-time applications for insulator missing fault detection.
Although the proposed cascaded model shows good effectiveness and efficiency in this work, automatic detection of insulator missing faults remains a challenging task due to the variety of insulator string types in high-voltage transmission lines. For example, Figure 13 shows two hard samples, in which the insulator strings are vertical. The proposed model failed to detect the missing faults in these images, since images containing such insulator strings are rare in our dataset. Therefore, future work needs to collect more images of insulator missing faults to improve the generalization of the proposed cascaded model. In addition, the other typical faults in insulator strings, such as missing, flashover, and contamination, should also be researched in future works.
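The two-stage flow summarized above (locate the insulator string, crop the RoI, detect faults inside it) can be sketched as follows. This is only a structural illustration: `locate_string` and `detect_faults` are hypothetical stand-ins for the two trained networks, the image is a plain nested list, and the step that maps fault boxes back to full-image coordinates is an assumption about how RoI detections would be reported.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels
Image = List[List[int]]           # rows of pixel values (toy representation)


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by box out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def cascaded_detect(image: Image,
                    locate_string: Callable[[Image], Optional[Box]],
                    detect_faults: Callable[[Image], List[Box]]) -> List[Box]:
    """Stage 1 finds the insulator string RoI; stage 2 looks for missing
    faults only inside that RoI. Fault boxes are returned in the
    coordinate frame of the full image."""
    roi = locate_string(image)
    if roi is None:               # no insulator string found
        return []
    x0, y0, _, _ = roi
    patch = crop(image, roi)
    return [(bx0 + x0, by0 + y0, bx1 + x0, by1 + y0)
            for (bx0, by0, bx1, by1) in detect_faults(patch)]


# Stub detectors with fixed outputs, standing in for the real networks.
def stub_locator(img: Image) -> Optional[Box]:
    return (2, 1, 8, 5)


def stub_fault_detector(patch: Image) -> List[Box]:
    return [(1, 1, 3, 2)]


toy_image = [[0] * 10 for _ in range(6)]   # 6 rows x 10 columns
faults = cascaded_detect(toy_image, stub_locator, stub_fault_detector)
print(faults)
```

Restricting the second stage to the cropped RoI is what removes most of the background from the fault detector's input, which is the design motivation stated for the cascade.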