In computer vision, the family of methods and algorithms developed for object recognition and classification has grown steadily over the last decades. Moreover, recent achievements in machine learning and artificial intelligence have had a significant impact on the development of new tools and real-time applications. As a result, the recent literature provides several results of machine learning techniques applied to the detection, classification, and segmentation of electric towers. In [11], Sampedro et al. addressed the detection and classification tasks using a supervised machine learning approach: HOG features are used to train two multi-layer perceptron neural networks, one to predict whether the region inside an image contains a tower (or not), and the other to distinguish the tower type on the basis of a training dataset.
In this work, we focus on insulators, components of the electrical equipment that are essential to ensure effective current conduction. Indeed, several researchers have devoted considerable effort to developing methods to detect insulators and classify them as damaged (an example is shown in Figure 1A,B) or normal. In [12], the author proposes an automatic method to detect insulators in images, based on image segmentation and template matching (using normalized cross-correlation). In [13], aerial images of power lines are used to test an insulator detection method based on the fusion of HOG and LBP features, reaching a correct detection rate of 89.1%. Zhao et al., in [14], documented an extensive study of insulator detection and localization. At first, the authors proposed a method combining orientation angle detection with binary shape prior knowledge, and tested it on complex aerial images, achieving real-time recognition of multiple insulators. More recently, Zhao et al. moved to a machine learning approach based on multi-patch CNN feature extraction and classification, not only to localize insulators but also to diagnose their condition (defective or normal). The results show excellent accuracy, but the authors also note that the multi-patch strategy increases the computational burden. Liu et al. in [17] also proposed a deep-learning-based method for insulator localization, achieving a true positive detection rate of 90.9%.
The inspection of electrical components from an optical video stream is a standard computer vision task, consisting of object detection and classification, generally addressed in two steps, as shown in Figure 3. First, a region proposal algorithm produces a set of candidate regions where the object is likely to be present. Then, the sub-image corresponding to each candidate region is handed to a classifier, which outputs a score for the presence of the object within the region. A key issue of this approach is finding the trade-off between the number of candidate regions produced and the computing time. The first step may be performed using a selective search approach (as in super-pixel-based methods [18]) or using a sliding-window approach at multiple sizes and scales (e.g., [19]); a minimal sketch of the latter is given below. It has been observed that the region proposal step represents an important computational burden and is, indeed, the bottleneck towards fast detection.
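To make the cost of exhaustive candidate generation concrete, the following minimal Python sketch (illustrative only, not part of the implementation described here; window sizes and stride are arbitrary assumptions) enumerates sliding-window candidates at multiple scales:

```python
# Minimal sketch of a classical sliding-window region proposal strategy.
# Window sizes and stride below are illustrative assumptions.
def sliding_window_proposals(image_height, image_width,
                             window_sizes=((64, 64), (128, 128), (256, 256)),
                             stride=32):
    """Enumerate candidate regions (x, y, w, h) at multiple scales."""
    proposals = []
    for win_h, win_w in window_sizes:
        for y in range(0, image_height - win_h + 1, stride):
            for x in range(0, image_width - win_w + 1, stride):
                proposals.append((x, y, win_w, win_h))
    return proposals

# Example: a single 1080p frame already yields thousands of candidates,
# each of which would have to be scored by the classifier.
print(len(sliding_window_proposals(1080, 1920)))
```

Each candidate must be passed through the classifier, which is why this exhaustive strategy becomes the bottleneck as the number of scales and positions grows.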
In our approach, in order to obtain an application able to provide automatic and fast classification of insulators, we resort to a particular neural network architecture, the Faster Region-based Convolutional Neural Network (Faster R-CNN) [20], in which an original region proposal network shares features with the detection network, improving both region proposal quality and object detection accuracy. Faster R-CNN uses two networks: a deep fully convolutional network that proposes regions (the Region Proposal Network, RPN) and another module that classifies the proposed regions (the classification network). The RPN produces region proposals more quickly than the selective search algorithm [22] used in previous solutions. By sharing information between the two networks, accuracy is also improved, and this solution currently achieves the best results in the latest object detection competitions. More precisely, the two sub-networks share their first layers, which act globally as a deep feature extraction module. Several architectures can be used to build such a feature extraction module; in this paper, the Inception ResNet v2 model was selected and instantiated for this particular application using TensorFlow [23].
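As an illustration of this shared-feature design, the following hedged sketch builds a feature extraction module and attaches an RPN-style head to it. The hyperparameters, the per-anchor sigmoid objectness (the original Faster R-CNN uses a two-class softmax), and the use of Keras' stock InceptionResNetV2 are assumptions for exposition; the actual application relied on the TensorFlow Object Detection API implementation.

```python
import tensorflow as tf

# Hedged sketch of the shared-feature design; values are illustrative
# assumptions, not the exact configuration used in this work.
backbone = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=(None, None, 3))

num_anchors = 9  # anchor boxes per feature-map position (assumption)
features = backbone.output  # shared by the RPN and the classifier

# RPN head: a 3x3 "sliding window" convolution over the feature map,
# followed by two 1x1 convolutions producing, per anchor, an objectness
# score and four bounding-box refinement deltas.
x = tf.keras.layers.Conv2D(512, 3, padding="same", activation="relu")(features)
objectness = tf.keras.layers.Conv2D(num_anchors, 1, activation="sigmoid")(x)
box_deltas = tf.keras.layers.Conv2D(4 * num_anchors, 1)(x)

rpn = tf.keras.Model(backbone.input, [objectness, box_deltas], name="rpn")
```

Because both sub-networks read the same feature tensor, the expensive backbone computation is performed only once per image, which is the key to the speed-up over earlier two-step pipelines.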
Transfer learning was used to cope with the limited dataset of images, which is not sufficient for training from scratch. Namely, an inference graph for Inception ResNet v2 pre-trained on the COCO dataset [24] was imported.
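A minimal sketch of this import step is shown below, assuming the TF1-era frozen-graph workflow used by the Object Detection API at the time; the model directory and file name are hypothetical placeholders:

```python
import tensorflow as tf

# Hedged sketch: importing a frozen inference graph pre-trained on COCO.
# The path below is a hypothetical placeholder, not the actual file used.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile(
        "pretrained/faster_rcnn_inception_resnet_v2_coco/"
        "frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Import the pre-trained graph into a fresh TensorFlow graph; its weights
# then serve as the starting point for transfer learning.
detection_graph = tf.Graph()
with detection_graph.as_default():
    tf.compat.v1.import_graph_def(graph_def, name="")
```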
On the basis of the extracted features, the RPN produces candidate regions that might contain objects of interest. Namely, by sliding a small window on the feature map, the RPN produces, for region boxes of fixed aspect ratios and scales, the probability that an object is present in that region; a bounding box regressor also provides the optimal size and position of each candidate rectangular area in an intrinsic coordinate system. Candidates with a high probability of object presence are then passed to the classification network, which is in charge of assessing the presence of an object category inside the region. As a training strategy, firstly, only the final fully connected layers of the two sub-networks were trained, leaving all the other layers frozen; in a fine-tuning phase, the layers in the feature extraction module were also optimized, using the training routines made available in TensorFlow (a sketch of this two-phase strategy is given below). This architecture thus achieves region proposal (RP) and object detection at the same time. Details about the built model, the results of the experimentation, and a description of the deployed application are reported in Section 3.
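The following Keras-style sketch illustrates the two-phase strategy under stated assumptions: the `backbone`/`model` split, the loss, the learning rates, and the epoch counts are hypothetical stand-ins, since the actual training used the Object Detection API routines rather than this code.

```python
import tensorflow as tf

# Hedged sketch of the two-phase training strategy; `backbone` is the
# shared feature extraction module and `model` the full detector, both
# assumed to be Keras models (hypothetical stand-ins for the actual
# Object Detection API training pipeline).
def two_phase_training(backbone, model, train_dataset):
    # Phase 1: freeze the feature extraction module and train only the
    # final layers of the two sub-networks.
    backbone.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="binary_crossentropy")
    model.fit(train_dataset, epochs=10)

    # Phase 2 (fine-tuning): unfreeze the feature extractor and optimize
    # the whole network with a lower learning rate.
    backbone.trainable = True
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                  loss="binary_crossentropy")
    model.fit(train_dataset, epochs=5)
```

Freezing the pre-trained layers first lets the randomly initialized final layers converge without disturbing the transferred features; the subsequent low-learning-rate fine-tuning then adapts the whole network to the insulator domain.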