Transmission Line Obstacle Detection Based on Structural Constraint and Feature Fusion

Abstract: Accurate detection and identification of obstacles plays an important role in the navigation and behavior planning of the patrol robot. Aiming at a patrol robot with symmetrically mounted cameras, an obstacle detection method based on structural constraint and feature fusion is proposed. Firstly, in order to discover the regions of interest, the bounding box algorithm is used to propose regions. The location of the detected ground wire is used to constrain these regions, and the image block of interest is clipped. Secondly, in order to accurately represent multi-view and multi-scale obstacle images, the global shape features and the improved local corner features are fused with different weights. Then, a particle swarm-optimized support vector machine (PSO-SVM) is used for classifying and recognizing obstacles. On block data set B, which contains multi-view and multi-scale obstacle images, the recognition rate of this method reaches up to 86.2%, which shows the effectiveness of the weighted fusion of global and local features. On data set A, which contains complete images taken at different distances, the detection success rate for long-distance obstacles reaches 80.2%. The validity of the proposed method based on structural constraints and feature fusion is thus verified.


Introduction
The study of cable inspection robots capable of long-term automatic operation has attracted great attention [1][2][3][4][5]. To achieve long-term operation, robots need to cross various types of obstacles. Therefore, real-time obstacle detection and recognition plays a crucial role in robot navigation and behavior planning. Our previous research mainly includes image quality enhancement for field inspection [6] and accurate detection and positioning of ground wires [7]. This paper focuses on the detection and identification of obstacles.
The obstacle identification methods can be classified into two categories: sensor-based and vision-based methods. Chen Zhongwei [8] proposed an electromagnetic sensor navigation method for a high-voltage line patrol robot, in which three sets of electromagnetic sensor probes were mounted on each robot arm to form a probe array for identifying the high-voltage conductor and obstacles. Cao Heng [9] proposed an obstacle detection and positioning method for the autonomous inspection robot of high-voltage transmission lines, in which GPS was used for long-distance measurement and multiple electromagnetic sensors were used to identify obstacles at short distance. Richard and Pouliot [10,11] introduced a LIDAR system, the UTM-30LX, to detect the ground wire and obstacles by analyzing distance, diameter, and signal intensity. This sensor has the advantages of compact structure, light weight, convenient installation, simple software processing, and low cost. However, it can only detect obstacles on the wire and measure distance; it cannot identify the categories of obstacles.
Vision-based methods were proposed because the visible-light camera can obtain more comprehensive information. They are divided into two categories: structural constraint-based methods and machine learning-based methods. Li Minmin [12] designed an obstacle identification method based on structural constraints and, on this basis, proposed a binocular vision positioning method. Zhang Yunchu [13] quickly and reliably extracted graphic primitives such as lines, circles, and ellipses from edge images and identified obstacles from the detected primitives under structural constraints. Hu Caishi [14], building on Zhang Yunchu's work, improved the Canny operator and used the OTSU algorithm to extract image edges and reduce the impact of illumination changes. Tan Lei [15] proposed a similar method. Zhenhui Li [16] identified the damper by analyzing the vertical projection of the image and used it to guide the motion planning of the robot. These methods are fast and can meet real-time requirements, and they are effective for specific perspectives and clear target contours. However, when the contrast of obstacle images is low and the background is complex, it may be difficult to extract the graphic primitives, resulting in detection failure.
Another approach is based on machine learning. Miao Siyi [17] extracted wavelet moment features of the image and trained an obstacle classifier with a wavelet neural network. Cao Wenming [18] extracted joint invariant moment features of edge images and used a wavelet neural network for classification; to improve the recognition rate, Cao Wenming [19] also proposed a method based on wavelet moments and a support vector machine. Tang Hongwei [20] applied the particle swarm optimization algorithm to optimize the wavelet neural network and improve the accuracy of obstacle classification. Similar studies include the obstacle identification methods proposed by Shen Chunsheng [21] and Wang Yuping [22], which differ in the features and classification models used. Cheng Li [23] proposed a Hu moment plus KNN method for obstacle identification, followed by a monocular visual ranging model. Liu Ao [24] proposed a method that extracts features and classifies them with a convolutional neural network. These methods are suitable for scenes with a fixed camera angle and close distance to the obstacles, but they may fail to deal with multi-angle and multi-scale obstacles at different distances.
The robot used in this research is shown in Figure 1. It runs on an improved ground wire; only the dampers and suspension clamps are connected to the ground wire, and these are the obstacles for the robot. Two cameras are symmetrically installed on both sides of the robot for obstacle detection. The images are shot symmetrically by these cameras, thus producing different shooting angles. To solve the multi-angle and multi-scale problems, a visual detection method is proposed by combining the structural constraint method with the machine learning method. Firstly, the location of the ground wire is quickly detected based on the EDLines algorithm [25], and then candidate bounding boxes are extracted by the bounding box algorithm [26]. The location of the ground wire is used to constrain the positions of the bounding boxes to propose obstacle regions and extract obstacle image blocks. Finally, the global and local features are fused, and a multi-class particle swarm-optimized support vector machine (PSO-SVM) model is used to identify the obstacles.

(Figure 1: symmetrically installed cameras.)

Symmetry 2020, 12, 452

The remaining chapters are arranged as follows: Section 2 describes the proposed method in detail, including ground wire detection, region proposal, feature extraction and fusion, and obstacle classification. Section 3 describes the experiments and results of detection and recognition, and Section 4 summarizes our work and discusses future research directions.

Visual Obstacle Detection Method
The specific flow of the method proposed in this paper is shown in Figure 1. It mainly includes two parts: the proposal of obstacle regions and their classification and identification.

Region Proposal of Obstacles
The region proposal is to extract the rectangular bounding box containing obstacles from the image. All obstacles are attached to the ground wire, and the location of the ground wire constrains the area where the obstacles may appear in the image, as shown in Figure 2.

Ground Wire Detection
The ground wire has obvious double-edge features from the perspective of the robot's head camera, so ground wire detection can be transformed into the identification of two straight line segments. Since there are many lines in the image, two of them must be selected as the double edge lines, using prior information, to form the center line of the ground wire. The procedure can be divided into the following steps:
Step 1: The EDLines algorithm [25] is used to extract straight lines. It requires no manually set parameters and has good real-time performance. On the basis of the edge map generated by Edge Drawing (ED) [27], lines are extracted by least-squares fitting. Finally, short lines are eliminated to confirm the line segments, instead of applying the time-consuming Helmholtz principle.
Step 2: Determine the double edge lines of the ground wire. The detected line segment set is represented by S = {L_1, L_2, ..., L_n}, where n is the total number of line segments. The i-th line segment is defined as L_i = (x_i0, y_i0, x_i1, y_i1, l, θ), where the first four parameters are the coordinates of the two endpoints of the segment, l is the length of the segment, and θ is the angle formed with the horizontal axis in the image coordinate system.
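The segment representation L_i = (x_i0, y_i0, x_i1, y_i1, l, θ) can be sketched in Python. The class and function names below are our own illustrative choices, not from the paper, and deriving the angle with atan2 is one plausible convention:

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    """A detected line segment L_i = (x0, y0, x1, y1, l, theta)."""
    x0: float
    y0: float
    x1: float
    y1: float
    l: float       # segment length
    theta: float   # angle with the horizontal image axis, in degrees

def from_endpoints(x0, y0, x1, y1):
    """Derive length l and angle theta from the two endpoint coordinates."""
    l = math.hypot(x1 - x0, y1 - y0)
    theta = math.degrees(math.atan2(y1 - y0, x1 - x0))
    return Segment(x0, y0, x1, y1, l, theta)

seg = from_endpoints(0, 0, 100, 100)  # a 45-degree segment
```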
Two kinds of prior knowledge can be used to assist the detection of the ground wire:
Prior knowledge 1: According to the line model and the camera imaging model, the angle range of the ground wire in the image can be calculated.
Prior knowledge 2: The two edge segments of the ground wire are the longest of all the segments and are nearly parallel.
According to prior knowledge 1, the angle range of the ground wire in the current image is determined, and straight line segments whose angles fall outside this range are eliminated. If the number of remaining segments is less than or equal to 1, ground wire detection fails. If the number of remaining segments is greater than 2, the longest two lines L1 and L2 are selected. If the condition

abs(abs(θ1) − abs(θ2)) ≤ 5

is met, the two segments are considered to be the two edge lines of the ground wire, where θ1 and θ2 are the angles (in degrees) of L1 and L2 with the horizontal axis of the image, respectively. The detection results for three representative images are shown in Figure 3, where the red lines are the two edge lines of the detected ground wire.
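The edge-pair selection above can be sketched as follows. Segments are simplified to (length, angle) pairs; the 5° tolerance comes from the condition in the text, while the angle range would come from prior knowledge 1:

```python
def select_edge_pair(segments, theta_min, theta_max, tol_deg=5.0):
    """Pick the ground wire's two edge lines from detected segments.

    segments: list of (length, theta_deg) tuples; [theta_min, theta_max]
    is the angle range from prior knowledge 1; tol_deg is the
    near-parallel tolerance abs(abs(t1) - abs(t2)) <= 5 from the text.
    Returns the two edge segments, or None if detection fails.
    """
    # Prior knowledge 1: drop segments whose angle is out of range.
    kept = [s for s in segments if theta_min <= s[1] <= theta_max]
    if len(kept) <= 1:
        return None  # ground wire detection fails
    # Prior knowledge 2: take the two longest remaining segments.
    kept.sort(key=lambda s: -s[0])
    s1, s2 = kept[0], kept[1]
    if abs(abs(s1[1]) - abs(s2[1])) <= tol_deg:
        return s1, s2  # the two edge lines of the ground wire
    return None

pair = select_edge_pair([(120, 30), (115, 33), (40, 31), (80, 85)], 20, 60)
```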

Obstacle Region Proposal Based on Structural Constraints
The bounding box algorithm uses a sliding window over the whole image to build bounding boxes, producing a large number of boxes and limiting real-time performance. According to the characteristics of the obstacles and their position relative to the ground wire, the following constraints are proposed: (1) as the obstacles are attached to the ground wire, the position of the ground wire is used to constrain the search area; (2) the size of the bounding box is limited by the actual shape and size of the obstacles in the image. These two constraints significantly reduce the number of bounding boxes. The specific steps are as follows:
Step 1: Generating the bounding box. The detected ground wire is shown in Figure 3. Suppose the intersection points between the two edge lines of the ground wire and the lower edge of the image are p1 and p2, with distance L_1 between them, and the intersection points with the horizontal axis of the image are p3 and p4, with distance L_2 between them. The possible bounding box region S is shown in Figure 4.
The length of the intersection of S with the bottom edge of the image is L_1l = L_1r = k·L_1, and with the horizontal axis of the image it is L_2l = L_2r = k·L_2. Statistical analysis shows that S contains all obstacles when k = 7. A generated bounding box can be expressed as b_i = (x_i, y_i, w, h, h_b), where the position (x_i, y_i) ∈ S; (w, h) are the width and height, respectively, with h/w ∈ [1/3, 3] obtained by statistical analysis; and h_b is the score of the bounding box, indicating the probability of containing the target. When bounding box A is determined, the next bounding box B is generated by changing the position of the center point and the aspect ratio. If the overlap rate of A and B is greater than a threshold, B is deleted; otherwise, the next bounding box is generated.
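The generation step can be sketched as follows. For simplicity, the region S is reduced to lists of candidate top-left coordinates, and the overlap threshold value is an assumption, since the paper leaves it unspecified:

```python
def iou(a, b):
    """Intersection-over-union of boxes (x, y, w, h), top-left origin."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def generate_boxes(region_xs, region_ys, sizes, max_overlap=0.8):
    """Slide candidate boxes over the constrained region S.

    region_xs / region_ys: candidate top-left coordinates inside S;
    sizes: (w, h) pairs, kept only if 1/3 <= h/w <= 3 (the paper's
    aspect-ratio constraint); a new box is dropped when it overlaps an
    already accepted box by more than max_overlap (assumed threshold).
    """
    boxes = []
    for w, h in sizes:
        if not (1 / 3 <= h / w <= 3):
            continue  # aspect-ratio constraint from statistical analysis
        for x in region_xs:
            for y in region_ys:
                cand = (x, y, w, h)
                if all(iou(cand, b) <= max_overlap for b in boxes):
                    boxes.append(cand)
    return boxes

cands = generate_boxes(range(0, 40, 20), range(0, 40, 20), [(30, 30), (10, 60)])
```

The (10, 60) size is rejected because h/w = 6 falls outside [1/3, 3].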
Step 2: Calculating the bounding box score. The more edges lie fully inside the bounding box, the more likely it is to contain a complete obstacle target, and the higher the score. By judging the position relationship between each edge and the four sides of the bounding box, the edges can be divided into inner edges and intersecting edges. As shown in Figure 5, inner edges are marked green, and edges intersecting the border are marked red; the latter are mainly the edges of the ground wire, not part of the obstacle.
For a bounding box, the edge set is defined as E = {e_1, e_2, ..., e_n}, and each edge is given a weight w_b(e_i): for intersecting edges, w_b(e_i) = 0; otherwise, w_b(e_i) = 1. The scoring function of each bounding box is defined as

h_b = Σ_i w_b(e_i)·m_i / (2(b_w + b_h)^k)

where b_w and b_h are the width and height of the bounding box, respectively; m_i is the sum of the gradient values of all pixels in edge e_i; and k is an adjustment coefficient, set to 1.5.
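The scoring step can be illustrated with a small sketch. Representing each edge as an inner/intersecting flag plus its summed gradient magnitude m_i is our simplification, and the Edge Boxes-style denominator form should be treated as a reconstruction:

```python
def box_score(edges, bw, bh, k=1.5):
    """Score h_b of a bounding box from its edge set.

    edges: list of (is_inner, m) pairs, where is_inner marks an edge
    fully inside the box (weight 1) versus one intersecting the border
    (weight 0), and m is the summed gradient magnitude of the edge's
    pixels. The denominator 2*(bw + bh)**k normalizes for box size.
    """
    num = sum(m for is_inner, m in edges if is_inner)
    return num / (2 * (bw + bh) ** k)

# Two inner edges contribute; the intersecting one (weight 0) does not.
score = box_score([(True, 10.0), (True, 6.0), (False, 50.0)], bw=40, bh=30)
```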
Step 3: Non-maximum suppression. Since there may be multiple bounding boxes near the target, non-maximum suppression (NMS) is required to eliminate overlapping bounding boxes, using an overlap-rate threshold β. The result is shown in Figure 6. Applying the method described above, the region proposal results for three images are shown in Figure 7.
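The greedy NMS step can be sketched as follows; the overlap rate is computed as intersection-over-union, which is the usual choice:

```python
def _iou(a, b):
    """IoU of boxes (x, y, w, h), top-left origin."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def nms(boxes, scores, beta=0.5):
    """Greedy NMS: keep the highest-scoring boxes, drop any box whose
    overlap rate with an already-kept box exceeds the threshold beta."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(_iou(boxes[i], boxes[j]) <= beta for j in keep):
            keep.append(i)
    return keep

kept = nms([(0, 0, 20, 20), (2, 2, 20, 20), (50, 50, 20, 20)],
           [0.9, 0.8, 0.7], beta=0.5)
```

Here the second box overlaps the first by about 0.68 IoU and is suppressed, while the disjoint third box survives.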

It can be seen that most of the obstacles are included in the boxes. At this point, the obstacle detection problem can be converted into a recognition problem: for each candidate bounding box, the clipped image block shown in Figure 8 is judged to be an obstacle or not by the following pattern recognition algorithm.

Obstacle Recognition
In order to identify obstacles at different distances and from different perspectives, Hu invariant moment [28] global features and improved ORB (Oriented FAST and Rotated BRIEF) [29] local corner features are used to establish fused features. The Hu moment feature has translation, rotation, and scaling invariance, and the improved ORB feature has scale and perspective-transformation invariance.
Fused features represent the target more accurately than a single feature, but their dimensionality is generally higher. In a high-dimensional feature space, it is more difficult to find a single hyperplane that accurately separates the different classes of feature vectors; that is, the problem is linearly inseparable. Moreover, the number of obstacle samples in this project is limited. Therefore, the support vector machine (SVM) algorithm, based on structural risk minimization, is adopted to solve this multi-class obstacle classification problem with small samples, high feature dimensionality, and nonlinear features.
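The particle swarm optimization later used to tune the SVM can be sketched generically. Here f stands in for the SVM's validation error over candidate hyperparameters such as (C, γ); the coefficient values, bounds, and objective below are illustrative assumptions, not the paper's settings:

```python
import random

def pso_minimize(f, bounds, n_particles=10, iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over box-constrained parameters.

    In PSO-SVM, f would be the cross-validation error of an SVM trained
    with a candidate (C, gamma); w, c1, c2 are the standard inertia,
    cognitive, and social coefficients (common default values).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Stand-in for an SVM validation error, minimized at (C, gamma) = (10, 0.5):
err = lambda p: (p[0] - 10) ** 2 + (p[1] - 0.5) ** 2
best, best_err = pso_minimize(err, [(0.1, 100), (0.001, 10)])
```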

Extraction of Hu Invariant Moments
The normalized obstacle image can be regarded as the probability density of a two-dimensional random variable, and the properties of the random variable can be described by moment characteristics. Let the size of the grayscale image I be M×N; the (p+q)-order moment is defined as

m_pq = Σ_{i=1..M} Σ_{j=1..N} i^p j^q I(i, j)

where I(i, j) is the gray value at coordinate (i, j). In order to obtain translation invariance, the central moment can be expressed as

μ_pq = Σ_{i=1..M} Σ_{j=1..N} (i − i_c)^p (j − j_c)^q I(i, j)

where (i_c, j_c) is the coordinate of the image center of mass:

i_c = m_10 / m_00, j_c = m_01 / m_00.

Since the shape feature is extracted from the binary edge image, m_00 represents the area of the image. Scale invariance can be obtained through normalization of the central moments:

η_pq = μ_pq / μ_00^((p+q)/2 + 1)

where p + q = 2, 3, 4, ... Gonzalez [30] showed that seven rotation-, translation- and scale-invariant moment eigenvalues can be derived from the second- and third-order moments. The Hu moment feature vector can then be expressed as [φ1, φ2, φ3, φ4, φ5, φ6, φ7]. The edges of the three obstacle image blocks obtained in Figure 8 are extracted, and new images are generated by scaling, rotation, and translation, as shown in Figure 9. The seven invariant moment features are calculated, and the results are analyzed statistically using MATLAB box plots.
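The pipeline above (raw moments → central moments → normalized moments → invariants) can be checked with a small pure-Python sketch that computes the first two Hu invariants, φ1 = η20 + η02 and φ2 = (η20 − η02)² + 4η11²; the remaining five follow the same pattern from third-order moments:

```python
def hu_first_two(img):
    """First two Hu invariants of a grayscale image given as nested lists.

    Follows the text: raw moments m_pq, centroid (i_c, j_c), central
    moments mu_pq, normalized moments eta_pq = mu_pq / mu_00**((p+q)/2+1).
    """
    def m(p, q):
        return sum((i ** p) * (j ** q) * v
                   for i, row in enumerate(img) for j, v in enumerate(row))
    m00 = m(0, 0)
    ic, jc = m(1, 0) / m00, m(0, 1) / m00
    def mu(p, q):
        return sum(((i - ic) ** p) * ((j - jc) ** q) * v
                   for i, row in enumerate(img) for j, v in enumerate(row))
    def eta(p, q):
        return mu(p, q) / m00 ** ((p + q) / 2 + 1)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# A filled 4x8 rectangle and its 2x-scaled version: the invariants stay
# (nearly) equal despite the scaling, up to discretization error.
small = [[1] * 8 if 2 <= r < 6 else [0] * 8 for r in range(8)]
big = [[1] * 16 if 4 <= r < 12 else [0] * 16 for r in range(16)]
```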
As shown in Figure 10, the horizontal axis represents the seven invariant moments, and the vertical axis represents the mean value over the five images in each row of Figure 9; for convenience of display, both are enlarged by the same order of magnitude. In Figure 10, the upper, lower, and median values for the same obstacle largely coincide, indicating that the Hu moment features of an image and its transformed versions remain unchanged. In addition, it can be seen that the medians of the same Hu moment vary greatly across images of different obstacles. Therefore, the Hu invariant moment feature is suitable for describing the characteristics of ground wire obstacles.
Improved ORB Corner Feature Extraction

Suppose n ORB feature points are detected in the obstacle image, represented as P = {p_1, p_2, ..., p_n}. The descriptor of the i-th feature point is a 256-bit binary code orb_i, and the descriptors of the image block constitute the feature vector [orb_1, orb_2, ..., orb_n]. Since the number of feature points extracted from each obstacle image is not consistent, the image blocks have descriptor vectors of different dimensions, which cannot be used directly as classification features. The bag of visual words (BOVW) model [31] is therefore used to extract the visual vocabulary of obstacle images and construct a visual vocabulary histogram, unifying the feature dimensions.
For the descriptor set of extracted ORB feature points, K-means method [32] is used for clustering analysis. It is assumed that the number of clustering centers is k, and k clustering centers are obtained. Each cluster center corresponds to a descriptor, also known as a visual vocabulary, which constitutes a list of visual words.
For the obstacle image, the ORB feature points and descriptors can be extracted firstly, and then all the feature point descriptors are mapped to the nearest visual vocabulary. Since the 256-bit descriptor is binary code, the hamming distance is used to calculate the distance between the descriptors. The frequency of word occurrence is counted to construct the histogram, and then the final eigenvector expression is obtained by normalization operation. For the three obstacle image blocks, the histogram diagram of visual vocabulary is shown in Figure 11: For the descriptor set of extracted ORB feature points, K-means method [32] is used for clustering analysis. It is assumed that the number of clustering centers is k, and k clustering centers are obtained. Each cluster center corresponds to a descriptor, also known as a visual vocabulary, which constitutes a list of visual words.
For the obstacle image, the ORB feature points and descriptors can be extracted firstly, and then all the feature point descriptors are mapped to the nearest visual vocabulary. Since the 256-bit descriptor is binary code, the hamming distance is used to calculate the distance between the descriptors. The frequency of word occurrence is counted to construct the histogram, and then the final eigenvector expression is obtained by normalization operation. For the three obstacle image blocks, the histogram diagram of visual vocabulary is shown in Figure 11: where k represents the number of K-means lustering centers, that is, the total number of words.
where k represents the number of K-means lustering centers, that is, the total number of words.
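The mapping from a variable number of descriptors to a fixed k-dimensional histogram can be sketched as follows; the random bit strings stand in for real ORB descriptors and K-means cluster centers, which this sketch does not compute.

```python
import random

def hamming(a: int, b: int) -> int:
    # Hamming distance between two 256-bit binary descriptors stored as ints
    return bin(a ^ b).count("1")

def bovw_histogram(descriptors, vocabulary):
    # Map each descriptor to its nearest visual word (by Hamming distance),
    # count word frequencies, and L1-normalize into a fixed-length vector.
    k = len(vocabulary)
    hist = [0.0] * k
    for d in descriptors:
        nearest = min(range(k), key=lambda j: hamming(d, vocabulary[j]))
        hist[nearest] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

random.seed(0)
# Stand-in for ORB output: n 256-bit binary descriptors from one image patch
descriptors = [random.getrandbits(256) for _ in range(300)]
# Stand-in vocabulary: k cluster centers learned over the training set
vocabulary = [random.getrandbits(256) for _ in range(50)]

h = bovw_histogram(descriptors, vocabulary)
```

Regardless of how many feature points an image yields, the output always has dimension k and sums to one, which is what makes it usable as a fixed-length classifier input.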

Fusion of Local and Global Features
Combining different, complementary feature types to enhance descriptive ability is widely used in computer vision. Literature [33][34][35] verified the feasibility of this approach in pedestrian detection, scene recognition, and cloud classification. Therefore, this paper introduces feature fusion to identify obstacles on the ground wire.
For an obstacle image block, the global Hu moment feature and the local ORBBOVW feature are combined to form the final feature description of the obstacle. The simplest fusion method is to concatenate the two eigenvectors, i.e., [Hu, ORBBOVW]. This approach ignores the differences between the features, which may reduce descriptive ability. Therefore, an improved method is to assign different weights to each feature according to its descriptive ability. Suppose the descriptive ability of the ith feature is expressed as p_i, i = 1, 2, ..., m; then the weight of each feature can be defined as

w_i = p_i / (p_1 + p_2 + ... + p_m)

This is consistent with the objective fact that the stronger the discriminative ability, the greater the weight given to the feature. Here m is set to 2 because there are two features. Let the weight of the Hu feature vector be γ and the weight of the ORBBOVW feature vector be (1 − γ); γ is called the fusion coefficient, and the fused feature is defined as

V = [γ · Hu*, (1 − γ) · ORBBOVW]

In order to eliminate the influence of the absolute values of the features on fusion performance, the features must be normalized. Since ORBBOVW is already normalized, each dimension of the Hu feature vector is normalized as

φ*_i = (φ_i − φ_min) / (φ_max − φ_min)

where φ_min and φ_max are the minimum and maximum values of the 7-dimensional feature, respectively, and φ*_i is the normalized feature. The total dimension of the fused feature vector is therefore 7 + k, where k is the number of clustering centers set by the K-means algorithm. The next section describes the classifier used to classify the feature vector of the obstacle image.
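The weighted fusion above can be sketched in a few lines; the Hu moment values and the k = 4 histogram below are illustrative numbers, not measurements from the paper.

```python
def minmax_normalize(v):
    # Scale each component of v into [0, 1] by its min and max
    lo, hi = min(v), max(v)
    rng = (hi - lo) or 1.0
    return [(x - lo) / rng for x in v]

def fuse_features(hu, orb_bovw, gamma=0.4):
    # Weighted concatenation: gamma * normalized Hu (7-D) joined with
    # (1 - gamma) * BOVW histogram (k-D, already normalized)
    hu_n = minmax_normalize(hu)
    return [gamma * x for x in hu_n] + [(1 - gamma) * x for x in orb_bovw]

hu = [2.1, 6.3, 9.8, 10.5, 21.0, 13.9, 20.3]   # illustrative Hu moments
bovw = [0.2, 0.3, 0.1, 0.4]                     # illustrative k = 4 histogram
v = fuse_features(hu, bovw, gamma=0.4)          # 7 + k = 11 dimensions
```

Setting gamma = 0 or 1 recovers the single-feature cases, which is why the fusion coefficient can be tuned experimentally as in the later section.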

Support Vector Machine Classification and Particle Swarm Optimization
For a feature vector x, the final classifier function of the support vector machine [36] can be expressed as

f(x) = sgn( Σ_i α_i y_i K(x_i, x) + b )

where x_i represents the feature vector V calculated from an obstacle image patch, and y_i represents the corresponding label, defined as y_i ∈ {−1, 1}: label 1 means the sample is a shockproof hammer (damper), and −1 means it is not. K is the kernel function; linear, polynomial, and radial basis kernel functions are commonly used. Since the radial basis function (RBF) kernel can fit arbitrary nonlinear data, it is selected in this paper. It involves two important parameters, C and gamma. C is the penalty factor, indicating the tolerance for error: the higher C is, the easier it is to overfit; the lower C is, the easier it is to underfit. Gamma determines the distribution of the data mapped into the higher-dimensional space: the larger gamma is, the greater the risk of overfitting; the smaller gamma is, the smoother the function, leading to underfitting and lower accuracy. Thus, C and gamma must be selected appropriately.
The traditional grid search method [37,38] performs permutation-and-combination experiments over the two parameters within a certain range, which is time-consuming and rarely finds the global optimum. Therefore, an adaptive particle swarm optimization (PSO) algorithm is used to find the global optimal solution in the search space. Assume n particles are randomly generated in the d-dimensional search space R^d. The position of the ith particle is x_i = (x_i1, x_i2, ..., x_id), its individual extremum is p_bi = (p_bi1, p_bi2, ..., p_bid), and the global extremum is p_gb = max(p_bi), i = 1, 2, ..., n. The velocity of the ith particle is v_i = (v_i1, v_i2, ..., v_id). In each iteration, each particle updates its velocity and position according to the individual and global extrema of the current generation, following the principle [39]:

v_i(t+1) = w · v_i(t) + c1 · r1 · (p_bi − x_i(t)) + c2 · r2 · (p_gb − x_i(t))
x_i(t+1) = x_i(t) + v_i(t+1)

where w is the inertia factor, w ∈ [0, 1], and r1, r2 are random numbers in [0, 1]. Literature [40] shows that a linearly decreasing weight strategy can balance the abilities of global and local optimization. During the iterations, the linearly decreasing w is given by

w = w_max − (w_max − w_min) · iter / iter_max

where iter_max is the maximum number of iterations, iter is the current iteration, and w_max and w_min are the maximum and minimum inertia, respectively. c1 and c2 are non-negative acceleration factors: c1 is the acceleration factor of individual experience and c2 that of group experience. The termination condition of the PSO iteration is reaching the maximum number of iterations G; at that point, the global best particle represents the optimal solution of the problem. When PSO is adopted to optimize the SVM parameters, the particle position can be expressed as x_i = (c_i, g_i) and the corresponding velocity as v_i = (v_ci, v_gi).
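The update rules above can be sketched as a minimal PSO with linearly decreasing inertia. The quadratic toy objective stands in for the SVM cross-validation error over (C, gamma); the bounds and swarm settings are illustrative assumptions.

```python
import random

def pso(fitness, dim=2, n=20, iters=50, bounds=(0.01, 100.0),
        c1=1.5, c2=1.5, w_max=0.9, w_min=0.4):
    # Minimizes `fitness`; inertia w decreases linearly from w_max to w_min.
    random.seed(1)
    lo, hi = bounds
    x = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in x]                       # individual extrema
    pbest_f = [fitness(p) for p in x]
    g = pbest[min(range(n), key=lambda i: pbest_f[i])][:]  # global extremum
    for it in range(iters):
        w = w_max - (w_max - w_min) * it / iters    # linearly decreasing w
        for i in range(n):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (g[d] - x[i][d]))
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))
            f = fitness(x[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[i][:], f
                if f < fitness(g):
                    g = x[i][:]
    return g

# Toy stand-in for SVM cross-validation error over (C, gamma),
# with a known optimum at C = 10, gamma = 1
toy = lambda p: (p[0] - 10.0) ** 2 + (p[1] - 1.0) ** 2
best = pso(toy)
```

In the actual method, `fitness` would be the cross-validated error of an SVM trained with the candidate (C, gamma) pair.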

Obstacle Classification based on PSO-SVM
There are four types of objects: three types of obstacles on the ground wire and the background containing clouds and towers. Multi-classification problems can be solved by SVM using indirect methods, mainly divided into 'one-to-one' and 'one-to-many' strategies. Based on the indirect 'one-to-many' SVM method, the target recognition framework of the decision tree is established as shown in Figure 12.

Three SVM classifiers are constructed as follows, and the parameters C and gamma of each classifier are optimized by PSO.
SVM1: taking the three obstacle types as positive samples and the background as negative samples, the classifier is trained to identify whether the target is an obstacle;
SVM2: taking the obstacle group as positive samples and the suspension clamp and damper as negative samples, the classifier is trained to determine whether the target is the obstacle group;
SVM3: taking the suspension clamp as positive samples and the damper as negative samples, the classifier is trained to determine whether the target is a suspension clamp or a damper.
The fused feature of an input image starts from the root node of the decision tree and, after passing through the three support vector machines, ends at a leaf node that gives the classification. Since the detected regions of interest contain many background patches, the first layer reduces the number of subsequent classifications.
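The cascade can be sketched as a short decision function; the lambda classifiers below are hypothetical stand-ins for the three trained SVMs, each returning +1 or −1.

```python
def classify(feature, svm1, svm2, svm3):
    # Decision-tree cascade of three binary classifiers (sketch):
    # svm1: obstacle vs. background; svm2: obstacle group vs. the rest;
    # svm3: suspension clamp vs. damper.
    if svm1(feature) < 0:
        return "background"          # rejected at the first layer
    if svm2(feature) > 0:
        return "obstacle group"
    return "suspension clamp" if svm3(feature) > 0 else "damper"

# Hypothetical stand-in classifiers for illustration
label = classify([0.2, 0.8],
                 svm1=lambda f: 1, svm2=lambda f: -1, svm3=lambda f: -1)
```

Because most proposed regions are background, the first classifier discards them before the more specific classifiers run, which is the efficiency argument made above.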

Database
To verify the effectiveness of the proposed algorithm, 1000 images with a size of 720 × 1280 are collected from the experimental and project sites. The LabelMe tool [41] is used to mark the category and location information of obstacles to form ground-truth dataset A, as shown in Table 1. Some labeling examples are shown in Figure 13. At the same time, the distance information is added to the XML file. Samples with a distance of more than 5 m are defined as A1, and samples with a distance of less than 5 m are defined as A2.
The fusion feature of the input image starts from the root node of the decision tree and ends the classification of any leaf node through three support vector machines. Since the detected region of interest contains more background, the first layer can reduce the number of classification.

The labeled obstacles are clipped from the images. Data enhancement is adopted to randomly add a certain degree of rotation, scaling, translation, clipping, and noise to each image, generating new samples and increasing the sample size. Obviously unrealistic samples are then eliminated by manual screening to form obstacle dataset B, as shown in Table 2. Some image samples are shown in Figure 14.
As shown in Figure 14, dataset B contains samples of different types, perspectives, and scales. The background images are mostly sky, poles, and towers, consistent with the actual line situation. Moreover, each background image contains rich image detail, consistent with the regions of interest detected by the bounding box algorithm. Therefore, the selection of background samples is reasonable.
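A minimal sketch of the augmentation step, assuming random translation plus additive Gaussian noise as stand-ins for the full rotation/scaling/clipping pipeline; the 8 × 8 patch is synthetic.

```python
import random

def augment(img, max_shift=2, noise_std=5.0):
    # Randomly translate the patch and add Gaussian noise, clamped to [0, 255].
    h, w = len(img), len(img[0])
    dx = random.randint(-max_shift, max_shift)
    dy = random.randint(-max_shift, max_shift)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
            out[y][x] = min(255.0, max(0.0, out[y][x] + random.gauss(0, noise_std)))
    return out

random.seed(3)
patch = [[float((x + y) * 10 % 255) for x in range(8)] for y in range(8)]
samples = [augment(patch) for _ in range(5)]  # five new training samples
```

In practice, the generated samples would then be screened manually, as described above, to discard obviously unrealistic ones.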

Influence of ORB Feature Dimension k on Obstacle Recognition
In order to solve the problem of inconsistent feature dimensions across images, the BOVW model is adopted to improve the ORB features, unifying the dimension to k. 1200 samples are randomly selected from the four types of samples in dataset B, their ORB feature points are extracted, and the number of feature points is analyzed in the histogram shown below. As can be seen from Figure 15, the abscissa represents the number of feature points per sample, ranging from 50 to 450; the distribution is approximately Gaussian, centered on 250. In order to avoid the feature sparsity caused by too many zero components, k is set to 20, 50, 80, and 110 as candidate feature dimensions.
The 'one-to-many' SVM classification method described above is applied to the dataset in four experiments, and the prediction performance of the classifier is evaluated by the 10-fold cross-validation method. When extracting features, ORBBOVW features are extracted for each value of k, and the model is trained and tested. For intuitive presentation, the average classification accuracy is expressed as a confusion matrix, as shown in Figure 16.
As shown in Figure 16, the x-axis represents the predicted class, the y-axis represents the true class, and the number in each cell (x, y) represents the probability that a sample of class y is identified as class x. When k is 80 or 110, the recognition rate is low because the number of feature points is small relative to the vocabulary size and the histogram features become sparse. When k takes a small value (such as 20), information is lost through excessive compression of the features, which also leads to a low recognition rate. When k = 50, the recognition rate is relatively high, showing a balance between feature sparsity and information integrity. Therefore, k = 50 is adopted in the following experiments.
Meanwhile, it can also be seen from Figure 16 that the highest recognition rate of obstacles is only 75%, indicating that a single local feature cannot accurately describe obstacles. Therefore, it needs to be integrated with other features to improve the accuracy of recognition.
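The row-normalized confusion matrix used in Figure 16 can be built with a short sketch; the class names and label lists below are illustrative only.

```python
def confusion_matrix(true_labels, pred_labels, classes):
    # Row = true class, column = predicted class; each row is normalized
    # so cell (y, x) is the probability that class y is identified as x.
    idx = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    m = [[0.0] * n for _ in range(n)]
    for t, p in zip(true_labels, pred_labels):
        m[idx[t]][idx[p]] += 1.0
    for row in m:
        s = sum(row) or 1.0
        for j in range(n):
            row[j] /= s
    return m

classes = ["damper", "clamp", "group", "background"]
true_labels = ["damper", "damper", "clamp", "group", "background"]
pred_labels = ["damper", "clamp", "clamp", "group", "background"]
cm = confusion_matrix(true_labels, pred_labels, classes)
```

The diagonal entries are the per-class recognition rates that the k-selection discussion above compares.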

Influence of Feature Fusion Parameters on Obstacle Recognition
Since only two types of features are used in this paper, experimental verification is used to select the optimal feature fusion parameter. γ ranges from 0 to 1 and is discretized with an interval of 0.1. For each value, the experiment is carried out in the same way as above, with k = 50. The resulting accuracies are shown in Figure 17.

As shown in Figure 17, taking the damper as an example, the detection accuracy first increases and then decreases. When γ equals 0.4, the recognition rate reaches a maximum of about 86%, which shows that the global feature is beneficial to the description of the damper. When γ is greater than 0.4, the accuracy gradually drops. It can be inferred that the global feature of the damper has stronger discriminative ability than the local feature. The accuracy trends of the other two kinds of obstacles are similar to that of the damper. Therefore, in order to obtain better classification accuracy, γ is set to 0.4.

Comparison with Other Methods
For obstacle database B, four obstacle identification methods are compared with the proposed method. The NN method is implemented using Keras, and the network parameters are the same as in [24]. The key parameters of the proposed algorithm take the optimal values selected above, and the PSO-related parameters are shown in Table 3. The experimental results are shown in Table 4. As can be seen from the table, the proposed algorithm achieves higher recognition accuracy than the other four methods. The recognition accuracy of the comparison methods is relatively low, which may be due to the limited ability of a single feature to describe the multi-scale and multi-perspective obstacles in database B. For the CNN method, the low detection accuracy may be due to the small number of samples, which leads to overfitting. The proposed algorithm uses global and local feature fusion to enhance the description of obstacles with different scales and shapes, and PSO-SVM has good classification ability, so the classification accuracy is high. Meanwhile, all five methods recognize the background image blocks with an accuracy of about 85%.
In order to further verify the effectiveness of the proposed method, other classification metrics such as positive predictivity (PP), sensitivity [42], and F1-score are calculated. For the multi-class case, PP, sensitivity, and F1-score are evaluated using the sklearn toolbox. For example, the micro PP is calculated as

PP_micro = Σ_i TP_i / Σ_i (TP_i + FP_i)

where the index i represents the obstacle class. A sample predicted positive that is actually positive is a true positive (TP); predicted positive but actually negative is a false positive (FP); predicted negative but actually positive is a false negative (FN). PP and sensitivity are often in tension, which requires a comprehensive trade-off. The most common combined measure is the F-measure (F-score), and the F1-score is calculated as

F1-score = 2 · PP · S / (PP + S)

where PP and S denote positive predictivity and sensitivity. PP is the proportion of samples predicted positive that are actually positive, and sensitivity is the proportion of actually positive samples that are predicted positive. The larger these two values are, the better the classification performance, with 1 being ideal. The F-measure is a trade-off between the two; when F is 1, performance is optimal and PP and sensitivity are both 1. Figure 18 shows the classification performance metrics of the five methods on dataset B. According to the three metrics in Figure 18, the proposed method achieves the best classification performance: about 85% of predicted positives are correct, and about 84% of the positive samples are recalled. For the CNN method, about 20 percent of the positive samples are lost.
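The micro-averaged metrics above can be computed directly from per-class counts; the TP/FP/FN numbers below are illustrative, not the paper's results.

```python
def micro_metrics(per_class):
    # per_class: list of dicts with TP, FP, FN counts for each class.
    # Micro-averaging pools the counts before computing the ratios.
    tp = sum(s["TP"] for s in per_class)
    fp = sum(s["FP"] for s in per_class)
    fn = sum(s["FN"] for s in per_class)
    pp = tp / (tp + fp)              # micro positive predictivity (precision)
    sens = tp / (tp + fn)            # micro sensitivity (recall)
    f1 = 2 * pp * sens / (pp + sens)
    return pp, sens, f1

per_class = [{"TP": 85, "FP": 10, "FN": 15},
             {"TP": 80, "FP": 15, "FN": 20}]
pp, s, f1 = micro_metrics(per_class)
```

This matches what sklearn's micro averaging does: counts are pooled across classes first, then the ratios are taken once.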
To further analyze each type of sample, the classification performance metrics for each class are listed in Figure 19.
For the PP metric, the obstacle group achieves the highest value, while its sensitivity is relatively low. It can be inferred from Figure 19 that the other three types of samples are unlikely to be misidentified as obstacle groups, probably because their shapes differ greatly from that of the obstacle group. The obstacle group also achieves the highest F1-score, meaning the proposed method performs best on this class. However, Table 4 shows that the classification of the damper is slightly better than that of the obstacle group, possibly because the PP metric of the obstacle group is larger than that of the damper. Different evaluation criteria can therefore lead to different conclusions.

Parameter Settings
The best parameters are shown in Table 5. thr is the threshold used to determine whether an obstacle is detected accurately: when the overlap rate between the detected target region and the manually marked region is greater than thr, the obstacle is considered correctly detected. thr is set to 0.5 and 0.7 in the experiments. The overlap rate α and the overlap-rate threshold β are the key parameters of the region proposal stage. In [26], the authors carried out extensive experiments on the PASCAL VOC2007 dataset and concluded that (α, β) = (0.65, 0.75) is an optimal choice. The clustering dimension k and the fusion parameter γ affect the generation of the feature vectors and the recognition of obstacle image patches.

Visual Effect Analysis
In order to verify the effectiveness of the proposed method, three typical cases of robot obstacle detection are selected for experiments, namely single-target case, multi-target case, and far and near target case.
• Case 1: Single obstacle target at different scales and angles
As shown in Figure 20, the first row shows the region proposal results, and the second row shows the recognition results based on the region proposals. The manually labeled box is marked in red, and the green box represents the identified obstacle.
When the robot is far away from the poles and towers, it first captures the obstacle group, as shown in Figure 20a-c. Figure 20a,b are taken by the left and right cameras installed on the robot at different shooting angles. The obstacle group appears small in Figure 20c-f due to the long shooting distance.
However, thanks to the ground wire constraint, the obstacle group can still be accurately detected and identified. Figure 20d,e differ considerably in scale and perspective because they are shot at different distances, yet both are detected accurately.


• Case 2: Multi-obstacle target
As can be seen from Figure 21, multiple obstacles can be detected and identified simultaneously, because multiple regions that may contain obstacles are found in the region proposal stage and the images in these regions are then classified and recognized. In particular, obstacles close to each other can still be detected, because the bounding boxes are constrained to the ground wire and the minimum rectangular box area is fixed.
• Case 3: The robot approaches the obstacle from far to near
Figure 22 shows three images shot by the robot as it approaches the pole and tower. The damper in the first image is not detected because, at long range, the target is small and merges with the ground wire and tower head. As the distance decreases, although the size and the shooting angle change, the first damper is detected accurately. At the nearest distance, both the front and rear dampers are detected.
Therefore, it can be seen that the proposed method can accurately detect multiple obstacles at both near and far distances and viewing angles.
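The cases above rely on the region proposal stage keeping only bounding boxes that lie on the detected ground wire and exceed a minimum rectangle area. A minimal sketch of that constraint step is given below; the `band` and `min_area` parameters, the `Box` type, and the simple vertical-band test are illustrative assumptions, not the paper's exact implementation.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: int  # left
    y: int  # top
    w: int  # width
    h: int  # height

def constrain_to_ground_wire(boxes, wire_y, band=40, min_area=400):
    """Keep only candidate boxes that straddle the detected ground-wire
    row (within a vertical band) and meet a minimum rectangle area.
    band and min_area are hypothetical tuning parameters."""
    kept = []
    for b in boxes:
        on_wire = (b.y - band) <= wire_y <= (b.y + b.h + band)
        if on_wire and b.w * b.h >= min_area:
            kept.append(b)
    return kept

# Only the large box near the wire line (y = 100) survives: the second
# box is far from the wire and the third is below the area threshold.
candidates = [Box(10, 80, 50, 40), Box(10, 300, 50, 40), Box(200, 95, 10, 10)]
print(constrain_to_ground_wire(candidates, wire_y=100))
```

Fixing the minimum area is what allows closely spaced dampers to survive as separate proposals rather than being merged or discarded.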

Quantitative Analysis
In order to further verify the effectiveness of the proposed method, the test results are quantitatively analyzed. A detection is considered valid if

IOU(box1, box2) ≥ thr && label(box2) = damper, (19)

where box1 and box2 are the manually labeled and the detected bounding boxes, respectively. The detection success rate is defined as the ratio of the number of valid detections to the total number. To verify the influence of distance on obstacle detection and recognition, the parameters in Table 5 are used to test data sets A1 and A2; the detection success rates are shown in Table 6. The proposed method performs better on the short-range data set A2 than on the long-range data set A1, because at close range the edges of the obstacles are clear and overlap less with the ground wire, so more bounding boxes are generated around them and the extracted corner and shape features are more accurate. For data set A2, the detection success rates of the three types of obstacles also differ: the damper has the highest success rate, with an average of 89.4%, and the obstacle group has the lowest, with an average of 83.6%. The reason is that at short distance the damper is separated from the ground wire and has obvious morphological characteristics, so the proposed algorithm can accurately identify it as a single object, whereas the obstacle group is formed by the overlap of several dampers, poles, and towers and, at close range, is likely to be identified as a damper rather than an obstacle group, which lowers its detection rate. At close range the boundaries between the suspension clamp, the pole tower, and the ground wire are also obvious, so the clamp easily forms an independent object. For data set A1, the detection success rate of the obstacle group is higher than that of the damper and the suspension clamp.
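The validity criterion of Equation (19) and the success-rate definition can be sketched as follows; the boxes and labels in the example are made-up illustrations, and the criterion is applied per ground-truth/detection pair.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def detection_success_rate(pairs, thr):
    """pairs: list of (gt_box, det_box, gt_label, det_label).
    A detection is valid when IoU >= thr and the labels match."""
    valid = sum(1 for gt, det, gl, dl in pairs
                if iou(gt, det) >= thr and gl == dl)
    return valid / len(pairs)

pairs = [
    ((0, 0, 10, 10), (2, 2, 12, 12), "damper", "damper"),  # IoU ~ 0.47
    ((0, 0, 10, 10), (1, 1, 11, 11), "damper", "damper"),  # IoU ~ 0.68
    ((0, 0, 10, 10), (0, 0, 10, 10), "clamp",  "damper"),  # label mismatch
]
print(detection_success_rate(pairs, 0.5), detection_success_rate(pairs, 0.7))
```

On this toy data the success rate drops from 1/3 at thr = 0.5 to 0 at thr = 0.7, mirroring the threshold sensitivity discussed below for data sets A1 and A2.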
The reason is that at long distance the overlap between obstacles is large and the characteristics of the obstacle group are obvious, so its success rate is high. Moreover, for the same obstacle, the detection success rate at thr = 0.5 is higher than at thr = 0.7, because a smaller thr tolerates a smaller overlap between the detected and the manually labeled rectangular boxes, so more detections are counted as valid. However, too small a thr would accept erroneous detections. The same pattern holds when analyzing data set A1.
Finally, the average time cost of the proposed algorithm is shown in Table 7; the actual processing speed reaches 8 fps. The feasibility and effectiveness of the proposed method are thus verified through the qualitative analysis and the quantitative detection-success-rate analysis above.

Conclusions
In this paper, a method of obstacle detection and identification is proposed in which the detection and identification stages are independent. Firstly, regions are proposed by constraining the bounding boxes with the detected ground wire; then each obstacle image block is classified by the multi-class SVM classifier to recognize multiple obstacle targets. In the recognition stage, global shape features and improved local features are fused with different weights to enhance the characterization of multi-angle and multi-scale obstacles.
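The weighted fusion step can be sketched as a weighted concatenation of the two normalized feature vectors before they are fed to the SVM; the L2 normalization and the 0.6/0.4 weights here are illustrative assumptions (in the paper the weights are chosen to balance the global and local descriptors), not the exact scheme.

```python
import math

def fuse_features(global_shape, local_corner, w_global=0.6, w_local=0.4):
    """Weighted concatenation of an L2-normalized global shape feature
    vector and a local corner feature vector into one descriptor.
    w_global and w_local are hypothetical fusion weights."""
    def scale(v, w):
        n = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid divide-by-zero
        return [w * x / n for x in v]
    return scale(global_shape, w_global) + scale(local_corner, w_local)

# Toy descriptors: a 2-D shape feature and a 3-D corner feature.
feat = fuse_features([3.0, 4.0], [1.0, 0.0, 0.0])
print(feat)
```

The fused descriptor is what the PSO-SVM classifier consumes; the PSO step only tunes the SVM hyperparameters and is independent of this fusion.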
The effectiveness of the proposed method is verified in terms of both visual effect and detection success rate. The recognition accuracy for the three types of obstacles reaches 86.2%, 83.6%, and 83.1%, respectively. Meanwhile, the detection success rate on the short-distance data set A2 is up to 92.6%, and that on the long-distance data set A1 is 80.2%, which shows that the method can also detect obstacles at relatively long distances. In the future, obstacle recognition and detection based on deep learning and GANs will be studied, combined with an improved robot hardware architecture, to realize real-time and accurate detection. This will lay a foundation for the next stage of behavior planning.