Research on Identification and Location of Charging Ports of Multiple Electric Vehicles Based on SFLDLC-CBAM-YOLOV7-Tinp-CTMA

Abstract: With the gradual maturity of autonomous driving and automatic parking technology, electric vehicle charging is moving towards automation. The charging port (CP) location is an important basis for realizing automatic charging. Existing CP identification algorithms are only suitable for a single vehicle model and have poor universality. Therefore, this paper proposes a set of methods that can identify the CPs of various vehicle types. The recognition process is divided into a rough positioning stage (RPS) and a precise positioning stage (PPS). In this study, the data sets corresponding to four types of vehicle CPs under different environments are established. In the RPS, the characteristic information of the CP is obtained based on the combination of the convolutional block attention module (CBAM) and YOLOV7-tinp, and its position information is calculated using the similar projection relationship. For the PPS, this paper proposes a data enhancement method based on similar feature location to determine the label category (SFLDLC). The CBAM-YOLOV7-tinp is used to identify the feature location information, the cluster template matching algorithm (CTMA) is used to obtain the accurate feature location and tag type, and the EPnP algorithm is used to calculate the location and posture (LP) information. The results of the LP solution are used to provide the position coordinates of the CP relative to the robot base. Finally, the AUBO-i10 robot is used to complete the experimental test. The corresponding results show that the average positioning errors (x, y, z, rx, ry, and rz) of the CP are 0.64 mm, 0.88 mm, 1.24 mm, 1.19 degrees, 1.00 degrees, and 0.57 degrees, respectively, and the integrated insertion success rate is 94.25%. Therefore, the algorithm proposed in this paper can efficiently and accurately identify and locate various types of CP and meet the actual plugging requirements.


Introduction
With the continuous worldwide reduction in the availability of fossil energy, the advantages of new energy vehicles have been gradually highlighted [1][2][3]. Electric vehicles, being clean and pollution-free, have received strong support from the government [4][5][6]. In recent years, the shortage of urban land resources has become increasingly prominent, and the application of stereo charging garages has pushed electric vehicle charging towards unmanned operation [7]. Automatic parking and driverless technology are gradually becoming mature. With this technology, a vehicle will arrive at the parking lot by itself and should be charged automatically. For publicly used electric vehicles or those on time-sharing lease, charging is often delayed after the user returns a vehicle, which affects the user experience and utilization. Charging piles can be damaged by weather and human factors, in which case manual charging carries significant safety risks. At the same time, the DC charging gun cable is heavy, which is not conducive to manual plugging [8]. For these reasons, automatic charging of electric vehicles is an urgent problem that needs to be solved.
At present, some companies and research institutions have proposed their own solutions [9][10][11][12][13][14][15][16][17][18][19]. These solutions show that the core of automatic charging of electric vehicles mainly consists of two parts: the identification and positioning of charging port (CP) and the plug-in mechanism. The identification and positioning of CP is the premise of plugging. Furthermore, the accuracy and universality of CP identification are important guarantees for the successful plugging of robots. Therefore, the high-precision identification and positioning of the CP is of great significance towards realization of automatic charging technology.
At present, the main CP recognition method uses visual positioning, which is divided into two categories: (1) with feature recognition and (2) without feature recognition. In terms of feature recognition, Lv [20] added white labels around the CP, used feature matching for rough positioning, and inserted the plug into the CP using six-axis force sensor compensation. The author did not provide recognition and positioning accuracy. Pan et al. [21] added five black and white labels around the CP. Based on the contour of the open operation, the geometric solution method was used to calculate the location and posture (LP) of the CP. The LP errors were 1.4 mm and 1.6 degrees, respectively, and the insertion success rate was 98.9%. Among the methods without feature recognition, Li et al. [22] proposed a CP identification and location method based on the scale-invariant feature transform and semi-global block matching. The method achieved an average error of 1.51 mm. Zhang et al. [16] improved the Canny edge detection and the CP image correlation algorithm of combined morphology. The authors did not specify any recognition accuracy, and the overall insertion success rate was 95.55%. Yao et al. [23], based on the template matching algorithm in the Halcon commercial vision software, tested the CP LP error indoors, achieving average errors of 2.5 mm in position and 0.8 degrees in angle. Quan et al. [24] tested the CP identification accuracy in multiple environments using the cluster template matching algorithm (CTMA). The LP errors were 0.91 mm and 0.87 degrees, respectively, and the plugging success rate was 95%.
In recent years, deep learning has achieved rapid development in the field of target recognition, with the emergence of a series of target detection models such as Faster-RCNN, YOLO and SSD [25][26][27][28][29][30][31][32]. These models have improved the universality of target recognition, especially for specific targets, significantly improving the recognition accuracy in complex scenes and light. The YOLO algorithm is highly favored for its relatively high accuracy while ensuring high speed [33,34].
The above research shows that existing CP recognition methods can only adapt to a single type of CP. Although the size of the CP has a unified standard, CPs from different manufacturers, and even different batches of CPs from the same manufacturer, differ in detailed texture and surface roughness. Due to the limitations of traditional algorithms, different CPs require different characteristic parameters to be adjusted, and the universality is poor. At present, among the target detection algorithms, there is no recognition optimization algorithm for targets with structural features. In view of the specificity of CP features, this paper proposes, on the basis of the YOLOV7-tinp algorithm, a data enhancement method in which the locations of similar features determine the label category (SFLDLC), which improves the classification accuracy for targets whose category is determined by the location of similar features. At the same time, using the convolutional block attention module (CBAM) attention mechanism combined with the CTMA, the universality and accuracy of the algorithm are improved, and a guarantee for the LP calculation algorithm of the CP is provided. The rough positioning stage (RPS) and precise positioning stage (PPS) use the similar projection relationship and the EPnP algorithm, respectively, to solve the LP of the recognition results. Subsequently, the robot is guided to complete the insertion work, which realizes automatic charging of various vehicle CPs. Our contributions in this paper are as follows: (1) We propose a solution that combines deep learning methods to identify charging port pose information. (2) We propose an SFLDLC and CTMA for CP recognition and positioning, which improves the accuracy of recognition. (3) We have integrated CBAM into YOLOV7-tinp for CP recognition and positioning, improving recognition accuracy.
This paper is organized as follows: Section 2 introduces the data collection process and the identification and location methods. Section 3 conducts experimental verification in different scenarios, providing positioning accuracy and insertion success rate. Section 4 discusses the sources of positioning errors. Section 5 summarizes the experimental results and further research directions.

Construction of Experimental Test Platform
The experimental test platform mainly includes three components: a visual module, a control module and a plug-in actuator, as shown in Figure 1. To meet the requirements of the experimental insertion workspace, the actuator is the AUBO-i10 articulated robot with six degrees of freedom. The camera is a MER-125-30GM/C-P series industrial camera manufactured by Daheng Image Vision Co., Ltd. in Beijing, China. The camera lens is the M0814-MP2 lens of Comstar. The light intensity is measured using the Taiwan Taishi TES-1335 digital illuminometer. The specific information is shown in Table 1. This paper adopts the camera calibration method of Zhang [35] and the hand-eye calibration method of Zhu et al. [36].

Image Data Acquisition
This study targets the DC CP defined by the national standard GB/T 20234.3-2011. During data collection, the robot is fixed on the base in order to obtain the actual LP information of the CP relative to the camera. The base world coordinates are kept unchanged, and the robot is moved into the CP in teaching mode; this inserted state is the zero LP state. The robot is then moved out of the CP randomly within the recognition range, and the LP information of the camera relative to the CP is obtained based on the robot's LP information and the zero LP information.
Four types of CPs are used in this paper. In order to reduce the interference of the environment, such as collected images that are too dark or too bright, this paper designs an automatic exposure algorithm that keeps the average brightness value of the image between 100 and 160. The data in this paper were collected in the Songjiang District, Shanghai (120.5924 E, 31.3036 N). This article considers indoor, outdoor, morning, afternoon, noon, evening, sunny, and cloudy environments. There are 12 scenarios in total. Because of the similarity of the scenarios, six scenarios are finally determined, as shown in Table 1. In order to improve the actual positioning accuracy, this paper divides positioning into two stages: RPS and PPS, as shown in Figure 2. The purpose of the RPS is to find the CP target and achieve rough positioning.
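The automatic exposure algorithm itself is not described in the text, so the following is only a minimal sketch of one way to hold the mean image brightness inside the 100-160 band. It assumes that brightness grows roughly linearly with exposure time; the names `auto_expose` and `fake_camera`, the exposure limits, and the toy sensor model are all hypothetical and not taken from the paper.

```python
from typing import Callable


def auto_expose(measure_mean: Callable[[float], float],
                exposure_us: float,
                lo: float = 100.0, hi: float = 160.0,
                min_us: float = 20.0, max_us: float = 1_000_000.0,
                max_iter: int = 20) -> float:
    """Return an exposure time whose mean gray level lies in [lo, hi].

    measure_mean(exposure_us) must grab a frame at that exposure and
    return its mean gray level (0-255). Hypothetical interface.
    """
    target = (lo + hi) / 2.0
    for _ in range(max_iter):
        mean = measure_mean(exposure_us)
        if lo <= mean <= hi:
            break
        # Proportional update: assume brightness is ~linear in exposure.
        scale = target / max(mean, 1.0)
        exposure_us = min(max(exposure_us * scale, min_us), max_us)
    return exposure_us


def fake_camera(exposure_us: float) -> float:
    """Toy sensor model: brightness linear in exposure, saturating at 255."""
    return min(255.0, 0.01 * exposure_us)


# Start from an over-exposed setting and let the loop settle.
exposure = auto_expose(fake_camera, exposure_us=40_000.0)
```

With a real camera, `measure_mean` would wrap the acquisition call and average the gray levels of the captured frame; saturation makes the first step an underestimate, which is why the update is iterated.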
According to the actual application scenario, the ranges of the x, y, and z directions are [ Table 3. The feature recognition is divided into RPS and PPS in order to ensure accurate feature location. The RPS is mainly used to identify and locate the CP target at a long distance and over a wide range. The main problem is that when the target is far away, the image is blurred in the non-focal position, the characteristics of the CP vary greatly, and the proportion of the target in the field of vision is small. To deal with these problems, we choose a larger target as the feature of the CP. Although the outermost feature of the CP is the most obvious, the outermost dimension of the CP differs between manufacturers and there is no standard size. Therefore, we choose a relatively large feature and regard the round features of the CP as a whole. The PPS is mainly aimed at the target near the focal length, which requires a high recognition accuracy. Therefore, we consider the individual circular feature contour as the target feature for recognition, and the feature range of RPS is as follows: where $x_n, y_n$ represent the center point coordinates of the nth feature; $w_n, h_n$ represent the length and width of the nth feature, and $(X_{min}, Y_{min})$ and $(X_{max}, Y_{max})$ represent the pixel coordinate positions of the upper left and lower right corners of the feature box, respectively. The center position of the CP is calculated according to the obtained characteristic information of the CP as where $(x_m, y_m)$ and $(X_m, Y_m)$ represent the recognized feature center point coordinates and the actual feature center point coordinates, respectively, and $a_m$ represents the conversion coefficient between pixel features and physical features. Using the calculated $(X_m, Y_m)$, the coordinate value of the CP in the physical coordinate system can be obtained according to the similar projection relationship.
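The feature-range expression is not reproduced in the text, so the sketch below rests on an assumption: that the RPS feature box is the smallest axis-aligned rectangle enclosing all individual circular features, each given as $(x_n, y_n, w_n, h_n)$ with $(x_n, y_n)$ the box center. The function names and sample values are hypothetical.

```python
def enclosing_box(features):
    """Smallest box (X_min, Y_min, X_max, Y_max) around all features.

    features: iterable of (x, y, w, h) tuples in pixels, where (x, y)
    is the centre of a feature box and (w, h) its size.
    """
    feats = list(features)
    x_min = min(x - w / 2.0 for x, _, w, _ in feats)
    y_min = min(y - h / 2.0 for _, y, _, h in feats)
    x_max = max(x + w / 2.0 for x, _, w, _ in feats)
    y_max = max(y + h / 2.0 for _, y, _, h in feats)
    return x_min, y_min, x_max, y_max


def box_center(box):
    """Centre (x_m, y_m) of a box given as (X_min, Y_min, X_max, Y_max)."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0


# Two illustrative circular features (pixel units).
box = enclosing_box([(10.0, 10.0, 4.0, 4.0), (30.0, 20.0, 6.0, 2.0)])
center = box_center(box)
```

Converting this pixel center to the physical center $(X_m, Y_m)$ still requires the conversion coefficient $a_m$ defined above.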

Identification Algorithm Model
In the YOLOV7 network, the number of feature extraction operations increases with the depth of the model, which leads to a high computational complexity. Based on the characteristics of the image data sets at the different recognition stages of the CP, and considering the requirements of image resolution, GPU memory and detection accuracy, the image data sets of the two stages are input into the neural network for training. The input image resolution is set to 960 × 960 in order to reduce the impact of image compression on the accuracy. Three detection heads of different sizes are used to output the results, including the location information, category information, and confidence of the CP features. Figure 3 shows the network structure.
The feature type and location of the CP have a coupling relationship based on the CBAM-YOLOV7-tinp-CTMA network structure. This paper proposes a data enhancement method for the input image based on the SFLDLC, which enhances the data generalization ability and improves the detection accuracy. The CBAM attention mechanism is fused into YOLOV7-tinp to identify the feature location information. These methods are combined with the CTMA to obtain the accurate feature location and label type, which provide a guarantee for the LP calculation of the CP relative to the camera.

Data Enhancement Method
The RPS feature is unique and there is no mutual substitution relationship between the spatial positions. Therefore, the image is randomly scaled and cropped, and the Mosaic method is used to achieve data enhancement. During the PPS, the feature type and position of the CP have a coupling relationship: the label type is determined not by the appearance of the feature but mainly by its position relative to the other features. Therefore, when the data are enhanced, the location characteristics cannot be changed, but the image can still be zoomed, cropped, and enhanced using the Mosaic method. The categories can be changed at the same time, but the original category characteristics of the feature do not change. In addition, this paper proposes a data enhancement method based on the SFLDLC, which enhances the ability of data generalization and improves the target recognition accuracy. In this paper, the round features of the CP from left to right, top to bottom, are defined as features 1 to 9, as shown in Figure 4.
During data enhancement, when each feature is enhanced by traditional data methods, the feature is first extracted from the image for use. When a feature is moved to another position, its label category is determined by that position and has no relation to the feature itself; different features therefore take the label category of the position they replace. The constraint conditions of the data during enhancement are as follows: where $x_n, y_n$ represent the center point coordinates of the nth feature, and $w_n, h_n$ represent the length and width of the nth feature. The adjustment coefficients of the distance between the first and third rows, first and second rows, and second and third rows are denoted by $a_1$, $a_2$, and $a_4$, respectively, while $a_3$ represents the degree of adhesion between all features.
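The core idea of SFLDLC, that a label follows the position rather than the appearance of a feature, can be illustrated with a short sketch. This is not the authors' implementation: the dictionary layout, the function name `sfldlc_augment`, and the string stand-ins for image crops are hypothetical, and the constraint conditions with coefficients $a_1$ to $a_4$ are not modeled.

```python
import random


def sfldlc_augment(features, rng):
    """Shuffle feature crops among the labelled positions.

    features: list of dicts with keys
        "label": category id tied to the position (1..9),
        "box":   (x, y, w, h) of that position in the image,
        "patch": the image crop currently shown at that position.
    Labels and boxes stay with their positions; only the crops move,
    so a crop takes the label of the position it is pasted into.
    """
    patches = [f["patch"] for f in features]
    rng.shuffle(patches)
    return [
        {"label": f["label"], "box": f["box"], "patch": p}
        for f, p in zip(features, patches)
    ]


# Nine illustrative positions; strings stand in for image crops.
features = [
    {"label": n, "box": (10.0 * n, 5.0, 4.0, 4.0), "patch": f"crop_{n}"}
    for n in range(1, 10)
]
augmented = sfldlc_augment(features, random.Random(0))
```

In a real pipeline each moved crop would also need resizing to the target box before being pasted back into the image.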

Attention Mechanism
The CBAM is a convolutional attention module that combines channel attention and spatial attention. As Figure 5 shows, given the intermediate feature map $F \in \mathbb{R}^{C \times H \times W}$ as the input, the CBAM infers attention maps in turn along two independent dimensions, channel and spatial. Subsequently, it multiplies each attention map with the input feature map to refine the features, which adds little computation while improving the expressive ability of the network. In order to extract the effective contour features of the target and obtain the main content of the target detection, the channel attention module is introduced, which is calculated as follows:
$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big)$$
where $\sigma$ represents the sigmoid function, $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ are the MLP weights shared by both inputs, and the ReLU activation function follows $W_0$. $F^c_{avg}$ and $F^c_{max}$ represent the features generated by average pooling and maximum pooling over the spatial dimensions, respectively. $H$ is the height, $W$ is the width, $C$ is the number of channels, and $r$ is the reduction rate.
In order to accurately locate the detected target and improve the target detection accuracy, the spatial attention module is introduced to focus on key features. It is calculated according to the following expression:
$$M_s(F) = \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big) = \sigma\big(f^{7 \times 7}([F^s_{avg}; F^s_{max}])\big)$$
where $F^s_{avg}$ and $F^s_{max}$ represent the features obtained by average pooling and maximum pooling across the channels, respectively, and $f^{7 \times 7}$ represents the convolution operation with a filter size of 7 × 7.
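The two attention steps can be traced on a tiny feature map with a dependency-free sketch. It follows the standard CBAM formulation (shared MLP on pooled descriptors, then a convolution over the channel-wise average and maximum maps) and is not the authors' network code; nested lists stand in for tensors, zero padding is assumed for the convolution, and the all-zero weights are only a toy choice that makes every gate equal to 0.5.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mlp(vec, w0, w1):
    """Shared two-layer MLP: W1 * ReLU(W0 * vec)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w0]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w1]


def channel_attention(fmap, w0, w1):
    """One gate per channel from spatially pooled descriptors."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(map(max, ch)) for ch in fmap]
    return [sigmoid(a + b)
            for a, b in zip(mlp(avg, w0, w1), mlp(mx, w0, w1))]


def spatial_attention(fmap, kernel):
    """One gate per pixel; kernel is [2][k][k] over (avg, max) maps."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    k = len(kernel[0])
    pad = k // 2
    avg = [[sum(fmap[n][i][j] for n in range(c)) / c for j in range(w)]
           for i in range(h)]
    mx = [[max(fmap[n][i][j] for n in range(c)) for j in range(w)]
          for i in range(h)]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for m, plane in enumerate((avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding
                            acc += kernel[m][di][dj] * plane[ii][jj]
            out[i][j] = sigmoid(acc)
    return out


def cbam(fmap, w0, w1, kernel):
    """Apply channel attention, then spatial attention; shape is kept."""
    mc = channel_attention(fmap, w0, w1)
    refined = [[[mc[n] * v for v in row] for row in fmap[n]]
               for n in range(len(fmap))]
    ms = spatial_attention(refined, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in refined]


# Toy input: C=2 channels, 3x3 map, reduction rate r=2, zero weights.
fmap = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
        [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]]
w0 = [[0.0, 0.0]]                      # shape (C/r, C)
w1 = [[0.0], [0.0]]                    # shape (C, C/r)
kernel = [[[0.0] * 7 for _ in range(7)] for _ in range(2)]
out = cbam(fmap, w0, w1, kernel)       # every value scaled by 0.5 * 0.5
```

Note that the output has the same C × H × W shape as the input; the module reweights features rather than shrinking them.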

CTMA
The classification accuracy of model recognition in the output layer is improved in this paper by introducing the CTMA. It defines all contour pixel positions and peripheral rectangle contour information that meet the feature points as $(x_{pn}, y_{pn}, w_{pn}, h_{pn})$, $n = 1, 2, 3, \cdots$, and thus establishes the contour matching function between features 1, 2 and 8. The specific optimization method is as follows:
$$\begin{aligned} D_{nm} &= \sqrt{(y_n - y_m)^2 + (x_n - x_m)^2} - (r_{om} + r_{on}) \cdot \frac{c_{bn} w_{bn} + c_{bn} h_{bn}}{2 r_{on}} \\ \sqrt{(x_{bn} - x_{bm})^2 + (y_{bn} - y_{bm})^2} &= c_{length\_nm}\left[\frac{c_{bn} w_{bn} + c_{bn} h_{bn} + c_{bm} w_{bm} + c_{bm} h_{bm}}{4} + D_{nm}\right] \\ \sqrt{(x_{bn} - x_{bj})^2 + (y_{bn} - y_{bj})^2} &= c_{length\_nj}\,\frac{c_{bn} w_{bn} + c_{bn} h_{bn} + c_{bj} w_{bj} + c_{bj} h_{bj}}{2} \end{aligned} \tag{6}$$
where $D_{nm}$ represents the nearest distance between the outer surfaces of features n and m; $c_{length\_nm}$ represents the deviation coefficient of features m and n; $c_{length\_nj}$ represents the deviation coefficient of features j and n; $a$ represents the adjustment coefficient, and $R$ represents the contour matching degree.
According to all the detected contour information, Equation (6) is used to match and locate the contour points, and the located contour information is used to calculate the labels of all features based on Equation (7). The specific optimization method is as follows:
$$\begin{aligned} x_{bn} &= \sin(\theta_{bn}) \cdot \sqrt{x_n^2 + y_n^2} \\ y_{bn} &= \cos(\theta_{bn}) \cdot \sqrt{x_n^2 + y_n^2} \\ \theta_{bn} &= \operatorname{arccot}\frac{y_{on}}{x_{on}} - \operatorname{arccot}(\ldots) \cdot s + \operatorname{arccot}\big(2(y_{o8} - y_{o5}) \ldots\big) \end{aligned} \tag{7}$$
where $s$ represents the direction of the feature point: $s$ in features 1, 3, 4, 6, 7 and 8 is 1, and $s$ in features 2, 5 and 9 is −1; $\theta_{bn}$ represents the deflection angle. Based on the above matching conditions, the label of the feature is reassigned to ensure the accuracy of the feature label and to reduce the cases where the LP cannot be solved due to an abnormal classification label.

Loss Function
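The loss function used in YOLOV7-tinp is CIoU-Loss. It is calculated as follows:
$$L_{CIoU} = 1 - IOU + \frac{\rho^2(A, B)}{c^2} + \alpha v, \quad IOU = \frac{|A \cap B|}{|A \cup B|}, \quad v = \frac{4}{\pi^2}\left(\arctan\frac{w^B}{h^B} - \arctan\frac{w^A}{h^A}\right)^2, \quad \alpha = \frac{v}{(1 - IOU) + v}$$
where IOU represents the ratio of the overlapping area to the union area of the prediction box and the real box; A represents the prediction box; B represents the real box; $w$ and $h$ are the width and height of the corresponding box; α is the weight function; v is the consistency of the aspect ratio; ρ(A, B) is the Euclidean distance between the center point coordinates of the A box and the B box, and c is the diagonal distance of the smallest box wrapping box A and box B.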
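For readers who want to check the CIoU terms numerically, here is a minimal sketch using the standard CIoU definition: one minus IoU, plus the squared center distance over the squared enclosing diagonal, plus the weighted aspect-ratio term. The corner-based box format `(x1, y1, x2, y2)`, the function name, and the small `eps` stabilizer are assumptions of this sketch, not details from the paper.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss between prediction box A and real box B.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # rho^2: squared distance between the two box centres.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2
            + (ay1 + ay2 - by1 - by2) ** 2) / 4.0

    # c^2: squared diagonal of the smallest box enclosing A and B.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # v: aspect-ratio consistency; alpha: its trade-off weight.
    v = (4.0 / math.pi ** 2) * (
        math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0))
apart = ciou_loss((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
```

Identical boxes give a loss close to zero, while disjoint boxes give a loss above 1 because the center-distance penalty is added on top of the `1 - IoU` term.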

Model Evaluation
The main evaluation indicators selected to verify the effectiveness of the proposed model are precision (P), recall (R), and mean average precision (mAP). The formulas for calculating these indicators are as follows:
$$P = \frac{T_P}{T_P + F_P}, \quad R = \frac{T_P}{T_P + F_N}, \quad AP = \int_0^1 P(R)\,dR, \quad mAP = \frac{1}{C}\sum_{i=1}^{C} AP_i$$
where $T_P$ represents an actual positive case that is judged as positive by the classifier; $F_P$ represents an actual negative case that is judged as positive by the classifier; $F_N$ represents an actual positive case that is judged as negative by the classifier; and $C$ represents the number of detection categories. This study only needs to identify the circular features of the CP; therefore, $C \le 9$. The average value of AP over all categories is represented by mAP, which measures the overall performance of the target detection algorithm.
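As a small worked example of these indicators, the sketch below computes precision, recall, and an all-point interpolated AP from a precision-recall curve. The interpolation scheme is one common convention and is an assumption here, since the paper does not state which one it uses; the function names and sample counts are illustrative.

```python
def precision(tp, fp):
    """P = TP / (TP + FP); defined as 0 when nothing was predicted."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """R = TP / (TP + FN); defined as 0 when there are no positives."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(recalls, precisions):
    """Area under the PR curve using the monotone precision envelope.

    recalls must be sorted in ascending order and paired with precisions.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """mAP = mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)


p = precision(tp=9, fp=1)
r = recall(tp=9, fn=3)
ap = average_precision([0.5, 1.0], [1.0, 0.5])
m_ap = mean_average_precision([ap, 1.0])
```

For the two-point curve above, the area is 0.5 × 1.0 + 0.5 × 0.5 = 0.75.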

Location Solution in RPS
Using the calculated $(X_m, Y_m)$, the coordinate value of the CP in the physical coordinate system can be deduced according to the similar projection relationship. It is calculated as follows: where $(X, Y, D_z)$ is the actual spatial position of the target point relative to the camera; $(x, y)$ is the pixel position of the target point in the image; $L_w$ and $L_h$ represent the length and width of the target circular feature, respectively; $L_{iw}$ and $L_{ih}$ represent the pixel size of the CP feature in the length and width directions, respectively; $s_c$ represents the pixel size of the camera, and $s_w$ and $s_h$ represent the length and width pixel sizes of the image, respectively. The location information of the CP can be calculated using Equation (10), which guides the robot end to move to the focal length of the camera and provides a guarantee for the PPS of the CP.
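Equation (10) is not reproduced in the text, so the following is only a sketch of a similar-triangles (pinhole) estimate built from the symbols listed above. It assumes an ideal, distortion-free camera with its optical axis through the image center; the focal length parameter `focal_mm` is not among the paper's listed symbols, and all numeric values are illustrative.

```python
def rough_position(x, y, l_iw, l_ih, l_w, l_h, s_c, s_w, s_h, focal_mm):
    """Estimate (X, Y, D_z) in mm from one detected feature.

    x, y        : pixel position of the target point
    l_iw, l_ih  : feature size in pixels (length, width)
    l_w, l_h    : physical feature size in mm (length, width)
    s_c         : physical size of one sensor pixel in mm
    s_w, s_h    : image size in pixels (length, width)
    focal_mm    : lens focal length in mm (assumed parameter)
    """
    # Millimetres per pixel at the target plane, averaged over both axes.
    scale = 0.5 * (l_w / l_iw + l_h / l_ih)
    # Offset from the image centre, converted to millimetres.
    big_x = (x - s_w / 2.0) * scale
    big_y = (y - s_h / 2.0) * scale
    # Similar triangles: D_z / focal = physical size / size on the sensor.
    d_z = focal_mm * scale / s_c
    return big_x, big_y, d_z


# Illustrative numbers: 50 mm feature seen as 100 px, 3.75 um pixels.
X, Y, D_z = rough_position(x=646.0, y=482.0, l_iw=100.0, l_ih=100.0,
                           l_w=50.0, l_h=50.0, s_c=0.00375,
                           s_w=1292.0, s_h=964.0, focal_mm=8.0)
```

A target at the image center yields zero lateral offset, and halving the apparent pixel size of the feature doubles the estimated distance.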

PPS LP Solution
The pixel position information of effective feature points can be obtained using the above algorithm. Combined with the three-dimensional spatial position of the CP, the pose estimation can be converted into a PnP problem [37]. Therefore, we use the pixel coordinates $(x_{apn}, y_{apn})$ and the corresponding spatial position coordinates $(x_{on}, y_{on}, z_{on})$. Subsequently, the position information $(x_{pos}, y_{pos}, z_{pos})$ and attitude information $(x_{ang}, y_{ang}, z_{ang})$ of the CP coordinate origin relative to the camera center point can be obtained. Different methods of solving the PnP problem need different numbers of effective feature points.
In space, based on the vector set composed of the three-dimensional coordinates of at least six feature points, the position of any coordinate point can be represented as a weighted sum of control points as follows:
$$p_i^w = \sum_{j=1}^{4} \alpha_{ij} c_j^w, \quad \sum_{j=1}^{4} \alpha_{ij} = 1$$
where $p_i^w$ is the point with known three-dimensional coordinates in the world coordinate system; $c_j^w$ is the jth control point in the world coordinate system, and $\alpha_{ij}$ is the weight coefficient. Figure 6 shows the EPnP algorithm location process.
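The control-point representation can be made concrete with a small sketch. It uses the simplest possible control points (the origin and the three unit axes), for which the weights have a closed form; EPnP implementations usually pick control points from the data, so this choice, the function names, and the sample rigid motion are all illustrative assumptions.

```python
# Control points c_1..c_4: origin plus the three unit axes (toy choice).
CONTROL = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def barycentric_weights(p):
    """Weights alpha_1..alpha_4 of p w.r.t. CONTROL; they sum to 1."""
    x, y, z = p
    return (1.0 - x - y - z, x, y, z)


def combine(weights, controls):
    """Weighted sum of control points: sum_j alpha_j * c_j."""
    return tuple(sum(a * c[k] for a, c in zip(weights, controls))
                 for k in range(3))


def rigid(p, t):
    """Example rigid motion: rotate 90 degrees about z, then translate."""
    x, y, z = p
    return (-y + t[0], x + t[1], z + t[2])


p_world = (0.2, -0.5, 1.5)
alpha = barycentric_weights(p_world)

# The same weights reproduce the point after any rigid motion, which is
# why EPnP only has to estimate the four control points in camera space.
t = (10.0, 20.0, 30.0)
camera_controls = [rigid(c, t) for c in CONTROL]
p_camera = combine(alpha, camera_controls)
```

For an off-the-shelf solver, OpenCV's `cv2.solvePnP` accepts the `cv2.SOLVEPNP_EPNP` flag; whether the authors used it is not stated in the text.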

According to the above positioning process, the pixel coordinates of each feature based on the target recognition result are taken as input. In order to ensure the calculation accuracy, the weight coefficient $\alpha_{ij}$ is calculated only when the number of feature points is not less than six. Subsequently, the feature points are calculated in the camera coordinate system. The error is minimized by the Gauss-Newton algorithm, and the translation vector t and rotation vector R are calculated. Finally, the position and orientation information of the CP can be obtained.

Results
The tests are conducted under the Windows 10 operating system on a machine with an Intel(R) Core(TM) i7-10700K CPU @ 3.80 GHz, 3.79 GHz memory, and an Nvidia GeForce RTX 3080 graphics card. The code is written in Python 3.9 on the PyCharm programming platform, and PyTorch 1.6 is selected as the deep learning framework. Training is performed on the GPU. During the performance test, the CPU is used for comparative testing so that the conditions are similar to the actual application scenario.

Judgment Basis of CP LP Error
During data collection, the robot was fixed on the base in order to obtain the actual position and orientation information of the CP relative to the camera. The world coordinates of the base were kept unchanged, and the robot end was inserted into the CP by teaching. This state was taken as the zero LP. The robot was then moved randomly out of the CP within the recognition range. By combining the LP information of the robot during data collection with the zero LP information, the LP information of the CP relative to the end of the manipulator was obtained. Subsequently, the actual LP information of the CP relative to the camera was calculated. The absolute difference between the actual LP information and the theoretical relative LP calculated in this paper was used as the basis for evaluating the accuracy of the algorithm.
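The chain of relative poses described above can be sketched with homogeneous transforms. This is only an illustration under stated assumptions: at the zero LP the robot end frame is taken to coincide with the CP frame, `end_T_cam` stands for a hand-eye calibration result, and all names and numbers are hypothetical rather than taken from the experiments.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def translation(x: float, y: float, z: float) -> Matrix:
    """4x4 homogeneous transform with identity rotation."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t: Matrix) -> Matrix:
    """Invert a rigid transform [R p; 0 1] as [R^T, -R^T p; 0 1]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def cp_in_camera(base_T_end_zero: Matrix, base_T_end: Matrix,
                 end_T_cam: Matrix) -> Matrix:
    """Actual pose of the CP in the camera frame for one robot pose."""
    # CP relative to the manipulator end (zero LP taken as the CP pose).
    end_T_cp = matmul(invert_rigid(base_T_end), base_T_end_zero)
    # CP relative to the camera via the assumed hand-eye transform.
    return matmul(invert_rigid(end_T_cam), end_T_cp)


def abs_errors(actual: Sequence[float], estimated: Sequence[float]) -> List[float]:
    """Per-component absolute difference used as the accuracy criterion."""
    return [abs(a - e) for a, e in zip(actual, estimated)]


# Hypothetical poses in millimetres, translations only for readability.
cam_T_cp = cp_in_camera(
    translation(500.0, 0.0, 300.0),  # robot end at the zero LP
    translation(300.0, 0.0, 300.0),  # robot end when the image was taken
    translation(0.0, 50.0, 0.0),     # camera mounted on the robot end
)
actual_xyz = [cam_T_cp[i][3] for i in range(3)]
errors = abs_errors(actual_xyz, [200.6, -49.1, 1.2])  # hypothetical estimate
```

The same comparison applies to the three rotation components once both poses are expressed with the same angle convention.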

LP Accuracy Test in RPS
The RPS is mainly divided into feature recognition and LP resolution of the CP. Figure 7 shows the recognition performance of the feature points in different scenarios. The theoretical LP information is obtained with the LP resolution algorithm proposed in this paper, and the actual LP error is then computed. A comparison of the different recognition methods in the RPS is provided in Table 4. The experimental results in Table 4 show that the precision of CBAM-YOLOV7-tinp is 0.002 higher than that of Faster RCNN, 0.003 higher than that of YOLOV3, and 0.001 higher than those of YOLOV4, YOLOV5s, and YOLOV7-tinp. The recall of CBAM-YOLOV7-tinp is 0.02 higher than those of Faster RCNN, YOLOV3, and YOLOV4, and 0.001 higher than those of YOLOV5s and YOLOV7-tinp. Taking mAP @ 0.5:0.95 as an example, CBAM-YOLOV7-tinp has the highest accuracy, which is 0.005 higher than that without the CBAM. In actual positioning, false recognition must be reduced as far as possible to avoid damaging the manipulator, so detection accuracy is prioritized. Therefore, CBAM-YOLOV7-tinp performs the best in terms of detection accuracy. Although the detection time increases slightly because of the added attention mechanism, the increase is acceptable given the improvement in each accuracy index.
Based on the comparison of the above results, we use CBAM-YOLOV7-tinp to identify the position of the feature target, substitute the feature position information into the LP solution model, and obtain the LP information in different scenarios, as shown in Table 5.
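As a reminder of how the precision and recall indices compared above relate to detection counts, the following minimal sketch may help. The counts are hypothetical and are not taken from Table 4. A false recognition is a false positive, so reducing false recognition raises precision.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of detections that are correct."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of real targets that are detected."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


# Hypothetical counts: 98 correct detections, 2 false detections, 4 misses.
p = precision(98, 2)
r = recall(98, 4)
```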

The positioning results in Table 4 show that the indoor accuracy is basically the same as that at night, and the accuracy in these cases is relatively high: the average errors in X, Y, and Z are 2.34 mm, 2.51 mm, and 2.64 mm, respectively. The accuracy in the sunny morning is basically the same as that on the cloudy day, with average errors in X, Y, and Z of 2.72 mm, 2.92 mm, and 2.98 mm, respectively. At noon on the sunny day, the errors in X, Y, and Z are 2.81 mm, 2.99 mm, and 3.17 mm, respectively. The average errors over all cases are 2.61 mm, 2.79 mm, and 2.90 mm, which can meet the needs of the RPS. The accuracy differences are related to the shooting clarity and the lighting of the image under different light field conditions.

LP Accuracy Test in PPS
The PPS is mainly divided into feature recognition and LP resolution of the CP. Figure 8 shows the effect of feature recognition in different scenarios. The theoretical LP information is obtained with the LP resolution algorithm in this paper, and the actual LP error is then computed. Table 6 shows the LP error of the CP in different scenarios. The experimental results in Table 5 show that, among Faster RCNN, YOLOV3, YOLOV4, YOLOV5s, and YOLOV7-tinp, YOLOV7-tinp outperforms the other models on the various indicators. Because the results of the PPS directly affect the positioning results, YOLOV7-tinp is further improved in order to raise the accuracy and meet the insertion requirements. The precision of the SFLDLC-CBAM-YOLOV7-tinp-CTMA algorithm proposed in this paper is 0.002 and 0.001 higher than those of YOLOV7-tinp and CBAM-YOLOV7-tinp, respectively. Its recall is 0.002 higher than that of YOLOV7-tinp; its mAP @ 0.5 is 0.002 and 0.001 higher than those of YOLOV7-tinp and CBAM-YOLOV7-tinp, respectively, and its mAP @ 0.5:0.95 is 0.005 and 0.003 higher than those of YOLOV7-tinp and CBAM-YOLOV7-tinp, respectively. In actual positioning, misidentification must be reduced as far as possible to avoid damaging the manipulator, so detection accuracy is prioritized. Therefore, SFLDLC-CBAM-YOLOV7-tinp-CTMA performs the best in terms of detection accuracy. Its detection time is slightly increased by the addition of SFLDLC, CBAM, and CTMA, but the increase is acceptable given the improvement in each accuracy index.
Based on the comparison of the above results, we use SFLDLC-CBAM-YOLOV7-tinp-CTMA to identify the position of the feature target, substitute the feature position information into the LP solution model, and obtain the LP information in different scenarios. The corresponding errors are shown in Table 7.

The features used in the PPS and the RPS share circular edges, so the detection and positioning accuracy trends in Table 6 are similar to those of the RPS. The positioning accuracy is basically the same indoors on sunny days, outdoors on cloudy days, and at night, and it is relatively high in these cases: the average errors in x, y, z, Rx, Ry, and Rz are 0.61 mm, 0.85 mm, 1.21 mm, 1.16 degrees, 0.94 degrees, and 0.54 degrees, respectively. The positioning accuracy is lower on outdoor sunny days, especially at noon. The average errors in x, y, z, Rx, Ry, and Rz on outdoor sunny days are 0.70 mm, 0.95 mm, 1.30 mm, 1.24 degrees, 1.14 degrees, and 0.64 degrees, respectively. The positioning accuracy can meet the needs of the PPS. The accuracy differences are related to the shooting clarity and the lighting of the image under different light field conditions.

Comparison of Results
To evaluate how the algorithm proposed in this paper compares with the state of the art, we compared it with three advanced electric vehicle CP identification and location methods. Table 8 shows the comparison results. Table 7 shows that when the three advanced methods are used to identify multicategory CPs, their robustness is low, their error is high, and they fail to identify and locate the CPs. Therefore, it is verified that the algorithm proposed in this paper is robust with respect to the identification of multiple types of CPs and has a significant application value.

Plug Test Verification
As the positioning accuracy on outdoor sunny days is low in the above tests, and the positioning accuracy of the other scenes is basically the same, we group the conditions into two cases, scene 1 and scene 2. We carried out 200 plug-in tests for each of these two cases. In these tests, the algorithm proposed in this paper is used for positioning, combined with the minimum mechanism of three iterations, and the AUBO-i10 6-DOF articulated robot is used for plugging. Table 9 shows the test results. Based on the identification and location algorithm proposed in this paper, the average plugging success rate of the CP is 96.5% in indoor (sunny/cloudy/night) conditions and 92.0% in outdoor sunny (morning/noon/afternoon) conditions.
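The two scene-level rates can be combined into an overall rate with a short calculation. The success counts below are not reported directly; they are derived here from the stated rates and the 200 tests per scene (96.5% of 200 is 193, and 92.0% of 200 is 184), and the variable names are illustrative.

```python
def success_rate(successes: int, trials: int) -> float:
    """Plugging success rate in percent."""
    return 100 * successes / trials


TRIALS_PER_SCENE = 200

# Counts derived from the reported rates (assumption: rates are exact).
scene_successes = {
    "other scenes": 193,    # 96.5% of 200
    "outdoor sunny": 184,   # 92.0% of 200
}

overall = success_rate(
    sum(scene_successes.values()),
    TRIALS_PER_SCENE * len(scene_successes),
)
```

With these counts, 377 of 400 insertions succeed, which gives 94.25% and agrees with the integrated insertion success rate stated in the abstract.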

Discussion
The LP errors are mainly caused by feature recognition positioning errors and system errors. Next, we discuss these two types of errors.

Errors Caused by Complex Scenes and Different Characteristics of CPs
Although the size of the CP follows a unified standard, its material and smoothness, the light angle and brightness, and manufacturer-specific differences affect the recognition of CP characteristics. These effects can be divided into the following five situations, as shown in Figure 9:
a. Differences in the recognition of the same CP at different times, locations, and scenes: The time periods include morning and afternoon, noon, and evening. The scenes mainly include indoor and outdoor. The orientations mainly concern the angle between the camera and the sun. These differences increase the recognition difficulty.
b. Differences of the same CP after different numbers of plugging operations: As the number of plugging operations increases, bumps appear on the surface of the CP. The consistency of the surface is damaged, which degrades the recognition performance of the algorithm.
c. Difficulty in identifying the color and structural characteristics of the same CP: The chamfer of the round feature causes the outline of the CP to deviate under different angles. The round feature and the background of the CP are both black. The inside of the CP has a circle similar to the feature, which increases the difficulty of identifying the feature.
d. Differences between CPs: These include the chamfer degree of the CP feature (d1), the surface material of the CP (d2), and the degree of reflection and smoothness of the CP surface (d3). These factors increase the difficulty of CP identification.
e. Blurred image at the non-focus position: To calculate the LP of the CP, an industrial camera with a fixed focal length is used. Therefore, the image is blurred at non-focal positions.
The aforementioned five conditions are the main reasons for the difficulty of CP feature recognition. Therefore, during the recognition process, the occurrence of these conditions should be minimized in order to reduce the interference of the complex environment with the algorithm. In Tables 5 and 6, the lighting environment of the images taken at night is relatively stable, so good performance would be expected. However, the positioning results are similar to those for indoor daytime, and no higher accuracy is obtained in the stable light field. The main reason is that, at the time of data acquisition, the surfaces of some CPs are smooth, which causes specular reflection. When only fill light is available, the effective surfaces of the CPs and the camera lens are parallel or form a certain angle, and the effective feature area of the CPs is relatively large. When specular reflection occurs at a few feature locations during data acquisition, a large amount of reflected light enters the camera aperture, resulting in overexposure of that part. At other features, specular reflection directs most of the light away from the camera, which easily causes the loss of feature information. At noon in the outdoor environment, shadows appear inside the CP because of the angle between the sun and the CP. Shadows can introduce additional interfering features. The parts under direct and non-direct sunlight differ greatly in the amount of light they receive. This situation is the most complex and causes the largest recognition error.

LP Calculation Error
The accuracy of the EPnP algorithm depends on the pixel coordinates of the feature points and on their actual three-dimensional coordinates. Furthermore, the number of effective features, the camera calibration, and the hand-eye calibration accuracy affect the solution results. The pixel feature position is directly related to the recognition accuracy of the feature points. The deviation of the actual three-dimensional coordinate position is related to how the features are expressed in the image. The number of effective feature points is related to the number of features recognized by the algorithm. The calibration accuracy has a relatively small impact on the positioning accuracy.
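The dependence on pixel accuracy can be made concrete with the pinhole camera model, in which a lateral offset X at depth Z projects to a pixel offset of f·X/Z for a focal length f expressed in pixels. The sketch below inverts this relation to estimate how a pixel localization error translates into a lateral position error. The function name and all numerical values are hypothetical and do not describe the camera used in the experiments.

```python
def lateral_error_mm(depth_mm: float, pixel_error_px: float,
                     focal_length_px: float) -> float:
    """Approximate lateral position error caused by a pixel error.

    Pinhole model: u - c_x = f * X / Z, so dX = Z * du / f at fixed depth.
    """
    if focal_length_px <= 0:
        raise ValueError("focal length must be positive")
    return depth_mm * pixel_error_px / focal_length_px


# Hypothetical setup: feature 300 mm from the camera, focal length 1500 px,
# feature centre located with an error of 1 px.
error_mm = lateral_error_mm(depth_mm=300.0, pixel_error_px=1.0,
                            focal_length_px=1500.0)
```

Under these assumed values, a one-pixel error corresponds to a lateral error of roughly 0.2 mm, and the error grows in proportion to the working distance.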

System Error
The system error is mainly caused by the robot positioning accuracy and includes three aspects: first, the repeat positioning accuracy of the robot; second, the vibration of the robot base during image acquisition and plugging; and last, the positioning error caused by the effect of gravity on the robot at different positions.

Conclusions
This paper proposed an electric vehicle CP identification and location algorithm based on CBAM-YOLOV7-tinp-CTMA, which realized CP identification and location for multiple categories, multiple scenes, and a wide range. The recognition process was divided into two stages, and a recognition and positioning model was established for each stage. The LP was calculated based on the similar projection relationship and the EPnP algorithm, and the insertion test was completed by using the mechanical arm.
The two stages were tested in this paper. The average positioning errors (x, y, z) of the CP in the RPS were 2.61 mm, 2.79 mm, and 2.90 mm, respectively. The average LP errors (x, y, z, rx, ry, and rz) of the CP in the PPS were 0.64 mm, 0.88 mm, 1.24 mm, 1.19 degrees, 1.00 degrees, and 0.57 degrees, respectively. Across the scenarios, the higher the positioning accuracy, the greater the plugging success rate. The plugging success rate on outdoor sunny days was 92.0%, and in the other cases it was 96.5%. Compared with the existing advanced methods, the algorithm proposed in this paper had a high universality and could identify and position various types of CPs. It provides a theoretical basis for the positioning of various CPs and has a high engineering application value.
In the future, more data covering additional CP types and more complex environments will be added. The algorithm will be further optimized to improve its adaptability and recognition accuracy, increase the success rate of plugging, and reduce the impact of the plugging process on robots and vehicles. If visual positioning fails, vibration signals could be used to compensate for the visual positioning errors, thereby avoiding accidents.

Acknowledgments: The authors would like to thank the anonymous referees and all of the authors cited in this article for their helpful suggestions.

Conflicts of Interest:
The authors declare no conflict of interest.

CP: Charging port
RPS: Rough positioning stage
PPS: Precise positioning stage
CBAM: Convolutional block attention module
CTMA: Cluster template matching algorithm
LP: Location and posture
SFLDLC: Similar feature location to determine label category