1. Introduction
In recent years, unmanned aerial vehicle (UAV) technology has been widely used in transmission line inspection, providing an effective way to overcome the limitations of traditional inspection methods [1, 2]. Traditional manual inspection is inefficient, labor-intensive, prone to omissions, and ill-suited to quantitative evaluation, so it cannot meet the urgent needs of intelligent management of modern power grids [3]. With the rapid development of remote sensing and artificial intelligence, UAV inspection systems equipped with LiDAR sensors can quickly acquire high-precision, high-density three-dimensional point cloud data of transmission line corridors [4]. With their data acquisition capability and reliability, these systems provide key technical support for the fine-grained monitoring of power transmission facilities [5, 6, 7].
However, although onboard LiDAR has shown significant advantages in data collection, the intelligent processing of point cloud data still faces many challenges. In particular, accurately identifying and segmenting key targets such as wires, towers, and insulators in complex point clouds, and accurately locating key monitoring points, has become a bottleneck that restricts the development of autonomous UAV inspection [8]. At present, UAV-based power line inspection has gradually evolved from manually controlled visible-light photography to intelligent inspection based on LiDAR point clouds [9, 10, 11]. In practice, however, inspectors still need to manually select target points indoors from large volumes of three-dimensional point cloud data in order to plan the inspection route. This process is time-consuming, labor-intensive, and error-prone [12, 13]. Therefore, the automatic extraction and accurate positioning of keypoints is of great theoretical and practical value for improving inspection efficiency, reducing operation and maintenance costs, and advancing autonomous UAV inspection [14, 15].
In power transmission systems, in addition to path planning, the precise positioning of key components such as insulator strings is equally important for fault diagnosis and preventive maintenance. Keypoint positioning has been widely studied in computer vision and robotics. In two-dimensional image processing, deep learning-based object detection and keypoint extraction methods have made remarkable progress in the detection of power facilities, including insulator detection and defect identification [16, 17]. However, keypoint extraction from three-dimensional point clouds still faces many challenges, such as high computational complexity and strict data quality requirements [18, 19]. Point cloud data are unstructured, unordered, and sparse, which makes it difficult to apply traditional two-dimensional methods directly to three-dimensional point clouds [20, 21, 22].
As a key component of the transmission system, the insulator string is crucial to the safe operation of the power grid, which makes monitoring its status essential [23]. Existing insulator defect detection methods mainly rely on visible-light or infrared images [24], and deep learning-based defect identification has made remarkable progress. Liu et al. [25] reviewed deep learning-based insulator defect detection methods and pointed out that existing methods still face challenges in detection accuracy and real-time performance in complex environments. These image-based methods are highly susceptible to lighting conditions, camera angle, weather, and other factors; under strong light, shadow occlusion, or smog, the detection accuracy drops significantly [26]. In contrast, point cloud-based insulator keypoint extraction can provide more accurate and stable three-dimensional location information [27], but research in this field is still limited [28]. Shen et al. [28] proposed an automatic tower detection framework based on hierarchical coarse-to-fine segmentation, which improved detection accuracy in complex scenarios through a multi-scale segmentation strategy. However, this method mainly targets the main structure of the tower, and its ability to locate fine components such as insulators is limited. Chen et al. [29] proposed an insulator string extraction method based on multi-scale feature histograms, which achieves automatic identification through geometric feature analysis of the point cloud. However, this method relies on prior knowledge of towers and transmission lines, and its adaptability to complex scenarios still needs improvement. Zhang et al. [30] proposed an automatic tower extraction framework based on UAV LiDAR point clouds, demonstrating the potential of point cloud data for identifying power facilities and laying the foundation for subsequent research.
Whether for insulator strings or for other power transmission facilities, efficient three-dimensional point cloud keypoint detection is the core technology. Keypoint detection has been widely studied in three-dimensional point cloud processing. Early methods were mainly based on geometric features and identified feature points through indicators such as curvature and normal vector variation [31, 32]. Traditional three-dimensional keypoint detectors, such as Intrinsic Shape Signatures (ISS) [33], identify salient feature points from the eigenvalue distribution of local point cloud neighborhoods and show good robustness in tasks such as rigid body matching. Traditional geometric feature descriptors, such as linearity, planarity, and scattering based on principal component analysis (PCA), have been widely used in the geometric structure analysis of point clouds [34]. These features characterize how the point cloud is distributed along its principal directions by computing ratios of the eigenvalues of the covariance matrix. For V-shaped insulator strings, however, PCA features have a limitation: because the V-shaped structure consists of two approximately linear branches, the PCA of the whole structure may still show high linearity, which makes it difficult to distinguish V-shaped insulators from single-string linear insulators. Methods based on such handcrafted features often suffer from low computational efficiency and limited feature representation ability when dealing with complex scenarios and large-scale point clouds.
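The PCA-based descriptors discussed above can be illustrated with a short sketch. It assumes one commonly used definition, linearity = (λ1 - λ2)/λ1, planarity = (λ2 - λ3)/λ1, and scattering = λ3/λ1 with λ1 ≥ λ2 ≥ λ3; the exact formulation in [34] may differ. The function name pca_shape_features and the synthetic point sets (a straight string and a V with an assumed 15° half-angle) are illustrative only and are not taken from the paper's data.

```python
import numpy as np


def pca_shape_features(points):
    """Return (linearity, planarity, scattering) for an (N, 3) point array.

    Assumed definitions (hypothetical, one common convention):
    linearity = (l1 - l2) / l1, planarity = (l2 - l3) / l1,
    scattering = l3 / l1, with eigenvalues l1 >= l2 >= l3.
    """
    pts = np.asarray(points, dtype=float)
    cov = np.cov(pts, rowvar=False)            # 3x3 covariance matrix
    evals = np.linalg.eigvalsh(cov)[::-1]      # sort eigenvalues in descending order
    evals = np.clip(evals, 0.0, None)          # remove tiny negative round-off values
    l1, l2, l3 = (float(v) for v in evals)
    if l1 == 0.0:                              # degenerate case: all points coincide
        return 0.0, 0.0, 0.0
    return (l1 - l2) / l1, (l2 - l3) / l1, l3 / l1


# Synthetic straight string: points along the x-axis.
t = np.linspace(0.0, 1.0, 200)
zeros = np.zeros_like(t)
straight = np.stack([t, zeros, zeros], axis=1)

# Synthetic V-shaped string: two straight branches meeting at the origin,
# each tilted by an assumed half-angle of 15 degrees from the z-axis.
half_angle = np.radians(15.0)
branch_a = np.stack([t * np.sin(half_angle), zeros, t * np.cos(half_angle)], axis=1)
branch_b = np.stack([-t * np.sin(half_angle), zeros, t * np.cos(half_angle)], axis=1)
v_shape = np.vstack([branch_a, branch_b])

print("straight:", pca_shape_features(straight))
print("V-shaped:", pca_shape_features(v_shape))
```

In this synthetic case the straight string has a linearity of essentially 1, while the narrow V still scores well above 0.5, which illustrates why a global linearity measure alone is a weak cue for separating the two insulator types.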
In recent years, deep learning methods have made remarkable progress in three-dimensional keypoint detection. PointNet [35] and its extension, PointNet++ [36], pioneered deep learning architectures that process point clouds directly, achieving permutation invariance through multi-layer perceptrons and symmetric functions and laying the foundation for subsequent research. Subsequently, various deep learning-based point cloud processing methods were proposed, including Relation-Shape Convolutional Neural Networks [37] and Point Transformer [38], which enhance feature extraction through self-attention mechanisms and more complex network architectures. These methods learn the relationships between points and their context, achieve higher accuracy in tasks such as shape classification and semantic segmentation [20], and offer a new technical route for keypoint detection in complex scenarios.
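The permutation invariance mentioned above can be checked numerically with a minimal sketch. It is not the PointNet architecture itself: it assumes a two-layer perceptron shared across points, with random, untrained weights, and uses max pooling as the symmetric function. The names shared_mlp and global_feature, the layer sizes, and the random point cloud are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def shared_mlp(points, w1, b1, w2, b2):
    """Apply the same two-layer perceptron to every point independently."""
    hidden = np.maximum(points @ w1 + b1, 0.0)   # ReLU activation
    return np.maximum(hidden @ w2 + b2, 0.0)     # (N, 32) per-point features


def global_feature(points, params):
    """Aggregate per-point features with max pooling, a symmetric function."""
    return shared_mlp(points, *params).max(axis=0)


# Random, untrained weights (assumed sizes: 3 -> 16 -> 32).
params = (
    rng.normal(size=(3, 16)), np.zeros(16),
    rng.normal(size=(16, 32)), np.zeros(32),
)

cloud = rng.normal(size=(128, 3))                 # synthetic point cloud
shuffled = cloud[rng.permutation(len(cloud))]     # same points, different order

feat_a = global_feature(cloud, params)
feat_b = global_feature(shuffled, params)
print("order-independent:", np.allclose(feat_a, feat_b))
```

Because each point passes through identical weights and the maximum over points does not depend on their order, the global feature is the same for any ordering of the input.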
For transmission tower keypoint detection specifically, Wu et al. [39] proposed the UPKD method. It uses unsupervised learning to extract detection points from three-dimensional LiDAR data and identifies the structural characteristics of the tower through clustering algorithms, showing potential for detecting tower keypoints. Li et al. [15] developed a deep learning-based target point positioning algorithm for UAV inspection route planning and realized end-to-end detection from point cloud to inspection target points. Although end-to-end deep learning methods have made remarkable progress in two-dimensional image keypoint detection in recent years, applying them directly to the keypoint detection of transmission towers in three-dimensional point clouds still faces many challenges. First, in terms of data requirements, end-to-end deep learning requires large-scale annotated data for training, and the three-dimensional annotation of transmission tower keypoints is expensive and time-consuming, making it difficult to accumulate sufficient training samples quickly [40]. Second, in terms of generalization ability, different types of transmission towers differ significantly in structure. The performance of deep learning models may fluctuate on unseen tower types or under different acquisition conditions (such as different distances and point cloud densities), which requires continuous data accumulation and model retraining. Third, in terms of actual deployment, the “black box” nature of deep learning models limits their interpretability and debuggability, and their real-time performance needs improvement, which makes it difficult to meet the online processing needs of large-scale point cloud data. The SKD method proposed by Tinchev et al. [41] detects keypoints based on saliency, which alleviates the dependence on annotated data to a certain extent, but its applicability to transmission scenarios has not been fully verified. The weakly supervised learning method (“One Thing One Click”) proposed by Liu et al. [42] reduces the labeling workload through a self-training strategy and provides new ideas for reducing labeling costs. However, this method mainly targets semantic segmentation, and its application to keypoint detection needs further study. In its method design, this paper therefore focuses on the following practical requirements: (1) no large-scale annotated training set is needed, and only a small number of samples are required for parameter tuning, which significantly reduces the cost of data preparation; (2) the algorithm logic is clear and transparent, which facilitates targeted adjustment and troubleshooting for the actual scenario; and (3) the method generalizes well across tower types and point cloud qualities and can adapt to new scenarios without retraining.
The extraction of transmission tower keypoints faces the following core challenges: (1) transmission tower structures are complex and diverse, and the geometric characteristics of different tower types differ significantly [43, 44]; (2) the quality of point cloud data is affected by factors such as scanning distance and occlusion, which manifests as uneven density and missing local data [45]; and (3) the positioning accuracy of keypoints needs to be further improved so that target points can be located precisely enough for autonomous UAV inspection [15].
The main contribution of this paper is the TPKE method, which can automatically extract key detection points from multiple types of transmission towers after semantic segmentation. The structure of the method is as follows:
Section 2.1 briefly introduces the preprocessing semantic segmentation module.
Section 2.2 introduces the core algorithm of TPKE, which contains two specialized keypoint extraction modules: (1) Insulator string module: an adaptive DBSCAN clustering method segments the insulator point cloud, a “concavity” morphological metric (η) identifies V-shaped insulators, and a positioning-verification-compensation strategy, based on oriented bounding box analysis and axial extension search, addresses missing endpoint data. (2) Ground wire module: candidate connection areas are identified through local geometric feature analysis (PCA-based linearity and planarity measures), and precise positioning is achieved through spatial orthogonal projection and weighted centroid estimation.