Shape Prior-Guided Coarse-to-Fine Extraction of Overhead Transmission Line Towers from UAV LiDAR Point Clouds

Tong, Chaoliu; Shen, Yu; Zhang, Kanjian; Wei, Haikun

doi:10.3390/rs18132082

Open Access Article


¹ School of Automation, Southeast University, Nanjing 210096, China

² Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing 210096, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(13), 2082; https://doi.org/10.3390/rs18132082

Submission received: 13 April 2026 / Revised: 10 June 2026 / Accepted: 15 June 2026 / Published: 25 June 2026

(This article belongs to the Section Engineering Remote Sensing)


Highlights

What are the main findings?

A training-free shape-prior framework is proposed for extracting OTL towers from cluttered UAV LiDAR point clouds.
The method achieved a 97.07% average F1-score and the lowest normalized inference time on six OTL datasets.

What are the implications of the main findings?

Explicit tower geometry priors can reduce dependence on large annotated datasets and heavy network inference.
The extracted tower point clouds support digital acceptance, UAV path planning, geometric measurement, and defect inspection.

Abstract

Accurate extraction of transmission towers from Unmanned Aerial Vehicle (UAV) Light Detection and Ranging (LiDAR) point clouds is a prerequisite for overhead transmission line (OTL) acceptance. This task remains challenging because tower points are heavily entangled with ground, vegetation, conductors, and insulators, especially in complex terrain. To address this issue, we propose a shape prior-guided coarse-to-fine framework for tower extraction from UAV LiDAR point clouds. First, candidate tower regions are localized from the scene point cloud through preprocessing, near-ground suppression, and density-based clustering. Second, the least-disturbed central body of each candidate tower is identified in a slice-wise manner and used to estimate the tower orientation and four principal structural axes. Third, side-view and front-view structural envelopes are progressively inferred to suppress non-tower points around the tower body and tower head. Finally, a base-constrained filtering strategy is introduced to remove residual ground and low-vegetation points within the tower footprint. Experiments conducted on multiple OTL datasets acquired in different regions of China, including plains and mountainous areas, demonstrate that the proposed method achieves robust and efficient tower extraction across diverse scenarios. The results indicate that explicit structural priors offer a promising complement to feature-driven and data-intensive approaches, particularly in scenarios with limited annotated data and strict real-time requirements. The proposed method processes scene point clouds containing tens to hundreds of millions of points, with an average extraction time of approximately 100 to 300 s per tower depending on scene density.

Keywords:

overhead transmission line; tower extraction; UAV LiDAR point cloud; shape prior information

1. Introduction

Target extraction in complex scenes has broad application prospects in electric power inspection [1,2], intelligent driving [3], intelligent agriculture [4], and other fields. As a typical application of target extraction in complex scenes, tower extraction plays an important role in the acceptance of overhead transmission lines (OTLs) [5]. With continuing economic and social development, the demand for energy keeps growing. OTLs are essential components of the power system and are responsible for transmitting electric energy over long distances [6]. For reasons of safety and efficiency, OTLs must pass acceptance before commissioning. Compared with manual acceptance, automatic acceptance using Unmanned Aerial Vehicles (UAVs) is considerably more efficient. The flight path planning algorithms for automatic UAV data collection rely on a scene point cloud from which the towers have already been extracted. Based on the extracted tower point clouds, acceptance tasks such as missing-member detection, verticality inspection, and other specific checks can be performed. Therefore, accurate extraction of tower point clouds from the scene point cloud is crucial for realizing the digital acceptance of OTLs.

Recent studies have explored deep learning for OTL point cloud segmentation, including CNN-based [7], PointNet-based [8,9], and graph-based [10] models, together with data augmentation strategies. These methods have demonstrated the potential of representation learning for extracting towers, wires, and ground points. However, their effectiveness in OTL scenes is still constrained by several practical factors, including the scarcity of large-scale annotated datasets, the severe class imbalance of tower points, and the structural diversity of different tower components. In addition, although recent lightweight architectures such as PointNeXt [11] and Point Transformer V3 [12], together with self-supervised and weakly supervised learning paradigms [13], have improved computational efficiency and reduced dependence on fully annotated data, their transferability to domain-specific OTL scenarios still requires careful validation. Such validation is necessary because large-scale OTL point cloud datasets remain difficult to obtain, and the geometric configurations of tower components are highly specialized. Therefore, while recent deep learning approaches are promising, their practical deployment in acceptance applications with limited annotated data and real-time requirements remains challenging. These limitations also motivate the exploration of methods that more explicitly exploit the geometric and structural priors of transmission towers.

Because of the limitations of deep learning methods, feature-based methods have also been explored for tower extraction. Such methods mainly rely on hand-crafted local geometric features and can generally be divided into supervised and unsupervised approaches. Supervised methods usually accomplish tower extraction through neighborhood construction, feature design, and classifier selection. Representative classifiers include Random Forest [14,15,16], Support Vector Machines [17], gradient boosting [18], and JointBoost [19]. Although these studies demonstrate the feasibility of classical machine learning for OTL point cloud classification, their performance still depends strongly on the design of discriminative features and the selection of suitable classifiers. Therefore, satisfactory accuracy and robustness are still difficult to achieve in complex OTL scenes.

Unsupervised methods mainly exploit heuristic rules, height distribution characteristics, topological relations, and geometric assumptions to extract towers or pylons from OTL point clouds. Early studies mainly relied on height gaps, local maxima, connected components, and region growing to localize pylon candidates from corridor scenes [20,21,22]. Other methods incorporated vertical slice analysis, topological constraints, or intensity- and echo-based patterns for extracting towers or power structures [23,24,25]. More recent studies further explored automatic geometric-analysis pipelines for pylon detection and extraction in powerline corridors [26]. Compared with supervised or deep learning methods, these approaches are usually more interpretable and less dependent on annotated training data. However, their effectiveness still often depends on empirical thresholds, data quality, and scene-specific geometric assumptions. Therefore, although unsupervised methods have shown practical value in tower or pylon extraction, satisfactory robustness and generalization are still difficult to achieve in complex terrain or cluttered environments. It is also worth noting that a few recent studies have introduced more explicit geometric priors for pylon refinement. For example, Shen et al. [27] proposed a hierarchical coarse-to-fine framework for pylon detection from UAV Light Detection and Ranging (LiDAR) point clouds, in which shape prior knowledge was mainly used to restore distorted slice shapes and refine pylon-leg points from surrounding attachments. However, its treatment of the pylon-head powerlines is still relatively simplified. Specifically, the associated powerlines at the pylon head were first deliberately retained and then removed by a projected histogram analysis along the dominant direction rather than by explicit structural modeling of the tower head. 
Moreover, the method does not provide an effective automatic solution for removing vegetation points beneath or inside the pylon base, and the authors themselves noted that some points under the pylon still remain due to vegetation. Therefore, although recent studies have begun to consider shape information, the use of explicit tower-shape priors for complete tower extraction remains insufficiently explored, especially for tower head interference suppression and geometry-guided filtering of residual points within the tower base region.

OTL towers exhibit distinctive structural regularities that can be explicitly exploited for geometric extraction from UAV LiDAR point clouds. In general, a tower can be decomposed into three components, namely the base, the body, and the head. The base is typically composed of four pyramid-like supporting structures, the body can be approximated as a centrally symmetric quadrangular prism, and the head consists of another centrally symmetric prism together with cross-arms. These structural characteristics give rise to stable and view-dependent geometric patterns: the central body of the tower remains relatively regular and less disturbed, and the cross-arms are largely suppressed in the side view, while they induce abrupt lateral variations in the front view. However, such explicit structural priors have not been fully exploited in existing tower extraction studies. As a result, feature-driven methods may be sensitive to near-ground interference and surrounding clutter, whereas data-driven methods usually require a large amount of annotated data and considerable computational resources. Motivated by these observations, we propose a shape prior-guided coarse-to-fine method for tower extraction from UAV LiDAR point clouds. Specifically, candidate tower regions are first localized from the scene point cloud, after which the structural priors of the tower body and head are utilized to progressively infer side-view and front-view envelopes for suppressing conductors, ground, and vegetation points around the tower. Finally, the pyramid-like prior of the tower base is further introduced to remove residual ground and low-vegetation points inside the tower foot. In this way, tower extraction is formulated as a progressive geometric refinement process under explicit structural constraints rather than a purely point-wise classification problem.

The main contributions of this work are summarized as follows:

We develop an engineering-oriented and training-free geometric extraction framework for UAV LiDAR-based OTL tower extraction. The framework progressively integrates tower-specific structural priors, including central-body regularity, dual-view envelope constraints, and base geometry consistency.
We propose a central region-guided structural inference strategy, in which the least-disturbed tower body is used to estimate tower orientation and principal structural axes, and the tower geometry is further refined through side-view and front-view envelope constraints.
We introduce a base-constrained residual filtering scheme that exploits the geometric prior of the tower foot to remove ground and low-vegetation points both outside and inside the tower base, thereby improving extraction robustness in complex terrains and across transmission corridors of different voltage levels.

The subsequent sections of this paper are organized as follows: Section 2 provides the application background and the description of structural priors of OTL towers used in this paper. Section 3 introduces the proposed method in detail. Section 4 reports the experimental results, ablation studies, comparison with reproduced baseline methods, and computational efficiency analysis. Section 5 discusses the applicability, limitations, and future extensions of the proposed method. Section 6 draws the conclusion.

2. Background

In this section, we briefly introduce the application background of our work. After that, we discuss the shape prior information of the tower.

2.1. Application Background

The acceptance work of OTLs mainly includes structural inspection of towers, sag calculation, bolt detection, and insulator detection. At present, OTL acceptance is mainly performed manually: workers either climb the tower to check details or stand at the tower foot and observe the tower with telescopes or cameras [28,29]. In recent years, with the rapid development of UAV and artificial intelligence technologies, UAVs have gradually been applied to OTL inspection. Compared with inspection, acceptance requires the UAV to collect more detailed and complete data on the tower, which is more difficult and dangerous. Therefore, UAV-based digital acceptance of OTLs has rarely been reported. However, the rapid development of Light Detection and Ranging (LiDAR) technology [30,31,32] makes digital acceptance of OTLs feasible. An airborne LiDAR point cloud contains XYZ coordinates as well as echo, intensity, and RGB information, which can be used to build a 3D digital model of the OTL acceptance scene and provide data support for automatic acceptance.

The proposed OTL digital acceptance system is shown in Figure 1. First, the point cloud of the acceptance scene is obtained by a manual or semi-automatic flight (manual pointing). Before the scene point cloud is collected, however, only the planned tower locations are available, without an accurate 3D digital map of the whole scene. Therefore, in our acquisition scheme the LiDAR points vertically downward and the UAV flies over each tower in turn at about 10 m above its top. As a result, the tower base is far from the UAV and may be occluded by power lines or low vegetation. Consequently, scene point clouds acquired by manual or semi-automatic flights are often of poor quality, which is mainly reflected in missing points at the tower base. Such a scene point cloud can therefore support only sag calculation, verticality detection, digital map generation, and path planning, but not the other acceptance tasks that require point clouds. For OTL acceptance, the targets of interest are mainly towers and power lines. As shown in Figure 1, among the proposed acceptance items only sag calculation concerns the power lines, and it can be completed using only the point cloud collected in the first flight. The remaining acceptance items concern the tower and require both point cloud data and image data. The aim of this paper is to extract tower point clouds from an OTL scene point cloud accurately and quickly. Once the tower point clouds are extracted, the acceptance scene can be divided into regions of interest (tower point clouds) and obstacle regions (all other points), so that the UAV flight path can be planned to collect high-quality tower images and tower point clouds. 
At the same time, the proposed method can separate interference points from tower points in the high-quality point clouds collected by subsequent automatic flights, thereby supporting missing steel member detection and tower model generation in OTL acceptance. In summary, quickly and accurately extracting tower point clouds from low-quality OTL scene point clouds is the key to implementing the proposed OTL digital acceptance system.

2.2. Structural Priors of OTL Towers

The proposed method is built upon several geometric priors of OTL towers. These priors are not used as strict shape templates, but as structural constraints that guide candidate localization, pose estimation, multi-view refinement, and base-constrained filtering. As illustrated in Figure 2, an OTL tower can be decomposed into three main components, namely the tower base, tower body, and tower head. Based on their geometric characteristics, the structural priors used in this study can be summarized as follows.

Tower base: The tower base is typically composed of four pyramid-like supporting structures. Its lower footprint is approximately square, and the four supporting components provide a stable geometric constraint for identifying and removing residual ground and low-vegetation points around the tower foot.
Tower body: The tower body is approximately a centrally symmetric quadrangular-prism-like structure. When projected onto the horizontal plane, its cross-section is relatively regular, which provides a reliable basis for identifying the least-disturbed central region of the tower.
Principal structural axes: The tower body is supported by four principal structural axes located near its outer boundaries. These axes exhibit approximate bilateral symmetry and gradually converge toward the tower center as the height increases. This property enables robust pose normalization and structural-axis fitting.
Tower head: The tower head can be regarded as an upper structural component connected to the tower body, and it remains approximately symmetric with respect to the tower center. Its boundary still follows a structured geometric trend, although it is more easily affected by conductors and insulators.
Cross-arms: The cross-arms are the most distinctive view-dependent structures of the tower head. In the side view, their boundaries are largely aligned with the tower head axes and therefore contribute limited lateral expansion. In the front view, however, the lower boundaries of the cross-arms introduce abrupt width variations, which provide an important cue for front-view structural envelope inference.
Tower height: The height of an OTL tower is defined as the vertical distance from the bottom of the tower base to the top of the tower head. In practical transmission corridors, tower heights usually fall within a limited engineering range, which can be used as a weak prior for rejecting non-tower clusters during candidate localization.
Structural members: Tower members are relatively thin components compared with the overall tower scale. This characteristic implies that coarse candidate localization should prioritize structural completeness, whereas precise extraction should rely on explicit geometric constraints to avoid over-filtering slender tower parts.

3. Methodology

3.1. Overview of the Proposed Method

Given a UAV LiDAR scene point cloud of an OTL corridor, the objective of this study is to accurately extract the point cloud of each transmission tower while suppressing interference from ground, vegetation, conductors, and insulators. To this end, we propose a shape prior-guided coarse-to-fine framework, as illustrated in Figure 3. The proposed method consists of two successive stages, namely candidate tower localization and precise structural refinement.

In the first stage, the scene point cloud is preprocessed and clustered to localize candidate tower regions. In the second stage, the structural priors of the tower are progressively exploited to refine each candidate region. Specifically, the least-disturbed central region of the pre-extracted tower is first identified to estimate the tower orientation and principal structural axes. The rotated tower point cloud is then refined in side and front views to suppress non-tower points around the tower body and tower head. Finally, a base-constrained filtering strategy is introduced to remove residual ground and low-vegetation points around and inside the tower foot. Through this progressive refinement process, tower extraction is formulated as a geometric inference problem under explicit structural constraints rather than a purely point-wise discrimination problem.

The overall procedure of the proposed method is summarized in Algorithm 1.

Algorithm 1: Shape prior-guided coarse-to-fine tower extraction

3.2. Candidate Tower Localization

3.2.1. Scene Preprocessing and Height-Suppressed Candidate Clustering

The raw scene point cloud is first preprocessed to improve computational efficiency and reduce the influence of outliers. In our implementation, two down-sampling operations are used for different purposes. The first, referred to as scene down-sampling (Figure 4b), is performed on the original scene point cloud to reduce the data size while preserving the overall tower geometry as much as possible. The second, referred to as DBSCAN down-sampling (Figure 4c), is applied before clustering to improve the efficiency of density-based grouping in large-scale corridor scenes. Scene down-sampling combines voxel down-sampling with statistical outlier removal (SOR).
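The two scene down-sampling operations can be illustrated with a brute-force sketch in plain Python. It assumes that points are `(x, y, z)` tuples, that each occupied voxel is replaced by the centroid of its points, and that SOR follows the common rule of discarding points whose mean distance to their `k` nearest neighbours exceeds the global mean by more than `std_ratio` standard deviations. The function names and parameter values are hypothetical and are not the authors' implementation or settings.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points falling in the same voxel by their centroid."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts))
            for pts in buckets.values()]


def statistical_outlier_removal(points, k=8, std_ratio=2.0):
    """Drop points whose mean k-NN distance exceeds mean + std_ratio * std.

    Brute-force O(n^2) neighbour search, for illustration only.
    """
    n = len(points)
    if n <= k:
        return list(points)
    mean_knn = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:k]) / k)
    mu = sum(mean_knn) / n
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_knn) / n)
    return [p for p, d in zip(points, mean_knn) if d <= mu + std_ratio * sigma]
```

The quadratic neighbour search is only for illustration. Scenes with tens of millions of points would need a spatial index, for example the KD-tree-backed `voxel_down_sample` and `remove_statistical_outlier` methods of Open3D's `PointCloud`.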

After scene down-sampling, DBSCAN down-sampling applies a further voxel down-sampling together with a grid-based height suppression strategy. This removes most ground points and continuous low vegetation, making tower points more separable from the surrounding clutter. Specifically, the point cloud is partitioned in the horizontal plane, and the relative height of each point is computed with respect to the lowest point within the corresponding grid. Denote the horizontal coordinate of a point by

$$p_i = (x_i, y_i)$$

and the corresponding grid index by

$$g_i = \left\lfloor \frac{p_i - p_{\min}}{d_g} \right\rfloor, \tag{1}$$

where $p_{\min}$ is the minimum horizontal coordinate of the scene and $d_g$ is the grid size on the $XOY$ plane. For each grid, the relative height of a point is defined as

$$\Delta z_i = z_i - z_{\min}(g_i), \tag{2}$$

where $z_{\min}(g_i)$ denotes the minimum elevation within the corresponding grid. Points with small relative heights are discarded, since they are mainly associated with the ground surface and low vegetation. Because transmission towers are substantially taller than near-ground objects, this operation preserves the main tower structure while significantly reducing background interference.
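Here is a minimal sketch of the height suppression in Eqs. (1) and (2). It assumes `(x, y, z)` tuples and a single relative-height threshold `min_rel_height`. Both the function name and the threshold are illustrative assumptions, because the section does not state the values used.

```python
import math
from collections import defaultdict


def suppress_near_ground(points, grid_size, min_rel_height):
    """Grid-based height suppression following Eqs. (1) and (2).

    points: list of (x, y, z) tuples. A point is kept only if its height
    above the lowest point of its XOY grid cell is at least min_rel_height.
    """
    if not points:
        return []
    x_min = min(p[0] for p in points)
    y_min = min(p[1] for p in points)
    # Eq. (1): grid index of each point on the XOY plane.
    cells = [(math.floor((p[0] - x_min) / grid_size),
              math.floor((p[1] - y_min) / grid_size)) for p in points]
    # z_min(g): lowest elevation found in each occupied cell.
    z_low = defaultdict(lambda: math.inf)
    for p, g in zip(points, cells):
        z_low[g] = min(z_low[g], p[2])
    # Eq. (2): relative height; near-ground points are discarded.
    return [p for p, g in zip(points, cells) if p[2] - z_low[g] >= min_rel_height]
```

The sketch implicitly assumes that every occupied cell also contains ground or near-ground returns. A cell holding only tower points would treat its lowest tower point as local ground and discard it.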

DBSCAN [33] is then applied to the filtered point cloud to obtain spatial clusters in the corridor scene. Owing to the sparse vertical layout and prominent elevation of towers, tower candidates can be effectively separated from most residual objects after near-ground suppression. Based on the expected number of towers and basic height characteristics, non-tower clusters are discarded and the remaining clusters are regarded as candidate tower regions.
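The clustering and cluster-rejection step can be sketched as follows. DBSCAN itself is the standard algorithm [33]. The rejection rule is a hypothetical reading of "the expected number of towers and basic height characteristics": keep clusters whose vertical extent lies within an assumed height range and, optionally, only the `max_towers` tallest ones. All names are illustrative.

```python
import math


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with brute-force neighbour queries; label -1 marks noise."""
    n = len(points)
    labels = [None] * n

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:          # noise reachable from a core point becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:   # j is a core point: keep expanding
                seeds.extend(j_neighbours)
    return labels


def candidate_boxes(points, labels, min_height, max_height, max_towers=None):
    """Keep clusters whose vertical extent lies in [min_height, max_height]."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab >= 0:
            clusters.setdefault(lab, []).append(p)
    kept = []
    for pts in clusters.values():
        xs, ys, zs = zip(*pts)
        height = max(zs) - min(zs)
        if min_height <= height <= max_height:
            kept.append((height, (min(xs), min(ys), min(zs), max(xs), max(ys), max(zs))))
    kept.sort(key=lambda item: item[0], reverse=True)   # tallest clusters first
    if max_towers is not None:
        kept = kept[:max_towers]
    return [box for _, box in kept]
```

In practice, a library implementation such as scikit-learn's `DBSCAN` (parameters `eps` and `min_samples`) would replace the brute-force `dbscan` function.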

3.2.2. Cluster-Guided Recovery of Pre-Extracted Tower Points

Although the previous step provides reliable tower localization, some tower points may be removed during coarse down-sampling and near-ground suppression, especially in the lower part of the tower or around thin structural members. Therefore, the bounding boxes obtained from DBSCAN are used only for candidate localization rather than as the final extraction result, as shown in Figure 5.

To recover a more complete tower point cloud, each candidate bounding box is projected back onto the denser preprocessed scene point cloud. The box is further expanded in the horizontal directions, and its vertical extent is restored according to the point distribution of the preprocessed point cloud. In this way, the preliminary tower point cloud can be recovered from the denser scene representation. This design also explains the difference between the two bounding boxes: the smaller box is generated from the filtered and clustered point cloud for robust localization, whereas the larger box is reconstructed on the denser preprocessed point cloud to compensate for points removed in the previous step.
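A possible form of the cluster-guided recovery is shown below. It assumes axis-aligned boxes `(x_min, y_min, z_min, x_max, y_max, z_max)` and a fixed horizontal margin. It also assumes that restoring the vertical extent means taking the full height range of the preprocessed points inside the expanded footprint. The margin and the function name are hypothetical.

```python
def recover_candidate(scene_points, box, margin):
    """Re-crop a DBSCAN candidate from the denser preprocessed scene.

    box = (x_min, y_min, z_min, x_max, y_max, z_max) from the filtered cloud.
    The box is expanded by `margin` in x and y, its z-limits are dropped, and
    the vertical extent is re-derived from the scene points inside the
    expanded footprint. Returns (tower_points, restored_box).
    """
    x0, y0, _, x1, y1, _ = box
    inside = [p for p in scene_points
              if x0 - margin <= p[0] <= x1 + margin
              and y0 - margin <= p[1] <= y1 + margin]
    if not inside:
        return [], None
    zs = [p[2] for p in inside]
    restored = (x0 - margin, y0 - margin, min(zs), x1 + margin, y1 + margin, max(zs))
    return inside, restored
```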

3.3. Tower Precise Extraction

3.3.1. Central Region-Guided Pose Normalization and Main-Axis Estimation

The pre-extracted tower point cloud still contains interference from conductors, insulators, ground points, and vegetation. However, such interference is not uniformly distributed along the vertical direction. As shown in Figure 6a, the pre-extracted tower can be roughly divided into three regions, namely the low region, central region, and high region. The low region is mainly affected by ground points and vegetation, the high region is more easily disturbed by conductors and insulators, whereas the central region is relatively regular and contains the least interference. Therefore, the central region provides the most reliable geometric support for subsequent pose estimation and structural inference.

Based on this observation, the pre-extracted tower is sliced along the vertical direction, and each slice is projected onto the horizontal plane. For the $i$th slice, the minimum-area rectangle is computed, and its length and width are denoted by $l_i$ and $w_i$, respectively. A slice-wise rectangular regularity index is defined as

$$r_i = \frac{|l_i - w_i|}{l_i + w_i}, \tag{3}$$

where a smaller value of $r_i$ indicates that the projected slice is closer to a regular quadrangular cross-section. The central region is determined as the longest consecutive slice interval in which $r_i$ remains small and stable, as shown in Figure 6b.
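The slice-wise regularity test of Eq. (3) can be sketched in plain Python. The minimum-area rectangle is computed with the standard convex-hull argument: the optimal rectangle has one side collinear with a hull edge. The central region is taken as the longest run of slices whose index does not exceed a threshold `r_max`. The stability criterion is omitted here, and `r_max` and all function names are assumptions.

```python
import math


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(pts):
    """(length, width, angle) of the minimum-area rectangle of 2-D points.

    Only hull-edge directions are tested. The angle is reduced modulo pi/2,
    reflecting the 90-degree ambiguity of the rectangle orientation.
    """
    hull = convex_hull(pts)
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0) % (math.pi / 2)
        c, s = math.cos(theta), math.sin(theta)
        u = [x * c + y * s for x, y in hull]    # along the edge direction
        v = [-x * s + y * c for x, y in hull]   # across the edge direction
        du, dv = max(u) - min(u), max(v) - min(v)
        if best is None or du * dv < best[0]:
            best = (du * dv, max(du, dv), min(du, dv), theta)
    return best[1], best[2], best[3]


def regularity_index(length, width):
    """Eq. (3); degenerate slices are treated as irregular."""
    return abs(length - width) / (length + width) if length + width > 0 else 1.0


def central_region(slices, r_max):
    """Longest run of consecutive slices with r_i <= r_max, as (first, last) or None."""
    best, start = None, None
    for i, pts in enumerate(list(slices) + [None]):   # sentinel closes the last run
        ok = (pts is not None and len(pts) >= 3
              and regularity_index(*min_area_rect(pts)[:2]) <= r_max)
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best
```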

After the central region has been identified, the orientation angles of the corresponding minimum-area rectangles are collected and statistically filtered to suppress unstable slices and outliers. Let $I_c$ denote the retained central region slice set, and let $\alpha_i$ be the orientation angle of the minimum-area rectangle of the $i$th retained slice. The initial Minimum Bounding Rectangle (MBR)-induced orientation is then computed as

$$\alpha_{\mathrm{mbr}} = \frac{1}{|I_c|} \sum_{i \in I_c} \alpha_i. \tag{4}$$
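The section does not specify the statistical filter applied before Eq. (4). One plausible choice, shown here as an assumption, is a median/MAD rule. It drops angles more than `k` median absolute deviations from the median and then averages the rest. The plain arithmetic mean of Eq. (4) is only meaningful when the retained angles do not straddle the wrap-around point of the MBR angle range. The function name is hypothetical.

```python
import statistics


def mbr_orientation(angles, k=3.0):
    """Eq. (4) after a median/MAD outlier filter.

    angles: MBR orientation angles (radians) of the central-region slices,
    assumed not to straddle the 0 / pi/2 wrap-around of the MBR angle.
    Returns (alpha_mbr, retained angles).
    """
    med = statistics.median(angles)
    mad = statistics.median([abs(a - med) for a in angles])
    # At least half of the deviations are <= mad, so `kept` is never empty.
    kept = [a for a in angles if abs(a - med) <= k * mad]
    return sum(kept) / len(kept), kept
```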

However, the minimum-area rectangle only determines the horizontal orientation up to a $90^\circ$ ambiguity. In other words, $\alpha_{\mathrm{mbr}}$ and $\alpha_{\mathrm{mbr}} + \pi/2$ may both be geometrically plausible. To resolve this ambiguity, the top region is further used because it contains the characteristic structural difference between the side and front views. Specifically, the side view usually presents a sparse line–tower–line distribution, whereas the front view exhibits a denser cross-arm–tower–cross-arm distribution.

Let $z_c^{+}$ denote the upper boundary of the central region, and define the top-region point set as

$$P_t = \{p \in P_c \mid z(p) \geq z_c^{+}\}, \tag{5}$$

where $P_c$ is the pre-extracted candidate tower point cloud. A temporary rotation is first performed using $\alpha_{\mathrm{mbr}}$. The point density histograms of the rotated top-region points are then computed along the $x$ and $y$ directions, respectively. Let $h_x(k)$ and $h_y(k)$ denote the histogram counts of the $k$th bin along these two directions. For each direction $u \in \{x, y\}$, the low-density bin ratio is defined as

$$\Gamma_u = \frac{\sum_{k \in \Omega_u} \mathbb{I}\left(0 < h_u(k) \leq \eta\, h_u^{\max}\right)}{|\Omega_u|}, \qquad \Omega_u = \{k \mid h_u(k) > 0\}, \tag{6}$$

where $h_u^{\max} = \max_k h_u(k)$, $\eta$ is a small ratio threshold, and $\mathbb{I}(\cdot)$ is the indicator function. A larger $\Gamma_u$ indicates that the corresponding projection direction contains more low-density non-empty bins and is therefore more consistent with the side-view-like sparse distribution. Accordingly, the final orientation is selected as

$$\alpha^{*} = \begin{cases} \alpha_{\mathrm{mbr}}, & \Gamma_x \geq \Gamma_y, \\ \alpha_{\mathrm{mbr}} + \dfrac{\pi}{2}, & \Gamma_x < \Gamma_y. \end{cases} \tag{7}$$
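Eqs. (6) and (7) can be sketched as below. Three details are assumptions: the bin width, the threshold `eta`, and the direction of the temporary rotation. The rotation is assumed here to map the rectangle side at angle $\alpha_{\mathrm{mbr}}$ onto the $x$ axis. The function names are hypothetical. Because only non-empty bins are stored, the dictionary keys form exactly the set $\Omega_u$.

```python
import math


def low_density_ratio(values, bin_width, eta):
    """Eq. (6): share of non-empty bins whose count is at most eta * max count."""
    counts = {}
    for v in values:
        k = math.floor(v / bin_width)
        counts[k] = counts.get(k, 0) + 1
    if not counts:
        return 0.0
    h_max = max(counts.values())
    low = sum(1 for c in counts.values() if c <= eta * h_max)
    return low / len(counts)


def resolve_orientation(top_xy, alpha_mbr, bin_width, eta):
    """Eq. (7): choose alpha_mbr or alpha_mbr + pi/2 from the top-region histograms."""
    c, s = math.cos(alpha_mbr), math.sin(alpha_mbr)
    # Temporary rotation by -alpha_mbr (assumed convention).
    rotated = [(x * c + y * s, -x * s + y * c) for x, y in top_xy]
    gamma_x = low_density_ratio([p[0] for p in rotated], bin_width, eta)
    gamma_y = low_density_ratio([p[1] for p in rotated], bin_width, eta)
    return alpha_mbr if gamma_x >= gamma_y else alpha_mbr + math.pi / 2
```

In the usage case a compact tower head plus a sparse conductor running along one axis yields many single-point bins, and therefore a large $\Gamma$, along that axis.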

This density-histogram-based selection avoids relying solely on the principal variance direction of the top region, which may be biased by conductors or other elongated interference.

The point cloud is subsequently rotated in the horizontal plane so that the principal structure of the tower becomes approximately aligned with the coordinate axes:

$$p' = p \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \tag{8}$$

where $p = [x, y]$ and $p' = [x', y']$ denote the horizontal coordinates before and after rotation, respectively, and $\theta = \pi/2 - \alpha^{*}$ is the final pose-normalization angle.
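The following is a literal transcription of Eq. (8), applied to `(x, y, z)` tuples with heights left unchanged. Because $p$ is a row vector multiplied on the left, the matrix rotates points clockwise by $\theta$. The function name is illustrative.

```python
import math


def normalize_pose(points, alpha_star):
    """Eq. (8): p' = p R(theta) with theta = pi/2 - alpha_star; z is unchanged."""
    theta = math.pi / 2 - alpha_star
    c, s = math.cos(theta), math.sin(theta)
    # Row vector [x, y] times [[c, -s], [s, c]] gives [x*c + y*s, -x*s + y*c].
    return [(x * c + y * s, -x * s + y * c, z) for x, y, z in points]
```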

On the basis of the rotated central region, the four boundary trajectories induced by the rotated minimum-area rectangles are used to fit the principal structural axes. The four axes are parameterized as

$$L_a(z) = \begin{bmatrix} x_a(z) \\ y_a(z) \end{bmatrix} = \begin{bmatrix} k_a^{x} z + b_a^{x} \\ k_a^{y} z + b_a^{y} \end{bmatrix}, \qquad a = 1, 2, 3, 4, \tag{9}$$

where $L_a(z)$ denotes the $a$th principal structural axis. These four axes are not only geometric descriptors of the central tower body but also the structural references that guide the subsequent envelope inference and base reconstruction. To make this explicit, let $\pi_s(\cdot)$ and $\pi_f(\cdot)$ denote the projections onto the side-view and front-view planes, respectively. The axis-induced structural references in the two views are written as

$$s_a(z) = \pi_s(L_a(z)), \qquad f_a(z) = \pi_f(L_a(z)), \qquad a = 1, 2, 3, 4. \tag{10}$$

Accordingly, the side-view and front-view structural center lines are defined as

$$c_s(z) = \frac{1}{4} \sum_{a=1}^{4} s_a(z), \qquad c_f(z) = \frac{1}{4} \sum_{a=1}^{4} f_a(z). \tag{11}$$
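Eqs. (9) and (11) can be sketched with ordinary least squares. The sketch assumes that each of the four boundary trajectories is available as a list of `(x, y, z)` corner samples taken from the slice rectangles. This storage format is an assumption, and the function names are hypothetical. The sketch averages the axes in the horizontal plane. Because orthographic projection onto a view plane is linear, projecting this average gives $c_s(z)$ and $c_f(z)$.

```python
def fit_line(z, v):
    """Ordinary least squares for v = k * z + b; returns (k, b)."""
    n = len(z)
    mz, mv = sum(z) / n, sum(v) / n
    szz = sum((a - mz) ** 2 for a in z)
    szv = sum((a - mz) * (b - mv) for a, b in zip(z, v))
    k = szv / szz
    return k, mv - k * mz


def fit_axis(track):
    """Eq. (9): fit x_a(z) and y_a(z) for one trajectory of (x, y, z) samples."""
    z = [p[2] for p in track]
    kx, bx = fit_line(z, [p[0] for p in track])
    ky, by = fit_line(z, [p[1] for p in track])
    return kx, bx, ky, by


def axis_at(axis, z):
    """Horizontal position L_a(z) of a fitted axis at height z."""
    kx, bx, ky, by = axis
    return kx * z + bx, ky * z + by


def center_line(axes, z):
    """Mean of the four axes at height z (horizontal-plane form of Eq. (11))."""
    pts = [axis_at(axis, z) for axis in axes]
    return sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts)
```

The fit requires samples at two or more distinct heights; otherwise `szz` is zero.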

These quantities provide explicit structural references for the multi-view refinement stage.

3.3.2. Multi-View Structural Refinement

After pose normalization, the rotated tower point cloud is projected onto two orthogonal vertical planes to form the side view and the front view, respectively. It should be emphasized that the envelope inference in this study is not formulated as a global optimization problem. Instead, it is implemented as a geometry-guided and layer-wise boundary construction procedure under continuity, symmetry, and structural-trend constraints. The inferred side-view and front-view envelopes are then used as admissible ranges for point filtering.

(1): Side-view structural envelope

Let $s_L(z)$ and $s_R(z)$ denote the left and right axis-induced structural references in the side view:

$$s_L(z) = \min_a s_a(z), \qquad s_R(z) = \max_a s_a(z). \tag{12}$$

These two functions describe the expected structural trend of the tower body in the side view and provide the initial envelope of the central region. Let the linearly extrapolated left and right side-view references be denoted by $x_l^{0}(z)$ and $x_r^{0}(z)$, respectively. For the low and central regions, the side-view envelope is directly given by these structural references. Let $z_c^{+}$ denote the upper boundary of the central region. In the high region ($z > z_c^{+}$), the envelope is constructed by a layer-wise upward search with the same vertical discretization step $\Delta z$ as the sliced tower point cloud.

Let

X (z) = {x ∣ (x, z) \in P_{s v}},

(13)

where

P_{s v}

is the side-view point set. For each layer in the high region, the candidate left and right boundaries are defined as

{\hat{x}}_{l} (z) = min X (z), {\hat{x}}_{r} (z) = max X (z) .

(14)

At the top of the central region, the initial high region boundaries are set as

x_{l}^{h} (z_{c}^{+}) = x_{l}^{0} (z_{c}^{+}), x_{r}^{h} (z_{c}^{+}) = x_{r}^{0} (z_{c}^{+}) .

(15)

If

X (z) = ⌀

, the previous boundary is temporarily propagated to the current layer, i.e.,

x_{l}^{h} (z) = x_{l}^{h} (z - Δ z), x_{r}^{h} (z) = x_{r}^{h} (z - Δ z) .

(16)

If $X(z) \neq \emptyset$, the candidate boundary is recorded only when it satisfies the following geometric consistency conditions. For the left boundary, the validity set is

Z_{l} = \{z \mid (\forall x \in X(z):\ x \geq x_{l}^{0}(z_{c}^{+})) \wedge x_{l}^{h}(z_{l}^{-}) \leq \hat{x}_{l}(z) \leq x_{r}^{h}(z_{r}^{-}) \wedge \hat{x}_{l}(z) < x_{l}^{0}(z)\},

(17)

where $z_{l}^{-}$ and $z_{r}^{-}$ denote the most recent heights at which valid left and right high-region boundaries were recorded. Similarly, the validity set of the right boundary is

Z_{r} = \{z \mid (\forall x \in X(z):\ x \leq x_{r}^{0}(z_{c}^{+})) \wedge x_{l}^{h}(z_{l}^{-}) \leq \hat{x}_{r}(z) \leq x_{r}^{h}(z_{r}^{-}) \wedge \hat{x}_{r}(z) > x_{r}^{0}(z)\}.

(18)

Accordingly, the recorded high-region boundaries are updated as

x_{l}^{h}(z) = \hat{x}_{l}(z), \quad z \in Z_{l},

(19)

x_{r}^{h}(z) = \hat{x}_{r}(z), \quad z \in Z_{r}.

(20)

When a layer contains points but does not satisfy the above validity conditions, no new boundary is recorded at that height. Therefore, unlike the empty-layer case, such a layer is treated as a missing boundary layer and is completed later by interpolation.
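The layer-wise search in Equations (14)–(20) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `high_region_side_boundaries`, the list-of-layers input (layer 0 at $z_{c}^{+}$, one layer per $\Delta z$), and the use of NaN to mark missing boundary layers are all hypothetical. Both validity checks use the boundaries recorded before the current layer, which is one reading of "most recent valid heights".

```python
import numpy as np


def high_region_side_boundaries(layers, xl0, xr0):
    """Layer-wise upward search for the high-region side-view boundaries.

    layers[k]: 1-D array of side-view x coordinates in layer k, where k = 0 is
               the top of the central region (z_c^+) and each step is delta z.
    xl0, xr0:  extrapolated references x_l^0(z_k) and x_r^0(z_k) per layer.
    Returns x_l^h, x_r^h; NaN marks missing boundary layers (filled later).
    """
    n = len(layers)
    xl_h = np.full(n, np.nan)
    xr_h = np.full(n, np.nan)
    # Eq. (15): initialize with the references at the top of the central region.
    xl_h[0], xr_h[0] = xl0[0], xr0[0]
    last_l, last_r = xl0[0], xr0[0]  # x_l^h(z_l^-), x_r^h(z_r^-)
    for k in range(1, n):
        X = np.asarray(layers[k], dtype=float)
        if X.size == 0:
            # Eq. (16): an empty layer inherits the boundary of the layer below.
            xl_h[k], xr_h[k] = xl_h[k - 1], xr_h[k - 1]
            continue
        cand_l, cand_r = X.min(), X.max()  # Eq. (14)
        prev_l, prev_r = last_l, last_r    # state before this layer
        # Eq. (17): all points right of x_l^0(z_c^+), candidate between the last
        # recorded boundaries, and left of the extrapolated left reference.
        if X.min() >= xl0[0] and prev_l <= cand_l <= prev_r and cand_l < xl0[k]:
            xl_h[k] = last_l = cand_l
        # Eq. (18): mirrored conditions for the right boundary.
        if X.max() <= xr0[0] and prev_l <= cand_r <= prev_r and cand_r > xr0[k]:
            xr_h[k] = last_r = cand_r
        # A non-empty layer failing the checks stays NaN (missing boundary layer).
    return xl_h, xr_h
```

Layers left as NaN are exactly the missing boundary layers that the linear interpolation of Equations (21) and (22) completes.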

Because valid boundary layers are not always obtained continuously in the high region, missing boundary segments are filled by linear interpolation. If valid left boundary values are available at two adjacent recorded heights $z_{i}$ and $z_{i+1}$ with $z_{i+1} - z_{i} > \Delta z$, the missing segment is completed as

x_{l}^{h}(z) = x_{l}^{h}(z_{i}) + \frac{z - z_{i}}{z_{i+1} - z_{i}}\left(x_{l}^{h}(z_{i+1}) - x_{l}^{h}(z_{i})\right), \quad z_{i} < z < z_{i+1},

(21)

and similarly for the right boundary,

x_{r}^{h}(z) = x_{r}^{h}(z_{i}) + \frac{z - z_{i}}{z_{i+1} - z_{i}}\left(x_{r}^{h}(z_{i+1}) - x_{r}^{h}(z_{i})\right), \quad z_{i} < z < z_{i+1}.

(22)

If the final recorded boundary does not reach the tower top, the last valid value is extended to the topmost layer before interpolation. In this way, continuous left and right side-view boundaries are obtained in the high region.
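A short sketch of the gap filling in Equations (21) and (22), assuming the recorded boundary is stored as an array with NaN at missing layers (the name `fill_missing_boundary` is hypothetical). `np.interp` performs the piecewise-linear interpolation between consecutive recorded heights; the last valid value is first copied up to the topmost layer, as described above.

```python
import numpy as np


def fill_missing_boundary(z, xb):
    """Complete a high-region boundary containing NaN gaps (Eqs. (21)-(22)).

    z:  increasing layer heights; xb: recorded boundary values (NaN = missing).
    """
    z = np.asarray(z, dtype=float)
    xb = np.asarray(xb, dtype=float).copy()
    valid = np.flatnonzero(~np.isnan(xb))
    if valid.size == 0:
        raise ValueError("no recorded boundary value")
    # Extend the last valid value up to the topmost layer before interpolating.
    xb[valid[-1] + 1:] = xb[valid[-1]]
    known = ~np.isnan(xb)
    # Piecewise-linear interpolation between consecutive known heights.
    return np.interp(z, z[known], xb[known])
```

Because layer 0 is always initialized by Equation (15), the constant extrapolation that `np.interp` applies below the first known value is not exercised in this setting.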

The complete side-view envelope is then written as

x_{l}(z) = \begin{cases} x_{l}^{0}(z), & z \leq z_{c}^{+}, \\ x_{l}^{h}(z), & z > z_{c}^{+}, \end{cases} \qquad x_{r}(z) = \begin{cases} x_{r}^{0}(z), & z \leq z_{c}^{+}, \\ x_{r}^{h}(z), & z > z_{c}^{+}. \end{cases}

(23)

The side-view admissible set is then written as

E_{x}(z) = [x_{l}(z) - \delta_{x},\ x_{r}(z) + \delta_{x}],

(24)

where $\delta_{x}$ is the side-view tolerance margin. In this way, the side-view refinement removes a large proportion of non-tower points in the low and high regions while preserving the global tapering structure of the tower. The complete process is illustrated in Figure 7.

(2): Front-view structural envelope

After side-view filtering, the remaining point set is projected onto the front-view plane. In contrast to the side-view refinement, which mainly exploits the tapering profile of the tower body, the front-view refinement is designed to capture the lateral width variations caused by the tower head and cross-arms. Therefore, the front-view envelope is inferred from contour extrema and their height-wise variation.

Let $y_{l}^{0}(z)$ and $y_{r}^{0}(z)$ denote the left and right front-view structural references linearly fitted from the central region. At the upper boundary of the central region, $z_{c}^{+}$, a fixed front-view partition center is defined as

y_{c} = \frac{y_{l}^{0}(z_{c}^{+}) + y_{r}^{0}(z_{c}^{+})}{2}.

(25)

This partition center is used to divide the contour points into left and right parts in the high-region analysis.

The projected front-view points are first rasterized into a binary image, and the largest external contour is extracted from the rasterized point map. Let $C_{fv}$ denote the extracted front-view contour. For each height level $z$ in the high region, the contour ordinate set is defined as

Y(z) = \{y \mid (y, z) \in C_{fv}\}.

(26)

According to the partition center $y_{c}$, $Y(z)$ is divided into the left and right subsets

Y_{L}(z) = \{y \in Y(z) \mid y \leq y_{c}\}, \quad Y_{R}(z) = \{y \in Y(z) \mid y > y_{c}\}.

(27)

At $z = z_{c}^{+}$, the front-view contour extrema are initialized by the central-region structural references:

y_{l}^{\min}(z_{c}^{+}) = y_{l}^{\max}(z_{c}^{+}) = y_{l}^{0}(z_{c}^{+}), \quad y_{r}^{\min}(z_{c}^{+}) = y_{r}^{\max}(z_{c}^{+}) = y_{r}^{0}(z_{c}^{+}).

(28)

For subsequent height levels, if both $Y_{L}(z)$ and $Y_{R}(z)$ are non-empty, the four contour extrema are computed as

y_{l}^{\min}(z) = \min Y_{L}(z), \quad y_{l}^{\max}(z) = \max Y_{L}(z), \quad y_{r}^{\min}(z) = \min Y_{R}(z), \quad y_{r}^{\max}(z) = \max Y_{R}(z).

(29)

If either side has no contour point at the current height, the extrema from the previous height level are inherited. This inheritance avoids unstable boundary changes caused by local sparsity or incomplete contour extraction.
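The partition and extremum bookkeeping of Equations (25)–(29) reduces to a few array operations. The sketch below assumes the contour has already been extracted from the raster image and that its z coordinates are snapped to the high-region layer heights; `contour_extrema` and its argument layout are illustrative assumptions.

```python
import numpy as np


def contour_extrema(contour_yz, z_levels, yl0_c, yr0_c):
    """Per-layer front-view contour extrema, Eqs. (25)-(29).

    contour_yz: (M, 2) contour points (y, z), z snapped to the layer heights.
    z_levels:   high-region layer heights; z_levels[0] is z_c^+.
    yl0_c, yr0_c: y_l^0(z_c^+) and y_r^0(z_c^+).
    Returns an (n, 4) array of [y_l^min, y_l^max, y_r^min, y_r^max].
    """
    y_c = 0.5 * (yl0_c + yr0_c)                       # Eq. (25)
    ext = np.empty((len(z_levels), 4))
    prev = np.array([yl0_c, yl0_c, yr0_c, yr0_c])     # Eq. (28)
    ext[0] = prev
    for k in range(1, len(z_levels)):
        Y = contour_yz[np.isclose(contour_yz[:, 1], z_levels[k]), 0]  # Eq. (26)
        YL, YR = Y[Y <= y_c], Y[Y > y_c]              # Eq. (27)
        if YL.size and YR.size:                       # Eq. (29)
            prev = np.array([YL.min(), YL.max(), YR.min(), YR.max()])
        ext[k] = prev  # otherwise the previous extrema are inherited
    return ext
```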

To characterize the structural transition of the tower head, the contour-difference function is defined as

\Delta(z) = y_{r}^{\min}(z) - y_{l}^{\max}(z).

(30)

Its discrete first difference is further written as

\nabla\Delta(z) = \Delta(z + \Delta z) - \Delta(z).

(31)

The height at which $\nabla\Delta(z)$ attains its minimum is used to locate the recessed-top transition:

z_{t} = \arg\min_{z} \nabla\Delta(z).

(32)

Meanwhile, the set of positive jump layers is defined as

J = \{z \mid \nabla\Delta(z) \geq \tau_{c}\},

(33)

where $\tau_{c}$ is a positive jump threshold. The layers in $J$ correspond to abrupt outward contour expansion, which is mainly associated with the lower edges of cross-arm regions.
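Once the extrema are available per layer, Equations (30)–(33) are a forward difference and two array reductions. In this hypothetical helper, layers are indexed from bottom to top, `k_t` is the index of $z_{t}$, and the top layer is never in $J$ because it has no forward difference; if several layers share the minimum, `np.argmin` returns the lowest one, a tie rule the text does not specify.

```python
import numpy as np


def head_transitions(yl_max, yr_min, tau_c):
    """Recessed-top transition and positive jump layers, Eqs. (30)-(33).

    yl_max, yr_min: per-layer inner contour extrema, ordered bottom to top.
    Returns (k_t, jump): the layer index of z_t and a boolean mask of J.
    """
    delta = np.asarray(yr_min, float) - np.asarray(yl_max, float)  # Eq. (30)
    grad = np.diff(delta)        # Eq. (31): grad[k] = Delta(z_{k+1}) - Delta(z_k)
    k_t = int(np.argmin(grad))   # Eq. (32): first minimum if several
    jump = np.zeros(delta.size, dtype=bool)
    jump[:-1] = grad >= tau_c    # Eq. (33): top layer has no forward difference
    return k_t, jump
```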

Based on these contour transitions, the high-region front-view envelope is constructed piecewise as

(y_{l}^{h}(z), y_{r}^{h}(z)) = \begin{cases} (y_{l}^{\min}(z + \Delta z),\ y_{r}^{\max}(z + \Delta z)), & z \in J, \\ (y_{l}^{\min}(z + 2\Delta z),\ y_{r}^{\max}(z + 2\Delta z)), & z + \Delta z \in J, \\ (y_{l}^{\min}(z),\ y_{r}^{\max}(z)), & z \geq z_{t},\ z \notin J,\ z + \Delta z \notin J, \\ (y_{l}^{\max}(z),\ y_{r}^{\min}(z)), & z < z_{t},\ z \notin J,\ z + \Delta z \notin J. \end{cases}

(34)

In Equation (34), the outer contour pair is adopted around positive jump layers to preserve the lower structural edges of the cross-arms. The outer contour pair is also used above the recessed top transition, whereas the inner contour pair is used below this transition when no positive jump is detected.
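Equation (34) can be read as a per-layer selection rule. The following sketch applies the four cases in the order written (so a layer in $J$ takes precedence over the second case) and clamps the look-ahead index at the top layer; both details, the function name, and the input layout (an `(n, 4)` array ordered as $[y_{l}^{\min}, y_{l}^{\max}, y_{r}^{\min}, y_{r}^{\max}]$ per layer) are assumptions.

```python
import numpy as np


def front_high_envelope(ext, k_t, jump):
    """Piecewise high-region front-view envelope, Eq. (34).

    ext:  (n, 4) array of [y_l^min, y_l^max, y_r^min, y_r^max] per layer.
    k_t:  layer index of the recessed-top transition z_t.
    jump: boolean mask of the positive jump layers J.
    """
    n = len(ext)
    yl, yr = np.empty(n), np.empty(n)
    for k in range(n):
        if jump[k]:                        # z in J
            src, outer = k + 1, True
        elif k + 1 < n and jump[k + 1]:    # z + delta z in J
            src, outer = k + 2, True
        else:
            src, outer = k, k >= k_t       # outer pair above z_t, inner below
        src = min(src, n - 1)              # guard at the topmost layer
        if outer:
            yl[k], yr[k] = ext[src, 0], ext[src, 3]  # (y_l^min, y_r^max)
        else:
            yl[k], yr[k] = ext[src, 1], ext[src, 2]  # (y_l^max, y_r^min)
    return yl, yr
```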

For the low and central regions, the front-view envelope is directly extrapolated from the central region structural references. Therefore, the complete front-view envelope is given by

(y_{l}(z), y_{r}(z)) = \begin{cases} (y_{l}^{0}(z),\ y_{r}^{0}(z)), & z < z_{c}^{+}, \\ (y_{l}^{h}(z),\ y_{r}^{h}(z)), & z \geq z_{c}^{+}. \end{cases}

(35)

The front-view admissible set is written as

E_{y}(z) = [y_{l}(z) - \delta_{y},\ y_{r}(z) + \delta_{y}],

(36)

where $\delta_{y}$ is the front-view tolerance margin. Points outside this admissible range are removed in the front-view filtering step. Compared with the side-view refinement, the front-view refinement is more sensitive to the characteristic lateral variations produced by the tower head and cross-arms, and therefore plays a complementary role in suppressing residual interference around the upper part of the tower. The complete process is illustrated in Figure 8.

By combining the side-view and front-view constraints, the dual-view refined point set is obtained as

T_{mv} = \{(x, y, z) \in P_{c} \mid x \in E_{x}(z),\ y \in E_{y}(z)\},

(37)

where $P_{c}$ denotes the candidate tower point set. This dual-view refinement removes a large proportion of non-tower points around the tower body and tower head while preserving the main structural completeness of the tower.
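Equation (37) amounts to two interval tests per point, with each envelope evaluated at the point's height. This minimal sketch assumes the envelopes are sampled at increasing layer heights and evaluates them at arbitrary $z$ by linear interpolation (the text does not state how envelopes are queried between layers); the function name is hypothetical and the tolerances are passed explicitly, e.g. values within the recommended 0.2–0.5 m range.

```python
import numpy as np


def dual_view_filter(P, z_levels, xl, xr, yl, yr, delta_x, delta_y):
    """Keep points of P_c inside both admissible sets, Eq. (37).

    P: (N, 3) pose-normalized points (x: side-view axis, y: front-view axis).
    z_levels: increasing layer heights; xl, xr, yl, yr: envelopes per layer.
    """
    x, y, z = P[:, 0], P[:, 1], P[:, 2]
    # Evaluate each envelope at every point height by linear interpolation.
    xl_z, xr_z = np.interp(z, z_levels, xl), np.interp(z, z_levels, xr)
    yl_z, yr_z = np.interp(z, z_levels, yl), np.interp(z, z_levels, yr)
    in_ex = (x >= xl_z - delta_x) & (x <= xr_z + delta_x)  # Eq. (24)
    in_ey = (y >= yl_z - delta_y) & (y <= yr_z + delta_y)  # Eq. (36)
    return P[in_ex & in_ey]
```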

3.3.3. Base-Constrained Removal of Residual Ground and Vegetation Points

After dual-view refinement, most conductors, insulators, and external clutter have been removed. Nevertheless, residual ground points and low vegetation may still remain within the footprint of the tower base. This type of interference is difficult to eliminate using only height or contour information because it is located inside the tower projection. Therefore, a final base-constrained refinement is performed by propagating the principal structural axes estimated from the central region toward the tower foot.

Let $\Delta z_{b}$ denote the vertical search step for base reconstruction. A low-region height sequence is first defined as

Z_{b} = \{z_{\min} - m_{l} + n\,\Delta z_{b} \mid n = 0, 1, \dots;\ z_{\min} - m_{l} + n\,\Delta z_{b} \leq z_{c}^{\mathrm{top}}\},

(38)

where $z_{\min}$ is the minimum height of the dual-view refined point set $T_{mv}$, $z_{c}^{\mathrm{top}}$ is the top height of the identified central region, and $m_{l}$ is a lower-margin offset that extends the search slightly below the lowest observed structural contact.

For each principal structural axis $L_{a}(z)$, the proximity between the downward-extrapolated axis and the dual-view refined point set is evaluated over $Z_{b}$:

d_{a}(z) = \min_{p \in T_{mv}} \lVert p - L_{a}(z) \rVert_{2}, \quad a = 1, 2, 3, 4.

(39)

The first reliable contact between the $a$-th axis and the refined point set is detected using the axis-neighborhood threshold $\tau_{a}$. The lower height of the corresponding supporting component is then estimated as

z_{a}^{-} = \min\{z \in Z_{b} \mid d_{a}(z) \leq \tau_{a}\} - m_{l},

(40)

and the lower vertex is defined as

v_{a}^{-} = L_{a}(z_{a}^{-}), \quad a = 1, 2, 3, 4.

(41)

If no reliable contact is detected for an axis, $z_{\min}$ is used as a conservative fallback for $z_{a}^{-}$. This strategy avoids directly selecting isolated low points, which may correspond to residual ground or vegetation rather than true tower foot points.
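The contact search of Equations (39)–(41) is sketched below for a single axis. The straight-line axis model $L_{a}(z) = (x_{0} + k_{x} z,\ y_{0} + k_{y} z,\ z)$ and the helper names are assumptions for illustration; the brute-force nearest-distance computation suits only small point sets, and a KD-tree query would be the usual choice for full towers.

```python
import numpy as np


def axis_point(p0, slope, z):
    """Hypothetical straight-axis model: L_a(z) = (x0 + kx*z, y0 + ky*z, z)."""
    z = np.asarray(z, dtype=float)
    return np.stack([p0[0] + slope[0] * z, p0[1] + slope[1] * z, z], axis=-1)


def lower_vertex(p0, slope, T_mv, Z_b, tau_a, m_l):
    """Contact search along one extrapolated axis, Eqs. (39)-(41)."""
    Z_b = np.asarray(Z_b, dtype=float)
    L = axis_point(p0, slope, Z_b)  # (|Z_b|, 3) samples of the axis
    # Eq. (39): brute-force nearest distance from each axis sample to T_mv.
    d = np.linalg.norm(L[:, None, :] - T_mv[None, :, :], axis=2).min(axis=1)
    hits = np.flatnonzero(d <= tau_a)
    if hits.size:
        z_lower = Z_b[hits[0]] - m_l   # Eq. (40): first contact minus margin
    else:
        z_lower = T_mv[:, 2].min()     # fallback: z_min
    return z_lower, axis_point(p0, slope, z_lower)  # Eq. (41)
```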

Next, for each pair of adjacent principal axes, an inter-axis structural center line is defined as

C_{a}(z) = \frac{1}{2}\left(L_{a}(z) + L_{a+1}(z)\right), \quad a = 1, 2, 3, 4,

(42)

where the cyclic convention $L_{5} \equiv L_{1}$ is adopted. The proximity profile of each inter-axis center line is computed as

d_{a}^{c}(z) = \min_{p \in T_{mv}} \lVert p - C_{a}(z) \rVert_{2}, \quad a = 1, 2, 3, 4.

(43)

The upper boundary of the tower base is inferred from discontinuities in the center-line contact profiles. Let

I_{a} = \{i \mid d_{a}^{c}(z_{i}) \leq \tau_{a},\ z_{i} \in Z_{b}\}

(44)

be the set of contact indices on the $a$-th inter-axis center line. A transition candidate is detected when two adjacent contact indices are separated by a gap of at least the contact-gap threshold $g_{c}$:

i_{j+1} - i_{j} \geq \left\lceil \frac{g_{c}}{\Delta z_{b}} \right\rceil, \quad i_{j}, i_{j+1} \in I_{a}.

(45)

The corresponding transition height is recorded as

z_{a,j}^{t} = z_{i_{j+1}}.

(46)
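For one inter-axis center line, Equations (44)–(46) reduce to scanning consecutive contact indices for large gaps. In this hypothetical helper, `d_c` holds $d_{a}^{c}(z_{i})$ for each $z_{i} \in Z_{b}$. Note that `g_c / dz_b` is evaluated in floating point before the ceiling, so step sizes that are not exactly representable can occasionally round the required gap up by one.

```python
import math

import numpy as np


def transition_heights(d_c, Z_b, tau_a, g_c, dz_b):
    """Contact-gap transitions on one inter-axis center line, Eqs. (44)-(46)."""
    idx = np.flatnonzero(np.asarray(d_c) <= tau_a)   # Eq. (44): contact set I_a
    min_gap = math.ceil(g_c / dz_b)                   # Eq. (45): gap in layers
    # Eq. (46): record the height of the upper index of each large gap.
    return [float(Z_b[j1]) for j0, j1 in zip(idx[:-1], idx[1:])
            if j1 - j0 >= min_gap]
```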

Rather than directly using the first detected transition, all transition heights from the four inter-axis center lines are retained. To reduce the influence of small height fluctuations, transition heights whose mutual differences are smaller than a clustering tolerance $g_{t}$ are grouped into the same candidate set. The representative height of the $k$-th transition group is defined as

\hat{z}_{b,k} = \operatorname{median}(G_{k}),

(47)

where $G_{k}$ denotes the $k$-th transition-height group. In this study, $g_{t}$ is set to 1.0 m. If no reliable transition is detected, a fallback candidate is defined as

\hat{z}_{b} = \max_{a} z_{a}^{-} + H_{b},

(48)

where $H_{b}$ is the fallback base height.
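The grouping rule for transition heights is described only informally, so the sketch below assumes single-linkage grouping: the sorted heights are split wherever two consecutive values differ by at least $g_{t}$, and Equations (47) and (48) are then applied. `base_top_candidates` is a hypothetical name.

```python
import numpy as np


def base_top_candidates(transitions, g_t, z_lower, H_b):
    """Group transition heights and take group medians, Eqs. (47)-(48).

    Assumption: single-linkage grouping of the sorted heights with gap g_t.
    z_lower: the four lower heights z_a^-, used only by the fallback.
    """
    t = np.sort(np.asarray(transitions, dtype=float))
    if t.size == 0:
        return [max(z_lower) + H_b]                   # Eq. (48) fallback
    groups = np.split(t, np.flatnonzero(np.diff(t) >= g_t) + 1)
    return [float(np.median(g)) for g in groups]      # Eq. (47)
```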

For each candidate base-top height $\hat{z}_{b,k}$, a corresponding pyramid-like base model is reconstructed. The upper axis vertices and inter-axis midpoints are computed as

v_{a,k}^{+} = L_{a}(\hat{z}_{b,k}), \quad m_{a,k} = C_{a}(\hat{z}_{b,k}), \quad a = 1, 2, 3, 4.

(49)

The $a$-th base component under the $k$-th candidate height is represented as a tetrahedron:

B_{a,k} = \operatorname{Conv}(v_{a}^{-}, v_{a,k}^{+}, m_{a-1,k}, m_{a,k}), \quad a = 1, 2, 3, 4,

(50)

where $\operatorname{Conv}(\cdot)$ denotes the convex hull and cyclic indexing is adopted, i.e., $m_{0,k} \equiv m_{4,k}$. The reconstructed base model for the $k$-th candidate height is

B_{k} = \bigcup_{a=1}^{4} B_{a,k}.

(51)
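Membership in the tetrahedral components of Equations (50) and (51) can be tested with barycentric coordinates: a point lies in $\operatorname{Conv}(a, b, c, d)$ exactly when its coordinates with respect to $b - a$, $c - a$, $d - a$ are non-negative and sum to at most one. This sketch covers only the $p \in B_{k}$ term of Equation (52); the distance test $d(p, B_{k}) \leq \tau_{b}$ would need an additional point-to-tetrahedron distance, which is omitted. Function names and the row layout (row $a-1$ for index $a$) are assumptions.

```python
import numpy as np


def points_in_tetrahedron(P, a, b, c, d, eps=1e-9):
    """Boolean mask of points inside Conv(a, b, c, d) via barycentric coords."""
    T = np.column_stack([b - a, c - a, d - a])   # 3x3 edge matrix
    lam = np.linalg.solve(T, (P - a).T).T        # weights of b, c, d per point
    return (lam >= -eps).all(axis=1) & (lam.sum(axis=1) <= 1 + eps)


def in_base_model(P, v_lower, v_upper, mids):
    """Membership in B_k, the union of the four components, Eqs. (50)-(51).

    v_lower, v_upper, mids: (4, 3) arrays holding v_a^-, v_{a,k}^+ and m_{a,k};
    row a-1 of each array corresponds to index a in the text.
    """
    inside = np.zeros(len(P), dtype=bool)
    for a in range(4):
        # mids[a - 1] with a = 0 wraps to mids[3], i.e. m_{0,k} = m_{4,k}.
        inside |= points_in_tetrahedron(P, v_lower[a], v_upper[a],
                                        mids[a - 1], mids[a])
    return inside
```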

The final base-top height is selected by a label-free geometric criterion. For each candidate $\hat{z}_{b,k}$, base filtering is first applied to the low-region points. Points above $\hat{z}_{b,k} - m_{l}$ are preserved, while points at or below this height are retained only if they are inside or sufficiently close to the reconstructed base model:

T_{k} = \{p \in T_{mv} \mid z(p) > \hat{z}_{b,k} - m_{l}\} \cup \{p \in T_{mv} \mid z(p) \leq \hat{z}_{b,k} - m_{l},\ p \in B_{k} \lor d(p, B_{k}) \leq \tau_{b}\},

(52)

where $d(p, B_{k})$ denotes the shortest distance from point $p$ to the candidate base model, and $\tau_{b}$ is the distance tolerance.

To evaluate different candidate heights, the four lower vertices $\{v_{a}^{-}\}_{a=1}^{4}$ are fitted to an approximate tower foot plane $\Pi_{b}$. A slab with thickness $h_{p}$ is constructed around $\Pi_{b}$, and the number of retained points in $T_{k}$ that fall inside this slab is denoted by $N_{k}^{p}$. The normalized height term and normalized slab-count term are defined as

s_{1,k} = \frac{\hat{z}_{b,k} - z_{\min}}{H_{T}},

(53)

s_{2,k} = \frac{N_{k}^{p}}{\max_{q} N_{q}^{p} + \varepsilon},

(54)

where $H_{T} = z_{\max} - z_{\min}$ denotes the candidate tower height, and $\varepsilon$ is a small positive constant that avoids division by zero. The final score of the $k$-th candidate is then defined as

S_{k} = s_{1,k} + \lambda s_{2,k},

(55)

where $\lambda$ controls the relative contribution of the slab-count term. The final base-top height is selected as

z_{b} = \hat{z}_{b,k^{*}}, \quad k^{*} = \arg\min_{k} S_{k}.

(56)

If two candidates have the same score, the lower transition height is selected. This criterion favors a lower and structurally compact base-top height while penalizing candidates that retain many residual points near the tower foot plane.
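The base-top selection of Equations (53)–(56) can be sketched as two small functions. The least-squares plane fit via SVD and the slab centered on $\Pi_{b}$ (half-thickness $h_{p}/2$ on each side) are assumptions about details the text leaves open; ties in $S_{k}$ are broken toward the lower height, as stated above. The names are hypothetical, and `lam` is a required argument.

```python
import numpy as np


def slab_count(points, v_lower, h_p):
    """N_k^p: points within a slab of thickness h_p around the plane fitted to
    the four lower vertices (least squares via SVD, slab centered on plane)."""
    c = v_lower.mean(axis=0)
    normal = np.linalg.svd(v_lower - c)[2][-1]   # direction of least variance
    return int(np.sum(np.abs((points - c) @ normal) <= 0.5 * h_p))


def select_base_top(z_hat, n_slab, z_min, z_max, lam, eps=1e-9):
    """Score the candidates (Eqs. (53)-(55)) and pick the minimum (Eq. (56))."""
    z_hat = np.asarray(z_hat, dtype=float)
    n_slab = np.asarray(n_slab, dtype=float)
    s1 = (z_hat - z_min) / (z_max - z_min)        # normalized height term
    s2 = n_slab / (n_slab.max() + eps)            # normalized slab count
    S = s1 + lam * s2
    # Sort primarily by score, then by height, so ties go to the lower height.
    k_star = np.lexsort((z_hat, S))[0]
    return float(z_hat[k_star]), S
```

Increasing `lam` shifts the choice toward candidates that leave fewer points near the foot plane, which matches the behavior discussed in the sensitivity analysis.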

Using the selected $z_{b}$, the final reconstructed base model is denoted by $B = B_{k^{*}}$. The final extracted tower point set is then obtained as

T = \{p \in T_{mv} \mid z(p) > z_{b} - m_{l}\} \cup \{p \in T_{mv} \mid z(p) \leq z_{b} - m_{l},\ p \in B \lor d(p, B) \leq \tau_{b}\}.

(57)

In this formulation, $\Delta z_{b}$ controls the vertical resolution of the base search; $\tau_{a}$ determines whether an extrapolated axis or center line is in contact with the refined point set; $g_{c}$ detects discontinuities in the inter-axis contact profiles; $g_{t}$ controls the clustering of transition-height candidates; $m_{l}$ provides a downward safety margin for low-region filtering; $H_{b}$ defines the fallback base height; $h_{p}$ defines the thickness of the tower foot plane slab; $\lambda$ balances the relative-height and slab-count terms in the base-top selection score; and $\tau_{b}$ controls the strictness of the final distance-based base filtering. In this way, the principal structural axes estimated from the least-disturbed central region are propagated to the tower foot, and base filtering is converted from a purely local thresholding problem into a geometry-constrained inference problem. Points in the low region that are inconsistent with the reconstructed base geometry are identified as residual ground or low-vegetation points and removed, whereas points conforming to the base model are retained as valid tower points, as shown in Figure 9.

4. Experiments and Results

4.1. Datasets

To evaluate the proposed method, we collected UAV LiDAR scene point clouds from five OTL corridors in China. In addition, to further evaluate the generalizability of the proposed method on an independent public dataset, one point cloud scene containing an OTL corridor was selected from the WHU-Urban3D dataset [34] and used as Line6 in this study. WHU-Urban3D is a general urban-scene LiDAR point cloud dataset for semantic instance segmentation, rather than a dataset specifically designed for OTL inspection. The dataset information and acquisition characteristics are summarized in Table 1. The data acquisition platform for the five private datasets consisted of a DJI Matrice 300 RTK (UAV; SZ DJI Technology Co., Ltd., Shenzhen, China) equipped with a DJI Zenmuse L1 LiDAR sensor (SZ DJI Technology Co., Ltd., Shenzhen, China). For these private datasets, the flight altitude ranged from 50 m to 200 m above ground level (AGL), and the LiDAR scan angle/FOV was 70.4°. For the public dataset, the corresponding flight altitude, LiDAR scan angle, and voltage level were not reported in the original data source; therefore, they are marked as “NR” (Not Reported) in Table 1.

Among these datasets, Line1 corresponds to a mountainous corridor with significant elevation variation, whereas Line2 to Line5 were acquired in relatively flat areas. The terrain elevation variation of Line1 reaches 119.3 m, which is substantially larger than that of the flat private corridors, whose elevation variations range from 1.1 m to 3.7 m. The public dataset Line6 has an elevation variation of 1.9 m. The scene size ranges from approximately 2.14 million points in the public dataset to more than 160 million points in the private datasets, and each corridor contains between two and eight towers. The point density also varies considerably across datasets, from 39.08 pts/m² in Line6 to 2098.60 pts/m² in Line1. This large density difference provides a useful basis for evaluating the robustness of the method under both dense and sparse UAV LiDAR observations. In addition, the voltage levels of the private OTLs range from 110 kV to 500 kV, which leads to substantial variations in tower scale and structural configuration. Overall, the datasets cover mountainous and flat terrain, different voltage levels, different point densities, and both private and public data sources, and therefore provide a suitable testbed for evaluating the robustness of the proposed method across terrain conditions, scene scales, data densities, acquisition conditions, and tower types.

4.2. Parameterization Strategy and Sensitivity Analysis

The proposed method contains several explicit parameters associated with preprocessing, candidate tower localization, central region pose normalization, multi-view structural refinement, and base-constrained filtering. These parameters have clear geometric or algorithmic meanings, such as voxel resolution, DBSCAN neighborhood radius, envelope tolerance, and base-model distance tolerance. Table 2 summarizes the parameterization strategy used in this study. It should be noted that the values reported in the last column have different meanings depending on the parameter type. Fixed values denote the default settings used in all experiments; intervals validated by the sensitivity analysis denote stable operating ranges; and the interval of $\Delta z_{c}$ denotes the candidate search space used by the adaptive central region selection strategy. In particular, $\Delta z_{c}$ is not manually tuned for each dataset but is automatically selected according to the slope consistency of the four fitted structural axes.

4.2.1. Parameterization Strategy

In the preprocessing stage, $d_{v1}$ controls the resolution of scene down-sampling. According to the sensitivity analysis, $d_{v1} = 0.1$–0.2 m provides a favorable balance between structural-detail preservation and data reduction. Therefore, $d_{v1} = 0.1$ m is adopted as the default setting in this study to better preserve fine tower members, while $d_{v1} = 0.2$ m can be used as an efficient alternative when computational efficiency is prioritized. The SOR parameters $k_{nn}$ and $\alpha_{sor}$ are set to 10 and 5.0, respectively, to remove isolated noise without noticeably eroding slender tower components. For near-ground suppression, the grid size $d_{g}$ is fixed to 1.0 m, and the relative-height threshold $h_{f}$ is set to 15 m by default.
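For reference, the three preprocessing operations governed by $d_{v1}$, $(k_{nn}, \alpha_{sor})$, and $(d_{g}, h_{f})$ might look as follows. These are generic versions under stated assumptions, not the paper's code: voxel points are replaced by their centroid, SOR uses the common rule of dropping points whose mean k-nearest-neighbor distance exceeds the global mean plus $\alpha$ standard deviations (with a brute-force distance matrix suitable only for small clouds), and near-ground suppression keeps points at least $h_{f}$ above the lowest point of their grid cell. All function names are hypothetical.

```python
import numpy as np


def voxel_downsample(P, voxel):
    """Replace the points in each occupied voxel by their centroid (assumed)."""
    keys = np.floor(P / voxel).astype(np.int64)
    _, inv, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inv = inv.reshape(-1)  # inverse shape differs across some NumPy versions
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inv, P)
    return sums / counts[:, None]


def sor_filter(P, k_nn=10, alpha=5.0):
    """Statistical outlier removal on mean k-NN distances (brute force)."""
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    knn = np.sort(D, axis=1)[:, 1:k_nn + 1]  # skip the zero self-distance
    mean_d = knn.mean(axis=1)
    return P[mean_d <= mean_d.mean() + alpha * mean_d.std()]


def suppress_near_ground(P, d_g=1.0, h_f=15.0):
    """Drop points less than h_f above the lowest point of their d_g cell."""
    cells = np.floor(P[:, :2] / d_g).astype(np.int64)
    _, inv = np.unique(cells, axis=0, return_inverse=True)
    inv = inv.reshape(-1)
    z_min = np.full(inv.max() + 1, np.inf)
    np.minimum.at(z_min, inv, P[:, 2])       # per-cell minimum height
    return P[P[:, 2] - z_min[inv] >= h_f]
```

In the order described in the text, a scene would be down-sampled with `voxel_downsample(P, 0.1)`, cleaned with `sor_filter`, and then passed through `suppress_near_ground` before clustering.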

In the candidate localization stage, $d_{v2}$ denotes the voxel size used before DBSCAN clustering. It is fixed to 1.0 m to reduce computational cost and smooth local density fluctuations. The DBSCAN neighborhood radius $\epsilon_{db}$ and the minimum point number $M_{db}$ jointly determine the connectivity and validity of candidate clusters. Based on the stable operating intervals observed in the sensitivity analysis, $\epsilon_{db}$ is recommended within 6–10 m, and $M_{db}$ within 150–300. In the final experiments, $\epsilon_{db} = 8$ m and $M_{db} = 200$ are used as balanced default settings, considering both the dense private datasets and the sparser public dataset. The cluster-height threshold $h_{db}$ and the horizontal bounding-box expansion margin $b_{xy}$ are fixed to 15 m and 2.0 m, respectively, to reject low non-tower clusters and recover peripheral tower points.
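After clustering the coarsely voxelized cloud (for example with scikit-learn's `DBSCAN(eps=8.0, min_samples=200)`, corresponding to $\epsilon_{db}$ and $M_{db}$), the candidate boxes can be formed and projected back onto the denser cloud as sketched below. Interpreting $h_{db}$ as the vertical extent of a cluster and cropping only in the x–y plane are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def candidate_boxes(P_coarse, labels, h_db=15.0, b_xy=2.0):
    """Expanded horizontal boxes of clusters whose vertical extent >= h_db.

    labels: per-point cluster labels from a DBSCAN run; -1 marks noise.
    """
    boxes = []
    for lab in np.unique(labels):
        if lab < 0:
            continue                     # skip DBSCAN noise
        C = P_coarse[labels == lab]
        if np.ptp(C[:, 2]) < h_db:       # reject low non-tower clusters
            continue
        lo = C[:, :2].min(axis=0) - b_xy
        hi = C[:, :2].max(axis=0) + b_xy
        boxes.append((lo, hi))
    return boxes


def crop_candidates(P_dense, boxes):
    """Recover pre-extracted tower points from the denser preprocessed cloud."""
    xy = P_dense[:, :2]
    return [P_dense[np.all((xy >= lo) & (xy <= hi), axis=1)] for lo, hi in boxes]
```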

For central region pose normalization, the pre-extracted tower is sliced along the vertical direction, and local MBRs are estimated within slice windows of height $H_{c} = 2.0$ m. The thresholds $T_{\theta}$ and $T_{r}$ constrain the angular consistency and rectangular regularity of each slice, while $\gamma_{\theta}$ is used to suppress unstable slice orientations. The parameter $T_{s}$ is introduced as a structural-consistency criterion for the four fitted boundary trajectories: the maximum difference among their absolute slopes should be smaller than $T_{s}$. If this condition is not satisfied, $\Delta z_{c}$ is automatically re-selected from the candidate range listed in Table 2 until the fitted structural axes satisfy the slope-consistency criterion or the best available candidate is obtained. This strategy provides an adaptive mechanism for central region selection while preserving the explicit geometric interpretation of the fitted structural axes.
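The slope-consistency check on $T_{s}$ and the re-selection loop over candidate $\Delta z_{c}$ values can be sketched as follows. The callback `get_trajectories`, which would refit the four boundary trajectories for a given $\Delta z_{c}$, is hypothetical, as is the order in which candidates are tried; slopes come from first-degree `np.polyfit` fits.

```python
import numpy as np


def slope_spread(trajectories):
    """Maximum difference among the absolute slopes of the fitted trajectories.

    trajectories: iterable of (z, c) arrays, c being a horizontal coordinate of
    one boundary trajectory as a function of height z.
    """
    slopes = np.abs([np.polyfit(z, c, 1)[0] for z, c in trajectories])
    return float(slopes.max() - slopes.min())


def select_central_interval(candidates, get_trajectories, T_s):
    """Return the first Delta z_c passing the T_s check, else the best one."""
    best = None
    for dz_c in candidates:
        spread = slope_spread(get_trajectories(dz_c))
        if spread < T_s:
            return dz_c, spread
        if best is None or spread < best[1]:
            best = (dz_c, spread)
    return best
```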

In the multi-view structural refinement stage, $\delta_{x}$ and $\delta_{y}$ are the tolerance margins in the side-view and front-view admissible sets $E_{x}(z)$ and $E_{y}(z)$, respectively. They compensate for local point sparsity, minor envelope-estimation errors, and small structural deviations. The recommended range of both parameters is 0.2–0.5 m. The front-view contour-jump threshold $T_{j}$ is used to detect abrupt contour variations associated with tower head and cross-arm transitions, and $T_{j} = 2.0$ m is adopted as the default value.

For base-constrained filtering, $\tau_{a}$ defines the axis-neighborhood threshold for searching structural contacts along the principal axes. The parameters $\Delta z_{b}$, $g_{c}$, $g_{t}$, $m_{l}$, and $H_{b}$ control the vertical search resolution, contact-gap detection, clustering of transition-height candidates, lower-margin offset, and fallback base height, respectively. The parameter $h_{p}$ defines the thickness of the slab around the tower foot plane, which is used to evaluate candidate base-top heights. The weight $\lambda$ balances the relative-height term and the normalized slab-count term in the base-top selection score. According to the sensitivity analysis, small values of $\lambda$ provide more stable results, and $\lambda = 0.2$ is adopted as a balanced default setting in this study. The distance tolerance $\tau_{b}$ determines whether a low-region point is sufficiently close to the reconstructed base model $B$. Based on the sensitivity results, $\tau_{b}$ is recommended within 0.6–1.0 m, and 0.8 m is used as the default setting.

4.2.2. Sensitivity Analysis

To quantitatively evaluate the influence of key parameters and the robustness of the selected settings, a one-factor-at-a-time sensitivity analysis was conducted. Nine representative parameters were selected: $M_{db}$, $\delta_{x}$, $\delta_{y}$, $\epsilon_{db}$, $h_{f}$, $T_{j}$, $\tau_{b}$, $\lambda$, and $d_{v1}$. In each experiment, only one parameter was varied within a practical range, while all other parameters were fixed at their default values. The experiments were conducted on all test lines. The mean F1-score was used to evaluate average extraction accuracy, and the standard deviation was used to indicate cross-scene stability.
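The one-factor-at-a-time protocol can be expressed as a small loop. Here `evaluate` is a hypothetical callback returning the F1-score of one line for a given parameter dictionary, and the population standard deviation is assumed for cross-scene stability, since the text does not say which variant was reported.

```python
import numpy as np


def ofat_sensitivity(defaults, grid, lines, evaluate):
    """One-factor-at-a-time sweep: vary one parameter, keep others at default.

    evaluate(line, params) -> F1-score of that line (hypothetical callback).
    Returns {(name, value): (mean F1, std F1 across lines)}.
    """
    results = {}
    for name, values in grid.items():
        for v in values:
            params = dict(defaults, **{name: v})  # copy; defaults stay untouched
            f1 = np.array([evaluate(line, params) for line in lines])
            results[(name, v)] = (float(f1.mean()), float(f1.std()))
    return results
```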

As shown in Figure 10, most parameters exhibit stable operating intervals rather than isolated optimal values. For candidate localization, $M_{db}$ remains stable within 150–300, with a slightly higher mean F1-score around 250. The DBSCAN radius $\epsilon_{db}$ yields high F1-scores within 6–9 m, while a larger value slightly decreases the mean F1-score and increases cross-scene variation. This is consistent with the role of DBSCAN: an overly large radius may merge tower candidates with nearby clutter, whereas an excessively small radius may fragment tower points.

For preprocessing, $h_{f}$ performs stably within 10–20 m, but the F1-score decreases noticeably when $h_{f}$ is increased to 25 m. This indicates that an overly large near-ground suppression threshold may remove valid lower tower structures or reduce candidate completeness. The scene down-sampling voxel size $d_{v1}$ also has a clear effect on the result. Values of 0.1–0.2 m preserve more structural details and yield higher F1-scores, whereas larger voxel sizes lead to geometric-detail loss and reduced extraction accuracy.

For multi-view structural refinement, $\delta_{x}$ and $\delta_{y}$ are relatively insensitive within 0.2–0.5 m, indicating that the inferred side-view and front-view envelopes provide stable geometric constraints. The slightly lower performance at 0.1 m suggests that an overly strict tolerance may remove valid tower points near the inferred boundary. The front-view contour-jump threshold $T_{j}$ has the weakest influence on the final F1-score, suggesting that the front-view refinement is governed mainly by the overall contour-envelope structure rather than by a single jump threshold.

For base-constrained filtering, $\tau_{b}$ has a clearer influence on the final extraction result. Increasing $\tau_{b}$ relaxes the base constraint and may retain more residual ground or low-vegetation points near the tower foot, leading to a gradual decrease in F1-score. Therefore, a moderate range of 0.6–1.0 m is preferred. The base-top selection weight $\lambda$ also affects the final result. When $\lambda$ is small, the score is governed mainly by the relative transition height, and the F1-score remains relatively stable. As $\lambda$ increases, the slab-count term becomes more dominant, which may over-penalize candidate heights with more structural points near the tower foot plane and reduce extraction accuracy. Therefore, $\lambda$ is recommended to be selected within a small range, and $\lambda = 0.1$ is used as a balanced setting in this study.

Overall, the sensitivity analysis supports the parameter ranges reported in Table 2. The results indicate that the final settings are selected from stable operating intervals rather than over-tuned to a specific dataset. Therefore, the parameterization strategy combines fixed engineering-scale parameters, adaptive selection for the central region slicing interval, and sensitivity-validated stable ranges for key parameters. This design improves the practical robustness of the proposed method while maintaining the interpretability of the overall pipeline.

4.3. Candidate Tower Localization Results

After scene preprocessing, height suppression, and density-based clustering, candidate tower regions were successfully localized in all five datasets. The localization results are shown in Figure 11. It should be noted that the objective of this stage is not to perform precise ground filtering, but to rapidly suppress near-ground clutter and preserve tower separability before clustering. For this reason, we adopted a grid-based height suppression strategy rather than more elaborate ground filtering methods. This design is computationally efficient for large corridor scenes and is also effective in the presence of continuous low vegetation, which is common in practical OTL inspection scenarios.

After clustering, the candidate bounding boxes were projected back onto the denser preprocessed scene point cloud to recover more complete pre-extracted tower point sets. Therefore, the output of this stage provides reliable tower localization together with sufficiently complete candidate regions for the subsequent structural refinement.

4.4. Final Tower Extraction Results

Following candidate tower localization, each pre-extracted tower was further processed by central region-guided pose normalization, multi-view structural refinement, and base-constrained filtering. The final extraction results are shown in Figure 12. Note that the extracted point sets in this study intentionally exclude insulators, which is consistent with the objective of tower-focused acceptance and geometric analysis. In the proposed method, side-view refinement can preserve some insulators that hang approximately perpendicular to the ground, whereas insulators that are nearly parallel to the ground may be confused with conductors. Likewise, retaining more points in the upper front-view region may preserve additional insulator points, but it also introduces more conductor points. Because insulators are not the target object in this study and may interfere with tower-focused inspection, they were excluded from the final extracted tower point sets.

4.5. Quantitative Comparison with Representative Methods

To quantitatively evaluate the tower extraction performance, the tower points in each dataset were manually annotated using CloudCompare. The annotated tower points were used as the positive class, while the remaining scene points were treated as the negative class. For each OTL, the confusion counts of all towers were first accumulated, and Precision, Recall, and F1-score were then computed from the accumulated TP, FP, and FN. Specifically, TP denotes the number of correctly extracted tower points, FP denotes the number of non-tower points incorrectly extracted as tower points, and FN denotes the number of tower points missed by the extraction result. TN was not used in the main evaluation because the number of non-tower scene points is much larger than the number of tower points and may dominate accuracy-based metrics. The evaluation metrics are defined as

\text{Precision} = \frac{TP}{TP + FP},

(58)

\text{Recall} = \frac{TP}{TP + FN},

(59)

\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.

(60)
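The evaluation protocol accumulates confusion counts over all towers of a line before computing Equations (58)–(60), and then macro-averages over lines. A minimal sketch with hypothetical function names; returning 0 for an empty denominator is an assumed convention.

```python
def prf_from_counts(counts):
    """Line-level Precision, Recall, F1 from per-tower (TP, FP, FN) counts,
    accumulated over all towers of the line before applying Eqs. (58)-(60)."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(per_line):
    """Macro-average of (P, R, F1) tuples: each OTL contributes equally."""
    return tuple(sum(m[i] for m in per_line) / len(per_line) for i in range(3))
```

Accumulating counts before dividing means a line's score is dominated by its larger towers, whereas the macro-average across lines weights every OTL equally.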

To ensure a fair comparison, the representative comparison methods that could be reproduced were re-implemented and evaluated on the same Line1–Line6 datasets. These methods include deep learning methods, feature-based supervised methods, and unsupervised methods. For supervised deep learning and feature-based methods, a leave-one-line-out cross-validation protocol was adopted. In each fold, one complete OTL was used as the test set, while the remaining lines were used for training. No point or tower from the test line was used during training or validation. The proposed method and the unsupervised methods do not require training and were directly applied to each test line.

All methods were evaluated using the same point-level metrics. The reported average values were computed as the macro-average over Line1–Line6, so that each OTL contributed equally to the final average. This protocol avoids the unfairness caused by literature-reported results obtained from different datasets or inconsistent evaluation metrics.


As shown in Table 3, the proposed method achieves the highest average Precision, Recall, and F1-score among all reproduced methods. Compared with the best representative method, the proposed method improves the average F1-score from 89.46% to 97.07%. The improvement is mainly attributed to the coarse-to-fine shape-prior-guided design, which first localizes candidate tower regions and then uses the least-disturbed central body, multi-view envelopes, and base-constrained filtering to suppress surrounding clutter while preserving tower completeness. In addition, visual results of the representative methods are provided in Appendix A; they further illustrate the differences in extraction completeness, residual clutter, and tower-structure preservation among the methods.

To further examine whether the performance improvement is statistically significant, a one-sided Wilcoxon signed-rank test was conducted on the line-level F1-scores between the proposed method and each representative method. The null hypothesis was that the proposed method does not achieve higher F1-scores than the comparison method. The statistical test results are listed in Table 4.

As shown in Table 4, the proposed method achieved an average F1-score of 97.07%, which is higher than all representative methods. The p-values are all smaller than 0.05, indicating that the proposed method achieves statistically significant improvements over the representative methods in terms of line-level F1-score.

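It should be noted that the Wilcoxon signed-rank test was conducted on only six paired line-level F1-scores. With such a small sample, the exact null distribution of the test statistic is discrete: under the null hypothesis, each of the $2^6 = 64$ sign assignments of the ranked differences is equally likely. When the proposed method achieves higher F1-scores than a comparison method on all six lines, only one of these 64 assignments is as extreme as the observed one, so the one-sided exact p-value reaches its minimum value of $1/64 \approx 0.0156$. Therefore, identical p-values may appear even when the average F1-score improvements differ.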
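The minimum p-value can be checked by brute-force enumeration. The sketch below computes the exact one-sided p-value of the Wilcoxon signed-rank test by enumerating all $2^n$ sign assignments; zero differences are dropped and tied absolute differences receive average ranks. The input values are hypothetical integers chosen only to exercise two cases (all differences positive, and one small negative difference); they are not the F1-scores in Table 4. For routine use, `scipy.stats.wilcoxon(x, y, alternative="greater")` provides a library implementation of the same test.

```python
from itertools import product


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_exact_greater(x: list[float], y: list[float]) -> tuple[float, float]:
    """Exact one-sided test of H1: x tends to exceed y. Returns (W+, p)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_obs = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 every sign pattern of the ranks is equally likely.
    count = total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if w >= w_obs - 1e-12:  # patterns at least as extreme as observed
            count += 1
    return w_obs, count / total


if __name__ == "__main__":
    print(wilcoxon_exact_greater([10, 20, 30, 40, 50, 60], [9, 18, 27, 36, 45, 54]))
```

When all six differences are positive, $W^+ = 1 + 2 + \dots + 6 = 21$ and only the all-positive pattern reaches it, giving $p = 1/64$. A single negative difference at the smallest rank lowers $W^+$ to 20 and doubles the p-value to $2/64$.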

4.6. Tower-Level Extraction Accuracy and Completeness

To explicitly evaluate the extraction quality of individual tower structures, tower-level Precision, Recall, and F1-score were computed for each annotated tower, as shown in Table 5. In this evaluation, Precision measures the accuracy of the extracted tower point cloud, namely the proportion of extracted points that truly belong to the tower. Recall measures the completeness of the extracted tower structure, namely the proportion of ground-truth tower points that are successfully preserved. F1-score provides a balanced measure of accuracy and completeness.
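A minimal sketch of these tower-level definitions is given below. It assumes that the predicted and annotated labels of one tower are available as per-point boolean arrays over the same set of points; how points are assigned to an individual tower's evaluation region is an assumption here, not a detail taken from the method.

```python
import numpy as np


def tower_level_scores(pred: np.ndarray, gt: np.ndarray) -> dict[str, float]:
    """Precision, Recall and F1-score for one tower from per-point labels.

    pred: True where the method labels a point as this tower.
    gt:   True where the annotation labels the point as this tower.
    Both arrays index the same points (assumed per-tower evaluation region).
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.count_nonzero(pred & gt)   # tower points correctly extracted
    fp = np.count_nonzero(pred & ~gt)  # clutter wrongly kept (lowers Precision)
    fn = np.count_nonzero(~pred & gt)  # tower points lost (lowers Recall)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": float(precision), "recall": float(recall), "f1": float(f1)}
```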

The results show that the proposed method achieves stable single-tower extraction performance across Line1–Line6. Most towers have F1-scores higher than 95%, and the Recall values are generally high, indicating that the main structural points of individual towers are well preserved. This confirms that the proposed structural refinement strategy not only performs well at the line level but also maintains high completeness for single tower structures. The relatively lower Precision values for several towers, especially in Line6, are mainly caused by residual non-tower points near the tower base or tower head regions, where electrical components, vegetation, or near-ground points are spatially close to the tower structure.

4.7. Error Analysis and Computational Efficiency

The experimental setup is introduced first to clarify the computational environment. Except for the deep learning methods, all experiments were conducted on a server equipped with a 2.5 GHz Intel Core i7-11700 CPU, 32 GB RAM, and an NVIDIA RTX 3060Ti G6X GPU (CUDA 11.3). Because this platform could not efficiently support all deep learning methods, the deep learning experiments were conducted on a separate server running Ubuntu 22.04.5 LTS and equipped with an Intel Xeon Gold 6226R CPU, 128 GB RAM, and two NVIDIA GeForce RTX 3090 GPUs. The implementation was written in Python 3.10. Because the two platforms differ, the efficiency comparison should be interpreted together with this hardware information.

As shown in Table 3, the proposed method generally yields high Recall, indicating that most tower points are preserved during structural refinement. Representative error patterns are illustrated in Figure 13. FP points are mainly concentrated around the tower base and around the junction between tower cross-arms and conductors. Most of the FPs at the tower foot can be attributed to low data quality and the difficulty of precisely distinguishing tower base points from near-ground points during manual annotation. By contrast, FN points are mostly scattered over the tower surface. These errors are mainly caused by local point sparsity, which may shift the inferred envelope slightly inward during structural refinement and therefore remove a small number of valid tower points.

Although a small number of FNs remain, their spatial distribution is sparse and mostly confined to the tower surface, which has limited influence on the overall tower geometry. Therefore, the extracted tower point sets remain suitable for subsequent tasks such as tower-oriented inspection, geometric analysis, and path planning.

Since corridor acceptance and UAV-based OTL inspection are time-sensitive, computational efficiency is also an important performance criterion. To provide a fairer efficiency comparison, we compared the training time, inference time, and normalized inference time of the representative methods and the proposed method on Line1–Line6, as shown in Table 6. The normalized inference time was computed as the inference time divided by the total number of points in Line1–Line6, expressed in millions of points (s/Mpts). For supervised deep learning and feature-based methods, training time is reported separately as an offline cost and is not included in the inference-time comparison. Unsupervised methods require no training stage.
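The normalization is a single division; the sketch below spells it out. The point counts and timing in the usage note are hypothetical and are not the values reported in Table 6.

```python
def normalized_inference_time(inference_time_s: float, points_per_line: list[int]) -> float:
    """Inference time divided by the total point count in millions (s/Mpts)."""
    total_mpts = sum(points_per_line) / 1_000_000
    if total_mpts == 0:
        raise ValueError("total point count must be positive")
    return inference_time_s / total_mpts
```

For example, a hypothetical run of 300 s over two lines with 50 and 100 million points gives 300 / 150 = 2.0 s/Mpts, which makes methods comparable even when they are timed on corridors of different sizes.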

As shown in Table 6, the proposed method has the shortest inference time and the lowest normalized inference time among the compared methods. This is mainly because the proposed method avoids global point-wise semantic inference and instead uses a coarse-to-fine geometric filtering strategy: it first reduces the search space through candidate localization and then performs structural refinement only within the candidate regions. Compared with supervised deep learning and feature-based methods, the proposed method requires neither offline training nor large-scale annotated training data, which is advantageous in practical engineering scenarios where labeled tower point clouds are limited. Even though the deep learning methods were run on a separate platform with more GPU and memory resources, the proposed method still achieves the lowest normalized inference time, suggesting that it is computationally efficient for large-scale UAV LiDAR corridor processing.

4.8. Ablation Study

To verify the contribution of the main components in the proposed method, an ablation study was conducted on Line1–Line6. Three key modules were evaluated: central region guidance, dual-view structural refinement, and base-constrained filtering. For each ablation variant, only the corresponding module was removed or simplified, while the remaining settings were kept the same as the full model. The evaluation metrics were computed using the same point-level Precision, Recall, and F1-score definitions as in the quantitative comparison, and the reported values are macro-averages over Line1–Line6.

The first variant removes the central region guidance. Instead of identifying the least-disturbed central body, the whole candidate point cloud is directly used for pose estimation and structural reference fitting. This setting is used to evaluate whether the central region selection is necessary for robust tower orientation estimation. The second variant removes the dual-view refinement by retaining only the side-view envelope constraint and disabling the front-view refinement. This setting evaluates the complementary contribution of side-view and front-view constraints. The third variant removes the geometric base-constrained filtering. In this case, the reconstructed base model is replaced by a simple height threshold filtering strategy, where points lower than 1 m above the minimum height of the refined candidate point cloud are removed. This setting evaluates whether the proposed base geometry provides additional benefits over simple low-height filtering.
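The simplified low-region filter used by the third variant can be written in a few lines. The sketch below is only this ablation baseline, not the proposed base-constrained filter; it assumes points are rows of (x, y, z) in metres, and the function name is hypothetical.

```python
import numpy as np


def height_threshold_baseline(points: np.ndarray, offset_m: float = 1.0) -> np.ndarray:
    """Ablation baseline: drop every point lower than `offset_m` above the
    minimum height of the refined candidate cloud (rows are x, y, z in m)."""
    z = points[:, 2]
    keep = z >= z.min() + offset_m
    return points[keep]
```

Because the cut is a single horizontal plane, it cannot follow terrain slope within the footprint or the geometry of the tower legs, which may help explain why this variant retains more low-region clutter than the geometry-based filter.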

The results in Table 7 show that all three modules contribute to the final extraction performance. Removing the central region guidance causes the most severe degradation, reducing the average F1-score from 97.21% to 43.61%. This indicates that directly using the whole candidate point cloud for pose estimation is unreliable because the candidate region may contain conductors, insulators, ground points, vegetation, and other surrounding clutter. These interference points disturb the estimation of tower orientation and structural axes, leading to error propagation in the subsequent refinement stages. Therefore, identifying the least-disturbed central region is essential for robust pose normalization and structural inference.

When the dual-view refinement is replaced by side-view-only refinement, the average F1-score decreases from 97.21% to 94.23%. The Recall remains high, but the Precision decreases from 96.51% to 90.41%. This suggests that the side-view envelope can preserve most tower points, but it is insufficient to remove all non-tower points, especially those that overlap with the tower structure in the side-view projection. The front-view refinement provides complementary constraints and helps suppress residual conductors, insulators, and tower head interference points.

Removing the geometric base-constrained filtering also leads to a clear performance decrease. When the base model is replaced by simple height threshold filtering, the average F1-score decreases from 97.21% to 89.20%, and the Precision drops from 96.51% to 81.77%. This demonstrates that a simple low-height threshold cannot effectively distinguish true tower-base points from residual ground or low-vegetation points. In contrast, the proposed base-constrained filtering uses the extrapolated structural axes and reconstructed base geometry to preserve valid tower-base points while removing geometrically inconsistent low-region clutter.

Overall, the ablation results confirm the necessity of the proposed coarse-to-fine design. The central region guidance provides reliable pose and structural references, the dual-view refinement offers complementary envelope constraints, and the base-constrained filtering improves the accuracy of low-region extraction. The full model achieves the best overall F1-score, demonstrating that these modules work cooperatively to improve both the accuracy and completeness of transmission tower extraction.

5. Discussion

5.1. Applicability Boundary for Different Tower Configurations

The proposed method is designed based on explicit geometric priors of transmission towers. In particular, the central region identification strategy assumes that the tower body contains a relatively regular and approximately quadrangular structural region. This assumption is reasonable for the typical self-supporting angle-steel towers considered in the main experiments, including T-shaped and dry-type towers. Under this condition, the least-disturbed central body can be used to estimate the tower orientation and principal structural axes, which further support side-view and front-view envelope inference.

However, this assumption may not always hold for all tower configurations, such as portal-type, wine-glass-type, narrow-base, single-pole, compact, or steel tubular combination towers. For these structures, the tower body may not provide a stable quadrangular cross-section, and the main structural axes and envelope surfaces may differ from those assumed in the current framework. In such cases, the rectangular-regularity-based central region identification may fail to locate a reliable central interval, and errors may propagate to pose normalization, multi-view envelope construction, and base-constrained filtering. Therefore, the current method should not be interpreted as a universal extraction framework for all transmission tower types.

To provide an initial evaluation on another tower configuration, an additional experiment was conducted on a cat-head-type tower selected from the public WHU-Urban3D dataset [34]. The proposed method achieved a Precision of 95.44%, a Recall of 97.50%, and an F1-score of 96.46%. This result indicates that the central region-based strategy transfers to some extent when the tower body retains a sufficiently regular central structure. In this case, the main tower body and most structural members were effectively extracted.

Nevertheless, the cat-head-type tower also reveals a typical limitation of the current multi-view envelope refinement. As shown in Figure 14, a short conductor segment and an insulator string are located near the middle of the tower head. Since these electrical components overlap with the tower head structure in both the side-view and front-view projections, they are difficult to remove using only 2D envelope constraints. As a result, a small number of non-tower points remain in the extracted result, which mainly affects precision rather than recall. This observation suggests that the central region strategy can be transferred to towers with regular central bodies, but the tower head refinement still requires tower-type-aware or component-level constraints for more complex configurations.

For non-standard tower configurations, the shape prior should be redesigned according to the corresponding structural form. For example, portal-type towers may require multi-leg or frame-level structural modeling; wine-glass-type and cat-head-type towers may require tower head-specific envelope constraints; and single-pole or steel tubular towers may be better represented by circular or elliptical fitting than by minimum bounding rectangles. In addition, for tower heads where conductors, jumper wires, or insulator strings overlap with structural members in projection, additional 3D cues such as local connectivity, rod-like member fitting, conductor direction priors, or insulator-string descriptors may be required. Developing a tower-type-aware extraction framework with adaptive geometric priors and component-level refinement is an important direction for future work.
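As an illustration of the circular alternative mentioned for single-pole or steel tubular towers, the sketch below fits a circle to the points of one horizontal slice with the algebraic (Kasa) least-squares method. This is a generic technique swapped in for illustration and is not part of the proposed method. The Kasa fit tends to underestimate the radius when only a short arc is scanned, so a geometric refinement may be needed for partially observed poles.

```python
import numpy as np


def fit_circle_kasa(xy: np.ndarray) -> tuple[float, float, float]:
    """Algebraic least-squares circle fit to 2D slice points (Kasa method).

    Solves x^2 + y^2 + D*x + E*y + F = 0 for (D, E, F) in the least-squares
    sense and converts to centre (cx, cy) and radius r.
    """
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x ** 2 + y ** 2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    r = np.sqrt(cx ** 2 + cy ** 2 - F)  # from F = cx^2 + cy^2 - r^2
    return float(cx), float(cy), float(r)
```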

5.2. Potential of Adaptive Parameter Selection

Although the proposed method adopts explicit and interpretable parameters, further adaptive parameter selection based on scene statistics is feasible and deserves investigation. In the current study, most key parameters are determined from physically meaningful settings and sensitivity-validated stable intervals. This design maintains the transparency and stability of the engineering pipeline. However, some parameters could potentially be estimated automatically from the statistical characteristics of the input point cloud.

For example, the near-ground suppression threshold $h_f$ could be adaptively determined from the distribution of local relative heights within horizontal grids, rather than being fixed to a predefined value. A percentile-based strategy, such as selecting a high percentile of the relative-height distribution, may help adapt the ground-suppression strength to different terrain undulations and vegetation conditions. Similarly, the envelope tolerance parameters $\delta_x$ and $\delta_y$ could be related to the point density, voxel size, or fitting residuals of the side-view and front-view structural envelopes. For sparse point clouds, slightly larger tolerance margins may be required to avoid removing valid tower points near the inferred boundaries, whereas dense and clean point clouds may allow stricter envelope constraints.
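A minimal sketch of the percentile-based idea for $h_f$ follows. It assumes that the relative height of a point is its height above the lowest point in its horizontal grid cell; the grid size `cell_size` and percentile `q` are hypothetical parameters, not values used in this study.

```python
import numpy as np


def relative_heights_in_grid(points: np.ndarray, cell_size: float) -> np.ndarray:
    """Height of each point above the lowest point of its horizontal grid cell."""
    ij = np.floor(points[:, :2] / cell_size).astype(np.int64)
    _, inverse = np.unique(ij, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    cell_min = np.full(inverse.max() + 1, np.inf)
    np.minimum.at(cell_min, inverse, points[:, 2])  # per-cell minimum height
    return points[:, 2] - cell_min[inverse]


def adaptive_near_ground_threshold(points: np.ndarray, cell_size: float = 5.0,
                                   q: float = 90.0) -> float:
    """Hypothetical rule: h_f = q-th percentile of grid-relative heights."""
    return float(np.percentile(relative_heights_in_grid(points, cell_size), q))
```

Because heights are measured relative to each cell's minimum, two identical height profiles at different terrain elevations yield the same threshold, which is the terrain adaptivity that motivates this strategy.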

The base-filtering tolerance $\tau_b$ and the base-top selection weight $\lambda$ could also be adjusted adaptively. For instance, $\tau_b$ could be linked to the residual distribution between low-region points and the reconstructed base model, while $\lambda$ could be adjusted according to the uncertainty of transition-height candidates or the density of residual points near the tower foot plane. In addition, the DBSCAN parameters, such as $\epsilon_{db}$ and $M_{db}$, could be estimated from the density distribution of the down-sampled scene or from the spacing between local point clusters. This would be particularly useful for sparse public datasets, scenes with strong density variation, or corridors containing closely spaced towers.
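One common way to derive $\epsilon_{db}$ from the density of a down-sampled scene is the k-distance heuristic, sketched below as an assumption rather than as part of the current method. Setting $\epsilon_{db}$ to the `q`-th percentile (a hypothetical parameter) of each point's $(M_{db}-1)$-th nearest-neighbour distance makes roughly `q` percent of the points core points, under the convention that the neighbour count includes the point itself.

```python
import numpy as np


def kth_neighbor_distances(points: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point to its k-th nearest other point (brute force).

    Uses O(n^2) memory, so it is meant for a small down-sampled subset; a
    KD-tree (e.g. scipy.spatial.cKDTree) would be used for full scenes.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)  # column 0 holds each point's zero distance to itself
    return dist[:, k]


def estimate_eps(points: np.ndarray, min_pts: int, q: float = 90.0) -> float:
    """Hypothetical rule: eps = q-th percentile of the (min_pts - 1)-NN distance."""
    if min_pts < 2:
        raise ValueError("min_pts must be at least 2")
    return float(np.percentile(kth_neighbor_distances(points, min_pts - 1), q))
```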

Nevertheless, fully adaptive parameter estimation was not implemented in the current version because the main objective of this study is to develop an interpretable and stable shape-prior-guided extraction pipeline. Introducing too many adaptive rules may improve flexibility but could also reduce transparency and introduce additional failure modes. Therefore, the current study adopts sensitivity-validated default values and stable operating ranges; only the central region slicing interval $\Delta z_c$ is selected adaptively, according to the slope consistency of the four fitted structural axes. A more comprehensive scene-statistics-driven parameter selection strategy will be investigated in future work, especially for extremely sparse point clouds, severe clutter, complex tower head configurations, and inter-tower adhesion scenarios.

5.3. Influence of Inter-Tower Adhesion

Inter-tower adhesion is a practical challenge in UAV LiDAR-based transmission corridor mapping. When adjacent towers or crossing towers are spatially close, density-based clustering may group multiple towers into the same candidate region, especially when the DBSCAN neighborhood radius is relatively large. This problem mainly occurs in the candidate tower localization stage. Once two towers are merged into one candidate region, the subsequent central region identification and structural refinement may be affected because the candidate no longer corresponds to a single tower instance.

To further examine this issue, an additional two-tower scene, denoted as Line7, was analyzed. One of the two towers was horizontally shifted to generate different inter-tower spacing conditions, as shown in Figure 15. Two distance measures were used to characterize the degree of adhesion: the tower-center distance $D_c$ and the minimum inter-tower distance $D_t$. Here, $D_c$ is the distance between the two tower centers, whereas $D_t$ is the closest distance between the two tower point sets. A smaller $D_t$ indicates a higher risk of inter-tower adhesion during density clustering.
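The two spacing measures can be computed directly from the two tower point sets. The sketch below assumes that the tower center is the horizontal centroid of the tower points and that $D_t$ is the minimum 3D Euclidean point-to-point distance; both definitions are assumptions made for illustration.

```python
import numpy as np


def tower_center_distance(a: np.ndarray, b: np.ndarray) -> float:
    """D_c, assumed here to be the distance between horizontal (x, y) centroids."""
    return float(np.linalg.norm(a[:, :2].mean(axis=0) - b[:, :2].mean(axis=0)))


def min_inter_tower_distance(a: np.ndarray, b: np.ndarray, chunk: int = 4096) -> float:
    """D_t, assumed here to be the minimum 3D distance between the point sets.

    Processes `a` in chunks to bound the size of the pairwise distance matrix.
    """
    best = np.inf
    for start in range(0, len(a), chunk):
        block = a[start:start + chunk]
        d = np.linalg.norm(block[:, None, :] - b[None, :, :], axis=-1)
        best = min(best, float(d.min()))
    return best
```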

The experimental results are summarized in Table 8. When $D_c$ was 27.3 m and 24.1 m, corresponding to $D_t$ values of 12.6 m and 9.4 m, respectively, the default setting successfully separated the two towers into two candidate regions. The corresponding F1-scores were 98.83% and 98.56%, indicating that the proposed method can handle normal adjacent-tower spacing conditions.

When $D_c$ decreased to 21.9 m and $D_t$ decreased to 7.7 m, the default DBSCAN setting merged the two towers into one candidate region. In this case, only one tower was detected in the candidate localization stage, and the F1-score was not reported because the candidate localization result no longer corresponded to two independent tower instances. After the DBSCAN radius was reduced to $\epsilon_{db} = 6$ m, the two towers were correctly separated, and an F1-score of 98.58% was achieved. This result indicates that inter-tower adhesion is mainly controlled by the candidate localization parameters, especially $\epsilon_{db}$.

A more challenging case was further constructed with $D_c = 19.1$ m and $D_t = 4.6$ m. Under $\epsilon_{db} = 6$ m, the two towers could still be separated, but the F1-score decreased to 95.41%. This suggests that even when candidate separation is successful, the reduced spacing may still increase interference in subsequent structural refinement. After the base-top selection weight was adjusted to $\lambda = 0.2$, the F1-score increased to 98.92%. This improvement indicates that, in close-tower scenarios, both candidate localization and base-constrained refinement may influence the final extraction accuracy.

Overall, the Line7 experiment shows that reducing $\epsilon_{db}$ can effectively mitigate moderate inter-tower adhesion. However, this adjustment involves a trade-off: a smaller DBSCAN radius reduces the risk of merging adjacent towers, but an excessively small radius may fragment a single tower into multiple clusters, especially in sparse point clouds. Therefore, $\epsilon_{db}$ should be selected within the stable range identified by the sensitivity analysis while considering tower spacing and scene density.
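The trade-off can be reproduced on a toy example with a minimal brute-force DBSCAN, written from scratch here rather than calling a library. As in scikit-learn's `min_samples`, the neighbour count includes the point itself. The two toy "towers" are hypothetical 1D point chains with an internal 2 m gap, placed 14 m apart.

```python
import numpy as np


def dbscan(points: np.ndarray, eps: float, min_pts: int) -> np.ndarray:
    """Minimal O(n^2) DBSCAN; returns a cluster label per point (-1 = noise)."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]  # includes self
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster through density-reachable points
            j = stack.pop()
            if not core[j]:
                continue  # border points are labelled but not expanded
            for nb in neighbors[j]:
                if labels[nb] == -1:
                    labels[nb] = cluster
                    stack.append(nb)
        cluster += 1
    return labels


def n_clusters(labels: np.ndarray) -> int:
    return len(set(labels.tolist()) - {-1})


tower_a = np.array([[x, 0.0, 0.0] for x in (0, 1, 2, 4, 5, 6)], dtype=float)
tower_b = tower_a + np.array([20.0, 0.0, 0.0])
scene = np.vstack([tower_a, tower_b])
for eps in (15.0, 2.5, 1.5):
    print(eps, n_clusters(dbscan(scene, eps=eps, min_pts=2)))
```

With this toy scene, a large radius (15 m) merges the two towers into one cluster, 2.5 m separates them correctly, and 1.5 m splits each tower at its internal gap, giving four clusters.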

It should also be noted that the current DBSCAN-based localization strategy does not fully solve all inter-tower adhesion cases. For extremely close adjacent towers, crossing towers, or scenes where conductors, vegetation, or other objects create continuous connections between tower regions, a secondary candidate-splitting mechanism may be required. For example, a merged candidate region could be detected by its abnormal horizontal extent, excessive bounding-box width, or the presence of multiple stable central regions. It could then be separated using secondary clustering, multi-center detection, or graph-based connectivity analysis. Developing such an automatic candidate-splitting strategy is an important direction for future work.
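One possible form of such a candidate-splitting step is sketched below. It is not part of the current method. A candidate is flagged when its axis-aligned horizontal extent exceeds a hypothetical `max_width`, and a flagged candidate is then split by a farthest-point-initialized 2-means on its horizontal coordinates, assuming exactly two merged towers. A rotated bounding rectangle or multi-center detection would be more robust in practice.

```python
import numpy as np


def needs_split(points: np.ndarray, max_width: float) -> bool:
    """Flag a candidate whose horizontal bounding-box extent is abnormally large."""
    xy = points[:, :2]
    extent = xy.max(axis=0) - xy.min(axis=0)
    return bool(extent.max() > max_width)


def split_in_two(points: np.ndarray, iters: int = 20) -> np.ndarray:
    """Hypothetical secondary clustering: 2-means on (x, y); returns labels 0/1."""
    xy = points[:, :2]
    # Initialize with the point farthest from the centroid and the point farthest from it.
    c0 = xy[np.argmax(np.linalg.norm(xy - xy.mean(axis=0), axis=1))]
    c1 = xy[np.argmax(np.linalg.norm(xy - c0, axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    labels = np.zeros(len(xy), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(xy[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        new = centers.copy()
        for k in (0, 1):
            if np.any(labels == k):  # keep the old center if a cluster empties
                new[k] = xy[labels == k].mean(axis=0)
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```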

5.4. Error Propagation and Future Extensions

Another limitation of the proposed method lies in the sequential nature of the coarse-to-fine refinement process. The candidate localization, central region identification, pose normalization, multi-view envelope inference, and base-constrained filtering are performed in a forward pipeline. Therefore, errors in early stages may influence subsequent steps. For example, if the tower center or orientation is inaccurately estimated, the side-view and front-view projections may be biased, and the inferred structural envelopes may no longer match the true tower geometry. Similarly, an unstable central region estimation may lead to inaccurate principal structural axes, which further affects tower head refinement and base reconstruction.

In the current implementation, several strategies are used to reduce this error propagation. The tower pose and principal axes are estimated from the least-disturbed central region rather than from the whole candidate point cloud. The selected central region is constrained by slice-wise rectangular regularity and further checked by the slope consistency of the fitted structural axes. In addition, tolerance margins are introduced in the side-view and front-view admissible ranges to compensate for moderate pose-estimation and envelope-estimation errors. These mechanisms improve robustness in typical cases, but they do not constitute a complete backward correction or skip-level feedback mechanism.

A possible improvement is to introduce feedback-based refinement into the current framework. For instance, after multi-view filtering, the retained tower points could be used to re-estimate the tower orientation and update the structural axes. The consistency between the inferred envelopes and the filtered point distribution could also be used as a confidence measure to trigger re-initialization when the intermediate results are unreliable. Moreover, the base geometry and tower head structure could provide additional constraints for correcting the central-body axes. Developing such iterative or feedback-guided refinement mechanisms is an important direction for future work.
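A minimal sketch of the orientation re-estimation step is shown below, as an illustrative assumption rather than the paper's pose-estimation procedure. The horizontal orientation of the retained points is estimated by principal component analysis (PCA) of their (x, y) coordinates, and the minor-to-major eigenvalue ratio is returned as a crude confidence value. For a tower body with a nearly square cross-section the two eigenvalues are almost equal and the PCA direction becomes unreliable; a ratio close to 1 is the kind of signal that could trigger re-initialization.

```python
import numpy as np


def estimate_yaw_pca(points: np.ndarray) -> tuple[float, float]:
    """Return (yaw, isotropy) of the horizontal point distribution.

    yaw: angle of the major principal axis of (x, y), folded into [0, pi)
         because an axis has no sign.
    isotropy: minor/major eigenvalue ratio; values near 1 mean the
         horizontal spread is nearly round or square, so yaw is ill-defined.
    """
    xy = points[:, :2] - points[:, :2].mean(axis=0)
    cov = xy.T @ xy / len(xy)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major = eigvecs[:, -1]
    yaw = float(np.mod(np.arctan2(major[1], major[0]), np.pi))
    isotropy = float(eigvals[0] / eigvals[-1]) if eigvals[-1] > 0 else 1.0
    return yaw, isotropy
```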

6. Conclusions

This study developed a shape prior-guided coarse-to-fine framework for extracting transmission towers from UAV LiDAR point clouds in OTL corridors. Rather than treating tower extraction as a purely point-wise classification problem, the method explicitly exploits the regular geometry of tower bodies, heads, and bases to progressively suppress ground, vegetation, conductors, and other non-tower interference points.

The experiments lead to three main conclusions. First, explicit structural priors can provide accurate and robust tower extraction across heterogeneous OTL scenes. On Line1–Line6, the proposed method achieved the highest average Precision, Recall, and F1-score among all reproduced representative methods, improving the average F1-score from 89.46% (best comparison method) to 97.07%. The Wilcoxon signed-rank test further confirmed statistically significant line-level improvements ($p < 0.05$). Second, the extracted tower point clouds preserve high structural completeness at the individual-tower level, with most towers achieving F1-scores higher than 95%. Third, the method is computationally efficient for large-scale corridor data, achieving the lowest normalized inference time (1.93 s/Mpts) without requiring offline training or large annotated datasets.

The academic contribution of this work is the explicit formulation of tower-specific shape priors as a practical geometric extraction mechanism. The results suggest that shape priors are particularly useful when annotated tower samples are limited, surrounding clutter is severe, and engineering applications require efficient processing without heavy network inference. In addition, the refinement chain of candidate localization, central region-guided pose normalization, multi-view envelope inference and base-constrained filtering provides a transferable template for extracting structurally regular objects from cluttered 3D point clouds beyond power infrastructure. In automated OTL acceptance workflows, the proposed method can serve as a geometric preprocessing module that provides clean and complete tower point clouds for downstream geometric measurement, structural analysis, UAV path planning, and AI-based defect detection.

Several limitations remain. The current priors are mainly derived from typical self-supporting angle-steel towers with approximately quadrangular central bodies, and extension to portal-type, wine-glass-type, compact, single-pole, and steel tubular towers requires tower-type-aware descriptors and adaptive shape priors. Moreover, 2D multi-view envelope refinement may still be insufficient when conductors, jumper wires, or insulator strings strongly overlap with tower-head members in projection. Future work will focus on local 3D connectivity modeling, feedback-guided refinement, confidence-based re-initialization, and adaptive parameter selection based on scene statistics, point density, envelope-fitting residuals, and inter-tower spacing.

Author Contributions

Conceptualization, H.W.; methodology, C.T. and Y.S.; software, C.T.; validation, C.T. and H.W.; formal analysis, C.T.; investigation, Y.S.; resources, H.W.; data curation, C.T.; writing—original draft preparation, C.T.; writing—review and editing, C.T., Y.S., K.Z. and H.W.; visualization, C.T.; supervision, H.W.; project administration, H.W.; funding acquisition, K.Z. and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Fundamental Research Funds for the Central Universities, grant number 2242025F20002.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author. The source code is available at https://github.com/TyCaLn123/otl-tower-extraction (accessed on 14 June 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

OTL	Overhead transmission line
UAV	Unmanned Aerial Vehicle
LiDAR	Light Detection and Ranging
DBSCAN	Density-Based Spatial Clustering of Applications with Noise
SOR	Statistical Outlier Removal
MBR	Minimum Bounding Rectangle
PCA	Principal Component Analysis
TP	True Positive
FP	False Positive
FN	False Negative
TN	True Negative
AGL	Above Ground Level
Mpts	Million points

Appendix A. Visual Results of Representative Methods

Figures A1–A9 show the visual extraction results generated by our reproduced implementations of the corresponding methods on the Line1–Line6 datasets. In these figures, gray points in the full-scene views denote points classified as non-tower by the corresponding method, whereas points in other colors denote points classified as tower. In the enlarged views, blue points indicate correctly extracted tower points, and brown points indicate non-tower points incorrectly extracted as tower points.

Figure A1. Visual extraction results of method in [35] on Line1–Line6.

Figure A2. Visual extraction results of method in [36] on Line1–Line6.

Figure A3. Visual extraction results of method in [10] on Line1–Line6.

Figure A4. Visual extraction results of method in [37] on Line1–Line6.

Figure A5. Visual extraction results of method in [16] on Line1–Line6.

Figure A6. Visual extraction results of method in [19] on Line1–Line6.

Figure A7. Visual extraction results of method in [23] on Line1–Line6.

Figure A8. Visual extraction results of method in [27] on Line1–Line6.

Figure A9. Visual extraction results of method in [38] on Line1–Line6.

References

Xia, Y.; Lu, J.; Li, H.; Xu, H. A Deep Learning Based Image Recognition and Processing Model for Electric Equipment Inspection. In Proceedings of the 2018 2nd IEEE Conference on Energy Internet and Energy System Integration, Beijing, China, 20–22 October 2018; IEEE: New York, NY, USA, 2018; pp. 1–6.
Ren, S.; Hu, W.; Bradbury, K.; Harrison-Atlas, D.; Valeri, L.M.; Murray, B.; Malof, J.M. Automated Extraction of Energy Systems Information from Remotely Sensed Data: A Review and Analysis. Appl. Energy 2022, 326, 119876.
Fernandes, D.; Silva, A.; Névoa, R.; Simões, C.; Gonzalez, D.; Guevara, M.; Novais, P.; Monteiro, J.; Melo-Pinto, P. Point-Cloud Based 3D Object Detection and Classification Methods for Self-Driving Applications: A Survey and Taxonomy. Inf. Fusion 2021, 68, 161–191.
Horng, G.J.; Liu, M.X.; Chen, C.C. The Smart Image Recognition Mechanism for Crop Harvesting System in Intelligent Agriculture. IEEE Sens. J. 2019, 20, 2766–2781.
Wang, D.; Zhao, J.; Long, X.; Chen, Y.; Wu, S.; Hu, W. Research on Acceptance Technology of Transmission Line Infrastructure Construction Based on Laser LiDAR Technology. In Proceedings of the Seventh Symposium on Novel Photoelectronic Detection Technology and Applications, Kunming, China, 5–7 November 2021; SPIE: Bellingham, WA, USA, 2021; Volume 11763, pp. 1896–1900.
Li, X.; Song, Q.; Liu, W.; Rao, H.; Xu, S.; Li, L. Protection of Nonpermanent Faults on DC Overhead Lines in MMC-Based HVDC Systems. IEEE Trans. Power Deliv. 2012, 28, 483–490.
Zorzi, S.; Maset, E.; Fusiello, A.; Crosilla, F. Full-Waveform Airborne LiDAR Data Classification Using Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8255–8261.
Su, C.; Wu, X.; Guo, Y.; Lai, C.S.; Xu, L.; Zhao, X. Automatic Multi-Source Data Fusion Technique of Powerline Corridor Using UAV LiDAR. In Proceedings of the 2022 IEEE International Smart Cities Conference, Pafos, Cyprus, 26–29 September 2022; IEEE: New York, NY, USA, 2022; pp. 1–5.
Zhang, L.; Wang, J.; Shen, Y.; Liang, J.; Chen, Y.; Chen, L.; Zhou, M. A Deep Learning Based Method for Railway Overhead Wire Reconstruction from Airborne LiDAR Data. Remote Sens. 2022, 14, 5272.
Li, W.; Luo, Z.; Xiao, Z.; Chen, Y.; Wang, C.; Li, J. A GCN-Based Method for Extracting Power Lines and Pylons from Airborne LiDAR Data. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5700614.
Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.; Elhoseiny, M.; Ghanem, B. PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies. Adv. Neural Inf. Process. Syst. 2022, 35, 23192–23204.
Wu, X.; Jiang, L.; Wang, P.S.; Liu, Z.; Liu, X.; Qiao, Y.; Ouyang, W.; He, T.; Zhao, H. Point Transformer V3: Simpler Faster Stronger. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; IEEE: New York, NY, USA, 2024; pp. 4840–4851.
Fei, B.; Xu, J.; Li, Y.; Yang, W.; Zhou, Q.; Liu, L.; Luo, T.; He, Y. Self-supervised learning for pre-training 3D point clouds: A survey. Comput. Vis. Media 2026, 12, 509–573.
Kim, H.B.; Sohn, G. Point-Based Classification of Power Line Corridor Scene Using Random Forests. Photogramm. Eng. Remote Sens. 2013, 79, 821–833.
Kim, H.B.; Sohn, G. 3D Classification of Power-Line Scene from Airborne Laser Scanning Data Using Random Forests. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2010, 38, 126–132.
Tang, Q.; Zhang, L.; Lan, G.; Shi, X.; Duanmu, X.; Chen, K. A Classification Method of Point Clouds of Transmission Line Corridor Based on Improved Random Forest and Multi-Scale Features. Sensors 2023, 23, 1320.
Wang, Y.; Chen, Q.; Liu, L.; Zheng, D.; Li, C.; Li, K. Supervised Classification of Power Lines from Airborne LiDAR Data in Urban Areas. Remote Sens. 2017, 9, 771.
Kuprowski, M.; Drozda, P. Feature Selection for Airbone LiDAR Point Cloud Classification. Remote Sens. 2023, 15, 561.
Guo, B.; Huang, X.; Zhang, F.; Sohn, G. Classification of Airborne Laser Scanning Data Using JointBoost. ISPRS J. Photogramm. Remote Sens. 2015, 100, 71–83.
Awrangjeb, M.; Islam, M.K. Classifier-Free Detection of Power Line Pylons from Point Cloud Data. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 4, 81–87.
Lu, Z.; Gong, H.; Jin, Q.; Hu, Q.; Wang, S. A Transmission Tower Tilt State Assessment Approach Based on Dense Point Cloud from UAV-Based LiDAR. Remote Sens. 2022, 14, 408.
Awrangjeb, M. Extraction of Power Line Pylons and Wires Using Airborne LiDAR Data at Different Height Levels. Remote Sens. 2019, 11, 1798.
Zhang, R.; Yang, B.; Xiao, W.; Liang, F.; Liu, Y.; Wang, Z. Automatic Extraction of High-Voltage Power Transmission Objects from UAV LiDAR Point Clouds. Remote Sens. 2019, 11, 2600.
Ortega, S.; Trujillo, A.; Santana, J.M.; Suárez, J.P. An Image-Based Method to Classify Power Line Scenes in LiDAR Point Clouds. In Proceedings of the 12th International Symposium on Tools and Methods of Competitive Engineering, Las Palmas de Gran Canaria, Spain, 7–11 May 2018; Organizing Committee of TMCE: Delft, The Netherlands; Las Palmas de Gran Canaria, Spain, 2018; pp. 585–593.
Ortega, S.; Trujillo, A.; Santana, J.M.; Suárez, J.P.; Santana, J. Characterization and Modeling of Power Line Corridor Elements from LiDAR Point Clouds. ISPRS J. Photogramm. Remote Sens. 2019, 152, 24–33. [Google Scholar] [CrossRef]
Nardinocchi, C.; Balsi, M.; Esposito, S. Fully Automatic Point Cloud Analysis for Powerline Corridor Mapping. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8637–8648. [Google Scholar] [CrossRef]
Shen, Y.; Huang, J.; Chen, D.; Wang, J.; Li, J.; Ferreira, V. An Automatic Framework for Pylon Detection by a Hierarchical Coarse-to-Fine Segmentation of Powerline Corridors from UAV LiDAR Point Clouds. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103263. [Google Scholar] [CrossRef]
Li, Z.; Tian, Y.; Yang, G.; Li, E.; Zhang, Y.; Chen, M.; Liang, Z.; Tan, M. Vision-Based Autonomous Landing of a Hybrid Robot on a Powerline. IEEE Trans. Instrum. Meas. 2022, 72, 3501711. [Google Scholar] [CrossRef]
Paneque, J.L.; Martínez-de Dios, J.R.; Ollero, A.; Hanover, D.; Sun, S.; Romero, A.; Scaramuzza, D. Perception-Aware Perching on Powerlines with Multirotors. IEEE Robot. Autom. Lett. 2022, 7, 3077–3084. [Google Scholar] [CrossRef]
Chen, Y.; Lin, J.; Liao, X. Early Detection of Tree Encroachment in High Voltage Powerline Corridor Using Growth Model and UAV-Borne LiDAR. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102740. [Google Scholar] [CrossRef]
Hu, J.; He, J.; Guo, C. End-to-End Powerline Detection Based on Images from UAVs. Remote Sens. 2023, 15, 1570. [Google Scholar] [CrossRef]
Schofield, O.B.; Iversen, N.; Ebeid, E. Autonomous Power Line Detection and Tracking System Using UAVs. Microprocess. Microsyst. 2022, 94, 104609. [Google Scholar] [CrossRef]
Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; AAAI Press: Washington, DC, USA, 1996; pp. 226–231. [Google Scholar]
Han, X.; Liu, C.; Zhou, Y.; Tan, K.; Dong, Z.; Yang, B. WHU-Urban3D: An urban scene LiDAR point cloud dataset for semantic instance segmentation. ISPRS J. Photogramm. Remote Sens. 2024, 209, 500–513. [Google Scholar] [CrossRef]
Guan, H.; Sun, X.; Su, Y.; Hu, T.; Wang, H.; Wang, H.; Peng, C.; Guo, Q. UAV-LiDAR Aids Automatic Intelligent Powerline Inspection. Int. J. Electr. Power Energy Syst. 2021, 130, 106987. [Google Scholar] [CrossRef]
Qi, C.R.; Yi, L.; Su, H.; Guibas, L. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
Zeng, Z.; Qiu, H.; Zhou, J.; Dong, Z.; Xiao, J.; Li, B. PointNAT: Large-Scale Point Cloud Semantic Segmentation via Neighbor Aggregation with Transformer. IEEE Trans. Geosci. Remote. Sens. 2024, 62, 5704618. [Google Scholar] [CrossRef]
Shan, L.; Yue, J. Automatic Extraction Algorithm of High Voltage Pylon Based on LiDAR Point Cloud. Laser Optoelectron. Prog. 2021, 58, 2428009. [Google Scholar] [CrossRef]

Figure 1. Pipeline of the overhead transmission line (OTL) digital acceptance system. Abbreviations: UAV, unmanned aerial vehicle; LiDAR, Light Detection and Ranging.

Figure 2. Definition of the main structural components of an OTL tower.

Figure 3. Overall workflow of the proposed shape prior-guided coarse-to-fine tower extraction framework: (a) input OTL scene point cloud; (b) scene preprocessing and candidate clustering; (c) cluster-guided cropping of candidate towers; (d) pre-extracted tower point cloud; (e) identification of the least-disturbed central region; (f) side-view structural refinement; (g) front-view structural refinement; (h) base-constrained filtering; (i) final extracted tower point cloud. The colors in the point cloud panels, except for subfigures (f,g), represent height variation for visualization and do not indicate additional semantic categories. In subfigures (f,g), the blue lines indicate the discrimination boundaries between tower and non-tower regions, whereas the orange points represent the two-dimensional projections of the tower regions.

Figure 4. Candidate tower localization by scene preprocessing and height-suppressed density clustering: (a) raw OTL corridor point cloud; (b) scene down-sampling and statistical denoising; (c) DBSCAN down-sampling and near-ground suppression; (d) candidate tower clusters obtained by DBSCAN. In subfigure (a), the colors represent height variation for visualization. In subfigure (d), different colors denote different candidate tower clusters identified by DBSCAN, whereas black points denote the remaining non-candidate scene points.
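The near-ground suppression step in Figure 4c can be illustrated with a minimal sketch. It assumes that a point's relative height is measured against the lowest point of its own horizontal grid cell of size $d_{g}$, and that points less than $h_{f}$ above that minimum are discarded, using defaults from Table 2. The function name `suppress_near_ground` and this per-cell rule are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict


def suppress_near_ground(points, d_g=1.0, h_f=15.0):
    """Drop points lying less than h_f above the lowest point of their grid cell.

    Hypothetical sketch: `points` is a list of (x, y, z) tuples in meters, and the
    lowest point in each d_g x d_g horizontal cell is assumed to approximate ground.
    """
    cell_min_z = defaultdict(lambda: math.inf)
    keys = []
    for x, y, z in points:
        key = (math.floor(x / d_g), math.floor(y / d_g))
        keys.append(key)
        cell_min_z[key] = min(cell_min_z[key], z)
    # Keep only points whose height above their cell minimum reaches h_f.
    return [p for p, key in zip(points, keys) if p[2] - cell_min_z[key] >= h_f]


scene = [(0.2, 0.3, 100.0), (0.5, 0.5, 120.0), (0.7, 0.1, 105.0), (5.5, 5.5, 200.0)]
print(suppress_near_ground(scene))  # [(0.5, 0.5, 120.0)]
```

A cell that contains no ground returns treats its own lowest point as ground, so a practical version would more likely compare against a filtered ground model or a neighborhood minimum.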

Figure 5. Recovery of pre-extracted tower points by cluster-guided bounding-box expansion. The candidate box obtained from the clustered point cloud is projected back to the denser preprocessed scene and enlarged to compensate for tower points removed during near-ground suppression and coarse down-sampling. In the point cloud panels, the rainbow colors represent height variation for visualization and do not indicate additional semantic categories. In the clustering panel inherited from Figure 4d, different non-black colors denote different candidate tower clusters identified by DBSCAN, whereas black points denote the remaining scene points.
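The cluster-guided recovery can be sketched in the same spirit. This version assumes an axis-aligned horizontal box enlarged by the margin $b_{xy}$ from Table 2, with the vertical range left unbounded so that removed tower-foot points return. The helper `expand_and_crop` is hypothetical, and any box orientation or vertical limit used by the authors is not reproduced.

```python
def expand_and_crop(cluster_points, scene_points, b_xy=2.0):
    """Crop scene points inside the cluster's horizontal bounding box grown by b_xy.

    Hypothetical sketch: inputs are lists of (x, y, z) tuples; the box is
    axis-aligned and unbounded in z so that tower-foot points removed by
    near-ground suppression can be recovered from the denser scene.
    """
    xs = [p[0] for p in cluster_points]
    ys = [p[1] for p in cluster_points]
    x_min, x_max = min(xs) - b_xy, max(xs) + b_xy
    y_min, y_max = min(ys) - b_xy, max(ys) + b_xy
    return [p for p in scene_points
            if x_min <= p[0] <= x_max and y_min <= p[1] <= y_max]


cluster = [(10.0, 10.0, 40.0), (12.0, 11.0, 50.0)]
scene = [(9.0, 9.0, 0.5), (15.0, 10.0, 1.0), (11.0, 12.9, 30.0)]
print(expand_and_crop(cluster, scene))  # [(9.0, 9.0, 0.5), (11.0, 12.9, 30.0)]
```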

Figure 6. Identification of the least-disturbed central region using slice-wise rectangular regularity: (a) division of the pre-extracted tower into low, central, and high regions; (b) variation of the rectangular length–width difference across vertical slices, where the stable interval corresponds to the central region.
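The slice-wise regularity test can be sketched under stated assumptions: the minimum-area bounding rectangle (MBR) of each slice is approximated by a brute-force sweep of rotation angles in 1° steps rather than an exact rotating-calipers routine, a slice counts as regular when its MBR length–width difference is at most $T_{r}$, and the central region is taken as the longest run of consecutive regular slices. The angle and standard-deviation criteria ($T_{\theta}$, $\gamma_{\theta}$) from Table 2 are omitted, and all function names are hypothetical.

```python
import math


def mbr_dims(xy, step_deg=1.0):
    """Approximate the minimum-area bounding rectangle (MBR) of 2D points.

    Returns (length, width, angle_deg). Brute-force rotation over [0, 90) degrees
    is a simplification; an exact rotating-calipers MBR is not implemented here.
    """
    best = None
    angle = 0.0
    while angle < 90.0:
        t = math.radians(angle)
        c, s = math.cos(t), math.sin(t)
        us = [c * x + s * y for x, y in xy]
        vs = [-s * x + c * y for x, y in xy]
        du, dv = max(us) - min(us), max(vs) - min(vs)
        if best is None or du * dv < best[0]:
            best = (du * dv, max(du, dv), min(du, dv), angle)
        angle += step_deg
    return best[1], best[2], best[3]


def stable_central_slices(points, dz=2.0, t_r=1.0):
    """Return (z_low, z_high) of the longest run of slices whose MBR is near-square.

    A slice is regular when its MBR length-width difference is at most t_r.
    Returns None when no slice qualifies. Hypothetical helper for illustration.
    """
    z0 = min(p[2] for p in points)
    n = int((max(p[2] for p in points) - z0) // dz) + 1
    slices = [[] for _ in range(n)]
    for x, y, z in points:
        slices[int((z - z0) // dz)].append((x, y))
    regular = []
    for sl in slices:
        if len(sl) < 3:
            regular.append(False)  # too few points to fit a rectangle
        else:
            length, width, _ = mbr_dims(sl)
            regular.append(length - width <= t_r)
    # Scan for the longest run of consecutive regular slices.
    best_len, best_start, run = 0, None, 0
    for i, ok in enumerate(regular):
        run = run + 1 if ok else 0
        if run > best_len:
            best_len, best_start = run, i - run + 1
    if best_start is None:
        return None
    return z0 + best_start * dz, z0 + (best_start + best_len) * dz


def square(half, z):
    """Corners of an axis-aligned square slice with the given half side length."""
    return [(-half, -half, z), (half, -half, z), (half, half, z), (-half, half, z)]


# Two near-square body slices below an elongated cross-arm slice (dz = 2 m).
arm = [(-3.0, -0.5, 4.5), (3.0, -0.5, 4.5), (3.0, 0.5, 4.5), (-3.0, 0.5, 4.5)]
tower = square(2.0, 0.5) + square(1.5, 2.5) + arm
print(stable_central_slices(tower))  # (0.5, 4.5)
```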

Figure 7. Side-view structural envelope inference: (a) initialization of central region structural references; (b) downward extension to the low region and layer-wise upward search in the high region; (c) interpolation-based completion of missing high region boundary segments; (d) final side-view envelope for all height regions.
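Once a side-view envelope such as the one in panel (d) is available, points outside it can be rejected with the tolerance $\delta_{x}$ from Table 2. A minimal sketch of that filtering step follows, assuming a pose-normalized frame in which the tower axis lies at $x = 0$ and an envelope $E_{x}(z)$ stored as piecewise-linear (height, half-width) knots. How the knots are inferred in panels (a)–(c) is not reproduced, and the function names are hypothetical.

```python
import bisect


def envelope_half_width(knots, z):
    """Linearly interpolate the half-width E_x(z) from (z, half_width) knots sorted by z.

    Outside the knot range the nearest end value is held constant (an assumption).
    """
    zs = [k[0] for k in knots]
    i = bisect.bisect_right(zs, z)
    if i == 0:
        return knots[0][1]
    if i == len(knots):
        return knots[-1][1]
    (z_a, w_a), (z_b, w_b) = knots[i - 1], knots[i]
    return w_a + (w_b - w_a) * (z - z_a) / (z_b - z_a)


def side_view_filter(points, knots, delta_x=0.3):
    """Keep (x, y, z) points whose |x| lies within E_x(z) + delta_x."""
    return [p for p in points if abs(p[0]) <= envelope_half_width(knots, p[2]) + delta_x]


knots = [(0.0, 5.0), (30.0, 1.0)]  # hypothetical tapering tower body
pts = [(3.2, 0.0, 15.0), (3.5, 0.0, 15.0), (-0.9, 0.0, 40.0), (6.0, 0.0, -1.0)]
print(side_view_filter(pts, knots))  # [(3.2, 0.0, 15.0), (-0.9, 0.0, 40.0)]
```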

Figure 8. Front-view contour-based envelope refinement: (a) extraction of the maximum external contour after side-view filtering; (b) height-wise analysis of contour extrema and contour-difference transitions; (c) piecewise construction of the final front-view envelope.
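One way to locate a tower-head transition from height-wise contour extrema is sketched below: the front-view half-extent $|y|$ is binned with a vertical step (cf. $\Delta z_{y}$), and the first occupied bin, scanning upward, whose extent exceeds the previous one by more than $T_{j}$ is reported. The pose-normalized frame, the upward scan direction, and the function name are assumptions rather than details confirmed by the caption.

```python
def head_transition_height(points, dz=0.1, t_j=2.0):
    """Return the lowest bin height where the front-view half-extent jumps by more than t_j.

    Hypothetical sketch: points are (x, y, z) in a pose-normalized frame in which the
    front view spans the y axis around y = 0. Returns None if no jump is found.
    """
    z0 = min(p[2] for p in points)
    extents = {}
    for _, y, z in points:
        k = int((z - z0) // dz)
        extents[k] = max(extents.get(k, 0.0), abs(y))
    prev = None
    for k in sorted(extents):  # scan upward through occupied bins only
        if prev is not None and extents[k] - prev > t_j:
            return z0 + k * dz
        prev = extents[k]
    return None


body = [(0.0, 2.0, float(z)) for z in range(20)]  # narrow tower body
arms = [(0.0, 8.0, 20.0), (0.0, -8.0, 20.0)]      # cross-arm level
print(head_transition_height(body + arms, dz=1.0))  # 20.0
```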

Figure 9. Base-constrained removal of residual ground and vegetation points: (a) tower points after dual-view refinement; (b) reconstruction of the pyramid-like base components using the extrapolated structural axes, inter-axis center lines, and transition-height candidates; (c) final extracted tower after removing residual ground and low vegetation inside and around the tower foot. The rainbow colors in the point cloud panels represent height variation for visualization and do not indicate additional semantic categories. In subfigure (b), the colored circles denote the transition-height candidates used for base reconstruction, and the capitalized letters A–D are used to mark the four base-side components involved in the pyramid-like base constraint construction.
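As a deliberately simplified stand-in for the pyramid-like base model $B$, the sketch below keeps a point in the base zone only if it lies within $\tau_{b}$ of at least one leg segment obtained by extrapolating the structural axes; points above an assumed base-top height (for example the fallback $H_{b}$ from Table 2) pass unchanged. Distances to the base faces (components A–D), the transition-height search, and the slab-count weighting $\lambda$ are not modeled, and all names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from 3D point p to the segment from a to b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else sum(ap[i] * ab[i] for i in range(3)) / denom
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def base_filter(points, legs, z_base_top, tau_b=0.8):
    """Keep points above z_base_top, plus base points within tau_b of any leg segment.

    Hypothetical stand-in for the pyramid-like base model: `legs` is a list of
    (bottom_xyz, top_xyz) segments, and the base faces are not modeled.
    """
    kept = []
    for p in points:
        if p[2] > z_base_top:
            kept.append(p)
        elif min(point_segment_distance(p, a, b) for a, b in legs) <= tau_b:
            kept.append(p)
    return kept


legs = [((0.0, 0.0, 0.0), (0.0, 0.0, 10.0))]  # one vertical leg for brevity
pts = [(0.5, 0.0, 5.0), (3.0, 0.0, 2.0), (9.0, 9.0, 12.0)]
print(base_filter(pts, legs, z_base_top=10.0))  # [(0.5, 0.0, 5.0), (9.0, 9.0, 12.0)]
```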

Figure 10. Parameter sensitivity analysis in terms of F1-score. The subfigures correspond to: (a) $M_{db}$, (b) $\delta_{x}$, (c) $\delta_{y}$, (d) $\epsilon_{db}$, (e) $h_{f}$, (f) $T_{j}$, (g) $\tau_{b}$, (h) $\lambda$, and (i) $d_{v1}$. The solid curves represent the mean F1-score across test lines, and the shaded regions indicate the standard deviation.

Figure 11. Visual results of candidate tower localization on Line1–Line6. Gray points in the full-scene views denote points classified as non-tower by the proposed method, whereas points in other colors denote points assigned to localized candidate towers. In the enlarged views, blue points indicate tower points and brown points indicate non-tower points.

Figure 12. Visual extraction results of the proposed method on Line1–Line6. Gray points in the full-scene views denote points classified as non-tower by the proposed method, whereas points in other colors denote points classified as tower. In the enlarged views, blue points indicate correctly extracted tower points and brown points indicate non-tower points incorrectly extracted as tower points.

Figure 13. Representative error distribution of the proposed method. True positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) are shown in red, green, blue, and black, respectively.

Figure 14. Additional experiment on a cat-head-type tower from the public WHU-Urban3D dataset: (a) Ground-truth annotation in the original scene; (b) Tower extraction result in the original scene; (c) Enlarged view of the extracted cat-head-type tower. Brown points denote background or non-tower scene points, while blue points denote the ground-truth tower points or the extracted tower points in the corresponding panels. The result shows that the main structure of the cat-head-type tower can be effectively extracted. However, a short conductor segment and an insulator string near the middle of the tower head are still retained because they overlap with the tower head structure in both side-view and front-view projections.

Figure 15. Ground-truth configurations of the Line7 inter-tower adhesion experiment. Brown points denote background scene points, while blue points denote the ground-truth tower points. Different spacing conditions were generated by horizontally shifting one of the two towers. $D_{c}$ denotes the tower-center distance, and $D_{t}$ denotes the minimum inter-tower distance.
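For reference, the two spacing measures can be computed as in the sketch below. The caption does not specify how tower centers are defined, so this version assumes that $D_{c}$ is the horizontal distance between the xy centroids of the two ground-truth towers and that $D_{t}$ is the smallest 3D point-to-point distance, found by brute force; both definitions and the function name are assumptions.

```python
import math


def tower_spacing(tower_a, tower_b):
    """Return (D_c, D_t) for two towers given as lists of (x, y, z) points.

    Assumptions: D_c is the horizontal distance between the xy centroids, and
    D_t is the smallest 3D point-to-point distance (brute force, O(n * m)).
    """
    def centroid_xy(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    d_c = math.dist(centroid_xy(tower_a), centroid_xy(tower_b))
    d_t = min(math.dist(p, q) for p in tower_a for q in tower_b)
    return d_c, d_t


tower_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
tower_b = [(7.0, 0.0, 10.0), (9.0, 0.0, 0.0)]
print(tower_spacing(tower_a, tower_b))  # (7.0, 5.0)
```

For full-density towers, a k-d tree query would replace the quadratic pairwise search.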

Table 1. Dataset information and acquisition characteristics.

Dataset	Source	Towers	Pts. (M)	Density (pts/m²)	Alt. (m)	FOV (°)	Elev. Var. (m)	Voltage
Line1	Private	3	163.81	2098.60	200	70.4	119.3	500 kV
Line2	Private	5	70.00	704.12	70	70.4	2.8	220 kV
Line3	Private	2	28.67	540.88	60	70.4	1.1	110 kV
Line4	Private	6	134.49	825.09	70	70.4	3.7	220 kV
Line5	Private	8	156.01	605.12	50	70.4	2.3	110 kV
Line6	Public	2	2.14	39.08	NR	NR	1.9	NR

Note: Pts. is reported in million points; density is reported in pts/m²; Alt. denotes flight altitude in m above ground level; FOV is reported in degrees; Elev. var. denotes terrain elevation variation in meters; NR indicates that the corresponding information was not reported in the original public dataset.

Table 2. Parameter settings of the proposed method.

Module	Symbol	Description	Default Value or Tested/Adaptive Range
Preprocessing	$d_{v 1}$	Scene down-sampling voxel size	0.1–0.2 m
	$k_{n n}$	Number of neighbors in SOR	10
	$α_{s o r}$	Standard-deviation multiplier in SOR	5.0
	$d_{g}$	Horizontal grid size for near-ground suppression	1.0 m
	$h_{f}$	Relative-height threshold for near-ground suppression	10–20 m
Candidate tower localization	$d_{v 2}$	Candidate-clustering voxel size before DBSCAN	1.0 m
	$ϵ_{d b}$	DBSCAN neighborhood radius	6–10 m
	$M_{d b}$	Minimum number of points in DBSCAN	150–300
	$h_{d b}$	Cluster-height threshold for candidate screening	15 m
	$b_{x y}$	Horizontal expansion margin of candidate bounding box	2.0 m
Central region pose normalization	$Δ z_{c}$	Vertical slice interval for central region search	0.1–2.0 m
	$H_{c}$	Slice window height for local MBR estimation	2.0 m
	$T_{θ}$	Maximum angle range of MBR edges in a stable slice	$5^{\circ}$
	$T_{r}$	Maximum length–width difference of MBR	1.0 m
	$γ_{θ}$	Standard-deviation factor for stable-slice selection	1.0
	$T_{s}$	Maximum allowed difference among absolute boundary slopes	0.02
Multi-view structural refinement	$Δ z_{x}$	Vertical step for side-view envelope construction	0.1 m
	$δ_{x}$	Side-view envelope tolerance in $E_{x} (z)$	0.2–0.5 m
	$Δ z_{y}$	Vertical step for front-view envelope construction	0.1 m
	$δ_{y}$	Front-view envelope tolerance in $E_{y} (z)$	0.2–0.5 m
	$T_{j}$	Front-view contour-jump threshold for tower head transition	1.0–3.0 m
Base-constrained filtering	$Δ z_{b}$	Vertical search step for base reconstruction	0.1 m
	$τ_{a}$	Axis-neighborhood threshold for structural contact search	0.2 m
	$g_{c}$	Minimum contact-gap threshold on inter-axis center lines	0.5 m
	$g_{t}$	Clustering tolerance for transition-height candidates	1.0 m
	$m_{l}$	Lower-margin offset for low region base filtering	0.5 m
	$H_{b}$	Fallback base height if no transition is detected	5.0 m
	$h_{p}$	Thickness of the tower foot plane slab	1.0 m
	$λ$	Weight of the normalized slab-count term in base-top selection	0–0.4
	$τ_{b}$	Distance tolerance to the reconstructed base model $B$	0.6–1.0 m

Table 3. Quantitative comparison of tower extraction performance with representative methods on Line1–Line6.

Method	Type	Metrics	Line1	Line2	Line3	Line4	Line5	Line6	Average
Method in [35]	PointNet-based framework	Precision	49.38	77.75	90.48	81.92	86.19	72.35	76.35
		Recall	94.31	75.07	75.38	73.26	81.05	77.08	79.36
		F1-score	64.82	76.39	82.24	77.35	83.54	74.64	76.50
Method in [36]	PointNet++	Precision	68.63	81.83	94.29	81.44	91.60	92.30	85.01
		Recall	84.96	88.23	83.35	80.35	82.47	90.46	84.97
		F1-score	75.93	84.91	88.48	80.89	86.80	91.37	84.73
Method in [10]	GCN-based framework	Precision	78.04	84.20	92.25	67.58	65.29	44.96	72.05
		Recall	97.45	87.87	92.10	94.29	96.94	91.01	93.28
		F1-score	86.67	86.00	92.17	78.73	78.03	60.19	80.30
Method in [37]	PointNAT	Precision	87.72	90.62	94.25	96.05	94.54	96.17	93.23
		Recall	74.87	82.56	83.75	81.02	88.98	92.17	83.89
		F1-score	80.79	86.41	88.69	87.90	91.68	94.13	88.26
Method in [16]	Improved Random Forest	Precision	69.70	86.63	90.62	90.74	86.72	55.75	80.03
		Recall	98.08	95.58	96.64	97.47	94.75	79.57	93.68
		F1-score	81.49	90.89	93.53	93.98	90.56	65.56	86.00
Method in [19]	JointBoost	Precision	76.49	86.96	81.40	94.80	88.34	77.24	84.21
		Recall	98.92	94.63	96.99	96.92	96.34	89.11	95.48
		F1-score	86.27	90.63	88.51	95.84	92.17	82.75	89.36
Method in [23]	Regularized grid-based method	Precision	69.47	81.71	87.50	84.25	79.77	61.12	77.30
		Recall	97.85	95.15	96.79	97.32	93.88	92.71	95.62
		F1-score	81.25	87.91	91.91	90.31	86.25	73.67	85.22
Method in [27]	Shape-prior-based hierarchical segmentation	Precision	94.35	79.51	85.85	85.52	80.05	57.73	80.50
		Recall	94.51	92.63	97.19	97.31	95.12	96.66	95.57
		F1-score	94.43	85.57	91.16	91.03	86.94	72.28	86.90
Method in [38]	Improved DBSCAN	Precision	82.86	83.88	85.73	92.00	86.10	85.71	86.05
		Recall	94.44	92.95	92.18	93.01	93.63	93.15	93.23
		F1-score	88.27	88.18	88.84	92.51	89.71	89.28	89.46
Proposed Method	Shape-prior-guided coarse-to-fine framework	Precision	98.67	97.04	97.35	96.43	94.65	90.20	95.72
		Recall	99.43	96.61	98.68	99.53	99.14	97.61	98.50
		F1-score	99.05	96.82	98.01	97.96	96.84	93.76	97.07

Note: Bold values indicate the best performance for each metric on each line and in the average column among all compared methods. Rows shaded in orange, green, and blue denote deep learning methods, feature-based methods, and unsupervised methods, respectively.
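The line-level F1-scores and the macro-averaged column can be reproduced from the tabulated values with the short sketch below (the helper names are ours). For the proposed method on Line1, a precision of 98.67% and a recall of 99.43% give an F1-score of 99.05%, and averaging the six line-level F1-scores gives 97.07%.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over lines, so each line counts equally regardless of size."""
    return sum(values) / len(values)


# Proposed method, Table 3.
print(round(f1_score(98.67, 99.43), 2))  # 99.05 (Line1)
proposed_f1 = [99.05, 96.82, 98.01, 97.96, 96.84, 93.76]
print(round(macro_average(proposed_f1), 2))  # 97.07
```

Because the macro-average weights every line equally, Line6 (2.14 M points) influences the average as much as Line1 (163.81 M points).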

Table 4. Statistical significance test based on line-level F1-scores.

Comparison Method	Type	Reported F1 (%)	Proposed F1 (%)	ΔF1 (%)	p-Value
Method in [35]	PointNet-based framework	76.50	97.07	20.57	0.0156
Method in [36]	PointNet++	84.73	97.07	12.34	0.0156
Method in [10]	GCN-based framework	80.30	97.07	16.77	0.0156
Method in [37]	PointNAT	88.26	97.07	8.81	0.0313
Method in [16]	Improved Random Forest	86.00	97.07	11.07	0.0156
Method in [19]	JointBoost	89.36	97.07	7.71	0.0156
Method in [23]	Regularized grid-based method	85.22	97.07	11.85	0.0156
Method in [27]	Shape-prior-based hierarchical segmentation	86.90	97.07	10.17	0.0156
Method in [38]	Improved DBSCAN	89.46	97.07	7.61	0.0156

Note: The p-value was obtained by the one-sided Wilcoxon signed-rank test using the six line-level F1-scores of the proposed method and each representative method. The average F1-score is reported as the macro-average over Line1–Line6.
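The exact one-sided Wilcoxon signed-rank p-value for six paired samples can be obtained by enumerating all $2^{6} = 64$ sign assignments, as in the sketch below; it assumes no zero differences and no tied absolute differences, which holds for the F1-scores in Table 3. When all six differences favor the proposed method, the smallest attainable p-value is $1/64 \approx 0.0156$. For PointNAT [37], the proposed method is slightly lower on Line6 (93.76 vs. 94.13), which yields $2/64 \approx 0.0313$, consistent with the tabulated values.

```python
from itertools import product


def wilcoxon_one_sided_exact(x, y):
    """Exact one-sided Wilcoxon signed-rank p-value for H1: x tends to exceed y.

    Assumes no zero differences and no ties among |x_i - y_i|, so ranks are 1..n.
    """
    d = [a - b for a, b in zip(x, y)]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0] * len(d)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    # Under H0 each rank is positive or negative with probability 1/2.
    hits = sum(1 for signs in product((False, True), repeat=len(d))
               if sum(r for r, s in zip(ranks, signs) if s) >= w_plus)
    return hits / 2 ** len(d)


proposed = [99.05, 96.82, 98.01, 97.96, 96.84, 93.76]
method_35 = [64.82, 76.39, 82.24, 77.35, 83.54, 74.64]  # Method in [35]
method_37 = [80.79, 86.41, 88.69, 87.90, 91.68, 94.13]  # Method in [37]
print(wilcoxon_one_sided_exact(proposed, method_35))  # 0.015625
print(wilcoxon_one_sided_exact(proposed, method_37))  # 0.03125
```

For larger samples or tied differences, `scipy.stats.wilcoxon(x, y, alternative="greater")` performs the same test without manual enumeration.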

Table 5. Single-tower extraction accuracy and completeness on Line1–Line6.

Tower	Precision (%)	Recall (%)	F1-Score (%)
Line1 T1	99.31	99.79	99.55
Line1 T2	97.85	99.57	98.70
Line1 T3	98.21	98.96	98.59
Line2 T1	97.45	97.97	97.71
Line2 T2	95.32	93.59	94.45
Line2 T3	96.90	96.46	96.68
Line2 T4	98.94	97.74	98.34
Line2 T5	95.01	96.82	95.90
Line3 T1	98.16	99.67	98.91
Line3 T2	96.82	98.02	97.42
Line4 T1	96.17	99.74	97.92
Line4 T2	97.45	98.21	97.82
Line4 T3	96.46	99.88	98.14
Line4 T4	96.37	99.85	98.08
Line4 T5	96.19	99.90	98.01
Line4 T6	95.99	99.54	97.74
Line5 T1	94.53	99.28	96.84
Line5 T2	96.07	98.77	97.40
Line5 T3	96.20	99.23	97.69
Line5 T4	96.72	98.96	97.83
Line5 T5	92.41	99.58	95.86
Line5 T6	97.70	98.29	97.99
Line5 T7	92.11	99.28	95.56
Line5 T8	93.08	99.30	96.09
Line6 T1	88.26	99.21	93.41
Line6 T2	93.22	95.36	94.27

Table 6. Computational efficiency comparison of representative methods and the proposed method on Line1–Line6.

Method	Type	Training Time (s)	Inference Time (s)	Time per Million Points (s/Mpts)
Method in [35]	Deep learning	12,358.29	1093.63	1.96
Method in [36]		46,234.72	18,234.88	34.84
Method in [10]		20,582.36	6632.74	11.97
Method in [37]		82,945.98	32,765.34	64.15
Method in [16]	Feature-based	87,234.79	10,972.34	19.69
Method in [19]		42,923.25	14,824.38	25.26
Method in [23]	Unsupervised	–	1288.42	2.35
Method in [27]		–	1234.46	2.23
Method in [38]		–	51,173.21	92.53
Proposed Method		–	1067.16	1.93

Note: Bold values indicate the best performance for each metric among all compared methods. “–” indicates that no training stage is required. Training time is reported separately for supervised methods and is not included in the inference-time comparison.

Table 7. Ablation study of the main modules of the proposed method on Line1–Line6.

Variant	Ablation Setting	Precision (%)	Recall (%)	F1-Score (%)	ΔF1 (%)
Full model	Central-region guidance + side/front views + geometric base model	96.51	97.93	97.21	–
Without central region guidance	Whole candidate for pose/reference estimation; other modules retained	48.90	40.80	43.61	−53.60
Without dual-view refinement	Side-view refinement only; other modules retained	90.41	98.42	94.23	−2.98
Without base-constrained filtering	Height-threshold filtering only; other modules retained	81.77	98.36	89.20	−8.01

Note: Bold values indicate the best performance for each metric among all variants. ΔF1 denotes the F1-score difference between each ablation variant and the full model. In the variant without base-constrained filtering, the geometric base model is replaced by a simple height-threshold filter with a threshold of 1 m above the minimum height of the refined candidate point cloud.

Table 8. Additional experiment on inter-tower adhesion using Line7.

Line	Parameter	$D_{c}$ (m)	$D_{t}$ (m)	Candidate Number	F1-Score (%)
Line7_1	Default	27.3	12.6	2	98.83
Line7_2	Default	24.1	9.4	2	98.56
Line7_3	Default	21.9	7.7	1	No result
Line7_3	$ϵ_{d b} = 6$	21.9	7.7	2	98.58
Line7_4	$ϵ_{d b} = 6$	19.1	4.6	2	95.41
Line7_4	$ϵ_{d b} = 6, λ = 0.2$	19.1	4.6	2	98.92


Share and Cite

MDPI and ACS Style

Tong, C.; Shen, Y.; Zhang, K.; Wei, H. Shape Prior-Guided Coarse-to-Fine Extraction of Overhead Transmission Line Towers from UAV LiDAR Point Clouds. Remote Sens. 2026, 18, 2082. https://doi.org/10.3390/rs18132082

AMA Style

Tong C, Shen Y, Zhang K, Wei H. Shape Prior-Guided Coarse-to-Fine Extraction of Overhead Transmission Line Towers from UAV LiDAR Point Clouds. Remote Sensing. 2026; 18(13):2082. https://doi.org/10.3390/rs18132082

Chicago/Turabian Style

Tong, Chaoliu, Yu Shen, Kanjian Zhang, and Haikun Wei. 2026. "Shape Prior-Guided Coarse-to-Fine Extraction of Overhead Transmission Line Towers from UAV LiDAR Point Clouds" Remote Sensing 18, no. 13: 2082. https://doi.org/10.3390/rs18132082

APA Style

Tong, C., Shen, Y., Zhang, K., & Wei, H. (2026). Shape Prior-Guided Coarse-to-Fine Extraction of Overhead Transmission Line Towers from UAV LiDAR Point Clouds. Remote Sensing, 18(13), 2082. https://doi.org/10.3390/rs18132082

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
