Article

A Novel Framework for Individual Tree Segmentation in Complex Urban Forests from Terrestrial LiDAR Point Clouds

1 College of Architecture, Nanjing Tech University, Nanjing 211816, China
2 Shanghai Chenshan Botanical Garden, Songjiang, Shanghai 201602, China
3 Interdisciplinary Innovation Institute, Xi’an University of Architecture and Technology, Xi’an 710054, China
* Author to whom correspondence should be addressed.
Forests 2026, 17(1), 36; https://doi.org/10.3390/f17010036
Submission received: 7 November 2025 / Revised: 20 December 2025 / Accepted: 24 December 2025 / Published: 26 December 2025

Abstract

Accurate individual tree inventories are fundamental to urban forest management, yet automated delineation from Terrestrial Laser Scanning (TLS) data remains a challenge. This study presents a two-stage hybrid framework that combines a domain-adapted deep learning model (TreeLA-Net) with a geometric algorithm (SEGR) to solve this issue, aiming to reduce the need for instance-level annotations. TreeLA-Net first generates semantic labels, outperforming the baseline RandLA-Net by 2.5 percentage points in overall accuracy. Subsequently, SEGR leverages these priors to achieve a tree detection rate of 92.0% on our primary study site. To assess the framework’s transferability, an external validation was conducted on a new, independent site, where the model, without retraining, yielded a recall of 81.5%. These findings suggest that the framework is not strictly overfitted and possesses generalization capabilities. The proposed approach is offered as a potential tool to support data-driven urban forest management, particularly for automated tree mapping and inventory. We hope that this study may contribute to ongoing efforts to develop robust methods for characterizing complex urban forest structures.

1. Introduction

Urban forests are critical ecosystems that provide essential services, including carbon sequestration, air quality improvement, and biodiversity conservation, which are fundamental to sustainable urban development [1,2]. Effective management and planning for these vital green infrastructures depend on accurate, timely, and scalable forest inventories. Traditionally, these inventories rely on manual field surveys, a process that is not only labor-intensive and costly but also prone to error and impractical for large-scale or continuous monitoring, especially in structurally complex, mixed-species stands [3]. The advent of Terrestrial Laser Scanning (TLS) has revolutionized forest mensuration by providing highly detailed 3D structural data. However, a significant bottleneck persists in translating this raw data into actionable, tree-level information. The automated delineation of individual trees—a prerequisite for deriving key inventory metrics—remains a formidable challenge in dense urban canopies with interlocking crowns and complex understory, hindering the widespread operational adoption of TLS for urban forest management [4].
The direct application of general-purpose deep learning algorithms, which have excelled in other domains, has proven suboptimal for forestry applications due to the unique characteristics of forest point clouds. Forest scenes are inherently vertically stratified and suffer from extreme class imbalance; for instance, tree trunks, which are essential for identifying individual stems, often represent less than 15% of the total data points [5]. This leads to poor recognition of these critical yet sparse features by standard models. Furthermore, the intricate and often overlapping branch structures within dense canopies create geometrically ambiguous boundaries that confuse generic classifiers. These domain-specific challenges underscore the need for specialized deep learning architectures that are explicitly designed to interpret the complex, hierarchical structure of forest ecosystems.
While foundational deep learning models for point clouds, such as PointNet [6] and PointNet++ [7], established key principles for 3D data processing, their application to large-scale, unstructured forest scenes is limited. More recent models like RandLA-Net have improved efficiency but still struggle with forestry-specific problems [8]; for example, its random sampling mechanism can inadvertently discard the very trunk points that are crucial for tree identification. This indicates a clear need to move beyond the direct application of existing models and towards the development of novel architectures that are fundamentally adapted to the nuances of forest point cloud data.
Moreover, even with accurate semantic classification (i.e., labeling points as trunk, crown, or ground), the subsequent step of grouping these points into discrete individual tree instances presents another major hurdle. Conventional geometric post-processing methods often fail in complex stands: connectivity analysis tends to merge adjacent trees (under-segmentation) [7], clustering algorithms like DBSCAN are notoriously sensitive to parameter settings [9], and trunk-based methods are unreliable where stems are occluded [10]. While end-to-end deep learning models for instance segmentation exist, they typically require vast amounts of manually annotated instance-level training data—a practical impossibility for most large-scale forestry projects [11]. This predicament highlights a critical research opportunity: to develop hybrid frameworks that combine the semantic feature-learning power of deep learning with the logic and efficiency of geometric algorithms, thereby creating a solution that is both accurate and scalable [12].
Therefore, the primary aim of this study is to develop and validate a novel, two-stage hybrid framework, TreeLA-Net + SEGR, for high-accuracy individual tree segmentation in complex urban forests, specifically designed to operate without the need for instance-level annotations. We introduce a deep learning model (TreeLA-Net) tailored for robust semantic segmentation in forest scenes and a geometrically driven algorithm (SEGR) for precise instance delineation. By systematically evaluating this framework across a gradient of forest densities, we also seek to establish clear applicability boundaries, providing a practical and validated solution that advances the field of precision forestry and offers a critical tool for modern, data-driven urban forest management.

2. Materials and Methods

2.1. Study Area and Data Acquisition

The point cloud data for this study were acquired from a 1-hectare (ha) permanent monitoring forest plot located in the Shanghai Haiwan National Forest Park, China (30°49′ N, 121°40′ E). This large-scale urban ecological forest, established using near-natural afforestation techniques, features a complex, multi-layered vegetation structure characteristic of mature urban woodlands (Figure 1) [13]. The plot, established in 2000, contains a mixed-species community. The upper canopy is composed of evergreen species (Camphora officinarum Nees ex Wall., Ligustrum lucidum W.T.Aiton, Michelia chapensis Dandy and Photinia davidsoniae Rehder & E.H.Wilson) and deciduous species (Sapium sebiferum (L.) Small, Camptotheca acuminata Decne., Koelreuteria paniculata Laxm., and Cornus wilsoniana Wangerin). A dense understory layer, dominated by Distylium racemosum Siebold & Zucc. and Pittosporum tobira (Thunb.) W.T.Aiton, further contributes to the structural complexity of the site.
Data were collected using a RIEGL VZ-4000 Terrestrial Laser Scanner (TLS) (RIEGL Laser Measurement Systems GmbH, Horn, Austria), which provides a scanning accuracy of ±5 mm over a range of 0.5–400 m. This high-precision instrument is well-suited for capturing the detailed structural information required for individual tree segmentation in complex environments. A multi-scan approach was adopted, with scanner positions strategically distributed throughout the plot to ensure complete coverage and minimize occlusion effects. A total of 15 scan stations (base points) were strategically established. The spatial distribution of these stations was designed to follow a systematic grid pattern of approximately 25 m × 25 m, with additional scans placed opportunistically in areas of particularly dense understory vegetation. This strategy guaranteed the spatial continuity and integrity of the final co-registered point cloud (Figure 2). The raw dataset for the entire 1-ha plot comprises approximately 1.2 billion points.

2.2. Data Preprocessing and Annotation

The raw TLS point cloud required a multi-step preprocessing workflow to prepare it for model training. First, noise was removed using a Statistical Outlier Removal (SOR) filter configured with a neighborhood size (k) of 30 points and a standard deviation multiplier threshold (α) of 1.0. These parameters were determined empirically to achieve an optimal balance between effective noise removal and the preservation of fine structural details, such as small branches and foliage. Second, the point cloud was classified into ground and non-ground points using the Cloth Simulation Filter (CSF) algorithm [14]. The CSF is a state-of-the-art method specifically designed and widely validated for accurately separating ground points in complex, vegetated terrain. Its parameters were carefully tuned to the specific conditions of our forest plot. A high-resolution (0.25 m) Digital Elevation Model (DEM) was then generated by interpolating the classified ground points. To eliminate terrain-induced height variations, a normalized height (i.e., height above ground) was calculated for every non-ground point. This was achieved by querying the DEM for the ground elevation at each point’s XY location and subtracting this value from the point’s absolute Z-coordinate. This standardized and robust workflow provided a reliable basis for all subsequent height-based analyses. Finally, the point clouds from all scan stations were co-registered into a single, unified coordinate system, specifically the UTM Zone 51N with a WGS84 datum.
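The height-normalization step can be illustrated with a minimal NumPy sketch: each non-ground point's height above ground is its absolute Z minus the DEM elevation at its XY location. This is a toy, not the production workflow; the function name `normalize_heights` and the nearest-cell DEM lookup (a real pipeline would typically interpolate) are illustrative assumptions.

```python
import numpy as np

def normalize_heights(points, dem, origin, cell=0.25):
    """points: (N, 3) XYZ array; dem: 2D grid of ground elevations;
    origin: (x0, y0) of the DEM's lower-left cell; cell: DEM resolution (m)."""
    # Map each point's XY coordinates to a DEM cell index (nearest cell)
    cols = np.clip(((points[:, 0] - origin[0]) / cell).astype(int), 0, dem.shape[1] - 1)
    rows = np.clip(((points[:, 1] - origin[1]) / cell).astype(int), 0, dem.shape[0] - 1)
    ground_z = dem[rows, cols]        # ground elevation under each point
    return points[:, 2] - ground_z    # normalized height above ground
```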
For the development and direct geometric evaluation of our point-level segmentation models, a high-quality annotated dataset was created through meticulous manual labeling of the point cloud in CloudCompare (v2.12). This annotated dataset serves as the necessary training and validation data for the deep learning algorithm, a standard requirement for developing such supervised models. Its primary purpose is to enable the model to learn point-level geometric patterns and to allow for a direct assessment of the segmentation’s geometric fidelity (i.e., how accurately points are assigned to the correct class and instance). We acknowledge that the ultimate validation of any remote sensing-derived forest inventory product comes from comparison against independent, high-accuracy field measurements. However, for the specific methodological goal of this study—to develop and assess the geometric accuracy of the segmentation algorithm itself—the manually annotated point cloud provides the direct, point-wise reference needed.
Our annotation scheme was designed to capture the dominant structural elements of the forest, assigning all points to one of four functionally distinct semantic classes. The Ground class includes the terrain surface and any low-lying vegetation (e.g., grass, seedlings) structurally indistinguishable from it at the given point cloud resolution. The Shrub class encompasses understory woody vegetation without a distinct single-stem form, typically below 2.5 m in height. The Trunk class represents the entire woody framework of a tree, including its main stem and major structural branches. Finally, the Crown class primarily consists of the tree’s foliage (leaves and fine branches). In addition to these semantic labels, each of the 441 individual trees, comprising both trunk and crown points, was assigned a unique instance ID to provide the ground truth for the instance segmentation task.
To prepare the data for the deep learning model, several critical preprocessing steps were taken. First, all point coordinates were globally normalized to a unit space [15]. This normalization is essential for two primary reasons: (1) it ensures scale invariance, allowing the network to learn geometric patterns independent of the point cloud’s absolute coordinates or scale, and (2) it improves training stability and accelerates convergence by scaling features to a consistent range (e.g., [−1, 1]), which mitigates numerical issues during backpropagation. This was achieved by subtracting the global centroid (Cglobal) from each point’s raw coordinates (Praw) and dividing by the maximum spatial extent of the dataset (Smax), as shown in Equation (1):
$$P_{norm} = \frac{P_{raw} - C_{global}}{S_{max}} \tag{1}$$
Second, to manage the computational demands, the normalized point cloud was partitioned into a grid of 20 m × 20 m sub-plots with a 5 m overlap between adjacent plots [16]. This resulted in 36 sub-plots for training and 9 for testing. For each point within these sub-plots, a 15-dimensional feature vector was then computed. This vector comprised its normalized 3D coordinates (XYZ) and 12 geometric features derived from the covariance matrix of its 20 nearest neighbors [17]. Following this methodology, these features include 3D shape features (Linearity, Planarity, Sphericity, Omnivariance, Anisotropy, Eigenentropy, Sum of eigenvalues, and Change of curvature), the three components of the 3D normal vector, and Verticality. This feature engineering step provides the network with rich, multi-scale geometric information beyond simple coordinates. Finally, the annotated dataset was split into training (80%) and testing (20%) sets.
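The covariance-based descriptors listed above can be derived from the eigenvalues λ1 ≥ λ2 ≥ λ3 of each point's local neighborhood. The sketch below uses common formulations from the literature; the function name and the exact feature definitions are assumptions rather than the authors' code.

```python
import numpy as np

def local_geometric_features(neighbors):
    """neighbors: (k, 3) array of a point's k nearest neighbors.
    Returns covariance-eigenvalue descriptors of the local geometry."""
    cov = np.cov(neighbors.T)
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    l3, l2, l1 = np.maximum(evals, 1e-12)   # relabel so l1 >= l2 >= l3
    normal = evecs[:, 0]                    # eigenvector of smallest eigenvalue
    s = np.maximum(evals / evals.sum(), 1e-12)
    return {
        "linearity":  (l1 - l2) / l1,       # high for trunk-like structures
        "planarity":  (l2 - l3) / l1,       # high for ground-like surfaces
        "sphericity": l3 / l1,              # high for volumetric foliage
        "omnivariance": (l1 * l2 * l3) ** (1.0 / 3.0),
        "anisotropy": (l1 - l3) / l1,
        "eigenentropy": -np.sum(s * np.log(s)),
        "sum_eigenvalues": l1 + l2 + l3,
        "change_of_curvature": l3 / (l1 + l2 + l3),
        "verticality": 1.0 - abs(normal[2]),  # 0 for horizontal surfaces
        "normal": normal,
    }
```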

2.3. The TreeLA-Net Framework for Semantic Segmentation

To address the limitations of generic point cloud models in forest environments, we developed TreeLA-Net, a novel deep learning architecture adapted from RandLA-Net [18]. As illustrated in Figure 3, our framework introduces targeted enhancements to the input layer, attention mechanism, and loss function to improve the perception of vertically stratified and class-imbalanced forest scenes.
Conventional networks often rely solely on XYZ coordinates, forcing them to learn fundamental geometric properties from scratch. To overcome this, TreeLA-Net employs a multi-modal input layer that fuses the 12 pre-computed geometric features with the raw 3D coordinates. This 15-dimensional input vector provides the network with explicit geometric priors, which accelerates convergence and enhances the model’s ability to discriminate between classes with distinct structural signatures (e.g., vertical trunks vs. spherical crowns) [19,20].
Standard attention mechanisms treat all spatial relationships equally, a suboptimal approach for vertically structured forest ecosystems. We introduce the Height-Stratified Self-Attention (HSSA) module, which explicitly incorporates vertical awareness into the attention calculation. HSSA modulates the standard self-attention weights using a learnable, height-based decay factor, effectively prioritizing feature aggregation among points at similar vertical elevations.
The core of the HSSA module lies in a height-aware modulation mechanism for attention weights. Given a feature map $F \in \mathbb{R}^{B \times C \times N}$ (where $B$ denotes batch size, $C$ the number of feature channels, and $N$ the number of points), the height coordinate $z_i$ of each point is extracted to calculate the pairwise height difference:
$$\Delta h_{ij} = \left| z_i - z_j \right| \tag{2}$$
The standard self-attention weight is calculated as:
$$\alpha_{ij}^{standard} = \frac{\exp\left(Q_i K_j^{T}\right)}{\sum_{k}\exp\left(Q_i K_k^{T}\right)} \tag{3}$$
where $Q_i$ and $K_j$ are the query and key vectors, respectively.
The HSSA module introduces a height modulation factor $\beta_{ij}$, which decays with increasing height difference:
$$\beta_{ij} = \exp\left(-\lambda \Delta h_{ij}\right) \tag{4}$$
where $\lambda$ is a learnable parameter that controls the influence of height difference.
The final HSSA attention weight is defined as:
$$\alpha_{ij}^{HSSA} = \frac{\alpha_{ij}^{standard} \cdot \beta_{ij}}{\sum_{k} \alpha_{ik}^{standard} \cdot \beta_{ik}} \tag{5}$$
This design enables the model to capture strong intra-layer dependencies (e.g., within the canopy layer) while preserving weaker but important inter-layer relationships, significantly improving its understanding of the forest’s vertical structure [21,22].
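The height-modulated attention described above can be illustrated with a small NumPy sketch. This is a standalone toy, not the TensorFlow implementation; the function name and the use of a pre-computed raw score matrix are assumptions.

```python
import numpy as np

def hssa_attention(scores, z, lam=1.0):
    """scores: (N, N) raw attention logits Q_i . K_j^T; z: (N,) point heights.
    Returns the height-modulated, row-normalized attention weights."""
    # Standard softmax attention (numerically stabilized)
    a = np.exp(scores - scores.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    # Height modulation factor, decaying with pairwise height difference
    dh = np.abs(z[:, None] - z[None, :])
    beta = np.exp(-lam * dh)
    # Re-normalize so each row remains a valid attention distribution
    w = a * beta
    return w / w.sum(axis=1, keepdims=True)
```

With uniform logits, a point at 0 m attends more strongly to a neighbor at 0.1 m than to one at 5 m, which is exactly the intra-layer prioritization the module is designed to achieve.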
To simultaneously address class imbalance (i.e., the scarcity of trunk points) and ambiguous class boundaries, we adopt a hybrid loss function that combines a weighted Focal Loss and a Dice Loss:
$$L_{total} = \lambda_{focal} L_{focal} + \lambda_{dice} L_{dice} \tag{6}$$
The weighted Focal Loss component up-weights the contribution of the rare trunk class and other hard-to-classify samples, forcing the model to focus on these challenging examples. The Dice Loss component directly optimizes the spatial overlap between predicted and ground-truth segments, which is particularly effective for refining class boundaries and is inherently robust to class imbalance [23].
In our implementation, λfocal and λdice were set to 1.0 and 0.5, respectively, based on empirical validation. This configuration effectively enhances the model’s ability to recognize rare classes (e.g., trunks) and improves the precision of class boundaries.
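A minimal NumPy sketch of such a hybrid objective is shown below. The class-weighted focal term and the macro-averaged Dice term follow standard formulations; the function name and the exact reduction choices are assumptions, not the authors' implementation.

```python
import numpy as np

def hybrid_loss(probs, labels, class_weights, gamma=2.0,
                lam_focal=1.0, lam_dice=0.5, eps=1e-7):
    """probs: (N, C) softmax outputs; labels: (N,) integer classes;
    class_weights: (C,) up-weights rare classes such as trunks."""
    n, c = probs.shape
    onehot = np.eye(c)[labels]
    pt = np.clip((probs * onehot).sum(axis=1), eps, 1.0)
    # Weighted focal loss: (1 - p_t)^gamma down-weights easy examples
    focal = -(class_weights[labels] * (1 - pt) ** gamma * np.log(pt)).mean()
    # Dice loss: 1 - mean per-class overlap of prediction and ground truth
    inter = (probs * onehot).sum(axis=0)
    dice = 1.0 - ((2 * inter + eps) /
                  (probs.sum(axis=0) + onehot.sum(axis=0) + eps)).mean()
    return lam_focal * focal + lam_dice * dice
```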

Semantic Segmentation Accuracy Evaluation

The TreeLA-Net model was implemented in TensorFlow (v2.4.1) and trained on a workstation equipped with an NVIDIA RTX A4000 GPU (NVIDIA Corporation, Santa Clara, CA, USA; 16 GB VRAM). We used the Adam optimizer with an initial learning rate of 0.002 and a batch size of 32. A learning rate decay scheduler was employed, which halved the learning rate every 5 epochs if the validation loss did not improve. The model was trained for a maximum of 100 epochs. The optimal combination of hyperparameters was selected by monitoring the model’s classification accuracy during training (Table 1).
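The decay rule described above (halve the learning rate when the validation loss fails to improve for 5 epochs) behaves like Keras's ReduceLROnPlateau callback with factor = 0.5 and patience = 5. A framework-free sketch of that logic follows; the class name is hypothetical.

```python
class HalveOnPlateau:
    """Halves the learning rate whenever the validation loss fails to
    improve for `patience` consecutive epochs (paper settings: lr = 0.002,
    patience = 5)."""

    def __init__(self, lr=0.002, patience=5):
        self.lr, self.patience = lr, patience
        self.best = float("inf")
        self.stale = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.stale = val_loss, 0   # improvement: reset counter
        else:
            self.stale += 1
            if self.stale >= self.patience:       # plateau reached: halve lr
                self.lr *= 0.5
                self.stale = 0
        return self.lr
```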

2.4. SEGR: A Semantic-Guided Instance Segmentation Algorithm

Following semantic segmentation, we introduce our Semantic-guided Extraction via Region Growing (SEGR) algorithm to delineate individual tree instances. This unsupervised method leverages the semantic priors from TreeLA-Net to achieve accurate instance segmentation without requiring any instance-level training data.

Multi-Source Anchor Extraction Strategy

Accurate tree localization is the critical first step. To ensure robustness against trunk occlusion and segmentation errors, SEGR employs a hybrid anchor generation strategy that combines two complementary information sources:
(1)
Trunk Anchors: All points classified as “trunk” are clustered using DBSCAN. The centroid of each resulting cluster serves as a high-confidence “trunk anchor,” leveraging the strong prior that each tree has a single main stem.
(2)
Crown Density Anchors: To detect trees whose trunks are occluded or were missed during semantic segmentation, a 2D Kernel Density Estimation (KDE) is applied to all “crown” points [24]. The local maxima of this density map are identified as “crown density anchors”.
These two sets of anchors are then fused. A distance-based filter (threshold = 1.5 m, based on minimum tree spacing) and a point-count filter (threshold = 200 points, based on minimum crown size) are applied to remove redundant anchors and false positives, yielding a final, robust set of seed points for instance segmentation.
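The anchor-fusion logic can be sketched as follows. This is a deliberately simplified stand-in: in SEGR the anchors come from DBSCAN centroids and KDE peaks, whereas here they are plain 2D seeds with pre-computed supporting point counts; the function name and data layout are assumptions.

```python
import math

def fuse_anchors(trunk_anchors, crown_anchors, counts, d_min=1.5, n_min=200):
    """trunk_anchors/crown_anchors: lists of (x, y) seed positions;
    counts: dict mapping each anchor to its supporting point count.
    Trunk anchors take priority; a crown anchor is kept only when it is at
    least d_min (m) from every anchor already accepted, and any anchor with
    fewer than n_min supporting points is discarded as a false positive."""
    kept = []
    for a in trunk_anchors + crown_anchors:
        if counts[a] < n_min:                       # point-count filter
            continue
        if all(math.dist(a, b) >= d_min for b in kept):  # distance filter
            kept.append(a)
    return kept
```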
With the hybrid anchors as seeds, a competitive region growing algorithm assigns all “crown” points to their respective tree instances [25]. Unlike traditional methods that use 2D distance, SEGR employs a 3D weighted Euclidean distance that prioritizes horizontal proximity over vertical proximity (Wxy = 0.8, Wz = 0.2). This aligns with the natural growth patterns of trees, where crowns expand horizontally. The growth process initiates simultaneously from all seeds. When growth fronts collide, points in the contested area are assigned to the closest anchor based on this weighted distance. An iterative boundary refinement step is applied to ambiguous boundary points to ensure smooth and plausible crown shapes. Finally, any small, unassigned point clusters are merged with the nearest tree instance [26].
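The weighted distance used for competitive assignment might be sketched as below. The text does not specify exactly how Wxy and Wz enter the metric, so the form sqrt(Wxy·d_xy² + Wz·d_z²) is an assumption, as is the function name.

```python
import numpy as np

def assign_crown_points(points, anchors, wxy=0.8, wz=0.2):
    """points: (N, 3) crown points; anchors: (M, 3) seed positions.
    Assigns each point to the anchor with the smallest weighted 3D
    Euclidean distance, prioritizing horizontal proximity (wxy > wz)."""
    d_xy = np.linalg.norm(points[:, None, :2] - anchors[None, :, :2], axis=2)
    d_z = np.abs(points[:, None, 2] - anchors[None, :, 2])
    d = np.sqrt(wxy * d_xy ** 2 + wz * d_z ** 2)
    return d.argmin(axis=1)   # index of the winning anchor per point
```

Because wz is small, a point directly above an anchor but several meters higher still joins that anchor's tree rather than a horizontally distant neighbor, matching the horizontal crown-expansion prior.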

2.5. Experimental Design for Performance Evaluation

To comprehensively evaluate the applicability of the TreeLA-Net + SEGR framework across forest scenarios of varying complexity, a performance testing protocol based on density gradients was designed. This experiment quantifies the relationship between algorithm performance and scene complexity, thereby delineating the application scope and performance boundaries of the proposed method.

2.5.1. Construction of Test Subsets and Complexity Classification

To assess the framework’s robustness and define its practical application boundaries, we evaluated its performance across three levels of forest complexity, which were simulated by randomly sampling subsets of trees from the main test plot [27]. The complexity levels were defined based on tree density and crown overlap. The thresholds for these levels were established based on a combination of established stand density classification principles from forestry literature [12] and a data-driven analysis of our own study site’s structural characteristics. The “high-density” scenario was defined to represent the full, unaltered stand conditions of our complex plot, while the “low” and “medium” density thresholds were set to simulate a realistic gradient of structural complexity found across various urban forest types. The specific parameters are as follows:
  • Low-density scenario: 10–20 trees per subset, with a tree density of <0.05 trees/m2 and crown overlap of <20%. This category represents simple forest scenes characterized by sparsely distributed trees and minimal crown overlap.
  • Medium-density scenario: 30–50 trees, with a tree density of 0.05–0.1 trees/m2 and crown overlap of 20%–40%. This category represents typical urban forest configurations, where localized crown overlap occurs but individual trees remain generally distinguishable.
  • High-density scenario: ≥88 trees per subset, with tree density of >0.1 trees/m2 and crown overlap of >40%. This category represents complex scenes (e.g., dense mixed forests) featuring severely interlocking crowns and complex understory vegetation.

2.5.2. Evaluation Metrics

The full semantic and instance segmentation pipeline was run on each subset, and performance metrics were recorded to analyze the relationship between algorithm accuracy and scene complexity.
Semantic segmentation was evaluated using Overall Accuracy (OA), mean Intersection over Union (mIoU), and the Kappa coefficient. Instance segmentation was evaluated using Completeness (Recall), Correctness (Precision), and instance-level mIoU. OA represents the proportion of correctly classified points to the total number of points, serving as a direct indicator of the model’s overall classification performance. mIoU is calculated as the average of the IoU for each semantic class, providing a holistic measure of segmentation accuracy across all target categories. The Kappa coefficient quantifies the consistency between the model’s classification and the ground truth while accounting for the possibility of random agreement, which is particularly valuable in scenarios with class imbalance.
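All three semantic metrics can be computed from a single confusion matrix; a minimal NumPy sketch (function name assumed):

```python
import numpy as np

def semantic_metrics(cm):
    """cm: (C, C) confusion matrix, rows = ground truth, cols = predicted.
    Returns (OA, mIoU, Kappa)."""
    total = cm.sum()
    tp = np.diag(cm).astype(float)
    oa = tp.sum() / total                             # overall accuracy
    # Per-class IoU = TP / (TP + FP + FN), averaged over classes
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp)
    miou = iou.mean()
    # Kappa corrects OA for the expected chance agreement p_e
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, miou, kappa
```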
Instance matching was performed using the Hungarian algorithm with an IoU threshold of 0.5 to determine a true positive detection.
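This matching step can be sketched with SciPy's implementation of the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`); the function name and the convention of pre-computing a pairwise IoU matrix are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_instances(iou_matrix, thresh=0.5):
    """iou_matrix: (n_pred, n_gt) pairwise IoU between predicted and
    ground-truth tree instances. The Hungarian algorithm finds the
    IoU-maximizing one-to-one assignment; matched pairs below the IoU
    threshold are discarded. Returns (pred, gt) true-positive pairs."""
    rows, cols = linear_sum_assignment(-iou_matrix)   # negate to maximize
    return [(r, c) for r, c in zip(rows, cols) if iou_matrix[r, c] >= thresh]
```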

2.6. Baseline Models for Comparison

To benchmark the performance of our proposed TreeLA-Net, we compared it against two influential deep learning models for point cloud segmentation: PointNet [6] and the original RandLA-Net [8]. PointNet serves as a pioneering baseline, while RandLA-Net represents the state-of-the-art architecture from which our model was adapted.
For a fair and direct comparison, both baseline models were implemented using their publicly available, official codebases. They were trained and evaluated on our dataset under the exact same conditions as TreeLA-Net. This included using the identical training and testing data split, the same data preprocessing workflow, and the same hardware environment. To ensure a level playing field, the 15-dimensional feature vector (XYZ + 12 geometric features) was used as input for both RandLA-Net and our TreeLA-Net, while PointNet was trained on the XYZ coordinates as per its original design. The hyperparameters for the baseline models were tuned to achieve their best performance on our dataset.

2.7. Experimental Protocol for External Validation

To rigorously assess the generalization capability of our framework, we conducted an external validation experiment on an entirely independent dataset. This dataset was acquired from a forest stand on Chongming Island, Shanghai (31°30′ N, 121°30′ E), an environment structurally and ecologically distinct from our primary training site. While the primary site is a mature, mixed-species urban forest, the Chongming validation site is a coastal plantation characterized by a different species composition, stand age, and canopy structure. This domain shift provides a robust test of the model’s transferability.

2.7.1. Data Acquisition and Preprocessing

The validation point cloud was collected using a Feima SLAM200 handheld mobile laser scanner (Feima Robotics, Shenzhen, China). This system leverages a multi-platform SLAM (Simultaneous Localization and Mapping) algorithm to generate georeferenced 3D point clouds in GNSS-denied environments. The scanner has a range of 120 m, a point acquisition rate of 320,000 points/s, and a nominal relative accuracy of ±5 cm. Data were collected by walking systematic transects to ensure comprehensive coverage of the plot.

2.7.2. Ground Truth Generation

A high-quality reference dataset was generated for validation. Individual trees were first automatically segmented from the point cloud using a geometry-based approach and then subjected to a meticulous manual verification and refinement process. Any poorly defined or incomplete tree point clouds were excluded, resulting in a final ground truth dataset of 508 well-delineated individual trees.

2.7.3. Validation Protocol

To simulate a real-world deployment scenario and test for site-specific overfitting, the TreeLA-Net model—trained exclusively on the Haiwan Park dataset—was applied directly to the Chongming dataset without any retraining or fine-tuning. The continuous point cloud was first partitioned into 20 m × 20 m subplots with a 5 m buffer to ensure seamless processing. After running the full TreeLA-Net + SEGR pipeline on each subplot, duplicate tree instances predicted in the overlapping buffer zones were identified and merged based on the proximity of their centroids (DBSCAN, ε = 1.8 m). The final instance segmentation performance was evaluated against the ground truth using Precision, Recall (Detection Rate), F1-Score, and mean Intersection over Union (mIoU), with a match defined by an IoU threshold of 0.5.
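The buffer-zone deduplication can be illustrated with a simplified greedy merge over predicted tree centroids; this is a stand-in for the DBSCAN (ε = 1.8 m) step described above, and the function name is hypothetical.

```python
import math

def merge_duplicate_trees(centroids, eps=1.8):
    """centroids: list of (x, y) predicted tree centroids pooled from all
    overlapping subplots. Centroids closer than eps (m) to an existing
    cluster centroid are merged into it; each cluster's centroid is the
    running mean of its members."""
    merged = []  # each entry: [sum_x, sum_y, count]
    for x, y in centroids:
        for m in merged:
            cx, cy = m[0] / m[2], m[1] / m[2]
            if math.hypot(x - cx, y - cy) < eps:   # duplicate detection
                m[0] += x
                m[1] += y
                m[2] += 1
                break
        else:
            merged.append([x, y, 1])               # new tree instance
    return [(m[0] / m[2], m[1] / m[2]) for m in merged]
```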

3. Results

3.1. Performance of the TreeLA-Net Semantic Segmentation Model

To qualitatively assess the impact of our novel Height-Stratified Self-Attention (HSSA) module, we visualized the high-dimensional feature space learned by the TreeLA-Net encoder. A t-SNE projection of the 256-dimensional features into a 3D space reveals that points from different vertical strata (ground, shrub, trunk, crown) form distinct and well-separated clusters (Figure 4). Feature overlap is minimal and primarily confined to logical transition zones, such as the interface between the upper trunk and the lower crown. This spatial organization in the feature space confirms that the HSSA module successfully embeds height information into the feature learning process, enabling the model to develop a robust, vertically aware understanding of the forest structure.
Further analysis of the HSSA module’s internal behavior via a feature affinity matrix provides quantitative support for this conclusion (Figure 5). Results indicate that the intra-layer affinity (matrix diagonal) is consistently higher than inter-layer affinity. The shrub layer exhibited the highest intra-layer affinity (0.637), followed by the ground layer (0.620), the trunk layer (0.610), and the crown layer (0.587). This confirms that the HSSA module achieves its design objective of strengthening intra-layer feature correlations, as it effectively captures the stronger feature associations among points within the same height layer. Furthermore, affinity between adjacent layers (e.g., 0.630 for ground-shrub and 0.616 for trunk-crown) was higher than that between non-adjacent layers (e.g., 0.603 for ground-crown). This indicates that while the module reinforces intra-layer relationships, it preserves cross-layer structural information, which is critical for the model to interpret the overall vertical hierarchy of forests and avoid classification errors arising from disconnected inter-layer information.
A feature response analysis further revealed that HSSA dedicates greater attention to structurally complex classes such as trunks and shrubs, validating its ability to focus on hard-to-classify categories (Figure 6). As shown in Figure 6, the feature response strength for shrubs (composite score: 0.466) and trunks (0.461) was significantly higher than that for crowns (0.442) and the ground (0.441). This pattern is consistent with the geometric characteristics of forest scenes: shrubs are intermingled with ground cover, and trunks extend vertically across multiple height layers; both classes exhibit high spatial complexity and feature variance, requiring more refined feature learning. In contrast, crowns are concentrated in the upper canopy with relatively regular shapes, and the ground is near-horizontal and uniformly distributed; both present stronger geometric regularities, resulting in lower learning difficulty. This confirms that the HSSA module exhibits stronger feature activation in structurally complex regions, enabling enhanced fine-grained recognition of hard-to-classify categories and further validating its effectiveness.
Figure 7 presents trunk segmentation results of TreeLA-Net on a representative sample, with Figure 7a showing the ground truth and Figure 7b displaying the model’s predictions. Visualization confirms that the model successfully captures the vertical continuity of the trunk structure. The predicted trunk points exhibit a distinct cylindrical distribution, highly consistent with the spatial morphology of the ground truth. Even in regions where trunks and shrubs intersect, the model accurately discriminates between the two semantic classes, minimizing inter-class confusion. This result provides additional evidence that the HSSA module significantly improves trunk segmentation accuracy, laying a robust semantic foundation for subsequent individual tree instance segmentation.
TreeLA-Net significantly outperformed both PointNet and the baseline RandLA-Net in all key semantic segmentation metrics (Table 2). Our model achieved an Overall Accuracy (OA) of 89.8% and a mean Intersection over Union (mIoU) of 81.9%. This represents a substantial improvement of 5.5 percentage points in OA and 4.5 percentage points in mIoU over PointNet, and a 2.5 and 1.8 percentage point improvement over RandLA-Net, respectively.
As shown in Table 3, TreeLA-Net achieved an IoU of 92.3% for the crown class and 91.4% for the ground class, outperforming the comparative algorithms with consistent improvements. For the trunk class (a highly confused category, primarily with shrubs), the IoU reached 61.4%, which represents a 3.6-percentage-point improvement over the original RandLA-Net. The shrub class IoU was 82.6%, representing a 1.9-percentage-point increase compared to RandLA-Net, indicating enhanced ability to distinguish shrubs from other classes.
The confusion matrix analysis (Figure 8) shows that TreeLA-Net achieved recognition accuracies of 92% for crowns and 61% for trunks, superior to PointNet (89% for crowns, 54% for trunks) and RandLA-Net (91% for crowns, 58% for trunks). All three models exhibited their primary class confusion between shrubs and trunks. However, TreeLA-Net’s shrub-trunk confusion rate (12%) was lower than that of PointNet (15%) and the original RandLA-Net (14%). This confusion arises from spatial and height overlap between trunks and shrubs in the study plot, which limits complete discrimination based on the current features. Nevertheless, the achieved accuracy is sufficient to support subsequent individual tree segmentation tasks.

3.2. Performance of the SEGR Instance Segmentation Algorithm

Leveraging the high-quality semantic priors from TreeLA-Net, our unsupervised SEGR algorithm successfully delineated individual trees from the point cloud (Figure 9). Of the 88 target trees in the test set, the SEGR method, based on a hybrid anchor strategy, successfully detected 81, achieving a detection rate of 92.0%. Overall performance metrics included a recall of 80.68%, a precision of 86.59%, and an instance-level mIoU of 56.18%, demonstrating that the method can effectively extract individual tree instances in complex urban forests.
To clarify the role of different anchor strategies in SEGR post-processing, an ablation study compared three semantic-guided schemes: a baseline using only trunk anchors derived from semantic segmentation, a second using only crown density peaks as anchors, and our proposed hybrid strategy combining both trunk anchors and crown density supplements. As shown in Table 4, the hybrid strategy leverages a complementary mechanism: it first clusters trunk points to generate initial anchors, then performs crown density peak detection in unassigned areas to supplement missed trees. This approach detected 81 of the 88 trees (a detection rate of 92.0%), with a recall of 80.7%, a precision of 86.6%, and an mIoU of 56.2%. Compared to the trunk-anchor-only strategy, the hybrid method improved recall by 21.4 percentage points and mIoU by 20.5 percentage points; compared to the crown-density-only strategy, it increased the detection rate by 29.5 percentage points and recall by 22.7 percentage points.
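The complementary mechanism described above can be sketched in code. This is an illustrative approximation, not the paper's implementation: DBSCAN stands in for the trunk-point clustering step, a coarse 2D density grid replaces the KDE peak search, and all parameter values (`eps`, `min_samples`, `radius`, `bins`) are placeholders:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def trunk_anchors(trunk_xy, eps=0.5, min_samples=5):
    """Primary anchors: centroids of trunk-point clusters in the XY plane."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(trunk_xy)
    return np.array([trunk_xy[labels == k].mean(axis=0)
                     for k in sorted(set(labels)) if k != -1])

def supplementary_anchor(crown_xy, anchors, radius=2.0, bins=20):
    """Supplementary anchor: densest crown cell among points farther than
    `radius` from every existing anchor (single-peak simplification of the
    iterative density-peak search)."""
    if len(anchors):
        d = np.linalg.norm(crown_xy[:, None, :] - anchors[None, :, :], axis=2)
        crown_xy = crown_xy[d.min(axis=1) > radius]
    if len(crown_xy) == 0:
        return None
    H, xe, ye = np.histogram2d(crown_xy[:, 0], crown_xy[:, 1], bins=bins)
    i, j = np.unravel_index(np.argmax(H), H.shape)
    return np.array([(xe[i] + xe[i + 1]) / 2, (ye[j] + ye[j + 1]) / 2])

# Toy scene: one visible trunk near the origin, plus the crown of a second
# tree around (5, 5) whose trunk is occluded.
rng = np.random.default_rng(0)
trunk = rng.normal([0.0, 0.0], 0.1, size=(50, 2))
crown = rng.normal([5.0, 5.0], 0.3, size=(200, 2))
primary = trunk_anchors(trunk)                 # one anchor near (0, 0)
extra = supplementary_anchor(crown, primary)   # recovered anchor near (5, 5)
```

The division of labor mirrors the ablation result: trunk clusters provide stable localization where trunks are visible, while the density search recovers trees whose trunks were missed by semantic segmentation.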

3.3. Framework Performance Across Forest Density Gradients

The TreeLA-Net + SEGR framework achieves peak performance in medium-density scenarios (0.05–0.1 trees/m²). Despite facing over-segmentation challenges in low-density scenarios and boundary blurring in high-density scenarios, it maintains an approximately 80% detection rate, showing strong cross-density generalization ability.
As shown in Figure 10a, mIoU exhibits a downward trend with increasing tree density: in low-density scenarios, mIoU values are 71.8% (10-tree scene) and 71.6% (30-tree scene), both stably above the 65% high-quality threshold; in medium-density scenarios, mIoU values are 68.2% and 66.9%, remaining near the threshold; in the high-density scenario, mIoU drops to 56.2% (approximately 9 percentage points below the threshold). Beyond the critical density, severe crown interlocking causes boundary blurring, which becomes the dominant factor limiting segmentation accuracy. Figure 10b further supports this conclusion: low-density scenarios present a typical “high precision, low recall” pattern; the 50-tree medium-density scene reaches 82% recall and 72% precision (the optimal balance between detection completeness and accuracy); and high-density scenarios show 78% recall and 85% precision (characterized by reduced under-detection), attributed to the hybrid anchor strategy’s effective compensation for trunk-occluded regions. Figure 10c reflects the algorithm’s detection capability: the 10-tree scene yields 8 detections (20% under-detection); the 30-tree scene yields 45 detections (50% over-detection, indicating that the algorithm tends to misjudge scattered crowns of individual trees as multiple trees under sparse distribution); the 40-tree and 50-tree scenes yield 52 and 57 detections, respectively (13–14% deviation, showing markedly improved accuracy); and the high-density 88-tree scene yields 81 detections (8% under-detection, demonstrating stable detection).
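The over- and under-detection percentages quoted above follow from a simple signed deviation relative to the reference tree count (the helper name is ours, for illustration):

```python
def detection_deviation(n_detected, n_reference):
    """Signed deviation of the detected count from the reference count:
    positive values indicate over-detection, negative under-detection."""
    return round(100.0 * (n_detected - n_reference) / n_reference, 1)

print(detection_deviation(8, 10))    # -20.0 (10-tree scene, under-detection)
print(detection_deviation(45, 30))   # 50.0  (30-tree scene, over-detection)
print(detection_deviation(81, 88))   # -8.0  (88-tree scene, under-detection)
```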
Visual results (Figure 11) show that in the low-density scene (sparsely distributed trees, low crown overlap), the algorithm accurately segments individual tree boundaries, with results highly consistent with true tree morphology (Figure 11a); in the medium-density scene (closer tree spacing, localized crown overlap), the algorithm still performs well in boundary identification, with only slight ambiguity in severely overlapping regions (Figure 11b); and in the high-density scene (dense distribution, severe crown interlocking, complex understory), the algorithm successfully detects most trees, with missed detections limited to a few small, heavily occluded trees, demonstrating overall robust segmentation performance (Figure 11c).

3.4. Assessing Generalization on an Independent Test Site

To rigorously assess the generalization capability of our framework and address the limitations of a single-site validation, we applied our pre-trained model directly to an independent validation dataset. This new dataset was acquired from a distinct forest environment on Chongming Island, Shanghai, featuring different species compositions and stand structures. The model was applied without any retraining or fine-tuning to simulate a real-world deployment scenario.
The framework demonstrated strong generalization performance (Table 5). Out of 508 manually verified reference trees on the Chongming site, the model correctly identified 414, achieving a recall (detection rate) of 81.5% and a precision of 79.0% (Figure 12). This indicates that the model successfully transferred its learning to a new, unseen environment while maintaining a low rate of false positives.
While recall and precision decreased moderately compared to the primary test site, an expected outcome of domain shift in cross-site applications, the instance-level mean IoU on the Chongming dataset reached 75.4%, significantly higher than the mIoU achieved on the high-density primary test site (56.2%). This finding suggests that while the model’s detection sensitivity may be slightly reduced when deployed in a new environment, its ability to precisely delineate the geometric boundaries of correctly identified trees remains robust.
The model predicted 524 trees compared to the 508 reference trees, representing a minor over-prediction rate of approximately 3%. This can be attributed to factors such as minor crown fragmentation in dense canopies or subtle differences in understory vegetation between the two sites.
Overall, these external validation results provide compelling evidence that the TreeLA-Net + SEGR framework is not overfitted to its training data. Instead, it has learned generalizable structural features of trees, demonstrating the robustness and practical applicability required for deployment in large-scale, operational urban forest inventories.

4. Discussion

The results of this study, which utilized a two-stage hybrid framework for individual tree segmentation, offer an opportunity to discuss several aspects related to the development of automated forest inventory tools. A primary observation is that combining a domain-adapted deep learning model with a geometric algorithm may provide a pathway to achieve robust segmentation performance while reducing the reliance on instance-level annotations. The core contribution of this work lies in its interpretable hybrid intelligence paradigm, which offers a practical and scalable alternative to end-to-end “black-box” models. Unlike opaque deep learning architectures, each component of our SEGR algorithm—from hybrid anchor generation to competitive region growing—is based on explicit, geometrically grounded rules, enhancing the transparency, reliability, and diagnostic capability of the segmentation process [28].
TreeLA-Net adapts the original RandLA-Net to the unique characteristics of forest point clouds (i.e., vertical stratification and class imbalance) through a set of targeted improvements. First, the multi-modal feature input layer integrates geometric priors such as verticality, normalized height, and local density, enabling the model to establish a foundational understanding of the forest’s vertical structure from the initial training phase and reducing the complexity of semantic learning [15]. Second, the HSSA module explicitly models the influence of height differences on semantic correlations, strengthening intra-layer feature aggregation and improving the trunk class IoU by 3.6 percentage points compared to the baseline. Third, a hybrid loss function combining weighted Focal Loss and Dice Loss addresses extreme class imbalance by prioritizing learning for rare classes (i.e., trunks) while optimizing boundary segmentation. Experimental results confirmed that TreeLA-Net achieved an overall accuracy of 89.8% and a mean IoU (mIoU) of 81.9% in semantic segmentation, representing improvements of 2.5 and 1.8 percentage points over the original RandLA-Net, respectively. These outcomes validate the effectiveness of the targeted enhancements for forest point cloud analysis.
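The hybrid loss described above can be sketched in a few lines. This is a hedged NumPy approximation: the class weights, `gamma`, and the equal mixing factor `lam` are illustrative placeholders, not the paper's hyperparameters:

```python
import numpy as np

def focal_loss(probs, labels, class_weights, gamma=2.0, eps=1e-8):
    """Weighted focal loss; probs is (N, C) softmax output, labels (N,) ints.
    Rare classes (e.g. trunks) get larger weights, and hard points with low
    true-class probability are up-weighted by the (1 - p)^gamma factor."""
    pt = probs[np.arange(len(labels)), labels]
    w = class_weights[labels]
    return float(np.mean(-w * (1.0 - pt) ** gamma * np.log(pt + eps)))

def dice_loss(probs, labels, eps=1e-8):
    """Soft Dice loss averaged over the classes present in the batch;
    it directly rewards overlap, which favors clean class boundaries."""
    onehot = np.eye(probs.shape[1])[labels]
    losses = []
    for k in np.unique(labels):
        inter = (probs[:, k] * onehot[:, k]).sum()
        denom = probs[:, k].sum() + onehot[:, k].sum()
        losses.append(1.0 - (2.0 * inter + eps) / (denom + eps))
    return float(np.mean(losses))

def hybrid_loss(probs, labels, class_weights, lam=0.5):
    return lam * focal_loss(probs, labels, class_weights) + \
           (1.0 - lam) * dice_loss(probs, labels)

# Confident, correct predictions yield a much smaller loss than wrong ones.
probs = np.array([[0.97, 0.01, 0.01, 0.01],
                  [0.01, 0.97, 0.01, 0.01]])
w = np.array([1.0, 4.0, 2.0, 1.0])   # e.g. the rare trunk class up-weighted
good = hybrid_loss(probs, np.array([0, 1]), w)
bad = hybrid_loss(probs, np.array([1, 0]), w)
print(good < bad)  # True
```

The two terms are complementary: the focal term drives per-point classification of rare, hard classes, while the Dice term optimizes region overlap at class boundaries.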
This study further introduces the SEGR framework, which enables individual tree segmentation without requiring additional instance-level annotations. By adopting a “geometric-semantic alignment and correction” strategy, SEGR significantly reduces data annotation requirements and enhances cross-scene transferability compared to end-to-end deep learning methods. Its core innovation is the “trunk anchor + crown density supplement” hybrid strategy: initial anchors are generated from clustered trunk points (a direct output of semantic segmentation) to ensure localization stability, while supplementary anchors are identified in crown-dense areas (where trunks may be occluded) using Kernel Density Estimation (KDE) to improve detection completeness. Crown point assignment is based on a weighted combination of horizontal and vertical distances, avoiding the boundary distortions common in 2D-based assignment methods and mitigating under-segmentation in dense stands. Ablation studies revealed significant limitations of single-anchor strategies (recall of 59.3% for trunk-only and 58.0% for crown-only), whereas the hybrid strategy achieved a recall of 80.7%. The final method achieved a 92.0% detection rate, 80.7% recall, and 56.2% instance-level mIoU. These results validate the corrective role of semantic guidance in enhancing traditional geometric methods and align with the trend toward weakly supervised learning in point cloud instance segmentation [5].
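The weighted crown-point assignment can be illustrated with a short sketch; the weights `w_h` and `w_v` and the anchor positions are placeholders of our own, since the paper's exact values are not restated here:

```python
import numpy as np

def assign_crown_points(crown_pts, anchors, w_h=1.0, w_v=0.3):
    """Assign each 3D crown point (N, 3) to the anchor (M, 3) that minimizes
    a weighted sum of horizontal (XY) and vertical (Z) distance, rather than
    a purely 2D distance that can distort boundaries between adjacent crowns."""
    dh = np.linalg.norm(crown_pts[:, None, :2] - anchors[None, :, :2], axis=2)
    dv = np.abs(crown_pts[:, None, 2] - anchors[None, :, 2])
    return np.argmin(w_h * dh + w_v * dv, axis=1)

anchors = np.array([[0.0, 0.0, 5.0],   # anchor 0 (e.g. a trunk-top position)
                    [4.0, 0.0, 5.0]])  # anchor 1
pts = np.array([[0.5, 0.2, 7.0],
                [3.8, -0.1, 6.0]])
print(assign_crown_points(pts, anchors))  # [0 1]
```

Down-weighting the vertical term (w_v < w_h) reflects that crowns spread much farther horizontally than trees differ in anchor height, so lateral proximity dominates the assignment.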
To quantitatively analyze the non-linear relationship between algorithm performance and scene complexity, a three-tiered scenario gradient was constructed: low-density (10–30 trees), medium-density (40–50 trees), and high-density (88 trees). In low-density scenarios (sparse trees and low crown overlap), the algorithm exhibited a slight over-segmentation tendency but maintained stable overall accuracy. In medium-density scenarios, which represent typical urban forest configurations, the algorithm achieved an optimal balance between detection accuracy and boundary quality. In high-density scenarios (severe crown interlocking), the detection rate remained high (>90%), but boundary ambiguity caused the mIoU to drop to around 56%. These findings define the application scope and performance boundaries of the TreeLA-Net + SEGR framework. For low-to-medium density stands (tree density < 0.1 trees/m², crown overlap < 40%), the framework achieves an mIoU > 65% and recall > 66%, meeting high-accuracy requirements for applications such as urban forest inventories and carbon stock estimation. For high-density stands (tree density > 0.1 trees/m², crown overlap > 40%), the tree detection rate remains above 80%, but boundary segmentation accuracy requires further improvement [27]. For applications requiring high boundary precision (e.g., precise measurement of DBH or tree height), the method should be combined with complementary techniques such as high-resolution imagery fusion or multi-source sensor data analysis; for tasks requiring only macroscopic information, such as tree count and location, the algorithm can be applied directly.
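The application-scope thresholds above can be condensed into a small decision helper; the function name and return strings are ours, for illustration only:

```python
def framework_applicability(tree_density, crown_overlap):
    """tree_density in trees/m^2, crown_overlap as a fraction (0-1).
    Coarse recommendation based on the reported performance bounds."""
    if tree_density < 0.1 and crown_overlap < 0.4:
        # Expected mIoU > 65% and recall > 66%: suitable for inventories
        # and carbon stock estimation without extra data sources.
        return "direct use"
    # Detection rate stays above 80%, but boundary accuracy degrades:
    # pair with imagery fusion for boundary-critical tasks (DBH, height).
    return "combine with complementary data"

print(framework_applicability(0.05, 0.3))  # direct use
print(framework_applicability(0.15, 0.5))  # combine with complementary data
```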
A key question for any new method is its ability to generalize beyond the training data. To assess this, we conducted an external validation by applying our pre-trained model to an independent dataset from Chongming Island. This site differed from our primary study site in species composition, stand structure, and data acquisition method (mobile SLAM vs. static TLS). The model was applied without any retraining or fine-tuning. The results from this external validation (Table 5) indicate that the framework retains a high degree of effectiveness on unseen data. The model achieved a recall of 81.5% and a precision of 79.0%, suggesting that it has learned generalizable structural features rather than site-specific patterns. While a performance decrease compared to the primary site is observable and expected due to the domain shift, the instance-level mean IoU remained high at 75.4%. This suggests that the geometric delineation quality for detected trees is robust.
These findings provide evidence that the proposed method is not strictly overfitted to a single plot. However, the performance gap between the two sites also highlights that domain shift—arising from differences in forest structure and sensor characteristics—remains a challenge. Therefore, while our framework shows potential for transferability, its application to new environments should be approached with an understanding that site-specific factors may influence performance. Further research involving a wider variety of forest types is needed to fully characterize its operational capabilities.

To contextualize our findings, it is useful to compare our TLS-based framework with recent studies, many of which utilize UAV-based LiDAR. While direct methodological comparisons are challenging due to differing sensor perspectives and forest conditions, our performance metrics appear to be competitive. For instance, our framework’s F1-scores are comparable to high-performing results reported from UAV data processed with fully supervised deep learning models [29]. A potential advantage of our approach, however, may lie in its scalability, as it does not require the instance-level annotations typically needed for such supervised models. Similarly, while other methods report high accuracy through the fusion of LiDAR with hyperspectral or RGB data [25,30], our study suggests that a domain-adapted, single-source (TLS) hybrid model can also achieve robust performance. This indicates that enhancing the initial semantic segmentation quality is a critical step that may empower simpler, unsupervised geometric algorithms, offering a practical alternative to more complex data fusion or supervised clustering workflows [31].
A key aspect of our work is the quantitative analysis of performance across varying stand densities. The observed decrease in delineation accuracy (mIoU) in our high-density scenario is consistent with findings from other studies that identify crown morphology and overlap as significant limiting factors in segmentation [29]. Despite this, our framework maintained a high tree detection rate (>80%) even in the densest plots. This apparent trade-off between robust detection and declining delineation accuracy under severe canopy interlocking seems to be a characteristic of the proposed hybrid approach. By investigating performance across a density gradient, our study aims to provide nuanced, practical information regarding the operational application of automated segmentation tools, which moves beyond a single accuracy metric to explore the boundaries of the method’s effectiveness.

5. Limitations

Despite the potential demonstrated by the TreeLA-Net + SEGR framework for automated individual tree segmentation, this study has several limitations. First, while our external validation provided evidence of transferability, the framework’s performance was primarily evaluated at a single, complex study site. We acknowledge that the ultimate validation for any forest inventory method must come from direct comparison against high-accuracy, independent field measurements. Second, the performance of the instance segmentation is inherently dependent on the quality of the initial semantic segmentation, meaning errors in the semantic stage can propagate and affect the final delineation accuracy. Finally, this study focused on geometric accuracy, while other important aspects of forest inventory, such as species identification and health assessment, were not addressed.

6. Conclusions

This study presented a two-stage hybrid framework, TreeLA-Net + SEGR, to address the challenge of automated individual tree segmentation in complex urban forests. Our work introduced a forest-adapted semantic network and a hybrid-anchor instance algorithm, which together yielded a high tree detection rate without requiring instance-level annotations. A key aspect of this research was a dual validation approach: systematic internal validation across density gradients was conducted to establish the framework’s performance boundaries, while an external validation on an independent site provided evidence of its generalization capability. The proposed framework is offered as a potential tool for consideration in the field of precision forestry. It is hoped that these findings on performance and transferability can contribute to the ongoing efforts to develop robust and scalable solutions for automated urban forest inventory.

Author Contributions

Conceptualization, K.S., M.L. and Y.Z.; methodology, Y.Z. and P.L.; software, Y.Z.; validation, Y.Z., M.L. and P.L.; formal analysis, Y.Z.; investigation, Y.Z., P.L. and G.Z.; resources, M.L. and Q.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, K.S. and M.L.; visualization, Y.Z.; supervision, K.S. and M.L.; project administration, K.S.; funding acquisition, K.S. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Science and Technology Commission of Shanghai Municipality under the project “Key Technology Research and Demonstration for Intelligent Monitoring of Forest Resources” (Grant numbers: 23dz1204500 and 23dz1204501).

Data Availability Statement

The point cloud data presented in this study are not publicly available due to proprietary restrictions associated with the data acquisition agreement. However, the data may be made available from the corresponding author upon reasonable request.

Acknowledgments

The authors gratefully acknowledge the Shanghai Haiwan National Forest Park for providing site access. We also thank the personnel who provided technical support throughout the experimental process.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Latifi, H.; Fassnacht, F.E.; Müller, J.; Tharani, A.; Dech, S.; Heurich, M. Forest inventories by LiDAR data: A comparison of single tree segmentation and metric-based methods for inventories of a heterogeneous temperate forest. Int. J. Appl. Earth Obs. Geoinf. 2015, 42, 162–174. [Google Scholar] [CrossRef]
  2. Xiang, B.; Wielgosz, M.; Kontogianni, T.; Peters, T.; Puliti, S.; Astrup, R.; Schindler, K. Automated forest inventory: Analysis of high-density airborne LiDAR point clouds with 3D deep learning. Remote Sens. Environ. 2024, 305, 114078. [Google Scholar] [CrossRef]
  3. Zhang, C.; Zhou, Y.; Qiu, F. Individual tree segmentation from LiDAR point clouds for urban forest inventory. Remote Sens. 2015, 7, 7892–7913. [Google Scholar] [CrossRef]
  4. Kulicki, M.; Cabo, C.; Trzciński, T.; Będkowski, J.; Stereńczak, K. Artificial intelligence and terrestrial point clouds for forest monitoring. Curr. For. Rep. 2024, 11, 5. [Google Scholar] [CrossRef]
  5. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
  6. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; p. 30. [Google Scholar]
  7. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 14–19 June 2020; pp. 11108–11117. [Google Scholar]
  8. Engelmann, F.; Kontogianni, T.; Hermans, A.; Leibe, B. Exploring spatial context for 3D semantic segmentation of point clouds. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 716–724. [Google Scholar]
  9. Liu, Q.; Ma, W.; Zhang, J.; Liu, Y.; Xu, D.; Wang, J. Point-cloud segmentation of individual trees in complex natural forest scenes based on a trunk-growth method. J. For. Res. 2021, 32, 2403–2414. [Google Scholar] [CrossRef]
  10. Comesaña-Cebral, L.; Martínez-Sánchez, J.; Lorenzo, H.; Arias, P. Individual tree segmentation method based on mobile backpack LiDAR point clouds. Sensors 2021, 21, 6007. [Google Scholar] [CrossRef]
  11. Ayrey, E.; Fraver, S.; Kershaw, J.A., Jr.; Kenefic, L.S.; Hayes, D.; Weiskittel, A.R.; Roth, B.E. Layer stacking: A novel algorithm for individual forest tree segmentation from LiDAR point clouds. Can. J. Remote Sens. 2017, 43, 16–27. [Google Scholar] [CrossRef]
  12. Peng, Y.; Feng, H.; Chen, T.; Hu, B. Point cloud instance segmentation with inaccurate bounding-box annotations. Sensors 2023, 23, 2343. [Google Scholar] [CrossRef]
  13. Shang, K.; Zheng, S.; Zhang, Q. Characteristics of plant community structure in a 1 hm2 lot of the Haiwan National Forest Park of Shanghai and the significance of its dynamics monitoring. J. Ecol. Rural Environ. 2013, 29, 316–321. [Google Scholar]
  14. Cai, S.; Yu, S.; Hui, Z.; Tang, Z. ICSF: An improved cloth simulation filtering algorithm for airborne LiDAR data based on morphological operations. Forests 2023, 14, 1520. [Google Scholar] [CrossRef]
  15. Thomas, H.; Qi, C.R.; Deschaud, J.-E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420. [Google Scholar]
  16. Weinmann, M.; Jutzi, B.; Hinz, S.; Mallet, C. Semantic point cloud interpretation based on optimal neighborhoods, relevant features and efficient classifiers. ISPRS J. Photogramm. Remote Sens. 2015, 105, 286–304. [Google Scholar] [CrossRef]
  17. Weinmann, M.; Jutzi, B.; Mallet, C. Geometric features and their relevance for 3D point cloud classification. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 4, 157–164. [Google Scholar] [CrossRef]
  18. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  19. Brodu, N.; Lague, D. 3D terrestrial lidar data classification of complex natural scenes using a multi-scale dimensionality criterion: Applications in geomorphology. ISPRS J. Photogramm. Remote Sens. 2012, 68, 121–134. [Google Scholar] [CrossRef]
  20. Ma, X.; Qin, C.; You, H.; Ran, H.; Fu, Y. Rethinking network design and local geometry in point cloud: A simple residual MLP framework. arXiv 2022, arXiv:2202.07123. [Google Scholar] [CrossRef]
  21. Wang, Y.; Lehtomäki, M.; Liang, X.; Pyörälä, J.; Kukko, A.; Jaakkola, A.; Liu, J.; Feng, Z.; Chen, R.; Hyyppä, J. Is field-measured tree height as reliable as believed–A comparison study of tree height estimates from field measurement, airborne laser scanning and terrestrial laser scanning in a boreal forest. ISPRS J. Photogramm. Remote Sens. 2019, 147, 132–145. [Google Scholar] [CrossRef]
  22. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 16259–16268. [Google Scholar]
  23. Yeung, M.; Sala, E.; Schönlieb, C.-B.; Rundo, L. Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation. Comput. Med. Imaging Graph. 2022, 95, 102026. [Google Scholar] [CrossRef]
  24. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Routledge: Abingdon-on-Thames, UK, 2018. [Google Scholar]
  25. Qin, H.; Zhou, W.; Yao, Y.; Wang, W. Individual tree segmentation and tree species classification in subtropical broadleaf forests using UAV-based LiDAR, hyperspectral, and ultrahigh-resolution RGB data. Remote Sens. Environ. 2022, 280, 113143. [Google Scholar] [CrossRef]
  26. Li, W.; Guo, Q.; Jakubowski, M.K.; Kelly, M. A new method for segmenting individual trees from the lidar point cloud. Photogramm. Eng. Remote Sens. 2012, 78, 75–84. [Google Scholar] [CrossRef]
  27. Liang, X.; Hyyppä, J.; Kaartinen, H.; Lehtomäki, M.; Pyörälä, J.; Pfeifer, N.; Holopainen, M.; Brolly, G.; Francesco, P.; Hackenberg, J.; et al. International benchmarking of terrestrial laser scanning approaches for forest inventories. ISPRS J. Photogramm. Remote Sens. 2018, 144, 137–179. [Google Scholar] [CrossRef]
  28. Winiwarter, L.; Mandlburger, G.; Schmohl, S.; Pfeifer, N. Classification of ALS point clouds using end-to-end deep learning. PFG–J. Photogramm. Remote Sens. Geoinf. Sci. 2019, 87, 75–90. [Google Scholar] [CrossRef]
  29. Liu, Y.; You, H.; Tang, X.; You, Q.; Huang, Y.; Chen, J. Study on individual tree segmentation of different tree species using different segmentation algorithms based on 3D UAV data. Forests 2023, 14, 1327. [Google Scholar] [CrossRef]
  30. Chen, X.; Wang, R.; Shi, W.; Li, X.; Zhu, X.; Wang, X. An individual tree segmentation method that combines LiDAR data and spectral imagery. Forests 2023, 14, 1009. [Google Scholar] [CrossRef]
  31. Wang, Y.; Yang, X.; Zhang, L.; Fan, X.; Ye, Q.; Fu, L. Individual tree segmentation and tree-counting using supervised clustering. Comput. Electron. Agric. 2023, 205, 107629. [Google Scholar] [CrossRef]
Figure 1. The study plot and the validation plot in Shanghai.
Figure 2. The raw point cloud scene of the study plot.
Figure 3. The overall framework of TreeLA-Net.
Figure 4. Feature distribution of different height layers after dimensionality reduction.
Figure 5. Feature affinity matrix of the HSSA module. Note: Matrix elements represent feature similarity between different height layers, calculated using a Gaussian kernel based on Euclidean distance.
Figure 6. Feature response strength of the HSSA module across different semantic classes.
Figure 7. Ground truth and model predictions in a front view.
Figure 8. Confusion matrices for semantic segmentation results from different models.
Figure 9. Visual comparison of individual tree segmentation results from different strategies.
Figure 10. Comparison of segmentation performance metrics across scenarios with different tree densities.
Figure 11. Individual tree segmentation results in scenes of varying densities: (a) low-density, (b) medium-density, (c) high-density.
Figure 12. Individual tree segmentation results from the Independent Validation Site (Chongming Island).
Table 1. Hyperparameter Settings.

Parameter       Value   Description
Batch-Size      32      Number of samples per batch
Max-epochs      100     Maximum number of training epochs
Learning-Rate   0.002   Initial learning rate
Decay-Rate      0.5     Factor for learning rate decay
Table 2. Semantic Segmentation Accuracy Comparison of Three Algorithms.

Model        OA/%   mIoU/%   Kappa
PointNet     84.3   77.4     0.79
RandLA-Net   87.3   80.1     0.83
TreeLA-Net   89.8   81.9     0.86
Table 4. Ablation Study Comparison for Segmentation Strategies.

Method                       Detected/Total   Recall/%   Precision/%   mIoU/%   Main Issue
Semantic + Trunk Anchors     72/88            59.3       70.8          35.7     Severe under-segmentation due to missed detection in occluded areas
Semantic + Crown Density     55/88            58.0       92.7          55.5     Prone to over-segmentation by splitting large crowns
Semantic + Hybrid Strategy   81/88            80.7       86.6          56.2     Balanced performance, but boundary accuracy remains limited
Table 5. Performance comparison between the primary test site (Haiwan Park) and the independent validation site (Chongming Island).

Metric                    Primary Test Site (Haiwan Park)   Independent Validation Site (Chongming Island)
Recall (Detection Rate)   92.0% (81/88 trees)               81.5% (414/508 trees)
Precision                 86.7%                             79.0%
F1-Score                  89.2%                             80.2%
Mean IoU                  56.2%                             75.4%
Table 3. Per-class IoU Comparison of the Three Models.

Class    PointNet/%   RandLA-Net/%   TreeLA-Net/%   Improvement/%
Crown    89.2         91.4           92.3           +0.9
Trunk    54.3         57.8           61.4           +3.6
Shrub    78.2         80.7           82.6           +1.9
Ground   88.1         90.3           91.4           +1.1

Note: Improvements are calculated relative to RandLA-Net.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
