Article

Individual Tree Species Classification Using Pseudo Tree Crown (PTC) on Coniferous Forests

Kongwen (Frank) Zhang, Tianning Zhang and Jane Liu
1 School of Computing, University of the Fraser Valley, Abbotsford, BC V2S 7M7, Canada
2 Department of Geography and Planning, University of Toronto, Toronto, ON M5S 3G3, Canada
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(17), 3102; https://doi.org/10.3390/rs17173102
Submission received: 28 July 2025 / Revised: 29 August 2025 / Accepted: 2 September 2025 / Published: 5 September 2025


Highlights

What are the main findings?
  • Expanded the study of the pseudo tree crown (PTC), a pseudo-3D (2.5D) data representation derived from nadir UAV RGB imagery for individual tree classification and vertical structure estimation.
  • Demonstrated that PTC consistently improves individual conifer species classification accuracy across multiple ML/DL models compared with standard RGB images.
What is the implication of the main finding?
  • PTC provides a cost-effective and scalable alternative to LiDAR, enabling the accurate classification of tree species and the extraction of pseudo-3D crown structure using only single nadir images.
  • This approach strengthens forest monitoring, biodiversity assessment, and climate resilience planning, supporting sustainable forestry and future integration with full 3D crown models.

Abstract

Coniferous forests in Canada play a vital role in carbon sequestration, wildlife conservation, climate change mitigation, and long-term sustainability. Traditional methods for classifying and segmenting coniferous trees have primarily relied on the direct use of spectral or LiDAR-based data. In 2024, we introduced a novel data representation method, pseudo tree crown (PTC), which provides a pseudo-3D pixel-value view that enhances the informational richness of images and significantly improves classification performance. While our original implementation was successfully tested on urban and deciduous trees, this study extends the application of PTC to Canadian conifer species, including jack pine, Douglas fir, spruce, and aspen. We address key challenges such as snow-covered backgrounds and evaluate the impact of training dataset size on classification results. Classification was performed using Random Forest, PyTorch (ResNet50), and YOLO versions v10, v11, and v12. The results demonstrate that PTC can substantially improve individual tree classification accuracy by up to 13%, reaching the high 90% range.

1. Introduction

Forests play a vital role in sustaining ecological balance, supporting biodiversity, and mitigating the impacts of climate change [1,2]. In particular, North America’s boreal forests have the potential to act as climate refugia under global warming, thereby contributing to the preservation of biodiversity and the resilience of ecosystems [3].
Accurate and detailed tree classification, particularly at the individual tree level, is essential for biodiversity assessment, forest health monitoring, and sustainable forest management [4,5]. Information on tree species distributions further provides valuable insights into drought susceptibility and growth dynamics, enabling proactive forest conservation strategies [6].
Remote sensing (RS) has become a cornerstone of forest monitoring due to its wide spatial coverage, cost-effectiveness, and long-term sustainability [7]. A variety of RS platforms, including satellite-borne, airborne, unmanned aerial vehicles (UAVs), and ground-based sensors, supply diverse datasets ranging from multispectral and hyperspectral imagery to high-resolution and thermal data. Traditionally, RS-based forest analysis has relied on vegetation indices and empirical models across canopy and leaf scales. More recently, advances in machine learning (ML) and deep learning (DL) have enabled more intelligent and automated image classification. However, classification accuracy remains heavily dependent on training data quality and is often affected by noise and background variability [8,9].
One of the major limitations of most remote sensing (RS) techniques is their reliance on nadir (top-down) imagery, which lacks the vertical structural information essential for accurate forest characterization [10]. Our long-term research has sought to address this limitation by developing methods to compensate for the missing vertical dimension in nadir-view imagery. Over a decade of research and development [11,12] has led to the introduction of the pseudo tree crown (PTC) data representation, a pseudo-3D (2.5D) model that strongly correlates with the true tree crown (TTC) structure. Unlike traditional spatially overlapping photogrammetry approaches or data-driven LiDAR methods, PTC provides an independent pathway for vertical structure characterization.
The PTC approach requires only a single nadir image. It avoids the spatial variability and interclass similarity challenges inherent in photogrammetry and eliminates the need for direct vertical measurements, such as light detection and ranging (LiDAR), which are costly and limited in availability. In dense forest canopies, LiDAR returns are often attenuated or deflected, further constraining its effectiveness.
This research has progressed in three phases. First, PTC is derived from nadir imagery and applied to improve tree species classification, which is the focus of this study and part of a broader research program. Previous work demonstrated the successful application of PTC to deciduous tree classification [13,14]. In this paper, we extend the method to coniferous species, building on preliminary findings reported in [15]. The second phase involves constructing species-specific average TTC models using LiDAR data, which serve only as training references and are not required in future applications. The final phase establishes a systematic correlation between PTC and TTC, with initial results presented in Balkenhol and Zhang (2017) [12]. Further studies along this roadmap are in preparation.
In this study, we evaluate PTC using five ML/DL image classifiers: (1) random forest (RF), serving as a traditional benchmark; (2) ResNet50 implemented in the PyTorch framework, representing a widely adopted deep classifier; and (3–5) three recent variants of You Only Look Once (YOLO)—v10, v11, and v12. Our primary objective is not simply to compare classifiers, but to assess whether the adoption of PTC enhances classification outcomes across these models. Experimental results demonstrate consistent improvements of 2–12% when PTC is incorporated.
The remainder of this paper is structured as follows: Section 2 provides a review of recent related work, focusing primarily on studies from late 2024 and 2025, as earlier developments were extensively covered in Miao et al. (2024) [14]. Section 3 details the dataset and preprocessing methods. Section 4 presents comparative results and analyzes the effects of snow background and dataset size on classification accuracy. Section 5 discusses the broader implications of our findings, and Section 6 concludes the paper with key insights and directions for future research.

2. Related Work

2.1. Individual Tree Species Classification and Vertical Biometric Information

Image classification is a fundamental task in remote sensing and computer vision, with applications spanning a wide range of disciplines, including vegetation monitoring, urban studies, geological exploration [16], and object detection [17]. Within forestry and vegetation studies, traditional methods have primarily relied on spectral vegetation indices, such as the normalized difference vegetation index (NDVI) [18,19], and empirical radiative transfer models, including scattering by arbitrarily inclined leaves (SAIL) [20] at the canopy level and PROSPECT at the leaf level [21]. These approaches typically employ natural color (RGB), hyperspectral, or high-resolution imagery to capture vegetation properties. In recent years, advances in machine learning (ML) and deep learning (DL) have substantially improved classification accuracy by enabling the extraction of complex spatial and spectral features [22]. Building on this trend, the present study introduces PTC, a novel data modification technique designed to further enhance classification performance. Conventional image classification methods generally produce two-dimensional (2D) boundary delineations and categorical labels; however, they often lack the ability to capture detailed three-dimensional (3D) biometric information.
For 3D structural attributes such as tree height and diameter at breast height (DBH), LiDAR remains the most reliable data source [23]. LiDAR systems, utilizing single or multiple returns, provide direct and accurate measurements of vertical vegetation structure. While photogrammetric techniques have been proposed as an alternative, their effectiveness in forested environments is often limited due to the inherent complexity and heterogeneity of tree canopies [24]. Furthermore, the acquisition of LiDAR data remains costly and subject to limited spatial and temporal availability. Since its introduction into forestry applications in the late 2000s, LiDAR has not achieved the long-term temporal coverage or historical depth of optical imagery archives [25,26].
The proposed PTC method seeks to bridge the gap between 2D image-based classification and 3D structural analysis. By introducing a novel data representation, PTC not only improves classification accuracy but also establishes correlations between PTC and TTC, offering a pathway toward integrating spectral-spatial classification with structural vegetation metrics. This dual capability highlights the potential of PTC to advance remote sensing applications by linking traditional image-based approaches with emerging 3D forest characterization techniques.

2.2. The Classic Machine Learning (ML) Classification

Classic machine learning classifiers, such as support vector machine (SVM) [27] and random forest (RF) [28], remain widely used in recent research due to their relatively low computational cost and straightforward architecture [29]. In contemporary studies, SVM and RF are often employed as benchmark models or integrated into more complex frameworks to evaluate new data representations. For instance, Hu et al. (2025) [30] applied RF to multispectral airborne LiDAR data for urban tree species classification, demonstrating the importance of structured 3D information. Similarly, Ke et al. (2025) [31] adopted a modified RF approach, MRMR-RF-CV, for wetland classification using multi-source remote sensing imagery, highlighting the value of combining diverse features. El Kharki et al. (2024) [32] utilized SVM with Sentinel-2 data for argan tree classification, while Thapa et al. (2024) [33] compared SVM, RF, and neural networks (NN) for urban tree species identification, emphasizing feature selection and data representation in improving accuracy. Wang and Jiang (2024) [34] combined SVM with a convolutional neural network (CNN) for hyperspectral tree classification, illustrating the benefit of richer feature representations. Zhang et al. (2024) [35] explored differences in classification performance between Sentinel-2 and Landsat-8 data using RF, SVM, and gradient tree boosting (GTB), further supporting the influence of input data structure on model performance. Aziz et al. (2024) [36] compared ANN and RF classifiers, showing that the choice of algorithm interacts with feature representation, while Manoharan et al. (2023) [37] proposed a hybrid fuzzy SVM model for coconut tree classification, again demonstrating the importance of transforming raw data into meaningful representations.
These studies collectively highlight a key motivation for our work: the design of novel data representations, such as PTC, which can be paired with different classifiers to significantly enhance classification performance.

2.3. The Deep Learning Classification Progress

CNN was originally introduced by LeCun et al. (1998) [38], while graph neural networks (GNNs) were first proposed by Scarselli et al. (2009) [39]. Krizhevsky et al. (2012) [40] introduced AlexNet, and Simonyan and Zisserman (2014) [41] proposed the Visual Geometry Group (VGG) network, which increased depth by adding layers to improve accuracy. Ronneberger et al. (2015) [42] proposed U-Net, which is effective for small sample sizes and pixel-level segmentation. He et al. (2016) [43] introduced the revolutionary ResNet based on residual blocks, marking an important milestone for CNNs. He et al. (2017) [44] proposed Mask R-CNN for instance segmentation. Woo et al. (2018) [45] introduced the convolutional block attention module (CBAM), which enhances feature representation through attention mechanisms. Veličković et al. (2018) [46] proposed graph attention networks (GATs), which introduce attention mechanisms to graph classification. Tan and Le (2019) [47] introduced EfficientNet, which applied a compound scaling method to improve model efficiency. Dosovitskiy et al. (2020) [48] introduced the Vision Transformer (ViT), which explores the potential of Transformers in image classification. Ren et al. (2021) [49] proposed Rotated Mask R-CNN for oriented object detection. Ding et al. (2022) [50] proposed RepLKNet, which uses large kernels to improve image classification performance.
The application of CNN models to tree classification has advanced rapidly, with hundreds of published studies covering a wide range of methods and results; a comprehensive summary is available in Zhong et al. (2024) [9]. For example, He et al. (2025) [51] used a lightweight fusion manifold for multimodal remote sensing data classification. Wang et al. (2025) [52] proposed TGF-Net, a fusion of Transformer and Gist CNN networks for multi-modal image classification. Chi et al. (2024) [53] introduced TCSNet, designed for individual tree crown classification using UAV imagery. Dersch et al. (2024) [54] applied Mask R-CNN for UAV RGB image classification. Qi et al. (2024) [55] used a two-branched network for hyperspectral image cross-domain classification.
Despite this progress, most deep learning models rely on standard image representations (RGB, hyperspectral, or LiDAR-derived features) and do not explicitly encode the vertical structural information of individual trees. The PTC representation developed in this study addresses this gap by transforming individual tree imagery into a 2.5D pseudo-3D structure, capturing vertical crown morphology in addition to spectral features. By integrating PTC with CNN-based classifiers, we provide a data representation that enhances feature richness and classification accuracy, particularly for challenging cases such as mixed-species canopies or snow-covered trees. This approach builds upon and complements existing DL methods while providing a novel pathway to incorporate structural tree information into automated classification pipelines.

2.4. The You Only Look Once (YOLO) Development

The You Only Look Once (YOLO) family has emerged as one of the most influential object detection frameworks, maintaining widespread adoption and active development since its introduction by Redmon et al. (2015) [56]. Continuous improvements, particularly since 2020, reflect the community’s commitment to enhancing detection accuracy, efficiency, and adaptability [57,58]. It is important to note that YOLO version numbers indicate release order rather than a strict measure of performance or feature inheritance.
Among these, YOLOv5 [57,58], developed by Glenn Jocher and the Ultralytics team in 2020, was the first stable and widely adopted version. Built on a cross-stage partial (CSP) backbone and Path Aggregation Network (PANet) for feature fusion, YOLOv5 was one of the classifiers evaluated in our previous work [14], demonstrating competitive performance on individual tree detection tasks. YOLOv8 (2023) introduced a refined CSP backbone and an anchor-free detection head, improving adaptability to objects with varying sizes and aspect ratios.
More recent developments include YOLOv10 (2024) [59,60], which incorporates a dual-head design, “one-to-many” for training and “one-to-one” for inference, eliminating non-maximum suppression and reducing inference latency. YOLOv11 (late 2024) [61] employs a Transformer-based backbone with a dynamic detection head, while YOLOv12 (early 2025) [62] integrates an attention-centric design and FlashAttention for efficient processing in cluttered scenes. Although YOLOv13 has been proposed, its algorithm and code remain unstable, and it was not included in this study.
In our current work, we utilize these YOLO versions to evaluate the performance of the PTC representation. While YOLO provides strong object detection capabilities without preprocessing, the integration of PTC enhances feature representation by encoding vertical crown structure, allowing the models to better discriminate individual trees under challenging conditions, such as overlapping canopies or snow-covered branches. By comparing multiple YOLO versions, we assess the robustness of PTC data representations across state-of-the-art detection frameworks and demonstrate their potential to improve tree species classification beyond standard RGB or multispectral imagery.
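For readers who wish to replicate this kind of comparison, a minimal sketch using the Ultralytics training API is shown below; the checkpoint names follow Ultralytics conventions, while the dataset configuration file ("ptc_trees.yaml") and hyperparameters are illustrative assumptions rather than our exact settings.

```python
# Hypothetical sketch: comparing YOLO versions on PTC renderings with the
# Ultralytics package. "ptc_trees.yaml" is an assumed dataset config listing
# the four species classes; weight file names follow Ultralytics conventions.
from ultralytics import YOLO

for weights in ("yolov10n.pt", "yolo11n.pt", "yolo12n.pt"):
    model = YOLO(weights)                                # load pretrained checkpoint
    model.train(data="ptc_trees.yaml", epochs=100, imgsz=640)
    metrics = model.val()                                # evaluate on validation split
    print(weights, metrics.box.map50)                    # mAP@0.5 for each version
```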

3. Data and Preprocessing

The study area is located south of Kamloops, British Columbia (50°34′19.345″N, 120°27′1.209″W), and has four dominant tree species: fir, pine, spruce, and aspen, representing a typical North American boreal forest mix according to Natural Resources Canada [63], as shown in Figure 1. Aerial imagery was captured on 17 February 2023, and uploaded by Arura UAV [64]. The dataset, licensed under Creative Commons (CC) 4.0, was designated as a public project on 31 March 2025.
The survey was conducted using a DJI Zenmuse P1 camera (Shenzhen, China) mounted on a DJI M300 RTK drone (Shenzhen, China) at an average flight altitude of approximately 128 m. The camera has a 35 mm fixed focal length with a maximum aperture of f/2.8. The shutter speed was set to 1/500 s. All images were georeferenced using the WGS84 coordinate system and saved in JPEG format.
A sample image, along with examples of individual trees from four species: fir, pine, spruce, and aspen, is presented in Figure 2. All available trees within the image were manually identified, segmented, and annotated by the authors, as illustrated in Figure 3. These annotations serve as the reference ground truth in this study.

4. Methodology

In brief, the methodology of this study comprises two main steps. First, we manually segment the original RGB images into individual tree images and generate the PTC representation as a novel image input, replacing the conventional nadir-view imagery. Second, these PTC images are fed to various ML/DL classifiers, and the results are compared against those obtained from the same images without PTC generation, using the same classifiers. This work extends our previous research, with additional details on the PTC methodology available in Miao et al. [14]. The overall workflow of the study is depicted in Figure 4, while the detailed PTC generation process is illustrated in Figure 5.
The PTC data representation, which was generated from the original RGB nadir image, is the primary methodological contribution of this study, having been developed and refined over more than a decade, with its historical context presented in the introduction. As shown in Figure 5, individual trees were manually segmented based on ground truth annotations and then converted into single-band monochrome images. The gray-scale values of these images were then mapped to vertical heights, effectively transforming each pixel into a vertical bar, thereby producing a three-dimensional (3D) representation of the tree crown. Because this representation does not directly map out the actual crown structure, it is referred to as the “pseudo” crown. The PTC representation is a full 3D model, which can be rotated and interpreted differently depending on the viewing angle, as illustrated in Figure 6. For comparison with the original RGB images, a fixed viewing angle was adopted, effectively rendering the representation in 2.5D. The influence of viewing angle, along with the overall evaluation of PTC effectiveness and performance, is discussed in Section 6.
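To make the mapping concrete, the following minimal sketch renders a PTC from a single clipped crown image by treating grayscale values as heights and fixing the viewing angle; the function name, colormap, and angles are illustrative choices, not our production implementation.

```python
# Minimal PTC rendering sketch: grayscale value -> pseudo height, fixed view.
# Assumes a manually clipped single-tree image; names and parameters are
# illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def render_ptc(image_path: str, elev: float = 75, azim: float = 120) -> None:
    gray = np.asarray(Image.open(image_path).convert("L"), dtype=float)  # monochrome step
    y, x = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]                    # pixel grid
    fig = plt.figure(figsize=(4, 4))
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(x, y, gray, cmap="viridis", linewidth=0)  # pixel value as height
    ax.view_init(elev=elev, azim=azim)        # fixed angle -> effectively 2.5D view
    ax.set_axis_off()
    plt.show()
```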
In this study, we focused on conifer trees under winter conditions, which required an additional step of snow-background removal. We compared classification results with and without this step, as discussed in Section 5.2. Samples of individual trees after snow-background removal are shown in Figure 7.
In winter conditions, the snow background exhibits very high gray values, which render the trees relatively dark and effectively introduce noise rather than useful information. To address this issue, we implemented a preprocessing step to remove the snow background. This step was applied consistently to both PTC and RGB images for subsequent classification comparisons. Specifically, pixels with very high gray values (above 220) were filtered out, and the images were clipped to focus on the tree crown region, as illustrated in Figure 7. The processed images were then converted into PTC representations, with a sample shown in Figure 8. When the snow background is present, the trees appear very dark, resulting in low gray values and producing PTCs that are nearly flat and devoid of distinguishable features. This step was not required in our earlier summer study of deciduous trees.
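A minimal sketch of this filtering step is given below, assuming the 220 gray-value threshold stated above; the cropping logic and function names are illustrative.

```python
# Sketch of snow-background removal: mask pixels brighter than the threshold
# and clip to the remaining crown extent. The threshold (220) is from the
# text; everything else is an assumption for illustration.
import numpy as np
from PIL import Image

SNOW_THRESHOLD = 220  # gray values above this are treated as snow background

def remove_snow_background(img: Image.Image) -> Image.Image:
    gray = np.asarray(img.convert("L"))
    rgb = np.asarray(img.convert("RGB")).copy()
    snow = gray > SNOW_THRESHOLD
    rgb[snow] = 0                                   # suppress snow pixels
    ys, xs = np.nonzero(~snow)                      # remaining crown pixels
    if ys.size == 0:                                # nothing but snow: return as-is
        return img
    return Image.fromarray(rgb[ys.min():ys.max() + 1, xs.min():xs.max() + 1])
```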
We evaluated the following ML/DL image classifiers: (1) random forest (RF), a traditional ML baseline illustrated in Figure 9; RF is a widely used ensemble method known for its robustness and interpretability, especially with limited training data. (2) ResNet50, implemented in PyTorch v2.8.0 (stable), which previously yielded the best performance in our deciduous tree study (Figure 10); ResNet50 benefits from deep residual learning, making it effective for capturing complex features in tree crowns, and its detailed parameters are listed in Table 1. (3–5) YOLOv10, YOLOv11, and YOLOv12, the latter being the most recent version implemented in this study; these YOLO models represent state-of-the-art object detection architectures, with increasing efficiency across versions. The general workflow for the YOLO-based models is shown in Figure 11.
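For the ResNet50 classifier, the head dimensions in Table 1 (fully connected layers of 2048 → 64 and 64 → 4 classes) suggest a configuration along the following lines; the intermediate activation and optimizer settings are assumptions for illustration, not our exact training recipe.

```python
# Sketch of a ResNet50 classifier with the head shape implied by Table 1
# (FC 2048 -> 64, then 64 -> 4 species). The activation, pretrained weights,
# and optimizer are illustrative assumptions.
import torch.nn as nn
from torch.optim import Adam
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Sequential(
    nn.Linear(2048, 64),   # FC Layer1 in Table 1
    nn.ReLU(),             # assumed activation (not listed in Table 1)
    nn.Linear(64, 4),      # FC Layer2: fir, pine, spruce, trembling aspen
)
optimizer = Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```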
While many other classifiers could be explored, we limited our scope to these five due to time constraints and paper length. We encourage researchers to experiment with PTC using their own datasets and preferred models.

5. Results

5.1. The Classification Results of Different Models

We used standard evaluation metrics, including the confusion matrix, accuracy, precision, recall, F1 score, and Intersection over Union (IoU). The corresponding formulas are shown in Equations (1)–(5), where TP denotes true positives, TN denotes true negatives, FP denotes false positives, and FN denotes false negatives.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{1}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$\mathrm{F1\ Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{5}$$
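For reference, these five metrics can be computed from raw per-class counts with a direct transcription of Equations (1)–(5):

```python
# Direct transcription of Equations (1)-(5) from per-class confusion counts.
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),           # Eq. (1)
        "precision": precision,                                # Eq. (2)
        "recall": recall,                                      # Eq. (3)
        "f1": 2 * precision * recall / (precision + recall),   # Eq. (4)
        "iou": tp / (tp + fp + fn),                            # Eq. (5)
    }
```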
The classification accuracy results are summarized in Table 2 to provide a comprehensive overview. Table 3 presents the training times of the different models using the complete dataset of 852 trees. The results demonstrate that models with PTC inputs consistently outperform those with the original RGB images in accuracy and, in most cases, in training time. Detailed performance metrics, including precision, recall, F1 score, and IoU, are reported in Table 4 for RF, PyTorch (ResNet50), and YOLO versions v10, v11, and v12. The corresponding confusion matrices are illustrated in Figure 12 for RF and YOLOv10, with the remaining models in Figure A1.

5.2. Snow-Background Removal

Since YOLO does not require preprocessing, we evaluated its performance both with and without snow-background removal. The results, summarized in Table 5, indicate a substantial impact of background removal on classification accuracy. Potential reasons for this effect are examined in Section 6.

5.3. The Classification Results of Different Dataset Sizes

We investigated the effect of dataset size on model performance by conducting experiments with varying training set sizes. The corresponding results for RF, PyTorch (ResNet50), and YOLO versions 10, 11, and 12 are presented in Table 6. Overall, the results indicate a clear trend—increasing the dataset size generally improves classification accuracy across all models, with the rate of improvement differing between algorithms.

6. Discussion

Overall, the PTC consistently outperformed the original RGB images across nearly all models evaluated, corroborating the findings of our previous study [14]. This result underscores the potential of the PTC image representation to be effectively integrated with both traditional machine learning and deep learning classifiers. The underlying rationale is simple—the pseudo-3D structure introduces an additional information channel for classification. As illustrated in Figure 13, projecting grayscale values as “height” generates an artificial 3D vertical profile, which highlights the structural shape of individual crowns. From a visual interpretation perspective, it is generally easier to extract features from a discernible shape than from pixel-level correlations alone; in essence, identifying a tree from a side view is often more straightforward than from a top-down nadir perspective.
From an information-theoretic standpoint, the PTC representation reduces conditional entropy by decreasing uncertainty and randomness in the feature space. Importantly, this pseudo-3D representation is not arbitrary. Variations in grayscale values correspond to meaningful structural attributes, including branch positions and relative leaf and branch heights. Across the tested models, the use of PTC resulted in classification accuracy improvements ranging from approximately 2% to 11%, demonstrating its practical value for enhancing tree crown delineation and species classification.
More importantly, this represents the intermediate step outlined in the introduction as part of our long-term research roadmap. The PTC serves as a bridge toward establishing true tree crown correlations, enabling the extraction or compensation of vertical structural information directly from the pseudo-3D representation.
That said, we observed that certain species, such as aspen (Figure 7), were particularly sensitive to snow-covered backgrounds in winter when foliage is absent. This background interference led to a modest decrease in classification robustness, especially for species exhibiting finer structural features. Nevertheless, even under these challenging conditions, the PTC representation remained relatively resilient. These results indicate that while background interference can directly influence the PTC-derived features and, thus, affect classification outcomes, the pseudo-3D structural information still provides a degree of robustness that mitigates the impact of such environmental variability.
When comparing model performance, several notable patterns were observed. Surprisingly, RF delivered the best overall results, highlighting its robustness and suitability for the PTC representation despite its relative simplicity. Among the deep learning models, YOLOv10 outperformed both YOLOv11 and YOLOv12. This outcome is not entirely unexpected, as YOLO version numbers primarily indicate release sequence rather than guaranteed performance improvements. In practice, YOLOv10 represents a significant enhancement and more stable iteration of YOLOv5, which we successfully employed in our previous deciduous tree study [14]. In contrast, YOLOv11 and YOLOv12 did not confer additional advantages in this specific context, potentially due to differences in their optimization or training procedures, which may not be fully compatible with PTC-style inputs. We did not evaluate YOLOv13, which was released during the final stages of this study.
We also observed clear differences in classification performance between deciduous and coniferous tree species. Deciduous trees, particularly in summer when foliage is fully developed, were more easily distinguished from the background, facilitating higher model accuracy. In contrast, coniferous trees in winter presented greater challenges; reduced foliage and heavy snow cover caused the background to blend with tree structures, particularly for species such as trembling aspen, where contrast was minimal. In some cases, snow significantly obscured discriminative features. As shown in Table 5, snow-induced interference led to performance drops of 8–15% for RGB images, which rely primarily on color and texture. While PTC representations, which emphasize structural shape features, were also affected, the decline was comparatively smaller, ranging from 6% to 13%. These results indicate that even under challenging environmental conditions, the PTC approach provides greater resilience in tree species classification.
Another important factor influencing classification performance was dataset size. As expected, the number of training samples played a significant role. While PTC consistently outperformed the original RGB format across all dataset sizes, the magnitude of improvement varied. To investigate this effect, we tested six dataset sizes—170, 270, 370, 470, 570, and 852 samples, as summarized in Table 6.
For RF and PyTorch (ResNet50), increasing the dataset size resulted in a steady improvement in classification accuracy. Interestingly, YOLO-based models exhibited greater variability. For instance, YOLOv12 achieved only 66.8% accuracy at the smallest dataset size, highlighting its lower robustness with limited training data. In contrast, RF performance remained remarkably stable, with accuracy varying by approximately 2% across all sample sizes. ResNet50 and other YOLO versions showed moderate variation of 3–7%, indicating reasonable but less consistent performance compared to RF. Notably, the YOLO models appeared to reach a performance threshold at around 570 samples, after which accuracy plateaued, likely reflecting limitations in the quality of the training data. These findings suggest that while PTC enhances classification performance, its effectiveness is still dependent on both the size and quality of the training dataset, and that traditional machine learning methods such as RF may be preferable for smaller datasets.

7. Conclusions

In this study, we reported a new application of the PTC approach, a novel data representation that transforms traditional top-down (nadir) imagery into informative pseudo-3D (2.5D) perspectives of trees. This approach demonstrates substantial potential for improving tree species classification accuracy and estimating 3D biometric parameters, which are critical for effective forest monitoring and sustainable management practices.
We extended the PTC concept beyond deciduous trees in the summer, as explored in our previous work, to the more challenging scenario of coniferous trees in winter. This extension enabled us to investigate both structural and seasonal variations, highlighting the adaptability and robustness of PTC across different tree types and environmental conditions. Across all experiments involving multiple machine learning and deep learning models, PTC consistently outperformed the original RGB inputs, with notable improvements in classification accuracy and resilience under challenging conditions, such as snow cover and sparse foliage.
While this study focused on a relatively limited dataset, the results underscore the value of PTC as an intermediate representation bridging traditional imagery and fully 3D tree crown models. Future studies could expand this work by evaluating PTC with larger and more diverse datasets, testing additional deep learning architectures, including YOLOv13 and Transformer-based models, and exploring the integration of multi-spectral or hyperspectral imagery to enhance structural and spectral discrimination. Incorporating vegetation indices or phenological information may further improve classification accuracy and enable species-specific tailoring of PTC representations.
Another promising direction is the integration of PTC with true tree crown (TTC) models, leveraging LiDAR or other 3D data sources to capture vertical structural details more accurately. Methods such as vector projection [12] could provide a foundation for correlating PTC and TTC representations, ultimately enabling more precise 3D crown delineation and biophysical parameter estimation. Beyond classification, these approaches may facilitate ecological monitoring applications such as biomass estimation, forest health assessment, and carbon accounting.
Finally, while recent advances in computer vision aim to reconstruct 3D structures directly from 2D imagery, complex tree architecture and occluded sub-canopy regions remain significant challenges. PTC addresses these limitations by estimating and compensating for missing vertical information, providing a practical, scalable, and interpretable foundation for future 3D tree modeling. Building on this framework, TTC offers the potential for more detailed and accurate vertical profiling, supporting robust forest monitoring and advancing the development of data-driven ecological and environmental management strategies.

Author Contributions

Conceptualization, K.Z.; methodology, K.Z.; software, K.Z. and T.Z.; validation, K.Z. and T.Z.; formal analysis, K.Z. and T.Z.; investigation, K.Z. and T.Z.; resources, K.Z. and J.L.; data curation, T.Z.; writing—original draft preparation, K.Z. and T.Z.; writing—review and editing, K.Z. and J.L.; visualization, K.Z. and T.Z.; supervision, K.Z.; project administration, K.Z. and J.L.; funding acquisition, K.Z. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is partially supported by the University of the Fraser Valley Faculty Publication fund and the University of Toronto.

Data Availability Statement

The data are publicly available under a CC 4.0 license and can be accessed at: https://universe.roboflow.com/arura-uav/uav-tree-identification-new (last accessed on 31 July 2025).

Acknowledgments

The authors would like to thank Shengjie Miao, former student of Frank Zhang, for assisting in the training of Tianning Zhang and helping to resolve coding issues related to the new dataset. The authors also thank Raghav Sharma, current research assistant of Frank Zhang, for his contribution to preliminary dataset exploration and testing.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript (in order):
PTC: pseudo tree crown
ITS: individual tree species
UAV: unmanned aerial vehicle
DL: deep learning
ML: machine learning
LiDAR: light detection and ranging
RS: remote sensing
YOLO: You Only Look Once
CNN: convolutional neural network
GTB: gradient tree boosting
ANN: artificial neural network
CSP: cross-stage partial
PANet: Path Aggregation Network
NMS: non-maximum suppression

Appendix A

The additional confusion matrices for the remaining models, namely PyTorch (ResNet50), YOLOv11, and YOLOv12, are shown in Figure A1.
Figure A1. Confusion matrices of RGB and PTC classification for the remaining models: PyTorch (ResNet50) (top), YOLOv11 (middle), and YOLOv12 (bottom). 0 * means the trembling aspen was classified as unknown or background.

References

  1. Normile, D. Saving Forests to Save Biodiversity. Science 2010, 329, 5997. [Google Scholar] [CrossRef] [PubMed]
  2. Jacob, A.; Wilson, S.; Lewis, S. Forests are more than sticks of carbon. Nature 2014, 507, 306. [Google Scholar] [CrossRef] [PubMed]
  3. D’Orangeville, L.; Duchesne, L.; Houle, D.; Kneeshaw, D.; Cote, B.; Pederson, N. Northeastern North America as a potential refugium for boreal forests in a warming climate. Science 2016, 352, 6292. [Google Scholar] [CrossRef] [PubMed]
  4. Abdollahnejad, A.; Panagiotidis, D. Tree Species Classification and Health Status Assessment for a Mixed Broadleaf-Conifer Forest with UAS Multispectral Imaging. Remote Sens. 2020, 12, 3722. [Google Scholar] [CrossRef]
  5. Li, X.; Zhang, Z.; Xu, C.; Zhao, P.; Chen, J.; Wu, J.; Zhao, X.; Mu, X.; Zhao, D.; Zeng, Y. Individual tree-based forest species diversity estimation by classification and clustering methods using UAV data. Front. Ecol. Evol. 2023, 11, 1139458. [Google Scholar] [CrossRef]
  6. Lu, P.; Parker, W.; Colombo, S.; Skeates, D. Temperature-induced growing season drought threatens survival and height growth of white spruce in southern Ontario, Canada. For. Ecol. Manag. 2019, 448, 355–363. [Google Scholar] [CrossRef]
  7. Pandey, P.C.; Arellano, P. Advances in Remote Sensing for Forest Monitoring; Wiley: Hoboken, NJ, USA, 2023; ISBN 9781119788157. [Google Scholar] [CrossRef]
  8. Feng, Y.; Gong, H.; Li, H.; Qin, W.; Li, J.; Zhang, X. Tree Species Classification Based on Remote Sensing: A Review of Data, Methods, and Challenges. Remote Sens. 2022, 14, 3656. [Google Scholar] [CrossRef]
  9. Zhong, L.; Dai, Z.; Fang, P.; Cao, Y.; Wang, L. A Review: Tree Species Classification Based on Remote Sensing Data and Classic Deep Learning-Based Methods. Forests 2024, 15, 852. [Google Scholar] [CrossRef]
  10. Fournier, R.A.; Edwards, G.; Eldridge, N.R. A catalogue of potential spatial discriminators for high spatial resolution digital images of individual crowns. Can. J. Remote Sens. 1995, 3, 285–298. [Google Scholar] [CrossRef]
  11. Zhang, K.; Hu, B. Classification of individual urban tree species using very high spatial resolution aerial multi-spectral imagery using longitudinal profiles. Remote Sens. 2012, 4, 1741–1757. [Google Scholar] [CrossRef]
  12. Balkenhol, L.; Zhang, K. Identifying Individual Tree Species Structure with High-Resolution Hyperspectral Imagery Using a Linear Interpretation of the Spectral Signature. In Proceedings of the 38th Canadian Symposium on Remote Sensing, Montreal, QC, Canada, 20–22 June 2017. [Google Scholar]
  13. Miao, S.; Zhang, K.; Liu, J. An AI-based Tree Species Classification Using a 3D Tree Crown Model Derived From UAV Data. In Proceedings of the 44th Canadian Symposium on Remote Sensing, Yellowknife, NWT, Canada, 19–22 June 2023. [Google Scholar]
  14. Miao, S.; Zhang, K.; Zeng, H.; Liu, J. Improving Artificial-Intelligence-Based Individual Tree Species Classification Using Pseudo Tree Crown Derived from Unmanned Aerial Vehicle Imagery. Remote Sens. 2024, 16, 1849. [Google Scholar] [CrossRef]
  15. Zhang, T.; Zhang, K.F.; Miao, S.; Liu, J. Canadian conifer tree classification using the Pseudo Tree Crown (PTC). In Proceedings of the 2025 IEEE International Conference on Aerospace Electronics and Remote Sensing Technology (ICARES); Telkom University Surabaya: Surabaya, Indonesia, 2025; submitted. [Google Scholar]
  16. Wang, W.; Zhao, C.; Wu, Y. Spatial weighting—An effective incorporation of geological expertise into deep learning models. Geochemistry 2024, 84, 126212. [Google Scholar] [CrossRef]
  17. Mehmood, M.; Shahzad, A.; Zafar, B.; Shabbir, A.; Ali, N. Remote Sensing Image Classification: A Comprehensive Review and Applications. Math. Probl. Eng. 2022, 2022, 5880959. [Google Scholar] [CrossRef]
  18. Yan, K.; Gao, S.; Yan, G.; Ma, X.; Chen, X.; Zhu, P.; Li, J.; Gao, S. A global systematic review of the remote sensing vegetation indices. Int. J. Appl. Earth Obs. Geoinf. 2025, 139, 104560. [Google Scholar] [CrossRef]
  19. Vélez, S.; Martínez-Peña, R.; Castrillo, D. Beyond Vegetation: A Review Unveiling Additional Insights into Agriculture and Forestry through the Application of Vegetation Indices. J 2023, 6, 421–436. [Google Scholar] [CrossRef]
  20. Verhoef, W. Light scattering by leaf layers with application to canopy reflectance modeling: The SAIL model. Remote Sens. Environ. 1984, 16, 125–141. [Google Scholar] [CrossRef]
  21. Jacquemoud, S.; Baret, F. PROSPECT: A model of leaf optical properties spectra. Remote Sens. Environ. 1990, 34, 75–91. [Google Scholar] [CrossRef]
  22. Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Yin, G.; Johnson, D. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177. [Google Scholar] [CrossRef]
  23. Li, R.; Wang, L.; Zhai, Y.; Huang, Z.; Jia, J.; Wang, H.; Ding, M.; Fang, J.; Yao, Y.; Ye, Z.; et al. Modeling LiDAR-Derived 3D Structural Metric Estimates of Individual Tree Aboveground Biomass in Urban Forests: A Systematic Review of Empirical Studies. Forests 2025, 16, 390. [Google Scholar] [CrossRef]
  24. Pascual, C.; Garcia-Abril, A.; Cohen, W.; Martin-Fernandez, S. Relationship between LiDAR derived forest canopy height and Landsat images. Int. J. Remote Sens. 2010, 31, 1261–1280. [Google Scholar] [CrossRef]
  25. Coops, N.C.; Tompalski, P.; Goodbody, T.R.H.; Queinnec, M.; Luther, J.E.; Bolton, D.K.; White, J.C.; Wulder, M.A.; van Lier, O.R.; Hermosilla, T. Modelling lidar-derived estimates of forest attributes over space and time: A review of approaches and future trends. Remote Sens. Environ. 2021, 260, 112477. [Google Scholar] [CrossRef]
  26. Lin, Y.-C.; Shao, J.; Shin, S.-Y.; Saka, Z.; Joseph, M.; Manish, R.; Fei, S.; Habib, A. Comparative Analysis of Multi-Platform, Multi-Resolution, Multi-Temporal LiDAR Data for Forest Inventory. Remote Sens. 2022, 14, 649. [Google Scholar] [CrossRef]
  27. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  28. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  29. Mboga, N.; Persello, C.; Bergado, J.R.; Stein, A. Detection of Informal Settlements from VHR Images Using Convolutional Neural Networks. Remote Sens. 2017, 9, 1106. [Google Scholar] [CrossRef]
  30. Hu, P.; Chen, Y.; Imangholiloo, M.; Holopainen, M.; Wang, Y.; Hyyppa, J. Urban tree species classification based on multispectral airborne LiDAR. J. Infrared Millim. Waves 2025, 44, 198–202. [Google Scholar] [CrossRef]
  31. Ke, L.; Zhang, S.; Lu, Y.; Lei, N.; Yin, C.; Tan, Q.; Wang, Q. Classification of wetlands in the Liaohe Estuary based on MRMR-RF-CV feature preference of multi-source remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 6116–6133. [Google Scholar] [CrossRef]
  32. El Kharki, A.; Mechbouh, J.; Wahbi, M.; Alaoui, O.Y.; Boulaassal, H.; Maatouk, M.; El Kharki, O. Optimizing SVM for argan tree classification using Sentinel-2 data: A case study in the Sous-Massa Region, Morocco. Rev. Teledetec. 2024. [Google Scholar] [CrossRef]
  33. Thapa, B.; Darling, L.; Choi, D.H.; Ardohain, C.M.; Firoze, A.; Aliaga, D.G.; Hardiman, B.S.; Fei, S. Application of multi-temporal satellite imagery for urban tree species identification. Urban For. Urban Green. 2024, 98, 128409. [Google Scholar] [CrossRef]
  34. Wang, J.; Jiang, Y. A Hybrid convolution neural network for the classification of tree species using hyperspectral imagery. PLoS ONE 2024, 19, e0304469. [Google Scholar] [CrossRef]
  35. Zhang, J.; Li, H.; Wang, J.; Liang, Y.; Li, R.; Sun, X. Exploring the Differences in Tree Species Classification between Typical Forest Regions in Northern and Southern China. Forests 2024, 15, 929. [Google Scholar] [CrossRef]
  36. Aziz, G.; Minallah, N.; Saeed, A.; Frnda, J.; Khan, W. Remote sensing based forest cover classification using machine learning. Sci. Rep. 2024, 14, 69. [Google Scholar] [CrossRef]
  37. Manoharan, S.K.; Megalingam, R.K.; Kota, A.H.; Tejaswi, P.V.K.; Sankardas, K.S. Hybrid fuzzy support vector machine approach for Coconut tree classification using image measurement. Eng. Appl. Artif. Intell. 2023, 126-A, 106806. [Google Scholar] [CrossRef]
  38. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  39. Scarselli, F.; Gori, M.; Tsoi, A.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [Google Scholar] [CrossRef] [PubMed]
  40. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the NIPS’12, 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 1, pp. 1097–1105. [Google Scholar]
  41. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar] [CrossRef]
  42. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015. [Google Scholar] [CrossRef]
  43. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  44. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar] [CrossRef]
  45. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521. [Google Scholar] [CrossRef]
  46. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph Attention Networks. International Conference on Learning Representations (ICLR). arXiv 2018. [Google Scholar] [CrossRef]
  47. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; PMLR: Proceedings of Machine Learning Research. Volume 97, pp. 6105–6114. [Google Scholar]
  48. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  49. Ren, W.; Wang, J.; Sun, Q.; Feng, J.; Liu, X. R-Mask R-CNN: Rotated Mask R-CNN for Oriented Object Detection. Remote Sens. 2021, 13, 3423. [Google Scholar] [CrossRef]
  50. Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling Up Your Kernels to 31 × 31: Revisiting Large Kernel Design in CNNs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11963–11975. [Google Scholar]
  51. He, X.; Han, X.; Chen, Y.; Huang, L. A Light-weighted Fusion Vision Mamba for Multimodal Remote Sensing Data Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 21532–21548. [Google Scholar] [CrossRef]
  52. Wang, H.; Wang, H.; Wu, L. TGF-Net: Transformer and Gist CNN Fusion Network for Multi-Modal Remote Sensing Image Classification. PLoS ONE 2025, 20, e0316900. [Google Scholar] [CrossRef] [PubMed]
  53. Chi, Y.; Wang, C.; Chen, Z.; Xu, S. TCSNet: A New Individual Tree Crown Segmentation Network from Unmanned Aerial Vehicle Images. Forests 2024, 15, 1814. [Google Scholar] [CrossRef]
  54. Dersch, S.; Schöttl, A.; Krzystek, P.; Heurich, M. Semi-Supervised Multi-Class Tree Crown Delineation Using Aerial Multispectral Imagery and LiDAR Data. ISPRS J. Photogramm. Remote Sens. 2024, 216, 154–167. [Google Scholar] [CrossRef]
  55. Qi, Y.; Zhang, J.; Liu, D.; Zhang, Y. Multisource Domain Generalization Two-Branch Network for Hyperspectral Image Cross-Domain Classification. IEEE Geosci. Remote Sens. Lett. 2024, 21, 5502205. [Google Scholar] [CrossRef]
  56. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv 2015, arXiv:1506.02640. [Google Scholar] [CrossRef]
  57. Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677. [Google Scholar] [CrossRef]
  58. Terven, J.; Cordova-Esparza, D. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. arXiv 2023. [Google Scholar] [CrossRef]
  59. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
  60. Hussain, M. YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision. arXiv 2024. [Google Scholar] [CrossRef]
  61. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024. [Google Scholar] [CrossRef]
  62. Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025. [Google Scholar] [CrossRef]
  63. Natural Resources Canada. Forest Classification; Natural Resources Canada: Ottawa, ON, Canada, 2025; Available online: https://natural-resources.canada.ca/forest-forestry/sustainable-forest-management/forest-classification (accessed on 19 August 2025).
  64. Arura UAV. UAV Tree Identification—NEW Dataset; Open Source Dataset, Roboflow Universe, Roboflow, 2023. Available online: https://universe.roboflow.com/arura-uav/uav-tree-identification-new (accessed on 9 July 2025).
Figure 1. The study area is located south of Kamloops, British Columbia (50°34′19.345″N, 120°27′1.209″W), which is labeled in the red frame.
Figure 2. The sample image view (top) and sample trees of fir, pine, spruce, and aspen (bottom, left to right).
Figure 3. All trees within the images were manually identified, labeled, and segmented. We used this information as the ground truth.
Figure 4. General workflow of the study. Images were manually segmented using ground truth data, followed by PTC generation, with results compared across the same classifiers.
Figure 5. The workflow of PTC generation. An individual tree image (1) is converted into a single-band monochrome image (2). The gray values (3) are then transformed into vertical heights, producing a three-dimensional representation of the tree crown, referred to as the pseudo tree crown (PTC) (4).
Figure 6. Full 3D PTC model, which can be viewed from different azimuths and elevations. For this study, a fixed viewing angle was used, effectively rendering the representation in 2.5D. (a) azimuth −120°, elevation 75°; (b) 90°, 75°; (c) 120°, 75°.
Figure 7. Preprocessing sample for snow-background removal.
Figure 8. Sample of PTC with and without snow background removed. (a) original tree crown manually clipped with snow background; (b) the PTC derived from (a) directly; (c) tree crown with background removal (mainly snow); (d) the PTC derived from (c).
Figure 9. The PTC in the RF classifier workflow.
Figure 10. The PTC in PyTorch (ResNet50) classifier workflow.
Figure 11. The PTC in YOLO (v10, v11, and v12) classifier workflow.
Figure 12. Confusion matrices of RGB and PTC classification for RF (top) and YOLOv10 (bottom). 0 * means the trembling aspen was classified as unknown or background.
Figure 13. Comparison between the 3D PTC representation (1)–(4) and the original grayscale pixel view (a–d). (a) a one-pixel image and (1) its PTC; (b) a seven-pixel image and (2) its PTC; (c) a coarse (reduced) tree crown pixel image and (3) its PTC; (d) a typical full-pixel tree crown image and (4) its PTC.
Table 1. PyTorch (ResNet50) parameters.

| Level | Layer Name | Input Channels | Output Channels | Kernel Size | Stride | Padding | Repetitions |
|---|---|---|---|---|---|---|---|
| Conv1 (Stage 0) | Conv | 3 | 64 | 7 × 7 | 2 | 3 | 1 |
| MaxPool1 | MaxPool | 64 | 64 | 3 × 3 | 2 | 1 | 1 |
| Stage1 | Bottleneck | 64 | 256 | 1 × 1 → 3 × 3 → 1 × 1 | (1,1,1) | (0,1,0) | 3 |
| Stage2 | Bottleneck | 256 | 512 | 1 × 1 → 3 × 3 → 1 × 1 | (2,1,1) | (0,1,0) | 4 |
| Stage3 | Bottleneck | 512 | 1024 | 1 × 1 → 3 × 3 → 1 × 1 | (2,1,1) | (0,1,0) | 6 |
| Stage4 | Bottleneck | 1024 | 2048 | 1 × 1 → 3 × 3 → 1 × 1 | (2,1,1) | (0,1,0) | 3 |
| AvgPool1 | AvgPool | 2048 | 2048 | | | | 1 |
| FC Layer1 | Fully Connected | 2048 | 64 | | | | 1 |
| FC Layer2 | Fully Connected | 64 | 4 | | | | 1 |
Table 2. The classification accuracy results with different models.

| Model | RGB [%] | PTC [%] |
|---|---|---|
| RF | 89.9 | 93.1 |
| PyTorch (ResNet50) | 83.4 | 85.8 |
| YOLOv10 | 84.2 | 90.1 |
| YOLOv11 | 83.3 | 88.1 |
| YOLOv12 | 70.1 | 81.3 |
Table 3. The training times with different models with 852 trees.

| Model | RGB [Minutes] | PTC [Minutes] |
|---|---|---|
| RF | 0.64 | 0.62 |
| PyTorch (ResNet50) | 7.29 | 6.95 |
| YOLOv10 | 19.02 | 16.80 |
| YOLOv11 | 15.54 | 9.84 |
| YOLOv12 | 23.04 | 23.04 |
Table 4. Per-species classification results (precision, recall, F1 score, and IoU) for all five models.

| Model | Data | Tree Species | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|---|
| RF | RGB | Fir | 0.7715 | 0.8175 | 0.7938 | 0.6581 |
| | | Pine | 0.8067 | 0.9030 | 0.8521 | 0.7423 |
| | | Spruce | 0.9018 | 0.7581 | 0.8237 | 0.7003 |
| | | Trembling aspen | 0.8467 | 1.0000 | 0.9170 | 0.8467 |
| | PTC | Fir | 0.8390 | 0.8266 | 0.8327 | 0.7134 |
| | | Pine | 0.8000 | 0.9091 | 0.8511 | 0.7407 |
| | | Spruce | 0.8877 | 0.8724 | 0.8800 | 0.7857 |
| | | Trembling aspen | 0.8933 | 0.8428 | 0.8673 | 0.7657 |
| ResNet50 | RGB | Fir | 0.7715 | 0.8175 | 0.7938 | 0.6581 |
| | | Pine | 0.8067 | 0.9030 | 0.8521 | 0.7423 |
| | | Spruce | 0.9018 | 0.7581 | 0.8237 | 0.7003 |
| | | Trembling aspen | 0.8467 | 1.0000 | 0.9170 | 0.8467 |
| | PTC | Fir | 0.8390 | 0.8266 | 0.8327 | 0.7134 |
| | | Pine | 0.8000 | 0.9091 | 0.8511 | 0.7407 |
| | | Spruce | 0.8877 | 0.8724 | 0.8800 | 0.7857 |
| | | Trembling aspen | 0.8933 | 0.8428 | 0.8673 | 0.7657 |
| YOLOv10 | RGB | Fir | 0.9102 | 0.8727 | 0.8910 | 0.8034 |
| | | Pine | 0.9789 | 0.9329 | 0.9533 | 0.9145 |
| | | Spruce | 0.8713 | 0.9263 | 0.8980 | 0.8148 |
| | | Trembling aspen | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | PTC | Fir | 0.9574 | 0.9251 | 0.9410 | 0.8885 |
| | | Pine | 0.9108 | 0.9533 | 0.9316 | 0.8720 |
| | | Spruce | 0.9450 | 0.9683 | 0.9565 | 0.9167 |
| | | Trembling aspen | 0.9793 | 0.9467 | 0.9627 | 0.9281 |
| YOLOv11 | RGB | Fir | 0.8889 | 0.8689 | 0.8788 | 0.7838 |
| | | Pine | 0.9726 | 0.9467 | 0.9595 | 0.9221 |
| | | Spruce | 0.8746 | 0.9053 | 0.8897 | 0.8012 |
| | | Trembling aspen | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | PTC | Fir | 0.9081 | 0.9625 | 0.9345 | 0.8771 |
| | | Pine | 0.9371 | 0.8933 | 0.9147 | 0.8428 |
| | | Spruce | 0.9819 | 0.9509 | 0.9661 | 0.9345 |
| | | Trembling aspen | 0.9733 | 0.9733 | 0.9733 | 0.9481 |
| YOLOv12 | RGB | Fir | 0.7810 | 0.8015 | 0.7911 | 0.6544 |
| | | Pine | 0.8239 | 0.9667 | 0.8896 | 0.8011 |
| | | Spruce | 0.8538 | 0.7579 | 0.8030 | 0.6708 |
| | | Trembling aspen | 1.0000 | 0.9933 | 0.9967 | 0.9933 |
| | PTC | Fir | 0.9325 | 0.8801 | 0.9056 | 0.8275 |
| | | Pine | 0.8293 | 0.9067 | 0.8662 | 0.7640 |
| | | Spruce | 0.9212 | 0.9439 | 0.9324 | 0.8734 |
| | | Trembling aspen | 0.9167 | 0.8800 | 0.8980 | 0.8148 |
Table 5. The classification results with and without the snow background.

| Data | YOLO | With Snow Background | Snow Background Removed |
|---|---|---|---|
| RGB | v10 | 0.8498 | 0.9236 |
| | v11 | 0.8330 | 0.9178 |
| | v12 | 0.7010 | 0.8498 |
| PTC | v10 | 0.9010 | 0.9483 |
| | v11 | 0.8810 | 0.9484 |
| | v12 | 0.8310 | 0.9061 |
Table 6. Comparison of RGB and PTC accuracy across different dataset sizes for five models.

| Dataset Size | ResNet50 (PyTorch) RGB [%] | ResNet50 (PyTorch) PTC [%] | RF RGB [%] | RF PTC [%] | YOLOv10 RGB [%] | YOLOv10 PTC [%] | YOLOv11 RGB [%] | YOLOv11 PTC [%] | YOLOv12 RGB [%] | YOLOv12 PTC [%] |
|---|---|---|---|---|---|---|---|---|---|---|
| 170 | 73.2 | 70.8 | 67.6 | 75.9 | 76.9 | 86.7 | 73.2 | 83.5 | 66.8 | 79.7 |
| 270 | 76.1 | 79.9 | 79.6 | 84.8 | 80.1 | 88.0 | 78.0 | 81.8 | 70.3 | 77.7 |
| 370 | 75.5 | 81.5 | 85.1 | 88.9 | 80.3 | 89.0 | 79.0 | 82.7 | 67.2 | 79.5 |
| 470 | 78.0 | 83.8 | 88.3 | 91.3 | 80.9 | 90.7 | 81.1 | 85.8 | 69.1 | 81.3 |
| 570 | 81.2 | 85.0 | 90.4 | 92.8 | 83.4 | 91.9 | 82.9 | 88.1 | 70.9 | 83.4 |
| 852 | 83.5 | 85.8 | 89.9 | 93.1 | 84.2 | 90.1 | 83.3 | 88.1 | 70.1 | 81.3 |