1. Introduction
Pine trees (
Pinus spp.) are foundational elements of forest ecosystems worldwide, contributing essential ecological functions such as carbon sequestration, biodiversity maintenance, and watershed regulation. They also hold considerable economic and cultural value in many regions [
1]. In recent decades, however, invasive pests and diseases, including pine wilt disease (caused by the pinewood nematode
Bursaphelenchus xylophilus) and the mountain pine beetle (
Dendroctonus ponderosae), have caused widespread mortality in pine forests, disrupting ecosystem balance and reducing forest resilience [
2,
3]. Monitoring tree health status based on visible phenotypic changes is critical for assessing the impact of pests and guiding management. However, traditional ground-based surveys are constrained by dense canopy cover, limited site accessibility, and high labor demands, rendering them unsuitable for rapid, large-scale forest assessments.
Recent advances in unmanned aerial vehicle (UAV) platforms and deep learning algorithms have revolutionized forest health monitoring, particularly in the context of pest and disease surveillance [
4,
5,
6,
7]. UAVs enable high-resolution image acquisition across broad spatial extents, while convolutional neural networks (CNNs) have substantially improved the automated analysis of UAV imagery [
8,
9,
10]. Among these, the YOLO (You Only Look Once) framework has become a widely adopted object detection algorithm in remote sensing due to its balance of speed and accuracy [
11,
12,
13]. Recent research has highlighted the capability of YOLO-based object detection frameworks for identifying and localizing harmful pests in complex environmental scenes [
14,
15]. For example, DeepForest and YOLOv5 have been used to map bark beetle damage in Mexican pine stands [
16], and an enhanced YOLO model (YOLO-PWD) has improved pine wilt disease detection by distinguishing discolored and dead trees with higher precision [
17].
The woodwasp
Sirex noctilio (Hymenoptera: Siricidae), a xylophagous insect native to Eurasia and North Africa, has become an invasive pest of increasing concern in pine ecosystems worldwide. It is now recognized by the Food and Agriculture Organization (FAO) as a high-priority forest quarantine pest due to its destructive capacity and rapid spread [
18]. Female Sirex woodwasps deposit eggs into weakened or stressed pine trees, simultaneously injecting a phytotoxic mucus and spores of the symbiotic fungus
Amylostereum areolatum. The synergistic action of these agents disrupts vascular function and leads to tree mortality [
19,
20,
21]. Although pine wilt disease (PWD) is more widely known, the ecological threat posed by Sirex woodwasp infestations is comparably severe and warrants equal attention [
22].
In China, this species was first reported in 2013 in plantations of
Pinus sylvestris var.
mongolica in Heilongjiang Province, where it has since become a major forest health concern. Several remote sensing–based methods have been proposed to detect infestations [
23,
24]. For example, spectral indices derived from multispectral imagery and photometric point clouds have been used in machine learning frameworks [
25]; hyperspectral data combined with Random Forest (RF) and Support Vector Machine (SVM) models have been applied to distinguish infested from healthy or lightning-damaged trees [
26]; and RF-based classification using PlanetScope imagery has shown promising accuracy in mapping damage [
27]. These approaches have contributed valuable insights but still face notable limitations, such as reliance on handcrafted features, low spatial resolution, and poor generalization under complex forest conditions.
Previous ecological studies have established that the Sirex woodwasp preferentially infests weakened or drought-stressed pine trees [
28,
29,
30]. These trees typically exhibit subtle indicators of decline, such as slight needle yellowing, reduced foliage density, and crown thinning, making it difficult to distinguish them from healthy trees [
31]. Traditional remote sensing methods attempt to capture these symptoms through handcrafted spectral or geometric features, followed by classification using algorithms such as Random Forest and Support Vector Machine [
32,
33]. However, these handcrafted features often fail to represent the nuanced changes in color and canopy structure that signal early stages of infestation.
Deep learning offers a more robust solution to this challenge. CNNs can automatically learn hierarchical feature representations from high-resolution UAV imagery, enabling improved classification performance and stronger generalization across varied forest environments [
34,
35]. Such models are capable of capturing spatial patterns in needle coloration and canopy morphology that are indicative of incipient Sirex woodwasp damage. Nevertheless, detecting these weak visual cues remains difficult in practice due to confounding factors such as background vegetation, inconsistent lighting, variable image resolution, and the high aspect ratio of pine tree crowns. The present study addresses these challenges by designing a specialized deep learning architecture tailored to detecting early signs of pine tree health decline in UAV imagery under complex field conditions.
Although previous studies have applied remote sensing and machine learning techniques to detect pine wilt disease and other forest threats, no research to date has explored the use of deep learning methods for detecting and classifying pine trees damaged by Sirex woodwasp. Addressing this gap, the present study leverages the known ecological characteristics of the Sirex woodwasp, particularly its preference for weakened or stressed pine hosts, to develop a robust deep learning framework for pine tree health assessment based on UAV imagery. We propose a novel model, YOLO-Pine Tree Health Detection (YOLO-PTHD), designed to classify individual trees as healthy, weakened, or dead, to support early detection of Sirex woodwasp damage. Unlike YOLO-PWD [
17], which analyzes medium-resolution UAV imagery captured at higher flight altitudes for large-area monitoring, YOLO-PTHD is optimized for high-resolution images from lower-altitude flights, enabling finer detection of early phenotypic signs such as needle discoloration and crown thinning. To support this, YOLO-PTHD integrates strip convolution, context anchor attention, and a dynamic loss function, enhancing detection of elongated crowns and partially occluded trees. The main contributions of this study are as follows:
We constructed a new UAV-based dataset of pine trees infested by Sirex woodwasp, including both orthophotos and oblique-angle images collected from field-verified outbreak areas.
We designed YOLO-PTHD by incorporating three key architectural components: a strip convolutional structure using separate vertical and horizontal filters to accommodate the elongated crown morphology of pine trees; a context anchor attention mechanism that captures long-range spatial dependencies to improve distinction between healthy and infested trees; and a dynamic loss function that adjusts adaptively based on tree size, improving both localization and classification accuracy.
We conducted a series of ablation experiments to evaluate the performance contribution of each module, supported by qualitative visualization of model attention on canopy and needle features.
We validated the model against field survey data and demonstrated a high detection accuracy of 96.3% in identifying weakened trees damaged by Sirex woodwasp.
We evaluated the model’s generalization ability using the publicly available Real PWD dataset from South Korea, which confirmed that YOLO-PTHD can be effectively applied to detect symptoms caused by other invasive pests and diseases. These findings highlight the practical value and transferability of the proposed model in large-scale pine forest health monitoring.
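As a concrete illustration, the strip convolutional structure named in the second contribution can be sketched as a pair of depthwise k × 1 and 1 × k convolutions. This is a common formulation of strip convolution; the kernel size and channel count below are illustrative and are not the actual YOLO-PTHD settings.

```python
import torch
import torch.nn as nn

class StripConv(nn.Module):
    """Illustrative strip convolution: a depthwise k x 1 (vertical)
    filter followed by a depthwise 1 x k (horizontal) one, approximating
    a k x k receptive field at lower cost while remaining sensitive to
    elongated shapes such as pine crowns."""
    def __init__(self, channels: int, k: int = 11):
        super().__init__()
        self.vertical = nn.Conv2d(channels, channels, kernel_size=(k, 1),
                                  padding=(k // 2, 0), groups=channels)
        self.horizontal = nn.Conv2d(channels, channels, kernel_size=(1, k),
                                    padding=(0, k // 2), groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.horizontal(self.vertical(x))

x = torch.randn(1, 16, 64, 64)   # (batch, channels, height, width)
y = StripConv(16)(x)
print(y.shape)                   # spatial size is preserved
```

Stacking the two 1-D filters costs roughly 2k weights per channel instead of k², which is where both the efficiency gain and the orientation sensitivity come from.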
3. Results
This section presents a comprehensive evaluation of the proposed YOLO-PTHD model. First, we assess its performance on the Sirex woodwasp (SW) dataset specifically constructed in this study, including detection accuracy across different health categories, ablation studies to examine the contribution of each architectural module, and validation using ground survey data. These experiments collectively demonstrate the model’s effectiveness in detecting pine trees affected by Sirex woodwasp under real-world conditions.
To further evaluate the model’s generalization capability, we introduce the Real Pine Wilt Disease (R-PWD) dataset, described in
Appendix A. Two experimental settings are explored: (i) training and testing on the R-PWD dataset alone, and (ii) joint training using a combined dataset comprising both SW and R-PWD images. These experiments are designed to assess YOLO-PTHD’s robustness across distinct tree diseases and varying imaging scenarios.
For the main comparative study, five YOLO models from different release stages were selected, including YOLOv8 [
49], YOLOv9 [
53], YOLOv10 [
54], YOLOv11 [
48], and YOLOv12 [
55]. Among them, YOLOv8 and YOLOv11 were released by Ultralytics, while YOLOv12 represents the most recent iteration in the YOLO series. To ensure a fair comparison, all models were trained using identical hyperparameters and on the same datasets. YOLOv9 was trained using its “t” (tiny) variant, while the others used their “n” (nano) variants, so that lightweight configurations were employed across all models for consistency and deployment feasibility.
3.1. Evaluation of the Performance of YOLO-PTHD Using P-R Curves
As illustrated in
Figure 10, YOLO-PTHD achieved the highest performance across all three categories, particularly in Class_SWAll and Class_SWWeak.
Figure 10a shows that YOLO-PTHD achieved the most balanced accuracy-recall trade-off in identifying target pine trees across both categories.
Figure 10b shows that for detecting weakened pine trees, YOLO-PTHD maintains a notable lead over the state-of-the-art YOLOv12.
Figure 10c indicates that for detecting dead pine trees, YOLO-PTHD exhibits an even greater advantage over the other models.
3.2. Performance Evaluation
The performance of YOLO-PTHD was evaluated and compared against five YOLO baseline models using the Sirex Woodwasp dataset.
Table 3 presents detailed quantitative results for two evaluation metrics—mAP and F1-score—across three target categories: Class_SWAll, Class_SWWeak, and Class_SWDead.
YOLO-PTHD achieved the highest performance among all evaluated YOLO models across multiple detection categories. Compared with YOLOv12, it showed a 2.9% improvement in mAP and a 3.2% increase in F1-score for overall detection Class_SWAll. For weakened trees (Class_SWWeak), mAP and F1-score increased by 2.7% and 3.0%, respectively. The most notable gains were observed in the detection of dead trees (Class_SWDead), where mAP improved by 3.1% and F1-score by 3.4%. These results indicate that YOLO-PTHD delivers more reliable identification of both early and advanced symptoms of pine decline.
In terms of computational complexity, YOLO-PTHD demonstrated strong efficiency. As shown in
Table 3, it required 6.0 GFLOPs, representing a 7.69% reduction compared to YOLOv12 and a 31.03% reduction compared to YOLOv8 (8.7 GFLOPs).
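These relative reductions follow directly from the reported GFLOPs figures and can be checked with simple arithmetic. In the sketch below, the 6.5 GFLOPs baseline for YOLOv12 is inferred from the stated 7.69% reduction rather than quoted from Table 3.

```python
def reduction_pct(baseline_gflops: float, ours_gflops: float) -> float:
    """Relative reduction in computational cost, in percent."""
    return (baseline_gflops - ours_gflops) / baseline_gflops * 100

# YOLO-PTHD at 6.0 GFLOPs versus the two baselines:
print(round(reduction_pct(8.7, 6.0), 2))  # 31.03 (vs. YOLOv8)
print(round(reduction_pct(6.5, 6.0), 2))  # 7.69 (vs. YOLOv12, baseline inferred)
```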
Figure 11 visualizes the trade-off between accuracy (mAP) and computational cost (GFLOPs), confirming that YOLO-PTHD achieves superior detection accuracy with the lowest computational demand among all models tested.
3.3. Ablation Study
To assess the contribution of each proposed module, ablation experiments were conducted based on YOLOv11. As shown in
Figure 12, the Strip, CAA, and DL modules individually improved mAP by 1.9%, 1.8%, and 1.5%, respectively. Pairwise combinations such as Strip + CAA, CAA + DL, and Strip + DL yielded further gains, while the full model (YOLO-PTHD) achieved the highest mAP improvement of 4.3%.
In terms of computational efficiency, the Strip and CAA modules reduced GFLOPs by 4.4% and 6.0%, respectively, and their combination achieved a maximum reduction of 6.6%. The DL module did not affect GFLOPs but contributed to accuracy.
These results indicate that each module contributes to detection performance, with Strip and CAA also offering computational efficiency benefits.
To offer a clearer understanding of the three modules’ impact on detection performance, the ablation study outcomes were visualized. Grad-CAM [
56] was used to highlight the image regions on which the model focuses when making predictions.
The left-side images show the original images alongside those annotated with detection bounding boxes. The upper image presents a scene with two dead pine trees, whereas the lower image shows one weakened pine tree. All models successfully recognized these scenes, as illustrated in
Figure 13.
The heatmaps in
Figure 13, where brighter colors signify higher model focus, illustrate how the model utilizes features from specific regions for detection. The baseline demonstrates that without these modules, the model fails to accurately focus on needle and canopy characteristics.
In contrast, incorporating either the Strip or CAA module significantly enhances the model’s focus on the needles and canopy. The combined effect of both modules nearly fully directs the model’s focus to the needles and canopy. The DL module enhances the model’s focus on small-area canopy and needle features but requires combination with Strip and CAA for optimal performance.
3.4. Validation and Application of YOLO-PTHD in Ground Survey
The georeferenced coordinates of 141 UAV orthophotos covering Plot A were imported into the GIS environment, after which the YOLO-PTHD model was executed to detect weakened and dead pine trees. Only bounding boxes with a confidence score greater than 0.60 were retained to minimize false positives. The detection process was conducted under the configuration described in
Section 2.3.4 and required approximately 5 h to complete. In the visualization, weakened trees were rendered as yellow circles and dead trees as white triangles.
To verify model performance, these detections were compared with a comprehensive ground survey that distinguished two classes of Sirex woodwasp damage: (i) trees damaged in previous years and (ii) trees newly damaged during the current survey. Spatial overlap between survey points and model predictions determined detection success. For previously damaged trees, successful matches were re-labeled with blue circles; unmatched trees were marked with blue triangles. For newly damaged trees, successful detections were shown as red circles, whereas omissions were indicated with red triangles. The resulting composite map (
Figure 14) provides an intuitive overview of the health status and highlights areas where the model succeeded or failed. Representative image chips beneath the main panel further illustrate that the predicted bounding boxes generally align well with field observations, even in scenes containing multiple weakened or dead trees.
YOLO-PTHD detected 68 of the 70 previously damaged trees and 10 of the 11 newly damaged trees recorded during the ground survey, yielding accuracy rates of 97.14% and 90.91%, respectively. These results, summarized in
Table 4, indicate that the model achieved an overall detection accuracy of 96.30% in real forest conditions. This strong agreement between ground survey and model outputs confirms YOLO-PTHD’s effectiveness in identifying Sirex woodwasp damage with high reliability and low omission risk.
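The point-in-box matching and the accuracy tallies described above can be sketched as follows. The helper names, toy coordinates, and dictionary layout are illustrative; only the 0.60 confidence threshold and the reported counts come from the text, and this is not the actual GIS workflow.

```python
def point_in_box(pt, box):
    """True if a ground-survey point (x, y) lies inside a predicted
    bounding box given as (xmin, ymin, xmax, ymax)."""
    x, y = pt
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

def count_matches(points, detections, conf_thresh=0.60):
    """Count survey points covered by at least one detection whose
    confidence exceeds the threshold used in the study."""
    boxes = [d["box"] for d in detections if d["conf"] > conf_thresh]
    return sum(any(point_in_box(p, b) for b in boxes) for p in points)

# Overall accuracy from the tallies reported in the text:
matched, total = 68 + 10, 70 + 11
print(f"{matched / total:.2%}")  # 96.30%
```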
3.5. Generalization Capability of the Model
To evaluate the robustness and cross-domain adaptability of YOLO-PTHD, we conducted experiments under two settings: (1) training on the R-PWD dataset to assess generalization to unseen symptom patterns and imaging domains, and (2) joint training on a merged dataset combining different tree diseases (SW + R-PWD) to evaluate cross-disease generalization performance.
3.5.1. Generalization Performance on the R-PWD Dataset
The Real Pine Wilt Disease (R-PWD) dataset (
Appendix A.1) contains UAV-acquired imagery collected from various forest sites in South Korea, annotated with two symptom classes: Infected and Dead. Unlike the SW dataset used in prior training, R-PWD represents a different disease type and was captured under distinct environmental conditions.
To assess YOLO-PTHD’s ability to generalize beyond its source domain, we trained the model solely on the R-PWD dataset using the same hyperparameter configuration as in previous experiments. Performance metrics are reported in
Appendix A Table A2, and compared with the results of EfficientNetv2-S from [
47]. YOLO-PTHD achieves higher precision (0.908), recall (0.926), and F1-score (0.917), surpassing EfficientNetv2-S in all three metrics. This demonstrates that the model can effectively recognize symptoms of pine wilt disease despite differences in acquisition platform, disease characteristics, and label distributions, thus validating its domain-level generalization capability.
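The reported F1 score is simply the harmonic mean of precision and recall, which can be verified from the quoted values:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported for YOLO-PTHD on the R-PWD dataset:
print(round(f1_score(0.908, 0.926), 3))  # 0.917
```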
3.5.2. Cross-Disease Detection on the Combined SW + R-PWD Dataset
To further examine the model’s ability to generalize across multiple tree diseases, we constructed a balanced dataset by combining the full SW dataset (1330 images) with a randomly sampled subset of R-PWD (665 images per class). The merged dataset contains 2660 images evenly distributed across four classes: SWWeak, SWDead, PWD-Infected, and PWD-Dead. A stratified 70/30 train/validation split was applied to ensure balanced representation of all classes.
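A stratified 70/30 split of this kind can be sketched in plain Python. The function below is an illustrative implementation under the assumption of one label per image, not the exact procedure used in the study.

```python
import random
from collections import defaultdict

def stratified_split(items, labels, train_frac=0.70, seed=42):
    """Split items so every class keeps (close to) the same
    train/validation proportion, mirroring a stratified 70/30 split."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    train, val = [], []
    for label, group in by_class.items():
        rng.shuffle(group)
        cut = int(len(group) * train_frac)
        train += [(i, label) for i in group[:cut]]
        val += [(i, label) for i in group[cut:]]
    return train, val

# 2660 images evenly distributed across four classes (665 each):
classes = ["SWWeak", "SWDead", "PWD-Infected", "PWD-Dead"]
items = [f"img_{n}.jpg" for n in range(2660)]
labels = [classes[n % 4] for n in range(2660)]
train, val = stratified_split(items, labels)
print(len(train), len(val))  # 1860 800
```

Because each class is shuffled and cut independently, every class contributes the same 465/200 proportion to the train and validation sets.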
Detection results on the combined SW + R-PWD dataset are presented in
Table 5. YOLO-PTHD achieved the best overall performance, with a mAP of 0.918 and an F1-score of 0.888, outperforming all baseline YOLO models. Compared to YOLOv12, YOLO-PTHD showed a 3.7% increase in mAP and a 1.6% gain in F1-score for the overall category (Class_SWPWDAll). Notably, across all four subclasses—including weakened and dead trees in both SW and PWD domains—YOLO-PTHD consistently produced the highest or near-highest scores, reflecting its strong capability in handling different symptom stages and disease types.
These results demonstrate that the proposed model effectively captures transferable visual patterns associated with pine tree decline, supporting its robust cross-disease generalization on heterogeneous UAV imagery from different geographic and pathological contexts.
4. Discussion
This study proposed YOLO-PTHD, a lightweight yet high-performance deep learning framework designed for UAV-based detection of pine tree health conditions under various biotic stressors. Validated on two distinct datasets—covering both Sirex woodwasp-induced damage and pine wilt disease—the model demonstrated strong generalization across different pest types, geographic locations, and imaging conditions. YOLO-PTHD achieved an overall detection accuracy of 96.3% in field-verified Sirex outbreak areas and outperformed five state-of-the-art YOLO variants with a mAP of 0.923, an F1-score of 0.866, and a reduced computational cost of 6.0 GFLOPs. Through the integration of strip-based convolution, Context Anchor Attention, and a scale-sensitive dynamic loss function, YOLO-PTHD effectively addresses critical challenges in phenotype-level detection of pine decline, such as subtle needle discoloration, elongated canopy structures, and occlusion in dense forest environments. These results confirm the model’s robustness, efficiency, and practical value as a scalable tool for forest health surveillance, early pest outbreak detection, and ecological risk mitigation.
Our findings are consistent with previous studies emphasizing the utility of deep learning in tree health monitoring. In prior work, DeepForest and YOLOv5 were jointly applied to detect bark beetle damage in Mexican pine forests, with detection primarily based on visibly discolored canopies [
16]. Another study developed the YOLO-PWD model for pine wilt disease identification, incorporating attention mechanisms to improve accuracy, yet focusing mainly on clearly dead or severely affected trees [
17]. In contrast, YOLO-PTHD demonstrates improved sensitivity to visually detectable symptoms such as needle yellowing and crown thinning. This may be attributed to the StripBlock-enhanced backbone and CAA-integrated neck, which enable extraction of orientation-aware and context-rich features suited to pine crown morphology. As shown in the ablation study, each proposed module—Strip, CAA, and SDIoU—independently contributed to performance gains, with a combined mAP improvement of 4.3%. Grad-CAM visualizations (
Figure 13) further illustrate enhanced model attention on needle and canopy features, supporting more accurate detection under complex forest conditions.
In comparison with previous studies on Sirex woodwasp detection, which primarily relied on multispectral imagery and traditional classifiers such as Random Forest and SVM [
26,
27], YOLO-PTHD offers clear advantages in model adaptability and operational efficiency. First, it reduces reliance on handcrafted spectral features by leveraging end-to-end deep learning from RGB UAV imagery, enabling early identification of subtle phenotypes such as crown thinning and needle yellowing. Second, the object detection framework allows for more efficient annotation at the crown level, greatly accelerating dataset development. Experiments on the SW dataset confirm YOLO-PTHD’s strong detection capability for Sirex woodwasp damage. When retrained on the R-PWD dataset, the model maintained high accuracy under different forest types and imaging conditions, demonstrating its cross-region and cross-pest generalization. Furthermore, training on the combined SW and R-PWD dataset showed that the model could distinguish between different crown phenotypes associated with Sirex woodwasp and pine wilt disease, highlighting its potential as a unified deep learning backbone for monitoring multiple forest health threats in
Pinus ecosystems.
Despite its strong performance, YOLO-PTHD has key limitations that warrant consideration. Most notably, the model relies solely on RGB imagery, which restricts its ability to detect early physiological stress before visible symptoms appear. Moreover, multiple biotic and abiotic factors such as drought, fungal pathogens, and nutrient deficiencies can cause phenotypic changes like needle yellowing and crown thinning that resemble symptoms induced by Sirex woodwasp. Consequently, while YOLO-PTHD can effectively detect declining trees and is well suited for monitoring in areas with confirmed Sirex woodwasp outbreaks, it cannot independently confirm pest-specific damage in regions where the cause of decline is unknown. In such cases, RGB-based detection must be supplemented with additional evidence [
57] to accurately attribute decline to Sirex woodwasp.
On the deployment side, YOLO-PTHD (6.0 GFLOPs) achieves a lower computational load compared to YOLOv11n (6.5 GFLOPs), making it well suited for real-time inference on lightweight edge devices such as the Jetson Nano [
58]. This efficiency is particularly valuable in field operations. For example, in this study, UAV flights over Plot A required approximately 90 min, and the YOLO-PTHD model processed all acquired images within 5 h. In contrast, the corresponding ground survey conducted by six trained researchers took 12 full days to complete the same area. These results highlight the model’s potential to substantially reduce the time and labor required for forest health assessments. By enabling rapid onboard analysis during UAV flights, deployment on edge devices can further minimize data transfer delays and accelerate detection of tree decline. To enhance this capability, future work could explore model pruning or knowledge distillation techniques to further reduce inference time and resource demand, supporting real-time forest monitoring and precision pest management in operational settings.
5. Conclusions
This study presents YOLO-PTHD, a lightweight deep learning model tailored for UAV-based detection of pine-tree health under biotic stresses. Trained and evaluated on the newly constructed Sirex Woodwasp (SW) dataset, the model achieved 96.3% overall accuracy, mAP 0.923, and F1-score 0.866 while requiring only 6.0 GFLOPs—outperforming five state-of-the-art YOLO baselines in both accuracy and efficiency. Ablation experiments confirmed that each targeted innovation—StripBlock convolution, Context Anchor Attention, and the scale-adaptive SDIoU loss—contributes incrementally to performance, yielding a combined 4.3% mAP gain and a 6.6% reduction in computation relative to the YOLOv11 backbone.
Robustness tests demonstrated strong generalization. When retrained on the independent Real Pine Wilt Disease (R-PWD) dataset, YOLO-PTHD reached precision 0.908, recall 0.926, and F1-score 0.917, surpassing a recently reported EfficientNetv2-S benchmark [
47]. On the combined SW + R-PWD dataset, the model attained mAP 0.918 and F1-score 0.888, accurately distinguishing crown phenotypes produced by two distinct pests and validating its cross-disease adaptability.
By combining sensitivity to phenotypic indicators of tree decline with computational efficiency, YOLO-PTHD serves as a practical and scalable tool for large-scale forest health surveillance and rapid pest outbreak response. Its tree-level annotation workflow accelerates dataset expansion, and its compact footprint makes deployment on edge devices (e.g., Jetson-class modules or UAV onboard processors) feasible. These strengths make YOLO-PTHD a scalable foundation for multi-disease surveillance in Pinus ecosystems and a promising component of real-time, precision pest management systems. By enabling accurate and timely detection of pest-induced decline, the model can support responsive forest management and facilitate early intervention in pest mitigation efforts.