A Multi-Temporal Instance Segmentation Framework and Exhaustively Annotated Tree Crown Dataset for a Subtropical Urban Forest Case
Highlights
- On an exhaustively annotated dataset of 47,754 crowns, bi-temporal joint training is sufficient to overcome the severe phenological overfitting of single-temporal models.
- Strategic phase selection can yield higher segmentation generalizability than merely increasing multi-temporal data volume.
- The exhaustive annotation of all visible crowns provides the model with more complete feature information to handle complex backgrounds in urban spaces.
- Integrating multi-date UAV imagery effectively mitigates the influence of seasonal and illumination variations on tree crown geometric delineation.
Abstract
1. Introduction
2. Materials and Methods
2.1. Materials
2.1.1. Study Area
2.1.2. Airborne Optical Images
2.2. Methods
2.2.1. Sample Labeling
2.2.2. Dataset Partitioning
2.2.3. Temporal Data Characterization
2.2.4. Tree Crown Recognition Based on ConvNeXt-V2
2.2.5. Experiment Design
2.2.6. Result of Fusion Based on Non-Maximum Suppression
2.2.7. Assessment
3. Results
3.1. Benchmarking and Parameter Analysis
3.1.1. Model Architecture Comparison
3.1.2. Parameter Analysis
3.1.3. Comparison of Fusion Strategies
3.1.4. Post-Processing Results at Different Stages
3.2. Results Based on Multiple Temporal Observations
3.2.1. Training Gains from Multi-Temporal Fusion
3.2.2. Image Quality and Sensitivity Analysis
3.2.3. Visual Performance of Multi-Temporal Fusion
3.2.4. Multi-Temporal Information Synergy and Complementary Effects
3.2.5. Quantitative Evaluation of Boundary Delineation Accuracy
4. Discussion
4.1. Applicability of an Exhaustively Annotated Tree Crown Dataset
4.2. Characteristics of the Deep Learning Architectures
4.3. Considerations for Multi-Temporal Data Approach
4.4. Limitations and Future Work
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Blickensdörfer, L.; Oehmichen, K.; Pflugmacher, D.; Kleinschmit, B.; Hostert, P. National Tree Species Mapping Using Sentinel-1/2 Time Series and German National Forest Inventory Data. Remote Sens. Environ. 2024, 304, 114069.
- Crowther, T.W.; Maynard, D.S.; Leff, J.W.; Oldfield, E.E.; McCulley, R.L.; Fierer, N.; Bradford, M.A. Predicting the Responsiveness of Soil Biodiversity to Deforestation: A Cross-Biome Study. Glob. Change Biol. 2014, 20, 2983–2994.
- Duinker, P.N.; Ordóñez, C.; Steenberg, J.W.N.; Miller, K.H.; Toni, S.A.; Nitoslawski, S.A. Trees in Canadian Cities: Indispensable Life Form for Urban Sustainability. Sustainability 2015, 7, 7379–7396.
- Wang, Y.; Zhu, Y.; Cook-Patton, S.C.; Sun, W.; Zhang, W.; Ciais, P.; Li, T.; Smith, P.; Yuan, W.; Zhu, X.; et al. Land Availability and Policy Commitments Limit Global Climate Mitigation from Forestation. Science 2025, 389, 931–934.
- Crowther, T.W.; Glick, H.B.; Covey, K.R.; Bettigole, C.; Maynard, D.S.; Thomas, S.M.; Smith, J.R.; Hintler, G.; Duguid, M.C.; Amatulli, G.; et al. Mapping Tree Density at a Global Scale. Nature 2015, 525, 201–205.
- Cheng, K.; Yang, H.; Chen, Y.; Yang, Z.; Ren, Y.; Zhang, Y.; Lin, D.; Liu, W.; Huang, G.; Xu, J.; et al. How Many Trees Are There in China? Sci. Bull. 2025, 70, 1076–1079.
- Straker, A.; Puliti, S.; Breidenbach, J.; Kleinn, C.; Pearse, G.; Astrup, R.; Magdon, P. Instance Segmentation of Individual Tree Crowns with YOLOv5: A Comparison of Approaches Using the ForInstance Benchmark LiDAR Dataset. ISPRS Open J. Photogramm. Remote Sens. 2023, 9, 100045.
- Xiang, B.; Wielgosz, M.; Kontogianni, T.; Peters, T.; Puliti, S.; Astrup, R.; Schindler, K. Automated Forest Inventory: Analysis of High-Density Airborne LiDAR Point Clouds with 3D Deep Learning. Remote Sens. Environ. 2024, 305, 114078.
- Sun, Y.; Li, Z.; He, H.; Guo, L.; Zhang, X.; Xin, Q. Counting Trees in a Subtropical Mega City Using the Instance Segmentation Method. Int. J. Appl. Earth Obs. Geoinf. 2022, 106, 102662.
- Liang, X.; Chen, J.; Gong, W.; Puttonen, E.; Wang, Y. Influence of Data and Methods on High-Resolution Imagery-Based Tree Species Recognition Considering Phenology: The Case of Temperate Forests. Remote Sens. Environ. 2025, 323, 114654.
- Sylvain, J.-D.; Drolet, G.; Thiffault, É.; Anctil, F. High-Resolution Mapping of Tree Species and Associated Uncertainty by Combining Aerial Remote Sensing Data and Convolutional Neural Networks Ensemble. Int. J. Appl. Earth Obs. Geoinf. 2024, 131, 103960.
- Tucker, C.; Brandt, M.; Hiernaux, P.; Kariryaa, A.; Rasmussen, K.; Small, J.; Igel, C.; Reiner, F.; Melocik, K.; Meyer, J.; et al. Sub-Continental-Scale Carbon Stocks of Individual Trees in African Drylands. Nature 2023, 615, 80–86.
- Xie, Y.; Wang, Y.; Sun, Z.; Liang, R.; Ding, Z.; Wang, B.; Huang, S.; Sun, Y. Instance Segmentation and Stand-Scale Forest Mapping Based on UAV Images Derived RGB and CHM. Comput. Electron. Agric. 2024, 220, 108878.
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-Attention Mask Transformer for Universal Image Segmentation. arXiv 2022, arXiv:2112.01527.
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-Designing and Scaling ConvNets with Masked Autoencoders. arXiv 2023, arXiv:2301.00808.
- Zhang, J.; Lei, F.; Fan, X. Parameter-Efficient Fine-Tuning for Individual Tree Crown Detection and Species Classification Using UAV-Acquired Imagery. Remote Sens. 2025, 17, 1272.
- Cloutier, M.; Germain, M.; Laliberté, E. Influence of Temperate Forest Autumn Leaf Phenology on Segmentation of Tree Species from UAV Imagery Using Deep Learning. Remote Sens. Environ. 2024, 311, 114283.
- Troles, J.; Schmid, U.; Fan, W.; Tian, J. BAMFORESTS: Bamberg Benchmark Forest Dataset of Individual Tree Crowns in Very-High-Resolution UAV Images. Remote Sens. 2024, 16, 1935.
- Ball, J.G.C.; Hickman, S.H.M.; Jackson, T.D.; Koay, X.J.; Hirst, J.; Jay, W.; Archer, M.; Aubry-Kientz, M.; Vincent, G.; Coomes, D.A. Accurate Delineation of Individual Tree Crowns in Tropical Forests from Aerial RGB Imagery Using Mask R-CNN. Remote Sens. Ecol. Conserv. 2023, 9, 641–655.
- Jansen, A.J.; Nicholson, J.D.; Esparon, A.; Whiteside, T.; Welch, M.; Tunstill, M.; Paramjyothi, H.; Gadhiraju, V.; van Bodegraven, S.; Bartolo, R.E. Deep Learning with Northern Australian Savanna Tree Species: A Novel Dataset. Data 2023, 8, 44.
- Vasquez, V.; Cushman, K.; Ramos, P.; Williamson, C.; Villareal, P.; Gomez Correa, L.F.; Muller-Landau, H. Barro Colorado Island 50-Ha Plot Crown Maps: Manually Segmented and Instance Segmented; Smithsonian Tropical Research Institute: Panama City, Panama, 2023.
- Hickman, S.; Jackson, T. Datasets of SH’s AI4ER MRes Project; Zenodo: Geneva, Switzerland, 2021.
- van Geffen, F.; Heim, B.; Brieger, F.; Geng, R.; Shevtsova, I.A.; Schulte, L.; Stuenzi, S.M.; Bernhardt, N.; Troeva, E.I.; Pestryakova, L.A.; et al. SiDroForest: A Comprehensive Forest Inventory of Siberian Boreal Forest Investigations Including Drone-Based Point Clouds, Individually Labeled Trees, Synthetically Generated Tree Crowns, and Sentinel-2 Labeled Image Patches. Earth Syst. Sci. Data 2022, 14, 4967–4994.
- Lefebvre, I.; Laliberté, E. UAV LiDAR, UAV Imagery, Tree Segmentations and Ground Mesurements for Estimating Tree Biomass in Canadian (Quebec) Plantations; Federated Research Data Repository: Waterloo, ON, Canada, 2024.
- Shcherbacheva, A.; Campos, M.B.; Wang, Y.; Liang, X.; Kukko, A.; Hyyppä, J.; Junttila, S.; Lintunen, A.; Korpela, I.; Puttonen, E. A Study of Annual Tree-Wise LiDAR Intensity Patterns of Boreal Species Observed Using a Hyper-Temporal Laser Scanning Time Series. Remote Sens. Environ. 2024, 305, 114083.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision – ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755.
- Hao, Z.; Lin, L.; Post, C.J.; Mikhailova, E.A.; Li, M.; Chen, Y.; Yu, K.; Liu, J. Automated Tree-Crown and Height Detection in a Young Forest Plantation Using Mask Region-Based Convolutional Neural Network (Mask R-CNN). ISPRS J. Photogramm. Remote Sens. 2021, 178, 112–123.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. arXiv 2023, arXiv:2304.02643.
- Teng, M.; Ouaknine, A.; Laliberté, E.; Bengio, Y.; Rolnick, D.; Larochelle, H. Assessing SAM for Tree Crown Instance Segmentation from Drone Imagery. arXiv 2025, arXiv:2503.20199.
- Sun, E.; Cui, Y.; Liu, P.; Yan, J. A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities. Inf. Fusion 2025, 126, 103675.
- Zhu, X.; Wang, T.; Skidmore, A.K.; Duporge, I. A Deep Learning Framework for Mapping Evergreen Conifer Fractional Cover at 30 m Resolution Using Fused Bi-Temporal WorldView and Time-Series Landsat Imagery in Mixed Mountain Forests. Remote Sens. Environ. 2025, 331, 115055.
- Grybas, H.; Congalton, R.G. A Comparison of Multi-Temporal RGB and Multispectral UAS Imagery for Tree Species Classification in Heterogeneous New Hampshire Forests. Remote Sens. 2021, 13, 2631.


| Datasets | Exhaustive Annotation ¹ | Multi-Temporal | Number of Labels |
|---|---|---|---|
| BAMFORESTS [18] | Unclear ² | NO | 27,160 |
| Quebec Trees [17] | NO | YES | 23,000 |
| Detectree2 [19] | NO | NO | 3797 |
| Jansen et al. [20] | NO | NO | 2547 |
| BCI50ha [21] | NO | NO | 2454 |
| Hickman et al. [22] | NO | NO | 901 |
| SiDroForest [23] | NO | NO | 872 |
| Quebec Plantations [24] | NO | NO | - |

| Models | Bbox mAP | Segm mAP | Segm mAP50 | Segm mAP75 | Segm mAP_s | Segm mAP_m |
|---|---|---|---|---|---|---|
| Mask R-CNN [28] | 0.700 | 0.744 | 0.919 | 0.865 | 0.632 | 0.844 |
| Cascade Mask R-CNN [29] | 0.751 | 0.763 | 0.919 | 0.875 | 0.637 | 0.870 |
| QueryInst [30] | 0.579 | 0.629 | 0.886 | 0.731 | 0.527 | 0.736 |
| Mask2Former [14] | 0.550 | 0.595 | 0.843 | 0.682 | 0.465 | 0.712 |
| ConvNeXt-V2 [15] | 0.818 | 0.852 | 0.949 | 0.930 | 0.749 | 0.943 |
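
The column names in the table above match the COCO evaluation convention. Assuming that convention applies here, a predicted crown mask counts as a match only when its intersection over union (IoU) with a reference crown reaches the stated threshold, which is why Segm mAP75 is never higher than Segm mAP50. The sketch below illustrates the idea on a toy example; the 4 × 4 masks and the helper name `mask_iou` are hypothetical and are not taken from the dataset.

```python
from typing import List

Mask = List[List[int]]  # binary raster, 1 = crown pixel


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


# Hypothetical reference crown (12 pixels)
truth = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
# Hypothetical prediction that misses part of the crown edge (8 pixels)
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

iou = mask_iou(pred, truth)
matches_50 = iou >= 0.50  # counted as a hit under the looser threshold
matches_75 = iou >= 0.75  # rejected under the stricter threshold
print(f"IoU = {iou:.3f}, match@0.50 = {matches_50}, match@0.75 = {matches_75}")
```

A prediction such as this one contributes to mAP50 but not to mAP75, so the gap between the two columns gives a rough indication of how often boundaries are only approximately delineated.
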

| Method | Fusion Strategy | Input Channels | Pre-Trained Weights | Bbox_mAP | Segm_mAP |
|---|---|---|---|---|---|
| NMS | Late Fusion | Three-channel | ImageNet | 0.845 | 0.874 |
| Naive Stacking | Early Fusion | Six-channel | None (Random Init) | 0.634 | 0.661 |
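
The late-fusion row above merges results with non-maximum suppression (NMS). As a minimal sketch of that general technique, and not necessarily the authors' exact procedure, the example below pools detections from two acquisition dates and keeps the highest-scoring detection among heavily overlapping ones. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` form and an IoU threshold of 0.5; the threshold, the function names, and the sample detections are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_fuse(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedy NMS over detections pooled from several dates."""
    kept: List[Detection] = []
    # Visit detections from most to least confident
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        # Keep a detection only if it does not overlap a kept one too much
        if all(box_iou(box, k[0]) <= iou_thr for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections of the same scene on two dates
date_a = [((0, 0, 10, 10), 0.90), ((20, 20, 30, 30), 0.60)]
date_b = [((1, 1, 11, 11), 0.80), ((50, 50, 60, 60), 0.70)]

fused = nms_fuse(date_a + date_b)
for box, score in fused:
    print(box, score)
```

In this toy case the two overlapping boxes near the origin collapse into the higher-scoring one, while the crown seen on only one date is retained, which is the complementary behavior late fusion is meant to provide.
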

| Training Configuration | Training Data | Bbox_mAP (Self ¹) | Segm_mAP (Self) | Bbox_mAP (Common ²) | Segm_mAP (Common) |
|---|---|---|---|---|---|
| Single-temporal | 23Feb | 0.798 | 0.823 | 0.383 | 0.398 |
| | 23Mar | 0.773 | 0.791 | 0.404 | 0.419 |
| | 24Nov | 0.818 | 0.852 | 0.346 | 0.361 |
| Bi-temporal | 24Nov + 23Feb | 0.845 | 0.874 | 0.637 | 0.665 |
| | 24Nov + 23Mar | 0.81 | 0.834 | 0.57 | 0.588 |
| | 23Feb + 23Mar | 0.739 | 0.769 | 0.619 | 0.651 |
| Tri-temporal | 24Nov + 23Feb + 23Mar | 0.717 | 0.748 | - | - |
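
One way to read the table above is to ask how much of the Self score each configuration retains on the Common test set. The sketch below does that arithmetic with the Segm_mAP values copied from the table. The footnotes defining "Self" and "Common" are not part of this excerpt, so the interpretation as a generalization measure is an assumption, and `retained_fraction` is a hypothetical helper name.

```python
# Segm_mAP values copied from the table: (Self, Common)
single_temporal = {
    "23Feb": (0.823, 0.398),
    "23Mar": (0.791, 0.419),
    "24Nov": (0.852, 0.361),
}
bi_temporal = {
    "24Nov + 23Feb": (0.874, 0.665),
    "24Nov + 23Mar": (0.834, 0.588),
    "23Feb + 23Mar": (0.769, 0.651),
}


def retained_fraction(self_score: float, common_score: float) -> float:
    """Share of the Self score that survives on the Common test set."""
    return common_score / self_score


retained_single = {k: retained_fraction(*v) for k, v in single_temporal.items()}
retained_bi = {k: retained_fraction(*v) for k, v in bi_temporal.items()}

for name, frac in {**retained_single, **retained_bi}.items():
    print(f"{name:15s} retains {frac:.1%} of its Self Segm_mAP")
```

Under this reading, every bi-temporal configuration retains a larger share of its Self score than any single-temporal one, which is consistent with the highlight on phenological overfitting.
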

| Training Data | Weight Ratio (24Nov:23Feb:23Mar) | Bbox_mAP | Segm_mAP |
|---|---|---|---|
| 24Nov + 23Feb + 23Mar | 1:1:1.0 | 0.717 | 0.748 |
| | 1:1:0.7 | 0.746 | 0.778 |
| | 1:1:0.5 | 0.756 | 0.786 |
| | 1:1:0.3 | 0.758 | 0.791 |
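
To make the weight ratios above easier to compare, the sketch below normalizes each ratio so that the three weights sum to one and reports the resulting share of 23Mar alongside the change in Segm_mAP relative to equal weighting. How the weights enter training (for example as sampling probabilities or loss weights) is not stated in this excerpt, so the normalization is only an illustrative assumption, and `normalize` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Ratio = Tuple[float, float, float]  # (24Nov, 23Feb, 23Mar)


def normalize(ratio: Ratio) -> List[float]:
    """Scale relative weights so that they sum to 1."""
    total = sum(ratio)
    return [w / total for w in ratio]


# Segm_mAP values copied from the table, keyed by weight ratio
segm_map: Dict[Ratio, float] = {
    (1.0, 1.0, 1.0): 0.748,
    (1.0, 1.0, 0.7): 0.778,
    (1.0, 1.0, 0.5): 0.786,
    (1.0, 1.0, 0.3): 0.791,
}

baseline = segm_map[(1.0, 1.0, 1.0)]
mar_shares = []
for ratio, score in segm_map.items():
    share_mar = normalize(ratio)[2]  # normalized weight of the 23Mar imagery
    mar_shares.append(share_mar)
    print(f"{ratio}: 23Mar share = {share_mar:.3f}, "
          f"Segm_mAP change = {score - baseline:+.3f}")
```

In the table, Segm_mAP rises as the relative weight of 23Mar falls, although the increments shrink between 1:1:0.5 and 1:1:0.3.
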

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Lin, W.; Jiang, H.; Ku, M.; Zhang, J.; Wang, B. A Multi-Temporal Instance Segmentation Framework and Exhaustively Annotated Tree Crown Dataset for a Subtropical Urban Forest Case. Remote Sens. 2026, 18, 1082. https://doi.org/10.3390/rs18071082

