Domain-Adapted Supervised Learning for Tree Species Mapping Using UAV Multispectral Data

Natesan, Sowmya; Vepakomma, Udayalakshmi; Armenakis, Costas

doi:10.3390/f17070738


by Sowmya Natesan ¹,*, Udayalakshmi Vepakomma ²,† and Costas Armenakis

¹ Department of Earth and Space Science and Engineering, Lassonde School of Engineering, York University, Toronto, ON M3J 1P3, Canada

² FPInnovations, Pointe-Claire, QC H9R 3J9, Canada

* Author to whom correspondence should be addressed.

† Current address: GeoClarET, 6705 Kirwan, Côte St-Luc, QC H4V 1B6.

Forests 2026, 17(7), 738; https://doi.org/10.3390/f17070738 (registering DOI)

Submission received: 18 April 2026 / Revised: 17 June 2026 / Accepted: 19 June 2026 / Published: 25 June 2026

(This article belongs to the Special Issue Classification of Forest Tree Species Using Remote Sensing Technologies: Latest Advances and Improvements)


Abstract

Individual tree species classification is essential for detailed forest inventories, ecosystem monitoring, and biodiversity assessment. While UAV-acquired RGB and multispectral (MS) imagery have advanced tree species mapping, most studies focus on a single sensor type. In practice, UAV platforms carry diverse sensors with varying spatial resolutions, spectral bands, radiometric responses, and noise characteristics, introducing domain shifts that limit model generalization across datasets. To overcome these challenges, we propose a supervised cross-sensor transfer learning approach, leveraging a DenseNet-121 model pretrained on high-resolution UAV RGB imagery to improve classification on lower-resolution multispectral imagery with limited labelled data. The adapted model achieved 75% overall accuracy and a macro-F1 score of 0.706, significantly improving over models trained from scratch. Its performance was further evaluated on downsampled UAV MS imagery simulating conventional airborne multispectral photographs, demonstrating robustness and practical applicability for regional-scale forest inventories. This study highlights cross-domain transfer learning as a pathway toward sensor-independent, efficient, and operationally scalable tree species classification.

Keywords: UAV; tree species classification; RGB images; multispectral images; deep learning; transfer learning; domain adaptation; Canadian forestry

1. Introduction

Tree species classification is an important part of sustainable forest management in Canada, supporting biodiversity conservation, carbon accounting, wildfire risk assessment, habitat protection, and the development of climate-resilient strategies [1,2,3]. Accurate species-level information provides a solid basis for both conservation planning and industry applications. Identifying species at the individual tree level enhances forest inventories by producing detailed, high-resolution maps of forest composition, thereby supporting ecosystem monitoring, targeted management practices, and informed decision-making for conservation and resource use. Although significant research has been done on species classification using remote sensing methods, a fully automated and reliable computational solution has yet to be achieved [4,5,6,7]. According to a recent review of forest inventories in Canada, many provinces identified automated tree species recognition and mapping as a research priority [8]. Despite advances in remote sensing technology, most operational forest inventories continue to characterize forests at broad levels, such as percent hardwood and softwood or broad species-mix categories in a given stand, often with limited accuracy or high uncertainty [9]. The literature highlights the need for improved individual tree detection, since species-specific data are crucial for timber and non-timber valuation, silviculture planning, pest and disease management, biodiversity assessment, and understanding forest succession.

Recent progress in artificial intelligence (AI), gains in computational efficiency, and the availability of high-resolution imagery from unmanned aerial vehicles (UAVs, or drones) have opened new possibilities for detailed forest monitoring, particularly at the individual tree level [10,11,12]. Although ultra-high-density LiDAR has been explored for structurally separable species, RGB (natural colour) and multispectral (MS) UAV images have been widely used for tree species classification at the individual tree level because they are easier to acquire and offer better spectral separability [13,14,15,16]. Compared with RGB sensors, off-the-shelf UAV multispectral sensors offer additional advantages for species classification by sensing beyond the visible spectrum. For instance, the MicaSense RedEdge and Altum, the Parrot Sequoia, and the Sentera 6X Thermal Pro also capture red-edge and near-infrared (NIR) bands. These additional bands can enhance tree species discrimination by capturing subtle differences in leaf reflectance.

Despite these advances, existing tree species classification approaches remain constrained by several methodological and operational challenges. In mixed and structurally complex forests, species discrimination is difficult because different species may exhibit similar canopy appearances and overlapping spectral signatures. A recent work has shown that overlapping spectral responses can limit the separability of forest composition and diversity patterns, indicating that spectral information alone may not always clearly distinguish complex vegetation conditions [17]. In addition, spectral reflectance can vary within the same species due to differences in canopy structure, biochemistry, physiology, phenology, tree health, illumination, and acquisition conditions [17]. At the individual-tree level, these challenges are further amplified by crown overlap, forked crowns, variability in tree architecture, and tree density, which can affect both crown delineation and species classification accuracy [10]. Similar crown shapes and spectral reflectance among species, particularly in mixed conifer–broadleaf forests, can further increase inter-species confusion [10]. As a result, intra-species variability may be comparable to, or even greater than, inter-species differences, particularly for visually or spectrally similar species in complex mixed forests. Some previous studies have also highlighted that overlapping crowns, irregular canopy structures, spectral similarity between species, and within-species variability complicate both tree crown delineation and classification workflows [16,18]. In addition, classification performance is strongly influenced by spatial resolution and scale of analysis, as individual pixels in very high-resolution imagery may represent different canopy components, such as leaves, branches, bark, shadows, or canopy gaps, making the spectral signature of a single species difficult to define consistently [16,19]. 

Conventional machine learning approaches can perform well when suitable predictor variables are available; however, their success depends strongly on manually engineered spectral, textural, structural, or thermal features, and the most relevant features can vary with sensor type, forest structure, species separability, and study site conditions [16,19,20]. Multi-sensor fusion can improve classification by combining complementary information from different sensors, but it also increases data acquisition, preprocessing, co-registration, and feature extraction requirements, which may limit operational scalability [19,20].

Several studies have reported high classification accuracies under specific data and sensor configurations. For example, UAV-based multispectral point-cloud classification using a dual attention graph convolutional network achieved an overall accuracy of 89.80% and a macro-F1 score of 87.80% [21]. Similarly, deep learning models trained on UAV-based RGB crown images achieved strong performance in a temperate forest, with summer imagery producing an average F1-score of 0.96 and fall imagery producing F1-scores greater than 0.90 for several advanced models [11].

LiDAR-based deep learning studies have further shown the value of three-dimensional structural information for tree classification. In one such study, PointCNN and 3DmFV-Net were evaluated for classifying broader tree categories, including coniferous trees, deciduous trees, and dead-tree classes, using airborne LiDAR data in combination with intensity and multispectral features. The results showed that PointCNN achieved a test accuracy of up to 87.0% when 3D point coordinates, laser intensity, and multispectral information were included, and that the addition of multispectral features improved classification accuracy by up to 16.3 percentage points [22]. This highlights the importance of combining structural and spectral information, particularly when separating classes that cannot be reliably distinguished using geometry alone. However, the classification was conducted at the broader functional or structural class level rather than at the individual species level, indicating that species-level discrimination remains a more challenging task, especially in mixed forests with spectrally or structurally similar species. Similarly, a UAV-LiDAR-based comparison of machine learning and deep learning methods for four tree species found that PointMLP achieved the highest overall accuracy of 96.94%, followed by random forest and support vector machine models [23]. However, this study also noted that species with similar crown structures were more prone to misclassification and that broader validation over larger areas is needed before such models can be operationally generalized.

Recent multisensor studies have also emphasized the benefits of integrating spectral, structural, and object-based information in complex forest environments. An object-based deep learning framework using UAV hyperspectral imagery and LiDAR data was developed for tree species classification in natural secondary forests [24]. The workflow combined U-Net and SLIC for individual tree crown delineation and compared 1D-, 2D-, and 3D-CNN models with and without a convolutional block attention module. The addition of the attention mechanism improved the performance of all CNN models, and the 1D-CNN with attention achieved the highest overall accuracy when selected hyperspectral and LiDAR features were used. The study also showed that red-edge and near-infrared spectral features, texture measures, vegetation indices, and LiDAR height features contributed importantly to species discrimination. At the same time, it highlighted persistent challenges associated with overlapping crowns, crown-boundary delineation, input patch size, and the labour-intensive nature of producing labelled individual-tree samples.

Cross-platform LiDAR-based transfer learning has also shown strong potential for improving generalization across heterogeneous sensors and data-limited target domains. For example, a recent framework using geometry-consistent preprocessing, surface orientation, multi-scale density features, and staged fine-tuning achieved an overall accuracy of 94.8% and a mean F1-score of 91.9% when adapting from a multi-species pretraining dataset to an unseen UAV LiDAR target dataset [25]. The same study further showed that transfer learning improved training efficiency compared with training from scratch, with substantially faster convergence under limited-data conditions, demonstrating its potential for scalable cross-platform point-cloud analysis in forest monitoring.

Collectively, these studies demonstrate that high tree species classification accuracies can be achieved using rich structural information, hyperspectral or multispectral features, data fusion, advanced deep learning architectures, attention mechanisms, or large and diverse training datasets. However, these high accuracies are generally achieved within specific domains, such as RGB imagery, hyperspectral imagery, multispectral point clouds, or LiDAR point clouds, and often depend on sensor-specific preprocessing, high-density point clouds, or favourable sample conditions. Most existing transfer learning studies in remote sensing focus on land-use classification, seasonal transfer, or single-modality workflows, while cross-sensor transfer for individual tree species classification remains relatively limited. Therefore, despite strong reported performance in recent studies, improving generalization across sensors, spatial resolutions, and labelled-data availability remains an important research gap.

Beyond traditional remote sensing and machine learning algorithms, recent years have seen the wide adoption of Convolutional Neural Networks (CNNs) for tree species classification due to their ability to learn hierarchical spatial and spectral representations directly from image data [2,11,26]. Conventional machine learning classifiers, such as Random Forest (RF) and Support Vector Machine (SVM), remain valuable and widely used in tree species classification [6,19], particularly when labelled samples are limited and well-designed predictor variables are available. However, their performance has been shown to vary across species classes, sensor types, study regions, and acquisition conditions [6,19]. In particular, these methods tend to perform well for dominant or spectrally distinct species but exhibit reduced accuracy for less represented or spectrally similar classes [15,16]. A key limitation is that these methods generally rely on manually engineered spectral, textural, structural, or vegetation-index features, and the quality of the classification is therefore strongly influenced by the choice and transferability of these features [13,27]. Features optimized for one dataset, sensor configuration, or forest condition may not generalize effectively to another, increasing data preparation requirements and potentially limiting model portability [6,19].

In contrast, CNNs perform feature learning directly from real-world imagery through convolutional filters, allowing the model to learn multi-scale patterns such as edges, textures, crown shapes, branching structure, and higher-level canopy characteristics that may be difficult to define manually [13,27]. This is particularly relevant for UAV-based individual tree species classification, where very high spatial resolution imagery contains fine-scale canopy patterns that can support discrimination among visually similar species [1,11,14,28]. Reviews of CNN applications in vegetation remote sensing have shown that CNNs frequently outperform shallow machine learning methods, largely because they can exploit spatial context and reduce the need for handcrafted feature engineering [27]. Previous studies have also demonstrated that CNN-based approaches can achieve strong classification performance using UAV imagery and can remain robust under varying acquisition conditions when trained with diverse image samples [11,13].

Nevertheless, CNNs are not inherently superior in all situations. They typically require larger training datasets, greater computational resources, and more careful tuning than RF or SVM, and overly complex architectures may overfit or underperform when sample sizes are small and data diversity is limited [13,27]. Therefore, under limited labelled data conditions, CNNs are most appropriate when combined with strategies such as transfer learning, data augmentation, regularization, and careful architecture selection to improve generalization [11,13,27]. In this study, CNNs were preferred in order to exploit learned spatial–spectral representations from tree-crown imagery and to evaluate whether knowledge from a larger RGB source dataset could improve multispectral classification under data-scarce conditions. Conventional classifiers such as RF and SVM remain important benchmarks, but the proposed CNN-based framework was selected to support end-to-end feature learning and cross-domain transfer from high-resolution UAV RGB imagery to limited multispectral data.

Despite the advantages of CNN-based approaches, their operational scalability in UAV-based tree species classification remains limited. Training CNNs demands large labelled datasets that are costly, time-consuming, and logistically difficult to obtain, while UAV data collection is itself constrained by coverage area, cost, and flight regulations. Although considerable progress has been made in tree species classification using UAV-acquired RGB, multispectral, and LiDAR-based datasets, most studies focus on optimizing classification within a specific sensor type, season, platform, or data domain. In practice, UAV platforms are equipped with diverse sensors that differ in spatial resolution, spectral band configuration, radiometric response, and noise characteristics. Tree species classification studies using RGB and multispectral imagery have likewise shown that these data sources contribute different types of information, with RGB imagery providing fine spatial and textural detail and multispectral imagery contributing additional spectral information [18]. These variations introduce domain shifts between datasets, causing models trained on imagery from one sensor to learn sensor-specific features that may not generalize well to data acquired from another system, thereby limiting broader applicability and operational scalability. More broadly, domain adaptation studies in remote sensing have shown that distribution differences between source and target domains can arise from variations in sensor characteristics, imaging conditions, spatial resolution, and scene properties [18]. Recent research has also emphasized that robustness and transferability across seasons, sites, larger areas, and sensors remain a critical challenge for deep learning pipelines, in addition to the cost of creating training data (data collection, delineation, and labelling) [11].

These limitations highlight the need for cross-domain supervised transfer learning, where knowledge gained from one sensor (e.g., RGB imagery) is leveraged to improve performance on another sensor (e.g., multispectral imagery) with minimal additional training. Such strategies offer the potential to reduce dependence on large labelled datasets, improve efficiency, and enable more robust, sensor-independent tree species classification.

Although transfer learning has been widely applied in remote sensing and vegetation classification, many existing approaches rely on models pretrained on general-purpose datasets such as ImageNet. Such pretrained models can improve performance when labelled data are limited, as demonstrated in UAV-based deciduous versus evergreen tree classification using winter orthomosaic imagery, where ImageNet-based transfer learning improved performance compared with training without transfer learning [29]. However, the benefit of generic pretraining is not always consistent and may depend on the similarity between the source and target domains, the classification task, and the network architecture. For example, pretraining has been shown to provide limited or inconsistent improvement in tree seedling detection, with its effectiveness varying according to network complexity [30]. Similarly, full training has been reported to outperform fine-tuning of ImageNet-pretrained backbones in remote sensing classification tasks [31], while another study suggests that task-specific training may outperform generic pretraining when sufficient training data and computational resources are available [32]. These findings indicate that features learned from generic image datasets may not always be optimal for vegetation-focused remote sensing applications.

Therefore, the present study adopts a more domain-relevant transfer learning strategy. Instead of relying solely on generic ImageNet-based pretraining, we adapt a DenseNet-121 model previously trained on a large, high-resolution UAV RGB tree-crown dataset acquired from the same study site. Although the source and target datasets were acquired using different sensors, they represent similar tree species, canopy structures, and site-specific ecological conditions. This makes the transferred representations more relevant to the target multispectral classification task and provides a more appropriate framework for supervised cross-domain and cross-modal transfer from UAV RGB imagery to UAV multispectral imagery.

In the context of tactical forest management, aerial imagery is routinely collected by provincial governments across Canada with the same spectral configuration as off-the-shelf UAV multispectral sensors, such as red, green, blue (RGB), red-edge, and NIR bands, making it compatible in terms of spectral content. Despite its operational utility, traditional aerial multispectral imagery has long struggled to support reliable individual tree species classification because its spatial resolution (typically 0.5–2 m) is too coarse to capture crown-level spectral and textural details, especially in dense boreal stands where crowns overlap, and mixed pixels are common [33,34]. These resolution constraints, combined with radiometric inconsistencies across airborne campaigns, have hindered species-level mapping [33], thus restricting it to stand-scale assessment, generally completed through manual interpretation. The performance for automated tree species classification has been limited as well, primarily due to its coarser spatial resolution failing to capture species-specific crown texture [33], the labour-intensive process of generating labelled training data [34,35], and the sensitivity of object-based methods to segmentation errors and mixed pixels [36]. If models trained on high-resolution UAV multispectral imagery could be effectively adapted to classify species in aerial photographs, this would significantly expand the applicability of deep learning to forest inventories and ecological monitoring at larger scales.

In this study, we developed a tree species classification model for UAV-based multispectral imagery, using domain adaptation to address the limited availability of labelled samples, with potential for scalability to aerial platforms with similar spectral configurations. This was achieved through the following contributions:

1. Baseline modelling with limited multispectral data: We develop and evaluate a baseline individual tree species classification model using UAV-acquired multispectral imagery with limited labelled samples, examining the role of increased spectral dimensionality under data-scarce conditions.

2. Cross-domain transfer learning from RGB to multispectral imagery: We investigate the potential of supervised cross-domain transfer learning by adapting a convolutional neural network (CNN) model trained on high-resolution UAV RGB imagery with a large, labelled dataset to lower-resolution multispectral imagery with limited labelled samples.

3. Assessment of scalability to aerial imagery: We assess the feasibility of scaling the adapted model to aerial platforms by applying it to down-sampled UAV multispectral imagery that simulates the spatial characteristics of multispectral aerial photographs used in regional forest inventories.

We demonstrate these contributions through an application in part of a complex boreal mixedwood forest in Canada.

2. Materials and Methods

2.1. Study Area

The research was conducted in a ~20 ha forested site designated for ongoing monitoring, located within the Petawawa Research Forest (maintained by Natural Resources Canada) in Ontario, Canada. The site represents multiple stand conditions and species mixes typical of sub-boreal mixedwoods. About 10 ha of the site is harvested under a shelterwood system to manage pine species; a one-hectare area of older stand with naturally developed mixed species is retained as a control block for continuous reference; and the northern part of the site consists of an unmanaged younger stand of naturally developed mixed species. The tree composition of the site is dominated by eastern white pine (Pinus strobus) and red pine (Pinus resinosa), along with mixtures of balsam fir (Abies balsamea), white spruce (Picea glauca), red maple (Acer rubrum), red oak (Quercus rubra), and white birch (Betula papyrifera). In swampy areas, eastern white cedar (Thuja occidentalis) is present, occasionally mixed with black ash (Fraxinus nigra) and a small fraction of black spruce (Picea mariana). Canopy heights range from 6 to 36 m. The terrain is rocky but fertile and is relatively flat.

2.2. Data Acquisition

Multispectral and RGB images of the study area were acquired in separate missions. Multi-temporal UAV RGB images were acquired with a Nikon D810 DSLR camera during both leaf-on conditions (summer 2016 and summer 2018) and leaf-off conditions (fall 2015) to maximize seasonal and temporal variation. Multispectral imagery was acquired during the summer of 2018 using a MicaSense RedEdge-M sensor (MicaSense Inc., Seattle, WA, USA) mounted on a DJI Matrice 210 RTK (SZ DJI Technology Co., Ltd., Shenzhen, China) UAV platform (Figure 1a). Images of the calibration reflectance panel, laid flat, were taken from a height of 1 m just before the start and right after the completion of the multispectral image acquisition.

The on-board RTK was fully integrated to provide centimetre-level positional and vertical accuracy for all images. The RTK base station was established in a clear opening approximately 800–900 m northwest of the flight area. Within the UAV flight area, the farthest image acquisition location was more than 200 m from the GNSS base station. Flights were conducted at an average altitude of 60 m above ground, which provided a ground sampling distance of approximately 8 cm, though the resolution varied slightly across images. The flights followed a lawnmower pattern with 80% forward and side overlap to ensure the necessary coverage, accurate image mosaicking, and 3D reconstruction.
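
As a rough illustration of what 80% overlap implies for flight planning, the spacing between neighbouring exposures follows from the ground footprint of one image. The sketch below uses the approximate 8 cm ground sampling distance reported above; the 1280 × 960 pixel image size, the alignment of the short image side with the flight direction, and the function name are assumptions made for illustration and are not taken from the study.

```python
def exposure_spacing(gsd_m, pixels, overlap):
    """Distance between neighbouring image centres for a given overlap fraction."""
    footprint_m = gsd_m * pixels          # ground length covered by one image side
    return footprint_m * (1.0 - overlap)  # only the non-overlapping strip is new ground

GSD_M = 0.08             # approximate ground sampling distance reported in the text
COLS, ROWS = 1280, 960   # ASSUMED image size in pixels (not stated in the text)
OVERLAP = 0.80           # forward and side overlap reported in the text

along_track = exposure_spacing(GSD_M, ROWS, OVERLAP)   # short side along the flight line (assumed)
across_track = exposure_spacing(GSD_M, COLS, OVERLAP)  # spacing between adjacent flight lines
print(f"along-track spacing ~{along_track:.1f} m, flight-line spacing ~{across_track:.1f} m")
```

Under these assumptions, each image contributes only a 20% strip of new ground in either direction, so a point on the ground appears in roughly five consecutive images along a flight line, which is what supports reliable mosaicking and 3D reconstruction.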

The MicaSense RedEdge-M (Figure 1b) is a compact multi-band rig of five cameras with complementary metal-oxide-semiconductor (CMOS) sensors, a type of imaging sensor commonly found in phones and digital single-lens reflex (DSLR) cameras (https://support.micasense.com/). Each camera in the rig is equipped with an individual narrow-band filter that passes only a discrete section of the visible or near-infrared part of the spectrum. The spectral characteristics of the MicaSense RedEdge-M are shown in Table 1.

Radiometric calibration was performed to convert at-sensor radiance (recorded as digital numbers) into absolute surface reflectance while accounting for variations in ambient light caused by weather and solar angle, as well as for sensor characteristics. The relationship between at-sensor radiance and surface reflectance is typically established using targets of known reflectance that replicate natural reference surfaces (Figure 1d). Following the standard procedure recommended by MicaSense [37], images of a calibrated reflectance panel (CRP) (Figure 1c) were captured before and after each flight to derive band-specific reflectance factors. These factors were computed from the known absolute reflectance values of the CRP and the average radiance measured from the panel images, so that raw digital numbers could be converted to surface reflectance. This procedure corrects for sensor- and illumination-related variability, enabling consistent spectral comparison across flight missions. The calibration process was automated in the Metashape v2.3 photogrammetric software [38] by providing the reflectance panel images collected at the beginning and end of each mission, allowing the software to apply the derived factors and convert the imagery to surface reflectance.
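
The panel-based conversion described above can be sketched for a single band as follows. This is a minimal illustration rather than the Metashape implementation: the function names, the panel reflectance value, and the radiance numbers are hypothetical, and averaging the pre-flight and post-flight factors is an assumption about how the two panel captures might be combined.

```python
import numpy as np

def reflectance_factor(panel_radiance, panel_reflectance):
    """Band-specific factor: known panel reflectance / mean radiance over the panel."""
    return panel_reflectance / float(np.mean(panel_radiance))

def to_reflectance(radiance, factor):
    """Scale an at-sensor radiance image of one band to surface reflectance."""
    return np.asarray(radiance, dtype=float) * factor

# Hypothetical radiance samples over the panel for one band.
panel_pre = np.array([[0.10, 0.11], [0.10, 0.09]])    # captured before the flight
panel_post = np.array([[0.12, 0.12], [0.11, 0.13]])   # captured after the flight
RHO_PANEL = 0.5   # hypothetical certified panel reflectance for this band

# ASSUMPTION: combine the two captures by averaging their factors.
factor = 0.5 * (reflectance_factor(panel_pre, RHO_PANEL)
                + reflectance_factor(panel_post, RHO_PANEL))

scene = np.array([[0.02, 0.04], [0.06, 0.08]])        # hypothetical scene radiance
reflectance = to_reflectance(scene, factor)
print(reflectance)
```

In practice the same computation is repeated independently for each of the five bands, since both the panel reflectance and the sensor response differ by band.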

2.3. Multispectral UAV Image Pre-Processing and Preparation of Labelled Dataset

The images were processed in the Metashape v2.3 photogrammetric software to generate a digital surface model (DSM), a surface mesh, and an orthomosaic image, following the standard workflow available within the tool (the procedures are detailed in Metashape’s online manual). The spatial resolution of the resulting orthomosaic was 7.6 cm (Figure 2). All subsequent processing steps, namely delineating the individual crowns, labelling, and extraction, were carried out in ArcGIS Pro 3.1.0.

In a previous study, we developed a tree species identification model based on the Dense Convolutional Network (DenseNet-121) using a large crown database derived from these multi-temporal UAV RGB orthoimages [28]. The orthomosaic of the RGB images acquired in 2018 is shown in Figure 3. Since the study area was the same, we used the labelled RGB tree crown polygons to generate MS crown samples for the various species. The RGB crown polygons were overlaid on the multispectral orthomosaic to check spatial alignment between the datasets. As both orthomosaics were georeferenced, the polygons were largely well aligned with the tree crowns in the multispectral imagery. Minor manual adjustments were made where slight spatial discrepancies were observed, to ensure accurate crown delineation. The crown segmentation was subsequently verified by an experienced photo interpreter and forestry experts. Each tree crown polygon was then converted to a minimum bounding rectangle to standardize the extraction process (Figure 4).

To develop the reference dataset, field observations were collected during the fall of 2018 with support from professional foresters. Individual trees representing both hardwood and softwood species were identified in the field using characteristics such as canopy appearance, bark texture, leaf morphology, and seasonal coloration patterns. A georeferenced PDF version of the orthomosaic was loaded into the Avenza Maps application to record and spatially tag the identified trees directly in the field. Sampling was designed to capture variability across species and forest stand types, including mixed conifer, conifer-dominated, and mixed hardwood stands. More than 300 trees were field-verified and linked to their corresponding crowns in the UAV imagery.

These field-confirmed trees were subsequently used as reference samples for the interpretation and labelling of additional crowns in the multispectral orthomosaic. Crown labels were assigned through detailed visual interpretation and independently cross-checked against the corresponding RGB orthomosaic by trained forestry technicians. The final labelled dataset was further reviewed by forestry experts to ensure label consistency and delineation accuracy. Through this process, a total of 1632 labelled tree crowns were generated for the multispectral dataset.

To maintain representative species distributions across datasets, the samples were partitioned using a stratified approach. Of the total labelled crowns, 70% were allocated to the model development dataset, of which 20% was further used for internal validation and hyperparameter tuning during training. The remaining 30% was set aside as an independent test set and was not used during any stage of model training or validation. The number of labelled training samples for each species is presented in Table 2.
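
The partitioning described above (a 30% stratified test set, then 20% of the remaining development set held out for validation) can be sketched as below. This is an illustrative sketch only: the helper name, the seeds, the per-class rounding rule, and the species counts are hypothetical and do not reproduce the counts in Table 2.

```python
import random
from collections import defaultdict

def stratified_split(labels, fraction, seed=0):
    """Return (selected, remaining) index lists, drawing `fraction` of every class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    selected, remaining = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * fraction)   # per-class rounding (assumed rule)
        selected.extend(indices[:cut])
        remaining.extend(indices[cut:])
    return selected, remaining

# Hypothetical crown labels; the real class counts are given in Table 2.
labels = ["pine"] * 100 + ["maple"] * 50 + ["birch"] * 20

# Step 1: hold out 30% of each species as the independent test set.
test_idx, dev_idx = stratified_split(labels, 0.30, seed=1)

# Step 2: within the development set, hold out 20% of each species for validation.
dev_labels = [labels[i] for i in dev_idx]
val_pos, train_pos = stratified_split(dev_labels, 0.20, seed=2)
val_idx = [dev_idx[i] for i in val_pos]
train_idx = [dev_idx[i] for i in train_pos]

print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting per class rather than over the pooled samples keeps rare species represented in all three subsets, which matters when some classes have only a few dozen crowns.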

The corresponding multispectral image subsets representing individual tree crowns were then extracted using these minimum bounding rectangles. Detailed information on the preprocessing workflow and the preparation of the training and testing datasets can be found in [28]. Figure 5 presents representative examples of individual tree crown images extracted from the multispectral orthomosaic.
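
The extraction step can be illustrated with a small sketch that converts a crown polygon into a bounding rectangle and crops the corresponding window from a north-up orthomosaic array. The study performed this step in ArcGIS Pro; the code below is not that workflow. It assumes an axis-aligned rectangle (an envelope), and the function names, coordinates, and the 0.5 m pixel size used for readable numbers are hypothetical.

```python
import numpy as np

def bounding_rectangle(vertices):
    """Axis-aligned bounding rectangle (min_x, min_y, max_x, max_y) of a polygon."""
    xs, ys = zip(*vertices)
    return min(xs), min(ys), max(xs), max(ys)

def crop_crown(image, bbox, origin_x, origin_y, pixel_size):
    """Crop a (bands, rows, cols) north-up image to a map-coordinate rectangle.

    origin_x, origin_y are the map coordinates of the top-left image corner.
    """
    min_x, min_y, max_x, max_y = bbox
    col0 = int(np.floor((min_x - origin_x) / pixel_size))
    col1 = int(np.ceil((max_x - origin_x) / pixel_size))
    row0 = int(np.floor((origin_y - max_y) / pixel_size))  # rows increase southward
    row1 = int(np.ceil((origin_y - min_y) / pixel_size))
    rows, cols = image.shape[1:]
    # Clip the window to the image extent before slicing.
    row0, row1 = max(row0, 0), min(row1, rows)
    col0, col1 = max(col0, 0), min(col1, cols)
    return image[:, row0:row1, col0:col1]

# Hypothetical five-band orthomosaic and crown polygon (map units in metres).
ortho = np.zeros((5, 100, 100))
crown_polygon = [(1010.0, 1990.0), (1013.0, 1989.0), (1012.0, 1986.0), (1009.5, 1987.5)]

bbox = bounding_rectangle(crown_polygon)
patch = crop_crown(ortho, bbox, origin_x=1000.0, origin_y=2000.0, pixel_size=0.5)
print(bbox, patch.shape)
```

Because crowns differ in size, the cropped patches have different dimensions and would normally be resized or padded to a common input size before being passed to a CNN.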

3. Methods

Figure 6 provides a schematic of the methodological workflow, summarizing the three main experimental phases. The workflow illustrates the progression from baseline modelling using limited UAV multispectral data, to supervised RGB-to-multispectral domain adaptation, followed by aerial image simulation to assess the scalability of the adapted model.

3.1. Tree Species Classification Models Using Multispectral Images

To establish a baseline for individual tree species classification using UAV multispectral imagery, we conducted a series of experiments in which DenseNet-121 and EfficientNet CNN architectures [39,40] were trained from scratch using the limited labelled multispectral (MS) dataset. The selection of these two architectures was motivated by their complementary design philosophies and their suitability for data-constrained learning scenarios.

DenseNet-121 was chosen for its densely connected architecture, in which feature maps are concatenated within dense blocks to promote feature reuse and efficient information flow across layers. This connectivity pattern improves gradient propagation, mitigates the vanishing gradient problem, and encourages the learning of compact yet discriminative representations. Such characteristics are particularly advantageous in data-constrained settings, as they enhance learning efficiency while controlling model complexity through transition layers incorporating batch normalization, 1 × 1 convolutions, and 2 × 2 average pooling. Moreover, a prior comparative study [7] has shown that DenseNet-121 can outperform several lightweight and widely used backbone networks in tree species classification tasks. This improved performance has been attributed to its dense connectivity and effective feature reuse, which enable the network to better capture subtle inter-class relationships among visually similar tree species. These properties make DenseNet-121 particularly well suited for multispectral tree species classification, where discriminative features are often subtle and highly correlated across classes. The schematic representation of the DenseNet architecture is shown in Figure 7.

EfficientNet, in contrast, was selected to evaluate a more recent architecture based on compound scaling of network depth, width, and input resolution. Rather than scaling these dimensions independently, EfficientNet applies a balanced and systematically optimized scaling strategy that improves accuracy while maintaining strong parameter efficiency. This makes it well suited for applications involving limited labelled samples and varying spatial resolutions, such as UAV multispectral imagery. Its use of mobile inverted bottleneck (MBConv) blocks and squeeze-and-excitation modules further enables adaptive channel-wise feature recalibration, which may be beneficial for multispectral data where different spectral bands contribute unequally to species discrimination.

Multiple trials were conducted under different configurations of loss functions and regularization strategies to assess model performance as shown in Table 3. Initial experiments with EfficientNet variants (B0, B3, and B7), which represent progressively scaled versions of a common baseline architecture with increasing network depth, width, and input resolution, were conducted. Specifically, EfficientNet-B0 serves as the baseline model with relatively low parameter count (~5 M) and an input resolution of 224 × 224, EfficientNet-B3 represents a mid-sized configuration (~12 M parameters, 300 × 300 resolution) offering a balance between capacity and efficiency, while EfficientNet-B7 corresponds to a substantially larger model (~66 M parameters, 600 × 600 resolution) designed for maximum representational power at significantly higher computational cost. These models were trained from scratch using categorical cross-entropy loss and dropout regularization but resulted in relatively poor classification performance with limited training data.

DenseNet-121 achieved comparatively better, though still moderate, results, reaching 56% overall accuracy when class weights were applied to address class imbalance using categorical cross-entropy loss. Replacing the loss function with focal loss led to a modest improvement, increasing accuracy to 62%, while the addition of L2 regularization did not yield further gains.

3.2. Tree Species Classification Models Using Multispectral Images and Supervised Domain Adaptation Using RGB-Based Source Model

The moderate classification performance observed in Section 3.1 highlights key challenges associated with training deep learning models directly on UAV-acquired multispectral imagery under data-scarce conditions. In particular, the relatively low accuracies can be attributed to two primary factors: (i) the limited availability of labelled multispectral training samples, which restricts the model’s ability to learn robust and generalizable feature representations, and (ii) the comparatively lower spatial resolution of multispectral imagery relative to higher resolution RGB imagery, which reduces the level of structural and textural detail available for discriminating between tree species. These constraints hinder the effectiveness of conventional supervised learning approaches when applied solely within the multispectral domain.

To address these limitations, we propose a supervised domain adaptation framework that leverages knowledge learned from a source domain consisting of high-resolution RGB imagery with abundant labelled data and transfers it to a target domain comprising lower-resolution multispectral imagery with limited labelled samples. By exploiting the rich feature representations learned from the source domain, the proposed approach aims to improve generalization performance in the target domain despite data scarcity. This cross-domain, cross-modal transfer learning strategy was used to transfer discriminative representations learned from RGB imagery while adapting to differences in spectral and spatial characteristics between RGB and multispectral data.

3.2.1. Tree Species Classification Models Using RGB Imagery

In our earlier work [28], we developed a DenseNet-121-based deep learning model for individual tree-level classification of softwood species using high-resolution RGB imagery collected with a consumer-grade camera mounted on a UAV platform. The model was subsequently extended to classify nine tree species, incorporating both softwood and hardwood classes. The model was trained on a large and diverse dataset of over 10,000 cropped RGB images of individual tree crowns, collected over a three-year period under leaf-on and leaf-off conditions. The dataset captured substantial seasonal, temporal, illumination, and angular variability, including differences in foliage density, greenness, time of day, and image acquisition parameters. The trained model demonstrated strong performance, achieving an overall classification accuracy of 79% across the nine species. Given this performance on RGB imagery, the model provided a strong foundation for transfer learning and motivated its adaptation to multispectral imagery, where labelled training data are limited.

3.2.2. Tree Species Classification Models Using Multispectral Images and Supervised Domain Adaptation

The source domain comprises high-resolution RGB imagery, whereas the target domain consists of lower-resolution multispectral imagery. This setup characterizes a domain adaptation problem, where the task remains unchanged, but the feature distributions differ between the domains [41,42]. Given that the target domain includes a limited number of labelled MS samples, the scenario falls under supervised domain adaptation [43]. Moreover, because the transfer involves learning across different data modalities, from RGB to multispectral imagery, this approach is also referred to as cross-modal transfer learning, or modality transfer, which has been shown to facilitate knowledge transfer between heterogeneous feature spaces [44].

To reuse the features learned from RGB images for our 5-band multispectral dataset, we started from our DenseNet-121 model trained on RGB imagery. The multispectral imagery used in this study consisted of five spectral bands (blue, green, red, red-edge, and near-infrared), requiring modification of the original 3-channel RGB input layer to a 5-channel configuration. We transferred the weights of the first convolutional layer of the RGB-trained model so that the MS model inherits low-level features such as edges, textures, and simple patterns, which are common to RGB and multispectral images of trees. The weights extracted from the first convolutional layer had a kernel shape of (7, 7, 3, 64), representing a 7 × 7 kernel size, 3 input channels, and 64 output filters. To adapt this kernel to our 5-band input images, we calculated the mean of the pretrained RGB filters across the input-channel (spectral) dimension. The resulting mean filter was duplicated to provide weights for each of the two additional multispectral bands. This approach was selected to preserve the low-level spatial features learned from RGB imagery while providing a stable initialization for the expanded multispectral input space under limited training data conditions. These additional filters were concatenated with the original 3-channel weights, forming a new kernel of shape (7, 7, 5, 64). The modified 5-channel kernel was injected into the DenseNet-121 model by replacing the weights of the first convolutional layer. The input shape was defined as 50 × 50 × 5 to match our preprocessed multispectral images of tree crowns. The proposed supervised domain adaptation framework for tree species classification is shown in Figure 8.
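The kernel expansion described above can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function name `expand_rgb_kernel` is hypothetical, random values stand in for the pretrained weights, and the example assumes that the first three multispectral bands are ordered to match the channels of the RGB kernel, which the text does not specify.

```python
import numpy as np


def expand_rgb_kernel(rgb_kernel):
    """Expand a (7, 7, 3, 64) RGB kernel to (7, 7, 5, 64).

    The mean over the three input channels is duplicated to initialise
    the two additional bands (red-edge and near-infrared).
    """
    if rgb_kernel.ndim != 4 or rgb_kernel.shape[2] != 3:
        raise ValueError("expected kernel of shape (h, w, 3, filters)")
    mean_filter = rgb_kernel.mean(axis=2, keepdims=True)   # (h, w, 1, f)
    extra = np.repeat(mean_filter, 2, axis=2)              # (h, w, 2, f)
    return np.concatenate([rgb_kernel, extra], axis=2)     # (h, w, 5, f)


# Stand-in for the pretrained weights (assumption: real weights would be
# read from the first convolutional layer of the RGB-trained model).
rgb_kernel = np.random.default_rng(0).normal(size=(7, 7, 3, 64)).astype("float32")
ms_kernel = expand_rgb_kernel(rgb_kernel)
print(ms_kernel.shape)  # (7, 7, 5, 64)
```

In a Keras workflow, an array of this shape could then be assigned to the first convolutional layer of a model built with a 50 × 50 × 5 input, for example through the layer's `set_weights` method. Initialising the new bands with the channel mean keeps their filter responses on the same scale as the RGB filters, so the downstream pretrained layers receive activations similar to those seen during RGB training.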

Data Augmentation

Since the training dataset was small, we carried out data augmentation to increase the number of training images, reduce overfitting, and improve the generalization of the model. Each original training image underwent random spatial and radiometric transformations including horizontal and vertical flips, random 90-degree rotations, and random brightness and contrast adjustments. For each original image, three additional augmented samples were generated, effectively quadrupling the size of the training dataset from 1181 to 4724 images.
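A minimal sketch of such an augmentation pipeline is given below. The function names, the flip probabilities, and the brightness and contrast ranges are assumptions chosen for illustration, since the exact parameters are not reported here; the sketch also assumes square crops scaled to [0, 1].

```python
import numpy as np


def augment(image, rng):
    """Return one randomly transformed copy of a square (H, W, C) image in [0, 1]."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                      # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :, :]                      # vertical flip
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))  # 0/90/180/270 deg
    contrast = rng.uniform(0.8, 1.2)               # assumed range
    brightness = rng.uniform(-0.1, 0.1)            # assumed range
    mean = out.mean(axis=(0, 1), keepdims=True)    # per-band mean
    out = (out - mean) * contrast + mean + brightness
    return np.clip(out, 0.0, 1.0).astype(image.dtype)


def augment_dataset(images, labels, copies=3, seed=0):
    """Keep each original image and add `copies` augmented versions of it."""
    rng = np.random.default_rng(seed)
    out_x, out_y = [], []
    for img, lab in zip(images, labels):
        out_x.append(img)
        out_y.append(lab)
        for _ in range(copies):
            out_x.append(augment(img, rng))
            out_y.append(lab)
    return np.stack(out_x), np.array(out_y)
```

With `copies=3`, a set of 1181 crops grows to 1181 × 4 = 4724 images, matching the dataset size stated above. Applying the same contrast and brightness change to all five bands keeps band ratios roughly intact, which is one reasonable design choice when spectral relationships carry species information.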

Custom Classification and Regularization

At the end of the adapted DenseNet-121 network, we added two fully connected (dense) layers following the global average-pooling layer. This allowed us to fine-tune the higher-order feature representations of the tree canopies along with the final Softmax activation function.

Following the DenseNet-121 feature extractor, a custom classification head was added. This consisted of a global max pooling layer followed by two fully connected layers with 512 and 64 units respectively, each using ReLU activation and L2 regularization (λ = 1 × 10⁻⁴). Dropout (rate = 0.5) was applied after each dense layer to reduce overfitting. The final classification layer used a Softmax activation to output probabilities across 9 classes.

Loss Functions and Optimization

To address the substantial class imbalance in our training dataset (Table 2), where the number of samples per class varies considerably, from as few as 56 samples for red maple and 68 for eastern white cedar to as many as 272 for eastern white pine and 206 for red pine, a focal loss function was implemented. This imbalance can bias the model toward majority classes during training, leading to suboptimal performance on underrepresented species. Focal loss modifies the standard cross-entropy loss by applying a modulating factor that down-weights well-classified examples and emphasizes misclassified or minority class instances [45]. It reduces the contribution of well-represented species while increasing the contribution of underrepresented species during training. The focal loss is defined as:

FL(p_{t}) = -\alpha_{t} (1 - p_{t})^{\gamma} \log(p_{t})

(1)

where p_t denotes the model’s estimated probability for the ground-truth class. For a given sample with true class label y ∈ {0,1} and predicted probability p, p_t is defined as:

p_{t} = \begin{cases} p, & \text{if } y = 1 \\ 1 - p, & \text{otherwise} \end{cases}

(2)

Here, α_t ∈ [0, 1] is a weighting factor that balances the importance of positive and negative classes, helping to address class imbalance at the dataset level. The focusing parameter γ ≥ 0 controls the rate at which easy examples are down-weighted.

In this study, focal loss with parameters α = 0.25 and γ = 2.0 was used to address class imbalance in the training dataset. These values were chosen to reduce the dominance of majority class samples during training while increasing the contribution of underrepresented classes.
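For illustration, a multi-class form of the focal loss can be written in a few lines of plain Python. This sketch assumes one-hot labels and softmax outputs, takes p_t as the predicted probability of the true class, and applies a single scalar α to every sample; the function name `categorical_focal_loss` and these choices are assumptions and may differ from the implementation used in the study.

```python
import math


def categorical_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean focal loss over a batch of one-hot labels and softmax outputs.

    For each sample, p_t is the predicted probability of the true class,
    and the loss is -alpha * (1 - p_t)**gamma * log(p_t).
    """
    total = 0.0
    for truth, probs in zip(y_true, y_pred):
        p_t = sum(t * p for t, p in zip(truth, probs))
        p_t = min(max(p_t, eps), 1.0 - eps)        # avoid log(0)
        total += -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


# A confidently correct prediction contributes far less than a poor one.
easy = categorical_focal_loss([[1, 0, 0]], [[0.9, 0.05, 0.05]])
hard = categorical_focal_loss([[1, 0, 0]], [[0.1, 0.45, 0.45]])
print(easy < hard)  # True
```

Setting `gamma=0.0` and `alpha=1.0` recovers the ordinary categorical cross-entropy, which makes the role of the modulating factor (1 − p_t)^γ easy to check.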

The input data consisted of 5-band multispectral images resized to 50 × 50 pixels, with corresponding categorical labels. To evaluate generalization performance, 20% of the training data was reserved for validation using an internal split. The training was performed for 300 epochs with a batch size of 32, and the data were shuffled at the beginning of each epoch. The model was optimized using the Adam optimizer with a learning rate of 1 × 10⁻⁴. Training was monitored using a set of callbacks, including ModelCheckpoint, which saved model weights at the end of each epoch based on validation accuracy. Although validation performance stabilized after approximately 150 epochs, training was continued for the full 300 epochs to fully capture the learning dynamics of the transfer learning framework under limited multispectral data conditions. The final model used for evaluation corresponds to the epoch with the highest validation accuracy (epoch 247), as identified through checkpointed training history. In addition, a CSV logger was used to record training and validation metrics across epochs to facilitate performance analysis. All models used in this study were implemented using the Keras deep learning library with a TensorFlow backend.

3.3. Aerial Imagery Simulation

Following the development of the domain adaptation framework described in Section 3.2, we further extend the methodology to evaluate its potential applicability to coarser-resolution data, representative of operational forest inventory systems. We designed an experiment to assess model performance under conditions that approximate airborne multispectral imagery commonly used in regional-scale forest inventories. Specifically, we applied the trained model to artificially downsampled UAV multispectral imagery to simulate the spatial characteristics of conventional airborne multispectral sensors. This approach enables a controlled evaluation of the model’s sensitivity to spatial resolution while preserving the original spectral information and class labels.

The original tree crown images in the independent test dataset had a spatial resolution of 8 cm, providing fine spatial detail of crown structure and texture. To approximate the conditions of standard multispectral aerial images, these images were resampled to a coarser resolution of 20 cm, resulting in a noticeable reduction in the spatial and textural detail that is often critical for species discrimination (Figure 9). Despite this degradation, the image dimensions were preserved at 50 × 50 pixels to maintain compatibility with the input architecture of the pretrained CNN model and to ensure a fair comparison with previous experiments.
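One way to simulate this degradation is sketched below. The resampling method is not specified in the text, so the area averaging and nearest-neighbour steps used here, together with the function names, are assumptions. The sketch relies on the fact that a 50 × 50 crop at 8 cm covers 4 m, which corresponds to 20 × 20 pixels at 20 cm.

```python
import numpy as np


def block_mean(image, factor):
    """Average non-overlapping factor x factor blocks of an (H, W, C) image."""
    h, w, c = image.shape
    return image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def upsample(image, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return image.repeat(factor, axis=0).repeat(factor, axis=1)


def simulate_20cm(crop_8cm):
    """Degrade a 50 x 50 crop at 8 cm to 20 cm detail, keeping 50 x 50 pixels."""
    fine = upsample(crop_8cm, 2)     # 100 x 100 grid at 4 cm
    coarse = block_mean(fine, 5)     # 20 x 20 grid at 20 cm (area average)
    back = upsample(coarse, 5)       # 100 x 100 grid at 4 cm
    return block_mean(back, 2)       # 50 x 50 grid at 8 cm, 20 cm detail
```

Passing through a common 4 cm grid avoids the non-integer scale factor of 2.5 between the two resolutions. The output has the same shape and per-band mean as the input, but its fine-scale variation is smoothed, which is the property the experiment is designed to probe.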

3.4. Accuracy Assessment

Model performance was evaluated using an independent test dataset, comprising 30% of the labelled samples from each species class, which was held out and not used at any stage of model development or cross-validation. Classification performance was assessed using standard metrics, including precision, recall, F1-score, and overall accuracy. Precision, recall, and F1-score focus on the correct identification of positive instances and do not explicitly incorporate true negative predictions. Precision measures the proportion of correctly predicted positive samples among all samples assigned to a given class, whereas recall quantifies the proportion of correctly identified positive samples relative to all actual positives. Macro-averaged precision (P_M) and recall (R_M) were computed as defined in Equations (3) and (4). In these equations, t_p, f_p, and f_n denote true positives, false positives, and false negatives, respectively, while l represents the total number of classes. Overall classification accuracy was calculated as shown in Equation (5).

P_{M} = \frac{\sum_{i = 1}^{l} \frac{t_{p_{i}}}{t_{p_{i}} + f_{p_{i}}}}{l}

(3)

R_{M} = \frac{\sum_{i = 1}^{l} \frac{t_{p_{i}}}{t_{p_{i}} + f_{n_{i}}}}{l}

(4)

\mathrm{Accuracy} = \frac{\text{Correctly classified images}}{\text{Total number of images}}

(5)
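Equations (3) to (5) can be evaluated directly from a confusion matrix, as in the following sketch. The function name `macro_metrics` and the row/column convention (rows are true classes, columns are predicted classes) are assumptions, and the toy matrix contains illustrative values rather than results from this study.

```python
def macro_metrics(confusion):
    """Compute macro precision, macro recall and overall accuracy.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    precisions, recalls = [], []
    for i in range(n):
        tp = confusion[i][i]
        fp = sum(confusion[r][i] for r in range(n)) - tp   # column total minus tp
        fn = sum(confusion[i]) - tp                        # row total minus tp
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    return sum(precisions) / n, sum(recalls) / n, correct / total


# Toy 3-class matrix (illustrative values only).
toy = [[8, 2, 0],
       [1, 7, 2],
       [0, 1, 9]]
p_m, r_m, acc = macro_metrics(toy)
```

Because macro averages weight every class equally, a rare species with poor recall lowers R_M as much as a common one would, whereas overall accuracy is dominated by the more abundant classes.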

4. Results

4.1. Classification with RGB Pretrained Model

A series of experiments was conducted using DenseNet-121 pretrained on RGB imagery with different combinations of loss functions and regularization techniques. Table 4 summarizes the experimental configurations and the corresponding classification accuracies obtained using the pretrained DenseNet-121 model. Overall, pretraining on RGB data led to strong and stable performance during subsequent training on multispectral imagery. The categorical cross-entropy loss function was also tested. The training and validation curves obtained using categorical cross-entropy loss in conjunction with dropout, L2 regularization, and data augmentation demonstrate consistent convergence and improved generalization performance (Figure 10). Both training and validation accuracy increased steadily and plateaued above 0.88, indicating that the model learned generalizable features for individual tree species classification. The small and consistent gap between training and validation accuracy further demonstrates good generalization with minimal overfitting. Similarly, the training and validation loss curves show a smooth downward trend and converge around epoch 150, with losses stabilizing at low values toward the end of training. Minor fluctuations observed in the validation loss during later epochs are likely attributable to the effects of data augmentation and do not indicate significant instability. Collectively, these results confirm that the combined use of focal loss, L2 regularization, and data augmentation contributed to a robust and well-optimized training process. Overall, the accuracy and loss curves reflect good-fit learning behaviour, demonstrating that the RGB-pretrained DenseNet-121 model, when adapted to multispectral data, maintains stable learning dynamics and achieves robust performance.

Table 5 presents the confusion matrix obtained from the pretrained DenseNet-121 model trained with Focal loss and regularization techniques, including Dropout, L2 weight decay, and data augmentation, when evaluated on the independent test dataset. Species-level recall and precision were visualized using point estimates (Figure 11) with 95% confidence intervals derived from the independent test-set confusion matrix. The points indicate the observed recall and precision for each species, while the error bars represent uncertainty due to finite class-specific sample sizes. Overall, the model achieved an accuracy of 0.75 on the independent test set, with a macro-F1 score of 0.706 and a weighted F1 score of 0.719. The weighted F1 being slightly higher than the macro-F1 indicates that performance was somewhat stronger for the more represented species, while the macro-F1 highlights residual variability in classification performance across classes.
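The difference between macro and weighted F1, and the way class-specific sample size drives the width of a confidence interval, can be illustrated as follows. The interval method is not named in the text, so the Wilson score interval used here is an assumption, as are the function names and the row/column convention (rows are true classes).

```python
import math


def f1_scores(confusion):
    """Return (macro_f1, weighted_f1) from a confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    macro, weighted = 0.0, 0.0
    for i in range(n):
        tp = confusion[i][i]
        support = sum(confusion[i])                          # tp + fn
        predicted = sum(confusion[r][i] for r in range(n))   # tp + fp
        denom = support + predicted
        f1 = 2 * tp / denom if denom else 0.0
        macro += f1 / n                                      # equal class weights
        weighted += f1 * support / total                     # weighted by support
    return macro, weighted


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if trials == 0:
        return 0.0, 1.0
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

For a class with recall 0.8, `wilson_interval(8, 10)` gives a much wider interval than `wilson_interval(80, 100)`, which is why species with few test crowns show longer error bars for the same observed recall.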

The confusion matrix indicates generally strong but species-dependent performance, with clear differences between coniferous and deciduous species. Several classes achieve high recall, including red oak (0.84), red pine (0.83), balsam fir (0.83), and eastern white cedar (0.81), suggesting that the model effectively captures discriminative spectral-structural features. In contrast, red maple (0.44) shows the weakest performance, followed by white spruce (0.56) and white birch (0.59), reflecting greater spectral overlap.

Classification performance is generally higher and more consistent for coniferous species than for deciduous species (Table 5). Most conifers achieve relatively high recall (typically ≳0.78), with the notable exception of white spruce, which shows comparatively lower performance. Red pine and eastern white pine demonstrate strong and stable classification. This likely reflects the pine-dominated nature of the study area, where greater representation of pine species enhances feature learning and generalization. In contrast, deciduous species exhibit more variable performance; while red oak is well distinguished, red maple shows substantial confusion across classes. Figure 12 presents the spatial distribution of correctly classified and misclassified tree crowns in the independent test dataset, providing a visual representation of the classification outcomes summarized in the confusion matrix.

4.2. Simulated Aerial Imagery Model Using Downsampled MS Imagery

To further evaluate the generalization capability of the proposed framework beyond high-resolution UAV imagery, we assessed its performance under simulated aerial imaging conditions. For assessing the scalability to aerial imagery, we selected the best-performing configuration from the previous stage: a DenseNet-121 model pretrained on large-scale RGB imagery and fine-tuned on MS data using focal loss in combination with dropout, L2 regularization (weight decay), and moderate data augmentation. This model, which achieved 75% accuracy on the original multispectral dataset, serves as a robust baseline for evaluating performance under reduced spatial resolution. The model was applied directly to the downsampled multispectral dataset without further retraining, enabling an assessment of its resilience to spatial degradation and its potential applicability to operational airborne data.

Under these conditions, the model achieved an overall classification accuracy of 69%, indicating that while performance declined relative to the original high-resolution UAV imagery, the adapted model retained a substantial level of discriminative capability even at reduced spatial resolution. The confusion matrix for all 9 species is shown in Table 6. The aggregate performance metrics further indicate a moderate but uneven classification outcome, with a macro-F1 score of 0.615 and a weighted F1 score of 0.655. The higher weighted F1 suggests that the model performed relatively better on the more represented species, whereas less represented or more spectrally similar species remained more difficult to classify. This result suggests that the transfer-learned model has potential for application to coarser resolution aerial multispectral data, although further refinement may be necessary to optimize performance for operational forest inventory mapping.

The confusion matrix (Table 6) shows a clear but expected decline in class separability relative to the original high-resolution imagery. This reduction likely reflects the loss of fine-scale spatial and textural information that is important for species-level discrimination in UAV data. Despite this decline, the model retains meaningful separability across several species, demonstrating some robustness to spatial degradation.

Species-wise performance shows increased variability compared to the original results. Coniferous species continue to perform relatively well, particularly red pine (recall 0.91) and eastern white pine (0.88), reinforcing the earlier observation that dominant pine species are more reliably classified, likely due to both their structural distinctiveness and stronger representation in the dataset. However, some conifers exhibit notable degradation; for instance, white spruce drops substantially (0.25) and is frequently confused with balsam fir and eastern white pine, indicating that reduced resolution exacerbates spectral similarity within conifer groups.

Deciduous species show a more pronounced decline in performance. Red maple is particularly affected (0.20), with widespread confusion across multiple species, while black ash (0.70), although still moderate, continues to exhibit dispersed misclassification patterns.

The misclassification patterns reveal an intensification of both intra- and inter-group confusion. While conifer errors remain partly structured within the group (e.g., white spruce with balsam fir), cross-group confusion increases under downsampling, with several deciduous species misclassified as conifers and vice versa. This suggests that the loss of spatial detail reduces the model’s ability to leverage crown structure, forcing greater reliance on spectral cues that are less distinctive across species.

5. Discussion

This study demonstrates that initializing DenseNet-121 from a model trained on a large RGB image dataset provides a strong foundation for learning discriminative features in comparatively low spatial resolution multispectral imagery, even when only limited training data are available to discriminate nine commercial tree species (five coniferous and four deciduous) in a complex mixedwood boreal forest. This is particularly important because the baseline multispectral-only models in Section 3.1 showed only moderate performance, highlighting the challenges of training deep CNNs directly on data-scarce multispectral datasets.

The performance limitations observed in the multispectral-only models are likely influenced by two main factors: the coarser spatial resolution of multispectral imagery compared to RGB data and the limited size of labelled training samples. These constraints reduce the model’s ability to learn fine-grained and generalizable representations, especially in a complex mixedwood forest where species discrimination can be affected by subtle spectral differences, overlapping crowns, and class imbalance.

To address these limitations, this study explored cross-domain transfer learning by adapting a DenseNet-121 model pretrained on a large, high-resolution UAV RGB dataset. Despite differences in sensor modality and spatial resolution, the pretrained model provided a strong initialization for learning discriminative features in multispectral imagery with limited labelled data. The adapted model achieved stable convergence and improved classification performance, demonstrating the effectiveness of transfer learning for multispectral tree species classification under limited training data conditions.

The benefits of transfer learning were further evaluated on downsampled UAV multispectral imagery designed to simulate conventional airborne multispectral data with lower spatial resolution. The model maintained robust performance under these conditions, demonstrating its resilience across different spatial resolutions and its practical applicability for regional-scale forest inventories. Although the downsampled imagery does not replace validation using true airborne multispectral photographs, it provides an initial assessment of the model’s sensitivity to reduced spatial detail and its potential scalability to coarser-resolution aerial imagery.

Overall, these results highlight cross-domain transfer learning as an effective strategy for overcoming data scarcity and sensor limitations, offering a pathway toward scalable, sensor-independent, efficient, and operationally applicable tree species classification.

5.1. Performance of Supervised Domain Adaptation with RGB Pretrained Model

Classification performance of the RGB-pretrained model adapted to the MS imagery is generally higher and more consistent for coniferous species than for deciduous species. Most conifers achieve relatively high recall (typically ≳0.78), with the notable exception of white spruce, which shows comparatively lower performance. In particular, red pine and eastern white pine demonstrate strong and stable classification. This likely reflects the pine-dominated nature of the study area, where greater representation of pine species enhances feature learning and generalization. In contrast, deciduous species exhibit more variable performance; red oak was well distinguished, while red maple showed substantial confusion across classes.

The misclassification patterns reveal a nuanced balance between intra- and inter-group confusion. Conifer errors are more structured and largely confined within the conifer group, as seen in the confusion of white spruce with balsam fir and red pine, indicating similarity in spectral responses. In contrast, deciduous species exhibit broader, less structured mixing across multiple classes. Notably, black ash is misclassified across a wide range of classes, suggesting that the confusion is broadly distributed rather than dominated by any single class. This pattern may be partly attributed to the impact of emerald ash borer infestation in the study area, which alters canopy condition and spectral response, reducing class separability. Although some inter-group confusion is present (e.g., eastern white pine misclassified as red oak), it remains comparatively limited.

Class-wise performance also appears to be influenced by training data representation. Species with relatively greater representation in the training data, such as red pine and eastern white pine, show more stable and higher recall, whereas underrepresented classes, particularly red maple, exhibit poorer performance despite the use of focal loss. This suggests that while loss reweighting mitigates imbalance to some extent, limited training samples still constrain the model’s ability to learn robust class-specific features. Overall, the model demonstrates strong discrimination for dominant conifers while highlighting persistent challenges in separating spectrally similar or underrepresented deciduous species.

5.2. Comparison to Existing UAV RGB and MS-Based Approaches

When compared with existing studies, results from UAV RGB imagery highlight the advantages of very high spatial resolution and explicit structural representation. For example, object-based CNN approaches integrating RGB imagery with 3D-derived information have reported accuracies as high as 93% for seven broad-level tree classes, benefiting from reduced intra-class variability [12]. Similarly, another deep learning model applied to large-area UAV RGB datasets (~40 km²) achieved overall accuracies of approximately 84% across multiple species in a subtropical forest using architectures designed to capture spatial context and small object features [5]. However, such approaches typically rely on high-quality RGB data, explicit segmentation, or large training datasets, and often involve fewer or more aggregated class definitions.

In contrast, multispectral-based approaches introduce additional spectral information but often face challenges related to limited labelled data and reduced spatial resolution. For instance, studies using UAV multispectral imagery with Random Forest classifiers have reported F1 scores ranging from 0.69 to 0.83 in structurally simpler forest types and with fewer species, often relying on handcrafted spectral, textural, and structural features [16]. Similarly, high overall accuracies close to 91% have been reported in wetland environments, though these are typically based on classification tasks involving only a small number of species (e.g., three), representing lower inter-class complexity [15]. Another multi-source UAV study combining RGB, multispectral, and LiDAR data reported an overall accuracy of 83.98%, with species classification improving by 14–18% when multi-season data were used instead of single-season inputs [6]. That study also showed the added value of vegetation indices, texture, and elevation features, but relied on substantially richer temporal and sensor information than considered here.

5.3. Comparison to Existing Multi-Source and Multi-Temporal Deep Learning-Based Approaches

More comparable deep learning-based approaches further highlight the effectiveness of the proposed method. Large-scale multi-source studies integrating aerial and satellite data (e.g., Sentinel-1 and Sentinel-2) have reported F1 scores of approximately 72% across a higher number of species (e.g., 15) but rely on substantially larger and more diverse training datasets [7]. Likewise, UAV-based CNN approaches using EfficientNet architectures and extensive multitemporal datasets (e.g., >17,000 images across eight species) have achieved mean macro F1 scores around 75% [18]. In contrast, the present study achieves a comparable overall accuracy (75%, with a macro-F1 score of 0.706) using significantly fewer labelled samples and without temporal information, demonstrating the efficiency of cross-domain transfer learning. Overall, these comparisons indicate that the proposed approach performs competitively despite operating under more constrained conditions, including limited training data, the absence of time-series information, and higher species complexity. This underscores its practical value for operational forest inventory applications, where labelled multispectral datasets are often scarce and heterogeneous across sensors.

5.4. Performance of the Simulated Aerial Imagery Model

Classification performance of the MS model applied to the downsampled dataset showed a clear but expected decline relative to the original high-resolution imagery, with overall accuracy decreasing to 69%. This decline is also reflected in the macro-F1 score (0.615) and weighted F1 score (0.655), indicating uneven performance across species and comparatively stronger results for the more represented classes. This reduction reflects the loss of fine-scale spatial and textural information that is critical for species-level discrimination in UAV data. Nevertheless, the model retains a meaningful level of separability across several species, demonstrating a degree of robustness to spatial degradation.

Species-wise performance shows increased variability compared to that of the high-resolution MS imagery. Coniferous species continue to perform relatively well, most notably red pine (recall 0.91) and eastern white pine (0.88), reinforcing the earlier observation that dominant pine species are more reliably classified, likely due to both their structural distinctiveness and stronger representation in the dataset. However, some conifers exhibit notable degradation; for instance, white spruce drops substantially (0.25) and is frequently confused with balsam fir and eastern white pine, indicating that reduced resolution exacerbates spectral similarity within conifer groups. Deciduous species show a more pronounced decline in performance. Red maple is particularly affected (0.20), with widespread confusion across multiple species, while black ash (0.70), although still moderate, continues to exhibit dispersed misclassification patterns. The misclassification patterns reveal an intensification of both intra- and inter-group confusion. While conifer errors remain partly structured within the group (e.g., white spruce with balsam fir), cross-group confusion increases under downsampling, with several deciduous species misclassified as conifers and vice versa. This suggests that the loss of spatial detail reduces the model’s ability to leverage crown structure, forcing greater reliance on spectral cues that are less distinctive across species.
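
The spatial degradation discussed in this section can be illustrated with a short sketch. Figure 9 reports resampling from 8 cm to 20 cm, but the resampling kernel is not stated here, so the area-weighted averaging below is an assumption rather than the procedure used in the study. The function name `downsample_area` and its parameters are hypothetical.

```python
import math
import numpy as np

def downsample_area(img, src_cm=8, dst_cm=20):
    """Area-weighted resampling of an (H, W, bands) array.

    Assumption: each source pixel is treated as piecewise constant. The image
    is first expanded onto a common fine grid (4 cm for 8 cm -> 20 cm) and then
    block-averaged, which handles the non-integer factor of 2.5.
    """
    g = math.gcd(src_cm, dst_cm)            # fine grid spacing in cm
    up, down = src_cm // g, dst_cm // g     # 2 and 5 for the default values
    fine = np.repeat(np.repeat(img, up, axis=0), up, axis=1)
    # Crop so that the fine grid tiles evenly into target pixels.
    h = (fine.shape[0] // down) * down
    w = (fine.shape[1] // down) * down
    blocks = fine[:h, :w].reshape(h // down, down, w // down, down, img.shape[2])
    return blocks.mean(axis=(1, 3))

# Hypothetical 10 x 10 pixel crown chip with five bands.
chip = np.random.default_rng(0).random((10, 10, 5))
coarse = downsample_area(chip)
```

With these defaults a 10 × 10 chip becomes a 4 × 4 chip, which gives a sense of how little crown texture survives at the coarser scale.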

5.5. Comparison to Existing Aerial Imagery-Based Approaches

When compared with studies conducted on true airborne data, the observed performance differences should be interpreted considering substantially different data requirements and problem settings rather than as a direct benchmark. Airborne studies have shown that classification performance is strongly influenced by sensor richness, spatial resolution, structural information, and class complexity. For example, airborne hyperspectral imagery combined with LiDAR has achieved high kappa accuracies for general macro-classes, forest types, and individual species, with reported values of 93.2%, 82.1%, and 76.5%, respectively. However, when the spectral data were downgraded from hyperspectral to multispectral imagery, classification accuracy decreased, particularly for single-species classification, although performance remained relatively high for broader forest-type and macro-class mapping [34]. The same study also showed that high-density LiDAR provided more useful structural information than low-density LiDAR when combined with either hyperspectral or multispectral data. Similarly, a full-waveform LiDAR study has shown that classification performance depends strongly on class complexity, with higher accuracies obtained when the task is simplified from multiple tree species to dominant species or broader coniferous/broadleaved groups [36].

A study using 30 cm aerial imagery combined with airborne LiDAR reported accuracies of approximately 78%, and approximately 73% on independent validation, for nine species using DenseNet-based models [2]. However, these results were achieved for tree species composition mapping, not individual tree-level classification, and relied on an extensive reference dataset comprising approximately 614,582 samples derived from more than 250 aerial images and 354 interpreted sites. Similarly, another large-area aerial mapping study trained and tested nine CNN models using combinations of three training datasets and three architectures, VGG16, ResNet50v2, and DenseNet121, with multiband aerial photographs and a LiDAR-derived canopy height model. The final super-ensemble was evaluated using 1311 independent forest inventory plots and used inter-model agreement to generate spatial uncertainty maps [3]. These examples demonstrate that high-performing airborne approaches are often supported by extensive labelled datasets, explicit structural information, multiple model architectures, and ensemble prediction strategies, all of which can represent major operational challenges.

In contrast, the present study focuses on individual tree classification using limited labelled UAV multispectral data without access to explicit 3D structural inputs. The aerial imagery component was also based on downsampled UAV multispectral imagery designed to simulate the spatial characteristics of conventional airborne multispectral data, rather than true airborne acquisition geometry. Therefore, the downsampled experiment should be interpreted as an initial assessment of model sensitivity to reduced spatial resolution and potential scalability, rather than as a direct comparison with true airborne imagery studies. Importantly, rather than directly competing with such data-intensive approaches, the results highlight the potential of the proposed framework to reduce dependence on large annotated airborne datasets. Despite limited target-domain labels, reduced spatial detail, and the absence of structural inputs, the model maintained reasonable performance for dominant species, particularly conifers such as red pine and eastern white pine. This suggests that cross-domain transfer learning captures features that remain partially invariant to spatial resolution changes. This positions the approach as a practical and cost-effective strategy, where UAV data can be leveraged to pretrain models and support future tree species composition mapping workflows, ultimately reducing the need for extensive manual interpretation and large labelled datasets in airborne applications.

5.6. Training Stability and Generalization

Regarding the model training, the consistently smooth and stable training and validation curves indicate that the pretrained DenseNet-121 learned generalizable representations rather than memorizing training samples. The small gap between training and validation accuracy, together with the convergence of loss curves, suggests effective regularization and minimal overfitting, even in the presence of an imbalanced dataset. This is particularly important for ecological and forestry applications, where collecting large, well-balanced MS datasets is often impractical. The results suggest that a small number of labelled samples, when combined with appropriate pretraining and regularization strategies, can be sufficient to train reliable deep learning models. Although the model was trained for a fixed number of epochs to comprehensively assess its convergence behaviour, improvements in validation performance became marginal during the later stages of training. Future implementations could incorporate an early stopping strategy with a defined patience parameter to improve computational efficiency while maintaining model performance.
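
The early stopping strategy suggested above can be sketched in a few lines of plain Python. This is a minimal illustration only: the function name, the patience of 3 epochs, and the validation scores are hypothetical and are not taken from the experiments.

```python
def early_stopping_epoch(val_scores, patience=20, min_delta=0.0):
    """Return (stop_epoch, best_epoch) as 0-based indices.

    Training stops once the validation score has not improved by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best_score = float("-inf")
    best_epoch = -1
    for epoch, score in enumerate(val_scores):
        if score > best_score + min_delta:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Patience never exhausted: training ran for all available epochs.
    return len(val_scores) - 1, best_epoch

# Hypothetical validation accuracies for seven epochs.
history = [0.50, 0.60, 0.70, 0.69, 0.68, 0.70, 0.69]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
```

In this toy run the best score occurs at epoch 2 and training halts at epoch 5; the weights saved at the best epoch would be the ones to restore.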

5.7. Impact of RGB-Based Pretraining

A key observation from this work is the substantial performance gain achieved through RGB-based pretraining. When pretrained weights were not used, and data augmentation was omitted, classification accuracy on MS imagery dropped sharply, indicating that learning from scratch is inadequate under limited-sample conditions. In contrast, initializing the network with RGB-pretrained weights significantly improved accuracy, confirming the importance of transfer learning. Although RGB and MS data differ spectrally, the pretrained network appears to transfer low- and mid-level spatial features (e.g., texture, edges, and structural patterns) that remain relevant across domains. This supports the use of RGB pretraining as a practical form of domain adaptation for MS classification tasks, particularly when labelled MS data are scarce.
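
One way to carry RGB-pretrained weights into a five-band network is to expand the first convolutional kernel, as indicated schematically in Figure 8. The sketch below is an assumption about how such an expansion could be done, not a description of the authors' implementation. It assumes a kernel layout of (height, width, input channels, filters), RGB channels ordered red, green, blue, and the band order of Table 1; the extra bands are initialized with the mean of the RGB kernels. The name `expand_first_conv` is hypothetical.

```python
import numpy as np

def expand_first_conv(rgb_kernel, ms_to_rgb=(2, 1, 0, None, None)):
    """Build a multispectral first-layer kernel from an RGB one.

    rgb_kernel: array of shape (kh, kw, 3, filters), channels ordered R, G, B.
    ms_to_rgb:  for each MS band (Blue, Green, Red, NIR, Red Edge), the RGB
                channel to copy, or None to use the mean of the RGB kernels.
    """
    kh, kw, _, filters = rgb_kernel.shape
    mean_kernel = rgb_kernel.mean(axis=2)
    ms_kernel = np.empty((kh, kw, len(ms_to_rgb), filters), dtype=rgb_kernel.dtype)
    for band, src in enumerate(ms_to_rgb):
        if src is None:
            ms_kernel[:, :, band, :] = mean_kernel         # NIR, Red Edge
        else:
            ms_kernel[:, :, band, :] = rgb_kernel[:, :, src, :]
    return ms_kernel

# Stand-in for a pretrained 7 x 7 kernel with 64 filters (random values here).
rgb_kernel = np.random.default_rng(0).normal(size=(7, 7, 3, 64))
ms_kernel = expand_first_conv(rgb_kernel)
```

All deeper layers keep their pretrained weights unchanged under this scheme, so only the input layer has to adapt to the new bands during fine-tuning.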

5.8. Evaluation of Loss Functions

The comparison between focal loss and categorical cross-entropy loss reveals that both loss functions benefited from pretraining and regularization. While focal loss is theoretically well-suited for imbalanced datasets, the observed performance differences between focal loss and categorical cross-entropy were relatively small once dropout, L2 regularization, and data augmentation were applied. This suggests that, in this setting, the benefits of focal loss may be partially absorbed by other regularization mechanisms, especially when strong pretrained features are available. Nonetheless, focal loss contributed to stable optimization and helped mitigate class imbalance during training.
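
The relationship between the two losses can be made concrete with the standard focal loss definition [45], FL(p_t) = −α (1 − p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The sketch below evaluates it for a single sample; the values γ = 2 and α = 1 are illustrative assumptions, since the hyperparameters used in the experiments are not given in this section.

```python
import math

def cross_entropy(p_true):
    """Categorical cross-entropy for one sample, given the true-class probability."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample; gamma = 0 and alpha = 1 recover cross-entropy."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

# Ratio of focal loss to cross-entropy for an easy and a hard sample.
easy_ratio = focal_loss(0.9) / cross_entropy(0.9)   # down-weighted strongly
hard_ratio = focal_loss(0.3) / cross_entropy(0.3)   # down-weighted mildly
```

Because well-classified samples are down-weighted far more than hard ones, focal loss shifts the gradient toward difficult, often minority-class, crowns. Once regularization and augmentation already limit overfitting to the dominant classes, the remaining difference from cross-entropy can be small.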

5.9. Role and Limits of Data Augmentation

Data augmentation played a critical role in improving overall performance, increasing accuracy from approximately 66%–69% (without augmentation) to 75% when augmentation was applied. This confirms that synthetic variability can partially compensate for the limited number of training samples and the challenges posed by high spatial resolution (30 cm) MS imagery. However, further increasing the number of augmented samples (five augmented images per original image) did not yield additional performance gains and instead led to saturation or slight degradation in accuracy (from 75% to 74%). This suggests that excessive augmentation may introduce redundant or less informative samples, offering diminishing returns once the model has learned the dominant spatial and spectral patterns. These findings highlight the importance of balancing augmentation size with dataset size and variability.
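
As an illustration of how a fixed number of augmented copies per crown can be generated, the sketch below applies random 90° rotations and horizontal flips to a multi-band chip. The specific transforms are an assumption; this section does not list the augmentations used in the study. The function name `augment_crown` is hypothetical.

```python
import numpy as np

def augment_crown(img, n_aug, seed=0):
    """Return `n_aug` geometrically augmented copies of an (H, W, bands) chip.

    Only rotations and flips are used, so every pixel keeps its full spectral
    vector and no reflectance values are altered.
    """
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_aug):
        k = int(rng.integers(0, 4))                 # 0, 90, 180 or 270 degrees
        aug = np.rot90(img, k=k, axes=(0, 1))
        if rng.random() < 0.5:
            aug = np.flip(aug, axis=1)              # horizontal flip
        copies.append(aug.copy())
    return copies

# Hypothetical 4 x 4 chip with five bands, five augmented copies.
chip = np.arange(4 * 4 * 5, dtype=float).reshape(4, 4, 5)
augmented = augment_crown(chip, n_aug=5)
```

Such transforms generate only eight distinct variants of any chip, which is one plausible reason why adding further copies yields redundant samples rather than new information.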

5.10. Operational Scalability and Cost Considerations

In operational forestry, the dominant cost drivers are field data collection, manual interpretation, and repeated airborne acquisitions. In the Canadian boreal context, even basic field inventory plots typically cost $250–500 per plot (based on operational experience in Canada). Generating species labels for remote sensing workflows requires additional expert time, specifically for an experienced geomatician with forestry and field knowledge to delineate and label individual crowns, which could add approximately $2 per crown (based on recent operational experience over large areas). At landscape scales, these annotation requirements quickly exceed the cost of the imagery itself. Although airborne multispectral surveys can be effective for district or provincial scale applications, their acquisition costs remain high, averaging approximately $75 per km², whereas UAV acquisitions can be conducted for less than $2 per ha [46,47], making them substantially more accessible for local, regional, or project-level mapping.

The cross-sensor transfer learning approach evaluated in this study has the potential to directly address these structural cost constraints. By pretraining on high-resolution UAV RGB imagery and adapting the model to lower-resolution multispectral data with only limited target-domain labels, the method reduces both the number of required field plots and the volume of crown-level annotations. Although the adapted model achieved a moderate overall accuracy (75%) relative to the highest-performing single-sensor deep learning studies [11,18] or the sensor-specific models [25], this level of performance is comparable to, or better than, many manual interpretation workflows currently used in operational forestry mapping. More importantly, the approach reduces labelling effort and enables the use of heterogeneous imagery sources. If the downsampled MS data could be replaced with conventional airborne photographs, there is a strong potential to further reduce the overall cost of producing species-level maps over broader geographic areas. This would also enhance the level of detail from stand-level classification to individual tree-level mapping.

For agencies responsible for large, remote, and heterogeneous forests at operational scales, the trade-off of slightly lower accuracy can be outweighed by gains in cost efficiency, repeatability, and scalability. The results suggest that cross-sensor transfer learning provides a viable pathway toward sensor-independent, lower-cost species mapping, enabling more frequent updates and broader geographic coverage without the prohibitive expenses associated with traditional inventory methods.

5.11. Future Research Directions

These findings reinforce the notion that large, easily acquirable RGB datasets can be leveraged to overcome data scarcity in MS satellite and aerial imagery, supporting the secondary objective of domain generalization across spectral modalities and spatial resolutions. Overall, the results demonstrate that although performance degrades with reduced resolution, the model maintains reasonable accuracy for dominant species, particularly pines. However, the increased confusion among underrepresented and spectrally similar species highlights the need for further adaptation when transferring to operational airborne imagery. In this study, the model was evaluated in a zero-shot transfer setting, where weights learned from high-resolution UAV multispectral data were directly applied to simulated aerial-scale inputs. While this provides a useful measure of robustness, performance is likely constrained by the lack of adaptation to resolution-induced feature shifts. As a next step, fine-tuning the pretrained DenseNet-121 on downsampled imagery, even with limited samples, could help recalibrate spatial filters and better align feature representations with coarser canopy structure, thereby reducing intra-class confusion, particularly among similar coniferous and underrepresented deciduous species.

A further extension could be a multi-resolution learning framework, where the model is trained on both high-resolution UAV data and lower-resolution (simulated or real) airborne data. This would help the model learn features that are more stable across different spatial scales. This idea can be further improved using domain adaptation methods, such as aligning features between high and low-resolution data. A practical next step can be a simulated-to-real transfer pipeline, where downsampled UAV data acts as a bridge between UAV and real airborne multispectral imagery used in operational forest inventory systems. This would allow more realistic testing of model performance in deployment settings.

Finally, the present study was designed to evaluate CNN-based baseline modelling and supervised RGB-to-multispectral domain adaptation. Future work could extend this framework by incorporating benchmark comparisons with conventional machine learning classifiers, such as Random Forest, which are widely used in tree species classification. When implemented with appropriate feature extraction, model tuning, and validation procedures, such comparisons would provide additional context for assessing the relative advantage of the proposed transfer-learning framework.

5.12. Summary

Overall, the results indicate that the proposed training strategy successfully overcomes two major challenges in MS image classification: limited sample size and differences in spatial resolution and spectral characteristics. By leveraging RGB-pretrained DenseNet-121 models, effective regularization, and moderate data augmentation, the model achieves stable learning dynamics and robust performance without requiring large MS datasets.

6. Conclusions

This study addressed three key challenges in UAV-based tree species classification: the limited availability of labelled multispectral data, the lack of cross-sensor generalization, and the need for scalability to operational aerial imagery. First, baseline experiments confirmed that training CNN models directly on lower spatial resolution multispectral imagery under data-scarce conditions yields only moderate performance, highlighting the limitations of conventional approaches. Second, the proposed supervised cross-sensor transfer learning framework successfully leveraged knowledge from a high-resolution RGB-trained DenseNet-121 model, resulting in substantial improvements in classification accuracy and demonstrating effective transfer of spatial feature representations across spectral domains. Third, the evaluation on downsampled multispectral imagery showed that the adapted model retains meaningful discriminative capability under reduced spatial resolution, indicating its potential applicability to aerial platforms used in regional forest inventories.

Collectively, these findings demonstrate that cross-sensor transfer learning provides a practical and efficient pathway for overcoming data scarcity and improving the robustness and scalability of tree species classification models. While the results are promising, the study is limited to a single study site and a relatively small multispectral dataset, which may constrain broader generalization across diverse forest types and acquisition conditions. Future work could explore more explicit domain adaptation techniques, such as feature alignment or adversarial learning, as well as self-supervised pretraining on large volumes of unlabeled multispectral data to further enhance generalization across sensors, seasons, and geographic regions.

Author Contributions

Conceptualization, S.N., U.V. and C.A.; methodology, S.N.; software, S.N.; validation, S.N. and U.V.; formal analysis, S.N.; investigation, S.N.; resources, U.V. and C.A.; data curation, S.N. and U.V.; writing—original draft preparation, S.N.; writing—review and editing, U.V. and C.A.; visualization, S.N.; supervision, U.V. and C.A.; project administration, U.V. and C.A.; funding acquisition, U.V. and C.A. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by multiple sources. This work is financially supported by a Discovery Grant, Natural Sciences and Engineering Research Council of Canada (NSERC) and York University. Support for data acquisition was made possible by Natural Resources Canada under the Transformative Technologies contribution agreement with FPInnovations.

Data Availability Statement

The data presented in this study are available upon reasonable request. Access to the data is restricted due to privacy considerations.

Acknowledgments

The authors would like to thank Petawawa Reserve Forest (Canadian Forest Service) and Petawawa Army Base for making the area available for our trial and for providing background data. We thank FPInnovations for providing the entire image and field data for this study as well as field assistance.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

UAV	Unmanned Aerial Vehicle
MS	Multispectral
RGB	Red-Green-Blue
NIR	Near infrared
CNN	Convolutional Neural Network
RTK	Real-time kinematic
CMOS	Complementary Metal Oxide Semiconductor
DSLR	Digital Single-lens Reflex
CRP	Calibrated Reflectance Panel

References

1. Chen, J.; Liang, X.; Liu, Z.; Gong, W.; Chen, Y.; Hyyppä, J.; Kukko, A.; Wang, Y. Tree Species Recognition from Close-Range Sensing: A Review. Remote Sens. Environ. 2024, 313, 114337.
2. Reisi Gahrouei, O.; Côté, J.-F.; Bournival, P.; Giguère, P.; Béland, M. Comparison of Deep and Machine Learning Approaches for Quebec Tree Species Classification Using a Combination of Multispectral and LiDAR Data. Can. J. Remote Sens. 2024, 50, 2359433.
3. Sylvain, J.-D.; Drolet, G.; Thiffault, É.; Anctil, F. High-Resolution Mapping of Tree Species and Associated Uncertainty by Combining Aerial Remote Sensing Data and Convolutional Neural Networks Ensemble. Int. J. Appl. Earth Obs. Geoinf. 2024, 131, 103960.
4. Quan, Y.; Shao, G.; Hao, Y.; Li, M. Bridging Forestry Practice and Remote Sensing: Scaling up Forest Composition with Integrated UAV LiDAR and Hyperspectral Data. J. For. Res. 2026, 37, 33.
5. Zhang, X.; Gu, J.; Azam, B.; Zhang, W.; Lin, M.; Li, C.; Jing, W.; Akhtar, N. RSVMamba for Tree Species Classification Using UAV RGB Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5607716.
6. Avtar, R.; Chen, X.; Fu, J.; Alsulamy, S.; Supe, H.; Pulpadan, Y.A.; Louw, A.S.; Tatsuro, N. Tree Species Classification by Multi-Season Collected UAV Imagery in a Mixed Cool-Temperate Mountain Forest. Remote Sens. 2024, 16, 4060.
7. Qin, T.; Zhao, Q. Multi-Branch and Multi-Label Tree Species Classification Using Deep Learning for UAV Aerial Photography and Sentinel Remote Sensing Images. Sci. Rep. 2025, 15, 32710.
8. White, J.C.; Tompalski, P.; Bater, C.W.; Wulder, M.A.; Fortin, M.; Hennigar, C.; Robere-McGugan, G.; Sinclair, I.; White, R. Enhanced Forest Inventories in Canada: Implementation, Status, and Research Needs. Can. J. For. Res. 2025, 55, 1–37.
9. White, J.C.; Coops, N.C.; Wulder, M.A.; Vastaranta, M.; Hilker, T.; Tompalski, P. Remote Sensing Technologies for Enhancing Forest Inventories: A Review. Can. J. Remote Sens. 2016, 42, 619–641.
10. Karthigesu, J.; Owari, T.; Tsuyuki, S.; Hiroshima, T. Improving Individual Tree Crown Detection and Species Classification in a Complex Mixed Conifer–Broadleaf Forest Using Two Machine Learning Models with Different Combinations of Metrics Derived from UAV Imagery. Geomatics 2025, 5, 32.
11. Huang, Y.; Ou, B.; Meng, K.; Yang, B.; Carpenter, J.; Jung, J.; Fei, S. Tree Species Classification from UAV Canopy Images with Deep Learning Models. Remote Sens. 2024, 16, 3836.
12. Onishi, M.; Ise, T. Explainable Identification and Mapping of Trees Using UAV RGB Image and Deep Learning. Sci. Rep. 2021, 11, 903.
13. Wang, N.; Pu, T.; Zhang, Y.; Liu, Y.; Zhang, Z. More Appropriate DenseNetBL Classifier for Small Sample Tree Species Classification Using UAV-Based RGB Imagery. Heliyon 2023, 9, e20467.
14. Shi, W.; Wang, S.; Yue, H.; Wang, D.; Ye, H.; Sun, L.; Sun, J.; Liu, J.; Deng, Z.; Rao, Y.; et al. Identifying Tree Species in a Warm-Temperate Deciduous Forest by Combining Multi-Rotor and Fixed-Wing Unmanned Aerial Vehicles. Drones 2023, 7, 353.
15. Ngo, D.T. Mapping Tree Species of Wetlands Using Multispectral Images of UAVs and Machine Learning: A Case Study of the Dong Rui Commune. Heliyon 2024, 10, e35159.
16. Sivanandam, P.; Lucieer, A. Tree Detection and Species Classification in a Mixed Species Forest Using Unoccupied Aircraft System (UAS) RGB and Multispectral Imagery. Remote Sens. 2022, 14, 4963.
17. Donnini, J.; Kross, A.; Alejo, C. Spectral Diversity as a Predictor of Tree Diversity: Exploring Challenges and Opportunities Across Forest Ecosystems. Can. J. Remote Sens. 2024, 50, 2403495.
18. Ecke, S.; Stehr, F.; Frey, J.; Tiede, D.; Dempewolf, J.; Klemmt, H.-J.; Endres, E.; Seifert, T. Towards Operational UAV-Based Forest Health Monitoring: Species Identification and Crown Condition Assessment by Means of Deep Learning. Comput. Electron. Agric. 2024, 219, 108785.
19. Xu, Z.; Shen, X.; Cao, L.; Coops, N.C.; Goodbody, T.R.H.; Zhong, T.; Zhao, W.; Sun, Q.; Ba, S.; Zhang, Z.; et al. Tree Species Classification Using UAS-Based Digital Aerial Photogrammetry Point Clouds and Multispectral Imageries in Subtropical Natural Forests. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102173.
20. Hartling, S.; Sagan, V.; Maimaitijiang, M. Urban Tree Species Classification Using UAV-Based Multi-Sensor Data Fusion and Machine Learning. GIScience Remote Sens. 2021, 58, 1250–1275.
21. Li, X.; Wang, L.; Guan, H.; Chen, K.; Zang, Y.; Yu, Y. Urban Tree Species Classification Using UAV-Based Multispectral Images and LiDAR Point Clouds. J. Geovis Spat. Anal. 2024, 8, 5.
22. Hell, M.; Brandmeier, M.; Briechle, S.; Krzystek, P. Classification of Tree Species and Standing Dead Trees with Lidar Point Clouds Using Two Deep Neural Networks: PointCNN and 3DmFV-Net. PFG 2022, 90, 103–121.
23. Zhang, H.; Liu, B.; Yang, B.; Guo, J.; Hu, Z.; Zhang, M.; Yang, Z.; Zhang, J. Efficient Tree Species Classification Using Machine and Deep Learning Algorithms Based on UAV-LiDAR Data in North China. Front. For. Glob. Change 2025, 8, 1431603.
24. Ma, Y.; Zhao, Y.; Im, J.; Zhao, Y.; Zhen, Z. A Deep-Learning-Based Tree Species Classification for Natural Secondary Forests Using Unmanned Aerial Vehicle Hyperspectral Images and LiDAR. Ecol. Indic. 2024, 159, 111608.
25. Wang, L.; Guan, H.; Lu, D.; Zhang, D.; Li, J. Exploring Transfer Learning for Individual Tree Species Classification by Cross-Platform Point Cloud. Int. J. Appl. Earth Obs. Geoinf. 2026, 149, 105247.
26. Bayrak, O.C.; Erdem, F.; Uzar, M. Deep Learning Based Aerial Imagery Classification for Tree Species Identification. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, XLVIII-M-1–2023, 471–476.
27. Kattenborn, T.; Leitloff, J.; Schiefer, F.; Hinz, S. Review on Convolutional Neural Networks (CNN) in Vegetation Remote Sensing. ISPRS J. Photogramm. Remote Sens. 2021, 173, 24–49.
28. Natesan, S.; Armenakis, C.; Vepakomma, U. Individual Tree Species Identification Using Dense Convolutional Network (DenseNet) on Multitemporal RGB Images from UAV. J. Unmanned Veh. Sys. 2020, 8, 310–333.
29. Kentsch, S.; Lopez Caceres, M.L.; Serrano, D.; Roure, F.; Diez, Y. Computer Vision and Deep Learning Techniques for the Analysis of Drone-Acquired Forest Images, a Transfer Learning Study. Remote Sens. 2020, 12, 1287.
30. Fromm, M.; Schubert, M.; Castilla, G.; Linke, J.; McDermid, G. Automated Detection of Conifer Seedlings in Drone Imagery Using Convolutional Neural Networks. Remote Sens. 2019, 11, 2585.
31. Mahdianpari, M.; Salehi, B.; Rezaee, M.; Mohammadimanesh, F.; Zhang, Y. Very Deep Convolutional Neural Networks for Complex Land Cover Mapping Using Multispectral Remote Sensing Imagery. Remote Sens. 2018, 10, 1119.
32. He, K.; Girshick, R.; Dollár, P. Rethinking ImageNet Pre-Training. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4917–4926.
33. Fassnacht, F.E.; Latifi, H.; Stereńczak, K.; Modzelewska, A.; Lefsky, M.; Waser, L.T.; Straub, C.; Ghosh, A. Review of Studies on Tree Species Classification from Remotely Sensed Data. Remote Sens. Environ. 2016, 186, 64–87.
34. Dalponte, M.; Bruzzone, L.; Gianelle, D. Tree Species Classification in the Southern Alps Based on the Fusion of Very High Geometrical Resolution Multispectral/Hyperspectral Images and LiDAR Data. Remote Sens. Environ. 2012, 123, 258–270.
35. Qasim, H.; Ding, X.; Usman, M.; Abbas, S.; Shahzad, N.; Keshk, H.M.; Bilal, M.; Ahmad, U. Advancing Tree Species Classification with Multi-Temporal UAV Imagery, GEOBIA, and Machine Learning. Geomatics 2025, 5, 42.
36. Heinzel, J.; Koch, B. Exploring Full-Waveform LiDAR Parameters for Tree Species Classification. Int. J. Appl. Earth Obs. Geoinf. 2011, 13, 152–160.
37. MicaSense. User Guide for MicaSense Sensors; MicaSense: Seattle, WA, USA, 2026.
38. Agisoft LLC. Agisoft Metashape User Manual; Agisoft LLC: St. Petersburg, Russia, 2025.
39. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2017; pp. 2261–2269.
40. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946.
41. Wang, M.; Deng, W. Deep Visual Domain Adaptation: A Survey. Neurocomputing 2018, 312, 135–153.
42. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
43. Csurka, G. Domain Adaptation for Visual Applications: A Comprehensive Survey. arXiv 2017, arXiv:1702.05374.
44. Zhou, Y.; Zhang, X.; Wang, Y.; Zhang, B. Transfer Learning and Its Application Research. J. Phys. Conf. Ser. 2021, 1920, 012058.
45. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2018, arXiv:1708.02002.
46. Vepakomma, U.; Cormier, D.; Hansson, L.; Talbot, B. Remote Sensing at Local Scales for Operational Forestry. In Boreal Forests in the Face of Climate Change; Girona, M.M., Morin, H., Gauthier, S., Bergeron, Y., Eds.; Advances in Global Change Research; Springer International Publishing: Cham, Switzerland, 2023; Volume 74, pp. 657–682.
47. Vepakomma, U. Changing Perspectives with Precision: Low-Cost Solutions for Operational Forestry; FPInnovations: Pointe-Claire, QC, Canada, 2024; p. 34.

Figure 1. (a) DJI M210 platform carrying the MicaSense RedEdge-M; (b) MicaSense RedEdge-M sensor; (c) reflectance calibration panel; (d) spectral reflectance curves from the MSS sensor (courtesy of MicaSense).

Figure 2. Orthomosaic of multispectral images (Bands 5, 4 and 3 assigned to R, G, B channels).

Figure 3. Orthomosaic of RGB images.

Figure 4. Tree Crown segmentation.

Figure 5. Examples of individual tree crowns of different species extracted from the multispectral orthomosaic, shown using bands 5, 4, and 3 assigned to the red, green, and blue channels, respectively.

Figure 6. Schematic overview of the methodological workflow.

Figure 7. DenseNet architecture illustrating dense connectivity and feature reuse.

Figure 8. Proposed supervised domain adaptation framework for tree species classification. A DenseNet-121 model pretrained on high-resolution RGB imagery (top) is adapted for classification using lower-resolution multispectral imagery (bottom) by transferring and modifying the initial convolutional layer weights.

Figure 9. Image of a red pine tree canopy (a) before resampling (8 cm resolution) and (b) after resampling (20 cm resolution).

Figure 10. Learning curves showing training and validation accuracy and loss in the pretrained model. (The final reported test results correspond to the model checkpoint obtained at epoch 247, which achieved the highest validation accuracy during training).

Figure 11. Species-level recall and precision scores with 95% confidence intervals.

Figure 12. Spatial distribution of correctly (green) and incorrectly (red) classified tree crowns in the independent test dataset.

Table 1. Spectral characteristics of the MicaSense RedEdge-M.

Band Number	Band Name	Centre Wavelength (nm)	Bandwidth (nm)
1	Blue	475	20
2	Green	560	20
3	Red	668	10
4	Near IR	840	40
5	Red Edge	717	10

Table 2. Summary of the distribution of labelled dataset.

Species Name	Total	Training/Cross Validation	Testing
black ash	307	182	125
eastern white cedar	100	68	32
balsam fir	148	124	24
red maple	81	56	25
red oak	141	91	50
red pine	272	206	66
white spruce	134	102	32
white birch	109	80	29
eastern white pine	340	272	68
Total	1632	1181	451
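
The training counts in Table 2 are uneven (56 red maple crowns against 272 eastern white pine crowns), which is the imbalance that class weighting is meant to offset. The sketch below derives inverse-frequency weights from those counts. The formula n_samples / (n_classes × count) is a common "balanced" heuristic and is assumed here; the exact weights used in the experiments are not given.

```python
# Training/cross-validation counts per species, copied from Table 2.
train_counts = {
    "black ash": 182, "eastern white cedar": 68, "balsam fir": 124,
    "red maple": 56, "red oak": 91, "red pine": 206,
    "white spruce": 102, "white birch": 80, "eastern white pine": 272,
}

def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {name: n_samples / (n_classes * c) for name, c in counts.items()}

weights = balanced_class_weights(train_counts)
for name, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} {w:.2f}")
```

Under this heuristic the rarest species receives a weight above 1 and the most common species a weight below 1, so each class contributes roughly equally to the loss.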

Table 3. Summary of the experiments conducted using DenseNet-121 and EfficientNet Models.

Model	Loss Function	Regularization Methods	Accuracy
EfficientNet B7	Categorical Cross-Entropy	Dropout	35%
EfficientNet B3	Categorical Cross-Entropy	Dropout	42%
EfficientNet B0	Categorical Cross-Entropy	Dropout	43%
DenseNet-121	Categorical Cross-Entropy (class_weight used to handle class imbalance)	Dropout	56%
DenseNet-121	Focal	Dropout	62%
DenseNet-121	Focal	Dropout; L2 regularization/weight decay	62%

Table 4. Summary of the experiments conducted using DenseNet-121 pretrained on RGB imagery.

Experiment	Model	Pretrained from RGB Model?	Loss Function	Regularization Methods	Overall Accuracy
1	DenseNet-121	Yes	Focal	Dropout; L2 regularization/weight decay	66%
2	DenseNet-121	Yes	Categorical Cross-Entropy	Dropout; L2 regularization/weight decay	69%
3	DenseNet-121	Yes	Categorical Cross-Entropy	Dropout; L2 regularization/weight decay; data augmentation	75%
4	DenseNet-121	Yes	Focal	Dropout; L2 regularization/weight decay; data augmentation	75%
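
Tables 3 and 4 compare categorical cross-entropy with focal loss. The difference between the two can be sketched in NumPy as below. This follows the standard focal loss definition, FL = -(1 - p_t)^γ log(p_t), without the optional class-balancing factor. The value γ = 2 and the example probabilities are assumptions for illustration; the settings used in the experiments are not listed in the tables.

```python
import numpy as np

def categorical_cross_entropy(probs, labels):
    """Per-sample cross-entropy given predicted class probabilities."""
    p_t = np.clip(probs[np.arange(len(labels)), labels], 1e-12, 1.0)
    return -np.log(p_t)

def focal_loss(probs, labels, gamma=2.0):
    """Per-sample focal loss; gamma = 0 reduces to cross-entropy."""
    p_t = np.clip(probs[np.arange(len(labels)), labels], 1e-12, 1.0)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

# Two samples whose true class is 0: one easy (p_t = 0.9), one hard (p_t = 0.2).
probs = np.array([[0.9, 0.05, 0.05],
                  [0.2, 0.70, 0.10]])
labels = np.array([0, 0])

ce = categorical_cross_entropy(probs, labels)
fl = focal_loss(probs, labels)
print("cross-entropy:", np.round(ce, 3))
print("focal loss:   ", np.round(fl, 3))
```

The modulating factor shrinks the loss of the easy sample far more than that of the hard one, so training concentrates on crowns the model currently misclassifies.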

Table 5. Heatmap representation of the confusion matrix for the pretrained DenseNet-121 model.

		PREDICTED LABEL
	SPECIES	BA	EWC	BF	RM	RO	RP	WS	WB	EWP	TOTAL	RECALL
TRUE LABEL	BA	96	0	4	7	2	3	1	7	5	125	0.77
	EWC	0	26	2	0	0	0	4	0	0	32	0.81
	BF	0	0	20	0	0	0	2	1	1	24	0.83
	RM	7	0	0	11	1	0	0	6	0	25	0.44
	RO	0	0	0	3	42	0	0	4	1	50	0.84
	RP	0	0	0	0	0	55	3	0	8	66	0.83
	WS	0	2	6	0	0	3	18	0	3	32	0.56
	WB	4	0	0	3	0	5	0	17	0	29	0.59
	EWP	2	0	1	0	8	4	0	0	53	68	0.78
	TOTAL	109	28	33	24	53	70	28	35	71	451
	PRECISION	0.88	0.93	0.61	0.46	0.79	0.79	0.64	0.49	0.75
	ACCURACY	0.75
	MACRO F1 SCORE	0.706
	WEIGHTED F1 SCORE	0.719

BA: black ash, EWC: eastern white cedar, BF: balsam fir, RM: red maple, RO: red oak, RP: red pine, WS: white spruce, WB: white birch, EWP: eastern white pine. Cell shading progresses from light blue to darker blue as values increase.
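
The summary statistics in Table 5 follow directly from the cell counts. The sketch below recomputes per-species recall and precision, overall accuracy, and the macro-F1 score from the confusion matrix, with rows as true labels and columns as predicted labels. The variable names are illustrative.

```python
import numpy as np

labels = ["BA", "EWC", "BF", "RM", "RO", "RP", "WS", "WB", "EWP"]
# Rows: true label; columns: predicted label (counts copied from Table 5).
cm = np.array([
    [96,  0,  4,  7,  2,  3,  1,  7,  5],
    [ 0, 26,  2,  0,  0,  0,  4,  0,  0],
    [ 0,  0, 20,  0,  0,  0,  2,  1,  1],
    [ 7,  0,  0, 11,  1,  0,  0,  6,  0],
    [ 0,  0,  0,  3, 42,  0,  0,  4,  1],
    [ 0,  0,  0,  0,  0, 55,  3,  0,  8],
    [ 0,  2,  6,  0,  0,  3, 18,  0,  3],
    [ 4,  0,  0,  3,  0,  5,  0, 17,  0],
    [ 2,  0,  1,  0,  8,  4,  0,  0, 53],
])

tp = np.diag(cm)
recall = tp / cm.sum(axis=1)       # correct / reference crowns per species
precision = tp / cm.sum(axis=0)    # correct / predicted crowns per species
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()
macro_f1 = f1.mean()               # unweighted mean over the nine species

for name, r, p in zip(labels, recall, precision):
    print(f"{name:4s} recall={r:.2f} precision={p:.2f}")
print(f"accuracy={accuracy:.2f} macro-F1={macro_f1:.3f}")
```

Because macro-F1 averages the species equally, the weaker classes (red maple, white birch, white spruce) pull it below the overall accuracy.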

Table 6. Heatmap representation of the confusion matrix for species classification using the adapted DenseNet-121 model on the downsampled MS dataset.

		PREDICTED LABEL
	SPECIES	BA	EWC	BF	RM	RO	RP	WS	WB	EWP	TOTAL	RECALL
TRUE LABEL	BA	87	2	6	4	1	3	0	3	19	125	0.70
	EWC	0	23	5	0	0	0	4	0	0	32	0.72
	BF	0	0	16	0	0	1	1	0	6	24	0.67
	RM	8	0	0	5	3	0	0	9	0	25	0.20
	RO	0	0	4	0	35	0	0	4	7	50	0.70
	RP	0	0	0	0	0	60	0	0	6	66	0.91
	WS	0	1	11	0	0	5	8	0	7	32	0.25
	WB	4	0	1	1	3	0	1	18	1	29	0.62
	EWP	0	0	0	1	4	3	0	0	60	68	0.88
	TOTAL	99	26	43	11	46	72	14	34	106	451
	PRECISION	0.88	0.88	0.37	0.45	0.76	0.83	0.57	0.53	0.57
	ACCURACY	0.69
	MACRO F1 SCORE	0.615
	WEIGHTED F1 SCORE	0.655

BA: black ash, EWC: eastern white cedar, BF: balsam fir, RM: red maple, RO: red oak, RP: red pine, WS: white spruce, WB: white birch, EWP: eastern white pine. Cell shading progresses from light blue to darker blue as values increase.
