Terrain-Aware Self-Supervised Representation Learning for Tree Species Mapping in Mountainous Regions Under Limited Field Samples

He, Li; Wang, Leiguang; Hong, Liang; Dai, Qinling; Gu, Wei; Du, Xingyue; Yang, Mingqi; Liu, Juanjuan; Feng, Yaoming

doi:10.3390/rs18060951

by Li He ¹,², Leiguang Wang ¹,², Liang Hong ³,⁴,*, Qinling Dai ⁵, Wei Gu ⁶, Xingyue Du ¹,², Mingqi Yang ¹,², Juanjuan Liu ¹,² and Yaoming Feng ⁷

¹ College of Landscape Architecture and Horticulture, Southwest Forestry University, Kunming 650224, China
² Yunnan Key Laboratory of Landscape Plant Resource Cultivation and Application, Southwest Forestry University, Kunming 650224, China
³ Faculty of Geography, Yunnan Normal University, Kunming 650500, China
⁴ China and GIS Technology Research Centre of Resource and Environment in Western China of Ministry of Education, Yunnan Normal University, Kunming 650500, China
⁵ Art and Design College, Southwest Forestry University, Kunming 650224, China
⁶ Shanghai Hecaray Technology Co., Ltd., Shanghai 201100, China
⁷ Territorial Space Information College, Yunnan Land and Resources Vocational College, Kunming 652501, China
* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(6), 951; https://doi.org/10.3390/rs18060951

Submission received: 20 January 2026 / Revised: 10 March 2026 / Accepted: 18 March 2026 / Published: 21 March 2026

Highlights

What are the main findings?

Self-supervised representation learning provides a practical solution for large-area tree species mapping in mountainous regions where field inventory samples are scarce, imbalanced, or temporally inconsistent.
The proposed framework offers transferable insights for operational forest inventory and ecological monitoring by improving mapping robustness without increasing field survey costs.

What are the implications of the main findings?

The framework offers transferable methodological insights for operational forest inventory and ecological monitoring in data-scarce mountainous regions.
Improved mapping robustness can be achieved without increasing field survey intensity or associated costs, supporting cost-effective large-area forest assessments.

Abstract

Accurate tree species mapping is critical for forest inventory, biodiversity assessment, and ecosystem management. In mountainous regions, terrain-induced radiometric non-stationarity and limited field access often produce scarce, clustered, and environmentally biased samples, limiting model generalization. To address this issue, this study proposes a terrain-aware self-supervised representation learning framework for tree species classification under small-sample conditions. The framework integrates terrain information into representation learning and adopts a hybrid contrastive–generative self-supervised strategy to learn discriminative and terrain-robust features from large volumes of unlabeled multi-source remote sensing data. These learned representations are subsequently combined with limited field samples to produce regional-scale tree species maps. Experiments conducted across Yunnan Province, China, using Sentinel-1, Sentinel-2 and Landsat time-series data show that the proposed framework substantially improves class separability and classification robustness in complex mountainous environments. The framework achieves an overall accuracy of 75.8%, significantly outperforming conventional feature engineering (38.3–40.6%) and supervised deep learning models (37.3–47.8%). Species with relatively homogeneous structure and strong ecological niche dependence can be accurately mapped with limited training samples, whereas structurally complex forest communities require broader environmental sample coverage. Overall, the results highlight the potential of terrain-aware self-supervised representation learning as a scalable and data-efficient paradigm for forest mapping in mountainous and environmentally heterogeneous regions.

Keywords:

tree species mapping; self-supervised learning; mountainous forest; remote sensing time series; terrain effects; limited training samples; forest inventory

1. Introduction

Accurate tree species information is fundamental for forest resource inventory, biodiversity conservation, ecosystem service assessment, and climate change mitigation [1,2]. In mountain regions, forests play a critical ecological role in regulating hydrological processes, stabilizing soils, and maintaining biodiversity across steep elevation gradients [3]. However, reliable tree species mapping in such environments remains a long-standing challenge in remote sensing due to strong terrain-induced radiometric variability, fragmented forest distributions, and pronounced ecological heterogeneity [4,5]. While satellite remote sensing is an indispensable data source for large-area tree species mapping [6,7,8,9], most existing classification approaches implicitly assume radiometric stationarity and sufficient labeled samples [10,11,12]. In mountain landscapes, variations in illumination, shadowing, viewing geometry, and terrain effects substantially distort spectral and backscattering responses, leading to weak class separability even for ecologically distinct forest types. Meanwhile, field surveys in mountainous regions are often constrained by limited accessibility and high costs, resulting in training samples that are scarce, spatially clustered, and environmentally unbalanced [13]. In this study, small-sample conditions refer not merely to limited sample count, but to training data that fail to adequately represent environmental gradients, terrain variability, and ecological heterogeneity at the study scale. Such environmentally unrepresentative samples substantially undermine the generalization ability of conventional supervised classification models [14]. Therefore, achieving accurate tree species classification in mountainous regions under small-sample conditions has become a central challenge in forestry remote sensing.

To alleviate sample scarcity, numerous strategies have been explored, including data augmentation, transfer learning, few-shot learning, and rule-based classification. Data augmentation [15] and generative learning [16] methods artificially expand training sample size, but may introduce unrealistic spectral patterns or fail to preserve ecological consistency in heterogeneous mountainous forests. Transfer learning [17] and few-shot learning [18] approaches leverage pretrained models or meta-knowledge to reduce labeling requirements, yet their performance often degrades due to domain shifts in species composition, climate regimes, and topographic complexity. Rule-based approaches construct spectral indices, texture metrics, phenological features, and environmental variables derived from multisource remote sensing data [19]; however, their effectiveness is constrained by terrain-induced radiometric variability and unstable feature–class relationships across environmental gradients. Although supervised machine learning and deep learning approaches can capture nonlinear relationships, they require large and representative labeled datasets to learn robust decision boundaries [20]. Under small-sample conditions, these models tend to overfit spatially clustered samples and fail to generalize across heterogeneous terrain and ecological gradients. Increasing sample quantity alone cannot resolve this limitation when environmental representativeness remains insufficient.

In this context, self-supervised learning (SSL) has emerged as a promising paradigm for remote sensing representation learning by exploiting the intrinsic structure of large volumes of unlabeled data [21]. By learning invariant and task-relevant features without relying on manual annotations, SSL offers a potential pathway to mitigate sample scarcity and improve representation robustness [21,22,23]. Existing SSL methods can be broadly categorized into generative and contrastive paradigms. Generative approaches, such as masked image modeling, emphasize local spatial and spectral reconstruction and are effective in capturing fine-scale canopy structures, but often lack discriminative power for forest types with subtle inter-class differences [24]. In contrast, contrastive learning focuses on global semantic separability by enforcing representation consistency across multiple views, yet may overlook local structural cues critical for distinguishing heterogeneous forest canopies in complex terrain [25]. When applied independently, both paradigms may inadvertently encode terrain-induced illumination artifacts as discriminative features, thereby limiting their robustness in mountainous environments.

Recently, hybrid self-supervised frameworks that integrate generative and contrastive objectives have been proposed to jointly model local structural consistency and global semantic information [22,26]. Among them, Contrastive Masked Image Distillation (CMID) provides a unified learning framework that combines masked reconstruction with contrastive alignment [26]. While CMID has demonstrated promising performance in general remote sensing representation learning tasks, its applicability to tree species classification in mountain environments characterized by strong terrain effects, fragmented forest distributions, and limited labeled samples has not yet been systematically examined. More importantly, the conditions under which such hybrid self-supervised representations effectively reduce dependence on labeled samples, and their limitations across forest types with varying ecological complexity, remain unclear.

To address these gaps, this study frames self-supervised representation learning as a terrain-aware solution to forest type classification in mountainous regions. Using multi-source optical, SAR, and environmental data, we investigate whether self-supervised representations that jointly encode local canopy structures and global ecological semantics can improve tree species classification under small-sample conditions. Rather than proposing a new network architecture, this study focuses on systematically analyzing the effectiveness, robustness, and limitations of hybrid self-supervised learning for mountainous forest mapping. Specifically, the objectives of this study are to: (1) evaluate the capability of contrastive–generative self-supervised representations to mitigate terrain-induced radiometric non-stationarity in mountain forests; (2) analyze how self-supervised features affect class separability and classification performance under limited and spatially unbalanced training samples; and (3) assess the relationship between forest ecological complexity, sample representativeness, and the effectiveness of small-sample classification strategies in mountainous environments.

2. Study Area and Datasets

2.1. Study Area

Yunnan Province (97°31′–106°11′E, 21°8′–29°15′N), located in southwestern China (Figure 1a), is characterized by a large elevation gradient (Figure 1b), complex mountain terrain (Figure 1c), and diverse climate and ecological conditions [27]. Forest ecosystems in this region exhibit strong spatial heterogeneity in species composition, stand structure, and phenology, which poses substantial challenges for remote-sensing-based tree species mapping [28]. The pronounced topographic relief further induces significant radiometric variability caused by illumination differences, shadowing, and terrain–sensor geometry effects. These characteristics make Yunnan Province an ideal testbed for evaluating the robustness of tree species classification methods under terrain-induced non-stationarity and limited training sample conditions.

2.2. Collecting Field Survey Samples for Tree Species

Field survey samples were collected from a combination of recent forest inventory records and targeted field campaigns. Due to limited accessibility, steep terrain, and high logistical costs in mountainous regions, field plots established for the target year (2023) were inevitably sparse, spatially clustered, and unevenly distributed across forest types and environmental gradients (Figure 2a). In total, 3120 forest plots were available as training samples for tree species classification. As summarized in Table 1, the number of training samples varies substantially among tree species, reflecting both the natural distribution of forest types and practical constraints of field surveys in complex terrain. Dominant and widely distributed forest types, such as Yunnan pine, oak forest, and other broadleaved forests, are relatively well represented, whereas high-elevation coniferous forests (e.g., Abies–Picea forest), bamboo forests, and rubber plantations are characterized by much smaller sample sizes. This pronounced class imbalance is typical for large-area forest inventories in mountainous regions, where accessibility and forest fragmentation strongly influence sampling design. Rather than artificially balancing sample sizes, this study deliberately retains the original sample distribution to realistically reflect operational forest inventory conditions and to evaluate the robustness of classification methods under severe sample imbalance.

To enhance survey efficiency and logistical feasibility in the mountainous environment, sample plots were predominantly established along accessible road corridors. Each plot covered an area of 0.09 ha and was designed following a standardized forest inventory protocol. Its corresponding forest type was determined through a systematic field protocol comprising three key steps. A 30 m × 30 m plot was delineated using ropes and precisely georeferenced with a Real-Time Kinematic Global Positioning System (RTK-GPS), ensuring accurate correspondence between field observations and remote sensing pixels [29]. Within each plot, all trees with a diameter at breast height (DBH) greater than 5 cm were visually identified to the species level based on leaf morphology, and the canopy cover contributed by each species was estimated. Tree species labels were then assigned according to the relative proportion of canopy cover: plots dominated by a single species accounting for at least 65% of total canopy cover were classified as pure forest types, whereas plots without a dominant species were labeled as mixed forest [30].
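The canopy-cover labeling rule described above can be illustrated with a minimal sketch. The function name `assign_plot_label`, the dictionary input, and the example species are hypothetical; only the 65% dominance threshold and the pure/mixed distinction are taken from the text.

```python
def assign_plot_label(canopy_cover, pure_threshold=0.65):
    """Return the dominant species, or 'mixed forest' if no species dominates.

    canopy_cover maps species name -> estimated canopy cover (any consistent unit).
    pure_threshold is the minimum share of total canopy cover for a pure label.
    """
    total = sum(canopy_cover.values())
    if total <= 0:
        raise ValueError("plot has no canopy cover")
    # Species contributing the largest share of canopy cover
    species, cover = max(canopy_cover.items(), key=lambda kv: kv[1])
    if cover / total >= pure_threshold:
        return species
    return "mixed forest"


print(assign_plot_label({"Yunnan pine": 70.0, "oak": 30.0}))
print(assign_plot_label({"Yunnan pine": 50.0, "oak": 50.0}))
```

The first call yields a pure-forest label because one species exceeds the threshold; the second falls back to the mixed-forest label.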

Independent validation samples were obtained from forest resource inventory datasets collected in 2012 and 2016. To ensure temporal consistency between historical records and the target year, a forest-succession-aware historical sample transfer strategy was applied [31], retaining only samples with stable species labels and clear canopy conditions (Figure 2b). These temporally consistent samples were used exclusively for accuracy assessment and were spatially more evenly distributed than the training samples, providing an independent evaluation of classification performance under realistic sampling constraints.

2.3. Multi-Source Remote Sensing Data Acquisition and Preprocessing

This study integrates multi-source remote sensing data to capture complementary canopy structural, spectral, and environmental features in Yunnan Province. These datasets comprise optical imagery, synthetic aperture radar (SAR) data, and environmental data.

2.3.1. Optical Remote Sensing Data

Optical imagery consists of Sentinel-2 surface reflectance imagery acquired over Yunnan Province between October 2022 and October 2023, as well as the complete Landsat archive (Landsat 5/7/8 Collection 2 Tier 1 surface reflectance) spanning 1986–2023, which was obtained from the Google Earth Engine (GEE) platform. The Sentinel-2 images were processed using an adjusted cloud scoring algorithm [32] to mask clouds, cloud shadows, and snow. An annual median composite image was then used to extract spectral, phenological, growth rate, and textural features of forest canopies.

To characterize phenological features, time-series red-edge position indices (REPI) [33] were generated at 10-day intervals using linear interpolation [34] and Savitzky–Golay filtering algorithms [35]. Statistical metrics describing intra-annual variability were subsequently calculated to represent phenological differences among forest types [36]. In addition, seasonal differences in Normalized Difference Vegetation Index (NDVI) maxima between summer and winter periods were used to further capture phenological contrasts [37]. Spatial texture features were derived from the Sentinel-2 Red Edge 1 band using gray-level co-occurrence matrix (GLCM) [38] measures to describe canopy structural heterogeneity.
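The gap-filling step can be sketched as follows, assuming irregularly spaced index observations (for example REPI values) that are linearly interpolated onto a regular 10-day grid. The function name, the day-of-period axis, and the edge handling (holding the first and last observed values) are illustrative assumptions; the subsequent Savitzky–Golay smoothing is deliberately omitted here.

```python
def interpolate_to_grid(days, values, step=10, start=0, end=360):
    """Linearly interpolate irregular observations onto a regular day grid.

    days/values: observation days and index values (e.g., REPI).
    Grid points outside the observed range take the nearest observed value.
    """
    pairs = sorted(zip(days, values))
    grid = list(range(start, end + 1, step))
    out = []
    for d in grid:
        if d <= pairs[0][0]:
            out.append(pairs[0][1])       # before the first observation
        elif d >= pairs[-1][0]:
            out.append(pairs[-1][1])      # after the last observation
        else:
            # Find the bracketing observations and interpolate between them
            for (d0, v0), (d1, v1) in zip(pairs, pairs[1:]):
                if d0 <= d < d1:
                    w = (d - d0) / (d1 - d0)
                    out.append(v0 + w * (v1 - v0))
                    break
    return grid, out


grid, series = interpolate_to_grid([0, 20], [0.0, 2.0], step=10, start=0, end=20)
print(grid, series)
```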

Long-term forest growth dynamics were characterized using annual cloud-free Landsat composites from 1986 to 2023. All Landsat images were preprocessed following standard procedures. Cloud, shadow, and snow were masked using CFMask [39]. Missing pixels in Landsat 7 ETM+ SLC-off scenes were reconstructed using the neighborhood similar pixel interpolation method [40]. Temporal gaps caused by cloud contamination were filled through linear interpolation [34], followed by Savitzky–Golay filtering to smooth residual noise and preserve phenological trajectories [35]. To ensure radiometric consistency among sensors, Landsat 8 OLI/TIRS data were normalized to Landsat TM using established cross-sensor calibration coefficients [41], resulting in a harmonized multi-sensor Landsat time series suitable for long-term analysis.

To maintain tree species temporal consistency across the multi-decadal record, forest disturbances and potential succession were explicitly considered. The LandTrendr disturbance detection algorithm [42] was applied to identify the most recent disturbance year for each pixel, capturing abrupt spectral changes associated with logging, fire, or severe canopy loss. Pixels exhibiting disturbance signals were regarded as having potential species transitions. For these pixels, only observations acquired after the last detected disturbance year were retained to construct the time series, ensuring that derived phenological, spectral, and texture features represent stable post-disturbance forest conditions. This disturbance-aware preprocessing strategy reduces the influence of long-term succession and spectral instability on feature extraction, thereby improving the reliability of tree species characterization from multi-decadal Landsat observations.
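A minimal sketch of the disturbance-aware filtering is given below. It assumes that the most recent disturbance year for a pixel has already been obtained (for example from LandTrendr output) and simply discards observations up to and including that year. The function name and the use of `None` for undisturbed pixels are hypothetical.

```python
def post_disturbance_series(years, values, last_disturbance_year=None):
    """Keep only observations acquired after the last detected disturbance.

    If no disturbance was detected (None), the full series is returned.
    """
    if last_disturbance_year is None:
        return list(years), list(values)
    kept = [(y, v) for y, v in zip(years, values) if y > last_disturbance_year]
    return [y for y, _ in kept], [v for _, v in kept]


years = [2000, 2001, 2002, 2003]
ndvi = [0.1, 0.2, 0.3, 0.4]
print(post_disturbance_series(years, ndvi, last_disturbance_year=2001))
```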

2.3.2. SAR Data

Compared with optical imagery, SAR data provide complementary information on forest vertical structure and are less affected by atmospheric conditions [37,43]. Sentinel-1 SAR data acquired between October 2022 and October 2023 were obtained from the GEE platform and composited using median values. A set of commonly used backscatter- and polarization-based indices was derived from VV and VH bands to represent forest structural features. Temporal variability metrics and seasonal features [44] of selected SAR-related indices were further calculated to capture phenological differences among tree species.

2.3.3. Environmental Condition

Given the strong influence of terrain, climate, and soil conditions on forest distribution in mountainous regions, environmental variables were incorporated to describe the ecological niche. Topographic factors, including elevation, slope, and aspect, were derived from 30 m Shuttle Radar Topography Mission (SRTM) digital elevation model data [45]. Climatic variables were obtained from the WorldClim v2.1 dataset [46] and include mean and seasonal metrics of temperature and precipitation [47]. These variables were incorporated to represent broad-scale ecological gradients rather than fine-scale microclimatic variability. Soil moisture conditions were characterized using a soil moisture response index [48] derived from soil moisture of China by the in situ data, version 1.0 (SMCI 1.0) products [49], representing the water availability for forest growth. To ensure spatial alignment with optical and SAR datasets, all environmental variables were resampled to 30 m using bilinear interpolation. This resampling was performed solely for spatial consistency and does not introduce additional spatial detail beyond the native resolution. In mountainous regions, microclimatic variability is strongly controlled by elevation, slope, and aspect; therefore, terrain-derived variables (elevation, slope, and aspect) were explicitly included to capture local terrain effects that are not represented in coarse-resolution climate data. While topographically informed downscaling may further improve microclimate representation, such approaches require dense in situ observations that are currently unavailable for the study area. This limitation is discussed in Section 5. A summary of all extracted forest canopy and environmental features is provided in Table 2. Detailed definitions of individual indices are provided in Appendix A (Table A1 and Table A2).

3. Method

3.1. Overall Workflow

This study adopts a terrain-aware self-supervised learning (TA-SSL) framework to address the limitations imposed by environmentally heterogeneous terrain and limited labeled samples in mountainous regions. The workflow consists of four main steps (Figure 3). First, canopy- and environment-related features were derived from multi-source remote sensing data to characterize canopy structural, spectral, phenological, and site-condition attributes. Second, an existing self-supervised learning framework, CMID [26], was applied to learn general-purpose feature representations from unlabeled data by exploiting the internal consistency of multi-dimensional observations and prior knowledge of forest distribution in plateau–mountain environments. Third, the learned representations were combined with a limited set of field survey samples to train a Random Forest classifier for tree species mapping across the entire study area. Finally, independent validation samples were used to evaluate the robustness and applicability of the proposed strategy in complex mountainous forest environments. TA-SSL extends the CMID framework by incorporating terrain-aware feature construction and a representation learning strategy tailored for environmentally heterogeneous mountainous regions.

3.2. Construction of Unlabeled Multi-Source Image Patches

Unlabeled data for self-supervised representation learning were constructed using all canopy- and environment-related feature layers (Table 2). All feature layers were co-registered to WGS84 and resampled to a spatial resolution of 30 m. Non-forest areas were masked using an existing forest/non-forest map (Figure 1d) [42]. The entire study area was then partitioned into fixed-size image patches of 224 × 224 pixels using a sliding window with 10% overlap, yielding 18,360 unlabeled image patches. For each patch, 84 feature layers were stacked to form a multi-channel image. These unlabeled patches served as input to the self-supervised learning stage, enabling representation learning without manual annotations.
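The sliding-window partitioning can be sketched as follows. The stride is assumed to be the patch size reduced by the overlap fraction and rounded to the nearest pixel (224 × 0.9 = 201.6, hence 202), and a final window is assumed to be placed flush with the raster border. Neither detail is specified in the text, and the forest mask filtering is not shown.

```python
def patch_origins(height, width, patch=224, overlap=0.10):
    """Return (row, col) upper-left corners of sliding-window patches.

    The stride rounding and the border handling are illustrative assumptions.
    """
    stride = max(1, int(round(patch * (1.0 - overlap))))

    def starts(size):
        if size <= patch:
            return [0]
        pos = list(range(0, size - patch + 1, stride))
        if pos[-1] != size - patch:
            pos.append(size - patch)  # last window aligned with the border
        return pos

    return [(r, c) for r in starts(height) for c in starts(width)]


print(patch_origins(224, 500))
```

Each origin would then be used to cut an 84-channel stack of 224 × 224 pixels from the co-registered feature layers.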

3.3. Terrain-Aware Self-Supervised Representation Learning

The representation learning module is built upon a CMID-style contrastive–generative architecture and is designed to capture complementary local structural and global semantic information.

3.3.1. Local Structural Representation Learning

Local structural representations were learned using a combination of masked image modeling and local contrastive learning. This design exploits the inherent spatial consistency of canopy features across spectral, structural, phenological, and environmental dimensions. Spectral features reflect biochemical properties of vegetation, SAR features describe forest structure and moisture conditions, phenological features capture seasonal growth dynamics, growth-rate features represent long-term development trends, and environmental variables constrain species distribution through site conditions.

(1): Masked image reconstruction

A simple masked image modeling (SimMIM) [50] approach was employed to reconstruct masked regions in both spatial and frequency domains. A subset of pixels in each image patch was randomly masked. The encoder processed the masked input, and a lightweight decoder predicted the original values, encouraging the model to learn fine-scale spatial structures. The spatial-domain reconstruction loss ($L_{spat}$) is defined as:

$$L_{spat} = \frac{1}{\Omega(x_{m})} \left\| x_{m} - x_{m}' \right\|_{1} \tag{1}$$

where $x_{m}$ and $x_{m}'$ denote the original and reconstructed pixel values in the masked regions, and $\Omega(x_{m})$ represents the feature dimensionality.
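As an illustration of Equation (1), the following sketch computes the mean absolute reconstruction error over masked elements only. It assumes that the normalizer $\Omega(x_{m})$ counts the masked elements, as in the original SimMIM formulation, and it operates on flat Python lists rather than image tensors; the function name is hypothetical.

```python
def masked_l1_loss(original, reconstructed, mask):
    """Mean absolute error restricted to masked elements (mask value 1 = masked)."""
    total, count = 0.0, 0
    for x, x_hat, m in zip(original, reconstructed, mask):
        if m:
            total += abs(x - x_hat)
            count += 1
    if count == 0:
        raise ValueError("mask selects no elements")
    return total / count


# Only the first and third elements are masked and contribute to the loss
print(masked_l1_loss([1.0, 2.0, 3.0], [1.5, 0.0, 2.0], [1, 0, 1]))
```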

To further improve reconstruction consistency in the frequency domain, a focal frequency loss (FFL) [51] was introduced:

$$L_{freq} = \frac{1}{N} \sum_{c=1}^{N} FFL(x_{c}, x_{c}') \tag{2}$$

where $N$ is the number of channels, and $x_{c}$ and $x_{c}'$ represent the original and reconstructed frequency-domain values of channel $c$.

Jointly optimizing the spatial and frequency losses enables the model to capture fine-scale canopy structure while preserving high-level semantic information.

(2): Prototype-based local contrastive learning

To reduce information loss caused by high masking ratios and fragmented forest distributions, a prototype-based local contrastive learning branch was introduced. Feature vectors extracted at corresponding spatial locations from the student and teacher networks were first aligned using absolute positional encoding:

$$l_{1} = i_{2} + \frac{w_{2}}{2W} + \frac{W_{input}}{W}(u - 1) \tag{3}$$

$$l_{2} = j_{2} + \frac{h_{2}}{2H} + \frac{H_{input}}{H}(v - 1) \tag{4}$$

where $i_{2}$, $j_{2}$, $w_{2}$, and $h_{2}$ denote the spatial parameters of the image patch, $H_{input} \times W_{input}$ represents the input image size, and $H \times W$ denotes the feature map size.

The aligned features $\{(x_{i}, \hat{x}_{i})\}_{i=1}^{N}$ were then projected onto a set of learnable prototypes $C \in \mathbb{R}^{K \times d}$, representing typical canopy structural and phenological patterns. Similarity distributions between features and prototypes were computed using a softmax function with different temperature parameters for the student and teacher networks:

$$p_{i} = \mathrm{Softmax}\left(\frac{\langle x_{i}, C \rangle}{\tau_{s}}\right) \tag{5}$$

$$q_{i} = \mathrm{Softmax}\left(\frac{\langle \hat{x}_{i}, C \rangle}{\tau_{t}}\right) \tag{6}$$

where $\tau_{s}$ and $\tau_{t}$ are temperature parameters set to 0.2 and 0.7, respectively. Local semantic alignment was enforced by minimizing a cross-entropy loss between the two distributions:

$$L_{local} = \frac{1}{N} \sum_{i=1}^{N} - p_{i} \cdot \log q_{i} \tag{7}$$

This objective promotes consistency between student and teacher representations at corresponding locations, enhancing sensitivity to fragmented forest structures and subtle phenological differences.
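Equations (5)–(7) can be sketched in plain Python as follows. The code follows Equation (7) as written, weighting the log of the teacher distribution by the student distribution and summing over prototypes. The function names, the list-based feature vectors, and the toy prototypes are hypothetical; a practical implementation would operate on batched tensors.

```python
import math


def log_softmax(scores, temperature):
    """Numerically stable log-softmax of temperature-scaled scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_prototype_loss(student_feats, teacher_feats, prototypes,
                         tau_s=0.2, tau_t=0.7):
    """Average cross-entropy between student (p) and teacher (q) prototype assignments."""
    losses = []
    for x, x_hat in zip(student_feats, teacher_feats):
        p = [math.exp(v) for v in log_softmax([dot(x, c) for c in prototypes], tau_s)]
        log_q = log_softmax([dot(x_hat, c) for c in prototypes], tau_t)
        losses.append(-sum(pk * lq for pk, lq in zip(p, log_q)))
    return sum(losses) / len(losses)


prototypes = [[1.0, 0.0], [0.0, 1.0]]
print(local_prototype_loss([[1.0, 0.0]], [[1.0, 0.0]], prototypes))
```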

3.3.2. Global Semantic Representation Learning

Global semantic representations were learned using a momentum contrastive learning (MoCo) strategy [52]. Multiple augmented views of each image patch were generated to capture invariant semantic information under varying illumination and terrain conditions. Global feature vectors were obtained through global average pooling and projected into a latent embedding space. Teacher-network representations of the same image served as positive samples, while a dynamically updated feature queue provided negative samples. The InfoNCE loss [52] was used as the contrastive objective:

$$L_{NCE} = -\log \frac{\exp(\langle q, k^{+} \rangle / \tau)}{\sum_{i=1}^{K} \exp(\langle q, k_{i} \rangle / \tau)} \tag{8}$$

where $q$ and $k^{+}$ denote the student and teacher global feature representations of the same image, $k_{i}$ represents negative samples from the queue, and $\tau$ is a temperature parameter. This loss promotes intra-class compactness and inter-class separability, reducing the influence of terrain-induced radiometric variability.
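The contrastive objective can be illustrated with the following sketch for a single query. It assumes, following the MoCo formulation, that embeddings are L2-normalized and that the denominator contains the positive key together with the queued negatives. The temperature is left as a required argument because no value is reported for the global branch, and all names are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(q, k_pos, negatives, tau):
    """InfoNCE loss for one query, one positive key, and a list of negative keys."""
    q = l2_normalize(q)
    keys = [l2_normalize(k_pos)] + [l2_normalize(k) for k in negatives]
    # Temperature-scaled similarities; index 0 is the positive pair
    logits = [sum(a * b for a, b in zip(q, k)) / tau for k in keys]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


# The loss is smaller when the query agrees with its positive key
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], tau=0.5))
print(info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]], tau=0.5))
```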

3.3.3. Joint Optimization of Local and Global Representation

The final representation was obtained by jointly optimizing the masked reconstruction loss, the global contrastive loss, and the local contrastive loss. The overall loss function is defined as:

$$L = \lambda_{1} (L_{spat} + L_{freq}) + \lambda_{2} L_{NCE} + \lambda_{3} L_{local} \tag{9}$$

where $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ control the relative contributions of each learning objective. All weights were set to 1 to ensure balanced optimization across learning branches.

3.3.4. Training Configuration

The TA-SSL backbone was implemented using PyTorch 1.8 and trained on an NVIDIA A100 GPU. Pre-training was conducted for 200 epochs with a batch size of 128. The model was optimized using the AdamW optimizer with an initial learning rate of 1 × 10⁻⁴, which was decayed following a cosine annealing schedule throughout the training process. A weight decay of 0.05 was applied for regularization. Following the TA-SSL framework design described in Section 3.3.3, the masking ratio for the MIM branch was set to 0.6, and the queue size for the contrastive learning branch was set to 65,536. These settings follow commonly adopted configurations in recent self-supervised remote sensing studies to ensure stable convergence and reproducibility.
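The learning-rate schedule can be sketched as below, assuming a per-epoch cosine decay from the initial rate of 1 × 10⁻⁴ to zero over 200 epochs without warm-up. The minimum rate and the absence of warm-up are assumptions, as neither is reported in the text.

```python
import math


def cosine_lr(epoch, total_epochs=200, base_lr=1e-4, min_lr=0.0):
    """Cosine-annealed learning rate at a given epoch (0 <= epoch <= total_epochs)."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 50, 100, 150, 200):
    print(epoch, cosine_lr(epoch))
```

Under these assumptions the rate starts at the initial value, reaches half of it at the midpoint of training, and decays to zero at the final epoch.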

3.4. Tree Species Classification

The pretrained representations were combined with 3120 field samples from the target year to train a Random Forest (RF) classifier [53]. RF was adopted as a classifier-agnostic evaluator of representation quality due to its robustness to high-dimensional inputs and spatially imbalanced samples. A partitioned classification strategy was applied, restricting prediction within ecologically similar forest type regions to reduce class imbalance. For the Random Forest classifier, the number of trees ranged from 10 to 400 with an increment of 10, and different feature selection strategies (auto, sqrt, and log2) were evaluated. The Random Forest classifier was configured with 300 trees and the “sqrt” feature selection strategy, which provided the best trade-off between accuracy and generalization. Tree species maps were generated at 30 m resolution across the study area.
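The hyperparameter search space described above can be enumerated as in the following sketch. The key names `n_estimators` and `max_features` are borrowed from scikit-learn conventions as an assumption, since the text does not name the software used, and `score_fn` is a hypothetical stand-in for a cross-validated accuracy estimate.

```python
from itertools import product


def rf_search_grid():
    """All combinations of tree count (10 to 400, step 10) and feature strategy."""
    n_trees = range(10, 401, 10)
    strategies = ("auto", "sqrt", "log2")
    return [{"n_estimators": n, "max_features": f}
            for n, f in product(n_trees, strategies)]


def select_best(grid, score_fn):
    """Return the configuration with the highest score under score_fn."""
    return max(grid, key=score_fn)


grid = rf_search_grid()
print(len(grid))  # 40 tree counts x 3 strategies
```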

3.5. Comparative Experiments

Provincial-scale tree species mapping in Yunnan Province typically relies on 8000–17,000 labeled samples to achieve stable model performance [54,55]. In contrast, this study intentionally restricts the training dataset to 3120 samples in order to simulate a small-sample scenario for large-area tree species classification. This design enables a controlled off-model robustness under constrained labeling conditions. To systematically evaluate the effectiveness of the proposed terrain-aware self-supervised representations learning framework, comparative experiments were conducted against both conventional feature optimization strategies and supervised deep learning approaches. Specifically, two widely adopted feature optimization approaches were included: Random-Forest-based feature importance ranking using mean decrease accuracy (RF–MDA ranking) [56]; and rank-correlation-based classification feature selection (RCCF selection) [54]. These approaches represent commonly used strategies for dimensionality reduction and discriminative feature extraction in tree species classification. Additionally, two supervised deep learning baselines were implemented: multi-layer perceptron (MLP) [57] and Transformer-based feature architectures [58]. These models provide benchmarks for evaluating whether performance gains arise from representation learning itself rather than from classifier complexity. To ensure experimental fairness, all comparative methods were initialized from the identical full feature set described in Table 2. For feature selection approaches (RF–MDA and RCCF), only features exceeding predefined importance or correlation thresholds were retained prior to classification. The detailed feature subsets selected for each tree species are provided in Appendix A (Table A3 and Table A4). All classifiers were trained and evaluated using the same training–validation samples and performance metrics. 
Hyperparameters for all comparative models were tuned using identical cross-validation procedures to prevent performance bias induced by differential optimization effort.
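The threshold-based retention step used by the feature selection baselines can be illustrated with a short sketch. The function name `select_by_threshold`, the scores, and the threshold value are hypothetical placeholders; the thresholds actually used in the study are not stated here.

```python
def select_by_threshold(scores, threshold):
    """Keep features whose score exceeds the threshold, highest score first."""
    kept = [(name, s) for name, s in scores.items() if s > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical importance scores (illustrative values only, not study results).
importance = {"Elevation": 0.12, "REPI": 0.09, "B5": 0.05, "Aspect": 0.01}

selected = select_by_threshold(importance, threshold=0.04)
print(selected)
```

The same function applies whether the score is an RF mean-decrease-accuracy importance or a rank-correlation coefficient; only the score dictionary and the threshold change.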

3.6. Classification Accuracy Assessment

To quantitatively evaluate the effectiveness of different feature learning strategies for tree species classification under small-sample conditions in mountain regions, all learned feature representations were evaluated using a unified RF classifier trained with the same set of training samples shown in Figure 2a. Tree species classification was conducted across Yunnan Province for each feature learning method, and classification performance was independently assessed using a validation sample with a relatively uniform spatial distribution (Figure 2b). Classification accuracy was quantified using five commonly adopted metrics: overall accuracy (OA), Kappa coefficient, producer’s accuracy (PA), user’s accuracy (UA), and F1-score. The corresponding formulations are provided below.

OA = \frac{\sum_{i=1}^{n} X_{ii}}{N} \times 100\%

(10)

Kappa = \frac{N \sum_{i=1}^{n} X_{ii} - \sum_{i=1}^{n} (X_{i+} \times X_{+i})}{N^{2} - \sum_{i=1}^{n} (X_{i+} \times X_{+i})}

(11)

PA_{i} = \frac{X_{ii}}{X_{i+}} \times 100\%

(12)

UA_{i} = \frac{X_{ii}}{X_{+i}} \times 100\%

(13)

F1\text{-}score_{i} = \frac{2 \times PA_{i} \times UA_{i}}{PA_{i} + UA_{i}}

(14)

where $n$ represents the number of forest types; $N$ denotes the total number of validation samples; $X_{ii}$ corresponds to the diagonal elements of the confusion matrix, representing the number of correctly classified samples for each forest type $i$; and $X_{i+}$ and $X_{+i}$ represent the column and row sums of the confusion matrix for forest type $i$, respectively, that is, the total number of reference samples and the total number of samples assigned to that forest type.
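As a worked illustration of these metrics, the following sketch computes OA, Kappa, PA, UA, and F1-score from a small confusion matrix. It is a minimal example under stated assumptions: rows hold mapped (classified) classes and columns hold reference classes, Kappa uses the standard chance-agreement term (the product of the marginal totals), and the 2 × 2 matrix is hypothetical rather than taken from the study.

```python
def accuracy_metrics(cm):
    """Compute accuracy metrics from a square confusion matrix.

    cm[r][c] is the number of validation samples mapped as class r whose
    reference class is c (assumed layout: rows = mapped, columns = reference).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    mapped_totals = [sum(cm[i]) for i in range(n)]  # row sums
    reference_totals = [sum(cm[r][i] for r in range(n)) for i in range(n)]  # column sums

    oa = 100.0 * sum(diag) / total
    chance = sum(m * r for m, r in zip(mapped_totals, reference_totals))
    kappa = (total * sum(diag) - chance) / (total ** 2 - chance)
    pa = [100.0 * d / r if r else 0.0 for d, r in zip(diag, reference_totals)]
    ua = [100.0 * d / m if m else 0.0 for d, m in zip(diag, mapped_totals)]

    # F1 is reported on a 0-1 scale, so PA and UA are converted to fractions.
    f1 = []
    for p, u in zip(pa, ua):
        p, u = p / 100.0, u / 100.0
        f1.append(2 * p * u / (p + u) if (p + u) else 0.0)
    return {"OA": oa, "kappa": kappa, "PA": pa, "UA": ua, "F1": f1}


# Hypothetical two-class confusion matrix (illustrative values only).
cm = [[45, 5],
      [15, 35]]
metrics = accuracy_metrics(cm)
print(metrics)
```

For this matrix, the first class has a higher UA than PA (few commission errors, more omission errors), while the second class shows the opposite pattern, which is the kind of asymmetry the class-level comparison is designed to expose.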

4. Results

4.1. Overall Classification Performance

Table 3 summarizes the overall classification performance of different feature learning strategies for tree species mapping in mountainous regions. Using the proposed terrain-aware self-supervised representations, the OA reached 75.80%, with a Kappa coefficient of 0.72, indicating substantial agreement between predicted and reference labels. In contrast, conventional feature engineering approaches based on manual feature selection achieved substantially lower accuracies, with OA values ranging from 38.33% to 40.55%. Supervised deep learning models trained directly on the limited labeled samples showed only moderate improvement, yielding OA values between 37.33% and 47.83%.

The magnitude of this performance gap indicates that classification accuracy is primarily constrained by the quality and robustness of feature representations rather than by classifier complexity. Under small-sample conditions, supervised models are unable to learn stable decision boundaries due to sample scarcity and environmental bias. By contrast, the proposed framework leverages large volumes of unlabeled data to learn terrain-robust representations, thereby substantially improving regional-scale classification reliability.

4.2. Classification Performance Across Individual Tree Species

Figure 4 compares class-level accuracies obtained using different feature learning strategies in terms of UA, PA, and F1-score. Feature selection methods (RCCF and RF–MDA) show limited improvements over the full feature set and exhibit strong variability among tree species, particularly for classes with few training samples.

For coniferous forest types, the proposed TA-SSL framework consistently achieves higher classification accuracy than all comparison methods. In particular, F1-scores for Yunnan pine, Simao pine, Picea–Abies forest, and other coniferous forests increase by approximately 0.16–0.64 relative to feature engineering and supervised deep learning approaches. These improvements are most pronounced for forest types with limited training samples.

For broadleaved forest types, feature engineering approaches generally outperform supervised deep learning models, reflecting the difficulty of training deep networks with highly imbalanced samples. Although TA-SSL shows slightly lower accuracy for some broadleaved categories, it maintains stable performance across forest types with highly uneven sample sizes. For example, TA-SSL achieves F1-scores of 0.39 for Picea–Abies forest, 0.69 for Simao pine, and 0.90 for rubber plantations, whereas supervised deep learning models exhibit near-zero accuracy for several of these classes.

UA and PA further emphasize these differences. TA-SSL achieves higher UA values for Yunnan pine (87.26%) and Simao pine (100%), and high PA values for Yunnan pine (99.05%) and rubber plantations (100%). Overall, TA-SSL provides more balanced class-level performance under severe sample imbalance.

4.3. Spatial Comparison of Tree Species Classification Maps

Figure 5 presents the spatial distribution of tree species classification maps generated using different feature learning methods across Yunnan Province. Pronounced spatial inconsistencies are observed among methods, particularly in mountainous regions characterized by complex terrain and fragmented forest distributions. Compared with feature selection and supervised learning approaches, TA-SSL produces spatial patterns that are more consistent with known ecological and geographical distributions. For example, rubber plantations are correctly identified by TA-SSL in low-elevation regions such as Xishuangbanna, Pu’er, Honghe, Lincang, and Dehong, whereas supervised deep learning models frequently misclassify these areas as other broadleaved forests. Similarly, TA-SSL accurately captures the spatial distribution of Simao pine in western Pu’er, while other methods confuse it with Yunnan pine or other coniferous species in central and northern Yunnan. High-elevation tree species, such as Picea–Abies forests that are strongly constrained by climate and topography, are also more reliably mapped by TA-SSL. Only TA-SSL and the full-feature-based approach correctly identify their distribution in northwestern Yunnan, whereas other methods exhibit widespread misclassification. These results indicate that TA-SSL effectively integrates structural, phenological, and environmental information, enabling improved discrimination under terrain-induced radiometric variability.

4.4. Local-Scale Validation and Detailed Spatial Consistency

While the province-scale maps in Figure 5 illustrate overall spatial patterns of tree species distribution and reveal clear differences among feature learning strategies, they do not fully capture classification behavior in heterogeneous and fragmented mountain landscapes. To further evaluate method performance at finer spatial scales and assess consistency with field observations, a local-scale validation using field photographs and high-resolution reference data was conducted (Figure 6). Pronounced discrepancies among classification results derived from different methods are evident at the local scale, particularly in areas characterized by complex terrain and high forest structural heterogeneity. In regions dominated by Picea–Abies forest (Figure 6(A1–A7)), only the CMID-based TA-SSL framework accurately identifies the target forest type, whereas all other methods misclassify these areas as Yunnan pine or Simao pine. A similar pattern is observed for Yunnan pine (Figure 6(B1–B7)), where correct identification is achieved only by TA-SSL and the full-feature-based approach, indicating persistent confusion among spectrally similar coniferous species when using alternative methods.

For forest types with intermediate ecological and structural characteristics, such as birch forests (Figure 6(C1–C7)), TA-SSL and selected feature-based approaches correctly capture local distributions, while other algorithms frequently confuse these areas with oak or other broadleaved forests. In contrast, tree species with distinctive spectral, phenological, and site-dependent characteristics, such as Simao pine in the Pu’er region (Figure 6(D1–D7)), are consistently well classified by all feature learning strategies, suggesting that these classes are inherently easier to discriminate regardless of the learning paradigm.

For rubber plantations (Figure 6(E1–E7)) and other broadleaved forests (Figure 6(F1–F7)), TA-SSL and the full-feature-based approach show clear advantages, whereas supervised deep learning models and other feature learning strategies frequently misclassify these areas as oak or birch forests. Overall, TA-SSL reduces confusion among spectrally and structurally similar forest types and produces more spatially coherent results in fragmented mountain environments.

5. Discussions

5.1. Terrain Robustness of TA-SSL in Mountainous Environments

Terrain-induced radiometric variability remains a major limitation for tree species mapping in mountainous regions. Illumination differences, terrain shadowing, and anisotropic reflectance distort spectral responses, reduce inter-scene consistency, and ultimately limit model transferability. Previous studies have attempted to mitigate these effects through topographic correction [59,60] or terrain-stratified modeling [61], which normalize reflectance or partition landscapes into homogeneous terrain units. Although effective in specific settings, these approaches rely on simplified illumination assumptions and often exhibit limited generalizability across sensors, seasons, and regions.

In contrast, TA-SSL addresses terrain effects at the representation level rather than through post hoc radiometric correction. By embedding terrain variables and terrain-aware objectives into feature learning, the model learns terrain-invariant representations that emphasize canopy structure, phenology, and ecological context. The improved spatial coherence observed in high-relief zones where terrain-induced illumination variability is strongest (Section 4.3 and Section 4.4) indicates that the learned features rely less on illumination-sensitive spectral responses and more on stable ecological attributes. From a mechanistic perspective, terrain robustness arises from the complementary roles of the generative and contrastive learning objectives. The generative objective promotes reconstruction of local spatial structure that remains relatively stable under varying illumination, enabling the encoding of terrain-independent structural cues. Meanwhile, the contrastive objective enforces semantic consistency across observations acquired under different lighting and viewing conditions, encouraging the model to prioritize invariant ecological characteristics over transient radiometric variations. Together, these mechanisms facilitate the learning of terrain-invariant features, thereby reducing sensitivity to shadowing, slope effects, and viewing geometry.

Compared with conventional terrain-stratified accuracy assessment used in earlier studies [62], our evaluation strategy employs a large independent validation dataset spanning the full range of elevation and slope conditions. This design reduces uncertainty associated with uneven terrain sampling and enables a more robust assessment of model generalization across terrain gradients [63]. Although the experiments were conducted in Yunnan Province, the underlying mechanism of terrain-aware representation learning is not region-specific. Similar terrain-induced radiometric variability occurs in many mountainous ecosystems worldwide, including alpine forests, subtropical mountain systems, and temperate high-relief landscapes. Therefore, integrating terrain information directly into representation learning provides a transferable methodological pathway for improving remote sensing classification in heterogeneous terrain environments.

5.2. Complementarity of Self-Supervised Learning Objectives

Recent remote sensing studies have demonstrated the potential of SSL to reduce dependence on labeled data [64], yet most applications rely on a single objective [65], such as masked image modeling (MIM) or contrastive learning (CL). Our results (Figure 7) show that TA-SSL, which jointly optimizes generative and contrastive objectives, consistently outperforms either objective alone. This finding indicates that single-objective SSL is insufficient to capture the intertwined structural complexity and environmental gradients of mountainous forests. MIM emphasizes local spatial continuity and is effective for learning canopy texture and structural patterns, consistent with findings in vegetation structure mapping studies. However, its discriminative capacity is limited for forest types with subtle spectral differences, particularly under terrain-induced radiometric variability. Conversely, CL enhances global semantic separability by enforcing consistency across augmented views and multi-source observations, a strategy widely reported to improve land-cover classification under heterogeneous conditions. Nevertheless, CL alone may overlook fine-scale structural cues critical for distinguishing ecologically similar tree species.

By jointly optimizing generative and contrastive objectives, TA-SSL captures both local structural consistency and global semantic separability. The reduced cross-class confusion shown in Figure 8 confirms that this dual-objective design improves discrimination among spectrally similar tree species. More importantly, the complementary interaction between structural reconstruction and semantic discrimination enables the model to learn representations that remain stable under both terrain-induced radiometric variability and ecological gradients. This finding suggests that multi-objective SSL may represent a more general strategy for remote sensing representation learning in environmentally heterogeneous landscapes where both spatial structure and ecological context influence spectral patterns.
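The idea of combining a generative and a contrastive objective can be sketched generically as a masked reconstruction error plus an InfoNCE term. This is an illustrative assumption, not the TA-SSL loss itself: the function names, the temperature, the weighting factor `weight`, and all numeric inputs are hypothetical.

```python
import math


def masked_mse(original, reconstructed, mask):
    """Mean squared error over masked positions only (mask[i] is True where hidden)."""
    errs = [(o - r) ** 2 for o, r, m in zip(original, reconstructed, mask) if m]
    return sum(errs) / len(errs)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: anchors[i] should match positives[i]; other rows act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


def joint_loss(recon, contrast, weight=1.0):
    """Weighted sum of the two objectives (weight is a hypothetical hyperparameter)."""
    return recon + weight * contrast


# Toy inputs (illustrative values only).
recon = masked_mse([0.2, 0.4, 0.6, 0.8], [0.2, 0.5, 0.6, 0.6],
                   [False, True, False, True])
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(recon, aligned, misaligned, joint_loss(recon, aligned))
```

The reconstruction term rewards recovering local structure at hidden positions, while the contrastive term is small only when embeddings of the same location under different views agree, which mirrors the local and global roles described above.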

5.3. Environmental Representativeness as the True Constraint of Small-Sample Learning

The concept of “small samples” in remote sensing is inherently scale-dependent and cannot be defined solely by absolute sample counts. In mountainous regions, environmental heterogeneity often makes environmental representativeness more critical than sample quantity. A modest number of environmentally representative samples may outperform a larger but spatially clustered dataset that fails to capture dominant terrain and ecological gradients. To examine this issue, training samples were progressively filtered based on geographic environmental similarity, prioritizing samples that represent major terrain and ecological conditions [66]. The results (Figure 9) show that classification accuracy remains stable once training samples capture dominant gradients, even when the absolute sample number is reduced. However, accuracy declines sharply when sample reduction leads to insufficient environmental coverage, indicating that representation learning cannot fully compensate for environmentally biased training samples. Compared with conventional supervised models, which are highly sensitive to sample imbalance, TA-SSL maintains stable performance under reduced-sample scenarios because self-supervised pretraining leverages large volumes of unlabeled data to encode terrain–ecology relationships. Nevertheless, the results confirm that representation learning alleviates, but does not eliminate, the need for environmentally representative field samples. These findings highlight that the fundamental limitation of small-sample learning in remote sensing is not simply the number of samples, but the degree to which training data capture dominant environmental gradients. This insight has broader implications for remote sensing studies in heterogeneous landscapes, suggesting that environmentally informed sampling strategies may be more important than increasing sample quantity alone for achieving robust model generalization.
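The distinction between sample quantity and environmental coverage can be made concrete with a simple proxy. The sketch below is not the similarity-based filtering used in the study [66]; it merely bins samples into hypothetical elevation and slope strata (500 m and 10° steps are assumed values) and reports the fraction of regional strata that contain at least one sample.

```python
def stratum(elevation_m, slope_deg, elev_step=500, slope_step=10):
    """Assign a sample to a coarse (elevation, slope) stratum. Bin widths are assumptions."""
    return (int(elevation_m // elev_step), int(slope_deg // slope_step))


def coverage(samples, region_strata):
    """Fraction of the region's strata that contain at least one sample."""
    covered = {stratum(e, s) for e, s in samples} & region_strata
    return len(covered) / len(region_strata)


# Hypothetical strata present in a study region, derived from example pixels.
region_pixels = [(600, 5), (1200, 15), (2300, 25), (3400, 35)]
region_strata = {stratum(e, s) for e, s in region_pixels}

# Six spatially clustered samples versus three environmentally spread samples.
clustered = [(1100, 12), (1150, 14), (1200, 13), (1250, 16), (1300, 18), (1400, 11)]
spread = [(700, 3), (1300, 12), (2400, 28)]

print(coverage(clustered, region_strata), coverage(spread, region_strata))
```

In this toy case the smaller but spread sample set covers three of four strata, whereas the larger clustered set covers only one, which illustrates why absolute sample counts alone are a poor indicator of training-set adequacy.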

5.4. Ecological Separability as an Intrinsic Limit of Representation Learning

While representation learning enhances robustness to terrain-induced variability and sample scarcity, classification performance remains bounded by ecological separability [54,55]. Species assemblages characterized by homogeneous stand structure and distinct ecological niches form compact and distinguishable clusters in representation space. Conversely, species-rich and structurally heterogeneous communities exhibit overlapping ecological signals, limiting discriminability regardless of model architecture (Figure 10). This observation highlights an important methodological boundary: algorithmic advances cannot fully overcome ecological overlap in complex forest ecosystems. Consequently, the limits of small-sample learning are not purely computational but also ecological. Recognizing this constraint is essential for avoiding unrealistic expectations of machine learning approaches in species-level mapping. Even with advanced representation learning frameworks, ecological similarity among species may impose an upper bound on classification accuracy. This limitation is not specific to the present study area but reflects a broader challenge for species-level remote sensing across biodiverse forest ecosystems worldwide.

5.5. Implications for Forest Inventory and Mountainous Tree Species Mapping

This study contributes to efforts to improve forest inventory in data-scarce mountainous regions by demonstrating that terrain-aware self-supervised representation learning can substantially reduce dependence on extensive field sampling. Compared with traditional supervised approaches, which require dense and balanced samples, the proposed framework enables cost-effective regional mapping while maintaining robust performance across heterogeneous terrain. However, the results also indicate that small-sample strategies must be carefully aligned with ecological complexity and environmental variability. Sampling designs that prioritize environmental representativeness and ecological coverage are essential for achieving reliable mapping results. Future forest inventory programs may benefit from integrating terrain stratification, ecological knowledge, and data-driven representation learning to optimize sampling efficiency. Beyond the specific study region, the proposed framework illustrates how terrain-aware representation learning can provide a scalable methodological paradigm for remote sensing applications in environmentally complex landscapes. By shifting the focus from purely radiometric correction and large sample requirements toward representation learning and environmental representativeness, this approach offers new opportunities for efficient ecological mapping and monitoring across large mountainous regions.

6. Conclusions

Forest type mapping in mountain regions is frequently constrained by scarce and spatially unbalanced field samples, limiting the effectiveness of conventional supervised classification approaches. This study proposed a TA-SSL framework that integrates terrain information into representation learning and leverages large volumes of unlabeled multi-source remote sensing data to extract discriminative and terrain-robust features. By combining these representations with limited field samples, the framework enables accurate tree species mapping under small-sample conditions. Experiments conducted in Yunnan Province show that TA-SSL consistently outperforms traditional feature engineering approaches and supervised deep learning models when training samples are limited. The results further show that terrain-aware representation learning effectively mitigates terrain-induced radiometric variability and improves classification robustness across heterogeneous mountainous environments. In addition, the required number of training samples is jointly controlled by forest structural complexity and environmental representativeness. Forest types with relatively simple stand structures can be reliably mapped using approximately 30–80 representative samples, whereas structurally complex forest communities require broader environmental coverage. Overall, this study highlights the potential of terrain-aware self-supervised representation learning as a scalable and data-efficient approach for forest mapping in mountainous and environmentally heterogeneous regions. These findings provide practical guidance for optimizing sampling strategies and improving the efficiency of large-area forest inventory and biodiversity monitoring using remote sensing data. Future work will further evaluate the transferability of the proposed framework across different mountainous ecosystems and incorporate additional ecological variables to enhance species discrimination.

Author Contributions

Conceptualization, L.H. (Li He), L.W. and L.H. (Liang Hong); methodology, L.H. (Li He), W.G. and L.W.; software, L.H. (Li He) and W.G.; validation, L.H. (Li He), L.H. (Liang Hong) and L.W.; formal analysis, L.H. (Li He) and Y.F.; investigation, L.H. (Li He); resources, L.W. and L.H. (Liang Hong); data curation, L.H. (Li He) and L.W.; writing—original draft preparation, L.H. (Li He); writing—review and editing, L.H. (Li He), L.H. (Liang Hong), Y.F. and L.W.; visualization, L.H. (Li He) and X.D.; supervision, M.Y., W.G. and J.L.; project administration, Q.D.; funding acquisition, Q.D., L.W. and L.H. (Liang Hong). All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China, grant numbers 32160369 and 42171392; the Fundamental Research Project of Yunnan Province, grant number 202501AS070090; the National Social Science Fund of China, grant number 23CMZ045; and the Project of Doctoral Scientific Research Initiation Foundation of Southwest Forestry University, grant number 110225036.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request. The code used in this study can be accessed via the GitHub v1.0 repository: https://github.com/Liheynnu1/TA-SSL (accessed on 6 March 2026).

Acknowledgments

The authors would like to acknowledge the anonymous reviewers and editors whose thoughtful comments helped to improve this manuscript.

Conflicts of Interest

Author Wei Gu was employed by the company Shanghai Hecaray Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Table A1. Vegetation indices and growth-related metrics derived from multi-source remote sensing data.

Data Source	Indices	Formulation	Reference
Sentinel-1	VV/VH	$VV / VH$	[67]
	DIF	$VV - VH$	[68]
	AVE	$(VV + VH) / 2$	[69]
	NDI	$(VV - VH) / (VV + VH)$	[70]
	RVI4S1	$\sqrt{VV / (VV + VH)} \times (VV / VH)$	[71]
	mRVI	$\sqrt{VV / (VV + VH)} \times (4 \times VH / (VV + VH))$	[72]
	VDDPI	$(VV + VH) / VV$	[73]
	mRVI_summerwinter	$mRVI_{summer} - mRVI_{winter}$	[44]
Sentinel-2	NDVI	$(ρ_{NIR} - ρ_{red}) / (ρ_{NIR} + ρ_{red})$	[74]
	EVI	$2.5 \times ((ρ_{NIR} - ρ_{red}) / (ρ_{NIR} + 6 \times ρ_{red} - 7.5 \times ρ_{blue} + 1))$	[75]
	REP	$705 + 3.5 \times (((ρ_{NIR} + ρ_{RedEdge3}) / 2 - ρ_{RedEdge2}) / (ρ_{RedEdge2} - ρ_{RedEdge1}))$	[76]
	SVVI	$SD(ρ_{blue}, ρ_{green}, ρ_{red}, ρ_{NIR}, ρ_{SWIR_1}, ρ_{SWIR_2}) - SD(ρ_{NIR}, ρ_{SWIR_1}, ρ_{SWIR_2})$	[77]
	NDRE	$(ρ_{NIR} - ρ_{RedEdge3}) / (ρ_{NIR} + ρ_{RedEdge3})$	[78]
	MCARI	$((ρ_{RedEdge1} - ρ_{red}) - 0.2 \times (ρ_{RedEdge1} - ρ_{green})) \times (ρ_{RedEdge1} / ρ_{red})$	[79]
	MSR_re	$(ρ_{NIR} / ρ_{RedEdge1} - 1) / \sqrt{(ρ_{NIR} - ρ_{RedEdge1}) + 1}$	[80]
	NDPI	$(ρ_{NIR} - (0.74 \times ρ_{red} + 0.26 \times ρ_{SWIR_1})) / (ρ_{NIR} + (0.74 \times ρ_{red} + 0.26 \times ρ_{SWIR_1}))$	[81]
	NDre1	$(ρ_{RedEdge2} - ρ_{RedEdge1}) / (ρ_{RedEdge2} + ρ_{RedEdge1})$	[82]
	IRECI	$(ρ_{RedEdge2} - ρ_{red}) / (ρ_{RedEdge1} / ρ_{RedEdge2})$	[83]
	NDVIre2	$(ρ_{NIR} - ρ_{RedEdge2}) / (ρ_{NIR} + ρ_{RedEdge2})$	[82]
	NDVI_maxsummer	$NDVI_{annual\,maximum} - NDVI_{winter}$	[37]
Landsat time series data	GrowthrateL	$GrowthrateL_{i} = \frac{B_{i} - O_{i}}{B_{year} - O_{year}}$
Landsat time series data	Growthrate_2023	$Growthrate2023_{i} = \frac{A_{i} - O_{i}}{2023 - O_{year}}$
WorldClim	PrecMs/TemMs	$MS = \frac{SD_{month}}{Mean_{month}} \times 100$	[47]
SMCI 1.0	SMRI	Soil Moisture Response Index	[48]

In Table A1, $ρ_{green}$, $ρ_{blue}$, $ρ_{red}$, $ρ_{NIR}$, $ρ_{SWIR_1}$, $ρ_{SWIR_2}$, $ρ_{RedEdge1}$, $ρ_{RedEdge2}$, $ρ_{RedEdge3}$, and $ρ_{RedEdge4}$ represent canopy reflectance in the green, blue, red, near-infrared (NIR), shortwave infrared 1 (SWIR₁), shortwave infrared 2 (SWIR₂), red edge 1, red edge 2, red edge 3, and red edge 4 bands, respectively. VV and VH indicate vertical–vertical and vertical–horizontal backscattering coefficients derived from Sentinel-1 SAR data. The terms $mRVI_{summer}$ and $mRVI_{winter}$ represent the summer and winter maximum composites of the mRVI, while $NDVI_{annual\,maximum}$ and $NDVI_{winter}$ correspond to annual and winter maximum NDVI composites, respectively.

Growth-related metrics were derived from long-term Landsat time-series data to quantify post-disturbance canopy development dynamics. $Growthrate2023_{i}$ represents the mean annual growth rate from the disturbance year to 2023, whereas $GrowthrateL_{i}$ refers to the mean annual growth rate from the disturbance year to the year when canopy spectral or textural features reached a stable mature stage. Here, $A_{i}$, $O_{i}$, and $B_{i}$ represent the NDVI or canopy texture features for the year 2023, the forest disturbance year, and the maturity year, respectively. In this study, canopy maturity was defined as the first year in which the difference between the canopy feature value and its maximum recorded during the monitoring period fell below 1%.
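The growth-rate metrics and the maturity criterion can be illustrated with a short sketch. It assumes that the 1% criterion is a relative difference from the series maximum and that the time series starts at the disturbance year; the function names and the NDVI values are hypothetical.

```python
def maturity_year(series, tolerance=0.01):
    """First year whose value lies within `tolerance` (relative) of the series maximum.

    series maps year -> feature value (e.g., NDVI) from the disturbance year onward.
    The relative interpretation of the 1% criterion is an assumption.
    """
    peak = max(series.values())
    for year in sorted(series):
        if (peak - series[year]) / peak < tolerance:
            return year
    return None  # not reached for positive series, kept for completeness


def growth_rate(series, start_year, end_year):
    """Mean annual change of the feature between two years (end_year > start_year)."""
    return (series[end_year] - series[start_year]) / (end_year - start_year)


# Hypothetical NDVI trajectory after a disturbance in 2010 (illustrative values only).
ndvi = {2010: 0.20, 2015: 0.50, 2018: 0.796, 2020: 0.80, 2023: 0.80}

disturbance = min(ndvi)
mature = maturity_year(ndvi)
growthrate_l = growth_rate(ndvi, disturbance, mature)    # disturbance -> maturity
growthrate_2023 = growth_rate(ndvi, disturbance, 2023)   # disturbance -> 2023
print(mature, growthrate_l, growthrate_2023)
```

In this example the long-term rate is higher than the rate to 2023 because the canopy plateaus after maturity, so the two metrics carry complementary information about recovery speed and stand age.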

Table A2. Categories of canopy- and environment-related features derived from multi-source remote sensing data.

Feature Category	Description	Number of Features
Spectral	10 spectral bands from Sentinel-2, tasseled cap greenness index (TC_greenness), and vegetation indices: NDVI, SVVI, EVI, NDRE, REPI, REI, MCARI, NDPI, MCR, IRECI, NDVIR1	22
SAR backscatter polarization	Sentinel-1 backscatter coefficients: VV and VH	2
SAR polarization indices	Polarimetric indices derived from VV and VH: VV/VH, DIF, AVE, NDI, RVI, mRVI, VDDPI	7
Texture	Texture metrics (e.g., Asm, contrast, dvar…) derived from the red_edge1 band using the Gray-Level Co-occurrence Matrix (GLCM)	18
Phenology	REPI (std, mean, median, min, max, Q25, Q75, IQR, cv, zf), std of mRVI, VV, and VH, mRVI_summerwinter, NDVI_maxsummer	15
Environmental	Elevation, Slope, Aspect, Mean Annual Precipitation (PrecMean), Mean seasonal Precipitation (PrecMs), Mean Annual Temperature (TemMean), Mean seasonal Temperature (TemMs), Soil Moisture Response Index (SMRI)	8
Spectral growth rate	NDVI-derived metrics: mean, median, std, zf, cv, Growthrate_2023, Growthrate_Longterm	6
Texture growth rate	SVVI_svar-derived metrics: mean, median, std, zf, cv, Growthrate_2023, GrowthrateL	6

Table A3. Feature combinations selected using rank-correlation-based feature selection.

Forest Type	Feature Types	Selected Features	Dimension
Coniferous forests	Spectral/Polarization	B1, B2, B3, B4, B5, B6, B8, B8A, B9, REPI, NDVIR1, MCARI, REI, mRVI, NDRE, NDVI, Ratio, EVI, NDPI, MCR, VDDPI, RVI, DIF	23
	Phenological	REPI_median, NDVI_shixu, REPI_min, REPI_minQ1d, REPI_maxQ3d	5
	Environmental	PrecMean, PrecMs, TemMs, TemMean, Elevation, SMRI	8
	Growth-rate	NDVI_cvL, NDVI_speedyL	2
Broadleaved forests	Spectral/Polarization	B1, B2, B3, B4, B5, B6, B7, B8A, B9, B11, B12, DIF, REI, SVVI, NDVIR1, NDRE, RVI, REPI, MCR, TC_greenness, AVE, VV, VH, NDPI	24
	Texture	B5_corr, B5_shade	2
	Phenological	REPI_Q3Q1zfd, REPI_meand, mRVI_sumwinter, REPI_zhenfud, REPI_maxQ3d, DIF_std, REPI_maxd, NDVI_shixu	9
	Environmental	Elevation, TemMs, TemMean, PrecMs, Aspect, SMRI	6
	Growth-rate	NDVI_speedyL, NDVI_speedy2023L, NDVI_cvL, NDVI_meanL	4

Table A4. Feature combinations selected using Random-Forest-based feature importance.

Forest Type	Feature Types	Selected Features	Dimension
Coniferous forests	Spectral/Polarization	B1, B2, B5, B9, REPI, MCARI	6
	Phenological	REPI_median, NDVI_shixu	2
	Environmental	PrecMean, TemMs, TemMean, Elevation, PrecMs	5
	Growth-rate	NDVI_speedyL	1
Broadleaved forests	Spectral/Polarization	B1, B2, B3, B4, B5, B7, B8A, B9, B11, DIF, NDRE, REPI, IRECI, RVI, AVE, TC_greenness, SVVI, MCARI, MCR, NDVIR1, VDDPI	21
	Phenological	NDVI_shixu, REPI_mediand, REPI_maxd, REPI_Q3Q1zfd	4
	Environmental	Elevation, TemMs, TemMean, PrecMean, PrecMs, SMRI	6
	Growth-rate	NDVI_speedyL, NDVI_speedy2023L, NDVI_cvL	3

References

Hemmerling, J.; Pflugmacher, D.; Hostert, P. Mapping temperate forest tree species using dense Sentinel-2 time series. Remote Sens. Environ. 2021, 267, 112743.
Pereira Martins-Neto, R.; Garcia Tommaselli, A.M.; Imai, N.N.; Honkavaara, E.; Miltiadou, M.; Saito Moriya, E.A.; David, H.C. Tree Species Classification in a Complex Brazilian Tropical Forest Using Hyperspectral and LiDAR Data. Forests 2023, 14, 945.
Liu, F.; Hu, J.; Yang, F.; Li, X. Heterogeneity-diversity Relationships in Natural Areas of Yunnan, China. Chin. Geogr. Sci. 2021, 31, 506–521.
Kluczek, M.; Zagajewski, B.; Zwijacz-Kozica, T. Mountain Tree Species Mapping Using Sentinel-2, PlanetScope, and Airborne HySpex Hyperspectral Imagery. Remote Sens. 2023, 15, 844.
Liu, P.; Ren, C.; Wang, Z.; Jia, M.; Yu, W.; Ren, H.; Xia, C. Evaluating the Potential of Sentinel-2 Time Series Imagery and Machine Learning for Tree Species Classification in a Mountainous Forest. Remote Sens. 2024, 16, 293.
Jia, K.; Liang, S.L.; Zhang, L.; Wei, X.; Yao, Y.; Xie, X. Forest cover classification using Landsat ETM+ data and time series MODIS NDVI data. Int. J. Appl. Earth Obs. Geoinf. 2014, 33, 32–38.
Griffiths, P.; Kuemmerle, T.; Baumann, M.; Radeloff, V.C.; Abrudan, I.V.; Lieskovsky, J.; Munteanu, C.; Ostapowicz, K.; Hostert, P. Forest disturbances, forest recovery, and changes in forest types across the Carpathian ecoregion from 1985 to 2010 based on Landsat image composites. Remote Sens. Environ. 2014, 151, 72–88.
Shimada, M.; Itoh, T.; Motooka, T.; Watanabe, M.; Shiraishi, T.; Thapa, R.; Lucas, R. New global forest/non-forest maps from ALOS PALSAR data (2007–2010). Remote Sens. Environ. 2014, 155, 13–31.
Zhang, X.; Long, T.; He, G.; Guo, Y.; Yin, R.; Zhang, Z.; Xiao, H.; Li, M.; Cheng, B. Rapid generation of global forest cover map using Landsat based on the forest ecological zones. J. Appl. Remote Sens. 2020, 14, 022211.
Svoikin, F.; Zhuk, K.; Svoikin, V.; Ugryumov, S.; Bacherikov, I.; Iniesta, D.V.; Ryapukhin, A. Classification of Tree Species in the Process of Timber-Harvesting Operations Using Machine-Learning Methods. Inventions 2023, 8, 57.
Wang, X.; Wang, J.; Lian, Z.; Yang, N. Semi-Supervised Tree Species Classification for Multi-Source Remote Sensing Images Based on a Graph Convolutional Neural Network. Forests 2023, 14, 1211.
Zheng, P.; Fang, P.; Wang, L.; Ou, G.; Xu, W.; Dai, F.; Dai, Q. Synergism of Multi-Modal Data for Mapping Tree Species Distribution—A Case Study from a Mountainous Forest in Southwest China. Remote Sens. 2023, 15, 979.
Goodchild, M.F. The validity and usefulness of laws in geographic information science and geography. Ann. Assoc. Am. Geogr. 2004, 94, 300–303.
Cheng, K.; Wang, J. Forest-Type Classification Using Time-Weighted Dynamic Time Warping Analysis in Mountain Areas: A Case Study in Southern China. Forests 2019, 10, 1040.
Chen, L.; Wu, J.; Xie, Y.; Chen, E.; Zhang, X. Discriminative feature constraints via supervised contrastive learning for few-shot forest tree species classification using airborne hyperspectral images. Remote Sens. Environ. 2023, 295, 113710.
Dieste, Á.G.; Argüello, F.; Heras, D.B. ResBaGAN: A Residual Balancing GAN with Data Augmentation for Forest Mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 6428–6447.
Shi, Y.; Ma, D.; Lv, J.; Li, J. ACTL: Asymmetric Convolutional Transfer Learning for Tree Species Identification Based on Deep Neural Network. IEEE Access 2021, 9, 13643–13654.
Wang, N.; Pu, T.; Zhang, Y.; Liu, Y.; Zhang, Z. More appropriate DenseNetBL classifier for small sample tree species classification using UAV-based RGB imagery. Heliyon 2023, 9, e20467.
Liu, X.; Bo, Y.; Zhang, J.; He, Y. Classification of C3 and C4 Vegetation Types Using MODIS and ETM+ Blended High Spatio-Temporal Resolution Data. Remote Sens. 2015, 7, 15244–15268.
Zhang, X.; Yu, L.; Zhou, Q.; Wu, D.; Ren, L.; Luo, Y. Detection of Tree Species in Beijing Plain Afforestation Project Using Satellite Sensors and Machine Learning Algorithms. Forests 2023, 14, 1889.
Tao, C.; Qi, J.; Guo, M.; Zhu, Q.; Li, H. Self-Supervised Remote Sensing Feature Learning: Learning Paradigms, Challenges, and Future Works. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5610426.
Wang, X.; Yang, N.; Liu, E.; Gu, W.; Zhang, J.; Zhao, S.; Sun, G.; Wang, J. Tree Species Classification Based on Self-Supervised Learning with Multisource Remote Sensing Images. Appl. Sci. 2023, 13, 1928.
Xie, L.; You, S.; Liu, A.; He, Y.; Huang, C.; Deng, J. Mitigating data Constraints in crop mapping: A self-supervised framework integrating adaptive clustering, graph convolution and global spatiotemporal attention. Int. J. Appl. Earth Obs. Geoinf. 2025, 144, 104951. [Google Scholar] [CrossRef]
Sharma, R.C.; Hara, K. Self-Supervised Learning of Satellite-Derived Vegetation Indices for Clustering and Visualization of Vegetation Types. J Imaging 2021, 7, 30. [Google Scholar] [CrossRef] [PubMed]
Liu, B.; Gao, K.; Yu, A.; Ding, L.; Qiu, C.; Li, J. ES2FL: Ensemble Self-Supervised Feature Learning for Small Sample Classification of Hyperspectral Images. Remote Sens. 2022, 14, 4236. [Google Scholar] [CrossRef]
Muhtar, D.; Zhang, X.; Xiao, P.; Li, Z.; Gu, F. CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5607817. [Google Scholar] [CrossRef]
Zhang, Z.; van Coillie, F.; Ou, X.; de Wulf, R. Integration of Satellite Imagery, Topography and Human Disturbance Factors Based on Canonical Correspondence Analysis Ordination for Mountain Vegetation Mapping: A Case Study in Yunnan, China. Remote Sens. 2014, 6, 1026–1056. [Google Scholar] [CrossRef]
Li, Y.; Xu, X.; Wu, Z.; Fan, H.; Tong, X.; Liu, J. A forest type-specific threshold method for improving forest disturbance and agent attribution mapping. GIScience Remote Sens. 2022, 59, 1624–1642. [Google Scholar] [CrossRef]
Takasu, T.; Yasuda, A. Development of the low-cost RTK-GPS receiver with an open source program package RTKLIB. In Proceedings of the International Symposium on GPS/GNSS, Jeju Island, Republic of Korea, 4–6 November 2009; pp. 1–6. [Google Scholar]
Bahamondez, C.; Álvarez, O.; Itzelcoaut, M. Global Forest Resources Assessment 2010 Main Report; Food and Agriculture Organization of the United Nations: Rome, Italy, 2010. [Google Scholar]
He, L.; Hong, L.; Dai, Q.; He, G.; Du, X.; Liu, J.; Xie, J. Enhancing forest-type classification in mountainous regions using a forest-succession-aware sample transfer strategy with multi-source remote sensing data. Int. J. Appl. Earth Obs. Geoinf. 2026; under review. [Google Scholar]
Oreopoulos, L.; Wilson, M.J.; Várnai, T. Implementation on Landsat Data of a Simple Cloud-Mask Algorithm Developed for MODIS Land Bands. IEEE Geosci. Remote Sens. Lett. 2011, 8, 597–601. [Google Scholar] [CrossRef]
Xiao, C.; Li, P.; Feng, Z.; Liu, Y.; Zhang, X. Sentinel-2 red-edge spectral indices (RESI) suitability for mapping rubber boom in Luang Namtha Province, northern Lao PDR. Int. J. Appl. Earth Obs. Geoinf. 2020, 93, 102176. [Google Scholar] [CrossRef]
Powell, M.J. A Direct Search Optimization Method that Models the Objective and Constraint Functions by Linear Interpolation; Springer: Berlin/Heidelberg, Germany, 1994. [Google Scholar]
Press, W.H.; Teukolsky, S.A. Savitzky-Golay Smoothing Filters. Comput. Phys. 1990, 4, 669–672. [Google Scholar] [CrossRef]
Grabska, E.; Frantz, D.; Ostapowicz, K. Evaluation of machine learning algorithms for forest stand species mapping using Sentinel-2 imagery and environmental data in the Polish Carpathians. Remote Sens. Environ. 2020, 251, 112103. [Google Scholar] [CrossRef]
Li, R.; Xia, H.; Zhao, X.; Guo, Y. Mapping evergreen forests using new phenology index, time series Sentinel-1/2 and Google Earth Engine. Ecol. Indic. 2023, 149, 110157. [Google Scholar] [CrossRef]
Ma, M.; Liu, J.; Liu, M.; Zeng, J.; Li, Y. Tree Species Classification Based on Sentinel-2 Imagery and Random Forest Classifier in the Eastern Regions of the Qilian Mountains. Forests 2021, 12, 1736. [Google Scholar] [CrossRef]
Foga, S.; Scaramuzza, P.L.; Guo, S.; Zhu, Z.; Dilley, R.D.; Beckmann, T.; Schmidt, G.L.; Dwyer, J.L.; Joseph Hughes, M.; Laue, B. Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sens. Environ. 2017, 194, 379–390. [Google Scholar] [CrossRef]
Chen, J.; Zhu, X.; Vogelmann, J.E.; Gao, F.; Jin, S. A simple and effective method for filling gaps in Landsat ETM+ SLC-off images. Remote Sens. Environ. 2011, 115, 1053–1064. [Google Scholar] [CrossRef]
Roy, D.P.; Kovalskyy, V.; Zhang, H.K.; Vermote, E.F.; Yan, L.; Kumar, S.S.; Egorov, A. Characterization of Landsat-7 to Landsat-8 reflective wavelength and normalized difference vegetation index continuity. Remote Sens. Environ. 2016, 185, 57–70. [Google Scholar] [CrossRef] [PubMed]
He, L.; Hong, L.; Zhu, A. A modified LandTrendr for forest disturbance detection using Landsat time-series data: A case study in Yunnan Province, China. J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024; Submitted for publication. [Google Scholar]
Dostálová, A.; Lang, M.; Ivanovs, J.; Waser, L.T.; Wagner, W. European Wide Forest Classification Based on Sentinel-1 Data. Remote Sens. 2021, 13, 337. [Google Scholar] [CrossRef]
Szigarski, C.; Jagdhuber, T.; Baur, M.; Thiel, C.; Parrens, M.; Wigneron, J.-P.; Piles, M.; Entekhabi, D. Analysis of the Radar Vegetation Index and Potential Improvements. Remote Sens. 2018, 10, 1776. [Google Scholar] [CrossRef]
Farr, T.G.; Rosen, P.A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.; Roth, L.; et al. The Shuttle Radar Topography Mission. Rev. Geophys. 2007, 45, RG2004. [Google Scholar] [CrossRef]
Fick, S.E.; Hijmans, R.J. WorldClim 2: New 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 2017, 37, 4302–4315. [Google Scholar] [CrossRef]
Su, Y.; Guo, Q.; Jin, S.; Guan, H.; Sun, X.; Ma, Q.; Hu, T.; Wang, R.; Li, Y. The Development and Evaluation of a Backpack LiDAR System for Accurate and Efficient Forest Inventory. IEEE Geosci. Remote Sens. Lett. 2021, 18, 1660–1664. [Google Scholar] [CrossRef]
Einzmann, H.J.; Beyschlag, J.; Hofhansl, F.; Wanek, W.; Zotz, G. Host tree phenology affects vascular epiphytes at the physiological, demographic and community level. AoB Plants 2014, 7, plu073. [Google Scholar] [CrossRef]
Li, Q.; Shi, G.; Shangguan, W.; Nourani, V.; Li, J.; Li, L.; Huang, F.; Zhang, Y.; Wang, C.; Wang, D.; et al. A 1 km daily soil moisture dataset over China using in situ measurement and machine learning. Earth Syst. Sci. Data 2022, 14, 5267–5286. [Google Scholar] [CrossRef]
Xie, Z.; Zhang, Z.; Cao, Y.; Lin, Y.; Bao, J.; Yao, Z.; Dai, Q.; Hu, H. Simmim: A simple framework for masked image modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9653–9663. [Google Scholar]
Li, S.; Wu, D.; Wu, F.; Zang, Z.; Li, S. Architecture-Agnostic Masked Image Modeling--From ViT back to CNN. arXiv 2022, arXiv:2205.13943. [Google Scholar]
Chen, X.; Fan, H.; Girshick, R.; He, K. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297. [Google Scholar] [CrossRef]
Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
Fang, P.; Ou, G.; Li, R.; Wang, L.; Xu, W.; Dai, Q.; Huang, X. Regionalized classification of stand tree species in mountainous forests by fusing advanced classifiers and ecological niche model. GIScience Remote Sens. 2023, 60, 2211881. [Google Scholar] [CrossRef]
Li, R.; Fang, P.; Xu, W.; Wang, L.; Ou, G.; Zhang, W.; Huang, X. Classifying Forest Types over a Mountainous Area in Southwest China with Landsat Data Composites and Multiple Environmental Factors. Forests 2022, 13, 135. [Google Scholar] [CrossRef]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
D’Amico, G.; Francini, S.; Giannetti, F.; Vangi, E.; Travaglini, D.; Chianucci, F.; Mattioli, W.; Grotti, M.; Puletti, N.; Corona, P.; et al. A deep learning approach for automatic mapping of poplar plantations using Sentinel-2 imagery. GIScience Remote Sens. 2021, 58, 1352–1368. [Google Scholar] [CrossRef]
Sun, P.; Yuan, X.; Li, D. Classification of Individual Tree Species Using UAV LiDAR Based on Transformer. Forests 2023, 14, 484. [Google Scholar] [CrossRef]
Chen, R.; Yin, G.; Zhao, W.; Yan, K.; Wu, S.; Hao, D.; Liu, G. Topographic correction of optical remote sensing images in mountainous areas: A systematic review. IEEE Geosci. Remote Sens. Mag. 2023, 11, 125–145. [Google Scholar]
Yin, H.; Tan, B.; Frantz, D.; Radeloff, V.C. Integrated topographic corrections improve forest mapping using Landsat imagery. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102716. [Google Scholar] [CrossRef]
Adhikari, H.; Heiskanen, J.; Maeda, E.E.; Pellikka, P.K.E. The effect of topographic normalization on fractional tree cover mapping in tropical mountains: An assessment based on seasonal Landsat time series. Int. J. Appl. Earth Obs. Geoinf. 2016, 52, 20–31. [Google Scholar] [CrossRef]
Meng, Q.; Wang, J.; Yang, K.; He, Y.; Xiao, L.; Zhou, H. Evaluating the Performance of Land Use Products in Mountainous Regions: A Case Study in the Wumeng Mountain Area, China. Land 2025, 14, 1730. [Google Scholar] [CrossRef]
Zhu, X.; Wang, T.; Skidmore, A.K.; Duporge, I. A deep learning framework for mapping evergreen conifer fractional cover at 30 m resolution using fused bi-temporal WorldView and time-series Landsat imagery in mixed mountain forests. Remote Sens. Environ. 2025, 331, 115055. [Google Scholar] [CrossRef]
Xue, Z.; Yu, X.; Yu, A.; Liu, B.; Zhang, P.; Wu, S. Self-Supervised Feature Learning for Multimodal Remote Sensing Image Land Cover Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5533815. [Google Scholar] [CrossRef]
Pang, S.; Xiang, J.; Zuo, Z.; Hu, H.; Jiang, H. Contrastive Masked Feature Modeling for Self-Supervised Representation Learning of High-Resolution Remote Sensing Images. Remote Sens. 2026, 18, 626. [Google Scholar] [CrossRef]
Zhu, A.X.; Turner, M. How is the Third Law of Geography different? Ann. GIS 2022, 28, 57–67. [Google Scholar] [CrossRef]
Bazzi, H.; Baghdadi, N.; El Hajj, M.; Zribi, M.; Minh, D.H.T.; Ndikumana, E.; Courault, D.; Belhouchette, H. Mapping Paddy Rice Using Sentinel-1 SAR Time Series in Camargue, France. Remote Sens. 2019, 11, 887. [Google Scholar] [CrossRef]
Zhao, F.; Wang, T.; Zhang, L.; Feng, H.; Yan, S.; Fan, H.; Xu, D.; Wang, Y. Polarimetric Persistent Scatterer Interferometry for Ground Deformation Monitoring with VV-VH Sentinel-1 Data. Remote Sens. 2022, 14, 309. [Google Scholar] [CrossRef]
Tazmul Islam, M.; Meng, Q. An exploratory study of Sentinel-1 SAR for rapid urban flood mapping on Google Earth Engine. Int. J. Appl. Earth Obs. Geoinf. 2022, 113, 103002. [Google Scholar] [CrossRef]
Sarzynski, T.; Giam, X.; Carrasco, L.; Lee, J.S.H. Combining Radar and Optical Imagery to Map Oil Palm Plantations in Sumatra, Indonesia, Using the Google Earth Engine. Remote Sens. 2020, 12, 1220. [Google Scholar]
Snevajs, H.; Charvat, K.; Onckelet, V.; Kvapil, J.; Zadrazil, F.; Kubickova, H.; Seidlova, J.; Batrlova, I. Crop Detection Using Time Series of Sentinel-2 and Sentinel-1 and Existing Land Parcel Information Systems. Remote Sens. 2022, 14, 1095. [Google Scholar]
Gella, G.W.; Bijker, W.; Belgiu, M. Mapping crop types in complex farming areas using SAR imagery with dynamic time warping. ISPRS J. Photogramm. Remote Sens. 2021, 175, 171–183. [Google Scholar] [CrossRef]
Periasamy, S. Significance of dual polarimetric synthetic aperture radar in biomass retrieval: An attempt on Sentinel-1. Remote Sens. Environ. 2018, 217, 537–549. [Google Scholar] [CrossRef]
Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
Liu, H.Q.; Huete, A. A feedback based modification of the NDVI to minimize canopy background and atmospheric noise. IEEE Trans. Geosci. Remote Sens. 1995, 33, 457–465. [Google Scholar] [CrossRef]
Schlerf, M.; Atzberger, C.; Hill, J. Remote sensing of forest biophysical variables using HyMap imaging spectrometer data. Remote Sens. Environ. 2005, 95, 177–194. [Google Scholar] [CrossRef]
Coulter, L.L.; Stow, D.A.; Tsai, Y.-H.; Ibanez, N.; Shih, H.-c.; Kerr, A.; Benza, M.; Weeks, J.R.; Mensah, F. Classification and assessment of land cover and land use change in southern Ghana using dense stacks of Landsat 7 ETM+ imagery. Remote Sens. Environ. 2016, 184, 396–409. [Google Scholar] [CrossRef]
Ahamed, T.; Tian, L.; Zhang, Y.; Ting, K.C. A review of remote sensing methods for biomass feedstock production. Biomass Bioenergy 2011, 35, 2455–2469. [Google Scholar] [CrossRef]
Daughtry, C.S.; Walthall, C.; Kim, M.; De Colstoun, E.B.; McMurtrey, J.E., III. Estimating corn leaf chlorophyll concentration from leaf and canopy reflectance. Remote Sens. Environ. 2000, 74, 229–239. [Google Scholar] [CrossRef]
Wu, C.; Niu, Z.; Tang, Q.; Huang, W. Estimating chlorophyll content from hyperspectral vegetation indices: Modeling and validation. Agric. For. Meteorol. 2008, 148, 1230–1241. [Google Scholar] [CrossRef]
Wang, C.; Chen, J.; Wu, J.; Tang, Y.; Shi, P.; Black, T.A.; Zhu, K. A snow-free vegetation index for improved monitoring of vegetation spring green-up date in deciduous ecosystems. Remote Sens. Environ. 2017, 196, 1–12. [Google Scholar] [CrossRef]
Gitelson, A.; Merzlyak, M.N. Spectral reflectance changes associated with autumn senescence of Aesculus hippocastanum L. and Acer platanoides L. leaves. Spectral features and relation to chlorophyll estimation. J. Plant Physiol. 1994, 143, 286–292. [Google Scholar] [CrossRef]
Rozenstein, O.; Haymann, N.; Kaplan, G.; Tanny, J. Validation of the cotton crop coefficient estimation model based on Sentinel-2 imagery and eddy covariance measurements. Agric. Water Manag. 2019, 223, 105715. [Google Scholar] [CrossRef]

Figure 1. Study area overview showing (a) geographic location, (b) elevation, (c) slope, and (d) forest cover.

Figure 2. Spatial distribution of (a) training samples and (b) validation samples.

Figure 3. Workflow of tree species classification based on the proposed terrain-aware self-supervised learning framework under limited training samples.

Figure 4. Class-level classification performance of different feature learning methods in terms of (a) user's accuracy (UA), (b) producer's accuracy (PA), and (c) F1-score.

Figure 5. Spatial distribution of tree species classification results in Yunnan Province generated from different feature learning algorithms: (a) All features; (b) RCCF; (c) RF-MDA; (d) MLP; (e) Transformer; and (f) TA-SSL.

Figure 6. Local-scale comparison of tree species classification results obtained using different feature learning methods, validated with field photographs and high-resolution reference data: (A1–A7) Abies-Picea forest; (B1–B7) Yunnan pine; (C1–C7) Birch forests; (D1–D7) Simao pine; (E1–E7) Rubber plantation; (F1–F7) Other broadleaved forest.

Figure 7. Effects of different feature learning strategies on overall classification performance in complex mountainous regions, measured by (a) OA and (b) F1-score.

Figure 8. Confusion matrices of tree species classification results obtained using different self-supervised feature learning strategies in a mountainous region: (a) contrastive learning (CL), (b) masked image modeling (MIM), and (c) TA-SSL.

Figure 9. Sensitivity of classification performance to training sample quantity: (a) classification accuracy and (b) corresponding number of training samples.

Figure 10. Classification performance of individual tree species under different training sample sizes, shown by (a) F1-score and (b) number of training samples for each tree species.

Table 1. Number of training and validation samples for each tree species.

Tree Species	Training Samples	Validation Samples
Yunnan pine	565	2007
Simao pine	340	497
Abies-Picea forest	84	604
Other coniferous forest	250	3295
Oak forest	651	3302
Birch forests	271	2134
Rubber plantation	56	117
Other broadleaved forest	577	3144
Mixed coniferous-broadleaved forests	289	369
Bamboo forest	37	41
Total	3120	15,510
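
The totals in Table 1 and the degree of class imbalance can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are copied from Table 1, while the names `SAMPLES` and `class_shares` and the max/min imbalance ratio are introduced here for illustration and are not part of the original study.

```python
# Sample counts copied from Table 1: (training, validation) per class.
SAMPLES = {
    "Yunnan pine": (565, 2007),
    "Simao pine": (340, 497),
    "Abies-Picea forest": (84, 604),
    "Other coniferous forest": (250, 3295),
    "Oak forest": (651, 3302),
    "Birch forests": (271, 2134),
    "Rubber plantation": (56, 117),
    "Other broadleaved forest": (577, 3144),
    "Mixed coniferous-broadleaved forests": (289, 369),
    "Bamboo forest": (37, 41),
}


def class_shares(samples, column=0):
    """Share of each class in one column (0 = training, 1 = validation)."""
    total = sum(counts[column] for counts in samples.values())
    return {name: counts[column] / total for name, counts in samples.items()}


train_total = sum(train for train, _ in SAMPLES.values())
val_total = sum(val for _, val in SAMPLES.values())

# Hypothetical summary statistic: largest class divided by smallest class.
train_counts = [train for train, _ in SAMPLES.values()]
imbalance_ratio = max(train_counts) / min(train_counts)

shares = class_shares(SAMPLES)
for name, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{name:38s} {share:6.1%}")
print(f"training total = {train_total}, validation total = {val_total}")
print(f"max/min training ratio = {imbalance_ratio:.2f}")
```

Listing the classes from smallest to largest share makes the under-represented classes (for example, Bamboo forest and Rubber plantation) easy to spot.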

Table 2. Summary of canopy-related and environmental features extracted from multi-source remote sensing data.

Feature Category	Description	Data Source	Dimensionality
Spectral	Sentinel-2 multispectral bands, tasseled cap greenness index (TC_greenness) and commonly used vegetation indices describing canopy reflectance	Sentinel-2	22
Polarization	Sentinel-1 backscatter coefficients and polarization-based indices characterizing forest structural features	Sentinel-1	9
Texture	Gray-level co-occurrence matrix (GLCM)-based texture metrics derived from the red-edge 1 band	Sentinel-2	18
Phenological	Optical- and SAR-based phenological metrics describing intra-annual and seasonal canopy dynamics	Sentinel-1/2	15
Environmental variables	Topographic, climatic, and soil moisture variables representing forest site conditions	SRTM, WorldClim, SMCI	8
Spectral growth features	Growth-rate metrics derived from long-term NDVI time series data	Landsat time series data	6
Texture growth features	Growth-rate metrics derived from time-series texture features	Landsat time series data	6

Note: Detailed definitions of individual indices are provided in Appendix A (Table A1).

Table 3. Classification accuracy of different feature representation strategies for tree species mapping.

Strategy Category	Method	OA (%)	Kappa
Feature selection-based methods	All features	40.55	0.2953
Feature selection-based methods	RF-MDA	38.33	0.2831
Feature selection-based methods	RCCF	39.28	0.2883
Supervised deep learning models	MLP	47.83	0.3396
Supervised deep learning models	Transformer	37.33	0.2296
Self-supervised method	TA-SSL	75.80	0.6868
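
The OA and Kappa values in Table 3, and the UA, PA, and F1-score shown in Figure 4, are standard confusion-matrix statistics. The following is a minimal sketch of the textbook formulas, assuming rows hold reference classes and columns hold predicted classes. The function name `accuracy_metrics` and the 2 x 2 toy matrix are hypothetical and are not taken from the study; the paper's own definitions are given in Section 3.6.

```python
import numpy as np


def accuracy_metrics(cm):
    """Compute OA, Cohen's kappa, PA, UA, and F1 from a confusion matrix.

    Assumed layout: cm[i, j] = number of samples whose reference class is i
    and whose predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    ref_totals = cm.sum(axis=1)   # row sums: reference samples per class
    pred_totals = cm.sum(axis=0)  # column sums: predicted samples per class

    oa = diag.sum() / n                            # overall accuracy
    pe = (ref_totals * pred_totals).sum() / n**2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)

    # Classes with no reference or predicted samples yield NaN, not an error.
    with np.errstate(divide="ignore", invalid="ignore"):
        pa = diag / ref_totals    # producer's accuracy (recall)
        ua = diag / pred_totals   # user's accuracy (precision)
        f1 = 2.0 * ua * pa / (ua + pa)

    return {"oa": oa, "kappa": kappa, "pa": pa, "ua": ua, "f1": f1}


# Toy two-class matrix, made up purely for illustration.
toy_cm = [[8, 2],
          [1, 9]]
metrics = accuracy_metrics(toy_cm)
print(metrics)
```

Because Kappa subtracts the agreement expected by chance, it can sit well below OA when class frequencies are uneven, which is one reason both are commonly reported together.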

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

He, L.; Wang, L.; Hong, L.; Dai, Q.; Gu, W.; Du, X.; Yang, M.; Liu, J.; Feng, Y. Terrain-Aware Self-Supervised Representation Learning for Tree Species Mapping in Mountainous Regions Under Limited Field Samples. Remote Sens. 2026, 18, 951. https://doi.org/10.3390/rs18060951

