Abstract
Accurate tree species mapping is critical for forest inventory, biodiversity assessment, and ecosystem management. In mountainous regions, terrain-induced radiometric non-stationarity and limited field access often produce scarce, clustered, and environmentally biased samples, limiting model generalization. To address this issue, this study proposes a terrain-aware self-supervised representation learning framework for tree species classification under small-sample conditions. The framework integrates terrain information into representation learning and adopts a hybrid contrastive–generative self-supervised strategy to learn discriminative and terrain-robust features from large volumes of unlabeled multi-source remote sensing data. These learned representations are subsequently combined with limited field samples to produce regional-scale tree species maps. Experiments conducted across Yunnan Province, China, using Sentinel-1, Sentinel-2, and Landsat time-series data show that the proposed framework substantially improves class separability and classification robustness in complex mountainous environments. The framework achieves an overall accuracy of 75.8%, significantly outperforming conventional feature engineering (38.3–40.6%) and supervised deep learning models (37.3–47.8%). Species with relatively homogeneous structure and strong ecological niche dependence can be accurately mapped with limited training samples, whereas structurally complex forest communities require broader environmental sample coverage. Overall, the results highlight the potential of terrain-aware self-supervised representation learning as a scalable and data-efficient paradigm for forest mapping in mountainous and environmentally heterogeneous regions.