Article

A Multi-Dimensional Feature-Driven Method for Remote Sensing-Based Identification of Cereal and Oil Crops in the Tibetan Plateau

1 State Key Laboratory of Soil and Water Conservation and Desertification Control, College of Soil and Water Conservation Science and Engineering, Northwest A&F University, Yangling 712100, China
2 Institute of Soil and Water Conservation, Chinese Academy of Sciences and Ministry of Water Resources, Yangling 712100, China
3 College of Grassland Agriculture, Northwest A&F University, Yangling 712100, China
4 School of Life Sciences, University of Technology Sydney, Broadway, Sydney, NSW 2007, Australia
5 College of Information Engineering, Northwest A&F University, Yangling 712100, China
6 New South Wales Department of Climate Change, Energy, the Environment and Water, Parramatta, NSW 2150, Australia
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(9), 1391; https://doi.org/10.3390/rs18091391
Submission received: 21 March 2026 / Revised: 21 April 2026 / Accepted: 28 April 2026 / Published: 30 April 2026
(This article belongs to the Section Environmental Remote Sensing)

Highlights

What are the main findings?
  • A high-accuracy crop mapping framework was developed for the Shigatse region using GEE and Sentinel-2 data, with Random Forest achieving 84.77% accuracy.
  • The study identified the spatial distribution of highland barley, wheat, and rapeseed, with highland barley dominating the region.
What are the implications of the main findings?
  • This framework provides an efficient solution for crop identification in complex, high-altitude environments, addressing challenges like cloud and snow interference.
  • It offers valuable insights for precision agriculture and land use management in remote regions, with potential for broader application across similar ecological zones.

Abstract

Fragmented farmland and persistent cloud–snow interference in the high-altitude cold regions of the Qinghai–Tibet Plateau, coupled with unstable crop phenology, pose significant challenges for accurate cereal and oil crop identification using single-date imagery or low-dimensional features. This study focused on the agricultural areas of the Shigatse River Valley in the Qinghai–Tibet Plateau. Leveraging the Google Earth Engine (GEE) cloud computing platform, we integrated Sentinel-2 remote sensing data with field survey sampling data to extract the planting structures, distribution patterns, and cultivated areas of cereal and oil crops. Three machine-learning classifiers, Random Forest (RF), Support Vector Machine (SVM), and Gradient Boosted Trees (GBT), were evaluated to investigate the influence of different feature sets and classifier combinations on mapping accuracy. The results indicated that when all feature bands were utilized, the RF classifier achieved the highest performance, with an overall accuracy of 84.77% and a kappa coefficient of 0.64, outperforming both the SVM and GBT models. The incorporation of phenological and topographic features further enhanced classification accuracy, providing a robust framework for identifying cereal and oil crops in high-altitude environments. Based on the optimal model estimation, the cultivated areas in 2021 were 581.52 km² for highland barley, 295.39 km² for wheat, and 386.81 km² for rapeseed. Their spatial patterns closely aligned with the valley-terrace topography and local irrigation conditions. These findings offer novel insights and a reliable methodology for the rapid extraction of crop spatial information in regions with complex planting structures.

1. Introduction

Food security constitutes a fundamental pillar of national security, providing essential empirical evidence for economic development, ecological assessments, and strategic governance [1,2]. Precise identification of crop types and their spatial distributions is paramount for advancing precision agriculture, yield forecasting, and sustainable resource optimization [3]. Nevertheless, global agricultural systems are currently confronting unprecedented stressors. Extensive land degradation, exacerbated by anthropogenic activities, has triggered persistent yield declines in regions supporting approximately 1.7 billion people, thereby jeopardizing global agricultural productivity and food stability [4]. Furthermore, the intensification of climate change, characterized by increasingly frequent extreme events such as droughts, floods, and thermal anomalies, continues to disrupt crop phenological cycles, productivity, and nutritional quality. Coupled with escalating water scarcity and soil degradation, these multifaceted drivers pose substantial risks to the sustainable cultivation of cereal and oil crops. Mitigating these threats necessitates the development of robust, scalable, and high-fidelity methodologies for monitoring agricultural land-use dynamics and crop distributions to facilitate adaptive management and resilient strategic planning.
These challenges are markedly intensified in geographically isolated and ecologically fragile regions, such as the Qinghai–Tibet Plateau (QTP). As one of the world’s most challenging high-altitude and cryospheric agricultural environments, the QTP is defined by extreme climatic constraints, complex topographic gradients, and pronounced environmental heterogeneity. The Shigatse region, situated in the southwestern QTP, epitomizes these alpine and arid conditions, where agrarian systems are dominated by cold-resilient cultivars, specifically highland barley, wheat, and rapeseed [5]. Nonetheless, the region’s expansive spatial scale, sparse population, and insufficient logistical infrastructure render conventional ground-based surveys logistically prohibitive and spatially inadequate [6]. While satellite remote sensing offers a viable alternative, its efficacy in this periglacial landscape is hindered by severe land fragmentation, persistent cloud and snow obscuration, and substantial spatiotemporal variability. Furthermore, the abbreviated and erratic growing seasons induce instabilities in crop phenological trajectories and spectral signatures [7,8,9], compromising the fidelity of classification frameworks reliant on monotemporal imagery or limited feature sets. Currently, research in this domain remains constrained by inadequate feature dimensionality and a lack of integrated methodological synergy. Consequently, there is an urgent need to establish a robust, high-performance geospatial framework to accurately delineate the spatial patterns of cereal and oil crops across Tibet’s agricultural landscapes.
Cloud-based geospatial processing platforms, exemplified by Google Earth Engine (GEE), have fundamentally transformed land cover and crop mapping research over large areas [10]. GEE provides unprecedented access to vast archives of satellite imagery (e.g., Sentinel-2) and high-performance computing resources, enabling the efficient processing of multi-temporal datasets at a regional scale [11,12]. By leveraging high-resolution Sentinel-2 imagery, GEE facilitates refined crop identification, enhances computational efficiency, and mitigates the aliasing effects or data limitations inherent in coarse-resolution products [13]. This paradigm shift effectively overcomes the constraints of conventional field surveys, which are resource-intensive, time-consuming, and lack the necessary spatial granularity for precise mapping [14].
Despite these advances, accurate crop classification in complex terrains and extreme climatic environments cannot rely solely on spectral information. Crop spectral signatures and phenological trajectories are strongly governed by temperature, radiation, and topographic factors such as elevation and aspect, leading to a pronounced “same species, different spectra” phenomenon [15,16,17]. As a result, classification approaches based on monotemporal observations or low-dimensional features are highly susceptible to terrain shadows, snow interference, and spatio-temporal variability, resulting in unstable classification outputs and limited accuracy [18,19]. Therefore, integrating multi-dimensional features, such as temporal spectral indices, texture information, phenological metrics, topographic variables, and edaphic properties, is critical for improving classification robustness and accuracy in high-altitude cold regions.
Machine learning classifiers play a pivotal role in modern remote sensing classification. Representative algorithms, such as random forest (RF), support vector machine (SVM), and gradient-boosted trees (GBT), have demonstrated robust potential in crop mapping applications; however, their performance remains highly dependent on feature selection, regional heterogeneity, and landscape complexity [20,21,22]. Consequently, comparative evaluations across different agro-ecological zones are imperative for identifying optimal classifiers. Although Yan Jianzhong successfully extracted crop information in eastern Qinghai using phenological features and the CART algorithm, achieving an overall accuracy of 86.23%, the applicability and robustness of advanced machine learning classifiers in the distinctive environmental setting of Shigatse remain insufficiently explored [9].
To address these limitations, this study aims to develop a robust crop recognition framework that integrates multi-dimensional features with machine learning classifiers for high-altitude cold regions. Specifically, the objectives are to: (1) construct a multi-dimensional feature set integrating spectral, phenological, textural, and environmental variables to comprehensively characterize crop growth dynamics; (2) systematically evaluate the classification performance of RF, SVM, and GBT using these high-dimensional features to select the optimal recognition model for plateau environments; and (3) develop an integrated workflow on the GEE platform encompassing data preprocessing, feature construction, and classification mapping. This workflow was validated through an empirical case study in a representative agricultural area of Shigatse, Tibet, to assess its accuracy and robustness. Ultimately, this study provides a scalable remote sensing-based crop identification solution applicable to high-altitude cold regions.

2. Materials and Methods

2.1. Study Area

The Tibet Autonomous Region (TAR) is located in the southwestern part of the Qinghai–Tibet Plateau, extending from 78°25′E to 99°06′E and 26°50′N to 36°53′N. It covers approximately 1.22 × 10⁶ km² and has an average elevation exceeding 4000 m above sea level, and it is known as the “Roof of the World” [23]. Climatologically, the TAR belongs to the Qinghai–Tibet cold-arid agricultural climatic zone, encompassing frigid, sub-frigid, and temperate zones that exhibit pronounced vertical climatic zonation [24]. The regional climate is characterized by severe cold and aridity in the northwest and relatively warm and humid conditions in the southeast, with long winters, indistinct summers, and merged spring–autumn seasons. The annual mean temperature ranges from −2.4 to 12.1 °C, and annual sunshine duration varies between 1443.5 and 3574.3 h. Precipitation shows pronounced spatial heterogeneity, with annual totals ranging from approximately 400 to 1000 mm [3]. According to the Third National Land Survey Bulletin, the TAR currently possesses 4420 km² of arable land, accounting for a mere 0.37% of its total land area [25]. Cultivated land is primarily distributed along the river valleys of the Yarlung Tsangpo and its tributaries, as well as foothill slopes, alluvial fans, and lakeshore plains. Based on distinct biophysical and agricultural conditions, the region’s cropland is categorized into three major agricultural zones: Eastern, Southern, and Northern Tibet [26].
The study focuses on two representative agricultural regions within the central Tibetan Plateau: Shigatse and Shannan. Shigatse serves as the primary area for model development and internal validation, while Shannan is employed to evaluate the generalization performance of the models. Both regions are situated in the primary grain-producing zone of the southern Tibetan valleys (Figure 1), encompassing a total area of approximately 2.58 × 10⁵ km² [27]. The southern Tibetan cultivation zone is characterized as a semi-agricultural and semi-pastoral region, with its southeastern portion dominated by forestry; it lies south of the Gangdise-Nyenchen Tanglha Mountains and west of the Hengduan Mountains, spanning the Yarlung Tsangpo River valley, the southern Tibetan Plateau, and the Himalayan region. This zone contains over 60% of arable land in Tibet and contributes approximately 70% of the region’s total grain yield. Highland barley is the dominant crop, constituting about 47% of the cultivated area, followed by winter wheat (29%), spring wheat (6.7%), and legumes (14.4%). Cash crops primarily consist of rapeseed and sugar beets, while localized vegetable cultivation is concentrated in peri-urban areas.

2.2. Datasets

The datasets used in this study included Sentinel-2 multispectral imagery accessed through the GEE platform, global cropland datasets, topographic data, climate reanalysis data, and soil datasets. Administrative boundary data at the provincial, municipal, and county levels (2024 edition) were obtained from the National Geographic Information Public Service Platform of China, and crop sample points were collected through field surveys.

2.2.1. Remote Sensing Imagery

(1) Sentinel-2 Data
Sentinel-2A and Sentinel-2B were launched by the European Space Agency (ESA) on 23 June 2015 and 7 March 2017, respectively, to provide systematic observations of terrestrial vegetation, soils, inland and coastal waters, and related land surfaces [28]. Both satellites are equipped with the Multispectral Instrument (MSI), operate at an altitude of 786 km and provide 13 spectral bands with a swath width of 290 km [29]. These bands have spatial resolutions of 10 m, 20 m, and 60 m, covering wavelengths from the visible and near-infrared to the shortwave infrared regions. The revisit time of a single satellite is 10 days, whereas the combined constellation provides a revisit time of 5 days. Sentinel-2 data serve as an ideal source for monitoring vegetation health and crop identification [30]. In this study, Sentinel-2 Level-2A surface reflectance (SR) imagery covering the main crop growing season in 2021 was accessed through the GEE platform for crop classification. The Sentinel-2 Level-2A surface reflectance products used in this study were generated using the Sen2Cor processor, which applies atmospheric, illumination, and terrain corrections to improve the physical consistency of the spectral features.
(2) Cropland Data
The Global Cropland 2019 dataset was developed by researchers at the University of Maryland based on multi-source remote sensing data and is publicly available on the GEE platform [31]. This dataset provides global cropland distribution data for 2019 at a spatial resolution of 30 m, enabling accurate delineation of cropland boundaries at regional scales. It adopts a binary classification scheme, in which a pixel value of 1 represents cropland and 0 represents non-cropland. Validation results have demonstrated its high overall accuracy, indicating that it is suitable for cropland monitoring, cropland dynamics analysis, and food security and ecological studies. In this study, the Global Cropland 2019 dataset was used as a mask to exclude non-cropland areas, thereby reducing background interference and improving classification accuracy for cereal and oil crops.
(3) Administrative Boundary Data
Administrative boundary data at the provincial, municipal, and county levels for 2024 were obtained from the National Center for Basic Geographic Information and officially released through the National Geographic Information Public Service Platform (Tianditu, 2024 edition) [32]. This dataset provides updated vector boundary data for all three administrative levels.
(4) Other Datasets
Cloud, cirrus, cloud shadow, snow, and saturated-pixel masks were generated using the QA60 and scene classification (SCL) layers of Sentinel-2 Level-2A products available in GEE. By leveraging bits 10 and 11 together with SCL-based scene classification identifiers, this method removes clouds, cirrus, cloud shadows, snow, and saturated pixels on a pixel-by-pixel basis [33]. The masks are updated synchronously with the imagery and preserve the original 10 m spatial resolution. Topographic data were derived from the 30 m NASA SRTM C-band InSAR digital elevation model (DEM), from which elevation, slope, and aspect were computed within GEE [34]. Climate data were obtained from the ERA5-Land monthly reanalysis product at 0.1°. Monthly mean temperature and precipitation for 2021 were aggregated to derive annual averages [35]. Soil texture information was derived from the 250 m OpenLandMap USDA soil texture classification dataset, which integrates Landsat, MODIS, SRTM, and global soil profile data using machine-learning-based spatial downscaling. Soil clay content data were obtained from the 250 m OpenLandMap clay mass fraction dataset for the 0–5 cm topsoil layer, which was generated using a quantile regression forest model. All datasets used in this study are summarized in Table 1.
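The pixel-wise masking logic described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' GEE script. The QA60 bit positions (10 for opaque cloud, 11 for cirrus) come from the text; the set of SCL class codes treated as invalid is an assumption, since the text does not list the identifiers that were used, and the function names are hypothetical.

```python
# QA60 bit flags for Sentinel-2: bit 10 = opaque cloud, bit 11 = cirrus.
QA60_CLOUD_BIT = 1 << 10
QA60_CIRRUS_BIT = 1 << 11

# SCL class codes rejected in this sketch (assumed selection):
# 1 saturated/defective, 3 cloud shadow, 8 cloud (medium probability),
# 9 cloud (high probability), 10 thin cirrus, 11 snow or ice.
SCL_INVALID = frozenset({1, 3, 8, 9, 10, 11})


def is_clear(qa60: int, scl: int) -> bool:
    """Return True when a pixel passes both the QA60 and the SCL test."""
    qa_flagged = (qa60 & QA60_CLOUD_BIT) != 0 or (qa60 & QA60_CIRRUS_BIT) != 0
    return not qa_flagged and scl not in SCL_INVALID


def mask_pixels(values, qa60, scl):
    """Replace flagged pixels with None and keep clear pixels unchanged."""
    return [v if is_clear(q, s) else None for v, q, s in zip(values, qa60, scl)]


# Four hypothetical pixels: clear vegetation, opaque cloud, cirrus, snow.
values = [0.21, 0.35, 0.40, 0.18]
qa60 = [0, 1024, 2048, 0]
scl = [4, 9, 10, 11]
masked = mask_pixels(values, qa60, scl)
```

Only the first pixel survives in this example, because the other three are flagged by at least one of the two layers.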

2.2.2. Field Survey Data

(1) Field Sampling Data
Based on the distribution of cultivated land from the Third National Land Survey, major cropland patches in Shigatse were identified, and field survey routes were designed accordingly. The field survey, conducted in July 2025, collected data from 522 sampling points through a fully manual procedure: investigators walked to each sampling location, used handheld GPS devices for point-by-point positioning, and completed paper-based survey forms through on-site manual recording. Each sampling point recorded the crop code, survey area, geographic coordinates, elevation, crop type, growth stage, coverage, planting pattern, surface condition, image acquisition status, and flight ID. After quality control and the removal of anomalous records, 342 valid samples were retained, including 219 samples of highland barley, 79 samples of wheat, and 44 samples of rapeseed (Table 2). The spatial distribution of the sampling points is shown in Figure 2.
To assess spatial generalization, an independent validation dataset was collected from the Shannan area, which has a comparable elevation range but different cropping patterns and phenological calendars. A total of 263 field samples were collected in Shannan using the same field survey protocol as that used in Shigatse. These samples were used exclusively for external validation of the trained model; that is, the model was not trained on any Shannan samples. After quality control and removal of anomalous records, 252 valid samples were retained, including 55 samples of highland barley, 147 samples of wheat, and 50 samples of rapeseed (Table 3). The spatial distribution of the sampling points is shown in Figure 3.
(2) Drone Survey Data
Based on the spatial distribution and aggregation characteristics of cultivated land patches, 20 representative plots were selected as UAV survey sites. Low-altitude aerial photography was conducted using a multi-rotor unmanned aerial vehicle (UAV) equipped with a high-resolution optical sensor, thereby producing centimeter-level surface imagery. Flight paths were designed taking into account the terrain conditions and flight safety. The distribution of UAV flight locations and ground sample plots is shown in Figure 4. UAV data acquisition was conducted solely to support field sampling operations, such as preliminary site inspection and sample point localization. These data were not used in the subsequent machine learning modeling or accuracy assessment.

2.3. Methods

The workflow diagram of this study is presented in Figure 5. GEE was used as the primary platform for cloud masking and cropland extraction. The platform was also used for the systematic integration of diverse datasets, including vegetation indices, phenological metrics, topographic variables, climate factors, texture features, and soil properties. The performance of three machine learning classifiers under different feature combinations was compared to quantify the contribution of each feature category to crop discrimination and to optimize the feature set and model configuration. This framework establishes a remote sensing-based crop classification approach tailored for high-altitude agricultural regions, achieving a balance between classification accuracy and computational efficiency and thereby supporting crop mapping and monitoring in complex environments.

2.3.1. Multi-Dimensional Feature Extraction and Band Optimization

To enhance the discriminative capability of crop classification models, we constructed a multi-scale, multi-type feature space from Sentinel-2 Level-2A surface reflectance imagery by integrating vegetation indices and auxiliary environmental variables. The feature space comprised three complementary dimensions: (i) intra-annual statistical and phenological characteristics, (ii) temporal morphological and textural information, and (iii) environmental constraint factors. Together, these components formed a high-dimensional feature set encompassing spectral, temporal, textural, topographic, climatic, and soil factors. The complete list of feature bands is provided in Table 4.
(1) Annual statistics and phenological characteristics
Six vegetation indices were derived to characterize crop spectral responses, biophysical properties, and environmental stress information. Cloud, shadow, and snow/ice pixels were masked using the QA60 band and the SCL layer in Sentinel-2 Level-2A products. Surface reflectance was then rescaled using a scale factor of 0.0001. Based on the masked and rescaled imagery, we calculated the following indices: Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI) [36], Land Surface Water Index (LSWI), Normalized Drought Index (NDI), Cultivation Index (CI), and Green Canopy Vegetation Index (GCVI). These indices capture vegetation greenness, resistance to atmospheric disturbance, canopy moisture status, drought stress, cultivation characteristics, and chlorophyll content [37,38].
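As an illustration of the rescaling and index computation, the following Python sketch evaluates four of the six indices for a single pixel. The scale factor of 0.0001 is taken from the text. The formulas are the standard published definitions of NDVI, EVI, LSWI, and GCVI, and the band roles (blue, green, red, near-infrared, shortwave infrared) are assumptions, because the text does not state which Sentinel-2 bands were used. NDI and CI are omitted because their formulations are not given in this section.

```python
SCALE = 0.0001  # surface-reflectance scale factor stated in the text


def to_reflectance(digital_number: float) -> float:
    """Convert a stored Level-2A value to surface reflectance."""
    return digital_number * SCALE


def vegetation_indices(blue, green, red, nir, swir1):
    """Compute NDVI, EVI, LSWI, and GCVI from reflectance values."""
    ndvi = (nir - red) / (nir + red)
    # Standard EVI coefficients: gain 2.5, C1 = 6, C2 = 7.5, L = 1.
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    lswi = (nir - swir1) / (nir + swir1)
    gcvi = nir / green - 1.0
    return {"NDVI": ndvi, "EVI": evi, "LSWI": lswi, "GCVI": gcvi}


# Hypothetical stored values for one cloud-free cropland pixel.
pixel = {"blue": 500, "green": 800, "red": 600, "nir": 3000, "swir1": 1500}
reflectance = {band: to_reflectance(dn) for band, dn in pixel.items()}
indices = vegetation_indices(**reflectance)
```

For this hypothetical pixel the NDVI is about 0.67 and the GCVI is 2.75, values typical of a green canopy.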
To exploit intra-annual temporal information, statistical metrics were extracted at multiple time scales. For each index, monthly medians, means, maxima, minima, and standard deviations were calculated. Annual median composites were also generated to suppress short-term noise and residual contamination. Phenological features were extracted using mean values within representative phenological windows (April–May, June–July, August–September) to capture crop-specific growth rhythms. The vegetation index feature system is summarized in Table 4.
We also experimented with monthly phenological windows (May, June, July, August, and September), but found that frequent cloud cover during the plateau monsoon season led to unreliable monthly composites. Therefore, we adopted bimonthly aggregated windows (April–May, June–July, August–September), which provide a more robust representation of crop phenology without losing essential seasonal patterns. Monthly statistics (median, mean, etc.) are still included in the feature set (Table 4) to capture finer intra-seasonal dynamics.
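The monthly statistics and bimonthly phenological-window means can be sketched for a single pixel as follows. This is a simplified stand-in for the per-pixel reducers that would run on GEE. The window definitions follow the text; the observation list is hypothetical, masked observations are represented by None, and the population standard deviation is an assumed choice, since the text does not specify the estimator.

```python
from statistics import mean, median, pstdev

# Bimonthly phenological windows stated in the text.
WINDOWS = {"Apr-May": (4, 5), "Jun-Jul": (6, 7), "Aug-Sep": (8, 9)}


def monthly_stats(observations):
    """Group (month, value) pairs by month and summarize the valid values."""
    by_month = {}
    for month, value in observations:
        if value is not None:  # skip cloud- or snow-masked observations
            by_month.setdefault(month, []).append(value)
    return {
        month: {
            "median": median(vals),
            "mean": mean(vals),
            "max": max(vals),
            "min": min(vals),
            "std": pstdev(vals),
        }
        for month, vals in sorted(by_month.items())
    }


def window_means(observations):
    """Average the valid values that fall inside each phenological window."""
    result = {}
    for name, months in WINDOWS.items():
        vals = [v for m, v in observations if m in months and v is not None]
        result[name] = mean(vals) if vals else None
    return result


# Hypothetical NDVI observations for one pixel; one June scene is masked.
ndvi_series = [(4, 0.2), (5, 0.3), (6, None), (6, 0.5), (7, 0.7), (8, 0.6), (9, 0.4)]
stats = monthly_stats(ndvi_series)
phenology = window_means(ndvi_series)
```

Pooling two months per window means that a single masked scene, such as the June observation here, does not leave the window empty.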
(2) Temporal Patterns and Texture Information
Building on NDVI, EVI, and LSWI, we further derived smoothed intra-annual indices, texture features, and seasonal variation parameters based on harmonic regression fitting. NDVI_smooth, EVI_smooth, and LSWI_smooth were obtained through annual median compositing of the time series, thereby reducing the influence of clouds, noise, and outliers. Texture features were extracted from gray-level co-occurrence matrices (GLCMs) computed in a 3 × 3 window, encompassing statistics such as homogeneity, contrast, and entropy. These metrics represent spatial structural heterogeneity and are particularly useful for distinguishing crop types with similar spectral signatures but different spatial patterns in fragmented cropping areas. Seasonal parameters derived from harmonic regression fitting (e.g., the cosine and sine components of NDVI) were used to characterize periodic crop dynamics. The constant term (const_NDVI) reflects the baseline vegetation level over time, with elevated values predominantly occurring in intensively farmed regions with favorable irrigation conditions and relatively flat terrain. This variable therefore serves as an important temporal structural feature for delineating variations in crop development.
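The harmonic regression step can be illustrated with a least-squares fit of the model y(t) = const + a·cos(2πt) + b·sin(2πt), where t is time in fractional years. The sketch below is a minimal example under the assumption of a single annual harmonic; the text does not state the number of harmonics used, and the function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_harmonic(t_years, values):
    """Fit const + a*cos(2*pi*t) + b*sin(2*pi*t) via the normal equations."""
    rows = [[1.0, math.cos(2 * math.pi * t), math.sin(2 * math.pi * t)]
            for t in t_years]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    const, cos_coef, sin_coef = solve_linear(xtx, xty)
    return {"const": const, "cos": cos_coef, "sin": sin_coef}


# Synthetic monthly NDVI generated from known coefficients.
times = [i / 12 for i in range(12)]
ndvi = [0.4 - 0.2 * math.cos(2 * math.pi * t) + 0.1 * math.sin(2 * math.pi * t)
        for t in times]
coefficients = fit_harmonic(times, ndvi)
```

The fitted constant corresponds to the const_NDVI feature described above, while the cosine and sine coefficients jointly encode the amplitude and timing of the seasonal cycle.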
(3) Environmental Constraint Factors
Topographic factors are key environmental constraints on crop distribution, especially in plateau regions where they strongly delineate cultivation boundaries [39]. Slope and elevation were derived from a 30 m SRTM DEM. As non-spectral auxiliary variables, topographic features combined with spectral indices such as NDVI and EVI can significantly improve the representation of crop-terrain coupling.
To further characterize environmental heterogeneity, climate and soil variables were incorporated. Climate variables included annual mean temperature and annual cumulative precipitation from the ERA5-Land reanalysis product, which were bilinearly resampled to 30 m to match Sentinel-2 spatial resolution [40]. Soil variables were derived from OpenLandMap, including topsoil texture type (USDA classification) and clay content (%) in the 0–10 cm topsoil layer [41]. These variables reflect soil physical structure and its capacity for water and nutrient retention, thereby providing important constraints on crop suitability.
(4) Feature multicollinearity diagnosis
To assess potential redundancy in the high-dimensional feature set (originally 103 bands), we performed a multicollinearity diagnosis. Feature values were extracted from the 342 balanced training samples. The variance inflation factor (VIF) was calculated for each feature using ordinary least squares regression, where VIF = 1/(1 − R²) and R² is the coefficient of determination obtained by regressing that feature on all remaining features. A VIF value greater than 10 is commonly considered indicative of severe multicollinearity. An iterative backward elimination procedure was applied: at each step, the feature with the highest VIF was removed, and VIFs were recalculated for the remaining features until all VIF values fell below 10. This analysis was implemented in Jupyter Lab (Version 4.5.0) with the statsmodels (version 0.14.0) and pandas (version 2.0.3) libraries.
The procedure reduced the original 103 effective features (after removing constant columns) to 51 non-collinear features, with all VIF values below 10 (Table A1). However, because RF is inherently robust to multicollinearity due to its use of random feature subspacing and its independence from coefficient estimation, we additionally compared the classification performance of RF using the full 103-feature set versus the reduced 51-feature set. The comparison used the same training/testing samples and RF hyperparameters.
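The backward elimination procedure can be sketched without external libraries. The study used statsmodels; the version below is a stand-in that relies on the identity that, for regressions including an intercept, the VIF of feature j equals the j-th diagonal element of the inverse of the feature correlation matrix. The toy features are hypothetical, and constant columns are assumed to have been removed beforehand, as in the text.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of equally long, non-constant columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mu = sum(col) / n
        dev = [v - mu for v in col]
        norm = sum(d * d for d in dev) ** 0.5  # zero only for constant columns
        unit.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


def backward_eliminate(features, threshold=10.0):
    """Drop the highest-VIF feature until every VIF is below the threshold."""
    kept = dict(features)
    while len(kept) > 1:
        names = list(kept)
        scores = vif([kept[name] for name in names])
        worst = max(range(len(names)), key=lambda i: scores[i])
        if scores[worst] < threshold:
            break
        del kept[names[worst]]
    return kept


# Hypothetical features: x3 is almost exactly x1 + x2, hence collinear.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
noise = [0.01, -0.02, 0.015, 0.0, -0.01, 0.02, -0.015, 0.0]
x3 = [a + b + e for a, b, e in zip(x1, x2, noise)]
features = {"x1": x1, "x2": x2, "x3": x3}
kept = backward_eliminate(features)
```

Removing a single feature resolves the collinearity in this toy case, which mirrors how the procedure reduced the 103 features to 51.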
Seven feature-combination schemes were constructed to systematically evaluate the contribution of multi-dimensional features to crop classification (Table 5). Scheme 1 uses only vegetation index features as the baseline. Schemes 2–5 each combine vegetation indices with one additional feature category: phenological characteristics (Scheme 2), topographic features (Scheme 3), climate characteristics (Scheme 4), and texture features (Scheme 5). Scheme 6 integrates vegetation indices with phenology, topography, climate, and texture (five categories in total). Scheme 7 further adds soil characteristics, resulting in a six-category composite feature set.
It should be noted that the seven schemes shown in Table 5 do not represent all possible combinations of the six feature categories. In preliminary experiments, we also evaluated intermediate combinations (e.g., VI + two categories, VI + three categories, VI + four categories). However, these intermediate combinations yielded only marginal improvements in overall accuracy (<1.5%) compared to the best single-category addition (texture), and their performance was adequately represented by the cumulative trends observed in Schemes 6 and 7. Therefore, we omitted them from the final scheme table to maintain clarity and focus on the most informative comparisons: the individual contribution of each feature type (Schemes 2–5) and the full integration of all categories (Schemes 6 and 7).
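The seven schemes can be expressed as a simple lookup that maps each scheme to its feature categories. The sketch below follows the scheme definitions given in the text; the category keys and the band names in the usage example are hypothetical placeholders, not the entries of Table 4.

```python
# Feature categories included in each scheme, as described in the text.
FEATURE_SCHEMES = {
    1: ("vegetation_index",),
    2: ("vegetation_index", "phenology"),
    3: ("vegetation_index", "topography"),
    4: ("vegetation_index", "climate"),
    5: ("vegetation_index", "texture"),
    6: ("vegetation_index", "phenology", "topography", "climate", "texture"),
    7: ("vegetation_index", "phenology", "topography", "climate", "texture",
        "soil"),
}


def select_bands(scheme, bands_by_category):
    """Return the band names used by a scheme, in category order."""
    return [band
            for category in FEATURE_SCHEMES[scheme]
            for band in bands_by_category[category]]


# Hypothetical band names, one or two per category, for illustration only.
bands_by_category = {
    "vegetation_index": ["NDVI_median", "EVI_median"],
    "phenology": ["NDVI_JunJul_mean"],
    "topography": ["elevation", "slope"],
    "climate": ["temperature_mean", "precipitation_sum"],
    "texture": ["NDVI_contrast"],
    "soil": ["clay_content"],
}
scheme3_bands = select_bands(3, bands_by_category)
```

Keeping the schemes in one table makes it straightforward to loop over them with each classifier when comparing accuracies.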

2.3.2. Classification Model Construction

To achieve high-precision identification of cropland types, three widely used supervised classification models were implemented: RF, GBT, and SVM.
(1) RF Classification Model
RF was used for supervised classification with a multi-dimensional feature stack including vegetation-index statistics, smoothed indices, phenological-window metrics, harmonic regression coefficients (NDVI, EVI, and LSWI), GLCM texture features, and topographic, climate, and soil variables. RF is an ensemble learning method composed of multiple decision trees and is known for its robustness, strong generalization, and resistance to overfitting [42]. The RF model was configured with 800 decision trees and a minimum leaf size of 5 samples. Each tree was trained on a random 70% subsample of the training data, and 10 randomly selected features were considered as candidates at each split.
(2) GBT Classification Model
GBT improves model performance by sequentially fitting new trees to the residuals of the current ensemble. The advantages of GBT lie in its ability to progressively fit complex boundaries, depict nonlinear relationships among features, and demonstrate strong robustness against outliers [43]. Given that land cover classification exhibits nonlinear, multiscale, and feature coupling characteristics, this study introduces GBT as a second ensemble learning model, with input features consistent with RF classification.
The parameter settings for GBT adhere to the stable convergence principle of “weak learners + small step size”, with specific parameters as follows: the number of trees is set to 400, the shrinkage rate is 0.05, the sampling rate is 0.7, and the maximum number of leaf nodes per tree is 64. This parameter configuration strikes a balance between fitting capability and preventing overfitting, making it suitable for common challenges in remote sensing scenarios.
(3) SVM Classification Model
SVM can automatically identify support vectors with significant discriminative power for classification, and is particularly suited for handling classification problems with small samples in high-dimensional data [44]. Input features remained consistent with the preceding methods. The SVM employed a radial basis function (RBF) kernel with the following key hyperparameter settings: C (cost parameter) was set to 10, representing the soft margin penalty coefficient to balance margin maximization and error tolerance; gamma was set to 0.05, serving as the width parameter of the RBF kernel to control the influence range of samples in the feature space. In high-dimensional feature spaces, the radial basis function SVM (RBF-SVM) constructs an optimal separating hyperplane in the kernel-induced space, which corresponds to a nonlinear decision boundary in the original feature space, and it demonstrates robust performance against fuzzy boundaries and a small number of outliers.
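The hyperparameters reported for the three classifiers can be collected into one configuration, sketched below in Python. The numerical values come from the text. The mapping onto the Earth Engine constructors smileRandomForest, smileGradientTreeBoost, and libsvm is an assumption, because the text states only that training was run on GEE and does not name the classifier functions. The builder function is hypothetical and requires an authenticated Earth Engine session, so it is defined here but not called.

```python
# Hyperparameters reported in the text, keyed by Earth Engine argument names.
RF_PARAMS = {
    "numberOfTrees": 800,      # number of decision trees
    "variablesPerSplit": 10,   # features drawn at each split
    "minLeafPopulation": 5,    # minimum samples per leaf
    "bagFraction": 0.7,        # fraction of samples bagged per tree
}
GBT_PARAMS = {
    "numberOfTrees": 400,
    "shrinkage": 0.05,         # learning rate ("small step size")
    "samplingRate": 0.7,
    "maxNodes": 64,            # maximum leaf nodes per tree
}
SVM_PARAMS = {
    "kernelType": "RBF",
    "gamma": 0.05,
    "cost": 10,
}


def build_classifiers():
    """Instantiate the classifiers (assumed constructors, see lead-in)."""
    import ee  # imported lazily so the tables above can be used offline

    return {
        "RF": ee.Classifier.smileRandomForest(**RF_PARAMS),
        "GBT": ee.Classifier.smileGradientTreeBoost(**GBT_PARAMS),
        "SVM": ee.Classifier.libsvm(**SVM_PARAMS),
    }
```

Keeping the settings in plain dictionaries makes it easy to report them alongside the accuracy results and to reuse them unchanged across the seven feature schemes.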

2.3.3. Model Training, Accuracy Validation, and Generalization Capability Assessment

To enhance the model’s generalization ability and classification performance for minority classes, the original samples were first randomly split into training (80%) and testing (20%) sets. Jitter augmentation was then applied exclusively to the training set. Specifically, two independent sets of random numbers were generated for each training sample and added to the longitude and latitude dimensions, respectively, with a perturbation magnitude of approximately 0.00025° (about 25–30 m), thereby generating an augmented training set. The augmented training set was then merged with the original training samples, and class balancing (by downsampling all classes to the size of the minority class) was performed on the combined training set. The test set remained as the original unaugmented samples. This procedure ensures no overlap between training and test samples.
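The order of operations (split first, then augment and balance only the training set) can be sketched as follows. The class counts and the 0.00025° perturbation magnitude come from the text. The sample coordinates are hypothetical, the uniform jitter distribution is an assumption because the text does not specify how the random offsets were drawn, and the function names are illustrative.

```python
import random


def make_samples():
    """Build synthetic points with the class counts reported in the text."""
    counts = {"highland_barley": 219, "wheat": 79, "rapeseed": 44}
    samples = []
    for crop, n in counts.items():
        for _ in range(n):
            i = len(samples)
            samples.append({"id": i, "lon": 88.0 + 0.001 * i,
                            "lat": 29.0 + 0.001 * i, "crop": crop})
    return samples


def split_samples(samples, train_fraction=0.8, seed=0):
    """Randomly split the samples into training and testing sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def jitter(samples, magnitude=0.00025, seed=0):
    """Copy each sample with independent random offsets in lon and lat."""
    rng = random.Random(seed)
    return [{"id": s["id"],
             "lon": s["lon"] + rng.uniform(-magnitude, magnitude),
             "lat": s["lat"] + rng.uniform(-magnitude, magnitude),
             "crop": s["crop"]} for s in samples]


def balance(samples, seed=0):
    """Downsample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for s in samples:
        by_class.setdefault(s["crop"], []).append(s)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    return balanced


samples = make_samples()
train, test = split_samples(samples)
augmented = jitter(train)              # augmentation touches training data only
balanced = balance(train + augmented)  # merge, then balance the classes
```

Because the jittered copies inherit from training points only, no test point or perturbed copy of a test point can enter the balanced training set.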
Model training was performed using the RF, GBT, and SVM classifiers on the GEE platform. Crop type codes were used as the classification attribute, and the input feature bands were identified automatically from the imagery and integrated for training. Classification results were output within the cultivated land mask, accompanied by area statistics.
Accuracy was assessed using an independent field-based validation dataset and confusion matrices. Overall accuracy and the kappa coefficient were calculated to compare the classification performance of different models [45]. To reduce the randomness in sample partitioning and model fitting, we reported the mean values of overall accuracy and the kappa coefficient across five classification runs as the final accuracy metrics.
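As a reference for how these two metrics follow from a confusion matrix, here is a minimal sketch using the standard definitions of overall accuracy and Cohen's kappa, followed by averaging across runs. The confusion matrices are hypothetical and are not taken from the study; rows are assumed to be reference classes and columns predicted classes.

```python
from statistics import mean

def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def cohen_kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), with p_e from the marginals."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical confusion matrices from repeated runs (3 classes each).
runs = [
    [[50, 5, 5], [4, 20, 1], [3, 2, 10]],
    [[52, 4, 4], [5, 19, 1], [2, 3, 10]],
]
oa = overall_accuracy(runs[0])
kappa = cohen_kappa(runs[0])
mean_oa = mean(overall_accuracy(cm) for cm in runs)
mean_kappa = mean(cohen_kappa(cm) for cm in runs)
print(oa, kappa, mean_oa, mean_kappa)
```

Kappa discounts the agreement expected by chance from the class marginals, which is why it can be much lower than overall accuracy when one class dominates.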
To evaluate the spatial transferability of the RF model trained on Shigatse, we applied the model directly to the Shannan region—an adjacent agricultural area with similar elevation range but different cropping patterns and phenological schedules. Independent validation samples collected from Shannan using the same field survey protocol as that used in Shigatse consisted of 252 points. The trained RF model was applied to the Shannan area without any retraining or parameter adjustment. The predicted crop types were compared with the ground truth labels to calculate overall accuracy and kappa coefficient. This test provides a realistic assessment of model generalization across the heterogeneous Tibetan Plateau landscape.

3. Results

3.1. Accuracy Evaluation of Three Machine-Learning Classifiers

To evaluate the classification performance of the three machine-learning classifiers for cereal and oil crops in cold, high-altitude regions, seven feature-combination schemes incorporating vegetation indices, phenological variables, climatic variables, textural features, soil variables, and topographic variables were assessed using Sentinel-2 imagery in the Shigatse study area. Among the three machine-learning classifiers, the RF model achieved the best performance, with the highest overall accuracy of 84.77% and the highest kappa coefficient of 0.64, indicating strong agreement between the classification results and actual crop types [46]. This result exceeded the conventional accuracy threshold for remote sensing crop classification (OA > 80%, kappa > 0.60), demonstrating high reliability and practical applicability. According to the Landis and Koch standards, a kappa value between 0.61 and 0.80 indicates substantial agreement, suggesting that the RF model provided reliable classification results in the study area. The GBT model yielded its best overall classification accuracy of 78.47% with a kappa coefficient of 0.55, representing moderate-to-high discrimination capability across various land cover types. The SVM model achieved its best overall accuracy of 82.89%, with a kappa coefficient of 0.55 (Figure 6 and Figure 7).
Averaged over the seven feature-combination schemes, the three machine-learning classifiers ranked SVM > RF > GBT in kappa coefficient. In the Shigatse study area, the SVM classifier also achieved the highest average overall accuracy (72.33%), exceeding RF and GBT by 6 and 8 percentage points, respectively, while its average kappa coefficient (0.46) was 0.06 and 0.07 higher than those of RF and GBT (Table 6).
In summary, the RF model achieved the highest peak classification performance under the optimal feature combination, whereas the SVM model showed more stable average performance across feature schemes. Considering both the upper limit of classification accuracy and model stability, both RF and SVM produced satisfactory classification results, with RF superior at its peak. These two models can serve as useful references for the identification of cereal and oil crops in the high-altitude regions of Tibet.
To assess spatial generalization, the trained RF model was directly applied to the Shannan region without retraining. As detailed in Appendix A.2, Table A2, independent validation samples yielded an overall accuracy of 60.00% and a kappa of 0.40, indicating a clear decline in performance. This confirms that the model’s high internal accuracy does not guarantee cross-regional transferability. Visual inspection of the classification map (Figure A3) suggests that the decline in model generalization performance stems primarily from interregional spatial heterogeneity and phenological bias. While the model successfully identifies distribution patterns along the main river valleys, significant classification fragmentation occurs in the tributary terminal areas of Shannan. The detailed inset further reveals severe confusion between highland barley and wheat. This is attributed to regional variations in elevation gradients and phenological rhythms between the two areas, which lead to insufficient feature adaptability when the model is transferred from Shigatse to Shannan.

3.2. Feature Set Construction and Band Optimization

Across all machine learning classifiers, the composite feature scheme integrating all feature categories (Scheme 7) achieved the highest overall accuracy and kappa coefficient. Taking the RF classifier as an example, the average overall accuracy was only 58.56% when only vegetation-index features were used (Scheme 1, Table 7). With the progressive integration of phenological, climatic, textural, soil, and topographic features, the overall accuracy increased to 84.77% (kappa coefficient of 0.64) under Scheme 7, a gain of 26.21 percentage points. Similar improvement trends were observed for SVM and GBT, indicating that multi-dimensional feature integration substantially enhances crop classification performance.
Feature importance and out-of-bag (OOB) accuracy derived from the optimized RF model are shown in Figure 8 and Figure 9. Feature importance was ranked as follows: vegetation indices > climatic variables > textural features > soil attributes > topographic factors > phenological indicators. Vegetation indices dominated the importance ranking, with the four most important features belonging to this category (Figure 9). Key indices such as sin3_LSWI, cos6_LSWI, and sin3_NDVI each contributed to increasing OOB accuracy to above 63%, confirming their foundational role in classification.
Climatic characteristics, particularly indicators related to temperature and precipitation, ranked second in importance, providing supplementary information that moderately improved classification performance. Texture features aided in distinguishing crop types based on canopy structure details, while soil and topographic characteristics had relatively minor impacts. However, certain variables related to soil properties and elevation still provided localized discriminatory information under specific conditions.
The OOB accuracy ranged from 57.41% to 65.28%, indicating significant variation in predictive contribution across different features (Figure 10). Only six features showed excellent discriminatory capability (OOB > 64.00%), 11 were good (63.00–64.00%), 66 were moderate (61.00–63.00%), and 20 were poor (≤61.00%). Using an OOB threshold of ≥63%, 17 highly discriminative features were selected to form an optimized feature subset (Table 8). The top-performing features included EVI_4_5 (OOB = 65.28%), GLCM_NDVI_smooth_savg, and NDVI_8_9 (both with OOB values of 64.81%). Notably, three phenological and three textural features ranked among the top six, underscoring their joint importance in capturing crop growth dynamics and structural patterns. Removing 20 low-contribution features (OOB < 60.00%) effectively reduced dimensionality while maintaining classification performance.
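The tiering and thresholding rule described above can be written down compactly. The sketch below assumes the tier boundaries exactly as stated in the text (excellent above 64%, good from 63% to 64%, moderate above 61% and below 63%, poor at or below 61%). The first three feature names and OOB values come from the text; `feature_x` and `feature_y` are hypothetical placeholders.

```python
def oob_tier(oob):
    """Assign a discriminative tier from a per-feature OOB accuracy (in %)."""
    if oob > 64.0:
        return "excellent"
    if oob >= 63.0:
        return "good"
    if oob > 61.0:
        return "moderate"
    return "poor"

# OOB accuracy per feature; the last two entries are hypothetical.
oob_by_feature = {
    "EVI_4_5": 65.28,
    "GLCM_NDVI_smooth_savg": 64.81,
    "NDVI_8_9": 64.81,
    "feature_x": 62.10,
    "feature_y": 57.41,
}

tiers = {name: oob_tier(value) for name, value in oob_by_feature.items()}
# Optimized subset: every feature with OOB accuracy >= 63%.
selected = [name for name, value in oob_by_feature.items() if value >= 63.0]
print(tiers)
print(selected)
```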

3.3. Effect of VIF-Based Feature Reduction on Classification Accuracy

The iterative VIF elimination procedure removed 49 features, resulting in a reduced set of 51 features with all VIF values below 10 (maximum VIF = 9.76). The removed features were predominantly GLCM texture bands and some harmonic components (e.g., const_NDVI, const_EVI, and LSWI_smooth). The final VIF values for the retained features are presented in Figure A1. Figure A2 shows the Pearson correlation matrix of the selected features after recursive VIF filtering. The results demonstrate that the high multicollinearity among the initial multi-source features (especially the GLCM texture indices) was effectively mitigated. The majority of the remaining variables exhibit low to moderate correlation coefficients, ensuring that the input features provide non-redundant information for the random forest classifier.
When the RF model was retrained using only the 51 selected features, the overall accuracy (OA) decreased from 84.77% (with the full 103-feature set) to 65.0%, and the kappa coefficient dropped from 0.64 to 0.44 (Table A1). This decline indicates that, for the random forest, removing highly collinear features does not improve classification performance but rather reduces it. We attribute this to two factors: (1) RF is insensitive to multicollinearity because it randomly selects a subset of features at each split, and (2) some collinear features (e.g., multiple GLCM textures) carry complementary spatial information that benefits crop discrimination in the heterogeneous Tibetan Plateau landscape. Therefore, the full feature set was retained for final classification.
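For readers unfamiliar with the procedure, the following is a minimal sketch of one common form of iterative VIF elimination: compute each feature's VIF, drop the feature with the largest value, and repeat until every VIF is below 10. It is an assumption that the authors' procedure follows this exact rule. The VIF is obtained here as the diagonal of the inverse correlation matrix, which equals 1 / (1 - R²) for each feature, and the synthetic features `f_a`, `f_b`, and `f_c` are hypothetical.

```python
import random

def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length, non-constant columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = sum(d * d for d in dev) ** 0.5
        unit.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_values(columns):
    """VIF of each feature = diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

def vif_elimination(features, threshold=10.0):
    """Drop the highest-VIF feature until all VIFs are below the threshold."""
    kept = dict(features)
    while len(kept) > 1:
        names = list(kept)
        vifs = vif_values([kept[k] for k in names])
        worst = max(range(len(names)), key=lambda i: vifs[i])
        if vifs[worst] < threshold:
            break
        del kept[names[worst]]
    return kept

# Synthetic example: f_b is nearly a copy of f_a, f_c is independent.
rng = random.Random(0)
f_a = [rng.gauss(0, 1) for _ in range(200)]
f_b = [x + rng.gauss(0, 0.05) for x in f_a]
f_c = [rng.gauss(0, 1) for _ in range(200)]
kept = vif_elimination({"f_a": f_a, "f_b": f_b, "f_c": f_c})
print(sorted(kept))
```

In this toy case one of the two near-duplicate features is removed and the independent one is kept. VIF measures only linear redundancy, which is consistent with the observation above that a tree ensemble can still lose useful information when such features are dropped.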

3.4. Spatial Distribution of Major Cereal and Oil Crops

Multi-dimensional time-series features combined with machine learning models enabled high-precision identification of major cereal and oil crops in Shigatse. Figure 10 shows the classification results of the three machine learning classifiers under different feature combinations.
(1)
RF Classification Results
The RF-based classification map (Figure 10b) identified three primary crops: highland barley, wheat, and rapeseed. In 2021, the mapped areas were 581.52 km² for highland barley, 295.39 km² for wheat, and 386.81 km² for rapeseed (Table 9). Wheat was sparsely distributed across central and northern plateaus, forming discrete patches that reflect the reality of limited large-scale contiguous cultivation due to seasonal irrigation constraints. Rapeseed occurred mainly along transport corridors and alluvial fans, forming linear or punctate patches, which reflects its dependence on thermal and irrigation resources. The RF classifier demonstrated robust category discrimination by integrating multi-temporal spectral, textural, and topographic features, producing crop distribution maps with high classification accuracy and strong geographic interpretability.
(2)
GBT Classification Results
The GBT-based classification map (Figure 10d) identified three primary crops: highland barley, wheat, and rapeseed. In 2021, the mapped areas were 525.41 km² for highland barley, 333.16 km² for wheat, and 404.53 km² for rapeseed (Table 9). Highland barley was widely distributed along central river valleys and foothill areas in linear strip-like patterns that closely align with the alluvial plains of the river valleys. Wheat was primarily concentrated in open, gently sloping intermontane basins with contiguous patches. Rapeseed was scattered in lower-elevation areas with favorable slope orientations, reflecting its localized cultivation preferences. Although its accuracy was lower than that of the RF model, GBT showed good applicability in complex terrain environments.
(3)
SVM Classification Results
The SVM-based classification map (Figure 10f) identified three primary crops: highland barley, wheat, and rapeseed. In 2021, the mapped areas were 1267.07 km² for highland barley, 0.03 km² for wheat, and 0.03 km² for rapeseed (Table 9). Highland barley was the primary crop type, predominantly distributed in the river valleys of the southern and central regions in continuous strip-like patterns, whereas wheat and rapeseed were barely identified. Other land cover types (gray) are primarily concentrated in high-altitude and non-agricultural areas with clear boundaries, indicating that the SVM model retains a strong capability in distinguishing agricultural from non-agricultural land cover but shows limited discrimination among minority crop types. Although the SVM model achieved relatively competitive overall accuracy and kappa values, it exhibited substantial omission errors in the identification of wheat and rapeseed.

4. Discussion

4.1. Machine Learning Model Performance and GEE-Based Workflow Feasibility for Highland Crop Classification in the Tibetan Plateau

This study addresses a critical challenge in remote sensing-based crop identification on the Tibetan Plateau: the development of a robust and accurate framework for mapping cereal and oil crops under complex terrain, severe climatic constraints, and unstable spectral responses [7,8]. By leveraging the cloud-based GEE platform, we integrated multi-temporal Sentinel-2 imagery, multi-dimensional feature sets, and machine learning classifiers into a unified mapping workflow. This framework effectively overcomes the limitations of conventional field surveys and single-temporal image classification [6,18,19,47], demonstrating the feasibility and scalability of cloud-based remote sensing for agricultural monitoring in remote, high-altitude regions.
Methodological novelty: Unlike previous studies that often rely on a single feature type (e.g., only spectral or only temporal) [20,21], our work introduces a synergistic combination of six feature categories specifically tailored to the plateau’s environmental heterogeneity. Moreover, the entire pipeline—from data preprocessing (including the use of Sentinel-2 Level-2A terrain-corrected products) to feature extraction (monthly statistics, phenological windows, GLCM texture, harmonic regression, and external environmental layers) and classifier training—was implemented entirely within the GEE environment. This cloud-native approach eliminates the need for local data download and high-performance computing infrastructure, which is a significant advantage for operational mapping in data-scarce regions [47]. To our knowledge, this is one of the first studies to systematically evaluate such a comprehensive feature set for distinguishing highland barley, wheat, and rapeseed in the central Tibetan Plateau.
We have systematically evaluated the classification performance of three machine learning classifiers: RF, SVM, and GBT. Among them, RF outperformed SVM and GBT in plateau environments. This superiority is attributable to the algorithmic structure of RF and the characteristics of the research data. The SVM algorithm classifies data by identifying an optimal hyperplane. When applied to high-dimensional and nonlinear data with substantial inter-class feature overlap, such as that observed among highland barley, wheat, and rapeseed in this study, its generalization capability is often limited [48]. This limitation becomes especially pronounced when training samples are imbalanced, as SVM models tend to favor the majority class, resulting in sparse or absent classification outputs for wheat and rapeseed, a phenomenon that can be described as category collapse. In contrast, RF integrates multiple decision trees generated through bootstrap sampling, which enables effective modeling of nonlinear relationships and enhances robustness to noise and class imbalance. These results indicate that ensemble learning models are more suitable for the high-altitude agricultural systems characterized by strong environmental heterogeneity.
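The gap between a competitive overall accuracy and near-absent minority classes is easiest to see with per-class recall (producer's accuracy). The sketch below uses a deliberately extreme, hypothetical confusion matrix in which a classifier never predicts wheat or rapeseed. The numbers are illustrative only and are not the study's results; rows are assumed to be reference classes and columns predicted classes.

```python
def producer_accuracy(cm):
    """Per-class recall: correctly classified samples / reference samples per row."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]

def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

classes = ["highland barley", "wheat", "rapeseed", "other"]
# Hypothetical matrix: every wheat and rapeseed sample is absorbed by
# the majority crop class or by "other".
cm = [
    [50, 0, 0, 2],   # highland barley
    [10, 0, 0, 1],   # wheat
    [7, 0, 0, 1],    # rapeseed
    [3, 0, 0, 26],   # other
]

oa = overall_accuracy(cm)
recall = dict(zip(classes, producer_accuracy(cm)))
print(oa)
print(recall)
```

Here the overall accuracy stays fairly high because the majority classes dominate the sample, while the recall for the two minority crops is zero, which is the pattern described above as category collapse.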

4.2. Mechanistic Interpretation of Multi-Dimensional Feature Integration and Dimensionality Reduction Analysis

A key finding is that the integration of phenological, topographic, and textural features substantially improved classification accuracy. This result directly addresses the long-recognized “same spectrum–different object” problem in plateau crop mapping [17,49]. On the Qinghai–Tibet Plateau, crop growth cycles and spectral characteristics are strongly regulated by topographic factors such as altitude and aspect, alongside the short and variable growing season. Phenological metrics (e.g., NDVI trajectories during critical growth stages) capture the unique temporal rhythms of different crops; topographic features (e.g., elevation, slope) explain spectral spatial differentiation driven by elevation-dependent hydrothermal stress; and textural characteristics enhance crop distinguishability within fragmented farmland by providing spatial contextual information. Their joint integration establishes a spatio-environmental feature representation system that extends beyond purely spectral discrimination and significantly enhances model robustness in complex alpine environments.
Mechanistically, the effectiveness of topographic features can be linked to the plateau’s terrain-controlled microclimate. Highland barley is often planted on slightly higher terraces with better drainage, whereas rapeseed tends to occupy lower, flatter areas with higher soil moisture. Elevation and slope thus act as proxies for these edaphic and hydrological conditions. Similarly, phenological features (especially NDVI and LSWI trajectories during June–July) reflect the rapid response of crop canopy development to the short summer warmth. The inclusion of soil clay content further improves separation because highland barley and wheat show preferences for different soil textures. These mechanistic linkages explain why the full feature set (Scheme 7) consistently outperformed simpler schemes.
Dimensionality reduction analysis and the “collinearity paradox”: A recursive backward elimination procedure reduced the feature set to 51 non-collinear variables (all VIF < 10). However, retraining the RF classifier with only these 51 features caused a marked decline in overall accuracy (from 84.77% to 65.0%) and kappa coefficient (from 0.64 to 0.44). This “collinearity paradox” suggests an important implication for our study: in the fragmented, high-altitude environment of Shigatse, feature redundancy may contribute to model robustness. The collinear features contain subtle, overlapping spatiotemporal signals that RF effectively leverages through its random feature subspace selection mechanism. Unlike linear models, RF is well suited to capturing nonlinear interactions among correlated variables. Removing these variables based solely on linear VIF metrics leads to significant information loss and model underfitting. Therefore, we retained the full feature set in the final model to maximize discriminatory information, while providing the complete collinearity analysis in Appendix A for transparency.

4.3. Analysis of Anomalous Results, Generalization Capability, and Future Directions

Analysis of anomalous results: Although the model achieved high overall accuracy, we observed several specific misclassification patterns. The most notable anomalies occurred in areas with complex topography (e.g., steep south-facing slopes) where highland barley and wheat were frequently confused. In these regions, solar illumination angle varies dramatically over short distances, and despite the use of terrain-corrected Sentinel-2 Level-2A products, residual shadow effects may still cause spectral confusion. Another anomaly was the systematic overestimation of rapeseed area in the western part of the study area, possibly due to spectral similarity with early-stage green vegetation (e.g., weeds) before full canopy cover. These anomalies highlight the need for more refined topographic normalization (e.g., using a high-resolution DEM and a physically based BRDF model) and the use of very high-resolution imagery for validation.
Generalization capability: The current workflow was developed and validated using field samples collected from the Shigatse region, a typical agro-pastoral zone of the central Tibetan Plateau. To test its transferability, we applied the trained RF model (with the full feature set) directly to the Shannan region, which has similar elevation ranges but different cropping patterns and phenological schedules. As reported in Section 3.1, the overall accuracy dropped from 84.77% (internal validation) to 60.0%, and the kappa coefficient decreased from 0.64 to 0.40. This substantial performance decline indicates that the model learned site-specific feature–crop relationships that do not generalize well across the Tibetan Plateau. The main sources of error were confusion between highland barley and wheat, likely due to overlapping phenological windows in Shannan, and between rapeseed and early-stage weeds. This suggests that while the feature set itself is theoretically transferable, the learned classification rules remain highly site-specific. Future work should adopt domain adaptation strategies, such as active learning, transfer learning, or incorporating more stable phenological features, to improve cross-regional generalization.
Limitations and future optimization: Several additional limitations indicate directions for future work. First, discrepancies exist between the crop areas extracted by the model and official statistics. This may be due to mixed pixels and field fragmentation, or sampling representativeness and mismatches in spatial units. Future work would benefit from more extensive field surveys and finer-scale alignment with agricultural statistical units. Second, while the model performed well for major cereal and oil crops, its ability to distinguish minor crops or intra-crop variants (e.g., highland barley with varying maturity periods) remains unverified. Incorporating finer-grained phenological parameters (e.g., dynamic time warping) or utilizing deep learning models (e.g., convolutional neural networks) may help address these limitations. Third, although the current workflow was validated in typical agricultural areas of Shigatse, its transferability across the broader Tibetan Plateau with diverse agricultural ecosystems requires further testing. Extending the framework to climate change scenarios could further enhance its applicability for dynamic crop monitoring and yield assessment.

5. Conclusions

This study developed a cloud-based crop mapping framework using the GEE platform to evaluate the performance of three machine learning classifiers and multi-dimensional feature combinations for cereal and oil crop identification on the Qinghai–Tibet Plateau. The main conclusions are as follows:
(1)
From the perspective of classification performance, the RF classifier achieved an overall accuracy of 84.77% and a kappa coefficient of 0.64 on the validation samples, indicating strong agreement between classified results and actual crop distributions. RF outperformed both SVM and GBT, with a best overall accuracy about 2 and 6 percentage points higher, respectively (84.77% versus 82.89% and 78.47%).
(2)
In classifying cereal and oil crops in the Shigatse area, feature importance derived from the RF model ranked in the order: vegetation indices > climatic variables > textural features > soil attributes > topographic factors > phenological indicators. Incorporating climatic, topographic, and phenological features significantly enhanced classification performance under cold, high-altitude conditions, highlighting their greater relevance compared to feature sets typically applied in lowland regions.
(3)
Based on the optimal model estimation, cultivated areas in 2021 were 581.52 km² for highland barley, 295.39 km² for wheat, and 386.81 km² for rapeseed. Their spatial patterns closely corresponded to valley-terrace topography and irrigation conditions. Highland barley predominated in the broad Yarlung Tsangpo River valley and alluvial fans (slope < 6°, elevation 3800–4050 m). Wheat occurred mainly in strips along gentle foothill slopes, and rapeseed was concentrated on low terraces adjacent to irrigation canals.
In summary, the proposed GEE-based RF framework provides a reliable and scalable approach for high-precision crop mapping in alpine cold regions and offers important technical support for sustainable agricultural management and food-security monitoring on the Qinghai–Tibet Plateau.

Author Contributions

A.L.: Conceptualization, Methodology, Software, Validation, Formal Analysis, Writing—Original Draft; H.S.: Conceptualization, Funding Acquisition, Project Administration, Supervision, Writing—Review & Editing; Y.L.: Data Curation, Investigation, Resources; Z.W.: Data Curation, Investigation, Visualization; A.R.H.: Methodology, Writing—Review & Editing, Validation; H.Z.: Investigation, Formal Analysis; G.Z.: Software, Validation, Formal Analysis; Y.W.: Writing—Review & Editing, Visualization, Formal Analysis; G.Y.: Software, Validation, Resources; X.Y.: Methodology, Writing—Review & Editing, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Sub-project of the Science and Technology Plan Project of the Tibet Autonomous Region (XZ202501ZY0045), the High-end Foreign Experts Recruitment Plan of China (G2022172016L), and the National Natural Science Foundation of China (41501055).

Data Availability Statement

The datasets used in this study consist of both publicly available authoritative datasets and self-collected datasets acquired for the identification of cereal and oil crops in Tibet. The Global Cropland 2019 dataset is available from the University of Maryland Potapov Team: https://glad.umd.edu/dataset/GLCLUC2020 (accessed on 5 August 2025). The boundary of the study area dataset is available from the National Center for Basic Geographic Information: https://www.ngcc.cn/xwzx/ywcg/202404/t20240426_2410.html (accessed on 15 August 2025). The Sentinel-2 MSI (A/B) imagery (including the derived albedo, indices, phenology, and harmonics) is available from the ESA Copernicus Open Access Hub: https://catalogue.ceda.ac.uk/uuid/9edbe5a1f7f5496cbc5863e53335b4a9/ (accessed on 25 August 2025). The clouds/cirrus/cloud shadows/snow/saturated pixel mask (QA60 and SCL scene classification) is derived from the same Sentinel-2 MSI data product and is available from the same ESA Copernicus source. The terrain dataset (elevation, slope, slope direction) is available from the NASA/USGS LP DAAC (SRTM C-band InSAR): https://doi.org/10.5066/F7PR7TFT (accessed on 28 August 2025). The climate dataset (annual average temperature, annual precipitation) is available from the ERA5-Land reanalysis dataset (Copernicus ECMWF) via the Copernicus Climate Data Store: https://cds.climate.copernicus.eu/ (accessed on 20 August 2025). The soil texture classification dataset is available from OpenLandMap: https://openlandmap.org/ (accessed on 28 August 2025). The soil clay content dataset is also available from OpenLandMap: https://openlandmap.org/ (accessed on 30 August 2025). All processed datasets generated during this study are available upon reasonable request from the corresponding author. This statement fully complies with the MDPI Research Data Policies.

Acknowledgments

We appreciate the assistance of the Google Earth Engine platform and its creators. We acknowledge the journal editor and the anonymous reviewers for their insightful criticisms and outstanding work on this research. We also acknowledge data support from “Loess plateau science data center, National Earth System Science Data Sharing Infrastructure, National Science & Technology Infrastructure of China (http://loess.geodata.cn) (accessed on 8 August 2025)”.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

Appendix A.1

Table A1. Comparison of classification accuracy using full vs. VIF-reduced feature sets.
| Feature Set | Number of Features | Overall Accuracy (%) | Kappa |
|---|---|---|---|
| Full set | 100 | 84.80 | 0.64 |
| VIF-reduced (VIF < 10) | 51 | 65.00 | 0.44 |

Appendix A.2

Figure A1. Histogram-style display of VIF distribution (features sorted by VIF). The red dashed line marks the threshold VIF = 10.
Table A2. Classification accuracy of the two areas.
| Area | Overall Accuracy (%) | Kappa |
|---|---|---|
| Shigatse | 84.80 | 0.64 |
| Shannan | 60.00 | 0.40 |
Figure A2. Correlation heat map of the final 51 features.
Figure A3. Classification Results of Shannan City. The upper-right inset provides an enlarged view of the red-boxed area from the main map.

References

  1. Xu, Y.B.; Wang, J. Impact of the synergistic digital-green transformation on food security capacity. J. China Agric. Univ. 2025, 30, 316–328.
  2. Duan, X.M.; Zhou, H.T. Safeguarding national food security with new quality productive forces: Theoretical logic, practical challenges, and pathways. Rural Econ. 2025, 4, 22–31.
  3. Shi, J.Q.; Luo, Z.; Yi, X.Z.M.; Liu, S.; Li, J.H.; Dan Zeng, Y.G.; Gan, C.L. Trend analysis of seasonal changes in Xizang based on climate change and new seasonal division methods. Arid Land Geogr. 2025, 48, 1141–1152.
  4. Food and Agriculture Organization of the United Nations. The State of Food and Agriculture 2025: Addressing Land Degradation Across Landholding Scales; FAO: Rome, Italy, 2025.
  5. Jiang, Y.; Shu, H.T.; Dong, X.C.; Chen, J.; Wang, X.Y.; Wei, L.; Li, Z.C.; Long, Y.Q.; Huang, P.; Ding, M.Z. Analysis of spatial pattern of typical crops in main grain producing areas of Xizang. Southwest China J. Agric. Sci. 2025, 38, 1305–1316.
  6. Tian, M.; Tian, Y.C. A study on sustainable urbanization based on ecological protection and human wellbeing in Xizang Autonomous Region. Chin. Sustain. Dev. Rev. 2024, 3, 82–94.
  7. Hu, J. Study on the current situation and promotion countermeasures of cultivated land quality grade in Tibet. Tibet J. Agric. Sci. 2021, 43, 75–78.
  8. Qin, J.W.; Sun, Q.P.; Yang, S.T.; Wang, Y.X.; Tashi, N. Preliminary study on current situation of farmland fragmentation in typical agricultural areas of Tibet: A case study of a township in Lazi County, Xigaze City. Tibet J. Agric. Sci. Technol. 2020, 42, 91–94.
  9. Yan, J.Z.; Zhang, M.; Zhang, S.Y. Information extraction of main crops in eastern Qinghai Province based on GEE platform and MODIS NDVI time series. J. Southwest Univ. (Nat. Sci. Ed.) 2023, 45, 55–64.
  10. Mou, X.L.; Li, H.; Huang, C.; Liu, Q.S.; Liu, G.H. Application progress of Google Earth Engine in land use and land cover remote sensing information extraction. Remote Sens. Land Resour. 2021, 33, 1–10.
  11. Cai, W.B.; Wang, W.; Zhu, Q.; Zhang, Z.D.; Peng, W.T.; Cai, Y.L. Research progress on the application of natural resource ecological security supported by Google Earth Engine big data. Acta Ecol. Sin. 2025, 45, 3544–3554.
  12. Hao, B.F.; Han, X.J.; Ma, M.G.; Liu, Y.T.; Li, S.W. Research progress on the application of Google Earth Engine in geoscience and environmental sciences. Remote Sens. Technol. Appl. 2018, 33, 600–611.
  13. Yin, L.; Han, Q.F.; Zhao, Y.; Liu, W.X. Identification of areas of Aconitum leucostomum incursion and monitoring of grassland degradation in the Tuohulasu grassland of Xinjiang based on multi-feature fusion. Acta Prataculturae Sin. 2025, 34, 73–84.
  14. Meng, J.H.; Lin, Z.X.; Gao, X.Y.; He, R.P.; Zuo, L.J. Ten years of progress in big Earth data for sustainable agriculture. J. Geo-Inf. Sci. 2025, 27, 2531–2551.
  15. Zhao, Y. Research progress and application of random forest in remote sensing information extraction. Geomat. Spat. Inf. Technol. 2021, 44, 133–136+139.
  16. Xiao, D.P.; Tao, F.L.; Shen, Y.J.; Liu, J.F.; Wang, R.D. Sensitivity of response of winter wheat to climate change in the North China Plain in the last three decades. Chin. J. Eco-Agric. 2014, 22, 430–438.
  17. Deng, C.; Bai, H.; Ma, X.; Huang, X.; Zhao, T. Variation characteristics and its north-south differences of the vegetation phenology by remote sensing monitoring in the Qinling Mountains during 2000–2017. Acta Ecol. Sin. 2021, 41, 1068–1080.
  18. Chen, J.; Liu, T.Y.; Shi, Q.; Dong, J.W.; Chen, Y. Early-season crop classification: Recent developments and prospects. Natl. Remote Sens. Bull. 2025, 29, 1890–1900.
  19. Han, Y.; Meng, J. A review of per-field crop classification using remote sensing. Remote Sens. Land Resour. 2019, 31, 1–9.
  20. Gao, R.N.; Shi, J.Z.; Fan, H.; Jia, Y.H. Study on crop classification in the Sanjiang Plain area based on random forest. J. Wuhan Univ. (Eng. Ed.) 2024, 57, 519–527.
  21. Song, Q. Extraction of Crop Planting Structure Based on GF-1/WFV Data and Object-Oriented Method. Master’s Thesis, Chinese Academy of Agricultural Sciences, Beijing, China, 2016.
  22. Shao, M.; An, J.; Liu, B.; Wu, J.; Zhang, Q.; Yao, X.; Cheng, T.; Jiang, C.; Cao, W.; Zheng, H.; et al. Comprehensive assessment of wheat seedling growth status based on multimodal data. Sci. Agric. Sin. 2025, 58, 3857–3871.
  23. Shi, J.Q.; Dou, Y.L.; Yang, F.Y.; Dai, R.; Hu, J. Spatiotemporal pattern characteristics of potential evapotranspiration and its influencing factors in the Tibet region. Arid Zone Res. 2021, 38, 724–732.
  24. Ci, W.; Du, J.; Zha, X.D.; Chen, X.Y.; Xiao, Z.J.; Liu, S. Climate response and operational impact of updated climatological normals on the Xizang Plateau. Arid Land Geogr. 2025, 1–12. [Google Scholar]
  25. Sun, J.; Lu, F.; Wang, K.M.; Zhou, Y.Y.; Yu, S.B.; Dai, Y.Y.; Zhu, C.Y. A study on the impact of climate change on the water requirement of highland barley in the Tibet Autonomous Region. China Rural Water Hydropower 2024, 6, 64–74+81. [Google Scholar]
  26. Chen, Y.; Yang, Q. Spatiotemporal evolution characteristics and driving factors of land use/cover change in Tibet Autonomous Region. J. Soil Water Conserv. 2022, 36, 173–180. [Google Scholar]
  27. Xiong, X. Changes in agricultural planting structure and influencing factors in Xigazê City, Tibet. Cultiv. Tillage 2020, 40, 39–43. [Google Scholar]
  28. Zhang, Y. Spatiotemporal variation analysis of chlorophyll-a concentration in the Sansha Bay based on Sentinel-2 data. Mod. Surv. Mapp. 2024, 47, 50–55+84. [Google Scholar]
  29. Zhao, D.; Wang, F.; Li, D. Analysis of spatiotemporal variation characteristics of sea ice in the Bohai Sea from 2017 to 2024 based on Google Earth Engine. J. Hebei Acad. Sci. 2025, 42, 74–79+90. [Google Scholar]
  30. Zhao, J.C.; Sun, X.F.; Wang, M.; Wang, J.B. Identification of farmland and grassland on the Tibetan Plateau based on optical and radar remote sensing data. Remote Sens. Technol. Appl. 2025, 40, 695–707. [Google Scholar] [CrossRef]
  31. Potapov, P.; Turubanova, S.; Hansen, M.C.; Tyukavina, A.; Zalles, V.; Khan, A.; Song, X.P.; Pickens, A.; Shen, Q.; Cortez, J. Global maps of cropland extent and change show accelerated cropland expansion in the twenty-first century. Nat. Food 2022, 3, 19–28. [Google Scholar] [CrossRef]
  32. Ministry of Civil Affairs of the People’s Republic of China. Code for the Administrative Divisions of the People’s Republic of China (2024) [Data Set]; Ministry of Civil Affairs of the People’s Republic of China: Beijing, China, 2024. [Google Scholar]
  33. Bing, F.F.; Jin, Y.T.; Zhang, W.H.; Xu, N.; Yu, T.; Zhang, L.L.; Pei, Y.Y. Research progress on cloud detection in remote sensing images based on machine learning. Remote Sens. Technol. Appl. 2023, 38, 129–142. [Google Scholar]
  34. NASA/USGS LP DAAC. Landsat 8–9 Collection 2 Level-2 Science Products [Data Set]; NASA EOSDIS Land Processes DAAC, USGS Earth Resources Observation and Science (EROS) Center: Sioux Falls, SD, USA, 2023. [Google Scholar]
  35. Copernicus Climate Change Service (C3S). ERA5-Land Hourly Data from 1950 to Present [Data Set]; Copernicus Climate Change Service (C3S) Climate Data Store (CDS): Bonn, Germany, 2019. [Google Scholar]
  36. Son, N.T.; Chen, C.F.; Chen, C.R.; Minh, V.Q.; Trung, N.H. A comparative analysis of multitemporal MODIS EVI and NDVI data for large-scale rice yield estimation. Agric. For. Meteorol. 2014, 197, 52–64. [Google Scholar] [CrossRef]
  37. Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, 6, 610–621. [Google Scholar] [CrossRef]
  38. Ao, D.; Yang, J.H.; Ding, W.T.; An, S.S.; He, H.L. A review of research progress on 54 vegetation indices. J. Anhui Agric. Sci. 2023, 51, 13–21+28. [Google Scholar]
  39. Yang, G.; Liu, J.; Zhao, C.; Li, Z.; Huang, Y.; Yu, H.; Yang, X. Unmanned aerial vehicle remote sensing for field-based crop phenotyping: Current status and perspectives. Front. Plant Sci. 2017, 8, 1111. [Google Scholar] [CrossRef]
  40. Muñoz Sabater, J. ERA5-Land Hourly Data from 1981 to Present [Data Set]; Copernicus Climate Change Service (C3S) Climate Data Store (CDS): Reading, UK, 2019. [Google Scholar]
  41. Hengl, T.; Mendes de Jesus, J.; Heuvelink, G.B.; Gonzalez, M.R.; Kilibarda, M.; Blagotić, A.; Kempen, B. SoilGrids250m: Global gridded soil information based on machine learning. PLoS ONE 2017, 12, e0169748. [Google Scholar] [CrossRef]
  42. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  43. Zhang, D. Remote Sensing Identification of Crop Planting Structure in the Chaohu Lake Basin. Master’s Thesis, Anhui Agricultural University, Hefei, China, 2024. [Google Scholar]
  44. Foody, G.M.; Mathur, A. A relative evaluation of multiclass image classification by support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1335–1343. [Google Scholar] [CrossRef]
  45. Yue, R. Land Cover Classification of the Mongolian Plateau Based on MODIS Data. Master’s Thesis, Inner Mongolia Normal University, Hohhot, China, 2010. [Google Scholar]
  46. Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 1977, 33, 159–174. [Google Scholar] [CrossRef]
  47. Huang, Z.L.; He, J.; Liu, G.; Li, Z. Research progress on remote sensing image analysis and application for the Google Earth Engine (GEE) platform. Remote Sens. Technol. Appl. 2023, 38, 527–534. [Google Scholar]
  48. Hoffman, A.L.; Kemanian, A.R.; Forest, C.E. Analysis of climate signals in the crop yield record of sub-Saharan Africa. Glob. Change Biol. 2018, 24, 143–157. [Google Scholar] [CrossRef] [PubMed]
  49. Zhao, Y. Spatiotemporal Characteristics of Winter Wheat Phenology and Its Response to Climate Change in Shandong Province. Master’s Thesis, Ludong University, Yantai, China, 2021. [Google Scholar]
Figure 1. Overview of the study area. (1) Shigatse Topography; (2) Shannan Topography. The digital elevation model (DEM) elevation ranges from 1717 m to 8336 m for Shigatse and from 83 m to 7197 m for Shannan. Both maps include scale bars (200 km for Shigatse, 150 km for Shannan), north arrows, and regional context insets (Shigatse: 30°N–35°N, 85°E–100°E; Shannan: 20°N–35°N, 80°E–100°E). The 30 m DEM data used in this figure were derived from the NASA SRTM Digital Elevation 30m dataset (SRTMGL1_003) (https://developers.google.com/earth-engine/datasets/catalog/USGS_SRTMGL1_003, accessed on 15 August 2025).
Figure 2. Sampling point distribution in Shigatse, including both training and testing sample points for crop (arable land) classification. The map shows the spatial arrangement of training samples (used for model calibration) and testing samples (used for accuracy validation) within the study area.
Figure 3. Sampling Point Distribution in Shannan City. The table presents the coordinates (X: longitude, Y: latitude) of the sampling points.
Figure 4. Distribution of UAV Ground Plot and Point Samples in Shigatse. The map displays sampling points by county, with an arrow highlighting three example crops: naked barley, wheat, and rapeseed.
Figure 5. Workflow diagram of the study. The flowchart illustrates data preparation, multi-dimensional feature extraction (vegetation indices, phenology, texture, topography, climate, soil), seven feature combination schemes (Schemes 1–7), classification using GBT, SVM, and RF algorithms, and final crop mapping, with the ultimate goal of building an intelligent recognition pipeline for plateau crops.
Figure 6. Kappa Coefficient Heatmap for Different Feature-Classifier Combinations.
Figure 7. Classification Accuracy Heatmap for Different Feature Combinations and Machine-learning Classifiers.
Figure 8. Feature Importance.
Figure 9. Out-of-Bag Accuracy.
Figure 10. Classification results for different feature combination schemes (Random Forest, Gradient Boosting, and Support Vector Machine with Scheme 6 or Scheme 7). Each panel (a–f) represents one model-scheme combination: (a) RF + Scheme 6, (b) RF + Scheme 7, (c) GBT + Scheme 6, (d) GBT + Scheme 7, (e) SVM + Scheme 6, (f) SVM + Scheme 7. The two views on the right show details of the red-boxed areas in the main map.
Table 1. The remote sensing data used in this study. Descriptions include dataset name, source/sensor, temporal and spatial resolutions, and main explanatory notes (e.g., mask types, derived variables). For terrain data, the 30 m NASA SRTM DEM was used; for climate, ERA5-Land 0.1° monthly data; for soil properties, OpenLandMap 250 m products.
| Data Name | Data Source/Sensor | Temporal Resolution | Spatial Resolution | Main Explanation |
|---|---|---|---|---|
| Global Cropland 2019 | University of Maryland, Potapov team | | 30 m | Pixel value 1 = cropland, 0 = non-cropland |
| Boundary of the study area | National Center for Basic Geographic Information, https://www.ngcc.cn/xwzx/ywcg/202404/t20240426_2410.html (accessed on 15 August 2025) | - | Vector | Study area boundary |
| Sentinel-2 MSI (A/B) | ESA Copernicus Sentinel-2A/B; sensor: Sentinel-2 MSI | 5 days (equator), 2–3 days (mid-latitudes) | 10 m (visible, red edge, near-infrared), 20 m (shortwave infrared) | Primary imagery (albedo, indices, phenology, harmonics) |
| Cloud/cirrus/cloud shadow/snow/saturated pixel mask | ESA COPERNICUS S2_SR_HARMONIZED collection; sensor: Sentinel-2 MSI | Synchronized with the imagery | 60 m (QA60), 10 m (SCL) | QA60 + SCL scene classification |
| Terrain (elevation/slope/aspect) | NASA/USGS LP DAAC, https://doi.org/10.5066/F7PR7TFT (accessed on 28 August 2025); NASA SRTM C-band InSAR (SIR-C radar) | - | 30 m | |
| Climate (annual average temperature, annual precipitation) | ERA5-Land reanalysis (Copernicus), ECMWF/ERA5_LAND/MONTHLY_AGGR | Monthly | 0.1° (≈9 km) | |
| Soil texture classification | OpenLandMap | - | 250 m | |
| Soil clay content | OpenLandMap | - | 250 m | |
Table 2. Sample Point Data Statistics in Shigatse. The table summarizes the number and proportion of ground-truth sample points for each crop type (total n = 359).
| Crop Type | Number | Proportion/% |
|---|---|---|
| Highland Barley | 219 | 61.00 |
| Wheat | 79 | 22.01 |
| Rapeseed | 44 | 12.26 |
| Oat | 13 | 3.62 |
| Buckwheat | 2 | 0.56 |
| Corn | 1 | 0.28 |
| Barley | 1 | 0.28 |
Table 3. Sample Point Data Statistics in Shannan City. The table summarizes the number and proportion of ground-truth sample points for each crop type (total n = 263).
| Crop Type | Number | Proportion/% |
|---|---|---|
| Highland Barley | 55 | 20.91 |
| Wheat | 147 | 55.89 |
| Rapeseed | 50 | 19.01 |
| Corn | 11 | 4.18 |
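
The proportions in Tables 2 and 3 follow directly from the sample counts. As a minimal sketch (the variable and function names are illustrative, not from the study), the percentages can be recomputed as follows, which also makes the class imbalance in both sample sets easy to inspect:

```python
# Sample counts transcribed from Table 2 (Shigatse) and Table 3 (Shannan).
shigatse = {
    "Highland Barley": 219, "Wheat": 79, "Rapeseed": 44,
    "Oat": 13, "Buckwheat": 2, "Corn": 1, "Barley": 1,
}
shannan = {"Highland Barley": 55, "Wheat": 147, "Rapeseed": 50, "Corn": 11}


def proportions(counts):
    """Return {crop: percentage of all samples}, rounded to two decimals."""
    total = sum(counts.values())
    return {crop: round(100.0 * n / total, 2) for crop, n in counts.items()}


shigatse_pct = proportions(shigatse)
shannan_pct = proportions(shannan)

print(sum(shigatse.values()), shigatse_pct)
print(sum(shannan.values()), shannan_pct)
```

Rare classes such as buckwheat, corn, and barley in Shigatse contribute only one or two points each, which is worth keeping in mind when reading per-class results.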
Table 4. Summary of multidimensional features and their relevant indices. The table lists vegetation indices (NDVI, EVI, LSWI, NDSVI, NDTI, GCVI) and their derived statistics (monthly mean, std, max, min, harmonic terms, smoothed median), along with texture, topography, climate, soil, and phenology features, including descriptions and data sources.

| Features | Indices | Description | Source/Method |
|---|---|---|---|
| Vegetation | $\mathrm{NDVI} = \dfrac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$ (1) | Basic vegetation detection, distinguishing vegetation from non-vegetation | Sentinel-2 B8 and B4 bands |
| | $\mathrm{EVI} = 2.5 \cdot \dfrac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + 6 \cdot \mathrm{Red} - 7.5 \cdot \mathrm{Blue} + 1}$ (2) | Enhances the ability to distinguish dense crops and supplements NDVI | Calculated from B8, B4, and B2 |
| | $\mathrm{LSWI} = \dfrac{\mathrm{NIR} - \mathrm{SWIR1}}{\mathrm{NIR} + \mathrm{SWIR1}}$ (3) | Identifies aquatic crops such as rice and monitors drought stress | Normalized difference of B8 and B11 |
| | $\mathrm{NDSVI} = \dfrac{\mathrm{SWIR1} - \mathrm{Red}}{\mathrm{SWIR1} + \mathrm{Red}}$ (4) | Identifies crop maturity stages, fallow land, and soil background | Normalized difference of B11 and B4 |
| | $\mathrm{NDTI} = \dfrac{\mathrm{SWIR1} - \mathrm{SWIR2}}{\mathrm{SWIR1} + \mathrm{SWIR2}}$ (5) | Differentiates field management practices | Normalized difference of B11 and B12 |
| | $\mathrm{GCVI} = \dfrac{\mathrm{NIR}}{\mathrm{Green}} - 1$ (6) | Sensitively captures the early growth stage and the green vitality of the canopy | Ratio of B8 to B3 |
| | NDVI_mean | NDVI monthly average | Monthly average of NDVI imagery |
| | EVI_mean | EVI monthly average | Monthly average of EVI imagery |
| | LSWI_mean | LSWI monthly average | Monthly average of LSWI imagery |
| | NDVI_std | NDVI monthly standard deviation | Standard deviation of the monthly NDVI image collection |
| | EVI_std | EVI monthly standard deviation | Standard deviation of the monthly EVI image collection |
| | LSWI_std | LSWI monthly standard deviation | Standard deviation of the monthly LSWI image collection |
| | NDVI_max | NDVI monthly maximum | Maximum value in the monthly NDVI image collection |
| | EVI_max | EVI monthly maximum | Maximum value in the monthly EVI image collection |
| | LSWI_max | LSWI monthly maximum | Maximum value in the monthly LSWI image collection |
| | NDVI_min | NDVI monthly minimum | Minimum value in the monthly NDVI image collection |
| | EVI_min | EVI monthly minimum | Minimum value in the monthly EVI image collection |
| | LSWI_min | LSWI monthly minimum | Minimum value in the monthly LSWI image collection |
| | NDVI_sin3 | sin(3πt) | Time-component coefficients obtained from harmonic regression |
| | NDVI_cos3 | cos(3πt) | |
| | NDVI_sin6 | sin(6πt) | |
| | NDVI_cos6 | cos(6πt) | |
| | EVI_sin3 | sin(3πt) | |
| | EVI_cos3 | cos(3πt) | |
| | EVI_sin6 | sin(6πt) | |
| | EVI_cos6 | cos(6πt) | |
| | LSWI_sin3 | sin(3πt) | |
| | LSWI_cos3 | cos(3πt) | |
| | LSWI_sin6 | sin(6πt) | |
| | LSWI_cos6 | cos(6πt) | |
| | NDVI_smooth | Annual NDVI median (smoothed) | NDVI annual median |
| | EVI_smooth | Annual EVI median (smoothed) | EVI annual median |
| | LSWI_smooth | Annual LSWI median (smoothed) | LSWI annual median |
| Texture | GLCM | Effectively identifies patch variability and boundary structures among crop types | ERA5-Land Reanalysis (Copernicus) |
| | const_NDVI | One of the key temporal structural characteristics for depicting variations in crop growth patterns | ERA5-Land Reanalysis (Copernicus) |
| Topography | Slope | Enhances the spatial adaptability of models | NASA/USGS LP DAAC |
| | Elevation | Enhances the spatial adaptability of models | NASA/USGS LP DAAC |
| Climate | Annual precipitation (P_sum, mm) | Aids in distinguishing crop varieties with significantly different water requirements | ERA5-Land Reanalysis (Copernicus) |
| | Annual average temperature (T2m_mean, °C) | Assists in analyzing crop suitability zones and seasonal development variations | ERA5-Land Reanalysis (Copernicus) |
| Soil | USDA texture classification of topsoil | Provides supplementary information for crop classification | OpenLandMap |
| | 0–10 cm percentage of clay content | Provides supplementary information for crop classification | OpenLandMap |
| Phenology | NDVI_4_5 | Average NDVI values for April–May | NDVI averaged over the April–May time window |
| | EVI_4_5 | Average EVI values for April–May | EVI averaged over the April–May time window |
| | LSWI_4_5 | Average LSWI values for April–May | LSWI averaged over the April–May time window |
| | NDVI_6_7 | Average NDVI values for June–July | NDVI averaged over the June–July time window |
| | EVI_6_7 | Average EVI values for June–July | EVI averaged over the June–July time window |
| | LSWI_6_7 | Average LSWI values for June–July | LSWI averaged over the June–July time window |
| | NDVI_8_9 | Average NDVI values for August–September | NDVI averaged over the August–September time window |
| | EVI_8_9 | Average EVI values for August–September | EVI averaged over the August–September time window |
| | LSWI_8_9 | Average LSWI values for August–September | LSWI averaged over the August–September time window |
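
The six index formulas in Table 4, Equations (1)–(6), can be illustrated with a minimal per-pixel sketch in plain Python. It assumes reflectance already scaled to the 0–1 range and the band mapping given in the table (B2 = Blue, B3 = Green, B4 = Red, B8 = NIR, B11 = SWIR1, B12 = SWIR2). The function names and the sample reflectance values are hypothetical and are not taken from the study's own processing code.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    denom = a + b
    return None if denom == 0 else (a - b) / denom


def vegetation_indices(blue, green, red, nir, swir1, swir2):
    """Compute the six indices of Table 4 for one pixel (reflectance in 0-1)."""
    evi_denom = nir + 6.0 * red - 7.5 * blue + 1.0
    return {
        "NDVI": normalized_difference(nir, red),                           # Eq. (1)
        "EVI": None if evi_denom == 0 else 2.5 * (nir - red) / evi_denom,  # Eq. (2)
        "LSWI": normalized_difference(nir, swir1),                         # Eq. (3)
        "NDSVI": normalized_difference(swir1, red),                        # Eq. (4)
        "NDTI": normalized_difference(swir1, swir2),                       # Eq. (5)
        "GCVI": None if green == 0 else nir / green - 1.0,                 # Eq. (6)
    }


# Hypothetical reflectance values for a single vegetated pixel.
idx = vegetation_indices(
    blue=0.04, green=0.08, red=0.06, nir=0.40, swir1=0.20, swir2=0.10
)
for name, value in idx.items():
    print(f"{name}: {value:.4f}")
```

The monthly, phenological-window, and annual statistics in the table are then aggregates of these per-image index values over the corresponding time ranges.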
Table 5. Different feature combination schemes (Schemes 1–7) implemented in this study. The schemes progressively incorporate additional feature groups (phenology, terrain, climate, texture, soil) to evaluate their individual and combined effects on crop classification performance.
| Method | Feature Combination |
|---|---|
| Scheme 1 | Vegetation Index Characteristics |
| Scheme 2 | Vegetation Index + Phenological Characteristics |
| Scheme 3 | Vegetation Index + Terrain Features |
| Scheme 4 | Vegetation Index + Climate Characteristics |
| Scheme 5 | Vegetation Index + Texture Features |
| Scheme 6 | Vegetation Index + Phenology + Topography + Climate + Texture Characteristics |
| Scheme 7 | Vegetation Index + Phenology + Topography + Climate + Texture + Soil Characteristics |
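
One way to express the seven schemes of Table 5 as configuration is a mapping from scheme number to feature groups, from which the band list for a classifier can be assembled. This is only a sketch: the group keys, the helper `bands_for_scheme`, and the shortened band lists (including the name `clay_0_10cm`) are hypothetical and chosen for illustration.

```python
# Feature groups per scheme, following Table 5.
SCHEMES = {
    1: ("vegetation_index",),
    2: ("vegetation_index", "phenology"),
    3: ("vegetation_index", "topography"),
    4: ("vegetation_index", "climate"),
    5: ("vegetation_index", "texture"),
    6: ("vegetation_index", "phenology", "topography", "climate", "texture"),
    7: ("vegetation_index", "phenology", "topography", "climate", "texture", "soil"),
}


def bands_for_scheme(scheme_id, bands_by_group):
    """Flatten the band names of every feature group used by a scheme."""
    return [band for group in SCHEMES[scheme_id] for band in bands_by_group[group]]


# Hypothetical, heavily shortened band lists (illustration only).
example_bands = {
    "vegetation_index": ["NDVI_mean", "EVI_mean"],
    "phenology": ["NDVI_4_5"],
    "topography": ["Elevation", "Slope"],
    "climate": ["P_sum", "T2m_mean"],
    "texture": ["GLCM_NDVI_smooth_savg"],
    "soil": ["clay_0_10cm"],
}

print(bands_for_scheme(3, example_bands))
print(sorted(set(SCHEMES[7]) - set(SCHEMES[6])))
```

Keeping the schemes in one mapping makes it explicit that Scheme 7 differs from Scheme 6 only by the soil group.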
Table 6. Classification accuracy of the three machine-learning classifiers. For each classifier (RF, SVM, GBT), the table provides optimal overall accuracy, optimal Kappa, average overall accuracy, and average Kappa across feature schemes.
| Classification Algorithm | Optimal Overall Accuracy/% | Optimal Kappa Coefficient | Average Overall Accuracy/% | Average Kappa Coefficient |
|---|---|---|---|---|
| RF | 84.77 | 0.64 | 66.42 | 0.40 |
| SVM | 82.89 | 0.55 | 72.33 | 0.46 |
| GBT | 78.47 | 0.55 | 64.36 | 0.39 |
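
For readers less familiar with the two metrics reported in Table 6, the sketch below computes overall accuracy and Cohen's kappa from a confusion matrix using the standard definition kappa = (p_o − p_e) / (1 − p_e). The 3 × 3 matrix is hypothetical and does not come from the study; the function name is also illustrative.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are reference classes and columns are predicted classes.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n  # p_o
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)  # p_e
    if expected == 1.0:
        return observed, 0.0  # degenerate case: only one class present
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical confusion matrix for three classes.
confusion = [
    [60, 5, 5],
    [8, 20, 2],
    [4, 3, 13],
]
oa, kappa = overall_accuracy_and_kappa(confusion)
print(f"overall accuracy = {oa:.3f}, kappa = {kappa:.3f}")
```

Because kappa discounts agreement expected by chance, it can be noticeably lower than overall accuracy when one class dominates the samples.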
Table 7. Classification accuracy table for different feature combination schemes (Schemes 1–7) with RF, GBT, and SVM classifiers. The table presents overall accuracy (%) and kappa coefficient for each scheme-classifier pair.
| Method | Algorithm | Overall Accuracy/% | Kappa Coefficient |
|---|---|---|---|
| Scheme 1 | RF | 58.56 | 0.29 |
| | GBT | 60.36 | 0.35 |
| | SVM | 55.86 | 0.29 |
| Scheme 2 | RF | 62.16 | 0.35 |
| | GBT | 63.06 | 0.36 |
| | SVM | 49.55 | 0.21 |
| Scheme 3 | RF | 62.16 | 0.34 |
| | GBT | 59.46 | 0.29 |
| | SVM | 81.98 | 0.58 |
| Scheme 4 | RF | 63.96 | 0.36 |
| | GBT | 62.16 | 0.37 |
| | SVM | 72.07 | 0.53 |
| Scheme 5 | RF | 62.16 | 0.37 |
| | GBT | 57.66 | 0.37 |
| | SVM | 82.88 | 0.55 |
| Scheme 6 | RF | 71.17 | 0.48 |
| | GBT | 69.37 | 0.46 |
| | SVM | 81.08 | 0.49 |
| Scheme 7 | RF | 84.77 | 0.64 |
| | GBT | 78.47 | 0.55 |
| | SVM | 82.89 | 0.55 |
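
The per-classifier summary in Table 6 can be reproduced from the scheme-level values in Table 7. The sketch below transcribes those values and recomputes the best and average figures; the variable and function names are illustrative. It reads the "optimal" kappa as the kappa of the scheme with the highest overall accuracy, which is an assumption about how Table 6 was compiled.

```python
# (overall accuracy %, kappa) for Schemes 1-7, transcribed from Table 7.
TABLE7 = {
    "RF": [(58.56, 0.29), (62.16, 0.35), (62.16, 0.34), (63.96, 0.36),
           (62.16, 0.37), (71.17, 0.48), (84.77, 0.64)],
    "GBT": [(60.36, 0.35), (63.06, 0.36), (59.46, 0.29), (62.16, 0.37),
            (57.66, 0.37), (69.37, 0.46), (78.47, 0.55)],
    "SVM": [(55.86, 0.29), (49.55, 0.21), (81.98, 0.58), (72.07, 0.53),
            (82.88, 0.55), (81.08, 0.49), (82.89, 0.55)],
}


def summarize(results):
    """Best scheme (by overall accuracy) and averages across all schemes."""
    best_oa, kappa_at_best = max(results)  # tuples compare by accuracy first
    mean_oa = sum(oa for oa, _ in results) / len(results)
    mean_kappa = sum(k for _, k in results) / len(results)
    return {
        "best_oa": best_oa,
        "kappa_at_best_oa": kappa_at_best,
        "mean_oa": round(mean_oa, 2),
        "mean_kappa": round(mean_kappa, 2),
    }


summary = {name: summarize(rows) for name, rows in TABLE7.items()}
for name, stats in summary.items():
    print(name, stats)
```

Under that reading, the recomputed averages agree with Table 6 for all three classifiers. SVM has the highest average accuracy across schemes, while RF has the highest single-scheme accuracy (Scheme 7).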
Table 8. High-Discriminative Features and their OOB accuracy.
| Characteristic Band | Out-of-Bag Accuracy/% |
|---|---|
| EVI_4_5 | 65.28 |
| GLCM_NDVI_smooth_savg | 64.81 |
| NDVI_8_9 | 64.81 |
| GLCM_EVI_smooth_asm | 64.35 |
| LSWI_4_5 | 64.35 |
| sin3_NDVI | 64.35 |
| cos6_LSWI | 63.89 |
| GLCM_LSWI_smooth_dvar | 63.89 |
| GLCM_NDVI_smooth_imcorr2 | 63.89 |
| GLCM_NDVI_smooth_svar | 63.89 |
| NDVI_4_5 | 63.89 |
| GLCM_EVI_smooth_idm | 63.43 |
| GLCM_EVI_smooth_shade | 63.43 |
| GLCM_NDVI_smooth_idm | 63.43 |
| GLCM_NDVI_smooth_prom | 63.43 |
| GLCM_NDVI_smooth_sent | 63.43 |
| sin3_LSWI | 63.43 |
Table 9. Classification area (km²) of various crops in 2021 derived from RF, GBT, and SVM classifiers using Scheme 7. The table presents planting areas for highland barley, wheat, rapeseed, and others.
| Crop | RF (Classification Area, km²) | GBT (Classification Area, km²) | SVM (Classification Area, km²) |
|---|---|---|---|
| Highland Barley | 581.52 | 525.41 | 1267.07 |
| Wheat | 295.39 | 333.16 | 0.03 |
| Rapeseed | 386.81 | 404.53 | 0.03 |
| Others | 13,790.40 | 13,791.02 | 34.37 |
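
Class areas such as those in Table 9 are typically obtained by multiplying a class's pixel count by the pixel area. The sketch below shows that conversion under the assumption of 10 m pixels (the Sentinel-2 resolution listed in Table 1), then uses the Table 9 values to compute column totals and each crop's share of the three mapped crops. The helper names are hypothetical, and the study's actual area calculation may differ.

```python
def pixels_to_km2(pixel_count, pixel_size_m=10.0):
    """Convert a pixel count to area in km^2 (assumes square pixels)."""
    return pixel_count * pixel_size_m ** 2 / 1e6


# Areas in km^2, transcribed from Table 9.
AREA_KM2 = {
    "RF": {"Highland Barley": 581.52, "Wheat": 295.39, "Rapeseed": 386.81, "Others": 13790.40},
    "GBT": {"Highland Barley": 525.41, "Wheat": 333.16, "Rapeseed": 404.53, "Others": 13791.02},
    "SVM": {"Highland Barley": 1267.07, "Wheat": 0.03, "Rapeseed": 0.03, "Others": 34.37},
}


def crop_shares(areas, exclude="Others"):
    """Percentage share of each crop among all classes except `exclude`."""
    crops = {name: a for name, a in areas.items() if name != exclude}
    total = sum(crops.values())
    return {name: round(100.0 * a / total, 1) for name, a in crops.items()}


totals = {clf: round(sum(a.values()), 2) for clf, a in AREA_KM2.items()}
rf_shares = crop_shares(AREA_KM2["RF"])

print(pixels_to_km2(10_000))  # 10,000 pixels of 10 m x 10 m
print(totals)
print(rf_shares)
```

With the values as printed in Table 9, the RF and GBT columns sum to the same total, while the SVM column sums to a much smaller one, so the SVM areas are not directly comparable with the other two columns.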
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, A.; Shi, H.; Liu, Y.; Wen, Z.; Huete, A.R.; Zhang, H.; Zhao, G.; Wang, Y.; Yang, G.; Yang, X. A Multi-Dimensional Feature-Driven Method for Remote Sensing-Based Identification of Cereal and Oil Crops in the Tibetan Plateau. Remote Sens. 2026, 18, 1391. https://doi.org/10.3390/rs18091391
