Article

Training Sample Migration for Temporal Cropland Mapping in Central Asia

Digital FAO and Agro-Informatics Division, Food and Agriculture Organization of the United Nations, 00153 Rome, Italy
* Author to whom correspondence should be addressed.
Land 2026, 15(1), 156; https://doi.org/10.3390/land15010156
Submission received: 27 November 2025 / Revised: 24 December 2025 / Accepted: 7 January 2026 / Published: 13 January 2026

Abstract

Accurate cropland mapping in data-scarce regions remains challenging due to limited field data, strong interannual climatic variability, and heterogeneous cropping systems. This study proposes an NDVI-based training sample migration framework that transfers labeled samples from reference years in irrigated and rainfed agricultural systems to a target year using time-series similarity analysis. Ten similarity metrics representing geometric, temporal alignment, and correlation-based families were systematically evaluated to identify optimal thresholds and robust hybrid combinations for stable cropland transfer. The migrated samples were used to train a Random Forest classifier to generate binary cropland maps for 2021. Independent validation yielded overall accuracies of 86% in Kazakhstan and 95% in Uzbekistan. Comparisons with global cropland products (WorldCereal 2021 and WorldCover 2021) demonstrated improved spatial coherence and reduced misclassification, particularly in semi-arid environments. The proposed framework extends the temporal utility of existing labeled datasets and supports scalable cropland mapping without the need for repeated annual field surveys.

1. Introduction

Timely and accurate cropland information is essential for monitoring agricultural activities, estimating yields, managing natural resources, and supporting food security [1,2]. The Food and Agriculture Organization (FAO) defines land under temporary crops, referred to in this study as temporal cropland, as land used for crops with a less-than-one-year growing cycle that must be newly sown or planted after each harvest. Accurate mapping of such land is therefore vital for agricultural planning, land-use change detection, and climate impact assessment [3].
Over the past decade, several global cropland datasets have been developed to address this need, including ESA WorldCover [4], WorldCereal [5], GlobeLand30 [6], FROM-GLC [7], and GLAD-LCLUC [8]. These products, typically available at 10–30 m spatial resolution, are derived from Earth Observation (EO) missions such as MODIS, Landsat, and Sentinel and are produced using supervised machine-learning algorithms trained on large collections of labeled samples [9,10,11]. While they provide valuable insights at continental and global scales, these datasets often suffer from limited temporal consistency and reduced local accuracy, particularly in regions characterized by fragmented or mixed agricultural systems [12,13,14,15].
Despite major advances in EO data availability and cloud computing, achieving high-accuracy cropland mapping remains challenging. Variability in agricultural practices, climate conditions, and cropping systems, combined with limitations in spatial resolution and temporal coverage, frequently leads to misclassification [16]. Reliable cropland mapping therefore depends not only on robust classifiers but also on the availability of representative, high-quality training samples that are temporally consistent [17,18].
The quality and representativeness of labeled samples remain key factors influencing classification accuracy. Machine-learning algorithms such as Random Forest (RF), Support Vector Machines (SVM), and Decision Trees require sufficient and well-distributed training data to achieve generalizable results [19,20]. As emphasized by Millard and Richardson (2015) [21], model reliability should always be assessed using independent validation data, since non-independent testing tends to overestimate performance. Spatial bias, inconsistent sample distribution, and temporal mismatch further constrain model transferability, especially in heterogeneous or data-scarce agricultural regions [22,23,24,25].
Collecting reliable ground-truth data on a regular basis remains logistically difficult and resource-intensive, particularly across large or remote regions. Seasonal constraints, restricted access, and rapid crop phenological changes limit the feasibility of repeated field campaigns [16]. These challenges are especially pronounced in Central Asia, where arid and semi-arid climates, recurrent droughts, and strong dependence on irrigation result in high interannual variability of vegetation dynamics [26,27,28,29,30]. In northern Kazakhstan, extensive rainfed wheat systems are frequently affected by drought stress, which suppresses vegetation vigor and complicates both field-based data collection and remote sensing interpretation. As a result, the availability of reliable annual reference data in such environments is particularly limited.
One promising approach to address these constraints is training sample migration, which transfers labeled samples from a reference year to a target year based on the temporal similarity of satellite-derived indicators such as the normalized difference vegetation index (NDVI). High-quality reference samples can thus be reused to expand training datasets both spatially and temporally, complementing existing land-cover products [21,27]. Several studies have demonstrated the feasibility of this concept using similarity measures such as Euclidean Distance (ED), Spectral Angle Distance (SAD), and Dynamic Time Warping (DTW) to identify unchanged pixels across years [24,31,32].
However, relatively few studies explicitly focus on sample migration for cropland mapping under contrasting climatic and management conditions. Most existing approaches rely on a limited set of similarity metrics and provide limited guidance on metric selection, threshold definition, and robustness in regions characterized by strong interannual variability. This lack of systematic evaluation reduces the reliability of sample transfer in agricultural systems affected by drought, irrigation practices, and management-driven fluctuations.
To address this gap, this study proposes a tailored NDVI-based training sample migration framework that systematically evaluates multiple time-series similarity metrics and their combinations to identify temporally stable cropland samples. The migrated samples are subsequently used to train a Random Forest classifier for annual cropland mapping. The framework is tested in two pilot regions representing contrasting agricultural systems in Central Asia: rainfed wheat production in drought-prone areas of Kazakhstan and irrigated cotton-dominated systems in Uzbekistan. By reducing reliance on repeated field surveys and leveraging temporal consistency in NDVI trajectories, the proposed approach offers a scalable and operational solution for cropland mapping in data-limited and climatically challenging environments.

2. Materials and Methods

This section provides an overview of the data sources and methodological framework employed in this study. First, the study area and input datasets are introduced, including satellite-derived NDVI time-series composites and publicly available reference data from Kazakhstan and Uzbekistan. Second, the procedures for generating NDVI trajectories, preprocessing, normalization, and temporal smoothing are described. Third, the methodological workflow for migrating training samples is presented, covering the computation of ten similarity metrics, threshold optimization, and metric-combination strategies. Finally, the classification and validation procedures are detailed, including Random Forest model training, production of the 2021 cropland map, and accuracy assessment using independent validation samples and global land cover products.

2.1. Study Area and Data

Central Asia is a strategically important yet underexplored region for agricultural research, particularly in the context of remote-sensing-based cropland classification. According to the Food and Agriculture Organization (FAO) [33], Kazakhstan is one of the largest grain producers in Eurasia, with total cereal production reaching approximately 18.6 million tons in 2024 and wheat exports forecast at around 10 million tons for the 2024/25 season. Uzbekistan has a long-established cotton sector and remains among the world’s top ten cotton producers, with an annual output of approximately one million tons [34].
The agro-environmental context of Central Asia is defined by pronounced climatic aridity, strong continentality, and high interannual variability, which together impose substantial constraints on agricultural production and monitoring. Based on the FAO–IIASA Global Agro-Ecological Zones (GAEZ) framework [35], northern Kazakhstan, including the Akmola Region, is classified within a temperate cool, semi-arid agro-ecological zone dominated by rainfed cereal-based agriculture. This zone is characterized by limited water availability, high precipitation variability, and frequent drought conditions, resulting in strong year-to-year fluctuations in vegetation development. In contrast, southern Uzbekistan, including the Kashkadarya Region, falls within arid to semi-arid agro-ecological zones with extensive irrigated land, where agricultural production is largely sustained through controlled irrigation. These conditions support cotton-dominated and mixed cropping systems with comparatively stable seasonal vegetation dynamics despite arid climatic conditions.
Despite the agricultural importance of the region, open-access, high-resolution, and ground-validated datasets for cropland monitoring remain limited. Existing studies have largely focused on general land-cover mapping rather than crop-specific temporal dynamics and interannual stability [36,37]. Furthermore, land-use legacies inherited from the Soviet period continue to influence field structure and management practices, while water scarcity and increasing climatic variability impose additional constraints on agricultural productivity and data collection [29,30]. These environmental and institutional challenges make systematic field data acquisition particularly difficult in arid and semi-arid areas, especially in rainfed systems prone to recurrent drought.
To ensure representative coverage and reliable evaluation under these constraints, two pilot regions were selected to reflect the dominant agro-climatic regimes of Central Asia. The Akmola Region in northern Kazakhstan represents a semi-arid continental environment with rainfed wheat-dominated agriculture and high sensitivity to drought. The Kashkadarya Region in southern Uzbekistan represents an arid irrigated agricultural system dominated by cotton and mixed cropping. Together, these regions provide contrasting yet complementary conditions for evaluating the robustness, transferability, and practical applicability of NDVI-based training sample migration in data-scarce and climatically challenging environments.

2.1.1. Satellite Imagery and Preprocessing

The satellite imagery used in this study comprised surface reflectance products from Landsat-8 (LANDSAT/LC08/C02/T1_L2) and Sentinel-2 (COPERNICUS/S2_SR). A biweekly composited NDVI time series was employed for two main reasons: (i) single-date observations are often insufficient to capture crop-specific phenological differences in heterogeneous landscapes, and (ii) time-series composites provide the temporal continuity required to characterize seasonal vegetation dynamics and improve the discrimination of cropland from other land cover types [38,39].
NDVI series were produced for 2016 (Uzbekistan reference) and 2022 (Kazakhstan reference) as well as for the 2021 target year. Median composites were created for each two-week interval following cloud masking and temporal interpolation [40,41], yielding 24 NDVI layers per year covering the entire vegetation season.
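The compositing step can be sketched for a single pixel as follows. This is a minimal illustration, not the authors' implementation: it assumes NDVI observations indexed by day of year, with cloud-masked observations already set to NaN, divides the year into 24 equal-width periods (about 15 days each), takes the median of clear observations per period, and then fills periods without any clear observation by linear interpolation. The function name, the equal-width period definition, and the order of compositing and interpolation are assumptions.

```python
import numpy as np

def biweekly_median_composite(doy, ndvi, n_bins=24, year_days=365):
    """Median NDVI per composite period; empty periods are filled by
    linear interpolation. Cloud-masked observations are expected as NaN."""
    doy = np.asarray(doy, dtype=float)
    ndvi = np.asarray(ndvi, dtype=float)
    # Equal-width periods spanning the year (assumed period definition).
    edges = np.linspace(1, year_days + 1, n_bins + 1)
    composite = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = (doy >= edges[b]) & (doy < edges[b + 1]) & ~np.isnan(ndvi)
        if in_bin.any():
            composite[b] = np.median(ndvi[in_bin])
    # Fill periods without clear observations from neighbouring composites;
    # np.interp holds the first/last valid value constant beyond the ends.
    valid = ~np.isnan(composite)
    if valid.any() and not valid.all():
        idx = np.arange(n_bins)
        composite[~valid] = np.interp(idx[~valid], idx[valid], composite[valid])
    return composite
```

Applied to each pixel of the reference and target years, this yields the 24-layer NDVI series used in the rest of the workflow.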
The normalized difference vegetation index (NDVI) was calculated according to Equation (1):
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{1}$$
where NIR and Red denote near-infrared and red surface reflectance, respectively, taken from the corresponding bands of each sensor; the index formulation follows Rouse et al. (1974) [42].
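Equation (1) can be illustrated for the two collections named above with the short sketch below, which converts digital numbers to surface reflectance and computes NDVI with a guard against zero denominators. The scale factors reflect the published product conventions as we understand them (Landsat Collection 2 Level-2: 0.0000275 × DN − 0.2, with SR_B5 as NIR and SR_B4 as Red; Sentinel-2 SR: DN / 10000, with B8 as NIR and B4 as Red). The radiometric offset introduced with newer Sentinel-2 processing baselines is ignored here, and all function names are hypothetical.

```python
import numpy as np

def landsat8_c2l2_reflectance(dn):
    """Landsat-8 Collection 2 Level-2 DN -> surface reflectance (assumed scaling)."""
    return np.asarray(dn, dtype=float) * 0.0000275 - 0.2

def sentinel2_sr_reflectance(dn):
    """Sentinel-2 SR DN -> surface reflectance (processing-baseline offset ignored)."""
    return np.asarray(dn, dtype=float) / 10000.0

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where the denominator is zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(np.broadcast(nir, red).shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out
```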
For cropland extent mapping, the 2021 NDVI composites were exported separately for each study region. This division simplified data handling and download while preserving full temporal coverage, with 24 layers per region.

2.1.2. Reference and Validation Samples

The collection of representative training reference samples formed a foundational component of this study. Given the scarcity of harmonized, high-quality ground-truth datasets across Central Asia, the initial focus was placed on assembling reliable reference data from publicly accessible sources.
For Kazakhstan, cropland reference data were obtained via the AgroOpen national satellite monitoring platform, operated by Joint Stock Company “Kazakhstan Gharysh Sapary” (National Space Centre) (https://gharysh.kz) (accessed on 5 March 2025) [43], which provides open-access land cover and crop type maps from 2022 to 2024. The year 2022 was selected as the reference year, representing the most complete and validated dataset. The platform, maintained by the National Space Centre and the Ministry of Agriculture, integrates satellite-based classifications with annual field surveys conducted across multiple administrative districts. This combination of remote sensing and in situ validation ensures high reliability of the dataset for regional-scale analysis.
For Uzbekistan, historical cropland reference data were obtained from the ESA WorldCereal project (https://esa-worldcereal.org/en/reference-data) (accessed on 5 March 2025) [44]. These data include spatially distributed, manually validated crop type labels collected between 2011 and 2018 through field campaigns, national inventories, and expert input. For the purposes of this study, the year 2016 was selected as the reference year, as it corresponds to a period with the highest data reliability and coverage for the country. The dataset spans multiple agro-ecological zones and offers rare temporal depth in a region where high-quality labels are limited. A full description is given by Remelgado et al. (2020) [45], and the dataset is openly available through the ESA WorldCereal portal.
To evaluate classification accuracy, validation samples were generated for the year 2021 using stratified random sampling to ensure representation across major agro-ecological zones [46,47]. Stratification was based on cropland and non-cropland classes, with samples distributed proportionally across the study area to capture spatial variability.
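Stratified random sampling with proportional allocation can be sketched as follows. This assumes a raster of strata (for example, cropland = 1 and non-cropland = 0) on which sample points are drawn. It is an illustration rather than the authors' sampling code: the function name and the rule of guaranteeing at least one sample per stratum are assumptions, and rounding can make the total differ slightly from the requested sample size.

```python
import numpy as np

def stratified_random_sample(class_map, n_total, seed=0):
    """Draw flat pixel indices per stratum, with allocation proportional to
    stratum area (at least one sample per stratum)."""
    rng = np.random.default_rng(seed)
    flat = np.asarray(class_map).ravel()
    strata, counts = np.unique(flat, return_counts=True)
    alloc = np.maximum(1, np.round(n_total * counts / counts.sum()).astype(int))
    samples = {}
    for s, n in zip(strata, alloc):
        pool = np.flatnonzero(flat == s)          # all pixels in this stratum
        n = min(int(n), pool.size)
        samples[int(s)] = rng.choice(pool, size=n, replace=False)
    return samples
```

Row and column positions can be recovered with `np.unravel_index(indices, class_map.shape)` before converting them to map coordinates.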
Non-cropland samples were included to support binary cropland versus non-cropland classification and to reduce commission errors in transitional landscapes. In semi-arid and irrigated environments, cropland is often spectrally similar to grassland, fallow land, or sparse natural vegetation. Including representative non-cropland samples therefore improves class separability and ensures a more reliable assessment of classification accuracy.
Validation followed a two-step procedure. First, each sample point was visually interpreted using multiple lines of evidence: (1) Sentinel-2 10 m imagery acquired at key crop growing stages, (2) NDVI time series profiles [39], and (3) high-resolution imagery from sources such as Google Earth and Bing Maps. Second, all points underwent an independent verification using the FAO-supported Collect Earth platform [48], which enables field-scale interpretation by overlaying very high-resolution satellite data (e.g., Google, Bing, Sentinel Hub) with interactive annotation tools. Additionally, long-term NDVI trajectories derived from Landsat-7/8 and Sentinel-2 were examined to confirm vegetation stability or land-cover change over time [49,50]. This temporal assessment reduced uncertainty and improved differentiation between stable cropland, temporary fields, and non-cropland areas.
Within this framework, stable reference samples were defined as points whose cropland label remained consistent between the reference year (2016 for Uzbekistan, 2022 for Kazakhstan) and the target year (2021). Label consistency was verified using independent validation based on visual interpretation of multi-temporal Sentinel-2 imagery, NDVI time-series profiles, and high-resolution base maps. These stable samples formed the basis for testing threshold-based metric combinations (see Section 3.4), while a separate set of independent validation samples was used exclusively for accuracy assessment.
The spatial distribution of reference and validation samples is illustrated in Figure 1 for Uzbekistan and Kazakhstan, while the total number of samples for both countries is summarized in Table 1.

2.2. Methodology

This study proposes a systematic approach to migrate labeled training samples from a reference year to a target year based on NDVI time-series similarity. The workflow (Figure 2) includes satellite image preprocessing, NDVI time-series generation, computation of similarity metrics, threshold-based selection of stable samples, cropland map classification using a Random Forest classifier, and accuracy assessment.
The use of NDVI time-series similarity as a basis for sample migration builds on extensive evidence that cropland systems with stable management practices exhibit recurrent and well-defined phenological trajectories over time. In semi-arid and irrigated agricultural environments, such as those dominating Central Asia, rainfed wheat and irrigated cotton systems tend to maintain consistent seasonal NDVI patterns despite interannual climatic variability. Under these conditions, temporal similarity of NDVI profiles has been shown to provide a reliable indicator of cropland persistence and a practical basis for reusing reference samples across years, particularly when conservative similarity thresholds and independent validation are applied [27].
Following this conceptual framework, both reference- and target-year datasets were processed in parallel to ensure temporal comparability of NDVI time series. Stable samples identified through similarity thresholding were subsequently used to train the classifier and generate the 2021 binary cropland map. The final output was evaluated using independent validation samples and further compared with global cropland products (ESA WorldCover 2021 and ESA WorldCereal 2021) to assess classification accuracy and spatial consistency [4,5].

2.2.1. NDVI Time Series

After preprocessing and exporting the satellite imagery, the analysis continued with the NDVI time-series data stored in CSV format. Each sample contained 24 biweekly NDVI values per year, covering the full vegetation season for the reference years 2016 (Uzbekistan) and 2022 (Kazakhstan), as well as for the target year 2021.
In the first step, raw NDVI curves were plotted to visualize phenological patterns and detect anomalies caused by cloud contamination or atmospheric effects. These unfiltered trajectories revealed seasonal vegetation dynamics but also showed short-term fluctuations and inconsistencies that could interfere with similarity analysis.
Normalization, the second preprocessing step, was particularly important for Kazakhstan. The 2022 growing season in northern Kazakhstan was characterized by widespread drought, which substantially reduced NDVI amplitudes and increased intra-seasonal variability. Independent assessments classified large parts of the country as experiencing severe to extreme agricultural drought, with below-average precipitation and vegetation stress during key crop growth stages [51,52,53]. Under such conditions, two fields that both remained under cultivation could display markedly different absolute NDVI values.
To reduce this bias and to emphasize phenological curve shape rather than absolute magnitude, all NDVI time series for Kazakhstan were normalized to the interval [0, 1] using min–max scaling, a standard preprocessing approach in NDVI-based phenological analysis [54], as defined in Equation (2):
$$\mathrm{NDVI}_{\mathrm{norm}} = \frac{\mathrm{NDVI}_i - \mathrm{NDVI}_{\min}}{\mathrm{NDVI}_{\max} - \mathrm{NDVI}_{\min}} \tag{2}$$
where $\mathrm{NDVI}_i$ is the raw value of composite $i$, and $\mathrm{NDVI}_{\min}$ and $\mathrm{NDVI}_{\max}$ are the minimum and maximum values within the same series. This rescaling makes the similarity metrics respond to phenological patterns rather than to drought-induced amplitude shifts.
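Equation (2) can be applied row-wise to a matrix of NDVI series (one row per sample, 24 columns). In this minimal sketch, which is not the authors' code, two fields with the same seasonal shape but different amplitudes end up with identical normalized curves. Mapping flat series (maximum equal to minimum) to zero is an assumption added to avoid division by zero.

```python
import numpy as np

def minmax_normalize(series, eps=1e-12):
    """Rescale each NDVI series (row) to [0, 1] with its own min and max.
    Flat series are mapped to 0 (assumed handling)."""
    x = np.atleast_2d(np.asarray(series, dtype=float))
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    span = hi - lo
    safe_span = np.where(span > eps, span, 1.0)   # avoid dividing by zero
    return np.where(span > eps, (x - lo) / safe_span, 0.0)

# A drought-suppressed field and a well-developed field with the same shape:
curves = minmax_normalize([[0.1, 0.2, 0.4, 0.2],
                           [0.2, 0.4, 0.8, 0.4]])
```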
In Uzbekistan, the 2016 reference year was characterized by extensive irrigation and stable vegetation development. NDVI profiles showed high and consistent seasonal peaks, minimizing the need for normalization. Consequently, the analysis proceeded directly to filtering.
Following normalization in Kazakhstan, or immediately after raw plotting in Uzbekistan, NDVI series were smoothed using the Savitzky–Golay filter, a well-established technique in agricultural monitoring that suppresses short-term noise while preserving phenological patterns [55]. This process removed high-frequency variability and produced temporally consistent trajectories suitable for subsequent metric-based similarity analysis.
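A Savitzky–Golay filter fits a low-order polynomial by least squares within a sliding window and replaces each value with the fitted value at the window centre. The NumPy sketch below implements that idea directly. SciPy provides a library version as `scipy.signal.savgol_filter`. The window length (5 composites) and polynomial order (2) are illustrative assumptions because the text does not report them, and padding the series ends by repeating the edge values is one of several possible boundary choices.

```python
import numpy as np

def savgol_coeffs(window=5, polyorder=2):
    """Smoothing weights: least-squares polynomial fit over `window` points,
    evaluated at the window centre."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, polyorder + 1, increasing=True)  # columns 1, t, t^2, ...
    return np.linalg.pinv(A)[0]  # row that returns the fitted value at t = 0

def savgol_smooth(series, window=5, polyorder=2):
    """Smooth a 1-D NDVI series; ends are padded by repeating edge values."""
    x = np.asarray(series, dtype=float)
    padded = np.pad(x, window // 2, mode="edge")
    c = savgol_coeffs(window, polyorder)
    return np.convolve(padded, c[::-1], mode="valid")
```

For a window of 5 and order 2, the weights reduce to the classical (−3, 12, 17, 12, −3)/35. Any quadratic segment away from the series ends passes through unchanged, which is why the filter preserves the shape of the seasonal peak.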

2.2.2. Distance Metrics

To compare NDVI trajectories between the reference and target years, ten similarity metrics were selected to capture different dimensions of temporal correspondence, including magnitude, timing, and shape similarity. The metrics were grouped into three categories: (i) distance-based metrics, (ii) temporal alignment and angle-based metrics, and (iii) correlation- and similarity-based metrics.
Each metric captures a distinct aspect of vegetation dynamics: distance metrics (Euclidean, Chebyshev, Weighted Minkowski, Cubic) quantify absolute differences and detect magnitude shifts [56,57,58]; temporal alignment and angle-based metrics (DTW, SAD) account for timing offsets between phenological phases and are effective in agricultural systems with asynchronous growing seasons [59,60]; correlation- and similarity-based metrics (Pearson, Spearman, Cosine, Kendall Tau) capture trend consistency and shape similarity, making them particularly suitable under variable climatic or sensor conditions [61,62].
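The distance-based, temporal alignment, and angle-based metrics can be written compactly for two NDVI series of equal length. These are textbook formulations used for illustration; the exact definitions in Table 2 may differ. In particular, the following are assumptions: the use of uniform weights in the weighted Minkowski distance unless weights are supplied, the interpretation of the "cubic" distance as Minkowski with p = 3, and the absolute-difference cost used in DTW without a warping window.

```python
import numpy as np

def _pair(a, b):
    return np.asarray(a, dtype=float), np.asarray(b, dtype=float)

def euclidean(a, b):
    a, b = _pair(a, b)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def chebyshev(a, b):
    a, b = _pair(a, b)
    return float(np.max(np.abs(a - b)))

def weighted_minkowski(a, b, p=3.0, w=None):
    """Minkowski distance of order p; uniform weights if w is None.
    With p = 3 this is taken here as the 'cubic' distance (assumption)."""
    a, b = _pair(a, b)
    w = np.ones_like(a) if w is None else np.asarray(w, dtype=float)
    return float(np.sum(w * np.abs(a - b) ** p) ** (1.0 / p))

def spectral_angle(a, b):
    """Angle in radians between the two series treated as vectors."""
    a, b = _pair(a, b)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def dtw(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    a, b = _pair(a, b)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

For a peak shifted by one composite, for example [0, 1, 2, 1, 0] versus [0, 0, 1, 2, 1], the Euclidean distance is 2 while DTW is only 1. This difference illustrates why alignment-based metrics tolerate timing offsets between seasons.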
A summary of all metrics, their mathematical formulations and theoretical advantages are provided in Table 2. All computations were performed on normalized and smoothed NDVI series to ensure comparability across years, sensors, and regions [54,55,63].

2.2.3. Threshold Optimization and Metric Combination

For each distance and similarity metric, a reproducible threshold selection procedure was applied to identify stable reference samples suitable for migration between years. Thresholds were evaluated using a grid-based search within the interval [0, 1] with a step of 0.05. For each candidate threshold, samples were classified as stable or changed depending on whether the metric value satisfied the threshold criterion (value ≤ threshold for distance-based metrics, value ≥ threshold for similarity-based metrics), and the resulting classification was compared against binary cropland labels available for both the reference and target years.
Threshold performance was assessed using an independent validation design in which points maintaining the same cropland label across years were considered stable, while points exhibiting a label change were considered unstable. For each metric, the optimal threshold was defined as the value that maximized validation accuracy while preserving the largest possible number of stable samples. This criterion ensured a balance between classification reliability and sample availability, avoiding overly restrictive thresholds that would substantially reduce training data.
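The threshold search described above can be sketched as follows. A threshold is chosen to maximize the accuracy of the stable/changed classification, and among equally accurate thresholds the one that retains the most samples is preferred. The text does not state how distance values are mapped onto the [0, 1] grid, so the sketch assumes that metric values already lie in that range. The function name and tie-breaking details are also assumptions.

```python
import numpy as np

def optimal_threshold(values, stable, lower_is_stable=True, step=0.05):
    """Grid-search a threshold in [0, 1] for one metric.

    values: per-sample metric values (assumed already scaled to [0, 1]).
    stable: True where the cropland label is identical in both years.
    lower_is_stable: True for distance metrics (value <= t -> stable),
        False for similarity metrics (value >= t -> stable).
    Returns (threshold, accuracy, n_retained).
    """
    values = np.asarray(values, dtype=float)
    stable = np.asarray(stable, dtype=bool)
    grid = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)
    best = (None, -1.0, -1)
    for t in grid:
        pred = values <= t if lower_is_stable else values >= t
        acc = float(np.mean(pred == stable))
        n_kept = int(pred.sum())
        # Higher accuracy wins; ties go to the threshold keeping more samples.
        if acc > best[1] or (acc == best[1] and n_kept > best[2]):
            best = (float(t), acc, n_kept)
    return best
```

The same function covers both metric families through `lower_is_stable`, which reflects the opposite threshold direction of distance and similarity metrics.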
Consistent patterns emerged across metric families. Distance-based, temporal alignment, and angle-based metrics (e.g., Chebyshev distance, SAD, DTW) achieved optimal performance at relatively low threshold values, indicating strong temporal similarity between stable cropland trajectories. In contrast, correlation- and similarity-based metrics (e.g., Pearson correlation, Cosine similarity) performed best at higher threshold values, reflecting their sensitivity to strong temporal consistency rather than absolute magnitude differences [71,72]. All threshold evaluations were conducted using per-sample NDVI metrics and corresponding binary cropland labels stored in CSV format.
To reduce redundancy among similarity measures, pairwise relationships between the ten metrics were quantified using the Pearson correlation coefficient (r). Metrics exhibiting strong linear association were considered redundant, whereas weakly correlated metrics were interpreted as capturing complementary aspects of temporal similarity [73]. In cases of high redundancy, such as between Pearson and Spearman correlation, only one representative metric was retained, selected based on higher validation accuracy and greater retention of stable samples.
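The redundancy filter can be sketched as a greedy selection over per-sample metric values. Metrics are visited from most to least accurate, and a metric is kept only if its absolute Pearson correlation with every already-kept metric is below a cut-off. The absolute value is used because a distance metric and a similarity metric can be strongly but negatively correlated. The cut-off of 0.8 mirrors the "strong correlation" level reported in the Results but is an assumption here, and this sketch ranks metrics by accuracy only rather than also considering sample retention.

```python
import numpy as np

def prune_redundant(metric_values, accuracy, r_max=0.8):
    """Greedy redundancy filter.

    metric_values: dict name -> 1-D array (one value per sample).
    accuracy: dict name -> validation accuracy at the metric's optimal threshold.
    Returns the names of retained, mutually weakly correlated metrics.
    """
    names = sorted(metric_values, key=lambda k: accuracy[k], reverse=True)
    kept = []
    for name in names:
        x = np.asarray(metric_values[name], dtype=float)
        if all(abs(np.corrcoef(x, metric_values[k])[0, 1]) < r_max for k in kept):
            kept.append(name)
    return kept
```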
Building on this analysis, a hybrid metric combination strategy was developed to integrate complementary similarity properties. Metrics were combined into pairs or triplets with low inter-correlation and distinct sensitivity characteristics, for example, combining a geometric distance metric (Euclidean), a temporal alignment metric (DTW), and a correlation-based metric (Spearman). Each combination was evaluated within the same threshold-based framework to identify stable reference samples.
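One way to evaluate a pair or triplet of metrics within the same threshold framework is to flag a sample as stable only if it passes every metric in the combination. The text does not specify the fusion rule, so this AND rule is an assumption; a majority vote would be an alternative. The thresholds shown below are hypothetical placeholders, not values from this study.

```python
import numpy as np

def hybrid_stable_mask(metric_values, rules):
    """Stable only if every metric in the combination passes its threshold.

    metric_values: dict name -> 1-D array of per-sample values.
    rules: dict name -> (threshold, lower_is_stable).
    """
    mask = None
    for name, (t, lower_is_stable) in rules.items():
        v = np.asarray(metric_values[name], dtype=float)
        ok = v <= t if lower_is_stable else v >= t
        mask = ok if mask is None else mask & ok
    return mask

# Hypothetical Euclidean + DTW + Spearman combination with placeholder thresholds.
values = {"euclidean": [0.1, 0.5, 0.2], "dtw": [0.2, 0.1, 0.6], "spearman": [0.9, 0.95, 0.8]}
rules = {"euclidean": (0.3, True), "dtw": (0.3, True), "spearman": (0.85, False)}
mask = hybrid_stable_mask(values, rules)
```

Only the first sample satisfies all three criteria, so the combination behaves more conservatively than any of its members alone.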
The performance of both individual metrics and hybrid combinations was assessed across the two study regions using Overall Accuracy (OA). Hybrid configurations consistently outperformed single-metric approaches, indicating improved robustness to phenological variability and interannual differences. These results suggest that multi-metric similarity frameworks provide a more stable and transferable basis for NDVI time-series-based sample migration, consistent with recent findings on time-warping and multi-metric approaches.

2.2.4. Cropland Extent Mapping and Validation

Migrated samples in the target year were obtained using the optimal distance–threshold combinations defined in Section 3.4. These stable samples were then used to train a Random Forest (RF) classifier, which incorporated both the 24 biweekly NDVI composites and additional NDVI-derived features such as minimum, maximum, and seasonal amplitude. The trained model was subsequently applied to the 2021 NDVI stack to generate a binary cropland extent map (cropland = 1, non-cropland = 0).
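The feature set described above can be assembled as in the sketch below. It combines the 24 biweekly NDVI values with the minimum, maximum, and seasonal amplitude, defined here as maximum minus minimum (an assumed definition). It also reshapes a 2021 NDVI stack of shape (rows, cols, 24) into one feature row per pixel. The function names are hypothetical, and the Random Forest settings are not reproduced because the text does not report them.

```python
import numpy as np

def ndvi_features(ndvi_series):
    """24 NDVI values + min, max, amplitude (max - min) per sample."""
    x = np.asarray(ndvi_series, dtype=float)        # (n_samples, 24)
    nmin = x.min(axis=1, keepdims=True)
    nmax = x.max(axis=1, keepdims=True)
    return np.hstack([x, nmin, nmax, nmax - nmin])  # (n_samples, 27)

def stack_to_features(stack):
    """Reshape a (rows, cols, 24) NDVI stack into per-pixel feature rows."""
    rows, cols, bands = stack.shape
    return ndvi_features(stack.reshape(rows * cols, bands)), (rows, cols)
```

With scikit-learn, for example, a RandomForestClassifier could be fitted on the features of the migrated samples and their labels, applied to the per-pixel matrix of the 2021 stack, and its predictions reshaped to (rows, cols) to form the binary cropland map.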
To evaluate the performance of the migrated training samples and the resulting cropland classification, the output map was first compared with two global land cover products, ESA WorldCereal 2021 and ESA WorldCover 2021. The comparison provided insight into the degree of spatial agreement and areas of divergence, allowing assessment of how the migration-based approach performs relative to globally standardized datasets.
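A pixel-wise comparison of two binary cropland maps can be summarized with a sketch like the one below, which reports overall agreement and the share of each agreement category. It assumes both maps are already co-registered on the same grid and binarized. For ESA WorldCover, for instance, this binarization would mean selecting the cropland class (value 40). The function name and the optional validity mask are illustrative assumptions.

```python
import numpy as np

def map_agreement(map_a, map_b, valid=None):
    """Pixel-wise agreement between two binary cropland maps (1 = cropland)."""
    a = np.asarray(map_a).astype(bool)
    b = np.asarray(map_b).astype(bool)
    if valid is None:
        valid = np.ones_like(a, dtype=bool)
    a, b = a[valid], b[valid]
    n = a.size
    return {
        "overall_agreement": float(np.mean(a == b)),
        "both_cropland": float(np.sum(a & b) / n),
        "only_a_cropland": float(np.sum(a & ~b) / n),
        "only_b_cropland": float(np.sum(~a & b) / n),
        "both_non_cropland": float(np.sum(~a & ~b) / n),
    }
```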
Following this global comparison, an independent validation was carried out using validation samples that were not involved in migration or model training. Validation relied on stratified random sampling to maintain balanced representation of cropland and non-cropland areas across agro-ecological zones. Model accuracy was quantified using four commonly applied metrics: Overall Accuracy (OA), Precision, Recall, and F1-score, calculated from the confusion matrix (Table 3).
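The four accuracy measures follow directly from the binary confusion matrix, with cropland treated as the positive class. The sketch below shows this calculation using standard definitions; returning 0 when a ratio is undefined is an assumption.

```python
import numpy as np

def binary_accuracy(y_true, y_pred):
    """Confusion matrix plus OA, Precision, Recall, F1 for cropland (1)."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(t & p)); fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p)); tn = int(np.sum(~t & ~p))
    oa = (tp + tn) / t.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "oa": oa, "precision": precision, "recall": recall, "f1": f1}
```

Because the same validation points are applied to every map, these measures can be compared directly across the migration-based map and the global products.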
Finally, the same independent validation dataset was applied to the migration-based cropland map and to the ESA WorldCereal and WorldCover products, enabling a direct and objective comparison of classification accuracy across all three datasets. This final assessment quantified the reliability of the proposed migration approach for cropland detection under the contrasting agricultural and climatic conditions of Central Asia.

3. Results

The results are presented in four interconnected parts that follow the structure of this section. Section 3.1 analyzes the NDVI time-series profiles for the reference and target years, illustrating regional phenological differences and the impact of normalization and smoothing. Section 3.2 reports the behavior and accuracy of the ten similarity metrics across threshold ranges, emphasizing regional contrasts in metric sensitivity. Section 3.3 explores the correlation patterns among metrics to identify complementary measures suitable for hybrid similarity frameworks. Finally, Section 3.4 presents the classification outcomes, including comparisons with global cropland products and independent validation of the 2021 migration-based cropland map.

3.1. NDVI Time-Series Comparison Between Reference and Target Years

The NDVI time-series profiles revealed distinct seasonal vegetation dynamics between the reference and target years in both study regions (Figure 3). In the Akmola Region (Kazakhstan), NDVI trajectories for 2021 and 2022 exhibited a single-peak seasonal pattern typical of rainfed wheat-based cropland, with maximum greenness occurring in June–July. The 2022 season showed lower NDVI amplitudes, reflecting the impact of widespread drought that suppressed vegetation growth. After applying min–max normalization and Savitzky–Golay smoothing, the NDVI trajectories retained their characteristic phenological shape while minimizing short-term fluctuations and amplitude bias, thus ensuring consistency for subsequent similarity analysis.
In the Kashkadarya Region (Uzbekistan), NDVI profiles for 2016 and 2021 demonstrated similar temporal behavior, with a single pronounced peak during July–August, characteristic of irrigated cotton fields. The overall amplitude and timing remained consistent between the two years, indicating stable irrigation-supported vegetation dynamics. Minor fluctuations observed in 2021 may reflect variations in water availability or planting density but do not significantly alter the overall seasonal trajectory.
The processed NDVI profiles (normalized and smoothed for Kazakhstan, smoothed only for Uzbekistan) highlight the importance of using shape-based similarity metrics rather than relying solely on absolute NDVI values. These trajectories formed the analytical foundation for computing the ten distance and similarity metrics evaluated in the following section.

3.2. Metric Performance and Threshold Evaluation

The performance of the ten NDVI-based similarity metrics was systematically evaluated across both study regions to determine their capability to differentiate stable cropland samples (unchanged between years) from those that experienced land-cover transitions. Figure 4 illustrates the variation in validation accuracy as a function of the similarity threshold for each metric family, separately for Uzbekistan and Kazakhstan.
Overall, the trends were consistent with theoretical expectations: geometric distance metrics (e.g., Euclidean, Chebyshev, Weighted Minkowski, and Cubic) reached their highest accuracies at relatively low thresholds (0.2–0.4), indicating that small temporal distances between NDVI trajectories correspond to high interannual stability. Temporal alignment and angle-based metrics (DTW and Spectral Angle) also performed well at thresholds around 0.2–0.3 but showed gradual declines beyond 0.5, suggesting reduced discriminative power for looser matching. In contrast, correlation-based metrics (e.g., Pearson, Spearman, Kendall Tau, and Cosine Similarity) achieved optimal accuracy at higher thresholds (0.7–0.95), highlighting their dependence on strong linear or monotonic consistency between annual NDVI curves.
Although the general behavior of the metrics was similar across regions, Kazakhstan showed lower overall accuracy and flatter response curves due to the greater year-to-year variability typical of rainfed systems. Uzbekistan, characterized by irrigated cropland and more stable vegetation dynamics, exhibited steeper and more distinct accuracy peaks across thresholds, reflecting more predictable temporal trajectories.
The optimal threshold values identified from these analyses are summarized in Figure 5, a bar chart comparing the best-performing threshold for each metric between Kazakhstan (Akmola) and Uzbekistan (Kashkadarya). In Kazakhstan, distance-based and temporal alignment or angle-based metrics consistently yielded low optimal thresholds (0.15–0.25). This reflects the use of normalized NDVI values: min–max scaling was applied to mitigate drought-induced amplitude suppression and to emphasize relative phenological patterns. For correlation-based metrics in Kazakhstan, higher optimal thresholds (0.60–0.95) were identified, indicating enhanced shape-based similarity after normalization. In contrast, lower correlation thresholds (0.60–0.80) were found in Uzbekistan, where raw NDVI values, smoothed only with the Savitzky–Golay filter, retained higher absolute amplitudes owing to the stability of irrigated cropping systems.
These differences emphasize the influence of preprocessing choices on threshold calibration: normalization enhances shape-based similarity in climatically variable environments, while unscaled NDVI supports more stringent similarity criteria in phenologically stable, irrigated regions.
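A threshold sweep of the kind summarized in Figures 4 and 5 can be sketched as follows. Two assumptions are made for illustration: each sample carries a reference label stating whether it truly remained stable, and accuracy is measured as agreement with those labels. The study's validation accuracy may be computed differently (for example, from the resulting classification). Distance metrics flag a sample as stable when the value is at or below the threshold; correlation-type metrics when it is at or above. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_stable(values: Sequence[float], threshold: float,
                  higher_is_similar: bool) -> List[bool]:
    # Correlation/similarity: stable if value >= threshold.
    # Distance: stable if value <= threshold.
    if higher_is_similar:
        return [v >= threshold for v in values]
    return [v <= threshold for v in values]


def best_threshold(values: Sequence[float], is_stable_ref: Sequence[bool],
                   thresholds: Sequence[float],
                   higher_is_similar: bool) -> Tuple[float, float]:
    """Return (threshold, accuracy) with the highest agreement with the reference labels."""
    best = (thresholds[0], -1.0)
    for t in thresholds:
        pred = select_stable(values, t, higher_is_similar)
        acc = sum(p == r for p, r in zip(pred, is_stable_ref)) / len(values)
        if acc > best[1]:  # strict: ties keep the earliest candidate
            best = (t, acc)
    return best
```

A grid such as `[round(0.05 * k, 2) for k in range(1, 20)]` would cover the 0.05–0.95 range; because ties keep the earliest candidate, the order of the grid determines which of several equally accurate thresholds is reported.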

3.3. Correlation Analysis and Optimal Metric Combinations

To better understand inter-metric dependencies and potential redundancy, pairwise correlations were analyzed using Pearson’s correlation coefficient (r). The resulting matrices (Figure 6) revealed clear clustering patterns consistent with the three predefined metric families. In both Akmola (Kazakhstan) and Kashkadarya (Uzbekistan), geometric distance metrics—Euclidean, Chebyshev, Cubic and Weighted Minkowski—exhibited strong positive correlations (r > 0.8), confirming their shared sensitivity to magnitude-based differences between NDVI trajectories. Temporal alignment and angle-based metrics, particularly DTW and SAD, showed moderate correlation with distance measures (r ≈ 0.5–0.7), reflecting partial overlap in their ability to capture phase shifts and timing variations in seasonal vegetation dynamics.
In contrast, correlation and similarity-based metrics (Pearson, Spearman, Kendall Tau, and Cosine Similarity) exhibited weak association with the distance-based group (r < 0.4), suggesting a complementary role in capturing trend and shape similarity rather than absolute NDVI magnitude. This separation between metric families highlights the potential for hybrid similarity frameworks that integrate diverse sensitivity types (magnitude, timing, and trend consistency) to make temporal matching more robust.
To visualize these relationships, Figure 7 shows example scatterplots for the two best-performing metric combinations: Chebyshev vs. Cosine Similarity (Kazakhstan) and DTW vs. Cosine Similarity (Uzbekistan). Both combinations exhibit low-to-moderate correlation (R² = 0.19 and R² = 0.34, respectively), confirming their non-redundant behavior. In the Uzbekistan pairing, DTW captures temporal flexibility in phase alignment, while Cosine Similarity emphasizes overall curve shape and directional similarity. In the Kazakhstan pairing, Chebyshev's strong sensitivity to amplitude complements the shape-preserving properties of Cosine Similarity, making this combination particularly effective under the variable phenological conditions of rainfed agriculture.
These low R² relationships guided the selection of hybrid similarity frameworks that integrate metrics from distinct families. The hybrid combinations achieved 5–8% higher overall classification accuracy than single-metric models, demonstrating improved resilience to phenological variability and cross-year inconsistencies. The results confirm that pairing a magnitude- or alignment-based metric with a correlation-based metric provides a more robust and transferable framework for multi-year cropland mapping under contrasting climatic and management conditions.
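This section does not specify how two metrics are merged into a hybrid criterion. One plausible reading, assumed here purely for illustration, is a conjunctive (AND) rule: a reference sample migrates to the target year only if it passes the threshold of every metric in the pair. The sketch uses the Kazakhstan pairing (Chebyshev and Cosine Similarity) with hypothetical thresholds; `Criterion`, `migrate`, and `kazakhstan_rule` are illustrative names, not part of the study.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    return max(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


@dataclass
class Criterion:
    metric: Metric
    threshold: float
    higher_is_similar: bool  # True for similarity/correlation, False for distances

    def passes(self, ref: Sequence[float], tgt: Sequence[float]) -> bool:
        value = self.metric(ref, tgt)
        return value >= self.threshold if self.higher_is_similar else value <= self.threshold


def migrate(samples: List[Tuple[Sequence[float], Sequence[float], str]],
            criteria: List[Criterion]) -> List[Tuple[Sequence[float], str]]:
    """Keep (target-year NDVI, reference label) for samples passing every criterion."""
    return [(tgt, label) for ref, tgt, label in samples
            if all(c.passes(ref, tgt) for c in criteria)]


# Hypothetical thresholds, for illustration only.
kazakhstan_rule = [
    Criterion(chebyshev, 0.2, higher_is_similar=False),
    Criterion(cosine_similarity, 0.9, higher_is_similar=True),
]
```

The retained pairs of target-year features and transferred labels are what a classifier such as Random Forest would then be trained on; an OR rule or a weighted score would be equally easy to express with the same `Criterion` objects.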

3.4. Binary Cropland Map Validation and Comparison with Global Products

3.4.1. Comparison of the Migration-Based Cropland Map with Global Products

The final binary cropland map for 2021 was generated using migrated training samples derived through the NDVI-based temporal similarity framework. To evaluate its consistency and thematic precision, the resulting classification was compared with two widely used global cropland products, WorldCereal 2021 and WorldCover 2021, for the Akmola Region (Kazakhstan) (Figure 8) and the Kashkadarya Region (Uzbekistan).
The visual comparison presented in Figure 9a–d shows four representative zoom-in areas extracted from the regional map to illustrate differences in spatial detail and thematic accuracy. Across all subregions, the migration-based binary map demonstrates cleaner delineation of cultivated fields, sharper parcel boundaries, and a clearer exclusion of non-agricultural land, such as fallow areas, steppe vegetation, and semi-arid pastures.
In contrast, both WorldCereal and WorldCover tend to overestimate cropland extent, particularly in sparsely vegetated zones where natural grassland is spectrally similar to low-biomass crops.
For the Akmola Region (Kazakhstan), the agreement with WorldCereal 2021 reached an overall accuracy of 0.93 and a balanced accuracy of 0.87, indicating a high level of consistency. The comparison with WorldCover 2021 produced a slightly lower accuracy of 0.87, primarily due to overestimation of cropland in steppe and pasture areas. Precision (0.94) and recall (0.96) values for the non-cropland class demonstrate that the migration-based map effectively reduced commission errors, yielding a more conservative and realistic estimate of cultivated land extent.
In the Kashkadarya Region (Uzbekistan), the migration-based cropland map achieved an accuracy of 0.92 and a balanced accuracy of 0.7 relative to WorldCereal 2021, and 0.88 compared to WorldCover 2021 (Figure 10).
Overall, the migration-based maps outperform global products in spatial coherence, boundary definition, and suppression of noise in transitional landscapes.
As illustrated in Figure 11a–d, zoomed comparisons across four subregions highlight the markedly sharper cropland boundaries and reduced fragmentation in the migration-based product, especially in irrigated zones where field parcels are narrow and spatially heterogeneous. In Uzbekistan, cropland limits are delineated more precisely, and non-agricultural areas such as orchards, canals, and settlement fringes are more accurately excluded. The considerably lower recall for cropland (0.49) reflects the complexity of irrigated systems with multi-cropping and seasonal fallow patterns. Nevertheless, the high precision (0.93) indicates that the map effectively removes false cropland detections in orchards, abandoned fields, and marginal irrigation zones frequently misclassified in global datasets.
While WorldCereal and WorldCover tend to overestimate cropland, particularly in dryland regions, the proposed approach provides a more accurate and agronomically consistent representation of cultivated land.
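The agreement statistics used in this comparison follow from a binary confusion matrix with cropland as the positive class. A minimal sketch; the function name and all counts below are hypothetical:

```python
from typing import Dict


def binary_scores(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Agreement statistics for a binary map, cropland as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)    # user's accuracy for cropland
    recall = tp / (tp + fn)       # producer's accuracy for cropland
    specificity = tn / (tn + fp)  # recall of the non-cropland class
    return {
        "overall_accuracy": (tp + tn) / total,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts with cropland as a minority class and low recall.
imbalanced = binary_scores(tp=49, fp=4, fn=51, tn=896)
```

With these hypothetical counts (cropland recall 0.49), overall accuracy is 0.945 while balanced accuracy is only about 0.74. When cropland is the minority class, a low cropland recall depresses balanced accuracy far more than overall accuracy, which is one plausible reading of the gap between the overall (0.92) and balanced (0.7) accuracies reported for Kashkadarya.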

3.4.2. Independent Validation Using Reference Samples

Independent accuracy assessment was performed using validation samples not involved in model training to evaluate the reliability of the migration-based cropland maps (Table 4). The validation confirmed that the proposed approach achieved strong classification performance across both rainfed and irrigated agricultural systems.
In the Akmola Region (Kazakhstan), the model reached an overall accuracy of 0.86 and an F1-score of 0.85 for the cropland class. The producer’s accuracy (recall) of 0.92 indicates that most cropland pixels were correctly identified, whereas the user’s accuracy (precision) of 0.78 suggests minor commission errors, primarily along fragmented field boundaries and transitional steppe-cropland zones.
When compared with global datasets, WorldCereal and WorldCover achieved lower accuracies of 0.79 and 0.69, respectively, confirming that the migration-based model provided a more balanced and realistic representation of cropland extent.
In the Kashkadarya Region (Uzbekistan), the classification demonstrated even higher reliability, with an overall accuracy of 0.95 and an F1-score of 0.93. The precision (0.91) and recall (0.96) values for the cropland class highlight the model’s strong performance in complex irrigated landscapes.
By comparison, WorldCereal and WorldCover achieved accuracies of 0.88 and 0.83, respectively, both showing reduced ability to capture the fine spatial variability of irrigated cotton and double-cropping systems.
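As a consistency check, the F1-score reported for the Kashkadarya Region follows from the harmonic mean of the cropland precision (P) and recall (R):

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.91 \times 0.96}{0.91 + 0.96} = \frac{1.7472}{1.87} \approx 0.93
```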
These validation results confirm that the migration-based Random Forest model generalizes effectively across regions with contrasting agricultural practices and climatic conditions. The combination of NDVI trajectory similarity metrics, adaptive threshold selection, and hybrid metric integration contributed to the model’s superior accuracy, especially in distinguishing cultivated land from non-agricultural vegetation.

4. Discussion

This study demonstrates that NDVI time-series similarity can provide a robust operational basis for training sample migration in annual cropland mapping when implemented within a systematic and validated framework. By explicitly evaluating multiple similarity metrics and their combinations, the proposed approach extends beyond single-metric or heuristic migration strategies and enables the reuse of high-quality reference samples across years.
The comparative analysis of ten similarity metrics showed that different metric families capture complementary aspects of interannual NDVI variability. Distance-based metrics were sensitive to absolute magnitude differences, temporal alignment metrics such as Dynamic Time Warping (DTW) effectively handled phase shifts in seasonal development, and correlation-based metrics emphasized overall trajectory shape. The improved performance observed for hybrid metric combinations confirms that integrating complementary similarity properties enhances robustness, consistent with previous findings in time-series analysis and land-cover mapping [56,57,58,59,60,62].
A key strength of the proposed framework is its demonstrated effectiveness in climatically challenging environments. The successful application of the method in drought-prone rainfed wheat systems in northern Kazakhstan is particularly notable, as such conditions are typically associated with suppressed NDVI amplitudes and increased interannual variability. The results indicate that, despite these constraints, NDVI-based sample migration remains feasible when conservative thresholds and appropriate preprocessing strategies are applied. The comparable performance achieved in irrigated systems in Uzbekistan further highlights the adaptability of the framework across contrasting management regimes.
Comparisons with global cropland products, including ESA WorldCereal 2021 and ESA WorldCover 2021, showed that migration-based maps exhibited improved spatial coherence, sharper field boundaries, and reduced misclassification in semi-arid transition zones. These improvements stem from prioritizing temporally stable training samples, which reduces spectral confusion between cropland and sparsely vegetated non-cropland areas; the benefit therefore lies in more reliable spatial delineation, not merely in higher overall accuracy.
Several limitations of the proposed approach should be acknowledged. First, the reliability of migrated samples depends on the quality of the initial reference datasets; errors in reference labels may propagate across years if not adequately controlled. Second, abrupt changes in agricultural management, such as crop rotation, temporary fallowing, or extreme climatic anomalies, can alter NDVI trajectories even when land use remains unchanged, reducing migration reliability. Third, although threshold selection was implemented in a systematic and reproducible manner, optimal thresholds may vary across crop types, regions, and climatic regimes. Future work could explore adaptive or data-driven threshold optimization strategies, such as ROC-based methods, to further enhance generalizability.
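As an illustration of the ROC-based direction mentioned above, the sketch below picks the cut-off that maximizes Youden's J statistic (true positive rate minus false positive rate). Youden's J is one common ROC criterion and is chosen here only as an example; it is not the procedure used in this study. Scores are assumed to increase with similarity (distance values would be negated first), and the stable/changed labels and the function name `youden_threshold` are hypothetical.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], is_stable: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A sample is predicted stable when score >= threshold; higher scores are
    assumed to mean greater similarity (negate distances beforehand).
    """
    pos = sum(is_stable)
    neg = len(is_stable) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both stable and changed samples")
    best_t, best_j = float("inf"), -1.0
    # Every observed score is a candidate cut-off on the ROC curve.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_stable) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, is_stable) if s >= t and not y)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

Unlike a fixed accuracy sweep, this criterion weights errors on stable and changed samples equally regardless of how many of each are present, which may matter when most reference samples remain stable between years.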
Finally, while this study was conducted for two pilot regions, the methodological design is not region-specific and can be extended to other drought-prone and data-scarce agricultural systems where multi-year NDVI time series are available. Future research should focus on large-scale cross-country transfer experiments, integration of crop-type information, and uncertainty quantification to further assess the operational potential of training sample migration for long-term cropland monitoring.

5. Conclusions

This study demonstrates that NDVI time-series-based training sample migration provides an effective and scalable solution for updating annual cropland maps in regions with limited ground-truth data. By exploiting temporal similarity between reference-year and target-year NDVI trajectories, the proposed framework enables the identification of temporally stable cropland samples and supports reliable model training without the need for repeated field surveys.
The approach was evaluated across two contrasting agricultural systems in Central Asia: rainfed wheat production in the Akmola Region of Kazakhstan and irrigated cotton-dominated systems in the Kashkadarya Region of Uzbekistan. High classification accuracies of 86% and 95%, respectively, confirm the robustness of the method under different climatic and management conditions and highlight the importance of temporal consistency in NDVI trajectories for successful sample transfer.
A key contribution of this work lies in the systematic evaluation and integration of multiple similarity metrics representing geometric, temporal alignment, and correlation-based families. The results show that combining complementary metrics enhances robustness and transferability compared to single-metric approaches, particularly in heterogeneous and drought-prone agricultural landscapes. The comparison with global cropland products (WorldCereal 2021 and WorldCover 2021) further demonstrated improved spatial coherence, sharper field boundaries, and reduced misclassification in semi-arid transition zones.
Overall, the proposed framework extends the temporal utility of existing labeled datasets and offers a cost-effective, repeatable pathway for regional cropland monitoring in regions where ground-truth data are scarce or difficult to collect. Future research should focus on adaptive threshold selection, uncertainty quantification, and the integration of crop-type information to further strengthen the operational applicability of training sample migration for large-scale, multi-year cropland mapping.

Author Contributions

A.B.: data collection, analysis, visualization, and writing of the original draft; P.H.: conceptualization, methodology design, manuscript review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All input datasets used in this study are openly available from public satellite data repositories and global land cover products, as referenced in the manuscript. No new datasets were deposited in public repositories. The cropland classification maps and intermediate outputs generated during the analysis were produced solely for the purpose of this study and are available from the corresponding author upon reasonable request.

Acknowledgments

The authors gratefully acknowledge the support and guidance of Zhongxin Chen and Karl Morteo, Senior IT Officers, as well as the Digital FAO and Agro-Informatics Division of the Food and Agriculture Organization of the United Nations. During the preparation of this manuscript, the author(s) used ChatGPT (OpenAI, GPT-5, 2025) to assist with minor text and language editing. All generated content was reviewed, revised, and approved by the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NDVI: Normalized Difference Vegetation Index
EO: Earth Observation
DTW: Dynamic Time Warping
SAD: Spectral Angle Distance

References

1. Fritz, S.; See, L.; Perger, C.; McCallum, I.; Schill, C.; Schepaschenko, D.; Duerauer, M.; Karner, M.; Dresel, C.; Laso Bayas, J.-C.; et al. A global dataset of crowdsourced land cover and land use reference data. Sci. Data 2017, 4, 170075.
2. Thenkabail, P.S.; Xiong, J.; Gumma, M.K.; Giri, C.; Milesi, C.; Ozdogan, M.; Congalton, R.; Tilton, J.; Sankey, T.R.; Massey, R.; et al. Global Cropland Area Database (GCAD) derived from remote sensing in support of food security in the twenty-first century: Current achievements and future possibilities. In Remote Sensing Handbook, Volume II: Land Resources Monitoring, Modeling, and Mapping with Remote Sensing; Taylor & Francis: Boca Raton, FL, USA, 2015; pp. 1–45. Available online: https://pubs.usgs.gov/publication/70117684 (accessed on 16 September 2025).
3. Food and Agriculture Organization of the United Nations (FAO). FAOSTAT Cropland and Land Use: CNB Methodology 2022; FAO: Rome, Italy, 2022; Available online: https://files-faostat.fao.org/production/ESB/CNB%20methodology_2022.pdf (accessed on 20 March 2025).
4. European Space Agency (ESA). WorldCover 10 m 2021 v100; ESA: Paris, France, 2021; Available online: https://esa-worldcover.org (accessed on 7 March 2025).
5. European Space Agency (ESA) WorldCereal. Global Maps; ESA: Paris, France, 2021; Available online: https://esa-worldcereal.org/en/products/global-maps (accessed on 7 March 2025).
6. Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; He, C.; Han, G.; Peng, S.; Lu, M.; et al. Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS J. Photogramm. Remote Sens. 2015, 103, 7–27.
7. Gong, P.; Wang, J.; Yu, L.; Zhao, Y.; Liang, L.; Li, C.; Wang, X.; Bai, Y.; Cheng, Y.; Zhu, Z.; et al. Finer resolution observation and monitoring of global land cover: First mapping results with Landsat TM and ETM+ data. Int. J. Remote Sens. 2013, 34, 2607–2654.
8. Potapov, P.; Hansen, M.C.; Pickens, A.; Hernandez-Serna, A.; Tyukavina, A.; Zalles, V.; Li, X.; Khan, A.; Stolle, F.; Stehman, S. The Global 2000–2020 Land Cover and Land Use Change Dataset derived from the Landsat archive: First results. Front. Remote Sens. 2022, 3, 856903.
9. Phalke, A.; Ozdogan, M.; Thenkabail, P.; Erickson, T.; Gorelick, N.; Yadav, K.; Congalton, R. Mapping croplands of Europe, Middle East, Russia, and Central Asia using Landsat, Random Forest, and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2020, 167, 104–122.
10. Gumma, M.K.; Thenkabail, P.S.; Teluguntla, P.G.; Oliphant, A.; Xiong, J.; Giri, C.; Pyla, V.; Dixit, S.; Whitbread, A.M. Agricultural cropland extent and areas of South Asia derived using Landsat 30 m time-series big data and Random Forest machine learning algorithms on the Google Earth Engine cloud. GISci. Remote Sens. 2020, 57, 302–322.
11. Herold, M.; Mayaux, P.; Woodcock, C.E.; Baccini, A.; Schmullius, C. Some challenges in global land cover mapping: An assessment of agreement and accuracy in existing 1 km datasets. Remote Sens. Environ. 2008, 112, 2538–2556.
12. Kerner, H.; Nakalembe, C.; Yang, A.; Zvonkov, I.; McWeeny, R.; Tseng, G.; Becker-Reshef, I. How accurate are existing land cover maps for agriculture in Sub-Saharan Africa? Sci. Data 2024, 11, 486.
13. Lu, M.; Wu, W.; Zhang, L.; Peng, J.; You, L.; Yang, P.; Li, Z.; Cui, Y. A comparative analysis of five global cropland datasets in China. Sci. China Earth Sci. 2016, 59, 2307–2317.
14. Wang, Z.; Mountrakis, G. Accuracy assessment of eleven medium resolution global and regional land cover land use products: A case study over the conterminous United States. Remote Sens. 2023, 15, 3186.
15. Laso Bayas, J.C.; See, L.; Perger, C.; Justice, C.; Nakalembe, C.; Dempewolf, J.; Fritz, S. Validation of automatically generated global and regional cropland data sets: The case of Tanzania. Remote Sens. 2017, 9, 815.
16. Fritz, S.; See, L.; Laso Bayas, J.C.; Waldner, F.; Jacques, D.; Becker-Reshef, I.; Whitcraft, A.; Baruth, B.; Bonifacio, R.; Crutchfield, J.; et al. A comparison of global agricultural monitoring systems and current gaps. Agric. Syst. 2019, 168, 258–272.
17. Zhang, C.; Kerner, H.; Wang, S.; Hao, P.; Li, Z.; Hunt, K.A.; Abernethy, J.; Zhao, H.; Gao, F.; Di, L.; et al. Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products. Remote Sens. Environ. 2025, 330, 114995.
18. Li, C.; Ma, Z.; Wang, L.; Yu, W.; Tan, D.; Gao, B.; Feng, Q.; Guo, H.; Zhao, Y. Improving the accuracy of land cover mapping by distributing training samples. Remote Sens. 2021, 13, 4594.
19. Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a Random Forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104.
20. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
21. Millard, K.; Richardson, M. On the importance of training data sample selection in Random Forest image classification: A case study in peatland ecosystem mapping. Remote Sens. 2015, 7, 8489–8515.
22. Stehman, S.V. Sampling designs for accuracy assessment of land cover. Int. J. Remote Sens. 2009, 30, 5243–5272.
23. Congalton, R.G.; Green, K. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2019.
24. Belgiu, M.; Csillik, O. Sentinel-2 cropland mapping using pixel-based and object-based time-weighted Dynamic Time Warping analysis. Remote Sens. Environ. 2018, 204, 509–523.
25. Nabil, M.; Zhang, M.; Wu, B.; Bofana, J.; Elnashar, A. Constructing a 30 m African cropland layer for 2016 by integrating multiple remote sensing, crowdsourced, and auxiliary datasets. Big Earth Data 2021, 6, 54–76.
26. Han, W.; Zheng, J.; Guan, J.; Liu, Y.; Liu, L.; Han, C.; Li, J.; Li, C.; Mao, X.; Tian, R. Assessment of vegetation drought loss and recovery in Central Asia considering a comprehensive vegetation index. Remote Sens. 2024, 16, 4189.
27. Hao, P.; Löw, F.; Biradar, C. Annual cropland mapping using reference Landsat time series: A case study in Central Asia. Remote Sens. 2018, 10, 2057.
28. Lu, L.; Guo, H.; Kuenzer, C.; Klein, I.; Zhang, L.; Li, X. Analyzing phenological changes with remote sensing data in Central Asia. IOP Conf. Ser. Earth Environ. Sci. 2014, 17, 12005.
29. Raab, C.; Spies, M. Characterising cropland fragmentation in post-Soviet Central Asia using Landsat remote-sensing time series data. Appl. Geogr. 2023, 156, 102968.
30. Lerman, Z. Agricultural recovery in the former Soviet Union: An overview of 15 years of land reform and farm restructuring. Post Communist Econ. 2008, 20, 391–412.
31. Spiegel, S. Time Series Mining: Segmentation, Classification, and Clustering of Temporal Data. Doctoral Thesis, Technische Universität Berlin, Berlin, Germany, 2015.
32. Jin, W.; Li, J.; Zhang, S.; Feng, M.; Feng, S. Weighted Minkowski distance-based differential protection for active distribution networks considering the uncertainty of frequency-domain characteristics. Front. Energy Res. 2023, 11, 1242325.
33. Food and Agriculture Organization of the United Nations (FAO). GIEWS Country Brief: Kazakhstan, 27 March 2024. 2024. Available online: https://www.fao.org/giews/countrybrief/country.jsp?code=KAZ (accessed on 1 June 2025).
34. World Bank. Weaving a New Future in Uzbekistan’s Cotton Sector. World Bank, 27 May 2025. Available online: https://www.worldbank.org/en/news/feature/2025/05/27/weaving-a-new-future-in-uzbekistan-s-cotton-sector (accessed on 1 June 2025).
35. Food and Agriculture Organization of the United Nations (FAO). International Institute for Applied Systems Analysis (IIASA). Global Agro-Ecological Zones (GAEZ v4): Model Documentation; FAO: Rome, Italy; IIASA: Laxenburg, Austria, 2021; Available online: https://gaez.fao.org/ (accessed on 20 December 2025).
36. Klein, I.; Gessner, U.; Kuenzer, C. Generation of up-to-date land cover maps for Central Asia. In Novel Measurement and Assessment Tools for Monitoring and Management of Land and Water Resources in Agricultural Landscapes of Central Asia; Environmental Science and Engineering Series; Kuenzer, C., Dech, S., Wagner, W., Eds.; Springer: Cham, Switzerland, 2014; pp. 249–273.
37. Hu, Y.; Hu, Y. Land cover changes and their driving mechanisms in Central Asia from 2001 to 2017 supported by Google Earth Engine. Remote Sens. 2019, 11, 554.
38. Kovalskyy, V.; Roy, D.P.; Zhang, X.Y.; Ju, J. The suitability of multi-temporal web-enabled Landsat data NDVI for phenological monitoring: A comparison with flux tower and MODIS NDVI. Remote Sens. Lett. 2011, 3, 325–334.
39. Hasenbein, K.; Abdel-Rahman, E.M.; Adan, M.; Gachoki, S.M.; King’ori, E.; Dubois, T.; Landmann, T. Availability of Sentinel-2-based time-series observations: Which vegetation phenology-based metrics perform best for mapping farming systems in complex landscapes? Remote Sens. Lett. 2022, 13, 695–707.
40. Vermote, E.; Justice, C.; Claverie, M.; Franch, B. Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sens. Environ. 2016, 185, 339–349.
41. Zhu, Z.; Woodcock, C.E. Object-based cloud and cloud-shadow detection in Landsat imagery. Remote Sens. Environ. 2012, 118, 83–94.
42. Rouse, J.W.; Haas, R.H.; Deering, D.W.; Schell, J.A. Monitoring the vernal advancement and retrogradation (green wave effect) of natural vegetation. In Final Report RSC 1978-4; Remote Sensing Center, Texas A&M University: College Station, TX, USA, 1974.
43. Joint Stock Company “Kazakhstan Gharysh Sapary” (National Space Centre). AgroOpen National Satellite Monitoring Platform. Astana, Kazakhstan. 2025. Available online: https://gharysh.kz/ (accessed on 5 March 2025).
44. European Space Agency (ESA) WorldCereal. Reference Data Portal. 2021. Available online: https://esa-worldcereal.org/en/reference-data (accessed on 5 March 2025).
45. Remelgado, R.; Zaitov, S.; Kenjabaev, S.; Martynenko, A.; Lamers, J.P.A.; Conrad, C. A crop type dataset for consistent land cover classification in Central Asia. Sci. Data 2020, 7, 250.
46. Noor, S.; Tajik, O.; Golzar, J. Simple random sampling. Int. J. Educ. Lang. Stud. 2022, 1, 78–82.
47. Koutsos, T.M.; Menexes, G.C. Comparing spatial sampling designs for estimating effectively maize crop traits in experimental plots. Agronomy 2024, 14, 280.
48. Food and Agriculture Organization of the United Nations (FAO); World Resources Institute (WRI). Collect Earth: Land Use Survey through Visual Interpretation; FAO and WRI: Rome, Italy, 2020; Available online: https://www.openforis.org/tools/collect-earth.html (accessed on 9 April 2025).
49. Roy, D.P.; Wulder, M.A.; Loveland, T.R.; Woodcock, C.E.; Allen, R.G.; Anderson, M.C.; Helder, D.; Irons, J.R.; Johnson, D.M.; Kennedy, R.; et al. Landsat-8: Science and product vision for terrestrial global change research. Remote Sens. Environ. 2014, 145, 154–172.
50. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens. Environ. 2012, 120, 25–36.
51. Ryssaliyeva, L.; Salnikov, V.; Lin, Z.; Raimbekova, Z. Seasonal sensitivity of drought indices in Northern Kazakhstan: A comparative evaluation and selection of optimal indicators. Sustainability 2025, 17, 9413.
52. Grain Union of Kazakhstan. Overview of the 2022 Agricultural Season in Kazakhstan: Drought Impacts and Grain Yield Assessment; Union of Grain Processors and Exporters of Kazakhstan: Astana, Kazakhstan, 2022; Available online: https://www.grainunion.kz/upload/files/e2t2fs26388c4ogk044w.pdf (accessed on 25 September 2025).
53. Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ). Economy-Wide Impacts of Climate Change in Kazakhstan; GIZ: Bonn, Germany, 2025; Available online: https://www.giz.de/sites/default/files/media/pkb-document/2025-09/giz2025-en-kazakhstan-economy-wide-impacts-climate.pdf (accessed on 25 September 2025).
54. Kobayashi, H.; Nagai, S.; Kim, Y.; Yang, W.; Ikeda, K.; Ikawa, H.; Nagano, H.; Suzuki, R. In Situ Observations Reveal How Spectral Reflectance Responds to Growing Season Phenology of an Open Evergreen Forest in Alaska. Remote Sens. 2018, 10, 1071.
55. Chen, J.; Jönsson, P.; Tamura, M.; Gu, Z.; Matsushita, B.; Eklundh, L. A simple method for reconstructing a high-quality NDVI time-series data set based on the Savitzky–Golay filter. Remote Sens. Environ. 2004, 91, 332–344.
56. Lhermitte, S.; Verbesselt, J.; Verstraeten, W.W.; Coppin, P. A comparison of time series similarity measures for classification and change detection of ecosystem dynamics. Remote Sens. Environ. 2011, 115, 3129–3152.
57. Ding, H.; Trajcevski, G.; Scheuermann, P.; Wang, X.; Keogh, E. Querying and mining of time series data: Experimental comparison of representations and distance measures. Proc. VLDB Endow. 2008, 1, 1542–1552.
58. Wang, X.; Mueen, A.; Ding, H.; Trajcevski, G.; Scheuermann, P.; Keogh, E. Experimental comparison of representation methods and distance measures for time series data. Data Min. Knowl. Discov. 2013, 26, 275–309.
59. Petitjean, F.; Inglada, J.; Gançarski, P. Satellite image time series analysis under time warping. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3081–3095.
60. Giorgino, T. Computing and visualizing dynamic time warping alignments in R: The dtw package. J. Stat. Softw. 2009, 31, 1–24.
61. Dongen, S.; Enright, A. Metric distances derived from cosine similarity and Pearson and Spearman correlations. arXiv 2012, arXiv:1208.3145.
62. Novák, V.; Mirshahi, S. On the similarity and dependence of time series. Mathematics 2021, 9, 550.
63. Jönsson, P.; Cai, Z.; Melaas, E.; Friedl, M.A.; Eklundh, L. A method for robust estimation of vegetation seasonality from Landsat and Sentinel-2 time series data. Remote Sens. 2018, 10, 635.
64. Huang, H.; Wang, J.; Liu, C.; Liang, L.; Li, C.; Gong, P. The migration of training samples towards dynamic global land cover mapping. ISPRS J. Photogramm. Remote Sens. 2020, 161, 27–36.
65. Fekri, E.; Latifi, H.; Amani, M.; Zobeidinezhad, A. A training sample migration method for wetland mapping and monitoring using Sentinel data in Google Earth Engine. Remote Sens. 2021, 13, 4169.
66. Ibrahim, I.; Gillis, J.; Decré, W.; Swevers, J. Exact wavefront propagation for globally optimal one-to-all path planning on 2D Cartesian grids. IEEE Robot. Autom. Lett. 2024, 9, 9431–9437.
67. Zhang, Z.; Tang, P.; Hu, C.; Liu, Z.; Zhang, W.; Tang, L. Seeded classification of satellite image time series with lower-bounded dynamic time warping. Remote Sens. 2022, 14, 2778.
68. Elsevier. Pearson correlation. In ScienceDirect Topics; Elsevier: Amsterdam, The Netherlands, 2025; Available online: https://www.sciencedirect.com/topics/computer-science/pearson-correlation (accessed on 10 September 2025).
69. SAGE Publications. Correlation, Spearman. In The SAGE Encyclopedia of Communication Research Methods; Allen, M., Ed.; SAGE Publications: Thousand Oaks, CA, USA, 2017.
70. Kendall, M.G. A new measure of rank correlation. Biometrika 1938, 30, 81–93.
71. Zhang, Z.; Jiang, W.; Song, J.; Ling, Z.; Yang, Z.; Van de Voorde, T.; Ngoie Inabanza, O. Mapping the first global long time series wetland multitype sample dataset via the Google Earth Engine: A hybrid method of automated generation–index thresholding–spectral matching. GIScience Remote Sens. 2025, 62, 2553942.
72. Werner, R.; Valev, D.; Danov, D. The Pearson’s correlation: A measure for the linear relationships between time series? Fundam. Space Res. 2009, 92, 92–96.
73. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Routledge: New York, NY, USA, 1988.
Figure 1. Distribution of cropland and non-cropland samples used for training (reference) and validation in the two study regions: (a) Akmola Region (Zharkain and Zhaksynsky districts), Kazakhstan, and (b) Kashkadarya Region, Uzbekistan.
Figure 2. Workflow of cropland mapping using sample migration and NDVI-based similarity metrics.
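The core migration step in this workflow can be sketched as follows, under stated assumptions: each labelled reference sample has an NDVI series at the same location in both the reference year and the target year, and a sample is migrated (its reference-year label reused for the target year) when one similarity score clears a threshold. Pearson correlation and the 0.5 cut-off (the reference value marked in Figure 5) are illustrative choices only; the study evaluates ten metrics, metric-specific optimal thresholds, and hybrid combinations. The `Sample` class, `migrate_samples`, and the six-point NDVI curves are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    sample_id: str
    label: int                   # 1 = cropland, 0 = non-cropland (reference-year label)
    ndvi_reference: List[float]  # NDVI series at this location in the reference year
    ndvi_target: List[float]     # NDVI series at the same location in the target year


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0  # treat a flat series as uncorrelated


def migrate_samples(samples: List[Sample], threshold: float = 0.5) -> List[Sample]:
    """Keep samples whose NDVI trajectory is similar in both years; the kept
    samples carry their reference-year label into the target year."""
    return [s for s in samples if pearson(s.ndvi_reference, s.ndvi_target) >= threshold]


reference_curve = [0.2, 0.3, 0.6, 0.8, 0.6, 0.3]
samples = [
    Sample("stable", 1, reference_curve, [0.25, 0.35, 0.65, 0.75, 0.55, 0.3]),
    Sample("changed", 1, reference_curve, [0.5, 0.4, 0.3, 0.2, 0.2, 0.2]),
]
migrated = migrate_samples(samples)
```

In the toy data, the "changed" location shows a declining NDVI curve in the target year (negative correlation), so its cropland label is not transferred, while the "stable" location keeps it. The migrated set would then serve as training data for the Random Forest classifier.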
Figure 3. Biweekly NDVI trajectories for cropland sites in Akmola (Kazakhstan) and Bukhara (Uzbekistan) across representative years. The shaded areas mark the growing seasons (April–September in Kazakhstan, June–October in Uzbekistan). The raw, normalized, and smoothed NDVI curves illustrate the seasonal vegetation dynamics and the preprocessing steps applied before cropland mapping.
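The normalization and smoothing steps contrasted in Figure 3 can be sketched as below. This assumes per-series min-max normalization and a 5-point, second-order Savitzky–Golay filter (the filter described in reference [55]); the normalization scheme, window length, polynomial order, end padding, and the order of the two steps are all assumptions, not settings confirmed by this section. The function names are hypothetical.

```python
from typing import List

# Convolution coefficients of a 5-point, 2nd-order Savitzky-Golay filter (they sum to 35).
SG_COEFFS = (-3, 12, 17, 12, -3)
SG_NORM = 35.0


def min_max_normalize(series: List[float]) -> List[float]:
    lo, hi = min(series), max(series)
    if hi == lo:  # a flat series has no range to rescale
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]


def savitzky_golay_5_2(series: List[float]) -> List[float]:
    """Fit a local quadratic over each 5-point window; the ends are padded by
    repeating the first and last values (an illustrative choice)."""
    padded = [series[0]] * 2 + list(series) + [series[-1]] * 2
    return [sum(c * padded[i + k] for k, c in enumerate(SG_COEFFS)) / SG_NORM
            for i in range(len(series))]


def preprocess(raw_ndvi: List[float]) -> List[float]:
    # Order follows the caption (raw -> normalized -> smoothed); assumed, not confirmed.
    return savitzky_golay_5_2(min_max_normalize(raw_ndvi))
```

Because a Savitzky–Golay filter reproduces any quadratic exactly inside its window, it damps short-lived noise while preserving the shape of the seasonal peak better than a simple moving average of the same width.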
Figure 4. Heatmaps of validation accuracy across tested threshold values for all NDVI-based similarity metrics in (a) Uzbekistan (irrigated agricultural system) and (b) Kazakhstan (rainfed agricultural system). Colors indicate validation accuracy, with warmer colors corresponding to higher accuracy.
Figure 5. Optimal thresholds for NDVI similarity metrics in Kazakhstan (Akmola) and Uzbekistan (Kashkadarya). Bars represent optimal threshold values, where dark-colored bars with solid outlines correspond to Kazakhstan, and light-colored bars with dashed outlines correspond to Uzbekistan. The horizontal dashed line indicates the reference similarity threshold value of 0.5 used in the analysis.
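Figures 4 and 5 report validation accuracy over a grid of thresholds for each metric and the threshold that maximizes it. The bookkeeping behind such a grid can be sketched as follows. The `evaluate` callable is a hypothetical stand-in for the full pipeline (migrate samples at a given threshold, train the Random Forest, score the map against validation samples), and the toy accuracy curve in the usage lines is made up, not a result from the paper.

```python
from typing import Callable, Dict, Iterable, Tuple

Grid = Dict[str, Dict[float, float]]


def threshold_sweep(metrics: Iterable[str], thresholds: Iterable[float],
                    evaluate: Callable[[str, float], float]) -> Grid:
    """Accuracy for every (metric, threshold) pair: one heatmap row per metric."""
    thresholds = list(thresholds)
    return {m: {t: evaluate(m, t) for t in thresholds} for m in metrics}


def optimal_thresholds(grid: Grid) -> Dict[str, Tuple[float, float]]:
    """Per metric, the (threshold, accuracy) pair with the highest accuracy;
    ties go to the first threshold in grid order."""
    return {m: max(row.items(), key=lambda item: item[1]) for m, row in grid.items()}


# Usage with a made-up accuracy curve that peaks at a metric-specific threshold.
toy_peak = {"Pearson": 0.6, "Cosine": 0.8}


def toy_evaluate(metric: str, threshold: float) -> float:
    return 0.9 - (threshold - toy_peak[metric]) ** 2


grid = threshold_sweep(toy_peak, [0.4, 0.5, 0.6, 0.7, 0.8], toy_evaluate)
best = optimal_thresholds(grid)
```

Keeping the full grid, rather than only the optimum, is what makes a heatmap such as Figure 4 possible and shows how sharply accuracy falls away from the best threshold.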
Figure 6. Pearson correlation matrices of the ten NDVI-based similarity metrics for (a) the Akmola Region (Kazakhstan, rainfed system) and (b) the Kashkadarya Region (Uzbekistan, irrigated system).
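A metric correlation matrix like the one in Figure 6 can be obtained by scoring every sample with every metric and correlating the resulting score vectors pairwise. The minimal sketch below assumes that layout; the function names and the toy scores are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def metric_correlation_matrix(scores: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pearson correlation between every pair of metrics, where
    scores[m][k] is metric m evaluated on sample k."""
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}


# Toy scores for four samples (illustrative values only).
toy_scores = {
    "Euclidean": [0.1, 0.2, 0.3, 0.4],
    "Cubic": [0.05, 0.1, 0.15, 0.2],
    "Cosine": [0.9, 0.8, 0.7, 0.6],
}
matrix = metric_correlation_matrix(toy_scores)
```

Because distances decrease and similarities increase as two curves agree, redundant metrics of opposite polarity appear as strong negative rather than positive correlations; in the toy data, Euclidean and cosine scores correlate at -1.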
Figure 7. Pairwise correlations for selected metric combinations: (a) Chebyshev vs. Cosine Similarity for Akmola Region, Kazakhstan; (b) DTW vs. Cosine Similarity for Kashkadarya Region, Uzbekistan.
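Figure 7 pairs a magnitude-sensitive metric with a shape-based one (Chebyshev with cosine similarity in Akmola, DTW with cosine similarity in Kashkadarya). One simple way to combine two metrics, assumed here rather than taken from the paper, is an AND rule in which a sample migrates only if it passes both thresholds. The function name and threshold values below are hypothetical.

```python
import math
from typing import List


def chebyshev(x: List[float], y: List[float]) -> float:
    return max(abs(a - b) for a, b in zip(x, y))


def cosine(x: List[float], y: List[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def passes_hybrid(reference: List[float], target: List[float],
                  max_chebyshev: float, min_cosine: float) -> bool:
    """AND rule: the curves must agree in shape (cosine) and no single
    time step may deviate too much (Chebyshev)."""
    return (chebyshev(reference, target) <= max_chebyshev
            and cosine(reference, target) >= min_cosine)


reference_curve = [0.2, 0.3, 0.6, 0.8, 0.6, 0.3]
candidates = {
    "stable": [0.25, 0.35, 0.65, 0.75, 0.55, 0.3],
    "same_shape_half_amplitude": [0.1, 0.15, 0.3, 0.4, 0.3, 0.15],
}
kept = [name for name, curve in candidates.items()
        if passes_hybrid(reference_curve, curve, max_chebyshev=0.15, min_cosine=0.95)]
```

The half-amplitude curve has the same shape as the reference (cosine of about 1) but peaks 0.4 NDVI lower, so only the Chebyshev test rejects it. This is the complementarity between shape-based and magnitude-based metrics that a hybrid rule can exploit.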
Figure 8. Comparison of the migration-based binary cropland map (2021) with global products for the Akmola Region, Kazakhstan. Red boxes (a–d) indicate representative areas selected for detailed visual comparison with global products; corresponding zoomed-in views are presented in Figure 9a–d.
Figure 9. Binary cropland map for the Akmola Region (Kazakhstan) in 2021, compared with WorldCereal and WorldCover global products. Insets (a–d) show detailed views of typical land-use structures, including rainfed wheat cropland and steppe–cropland transitions, illustrating the improved parcel delineation achieved by the migration-based classification. Green areas represent cropland, while beige areas indicate non-cropland in all map panels.
Figure 10. Comparison of the migration-based binary cropland map (2021) with global products for Kashkadarya Region, Uzbekistan. Red boxes (a–d) indicate representative areas selected for detailed visual comparison, with corresponding zoomed-in views shown in Figure 11a–d.
Figure 11. Binary cropland map for Kashkadarya Region (Uzbekistan) in 2021, compared with WorldCereal and WorldCover global products. Insets (a–d) show detailed views of representative areas selected for visual comparison, illustrating the spatial differences between the migration-based cropland map and global products. Green areas represent cropland, while beige areas indicate non-cropland in all map panels.
Table 1. Number of reference and validation samples in each study region.
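| Country | Region (District) | Reference Samples: Cropland | Reference Samples: Non-Cropland | Validation Samples: Cropland | Validation Samples: Non-Cropland |
|---|---|---|---|---|---|
| Kazakhstan (2022) | Akmola (Zharkain, Zhaksynsky) | 130 | 186 | 49 | 69 |
| Uzbekistan (2016) | Kashkadarya | 171 | 113 | 51 | 90 |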
Table 2. Mathematical formulations of the ten NDVI-based similarity metrics and the rationale for their inclusion.
| Category | Metric | Formula | Description |
|---|---|---|---|
| Distance-based metrics | Euclidean distance (ED) | $ED = \sqrt{\sum_{i=1}^{N} (X_i - Y_i)^2}$ | Measures absolute magnitude differences between two NDVI trajectories [64,65]. |
| | Chebyshev distance | $D_{xy} = \lim_{p \to \infty} \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \right)^{1/p} = \max_i \lvert x_i - y_i \rvert$ | Captures the largest single deviation across the NDVI time series [31]. |
| | Minkowski (weighted) | $M_r = \sum_{p=1}^{3} \omega_p \frac{M_{rp}}{n}$ | Generalized distance; the weights $\omega_p$ and the exponent $p$ control sensitivity [32]. |
| | Cubic distance | $D_{xy} = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{3} \right)^{1/3}$ | Emphasizes larger deviations and strong outliers [66]. |
| Temporal alignment and angle-based | Dynamic Time Warping (DTW) | $DTW(X, Y) = \min_{W} \sum_{k=1}^{K} d(x_{i_k}, y_{j_k})$ | Aligns similar NDVI sequences that differ in timing [67]. |
| | Spectral Angle Distance (SAD) | $\theta = \cos^{-1}\left( \frac{\sum_{t=1}^{N} X_t^{(t-1)} Y_t^{(t-1)}}{\sqrt{\sum_{t=1}^{N} \left(X_t^{(t-1)}\right)^2 \sum_{t=1}^{N} \left(Y_t^{(t-1)}\right)^2}} \right)$ | Measures angular similarity; insensitive to illumination or scale [64,65]. |
| Correlation and similarity-based metrics | Pearson | $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$ | Quantifies linear relationship strength [61,68]. |
| | Spearman | $\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$ | Captures monotonic trends; robust to non-linearity [61,69]. |
| | Kendall Tau | $\tau = \frac{n_c - n_d}{\frac{1}{2} n (n - 1)}$ | Based on concordant–discordant pairs; rank-based measure [70]. |
| | Cosine similarity | $s = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\,\sqrt{\sum y_i^2}}$ | Focuses on shape similarity while ignoring amplitude [61]. |
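Note: $x_i$ and $y_i$ (also written $X_i$ and $Y_i$) represent NDVI values at time step $i$ in the reference-year and target-year time series, respectively; $N = 24$ (also written $n$) is the number of biweekly observations per growing season. In the Spearman formula, $d_i$ is the difference between the ranks of $x_i$ and $y_i$; in the Kendall formula, $n_c$ and $n_d$ are the numbers of concordant and discordant pairs.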
Table 3. Structure of the confusion matrix used for accuracy assessment.
| | Predicted Cropland | Predicted Non-Cropland |
|---|---|---|
| Actual Cropland | TP | FN |
| Actual Non-Cropland | FP | TN |
TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.
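The accuracy measures reported in Table 4 follow from the counts in Table 3, with cropland as the positive class. A minimal sketch, with a hypothetical class name and toy labels:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # actual cropland, predicted cropland
    fn: int  # actual cropland, predicted non-cropland
    fp: int  # actual non-cropland, predicted cropland
    tn: int  # actual non-cropland, predicted non-cropland

    @classmethod
    def from_labels(cls, actual: Sequence[int], predicted: Sequence[int]) -> "ConfusionMatrix":
        """Build the matrix from 0/1 labels (1 = cropland, 0 = non-cropland)."""
        pairs = list(zip(actual, predicted))
        return cls(tp=pairs.count((1, 1)), fn=pairs.count((1, 0)),
                   fp=pairs.count((0, 1)), tn=pairs.count((0, 0)))

    def overall_accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Equivalent to the harmonic mean of precision and recall.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


cm = ConfusionMatrix.from_labels(actual=[1, 1, 1, 0, 0], predicted=[1, 1, 0, 1, 0])
```

Writing F1 as $2TP/(2TP + FP + FN)$ avoids computing it from already-rounded precision and recall. Each ratio raises `ZeroDivisionError` when its denominator is zero, for example when no sample is predicted as cropland.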
Table 4. Accuracy assessment results based on independent validation samples.
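| Region | Product | Overall Accuracy | Precision (Crop) | Recall (Crop) | F1 (Crop) |
|---|---|---|---|---|---|
| Akmola | Binary Map (2021) | 0.86 | 0.78 | 0.92 | 0.85 |
| | WorldCereal (ESA, 2021) | 0.79 | 0.71 | 0.86 | 0.78 |
| | WorldCover (ESA, 2021) | 0.69 | 0.57 | 1.00 | 0.73 |
| Kashkadarya | Binary Map (2021) | 0.95 | 0.91 | 0.96 | 0.93 |
| | WorldCereal (ESA, 2021) | 0.88 | 0.75 | 1.00 | 0.86 |
| | WorldCover (ESA, 2021) | 0.83 | 0.68 | 1.00 | 0.81 |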
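As a consistency check on Table 4, F1 is the harmonic mean of precision ($P$) and recall ($R$). For the Kashkadarya binary map, and for a product with perfect recall such as WorldCover in Kashkadarya, the reported values give:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.91 \times 0.96}{0.91 + 0.96} = \frac{1.7472}{1.87} \approx 0.93,
\qquad
R = 1 \;\Rightarrow\; F_1 = \frac{2P}{P + 1} = \frac{2 \times 0.68}{1.68} \approx 0.81 .
$$

With $R = 1.00$, every validation cropland sample is labelled cropland, so the lower F1 of the global products in Kashkadarya comes entirely from their lower precision, that is, from non-cropland mapped as cropland.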
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Batkalova, A.; Hao, P. Training Sample Migration for Temporal Cropland Mapping in Central Asia. Land 2026, 15, 156. https://doi.org/10.3390/land15010156

AMA Style

Batkalova A, Hao P. Training Sample Migration for Temporal Cropland Mapping in Central Asia. Land. 2026; 15(1):156. https://doi.org/10.3390/land15010156

Chicago/Turabian Style

Batkalova, Aiman, and Pengyu Hao. 2026. "Training Sample Migration for Temporal Cropland Mapping in Central Asia" Land 15, no. 1: 156. https://doi.org/10.3390/land15010156

APA Style

Batkalova, A., & Hao, P. (2026). Training Sample Migration for Temporal Cropland Mapping in Central Asia. Land, 15(1), 156. https://doi.org/10.3390/land15010156

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.