1. Introduction
Maize is one of the three major food crops in the world, with a global production exceeding 1.22 billion tons in 2024, according to data from the United States Department of Agriculture (USDA). However, while it alleviates the hunger problem in modern society, its large-scale cultivation has already negatively influenced the natural environment [1, 2, 3]. Consequently, accurate and timely mapping of maize planting areas is essential for sustainable agriculture and national food security, while also serving as a key data source for maize growth monitoring and yield estimation [4, 5, 6].
Nowadays, for practical crop mapping tasks, three main types of remote sensing imagery are commonly utilized: Synthetic Aperture Radar (SAR), optical imagery, and their integration [7, 8, 9, 10, 11]. For staple food crops, regardless of the data source, the most frequently adopted methods in practical crop mapping tasks are traditional supervised machine learning algorithms (e.g., random forests, RF) [12, 13, 14] and phenology-based methods (e.g., simple threshold-based methods) [15, 16, 17]. Although these methods have contributed to practical crop mapping tasks, they still have some disadvantages, as described below.
First, traditional supervised machine learning algorithms depend on localized ground samples, limiting model transferability and product timeliness. Models trained on localized ground samples often fail to capture the essential change patterns of land cover, and their performance tends to degrade significantly when they are applied to target regions that are geographically distant from the reference area and lack ground truth data [18, 19, 20, 21, 22], because similar land cover types may exhibit distinct phenological behaviors under different regional environmental conditions [23, 24, 25]. Additionally, collecting ground samples of non-target crops (i.e., negative samples) is also relatively challenging, as these samples must cover a wider range of land cover types than the target crop to ensure sufficient diversity and representativeness for robust model training. The issue becomes more pronounced in deep learning models with strong feature extraction capabilities. These challenges indicate that traditional supervised machine learning methods alone are insufficient for timely updates and spatial scalability of crop mapping data products [26, 27, 28, 29].
Second, simple phenology-based methods can only extract pixels that exhibit obvious features consistent with the prior phenological information of the target crop during specific growth stages. This is mainly because these methods are built on prior vegetation index thresholds corresponding to the key growth stages of crops [30, 31, 32]. As a consequence, they tend to focus solely on surface-level spectral characteristics of land cover at a specific observation period (i.e., in multi-temporal data, land cover features are typically treated as temporally independent), while ignoring the temporal variation patterns of land cover across different observation periods. Therefore, while simple phenology-based methods avoid the need for manually annotated samples, they are less applicable in regions characterized by diverse crop types and overlapping phenological stages.
In light of this, accurately capturing the temporal phenology patterns of target crops becomes essential for timely updates and spatial expansion of crop mapping data products, particularly under conditions of limited ground sample availability. To address this challenge and better incorporate the time-varying growth characteristics of crops, various time series similarity measurement methods, originally developed in the speech recognition community, have been adapted for use in the agricultural remote sensing community [33, 34, 35]. Compared with simple phenology-based methods, these approaches have demonstrated improved crop mapping performance and achieved results comparable to those of traditional supervised machine learning models [36, 37, 38]. Among them, Dynamic Time Warping (DTW) is one of the most representative techniques [33, 34, 38, 39].
However, when DTW is directly applied to crop phenological time series, it can result in excessive stretching or compression along the temporal axis, thereby limiting its effectiveness in crop mapping. To overcome this limitation, the method has been further refined in the agricultural remote sensing community through the introduction of time weighting, resulting in the Time Weighted Dynamic Time Warping (TWDTW) approach, which retains the nonlinear alignment capability of DTW while effectively capturing the seasonal dynamics of crop growth [40]. For example, Shen et al. applied TWDTW combined with Landsat and Sentinel imagery to produce a high-resolution (30 m) maize distribution map across 22 major provinces in China, achieving an overall accuracy of 79%, with user accuracy and producer accuracy of 81.59% and 76.15%, respectively, demonstrating the practical utility and effectiveness of TWDTW for large-scale crop monitoring [41]. Nonetheless, it is worth noting that most related studies rely on crop time series features covering the full season for analysis, often overlooking the potential of phenological features during key growth stages [
34,
42,
43,
44]. While full-season time series provide more comprehensive phenological information, they also increase algorithmic complexity and may reduce transferability due to interannual phenological variations. Moreover, for the input features of TWDTW, either only single-source features are used, or the selection of multi-temporal and multi-source features is driven by empirical rules [
40,
43,
44,
45]. The potential benefits of systematically optimizing features at each observation time to improve TWDTW performance have largely been overlooked [
46,
47,
48,
49]. For example, Chaves et al. applied TWDTW to full-season MODIS time series to map interannual cropping practice changes in the Brazilian Cerrado, without considering feature optimization at key growth stages [
48]. Similarly, Wei et al. conducted early-season crop mapping in Northeast China using Sentinel-2 time series, where feature selection was based on commonly used spectral bands and vegetation indices rather than systematic optimization [
49].
Consequently, in the case of limited ground sample availability, this study took maize, one of the three major staple crops, as the target crop, and systematically explored the potential of multi-source remote sensing images (i.e., Sentinel-1 and Sentinel-2 images) covering key phenological periods of maize to drive TWDTW for maize mapping at a regional scale. Meanwhile, in order to provide a benchmark, we selected the traditional supervised machine learning model (i.e., RF) and the temporal deep learning model (i.e., Long Short-Term Memory, LSTM) for comparison, since their robust performances for crop mapping have been verified [
14,
50,
51,
52]. Then, the advantages of using TWDTW driven by the optimal multi-source time series features were evaluated from multiple perspectives, including optimization of feature and temporal combinations from multi-source remote sensing images, computational complexity, and maize mapping results. The specific objectives of this study are as follows: (1) to propose a strategy for identifying the optimal multi-source time series features for maize; (2) to evaluate, based on optimal multi-source features with different temporal lengths, the comparative performance of TWDTW against commonly used temporal machine learning models in terms of classification accuracy and computational efficiency; (3) to validate the applicability of TWDTW driven by the optimal multi-source time series features across different years.
Meanwhile, the following three research questions (RQs) can be addressed based on the three objectives outlined above:
(RQ1) What are the key time series and feature combinations used for maize mapping based on Sentinel-1/Sentinel-2 images?
(RQ2) How did TWDTW performance differ when driven by Sentinel-1/Sentinel-2 images covering key maize phenological periods versus the full season?
(RQ3) How accurate were the maize mapping results produced by TWDTW using Sentinel-1/Sentinel-2 images covering key maize phenological periods across different years?
3. Methodology
An overview of the framework investigated in this study, which consists of three principal modules, is shown in Figure 8: (1) optimal multi-source time series feature determination, (2) performance comparison between RF, LSTM, and TWDTW driven by different feature sets, and (3) TWDTW-based maize mapping using optimal time series features. The details of each module are presented in the following sections.
All analyses and machine learning models in this study were conducted in the PyCharm environment using Python 3.8. The TWDTW method was implemented with customized code. Throughout the analysis, we primarily employed commonly used libraries including NumPy 2.2.6, Rasterio 1.4.3, GeoPandas 1.1.1, Scikit-learn 1.7.1, PyTorch 2.8.0+cpu, Joblib 1.5.2, and TQDM 4.67.1.
3.1. Determination of Optimal Multi-Source Time Series Features
3.1.1. Determination of the Optimal Multi-Source Features in Different Observation Dates Using Global Search Strategy
A quantitative separability indicator-based global search strategy was used to determine the optimal multi-source features for each observation date. This approach could ensure that the selected feature combinations consistently provide high separability between maize and non-maize across different observation dates. The specific determination of the optimal multi-source features for each observation date is illustrated in
Figure 9. For each observation date, all possible feature combinations were generated. The separability indicators of these combinations were then calculated to identify the optimal multi-source features for that specific date. Finally, a comprehensive analysis was conducted on the optimal multi-source feature combinations identified for each observation date. The feature combinations that most frequently yielded the highest separability indicators across all observation dates were identified as the most representative for distinguishing maize from non-maize, and considered the optimal multi-source features.
Moreover, the Jeffries–Matusita (JM) distance was used in this study as the separability indicator, since it can overcome the limitations of traditional metrics in high-dimensional feature spaces, can effectively measure the differences between the joint probability distributions of multiple classes, and has demonstrated robust performance in the agricultural remote sensing community [55, 56]. The specific calculation formulas are given in Equations (3) and (4), where $m_i$ represents the mean feature vector of class $i$, $\Sigma_i$ denotes the covariance matrix of class $i$, and the superscript T denotes the matrix transpose. The JM distance ranges over [0, 2]: two classes are completely separable when it equals 2 and completely confused when it equals 0.
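For reference, under a Gaussian class model the JM distance is commonly written as $JM = 2(1 - e^{-B})$, where $B = \frac{1}{8}(m_i - m_j)^{T}\left(\frac{\Sigma_i + \Sigma_j}{2}\right)^{-1}(m_i - m_j) + \frac{1}{2}\ln\frac{\left|(\Sigma_i + \Sigma_j)/2\right|}{\sqrt{|\Sigma_i||\Sigma_j|}}$ is the Bhattacharyya distance. This form matches the symbols and the [0, 2] range described above, and it is assumed here to be what Equations (3) and (4) express. The following minimal Python sketch computes it and runs an exhaustive search over feature subsets for a single observation date. The function names `jm_distance` and `best_feature_subset` and the array layout are illustrative assumptions, not the code used in this study.

```python
import itertools

import numpy as np


def jm_distance(x_a, x_b):
    """JM distance between two classes, assuming Gaussian class distributions.

    x_a, x_b: 2-D arrays of shape (n_samples, n_features), one per class.
    """
    m_a, m_b = x_a.mean(axis=0), x_b.mean(axis=0)
    # np.cov returns a 0-d array for a single feature, so force 2-D.
    s_a = np.atleast_2d(np.cov(x_a, rowvar=False))
    s_b = np.atleast_2d(np.cov(x_b, rowvar=False))
    s = 0.5 * (s_a + s_b)
    diff = m_a - m_b
    maha = diff @ np.linalg.solve(s, diff)
    # Log-determinants are numerically safer than determinants.
    _, logdet_s = np.linalg.slogdet(s)
    _, logdet_a = np.linalg.slogdet(s_a)
    _, logdet_b = np.linalg.slogdet(s_b)
    bhattacharyya = maha / 8.0 + 0.5 * (logdet_s - 0.5 * (logdet_a + logdet_b))
    return 2.0 * (1.0 - np.exp(-bhattacharyya))


def best_feature_subset(x_maize, x_other, names):
    """Exhaustive ("global") search over all non-empty feature subsets."""
    best_score, best_names = -1.0, ()
    for k in range(1, len(names) + 1):
        for idx in itertools.combinations(range(len(names)), k):
            cols = list(idx)
            score = jm_distance(x_maize[:, cols], x_other[:, cols])
            if score > best_score:
                best_score, best_names = score, tuple(names[i] for i in idx)
    return best_score, best_names
```

With 16 candidate features the search visits 2^16 − 1 = 65,535 subsets per observation date, which is small enough to enumerate directly. Strongly collinear features can make the covariance matrices singular, which this sketch does not guard against.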
3.1.2. Determination of the Optimal Multi-Source Time Series Features Using Global Search Strategy
Based on the optimal multi-source feature combination identified in Section 3.1.1, the optimal multi-source time series features were subsequently derived. Specifically, each observation date was used as a start point, with the sequence length progressively extended until the final observation date (i.e., the 12th date) was included. Figure 10 illustrates this process using the first observation date as the start point. On this basis, all Sentinel-1 images within the observation range of each Sentinel-2 time series combination were used together with the corresponding Sentinel-2 images, forming multi-source time series datasets. The same process applies when other observation dates are used as the start point. For each time series combination, the JM distance between maize and non-maize was then assessed using the optimal multi-source feature combination identified in Section 3.1.1. Consequently, by combining the optimal multi-source features across the shared observation dates in this way, the optimal recognition period for maize mapping could be identified.
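The enumeration of candidate time series described above can be sketched as follows. This is an illustration under stated assumptions: windows are indexed by 1-based Sentinel-2 observation dates, a window of length one (the start date alone) is counted as the first step, and the helper names `candidate_windows` and `s1_within_window` are hypothetical.

```python
from datetime import date


def candidate_windows(n_dates=12):
    """All (start, end) pairs: every start date, extended step by step
    until the final observation date is included (1-based, inclusive)."""
    return [(start, end)
            for start in range(1, n_dates + 1)
            for end in range(start, n_dates + 1)]


def s1_within_window(s2_dates, s1_dates, start, end):
    """Sentinel-1 acquisitions falling inside the date range spanned by the
    Sentinel-2 window [start, end] (1-based indices into s2_dates)."""
    first, last = s2_dates[start - 1], s2_dates[end - 1]
    return [d for d in s1_dates if first <= d <= last]
```

For 12 observation dates this yields 12 + 11 + ... + 1 = 78 candidate windows, each of which would then be scored with the JM distance.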
3.2. Performance Comparison Between RF, LSTM, and TWDTW Driven by Different Feature Sets
Based on the full-season optimal multi-source features and the constructed optimal multi-source time series features, this study compared the differences in maize mapping performance between TWDTW, RF, and LSTM.
For the supervised RF and LSTM, 10% of the maize and non-maize pixel samples from 2021, corresponding to a total of 160,833 pixels, were randomly selected as the training set to build the models. Data from both 2020 and 2021 were used to evaluate model performance, as comprehensive ground reference datasets were available for these two years. Additionally, the model structure and parameter settings were the same as those reported in the references [
57,
58].
For the TWDTW, the algorithm achieves alignment of temporal sequences by nonlinearly warping the time axis and incorporating time-based weights, as shown in
Figure 11. This helps to more accurately capture the seasonal growth patterns shared between the standard temporal curve of the target crop and the temporal curve of the pixels to be classified. For the time-weighted component, a Gaussian function was employed to penalize the time offset (i.e., excessive distortion on the time axis). The specific calculation formula is explained in Equation (5):
where t is the time offset, i.e., the time difference between two time points in the sequence; μ is the central time point, which is usually set to 0 and indicates an ideal match in time; and σ is the standard deviation, which is related to the data density of the time series. σ is typically set to 1.2 to 2 times the median time interval between adjacent time points, which controls the decay rate of the time weight and improves adaptability to different time series distributions.
Moreover, unlike TWDTW driven by a single feature, the TWDTW employed in this study is driven by the optimal multi-source time series features, allowing for a more comprehensive characterization of temporal dynamics, as illustrated in Figure 12. Specifically, multiple standard time series curves for maize were generated by averaging the values of maize samples for each time series feature included in the optimal multi-source time series features. The individual TWDTW distance between the pixel to be classified and the corresponding standard maize time series curve was then calculated for each time series feature. The final TWDTW distance was computed by summing these individual TWDTW distances over all time series features. A smaller TWDTW distance indicates a higher similarity between the pixel and the standard time series curves, and thus a greater likelihood that the pixel belongs to the maize category.
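The per-feature matching and summation can be sketched in Python as below. Several details are assumptions made only for illustration: the Gaussian time penalty is taken as 1 − exp(−(t − μ)² / (2σ²)), so that a zero offset adds no cost; the local cost is the absolute difference between feature values; the alignment uses the classic DTW recursion with fixed start and end points; and the features are assumed to be on comparable scales. The function names are hypothetical and this is not the customized code used in this study.

```python
import numpy as np


def gaussian_time_penalty(dt, sigma, mu=0.0):
    """Assumed penalty: 0 when dt == mu, approaching 1 for large offsets."""
    return 1.0 - np.exp(-((dt - mu) ** 2) / (2.0 * sigma ** 2))


def twdtw_distance(ref, ref_t, seq, seq_t, sigma):
    """Time-weighted DTW distance between one reference curve and one pixel."""
    ref, seq = np.asarray(ref, float), np.asarray(seq, float)
    ref_t, seq_t = np.asarray(ref_t, float), np.asarray(seq_t, float)
    # Local cost = value difference + penalty on the time offset.
    cost = (np.abs(ref[:, None] - seq[None, :])
            + gaussian_time_penalty(ref_t[:, None] - seq_t[None, :], sigma))
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]


def multi_feature_twdtw(ref_curves, pixel_curves, times, sigma):
    """Sum of per-feature TWDTW distances.

    ref_curves, pixel_curves: arrays of shape (n_features, n_dates).
    """
    return sum(twdtw_distance(r, times, p, times, sigma)
               for r, p in zip(ref_curves, pixel_curves))
```

Following the guidance on σ given above, a value such as `sigma = 1.5 * np.median(np.diff(times))` falls inside the stated 1.2 to 2 range, with `times` expressed, for example, in days.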
In this study, the standard maize time series curve constructed from 2021 was used as a reference for cross-year mapping. Time series data from other years were directly matched against this curve to perform maize mapping, which could be used to evaluate the temporal generalization ability of the TWDTW method.
3.3. Maize Mapping via TWDTW Driven by Optimal Multi-Source Time Series Features
The optimal standard time series curves for maize, constructed in 2021, were used as the benchmark. Subsequently, based on TWDTW and optimal multi-source time series features, maize maps of Yangling District were generated for the years 2020 to 2023. Meanwhile, the extraction results were assessed from both spatial and statistical perspectives, including spatial consistency and extracted area-based comparisons.
For maize spatial distribution verification, the evaluation indicators were calculated by comparing the mapping results with the ground reference datasets. The indicators mainly included user accuracy, producer accuracy, and F1-score, among others; please refer to [59] for the specific mathematical expressions. For verification of the maize area statistics, the average relative error was calculated by comparing the maize area derived from Sentinel-1/2 images with the maize area recorded in the statistical data.
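As a quick reference, the indicators named above follow their standard definitions for a binary maize/non-maize confusion matrix. The sketch below is illustrative only: the function names are hypothetical, and the average relative error is assumed to be the mean of absolute relative differences over the compared years.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard accuracy indicators for the maize class.

    tp: maize mapped as maize, fp: non-maize mapped as maize,
    fn: maize mapped as non-maize, tn: non-maize mapped as non-maize.
    """
    user_acc = tp / (tp + fp)        # 1 - commission error
    producer_acc = tp / (tp + fn)    # 1 - omission error
    f1 = 2 * user_acc * producer_acc / (user_acc + producer_acc)
    overall_acc = (tp + tn) / (tp + fp + fn + tn)
    return {"UA": user_acc, "PA": producer_acc, "F1": f1, "OA": overall_acc}


def average_relative_error(mapped_areas, reported_areas):
    """Mean of |mapped - reported| / reported over all compared years."""
    errors = [abs(m - r) / r for m, r in zip(mapped_areas, reported_areas)]
    return sum(errors) / len(errors)
```

For instance, `binary_metrics(tp=90, fp=10, fn=30, tn=870)` uses made-up counts in which non-maize pixels dominate, so the overall accuracy is driven mostly by the non-maize class.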
4. Experimental Results and Discussions
4.1. Optimal Multi-Source Time Series Features of Sentinel-1 and Sentinel-2 Images for Maize Mapping
(RQ1) In order to construct the optimal multi-source time series features of Sentinel-1 and Sentinel-2 images for maize mapping, the two-step JM distance-based global search strategy was used to select optimal multi-source features at each observation date and to build optimal multi-source time series features by incorporating temporal information. Subsequently, the standard time series curve for maize was established using the constructed dataset.
4.1.1. Determination of the Optimal Features of Sentinel-1/2 Images Across Multiple Observation Dates
Based on the framework presented in
Section 3.1.1, the optimal features of Sentinel-1/2 across multiple observation dates were systematically selected. The JM distances from different feature combinations for each observation date were calculated. Due to the large number of feature combinations, only the top eight groups with the highest JM distance values for each observation date are displayed (i.e., 26 feature combinations, as listed in
Table 2), and their JM distances calculated from different observation dates are shown in
Figure 13. Different colors represent the values of JM distance calculated from different feature combinations.
It can be seen that the feature combination including all features presented in this study (Feature combination 1) most frequently yielded the highest JM distance across multiple observation dates (Figure 13). These features include twelve multispectral bands and two vegetation indices (NDVI and EVI) from Sentinel-2 images, as well as two polarization features (VV and VH) from Sentinel-1 images. In addition, it is worth noting that the JM distances calculated from different feature combinations on the fifth and sixth observation dates, which correspond to the early jointing stage of maize, were significantly higher than those from other observation dates. This is mainly because maize undergoes rapid changes in its external morphological characteristics during the early jointing stage, particularly in plant height and leaf expansion. Therefore, at this phenological stage, spectral signals from other land cover types are primarily governed by surface diffuse reflection from the soil, whereas maize exhibits strong volumetric scattering due to its developing canopy structure. This contrast results in increased spectral separability between maize and other land cover types.
In summary, the combination of twelve spectral bands, NDVI, EVI, and VV and VH polarization features represented the optimal multi-source features for distinguishing maize from non-maize throughout the maize growth period. Therefore, based on this feature combination, this study further explored the optimal temporal combination for maize discrimination.
4.1.2. Determination of the Optimal Multi-Source Time Series Features of Sentinel-1/2 Images
Based on the optimal feature combinations obtained from
Section 4.1.1 and the framework presented in
Section 3.1.2, the optimal multi-source time series features of Sentinel-1/2 were selected. The JM distances derived from the optimal multi-source features constructed over different temporal windows corresponding to various growth stages of maize are shown in
Figure 14. The curves in different colors represent the JM distance distributions calculated from time series that gradually incorporate data from various initial observation dates to the maturity stage of maize.
It can be seen that the JM distance was not the highest when images covering the entire maize growth cycle were used (
Figure 14). This may be attributed to the presence of non-maize crops in the region that share similar phenology stages with maize, especially during the early growth stages, which limits the separability between maize and non-maize even with the introduction of multi-temporal data. This also indicates that directly using data spanning the entire growth cycle of maize may not be optimal for accurate maize mapping. In contrast, when only the optimal multi-source features from Sentinel-1/2 images during the mid-to-late growth stages were used, the JM distance between maize and non-maize reached its maximum (i.e., 2) and tended to stabilize. Specifically, based on the optimal multi-source features, when the fifth observation date was used as the starting point, the JM distance between maize and non-maize was effectively improved and remained constant at 2 as optimal multi-source features from subsequent dates were progressively added to the time series. However, when the time series features exclude the fifth observation date, the separability decreases significantly. This indicates the importance of multi-source features from the fifth observation date (i.e., the observation dates aligned with the maize jointing stage) in accurately distinguishing maize from non-maize. Moreover, when the 10th observation date was used as the starting point, and subsequent data after the 10th observation date were incorporated into the time series feature set, the JM distance exhibited a declining trend. This means that the data from observation dates after the 10th observation date had little impact on distinguishing maize from non-maize.
Therefore, to utilize shorter yet effective time series features, this study selected the optimal multi-source features from the 5th to 10th observation dates, corresponding to the period from late July to mid-September. This period aligns with the jointing to tasseling stages of maize. The observation dates of the final optimal multi-source time series features are shown in
Figure 15. Compared with using full-season time series data, this temporal combination not only maintained a high level of class separability but also reduced redundant observations. Further details and validations are presented in the subsequent sections.
4.1.3. Construction and Analysis of Standard Time Series Curves Based on the Optimal Multi-Source Time Series Features from Sentinel-1/2 Images
Based on the optimal multi-source time series features identified in
Section 4.1.1 and
Section 4.1.2, sixteen standard maize time series curves were constructed, each covering six observation dates spanning from the jointing to tasseling stages. To improve the robustness and representativeness of the standard time series curves, maize sample pixels were further screened, rather than relying solely on the mean values of each feature across all samples.
Specifically, the data from 2021 were taken as an example. For each observation date, the initial mean and a two-standard-deviation range for each feature were derived from the 98,982 maize pixels in the reference data. These statistics were used to represent the overall distribution of maize features in the study area. Subsequently, to remove potential outliers and reduce extreme variability, only maize pixels with feature values falling within ±1 standard deviation of the mean were retained, resulting in 73,682 pixels. Finally, the mean values of the different features were recalculated from the filtered data to construct the optimized maize standard time series curves. The changes in the maize standard time series curves for the different features before and after optimization in Yangling District in 2021 are shown in Figure 16.
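The screening step can be illustrated with a short NumPy sketch for a single feature. It is a simplification under an explicit assumption: values are screened independently at each observation date, whereas the text describes retaining or discarding whole pixels, and the exact pixel-level rule is not specified here. The function name `filtered_standard_curve` is hypothetical.

```python
import numpy as np


def filtered_standard_curve(samples, k=1.0):
    """Mean curve of one feature after discarding values outside mean ± k·std.

    samples: array of shape (n_pixels, n_dates) holding maize sample values.
    Returns an array of shape (n_dates,).
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    inside = np.abs(samples - mean) <= k * std
    # Mask rejected values, then average what remains at each date.
    kept = np.where(inside, samples, np.nan)
    return np.nanmean(kept, axis=0)
```

Because at least one value always lies within one standard deviation of the mean, every date keeps at least one sample and the recomputed mean is always defined.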
It can be seen that this data filtering method could reduce the fluctuation of pixel features used to construct the maize standard time series curve, without substantially altering its mean values (
Figure 16). This indicates that the method can effectively suppress the influence of outliers and noise, thereby improving the stability and representativeness of the resulting time series curves, which is crucial for improving the accuracy of TWDTW-based crop mapping. Furthermore, the minimal change in mean values before and after optimization provides a certain degree of validation for the reliability of the overall maize distribution in the open-source reference data. This also indicates that constructing a maize standard curve based on the open-source reference data used in this study is feasible.
4.2. Comparison Between TWDTW and Traditional Supervised Machine Learning Methods for Maize Mapping Across Different Time Series Lengths
(RQ2) Based on the full-season optimal multi-source features and the constructed optimal multi-source time series features, this study compared the performance of TWDTW, RF, and LSTM methods in maize mapping within Yangling District, focusing on both mapping accuracy and computational complexity.
4.2.1. Comparison of Maize Mapping Methods Using Full-Season Optimal Multi-Source Features
The TWDTW, traditional supervised RF, and LSTM methods, driven by the full-season optimal multi-source features, produced maize mapping results for Yangling District across different years, as shown in
Figure 17. Their corresponding accuracy evaluation results for different years are presented in
Table 3, and the detailed confusion matrices can be found in
Tables S1 and S2 of the Supplementary Material.
All three methods could effectively extract the spatial distribution of maize in the study area for both 2020 and 2021 (
Figure 17 and
Table 3). In particular, the TWDTW method, driven by the full-season optimal multi-source features, significantly outperformed the supervised RF and demonstrated certain advantages over the supervised LSTM in terms of mapping accuracy and stability. Specifically, for the maize mapping in 2021, the TWDTW achieved an overall accuracy of 98.01%, with a user accuracy of 97.65%, a producer accuracy of 89.92%, and an F1 score of 0.9369. Compared with the supervised RF and LSTM, these indicators showed improvements of 2.46% and 7.88% in user accuracy, 10.99% and 4.76% in producer accuracy, and 0.0739 and 0.0629 in F1 score, respectively. The superior performance of TWDTW was also evident in the maize mapping results for 2020, further confirming its robustness across different temporal contexts. However, it was worth noting that the overall accuracies of TWDTW were slightly lower than those of the supervised RF and LSTM models. This was mainly due to the significant class imbalance in the study area, where non-maize pixels greatly outnumbered maize pixels. As a result, when the model tends to classify more pixels as non-maize, the overall accuracy becomes higher.
Moreover, representative maize-intensive and maize-sparse regions in Yangling District were selected for visual comparison to assess the effectiveness of TWDTW driven by full-season optimal multi-source features in mapping spatial details, as shown in
Figure 18 and
Figure 19.
For maize mapping across different planting regions and years, it can be seen that in both maize-intensive and maize-sparse regions, the supervised RF and LSTM models performed well in extracting large-scale contiguous maize planting regions (Figure 18 and Figure 19). However, under the influence of mixed pixels along field boundaries, noticeable boundary misclassification and salt-and-pepper noise were observed in the mapping results of RF and LSTM. In contrast, TWDTW effectively preserved field integrity with fewer boundary errors and significantly reduced salt-and-pepper noise, thereby producing the most accurate maize maps across different planting regions in Yangling District. This is mainly because, unlike the supervised RF and LSTM models that rely primarily on spectral values, TWDTW exploits the temporal trajectory of crop growth and aligns it with standard crop time series curves. This trajectory-based comparison emphasizes the temporal consistency and shape of vegetation dynamics, thereby reducing the sensitivity to spectral distortions introduced by non-crop components within mixed pixels. As a result, TWDTW can partially overcome the effects of mixed pixels.
In summary, compared with the supervised RF and LSTM, which are commonly used in the agricultural remote sensing community, TWDTW driven by the full-season optimal multi-source features can also be applied to maize mapping and demonstrates certain advantages in mapping performance. However, the use of full-season optimal multi-source features may increase the computational complexity of TWDTW and introduce a certain amount of redundant information. Therefore, in the following section, we further explore the maize mapping performance of TWDTW driven by optimal multi-source time series features.
4.2.2. Comparison of Maize Mapping Methods Using Optimal Multi-Source Time Series Features
Moreover, the TWDTW, traditional supervised RF, and LSTM methods, driven by the optimal multi-source time series features, produced maize mapping results for Yangling District across different years, as shown in
Figure 20. Their corresponding accuracy evaluation results for different years are presented in
Table 4, and the detailed confusion matrices can be found in
Tables S3 and S4 of the Supplementary Material.
It can be seen that all three methods, driven by optimal multi-source time series features, were still able to effectively extract the spatial distribution of maize in the study area for both 2020 and 2021 (Figure 20 and Table 4). Similarly, the TWDTW method, driven by optimal multi-source time series features, significantly outperformed the supervised RF and LSTM in terms of mapping accuracy and stability. Specifically, for maize mapping in 2021, the overall accuracy, user accuracy, producer accuracy, and F1 score of TWDTW reached 99.61%, 99.95%, 92.91%, and 0.9630, respectively. Notably, the user accuracy, producer accuracy, and F1 score of TWDTW showed greater improvements than those of the supervised RF and LSTM. Its superior performance could also be observed in the maize mapping results for 2020. In addition, compared with the methods driven by full-season optimal multi-source features, those using the optimal multi-source time series features further improved maize mapping performance, with the improvement being most pronounced for TWDTW. These results confirmed the effectiveness of the temporal features selected in this study.
Moreover, to further evaluate the spatial details of maize mapping, different representative maize planting regions in Yangling District, selected in
Section 4.2.1, were used for visual comparison to assess the effectiveness of TWDTW driven by optimal multi-source time series features, as shown in
Figure 21 and
Figure 22.
For maize mapping across different planting regions and years, it can be seen that the performance of the three methods driven by optimal multi-source time series features was generally consistent with that of the same methods driven by full-season optimal multi-source features in both maize-intensive and maize-sparse regions (
Figure 21 and
Figure 22). RF and LSTM exhibited considerable misclassification in small and fragmented maize planting areas, whereas TWDTW produced the most accurate mapping results in these regions.
In summary, although RF and LSTM can also achieve good maize mapping results, they rely heavily on a large number of manually annotated ground samples across various land cover categories. This dependency limits their transferability and timeliness, especially in complex environments characterized by diverse non-target classes (e.g., bare soil, various non-target crops, and other types of vegetation). In contrast, TWDTW can classify pixels by matching them with the standard time series curve of the target crop, even with limited samples of the target crop. This process does not require the collection of non-target crop samples, which greatly facilitates practical crop mapping. In addition, it is worth noting that, compared with the methods driven by full-season optimal multi-source features, those using optimal multi-source time series features achieved improved maize mapping performance while requiring less input data. This also reduces the computational complexity of the methods, as examined in Section 4.2.3.
4.2.3. Comparison of Computational Complexity Between Full-Season and Optimal Time Series Schemes
For practical crop mapping tasks, while high mapping accuracy is important, computational complexity ultimately determines a model’s feasibility for large-scale crop mapping. Consequently, we further compared computational complexity for models driven by optimal multi-source features under the full-season and optimal time series schemes, as listed in
Table 5.
It can be seen that the inference time of all models was significantly reduced when using the optimal multi-source features from the optimal time series (
Table 5). Specifically, for TWDTW, its inference time dropped from 154.22 min to 38.73 min, a reduction of approximately 74.9%. For the supervised RF and LSTM, their inference times dropped by about 24.9% and 26.4%, respectively. The reduced inference time of the RF model using optimal multi-source time series features could be attributed to a 43% reduction in parameters compared to the version based on full-season optimal multi-source features, resulting in significantly lower computational complexity. For the LSTM model, the number of parameters remained the same across the two feature modes, as it primarily depends on the input feature dimension (i.e., 16) at each time step, the hidden state size, and the number of LSTM layers, all of which were kept constant in this study. However, the longer time sequence in the full-season scheme increased computational complexity, resulting in a longer inference time. For the TWDTW, a non-parametric method with no trainable parameters, the computational cost is primarily determined by the time series length (n) and feature dimension (d), with a complexity of O(n²·d). Consequently, when optimal multi-source features from the reduced time series were used, the computational cost of TWDTW could theoretically be reduced to 25% of that under the full-season scheme, as the time series length was halved while the feature dimension per observation date remained constant at 16. Combined with the mapping performance results in
Section 4.2.1 and
Section 4.2.2, this further confirms the practical value of TWDTW constructed with optimal multi-source features from the optimal time series for large-scale crop mapping tasks.
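The arithmetic behind these figures can be checked with a short sketch. The inference times and the feature dimension of 16 are taken from the text; the full-season series length `n_full`, the LSTM hidden size, and the convention of one bias vector per gate in the parameter count are assumptions made for illustration only.

```python
def relative_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100.0


def twdtw_cost(n, d):
    """Operation count proportional to n^2 * d for series length n and d features."""
    return n * n * d


def lstm_param_count(input_dim, hidden, layers=1):
    """Trainable parameters of a stacked LSTM (four gates, one bias vector per gate).

    The sequence length is not an argument: it affects run time, not model size.
    """
    total = 0
    for layer in range(layers):
        in_dim = input_dim if layer == 0 else hidden
        total += 4 * (hidden * (in_dim + hidden) + hidden)
    return total


# Measured TWDTW inference times from the text (minutes).
measured_reduction = relative_reduction(154.22, 38.73)
measured_ratio = 38.73 / 154.22

# Theoretical ratio when the series length is halved and d stays at 16.
n_full = 20  # hypothetical full-season length; the ratio is the same for any even n
theoretical_ratio = twdtw_cost(n_full // 2, 16) / twdtw_cost(n_full, 16)

# Same parameter count under both schemes (the hidden size of 64 is hypothetical).
lstm_params = lstm_param_count(input_dim=16, hidden=64)

print(round(measured_reduction, 1), round(measured_ratio, 3), theoretical_ratio)
```

The measured ratio of about 0.251 is close to the theoretical value of 0.25, which is consistent with the quadratic dependence of the TWDTW cost on the time series length.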
In general, the incorporation of optimal multi-source time series features effectively improved mapping performance while substantially lowering computational costs compared to the full-season scheme, particularly for the TWDTW, which relies less on ground samples (i.e., only representative samples of target crops are needed, without requiring samples of complex non-target crop categories).
4.3. Maize Mapping for Yangling District via TWDTW Driven by Optimal Multi-Source Time Series Features
(RQ3) The optimal multi-source time series features from different years were input into the TWDTW model calibrated using 2021 data, to generate 10 m maize maps of Yangling District from 2020 to 2023. They were validated from the perspectives of maize spatial distribution and maize area statistics.
4.3.1. Comparison Between the Mapping Results and Reference Datasets
The specific maize mapping results of Yangling District for different years are shown in
Figure 23, and their accuracies, evaluated against ground reference data, are summarized in
Table 6, and the detailed confusion matrices can be found in
Table S5 of the Supplementary Material. It can be seen that the spatial distribution of maize cultivation in Yangling District remained relatively stable from 2020 to 2023, with cultivation primarily concentrated in the central and southern areas (
Figure 23). The TWDTW-based maize maps, driven by optimal multi-source time series features, maintained high accuracy across the four years (
Table 6). Specifically, the overall classification accuracies consistently remained above 99%, exhibiting minimal interannual variation. Additionally, the user’s accuracy, producer’s accuracy, and F1 score for maize were all above 85% in every year, with the difference between user’s and producer’s accuracy remaining within a 10% range. This indicates that both omission and commission errors for maize were limited. However, the producer’s accuracy for maize in 2023 dropped to 88.09%, possibly due to limitations in the accuracy of the visually interpreted reference samples and interannual variations in crop phenology (i.e., crop phenology differs to some extent between years, especially when the years are widely spaced), which together led to misclassification in a small number of areas.
Moreover, visual analysis of maize extraction results from local areas in Yangling District across different years was conducted to assess the robustness of the TWDTW method, driven by optimal multi-source time series features, in capturing local details. Specifically, the visually interpreted regions, as described in
Section 2.2.4, from 2022 and 2023 were used as reference areas to analyze the local maize mapping results over these years. For 2020 and 2021, the mapping results were directly validated using open-source ground reference data within the areas interpreted in 2022 and 2023. The specific maize mapping results across different years for this local area are shown in
Figure 24.
It can be seen that the maize mapping results generated by the TWDTW method, driven by optimal multi-source time series features, demonstrated relatively stable performance in the same local areas over multiple years, with minimal classification error (
Figure 24). In particular, the maize mapping results for 2020 and 2021 better preserved field integrity, with fewer boundary errors, whereas slightly more misclassifications occurred along field edges in 2022 and 2023, especially in 2023, when this type of error was relatively more frequent. A possible reason is that, because the maize standard curve was constructed using data from 2021, the phenological differences relative to this curve were smaller in 2020 and 2021, leading to better alignment between the standard curve and the observed temporal trajectories. This allowed TWDTW to discriminate field boundaries (i.e., mixed pixels) more accurately, whereas larger phenological shifts in 2022 and 2023 resulted in slightly higher boundary misclassification.
In general, from the perspective of maize spatial distribution, the TWDTW driven by optimal multi-source time series features can effectively extract the maize planting areas across different years in Yangling District.
4.3.2. Comparison Between the Extracted Maize Area and Subnational Statistics Data
Moreover, maize areas extracted from Sentinel-1/2 images across different years were compared with those recorded in statistical data, as reported in
Table 7.
Based on the TWDTW driven by optimal multi-source time series features, the four-year average relative error between the maize planting area of Yangling District derived from Sentinel-1/2 images and the statistical data was 6.61% (
Table 7). This discrepancy may be partly due to differences in the phenological periods of maize between 2021 and 2023. Therefore, when the maize standard curve constructed in 2021 was directly used for 2023, even with optimal multi-source time series features covering key growth periods, the mapping performance of TWDTW could still be limited to some extent.
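For clarity, the area comparison can be sketched as follows. The sketch assumes that the relative error is the absolute difference between the mapped and statistical areas divided by the statistical area, averaged over years; the area values are hypothetical placeholders and are not the values in Table 7.

```python
def relative_error(mapped_area, reference_area):
    """Absolute relative error (%) of the mapped area against the statistical area."""
    return abs(mapped_area - reference_area) / reference_area * 100.0


def mean_relative_error(areas_by_year):
    """Average of the yearly relative errors (%)."""
    errors = [relative_error(m, r) for m, r in areas_by_year.values()]
    return sum(errors) / len(errors)


# Hypothetical areas in hectares: year -> (mapped area, statistical area).
areas = {
    2020: (950.0, 1000.0),
    2021: (1020.0, 1000.0),
    2022: (1080.0, 1000.0),
    2023: (900.0, 1000.0),
}

yearly = {year: relative_error(m, r) for year, (m, r) in areas.items()}
average = mean_relative_error(areas)
print(yearly, average)
```

Reporting the yearly errors alongside the average helps to show whether the discrepancy grows with the temporal distance from the year used to build the standard curve.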
In summary, in terms of maize spatial distribution and area statistics, the TWDTW driven by optimal multi-source time series features robustly extracted maize planting information from Sentinel-1/2 images over multiple years. Moreover, this method demonstrated satisfactory interannual transferability when the temporal gap between years was relatively short. It holds significant practical value for obtaining crop distribution information in regions with limited sample availability.
4.4. Advantages and Limitations
Compared with existing TWDTW-based crop mapping methods [
43,
44,
45,
60], this study employed the two-step JM distance-based global search strategy to select optimal multi-source features at each observation date and to build optimal multi-source time series features by incorporating temporal information. Then, the differences between TWDTW driven by full-season optimal multi-source features, TWDTW driven by optimal multi-source time series features, and commonly used supervised methods were systematically analyzed. The analysis focused on mapping performance across different years and computational complexity, avoiding the direct use of full-season time series data or empirically selected multi-temporal and multi-source features. Finally, the maize maps produced by the optimal multi-source time series features-driven TWDTW were consistent with existing research results, as the evaluation metrics calculated based on previous maize maps all reached desirable levels [
54].
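As background to the feature selection criterion mentioned above, the following minimal sketch computes the Jeffries-Matusita (JM) distance between two classes for a given feature set. It uses the widely adopted form JM = 2(1 − exp(−B)), where B is the Bhattacharyya distance under a Gaussian assumption for each class. It does not reproduce the two-step global search strategy of this study, and the sample data are synthetic.

```python
import numpy as np


def jm_distance(x1, x2):
    """Jeffries-Matusita distance between two classes (range 0 to 2).

    x1, x2: arrays of shape (n_samples, n_features), one per class.
    Each class is assumed to follow a multivariate Gaussian distribution.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    s = (s1 + s2) / 2.0

    # Bhattacharyya distance: mean-separation term plus covariance term.
    diff = m1 - m2
    mean_term = diff @ np.linalg.solve(s, diff) / 8.0
    _, logdet_s = np.linalg.slogdet(s)
    _, logdet_s1 = np.linalg.slogdet(s1)
    _, logdet_s2 = np.linalg.slogdet(s2)
    cov_term = 0.5 * (logdet_s - 0.5 * (logdet_s1 + logdet_s2))
    b = mean_term + cov_term

    return float(2.0 * (1.0 - np.exp(-b)))


# Synthetic two-feature samples (illustrative only).
rng = np.random.default_rng(0)
maize = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
other = rng.normal(loc=10.0, scale=1.0, size=(200, 2))

jm_separable = jm_distance(maize, other)
jm_identical = jm_distance(maize, maize)
print(jm_separable, jm_identical)
```

Values close to 2 indicate that the two classes are well separated by the candidate features, whereas values close to 0 indicate that they are indistinguishable, which is why the distance can serve as a ranking criterion when comparing candidate feature combinations.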
However, there were still some limitations in this study, described as follows:
(1) In this study, optimal multi-source time series features were fused by directly stacking feature dimensions. However, high feature dimensionality may still occur even after optimization. This issue can be addressed by introducing criteria for dimensionality reduction and selecting appropriate reduction methods.
(2) In this study, the standard maize growth curve established in 2021 was directly applied to TWDTW-based cross-year mapping for other years. However, when the phenological characteristics of maize vary significantly between years, the effectiveness of this transfer may be limited, especially when the temporal intervals are long (i.e., when larger phenological shifts are more likely to occur). To address this limitation, the benchmark maize standard curve can be gradually refined by integrating multi-year sample datasets from diverse regions, which would improve the spatiotemporal generalization ability of TWDTW and enhance its robustness in large-scale regions.
(3) In this study, the advantages of the TWDTW for maize mapping, based on the constructed optimal multi-source time series features, were mainly validated in Yangling District, a plain area, while its applicability in other, more complex environments was not examined. Such environments pose additional challenges, including terrain-induced spectral variability, crop mixtures within smallholder fields, and irregular planting patterns. These issues can be further investigated through transfer experiments using the TWDTW method with optimal multi-source time series features in different regions, thereby helping to validate and enhance its generalization capacity.
5. Conclusions
In the case of limited ground sample availability, this study systematically explored the potential of multi-source remote sensing images (i.e., Sentinel-1 and Sentinel-2), acquired during key phenological stages of maize, to support TWDTW-based mapping at the regional scale. First, the optimal multi-source time series features were identified through a two-step JM distance-based global search strategy. Then, based on optimal multi-source features with different temporal lengths, a systematic comparison of maize mapping performance and computational complexity between TWDTW and commonly used supervised machine learning models in agricultural remote sensing was conducted. Finally, maize maps of Yangling District from 2020 to 2023 were produced using optimal multi-source time series features-based TWDTW. The main conclusions are described as follows:
(1) For the maize mapping tasks in Yangling District, twelve spectral bands, the NDVI, EVI, as well as VV and VH corresponding to the maize jointing to tasseling stages, were the optimal multi-source time series features.
(2) For the maize mapping tasks in Yangling District, TWDTW driven by optimal multi-source time series features not only significantly reduced computational complexity, but also outperformed both TWDTW using full-season optimal multi-source features and traditional supervised machine learning models reliant on large multi-category sample sizes.
(3) Maize maps of Yangling District from 2020 to 2023, produced using TWDTW based on optimal multi-source time series features, consistently achieved overall accuracies above 99%, with an average relative error of only 6.61% compared to statistical yearbook data.
We hope this study may provide a new and comprehensive strategy for feature optimization in maize mapping tasks. In addition, it may offer guidance for enhancing TWDTW performance in large-scale crop mapping, which is particularly important when sample availability is limited.