1. Introduction
Soybeans, as one of the most important food crops, are not only a major source of high-quality plant-based protein, but also a key raw material for the production of soybean oil and soybean meal [1,2,3,4]. Soybean meal, as a high-protein feed, is widely used in animal husbandry, particularly in the production of pork, beef, and poultry [5]. With global population growth and shifts in dietary patterns, the demand for plant-based protein and animal feed continues to rise, making soybeans an irreplaceable component in the global agricultural and food system [6,7]. In addition, soybeans are also used in the production of biodiesel, playing a crucial role in supporting the renewable energy sector [8]. China, as the fourth-largest soybean producer and the largest soybean importer in the world, plays a vital role in ensuring both food and energy security [9]. Nevertheless, due to rapid urbanization and industrialization, rising global temperatures, and the implementation of the Soybean Revitalization Plan, significant changes have occurred in soybean production in Northeast China [10]. Therefore, mapping soybean distribution in Northeast China is of great significance for safeguarding national food and energy security.
Northeast China—primarily comprising Heilongjiang, Jilin, and Liaoning provinces—is the country’s main soybean-producing region, contributing the majority of the national soybean output [11,12]. Between 2013 and 2022, soybean cultivation in this region underwent significant spatiotemporal changes. During this period, the Chinese government introduced a series of agricultural policies aimed at optimizing the agricultural structure, enhancing food security, and improving agricultural productivity [13,14]. For example, the implementation of the soybean target price subsidy policy in 2014 [15] and the launch of the “crop rotation and fallow system pilot” in 2016 had far-reaching impacts on soybean cultivation in Northeast China. Meanwhile, during the China–U.S. trade tensions—especially after 2018—China imposed tariffs on U.S. soybeans, leading to domestic soybean price fluctuations and reduced imports [16]. This further stimulated the expansion of soybean planting areas in Northeast China. However, market uncertainty and price volatility also posed risks and challenges for farmers [10]. Moreover, the increasing frequency of extreme weather events caused by climate change, such as floods and droughts, has had a non-negligible impact on soybean production [17].
In recent decades, remote sensing imagery has played a crucial role in crop identification and monitoring [18,19,20]. Many researchers have utilized remote sensing data to conduct crop mapping at both national and regional scales [21,22,23,24,25]. Traditionally, most soybean maps have been generated through crop classification using MODIS satellite data with a coarse spatial resolution of 500 m to 1 km. While daily MODIS data can provide continuous crop phenological information at high temporal resolution, its coarse spatial resolution is insufficient to accurately capture the distribution of crops in China, which is dominated by small-scale farmland. To address this challenge, a reasonable solution is to use high-spatial-resolution satellites such as Sentinel and Landsat to generate crop maps [26,27,28]. Although Landsat imagery has a lower spatial resolution than Sentinel imagery, Landsat offers more than 50 years of continuous Earth observations, making it an indispensable foundation for long-term, large-scale land use analysis.
In earlier studies, scholars typically relied on single-date or multi-temporal imagery as the primary data source for crop remote sensing classification [29,30,31]. By acquiring one or more images during the key growth stages of crops, they could conduct classification tasks. However, such approaches often involved high computational complexity, while the extracted crop feature information remained relatively limited, leading to lower classification accuracy [32]. In recent years, time-series remote sensing data have increasingly gained attention in crop identification research because of their ability to accurately capture crop growth and development processes [33,34,35]. Several studies have demonstrated that leveraging time-series data can significantly enhance both the accuracy and stability of crop classification [36,37]. Meanwhile, machine learning algorithms, with their powerful self-learning capabilities and strong generalization performance, have shown remarkable effectiveness and robustness in crop classification tasks, becoming the most widely used methodological approach [36,38,39]. Nevertheless, accurately distinguishing typical crop types such as soybean and maize in remote sensing imagery remains a substantial challenge [27].
To address the above challenges, this study proposes a simple yet effective approach that integrates remote sensing time-series image synthesis with deep learning. Although recent studies have highlighted the advantages of time-series remote sensing data in capturing crop phenological trajectories, major theoretical and empirical gaps remain. Theoretically, existing methods still struggle to fully characterize asynchronous phenological variations caused by cloud contamination, irregular image acquisition intervals, and heterogeneous farming practices. Empirically, most previous works have focused on short time spans and limited spatial extents or relied on high-quality seasonal composites that are difficult to generalize across years and regions. Furthermore, few studies have systematically evaluated how advanced deep learning architectures—particularly Transformer-based models—perform in large-scale, long-term mapping of soybean cultivation under smallholder-dominated agricultural landscapes. This study fills these gaps by developing a unified framework that combines bi-monthly Landsat NDVI maximization, time-series reconstruction, and a Transformer-based classification model. Unlike traditional unsupervised or non-temporal classification approaches, our method explicitly models multi-temporal spectral dynamics, allowing the Transformer to learn long-range temporal dependencies and phenological differences between crops. Our main contributions are as follows: (1) We collected 59,789 ground-truth samples from extensive field surveys conducted between 2017 and 2019, providing a large and reliable training dataset. (2) We developed a time-series image synthesis method tailored for large-scale soybean mapping to mitigate vegetation’s spatiotemporal variability and the effects of cloud contamination. 
(3) We proposed a Transformer-based deep learning method and conducted comparative experiments with classical machine learning and deep learning approaches to evaluate its superiority. (4) We applied the trained model to generate high-precision soybean distribution maps for Northeast China from 2013 to 2022, producing the first decade-long Landsat-based soybean time-series map for the region. The predicted soybean planting areas showed strong agreement with prefecture-level statistical yearbook data (R2 = 0.9226), demonstrating the model’s robustness and transferability across years and heterogeneous landscapes. This confirms that the proposed framework can effectively map soybean distribution even in smallholder-dominated agricultural systems and provides a valuable reference for agricultural monitoring and food security assessment.
2. Materials
2.1. Study Area
The study area is located in Northeast China, encompassing Heilongjiang Province, Jilin Province, Liaoning Province, and four prefecture-level cities in eastern Inner Mongolia. This region spans approximately 39° N to 54° N latitude and 115° E to 135° E longitude, covering a diverse range of geographical and climatic conditions. China’s agricultural natural regions are defined by temperature and humidity levels, and Northeast China comprises four such zones, as shown in Figure 1. Due to differences in water and temperature conditions, these zones vary in heat availability, growing seasons, and crop types (Table 1).
2.2. Landsat Images
The Landsat data used in this study were obtained from the United States Geological Survey (USGS) and include imagery from Landsat 1, 2, 3, 5, 7, 8, and 9 [40,41]. We specifically employed Landsat 8 and Landsat 9 datasets for our analysis. Landsat 8 imagery from the OLI sensor covering the years 2013–2021 and Landsat 9 imagery for the year 2022 were selected. For each image, we extracted the Blue, Green, Red, NIR, SWIR1, and SWIR2 spectral bands, as well as the Normalized Difference Vegetation Index (NDVI) [42] and the image acquisition date. We used the Landsat 8 Collection 2 Tier 1 and near real-time data in DN values, which represent scaled and calibrated sensor radiance. Cloud masking was performed using the QA band. To ensure cloud-free imagery and meet the multi-temporal requirements for crop identification, we adopted a two-month compositing strategy, using data from April–May, June–July, August–September, and October–November (Figure 2). In cases where certain pixels lacked cloud-free coverage, we filled the gaps with imagery from adjacent months, ensuring a largely cloud-free time series. Ultimately, we obtained four temporal composites per year during the soybean growing season. To ensure the acquisition dates were as close as possible across years, we selected the images corresponding to the maximum NDVI value for each period—an important step for image acquisition in the Google Earth Engine (GEE) platform.
2.3. Sample Data
Each year, independent reference samples were collected across Northeast China, covering Heilongjiang, Jilin, Liaoning, and eastern Inner Mongolia. The dataset included samples from various crop types (Table 2)—such as soybean, rice, corn, potato, and wheat—as well as multiple non-crop categories, including forest, grassland, shrubland, and bare land. This comprehensive sampling ensured the spatial representativeness of each selected sample and the broad distribution of crop samples across the entire region. Crop samples and part of the non-crop samples were obtained through field surveys using handheld electronic devices. All collected field samples were then cross-validated and corrected using Google Earth imagery and Sentinel-2A images to eliminate potential misidentifications, particularly along roads or field boundaries. Additional non-crop samples were derived from a visual interpretation of Google Earth imagery, where the central points of homogeneous land-cover patches were selected as representative samples to minimize mixed-pixel effects. To ensure maximum representativeness, the samples covered all climatic zones within the study area, providing a balanced distribution across different environmental conditions. All samples were merged and randomly divided into training, testing, and validation sets in proportions of 70%, 15%, and 15%, respectively. By pooling all samples together rather than segmenting them by region, the model’s generalization ability was enhanced, thereby improving its temporal transferability across different years.
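The 70/15/15 random split described above can be sketched as follows. This is a minimal illustration: the function name, random seed, and use of NumPy are assumptions, since the text does not specify the splitting tool.

```python
import numpy as np

def split_samples(n_samples, seed=42, fracs=(0.70, 0.15, 0.15)):
    """Randomly partition n_samples indices into train/test/validation sets.

    Mirrors the paper's 70%/15%/15% pooled split; seed and API are
    hypothetical choices for reproducibility of this sketch.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(n_samples * fracs[0])
    n_test = int(n_samples * fracs[1])
    # remaining indices (including rounding leftovers) go to validation
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]
```

Pooling all years before splitting, as described above, means each subset samples the full range of regions and acquisition years.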
2.4. Cropland Mask
In our study, we applied a cropland mask to exclude non-agricultural pixels from the soybean mapping process. Previous studies have demonstrated that using a cropland mask can effectively reduce topological errors in crop mapping [23]. Therefore, we utilized the China Land Cover Dataset (CLCD) [43] to extract the cropland layer for each year from 2013 to 2022. After performing soybean classification, we applied this cropland layer to mask the resulting soybean maps. This procedure allowed us to remove non-cropland areas and produce more accurate representations of soybean distribution.
2.5. Agricultural Statistics
We collected municipal-level soybean census data from the Agricultural Statistics Bureau’s statistical yearbooks [44] to validate the soybean planting area extracted in our study. In our analysis, we examined soybean statistics for 298 municipalities covering the years 2013–2022.
3. Methods
As shown in Figure 3, we developed and trained an end-to-end deep learning model to extract soybean cultivation areas. The entire process consists of four main steps: image preprocessing, feature selection, sample collection, and supervised classification. Among them, image preprocessing and feature selection were performed on the GEE platform; sample collection was conducted through field surveys; and supervised classification was carried out on a local server using Python 3.9 and the PyTorch 1.13.1 framework. The trained model was then applied to identify soybean cultivation areas from satellite images spanning 2013–2022.
3.1. Data Preprocessing
Image processing in this study includes both image inspection and subsequent processing, all of which were conducted on the GEE cloud platform. Building on previous research, we used the Landsat 8 top-of-atmosphere (TOA) product as the data source. The product has already undergone radiometric, topographic, and geometric corrections, which helps minimize the influence of solar elevation angle, terrain, and other factors on crop identification [45]. First, we inspected the image coverage during the crop growing season within the study area on the GEE platform. Cloud removal was applied to the images, and in line with the requirements for subsequent crop classification, we ensured that each scene contained at least a minimum number of valid pixels. In cases where large portions of an image had very few valid pixels, we reconsidered the compositing strategy. In Northeast China, the main crop growing season spans from May to October. However, studies have shown that the period after maize harvest is also a favorable time for crop identification. Therefore, we analyzed images from April to November. Taking 2019 as an example, the dataset contained more than eight valid scenes for classification. The cloud platform was also used to visually inspect the coverage of different phenological periods. Cloud and haze pixels were removed based on the QA_PIXEL band in the Landsat data, which encodes various pixel quality flags. This resulted in a cloud-free time series, though it contained many missing values. To address this, we performed image compositing to replace missing values. We primarily adopted a two-month compositing strategy, which our experiments determined to be an optimal choice.
3.2. Soybean Growth Curve
The soybean growth cycle can generally be divided into three stages: the seedling stage, the vegetative growth stage, and the reproductive stage [46]. During the seedling stage (from sowing to emergence), NDVI values remain low due to limited vegetation coverage. Entering the vegetative growth stage, soybean plants grow rapidly in height, and leaf area expands quickly, leading to a significant increase in NDVI values that reflects the enhanced green vegetation cover. In the reproductive stage, soybeans enter flowering, pod-setting, and maturity phases; NDVI reaches its peak and then gradually declines as leaves turn yellow and drop off, and vegetation vigor diminishes. Overall, the NDVI curve exhibits a pattern of gradual increase, peak, and gradual decline, with the peak usually occurring in the mid to late growing season. This pattern is quite similar to that of certain other crops, such as maize, particularly during the seedling and vegetative growth stages (Figure 4). However, it is worth noting that maize generally has a longer growing period, with a faster NDVI increase, higher peak, and longer duration at the peak. In contrast, soybeans often have a slightly shorter growth cycle in some regions, with a relatively flatter NDVI peak and an earlier onset of decline. These differences can be effectively used for crop classification or phenology identification based on NDVI time series.
3.3. Feature Selection
To obtain suitable imagery, we used all Landsat 8 images covering the crop growing season in the study area from 2013 to 2022. Considering cloud cover and crop phenological information, we applied a two-month compositing strategy for crop identification, namely April–May, June–July, August–September, and October–November, each composited into a single image. Specifically, crop extraction was performed by identifying the bands from the image in which the NDVI reached its maximum value. Despite these efforts, approximately 5% of the pixels still remained cloud-contaminated, and we conducted additional experiments to assess the impact of residual clouds. In addition to the direct use of spectral bands, we also tested four widely used vegetation indices, which were derived from the spectral data of Landsat imagery. These indices include the NDVI, the Enhanced Vegetation Index (EVI), the Land Surface Water Index (LSWI), and the Green Normalized Difference Vegetation Index (GNDVI), as shown in Table 3.
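For reference, the four indices can be computed elementwise from the Landsat reflectance bands as sketched below. These are the standard formulations; the exact definitions in Table 3, particularly the EVI coefficients, may differ slightly, so this should be read as an illustrative assumption rather than the paper's exact implementation.

```python
def vegetation_indices(blue, green, red, nir, swir1):
    """Compute NDVI, EVI, LSWI, and GNDVI from surface reflectance bands.

    Standard formulations assumed; works on scalars or NumPy arrays
    (elementwise). EVI coefficients (2.5, 6.0, 7.5, 1.0) follow the
    common MODIS/Landsat convention.
    """
    ndvi = (nir - red) / (nir + red)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    lswi = (nir - swir1) / (nir + swir1)
    gndvi = (nir - green) / (nir + green)
    return ndvi, evi, lswi, gndvi
```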
On the GEE cloud platform, the default compositing approach typically relies on the maximum value, which can lead to inconsistencies across bands because the sampled time for each band may differ. To address this issue, we improved the method by using all bands from the image corresponding to the maximum NDVI value as the composite. This approach not only helps to improve classification accuracy but also enhances the model’s generalization ability, feature extraction capability, and spatiotemporal transferability.
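The improved compositing logic described above, taking all bands from the acquisition with the maximum NDVI per pixel, can be sketched in NumPy terms. This is a simplified in-memory illustration of the per-pixel selection; the actual processing runs on the GEE platform over full image collections.

```python
import numpy as np

def max_ndvi_composite(stack, ndvi):
    """Select, per pixel, all bands from the time step with maximum NDVI.

    stack: array of shape (T, H, W, B) -- T acquisitions, B spectral bands.
    ndvi:  array of shape (T, H, W)    -- NDVI for each acquisition.
    Returns a (H, W, B) composite whose bands are mutually consistent,
    because they all come from the same acquisition at each pixel.
    """
    t_idx = np.argmax(ndvi, axis=0)                 # (H, W) peak-NDVI time index
    h_idx, w_idx = np.meshgrid(np.arange(stack.shape[1]),
                               np.arange(stack.shape[2]), indexing="ij")
    return stack[t_idx, h_idx, w_idx, :]            # fancy-index all bands at once
```

Selecting whole acquisitions this way avoids the band-wise inconsistency of GEE's default per-band maximum compositing that the text describes.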
3.4. Crop Classification Model
We employed a Transformer model combined with a Multi-Layer Perceptron (MLP) for the classification of soybeans and other crops [49,50]. The Transformer architecture is based on the self-attention mechanism, which enables it to capture global dependencies between any time steps in a sequence without relying on sequential propagation. This property provides strong representational power when dealing with remote sensing time series, particularly for capturing the asynchronous phenological patterns that different crops exhibit at various stages. The Transformer has a notable advantage in modeling long-range dependencies and frequency variations, making it well-suited to flexibly extract temporal feature differences across spectral bands and growth stages. In our model design, the remote sensing image time series is first fed into a Transformer encoder, which consists of multi-head self-attention mechanisms and feedforward neural networks. This encoder extracts multi-scale, cross-temporal key features. A positional encoding mechanism is integrated into the Transformer to preserve the temporal order of the sequence and prevent information loss. The feature representations generated by the Transformer encoder are then passed to a three-layer MLP with hidden layers of 128, 128, and 64 neurons, respectively, to further learn higher-order nonlinear features. Finally, a sigmoid activation function is applied at the output layer to predict the crop category for each sample.
To compare the classification accuracy of different machine learning and deep learning approaches for processing remote sensing time-series data, we selected four representative models: Random Forest (RF) [51], XGBoost [52], LSTM + MLP [50,53], and Transformer + MLP [49,50]. RF is a classical ensemble learning algorithm that has been widely used in remote sensing classification. It serves as a benchmark for traditional, interpretable, and stable models. XGBoost, as an advanced tree-based boosting algorithm, demonstrates excellent performance on structured tabular data such as multi-temporal vegetation indices, representing state-of-the-art ensemble methods that excel in high-dimensional feature learning. LSTM + MLP combines the Long Short-Term Memory (LSTM) network—designed to capture sequential dependencies and temporal dynamics—with a Multi-Layer Perceptron (MLP), effectively learning both temporal and nonlinear spectral relationships. This hybrid architecture represents deep learning methods that focus on modeling time-series data. Transformer + MLP, based on the self-attention mechanism, excels at capturing long-range dependencies without the limitations of recurrent structures. Its integration with an MLP further enhances nonlinear feature learning. This model represents cutting-edge deep learning architectures capable of learning highly generalizable multi-temporal and multi-spectral relationships.
3.5. Quantitative Evaluation Metrics
In this study, we evaluated the accuracy of soybean extraction results using both validation sample data and official municipal-level statistical census data on soybean planting areas. The evaluation employed a sample-based confusion matrix, from which several widely used quantitative metrics were derived, including the F1-score (F1), Overall Accuracy (OA), User’s Accuracy (UA), and Producer’s Accuracy (PA). The formulas are expressed as follows:

UA = TP / (TP + FP)
PA = TP / (TP + FN)
OA = (TP + TN) / (TP + TN + FP + FN)
F1 = 2 × UA × PA / (UA + PA)

where TP (True Positive) represents the number of soybean samples correctly classified as soybeans; TN (True Negative) represents the number of non-soybean samples correctly classified as non-soybeans; FP (False Positive) represents the number of non-soybean samples incorrectly classified as soybeans; and FN (False Negative) represents the number of soybean samples incorrectly classified as non-soybeans.
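These confusion-matrix definitions translate directly into code. The sketch below (function name hypothetical) computes UA, PA, OA, and F1 from the four counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute UA, PA, OA, and F1 from binary confusion-matrix counts.

    UA (user's accuracy) is precision; PA (producer's accuracy) is recall;
    F1 is their harmonic mean.
    """
    ua = tp / (tp + fp)                      # fraction of mapped soybean that is soybean
    pa = tp / (tp + fn)                      # fraction of true soybean that was mapped
    oa = (tp + tn) / (tp + tn + fp + fn)     # overall agreement
    f1 = 2 * ua * pa / (ua + pa)             # equivalently 2*TP / (2*TP + FP + FN)
    return ua, pa, oa, f1
```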
4. Results
4.1. Accuracy Assessment with Ground Samples
We evaluated the classification performance for soybean and non-soybean categories using F1, OA, UA, and PA. As shown in Table 4, the proposed maps achieved high accuracy across all years, with an overall accuracy of approximately 89%. To further assess the temporal transferability of the model, we trained the classifier using samples from 2018—selected because this year had the most complete set of crop types within the 2017–2019 sample pool—and tested it on data from 2017 and 2019. In both cases, the model maintained an accuracy of around 89%. Given the comprehensive coverage of crop types in the collected samples, these results indicate that the classifier remains stable across different years. This provides strong support for using the model trained on 2017–2019 samples to generate soybean maps for the entire 2013–2022 period.
To further validate model performance, we compared four models: RF, XGBoost, LSTM + MLP, and Transformer + MLP. The detailed results are presented in Table 4. The quantitative evaluation shows that the models were ranked in terms of accuracy as Transformer + MLP > XGBoost > RF > LSTM + MLP.
Among them, the Transformer + MLP model achieved the highest accuracy, while the LSTM + MLP model had the lowest. Although LSTM can capture temporal features, it suffered from overfitting on our dataset, with an F1-score of only 87.26%.
4.2. Comparison of the Classification with Agricultural Statistics
The soybean distribution maps of Northeast China from 2013 to 2022 are presented in Figure 5. We employed a pixel-counting method to calculate the annual soybean planting area for each prefecture-level administrative unit. The estimated soybean areas showed a significant correlation with the prefecture-level statistics reported in the Statistical Yearbooks of Northeast China, with an overall R2 of 0.9226 (Figure 6). However, it is noteworthy that our estimates consistently overestimated the statistical records. This highlights the limitations of soybean mapping based on Landsat imagery: despite the use of a large number of ground-truth samples and a high-performance Transformer model, some non-soybean crops are still misclassified as soybeans due to the presence of mixed pixels.
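The pixel-counting and agreement check can be illustrated as follows. This is a sketch: the 30 m pixel size follows Landsat, but the paper's R2 may come from a fitted regression rather than the squared Pearson correlation assumed here.

```python
import numpy as np

PIXEL_AREA_HA = 30 * 30 / 10_000.0   # one 30 m Landsat pixel = 0.09 ha

def pixels_to_hectares(pixel_count):
    """Convert a per-prefecture soybean pixel count to planted area in hectares."""
    return pixel_count * PIXEL_AREA_HA

def r_squared(mapped_area, reported_area):
    """Squared Pearson correlation between mapped and statistical areas.

    One common way to compute the R2 used for map-vs-yearbook agreement;
    assumed here, as the exact fitting procedure is not specified.
    """
    r = np.corrcoef(mapped_area, reported_area)[0, 1]
    return r * r
```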
5. Discussion
In this study, we systematically compared the performance of four representative models for soybean classification, namely RF, XGBoost, LSTM + MLP, and Transformer + MLP, using multi-temporal Landsat imagery. The results demonstrate that the Transformer + MLP model consistently achieved the highest overall accuracy and F1 score across all study years, outperforming both classical machine learning and other deep learning approaches. This indicates that the Transformer architecture, when combined with an MLP classifier, can effectively capture the spatiotemporal dependencies inherent in crop phenological dynamics, thereby providing a more robust framework for long-term crop mapping [54].
Our findings are consistent with recent advances in remote sensing and time-series modeling. Prior studies have demonstrated that the attention-based Transformer model excels in capturing long-range temporal dependencies and global contextual relationships, surpassing recurrent neural networks (RNNs) and LSTMs in various Earth observation tasks. Unlike traditional sequential models that rely on stepwise propagation of temporal information, the self-attention mechanism of the Transformer enables simultaneous modeling of all temporal positions, which enhances its ability to discern asynchronous phenological changes among different crops. Compared to the Transformer-based model, RF and XGBoost showed moderate performance but maintained strong robustness, aligning with previous studies reporting their effectiveness in medium-resolution crop classification [55]. These tree-based ensemble methods benefit from non-parametric decision boundaries, insensitivity to noise, and fast convergence, making them suitable for datasets with limited features or moderate temporal variability.
In contrast, LSTM + MLP exhibited some capacity for time-series modeling but performed worse than both ensemble methods and Transformer + MLP. This outcome supports prior findings that LSTMs often face overfitting and gradient vanishing issues when trained on limited or noisy remote sensing time series. While LSTM architectures are theoretically capable of capturing temporal dependencies, their sequential computation and limited memory make them less efficient for long-term satellite records spanning multiple years. The Transformer architecture, by contrast, employs parallel computation and global attention mechanisms that allow for more scalable and interpretable time-series modeling.
Theoretically, this study contributes to bridging the gap between spectral feature extraction and phenological pattern recognition in remote sensing. By integrating the Transformer’s temporal attention with the MLP’s nonlinear feature transformation, we demonstrate a unified deep learning framework that can jointly model both temporal continuity and spatial heterogeneity. This methodological innovation expands upon previous approaches that either relied solely on spectral information or treated temporal features as static inputs. Moreover, our approach emphasizes the importance of data synthesis in mitigating cloud contamination and seasonal gaps—an aspect that has been a persistent limitation in Landsat-based agricultural studies.
Empirically, the study underscores the potential of combining deep learning with medium-resolution imagery for long-term agricultural monitoring in smallholder-dominated regions. The high correlation (R2 = 0.9226) between the predicted soybean area and prefecture-level statistical data validates the reliability of our method under heterogeneous landscape and management conditions. This strong agreement supports the feasibility of using freely available Landsat archives and cloud-based processing platforms (e.g., GEE) for national-scale crop monitoring, especially in regions where field data are sparse or inconsistent. These findings align with global efforts to enhance agricultural transparency and food security assessment through open-access Earth observation data.
From a practical perspective, our results highlight distinct model strengths across different application scenarios. Transformer-based models are better suited for tasks that require the learning of complex spatiotemporal dependencies, such as multi-year crop rotation detection or phenological phase tracking. In contrast, RF and XGBoost remain valuable for rapid assessments where model interpretability, computational efficiency, and robustness to noise are prioritized. Future research could explore hybrid frameworks that integrate the interpretability of ensemble learning with the representational power of deep attention networks. Such hybrid strategies could improve both classification accuracy and model generalization across different agro-ecological zones.
Despite the promising performance of the proposed Transformer + MLP framework, several limitations should be acknowledged. First, the computational cost associated with large-scale data processing on the GEE platform is nontrivial. Although GEE provides efficient access to massive Landsat archives, running multi-temporal composites and exporting high-resolution results requires substantial cloud resources and time, which may limit scalability for near-real-time applications or global mapping initiatives. Future studies could explore more lightweight model architectures or distributed training strategies to reduce computational overhead. Second, the method relies heavily on the availability and representativeness of ground-truth samples. The accuracy of supervised models such as Transformer + MLP is constrained by the quality, spatial coverage, and temporal consistency of field data. Given the heterogeneity of agricultural landscapes in Northeast China, future work should consider semi-supervised or active learning approaches to reduce dependence on extensive field surveys while maintaining classification reliability. Third, the mapping results may be sensitive to interannual variations driven not only by biophysical factors but also by socio-economic and policy changes. For instance, government subsidies or land use regulations could influence soybean planting patterns independently of environmental drivers, introducing additional uncertainty in time-series classification. Integrating auxiliary datasets—such as agricultural census data, policy indicators, or economic statistics—could help disentangle human-induced variability from natural vegetation dynamics. Overall, while the Transformer + MLP framework demonstrates strong generalization and robustness for large-scale crop mapping, addressing these computational, sampling, and socio-economic sensitivities will be essential for advancing its operational applicability and policy relevance in future research.
6. Conclusions
This study developed a novel method for large-scale crop mapping by integrating Landsat imagery, the GEE cloud platform, and deep learning models, and applied it to produce long-term soybean distribution maps for Northeast China. The proposed approach fully leveraged the spatiotemporal resolution of Landsat data by constructing a bi-monthly NDVI maximum composite and combining it with a Transformer + MLP architecture. Validation using extensive ground-truth samples demonstrated high classification accuracy, with F1, OA, UA, and PA values reaching 0.8918, 0.8938, 0.8977, and 0.8859, respectively. Furthermore, the comparison with prefecture-level statistical yearbook data yielded a strong correlation (R2 = 0.9226), confirming the reliability and generalizability of the proposed method.
In the field of soybean remote sensing identification, the Transformer + MLP model has shown superior performance compared with traditional unsupervised and label-free crop classification methods [56,57]. Its key advantage lies in its ability to efficiently extract and integrate multi-temporal and multi-spectral features from remote sensing imagery, thus enhancing both classification accuracy and computational efficiency. The proposed method provides a robust framework for large-scale and long-term soybean mapping, with strong potential for extension to other staple crops. It offers valuable support for agricultural monitoring, food security assessment, and the sustainable management of cropland resources.
Despite these promising results, several limitations should be acknowledged. First, soybean mapping accuracy is still affected by mixed-pixel and field-size constraints inherent to Landsat’s 30 m resolution. In Northeast China’s smallholder-dominated landscapes, individual fields often range from 0.1 to 0.5 ha, meaning that many boundary pixels contain mixtures of soybean, maize, and non-crop vegetation. Such mixed pixels likely contribute to a portion of the observed overestimation and classification errors, especially along field edges. Although the use of cropland masks helps mitigate this effect, a systematic quantification of mixed-pixel influence remains an open task for future work. Second, residual uncertainty arises from cloud contamination and missing observations, which can weaken the temporal signature even after applying NDVI compositing. This introduces additional noise into the classification and may slightly reduce accuracy in humid or mountainous areas. Third, although the Transformer model effectively captures temporal dependencies, its high computational cost poses challenges for large-scale or near real-time applications. Finally, this study focuses on soybean mapping in Northeast China; therefore, the model’s transferability to other regions, crops, or management systems has not yet been fully evaluated. A more comprehensive uncertainty analysis and cross-regional validation would further strengthen the generalizability of the proposed framework.
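The scale of the mixed-pixel problem follows from simple geometry. The sketch below is a back-of-envelope estimate under an idealized assumption (a square field aligned with the pixel grid); real, irregular parcels fare worse, so treat the numbers as indicative only.

```python
import math

def mixed_pixel_fraction(field_ha, pixel_m=30.0):
    """Rough share of pixels touched by a square field that can straddle its boundary.

    Idealization: square field, axis-aligned with the pixel grid, arbitrary offset.
    Interior pixels are counted conservatively (guaranteed fully inside).
    """
    side = math.sqrt(field_ha * 10_000)          # field side length in metres
    n = side / pixel_m                           # field width in pixels
    total = math.ceil(n) ** 2                    # pixels the field can touch
    interior = max(math.floor(n) - 1, 0) ** 2    # pixels guaranteed fully inside
    return 1.0 - interior / total

for ha in (0.1, 0.25, 0.5):
    print(f"{ha:.2f} ha field: ~{mixed_pixel_fraction(ha):.0%} potentially mixed pixels")
```

Under this idealization, a 0.25 ha field (50 m on a side) spans fewer than two 30 m pixels in each direction, so every pixel it touches can straddle a boundary, and even at 0.5 ha roughly eight in nine touched pixels may be mixed — consistent with the edge-dominated error pattern described above.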
Future work should focus on integrating multi-source satellite data, such as Sentinel-1/2, MODIS, and hyperspectral imagery, to improve temporal continuity and spatial precision. In addition, coupling deep learning with physical process models or data assimilation frameworks could enhance the interpretability of crop growth dynamics. Developing lightweight or hybrid models that balance accuracy, generalization, and computational efficiency will be crucial for large-scale operational deployment. Finally, extending this approach to assess crop yield, phenology, and climate resilience would provide deeper insights into agricultural sustainability and contribute to national food security strategies.
Author Contributions
Conceptualization, Q.X. and Z.H.; methodology, Q.X.; software, Q.X.; validation, Q.X., Z.H. and H.D.; formal analysis, Q.X.; investigation, Q.X.; resources, Q.X., Z.H., H.D. and J.Z.; data curation, Q.X.; writing—original draft preparation, Q.X.; writing—review and editing, Q.X., Z.H., H.D. and J.Z.; visualization, Q.X.; supervision, Q.X.; project administration, Z.H. and J.Z.; funding acquisition, Z.H. and J.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by the National Natural Science Foundation of China (grant nos. 42401541 and 42301456) and the Natural Science Foundation of Sichuan Province (grant no. 2025ZNSFSC0321).
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Singh, P.; Kumar, R.; Sabapathy, S.N.; Bawa, A.S. Functional and Edible Uses of Soy Protein Products. Compr. Rev. Food Sci. Food Saf. 2008, 7, 14–28. [Google Scholar] [CrossRef]
- Thrane, M.; Paulsen, P.V.; Orcutt, M.W.; Krieger, T.M. Soy Protein: Impacts, Production, and Applications. In Sustainable Protein Sources; Elsevier: Amsterdam, The Netherlands, 2017; pp. 23–45. [Google Scholar]
- Dilawari, R.; Kaur, N.; Priyadarshi, N.; Prakash, I.; Patra, A.; Mehta, S.; Singh, B.; Jain, P.; Islam, M.A. Soybean: A Key Player for Global Food Security. In Soybean Improvement: Physiological, Molecular and Genetic Perspectives; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–46. [Google Scholar]
- Guo, B.; Sun, L.; Jiang, S.; Ren, H.; Sun, R.; Wei, Z.; Hong, H.; Luan, X.; Wang, J.; Wang, X. Soybean Genetic Resources Contributing to Sustainable Protein Production. Theor. Appl. Genet. 2022, 135, 4095–4121. [Google Scholar] [CrossRef]
- Banaszkiewicz, T. Nutritional Value of Soybean Meal. Soybean Nutr. 2011, 12, 1–20. [Google Scholar]
- Messina, M. Perspective: Soybeans Can Help Address the Caloric and Protein Needs of a Growing Global Population. Front. Nutr. 2022, 9, 909464. [Google Scholar] [CrossRef]
- Tomić, J.; Škrobot, D.; Pojić, M. Shift to Plant-Based Proteins: Environmental, Economic, and Social Implications. In Future Proteins; Elsevier: Amsterdam, The Netherlands, 2023; pp. 411–423. [Google Scholar]
- Churasia, A.; Singh, J.; Kumar, A. Production of Biodiesel from Soybean Oil Biomass as Renewable Energy Source. J. Environ. Biol. 2016, 37, 1303. [Google Scholar]
- Naser, M.; Abdelghany, A.M.; Wu, T.; Sun, S.; Tianfu, H. Soybean in Egypt: Current Situation, Challenges, and Future Perspectives. Discov. Sustain. 2024, 5, 425. [Google Scholar] [CrossRef]
- Di, Y.; You, N.; Dong, J.; Liao, X.; Song, K.; Fu, P. Recent Soybean Subsidy Policy Did Not Revitalize but Stabilize the Soybean Planting Areas in Northeast China. Eur. J. Agron. 2023, 147, 126841. [Google Scholar] [CrossRef]
- Chen, W.; Zhang, B.; Kong, X.; Wen, L.; Liao, Y.; Kong, L. Soybean Production and Spatial Agglomeration in China from 1949 to 2019. Land 2022, 11, 734. [Google Scholar] [CrossRef]
- Zhao, J.; Wang, Y.; Zhao, M.; Wang, K.; Li, S.; Gao, Z.; Shi, X.; Chu, Q. Prospects for Soybean Production Increase by Closing Yield Gaps in the Northeast Farming Region, China. Field Crops Res. 2023, 293, 108843. [Google Scholar] [CrossRef]
- Mongol, N.; Zhang, F. The Transformation of Agriculture in China: Looking Back and Looking Forward. J. Integr. Agric. 2018, 17, 755–764. [Google Scholar] [CrossRef]
- Du, M.; Lei, J.; Li, S. Navigating the Path to Food Security in China: Challenges, Policies, and Future Directions. Foods 2025, 14, 644. [Google Scholar] [CrossRef]
- Ministry of Finance of the People’s Republic of China. Guiding Opinions on the Soybean Target Price Subsidy; Ministry of Finance of the People’s Republic of China: Beijing, China, 2014.
- Adjemian, M.K.; Smith, A.; He, W. Estimating the Market Effect of a Trade War: The Case of Soybean Tariffs. Food Policy 2021, 105, 102152. [Google Scholar] [CrossRef]
- Yuan, X.; Li, S.; Chen, J.; Yu, H.; Yang, T.; Wang, C.; Huang, S.; Chen, H.; Ao, X. Impacts of Global Climate Change on Agricultural Production: A Comprehensive Review. Agronomy 2024, 14, 1360. [Google Scholar] [CrossRef]
- Bauer, M.E. The Role of Remote Sensing in Determining the Distribution and Yield of Crops. Adv. Agron. 1975, 27, 271–304. [Google Scholar]
- Wu, B.; Zhang, M.; Zeng, H.; Tian, F.; Potgieter, A.B.; Qin, X.; Yan, N.; Chang, S.; Zhao, Y.; Dong, Q. Challenges and Opportunities in Remote Sensing-Based Crop Monitoring: A Review. Natl. Sci. Rev. 2023, 10, nwac290. [Google Scholar] [CrossRef] [PubMed]
- Omia, E.; Bae, H.; Park, E.; Kim, M.S.; Baek, I.; Kabenge, I.; Cho, B.-K. Remote Sensing in Field Crop Monitoring: A Comprehensive Review of Sensor Systems, Data Analyses and Recent Advances. Remote Sens. 2023, 15, 354. [Google Scholar] [CrossRef]
- Wardlow, B.D.; Egbert, S.L. Large-Area Crop Mapping Using Time-Series MODIS 250 m NDVI Data: An Assessment for the US Central Great Plains. Remote Sens. Environ. 2008, 112, 1096–1116. [Google Scholar] [CrossRef]
- Song, X.-P.; Potapov, P.V.; Krylov, A.; King, L.; Di Bella, C.M.; Hudson, A.; Khan, A.; Adusei, B.; Stehman, S.V.; Hansen, M.C. National-Scale Soybean Mapping and Area Estimation in the United States Using Medium Resolution Satellite Imagery and Field Survey. Remote Sens. Environ. 2017, 190, 383–395. [Google Scholar] [CrossRef]
- You, N.; Dong, J.; Huang, J.; Du, G.; Zhang, G.; He, Y.; Yang, T.; Di, Y.; Xiao, X. The 10-m Crop Type Maps in Northeast China during 2017–2019. Sci. Data 2021, 8, 41. [Google Scholar] [CrossRef]
- Wang, H.; Ye, Z.; Yao, Y.; Chang, W.; Liu, J.; Zhao, Y.; Li, S.; Liu, Z.; Zhang, X. Improving Cross-Regional Model Transfer Performance in Crop Classification by Crop Time Series Correction. Geo-Spat. Inf. Sci. 2025, 28, 1581–1596. [Google Scholar] [CrossRef]
- Luo, K.; Lu, L.; Xie, Y.; Chen, F.; Yin, F.; Li, Q. Crop Type Mapping in the Central Part of the North China Plain Using Sentinel-2 Time Series and Machine Learning. Comput. Electron. Agric. 2023, 205, 107577. [Google Scholar] [CrossRef]
- Liu, S.; Liu, J.; Tan, X.; Chen, X.; Chen, J. A Hybrid Spatiotemporal Fusion Method for High Spatial Resolution Imagery: Fusion of Gaofen-1 and Sentinel-2 over Agricultural Landscapes. J. Remote Sens. 2024, 4, 0159. [Google Scholar] [CrossRef]
- Zhang, H.; Lou, Z.; Peng, D.; Zhang, B.; Luo, W.; Huang, J.; Zhang, X.; Yu, L.; Wang, F.; Huang, L. Mapping Annual 10-m Soybean Cropland with Spatiotemporal Sample Migration. Sci. Data 2024, 11, 439. [Google Scholar] [CrossRef]
- Li, J.; Xiao, Z.; Sun, R.; Song, J.; Shi, C. Retrieval of Leaf Area Index from the Landsat Surface Reflectance Using Multi-Task Adversarial Transfer Learning. Int. J. Digit. Earth 2025, 18, 2520002. [Google Scholar] [CrossRef]
- Vescovi, F.D.; Gomarasca, M.A. Integration of Optical and Microwave Remote Sensing Data for Agricultural Land Use Classification. Environ. Monit. Assess. 1999, 58, 133–149. [Google Scholar] [CrossRef]
- Turker, M.; Arikan, M. Sequential Masking Classification of Multi-temporal Landsat7 ETM+ Images for Field-based Crop Mapping in Karacabey, Turkey. Int. J. Remote Sens. 2005, 26, 3813–3830. [Google Scholar] [CrossRef]
- Jia, K.; Wu, B.; Li, Q. Crop Classification Using HJ Satellite Multispectral Data in the North China Plain. J. Appl. Remote Sens. 2013, 7, 073576. [Google Scholar] [CrossRef]
- Zhong, L.; Hu, L.; Zhou, H. Deep Learning Based Multi-Temporal Crop Classification. Remote Sens. Environ. 2019, 221, 430–443. [Google Scholar] [CrossRef]
- Siachalou, S.; Mallinis, G.; Tsakiri-Strati, M. A Hidden Markov Models Approach for Crop Classification: Linking Crop Phenology to Time Series of Multi-Sensor Remote Sensing Data. Remote Sens. 2015, 7, 3633–3650. [Google Scholar] [CrossRef]
- Gao, F.; Zhang, X. Mapping Crop Phenology in near Real-Time Using Satellite Remote Sensing: Challenges and Opportunities. J. Remote Sens. 2021, 2021, 8379391. [Google Scholar] [CrossRef]
- Teixeira, I.; Morais, R.; Sousa, J.J.; Cunha, A. Deep Learning Models for the Classification of Crops in Aerial Imagery: A Review. Agriculture 2023, 13, 965. [Google Scholar] [CrossRef]
- Cai, Y.; Guan, K.; Peng, J.; Wang, S.; Seifert, C.; Wardlow, B.; Li, Z. A High-Performance and in-Season Classification System of Field-Level Crop Types Using Time-Series Landsat Data and a Machine Learning Approach. Remote Sens. Environ. 2018, 210, 35–47. [Google Scholar] [CrossRef]
- Li, H.; Di, L.; Zhang, C.; Lin, L.; Guo, L.; Yu, E.G.; Yang, Z. Automated In-Season Crop-Type Data Layer Mapping without Ground Truth for the Conterminous United States Based on Multisource Satellite Imagery. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4403214. [Google Scholar] [CrossRef]
- Peña, J.M.; Gutiérrez, P.A.; Hervás-Martínez, C.; Six, J.; Plant, R.E.; López-Granados, F. Object-Based Image Classification of Summer Crops with Machine Learning Methods. Remote Sens. 2014, 6, 5019–5041. [Google Scholar] [CrossRef]
- Yao, J.; Wu, J.; Xiao, C.; Zhang, Z.; Li, J. The Classification Method Study of Crops Remote Sensing with Deep Learning, Machine Learning, and Google Earth Engine. Remote Sens. 2022, 14, 2758. [Google Scholar] [CrossRef]
- Loveland, T.R.; Dwyer, J.L. Landsat: Building a Strong Future. Remote Sens. Environ. 2012, 122, 22–29. [Google Scholar] [CrossRef]
- Lulla, K.; Nellis, M.D.; Rundquist, B.; Srivastava, P.K.; Szabo, S. Mission to Earth: LANDSAT 9 Will Continue to View the World; Taylor & Francis: New York, NY, USA, 2021; Volume 36, pp. 2261–2263. ISSN 1010-6049. [Google Scholar]
- Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the Radiometric and Biophysical Performance of the MODIS Vegetation Indices. Remote Sens. Environ. 2002, 83, 195–213. [Google Scholar] [CrossRef]
- Yang, J.; Huang, X. 30 m Annual Land Cover and Its Dynamics in China from 1990 to 2019. Earth Syst. Sci. Data Discuss. 2021, 2021, 1–29. [Google Scholar]
- China Rural Statistical Yearbook (2013–2022); National Bureau of Statistics: Beijing, China, 2013–2022.
- Chander, G.; Markham, B.L.; Helder, D.L. Summary of Current Radiometric Calibration Coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI Sensors. Remote Sens. Environ. 2009, 113, 893–903. [Google Scholar] [CrossRef]
- Purcell, L.C.; Salmeron, M.; Ashlock, L. Soybean Growth and Development. Ark. Soybean Prod. Handb. 2014, 197, 1–8. [Google Scholar]
- Gao, B.-C. NDWI—A Normalized Difference Water Index for Remote Sensing of Vegetation Liquid Water from Space. Remote Sens. Environ. 1996, 58, 257–266. [Google Scholar] [CrossRef]
- Huang, J. New Vegetation Index and Its Application in Estimating Leaf Area Index of Rice. Rice Sci. 2007, 14, 195–203. [Google Scholar] [CrossRef]
- Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in Transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 15908–15919. [Google Scholar]
- Riedmiller, M.; Lernen, A. Multi Layer Perceptron. Mach. Learn. Lab Spec. Lect. Univ. Freibg. 2014, 24, 11–60. [Google Scholar]
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Ramraj, S.; Uzir, N.; Sunil, R.; Banerjee, S. Experimenting XGBoost Algorithm for Prediction and Classification of Different Datasets. Int. J. Control Theory Appl. 2016, 9, 651–662. [Google Scholar]
- Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A Search Space Odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2222–2232. [Google Scholar] [CrossRef] [PubMed]
- Saki, M.; Keshavarz, R.; Franklin, D.; Abolhasan, M.; Lipman, J.; Shariati, N. A Data-Driven Review of Remote Sensing-Based Data Fusion in Precision Agriculture from Foundational to Transformer-Based Techniques. IEEE Access 2025, 13, 166188–166209. [Google Scholar] [CrossRef]
- Maleki, S.; Baghdadi, N.; Bazzi, H.; Dantas, C.F.; Ienco, D.; Nasrallah, Y.; Najem, S. Machine Learning-Based Summer Crops Mapping Using Sentinel-1 and Sentinel-2 Images. Remote Sens. 2024, 16, 4548. [Google Scholar] [CrossRef]
- Wang, R.; Ma, L.; He, G.; Johnson, B.A.; Yan, Z.; Chang, M.; Liang, Y. Transformers for Remote Sensing: A Systematic Review and Analysis. Sensors 2024, 24, 3495. [Google Scholar] [CrossRef] [PubMed]
- Li, N.; Wang, Z.; Cheikh, F.A. Discriminating Spectral–Spatial Feature Extraction for Hyperspectral Image Classification: A Review. Sensors 2024, 24, 2987. [Google Scholar] [CrossRef] [PubMed]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).