Highlights
What are the main findings?
- We propose an Enhanced Chlorophyll Index (NRLI) that significantly improves the separability between soybean and maize, two spectrally similar crops that are difficult to distinguish using traditional vegetation indices.
- NRLI enables the optimal soybean identification period to be automatically determined based on the temporal peak of the index, improving robustness across years and regions.
What are the implications of the main findings?
- NRLI allows earlier and more reliable soybean mapping, advancing the effective identification period by approximately 20 days compared with conventional indices.
- The proposed method provides a practical and transferable strategy for large-scale crop monitoring in operational remote sensing applications.
Abstract
Soybean is a key global crop for food and oil production, playing a vital role in ensuring food security and supplying plant-based proteins and oils. Accurate information on soybean distribution is essential for yield forecasting, agricultural management, and policymaking. In this study, we developed an Enhanced Chlorophyll Index (NRLI) to improve the separability between soybean and maize—two spectrally similar crops that often confound traditional vegetation indices. The proposed NRLI integrates red-edge, near-infrared, and green spectral information, effectively capturing variations in chlorophyll and canopy water content during key phenological stages, particularly from flowering to pod setting and maturity. Building upon this foundation, we further introduce a pixel-wise compositing strategy based on the peak phase of NRLI to enhance the temporal adaptability and spectral discriminability in crop classification. Unlike conventional approaches that rely on imagery from fixed dates, this strategy dynamically analyzes annual time-series data, enabling phenology-adaptive alignment at the pixel level. Comparative analysis reveals that NRLI consistently outperforms existing vegetation indices, such as the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Greenness and Water Content Composite Index (GWCCI), across representative soybean-producing regions in multiple countries. It improves overall accuracy (OA) by approximately 10–20 percentage points, achieving accuracy rates exceeding 90% in large, contiguous cultivation areas. To further validate the robustness of the proposed index, benchmark comparisons were conducted against the Random Forest (RF) machine learning algorithm. The results demonstrated that the single-index NRLI approach achieved competitive performance, comparable to the multi-feature RF model, with accuracy differences generally within 1–2%. In some regions, NRLI even outperformed RF. 
This finding highlights NRLI as a computationally efficient alternative to complex machine learning models without compromising mapping precision. This study provides a robust, scalable, and transferable single-index approach for large-scale soybean mapping and monitoring using remote sensing.
1. Introduction
Soybean is one of the most important economic crops globally. Major soybean-producing regions, including Asia, North America, and South America, have consistently ranked among the leaders in global crop acreage and yield [1]. As a critical food and industrial crop, soybean plays an irreplaceable role in food processing, feed production, and as a raw material for various industries [2]. Accurate monitoring of soybean planting distribution not only supports agricultural production scheduling and precision management but also provides essential data for agricultural research, policy formulation, market forecasting, and ecological environment assessment [3]. Due to its large-area coverage, periodic observation capability, and multispectral characteristics, remote sensing has emerged as a key technology for advancing agricultural informatization and promoting sustainable agricultural development [4].
In recent years, soybean remote sensing classification methods have evolved from single-spectral analysis to multi-source, multi-temporal, and intelligent classification approaches [5,6,7,8]. The integration of time-series analysis and multi-source data fusion [9,10], along with the incorporation of machine learning and deep learning techniques [11,12], has significantly enhanced the quality of feature extraction and the generalization capability of crop classification models. Despite these advancements, considerable challenges persist in large-scale and cross-regional applications. Traditional crop classification approaches typically rely on single-date satellite imagery. For instance, Tanzer et al. [13] utilized single-date Landsat-8 imagery, combining multiple vegetation indices (e.g., the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI); a list of abbreviations used in this study is provided in Appendix A) and original spectral bands within a random forest framework to classify crops. Similarly, Chen et al. [14] proposed a composite index integrating greenness and water content information to achieve end-to-end soybean mapping based on a single Sentinel-2 multispectral image. While these approaches offer advantages such as high computational efficiency, ease of implementation, and minimal data requirements, their classification accuracy remains highly sensitive to image acquisition timing, crop phenological stage, and environmental conditions.
Single-date approaches are often insufficient to capture the phenological dynamics of crops throughout the growing season. When different crops exhibit similar spectral responses during overlapping phenological stages, spectral confusion can occur, leading to reduced classification accuracy. As the limitations of single-date methods become increasingly evident in complex agricultural landscapes, researchers have shifted towards multi-temporal and phenology-driven crop classification approaches [15]. By leveraging temporal information to enhance inter-crop spectral differences, these methods aim to improve crop discrimination and classification accuracy. For example, Chen et al. [16] extracted crop growth curve features from multi-temporal Harmonized Landsat and Sentinel (HLS) data, achieving early-season identification of soybean and maize through curve-matching techniques, which significantly improved both timeliness and stability. Similarly, Zhou et al. [17] employed multi-temporal Unmanned Aerial Vehicle (UAV) imagery to derive spectral and texture features at different growth stages for ratoon rice yield prediction, substantially enhancing model accuracy and generalization capability. These studies demonstrate that multi-temporal features can effectively capture the spectral dynamics of crops throughout the growing season and improve their separability. Despite these advantages, multi-temporal approaches face several practical challenges. On one hand, long-term remote sensing time series are often constrained by satellite revisit intervals and cloud–rain interference, resulting in data gaps during key phenological phases and hindering continuous crop monitoring [8,18,19,20]. On the other hand, significant variations in soybean sowing dates and growth progress across regions and years make fixed time-window classification strategies difficult to generalize, as they are easily influenced by climatic and varietal differences [21]. 
Additionally, the phenological overlap between soybean and maize during critical growth periods often results in highly similar spectral signatures at peak growth stages [22], weakening the separability of multi-temporal and phenology-driven methods in heterogeneous agricultural regions.
Classification frameworks based on single-date or fixed-period imagery often fail to ensure interannual consistency, undermining the reliability of long-term monitoring of soybean growth dynamics and changes in planting patterns [23]. To address this issue, this study introduces a phenology-driven recognition mechanism that amplifies spectral differences between soybean and other crops, such as maize, by leveraging spectral characteristics during the vigorous growth period. This approach enhances inter-class separability and enables earlier identification of soybean within the growing season. Specifically, we propose a new remote sensing index for soybean identification, the Enhanced Chlorophyll Index (NRLI), which demonstrates high accuracy and robustness across multi-regional and multi-temporal conditions. By integrating red-edge, near-infrared, and green bands, NRLI enhances sensitivity to chlorophyll variations during key soybean phenological stages. Using multi-temporal Sentinel-2 imagery across several representative soybean-producing regions and leveraging the Google Earth Engine (GEE) platform, we systematically compared the performance of NRLI with commonly used vegetation indices. Furthermore, two classification strategies were employed: (1) a threshold-based index approach [14,24,25,26] and (2) a machine learning approach based on the Random Forest (RF) algorithm, to evaluate their accuracy and stability in distinguishing soybean from other crops [27,28,29]. Overall, this study validates the cross-regional applicability of NRLI and provides a transferable and efficient technical reference for large-scale crop identification using remote sensing.
2. Study Area and Materials
2.1. Study Sites
Four representative soybean-producing regions were selected to evaluate the performance and cross-regional applicability of the proposed method: (1) Hailun City and Kedong County in Heilongjiang Province, China (C1, C2); (2) parts of Iowa and Missouri in the U.S. Midwest (U1, U2); (3) the Rio Verde Agricultural Area in Goiás State, Brazil (B1); and (4) Marcos Juárez City in southeastern Córdoba Province, Argentina (A1). Notably, the spatial extents of all selected sites are strictly defined by their official administrative boundaries rather than arbitrary rectangular tiles. These administrative jurisdictions represent complete regional management units, and their specific, irregular geometries are explicitly shown in Figure 1 and summarized in Table 1. These regions are among the world’s major soybean production areas and span a wide range of climatic conditions, from temperate to tropical zones. The dominant climate types include temperate monsoon climate (northeastern China) [20], temperate continental climate (U.S. Midwest), tropical savanna climate (Brazil), and subtropical to temperate monsoon climates (Argentina) [30]. In addition to climatic diversity, the study areas exhibit substantial variations in soil properties (e.g., black soil, red soil) and agricultural management practices, providing diverse environmental and agronomic conditions for model testing. Such multi-regional and multi-conditional settings allow for a rigorous assessment of the robustness and transferability of the proposed soybean classification approach across different agroecological systems (Figure 1).
Figure 1.
Spatial distribution and climatic environments of the six study sites: C1 (Hailun City), C2 (Keshan County), U1 (Cedar County), U2 (Holt County), B1 (Rio Verde), and A1 (Marcos Juárez). The climate types are classified according to the simplified Köppen–Geiger climate classification [31].
Table 1.
Geographic location and spatial extent of the study areas. All regions are delineated by their official administrative boundaries.
2.2. Various Datasets
Two primary categories of remote sensing data were utilized in this study: (1) Sentinel-2 surface reflectance data (COPERNICUS/S2_SR_HARMONIZED), and (2) the European Space Agency (ESA) WorldCover dataset (ESA/WorldCover/v200/2021). The Sentinel-2 imagery [32,33], acquired by both the Sentinel-2A and Sentinel-2B satellites, includes 12 spectral bands (excluding the cirrus band B10) covering the visible, near-infrared (NIR), and shortwave infrared (SWIR) regions, with spatial resolutions ranging from 10 m to 60 m and a revisit frequency of 5 days. For this study, we specifically utilized the green (B3), red (B4), red-edge (B5, B6), and NIR (B8) spectral bands to compute the NRLI. These selected bands are particularly sensitive to variations in chlorophyll content and canopy water, which are critical for distinguishing between soybean and other crops, such as maize. Notably, the B1, B9, B11, and B12 bands were not used, as they are not essential for the specific spectral characteristics needed to enhance crop differentiation [34]. The ESA WorldCover dataset provides global land cover information at a 10 m resolution [35], from which the cropland class was extracted to create a farmland mask, excluding non-agricultural land from the subsequent analysis. Additionally, representative training samples were selected across the study regions based on field surveys and agricultural statistical data, forming a reliable foundation for the construction and validation of the proposed NRLI. The number and spatial distribution of the samples are described in Section 2.3.
2.3. Validation Data
To assess the performance of different methods for soybean classification across multiple regions, four types of validation datasets were selected in this study, covering representative soybean-producing areas in the United States, Brazil, Argentina, and China. Detailed information for each dataset is provided in Table 2.
In the Chinese administrative regions (C1 and C2), field data acquisition strategies were tailored to the specific study periods. For 2023, ground truth data were collected through roadside surveys along major national highways and rural roads [36]. This sampling trajectory was strategically planned to penetrate the primary soybean-producing areas, yielding approximately 5600 on-site samples. Spatially, these 2023 field samples exhibit a network-like distribution, widely dispersed across the administrative extents, spanning the full latitudinal and longitudinal range of the counties. This distribution ensures the coverage of diverse topographic conditions (e.g., flat to undulating terrain) and typical planting patterns within the black soil region. (Note: For the historical period of 2019–2022, separate validation datasets were employed to maintain temporal consistency). In contrast, for the international regions (U1, U2, B1, and A1), where on-site access was not feasible, reference samples were derived from high-confidence official land cover datasets (e.g., United States Department of Agriculture Cropland Data Layer (USDA CDL) [37], MapBiomas [38]) cross-referenced with high-resolution satellite imagery. To ensure these samples captured the environmental heterogeneity of each region, a stratified random sampling strategy was employed [39]. This approach ensured a comprehensive spatial distribution across diverse topographic conditions, soil properties, and planting structures, avoiding clustering in specific sub-areas. The spatial distribution of these samples effectively represents the variability within the administrative boundaries. All validation samples, regardless of their source, were kept strictly independent of the training data to guarantee an objective assessment of classification accuracy [40].
Table 2.
Table of Study Area Information.
| Study Area Code | Dataset Name | Temporal Coverage | Spatial Resolution (Dataset) |
|---|---|---|---|
| U1, U2 | Cropland Data Layer | Annual update | 30 m |
| B1 | MapBiomas Collection 8.0 & MapBiomas AGRO | 1985–2024 | 30 m |
| A1 | Soybean Planting Dataset [41] | 2000–present | 30 m |
| C1, C2 | National Soybean Planting Dataset (2019–2022) [42] | 2019–2022 | 10 m |
2.4. Software and Computational Environment
The entire workflow of this study, from data acquisition to statistical analysis and visualization, was implemented using a suite of specialized software and cloud computing platforms [43]. (1) Cloud Computing and Image Processing: The GEE (available at: https://code.earthengine.google.com/, accessed 1 November 2025) platform was used as the primary computational engine for large-scale remote sensing data processing. GEE facilitated the acquisition of Sentinel-2 imagery [32], the calculation of spectral indices (e.g., NRLI, Red-Edge Normalized Difference Vegetation Index (RENDVI)), and the execution of classification algorithms, including the threshold-based method and the RF classifier [44]. The GEE Python API (available at: https://developers.google.com/earth-engine/guides/python_install, accessed 1 November 2025) was employed to automate these tasks efficiently [45]. (2) Statistical Analysis and Area Estimation: Post-classification statistical analyses were conducted using PyCharm 2023.1.2 (JetBrains, Prague, Czech Republic). This environment was used to aggregate pixel-level classification results into county-level soybean planting areas and to calculate performance metrics, specifically the Coefficient of Determination (R2) and Root Mean Square Error (RMSE), for the national-scale validation [46]. (3) GIS Operations and Visualization: ArcMap (V10.8, Esri, Redlands, CA, USA) was used for all spatial data management, validation sample collection, and final map production. This Geographic Information System (GIS) software facilitated the visualization of classification results, the generation of spatial distribution maps for sample points, and the layout design of all figures presented in this manuscript.
3. Methods
3.1. Early Soybean Identification Using NRLI
This study employed a method for selecting the optimal temporal window based on time-series indices. Each pixel is analyzed individually by traversing its spectral reflectance time series throughout the year (Day of Year (DOY) 0–360) and calculating the corresponding NRLI curve. For each pixel, the annual maximum NRLI value and its occurrence date are extracted, and this date is regarded as the pixel’s optimal phenological stage. Based on this, a quality mosaic approach is applied to integrate the optimal dates of all pixels within the study area, producing a spatiotemporally adaptive composite image and enabling pixel-level temporal window selection [47,48]. Taking soybean phenology in Northeast China as an example, the NRLI peak for soybean typically occurs around DOY 220 (as shown later in Section 3.2), at which point its NRLI value is the highest among all crop types. Therefore, the period approximately 20 days prior to the peak (DOY 200–220) can be defined as the optimal time window for soybean identification. This period coincides with the end of the peak growth of potato, allowing the spectral differences between soybean and other crops to be effectively amplified, thus enabling early identification.
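The per-pixel peak selection described above can be illustrated with a minimal sketch in plain Python. It is not the authors' GEE implementation: the function name `peak_composite`, the dictionary layout (DOY mapped to a small 2-D grid), and the use of `None` for cloud-masked observations are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[Optional[float]]]


def peak_composite(series: Dict[int, Grid]) -> Tuple[Grid, List[List[Optional[int]]]]:
    """Per-pixel maximum-value composite (hypothetical helper).

    `series` maps a DOY to a 2-D grid of NRLI values; None marks a
    masked (e.g., cloudy) observation. Returns the grid of peak NRLI
    values and the grid of DOYs at which each peak occurred.
    """
    doys = sorted(series)
    rows = len(series[doys[0]])
    cols = len(series[doys[0]][0])
    peak_val: Grid = [[None] * cols for _ in range(rows)]
    peak_doy: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]

    for doy in doys:
        grid = series[doy]
        for r in range(rows):
            for c in range(cols):
                value = grid[r][c]
                if value is None:
                    continue  # skip masked observations
                if peak_val[r][c] is None or value > peak_val[r][c]:
                    peak_val[r][c] = value
                    peak_doy[r][c] = doy
    return peak_val, peak_doy


# Illustrative values only: one row, two pixels, four acquisition dates.
demo = {
    190: [[0.8, None]],
    205: [[1.1, 0.9]],
    220: [[1.6, 0.7]],
    235: [[1.2, None]],
}
values, dates = peak_composite(demo)
```

Because each pixel keeps its own peak date, two neighbouring fields sown at different times are each represented by the observation closest to their own NRLI maximum rather than by a single fixed acquisition date.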
3.2. Enhanced Chlorophyll Index
In this study, the required spectral features, including NIR, RENDVI, NIR−RENDVI, Leaf Area Index-insensitive Chlorophyll Index (LICI), and NRLI, were calculated using Sentinel-2 surface reflectance data. Data processing was performed on the GEE platform [43], which facilitated the retrieval and analysis of large-scale satellite imagery across the entire soybean growing season. First, the relevant Sentinel-2 bands were extracted, and NIR, RENDVI, and LICI values were computed for each pixel over time. The NIR−RENDVI feature was derived as the difference between NIR reflectance [49] and RENDVI, amplifying spectral contrasts between soybean and other crops, such as maize. After processing in GEE, the temporal spectral profiles of these indices were exported and analyzed locally using Origin 2021 software. Origin 2021 was used to visualize the time-series data, displaying the spectral dynamics of each crop type (soybean, maize, rice, and potato) throughout the growing season. Additionally, Origin’s data smoothing and curve fitting functions were employed to enhance the clarity of the trends, making it easier to compare spectral responses across crops and identify key separation points, particularly during critical growth phases such as flowering and pod-setting. NRLI effectively separates soybean from other crops and is applied for soybean distribution mapping. The formula is expressed as:

$$\mathrm{NRLI} = \mathrm{LICI} \times (\mathrm{NIR} - \mathrm{RENDVI})$$
The design of NRLI is grounded in the chlorophyll content dynamics of various crops during their peak growth stages. During this period, the LICI values of potato and rice are considerably lower than those of soybean, enabling effective separation from most other crops. For maize, which presents a major challenge in soybean classification, the difference between NIR and RENDVI is used to amplify the LICI discrepancy between maize and soybean during the peak growth stage (Figure 2c). LICI, a chlorophyll-sensitive index constructed using the red-edge and visible bands of Sentinel-2, is defined as [50]:
The LICI time series (Figure 3a) indicate that the four crop types exhibit relatively small differences during the early growth stage (DOY < 150). Upon entering the peak growth period (DOY 200–250), LICI increases rapidly and reaches a maximum around DOY 220. At this stage, the curves of maize and soybean largely overlap, with similar peak positions and magnitudes, whereas rice exhibits lower values and potato senesces earlier. Maize shows slightly higher reflectance in the NIR band (Figure 2a), while soybean exhibits slightly lower RENDVI values in the red-edge region (Figure 2b). The difference between these two signals (NIR−RENDVI) was used as a new spectral feature (Figure 2c), combining the two opposing trends and thereby amplifying the LICI differences between the two crops, enhancing their separability. RENDVI is defined as [51]:
Based on this concept, NRLI was constructed by introducing NIR−RENDVI as an enhancement factor to LICI. The new chlorophyll-enhanced index, NRLI, was obtained by taking the product of LICI and NIR−RENDVI (Figure 3b), incorporating both red-edge sensitivity and chlorophyll responsiveness, thereby improving the discriminative capability of the index.
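As a small worked illustration of this product form, the sketch below computes NRLI from per-date LICI, NIR, and RENDVI values that are assumed to have been computed already. The function name `nrli_series` and the input numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not measured values from the study.

```python
from typing import List, Sequence


def nrli_series(
    lici: Sequence[float],
    nir: Sequence[float],
    rendvi: Sequence[float],
) -> List[float]:
    """Compute NRLI = LICI * (NIR - RENDVI) for each observation date.

    All three inputs are assumed to be aligned per-date series for one
    pixel, already derived from surface reflectance.
    """
    if not (len(lici) == len(nir) == len(rendvi)):
        raise ValueError("input series must have the same length")
    return [l * (n - r) for l, n, r in zip(lici, nir, rendvi)]


# Hypothetical inputs for two dates (not measured data).
example = nrli_series(lici=[2.0, 4.0], nir=[0.5, 0.75], rendvi=[0.25, 0.25])
```

Because NIR−RENDVI multiplies LICI, a modest difference in the enhancement factor scales with the chlorophyll signal, which is the mechanism the text relies on to widen the gap between crops whose LICI curves nearly overlap.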
Figure 2.
Temporal spectral profiles of (a) near-infrared (NIR), (b) Red-Edge Normalized Difference Vegetation Index (RENDVI), and (c) NIR−RENDVI. The gray shaded area represents the optimal classification period for soybean, where the spectral differences are most pronounced. The time node (220) indicates the peak classification time for soybean, where the separability between soybean and other crops is maximized.
Figure 3.
(a) Temporal spectral profiles of Leaf Area Index-insensitive Chlorophyll Index (LICI). (b) Temporal spectral profiles of Enhanced Chlorophyll Index (NRLI). The gray shaded area represents the optimal classification period for soybean, where the spectral differences are most pronounced. The time node (220) indicates the peak classification time for soybean, where the separability between soybean and other crops is maximized. The Peak in (b) represents the maximum NRLI value.
3.3. Classification Method
To compare the performance of NRLI with other commonly used vegetation indices in soybean identification, two complementary classification strategies were employed in this study: a threshold-based index segmentation approach and a machine learning-based RF algorithm (Figure 4). In the data preprocessing stage, this study used pre-processed Sentinel-2 surface reflectance products rather than raw satellite imagery. Cloud and cloud-shadow masking were first applied to ensure data quality, followed by the selection of the study regions and spatial cropping to match the boundaries of each research area [52]. Based on the processed Sentinel-2 data, relevant spectral features were constructed for subsequent analysis. The method selection stage involved two parallel classification strategies: the threshold-based method and the RF machine learning method. For the threshold-based method, NRLI was calculated for each DOY across the soybean growing season. The maximum NRLI value and its corresponding DOY were identified to determine the optimal classification time point. The resulting maximum NRLI image was then used as input for soybean extraction through a threshold-based decision rule. In parallel, the Random Forest method was implemented using selected spectral features and index-based variables. Sample points were divided into training data (70%) and validation data (30%). The RF model was trained using the training samples and subsequently validated to assess classification performance. To ensure an objective comparison, the classification performance of the threshold-based NRLI method, the Random Forest model, and different spectral indices was quantitatively evaluated using accuracy metrics. Finally, in the soybean mapping stage, the optimal method and feature combination were selected based on the evaluation results, and the final soybean distribution map was generated [39].
Figure 4.
Workflow for soybean classification.
3.3.1. Threshold-Based Index Segmentation
Soybean pixels were separated from non-soybean pixels by applying an optimized threshold to NRLI. The threshold determination procedure was based on a comprehensive statistical analysis of multi-temporal training samples [53]. First, spectral reflectance and index values for all major crop types (soybean, maize, rice, and potato) were extracted from the target study years using the collected ground truth samples. These values were then aggregated and visualized using box plots to examine their statistical distributions, including medians and interquartile ranges, across the growing season. By analyzing the spectral overlap and separation distance between the soybean and non-soybean distributions (as illustrated in Section 4, Figure 5), the optimal threshold was identified [54]. This threshold was selected to maximize the distinction between classes, thereby achieving the highest possible overall accuracy and Kappa coefficient.
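One simple way to operationalize such a threshold search is an exhaustive scan over candidate cut points, keeping the one with the highest overall accuracy on the training samples. The sketch below is an assumption about how this could be done, not the procedure used in the study (which relied on box-plot analysis); the function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def overall_accuracy(values: Sequence[float], is_soybean: Sequence[bool], threshold: float) -> float:
    """Share of samples classified correctly when index >= threshold means soybean."""
    correct = sum((v >= threshold) == label for v, label in zip(values, is_soybean))
    return correct / len(values)


def best_threshold(values: Sequence[float], is_soybean: Sequence[bool]) -> Tuple[float, float]:
    """Scan midpoints between sorted unique index values.

    Returns (threshold, overall accuracy). Ties are resolved in favour
    of the lowest candidate threshold.
    """
    unique = sorted(set(values))
    candidates: List[float] = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    if not candidates:
        raise ValueError("need at least two distinct index values")
    scored = [(t, overall_accuracy(values, is_soybean, t)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical training samples: index value and whether the sample is soybean.
sample_values = [0.5, 0.75, 1.0, 1.5, 1.75, 2.0]
sample_labels = [False, False, False, True, True, True]
threshold, accuracy = best_threshold(sample_values, sample_labels)
```

The same scan could score candidates by the Kappa coefficient instead of overall accuracy, which matters when soybean and non-soybean samples are strongly imbalanced.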
Figure 5.
Boxplots of spectral differences between LICI and NRLI. The horizontal red solid lines represent the separation threshold (ST) between maize and soybean based on LICI and NRLI. (a) shows the classification results using LICI, and (b) shows the results using NRLI.
3.3.2. Soybean Classification Using RF
RF is a non-parametric classification method based on the Bagging ensemble strategy, which constructs multiple decision trees and determines the class of each sample through majority voting. It possesses strong nonlinear fitting capability and feature selection ability [55,56,57]. In this study, the input features for RF consisted of NRLI and multiple temporal vegetation indices (e.g., NDVI, EVI, and the Greenness and Water Content Composite Index (GWCCI)) to fully exploit spectral and vegetation index information, thereby improving classification accuracy.
For the training and validation datasets in research areas C1 and C2, real ground truth data were collected through field surveys in the selected study areas. Additionally, data from existing publicly available datasets (referenced in Section 2.3, Table 2) were incorporated. The number of labeled samples for China exceeded 3000 points, with 70% of the samples used for training and 30% for validation. The field survey data provided a reliable source of ground truth labels for the training process. The training and validation datasets for research areas U1 and U2 were derived from publicly available datasets (referenced in Section 2.3, Table 2), including data from USDA CDL and other remote sensing sources. A total of 5000 labeled points were used, with 70% allocated for training and 30% for validation. These datasets were carefully selected to include a range of agricultural practices and crop types, ensuring a representative sample for model evaluation. For research area B1, the training and validation datasets were obtained from the MapBiomas collection, a publicly accessible remote sensing dataset for land cover mapping. A total of 4000 labeled points were used for training and validation, with 70% of the samples used for training and 30% for validation. The data points were selected to cover different crop types, including soybean, across diverse climatic and environmental conditions. The training and validation datasets for research area A1 were derived from publicly available datasets (referenced in Section 2.3, Table 2). In total, 2500 labeled points were used for training and validation, with 70% allocated for training and 30% for validation. The sampling process ensured that the dataset covered a broad range of soybean cultivation areas in Argentina.
For parameter settings, the number of decision trees (n_estimators) was set to 500, the maximum number of features considered at each split (max_features) was set to ‘sqrt’, and all other parameters were left at their default values in GEE. This configuration balanced model stability, computational efficiency, and classification accuracy [58,59].
3.4. Accuracy Assessment
Both classification methods employed validation samples to calculate overall accuracy (OA), producer’s accuracy (PA), user’s accuracy (UA), and the Kappa coefficient [60]. Additionally, the classification results were compared with official statistical data using the statistical yearbook approach to assess spatial and quantitative consistency, further verifying the reliability of the classification [36]. These two assessment approaches are complementary: the validation sample method evaluates local accuracy, while the statistical yearbook method assesses the reasonableness of the classification results at a regional scale, ensuring a comprehensive evaluation of accuracy and reliability. Particular attention was given to soybean identification, specifically PA (the complement of omission error) and UA (the complement of commission error), as soybean was the focal target of this study. For the RF model, differences in performance on the validation dataset were analyzed to discuss the advantages and limitations of the different algorithms.
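For reference, the four sample-based metrics can be written out for the two-class case (soybean versus non-soybean) using the standard confusion-matrix definitions. The sketch below is illustrative: the function name `binary_accuracy_metrics` and the counts in the example are hypothetical, not results from this study.

```python
from typing import Dict


def binary_accuracy_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard accuracy metrics for a soybean / non-soybean confusion matrix.

    tp: soybean samples mapped as soybean
    fp: non-soybean samples mapped as soybean (commission)
    fn: soybean samples mapped as non-soybean (omission)
    tn: non-soybean samples mapped as non-soybean
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")
    oa = (tp + tn) / n
    pa = tp / (tp + fn) if (tp + fn) else 0.0  # 1 - omission error
    ua = tp / (tp + fp) if (tp + fp) else 0.0  # 1 - commission error
    # Chance agreement expected from the row and column totals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (oa - pe) / (1 - pe) if pe != 1 else 0.0
    return {"OA": oa, "PA": pa, "UA": ua, "Kappa": kappa}


# Hypothetical counts for 100 validation samples.
metrics = binary_accuracy_metrics(tp=40, fp=10, fn=5, tn=45)
```

With these counts, OA is 0.85, the soybean PA is about 0.89, the soybean UA is 0.80, and Kappa is 0.70, showing how Kappa discounts the agreement expected by chance.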
4. Results
4.1. Statistical Separability Analysis
Figure 5 presents the statistical value distributions of LICI and NRLI for soybean, maize, rice, and potato derived from the training samples. In the LICI box plots (Figure 5a), the value ranges of soybean and maize exhibit spectral overlap, with their interquartile ranges intersecting near the separation threshold (ST) of 4.8. In contrast, the NRLI box plots (Figure 5b) show a shift in distribution patterns. The soybean class displays elevated index values with a median exceeding 1.3, while the maize, rice, and potato classes are clustered in a lower value range. A distinct separation between the soybean and non-soybean distributions is observed at the threshold line of 1.3.
This enhanced separability is particularly significant for early identification. As illustrated in Figure 6, the temporal evolution of the Spectral Overlap Rate (SOR) between soybean and maize reveals distinct discriminability patterns across the growing season. In this plot, the x-axis represents the DOY, and the y-axis represents the SOR, where lower values indicate higher separability. The black dashed line at 0.5 serves as a reference threshold; values falling below this line indicate a relatively good separation between the two crops. While all indices exhibit high overlap during the early vegetative stages, the NRLI curve (blue triangles) declines notably more steeply than the traditional indices. Specifically, NRLI crosses the effective separability threshold (SOR < 0.5) as early as DOY 195 (indicated by the vertical gray dashed line). In contrast, both NDVI (pink squares) and EVI (orange circles) show a significant lag, crossing the 0.5 threshold approximately 20–30 days later (around DOY 215–225). Furthermore, NRLI reaches a lower minimum overlap rate (near 0) during the peak window compared to NDVI and EVI, confirming that it not only enables earlier identification but also achieves a higher degree of spectral separation.
Figure 6.
Temporal evolution of the Spectral Overlap Rate (SOR) between soybean and maize throughout the growing season. Note: The overlap rate is calculated as the ratio of the intersection to the union of the interquartile ranges (Q1–Q3) for both crops, with outliers excluded using the Interquartile Range (IQR) method. A lower overlap rate indicates higher spectral separability.
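The SOR definition in the caption of Figure 6 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 1.5 × IQR fence used for outlier removal, the quartile interpolation rule, and all function names are assumptions, since the text only states that the IQR method was used.

```python
from statistics import quantiles


def quartiles(values):
    """Return (Q1, Q3) using linear interpolation between order statistics."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def drop_outliers(values, k=1.5):
    """Remove values outside [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is an assumption)."""
    q1, q3 = quartiles(values)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]


def spectral_overlap_rate(crop_a, crop_b):
    """Intersection over union of the two crops' interquartile ranges (Q1-Q3)."""
    a1, a3 = quartiles(drop_outliers(crop_a))
    b1, b3 = quartiles(drop_outliers(crop_b))
    inter = max(0.0, min(a3, b3) - max(a1, b1))
    union = (a3 - a1) + (b3 - b1) - inter
    if union == 0:
        # Both ranges collapse to a point: full overlap only if they coincide.
        return 1.0 if a1 == b1 else 0.0
    return inter / union


def first_doy_below(doys, sor_values, threshold=0.5):
    """Return the first DOY at which SOR falls below the threshold, or None."""
    for doy, sor in zip(doys, sor_values):
        if sor < threshold:
            return doy
    return None
```

Applying `spectral_overlap_rate` to the soybean and maize samples of each acquisition date, and then `first_doy_below` to the resulting series, would give the kind of threshold-crossing date discussed for Figure 6.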
4.2. Classification Accuracies Using Vegetation Indices
To rigorously evaluate the performance differences between NRLI and commonly used vegetation indices, OA based on the threshold-based method was calculated for the six study regions (U1, U2, A1, C1, C2, and B1). Crucially, to ensure a fair and optimal assessment of the traditional indices (NDVI, EVI, and GWCCI), their classification was conducted using imagery from the specific phenological windows where the spectral difference between soybean and maize was maximized. Even under these optimized conditions, significant differences in soybean classification OA were observed among the indices during 2019–2023 (Figure 7). The results indicate that NRLI achieved the highest classification accuracy in most study areas and years. In the major production regions of U1, U2, C1, and B1, NRLI demonstrated exceptional stability, with OA values consistently near or above 90%. For instance, NRLI reached a peak of 93.2% in U1 (2019) and 94.46% in C1 (2023), while maintaining robust performance in B1, ranging from 90.9% to 93.4%. In contrast, despite using the optimal time windows, EVI and NDVI exhibited considerably lower accuracy, generally fluctuating between 70% and 80%. For example, in U1, NDVI ranged from 70.23% to 76.35%, and in B1, it was as low as 69.96% (2019). GWCCI generally performed better than EVI and NDVI but remained inferior to NRLI in most scenarios. While GWCCI showed competitive accuracy in C1 (e.g., 92.6% in 2020), it often lagged behind NRLI in other regions, such as in A1 (2023), where GWCCI achieved 81.9% compared to NRLI’s 88.56%. Overall, across the study period and regions, NRLI improved OA by approximately 12–23 percentage points compared with NDVI and EVI, and generally outperformed GWCCI by 2–8 percentage points, validating its superior capability for precise soybean mapping.
Figure 7.
Overall accuracy (%) of soybean classification using different vegetation indices (Enhanced Vegetation Index (EVI), Normalized Difference Vegetation Index (NDVI), NRLI, Greenness and Water Content Composite Index (GWCCI)) across six study regions (U1, U2, A1, C1, C2, B1) over a five-year period (2019–2023). Each panel (a–f) represents a different study region, showing the variation in classification accuracy for each year. The bar colors correspond to the different indices, with NRLI shown in yellow, EVI in blue, NDVI in green, and GWCCI in light green.
4.3. Classification Accuracy of NRLI, GWCCI Versus RF
To systematically evaluate the soybean classification performance of different methods, a comparative analysis of the classification results for RF, GWCCI, and NRLI was conducted across six study areas (U1, U2, A1, C1, C2, and B1) (Figure 8). Evaluation metrics included OA, UA, PA, and the Kappa coefficient (Table 3). The results indicate that the RF model generally achieved the best performance in most regions, such as U1 (OA = 91.3%, Kappa = 0.86), U2 (OA = 90.5%, Kappa = 0.88), and C1 (OA = 93.5%, Kappa = 0.86), demonstrating clear advantages in both classification accuracy and consistency. The NRLI method exhibited comparable performance to RF in some regions and even outperformed it in C2 (OA = 90.45%, Kappa = 0.85) and B1 (OA = 92.6%, Kappa = 0.85), indicating strong discriminative capability in complex terrains or spectrally mixed environments. In contrast, GWCCI showed relatively lower classification accuracy overall, particularly in regions with pronounced topography, such as A1 (OA = 80.11%, Kappa = 0.68) and C2 (OA = 85.54%, Kappa = 0.71). Overall, both RF and NRLI achieved high-accuracy soybean identification, with NRLI demonstrating potential for enhancing crop separability. GWCCI, on the other hand, is more suitable for regional-scale vegetation condition assessment but remains limited for fine-scale land cover classification.
Figure 8.
Soybean classification results in 2022 using NRLI, GWCCI, and the RF algorithm.
Table 3.
Classification accuracy of soybean in 2022 using NRLI, GWCCI, and the Random Forest (RF) algorithm.
4.4. Detailed Classification Results
The soybean classification results varied noticeably across different geographic conditions (Figure 9). In the United States, the U1 and U2 regions are characterized by flat terrain and large, consolidated agricultural areas. Both RF and NRLI performed excellently in these regions, achieving high overall classification accuracy, whereas GWCCI exhibited relatively lower performance. In Argentina, the A1 region has a more scattered distribution of soybean fields; RF and NRLI were able to effectively identify soybean planting areas, although some misclassification occurred in edge and transitional zones, while GWCCI performed poorly. In China, the C1 and C2 regions feature complex topography and diverse land use. RF and NRLI maintained high classification accuracy in most areas, but some misclassifications remained in mountainous and hilly zones. GWCCI showed relatively stable performance in these regions, though its accuracy was slightly lower. In Brazil, the B1 region contains extensive and uniformly distributed soybean fields. RF and NRLI accurately classified large planting areas, whereas GWCCI exhibited limited capability in regions with interspersed land uses or rapidly changing conditions. Overall, RF and NRLI demonstrated stable performance across varying geographic and cultivation conditions, particularly excelling in large-scale, consolidated, or uniformly distributed planting areas. In contrast, GWCCI showed lower classification accuracy in complex or scattered distribution regions.
Figure 9.
Spatial details of soybean fields derived from GWCCI, RF, and NRLI classifications. The colored circles highlight the discrepancies between the methods in each region. The yellow regions represent soybean areas, and the differences between the methods in classifying soybean areas are marked within the circles. Reference data are used for validation, sourced from the dataset in Section 2.3.
4.5. National-Scale Area Comparison and Statistical Validation
To comprehensively evaluate the superiority and transferability of the NRLI index, the classification framework was extended from local study sites to a national scale across China, the United States, Brazil, and Argentina for the year 2022. A county-level comparison was conducted to validate the consistency between the satellite-derived soybean areas (from both NRLI and the benchmark GWCCI model) and reference statistics. Reference data sources differed between China and the other three countries. For China, official soybean sowing areas for individual counties were manually compiled from the 2023 Agricultural Statistical Yearbooks, retrieved via the China National Knowledge Infrastructure (CNKI). For the United States, Brazil, and Argentina, reference areas were aggregated from high-confidence land cover datasets (e.g., USDA CDL, MapBiomas) downloaded to a local environment. Data processing was carried out using PyCharm 2023.1.2. We calculated the total soybean pixel area within each administrative unit for both models and compared these against the reference values. The agreement was quantified using the coefficient of determination (R²) and the root mean square error (RMSE). Detailed results of this comparative analysis, including the fitting performance and error metrics for each country, are presented in Section 4.6.
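The county-level comparison described above can be sketched as follows. This is an illustrative outline under stated assumptions: R² is computed here as the coefficient of determination of the least-squares line through the scatter (the text does not specify whether the fitted line or the 1:1 line was used), RMSE follows the definition in Appendix A, and `county_areas`, `pixel_area`, and the example values are hypothetical.

```python
from math import sqrt


def county_areas(county_ids, soybean_flags, pixel_area):
    """Sum the area of soybean pixels per administrative unit.

    pixel_area is the ground area of one pixel in the desired unit (assumed input).
    """
    areas = {}
    for cid, is_soybean in zip(county_ids, soybean_flags):
        areas[cid] = areas.get(cid, 0.0) + (pixel_area if is_soybean else 0.0)
    return areas


def rmse(reference, predicted):
    """Root mean square error between reference and predicted areas."""
    n = len(reference)
    return sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def r_squared(reference, predicted):
    """R2 of the least-squares line predicted = a * reference + b."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, predicted))
    return sxy ** 2 / (sxx * syy)


# Hypothetical county areas for illustration only.
reference_areas = [100.0, 200.0, 300.0, 400.0]
predicted_areas = [110.0, 190.0, 310.0, 390.0]
example_rmse = rmse(reference_areas, predicted_areas)
example_r2 = r_squared(reference_areas, predicted_areas)
```

Note that a high R² from a fitted line does not by itself imply agreement with the 1:1 line, which is why RMSE is reported alongside it.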
4.6. Model Performance Assessment
In 2022, across the four major soybean-producing countries (China, the United States, Brazil, and Argentina), a stability assessment based on the optimal classification period indicated that the NRLI model generally outperformed GWCCI [14] (Figure 10) in terms of both prediction error and fitting accuracy. In China, NRLI exhibited superior performance, with a higher coefficient of determination (R² = 0.911) and lower RMSE (26,663) compared to GWCCI (R² = 0.899, RMSE = 28,157) (Figure 10a). In the United States, NRLI achieved an R² of 0.933 and an RMSE of 11,253, clearly surpassing GWCCI, which yielded a lower R² of 0.856 and a higher RMSE of 12,604 (Figure 10b). In Brazil, NRLI performed most prominently, recording the highest goodness of fit (R² = 0.973) and a substantially lower RMSE (46,473) compared to GWCCI (R² = 0.866, RMSE = 75,769) (Figure 10c). In Argentina, the comparison was mixed: NRLI yielded an R² of 0.833 and an RMSE of 40,809, whereas GWCCI yielded a slightly higher R² of 0.841 and a considerably lower RMSE of 24,292 (Figure 10d), indicating some variability in this region. Overall, NRLI demonstrated lower prediction errors and stronger correlations (higher R² values) in three of the four major producing countries (China, the United States, and Brazil), highlighting its reliability in soybean area estimation. Although NRLI had a marginally lower R² and a higher RMSE in Argentina, it still maintained a level of explanatory power comparable to that of GWCCI, supporting its robustness for cross-regional applications.
Figure 10.
Scatter plots of NRLI and GWCCI in 2022 for China, the United States, Brazil, and Argentina. The scatter plots illustrate the agreement between the predicted planting areas (y-axis) and the reference areas (x-axis) for: (a) China, (b) United States, (c) Brazil, and (d) Argentina. The green dots represent the proposed NRLI model, while the orange dots represent the GWCCI model. The solid lines indicate the linear regression fit for each model. The red dashed line represents the 1:1 line (Y = X). Statistical metrics, including R² and RMSE, are provided for each region to quantify model performance.
5. Discussion
5.1. Separability Reliability of NRLI
Soybean and maize exhibit highly similar spectral and phenological characteristics, making them one of the most challenging crop combinations for remote sensing-based classification. Spectrally, both crops possess comparable canopy structures, chlorophyll content, and leaf area index (LAI) during the early-to-mid growing season, resulting in nearly overlapping reflectance in the visible and NIR bands. Consequently, commonly used vegetation indices such as NDVI, EVI, and Land Surface Water Index (LSWI) show substantial overlap between these two crops [14,16,22,25,36]. Phenologically, maize and soybean are often sown simultaneously in major producing regions, grow concurrently, and reach peak growth around the same time. Their time-series vegetation index curves are therefore highly similar, posing challenges for traditional classification approaches based on growth curve matching or Dynamic Time Warping (DTW) [16]. Moreover, conventional greenness indices such as NDVI tend to saturate at high canopy cover, reducing sensitivity to differences in chlorophyll content and canopy structure, which further limits discrimination capability [22]. Variations in climate, cultivar, and management practices can also induce significant intra-class variability in spectral and phenological traits of maize and soybean [36]. For example, differences in irrigation or soil moisture may blur their spectral boundaries [61,62]. The fundamental challenge for early soybean identification is therefore the high synchrony in spectral response and phenological progression between soybean and maize [49], making simple single-band or time-series index methods inadequate for capturing subtle physiological differences. To address this, the present study combines the LICI with a spectral difference enhancement term (NIR − RENDVI).
LICI captures the relationship between red-edge slope and photosynthetic characteristics, while (NIR−RENDVI) amplifies spectral differences associated with canopy structure and chlorophyll content, thereby enabling directional enhancement of soybean-specific features. The superior separability demonstrated by NRLI, particularly at its peak value, is fundamentally rooted in the distinct physiological and morphological divergences between soybean and maize during their critical reproductive stages. From the perspective of growth phenology and canopy structure, the peak value of NRLI typically corresponds to the flowering-to-podding stage for soybean (R1–R4) and the silking-to-filling stage for maize (R1–R3) [63]. During this window, soybean (a dicotyledonous C3 crop) develops a dense, broadleaf canopy with high closure, whereas maize (a monocotyledonous C4 crop) maintains an erectophile leaf architecture [49,64,65]. While traditional indices saturate due to high biomass in both crops, the dense structure of the soybean canopy induces a stronger multiple scattering effect in the NIR region (Figure 2a) [66], which the proposed (NIR−RENDVI) term is specifically designed to amplify (Figure 2c).
Regarding physicochemical parameters, the trends in chlorophyll content and leaf water potential differ significantly [67]. As soybean enters the pod-filling stage, it exhibits a “stay-green” trait, maintaining relatively high and stable chlorophyll concentration and canopy water content to support seed development [68]. This physiological trend is effectively captured by RENDVI (Figure 2b), where soybean maintains higher values compared to maize due to the elevated red-edge reflectance slope. NRLI integrates this red-edge sensitivity with the structural information from NIR. This physiological divergence translates directly into an advantage for early identification. To verify this, the SOR between soybean and maize was calculated from DOY 150 onwards (Figure 6), with outliers excluded using the Interquartile Range (IQR) method. As shown in Figure 6, traditional indices like NDVI and EVI maintain high spectral overlap rates (SOR > 0.5) throughout much of the growing season, indicating limited separability. In contrast, the NRLI curve exhibits a steeper decline, dropping below the overlap threshold of 0.5 significantly earlier (around DOY 195) compared to the other indices (around DOY 220). This temporal lead confirms that NRLI effectively amplifies emerging spectral differences sooner than biomass-based indices, enabling earlier crop identification.
Beyond temporal advantages, NRLI also demonstrates significant chlorophyll-signal enhancement and statistical separability. As presented in Section 4.1, comparative analysis reveals that within LICI alone, considerable index overlap limits stable discrimination (Figure 5a). In contrast, NRLI successfully overcomes this by integrating the structure-pigment enhancement factor [51], establishing a clear separation boundary (Figure 5b). In Figure 5, the red line represents the separation threshold (ST): 4.8 for LICI and 1.3 for NRLI. NRLI significantly widens inter-class intervals, with soybean showing the highest median values clearly distinguished from maize, rice, and potato at the threshold line (ST = 1.3). Importantly, this improvement stems from the strategic amplification of existing signals rather than additional bands. By integrating (NIR − RENDVI) (Figure 2c), NRLI enhances the spectral response differences among crops and reduces distribution overlap. Compared with other indices like GWCCI, which is sensitive to soil background and atmospheric variability [14], NRLI directly amplifies the red-edge–NIR signals associated with crop type. Overall, the dual advantages of early identification capability and robust chlorophyll enhancement make NRLI a powerful tool for cross-regional mapping.
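The index construction and thresholding discussed in this subsection can be illustrated with a short sketch. It takes RENDVI and LICI as precomputed inputs, because their band-level definitions are not restated here, and it uses the product form (NIR − RENDVI) × LICI given in Section 5.4 together with the separation threshold of 1.3 from Section 4.1. Treating values equal to the threshold as soybean is an assumption, and the example pixel values are hypothetical.

```python
def nrli(nir, rendvi, lici):
    """NRLI as the product of the enhancement term (NIR - RENDVI) and LICI."""
    return (nir - rendvi) * lici


def is_soybean(nrli_value, threshold=1.3):
    """Label a pixel as soybean when its NRLI reaches the separation threshold.

    Using >= rather than > at the threshold is an assumption of this sketch.
    """
    return nrli_value >= threshold


# Hypothetical (NIR, RENDVI, LICI) triples for two pixels, for illustration only.
pixels = [(0.5, 0.2, 5.0), (0.4, 0.3, 4.0)]
values = [nrli(n, r, l) for n, r, l in pixels]
labels = [is_soybean(v) for v in values]
```

Because LICI acts as a multiplier, any difference in (NIR − RENDVI) between two crops is scaled up in proportion to LICI, which is the amplification effect described in the text.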
5.2. Advantages of Peak-Based Temporal Selection for NRLI
This study proposes a per-pixel compositing strategy based on the NRLI peak to enhance the temporal adaptability and spectral discriminability of crop classification. Compared to conventional approaches relying on fixed-date imagery, this strategy dynamically analyzes the full-year time series to achieve phenology-adaptive alignment at the pixel level. The advantages of this approach are threefold. First, it automatically selects the peak date for each pixel based on crop sowing differences and cultivar characteristics, mitigating classification bias caused by inconsistent phenological rhythms across the region [69]. Second, by employing the Maximum Value Composite (MVC) principle to extract the peak image, the method substantially reduces interference from clouds, shadows, and missing data [70]. Since the NRLI values for healthy, dense soybean canopies are typically much higher than those for atmospheric noise, selecting the temporal maximum effectively filters out these lower-value components, thereby improving the completeness and robustness of the feature information. Third, when combined with NRLI’s spectral difference amplification mechanism, this peak-based temporal selection further enhances the separability of soybean from other crops. However, certain limitations must be acknowledged. The method’s effectiveness is inherently contingent upon the availability of high-frequency clear observations during the critical growth stages. Although prior cloud masking in GEE minimizes noise, the strategy relies on capturing at least one clear observation during the peak window. In regions with persistent and continuous cloud cover throughout the growing season (e.g., tropical monsoon zones), or where satellite revisit rates are insufficient, the “true” phenological peak may not be captured, potentially leading to the selection of suboptimal features or pseudo-peaks [71].
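The per-pixel peak selection described above can be sketched as follows. This is a minimal local illustration rather than the GEE workflow used in the study: masked observations (clouds, shadows, missing data) are assumed to be encoded as None or NaN, and the function names and example series are hypothetical.

```python
from math import isnan


def peak_of_series(doys, values):
    """Return (peak_value, peak_doy) for one pixel, skipping masked observations."""
    best_value, best_doy = None, None
    for doy, value in zip(doys, values):
        if value is None or isnan(value):
            continue  # masked observation: cloud, shadow, or missing data
        if best_value is None or value > best_value:
            best_value, best_doy = value, doy
    return best_value, best_doy


def peak_composite(doys, pixel_series):
    """Apply the maximum-value rule independently to every pixel's time series."""
    return [peak_of_series(doys, series) for series in pixel_series]


# Hypothetical NRLI series for two pixels over three acquisition dates.
doys = [180, 200, 220]
series = [
    [0.5, 1.4, float("nan")],                      # peak on DOY 200
    [float("nan"), float("nan"), float("nan")],    # no clear observation
]
composite = peak_composite(doys, series)
```

The second pixel shows the limitation noted in the text: when no clear observation exists in the window, no peak can be selected, and a pixel with only off-peak clear dates would return a pseudo-peak instead.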
5.3. Comparative Analysis with Machine Learning Algorithm
To rigorously validate the robustness of the proposed index, we benchmarked the NRLI-based threshold method against the RF algorithm, a widely used machine learning classifier known for its ability to handle high-dimensional feature spaces [58]. The results (Table 3) revealed a compelling finding: the single-index NRLI strategy achieved classification accuracies statistically comparable to the multi-feature RF model (OA difference < 2%) and even marginally outperformed RF in specific regions such as C2 and B1. This performance parity can be attributed to the efficiency of physically-based feature extraction versus data-driven learning [28,68]. While RF relies on aggregating information from multiple spectral bands to construct decision trees, this process can be sensitive to feature redundancy and noise, particularly when spectral separability is low [27]. In contrast, NRLI is designed to directly amplify the specific physiological divergence (i.e., chlorophyll density and canopy structure) between soybean and maize during the reproductive stage. By condensing the diagnostic information into a single, highly discriminative metric, NRLI minimizes the “noise” that might confuse a machine learning model trained on raw bands. Furthermore, the generalization capability of NRLI appears superior in heterogeneous environments. Machine learning models often carry a risk of overfitting to the specific conditions of the training data, leading to performance degradation when applied to regions with different soil or climatic characteristics (e.g., the slight underperformance of RF in B1) [72]. NRLI, being grounded in crop physiology, maintains a stable response across regions. Consequently, NRLI offers a distinct operational advantage: it provides high-accuracy mapping capabilities without the computational burden, extensive training data requirements, and “black-box” complexity associated with machine learning workflows.
5.4. Spatiotemporal Transferability
From a spatiotemporal perspective, NRLI demonstrates stronger robustness for cross-regional and cross-condition soybean monitoring. Based on the optimal classification period in 2022 across four major producing countries, NRLI achieved high fitting accuracy and low RMSE in China, the United States, and Brazil, significantly outperforming GWCCI, indicating consistent discriminative capability across different latitudes and environmental conditions. This performance reflects NRLI’s spectral difference amplification mechanism, implemented as (NIR − RENDVI) × LICI, which maintains stable separability across different fields and climate zones while reducing threshold drift caused by geographic or management variability. In contrast, GWCCI is more sensitive to soil moisture and environmental conditions, resulting in greater fluctuations in fitting performance in some regions [14,68]. It should be noted that NRLI exhibited slightly higher RMSE in Argentina, suggesting that in certain spatial contexts, its performance may still be affected by mixed cropping patterns or fragmented fields. Overall, NRLI achieved higher fitting accuracy, lower errors, and threshold variation across countries of less than 0.2, demonstrating superior spatiotemporal stability and transferability. Therefore, NRLI offers greater reliability and potential for cross-regional soybean area estimation and long-term dynamic monitoring compared to GWCCI.
5.5. Potential Applications and Future Research Directions
This study demonstrates the significant potential of NRLI for improving soybean classification, particularly in distinguishing it from maize, one of the most challenging crop pairs for remote sensing-based classification [73]. By combining LICI with the (NIR − RENDVI) term, NRLI enhances the separability between soybean and other crops, as it amplifies spectral differences that are often indistinguishable with conventional methods. Future research should expand the use of NRLI to other crop types and environments to assess its broader applicability. Additionally, refining threshold selection methods and incorporating environmental variables such as soil moisture could further improve classification accuracy. The integration of multi-sensor and multi-temporal data, alongside advanced machine learning techniques, is also recommended. For instance, using data from Sentinel-1 radar and WorldView high-resolution imagery could mitigate issues with cloud cover. Exploring deep learning models may further enhance accuracy, particularly in complex and heterogeneous landscapes.
6. Conclusions
This study proposes a chlorophyll-enhanced index for soybean classification, the NRLI, which is constructed based on the distinct spectral characteristics of soybean at critical growth stages. By effectively integrating the (NIR − RENDVI) structure–pigment enhancement factor with the chlorophyll-sensitive LICI, NRLI maintains high sensitivity to chlorophyll variation while ensuring stability against environmental background noise, without requiring additional spectral bands. Validation results across four typical soybean-producing regions (China, the United States, Brazil, and Argentina) showed overall accuracies of 92.3%, 90.8%, 88.5%, and 88.1%, respectively. These figures represent improvements of approximately 10 to 20 percentage points over existing indices (e.g., NDVI, EVI, GWCCI), demonstrating the proposed method’s higher accuracy and cross-regional stability. Moreover, benchmarking against the RF algorithm underscores the methodological significance of this study. The finding that NRLI, a simple threshold-based method, can achieve accuracy levels comparable to, or even superior to (as seen in regions C2 and B1), the multi-feature RF model suggests that capturing distinct physiological signatures can be as effective as high-dimensional feature learning. This establishes NRLI as a streamlined, computationally efficient alternative that eliminates the complexity of feature engineering typically required by machine learning, lowering the barrier for operational large-scale applications. NRLI thus provides a concise, robust, and transferable technical solution for soybean remote sensing identification. Despite the promising results, some limitations were observed. The classification accuracy of NRLI may be influenced by environmental variables such as soil moisture, vegetation density, and atmospheric conditions.
In regions with significant environmental heterogeneity, the performance of NRLI could be further optimized by integrating additional factors such as soil properties or seasonal changes. Future research should focus on refining threshold selection and improving model calibration by incorporating more diverse environmental conditions, as well as expanding the dataset to include a broader range of crops and climates. Additionally, the inclusion of multi-sensor and multi-temporal data could help address the limitations of single-source data and enhance the robustness of NRLI for global applications.
Author Contributions
Conceptualization, D.L. and C.L.; methodology, D.L. and Z.Z.; Writing—original draft preparation, D.L. and C.L.; writing—review and editing, B.Z., Z.Z. and K.S.; funding acquisition, B.Z., Z.Z. and K.S.; All authors have read and agreed to the published version of the manuscript.
Funding
This study was supported by the Changbai Talent Program of Jilin Province, China (Grant No. 202431210) and Jilin Province Natural Science Foundation (YDZJ202301ZYTS239).
Data Availability Statement
The data used in this study are subject to legal and contractual restrictions imposed by the data providers and cannot be publicly released. According to the data usage agreement and relevant regulatory requirements, redistribution of the original datasets is not permitted. However, to support research transparency, portions of the processed data used in this study can be made available from the corresponding author upon reasonable request, subject to compliance with data provider policies and applicable legal constraints.
Acknowledgments
The authors would like to thank the National Earth System Science Data Center, China (www.geodata.cn) for providing data support.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
| Abbreviation | Full Name | Formula |
| NRLI | Enhanced Chlorophyll Index | (NIR − RENDVI) × LICI |
| EVI | Enhanced Vegetation Index | 2.5 × (NIR − Red)/(NIR + 6 × Red − 7.5 × Blue + 1) |
| GWCCI | Greenness and Water Content Composite Index | |
| NDVI | Normalized Difference Vegetation Index | (NIR − Red)/(NIR + Red) |
| SWIR | Shortwave Infrared | B11 or B12 |
| LSWI | Land Surface Water Index | (NIR − SWIR)/(NIR + SWIR) |
| RF | Random Forest | Ensemble method of decision trees |
| HLS | Harmonized Landsat and Sentinel | Time-series fusion of Landsat and Sentinel |
| UAV | Unmanned Aerial Vehicle | Aerial-based remote sensing system |
| GEE | Google Earth Engine | Cloud-based geospatial platform |
| USDA CDL | United States Department of Agriculture Cropland Data Layer | Agricultural land cover data for the U.S. |
| GIS | Geographic Information System | Spatial data management tool |
| R² | Coefficient of Determination | |
| RMSE | Root Mean Square Error | sqrt(mean((observed − predicted)²)) |
| LAI | Leaf Area Index | Leaf area per unit ground area |
| DOY | Day of Year | Day number within a year |
| OA | Overall Accuracy | (TP + TN)/(TP + TN + FP + FN) |
| UA | User’s Accuracy | TP/(TP + FP) |
| PA | Producer’s Accuracy | TP/(TP + FN) |
| ST | Separation Threshold | Threshold that best separates classes |
| CNKI | China National Knowledge Infrastructure | Chinese database for academic resources |
| DTW | Dynamic Time Warping | Algorithm for matching time-series data |
| R1–R3 | Silking-to-Filling stage (Maize)/Flowering-to-Podding stage (Soybean) | Growth stages of maize and soybean |
| C3 | C3 photosynthetic pathway | C3 plants, including soybean |
| C4 | C4 photosynthetic pathway | C4 plants, including maize |
| SOR | Spectral Overlap Rate | Ratio of the intersection to the union of the two crops’ interquartile ranges (Q1–Q3) |
| IQR | Interquartile Range | Statistical range between the 25th and 75th percentile |
| Q1–Q3 | First Quartile to Third Quartile | Middle 50% range between first and third quartiles |
| MVC | Maximum Value Composite | Selecting peak value from temporal series |
References
- Li, H.; Song, X.-P.; Hansen, M.C.; Becker-Reshef, I.; Adusei, B.; Pickering, J.; Wang, L.; Wang, L.; Lin, Z.; Zalles, V.; et al. Development of a 10-m resolution maize and soybean map over China: Matching satellite-based crop classification with sample-based area estimation. Remote Sens. Environ. 2023, 294, 113623. [Google Scholar] [CrossRef]
- Zhang, S.B.; Shi, B.F. The Asymmetric Tail Risk Spillover from the International Soybean Market to China’s Soybean Industry Chain. Agriculture 2024, 14, 1198. [Google Scholar] [CrossRef]
- Fisk, C.; Clarke, K.D.; Delean, S.; Lewis, M.M. Distinguishing Photosynthetic and Non-Photosynthetic Vegetation: How Do Traditional Observations and Spectral Classification Compare? Remote Sens. 2019, 11, 2589. [Google Scholar] [CrossRef]
- Shi, H.Z.; Liu, Z.Y.; Li, S.Q.; Jin, M.; Tang, Z.J.; Sun, T.; Liu, X.C.; Li, Z.J.; Zhang, F.C.; Xiang, Y.Z. Monitoring Soybean Soil Moisture Content Based on UAV Multispectral and Thermal-Infrared Remote-Sensing Information Fusion. Plants 2024, 13, 2417. [Google Scholar] [CrossRef] [PubMed]
- Abubakar, G.A.; Wang, K.; Shahtahamssebi, A.; Xue, X.; Belete, M.; Gudo, A.J.A.; Mohamed Shuka, K.A.; Gan, M. Mapping Maize Fields by Using Multi-Temporal Sentinel-1A and Sentinel-2A Images in Makarfi, Northern Nigeria, Africa. Sustainability 2020, 12, 2539. [Google Scholar] [CrossRef]
- Griffel, L.M.; Delparte, D.; Edwards, J. Using Support Vector Machines classification to differentiate spectral signatures of potato plants infected with Potato Virus Y. Comput. Electron. Agric. 2018, 153, 318–324. [Google Scholar] [CrossRef]
- Peña, J.; Gutiérrez, P.; Hervás-Martínez, C.; Six, J.; Plant, R.; López-Granados, F. Object-Based Image Classification of Summer Crops with Machine Learning Methods. Remote Sens. 2014, 6, 5019–5041. [Google Scholar] [CrossRef]
- Rußwurm, M.; Courty, N.; Emonet, R.; Lefèvre, S.; Tuia, D.; Tavenard, R. End-to-end learned early classification of time series for in-season crop type mapping. ISPRS J. Photogramm. Remote Sens. 2023, 196, 445–456.
- Ashourloo, D.; Nematollahi, H.; Huete, A.; Aghighi, H.; Azadbakht, M.; Shahrabi, H.S.; Goodarzdashti, S. A new phenology-based method for mapping wheat and barley using time-series of Sentinel-2 images. Remote Sens. Environ. 2022, 280, 113206.
- Xu, Y.J.; Ma, Y.C.; Zhang, Z. Self-supervised pre-training for large-scale crop mapping using Sentinel-2 time series. ISPRS J. Photogramm. Remote Sens. 2024, 207, 312–325.
- Gadiraju, K.K.; Vatsavai, R.R. Remote Sensing Based Crop Type Classification Via Deep Transfer Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 4699–4712.
- Hamza, M.A.; Alrowais, F.; Alzahrani, J.S.; Mahgoub, H.; Salem, N.M.; Marzouk, R. Squirrel Search Optimization with Deep Transfer Learning-Enabled Crop Classification Model on Hyperspectral Remote Sensing Imagery. Appl. Sci. 2022, 12, 5650.
- Tanzer, D.N.; Witharana, C.; Fahey, R.T. Classification of tree mortality following drought-defoliation interaction using single date Landsat imagery and comparison to aerial detection surveys. Int. J. Appl. Earth Obs. Geoinf. 2025, 139, 104488.
- Chen, H.; Li, H.; Liu, Z.; Zhang, C.; Zhang, S.; Atkinson, P.M. A novel Greenness and Water Content Composite Index (GWCCI) for soybean mapping from single remotely sensed multispectral images. Remote Sens. Environ. 2023, 295, 113679.
- Woodcock, C.E.; Loveland, T.R.; Herold, M.; Bauer, M.E. Transitioning from change detection to monitoring with remote sensing: A paradigm shift. Remote Sens. Environ. 2020, 238, 111558.
- Chen, R.; Sun, L.; Chen, Z.; Wuyun, D.; Sun, Z. Early Identification of Corn and Soybean Using Crop Growth Curve Matching Method. Agronomy 2024, 14, 146.
- Zhou, L.F.; Meng, R.; Yu, X.; Liao, Y.G.; Huang, Z.H.; Lü, Z.G.; Xu, B.Y.; Yang, G.D.; Peng, S.B.; Xu, L. Improved Yield Prediction of Ratoon Rice Using Unmanned Aerial Vehicle-Based Multi-Temporal Feature Method. Rice Sci. 2023, 30, 247–256.
- Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. Deep learning techniques to classify agricultural crops through UAV imagery: A review. Neural Comput. Appl. 2022, 34, 9511–9536.
- Cheng, X.; Sun, Y.; Zhang, W.; Wang, Y.; Cao, X.; Wang, Y. Application of Deep Learning in Multitemporal Remote Sensing Image Classification. Remote Sens. 2023, 15, 3859.
- Wang, H.; Chang, W.; Yao, Y.; Yao, Z.; Zhao, Y.; Li, S.; Liu, Z.; Zhang, X. Cropformer: A new generalized deep learning classification approach for multi-scenario crop classification. Front. Plant Sci. 2023, 14, 1130659.
- Ma, Y.Y.; Shen, Y.L.; Guan, H.X.; Wang, J.; Hu, C.L. A novel approach to detect the spring corn phenology using layered strategy. Int. J. Appl. Earth Obs. Geoinf. 2023, 122, 103422.
- You, N.; Dong, J.; Li, J.; Huang, J.; Jin, Z. Rapid early-season maize mapping without crop labels. Remote Sens. Environ. 2023, 290, 113496.
- Murguia-Cozar, A.; Macedo-Cruz, A.; Fernandez-Reynoso, D.S.; Salgado Transito, J.A. Recognition of Maize Phenology in Sentinel Images with Machine Learning. Sensors 2021, 22, 94.
- Kordelas, G.A.; Manakos, I.; Aragones, D.; Diaz-Delgado, R.; Bustamante, J. Fast and Automatic Data-Driven Thresholding for Inundation Mapping with Sentinel-2 Data. Remote Sens. 2018, 10, 910.
- Xiao, G.; Huang, J.; Song, J.; Li, X.; Du, K.; Huang, H.; Su, W.; Miao, S. A novel soybean mapping index within the global optimal time window. ISPRS J. Photogramm. Remote Sens. 2024, 217, 120–133.
- Liu, X.Y.; Zhang, J.S.; Li, X.H.; Shen, K.J.; Zhu, S.; Liang, Z.H. Highly efficient wheat lodging extraction algorithm based on two-peak search algorithm. Precis. Agric. 2025, 26, 27.
- Hu, W.W.; Liu, Y.; An, J.; Xu, S.P.; Zhou, Z.W.; An, M.M.; Guo, X.K.; Ma, X.; Jiang, W.F.; Wang, Y.S. Intelligent irrigation strategy model for farmland using dung beetle optimization-random forest algorithms. Agric. Water Manag. 2025, 317, 109653.
- Li, X.Y.; Yu, L.; Du, Z.R.; Liu, X.X. Crop Statistic to Annual Map: Tracking spatiotemporal dynamics of crop-specific areas through machine learning and statistics disaggregating. Sci. Data 2025, 12, 1249.
- Wei, P.; Ye, H.C.; Nie, C.J.; Zhang, Y.; Liu, R.H. Remote sensing estimation of nitrogen content in scenes of different crop types based on the random forest algorithm. Comput. Electron. Agric. 2025, 231, 109987.
- van Stekelenburg, A.; Bleize, D.N.M.; Riet, J.V.; Schaap, G.; Vlasceanu, M.; Doell, K.C. Communicating consensus among climate scientists increases estimates of consensus and belief in human-caused climate change across the globe. J. Environ. Psychol. 2024, 100, 102480.
- Beck, H.E.; Zimmermann, N.E.; McVicar, T.R.; Vergopolan, N.; Berg, A.; Wood, E.F. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci. Data 2018, 5, 180214.
- Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote Sens. Environ. 2012, 120, 25–36.
- Zhao, C.P.; Jia, M.M.; Zhang, R.; Wang, Z.M.; Ren, C.Y.; Mao, D.H.; Wang, Y.Q. Mangrove species mapping in coastal China using synthesized Sentinel-2 high-separability images. Remote Sens. Environ. 2024, 17, 114151.
- Yue, J.B.; Tian, Q.J.; Liu, Y.; Fu, Y.Y.; Tian, J.; Zhou, C.Q.; Feng, H.K.; Yang, G.J. Mapping cropland rice residue cover using a radiative transfer model and deep learning. Comput. Electron. Agric. 2023, 215, 108421.
- Venter, Z.S.; Sydenham, M.A.K. Continental-Scale Land Cover Mapping at 10 m Resolution Over Europe (ELC10). Remote Sens. 2021, 13, 2301.
- You, N.; Dong, J.; Huang, J.; Du, G.; Zhang, G.; He, Y.; Yang, T.; Di, Y.; Xiao, X. The 10-m crop type maps in Northeast China during 2017–2019. Sci. Data 2021, 8, 41.
- Boryan, C.; Yang, Z.W.; Mueller, R.; Craig, M. Monitoring US agriculture: The US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer Program. Geocarto Int. 2011, 26, 341–358.
- Souza, C.M.; Shimbo, J.Z.; Rosa, M.R.; Parente, L.L.; Alencar, A.A.; Rudorff, B.F.T.; Hasenack, H.; Matsumoto, M.; Ferreira, L.G.; Souza, P.W.M.; et al. Reconstructing Three Decades of Land Use and Land Cover Changes in Brazilian Biomes with Landsat Archive and Earth Engine. Remote Sens. 2020, 12, 2735.
- Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 2014, 148, 42–57.
- Friedl, M.A.; McIver, D.K.; Hodges, J.C.F.; Zhang, X.Y.; Muchoney, D.; Strahler, A.H.; Woodcock, C.E.; Gopal, S.; Schneider, A.; Cooper, A.; et al. Global land cover mapping from MODIS: Algorithms and early results. Remote Sens. Environ. 2002, 83, 287–302.
- Song, X.P.; Hansen, M.C.; Potapov, P.; Adusei, B.; Pickering, J.; Adami, M.; Lima, A.; Zalles, V.; Stehman, S.V.; Di Bella, C.M.; et al. Massive soybean expansion in South America since 2000 and implications for conservation. Nat. Sustain. 2021, 4, 784–792.
- Zhang, H.; Lou, Z.; Peng, D.; Zhang, B.; Luo, W.; Huang, J.; Zhang, X.; Yu, L.; Wang, F.; Huang, L.; et al. Mapping annual 10-m soybean cropland with spatiotemporal sample migration. Sci. Data 2024, 11, 439.
- Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
- Criminisi, A.; Shotton, J.D.J. Machine Learning Process for Providing Semi-Supervised Random Decision Forest for e.g., Interactive Image Segmentation Application, Involves Training Set of Random Decision Trees to Form Semi-Supervised Random Decision Forest. U.S. Patent 9,519,868, 13 December 2016.
- Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S.; Brisco, B. Google Earth Engine for geo-big data applications: A meta-analysis and systematic review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170.
- Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 2020, 17, 261–272.
- Griffiths, P.; van der Linden, S.; Kuemmerle, T.; Hostert, P. Pixel-Based Landsat Compositing Algorithm for Large Area Land Cover Mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 2088–2101.
- Holben, B.N. Characteristics of Maximum-Value Composite Images from Temporal AVHRR Data. Int. J. Remote Sens. 1986, 7, 1417–1434.
- Viña, A.; Gitelson, A.A.; Nguy-Robertson, A.L.; Peng, Y. Comparison of different vegetation indices for the remote assessment of green leaf area index of crops. Remote Sens. Environ. 2011, 115, 3468–3478.
- Li, D.; Chen, J.M.; Zhang, X.; Yan, Y.; Zhu, J.; Zheng, H.; Zhou, K.; Yao, X.; Tian, Y.; Zhu, Y.; et al. Improved estimation of leaf chlorophyll content of row crops from canopy reflectance spectra through minimizing canopy structural effects and optimizing off-noon observation time. Remote Sens. Environ. 2020, 248, 111985.
- Gitelson, A.; Merzlyak, M.N. Quantitative Estimation of Chlorophyll-A Using Reflectance Spectra—Experiments with Autumn Chestnut and Maple Leaves. J. Photochem. Photobiol. B-Biol. 1994, 22, 247–252.
- Hagolle, O.; Lobo, A.; Maisongrande, P.; Cabot, F.; Duchemin, B.; De Pereyra, A. Quality assessment and improvement of temporally composited products of remotely sensed imagery by combination of VEGETATION 1 and 2 images. Remote Sens. Environ. 2005, 94, 172–186.
- Pena-Barragan, J.M.; Ngugi, M.K.; Plant, R.E.; Six, J. Object-based crop identification using multiple vegetation indices, textural features and crop phenology. Remote Sens. Environ. 2011, 115, 1301–1316.
- Shao, Y.; Lunetta, R.S. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS J. Photogramm. Remote Sens. 2012, 70, 78–87.
- Fu, X.; Chen, Y.; Yan, J.; Chen, Y.; Xu, F. BGRF: A broad granular random forest algorithm. J. Intell. Fuzzy Syst. 2023, 44, 8103–8117.
- Qi, X.; Xu, B.; Wang, Z.P.; Hsu, L.T. Random forest-based multipath parameter estimation. GPS Solut. 2024, 28, 126.
- Zhang, Y.F.; Wang, Y.H.; Gu, Z.F.; Pan, X.R.; Li, J.; Ding, H.; Zhang, Y.; Deng, K.J. Bitter-RF: A random forest machine model for recognizing bitter peptides. Front. Med. 2023, 10, 1052923.
- Chen, M.; Chang, Z.; Jin, C.; Cheng, G.; Wang, S.; Ni, Y. Classification and Recognition of Soybean Quality Based on Hyperspectral Imaging and Random Forest Methods. Sensors 2025, 25, 1539.
- Xie, Y.; Nhu, A.N.; Song, X.-P.; Jia, X.; Skakun, S.; Li, H.; Wang, Z. Accounting for spatial variability with geo-aware random forest: A case study for US major crop mapping. Remote Sens. Environ. 2025, 319, 114585.
- Dey, S.; Bhogapurapu, N.; Homayouni, S.; Bhattacharya, A.; McNairn, H. Unsupervised Classification of Crop Growth Stages with Scattering Parameters from Dual-Pol Sentinel-1 SAR Data. Remote Sens. 2021, 13, 4412.
- Bégué, A.; Arvor, D.; Bellon, B.; Betbeder, J.; de Abelleyra, D.; Ferraz, R.P.D.; Lebourgeois, V.; Lelong, C.; Simoes, M.; Verón, S.R. Remote Sensing and Cropping Practices: A Review. Remote Sens. 2018, 10, 99.
- Ding, Y.M.; Wang, M.Y.; Jin, J.X.; Sun, Z.Y.; Zhang, J.; Zhu, L. Risk of real-time irrigation decision-making system for farmland in arid irrigation districts: Methodology and case study. Agric. Water Manag. 2025, 320, 109851.
- Liu, Y.C.; Mizuta, K.; Morokuma, M.; Toyota, M. Effects of combined high temperature and water stress on soybean growth and physiological processes in a temperature gradient chamber. Field Crops Res. 2025, 333, 110063.
- Nguy-Robertson, A.; Gitelson, A.; Peng, Y.; Viña, A.; Arkebauer, T.; Rundquist, D. Green Leaf Area Index Estimation in Maize and Soybean: Combining Vegetation Indices to Achieve Maximal Sensitivity. Agron. J. 2012, 104, 1336–1347.
- Liao, Q.; Gu, S.J.; Gao, S.Y.; Du, T.S.; Kang, S.Z.; Tong, L.; Ding, R.S. Crop water stress index characterizes maize productivity under water and salt stress by using growth stage-specific non-water stress baselines. Field Crops Res. 2024, 317, 109544.
- Ren, W.J.; Jiang, Q.Q.; Qi, W.L. Research progress in near-infrared spectroscopy for detecting the quality of potato crops. Chem. Biol. Technol. Agric. 2025, 12, 32.
- Yue, J.B.; Wang, J.; Zhang, Z.Y.; Li, C.C.; Yang, H.; Feng, H.K.; Guo, W. Estimating crop leaf area index and chlorophyll content using a deep learning-based hyperspectral analysis method. Comput. Electron. Agric. 2024, 227, 109653.
- Huang, L.; Miao, B.; She, B.; Zhang, A.; Zhao, J.; Ruan, C. Rapid mapping of soybean planting areas under complex crop structures: A modified GWCCI approach. Comput. Electron. Agric. 2025, 235, 110326.
- Liu, L.; Xiao, X.; Qin, Y.; Wang, J.; Xu, X.; Hu, Y.; Qiao, Z. Mapping cropping intensity in China using time series Landsat and Sentinel-2 images and Google Earth Engine. Remote Sens. Environ. 2020, 239, 111624.
- Griffiths, P.; Nendel, C.; Hostert, P. Intra-annual reflectance composites from Sentinel-2 and Landsat for national-scale crop and land cover mapping. Remote Sens. Environ. 2019, 220, 135–151.
- Sudmanns, M.; Tiede, D.; Augustin, H.; Lang, S.F. Assessing global Sentinel-2 coverage dynamics and data availability for operational Earth observation (EO) applications using the EO-Compass. Int. J. Digit. Earth 2020, 13, 768–784.
- Marsola, K.B.; de Oliveira, A.L.R.; Utino, M.Y.R.; Mann, P.; da Conceicao, T.C.O. The Impact of Exogenous Variables on Soybean Freight: A Machine Learning Analysis. Sustainability 2025, 17, 1067.
- Ngcinela, S.; Mushunje, A.; Taruvinga, A.; Mutengwa, S.C.; Masehela, S.T. Mapping the Land Use Changes in Cultivation Areas of Maize and Soybean from 2006 to 2017 in the North West and Free State Provinces, South Africa. Agronomy 2024, 14, 1002.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.