1. Introduction
Mapping target crops before the harvest period is essential for improving agricultural productivity and decision-making. Early crop mapping provides valuable information for crop management, such as predicting yield [1], monitoring crop growth [2,3], and identifying areas with high production potential [4]. In recent years, remote sensing has become widely used for early crop mapping because it is non-invasive and acquires data rapidly. Multispectral imagery (MSI) captured by satellite sensors such as Landsat-8 [5,6,7], Sentinel-2 [7,8], and MODIS [6,9,10] yields valuable insights into crop health, vegetation indices, and land cover classification. These datasets play a pivotal role in assessing crop vigor, identifying stress factors, and mapping crop types, ultimately facilitating more informed agricultural decision-making.
Among the remote sensing indicators used in early crop mapping, the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI) are widely employed for monitoring vegetation growth and identifying different crop types. NDVI quantifies the greenness of vegetation based on the normalized difference between the near-infrared and red spectral bands, allowing for the detection of vegetation density and health. EVI additionally incorporates the blue band to mitigate the influence of atmospheric and canopy background effects, providing enhanced accuracy in characterizing vegetation dynamics. Numerous studies have developed and improved early crop mapping methods using NDVI and EVI data. For instance, harmonized time-series NDVI and EVI from Landsat-8 and Sentinel-2 data have been used with a decision tree method for crop identification [11]. A time-series analysis of MODIS NDVI data was used to map crop types for the US Central Great Plains [9]; the authors applied an unsupervised classification method (ISODATA) to a 15-date NDVI time series to produce a crop/non-crop map. Similarly, ref. [12] used a multi-temporal MODIS NDVI approach to map soybeans in the USA for 2015. Furthermore, ref. [13] reports that general crop maps produced from MODIS EVI and NDVI data both had very high overall accuracy (97.0%) and class-specific user’s and producer’s accuracies (ranging from 95 to 100%) in a case study for southwest Kansas.
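To make the two indices concrete, the sketch below computes NDVI and EVI for a single pixel. It is an illustration only: the EVI coefficients are the ones commonly used for MODIS products, the reflectance values are hypothetical, and returning 0.0 for a zero denominator is a convenience choice rather than part of either definition.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference of near-infrared and red reflectance."""
    denom = nir + red
    # Returning 0.0 for an empty signal is an assumption made for this sketch.
    return (nir - red) / denom if denom != 0 else 0.0


def evi(nir: float, red: float, blue: float) -> float:
    """EVI with the commonly used coefficients G=2.5, C1=6, C2=7.5, L=1."""
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    return 2.5 * (nir - red) / denom if denom != 0 else 0.0


# Hypothetical surface reflectances (0-1 scale) for a dense green canopy pixel.
pixel = {"nir": 0.45, "red": 0.05, "blue": 0.03}
print("NDVI:", round(ndvi(pixel["nir"], pixel["red"]), 3))
print("EVI:", round(evi(pixel["nir"], pixel["red"], pixel["blue"]), 3))
```

The blue-band term in the EVI denominator is what provides the atmospheric correction mentioned above, and the constant term dampens the soil and canopy background signal.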
However, accurately mapping crops at a large scale, such as for the entire conterminous United States (CONUS), is challenging due to the heterogeneous nature of the crop-growing environment. This challenge arises from the intricate interactions among soil type, climate, and topography, which significantly influence the success of crop mapping [10,14,15,16,17,18]. The crop-growing environment displays substantial spatial and temporal heterogeneity. Soil types vary across regions, impacting vegetation growth and the reflectance captured by remote sensing data [10,15,18]. Additionally, climate conditions, including temperature, precipitation, evaporation, wind speed, and solar radiation, vary considerably, affecting crop phenology and productivity [10,14,15,16,17,18]. Moreover, topographic features, such as slope, aspect, and elevation, play a vital role in determining crop growth patterns by influencing factors like solar radiation distribution, water availability, and wind patterns [10,15,18]. Failing to account for these factors in the crop mapping process can lead to inaccurate results, particularly over large areas.
To address this issue, the concept of ecoregions was introduced. Ecoregions, which are geographically distinct areas with unique ecological patterns and processes, provide a valuable framework for large-scale crop mapping. By integrating knowledge of the ecological characteristics of each ecoregion, such as climate, geology, and topography, it becomes possible to identify suitable crop-growing areas and predict crop distribution based on shared environmental characteristics. Crop mapping can then be tailored to the heterogeneity of the crop-growing environment, leading to improved mapping accuracy at a large scale. Accordingly, ecoregion clustering methods have been developed [19], and the static ecoregion clustering method [20] was used for crop mapping in ref. [10]. The ecoregion clustering method analyzes environmental factors, such as soil type, climate, and topography, to divide the study area into multiple ecoregions. Each ecoregion has its own characteristics that affect crop growth, and these characteristics are taken into account during classification. This approach has been shown to improve crop mapping accuracy compared to country- and state-level mapping [10].
Nevertheless, aside from the large-scale problem, the inter-year challenge is also of great significance for precise crop mapping. Climate variation between years within a given region is expected to significantly influence the patterns of vegetation indices (VIs) related to crop growth and, consequently, crop-mapping outcomes. The static ecoregion clustering method provides only a single ecoregion map for the CONUS; it rests on the assumption that ecoregions remain unchanged across years.
To address this limitation, our study considers the fluctuations in climate and introduces a dynamic ecoregion clustering approach. In this paper, we present a novel approach for mapping target crops (soybean and corn) before the harvest period in the CONUS using time-series MODIS 250 m NDVI and EVI data from Google Earth Engine with a dynamic ecoregion clustering method. The dynamic ecoregion clustering method analyzes soil, climate, elevation, and slope data to divide the entire cropland area of the CONUS into multiple ecoregions. Its advantage over the static approach is that it produces a unique ecoregion map for each year in the region of interest. This is particularly important considering the year-to-year variations in climate, which directly impact crop growth and vegetation index patterns. A Random Forest classifier was then trained for each ecoregion to classify the target crop. The results showed that the dynamic clustering method achieved higher accuracy than the static clustering method. Specifically, for soybean mapping in 2018 across the entire CONUS, user’s accuracy increased from 59.04% to 62.74%, an improvement of 3.70 percentage points.
Our contributions can be summarized as follows:
We propose a novel approach for mapping target crops (soybean and corn) earlier than the harvest period in the USA using time-series NDVI and EVI data with a dynamic ecoregion clustering method.
We employ both the elbow method and the silhouette method to ascertain the optimal number of ecoregions. Subsequently, we train an ecoregion clustering model using the K-means++ method, which allows us to generate ecoregion maps for the years 2013 to 2022, covering the entire cropland region within the CONUS.
We demonstrate significantly higher mapping accuracy using the dynamic clustering method compared to the static clustering method.
The rest of this paper is organized as follows. Section 2 describes related works. Section 3 presents the data, methods, and system architecture. Performance evaluation and metrics calculation are presented in Section 4. Finally, we present the discussion and conclusions in Section 5 and Section 6.
2. Related Works
In Table 1, we present various methods related to crop mapping, along with the factors they consider, the datasets utilized, the vegetation indices employed, and the specific research regions. Various classification algorithms, including unsupervised methods like k-means clustering and supervised methods like decision trees and deep learning approaches, have been utilized for crop classification at small scales using Landsat-8 and Sentinel-2A data.
One study [15] uses the Random Forest algorithm to map and predict rice yield through analysis of Sentinel-2 satellite data. Another research effort [11] creates a high-resolution crop intensity mapping methodology by integrating data from Landsat-8 and Sentinel-2 satellites using a random forest algorithm. Furthermore, a hybrid deep-learning architecture called CerealNet [21] has been introduced specifically for cereal crop mapping with Sentinel-2 time-series data. However, ref. [21] is limited in scope, as it examines only a research region characterized by a hot Mediterranean climate with dry summers. This choice of region sidesteps a common challenge in time-series analysis of Landsat-8 and Sentinel-2 images: cloud cover. Cloud cover affects crop mapping by reducing the availability and quality of remote sensing data. Clouds obstruct satellite imagery, leading to the loss or concealment of vital information about the Earth’s surface, and the resulting data gaps undermine the accuracy and reliability of crop-mapping results. The negative impact of cloud cover becomes even more pronounced in large-scale applications, such as mapping crops across the entire CONUS. Such expansive regions are exposed to varying cloud cover patterns, resulting in extensive areas with missing or incomplete data. This increases the risk of biased or inaccurate crop classification, making it challenging to obtain a comprehensive and reliable understanding of crop distribution and dynamics at a broader scale.
To overcome this limitation of Landsat-8 and Sentinel-2 images, some works use time-series MODIS data for early crop mapping. MODIS data mitigate the limitations imposed by cloud cover: the moderate spatial resolution allows for wider coverage, while the frequent revisit time provides more opportunities to capture cloud-free observations, enabling more consistent and reliable monitoring of crop patterns at regional or global scales. Previous research [22,23] has shown that individual major crops, such as corn and soybeans, can be mapped accurately as early as July and August using MODIS data.
However, these studies were also limited in scope, focusing on counties, states, or groups of states. A crop mapping model that is limited in scope cannot be directly applied to a large-scale setting due to spatial differences. Spatial heterogeneity in soil types, climate, topography, and other environmental elements significantly affects crop growth patterns and diminishes the model’s accuracy outside of its intended region. Overcoming this challenge requires integrating spatially explicit information into the method so that it captures the diverse conditions present across the target area. Some works [16,17] use growing degree days (GDD), a metric that quantifies the accumulated heat necessary for the growth and development of vegetation. Its significance lies in its crop-specific nature: the GDD required during each growing stage varies by crop. By considering crop-specific GDD thresholds for the various stages, the metric provides insight into the progression and timing of crop growth and can be leveraged for crop classification and monitoring. Other studies have performed crop classification within smaller units such as Agriculture Statistics Counties and Districts [24], states [22], or agroecological zones [18]. Nonetheless, these methodologies either disregard fluctuations in precipitation and soil characteristics or are conducted within administrative or political boundaries that have no relevance to crop phenology. Alternatively, they cover areas so large that they fail to capture the nuanced phenological fluctuations driven by climatic variations.
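As a brief illustration of how GDD accumulates, the sketch below uses the simple daily-average formulation with a base temperature of 10 °C. Both the formulation and the base temperature are assumptions chosen for illustration; refs. [16,17] may use different variants (for example, with an upper temperature cap), and the temperature values are hypothetical.

```python
def daily_gdd(t_max: float, t_min: float, t_base: float = 10.0) -> float:
    """Heat units for one day: mean temperature above the base, floored at zero."""
    return max(0.0, (t_max + t_min) / 2.0 - t_base)


def accumulated_gdd(daily_temps, t_base: float = 10.0):
    """Running GDD total over a sequence of (t_max, t_min) pairs in deg C."""
    total = 0.0
    running = []
    for t_max, t_min in daily_temps:
        total += daily_gdd(t_max, t_min, t_base)
        running.append(total)
    return running


# Hypothetical three-day temperature record (deg C).
temps = [(24.0, 12.0), (18.0, 4.0), (30.0, 16.0)]
print(accumulated_gdd(temps))
```

Comparing the running total against crop-specific stage thresholds is what makes the metric useful for distinguishing crops that develop at different rates.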
An ideal approach would be to model regions based on environmental variables that reflect crop growing conditions and are of small size, created using quantitative analytical methods that are both empirical and reproducible. Multivariate Geographic Clustering (MGC) algorithms [19] and multivariate spatio-temporal clustering (MSTC) [10,20] have been successfully used to create ecoregions that exist within similar combinations of ecologically relevant conditions such as temperature, precipitation, soil, and topographic properties on a map.
However, a limitation of the study in [10] is that it assumes the ecoregion boundaries remain constant, which may not accurately represent the true variability. In addition, it presents only a single 500-ecoregion map produced directly with MSTC, without any decision-making process to determine the optimal number of ecoregions. To overcome these limitations, our proposed method introduces a dynamic ecoregion map that adjusts the boundaries based on climate data specific to each year’s crop growing season. Additionally, we employ the elbow and silhouette methods to ascertain the optimal number of ecoregions. This dynamic approach provides a more precise and up-to-date representation of crop phenology across diverse regions by adapting the ecoregion boundaries to the environmental conditions of each growing season. Ultimately, in conjunction with the crop classifier, this approach enhances the accuracy of early crop-mapping results.
3. Data and Methods
In this section, we present the study area and data in Section 3.1, the system overview in Section 3.2, the development of the dynamic ecoregions in Section 3.3, and the training and evaluation methods of the crop classification model in Section 3.4.
3.1. Study Area and Data
In this paper, we focus on mapping soybean and corn across the entire cropland area of the CONUS.
To achieve this, we split the cropland area into several ecoregions using soil, climate, elevation, and slope data as environmental data. The cropland area is delineated by the cultivated layer of the Cropland Data Layer (CDL), a crop-specific land cover raster map dataset available for the entire CONUS at 30 m resolution provided by the USDA [25]. As shown in Table 2, our soil data with a resolution of 250 m × 250 m were obtained from the ISRIC SoilGrids Dataset [26] and included parameters such as bulk density, clay particle proportion, total nitrogen, soil pH, sand particle proportion, silt particle proportion, and soil organic carbon content for 2015. As climate data, we use ERA5, the fifth major atmospheric reanalysis produced by ECMWF [27], with a resolution of 11,132 m × 11,132 m from 2013 to 2022; the variables comprise air and soil temperature, precipitation, evaporation, wind speed, solar radiation, runoff, and soil water volume. We also incorporated elevation data with a resolution of 231.92 m × 231.92 m from the GMTED2010 Dataset [28] for 2010 and calculated the slope from the elevation.
Furthermore, we use remote sensing data as the classification input data. We extract time-series MODIS 250 m NDVI and EVI data from Google Earth Engine [29]. The data are captured at a 16-day interval, covering the growing seasons between 2013 and 2022. Our objective is to locate the target crops, namely corn and soybeans, early in the season across the entire CONUS. Corn harvest commences on 1 September, while soybean harvest begins on 1 October. Consequently, each year, we collect VIs data from 1 April to mid-July, for a total of seven time points. This approach accounts for the 16-day temporal frequency of the MODIS data and aligns with the vegetation growth patterns during this period. The CDL data were used as the ground truth for the classification training and evaluation process. Our crop classifiers were trained over 2013–2017 and applied to the period 2018–2022 as test years.
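The count of seven time points can be checked with a short calculation. The sketch below assumes that the 16-day composites start on day of year 1, 17, 33, and so on (as in the Terra MODIS vegetation index products) and takes 15 July as the "mid-July" cutoff; both are assumptions, since the text does not name the exact product or cutoff date.

```python
from datetime import date, timedelta


def composite_dates(year: int, start=(4, 1), end=(7, 15)):
    """Start dates of 16-day composites that fall inside the sampling window."""
    first, last = date(year, *start), date(year, *end)
    jan1 = date(year, 1, 1)
    # Assumed composite schedule: day of year 1, 17, 33, ... within each year.
    starts = [jan1 + timedelta(days=offset) for offset in range(0, 366, 16)]
    return [d for d in starts if d.year == year and first <= d <= last]


for d in composite_dates(2013):
    print(d.isoformat())
print("time points:", len(composite_dates(2013)))
```

Under these assumptions the window holds seven composites in every year from 2013 to 2022, which matches the seven NDVI and seven EVI bands used per composed image.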
3.2. System Overview
The system architecture of our early crop mapping system with a dynamic ecoregion clustering method is shown in Figure 1. In summary, our proposed ecoregion clustering algorithm fetches and standardizes soil, climate, elevation, and slope data from various sources to build a clustering model using the K-means++ method [30]. This model is trained exclusively on target cropland regions from 2013. Using the trained clusterer, we partition the complete cropland area into multiple distinct ecoregions, a process repeated for each year from 2013 to 2022. With the resulting ecoregion maps, we create a specific early crop-mapping classifier for each ecoregion using the Random Forest method. These classifiers are trained on time-series 250 m resolution MODIS NDVI and EVI data from 2013–2017 and predict early crop maps for 2018–2022, using the dynamic ecoregion map for each year. To gather our training data, we randomly sampled NDVI and EVI data points within each ecoregion, spanning from 1 April to mid-July. This timeframe aligns with our objective of locating target crops such as corn and soybean early, as they are typically harvested starting from 1 September. We merge the training data points from multiple training years to train a dedicated crop mapping classifier for each ecoregion. Finally, we apply each classifier to the corresponding ecoregion and mosaic the crop mapping results from all ecoregions to obtain the final crop map.
3.3. Development of Dynamic Ecoregions
Our study focuses on accurately mapping the growing area of soybean and corn crops before their harvest periods throughout the entire cropland area of the CONUS. To achieve this goal, we utilize a dynamic ecoregion clustering method that relies on soil, climate, elevation, and slope data from various sources. Due to the absence of multi-year soil and topography data, we assume that soil quality, elevation, and slope conditions remain relatively stable over time, while climate conditions exhibit variability. Various methods exist for ecoregion clustering, with hierarchical clustering [31] being one such approach known for accommodating datasets with nested or hierarchical structures. However, for our specific scenario, we prefer to adopt a simpler clustering method. This decision is driven by the intention to maintain simplicity in our current approach while preserving compatibility for future integration of multi-year soil and topography data. Consequently, we have chosen a direct clustering method for ecoregion clustering, with the expectation that it will facilitate a smoother transition when more comprehensive data become available in the future.
For each year, we begin by reprojecting the data to a 10,000 m resolution. For the soil data, we compute the average value of each metric across the available depths. For the climate data, we compute both the mean and the variance of each metric within the specified crop growth period. For wind speed, we combine the eastward and northward components into a total wind speed, discarding the directional information. We then randomly sample 10,000 points from the target crop region as the training data. Subsequently, we employ Principal Component Analysis (PCA) to reduce the data dimensions from 20 (8 for soil conditions, 10 for climate conditions, and 2 for topography conditions) down to 5. We trained our dynamic ecoregion clustering model using the K-means++ method on the pre-processed training data from 2013 and applied it to the entire cropland area, as determined by the cultivated layer of the CDL, for the years 2013–2022.
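The pre-processing and clustering steps described here can be sketched with scikit-learn as follows. This is an illustrative sketch, not the authors' implementation: the feature matrix is synthetic, the column layout (soil in columns 0 to 7, climate in 8 to 17, topography in 18 to 19) is a hypothetical arrangement of the 20 features, and `n_init` and the random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for 10,000 sampled cropland pixels x 20 environmental features.
train_2013 = rng.normal(size=(10_000, 20))

# Standardize, reduce 20 -> 5 dimensions, then cluster with k-means++ seeding.
clusterer = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    KMeans(n_clusters=10, init="k-means++", n_init=4, random_state=0),
)
clusterer.fit(train_2013)  # fitted once, on the 2013 sample only

# A later year: soil and topography columns unchanged, climate columns perturbed.
pixels_2013 = train_2013[:1000]
pixels_2018 = pixels_2013.copy()
pixels_2018[:, 8:18] += rng.normal(scale=0.5, size=(1000, 10))

labels_2013 = clusterer.predict(pixels_2013)
labels_2018 = clusterer.predict(pixels_2018)
changed = float(np.mean(labels_2013 != labels_2018))
print(f"share of pixels assigned to a different ecoregion: {changed:.2%}")
```

Because the fitted scaler, PCA projection, and centroids are reused for every year, a pixel changes ecoregion only when its climate features move it closer to a different centroid, which is the behavior the dynamic maps rely on.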
This allows us to identify regions with similar environmental characteristics, which is essential for accurate early crop mapping. In order to determine the optimal number of clusters, we utilize the elbow method to calculate the within-cluster sum of squares (WCSS) value using Equation (1):
$$\mathrm{WCSS} = \sum_{i=1}^{n} \sum_{j=1}^{m_i} \left\lVert x_{ij} - c_i \right\rVert^{2} \qquad (1)$$
where $n$ denotes the total number of ecoregion clusters, $m_i$ denotes the number of pixels contained within cluster $i$, $x_{ij}$ denotes the vector representing each pixel within cluster $i$, and $c_i$ represents the centroid vector of cluster $i$. This allows us to strike a balance between cluster granularity and computational efficiency. If the number of clusters is too small, the environmental similarities may not be unique enough for each cluster, compromising the accuracy of our results. Conversely, if the number of clusters is too large, it may result in excessive computational cost, hindering the practical application of our approach. By finding the optimal number of clusters, we are able to maximize the accuracy of our approach while ensuring its computational feasibility.
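A minimal sketch of this selection procedure is given below. It computes the WCSS of Equation (1) directly from the cluster assignments and, as a complementary check, the silhouette score for each candidate number of clusters. The data are synthetic, with four well-separated groups placed at hypothetical centers, and the candidate range of 2 to 8 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def wcss(X, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for i in np.unique(labels):
        members = X[labels == i]
        centroid = members.mean(axis=0)
        total += float(np.sum((members - centroid) ** 2))
    return total


# Synthetic 5-dimensional data with four groups at hypothetical centers.
centers = 10.0 * np.eye(5)[:4]
X, _ = make_blobs(n_samples=600, centers=centers, cluster_std=0.6, random_state=0)

wcss_by_k, silhouette_by_k = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    wcss_by_k[k] = wcss(X, km.labels_)
    silhouette_by_k[k] = silhouette_score(X, km.labels_)

for k in wcss_by_k:
    print(k, round(wcss_by_k[k], 1), round(silhouette_by_k[k], 3))

best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print("silhouette-preferred k:", best_k)
```

In the elbow method the chosen value is the point where further increases in the number of clusters yield only small reductions in WCSS; the silhouette score offers a second, independent indication of where that point lies.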
The resulting ecoregion maps, with different numbers of ecoregions, were then used in our crop classification training and testing processes. For instance, Figure 2a displays the entire cropland area of the CONUS for the year 2013, as obtained from the CDL layer. We restrict our analysis to the green region, which corresponds to the soybean cropland region. We determine the optimal number of ecoregions to be 10, identified through the WCSS value. Subsequently, we apply our trained ecoregion clustering model to the entire cropland area to generate the ecoregion map for 2013, which is shown in Figure 2c. The ecoregion maps from 2014 to 2022 for soybean mapping are shown in Figure 3.
To illustrate the variation in time-series vegetation indices of our target crop, we present the time-series average NDVI and EVI curves of soybean for each ecoregion based on ten clusters in 2013 in Figure 4. The patterns of these features differ markedly across ecoregions. To quantify the differences among the distributions of VIs data across ecoregions, we employ the Maximum Mean Discrepancy (MMD), which measures the discrepancy between two distributions; a smaller value indicates more similar distributions. Specifically, we compute MMD values between ecoregion 1 and all other ecoregions in our analysis. To ensure statistical robustness, we perform the following steps:
We randomly sample 20,000 data points from ecoregion 1.
For ecoregions 2–10, we also draw random samples of 10,000 data points each.
For ecoregion 1, we calculate the MMD by comparing two subsets of 10,000 data points each, where the first subset consists of the initial 10,000 data points, and the second subset comprises the last 10,000 data points.
For the other ecoregions (ecoregions 2–10), we compute the MMD by comparing the initial 10,000 data points from ecoregion 1 with the 10,000 data points from each of the other ecoregions.
The resulting MMD values are summarized in Table 3. The MMD within ecoregion 1 is notably lower than the MMD between ecoregion 1 and the other ecoregions, providing clear evidence of significant variations in VIs data distribution among different ecoregions.
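The within-versus-between comparison can be sketched as follows. The kernel is an assumption: the text does not state which one was used, so this sketch uses a Gaussian (RBF) kernel with a fixed bandwidth and the biased estimator of squared MMD. The samples are synthetic 14-dimensional vectors (seven NDVI plus seven EVI values), and the sample size is reduced from 10,000 to 1,000 per subset to keep the kernel matrices small.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a_i - b_j||^2)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(x, y, gamma):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


rng = np.random.default_rng(0)
n, dim = 1000, 14
gamma = 1.0 / dim  # hypothetical bandwidth choice

eco1 = rng.normal(0.0, 1.0, size=(2 * n, dim))  # two subsets drawn from ecoregion 1
eco2 = rng.normal(0.5, 1.0, size=(n, dim))      # a different ecoregion (shifted mean)

within = mmd2(eco1[:n], eco1[n:], gamma)   # first half vs. second half
between = mmd2(eco1[:n], eco2, gamma)      # first half vs. other ecoregion
print(f"within: {within:.4f}  between: {between:.4f}")
```

Splitting the ecoregion 1 sample into two halves provides a baseline for the sampling noise of the estimator, so a between-ecoregion value well above that baseline indicates a genuine distribution shift.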
Moreover, to highlight how VI patterns within a specific region vary across years as the ecoregion assignment changes, we designated two distinct areas whose ecoregion types changed between 2014 and 2021, as illustrated in Figure 5. The distribution of soybean VIs data is then presented using t-distributed stochastic neighbor embedding (t-SNE) [32] in Figure 6. The distributions differ visibly between 2014 and 2021, underscoring the fluctuating nature of soybean growth-related VIs data within these designated regions.
3.4. Crop Classification Model
Once the ecoregions are established, a specific early crop mapping classifier can be trained for each ecoregion based on the cropland region, which is extracted using the cultivated layer of the CDL. Random Forest classification models have become a popular tool for mapping land cover due to their flexibility in handling nonlinear relationships between input features and class membership, and their intuitive decision rules. During training, each decision tree grows by recursively partitioning the data into less heterogeneous groups until the desired level of accuracy or purity is achieved. A tree is complete when all leaf nodes are generated, with each leaf node representing either a pure class or a mixture of two classes, determined by the proportion of training pixels in the node. However, individual decision trees are prone to overfitting, which can lead to poor generalization performance. To address this issue, we applied a bagging procedure and trained a random forest consisting of 50 classification trees, which significantly improved the stability and prediction accuracy of the model. Using a binary training dataset of target-crop and non-target-crop samples, we locate the target crop across the corresponding ecoregion with the trained classifier.
Since the CDL with the cultivated layer has been available only since 2013, the model was trained and validated in the years 2013–2017 and tested independently in the years 2018–2022.
3.4.1. Model Training within the Training Period
As per the USDA crop calendar, the seeding for corn and soybean typically begins after 1 April. Hence, we extracted time-series MODIS 250 m NDVI and EVI images with a 16-day temporal frequency from 1 April until the middle of July for 2013–2017. For each year, we stack these VIs images into a single composed image comprising seven NDVI bands and seven EVI bands. For each training year, we randomly selected 10,000 training sample points from the composed image for each ecoregion, half of them target-crop points and half non-target-crop points. We then merge all the training sample points for the same ecoregion to train the ecoregion-specific crop classification model.
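The per-ecoregion training procedure can be sketched with scikit-learn as below. This is a sketch under stated assumptions rather than the authors' implementation: the samples are synthetic 14-band vectors, `synthetic_samples` is a hypothetical stand-in for sampling the composed images, only three ecoregions and 1,000 points per year are used to keep the example quick, and the only setting taken from the text is the forest size of 50 trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
N_BANDS = 14  # seven NDVI bands + seven EVI bands per composed image


def synthetic_samples(n, shift):
    """Hypothetical balanced sample: half target crop (1), half non-target (0)."""
    half = n // 2
    crop = rng.normal(0.6 + shift, 0.1, size=(half, N_BANDS))
    other = rng.normal(0.3 + shift, 0.1, size=(half, N_BANDS))
    X = np.vstack([crop, other])
    y = np.concatenate([np.ones(half, dtype=int), np.zeros(half, dtype=int)])
    return X, y


classifiers = {}
for eco_id in range(3):
    # Merge the balanced samples drawn from every training year of this ecoregion.
    yearly = [synthetic_samples(1000, shift=0.02 * eco_id) for _year in range(2013, 2018)]
    X = np.vstack([features for features, _ in yearly])
    y = np.concatenate([labels for _, labels in yearly])
    # Bagged ensemble of 50 classification trees, one model per ecoregion.
    classifiers[eco_id] = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

print({eco_id: model.score(X, y) for eco_id, model in classifiers.items()})
```

Keeping one model per ecoregion means each forest only has to separate the target crop from other land cover under one set of environmental conditions, at the cost of training and storing as many models as there are ecoregions.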
3.4.2. Model Evaluation within the Test Period
In order to evaluate our crop classification model, we first extracted the VIs images and produced the composed images for the years 2018–2022. To assess the effectiveness of our approach, we designed an experiment to compare the results of using a dynamic ecoregion map versus a static one. For the static map, we sample the training points from the 2013–2017 period and classify the target crops for 2018–2022 using only the ecoregion map from 2013, as shown in Figure 2c, with fixed boundaries for each ecoregion. In contrast, for the dynamic map, we sample the training points from the 2013–2017 period based on the ecoregion maps for 2013–2017, as shown in Figure 2c and Figure 3a–d, and classify the target crops for 2018–2022 based on each year’s specific ecoregion map, as shown in Figure 3e–i. The following results show that in most cases the dynamic model outperformed the static model. These findings demonstrate the benefits of using dynamic ecoregion maps in crop classification and highlight the importance of accounting for climate fluctuations when developing land cover maps.
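The difference between the two experimental settings comes down to which year's ecoregion map is consulted when a pixel is routed to a classifier. The sketch below illustrates this with hypothetical pixel identifiers and ecoregion labels; the function name and the dictionary layout are illustrative choices, not part of the described system.

```python
def ecoregion_for(pixel, year, maps, mode, reference_year=2013):
    """Return the ecoregion label used to pick a classifier for this pixel."""
    if mode not in ("static", "dynamic"):
        raise ValueError("mode must be 'static' or 'dynamic'")
    # Static: always the reference-year map. Dynamic: the map of the given year.
    lookup_year = reference_year if mode == "static" else year
    return maps[lookup_year][pixel]


# Hypothetical ecoregion maps: pixel "p2" changes ecoregion between 2013 and 2018.
maps = {
    2013: {"p1": 3, "p2": 7},
    2018: {"p1": 3, "p2": 5},
}

for pixel in ("p1", "p2"):
    static_label = ecoregion_for(pixel, 2018, maps, "static")
    dynamic_label = ecoregion_for(pixel, 2018, maps, "dynamic")
    print(pixel, "static:", static_label, "dynamic:", dynamic_label)
```

Only pixels whose ecoregion label differs between the reference year and the test year are classified by a different model, so any accuracy difference between the two settings originates in those pixels.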
3.4.3. Evaluation Metrics
To evaluate the accuracy of our classification, we used three metrics: Producer’s Accuracy, User’s Accuracy and Overall Accuracy, which are defined in Equations (2)–(4):
$$\text{Producer's Accuracy} = \frac{TP}{TP + FN} \qquad (2)$$
$$\text{User's Accuracy} = \frac{TP}{TP + FP} \qquad (3)$$
$$\text{Overall Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$
where TP, TN, FP, and FN refer to the numerical values in a confusion matrix that correspond to true positive (correctly predicted positive cases), true negative (correctly predicted negative cases), false positive (incorrectly predicted positive cases), and false negative (incorrectly predicted negative cases) outcomes, respectively.
The Producer’s Accuracy represents the map’s accuracy from the map producer’s perspective, indicating the probability that a ground feature is correctly classified by the map. On the other hand, the User’s Accuracy represents the map’s reliability from the user’s perspective, i.e., the probability that a feature on the map is actually present on the ground. The Overall Accuracy typically quantifies the fraction of all CDL pixels, encompassing both target crop and non-target crop pixels, that our crop classification method correctly identifies and maps.
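As a worked example, the sketch below computes the three metrics from confusion-matrix counts using the standard definitions for a binary target-crop map (producer's accuracy as TP / (TP + FN), user's accuracy as TP / (TP + FP), and overall accuracy as the share of all pixels classified correctly). The counts are hypothetical.

```python
def accuracy_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Producer's, user's, and overall accuracy from confusion-matrix counts."""
    return {
        # Share of true target-crop pixels that the map labels as target crop.
        "producers_accuracy": tp / (tp + fn),
        # Share of pixels mapped as target crop that really are target crop.
        "users_accuracy": tp / (tp + fp),
        # Share of all pixels (target and non-target) classified correctly.
        "overall_accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical counts for one ecoregion and one test year.
metrics = accuracy_metrics(tp=60, tn=900, fp=40, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the overall accuracy is high even though the user's accuracy is only 0.60, which shows why overall accuracy alone can be misleading when the target crop covers a small share of the cropland.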
5. Discussion
Our study demonstrates that our approach, incorporating dynamic ecoregion clustering and random forest classification, yields markedly higher accuracy compared to the static clustering method in the context of early crop mapping for target crops such as soybean and corn across the entire cropland region.
We employed the elbow method and the silhouette method to ascertain the optimal number of ecoregion clusters. This approach allowed us to strike a balance between accuracy and computational efficiency, enabling us to achieve satisfactory early crop mapping results while conserving computational resources. However, there exists a limitation: when climate variability is significant enough to affect the vegetation index (VI) patterns of the target crops but not substantial enough to alter the ecoregion boundaries, our system may struggle to address this scenario effectively. As illustrated in Figure 10a, the overall accuracy is notably low in the Southern USA. This issue becomes apparent when examining Figure 2c and Figure 3, where, for the Southern USA region (depicted in blue), the ecoregion remains unchanged from 2013 to 2022. Consequently, our method fails to account for VIs pattern fluctuations in this region during this time period.
In our investigations, we observe that our dynamic ecoregion method, employing two, five, and ten distinct ecoregions, typically outperforms the static ecoregion method across most years in early soybean mapping. Particularly, the ecoregion configuration with ten divisions yields the most accurate outcomes. However, an interesting anomaly arises in the year 2020, where the static ecoregion map exhibits superior User’s Accuracy performance. This divergence implies that the static ecoregion map could possess advantages in specific scenarios, although overall, our dynamic ecoregion method generally exhibits enhanced accuracy. Notably, a comparison of pixel-wise overall accuracy between the dynamic and static methods for the soybean mapping underscores that the dynamic approach significantly outperforms the static approach in regions where shifts in ecoregion types between 2013 and 2021 have occurred. This contrast is particularly evident in the northern CONUS, confirming the enhanced robustness of our method against climate fluctuations between different years.
Interestingly, we find that the dynamic method outperforms the static method significantly in soybean mapping, while the improvement is less pronounced in corn mapping. Although the exact cause has not been explored in this paper, it is plausible that climate fluctuations exert a more substantial impact on soybean growing than on corn growing.
In conclusion, our proposed approach is effective for soybean and corn mapping and highlights the benefits of dynamic ecoregion clustering in coping with the intricate influences of climate fluctuations on crop mapping accuracy. Furthermore, our result map provides early insights into crop conditions, significantly preceding the CDL release. This timeliness is pivotal for making informed decisions early in the growing season. While the USDA releases its CDL layer in January or February of the following year, our results are accessible by mid-July of the same year, offering information approximately 7 months ahead of the CDL. In addition, producing and updating a 250 m map is more resource-efficient than maintaining a nationwide 30 m map, particularly for research and monitoring purposes. However, the lower resolution of MODIS VIs data reduces accuracy in mapping target crops compared with Sentinel-2 and Landsat data. The coarser resolution loses finer details within the agricultural landscape, making it harder to distinguish between crop types and to detect smaller-scale changes. This limitation is a constraint of our work.
6. Conclusions
This paper introduces an approach for early crop mapping across the entire cropland area of the CONUS by utilizing NDVI and EVI data combined with a dynamic ecoregion clustering technique. Unlike static ecoregion clustering, which generates a single unchanging ecoregion map, our dynamic approach produces a unique ecoregion map for each year. This dynamic strategy enables us to incorporate the year-to-year climate variations that significantly influence crop growth, thereby improving the accuracy of our crop mapping process.
With the ecoregion maps for 2013–2022 established by the dynamic ecoregion clustering, a specific early crop mapping classifier can be trained for each ecoregion. We used a bagging procedure and trained a random forest consisting of 50 classification trees to locate our target crop for each ecoregion separately across the entire cropland region of the CONUS. The model was trained and validated in the years 2013–2017 and tested independently in the years 2018–2022. The results showed that the dynamic clustering method achieved higher accuracy than the static clustering method. Our method has significant implications for forecasting crop yield and food production at the national scale.
In our future research endeavors, we intend to broaden the scope of our work by applying the dynamic ecoregion method to estimate target crop yields. This represents a crucial evolution of our current approach, as it will enable us to not only identify and map crops within distinct ecological regions, but also predict and quantify their potential yields. By harnessing the power of this method, we aim to provide valuable insights into agricultural productivity, aiding farmers, policymakers, and researchers in making informed decisions.