1. Introduction
The early estimation of crop planting areas [
1] aims to predict the crop planting areas in specific regions. Against the international backdrop of frequent extreme weather events, escalating global conflicts, and intensifying economic shocks, food production and supply chain trade are severely threatened, and the importance of food security has become even more prominent [
2,
3,
4,
5]. Accurate and timely agricultural production information is crucial for ensuring global food security [
6]. As a key link in achieving the effective allocation of government resources, optimizing resource allocation plans and preparations, and formulating import and export strategies related to food security, accurate and timely information on the early estimation of crop planting areas is of great significance for ensuring global food security [
7].
With the development of space satellite technology, the estimation of crop planting areas has shifted from labor-intensive methods (such as manual interviews and field surveys) to a traditional approach with remote sensing image processing. This method has become the mainstream approach for estimating large-scale planting areas because it can effectively monitor large areas of land and reduce the reliance on physical labor. It involves two main steps: first, classifying remote sensing images [
8,
9,
10], and then calculating the planting area based on the classification results [
11].
This method is limited by remote sensing datasets and the shortcomings of the traditional model, often falling short in high-precision crop area estimation. First, the timing of remote sensing image acquisition and crop germination characteristics limit the availability of images during the early stages of crop growth, resulting in delays in area estimation [
12,
13]. Second, for the traditional method, the errors arise from both the imperfect classification or segmentation results (e.g., commission, omission) and the limitations of the subsequent area calculation stage (e.g., handling mixed pixels, boundary effects, and the spatial resolution limitations of remote sensing images) [
12,
14,
15,
16]. All of the above are the key difficulties that are currently hard to overcome in area estimation work.
In addition, during the estimation of the planting area, there is also a problem of a lack of consistency in different sample inputs. For deep learning models, the unevenness of input sizes can easily plunge the models into training difficulties. In the realm of traditional visual computing, conventional means such as scaling and cropping are generally used to process data. However, since the model input data carry actual geographical information and are restricted by practical factors such as the shapes and sizes of administrative regions, traditional image processing techniques are not applicable in this context, thus adding numerous difficulties to the estimation of the planting area.
Last but not least, the labeled data of the planting area exhibit a markedly uneven distribution. Taking the planting area data at the county level in the United States as an example, influenced by multiple factors including local climate conditions and geographical environment, the data distribution among various county-level units varies greatly and is extremely uneven [
17]. This imbalance further exacerbates the difficulty of training deep models.
In this study, the primary motivation and contribution of our approach lie in the early prediction of time. We propose an end-to-end predictive network for an accurate early crop planting area estimation method. This method utilizes the land cover data accumulated over the past several years to predict the planting area of the current year. Theoretically, once the crops are harvested in the previous year, the planting area for the next year can be calculated accordingly. The proposed method is evaluated on county-level corn and soybean planting area estimation in the United States, and the experimental results show the effectiveness of the model.
The contributions of this work are as follows:
We propose a novel large-scale end-to-end method for early estimation of crop planting areas that avoids the errors of traditional methods as much as possible and completes the prediction before the crops are sown. Moreover, the end-to-end predictive network has potential in capturing complex spatiotemporal dependencies.
A time-series-based pixel inference method is proposed. Land cover datasets from previous years are used as the data source, instead of relying on remote sensing images from the current year, which addresses data scarcity in early planting area estimation.
A multi-subimage technique is proposed to effectively resolve the challenge of uneven input sizes in remote sensing images for end-to-end model training. Additionally, a label distribution smoothing technique alleviates data imbalance in crop planting predictions.
2. Related Work
Crop planting area estimation aims to predict the area of a specific crop in a given region. For example, Door County has a soybean planting area of 21,500 acres, and Brown County has a soybean planting area of 13,600 acres, estimated from remote sensing data, shown in
Figure 1.
For region
in year
, we assume the availability of earth observation data
and corresponding specific crop planting area data
, where
and
denote the spatial resolution (height and width) of the data
. Let
and
. We aim to develop a function
F(⋅) that maps the input data
to the planting area
, capturing the relationship between the spatial features and the crop planting area.
Currently, area estimation typically employs the two-stage approach to solve the problem of classification and area estimation.
In the crop planting area segmentation stage, previous related works have primarily relied on remote sensing images, making the accuracy of crop area extraction heavily dependent on the quality of these images [
18]. Bellon et al. [
11] addressed the impact of cloud cover on crop area segmentation by combining remote sensing images with various vegetation indices. Several studies [
19,
20,
21,
22,
23,
24,
25] have demonstrated the effectiveness of deep learning in handling complex datasets. For example, Gallo et al. [
22] proposed a 3D-FPN network utilizing time-series remote sensing images. This method employs a three-dimensional pyramid convolutional neural network to integrate temporal and spatial features of remote sensing data, achieving accurate crop area segmentation. Leveraging temporal complementarity mitigates the impact of low-quality images. Building on this, Yan and Yang et al. [
24,
25] extended the approach to multi-source satellite datasets, further improving classification performance. Wang et al. [
26] proposed a Temporal Memory Attention Network (TMANet) based on the self-attention mechanism, which is capable of adaptively integrating temporal relationships in time series data. Specifically, this method constructs a memory using images from several previous moments to store the temporal information of the current image. On this basis, the researchers further designed a Temporal Memory Attention Module to capture the relationship between the current frame and the memory, thereby enhancing the representation of the current image. Through this approach, the network can achieve pixel-level image classification. However, there are still misclassification and missed classification phenomena in the classification and segmentation results of the above-mentioned methods.
In the crop planting area calculation, errors come from mixed pixels, boundary effects, and the spatial resolution limitations of remote sensing images. Traditional methods estimate areas based on the affine matrix of remote sensing images. Liang et al. [
27], for instance, proposed a pixel-statistic-based area calculation method. However, this method treats remote sensing imagery as isolated pixels, neglecting spatial structural information and reducing accuracy. Therefore, Jin et al. [
14] proposed an error distribution-based classification method for area calculation. Zhang et al. [
12,
18] introduced a new definition named “fragmentation” to quantify spatial deviations in crop planting areas. Olofsson et al. [
15] effectively reduced the error of area estimation by combining spatial probability sampling and accuracy confidence interval quantization. Recently, Lu et al. [
13] proposed a new method using the concept of “mixed pixels” and considering the area calculation errors of different classified crop planting areas.
While these methods have improved area estimation accuracy, several challenges remain. First, the reliance on remote sensing images and crop germination timing makes traditional two-stage methods unsuitable for early estimation, leading to delays. Second, image quality, affected by weather conditions and sensor degradation, directly influences classification accuracy, introducing errors. Third, classification-based area estimation is vulnerable to error accumulation, reducing reliability.
Furthermore, existing models depend on authoritative crop area data, typically released by administrative units of varying sizes. For example, Brown County in the United States spans 1594 square kilometers, whereas Door County covers 6138 square kilometers. This disparity presents two challenges: (1) irregular administrative boundaries, leading to inconsistencies in the input data size and shape; and (2) the labeled data of the planting area exhibits a markedly uneven distribution because major production areas dominate the dataset.
Generally speaking, scaling and cropping techniques are used to solve the inconsistent input image sizes in the traditional field of computer vision [
28,
29]. For example, He et al. [
30] ensured the uniform size of images by employing the spatial pyramid pooling technique. Long et al. [
31] addressed the issue of varying input image sizes by combining kernel methods and Global Average Pooling methods. The approach of Mao et al. [
32] involved cropping single fixed-size images from the region, yet this may fail to fully cover large or elongated counties, resulting in information loss or sampling bias. However, each pixel point owns actual physical significance in the problem of area estimation, which is different from the tasks of object detection and image segmentation in the traditional field of computer vision. You et al. [
33] were inspired by the data statistical approach and transferred remote sensing images of different sizes into histograms with specified intervals. However, this method ignored the geospatial structure of remote sensing images.
With the rapid development of machine learning in data science, a host of techniques, like data normalization and feature engineering, are utilized to process imbalanced datasets [
33]. Ahmed et al. [
34] used the E-IRFS method to adaptively adjust the balancing strategy and achieved good results in image data. However, the differences in crop planting areas data are significant between the major and minor production areas, which might lead to difficulty in getting the target based on image data decomposition methods.
To address these challenges of early prediction difficulties, error accumulation, input data inconsistencies, and data imbalance, this study proposes an end-to-end predictive network for accurate early crop planting area estimation, based on our previous research [
35]. It directly utilizes the land cover layer data for area estimation, which not only avoids the impact of segmentation accuracy on area estimation in the two-stage method but also significantly advances the timeline for crop planting area estimation.
3. Methodology
3.1. Overall Workflow
Our goal is to remove the classification step from the area estimation process in the question of , and develop a direct method that estimates crop planting areas from data to target without intermediate steps. Accordingly, a novel method called the “End-to-End Predictive Network for Accurate Early Crop Planting Area Estimation” has been proposed. The overall structure and workflow are introduced.
In this section, we use cropland data (CDL) as the input for an end-to-end network and treat crop planting area estimation as the output. The proposed framework is illustrated in
Figure 2. First, we organize historical CDL over multiple years into time series at the pixel level. Using multi-subimage technology, the time-series data for the county region is divided into uniformly sized sub-blocks to facilitate model input and feature learning. Subsequently, a three-layer 1 × 1 3D convolutional network is employed to extract features and perform deep learning along the temporal dimension for each pixel. It is worth noting that we assume the planting patterns for a given year can be inferred from historical planting data (see the pixel-based temporal inference theory for details). The network then utilizes fully connected layers to learn latent patterns in the spatial dimension further. Finally, area estimation is achieved through regression, incorporating label distribution smoothing.
3.2. The End-to-End Predictive Network
In recent years, the convolutional neural network has been successfully used for human activity recognition and crop classification tasks. Because the convolution kernel has a controllable receptive field, we used a convolutional neural network and a fully connected network to predict the crop planting area in this paper. Based on the time series-based inference methods (
Section 3.3) and the multi-subimage technology (
Section 3.4), the CDL data
of the specified area
can be dealt with by
for the predictive network input, which is organized into time series data on a specific crop type.
The input data for the predictive network, denoted as , are a four-dimensional tensor where represents the number of region partitions, is the number of years of historical data, and defines the height and width of each subimage.
To estimate the planting area of different crops, we propose an area estimation spatiotemporal learning network
, whose structure consists of three main components: (1) a temporal inference module with four convolutional layers designed to capture the potential relationship between historical data and the planting area of current year, (2) a feature fusion module with two fully connected layers aimed at extracting spatial relationships related to planting areas, and (3) a crop planting area estimation module with three fully connected layers for performing area regression and obtaining the estimated area
, as detailed in
Table 1.
The layer “Conv_” means the convolutional layer, and the layer “Fc_” means the fully connected layer.
PreLU [
36] is used as the activation function in the network, which can effectively mitigate gradient explosion.
Furthermore, we also considered the advantages of other network models at the same time and replaced the prediction network in the end-to-end network. The results are shown in the
Section 4.4 discussion on prediction networks.
3.3. The Time-Series-Based Pixel Inference Method
Recently, machine learning techniques have been regarded as a practical approach for discovering the implicit patterns and structures in high-dimension datasets, and they have especially been widely utilized in the field of land-use/land-cover studies. Therefore, in this section, the time-series-based pixel inference method is introduced and employed to solve the problem of lacking data in the stage of early crop planting. The time-series-based pixel inference refers to pixels predicted from the historical CDL dataset with high confidence in the current year crop type.
The time-series-based pixel inference method involves the following. Firstly, the historical CDL dataset
is formatted into a collection of crop types in time-series sequence features for all pixels
, and each crop sequence feature is a one-dimensional time-series historical CDL data list at the pixel level, shown in Formula (3). Where,
means CDL data for last t years, and the reconstruction data
is labeled
.
Then, each time-series crop pixel sequence feature can predict the given pixel crop planting type in the following year. In the time-series-based pixel inference, the training dataset is constructed with recursive subsets and each subset owns eight-year moving windows. This process is defined as
and is detailed as follows. The detailed calculation process of
has been handed over to the network in
Section 3.2.
The following year, feature can be used to estimate the area. This prediction method based on historical information can effectively alleviate the problem of data scarcity in the early stages of crop planting prediction.
3.4. The Multi-Subimage Technology
Due to the different administrative areas of each county image and the inseparability of their planting labels. It is not feasible to directly send images with a hundred times size differences into the network for learning features. Inspired by the image blocking technique, we propose multi-subimage technology. Unlike the traditional scale-patch processing [
32] method, this approach places greater emphasis on the role of spatial resolution in area estimation and also has the advantage of consuming fewer training resources. When compared with the histogram [
31] method, which calculates individual pixels, the multi-subimage technology demonstrates a stronger consideration for the spatial distribution characteristics between pixels.
The multi-subimage technique primarily involves masking, cropping, deletion, and padding. As previously discussed in
Section 3.3 on the time-series-based pixel inference method, the historical CDL dataset
is converted into
. In the masking of the multi-subimages method, the crop sequence historical CDL data
are tailored by the cultivated land mask, which enable the reduction in irrelative data during the model training. After that, the non-cultivated land pixels, such as building, lakes, roads, and forests in the historical CDL data
, will be assigned with the value “zero”; the other pixels of target type in the cultivated land area are also assigned with the ’others’ class, such as being labeled ‘3’. Then, the data labeled
are denoted by (5).
In image cropping, the masked historical CDL data
are partitioned into non-overlapping subimages
of consistent size
by a fixed step
and all zero subimages are dropped to reduce the amount of data. As shown in
Figure 3, two kinds of subimages can be observed after cropping: one is non-cultivated land sub-images, characterized by all-zero pixels, which will be discarded in the model training, and other one is cultivated land related subimages, which are used to train the area estimation model, thereby enhancing the capability to identify and predict areas of cultivated land.
However, after dropping the zero subimages, the discrepancies in the number of subimages
across different CDL regions
might bring great challenges in the input dimensions of the model for area estimating. Therefore, after removing the non-cultivated land sub-images, the size-consistent input historical CDL data
acquired by an innovative approach, which is input size padding by reintroducing sub-images filled with zeros to a specific number of subimages
, are shown in Equation (6).
This strategy not only effectively reduced a significant amount of irrelevant data but also ensured the consistency of model input dimensions. Then, the size-consistent input historical CDL data are ready. Such consistency is crucial for model training as it helps to enhance the stability and predictive accuracy, thereby generating more precise predictions of the cultivated area.
3.5. The Label Distribution Smoothing Technology
In this section, label distribution smoothing technology (LDST) is proposed to enhance the numerical stability learning ability of the model on imbalanced area data without changing the true values. When training a network model, accurate area labels are essential for effective training. However, factors such as climate, hydrology, and soil conditions lead to significant variations in crop planting areas across different counties, resulting in unbalanced and long-tailed area data distributions that complicate the training of planting area estimation models.
From the analysis of the data on soybean planting in the United States in 2020, the data reveal a highly non-uniform distribution, with the value ranging from tens to approximately 10
5. Therefore, to address the disparity and minimize the influence of significant data variations, a novel function transformation denoted as
g(
y) is introduced in this study aiming at converting the imbalanced data into a more uniform distribution. This transformation improves the model’s numerical stability and learning ability on imbalanced data while preserving the true values. The function
g(
y) is defined as (7).
where
and
represent the planting area data of region
in year
. And
is an indicator function. The frequency count in the
-th interval of the frequency distribution is denoted as
represents a constant. The value of
γ ranges from 0 to 1, and it is set to 0.4, as determined through pre-experimental analysis.
Based on this transformation, the processed planting area dataset has mitigated the long-tail effect, with a more balanced distribution pattern ranging from 0 to 250, compared with the original area dataset . After, in the model predicting, it is applied to the original output of the model to obtain the inverse transformation of the final predicted area. This transformation (7) effectively addresses the issue of data imbalance, enhances the model’s interpretability, and enables more robust and insightful crop planting area estimation analyses.
5. Conclusions
In this study, an end-to-end predictive network for accurate early crop planting area estimation is proposed, integrating the time-series-based pixel inference method, multi-subimage technology, and label smoothing techniques to address the challenge of early crop planting area estimation. The proposed model not only effectively resolves the area error accumulation issues in the traditional “two-stage” planting area estimation methods but also provides an early planting area estimation before the crops are sown. Furthermore, multi-subimage technology is introduced to tackle the inconsistency in the input sizes of images, and the LDST is employed to mitigate data imbalance in crop planting area estimation.
Our method can obtain the planting area of crops according to the planting trend in the early planting stage, and the accuracy of the model has reached the industry-leading level. At the same time, it has also achieved good results in the practical verification experiment in the United States. However, the error in the area of individual states is slightly larger. Therefore, in future research, it is crucial to develop a high-precision, large-area estimation model that integrates multiple sources of dataset fusion or propose a large model method based on multiple-source datasets to address more challenging problems in agriculture.