Article

A Deep Learning-Driven Spatio-Temporal Framework for Timely Corn Yield Estimation Across Multiple Remote Sensing Scenarios

1 State Key Laboratory of Remote Sensing and Digital Earth, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
2 Beijing Engineering Research Center for Global Land Remote Sensing Products, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
3 School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo 454003, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(5), 743; https://doi.org/10.3390/rs18050743
Submission received: 19 January 2026 / Revised: 20 February 2026 / Accepted: 25 February 2026 / Published: 28 February 2026

Highlights

What are the main findings?
  • Gaussian process regression effectively enhances the accuracy of LSTM-based models.
  • The optimal period for corn yield prediction is mid-to-late July to early-to-mid August, corresponding to the key reproductive stages from late tasseling through silking to early grain filling.
What are the implications of the main findings?
  • The single-phase prediction model achieves comparable accuracy to time-series data models in yield estimation and can serve as an effective tool for early-season prediction, particularly during extreme events.
  • The period from July to August constitutes a critical window for early yield prediction of summer crops, encompassing the phenological stages from tasseling to mid-grain filling. This window enables an optimal balance between forecast timeliness and accuracy through the integration of multi-source data, facilitating the generation of reliable predictions to inform pre-harvest decision-making.

Abstract

Crop yield estimation, particularly early-season yield prediction, is highly important for global food security and disaster mitigation. In this study, we utilized deep learning models combined with remote sensing data to develop in-season crop yield estimation models, enabling immediate yield prediction. We employed a convolutional neural network (CNN) for spatial feature extraction and a long short-term memory network (LSTM) for temporal patterns, complemented by Gaussian process regression (GP) that introduced geographical coordinates. Three groups of in-season yield prediction experiments were designed, utilizing four-phase, two-phase, and single-phase data, respectively. The results indicated that under the two-phase training scheme, the LSTM_GP model achieved the highest performance in the sixth period, with an R² value of 0.61 and a root mean square error (RMSE) of 983.38 kg/ha. When trained on single-phase data at the twelfth phase (approximately mid-to-late July), the LSTM_GP model also performed best, attaining an R² value of 0.62 and an RMSE of 969.06 kg/ha. The single-phase prediction model outperformed the time-series models in yield prediction accuracy. The period from mid-to-late July to early-to-mid August covers critical crop growth stages and was essential for accurate yield prediction. We found that adding GP can improve prediction accuracy, especially for the LSTM. Moreover, the proposed single-phase prediction model realized reliable crop yield prediction as early as the silking to early grain-filling stage (mid-to-late July), providing a critical lead time of approximately 2–2.5 months before harvest to support pre-harvest agricultural decision-making.

1. Introduction

Remote sensing data cover the entire world and are available from a wide range of sources, including satellites, aircraft, drones and other devices. Moreover, with improvements in temporal, spectral, and spatial resolution, remote sensing has significant advantages in crop monitoring at local, regional, and global scales, and has been widely used in agricultural fields, including cultivated land cover classification [1], crop monitoring [2], and yield prediction [3,4]. Extracting various types of relevant information from remote sensing data for yield prediction has become a mainstream research area, especially when vegetation indices (VIs), such as the normalized difference vegetation index (NDVI) [5,6], green leaf area index (GLAI) [7], normalized difference water index (NDWI) [8], green vegetation index (GVI), soil adjusted vegetation index (SAVI) [9], and enhanced vegetation index (EVI) [10], are used. In addition, meteorological variables [11,12,13] and soil condition data, including soil moisture and temperature, can also be used as environmental indicators of crop growth for yield prediction [14]. A variety of meteorological factors related to temperature and precipitation have also been analyzed for their relationships with crop yield across different geographical zones [15].
Crop yield estimation models include primarily crop simulation and empirical statistical models [16]. Crop simulation models can accurately simulate the crop growth process. However, owing to the lack of long-term crop growth monitoring data and the complexity of parameter tuning mechanisms, the applications of these models at large spatiotemporal scales remain rare. When empirical statistical models are applied to large-scale yield estimation, they may also struggle to capture complex spatio-temporal variations [17]. The advancement of machine learning methods, such as support vector machines (SVMs), decision trees (DTs), multilayer perceptrons (MLPs), and restricted Boltzmann machines (RBMs) [18], offers multiple options for crop yield prediction and has achieved credible accuracy [19]. Additionally, artificial neural networks (ANNs) are recognized as effective approaches for predicting yields for a variety of crops [20,21,22]. Seven machine learning techniques have been applied to yield prediction across different regions and crop types, demonstrating the regional applicability and potential of machine learning approaches [23].
Recently, deep learning (DL) has been recognized as a breakthrough technology in machine learning and data mining for remote sensing in agriculture [24]. Most deep learning algorithms, including stacked sparse autoencoders (SSAEs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs), have been applied to yield prediction. An SSAE model for rice yield prediction using climate and MODIS data was proposed, and the results revealed that the model outperformed the artificial neural network model [25]. A CNN model for crop yield prediction was designed based on the NDVI and RGB images obtained by drones, and the results showed that the CNN architecture performed better with RGB data than with NDVI data [26]. Studies have shown that CNNs trained with RGB and multispectral datasets performed much better than VI-based regression models in yield estimation [27]. A faster region-based CNN was utilized to detect and count the number of flowers and ripe and unripe strawberries for yield prediction [28], and a 3D CNN architecture for yield prediction was optimized, achieving results that significantly outperform those of other machine learning methods [29]. Unmanned aerial vehicle data have also been integrated with deep learning algorithms to develop rice yield prediction models, offering a reference for yield estimation and breeding trials [30].
Furthermore, several studies have explored the use of RNNs to incorporate temporal features for predicting yields. An RNN model was designed to determine the best combination of soil parameters and combined with rainfall patterns in selected areas to derive expected crop yields [31]. Long short-term memory (LSTM) networks, which utilize gating mechanisms, play a significant role in learning from time series data and have important applications in yield prediction. An LSTM model was trained to predict corn yields on the basis of weather and soil data, and the county-level results in Iowa demonstrated the strong predictive capabilities of the model [32]. Moreover, an LSTM model was employed for predicting wheat yield [33] and was utilized to forecast sugar beet yield [34]. Both studies validated the excellent performance of the model in prediction tasks. Enhancing LSTM models with attention mechanisms [35] or combining them with other models [36] has also significantly improved yield prediction accuracy. CNNs can extract spatial features, and LSTMs can capture the phenological characteristics of time-series data [37]; however, limited attention has been given to distinguishing the county-level yield prediction performance between CNNs and RNNs, as well as their potential for integrated application. Unlike traditional statistical models, CNNs effectively extract spatial features from imagery, while LSTMs capture temporal dependencies in crop growth, and their combination may be suited for yield prediction under data-limited conditions.
In general, crop yield forecasting focuses primarily on extending predictions from the current period to the end of the season [29,31,33,38], with few studies addressing midseason yield forecasts. In this context, modeling with single-phase data emerges as an effective approach for achieving immediate mid-season yield prediction. A deep learning framework played a role in midseason yield prediction [39]; however, this approach focused predominantly on improving the accuracy of yield predictions across the entire growing season. Mid-season yield estimation at the sub-regional level has demonstrated the feasibility of early yield estimation [40].
To achieve high-precision prediction of crop yield during the midseason on a large scale, this study investigated immediate crop yield estimation models with multi-temporal (specifically four-phase and two-phase) and single-phase remote sensing data. We also compared the performance of spatial-focused CNNs and temporal-specialized LSTMs for in-season yield prediction, and evaluated their accuracy against statistical benchmarks.

2. Materials and Methods

2.1. Study Area

The study area was the United States Corn Belt, a premier agricultural region that accounts for a substantial proportion of global corn production. This region is characterized by flat terrain with an average elevation of less than 500 m, fertile soil, and annual precipitation of 500 to 600 mm. Additionally, elevated temperatures during the spring and summer lead to optimal conditions for corn cultivation. We selected eight states, including Arkansas (AR), Illinois (IL), Indiana (IN), Iowa (IA), Minnesota (MN), Missouri (MO), Nebraska (NE), and North Dakota (ND), which have been primary corn-producing regions in the United States since the 1850s, and in which the corn-related data are notably comprehensive. The study period was 2003–2016, and the 582 counties considered in this study are shown in Figure 1.

2.2. Data

Considering the availability and accessibility of the data, we utilized MODIS surface reflectance (SR) data and MODIS land surface temperature (LST) data as input data. The corn growing season was determined by analyzing the Crop Progress Report (CPR; https://usda.library.cornell.edu/, accessed on 2 June 2025). Given that the growing season in the U.S. Corn Belt is from late April to late October, the temporal coverage for each year in this study was defined from Day of Year (DOY) 113 to 304, corresponding to April 23rd to October 31st in nonleap years and April 22nd to October 30th in leap years. All the remote sensing data from the corresponding months in the study area were collected from 2003 to 2016, and all the data were obtained from Google Earth Engine (GEE). The descriptions of the relevant data are as follows.
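The DOY-to-date correspondence stated above can be checked with Python's standard library; the sketch below only converts day numbers to calendar dates (the 8-day composite windows themselves are defined by the MODIS products, not by this snippet):

```python
from datetime import date, timedelta

def doy_to_date(year: int, doy: int) -> date:
    """Convert a 1-based Day of Year to a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)

# Non-leap year: DOY 113 falls on April 23rd, DOY 304 on October 31st.
print(doy_to_date(2003, 113))  # 2003-04-23
print(doy_to_date(2003, 304))  # 2003-10-31
# Leap year: the same DOYs shift one calendar day earlier.
print(doy_to_date(2004, 113))  # 2004-04-22
print(doy_to_date(2004, 304))  # 2004-10-30
```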

2.2.1. MODIS Surface Reflectance

MOD09A1 is an 8-day composite data product providing 500-m surface reflectance derived from the MODIS Terra satellite. Each pixel contains the most accurate observation value within the 8-day period. For this study, seven reflectance bands (bands 1–7) were selected as the input features for the model.

2.2.2. MODIS Land Surface Temperature

MYD11A2 is an 8-day composite land surface temperature data product from MODIS Aqua, which provides the 8-day average land surface temperature. MYD11A2 is composed of daytime and nighttime LSTs, and these two bands were selected for model input and resampled to 500 m.

2.2.3. Yield Data

County-level corn yield data for 2003 to 2016 were obtained from the United States Department of Agriculture (USDA; https://quickstats.nass.usda.gov, accessed on 3 June 2025), and the distribution of the data is shown in Figure 2. Data were filtered by removing outliers beyond the 2nd and 98th percentiles, resulting in a final dataset of 7426 yield records. The original unit of measurement was bushels per acre (bu/ac), which was converted to kilograms per hectare (kg/ha) for the study. The yield data were used as labels for model training and validation.
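The percentile filtering and unit conversion can be sketched with NumPy as follows. The conversion factor is an assumption on our part: it follows from the USDA standard test weight for shelled corn of 56 lb/bu, giving 1 bu/ac ≈ 62.77 kg/ha; the paper does not state which factor was used.

```python
import numpy as np

# Assumption: 56 lb/bu shelled corn -> 1 bu/ac is approximately 62.77 kg/ha.
BU_PER_AC_TO_KG_PER_HA = 62.77

def filter_and_convert(yields_bu_ac: np.ndarray) -> np.ndarray:
    """Drop records outside the 2nd-98th percentile range, then convert units."""
    lo, hi = np.percentile(yields_bu_ac, [2, 98])
    kept = yields_bu_ac[(yields_bu_ac >= lo) & (yields_bu_ac <= hi)]
    return kept * BU_PER_AC_TO_KG_PER_HA
```

For example, a typical county yield of 160 bu/ac maps to roughly 10,043 kg/ha under this factor.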

2.2.4. Cropland Data Layer

The cropland data layer (CDL) is a crop-specific data layer, with a spatial resolution of 30 m and a temporal resolution of one year. While CDL data were not available for the entire United States prior to 2008, the selected states had consistent coverage beginning in 2003. Mask data were employed to exclude interference from other non-corn crops to increase the precision of the study.

2.2.5. Data Preprocessing

Deep learning (DL) requires remote sensing data to be formatted as tensors, and the processing of large-scale remote sensing data demands substantial computational resources. In this study, to increase computational efficiency, the remote sensing data were transformed into histograms, which can effectively reduce the dimensionality while retaining the characteristics of the original data. The specific data preprocessing steps were as follows (Figure 3).
(a) Data Download: Remote sensing image data from 2003 to 2016 were downloaded from GEE.
(b) Data masking: Data were clipped using U.S. nation boundary vector data (https://gadm.org/), and the corn planting areas were extracted using the cropland data layer.
(c) Histogram Conversion: In county-level yield prediction tasks, the amount of available yield label data is far smaller than the number of pixels in remote sensing imagery. Therefore, it is necessary to design a remote sensing feature representation method that can both retain essential information and significantly reduce dimensionality. To this end, we hypothesized that county-level yield depended mainly on the proportional distribution of different crop growth statuses (characterized by pixel spectral features) within the region, rather than on the specific spatial arrangement of pixels [41]. That is, under the premise of maintaining an unchanged spectral distribution, shuffling the spatial positions of pixels will not significantly alter the predicted yield value. Based on this permutation invariance assumption, the image information can be effectively compressed into histogram representations based on pixel value counts. Consequently, all the bands of the data were converted into histograms, with the number of bins set to 32 to balance the preservation of spectral information and computational efficiency.
(d) Data flattening: After the flattening process, the format of the input variable was 1 × 9 × 32, and the corresponding actual county yield labels were matched.
Finally, in accordance with the crop growing season, in this study, the temporal period of each year was divided into 24 phases, with a temporal resolution of 8 days, and each phase included 9 bands. Following the aforementioned processing, the data were input into the model.
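The histogram conversion and flattening in steps (c) and (d) can be sketched with NumPy. The per-band value range and the normalization of counts below are illustrative assumptions, since the binning limits are not specified in the text; only the 9-band, 32-bin shape is taken from the paper.

```python
import numpy as np

N_BANDS, N_BINS = 9, 32  # 7 reflectance bands + 2 LST bands, 32 bins each

def image_to_histograms(pixels: np.ndarray, value_range=(0.0, 1.0)) -> np.ndarray:
    """Compress an (n_pixels, 9) array of masked corn pixels into a 1 x 9 x 32
    histogram tensor; the spatial arrangement of pixels is deliberately discarded
    (permutation invariance assumption)."""
    hists = np.empty((N_BANDS, N_BINS))
    for b in range(N_BANDS):
        counts, _ = np.histogram(pixels[:, b], bins=N_BINS, range=value_range)
        hists[b] = counts / max(counts.sum(), 1)  # normalize to a distribution
    return hists[np.newaxis]  # shape (1, 9, 32), matching the input format
```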

2.3. Method

Upon completion of the above data preprocessing, we employed several mature deep learning architectures for model construction, including a Convolutional Neural Network (CNN), a Long Short-Term Memory network (LSTM), and a hybrid CNN-LSTM model. Furthermore, Gaussian process regression (GP) was integrated into each of these base architectures to enhance predictive performance. Figure 4 illustrates the specific methodology.

2.3.1. Convolutional Neural Network Combined with GP (CNN_GP)

Convolutional neural networks (CNNs) are representative deep learning algorithms that specialize in extracting spatial features from input data with high effectiveness. They represent a feedforward neural network with local connectivity and weight sharing, which is composed mainly of one or more convolutional layers, a pooling layer, and a fully connected layer. The convolutional layer can extract the features of the image using the convolutional kernel. The pooling layer is used to reduce the dimensionality of the data, and the fully connected layers are used to integrate all the output features together and output the result. In this study, we constructed 6 convolutional blocks, each containing a convolutional layer and a batch normalization layer. The ReLU activation function was used, and the dropout rate was 0.5. The strides of the convolutional layers were 1, 2, 1, 2, 1, and 2, and the sizes of the convolutional kernels were all 3 × 3. Finally, the CNN model output its predictions through two fully connected layers, and the CNN_GP model was formed by replacing the last fully connected layer with the GP (Figure 5).
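Assuming a padding of 1 ("same"-style borders, which the paper does not state), the effect of the stride pattern 1, 2, 1, 2, 1, 2 can be traced with the standard convolution output-size formula; the feature axis is halved only at the stride-2 layers:

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Standard convolution output-size formula (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

def stack_out(size: int, strides=(1, 2, 1, 2, 1, 2)) -> int:
    """Trace the size through the six 3x3 convolutional blocks."""
    for s in strides:
        size = conv_out(size, stride=s)
    return size

print(stack_out(32))  # the 32-bin histogram axis shrinks to 4
```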

2.3.2. Long Short-Term Memory Network Combined with GP (LSTM_GP)

Long short-term memory (LSTM) is a special type of recurrent neural network (RNN). Traditional RNNs suffer from the vanishing gradient problem, whereas LSTM incorporates a gating mechanism that effectively mitigates the vanishing gradient issue when long time-series data are being processed. Moreover, LSTM has a better memory function and can retain distant context information when processing sequence data, which offers advantages in time series prediction and signal processing tasks. In this study, we designed a unidirectional LSTM network consisting of three stacked LSTM layers followed by a fully connected head that comprises two sequential linear layers. To prevent overfitting, we set the dropout rate to 0.5. Finally, the LSTM model output results through two linear layers, and the LSTM_GP model was formed by replacing the last linear layer with the GP (Figure 6).
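The gating mechanism described above can be illustrated with a minimal single-step NumPy LSTM cell. The gate ordering and toy weights are illustrative only, not the trained model's parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,).
    Gates are stacked as input, forget, cell candidate, output."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[:H])          # input gate: how much new information enters
    f = sigmoid(z[H:2 * H])     # forget gate: how much old memory is kept
    g = np.tanh(z[2 * H:3 * H]) # candidate cell state
    o = sigmoid(z[3 * H:])      # output gate
    c_new = f * c + i * g       # additive memory update mitigates vanishing gradients
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```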

2.3.3. CNN_LSTM Combined with GP (CNN_LSTM_GP)

To further integrate the spatial extraction advantages of the CNN and the temporal extraction advantages of the LSTM, a CNN_LSTM model was designed. This model was employed to learn both spatial patterns within each temporal phase and the temporal evolution across phases. Additionally, the final linear layer was replaced with the GP, resulting in the CNN_LSTM_GP model. All the parameters in the CNN_LSTM model were consistent with those used in the individual CNN and LSTM models (Figure 7).
In addition, to increase the training speed, early stopping was set during the training of all the models. When the accuracy did not improve for 10 epochs, the model triggered early stopping.
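The patience-based early stopping rule above can be sketched as follows; `evaluate` is a hypothetical stand-in for one epoch of training plus validation scoring:

```python
def train_with_early_stopping(evaluate, max_epochs=200, patience=10):
    """Stop when the validation score has not improved for `patience` epochs.
    Returns the best epoch and its score."""
    best, best_epoch = float("-inf"), 0
    for epoch in range(max_epochs):
        score = evaluate(epoch)
        if score > best:
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # early stopping triggered
    return best_epoch, best
```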

2.3.4. Gaussian Process Regression (GP)

Deep learning networks can extract image features; however, some information related to crop growth, such as temporal factors, cannot be obtained from images. These attributes may either remain relatively stable over time or be affected by both time and space. Therefore, Gaussian process regression (GP) integrated with the DL models was introduced, incorporating temporal (year) and spatial (geographical coordinates) information to increase the accuracy of the prediction results.
The geographical coordinates represent spatial locations and define the spatial relationships between a given sample and all other samples in the dataset. GP can leverage the spatial coordinate relationships among sample points to model and correct the residual components that are not fully captured by the deep learning model. Moreover, using a radial basis function kernel, GP introduces a priori knowledge of spatial similarity into the model, making the model more consistent with the inherent spatial continuity of agricultural yields, thereby improving prediction accuracy. This process involves parameters normalized based on spatial and temporal data, with corresponding kernel matrices generated through distance computations between data points [41].
$$y(x) = f(x) + h(x)^{T}\beta$$
where $f(x) \sim \mathcal{GP}(0, k(x, x'))$, $h(\cdot)$ denotes the set of functions in the final layer, and $\beta$ follows a Gaussian distribution.
$$k(x, x') = \sigma^{2}\exp\left(-\frac{\lVert g_{loc} - g'_{loc}\rVert^{2}}{2r_{loc}^{2}} - \frac{\lVert g_{year} - g'_{year}\rVert^{2}}{2r_{year}^{2}}\right) + \sigma_{e}^{2}\,\delta_{g,g'}$$
where $\sigma$, $\sigma_{b}$, $\sigma_{e}$, $r_{loc}$, and $r_{year}$ are hyperparameters set to 1, 0.01, 0.32, 0.5, and 1.5, respectively. The terms $g_{loc}$ and $g'_{loc}$ denote geographical coordinates, while $g_{year}$ and $g'_{year}$ represent the corresponding years.
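The kernel above maps directly to NumPy with the stated hyperparameter values (σ = 1, σₑ = 0.32, r_loc = 0.5, r_year = 1.5); this sketch covers only the covariance computation, not the full GP fit or the basis-function term:

```python
import numpy as np

SIGMA, SIGMA_E, R_LOC, R_YEAR = 1.0, 0.32, 0.5, 1.5

def kernel(loc: np.ndarray, year: np.ndarray) -> np.ndarray:
    """Covariance matrix over samples with coordinates `loc` (n, 2) and `year` (n,)."""
    d_loc = np.sum((loc[:, None, :] - loc[None, :, :]) ** 2, axis=-1)
    d_year = (year[:, None] - year[None, :]) ** 2
    K = SIGMA ** 2 * np.exp(-d_loc / (2 * R_LOC ** 2) - d_year / (2 * R_YEAR ** 2))
    return K + SIGMA_E ** 2 * np.eye(len(year))  # noise term on the diagonal
```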

2.3.5. Multiple Scenarios Yield Prediction Model

We defined multiple scenarios based on the number of input phases (single-, two- and four-phase). To establish this, we configured the temporal parameters, including the start phase, end phase, and time step, to achieve yield prediction at different time intervals during the growing season.
In this study, the annual growing season was segmented into distinct time periods on the basis of the specified time step, which influences the training data structure for the prediction models. Specifically, when the time step was 4, the 24 phases of the annual growing season were partitioned into 6 contiguous time periods. Each prediction model was then trained using data from 4 consecutive phases within these periods. Conversely, when the time step was 2, the growing season was divided into 12 time periods, with each model trained on data from 2 consecutive phases. When the time step was 1, each model was trained using data from a single phase across multiple years. For example, with a time step of 4, the models were trained on data from Phases 1–4, 5–8, 9–12, 13–16, 17–20, and 21–24. By varying the time step size during model training, the optimal time points for midseason and real-time predictions were investigated to achieve the highest accuracy. The experimental design was visually summarized in Figure 8. To provide clear agricultural context for interpreting the prediction results, the temporal phases in the figure were aligned with key corn phenological stages. These stages were described using the standard V (vegetative) and R (reproductive) system: the Vegetative Growth period spanned from emergence (VE) to tasseling (VT); the Tasseling/Silking period covered the critical flowing transition from VT to silking (R1); and the Grain Fill & Maturation period encompassed development from the blister stage (R2) to physiological maturity (R6) [42].
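The partitioning of the 24 phases by time step described above can be sketched as:

```python
def partition_phases(n_phases: int = 24, step: int = 4):
    """Split the growing-season phases into contiguous periods of `step` phases."""
    return [list(range(start, start + step))
            for start in range(1, n_phases + 1, step)]

print(len(partition_phases(step=4)))  # 6 periods: phases 1-4, 5-8, ..., 21-24
print(len(partition_phases(step=2)))  # 12 periods
print(len(partition_phases(step=1)))  # 24 single-phase periods
```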

2.3.6. Model Evaluation

In this study, the yield data from 2003 to 2015 were used as the training dataset, and 10% of the data were extracted as the validation set. On the basis of the trained model, the corn yield in 2016 was predicted. The yield distribution map of the study counties in 2016 is shown in Figure 9, providing a reference for the subsequent analysis of the yield prediction error distribution. The predicted yields from the four-phase, two-phase and single-phase scenarios were compared with the statistical yield data to evaluate the performance of the prediction effect. Equations (1) and (2) were chosen to calculate R² and root mean square error (RMSE) values, respectively, as the evaluation metrics to analyze the model accuracy.
$$R^{2} = 1 - \frac{\sum_{i}(\hat{y}_{i} - y_{i})^{2}}{\sum_{i}(\bar{y} - y_{i})^{2}}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^{2}}{n}}$$
where $\hat{y}_{i}$ is the predicted value, $y_{i}$ is the observed value, $\bar{y}$ is the mean of the observed values, and $n$ is the number of samples.
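Both evaluation metrics can be computed directly in NumPy:

```python
import numpy as np

def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean square error in the units of the yield data (kg/ha here)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```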

3. Results

We conducted three experiments on the basis of the size of the image data involved in the training. The specific experimental results are presented below.

3.1. Four-Phase Data Yield Prediction

Table 1 presented the prediction results of the three models using four-phase data. The use of four-phase data divided the growing season into six time periods. Owing to the limited ground information contained in the early-phase data, effective yield prediction was not feasible during the initial stages. It was not until the third period, corresponding to Phases 9 to 12 (from June 25th to July 26th), that relatively reliable predictions were achieved using data from this period. All three models exhibited the best performance during this period, with the LSTM_GP model demonstrating the highest predictive accuracy. During the third and fourth periods (covering the critical growth stages in July and August), the LSTM_GP model demonstrated relatively high predictive accuracy. Particularly in the fourth period, the R² value exceeded those of CNN_GP and CNN_LSTM_GP by 0.13 and 0.17, respectively. Overall, model performance ranked as LSTM_GP > CNN_GP > CNN_LSTM_GP. Toward the end of the growing season, specifically during Phases 21 to 24 (from September 29th to October 31st), the prediction accuracy of the LSTM_GP model decreased because of regional variations in the crop harvesting schedules. The yield prediction could not be completed for the other two model groups when only the end-of-season data were used.
The spatial distributions of the prediction errors from the four-phase models were shown in Figure 10 and Figure A1. Blue indicated underestimation, and red indicated overestimation. The lighter the color was, the higher the prediction accuracy. In Figure 10, subplots (a), (b), and (c) corresponded to the LSTM_GP, CNN_GP, and CNN_LSTM_GP models, respectively. Panels a1–c1 represented the third period, while a2–c2 represented the fourth period (from July 27th to August 27th).
Analysis of Panels a1–c1 revealed pronounced underestimation effects in high-yield counties in the central and north-central region, with the LSTM_GP model exhibiting the weakest underestimation among the three models. Concurrently, overestimation effects were also evident in low-yield areas in the eastern and southern regions, as well as counties with a dispersed distribution, and the LSTM_GP model again demonstrated the best performance in mitigating these deviations. In subsequent time periods, the prediction errors increased further, with the underestimation effect becoming more pronounced, particularly in central counties. The LSTM_GP model demonstrated stable performance over longer temporal sequences. Considering the spatial error distribution across all four periods, the LSTM_GP model delivered the most robust spatial performance overall, which underscores the critical role of the LSTM layer in capturing temporal patterns for yield prediction.

3.2. Two-Phase Data Yield Prediction

The training results of the three models based on the two-phase data were presented in Table 2. The growing season was further divided into 12 stages. Similar to earlier scenarios, yield model training was not feasible in the initial stages. By the sixth period, corresponding to Phases 11–12 (from July 11th to July 26th), the models achieved relatively accurate predictions. Compared with the CNN_GP and CNN_LSTM_GP models, the LSTM_GP model attained optimal performance in this period, with an R² value of 0.61 and an RMSE of 983.38 kg/ha, an R² increase of 27% over the other models.
A comparison with the four-phase model results (Table 1) indicated that Phases 11–12 (from July 11th to July 26th) and Phases 13–14 (from July 27th to August 11th) played crucial roles in yield prediction. For early-season yield forecasting, the period from mid-July to August represented a critical growth stage during which models trained with even limited temporal-phase data achieved effective predictive performance. Furthermore, in longer temporal sequences, temporal features became more informative, enabling lower-complexity models such as LSTM_GP to achieve more reliable predictions.
Overall, the LSTM_GP model achieved the highest accuracy, whereas the CNN_GP and CNN_LSTM_GP models exhibited comparable performance. Throughout the entire time series, the LSTM_GP model demonstrated the greatest stability.
The spatial distributions of the prediction errors from the two-phase models were shown in Figure 11. Subplots (a), (b), and (c) corresponded to the LSTM_GP, CNN_GP, and CNN_LSTM_GP models, respectively. Panels a1–c1 showed the prediction errors for Phases 11–12, a2–c2 for Phases 13–14, and a3–c3 for Phases 15–16 (from August 12th to August 27th).
A comparison of the three models revealed that the LSTM_GP model performed the best; however, all three exhibited a trend toward increasingly pronounced underestimation in the central region as the growing season progressed.
A comparison of model performance across similar time periods using different temporal-phase combinations revealed that compared with those in Figure 10, the predictive accuracies in Panels a1–c1 in Figure 11 improved in the central region, indicating that Phases 11–12 played a more critical role in yield prediction. Furthermore, a comparison between Panels a2–c2 in Figure 11 and Panels a2–c2 in Figure 10 revealed that the former achieved a better performance, demonstrating that even a small amount of data from key phases during critical growth periods can enable effective yield prediction.

3.3. Single-Phase Data Yield Prediction

The training results of the three models based on single-phase data were presented in Table 3. In the single-phase training mode, the entire growing season was divided into 24 phases, each spanning an 8-day period. During training, only one remote sensing image per feature per year was used. In the early growth stages, the model accuracy was relatively low. By the 11th phase (from July 11th to July 18th), all three models were able to achieve effective predictions. All the models achieved their highest accuracy in the 12th phase (from July 19th to July 26th). The LSTM_GP model performed best, with an R² value of 0.62 and an RMSE of 969.06 kg/ha, whereas both CNN_GP and CNN_LSTM_GP also achieved an R² value of 0.61. Model accuracy decreased sharply in the 13th phase (from July 27th to August 3rd), which can be attributed to adverse weather conditions affecting the quality of the remote sensing data during this period. A comparison with the two-phase model results (Table 2) indicated that Phase 12 and Phase 14 (from August 4th to August 11th), representing a critical growth window, facilitated more effective model learning of yield-related spatio-temporal features from remote sensing data. Although the models maintained relatively good predictive performance in August (Phases 14–17), their accuracy began to decline in subsequent phases, ultimately failing to support real-time yield prediction. Over extended temporal sequences, the LSTM_GP model also demonstrated the best performance among the single-phase prediction models. During the critical growth stage, the CNN_GP model achieved higher accuracy than the CNN_LSTM_GP model.
The spatial distributions of the prediction errors from the single-phase models were shown in Figure 12 (for Phases 11–12) and Figure A2 (for Phases 13–16). In Figure 12, subplots (a), (b), and (c) corresponded to the LSTM_GP, CNN_GP, and CNN_LSTM_GP models, respectively. Panels a1–c1 showed the prediction error distributions for Phase 11, and panels a2–c2 for Phase 12.
With the exception of Phase 13, all three single-phase models demonstrated strong spatial predictive performance, with a notable reduction in the underestimation effect observed in high-yield central regions. This improvement was particularly evident during the 12th and 14th phases, likely attributable to the similarity in single-phase data characteristics, which facilitated more effective model learning. Among the models, LSTM_GP achieved the best performance, whereas the other two models exhibited comparable results.
A comparison of model performance between two-phase and single-phase training over the same time period revealed that panels a2–c2 in Figure 12 exhibited higher predictive accuracy than panels a1–c1 in Figure 11, indicating that Phase 12 played an important role in early-season yield prediction. Furthermore, the results demonstrated that Phases 14 and 15 contributed significantly to yield forecasting. The results with the highest prediction accuracy were concentrated between late July and early August.
Based on the training results of the three aforementioned temporal models, the LSTM_GP model consistently achieved the best performance. In multi-temporal training, the LSTM component leveraged its strength in temporal sequence extraction to effectively capture crop growth patterns. In single-temporal training, the LSTM model demonstrated heightened sensitivity to interannual feature similarities at the same phase, leading to robust performance in both multi-temporal and single-temporal prediction scenarios. In contrast, the spatial feature extraction capability of the CNN may be partly redundant with the spatial information introduced by the GP. The high complexity of the CNN_LSTM_GP model made it susceptible to overfitting on county-level yield data, resulting in compromised generalization and lower predictive accuracy. Our findings reinforce the view of crop growth as a pronounced temporal phenomenon [43], highlighting the critical role of time-series characteristics in yield forecasting.

4. Discussion

Previous studies have focused mainly on crop yield prediction for the entire growing season, whereas this study targeted yield prediction during the midseason of crop growth. On the basis of the experimental results, we conclude that deep learning models can achieve yield prediction during critical crop growth stages using only two-phase or even single-phase data.

4.1. Impact of GP on Model Performance

To assess the role of GP, the performance of the three model groups under three distinct training configurations is shown in Figure 13. Subplots (a) correspond to the LSTM, (b) to the CNN, and (c) to the CNN_LSTM models. Panels a1–c1 present results from the single-phase training model (Phase 12, 2016); panels a2–c2 display outcomes from the two-phase training model (Phases 11–12, 2016); and panels a3–c3 show results derived from the four-phase training model (Phases 9–12, 2016). Density contours represent the distribution density of the predicted values, with inner contours indicating regions of higher density; predictions without GP (blue) and with GP (orange) are shown with their respective regression lines. Comparative analysis between models with and without GP revealed significant improvements in predictive performance upon GP integration. For the LSTM models (a1–a3), the inclusion of GP raised the average R² value to 0.61, an improvement of 0.31 over the models without GP, and reduced the RMSE value by an average of 336.45 kg/ha. The CNN models (b1–b3) exhibited an average R² value of 0.53 with GP, a gain of 0.26, along with an average RMSE reduction of 269.69 kg/ha. Notably, the CNN_LSTM models (c1–c3) achieved the most substantial improvement, with GP increasing the mean R² value by 0.44, to 0.50, and decreasing the RMSE value by 415.68 kg/ha. The kernel density estimation contours further revealed that, with the integration of GP, the contours aligned more closely with the 1:1 line, exhibited a more circular shape, and reflected smaller estimation errors.
The results from other temporal phases in 2016 were consistent with the aforementioned conclusions. In Figure 14, panels a1–c1 present the results from the single-phase training model (Phase 16, 2016); panels a2–c2 display the outcomes from the two-phase training model (Phases 15–16, 2016); and panels a3–c3 show the results derived from the four-phase training model (Phases 13–16, 2016).
For the LSTM models (a1–a3), the incorporation of GP resulted in an average R² value of 0.49, an improvement of 0.26 over the models without GP, and the average RMSE value decreased by 257.94 kg/ha. The CNN models (b1–b3) exhibited an average R² value of 0.40 with GP, a gain of 0.09, along with an average RMSE reduction of 90.02 kg/ha. Similarly, with GP the CNN_LSTM models (c1–c3) achieved an average R² value of 0.35, a gain of 0.21, accompanied by an average RMSE reduction of 195.10 kg/ha.
These results indicate that the GP consistently enhanced prediction accuracy, and that its contribution was larger for the LSTM and CNN_LSTM models than for the pure CNN. This differential effect may be attributed to the inherent strength of CNN models in spatial feature extraction, whereas the explicit introduction of spatial information via the GP provided complementary benefits to temporal feature extractors such as LSTM, thereby leading to more pronounced accuracy gains in spatial–temporal prediction tasks.
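As a concrete illustration of this mechanism, the sketch below grafts a GP onto a simulated network output by regressing the network's residuals on county-centroid coordinates, in the spirit of the deep Gaussian process approach [41]. The synthetic data, the scikit-learn API, the RBF kernel, and the residual-correction scheme are all illustrative assumptions, not this study's exact implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical county centroids (lon, lat) and yields with a spatial trend.
coords = rng.uniform([-96.0, 40.0], [-90.0, 44.0], size=(80, 2))
true_yield = 9000.0 + 300.0 * coords[:, 1] + rng.normal(0.0, 200.0, 80)
# Simulated network output that misses a smooth spatial residual pattern.
nn_pred = true_yield - 250.0 * np.sin(coords[:, 1])

# Fit a GP to the network residuals as a function of location, then
# add the predicted residual back to the network output.
gp = GaussianProcessRegressor(kernel=1.0 * RBF(1.0) + WhiteKernel(1e-2),
                              normalize_y=True)
gp.fit(coords, true_yield - nn_pred)
corrected = nn_pred + gp.predict(coords)

rmse_nn = float(np.sqrt(np.mean((true_yield - nn_pred) ** 2)))
rmse_gp = float(np.sqrt(np.mean((true_yield - corrected) ** 2)))
print(rmse_gp < rmse_nn)  # the spatial correction shrinks the error
```

The point of the sketch is only that a spatially smooth error field, invisible to a purely temporal model, can be recovered from coordinates alone, which is consistent with the larger GP gains observed for the LSTM-based models.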
Furthermore, the analysis revealed that the predictive accuracy of the single-phase model during critical growth stages was comparable to, and occasionally exceeded, that of the two-phase and four-phase models. These findings indicate that yield prediction can be effectively achieved using single-phase data during key phenological periods, particularly the tasseling/silking stage in late July to early August, which has significant implications for real-time yield estimation, especially in the context of disaster monitoring and response. Meanwhile, compared with other studies that achieved yield prediction in late September [44] or 36 days before harvest [45], our study significantly advanced the timing of early yield prediction.

4.2. Comparative Analysis with Time-Series Data Modeling

To evaluate the performance of models trained on short-phase data, we conducted comparative experiments using the best-performing LSTM_GP model. Time-series models were trained from different start phases (start phase = 11 and start phase = 1) to enable systematic comparison with short-phase modeling approaches.
The results in Table 4 demonstrated that sequential models trained from the 11th time phase (e.g., using data from phases 11–16 to predict yield at phase 16) achieved superior accuracy and stability compared to those trained from the 1st time phase. Notably, models initialized at the 11th time phase did not exhibit significant overfitting in the later stages of training. These findings suggested that initiating time-series input from key crop growth phases may yield higher prediction accuracy than modeling from the earliest phenological stage. This phenomenon may be attributable to weaker remote sensing signals during the early growing season.
The model trained from the 11th time phase achieved optimal accuracy at the 16th time phase (R² = 0.69, RMSE = 873.90 kg/ha), with robust performance persisting through the 20th time phase (late September; R² = 0.65, RMSE = 935.78 kg/ha). Despite their advantages in predictive accuracy, sequential models exhibited lower training efficiency: models trained from the 1st time phase required approximately four times the training time of single-phase models (which train in roughly 5 min), while those trained from the 11th time phase required roughly twice the training time. Although single-phase models yielded comparatively lower predictive performance, they maintained considerable accuracy during critical phenological phases. Moreover, short-phase or even single-phase modeling strategies offer substantial practical value under conditions of limited data availability or when rapid predictions are required.
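The start-phase comparison above amounts to feeding the sequence model different slices of the same seasonal feature stack. A minimal sketch follows; the 24 × 32 × 9 stack layout (phases × histogram bins × bands) is a hypothetical arrangement for illustration, not necessarily this study's actual dimensions.

```python
import numpy as np

# Hypothetical per-county stack: 24 eight-day phases x 32 bins x 9 bands.
season = np.zeros((24, 32, 9))

def sequence_input(stack, start_phase, end_phase):
    """Slice phases start_phase..end_phase (1-indexed, inclusive)."""
    return stack[start_phase - 1:end_phase]

# Start phase = 11: predict the yield at phase 16 from phases 11-16.
x_late = sequence_input(season, 11, 16)
# Start phase = 1: use the full record up to phase 16.
x_full = sequence_input(season, 1, 16)
print(x_late.shape, x_full.shape)  # (6, 32, 9) (16, 32, 9)
```

The shorter slice carries less than half the input of the full-record one, which is consistent with the roughly halved training time observed for the start phase = 11 models.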

4.3. Additional Improvements of the Study

In this study, three experiments were conducted on three models. When single-phase images were used, good yield prediction results were achieved in late July, which represents an improvement in prediction lead time over existing methods that obtain reliable predictions only one month in advance [46]. Thus, our results further confirm that the period from late July to August, which aligns with the critical tasseling and silking stages, is a feasible window for establishing early and reliable yield prediction models. Single-phase model training also reduces computation time. Moreover, in the spatial distribution maps of the model predictions, the error was concentrated mainly in the central area of the corn belt, where high-yield counties were underestimated. This may be because this area is the main corn-producing region, with yields much higher than elsewhere; information from other areas introduced interference and led the models to underestimate yields in the central corn belt. Incorporating more training data while controlling for data heterogeneity could further increase the accuracy of yield prediction.
During the early crop growth stages, the scarcity of crop-related information limits the training accuracy of models that use limited temporal data. As crops mature, the weakening of crop signals further challenges model training, which explains why current models perform optimally only during critical growth periods. The observed performance degradation in late-season phases (Phases 21–24 in Table 1, Table 2 and Table 3) can be attributed to several interrelated factors. Firstly, during the late growth stages, the remote sensing signals from crops weaken, leading to a weakened and noisier statistical relationship with final yield, which diminishes the predictive utility of remote sensing data. Secondly, the critical window for yield formation typically closes by the late grain-filling stages, meaning late-phase imagery offers little incremental information. Finally, data-driven models face inherent challenges in extracting information during the late growth stages of crops due to both the attenuated signal strength and the difficulty in generalizing across diverse late-season field conditions. The incorporation of spatial information significantly enhances models trained on temporal data, and the integration of additional data sources (e.g., soil properties, management practices) is expected to further improve prediction accuracy.
Comparative analysis of the LSTM_GP, CNN_GP, and CNN_LSTM_GP models indicated that the model with the lowest complexity achieved the best performance. This suggests that the combination of LSTM for temporal feature extraction and GP with integrated spatial coordinate information is sufficient for county-scale yield prediction, rendering more complex architectures unnecessary. We attribute the superiority of the LSTM_GP framework to several factors. First, the histogram-based representation of county-level remote sensing data is already highly aggregated, leaving limited spatial structure for the CNN modules to exploit. Second, the LSTM component effectively captures the interannual dynamic patterns of crop growth. Third, the direct fusion of CNN and LSTM features may introduce redundancy; given that the input data have already been compressed via the histogram transformation, the hybrid architecture does not consistently yield synergistic gains. The potential benefits of incorporating alternative model mechanisms warrant further investigation.
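The histogram transformation referred to above replaces each county's pixel values with a fixed-length distribution vector, discarding within-county geometry. A minimal sketch of this idea follows; the bin count, value range, and simulated reflectances are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical surface-reflectance pixels for one county/band/phase, in [0, 1].
pixels = rng.beta(2.0, 5.0, size=5000)

def county_histogram(values, bins=32, value_range=(0.0, 1.0)):
    """Normalized histogram: a fixed-length summary of the pixel distribution."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    return counts / counts.sum()

h = county_histogram(pixels)
print(h.shape, round(float(h.sum()), 6))  # (32,) 1.0
```

Stacking such vectors across bands and phases yields an input whose spatial structure is already collapsed, which is consistent with the limited benefit observed for the CNN modules.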
In the future, timely prediction models based on single-phase data can be directly applied to support practical agricultural production, especially in response to emergent disaster events. However, relying solely on optical data is insufficient, as these data are highly susceptible to meteorological conditions and often undergo maximum-value compositing during processing, which limits their ability to capture disaster-related information. Integrating additional data sources such as meteorological, topographic, and synthetic aperture radar data can significantly improve the accuracy and timeliness of crop yield prediction. Further work will also prioritize evaluating and improving model resilience under extreme climatic conditions and evolving agricultural technologies, as well as extending the applicability to broader geographic and agroecological regions.

5. Conclusions

This study employed multiple deep learning approaches, including LSTM and CNN, to extract spatio-temporal information, integrated them with Gaussian process regression, and trained the models on multi-temporal-phase data to achieve real-time in-season crop yield estimation during critical growth stages, thereby addressing the limitations of yield prediction methods that rely solely on full sequential data. The results demonstrate that the periods from mid-to-late July and early-to-mid August represent critical growth stages during which even single-temporal-phase data can support high-accuracy yield estimation. The incorporation of GP introduced spatial information and significantly enhanced the performance of the LSTM model. The use of two-phase or even single-phase data during the mid-growing season enables high-precision yield prediction, allowing for rapid estimation of crop yield at key growth stages and facilitating timely and efficient field management.

Author Contributions

Conceptualization, X.Z., Y.D. and J.S.; methodology, X.Z., Y.D., J.S., Z.X. and H.Y.; software, X.Z. and Y.D.; formal analysis, X.Z.; investigation, X.Z.; data curation, X.Z. and Y.D.; writing—original draft preparation, X.Z. and Y.D.; writing—review and editing, X.Z. and J.S.; visualization, X.Z.; supervision, J.S.; funding acquisition, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China Major Program, grant numbers 42192584 and 42192580.

Data Availability Statement

The MODIS data (MODIS SR, MODIS LST) and Cropland Data Layer (CDL) used in this paper are available in Google Earth Engine (GEE); the yield data were obtained from the United States Department of Agriculture (https://quickstats.nass.usda.gov, accessed on 3 June 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. County-level prediction error maps for the 17th to 24th time phases in 2016 based on (a) LSTM_GP, (b) CNN_GP and (c) CNN_LSTM_GP, showing (1) Phases 17–20 and (2) Phases 21–24; the error is in kg/ha. Panels a1–c1 represent the fifth four-phase period (from August 28th to September 28th), and panels a2–c2 represent the sixth.
Figure A2. County-level prediction error maps for the 13th to 16th time phases in 2016 based on (a) LSTM_GP, (b) CNN_GP, and (c) CNN_LSTM_GP; the error is in kg/ha. Panels a1–c1, a2–c2, a3–c3, and a4–c4 correspond to Phases 13, 14, 15 (from August 12th to August 19th), and 16 (from August 20th to August 27th), respectively.

References

  1. Barriere, V.; Claverie, M.; Schneider, M.; Lemoine, G.; d’Andrimont, R. Boosting Crop Classification by Hierarchically Fusing Satellite, Rotational, and Contextual Data. Remote Sens. Environ. 2024, 305, 114110. [Google Scholar] [CrossRef]
  2. Wu, B.; Zhang, M.; Zeng, H.; Tian, F.; Potgieter, A.B.; Qin, X.; Yan, N.; Chang, S.; Zhao, Y.; Dong, Q.; et al. Challenges and Opportunities in Remote Sensing-Based Crop Monitoring: A Review. Natl. Sci. Rev. 2023, 10, nwac290. [Google Scholar] [CrossRef] [PubMed]
  3. Li, Q.; Tian, J.; Tian, Q. Deep Learning Application for Crop Classification via Multi-Temporal Remote Sensing Images. Agriculture 2023, 13, 906. [Google Scholar] [CrossRef]
  4. Thorp, K.R.; Drajat, D. Deep Machine Learning with Sentinel Satellite Data to Map Paddy Rice Production Stages across West Java, Indonesia. Remote Sens. Environ. 2021, 265, 112679. [Google Scholar] [CrossRef]
  5. Ludewig-Spickermann, O.C.; Sombrowski, M.; Kim, D.Y.; Gille, S.; Deubel, A.; Aleithe, W.; Seidel, M.; Pietsch, M. Optimisation of the Correlation between Normalised Difference Vegetation Index and Sugar Beet Yield Using Multispectral Remote Sensing Data. Eur. J. Agron. 2025, 171, 127820. [Google Scholar] [CrossRef]
  6. Shrestha, R.; Di, L.; Yu, E.G.; Kang, L.; Shao, Y.; Bai, Y. Regression Model to Estimate Flood Impact on Corn Yield Using MODIS NDVI and USDA Cropland Data Layer. J. Integr. Agric. 2017, 16, 398–407. [Google Scholar] [CrossRef]
  7. Duchemin, B.; Maisongrande, P.; Boulet, G.; Benhadj, I. A Simple Algorithm for Yield Estimates: Evaluation for Semi-Arid Irrigated Winter Wheat Monitored with Green Leaf Area Index. Environ. Model. Softw. 2008, 23, 876–892. [Google Scholar] [CrossRef]
  8. Bolton, D.K.; Friedl, M.A. Forecasting Crop Yield Using Remotely Sensed Vegetation Indices and Crop Phenology Metrics. Agric. For. Meteorol. 2013, 173, 74–84. [Google Scholar] [CrossRef]
  9. Hara, P.; Piekutowska, M.; Niedbała, G. Selection of Independent Variables for Crop Yield Prediction Using Artificial Neural Network Models with Remote Sensing Data. Land 2021, 10, 609. [Google Scholar] [CrossRef]
  10. Xue, J.; Su, B. Significant Remote Sensing Vegetation Indices: A Review of Developments and Applications. J. Sens. 2017, 2017, 1353691. [Google Scholar] [CrossRef]
  11. Mathieu, J.A.; Aires, F. Using Neural Network Classifier Approach for Statistically Forecasting Extreme Corn Yield Losses in Eastern United States. Earth Space Sci. 2018, 5, 622–639. [Google Scholar] [CrossRef]
  12. Funk, C.; Peterson, P.; Landsfeld, M.; Pedreros, D.; Verdin, J.; Shukla, S.; Husak, G.; Rowland, J.; Harrison, L.; Hoell, A.; et al. The Climate Hazards Infrared Precipitation with Stations—A New Environmental Record for Monitoring Extremes. Sci. Data 2015, 2, 150066. [Google Scholar] [CrossRef]
  13. Crane-Droesch, A. Machine Learning Methods for Crop Yield Prediction and Climate Change Impact Assessment in Agriculture. Environ. Res. Lett. 2018, 13, 114003. [Google Scholar] [CrossRef]
  14. Pourmohammadali, B.; Hosseinifard, S.J.; Hassan Salehi, M.; Shirani, H.; Esfandiarpour Boroujeni, I. Effects of Soil Properties, Water Quality and Management Practices on Pistachio Yield in Rafsanjan Region, Southeast of Iran. Agric. Water Manag. 2019, 213, 894–902. [Google Scholar] [CrossRef]
  15. Sabut, A.; Tripathy, K.P.; Mishra, A.; Anderson, M.; Cosh, M.; Kraatz, S.; Gao, F.; Cirone, R. Assessing the Impact of Climate Indices on Corn Yield in the Continental USA Using Machine Learning Approach. Agric. For. Meteorol. 2025, 371, 110632. [Google Scholar] [CrossRef]
  16. Bocca, F.F.; Rodrigues, L.H.A. The Effect of Tuning, Feature Engineering, and Feature Selection in Data Mining Applied to Rainfed Sugarcane Yield Modelling. Comput. Electron. Agric. 2016, 128, 67–76. [Google Scholar] [CrossRef]
  17. Fang, H.; Liang, S.; Hoogenboom, G. Integration of MODIS LAI and Vegetation Index Products with the CSM–CERES–Maize Model for Corn Yield Estimation. Int. J. Remote Sens. 2011, 32, 1039–1065. [Google Scholar] [CrossRef]
  18. Kim, N.; Lee, Y.W. Machine Learning Approaches to Corn Yield Estimation Using Satellite Images and Climate Data: A Case of Iowa State. J. Korean Soc. Surv. Geod. Photogramm. Cartogr. 2016, 34, 383–390. [Google Scholar] [CrossRef]
  19. Zhou, W.; Liu, Y.; Ata-Ul-Karim, S.T.; Ge, Q.; Li, X.; Xiao, J. Integrating Climate and Satellite Remote Sensing Data for Predicting County-Level Wheat Yield in China Using Machine Learning Methods. Int. J. Appl. Earth Obs. Geoinf. 2022, 111, 102861. [Google Scholar] [CrossRef]
  20. Niedbała, G. Simple Model Based on Artificial Neural Network for Early Prediction and Simulation Winter Rapeseed Yield. J. Integr. Agric. 2019, 18, 54–61. [Google Scholar] [CrossRef]
  21. Zeng, W.; Xu, C.; Gang, Z.; Wu, J.; Huang, J. Estimation of Sunflower Seed Yield Using Partial Least Squares Regression and Artificial Neural Network Models. Pedosphere 2018, 28, 764–774. [Google Scholar] [CrossRef]
  22. Abrougui, K.; Gabsi, K.; Mercatoris, B.; Khemis, C.; Amami, R.; Chehaibi, S. Prediction of Organic Potato Yield Using Tillage Systems and Soil Properties by Artificial Neural Network (ANN) and Multiple Linear Regressions (MLR). Soil Tillage Res. 2019, 190, 202–208. [Google Scholar] [CrossRef]
  23. Ju, S.; Lim, H.; Ma, J.W.; Kim, S.; Lee, K.; Zhao, S.; Heo, J. Optimal County-Level Crop Yield Prediction Using MODIS-Based Variables and Weather Data: A Comparative Study on Machine Learning Models. Agric. For. Meteorol. 2021, 307, 108530. [Google Scholar] [CrossRef]
  24. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep Learning in Agriculture: A Survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
  25. Ma, J.-W.; Nguyen, C.-H.; Lee, K.; Heo, J. Regional-Scale Rice-Yield Estimation Using Stacked Auto-Encoder with Climatic and MODIS Data: A Case Study of South Korea. Int. J. Remote Sens. 2019, 40, 51–71. [Google Scholar] [CrossRef]
  26. Nevavuori, P.; Narra, N.; Linna, P.; Lipping, T. Crop Yield Prediction Using Multitemporal UAV Data and Spatio-Temporal Deep Learning Models. Remote Sens. 2020, 12, 4000. [Google Scholar] [CrossRef]
  27. Yang, Q.; Shi, L.; Han, J.; Zha, Y.; Zhu, P. Deep Convolutional Neural Networks for Rice Grain Yield Estimation at the Ripening Stage Using UAV-Based Remotely Sensed Images. Field Crops Res. 2019, 235, 142–153. [Google Scholar] [CrossRef]
  28. Chen, Y.; Lee, W.S.; Gan, H.; Peres, N.; Fraisse, C.; Zhang, Y.; He, Y. Strawberry Yield Prediction Based on a Deep Neural Network Using High-Resolution Aerial Orthoimages. Remote Sens. 2019, 11, 1584. [Google Scholar] [CrossRef]
  29. Mohan, A.; Venkatesan, M.; Prabhavathy, P.; Jayakrishnan, A. Temporal Convolutional Network Based Rice Crop Yield Prediction Using Multispectral Satellite Data. Infrared Phys. Technol. 2023, 135, 104960. [Google Scholar] [CrossRef]
  30. Zhou, H.; Huang, F.; Lou, W.; Gu, Q.; Ye, Z.; Hu, H.; Zhang, X. Yield Prediction through UAV-Based Multispectral Imaging and Deep Learning in Rice Breeding Trials. Agric. Syst. 2025, 223, 104214. [Google Scholar] [CrossRef]
  31. Kulkarni, S.; Mandal, S.N.; Sharma, G.S.; Mundada, M.R. Meeradevi Predictive Analysis to Improve Crop Yield Using a Neural Network Model. In Proceedings of the 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 19–22 September 2018; pp. 74–79. [Google Scholar]
  32. Jiang, Z.; Liu, C.; Hendricks, N.P.; Ganapathysubramanian, B.; Hayes, D.J.; Sarkar, S. Predicting County Level Corn Yields Using Deep Long Short Term Memory Models. Available online: https://arxiv.org/abs/1805.12044v1 (accessed on 7 October 2024).
  33. Haider, S.A.; Naqvi, S.R.; Akram, T.; Umar, G.A.; Shahzad, A.; Sial, M.R.; Khaliq, S.; Kamran, M. LSTM Neural Network Based Forecasting Model for Wheat Production in Pakistan. Agronomy 2019, 9, 72. [Google Scholar] [CrossRef]
  34. Wang, Q.; Shao, K.; Cai, Z.; Che, Y.; Chen, H.; Xiao, S.; Wang, R.; Liu, Y.; Li, B.; Ma, Y. Prediction of Sugar Beet Yield and Quality Parameters Using Stacked-LSTM Model with Pre-Harvest UAV Time Series Data and Meteorological Factors. Artif. Intell. Agric. 2025, 15, 252–265. [Google Scholar] [CrossRef]
  35. Li, M.; Wang, P.; Tansey, K.; Zhang, Y.; Guo, F.; Liu, J.; Li, H. An Interpretable Wheat Yield Estimation Model Using an Attention Mechanism-Based Deep Learning Framework with Multiple Remotely Sensed Variables. Int. J. Appl. Earth Obs. Geoinf. 2025, 140, 104579. [Google Scholar] [CrossRef]
  36. Nejad, S.M.M.; Abbasi-Moghadam, D.; Sharifi, A.; Tariq, A. Capsular Attention Conv-LSTM Network (CACN): A Deep Learning Structure for Crop Yield Estimation Based on Multispectral Imagery. Eur. J. Agron. 2024, 161, 127369. [Google Scholar] [CrossRef]
  37. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.; Woo, W. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 802–810. [Google Scholar] [CrossRef]
  38. Alhnaity, B.; Pearson, S.; Leontidis, G.; Kollias, S. Using Deep Learning to Predict Plant Growth and Yield in Greenhouse Environments. In Proceedings of the International Symposium on Advanced Technologies and Management for Innovative Greenhouses: GreenSys2019 1296, Angers, France, 16–20 June 2019. [Google Scholar]
  39. Sun, J.; Di, L.; Sun, Z.; Shen, Y.; Lai, Z. County-Level Soybean Yield Prediction Using Deep CNN-LSTM Model. Sensors 2019, 19, 4363. [Google Scholar] [CrossRef]
  40. Wang, H.; Dai, Y.; Yao, Q.; Ma, L.; Zhang, Z.; Lv, X. Multi-Task Learning Model Driven by Climate and Remote Sensing Data Collaboration for Mid-Season Cotton Yield Prediction. Field Crops Res. 2025, 333, 110070. [Google Scholar] [CrossRef]
  41. You, J.; Li, X.; Low, M.; Lobell, D.; Ermon, S. Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar] [CrossRef]
  42. Darby, H.; Lauer, J. Critical Stages in the Life of a Corn Plant. University of Wisconsin-Extension. 2000. Available online: http://corn.agronomy.wisc.edu/Management/pdfs/CriticalStages.pdf (accessed on 24 February 2026).
  43. Gelagay, H.S.; Leroux, L.; Tamene, L.; Chernet, M.; Blasch, G.; Tibebe, D.; Abera, W.; Sida, T.; Tesfaye, K.; Corbeels, M.; et al. A Crop-Specific and Time-Variant Spatial Framework for Characterizing Rainfed Wheat Production Environments in Ethiopia. Agric. Syst. 2025, 227, 104360. [Google Scholar] [CrossRef]
  44. Khan, S.N.; Iqbal, J.; Khan, M.R.; Malik, N.A.; Khan, F.A.; Khan, K.; Khan, A.N.; Wahab, A. Using Remotely Sensed Vegetation Indices and Multi-Stream Deep Learning Improves County-Level Corn Yield Predictions. Eur. J. Agron. 2025, 164, 127496. [Google Scholar] [CrossRef]
  45. Hashemi, M.G.Z.; Tan, P.-N.; Jalilvand, E.; Wilke, B.; Alemohammad, H.; Das, N.N. Yield Estimation from SAR Data Using Patch-Based Deep Learning and Machine Learning Techniques. Comput. Electron. Agric. 2024, 226, 109340. [Google Scholar] [CrossRef]
  46. Lu, C.; Leng, G.; Liao, X.; Tu, H.; Qiu, J.; Li, J.; Huang, S.; Peng, J. In-Season Maize Yield Prediction in Northeast China: The Phase-Dependent Benefits of Assimilating Climate Forecast and Satellite Observations. Agric. For. Meteorol. 2024, 358, 110242. [Google Scholar] [CrossRef]
Figure 1. Study area.
Figure 2. Yield data distribution.
Figure 3. Data processing flow.
Figure 4. Technical route.
Figure 5. CNN and CNN_GP network structure.
Figure 6. LSTM and LSTM_GP network structure.
Figure 7. CNN_LSTM and CNN_LSTM_GP network structure.
Figure 8. Three scenarios of data partitioning aligned with corn phenology: Vegetative Growth (from VE to VT), Tasseling/Silking (from VT to R1), and Grain Fill & Maturation (from R2 to R6).
Figure 9. Spatial distribution of crop yield in 2016.
Figure 10. County-level prediction error maps for the 9th to 16th time phases in 2016 based on (a) LSTM_GP, (b) CNN_GP and (c) CNN_LSTM_GP; (1) Phases 9–12, (2) Phases 13–16; the error is in kg/ha.
Figure 11. County-level prediction error maps for the 11th to 16th time phases in 2016 based on (a) LSTM_GP, (b) CNN_GP, and (c) CNN_LSTM_GP; (1) Phases 11–12, (2) Phases 13–14, and (3) Phases 15–16; the error is in kg/ha.
Figure 12. County-level prediction error maps for the 11th to 12th time phases in 2016 based on (a) LSTM_GP, (b) CNN_GP, and (c) CNN_LSTM_GP; (1) Phase 11, (2) Phase 12; the error is in kg/ha.
Figure 13. Scatter plots with density contours of the predicted yield versus the observed yield in the 12th time phase in 2016 based on (a) LSTM, (b) CNN, and (c) CNN_LSTM; (1) single-phase, (2) two-phase, and (3) four-phase; RMSE values are in kg/ha.
Figure 14. Scatter plots with density contours of the predicted yield versus the observed yield in the 16th time phase in 2016 based on (a) LSTM, (b) CNN, and (c) CNN_LSTM; (1) single-phase, (2) two-phase, and (3) four-phase; RMSE values are in kg/ha.
Table 1. Yield prediction results for 2016 based on four-phase data; RMSE values are in kg/ha.

| Time Phase | LSTM_GP R² | LSTM_GP RMSE | CNN_GP R² | CNN_GP RMSE | CNN_LSTM_GP R² | CNN_LSTM_GP RMSE |
|---|---|---|---|---|---|---|
| 9th–12th | 0.59 | 1006.43 | 0.5 | 1119.48 | 0.42 | 1204.62 |
| 13th–16th | 0.51 | 1102.41 | 0.38 | 1244.98 | 0.34 | 1278.16 |
| 17th–20th | 0.38 | 1241.44 | 0.35 | 1269.3 | 0.36 | 1258.81 |
| 21st–24th | 0.26 | 1361.73 | −0.35 | 1832.6 | −0.57 | 1980.18 |
Table 2. Yield prediction results for 2016 based on two-phase data; RMSE values are in kg/ha.

| Time Phase | LSTM_GP R² | LSTM_GP RMSE | CNN_GP R² | CNN_GP RMSE | CNN_LSTM_GP R² | CNN_LSTM_GP RMSE |
|---|---|---|---|---|---|---|
| 11th–12th | 0.61 | 983.38 | 0.48 | 1137.12 | 0.48 | 1141.06 |
| 13th–14th | 0.54 | 1067.70 | 0.47 | 1149.18 | 0.50 | 1116.37 |
| 15th–16th | 0.46 | 1164.63 | 0.41 | 1216.87 | 0.26 | 1355.63 |
| 17th–18th | 0.38 | 1244.21 | 0.13 | 1468.33 | 0.27 | 1350.31 |
| 19th–20th | 0.36 | 1266.68 | 0.29 | 1329.64 | 0.39 | 1234.26 |
| 21st–22nd | 0.47 | 1153.60 | 0.44 | 1180.24 | 0.44 | 1184.72 |
| 23rd–24th | 0.17 | 1435.08 | −0.50 | 1931.80 | −0.66 | 2033.84 |
Table 3. Yield prediction results for 2016 based on single-phase data; RMSE values are in kg/ha.
| Time Phase | LSTM_GP R² | LSTM_GP RMSE | CNN_GP R² | CNN_GP RMSE | CNN_LSTM_GP R² | CNN_LSTM_GP RMSE |
|---|---|---|---|---|---|---|
| 11th | 0.53 | 1081.40 | 0.56 | 1048.65 | 0.57 | 1033.21 |
| 12th | 0.62 | 969.06 | 0.61 | 992.00 | 0.61 | 981.38 |
| 13th | 0.35 | 1274.98 | 0.39 | 1228.35 | 0.30 | 1318.54 |
| 14th | 0.59 | 1013.34 | 0.58 | 1018.19 | 0.58 | 1023.90 |
| 15th | 0.54 | 1066.51 | 0.52 | 1097.38 | 0.44 | 1182.73 |
| 16th | 0.49 | 1125.98 | 0.43 | 1195.50 | 0.45 | 1167.41 |
| 17th | 0.45 | 1171.46 | 0.35 | 1275.10 | 0.34 | 1278.67 |
| 18th | 0.36 | 1260.62 | 0.28 | 1336.68 | 0.38 | 1242.03 |
| 19th | 0.40 | 1226.87 | 0.37 | 1251.58 | 0.13 | 1473.00 |
| 20th | 0.32 | 1297.66 | 0.19 | 1422.06 | 0.33 | 1290.93 |
| 21st | 0.49 | 1122.57 | 0.33 | 1292.02 | 0.37 | 1256.77 |
| 22nd | 0.50 | 1110.74 | 0.26 | 1353.27 | 0.37 | 1257.62 |
| 23rd | 0.36 | 1267.39 | 0.26 | 1355.54 | 0.37 | 1252.06 |
| 24th | 0.08 | 1517.90 | −0.37 | 1848.61 | −0.16 | 1703.26 |
Table 4. Yield prediction performance of time-series models with different start phases based on LSTM_GP; RMSE values are in kg/ha.
| Time Phase | Start Phase = 11: R² | Start Phase = 11: RMSE | Start Phase = 1: R² | Start Phase = 1: RMSE |
|---|---|---|---|---|
| 12th | 0.61 | 981.23 | 0.38 | 1241.26 |
| 14th | 0.63 | 964.87 | 0.56 | 1049.92 |
| 16th | 0.69 | 873.90 | 0.38 | 1241.26 |
| 20th | 0.65 | 935.78 | 0.38 | 1241.26 |
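Tables 1–4 report model skill as the coefficient of determination (R²) and the root-mean-square error (RMSE, in kg/ha). The sketch below shows both metrics in their standard form (a NumPy-based illustration; the function and variable names are ours, not from the paper's code). It also makes explicit why some late-season entries are negative: R² falls below zero whenever a model's predictions are worse than simply using the mean observed yield.

```python
import numpy as np

def r2_rmse(observed, predicted):
    """Return (R^2, RMSE); RMSE is in the units of the yields, e.g. kg/ha."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((observed - observed.mean()) ** 2))
    # R^2 < 0 when the model underperforms the constant mean-yield predictor
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse
```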
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Zhou, X.; Dang, Y.; Song, J.; Xiao, Z.; Yang, H. A Deep Learning-Driven Spatio-Temporal Framework for Timely Corn Yield Estimation Across Multiple Remote Sensing Scenarios. Remote Sens. 2026, 18, 743. https://doi.org/10.3390/rs18050743