Prediction of Crop Yield Using Phenological Information Extracted from Remote Sensing Vegetation Index

Ji, Zhonglin; Pan, Yaozhong; Zhu, Xiufang; Wang, Jinyun; Li, Qiannan

doi:10.3390/s21041406


by Zhonglin Ji ¹,², Yaozhong Pan ¹,³,*, Xiufang Zhu ¹,², Jinyun Wang ¹,² and Qiannan Li ¹,²

¹ State Key Laboratory of Remote Sensing Science, Jointly Sponsored by Beijing Normal University and Institute of Remote Sensing and Digital Earth of Chinese Academy of Sciences, Beijing 100875, China

² Institute of Remote Sensing Science and Engineering, Faculty of Geographical Sciences, Beijing Normal University, Beijing 100875, China

³ Academy of Plateau Science and Sustainability, Qinghai Normal University, Xining 810016, China

* Author to whom correspondence should be addressed.

Sensors 2021, 21(4), 1406; https://doi.org/10.3390/s21041406

Submission received: 23 December 2020 / Revised: 8 February 2021 / Accepted: 12 February 2021 / Published: 17 February 2021

(This article belongs to the Section Remote Sensors)


Abstract

Phenology is an indicator of crop growth conditions and is correlated with crop yields. In this study, a phenological approach based on a remote sensing vegetation index was explored to predict the yield in 314 counties within the US Corn Belt, divided into semi-arid and non-semi-arid regions. The Moderate Resolution Imaging Spectroradiometer (MODIS) data product MOD09Q1 was used to calculate the normalized difference vegetation index (NDVI) time series. According to the NDVI time series, we divided the corn growing season into four growth phases, calculated phenological information metrics (duration and rate) for each growth phase, and obtained the maximum correlation NDVI (Max-R²). Duration is the number of growth days in a phase, and rate is the rate of NDVI change over that phase. Max-R² is the NDVI value with the most significant correlation with corn yield in the NDVI time series. We built three groups of yield regression models for the whole, semi-arid, and non-semi-arid regions: univariate models using a single phenological metric or Max-R², multivariate models using phenological metrics, and multivariate models using phenological metrics combined with Max-R². We then compared the performance of these models. The results show that most phenological metrics had a statistically significant (p < 0.05) relationship with corn yield (maximum R² = 0.44). Models established with phenological metrics realized yield prediction before harvest in the three regions with R² = 0.64, 0.67, and 0.72. Compared with the univariate Max-R² models, the accuracy of models built with Max-R² and phenology metrics improved. Thus, the phenology metrics obtained from MODIS-NDVI accurately reflect the corn characteristics and can be used for large-scale yield prediction. Overall, this study showed that phenology metrics derived from remote sensing vegetation indexes could be used as crop yield prediction variables and provide a reference for data organization and yield prediction with physical crop significance.

Keywords:

yield prediction; corn; MODIS; NDVI time series; crop phenology; growth phase length; growth rate

1. Introduction

Timely and accurate prediction of crop yield before harvest at a large scale is critical for food security and administrative planning, especially in the current continually changing global environment and international situation [1]. Early-season crop yield predictions are also often required as essential information for decision making in the harvest, processing, storage, transportation, and marketing of agricultural commodities [2].

After decades of research, crop yield prediction methods can be summarized in two groups: empirical models and process-based models [3]. Empirical models determine the relationship between the prediction parameters and yield, whereas process-based models simulate the growth process of the crop. Process-based models require many calibration parameters, which are relatively difficult to obtain. Empirical models are more commonly used in large-scale yield prediction and mainly employ two types of parameters: environmental variables and remote sensing variables [4]. Environmental variables include the four most important factors: soil productivity, accessibility of water, climate, and pests or diseases [5]. The rapid development of remote sensing technology has produced more remote sensing variables to serve crop yield prediction. These can be further divided into two types: variables that monitor crop growth, such as vegetation indices (VIs) and photosynthetic activities [6,7,8,9], and variables that describe living conditions, such as heat stress [10,11] and water stress [12,13].

Recently, some studies have employed remote sensing derived phenological variables to predict crop yields [14,15]. These phenological variables are phenological period date information and belong to the variables that monitor crop growth. The date information assesses whether every phenological stage occurs during a period of favorable weather conditions [16] and how an accelerated or delayed phenological stage will affect crop growth conditions, especially when the current climate changes drastically [17]. For example, dates of anthesis, lengths of vegetative and reproductive growth periods, and the growing season can reflect climate change influences [18]. The plant breeding community also has a keen interest in developing crops that “stay-green” for longer, increasing the duration of grain-fill and decreasing senescence rate [19,20,21].

The phenological period date information provides practical support for the development of remote sensing in crop yield prediction. Many models predict crop yield based on remote sensing variables within a fixed timescale, such as the month [3,4,22,23,24]. These models simplify the representation of crop phenological dates, even though this information is critical for understanding yield variations: crop growth characteristics and sensitivities toward different environmental events vary with the growth phases (GPs) defined by phenological dates. This simplification leads to spatial-temporal heterogeneity between the yield prediction variables. Experiments have shown that phenological dynamic information can solve this heterogeneity issue and improve the yield prediction or estimation accuracy [25,26]. For example, the accumulative leaf area index (LAI) in a specific GP had the highest correlation with the regional crop yield [27], and the time series index, combined with phenological date information, can effectively improve the yield prediction accuracy [28,29].

In addition to the phenological date, one piece of essential phenological information is GP duration [30]. Bai et al. [31] noted that the phase duration could be combined with remote-sensing-based parameters to improve crop yield prediction. The GPs in their study were divided by the effective accumulated temperature. Besides the effective accumulated temperature, the Normalized Difference Vegetation Index (NDVI) is widely used for monitoring crop growth conditions and is an effective way to extract phenology [32,33,34,35]. Magney et al. [36] used ground-based sensors to collect NDVI readings to divide GPs and calculate crop phenological information (the NDVI rate and the duration of each GP) at the field level; their results indicated that NDVI rate and GP duration were good predictors of crop yield. The NDVI rate represents the crop growth rate. Combining the growth duration and rate therefore reflects the crop growth conditions under the joint effect of the external environment and crop characteristics. Field observation data are a first-hand source of accurate and reliable information for crop phenology research. However, field observation experiments usually require substantial manpower, financial resources, material resources, and time. Therefore, field observations are not suitable for obtaining long-term and large-scale crop phenology data.

Satellite remote sensing technology can effectively obtain long-term and large-scale phenological information. Although the technology has certain limitations, such as surface information accuracy (i.e., mixed pixel) and inherent complexity (i.e., cloud contamination and atmospheric variability), it lowers the cost of large-scale crop monitoring and possesses substantial potential for detecting crop regional phenology patterns through the VI time series [25,37,38]. For example, most remote-sensing-based studies have employed data from the National Aeronautics and Space Administration’s (NASA) Moderate Resolution Imaging Spectroradiometer (MODIS) [39,40,41]. The spatial resolutions (250-m, 500-m, and 1000-m) are suitable for monitoring at scales from the county level to the global level, and the temporal resolutions (8- and 16-day) allow for continuous, near-real-time monitoring throughout the growing season. Thus, satellite remote sensing data are suitable for deriving phenological metrics (duration and rate). It is also worth investigating the further application of phenological metrics in predicting yield at a large scale.

The overall goal of this study was to predict corn yield using phenological information metrics extracted from the MODIS-NDVI time series. The specific objectives were to: (i) analyze the relationship between phenological metrics derived from satellite remote sensing VI and the yield, (ii) evaluate the capacity of phenological metrics to predict large-scale corn yield, and (iii) test the ability of the combined phenological metrics and other parameters derived from remote sensing for the prediction of corn yields.

2. Materials and Methods

2.1. Study Region

The study focused on agricultural counties in six states within the central US Corn Belt, including Illinois, Indiana, Iowa, Nebraska, Wisconsin, and North Dakota. There are a total of 314 counties in which the corn area exceeds 10,000 ha [28], and the mean field size in the US is 19.3 ha [42]. To account for the impact of geographical conditions on crop phenological metrics, the central US Corn Belt was divided into semi-arid and non-semi-arid regions according to the geographic variation in climate, topography, and edaphic conditions (Figure 1).

2.2. Data

MODIS 250-m and 8-day composite reflectance product data (MOD09Q1, version 6) for 2008–2018 were acquired from the National Aeronautics and Space Administration (NASA) Reverb (http://reverb.echo.nasa.gov/ (accessed on December 29, 2019)). There were 46 reflectance composites each year. Three MODIS tiles (h10v04, h11v04, and h11v05) were used to cover all counties fully and were re-projected using the MODIS re-projection tool (MRT) to the UTM (Universal Transverse Mercator) system. The 250-m and 8-day reflectance product allows for the calculation of VIs with a higher temporal resolution than that of the standard VI product (MOD13Q1) at 250-m and 16-day.

The county-level corn yields from 2008 to 2018 were obtained from the United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS) (https://quickstats.nass.usda.gov/ (accessed on 21 April 2019)). The yield estimation unit was converted from bushels acre⁻¹ to kg ha⁻¹. As some counties lack individual annual yield data, the total number of yield samples was n = 3,320 for the whole region, 460 for the semi-arid region, and 2,860 for the non-semi-arid region. The corn-planting map data were extracted from the 30-m resolution Cropland Data Layer (CDL, http://nassgeodata.gmu.edu/CropScape/ (accessed on 21 April 2019)) from 2008 to 2018, re-projected to match the geographic projection of the MODIS data, and finally used to distinguish pixels dominated by corn from those dominated by other land cover types.
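The unit conversion mentioned above can be illustrated with a minimal sketch. The constants are standard values that are not stated in the article (a US bushel of corn is conventionally 56 lb), and the function name is hypothetical; the exact factor the authors applied is not given.

```python
# Standard conversion constants (assumed here; the article does not list them).
LB_PER_BUSHEL_CORN = 56.0        # conventional weight of one US bushel of corn
KG_PER_LB = 0.45359237           # exact definition of the pound in kilograms
HA_PER_ACRE = 0.40468564224      # exact definition of the acre in hectares


def bu_per_acre_to_kg_per_ha(yield_bu_ac: float) -> float:
    """Convert a corn yield from bushels per acre to kilograms per hectare."""
    kg_per_acre = yield_bu_ac * LB_PER_BUSHEL_CORN * KG_PER_LB
    return kg_per_acre / HA_PER_ACRE


if __name__ == "__main__":
    # Hypothetical county yield of 150 bu/acre.
    print(round(bu_per_acre_to_kg_per_ha(150.0), 1))
```

Under these assumptions, one bushel per acre corresponds to roughly 62.8 kg ha⁻¹.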

2.3. Yield Modeling Approach

Our general approach includes four main steps (Figure 2):

(1): Acquire pixel-based NDVI time series

We used band1 (red, 620–670 nm) and band2 (near-infrared, 841–876 nm) from MOD09Q1 to calculate the NDVI. The MODIS data were processed by an 8-day maximum value composite (MVC), which is less sensitive to clouds and other outliers. However, there are still many random factors that render the NDVI time series data irregular [43]. Thus, the NDVI time series data must be further smoothed to reduce the effects of noise and missing values before extracting the phenological crop characteristics. Popular smoothing methods include the Savitzky–Golay (SG), Double-logistic, and Whittaker Smoother. Previous studies suggested that the SG algorithm can better characterize the temporal signals of corn [44,45]; therefore, we used the SG filter to generate a smooth time series of NDVI on a pixel-by-pixel basis.
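As a rough illustration of this step, the sketch below computes NDVI from the two reflectance bands and applies a Savitzky–Golay smoother. The 5-point quadratic window and the function names are assumptions made for illustration; the article does not report the window length or polynomial order that was used.

```python
from typing import List


def ndvi(red: float, nir: float) -> float:
    """NDVI from red (band 1) and near-infrared (band 2) reflectance."""
    total = nir + red
    return (nir - red) / total if total != 0 else float("nan")


# Classic 5-point quadratic Savitzky-Golay smoothing weights (assumed window).
SG5_WEIGHTS = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def sg_smooth(series: List[float]) -> List[float]:
    """Smooth a series with the 5-point filter; two points at each edge are kept as-is."""
    half = len(SG5_WEIGHTS) // 2
    out = list(series)
    for i in range(half, len(series) - half):
        window = series[i - half: i + half + 1]
        out[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window))
    return out


if __name__ == "__main__":
    # Hypothetical 8-day NDVI values with one cloud-like dip at index 3.
    raw = [0.25, 0.30, 0.42, 0.20, 0.66, 0.78, 0.81, 0.74]
    print([round(v, 3) for v in sg_smooth(raw)])
```

A quadratic filter of this kind reproduces locally quadratic trends exactly while damping isolated spikes, which is the property that makes it attractive for single-peak crop curves.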

Many methods have been proposed to process MODIS data to improve the yield prediction or estimation accuracy [32,41,46,47,48]. The crop spatial distribution map is a vital element in estimating total crop production, and ideally it is used as a crop-specific mask [32,47]. Mkhabela et al. [41] applied a crop land cover mask to satellite data to remove the effect of non-agricultural land on the NDVI signals, which improved the accuracy of crop yield prediction. We selected the pixels that were dominated by corn (i.e., pixels in which the corn planting area accounts for more than 70% of the MODIS-NDVI pixel area) as the corn pixels. The percentage of corn planted area in the pixels in each year was calculated using the corn planting map of the corresponding year.

In addition, crop planting dates and phenology vary with the location and external environment in every year. Thus, using a fixed calendar date in time series data to build remote-sensing-based yield prediction models is not optimal. A previous study showed that using the green-up date to adjust the start of the VI time series pixel by pixel can improve remotely sensed yield prediction for both intra- and inter-annual variability in corn and soybeans [28]. Therefore, in this study, we defined the “phenologically adjusted” NDVI time series pixel by pixel and year by year (Figure 3). We first derived the daily NDVI for each corn pixel based on the 8-day NDVI data using cubic spline interpolation. Then, we defined the date when the NDVI curve began to increase at the bottom of the valley before the single NDVI corn peak as the start date (SD) of the corn growing season. The date when the NDVI reached the bottom of the valley after the single NDVI corn peak was defined as the end date (ED) of the corn growing season. Because SD and ED lie at valley bottoms, the first derivatives of NDVI at SD and ED were approximately zero. NDVI values before SD and after ED were excluded from the analysis, and the time series was adjusted based on SD. Thus, we created “phenologically adjusted” time series values for NDVI per corn pixel.
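One simple way to locate the two valley bottoms around the peak is sketched below. It assumes an already smoothed daily series with a single peak and monotonic flanks; the function names are hypothetical and this is not necessarily the procedure the authors implemented.

```python
from typing import List, Tuple


def growing_season_bounds(daily_ndvi: List[float]) -> Tuple[int, int]:
    """Return (sd, ed): indices of the valley bottoms before and after the NDVI peak."""
    peak = max(range(len(daily_ndvi)), key=lambda i: daily_ndvi[i])
    sd = peak
    # Walk left while the curve keeps descending toward the pre-season valley.
    while sd > 0 and daily_ndvi[sd - 1] < daily_ndvi[sd]:
        sd -= 1
    ed = peak
    # Walk right while the curve keeps descending toward the post-season valley.
    while ed < len(daily_ndvi) - 1 and daily_ndvi[ed + 1] < daily_ndvi[ed]:
        ed += 1
    return sd, ed


def phenologically_adjust(daily_ndvi: List[float]) -> List[float]:
    """Drop values before SD and after ED so that index 0 corresponds to SD."""
    sd, ed = growing_season_bounds(daily_ndvi)
    return daily_ndvi[sd: ed + 1]


if __name__ == "__main__":
    # Hypothetical daily NDVI values for one pixel.
    series = [0.30, 0.25, 0.20, 0.30, 0.50, 0.80, 0.60, 0.40, 0.22, 0.26]
    print(growing_season_bounds(series), phenologically_adjust(series))
```

Noisy series would need smoothing first, since any small local dip on a flank would stop the walk early.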

(2): Compute county-level NDVI time series

As the corn yield was recorded at the county level, we aggregated the daily NDVI of corn pixels in each county to obtain the daily county-level NDVI. To do this, each selected corn pixel was weighted by its contribution, which was the proportion of corn planting area in the pixel. The NDVI value for each county was then calculated as the weighted average of the NDVI values of these pixels.
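The pixel selection and county aggregation can be sketched for a single day as follows. The function name and the example values are hypothetical; the 70% threshold is the one stated in step 1.

```python
from typing import List, Optional


def county_ndvi(pixel_ndvi: List[float], corn_fraction: List[float],
                min_fraction: float = 0.70) -> Optional[float]:
    """Corn-fraction-weighted mean NDVI over pixels whose corn fraction exceeds min_fraction."""
    selected = [(v, w) for v, w in zip(pixel_ndvi, corn_fraction) if w > min_fraction]
    if not selected:
        return None  # no corn-dominated pixel in this county on this day
    total_weight = sum(w for _, w in selected)
    return sum(v * w for v, w in selected) / total_weight


if __name__ == "__main__":
    # Three hypothetical pixels; the third is excluded because only 30% of it is corn.
    print(county_ndvi([0.8, 0.6, 0.2], [0.90, 0.75, 0.30]))
```

Applying the function to each day of the adjusted series yields the daily county-level NDVI curve used in the later steps.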

(3): Calculation of the prediction variables

Three types of predictors (two phenological metrics, (i) and (ii), and one NDVI parameter, (iii)) were calculated using the county-level NDVI time series (step 2) and used as input variables to predict the corn yield:

(i) Duration (Equation (1)): Growth duration refers to the number of days in a given crop GP and is calculated as the day of year (DOY) of the end of the GP minus the DOY of its start.

$$\mathit{Duration} = GP_{end} - GP_{start}, \tag{1}$$

where $\mathit{Duration}$ is the phenological metric of duration; $GP_{end}$ is the DOY of the end of the GP; and $GP_{start}$ is the DOY of the start of the GP.

(ii) Rate (Equation (2)): The rate (slope) of NDVI in a given GP refers to the change rate in the NDVI values throughout the GP.

$$\mathit{Rate} = \frac{NDVI_{GP(end)} - NDVI_{GP(start)}}{\mathit{Duration}}, \tag{2}$$

where $\mathit{Rate}$ is the phenological metric of rate; $NDVI_{GP(end)}$ is the NDVI value at $GP_{end}$; $NDVI_{GP(start)}$ is the NDVI value at $GP_{start}$; and $\mathit{Duration}$ is the phenological metric of duration.

(iii) Maximum correlation NDVI (Max-R²): The Max-R² [29] is the original NDVI value that has the most significant correlation with corn yield in the NDVI time series. The NDVI time series used to extract Max-R² started with the SD.
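Equations (1) and (2) translate directly into code. The sketch below uses hypothetical function names and example values; only the two formulas themselves come from the text.

```python
def duration(gp_start_doy: int, gp_end_doy: int) -> int:
    """Equation (1): number of days in a growth phase."""
    return gp_end_doy - gp_start_doy


def rate(ndvi_start: float, ndvi_end: float,
         gp_start_doy: int, gp_end_doy: int) -> float:
    """Equation (2): NDVI change per day over a growth phase."""
    days = duration(gp_start_doy, gp_end_doy)
    if days == 0:
        raise ValueError("growth phase has zero duration")
    return (ndvi_end - ndvi_start) / days


if __name__ == "__main__":
    # Hypothetical phase from DOY 150 to DOY 180 with NDVI rising from 0.30 to 0.75.
    print(duration(150, 180), rate(0.30, 0.75, 150, 180))
```

A phase in which NDVI falls, as in senescence, gives a negative rate under this definition.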

To extract the above two phenological metrics, four corn GPs were examined (Figure 4): the first phase (GP1) was from V1 to V6; the second phase (GP2) was from V6 to VT; the third phase (GP3) was from VT to R4; and the fourth phase (GP4) was from R4 to R6. Here, V1, V6, VT, R4, and R6 refer to the start dates of the emergence, jointing, tasseling, dough, and maturity stages, respectively. The dates of V1, VT, and R4 were extracted using the dynamic threshold method [49]. During the rising phase of the daily corn NDVI time series curve, V1 and VT were defined as the dates on which the NDVI had risen above the minimum (value at SD) by a given fraction of the distance between the minimum and the maximum: 10% for V1 and 90% for VT. The date of R4 was defined on the descending phase of the curve as the date on which the NDVI had decreased below the maximum by a given fraction, currently set to 10%, of the distance between the maximum and the minimum (value at ED). The date of V6 was defined as when the curvature reaches its local maximum value in the rising curve, where the stalk grows rapidly. The date of R6 occurred in the middle of the senescence phase [39] and was defined as when the curvature reaches its local maximum value in the descending curve. To obtain V6 and R6, the NDVI time series at the county level was fit by a piecewise logistic function [38], resulting in two functions for the rising and descending curves. Then, taking the derivatives of the logistic functions, the dates at which the two derivative functions reached their maximum values were denoted as V6 and R6, respectively.
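The dynamic threshold rule for V1, VT, and R4 can be sketched as below, assuming a phenologically adjusted daily series whose first value is the value at SD and whose last value is the value at ED. The function and parameter names are hypothetical, and the logistic-fit step for V6 and R6 is not covered.

```python
from typing import List, Tuple


def threshold_dates(daily_ndvi: List[float], frac_low: float = 0.10,
                    frac_high: float = 0.90) -> Tuple[int, int, int]:
    """Return (v1, vt, r4) as day indices counted from SD."""
    peak = max(range(len(daily_ndvi)), key=lambda i: daily_ndvi[i])
    top = daily_ndvi[peak]
    rise_amp = top - daily_ndvi[0]    # maximum minus value at SD
    fall_amp = top - daily_ndvi[-1]   # maximum minus value at ED
    v1_level = daily_ndvi[0] + frac_low * rise_amp
    vt_level = daily_ndvi[0] + frac_high * rise_amp
    r4_level = top - frac_low * fall_amp
    # First day on the rising limb that reaches each level.
    v1 = next(i for i in range(peak + 1) if daily_ndvi[i] >= v1_level)
    vt = next(i for i in range(peak + 1) if daily_ndvi[i] >= vt_level)
    # First day on the descending limb that has dropped to the R4 level.
    r4 = next(i for i in range(peak, len(daily_ndvi)) if daily_ndvi[i] <= r4_level)
    return v1, vt, r4


if __name__ == "__main__":
    # Hypothetical adjusted series (index 0 is SD, last index is ED).
    curve = [0.20, 0.30, 0.50, 0.70, 0.80, 0.78, 0.70, 0.50, 0.30]
    print(threshold_dates(curve))
```

Because the thresholds are relative to each curve's own amplitude, the same fractions can be applied across counties and years with different NDVI ranges.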

The first growth phase, second growth phase, third growth phase, and fourth growth phase were abbreviated as GP1, GP2, GP3, and GP4, respectively. To facilitate the description of these phenological metrics predictor variables, we defined GP1 duration, GP1 rate, GP2 duration, GP2 rate, GP3 duration, GP3 rate, GP4 duration, and GP4 rate as GP1D, GP1R, GP2D, GP2R, GP3D, GP3R, GP4D, and GP4R, respectively.

(4): Yield regression model

We built three groups of yield regression models for three regions: whole (semi-arid and non-semi-arid), semi-arid, and non-semi-arid. In the first group, we constructed univariate yield regression models for each predictor variable calculated in step 3 using different functions (linear, quadratic, logarithmic, etc.), by which we evaluated the relationship between each predictor variable and corn yield. In the second group, we constructed multivariate yield regression models using phenological metrics and assessed the performance of phenological metrics with respect to the yield prediction. In the third group, we constructed multivariate yield regression models using phenological metrics combined with Max-R² to evaluate the capability of combining phenological metrics with other types of NDVI remote sensing parameters for yield prediction. Both the second and third group models were built using a stepwise regression method, which can select significant variables into the regression equation and reduce collinearity. The standardized regression coefficients in the regression equation were used to compare the importance of different predictor variables on the dependent variable (corn yield).
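For the linear case in the first group, the R² of a univariate model equals the squared Pearson correlation, and the same quantity can be scanned over the days of the adjusted series to locate the date that defines Max-R². The sketch below covers only this linear case; the data layout and function names are assumptions, and the quadratic, logarithmic, and stepwise models are not reproduced.

```python
from typing import List, Tuple


def r_squared(x: List[float], y: List[float]) -> float:
    """R² of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def max_r2_day(ndvi_by_sample: List[List[float]], yields: List[float]) -> Tuple[int, float]:
    """ndvi_by_sample[s][d] is the NDVI of sample s on day d after SD.

    Returns the day whose NDVI correlates best with yield, and that R².
    """
    n_days = min(len(s) for s in ndvi_by_sample)
    scores = [r_squared([s[d] for s in ndvi_by_sample], yields) for d in range(n_days)]
    best = max(range(n_days), key=lambda d: scores[d])
    return best, scores[best]


if __name__ == "__main__":
    # Three hypothetical county-year samples with three days each.
    samples = [[0.2, 0.5, 0.1], [0.3, 0.6, 0.4], [0.1, 0.7, 0.2]]
    print(max_r2_day(samples, [5.0, 6.0, 7.0]))
```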

2.4. Model Evaluation

To evaluate the performances of the prediction models in the second group, we used leave-one-year-out cross-validation [3], in which the model was iteratively trained on 10 years of data and then used to predict yield in the held-out year [28] from 2008 to 2018. The metrics used were the coefficient of determination (R²) and the root-mean-square error (RMSE). The R² was the predictive model R², and the RMSE was calculated between the actual and predicted yields. For the models in the first and third groups, we only used R² to evaluate the performances. We also used the variance inflation factor (VIF) (Equation (3)) to measure collinearity for the variables in the regression prediction model. In general, a VIF value of less than four indicates non-collinearity [50].
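The leave-one-year-out procedure can be illustrated with a minimal sketch. For brevity it fits a single-predictor linear model in each fold, standing in for the stepwise multivariate models of the article; the record layout and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Record = Tuple[int, float, float]  # (year, predictor value, observed yield)


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def leave_one_year_out(records: List[Record]) -> Dict[int, float]:
    """Train on all years but one, predict the held-out year, return RMSE per year."""
    rmse_by_year = {}
    for held_out in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        slope, intercept = fit_line([r[1] for r in train], [r[2] for r in train])
        errors = [slope * r[1] + intercept - r[2] for r in test]
        rmse_by_year[held_out] = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse_by_year


if __name__ == "__main__":
    # Hypothetical records: two counties per year over three years.
    data = [(2008, 1.0, 3.1), (2008, 2.0, 4.9), (2009, 3.0, 7.2),
            (2009, 4.0, 8.8), (2010, 5.0, 11.1), (2010, 6.0, 12.9)]
    print(leave_one_year_out(data))
```

Holding out whole years, rather than random samples, keeps all counties of an unusual year such as a drought year out of training when that year is predicted.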

$$\mathit{VIF}_{i} = \frac{1}{1 - R_{i}^{2}} \tag{3}$$

where $R_{i}$ is the multiple correlation coefficient between the i-th variable, $X_{i}$, and all other variables, $X_{j}$ $(j = 1, 2, \dots, k;\ j \neq i)$, and the multiple correlation coefficient is the arithmetic square root of the coefficient of determination $R^{2}$.
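As a short worked example of Equation (3), the threshold of four can be restated in terms of $R_{i}^{2}$, assuming $0 \le R_{i}^{2} < 1$; the value 0.5 below is an arbitrary illustration rather than a result from the article.

```latex
\begin{aligned}
\mathit{VIF}_{i} < 4
  &\iff \frac{1}{1 - R_{i}^{2}} < 4
   \iff 1 - R_{i}^{2} > \tfrac{1}{4}
   \iff R_{i}^{2} < 0.75, \\
R_{i}^{2} = 0.5
  &\implies \mathit{VIF}_{i} = \frac{1}{1 - 0.5} = 2 .
\end{aligned}
```

In other words, a predictor passes the criterion as long as the other predictors explain less than 75% of its variance.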

3. Results

3.1. MODIS-Derived Phenological Dates

MODIS-derived corn emergence and maturity dates were compared with the dates of 50% corn emerged and mature from Crop Progress Reports (CPR) (2008–2018) at the state level (Figure 5). The county sample numbers in Indiana and North Dakota were small, and these counties only covered a small part of the state’s spatial range. Therefore, we only compared the results of Illinois, Iowa, Nebraska, and Wisconsin. The R² was 0.50 and 0.65 for corn emergence and maturity, respectively. The corresponding RMSE values were 4.90 and 0.65 days, respectively. The results fell neatly around the 1:1 line.

3.2. Relationship between Predictor Variables and Yield

For each predictor variable, we established a set of univariate regression models with different functions, such as linear, quadratic, and logarithmic, and obtained the R² of each model. The largest R² values in the multiple univariate models are listed in Table 1. Most phenological metrics had a statistically significant relationship with the yield (at the p < 0.05 level) in the whole, semi-arid, and non-semi-arid regions, except for the GP2 duration and the GP3 rate. The GP1 rate, GP2 rate, GP3 duration, and GP4 rate could be used as yield prediction parameters with relatively large R² values (>0.20).

Increasing the growth rate in GP1 and GP2, extending the growth duration in GP3, and increasing the senescence rate in GP4 are beneficial for increasing the yield (Figure 6). GP1 is the early stage of the growing season, where faster growth is better for the corn. GP2 includes the jointing stage, which is important for crop growth, and its explanatory power is stronger than that of GP1. GP3 is at the growth peak; extending the time that the crop remains green helps the crop accumulate more nutrients. Increasing the senescence rate in GP4 ensures that more nutrients are transferred to the grain within a certain time. In addition, the R² values of all phenology metrics were similar in the whole region, semi-arid region, and non-semi-arid region, which indicated that the strength of the relationship between these metrics and corn yield was similar in different regions.

3.3. Yield Prediction with Phenological Metrics

The prediction models built with phenological metrics obtained R² = 0.64, 0.72, and 0.67 in the whole region, semi-arid region, and non-semi-arid region, respectively (Table 2). More than 60% of the yield was explained by the combination of growth duration and rate. The best yield prediction, with the maximum R² value (0.72), was in the semi-arid region. The prediction models did not have the problem of multi-collinearity: the VIF values for all metrics were < 4.

The four most essential metrics were selected by stepwise regression from eight metrics to build the models. For the whole region and non-semi-arid region, GP2R presented the highest values for the standardized coefficient (0.70–0.74), followed by GP2D (0.52–0.60), GP3D, and GP4R. GP2 belongs to the vegetative stage, when the stems and leaves grow vigorously and continuously accumulate nitrogen. With a longer time or a faster rate of nitrogen storage in the vegetative stage, crops provide more nutrients to the ear during the reproductive stage [51]. GP3 contains the NDVI time series peak, and there is a positive correlation between leaf area duration (LAD) and corn yield during GP3 [19]. GP4R had the smallest impact on yield among the four most essential metrics. GP1R, which had a relatively large explanatory ability for the yield in Table 1, was not selected in the models, indicating there was collinearity among the phenological metrics. For the semi-arid region, besides GP2D, GP2R, and GP3D, GP4D was also a critical impact factor, which indicated that in the semi-arid region, the longer the fourth stage, the higher the crop yield.

Figure 7 shows the leave-one-year-out cross-validation results of the phenological yield prediction models constructed with the four most significant metrics (Table 2) in the three regions from 2008 to 2018. The medians of the R² values were 0.6–0.8, and the medians of the RMSE values were 900–1200 kg ha⁻¹. Results from the semi-arid region presented the highest R² and lowest RMSE, followed by the non-semi-arid region, and the results in the whole region presented the worst R² and RMSE. In addition, the results for 2012 were different from those of other years in the whole and non-semi-arid regions with the lowest R² values (Figure 7A).

3.4. Yield Prediction with Phenological Metrics and NDVI

The combination of three phenological metric variables (GP1D, GP3D, and GP4D, or GP1R, GP3D, and GP4R) with Max-R² improved on the performance of the univariate Max-R² yield prediction models (Table 1), with higher R² values of 0.65, 0.73, and 0.68 (Table 3) in the whole region, semi-arid region, and non-semi-arid region, respectively. The prediction models in Table 3 did not have the problem of multi-collinearity: the VIF values for all metrics were < 4.

After adding the maximum correlation NDVI (Max-R²), the phenological metric variables used to construct the multivariate regression model changed compared with the phenological metric models (Table 2). For the semi-arid region, the variables changed from the combination of GP2D, GP2R, GP3D, and GP4D to that of GP1R, GP3D, and GP4R. By combining the growth state, time, and rate, the yield prediction model has more biophysical significance and gives better results in the semi-arid region. For the non-semi-arid region, the variables changed from the combination of GP2D, GP2R, GP3D, and GP4R to that of GP1D, GP3D, and GP4D. The Max-R² replaced all rate variables, indicating that crop growth state and time are more important than the growth rate in a relatively humid environment. The whole region contains more counties located in the non-semi-arid region, and its model variables are the same as those in the non-semi-arid region.

4. Discussion

4.1. Contributions of This Study

First, this study demonstrated the feasibility of using phenological metrics derived from satellite remote sensing data for crop yield prediction. The first group of models showed that some phenological metrics (durations and rates) have explanatory power for corn yield in the three regions (R² ranged from 0.18 to 0.44 in Table 1), but this power was limited (maximum R² = 0.44). By comparison, the multivariate regression models built with some phenological metrics in the second group improved the yield prediction accuracy. The stability and validity of the multivariate phenological metrics models were verified through leave-one-year-out cross-validation, and these models can explain 60–80% of the yield variation. This indicated that phenological metrics from emergence to maturity were meaningful for crops and could be used as input variables to predict yield. Multivariate regression models in the third group, built with some phenological metrics and NDVI, obtained better yield prediction results than the NDVI univariate regression models in the first group. This indicates that phenological metrics derived from the NDVI time series could be combined with other parameters to improve yield prediction at a large scale.

In addition, our results are a useful supplement to existing phenological variables. Previous studies [14] had shown that phenological variables (phenological dates) were closely correlated with crop yields. Phenological date variables can directly and indirectly influence photosynthesis and respiration, which changes the accumulation of effective dry matter. The accumulation of effective dry matter is also affected by the time and rate of photosynthesis and respiration. We used statistical methods to investigate the impacts of phenological metrics (duration and rate) on corn yields. Some phenological metrics can achieve yield prediction, with an explanatory power of 60–80% in this paper. We recommend combining phenological date and metric variables in future applications related to agricultural yield prediction. Moreover, the relationships between yield and the phenological metrics derived from MODIS data at a large scale were consistent with the actual growth characteristics of crops in the field. These relationships help the management department make agricultural production decisions in a unified manner. For example, fertilizing before and after the jointing stage increases the growth rate of corn.

Finally, this study provides a method reference for establishing the yield prediction model of other crops. The R² between the yield and NDVI rate of GP1, GP2, and GP4 ranged from 0.27 to 0.44, indicating that it may be more beneficial to organize time series data parameters based on the GP [25] and provide support for dynamic yield predictions with the growth stages as a time unit [26]. The combination of the duration and rate in each GP can simply simulate the crop growth process. Each crop has its growth characteristics at different growth stages, and the characteristics of each growth stage can be described by the growth duration time and rate. Thus, the models constructed with phenological metrics are based on the inherent growth and development of crops, and the yield prediction method is applicable to other crops (e.g., soybean and wheat).

4.2. Factors Affecting Model Accuracy

Our proposed yield prediction method may be affected by the following factors. First, the spatial resolution of the NDVI time series has an impact on our method. The explanatory power of the models in the second group was relatively weak compared with the phenological yield prediction models based on the NDVI time series derived from ground-based sensors [36]. This is understandable because there is a gap between the county-level NDVI obtained with 250-m resolution pixels and the NDVI obtained from ground-based sensors. The commonly used MODIS-based 250-m products are suitable for many regions, such as the Great Plains of the US, which has large field sizes (mean field size of 19.3 ha [42]), and countries in Europe [4,47,52], which have small field sizes (two-thirds of European fields are less than 5 ha [53]). Many methods (pixel-based crop planting ratio, phenological information, among others) have been proposed to improve the accuracy of MODIS in agricultural applications [28,41,46,52,54], such as the crop map masks and phenological information adjustment used in this study. The NDVI is the most commonly used vegetation index and is calculated from the two bands of the MODIS 250-m reflectance products. However, crop-specific NDVI selected by the crop mask [2,46] still contains signals from other land surface types; it therefore remains a mixture of signals, which partially affects the accuracy of the phenology extraction and yield prediction.

Second, the NDVI time series needs to be collected from years with different climate conditions (such as wet years and dry years). The yield prediction method using phenological metrics works best in the semi-arid region (Table 2, Figure 7). The United States suffered a drought in 2012, resulting in severe crop yield losses [55]. For the whole and non-semi-arid regions, the explanatory power (R²) of models constructed with data including 2012 (average R² = 0.64 and 0.67, respectively) was higher than that of models built without the 2012 data (R² = 0.59 and 0.59). This indicates that phenological metrics can respond to disasters, and that datasets containing disaster information can describe more environmental characteristics. Thus, models constructed using datasets that include 2012 can provide more yield information. Models constructed with data that did not include disaster information had higher RMSEs when predicting the yield in 2012 (Figure 7B, whole region and non-semi-arid region). However, the semi-arid region did not show this pattern. Irrigated corn was mainly planted in semi-arid regions [56], and farmers focused more on agricultural water management to alleviate the impacts of the drought.
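The leave-one-year-out evaluation behind this comparison can be sketched as follows. This is only an illustration: ordinary least squares stands in for the stepwise regression used in the study, the data are synthetic, and the function name `leave_one_year_out` is an assumption.

```python
import numpy as np

def leave_one_year_out(X, y, years):
    """Fit OLS on all years but one, then score the held-out year.

    Returns {year: (r2, rmse)} computed on the held-out samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    years = np.asarray(years)
    A = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    results = {}
    for yr in np.unique(years):
        test = years == yr
        coef, *_ = np.linalg.lstsq(A[~test], y[~test], rcond=None)
        pred = A[test] @ coef
        sse = np.sum((y[test] - pred) ** 2)
        sst = np.sum((y[test] - y[test].mean()) ** 2)
        results[int(yr)] = (float(1 - sse / sst), float(np.sqrt(sse / test.sum())))
    return results

# Synthetic example: 20 "counties" per year, two predictors (not study data).
rng = np.random.default_rng(0)
years = np.repeat(np.arange(2008, 2019), 20)
X = rng.normal(size=(years.size, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=years.size)

results = leave_one_year_out(X, y, years)
for yr, (r2, rmse) in results.items():
    print(yr, round(r2, 3), round(rmse, 3))
```

With real data, a year whose conditions are not represented in the training years (such as a drought year) would be expected to show a larger held-out RMSE, which is the effect described for 2012.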

Third, the determination of the phenology stage dates has potential impacts on our results. We divided the whole corn growing season into four relatively large GPs based on the MODIS-NDVI time series and corn growth characteristics. The phenological dates determine each GP, and the smoothing and phenology extraction methods applied to the VI time series jointly determine the extracted phenological dates. In addition, NDVI is sensitive to high-density vegetation and saturates [57]. NDVI saturation occurs at the peak of the time series curve, which also influences the phenology stage dates and thus affects GP2, GP3, and GP4. There are generally two types of peaks, steep [54] and steady [58], and this is consistent with the peaks obtained in this study. The steepness of the peak is tied to the two phenological dates of GP3 and to one phenological date each of GP2 and GP4, as determined by the phenology extraction method. However, the effects of the smoothing and phenology extraction methods and of NDVI saturation are relatively minor. For the corn emergence and maturity dates, the R² values were 0.50 and 0.65, respectively, and the corresponding RMSE values were 4.90 and 0.65 days. The phenological metrics models explain more than 60% of the yield variance. Based on the accuracy of the phenology dates and of the yield models, we demonstrated the feasibility of our method and showed that the phenological information metrics obtained from remote sensing data can be used to predict large-scale yield.

4.3. Direction of Future Improvement

The ability to predict yield using phenological metrics can be further improved. Future research should attempt to select pixels with a higher crop planting proportion to weaken the effect of sub-pixel mixtures. Other indices, such as the enhanced vegetation index (EVI) and the wide dynamic range vegetation index (WDRVI) [25,39], could also be used to obtain phenological information and to reduce the impact of NDVI saturation under high-density vegetation cover. Phenological information might also be extracted from other remote sensing-based indicators, such as solar-induced fluorescence (SIF), thermal decay rate, and vegetation optical depth (VOD), given that they have all become available at higher spatial resolutions. SIF can capture the photosynthesis process of vegetation, the thermal decay rate monitors vegetation through diurnal temperature variations [59], and VOD is highly sensitive to the water content and above-ground biomass of vegetation.
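To make the saturation argument concrete, the sketch below compares how NDVI and WDRVI respond to the same increase in near-infrared reflectance over a dense canopy. The WDRVI form (α·NIR − Red)/(α·NIR + Red) is the commonly cited definition; the weighting coefficient α = 0.2 and the reflectance values are assumptions chosen for illustration, not settings from this study.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def wdrvi(red, nir, alpha=0.2):
    """Wide dynamic range vegetation index; alpha down-weights the NIR band.

    alpha = 0.2 is an assumed value (values of about 0.1 to 0.2 are typical).
    """
    return (alpha * nir - red) / (alpha * nir + red)

# Dense canopy: red reflectance stays low while NIR keeps increasing.
red = 0.03
nir = np.array([0.40, 0.50])

ndvi_change = float(np.diff(ndvi(red, nir))[0])
wdrvi_change = float(np.diff(wdrvi(red, nir))[0])
print(f"NDVI change:  {ndvi_change:.4f}")
print(f"WDRVI change: {wdrvi_change:.4f}")
```

For these inputs the WDRVI changes roughly three times as much as the NDVI, which is why such indices are candidates for describing the peak of the growing season.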

We used stepwise regression models to show that the selected phenological metrics can be used to predict crop yield. Generally speaking, using more parameters describes the crop growth environment and growth status more comprehensively, which is conducive to better yield prediction. In the future, phenological metrics can be combined with other parameters, such as climate variables and vegetation indices, as inputs to machine learning or deep learning models, which can handle collinearity effectively and explore the data features extensively to predict yield. These climate and vegetation index parameters can also take the phenological stage as the time unit, so that phenology, climate, and vegetation index data can be combined to explore dynamic yield prediction across successive phenological stages.
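Collinearity among predictors is usually screened with the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The following sketch computes it with NumPy on synthetic data; the function name `vif` and the data are assumptions for this example and do not reproduce the study's predictors.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (samples in rows)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid.var() / target.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic predictors: a and b are independent, c is almost a copy of a.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)

vif_independent = vif(np.column_stack([a, b]))
vif_collinear = vif(np.column_stack([a, b, c]))
print(vif_independent)
print(vif_collinear)
```

Independent predictors give values close to 1, while the near-duplicate pair produces very large values; small VIFs such as those reported in Tables 2 and 3 therefore indicate limited collinearity among the selected metrics.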

5. Conclusions

We used the MODIS MOD09Q1 product to calculate the corn NDVI time series and then divided the growing season into four GPs. The phenological information metrics (duration and rate) obtained in each GP were used to analyze their relationships with corn yield and were combined with the maximum correlation NDVI to build yield prediction models.

We obtained two main conclusions from the results of this study. First, most phenological metrics (duration and rate in different phases) extracted from the MODIS-NDVI time series are significantly correlated (p < 0.05) with corn yield. Some phenological metrics can be combined to predict corn yield with relatively good results at a large scale. As a result of the interaction between crops and the environment, phenological information is a comprehensive and indirect crop yield indicator. It can be applied to yield predictions or estimations for other crop types (e.g., soybeans, wheat, cotton, and rice) and other regions. Second, phenological metrics can also be combined with other types of parameters, such as the maximum correlation NDVI, to improve the yield prediction or estimation accuracy.

The yield reflects the combined effect of crop growth conditions throughout the season. Dividing the season into multiple phases and using the duration and rate extracted from NDVI to describe crop growth in each phase can simulate the crop growth process. Crops require different environmental conditions and have different growth characteristics in different GPs. The NDVI rate based on GP shows a strong relationship with the yield. Therefore, for other yield prediction or estimation parameters, using the GP as the time scale to avoid geospatial data heterogeneity may be more reasonable, as crop phenology varies by location and changes from one year to the next. A limitation of this study is that it did not clarify the effect of the time series smoothing methods, phenology extraction methods, and GP division method on the establishment of yield models. Moreover, the performance of different vegetation indices in this yield prediction method was not compared. In the future, these two aspects can be analyzed further to broaden the applicability of the method to more crops and vegetation indices and to clarify its limitations.

Author Contributions

Conceptualization, methodology, formal analysis, and writing - original draft, Z.J.; writing - review and editing, supervision, and funding acquisition, Y.P.; writing - review and editing, X.Z.; investigation and validation, J.W.; resources, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National High Resolution Earth Observation System (The Civil Part) Technology Projects of China (project No. 20-Y30F10-9001-20/22) and the National Key Research and Development Program of China (project No. 2018YFC1504603).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

1. Sun, J.; Di, L.; Sun, Z.; Shen, Y.; Lai, Z. County-Level Soybean Yield Prediction Using Deep CNN-LSTM Model. Sensors 2019, 19, 4363.
2. Liu, J.; Shang, J.; Qian, B.; Huffman, T.; Zhang, Y.; Dong, T.; Jing, Q.; Martin, T. Crop Yield Estimation Using Time-Series MODIS Data and the Effects of Cropland Masks in Ontario, Canada. Remote Sens. 2019, 11, 2419.
3. Li, Y.; Guan, K.; Yu, A.; Peng, B.; Zhao, L.; Li, B.; Peng, J. Toward building a transparent statistical model for improving crop yield prediction: Modeling rainfed corn in the U.S. Field Crops Res. 2019, 234, 55–65.
4. Kern, A.; Barcza, Z.; Marjanović, H.; Árendás, T.; Fodor, N.; Bónis, P.; Bognár, P.; Lichtenberger, J. Statistical modelling of crop yield in Central Europe using climate data and remote sensing vegetation indices. Agric. For. Meteorol. 2018, 260–261, 300–320.
5. Elavarasan, D.; Vincent, D.R.; Sharma, V.; Zomaya, A.Y.; Srinivasan, K. Forecasting yield by integrating agrarian factors and machine learning models: A survey. Comput. Electron. Agric. 2018, 155, 257–282.
6. Guan, K.; Berry, J.A.; Zhang, Y.; Joiner, J.; Guanter, L.; Badgley, G.; Lobell, D.B. Improving the monitoring of crop productivity using spaceborne solar-induced fluorescence. Glob. Chang. Biol. 2016, 22, 716–726.
7. Qader, S.H.; Dash, J.; Atkinson, P.M. Forecasting wheat and barley crop production in arid and semi-arid regions using remotely sensed primary productivity and crop phenology: A case study in Iraq. Sci. Total Environ. 2018, 613, 250–262.
8. Chahbi Bellakanji, A.; Zribi, M.; Lili-Chabaane, Z.; Mougenot, B. Forecasting of Cereal Yields in a Semi-arid Area Using the Simple Algorithm for Yield Estimation (SAFY) Agro-Meteorological Model Combined with Optical SPOT/HRV Images. Sensors 2018, 18, 2138.
9. Yu, B.; Shang, S. Multi-year mapping of major crop yields in an irrigation district from high spatial and temporal resolution vegetation index. Sensors 2018, 18, 3787.
10. Siebert, S.; Ewert, F.; Rezaei, E.E.; Kage, H.; Gras, R. Impact of heat stress on crop yield—on the importance of considering canopy temperature. Environ. Res. Lett. 2014, 9, 044012.
11. Holzman, M.E.; Carmona, F.; Rivas, R.; Niclòs, R. Early assessment of crop yield from remotely sensed water stress and solar radiation data. ISPRS J. Photogramm. Remote Sens. 2018, 145, 297–308.
12. Anderson, M.C.; Hain, C.R.; Wardlow, B.D.; Pimstein, A.; Mecikalski, J.R.; Kustas, W.P. Evaluation of Drought Indices Based on Thermal Remote Sensing of Evapotranspiration over the Continental United States. J. Clim. 2011, 24, 2025–2044.
13. Inge, S.; Kjeld, R.; Jens, A. A simple interpretation of the surface temperature/vegetation index space for assessment of surface moisture status. Remote Sens. Environ. 2002, 79, 213–224.
14. Guo, Y.; Fu, Y.; Hao, F.; Zhang, X.; Wu, W.; Jin, X.; Bryant, C.R.; Senthilnath, J. Integrated phenology and climate in rice yields prediction using machine learning methods. Ecol. Indic. 2021, 120, 106935.
15. Shammi, S.A.; Meng, Q. Use time series NDVI and EVI to develop dynamic crop growth metrics for yield modeling. Ecol. Indic. 2020, 107124.
16. Vina, A.; Gitelson, A.A.; Rundquist, D.C.; Keydan, G.; Leavitt, B.; Schepers, J. Monitoring maize (Zea mays L.) phenology with remote sensing. Agron. J. 2004, 96, 1139–1147.
17. Ahmad, S.; Abbas, Q.; Abbas, G.; Fatima, Z.; Atique-ur-Rehman Naz, S.; Younis, H.; Khan, R.J.; Nasim, W.; Habib ur Rehman, M. Quantification of Climate Warming and Crop Management Impacts on Cotton Phenology. Plants 2017, 6, 7.
18. He, L.; Jin, N.; Yu, Q. Impacts of climate change and crop management practices on soybean phenology changes in China. Sci. Total Environ. 2020, 707, 135631–135638.
19. Harris, K.; Subudhi, P.; Borrell, A.; Jordan, D.; Rosenow, D.; Nguyen, H.; Klein, P.; Klein, R.; Mullet, J. Sorghum stay-green QTL individually reduce post-flowering drought-induced leaf senescence. J. Exp. Bot. 2007, 58, 327–338.
20. Christopher, J.T.; Veyradier, M.; Borrell, A.K.; Harvey, G.; Fletcher, S.; Chenu, K. Phenotyping novel stay-green traits to capture genetic variation in senescence dynamics. Funct. Plant Biol. 2014, 41, 1035–1048.
21. Gaju, O.; Allard, V.; Martre, P.; Le Gouis, J.; Moreau, D.; Bogard, M.; Hubbart, S.; Foulkes, M.J. Nitrogen partitioning and remobilization in relation to leaf senescence, grain yield and grain nitrogen concentration in wheat cultivars. Field Crops Res. 2014, 155, 213–223.
22. Lobell, D.B.; Thau, D.; Seifert, C.; Engle, E.; Little, B. A scalable satellite-based crop yield mapper. Remote Sens. Environ. 2015, 164, 324–333.
23. Johnson, D.M. An assessment of pre-and within-season remotely sensed variables for forecasting corn and soybean yields in the United States. Remote Sens. Environ. 2014, 141, 116–128.
24. Cao, J.; Zhang, Z.; Tao, F.; Zhang, L.; Luo, Y.; Zhang, J.; Han, J.; Xie, J. Integrating Multi-Source Data for Rice Yield Prediction across China using Machine Learning and Deep Learning Approaches. Agric. For. Meteorol. 2021, 297, 108275.
25. Jiang, H.; Hu, H.; Zhong, R.; Xu, J.; Xu, J.; Huang, J.; Wang, S.; Ying, Y.; Lin, T. A deep learning approach to conflating heterogeneous geospatial data for corn yield estimation: A case study of the US Corn Belt at the county level. Glob. Chang. Biol. 2019.
26. Feng, P.; Wang, B.; Liu, D.L.D.; Waters, C.M.; Yu, Q. Dynamic wheat yield forecasts are improved by a hybrid approach using a biophysical model and machine learning technique. Agric. For. Meteorol. 2020, 285–286, 107922.
27. Ban, H.-Y.; Kim, K.S.; Park, N.-W.; Lee, B.-W. Using MODIS Data to Predict Regional Corn Yields. Remote Sens. 2017, 9, 16.
28. Bolton, D.K.; Friedl, M.A. Forecasting crop yield using remotely sensed vegetation indices and crop phenology metrics. Agric. For. Meteorol. 2013, 173, 74–84.
29. Sakamoto, T.; Gitelson, A.A.; Arkebauer, T.J. MODIS-based corn grain yield estimation model incorporating crop phenology information. Remote Sens. Environ. 2013, 131, 215–231.
30. Peng, Z.; Jin, Z.; Zhuang, Q.; Philippe, C.; Carl, B.; Wang, X.; David, M.; David, L. The important but weakening maize yield benefit of grain filling prolongation in the US Midwest. Glob. Chang. Biol. 2018.
31. Bai, T.; Zhang, N.; Mercatoris, B.; Chen, Y. Jujube yield prediction method combining Landsat 8 Vegetation Index and the phenological length. Comput. Electron. Agric. 2019, 162, 1011–1027.
32. Becker-Reshef, I.; Vermote, E.; Lindeman, M.; Justice, C. A generalized regression-based model for forecasting winter wheat yields in Kansas and Ukraine using MODIS data. Remote Sens. Environ. 2010, 114, 1312–1323.
33. Zhao, W.L.; Zhen, H.E.; Jun-Ping, H.E.; Zhu, L.Q. Remote sensing estimation for winter wheat yield in Henan based on the MODIS-NDVI data. Geogr. Res. 2012, 31, 2310–2320.
34. Saeed, U.; Dempewolf, J.; Becker-Reshef, I.; Khan, A.; Ahmad, A.; Wajid, S.A. Forecasting wheat yield from weather data and MODIS NDVI using Random Forests for Punjab province, Pakistan. Int. J. Remote Sens. 2017, 38, 4831–4854.
35. Sehgal, V.K.; Jain, S.; Aggarwal, P.K.; Jha, S. Deriving crop phenology metrics and their trends using times series NOAA-AVHRR NDVI data. J. Indian Soc. Remote Sens. 2011, 39, 373–381.
36. Magney, T.S.; Eitel, J.U.H.; Huggins, D.R.; Vierling, L.A. Proximal NDVI derived phenology improves in-season predictions of wheat quantity and quality. Agric. For. Meteorol. 2016, 217, 46–60.
37. Pieter, S.A.B.; Clement, A.; Kjell, H.A.; Johansen, B.; Bernt, J. Improved monitoring of vegetation dynamics at very high latitudes: A new method using MODIS NDVI. Remote Sens. Environ. 2006.
38. Zhang, X.; Friedl, M.A.; Schaaf, C.B.; Strahler, A.H.; Hodges, J.C.; Gao, F.; Reed, B.C.; Huete, A. Monitoring vegetation phenology using MODIS. Remote Sens. Environ. 2003, 84, 471–475.
39. Sakamoto, T.; Wardlow, B.D.; Gitelson, A.A.; Verma, S.B.; Suyker, A.E.; Arkebauer, T.J. A Two-Step Filtering approach for detecting maize and soybean phenology with time-series MODIS data. Remote Sens. Environ. 2010, 114, 2146–2159.
40. Zeng, L.; Wardlow, B.D.; Wang, R.; Shan, J.; Tadesse, T.; Hayes, M.J.; Li, D. A hybrid approach for detecting corn and soybean phenology with time-series MODIS data. Remote Sens. Environ. 2016, 181, 237–250.
41. Mkhabela, M.S.; Bullock, P.; Raj, S.; Wang, S.; Yang, Y. Crop yield forecasting on the Canadian Prairies using MODIS NDVI data. Agric. For. Meteorol. 2011, 151, 385–393.
42. Yan, L.; Roy, D.P. Conterminous United States crop field size quantification from multi-temporal Landsat data. Remote Sens. Environ. 2016, 172, 67–86.
43. Hird, J.N.; McDermid, G.J. Noise reduction of NDVI time series: An empirical comparison of selected techniques. Remote Sens. Environ. 2009, 113, 248–258.
44. Jie, R.; Campbell, J.; Yang, S. Estimation of SOS and EOS for Midwestern US Corn and Soybean Crops. Remote Sens. 2017, 9, 722.
45. Shao, Y.; Lunetta, R.S.; Wheeler, B.; Iiames, J.S.; Campbell, J.B. An evaluation of time-series smoothing algorithms for land-cover classifications using MODIS-NDVI multi-temporal data. Remote Sens. Environ. 2016, 174, 258–265.
46. Shao, Y.; Campbell, J.B.; Taff, G.N.; Zheng, B. An analysis of cropland mask choice and ancillary data for annual corn yield forecasting using MODIS data. Int. J. Appl. Earth Obs. Geoinf. 2015, 38, 78–87.
47. Bognár, P.; Kern, A.; Pásztor, S.; Lichtenberger, J.; Koronczay, D.; Ferencz, C. Yield estimation and forecasting for winter wheat in Hungary using time series of MODIS data. Int. J. Remote Sens. 2017, 38, 3394–3414.
48. Genovese, G.; Vignolles, C.; Nègre, T.; Passera, G. A methodology for a combined use of normalised difference vegetation index and CORINE land cover data for crop yield monitoring and forecasting. A case study on Spain. Agronomie 2001, 21, 91–111.
49. Jonsson, P.; Eklundh, L. Seasonality extraction by function fitting to time-series of satellite sensor data. IEEE Trans. Geosci. Remote Sens. 2002, 40, 1824–1832.
50. O'Brien, R.M. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Qual. Quant. 2007, 41, 673–690.
51. Peoples, M.B.; Beilharz, V.C.; Waters, S.P.; Simpson, R.J.; Dalling, M.J. Nitrogen redistribution during grain growth in wheat (Triticum aestivum L.). Planta 1980, 149, 241–251.
52. Nagy, A.; Fehér, J.; Tamás, J. Wheat and maize yield forecasting for the Tisza river catchment using MODIS NDVI time series and reported crop statistics. Comput. Electron. Agric. 2018, 151, 41–49.
53. Łakomiak, A.; Zhichkin, K.A. Economic aspects of fruit production: A case study in Poland. Proc. BIO Web Conf. EDP Sci. 2020, 17, 00236.
54. Seo, B.; Lee, J.; Lee, K.D.; Hong, S.; Kang, S. Improving remotely-sensed crop monitoring by NDVI-based crop phenology estimators for corn and soybeans in Iowa and Illinois, USA. Field Crops Res. 2019, 238, 113–128.
55. Schwalbert, R.; Amado, T.J.C.; Nieto, L.; Corassa, G.M.; Rice, C.W.; Peralta, N.R.; Schauberger, B.; Gornott, C.; Ciampitti, I.A. Mid-season county-level corn yield forecast for US Corn Belt integrating satellite imagery and weather variables. Crop Sci. 2020.
56. Schlenker, W.; Hanemann, W.M.; Fisher, A.C. Will U.S. Agriculture Really Benefit from Global Warming? Accounting for Irrigation in the Hedonic Approach. Am. Econ. Rev. 2005, 95, 395–406.
57. Gitelson, A.A.; Gritz, Y.; Merzlyak, M.N. Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J. Plant Physiol. 2003, 160, 271–282.
58. Diao, C. Remote sensing phenological monitoring framework to characterize corn and soybean physiological growing stages. Remote Sens. Environ. 2020, 248, 111960.
59. Kumar, S.; Prihodko, L.; Lind, B.; Anchang, J.; Ji, W.; Ross, C.; Kahiu, M.; Velpuri, N.; Hanan, N. Remotely sensed thermal decay rate: An index for vegetation monitoring. Sci. Rep. 2020, 10, 1–11.

Figure 1. Spatial distribution of selected counties, which are divided into semi-arid and non-semi-arid regions.

Figure 2. Flow diagram of the datasets and processing used in the model, indicating four steps of model development: 1—obtaining the normalized difference vegetation index (NDVI) time series from the start date to end date of corn at a pixel level, 2—computing the county-level NDVI time series, 3—deriving the prediction variables (duration and rate in four growth phases, maximum correlation NDVI), and 4—constructing the regression relationships between the corn yield and predictors.

Figure 3. Schematic on how to determine the growing season at the pixel level. NDVI values were extracted from the start date of the growing season to the end date of the growing season for each corn pixel. The “phenologically adjusted” time series values of each corn pixel started with the start date. DOY denotes Day of Year.

Figure 4. Schematic of the four growth phases for the county-level corn NDVI time series values of the growing season. SD denotes the start date of the corn-growing season.

Figure 5. Comparison of the Moderate Resolution Imaging Spectroradiometer (MODIS)-derived emergence date (V1) values, mature date (R6) values, and the United States Department of Agriculture (USDA) Crop Progress Reports (CPR) survey data of 50% corn emerged, mature dates at the state level: (A) V1 estimation; (B) R6 estimation.

Figure 6. Relationship between yield and GP1 rate, GP2 rate, GP3 duration, and GP4 rate in the whole region. The four phenological predictor variables have the best explanatory power (R² > 0.20) for corn yield among the eight variables.

Figure 7. Boxplot of phenological yield prediction model performance for (A) R² and (B) RMSE in the whole region, semi-arid region, and non-semi-arid region. The models were built with the four most significant phenological metrics using leave-one-year-out cross-validation from 2008 to 2018.

Table 1. R² between the duration (days), rate (∆NDVI/days), and Max-R² and yield (kg/ha). The whole region contains all counties located in the semi-arid and non-semi-arid regions; semi-arid refers to the counties in the semi-arid region and non-semi-arid refers to the counties in the non-semi-arid region.

Phenological Predictor Variable	Whole Region	Semi-Arid Region	Non-Semi-Arid Region
GP1 duration	0.04 *	0.14 *	0.02 *
GP1 rate	0.25 *	0.35 *	0.23 *
GP2 duration	0.18 *	0.40 *	0.19 *
GP2 rate	0.32 *	0.44 *	0.30 *
GP3 duration	0.27 *	0.43 *	0.25 *
GP3 rate	0.01	0.01	0.01
GP4 duration	0.05 *	0.33 *	0.02
GP4 rate	0.37 *	0.35 *	0.38 *
Max-R²	0.61 *	0.66 *	0.62 *

* Significant at p < 0.05.
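The Max-R² predictor in Table 1 is the NDVI value whose correlation with yield is highest within the time series. One way to locate it is sketched below; this is an assumption-laden illustration rather than the authors' procedure. The function name `max_r2_composite`, the layout of the NDVI array (samples in rows, composite dates in columns), and the synthetic data are all hypothetical.

```python
import numpy as np

def max_r2_composite(ndvi_series, yields):
    """Find the time step whose NDVI has the highest R² with yield.

    ndvi_series : array of shape (n_samples, n_steps), one row per county-year
    yields      : array of shape (n_samples,)
    Returns (best_step_index, r2_per_step).
    """
    ndvi_series = np.asarray(ndvi_series, dtype=float)
    yields = np.asarray(yields, dtype=float)
    r2 = np.array([
        np.corrcoef(ndvi_series[:, t], yields)[0, 1] ** 2
        for t in range(ndvi_series.shape[1])
    ])
    return int(np.argmax(r2)), r2

# Synthetic data in which the yield depends on NDVI at step 7 only.
rng = np.random.default_rng(42)
series = rng.uniform(0.2, 0.9, size=(200, 12))
yields = 8000 + 6000 * series[:, 7] + rng.normal(scale=300, size=200)

best_step, r2_per_step = max_r2_composite(series, yields)
print(best_step, np.round(r2_per_step, 2))
```

The NDVI values at `best_step` would then serve as the single Max-R² predictor in a univariate or combined regression.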

Table 2. Results from stepwise multiple linear regression between phenological metrics during all phenological phases and yield with data from 2008 to 2018.

Region	Equation	R²	VIFs
Whole region	Y = −10,084.25 + 360.58GP2D + 953,476.78GP2R + 97.00GP3D − 102,453.38GP4R ^a	0.64	X < 3.10
Whole region	Y = 0.52GP2D + 0.70GP2R + 0.32GP3D − 0.08GP4R ^b	0.64	X < 3.10
Semi-arid region	Y = −4716.91 + 155.34GP2D + 483,419.05GP2R + 110.04GP3D + 94.97GP4D ^a	0.72	X < 1.63
Semi-arid region	Y = 0.24GP2D + 0.35GP2R + 0.36GP3D + 0.23GP4D ^b	0.72	X < 1.63
Non-Semi-arid region	Y = −13,918.97 + 448.02GP2D + 1,084,391.95GP2R + 107.47GP3D − 127,325.54GP4R ^a	0.67	X < 3.01
Non-Semi-arid region	Y = 0.60GP2D + 0.74GP2R + 0.35GP3D − 0.09GP4R ^b	0.67	X < 3.01

^a: unstandardized; ^b: standardized.
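The unstandardized (^a) and standardized (^b) forms in Table 2 describe the same fitted model: a standardized coefficient equals the unstandardized one multiplied by the ratio of the predictor's standard deviation to that of the yield. The sketch below checks this relationship numerically on synthetic data; the predictor scales and coefficients are invented for illustration and are not the values from the table.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Synthetic predictors on very different scales (e.g., days and NDVI/day).
rng = np.random.default_rng(7)
X = np.column_stack([
    rng.normal(30.0, 5.0, 300),      # a duration-like predictor
    rng.normal(0.012, 0.003, 300),   # a rate-like predictor
])
y = -1000 + 300 * X[:, 0] + 500000 * X[:, 1] + rng.normal(0, 500, 300)

# Unstandardized slopes from the raw data.
b = ols(X, y)[1:]

# Standardized slopes from z-scored predictors and response.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (y - y.mean()) / y.std()
beta_fit = ols(Xz, yz)[1:]

# The same values obtained by rescaling the unstandardized slopes.
beta_converted = b * X.std(axis=0) / y.std()
print(beta_fit, beta_converted)
```

Because standardized coefficients are unit-free, they allow the relative contribution of duration and rate metrics to be compared directly, which the unstandardized coefficients do not.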

Table 3. Results from stepwise multiple regression between yield and Max-R² combined with the rate and duration metrics, with data from 2008 to 2018.

Region	Equation	R²	VIFs
Whole region	Y = −11,872.08 − 56.01GP1D + 45.88GP3D + 56.74GP4D + 27,674.64Max-R²	0.65	X < 1.60
Semi-arid region	Y = −12,081.43 + 463,358.80GP1R + 56.88GP3D + 545,886.68GP4R + 28,890.95Max-R²	0.73	X < 2.61
Non-Semi-arid region	Y = −15,530.44 − 77.19GP1D + 56.39GP3D + 72.03GP4D + 31,959.49Max-R²	0.68	X < 1.46

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

