Mid-Season High-Resolution Satellite Imagery for Forecasting Site-Specific Corn Yield

A timely and accurate crop yield forecast is crucial to make better decisions on crop management, marketing, and storage by assessing ahead and implementing based on expected crop performance. The objective of this study was to investigate the potential of high-resolution satellite imagery data collected at mid-growing season for identification of within-field variability and to forecast corn yield at different sites within a field. A test was conducted on yield monitor data and RapidEye satellite imagery obtained for 22 cornfields located in five different counties (Clay, Dickinson, Rice, Saline, and Washington) of Kansas (total of 457 ha). Three basic tests were conducted on the data: (1) spatial dependence on each of the yield and vegetation indices (VIs) using Moran’s I test; (2) model selection for the relationship between imagery data and actual yield using ordinary least square regression (OLS) and spatial econometric (SPL) models; and (3) model validation for yield forecasting purposes. Spatial autocorrelation analysis (Moran’s I test) for both yield and VIs (red edge NDVI = NDVIre, normalized difference vegetation index = NDVIr, SRre = red-edge simple ratio, near infrared = NIR and green-NDVI = NDVIG) was tested positive and statistically significant for most of the fields (p < 0.05), except for one. Inclusion of spatial adjustment to model improved the model fit on most fields as compared to OLS models, with the spatial adjustment coefficient significant for half of the fields studied. When selected models were used for prediction to validate dataset, a striking similarity (RMSE = 0.02) was obtained between predicted and observed yield within a field. Yield maps could assist implementing more effective site-specific management tools and could be utilized as a proxy of yield monitor data. In summary, high-resolution satellite imagery data can be reasonably used to forecast yield via utilization of models that include spatial adjustment to inform precision agricultural management decisions.

marketing, storage, and transport [1][2][3][4][5].Statistical and mechanistic models have been used to make such crop forecast but existing models are applicable only for large-scale yield prediction, i.e., regional or state level.As crop management progresses from large-scale blanket management to precision agriculture (PA), it requires technologies that evaluate within-field variation and provide reasonable yield forecasts.Precision agriculture is the application of a set of technologies and principles for managing spatial-and temporal-variability associated with the aspects of agricultural production, in order to improve input efficiency and primarily increase the economic return of the farming system [6][7][8].In the major agricultural regions of the world, introduction of yield monitor technologies coupled with Global Positioning Systems (GPS) made implementation of PA practical.Digital maps of grain yield obtained from yield monitors allow analysis of the spatial variability within an area of production [9].Interpretation of these digital maps, however, is often difficult because pattern of grain yield variability is permanently influenced by spatial (terrain attributes, erosion classes and soil properties) and temporal (soil pathogens, diseases and production issues in planting the crop) factors [10].In such cases, interpretation of the true spatial pattern of grain yield require years of accumulated yield maps (temporal-analysis), soil and crop management data, and weather information [11,12].
Coupling remote sensing and high-resolution spatio-temporal data collected at multiple growth stages, with yield monitor information has the potential to contribute to site-specific crop management [12,13].For example, the leaf area index (LAI) is a key variable to estimate the foliage canopy and to forecast growth and crop yields [14].Rapid-and regional-estimation of LAI via utilization and integration on models of data collected from remote sensors such as satellite imagery provides a large benefit in assessing in-season plant traits [14].High-resolution satellite imagery (multispectral or hyperspectral) can provide valuable data on the status and health of crops, depending on the interaction and the relationship between electromagnetic radiation (EMR) and foliage [12].Utilization of satellite imagery facilitates the identification of within-field variability and dry matter production in order to improve in-and after-season management decisions [13,15].Currently, the normalized difference vegetation index (NDVI) is most commonly and widely used vegetation index to assess crop growth and yield.Normalized difference vegetative index (VI) is sensitive to low LAI (i.e., LAI < 2-3), but saturates at medium to high LAI and yield [15].A few VIs have shown greater sensitivity to higher LAI and biomass, such as the simple ratio [15], or the green NDVI (NDVIG) [14].Indices that incorporate the reflectance of red-edge bands such as the red-edge triangular vegetation index (RTVI) and red edge NDVI (NDVIre) have increased potential for estimating LAI and biomass [14].Most of the reported red-edge indices were derived from narrow band field spectroradiometers [15,16].Thus, estimation of NDVI and red-edge indices from satellite imagery enhanced the power of characterizing within-field variability at a large-scale.
In the application of satellite data to PA, crop information is required at sufficiently high spatial and temporal resolutions to enable within-field monitoring.The latter data resolution can be obtained from the RapidEye satellite platform, which is the first commercial high-resolution constellation of satellites with a red-edge band.As a constellation of five, the RapidEye satellite platform can provide imagery over relatively large areas (swath of 77 km) at a spatial resolution of 5 m and a temporal resolution of one day, increasing the rate of success to acquire cloud-free imagery data.RapidEye's traditional broadband and red-edge indices were evaluated for grassland biomass and nitrogen [17], forest LAI [18], crop canopy chlorophyll content [19], wheat ground cover and LAI [20] and for forecasting yield at regional scale [21,22], but this satellite platform has not been used to forecast within-field variation for grain yield in corn (Zea mays L.).
To date, yield forecasts based on satellite imagery are not only used for large scale prediction but the models used to predict grain yield with VIs were mainly classical ordinary least squares (OLS)-based simple or multiple regression techniques [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23].These models did not take spatial autocorrelation structure into account, which, in turn, affects the crops site-specific function estimation, leading to inflated variance and most likely wrong conclusions.As demonstrated by Anselin [24], spatial correlation of regression residuals should be critically considered in the analysis of yield monitor data.Following this rationale, the application of spatial econometric methods incorporates a simultaneous autoregressive model of order one for the error term (SAR error model or SPL) and considers spatial neighborhood dependence structure [24][25][26].
Previous studies portrayed the benefits of using high-resolution satellite imagery for identifying within-field variation [13][14][15][16][17][18][19][20][21][22][23][24][25][26][27] and provided a solid foundation for developing and extending this procedure to multiple sites and evaluation years.The current research project provides novel points in the evaluation of multiple fields in diverse site-years and in exploring the utilization of econometric models for accounting for the spatial variation in corn production.Corn is the major summer annual crop in the United States, accounting for more than 30 million ha −1 harvested annually [28].The objective of this study was to investigate the potential of high-resolution satellite imagery data collected at mid-growing season for identification of within-field variation and to reliably forecast corn yield at varying site-years.

Study Area and Data Description
The analysis was conducted on end-season yield monitor data and mid-season (overall between V12 to R3 growth stages [29].RapidEye satellite imagery of 22 commercial production corn fields located in five different counties (Clay, Dickinson, Rice, Saline, and Washington) of the state of Kansas (from 2011 to 2015), US was used.Each of the farmer fields had one year worth of data for the years from 2011 to 2015 (Table 1).The earliest corn planting data in the region were from the first week of April, with 50% planting date for the entire state attained by last week of April and latest corn harvest by early October (USDA-NASS).Fertilizer application rates, crop management and tillage practices varied between fields and were chosen by the farmer, but each farmer managing more than one field used similar practices on their fields.No-till (direct seeding), conservation and reduced tillage practices are widespread for the corn production systems in the Great Plains region, including Kansas [30].A total of 22 mid-season (July-August) RapidEye images were used from the growing seasons of 2011-2015 (Table 1).The RapidEye image was collected during a critical period for determining the grain yield of corn (approximately 20 days before and 20 after flowering) and it consists of blue (440-510 nm), green (520-590 nm), red (630-685 nm), red-edge (690-730 nm), and near infrared (NIR) (760-850 nm) spectral bands at 5 m ground sampling distance at nadir.Indices selected for evaluation used a combination of visible, near-infrared and red-edge bands (Table 2), including NDVI, red edge normalized difference vegetation index (NDVIre), red edge simple ratio (SRre) and green NDVI (NDVIG).The RapidEye image was orthorectified by the vendor.Corn grain yield was georeferenced using a yield monitor system (grain mass flow and moisture sensors) and grain yield was measured on three-second intervals and recorded using equipped with DGPS.Grain yield data were adjusted to 155 g•kg −1 grain moisture, spatially located and analyzed with ArcGIS Geospatial Analyst (ArcGIS v10.3,Environmental System Research Institute Inc. (ESRI), Redlands, CA, USA).The data points located approximately 20 m from the borders of the sites were deleted before the analysis because the combination was unlikely to be full [34,35].Yield measurements assumed to be erroneous were excised by standard protocol if the observations were more than three standard deviations (SDs) from the mean, as were yield data recorded when the harvester changed direction or speed significantly [11,36].All geostatistical evaluation was conducted with the ArcGIS Geospatial Analyst (ArcGIS v10.3).A final 5 m × 5 m grid cell size was chosen because it reflects the scale of variability associated with the VIs of Rapid Eye image.

Data Analysis
A workflow on data analysis and all steps required for the meta-and global-functions (entire database, training and validation datasets) is presented in Figure 1. the grain yield of corn (approximately 20 days before and 20 after flowering) and it consists of blue (440-510 nm), green (520-590 nm), red (630-685 nm), red-edge (690-730 nm), and near infrared (NIR) (760-850 nm) spectral bands at 5 m ground sampling distance at nadir.Indices selected for evaluation used a combination of visible, near-infrared and red-edge bands (Table 2), including NDVI, red edge normalized difference vegetation index (NDVIre), red edge simple ratio (SRre) and green NDVI (NDVIG).The RapidEye image was orthorectified by the vendor.Corn grain yield was georeferenced using a yield monitor system (grain mass flow and moisture sensors) and grain yield was measured on three-second intervals and recorded using equipped with DGPS.Grain yield data were adjusted to 155 g•kg −1 grain moisture, spatially located and analyzed with ArcGIS Geospatial Analyst (ArcGIS v10.3,Environmental System Research Institute Inc. (ESRI), Redlands, CA, USA).The data points located approximately 20 m from the borders of the sites were deleted before the analysis because the combination was unlikely to be full [34,35].Yield measurements assumed to be erroneous were excised by standard protocol if the observations were more than three standard deviations (SDs) from the mean, as were yield data recorded when the harvester changed direction or speed significantly [11,36].All geostatistical evaluation was conducted with the ArcGIS Geospatial Analyst (ArcGIS v10.3).A final 5 m × 5 m grid cell size was chosen because it reflects the scale of variability associated with the VIs of Rapid Eye image.

Data Analysis
A workflow on data analysis and all steps required for the meta-and global-functions (entire database, training and validation datasets) is presented in Figure 1.As an initial step, spatial autocorrelation analysis was conducted on yield and VIs (NIR, NDVI, NDVIG, NDVIre, and SRre) data of each field using Moran's [37] test (Table 3).Moran's I statistic measures the strength of spatial autocorrelation in a response among nearby locations in space as a function of cross-products of the neighboring weighted deviations from the mean.Moran's I coefficient values near 1 and −1 indicate strong positive and negative autocorrelation, respectively.Coefficient values near 0 support the null hypothesis of no spatial autocorrelation.Spatial correlation analysis was conducted using GeoDa software 1.4.6 [38].In order to identify an appropriate model that describes the relationship between end-season observed corn yields as a function of VIs of mid-season high-resolution imagery data for each field.Two models are considered, first is the classical linear regression model assuming that the errors are independent and identically distributed (i.i.d.).Ordinary least squares (OLS) is known as an efficient method for estimating the unknown parameters for this model, therefore we will call this baseline model as "OLS" model.Both the response variable corn yield and predictor variables VIs, as well as the regression errors, exhibit spatial autocorrelation as Moran's I test shows, the i.i.d.assumption is violated.This leads us to consider the application of the spatial error model (Anselin [24]), which is called "spatial econometric" model in this paper, to account for the spatial interaction (spatial autocorrelation) and spatial structure (spatial heterogeneity) in the datasets.To be more specific, this spatial regression model can be expressed as: where y is a vector (n by 1) of observations on the dependent variable, X is an n by p matrix of observations on the explanatory variables, and u is an error term that follows a simultaneous autoregressive (SAR) specification with spatial autoregressive coefficient λ.In the spatial autoregression, the vector of errors is expressed as a sum of a vector ε (n by 1) of independent and identically distributed innovation terms and a so-called spatially lagged error λWu.The latter boils down to a weighted average of errors in the neighboring locations.The selection of neighbors is formally specified in the n by n spatial weights matrix W. In this study, W was created using the distance-based K-Nearest Neighbor Spatial Weights, because the database was irregular [39].For this spatial econometric model, the parameter estimation is achieved by maximum likelihood estimation method.In comparison, the classical OLS model is simply given by y = Xβ + ε, where ε is a vector of i.i.d error terms.While applying to dataset, ordinary estimation method is used to estimate the parameters.To simplify equations from 22 (Figure 2) to two global models (Figure 3), fields were grouped into two clusters (Figure 1).Before grouping fields, the database (5 m × 5 m grid) for yield and VIs was randomly sampled to generate 600 observations per field.The yield data were normalized to lessen the effect of years, which allows comparison across site-years [40,41].Clustering analysis divided two groups, i.e., fields with spatial autocorrelation in error term (λ values significant, p ≤ 0.05) and without spatial autocorrelation (λ values insignificant, p > 0.05).Five independent fields (F1, F11, F13, F21 and F22); two from those fields that need spatial adjustment, two fields from those that did not require spatial adjustment, and one from all fields, were randomly selected and kept aside for validation analysis (Figure 1).From the remaining training data, two global models were developed to predict end-season observed yield in fields with and without spatial autocorrelation.The two models developed with training dataset were validated with the testing dataset (fields that were not used in the database to build both models, but to validate the models (Figure 1).In the case of OLS and spatial (SPL) models, two fields randomly selected were F1-F13 and F11-F22, respectively.To further simplify matters, all training datasets, irrespective of whether they require spatial adjustment or not, were combined and one SPL model was developed and validated with the one validation dataset selected (F21).
Remote Sens. 2016, 8, 848 6 of 16 Model performance was assessed using statistical criteria proposed by Akaike (AIC) [24,26].Accuracy of estimation and model fitting was evaluated using the Root-Mean Square Error (RMSE) and Pseudo R 2 [42].In addition, spatial predictions from each model were visually compared with geostatistical interpolation of yield from the dataset for each field and for the global meta-function.Model performance was assessed using statistical criteria proposed by Akaike (AIC) [24,26].Accuracy of estimation and model fitting was evaluated using the Root-Mean Square Error (RMSE) and Pseudo R 2 [42].In addition, spatial predictions from each model were visually compared with geostatistical interpolation of yield from the dataset for each field and for the global meta-function.

Spatial Autocorrelation on Yield and Imagery Data
Spatial autocorrelation analysis conducted using global Moran's I test (MI) on vegetation indices (VIs) and yield monitor data is presented in Table 3.In general, the autocorrelation (Moran's I test) for all variables was positive and statistically significant indicating yield or VI values that are closer in space to one another are more similar and the relation decreases as distance increases.The spatial autocorrelation for corn yield was stronger (MI > 0.15) for fields F6, F10, F11, F12, and F20 compared with others (MI < 0.15).Similarly, spatial autocorrelation of VI values (NIR, SRre, NDVIr, NDVIG, and NDVIre) were stronger (MI > 0.15) for fields F3, F6, F10, F11, F12, and F22, with few other fields (F13, F20, and F18) demonstrating strong spatial autocorrelation only for specific VIs (Table 3).

Comparison of Models for Relationship between Observed Yield and Satellite Data
The VIs were independent variables in empirical regression models developed to estimate final yield (dependent variable).Results of the model calibration indicated that the SPL regression models have a much better ability in predicting the within-in field variation with significantly lower AIC values as compared to the OLS models for almost all fields (Table 4).The AIC fit criterion of the OLS model were improved by approximately 4%, estimated by calculating the relative difference between AIC for OLS versus SPL (ranging from 0.1% to 42%), when spatial error dependence was included in the model.Additionally, the AIC fit criterion of the OLS model increased by 6% when compared only with models that have significant λ values (F2, F5, F6, F9, F10, F11, F12, F14, F15, F16, F17, F18, F19, F20, F21, and F22) (Table 4).The VIs retained varied across the multiple farmer fields evaluated.For the model evaluation, the ranking of the most commonly retained VIs were as follows: NDVIre (13 fields), NDVIG (11 fields), NDVI (10 fields), NIR (five fields) and SRre (four fields).The comparison of the magnitude of the λ values (larger values indicate higher spatial autocorrelation) indicated that the fields with significant λ values (p < 0.05) have a higher spatial continuity relative to when λ values were not significant (Table 4).In summary, SPL outperformed OLS methods (Table 4).For fields presenting better fit with the SPL models, a significant λ value reflected a very strong spatial correlation in errors.By accounting for spatial dependence, regression coefficients of the models changed, which improved the mapping of intra-field yield variation.Notwithstanding the level of significance for λ value, SPL regression models presented tend to have smaller AIC and therefore show better trade-off between the goodness of fit of the model and the complexity of the model (Table 4).Following this rationale, the scatter plots were developed utilizing the SPL model validation for all fields (Figure 2).Pseudo coefficient of determination (R 2 ) for all fields ranged from 0.29 to 0.70 units, with approximately 50% of the fields presenting pseudo R 2 values above 0.5 units (Figure 2).End-season yield monitor data reflected a yield variation ranging from 2 to 14 Mg•ha −1 across all farms evaluated.The results indicated that the spatial models provided consistent predictions, with low RMSE values and higher R 2 for the estimated versus observed yield data.The RMSE difference between estimated-and observed-yields was approximately of 1 Mg•ha −1 (ranged between 0.1 and 2.8 Mg•ha −1 ) for all 22 fields.

Spatial and OLS Model Variables and Validation Performance
Based on each field evaluation (Table 4), a training dataset was prepared including all models that resulted in a lack of spatial autocorrelation, OLS model, for fields: F3, F4, F7, and F8 (Figure 1).All four fields were utilized and an overall equation was determined in order to estimate yields.Estimated yield resulted by the OLS model showed similarity to the observed yield at harvest, with an adequate coefficient of determination (R 2 = 0.47) and root mean square error of 1.4 Mg•ha −1 (Figure 3a).Similar to the OLS model, a training dataset was built for the SPL model including fields F2, F5, F6, F9, F10, F12, F14, F15, F16, F17, F18, F19, F20, and F21, following the steps proposed in the framework presented in Figure 1.The SPL model was utilized to fit estimated versus observed yields, which presented a low RMSE (1.28 Mg•ha −1 ), high correlation (R 2 = 0.48), and with all observations close to 1:1 line (Figure 3b).Better performance, slightly higher coefficient of determination, and lower RMSE on forecasting yields was documented for the SPL model (Figure 3b) as compared to the overall OLS model (Figure 3a).
After clustering into two groups, OLS vs. SPL groups, and variables were selected, out of five VIs, the NDVIre and NDVIG were statistically significant in both OLS and spatial models, whereas NIR was only significant in the spatial model (Table 5).Predictive yield maps were prepared with the validation data (based on framework in Figure 1).Both selected models provided significant predictors for yield (p < 0.05) within their respective cluster.The regression models developed were applied to each respective training datasets (OLS and SPL datasets) and predictive corn yield maps were generated from those models (Figure 4).Notwithstanding the level of significance for λ value, SPL regression models presented tend to have smaller AIC and therefore show better trade-off between the goodness of fit of the model and the complexity of the model (Table 4).Following this rationale, the scatter plots were developed utilizing the SPL model validation for all fields (Figure 2).Pseudo coefficient of determination (R 2 ) for all fields ranged from 0.29 to 0.70 units, with approximately 50% of the fields presenting pseudo R 2 values above 0.5 units (Figure 2).End-season yield monitor data reflected a yield variation ranging from 2 to 14 Mg•ha −1 across all farms evaluated.The results indicated that the spatial models provided consistent predictions, with low RMSE values and higher R 2 for the estimated versus observed yield data.The RMSE difference between estimated-and observed-yields was approximately of 1 Mg•ha −1 (ranged between 0.1 and 2.8 Mg•ha −1 ) for all 22 fields.

Spatial and OLS Model Variables and Validation Performance
Based on each field evaluation (Table 4), a training dataset was prepared including all models that resulted in a lack of spatial autocorrelation, OLS model, for fields: F3, F4, F7, and F8 (Figure 1).All four fields were utilized and an overall equation was determined in order to estimate yields.Estimated yield resulted by the OLS model showed similarity to the observed yield at harvest, with an adequate coefficient of determination (R 2 = 0.47) and root mean square error of 1.4 Mg•ha −1 (Figure 3a).Similar to the OLS model, a training dataset was built for the SPL model including fields F2, F5, F6, F9, F10, F12, F14, F15, F16, F17, F18, F19, F20, and F21, following the steps proposed in the framework presented in Figure 1.The SPL model was utilized to fit estimated versus observed yields, which presented a low RMSE (1.28 Mg•ha −1 ), high correlation (R 2 = 0.48), and with all observations close to 1:1 line (Figure 3b).Better performance, slightly higher coefficient of determination, and lower RMSE on forecasting yields was documented for the SPL model (Figure 3b) as compared to the overall OLS model (Figure 3a).
After clustering into two groups, OLS vs. SPL groups, and variables were selected, out of five VIs, the NDVIre and NDVIG were statistically significant in both OLS and spatial models, whereas NIR was only significant in the spatial model (Table 5).Predictive yield maps were prepared with the validation data (based on framework in Figure 1).Both selected models provided significant predictors for yield (p < 0.05) within their respective cluster.The regression models developed were applied to each respective training datasets (OLS and SPL datasets) and predictive corn yield maps were generated from those models (Figure 4).Table 5. Meta-functions for the ordinary least-square (OLS) and spatial econometric (SPL) models including the vegetation indices (VIs) obtained from mid-season high-resolution satellite imagery as predictors of the end-season yield monitor data, observed corn yield for all fields evaluated in this study.As a last step, based on the framework proposed in Figure 1, a final global meta-function was developed with the entire database (22 fields), excluding one field (F21) that was randomly chosen as to validate the calculated function.Equation ( 2) shows the global meta-function for yield prediction (dependent variable) and respective VIs (independent variables) and constants.The validation of this model is showed of Figure 6.For the meta-model, only the NDVIre and NIR were retained when compared with all Vis tested in the model, and the spatial factor (λ) was statistically significant (p < 0.05).Yield was adequately predicted by the meta-model (Figure 5).A predictive yield map was obtained via implementation of the meta-model and validated with the F21 (Figure 6).For both variables, comparable yield environments from the estimated yield based on mid-season high-resolution imagery and the endseason yield monitor data were documented (Figure 6).Similar spatial (λ) coefficients were documented when the SPL model included only the fields with significant spatial factor (λ) (Table 5) as compared when the entire database was considered to build the SPL model (Equation ( 2)).Final validation presented in Figure 4b (for two fields) and Figure 6 As a last step, based on the framework proposed in Figure 1, a final global meta-function was developed with the entire database (22 fields), excluding one field (F21) that was randomly chosen as to validate the calculated function.Equation ( 2) shows the global meta-function for yield prediction (dependent variable) and respective VIs (independent variables) and constants.The validation of this model is showed of Figure 6.
For the meta-model, only the NDVIre and NIR were retained when compared with all Vis tested in the model, and the spatial factor (λ) was statistically significant (p < 0.05).Yield was adequately predicted by the meta-model (Figure 5).A predictive yield map was obtained via implementation of the meta-model and validated with the F21 (Figure 6).For both variables, comparable yield environments from the estimated yield based on mid-season high-resolution imagery and the end-season yield monitor data were documented (Figure 6).As a last step, based on the framework proposed in Figure 1, a final global meta-function was developed with the entire database (22 fields), excluding one field (F21) that was randomly chosen as to validate the calculated function.Equation ( 2) shows the global meta-function for yield prediction (dependent variable) and respective VIs (independent variables) and constants.The validation of this model is showed of Figure 6.For the meta-model, only the NDVIre and NIR were retained when compared with all Vis tested in the model, and the spatial factor (λ) was statistically significant (p < 0.05).Yield was adequately predicted by the meta-model (Figure 5).A predictive yield map was obtained via implementation of the meta-model and validated with the F21 (Figure 6).For both variables, comparable yield environments from the estimated yield based on mid-season high-resolution imagery and the endseason yield monitor data were documented (Figure 6).Similar spatial (λ) coefficients were documented when the SPL model included only the fields with significant spatial factor (λ) (Table 5) as compared when the entire database was considered to build the SPL model (Equation ( 2)).Final validation presented in Figure 4b (for two fields) and Figure 6 (for one field) portrayed similar yield estimation, even when diverse databases were utilized but considering spatial (λ) factor adjustment (SPL model).In summary, spatial dependence for the intra-field yield variation was reflected in 16 (λ is significant at p < 0.05) out of 22 fields total (70% of the total database), improving model estimation (Table 4; Figures 1 and 2).Implementation of this model was stable across fields and years (from 2011 to 2015).The latter was reflected in the global meta-function that utilized NIR and NDVIre indices for yield prediction (with a significant spatial dependence).The meta-function overestimated yield in the low productivity zones within the field but reflected comparable yields for the average productivity environments.

Discussion
Yield monitor information is a critical input for the spatial modeling analysis with a large potential for improving daily-farmer operations [9,43,44].Nonetheless, the yield monitor data are relevant for off-season (after harvest) farming management decisions.As highlighted by Yang [13], utilization of in-season remote sensing imagery has a tremendous potential to produce an "on-the-go" impact on the farming operations.The current study provides data at a large-scale (multiple sites across the state and years) related to the potential utilization of mid-season high-resolution satellite imagery as a proxy for yield monitor data.Related to satellite data, yield forecast model was not necessarily influenced by the timing (early July to mid-September) of collection, which is only restricted to the environmental conditions explored for all site-years evaluated in this study.A similar finding was reported by Shanahan et al. [45], portraying highest comparable correlation coefficients (close to 0.8 units) between yield and NDVIG from late vegetative to mid-reproductive stages in corn (early July to September).
The first main result of our analysis indicated a spatial dependence and spatial autocorrelation in all evaluated variables.This result is consistent with the bases of PA and it further affirms the need Similar spatial (λ) coefficients were documented when the SPL model included only the fields with significant spatial factor (λ) (Table 5) as compared when the entire database was considered to build the SPL model (Equation ( 2)).Final validation presented in Figure 4b (for two fields) and Figure 6 (for one field) portrayed similar yield estimation, even when diverse databases were utilized but considering spatial (λ) factor adjustment (SPL model).
In summary, spatial dependence for the intra-field yield variation was reflected in 16 (λ is significant at p < 0.05) out of 22 fields total (70% of the total database), improving model estimation (Table 4; Figures 1 and 2).Implementation of this model was stable across fields and years (from 2011 to 2015).The latter was reflected in the global meta-function that utilized NIR and NDVIre indices for yield prediction (with a significant spatial dependence).The meta-function overestimated yield in the low productivity zones within the field but reflected comparable yields for the average productivity environments.

Discussion
Yield monitor information is a critical input for the spatial modeling analysis with a large potential for improving daily-farmer operations [9,43,44].Nonetheless, the yield monitor data are relevant for off-season (after harvest) farming management decisions.As highlighted by Yang [13], utilization of in-season remote sensing imagery has a tremendous potential to produce an "on-the-go" impact on the farming operations.The current study provides data at a large-scale (multiple sites across the state and years) related to the potential utilization of mid-season high-resolution satellite imagery as a proxy for yield monitor data.Related to satellite data, yield forecast model was not necessarily influenced by the timing (early July to mid-September) of collection, which is only restricted to the environmental conditions explored for all site-years evaluated in this study.A similar finding was reported by Shanahan et al. [45], portraying highest comparable correlation coefficients (close to 0.8 units) between yield and NDVIG from late vegetative to mid-reproductive stages in corn (early July to September).
The first main result of our analysis indicated a spatial dependence and spatial autocorrelation in all evaluated variables.This result is consistent with the bases of PA and it further affirms the need for site-specific agriculture.Yield and imagery data of each field showed a significant autocorrelation indicating significant within-field variability and dependence of values that are closer in space than values at a further distance.Our result agrees with previous research results on field level spatial variation and autocorrelation on corn yield [46][47][48] and VI [27,49,50].An in depth study of the main reason behind a significant variation in yield and growth will follow this paper, but based on literature, soil nutrient and other resource variation, slope, production and management history of each area, drainage, and level of competition between neighboring plants can be cited.Variation in yield within a field was not same, i.e., in some fields the variation was as low as 33% and in some it was more than 140%.This suggests that the benefit of adopting PA technology could vary in similar percentage from field to field and from one farmer to another depending on site, environment, and management conditions.
The second and the most important outcome from this study showed the potential of mid-season VIs to predict yield with reasonable precision.The degree of precision was greater (up to ~70%) for fields that had significant variability and lower for fields that yielded relatively uniformly across all areas (~30%).Application of PA is crucial for fields that inherently have variability and, therefore, use of high-resolution satellite imagery can be critical for site-specific management.Previous investigations reflected significant correlation between mid-season VIs and final yield at the regional-scale [23,27] but there is scarce information for application of high-resolution satellite imagery data for intra-field level [13] management.
As related to the independent variables calculated from imagery data to predict yield (VIs gathered from the mid-season high-resolution imagery data), the NDVIre was the most retained VI in regression models developed for each field (Table 4).Thus, the VI that better explained the variation in yield was the NDVIre.Therefore, satellite images with the red edge band present a potential to provide higher level of accuracy in forecasting corn yields.This level of accuracy is perhaps result of not only to the sensitivity of the red-edge wavelength to chlorophyll concentration but also due to the plant scattering properties inherent in the crop biomass [51].A strong relationship of ground-truthing NDVIre with corn yield was found at V12 stage (twelve developed leaves) [52].The same authors also documented a stronger relationship of the VIs analyzed but with total biomass.Similar studies used NDVI to predict yield for different crops and imagery data gathered from different satellites [53][54][55][56].However, NDVI tends to saturate at medium to high LAI and yield [15].Notwithstanding of the main role highlighted for the NDVIre, it is imperative to mention that both NDVIG and NDVI were also retained in most field regression models, which reflects the potential of these indices for predicting the intra-field yield variation.At a regional-scale [57], the NDVIG explained a greater proportion of the variation of corn yield.Similar outcome was documented when the VIs was collected in field experiments from aircraft at varying growth stages during the corn growing season [45].
The meta-function overestimated yield for the low productivity zones within the field.There are plausible agronomic explanations for this yield overestimation.The first one is that the timing of the VIs collection; i.e., values are at the highest peak at mid-season in corn [58] related to the peak in leaf area.Second, the possibility of post-flowering biotic or abiotic stress conditions that could have severely reduced expected final yield [59].In addition, spectral reflectance signatures of crops are primarily influenced by physical properties of plants (e.g., chlorophyll content, mesophyll structure, canopy architecture, and biomass).Positional misalignment and the difference in spatial resolution between the image and the yield monitor data could also be sources of uncertainty.End-season yield data were re-projected to improve spatial alignment with the RapidEye image collected at mid-season.Discrepancies could have been introduced during the transformation of the data, which would have influenced the predictive power of the model.Future research can focus on aggregating data-layers from soil-weather-and-management practices for increasing the amount of variation that can be explained via the utilization of mid-season high-resolution satellite imagery in forecasting within-field and site-specific yield variation.
The third main result of our analysis demonstrates inclusion of spatial adjustment in models used for predicting yield.Due to significant spatial correlation demonstrated for yield, models predicting yield should strongly consider the evaluation of spatial correlation.Spatial adjustment to models predicting yield from imagery data is not well studied but our finding of considering spatial adjustment, rather than simple OLS regression, agrees with several researchers [24,[60][61][62].Furthermore, models developed for multiple fields by training similar fields demonstrate reasonable performance predicting yields.
In this study, normalized difference VIs based on ratios of two multispectral bands were tested for their effectiveness in predicting corn yield.More exhaustive analysis of spectral indices (such as simple ratios and differences) and angle indices derived from the angle formed by three spectral bands in a multispectral signature plot [63], in conjunction with new data analysis methodologies, such as machine learning, data meaning, among others, may improve accuracy of yield estimation.
From a practical standpoint, producers, consumers, researchers, policy makers, and grain marketing agencies could use high-resolution satellite imagery to determine the spatial within-field variability prior to harvest in order to make informed decisions based on the crop yield forecast report.Spatial representation of predictive yield also provides valuable information such as size, proximity, spatial arrangement, and connectivity of low and high-productive areas, which simple statistics or numerical data do not provide.The predictive yield maps at a resolution that can resolve variability of yields and its spatial patterns would be valuable for farmers in planning their sub-field scale land use in order to achieve their production goals.In concomitant, identification of different management zones within-fields would allow growers to move more quickly and effectively to variable rate technologies (i.e., seed, fertilizer, chemical inputs, and machinery use), providing not only agronomic improvements and better cost-effective field management but also critical benefits from environmental and economic standpoints of the farming systems.

Conclusions
This study investigated the potential of high-resolution satellite imagery data collected at mid-season for identification of within-field variability and to forecast corn yield at different sites within a field.Two regression models (OLS and SPL) were evaluated.The most striking outcomes of this research were: (1) The regression model using the NDVIr, NIR and NDVIre showed high performance for predicting yield within field.(2) Although within-field corn yield can be modeled and predicted by remote sensing imagery with the OLS method, the remote sensing imagery is only capable of capturing large scale variation and predicting general trends in the grain yield dataset.To model and predict the significant portion of small scale, local variation in the datasets, spatial prediction methods such as spatial regression is needed.(3) The creation of a single algorithm (meta-function) can generate a stable model over site-years with high performance for predicting within-field yields.(4) The predictive yield map (obtained from the SPL model) showed a clear similarity to the end-season yield monitor data (observed yields).
While further research is still needed, this study is one of the first to demonstrate the utilization of mid-season high-resolution satellite imagery to deliver actionable data for forecasting within-field corn yield variation.In addition to determining the robustness of the model using multiple site-years across a region, future studies should evaluate multiple timing of data collection via high-resolution satellite imagery, and more exhaustively exploring other spectral indices and model techniques (e.g., random forest regression) for further improving within-field yield prediction accuracy.

Figure 1 .
Figure 1.Workflow data integration between database, training and validation models.

Figure 1 .
Figure 1.Workflow data integration between database, training and validation models.

Figure 2 .
Figure 2.Estimated (predicted via mid-season high-resolution satellite imagery) versus observed corn yield, Y (end-season yield monitor data), 155 g•kg −1 grain moisture content, expressed in Mg•ha −1 .Coefficient of determination (R 2 ) and root mean-square error (RMSE) was calculated for each field relationship.A dashed line is presented in each panel portraying the 1:1 line for the estimated-observed yield relationship.

Figure 2 .
Figure 2.Estimated (predicted via mid-season high-resolution satellite imagery) versus observed corn yield, Y (end-season yield monitor data), 155 g•kg −1 grain moisture content, expressed in Mg•ha −1 .Coefficient of determination (R 2 ) and root mean-square error (RMSE) was calculated for each field relationship.A dashed line is presented in each panel portraying the 1:1 line for the estimated-observed yield relationship.

Figure 2 .
Figure 2.Estimated (predicted via mid-season high-resolution satellite imagery) versus observed corn yield, Y (end-season yield monitor data), 155 g•kg −1 grain moisture content, expressed in Mg•ha −1 .Coefficient of determination (R 2 ) and root mean-square error (RMSE) was calculated for each field relationship.A dashed line is presented in each panel portraying the 1:1 line for the estimated-observed yield relationship.

Figure 4 .
Figure 4. Predicted corn yield maps generated based on the simple meta-function (OLS) for two fields (F1 and F13) without spatial autocorrelation (a); and yield map based on the spatial meta-function (SPL) for two fields (F11 and F22) with spatial correlation (b) all versus end-season yield data (yield monitor).

Figure 5 .
Figure 5.Estimated corn yield (Y) predicted with global meta-function via SPL method (Table 5 versus observed Y (end-season yield monitor data).Coefficient of determination (R 2 ) and root mean-square error (RMSE) was determined for the global regression.The dashed line portrays the 1:1 line for the estimated-observed yield relationship.

Figure 4 .
Figure 4. Predicted corn yield maps generated based on the simple meta-function (OLS) for two fields (F1 and F13) without spatial autocorrelation (a); and yield map based on the spatial meta-function (SPL) for two fields (F11 and F22) with spatial correlation (b) all versus end-season yield data (yield monitor).

Figure 4 .
Figure 4. Predicted corn yield maps generated based on the simple meta-function (OLS) for two fields (F1 and F13) without spatial autocorrelation (a); and yield map based on the spatial meta-function (SPL) for two fields (F11 and F22) with spatial correlation (b) all versus end-season yield data (yield monitor).

Figure 5 .
Figure 5.Estimated corn yield (Y) predicted with global meta-function via SPL method (Table 5 versus observed Y (end-season yield monitor data).Coefficient of determination (R 2 ) and root mean-square error (RMSE) was determined for the global regression.The dashed line portrays the 1:1 line for the estimated-observed yield relationship.

Figure 5 .
Figure 5.Estimated corn yield (Y) predicted with global meta-function via SPL method (Table 5 versus observed Y (end-season yield monitor data).Coefficient of determination (R 2 ) and root mean-square error (RMSE) was determined for the global regression.The dashed line portrays the 1:1 line for the estimated-observed yield relationship.

Figure 6 .
Figure 6.Predicted yield map generated based on the spatial global meta-function (SPL) for field (F21) that present spatial autocorrelation versus end-season yield monitor data.

Figure 6 .
Figure 6.Predicted yield map generated based on the spatial global meta-function (SPL) for field (F21) that present spatial autocorrelation versus end-season yield monitor data.

Table 1 .
Descriptive information for each farm field collected during 2011-2015 related to county, latitude and longitude, total area (in hectares, ha), date for imagery acquisition (RapidEye platform), final grain yield (adjusted to 155 g•kg −1 grain moisture, expressed in Mg•ha −1 ) and year of yield monitor data.

Table 3 .
Moran's I test (MI)to vegetation indices (VIs) obtained from mid-season high-resolution satellite imagery and yield monitor data.
MI p-Value MI p-Value MI p-Value MI p-Value MI p-Value MI p-Value

Table 4 .
Multiple linear regression models (for the ordinary least-square (OLS_ and spatial econometric (SPL)) including the vegetation indices (VIs) obtained from mid-season high-resolution satellite imagery as predictors of the end-season yield monitor data.