Predicting Wheat Yield at the Field Scale by Combining High-Resolution Sentinel-2 Satellite Imagery and Crop Modelling

Accurate prediction of crop yield at the field scale is critical to addressing crop production challenges and reducing the impacts of climate variability and change. Recently released Sentinel-2 (S2) satellite data with a return cycle of five days and a high resolution at 13 spectral bands allows close observation of crop phenology and crop physiological attributes at field scale during crop growth. Here, we test the potential for indices derived from S2 data to estimate dryland wheat yields at the field scale and the potential for enhanced predictability by incorporating a modelled crop water stress index (SI). Observations from 103 study fields over the 2016 and 2017 cropping seasons across Northeastern Australia were used. Vegetation indices derived from S2 showed moderately high accuracy in yield prediction and explained over 70% of the yield variability. Specifically, the red edge chlorophyll index (CI; chlorophyll) (R2 = 0.76, RMSE = 0.88 t/ha) and the optimized soil-adjusted vegetation index (OSAVI; structural) (R2 = 0.74, RMSE = 0.91 t/ha) showed the best correlation with field yields. Furthermore, combining the crop model-derived SI with both structural and chlorophyll indices significantly enhanced predictability. The best model with combined OSAVI, CI and SI generated a much higher correlation, with R2 = 0.91 and RMSE = 0.54 t/ha. When validating the models on an independent set of fields, this model also showed high correlation (R2 = 0.93, RMSE = 0.64 t/ha). This study demonstrates the potential of combining S2-derived indices and crop model-derived indices to construct an enhanced yield prediction model suitable for fields in diversified climate conditions.


Introduction
Wheat (Triticum aestivum L.) is the 5th largest staple crop consumed by people worldwide [1]. Global wheat consumption reached a record high of 736 million tonnes in 2016/2017, which is a growth of 25% during the past 15 years (www.nass.usda.gov). In addition, while the rate of increase in global cereal yields has slowed or stagnated, the demand for wheat is projected to increase in future as the global population increases [2]. Therefore, the sustainable production of wheat will have a crucial bearing on food security over the coming decades.
In Australia, wheat is the main winter crop and contributes significantly to the national economy [3]. Significant volatility exists in wheat production due to large fluctuations in yield and area planted across the broad cropping region of Australia, often associated with the high climatic variability arising from El Niño or La Niña events [4]. Within this context, crop prediction has proved to be a with some success. This approach used leaf area index (LAI) estimates from satellite imagery to select and update the most likely crop growth and yield simulation at the pixel scale from a plethora of simulations generated using the Agricultural Production Systems Simulator (APSIM) crop modelling platform [15].
The launch of Sentinel 2 (S2) in 2015 provided the opportunity to overcome the identified constraints via its high spatial (10 m to 60 m), spectral (13 bands) and temporal resolutions (five-day return periods). S2 carries a multispectral instrument (MSI) payload that samples 13 spectral bands (https://sentinel.esa.int). With four bands in the visible to near-infrared range at 10 m resolution, four bands in the red edge range at 20 m resolution, two bands in the short-wave range at 20 m resolution and three bands at 60 m, it provides opportunity to explore new frontiers in crop yield prediction. Currently S2 can provide imagery every five days under ideal circumstances, which facilitates characterizing the entire crop life cycle at high temporal resolution. Some studies have utilized and validated the potential for using S2 for land cover classification [30], crop mapping [31,32], canopy chlorophyll and nitrogen content quantification [33,34] and yield estimation of other crops, with reasonable success [35][36][37][38].
Despite these advantages, cloud contamination on the optical images may still result in insufficient data for capturing the stresses experienced by a crop, especially those around flowering and during grain filling that crop will reduce grain number and weight and thus total grain yield [39]. Previous research has successfully estimated crop stress status during these critical periods using the thermal bands from Landsat [14] or via a simulated crop stress index derived from crop models as long as suitable weather data are available [12]. However, the use of Landsat imagery is usually constrained by missing data due to its low temporal frequency and cloud cover.
The aim of this study was to investigate the potential to utilize S2 MSI imagery for predicting dryland wheat yields at the field scale in North Eastern Australia (NEAUS). Firstly, VIs and crop growth attributes were derived from S2 time series. Secondly, the capacity of these VIs and crop growth attributes to predict wheat yield at the field scale was assessed. Thirdly, the potential to improve yield prediction by integrating a crop stress index simulated from the shire scale Oz-wheat model with VIs was explored. The predictive models were developed using data from 89 fields across two diverse cropping seasons in NEAUS. The models were contrasted quantitatively, and their robustness was evaluated using data from 14 independent fields at different locations. The implications and utility to industry of the findings of this study were discussed.

Study Area
Field data for model training was acquired from a broad acre cropping farm located in the Moree Plains Shire in Northern New South Wales ( Figure 1). Cropping systems on this farm are mixed dryland winter cereals and irrigated summer cotton. Total area of the property is 24,000 ha-of which, 10,400 ha is winter wheat, grown in rotation with cotton. During a normal season, only half of the irrigated area is cropped at any one time, with the other half maintained as a bare fallow. Summer cotton is usually planted during October, while winter wheat is sown in May to June and harvested during November. Long-term weather records (available from 1995 to present) from the nearest meteorological station (Moree Aero, ~30 km from the farm; latitude: 29.49° S, longitude: 149.85° E, elevation: 213 m), showed an average rainfall of 576 mm, with substantial within-and between year variations. Annual rainfall was 527 mm (323 mm in-crop) and 512 mm (120 mm in-crop) for 2016 and 2017, respectively. However, the two years experienced significantly high within-season variability. During 2016, rainfall mainly occurred during the winter crop-growing period with the highest amount recorded in September. Conversely, for 2017, the highest rainfall amount occurred during the fallow period before sowing (March) and significantly lower rainfall was observed during the winter crop-growing period. At the same time, maximum temperature during 2017 was significantly higher than that of 2016 ( Figure 2a). This resulted in two significantly diverse cropping seasons and thus an ideal data set for model calibration and training.
Additional field data from fields located in Moree Plain Shire (close to the training fields) and Bendemere Shire in South-eastern Queensland was collected for independent model validation. Climate records at Somerset station (next to the selected fields in Bendemere, latitude: 26.47° S, longitude: 148.97° E, elevation: 354 m) show that annual rainfall there was 628 mm (310 mm in-crop) and 465 mm (205 mm in-crop) for 2016 and 2017 season, respectively. This variation was similar to that observed for the training fields ( Figure 2b). As a result, while model testing was done on a completely independent data set from that of the training data set, both had similar dry land cropping environments. However, the independent test ensures model robustness and utility in application when out scaling to different regions and seasons. Long-term weather records (available from 1995 to present) from the nearest meteorological station (Moree Aero,~30 km from the farm; latitude: 29.49 • S, longitude: 149.85 • E, elevation: 213 m), showed an average rainfall of 576 mm, with substantial within-and between year variations. Annual rainfall was 527 mm (323 mm in-crop) and 512 mm (120 mm in-crop) for 2016 and 2017, respectively. However, the two years experienced significantly high within-season variability. During 2016, rainfall mainly occurred during the winter crop-growing period with the highest amount recorded in September. Conversely, for 2017, the highest rainfall amount occurred during the fallow period before sowing (March) and significantly lower rainfall was observed during the winter crop-growing period. At the same time, maximum temperature during 2017 was significantly higher than that of 2016 ( Figure 2a). This resulted in two significantly diverse cropping seasons and thus an ideal data set for model calibration and training.
Additional field data from fields located in Moree Plain Shire (close to the training fields) and Bendemere Shire in South-eastern Queensland was collected for independent model validation. Climate records at Somerset station (next to the selected fields in Bendemere, latitude: 26.47 • S, longitude: 148.97 • E, elevation: 354 m) show that annual rainfall there was 628 mm (310 mm in-crop) and 465 mm (205 mm in-crop) for 2016 and 2017 season, respectively. This variation was similar to that observed for the training fields ( Figure 2b). As a result, while model testing was done on a completely independent data set from that of the training data set, both had similar dry land cropping environments. However, the independent test ensures model robustness and utility in application when out scaling to different regions and seasons.

Yield data
The wheat yield data at the field scale was acquired from the Sundown Pastoral Company Pty. Ltd. This data was collated by yield monitors mounted on a combine harvester. Field-scale yield was aggregated from point-scale recordings at approximately 2 m intervals. In total, yield data (t/ha) for 46 fields (total area 5013 ha, field area ranging from 8 to 832 ha) in the 2016 crop season and 43 fields (total area 4714 ha, field area ranging from 6 to 554 ha) in the 2017 crop season constituted the training data set for model development and calibration ( Figure 1). Yield ranged from 2 to 6 t/ha in 2016 and 0.2 to 2.3 t/ha in 2017.
The calibrated model was then validated on a completely independent data set that had a total of 14 fields for the 2016 and 2017 seasons. Eight of these fields with yield data collected in 2016 (total area 1312 ha, field area ranging from 44 to 379 ha) were located in Moree Plains Shire (30 km away from the training fields). The remaining six fields for 2017 season (total area 572 ha, field area ranging from 18 to 215 ha) were located 320 km away in Bendemere Shire in Queensland ( Figure 1). Averaged yield was 1.3 t/ha and 4.3 t/ha for the 2016 and 2017 fields, respectively.  Table 1).

Satellite imagery and atmospheric corrections
The collected S2 TOA reflectance were corrected to surface reflectance using a 6S (Second Simulation of the Satellite Signal in the Solar Spectrum) atmospheric correction approach [40]. Py6S, an interface to the 6S through Python programming language, was implemented to perform the correction within GEE [41,42]. The atmospheric condition for each image scene (i.e., water vapor, ozone and aerosol optical thickness values) were determined using data products that are available in GEE.

Yield Data
The wheat yield data at the field scale was acquired from the Sundown Pastoral Company Pty. Ltd. This data was collated by yield monitors mounted on a combine harvester. Field-scale yield was aggregated from point-scale recordings at approximately 2 m intervals. In total, yield data (t/ha) for 46 fields (total area 5013 ha, field area ranging from 8 to 832 ha) in the 2016 crop season and 43 fields (total area 4714 ha, field area ranging from 6 to 554 ha) in the 2017 crop season constituted the training data set for model development and calibration ( Figure 1). Yield ranged from 2 to 6 t/ha in 2016 and 0.2 to 2.3 t/ha in 2017.
The calibrated model was then validated on a completely independent data set that had a total of 14 fields for the 2016 and 2017 seasons. Eight of these fields with yield data collected in 2016 (total area 1312 ha, field area ranging from 44 to 379 ha) were located in Moree Plains Shire (30 km away from the training fields). The remaining six fields for 2017 season (total area 572 ha, field area ranging from 18 to 215 ha) were located 320 km away in Bendemere Shire in Queensland ( Figure 1). Averaged yield was 1.3 t/ha and 4.3 t/ha for the 2016 and 2017 fields, respectively.  Table 1).

Satellite Imagery and Atmospheric Corrections
The collected S2 TOA reflectance were corrected to surface reflectance using a 6S (Second Simulation of the Satellite Signal in the Solar Spectrum) atmospheric correction approach [40]. Py6S, an interface to the 6S through Python programming language, was implemented to perform the correction within GEE [41,42]. The atmospheric condition for each image scene (i.e., water vapor, ozone and aerosol optical thickness values) were determined using data products that are available in GEE.

Selection of Vegetation Indices
A total of 14 spectral VIs were explored to determine their efficacy in accounting for wheat yield variability among fields ( Table 2). VIs are transformations of multiple spectral bands and wavelengths that allow the interpretation of variations in canopy structural and photosynthetic activities [16]. Here, we grouped the 14 selected indices into two categories as "canopy structural indices" (VI STR ) and "chlorophyll indices" (VI CHL ), based on their differences in wavelengths used ( Table 2).
Six VI STR indices were used to test the contribution of canopy structure to final yield. These indices are mainly known for their ability to capture light properties within the visible and near-infrared spectral regions. The Simple Ratio (SR) is sensitive to green vegetation and is highly correlated with LAI and leaf biomass [43]. NDVI is widely known for its capacity for assessing regional and global vegetation fluctuations. However, it is more sensitive to reflectance from soil background and also tends to saturate for high biomass canopies [16]. The Difference Vegetation Index (DVI) is a simple index but very sensitive to changes in soil background [44]. The optimized soil-adjusted vegetation index (OSAVI) is a soil-adjusted vegetation index optimized for agricultural monitoring, which has been applied to estimate crop canopy cover and monitor crop growth [45]. The Enhanced Vegetation Index (EVI) is an optimized vegetation index that shows improved sensitivity for measuring terrestrial vegetation that has high biomass or LAI, while better accounting for canopy background and atmosphere influences [16]. Lastly, the Enhanced Vegetation Index 2 (EVI2) is a transformation of the EVI, but without blue band [46].
Eight VI CHL indices and their ability to predict final wheat yield were also tested. Canopy chlorophyll content is strongly related to nitrogen supply and is therefore a strong predictor for crop yield [47]. Most chlorophyll-related indices focus on capturing reflectance in regions of the electromagnetic spectrum where (1) chlorophyll absorption is greatest, i.e., red band (R) region, (2) reflectance drastically increases towards the near-infrared band, i.e., red edge (RE) band region, and (3) the leaf cell structure produces strong reflection, i.e., near-infrared band region (NIR). Details of all indices tested in this study are listed in Table 2. Table 2. Details of selected canopy structural-and chlorophyll-related spectral indices tested in this study. The term "R Bi " in the equations denotes the surface reflectance at corresponding Sentinel-2 bands used for calculating the indices.

Index Equation Reference
Canopy structural-related indices ( VI STR )

Overview
Here, we employed a three-step approach: (i) construction of indices, (ii) analysis and model calibration and (iii) model validation. The first step was used to derive VI time series and to calculate attributes reflecting crop canopy development (e.g., canopy green up, senescence and peak) using the VI time series. During this step, we also simulated the crop stress index. The second step involved the development of the field-scale model. The model validation step was used to test model robustness and predictive ability for an independent set of fields. For creating the VI time series, we reconstructed the time sequential images acquired from the atmospheric corrected S2 MSI imageries. Metrics associated with peak canopy values and metrics relating to rates of green up and senescence (based on the regression slopes) before and after the peak were then calculated. Two crop stress indices (water stress and canopy temperature) were calculated using the biophysical crop model (Oz-Wheat, for water stress) and Landsat 8 imagery (for canopy temperature), respectively. The yield prediction model was developed using statistical analysis of variance (ANOVA) and a multivariate analysis incorporating all derived metrics and indices.

Extraction of Canopy Development-Related Metrics
Metrics derived from the S2 VI time series included peak, rate of green up (R GU ) and rate of senescence (R SEN ). Peaks of canopy values were derived by calculating the maximum value of each VI index derived from S2 time series between end of May and early November each year. May to November encapsulated the full crop growth period for winter crops in the study and training sites. The VIs were calculated according to their definitions outlined in Table 2 using GEE. Field-scale VIs were aggregated from the pixels within each field, as defined by field boundaries. The boundaries were provided by the field data producer and their accuracy was visually checked against S2 images showing clear field extensions. The derived VI series were then analyzed with the ArcGIS platform (Esri Inc., Version10.6.1).
The NDVI time series was divided into two portions before and after the date at which peak NDVI was observed. All available dates between start to the peak NDVI and peak to end were used to fit regression lines between NDVI values and date. The slopes of the fitted lines constituted the R GU (before peak) and the R SEN (after peak). These metrics enabled the accurate detection of the growth rates (before and after peak), and magnitude and exact timing of full canopy for each field crop. These three metrics (peak, R GU and R SEN ) closely mimic crop phenology attributes and therefore likely increase the discriminative power in predicting crop yield responses (e.g., timing of flowing and vegetative and grain filling phases) to environmental conditions.

Calculation of Crop Stress Using Biophysical Model
We investigated the ability of two crop stress-related metrics to predict variations in field-scale wheat yields. These were (i) the simulated Crop Stress Index (SI), which relates strongly to stress due to water limitation in dryland cropping regions, and (ii) land surface temperature (LST), which is a surrogate for crop stress and a measure of transpiration restriction at the canopy level.
SI was determined via an existing optimization calibration approach to determine the level of water supply to demand ratio experienced during crop growth [3,55] given as: Here, E a is the actual evapotranspiration and E t is the potential evapotranspiration. This ratio qualifies crop water stress and relates well to the physiological responses of a crop to water deficit. This index is simulated with the Oz-wheat model on a daily time step and averaged over an optimized critical period during crop growth around flowering. SI has values ranging from 0 to 1, where a SI value of 1 (E a = E t ) indicates no water limitation and values below 1 indicate increasing water stress experienced by the wheat crop [3,55].
Oz-wheat is a simple daily time-step model adapted from an agro-climatic wheat stress index model [56][57][58], which is sensitive to a water deficit or excess during the growing season. The model input parameters for each shire (i.e., potential available water content, planting rain and stress index period) have been selected based on the best fit when calibrated against actual shire wheat yields from the Australian Bureau of Statistics for the period 1976-2000, 2005, 2010 and 2015. More details on the development and function of the SI within Oz-wheat can be found in Potgieter et al. [56]. This study used the Oz-wheat model to simulate aggregated SI values for Moree Plains (total cropping area 1.1 × 10 6 ha, data derived from www.agriculture.gov.au) and Bendemere (total cropping area 4.2 × 10 4 ha) over the 2016 and 2017 cropping seasons. The daily SI was simulated for each of the 15 and 4 climate stations within the cropping area for the Moree Plains (approximately 72,000 ha per station) and Bendemere (approximately 10,000 ha per station) shires, respectively. The model integrates information from each station to a shire level driven by water deficits, historical climate data, three fixed crop maturities (i.e., late, medium, quick; derived from APSIM [59]) and a dynamical sowing date criteria triggered by a rainfall threshold of 15 mm in a five day period.

Calculation of Crop Stress Using Thermal Imagery
The second crop stress approach involves calculating the land surface temperature, which is known to be highly correlated to crop stress at canopy and field levels. High temperatures during grain filling are known to reduce potential wheat yields [60,61]. We used available Landsat thermal band to characterize canopy thermal conditions during a critical period around peak canopy cover (+/− 20 days). Canopy surface temperature during the period was determined using the Planck function: Here, LST is the land surface temperature in degrees Kelvin, CV R is the cell value as radiance, and ε is the emissivity, which was set as 0.95 in this study [62]. K1 (774.89) and K2 (1321.08) are thermal conversion constants. Maximum (LST max ), minimum (LST min ) and average (LST mean ) temperatures during the period were then determined by an overlay analysis in ArcGIS (Esri Inc., Version10.6.1).

Approaches for Estimating Wheat Yield at the Field Scale
We investigated the ability of S2 VI time series-derived canopy development-related metrics and the proposed SI to determine the variance explained and accuracy of wheat yields at the field scale. The metrics and SI were related to field-scale yields using a linear regression model (Equation (3)).
Furthermore, we analyzed the ability when combining the structural and chlorophyll indices to predict field-scale wheat yield. (Equation (4)). Finally, SI was included with and without the VIs models to determine the likely contribution of crop water stress status during flowering and grain filling at the field scale (Equation (5)).
where a, b, c and d are the regression coefficients for each predictor. The coefficient of determination (R 2 ), the prediction error (root mean square error, RMSE) and statistical significance of the model were computed to assess the performance of each index and phenological metric. Yield data for the 89 fields (46 in 2016, 43 in 2017) were used to calibrate the models.

Model Validation and Accuracy
A cross-validation approach was employed to determine the robustness and goodness of fit of the yield models. Yield data for the 89 fields was randomly partitioned into 10 "folders". A single "folder" was retained as the cross-validation data for testing the model, with yield data in the remaining nine folders used as training data to fit the model. This cross-validation procedure was repeated 10 times to constitute the cross-validated R 2 and RMSE [63].
Model accuracy was determined for each model by using the Mean Absolute Percent Error (MAPE) metric. MAPE was then transformed into a percentage as follows: where A t represents the actual yield and F t is the predicted yield. For validation, we applied the best derived model to predict observed yields for the independent validation dry land cropping data set (14 fields).

Simulated Crop Stress and Observed Yield across Seasons
The simulated SI values for the training data set located in Moree Plains Shire were 0.95 (little stress) and 0.59 (moderately stressed) for 2016 and 2017, respectively. These values varied between the two years and reflected the crop stress experienced by the crop at flowering and grain filling due to water limitation. The simulated stress aligned well with the total rainfall recorded during the in-crop period. Specifically, rainfall from May through October (323 mm) was 46% above the average (221 mm) and accounted for 61% of the total annual rainfall received during 2016. Conversely, 77% of the rainfall occurred during the non-cropping period (fallow), while extremely low rainfall occurred during the booting to flowering period in 2017 ( Figure 2).
Consequently, large differences in wheat yields were observed between the two cropping seasons across fields. The average wheat yield (46 fields) for 2016 was 4.12 t/ha, while much lower yields were observed in the 2017 cropping season, with an average of 1.14 t/ha (Figure 3). The wide distribution of yields across fields and between the two seasons provided an ideal training data set for developing a predictive yield model that can be used across diverse environmental conditions and crop management practices.

Predicting yield using single canopy development-related metrics
Overall, both VISTR-and VICHL-derived peaks showed moderately strong linear relationships with field-scale wheat yields, while RGU and RSEN showed slightly weaker relationships. Specifically, both PeakOSAVI and PeakNDVI had similar and moderately high linear correlations in predicting fieldscale wheat yields (R 2 = 0.74, RMSE = 0.91 t/ha, p < 0.001, Table 3 & Figure 4a). PeakEVI, PeakEVI2, PeakSR and PeakDVI showed slightly lower correlations and higher RMSE compared to PeakOSAVI and PeakNDVI.
The best chlorophyll model was obtained when using PeakCI (R 2 = 0.76, RMSE = 0.88 t/ha). Equally, PeakNDRE1 and PeakNDRE2 showed similar correlation coefficients. TCARI, which is specifically designed for detecting chlorophyll status, did not show improved prediction accuracy, nor did its combination with OSAVI (i.e., TCARI/OSAVI). The metrics capturing the rates of canopy development before (RGU) and after (RSEN) full canopy had moderate correlation with yield (R 2 < 0.56). The RGU metric had much lower skill (R 2 = 0.49) than the peak VI metrics (R 2 > 0.91) in predicting final yield. Further research is required if better prediction of yield at the field scale is required well in advance of flowering.

Predicting Yield Using Single Canopy Development-Related Metrics
Overall, both VI STR -and VI CHL -derived peaks showed moderately strong linear relationships with field-scale wheat yields, while R GU and R SEN showed slightly weaker relationships. Specifically, both Peak OSAVI and Peak NDVI had similar and moderately high linear correlations in predicting field-scale wheat yields (R 2 = 0.74, RMSE = 0.91 t/ha, p < 0.001, Table 3 & Figure 4a). Peak EVI , Peak EVI2 , Peak SR and Peak DVI showed slightly lower correlations and higher RMSE compared to Peak OSAVI and Peak NDVI . The best chlorophyll model was obtained when using Peak CI (R 2 = 0.76, RMSE = 0.88 t/ha). Equally, Peak NDRE1 and Peak NDRE2 showed similar correlation coefficients. TCARI, which is specifically designed for detecting chlorophyll status, did not show improved prediction accuracy, nor did its combination with OSAVI (i.e., TCARI/OSAVI). The metrics capturing the rates of canopy development before (R GU ) and after (R SEN ) full canopy had moderate correlation with yield (R 2 < 0.56). The R GU metric had much lower skill (R 2 = 0.49) than the peak VI metrics (R 2 > 0.91) in predicting final yield. Further research is required if better prediction of yield at the field scale is required well in advance of flowering.
Adding SI to the models generated above further improved the correlations (R 2 ) by more than 12% across all models. Typically, either PeakOSAVI or PeakNDVI with PeakCI and SI showed substantially higher R 2 of 0.91 for predicting wheat yields -including SI as a covariate improved the model strength (in terms of R 2 ) by more than 18% compared to omitting it. At the same time, the error decreased by around 37% from 0.86 t/ha to 0.54 t/ha. Adding crop stress derived from Landsat LST (LSTmax and LSTmean) during the critical period around peak canopy resulted in lower R 2 and larger RMSE when combined with the other main structural and chlorophyll indices ( Table 4). The R 2 values between actual yield and LSTmax and mean of LSTmean around peak canopy were 0.76 and 0.47, respectively. Hence, LST was not included for final model development and validation.

Predicting Wheat Yield Using Combined Model
A multivariate analysis was conducted to determine the best combination of VI STR and VI CHL to predict field-scale wheat yields. Overall, "Peak OSAVI + Peak CI " and "Peak NDVI + Peak CI " were the two combinations showing highest correlations with field yields. All models showed similar RMSEs between 0.84 to 0.86 t/ha ( Table 4). The strongest relationship was observed for the model using Peak NDVI and Peak CI to predict wheat yield (R 2 = 0.78, RMSE = 0.84 t/ha). Auto-correlations between indices (at peak) ranged from R 2 = 0.70 (NDVI~SR) to R 2 = 0.99 (EVI~EVI2), and R 2 = 0.29 (TCARI~TCARI/OSAVI) to R 2 = 0.99 (NDRE1~NDRE2) for structural and chlorophyll indices, respectively. The correlation between Peak OSAVI and Peak CI was 0.90, while correlation between Peak NDVI and Peak CI was 0.85.
Adding SI to the models generated above further improved the correlations (R 2 ) by more than 12% across all models. Typically, either Peak OSAVI or Peak NDVI with Peak CI and SI showed substantially higher R 2 of 0.91 for predicting wheat yields -including SI as a covariate improved the model strength (in terms of R 2 ) by more than 18% compared to omitting it. At the same time, the error decreased by around 37% from 0.86 t/ha to 0.54 t/ha. Adding crop stress derived from Landsat LST (LST max and LST mean ) during the critical period around peak canopy resulted in lower R 2 and larger RMSE when combined with the other main structural and chlorophyll indices ( Table 4). The R 2 values between actual yield and LST max and mean of LST mean around peak canopy were 0.76 and 0.47, respectively. Hence, LST was not included for final model development and validation.

Model Calibration and Validation
Predicted versus observed yield relationships for the calibration field data are shown in Figure 5 for the field-scale wheat yield models with best fit (Equations (7) and (8)).
Both models showed significantly high cross-validated correlation for predicting field-scale wheat yield. For Peak OSAVI and the Peak CI in combination with the SI, the correlation was high (R 2 = 0.90), with a small prediction error (RMSE = 0.56 t/ha). The MAPE ranged from close to 0% to 273% and averaged 33% across all fields and years. Average MAPE was 10% and 59% for the 2016 and 2017 cropping season, respectively. It is important to note that fields showing high MAPE% values had small portions within the fields with extremely low observed yields. For example, there are four fields (in 2017) with observed yields less than 0.3 t/ha and the MAPEs for these fields ranged from 146% to 273%. These fields had extremely low peak values in OSAVI and NDVI but were not excluded for completeness (data not shown). When excluding these fields with high MAPE%s (e.g., MAPE > 50%), the average yield for 2017 fields was recorded at 1.2 t/ha and the MAPE averaged 28%. Conversely, for the 2016 season, MAPEs for the fields ranged from 0.2% to 47% with most of the fields (40 out of 46 fields) showing MAPE values well below 20% at the field scale.
Similarly, the modelled yields using Peak NDVI , Peak CI and simulated SI was consistent with the observed values (R 2 = 0.90, RMSE = 0.57 t/ha). The MAPE ranged from close to 0% to 316% across fields and years. Average MAPE were 11% and 57% for the 2016 and 2017 cropping season, respectively. Again, larger predict errors (i.e., high MAPE values) were associated with fields that had extremely low observed yields. This was evident for the 2017 cropping season (d). The relative poorer performance in predicting fields that have extremely low yields (< 1 t/ha) does not necessarily suggest that the sensing indices are unable to detect those lower peaks (at pixel scales). The phenomenon is more likely related to poor crop emergence (crop cover) or completely failed crops within a field due to various biotic (pests and diseases) or abiotic stresses (water, high temperatures, soil constraints) [54]. Operationally, such pixels with low peak values will be excluded when aggregating to a field level. However, for exemplifying the sensitivity of indices and completeness of our analysis, pixels with low peak values in crop canopies were retained. Predictive models were tested using an independent data set. Both models showed significantly high correlations in predicting wheat yield for the independent test data set ( Figure 6). Specifically, the OSAVI based model in combination with Cl and SI showed high correlation coefficient (R 2 = 0.93, RMSE = 0.64 t/ha). Similarly, the NDVI, CI and SI model had an R 2 of 0.93, RMSE of 0.63 t/ha and an average the MAPE of 20%. Although both models had predictions close to the 1:1 line (broken line Figure 6a,c), there was a slight underestimation for observed yields less than 2 t/ha. Furthermore, relatively large aggregated errors, greater than 30% (MAPE), were mainly related to field with observed yields less than 1 t/ha. Predictive models were tested using an independent data set. Both models showed significantly high correlations in predicting wheat yield for the independent test data set ( Figure 6). Specifically, the OSAVI based model in combination with Cl and SI showed high correlation coefficient (R 2 = 0.93, RMSE = 0.64 t/ha). Similarly, the NDVI, CI and SI model had an R 2 of 0.93, RMSE of 0.63 t/ha and an average the MAPE of 20%. Although both models had predictions close to the 1:1 line (broken line Figure 6a,c), there was a slight underestimation for observed yields less than 2 t/ha. Furthermore, relatively large aggregated errors, greater than 30% (MAPE), were mainly related to field with observed yields less than 1 t/ha.

Within-field scale application of combined model
The NDVI-based model (Equation 7) was applied to the time series of S2 imagery for each pixel. Figure 7 shows the predicted yields for each pixel and the MAPE of aggregated predicted yield versus observed yield for each field. Substantial within-field variation was captured using the predictive yield model at the pixel scale. A pixel-scale yield map for the 2017 wheat fields is include as Figure  S1.

Within-Field Scale Application of Combined Model
The NDVI-based model (Equation (7)) was applied to the time series of S2 imagery for each pixel. Figure 7 shows the predicted yields for each pixel and the MAPE of aggregated predicted yield versus observed yield for each field. Substantial within-field variation was captured using the predictive yield model at the pixel scale. A pixel-scale yield map for the 2017 wheat fields is include as Figure S1.

Discussion
This study investigated potential of combining high-resolution S2 imagery and crop modelling to predict wheat yield at the field scale. Here, we analyzed the sensitivity of VI and remotely sensed crop development metrics to predict actual wheat yields. The chlorophyll-related indices (PeakCI and Figure 7. Applying the field-scale yield model to pixels within each field. The values for each field indicate the MAPE (%) of predicted vs. actual yield at the whole-field scale.

Discussion
This study investigated potential of combining high-resolution S2 imagery and crop modelling to predict wheat yield at the field scale. Here, we analyzed the sensitivity of VI and remotely sensed crop development metrics to predict actual wheat yields. The chlorophyll-related indices (Peak CI and Peak NDRE ) had similar or slightly higher R 2 than the structural VIs. Previous studies have reported similar findings, i.e., changes in red edge-based indices had a strong linear correlation with canopy chlorophyll content and thus better ability to predict yield [34,64,65]. While time series CI-derived metrics are useful indicators of crop canopy health and nitrogen status that can have large effects on biomass accumulation, grain formation and ultimate final yield [34], our results indicate similar predictive skill with better structural-and chlorophyll-related indices.
Combining VIs related to structural and chlorophyll content resulted in a small increase in overall predictive skill for actual yields at the field scale compared to using only structural VIs. Because both structural and chlorophyll VIs respond to different aspects of plant processes during crop growth (i.e., morphological and physiological), both were included in the final models. Further, for considering regions outside the calibration location and diverse crop management practices (such as modern cultivars with targeted stay-green traits [66]), a more comprehensive model was desirable.
The final model incorporated crop stress-related metrics (i.e., temperature and SI) with structural and chlorophyll VIs. Inclusion of temperature from Landsat imagery and the simulated crop stress (SI) increased the overall performance of the wheat yield model, with the latter being more effective. Such indices are known to be able to mimic crop stresses due to high temperature and/or water limitation, specifically around flowering and during grain filling, where they cause a reduction in final wheat yield [60,61]. Furthermore, the SI lends itself to higher utility in an operational sense for predicting crop yields well before peak canopy, which is not at all possible with a "pure" RS approach using LST and vegetation indices. In addition, acquisition of LST data on a regular time scale can be limited due to cloud cover and missing data.
Previous studies have indicated that changes in climate and farm management practices, causing local variations in crop canopies and ensuing crop yields [67,68], would likely affect the applicability of any yield prediction models to other regions and seasons [7,69]. The model developed in this study integrated both structural-and chlorophyll-related indices that are main contributors to final crop potential, with simulated crop stress index that incorporates effects on yield driven by environmental factors such as rainfall and temperature. Part of the model's discriminative power is driven by the magnitude in the peak of the selected structural and chlorophyll indices. The timing of when the peak occurs can differ between regions mainly due to the interaction between varieties (maturity types) and growing degree days (temperature) [70]. Unfortunately, variety type and sowing date data were not available in this study due to data privacy laws and could therefore not be analyzed. However, it is likely to have little to no effect on the accuracy and application of the derived model since timing is not a covariate of the model. Furthermore, by incorporating the SI as covariate, we accounted for some of the differences in phenology (flowering) and temperature across regions. It is anticipated that utilizing SI at the station level closest to the field will enhance the specificity and accuracy of the yield prediction. Therefore, the method for developing models utilized in this study provided a robust operational framework that could be easily adapted to yield prediction across other regions and seasons. This approach, however, is only applicable to dryland wheat cropping. For irrigated wheat crops that are not water limited, other yield-environment indices such as the photo thermal quotient (PTQ) would need to be explored [71].
Finally, this study exemplified the clear advantage of utilizing the fine spatial resolution of S2 (10 m to 20 m) to predict yield within a field. When applying the field-scale models to predict yield at the pixel level, we found appreciably low MAPEs across the training data set (Figure 7). This approach will have considerable utility for precision agriculture. For example, by identifying the low/medium/high-yield regions within a field (e.g., Figure 7), and their distribution across seasons, it is possible to support approaches to specifically targeted management that might result in higher profitability. Although this study is not exhaustive, it showed high predictability of wheat yields at field and within-field scales from remotely-sensed and modelled indices. Further targeted research will be pursued to explore the utility of these enabling digital technologies to the agricultural industry and government agencies to aid decision making.

Conclusions
In this study, we assessed the potential for integrating high-resolution satellite imagery (S2) with crop stress-related indices to predict wheat yield at the field scale. Yield prediction models were developed by deriving the indices relating to canopy structure, chlorophyll canopy status and simulated crop stress. The models were calibrated and validated across a diverse set of environments during the 2016 and 2017 winter cropping seasons in north-eastern Australia.
The final selected model integrating structural, chlorophyll and SI indices showed significant and high cross-validated correlation (R 2 = 0.90) for predicting field-scale wheat yield. When using the derived model to predict wheat yield for independent fields the correlation remained significantly high (R 2 = 0.93). From this study, it is clear that a predictive model using a VI or more than one VI (structural and/or chlorophyll) is less accurate than a model combining VIs (structural and chlorophyll) with the simulated SI (crop stress around flowering) to predict final harvested wheat yield. The biophysical index, such as the SI used here, provides an innovative avenue to enhance yield prediction from remote sensing approaches. This is mainly due to SI's ability to capture crop stress during flowering and grain filling periods in these water-limited environments. It is anticipated that producers, as well as related agricultural industries (e.g., bulk handlers, grain storage, transportation), will be able to utilize these enhanced predictive technologies to achieve reliable and accurate estimates of yield at field and within-field scales. Having access to spatial information on likely crop yields before harvest will not only aid in tactical decisions regarding precision agriculture but will likely aid government and private agencies responsible for commodity market forecasts, crop insurance products, and the collation of commodity forecast information at regional and national scales.