Assessing Multiple Years’ Spatial Variability of Crop Yields Using Satellite Vegetation Indices

: Assessing crop yield trends over years is a key step in site speciﬁc management, in view of improving the economic and environmental proﬁle of agriculture. This study was conducted in a 11.07 ha area under Mediterranean climate in Northern Italy to evaluate the spatial variability and the relationships between six remotely sensed vegetation indices (VIs) and grain yield (GY) in ﬁve consecutive years. A total of 25 satellite (Landsat 5, 7, and 8) images were downloaded during crop growth to obtain the following VIs: Normalized Di ﬀ erence Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Soil Adjusted Vegetation Index (SAVI), Green Normalized Di ﬀ erence Vegetation Index (GNDVI), Green Chlorophyll Index (GCI), and Simple Ratio (SR). The surveyed crops were durum wheat in 2010, sunﬂower in 2011, bread wheat in 2012 and 2014, and coriander in 2013. Geo-referenced GY and VI data were used to generate spatial trend maps across the experimental ﬁeld through geostatistical analysis. Crop stages featuring the best correlations between VIs and GY at the same spatial resolution (30 m) were acknowledged as the best periods for GY prediction. Based on this, 2–4 VIs were selected each year, totalling 15 VIs in the ﬁve years with r values with GY between 0.729** and 0.935**. SR and NDVI were most frequently chosen (six and four times, respectively) across stages from mid vegetative to mid reproductive growth. Conversely, SAVI never had correlations high enough to be selected. Correspondence analysis between remote VIs and GY based on quantile ranking in the 126 (30 m size) pixels exhibited a ﬁnal agreement between 64% and 86%. Therefore, Landsat imagery with its spatial and temporal resolution proved a good potential for estimating ﬁnal GY over di ﬀ erent crops in a rotation, at a relatively small ﬁeld scale.


Introduction
Investigation of problems associated with agricultural yield, before harvesting, most commonly involves observing in-season growth variations, scouting areas at different fertility, and defining optimum soil sampling design according to needs of individual field. All this helps to increase the agricultural economic yield. However, current traditional methods for estimating the crop yield during the growing season may lead to poor yield assessment and inaccurate area appraisal. These methods rely on the precise and detailed collection of crop data, which is cost-intensive and time-consuming [1,2]. In addition, extensive use of nitrogen (N) fertilizers poses a problem of growing concern in agriculture. Surplus N drains into groundwater affecting the quality of drinking water, or seeps into superficial water bodies determining eutrophication [3]. This scenario makes it necessary to adopt efficient crop production methods assuring minimum environmental impact and concurrent optimization The estimation of chlorophyll concentration at leaf canopy involves the Green Chlorophyll Index (GCI), which is directly related to leaf area index and final GY. It ranges from 0 to 6 [34].
Lastly, the Simple Ratio (SR), also known as ratio vegetation index (RVI) [35], is used to eliminate albedo effects in the atmosphere, as it is calculated by band ratio of light scattered in the NIR to light absorbed in the red reflectance (R red ). Hence, SR is close to 1 if the object gets similar reflectance in both red and NIR bands. For a green object, the value ranges from 0 to infinity [36]. SR, which is the simplest VI, owns nonetheless a higher sensitivity to high biomass and LAI values, compared to NDVI [37].
Many studies have successfully addressed various issues through remote sensing: leaf nitrogen content [25,38], leaf area index [28,39], chlorophyll content [39], total biomass [35], and final crop yield [40]. Landsat satellite data at 30 m resolution have already been used to successfully predict crop yields [41,42]. The strength of the relationships between VIs and crop yield is the fundamental premise to this. However, crop spectral reflectance varies with species and depends on the concentration of leaf pigments, soil moisture, biomass structure and ratio of light absorption to transmittance, management practices, and pest and disease outbreaks during crop growth [15,43].
Geostatistics encompasses a series of techniques of crucial importance in the assessment of spatial variability with both ground-based and remotely sensed data [44]. Owing to the discrete assessment of several crop traits, geostatistics serves to adapt them to the same grid size in order to compare them. In spatial studies, variograms are used as main geostatistical tool in the process of kriging. Kriging is one of the commonest techniques to estimate the prediction values from neighbouring actual values, a process known as interpolation [45,46]. Therefore, geostatistics was seen a valuable tool in the perspective of studying the relationship between remote VIs and ground-based crop features, as GY in this work.
Given the ample variation in the spectral composition of VIs, their variable performance at different growth stages of crops, and the resulting uncertainty, this study was intended to explore a significant number of remotely available VIs over a five-year period in a field hosting winter cereals and spring dicots alternating annually. Our specific objectives were to: (i) assess the spatial variability of the surveyed crop yields over five years; (ii) compare the behaviour of the targeted VIs during the growing season of each crop, in view of predicting final yield; (iii) select the best VIs in the five crops as cases for the study of spatial and temporal variability and relationship with final grain yield.

Experimental Site
The field site was located in the plain near Ravenna, Italy (44 • 29 26"N, 12 • 07 44"E, 0 m above sea level), a few kilometres from the Adriatic coast ( Figure 1). An 11.07 ha experimental area was chosen within a larger field (ca. 25 ha) of the Agrisfera Cooperative. Soils in this area have a variable texture depending on changes in sediment dispersal patterns in response to fluctuating sea level [47]. To cope with the limited elevation and shallow water table, a network of underground draining pipes discharging into a ditch on the north side serves the field. The climate falls in the Mediterranean North environmental zone [48], with mild winter and a long growing season, although precipitation is mostly concentrated in the cold semester.    Table 1. Cultivation was based on the good practice for each specific crop, according to local conditions. Each year the field was ploughed in the summertime and harrowed in the autumn (the three years with DW or BW) or winter (SF and CO) for seedbed preparation.

Field Data Collection
At maturity, yield data were collected by a New Holland CR 9080 (CNH Industrial N.V., Basildon, UK) combine harvester using specific headers according to the crop. An average 6170 GY data points per year were registered in the 11.07 ha experimental area. Therefore, each GY pixel covered an average 18 m². The combine harvester was equipped with assisted guiding system based on real time kinematic GPS, yield mapping system consisting of a Pektron flow meter (Pektron Group Ltd, Derby, UK) and Ag Leader moisture sensor (Ag Leader Technology, Ames, IA, USA).
Spatial data of raw yield were saved through the Farm Works™ Mapping software (Trimble Navigation Ltd., Sunnyvale, CA, USA) and exported to ESRI shape file format to be handled in QGIS 2.18.20. Yield data were filtered using Yield Editor software to detect and remove outliers [49] and adjusted at 13% moisture for DW and BW and at 9% moisture for SF and CO. The crop data of each year were intersected with polygon field boundary layer.   Table 1. Cultivation was based on the good practice for each specific crop, according to local conditions. Each year the field was ploughed in the summertime and harrowed in the autumn (the three years with DW or BW) or winter (SF and CO) for seedbed preparation. At maturity, yield data were collected by a New Holland CR 9080 (CNH Industrial N.V., Basildon, UK) combine harvester using specific headers according to the crop. An average 6170 GY data points per year were registered in the 11.07 ha experimental area. Therefore, each GY pixel covered an average 18 m 2 . The combine harvester was equipped with assisted guiding system based on real time kinematic GPS, yield mapping system consisting of a Pektron flow meter (Pektron Group Ltd, Derby, UK) and Ag Leader moisture sensor (Ag Leader Technology, Ames, IA, USA).

Field Data Collection
Spatial data of raw yield were saved through the Farm Works™ Mapping software (Trimble Navigation Ltd., Sunnyvale, CA, USA) and exported to ESRI shape file format to be handled in QGIS 2.18.20. Yield data were filtered using Yield Editor software to detect and remove outliers [49] and adjusted at 13% moisture for DW and BW and at 9% moisture for SF and CO. The crop data of each year were intersected with polygon field boundary layer.

Imagery Acquisition and Processing
Remotely sensed data from Landsat missions were used due to their frequency ensuring a dense time coverage in the investigated period. Landsat 5-Thematic Mapper (TM), Landsat 7-Enhanced Thematic Mapper Plus (ETM+), and Landsat 8-Operational Land Imager (OLI) have already been used in the monitoring of field crops over relatively small areas (6-11.5 ha) [50][51][52].
Collected Landsat (LS) images covered the following growth periods: 1 February-30 June for DW 2010 and BW 2012 and 2014; 1 May-31 August for SF 2011; 1 May-15 July for CO 2013. Landsat remote imagery were downloaded through US Geological Survey (USGS)-Earth Explorer website, a largest remote data network, using Collection Level-1, Tier 1, Precision Terrain (LITP) platform. This is an inventory structure for data collection, containing the highest quality Level-1 data suitable for time-series data record. Moreover, it gives access to all data as originally acquired and has an average revisiting time of 16 days. Landsat data archive equipped with 30 m spatial resolution was used for vegetation monitoring in the five years of the survey.
Specifically, multispectral images were retrieved by selecting the most recent Landsat mission available for each investigated year: LS-5 (TM) was used for DW 2010 and SF 2011, LS-7 (ETM+) for BW 2012, and lastly LS-8 OLI for SF 2013 and BW 2014 ( Table 2). Landsat satellite scenes selection was carried out by accurately evaluating clear sky conditions and the quality of pixels in the field, by using the available data and metadata for each Landsat product. Raster spectral bands (DNs) were converted to surface reflectance values by using the Semi-Automatic Classification Plugin [53] in QGIS by applying simple atmospheric correction under DOS1 method (Dark Object Subtraction 1) [54,55]. Although atmospheric correction for collection level-1 data is not univocally advised, it leads to an improvement in the results and makes the surface reflectance comparable among multi-temporal images [54].
Corrected raster images were intersected with polygon field boundary (11.07 ha), resulting in 126 data points extracted from each remote imagery with 30 m spatial resolution.
Crop stages at the respective dates were expressed with the BBCH scale that assigns a decimal code to the growth stages of mono-and di-cotyledonous plants [56]. A total of 25 images were used, i.e., 5 dates × 5 years.
All VIs were calculated through the algebraic combinations of reflectance values of red, infra-red and green portions of the electro-magnetic spectrum [41], based on the formulas presented in Table 3.
GCI Green chlorophyll index (R NIR /R green ) − 1 [25,30] SR Simple ratio R NIR / R red [26] R NIR , reflectance in the near infrared band; R red , reflectance in the red band; R blue , reflectance in the blue band; R green , reflectance in the green band; L, weighting coefficient = 0.25 (high vegetation density).

Analysis Methods
Georeferenced GY and remote VI data were subjected to geostatistical analysis (ArcGIS software version 10.3) to study the degree of spatial dependence in data distribution. The spatial dependence was calculated by means of empirical semivariogram, according to the following equation: where , γ(h) is the semi-variance at a specified distance, h represents lag distance between two paired points, N(h) is the number of paired points at distance h, Z(x i ) is the measured value at location x i , and Z(x i + h) denotes the secondary value measured at locations x i + h separated by the given distance (h) [57]. The experimental variogram was fitted by means of the spherical, exponential, linear, and Gaussian models. The spherical model exhibited a goodness of fit that was not surpassed by the other models and was, therefore, chosen. The spherical model, thanks to a well-defined sill, easily-interpreted range, and mathematical simplicity, is considered one of the best models for soil or plant variability fitting [45,46,[58][59][60].
Directional sample variograms were also computed for the canonical directions 0, 22.5, 45, 90, 135 degrees. Resulting variograms exhibited similar shapes, leading to the conclusion that directional effects were negligible/absent. Therefore, there was no evidence of spatial anisotropy potentially leading to wrong interpretation in data analysis.
In the spherical model, the following equations are applied: where: C 0 is the nugget, C 0 +C is the sill, a is the range, and h is the separation lag distance [45].
Three parameters need to be defined while fitting the theoretical model to the experimental semivariogram: (i) nugget effect (C 0 ), representing the error or variation in the measurement at minimum sampling distance (h = 0), i.e., the background effect; (ii) sill variance (C 0 + C), composed of C 0 (nugget variance) and C (structural variance), which is the maximum y-axis value increase with lag distance and remains constant beyond distance h; (iii) spatial dependence range (a), indicating the maximum limit at which data points are still spatially correlated and the semi-variogram reaches the sill value; beyond that limit no spatial auto-correlation can be demonstrated.
Through semivariograms, spatial yield maps were produced by simple kriging with 10 m cell size to extrapolate the values to non-sampled field parts [62]. Simple kriging was preferred over ordinary kriging because the mean value was known, which is the premise for a better estimate of the variance. Simple kriging is considered a realistic interpolation method: It provided highest R 2 and minimal error parameters among seven methods in environmental characteristics [60,63] and in the estimation of crop yield [64].
GY data referred to 1106 pixels (10 m cell size) and VI data referred to 126 pixels (30 m cell size) acquired in the 11.07 ha experimental field were submitted to descriptive statistics including mean, median, minimum, maximum, standard deviation (SD), and coefficient of variation (CV). Kolmogorov-Smirnov test was applied to ascertain the normal distribution in the data sets.
Kriged maps of crop yield were aggregated at the same 30 m cell size as the spatial resolution of satellite imagery [42]. Then, raster maps were converted into point data to assess the relationships between remote imagery and crop yields, i.e., between VIs and GY in each respective year, by means of Pearson's correlation [65]. The best times/crop stages for the prediction of final yields were acknowledged as those featuring the highest correlations for all the indices averaged [66,67]. The vegetation index showing the highest correlation with yield was selected from each date falling within these stages in each crop. When two VIs exhibited the same r value, both were selected. Based on this, a series of representative VIs were retained as study cases for each year, and further processed.
Geostatistics (ArcGIS software version 10.3) was applied on GY and selected VIs, in order to describe the data distribution of remote VIs and GY data in terms of semivariogram through the above described spherical model. The intrinsic variation in these two sets of variables was assessed based on their spatial dependence across the field. Geostatistics also served in the analysis between original VI data and kriged GY data for the determination of correspondence levels and final agreement between them.
Spatial maps of remote VIs and GY were shown with the same 30 m cell size [62]. A colour scale was chosen based on quantile classification for all maps produced in this study. All maps were georeferenced and co-registered in reference system of WGS 84/UTM zone 32N-EPSG: 32,632.
The prediction accuracy of VI and GY data was assessed with the spherical model in terms of coefficient of determination (R 2 ) of model-predicted vs. actual observations [68,69], while mean absolute error (MAE) and root mean square error (RMSE) were calculated through kriged residuals [70]. The mean relative error (MRE) and relative RMSE (RRMSE) were calculated as percent ratios on the average VIs and GY values, respectively. Maximum R 2 and minimal error parameters are associated with best model accuracy [63].
After analysing the spatial dependence within VIs and GY data, we investigated the correspondence between them. The correspondence levels were calculated as the proportion of pixels belonging to same remote imagery and crop yield class. To represent the relative similarity between them, remote imagery and crop yield data were classified from lowest to highest values into five classes (i.e., quintiles) of equal frequency over the entire field. Then, the correspondence levels and final agreement (%) between each VI and GY were determined as described by Stępień et al. [71]: For a given pixel, if the class of VI quantile was the same as that of GY quantile, the correspondence was considered 'high' with value 1; if the class of VI and GY belonged to adjacent quantiles, the correspondence was considered 'medium' with value 0.5; if the class of VI and GY quantile were separated by more than one class, the correspondence was considered 'low' with value 0. Based on this, final agreement was calculated as the sum of pixel agreement scores, according to the formula: where: F a = final agreement (%); P h = number of pixels with high agreement; P m = number of pixels with medium agreement; P l = number of pixels with low agreement; P t = total number of pixels.
The coefficient of variation of F a was also calculated as the standard deviation of correspondence levels in percent of their mean.

Descriptive Statistics of Crop Yields
Descriptive statistics of crop yields in the five years is reported in Table 4. In wheat, mean GY ranged between 4.26 and 5.91 t ha −1 . The lower GY was referred to DW 2010 and BW 2012 that showed the same mean data, whereas the higher GY was related to BW 2014. The two dicots (SF 2011 and CO 2013) were comparatively much less productive, attaining less than 2 t ha −1 mean GY. An ample variation in yield data was described by all crops, resulting in a standard deviation that, proportioned to GY, determined a CV value ranging between 23.2% (DW 2010) and 31.7% (CO 2013).

Descriptive Statistics of Remote Indices
The five years' remote indices are described in Table 5. In DW 2010, the six VIs in the five growth stages ranging from beginning of stem elongation to heading (Table 2) outlined the mean values reported in Table 5. The complete descriptive statistics of this year's data is shown in the supplementary material (Table S1). Mean values augmented until a peak at BBCH stage 43-55. Mean and median values were always very close, never diverging by more than 5%. VI variation was more contained than GY variation in the same year (Table 4): Average CV was 12.3%, and highest CV reached 23.9%. Lastly, data were more often normally distributed (non-significant Kolmogorov-Smirnov test in 23 cases out of 30).
In SF 2011, the six VIs in the five growth stages ranging from beginning of stem elongation to seed ripening (Table 2) exhibited the mean values reported in Table 5. The complete descriptive statistics of this year's data is shown in the supplementary material (Table S2). Mean values increased until a peak at approximately BBCH stage 51 (inflorescence just visible), although EVI staged higher value at BBCH stage 30. Mean and median values never diverged by more than 6%. VI variation was more contained than GY variation in the same year (Table 4): Average CV was 11.2%, and highest CV was 20.5%. Data were almost equally divided between normally distributed (17 cases out of 30) and non-normally distributed (the remaining 13 cases).
In BW 2012, the six VIs in the five growth stages ranging from mid-stem elongation to early dough ripening (Table 2) staged the mean values reported in Table 5. The complete descriptive statistics of this year's data is shown in the supplementary material (Table S3). Mean values increased until a peak at BBCH stage 63 (anthesis), except EVI that showed a very low value at this stage and peaked at BBCH stage 83 (early dough). Mean and median values sometimes diverged, especially at BBCH stage 41. Data variation was also fluctuating: Average CV was 31.9%, but CV values above 100% were also recorded. Lastly, data were normally distributed only in 2 cases out of 30. In CO 2013, the six VIs in the five growth stages ranging from mid-stem elongation to beginning of ripening (Table 2) showed the mean values reported in Table 5. The complete descriptive statistics of this year's data is shown in the supplementary material (Table S4). Mean values increased until BBCH stage 71 (fruiting). Mean and median values never diverged by more than 4%. VI variation was more contained than GY variation in the same year (Table 4): Average CV was 9.4%, and highest CV was 22.5%. Despite this, data were normally distributed only in 5 cases out of 30. In BW 2014, the six VIs in the five growth stages ranging from early stem elongation to soft dough ripening (Table 2) featured the mean values reported in Table 5. The complete descriptive statistics of this year's data is shown in the supplementary material (Table S5). Mean values increased until BBCH stage 55 (heading). Mean and median values were generally similar; only in three cases they diverged by 10% to 13%. VI variation was more contained than GY variation in the same year (Table 4): average CV was 13.2%, and highest CV was 30.2%. Nonetheless, data were normally distributed only in 3 cases out of 30.

Correlations and Choice of Representative Vegetation Indices
In this experiment, Pearson's correlations between VIs and GY in the five years are reported in Table 5. High, statistically significant r values were generally obtained, indicating that remote indices across variable growth stages were aligned with final yields. We visually checked all correlations (graphs not reported in MS) to spot if any cloud of data outlined a curvilinear relation, finding none.
At growth stages of incipient senescence (BBCH>80), VIs were insignificantly/negatively correlated with GY (BW 2012 and 2014), because senescence encourages the breakdown of pigments that influence the reflectance properties of leaf canopy.
Overall, the three wheat crops (DW 2010, BW 2012 and 2014) staged r values above 0.700 in the average of the six VIs at BBCH stages not exceeding 50, i.e., before heading (Table 5). Afterwards, the r values sharply declined in DW 2010 (at BBCH 55, average r = 0.405), whereas in BW 2012 and 2014 they remained quite high until the mid-60s and mid-70s BBCH stages, respectively. Owing to the morph-physiological similarity between durum and bread wheat, it is sensed that the different behaviour was due to different ambient conditions during the reproductive phase in 2010 vs. 2012 and 2014.
In contrast to wheat, the two dicot species (SF 2011 and CO 2013) staged r values above 0.700 in the average of the six VIs at BBCH stages >50, i.e., during the reproductive phase (Table 5). SF 2011 maintained r values above 0.700 until the mid-reproductive phase (BBCH 67), whereas CO 2013 remained well above this threshold until later stages (BBCH 81).
Based on the procedure described in Section 2.4, the 2-4 VIs with best correlations with GY (highlighted in Table 5) were selected each year as representative study cases. They covered growth stages from BBCH 33 to BBCH 81. SR and NDVI were the two indices most frequently chosen, six and four times, respectively. At the opposite end, SAVI never exhibited correlations high enough to be included in the list.

Spatial Variability in Crop Yields
Analysis of the spherical semivariogram fitted to GY data is reported in Table 6. The lag distance was between 6.79 and 9.97 m, (data omitted in Table 6), depending on a particular crop of the year. SF 2011 was the crop showing the strongest background effect (C 0 ) on total (C 0 +C) variance, associated with a high CV (Table 4). Compared to this, the very low C 0 weight on C 0 +C indicates almost no discontinuity in the other four crops' spatial structure. The five crops exhibited a sill comprised between 0.92 (DW 2010 and BW 2012) and 1.09 (CO 2013), meaning a quite similar total variance. However, the range of the spatial dependence varied amply, being comprised between 65.90 m (BW 2012) and 99.87 m (CO 2013). Finally, the proportion between the two components, nugget to sill variance, varied considerably between SF 2011 where C 0 represented 34% of C 0 +C, and the other four crops where C 0 represented 0% to 2% of C 0 +C. As a result, SF 2011 can be considered a case of moderate spatial dependence, whereas the other four crops featured strong spatial dependence [61]. The performance of the spherical model in the 5 years was generally good (Table 6). R 2 was always highly significant, ranging between 0.71 and 0.99, while the two error terms (MAE and RMSE) were low. This is especially true with respect to GY means (Table 4), resulting in relative error terms (MRE and RRMSE) that were between 2.2% and 10.6% (MRE) and between 3.1% and 15.5% (RRMSE).
Yield maps exhibited a quite similar pattern in the five years, as shown in Figure 2. The north side of the field always featured a lower GY than the south side. Especially the three years with wheat (DW 2010, BW 2012 and 2014) behaved in a similar way with lowest yield rank in the north-central portion of the field, and highest yield rank in the south-southwestern portion. Compared to this, the two spring sown crops (SF 2011 and CO 2013) exhibited a slightly wider area of low GY in the north portion of the field. These two crops also featured higher CV (Table 4), i.e., higher yield variation than the three wheat crops.

Spatial Variability in Remote Vegetation Indices
Analysis of the spherical semivariogram fitted to VI data is reported in Table 7. The lag distance was between 8.17 and 15.30 m (data omitted in Table 7), depending on specific VI, crop and stage. The nugget (C0) was always nil, indicating continuous spatial dependence of data over distance [44]. The sill (C0+C) was comprised between 1.18 (NDVI in BW 2014 at BBCH stage 55) and 1.42 (GNDVI in SF 2011 at BBCH stage 51). The range was comprised between 108.84 m (EVI in BW 2012 at BBCH stage 36) and 154.25 (NDVI and SR in CO 2013 at the respective BBCH stage 71 and 81). Finally, owing to nil nugget effect, the spatial dependence was always strong, i.e., 100% of the sill was associated with the structural component (C).
The performance of the spherical model in the 15 cases was good under all viewpoints (Table 7). R² was always ≥0.95, and the two error terms (MAE and RMSE) were low, especially with respect to mean VI values (Table 5). In fact, the relative error terms (MRE and RRMSE) were comprised between 0.6% and 5.6% of the mean.
In Figure 3, remote VI maps and the respective GY maps are depicted at the same cell size (30 m). A palette of five colours indicates values ranging from very low (red) to very high (dark green), passing through low (orange), medium (yellow) and high (light green). The 15 VIs, whose choice was based on their correlations with yield (Table 5), also from a spatial viewpoint described a pattern across the study area consistent with crop yield pattern: lower values in the north portion of the field, and conversely higher values in the south-southwestern portion.

Spatial Variability in Remote Vegetation Indices
Analysis of the spherical semivariogram fitted to VI data is reported in Table 7. The lag distance was between 8.17 and 15.30 m (data omitted in Table 7), depending on specific VI, crop and stage. The nugget (C 0 ) was always nil, indicating continuous spatial dependence of data over distance [44]. The sill (C 0 +C) was comprised between 1.18 (NDVI in BW 2014 at BBCH stage 55) and 1.42 (GNDVI in SF 2011 at BBCH stage 51). The range was comprised between 108.84 m (EVI in BW 2012 at BBCH stage 36) and 154.25 (NDVI and SR in CO 2013 at the respective BBCH stage 71 and 81). Finally, owing to nil nugget effect, the spatial dependence was always strong, i.e., 100% of the sill was associated with the structural component (C). DW, durum wheat; SF, sunflower; BW, bread wheat; CO, coriander; C 0 , nugget effect; C, structural variance; C 0 +C, sill variance; a, range; SD, spatial dependence (S, strong) ** , significant at P ≤ 0.01; MAE, mean absolute error; MRE, mean relative error; RMSE, root mean square error; RRMSE, relative root mean square error.
The performance of the spherical model in the 15 cases was good under all viewpoints (Table 7). R 2 was always ≥0.95, and the two error terms (MAE and RMSE) were low, especially with respect to mean VI values (Table 5). In fact, the relative error terms (MRE and RRMSE) were comprised between 0.6% and 5.6% of the mean.
In Figure 3, remote VI maps and the respective GY maps are depicted at the same cell size (30 m). A palette of five colours indicates values ranging from very low (red) to very high (dark green), passing through low (orange), medium (yellow) and high (light green). The 15 VIs, whose choice was based on their correlations with yield (Table 5), also from a spatial viewpoint described a pattern across the study area consistent with crop yield pattern: lower values in the north portion of the field, and conversely higher values in the south-southwestern portion.

Correspondence between Remote Vegetation Indices and Crop Yields
The apparent similarity in spatial pattern and geostatistical structure between VIs and GY is the premise for assessing the correspondence between remote indices and crop yields at the same pixel level (Table 8). In the 15 cases, high correspondence ranged between 42% and 72%, medium correspondence between 27% and 50%, and low correspondence between 1% and 16%. The resulting Fa ranged between 64% and 86%. Highest agreement was observed in BW 2014 (average of 4 VIs, 83%), lowest in SF 2011 (average of two VIs, 65%).  Some VIs at specific growth stages evidenced sharp differences between the two areas, as in the case of EVI and SR at BBCH 36 in BW 2012. Some other VIs depicted faded differences, as SR at BBCH 67 in SF 2011. However, spatial maps visually demonstrate that a small number of high remote VIs were associated with low final GY and vice versa.

Correspondence between Remote Vegetation Indices and Crop Yields
The apparent similarity in spatial pattern and geostatistical structure between VIs and GY is the premise for assessing the correspondence between remote indices and crop yields at the same pixel level (Table 8). In the 15 cases, high correspondence ranged between 42% and 72%, medium correspondence between 27% and 50%, and low correspondence between 1% and 16%. The resulting F a ranged between 64% and 86%. Highest agreement was observed in BW 2014 (average of 4 VIs, 83%), lowest in SF 2011 (average of two VIs, 65%).

Discussion
The CV data related to mean GY in the five years (Table 4) denote high yield variation, based on a CV threshold of 20% indicating high variation in field attributes [72]. This was a favourable circumstance in the present study, allowing us to appraise crop behaviour under conditions of sizeable spatial variability. Despite high CV, median GY in the five years was quite close to mean GY, differing from this latter by no more than ±4%. However, in all five years GY data were not normally distributed (significant Kolmogorov-Smirnov test), although normality is not a prerequisite for kriging interpolation [57].
In general, crop yield variation reflects the interactions among soil-related factors or topography [73]. Moreover, variation within the same plant genus (e.g., wheat) may be due to several factors as growth response to the amount of rainfall during specific crop stages, owing to the fact that wet years favour biomass accumulation thanks to higher soil water availability [58].
The similar spatial patter exhibited by yield maps across the five years ( Figure 2) is echoed in a previous survey on a narrower portion (4.15 ha) of the same field [74]. In that work, soils in the northern part were shown to be quite sandy and poor in organic matter. It is evinced, therefore, that under rainfed cultivation the two spring crops (SF and CO) may have suffered stronger drought during the reproductive stage in the summer time, compared to the three wheat crops sown in the autumn and maturing in the springtime.
High values of the six VIs in the respective scales indicate healthy, well growing crops. However, high VI values alone are not always good predictors for high GY because higher canopy biomass prior to grain filling only contributes to good growth status, possibly due to adequate water availability, which is not always a prerequisite for maximum grain production [75].
Notwithstanding this, the good correlations between VIs and GY (Table 5) are a remarkable outcome, considering that mono-cotyledonous (DW and BW) and di-cotyledonous (SF and CO) crops have different reflectance properties due to differences in leaf mesophyll cells and structure of front and back leaf side [76]. In other works, this has resulted in remote VIs behaving crop specifically according to leaf canopy structure [37].
The different relationships between VIs and GY in winter wheat (DW 2010, BW 2012 and 2014) vs. the two spring dicots (SF 2011 andCO 2013), consisting in the former crop showing better correlations in the vegetative phase and the latter two crops in the reproductive phase (Table 5), is at least partially associated with the time elapsed from seeding to the stage at which good correlations were found. In fact, wheat is cultivated as winter crop while sunflower and coriander can only be cultivated as spring crops in the specific climate [48]. It results that wheat cycle lasted an average 251 days from seeding to harvest, while sunflower cycle lasted 155 days and coriander cycle only 90 days (Table 1). Wheat scored the best correlations in the three years combined (Table 5) Table 2). Sunflower scored the best correlations from 83 to 108 DAS and coriander from 59 to 82 DAS (Table 2). Thus, the two dicots had been living for a shorter time, although they had reached a more advanced stage ( Table 2). It is perceived, therefore, that the long time passed when the first VIs were acquired in wheat may have served the plant to better sense the environmental differences within the field and translate them into a spectral response in agreement with the final yield. The two spring dicots featuring a shorter cycle also necessitated a certain amount of time to experience the same differences, and good correlations between VIs and GY were achieved at later stages. However, we have not found other works addressing this point, meaning that these findings are not echoed in the literature, to our best knowledge. Therefore, further evidence is needed to corroborate this hypothesis.
All the VIs acquired in these two time frames specific for wheat and spring dicots performed in a similar way. Assuming r = 0,700 as the threshold for good correlation, NDVI, GNDVI, GCI, SR, and EVI fell below this value only once (Table 5). However, even in that case NDVI, GNDVI, GCI, and SR decreased to levels slightly lower than this threshold (r between 0.631 and 0.698, depending on VI), whereas EVI plummeted to r = 0.480 in CO 2013 at BBCH 55 (Table 5). Compared to them, SAVI never fell below this threshold. Therefore, the choice of the best VI was not so critical within a suitable time frame, despite relevant differences in spectral composition and calculation formula.
Several works were found similar to our study, i.e., remote VIs successfully estimated spatial crop yields; Labus et al. [77] found a strong relationship (R 2 = 75.3%) between NDVI and wheat yield. Herbei and Sala [52] studied the relationship between growth stages and VIs in sunflower and found a maximum correlation with NDVI (R 2 > 0.97) at flowering, followed by a decline at maturity. Plant et al. [40] explained that NDVI is sensitive to canopy reflectance decreasing its correlation with cotton yield over a small field area. Yields of wheat, maize, rice, sugarcane, and soybean were also successfully estimated by means of NDVI [67,78,79]. After positive achievements with NDVI, other VIs proved also effective in the estimation of different crop yields [15,24,41,42,51,80,81].
The good performance of the spherical model in fitting GY experimental semivariograms (Table 6) is consistent with the equally good performance of the same model in fitting VI experimental semivariograms ( Table 7). The higher sill values in the model structure of VI semivariograms is assumed to indicate a higher percentage of canopy cover because young plants have less contrast in reflected light spectra resulting in lower sill variance [82]. However, at growth stages less than BBCH 30, as in our 15 cases (Table 7), full canopy cover is normally achieved, resulting in a random variation of sill data. Additionally, VI ranges largely exceeded GY ranges (Table 6), indicating a farther reaching spatial dependence of VI vs. GY data. This, in turn, supports the adoption of the 30 m Landsat resolution also for GY data to be compared [82]. Lastly, geostatistical analysis indicates a strong spatial dependence of VIs over distance (h), and the same was shown in GY data. In both cases, this is the premise for a continuous appraisal of the investigated traits over field surface.
In accordance with the similarities shown in spatial pattern and geostatistical analysis, final agreement between remote indices and crop yields at pixel level (Table 8) was, unsurprisingly, related to the R 2 values between VIs and GY obtained from the r values reported in Table 5. More surprisingly, F a data appeared to be adversely related to F a coefficient of variation (Table 8). In other words, very good agreement was obtained when the variation among the three correspondence levels (high, medium, low) was limited, as in the case of SR at BBCH stage 37 and 77 in BW 2014 (F a , 85% to 86% with respective CV, 27.1% to 27.5%). Conversely, a modest agreement was obtained when strong variation occurred among the three correspondence levels, as in the case of SR at BBCH stage 67 in SF 2011 (F a , 64% with CV, 56.4%). In theory, F a and CV are reciprocally independent, i.e., high F a could also be obtained with high CV and vice versa. This finding is of non-univocal interpretation and, to our knowledge, has not been reported so far in the literature.
Several reasons can explain why VI ranking into quintiles does not fully match the same ranking of GY data. Water availability, nutrient uptake, crop management practices, weather conditions, pests and diseases, etc., can influence growth status and final yield to a different extent: Resilience can be shown when a crop withstands unfavourable growth conditions (low VI values), attaining a fairly good yield. Conversely, rigidity is shown when slightly unfavourable growth conditions (relatively high VI values) result in poor yield. This latter case is also determined by the fact that GY may be reduced also when above-ground biomass is not significantly reduced.
An overall F a of 74% (Table 8) is nevertheless a good outcome: It means that almost 3 4 of the 126 cells belong to the same VI and GY quintile, despite ample variation in both VI and GY data.
There is still no general consensus in the use of rank comparison between spatially variable GY and remotely sensed VIs. In soybean [81], the combined SR and SAVI together with soil elevation concurred to delineate management zones reflecting significant differences in final yield and soil properties. More generally, in winter cereals (barley and wheat), NDVI from Landsat as well as higher resolution satellites (QuickBird and WorldView-2) proved effective in describing yield spatial variability, especially at the onset of the reproductive stage [83]. The same happened in winter oilseed rape with the Enhanced Moisture Stress Index [80]. No other hint of rank comparisons between VIs and GY can be found in the literature to our best knowledge.
The relationships (r and R 2 values) between VIs and final GY in the three cited sources [80,81,83] are no better, in general, than those found in the present study (Table 8). In this light, the above discussed data of F a (Table 8) may be considered encouraging and set the premise for a larger use of remote (satellite) imagery in the interpretation and subsequent management of crop spatial variation.
In this respect, the combined use of Sentinel-2A and Landsat data, providing a larger number of data and a higher spatial resolution, would foster a more detailed assessment that could be especially useful in identifying growth stages in different crops [84]. Data fusion from different platforms would overcome drawbacks related to the use of a single, medium-resolution satellite by improving the field temporal sampling.
Lastly, the use of Landsat collection level-1 data raises the issue of atmospheric data correction. In our study, we decided to apply a simple correction as the DOS1 method despite the good quality data (Level-1, Tier 1, Precision Terrain platform), to ensure comparability of surface reflectance among multi-temporal images [54,85]. More sophisticated models for atmospheric correction, as those distributed by USGS under Climate Data Records, were not adopted, as we could not collect specific in-field measurements. However, usefulness of atmospheric correction is questioned due to the risk of adding more errors [86], which is especially true in the case of large areas surveyed.

Conclusions
Landsat satellite imagery with its spatial and temporal resolution exhibited a good potential for estimating the final GY over different crops in a chronological rotation, at a relatively small field scale (11.07 ha). This is further stressed by the circumstance that the spatial resolution provided by the Landsat system (30 m) was shown sufficient to characterize crop variation across a field of moderate extension. This sets the premise for a wider use of satellite data in yield predictions during the growing season, beside their role in supporting site specific management of crop practices.
Simple ratio and NDVI were the two VIs most frequently selected as best indices, compared to EVI, SAVI, GNDVI, and GCI across stages ranging from vegetative (BBCH~30s) to reproductive phase (BBCH~70s). In contrast to this, SAVI never exhibited correlations with yield high enough during these stages, to be included among best VIs. Pixel level study demonstrated a generally good agreement between the five classes of VIs, on one side, and those of GY, on the other side.
Beside these 15 cases showing the best correlations with yield, two specific time frames were identified for winter wheat and the two spring dicots, showing high and consistent correlations between all remote indices, in general and final yields. Based on our data, this sets the premise for a reliable use of VIS in yield predictions. Outside these time frames, the relationships between VIs and GY are less consistent, and subjected to factors of difficult interpretation.
Future monitoring possibilities fostered by the Copernicus Sentinel 2 mission from the European Space Agency will allow imagery to be acquired from the year 2015 onwards, with spectral channels aligned with Landsat and Spot satellites. Improvements are expected from a higher spatial resolution (10-20 m) and radiometric resolution (reflectance registered with 16 bit). This is coupled with a revisit frequency of 2-3 days, ensuring the creation of dense time series for crop growth monitoring.
Based on this, good prospects may be envisaged for improved yield forecasts over large agricultural areas, as well as better support to farmers for specific decision management strategies.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/11/20/2384/s1, Table S1: Descriptive statistics of remote VIs derived from Landsat imagery during growing period of durum wheat in 2010, Table S2: Descriptive statistics of remote VIs derived from Landsat imagery during growing period of sunflower in 2011, Table S3: Descriptive statistics of remote VIs derived from Landsat imagery during growing period of bread wheat in 2012, Table S4: Descriptive statistics of remote VIs derived from Landsat imagery during growing period of coriander in 2013, Table S5