1. Introduction
Over the last 50 years, world wheat production has increased with a positive trend of around 8.7 million tons per year (value derived from the statistics of [1]). The maximal surface allocated to wheat was reached in the 1980s with 239 million ha. With current values higher than 200 million ha, wheat is the world’s most abundant crop in terms of harvested area. In France, one fifth of the useful agricultural surface is dedicated to the cultivation of wheat [2]. In this context, spatial information delivered by satellite missions represents a unique opportunity to assist management decisions in precision agriculture.
Several ongoing satellite missions provide images that allow the intra-plot variability of crop status to be monitored at a spatial scale consistent with recent advances in yield sensors onboard harvesting machines. The usefulness of remote sensing signals acquired in the optical or microwave wavelengths has thus been demonstrated in different studies for the detection of phenological stages, agricultural practices, or crop damage, as well as for the survey of crop properties (e.g., leaf area index, crop height, biomass) [3,4,5]. Regarding yield estimates, different approaches have been proposed to take full advantage of such regular information, by assimilating images into agro-meteorological models or by using statistical algorithms [6,7,8,9,10].
The objective of this study is to estimate the intra-plot variability of wheat yields by combining satellite acquisitions performed by Landsat-8 and Sentinel-2A, and by taking advantage of previously measured yields. Ground data collected by sensors onboard harvesting machines are described in Section 2, together with high spatial resolution images. The proposed approach is based on random forest, considering reflectance as the predictive variable, with or without previous observations, and crop yields as the target (Section 3). The results are analyzed and discussed in Section 4, focusing on the overall statistical performance obtained for four successive agricultural seasons and showing one example of a yield map.
2. Materials
2.1. Study Site
The study area is located in southwestern France in the Gers County. Surrounded by valleys, the territory is characterized by a great diversity of landscapes and types of soil comprising ustic luvisols, limestone, clay–limestone, or more sandy soils. The county is subject to oceanic and Mediterranean climatic influences, with a precipitation regime that varies spatially and from year to year. The useful agricultural area covers 71% of the territory (or 447,223 ha), being mainly dedicated to the cultivation of seasonal crops (cereals for 44.5% or oleaginous and proteinaceous for 24%) or forage crops and evergreen surfaces for 19% [2]. The present paper focuses on the main cultivated crop in the Gers County, wheat, for which the agricultural season, delineated by the sowing and harvesting periods, runs from autumn to the summer of the following year.
2.2. Intra-Plot Yield Data
The yields of wheat were collected during four successive agricultural seasons over 10 or 12 field plots, depending on the considered year. The same plots were dedicated to the cultivation of wheat during the rotations 2014–2016 and 2015–2017. Their sizes ranged from 3.2 to 28.6 ha, representing a total of more than 500 ha over the period of the study. Descriptive statistics (i.e., means and standard deviations) were derived at the plot scale. The mean values of yield ranged from 41.9 to 67.8 q.ha−1, with a variability depending on the considered plot, as evidenced by the coefficients of variation (CV = 100 × standard deviation / mean) that ranged from 11.2% to 32.1%. A maximal difference of ~8.5 q.ha−1 was observed between the most and least productive years (i.e., the difference between the 2015 and 2017 agricultural seasons, showing values of 58.9 and 50.4 q.ha−1 on average, respectively).
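As an illustration of the coefficient of variation defined above, the following minimal sketch computes CV for one plot. The yield values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(yields):
    """Return CV (%) = 100 * standard deviation / mean."""
    m = mean(yields)
    if m == 0:
        raise ValueError("mean yield is zero; CV is undefined")
    # Assumption: population standard deviation (the paper does not specify).
    return 100.0 * pstdev(yields) / m


# Hypothetical pixel yields (q/ha) for one plot, for illustration only.
plot_yields = [40.0, 50.0, 60.0]
cv = coefficient_of_variation(plot_yields)
```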
The yield values were derived from the data collected by the surveying harvesting machine with a GPS system in track mode, namely the distance, the width of the cutting bar, the flux, and the humidity of grain. The distance and the width of the cutting bar were first combined to obtain the area matching the grain flux. The harvested yields were then computed, and dry yields were finally calculated by accounting for the humidity of grain. All the measurements performed within a pixel with a spatial resolution of 30 m were aggregated, excluding extreme values (i.e., values lying beyond three sigma from the average, so that about 99.7% of the values were retained). Those maps of yields constitute the targeted variable of the statistical algorithm.
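The processing chain described above can be sketched as follows. This is only an illustrative sketch under several assumptions that are not stated in the text: the grain flux is taken as a grain mass (kg) per recorded segment, the dry yield is obtained by removing the water fraction, the three-sigma filter is applied over all measurements of a plot rather than per pixel, and coordinates are expressed in a metric projection. All function and parameter names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

PIXEL_SIZE_M = 30.0  # spatial resolution of the aggregated yield map


def dry_yield_q_ha(distance_m, width_m, grain_mass_kg, humidity_pct):
    """Dry yield (q/ha) for one harvester record (assumed formulation)."""
    area_m2 = distance_m * width_m            # harvested area of the segment
    wet_yield = grain_mass_kg * 100.0 / area_m2  # kg/m2 -> q/ha
    return wet_yield * (1.0 - humidity_pct / 100.0)


def aggregate_to_pixels(points, n_sigma=3.0):
    """Average yields per 30 m cell after removing extreme values.

    points: list of (x_m, y_m, yield_q_ha) tuples.
    """
    values = [v for _, _, v in points]
    mu, sigma = mean(values), pstdev(values)
    cells = defaultdict(list)
    for x, y, v in points:
        if abs(v - mu) <= n_sigma * sigma:  # keep values within three sigma
            cell = (int(x // PIXEL_SIZE_M), int(y // PIXEL_SIZE_M))
            cells[cell].append(v)
    return {cell: mean(vs) for cell, vs in cells.items()}
```

For instance, a hypothetical record of 36 kg of grain at 15% humidity, harvested over 10 m with a 6 m cutting bar, would correspond to a dry yield of 51 q/ha under these assumptions.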
2.3. Satellite Data
The satellite images acquired during the four agricultural seasons are presented in Table 1. From October 2013 to July 2017, 52 high spatial resolution images were provided by Landsat-8 (12, 14, and 7 images, in 2014, 2015, and 2016, respectively) and Sentinel-2 (7 and 12 images, in 2016 and 2017, respectively). From 12 to 14 regular acquisitions were available for the monitoring of wheat, from the sowing to the harvest of the crop. The approach developed in this study relied on comparable spectral bands, that is, reflectance measured for blue, green, red, near infrared, and short-wavelength infrared.
3. Methodology
The satellite images were first processed by applying the steps described hereinafter (Section 3.1) to obtain reflectance at a 30 m spatial scale. The images acquired during the agricultural season (i.e., after the sowing and before the harvest, that is from November to July over the study area) constitute the input data of the statistical algorithm (Section 3.2).
3.1. Image Processing
The Landsat-8 and Sentinel-2 images were provided by the Theia land data center. The satellite data were processed using the software developed by [11], delivering level 2A products characterized by ortho-rectified surface reflectance. The images were first corrected for atmospheric effects and provided with a mask of clouds and their shadows on the ground (using a multi-temporal algorithm). All the images were finally resampled to the same spatial resolution of 30 m. The satellite images constitute the input data of the statistical algorithm described hereinafter, considering two cases: the widely used NDVI (Normalized Difference Vegetation Index) or the combination of the six reflectances.
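For reference, the NDVI is computed from the red and near infrared reflectances as (NIR − Red) / (NIR + Red). The minimal sketch below derives an NDVI time series for one pixel and skips the acquisitions flagged by the cloud mask; the reflectance values and the tuple layout are hypothetical and serve only as an illustration.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index from surface reflectances."""
    total = nir + red
    if total == 0:
        return None  # undefined when both reflectances are zero
    return (nir - red) / total


def ndvi_series(acquisitions):
    """NDVI per acquisition; None where the pixel is masked as cloudy.

    acquisitions: list of (red, nir, is_cloudy) tuples (hypothetical layout).
    """
    return [None if cloudy else ndvi(red, nir)
            for red, nir, cloudy in acquisitions]


# Hypothetical reflectances of one pixel on three dates.
pixel = [(0.08, 0.20, False), (0.05, 0.45, False), (0.30, 0.32, True)]
series = ndvi_series(pixel)
```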
3.2. From Satellite Signals to Yield Estimates
A random selection of samples was used to partition the dataset into independent training and testing sets. Different ratios of data were tested; nevertheless, for the sake of conciseness, the present study focuses only on the performance obtained using a ratio of 50:50.
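A random 50:50 partition of this kind can be sketched as below. The function name, the seed, and the representation of samples are hypothetical; the sketch only shows one simple way to obtain two disjoint sets.

```python
import random


def split_samples(samples, train_fraction=0.5, seed=0):
    """Randomly partition samples into disjoint training and testing sets."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical pixel identifiers.
train_ids, test_ids = split_samples(range(10))
```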
The estimation of yield was based on random forest [12], involving conditional regression models to predict a quantitative variable. Such a statistical algorithm combines an ensemble of independent decision trees trained on different sets of samples through a procedure of bootstrap aggregation. The estimates provided by the individual decision trees are finally aggregated through their weighted mean, providing an estimate of the targeted variable. This non-parametric approach is particularly appropriate in a multi-factorial context to model non-linear relationships, limiting overfitting and the influence of noise in the data, and providing highly stable results.
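The principle of bootstrap aggregation of regression trees can be illustrated with the simplified sketch below. It is not the implementation used in this study: the trees are deliberately basic, the aggregation uses an unweighted mean (equal weights), the data are synthetic, and all names and parameter values are hypothetical. In practice, an established library implementation would normally be preferred.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, rng, max_depth=6, min_leaf=5, n_features=None):
    """Grow one regression tree; a leaf is a float, a node is a tuple."""
    if max_depth == 0 or len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return mean(targets)
    n_total = len(rows[0])
    features = rng.sample(range(n_total), n_features or n_total)
    best = None  # (sse, feature index, threshold)
    for f in features:
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:
        return mean(targets)
    _, f, threshold = best
    lo = [i for i, r in enumerate(rows) if r[f] <= threshold]
    hi = [i for i, r in enumerate(rows) if r[f] > threshold]
    kwargs = dict(max_depth=max_depth - 1, min_leaf=min_leaf,
                  n_features=n_features)
    return (f, threshold,
            build_tree([rows[i] for i in lo], [targets[i] for i in lo],
                       rng, **kwargs),
            build_tree([rows[i] for i in hi], [targets[i] for i in hi],
                       rng, **kwargs))


def predict_tree(node, row):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def fit_forest(rows, targets, n_trees=25, seed=0, **tree_kwargs):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx],
                                 rng, **tree_kwargs))
    return forest


def predict(forest, row):
    """Aggregate the tree estimates with an unweighted mean (assumption)."""
    return mean([predict_tree(tree, row) for tree in forest])


# Synthetic data: two hypothetical predictors (e.g., an NDVI value and a
# scaled previous yield) and a yield depending linearly on the first one.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(120)]
y = [50.0 + 20.0 * a + data_rng.gauss(0, 1) for a, _ in X]
forest = fit_forest(X, y, n_trees=10, max_depth=4, min_leaf=5)
low = predict(forest, [0.1, 0.5])
high = predict(forest, [0.9, 0.5])
```

On these synthetic data, the estimate for a high value of the first predictor is expected to exceed the estimate for a low value, which reflects the relationship used to generate the targets.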
The coefficient of determination (R²) and the root mean square error (RMSE) were finally derived from the comparison between the observed and estimated yields at the pixel scale. In the following section, only the results obtained from the independent testing set are presented.
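The two scores can be computed as in the sketch below. The definition of R² as 1 − SS_res / SS_tot and the expression of the relative error as the RMSE divided by the mean observed yield are assumptions, since the text does not give the exact formulas; the yield values are hypothetical.

```python
from math import sqrt
from statistics import mean


def rmse(observed, estimated):
    """Root mean square error between observed and estimated yields."""
    return sqrt(mean([(o - e) ** 2 for o, e in zip(observed, estimated)]))


def r_squared(observed, estimated):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    m = mean(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_rmse(observed, estimated):
    """RMSE as a percentage of the mean observed yield (assumed definition)."""
    return 100.0 * rmse(observed, estimated) / mean(observed)


# Hypothetical observed and estimated yields (q/ha) on three test pixels.
obs = [50.0, 60.0, 70.0]
est = [52.0, 60.0, 68.0]
scores = (r_squared(obs, est), rmse(obs, est), relative_rmse(obs, est))
```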
4. Results and Discussion
4.1. Overall Performances for the Four Successive Agricultural Seasons
The statistical performances obtained using all the available images acquired throughout the 2014, 2015, 2016, and 2017 agricultural seasons are presented in Figure 1. The wheat yields are estimated considering four cases, using only satellite data as input of the statistical algorithm (i.e., the NDVI or the combination of satellite reflectances) or adding yield values collected during the previous crop rotation (the surveyed fields being dedicated to the cultivation of wheat every two years).
The best performances are obtained when the NDVI is combined with the yield maps, regardless of the considered agricultural season. In this case, the 2014 agricultural season shows the lowest level of performance, with an R² of 0.44 and an RMSE of 8.13 q.ha−1 (corresponding to a relative error of 12.9%), the three other years being associated with values of R² close to or above 0.60 and RMSE lower than 7 q.ha−1 (corresponding to a relative error below 11.3%). Such a magnitude of error on yield estimates appears acceptable, the values of RMSE being lower than the observed variability, whatever the considered year (mean standard deviation of 11.8, 9.8, 10.0, and 8.9 q.ha−1 for the years 2014, 2015, 2016, and 2017, respectively).
The results presented by [9] provide an interesting comparison with the present study, as the estimates are also based on the combination of Landsat-8 and Sentinel-2A images. In that study, the estimated winter wheat yields are compared with official statistics at the district level for a study site located in Ukraine, showing a maximal R² of 0.50 and a relative error of 6.5% for the best performance. Furthermore, the magnitude of error obtained on yield estimates is close to the performance presented by [10] (R² = 0.76 and RMSE = 7.0 q.ha−1), a study based on successively acquired optical and radar images, with artificial neural networks used for yield retrieval at the field spatial scale.
Finally, the performance offered by the proposed statistical approach can also be compared to approaches combining satellite data and crop models through an assimilation scheme. Those more complex methods are characterized by a wide range of performance regarding the estimate of wheat yields (with, for instance, R² ranging from 0.50 to 0.91 [6,7,8,9]), which is explained by a set of factors such as the complexity of the considered model, the number of parameters, the targeted variables, and the method of assimilation. Furthermore, the variability in accuracy among those previous studies is also related to the validation dataset, which is often limited to a few measurements (due to the difficulty of obtaining such information) acquired at the plot scale. In contrast, the yields collected at the intra-plot scale by a surveying harvesting machine with a GPS system in track mode allow fully independent calibration and validation steps to be performed on a large dataset (thousands of measurements for each studied year), and the performance of the proposed approach to be assessed at a spatial scale consistent with the resolution of satellite images.
4.2. Yield Map Estimates at the Intra-Plot Spatial Scale
Examples of yield maps obtained on one monitored field are finally presented in Figure 2. The estimated values are based on the NDVI derived from images acquired during the 2017 agricultural season (Figure 2a); satellite data are then combined with the previous yields observed in 2015 (Figure 2b), and both estimates are compared to intra-plot measurements (Figure 2c). The two maps of estimated yields exhibit comparable intra-plot spatial patterns, as evidenced by the correlation higher than 0.90 between the two maps. The intra-plot patterns associated with low and high values are well predicted, even if extreme measured values are not well reproduced by the statistical approach. This observation is confirmed when considering all the pixels of the plot. Indeed, the averages of estimated yields are close (i.e., 58.9 and 58.3 q.ha−1 for the predicted values based on the NDVI and on satellite data combined with previous yields, respectively) and consistent with the measured yields (57.8 q.ha−1), confirming the ability of the proposed approach to reliably estimate the general behavior. Nevertheless, the observed variability of yields is not fully reproduced, as evidenced by the higher standard deviation of measured yields (9.5 q.ha−1) compared to those derived from estimates based on the NDVI (5.5 q.ha−1) or on the combination of satellite images and the previous yield map (6.4 q.ha−1).
5. Conclusions
In this study, a method was implemented to assess the intra-plot yield of wheat at a decametric spatial scale. The description of the seasonality of the vegetation, through the reflectance provided by the Landsat-8 and Sentinel-2 optical images, was the predictive input variable of a statistical algorithm. The image-only approach was further improved by including the yields recorded on the same plots in previous crop rotations. The proposed approach fits well into a context of precision farming, where access to information on soil and vegetation metrics is increasing, especially from harvesting machines collecting yield measurements.
The method was evaluated for wheat crops cultivated within the Gers County (southwestern France) during four successive agricultural seasons. From 12 to 14 regular satellite acquisitions were available from the sowing to the harvest of the crop, allowing for a comparison of the predictive capacities for yield provided by the NDVI or the combined use of six spectral bands. The magnitudes of error on yield estimates appear acceptable, the values of RMSE being lower than the observed variability, whatever the considered year (mean standard deviation of 11.8, 9.8, 10.0, and 8.9 q.ha−1 for the yields collected in 2014, 2015, 2016, and 2017, respectively). The time series of NDVI combined with the previous yield maps provided the best yield estimates, the 2014 agricultural season showing the lowest level of performance (R² of 0.44 and an RMSE of 8.13 q.ha−1), while the three other years were associated with values of R² close to or above 0.60 and RMSE lower than 7 q.ha−1.
The results presented in this short communication focus on one aspect of the analyses performed regarding wheat yield estimates. Supplementary results will be developed later, in a longer paper.