These authors contributed equally to this work.

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (

Multitemporal optical remote sensing is a useful, cost-efficient method for monitoring crop status over large areas. Modelers interested in yield monitoring can rely on past and recent observations of crop reflectance to estimate aboveground biomass and infer the likely yield. In a framework constrained by information availability, the parameters that convert remote sensing data into yield must therefore be estimated. Statistical models are suitable for this purpose, given their ability to deal with statistical errors. This paper explores the yield estimation performance of various remote sensing indicators based on varying degrees of bio-physical insight, in interaction with statistical methods (linear regressions) that rely on different hypotheses. Performance in estimating the temporal and spatial variability of yield, and the implications of data scarcity in both dimensions, are investigated. Jackknifed results (leave one year out) are presented for the case of regional wheat yield estimation in Tunisia using the SPOT-VEGETATION instrument. The best performances, with R^{2} up to 0.8, are achieved using the most physiologically sound remote sensing indicator, in conjunction with statistical specifications allowing for parsimonious spatial adjustment of the parameters.

Satellite instruments providing frequent, coarse resolution observations, such as AVHRR (Advanced Very High Resolution Radiometer), SPOT-VGT (SPOT-VEGETATION), or MODIS (Moderate Resolution Imaging Spectroradiometer), have been used extensively for crop monitoring and yield estimation at the regional scale (for a review see [

The derivation of biomass proxies (hereafter BP) from satellite observations usually proceeds in two steps. First, the top of canopy spectral reflectances of vegetation are retrieved from top of atmosphere satellite observations by taking atmospheric effects into account. This is typically achieved using atmospheric radiative transfer models (e.g., [

BPs can be estimated with either approach at different times during the growing season. The correlation between the BP and the final yield is expected to increase with later estimates since they are closer to harvest. However, the best timing cannot be known

The selected BP then needs to be translated into grain yield. Two groups of techniques are mainly used to accomplish this step: statistical modeling, both parametric and non-parametric, and crop growth modeling. Models of the first category rely on the availability of reference information (from actual ground measurements usually aggregated at some spatial level) for the empirical estimation of the conversion coefficients. Conversely, in the second group of techniques, RS data is assimilated [

This manuscript focuses on empirical regression modeling and aims to understand the implications, in terms of regional wheat yield estimation performance, of (i) the choice of BP and (ii) the differences in assumptions underlying a chosen set of regression models.

We investigate different RS approaches of increasing physical meaning by considering four BPs: the NDVI and the FAPAR value at a specific time of the year, and the integrated FAPAR and APAR (Absorbed Photosynthetically Active Radiation, the product of FAPAR and incident PAR) values during the period of plant activity. We also use six regression models, ranging from a general country-level model to more region-specific ones, and restrict the analysis to the linear framework for describing the relationships between the BP and yield. The models nevertheless differ in their ability to locally adjust the relationship between the BP and actual yields, in their parsimony with respect to the number of parameters to be estimated, and in the assumptions they imply about the error term.

The overall goal is to select the combination of BP and statistical model that provides the best predictive capacity, avoiding over/under-parameterization given the size of the data set available for the calibration of the model. Overparameterization occurs when the amount of information contained in the calibration data is not enough to estimate the model parameters. The resulting model fits the calibration dataset, but produces large errors when used in prediction. Underparameterization refers to a situation in which the available information is not fully exploited by the restricted set of model parameters.
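The balance between observations and parameters can be illustrated with a few lines of Python. This is only a sketch: the parameter counts are those listed in the summary table of statistical models, while the function names and the balanced-panel assumption (one observation per governorate and per year) are ours and are introduced for illustration.

```python
# Parameter counts taken from the summary table of statistical models;
# n_gov is the number of governorates (G in the table).
def n_parameters(model: str, n_gov: int) -> int:
    counts = {
        "P-OLS": 2,            # one intercept, one slope
        "RE": 4,               # slope, intercept, two variances
        "GS-OLS": n_gov + 1,   # one slope per governorate, single intercept
        "FE": n_gov + 1,       # one intercept per governorate, single slope
        "G-OLS": 2 * n_gov,    # intercept and slope per governorate
    }
    return counts[model]


def residual_dof(model: str, n_gov: int, n_years_fitted: int) -> int:
    """Observations minus parameters, assuming a balanced panel (hypothetical)."""
    return n_gov * n_years_fitted - n_parameters(model, n_gov)


# With ten governorates, compare a fit on 12 years (13 years, one left out)
# with a fit on 3 years (4 years, one left out).
summary = {
    model: (residual_dof(model, 10, 12), residual_dof(model, 10, 3))
    for model in ("P-OLS", "RE", "GS-OLS", "FE", "G-OLS")
}
```

Under these assumptions, G-OLS calibrated on three years uses two thirds of the observations to identify its parameters, whereas P-OLS and FE retain most of them as residual degrees of freedom.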

We address three additional important issues related to regional crop monitoring. The first is the importance of distinguishing between the ability of a model to estimate the temporal variability of yield and its ability to estimate the spatial variability. In regional yield estimation studies, the model performance in predicting the interannual variability in yield is seldom decoupled from the overall performance in space and time together. However, information regarding the temporal variability is of high practical importance for crop monitoring and yield forecasting, whilst information regarding the spatial variability may be of little practical use, as the model may only be describing geographic variation in yield.

Second, we examine the implications of data scarcity for model performance. Data availability can be restricted by the limited length of the RS time series used for crop monitoring (e.g., 14 years for SPOT-VGT). This issue is expected to be even more critical with new satellite missions (e.g., the future ESA-Sentinel 2) unless a multi-sensor approach is used to produce consistent long-term time series that integrate different sensors. Further data restrictions must also be faced in many regions of the world where only a shorter archive of ground yield measurements is available. Data constraints are therefore investigated and discussed when choosing among statistical models that rely on different assumptions and require a varying number of parameters to be estimated, in order to avoid model over/under-parameterization.

Third, for specific applications such as crop monitoring in food-insecure regions, a qualitative yield assessment may be the only solution in the absence of ground calibration measurements. This is often achieved by comparing a BP, at a given time, with its “long-term” average or to a particular reference year (e.g., [

The investigation is performed on durum wheat yield data covering ten Tunisian governorates over the period of 1999–2011, using observations from the SPOT-VGT instrument.

The study area encompasses ten governorates in North Tunisia (Beja, Bizerte, Jendouba, Kairouan, Kasserine, Le Kef, La Manouba, Sidi Bouzid, Siliana and Zaghouan;

In the study area, durum wheat accounts for 60% of the total cereal production, and 51% of the cropped area. Agricultural practices and mineral fertilization are expected to play a role in determining wheat yield, whereas temperature is not considered a major limiting factor compared to the strong inter-annual variability of rainfalls [

This study is based on the analysis of maximum value composited dekadal (here defined as a 10-day period) NDVI and FAPAR products from the JRC-MARS archive (Joint Research Centre of the European Commission, Monitoring Agricultural Resources Unit) derived from calibrated, cloud-screened, and atmospherically corrected (SMAC algorithm, [

The statistical department of the Ministry of Agriculture provided the archive of durum wheat (

Wheat yield is estimated through empirical linear regression models relating RS-derived indicators of aboveground biomass to yield statistics at governorate level. Aboveground biomass is thus assumed to be the main predictor of yield in this study area, characterized by low to moderate productivity compared to other regions of the world (e.g., European Union mean yield is above 5,000 kg/ha, source: Eurostat). Limitations of this approach are represented by the marginal presence of high-yield irrigated crops for which grain productivity may not be linearly related to biomass and the possible occurrence of meteorological (e.g., dry conditions, and heavy rains) or biological disturbances (e.g., diseases) affecting the crop during its late development stages and leading to yield reduction not associated with green biomass reductions, and thus not easily detected by RS methods.

Four candidate BPs of increasing biophysical meaning have been selected from the range of existing techniques proposed to estimate vegetation biomass (see [^{x}^{x}

The last two proxies belong to the group of techniques opting for the integration of the RS indicator over an appropriate time interval (automatically retrieved of fixed _{FAPAR} and CUM_{APAR}, respectively. Incident PAR needed to compute APAR is derived from ERA Interim and Operational models estimate of incident global radiation produced by ECMWF (European Centre for Medium-Range Weather Forecasts), downscaled at 0.25° spatial resolution and aggregated at dekadal temporal resolution [

The cumulative value is calculated, as shown in the example of _{FAPAR} (CUM_{APAR}) as the integral of the modeled values (modeled values times incident PAR) after the removal of the base FAPAR level.
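The cumulative proxies can be sketched in Python as follows. This is a minimal illustration, not the processing chain used in the study: the function name, the use of the series minimum as the base level, the trapezoidal rule, and the choice to multiply the base-removed FAPAR by PAR are all assumptions, and the input is assumed to be an already fitted dekadal series restricted to the growing period.

```python
import numpy as np


def cumulative_proxy(fapar, par=None, base=None, dt=1.0):
    """Integrate FAPAR above a base level; weight by incident PAR if given.

    fapar : modeled FAPAR values at regular time steps within the season
    par   : incident PAR at the same time steps (None gives a CUM_FAPAR-like value)
    base  : base FAPAR level to remove (assumed here: the series minimum)
    dt    : time step, in whatever unit the result should be expressed
    """
    fapar = np.asarray(fapar, dtype=float)
    if base is None:
        base = fapar.min()
    # Remove the base level and discard any negative remainder.
    excess = np.clip(fapar - base, 0.0, None)
    if par is not None:
        excess = excess * np.asarray(par, dtype=float)
    # Trapezoidal integration over the time steps.
    return float(np.sum((excess[1:] + excess[:-1]) * 0.5 * dt))


season = [0.1, 0.3, 0.5, 0.3, 0.1]            # made-up dekadal FAPAR values
cum_fapar = cumulative_proxy(season)
cum_apar = cumulative_proxy(season, par=[2.0] * 5)
```

The sample values are invented and carry no physical meaning; they only show that the same routine yields a FAPAR-based or an APAR-based sum depending on whether PAR is supplied.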

All the candidate proxies are computed for each year in which RS data is available, and for each pixel in the study area. As an example, the spatial distribution of the CUM_{APAR} is presented in

Pixel level biomass proxies are then aggregated at the governorate level as the weighted average according to each pixel’s area occupied by cereals [
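A sketch of this aggregation step, assuming that each pixel carries a biomass proxy value, a cereal area fraction used as weight, and a governorate label. The function and argument names are hypothetical, and every governorate is assumed to contain at least one pixel with a non-zero cereal fraction.

```python
import numpy as np


def aggregate_to_governorate(bp, cereal_fraction, gov_id):
    """Cereal-area-weighted mean of a pixel-level proxy for each governorate."""
    bp = np.asarray(bp, dtype=float)
    weights = np.asarray(cereal_fraction, dtype=float)
    gov_id = np.asarray(gov_id)
    result = {}
    for gov in np.unique(gov_id):
        mask = gov_id == gov
        # Weighted average: pixels with more cereal area count more.
        result[gov.item()] = float(
            np.sum(bp[mask] * weights[mask]) / np.sum(weights[mask])
        )
    return result


# Three made-up pixels, two in Beja and one in Bizerte.
means = aggregate_to_governorate(
    bp=[1.0, 3.0, 10.0],
    cereal_fraction=[0.25, 0.75, 0.5],
    gov_id=["Beja", "Beja", "Bizerte"],
)
```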

The conversion of biomass proxies into actual yields is not a straightforward task. First, the relationship between the two variables may vary, in functional form (e.g., from linear to logarithmic) and in magnitude (

The empirical estimation of this relationship is often made through regression techniques. Models and specifications differ in the hypothesized nature of the link between the variables and in the properties of the subsequent residuals. This is an important issue since wrong choices can lead to biased, inefficient, and inconsistent parameter estimates. The simplest and most widespread way of modeling the relationship between yield and biomass proxies is through Ordinary Least Squares regression (OLS):
y_{i,d} = β_{0} + β_{1} BP_{i,d} + ε_{i,d}
where y_{i,d} and BP_{i,d} are the yield and the biomass proxy for year i and governorate d, β_{0} and β_{1} are the intercept and the slope, and ε_{i,d} is an i.i.d.(0, σ_{ε}^{2}) error term (independent and identically distributed with zero mean and the same finite variance). The advantages of such a model are its simplicity and its parsimony in the number of estimated parameters. This specification, hereafter referred to as pooled OLS (P-OLS), assumes a constant relationship between yield and the BP in both space and time. This relation might not hold in all circumstances, in particular with respect to spatial variation. Indeed, the harvest index may vary spatially because of different management practices, as well as water and nitrogen availability, leading to different yields for the same amount of aboveground biomass. The typical mixture of elements within the elementary pixel (e.g., crops, bare soil, natural vegetation, water, ^{x}^{x}

Although it is recognized that such differences are present at different geographic scales, the spatial information needed for their detailed modeling is not available. Therefore, an alternative approach consists in estimating the yield at the governorate level (G-OLS) to account for these spatial heterogeneities:
y_{i,d} = β_{0,d} + β_{1,d} BP_{i,d} + ε_{i,d}
where β_{0,d} and β_{1,d} are the governorate-specific intercept and slope.

Both models estimate _{i,d}_{i,d}_{v}^{2}) Gaussian error component and _{d}

Although GS-OLS and FE models are more parsimonious than G-OLS, they can still suffer a significant loss of degrees of freedom from the estimation of governorate-specific parameters in datasets where the number of governorates is large. This can be avoided if u_{d} is (i) i.i.d.(0, σ_{u}^{2}), (ii) independent from v_{i,d} and BP_{i,d}. In this case, the random effects model (RE) is suitable for a consistent, unbiased, and efficient estimation of the unobservable governorate-specific effects (see [_{u}^{2}_{v}^{2}
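The four least-squares specifications differ only in which coefficients are allowed to vary by governorate, which can be made explicit by building their design matrices. The sketch below follows the descriptions in the summary table of statistical models; the function names are hypothetical, the data are synthetic, and the RE model is left out because it requires the estimation of variance components rather than an ordinary least-squares fit.

```python
import numpy as np


def design_matrix(bp, gov, model):
    """Design matrix for the P-OLS, FE, GS-OLS and G-OLS specifications."""
    bp = np.asarray(bp, dtype=float)
    gov = np.asarray(gov)
    labels = np.unique(gov)
    # One dummy column per governorate (n_obs x G).
    dummies = (gov[:, None] == labels[None, :]).astype(float)
    ones = np.ones((bp.size, 1))
    slope = bp[:, None]
    if model == "P-OLS":    # single intercept, single slope
        return np.hstack([ones, slope])
    if model == "FE":       # governorate intercepts, single slope
        return np.hstack([dummies, slope])
    if model == "GS-OLS":   # single intercept, governorate slopes
        return np.hstack([ones, dummies * slope])
    if model == "G-OLS":    # governorate intercepts and slopes
        return np.hstack([dummies, dummies * slope])
    raise ValueError(f"unknown model: {model}")


def fit(bp, gov, y, model):
    """Least-squares coefficients for the chosen specification."""
    X = design_matrix(bp, gov, model)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Synthetic, noise-free data: three governorates, four years each.
gov = np.repeat(np.array(["A", "B", "C"]), 4)
bp = np.tile([1.0, 2.0, 3.0, 4.0], 3)
y = np.repeat([10.0, 20.0, 30.0], 4) + 2.0 * bp
fe_coef = fit(bp, gov, y, "FE")   # three intercepts followed by the common slope
```

The number of columns of each matrix reproduces the parameter counts of the table: 2 for P-OLS, G + 1 for FE and GS-OLS, and 2 × G for G-OLS.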

To take into account possible phenological differences among governorates when using NDVI^{x}^{x}

All the models are assessed using a jackknife technique, leaving one year out at a time (^{2} across BP-statistical model combinations is tested following [
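The leave-one-year-out procedure can be sketched as follows for any model that is linear in its parameters. The function names are hypothetical, and R^{2} is computed here as one minus the ratio of residual to total sum of squares, which is an assumption about the exact statistic used.

```python
import numpy as np


def leave_one_year_out(X, y, years):
    """Predict each year with coefficients fitted on all the other years."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    years = np.asarray(years)
    y_hat = np.empty_like(y)
    for year in np.unique(years):
        held_out = years == year
        coef, *_ = np.linalg.lstsq(X[~held_out], y[~held_out], rcond=None)
        y_hat[held_out] = X[held_out] @ coef
    return y_hat


def r2_and_rmse(y, y_hat):
    """Goodness of fit of the jackknifed predictions."""
    y = np.asarray(y, dtype=float)
    resid = y - np.asarray(y_hat, dtype=float)
    r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
    rmse = np.sqrt(np.mean(resid**2))
    return float(r2), float(rmse)


# Synthetic, noise-free example: five years, three governorates per year.
years = np.repeat(np.arange(2000, 2005), 3)
bp = np.linspace(0.0, 1.0, years.size)
X = np.column_stack([np.ones_like(bp), bp])   # pooled intercept and slope
y = 500.0 + 2000.0 * bp
r2, rmse = r2_and_rmse(y, leave_one_year_out(X, y, years))
```

Because every prediction comes from a model that never saw the held-out year, the resulting statistics describe predictive rather than fitting performance.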

Finally, as data scarcity is a major concern when modeling yields based on RS data, an analysis is run in order to understand how model performance deteriorates with respect to decreasing data availability. We simulated increasing data scarcity in both the temporal and spatial dimensions. In the first case, jackknifed results are again reported but leaving

To facilitate the analysis of the results, a table summarizing the acronyms used for the biomass proxies and statistical models is provided in

We found that NDVI in the second dekad of April (^{11}^{12}_{FAPAR} and CUM_{APAR}, we did not perform any tuning for computing the integration limits.
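The superscript of the single-date proxies is the dekad number within the year, so that the second dekad of April corresponds to dekad 11. A short sketch of this numbering, assuming the common convention of three dekads per month (days 1 to 10, 11 to 20, and 21 to the end of the month); the function name is hypothetical.

```python
from datetime import date


def dekad_of_year(d: date) -> int:
    """Dekad number (1 to 36), assuming three dekads per calendar month."""
    # Days 1-10 give 0, days 11-20 give 1, days 21 to month end give 2.
    dekad_in_month = min((d.day - 1) // 10, 2)
    return (d.month - 1) * 3 + dekad_in_month + 1


mid_april = dekad_of_year(date(2011, 4, 15))
late_april = dekad_of_year(date(2011, 4, 25))
```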

Before comparing the performances of the different BP-statistical model combinations, it is worth presenting the results of the statistical tests that can be performed in order to guide the choice between regression specifications. F-tests show that models that have a governorate-specific adjustment (

In general, fairly good performances are observed in the Tunisian study case for all combinations of BP-statistical model we tested. Jackknifed
^{11}_{APAR}). This latter combination explains 80% of the actual yield variability over the 13 years and 10 governorates included in the analysis. The jackknifed scatterplot of modeled

With respect to the choice of BP, the one with the higher physical relevance, CUM_{APAR}, consistently appears as the best yield predictor across models. However, the magnitude of the improvement is rather low, ranging from 6% to less than 1%. With respect to the choice of the statistical model, the FE appears as the best performing regardless of the BP used. This confirms that the relationship between yield and BP is not necessarily spatially homogeneous, and that governorate-specific errors are best accommodated using an FE model. Holding the BP constant and considering FAPAR^{12} and CUM_{APAR}, FE model performances are not substantially different from those of the other models allowing for some governorate tuning: G-OLS, GS-OLS, and RE. However, the implications of opting for FE, GS-OLS, or RE models instead of P-OLS or G-OLS are important, since the P-OLS cannot take local errors into account, while G-OLS may suffer from over-parameterization.

Despite the small sample size (128 observations), several statistically significant differences have been detected: the FE model using CUM_{APAR} outperforms all tested BPs when using the P-OLS model and, with the exception of the FE, all statistical models that make use of NDVI^{11} or CUM_{FAPAR} as BP. However, once the FE model is adopted, no conclusion can be reached with respect to which BP is to be used (no statistically significant differences between FE models).

The following step of the analysis is to assess to what extent the combinations of BP-models are able to mimic the temporal variability of the data. This is accomplished by analyzing the jackknifed

As expected, a lower fraction of the yield variance is explained by the BPs when considering the temporal variability instead of the overall variability.
^{11}_{APAR}. The range of
_{APAR} with less than 10% chances of error. The first two also rely on CUM_{APAR} as BP and model the relationship with yield using GS-OLS or RE. Here, it is worth noting that results are robust to changes in growth and decay thresholds used for the computation of CUM_{APAR} and CUM_{FAPAR} (all combinations having been tested using 1%, 5% and 10% thresholds). The second two make use of the FAPAR^{12}^{12}_{APAR}, which requires the whole season of observations. However, care must be taken in adopting the single-timing approach to more extended and/or heterogeneous geographical settings. In fact, the method is not robust to significant changes in crop phenology that may occur due to differences in climatological conditions or farming practices. Facing these conditions, prior knowledge about climate, soil, and cropping systems should be exploited to segment the study area into regions with similar characteristics. Then, the best correlated dekad should be selected for each region and its empirical predictive capability tested.

In other operational crop monitoring applications, where no reliable yield data is available for model calibration, the BP is the only available information. Typically, in such situations the BP values are compared at different spatial locations and different times to highlight anomalies. For this purpose, one would require the linear relation between the BP and yield to be spatially homogeneous. In fact, without ground measurements, it is not possible to spatially adapt the relationship. Insight into the spatial homogeneity of the relationship between the various BPs and yield is obtained by analyzing their performance when the P-OLS is used (_{APAR} performs significantly better than P-OLS-NDVI^{11}^{12}_{FAPAR}, suggesting that cumulated values over the actual season should be used in the absence of ground measurements.

Finally, the application of the Dek-G-OLS did not improve the performances of yield estimation (

The effects of data scarcity in the temporal dimension on the different modeling solutions are investigated by reducing the number of years on which the models are fitted from the original 13 down to four. For this purpose, we only analyzed the best candidate BPs (FAPAR^{12}_{APAR}) and all statistical models but GS-OLS (

As expected,

With respect to the BP used, it is worth noting that the use of the more physiologically meaningful CUM_{APAR} improves the performances under two circumstances: when no governorate-specific tuning is performed (

Another source of data scarcity to be explored is the number of spatial entities (

By definition, the performance of G-OLS is insensitive to reductions of sample size in the spatial domain. Also by definition, the performance of the three models converges when a single governorate is used. The FE remains the best performing model regardless of the number of governorates, provided more than five years of data are available for the analysis. In fact, given the amount of available data, the estimation of the unobserved fixed effects ensures that the model performs better than both the P-OLS (which assumes a unique relationship between BP and yield) and the G-OLS, as fewer parameters need to be estimated. Consequently, the FE model’s RMSE is lower and robust to severe data restrictions in both the temporal and the spatial domains (

In a case study of regional durum wheat yield estimation in Tunisia using 13 years of low-resolution SPOT-VGT observations, we have explored the interactions among four different remote sensing biomass proxies characterized by increasing physiological meaning, and six different statistical models relying on different hypotheses and using a varying number of degrees of freedom. The analysis aimed at assessing the improvements of the adoption of more sophisticated biomass proxies and the tradeoff between model complexity and data availability.

Results show that the high yield variability (spatial and temporal) in Tunisia can be predicted using Earth observation techniques with jackknifed

When focusing on the ability of the models to describe the temporal yield variability, the ranking between biomass proxy-statistical model combinations remained stable, although the

The results also showed that for qualitative crop monitoring, where crop conditions are to be assessed in the complete absence of ground calibration measurements, phenologically-tuned biomass proxies should be preferred to single observation indicators. First, yield ground measurements are needed for the selection of the optimal NDVI or FAPAR dekad, while the phenological tuning is performed in the absence of such information. Second, phenologically-tuned biomass proxies have been shown to have a more robust linear relationship with yields.

Finally, we assessed the role played by data scarcity on the yield estimation accuracy. Results confirm that the selection of the model specification should take into account the number of observations available, and not merely the expected spatial heterogeneity of the yield-BP relationship.

We thank our colleagues Dominique Fasbender and Josh Hooker for the support in the statistical analysis and the revision of the manuscript. We thank Myriam Haffani, head of SCAT project (Suivi des Cultures Annuelles par Télédétection) at Centre National de Cartographie et de Télédétection (CNCT, Tunisia) for her precious collaboration.

Summary table of biomass proxies and statistical models.

| Acronym | Description |
|---|---|
| Biomass proxies | |
| NDVI^{x} | NDVI value for the x^{th} dekad |
| FAPAR^{x} | FAPAR value for the x^{th} dekad |
| CUM_{FAPAR} | Cumulative FAPAR value during the growing period (estimated for each pixel, each season) |
| CUM_{APAR} | Cumulative FAPAR*PAR value during the growing period (estimated for each pixel, each season) |
| Statistical models | |
| P-OLS | Pooled Ordinary Least Squares, 2 parameters |
| RE | Random Effects model, 4 parameters (slope, intercept, and variances of the unobservable effects) |
| GS-OLS | Governorate Slope OLS, slopes estimated for each governorate, single intercept, G + 1 parameters (G is the number of governorates) |
| FE | Fixed Effects model, intercepts estimated for each governorate, single slope, G + 1 parameters |
| G-OLS | Governorate specific OLS, intercepts and slopes estimated for each governorate, 2 × G parameters |
| Dek-G-OLS | G-OLS model for which the most correlated dekad is selected for each governorate, 2 × G parameters plus G dekad selections |

Location of the study area and cereal area cover fraction. The map of the whole country is reported in the upper right corner.

Box-and-whisker plot showing wheat yield for years 1999–2011. Medians, quartiles, and extreme values are given. Departments on the x-axis are ordered from North to South.

Example of computation of CUM_{FAPAR} for the season of 2010–2011, and a pixel located in Beja governorate (36.4776°N, 9.2991°E). Dots refer to actual FAPAR observations. The blue line and area represent the fitted PDHT model and the cumulative FAPAR value, respectively. Black vertical dashed lines indicate

Average CUM_{APAR} during the period 1999–2011 for cereal crop areas.

Modeled _{APAR}. The 1:1 line is drawn for reference.

Jackknifed Root Mean Square Error (RMSE) of different modeling solutions as a function of the number of available years of data (the parameters of each model have been estimated with the years available less one). Only models using FAPAR^{12} and CUM_{APAR} are shown.

Jackknifed Root Mean Square Error (RMSE) of three modeling solutions (P-OLS, FE, and G-OLS) as a function of the number of governorates and the number of years of data (fou (_{APAR} are showed.

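Jackknifed R^{2} for the overall (spatial and temporal) yield variability. Values of R^{2} that are significantly lower than the highest value (in bold) at the 10%, 5% and 1% levels are indicated by superscripts ^{<}, ^{<<} and ^{<<<}, respectively.

| BP | P-OLS _{(Pooled OLS, n = 2)} | G-OLS _{(Gov. OLS, n = 20)} | GS-OLS _{(Gov. Slope OLS, n = 11)} | FE _{(Fixed Effects, n = 11)} | RE _{(Random Effects, n = 4)} |
|---|---|---|---|---|---|
| NDVI^{11} | 0.66^{<<<} | 0.71^{<<} | 0.74^{<<<} | 0.75 | 0.75^{<<<} |
| FAPAR^{12} | 0.68^{<<<} | 0.75 | 0.77 | 0.79 | 0.78 |
| CUM_{FAPAR} | 0.71^{<<<} | 0.72^{<<} | 0.75^{<<} | 0.77 | 0.76^{<} |
| CUM_{APAR} | 0.72^{<<} | 0.75 | 0.78 | **0.80** | 0.79 |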

Jackknifed R^{2} for the temporal yield variability. Values of R^{2} that are significantly lower than the highest value (in bold) at the 10%, 5% and 1% levels are indicated by superscripts ^{<}, ^{<<} and ^{<<<}, respectively.

| BP | P-OLS _{(Pooled OLS, n = 2)} | G-OLS _{(Gov. OLS, n = 20)} | GS-OLS _{(Gov. Slope OLS, n = 11)} | FE _{(Fixed Effects, n = 11)} | RE _{(Random Effects, n = 4)} |
|---|---|---|---|---|---|
| NDVI^{11} | 0.36^{<<<} | 0.43^{<<<} | 0.50^{<<<} | 0.53^{<} | 0.52^{<<<} |
| FAPAR^{12} | 0.40^{<<<} | 0.51^{<} | 0.57^{<} | 0.60 | 0.59 |
| CUM_{FAPAR} | 0.45^{<<<} | 0.45^{<<<} | 0.53^{<<<} | 0.55^{<} | 0.54^{<<<} |
| CUM_{APAR} | 0.46^{<<<} | 0.53^{<} | 0.58 | | 0.60 |