Do agrometeorological data improve optical satellite-based estimations of the herbaceous yield in Sahelian semi-arid ecosystems?

herbaceous Abstract: Quantitative estimates of forage availability at the end of the growing season in rangelands are helpful for pastoral livestock managers and for local, national and regional stakeholders in natural resource management. For this reason, remote sensing data such as the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) have been widely used to assess Sahelian plant productivity for about 40 years. This study combines traditional FAPAR-based assessments with agrometeorological variables computed by the geospatial water balance program, GeoWRSI, using rainfall and potential evapotranspiration satellite gridded data to estimate the annual herbaceous yield in the semi-arid areas of Senegal. It showed that a machine-learning model combining FAPAR seasonal metrics with various agrometeorological data provided better estimations of the in situ annual herbaceous yield (R 2 = 0.69; RMSE = 483 kg · DM/ha) than models based exclusively on FAPAR metrics (R 2 = 0.63; RMSE = 550 kg · DM/ha) or agrometeorological variables (R 2 = 0.55; RMSE = 585 kg · DM/ha). All the models provided reasonable outputs and showed a decrease in the mean annual yield with increasing latitude, together with an increase in relative inter-annual variation. In particular, the additional use of agrometeorological information mitigated the saturation effects that characterize the plant indices of areas with high plant productivity. In addition, the date of the onset of the growing season derived from smoothed FAPAR seasonal dynamics showed no signiﬁcant relationship (0.05 p-level) with the annual herbaceous yield across the whole studied area. The date of the onset of rainfall however, was signiﬁcantly related to the herbaceous yield and its inclusion in fodder biomass models could constitute a signiﬁcant improvement in forecasting risks of a mass herbaceous deﬁcit at an early stage of the year. determined by a set of functions integrating rainfall amount (PPT), plant available water (PAW), critical soil water level (SWC), soil water holding capacity (WHC), crop root depth (RD) and soil water content (SW) at the end of the study period. SW can be estimated using a soil water balance [60,63] with Equations (4).


Introduction
In the Sahel belt south of the Sahara desert, an arid to semi-arid region, the vegetation is composed of a herbaceous layer dominated by annual grasses and scattered woody plants including bushes, shrubs and small trees, among which are several thorny species [1,2]. These areas provide the bulk of pastoral livestock feeding [3] and contribute to carbon sequestration, nutrient uptake and cycling, soil fixation and soil biologic activity, as well as water cycle regulation. In this context, an accurate evaluation of herbaceous mass yield at the end of the growing season is essential for ensuring the rational use of available resources and environmental sustainability. Vegetation indices derived from satellite data have been widely used to monitor Sahelian herbaceous productivity for about 40-years. After the severe drought of 1970s, the first applications of herbaceous yield estimation in Sahelian rangelands using the Normalized Difference Vegetation Index (NDVI) from the National Oceanic and Atmospheric Administration-Advanced Very High Resolution Radiometer (NOAA-AVHRR) were published and were pioneer studies in the field of applied remote sensing [4][5][6]. In the past two decades and with regard to the technological advances in sensor design for vegetation monitoring [7], new satellites such as the Satellite Pour l'Observation de la Terre-VEGETATION (SPOT-VGT) and the Moderate Resolution Imaging Spectroradiometer (MODIS)-TERRA/AQUA have been launched and more datasets have become available with higher spatial and denser temporal resolutions [8,9]. In addition to the NDVI, the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) in relation to surface processes such as photosynthesis [10] has been recognized as constituting a key variable in the assessment of vegetation status [11,12]. For this purpose, the metrics of the FAPAR seasonal dynamics are commonly used in environmental studies [13][14][15] and their potential for herbaceous mass monitoring in Sahelian rangelands has recently been endorsed by [16]. Most studies however, rely exclusively on satellite-based vegetation indices (e.g., [13,17]), with only a minority combining these indices with rainfall data, soil water status indicators and ground plant mass data.
Rainfall distribution is generally considered as the main driver of plant growth in the West African Sahel [18][19][20][21], although there are many other local drivers of plants' photosynthetic capacity and spatial variability in the Sahel [22,23]. Among them, the supply of mineral nutrients such as nitrogen (N) and phosphorus (P) are major determinants of plant growth rate [24]. In the Sahel, N and P are the main limiting factors to plant production when rainfall is sufficient. For this reason, [25] proposed is the subdividing of the Sahel belt according to the 250 mm isohyet, with rainfall as the main limiting factor below 250 mm and the nutrients N and P the main limiting factors above 250 mm annual rainfall. The wide temporal variability of N and P however, in relation to the soil moisture regime and the absence of reliable information on their spatial distribution across Sahelian regions makes it difficult to include this information in statistical modeling approaches. In contrast, comprehensive rainfall data has become increasingly accessible for use in the Sahel through the development of satellite-based products such as the Tropical Applications of Meteorology using SATellite (TAMSAT) data and ground-based observations [26], the Famine Early Warning System NETwork (FEWS NET) Rainfall Estimate (RFE) [27] and the African Rainfall Climatology (ARC) [28]. The use of rainfall data as the overall driver of plant growth is supported by the high variability of Sahelian herbaceous productivity, which is difficult to predict in time and space [29,30]. Not all rainfall water however, is directly available to plants. Rainfall water availability is mediated by the redistribution (i.e., run-off/run-on) on the soil surface in relation to soil physical characteristics (i.e., structure and texture), to topography [25] and to the physical nature of the canopy [31]. This could help explain the poor relationship observed between herbaceous yield and annual total rainfall in the Sahelian rangelands of Niger and Mali [32,33]. It justifies the concept of a water requirement index (also called a water satisfaction index or a water requirement satisfaction index, WRSI) developed and implemented through a soil water balance model named the Crop Specific Water Balance (CSWB) by the Food and Agricultural Organization (FAO) of the United Nations. With this model, the water balance of a given crop can be calculated in time increments, usually 10 days (i.e., dekad), as presented in Equation (1) [34]. Then, when filled with complementary parameters, the CSWB model produces a set of water status indicators generally used to assess the effect of weather conditions on crop development and yield [35,36].
where, W d : amount of water stored in the soil at the end of the dekad (d) W d-1 : amount of water stored in the soil at the end of the previous dekad (d-1) R: cumulated rainfall during the dekad ETm: maximum evapotranspiration in the decadal period r: represents the water losses due to runoff in the decadal period i: represents the water losses due to deep percolation in the decadal period For its agricultural monitoring activities, FEWS NET developed a grid cell-based modeling environment from the FAO's CSWB [37]. The established geospatial model, GeoWRSI, was then enhanced by [38] for West Africa's Sahelian rangelands [39]. Plant production models established solely from the absorbed PAR, as mentioned earlier and according to [40], neglect the direct effects of other factors such as water availability and nutrient shortage on plant growth. Models including both soil moisture (i.e., soil water status indicators) and the metrics of FAPAR seasonal dynamics are therefore expected to overcome FAPAR-related problems (e.g., saturation and cloud contamination) and to improve herbaceous standing mass estimations. In such models, agrometeorological data introduce information about soil water availability, whereas FAPAR metrics provide information on herbaceous productivity and species patterns not taken into consideration by the agrometeorological component [41].
In this context, the overall aim of our study was to predict the herbaceous yield across the arid to sub-humid areas of Senegal. The originality of the study was the combination of satellite-based rainfall estimates, agro-ecological data and satellite-derived FAPAR metrics using a machine learning approach. The more specific objectives were to: (i) develop three herbaceous models based only on FAPAR metrics, only on agrometeorological variables and on the combined FAPAR and agrometeorological variables; (ii) conduct a spatio-temporal comparison of model outputs; and (iii) analyze the relationship between herbaceous yields and the onset/end of season metrics calculated from FAPAR and satellite-based rainfall data for early warning purposes.

Study Area
The study area covered 15 districts belonging to five administrative regions (i.e., Saint-Louis, Matam, Kaffrine, Louga and Tambacounda), with a total area of 125,000 km 2 ( Figure 1). The area lies in the Sahelian and northern Sudano-Guinean zone of Senegal between 16.69 • N and 12.63 • N latitude and 16.74 • W and 11.86 • W longitude. It includes all natural grazing areas of the pastoral domain as defined by [42], as well as some croplands, including fallows. The mean annual precipitation varies between 200 and 980 mm from north to south, with reference to the FEWS NET rainfall estimates for the 2000-2015 period [43,44]. The rainy season, driven by the West African monsoon, is unimodal, occurring over 3-5 months between June and October. The studied area includes different ecoregions, with a prevalence of red-brown sandy soils, ferruginous tropical sandy soils, leptic, gley and vertic soils [45,46]. Typical of the Sahel, the herbaceous vegetation is particularly dependent on the intra-seasonal rainfall distribution [47] and is dominated by annual plants with C4 photosynthesis [48]. According to the Centre de Suivi Ecologique (CSE) [49,50], the northern zone (≤300 mm annual rainfall) is characterized by Poaceae such as Chloris prieurii, Aristida mutabilis and Dactyloctenium aegyptium, as well as legumes such as Alysicarpus ovalifolius and Zornia glochidiata, whereas in the central zone (between 300 and 500 mm) the characteristic species are Zornia glochidiata (Fabaceae) and Schoenfeldia gracilis, Pennisetum pedicellatum and Eragrostis tremula (all Poaceae). Towards the south, Andropogoneae species such as Andropogon pseudapricus and A. amplectans are    The in situ herbaceous mass data used in this study for the calibration of the models were collected annually from 2000 to 2015 (except for 2004) from 24 field sites located in free access natural rangelands, as shown in Figure 1 (black dots). The sites were selected away from the main water points and pastoral camps in order to avoid heavy grazing. Each field site covered a 3 × 3 km 2 homogeneous area and represented the most common land cover classes. Above-ground herbaceous mass was measured towards the end of the growing season, in early October. The technique used was the stratified sampling line originally proposed by the International Livestock Centre for Africa (ILCA) for monitoring pastoral ecosystems in the Gourma of Mali [55]. Along a 1000 m transect, four strata (bare soil, and low, medium and high herbaceous mass production) were identified. Then, taking into account the variability of herbage mass in the three covered strata, between 35 and 100 plots (each 1 m 2 ) were sampled randomly along the transect line and fresh herbaceous mass was cut and weighed within each plot. After re-sampling, three samples for each stratum (i.e., nine samples per site) were kept for drying. The dry matter rate obtained by dividing the dry herbaceous mass weight by the fresh mass weight, as well as the relative frequency along the 1000 m transect, were then used to calculate the herbaceous mass yield in kg·DM/ha, first for each of the three strata and then for the site by adding them together. Note that the biomass data were not regularly collected in all monitoring sites due to occasional lack of logistics or the early passage of bush fires before field measures [16]. For more detailed information on the overall herbaceous mass collection, see [16].

FAPAR Vegetation Dynamics and Calculated Metrics
GEOV1 Copernicus Global Land FAPAR runs from 24 December 1998 to the present day at a ground sampling distance of 1/112 • (about 1 km at the Equator) and 10-day steps [56]. Derived from the SPOT-VEGETATION (from 2000 to 2013) and Proba-V (for 2014 and 2015) instruments, this product is freely available on http://land.copernicus.eu/global. For detailed information on the principles used to estimate the GEOV1 FAPAR product, see [10].
In order to calculate seasonal metrics, the GEOV1 FAPAR time series were filtered using the Savitzky-Golay (SG) fitting method available via TIMESAT software [16,57]. Filtering is essential in semi-arid areas such as the Sahel, where many factors (e.g., clouds, aerosols, shadows, surface water) tend to produce noisy and erroneous FAPAR values [58,59]. From the filtered FAPAR time series, eight metrics were retrieved with TIMESAT and used in this study ( Figure 2): the start of the plant growth season (SOS); the end of the season (EOS); the length of growing season (LOS); the maximum FAPAR value over the season (PEAK); the amplitude (AMPL) corresponding to the difference between PEAK and the averaged left and right minimum values over the ongoing annual cycle (BVAL); the small integrated FAPAR (SINT) from SOS to EOS and above BVAL; the small integrated FAPAR from SOS to PEAK time and above BVAL (GSINT); and the decreasing rate during the senescence phase (RDERIV). All these metrics have been described by [15,16], except for GSINT, which was computed to include information during the green-up phase. SOS and EOS are essential for calculating the cumulated FAPAR metrics and were set to occur at 20% and 50% of the seasonal AMPL before and after the peak value, respectively [16]. All the metrics were derived for each year between 2000 and 2015 on a pixel basis and 1 km spatial resolution. The annual values were then averaged for each site over an area of about 3 × 3 km 2 to match, as far as possible, the spatial sampling of the ground data.

Obtaining Agrometeorological Data
Agrometeorological data was calculated using the GeoWRSI model of FEWS NET adapted to a grid cell-based modeling environment [37] from the original water balance algorithm of FAO [35]. As noted by [60], the GeoWRSI model was enhanced and extended into an operational mode for Africa, Central America and Afghanistan after the work by [61]. The WRSI indicator (Equation (2)) is the ratio of seasonal actual crop evapotranspiration (AETc) to the seasonal crop water requirement, which is equivalent to potential crop evapotranspiration (PETc) [60]. AETc represents the actual amount of water withdrawn from the soil water reservoir by vegetation transpiration and soil evaporation [60,62] and PETc is the cumulative optimum crop water requirement as met by rainfall and soil moisture for a given accumulation period [39]. Following [60], the PETc at a given time in the growing season is calculated by multiplying the potential evapotranspiration (PET) by the crop coefficient (Kc), as in Equation (3). The AETc is determined by a set of functions integrating rainfall amount (PPT), plant available water (PAW), critical soil water level (SWC), soil water holding capacity (WHC), crop root depth (RD) and soil water content (SW) at the end of the study period. SW can be estimated using a soil water balance [60,63] with Equations (4).
where i corresponds to the time step. For more information on the development and parameterization of the GeoWRSI model, see [60,61]. The main parameters needed to run the model are PPT, PET, WHC and Kc values. We ran the model in this study using the Africa-wide blended satellite-gauge rainfall estimate images implemented by the Climate Prediction Center (CPC) of NOAA [43,44] and the global PET images estimated from the 6-hourly numerical meteorological model output using the Penman-Monteith equation [64]. Rainfall and PET images were downloaded in dekadal time steps with a 0.1 degree (about 10 km) and 1 degree (about 110 km) spatial resolution respectively, from the ftp server of the Climate Hazard Group for the archive data between 2000 and 2010 [65] and the FEWS NET server

Obtaining Agrometeorological Data
Agrometeorological data was calculated using the GeoWRSI model of FEWS NET adapted to a grid cell-based modeling environment [37] from the original water balance algorithm of FAO [35]. As noted by [60], the GeoWRSI model was enhanced and extended into an operational mode for Africa, Central America and Afghanistan after the work by [61]. The WRSI indicator (Equation (2)) is the ratio of seasonal actual crop evapotranspiration (AETc) to the seasonal crop water requirement, which is equivalent to potential crop evapotranspiration (PETc) [60]. AETc represents the actual amount of water withdrawn from the soil water reservoir by vegetation transpiration and soil evaporation [60,62] and PETc is the cumulative optimum crop water requirement as met by rainfall and soil moisture for a given accumulation period [39]. Following [60], the PETc at a given time in the growing season is calculated by multiplying the potential evapotranspiration (PET) by the crop coefficient (Kc), as in Equation (3). The AETc is determined by a set of functions integrating rainfall amount (PPT), plant available water (PAW), critical soil water level (SWC), soil water holding capacity (WHC), crop root depth (RD) and soil water content (SW) at the end of the study period. SW can be estimated using a soil water balance [60,63] with Equations (4).
where i corresponds to the time step. For more information on the development and parameterization of the GeoWRSI model, see [60,61]. The main parameters needed to run the model are PPT, PET, WHC and Kc values. We ran the model in this study using the Africa-wide blended satellite-gauge rainfall estimate images implemented by the Climate Prediction Center (CPC) of NOAA [43,44] and the global PET images estimated from the 6-hourly numerical meteorological model output using the Penman-Monteith equation [64]. Rainfall and PET images were downloaded in dekadal time steps with a 0.1 degree (about 10 km) and 1 degree (about 110 km) spatial resolution respectively, from the ftp server of the Climate Hazard Group for the archive data between 2000 and 2010 [65] and the FEWS NET server for the latest data between 2011 and 2015 [66]. The WHC image shows the spatial variation of easily available water capacity in the upper 100 cm, based on soil physical characteristics [37] and was obtained from the FAO digital soil map of the world [67] with a 0.1 degree spatial resolution. Kc values correspond to those proposed by [68] for the extensive grazing pastures in semi-arid climates ( Figure 3). These values were chosen with the assumption that the study area is uniform in hydro-climatic conditions and after a visual analysis of WRSI variable which showed the most reasonable distribution for a median year regarding rainfall conditions. of at least 25 mm of rainfall received over a given dekad, followed by a total of 20 mm for the next two dekads as applied by [37] after [69]. The end-of-season date (EOSp) occurred when a climatological ratio between rainfall and potential evapotranspiration (i.e., PPT < PET/2) was observed [60]. SOSp and EOSp were in turn used to calculate the herbaceous vegetation WRSI. In addition to WRSI, SOSp and EOSp, we calculated 12 other agrometeorological variables for the growing periods of each year (Table A1), particularly actual evapotranspiration (AET), water deficit (WDEF) and water surplus (WSUR) that accumulated during each of the four herbaceous vegetation growth phases (illustrated in Figure 3): initial (i); vegetative (v); flowering (f); and ripening (r). The seasonal fractions assigned to these four phases were 23%, 39%, 23% and 15% respectively, (numbers taken from the Sahelian Transpiration, Evaporation, and Productivity [STEP] model) [40]. The computed rainfall variables corresponded to the seasonal amount accumulated from the start to the end of the rainy season (PPTc) and the mean rainfall from the start to the end of the season (PPTm). All the agrometeorological images were re-sampled to a 1 km resolution in order to match the FAPAR metrics using a bilinear interpolation method.

Methods
The overall approach applied for elaborating the estimation models of herbaceous yield is presented in Figure 4. The rainy season onset date (SOSp) was calculated for each modeling grid cell using the criteria of at least 25 mm of rainfall received over a given dekad, followed by a total of 20 mm for the next two dekads as applied by [37] after [69]. The end-of-season date (EOSp) occurred when a climatological ratio between rainfall and potential evapotranspiration (i.e., PPT < PET/2) was observed [60]. SOSp and EOSp were in turn used to calculate the herbaceous vegetation WRSI. In addition to WRSI, SOSp and EOSp, we calculated 12 other agrometeorological variables for the growing periods of each year (Table A1), particularly actual evapotranspiration (AET), water deficit (WDEF) and water surplus (WSUR) that accumulated during each of the four herbaceous vegetation growth phases (illustrated in Figure 3): initial (i); vegetative (v); flowering (f); and ripening (r). The seasonal fractions assigned to these four phases were 23%, 39%, 23% and 15% respectively, (numbers taken from the Sahelian Transpiration, Evaporation, and Productivity [STEP] model) [40]. The computed rainfall variables corresponded to the seasonal amount accumulated from the start to the end of the rainy season (PPTc) and the mean rainfall from the start to the end of the season (PPTm). All the agrometeorological images were re-sampled to a 1 km resolution in order to match the FAPAR metrics using a bilinear interpolation method.

Methods
The overall approach applied for elaborating the estimation models of herbaceous yield is presented in Figure 4.

Explanatory Variable Selection for Herbaceous Mass Estimation
Over-fitting is a major drawback of machine learning methods [70] and can occur when there is a strong relationship between the explanatory variables. In order to avoid over-fitting, only relevant variables with no significant collinearity should be kept in a model. Redundant predictors can trigger substantial instability in a model's coefficients. Wrapper methods (e.g., backward and forward) are widely used in the search procedure for predictors' selection in environmental studies (e.g., [71][72][73]). As reported by [74] and [75] however, the principle of these methods lies in repeated hypothesis tests with the same data, which invalidates many of their statistical properties. In order to overcome this drawback, [76] proposed the recursive feature elimination procedure. This method is a sequential backward selection approach [77] that avoids refitting many models at each step of the search by including an importance-ranking criterion instead [75]. The recursive feature elimination algorithm available with R software [78] was used to identify less pertinent variables from the original dataset. In this algorithm, the random forest function [79] was used to compute the importance of the variables (i.e., mean decrease in accuracy when a variable is permuted) and rank them over subsets. The procedure's output with variables ranked from most to least important was then evaluated in order to eliminate interrelated explanatory variables [15], using the Variance Inflation Factor (VIF) indicator (Equation (5)). The least important variables with a VIF ≥ 10 [80] were eliminated one by one. This process was repeated until the VIF for all remaining variables was below 10.
where R²j is the correlation coefficient for the regression of each variable with the remaining ones.

Rule-Based Regression Tree and Model Building
The classification and regression trees (CART) approach, also called decision trees, is a data mining method used to identify patterns between a dependent variable and several independent predictor variables [81]. CART models can be used for either classification or regression, where classification trees predict classes and regression trees predict a continuous response [79,82,83]. We

Explanatory Variable Selection for Herbaceous Mass Estimation
Over-fitting is a major drawback of machine learning methods [70] and can occur when there is a strong relationship between the explanatory variables. In order to avoid over-fitting, only relevant variables with no significant collinearity should be kept in a model. Redundant predictors can trigger substantial instability in a model's coefficients. Wrapper methods (e.g., backward and forward) are widely used in the search procedure for predictors' selection in environmental studies (e.g., [71][72][73]). As reported by [74] and [75] however, the principle of these methods lies in repeated hypothesis tests with the same data, which invalidates many of their statistical properties. In order to overcome this drawback, [76] proposed the recursive feature elimination procedure. This method is a sequential backward selection approach [77] that avoids refitting many models at each step of the search by including an importance-ranking criterion instead [75]. The recursive feature elimination algorithm available with R software [78] was used to identify less pertinent variables from the original dataset. In this algorithm, the random forest function [79] was used to compute the importance of the variables (i.e., mean decrease in accuracy when a variable is permuted) and rank them over subsets. The procedure's output with variables ranked from most to least important was then evaluated in order to eliminate interrelated explanatory variables [15], using the Variance Inflation Factor (VIF) indicator (Equation (5)). The least important variables with a VIF ≥ 10 [80] were eliminated one by one. This process was repeated until the VIF for all remaining variables was below 10.
where R 2 j is the correlation coefficient for the regression of each variable with the remaining ones.

Rule-Based Regression Tree and Model Building
The classification and regression trees (CART) approach, also called decision trees, is a data mining method used to identify patterns between a dependent variable and several independent predictor variables [81]. CART models can be used for either classification or regression, where Remote Sens. 2016, 8, 668 9 of 23 classification trees predict classes and regression trees predict a continuous response [79,82,83]. We used the Cubist program [78], providing rule-based models using regression trees because the modeled herbaceous mass was continuous. Each tree was reduced to a set of conditional rules retrieved from the most important predictor variables [84]. These rules partitioned the independent variables (FAPAR metrics and/or agrometeorological products) into smaller groups and each of which was linked to a multiple linear regression model that predicted the dependent variable (herbaceous mass) [75]. This algorithm, based on if/then rules, is well suited for Sahelian ecosystems, which are characterized by high spatial heterogeneity in terms of soil type and fertility, rainfall mediated by run-off/run-on and species composition. The initial dataset was randomly separated into a 'training set' and a 'verification set' containing 70% and 30% of the data respectively. Then a simple check was made to ensure that all land cover classes are represented in the two sub-datasets. Model learning was done with the training set, using a boosting-like scheme called 'committees' [84], where each committee member corresponds to one regression tree. The number of committees was tuned by applying the commonly used 10-fold cross-validation in the model's performance estimation [85,86]. All instances of the initial dataset were used and the cross-validated root mean squared error (RMSE) applied in order to obtain the best value of the committee number. It should be noted that for each committee after learning, a subsequent member (multiple linear regression model) adjusts, for inaccuracies, in the prediction of the previous one and the final predicted value is a simple average of all predictions from the various members [75,81,84]. For comparative purposes, three cubist models were established: (i) the Vegetation Index-model (VI-model) including only FAPAR metrics; (ii) the Agrometeorological-model (AGRO-model) computed with agrometeorological variables; and (iii) the VIAGRO-model including both FAPAR and agrometeorological data.

Model Verification, Error Analysis and Yield Anomaly Computation
Model accuracy verification was performed using an independent set of samples that were not used (i.e., hold-out) for the training of the model. The Cubist models were trained using the training set (207 samples) and then used to estimate the herbaceous yield values of the verification set (90 samples). The validation accuracy was retrieved using the observed and predicted herbaceous yield of the verification set. The quality of the models was assessed by the RMSE and mean absolute error (MAE). The anomaly values were calculated pixel-wise by the ratio (in percentage) of the difference between the actual and long-term average of the herbaceous yield (or rainfall) and the 15-year long-term average.

Variable Selection and Model Development
After removing all the interrelated variables, a final dataset of 15 explanatory variables remained and were used for model development (Figure 5c). Three models were developed for estimating herbaceous yields using: (a) only FAPAR metrics (VI-model); (b) agrometeorological data (AGRO-model); and (c) both FAPAR metrics and agrometeorological data (VIAGRO-model). The set of variables used for each model is shown in Figure 5 and the model performance is shown in Figure 6.
The contribution of the input variables varied with each model. For the VI-model, the PEAK variable was the most important, followed by the right derivative (RDERIV) and EOS (Figure 5a). GSINT and SOS were the least pertinent variables as neither was used in model conditions or equations.
In the AGRO-model, the SOSp and water deficit during ripening (WDEFr) were the most important variables, whereas the least important were water deficit (WDEFv) and water surplus (WSURv) during the growth stage. The most important variable in the VIAGRO-model was PEAK, followed by RDERIV. The SOS had little or no importance in the VI-model and VIAGRO-model, unlike the SOSp, which played a major role in AGRO-model and VIAGRO-model. The validation statistics of the three models are given in Figure 6. The VIAGRO-model showed the best performance with R 2 = 0. 69

Spatio-Temporal Comparison of the Models' Output
All the established models provided herbaceous yield estimations with values increasing along a north-south gradient in the 2000-2015 period (Figure 7). The VI-model however, gave generally lower estimations than the two other models. Particular years, such as 2002 and 2014 when there was a considerable deficit in herbaceous mass production, were reflected well in both the VI-model and VIAGRO-model. The year 2010 however, was characterized by a high herbaceous mass production, which was clear in the AGRO-and VIAGRO-model estimations but less so in the VI-model. Overall, the VI-model underestimated high values, especially in the more southern regions; this drawback was corrected in the VIAGRO-model.

Spatio-Temporal Comparison of the Models' Output
All the established models provided herbaceous yield estimations with values increasing along a north-south gradient in the 2000-2015 period (Figure 7). The VI-model however, gave generally lower estimations than the two other models. Particular years, such as 2002 and 2014 when there was a considerable deficit in herbaceous mass production, were reflected well in both the VI-model and VIAGRO-model. The year 2010 however, was characterized by a high herbaceous mass production, which was clear in the AGRO-and VIAGRO-model estimations but less so in the VI-model. Overall, the VI-model underestimated high values, especially in the more southern regions; this drawback was corrected in the VIAGRO-model.

Spatio-Temporal Comparison of the Models' Output
All the established models provided herbaceous yield estimations with values increasing along a north-south gradient in the 2000-2015 period (Figure 7). The VI-model however, gave generally lower estimations than the two other models. Particular years, such as 2002 and 2014 when there was a considerable deficit in herbaceous mass production, were reflected well in both the VI-model and VIAGRO-model. The year 2010 however, was characterized by a high herbaceous mass production, which was clear in the AGRO-and VIAGRO-model estimations but less so in the VI-model. Overall, the VI-model underestimated high values, especially in the more southern regions; this drawback was corrected in the VIAGRO-model. As an indicator of the inter-annual variability of vegetation [33,87], the coefficient of variation (CV) of the herbaceous yield estimated by the three models over a 16-year period was used to assess the temporal variations in herbaceous yield across natural vegetation and agricultural land cover classes ( Figure 8). The inter-annual variations in herbaceous yield differed among the land cover classes. The AGRO-model showed the highest CV values, followed by the VIAGRO-model and the VI-model. For all the models, the HER class had the highest temporal variability, with an average CV of 17%, followed by SVO and AGR, with 14% and 13%, respectively. The lowest inter-annual fluctuations were observed in land cover classes with high herbaceous yields and woody cover, such as TVO (CV = 10%), TOC (CV = 9%) and SOC (CV = 9%).  Figure 9, the VIAGRO-model generally provided a blended estimation of the anomalies compared with the VI-model and AGRO-model. As an indicator of the inter-annual variability of vegetation [33,87], the coefficient of variation (CV) of the herbaceous yield estimated by the three models over a 16-year period was used to assess the temporal variations in herbaceous yield across natural vegetation and agricultural land cover classes (Figure 8). The inter-annual variations in herbaceous yield differed among the land cover classes. The AGRO-model showed the highest CV values, followed by the VIAGRO-model and the VI-model. For all the models, the HER class had the highest temporal variability, with an average CV of 17%, followed by SVO and AGR, with 14% and 13%, respectively. The lowest inter-annual fluctuations were observed in land cover classes with high herbaceous yields and woody cover, such as TVO (CV = 10%), TOC (CV = 9%) and SOC (CV = 9%). As an indicator of the inter-annual variability of vegetation [33,87], the coefficient of variation (CV) of the herbaceous yield estimated by the three models over a 16-year period was used to assess the temporal variations in herbaceous yield across natural vegetation and agricultural land cover classes ( Figure 8). The inter-annual variations in herbaceous yield differed among the land cover classes. The AGRO-model showed the highest CV values, followed by the VIAGRO-model and the VI-model. For all the models, the HER class had the highest temporal variability, with an average CV of 17%, followed by SVO and AGR, with 14% and 13%, respectively. The lowest inter-annual fluctuations were observed in land cover classes with high herbaceous yields and woody cover, such as TVO (CV = 10%), TOC (CV = 9%) and SOC (CV = 9%).  Figure 9, the VIAGRO-model generally provided a blended estimation of the anomalies compared with the VI-model and AGRO-model.  Figure 9, the VIAGRO-model generally provided a blended estimation of the anomalies compared with the VI-model and AGRO-model.

Season Onset/End Derived from FAPAR and Rainfall Data
As observed in Section 3.1, the onset/end metrics estimated from the FAPAR seasonal curve (indicating the plant growing season) and rainfall data (indicating the rainy season) played different roles in model establishment ( Figure 5), contrary to what was expected. The onset and end of the growing season (SOS and EOS from FAPAR) and of the rainy season (SOSp and EOSp from rainfall) are illustrated in Figure 10. The mean SOS and SOSp dates were delayed from the southern to northern land cover classes along the climatic gradient, reflecting the progression of the West African monsoon. With regard to the median values in boxplots, the growing season started in June for the three classes (SOC, TVO and TOC) located mainly in the south, whereas for the AGR, SVO and HER classes, the growing season started in July. The SOS occurred mainly in the same dekad as the SOSp for all land cover classes except HER, where it started about a dekad early (Table A2). The end of the growing season was concentrated between late October and November, being later towards the

Season Onset/End Derived from FAPAR and Rainfall Data
As observed in Section 3.1, the onset/end metrics estimated from the FAPAR seasonal curve (indicating the plant growing season) and rainfall data (indicating the rainy season) played different roles in model establishment ( Figure 5), contrary to what was expected. The onset and end of the growing season (SOS and EOS from FAPAR) and of the rainy season (SOSp and EOSp from rainfall) are illustrated in Figure 10. The mean SOS and SOSp dates were delayed from the southern to northern land cover classes along the climatic gradient, reflecting the progression of the West African monsoon. With regard to the median values in boxplots, the growing season started in June for the three classes (SOC, TVO and TOC) located mainly in the south, whereas for the AGR, SVO and HER classes, the growing season started in July. The SOS occurred mainly in the same dekad as the SOSp for all land cover classes except HER, where it started about a dekad early (Table A2). The end of the growing season was concentrated between late October and November, being later towards the south. On average, the EOS occurred about 2 dekads after the EOSp which occurred about late October. The EOS varied, occurring early for the HER class but late for TOC in November.

Linkage between Start of the Growing/Rainy Season and Annual Herbaceous Yield
This section analyzes the relationship between SOS and SOSp and the herbaceous yield. The range of the SOS and SOSp anomalies was narrow (−5 to +10%) compared with the herbaceous yield anomalies (−60 to +60%). A very weak and non-significant relationship (r = −0.30 and p = 0.28) was observed between the SOS and in situ herbaceous yield anomalies which were averaged over the study area (Figure 11a). In contrast, the meteorological variable SOSp had a highly significant relationship with observed herbaceous yield anomalies (r = −0.65 and p < 0.01) (Figure 11b). This was confirmed over all the land cover classes where SOSp achieved r = −0.74 for SVO, whereas the maximum r value for the SOS metrics was −0.47 registered for HER (Figure 11c). For all the land cover classes, there was no significant relationship (at the 95% significance level) between SOS and the observed herbaceous yield anomalies. The weakest relationships for the two metrics were observed for the open to closed classes dominated by either shrubs or trees (i.e., SOC and TOC respectively), being insignificant for both SOS and SOSp.

Linkage between Start of the Growing/Rainy Season and Annual Herbaceous Yield
This section analyzes the relationship between SOS and SOSp and the herbaceous yield. The range of the SOS and SOSp anomalies was narrow (−5% to +10%) compared with the herbaceous yield anomalies (−60% to +60%). A very weak and non-significant relationship (r = −0.30 and p = 0.28) was observed between the SOS and in situ herbaceous yield anomalies which were averaged over the study area (Figure 11a). In contrast, the meteorological variable SOSp had a highly significant relationship with observed herbaceous yield anomalies (r = −0.65 and p < 0.01) (Figure 11b). This was confirmed over all the land cover classes where SOSp achieved r = −0.74 for SVO, whereas the maximum r value for the SOS metrics was −0.47 registered for HER (Figure 11c). For all the land cover classes, there was no significant relationship (at the 95% significance level) between SOS and the observed herbaceous yield anomalies. The weakest relationships for the two metrics were observed for the open to closed classes dominated by either shrubs or trees (i.e., SOC and TOC respectively), being insignificant for both SOS and SOSp.

Model Development and Output Comparison
All three models gave a reasonable performance. The VIAGRO-model (including both FAPAR and agrometeorological variables) however, outperformed the models based solely on FAPAR metrics (VI-model) and agrometeorological variables (AGRO-model). Among the three models, the AGRO model had the weakest performance, which could be related to the fact that all applied variables are independent of internal factors (i.e., the influence of grazing) whereas all FAPAR metrics as well as the field data are influenced by grazing over time. All the models showed a coherent distribution of the estimated herbaceous yield, with values increasing along the north-south gradient, together with decreasing inter-annual variability. The saturation effect has always been a drawback in the optical vegetation products in relation to the high productivity of vegetation mass [88][89][90][91][92] and was consequently observed in VI-model. The simultaneous use of agrometeorological data and FAPAR metrics mitigated the saturation effect, being more sensitive to high greenness values. However, high biomass values (>3000 kg DM/ha) are still underestimated in the AGRO-and VIAGRO-models, which could be related to the few sites sampled in densely vegetated classes within the global dataset (see Table 1). Hence there is a need to establish more sites in southern Senegal in order to achieve greater accuracy of the herbaceous yield forecasting over the whole country. The agrometeorological data also reduced the discrepancy between herbaceous mass and FAPAR values,

Model Development and Output Comparison
All three models gave a reasonable performance. The VIAGRO-model (including both FAPAR and agrometeorological variables) however, outperformed the models based solely on FAPAR metrics (VI-model) and agrometeorological variables (AGRO-model). Among the three models, the AGRO model had the weakest performance, which could be related to the fact that all applied variables are independent of internal factors (i.e., the influence of grazing) whereas all FAPAR metrics as well as the field data are influenced by grazing over time. All the models showed a coherent distribution of the estimated herbaceous yield, with values increasing along the north-south gradient, together with decreasing inter-annual variability. The saturation effect has always been a drawback in the optical vegetation products in relation to the high productivity of vegetation mass [88][89][90][91][92] and was consequently observed in VI-model. The simultaneous use of agrometeorological data and FAPAR metrics mitigated the saturation effect, being more sensitive to high greenness values. However, high biomass values (>3000 kg DM/ha) are still underestimated in the AGRO-and VIAGRO-models, which could be related to the few sites sampled in densely vegetated classes within the global dataset (see Table 1). Hence there is a need to establish more sites in southern Senegal in order to achieve greater accuracy of the herbaceous yield forecasting over the whole country. The agrometeorological data also reduced the discrepancy between herbaceous mass and FAPAR values, with the VIAGRO-model having a smaller scatter of low values and high values closer to the 1:1 line (see Figure 6). The contribution and significance of the input variables varied among the models. In the VIAGRO-model and VI-model, the FAPAR PEAK was the most important variable, whereas SOS was the least important.
The relevance of PEAK in Sahelian grasslands has been demonstrated by [14,16]. The most important variable for the AGRO-model however, was SOSp and the least important was WDEFv. The latitudinal variation of the herbaceous yield estimated by the three models is mainly related to the dominant role of rainfall for vegetation growth along the north-south gradient. The variation of the rainfall onset as well as the mean annual rainfall along this gradient is a well-known characteristic of the Sahel. This is the basis of the transhumance system, where the herds move during the dry season from the north to the south, to take advantage of the more humid areas with available water and fodder biomass.
The HER land cover class, located mainly in the northern Sahelian regions had the highest temporal variability in herbaceous yield among all the land cover classes (see Figure 8). For this reason, anomalies in herbaceous mass were also the most pronounced in HER, particularly in extreme years (such as 2002 and 2010). Land cover classes with a higher herbaceous yield and annual rainfall, generally located in the south (i.e., SCO, TVO and TCO), had much lower inter-annual fluctuations. These results correspond with those reported by [33], who showed that the inter-annual variability in herbaceous yields increases as the climate becomes drier in accordance with latitude (i.e., to the north). The high temporal variability and magnitude of herbaceous mass anomalies (in percentage) in HER could also be related to the herbaceous species composition, which can vary greatly from one year to another. Species in HER are characterized mainly by annuals (Poaceae as Aristida mutabilis, Chloris prieurii and Dactyloctenium aegyptiacum) that are closely related to the annual rainfall and its intra-annual distribution [47]. These Poaceae species are also characterized by lower dry-matter production than species such as Andropogon amplectens (perennial) and A. pseudapricus (annual long-life cycle) prevalent in SCO and TVO areas, which have lower inter-annual mass fluctuations.

Model Applicability and Uncertainties
The three models can be used to estimate the spatial availability of herbaceous fodder at the end of each growing season across the Sahel in Senegal. The VIAGRO-model combines the advantages of both agrometeorological and FAPAR variables. The value of such a model compared with traditional assessments (i.e., empirical statistic relationship between plant production and FAPAR metrics) derives from the integration of agrometeorological information linking herbaceous yields with climate and biophysical processes, thus taking account of management intervention, soil water availability and species patterns. The VIAGRO-model is replicable in other Sahelian countries because the data used (i.e., satellite images and programs) is freely available and easy to access (see links in Section 2.2.). The models should be applied carefully however, because uncertainties caused by different collection dates of in situ data could lead to bias in their coefficients. Some dicotyledons, such as Zornia glochidiata, Alysicarpus ovalifolius and Tribulus terrestris, as well as grasses such as Tragus racemosus and Dactyloctenium aegyptium, shed their leaves before the end of the growing season (i.e., before the in situ data is collected). This can reduce the in situ herbaceous mass compared with the annual production peak. In addition, grazing during the growing season can also reduce the in situ herbaceous mass, particularly during the dry years where transhumant herds return earlier to the south, crossing the whole study area. The standard deviation of in situ data collected for the 24 monitoring sites during the 2000-2015 period are provided in Figure A1. Other sources of uncertainties could also be caused by the field herbaceous mass subsampling for the dry weight assessment and by the same Kc values applied for the whole classes, which may be not appropriate for densely vegetated areas. Future research can thus adjust Kc values to each land cover types based on average crop heights across Senegalese semi-arid areas and may lead to an adaptation of the GeoWRSI including a land cover mask to compute output variables with dedicated Kc values.

Management Implications of Models Results
The herbaceous yield estimated by the VIAGRO-model can be applied for the computation of both herbaceous production and anomaly detection per administrative unit across the whole Senegal. The biomass production could then be linked with the livestock number per unit to assess a prospective fodder balance which is useful to guide the management of rangelands and livestock movement [93]. To reduce the pastoral households' vulnerability in relation to livestock mortality, the livestock insurance domain is now being examined more closely in order to develop dedicated index based models [94,95]. However, classical NDVI based models have many flaws and do not realistically estimate the herbaceous biomass in shrub dominated areas which is important in correctly predicting livestock mortality. The VIAGRO-model shows good results across all Sahelian land cover classes and could be an important step towards an applicable insurance index.

Comparison of FAPAR and Rainfall-Based Onset/End Metrics
The onset/end of the rainy season marks a vital time for livestock managers, pastoralists and stakeholders in natural resource management in West African regions [96]. Several definitions of the onset of the rainy season have been proposed in literature in relation to the local onset of persistent rainfall [97][98][99]. With advances in remote sensing technology, other metrics of onset/end dates have been proposed for assessing the plant growing season. These metrics are generally retrieved using specific rules based on thresholds of the amplitude of seasonal vegetation indices and on certain rainfall amounts over given time periods. So far as we know however, no study has yet investigated the relationship between the onset/end metrics derived from satellite vegetation indices and from rainfall estimates in the West African Sahel. We have shown the dissymmetry of the onset/end of the rainy season along the Sahel bioclimatic gradient, with the onset date staggered over 3 months from May in the south to July in the north. The end date is spread over only 1 month, from late September in the north to early November in the south, which corresponds with observations by [48]. Our results also show that FAPAR based on SOS occurred at the same time as rainfall based on SOSp across the whole study area, except for HER land cover class where SOS was observed one dekad earlier (see Table A2). There are many explanations for this. The onset of the growing season or greening of plants follows the rains, except for woody plants that start foliation earlier in response mainly to an increase in air temperature and humidity, enabling them to produce green leaves before the first rains [100][101][102]. In addition, herbaceous vegetation is sensitive to low rainfall amounts, requiring less than 25 mm for the first dekad and 20 mm for the following two dekads to trigger the growth. The thresholds set for calculating SOS and SOSp could therefore be adjusted to better match the region's temporal dynamic. Apart from these botanical explanations, the reason could be related to data uncertainties. Almost no cloud-free optical satellite image is available at the start of season [59] and derived satellite data differ greatly from in situ measurements (e.g., NDVI) at this time [103]. Both the smoothing algorithm used for GEOV1 data (influenced by clouds) and the further smoothing via TIMESAT therefore add uncertainty to the FAPAR-based SOS calculation and our results indicate that this metric should be used with great caution at the annual scale.
Our results also showed that the onset dates of the rainy and growing seasons were more variable than their cessation, confirming the findings reported by [99]. The end date of both seasons occurs between late October and November for all land cover classes. The EOS is not entirely controlled by the end of the rains but depends also on the wilting of the herbaceous vegetation, which is determined by a biological clock (i.e., photoperiodicity) [15]. This was confirmed by our results where EOS dates, defined by the FAPAR reduction by 50% of the seasonal amplitude, occurred later than the cessation dates of the rainy season. The EOS delay is further influenced by crops that stay green longer, depending on their genetic features and soil tillage and by woody plants remaining green much longer, depending on their phenological behavior.

Early Assessment of Herbaceous Yield from Onset Metrics
For agricultural application, the SOS has attracted more attention from the scientific community and has been investigated in recent studies (i.e., [14,104,105]). It has frequently been used to investigate the land surface phenology trends in relation to climate variability [106]. After analyzing the SOS metrics against a proxy of biomass production in the Sahel however, [14] indicated that vegetation monitoring and biomass production forecasts should not be based on SOS in areas with non-significant correlation. Given that our study showed no significant relationship (at 0.05 p-level) between SOS and in situ herbaceous mass, this statement can be confirmed by our results and we do not recommend using this metric solely at the annual scale. The SOSp metrics however, were shown to have great potential for assessing herbaceous yield in the Sahelian region of Senegal. This was reflected in the VIAGRO-model where the SOSp variable was very important, confirming its superiority over SOS for detecting unfavorable rainfall conditions at the early stage of a growing season and therefore enabling early warnings of forthcoming risks of herbaceous mass deficits to be given to pastoral livestock managers and national stakeholders. Hence, even though the models presented here require a full season, this finding advances our knowledge towards an applicable early warning prediction [16].

Conclusions
Using the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) seasonal metrics computed with TIMESAT software and agrometeorological variables retrieved from the GeoWRSI program, we established three Cubist models for estimating herbaceous yield at the end of the season in the semi-arid areas of Senegal: (a) the VI-model with FAPAR metrics; (b) the AGRO-model with agrometeorological variables; and (c) the VIAGRO-model with both FAPAR and agrometeorological variables. All three models gave reasonable estimations of herbaceous yield over time and across land cover classes, among which those herbaceous areas with low woody cover showed the highest inter-annual variability and those in the south with higher woody cover showed lower variability over time. The VIAGRO-model gave the best estimation performance and indicated that the simultaneous use of agrometeorological data and FAPAR metrics improved the estimation accuracy and mitigated problems encountered with the sole use of FAPAR metrics (i.e., VI-model): (1) the saturation affecting optical remotely sensed vegetation data in areas of high vegetation productivity was attenuated; (2) the discrepancy between herbaceous mass and greenness (caused by species with high greenness and low mass production, or vice versa) was attenuated, with a weaker scattering around the low and high values closer to the 1:1 line (the additional use of agrometeorological data corrected for underestimations with the VI-model, particularly in sparsely vegetated areas); (3) the onset of the rainy season calculated from rainfall data was shown to be well suited for herbaceous mass assessment, in contrast to the onset of the growing season retrieved from FAPAR satellite data, which was not significantly related to herbaceous yields. Nevertheless, these metrics should be further investigated in order to improve our understanding of their temporal patterns and determine future setting parameters across Sahelian ecosystems. Table A1. Signification, unit and selection status after recursive feature elimination and variable inflation control of 17 agrometeorological variables provided by the GeoWRSI water balance model and used in the study. The 10 underlined variables correspond to those used for model development.