Sugarcane Yield Forecast in Ivory Coast (West Africa) Based on Weather and Vegetation Index Data

One way to use climate services in the case of sugarcane is to develop models that forecast yields to help the sector to be better prepared against climate risks. In this study, several models for forecasting sugarcane yields were developed and compared in the north of Ivory Coast (West Africa). These models were based on statistical methods, ranging from linear regression to machine learning algorithms such as the random forest method, fed by climate data (rainfall, temperature); satellite products (NDVI, EVI from MODIS Vegetation Index product) and information on cropping practices. The results show that the forecasting of sugarcane yield depended on the area considered. At the plot level, the noise due to cultivation practices can hide the effects of climate on yields and leads to poor forecasting performance. However, models using satellite variables are more efficient and those with EVI alone may explain 43% of yield variations. Moreover, taking into account cultural practices in the model improves the score and enables one to forecast 3 months before harvest in 50% and 69% of cases whether yields will be high or low, respectively, with errors of only 10% and 2%, respectively. These results on the predictive potential of sugarcane yields are useful for planning and climate risk management in this sector.


Introduction
The economy of Ivory Coast is strongly dependent on agriculture: in 2018, this sector accounted for 21.5% of GDP, was the source of nearly half of employment and accounted for 60% of the country's merchandise exports [1]. However, in the current context of climate change, the risks due to temperature rises and increased climate variability weigh heavily on Ivorian agriculture, especially because 95% of crops are rainfed in Sub-Saharan Africa [2] and are therefore highly dependent on climatic conditions. The climate in West Africa has changed significantly since the beginning of the 20th century. According to the last Intergovernmental Panel on Climate Change (IPCC) report, the temperature has increased by about 0.2 °C to 0.5 °C per decade, with a high confidence level [3]. Moreover, the interannual variability of rainfall has increased and rainfall patterns have undergone major changes: heavy rainfalls and dry periods have become increasingly frequent, with a high and medium confidence level, respectively [3]. In particular, Ivory economy compared to other speculations, the choice of a cash crop by a private company stems (i) from a request by SUCAF to improve the mainstreaming of climate information in their management and planning process to better adapt to climate change and (ii) from the findings of Vaughan et al. [18], who highlighted in a review of climate services studies on Africa that research has mainly focused on food crops and smallholders.
One of the objectives of the project is to be able to anticipate sugarcane production prior to harvest based on a set of cropping practice, meteorological and satellite data. There are few models available in the literature for the forecasting of sugarcane yields in West Africa compared to other geographies such as Australia, France, the United States, Brazil and South Africa. In those areas, mechanistic models such as APSIM [19] and CANEGRO [20] that simulate the biological characteristics of plant growth according to the climatic conditions (temperature, rainfall) have been used. Then, statistical models have been developed, ranging from ordinary least squares (OLS) regression [21,22] to more complex algorithms using machine learning such as the random forest method [23][24][25]. In parallel, new methods based on satellite data have emerged [26,27] enabling researchers to estimate yields over large areas at high resolution. In this study, a comparative analysis of linear regression and machine learning models for the forecasting of sugarcane yields has been carried out, using either meteorological data, satellite products or information on cropping practices at the Ferké 1 and 2 sites (14,600 ha) operated by the SUCAF company in northern Ivory Coast (West Africa).
This manuscript is organized as follows: Section 2 presents the study area and the data used for this assessment. Section 3 describes the methodology with (i) an introduction on the trends and breaks in the data over the study area, (ii) a presentation of the forecasting models involved and (iii) the strategy implemented to forecast yields at the plot level and over the entire sites using either meteorological data, satellite products or information on cropping practices. Section 4 shows the results and the discussions. Conclusions are presented in Section 5.

Studied Area
In the north of Ivory Coast, agriculture is composed of food crops (yam, maize, rice), annual cash crops (cotton, tobacco, sugar cane), perennial cash crops (cashew, mangoes, avocados) and livestock (cattle, goats) [28]. This region is characterized by a transitional tropical climate with average annual rainfall ranging from 1000 to 1300 mm (Figure 1). This zone experiences a wet season from April to October with maximum rainfall in August. During the rainy season, temperatures vary from 20 °C to 30 °C. The Harmattan wind, from the Sahara, blows from November to March and carries a significant amount of dust. This induces a rapid heating of the air during the day and a sharp decrease in temperature during the night: temperatures then vary from 18 °C to 35 °C. This climate is favorable for the growing of sugarcane.
Sugarcane goes through four growth phases. The first is the tillering phase, which requires average humidity and high, but not extreme, temperatures [29,30]. Then, it enters the pre-growth phase, in which water requirements are higher. During the high-growth phase, sugarcane requires a high quantity of water and high temperatures [31,32]. Finally, during the ripening phase, the drop in water supply and the large difference in temperature between day and night prevent the plant from flowering and encourage sugar storage [30,32].
Therefore, sugarcane planting takes place between October and March so that it can experience its period of strong growth during the wet season (from June to September) and its ripening phase between October and February, a period with scarce rainfall and a wide temperature range thanks to the Harmattan. Sugarcane is not replanted every year. After cutting, the ratoons regrow from their roots with a small loss of yield which increases with time [33]. The SUCAF Company administers the sugar cane plantations in Ferké 1 and 2 (Figure 1). The total area of the plots is 14,600 hectares, with the production of nearly 1 million tons of sugarcane per year. Within these areas, 80% of the cultivated plots are irrigated. The slash-and-burn method, which consists of burning the fields before harvest to facilitate the work of farm workers, is still used. However, because of the environmental problems and the yield losses it causes, it is gradually decreasing in favor of mechanized harvesting.

Sugarcane Yield
We used yield data in tons of sugarcane per hectare collected by SUCAF at the plot level of Ferké 1 and 2 from 2008 to 2020. For each plot, the SUCAF database also contains information about the plot surface, details on the quality of the production such as sugar content and the percentage of internodes attacked by caterpillars, and information on cropping practices: harvest date, irrigation status, variety used, number of new plant regrowth, life cycle length.
The number of plots cultivated per year ranged from 182 to 349 at the Ferké 1 site, and from 189 to 346 at the Ferké 2 site (the exact geographical location of the plots was only available for the Ferké 2 area from 2011 to 2020). In total, the database consists of 7424 yield measurements acquired between 2008 and 2020. Among these 7424 records, only 5097 were complete, i.e., they contained usable values for each of the cropping practices and the quality of the plant.

Meteorological Data
Daily data on temperature, evapotranspiration, insolation and relative humidity are available for the two climate stations located at Ferké 1 and Ferké 2 and managed by  They have less than 1% missing values for the 2007-2019 period.  Rainfall data came from 27 rain gauges located at Ferké 1 (available from 2007 to 2019) and 24 at Ferké 2 (from 1999 to 2018). These rain gauges were those used by SUCAF to monitor the rainfall in the region. The administration and maintenance of these rain gauges is carried out by SUCAF with occasional support from the National Meteorological Office (Sodexam).
However, the collected data contained some missing values. To determine whether a given station was usable, a 10% threshold was applied, a criterion commonly used to reject stations with too many missing values [35]. At Ferké 1, all stations had less than 10% missing values. At Ferké 2, 4 stations had a very large number of missing values (nearly 80%) and were therefore removed from the analysis. The other 20 stations, which met the 10% criterion, were retained and the correlation weighting method was used to fill in any missing values. This consisted of replacing each missing value with the average of the values at the nearest stations, weighted by their correlation coefficient with the station concerned. Teegavarapu and Chandramouli [36] showed that this method was both efficient and easy to implement to fill gaps in rainfall time series.
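The correlation weighting step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the `n_neighbors` parameter and the array layout (days in rows, rain gauges in columns) are assumptions, and because no gauge coordinates are given here, the "nearest" stations are approximated by the most strongly correlated ones.

```python
import numpy as np

def fill_gaps_correlation_weighted(rain, n_neighbors=3):
    """Fill NaNs in a (days x stations) rainfall array.

    Each missing value is replaced by the average of the same-day values at
    donor stations, weighted by their correlation with the target station.
    Assumption: donors are the most correlated stations (proxy for nearest).
    """
    rain = np.asarray(rain, dtype=float)
    n_days, n_stations = rain.shape
    filled = rain.copy()

    # Pairwise correlations, computed on days where both stations reported.
    corr = np.full((n_stations, n_stations), np.nan)
    for a in range(n_stations):
        for b in range(n_stations):
            both = ~np.isnan(rain[:, a]) & ~np.isnan(rain[:, b])
            if a != b and both.sum() > 2:
                corr[a, b] = np.corrcoef(rain[both, a], rain[both, b])[0, 1]

    for s in range(n_stations):
        for d in np.flatnonzero(np.isnan(rain[:, s])):
            # Donors: stations with data that day and a positive correlation.
            usable = ~np.isnan(rain[d]) & (np.nan_to_num(corr[s]) > 0)
            donors = np.flatnonzero(usable)
            if donors.size == 0:
                continue  # no usable donor: the gap is left as it is
            # Keep only the most correlated donors.
            donors = donors[np.argsort(corr[s, donors])[::-1][:n_neighbors]]
            weights = corr[s, donors]
            filled[d, s] = np.sum(weights * rain[d, donors]) / np.sum(weights)
    return filled
```

Restricting donors to positively correlated stations keeps the weights positive, so a filled value always lies within the range of the donor observations for that day.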

Satellite Data
Two types of satellite indices were used in this study, the NDVI (Normalized Difference Vegetation Index) and the EVI (Enhanced Vegetation Index). During photosynthesis, vegetation tends to absorb visible wavelengths (mainly in the red and blue spectral zones) and to reflect strongly in the near-infrared region. The NDVI provides a measure of photosynthesis and ranges between −1 and 1 [37]. It is defined as:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{VIS}}{\mathrm{NIR} + \mathrm{VIS}}$$
where NIR is the radiation in the near infrared region and VIS in the visible region. For sugarcane, NDVI is close to 0.2 at the beginning of the rainy season, then it increases to 0.7-0.8 at the end of the strong growth phase (during the maximum magnitude of the rainy season) and finally it decreases slightly during the ripening phase until the end of the rainy season [26].
As a comparison and based on the literature, another more recent vegetation index used was the EVI. It is defined as:
$$\mathrm{EVI} = G \times \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + C_1 \times \mathrm{Red} - C_2 \times \mathrm{Blue} + L}$$
where NIR (near infrared), Red, and Blue are the full or partially atmospheric-corrected (for Rayleigh scattering and ozone absorption) surface reflectance; L is the canopy background adjustment for correcting the nonlinear, differential NIR and red radiant transfer through a canopy; C1 and C2 are the coefficients of the aerosol resistance term (which uses the blue band to correct for aerosol influences in the red band); and G is a gain or scaling factor. The coefficients adopted for the MODIS EVI algorithm were L = 1, C1 = 6, C2 = 7.5, and G = 2.5 [38]. The EVI has greater sensitivity in areas of dense vegetation and reduces the influence of the atmosphere on the signal, providing less saturated results. NDVI is a measure of chlorophyll concentration, whereas EVI is a measure of structural variations in the vegetation cover. Both indices are related to biomass availability but the relationship with EVI is more direct than for NDVI [38].
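As a small numerical illustration of the two definitions, the following Python sketch computes both indices from surface reflectances. The function names and the reflectance values in the usage note are hypothetical; the default coefficients are the MODIS values quoted above.

```python
import numpy as np

def ndvi(nir, vis):
    """Normalized Difference Vegetation Index from NIR and visible reflectance."""
    nir, vis = np.asarray(nir, dtype=float), np.asarray(vis, dtype=float)
    return (nir - vis) / (nir + vis)

def evi(nir, red, blue, L=1.0, C1=6.0, C2=7.5, G=2.5):
    """Enhanced Vegetation Index with the MODIS coefficients as defaults."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    blue = np.asarray(blue, dtype=float)
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)
```

For instance, with illustrative reflectances NIR = 0.5, Red = 0.1 and Blue = 0.05, `ndvi(0.5, 0.1)` is about 0.67 and `evi(0.5, 0.1, 0.05)` is about 0.58.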
The satellite data used come from the MODIS Vegetation Index product, freely provided by NASA (https://modis.gsfc.nasa.gov, accessed on 19 May 2021). They provide NDVI and EVI data at different resolutions over a 16-day period. The algorithm used retrieves the best available pixel over the 16-day period according to the following criteria, in order of priority: low cloud cover, low angle of view and the highest possible value [38]. We used 250-m resolution images over the entire Ferké 1 and 2 areas, which is the best spatial resolution that has been proposed. The images were retrieved every 16 days from 18 February 2000 to 19 June 2020. These data have already been used to estimate sugarcane yields in Brazil [39] and India [40].
Satellite data are strongly affected by cloud cover, shadows, weather phenomena and noise introduced by sensors [41]. Therefore, it is necessary to clean up these data by smoothing and filtering anomalies. A Whittaker filter is used here. This smoothing is based on the minimization of a cost function describing the balance between fidelity to the measured values and the smoothness of the estimates. It is simple to use and depends on a single parameter (lambda) controlling this balance [42]. The Whittaker filter is a commonly-used method for smoothing NDVI data [43]. It is particularly efficient in filling large data gaps, and offers good accuracy and smoothing performance in comparisons with other methods [41,42]. A lambda value equal to 10,000 was used, as it provides good performance in terms of accuracy and efficiency. Figure 2 shows the actual EVI provided by the MODIS Vegetation Index product as well as the filtered EVI obtained with the Whittaker filter. To explain sugarcane yields, we used the maximum of the NDVI and the EVI over the year. These variables measure the maximum value of biomass in the plot. We also used the integral of NDVI and EVI, which measures the sum of biomass creation during the entire year. These variables have been used by Bégué et al. [26] to explain sugarcane yields. Figure 3 shows the annual mean value of the maximum and the integral of EVI over the 2001-2019 period.
Table 1. List of the variables used in this study. The variables marked with * are considered in the models using cropping practices and the quality of production because they have a significant effect on sugarcane yields.

[Table 1 columns: Type of Variables | Variables | Description | Unit. First variable type listed: Cropping. The remaining rows were not recovered.]
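The Whittaker smoothing and the two annual vegetation-index variables described above can be sketched in Python as follows. This is a dense NumPy illustration under stated assumptions, not the processing chain used in the study: the second-order difference penalty, the zero weight given to missing composites and the function names are choices made for the example; only lambda = 10,000 and the 16-day step come from the text.

```python
import numpy as np

def whittaker_smooth(y, lam=10_000.0, order=2):
    """Whittaker smoother: minimize sum w_i (y_i - z_i)^2 + lam * |D z|^2.

    NaNs in y get weight 0, so gaps are filled by the penalty term alone.
    Assumption: differences of order 2 (not specified in the text).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    w = (~np.isnan(y)).astype(float)           # 1 where observed, 0 where missing
    y0 = np.where(np.isnan(y), 0.0, y)         # placeholder value, weight is 0
    D = np.diff(np.eye(n), n=order, axis=0)    # (n - order) x n difference matrix
    A = np.diag(w) + lam * D.T @ D
    return np.linalg.solve(A, w * y0)

def annual_features(z, step_days=16):
    """Annual maximum and trapezoidal integral of a smoothed index series."""
    z = np.asarray(z, dtype=float)
    integral = np.sum((z[1:] + z[:-1]) / 2.0) * step_days
    return float(z.max()), float(integral)
```

A larger `lam` gives a smoother curve at the cost of fidelity to individual composites; for long series a sparse solver would be preferable to the dense solve used here.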

Trends and Breaks in Rainfall Data
First, trends and breaks in the data collected at the Ferké 1 and 2 stations were analyzed for the variables listed in Table 2. The analysis covered rainfall data, temperature data and sugarcane yields. Rainfall was described using the cumulative annual rainfall, the average rainfall on rainy days and the 12-month Standardized Precipitation Index [44]. The latter is an index typically ranging from −2 to 2 that statistically compares the monthly rainfall values for the year under consideration with the other years of the study. Then, the evolution of the mean, minimum and maximum temperature was studied. To detect breaks in the data, the Pettitt test was used [45]. This non-parametric test does not require any assumptions about the distribution of the data. For a time series $x_1, \dots, x_T$, the probability of having a break in the year K is approximately calculated as:
$$U_K = \sum_{i=1}^{K} \sum_{j=K+1}^{T} \operatorname{sgn}(x_i - x_j), \qquad p(K) \approx 2 \exp\left(\frac{-6\, U_K^{2}}{T^{3} + T^{2}}\right)$$
where T is the number of years. The presence of a trend in a period is assessed using the Mann-Kendall test. This nonparametric test is commonly used in the study of the long-term changes of climatological variables [46]. The null hypothesis of this test is that there is no trend in the data set (the data are independent and randomly distributed); the alternative hypothesis is that there is a trend in the data. To observe whether this trend is positive or negative, we looked at Sen's estimator of slope, defined as:
$$\beta = \operatorname{median}\left(\frac{x_j - x_i}{j - i}\right), \qquad 1 \le i < j \le T$$
This coefficient is a good estimate of the slope (i.e., the linear rate of change) of trends in the time series.
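The three statistics can be illustrated with a short Python sketch. It follows the textbook forms of the tests and rests on a few assumptions that are not stated in the text: the Mann-Kendall variance is computed without a correction for ties, the p-values are two-sided, and the function names are chosen for the example.

```python
import math
import numpy as np

def pettitt_test(x):
    """Return (number of observations before the most likely break, p-value)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    sgn = np.sign(x[:, None] - x[None, :])      # sgn[i, j] = sign(x_i - x_j)
    # U_k sums the signs between the first k values and the remaining ones.
    U = np.array([sgn[:k, k:].sum() for k in range(1, T)])
    k_best = int(np.argmax(np.abs(U)))
    K = abs(U[k_best])
    p = min(1.0, 2.0 * math.exp(-6.0 * K ** 2 / (T ** 3 + T ** 2)))
    return k_best + 1, p

def mann_kendall(x):
    """Return (S statistic, Z score, two-sided p-value), ignoring ties."""
    x = np.asarray(x, dtype=float)
    n = x.size
    i, j = np.triu_indices(n, k=1)              # all pairs with i < j
    S = float(np.sign(x[j] - x[i]).sum())
    var = n * (n - 1) * (2 * n + 5) / 18.0
    if S > 0:
        z = (S - 1) / math.sqrt(var)
    elif S < 0:
        z = (S + 1) / math.sqrt(var)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return S, z, p

def sen_slope(x):
    """Median of all pairwise slopes (x_j - x_i) / (j - i), i < j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(x.size, k=1)
    return float(np.median((x[j] - x[i]) / (j - i)))
```

With annual series of about a dozen values, as in this study, the approximate Pettitt p-value should be read with caution.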

Explanatory and Yield Forecasting Models
Three types of forecasting models were selected from the literature (Table 3), in order to compare their performance: (i) based on climatic and cropping management data at the plot scale, (ii) based on climatic data and yields averaged on Ferké 1 and Ferké 2 and (iii) based on satellite data at the plot scale. For each of the different models, an analysis of the correlations between the variables under consideration and performance was undertaken. Then, a yield forecast tool implemented several months before harvest was constructed, based on the correlation results already obtained. The aim here was to understand and to explain how yields at the plot level are driven by climate variables and cropping practices. For this purpose, each climatic variable cumulated or averaged, depending on its intensive or extensive character, is considered at the different phases of plant growth: the tillering phase from 10 to 11 months before harvest, the pre-growth phase from 8 to 9 months before harvest, the strong growth phase from 3 to 7 months before harvest and the ripening phase from 1 to 2 months before harvest.
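The aggregation of a climate variable over the four growth phases can be sketched as below, in Python. The dictionary layout (months before harvest mapped to a monthly value), the `how` argument and the function name are assumptions made for the illustration; the phase windows are those given above.

```python
import numpy as np

# Phase windows expressed in months before harvest (inclusive bounds).
PHASES = {
    "tillering": (10, 11),
    "pre_growth": (8, 9),
    "strong_growth": (3, 7),
    "ripening": (1, 2),
}

def phase_features(values_by_lag, how):
    """Aggregate monthly values per growth phase.

    values_by_lag maps months-before-harvest (1..11) to a monthly value.
    how="sum" suits accumulated quantities such as rainfall,
    how="mean" suits averaged quantities such as temperature.
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    agg = np.sum if how == "sum" else np.mean
    return {
        name: float(agg([values_by_lag[m] for m in range(lo, hi + 1)]))
        for name, (lo, hi) in PHASES.items()
    }
```

Indexing the months backwards from each plot's harvest date lets plots harvested in different months share the same phase definitions.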
The main risk of error in the model is the multicollinearity in the explanatory climate variables. However, the use of a single variable among Tmin, Tmax, Trange and DegreJ, combined with another among Prec, ETP, RHmean and Rg, strongly limits the risk of multicollinearity, as the variables within these two groups are highly correlated. The calculation of the variance inflation factor (VIF) verifies the absence of multicollinearity. The VIF must be less than 10 for all the variables in the model [50].
Then, in order to conserve the most explanatory and meaningful model from these variables, the "stepwise" method was used. Stepwise regression is a variable selection method that tests the addition and deletion of variables at each step of the process. It is frequently used in regression models with a large number of variables [47,51].
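The variance inflation factor check can be illustrated with a minimal NumPy sketch. The function name and the layout of `X` (observations in rows, explanatory variables in columns) are assumptions; the stepwise selection itself is not reproduced here.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X: 1 / (1 - R^2) of that column on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid.var() / target.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)
```

A model would pass the criterion used here when every returned value is below 10.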

Forecasting Model
The random forest algorithm [52] is a learning algorithm which can be used as a regression tool. Once trained, it builds a large number of decision trees and returns the mean of their predictions. The random forest package provided in R [53] implements this algorithm. By default, 500 trees are grown and the number of variables tried at each node is equal to the total number of variables divided by three. Since modifying these parameters does not have a significant impact on the results, the default values were retained. The yield forecasting model consists, for a given year, of training the algorithm over all the other years in order to obtain a forecast for this year. This method is equivalent to performing a cross-validation and prevents data from the forecasted year from being used for training [54].
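Two metrics are used to measure model performance: the coefficient of determination between forecasted and actual yields, which measures the explanatory power of the model, and the root mean square error (RMSE), which measures the error of the estimation and is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}$$
where $\hat{y}_i$ and $y_i$ are the forecasted and actual yields and $n$ is the number of forecasts.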
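The leave-one-year-out procedure and the two scores can be sketched in Python as follows. To keep the example dependent on NumPy only, an ordinary least squares fit stands in for the random forest; this swap, the function names and the reading of the coefficient of determination as the squared correlation between forecasted and actual yields are all assumptions of the sketch.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between actual and forecasted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def r_squared(y_true, y_pred):
    """Squared Pearson correlation between actual and forecasted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1] ** 2)

def fit_ols(X, y):
    """Stand-in learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def leave_one_year_out(X, y, years, fit=fit_ols, predict=predict_ols):
    """Forecast each year with a model trained on all the other years."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    years = np.asarray(years)
    y_hat = np.empty_like(y)
    for year in np.unique(years):
        held_out = years == year
        model = fit(X[~held_out], y[~held_out])   # training excludes this year
        y_hat[held_out] = predict(model, X[held_out])
    return y_hat
```

Any regressor can be plugged in through the `fit` and `predict` arguments, which is where a random forest implementation would go.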

Forecast of Mean Sugarcane Yields of Ferké 1 and 2 from Climate Variables

Explanatory Model
To reduce the noise due to plot management, the same type of model used at the plot level was implemented, but taking into account the yields and climate variables at the whole site over the years. Once again, the "stepwise" method was used to select the most representative and significant model. At the scale of the whole site, it is not possible to use the information on cropping practices which depend on the plot considered.

Forecasting Model
Due to the small amount of data at the site scale, it is not possible to train a random forest model to make yield forecasts. To circumvent this problem, the forecasting model consisted of applying to each year the regression model constructed on all the other years, in order to protect the model against spatial autocorrelation. Then, the correlation between forecasted yields and actual yields was observed and measured using the coefficient of determination and the RMSE.

Forecasting from Satellite Variables at the Plot Level
The exact spatial location of the plots was only available for the Ferké 2 area and for the harvest campaigns from 2011 to 2020. Therefore, the analysis will be restricted to this zone and to these years.

Explanatory Model
For each plot, the NDVI and EVI were taken as respectively equal to the mean filtered value of the set of pixels (250-m square) intercepting the geometry of the plot. The correlation between sugarcane yields and the maximum value of NDVI and EVI on the one hand, and the integral of NDVI and EVI over the whole year on the other hand, was then computed.

Forecasting Model
As in the first model, the forecasting model consists of training a random forest algorithm on all years other than the tested year, and then making a prediction for the tested year. The variables used to train this algorithm were the cropping practices and the satellite index restricted to the month of the prediction.

Comparison of the Models
In order to compare the different forecasting models, yields were classified into three terciles: low yield, medium yield and high yield. The ability of each forecasting model to place the yields in the correct category was then observed through a confusion matrix.
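A minimal Python sketch of this tercile-based comparison is given below, together with the accuracy, sensitivity and specificity scores used to evaluate it. Deriving the tercile thresholds from a reference set of actual yields, the class coding (0 = low, 1 = medium, 2 = high) and the function names are assumptions of the example.

```python
import numpy as np

def to_terciles(values, reference):
    """Class 0 (low), 1 (medium) or 2 (high) using the terciles of reference."""
    lo, hi = np.quantile(reference, [1 / 3, 2 / 3])
    return np.digitize(values, [lo, hi])

def confusion_matrix(actual, forecast, n_classes=3):
    """cm[a, f] counts cases of actual class a forecasted as class f."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, f in zip(actual, forecast):
        cm[a, f] += 1
    return cm

def accuracy(cm):
    """Proportion of correct forecasts."""
    return float(np.trace(cm) / cm.sum())

def sensitivity_specificity(cm, k):
    """Sensitivity and specificity of class k, treated as one-vs-rest."""
    tp = cm[k, k]
    fn = cm[k].sum() - tp          # class k cases forecasted as another class
    fp = cm[:, k].sum() - tp       # other cases wrongly forecasted as class k
    tn = cm.sum() - tp - fn - fp
    return float(tp / (tp + fn)), float(tn / (tn + fp))
```

For a toy case with actual classes `[0, 0, 1, 1, 2, 2]` and forecasts `[0, 1, 1, 1, 2, 1]`, the accuracy is 4/6, and the high class has a sensitivity of 0.5 and a specificity of 1.0: half of the high yields are missed, but no yield is wrongly announced as high.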
Three complementary metrics, not relying on prevalence, were then used to evaluate the performance of this classification. They are often used to assess classification models [55]: (i) accuracy, which measures the proportion of correct forecasts; (ii) sensitivity (the proportion of correctly identified points in a class); and (iii) specificity (the proportion of points correctly identified as not belonging to a class). In our case, the accuracy of the model measures its overall ability to correctly classify yields. The sensitivity of the high and low classes measures the model's ability to detect years with high or low yields. The specificity of these classes measures the model's ability not to consider the year's yields to be high or low when they are not.

Trends and Change-Points in Data
Table 4 shows all of the trends and change-points observed in our data. No change-point was observed for any of the rainfall indices. Regarding sugarcane yields, no trend was observed, as highlighted by Figure 4. There are significant increasing trends at the 5% threshold for Tmin and Tmax in Ferké 1 and 2 and at the 1% threshold for Tmoy in Ferké 1 and 2; these trends can be observed in Figure 5. For these indices, Sen's slope is between 0.1 °C and 0.2 °C per year.
Several studies have analyzed extreme rains and droughts in West Africa. These studies used data over longer periods than ours (at least 50 years). However, the data used here have the advantage of being more recent, coming directly from rainfall stations and having a high spatial density. In agreement with Sacré Regis M. et al. [56], who investigated changes in precipitation during the last 30 years using the CHIRPS dataset (blending satellite products and rain gauge data), no significant trend of precipitation was found in our study area.
Regarding temperature, statistically significant warming in Africa is considered to be evident in the literature [57][58][59]. Nevertheless, Barry et al. [57] calculated an increase in the annual maximum temperature of 0.27 °C per decade from 1981 to 2010 and an increase in the annual minimum temperature of 0.22 °C per decade over the same period, whereas in this study, an increase in both minimum and maximum temperature of approximately 1 °C per decade was found. This higher warming rate may be due to the shorter period of our study. Broadly speaking, the main limitation of this analysis is the short period of data availability, which can lead to high imprecision in the estimated trends.

Explanatory and Yield-Forecasting Models
In this section, the results of the three different statistical models introduced in the Methodology are shown. First, the correlations between yields and a set of variables are computed (explanatory model) and then, a test of forecasting yields for a specific year is performed (forecasting model). In that case, we consider only the data available before a particular month. The model is trained on all but one year, for which we then try to forecast yield values; the process is repeated to predict yields for all years.

Explanatory and Forecasting Models Using Climate Variables and Cropping Practices at the Plot Scale

Explanatory Model
Table 5 shows the stepwise model based on the plot-level regression of yields as a function of climate variables and cropping practices. The cropping practices included in the model are irrigation status, sugarcane variety, month of harvest, number of plant regrowth and length of life cycle. They are all significant at the 1% threshold and are not represented in the table. The model has variance inflation factors of less than 10 for all variables, which indicates the absence of problematic multicollinearity. The model with the lowest Mallows' Cp value was selected, to keep the variables with the most significant relation with sugarcane yields. This model is not necessarily optimal; however, it is not far from it and can be used to draw interesting conclusions on the capacity of meteorological variables to explain variations in sugarcane yields at the plot level. As a main result, cropping practices explain 51% of the yields at the plot scale and adding climate variables into the model increases the coefficient of determination by only 3%. Thus, cropping practices play a greater part in explaining yields at the plot level than climate variables.
A relationship between rainfall and yields is often observed [60,61] but the model using the combination of rainfall and maximum temperature demonstrated the best performance in explaining the variations of sugarcane yields. Table 5 shows that during the pre-growth phase, rainfall negatively affects yields. Binbol et al. [47] and Humbert [30] also found that during the pre-growth phase, too much rainfall can affect plant development, especially when drainage is inefficient. In addition, the accumulated rainfall during the ripening period has the strongest effect on yields. However, in this study, this relationship may be amplified by the particularly wet year 2015, which combined high accumulated rainfall during the ripening phase with high yields that were mainly due to good cropping practices. Indeed, according to SUCAF, irrigation management was particularly good that year, and the implementation of a new weeding method was very effective [62].
Maximum temperature also appears in some studies to be a good proxy of yields [51,63]. While increasing the maximum temperature is often considered beneficial for sugarcane yields [30,32,64], Deressa et al. [31] consider that temperatures above 35 °C can negatively affect plant growth. However, at Ferké 1 and 2, the average temperatures of the warmest month were always above 35 °C between 2007 and 2019, and reached 38 °C in some years. Thus, the negative coefficient linked to maximum temperature during the high-growth phase may be due to this quadratic relation between yields and maximum temperature.

Forecasting Model
Figure 6 describes the values forecasted by the algorithm compared to the actual values. An average coefficient of determination for each forecast equal to 0.51 and an RMSE value equal to 14.28 tons per hectare were obtained. These results show that the forecasted values have a smaller standard deviation than the actual values, meaning that the algorithm was not able to detect conditions leading to extreme yield values.

Cropping Practices
The model with cropping variables explained 51% of yields at the plot scale, whereas the models tested with climate variables alone explained only around 10% of the yields. Adding climate variables to the model with cropping variables increases the coefficient of determination by only 0.03. This confirms the importance of information on cropping practices at the plot level. In the random forest model, Figure 7 likewise shows that the importance of cropping practices greatly exceeds that of the climate variables. Thus, the climate variables provide little additional information to explain the variations in yields at the sugarcane plot level. Unrecorded cropping practices such as herbicides and fertilizers or planting and harvesting methods also have effects at the plot level. They can induce noise on yield values and, therefore, do not allow the statistical effect of climate variables to be properly observed.
The cropping practices considered vary from one study to another. For example, irrigation status, which is the most important variable in our model (Figure 7), was never present in the previous studies because all plots studied were either irrigated or rainfed. Therefore, it is difficult to compare the effect and relative importance of irrigation practice found in this study with previous studies. However, the number of new plant regrowth is a cropping practice that is often taken into account because its effect on yield reduction is well known [65]. Bocca and Rodrigues [60] and Hammer et al. [24] found that the importance of this variable greatly surpassed that of climate variables in all of their models. Similarly, according to Ferraro et al. [66], cropping practices such as the farm, the variety, the life cycle length, the month of harvest and the geographic characteristics had a greater effect on yields than climate variables.
Nevertheless, cropping practices, especially those that do not depend on the biological characteristics of the plant (variety, number of regrowth, etc.), are local, and are inherently dependent on crop management. It is possible that in areas where management is standardized and industrialized, these cropping practices will have a weak effect. For example, in the model developed by Oliveira et al. [25] in Brazil, where sugarcane production is more standardized and industrialized than in Ivory Coast, the importance of meteorological variables outweighs that of cropping practices.
In Ferké 1 and 2, cropping practices have clearly significant effects on yields. One of the limitations of our database is that it does not take into account some practices that, according to SUCAF, have an influence on sugarcane yields. These include the use of the slash-and-burn method, weeding methods and the quality of irrigation management. This lack of information accentuates the noise caused by cropping practices on yield values and makes it difficult to measure the effect of climate on these yields.

Explanatory and Forecasting Models Using Climate Data at the Scale of Ferké 1 and Ferké 2

Explanatory Model
The stepwise regression showed that the best-performing model was the one which explained the yields based on the average minimum temperature during the year. This model explained 38% of the yields and this relationship was significant at the 1% threshold (Table 6). Figure 8 shows the relationship between yields and minimum temperature at the zone level. In order to observe more precisely the influence of the minimum temperature on yields, four phases were studied: the tillering phase from January to March, the pre-growth phase from April to May, the high-growth phase from June to October and the ripening phase from November to December.
When analyzing the relation between minimum temperature and yields during these four phases, the minimum temperature during the tillering and the pre-growth phase explained the yield variations more significantly. Although Greenland [67] and Saithanu et al. [61] also found the positive effect of minimum temperature on yields during the year, Samui et al. [68] and Shrivastava [69] even observed that minimum temperature particularly influenced yields during the first three months after harvest, and thus during the tillering and pre-growth phases. During these phases, sugarcane does not tolerate extreme temperatures and needs high temperatures [30]. Our study confirms that in Ferké 1 and 2, minimum temperature is a key climate driver of the sugarcane yields especially during the pre-growth and the tillering phases.

Forecasting Model
Figure 9 shows the results of the forecasting model. The minimum temperature during the tillering and pre-growth period is able to forecast 23% of the yield variability over the years studied. Thus, this model forecasts the yields as early as May, i.e., four months before the beginning of the harvest.

Table 7 details the result of the regressions of sugarcane yields according to the annual maximum of the NDVI, the NDVI integrated over the year, the annual maximum of the EVI and the EVI integrated over the year. The simple equation estimated is:
$$Y_{it} = \alpha + \beta\, VI_{it} + \varepsilon_{it}$$
where $Y_{it}$ is the yield value for the plot i at the year t, $VI_{it}$ is the value of the vegetation index, $\alpha$ and $\beta$ are the intercept and slope to be estimated and $\varepsilon_{it}$ is the classical error term. The four relationships are significant (p-value < 0.001) and explain 16%, 35%, 31% and 41% of sugarcane yields, respectively. The integrals of the vegetation indices were more effective in explaining yield variations than the maximum of these indices. This is consistent with observations made by Bégué et al. [26]. In addition, the integrated EVI is slightly better at explaining yields than the integrated NDVI. Son et al. [70] also found in their study on rice yield prediction that, while both relationships are highly significant, switching from NDVI to EVI allows a slight increase in the coefficient of determination.

Forecasting Model
When the integral of EVI is truncated one month earlier, the coefficient of determination decreases by about 1% per month removed. After consultation with local experts, we consider the integral of the EVI from January to August in our forecasting model. This single variable explains 38% of the sugarcane yields at Ferké 2 between 2011 and 2020 with a high level of significance.
Figure 10 compares the yields forecasted by the algorithm with the actual yields for each year, after training a random forest regression algorithm on all other years, using the integral of EVI from January to August and the information on cropping practices. This model forecasts yields with a coefficient of determination of 62% and an RMSE of 13.37 tons per hectare. The standard deviation of the forecasted values is small compared to that of the actual values; the algorithm is therefore not able to forecast extreme yield values.
Figure 10. Predicted yields vs. actual yields in Ferké 2 from 2011 to 2020 using a random forest algorithm fed by the integral of EVI from January to August and cropping practices.
Table 8 shows the results of the categorization of yields after applying the forecast algorithm to each year, trained on all the other years. The metric used was the accuracy, which measures the percentage of correct classifications. The model using the integral of EVI was found to be the most accurate, with 65% of yields correctly classified. Table 9 shows the sensitivity and specificity for each yield category and for each model.
Table 9. Sensitivity and specificity for high, medium and low sugarcane yields for the 3 forecasting models studied.
All models have high specificities for both the low and high classes. This means that they will rarely forecast that yields are low or high if they are not. The sensitivity for these same classes is relatively low for both models at the plot level; the algorithm will most often forecast average yields.
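The evaluation scheme described in this section, in which each year is forecasted by a model trained on all the other years, can be sketched as follows. This is a minimal sketch and not the study's code: the stand-in model simply predicts the mean training yield and is NOT a random forest, and the function names and plot records are hypothetical. Any model exposing a `fit` and `predict` pair could be plugged in instead.

```python
from math import sqrt

def leave_one_year_out(records, fit, predict):
    """For each year, train on all other years and predict the held-out year.

    records: list of (year, features, observed_yield) tuples.
    Returns a list of (year, observed_yield, predicted_yield) tuples.
    """
    out = []
    for held_out in sorted({year for year, _, _ in records}):
        train = [(x, y) for year, x, y in records if year != held_out]
        model = fit(train)
        out.extend((year, y, predict(model, x))
                   for year, x, y in records if year == held_out)
    return out

def rmse(results):
    """Root mean square error over (year, observed, predicted) tuples."""
    return sqrt(sum((obs - pred) ** 2 for _, obs, pred in results) / len(results))

# Stand-in model (assumption, NOT a random forest): predicts the mean training yield.
def fit_mean(train):
    return sum(y for _, y in train) / len(train)

def predict_mean(model, features):
    return model

# Hypothetical plot records: (year, integrated EVI, yield in t/ha).
records = [(2011, 5.1, 70.0), (2011, 6.0, 80.0),
           (2012, 4.8, 60.0), (2012, 5.5, 70.0),
           (2013, 6.2, 90.0), (2013, 5.9, 80.0)]

results = leave_one_year_out(records, fit_mean, predict_mean)
print(f"RMSE = {rmse(results):.2f} t/ha")
```

Splitting by year rather than by random plot keeps all plots of the forecasted season out of the training set, which is closer to how the forecast would be used operationally.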

Models
From an operational point of view, the models should particularly avoid wrongly forecasting high or low yields, i.e., they must have a high specificity for these classes. Indeed, if the algorithm wrongly forecasts high yields for a specific year, this will have a high cost for the producer. The producer would have anticipated high yields and prepared for them (through the recruitment of additional farmers or investments in anticipation of a high-revenue year), and these costs would be wasted. On the other hand, if a high-yield year is not detected and is considered as average, this simply makes the model inefficient and does not cause any additional cost.
The model that performed best was the one using the integral of the EVI from January to August and the cropping practices. It showed the greatest accuracy and had a high specificity for the low and high classes: it wrongly forecasted low yields in only 2% of cases and high yields in only 10% of cases. Moreover, its sensitivity was acceptable for the low class: it detected low yields in 69% of cases. It was, on the other hand, less efficient in detecting high yields, which it detected in 50% of cases.
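The accuracy, sensitivity and specificity discussed above can be computed from paired actual and forecasted categories by treating each class in turn against the two others. The sketch below assumes this one-vs-rest reading of the metrics; the function names and the eight example labels are hypothetical and do not reproduce Tables 8 and 9.

```python
def sensitivity_specificity(actual, predicted, label):
    """One-vs-rest sensitivity and specificity for a single yield class.

    Sensitivity: share of true `label` cases that were forecasted as `label`.
    Specificity: share of non-`label` cases that were not forecasted as `label`.
    Assumes both the class and its complement occur at least once in `actual`.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a == label and p == label for a, p in pairs)
    fn = sum(a == label and p != label for a, p in pairs)
    fp = sum(a != label and p == label for a, p in pairs)
    tn = sum(a != label and p != label for a, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def accuracy(actual, predicted):
    """Share of cases whose forecasted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical yield categories, for illustration only.
actual    = ["low", "low", "medium", "medium", "medium", "high", "high", "medium"]
predicted = ["low", "medium", "medium", "medium", "high", "high", "medium", "medium"]

for label in ("low", "high"):
    sens, spec = sensitivity_specificity(actual, predicted, label)
    print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}, "
          f"false alarm rate={1 - spec:.2f}")
print(f"accuracy={accuracy(actual, predicted):.3f}")
```

Under this reading, the share of cases in which a class is forecasted when it did not occur is one minus the specificity, which is the quantity the operational argument above seeks to keep small.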

Conclusions
The vulnerability of the African agricultural sector to climate change is a threat that requires the implementation of adaptation strategies. Climate services are part of these adaptation strategies in helping stakeholders of the agricultural sector to make decisions. According to the assessment carried out on Nationally Determined Contributions by the FAO and WMO, 85% of countries (100/117) identified "climate services" as being a fundamental element for planning and decision-making in the area of agriculture and food security [14].
In this context, the development of models for forecasting sugarcane yields could enable producers to improve crop management and to anticipate years that are more or less good in terms of climate variables and yields. In addition, such a study can inform the national weather service (Sodexam) on the most relevant variables to be forecasted and the appropriate climate products to deliver to meet the needs of sugarcane producers.
First, our analysis of rainfall data in our study area showed that, despite positive Sen's slope estimates for most of the rainfall stations, there was no significant trend in the evolution of accumulated rainfall and drought years. Regarding temperature, a significant trend was observed for the minimum, maximum and mean temperature in Ferké 1 and 2 at the 5% threshold. The warming trend was approximately 1 °C per decade for all the indices.
Then, the study of different forecasting models showed (i) that using only climate variables enabled us to forecast future yields with moderate performance (R² = 0.23 using minimum temperature) and (ii) that using satellite variables provided better results. The integral over the year of a well-known satellite index (EVI) contributed to explaining 43% of the year-to-year variability of yields. The resulting forecasting model allowed the correct categorization of 60% of the yields. This model can detect high or low yields in 50% and 69% of cases, respectively, while being wrong in only 10% and 2% of cases, respectively. However, it should be noted that these results only consider the Ferké 2 area from 2011 to 2020. It would be necessary to validate these results over a longer period.
Further improvement of this model would require the use of higher-resolution satellite images. Although MODIS data have the advantage of being easily accessible and free of charge, the resolution of 250 m means that a single pixel overlaps several plots. Images at a resolution of 30 m, for example, could be an important source of improvement in the accuracy of our model. These modeling results from the "agro-climatic" Work Package of the CLIMSUCAF project will now have to be compared with those of the "socio-economic" Work Package, which aims to survey sugarcane growers, owners of rainfed plots and SUCAF managers of irrigated plots on (i) the sensitivity of their crops to climate variables, and (ii) their needs in terms of weather and climate information for more resilient management and sustainable exploitation of sugarcane. The results of this study will notably help to guide surveys and discussions with local stakeholders, and the results of the surveys will in return help to refine the models, taking into account parameters that had not previously appeared relevant.