Cereal Yield Forecasting with Satellite Drought-Based Indices, Weather Data and Regional Climate Indices Using Machine Learning in Morocco

: Accurate seasonal forecasting of cereal yields is an important decision support tool for countries, such as Morocco, that are not self-sufﬁcient in order to predict, as early as possible, importation needs. This study aims to develop an early forecasting model of cereal yields (soft wheat, barley and durum wheat) at the scale of the agricultural province considering the 15 most productive over 2000–2017 (i.e., 15 × 18 = 270 yields values). To this objective, we built on previous works that showed a tight linkage between cereal yields and various datasets including weather data (rainfall and air temperature), regional climate indices (North Atlantic Oscillation in particular), and drought indices derived from satellite observations in different wavelengths. The combination of the latter three data sets is assessed to predict cereal yields using linear (Multiple Linear Regression, MLR) and non-linear (Support Vector Machine, SVM; Random Forest, RF, and eXtreme Gradient Boost, XGBoost) machine learning algorithms. The calibration of the algorithmic parameters of the different approaches are carried out using a 5-fold cross validation technique and a leave-one-out method is implemented for model validation. The statistical metrics of the models are ﬁrst analyzed as a function of the input datasets that are used, and as a function of the lead times, from 4 months to 2 months before harvest. The results show that combining data from multiple sources outperformed models based on one dataset only. In addition, the satellite drought indices are a major source of information for cereal prediction when the forecasting is carried out close to harvest (2 months before), while weather data and, to a lesser extent, climate indices, are key variables for earlier predictions. The best models can accurately predict yield in January (4 months before harvest) with an R 2 = 0.88 and RMSE around 0.22 t. ha − 1 . The XGBoost method exhibited the best metrics. Finally, training a speciﬁc model separately for each group of provinces, instead of one global model, improved the prediction performance by reducing the RMSE by 10% to 35% depending on the provinces. In conclusion, the results of this study pointed out that combining remote sensing drought indices with climate and weather variables using a machine learning technique is a promising approach for cereal yield forecasting.


Introduction
Climate change will affect global crop production [1,2] and threaten food security in several regions of the globe including the Mediterranean areas that have long been identified as a hot spot for climate change [3,4]. It has been shown that a 1 • C increase in temperature would lead to a drop of 6% in global wheat production for instance [1]. Besides the expected change of the average characteristics of climate, including temperature and precipitation, extreme events can further reduce crop production. Indeed, drought, the frequency of which is expected to increase in the future [5], can be responsible for a 10% to 35% loss depending on its intensity, timing and duration [6]. The southern Mediterranean countries and Morocco, in particular, are characterized by a strong interannual variability in precipitation amounts and distribution and recurrent droughts that mainly affect rainfed crops among which wheat dominates with more than 90% of cultivated areas [7]. Within this context, achieving food security, one of the key points of the Sustainable Development Goals [8], relies on a reliable monitoring system of wheat production [2]. An early and reliable forecast of the pre-harvest cereal yield in large areas would assist decision-makers in order to anticipate important needs, especially in countries such as Morocco that are not always self-sufficient [9][10][11][12]. It would also help to identify yield gaps and to better understand the wheat response to local climatic and edaphic conditions [9,13,14].
Besides the agricultural statistics based on sample observations in the field, the monitoring and forecasting of wheat yields are mainly carried out using empirical regressionbased models or crop growth models based on biophysiological processes [15]. The latter is able to describe crop growth and yield response to weather conditions, soil, and management practices [16] and can provide a good estimate of final crop yield when accurate values of input parameters and meteorological forcing variables are available; a strong drawback for southern countries considering the sparsity of the ground-based networks. Another limitation arises for seasonal forecasting in relation to the forcing meteorological data during the period between the forecast date and harvest time [17]. Seasonal weather forecasts either based on historical weather observation [18][19][20], on weather generators [21] or on climate model outputs [22] remain very uncertain. Given these limitations, the majority of the national agriculture departments use empirical regression-based models to forecast yield over large areas. These models rely on the use of some selected variables or indicators of environmental conditions (agrometeorological, and/or remotely sensed data) as independent variables to forecast crop yield [12,[23][24][25][26]. In addition, as the quantity and the quality of observed data have increased in recent years, these models forecast crop yield with reasonable accuracy [24,27].
Weather data have long been used to explain crop yield variability [27][28][29]. In this context, Sierra and Brynsztein [30] have used temperature and precipitation as predictors to forecast wheat yield up to 3 months before the harvest in Argentina; Giri et al. [31] have used several meteorological variables to predict wheat yield at the district scale in India. The models developed in this study had an R 2 range between 0.6 and 0.92, depending on district's location. Nevertheless, the performance of the models was lower for some districts, which may be due to other variables influencing yields such as the soil type or the practical management. More recently, many research works have focused on establishing a relationship between remote sensing indices and observed crop yield [10,[32][33][34]. The main advantage of using remote sensing observations in crop yield forecasting is that they allow the obtaining of information on a large scale, independent of territorial boundaries. The Normalized Difference Vegetation Index (NDVI) is one of the most widely used variables to forecast the final crop yield at a large scale [15]. For instance, the Moderate Resolution Imaging Spectroradiometer (MODIS) and Advanced Very High-Resolution Radiometer (AVHRR) derived NDVI has been used to develop linear regression models to predict maize, wheat and rice yields up to 2 to 3 months before harvest [24,35,36]. Besides, other studies have used remote sensing drought indices, such as the Vegetation Condition Index (VCI) and the Temperature Condition Index (TCI) from AVHRR data, to forecast wheat yield in the United States [34], and soybean yield in Brazil, respectively [37]. Interestingly enough, it has been shown that the use of these drought indices to forecast crop yields in Spain outperformed models based on precipitation anomalies only [38]. This can be explained by the accurate detection of local drought conditions provided by these indices integrating information on climate and biophysical conditions when compared to indices based only on precipitation. In addition, the high spatial resolution of satellite products with regards to meteorological data provided by a coarse network of weather stations [38] may be an advantage, in particular for southern countries often characterized by sparse meteorological networks. Finally, several studies have also shown the impact of large-scale climate pseudo-oscillations on many components of the continental ecosystems including agricultural systems [39][40][41], and on the future monthly precipitation [41,42]. In light of this, large-scale climate indices and data, including El Niño Southern Oscillation (ENSO), the North Atlantic Oscillation (NAO) and Sea Surface Temperature (SST) have been used as predictors of crop yield in different regions over the world [41,[43][44][45]. In Morocco, in particular, wheat yields have been shown to be tightly linked to NAO value in December and to the leading mode of the SST in the tropical Atlantic [11]. In Australia, the large-scale climate indices related to ENSO have been incorporated into empirical models to predict wheat yield up to 3 months before the harvest [41]. Instead of using a single data source as predictors of crop yield, several studies have combined multi-source data to predict crop yield. For example, Cai and Sharma, [46] and Balaghi et al. [12] combined remote sensing data (NDVI) and weather data (rainfall and temperature) as predictors of rice and wheat yield in India and Morocco, respectively.
Most of the models developed in the previously cited studies are based on the classical Multiple Linear Regression (MLR) while the linkages between yields and potential predictors are likely to be non-linear. For this reason, non-linear machine learning algorithms have been employed to improve crop yield prediction [47]. Recently, several studies have examined the performance of machine learning algorithms such as Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boost (XGBoost), Artificial Neural Network (ANN) and Long-Short Term Memory (LSTM) for yield forecasting at county or province scales. They have used multi-source data as predictors, and they found that the non-linear machine learning methods showed a better performance for yield forecasting than the linear approach [48][49][50][51][52][53][54]. Schwalbert et al. [55] have used different machine learning algorithms (linear regression, RF and LSTM) to predict soybean yield at the municipality level in Brazil by using remote sensing data (NDVI, EVI, LST) and precipitation as predictors. They found that soybean yield can be forecasted with a mean absolute error of 0.42 t. ha −1 around 2 months before harvesting. Cai et al. [56] have also forecasted wheat yield in Australia by incorporating various predictors into SVM, RF and ANN algorithms. In their study, they used the enhanced vegetation index (EVI) from MODIS, solar-induced chlorophyll fluorescence from GOME-2 and several climate variables and they found that combining climate and satellite data achieved a high performance of wheat prediction (R 2 = 0.75). For Morocco, the existing literature on seasonal yield forecasting is limited. Balaghi et al. [12] proposed empirical linear regression models to forecast wheat yields up to 2 months before the harvest at a provincial and national scale. NDVI from AVHRR, rainfall sums and average monthly air temperatures were used. More recently, Lehmann et al. [45] have used SST and causal precursors from geopotential height anomalies at 500 hPa to forecast wheat yield anomalies at the country scale. Several studies have used the combination of multi-source datasets and machine learning algorithms to forecast crop yield including cereals [48][49][50][51][52][53][54]. However, the combination between remote sensing drought indices, weather data and climate indices as predictors of cereals yield has not been assessed yet.
In this context, the aim of this study is to investigate the potential of using machine learning for developing dynamic decision support systems for cereal production in Morocco, combining satellite-based drought indices, weather and climate data. Our specific objective is to develop empirical models that can forecast cereal yield early in the crop season (up to 4 months before harvest). More specifically, this study builds on previous work carried out in Morocco that highlighted biophysically sound linkages between wheat Remote Sens. 2021, 13, 3101 4 of 21 yields and weather data and climate indices (Jarlan et al. [11]) and between wheat yields and drought indices (Bouras et al. [10]). It also aimed to go further than Balaghi et al. [12] by analyzing the potential of climate and drought indices information to forecast yields earlier in the season and at a finer spatial scale than Lehmann et al. [45].

Materials and Methods
In this study, the target variable is cereal yield. The potential predictors are the satellite drought indices, weather data (rainfall and temperature) and climate indices derived from atmospheric and oceanic variables. In order to limit the number of agricultural provinces, a threshold of 90% of the national production was set: the 15 selected provinces corresponding to the most productive are displayed in Figure 1. The forecasting models are then built using the extensively used multi-linear regression approach and three nonlinear machine learning methods. An overview of the methodology is represented in the flowchart of Figure 2. Table 1 lists all the raw datasets with their sources. Table 2 displays the predictor variables derived from these raw datasets together with the time span of the year considered in the model based on biophysically sound linkages highlighted by previous studies. Table 1. Summary of the raw characteristics of the data sets used for yields prediction as well as yields "observations" information. All the datasets used for yields prediction are then averaged at the agricultural province and the monthly time scales (see text).  Air temperature ERA5 air temperature December [11,12] Rainfall ERA5 rainfall October-November and January-March [11,12] NAO Northern Hemispheric Teleconnection Patterns December [11] SCA Northern Hemispheric Teleconnection Patterns January [11] Atlantic Tripole SST February [11] Atlantic Niño SST October [11] Remote Sens.    compared to the model developed at a regional level based on a leave-one province-out approach.

Results
Results are organized around three sections dedicated to: (1) the assessment of the best combination of predictor datasets; (2) the performance of the seasonal forecasting models as a function of the lead times before harvest; and (3) the evaluation of the addedvalue of developing a model separately for each group of provinces.

Choice of Input Data sets
In order to identify the best combination of input data among the satellite-based drought indices, the weather data and the climate indices, the seasonal forecasting models of cereal yields were developed using the different combinations of input data within the season, about 1 month prior to harvest in April, by considering all available predictors from October to April (Table 2). All the provinces are considered to build a so-called global model. The statistical metrics for the different combinations of input datasets and for the different methods are reported in Table 3. The results presented in Table 3 show that the

Study Area
Morocco is a North African country ( Figure 1) with a semi-arid climate influenced by the Atlantic Ocean, the Mediterranean Sea and the Sahara [42]. Precipitation in Morocco is characterized by its strong spatiotemporal variability and a rainfall season-extending through winter and spring from November to April, which coincides with the cereal growing season. The northern part of Morocco receives higher amounts of precipitation which can rise to 900 mm while the center of the country is marked by low amounts of precipitation below 350 mm. Similarly, the high spatial variability of the temperature is noted. The regions located at high elevation (the high Atlas Mountains) are marked by low temperatures when compared to other regions in the country [10,57]. Cereals are the main rainfed crop occupying up to 90% of the rainfed usable agricultural area in Morocco. The early sowing takes place in November if significant precipitation occurs at this time, while the sowing can be extended to January in the case of delays in precipitation. Late sowing usually leads to a lower production compared to early sowing, due to both a decrease of the cropped areas because a large part of farmers will not seed if precipitation arrives late in the season and because the last stage of the season corresponds to periods of high temperature that may hamper yields [9]. Harvest takes place generally around the end of May.

Yield Data
Soft wheat, barley and durum wheat are the main types of cereals cropped in Morocco. Data on cereal crop productions and harvested areas over the study period 2000-2017 at the administrative provincial scale were gathered by the Economic Services of the Ministry of Agriculture in Morocco (https://www.agriculture.gov.ma/, accessed on 31 July 2021). The yearly cereal yield used in this study for each province was calculated as the ratio of the total crop production by the total harvest areas. The average cereal yield over the study period ranged from 0.7 t. ha −1 to 2.2 t. ha −1 depending on the province. The 15 selected provinces were classified into four groups with similar cereal yield interannual variability using a k-means algorithm based on the correlative distance [58]. More details about the cereal yield data and classification are provided in Bouras et al. [10].

Satellite-Based Drought Indices
Agricultural drought affects both vegetation and soil, the characteristics of which can be monitored by remote sensing observation [59]. We selected three extensively used satellite-based drought indices: the Vegetation Condition Index (VCI), the Temperature Condition Index (TCI) [60] and the Soil Moisture Condition Index (SMCI) [61]. The VCI, TCI and SMCI are the normalized anomalies of NDVI, Land Surface Temperature (LST) and soil moisture (SM), respectively. While the VCI is related to vegetation density and activity, the TCI is related to the thermal stress of vegetation and the SMCI describes soil moisture drought as it is based on soil moisture anomalies in the first centimeters. These indices were widely used in agricultural drought monitoring [10,38,62,63]. Bouras et al. [10] have analyzed the linkages between these indices and cereal yield at the provincial scale in Morocco. Their results have shown that the VCI in March and April during the heading stage of wheat is highly correlated to cereal yield. For TCI, the highest correlation with cereal yield was observed around the development stage in January-February. Finally, SMCI was found to be connected with cereal yield earlier at the beginning of crop season during the emergence stage (December-January).
The VCI was calculated with NDVI from MODIS (MOD13A2 collection 6). The VCI compares the currently observed value of NDVI to the minimum and maximum NDVI values observed during a study period. As such, the VCI lies between 0 and 100, with a low VCI value associated with below-normal vegetation development while above-normal vegetation development is indicated by a high VCI value. TCI was computed in a similar way to VCI but using the LST from MODIS LST (MOD11A1 collection 6). The high TCI values indicate low temperatures, then favorable climatic conditions, while lower values of TCI reflect unfavorable conditions with high temperatures. The SMCI is a normalization of soil moisture. SMCI lies between 0 and 100; the lower values indicate unfavorable soil moisture conditions (very dry), and the higher values indicate favorable conditions (very wet). In this study, we used the SM COMBINED version 4.2. product provided by the European Space Agency Climate Change Initiative (ESA CCI) [64].

Weather Data
The linkage between cereal yield and weather data, including rainfall and temperature over Morocco, has been analyzed in previous studies [10][11][12]. The main results of these studies are: for rainfall, a positive correlation with cereal yields was observed in November-December. When the rainfall is abundant during this period, it will speed up plant emergence and increase cropped areas as already highlighted. Concerning temperature, a positive correlation with cereal yields was observed in December and January during the early stage. Low temperatures during this period cause poor emergence and reduce the number of ears leading to lower yields. By contrast, a negative correlation with cereal yield was observed in March meaning that high temperatures should be avoided during the grain filling stage occurring at this time of the year [65]. Weather data including air temperature at 2 m. above land surface and rainfall were extracted from the ERA5 re-analysis data set [66].

Climate Data
In a previous study, Jarlan et al. [11] have analyzed the relationship between provincialscale wheat yields in Morocco and large-scale climate. Significant correlations have been found between yields and the NAO in December (negative sign), the Scandinavian Pattern (SCA) in January (positive sign) and the leading modes of SST on the northern hemisphere ("Atlantic tripole" mode) in February (negative sign) and on the equatorial Atlantic (the socalled "Atlantic Niño" mode) earlier in the season in October (positive sign). In this study, we evaluate the potentiality of introducing this climate information in addition to satellite drought indices and weather data to predict cereal yields. The NAO and the SCA are part of the Northern Hemisphere Teleconnection Patterns [67]. These indices are distributed with a monthly time scale by the Climate Prediction Center (https://www.cpc.ncep.noaa.gov/, accessed on 31 July 2021). In addition to these atmospheric indices, the monthly sea surface temperature (SST) leading modes are computed from monthly NOAA SST v2 at 0.25 • resolution throughout the study on a North Atlantic window (20 • and an Equatorial Atlantic window (20 • S-20 • N, 80 • W-20 • E) for the "Atlantic tripole" and the "Atlantic Niño", respectively, following Jarlan et al. [11].

Machine Learning Methods for Cereal Yield Forecasting
In order to build the seasonal forecasting models, we relied on Multiple Linear Regression, and three non-linear machine learning algorithms extensively used for crop yield prediction [47], which are: Support Vector Machine (SVM), Random Forest (RF) and eXtreme Gradient Boost (XGBoost). The scikit-learn package in Python 3.7 [68] was used in this study.

Multiple Linear Regression
In Multiple Linear Regression (MLR) [69], the dependent variable y is linearly related to multiple independent variables x i = 1, . . . , n as: where y in this study is the predicted yield, x i (i = 0, . . . , n) are the satellite-based drought indices, the weather data and/or the climate indices and a i (i = 0, . . . , n) are the regression coefficients.

Random Forest (RF)
The RF algorithm was introduced by Breiman (2001) and is used for both classification and regression. RF for regression is an ensemble of multiple decision trees regression model; each tree provides its prediction and the optimal prediction of RF obtained by averaging the prediction of all decision trees regression in RF [70]. The RF-based model follows three steps to provide the optimal predictions. In the first step, the dataset is split into data subdivisions. In the second step, each data subdivision is used to develop a single decision tree representing a sub-regression model that gives its prediction. In the last step, predictions of all decision trees are averaged to provide the final prediction. The hyper-parameters that need to be tuned in the RF algorithm are the number of trees or the number of regression trees, the number of features to consider when looking for the best split, and the maximum depth of the tree. These hyper-parameters were tuned with a grid search method, described in Section 2.7.

Support Vector Machine (SVM)
The Support Vector Machine (SVM) was originally developed to solve classification problems and it was extended to solve regression problems, namely support vector regression (SVR) [71]. The SVM algorithm uses kernels [72]. By relying not only on the minimization of the distance to training data (the training error or empirical risk) but also by trying to limit the model "complexity" (i.e., to search a function as flat as possible: the structural risk), SVR may have, a priori, better generalization capacity (i.e., for data not contained in the training set) than MLR. The SVM regression-based model passes through two steps. In the first step, by using the kernel function, which can be linear or non-linear depending on the relationship between the independent (=Crop yield in our case) and dependent variables, the independent variables (remote sensing drought indices, weather and/or climate indices in our case) are transformed from the original space to a high-dimensional feature space. In the last step, a linear model is built by the new derived feature space to minimize the errors [73]. The SVM algorithm based on the Radial Basis Function (RBF) (the most popular choice in the literature) had two hyper-parameters: the penalty factor C aiming to find a trade-off between the fitting error and the model "complexity", and the kernel width gamma [74]. The SVM hyper-parameters were tuned with a grid search method.

eXtreme Gradient Boost (XGBoost)
eXtreme Gradient Boost (XGBoost) is a machine learning algorithm proposed by Chen and Guestrin (2016), derived from the Gradient Boosting Machines (GBM) [75,76]. The basic principle of the approach is to consider a set of weak learners (with high error) that are combined to develop a new stronger learner (with low error) through the introduction of training additive strategy. The main idea of boosting methods is to use the negative gradient direction of the model loss function, which was established previously, and then iteratively improves the accuracy of the model [77]. The hyper-parameters the number of gradients boosted trees, the maximum tree depth and the learning rate were tuned using the grid search method.

Model Evaluation
To select the best hyper-parameters of the ML algorithms, the comprehensive grid search (GS) was used in this study, to examine all possible combinations of the hyperparameters, and cross-validation (CV) was used to assess the performance of the model [78]. In GS, a set of values was attributed for each hyper-parameter and a set of trials was formed by assembling every possible combination of values. The evaluation was performed using k-fold cross-validation. The CV is the most employed technique for algorithm selection and evaluation, due to its simplicity and ability to avoid over-fitting [79][80][81][82]. In k-fold crossvalidation, the training data are randomly divided into k subsets and the hold-out method is repeated k times, such that each time, one of the k subsets is used as the validation set of the model constructed using (k − 1) subsets [82]. To evaluate the performance of the developed models, widely employed statistical metrics were used in this study. The coefficient of determination (R 2 ) reflects the degree of linear relationship between the observed and forecasted cereal yields. The mean absolute error (MAE) indicates the percentage of the average deviation of the forecasted yield from the observation. The Root Mean Square Error (RMSE) measures the discrepancy of forecasted yield around observations.
where O i is the observed yield, F i is the forecasted yield by the machine learning algorithm, O and F are the averages of the observed and predicted yields and n is the number of samples used for the machine learning model.

Experiment Design
The identification of rainfed cereal areas is an important point in order to obtain values of the remote sensing drought indices representative of rainfed cereal conditions at the scale of the agricultural province. The identification of rainfed cereal areas was carried out based on the joint use of two land cover maps: ECOCLIMAP-II at a 1 km resolution [83] (https://opensource.umr-cnrm.fr/projects/ecoclimap/wiki, accessed on 31 July 2021) was used to determine the cereal areas and the land cover map provided by the Climate Change Initiative (CCI) land cover project of the ESA [84] (https://www. esa-landcover-cci.org/, accessed on 31 July 2021) at 300 m resolution was used to isolate the rainfed fields as described in Bouras et al. [10]. Remote sensing drought indices were then aggregated to the agricultural province by a simple average of these pixels identified as rainfed cereal areas. Weather data were also averaged at the scale of the agricultural province but without considering the rainfed cereal mask because of their coarse spatial resolution of about 25 km (Table 1).
Multicollinearity between the predictors variables is well known to increase the variance of the coefficients for MLR. This can limit the generalization capability of the MLR models as well as hamper the interpretation of the coefficients. In this study, no method to reduce data redundancy was applied because a pre-selection of the time span of the year considered for each predictor was carried out based on previous studies to limit collinearity. Nevertheless, the prediction metrics of the MLR models could probably be improved by applying methods to reduce collinearity such as Principal Component Analysis.
Four experiments were then designed. The first experiment was constructed to identify the best combinations of input datasets among the satellite-based drought indices, the weather data and the climate indices that will reach the high performance in forecasting final cereal yield in Morocco. For this reason, all machine learning algorithms were applied using the different combinations of available input data collected from October to April (see Table 2): (i) Satellite-based drought indices only; (ii) Satellite-based drought indices and weather data; and (iii) Satellite-based drought indices, weather and climate data. The second experiment was conducted in order to assess the performance of the models as a function of the lead time before harvest from 4 to 2 months. In this experiment, we used the best combination of inputs data, determined from the first experiment, and the input data were collected from October to January, October to February and from October to March, based on Table 2, to build the forecasting model in January, February and March, respectively. Then, the performance of machine learning models was evaluated in March, February and January which corresponds to 2, 3 and 4 months before the harvest. In addition, the importance of each input data point was computed using the best machine learning algorithm in order to assess the contribution of each input data point for each lead time of prediction. The third experiment was designed to test the practical performance of the developed models. For this reason, the predictions are performed using a "leave-one year-out" approach consisting in predicting the yield value of one year using all the other years data to train the model (for instance, yield in 2017 is predicted based on a model trained using the 2000-2016 database). Finally, the last experiment was designed to assess the performance of using specific models developed separately for each group of provinces with regards to one global model used in the previous experiments. In this experiment, the accuracy (RMSE) of the global model developed for all provinces was compared to the model developed at a regional level based on a leave-one province-out approach.

Results
Results are organized around three sections dedicated to: (1) the assessment of the best combination of predictor datasets; (2) the performance of the seasonal forecasting models as a function of the lead times before harvest; and (3) the evaluation of the added-value of developing a model separately for each group of provinces.

Choice of Input Data Sets
In order to identify the best combination of input data among the satellite-based drought indices, the weather data and the climate indices, the seasonal forecasting models of cereal yields were developed using the different combinations of input data within the season, about 1 month prior to harvest in April, by considering all available predictors from October to April (Table 2). All the provinces are considered to build a so-called global model. The statistical metrics for the different combinations of input datasets and for the different methods are reported in Table 3. The results presented in Table 3 show that the statistical metrics of the model improve with the increase of the number of datasets used for prediction. In addition, all statistical metrics are improved when adding a dataset and this is also true for all the tested methods. The results showed that the yield variability is reasonably explained with satellite-based drought indices only, with R 2 values ranging from 0.67 (for MLR) to 0.81 (for XGBoost) and RMSE from 0.66 t. ha −1 (for MLR) to 0.44 t. ha −1 (for XGBoost). By combining satellite-based drought indices and weather data, the performances of all models are improved by 2-7% for R 2 and by 25-30% for RMSE. The best statistical metrics are obtained by combining the three datasets with a further improvement of the statistical metrics by about 11-45% for RMSE and 4-10% for R 2 depending on the used method. This means that climate indices such as the Northern Hemisphere Teleconnection Patterns (NAO and SCA) and the main modes of SST variability in the Atlantic contributes to improving the model performances. In addition, when comparing the different methods, the non-linear machine learning approaches (RF, SVM and XGBoost), outperformed the linear approaches (MLR) as already shown by various authors when applied to seasonal predictions of yields [53] and streamflow [85]. This clearly reflects that most of the relationships between yield and the considered predictors are non-linear and that the non-linear methods can obviously better capture these relationships than the linear method. Finally, the best algorithm for yield forecasting in our study is XGBoost, which predicts the yield with R 2 = 0.95 and RMSE = 0.20 t. ha −1 . This finding was corroborated by several studies for seasonal yield forecasting that showed a better performance of XGBoost when compared to other nonlinear machine learning approaches such as SVM and RF [52]. Interestingly enough, this model fits the forecasting error threshold usually accepted in European agro-statistics that is of about 0.20 t. ha −1 [86]. In the next section, the combination of satellite drought indices, weather data and climate indices are considered to predict cereal yield for several lead times before harvest. Table 3. Statistical metrics of the forecasting models for several input data combination and for the 4 methods in April (1 month before harvest). All available input data from October to April were used (see Table 2). The metrics are computed for the 15 provinces.

Model Performance as a Function of Lead Time before Harvest
In this section, the performance of the forecasting models using the three datasets are evaluated as a function of the leading time prior to harvest from January to March (from 4 to 2 months before harvest). The RMSEs and R 2 of the models are plotted as a function of the lead time in Figure 3 to investigate the prediction accuracy. In addition, the relative importance of each dataset is reported in Figure 4 using the XGBoost algorithm as the method providing the best statistical metrics for all lead times.
from January to March at Figure 3. The best method whatever the lead time is XGBoost as already shown, followed closely by RF based approaches. The models based on XGBoost explain 88%, 92% and 96% of yield variability (RMSE of 0.41, 0.34 and 0.22 t. ha −1 ) for forecasting from January, February and March, respectively. By contrast, the poorest results are obtained with MLR with a strong gap of metrics with regards to the non-linear machine learning approaches (R 2 is below 0.75 for MLR while the correlations for the nonlinear methods are above 0.90).
While a slight improvement of the model metrics is observed when going from January to February, considering predictors in March leads to a significant jump in the metrics with an RMSE close to the international standard of 0.20 t. ha −1 for the XGBoost method and, to a lesser extent, for the RF model. This is probably related to the very high correlation between NDVI around the crop development peak in March and wheat yields that were already shown by various authors [11,87] giving a large weight to VCI at this time. The dominating importance of the satellite drought indices in March for the model based on XGBoost supports this assumption (Figure 4).  Table 2).
Other striking comments can be made by analyzing the importance of the three datasets ( Figure 4): (1) the weather data dominates largely in January and, to a lesser extent, in February, while a strong shift is observed in March when satellite drought indices take the lead over the two other datasets. This is in agreement with the already observed high correlation between yields and precipitation around emergence in October and November and between yields and temperature in December during the tillering stage [11]; (2) Likewise, the importance of climate indices decreases with the lead time and their contribution is the lowest of the three datasets apart from in January when it contributes to 20% like the satellite drought indices. Indeed, the highest correlation with yields was found in December and January for NAO and SCA, respectively, while the correlations with the SST leading modes peak in October and February for Atlantic Niño and Atlantic Tripole modes, respectively. In addition, linkages between climate indices are, in particular, based on SST, and yields occur through teleconnection, meaning that the relationships are very All the available predictor variables at the time of prediction were used (see Table 2).
te Sens. 2021, 13, x FOR PEER REVIEW 14 of indirect. This means that when good quality precipitation and temperature data are ava able, they should be preferred to climate indices as they provide more direct informati on growing conditions; (3) satellite drought indices play a dominating role in early p diction in March only when they contribute up to 73% to the prediction accuracy. Nev theless, a significant contribution is observed in February (35%) and in January (20%). Th is because VCI and TCI were found to be significantly correlated to final yields in Janua and February and because SMCI is significantly related to yields as early as Octob around the emergence stage [10]. Indeed, the high moisture in the upper soil layers at th time facilitates the emergence and significant rainfall event during October-Decemb promotes the farmer to seed, leading to an increase in cereal production [10,12].  Table 2.
In order to assess the practical performance of the developed models to predict yie in Morocco, a "leave-one-year-out" experiment, mimicking the practical forecasting co ditions of a manager who wants to predict yields for the season to come based on t historical dataset, is tested. Figure 5 presents the average of the observed and the p dicted yields using the three non-linear methods (MLR was excluded with regards to poorest performance) for a lead time from 4 to 2 months before harvest. As already hig  Table 2.
The closer to harvest the forecast is carried out, the better the performance metrics as shown by the increase of the correlation coefficient and the drop of RMSE when going from January to March at Figure 3. The best method whatever the lead time is XGBoost as Remote Sens. 2021, 13, 3101 13 of 21 already shown, followed closely by RF based approaches. The models based on XGBoost explain 88%, 92% and 96% of yield variability (RMSE of 0.41, 0.34 and 0.22 t. ha −1 ) for forecasting from January, February and March, respectively. By contrast, the poorest results are obtained with MLR with a strong gap of metrics with regards to the non-linear machine learning approaches (R 2 is below 0.75 for MLR while the correlations for the non-linear methods are above 0.90).
While a slight improvement of the model metrics is observed when going from January to February, considering predictors in March leads to a significant jump in the metrics with an RMSE close to the international standard of 0.20 t. ha −1 for the XGBoost method and, to a lesser extent, for the RF model. This is probably related to the very high correlation between NDVI around the crop development peak in March and wheat yields that were already shown by various authors [11,87] giving a large weight to VCI at this time. The dominating importance of the satellite drought indices in March for the model based on XGBoost supports this assumption (Figure 4).
Other striking comments can be made by analyzing the importance of the three datasets ( Figure 4): (1) the weather data dominates largely in January and, to a lesser extent, in February, while a strong shift is observed in March when satellite drought indices take the lead over the two other datasets. This is in agreement with the already observed high correlation between yields and precipitation around emergence in October and November and between yields and temperature in December during the tillering stage [11]; (2) Likewise, the importance of climate indices decreases with the lead time and their contribution is the lowest of the three datasets apart from in January when it contributes to 20% like the satellite drought indices. Indeed, the highest correlation with yields was found in December and January for NAO and SCA, respectively, while the correlations with the SST leading modes peak in October and February for Atlantic Niño and Atlantic Tripole modes, respectively. In addition, linkages between climate indices are, in particular, based on SST, and yields occur through teleconnection, meaning that the relationships are very indirect. This means that when good quality precipitation and temperature data are available, they should be preferred to climate indices as they provide more direct information on growing conditions; (3) satellite drought indices play a dominating role in early prediction in March only when they contribute up to 73% to the prediction accuracy. Nevertheless, a significant contribution is observed in February (35%) and in January (20%). This is because VCI and TCI were found to be significantly correlated to final yields in January and February and because SMCI is significantly related to yields as early as October around the emergence stage [10]. Indeed, the high moisture in the upper soil layers at this time facilitates the emergence and significant rainfall event during October-December promotes the farmer to seed, leading to an increase in cereal production [10,12].
In order to assess the practical performance of the developed models to predict yield in Morocco, a "leave-one-year-out" experiment, mimicking the practical forecasting conditions of a manager who wants to predict yields for the season to come based on the historical dataset, is tested. Figure 5 presents the average of the observed and the predicted yields using the three non-linear methods (MLR was excluded with regards to its poorest performance) for a lead time from 4 to 2 months before harvest. As already highlighted, the statistical metrics improve when going from January to March but the models predict yields with reasonable accuracy as early as January. Beyond the average statistical metrics, the ability of the forecasting models to predict extreme values is another important feature of seasonal prediction. Within this context, the ability of the models to predict classified anomalies instead of absolute value is analyzed by partitioning the production in terms of below normal (average minus one standard deviation), normal and above normal (average plus one standard deviation) production. Like the statistical metrics, the extreme anomalies are better predicted when the lead time decreases, as one anomaly is correctly detected by the models for a prediction from January (2006)(2007) while all significant anomalies are properly reproduced by the three methods with a slightly better ability of the SVM approach at the expense of some false detection (such as in 2008-2009 when SVM predicts above normal production).

Model Performance at a Regional Scale
Cereal yields are dependent on many factors, such as weather conditions, local management and soil type, while the importance of each factor varies from one region to another. [88]. Therefore, the high variability of crop yield from one season to another is also marked from province to province. To investigate the added value of developing a specific model for each group of provinces separately, Figure 6 compares the RMSE of the predicted yield using a global model and a local model with different lead times using a leave-one-out approach for each province. Only the XGBoost algorithm is retained as it exhibited the best metrics in the previous sections. The results illustrate that the performance of yield prediction improved when the lead time decreases, as already highlighted, and that the metrics show a high variability from one province to another. The use of a "regional" model improved the RMSE whatever the provinces and the lead time: the RMSE values decrease in January by about 4% to 13% and by 12% to 32% in February and by 12% to 36% in March depending on the province. Interestingly enough, the better per-

Model Performance at a Regional Scale
Cereal yields are dependent on many factors, such as weather conditions, local management and soil type, while the importance of each factor varies from one region to another [88]. Therefore, the high variability of crop yield from one season to another is also marked from province to province. To investigate the added value of developing a specific model for each group of provinces separately, Figure 6 compares the RMSE of the predicted yield using a global model and a local model with different lead times using a leave-one-out approach for each province. Only the XGBoost algorithm is retained as it exhibited the best metrics in the previous sections. The results illustrate that the performance of yield prediction improved when the lead time decreases, as already highlighted, and that the metrics show a high variability from one province to another. The use of a "regional" Remote Sens. 2021, 13, 3101 15 of 21 model improved the RMSE whatever the provinces and the lead time: the RMSE values decrease in January by about 4% to 13% and by 12% to 32% in February and by 12% to 36% in March depending on the province. Interestingly enough, the better performances are obtained for some provinces that are known to be mostly covered by rainfed cereals, such as the ones located along the Atlantic coast (El Jadida JD, Settat ST and Khmisset KM), highlighting the problem of the scale mismatch between the typical size of the fields in the Mediterranean agriculture (<5 ha) and the coarse scale of the input predictor variables.

Discussion
The proposed approach is based on a combination of three different datasets to for cast grain yields. From these datasets, the predictors were carefully selected based on st tistically significant correlations with grain yields and biophysically sound mechanism as explained in previous studies [10][11][12]. Atmospheric and oceanic indices are taken proxies of local temperature and rainfall conditions. Interestingly enough, the impact oceanic circulation on local weather can be not concomitant because of the remote natu of these phenomena occurring though teleconnections processes, such as for the Atlan "El Niño" mode of SST variability. This explains why climate indices are important pr dictors for very early forecasting in January. Some authors have even based their mod ing experiment on large scale climate information only to forecast yield, including Le mann et al. [45] who showed that climate data (NAO, SST) could be used for the ear forecasting (from December) of wheat yield anomaly at the country scale in Morocc Nevertheless, the direct use of temperature and rainfall data should be preferred to the substitutes when gridded data of good quality exist, as shown by the dominating im portance of weather variables for forecasting in January and February. Later in the seas in March, when the crops are developed, drought indices providing a direct informati on the cover density, health and hydric status obviously took the lead with regards to t other two data sets. In brief, the satellite drought indices are a potential predictor of cere yield when the forecasting is done close to harvest, while weather data and climate indic

Discussion
The proposed approach is based on a combination of three different datasets to forecast grain yields. From these datasets, the predictors were carefully selected based on statistically significant correlations with grain yields and biophysically sound mechanisms as explained in previous studies [10][11][12]. Atmospheric and oceanic indices are taken as proxies of local temperature and rainfall conditions. Interestingly enough, the impact of oceanic circulation on local weather can be not concomitant because of the remote nature of these phenomena occurring though teleconnections processes, such as for the Atlantic "El Niño" mode of SST variability. This explains why climate indices are important predictors for very early forecasting in January. Some authors have even based their modeling experiment on large scale climate information only to forecast yield, including Lehmann et al. [45] who showed that climate data (NAO, SST) could be used for the early forecasting (from December) of wheat yield anomaly at the country scale in Morocco. Nevertheless, the direct use of temperature and rainfall data should be preferred to these substitutes when gridded data of good quality exist, as shown by the dominating importance of weather variables for forecasting in January and February. Later in the season in March, when the crops are developed, drought indices providing a direct information on the cover density, health and hydric status obviously took the lead with regards to the other two data sets.
In brief, the satellite drought indices are a potential predictor of cereal yield when the forecasting is done close to harvest, while weather data and climate indices are the key variables for earlier forecasting of cereal yield. Other variables could also be considered to improve the models' skills. Soil type and management practices (water harvesting techniques, complementary irrigation, fertilizing inputs, planting dates etc.), for instance, are key factors for crop growth. While information on management practices is difficult to consider because of a strong variability from one farm to another (apart from the planting date, see below), large scale spatial patterns of soil type could be extracted from global soil maps such as soil grid [89]. For instance, considering information on the soil type could probably improve the performance metrics of the models on Safi, characterized by shallow and pebbly soils with a poor nutrient content, which are significantly lower than its surrounding coastal provinces, such as El Jadida and Settat (see for the global models Figure 4a-c).
The scale mismatch between the scale of the fields (lower than 5 ha) and the coarse resolution of the predictor variables (at best 1 km for the remote sensing drought index) is an important issue as already highlighted. The use of higher spatial and temporal resolution remote sensing data, such as Landsat and Sentinel, could thus improve the performance of the models developed in this study for those provinces with very heterogeneous land cover. For further studies at the field scale, higher resolution products, such as surface soil moisture derived from Sentinel-1 data [90,91] and Sentinel-2 NDVI, should be considered. In addition, cereal yields may be related to other factors that were not considered in our study, such as planting date, soil properties, local climate conditions and other management aspects [92]. In particular, the planting dates can shift the growing season with regards to the average growing period from November to May considered in this study. Local climate conditions can also shift the cereal season. For instance, the milder temperature conditions encountered in the Beni-Mellal province, located in the foothills of the Atlas, shift the cereal season by about one month with a harvest occurring in June on average while May is usually the harvest time for the provinces located in the plain (most of the provinces of our study area). This means that the considered time span of the year of the predictor variables (December for temperature, for instance, see Table 2) could not be optimal for all the provinces because of this time shift. A potential refinement of the models would thus be to consider the optimal time span of the predictor variables for each province or each group of provinces separately (for the last experiment considering a specific model for each group of provinces) instead of the same time period used in this study.
Finally, a last more general question arises about the model generalization to different crops and sites. In this study, the predictors were selected according to both the timing of the crop season and to the key phenological stages of wheat and companion cereals such as barley. As the timing of the crop season is similar for wheat that is usually cropped in winter in the whole north African area, it could be expected that the time span of local predictors, such as weather variables and drought indices, should be close for the other Maghreb countries. By contrast, the impact of oceanic and atmospheric indices on local climate may be different from one region to another. For instance, Tramblay et al. [93] found NAO to be related to rainfall in Morocco and Algeria while Tunisian rainfall was more correlated to the Mediterranean Oscillation (MO; [94]). Ouachani et al. [95] highlighted that ENSO could be a driving pattern of precipitation in Tunisia through teleconnections. This means that the use of other indices, proxies of climate pseudo-oscillations, should be considered for the development of forecasting models for other sites. Likewise, the forecasting of yields for other crops will require a different choice of predictors and their associated time spans according to their key phenological stages. For instance, maize is known to be relatively drought tolerant during the grain filling stages on the contrary to wheat [96]. By contrast, water deficit early in the season around seedling may hamper maize from complete recovery, even with full irrigation, during the vegetative growth stages [97]. This may have critical implications for the choice of the time span of the remote sensing drought indices.

Conclusions
Crop yield forecasting provides critical and timely information to enable farmers to make quick decisions to increase yields through improving agricultural practices during the growing season. In addition, it allows the modeling of global and local market prices [98]. The main objective of our study was to develop an approach to forecasting cereal yield in Morocco based on multi-source data and machine learning techniques. To this objective, this study presents a methodology based on different machine learning approaches (MLR, SVM, RF and XGBoost) to predict the cereal yield over Morocco for several lead times prior to harvest using freely available datasets including satellite-based drought indices, weather data and climate indices. Our results show that combining satellite-based drought indices, weather and climate data as predictors of cereal yield provided a better forecasting accuracy than using any single data source. In line with our results, several studies pointed out that the use of multi-source data increases the accuracy of the machine learning model for yield prediction [52,54,56]. XGBoost outperformed the other machine learning techniques by explaining 93% of yield variation at the scale of the country (RMSE = 0.23 t. ha −1 ). Interestingly enough, the value of RMSE is close to the acceptance threshold of 0.2 t. ha −1 used in European agro-statistics [86]. It is also shown that the prediction accuracy increases as more observations along the growing season are added for all machine learning algorithms. Finally, the development of models at the regional level for each group of provinces improved the skills of yield prediction with regards to one "global" model applied to all provinces by decreasing the RMSE by about 4% to 36% depending on the province and the time of prediction, which is due to the high variability of cereal yield from one province to another.
The results presented in this study clearly showed that combining satellite-based drought indices, weather and climate data integrated into machine learning algorithms is a promising approach to forecasting cereal yields in Morocco. Moreover, the proposed approach provides a source of timely information needed for decision making during the growing season. In addition, this work could be used to map gain yield and yield gap at a provincial scale across Morocco. Then, the province with hotspots in terms of yield gap could be targeted for practice improvement and further research works.