Comparison of Climate Reanalysis and Remote-Sensing Data for Predicting Olive Phenology through Machine-Learning Methods

Machine-learning algorithms for modelling olive-tree phenology rely largely on temperature data. In this study, we developed a prediction model based on climate data and geophysical information. Remote measurements of weather conditions, terrain slope, and surface spectral reflectance were considered for this purpose. Although the accuracy of the temperature data worsened when weather-station measurements were replaced with remote-sensing records, the addition of more complete environmental data resulted in an efficient prediction model of olive-tree phenology. Filter and embedded feature-selection techniques were employed to analyze the impact of variables on olive-tree phenology prediction, facilitating the inclusion of measurable information in decision-support frameworks for the sustainable management of olive-tree systems.


Introduction
Monitoring the timing of phenological events is key to understanding the effects of climate change on agroecosystems [1] and to scheduling appropriate control strategies. Phenological information can be derived through direct ground observation of crop development, though this is a time-consuming and labor-intensive process [2]. Proximal sensing techniques (e.g., phenocams) may allow for retrieving accurate phenological data on the canopy scale. Satellite time series can be conveniently used to determine vegetation indices and monitor phenological phases on the regional level [3]. Vegetation indices, such as the normalized difference vegetation index (NDVI) and the enhanced vegetation index (EVI), have been used to infer crop growth and changes in vegetation dynamics and land cover [4,5]. These indices provide useful radiometric and biophysical information for land-surface characterization, with different spectral-saturation effects [6], allowing for the study of phenological variation over large spatial and temporal scales [7]. The use of multitemporal data has proven useful for monitoring and characterizing spatial and temporal patterns of crop-cover changes, with improved accuracy compared to that of a single date [8]. Both NDVI and EVI are important indicators for estimating the start of the growing season (SOS) [9][10][11], and other phenological metrics [10].
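For reference, NDVI and EVI are computed from surface reflectance in the red, near-infrared (NIR), and (for EVI) blue bands. A minimal sketch using the standard formulas and the usual MODIS EVI coefficients; the reflectance values in the example are illustrative, not taken from the study's data:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - RED) / (NIR + RED)."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (nir - red) / (nir + red)

def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced vegetation index with the standard MODIS coefficients."""
    nir, red, blue = (np.asarray(a, float) for a in (nir, red, blue))
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)

# Reflectances are unitless fractions in [0, 1]; dense vegetation -> high NDVI.
print(round(float(ndvi(0.5, 0.1)), 3))  # -> 0.667
```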
Forecasting phenological crop phases is a valuable tool for crop management and decision making, particularly for yield prediction and growth modeling [2,12]. Temperature is the most important variable for explaining phenological changes [13]. The inclusion of temperature in process-driven models allows for determining thresholds below which physiological processes stop [14]. Models based on the thermal-time concept are widely used to predict phenological events with respect to thermal accumulation above a base temperature, generally calculated in hourly or daily (i.e., growing degree days, GDD) time steps up to a fixed amount that is crop- and variety-dependent [12,14,15]. However, GDD-based models are not sufficient for predicting crop phenology, as they are intrinsically uncertain and speculative [16]. Additional meteorological, geographical, and vegetation parameters can better capture environment-induced variability. Accurate predictive phenological models depend, therefore, on data availability and quality, in addition to robust modeling methods.
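As a concrete illustration of the thermal-time concept, a minimal daily GDD accumulation can be sketched as follows; the base temperature and the floor at zero are the conventional choices, and the temperature values in the example are hypothetical:

```python
def daily_gdd(tmin, tmax, t_base=10.0):
    """Simple daily GDD: mean temperature above the base, floored at zero."""
    return max(0.0, (tmin + tmax) / 2.0 - t_base)

def cumulative_gdd(daily_minmax, t_base=10.0):
    """Accumulate GDD over a sequence of (tmin, tmax) days."""
    total, out = 0.0, []
    for tmin, tmax in daily_minmax:
        total += daily_gdd(tmin, tmax, t_base)
        out.append(total)
    return out

days = [(8, 18), (10, 22), (5, 12)]   # hypothetical (tmin, tmax) in degrees C
print(cumulative_gdd(days))            # -> [3.0, 9.0, 9.0]
```

The cold third day contributes nothing, illustrating the threshold below which thermal accumulation stops.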
Due to their different observation scales, satellite or proximal-sensing approaches and ground-based surveys have rarely provided consistent information on the seasonal course of phenological events. Indeed, phenotypic plasticity determines a wide range of phenological responses [17]. In order to merge ground-based phenological observations and satellite-derived vegetation indices, statistical models based on meteorological parameters provide a potential solution [18,19]. In particular, machine-learning techniques may improve the accuracy of phenological models [20]. Dense meteorological networks and crop simulation models can be combined to predict phenological crop events at high resolution and to apply sustainable management practices. Machine-learning techniques were previously applied to crop-yield optimization and predictions of plant phenology [16]. Indeed, machine-learning techniques have proven useful in selecting appropriate environmental variables and identifying important features, thus improving the accuracy of models aimed at predicting ecological phenomena [21]. In particular, decision-tree models may allow for numerous variables to be managed, in comparison with parametric models, addressing issues related to multicollinearity [22].
Although climate change and crop production are strongly interconnected, the timing of phenological events, the prediction of their trajectories under climate-change scenarios, and the effects on crop production are still uncertain for olive trees. The prognosis of flowering [23,24] and yield [25], and the monitoring of olive-tree phenology, as well as of pests and diseases [26], are crucial tools for modeling the impact of climate change on this crop. In olive trees, a relation between phenological phases and climatic variables was found [27,28], which is also associated with topographic features [29,30]. The impact of both the environmental framework and agronomic management on olive-tree phenology was analyzed in several contributions [6,31,32].
Marchi et al. [33] used a comprehensive regional phenology and agrometeorological database of the Tuscany region (Italy) to implement a simple phenological model for olive trees. Afterwards, Oses et al. [34] adjusted the random-forest method with an optimized base temperature, considering as input the phase-specific cumulative GDD and the day of year (DOY). However, olive-tree agroecosystems across the Mediterranean region represent a complex landscape mosaic, and models based only on temperature are likely to reveal widely ranging temperature sensitivities. Here, we propose a novel approach based on vegetation-index accumulation quantities as input features to predict olive-tree phenology, which further improves previous phenological models for olive trees [27,28]. The daily sampling period of such data plays an essential role in creating features highly correlated with the phenological phase. Other novel features, such as the plant-phenology index described in [9] (and its respective accumulation), were tested without successful results. We first considered distinct datasets and derived features. The quality of the random forest proposed in [34] was contrasted across the temperature datasets provided by different agencies (weather stations vs. Copernicus data).
Then, filter and embedded feature-selection techniques [35] were employed to analyze the impact of different feature sets on phenology prediction. In addition, a reduction in input variables was pursued to simplify model complexity [36] and the final physical interpretation of the model [37]. Lastly, multiple machine-learning models were compared to select the most accurate forecasting model with the most compliant feature sets.

Materials and Methods
A flowchart of the entire methodology adopted in this study for the machine-learning olive-tree phenology prediction model is shown in Figure 1; it provides a graphical cue for the sequence of steps in the modeling process.

Original Data
Olive-tree phenology observation data were obtained from the monitoring network established in Tuscany (Italy), consisting of 21, 19, and 18 olive-tree orchards in 2008, 2009, and 2010, respectively. Olive-tree orchards were selected for their proximity to one of the agrometeorological stations of the regional network and for being representative of the areas most suited to the production of high-quality olive oil in the region, in both coastal and inland conditions. For each olive-tree orchard, 5 olive trees belonging to the Frantoio cultivar, in good health conditions and with homogeneous productivity, were selected in the core area of the orchard, excluding those at the edge. Phenological observations were carried out by trained field technicians who visited the orchards at different intervals throughout the year, according to the main developmental stages of the olive tree. A simplified BBCH scale was used to assess the phenological stage at each observation (the abbreviation derives from the names of the originally participating institutions: Biologische Bundesanstalt, Bundessortenamt, und CHemische Industrie).
Weather data, used for the phenological baseline model [33], were obtained from 19 different weather stations of the regional agrometeorological network of Tuscany. There are 822 phenological observations in the dataset that belong to 22 different olive-tree orchards, with an average of 34 phenological observations per location and an average of 274 phenological observations per year.
In addition to temperature, other climate variables were used, namely, precipitation, which may improve phenology prediction [38,39]; and vegetation indices, namely, NDVI and EVI. These indices can also be used to predict other phenological parameters [10]. We considered climate and MODIS-derived features to improve the prediction of phenological events based only on temperature, as in [34,40].
Three diverse datasets were utilized to calculate predictors for the target feature, the phenological phases of the olive tree. The first dataset was the weather registry provided by the open-access ERA5 dataset, the latest generation of ECMWF atmospheric reanalysis (Copernicus Climate Change Service, 2017). Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset. ERA5 provides hourly data, distributed in a fixed grid and with a native resolution of 9 km. Climate reanalysis data are available through the interface of Google Earth Engine (GEE) Data Catalog [41].
The second dataset, a static (time-invariant) satellite-derived dataset, contains the slope label provided by Conservation Science Partners. As a geophysical characteristic, the slope (or type of landform) was considered as an input feature, together with the latitude and longitude of the olive-tree orchards.
The last dataset corresponded to diverse collections of MODIS (see the Data Source column of Table 1), the Moderate Resolution Imaging Spectroradiometer sensor launched by NASA. The MODIS data products are also available via the GEE Data Catalog interface. In this case, the data provider was NASA LP DAAC.
All parameters used as features or for modeling purposes are reported in Table 1.

Created Features
GEEcumt: growing degree days computed from GEE temperature measurements, where t is the base temperature used.
Cum precipitation: precipitation accumulated from 1 January until DOY.
EVIcum: EVI accumulated from 1 January until DOY.
REDcum: RED accumulated from 1 January until DOY.
NIRcum: NIR accumulated from 1 January until DOY.

Data Preprocessing
The predicted variable is the phenological phase on a simplified BBCH scale for the olive tree. The BBCH scale uses numerical codes to label the different phases; however, since the distance between numeric labels bears no relevance, it cannot be treated as a numerical variable. A ranking of the BBCH scale for the olive tree was therefore generated, and a new variable assigning the development rank to each data point was created. This rank variable was used as the target variable in the prediction models. When regression models were applied, the output was discretized (with the round function) to obtain an integer value. The phenological parameter is predicted for the day of year (DOY), representing the day of observation.
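The ordinal recoding and the rounding of regression outputs described above can be sketched as follows; the BBCH codes listed here are illustrative, not the exact simplified scale used in the study:

```python
# Hypothetical simplified BBCH codes in developmental order; the actual
# scale used in the study may differ.
BBCH_ORDER = [0, 11, 51, 55, 60, 65, 69, 71, 75, 81, 89]
RANK = {code: i for i, code in enumerate(BBCH_ORDER)}

def to_rank(bbch_code):
    """Map a BBCH code to its ordinal rank (the model's target variable)."""
    return RANK[bbch_code]

def to_bbch(predicted_rank):
    """Discretize a regression output back to the nearest valid code."""
    r = int(round(predicted_rank))
    r = min(max(r, 0), len(BBCH_ORDER) - 1)   # clamp to the valid range
    return BBCH_ORDER[r]

print(to_rank(65))    # -> 5
print(to_bbch(4.6))   # -> 65
```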
ERA5 provides daily aggregated values for each of the climate reanalysis parameters described in Table 1. Following the Allen methodology [34,42], growing degree days (GDD) were calculated using the single sine method, which approximates hourly temperatures given the daily minimal and maximal temperatures from ERA5 and a base temperature, with no upper cutoff. Heat accumulation started on 1 January. The GDD values computed from ERA5 data are denoted by GEEcumt, with t being the base temperature. Lastly, we created the precipitation-accumulation feature, denoted by "cum precipitation". All accumulation-derived features were measured from 1 January until the day of the field observation of the target variable.
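A sketch of the single sine approximation (Allen's method) with no upper cutoff: a sine curve is fitted through the daily minimum and maximum, and the day's GDD is the area of that curve above the base temperature, with piecewise cases for days entirely above or below the base. The example temperatures are hypothetical:

```python
import math

def single_sine_gdd(tmin, tmax, t_base):
    """One day of GDD via the single sine method (Allen), no upper cutoff."""
    mean = (tmax + tmin) / 2.0
    alpha = (tmax - tmin) / 2.0
    if t_base <= tmin:
        # Whole day above the base: the area is simply the mean excess.
        return mean - t_base
    if t_base >= tmax:
        # Whole day below the base: no heat accumulation.
        return 0.0
    # Base temperature intersects the sine curve.
    theta = math.asin((t_base - mean) / alpha)
    return ((mean - t_base) * (math.pi / 2.0 - theta)
            + alpha * math.cos(theta)) / math.pi

# Example: base 10 C, day with tmin = 6 C and tmax = 18 C.
print(round(single_sine_gdd(6.0, 18.0, 10.0), 3))  # -> 3.017
```

Summing this quantity from 1 January to the observation day yields the GEEcumt features.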
Considering the daily measured enhanced vegetation index (EVI) and normalized difference vegetation index (NDVI), previously used for forest metrics [10], we derived the cumulative sum for each day. Due to the increasing nature of the BBCH output, the accumulation strategy was applied to the different MODIS features that could explain the phenological events. The accumulation of EVI or NDVI is denoted by "EVIcum" and "NDVIcum", respectively. For the red and NIR surface reflectance, we computed the corresponding cumulative sums, represented by "REDcum" and "NIRcum".
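These accumulation features reduce to per-series cumulative sums, e.g. with pandas; the orchard labels and index values below are made up for illustration, while the real series come from MODIS via GEE:

```python
import pandas as pd

# Hypothetical daily records; the real values come from MODIS via GEE.
df = pd.DataFrame({
    "orchard": ["A", "A", "A", "B", "B"],
    "doy":     [1, 2, 3, 1, 2],
    "evi":     [0.25, 0.50, 0.25, 0.25, 0.25],
})
df = df.sort_values(["orchard", "doy"])
# Accumulate from 1 January (DOY 1) separately for each orchard series.
df["EVIcum"] = df.groupby("orchard")["evi"].cumsum()
print(df["EVIcum"].tolist())  # -> [0.25, 0.75, 1.0, 0.25, 0.5]
```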

Splitting of Dataset and Evaluation Metrics
For each experiment, the dataset of phenological data was divided into disjoint training (65%) and validation (35%) sets, obtained with stratified sampling over the different years and locations. For each model, we created a grid of model parameters, and the parameters of each model were optimized by cross-validated grid search over this grid. The employed validation strategy was stratified 5-fold cross-validation, performed on the training set. The different experiments were compared according to the root-mean-square error (RMSE) on the validation set; the model with the lowest RMSE was the most accurate. We used the RMSE regression metric because the classes are strictly ordered, and RMSE correctly penalizes larger deviations of the predicted values from the observed values more than smaller ones.
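The split and grid-search procedure can be sketched with scikit-learn; the synthetic data and the reduced parameter grid below are stand-ins for the real dataset (stratified by year and location) and the full grid:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic stand-in data; the study stratifies the split by year and location.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.35, random_state=0)

param_grid = {"n_estimators": [100, 200], "min_samples_leaf": [2, 3]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid,
    scoring="neg_root_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_tr, y_tr)

# Held-out RMSE of the best parameter combination.
rmse = np.sqrt(mean_squared_error(y_va, search.predict(X_va)))
print(search.best_params_, round(rmse, 3))
```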

Baseline Model and Base-Temperature Optimization
The baseline model used as a benchmark for this study was random forest because, in recent studies [40], the random-forest model had better accuracy than the original GDD baseline technique detailed in [34,40]. Random forest operates by constructing several decision trees during training and outputting the most voted class as the prediction of all the trees. We used the scikit-learn package to build a pipeline with various machine-learning tools. More precisely, decision-tree-based methods such as random forest belong to the ensemble-methods module, which combines the predictions of several base estimators in order to improve generalizability and robustness over a single estimator. The common and distinct characteristics of the random-forest classifier and regressor methods are explained in the documentation at https://scikit-learn.org/stable/modules/ensemble.html#id6, accessed on 15 February 2020. The random-forest regressor method was then selected due to its more accurate results. In both cases, the random-forest method allows for decreasing the variance of the forest estimator, although this reduction in variance can come with a slight increase in bias. In order to avoid overfitting and ensure the stability and efficiency of the model, parameter estimation using grid search with cross-validation was performed over the following parameters and values: number of trees (number of estimators: 100, 200, 300), minimal number of samples required at a leaf node (2, 3, 5, 7), maximal depth of the tree (7, 9, 11, 13, 15), and minimal cost-complexity pruning (0, 0.0001, 0.001, 0.01, 0.1). As demonstrated in [34,40], the selection of the base temperature for GDD calculation plays an important role in determining model accuracy. A new value, below the original base temperature of 10 °C, was proposed for the base temperature in order to optimize olive phenology prediction.
Indeed, at lower base-temperature values, the RMSE of the model improved. This was tested using temperature data from weather stations and from the ERA5 dataset, that is, with GDD and DOY as input features. To this end, the RMSE of the baseline model was computed for different base-temperature values (from 0 to 14 °C).
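The base-temperature sweep reduces to evaluating the baseline model's cross-validated RMSE once per candidate base; a sketch with scikit-learn, where the GDD series, DOY values, and target are synthetic stand-ins for the station/ERA5 data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def best_base_temperature(gdd_by_base, doy, y, bases):
    """Return the base temperature whose (GDD, DOY) inputs minimize CV RMSE."""
    results = {}
    for t in bases:
        X = np.column_stack([gdd_by_base[t], doy])
        scores = cross_val_score(
            RandomForestRegressor(n_estimators=100, random_state=0),
            X, y, cv=5, scoring="neg_root_mean_squared_error")
        results[t] = -scores.mean()
    return min(results, key=results.get), results

# Synthetic stand-in series (one row per observation).
rng = np.random.default_rng(0)
doy = rng.integers(60, 300, size=150)
gdd_by_base = {t: np.maximum(doy * (20.0 - t) / 10.0
                             + rng.normal(0.0, 5.0, size=150), 0.0)
               for t in (0, 5, 10)}
y = doy // 30          # toy ordinal target standing in for the BBCH rank
best, rmse_by_base = best_base_temperature(gdd_by_base, doy, y, (0, 5, 10))
print(best, {t: round(v, 2) for t, v in rmse_by_base.items()})
```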

Feature Engineering
Exhaustive feature analysis was conducted to select the most meaningful feature combinations using the parameters reported in Table 1. The reduction in input variables contributed to simplifying the model complexity [36] and enhanced its physical interpretation [37]. We applied the following methods for feature selection.
Hierarchical clustering based on Spearman rank-order correlations is a useful tool for reducing the number of collinear features without decreasing model accuracy. This technique consists of picking a tolerance threshold over the Spearman rank-order correlation and keeping a single feature from each cluster (see Figure 2). The lower the threshold, the smaller the dimension of the input parameters; this technique is therefore employed to reduce the number of input parameters, provided model efficiency persists. Hierarchical clustering is shown in Figure 2 with the corresponding collinear feature relations.

Recursive feature elimination (RFE) removes less influential features from the original set. For this purpose, each feature is discarded in turn, and the accuracy of the model is computed with the remaining features. The supervised learning estimator is the random-forest regressor method. The least influential feature (the one whose removal yields the lowest mean RMSE) is permanently discarded, and the procedure continues by recursively eliminating less important features.
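A sketch of the clustering step with SciPy: features are clustered on 1 − |Spearman correlation| and one representative per cluster is kept. The linkage method and the toy features are illustrative choices, not necessarily those used in the study:

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

def cluster_select(X, names, threshold):
    """Keep one representative feature per cluster of correlated features."""
    corr, _ = spearmanr(X)                 # feature-by-feature correlations
    dist = 1.0 - np.abs(np.atleast_2d(corr))
    np.fill_diagonal(dist, 0.0)
    linkage = hierarchy.ward(squareform(dist, checks=False))
    cluster_ids = hierarchy.fcluster(linkage, threshold, criterion="distance")
    keep = {}
    for idx, cid in enumerate(cluster_ids):
        keep.setdefault(cid, idx)          # first feature seen in each cluster
    return sorted(names[i] for i in keep.values())

rng = np.random.default_rng(1)
a = rng.normal(size=100)
X = np.column_stack([a, a + rng.normal(scale=0.01, size=100),
                     rng.normal(size=100)])
# f1 and f2 are nearly collinear, so one of them is dropped.
print(cluster_select(X, ["f1", "f2", "f3"], threshold=0.5))
```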
Recursive feature addition (RFA), contrary to RFE, recursively adds the most influential feature. The initial feature set is composed of DOY and GEEcum0 (GDD with a base temperature of 0 °C). The recursive procedure consists of adding a single feature to the current set and computing the related RMSE interval for the random-forest regressor learning estimator. This computation is performed for each candidate feature, and the feature corresponding to the lowest mean RMSE is added to the feature set. The procedure continues by recursively incorporating the most meaningful features.
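A sketch of the RFA loop with a random-forest regressor as the learning estimator; the feature names echo those of the study, but the data are synthetic and purely illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def recursive_feature_addition(X, names, y, initial, n_select):
    """Greedily add the feature that minimizes cross-validated RMSE."""
    selected = list(initial)
    remaining = [n for n in names if n not in selected]
    cols = {n: i for i, n in enumerate(names)}
    while remaining and len(selected) < n_select:
        best, best_rmse = None, np.inf
        for cand in remaining:
            idx = [cols[n] for n in selected + [cand]]
            scores = cross_val_score(
                RandomForestRegressor(n_estimators=50, random_state=0),
                X[:, idx], y, cv=3, scoring="neg_root_mean_squared_error")
            if -scores.mean() < best_rmse:
                best, best_rmse = cand, -scores.mean()
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: the mandatory initial set is DOY and GEEcum0, as in the text.
rng = np.random.default_rng(0)
doy = rng.uniform(0.0, 1.0, size=80)
X = np.column_stack([doy, 2 * doy + rng.normal(0, 0.05, 80),
                     doy ** 2, rng.normal(size=80)])
y = 3 * doy + doy ** 2
names = ["DOY", "GEEcum0", "EVIcum", "noise"]
print(recursive_feature_addition(X, names, y, ["DOY", "GEEcum0"], 3))
```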
Subgroup comparison and RMSE-based weight-importance ranking: subgroup comparison is a computationally expensive technique in which all possible feature combinations are tested to determine the best one. Completely distinct feature combinations obtained similar accuracy results for the random-forest model, as detailed in Appendix A, Table A1. Consequently, we used the following metric to reduce the number of experiments and introduce a feature priority ordering.
Description of the metric: initially, an efficiency threshold was considered. The limit was set to 0.7, the RMSE corresponding to the reference model presented in [34]. The technique consists of computing the accuracy of the selected machine-learning model for all possible feature combinations and selecting those with an RMSE lower than the efficiency threshold. For each such experiment ex, the weight contributed to each included feature was the difference between 1 and the RMSE, i.e., 1 − rmse_ex (the lower the RMSE, the better). The total weight per feature was computed as the sum over all experiments containing it, as in Equation (1), where rmse_ex denotes the RMSE corresponding to experiment (feature combination) ex. In order to normalize these quantities, all weight sums were divided by the maximal weight value.
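The weight metric can be implemented directly; the experiment list below is a made-up toy example, and the function follows the description above (each qualifying experiment contributes 1 − rmse to every feature it contains, and sums are divided by the maximum):

```python
def feature_weights(experiments, threshold=0.7):
    """Normalized per-feature weights from feature-combination experiments.

    experiments: list of (feature_tuple, rmse) pairs.  Only experiments
    with RMSE below the efficiency threshold contribute.
    """
    weights = {}
    for features, rmse in experiments:
        if rmse >= threshold:
            continue
        for f in features:
            weights[f] = weights.get(f, 0.0) + (1.0 - rmse)
    top = max(weights.values())
    return {f: w / top for f, w in weights.items()}

runs = [(("DOY", "GEEcum0"), 0.65),
        (("DOY", "EVIcum"), 0.60),
        (("GEEcum0", "EVIcum"), 0.75)]   # above the threshold: ignored
print(feature_weights(runs))
```

Here DOY appears in both qualifying runs and therefore receives the maximal (normalized) weight of 1.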
Model comparison: preliminary experiments were carried out for several regression and classification methods. Among regression methods, we used K-neighbors, random-forest, Bayesian-ridge, logistic, and SGD regressors. As classifier methods, we applied K neighbors, decision tree, random forest, AdaBoost, gradient boosting, extra trees, Gaussian naive Bayes (NB), linear discriminant analysis, and quadratic discriminant analysis. Models with an RMSE higher than 2 were discarded. Models were selected on the basis of [34,40] and considering [43][44][45]. The novel prediction model is the extra-trees regressor, a technique more efficient than random-forest models, as demonstrated in [43]. This model is a meta-estimator that fits several randomized decision trees (called extra trees) on several subsamples of the dataset; by averaging the results, it provides more efficient predictions while avoiding overfitting. To compare the models, we considered the most influential input feature sets, i.e., the first 14 and 16 features ranked according to their relative performance (Table 2). Regardless of the number of considered features, RMSE values were similar. Consequently, reducing the feature sets did not affect the accuracy of model predictions, while saving time and computational resources.
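A minimal comparison of the random-forest and extra-trees regressors on synthetic data (the real comparison used the feature sets of Table 2 and the full model list above):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data for the phenology feature sets.
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

results = {}
for Model in (RandomForestRegressor, ExtraTreesRegressor):
    scores = cross_val_score(Model(n_estimators=100, random_state=0), X, y,
                             cv=5, scoring="neg_root_mean_squared_error")
    results[Model.__name__] = -scores.mean()  # mean cross-validated RMSE

for name, rmse in results.items():
    print(f"{name}: {rmse:.2f}")
```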

Temperature Accuracy
The RMSE of the baseline model was computed using temperature data from weather stations and from the ERA5 dataset (Figure 3). For both ERA5 and weather-station data, RMSE increased as the base temperature increased. In addition, the difference between the two trends was constant.
The most accurate baseline models were those based on temperature measured at the weather stations. Since we aimed to use climate reanalysis data, the base temperature was fixed at 0 °C for the remaining analysis and computations (see the corresponding feature definition, GEEcum0, described in Section 2.2).

Feature Engineering
Several filter feature-selection techniques [35] were tested to contrast the effect of the features on prediction. This feature analysis was followed by a decisive model comparison. An embedded feature-selection procedure was performed to propose the most efficient and adequate feature combinations.

Hierarchical Clustering
Two different performance levels were obtained. In the first case, illustrated in Figure 4a, the tolerance threshold ranged from 0 to 3.8 with a step size of 0.2, while in Figure 4b, the tolerance ranged from 0 to 0.9 with a step size of 0.1.
Model accuracy markedly increased with the tolerance threshold. For the sake of consistency, this exercise was repeated with a smaller interval, which resulted in slightly better results for a tolerance limit fixed at 0.8. This setting reduced the originally selected feature set, discarding 11 features: NDVI, NDVIcum, EVIcum, sur refl b07, sur refl b02, REDcum, NIRcum, GEE TMIN, GEE TMAX, dewpoint 2 m temperature, and GEEcum0. The resulting random-forest model showed an average RMSE of 0.6603.

Recursive Feature Elimination and Addition
The main contribution of this strategy was the inverse priority ranking of the complete feature set. RMSE statistics for each feature elimination are shown in Figure 5a with the feature inverse priority ordering. The discarded features in common with hierarchical clustering were REDcum, NDVI, sur refl b02, and dewpoint 2 m temperature. RMSE statistics for each feature addition are illustrated in Figure 5b with the feature priority ranking.

Table 2 shows the final feature priority ordering, a ranking of features based on their impact on the model. This ordering based on RFA became efficient when considering at least 6 features (see Figure 6a). In addition, the most efficient feature combinations are depicted in Figure 6b and detailed in Table A1, showing that multiple compositions provided equally competent models.

Model Comparison
A comparison of two distinct feature sets, the 14 and 16 features with the highest weight percentage (Table 2), was used to select the most compelling model. In general, efficient olive phenology prediction was associated with decision-tree-based models (see Figure 7a,b). In addition, extremely randomized trees improved on the predictions of the similar random-forest regressor method, as depicted in Figure 7b. In extremely randomized trees, randomness is exploited even further: as in random forests, a random subset of candidate features can be used, but instead of looking for the most discriminative thresholds, splitting thresholds are drawn at random for each candidate feature, and the best of these randomly generated thresholds is selected as the splitting rule. The randomness of the features (the hybrid dataset contained features of different natures) helped improve the accuracy of the extra-trees regressor model (Table 3).

Here, the main goal was to identify the most efficient feature combinations for the extra-trees regressor prediction model for olive phenology. The RFA methodology was applied with two distinct adding criteria, and subgroup comparison was performed. On the basis of the weighted-metric feature ordering described in Table 2, features were recursively added; the model started being efficient from the 11th added feature, with a mean RMSE of 0.6231 (Figure 8a). The RFA methodology was also applied as in Section 2.3.1 (Figure 8b); in this case, the performance of the model was acceptable with 6 features and a mean RMSE of 0.6214. Nonetheless, the strategy of computing all possible combinations was the most competent, as reported in Figure 8c. As detailed in Table A2, several feature combinations were equally capable, with a mean RMSE below 0.6.

Discussion
Different strategies were useful for selecting appropriate features to attain accurate models for olive-tree phenology prediction. The hierarchical-clustering and RFE techniques failed to propose a reduced feature set without losing prediction accuracy, whereas RFA was more effective in achieving this objective (see Figures 5b, 6a and 8a,b). However, a common feature-inclusion priority was not evident. A key point is that the RFA method was not applied identically in all cases. On the one hand, as the main objective of the study was to improve the results obtained in [34,40], we considered DOY and GDD as mandatory inputs in Figures 5b and 8b. The final priority feature list was then the result of recursively adding the feature corresponding to the most accurate combination. On the other hand, the recursively added features in Figures 6a and 8a were extracted from a previously imposed feature list. Nevertheless, we observed that a feature priority list derived from one prediction model was not consistent for other models. The imposed feature list of Figures 6a and 8a was the outcome of Section 3.2.3, which was devoted to extracting a unique priority ordering from several experiments with the reference random-forest model in [34,40]. In parallel, subgroup comparison extracted equally competent feature combinations for the random-forest model; consequently, this procedure was repeated once the most efficient model, the extra-trees regressor, had been highlighted.
Therefore, although machine learning improved prediction, a unique priority ordering of appropriate environmental variables was not found. Indeed, separate feature combinations provided equally accurate results, as confirmed by subgroup comparison (see Table A1). Although a ranking based on subgroup analogies was provided, the best strategy for identifying the most efficient feature combination was the comparison of all possible compositions [46]. Nevertheless, such an automated methodology was an advancement per se for selecting predictor variables at varying hierarchical levels to predict future phenological characteristics [22]. This result points to the benefits of the method when large datasets are explored to identify novel important features, or combinations thereof, that may provide superior outcomes in extracting practical information from model forecasts.
The decision-tree-based random-forest and extra-trees models were the most efficient machine-learning models tested in this study. Classification and regression tasks were compared for both models, and the regression algorithms were found to be the most accurate. Although the phenological phases are categorical features, the categories have an increasing order, which may explain why regression performed better than classification algorithms.
The decision-tree-based extra-trees regression model produced the most accurate olive-tree phenology predictions with specific feature combinations (Table A2). When the combinations included slope and latitude as model features (lat in Table A2), geographic information was considered for phenology prediction. On the other hand, for generalization purposes, geographic features such as latitude and longitude could be discarded in order to properly adapt the model to new, previously unconsidered locations. As a matter of fact, though using geographic variables might allow the model to consider the spatial structure of the data, it would be difficult to apply the resulting models in areas other than the one tested in this study. The terrain of Tuscany (the region from which data were gathered to feed the model) is rather complex, including coastal plains, mountain ranges, hilly areas, and alluvial valleys. As such, it can be considered representative of different geographic areas.
Surface-reflectance-derived NDVI and EVI were present in numerous combinations and, for the first time to our knowledge, employed in olive-tree phenology prediction. Forecasting species-environment interactions through a combination of satellite observations and numerical models would help to optimize the monitoring of olive-tree phenology on a large scale. Since phenological events are influenced by drivers varying on different spatial scales (e.g., general circulation, local climate), the use of features generated at different spatial resolutions may allow for including in the model both landscape predictions that consider overarching controls and finer-scale projections that capture local variations.
Considering accumulated precipitation and temperature (cum precipitation and GEEcum0, respectively, in Table A2) proved to be beneficial to prediction accuracy. Indeed, equally efficient models could be generated with inputs of a completely different nature, including features derived from climate data and vegetation indices (ERA5- and MODIS-derived features, respectively). This implies that the unavailability of certain data may not invalidate olive-tree phenology prediction, which has important practical consequences for model implementation.

Conclusions and Future Work
The main goal of the DEMETER project (h2020-demeter.eu) is to deliver Internet of Things (IoT)-based services providing practical strategies to optimize agricultural production across different European landscapes. In this sense, machine-learning techniques such as decision-tree models have the advantage of handling numerous variables and allowing for the discarding of features that impede the implementation of local models. This would reduce costs throughout the whole production chain.
Data availability is a key factor in olive-tree phenology prediction, though the use of weather stations can be burdensome in terms of both instrument logistics and data management. Therefore, cost-effective alternatives for acquiring meteorological data need to be explored. The main results of this research demonstrate that the worsening of accuracy when using temperature-derived indices such as GDD can be compensated for with climate-, geography-, and vegetation-based model inputs. Therefore, a streaming satellite data service is advantageous for predicting and scaling olive-tree phenology in comparison with local data.
Since different feature sets provided equally efficient predictions, exploring larger datasets might allow for selecting further feature combinations, thereby increasing data-source options and phenology-prediction quality. Further improvements to olive-tree phenology prediction may derive from applying alternative methods, such as the Markov-chain approach, for managing meteorological variables [47][48][49][50][51] and implementing prediction models.