Phenological Model Intercomparison for Estimating Grapevine Budbreak Date ( Vitis vinifera L.) in Europe

: Budbreak date in grapevine is strictly dependent on temperature, and the correct simulation of its occurrence is of great interest since it may have major consequences on the ﬁnal yield and quality. In this study, we evaluated the reliability for budbreak simulation of two modeling approaches, the chilling-forcing (CF), which describes the entire dormancy period (endo- and eco-dormancy) and the forcing approach (F), which only describes the eco-dormancy. For this, we selected six phenological models that apply CF and F in di ﬀ erent ways, which were tested on budbreak simulation of eight grapevine varieties cultivated at di ﬀ erent latitudes in Europe. Although none of the compared models showed a clear supremacy over the others, models based on CF showed a generally higher estimation accuracy than F where ﬁxed starting dates were adopted. In the latter models, the accurate simulation of budbreak was dependent on the selection of the starting date for forcing accumulation that changes according to the latitude, whereas CF models were independent. Indeed, distinct thermal requirements were found for the grapevine varieties cultivated in Northern and Southern Europe. This implies the need to improve modeling of the dormancy period to avoid under- or over-estimations of budbreak date under di ﬀ erent environmental conditions. the model intercomparison is performed with phenological data of grapevine varieties collected in different environments. The objective of this study is to evaluate the performances of six phenological models (BRIN tested using hourly and daily temperature) at reproducing the budbreak date of eight grapevine varieties collected at different latitudes in Europe. In order to consider the influences of temperature during the dormancy period on model representations, two starting dates were set during chilling-forcing and forcing models' simulation, respectively. The analysis of budbreak date estimation is thus addressed to obtain responses of grapevine varieties’ behavior under different environments.


Introduction
Grapevine phenological phases have a profound influence on the final grape and wine quality [1][2][3]. In particular, changes in the current climate conditions are strongly affecting the occurrence of the developmental stages and length of the grapevine growing cycle in the most renowned wine areas [4,5].
Several authors have reported the increasing temperature trend across the most traditional wine areas of the Mediterranean basin during the last decades. For example, Tomasi et al. [6] showed a 2.3 • C-temperature increase over the grape growing season in Veneto region (Italy) during the period 1964-2009, while van Leeuwen et al. [7] evidenced a 2.0 • C increase from 1951 to 2016 in the Bordeaux wine production area. Moreover, Ramos et al. [8] observed an overall warming from 1.0 to 2.2 • C during the grape growing season in north-eastern Spain. These changes are expected to greatly influence grapevine varietal suitability across the European region [9][10][11][12].
Since grapevine phenology is mainly temperature-driven [4], the current warming is affecting the phenological stages and shifting the annual growing cycle [13][14][15][16]. Among the most important developmental phases, budbreak represents the starting point of vegetative growth. A temperature increase can have a dual effect on this phase (anticipating or delaying its occurrence), with strong consequences on the following reproductive stages. Indeed, a shift in budbreak may determine poor shoot growth together with fewer clusters and an inhomogeneous grape development [17,18].
Under increasing temperature, a delay in budbreak phase can be determined by insufficient chilling during the dormancy period [19]. Indeed, the bud vegetative growth is limited by a first sub-period induced by physiological conditions (endo-dormancy), and then by a second sub-period in which bud opening is affected by the environmental conditions (eco-dormancy [17,20]). In this context, some studies showed that dormancy release is regulated by bud exposure period to temperatures between 0 to 10 • C [19,21], until fulfilment of the chilling requirement, and a subsequent period of temperatures (from 0 to 25 • C), until the reaching of the forcing requirement [22].
Budbreak occurrence being strongly influenced by the dormancy period, several studies attempted the implementation of phenological models for improving the prediction of this developmental stage [23][24][25][26][27][28]. Phenological models can be classified in two main categories: forcing and chilling-forcing models. Forcing models are simply based on the forcing unit accumulation, starting, in general, from a fixed date in the year. These models only focus on the description of the eco-dormancy period assuming that the chilling unit accumulation requirement was previously fulfilled and the endo-dormancy period had concluded. Instead, chilling-forcing models can explain both the endo-and eco-dormancy periods by taking into consideration the chilling unit and the forcing accumulation with respect to specific temperature thresholds [24,29,30].
Hence, the selection of one of these two categories for an accurate budbreak simulation often gave rise to a wide debate [29,30]. Chilling-forcing models can take into account the influences of temperature on the entire dormancy period, evidencing not only the shortening effect of warmer temperatures during eco-dormancy but also a possible delay of budbreak caused by a failed chilling requirement [13,31]. However, the limited improvement in the performances of these models and the easier implementation provided by forcing models, focus on the issue about the most suitable modeling approach for budbreak estimation. In modeling studies, the choice of phenological model plays an important role in the use of crop simulation models and in determining potential risk of late spring frost events. In crop simulation models, phenology defines the plant growing cycle with important consequences on future biomass partitioning [32]. In this context, an accurate estimation of budbreak stage may avoid randomly selecting a starting point for vegetative biomass accumulation and over-or under-detection of late spring frost events under different climate scenarios [31,33].
Thus, the phenological intercomparison may be useful to highlight the performances of different models in simulating budbreak [29,30]. In order to take into account the influences of the grapevine precocity cycle and the effects of environmental conditions on budbreak estimation, model intercomparison experiments should consider multiple grapevine varieties and study sites. Indeed, grapevine varieties are characterized by various thermal requirements and focusing on a specific variety does not allow the range of different physiological responses derived from the impact of temperature on dormancy period to be highlighted [30]. In addition, the use of a single date as a starting point for the chilling and forcing accumulation [29] does not allow the influence of temperature on the beginning of endo-or eco-dormancy periods to be appreciated, especially when the model intercomparison is performed with phenological data of grapevine varieties collected in different environments.
The objective of this study is to evaluate the performances of six phenological models (BRIN tested using hourly and daily temperature) at reproducing the budbreak date of eight grapevine varieties collected at different latitudes in Europe. In order to consider the influences of temperature during the dormancy period on model representations, two starting dates were set during chilling-forcing and forcing models' simulation, respectively. The analysis of budbreak date estimation is thus addressed to obtain responses of grapevine varieties' behavior under different environments.

Observations
The phenological data were collected from 15 experimental vineyards in three European countries (France, Luxembourg and Portugal, see Figure 1 and Table 1). The weather data (i.e., daily maximum and minimum air temperature, • C) were freely available from weather services. More specifically, for France, the land-based daily weather data were collected by the National Centers for Environmental Information and then archived at NOAA (https://www.ncdc.noaa.gov/ [34]) and from a weather station near the VitAdapt project [35]. Hourly temperatures, rainfall, wind speed and reference evapotranspiration were recorded and stored in Climatik, a database managed by INRA, US 1116 Agroclim research station, 84914 Avignon, France (https://intranet.inra.fr/climatik_v2; limited access). For Luxembourg, the weather data were collected by the Administration des Services Techniques de l'Agriculture at the Institut Viti-Vinicole (https://agriculture.public.lu/) while the weather data for Portugal derived from the Instituto Nacional de Investigação Agrária e Veterinária (INIAV http://www.iniav.pt/) and the Associação para o Desenvolvimento da Viticultura Duriense (ADVID https://www.advid.pt/). The weather data were obtained from the closest and more representative stations for the vineyard weather conditions ( Figure 1 and Table 1).
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 24 the model intercomparison is performed with phenological data of grapevine varieties collected in different environments. The objective of this study is to evaluate the performances of six phenological models (BRIN tested using hourly and daily temperature) at reproducing the budbreak date of eight grapevine varieties collected at different latitudes in Europe. In order to consider the influences of temperature during the dormancy period on model representations, two starting dates were set during chillingforcing and forcing models' simulation, respectively. The analysis of budbreak date estimation is thus addressed to obtain responses of grapevine varieties' behavior under different environments.

Observations
The phenological data were collected from 15 experimental vineyards in three European countries (France, Luxembourg and Portugal, see Figure 1 and Table 1). The weather data (i.e., daily maximum and minimum air temperature, °C) were freely available from weather services. More specifically, for France, the land-based daily weather data were collected by the National Centers for Environmental Information and then archived at NOAA (https://www.ncdc.noaa.gov/ [34]) and from a weather station near the VitAdapt project [35]. Hourly temperatures, rainfall, wind speed and reference evapotranspiration were recorded and stored in Climatik, a database managed by INRA, US 1116 Agroclim research station, 84914 Avignon, France (https://intranet.inra.fr/climatik_v2; limited access). For Luxembourg, the weather data were collected by the Administration des Services Techniques de l'Agriculture at the Institut Viti-Vinicole (https://agriculture.public.lu/) while the weather data for Portugal derived from the Instituto Nacional de Investigação Agrária e Veterinária (INIAV http://www.iniav.pt/) and the Associação para o Desenvolvimento da Viticultura Duriense (ADVID https://www.advid.pt/). The weather data were obtained from the closest and more representative stations for the vineyard weather conditions ( Figure 1 and Table 1).   The phenological data were collected for eight grapevine varieties (Cabernet Sauvignon, Chardonnay, Gewürztraminer, Grenache, Pinot Gris, Riesling, Touriga Franca and Touriga Nacional). For France, these data were retrieved from PERPHECLIM platform (https://www6.inrae.fr/projetaccaf-perpheclim_eng/The-PERPHECLIM-project [36]) and VitAdapt project [35]. For Luxembourg, the data were obtained from the Administration des Services Techniques de l'Agriculture at the Institut Viti-Vinicole. Finally, the data for the Lisbon wine region (Portugal) were obtained from the national ampelographic collection at the INIAV, located in Dois Portos. The data for the Douro wine region were collected by the associates from ADVID. Most of these grapevine varieties are for high-quality wine production derived from the world's benchmark regions and can be grouped on the basis of average growing season temperatures and ripening potential [37]. The phenological observations were collected by different research institutions and farmers at budbreak date which typically corresponded to BBCH 09 [38], except for Cabernet Sauvignon, Touriga Franca and Touriga Nacional collected in France (Bordeaux site; BBCH 07). We assumed that the distance between BBCH 07 and 09 is covered in 2 days as reported in [27].

GDD
The GDD model [39] is an F model that describes the eco-dormancy period based on accumulation of the daily average temperature (T avg ; • C) above a specific threshold (T b ; • C). The GDD accumulation (Forc GDD ) starts at a fixed date of the year (t 0 ) and is accumulated until the GDD requirement is satisfied at budbreak date (Bud; Equation (1)).

WANG
The WANG model [40] is a temperature response function used, in this study, to describe the eco-dormancy period from a fixed date of the year (t 0 ) and not to calculate the chilling unit accumulation as reported for other similar functions [30]. The temperature function (Forc WANG ) allows the phenological development of grapevine to be evaluated using a non-linear approach based on three cardinal temperatures: T opt (optimum temperature, • C), T max (maximum temperature, • C) and T min (minimum temperature, • C; Equations (2a) and (2b)).
where T avg is the daily average temperature ( • C) and Bud is the budbreak date.

UNIFORC
The UNIFORC model [24] is an F model based on a sigmoid curve for calculating the forcing unit accumulation (Forc UNIFORC ) from a fixed day of the year (t 0 ) until budbreak occurrence (Bud; Equation (3)): where T avg is daily average temperature ( • C), while d and e (d < 0 and e > 0) are empirical parameters of the function.

BRIN
The BRIN model [29] is a CF model based on the Bidabe's Cold Action Model [41,42] in which the endo-dormancy period is evaluated through the accumulation of chilling units (Chill BRIN ; Equation (4)). The eco-dormancy period is based on forcing unit accumulation (Forc BRIN ) calculated with the Richardson equation [23]. The Forc BRIN is computed using both a daily (Equation (5)) and hourly (Equations (6a) and (6b)) temperature accumulation approach.
where t 0 is the simulation starting date, T avg is daily average temperature ( • C), C endo is the date at which the chilling requirement is satisfied, Bud is budbreak date, Q 10c is the geometric progression rate of the thermal dormancy response, T n and T x are daily minimum and maximum temperatures ( • C), respectively. Richardson Daily and Richardson Hourly are daily and hourly Richardson equations used for representing the eco-dormancy period [29]. The Richardson equations are limited by two cardinal temperatures: a lower (T l , • C) and an upper limit temperature (T h , • C).

UNICHILL
The UNICHILL model [24] is a CF two-phase sequential model constituted by two functions describing chilling accumulation (Chill UNICHILL ) during endo-dormancy (Equation (7)) and forcing unit accumulation (Forc UNICHILL ) during eco-dormancy period (Equation (8)). After chilling requirement is satisfied at a specific date (C endo ), the forcing units start accumulating until the forcing requirement is reached, leading to the occurrence of the budbreak phase (Bud). The two functions are defined as followed: where t 0 is the simulation starting date, T avg is daily average temperature ( • C) and a, b, c, d and e are empirical parameters of the function.

UNIFIED
The UNIFIED model [24] is a CF overlapping model based on the same equations for chilling and forcing unit accumulation used in the UNICHILL model. However, UNIFIED model includes the effect of chilling units on forcing accumulation through implementation of the exponential decrease curve (Equation (9)) in which ForcReq UNIFIED (Fcrit) is the critical amount of forcing and Chill UNIFIED is the total amount of chilling at day t d . The budbreak date is estimated when both chilling and forcing requirements are satisfied: where w and z are empirical parameters of the function.

Outliers Removal
The model fitting, validation and application were performed after removing outliers from the observed dataset. This procedure consists in the identification of possible anomalies by comparing the observed and simulated results obtained by a preliminary models' fitting against the original datasets for each variety. The outliers were identified by using the Tukey method [43], based on the non-parametric measure of the statistical dispersion of a given distribution quantified using the interquartile range (IQR). For each model and starting date, the outliers represent the values in which the difference between the observed and simulated budbreak Days of Year (DOYs) are outside the whiskers of the distribution (below Q1 − 1.5 IQR or above Q3 + 1.5 IQR values). An outlier of a specific variety is removed from the dataset when it is observed in at least 4 combinations (models x starting Appl. Sci. 2020, 10, 3800 7 of 22 date). The percentage of outliers removed from the fitting, validation and application datasets is: 4.17% (Cabernet Sauvignon), 5.36% (Chardonnay), 1.85% (Gewurztraminer and Riesling), 3.70% (Grenache), 1.52% (Pinot Gris), 10% (Touriga Franca and Nacional). These values were generally observed after extreme weather conditions (e.g., frosts of 1956, 1963 in France) or due to other influences (e.g., agronomic practices or errors in data collection) which cannot be reproduced during model simulation.

Parameters' Range and Starting Date
The parameters' range of variation in the 6 phenological models was defined according to the literature [29,30,[44][45][46]. For the GDD model, the fitted parameter was T b which ranges from 0 to 10 • C. In this study, 0 • C was considered the limit temperature for bud growth while 10 • C was the maximum base temperature adopted for describing the eco-dormancy period as found in other studies [19,22,29,44]. For the WANG model, this function being generally used for wheat simulation [47], the parameter range of the cardinal temperatures was established considering the values obtained by other similar functions (Fbeta) implemented for simulating budbreak date in grapevine [30]. Thus, T min and T max were set at 0 and 45 • C respectively, while T opt ranged in a ±30% of the value reported in Fila et al. [30]. For UNIFORC model, the d and e parameter values varied in a ±30% interval with respect to the average value reported for different grapevine varieties (Glera, Cabernet Sauvignon, Chardonnay and Merlot [46]). For both daily and hourly approaches of BRIN model, the Q 10 parameter range was defined in a ±30% interval of the value estimated for several grapevine varieties [29]. Moreover, the lower limit temperature (T l ) was fitted in the range 0-10 • C, as reported for GDD model while the upper limit temperature (T h ) for post dormancy was fixed at 25 • C as indicated in Garcia de Cortazar-Atauri et al. [29]. For the UNICHILL model, the parameters a, b, c, d and e were in a range of ±30% of the overall minimum and maximum value reported by Fila [46] for Glera, Cabernet Sauvignon, Chardonnay and Merlot varieties (field calibration). However, in the specific case of Cabernet Sauvignon and Chardonnay, the parameter range of a, b, c, d and e varied in ±30% interval with respect to the parameter values found for these varieties. Similarly, the range adopted for a, b, c, d and e parameters in UNIFIED model was identified by considering ±30% of the minimum and maximum values reported by Fila et al. [46] (field calibration). Based on the same study, the parameters w and z were fixed by considering the average values reported for field calibration of Sangiovese and Montepulciano.
Concerning the use of fixed starting dates t 0 during model simulation, we set the 1st of January [30] and 1st of March [27] of the current year for the F models and the 1st September [24] and 1st of August [29] of the previous year for the CF models.

Model Fitting, Validation and Application
The methodology used for model fitting, validation and application is displayed in Figure 2. After outliers removal (Section 2.3.1), the first step was to randomly split the observed data (budbreak DOY per year) into 10 random subsamples (RS) in which 60% of the observed data was used for model fitting whereas the remaining 40% was retained for validation. Once the 10 random subsamples were prepared, the model fitting was performed for any combination of varieties, models and starting dates using the Phenological Modeling Platform (PMP) software (version 5.5) [48] and setting the parameter ranges as described in Section 2.3.2. The best fitting for any combination was found by exploring the range of each parameter through the Metropolis-Hastings algorithm [49][50][51] of PMP. Each fitted model was validated on the remaining 40% of the data and then applied on the validation datasets of the other subsamples by using the R software environment (version 3.5.0) [52]. Thus, the parameter set of each model combination (model x variety x starting date), which provided the highest-performance (HP) during fitting and validation procedures, was selected. Moreover, the average parameter sets (AV) derived from all fitted subsamples was also tested on the entire distribution of validation datasets. Finally, the HP and AV parameter sets were applied to the independent datasets of other study sites to evaluate the behavior of different parameterizations under different environmental conditions. Finally, the minimum, 10% quantile, median, mean, 90% quantile and maximum values of the statistics were extracted for describing the simulation results on all validation datasets.

Overall Model Behavior in Fitting and Validation
The results of the phenological model intercomparison were described comparing the performances of the different modeling approaches (e.g., F vs. CF) and starting dates (1

Statistical Analysis
The statistical indices used for evaluating the phenological model performances during fitting, validation and application were: R-squared (R 2 , Equation (10), [53]); Akaike information criterion (AIC, Equation 11, [54]); Root Mean Squared Error (RMSE, Equation (12), [55]) and modeling efficiency coefficient (EF, Equation (13), [56]), each defined as follows: where O i is the observed value, O is the average of the observed values, P i is the predicted value, n is the number of observations, k is the number of parameters in the model, andL is the maximized value of the likelihood function of the predicted model. Finally, the minimum, 10% quantile, median, mean, 90% quantile and maximum values of the statistics were extracted for describing the simulation results on all validation datasets.

Overall Model Behavior in Fitting and Validation
The results of the phenological model intercomparison were described comparing the performances of the different modeling approaches (e.g., F vs. CF) and starting dates (1st January/March vs.

HP versus AV
In order to compare the results of different models' parameterizations, we selected the highest performance parameter set (HP, Table 2) from the 10 fitting and validation subsamples for each combination of model, starting date and variety. We then selected the average parameter set (AV, Table 3) from the 10 fitting subsamples for each combination model, starting date and variety. On these bases, Figures S1-S7 show the performances (R 2 and AIC) of the model fitting and validation (symbols) and the applications of the different parameterizations on all validation subsamples

HP versus AV
In order to compare the results of different models' parameterizations, we selected the highest performance parameter set (HP, Table 2) from the 10 fitting and validation subsamples for each combination of model, starting date and variety. We then selected the average parameter set (AV , Table 3) from the 10 fitting subsamples for each combination model, starting date and variety. On these bases, Figures S1-S7 show the performances (R 2 and AIC) of the model fitting and validation (symbols) and the applications of the different parameterizations on all validation subsamples (boxplot) obtained by considering all grapevine varieties and starting dates. In detail, the selected parameter sets with highest performances (HP; filled symbols with different shapes according to the starting dates) were the result of the highest rank value of R 2 and AIC and the minimum discrepancy between fitting (red) and validation (blue) models. By contrast, the AV parameter sets (boxplots in shadow area) were the application of the average parameterization obtained for all subsamples. Despite that similar performances for HP and AV parameter sets were evidenced in Figures S1-S7, higher discrepancy was found in the behavior of CF and F models. In Figures S1-S4 (CF models), no differences were observed when different starting dates were set during model fitting and application. On the other hand, high discrepancy was found in model performances when different starting dates were adopted in Figures S5-S7 (F models). This trend is also evidenced by the correlations between observed and simulated DOYs in Figure 4, Figures  S8 and S9. The results of model fitting showed accurate performances for all selected HP, with more discrepancy between starting dates of F models (yellow and green correlations) compared to those used in CF models (blue and red correlations). The values of R 2 also highlighted the higher performances obtained with some varieties compared to others (Chardonnay vs. Touriga Nacional).

Model Applications on Other Independent Datasets
The statistical performances of the phenological models, applied on the respective independent datasets, are reported in Figure 5 and Figures S10-S16 while the comparison between CF and F models during fitting and applications is shown in Figure 6 and Figure S17. In general, higher performances were found for CF compared to F models during fitting and applications ( Figure 5), except for those varieties characterized by few observed data in the model application (Gewürztraminer, Riesling; Figure 6 and Figure S17).
Moreover, the behavior of grapevine varieties in response to specific climate conditions was noticed during model application. The higher performances of CF compared to F, as well the relevant differences between starting dates when F models were used, may be explained by the influence of temperatures on the dormancy period. The end of the first phase of dormancy (endo-dormancy) in CF models was generally simulated between the end of December (e.g., Touriga Nacional: DOY 356 for t 0 = −121 and −152, respectively) and end of February (e.g., Riesling: DOY 56 and 57 for t 0 = −121 and −152, respectively). More specifically, the period of endo-dormancy fulfilment was similarly reached regardless of starting date (1st September/August), allowing the next forcing unit accumulation during the eco-dormancy period to be correctly satisfied (differences in days from 0 to 3 days). The analysis further highlighted that the endo-dormancy release occurs later at northern than at southern latitudes and this is consistent with results obtained using F approach. These demonstrate that a late starting date (1st of March) for forcing unit accumulation is more suitable for budbreak simulation of varieties collected in Northern Europe (e.g., Gewürztraminer, Riesling and Pinot Gris), whose endo-dormancy release is simulated by CF late in the season (54 DOY on average). Conversely, an early starting date (1 January) provided the best results in simulating budbreak for grapevine varieties collected in Southern Europe (e.g., Touriga Franca and Nacional, Cabernet Sauvignon, Grenache and Chardonnay), which are characterized by an early end of endo-dormancy as simulated by CF models (13 DOY on average).

Discussion
This study compares the performance of six phenological models in simulating the budbreak date of eight grapevine varieties across different European sites. We found that the model performances characterized by a Chilling-Forcing approach were generally higher compared to those obtained for Forcing approach when different starting dates were set ( Figure 6 and Figure S17).
Although model intercomparison studies on the estimation of budbreak date in grapevine have already been conducted by other authors [15,29,30], this study focused on evaluation of the models' reliability to reproduce budbreak in different environmental conditions, by considering the impact of temperature during the chilling and forcing accumulation period. In contrast to other works [27,30,57,58], this model intercomparison considers the use of a long-term series of observed phenology, which include grapevine varieties with different precocity levels collected at various latitudes in three European countries (France, Luxembourg and Portugal, Figure 1 and Table 1). The extensive observed dataset coupled to the methodology applied in this study allowed to evaluate the influence of study site-temperature variability on the grapevine development. Moreover, the cross-validation scheme adopted in this model intercomparison experiment provided an exhaustive analysis of the bias and variance of the CF and F models while the comparison between two different types of parameterization (HP or AV) pointed out the issue to select a specific parameter set for model applications (Results section, Figure 5). This in-depth investigation of model fitting, validation and application was particularly useful for evaluating the influence of temperature during the dormancy period. In this context, the different model responses during budbreak predictions are related to the physiological requirements needed for dormancy fulfilment of the grapevine varieties. Differently from the CF models, which describe the entire dormancy period (endo-and eco-dormancy), the F models only describe the eco-dormancy period through forcing units accumulation, under the assumption that the chilling requirement was already satisfied [24]. Although the debate on which model type provides the highest reliability is still open [29,30], our analysis shows that CF models, considering the influences of winter and spring temperatures on the accumulation of chilling and forcing units, may avoid the under-or over-estimation at the beginning of eco-dormancy, which occur when F models are set on fixed dates. In this study, we estimated a longer endo-dormancy for the vines from the Northern European sites (e.g., Gewürztraminer), which occurs later than the endo-dormancy of vines from the Southern European sites (e.g., Touriga Nacional), irrespective of the precocity level of the different varieties. In other terms, grapevine varieties such as Gewürztraminer (early budbreak) and Riesling (late budbreak) collected in Northern Europe showed a late endo-dormancy release (higher chilling requirement) with respect to Chardonnay (early budbreak) and Cabernet Sauvignon (late budbreak) mainly collected in Southern Europe (lower chilling requirement). Although this trend was examined in this study only from a modeling point of view without any physiological evidence on the dormancy period, our results suggested that plants cultivated at different latitudes might show distinct dormancy period lengths. The higher accuracy of CF compared to F models is likely linked to the ability of these models to estimate the endo-dormancy release and to better account for the effect of temperature in this phase, avoiding the use of a fixed a priori value for the beginning of the eco-dormancy period as in F models [27,29]. Indeed, the goodness of F models is shown to significantly increase when the starting date is optimized against observed experimental data [30,45], demonstrating that the beginning of eco-dormancy period is dependent to the origin of the observed dataset and, thus, on the fulfilment of specific physiological requirements that change in relation to the climate conditions of the different environments. In fact, although no relevant differences were found for budbreak date estimation of grapevine varieties using CF models set on 1st of September/August starting dates, the discrepancy of the F model simulations was evident when 1st of January/March dates were used. Higher accuracy of F models to predict the budbreak date of grapevine varieties collected in Northern European countries was observed using the 1st of March starting date as also confirmed by [27], while the 1st January seems to be a more suitable starting date for representing budbreak date of the grapevine varieties from Southern Europe ( Figures S5-S7). In this case, a compromise between both starting dates may be represented by the beginning of February [59]. However, the adoption of different starting dates in F models highlights the role of temperature on bud growth and the need to satisfy specific dormancy requirements for varieties cultivated at different latitudes before budbreak occurrence. These results were supported by the work of Reis-Pereira et al. [60] on the grapevine flowering stage. The authors pointed out that the selection of a priori value for the starting of thermal unit accumulation in F models may lead to an under-or over-estimation of the flowering phase, demonstrating the importance of a correct evaluation of the previous phenological stage in respect to the subsequent one. These findings were also observed in relation to the flowering stage of other species such as olive tree e.g., [61]). In this context, we infer that the dormancy period of grapevine may play a relevant role for predicting budbreak date especially under future scenarios. Indeed, more accurate estimations of budbreak date using CF models are provided when the endo-dormancy release is calibrated against observed data of perennial crops. This behavior is particularly evident under warmer temperatures (after 2050) where CF without endo-dormancy calibration and F models showed poor performances of phenology prediction [62]. The accurate identification of the endo-dormancy release avoids selecting a fixed a priori date, especially when this may have strong consequences on the estimation of budbreak date (as in F models).
The impact of warmer temperatures after the endo-dormancy release showed higher effect on the budbreak phase compared to the influence of temperatures during chilling accumulation, as reported by Martinez-Lüscher et al. [63] in a two-year ad hoc experiment with different grapevine varieties and temperature regimes. However, the starting date for thermal unit accumulation depends on the physiological features of the grapevine variety and their interaction with the environment. Indeed, we found that the effect of temperature during the last part of winter and beginning of spring was also evident when models were tested on external sites (independent datasets, Figure 5). Although most of the models showed satisfactory results during application (R 2 and AIC), the average error (RMSE) between observed and simulated data was relatively high (from 4 to 19 days). This situation was exacerbated in F models where the reliability in simulating budbreak is dependent to the use of a specific starting date (1st of January/1st of March) and thus the original observed dataset. The discrepancy between the two starting dates raises the question of which initial date to choose for forcing unit accumulation when simulations are conducted on a European scale [31,64].
The implications of this study are particularly relevant for growers especially when considering the impact of climate change. Our study, in line with a large part of literature, confirmed that chill units accumulations, the amount of which varies according to variety and location, is required for exiting from endo-dormancy phase. In a warmer climate, as expected in the future, this amount may not fully satisfy, leading to developmental issues during the growing season as outlined in the introduction of this paper. Some authors already highlighted that, on a global scale, wine areas may shift to match their specific climate niches in case of no adaptation [11,65], while new varietal selections and breeding programmes are needed for improving the adaptability of the grapevine varieties and maintaining the high quality production [12,59,66]. In this context, our parameterizations may be used as metric, in combination with climate simulations under future scenarios, to highlight which regions and varieties are expected to be most exposed to the risk of budbreak failure. Accordingly, the use of alternative varieties, better adapted to the expected future climate, may be proposed or the direction of specific breeding programmes can be indicated to develop varieties with a lower chill units requirement as requested in a warmer climate. Moreover, we should also take into account that, besides thermal requirements, dormancy phase may be influenced by the photoperiod [67]. Since each grapevine variety requires critical thresholds of day length for the onset of dormancy [68,69], this regulating factor should be considered when CF models are adopted for simulations at different latitudes. As such, comprehensive datasets of phenological observations and ad hoc experiments for evaluating the influence of more factors on dormancy [30,46,69] may be useful to take into account for improving model reliability under future climate scenarios and different environments [31,64,70].

Conclusions
This model intercomparison evidenced the different performance of the phenological models in reproducing the budbreak date of eight grapevine varieties collected in different European regions. The reliability of two main phenological modeling approaches (CF and F) was evaluated considering different starting dates (1st January/March or 1st September/August) and the selection methodologies of the parameter set (HP and AV) for model application. The results showed, in general, higher performances of CF compared to F models where fixed starting dates were adopted, even if neither modeling approach highlighted a clear supremacy over the other. For this reason, CF and F models' behavior should be further investigated through ad hoc experiments for evaluating the role of dormancy on budbreak occurrence. Indeed, the findings of our model simulations evidenced distinct thermal requirements for grapevine varieties cultivated at different latitudes (i.e., Northern Europe vs. Southern Europe). The use of different starting dates highlighted that, unlike CF models, the accurate estimation of budbreak in F models is affected by the selection of a starting date for beginning of the eco-dormancy period. This implies that, if this phase is not accurately defined, F models may lead to over-or under-estimations of the budbreak date under different environmental conditions. In this context, comprehensive phenological datasets (e.g., grapevine phenological data collected at different latitudes) are needed to thoroughly investigate the effect of temperature variability on dormancy period and budbreak estimation. Finally, it is proposed the development and implementation of improved phenological models, which consider the effect of other climate (e.g., radiation) and soil (e.g., soil temperature) variables as well as some regulating factors of the endo-dormancy phase (e.g., day length).

Supplementary Materials:
The following are available online at http://www.mdpi.com/2076-3417/10/11/3800/s1. Figure S1: R 2 and AIC values of the BRIN Daily model obtained for all fitted and average models, grapevine varieties and starting dates. The symbols (points and triangles) show the model results at two starting dates (in this example t 0 = −121 and −152 DOY), while the two colours represent fitting (red) and validation (blue) procedure, respectively. The filled symbols constitute the R 2 and AIC values of the models obtained by the selected HP parameter sets, while the dashed (fitting) and dotted (validation) lines depict the average R 2 Figure S5: R 2 and AIC values of the UNIFORC model obtained for all fitted and average models, grapevine varieties and starting dates. The symbols (points and triangles) show the model results at two starting dates (in this example t 0 = 1 and 60 DOY), while the two colours represent fitting (red) and validation (blue) procedure, respectively. The filled symbols constitute the R 2 and AIC values of the models obtained by the selected HP parameter sets, while the dashed (fitting) and dotted (validation) lines depict the average R 2 and AIC values for both starting dates (in this example orange line = 1 DOY and green line = 60 DOY). The boxplots represent the R 2 and AIC values distribution for HP (orange and light blue: HP model application on all validation subsamples for t 0 = 1 and t 0 = 60 DOY, respectively) and AV (boxplot in the shaded area, orange and light blue: AV application on all validation subsamples for t 0 = 1 and t 0 = 60 DOY, respectively). The outliers represent all values out of the 1.5*InterQuartile Range (IQR) of the boxplot. Figure  S6: R 2 and AIC values of the GDD model obtained for all fitted and average models, grapevine varieties and starting dates. The symbols (points and triangles) show the model results at two starting dates (in this example t 0 = 1 and 60 DOY), while the two colours represent fitting (red) and validation (blue) procedure, respectively. The filled symbols constitute the R 2 and AIC values of the models obtained by the selected HP parameter sets, while the dashed (fitting) and dotted (validation) lines depict the average R 2 and AIC values for both starting dates (in this example orange line = 1 DOY and green line = 60 DOY). The boxplots represent the R 2 and AIC values distribution for HP (orange and light blue: HP model application on all validation subsamples for t 0 = 1 and t 0 = 60 DOY, respectively) and AV (boxplot in the shaded area, orange and light blue: AV application on all validation subsamples for t 0 = 1 and t 0 = 60 DOY, respectively). The outliers represent all values out of the 1.5*InterQuartile Range (IQR) of the boxplot. Figure S7: R 2 and AIC values of the WANG model obtained for all fitted and average models, grapevine varieties and starting dates. The symbols (points and triangles) show the model results at two starting dates (in this example t 0 = 1 and 60 DOY), while the two colours represent fitting (red) and validation (blue) procedure, respectively. The filled symbols constitute the R 2 and AIC values of the models obtained by the selected HP parameter sets, while the dashed (fitting) and dotted (validation) lines depict the average R 2 and AIC values for both starting dates (in this example orange line = 1 DOY and green line = 60 DOY). The boxplots represent the R 2 and AIC values distribution for HP (orange and light blue: HP model application on all validation subsamples for t 0 = 1 and t 0 = 60 DOY, respectively) and AV (boxplot in the shaded area, orange and light blue: AV application on all validation subsamples for t 0 = 1 and t 0 = 60 DOY, respectively). The outliers represent all values out of the 1.5*InterQuartile Range (IQR) of the boxplot. Figure S8: Correlations between observed and simulated budbreak Days of Year (DOY) of the UNIFORC and UNICHILL models. The results were obtained for all grapevine varieties and starting dates using the HP parameter set. Figure S9: Correlations between observed and simulated budbreak Days of Year (DOY) of the WANG and BRIN Daily models. The results were obtained for all grapevine varieties and starting dates using the HP parameter set. Figure S10: Highest-Performance (HP) and average (AV) parameters' set applied on the independent datasets by considering all phenological models and starting dates. The example of Cabernet sauvignon variety. Figure S11: Highest-Performance (HP) and average (AV) parameters' set applied on the independent datasets by considering all phenological models and starting dates. The example of Gewürztraminer variety. Figure S12: Highest-Performance (HP) and average (AV) parameters' set applied on the independent datasets by considering all phenological models and starting dates. The example of Grenache variety. Figure S13: Highest-Performance (HP) and average (AV) parameters' set applied on the independent datasets by considering all phenological models and starting dates. The example of Pinot Gris variety. Figure S14: Highest-Performance (HP) and average (AV) parameters' set applied on the independent datasets by considering all phenological models and starting dates. The example of Riesling variety. Figure S15: Highest-Performance (HP) and average (AV) parameters' set applied on the independent datasets by considering all phenological models and starting dates. The example of Touriga Franca variety. Figure S16: Highest-Performance (HP) and average (AV) parameters' set applied on the independent datasets by considering all phenological models and starting dates. The example of Touriga Nacional variety. Figure S17: Statistical results (R 2 , AIC and RMSE) of the comparison between CF (black bars) and F (red bars) models in model fitting and application for all grapevine varieties using AV parameters' set. The differences between CF and F models in model fitting and applications were displayed using green and yellow bars, respectively. CS = Cabernet Sauvignon, CH = Chardonnay, GE = Gewürztraminer, GR = Grenache, PG = Pinot Gris, RS = Riesling, TF = Touriga Franca, TN = Touriga Nacional. Table S1: Statistical indices (RMSE, EF, R 2 , AIC) obtained for all models, varieties and starting dates after model fitting and validation on all subsamples.