Sensitivity of Spring Phenology Simulations to the Selection of Model Structure and Driving Meteorological Data

Accurate estimation of the timing of intensive spring leaf growth initiation at mid and high latitudes is crucial for improving the predictive capacity of biogeochemical and Earth system models. In this study, we focus on the modeling of climatological onset of spring leaf growth in Central Europe and use three spring phenology models driven by three meteorological datasets. The MODIS-adjusted NDVI3g dataset was used as a reference for the period between 1982 and 2010, enabling us to study the long-term mean leaf onset timing and its interannual variability (IAV). The performance of all phenology model–meteorology database combinations was evaluated with one another, and against the reference dataset. We found that none of the constructed model–database combinations could reproduce the observed start of season (SOS) climatology within the study region. The models typically overestimated IAV of the leaf onset, where spatial median SOS dates were best simulated by the models based on heat accumulation. When aggregated for the whole study area, the complex, bioclimatic index-based model driven by the CarpatClim database could capture the observed overall SOS trend. Our results indicate that the simulated timing of leaf onset primarily depends on the choice of model structure, with a secondary contribution from the choice of the driving meteorological dataset.


Introduction
Plant development has evolved to respond to the seasonality of various environmental factors and is influenced by the interannual variability of climate [1,2]. Plant phenology research focuses on the periodic events occurring within the annual cycles of terrestrial vegetation [3,4]. Plant development has internal (genetic) and external (environmental) drivers. In the temperate zone, adaptive plant strategies must feature traits such as tolerance of late spring or early autumn frost damage and mid-summer water shortages. The timing of flowering and propagule development must enable successful reproduction [5].
In a feedback loop, the phenological stage of the vegetation influences the exchange of heat and mass between the land surface and the atmosphere [6]. Phenology thus plays an important role in ecosystem function, influencing the carbon balance of the soil-plant system and regulating biogeochemical and biophysical feedbacks through the coupled carbon-, water-, and energy cycles [4,6]. In the highly seasonal temperate zone at mid-and high latitudes, specific attention is paid to the start of the active vegetation period [2,7]. Surface and remote sensing observations provide information about the leaf onset at contrasting spatial scales [8][9][10].
Given the determinant role played by plant phenology, models of varying complexity have attempted to quantify the timing of the onset of the growing season (commonly referred to as the start of season (SOS), expressed by the numerical day of year (DOY)) [3,4,[11][12][13][14][15][16]. Phenology models aim to capture the long-term mean SOS and its interannual variability [6], while an accurate representation of recently observed SOS trends related to climate change is also crucial for future prediction of plant growth [1,3]. Phenology models are integral parts of the biogeochemical models (BGMs), Dynamic Global Vegetation Models (DGVM), and also are increasingly included in General Circulation Models (also referred to as Global Climate Models, GCM) and Earth system models (ESM) [4,17,18]. Given the generalisation inherent in these large-scale models, their regional applicability and performance metrics of the methods they make use of remain a challenge [19].
Meteorological variables such as temperature, precipitation, radiation, and photoperiod are important drivers of vegetation development. These variables play a key role in phenology models [4,13,20,21]. During the past two decades, there has been a growth in the number of available meteorological datasets that can be used in studies of vegetation growth [22]. Data availability, however, poses new challenges: The decision of which dataset is the most suitable for the research is likely to affect the eventual result. Previous phenological studies used datasets developed by the Modern-Era Retrospective Analysis for Research and Applications (MERRA) [23], E-OBS [5], CRU-NCEP [2], ERA-15 reanalysis product of European Centre for Medium-Range Weather Forecasts (ECMWF) with a spatial resolution of 1 • × 1 • [24], and the ERA-Interim reanalysis of ECMWF [25]. The Daymet database [26,27], with 1 km spatial resolution, was also used for North America by a number of studies [13,20,28]. Surprisingly, given the large number of available datasets, a comparison of the performance of phenology models as a function of meteorological database selection is largely missing from the literature.
The aim of the present study is to quantify the sensitivity of SOS estimation to the choice of the model structure, and to the selection of the driving meteorological dataset. Three commonly used spring phenology models with different meteorology data requirements were used to evaluate the quality of phenological prediction as a function of the selection of driving variables. As a case study, we investigate Hungary, which has experienced a number of extreme weather events (heatwaves, drought, extreme precipitation) in recent years, likely to have affected the variability of SOS and therefore pose a modelling challenge [29][30][31]. We used the NDVI3g dataset as an observation-based reference [29].
We seek to answer the following questions: 1.
How accurately can models simulate the observed SOS climatology in the region? 2.
Are models able to capture observed interannual variability and long-term trends of SOS? 3.
Is the choice of model or the choice of the meteorological database a more important factor affecting the accuracy of estimated SOS?
The results are very likely to inform the application of phenology models in Central Europe as the provided parameterisation can be directly included in BGMs or ESMs. The study also has global relevance as the tested comparison methodology applies to the study of any locality or phenological stage.

Study Area
This study focuses on Hungary (Figure 1). The country is located in the temperate climatic zone in Central Europe. Hungary is influenced by three major climate systems: Continental, Mediterranean, and Oceanic. The long-term (1981-2010) mean annual temperature is 10-11 • C, with mean annual precipitation of 500-750 mm. (https://www.met.hu/eghajlat/magyarorszag_eghajlata/; accessed on 11 January 2021.) The country is situated in the Carpathian Basin and is characterised by relatively low relief. The majority of the country is below 400 m with a mean elevation of~150 m.

Study Area
This study focuses on Hungary (Figure 1). The country is located in the temperate climatic zone in Central Europe. Hungary is influenced by three major climate systems: Continental, Mediterranean, and Oceanic. The long-term (1981-2010) mean annual temperature is 10-11 °C, with mean annual precipitation of 500-750 mm. (https://www.met.hu/eghajlat/magyarorszag_eghajlata/; accessed on 11 January 2021.) The country is situated in the Carpathian Basin and is characterised by relatively low relief. The majority of the country is below 400 m with a mean elevation of ~150 m. Land Cover database at 500 × 500 m spatial resolution. Land use categories shown on the map are described on the website of the database (http://land.copernicus.eu/pan-european/corine-landcover/clc-2012, accessed on 23 November 2021). Individual pixels used for the calibration of the phenology models are highlighted with coloured rectangles (see Section 2.6). Here we considered three main land use categories: Grasslands, forests, and arable lands. Pixel size is determined by the NDVI3g reference dataset.

Phenology Models
The estimation of the onset of spring vegetation growth is a highly scale-dependent task [17]. For large-scale modelling exercises, the species-specific or individual-based phenology models have to be upscaled to regional, continental, or global scales, which is a major challenge [6,17,20]. In this study, given the nature of the used models (BGMs, DGVMs, GCMs, and ESMs) we follow the method of [24] and [33] focusing on the climatological onset of leaf growth. In other words, we seek solutions that can be used to simulate climate-driven changes in the timing of SOS at a larger scale [20]. First, we handled SOS of forests, grasslands, and arable lands separately at the individual pixel level. Then we combined the PFT specific SOS according to the share of PFTs within the pixels to Geographical location (inset) and land use map of Hungary, based on the CORINE 2012 Land Cover database at 500 × 500 m spatial resolution. Land use categories shown on the map are described on the website of the database (http://land.copernicus.eu/pan-european/corine-land-cover/clc-2012, accessed on 23 November 2021). Individual pixels used for the calibration of the phenology models are highlighted with coloured rectangles (see Section 2.6). Here we considered three main land use categories: Grasslands, forests, and arable lands. Pixel size is determined by the NDVI3g reference dataset.

Phenology Models
The estimation of the onset of spring vegetation growth is a highly scale-dependent task [17]. For large-scale modelling exercises, the species-specific or individual-based phenology models have to be upscaled to regional, continental, or global scales, which is a major challenge [6,17,20]. In this study, given the nature of the used models (BGMs, DGVMs, GCMs, and ESMs) we follow the method of [24] and [33] focusing on the climatological onset of leaf growth. In other words, we seek solutions that can be used to simulate climate-driven changes in the timing of SOS at a larger scale [20]. First, we handled SOS of forests, grasslands, and arable lands separately at the individual pixel level. Then we combined the PFT specific SOS according to the share of PFTs within the pixels to provide a single SOS for a given year that is comparable with the observation-based SOS at 1/12 • × 1/12 • resolution. Three phenological models with different meteorological data requirements and underlying representation of the biological process were selected for the estimation of the onset of spring leaf growth.

WM
The first phenological model uses the simplest (and probably the most popular) approach: SOS is determined by accumulated thermal time. This method is widely used in terrestrial models such as IBIS (Integrated Biosphere Simulator; [34]) and CLM-DGVM (Community Land Model-Dynamic Global Vegetation Model; [34]). In this model (referred to as the warming model; WM), SOS is estimated as the calendar date when the cumulative growing degree-day (GDD; also known as heatsum relative to a predefined base temperature) exceeds a predetermined value. For a given PFT-specific base temperature, WM has one adjustable parameter, which is the cumulative GDD threshold. We use 1 January as the start of the GDD accumulation. The base temperature (T base ) and GDD threshold values used in this study were set by model optimisation (see Section 2.6). WM uses mean daily temperature only as input data.

CWM
The second model uses heat accumulation in spring as the driving factor (just as in the WM), but the timing of SOS as also affected by the exposure to low temperatures (i.e., fulfilment of the chilling requirement, or in other words vernalization) during winter. In this model, the GDD threshold is defined as: where NCD is the number of chilling days, while a, b, and c are three adjustable model parameters.
Here we refer to this model as a chilling-warming model (CWM). This approach is used in the process-driven ORCHIDEE (Organizing Carbon and Hydrology in Dynamic EcosystEms; [34]) and the SEIB-DGVM (Spatially Explicit Individual Based-Dynamic Global Vegetation Model; [34]) models. The CWM uses daily mean temperature as the input, but in this case, estimates of SOS for a given year use meteorological data from the second half of the previous year to calculate the fulfilment of the chilling requirement. Parameter optimisation was performed before the application of the model.

GSIM
A bioclimatic index (Growing Season Index; GSI) was constructed for the third model (Growing Season Index Model, GSIM), representing known constraints to foliar development [35]. Three variables were combined to jointly modulate the timing of SOS (where each variable has two limits): Suboptimal (minimum) temperature (with limits T Mmin and T Mmax ), vapor pressure deficit (VPD, with VPD min and VPD max ), and photoperiod (Photo min and Photo max ). Threshold limits, potentially adjustable in the GSI model, were set, and then indices were calculated separately for all three variables, and the GSI was calculated by multiplying the three indices. The calculated GSI has to reach a threshold, which is also an adjustable parameter. There is an additional adjustable parameter in the model that controls the smoothing window size that is used to avoid false onset detections [35]. Optimisation of the GSIM was done for all eight parameters included in the model.

Applicability of the Models
In forests and low-management intensity grasslands, it is reasonable to assume that the meteorological variables are the main drivers of the SOS date. In arable crops, particularly in spring-sown varieties, the phenology is also affected by human decisions (e.g., planting date). The often-intensive spring leaf growth of winter crop varieties is largely driven by the environmental conditions of the current and preceding years, as they are planted in autumn and are overwintering. Winter crops typically require vernalization; the accumulation of exposure to cold temperatures affects their phenological development [36,37]. In this sense, the CWM (originally developed for simulation of bud break of trees) might be structurally more suitable for winter crops and might reflect the effect of environmental variables on the spring leaf growth. To further complicate the picture, actual environmental conditions affect farmer decision making [38]. Crop phenology thus integrates weather, variety, and anthropogenic factors. At the target resolution of the study (1/12 • × 1/12 • ), winter and spring varieties share the total harvested area within a pixel almost equally (http://www.ksh.hu/docs/eng/xstadat/xstadat_annual/i_omn007a.html, accessed on 20 January 2021). Since spring varieties are typically planted after the initiation of leaf growth of other vegetation types, the phenology of winter crops may be the main driver of the observed green-up within a pixel. In summary, we assume that the climate will affect the intensive spring leaf growth for all plant functional types, and theoretically, all three model types are suitable for the estimation of SOS.

CarpatClim Database
CarpatClim is a high-resolution dataset based on weather observations of several meteorological variables for the Carpathian Region (44 • -50 • N latitudes and 17 • -25 • E longitudes) covering the period of 1961-2010 [39] at a 1/10 • × 1/10 • resolution regular grid. CarpatClim uses a relatively dense observation network (in Hungary, 109 stations measured precipitation and 34 stations temperature), although its temporal coverage is limited (the dataset ends in 2010). CarpatClim is one of the regional, observation-based gridded datasets that are available for specific parts of Europe (summarized in [22]). 1/10 • × 1/10 • resolution of CarpatClim data were interpolated to the target 1/12 • × 1/12 • resolution grid of this study using simple linear interpolation taking into account the area of the grid cells overlapping with the target grid cells.

FORESEE Database
FORESEE (Open Database FOR Climate Change-Related Impact Studies in Central Europe) is a free meteorological database for Central Europe covering the period of 1951-2100 with 1/6 • × 1/6 • spatial resolution [30,40]. FORESEE contains observationbased daily minimum/maximum temperature, precipitation data, and climate projections for these variables-the latter starting from the last continuously updated observed year. The dataset is supplemented by daylight mean temperature, global radiation, and VPD data estimated by the MTClim model [26]. Daylength (photoperiod) is also estimated by MTClim and used in this study to drive the GSIM. The target area of FORESEE spans 42 • 35 -51 • 05 N latitudes and 10 • 55 -28 • 05 E longitudes. The observation-based data (1951-2018) were built using the E-OBS v17 dataset [29,41]. FORESEE uses a relatively low number of station data (17 stations for temperature and 16 stations for precipitation). In the present study, we used a version of FORESEE interpolated to the target 1/12 • × 1/12 • resolution regular grid defined by the reference dataset.

ERA5 Database
ERA5 is a reanalysis database constructed and disseminated by ECMWF [42,43]. At present, ERA5 contains data from 1979 (also preliminary from 1950) until the present in hourly time steps. Observations are used indirectly in ERA5 through the 4D-Var data assimilation method. In this sense, ERA5 represents an alternative compared to CarpatClim and FORESEE that are both interpolated observation-based datasets. This assimilation method uses instrumental measurements and short-range model runs to produce gridded values of several meteorological parameters. The horizontal resolution of ERA5 is 1/4 • × 1/4 • . Surface data are available for 2D meteorological parameters such as rainfall, 2m temperature, wind speed, incoming solar radiation, and many others. ERA5 data were retrieved using the CDS API via R interface at a regular 1/10 • × 1/10 • resolution grid (due to technical considerations). The hourly data were post-processed to provide Atmosphere 2021, 12, 963 6 of 20 daily minimum/maximum temperatures and other ancillary variables needed in the study. ERA5 data were interpolated to the target grid using simple linear interpolation taking into account the area of the grid cells overlapping with the target grid cells.

Remote Sensing Based Reference Spring Phenology Dataset
Satellite-based remote sensing data were used in the present study to provide a reference for modelling. As remote sensing does not directly measure the SOS, the information registered by the instruments can be considered as proxy data. Thus, SOS determination from satellite data also has its own methodology and uncertainty (e.g., [44][45][46][47][48]) that has to be considered during the application of the datasets.
The MODIS-adjusted NDVI3g remote sensing dataset [29] covering Central Europe was used to provide observation-based data describing SOS. The dataset was created from the original NDVI3g [49] by statistical harmonisation (adjustment) using NDVI calculated from the data measured by the MODerate resolution Imaging Spectroradiometer (MODIS). Adjustment coefficients were determined for the overlapping years of NDVI3g and MODIS (2000-2013) and then applied to the entire NDVI3g period . The dataset has a 1/12 • × 1/12 • spatial and 15-day temporal resolution for the 1982-2013 period. The harmonised NDVI3g dataset showed improved quality for diverse vegetation metrics compared to the original NDVI3g [29]. SOS was calculated at pixel level based on the methodology described in [29], which partially follows the work of [50]. In brief, SOS was calculated as the average of three different methods, where SOS is the date when (1) the NDVI shows the highest rate of change in the year; (2) the derivative of the wavelet transformed function fitted to NDVI values reaches its first local maximum; and (3) the NDVI first reaches the value that gives the highest rate of change of the multiannual NDVI curves (serving as a predefined threshold). The uncertainty of the NDVI3g-based reference SOS data was estimated as the standard deviation of the results of the above three methods.

Ancillary Data Used in the Study
The elevation of grid cells at 1/12 • × 1/12 • resolution was estimated by the Shuttle Radar Topography Mission (SRTM) Digital Elevation Database [51]. Elevation classes within the target area were created considering the number of pixels within a given class so that sufficient data were available to support statistical analysis. Three classes were created: pixels below 100 m, between 100 and 200 m, and above 200 m elevation.
The share of generalized land cover categories (i.e., PFTs) within each grid cell was calculated based on the CORINE Land Cover 2012 database. Grid cells with relatively homogeneous PFT cover were used for the optimisation of the phenology models. Pixels were classified into three categories: If, within the pixel, PFT cover reached 70% threshold for grasslands, 80% threshold for forests, and 90% threshold for arable lands. Overall, 25 pixels were selected for forest, 99 pixels for arable lands, and only 5 pixels for grasslands ( Figure 1).

Optimisation of Phenological Models
Annually calculated, PFT-dependent median SOS dates were used to optimize the models using the selected homogeneous pixels and the description of their spring phenology from the NDVI3g dataset.
In the case of the WM, a simple global search algorithm was used with T base limit ranging from 0 to 8 • C and GDD threshold ranging from 100 to 300 • C days. We used 1 • C steps for T base and 1 • C days steps for GDD threshold, and all parameter combinations were tested. The root mean square error (RMSE) was calculated from the SOS medians of the PFT specific pixels calculated by model-dataset combinations and the NDVI3g-based SOS median. The parameter optimum was selected by finding the minimum RMSE.
In the case of the CWM, all five parameters were subject to calibration (a, b, c, GDD, and NCD; see [24]). We set the range of the parameters based on [24,34]. Parameter a was varied between −90 and −47 • C days, while parameter b was varied between 622 and 653 • C days, with 1 • C days steps for both cases. Parameter c was varied between −0.011 and −0.0101 with 0.001 steps. T base was varied between 0 and 8 • C, and NCD_T base was varied between 0 and 5 • C with 1 • C steps (NCD refers to the number of chilling days).
The GSIM model is characterised by a relatively high number of parameters. To simplify the optimisation process, we first fixed the difference between the T MMin and T MMax (see above). Then, minimum temperature limits were varied systematically between −2 and 7 • C (T MMin ), and 5 and 14 • C (T MMax ) with 1 • C resolution. VPD limits were varied between 500-1400 hPa (VPD min ) and 3700-4600 hPa (VPD max ) with 100 hPa steps. The photoperiod was varied between 35,500-36,400 s (Photo Min ) and 39,100-40,000 s (Photo max ) with 100 s steps. After this initial optimisation, a second step was necessary with reduced degrees of freedom. By fixing all other parameters, minimum temperature limits were varied between −3 and 5 • C (T MMin ), and 3 and 11 • C (T MMax ) with 0.5 • C resolution, now without fixing the difference between the minimum temperature thresholds. The GSI threshold was also included in the optimisation and was varied between 0.42 and 0.58 with 0.01 steps.
After model optimisation using relatively homogeneous PFT cover pixels, phenological models were run for all available pixels in the target area. Then the PFT-specific results were combined by weighting using the share of PFTs for the given pixel. The resulting SOS dataset reflects the contribution of different PFTs to the estimated SOS, which means that they are comparable with the observation-based SOS dataset for the whole country.

Statistical Analysis
The SOS datasets generated using the models and the adjusted NDVI3g dataset were calculated in a common 1/12 • × 1/12 • resolution grid, with a total of 1592 pixels available in Hungary. As the NDVI3g dataset provides unrealistic values over lakes and urban areas, a few pixels were excluded from the analysis. The remaining 1539 pixels were analysed in the study for the period of 1982-2010. The start year of the comparison is defined by the availability of NDVI3g, while the end year (2010) is constrained by the limited temporal availability of the CarpatClim dataset.
Climatological maps providing information about the long-term mean SOS data for the pixels were created. Long-term mean SOS was calculated for each pixel using the different model-database combinations, and for the NDVI3g dataset. The pixel-based results were further analysed (country median, standard deviation, 5th and 95th percentiles). Elevation classes were created for analysis based on the DEM dataset. We also calculated the spatial covariance between the pixel level SOS climatology values.
The interannual variability of the results was quantified in annual box-whisker plots. Median SOS dates were used to further analyse the relationship between the observationbased and estimated data. Performance metrics (bias, RMSE, and square of the linear correlation coefficient (R 2 ); for definitions see e.g., [52]) were calculated using the annually calculated country medians for the elevation classes, and the pixel-based SOS time series as well.
Trend analysis was performed using the calculated annual median SOS values to check the applicability of the models in terms of long-term, climate change-related SOS shifts in the study period. The analysis was executed at the pixel level, then histograms were created to compare the overall, pixel-based trends, and the observation-based trends. Statistical significance (p-value) of the regressions was quantified.
To quantitatively assess whether the model selection or the meteorological database selection influences the results more, a specific algorithm was developed. Annual SOS dates were analysed for the 1982-2010 period for all nine model-database combinations at the pixel level. As the first step, cumulative density functions (CDFs) were calculated for the SOS time series of the given pixel. Two groups were created from the CDFs, where each group contained three separate CDF combinations. In the first group (weather group), the three models were fixed one at a time and their results were compared as the function of meteorological datasets (CarpatClim, FORESEE, ERA5). In the second group (model group) the three meteorological datasets were fixed one at a time, and the associated model results were compared for the WM, CWM, and GSIM models, respectively. A total of six CDF combinations were created (three for the weather group and three for the model group), where each combination consisted of three CDFs. The Kolmogorov-Smirnov distance was used to calculate the average distance between all possible CDF pairs in each combination. The mean distance was calculated for the two groups (model and weather). Then, to calculate one unified index (referred to as the Importance Indicator, abbreviated as IID) the mean distance of the second group was subtracted from the mean distance of the first group. If the model selection is more important than the meteorological dataset selection, the difference is negative. The IID was calculated for all grid cells and visualised in the form of a heat map. The statistical significance of IID was calculated with a permutation test for each pixel. Using the previously calculated CDFs, 100,000 permutations were performed. We calculated the IID for each relabelled CDF. The null hypothesis was that the true IID is zero (i.e., we cannot choose between model and weather group). The p-values were calculated by counting the number of cases where the absolute value of the random IID was higher than or equal to the absolute value of the actual IID.
The calculations were performed in Interactive Data Language v8.5 (Harris Geospatial Solutions, Inc., Broomfield, CO, USA), and in the R statistical computing software [53], where the latter was used for IID calculations. Table 1 presents the parameters of optimised models. As a result of the GSIM optimisation, the minimum temperature and VPD thresholds were changed relative to the original values proposed by [35], while the daylength limits remained the same. For grasslands and arable lands, the minimum temperature limits are the same, and for forests and grasslands, VPD limits became the same. Table 1. PFT-specific parameters for the three optimised models used in the study. WM refers to the warming model, CWM stands for the chilling-warming model, and GSIM is the Growing Season Index Model. A detailed description of the GSIM is provided in [35].

Grasslands
Forests For the WM, the optimum GDD threshold was set to 105-128 • C days (depending on the PFT) as a result of the calibration. The same T base was used for grasslands and arable lands (5 • C). T base for forests is higher than that of arable lands and grasslands.
T base values used by the WM and the CWM differ as a result of the optimisation. However, the T base values for grasslands and arable lands are lower than for forests, both for the WM and CWM models. The chilling day base temperature (NCD_T base ) is lower in grasslands than in forests and croplands. Figure 2 shows SOS climatology maps for the country, calculated as median SOS date (DOY) of the three models driven by the three different climate datasets covering the 1982-2010 period. Model results disagree about the location of regions typical for early SOS. Observation-based long-term SOS indicates the earliest greening in the eastern, plain regions of the country. If we compare the sensitivity of the models to local features, it is clear that the CarpatClim dataset provides the highest level of details (pixel-by-pixel variability). As expected, the ERA5 dataset is characterized by smoothed-out spatial features, given the 1/4 • × 1/4 • native spatial resolution, which is quite coarse compared to the other datasets.

Climatology Maps for SOS
datasets. Table 1. PFT-specific parameters for the three optimised models used in the study. WM refers warming model, CWM stands for the chilling-warming model, and GSIM is the Growing S Index Model. A detailed description of the GSIM is provided in [35].    Table 2 provides key statistics behind the SOS climatology maps, allowing for quantitative insight into the spatial variability of the SOS estimates. The observation-based median SOS date for the 1982-2010 period is DOY 105 (15 April). SOS values increase with elevation both according to the models and the observation (with the exception of FORESEE-driven GSIM and ERA5-driven WM). Model-database combinations typically reproduce the observed median given the optimisation of the models. GSIM systematically predicts earlier SOS relative to the observation, except for the CarpatClim input data in the 200-700 m elevation class. In the case of CWM and WM, the different meteorological datasets provide very similar results. Table 2. Spatial statistics of the SOS climatologies (expressed by DOY): Medians, standard deviations, 5th and 95th percentiles. Percentiles are abbreviated as "perc", standard deviation is abbreviated as "SD". The last column presents the observation-based data. Results are presented for the selected elevation classes as well. Standard deviation (i.e., spatial variability) is underestimated by the model-database combinations (except for the GSIM and WM driven by CarpatClim in the 200-700 m elevation class). The CWM consistently predicts smaller spatial variability than the other two. The GSIM has the highest overall SD values; these are the closest to those of the observations but still underestimated (the exception is the 0-100 m GSIM driven by the CarpatClim and ERA5 database). According to the observed data, the earliest vegetation onset is DOY 92 (2 April), while the latest is DOY 118 (28 April). Looking at model performance, the GSIM is the best in estimating SOS data points in the 5th percentile (early greening regions), while the WM provides better predictions for the 95th percentile (late greening regions).
An additional statistical assessment of the SOS climatologies is presented in Table S1 of the Supplementary Material. According to the results, the spatial covariance between the maps generated by the model-database combinations is typically low. The highest R 2 is associated with the WM driven by the CarpatClim and the FORSESEE database. Figure 3 shows the interannual and spatial variability of the modelled and observed SOS, with relatively higher spatial variability within the specific years for the GSIM and the WM. CWM is associated with low spatial variability across years that is consistent with the SOS maps presented in Figure 2. None of the model-database combinations could reproduce the observed high spatial variability of SOS (represented by the interquartile ranges) for the specific years that are present in the observation-based SOS plot (lowest graph in Figure 3). the WM. CWM is associated with low spatial variability across years that is consisten with the SOS maps presented in Figure 2. None of the model-database combinations could reproduce the observed high spatial variability of SOS (represented by the interquartile ranges) for the specific years that are present in the observation-based SOS plot (lowes graph in Figure 3). IAV was expressed as the standard deviation around the annual median SOS dates Table 3 shows statistics about the IAV for the whole county, based on all available model database combinations, and also for the NDVI3g-based results. The overall standard de viation of the observation is ~5 days and decreases with elevation. The observed IAV is smaller than modelled, and the CWM estimate is the closest to the NDVI3g-based IAV. In the case of WM, the CarpatClim-driven CWM and ERA5-driven GSIM concur that the least variability is found at the elevation class 200-700 m, which is the case in the observed data.

Interannual Variability
The performance metrics describing modelled annual median SOS dates relative to the observation-based dataset show that the models typically slightly underestimate the observation. Further, the smallest overall RMSE value results from the CWM, followed by WM and GSIM (Supplementary Material, Table S2). An additional statistical assessmen of the IAV is presented in Tables S2 and S3, and Figures S1 and S2 of the Supplementary Material. IAV was expressed as the standard deviation around the annual median SOS dates. Table 3 shows statistics about the IAV for the whole county, based on all available modeldatabase combinations, and also for the NDVI3g-based results. The overall standard deviation of the observation is~5 days and decreases with elevation. The observed IAV is smaller than modelled, and the CWM estimate is the closest to the NDVI3g-based IAV. In the case of WM, the CarpatClim-driven CWM and ERA5-driven GSIM concur that the least variability is found at the elevation class 200-700 m, which is the case in the observed data.
The performance metrics describing modelled annual median SOS dates relative to the observation-based dataset show that the models typically slightly underestimate the observation. Further, the smallest overall RMSE value results from the CWM, followed by WM and GSIM (Supplementary Material, Table S2). An additional statistical assessment of the IAV is presented in Tables S2 and S3, and Figures S1 and S2 of the Supplementary Material. Table 3. Interannual variability (expressed as standard deviation in days, SD) of the annually calculated median SOS dates for the different model-database combinations and observations. Data are shown for the selected elevation classes and the whole study area.

Trend Analysis
Temporal change in the timing of spring leaf development was extracted from the simulated and observed SOS datasets. Observation-based estimates suggest that the date of SOS advanced in Hungary by 2.4 days per decade during the 1982-2010 time period, and this advance is the fastest at elevations below 100 m ( Table 4). The FORESEEand CarpatClim-driven GSIM generate an SOS trend consistent with the observations, whereas the WM shows the largest advancement of spring greening. The models used in this study are unable to consistently account for the observed trend change across the elevations. They typically generate similar trends across all three elevation classes (Supplementary Material, Figure S3).

Model and Driving Meteorology Database Selection
Our modelling exercise shows that choosing the right model to predict SOS is far more consequential than selecting the meteorological database to drive that model (Figure 4). In the case of Hungary, it is only a minority of mountainous pixels found in the highest elevation class (3.1% of the area) where the selection of the database is more important in terms of SOS prediction accuracy. The difference in the prediction accuracy as a result of model or database choice is statistically significant (p < 0.05) in 1169 pixels out of 1539 considered in the analysis.

Discussion
Modelling the climatological onset of spring leaf growth in BGMs and ESMs/GCMs is crucial, especially under current rapid climate change [6,54]. Modelling approaches try to approximate the reality and represent our current knowledge of plant physiology but will remain simplified representations of the real world processes [13,18,24]. Alongside modelling issues due to simplification, it is widely recognised that inaccuracies in the input data and errors in reference datasets used for model optimisation (see e.g., [5,25]) both negatively affect the accuracy of any prediction. In the present study, we aimed to evaluate the relative contribution of the phenology model and meteorological dataset selection to the accuracy of predicting the onset of spring leaf growth. The proposed method could be further developed and integrated into GCMs and ESMs.

Model Parametrisation, Model Structural Differences
The TMMax threshold for the GSIM suggested by the calibration was considerably higher than those proposed by [35] at the global scale (Table 1), whereas the minimum temperature thresholds were more realistic and correspondent to plant physiology. GSIM uses daily, non-accumulated environmental data to constrain bud break in spring and subsequent leaf growth, unlike the WM and CWM, which use cumulative values. In this sense, the WM and CWM can be considered models with "memory" with the capacity to integrate the effects of past environmental conditions. The results of our model parameterisation suggest that minimum temperatures may need adjustment to allow for sufficient heat accumulation to drive SOS.
The SOS trends simulated by the GSIM capture the observations fairly well, while

Discussion
Modelling the climatological onset of spring leaf growth in BGMs and ESMs/GCMs is crucial, especially under current rapid climate change [6,54]. Modelling approaches try to approximate the reality and represent our current knowledge of plant physiology but will remain simplified representations of the real world processes [13,18,24]. Alongside modelling issues due to simplification, it is widely recognised that inaccuracies in the input data and errors in reference datasets used for model optimisation (see e.g., [5,25]) both negatively affect the accuracy of any prediction. In the present study, we aimed to evaluate the relative contribution of the phenology model and meteorological dataset selection to the accuracy of predicting the onset of spring leaf growth. The proposed method could be further developed and integrated into GCMs and ESMs.

Model Parametrisation, Model Structural Differences
The T MMax threshold for the GSIM suggested by the calibration was considerably higher than those proposed by [35] at the global scale (Table 1), whereas the minimum temperature thresholds were more realistic and correspondent to plant physiology. GSIM uses daily, non-accumulated environmental data to constrain bud break in spring and subsequent leaf growth, unlike the WM and CWM, which use cumulative values. In this sense, the WM and CWM can be considered models with "memory" with the capacity to integrate the effects of past environmental conditions. The results of our model parameterisation suggest that minimum temperatures may need adjustment to allow for sufficient heat accumulation to drive SOS.
The SOS trends simulated by the GSIM capture the observations fairly well, while the other models offered less accurate results. A possible explanation might be that the minimum temperature is used by the GSIM, while the mean temperature is used by the other two approaches. At the point of leaf growth initiation, the vegetation might be more sensitive to early morning low temperatures, which would be masked by the use of the daily mean [35]. As the majority of current phenology models use mean air temperature as a driving variable, our observation suggests that considering minimum temperatures may improve their accuracy.

Spring Plant Phenology in Central Europe
Plant phenology was already investigated in Hungary, in studies focusing on leaf phenology, flowering time, or other aspects of the phenological cycle [9,29,[55][56][57][58][59][60][61]. In a study looking at the flowering of six mixed herbaceous and woody species between 1951 and 2000, the date of flowering advanced by 1.9-4.4 days/decade in four out of the six species [60]. These results corroborate the trends we extracted from the NDVI3g dataset ( Figure S3 of the Supplementary Material). Similarly, spring budburst and flowering were seen to advance by 2.1 days/decade in Europe during an investigated period covering 30 years [57], with a similar observation originating from Hungary [56]. However, a number of herbaceous and woody species studied in 10 central European regions in 1980-1995 advanced their spring phenology by 13.9 days/decade in urban and 15.3 days/decade in rural regions [62]. This is nearly an order of magnitude faster than the trend observed in this study. An examination of Europe from satellite observations in 1982-2001 at 1/10 • × 1/10 • spatial resolution came to a similar conclusion: Spring advanced by 13.2 days/decade in the domain that includes Hungary [63]. Interestingly, in situ observations of SOS trends in Central Europe based on the PEP725 database for the 1982-2011 period [4] indicate a slightly faster advance by 4.7 days/decade than the NDVI3g-based trend presented here. However, the WM model (driven by FORESEE or CarpatClim) comes very close to the observations derived from PEP725. Clearly, even though the spring initiation of growth is a complex process, the modelling approaches investigated in this study can provide reasonable predictions.

Evaluation of Model Performance
Relatively few studies of plant phenology report model bias (Table 5), even though this metric of model performance is critically important in the case of BGMs and ESMs, as biased estimation of SOS affects the simulated carbon budget [64]. The performance of our models is comparable to those published in the literature. The RMSE tends to be smaller in individual-based models. Our models show moderate performance as measured by R 2 , which is partly explained by the spatial scale of this study (effect of mixed PFTs) and the general optimisation of the models. The bias is relatively large in some pixels, indicating that further optimisation and further development of the models are needed.

Model Selection Versus Meteorology Database Selection
Phenology models are known to be sensitive to the choice of the forcing data [25], but this is the first study that compares the relative importance of the model and driving meteorological dataset selection. Due to the recent advances in the construction and dissemination of meteorological information, today's users typically have access to a number of different meteorological databases to drive their models. Similarly, a large variety of phenology models is available (e.g., [13]). Previous studies comparing different spring phenology models [5,13,14,33,69] show large variation in their capacity to predict SOS, with a clear need for further investigation [4,69]. In summary, phenology modelers can and have to make decisions on the model-meteorology combination for practical applications.
We show that the following different model-dataset combinations are best suited to represent varying aspects of spring pant phenology: SOS climatology (WM driven by CarpatClim), IAV (WM driven by FORESEE), and SOS trend (GSIM driven by Carpat-Clim). We further show that the SOS estimation performance is strongly affected by model selection and also by the choice of meteorological dataset. This implies that a detailed evaluation of model performance is needed, preferably using pre-determined and clearly defined performance metrics. Additionally, multiple meteorological datasets should be tested, despite the demand for computational capacity and extensive data handling. The IID method to quantify the relative importance of model and meteorology dataset selection for the accuracy of the prediction seems to have performed well and clearly indicated environments where preference to either selection should be preferred.
As Figure S2 of the Supplementary Material illustrates, model selection was more important than the meteorology database selection in most of Hungary, although there are notable exceptions. For example, model selection was not the dominant factor in the regions concentrated in the mountainous areas. This is interesting because spatial interpolation of meteorological data over complex terrain suffers from significant issues [70,71]. Some climate datasets may be able to represent rugged topography better than others, bringing the choice of the dataset to the fore. In the case of Hungary, which does not have high mountains (1015 m is the highest point of the country), the choice of phenological model is clearly more important, but this may not be the case in every environment or if performing studies over large areas. The selection of a suitable driving dataset may therefore be critical in some circumstances; however, such a determination is beyond the scope of the current study.

Conclusions
This study evaluated the influence of model and climate dataset selection on the accuracy of the start of leaf onset prediction using Hungary as a case study. To our knowledge, no similar study exists that uses multiple phenology models and driving meteorological datasets together with observation-based SOS estimates with the evaluation of the relative importance of model structure and driving meteorology. We provide information supporting model selection and provide a reference dataset and model parameters for future studies. Generalization of the presented methodology at large spatial scales (e.g., in the context of GCMs and ESMs) needs further research focusing on other PFTs as well.
No single model-dataset combination could be selected as clearly superior to the others. Our evaluation based on the newly developed IID index suggests that climate dataset choice may be more important when high environmental variation is expected at a small scale. However, in most cases, the choice of the model structure is more important than the selection of the forcing meteorological database. Current and future climate datasets should be evaluated for their applicability to phenological research prior to their use in driving models.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/atmos12080963/s1, Figure S1: Relationship between the observation-based and the WM based annual median SOS dates for the 1982-2010 time period. Error bars represent +/− one standard deviation of the NDVI3g-based SOS estimations. The equation of the fitted line is also presented (R 2 is significant with a p-value < 0.0001)., Figure S2: R 2 map based on the FORESEE-driven WM and the observations for the 1982-2010 time period., Figure S3: Frequency distribution of the pixel-based decadal trend values for the different models and the different driving datasets per model. The NDVI3g-based trend is presented in all plots to provide a reference. The plot was constructed with data from all elevation classes together. Table S1: Performance metrics for the SOS climatology maps. RMSE and bias are expressed by DOY, while R 2 is unitless. Table S2: Performance metrics for the calculated annual median SOS dates relative to the observation-based results. RMSE and bias are expressed by DOY, while R 2 is unitless. Table S3: Additional statistical information about the interannual variability of SOS calculated at the individual pixel level. The 5th and 95th percentiles of the pixel level statistics are presented. RMSE and bias are expressed by DOY, while R 2 is unitless.

Data Availability Statement:
The presented dataset is available from the authors upon request.