Empirical Approach for Modelling Tree Phenology in Mixed Forests Using Remote Sensing

: Phenological events are good indicators of the effects of climate change, since phenological phases are sensitive to changes in environmental conditions. Although several national phenological networks monitor the phenology of different plant species, direct observations can only be conducted on individual trees, which cannot be easily extended over large and continuous areas. Remote sensing has often been applied to model phenology for large areas, focusing mostly on pure forests in which it is relatively easier to match vegetation indices with ground observations. In mixed forests, phenology modelling from remote sensing is often limited to land surface phenology, which consists of an overall phenology of all tree species present in a pixel. The potential of remote sensing for modelling the phenology of individual tree species in mixed forests remains underexplored. In this study, we applied the seasonal midpoint (SM) method with MODIS GPP to model the start of season (SOS) and the end of season (EOS) of six different tree species in Slovenian mixed forests. First, substitute locations were identiﬁed for each combination of observation station and plant species based on similar environmental conditions (aspect, slope, and altitude) and tree species of interest, and used to retrieve the remote sensing information used in the SM method after ﬁtting the best of a Gaussian and two double logistic functions to each year of GPP time series. Then, the best thresholds were identiﬁed for SOS and EOS, and the results were validated using cross-validation. The results show clearly that the usual threshold of 0.5 is not best in most cases, especially for estimating the EOS. Despite the difﬁculty in modelling the phenology of different tree species in a mixed forest using remote sensing, it was possible to estimate SOS and EOS with moderate errors as low as <8 days ( Fagus sylvatica and Tilia sp.) and <10 days ( Fagus sylvatica and Populus tremula ), respectively.


Introduction
Reliable phenological observations, achieved through either direct visual human observations in phenology networks, near-surface measurements using phenology cameras and unmanned aerial vehicles, or remote sensing via satellites [1], are important for anticipating the effects of climate change on tree phenology in forests. In fact, tree phenological phases mirror environmental conditions and thus can respond to variations in climate, making them good indicators of the effects of climate change when long series of phenological observations exist, as they can reveal some trends in the timing of spring and autumn phenological events [2,3].
Long-term phenological observations in trees, including spring events such as leaf unfolding and autumnal events of leaf colouring, have shown that a rise in global temperature generally leads to earlier timing of spring events [4][5][6]. Leaf phenology is a major determinant of water and CO 2 fluxes, and several studies clearly showed that growing season length controls net ecosystem primary productivity and that phenological shifts already modified the annual carbon cycle of terrestrial ecosystems [7,8].
Although many national phenological monitoring networks have their phenological stations equally distributed over altitude, slope, and aspect, most of the stations are either outside the forest or on the forest edge due to accessibility and other practical reasons [4,9]. This could have implications for the observed tree phenology and the generalisations resulting from these observations at national or regional scales [10].
Direct observations on the ground can provide phenological characteristics of the different plant species present at a given location. However, they are unable to provide continuous phenology characteristics for a given geographic area [11]. On the other hand, satellite data can help estimate and monitor phenological characteristics over extended areas, even though these estimations are always more generalised than direct observations of individual plant phenology [12,13].
Phenology modelling using remote sensing typically makes use of vegetation indices such as the normalised difference vegetation index (NDVI), the enhanced vegetation index (EVI) and leaf area index (LAI) [1,14]. The choice of remote sensing data source requires a compromise between their spatial resolution, their temporal resolution, and their time span. Landsat missions provide long records of data with 30 m spatial resolution, but their temporal resolution is around 16 days, making it more challenging to reconstruct continuous data, especially when large data gaps are present. For this reason, several phenology modelling studies turned to moderate and low resolution satellites (>250 m) provided by the advanced very-high-resolution radiometer (AVHRR) and the moderate-resolution imaging spectroradiometer (MODIS) [15][16][17]. Some studies demonstrated the adequacy of gross primary production (GPP) for modelling plant phenology [18,19], making the MODIS GPP product a useful and yet underexplored data source to that end, considering their high temporal resolution of 8 days.
To derive phenological characteristics from remote sensing data, seasonality in the satellite data time series is essential. Different factors affect the seasonality observed in the remote sensing data. These factors include noise and extended gaps in the data [20,21], but also low temporal resolution [22]. It is therefore usual practice to apply mathematical functions to smooth or fit a curve to the time series of remote sensing data and provide continuous data useful for modelling the phenology. These methods include logistic and improved logistic models [23][24][25][26], Gaussian functions [27], quadratic functions and nonlinear spherical models [28,29], Savitzky-Golay filters [30], polynomial functions [31], discrete Fourier series [11] and machine learning-based algorithms [32]. After reconstructing continuous data, phenological characteristics are often extracted following different approaches [13], mainly threshold-based methods [11,33], the moving averages approach [34], the inflection point method [23] and the time of maximum increase [33,35]. The data reconstruction algorithm and phenology extraction methods chosen would greatly affect the modelled phenology. The simplicity and flexibility of threshold-based methods make them particularly popular. In addition, they were found to predict the start of season (SOS) more accurately than other methods in different forests in Europe [11], North America [36] and China [37].
Phenology modelling studies using remote sensing mostly focused on pure deciduous tree species and showed a good correlation with ground observations. In areas with mixed plant species, the phenology of individual plant species are all grouped under a unique "pixel ecosystem" phenology, or land surface phenology, which is often investigated instead of the phenology of the individual tree species in most studies dealing with mixed forests [38][39][40]. Land surface phenology is generally found to be more difficult to model in mixed forests compared to pure forests, especially for the end of season (EOS) [41]. The ground observations often used to match remote sensing data for EOS are either the date of leaf colouring or the date of leaf fall depending on plant species [42], which are further difficult to apprehend in mixed forests. To our knowledge, very few (if any) studies Remote Sens. 2021, 13, 3015 3 of 15 modelled the phenology of individual tree species in mixed forests using remote sensing. In fact, with multiple plant species coexisting in the extent of a remote sensing pixel, it is not straightforward to match ground observations with remote sensing information in order to model the phenology of different tree species in a mixed forest.
In this study, we suggest an empirical approach to model the phenology of different tree species in mixed forests. We take advantage of multi-year observations at multiple phenology stations and satellite data to identify the best thresholds for characterizing the phenology (SOS and EOS) for individual tree species at different stations in mixed tree species forests.

Phenological Observations
We used data on direct phenological observations of the timing of first leaf unfolding (hereafter termed as start of season, SOS) and general leaf colouring (hereafter termed as end of season, EOS) of selected tree species of the National Phenological Network (NPN) of the Slovenian Environment Agency. Slovenia is characterised by fairly large gradients of climatic factors due to its position between the Alps, the Mediterranean and continental Europe [43,44]. Consequently, a great variety of forest habitats, from the lowlands to the high mountains, can be found in the country [45]. Six broadleaved tree species/groups with more than ten years of available phenological observations provided by the NPN were included in this study: European beech (Fagus sylvatica), silver birch (Betula pendula), lime tree (mainly Tilia platyphyllos), common ash (Fraxinus excelsior), oak (pedunculate oak and sessile oak, i.e., Quercus robur and Quercus petraea, respectively) and aspen (Populus tremula). The phenological data were collected at 17 NPN phenological stations which are distributed throughout Slovenia ( Figure 1).  [41]. The ground observations often used to match remote sensing data for EOS are either the date of leaf colouring or the date of leaf fall depending on plant species [42], which are further difficult to apprehend in mixed forests. To our knowledge, very few (if any) studies modelled the phenology of individual tree species in mixed forests using remote sensing. In fact, with multiple plant species coexisting in the extent of a remote sensing pixel, it is not straightforward to match ground observations with remote sensing information in order to model the phenology of different tree species in a mixed forest.
In this study, we suggest an empirical approach to model the phenology of different tree species in mixed forests. We take advantage of multi-year observations at multiple phenology stations and satellite data to identify the best thresholds for characterizing the phenology (SOS and EOS) for individual tree species at different stations in mixed tree species forests.

Phenological Observations
We used data on direct phenological observations of the timing of first leaf unfolding (hereafter termed as start of season, SOS) and general leaf colouring (hereafter termed as end of season, EOS) of selected tree species of the National Phenological Network (NPN) of the Slovenian Environment Agency. Slovenia is characterised by fairly large gradients of climatic factors due to its position between the Alps, the Mediterranean and continental Europe [43,44]. Consequently, a great variety of forest habitats, from the lowlands to the high mountains, can be found in the country [45]. Six broadleaved tree species/groups with more than ten years of available phenological observations provided by the NPN were included in this study: European beech (Fagus sylvatica), silver birch (Betula pendula), lime tree (mainly Tilia platyphyllos), common ash (Fraxinus excelsior), oak (pedunculate oak and sessile oak, i.e., Quercus robur and Quercus petraea, respectively) and aspen (Populus tremula). The phenological data were collected at 17 NPN phenological stations which are distributed throughout Slovenia ( Figure 1).  Tree selection and observations were made in accordance with national guidelines [46], which in general follow the World Meteorological Organization (WMO) guidelines for phenological observations [9]. The monitoring of the different phenological events was performed on single individual mature trees for each tree species in natural populations, but more tree species at the same phenological station could be included. The observed trees could be selected in forests (seldom pure), at the forest edge, and in the open area. In spring, the observed period of leaf unfolding and leaf development was approximately four weeks, and during this time, the observations were performed daily. The SOS was recorded when the first regular surfaces of leaves became visible in three to four places on the observed tree. The phase corresponds to Biologische Bundesanstalt, Bundessortenamt und Chemische Industrie code 11 (BBCH11) [47]. The EOS was recorded when at least 50% of the leaves turned yellow on the observed tree (BBCH code 94). Each observed tree was precisely geolocated (longitude, latitude) using a GPS, and station characteristics were noted (elevation, slope, and aspect). The altitude of observed trees ranged from 53 to 820 m above sea level. The period of observation was from 2004 to 2019 for Fagus sylvatica and Quercus sp. at some stations, and from 2010 to 2019 for the other combinations of stations and species.

Remote Sensing Data
The phenological observations were not always made in the middle of a forest with a significant cover of the plant species being observed. Therefore, to be able to retrieve remote sensing information that reflects ground observations to some extent, the nearest forest stand [48] within a radius of 20 km [49] from the observed tree and with comparable station characteristics was identified. Given the original observation station characteristics (altitude, slope, and aspect), the substitute location used as the centre for the remote sensing data pixel had characteristics not exceeding altitude ± 200 m, slope ± 10 • , and aspect ± 1 class (note that aspects represent the standard 1 to 8 categorisation, i.e., N, NE, E, SE, S, SW, W, and NW, with −1 for flat areas). In addition, only stands larger than 1 ha (10,000 m 2 ) qualified for use as a substitute site. This threshold was based on the assumption that if a tree species is represented as nearly pure in 1 ha, it is likely to be present as a secondary species in the rest of the pixel, therefore contributing substantially to the pixel value (note that most of the species included in this study do not form pure stands over extended areas in Slovenia). From the qualifying forest stands satisfying the aforementioned criteria, the nearest stand to the observation station was used as a substitute stand. The centroid of the substitute stand was used to retrieve remote sensing information from Google Earth Engine for the period 2003 to 2020. The previous routine (the generation of the pixel centre from the observation station) applied for each combination of station and tree species using the ArcGIS desktop software (version 10.7.1) is summarised in Figure 2.
Tree selection and observations were made in accordance with national guidelines [46], which in general follow the World Meteorological Organization (WMO) guidelines for phenological observations [9]. The monitoring of the different phenological events was performed on single individual mature trees for each tree species in natural populations, but more tree species at the same phenological station could be included. The observed trees could be selected in forests (seldom pure), at the forest edge, and in the open area. In spring, the observed period of leaf unfolding and leaf development was approximately four weeks, and during this time, the observations were performed daily. The SOS was recorded when the first regular surfaces of leaves became visible in three to four places on the observed tree. The phase corresponds to Biologische Bundesanstalt, Bundessortenamt und Chemische Industrie code 11 (BBCH11) [47]. The EOS was recorded when at least 50% of the leaves turned yellow on the observed tree (BBCH code 94). Each observed tree was precisely geolocated (longitude, latitude) using a GPS, and station characteristics were noted (elevation, slope, and aspect). The altitude of observed trees ranged from 53 to 820 m above sea level. The period of observation was from 2004 to 2019 for Fagus sylvatica and Quercus sp. at some stations, and from 2010 to 2019 for the other combinations of stations and species.

Remote Sensing Data
The phenological observations were not always made in the middle of a forest with a significant cover of the plant species being observed. Therefore, to be able to retrieve remote sensing information that reflects ground observations to some extent, the nearest forest stand [48] within a radius of 20 km [49] from the observed tree and with comparable station characteristics was identified. Given the original observation station characteristics (altitude, slope, and aspect), the substitute location used as the centre for the remote sensing data pixel had characteristics not exceeding altitude ± 200 m, slope ± 10°, and aspect ± 1 class (note that aspects represent the standard 1 to 8 categorisation, i.e., N, NE, E, SE, S, SW, W, and NW, with −1 for flat areas). In addition, only stands larger than 1 ha (10,000 m²) qualified for use as a substitute site. This threshold was based on the assumption that if a tree species is represented as nearly pure in 1 ha, it is likely to be present as a secondary species in the rest of the pixel, therefore contributing substantially to the pixel value (note that most of the species included in this study do not form pure stands over extended areas in Slovenia). From the qualifying forest stands satisfying the aforementioned criteria, the nearest stand to the observation station was used as a substitute stand. The centroid of the substitute stand was used to retrieve remote sensing information from Google Earth Engine for the period 2003 to 2020. The previous routine (the generation of the pixel centre from the observation station) applied for each combination of station and tree species using the ArcGIS desktop software (version 10.7.1) is summarised in Figure 2. We opted for MODIS GPP (8-day aggregate) despite its very coarse spatial resolution (500 m) because the seasonality of GPP was more marked in the mixed forests of this study We opted for MODIS GPP (8-day aggregate) despite its very coarse spatial resolution (500 m) because the seasonality of GPP was more marked in the mixed forests of this study compared to that of vegetation indices, hence leading to better estimates of phenological events, i.e., SOS and EOS. In addition, the proportion of good quality data available for MODIS GPP (non-missing data after filtering out bad quality pixels) was considerably higher than that of vegetation indices, both from MODIS and Landsat 7, making the seasonality easier to retrieve. However, MODIS (250 m spatial resolution, 8-day temporal Remote Sens. 2021, 13, 3015 5 of 15 resolution) and Landsat 7 ETM+ (30 m spatial resolution, 16-day temporal resolution) derived EVI and NDVI, as well as MODIS LAI (500 m resolution, 4-day temporal resolution) were also tested and reported in this study.

Data Analysis and Modelling
The 8-day aggregate GPP data were used to reconstruct continuous daily GPP values. First, a fraction of 1/8 was applied to obtain the 8-day average from the downloaded 8-day sums. Then, each year of data was fitted with the best (low RMSE) of (i) a Gaussian function, (ii) a double logistic function after [25], and (iii) another double logistic function after [26].
(ii) and (iii) were performed using the greenbrown package (version 2.5.0) in R (see function equation details in the respective studies), while (i) was performed using non-linear least squares regression also in R, by fitting the following standard Gaussian function: where k is the peak GPP for the year, t is the day of the year, µ is the day of the year corresponding to the peak of GPP, and σ is the standard deviation.
The method used to extract SOS and EOS from the generated daily data is based on the widely used seasonal midpoint (SM) method (illustration in Figure 3). The traditional SM method derives SOS and EOS as the day of the year corresponding to the midpoint (0.5 * amplitude), i.e., SOS on the ascending part of the curve, and EOS on the descending part. For more stable estimates, several studies used a multi-year median of midpoints, instead of a different midpoint for each year [12,37]. compared to that of vegetation indices, hence leading to better estimates of phenological events, i.e., SOS and EOS. In addition, the proportion of good quality data available for MODIS GPP (non-missing data after filtering out bad quality pixels) was considerably higher than that of vegetation indices, both from MODIS and Landsat 7, making the seasonality easier to retrieve. However, MODIS (250 m spatial resolution, 8-day temporal resolution) and Landsat 7 ETM+ (30 m spatial resolution, 16-day temporal resolution) derived EVI and NDVI, as well as MODIS LAI (500 m resolution, 4-day temporal resolution) were also tested and reported in this study.

Data Analysis and Modelling
The 8-day aggregate GPP data were used to reconstruct continuous daily GPP values. First, a fraction of 1/8 was applied to obtain the 8-day average from the downloaded 8day sums. Then, each year of data was fitted with the best (low RMSE) of (i) a Gaussian function, (ii) a double logistic function after [25], and (iii) another double logistic function after [26]. (ii) and (iii) were performed using the greenbrown package (version 2.5.0) in R (see function equation details in the respective studies), while (i) was performed using non-linear least squares regression also in R, by fitting the following standard Gaussian function: where k is the peak GPP for the year, t is the day of the year, µ is the day of the year corresponding to the peak of GPP, and σ is the standard deviation.
The method used to extract SOS and EOS from the generated daily data is based on the widely used seasonal midpoint (SM) method (illustration in Figure 3). The traditional SM method derives SOS and EOS as the day of the year corresponding to the midpoint (0.5 * amplitude), i.e., SOS on the ascending part of the curve, and EOS on the descending part. For more stable estimates, several studies used a multi-year median of midpoints, instead of a different midpoint for each year [12,37]. In this study, we also used the long-term median threshold. In addition, instead of the traditional midpoint threshold (0.5), we allowed the threshold to vary from 0.05 to In this study, we also used the long-term median threshold. In addition, instead of the traditional midpoint threshold (0.5), we allowed the threshold to vary from 0.05 to 0.95, and to be different for SOS and EOS. This was based on the assumption that in a mixed forest, the species of interest may start contributing significantly to the increase or decrease in GPP at different thresholds compared to a pure forest, depending on the coverage of the species of interest and also the other species present on the site. The best threshold is the one that gives the lowest RMSE of the observed versus modelled SOS or EOS. To test the predictive abilities of the approach, a leave-one-out cross-validation was performed, i.e., estimating each time the best threshold based on all years but one, and then using the holdout year as test data. In that way, all years are used exactly once as validation data when they are not used to train the model. All predictions of holdout year are then put together to calculate RMSE and graphically represent the performance of the model, hence giving a good idea of how well the approach can work in future years.
After the cross-validation, all years were used to determine the best threshold to be applied for future years, for each combination of stations and species. To test the effect of land cover on the best thresholds obtained, fractional land share (% of the pixel) of the species of interest, other broadleaved species, coniferous species, and other land use (including agriculture, grassland, bare land, buildings, etc.) was computed. The best thresholds of 0.45, 0.5 and 0.55 (i.e.,~the traditional threshold) were compared to other best thresholds (<0.45 or >0.55), using the Mann-Whitney test [50,51].

Best Thresholds for the Different Combinations of Stations and Species
The best thresholds varied substantially between the different combinations of stations and species, and also between SOS and EOS ( Figure 4). In general, the best thresholds obtained for SOS are around 0.5, with few combinations of station and species being higher, up to 0.7. For EOS, the best threshold is lower than 0.5 for most combinations of station and species, the minimum being 0.2 for Populus tremula at Zibika, Fagus sylvatica at Cerknica, Javor, Lesce, Betula pendula at Cerknica, and Quercus sp. at Celje and Cerknica. The few exceptions (EOS threshold of 0.5 and 0.6) are Betula pendula at Novi and Celje, Fagus sylvatica at Novi and Bohinjska, and Tilia sp. at Novi and Podzemelj. No plant species stands out consistently as having high or low thresholds of SOS and EOS across all stations. 0.95, and to be different for SOS and EOS. This was based on the assumption that in a mixed forest, the species of interest may start contributing significantly to the increase or decrease in GPP at different thresholds compared to a pure forest, depending on the coverage of the species of interest and also the other species present on the site. The best threshold is the one that gives the lowest RMSE of the observed versus modelled SOS or EOS. To test the predictive abilities of the approach, a leave-one-out cross-validation was performed, i.e., estimating each time the best threshold based on all years but one, and then using the holdout year as test data. In that way, all years are used exactly once as validation data when they are not used to train the model. All predictions of holdout year are then put together to calculate RMSE and graphically represent the performance of the model, hence giving a good idea of how well the approach can work in future years.
After the cross-validation, all years were used to determine the best threshold to be applied for future years, for each combination of stations and species. To test the effect of land cover on the best thresholds obtained, fractional land share (% of the pixel) of the species of interest, other broadleaved species, coniferous species, and other land use (including agriculture, grassland, bare land, buildings, etc.) was computed. The best thresholds of 0.45, 0.5 and 0.55 (i.e., ~the traditional threshold) were compared to other best thresholds (<0.45 or >0.55), using the Mann-Whitney test [50,51].

Best Thresholds for the Different Combinations of Stations and Species
The best thresholds varied substantially between the different combinations of stations and species, and also between SOS and EOS ( Figure 4). In general, the best thresholds obtained for SOS are around 0.5, with few combinations of station and species being higher, up to 0.7. For EOS, the best threshold is lower than 0.5 for most combinations of station and species, the minimum being 0.2 for Populus tremula at Zibika, Fagus sylvatica at Cerknica, Javor, Lesce, Betula pendula at Cerknica, and Quercus sp. at Celje and Cerknica. The few exceptions (EOS threshold of 0.5 and 0.6) are Betula pendula at Novi and Celje, Fagus sylvatica at Novi and Bohinjska, and Tilia sp. at Novi and Podzemelj. No plant species stands out consistently as having high or low thresholds of SOS and EOS across all stations.  The Mann-Whitney tests comparing the share of different land uses between two groups of best thresholds (close to 0.5 or different from 0.5) are presented in Figure 5. The results suggest that for SOS, the land share of the species of interest, coniferous species, other broadleaved species, and other land use considered separately does not determine if the best threshold would be different from 0.5. For EOS, however, there is a significant difference of land share of other land uses (agriculture, grassland, bare land, buildings, etc.) for the two groups of best thresholds. It appears that combinations of stations and species The Mann-Whitney tests comparing the share of different land uses between two groups of best thresholds (close to 0.5 or different from 0.5) are presented in Figure 5. The results suggest that for SOS, the land share of the species of interest, coniferous species, other broadleaved species, and other land use considered separately does not determine if the best threshold would be different from 0.5. For EOS, however, there is a significant difference of land share of other land uses (agriculture, grassland, bare land, buildings, etc.) for the two groups of best thresholds. It appears that combinations of stations and species with higher land shares of other land uses are more likely to result in the best thresholds different from 0.5 (i.e., ≤0.45 and ≥0.55). and for EOS (B). "ns" and "*" mean non-significant (p-value ≥ 0.05) and significant (p-value < 0.05) difference, respectively.

SOS and EOS Estimates from GPP
The absolute errors in the modelled SOS and EOS are relatively well spread around the zero-line for all tree species ( Figure 6). Errors located above the zero-line represent overestimated SOS or EOS, and errors located below the zero-line represent underestimated SOS or EOS. Overall, the absolute errors are larger for EOS compared to SOS, as denoted by the generally longer boxes. The larger absolute errors of EOS are particularly noticeable at some stations and for specific years, with the highest error at Starše in 2012 for Tilia sp. (error of −48 days).  55) at the substitute forests for SOS (A) and for EOS (B). "ns" and "*" mean non-significant (p-value ≥ 0.05) and significant (p-value < 0.05) difference, respectively.

SOS and EOS Estimates from GPP
The absolute errors in the modelled SOS and EOS are relatively well spread around the zero-line for all tree species ( Figure 6). Errors located above the zero-line represent overestimated SOS or EOS, and errors located below the zero-line represent underestimated SOS or EOS. Overall, the absolute errors are larger for EOS compared to SOS, as denoted by the generally longer boxes. The larger absolute errors of EOS are particularly noticeable at some stations and for specific years, with the highest error at Starše in 2012 for Tilia sp. (error of −48 days).
Scatter plots of modelled vs. observed SOS (Figure 7) or EOS (Figure 8) based on the holdout years (left out during training) show relatively low RMSEs in general, considering the original temporal resolution of the remote sensing data (8 days). For SOS, the lowest RMSE of 7.71 days was recorded for Tilia sp. A comparable RMSE of 7.89 days was obtained for Fagus sylvatica. The highest average RMSE recorded is 14 days for Populus tremula. The RMSEs of other species Betula pendula, Quercus sp. and Fraxinus excelsior were 8.98, 9.07 and 11.1 days, respectively.
The RMSE recorded for EOS is generally greater than 10 days, with the exception of Fagus sylvatica and Populus tremula (Figure 8). While the errors for EOS are generally higher than for SOS, this was not the case for Populus tremula (9.07 days for EOS vs. 14 days for SOS).   The RMSE recorded for EOS is generally greater than 10 days, with the exception of Fagus sylvatica and Populus tremula (Figure 8). While the errors for EOS are generally higher than for SOS, this was not the case for Populus tremula (9.07 days for EOS vs. 14 days for SOS).   The RMSE recorded for EOS is generally greater than 10 days, with the exception of Fagus sylvatica and Populus tremula (Figure 8). While the errors for EOS are generally higher than for SOS, this was not the case for Populus tremula (9.07 days for EOS vs. 14 days for SOS).

SOS and EOS Estimates from Other Vegetation Indices
The RMSEs obtained per species for SOS and EOS are reported in Table 1. This includes RMSEs for SOS and EOS derived from MODIS GPP (Figures 7 and 8), but also from MODIS LAI, EVI and NDVI as well as Landsat 7 EVI and NDVI (Figures S1-S15). The results show large differences across the different vegetation indices. While lower RMSE values were obtained for EVI compared to NDVI (both MODIS and Landsat 7), these errors as well as those obtained with MODIS LAI were much higher than those obtained from MODIS GPP data.

Mixed Tree Species Forests Peculiarities
Modelling phenology from remote sensing requires reliable, dense, and frequent ground observation, which is a challenge in the case of mixed forests. In fact, observations are made on individual tree species, often selected not considering site characteristics such as the aspect and slope of the terrain, or the microclimate conditions of trees in near forests. Remote sensing pixels, however, blend together several trees and site conditions, making it difficult to relate field observations to remote sensing data [52]. A mixture of coniferous and broadleaved species is a well-studied case of the influence of mixture on phenology metrics. In fact, at the end of season in autumn, pixels with significant cover of coniferous tree species would probably lead to a slow decrease in vegetation indices as compared to pixels with pure deciduous broadleaved species, which makes the estimation of the date of EOS more uncertain, as it was the case in our study. On the other hand, a mixture of different deciduous species in the same pixel leads to more difficulties in estimating the date of SOS for individual species, due to the different timing of bud-burst for the different species [53].
The use of remote sensing to study phenology at species level requires a continuous and large mono-species area [54,55], which needs to be even larger for coarse-resolution remote sensing data such as the MODIS GPP product. This is usually resolved by finding larger pure stands in similar conditions as the observed tree [49]. This could constitute a substantial source of error in phenology modelling from remote sensing data, since the slight differences (altitude, slope, aspect) in the substitute forest stand could result in substantial discrepancies in the modelled phenology. In fact, the bioclimatic law suggests four-day offsets in phenology events for a difference of 120 m in altitudes [56], which means that the 200 m margin considered in the search of substitute stands could be an important source of error in our study. In addition, the relatively low minimum area of pure forest (1 ha) adopted in our study due to the scarcity of large pure stands of some tree species, meant low fractional cover of the species of interest in the remote sensing pixel, which is an important source of uncertainty in the model.
White et al. [53] predicted SOS with a mean absolute error of 11 days in mixed forests, 9 days in hardwood dominated forests, and 7 days in a nearly pure sugar maple forest using Landsat TM imagery. This is consistent with our study that showed a low SOS RMSE (7.89 days) for Fagus sylvatica, which represents the prevalent deciduous forest across Slovenia. The lower SOS RMSE observed for Tilia sp. (7.71 days) is probably due to the fact that Tilia species is a secondary tree species which rarely forms pure stands in Slovenia, and the few significant stands are mostly located in areas less suitable for other tree species. This, however, does not explain the high error obtained for EOS for the same tree species (16.7 days).

Challenges of the Seasonal Midpoint Method
Slovenian forests are generally situated in hilly/mountainous landscapes with high cloud cover. This means reduced cloud-free data coverage of different vegetation indices, which leads to difficulties in observing the seasonality in vegetation indices, and thus poor estimates of phenology metrics [53]. Since the SM method is based on the yearly maximum and minimum vegetation indices values as well as the trend over the year, it is very important to reconstruct the seasonal trend of the vegetation growth as reliably as possible. The smoothing algorithm or curve fitting method chosen for reconstructing continuous daily remote sensing data would greatly affect the seasonal pattern, and hence influence the results of the SM method [11,36] even more when large gaps exist in the time series. In this study, the choice of the double logistic functions is supported by the fact that it proved to describe the vegetation indices well over the growing season, by handling outliers effectively [26]. In the few years where the double logistic function failed to capture the seasonality due to a lot of missing data, for instance, the Gaussian function performed relatively well. Therefore, instead of using a single method as in most previous studies, we used the best fit between a Gaussian function and two double logistic functions for each year, in order to ensure the best fit and guarantee the lowest errors in modelled SOS and EOS.
The popularity of the SM method is due to its simplicity. Despite its good performance in modelling plant phenology, it is sensitive to several factors that could affect the seasonality reflected by satellite data. Based on the midpoint of the amplitude of one year of remote sensing vegetation indices or GPP, the SM method can be dynamically applied to each year separately, in which case it is very sensitive to partial snow and/or cloud cover, droughts, etc. of that year, introducing large variations in the results of the method. This is the reason for adopting a constant threshold over multiple years [11,57]. On the other hand, fixed threshold approaches have limitations such as a poor estimation of the phenology of some particular years, and the need to use a given threshold only for a number of years since long time series can show a positive trend in vegetation indices, leading to early SOS estimates in future [57]. Threshold methods often use a constant and the same threshold for defining SOS and EOS. The most common threshold used is 50% of the amplitude of the considered vegetation index over the season. However, Huang et al. [58] found that there is no optimal threshold that works for all plant species. In fact, the threshold for SOS (and EOS) depends on the plant species, the plant community, and the background reflectance properties at a particular site [34,59]. These support the choice of a tuned threshold for each combination of stations and species in our study, therefore offering the best possible estimates of the SM method across different situations.

Reasons for the Difficulty in Estimating Phenology from Remote Sensing
One possible source of error in phenology modelling originates from errors in observations, especially in empirical approaches as adopted in this study, where the model is developed based on observations. Visual phenological observations demand a person to observe and record the phenological phases in the field, and this approach is not free of subjective judgments and biases of an individual observer when visually assessing a canopy. Differences of up to 7 days have been found among different phenological observers. Differences are most visible when an observer is replaced at the same location [60].
The temporal resolution of vegetation indices is very important for phenology modelling, even more important than spatial resolution, especially in pure or nearly pure forests [36,61,62]. In the mixed forests of this study, this was also proven by the poor performance of Landsat-derived vegetation indices, which had a low temporal resolution of cloud-and snow-free data, despite their higher spatial resolution. The good performance of MODIS GPP is likely due to their good temporal resolution, despite a very coarse spatial resolution. However, it is worth noting that it is also the combination of both temporal and spatial resolutions, as well as the adequacy of the chosen remote sensing data that determine the performance of remote sensing-based phenology models. In fact, despite the higher temporal resolution of MODIS LAI data compared to MODIS GPP, the noise in LAI data caused a worse performance than GPP.
Events characterizing the EOS (leaf colouring and leaf fall) have been studied less than spring events of SOS (or leaf unfolding), although various authors have reported that the timing of leaf senescence shows less year-to-year variability and is concomitant with less favourable conditions for photosynthesis [63]. However, some factors influence leaf colouring, which could lead to difficulties in predicting the timing of the onset of leaf colouring for some particular years and species. In fact, Vilhar et al. [64] reported a sensitivity of the general leaf colouring of pedunculated oak to summer and autumn temperature, with higher temperatures (June-October) inducing later leaf colouring, while the leaf colouring of beech was more sensitive to spring temperature (March-May).
Autumn plant events are generally more difficult to define and are influenced by sudden individual weather events such as high winds or a single frost [65]. This makes the modelling of EOS particularly more difficult than SOS. For instance, errors of modelled EOS are usually ≥10 days, whereas usual errors in modelling SOS are around 7 days in physiological phenology models [63,66], which relates very well to the performance of the remote-sensing-based model of our study, with larger errors for EOS. The phenological phase 'leaf colouring' (or EOS) is more problematic to recognise than the first few green leaves in spring (SOS), because the phase definition refers to 50% of all leaves coloured, including leaves both on branches and on the ground. The variation among individual trees at comparable or identical sites is much higher than in the case of SOS [10].
In the present study, no significant effect of the fractional land share of the different groups of land use (species of interest, coniferous species, other broadleaved species, other land uses) on the best threshold of SOS was found. For EOS, however, higher values of other land uses seem to have a significant effect on the best thresholds, which fluctuate more from the usual default threshold of the SM method (0.5) with high land shares of other land uses (agriculture, grassland, bare land, buildings, etc.). In fact, agriculture and grassland mixed with a species of interest in the same remote sensing pixel would lead to more complex patterns in GPP, with a threshold not being able to capture the interannual variations of the EOS. This is supported by the findings of Klosterman et al. [67], that phenology, and in particular EOS, is more difficult to estimate using remote sensing in heterogeneous ecosystems with a low fractional cover of forests.

Conclusions
In this study, the SM method was applied to MODIS GPP data in order to estimate phenological events of start of season (SOS) and end of season (EOS) for six tree species in mixed forests in Slovenia for several years between 2004 and 2019. The results reinforce the importance of temporal resolution over spatial resolution in modelling tree phenology, due to the importance of seasonality, particularly for the SM method. It also appeared clearly that the usual threshold of 0.5 is not best in most cases, especially for estimating the EOS. Finally, this study showed that despite the difficulty in modelling the phenology of different tree species in a mixed forest using remote sensing, it remains possible with moderate errors. It was possible to estimate SOS and EOS with errors as low as <8 days (Fagus sylvatica and Tilia sp.) and <10 days (Fagus sylvatica and Populus tremula), respectively.