Validating MODIS and Sentinel-2 NDVI Products at a Temperate Deciduous Forest Site Using Two Independent Ground-Based Sensors

Quantifying the accuracy of remote sensing products is a timely endeavor given the rapid increase in Earth observation missions. A validation site for Sentinel-2 products was hence established in central Germany. Automatic multispectral and hyperspectral sensor systems were installed in parallel with an existing eddy covariance flux tower, providing spectral information of the vegetation present at high temporal resolution. Normalized Difference Vegetation Index (NDVI) values from ground-based hyperspectral and multispectral sensors were compared with NDVI products derived from Sentinel-2A and Moderate-resolution Imaging Spectroradiometer (MODIS). The influence of different spatial and temporal resolutions was assessed. High correlations and similar phenological patterns between in situ and satellite-based NDVI time series demonstrated the reliability of satellite-based phenological metrics. Sentinel-2-derived metrics showed better agreement with in situ measurements than MODIS-derived metrics. Dynamic filtering with the best index slope extraction algorithm was nevertheless beneficial for Sentinel-2 NDVI time series despite the availability of quality information from the atmospheric correction procedure.


Introduction
Quantifying the accuracy of satellite-based remote sensing products is an important task that is increasingly timely given the rising number of Earth observation missions. Different measurement strategies are pursued at established validation sites, including continuous in situ (spectral) measurements, airborne remote sensing campaigns at the landscape level, as well as the collection of independent satellite sensor data [1,2]. Linking small-scale observations with large-scale satellite products is crucial to understand the relationships between optical information and ecological, as well as physiological processes [3]. This thus facilitates prediction and mapping of processes at regional and global scales, such as terrestrial carbon assimilation [4]. Automatic continuous spectral measurements are bridging the gap between continuous micro-meteorological measurements and remote-sensing products by providing optical information about vegetation processes with high temporal resolution [5,6]. They are thus helpful to improve the understanding of satellite measurements.
In order to compute a reflectance factor, later used for the calculation of NDVI, it is necessary to measure the upwelling and downwelling radiation fluxes. This can be achieved with different setups [6]: systems using one sensor have to measure a reference in sequence to each measurement of the target surface, e.g., a white reference panel [27] or downwelling irradiance through the rotation of the fore optics of the sensor [28]. This approach introduces considerable time delays between downwelling irradiance and upwelling radiance measurements and therefore increases measurement uncertainties, particularly under unstable illumination conditions [27]. The moving parts are an additional potential error source in a long-term outdoor setup. Furthermore, the approach is restricted to a single fore optic for both target and reference measurements, which is suitable only for measurements of bi-hemispherical reflectance factors in practice.
Measuring downwelling irradiance and upwelling radiance quasi-simultaneously with a dual-field-of-view (DFOV) setup and different FOV fore optics [19,24,25] removes the restriction of observing bi-hemispherical reflectance factors. Furthermore, simultaneous acquisition for both FOVs is possible in the case of the use of two sensors, while the time delay depends on sensor characteristics and instrument settings such as integration time if only one sensor is used [26]. However, challenges due to spectral shifts caused by bifurcated fibers (one sensor) and cross-calibration of wavelength scale (two sensors) may arise for DFOV hyperspectral measurements. We used the DFOV approach with two sensors for multispectral measurements and with a single sensor and a bifurcated fiber for hyperspectral measurements. Hyperspectral-and multispectral-based NDVI values (spatial extent on the order of 10 m) were compared with NDVI products derived from MODIS Aqua and Terra (250-m ground resolution), as well as Sentinel-2 (10-and 20-m ground resolution) over the course of two vegetation periods. We evaluated whether dynamic filtering procedures, commonly used with NOAA AVHRR and MODIS, are now obsolete with new generation satellite data. Finally, the influence of varying spatial resolutions of the different datasets on extracted phenological metrics was also assessed.

Study Site and Sensor Setup
We collected data at the forest site 'Hohes Holz' (52.08 • N 11.22 • E), which is situated in central Germany near the Harz Mountains in a temperate climate [29]. The site is part of the interdisciplinary and long-term research programs Tereno (Terrestrial Environmental Observatories [30]) and ICOS (Integrated Carbon Observation System [31]). The ecosystem of 'Hohes Holz' is a deciduous forest with a size of around 15 km 2 . The vegetation within one MODIS pixel around the tower consisted of 50% oak, 45% beech, 2.5% birch and 2.5% clearings, whereas the area within 30 m around the tower had a higher amount of clearings (10%) and birch (4%) and less oak (44%) and beech (42%). The eddy flux tower has a height of 50 m, which is about 20 m above the canopy. We installed one hyperspectral sensor and two multispectral sensors at the site: a QE65000 (Ocean Optics, Dunedin, FL, USA) and an SKR1850 4-channel sensors (Skye Instruments Ltd., Llandrindod Wells, Powys, U.K.).
Data logging of the multispectral sensors is performed with a Campbell Scientific CR1000 data logger with a sampling frequency of 0.1 Hz. The upward facing multispectral sensor is set up with a hemispherical cosine diffuser. The downward facing sensor is used for narrow-angle measurement (25 • FOV) of vegetation with a south-orientated (view azimuth of 180 • ) off-nadir view angle of 22.5 • . Central wavelengths and the full-width at half-maximum (FWHM) were chosen according to Sentinel-2 Channels 4-6 and 8a (see Table 1). The hyperspectral sensor uses scientific-grade back-thinned detectors (S7031-1006, Hamamatsu Photonics, Hamamatsu City, Shizuoka Pref., Japan) and was set up with a spectrometer slit width of 10 µm, allowing an FWHM of 1.58 nm and a spectrometer grating allowing for observations in the range of 398 nm-1174 nm with 1030 channels (QE65000, Ocean Optics, Dunedin, FL, USA). A 400-µm bifurcated fiber optic cable (QBIF400-VI, Ocean Optics, Dunedin, FL, USA) is used as the dual input channel, where each of the two cable ends is connected with a shutter (INLINE-TTL, Ocean Optics, Dunedin, FL, USA). Each shutter is connected with a 400-µm fiber optic cable (QP400-15-V, Ocean Optics, Dunedin, FL, USA). A cosine corrector (CC-3-UV-S, Ocean Optics, Dunedin, FL, USA) is attached to the upward facing cable end for irradiance measurements. The downward facing cable end is used without fore optics with a field of view of 25 • to measure upwelling radiances. The optics are mounted 20 m above the canopy with a south-orientated (view azimuth of 180 • ) off-nadir view angle of 22.5 • (see Figure 1b) on a boom of the eddy flux tower. This results in an observed canopy area of around 72 m 2 . The sensor and shutters are controlled via LabView (National Instruments, Austin, TX, USA). Shutters open and close alternately to select the signal from upward-and downward-facing fibers to measure downwelling irradiance and upwelling radiance sequentially. Integration time was set to two seconds between April and September and four seconds between October and March. Dark current is determined with closed shutters before and after each measurement. This procedure is repeated three times every ten minutes and takes 24 seconds (three times eight seconds) between April and September and 48 seconds (three times 16 seconds) between October and March. The whole setup (see Figure 2) is situated in a weather-proof box (see Figure 1a) with a temperature-driven ventilation system. USB devices are connected to a USB-to-Ethernet server to enable remote control. The data logger is connected with Ethernet, and data are transferred automatically once a day onto a network storage system.

Generation of NDVI Products from Ground-Based Spectral Measurements
Hyperspectral solar irradiance data were cross-calibrated once by using measurements of solar irradiance with an ASD FieldSpec 4 spectrometer (Analytical Spectral Devices Inc., Boulder, CO, USA) and a Spectralon panel. Calibration performance was assessed by resampling the QE65000 spectrum to the lower spectral resolution of ASD FieldSpec. High accuracy and precision were achieved when comparing the corresponding two spectra of solar irradiance (r 2 = 0.99, relative RMSE = 0.03 in the spectral range 440-1000 nm). Hyperspectral upwelling radiance observations were calibrated by using the downwelling irradiance calibration and a Spectralon panel. The stability of wavelength positions was irregularly checked with a mercury argon calibration light source (HG-1, Ocean Optics, Dunedin, FL, USA). Multispectral data were radiometrically corrected by using the channel sensitivities given by the manufacturer. Manufacturer calibration was repeated every two years (2014 and 2016). Long-term stability was assessed by comparing against shortwave pyranometer measurements (CNR4 net radiometer, Kipp&Zonen B.V., Delft, The Netherlands) from the eddy flux tower. We verified CNR4 calibration stability by comparing against other sensors of the same type at sites located not far from the forest site. Channel 1 of the upward-looking multispectral sensor did not give correct observations. Its values were hence replaced with appropriately-scaled values of Channel 2. The validity of this approach was assessed both with hyperspectral observations at the same site and comparable multispectral observations at a different site [23].
Hyperspectral data were normalized by integration time, and the electronic dark current was subtracted before correction. Hyperspectral observations were transformed into multispectral signals by multiplying them with the corresponding spectral response curves of the multispectral sensor and integrating over the wavelengths, for both downwelling irradiance and upwelling radiance before calculating reflectance factors. Measurements are taken three times (hyperspectral sensor) every ten minutes and 60 times within ten minutes (multispectral sensors); we refer to this further as the measurement-cluster. For each cluster, the absolute sum of irradiance change per channel (Irr i,change ) and the mean irradiance per channel (Irr i ) are calculated. Measurement clusters are rejected if Irr i,change exceeds 0.02 × Irr i (hyperspectral data) or Irr i (multispectral data). Hence, measurements are selected that have been acquired under steady conditions, i.e., mainly clear sky conditions. Subsequently, only measurements between 11:00 a.m. and 01:00 p.m. UTC are considered. Non-vegetational effects usually decrease NDVI values, but highly diffuse illumination conditions illuminating shaded areas and therefore producing directional effects, for example, might also increase NDVI values. Such effects are rather difficult to eliminate. Here, we used sunshine pyranometer data (SPN1, Delta-T Devices Ltd., Cambridge, U.K.) to exclude measurements with a direct to diffuse solar radiation ratio (further referred to as DDSRR) lower than the empirically determined value of 3, to exclude the effects of varying illumination conditions that can affect NDVI [32]. Only upwelling radiance measurements made simultaneously to selected downwelling irradiances were used in order to compute reflectance factors: Refl i , i = 1,. . .,4 (see Equation (1)), where π stands for corrections of the hemispheric angle [33].
Mean reflectance factors of each measurement cluster serve the generation of NDVI data with RED = Refl 1 and NIR = Refl 4 according to Equation (2). The resulting datasets are further referred to as NDVI multi and NDVI hyper and contain 1525 and 1447 data points for 'Hohes Holz', respectively.

Satellite Data and Respective NDVI Products
We used spectral data from MODIS datasets MYD09GQ [34] and MOD09GQ [35] with a spatial resolution of 250 m, as well as Sentinel-2A Level 1C (L1C) data [36] with spatial resolutions of 10 and 20 m. Sentinel-2A L1C data were atmospherically-, terrain-and cirrus-corrected with Sen2Cor (Version 2.3.0) [37] and ATCOR 2/3 (Atmospheric & Topographic Correction for Small FOV Satellite Images, RESE R , Version 9.1) [38] in order to generate Level 2A (L2A) data. Both processing tools generate scene classifications to distinguish between clear pixels and clouds, shadows and saturated pixels. We continued the study with ATCOR 2/3 (and without Sen2Cor) due to the higher number of successfully-processed scenes with clear sky conditions. ATCOR 2/3 could process 47 out of 78 scenes without error messages. The remaining scenes could not be processed due to the solar zenith during acquisition exceeding 70 • , software problems or less than one percent of clear pixels. ATCOR 2/3 detected 22 scenes as 'clear' around the site 'Hohes Holz', while eleven of them showed clouds or shadows after manual examination.

NDVI Post-Processing and Phenological Metrics Extraction
Methods for NDVI post-processing and extraction of phenological metrics were implemented in an R-package [39] for phenological data analysis called 'phenex' [40]. This includes the Best Index Slope Extraction (BISE) [21] for preprocessing and methods to model phenological time series.
BISE is a dynamic filter algorithm for time series. Starting at the first date of the time series (i.e., DOY = 1 for each year), BISE accepts the next data point if it has a higher value than the previous one. Lower values are accepted if there is no point in a pre-defined time period ('sliding period') with a higher value. Higher values are rejected if they exceed a pre-defined threshold thres inc , i.e., the allowed increase of NDVI per day. NDVI fluctuations from natural surfaces greater than 0.1 per day are usually attributed to data errors.
We applied BISE on NDVI aqua and NDVI terra with a sliding period of 30 days and an allowed NDVI increase of 10% per day (thres inc = 0.1) in order to remove outliers and eliminate non-vegetational effects, e.g., cloud cover. The resulting datasets were stored as NDVI bise aqua and NDVI bise terra . The next step is the reconstruction of daily NDVI values based on BISE-selected NDVI observations (NDVI multi , NDVI hyper , NDVI bise aqua , NDVI bise terra and NDVI atcor s2a,10 ). Here, we used linear interpolation; however, other approaches such as fitting a Gaussian function are also commonly used [8]. Phenological metrics extraction is subsequently conducted on the reconstructed NDVI profile and includes green-up and senescence dates, as well as maximum and minimum NDVI and their date of occurrences. Green-up and senescence dates are defined as dates when NDVI values reach a threshold of 55% between the minimum and maximum NDVI values [40,41]. The threshold was determined empirically by comparing satellite-derived green-up dates from deciduous broadleaf forests all over Germany to respective ground observations of the German Weather Service (Deutscher Wetterdienst, DWD) acquired from the Plant-Phenological Online Database (PPODB, www.phenology.de [42]). This database provides around 1500 observations per species, phenological phase and year. The standard deviation of phenological metrics was computed by varying the threshold (normally distributed) according to the satellite product error along the modeled NDVI curve.

The NDVI Products at Different Scales
NDVI products across all systems showed very similar patterns at the forest site (Figure 4). The temporal NDVI evolution describes precisely the intra-annual patterns of a (changing) canopy structure in deciduous broadleaf forests in temperate climates. Species phenology is mainly driven by temperature (and by day length to a lesser degree) [7] and can be discriminated into two phases: a phase of active/green vegetation (leafy season) from spring to fall and a phase of vegetation in-activity (dormancy) during late autumn and winter. Two distinct phenological events precede these phases: the bud burst event, tightly coupled with leaf development and maturation, as well as the onset of leaf coloring and leaf fall in autumn. At the start of the season around April, NDVI is increasing until it reaches a maximum in June. The NDVI slightly decreases during the vegetation period and rapidly decreases in fall around mid-September due to senescence. It reaches its minimum value in December. The signal-to-noise-ratio decreases in winter due to snow cover, low irradiance and minor vegetation cover. Sentinel-2A images were manually examined in order to identify misclassified values and excluded if corresponding shortwave pyranometer measurements showed DDSRR < 4. Furthermore, tests demonstrated the applicability of BISE for the removal of these data points. NDVI observations falsely labeled as 'clear' by the processing software (and hence exhibiting lower values compared to a true clear pixel within a vegetation period) could be removed by the dynamic filtering procedure.

NDVI time series
Green-up in 2015 and the peak of the vegetation period could be clearly identified by both MODIS NDVI and ground-based datasets. NDVI multi , NDVI hyper , and MODIS NDVI datasets showed the anticipated decrease during summer, where leaf color usually changes from light to dark green. Sentinel-2A data of 2015 were in line with NDVI multi . Fall of 2015 was characterized by unstable illumination conditions (clouds, rain events, snow; see Figure 5a), which led to large gaps in satellite and hyperspectral data. In winter 2015/2016, NDVI bise terra showed slightly higher values than NDVI multi and NDVI hyper . NDVI curves exhibited a strong NDVI decrease in December 2015 and January 2016. This period was characterized by temperatures below 0 • C and precipitation events, with snow/ice cover as the main factor of decreased NDVI values (Figure 5b-d, confirmed by respective temperature and precipitation measurements at the tower site, which are not shown here). Data from this period were excluded from further analysis. The initial NDVI increase in 2016 (March) was captured by all systems almost simultaneously. NDVI multi , NDVI hyper and NDVI atcor s2a,10 started with lower NDVI values than NDVI bise aqua and NDVI bise terra . NDVI evolution over the course of leaf unfolding/leaf expansion between April and May was tracked similarly by all employed sensors. Peak NDVI was reached in June, with NDVI bise aqua showing the highest and NDVI multi showing the lowest values. The anticipated slight decrease over the summer months was detected by all sensor systems.
Temporal mismatches between sensor systems reduced the amount of available data points for inter-sensor comparisons (Figure 4 versus Figures 6 and 7). NDVI products from ground spectral measurements correlated well (r 2 = 0.998 and RMSE = 0.01; Figure 6) and showed very similar patterns (Figure 4). Correlations of both ground sensors with satellite NDVI products are similar ( Table 2) and strong (0.72 ≤ r 2 ≤ 0.97), especially with MODIS Aqua and Sentinel-2A NDVI (Figure 7). The magnitude of NDVI multi and NDVI hyper was comparable with satellite NDVI data points. There were some minor deviations during the vegetation period and larger deviations in fall 2015 and winter 2015/2016 (Figure 4). NDVI multi and NDVI hyper mainly agreed with NDVI bise aqua , showing a steady NDVI decrease towards an NDVI of around 0.56, while NDVI bise terra showed values greater than 0.7 after November 2015. They are also apparent when looking at the correlations between ground and satellite observations (Figure 7).    . Scatter-plot of NDVI derived from the hyperspectral sensor system (NDVI hyper ) and satellite NDVI products (NDVI satellite). Satellite NDVI products include NDVI bise aqua (blue), NDVI bise terra (red), as well as NDVI atcor s2a,10 (green). NDVI multi and NDVI atcor s2a,20 are not shown due to large similarities with NDVI hyper and NDVI atcor s2a,10 , respectively.

Analysis of Phenological Metrics
Phenological metrics calculated from NDVI time series differed between the sensor systems (Table 3)

Discussion
We validated different satellite phenology products by comparing them to multi-sensor ground measurements. The approach relates to Land Product Validation Stage 1 of CEOS (Committee on Earth Observation Satellites) [2]. This required the establishment of a validation site equipped with unattended multispectral and hyperspectral sensor systems for continuous vegetation monitoring. The DFOV systems enable reflectance factor acquisition with (near-)simultaneous measurements of downwelling irradiance and upwelling radiances at high frequency. The approach chosen in this study with a single-sensor DFOV hyperspectral system reduces the costs and effort compared to a system with two spectrometers since an additional permanently-installed spectrometer is not needed [6]. Considerable instability in the scaling of the hyperspectral reflectance factor was observed when comparing time series of hyperspectral with multispectral observations. However, the scaling factor was observed to be independent of wavelength and thus does not affect NDVI, as wavelength independent scaling factors cancel out. For other applications requiring correct scaling of the hyperspectral reflectance factor, a further correction of the calibration instabilities is required. An approach to achieve this will be described in a separate publication. It has also been shown that multispectral data accuracy can potentially be improved by conducting more frequent in situ calibration/validation measurements [44].
The high correlation of NDVI time series between sensors suggests that the multispectral sensor system is sufficient for phenological pattern analysis. The additional use of a hyperspectral system provides, however, information beneficial for the validation of other remote sensing products [6] such as chlorophyll content and leaf area index [45], as well as the possibility to model, for example, plant productivity [3,46]. Multispectral information (of different central wavelengths and FWHM) can further be generated from hyperspectral signals so that the latter consequently enables the validation of different optical satellite missions. Upcoming hyperspectral satellite missions, e.g., EnMap [47], and airborne hyperspectral campaigns will benefit largely from validation sites equipped with spectrometer systems.
Differences in NDVI products are either introduced from sensor specifications (e.g., band configuration), angular effects, measurement scales (and therefore, different observed vegetation patches), calibration accuracy of satellite sensors and atmospheric correction, including cloud-detection [48]. While sensor specifications are mainly comparable, scales and observed vegetation patches differed significantly. Due to the MODIS coarse resolution of 250 m, analysis is limited to only one pixel, whereas we could statistically analyze a cluster (within 30 m around the eddy flux tower) of Sentinel-2A pixels with 10-m resolution for this study. Hence, the observed area had a size of around 62,500 m 2 (MODIS) and around 3600 m 2 (Sentinel-2A), respectively, compared to the observed ground area of around 72 m 2 . We observed varying performance of Sentinel-2 processors especially in the presence of small cloud patches, cloud shadows and haze. These problems may be enhanced for MODIS pixels due to their coarser resolution and subsequently more difficulties in the detection of small-scale cloud cover. The angular configuration is different for the sensor systems: MODIS Terra and Aqua overflights differ in solar zenith angle (around five to eight degrees) due to their overpass times around 10:30 a.m. and 01:30 p.m., respectively. NDVI hyper and NDVI multi exhibit different illumination angles, introduced by the selection of the time period 11:00 a.m.-01:00 p.m. and DDSRR-filtering. Sentinel-2 observed our region of interest around 10:30 a.m. with a solar zenith angle similar to MODIS Terra. Hence, angular differences can be introduced by both temporal mismatches and angular configuration, which are difficult to differentiate. We mainly observed small NDVI differences within one day between sensor systems, although MODIS Terra showed larger differences to the other sensors during winter 2015/2016. We found considerably higher NDVI values in diurnal multispectral NDVI profiles before 10:30 a.m. only during winter time, assuming potential effects of illumination geometries on the signal. Consequently, MODIS Terra and potentially Sentinel-2 are more sensitive to this effect than MODIS Aqua due to their overpass time before noon. We could not substantiate this effect for Sentinel-2 data, since no observations with clear sky conditions were available for this period. Temporal mismatches between different systems increase using filter algorithms due to the selection of the 'best' or 'true' NDVI values. Remaining NDVI values after dynamic filtering are used as input for models restoring the phenological profile. Hence, the detection of phenological local extrema, e.g., date and magnitude of the peak of the growing season, is critical to robustly depict phenological metrics. Despite the above described effects, we could not detect a considerable influence of temporal mismatches within one day on phenological metrics extraction. This might be different for other vegetation types or climates with modified phenological timing. Temporal mismatches of several days can in extreme cases cause a derivation of potentially wrong phenological phases, e.g., Sentinel-2A observing clouds during green-up, while MODIS with higher repetition rate observes clear sky conditions in between.
In situ and satellite-based NDVI time series captured similar phenological patterns despite the numerous influencing factors. Correlations between the sensor systems were strong, although some deviations occurred during fall and winter between NDVI of ground sensor systems and MODIS Aqua and Terra. RMSEs between satellite and in situ data exhibited the same magnitude as found in another recent study [49] and as the MODIS NDVI uncertainty (0.02 + 0.02 × NDVI [48]).
Depicted phenological metrics from both in situ datasets were consistent with each other and agreed with Sentinel-2A-derived metrics in 2016. Standard deviations and even the differences of green-up and senescence between the sensor systems were consistent with the natural variance of phenology. MODIS NDVI values during spring and fall were lower compared to in situ data, leading to a deviation in depicted phenological phases: green-up occurred later and senescence earlier.
The later GU is contrary to the results found in [4]. Tests with NDVI from hyperspectral data resampled to MODIS bands demonstrated a small shift towards later GU. A larger shift was introduced when averaging NDVI of all Sentinel-2 pixels within the MODIS pixel, although the area is characterized by a homogeneous deciduous broadleaf forest with small clearings. Since satellite observations integrate phenological signals from different species within the pixel, the observed effect of spatial scale mainly stems from different species composition. The species' signals might also be differently weighted according to the sensors' point spread function. This hypothesis is supported by high satellite NDVI values in summer, later green-up and day of maximum NDVI, as well as earlier senescence with increasing scale.
Sentinel-2A images are currently available as Level 1C products. Atmospheric, cirrus and terrain corrections are necessary for further analysis. The processing tool ATCOR 2/3 produced reasonable Level 2A products. Although ATCOR 2/3 provided scene classification information supporting the elimination of outliers, misclassifications compromised the resulting NDVI time series. Three approaches can be distinguished eliminating faulty records in time series: images can be manually examined, but this is not feasible for long time series or large datasets. The usage of diffuse shortwave pyranometer measurements as selection criteria is also limited to observation sites with respective infrastructure, but these measurements are getting more and more common. Dynamic filter algorithms, such as BISE, reduce the risk of including false observations while being applicable to large datasets without the need for auxiliary site-specific measurements. Here, BISE was able to consistently remove faulty NDVI values. Daily MODIS NDVI values of 250 m lack scene classification information and respective analysis consequently requires the use of filter algorithms. BISE was able to restore the NDVI profile, although we detected mismatches with ground data during fall 2015 and winter 2015/2016. During December 2015 and January 2016, low temperatures, precipitation events and subsequently snow cover and frozen surfaces led to a strong decrease in multispectral and hyperspectral, as well as MODIS Aqua NDVI not related to vegetation. A strong increase in NDVI during the snow-melt period was detected, which is consistent with other studies observing NDVI of snow-covered vegetation [11,19]. The length of this situation exceeded the sliding period of BISE parametrization. Consequently, decreased MODIS Aqua NDVI values were not eliminated. In contrast, MODIS Terra NDVI increased during winter 2015/2016, relating to the previously-mentioned angular effects.
Apart from assessing the accuracy and precision of satellite products, a validation site offers the opportunity to examine data gaps or periods with low signal-to-noise-ratio introduced by bad illumination conditions and cloud or snow cover. Hence, complete ground NDVI time series are beneficial. Future studies may use a combination of DDSRR-filtering and dynamic filtering, subsequently lowering the DDSRR-threshold to obtain more data points under overcast or hazy conditions, increasing the number of usable ground measurements. The comparison of filtered satellite-based time series with in situ data provides the possibility to modify BISE parameters for satellite phenology depiction, if necessary [50].

Conclusions
Here, we validated satellite phenology products for a deciduous broadleaf forest. Phenological metrics were consistent between sensor systems and agreed with the natural phenological variance also with sparse data coverage during autumn and winter time. Methods taking the effects of strongly diffuse illumination into account were already addressed in other studies [51,52]. In our measurements, however, we mostly excluded periods of time with highly diffuse illumination because of unstable measurements. A larger number of measurements under highly diffuse and low illumination conditions could be retained by continuous integration time optimization (low illumination) [27,52] or averaging over a larger number of consequent measurements (diffuse conditions). Further, spectral sensors with technical specifications such as research-grade instruments of the WMO (World Meteorological Organization) class quality are beneficial for dealing with these suboptimal conditions. Validation sites should utilize either multispectral sensors with bands chosen according to satellite sensors' channels or spectrometers simulating respective bands, since the band configuration has an effect on extracted phenological metrics. Improvements in observing the 'true' phenological signal in phenological time series were detected by the use of Sentinel-2A data compared to MODIS. Nevertheless, despite the long service and comparably coarse spatial resolution of MODIS Aqua and Terra, MODIS-based phenological metrics agreed fairly well with ground based products. The high spatial resolution of Sentinel-2 reduces mixed-pixel effects ubiquitous in phenology research and offers new opportunities to understand connections between site measurements and large-scale processes, e.g., in the analysis of eddy covariance footprints or vegetation-dependent processes in heterogeneous landscapes. Temporal mismatches between phenological phases and timing of Sentinel-2 observations, under certain conditions critical for the extraction of phenological metrics, will decrease with the upcoming availability of Sentinel-2B data and hence increased temporal resolution of Sentinel-2. The Sentinel-2 processing tool ATCOR 2/3 allows for atmospheric, terrain and cirrus correction of Sentinel-2 images and provides scene classification information on cloud cover, illumination condition and saturation. Despite these comprehensive ancillary data, dynamic filtering algorithms like BISE are still beneficial to generate reliable phenological time series.