Quantifying the accuracy of satellite-based remote sensing products is an important task that is increasingly timely given the rising number of Earth observation missions. Different measurement strategies are pursued at established validation sites, including continuous in situ (spectral) measurements, airborne remote sensing campaigns at the landscape level, as well as the collection of independent satellite sensor data [1]. Linking small-scale observations with large-scale satellite products is crucial to understand the relationships between optical information and ecological, as well as physiological, processes [3]. This link facilitates prediction and mapping of processes at regional and global scales, such as terrestrial carbon assimilation [4]. Automatic continuous spectral measurements bridge the gap between continuous micro-meteorological measurements and remote-sensing products by providing optical information about vegetation processes with high temporal resolution [5]. They thus help to improve the understanding of satellite measurements.
Phenology plays a central role in the estimation of vegetation-dependent processes [7]. Research on vegetation phenology uses phenological and spectral ground observation networks, phenological modeling, eddy covariance towers and satellite-derived imagery to assess and monitor vegetation status and dynamics [8]. Several studies have addressed the question of which vegetation index to choose for satellite phenology depiction [10]. Although the Normalized Difference Vegetation Index (NDVI) shows saturation effects in dense forest canopies [11], it is one of the most widely used indices for this purpose, mainly due to data availability and its robustness against noise and varying illumination geometries [10]. A large number of studies [13] mapped land surface phenology from satellite NDVI data. These studies outlined two main problems in the derivation of phenology from satellite data. Firstly, the low spatial resolution of satellite data, e.g., from the NOAA Advanced Very High Resolution Radiometer (AVHRR) and the NASA Moderate Resolution Imaging Spectroradiometer (MODIS), results in mixed pixels that integrate phenological signals of different vegetation and land cover types, e.g., deciduous forests and nearby grasslands. New satellite generations, such as the Copernicus Sentinel-2, may overcome this issue with higher spatial resolution. Secondly, non-vegetational effects may alter the received phenological signal. These effects include atmospheric conditions, snow cover [11], soil wetness [8], viewing geometry and illumination conditions [18], as well as the distorted signal under overcast conditions. Noise introduced by these effects can be reduced by using methods like the Maximum Value Composite (MVC) [20] or dynamic filtering, such as the Best Index Slope Extraction (BISE) [21]. While MVC is effective at reducing cloud and viewing condition effects, it can include outliers with higher NDVI. A long composition period might mask short-term vegetation-state changes, while a short period retains noise [21]. BISE excludes outliers, and its parametrization options allow for the removal of sudden decreases in NDVI, which are mainly due to non-vegetational effects. This, however, also excludes sudden decreases in NDVI due to vegetational processes that are followed by rapid regrowth [21]. Several approaches are used to fill gaps and smooth the phenological time series after the removal of noisy measurements: interpolation methods, signal filters or model-fitting algorithms [12]. Subsequently, phenological metrics are extracted using different approaches: global or local thresholds, conceptual-mathematical models [8] or local extrema of transition rates [22]. Here, we used BISE to filter NDVI data, followed by a simple linear interpolation to fill gaps in the time series, and finally, we extracted phenological metrics with a local threshold method.
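The three processing steps named above (dynamic filtering, linear gap filling, local threshold extraction) can be illustrated with a minimal sketch. It is not the implementation used in this study: the forward scan is a simplified BISE-style rule, and the function names, the 30-day sliding period, the 20% regrowth threshold and the 50% amplitude threshold are all assumptions chosen for illustration.

```python
def bise_filter(days, ndvi, sliding_period=30, regrowth_threshold=0.2):
    """Simplified BISE-style forward scan; returns indices of kept observations.

    Increases are always accepted. A decrease is rejected as noise when a later
    value inside the sliding period recovers by more than `regrowth_threshold`
    times the size of the drop; the scan then jumps to that later value.
    """
    kept = [0]
    i, n = 0, len(ndvi)
    while i < n - 1:
        nxt = i + 1
        if ndvi[nxt] < ndvi[i]:
            limit = ndvi[nxt] + regrowth_threshold * (ndvi[i] - ndvi[nxt])
            j = nxt + 1
            while j < n and days[j] - days[i] <= sliding_period:
                if ndvi[j] > limit:
                    nxt = j  # recovery found: skip the suspicious low values
                    break
                j += 1
        i = nxt
        kept.append(i)
    return kept


def interpolate_daily(days, values):
    """Linear interpolation between kept observations at a one-day step."""
    daily = {}
    for k in range(len(days) - 1):
        d0, d1, v0, v1 = days[k], days[k + 1], values[k], values[k + 1]
        for d in range(d0, d1):
            daily[d] = v0 + (v1 - v0) * (d - d0) / (d1 - d0)
    daily[days[-1]] = values[-1]
    return daily


def threshold_metrics(daily, fraction=0.5):
    """Green-up, peak and senescence days from a local (per-series) threshold."""
    ds = sorted(daily)
    vals = [daily[d] for d in ds]
    vmin, vmax = min(vals), max(vals)
    threshold = vmin + fraction * (vmax - vmin)
    peak = ds[vals.index(vmax)]
    green_up = next(d for d in ds if d <= peak and daily[d] >= threshold)
    senescence = next((d for d in ds if d > peak and daily[d] < threshold), None)
    return green_up, peak, senescence


# Hypothetical series (day of year, NDVI) with one cloud-like drop on day 110
days = [90, 100, 110, 120, 130, 200, 280, 290, 300]
ndvi = [0.3, 0.4, 0.1, 0.7, 0.8, 0.9, 0.7, 0.4, 0.3]

kept = bise_filter(days, ndvi)
daily = interpolate_daily([days[k] for k in kept], [ndvi[k] for k in kept])
green_up, peak, senescence = threshold_metrics(daily)
```

In this toy series the low value on day 110 is dropped because the series recovers within the sliding period, whereas the autumn decline is retained because no recovery follows. The same rule is what makes a BISE-type filter blind to genuine short dips followed by rapid regrowth.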
The aim of this study is to validate different satellite phenology products by comparing them to multi-sensor ground measurements. This also requires a detailed description of the site’s instrumentation, as well as information on calibration, but the main focus of this study is not on the latter. Details on calibration will be described in a separate publication [23]. In the following, we briefly introduce measurement approaches commonly applied at spectral ground observation sites.
Compact multispectral sensors such as the SKYE SKR1860 series (Skye Instruments Ltd., Llandrindod Wells, Powys, U.K.) are comparatively low-cost and fairly straightforward to set up and maintain [6]. Typically, no optical fibers are required, and data can be logged with data loggers commonly used at eddy covariance sites. Therefore, this type of instrument is widely used [24]. Hyperspectral systems, in contrast, are usually more expensive and require computers to control the timing of measurements and instrument settings [6]. Furthermore, fiber optics are commonly used to connect the spectrometer to the point of measurement [25], and this introduces additional potential for calibration issues on top of the spectrometer calibration itself [6]. However, hyperspectral systems provide more detailed spectral information. Both sensor types are potentially affected by issues of long-term continuous field measurements [5], such as degradation and sensitivity to temperature and humidity. While other studies focused mainly on either hyperspectral [24] or multispectral sensors [4], we used both types of sensors in order to assess measurement quality issues introduced by environmental conditions and instrument settings [24], which might be relevant for the validation of satellite-derived phenological profiles.
In order to compute a reflectance factor, later used for the calculation of NDVI, it is necessary to measure the upwelling and downwelling radiation fluxes. This can be achieved with different setups [6]: systems using one sensor have to measure a reference in sequence to each measurement of the target surface, e.g., a white reference panel [27] or downwelling irradiance through the rotation of the fore optics of the sensor [28]. This approach introduces considerable time delays between downwelling irradiance and upwelling radiance measurements and therefore increases measurement uncertainties, particularly under unstable illumination conditions [27]. The moving parts are an additional potential error source in a long-term outdoor setup. Furthermore, the approach is restricted to a single fore optic for both target and reference measurements, which is suitable only for measurements of bi-hemispherical reflectance factors in practice.
Measuring downwelling irradiance and upwelling radiance quasi-simultaneously with a dual-field-of-view (DFOV) setup and different FOV fore optics [19] removes the restriction of observing bi-hemispherical reflectance factors. Furthermore, simultaneous acquisition for both FOVs is possible when two sensors are used, whereas with only one sensor, the time delay depends on sensor characteristics and instrument settings such as integration time [26]. However, challenges due to spectral shifts caused by bifurcated fibers (one sensor) and cross-calibration of the wavelength scale (two sensors) may arise for DFOV hyperspectral measurements. We used the DFOV approach with two sensors for multispectral measurements and with a single sensor and a bifurcated fiber for hyperspectral measurements. Hyperspectral- and multispectral-based NDVI values (spatial extent on the order of 10 m) were compared with NDVI products derived from MODIS Aqua and Terra (250-m ground resolution), as well as Sentinel-2 (10- and 20-m ground resolution) over the course of two vegetation periods. We evaluated whether dynamic filtering procedures, commonly used with NOAA AVHRR and MODIS, are now obsolete with new-generation satellite data. Finally, the influence of varying spatial resolutions of the different datasets on extracted phenological metrics was also assessed.
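As a minimal sketch of how NDVI follows from paired DFOV readings, the example below computes a reflectance factor per band and then the index. It assumes calibrated upwelling radiance L (W m⁻² sr⁻¹ nm⁻¹) and downwelling irradiance E (W m⁻² nm⁻¹), for which R = πL/E; the function names and the numerical readings are hypothetical and do not reproduce the processing chain of this study.

```python
import math


def reflectance_factor(upwelling_radiance, downwelling_irradiance):
    """Reflectance factor R = pi * L / E for calibrated radiance and irradiance."""
    if downwelling_irradiance <= 0:
        raise ValueError("downwelling irradiance must be positive")
    return math.pi * upwelling_radiance / downwelling_irradiance


def ndvi(red, nir):
    """Normalized Difference Vegetation Index from red and NIR reflectance."""
    return (nir - red) / (nir + red)


# Hypothetical near-simultaneous readings in the red and near-infrared bands
r_red = reflectance_factor(upwelling_radiance=0.05 / math.pi, downwelling_irradiance=1.0)
r_nir = reflectance_factor(upwelling_radiance=0.32 / math.pi, downwelling_irradiance=0.8)
index = ndvi(r_red, r_nir)
```

Because each band is normalized by its own irradiance reading, a change in illumination between two acquisitions affects the index only through the delay between the upwelling and downwelling measurements.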
We validated different satellite phenology products by comparing them to multi-sensor ground measurements. The approach relates to Land Product Validation Stage 1 of CEOS (Committee on Earth Observation Satellites) [2]. This required the establishment of a validation site equipped with unattended multispectral and hyperspectral sensor systems for continuous vegetation monitoring. The DFOV systems enable reflectance factor acquisition with (near-)simultaneous measurements of downwelling irradiance and upwelling radiances at high frequency. The approach chosen in this study with a single-sensor DFOV hyperspectral system reduces the costs and effort compared to a system with two spectrometers since an additional permanently installed spectrometer is not needed [6]. Considerable instability in the scaling of the hyperspectral reflectance factor was observed when comparing time series of hyperspectral with multispectral observations. However, the scaling factor was observed to be independent of wavelength and thus does not affect NDVI, as wavelength-independent scaling factors cancel out. For other applications requiring correct scaling of the hyperspectral reflectance factor, a further correction of the calibration instabilities is required. An approach to achieve this will be described in a separate publication. It has also been shown that multispectral data accuracy can potentially be improved by conducting more frequent in situ calibration/validation measurements [44].
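The cancellation of a wavelength-independent scaling factor in NDVI can be written out in one line. The symbols below are introduced only for this illustration: R_red and R_NIR are the true reflectance factors and k is an unknown, nonzero scaling factor common to both bands.

```latex
\mathrm{NDVI}' 
  = \frac{k R_{\mathrm{NIR}} - k R_{\mathrm{red}}}{k R_{\mathrm{NIR}} + k R_{\mathrm{red}}}
  = \frac{k \left( R_{\mathrm{NIR}} - R_{\mathrm{red}} \right)}{k \left( R_{\mathrm{NIR}} + R_{\mathrm{red}} \right)}
  = \frac{R_{\mathrm{NIR}} - R_{\mathrm{red}}}{R_{\mathrm{NIR}} + R_{\mathrm{red}}}
  = \mathrm{NDVI}, \qquad k \neq 0 .
```

The argument holds only if k is the same in both bands. A scaling error that varied with wavelength would not cancel and would bias the index.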
The high correlation of NDVI time series between sensors suggests that the multispectral sensor system is sufficient for phenological pattern analysis. The additional use of a hyperspectral system provides, however, information beneficial for the validation of other remote sensing products [6] such as chlorophyll content and leaf area index [45], as well as the possibility to model, for example, plant productivity [3]. Multispectral information (of different central wavelengths and FWHM) can further be generated from hyperspectral signals, so that the latter enable the validation of different optical satellite missions. Upcoming hyperspectral satellite missions, e.g., EnMAP [47], and airborne hyperspectral campaigns will benefit largely from validation sites equipped with spectrometer systems.
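Generating a broadband value from a hyperspectral signal for a given central wavelength and FWHM can be sketched as a weighted average. The example assumes a Gaussian spectral response and uniform wavelength sampling, which are simplifications; real sensors publish measured response functions that should be preferred. The band parameters roughly follow the nominal MODIS red (620–670 nm) and near-infrared (841–876 nm) band limits and are illustrative only.

```python
import math


def gaussian_band(wavelengths, reflectance, center, fwhm):
    """Weighted mean of a spectrum under a Gaussian response (uniform sampling)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    weights = [math.exp(-0.5 * ((w - center) / sigma) ** 2) for w in wavelengths]
    return sum(w * r for w, r in zip(weights, reflectance)) / sum(weights)


# Hypothetical vegetation-like step spectrum sampled at 1 nm from 400 to 1000 nm
wavelengths = list(range(400, 1001))
spectrum = [0.05 if w < 700 else 0.45 for w in wavelengths]

red = gaussian_band(wavelengths, spectrum, center=645.0, fwhm=50.0)
nir = gaussian_band(wavelengths, spectrum, center=858.5, fwhm=35.0)
simulated_ndvi = (nir - red) / (nir + red)
```

Changing `center` and `fwhm` lets the same ground spectrum emulate bands of different satellite sensors, which is the property that makes hyperspectral ground data reusable across missions.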
Differences in NDVI products are introduced by sensor specifications (e.g., band configuration), angular effects, measurement scales (and therefore, different observed vegetation patches), calibration accuracy of satellite sensors and atmospheric correction, including cloud detection [48]. While sensor specifications are mainly comparable, scales and observed vegetation patches differed significantly. Due to the coarse MODIS resolution of 250 m, analysis is limited to only one pixel, whereas we could statistically analyze a cluster (within 30 m around the eddy flux tower) of Sentinel-2A pixels with 10-m resolution for this study. Hence, the observed area had a size of around 62,500 m² (MODIS) and around 3600 m² (Sentinel-2A), respectively, compared to the observed ground area of around 72 m². We observed varying performance of Sentinel-2 processors, especially in the presence of small cloud patches, cloud shadows and haze. These problems may be enhanced for MODIS pixels due to their coarser resolution and the resulting difficulties in detecting small-scale cloud cover. The angular configuration is different for the sensor systems: MODIS Terra and Aqua overflights differ in solar zenith angle (around five to eight degrees) due to their overpass times around 10:30 a.m. and 01:30 p.m., respectively. The in situ measurements exhibit different illumination angles, introduced by the selection of the time period 11:00 a.m.–01:00 p.m. and DDSRR-filtering. Sentinel-2 observed our region of interest around 10:30 a.m. with a solar zenith angle similar to MODIS Terra. Hence, angular differences can be introduced by both temporal mismatches and angular configuration, which are difficult to differentiate. We mainly observed small NDVI differences within one day between sensor systems, although MODIS Terra showed larger differences to the other sensors during winter 2015/2016. We found considerably higher NDVI values in diurnal multispectral NDVI profiles before 10:30 a.m. only during winter time, suggesting potential effects of illumination geometry on the signal. Consequently, MODIS Terra and potentially Sentinel-2 are more sensitive to this effect than MODIS Aqua due to their overpass time before noon. We could not substantiate this effect for Sentinel-2 data, since no observations with clear sky conditions were available for this period. Temporal mismatches between different systems increase when filter algorithms are used, because these select the ‘best’ or ‘true’ NDVI values. The NDVI values remaining after dynamic filtering are used as input for models restoring the phenological profile. Hence, the detection of phenological local extrema, e.g., date and magnitude of the peak of the growing season, is critical to robustly depict phenological metrics. Despite the effects described above, we could not detect a considerable influence of temporal mismatches within one day on the extraction of phenological metrics. This might be different for other vegetation types or climates with modified phenological timing. Temporal mismatches of several days can in extreme cases lead to the derivation of potentially wrong phenological phases, e.g., when Sentinel-2A observes clouds during green-up, while MODIS, with its higher repetition rate, observes clear sky conditions in between.
In situ and satellite-based NDVI time series captured similar phenological patterns despite the numerous influencing factors. Correlations between the sensor systems were strong, although some deviations occurred during fall and winter between NDVI of ground sensor systems and MODIS Aqua and Terra. RMSEs between satellite and in situ data exhibited the same magnitude as found in another recent study [49] and as the MODIS NDVI uncertainty (
Depicted phenological metrics from both in situ datasets were consistent with each other and agreed with Sentinel-2A-derived metrics in 2016. Standard deviations and even the differences in green-up (GU) and senescence between the sensor systems were consistent with the natural variance of phenology. MODIS NDVI values during spring and fall were lower compared to in situ data, leading to a deviation in depicted phenological phases: green-up occurred later and senescence earlier. The later GU is contrary to the results found in [4]. Tests with NDVI from hyperspectral data resampled to MODIS bands demonstrated a small shift towards later GU. A larger shift was introduced when averaging NDVI of all Sentinel-2 pixels within the MODIS pixel, although the area is characterized by a homogeneous deciduous broadleaf forest with small clearings. Since satellite observations integrate phenological signals from different species within the pixel, the observed effect of spatial scale mainly stems from different species composition. The species’ signals might also be weighted differently according to the sensors’ point spread function. This hypothesis is supported by high satellite NDVI values in summer, later green-up and day of maximum NDVI, as well as earlier senescence with increasing scale.
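The scale test mentioned above, averaging 10-m NDVI pixels over footprints of different size, can be sketched as follows. The grid, the clearing and the NDVI values are hypothetical, the footprints are treated as squares aligned with the pixel grid, and the sensors’ point spread function is not modeled, so the sketch shows only the mixing effect of the footprint size.

```python
def block_mean(grid, row0, col0, size):
    """Mean NDVI of a size x size block of pixels starting at (row0, col0)."""
    cells = [grid[r][c]
             for r in range(row0, row0 + size)
             for c in range(col0, col0 + size)]
    return sum(cells) / len(cells)


# Hypothetical 25 x 25 grid of 10-m pixels covering one 250-m footprint:
# closed forest canopy (NDVI 0.9) with a small clearing (NDVI 0.5) in one corner
grid = [[0.9] * 25 for _ in range(25)]
for r in range(5):
    for c in range(5):
        grid[r][c] = 0.5

tower_cluster = block_mean(grid, 10, 10, 6)  # 6 x 6 pixels, 3600 m^2
coarse_pixel = block_mean(grid, 0, 0, 25)    # 25 x 25 pixels, 62,500 m^2
```

Here the small cluster sees only canopy, while the coarse footprint mixes in the clearing and yields a lower mean. With real data, the direction and size of the difference depend on what the larger footprint contains.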
Sentinel-2A images are currently available as Level 1C products. Atmospheric, cirrus and terrain corrections are necessary for further analysis. The processing tool ATCOR 2/3 produced reasonable Level 2A products. Although ATCOR 2/3 provided scene classification information supporting the elimination of outliers, misclassifications compromised the resulting NDVI time series. Three approaches for eliminating faulty records in time series can be distinguished. First, images can be manually examined, but this is not feasible for long time series or large datasets. Second, diffuse shortwave pyranometer measurements can be used as a selection criterion, but this is limited to observation sites with the respective infrastructure, although such measurements are becoming increasingly common. Third, dynamic filter algorithms, such as BISE, reduce the risk of including false observations while being applicable to large datasets without the need for auxiliary site-specific measurements. Here, BISE was able to consistently remove faulty NDVI values. Daily MODIS NDVI values at 250-m resolution lack scene classification information, and their analysis consequently requires the use of filter algorithms. BISE was able to restore the NDVI profile, although we detected mismatches with ground data during fall 2015 and winter 2015/2016. During December 2015 and January 2016, low temperatures, precipitation events and subsequently snow cover and frozen surfaces led to a strong decrease in multispectral and hyperspectral, as well as MODIS Aqua, NDVI that was not related to vegetation. A strong increase in NDVI during the snow-melt period was detected, which is consistent with other studies observing NDVI of snow-covered vegetation [11]. The duration of this situation exceeded the sliding period of the BISE parametrization. Consequently, decreased MODIS Aqua NDVI values were not eliminated. In contrast, MODIS Terra NDVI increased during winter 2015/2016, which relates to the previously mentioned angular effects.
Apart from assessing the accuracy and precision of satellite products, a validation site offers the opportunity to examine data gaps or periods with low signal-to-noise ratio introduced by bad illumination conditions and cloud or snow cover. Hence, complete ground NDVI time series are beneficial. Future studies may use a combination of DDSRR-filtering and dynamic filtering, subsequently lowering the DDSRR-threshold to obtain more data points under overcast or hazy conditions, increasing the number of usable ground measurements. The comparison of filtered satellite-based time series with in situ data provides the possibility to modify BISE parameters for satellite phenology depiction, if necessary [50].