Quality Assessment of PROBA-V LAI, fAPAR and fCOVER Collection 300 m Products of Copernicus Global Land Service

: The Copernicus Global Land Service (CGLS) provides global time series of leaf area index (LAI), fraction of absorbed photosynthetically active radiation (fAPAR) and fraction of vegetation cover (fCOVER) data at a resolution of 300 m and a frequency of 10 days. We performed a quality assessment and validation of Version 1 Collection 300 m products that were consistent with the guidelines of the Land Product Validation (LPV) subgroup of the Committee on Earth Observation System (CEOS) Working Group on Calibration and Validation (WGCV). The spatiotemporal patterns of Collection 300 m V1 LAI, fAPAR and fCOVER products are consistent with CGLS Collection 1 km V1, Collection 1 km V2 and Moderate Resolution Imagery Spectroradiometer Collection 6 (MODIS C6) products. The Collection 300 m V1 products have good precision and smooth temporal proﬁles, and the interannual variations are consistent with similar satellite products. The accuracy assessment using ground measurements mainly over crops shows an overall root mean square deviation of 1.01 (44.3%) for LAI, 0.12 (22.2%) for fAPAR and 0.21 (42.6%) for fCOVER, with positive mean biases of 0.36 (15.5%), 0.05 (10.3%) and 0.16 (32.2%), respectively. The products meet the CGLS user accuracy requirements in 69.1%, 62.5% and 29.7% of the cases for LAI, fAPAR and fCOVER, respectively. The CGLS will continue the production of Collection 300 m V1 LAI, fAPAR and fCOVER beyond the end of the PROBA-V mission by using Sentinel-3 OLCI as input data.


Introduction
The Copernicus programme is the European Union's Earth observation and monitoring programme, and it records data on the Earth and its ecosystems for the ultimate benefit of all citizens. A variety of technologies, including satellites and ground-based, sea-based and air-based measurements systems, allows Copernicus to deliver operational data and information services openly and freely in a wide range of application areas. In particular, the Copernicus Land Monitoring This article is structured as follows. Section 2 describes the satellite-based LAI, fAPAR and fCOVER products used in this study. Section 3 describes the ground datasets and the validation methodology. Section 4 presents the intercomparison and validation results. Section 5 discusses the results, and Section 6 presents the conclusions.

Collection 300 m V1 PROBA-V LAI, fAPAR and fCOVER Products Under Evaluation
The retrieval algorithm of the CGLS PROBA-V Collection 300 m Version 1 LAI, fAPAR and fCOVER products [8] is based on a two-step process. Daily top of the atmosphere (TOA) reflectances in the blue, red and near infrared (NIR) PROBA-V spectral bands are used in the Simplified Method for the Atmospheric Correction (SMAC) algorithm [19] to retrieve top of canopy (TOC) reflectance values, which are in turn used as inputs to the neural networks (NNTs) to retrieve daily estimates of LAI, fAPAR and fCOVER. The NNTs were calibrated using four years of VEGETATION-2 reflectance data as input. The outputs corresponded to the fusion of the MODIS C5 [20] and CYCLOPES V3.1 [21] for the LAI and fAPAR products and the scaling of the CYCLOPES V3.1 for fCOVER. Then, the final 10-day (dekadal) product values are generated using a dedicated compositing scheme over a dissymmetric temporal window characterized by a flexible size that depends on the availability of valid daily estimates [8]. The compositing includes the application of temporal filters to remove outliers as well as smoothing and gap-filling techniques. A specific process is applied to evergreen broadleaf forests that are highly affected by cloud occurrence, which induces higher noise levels (cloud misdetection) and missing data [8]. The products are delivered in near-real-time (RT0). Then, two successive consolidated estimates after one (RT1) and two (RT2) dekads are generated. A final estimate (RT5) is computed at the end of the consolidation period (i.e., 50 days). The final RT5 product is used in this study. An example of the PROBA-V LAI 300 m V1 spatial distribution is shown in Figure 1. The products are associated with quantitative quality information such as the number of valid daily values available in the compositing window, the root mean square errors of the dekadal values compared to the daily estimates and the length in days of the semiperiod of the compositing window before and after the dekad of interest. In addition, qualitative information is provided through a quality flag. The retrieval algorithm and the products are described in detail in the Algorithm Theoretical Basis Document [8] and the Product User Manual [22]. The first validation results that include all RT modes are reported in the Quality Assessment Report [23].

CGLS PROBA-V Collection 1 km
There are two versions of the CGLS Collection 1 km LAI, fAPAR and fCOVER products. Both Version 1 (V1) [24] and Version 2 (V2) [25] capitalize on existing CYCLOPES V3.1 [21] and MODIS Collection 5 (C5) products [20] and the use of NNTs calibrated with SPOT/VEGETATION reflectances in the red, NIR and SWIR bands. LAI is defined as the green area index, and it includes all the green vegetative elements and partly accounts for clumping; fAPAR is defined as the instantaneous black sky green fAPAR at 10:00-10:30; fCOVER is defined as the fraction of ground area covered by green vegetation [24].
The main differences between V2 and V1 are as follows: 1. V2 uses daily TOC reflectances and the associated cosine of the view zenith and sun zenith angles as inputs for the NNT, whereas V1 uses 10-day composite TOC reflectances; 2.
V2 applies temporal compositing over daily biophysical estimates by first removing outliers (e.g., mainly cloud mis-detection) by assuming a smooth temporal profile and then applying the CACAO method [26] (e.g., adjusting the magnitude and shift of the climatology) on a temporal window spanning a period between 30 and 60 days depending on the number of available observations, whereas V1 applies compositing at the TOC reflectance level by the inversion of a kernel-driven model over a temporal window [27]; 3.
V2 delivers near-real-time estimates that are updated every 10 days until a consolidated value is reached after two months, whereas V1 products are delivered with a temporal delay of 20 days; 4.
V2 uses dedicated temporal smoothing and gap filling based on interannual climatology information [28] to ensure consistency and continuity (no missing data) of products, whereas V1 does not apply temporal smoothing or gap filling.
The main differences between the V2 algorithm and 300 m V1 algorithm are as follows: 1.
The SWIR spectral band is not used since it is not available at a resolution of 300 m; 2.
No climatology is used as background information to fill gaps in the 300 m V1product, and gap filling is only applied over a local temporal window, which results in residual gaps for long periods of missing observations.
The CGLS PROBA-V Collection 1 km V1 products were validated over the year 2014 [29]. The accuracy assessment with ground measurements (25 samples) showed RMSD values of 0.54, 0.13 and 0.18 for LAI, fAPAR and fCOVER respectively and a positive bias of up to 0.15 for fCOVER.
The CGLS PROBA-V Collection 1 km V2 products were validated over the period October 2013-October 2014 [30], and the quality stability was systematically checked every year. The reports are available on the CGLS website [1]. The products displayed better spatial coverage (no gaps) and smoother profiles than V1 and MODIS C5 products. Specifically, over evergreen broadleaf forests, the CGLS 1 km V2 presented smooth trajectories with high values and very limited seasonality while the CGLS 1 km V1 showed unexpectedly low LAI, fAPAR and fCOVER values and seasonality and noise due to permanent clouds. The accuracy assessment over a limited number of concomitant ground-based measurements (<15) showed RMSD values of 0.79, 0.12 and 0.12 for LAI, fAPAR and fCOVER, respectively, with a positive significant fCOVER mean bias of 0.1.

NASA MOD15A2H Collection 6
The Terra MODIS LAI/fAPAR (MOD15A2H) Collection 6 (2000-present), available at https: //lpdaac.usgs.gov/products/, is produced at a spatial resolution of 500 m and a step of eight days over a sinusoidal grid [20]. We consider only the products derived from the main algorithm since the retrievals from the back-up algorithm are generated from surface reflectances with high uncertainties [31]. The main algorithm is based on the use of Look Up Tables (LUTs) built for six different types of biomes, with simulations from a three-dimensional radiative transfer model [32]. The MODIS red and Remote Sens. 2020, 12, 1017 5 of 29 NIR atmospherically corrected reflectances and associated uncertainties [33] and the corresponding illumination-view geometries and the biome type issued from the MODIS land cover product are used as inputs of the LUTs. The output is the mean LAI/fAPAR values for which simulated and measured MODIS surface reflectances are within the expected uncertainty ranges. Note that MODIS C6 does not provide fCOVER data and that over desert areas MODIS C6 does not provide LAI/fAPAR values, which are assumed to be zero.
An accuracy assessment performed over 45 fAPAR ground measurements from the CEOS Online Interactive Validation Exercise (OLIVE) dataset showed an overestimation of the C6 fAPAR products over sparsely vegetated areas [34]. Comparisons with CGLS Collection 1 km V1 products at the global scale showed similar spatiotemporal patterns except in evergreen broadleaf forests where noisy retrievals are observed, with a positive mean bias over benchmarking sites of 0.3 (0.13) units for LAI (fAPAR) [34].

Quality Flags
The quality flag information of the PROBA-V Collection 1 km V1 and V2 products and the two quality flag layers of MODIS C6 (FparLai provides information on retrieval status and FparExtra on clouds, water and land) were used to retain only the best-quality values (Table 1). Low-quality pixels and filled values for PROBA-V Collection 1 km V2 were not considered except in temporal profiles and global maps of residuals. All PROBA-V Collection 300 m V1 pixels were used since the retrieval algorithm keeps only the best PROBA-V quality pixels, i.e., pixels that presented good radiometric quality and were located over land and not covered by ice, cloud or snow [8].

Quality Assessment Methods and Ground Datasets
This section first describes the ground datasets (Section 3.1) used as validation references and then presents the quality assessment and validation methodology (Section 3.2). Appendix A describes the LANDVAL network of sites selected for benchmarking with reference satellite-derived vegetation products.

Ground Measurements
Ground measurements were mostly collected during the FP7 ImagineS project [7] over a network of sites, including a large variety of crop types and phenological stages, during the period 2013-2016 [16]. Approximately 90% of the sites belonged to crop areas (see Table 2), and the representativeness of other biome types was very limited in this dataset. Field data were collected according to well-defined protocols [17] that are consistent with the VALERI sampling strategy for upscaling. Between 13 and 15 Digital Hemispherical Photographs (DHPs) were taken in most of the sites to characterize the Elementary Sampling Unit (ESU) of approximately20 m × 20 m size and between 30 and 50 ESUs were measured following a stratified sampling per crop type to characterize the site of approximately 3 km × 3 km. In some sites, a LI-COR LAI-2200 Plant Canopy Analyzer (PCA) and an AccuPar LP-80 ceptometer were used following a similar sampling strategy [16]. The DHP measurements were processed with the CAN-EYE version 6.4.92 (CEV6.49) software [35] to retrieve the LAI, fAPAR and fCOVER. LAI was computed as the average value of two CAN-EYE (CV5.1 & CV6.1) solutions [36], and the clumping index was considered. The clumping index was computed using the [37] logarithm gap fraction averaging method, although some uncertainties are associated with this method [38]. The effective LAI (LAIeff) (i.e., assuming random distribution of elements in the canopy) was computed as the average value of the CEV5.1, CEV6.1 and Miller [39] methods; and the instantaneous black sky fAPAR was computed at 10:00 local solar time. It should be noted that all measurements were conducted from green-up to maturity of the plants with a negligible amount of nonphotosynthetically active material (i.e., retrieved variables correspond to green elements of the plants) and that using indirect methods no distinction is made between leaves and other plant elements (stems, bulbs, spikes). Consequently, for the LAI, the ground data definition corresponds better to a green area index (GAI). Note that when using the LAI-2200 PCA or AccuPAR LP-80, the clumping index is not computed; thus, the retrieved value corresponds better to an effective green area index. To maintain consistency with the satellite products, we retained the term LAI instead of GAI in the ground dataset. Ground-based maps were produced using CEOS LPV guidelines [9] and following the approach described in [15], and an iterative reweighted least squares (IRLS) regression algorithm was used to derive the transfer function between the ground data and the high-resolution Operational Land Imager (OLI)/Landsat-8 data [16]. The transfer function was applied to an area of 3 km x 3 km around the site. A quality flag was generated based on the convex hull technique described in [15] to identify the level of confidence in the transfer function estimate depending on whether the pixel falls within the convex hull (i.e., the transfer function behaves as an interpolator and has good confidence) or outside this domain (i.e., the transfer function behaves as an extrapolator and has lower confidence). The quality flag is used for the comparison with satellite products as described in Section 2.4. A total of 35 ground-based maps coincident with PROBA-V products are used in this study (sites #1-14, #17-31 and #36-41 in Table 2).
Six additional ground-based maps were generated by EOLAB during the 2016-2017 period (samples #15, #16 and #32-#35 in Table 2). The dataset was collected with DHP and processed following the same procedures used in ImagineS [17]; however, the recently available Sentinel-2A/MSI imagery was used for upscaling instead of Landsat-8/OLI. Values averaged over 3 km x 3 km around the site centre (Table 2) were included in the OLIVE DIRECT 2.0 ground database, which is available at the European Space Agency (ESA) cal/val portal and endorsed by the CEOS WGCV LPV [40].

Validation Methods
The quality assessment procedure was defined so that it was consistent with CEOS WGCV LPV good practices for global LAI product validation [9] and previous validation and intercomparison of vegetation satellite product research work [10][11][12]. Several performance criteria were assessed in Remote Sens. 2020, 12, 1017 8 of 29 this study, including (i) the product consistency with similar products in its temporal and spatial dimensions (Section 3.2.1), (ii) the inter-annual and intra-annual precision of the product (Section 3.2.2), and (iii) the accuracy of the product assessed by comparison with match-ups of upscaled ground-based measurements (Section 3.2.3). The assessment was performed for a two-year period (2016-2017) except for the accuracy assessment with ground measurements (2014-2017) to provide a more representative dataset.
Satellite products must be compared over a similar spatial support area and temporal support period. In this study, a 10-day temporal support (using the closest date of each product) and a 3 km × 3 km spatial support (i.e.,9 × 9 pixels in the case of Collection 300 m V1 products, 6 × 6 pixels in the case of MODIS C6, and 3 × 3 pixels in the case of Collection 1 km products) were used, except for the temporal profiles, where each product is displayed with its nominal temporal and spatial resolutions. For the spatial support, the Plate Carrée projection over 1/112º corresponding to the CGLS products was used as a common grid. Reference products were resampled to the CGLS grid considering only the best-quality pixels ( Table 1). The LANDVAL network of 725 sites was used for sampling global conditions.

Consistency Assessment with Similar Satellite Products
Spatial consistency refers to the realism of the spatial distribution of values over the globe. The consistency of the spatial distribution of retrievals with respect to similar satellite products was assessed by calculating the residuals between the Collection 300 m V1 products and the reference satellite products. The residual (ε) was estimated by assuming a linear relationship between two products (y = a·x + b + ε) and it represents the remaining discrepancies regarding the general relationship between both products. In this way, systematic relationships were not considered so that the patterns associated with the spatial distribution of the retrievals could be more clearly depicted. Note that deviations of the regression equation (i.e., residuals) are only used for the product spatial consistency assessment (Section 4.1.1).
Temporal consistency refers to the realism of seasonal and inter-annual variations, and it was qualitatively investigated via comparisons with the reference satellite products. Temporal profiles over the LANDVAL network of sites were analysed per biome type for the period January 2016-December 2017. Temporal consistency was quantitatively assessed with the inter-annual and intra-annual precision metrics (Section 4.1.2).
Finally, the overall spatiotemporal consistency between the Collection 300 m V1 and reference products was assessed over the LANDVAL network of sites considering the best-quality observations (Table 1) for the two-year period. Scatterplots between pairs of products were produced, and the associated validation metrics (Table 3) were quantified (Section 4.1.3).

Precision Assessment
In the absence of linear trends, upper and lower percentile anomalies for a biophysical variable are indicators of inter-annual precision, i.e., the dispersion of variable values from year to year [9]. A good practice is to report a boxplot of the median absolute deviation versus product bins and a single statistic corresponding to the median [9]. In this study, we assessed the absolute difference between two consecutive years of the 5th and 95th percentiles of the annual vegetation cycle over the LANDVAL network of sites. Cultivated sites were not considered in this analysis due to the non-natural variability of this land cover type induced by agricultural practices (e.g., crop rotation).
Intra-annual precision corresponds to temporal noise assumed to have no serial correlation within a season. In this case, the anomaly of a variable value predicted by linear interpolation of its neighbours in time can be used as an indication of intra-annual precision [9]. It can be characterized as suggested by [13] (Equation (1)) by computing the absolute value of the difference between the centre P(d n+1 ) and the corresponding linear interpolation between the two extremes P(d n ) and P(d n+2 ) for each triplet of consecutive observations: The histogram of δ values (Equation (1)) for the Collection 300 m V1 and reference products over LANDVAL were compared, and the median was used as a quantitative indicator of the inter-annual precision [9]. Hence, the lower the median of δ values, the higher the inter-annual precision. Table 3. Metrics computed for product validation, either comparison with ground measurements or intercomparisons of satellite products; x stands for the reference, and y stands for the estimated data. Major Axis Regression (slope, offset) Indicates bias.
Uncertainty requirements (%) Percentage of pixels meeting the CGLS or GCOS requirements.
p-value Test to determine whether the slope is significantly different from 1 (p < 0.05).

Accuracy Assessment with Ground Measurements
Product accuracy was assessed against ground reference data that were upscaled using high-resolution imagery. The closest product date to the field campaign was used. Available ground-based maps were provided over an area of 3 km × 3 km and remapped to the native PROBA-V resolution of 300 m for matching up with the satellite products. The aggregation was performed considering the effective point spread function (PSF) of the satellite product to account for several factors, including the PSF of the instrument, the geolocation uncertainty, the effect of the reprojection (from raw images to UMT) and atmospheric scattering [59]. This method improves the performance of the evaluation compared to the ordinary average [60], which is partly explained by the compensation of geolocation or reprojection errors. The PSF of the PROBA-V instrument can be approximated by the convolution of a Gaussian function characterizing the optical PSF [61]. The effective PSF was computed by maximizing the correlation coefficient between the 300 m V1 PROBA-V product and the corresponding high-resolution map [59][60][61]. During the PSF optimization process, an iterative approach in which we combined both the extension of the pixel size and the Full Width at Half Maximum (FWHM) of the Gaussian functions in the x and y directions was developed according to the scheme shown by Mira et al. [60]. The extension of the pixel was varied by steps of 1/8 up to 1/2 of the PROBA-V pixel size, whereas the FWHM was varied in steps of 1/20 up to 1/2 of the pixel size plus the enlargements.
Ground-based maps were provided with a quality flag indicating the pixels that belong to the convex hull computed from the set of pixels used to calibrate the up-scaling transfer function [15]. For the accuracy assessment we considered only the Collection 300 m V1 PROBA-V pixels for which more than 70% (threshold more restrictive than in previous studies [11,12]) of the corresponding high-resolution pixels were within the convex hull. The accuracy with respect to ground measurements was then assessed using the metrics defined in Table 3. Overall uncertainty (i.e., root mean square deviation, RMSD) includes systematic measurement error (i.e., bias) and random measurement error (i.e., standard deviation of bias). RMSD is recommended to report the accuracy of LAI products when there is only one product estimate to each ground mapping unit [9]. Major Axis Regression (MAR) was used because it is specifically formulated to handle errors in both the x and y products [62]. In addition, the accuracy assessment with ground data was conducted at 3 km × 3 km using the same number of samples; if there was a missing value in one dataset, then this observation was removed from the other datasets in order to have a consistent database for intercomparisons of results.
Finally, the percentage of pixels meeting the CGLS requirements was assessed ( Table 4). Because of the lack of an uncertainty requirement for LAI products, we have considered the actual Global Climate Observing System (GCOS) uncertainty requirement of 15% as the optimal level and the former GCOS requirement of Max (0.5, 20%) [63] as the target level. For the threshold level, a value of Max (0.75, 25%) was considered following the same incremental step in the relative terms. For fAPAR and fCOVER, an additional minimum uncertainty value of 0.05 or 0.1 was considered for the target and threshold level to avoid the large relative values that can be obtained for very low vegetation values. Note that the GCOS requirement for fAPAR corresponds to the target level. Table 4. Uncertainty requirements considered for LAI, fAPAR and fCOVER.

Optimal
Target Threshold

Spatial Consistency
The spatial distribution of residuals between the PROBA-V Collection 300 m V1 and reference LAI products is investigated ( Figure 2). Note that white areas correspond to missing data or water. The LAI collection 300 m V1 shows residuals within ±1 LAI units in most of the land vegetated regions and within ±0.5 LAI units for sparsely vegetated or arid regions when compared to the PROBA-V Collection 1 km (both V1 and V2). Some regions over northern latitudes and Central Africa exhibit larger negative residuals, typically between −1 and −2 LAI units. Similar spatial consistency between PROBA-V collections is found for the fAPAR and fCOVER, which can be seen in the Supplementary Materials ( Figure S1, Figure S2).
The analysis of residuals between PROBA-V 300 m V1 and MODIS C6 LAI products ( Figure 2) reveals larger discrepancies over most of the vegetated areas, with good consistency for sparsely vegetated or semiarid regions whilst for desert areas MODIS does not retrieve a value (we assumed LAI and fAPAR equal to zero). Positive residuals of approximately two LAI units are found in tropical forest, in the east part of North America and in large areas of Europe and Asia. Conversely, negative residuals of around −2 LAI units are found in large areas of Siberia. The spatial consistency between PROBA-V 300 m V1 and MODIS C6 is worse for fAPAR, and both positive and negative residuals are found around the world, including sparsely vegetated areas ( Figure S1).   Figure 3 shows some examples of Collection 300 m V1 temporal profiles for several sites corresponding to different biomes. Note that all LANDVAL temporal profiles can be found in the digital annex of the validation report [23]. Overall, good consistency in the temporal variations is observed between Collection 300 m V1 and the reference products, with consistent seasonal and inter-annual variations. The temporal consistency between the three PROBA-V products is noticeable, although some differences are observed in the magnitude of retrievals. MODIS C6 also shows overall similar seasonal and inter-annual variations but displays noisier temporal profiles.  Figure 3 shows some examples of Collection 300 m V1 temporal profiles for several sites corresponding to different biomes. Note that all LANDVAL temporal profiles can be found in the digital annex of the validation report [23]. Overall, good consistency in the temporal variations is observed between Collection 300 m V1 and the reference products, with consistent seasonal and inter-annual variations. The temporal consistency between the three PROBA-V products is noticeable, although some differences are observed in the magnitude of retrievals. MODIS C6 also shows overall similar seasonal and inter-annual variations but displays noisier temporal profiles. For evergreen broadleaf forest sites, the Collection 300 m V1 temporal trajectories are smooth with low seasonality, which is similar to that of the Collection 1 km V2 products as a result of the temporal compositing and smoothing techniques applied over this biome; conversely, MODIS C6 is significantly affected by cloud occurrence and displays noisy profiles for this biome type. For the needle-leaf forest sites located in the Siberia region, we observed dropouts during the peak of the season for MODIS C6, which explains the large differences found with respect to PROBA-V 300 m V1 in this area ( Figure 2). For fCOVER, Collection 300 m V1 in bare areas shows no seasonality as expected; hence, the false seasonality observed in PROBA-V Collection 1 km V1 is corrected.

Remote Sens. 2019, 11, x FOR PEER REVIEW 12 of 29
For evergreen broadleaf forest sites, the Collection 300 m V1 temporal trajectories are smooth with low seasonality, which is similar to that of the Collection 1 km V2 products as a result of the temporal compositing and smoothing techniques applied over this biome; conversely, MODIS C6 is significantly affected by cloud occurrence and displays noisy profiles for this biome type. For the needle-leaf forest sites located in the Siberia region, we observed dropouts during the peak of the season for MODIS C6, which explains the large differences found with respect to PROBA-V 300 m V1 in this area ( Figure 2). For fCOVER, Collection 300 m V1 in bare areas shows no seasonality as expected; hence, the false seasonality observed in PROBA-V Collection 1 km V1 is corrected.

Overall Spatiotemporal Consistency Between Products
For LAI, the product scatterplots show high consistency between the Collection 300 m V1 and reference products (Figure 4), with correlations of 0.98 for Collection 1 km V1 and V2 products and 0.94 for MODIS C6 ( Table 5). The comparison of Collection 300 m V1 LAI product with MODIS C6 shows a linear fit close to the 1:1 line (slope 1.00 and intercept 0.01) but with large scattering and overall discrepancies (RMSD) of 0.53 LAI units ( Table 5). The large scattering corresponds to spatial and temporal differences observed over some regions and time periods as observed previously (Figure 2 and Figure 3). The comparison with V1 and V2 shows a slight positive and negative mean bias (slope) of 0.04 (1.08) and −0.03 (0.97), respectively, with overall discrepancies (RMSD) of 0.36 and 0.31, respectively. The analysis per biome shows that systematic differences between Collection

Overall Spatiotemporal Consistency Between Products
For LAI, the product scatterplots show high consistency between the Collection 300 m V1 and reference products (Figure 4), with correlations of 0.98 for Collection 1 km V1 and V2 products and 0.94 for MODIS C6 ( Table 7). The comparison of Collection 300 m V1 LAI product with MODIS C6 shows a linear fit close to the 1:1 line (slope 1.00 and intercept 0.01) but with large scattering and overall discrepancies (RMSD) of 0.53 LAI units ( Table 7). The large scattering corresponds to spatial and temporal differences observed over some regions and time periods as observed previously (Figures 2  and 3). The comparison with V1 and V2 shows a slight positive and negative mean bias (slope) of 0.04 (1.08) and −0.03 (0.97), respectively, with overall discrepancies (RMSD) of 0.36 and 0.31, respectively. The analysis per biome shows that systematic differences between Collection 300 m V1 and Collection 1 km are mainly affecting evergreen broadleaf forest sites (Figures S3 and S6).
Remote Sens. 2019, 11, x FOR PEER REVIEW 13 of 29 300 m V1 and Collection 1 km are mainly affecting evergreen broadleaf forest sites (Figures S3 and  S6).
For fAPAR, correlations higher than 0.95 are found between Collection 300 m V1 and reference products, with linear fits close to the 1:1 line. The Collection 300 m V1 fAPAR product shows a small negative mean bias (slope of the fit) of −0.02 (0.96) compared with Collection 1 km V1 (Table  6), with an overall discrepancy (RMSD) of approximately 0.06. The analysis per biome type shows that the largest negative differences (bias of −0.076) are affecting the needle-leaf forest sites ( Figure  S4). The overall consistency with MODIS C6 is lower (RMSD = 0.08); MODIS C6 tends to provide high er values for low fAPAR ranges, which is mainly observed for nonforest sites ( Figure S10), but overall it displays a negative bias (−0.03), which can be mainly observed in the needle-leaf and broadleaf deciduous forest sites ( Figure S10).  Table 5, Table 6 and Table 7.
Finally, for the fCOVER results, the scatterplots show good linear consistency (Figure 4), with correlations of 0.99 and overall discrepancies (RMSD) of 0.05 between PROBA-V Collection 300 m V1 and both PROBA-V Collection 1 km V1 and V2 products (Table 7). Almost no mean bias is found in the comparison with Collection 1 km V1, although Collection 300 m V1 tends to provide higher values than Collection 1 km V2 (bias = 0.012, slope = 1.07). The analysis per biome reveals that bias mainly occurs for the cultivated and herbaceous sites with medium and high vegetation values ( Figure S8).  Tables 5-7. For fAPAR, correlations higher than 0.95 are found between Collection 300 m V1 and reference products, with linear fits close to the 1:1 line. The Collection 300 m V1 fAPAR product shows a small negative mean bias (slope of the fit) of −0.02 (0.96) compared with Collection 1 km V1 (Table 5), with an overall discrepancy (RMSD) of approximately 0.06. The analysis per biome type shows that the largest negative differences (bias of −0.076) are affecting the needle-leaf forest sites ( Figure S4). The overall consistency with MODIS C6 is lower (RMSD = 0.08); MODIS C6 tends to provide higher values for low fAPAR ranges, which is mainly observed for nonforest sites ( Figure S10), but overall it displays a negative bias (−0.03), which can be mainly observed in the needle-leaf and broadleaf deciduous forest sites ( Figure S10).
Finally, for the fCOVER results, the scatterplots show good linear consistency (Figure 4), with correlations of 0.99 and overall discrepancies (RMSD) of 0.05 between PROBA-V Collection 300 m V1 and both PROBA-V Collection 1 km V1 and V2 products ( Table 6). Almost no mean bias is found in the comparison with Collection 1 km V1, although Collection 300 m V1 tends to provide higher values than Collection 1 km V2 (bias = 0.012, slope = 1.07). The analysis per biome reveals that bias mainly occurs for the cultivated and herbaceous sites with medium and high vegetation values ( Figure S8).  Figure 5 shows the boxplots of the inter-annual absolute difference between two consecutive years for Collection 300 m V1 products. For LAI, larger absolute differences are found in the range of 4-5 LAI units, with differences of approximately 0.4 and typically below 0.3 for the other ranges. The median value for all the samples is 0.070 (4.4%), which is lower than that of Collection 1 km V1 and MODIS C6 ( Table 8). The Collection 1 km V2 LAI product shows the lowest median absolute difference (approximately 3%). The fAPAR Collection 300 m V1 product shows median absolute differences lower than 0.05 for all levels of fAPAR values, and the median absolute difference of 0.018 (5.1%) is slightly higher than that for the Collection 1 km product but lower than that for the MODIS C6 product. The Collection 300 m V1 fCOVER products show similar precision as the LAI and fAPAR, and the results are similar to that of Collection 1 km V1 fCOVER.   Figure 6 shows the histograms of the δ values (Equation (1)) for LAI and fAPAR products under study. The PROBA-V Collection 300 m V1 products show very similar histograms to the Collection 1 km V2 products since both are smoothed products, which also explains the lower median δ values obtained. The MODIS C6 histograms show higher frequencies for large delta values indicating noisier temporal evolution and lower intra-annual precision. For Collection 300 m V1, the median value used as an indicator of precision over short time scales shows values of 0.014 for LAI and 0.003 for fAPAR and fCOVER (not shown here for the sake of brevity), indicating high precision, and the result is slightly better than that of the Collection 1 km V1 products (0.017 for LAI and 0.005 for fAPAR and fCOVER). The MODIS C6 products show median δ values higher than Collection 300 m V1, indicating lower intra-annual precision as observed in the temporal profiles ( Figure 3).    Figure 6 shows the histograms of the δ values (Equation (1)) for LAI and fAPAR products under study. The PROBA-V Collection 300 m V1 products show very similar histograms to the Collection 1 km V2 products since both are smoothed products, which also explains the lower median δ values obtained. The MODIS C6 histograms show higher frequencies for large delta values indicating noisier temporal evolution and lower intra-annual precision. For Collection 300 m V1, the median value used as an indicator of precision over short time scales shows values of 0.014 for LAI and 0.003 for fAPAR and fCOVER (not shown here for the sake of brevity), indicating high precision, and the result is slightly better than that of the Collection 1 km V1 products (0.017 for LAI and 0.005 for fAPAR and fCOVER). The MODIS C6 products show median δ values higher than Collection 300 m V1, indicating lower intra-annual precision as observed in the temporal profiles ( Figure 3).   Figure 6 shows the histograms of the δ values (Equation (1)) for LAI and fAPAR products under study. The PROBA-V Collection 300 m V1 products show very similar histograms to the Collection 1 km V2 products since both are smoothed products, which also explains the lower median δ values obtained. The MODIS C6 histograms show higher frequencies for large delta values indicating noisier temporal evolution and lower intra-annual precision. For Collection 300 m V1, the median value used as an indicator of precision over short time scales shows values of 0.014 for LAI and 0.003 for fAPAR and fCOVER (not shown here for the sake of brevity), indicating high precision, a nd the result is slightly better than that of the Collection 1 km V1 products (0.017 for LAI and 0.005 for fAPAR and fCOVER). The MODIS C6 products show median δ values higher than Collection 300 m V1, indicating lower intra-annual precision as observed in the temporal profiles ( Figure 3 ).   Figure 7 shows the scatterplots between the PROBA-V Collection 300 m V1 products and ground-based reference maps at a spatial resolution of 300 m. For the sake of clarity, the mean value of each site along with the standard deviation over the 3 km × 3 km area is presented. However, the validation statistics are computed over all pixels at the nominal spatial resolution of 300 m. The accuracy assessment for each specific site over an extended area of the reference map can be found in the validation report [23].

Validation of PROBA-V Collection 300 m V1
Remote Sens. 2019, 11, x FOR PEER REVIEW 16 of 29 Figure 7 shows the scatterplots between the PROBA-V Collection 300 m V1 products and ground-based reference maps at a spatial resolution of 300 m. For the sake of clarity, the mean value of each site along with the standard deviation over the 3 km × 3 km area is presented. However, the validation statistics ar e computed over all pixels at the nominal spatial resolution of 300 m. The accuracy assessment for each specific site over an extended area of the reference map can be found in the validation report [23].  (Table 2), and colours identify the site. Statistics of the comparisons are reported in Table 9.

Validation of PROBA-V Collection 300 m V1
The LAI shows an accuracy (RMSD) of 1.01 (44.3%) (Table 9), with a positive mean bias of 0.36 (15.5%) and slope of the linear fit that is significantly different to 1. A large overestimation is found for medium LAI ranges over a few cropland sites (Albufera site: #17, #18 and #19 and the AHSPECT sites: #38 and #39), while an underestimation is observed for the Barrax (#12) and Moncada (#33 and #34) sites. For the limited samples of other biomes, good performance was found for the Shrublands (20Mayo_2: #9), deciduous broadleaf forest (Collelongo site: #29 and #30) and needleleaf forest (Lliria: #32).  (Table 2), and colours identify the site. Statistics of the comparisons are reported in Table 9. The LAI shows an accuracy (RMSD) of 1.01 (44.3%) (Table 9), with a positive mean bias of 0.36 (15.5%) and slope of the linear fit that is significantly different to 1. A large overestimation is found for medium LAI ranges over a few cropland sites (Albufera site: #17, #18 and #19 and the AHSPECT sites: #38 and #39), while an underestimation is observed for the Barrax (#12) and Moncada (#33 and #34) sites. For the limited samples of other biomes, good performance was found for the Shrublands (20Mayo_2: #9), deciduous broadleaf forest (Collelongo site: #29 and #30) and needle-leaf forest (Lliria: #32).
For fAPAR, the Collection 300 m V1 product shows a high correlation (0.91) with respect to ground reference data, with an overall accuracy (RMSD) of 0.12 (22.2%) and a slight positive mean bias of 0.05 (10.3%). Systematic positive bias occurs for all levels of fAPAR except for the highest values as can be observed by the linear fit (intercept 0.08, slope 0.94; Table 9). The largest overestimations are also found for croplands at the Albufera (#17, #18 and #19) and the AHSPECT sites (#39) as well as for the Barrax (#13), Capitanata (#25) and Moncada sites in wintertime (#35).
For fCOVER, the Collection 300 m V1 product shows a RMSD of 0.21 (42.6%), a statistically significant positive mean bias of 0.16 (32.2%) and slope of the linear fit of 1.28 (Table 9). For almost all sites, positive biases with ground references are found except for the Collelongo (#30) and Liria (#32) forest sites and the Moncada crop site (#33, #34), with the largest deviation found again for the Albufera rice site (#17) with the ground fCOVER values.
Finally, compliance with the CGLS user requirements (

Validation of MODIS and PROBA-V Products over the Common Samples
The accuracy assessment with common ground measurements is also conducted at a reduced resolution using averaged values over 3 km × 3 km (Table 2) to intercompare with reference satellite products.
The scatterplots for the different LAI products (Figure 8) show similar correlations and linear relationships for all the products. The best agreement in terms of RMSD is found for Collection 1 km V1 (0.86, Table 11), which is similar to MODIS C6, whereas Collection 300 m V1 shows a RMSD of 1.06. The three CGLS PROBA-V LAI products show a positive mean bias of approximately 0.5 (~20%) ( Table 11) and slopes that are different from 1 (not statistically significant), whereas MODIS C6 displays a lower mean bias 0.16 (7.5%). All the products show large discrepancies for the Albufera site at the early stages of development (#17-19) as well as for some AHSPECT sites (#38-39). The negative bias for Barrax (#12) is also common among all products.  Table 2) and colours indicate the site (legend in Figure 10). Dashed lines correspond to the 1:1 line and optimal and target requirements, and the red line corresponds to the linear fit using MAR. For the different fAPAR products (Figure 9), similar accuracy results are observed, with high correlations for all products (R>0.91, Table 12). The best accuracy is found for Collection 300 m V1, with an RMSD of 0.11 (21.3%) and a mean positive bias of 0.06 (12.5%). The other three satellite products show very similar statistics, with an RMSD of 0.14 (~26%) and bias of approximately 0.1 (~20%). A large overestimation is found for the Albufera rice site during the early stages and growing period.
For the fCOVER products (Figure 10), the three CGLS PROBA-V products show very similar results and display a systematic positive mean bias of approximately 30% compared with the ground reference maps. The best results are found for Collection 1 km V2 (mean bias of 0.13) whereas Collection 300 m V1 and Collection 1 km V1 show a mean bias of 0.16 (Table 13).  Table 2) and colours indicate the site (legend in Figure 10). Dashed lines correspond to the 1:1 line and optimal and target requirements, and the red line corresponds to the linear fit using MAR. Table 11. Accuracy assessment with ground measurements statistics for different satellite LAI products at 3 km spatial resolution. Relative values (%) are shown in brackets. For the different fAPAR products (Figure 9), similar accuracy results are observed, with high correlations for all products (R > 0.91, Table 12). The best accuracy is found for Collection 300 m V1, with an RMSD of 0.11 (21.3%) and a mean positive bias of 0.06 (12.5%). The other three satellite products show very similar statistics, with an RMSD of 0.14 (~26%) and bias of approximately 0.1 (~20%). A large overestimation is found for the Albufera rice site during the early stages and growing period.  (Table 4), and colours indicate the site (legend in Figure 10). Dashed lines correspond to the 1:1 line and optimal and target requirements, and the red line corresponds to the linear fit using MAR.   (Table 4), and colours indicate the site (legend in Figure 10). Dashed lines correspond to the 1:1 line and optimal and target requirements, and the red line corresponds to the linear fit using MAR. For the fCOVER products (Figure 10), the three CGLS PROBA-V products show very similar results and display a systematic positive mean bias of approximately 30% compared with the ground reference maps. The best results are found for Collection 1 km V2 (mean bias of 0.13) whereas Collection 300 m V1 and Collection 1 km V1 show a mean bias of 0.16 (Table 13) (Table 2). Dashed lines correspond to the 1:1 line and optimal and target requirements, and the red line corresponds to the linear fit using MAR.

Discussion
The CGLS offers global PROBA-V LAI, fAPAR and fCOVER products at resolutions of 1 km and 300 m. Overall good spatial and temporal consistency is found between the several CGLS PROBA-V collections for the three variables, which is partly explained by the fact that all the CGLS PROBA-V vegetation products are retrieved from the same sensor and based on the same NNT retrieval approach. The main differences between the algorithms for the different collections occur in the temporal compositing method: for V1 1 km, compositing is performed at the reflectance level without considering smoothing and gap filling; for V2 1 km, compositing is applied at the biophysical variable level and applies smoothing and gap filling with the use of climatology data for large gaps. Finally, for Collection 300 m V1, the approach is similar to that for the 1 km V2 but  (Table 2). Dashed lines correspond to the 1:1 line and optimal and target requirements, and the red line corresponds to the linear fit using MAR.

Discussion
The CGLS offers global PROBA-V LAI, fAPAR and fCOVER products at resolutions of 1 km and 300 m. Overall good spatial and temporal consistency is found between the several CGLS PROBA-V collections for the three variables, which is partly explained by the fact that all the CGLS PROBA-V vegetation products are retrieved from the same sensor and based on the same NNT retrieval approach. The main differences between the algorithms for the different collections occur in the temporal compositing method: for V1 1 km, compositing is performed at the reflectance level without considering smoothing and gap filling; for V2 1 km, compositing is applied at the biophysical variable level and applies smoothing and gap filling with the use of climatology data for large gaps.
Finally, for Collection 300 m V1, the approach is similar to that for the 1 km V2 but instead of using climatology data, which are not available in the 300 m V1 product, small gaps are filled by performing an interpolation in a local window while large gaps are not filled. The differences in these compositing approaches explain the larger differences observed when the number of valid observations is low. The largest residuals are found over northern latitudes or equatorial areas, which are affected by long periods of missing data due to persistent cloudiness and noise in the input data because of residual clouds and snow contamination. In terms of biomes, the evergreen broadleaf forest in tropical areas and needle-leaf forest in northern latitudes are affected by larger discrepancies between CGLS PROBA-V products. The comparison of Collection 300 m V1 with MODIS C6 shows overall good consistency for LAI with a linear relationship close to the 1:1 line but larger RMSD values than between PROBA-V products. However, for fAPAR, bias is found for low and high values. MODIS C6 displays higher values for very low and very high fAPAR values than Collection 300 m V1 products. These results are consistent with previous validation studies between SPOT/VGT V1 and MODIS C5 [29]. Large differences between satellite fAPAR products over forest sites have also been reported by a number of previous studies [10,[64][65][66]. The large RMSD values are partly explained due to large temporal variability of MODIS products (lower precision) as observed mainly over forest sites.
Collection 300 m V1 shows good precision, with smooth profiles and consistent inter-annual variations compared with the reference satellite products. The median absolute anomalies between consecutive years are only slightly higher than for the PROBA-V Collection 1 km products and better than MODIS C6 products, which show unstable profiles. The Collection 300 m V1 products display a slight systematic positive bias for LAI (0.36, 16%) and fAPAR (0.05, 10%) and significant deviations for fCOVER (0.16, 32%), with an overall accuracy (RMSD) of approximately 1 (44%) for LAI, 0.12 (22%) for fAPAR and 0.21 (42%) for fCOVER. The absolute location error for PROBA-V 300 m V1 is approximately 70 m on average for the different channels [67], which may introduce additional errors in the per-pixel based comparison. Nevertheless, the effective PSF method used for upscaling reduces the impact of geolocation errors in the accuracy assessment [60]. Indeed, the comparison at a spatial resolution of 3 km shows similar product accuracy (just slightly better) to that at a resolution of 300 m. The accuracy of Collection 300 m V1 is similar to that of the other reference satellite 1 km products for LAI and improved for fAPAR. All the satellite products show large overestimations for the Albufera rice crop site during the early and growing periods of development (June-July), and this result was previously observed for LAI products by Campos-Taberner et al. [68] and by Fang et al. [69] for rice crops in China. Satellite retrieval algorithms misinterpret the decreased reflectance due to strong water absorption as a denser canopy (i.e., increasing artificially LAI, fAPAR and fCOVER values). The user is advised to mask or apply specific corrections over flooded vegetated areas. Future improvements on the representativeness of the background spectra used for the calibration of retrievals algorithms should include the very dark background of flooded areas, such as those in paddy rice fields, during the growing period.
The overestimations found for PROBA-V fCOVER products was also reported over croplands for SPOT/VGT 1 km V1 using a different ground dataset [70]. The results of our study seem to indicate that the scale factor (1/0.687) applied to the CYCLOPES fCOVER product to derive the PROBA-V fCOVER product might be too high. However, correcting the fCOVER bias from the measurements might lead to some artefacts due to the limitations of the ground dataset in terms of accuracy and representativity. Further investigation is required before proposing any correction for the observed bias of fCOVER products. Indeed, the validation of fCOVER was conducted over a limited number of sites (mostly crops), and the accuracy of the ground measurements from DHP data is the best that could be obtained; however, it is still not very high, especially because of the footprint of the measurements compared to LAI/fAPAR. This footprint corresponds to approximately 2.2 m 2 per plot for an acquisition at 1 m height or fCOVER because only the 0-10 • range of view angles are used, whereas a footprint of approximately 60 m 2 is sampled for LAI and fAPAR using the 0-60 • range of view angles. Finally, more research is needed to confirm whether this overestimation is also affecting other biomes. Furthermore, it should be noted that ground reference data could have larger uncertainties than the CGLS optimal and target uncertainty requirements for fCOVER and fAPAR (5% and 10%, respectively). Thus, ground measurement uncertainties must be characterized, which is the main purpose in the fiducial reference measurement for vegetation (FRM4Veg) project [71], to better estimate the satellite product uncertainty budget.

Conclusions and Prospects
The quality assessment of PROBA-V Collection 300 m LAI, fAPAR and fCOVER Version 1 products was conducted following the CEOS WGCV LPV best practices for validating LAI products. The quality assessment was focused on evaluating the spatial and temporal consistency with similar products to identify similarities and discrepancies among similar datasets as well as the precision over a two-year period (2016-2017). The accuracy assessment using ground measurements was performed over a four-year period (2014-2017). Additional assessments of near real time estimates are not shown in this manuscript for the sake of brevity; however, they can be found in the validation report [23]. The main conclusions are as follows. • Collection 300 m V1 shows overall good spatial consistency with respect to Collection 1 km products, with some discrepancies mainly observed over areas where the inputs are more likely to show large uncertainties. Low spatial consistency is found with respect to MODIS C6, mainly for fAPAR products.

•
Collection 300 m V1 can be used to monitor the temporal evolution of vegetation with improved spatial resolution, good precision, a proper dynamic range (i.e., from 0 values for bare areas to maximum values for the densest canopies) and similar seasonal and inter-annual variations to reference products. Furthermore, it corrects some artefacts detected in PROBA-V Collection 1 km V1 for bare areas and northern latitudes.

•
Collection 300 m V1 shows low overall discrepancies (RMSD) with respect to ground reference data with a slight positive bias for LAI and fAPAR while fCOVER displays large systematic deviations with ground measurements. All variables are overestimated over paddy rice fields, which has been observed for other satellite reference products, and similarly low performance is expected for flooded vegetated targets. The user should use the products with caution for applications related to such ecosystems.
The relevance of the spatial and temporal consistency assessment is limited by the fact that no truly independent datasets have been considered, and because reference satellite products could suffer of the same inconsistencies, as we have observed over the Albufera site. Future work should include independent dataset (e.g., using high resolution imagery or temporal variations of vegetation derived at ground level) to assess the spatial and temporal consistency of the Collection 300 m dataset. Moreover, the relevance of the accuracy assessment using ground measurements is limited by the low number of appropriate ground reference datasets. In this context, the Ground-Based Observations for Validation (GBOV) [72] component of CGLS, which aims at facilitating the use of measurements from operational ground-based monitoring networks and their comparison to Earth observations products, will make a significant contribution to validation activities. Once the GBOV methodologies, in particular the up-scaling approaches, are validated by independent experts, the database will be a very useful reference for accuracy assessments of remotely sensed biophysical products.
Currently based on PROBA-V data, the production of Collection 300 m V1 will continue at the end of the operational PROBA-V mission (April 2020) using Sentinel-3 OLCI data as input. The retrieval methodologies are being updated to ensure the consistency of the time series.