Retrieval of High Spatiotemporal Resolution Leaf Area Index with Gaussian Processes , Wireless Sensor Network , and Satellite Data Fusion

Many applications, including crop growth and yield monitoring, require accurate long-term time series of leaf area index (LAI) at high spatiotemporal resolution with a quantification of the associated uncertainties. We propose an LAI retrieval approach based on a combination of the LAINet observation system, the Consistent Adjustment of the Climatology to Actual Observations (CACAO) method, and Gaussian process regression (GPR). First, the LAINet wireless sensor network provides temporally continuous field measurements of LAI. Then, the CACAO approach generates synchronous reflectance data at high spatiotemporal resolution (30-m and 8-day) from the fusion of multitemporal MODIS and high spatial resolution Landsat satellite imagery. Finally, the GPR machine learning regression algorithm retrieves the LAI maps and their associated uncertainties. A case study in a cropland site in China showed that the accuracy of LAI retrievals is 0.36 (12.7%) in terms of root mean square error and R2 = 0.88 correlation with ground measurements as evaluated over the entire growing season. This paper demonstrates the potential of the joint use of newly developed software and hardware technologies in deriving concomitant LAI and uncertainty maps with high spatiotemporal resolution. It will contribute to precision agriculture, as well as to the retrieval and validation of LAI products.


Introduction
The leaf area index (LAI) describes the surface area available per unit ground surface for energy and mass exchanges between vegetation and the atmosphere.It plays a key role in several surface processes, including photosynthesis, respiration, and evapotranspiration.Spatiotemporal continuous LAI data is critical for climate model validation [1], regional to global scale carbon budget estimation [2], disturbance detection [3], and many other applications.Remote sensing is the only tool that can effectively generate spatiotemporal continuous LAI products [4,5].The methods to estimate LAI values from satellite data can be divided into empirical methods based on statistical relationships with ground measurements [6][7][8], physical methods based on the inversion of radiative transfer (RT) models [9][10][11], and hybrid methods that combine statistical methods with RT models [12][13][14].The physical and hybrid methods are increasingly used in the scientific community because of their strong generalization ability [15,16].However, the inversion of a canopy RT model is an under-determined problem because of the limited information content of the radiometric signal compared to the high number of unknowns influencing the canopy reflectance [17].The ill-posed nature of the inversion problem-several sets of input variables can yield very similar spectra [18,19]-hampers the application of the physical and hybrid methods [18], so empirical methods are still popular, especially in regional-scale studies.
In situ LAI measurements, required in the calibration of the empirical methods, are usually derived from optical instruments [20][21][22], including digital hemispheric photographs [23,24], TRAC [25], LAI-2000 Plant Canopy Analyzer [26], or its successor LAI-2200.The manual nature of the optical instruments makes the traditional in situ measurements time-consuming and labor-intensive.The emergence of more portable or even fully automatic observation instruments, such as LED sensors [27] and smartphones [28,29], represents a change in paradigm in near-surface remote sensing.
Several operational LAI products, including MODIS [30], GLASS [31], and Copernicus Global Land [32,33] products, are regularly provided by space agencies over long periods at high temporal resolution (between 8-and 15-day frequency) but at coarse spatial resolution (between 300 m and 8 km).On the other hand, several studies have generated LAI maps with high spatial resolution but with a limited temporal sampling [34][35][36].In fact, the technical constraints of remote sensing instruments [37] involves a tradeoff between spatial and temporal resolution for remote sensing application.This hinders the utility of LAI products in areas with high spatial heterogeneity or substantial temporal variations [38][39][40].LAI time series with high spatiotemporal resolution are urgently needed for characterizing terrestrial ecosystem processes at the parcel level.
In a previous work [8], we proposed a framework for the retrieval of high spatiotemporal LAI products based on the combined use of wireless sensor network (WSN) and data blending (DB) technologies.The LAINet observation system [41,42] was used to estimate temporally continuous field measurements at high temporal frequency.The Consistent Adjustment of the Climatology to Actual Observations (CACAO) [43] was used as a DB technique to produce fine spatial resolution satellite imagery.Then, an exponential function relating the reconstructed remote sensing observations to the LAINet measurements was calibrated and used to predict LAI maps with high spatiotemporal resolution.However, this approach has two main limitations: (i) accuracy of LAI retrievals not meeting the user requirements (15%) proposed by the Global Climate Observing System [44] and (ii) non-retrieval of the uncertainty associated with the LAI maps.
Characterizing the uncertainties associated with LAI maps is required in many applications including the assimilation into land surface models [45,46].Some of the existing LAI products provide uncertainty estimates.In the MODIS LAI products, the uncertainty is the standard deviation over all acceptable solutions of the inversion process [4].In other cases, such as Copernicus Global Land 1 km Version 1 LAI products [32], the uncertainty is estimated using a dedicated algorithm that is independent of the main LAI retrieval algorithm.The rapid development of machine learning regression methods, particularly those based on Bayesian approaches, sheds light on the simultaneous estimation of LAI and uncertainty in a single algorithm [47].Gaussian process regression (GPR) [48,49] has been reported to improve accuracy in LAI retrieval and provide uncertainty estimates directly through Gaussian probabilities [7].
We aim to develop a method to generate more accurate LAI time series and their associated uncertainties at high spatiotemporal resolution.This will contribute to the temporally continuous validation of coarse spatial resolution LAI products and the assimilation of LAI estimates into land surface models, for which high spatiotemporal resolution maps of both LAI estimations and the corresponding uncertainties are needed.

Study Area
The research was conducted in a 5 km × 5 km region in Huailai, northern China (Figure 1).The annual average precipitation and temperature are 396 mm and 9.6 • C, respectively.Croplands are the main land cover type, and the growing season typically spans from mid-May to late September [50].The remaining land cover types include water in the northwest and impervious surfaces scattered over the study area.We aim to develop a method to generate more accurate LAI time series and their associated uncertainties at high spatiotemporal resolution.This will contribute to the temporally continuous validation of coarse spatial resolution LAI products and the assimilation of LAI estimates into land surface models, for which high spatiotemporal resolution maps of both LAI estimations and the corresponding uncertainties are needed.

Study Area
The research was conducted in a 5 km × 5 km region in Huailai, northern China (Figure 1).The annual average precipitation and temperature are 396 mm and 9.6 °C, respectively.Croplands are the main land cover type, and the growing season typically spans from mid-May to late September [50].The remaining land cover types include water in the northwest and impervious surfaces scattered over the study area.

Landsat 8 OLI
We used Landsat 8 Operational Land Imager (OLI) data acquired through the Earth Resources Observation and Science Center Science Processing Architecture (ESPA) (https://espa.cr.usgs.gov).ESPA provides top of canopy reflectances based on the Second Simulation of the Satellite Signal in the Solar Spectrum Vectorial (6SV) atmospheric correction algorithm [51].Spectral bands in the blue, green, red, near infrared (NIR), and the two shortwave infrared (SWIR1 and SWIR2) bands were used (Table 1).

Landsat 8 OLI
We used Landsat 8 Operational Land Imager (OLI) data acquired through the Earth Resources Observation and Science Center Science Processing Architecture (ESPA) (https://espa.cr.usgs.gov).ESPA provides top of canopy reflectances based on the Second Simulation of the Satellite Signal in the Solar Spectrum Vectorial (6SV) atmospheric correction algorithm [51].Spectral bands in the blue, green, red, near infrared (NIR), and the two shortwave infrared (SWIR1 and SWIR2) bands were used (Table 1).The dataset includes six cloud-free Landsat 8 OLI scenes (path: 123; row: 32) acquired on the day of year (DOY) 139, 187, 235, 267, 299, and 315 for the year 2013 (Figure 2).This period corresponds to the growing season of the study area (Figure 4).Note that only two Landsat 8 OLI scenes (acquired on DOY 187 and 235) are within the period of acquisition of ground data (from July 1 to September 14, 2013, as detailed in Section 2.3.1), that is, only two ground-based LAI maps could be generated from Landsat data without data blending.The dataset includes six cloud-free Landsat 8 OLI scenes (path: 123; row: 32) acquired on the day of year (DOY) 139, 187, 235, 267, 299, and 315 for the year 2013 (Figure 2).This period corresponds to the growing season of the study area (Figure 4).Note that only two Landsat 8 OLI scenes (acquired on DOY 187 and 235) are within the period of acquisition of ground data (from July 1 to September 14, 2013, as detailed in Section 2.3.1), that is, only two ground-based LAI maps could be generated from Landsat data without data blending.

MCD43A4
We also used the nadir directionally normalized reflectance MODIS Nadir Bidirectional Reflectance Distribution Function (BRDF) Adjusted Reflectance product (MCD43A4) [52].It is produced at 500 m spatial resolution every 8 days using a compositing window of 16 days with data from both Aqua and Terra satellites.All the scenes in 2013 covering the study area (tile number of h24v04) were downloaded from the Land Processes Distributed Active Archive Center (LP DAAC) (https://lpdaac.usgs.gov).The cloud-contaminated scenes were all filtered out, resulting in 36 clean scenes with a full coverage of the growing season of the study area (see Figure 2).The six bands in the MCD43A4 product analogous to the Landsat 8 OLI were selected (Table 1).The MCD43A4 data were resampled to 30-m spatial resolution using the nearest neighbor sampling method.(i) LAINet provided the LAI field measurements for calibrating and validating GPR;

Retrieval Algorithm
(ii) CACAO was used for blending the Landsat 8 OLI and MODIS MCD43A4 products and generating reflectance data with high spatiotemporal resolution; and, (iii) GPR was used for the retrieval of LAI and uncertainty maps.

MCD43A4
We also used the nadir directionally normalized reflectance MODIS Nadir Bidirectional Reflectance Distribution Function (BRDF) Adjusted Reflectance product (MCD43A4) [52].It is produced at 500 m spatial resolution every 8 days using a compositing window of 16 days with data from both Aqua and Terra satellites.All the scenes in 2013 covering the study area (tile number of h24v04) were downloaded from the Land Processes Distributed Active Archive Center (LP DAAC) (https://lpdaac.usgs.gov).The cloud-contaminated scenes were all filtered out, resulting in 36 clean scenes with a full coverage of the growing season of the study area (see Figure 2).The six bands in the MCD43A4 product analogous to the Landsat 8 OLI were selected (Table 1).The MCD43A4 data were resampled to 30-m spatial resolution using the nearest neighbor sampling method.(i) LAINet provided the LAI field measurements for calibrating and validating GPR; (ii) CACAO was used for blending the Landsat 8 OLI and MODIS MCD43A4 products and generating reflectance data with high spatiotemporal resolution; and, (iii) GPR was used for the retrieval of LAI and uncertainty maps.

Retrieval Algorithm
uncertainty maps with high spatiotemporal resolution.It includes LAINet, CACAO, and GPR methods: (i) LAINet provided the LAI field measurements for calibrating and validating GPR; (ii) CACAO was used for blending the Landsat 8 OLI and MODIS MCD43A4 products and generating reflectance data with high spatiotemporal resolution; and, (iii) GPR was used for the retrieval of LAI and uncertainty maps.

LAINet Field Measurements
Field measurements were collected through the LAINet observation system.LAINet comprises 12 below nodes which receive the transmitted solar radiation below the canopy (E B ) (see Figure 1 for their spatial distribution), and 1 above node which receives downward radiation above the canopy (E A ) [42].The LAI estimation is based on gap fraction measurements calculated as E B /E A .For details regarding LAINet, please refer to Qu et al. ( 2014) and our previous work [8].
Ground measurements were collected from July 1 to September 14, 2013 (see Figure 2).To reduce the random errors, the estimated LAI values from each node of the LAINet observation system were averaged over an 8-day interval.The 8-day composite measurements are hereafter indicated by the DOY of the first day of the compositing period.Due to unexpected instrument failures and unsatisfactory weather conditions, some LAINet measurements were invalid.Only four nodes (numbered 3, 5, 8, and 12 in Figure 1) had valid LAI values for every date.A total of 67 measurements were finally available.

CACAO Reconstruction of High Spatiotemporal Satellite Data
The CACAO [43] method was used to blend Landsat 8 OLI data of high spatial resolution and MCD43A4 data of high temporal resolution, and to reconstruct high spatiotemporal resolution reflectance data.
CACAO is a temporal smoothing and gap filling method with temporal constraints [43].It fits a phenology model to the actual observations based on a two-step process: (i) A phenology model was first constructed per pixel and per band based on the temporal evolution of MCD43A4 reflectance data.The time series of MCD43A4 reflectance were smoothed using Savitzky-Golay (SG) filtering [53] to generate the phenology model.SG smooths the time series by fitting a low-degree polynomial in the local temporal window.In this study, the degree of the smoothing polynomial was fixed at 2 and the half-width of the smoothing window at 16 days.(ii) The phenology model was then fitted to the actual Landsat observations.It was shifted and scaled in order to minimize its difference with the Landsat 8 OLI data in terms of the root mean square error (RMSE): where R OLI represents the Landsat 8 OLI reflectance, R pm represents the reflectance of the phenology model, n is the number of available Landsat 8 OLI observations (in this study, n = 6), t is the time in days, and scale and shift are the parameters to be adjusted.The minimization of the RMSE was done pixel by pixel, and band by band.After obtaining the values of scale and shift for each pixel in each band, Landsat-like reflectance can be computed at any arbitrary date from the MODIS phenology model.
The CACAO fitting is illustrated in Figure 4. Generally, the Landsat 8 OLI reflectance and the MODIS phenology model (SG-filtered MCD43A4 reflectance) show a similar seasonality with some differences in terms of magnitude and phase.The reconstructed reflectance retains the seasonality of the phenology model and fits the Landsat 8 OLI reflectance values.
The reconstructed reflectances have an 8-day temporal frequency and a 30-m spatial resolution.Their temporal and spatial characteristics inherit from MCD43A4 and Landsat 8 OLI data, respectively.The time series spans from DOY 121 to 321, 2013, covering the entire growing season in our study area.

GPR LAI and Uncertainty Retrieval
GPR can learn the complex relationship between input and output variables based on a probabilistic approach [49].In addition to highly accurate retrievals, GPR provides pixel-wise confidence intervals of the estimations [7].Details regarding GPR can be found in a series of studies from Verrelst et al. [7,46,54,55], and we provide only a brief description of GPR adopting their notation.
Under the GPR paradigm, the estimated LAI from GPR obey Gaussian distribution with mean and variance given by the following: where K* is a vector used for characterizing the similarity between the test and training samples, and K** is the prior covariance.A positive term is subtracted from the prior covariance to estimate the final variance.Therefore, the posterior variance is always smaller than the prior variance, since the observations contribute some additional information [49].Theoretically, the GPR-estimated uncertainty depends on the training reflectance.If the test reflectance is similar to the training reflectance, the uncertainty for the test set is relatively small because we are dealing with similar input features [46].In this study, standard deviation (SD), namely, the square root of the predictive variance, was used as a proxy of the retrieval uncertainty.
In Equations ( 2) and (3), K is a kernel function that evaluates the similarity between the test reflectance x and all training reflectance values.In this study, the automatic relevance determination kernel was used: where v is a scaling factor, B is the dimensions of the input (in this study, B = 6), σb controls the spread of the relations for each dimension of the input, and high values of σb mean lower informative content of the corresponding dimension for the regression and vice versa, σn is the noise standard deviation, and δij is Kronecker's symbol.
Model hyper-parameters involved in GPR were automatically optimized by maximizing the marginal likelihood in the training set [49].In this study, GPR was implemented using the simpleR

GPR LAI and Uncertainty Retrieval
GPR can learn the complex relationship between input and output variables based on a probabilistic approach [49].In addition to highly accurate retrievals, GPR provides pixel-wise confidence intervals of the estimations [7].Details regarding GPR can be found in a series of studies from Verrelst et al. [7,46,54,55], and we provide only a brief description of GPR adopting their notation.
Under the GPR paradigm, the estimated LAI from GPR obey Gaussian distribution with mean and variance given by the following: where K * is a vector used for characterizing the similarity between the test and training samples, and K ** is the prior covariance.A positive term is subtracted from the prior covariance to estimate the final variance.Therefore, the posterior variance is always smaller than the prior variance, since the observations contribute some additional information [49].Theoretically, the GPR-estimated uncertainty depends on the training reflectance.If the test reflectance is similar to the training reflectance, the uncertainty for the test set is relatively small because we are dealing with similar input features [46].In this study, standard deviation (SD), namely, the square root of the predictive variance, was used as a proxy of the retrieval uncertainty.
In Equations ( 2) and (3), K is a kernel function that evaluates the similarity between the test reflectance x and all training reflectance values.In this study, the automatic relevance determination kernel was used: where v is a scaling factor, B is the dimensions of the input (in this study, B = 6), σ b controls the spread of the relations for each dimension of the input, and high values of σ b mean lower informative content of the corresponding dimension for the regression and vice versa, σ n is the noise standard deviation, and δ ij is Kronecker's symbol.Model hyper-parameters involved in GPR were automatically optimized by maximizing the marginal likelihood in the training set [49].In this study, GPR was implemented using the simpleR MATLAB toolbox, freely available at https://github.com/IPL-UV/simpleR(accessed on 18 January 2019).
The training dataset is the prerequisite to train a GPR suitable for LAI and uncertainty retrieval.The 67 LAINet measurements and the associated reconstructed Landsat-like reflectances were therefore collected and added to the training dataset.Our study area, beyond the cropping period, is characterized by seasonal inundation and bare soil after harvest.Two dry-soil and two flooded-soil spectral reflectances were therefore added to the training database (with LAI values equal to zero) to improve its representativeness.This treatment was recommended to improve the accuracy of GPR, especially for the input reflectances without associated in situ measurements [34,46].This resulted in the final training dataset made of 71 LAI-reflectance pairs.The trained GPR was used to generate the concomitant LAI and uncertainty maps with 30-m and 8-day resolutions.

Validation Approach
The proposed framework approach requires accurate reconstruction of reflectance data at high spatial resolution.The accuracy of the reconstructed reflectances was first assessed (Section 3.1).The original reflectances from the six available Landsat 8 OLI images (Figure 2) were used as the reference data.
We conducted a leave-one-out cross-validation to evaluate the performance of GPR in LAI retrieval (Section 3.2).The n = 67 number of pairs of LAINet-reflectances were split in n-1 for the training and 1 for the validation.Then, the calibrated GPR was used to predict LAI for the observation left out.This process was repeated for all the samples.Finally, we evaluated the performance comparing the predicted LAI with the ground observations for all the samples.For comparison purposes, we also assessed the performance of the exponential regression (ER) approach proposed in our previous work [8].Both approaches share the same three-step flowchart described in Figure 3, but with some major updates: (1) the exponential regression (y = 0.21exp(3.44x))proposed in [8] has now been replaced with GPR, to generate LAI and uncertainty maps from ground measurements and high spatiotemporal reflectance data; (2) spectral reflectance was directly ingested into GPR, instead of Normalized Difference Vegetation Index (NDVI) ingested into ER in the previous work; (3) SG filtering was used to generate the phenology model in CACAO, rather than the Double Logistic function adopted in the previous work.
Finally, the temporal evolution of the predicted LAI and associated uncertainties was evaluated (Section 3.3).

Evaluation of the Reconstructed High Spatiotemporal Satellite Data
For brevity, Figure 5 only shows the histograms of residuals between the CACAO reconstructed reflectances and OLI observed reflectances on DOY 187 and 235, which are temporally synchronous with the LAINet measurements.The comparison for other dates is shown in Appendix A. In general, the distributions of residuals have peak shapes centered at 0 without significant bias, confirming a good agreement between the reconstructed and the observed reflectances.

Cross-validation
The results of the cross-validation for ER and GPR approaches are shown in Figure 6.Compared to our previous ER approach [8] (R 2 = 0.70, RMSE = 0.52, relative RMSE (R-RMSE) = 18.4%),GPR showed improved performances (R 2 = 0.88, RMSE = 0.36, relative RMSE = 12.7%).In addition, ER clearly underestimated LAI observations for LAI values higher than 4.0 because of the saturation of NDVI.This underestimation phenomenon for the high LAI values was clearly mitigated in GPR.

Temporal Evolution of LAI and Uncertainty Retrievals
The LAI maps (Figure 7) provide detailed information of the expected spatiotemporal dynamics of vegetation in the study area.Before green-up (DOY 161), consistent low LAI values close to zero are observed.After green-up, the vegetation grew rapidly, and reached the growing peak on DOY 217.During this period (from DOY 201 to 217), areas around the water (in the northwestern part of the study area) were characterized by low LAI values (<2.0) due to later sowing.After the growing peak on DOY 217, the vegetation began to wither, and the LAI decreased gradually to zero.During

Cross-Validation
The results of the cross-validation for ER and GPR approaches are shown in Figure 6.Compared to our previous ER approach [8] (R 2 = 0.70, RMSE = 0.52, relative RMSE (R-RMSE) = 18.4%),GPR showed improved performances (R 2 = 0.88, RMSE = 0.36, relative RMSE = 12.7%).In addition, ER clearly underestimated LAI observations for LAI values higher than 4.0 because of the saturation of NDVI.This underestimation phenomenon for the high LAI values was clearly mitigated in GPR.

Cross-validation
The results of the cross-validation for ER and GPR approaches are shown in Figure 6.Compared to our previous ER approach [8] (R 2 = 0.70, RMSE = 0.52, relative RMSE (R-RMSE) = 18.4%),GPR showed improved performances (R 2 = 0.88, RMSE = 0.36, relative RMSE = 12.7%).In addition, ER clearly underestimated LAI observations for LAI values higher than 4.0 because of the saturation of NDVI.This underestimation phenomenon for the high LAI values was clearly mitigated in GPR.

Temporal Evolution of LAI and Uncertainty Retrievals
The LAI maps (Figure 7) provide detailed information of the expected spatiotemporal dynamics of vegetation in the study area.Before green-up (DOY 161), consistent low LAI values close to zero are observed.After green-up, the vegetation grew rapidly, and reached the growing peak on DOY 217.During this period (from DOY 201 to 217), areas around the water (in the northwestern part of the study area) were characterized by low LAI values (<2.0) due to later sowing.After the growing peak on DOY 217, the vegetation began to wither, and the LAI decreased gradually to zero.During

Temporal Evolution of LAI and Uncertainty Retrievals
The LAI maps (Figure 7) provide detailed information of the expected spatiotemporal dynamics of vegetation in the study area.The retrieved uncertainty maps (Figure 8) provide insight into the confidence of the estimated LAI.Before green-up, the middle area covering the LAINet nodes was characterized by high uncertainty (SD >0.8).LAI maps near and during the period of implementation of the LAINet (from DOY 185 to 233) were more credible (SD <0.7).During the dormancy period (e.g., DOY 289-321), LAI retrievals in the inundated region were uncertain (SD >0.9).The retrieved uncertainty maps (Figure 8) provide insight into the confidence of the estimated LAI.Before green-up, the middle area covering the LAINet nodes was characterized by high uncertainty (SD > 0.8).LAI maps near and during the period of implementation of the LAINet (from DOY 185 to 233) were more credible (SD < 0.7).During the dormancy period (e.g., DOY 289-321), LAI retrievals in the inundated region were uncertain (SD > 0.9).The temporal evolution of GPR and ER retrievals was further evaluated over the nodes 3, 5, 8 and 12 of LAINet, which provided full temporal coverage for field measurements (Figure 9).In general, both the GPR and ER estimates had similar temporal patterns and followed the temporal dynamics of the observed LAI.However, three main differences can be clearly observed: (1) the local temporal variations in ground LAI measurements were well captured by GPR, whereas the ER oversmoothed the LAI time series; (2) the magnitude of LAI values before green-up (DOY <161) was close to zero for GPR as expected, whereas ER slightly over-estimated the LAI; (3) during the withering period between DOY 257 and 297, the GPR-estimated LAIs were larger than the ER-estimated ones.Note that nearly all field observations (except the one for plot 8 on DOY 185) lie within the confidence interval of GPR predictions.The temporal evolution of GPR and ER retrievals was further evaluated over the nodes 3, 5, 8 and 12 of LAINet, which provided full temporal coverage for field measurements (Figure 9).In general, both the GPR and ER estimates had similar temporal patterns and followed the temporal dynamics of the observed LAI.However, three main differences can be clearly observed: (1) the local temporal variations in ground LAI measurements were well captured by GPR, whereas the ER over-smoothed the LAI time series; (2) the magnitude of LAI values before green-up (DOY < 161) was close to zero for GPR as expected, whereas ER slightly over-estimated the LAI; (3) during the withering period between DOY 257 and 297, the GPR-estimated LAIs were larger than the ER-estimated ones.Note that nearly all field observations (except the one for plot 8 on DOY 185) lie within the confidence interval of GPR predictions.

Improvements Over Previous Approach
This study is based on our previous approach [8] with three major updates: (1) the exponential regression was replaced with GPR, to generate LAI and uncertainty maps from ground measurements and high spatiotemporal reflectance data; (2) spectral reflectance was directly ingested into the transfer function, instead of the vegetation index in the previous work; (3) SG filtering was used to generate the phenology model in CACAO, rather than the Double Logistic function adopted previously.These updates bring several improvements.First, the accuracy of the retrievals has improved.The R 2 increased from 0.70 to 0.88, and the RMSE decreased from 0.52 LAI (18.4%) to 0.36 LAI (12.7%), meeting the accuracy requirements proposed by the Global Climate Observing System [44].Second, the temporal evolution of LAI is better captured by GPR, whereas ER over-smooths the time series.Further, GPR shows a better temporal extrapolation capacity than ER, particularly for the dormancy period with LAI retrievals close to the expected zero LAI value.This allows the generation of temporally continuous LAI maps over the entire growing season.Finally, GPR provides the uncertainty associated with the LAI retrievals [46,54].
Several reasons can explain this improvement.(1) GPR is an advanced regression method and can learn the complex nonlinear relationship between remote sensing observations and biophysical variables [48,49].Many studies have shown that GPR outperforms parametric regression methods in most cases [5,7,34].Our study further confirms this finding.(2) GPR has the capability to directly exploit the full spectrum [55] and, for this reason, individual bands rather than the derived vegetation index were used in our GPR.The procedure for calculating the vegetation index (i.e., the selection of bands and the formulation of the selected bands) would cause loss of information useful for LAI retrieval [56].(3) SG filtering can reserve the relatively high frequency variation in the original reflectance time series [53], which is therefore suitable for capturing the local scale changes in the time series of LAI and preventing the over-smoothing observed in Double Logistic.( 4

Improvements Over Previous Approach
This study is based on our previous approach [8] with three major updates: (1) the exponential regression was replaced with GPR, to generate LAI and uncertainty maps from ground measurements and high spatiotemporal reflectance data; (2) spectral reflectance was directly ingested into the transfer function, instead of the vegetation index in the previous work; (3) SG filtering was used to generate the phenology model in CACAO, rather than the Double Logistic function adopted previously.These updates bring several improvements.First, the accuracy of the retrievals has improved.The R 2 increased from 0.70 to 0.88, and the RMSE decreased from 0.52 LAI (18.4%) to 0.36 LAI (12.7%), meeting the accuracy requirements proposed by the Global Climate Observing System [44].Second, the temporal evolution of LAI is better captured by GPR, whereas ER over-smooths the time series.Further, GPR shows a better temporal extrapolation capacity than ER, particularly for the dormancy period with LAI retrievals close to the expected zero LAI value.This allows the generation of temporally continuous LAI maps over the entire growing season.Finally, GPR provides the uncertainty associated with the LAI retrievals [46,54].
Several reasons can explain this improvement.(1) GPR is an advanced regression method and can learn the complex nonlinear relationship between remote sensing observations and biophysical variables [48,49].Many studies have shown that GPR outperforms parametric regression methods in most cases [5,7,34].Our study further confirms this finding.(2) GPR has the capability to directly exploit the full spectrum [55] and, for this reason, individual bands rather than the derived vegetation index were used in our GPR.The procedure for calculating the vegetation index (i.e., the selection of bands and the formulation of the selected bands) would cause loss of information useful for LAI retrieval [56].(3) SG filtering can reserve the relatively high frequency variation in the original reflectance time series [53], which is therefore suitable for capturing the local scale changes in the time series of LAI and preventing the over-smoothing observed in Double Logistic.(4) Two dry-soil and two flooded-soil spectral reflectances (with LAI values equal to zero) were introduced to establish an extended training dataset.The original training dataset consisting of only the 67 LAINet measurements under-sampled the low LAI values.The introduction of dry-and flooded-soil spectral reflectances improved the representativeness of the trained GPR for low LAI values, and therefore improved its temporal extrapolation capacity especially for dormancy periods.
The GPR-estimated LAI values are higher than the ER retrievals during the withering period (see Figure 9).One possible explanation is that the NDVI represents the greenness of the vegetation [57], and the gradual decrease of leaf chlorophyll content during the withering period [58] can therefore decrease the value of NDVI and further cause an underestimation of the LAI value.The temporal variation of leaf chlorophyll content (which further causes the variation of leaf spectra) underlines the importance of the direct use of multi-or even hyper-spectral reflectances, especially those containing information on leaf spectra.

Potential Applications
The LAI time series generated from the proposed framework are characterized by high spatiotemporal resolution and are accompanied with concomitant uncertainty maps.These datasets may be useful in many applications.
The generated high spatiotemporal resolution LAI maps can support the validation activities for the coarse resolution LAI products.Our framework complies with the hierarchical four-stage validation approach proposed by the Land Product Validation (LPV) subgroup of the Committee Earth Observing Satellites' Working Group on Calibration and Validation (CEOS WGCV) [59].Therefore, the generated LAI maps can directly serve as reference maps in validation activities.The obvious advantage of our framework over existing studies is that it can derive temporally continuous LAI reference maps.Therefore, it can be used to assess the temporal performance of the coarse resolution LAI products, which is the prerequisite for long-term global change research [60,61].
The derived high spatiotemporal resolution LAI maps can also be used to quantify the spatial and temporal variability of crop distribution and status at field scale.This kind of information is of paramount importance for precision agriculture [62].The high spatiotemporal resolution LAI maps can provide guidance on agricultural activities and aid farmers to decide, for example, which is the appropriate time to fertilize or irrigate their fields.
The retrieved uncertainty information would allow a proper use of the LAI retrieval.For example, the high reliable retrieval with low SD can be easily extracted for subsequent applications.The inconsistencies among existing coarse resolution LAI products call for quality-assured long-term LAI products [60,63].Our framework may provide an effective method to fulfill this target.
In addition, the concomitant uncertainty maps facilitate the integration of remote sensing and process models at field scale [64].For example, the LAI retrievals can be easily assimilated by crop models, because their contribution in the assimilation process can be automatically weighted according to the concomitant uncertainty maps.

Future Prospects of Research
This study proposed a framework to generate concomitant LAI and uncertainty maps with high spatiotemporal resolution.Several issues are worth being noted to further improve the framework.
High spatial resolution remote sensing observations that are temporally synchronous with field measurements are needed to reduce the uncertainty during LAI estimation caused by the dynamic change of vegetation [65].The tradeoff between temporal and spatial resolutions of a remote sensing sensor makes it difficult to acquire remote sensing observation with both high spatial resolution and frequent coverage.Data blending techniques were used here to obtain high spatial resolution remote sensing observations that were temporally synchronous with LAINet observations.In our study area, one year of MODIS data was enough to provide temporally continuous coverage although multi-year data may be required to build the phenology model in regions with high cloud coverage.Note that a spectral normalization between MCD43A4 and Landsat 8 OLI data was not applied.We assumed that the difference in their spectral response functions could be implicitly accounted for during the shifting and scaling procedure, considering the significant linear correlation between MODIS and OLI reflectance products [51].
An alternative to generate high spatiotemporal resolution remote sensing data is the combined use of multi-sensor data.For example, Zhao et al. [16] used HJ1/CCD and Landsat 8/OLI simultaneously to fill temporal gaps which existed in any one single data source.In addition, some newly launched satellites (e.g., GF-6 or Venµs) can provide remote sensing observations with both high spatial and temporal resolutions.The performance of our proposed method for multi-sensor or high spatiotemporal data will be evaluated in future studies.
The analysis of the uncertainty maps suggested that under-sampled observations (e.g., middle areas before green-up and inundated areas during the dormancy period, see Figure 9) are often associated with high uncertainty.Therefore, a longer temporal span and more below nodes should be added to the LAINet system in our study area to enhance the representativeness of the training dataset [50,66].To partly cope with the training dataset under-sample for low LAI, we deliberately extended the original training dataset with four new spectra that covered two main kinds of non-vegetated surfaces (i.e., dry and flooded soils) with an LAI value of zero.This extended training dataset improved the extrapolation capacity of GPR, especially for low values which were not sufficiently sampled by LAINet.Surprisingly, the estimates with low LAI also represent high uncertainty after introducing the non-vegetated spectra.This may be because the non-vegetated and low-LAI surfaces have significant heterogeneity [67] and four spectra cannot represent them very well.However, we did not try to add more non-vegetated spectra, because the number of LAINet measurements was limited (67), and more non-vegetated spectra would have diluted the information involved in LAINet measurements, and further reduced the retrieval accuracy.Another solution to improve the representativeness of the training dataset is to add reflectance simulated by RT models, which can cover a broader range of scenarios.As proposed by [12], the simulated reflectance can be imported through the joint Gaussian process to extrapolate the observed reflectance.
The predictive variance from GPR is an indicator of the real uncertainty; however, the physical meaning of this predictive variance is still not clear [46].Intuitively, it is an indicator of the similarity between the test and the training samples in the LAI space transferred from the reflectance space through a kernel function (see Equation ( 4)).The kernel used assumed that the measurement noise was independent from the signal (see Equation ( 4)), which does not hold in most of the problems.The heteroscedastic Gaussian processes [68] can be employed to further improve the estimation results in future studies.In addition, several other methods exist in the literature to quantify the uncertainty associated with retrievals including partial least squares regression [69], Markov chain Monte Carlo [70], random forest [71], and Kriging [72].These methods will be assessed in future studies.

Conclusions
We proposed a framework to generate long-term time series of LAI and their associated uncertainty maps with high spatiotemporal resolution.It was based on the combination of LAINet, CACAO, and GPR.The performance of the proposed method was evaluated over the entire growing season in a crop site in northern China.The accuracy of the retrieved LAI maps (0.36 LAI (12.7%) in terms of RMSE with R 2 = 0.88 correlation) met the up-to-date uncertainty threshold (15%) proposed by the Global Climate Observing System.To our knowledge, this is the first work combining wireless sensor network, data blending, and machine learning technologies for retrieving LAI and its uncertainty at high spatiotemporal resolution.Our framework will contribute to precision agriculture, as well as to the retrieval and validation of LAI products.

Figure 1 .
Figure 1.Map of the study area.The background image is the false color composition with bands 6-5-4 of the Landsat 8 Operational Land Imager (OLI) image acquired on August 23, 2013.The points and numbers indicate the locations of the twelve below nodes in the LAINet observation system deployed in the study area.

Figure 1 .
Figure 1.Map of the study area.The background image is the false color composition with bands 6-5-4 of the Landsat 8 Operational Land Imager (OLI) image acquired on August 23, 2013.The points and numbers indicate the locations of the twelve below nodes in the LAINet observation system deployed in the study area.

Figure 2 .
Figure 2. Temporal distributions of the MODIS and Landsat acquisitions during the year 2013 over the study area.The shaded area represents the period of acquisition of ground LAINet measurements.DOY: day of year.

Figure 3
Figure 3 shows the general outline of the proposed framework to generate concomitant LAI and uncertainty maps with high spatiotemporal resolution.It includes LAINet, CACAO, and GPR methods:(i) LAINet provided the LAI field measurements for calibrating and validating GPR;(ii) CACAO was used for blending the Landsat 8 OLI and MODIS MCD43A4 products and generating reflectance data with high spatiotemporal resolution; and, (iii) GPR was used for the retrieval of LAI and uncertainty maps.

Figure 2 .
Figure 2. Temporal distributions of the MODIS and Landsat acquisitions during the year 2013 over the study area.The shaded area represents the period of acquisition of ground LAINet measurements.DOY: day of year.

Figure 3
Figure 3 shows the general outline of the proposed framework to generate concomitant LAI and uncertainty maps with high spatiotemporal resolution.It includes LAINet, CACAO, and GPR methods:

Figure 3 .
Figure 3. Flowchart of the methodology combining LAINet, CACAO, and GPR for the generation of LAI and uncertainty maps with high spatiotemporal resolution.

Figure 4 .
Figure 4. Illustration of the CACAO fitting of the MODIS phenology model to actual Landsat observations for reconstructing high spatiotemporal reflectance data.Case of reflectance in the red band for the central pixel of the study area.The shift and scale fitting parameters are indicated.

Figure 4 .
Figure 4. Illustration of the CACAO fitting of the MODIS phenology model to actual Landsat observations for reconstructing high spatiotemporal reflectance data.Case of reflectance in the red band for the central pixel of the study area.The shift and scale fitting parameters are indicated.

Figure 5 .
Figure 5. Histograms of residuals between the CACAO reconstructed reflectances and the Landsat 8 OLI observed reflectances on DOY 187 (solid lines) and 235 (dashed lines) for the different spectral bands.The statistics refer to the relative differences (RD).

Figure 6 .
Figure 6.Cross-validation of (a) exponential regression (ER) predicted LAI, and (b) Gaussian process regression (GPR) with field LAI observations.The dashed line represents the 1:1 line.

Figure 5 .
Figure 5. Histograms of residuals between the CACAO reconstructed reflectances and the Landsat 8 OLI observed reflectances on DOY 187 (solid lines) and 235 (dashed lines) for the different spectral bands.The statistics refer to the relative differences (RD).

18 Figure 5 .
Figure 5. Histograms of residuals between the CACAO reconstructed reflectances and the Landsat 8 OLI observed reflectances on DOY 187 (solid lines) and 235 (dashed lines) for the different spectral bands.The statistics refer to the relative differences (RD).

Figure 6 .
Figure 6.Cross-validation of (a) exponential regression (ER) predicted LAI, and (b) Gaussian process regression (GPR) with field LAI observations.The dashed line represents the 1:1 line.

Figure 6 .
Figure 6.Cross-validation of (a) exponential regression (ER) predicted LAI, and (b) Gaussian process regression (GPR) with field LAI observations.The dashed line represents the 1:1 line.
Before green-up (DOY 161), consistent low LAI values close to zero are observed.After green-up, the vegetation grew rapidly, and reached the growing peak on DOY 217.During this period (from DOY 201 to 217), areas around the water (in the northwestern part of the study area) were characterized by low LAI values (<2.0) due to later sowing.After the growing peak on DOY 217, the vegetation began to wither, and the LAI decreased gradually to zero.During this withering period, areas around the water were also characterized by rather low LAI values, because of seasonal inundation.Remote Sens. 2019, 11, x FOR PEER REVIEW 9 of 18 this withering period, areas around the water were also characterized by rather low LAI values, because of seasonal inundation.

Figure 7 .
Figure 7. Maps of the estimated LAI at high spatiotemporal resolution (30-m and 8-day).The white areas represent non-vegetated cover types.

Figure 7 .
Figure 7. Maps of the estimated LAI at high spatiotemporal resolution (30-m and 8-day).The white areas represent non-vegetated cover types.

Figure 8 .
Figure 8. Maps of the standard deviation (SD) associated with the LAI retrievals (Figure7).The white areas represent non-vegetated cover types.

Figure 8 .
Figure 8. Maps of the standard deviation (SD) associated with the LAI retrievals (Figure7).The white areas represent non-vegetated cover types.
) Two dry-soil and two flooded-soil spectral reflectances (with LAI values equal to zero) were introduced to establish an extended training dataset.The original training dataset consisting of only the 67 LAINet measurements under-sampled the low LAI values.The introduction of dry-and flooded-soil spectral

Figure A1 .
Figure A1.Histograms of residuals between the CACAO reconstructed reflectances and the Landsat 8 OLI observed reflectances on DOY 139 (red solid lines), 235 (blue dashed lines), 299 (orange dotted lines), and 315 (green dash-dotted lines) for the different spectral bands.The statistics refer to the relative differences (RD).

Table 1 .
Band numbers (BN), central wavelengths (CW), and full width at half maximum (FWHM) of the OLI and MODIS sensors for the selected spectral bands.

Table 1 .
Band numbers (BN), central wavelengths (CW), and full width at half maximum (FWHM) of the OLI and MODIS sensors for the selected spectral bands.