Performance of Linear and Nonlinear Two-Leaf Light Use Efficiency Models at Different Temporal Scales

: The reliable simulation of gross primary productivity (GPP) at various spatial and temporal scales is of significance to quantifying the net exchange of carbon between terrestrial ecosystems and the atmosphere. This study aimed to verify the ability of a nonlinear two-leaf model (TL-LUEn), a linear two-leaf model (TL-LUE), and a big-leaf light use efficiency model (MOD17) to simulate GPP at half-hourly, daily and 8-day scales using GPP derived from 58 eddy-covariance flux sites in Asia, Europe and North America as benchmarks. Model evaluation showed that the overall performance of TL-LUEn was slightly but not significantly better than TL-LUE at half-hourly and daily scale, while the overall performance of both TL-LUEn and TL-LUE were significantly better ( p < 0.0001) than MOD17 at the two temporal scales. The improvement of TL-LUEn over TL-LUE was relatively small in comparison with the improvement of TL-LUE over MOD17. However, the differences between TL-LUEn and MOD17, and TL-LUE and MOD17 became less distinct at the 8-day scale. As for different vegetation types, TL-LUEn and TL-LUE performed better than MOD17 for all vegetation types except crops at the half-hourly scale. At the daily and 8-day scales, both TL-LUEn and TL-LUE outperformed MOD17 for forests. However, TL-LUEn had a mixed performance for the three non-forest types while TL-LUE outperformed MOD17 slightly for all these non-forest types at daily and 8-day scales. The better performance of TL-LUEn and TL-LUE for forests was mainly achieved by the correction of the underestimation/overestimation of GPP simulated by MOD17 under low/high solar radiation and sky clearness conditions. TL-LUEn is more applicable at individual sites at the half-hourly scale while TL-LUE could be regionally used at half-hourly, daily and 8-day scales. MOD17 is also an applicable option regionally at the 8-day scale.


Introduction
Efforts to mitigate climate change require the stabilization of atmospheric CO2 concentrations [1], which is significantly regulated by exchanges of carbon between terrestrial ecosystems and the atmosphere. Terrestrial gross primary productivity (GPP) is the largest component of the global carbon flux [2] and about 120 Pg C year −1 globally, considerably larger than the carbon annually emitted by human activities (about 9 Pg C year −1 ) [3]. Consequently, even a small change in GPP is likely to have a significant impact on atmospheric CO2 concentration. Thus, accurately simulating terrestrial GPP is of great significance to quantifying the global carbon cycle and predicting the future trajectories of the atmospheric CO2 concentration.
Two approaches have been widely employed to investigate the spatial and temporal variability in GPP using remotely sensed data: (i) remote sensing driven process-based models, and (ii) light use efficiency (LUE) models [4]. The former is based on the mechanistic description of the photosynthetic biochemical processes and scales the Farquhar instantaneous leaf-level biochemical model [5] to the canopy level using big-leaf, two-leaf, and multilayer scaling approaches. A number of process-based models have been successfully applied to quantify spatial-temporal variations of GPP at regional and global scales using remotely sensed vegetation parameters, such as leaf area index (LAI) and land cover types, as inputs. However, the application of these process-based models is limited by the complexity and uncertainty of their parameterization [6].
In contrast, LUE models, such as CASA [7], MOD17 [8], EC-LUE [9], and VPM [10], were developed according to the LUE argument of Monteith [11,12] that productivity is linearly related to the amount of absorbed photosynthetically active radiation (APAR). A fundamental assumption underlying LUE models is that plant canopies behave like a big single-leaf and their LUE is independent of the directional nature of solar radiation and vegetation structure [13].
Many studies have indicated that both GPP and LUE vary with both quantity and quality of incoming solar radiation. Gu et al. [14] detected a 20% increase in Harvard Forest photosynthesis after the 1991 Pinatubo eruption owing to the increase of diffuse radiation caused by volcanic aerosols. Flux site data indicated that canopy LUE was enhanced under diffuse sunlight in comparison with that under direct radiation [13,15,16]. Choudhury et al. [17] estimated an increase of 110% in crop LUE under diffuse radiation. Alton et al. [18] conducted a study for three forest sites (two broadleaf and one needleleaf) and found that the canopy LUE was enhanced by 6-33% under diffuse radiation. Alton [19] indicated that the enhancement of canopy LUE due to diffuse radiation varied with vegetation types, most significantly for tundra shrubs. Cai et al. [20] found that a single-leaf LUE model performed very well for a 56-year-old Douglas-fir stand when, instead of using total incident photosynthetically active radiation (PAR), they used the sum of incident diffuse PAR and a relatively small fixed fraction (22%) of incident direct PAR. Recently, Zhang et al. [21] reported that canopy LUE generally decreased with increasing sky clearness index, which is the ratio of solar radiation observed on the ground to radiation received at the top of the atmosphere, over 5 ChinaFLUX sites, including a temperate forest, a subtropical forest, a tropical rain forest, and two grassland sites. Therefore, the assumption that LUE is independent of the quality of radiation and GPP linearly increases with absorbed photosynthetic radiation would induce underestimation/overestimation of GPP in cloudy/clear skies [22,23].
Conceptually, a canopy is composed of clumps of sunlit and shaded leaves exposed to different levels of irradiance and showing variable LUE. Sunlit leaves receive both direct and diffuse radiation while shaded leaves mainly interact with diffuse beams. Under clear skies, solar irradiance is high and dominated by direct beams. Sunlit leaves are easily light saturated, and photosynthesis can even decrease with increasing radiation because of elevated temperature and enhanced photorespiration [13]. Consequently, the overall LUE of sunlit leaves is normally low [23]. In contrast, a large number of shaded leaves are only exposed to diffuse radiation, which is normally much lower than the radiation saturation point. Therefore, the photosynthesis of shaded leaves is typically light-limited. Under cloudy conditions, solar irradiance is dominated by diffuse sunlight, allowing shaded leaves to capture a large fraction of the solar irradiance. Even though the total incident radiation may be lower than that on clear days, the apparent improvement of the LUE for shaded leaves could lead to the enhanced LUE for the whole canopy [15,19].
More and more process models now calculate photosynthesis for sunlit and shade leaves separately [24][25][26]. However, this strategy has not been adopted by LUE models. To remedy this limitation, He et al. [23] recently developed a two-leaf LUE model on the basis of the MOD17 model. This new model considered differences in radiation absorption and in LUE of sunlit and shaded leaves. Validation at 6 ChinaFLUX sites demonstrated the improvement of the two-leaf LUE models over the MOD17 model in simulating GPP, especially at forest sites, with a R 2 value increasing about 0.1 and a root mean square error (RMSE) value decreasing about 0.64 g C m −2 day −1 on average.
In the two-leaf LUE model developed by He et al. [23], GPP of sunlit and shaded leaves increases linearly with APAR. However, many studies have shown a nonlinear increase of photosynthesis of sunlit leaves with increasing APAR because of light saturation of photosynthesis, especially at short temporal scales (minutes to hours) [15,[27][28][29][30][31]. Recently, Wang et al. [6] developed a two-leaf temperature and vegetation type dependent rectangular hyperbolic model, which links quantum yield (α) and maximum photosynthetic rate (Pm) with the maximum carboxylation rate at 25 °C. The model is able to simulate GPP as accurately as a process-based model.
Previously, linear LUE models have been mostly used to calculate GPP at daily, 8-day, and even longer temporal scales [7][8][9][10]32,33]. Recently, this type of models is used to calculate GPP at short temporal scales. For example, Carbon Tracker, a system to optimize terrestrial carbon flux, uses the Carnegie-Ames Stanford Approach (CASA) biogeochemical model to calculate GPP every three hours as the prior carbon flux [34]. Such application of linear LUE models might induce biases in simulated GPP since observations have indicated the nonlinear response of canopy GPP to incoming PAR at short temporal scales [15,[27][28][29][30][31]. Therefore, the assessment of applicability of different types of LUE models in simulating GPP at different temporal scales is of great significance to improving the simulation of GPP using remote sensing data.
In this study, the ability of nonlinear two-leaf LUE, linear two-leaf LUE, and MOD17 to simulate GPP at half-hourly, daily, and 8-day temporal scales were verified using GPP derived from net ecosystem productivity (NEP) measured at globally distributed 58 sites as benchmarks. The main goals of this study are: (1) to compare the performance of the MOD17, linear and nonlinear two-leaf LUE models at three different temporal scales (half-hourly, daily and 8-day); (2) to analyze the possible causes for the different performances of three LUE models. For simplicity, the MOD17, linear two-leaf LUE, and nonlinear two-leaf LUE models will be referred to hereinafter as MOD17, TL-LUE, and TL-LUEn, respectively.

Data
In this study, we used meteorological data and ecosystem fluxes measured with the eddy covariance (EC) technique at 58 sites pertaining to the FLUXNET network and the processed MODIS leaf area index (LAI) product (MOD15A2) to simulate GPP at half-hourly, daily and 8-day temporal scales. The meteorological and flux data belongs to the LaThuile FLUXNET dataset and can be freely downloaded [35]. The sites were selected on the basis of the availability of key datasets, such as LAI, meteorology, and land surface C fluxes. GPP derived from tower measured NEP was used as benchmarks for model parameter optimization and model evaluation. All flux data were processed in the manner proposed within the Fluxnet project [36,37] as described by [38][39][40][41]. The 58 sites included 21 needle-leaf-forest (NF) sites, with 1 deciduous needle-leaf-forest (DNF) sites and 20 evergreen needle-leaf-forest (ENF) sites, 11 broadleaf-forest (BF) sites, with 2 evergreen broadleaf-forest (EBF) sites and 9 deciduous broadleaf-forest (DBF) sites, 4 mixed-forest (MF) sites, 7 crop (CROP) sites, 7 grassland (GRASS) sites, and 8 shrub (SHRUB) sites, located in Asia, Europe, and North America ( Figure 1). The observations covered the period from January 2001 to December 2007 with at least two years of data for each site. In total, 143 site-years of data were used, of which 85 site-years of data were selected for parameter optimization (17 BF, 11 CROP, 10 GRASS, 6 MF, 29 NF, and 12 SHRUB site-years). The remaining 58 site-years of data were used for model evaluation, with one year of data for each site. Detailed information about each site is given in Table 1.

Models Used
TL-LUEn, TL-LUE, and MOD17 models were used in this study. The MOD17 algorithm is described in detail in Running et al. [8,91]. It relies on the assumption that GPP is linearly related to APAR [11,12,92]. The TL-LUE model stems from the MOD17 algorithm and discriminates the differences of upper and bottom canopy in receiving direct radiation and diffuse radiation and in their LUE. As a consequence, canopy GPP simulated by TL-LUE nonlinearly changes with incoming PAR. The TL-LUEn adopts the same methodology as the TL-LUE model to separate sunlit and shaded leaves and calculate their APAR. However, it takes the rectangular hyperbolic model to calculate GPP for sunlit and shaded leaves. The MOD17, TL-LUE, and TL-LUEn models are described in Equations where εmax is the maximum LUE in MOD17; fPAR is the fraction of PAR absorbed by vegetation and calculated from LAI using the Beer's Law (fPAR=1-e −k*LAI , where k is the light extinction coefficient and set as 0.5 as [23]); εmsu and εmsh are the maximum LUE of sunlit and shaded leaves in TL-LUE, respectively; εm is the quantum yield when incident PAR approaches zero and β is the maximum canopy photosynthetic flux density at light saturation in TL-LUEn [13]; f(VPD) and g(Tamin) denote the constrains imposed by atmospheric vapor pressure deficit (VPD) and minimum air temperature, respectively, and are used to downscale the maximum LUE values to real ones. APARmsu and APARmsh are the PAR absorbed by sunlit and shaded leaves per unit LAI; LAImsu and LAImsh are the leaf area index for sunlit and shaded leaves. In above equations, the two attenuation scalars, f(VPD) and g(Tamin), range from 0 (total inhibition) to 1 (no inhibition) and are calculated using the same formulas for MOD17, TL-LUE, and TL-LUEn. Parameters VPDmax, VPDmin, Tamin_min, and Tamin_max used to calculate f(VPD) and g(Tamin) depend on vegetation types [91] and are listed in Table 2.
In Equations (2) and (3), APARmsu and APARmsh are calculated as: where α is the albedo varying with vegetation types (Table 2), PARdif and PARdir are the diffuse and direct components of incoming PAR, respectively, and they are empirically calculated (see Equation (6)); PARdif,u is the diffuse PAR under the canopy and calculated following [24]; (PARdif -PARdif,u)/LAI denotes the absorbed diffuse PAR per unit leaf area within the canopy; C indicates multiple scattering of total PAR within the canopy [24]; φ is the mean leaf-sun angle and set as 60º [24]; θ is the mean solar zenith angles of half an hour, a day, and 8 days. The average solar zenith angle in each half an hour is calculated according to latitude, Julian day, and local time [93]. The average solar zenith angle in a given day is calculated according to latitude and Julian day [24]. The 8-day average solar zenith angle is the mean of the daily average solar zenith angles during the 8 days period. Diffuse and direct PAR are empirically partitioned as [24]: where PARdif is the estimated diffuse PAR; R is the sky clearness index (R=S/(S0cosθ)), S and S0 are the incoming solar radiation on the ground surface and solar constant (1367 Wm −2 ), respectively. In the conversion of incoming solar radiation into PAR, a constant of 0.5 is used [23] (PAR=0.5S).
LAImsu and LAImsh in Equations (2) and (3) are calculated as [24]: where Ω is the clumping index, which changes with land cover types, season and solar zenith angle. It was only assigned according to vegetation types here ( Table 2) since spatially and temporally variant data are not available for this parameter.

Parameter Optimization
The derived GPP in 85 calibration site-years was used to optimize parameters in MOD17, TL-LUEn, and TL-LUE models. The optimization was implemented using the Markov chain Monte Carlo (MCMC) method [97,98]. In the optimization, three models were driven using the same locally measured meteorological data and smoothed MODIS LAI. The MCMC simulation, as a stochastic simulation method, is based on Bayesian Theory, in which parameters are random variables instead of deterministic, but unknown constants in the classic thoughts. The fundamental formula of Bayesian Theory is: where π(θ | x) is the posterior density of parameter  (a term distribution under the condition of given sample x); π(θ) is the prior distribution of parameter  (the knowledge possessed before measurement); To determine the posterior density π(θ|x), the prior density and the likelihood function should be given in advance. We specified the prior density function as a uniform distribution over the following ranges: The lower and upper limits of εm (unit: g C MJ −1 ) and  (unit: µg C m −2 s −1 ) were set according to the values compiled from 100 published datasets by Ruimy et al. [99]. The upper limits of εmsu, εmsh and εmax (unit: g C MJ −1 ) were assigned on the basis of previous findings [32,100]. In the optimization, these parameters were assumed uniformly distributed in the given limits with equal probability for all possible values. The likelihood function was specified according to the distribution of simulation errors, which were assumed following a multivariate Gaussian distribution with a zero mean. This assumption is commonly made in many studies [101][102][103]. With this assumption, the likelihood function can be written as: where Oi and Pi are the tower-derived GPP and simulated GPP, respectively; i  are the standard error of tower-derived GPP. The sampling of parameters was implemented using the Metropolis-Hastings (M-H) algorithm. To find an effective proposal distribution, we first made a test run of the algorithm with 50,000 simulations. Based on the test run, a Gaussian distribution N(0, cov 0 (θ)) was constructed (cov 0 (θ) is a diagonal matrix with its diagonal elements equal to the estimated variances of parameters θ). Then, the following proposal distribution was adopted to execute the consecutive MCMC simulations formally for 30,000 times: where θ k is the new parameters generated from its predecessor θ k−1 .
The running means and standard deviations of parameter samples need time to approach stable. For bettering the statistical analysis of parameters, we discarded the initial 10,000 samples in the burn-in period and only used the remaining 20,000 samples for further analysis of each parameter. The histograms of the samples for each parameter indicate these parameters were well constrained in most situations because the posterior density functions were near the normal distribution (see Figure A1 in Appendix). Uncertainties of estimated parameters were quantified with a 95% highest-probability density interval. Means of parameter θi were estimated as followings and used for model validation: where N is the number of samples in the M-H algorithm.

Parameter Sensitivity Analysis
Sensitivity of simulated GPP to parameters in three LUE models was analyzed using the factorial approach [104][105][106]. This method facilitates the statistically based representation of combinations of errors in several parameter sets. For a two-level complete factorial design, each of the model parameters is assigned upper and lower values based on specified perturbations of the magnitudes of the parameters, and the model is run using all combinations of parameter values. For n different parameters, this would require 2 n simulation runs. Each parameter here was perturbed by an arbitrary magnitude ±10% [105].
The main effect of a parameter, which is also referred to the parameter sensitivity, is calculated as the average difference between a run in which the parameter is at its upper level (+10%) and a run in which the parameter at its lower level (−10%), but other parameters remain unchanged. For example, there are 4 simulation runs for TL-LUEn when considering the parameters εm and β. They are both εm and β at their lower levels (simulation #1), εm at its upper level and β at its lower level (simulation #2), εm at its lower level and β at its upper level (simulation #3), and both εm and β at their upper levels (simulation #4). The main effect of εm in TL-LUEn are the average of the difference between simulation #2 and simulation #1, and the difference between simulation #4 and simulation #3. A larger value of main effect indicates higher sensitivity of simulated GPP.

Model Performance Assessment
The performance of TL-LUEn, TL-LUE, and MOD17 was assessed using root mean square error (RMSE) and determination coefficient (R 2 ). The paired t test was then conducted to evaluate the significance regarding the differences in R 2 , RMSE between TL-LUEn and TL-LUE, TL-LUEn and MOD17, and TL-LUE and MOD17 when all vegetation types lumped together at three temporal scales for model evaluation, respectively [13]. Table 3 shows the averages of optimized εm, εmax, εmsu and εmsh for 6 different vegetation types at half-hourly, daily and 8-day temporal scales, respectively. εm was generally larger than εmsh, εmsu, and εmax. It increased sizably when the temporal scales increasing from half-hourly to 8-day, especially for CROP. CROP always had the highest εm in all the three temporal scales. At the half-hourly scale, three forest types, including BF, MF and NF, had lower εm values than CROP and SHRUB, but higher than GRASS. Through metadata analysis for more than 100 published datasets, Ruimy et al. [99] reported that CROP has the highest εm (about 5.17 g C MJ −1 ) at the half-hourly scale, followed by forests (about 4.37 g C MJ −1 , mainly BF sites), and GRASS has the smallest one (about 2.71 g C MJ −1 ), similar to the identified changes of εm with vegetation types here.

Optimized Model Parameters
Optimized εmax was in between εmsh and εmsu for all vegetation types and at all temporal scales. εmsh is larger than εmsu and εmax due to the fact that shaded leaves are only exposed to diffuse radiation, which enters a canopy from all directions and distributes more evenly than direct radiation within the canopy [107,108]. The intensity of light absorbed by shaded leaves is normally lower than light saturation point. Thus, they have higher light use efficiency than sunlit leaves. The values of εmax, εmsu and εmsh showed smaller variations than εm across three temporal scales ( Table 3). As expected, CROP had the highest εmax, εmsu, and εmsh values, which are 1.78, 1.21 and 5.23 g C MJ −1 at the half-hourly scale, 1.80, 0.95 and 4.67 g C MJ −1 at the daily temporal scale, and 1.80, 0.96 and 4.26 g C MJ −1 at the 8-day scale, respectively. MF had the second largest ones, followed by BF and NF. GRASS had the lowest εmax, εmsu and εmsh values among all 6 vegetation types. The εmax, εmsu and εmsh of GRASS were lower than half of the corresponding values of CROP at the three temporal scales, respectively. In general, the average optimized εmax was close to the default values used in the MOD17 algorithm ( Table 2) for all vegetation types except CROP, which had much higher optimized εmax than the default (0.604 g C MJ −1 ). Many studies have indicated that the underestimation of CROP GPP by the MOD17 algorithm is mainly due to the low value of εmax used [109]. It has been reported that the mean LUE of croplands can approach 2.80 g C MJ −1 [110][111][112], slightly higher than the average value of about 2.0 g C MJ −1 optimized in this study.
The parameter β in the TL-LUEn model showed complex changes with temporal scales and vegetation types. At the half-hourly scale, non-forest types had a relatively higher β value than forests, which is consistent with the findings reported by Ruimy et al. [99] and Wang et al. [6]. At the daily temporal scale, GRASS had the highest β (286.28 µg C m −2 s −1 ), followed by CROP, NF, MF, SHRUB and BF. At  Table 3. Average, standard deviation, variation of coefficient (CV), and uncertainties range of optimized εm, β, εmsu, εmsh and εmax for 6 different vegetation types at half-hourly, daily and 8-day temporal scales (εm, β in TL-LUEn, εmsu, εmsh in TL-LUE, and εmax in MOD17). The uncertainty was quantified with a 95% highest-probability density interval and averaged over each biome.

Model Performance in Calibration Site-Years
At the half-hourly scale, TL-LUEn showed slightly better performance than TL-LUE when data in all 85 calibration site-years were lumped together (Figure 2). With the increase of temporal scales from half an hour to 8 days, the difference of TL-LUEn and TL-LUE became almost indistinguishable. The improvement of both TL-LUE and TL-LUEn over MOD17 was obvious at all three temporal scales. At the half-hourly scale, the average RMSE of MOD17 were larger than that of TL-LUEn by 8.1 mg C m −2 (30min) −1 , and the corresponding average R 2 was lower than that of TL-LUEn by 0.050 (Figure 2a,b). At the daily scale, MOD17 output an average RMSE higher by 0.3 g C m −2 day −1 and R 2 lower by 0.051 than TL-LUEn (Figure 2c,d). As to the 8-day scale, the average RMSE of MOD17 was 1.0 g C m −2 8days −1 larger than that of TL-LUEn, and the corresponding average R 2 was 0.025 lower than that of TL-LUEn (Figure 2e,f). At three different temporal scales, the number of site-years in the same RMSE and R 2 classes was similar for TL-LUEn and TL-LUE, confirming their similar ability to simulate GPP (see Figure 2). MOD17 performed poorer than TL-LUEn and TL-LUE in most site-years, indicated by larger RMSE and smaller R 2 . For example, at the half-hourly temporal scale, the number of site-years with small RMSE values (below 60 mg C m −2 (30min) −1 ), was 29 for MOD17, 40 for TL-LUEn, and 41 for TL-LUE, respectively. The R 2 of GPP simulated by MOD17 was mostly in the range of 0.5-0.8 (in 58 site-years) while the R 2 of GPP simulated by the TL-LUEn and TL-LUE mostly ranged from 0.7 to 0.9 (58 site-years for TL-LUEn and 59 site-years for TL-LUE).
TL-LUEn performed better than TL-LUE for most vegetation types except CROP at the half-hourly and daily scale. However, it performed no better than TL-LUE for MF, NF and SHRUB at the 8-day scale (see Figure 3). Both TL-LUEn and TL-LUE outperformed MOD17 for most vegetation types at three temporal scales except for CROP at the half-hourly scale. The superiority of TL-LUEn and TL-LUE over MOD17 was most significant at forest sites.

Model Performance at the Half-hourly Scale
Model evaluation shows that TL-LUEn performed slightly better than TL-LUE in simulating half-hourly GPP when data in all 58 validation site-years was lumped together (Figure 4a,b). The RMSE and R 2 of GPP simulated using TL-LUE against measurements averaged 64.3 mg C m −2 (30min) −1 and 0.732, respectively, while the corresponding values of TL-LUEn were 63.9 mg C m −2 (30min) −1 and 0.735, respectively. However, the differences in RMSE value between TL-LUEn and TL-LUE were not significant (p = 0.45) as well as the differences in R 2 between the two models (p = 0.27) ( Table 4). The performance of MOD17 was the poorest, with average RMSE and R 2 values equaled to 70.1 mg C m −2 (30min) −1 and 0.690, respectively. In addition, the differences in both the two statistics (RMSE, R 2 ) between TL-LUEn and MOD17, and TL-LUE and MOD17 were significant, with p values smaller than 0.0001 (Table 4). In 58 evaluation site-years, the RMSE of GPP simulated using TL-LUEn, TL-LUE, and MOD17 was larger than 80 mg C m −2 (30min) −1 at 10, 11 and 17 sites, respectively. The R 2 of GPP simulated using MOD17 mostly ranged from 0.5 to 0.8 (at 42 sites) while the R 2 of GPP simulated using TL-LUEn and TL-LUE was in the range from 0.7 to 0.9 at 39 and 40 sites, respectively (Figure 4a,b). TL-LUEn performed better than TL-LUE at 31 sites. The poorer performance of TL-LUEn relative to TL-LUE occurred at CROP, GRASS, SHRUB and NF sites. MOD17 only performed better than TL-LUE and TL-LUEn at 9 sites (Table A1 in the Appendix). Overall, TL-LUEn performed better than TL-LUE for BF, GRASS, MF and SHRUB, but poorer than TL-LUE for CROP and NF (see Figure 5). TL-LUEn and TL-LUE outperformed MOD17 for all vegetation types but CROP. The improvement of TL-LUE and TL-LUEn over MOD17 was most significant for forests (BF, MF and NF). Averaged over all forest sites, the RMSE of MOD17 was larger than those of TL-LUEn and TL-LUE by 10.1 mg C m −2 (30min) −1 and 8.8mg C m −2 (30min) −1 , respectively. The corresponding average R 2 of MOD17 was 0.063 and 0.060 lower than those of TL-LUEn and TL-LUE, respectively.

Model Performance at the Daily Scale
TL-LUEn showed better performance than TL-LUE at the daily scale when 58 validation site-year data was lumped together (Figure 4c,d). The average RMSE of GPP simulated by TL-LUEn and TL-LUE was both 1.7 g C m −2 day −1 . The average R 2 of GPP simulated by TL-LUEn was slightly larger than that of TL-LUE. Results of the paired t test on the differences in average RMSE value between TL-LUEn and TL-LUE were not significant (p = 0.75), as well as the differences in average R 2 value between the two models (p = 0.60) ( Table 4). The average RMSE value of MOD17 was 1.9 g C m −2 day −1 . The average R 2 value of MOD17 was smaller than the corresponding values of TL-LUEn and TL-LUE by 0.046 and by 0.045, respectively. In addition, the differences in average RMSE value and R 2 value between TL-LUEn and MOD17, and TL-LUE and MOD17 were significant, with p values were all smaller than 0.0001.
In 58 validation site-years, MOD17 produced larger RMSE and lower R 2 than TL-LUE and TL-LUEn at most sites. The RMSE of GPP simulated using MOD17, TL-LUE, and TL-LUEn was larger than 2.0 g C m −2 day −1 at 28, 15, and 15 sites, respectively. The values of R 2 above 0.9 occurred at only 3 sites for MOD17, at 13 sites for TL-LUE, at 16 sites for TL-LUEn, respectively (Figure 4c,d). TL-LUEn showed better ability to simulate GPP than TL-LUE at 29 sites. MOD17 only outperformed TL-LUEn and TL-LUE at 12 and 8 sites, respectively, mainly CROP, SHRUB and GRASS sites (Table A2 in the Appendix).
As to MF, NF and SHRUB, TL-LUEn performed with a higher RMSE and R 2 than TL-LUE. TL-LUEn only outperformed TL-LUE for BF, but performed poorer for CROP and GRASS (see Figure 5). Overall, TL-LUEn and TL-LUE outperformed MOD17 for forests and SHRUB. For three forest types, both TL-LUEn and TL-LUE outperformed MOD17. The average RMSE of both TL-LUEn and TL-LUE was 0.4 g C m −2 day −1 smaller than that of MOD17 while the average R 2 of TL-LUEn and TL-LUE was 0.063 and 0.057 higher than that of MOD17, respectively. As to CROP and GRASS, both average RMSE and R 2 of GPP simulated by TL-LUEn were higher than those of MOD17, while TL-LUE outperformed MOD17 with a smaller RMSE and a higher R 2 (see Figure 5).

Model Performance at the 8-day Scale
When data in all 58 validation site-years was lumped together, TL-LUEn performed similarly with TL-LUE at the 8-day scale (Figure 4e,f). The differences between the two models were significant (p < 0.05) in average RMSE value but was not significant (p = 0.33) in average R 2 value (Table 4). TL-LUEn outperformed MOD17 with significant differences (p < 0.05) in their average R 2 value but not significant differences in their RMSE value (p = 0.18), while TL-LUE outperformed MOD17 with significant differences (p < 0.0001) in their average RMSE value but not significant differences in their average R 2 value (p = 0.33) (Figure 4e,f, Table 4). However, the improvement of both TL-LUE and TL-LUEn over MOD17 was smaller in comparison with the improvement at half-hourly and daily temporal scales. MOD17 produced an average RMSE value of 13.0 g C m −2 (8days) −1 and an average R 2 value of 0.755. The average RMSE and R 2 of TL-LUE were 12.0 g C m −2 (8days) −1 and 0.775, respectively.
The RMSE of GPP simulated by TL-LUEn, TL-LUE and MOD17 were smaller than 12 g C m −2 (8days) −1 at 35, 34 and 27 sites. The three models had similar numbers of sites in each R 2 class. TL-LUEn performed poorer than TL-LUE at 32 sites, while MOD17 outperformed TL-LUEn and TL-LUE at 20 and 13 sites, respectively, which were mainly non-forest and NF sites ( Table A3 in the Appendix).
TL-LUEn only performed better than TL-LUE for BF. As to non-forest types (CROP, GRASS and SHRUB), TL-LUEn performed similarly with MOD17. The average RMSE and R 2 of the former were larger than those of the latter by 1.0 g C m −2 (8days) −1 and 0.012, respectively. TL-LUE outperformed MOD17 in all the three non-forest types. As to forests, both TL-LUEn and TL-LUE showed a better performance than MOD17 with an average RMSE smaller than that of MOD17 by 1.5 g C m −2 (8days) −1 and 1.6 g C m −2 (8days) −1 and corresponding average R 2 larger than that of MOD17 by 0.030 and 0.022, respectively.

The Ability of the Three LUE Models to Simulate GPP
At the half-hourly temporal scale, TL-LUEn and TL-LUE performed better than MOD17 for three types of forests (MF, BF, and NF), GRASS and SHRUB and their differences between these vegetation types are significant. With the increase of temporal scales, the improvement of TL-LUEn and TL-LUE over MOD17 gradually became less distinct. The changes in the ability of TL-LUEn, TL-LUE, and MOD17 to simulate GPP with vegetation types and temporal scales are, at least in part, related to the different structure of various vegetation types and the different response of canopy GPP to incident PAR described by three models. It was found that scaling up in time tended to linearize the relationship between CO2 flux and PAR [99]. Observations also showed that changes in canopy GPP with incident PAR are nonlinear at the half-hourly scale and become approximately linear at the daily and 8-day scales [29]. GPP simulated by MOD17 always linearly increase with incident PAR while the increase of GPP with incident PAR is nonlinear in both TL-LUEn and TL-LUE. The non-linearity of TL-LUEn and TL-LUE can be modified through changing parameters εm, β, εmsu, and εmsh. For example, if εmsu equals εmsh in the TL-LUE model, the response of simulated canopy GPP to incident PAR would be close to linear. This is the reason why the improvement of TL-LUEn and TL-LUE over MOD17 is much smaller at the 8-day scale than at the half-hourly temporal scale.
The better performance of TL-LUEn and TL-LUE models over MOD17 changed with vegetation types, most significantly for forests, then for SHRUB, GRASS. Leuning et al. [113] pointed out that the canopy CO2 exchange rates of crops is a quasi-linear function of absorbed PAR. In contrast, forests and sparse vegetation often show a markedly nonlinear response of canopy CO2 exchange rates to absorbed PAR [99,113,114]. TL-LUEn and TL-LUE are able to capture both the nonlinear and linear responses of GPP to incident PAR while MOD17 is only able to describe the linear change of canopy GPP with incident PAR. Therefore, TL-LUEn and TL-LUE outperform MOD17 for forests, shrub and grass sites. The higher performance is the most significant for forests at the half-hourly scale.
The linear response of GPP to PAR in MOD17 led to the underestimation/overestimation of GPP under conditions of low/high incident PAR, which has been confirmed by Propastin et al. [22] and He et al. [23]. TL-LUEn and TL-LUE were, at least partially, able to correct this weakness. Figures 6-8 show the RMSE of simulated GPP as a function of incident PAR at three different temporal scales. Under medium PAR conditions, TL-LUEn, TL-LUE, and MOD17 performed similarly. The improvement of TL-LUEn and TL-LUE over MOD17 mainly occurred under low or high incident PAR conditions.  The ratio of diffuse to direct PAR changed with clearness index. Under conditions of low clearness index, canopy LUE is high [23] owing to more diffuse PAR being absorbed by shaded leaves with high LUE. MOD17 did not differentiate the different effects of diffuse and direct PAR on GPP and tended to underestimate/overestimate GPP under low/high clearness index conditions (Figure 9). In TL-LUEn and TL-LUE, incident PAR is first decomposed into diffuse and direct components according to clearness index. Under conditions of low clearness index, increased diffuse PAR will be mainly absorbed by shaded leaves, which have high LUE. Thus, GPP simulated by TL-LUEn and TL-LUE is higher than that simulated by MOD17 (Figure 9). In contrast, when clearness index is high, increased direct PAR will be mostly absorbed by sunlit leaves, which have low LUE. Consequently, GPP simulated by TL-LUEn and TL-LUE is lower than that simulated by MOD17. Therefore, the systematic biases of GPP simulated by MOD17 model under low and high clearness index can be alleviated by TL-LUE and TL-LUEn.

The Applicability of Different Models
The different performances of three LUE models at different temporal scales and for different vegetation types suggest that it should be selective when using them. Prior to regional simulations of GPP using these models, we must be careful with the applicability of the optimized parameters.
In this study, optimized parameters changed significantly among different vegetation types. The across-site variability of these parameters is also sizeable even for a specific vegetation type (Table 3). The sensitivity of simulated GPP to εm and β in TL-LUEn, to εmsu and εmsh in TL-LUE, and to εmax in MOD17 was assessed using the factorial approach described in section 2.2.3. The calculated main effect of εmax, εm, β, εmsu and εmsh are 20%, 11.50%, 8.50%, 8.09% and 11.91% at the half-hourly scale, and 20%, 10.72%, 9.28%, 6.84% and 13.16% at the daily scale, and 20%, 11.26%, 8.74%, 7.52% and 12.48% at the 8-day scale, respectively ( Table 5). The sensitivity of simulated GPP to εmax in MOD17 is higher than the sensitivity of simulated GPP to individual parameters in TL-LUEn and TL-LUE. The sensitivity of simulated GPP to the simultaneous uncertainties of εm and β in TL-LUEn and to the simultaneous uncertainties of εmsu and εmsh in TL-LUE is the same as the sensitivity of simulated GPP to εmax in MOD17. In addition, parameters in TL-LUE and MOD17 showed similar variations within each vegetation type, indicated by the similar CV in Table 3. However, as the two parameters in TL-LUEn vary not only with biomes but also with temperature [6], optimized εm and β exhibited larger variations and uncertainties than εmax, εmsu, and εmsh for all vegetation types (Table 3).
Above analyses on the performance of three models at different temporal scales and their sensitivity to parameter uncertainties indicate that TL-LUEn is more applicable in individual site at the half-hourly scale. TL-LUE can be used regionally at the half-hourly, daily, and 8-day scales. MOD17 is also a good option for simulating regional GPP at the 8-day temporal scale and it is able to simulate GPP with accuracy close to TL-LUE.  Note: Columns three-four and six-seven show contrast coefficients for ε m , β in TL-LUEn, and ε msu , ε msh in TL-LUE, respectively. A plus symbol indicates that the parameter was set at 110% of the estimate while a minus symbol indicates 90% of the estimate. ΔGPP rel is the relative differences between the simulated GPP calculated by introducing a perturbation to a certain parameter and the simulated GPP calculated using optimized parameters.

Uncertainties and Remaining Issues
Both TL-LUEn and TL-LUE separate a canopy into sunlit and shaded leaves, and TL-LUEn further describe nonlinear response of their respective photosynthesis to APAR. These two models demonstrated powerful ability to simulate GPP. However, there are still some uncertainties remained. As indicated by Gebremichael and Barros [105], uncertainties in meteorological and LAI data, parameters, and model structure all might induce errors of simulated GPP. The change of linear response of GPP to VPD and PAR to nonlinear one and the inclusion of a soil moisture scalar might improve GPP simulation.
Similar to MOD17, TL-LUEn and TL-LUE only use VPD and minimum air temperature as environmental constraints on GPP. VPD represents the effect of atmospheric dryness on vegetation photosynthesis as a result of stomatal conductance. Soil moisture also plays an important role in regulating GPP via effects on leaf cell turgor pressure directly affecting photosynthesis or by stomatal conductance [10,115,116]. Because VPD and soil water availability did not co-vary, it would be most appropriate to have soil water availability as a constraint on photosynthesis in addition to VPD [117]. In MOD17, soil drought stress was approximated through the increase in the sensitivity of GPP to VPD [118]. Photosynthesis is considered to be totally shut off during periods of very high VPD, but in fact soil moisture and other environmental conditions might be still favorable to maintain photosynthetic activity at a certain level even if atmosphere is very dry [105]. Thus, the lack of soil water availability as a photosynthetic constraint in all three LUE models surely increases uncertainties in simulated GPP, especially for crops and grasses with shorter roots and high dependence on shallow soil moisture.
In this study, parameters in three LUE models are assumed invariant seasonally, which could induce some uncertainties in optimized parameters and calculated GPP. Many studies have shown that these parameters vary with both temperature and vegetation types [14,119,120]. Wang et al. [6] recently reported that with the consideration of seasonal changes of two parameters associated with temperature, the two-leaf nonlinear hyperbolic model (i.e., TL-LUEn) could simulate GPP as well as a process-based model. Chen et al. [121] indicated that the exclusion of seasonality of parameters in one-leaf and two-leaf LUE models is one of major drivers responsible for the failure of these models to capture the seasonality of GPP well. Thus, the proper representation of seasonal variations of these parameters needs further investigation.
Simulated GPP is also affected by the quality of meteorological and LAI inputs. In the calibration and validation periods, the models were driven by tower-measured meteorological data and processed MODIS LAI. The errors caused by inaccuracies of meteorological inputs are likely relative small [41]. However, the MODIS LAI contained considerable uncertainties, especially for crops [122], mainly caused by uncertainties in land cover and surface reflectance inputs and in the LAI inversion algorithm, and by the prevalence of persistent cloud cover [105]. The four variables in both TL-LUEn and TL-LUE (APARmsh, APARmsu, LAImsh and LAImsu), are all linked to LAI [23]. Our analysis showed that GPP simulated by TL-LUEn and TL-LUE is slightly more sensitive to LAI than that simulated by MOD17 (not shown here), which could be one of possible explainers for the poorer performance of TL-LUEn and TL-LUE relative to MOD17 at crop sites. Of course, this speculation is still worth of deep study.
The LUE of crops changes with species. At the BON, MIR, MER, RG19, and RG21 crop sites, the corn or soybeans were cultivated every other year. At the ASM site, the dominant species was wheat in 2003-2004 and 2006, while it was changed to corn in 2005. Corn is a C4 plant while soybeans and wheat are C3 plants. The LUE of corn is much higher than that of soybeans and wheat. The application of optimized parameters of C3/C4 plants for C4/C3 plants in the validation years might result in large uncertainties in simulated GPP. This is a possible cause for the poorer performance of models for crops in the validation period than in the calibration period. In addition, the uneven numbers of flux sites for different vegetation types could also result in uncertainties in the identified overall robustness of individual models.

Conclusions
In this study, the ability of three different types of LUE models (MOD17, TL-LUE and TL-LUEn) to simulate GPP at various temporal scales for different vegetation types was assessed using measurements at 58 flux sites in Asia, Europe and North America. The main conclusions that were drawn as follows: (1) Optimized model parameters vary distinctly not only among different vegetation types, but also among different sites for the same vegetation type, especially for TL-LUEn. The parameters in TL-LUEn change sizably with temporal scales while the parameters in TL-LUE and MOD17 are almost invariant with temporal scales. (2) The overall performance of TL-LUEn was slightly but not significantly better than TL-LUE at half-hourly and daily scale, while the overall performance of both TL-LUEn and TL-LUE were significantly better (p < 0.0001) than MOD17 at the two temporal scales. The improvement of TL-LUEn over TL-LUE was relatively small in comparison with the improvement of TL-LUE over MOD17. However, the differences between TL-LUEn and MOD17, and TL-LUE and MOD17 became less distinct at 8-day scale.