Evaluation and Comparison of Light Use Efficiency and Gross Primary Productivity Using Three Different Approaches

Abstract: Light use efficiency (LUE), which characterizes the efficiency with which vegetation converts captured/absorbed radiation into organic dry matter through photosynthesis, is a key parameter for estimating vegetation gross primary productivity (GPP). Studies suggest that diffuse radiation induces a higher LUE than direct radiation in short-term and site-scale experiments. The clearness index (CI), defined as the ratio of solar incident radiation at the surface of the earth to the extraterrestrial radiation at the top of the atmosphere, is added to the parameterization approach in this study to describe the conditions of diffuse and direct radiation. Machine learning methods, such as the Cubist regression tree approach, are also popular for studying vegetation carbon uptake. This paper aims to compare and analyze the performances of three different approaches for estimating global LUE and GPP. The methods for estimating LUE were based on the following: (1) a parameterization approach without the CI; (2) a parameterization approach with the CI; and (3) the Cubist regression tree approach. We collected GPP and meteorological data from 180 FLUXNET sites as calibration and validation data and used the Global Land Surface Satellite (GLASS) products and ERA-Interim data as input data to estimate the global LUE and GPP in 2014. Site-scale validation with FLUXNET measurements indicated that the Cubist regression approach performed better than the parameterization approaches. However, when the approaches were applied to global LUE and GPP, the parameterization approach with the CI became the most reliable approach, closely followed by the parameterization approach without the CI. Spatial analysis showed that the addition of the CI improved the LUE and GPP estimates, especially in high-value zones. The results of the Cubist regression tree approach showed more fluctuations than those of the parameterization approaches.
Although the distributions of LUE presented variations over different seasons, vegetation had the highest LUE, at approximately 1.5 gC/MJ, during the whole year in equatorial regions (e.g., South America, middle Africa and Southeast Asia). The three approaches produced roughly consistent global annual GPPs ranging from 109.23 to 120.65 Pg/yr. Our results suggest the parameterization approaches are robust when extrapolating to the global scale, of which the parameterization approach with CI performs slightly better than that without CI. By contrast, the Cubist regression tree produced LUE and GPP with lower accuracy even though it performed the best for model validation at the site scale.

advantage of the Cubist method is that it adds multiple training committees and "reinforcement" to make the weights more balanced [47]. It also provides linear equations for prediction instead of a black box. However, Cubist is a commercial, proprietary product and has the least algorithmic documentation [48]. Houborg et al. showed that it performed better than random forest in estimating leaf area index [49]. Previous studies used this approach to produce LUE, GPP and spatially continuous GPP in the USA [50][51][52][53], and these studies obtained satisfactory validation results against site-scale measured GPP. However, few studies have compared this approach with the parameterization approach in terms of global LUE and GPP.
The purpose of this paper is to compare and analyze the performances of different approaches for producing global and seasonal LUE and GPP. The three methods for determining LUE were based on (1) a parameterization approach without the CI, (2) a parameterization approach with the CI, and (3) the Cubist regression tree approach. We collected eddy-covariance and meteorological data from 180 FLUXNET sites as calibration and validation data and used GLASS products and ERA-Interim data as input data to estimate global LUE and GPP in 2014, with a spatial resolution of 5 km and a temporal resolution of 8 days. FLUXNET measurements were used to validate the LUE and GPP from the three approaches at the site scale. Furthermore, we compared the temporal and spatial patterns of the LUE and GPP to assess the strengths and drawbacks of each approach. It is worth noting that global LUE distributions are presented for the first time in this study to explore the spatial variations in LUE in different areas and the temporal changes in LUE in different seasons.

Data from FLUXNET
The FLUXNET2015 dataset, which can be downloaded from https://fluxnet.fluxdata.org/data/, contains global eddy covariance measurements and meteorological variables from more than 200 sites. In this study, we selected daily GPP, incident shortwave radiation (SW), latent heat flux (LE), sensible heat flux (H) and hourly air temperature (TA). FLUXNET GPP is calculated as the difference between ecosystem respiration (RECO) and net ecosystem CO2 exchange (NEE). Photosynthetically active radiation (PAR) was calculated by multiplying SW by 0.48 [54]. LE and H were used to calculate the evaporative fraction (EF) as EF = LE/(LE + H). Mean temperature (Tmean) was obtained by averaging hourly TA. To ensure that the FLUXNET data were reliable, we retained only high-quality data according to the quality flags. Finally, we obtained 29,056 records distributed over 180 sites and covering 2003 to 2014. The data covered 12 types of vegetation, of which CRO (croplands) had 20 sites, CSH (closed shrublands) had 3 sites, DBF (deciduous broadleaf forests) had 24 sites, DNF (deciduous needleleaf forests) had 1 site, EBF (evergreen broadleaf forests) had 10 sites, ENF (evergreen needleleaf forests) had 35 sites, GRA (grasslands) had 33 sites, MF (mixed forests) had 9 sites, OSH (open shrublands) had 14 sites, SAV (savannas) had 7 sites, WET (permanent wetlands) had 18 sites, and WSA (woody savannas) had 6 sites. In this study, we treated WET as GRA because of the complicated conditions of WET.
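The derived site variables described above can be sketched in a few lines of Python. This is only an illustration of the stated relations (PAR = 0.48 × SW, EF = LE/(LE + H), Tmean as the average of hourly TA); the function names and the handling of a zero denominator are assumptions, not part of the original processing chain.

```python
def par_from_sw(sw):
    """PAR from incident shortwave radiation, using the 0.48 ratio in the text."""
    return 0.48 * sw


def evaporative_fraction(le, h):
    """EF = LE / (LE + H). Returns None when LE + H is zero (assumed handling)."""
    total = le + h
    if total == 0:
        return None
    return le / total


def mean_temperature(hourly_ta):
    """Tmean as the arithmetic mean of the hourly air temperatures."""
    return sum(hourly_ta) / len(hourly_ta)
```

For example, `evaporative_fraction(150.0, 50.0)` gives an EF of 0.75, meaning that three quarters of the turbulent heat flux is latent heat.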

MODIS Data Processing
We downloaded the fraction of absorbed photosynthetically active radiation (FPAR) and leaf area index (LAI) data (MCD15A2H) of the Moderate Resolution Imaging Spectroradiometer (MODIS) at a spatial resolution of 500 m from the MODIS Global Subsets website (https://modis.ornl.gov/cgibin/MODIS/global/subset.pl). With these data, LUE at the FLUXNET sites was calculated according to Formula (1):

$$\mathrm{LUE} = \frac{\mathrm{GPP}}{\mathrm{FPAR} \times \mathrm{SW} \times 0.48} \quad (1)$$

where GPP and SW were from the FLUXNET data, and FPAR was from the MODIS data at a resolution of 500 m. The factor 0.48 represents the ratio of photosynthetically active radiation to the total incoming solar energy. At each site, the 9 (3 × 3) pixels around the central position were collected for quality-control processes. The first quality-control process included (1) extracting the 9 values at each site; (2) deleting invalid data according to the quality-control flags; (3) counting the number of valid values at each site; and (4) retaining the valid data if that number exceeded 5 and discarding them otherwise. MODIS LAI and FPAR depend on the surface reflectance in the red, near-infrared (NIR), and sometimes shortwave infrared (SWIR) bands, which are often very sensitive to atmospheric effects, including clouds, aerosols, water vapor, and ozone [55,56]. Although many of these effects can be removed using real-time or near real-time atmospheric observations [57], the remaining effects can sometimes be very large. These remaining effects generally increase the reflectance more in the red band than in the NIR band, consequently reducing LAI and FPAR and producing abnormally low values in a seasonal trajectory. Therefore, we conducted a second quality-control process, which deleted abnormally low values based on seasonal curves [58].
In the parameterization approaches, parameters such as the maximum LUE vary with vegetation type. In the regression tree approach, vegetation type can be an input variable used to calibrate the approach. We used the International Geosphere-Biosphere Programme (IGBP) vegetation type map from the yearly global 5-km MODIS Landcover product (MCD12C1) to distinguish the types of vegetation. In addition, we downloaded the MOD17 8-day/0.05 degree GPP from http://files.ntsg.umt.edu/data/NTSG_Products/MOD17 [59] as an existing global GPP product for comparison with our GPP.
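The first quality-control process and Formula (1) can be sketched as follows. The source states only that the valid data are retained when more than 5 of the 9 window values are valid; averaging the retained values into one site FPAR is an assumption made here for illustration, as are the function names and the assumption that SW is given in MJ/m²/d so that LUE comes out in gC/MJ.

```python
def site_fpar(window_values, window_valid, min_valid=6):
    """Quality-control a 3 x 3 MODIS window around a site.

    window_values: the 9 FPAR values; window_valid: 9 booleans from the QC flags.
    Returns None when 5 or fewer values are valid ("exceeds 5" means at least 6).
    """
    valid = [v for v, ok in zip(window_values, window_valid) if ok]
    if len(valid) < min_valid:
        return None
    # Assumption: the retained values are averaged into one site value.
    return sum(valid) / len(valid)


def lue_formula_1(gpp, fpar, sw):
    """Formula (1): LUE = GPP / (FPAR x SW x 0.48)."""
    apar = fpar * sw * 0.48  # absorbed photosynthetically active radiation
    if apar <= 0:
        return None  # LUE is undefined without absorbed radiation
    return gpp / apar
```

With a GPP of 6 gC/m²/d, an FPAR of 0.5 and an SW of 25 MJ/m²/d, the absorbed PAR is 6 MJ/m²/d and the resulting LUE is 1.0 gC/MJ.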

GLASS Data
The GLASS LAI and FPAR datasets were generated and released by Beijing Normal University (http://www.bnu-datacenter.com) [60]. This product has a temporal resolution of 8 days and is available from 1981 to the present. The LAI and FPAR used in this study were generated from AVHRR reflectance at a resolution of 5 km [61]. The GLASS LAI and FPAR products have smooth and reasonable trajectories. By cross-comparison and validation, the accuracy of GLASS products is clearly better than that of MODIS and CYCLOPES products (Carbon Cycle and Change in Land Observational Products from an Ensemble of Satellites). Moreover, the GLASS LAI and FPAR are more temporally continuous and spatially complete than are the other tested products [62,63].

ERA-Interim Data
ERA-Interim is a global land surface reanalysis dataset covering the period since 1979 and continuing in real time. This product can be downloaded from the European Centre for Medium Range Weather Forecasts (ECMWF) Data Server at http://data.ecmwf.int/data. It is provided at subdaily, daily and monthly time steps with a spatial resolution of 0.75°. ERA-Interim is the result of a simulation with the latest ECMWF land surface model driven by meteorological forcing from the ERA-Interim atmospheric reanalysis and precipitation adjustments based on the monthly Global Precipitation Climatology Project (GPCP) [64]. In this study, we downloaded half-daily ERA average air temperature (AAT), 2-m dewpoint temperature (D2M), surface air pressure (P), downward shortwave radiation (SWdw), net longwave radiation (LWnet) and net shortwave radiation (SWnet). Then, we averaged every 16 values (two values per day over 8 days) to obtain 8-day ERA data for the global LUE and GPP estimation.
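The temporal aggregation described above can be illustrated with a short sketch. It assumes that the half-daily values of one variable for one year are available as a flat list and that the final, shorter block of the year is averaged over whatever values remain; that treatment of the last block is an assumption, since the source does not describe it.

```python
def eight_day_means(half_daily, block=16):
    """Average consecutive blocks of 16 half-daily values into 8-day means.

    The last block of a year is shorter than 16 values and is averaged
    over the values that remain (assumed handling).
    """
    means = []
    for start in range(0, len(half_daily), block):
        chunk = half_daily[start:start + block]
        means.append(sum(chunk) / len(chunk))
    return means
```

A 365-day year provides 730 half-daily values, which this sketch turns into 46 periods: 45 full 8-day periods and one final 5-day period.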

Methods
LUE is of great significance in GPP estimation, and it helps to clarify the key processes of vegetation carbon sequestration. In this section, we describe the different approaches for estimating LUE and then how GPP is derived from LUE.

LUE estimation
We established three approaches to compute LUE. Two of them were parameterization approaches in which we first determined the maximum LUE for each type of vegetation using the shuffled complex evolution optimization algorithm developed at the University of Arizona (SCE-UA) and then adjusted the maximum LUE with temperature and water stress factors to estimate the actual LUE. The two parameterization approaches differed in whether the effect of solar radiation was considered by including the CI in the determination of the maximum LUE. The third approach was the Cubist regression approach, which considers the contributions of temperature, LAI, EF, CI and vegetation type to the LUE.

Parameterization approach without CI (approach V1)
In the parameterization approach without CI (approach V1), LUE is estimated by Formula (2):

$$\mathrm{LUE} = \mathrm{LUE}_{\max} \times f_T \times f_W \quad (2)$$

where LUE is the actual LUE, which can be calculated by Formula (1) at FLUXNET sites, and $\mathrm{LUE}_{\max}$ is the maximum LUE without stress for each type of vegetation. $f_T$ represents the temperature stress and is described by Formula (3) [65], where T represents the average temperature from the FLUXNET measurements, and Topt is the monthly mean temperature when the vegetation reaches the maximum LAI for each vegetation type. $f_W$ describes the water stress and is related to EF by Formula (4) [65]. At the site scale, EF was calculated by Formula (5):

$$\mathrm{EF} = \frac{\mathrm{LE}}{\mathrm{LE} + H} \quad (5)$$

where LE and H were collected from the FLUXNET data. For global LUE and GPP estimation, EF was derived by Formula (6), where ET and PET represent the actual and potential evapotranspiration, respectively.
Previous studies suggest that the Penman-Monteith (P-M) formula is a biophysically sound and robust framework for estimating daily evapotranspiration at regional and global scales with remotely sensed data [66]. In this study, we used a modified P-M approach with biome-specific canopy conductance to estimate daily actual evapotranspiration, which can be partitioned into soil evaporation and canopy transpiration [67,68]. Potential evapotranspiration was calculated using the Priestley and Taylor (P-T) formula [69]. More details can be found in Cui et al. [70].
The maximum LUE values for different vegetation types in approach V1 were determined by the SCE-UA optimization algorithm. The SCE-UA optimization algorithm, developed and described by Duan et al. [71], is both a global and a probabilistic optimization algorithm. This approach is built on four basic ideas: (1) the combination of random and deterministic approaches; (2) the concept of clustering; (3) the concept of a systematic evolution towards global improvement; and (4) the concept of competitive evolution [72]. Details of the SCE-UA method can be found in the literature [71,73]. Because it finds global optima efficiently under nonlinear constraints and does not depend on the initial values of the model, SCE-UA has been widely used for parameter optimization and data assimilation [74][75][76][77]. In this study, we set a valid and reasonable range for $\mathrm{LUE}_{\max}$. The cost function equals the root-mean-square error (RMSE) between the actual LUE and the estimated value. Then, we minimized the cost function to obtain $\mathrm{LUE}_{\max}$ for each vegetation type (Table 1).

Parameterization approach with CI (approach V2)
In the second parameterization approach (approach V2), we considered the effect of the CI, and LUE was estimated by Formula (7):

$$\mathrm{LUE} = \left[\mathrm{LUE}_{\mathrm{CI}} \times \mathrm{CI} + \mathrm{LUE}_{1-\mathrm{CI}} \times (1 - \mathrm{CI})\right] \times f_T \times f_W \quad (7)$$

$$\mathrm{CI} = \frac{\mathrm{SW}}{\mathrm{SW}_{\mathrm{TOA}}} \quad (8)$$

$$\mathrm{SW}_{\mathrm{TOA}} = \frac{T}{\pi} \times I_0 \times \left(\omega \sin\varphi \sin\delta + \cos\varphi \cos\delta \sin\omega\right) \quad (9)$$

$$I_0 = S_0 \times \left(1 + 0.033 \times \cos\frac{2\pi \times \mathrm{DOY}}{365}\right) \quad (10)$$

where $\mathrm{LUE}_{\mathrm{CI}}$ and $\mathrm{LUE}_{1-\mathrm{CI}}$ are the coefficients of CI and 1 − CI, respectively. $\mathrm{LUE}_{\mathrm{CI}}$ equals the LUE in completely direct light, and $\mathrm{LUE}_{1-\mathrm{CI}}$ is positively related to the LUE in diffuse light. CI represents the ratio of the solar incident radiation at the surface of the earth (SW) to the extraterrestrial radiation at the top of the atmosphere ($\mathrm{SW}_{\mathrm{TOA}}$). T denotes the time period corresponding to SW; thus, T = 60 × 60 × 24 = 86,400 s in this study. $S_0$ represents the solar constant, which is equal to 1367 W/m², and DOY is the day of the year. $\omega$ is the solar hour angle at sunrise, $\varphi$ is the latitude, and $\delta$ is the solar declination.
To optimize $\mathrm{LUE}_{\mathrm{CI}}$ and $\mathrm{LUE}_{1-\mathrm{CI}}$, we first set reasonable ranges for them based on the value of $\mathrm{LUE}_{\max}$ for each vegetation type. Different values were then randomly selected from these ranges, and the corresponding LUE was calculated and compared with the FLUXNET LUE. Finally, we built the cost function (RMSE) and minimized it to simultaneously optimize $\mathrm{LUE}_{\mathrm{CI}}$ and $\mathrm{LUE}_{1-\mathrm{CI}}$ using the SCE-UA optimization algorithm.
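The CI calculation and the two-coefficient calibration can be sketched as follows, under several stated assumptions. The extraterrestrial radiation uses the standard daily formula for a horizontal surface with the solar constant of 1367 W/m² and the 1 + 0.033 × cos(2π × DOY/365) distance correction; the declination approximation, the limiting of CI to [0, 1], the multiplicative combination with the stress factors, and all function and parameter names are assumptions made for illustration. The random search is only a simple stand-in for SCE-UA, which is not implemented here.

```python
import math
import random

SOLAR_CONSTANT = 1367.0   # W/m^2, value given in the text
SECONDS_PER_DAY = 86400   # T = 60 x 60 x 24 s, as in the text


def toa_radiation(lat_deg, doy):
    """Daily extraterrestrial radiation on a horizontal surface, in J/m^2."""
    phi = math.radians(lat_deg)
    # Assumption: FAO-56 style approximation of the solar declination (rad).
    delta = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)
    # Earth-Sun distance correction, the (1 + 0.033 cos ...) factor.
    ecc = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)
    # Hour angle at sunrise; clamping covers polar day and polar night.
    cos_omega = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    omega = math.acos(cos_omega)
    geometry = (omega * math.sin(phi) * math.sin(delta)
                + math.cos(phi) * math.cos(delta) * math.sin(omega))
    return SECONDS_PER_DAY / math.pi * SOLAR_CONSTANT * ecc * geometry


def clearness_index(sw_mean_wm2, lat_deg, doy):
    """CI = surface radiation / extraterrestrial radiation over the same day."""
    toa = toa_radiation(lat_deg, doy)
    if toa <= 0:
        return None  # polar night: CI is undefined
    ci = sw_mean_wm2 * SECONDS_PER_DAY / toa
    return max(0.0, min(1.0, ci))  # assumption: limit CI to [0, 1]


def lue_v2(ci, coef_ci, coef_one_minus_ci, f_t, f_w):
    """CI-weighted maximum LUE scaled by the stress factors (assumed product form)."""
    return (coef_ci * ci + coef_one_minus_ci * (1 - ci)) * f_t * f_w


def rmse(observed, predicted):
    """Root-mean-square error, the cost function named in the text."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def fit_v2(samples, bounds_ci, bounds_rest, n_trials=2000, seed=0):
    """Random search over both coefficients; a crude stand-in for SCE-UA."""
    rng = random.Random(seed)
    observed = [s["lue"] for s in samples]
    best = None
    for _ in range(n_trials):
        a = rng.uniform(*bounds_ci)
        b = rng.uniform(*bounds_rest)
        predicted = [lue_v2(s["ci"], a, b, s["f_t"], s["f_w"]) for s in samples]
        cost = rmse(observed, predicted)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best  # (RMSE, coefficient of CI, coefficient of 1 - CI)
```

Each sample is a dictionary with the hypothetical keys `lue`, `ci`, `f_t` and `f_w`. A dedicated global optimizer such as SCE-UA would reach the minimum with far fewer cost evaluations than this search, which is used here only to keep the sketch self-contained.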

Cubist regression tree approach (approach V3)
The Cubist regression tree approach (approach V3) was finally established to estimate LUE. Cubist is a tool for generating rule-based predictive models from data. Regression trees partition the data into smaller groups that are more homogeneous. To achieve outcome homogeneity, regression trees determine (1) the predictor to split on and the value of the split; (2) the depth or complexity of the tree; and (3) the prediction formula at the terminal nodes. Cubist is one of the most widely used regression tree approaches. Its specific features are (1) the techniques used for linear smoothing, rule creation and pruning; (2) optional boosting; and (3) the adjustment of the rule-based predictions using nearby points from the training set [78]. For the Cubist regression tree approach, we assumed that LUE is determined by vegetation growth, water stress, temperature stress, radiation conditions and vegetation type. Therefore, we selected four continuous variables (LAI, EF, Tmean and CI) and one discrete variable (vegetation type), which represent these five factors in order, as inputs. The maximum number of regression trees was set to 10. The established linear formulas are listed in Table 2.
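How a rule-based model of this kind produces a prediction can be illustrated with a small sketch. It is not the Cubist software and does not reproduce the rules of Table 2: the rule structure, thresholds and coefficients below are hypothetical, and averaging the outputs of all matching rules is an assumption about how overlapping rules are combined. Negative outputs are set to zero, as is done for the global maps in this study.

```python
def predict_rule_based(sample, rules, default=0.0):
    """Apply every matching rule's linear model and average the results."""
    predictions = []
    for rule in rules:
        if sample["veg"] not in rule["veg_types"]:
            continue
        in_range = all(low <= sample[name] < high
                       for name, (low, high) in rule["conditions"].items())
        if not in_range:
            continue
        value = rule["intercept"] + sum(coef * sample[name]
                                        for name, coef in rule["coefs"].items())
        predictions.append(value)
    if not predictions:
        return default  # no rule covers this sample (assumed fallback)
    # Linear rules can go negative outside their training range; clip at zero.
    return max(0.0, sum(predictions) / len(predictions))


# Hypothetical rules, for illustration only.
EXAMPLE_RULES = [
    {"veg_types": {"DBF", "MF"}, "conditions": {"LAI": (0.0, 2.0)},
     "intercept": 0.2, "coefs": {"LAI": 0.3, "EF": 0.5}},
    {"veg_types": {"DBF"}, "conditions": {"EF": (0.5, 1.01)},
     "intercept": 0.4, "coefs": {"Tmean": 0.01, "CI": -0.5}},
]
```

A DBF sample with LAI = 1.0, EF = 0.6, Tmean = 20 and CI = 0.4 matches both hypothetical rules, which give 0.8 and 0.4 gC/MJ, so the sketch returns their mean of 0.6 gC/MJ.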

GPP Estimation
In this part, the main task is to prepare the required spatially continuous parameters listed in Figure 1. With Formula (3), we calculated the gridded temperature stress $f_T$; with Formulas (6) and (4), we obtained the gridded EF and water stress $f_W$, respectively. The maximum LUE (Table 1) for each vegetation type had already been obtained during the parameterization process. With these data, the global LUE maps for 2014, with a spatial resolution of 5 km and a temporal resolution of 8 days, were produced (V1). The difference between V2 and V1 is the addition of the CI. With Formulas (8)-(10), we obtained the global gridded CI. The coefficients of CI and 1 − CI are listed in Table 1. Then, we obtained the global 8-day LUE in 2014 according to the process described in the blue frame of Figure 1. For the Cubist regression tree approach (V3), we first calculated the gridded parameters (green frame) and then selected the corresponding linear formula (Table 2) for each pixel to calculate global LUE. Once the LUE maps were obtained, the same equation (Formula (11)) was used to calculate global GPP for all three approaches.
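The final step from LUE to GPP can be sketched as below. The sketch assumes that Formula (11) is the inverse of Formula (1), that is, GPP = LUE × FPAR × PAR with PAR = 0.48 × SW, and that annual GPP is accumulated from the 8-day mean daily values weighted by the number of days in each period; the function names and the period lengths are illustrative assumptions.

```python
def gpp_daily(lue, fpar, sw):
    """Daily GPP (gC/m^2/d) from LUE (gC/MJ), FPAR and SW (MJ/m^2/d)."""
    par = 0.48 * sw  # photosynthetically active part of shortwave radiation
    return lue * fpar * par


def annual_gpp(period_gpp, period_days):
    """Annual GPP (gC/m^2/yr) from per-period mean daily GPP values."""
    return sum(g * d for g, d in zip(period_gpp, period_days))


# Assumed layout of a 365-day year: 45 periods of 8 days plus one of 5 days.
PERIOD_DAYS_2014 = [8] * 45 + [5]
```

For instance, a pixel with a constant 2 gC/m²/d over all 46 periods accumulates 730 gC/m²/yr.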

Calibration and validation
After quality control, 29,056 FLUXNET site records remained, of which 19,459 were randomly chosen for calibration and the remaining 9597 were used for validation. Both the calibration and the validation data are globally distributed and cover all vegetation types. Figure 2 shows the validation of the three approaches. We also calculated LUE with the MOD17 GPP algorithm [79] and compared it with the in situ data. Our LUE and GPP estimates were better than the MOD17 results, which had lower coefficients of determination (R²) and higher root-mean-square errors (RMSE): R² of 0.098 for LUE and 0.558 for GPP, and RMSE of 0.545 gC/MJ for LUE and 2.575 gC/m²/d for GPP. The R² of LUE was 0.183 and 0.240 for parameterization approaches V1 and V2, respectively, and the RMSE of LUE dropped from 0.508 to 0.487 gC/MJ after considering the CI. These results suggest that adding the CI to the parameterization approach slightly improved the accuracy of LUE. The R² and RMSE of LUE from the Cubist regression approach were 0.538 and 0.352 gC/MJ, respectively; compared with the former two approaches, the Cubist regression tree approach achieved a clearly higher R² and a lower RMSE. All three approaches underestimated LUE, especially when the FLUXNET LUE exceeded 2.0 gC/MJ. The underestimation was alleviated for GPP, with R² ranging from 0.63 to 0.78 and RMSE ranging from 1.79 to 2.33 gC/m²/d. The LUE at the FLUXNET sites was obtained by dividing the site GPP by the corresponding MODIS FPAR, so the uncertainty in the site LUE might partly result from errors in MODIS FPAR. The predicted GPP, in contrast, was calculated by multiplying the estimated LUE by FPAR, so part of the FPAR error was offset by the division and multiplication operations. In addition, the seasonal variations in solar radiation enhanced the accuracy of GPP estimation.
The validation results for each type of vegetation (Figure 3) show that DBF, GRA and WSA obtained high-quality LUE and GPP. In contrast, the two parameterization approaches obtained weak relationships between FLUXNET and estimated LUE in CRO, EBF, MF, CSH and OSH. In CRO, crop species were complicated and might contain C3 plants, C4 plants or both, and the different features of C3 and C4 crops [6] introduced errors. The vegetation in EBF showed little seasonal variation; therefore, it was difficult to retrieve solid correlations. In addition, EBF was distributed in both high-latitude and low-latitude areas, where the maximum LUE and the optimal temperature varied greatly. These factors might lead to the uncertainty in the LUE and GPP in EBF. In MF, OSH and CSH, the main cause was the heterogeneity of plants.
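The random split and the two validation statistics can be sketched as follows. The source does not state how R² was computed; the sketch assumes the squared Pearson correlation between observed and predicted values, and the fixed random seed and function names are likewise assumptions.

```python
import math
import random


def split_records(records, n_calibration, seed=0):
    """Randomly split records into calibration and validation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_calibration], shuffled[n_calibration:]


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return None  # correlation is undefined for constant series
    return cov * cov / (var_o * var_p)
```

Splitting 29,056 records with `n_calibration=19459` leaves 9597 records for validation, matching the counts reported above. Note that a squared correlation ignores bias, so a systematic underestimation such as the one seen for high LUE values affects RMSE but not this form of R².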

Validation of Global LUE and GPP Results in 2014
We extrapolated the calibrated parameters and models to the global scale and produced LUE and GPP maps for 2014 with a spatial resolution of 5 km and a temporal resolution of 8 days. In this part, we validated the results against FLUXNET measurements. There were 1700 records observed at 69 sites in 2014. Figure 4 indicates that our LUE and GPP estimates are reliable. The R² of LUE ranged from 0.21 to 0.30, and the RMSE ranged from 0.41 to 0.55 gC/MJ. The best result was produced by the parameterization approach with the CI, while the least satisfactory one was produced by the Cubist regression tree model. The Cubist regression approach produced more overestimated LUE values, some of which reached or even exceeded 3 gC/MJ. Although the Cubist regression tree approach achieved the best result during calibration, it showed clear disadvantages when extrapolated to the global scale. As with other empirical methods, the main weaknesses of Cubist are the lack of a clear explanation behind its formulas and the high likelihood of exceeding reasonable values in rare and special situations. When mapping global LUE, pixels that do not fit the calibrated relationships are very likely to occur. Compared with LUE, the accuracy of GPP was improved, with the R² ranging from 0.51 to 0.60 and the RMSE ranging from 2.42 to 2.87 gC/m²/d. Similarly, the parameterization approach with the CI produced the most satisfactory GPP. Although the Cubist regression tree approach still produced the weakest relationship between estimated GPP and FLUXNET GPP, the overestimation was mitigated by the constraint of PAR.
Then, we looked in more detail at each type of vegetation (Table 3). In DBF, DNF, GRA, MF and WSA, vegetation obtained high-quality LUE with R² values greater than 0.3. In EBF, ENF, OSH and SAV, vegetation obtained acceptable accuracy with R² values from 0.17 to 0.26. All of the above types of vegetation had RMSE values less than 0.5 gC/MJ. However, all three approaches failed to produce reliable LUE values in CRO, with an R² less than 0.03 and an RMSE higher than 0.57 gC/MJ. Crop species in CRO were complicated and might contain C3 plants, C4 plants or both, and the different features of C3 and C4 crops introduced errors in CRO. In comparison with the parameterization approach without the CI, the parameterization approach with the CI showed clear advantages in all types of vegetation in terms of calculating LUE. In comparison with the Cubist regression tree approach, the parameterization approach with the CI was clearly better in CRO, DBF, ENF, GRA, MF, OSH, and SAV, slightly superior in DNF, and obviously inferior in EBF and WSA. As in the overall validation, the accuracy of GPP was higher than that of LUE in each type of vegetation. The relationship between estimated GPP and FLUXNET GPP was strong, with R² values greater than 0.5 in DBF, DNF, EBF, ENF, GRA, MF, SAV and WSA. However, GPP reached only acceptable accuracy in CRO and OSH, with R² values of 0.46 and 0.31 and RMSE values of 2.73 gC/m²/d and 0.86 gC/m²/d, respectively. The main errors in OSH were induced by the heterogeneity of plants. The error sources of cropland vegetation will be discussed in Section 4. In general, the parameterization approach with the CI produced the most accurate estimates, even though the parameterization approach without the CI performed best in DNF and the Cubist regression tree approach performed best in ENF and WSA.

Parameterization Approach without the CI
Figure 5 shows the distributions of LUE produced by the parameterization approach without the CI during four periods in 2014. The variations among the four periods indicate that the LUE varied greatly between seasons. From 2014049 to 2014136 (year and day of year), during which spring occurred in the Northern Hemisphere, the LUE in Asia, Europe and North America was generally low, with values less than 1.0 gC/MJ. On these continents, vegetation in coastal areas had a higher LUE than that found in inland areas, even at the same latitude or longitude. In equatorial regions including South America, middle Africa and Southeast Asia, vegetation had the highest LUE, at approximately 1.5 gC/MJ. From the equator to the Southern Hemisphere, LUE decreased to 1.2 gC/MJ at approximately 30° S and then continued to decline to approximately 0.5 gC/MJ in southwestern South America and Africa. The west and south of South America shared low LUE values because of the low temperatures resulting from high altitudes and high latitudes, respectively, while the vegetation in Africa presented a low LUE because of drought [80], which is supported by the extremely low EF in this study. In Australia, the vegetation had much lower LUE values than those in South America and Africa, with an approximate value of 0.5 gC/MJ, but showed a similar tendency for LUE to decrease from the eastern coastal areas, at 1.5 gC/MJ, to the western area, at less than 0.5 gC/MJ, because of the decreasing soil moisture [81].
In the second season (from 2014137 to 2014224), the global vegetation LUE generally showed an increasing tendency. During the last two seasons (from 2014225 to 2014365 and from 2014001 to 2014048), the global vegetation LUE showed a gradually decreasing tendency. Vegetation in the Northern Hemisphere followed a trend similar to that of global vegetation. In eastern North America, Europe and Southeast Asia, the vegetation LUE increased to 1.5 gC/MJ in the second season due to the increasing radiation and temperature, declined to approximately 0.8 gC/MJ in the third season, and finally declined to less than 0.2 gC/MJ in the last season. The Southern Hemisphere, including South America, Africa and Australia, showed fewer changes in LUE distribution during the four seasons. In the north of Africa and Australia, however, the vegetation showed the opposite tendency to that seen in other areas: more radiation induced lower LUE because these areas were affected by drought throughout the year. More radiation causes the temperature to increase continuously, which aggravates the effects of water stress.

Parameterization Approach with CI
Considering the effect of different radiation modes, we added the CI to the parameterization approach and obtained a second global LUE distribution (Figure 6). In general, the LUE distributions produced by this approach were similar to those produced by the parameterization approach without the CI; the differences between the two approaches ranged from −0.1 to 0.4 gC/MJ. For example, the LUE calculated by the second approach (V2) was even higher in high-value zones, which agrees with a previous study that found that diffuse light enhanced LUE under similar environmental conditions [82]. Looking at the LUE in each season, we found the following: (1) From 2014049 to 2014136, the LUE calculated by V2 in South America, middle Africa, and Southeast Asia reached 1.8 gC/MJ, which was approximately 0.3 gC/MJ more than the LUE calculated by V1. In Europe, the LUE calculated by V2 slightly exceeded that calculated by V1, by 0.1 gC/MJ. On the other hand, the LUE calculated by V2 was lower than the LUE calculated by V1 in the southwestern USA, the north of Africa, and the southwest of Asia. (2) From 2014137 to 2014224, the LUE calculated by V2 in Europe, Canada and northern Asia reached approximately 1.25 gC/MJ and slightly exceeded the LUE calculated by V1, by 0.15 gC/MJ. However, the LUE calculated by V2 was lower than that calculated by V1 in the southwestern USA, the north and south of Africa, the southwest of Asia and the north of Australia. (3) From 2014225 to 2014320, the differences between the two approaches narrowed. Great increases occurred in the north of South America, the middle of Africa, and the southeast of Asia. (4) In the last season, the differences continued to decrease in the Northern Hemisphere but gradually increased in the Southern Hemisphere, with more areas showing a higher LUE from V2 than from V1. In all four seasons, the equatorial regions showed an increase in LUE as calculated by V2.
The main reason is that these areas had abundant rain for the whole year, which reduced the radiation reaching the surface of the earth and consequently led to a low CI. Therefore, the addition of the CI increased the value of LUE. Other increases in LUE during the growing season can likewise be explained by more rain and a lower CI.

Figure 7 shows the distributions of LUE calculated by the Cubist regression tree approach. Although the LUE generally shared a similar distribution with that of the former two approaches, the differences from the parameterization approach without the CI ranged from −0.8 to 0.8 gC/MJ. In general, the positive differences were located in high-latitude areas, while the negative differences were distributed in low-latitude regions. Looking at the LUE on every continent, we found that the southwest of North America had decreases, but other places in North America had increases, and these differences diminished in the last season. In South America, few variations were observed. In Europe, the LUE calculated by V3 exceeded that calculated by V1 in all four seasons. In the middle of Africa, the LUE calculated by V3 was higher than that calculated by V1, while the north and south of Africa saw decreases in LUE as calculated by V3. The situation in Asia was more complicated. In northern Asia, including Russia and Mongolia, there were positive differences. During the first three seasons, western Asia and northern China showed negative differences, but the negative differences disappeared in the last season. Southern China had positive differences throughout the year. There were few variations in the south of Asia. In Australia, except for its southern coastline, the LUE calculated by V3 was lower than the LUE calculated by V1 in the first three seasons, while the negative differences mainly occurred along the southern coastline in the last season.

Figure 8 presents the distributions of global annual GPP in 2014.
Similar to LUE, there were three equatorial regions, including the northern regions of South America, middle Africa and Southeast Asia, in which vegetation absorbed the most carbon, with GPP greater than 2500 gC/m 2 /yr. Next to these high carbon sequestration areas, South America, South Africa, South China, India, the coastline of Australia, and some mid-high latitudes in the Northern Hemisphere, such as the eastern USA and Europe, cultivated vegetation with approximately 1500 gC/m 2 /yr GPP. High-latitude areas, such as Canada and northern Europe, and high-altitude regions, including the USA, the western coastline of South America, the south of Africa, and the Tibetan Plateau of China, saw low carbon sequestration because of the low-temperature stress. Drought-hit areas, such as northern Africa, northern Australia, and Northwest China, also experienced low GPP because of the lack of water. We compared our 3 GPP results with MODIS GPP (MOD17). The four approaches produced roughly consistent global annual GPP values ranging from 109.23 to 120.65 Pg/yr. The highest value (120.65 Pg/yr) was obtained by the parameterization approach with the CI because this approach considered the different effects between direct and diffuse radiation. The lowest value (109.23 Pg/yr) was estimated by the parameterization approach without the CI. The lands covered by dense vegetation are usually moistened by abundant rain, consequently having a low CI. Therefore, the addition of the CI increased the GPP. We regard this approach as the best method for calculating LUE and GPP. In comparison with the parameterization approach with the CI, the Cubist regression tree approach produced a GPP of 116.41 Pg/yr. This approach divided all pixels into 10 classes and then put them into a corresponding linear rule or rules. The Cubist regression tree approach had little mechanism consideration behind the linear formulas. In addition, 10 rules might not sufficiently satisfy all pixels. 
Therefore, the linear rules were likely to output negative values. Although we set the values to zero in these cases, most were still underestimated. We also plotted the difference between V2 GPP and MODIS GPP (MOD17). The parameterization approach with the CI yielded a higher GPP than the MODIS algorithm, especially in the equatorial regions (the northern regions of South America, middle Africa and Southeast Asia), where the difference exceeded 600 gC/m²/yr. Minor negative differences, where V2 GPP was lower than MODIS GPP, occurred in the south of China and in Madagascar.
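The handling of negative rule outputs described above can be sketched as follows. This is only a minimal illustration, not the fitted Cubist model: the two rules, the split on Tmean, and all coefficients are invented for the example, and a real Cubist model may also average overlapping rules, which is omitted here.

```python
import numpy as np

# Hypothetical rules (NOT the fitted model): each rule applies where
# Tmean <= t_max and predicts LUE with its own linear formula.
RULES = [
    {"t_max": 10.0, "intercept": -0.20, "coef_t": 0.05, "coef_ef": 0.60},
    {"t_max": np.inf, "intercept": 0.30, "coef_t": 0.02, "coef_ef": 0.90},
]

def predict_lue(tmean, ef):
    """Return (raw, clamped) LUE in gC/MJ from the first matching rule."""
    tmean = np.asarray(tmean, dtype=float)
    ef = np.asarray(ef, dtype=float)
    raw = np.full(tmean.shape, np.nan)
    assigned = np.zeros(tmean.shape, dtype=bool)
    for rule in RULES:
        mask = (~assigned) & (tmean <= rule["t_max"])
        raw[mask] = (rule["intercept"]
                     + rule["coef_t"] * tmean[mask]
                     + rule["coef_ef"] * ef[mask])
        assigned |= mask
    # A linear rule has no lower bound, so negative outputs are set to zero.
    clamped = np.maximum(raw, 0.0)
    return raw, clamped

raw, clamped = predict_lue([-5.0, 20.0], [0.1, 0.5])
```

With these made-up coefficients, the cold pixel receives a negative raw value that is clamped to zero, which illustrates why such pixels remain underestimated after clamping.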

Comparison between the Parameterization Approach with and without the CI
To distinguish the different effects of direct and diffuse radiation, we added the CI into the parameterization approach. We examined its effect at four sites (IT-Isp, AU-Whr, CH-Cha and SE-St1); Table 4 lists more details of these sites. These sites cover the Northern (IT-Isp, CH-Cha, SE-St1) and Southern Hemispheres (AU-Whr) as well as humid (IT-Isp, AU-Whr, CH-Cha) and dry areas (SE-St1). They include four types of vegetation: DBF, EBF, GRA and WET.
The corresponding figure shows the effect of the CI in the parameterization approach on the LUE estimation. Red dotted lines outline the key points: the 201st and 273rd days at IT-Isp; the 97th day and the 121st to 185th days at AU-Whr; the 113th, 185th and 273rd days at CH-Cha; and the 73rd day and the 209th to 225th days at SE-St1. The CI at these points was less than 0.40, and some values were even below 0.30, much lower than the average value of 0.50. In these cases, the accuracy of the parameterization approach with the CI clearly surpassed that of the approach without the CI. A lower CI indicates more diffuse radiation, which yields a higher LUE.
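As a minimal sketch of how a daily CI could be computed and compared with the 0.40 threshold mentioned above, the example below divides surface incident radiation by extraterrestrial radiation. The extraterrestrial term uses the standard FAO-56 daily formulation, which is an assumption made here for illustration; the function names are hypothetical and the source does not state how the top-of-atmosphere radiation was obtained.

```python
import math

GSC = 0.0820  # solar constant in MJ m-2 min-1 (FAO-56 value)

def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation (MJ m-2 day-1), FAO-56 formulation."""
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    delta = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Clamp the argument so that polar day and polar night do not fail.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)  # sunset hour angle
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )

def clearness_index(surface_rad, lat_deg, day_of_year):
    """CI = surface incident radiation / extraterrestrial radiation."""
    ra = extraterrestrial_radiation(lat_deg, day_of_year)
    if ra <= 0.0:
        return float("nan")  # no incoming radiation (polar night)
    return surface_rad / ra

# Illustrative values only: 9 MJ m-2 day-1 at 20 degrees S on day 246.
ci = clearness_index(9.0, -20.0, 246)
is_diffuse_dominated = ci < 0.40  # threshold quoted in the text
```

Days flagged this way correspond to the cloudy, diffuse-dominated conditions under which the approach with the CI is expected to differ most from the approach without it.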

Comparison between Parameterization Approaches and Regression Tree Approach
The corresponding figure shows the global annual GPP estimated by the three approaches for each type of vegetation. EBF clearly absorbed the most carbon among all types of vegetation, with a GPP greater than 30 Pg/yr, followed by SAV, WSA, GRA and CRO, with GPP values of approximately 15 Pg/yr. The lowest GPP values were produced by DNF and CSH because of the extremely small area of DNF and the weak carbon sequestration capacity of CSH. Although the GPP values estimated by the three approaches were similar, there were still some differences for some vegetation types, such as EBF, WSA and GRA. Most of the EBFs were located in equatorial regions with suitable temperature and abundant rain. More rain implied more diffuse radiation; therefore, the CI clearly increased the LUE and GPP. Our WSA calibration sites were located in low-latitude areas. The Cubist regression approach was therefore likely to overestimate when extrapolating this relationship to high-latitude areas (southern Canada, northern USA, Europe and Russia). In contrast, the Cubist regression tree approach underestimated GPP and NPP in GRA because of these inconsistent locations between the calibration sites and land areas. To analyze the contributions of different latitudes to global vegetation carbon uptake and to compare the performances of the three approaches, we averaged the LUE and summed the annual GPP every 5 km along latitude (Figure 11). In general, LUE tended to be higher at lower latitudes and lower at higher latitudes. In the first season, LUE had its highest value of approximately 1.5 gC/MJ in equatorial regions corresponding to South America, middle Africa and Southeast Asia. There was another important peak near 1.0 gC/MJ at 40°S because of the southern coastal vegetation in Australia. Areas from 50° to 55°S saw higher LUE values, some higher than 1.0 gC/MJ.
The curves in the Southern Hemisphere showed more fluctuations. The total area of vegetation there was much smaller; consequently, the mean LUE was more sensitive to spatial changes. In the Northern Hemisphere, vegetation showed a relatively lower LUE, i.e., less than 0.5 gC/MJ, because of the lower temperature and less radiation. The LUE curves calculated by V1 and V2 were more stable than that calculated by V3. The Cubist regression tree approach is an empirical linear regression method, which can be easily affected by a specific parameter and thus produced a fluctuating line. For the parameterization approaches, the LUE showed a gradual increase from the Arctic to equatorial regions. All three approaches showed a local minimum of 0.5 gC/MJ at 12°N, which was caused by the large area of low LUE in Africa. In the second season, there were few changes (small drops) in the Southern Hemisphere; however, the Northern Hemisphere showed large increases, up to 1.0 gC/MJ at 50°N. Similar to the first season, the curve generated by V3 showed more fluctuations. In the third season, the LUE in the Southern Hemisphere remained roughly unchanged. However, the LUE in the Northern Hemisphere declined. The peaks produced by V1 and V2 fell to 0.7 gC/MJ, while those produced by V3 remained at 1.0 gC/MJ. In the last season, the Southern Hemisphere saw a slight increase in LUE, with a peak of almost 1.0 gC/MJ. However, the LUE in the Northern Hemisphere dropped sharply: the latitude of the peak moved from 55°N to 25°N, and the peak LUE declined to 0.5 gC/MJ. Compared with the Southern Hemisphere, the Northern Hemisphere presented much more variation. The main reason for this variation was that vegetation in the Southern Hemisphere was mainly located in low-latitude areas, while vegetation in the Northern Hemisphere was spread across low- and high-latitude regions.
High-latitude areas underwent great changes in temperature and solar radiation, which were essential for vegetation growth. Therefore, the LUE in the Northern Hemisphere varied greatly during different seasons. The two parameterization approaches showed few differences except in equatorial regions and other local maximum values where V2 had a higher LUE than V1. A better explanation for this increase can be found in Section 3.2.2. The Cubist regression tree approach, however, presented a clear difference from the former parameterization approaches. V3 produced a higher LUE in high-latitude areas (50°N and 50°S) and a lower LUE at low-latitude areas (between 30°N and 30°S). Looking at the GPP curves with latitude, there were two clear peaks reaching 75 TgC/yr at 50°N and 130 TgC/yr in equatorial regions. The annual GPP curves from the 3 approaches showed fewer variations than those of LUE.
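The latitudinal profiles discussed above (mean LUE and summed GPP per latitude band) can be sketched as follows. This is a minimal illustration on a regular latitude-longitude grid; the grid layout, the band width (one grid row per band), the spherical-Earth cell area and the function names are all assumptions for the example rather than the processing actually used.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation

def cell_area_by_row(lat_edges_deg, dlon_deg):
    """Area (m2) of one grid cell in each latitude row of a regular grid."""
    lat = np.radians(np.asarray(lat_edges_deg, dtype=float))
    return EARTH_RADIUS_M**2 * np.radians(dlon_deg) * np.abs(np.diff(np.sin(lat)))

def latitude_profiles(lue, gpp, lat_edges_deg, dlon_deg):
    """Mean LUE (gC/MJ) and total GPP (TgC/yr) per latitude row.

    lue and gpp have shape (n_lat, n_lon); gpp is in gC m-2 yr-1 and
    NaN marks pixels without vegetation.
    """
    area = cell_area_by_row(lat_edges_deg, dlon_deg)
    has_data = np.isfinite(lue)
    counts = has_data.sum(axis=1)
    lue_sum = np.where(has_data, lue, 0.0).sum(axis=1)
    # Rows without any vegetated pixel get NaN instead of a division error.
    mean_lue = np.divide(lue_sum, counts,
                         out=np.full(counts.shape, np.nan),
                         where=counts > 0)
    gpp_tg = np.nansum(gpp, axis=1) * area / 1e12  # gC -> TgC
    return mean_lue, gpp_tg

# Tiny synthetic grid: 5-degree cells, two vegetated pixels just north of the equator.
edges = np.linspace(-90.0, 90.0, 37)
lue = np.full((36, 72), np.nan)
gpp = np.full((36, 72), np.nan)
lue[18, :2] = [1.0, 2.0]
gpp[18, :2] = [1000.0, 0.0]
mean_lue, gpp_tg = latitude_profiles(lue, gpp, edges, 5.0)
```

Because the mean is taken only over vegetated pixels, bands with little vegetated area (as in much of the Southern Hemisphere) rest on few values, which is consistent with the stronger fluctuations noted for those latitudes.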

Uncertainty Analysis
As described in Section 2.2, we used different datasets for the model calibration and for the global LUE and GPP estimation. The uncertainty of the input parameters, such as EF, PAR and Tmean, would introduce error into our LUE and GPP products. In this part, we plotted the time series of GPP, LUE, EF, PAR and Tmean at two sites and compared the data used for calibration with those used for global extrapolation. Figure 12 shows that the PAR and Tmean used for the global product matched well with the FLUXNET PAR and Tmean, which agrees with their high correlations (both higher than 0.95). By contrast, the correlation between FLUXNET EF and global EF was only 0.681, which means that EF had higher uncertainty. At CH-Lae, the global EF matched well with the FLUXNET EF, and we obtained a satisfactory LUE and GPP, while US-SRG failed to produce a very good LUE or GPP because of the uncertainty of EF. Therefore, we think EF is a key error source for the global LUE and GPP.
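The comparison between site-level and gridded inputs can be sketched with a simple Pearson correlation. This is a minimal example assuming the two series are already aligned in time; the function name and the sample values are hypothetical and are not data from this study.

```python
import numpy as np

def pearson_r(site_series, global_series):
    """Pearson correlation of two aligned series, ignoring missing pairs."""
    a = np.asarray(site_series, dtype=float)
    b = np.asarray(global_series, dtype=float)
    ok = np.isfinite(a) & np.isfinite(b)
    if ok.sum() < 2:
        return float("nan")  # not enough valid pairs
    return float(np.corrcoef(a[ok], b[ok])[0, 1])

# Made-up values standing in for FLUXNET EF and the gridded EF at one site.
site_ef = [0.42, 0.55, np.nan, 0.61, 0.38]
grid_ef = [0.40, 0.50, 0.47, 0.70, 0.45]
r_ef = pearson_r(site_ef, grid_ef)
```

Applying the same function to PAR, Tmean and EF gives a quick ranking of which gridded input departs most from the site measurements.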

Error Analysis
For most vegetation types, the relationships between FLUXNET and estimated GPP were stronger than those for LUE. This problem was particularly noticeable in CRO and EBF, with the LUE R² at less than 0.05 and the GPP R² at 0.46 and 0.50, respectively. The main causes might come from the process used to calculate LUE. According to Formula (2), the precision of site-scale LUE relies on FLUXNET GPP, incident shortwave radiation (PAR) and MODIS FPAR. We regarded the measured FLUXNET data as reliable; therefore, the uncertainty was mainly determined by FPAR. In some pixels, MODIS FPAR was very low (because of clouds, inconsistent landcover, or system errors), which would result in an overestimation of LUE. Therefore, the weak relationships between the FLUXNET and estimated LUE originated not only from errors in the estimates but also from errors in the FLUXNET LUE. Figure 13 shows the LUE and GPP curves and 5000 m × 5000 m land surface images from Google Maps at five sites, with which we can analyze the different effects of homogeneous and heterogeneous land surfaces. US-WCr was covered by DBF, US-Me2 was covered by ENF, AU-Gin was covered by WSA, and DE-Kli and CH-Oe2 were covered by CRO (Table 4). At the first three sites, the land surface was homogeneous within an area of 5000 m × 5000 m, and the estimated LUE and GPP curves agreed well with the FLUXNET LUE and GPP. However, most cropland sites were heterogeneous within the same area. DE-Kli and CH-Oe2 showed a single vegetation type within 500 m × 500 m of land. However, they contained different types of land cover, including croplands, forests, built-up lands and water, within 5000 m × 5000 m areas, where the LUE varied greatly [5]. Furthermore, crops can be divided into C3 and C4 plants, which produce different LUE [6]. Therefore, the complicated circumstances around croplands resulted in low-quality estimates. In addition, misclassification induced errors in the estimated LUE and GPP. We used the MODIS landcover product in this study.
Some sites were misclassified, such as US-Los and US-Tw4 (Figure 14), because of inconsistent spatial resolutions and classification errors. These two sites were recorded as grassland in the FLUXNET dataset, while the corresponding pixel was cropland in the MODIS landcover data. In the parameterization approaches, we set a different maximum LUE for each vegetation type. These pixels were wrongly treated as cropland rather than grassland, and the maximum LUE of grassland was lower than that of cropland. Consequently, the LUE and GPP were overestimated.
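The sensitivity of site-scale LUE to FPAR noted above can be illustrated with a short sketch. It assumes that Formula (2) takes the usual form LUE = GPP / (PAR × FPAR); the function name and the numbers are illustrative only and are not measurements from any site.

```python
def site_lue(gpp, par, fpar):
    """LUE (gC/MJ) from GPP (gC m-2 d-1), PAR (MJ m-2 d-1) and FPAR (0-1).

    Assumed form of Formula (2): LUE = GPP / (PAR * FPAR).
    """
    apar = par * fpar  # absorbed PAR
    if apar <= 0.0:
        return float("nan")  # LUE is undefined without absorbed radiation
    return gpp / apar

# Same GPP and PAR, but FPAR halved (e.g. a cloud-contaminated pixel):
lue_clear = site_lue(6.0, 10.0, 0.6)
lue_cloudy = site_lue(6.0, 10.0, 0.3)
```

Halving FPAR doubles the derived LUE for the same measured GPP, so a spuriously low FPAR directly inflates the reference LUE against which the models are validated.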

Conclusions
In this study, we collected discrete FLUXNET eddy-covariance and meteorological data and spatially continuous MODIS, GLASS and ERA-Interim data to estimate the global LUE and GPP. The SCE-UA optimization method was highly efficient at finding the global optimum under nonlinear constraints and did not depend on the initial values of the model. The Cubist regression tree approach provided a powerful tool with which to upscale site-observed fluxes to a larger scale with satellite-derived parameters and other explanatory variables. We established three LUE-based GPP approaches to assess their performances at both the site and the global scales. The method of obtaining the LUE was based on (1) a parameterization approach without the CI, (2) a parameterization approach with the CI, and (3) a Cubist regression tree approach.
By validating with FLUXNET measurements at the site scale, we obtained the following:
(1) The Cubist regression approach performed better than the parameterization approaches in estimating LUE and GPP.
(2) The three approaches all underestimated the LUE, especially when the FLUXNET LUE exceeded 2.0 gC/MJ. However, the underestimation problem was alleviated for GPP.
However, when applying these models to the global LUE and GPP in 2014, we found the following:
(1) The LUE and GPP estimated by the three approaches were reliable. The parameterization approach with the CI produced the most satisfactory result, closely followed by the parameterization approach without the CI, while the Cubist regression approach produced the least satisfactory result.
(2) The accuracy of GPP was higher than that of LUE for all types of vegetation.
(3) The LUE distributions showed some variations in different seasons, but vegetation had the highest LUE at approximately 1.5 gC/MJ for the entire year in equatorial regions (South America, middle Africa and Southeast Asia).
(4) The three approaches produced roughly consistent global annual GPP values, ranging from 109.23 to 120.65 Pg/yr.
In conclusion, our results suggest that the parameterization approaches are robust when extrapolated to the global scale, with the approach with the CI performing slightly better than that without the CI. By contrast, the Cubist regression tree approach produced LUE and GPP with lower accuracy at the global scale, even though it performed best in the site-scale validation.
Author Contributions: M.W. and R.S. conceived and designed the work; M.W. analyzed the data and wrote the manuscript; R.S. provided ideas and modified the manuscript; A.Z. processed some experimental data; Z.X. provided data and suggestions. All authors have read and agreed to the published version of the manuscript.