Estimating Global Gross Primary Production from Sun-Induced Chlorophyll Fluorescence Data and Auxiliary Information Using Machine Learning Methods

: The gross primary production (GPP) is important for regulating the global carbon cycle and climate change. Recent studies have shown that sun-induced chlorophyll ﬂuorescence (SIF) is highly advantageous regarding GPP monitoring. However, using SIF to estimate GPP on a global scale is limited by the lack of a stable SIF-GPP relationship. Here, we estimated global monthly GPP at 0.05 ◦ spatial resolution for the period 2001–2017, using the global OCO-2-based SIF product (GOSIF) and other auxiliary data. Large amounts of ﬂux tower data are not available to the public and the available data is not evenly distributed globally and has a smaller measured footprint than the GOSIF data. This makes it difﬁcult to use the ﬂux tower GPP directly as an input to the model. Our strategy is to scale in situ measurements using two moderate-resolution satellite GPP products (MODIS and GLASS). Speciﬁcally, these two satellite GPP products were calibrated and eventually integrated by in situ measurements (FLUXNET2015 dataset, 83 sites), which was then used to train a machine learning model (GBRT) that performed the best among ﬁve evaluated models. The GPP estimates from GOSIF were highly accurate coefﬁcient of determination (R 2 ) = 0.58, root mean square error (RMSE) = 2.74 g C · m − 2 , bias = –0.34 g C · m − 2 ) as validated by in situ measurements, and exhibited reasonable spatial and seasonal variations on a global scale. Our method requires fewer input variables and has higher computational efﬁciency than other satellite GPP estimation methods. Satellite-based SIF data provide a unique opportunity for more accurate, near real-time GPP mapping in the future.


Introduction
Plant photosynthesis is an important biochemical process in terrestrial ecosystems, it regulates gas exchange between the atmosphere and the biosphere and effectively mitigates global warming [1][2][3]. Accordingly, being able to accurately estimate the amount of carbon dioxide that is fixed by terrestrial vegetation is critical to understanding the underlying mechanisms of ecosystem-climate interactions [4,5]. Gross primary production (GPP), which measures the total amount of carbon dioxide absorbed by plants through photosynthesis, is an important part of quantifying carbon sinks in terrestrial ecosystems [6][7][8][9][10]. It is necessary to grasp the temporal and spatial dynamics of terrestrial ecosystem GPP on both regional and global scales. This will help to correctly evaluate terrestrial carbon uptake, monitor crop yields, and investigate the impacts of climate change and human activities on ecosystems [11][12][13][14][15].
With the development and application of quantitative remote sensing technologies, estimating GPP via remote sensing is becoming increasingly effective [16]. At present, Here, PAR is the photosynthetically active radiation received by the plant canopy, fPAR is the proportion of PAR absorbed by plant canopies, and ε p is the light energy efficiency of photosynthesis. These indicators are, however, often disturbed by the environment. fPAR is usually based on vegetation index [22,23] or fPAR remote sensing products [24]. These indicators are often disturbed by the environment. When vegetation is under stress (for example by extreme temperatures or limited water), they are not sensitive to rapid changes in plant photosynthetic status [25,26]. At the same time, ε p exhibits greater variability in a single biological community [27,28]. (3) Process-based models involve mechanisms of plant ecology as series of nonlinear equations to represent the atmosphere-vegetation-soil system and associated fluxes simulating plant growth. These are characterized by complex interactions between vegetation and atmospheric processes, such as photosynthesis, respiration, transpiration, and evaporation [29][30][31][32]. However, many of the model parameters are difficult to obtain, and the estimation of GPP is still subject to significant uncertainties. Currently, data-driven models are considered to be the most accurate for estimating GPP. However, owing to the limited number of flux sites and their uneven distribution, global GPP spatial and temporal patterns cannot be obtained directly through the spatial expansion method.
Chlorophyll fluorescence, which is a byproduct of photosynthesis, responds immediately to disturbances in environmental conditions, such as light and water. Therefore, it can directly reflect the photosynthetic activity of plants, compared to the reflectance-based vegetation index that is traditionally used for vegetation remote sensing applications. Furthermore, it can be used as a remote sensing index for estimating photosynthetic energy conversion and carbon absorption [7,[33][34][35]. In recent years, chlorophyll fluorescence analysis has been shown to be an effective method to monitor the photosynthesis of vegetation from space [25,33,[36][37][38]. It can rapidly evaluate the growth status of plants, and its coupling with physiological processes is more closely related to dynamic changes in physiological functions and light energy utilization than the normalized difference vegetation index (NDVI) or enhanced vegetation index [39,40]. Recent studies have shown that suninduced chlorophyll fluorescence (SIF) has great advantages regarding GPP monitoring, and can better track the seasonal and inter-annual changes of GPP for different plant functional types [9,25,[41][42][43][44]. There are two approaches to estimating GPP based on SIF: one is to establish a direct empirical linear model of the two [7,9,[45][46][47][48][49][50][51], and the other is based on the Soil-Canopy-Observation of Photosynthesis and the Energy Balance (SCOPE) model [3,[52][53][54]. The latter, however, has a large number of input parameters, which limits its application on a large scale. Studies have shown the relationship between SIF and GPP, and have used this relationship to prove the ability of SIF to estimate GPP [9,35,55]. However, species-specific physical and physiological traits mean that the relationships Remote Sens. 2021, 13,963 3 of 20 between ecosystem GPP and SIF are biome-dependent [7,33,56]. Meanwhile, most studies in this field have been based on space-borne and airborne technologies, which ignore the influence of canopy structure on the relationship between SIF and GPP. This makes it difficult to explain clearly with a simple SIF-GPP linear relationship [57][58][59][60]. Furthermore, it is difficult to obtain high-quality SIF data with appropriate spatial and temporal resolutions, and it is difficult to quantitatively assess the factors that influence the relationship between SIF and GPP. Although SIF and GPP have a significant relationship at the regional scale, the mechanisms that drive and determine the relationship between them, reflect complex underlying physiological processes, which prohibit the use of SIF for estimating GPP across larger scales.
The purpose of this study was to estimate global GPP from the global, OCO-2-based SIF product (GOSIF) and other auxiliary data using machine learning techniques. In situ GPP measurements from flux towers are small in volume, not evenly distributed globally and has a smaller measured footprint than the GOSIF data (0.05 • ). It cannot be used directly for model building. Here, in situ measurements were scaled through two fine-resolution (500 m) satellite GPP products: Moderate Resolution Imaging Spectroradiometer (MODIS) and Global LAnd Surface Satellite (GLASS).

Methodology
The GOSIF product was generated using a data-driven approach by training OCO-2 SIF data, the MODIS-derived vegetation index, and climate variables based on the reanalysis method by Li and Xiao [55]. Thus, the current short data record of OCO-2 SIF (from September 2014) was extended to a longer time period (from March 2000), and the sparse footprint was extended to the global scale (0.05 • ; eight-day) [55]. SIF estimates showed strong correlation with GPP data from 91 EC flux sites (R 2 = 0.73, p < 0.001). More details on the GOSIF product are described in Li and Xiao [55]. These advantages suggest that GOSIF data may be more flexible and widely applicable to assessing terrestrial photosynthesis and ecosystem functions. The GOSIF data used here are available online at http://globalecology.unh.edu (accessed on 1 January 2021). Monthly SIF data were downloaded from this source.

GPP Dataset
Three GPP datasets were used in this study-a flux tower dataset and two GPP product datasets. The FLUXNET2015 dataset provides ecosystem-scale data on CO 2 , water, and energy exchange between the biosphere and the atmosphere, in addition to other meteorological and biological measurements from 212 sites around the globe (over 1500 site-years, up to and including the year 2014) [61]. The FLUXNET2015 dataset can be downloaded at https://fluxnet.org/data/fluxnet2015-dataset/ (accessed on 1 January 2021); Data quality control and processing for this dataset have been improved compared to previous versions. We used the monthly GPP product (GPP_NT_CUT_REF) of the GOSIF grid cell flux sites, which has a dominant vegetation coverage type of over 60%.
The first GPP product was the MODIS GPP product (MOD17A2H Version 6), which was obtained from the Distributed Active Archive Center (DAAC) of the U.S. National Aeronautics and Space Administration (NASA) (https://lpdaac.usgs.gov/products/mod1 7a2hv006/ (accessed on 1 January 2021)) [24]. It was generated using an LUE approach with a spatial resolution of 500 m and an eight-day interval. We also used the GPP product from the GLASS products suite [62]. It has a spatial resolution of 500 m and a temporal resolution of eight days. The algorithm originates from the Eddy Covariance Light Use Efficiency (EC-LUE) model, and it integrates eight LUE models that are widely used across the world [63,64]. The GLASS data are available online at http://glass-product.bnu.edu.cn/ (accessed on 1 January 2021). They have been extensively validated and are in agreement with FLUXNET observations and also been proven to be a reliable long-term estimate for global GPP [65,66].

Land Cover Dataset
The MODIS Land Cover Type Version 6 product is used in this study. It was derived using supervised classifications of MODIS Terra and Aqua reflectance data [67][68][69]. Additional post-processing was then performed, which incorporates prior knowledge and ancillary information, to further refine specific classes [69]. In this study, we used the International Geosphere-Biosphere Programme (IGBP) classification on annual scales of 500 m (MCD12Q1) and 0.05 • (MCD12C1). The primary land cover scheme identifies 17 classes defined by the IGBP, including 11 natural vegetation classes, three human-altered classes, and three non-vegetated classes. Each pixel represents the type of annual dominant ground within a grid cell.

Near-Infrared Radiance of Vegetation (NIRv)
NIRv has been suggested as the effective substitution of satellite SIF through theoretical derivations and radiative transfer simulations [70,71]. It is the product of the NDVI and near-infrared reflectance; it can be used as an approximation of the canopy-escaping probability of emitted SIF. NIRv can explain most of the variation in GPP on monthly and annual scales and has been successfully used to estimate global GPP [70,72]. Badgley [72] and Wang [73] also suggested that using satellite NIRv to estimate global GPP does not need additional information on environmental conditions. It can also be used to convert observed SIF recorded at a given angle to hemispherical SIF emissions, which are better correlated to GPP [71,74]. In this study, we used NIRv at a spatial resolution of 0.05 • and a monthly temporal resolution; it was calculated using surface reflectance data from MODIS data (MCD43C4) [47,75]: where ρ nir and ρ red are the reflectance acquired in the near-infrared (841-876 nm) and red (620-670 nm) portions of the electromagnetic spectrum, respectively [76]. The constant (0.08) was subtracted to reduce the impact of bare soil [77,78].

Digital Elevation Model
We used topographic data from the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010). GMTED2010 is a global continental-wide elevation dataset produced by the U.S. Geological Survey (USGS) and the U.S. National Geospatial-Intelligence Agency (NGA). It features improved vertical accuracy compared to the GTOPO30 dataset. It has several resolution levels: 30-arc-second (1 km), 15-arc-second (450 m), and 7.5-arc-second (225 m). The data can be accessed via the Earth Explorer site: https://earthexplorer. usgs.gov/ (accessed on 1 January 2021). To match the spatial resolution of GOSIF data, the GMTED2010 1 km data were resampled to a 0.05 • spatial resolution.

Methodology
The schematic representation of the estimated GPP based on GOSIF SIF products and auxiliary data is shown in Figure 1.
The first step comprised data pre-processing. The data sources used in this study were not identical, and they featured different spatial and temporal resolutions. Therefore, to reduce the uncertainty caused by geographic location, all remote sensing data were converted to the same geographical projection (WGS84), and the data were resampled to the same spatial resolution as GOSIF (0.05 • ) using the bilinear interpolation method. In addition, to ensure the accuracy of machine learning methods during modeling, GPP ground observations should be used as the input variable in the training set of the machine learning model. However, the data volume derived from flux towers is small, the spatial scale of flux Remote Sens. 2021, 13, 963 5 of 20 towers is small, the underlying ground surface is complex, and the vegetation types are diverse. As they are affected by spatial heterogeneity, it is difficult to directly apply the observations from the GPP flux towers to the regional scale. Therefore, we have calibrated the GPP products (MODIS GPP and GLASS GPP) through multiple linear regression methods, using flux observations from sites with different vegetation types. Thus, we have obtained new GPP datasets as "reference values" for the training set of the machine learning model. The first step comprised data pre-processing. The data sources used in this study were not identical, and they featured different spatial and temporal resolutions. Therefore, to reduce the uncertainty caused by geographic location, all remote sensing data were converted to the same geographical projection (WGS84), and the data were resampled to the same spatial resolution as GOSIF (0.05°) using the bilinear interpolation method. In addition, to ensure the accuracy of machine learning methods during modeling, GPP ground observations should be used as the input variable in the training set of the machine learning model. However, the data volume derived from flux towers is small, the spatial scale of flux towers is small, the underlying ground surface is complex, and the vegetation types are diverse. As they are affected by spatial heterogeneity, it is difficult to directly apply the observations from the GPP flux towers to the regional scale. Therefore, we have calibrated the GPP products (MODIS GPP and GLASS GPP) through multiple linear regression methods, using flux observations from sites with different vegetation types. Thus, we have obtained new GPP datasets as "reference values" for the training set of the machine learning model.
The second step entailed building the model. In recent years, SIF and NIRv provide alternative approaches to estimate global GPP [45,70,73]. Meanwhile, precipitation and temperature are important for plant photosynthesis. Thus, we added SIF, DEM, land cover, NIRv, month, latitude, and longitude as explanatory variables. To ensure the robustness of the model, we randomly sampled the calibrated GPP datasets in 2001, 2004, 2007, 2010, and 2013, and added the explanatory variables. Then, the generated dataset was divided into a training dataset (70% of the generated dataset) and a validation dataset (30% of the generated dataset). Random forest (RF) is an integrated machine learning algorithm that integrates multiple weak classifiers (decision trees) [79]. Multivariate adap- The second step entailed building the model. In recent years, SIF and NIRv provide alternative approaches to estimate global GPP [45,70,73]. Meanwhile, precipitation and temperature are important for plant photosynthesis. Thus, we added SIF, DEM, land cover, NIRv, month, latitude, and longitude as explanatory variables. To ensure the robustness of the model, we randomly sampled the calibrated GPP datasets in 2001, 2004, 2007, 2010, and 2013, and added the explanatory variables. Then, the generated dataset was divided into a training dataset (70% of the generated dataset) and a validation dataset (30% of the generated dataset). Random forest (RF) is an integrated machine learning algorithm that integrates multiple weak classifiers (decision trees) [79]. Multivariate adaptive regression splines (MARS) is an automatic and adaptive nonparametric regression algorithm, which attempts to build a nonlinear regression model by fitting the weighted sum of multivariate spline basis functions [80]. The gradient boosting regression trees (GBRT) method proposed by Friedman is an iterative decision tree, which combines several weak classifiers in different proportions to form a strong classifier by additive model and forward distribution algorithm, and has been applied in remote sensing [81,82]. Light gradient boosting machine (LightTGBM) uses a decision tree-based learning algorithm that enables fast, efficient, parallel learning and large-scale data processing [83]. Long short-term memory (LSTM), a kind of neural network used for processing sequence data, is a special kind of recurrent neural network (RNN) [84]. In this study, global GPP estimation models were established using random forest (RF), MARS, GBRT, LightGBM, and LSTM algorithms using a training dataset.
Subsequently, the validation dataset was used to validate these five models. The accuracies of models established by these algorithms were compared, and the optimal model and explanatory variables were selected to estimate the long-term global GPP.
Based on the five model results for the RF, MARS, GBRT, LightGBM, and LSTM algorithms, we compared two statistical indicators: coefficient of determination (R 2 ) and root-mean-square error (RMSE) (see Appendix A for details). By comparing the accuracy of the model training dataset with the validation dataset and the efficiency of the model, we found that the GBRT model was more suitable for estimating global GPP based on GOSIF, and so we used this model to estimate global GPP from 2001 to 2017.

Validation against Site GPP
We established linear correlations between the SIF-estimated GPP, MODIS GPP, and GLASS GPP (GPP SIF, GPP MOD , and GPP GLASS , respectively) and the flux tower GPP observations (GPP EC ) for each vegetation type from 2001 to 2014, as shown in Table 1. For different vegetation types, GPP SIF (R 2 = 0.16-0.78, RMSE = 1.50-4.72 g C·m −2 ·day −1 , bias = -1.77-1.43 g C·m −2 ·day −1 ) performed well compared to GPP MODIS (R 2 = 0.10-0.75, RMSE = 1.12-5.21 g C·m −2 ·day −1 , bias = −2.14-0.54 g C·m −2 ·day −1 ) and GPP GLASS (R 2 = 0.11-0.72, RMSE = 1.31-4.92 gC·m −2 ·day −1 , bias = −1.14-2.23 g C·m −2 ·day −1 ). The poor performances of the different vegetation types may have been because of the small number of sites, or because of insufficient accuracy within the GPP products (e.g., the quality of the global land cover product). It is worth noting that the R 2 , RMSE, and the bias of wetlands (WET) did not perform well, meaning that more work is needed before the GPP of this vegetation type can be estimated using GOSIF datasets. The coefficients of determination of the three remote sensing products and GPP EC were small for evergreen broadleaf forests (EBF), savanna (SAV), and woody savannas (WSA), but were slightly higher for deciduous broadleaf forests (DBF), grasslands (GRA), and open shrublands (OSH). This indicates that the vegetation type had a great impact on the GPP estimations of remote sensing products. Referring to Table 1, the RMSE of the GPP MODIS -GPP EC model was the largest among the ten vegetation types. Overall, GPP SIF, GPP MOD , and GPP GLASS underestimated the GPP for croplands (CRO), DBF, evergreen needleleaf forests (ENF), mixed forests (MF), SAV, and WSA, whereas they overestimated GPP for OSH. More than half of the GPP values were underestimated by GPP MOD . MODIS GPP underestimated vegetation GPP, which has also been shown consistently in other studies [85,86]. For all sites, the correlations of GPP SIF -GPP EC , GPP MODIS -GPP EC , and GPP GLASS -GPP EC were similar, but GPP SIF -GPP EC had the lowest RMSE and bias. To some extent, GOSIF demonstrated the ability to estimate GPP, and SIF outperformed the LUE model with regards to estimating GPP. The poor performances of GPP MODIS and GPP GLASS may have been because MODIS and GLASS used the LUE model to calculate GPP. ε p is a key parameter within the LUE model, but it exhibits great variability within a single biological community. In the LUE model, ε p is often set as a fixed parameter according to a specific vegetation type, leading to great uncertainty in the estimation of GPP. Moreover, the vegetation index is not sensitive to changes in plant photosynthetic status when vegetation is under stress (e.g., due to temperature extremes or limited water). This could also lead to inaccuracies in the model. In contrast, SIF is very sensitive to plant growth status.
Scatter plots between GPP MOD and GPP EC , and scatter plots between GPP GLASS and GPP EC , and scatter plots between GPP SIF and GPP EC were analyzed in 2014, as shown in Figure 2. R 2 between GPP MOD and GPP EC was 0.59, RMSE was 2.74 g C·m −2 ·day −1 , and Bias was -1.17 g C·m −2 ·day −1 . As for the GLASS GPP, R 2 was 0.59, RMSE was 2.49 g C·m −2 ·day −1 , and Bias was -0.49 g C·m −2 ·day −1 . R 2 between GPP SIF and GPP EC could be as high as 0.64, RMSE was 2.33 g C·m −2 ·day −1 , and Bias was -0.35 g C·m −2 ·day −1 . We found that RMSEs of GPP GLASS -GPP EC and GPP SIF -GPP EC in 2014 was marginally smaller than RMSE of GPP MOD -GPP EC in 2014. Therefore, we found that GPP SIF achieved the highest precision in 2014. Scatter plots between GPPMOD and GPPEC, and scatter plots between GPPGLASS and GPPEC, and scatter plots between GPPSIF and GPPEC were analyzed in 2014, as shown in Figure 2. R 2 between GPPMOD and GPPEC was 0.59, RMSE was 2.74 g C·m -2 ·day -1 , and Bias was -1.17 g C·m -2 ·day -1 . As for the GLASS GPP, R 2 was 0.59, RMSE was 2.49 g C·m -2 ·day -1 , and Bias was -0.49 g C·m -2 ·day -1 . R 2 between GPPSIF and GPPEC could be as high as 0.64, RMSE was 2.33 g C·m -2 ·day -1 , and Bias was -0.35 g C·m -2 ·day -1 . We found that RMSEs of GPPGLASS-GPPEC and GPPSIF-GPPEC in 2014 was marginally smaller than RMSE of GPPMOD-GPPEC in 2014. Therefore, we found that GPPSIF achieved the highest precision in 2014.

Spatial Patterns of Estimated GPP
We generated 0.05 • GPP SIF data from 2001 to 2017, and maps of global GPP in 2014, as shown in Figure 3. Maps of other years are given in Appendix B. The absence of input data at high latitudes led us to mask the seasonal results at high latitudes when comparing the spatial patterns and seasonal variations of the three products. Compared with GPP MOD and GPP GLASS , our GPP SIF exhibited reasonable spatial and seasonal variation on a global scale. Comparing the three GPP products for different seasons in 2014 revealed that GPP SIF exhibited a high level of agreement with GPP GLASS and a low level of agreement with the GPP MOD . The seasonal variation observed in GPP MOD was not as great as that in the other two GPP products. In addition, GPP GLASS and GPP SIF exhibited greater seasonal changes, especially in tropical regions. GPP MOD was visually different from the other two products in tropical regions.
In December, January, and February (DJF), global GPP was mainly concentrated in the range of 0 to 300 g C·m −2 , and the highest GPP (GPP > 200 g C·m −2 ) were found in humid tropical areas such as Amazonia, Central Africa, and Southeast Asia. In the Northern Hemisphere, GPP above 30 • was generally below 100 g C·m −2 . In March, April, and May (MAM), global GPP was mainly between 0and 500 g C·m −2 . The GPP values of North America, Asia, and Europe ranged between 100 and 300 g C·m −2 , whereas most of Australia was below 100 g C·m −2 . The highest GPP (GPP > 300 g C·m −2 ) was found in Amazonia, Central Africa, and Southeast Asia. In June, July, and August (JJA), global GPP was mainly about 400 g C·m −2 . The highest GPP were mainly between 500 and 600 g C·m −2 , and was found in the Amazon. The GPP of North America, Europe, and Asia were mainly between 300 and 500 g C·m −2 . Within this, GPP was lower in some high-altitude regions, such as the Iranian plateau, the Mongolian plateau, and the Rocky Mountains. In September, October, and November (SON), high GPP values ranged between 400 and 600 g C·m −2 , and were distributed in the Amazon and Southeast Asia. Across the rest of the world, GPP was mainly between 100 and 300 g C·m −2 at this time. The seasonal pattern of global GPP revealed that the regions with the highest GPP were mainly in humid tropical areas such as the Amazon, Central Africa, and Southeast Asia, where both the temperature and moisture requirements for photosynthesis are adequately satisfied. Vegetation productivity in temperate regions was at an intermediate level, and vegetation productivity in cold and higher altitude regions was the lowest because temperature and or precipitation in these regions were limited.

Spatial Patterns of Estimated GPP
We generated 0.05° GPPSIF data from 2001 to 2017, and maps of global GPP in 2014, as shown in Figure 3. Maps of other years are given in Appendix B. The absence of input data at high latitudes led us to mask the seasonal results at high latitudes when comparing the spatial patterns and seasonal variations of the three products. Compared with GPPMOD and GPPGLASS, our GPPSIF exhibited reasonable spatial and seasonal variation on a global scale. Comparing the three GPP products for different seasons in 2014 revealed that GPPSIF exhibited a high level of agreement with GPPGLASS and a low level of agreement with the GPPMOD. The seasonal variation observed in GPPMOD was not as great as that in the other two GPP products. In addition, GPPGLASS and GPPSIF exhibited greater seasonal changes, especially in tropical regions. GPPMOD was visually different from the other two products in tropical regions. In December, January, and February (DJF), global GPP was mainly concentrated in the range of 0 to 300 g C·m -2 , and the highest GPP (GPP > 200 g C·m -2 ) were found in humid tropical areas such as Amazonia, Central Africa, and Southeast Asia. In the Northern Hemisphere, GPP above 30° was generally below 100 g C·m -2 . In March, April, and May (MAM), global GPP was mainly between 0and 500 g C·m -2 . The GPP values of North The GPP of different vegetation coverage types in each season in 2014 is shown in Figure 4 (note missing high latitude values). The GPP was highest in JJA, which was also related to the amount of data (GOSIF had fewer missing values at high latitudes for JJA). EBF had high GPP throughout the year, whereas GPP was always low for CSH, OSH, and WET. Comparing the statistics of the three GPP products for different vegetation coverage types, the barplot from the four seasons shows that the GPP of MODIS was low compared to the other two GPP products. In DJF, the EBF GPP reached as high as 189.30 g C·m −2 (GPP SIF ), 157.89 g C·m −2 (GPP MODIS ), and 206.72 g C·m −2 (GPP GLASS ), with a difference of up to 48.83 g C·m −2 between the three products. Except for ENF, GPP MODIS was the lowest among the three products across different vegetation coverage types. The GPP SIF , GPP MODIS , and GPP GLASS of CRO were all almost equal (28.48 g C·m −2 , 27.60 g C·m −2 , and 29.49 g C·m −2 , respectively). In MAM, the GPP of ENF, EBF, DBF, Remote Sens. 2021, 13, 963 9 of 20 MF, SAV, and WSA all exceeded 100 g C·m −2 . GPP GLASS was also higher than GPP SIF regarding EBF while the GPP of MODIS was higher than the other two GPP products in CSH and OSH. The GPP SIF , GPP MODIS , and GPP GLASS of OSH were similar (22.00 g C·m −2 , 22.57 g C·m −2 , and 19.97 g C·m −2 , respectively). During JJA, the GPP of CSH, OSH, GRA, and WET did not exceed 150 g C·m −2 . Among the different vegetation coverage types, the GPP SIF and GPP GLASS were all almost equal in EBF, DBF, MF, CSH, and CRO. For EBF, the differences between the three products were the smallest in a year. The GPP SIF , GPP MODIS , and GPP GLASS of CSH were all almost equal (118.07 g C·m −2 , 114.89 g C·m −2 , and 118.88 g C·m −2 , respectively). During SON, the GPP of each vegetation coverage type decreased to a certain extent, compared with the previous season. The GPP of ENF, DNF, CSH, OSH, and WET were almost below 100 gc·m −2 . The GPP SIF and GPP GLASS of EBF, DBF, MF, CSH, OSH, and WET were all very similar, which was in clear contrast with GPP MODIS . For CSH and OSH, GPP SIF , GPP MODIS , and GPP GLASS were almost equal.  The differences between GPPSIF and those of the other products are shown in Figure  5. During DJF, the differences between GPPSIF and GPPMODIS were mainly between -100 and 200 g C·m -2 , with differences ranged between 0 and 100 g C·m -2 in Europe and Asia, ranged between -100 and 0 g C·m -2 in Southern Asia and Southeast Asia, ranged between 100 and 200 g C·m -2 in Central Africa and the Amazon. The differences between the GPPSIF and GPPGLASS were mainly between -100 and 100 g C·m -2 . In tropical regions such as Southeast Asia, Central Africa, and the Amazon, the differences were mainly between -100 and 0 g C·m -2 . In MAM, the differences between GPPSIF and GPPMODIS were mainly between 0 and 100 g C·m -2 , with a few regions in the Amazon, Central Africa, and Europe having differences ranged between 100 and 200 g C·m -2 ; in the northern part of Australia, it reached -100 g C·m -2 . The differences between GPPSIF and GPPGLASS were mainly between 0 and 100 g C·m -2 , with some parts of the Amazon, Central Africa, and Europe exhibiting differences around -100 g C·m -2 . In JJA, the differences between GPPSIF and GPPMODIS were mainly between -100 and 200 g C·m -2 , with differences between 100 and 200 g C·m -2 in the equatorial tropics. Some regions in Central Africa exhibited differences as high as 200 to 300 g C·m -2 . The differences between GPPSIF and GPPGLASS were mainly between -100 and 200 g C·m -2 ; Eastern Europe and North Asia exhibited differences in the range of 100 to 200 g C·m -2 , with one part of the central Amazon reaching a difference of 200 g C·m -2 . In SON, the difference between GPPSIF and GPPMODIS was mainly in the range of -100 to 200 g C·m -2 . The larger differences were mostly located near the equator. The differences between the GPPSIF and GPPGLASS were mainly between -100 and 100 g C·m -2 . It is noteworthy that some of the differences around the equator reached 200 g C·m -2 , especially in the Guiana Plateau in northern South America. It is not difficult to determine that many of the large differences were concentrated in tropical regions and regions with high altitudes. In humid tropical regions at lower latitudes, where the temperature and humidity requirements for photosynthesis are adequately satisfied, the vegetation coverage types are complex and unevenly distributed, resulting in large differences in GPP and inaccurate predictions. Areas at higher latitudes or altitudes, where there are limited temperature and precipitation and so photosynthesis is limited, have homogeneous vegetation distributions and are therefore less complex than areas in the tropics. As a result, the GPP of these areas is lower overall, and the differences in GPP between various vegetation The differences between GPP SIF and those of the other products are shown in Figure 5. During DJF, the differences between GPP SIF and GPP MODIS were mainly between -100 and 200 g C·m −2 , with differences ranged between 0 and 100 g C·m −2 in Europe and Asia, ranged between -100 and 0 g C·m −2 in Southern Asia and Southeast Asia, ranged between 100 and 200 g C·m −2 in Central Africa and the Amazon. The differences between the GPP SIF and GPP GLASS were mainly between -100 and 100 g C·m −2 . In tropical regions such as Southeast Asia, Central Africa, and the Amazon, the differences were mainly between -100 and 0 g C·m −2 . In MAM, the differences between GPP SIF and GPP MODIS were mainly between 0 and 100 g C·m −2 , with a few regions in the Amazon, Central Africa, and Europe having differences ranged between 100 and 200 g C·m −2 ; in the northern part of Australia, it reached -100 g C·m −2 . The differences between GPP SIF and GPP GLASS were mainly between 0 and 100 g C·m −2 , with some parts of the Amazon, Central Africa, and Europe exhibiting differences around -100 g C·m −2 . In JJA, the differences between GPP SIF and GPP MODIS were mainly between -100 and 200 g C·m −2 , with differences between 100 and 200 g C·m −2 in the equatorial tropics. Some regions in Central Africa exhibited differences as high as 200 to 300 g C·m −2 . The differences between GPP SIF and GPP GLASS were mainly between -100 and 200 g C·m −2 ; Eastern Europe and North Asia exhibited differences in the range of 100 to 200 g C·m −2 , with one part of the central Amazon reaching a difference of 200 g C·m −2 . In SON, the difference between GPP SIF and GPP MODIS was mainly in the range of -100 to 200 g C·m −2 . The larger differences were mostly located near the equator. The differences between the GPP SIF and GPP GLASS were mainly between -100 and 100 g C·m −2 . It is noteworthy that some of the differences around the equator reached 200 g C·m −2 , especially in the Guiana Plateau in northern South America. It is not difficult to determine that many of the large differences were concentrated in tropical regions and regions with high altitudes. In humid tropical regions at lower latitudes, where the temperature and humidity requirements for photosynthesis are adequately satisfied, the vegetation coverage types are complex and unevenly distributed, resulting in large differences in GPP and inaccurate predictions. Areas at higher latitudes or altitudes, where there are limited temperature and precipitation and so photosynthesis is limited, have homogeneous vegetation distributions and are therefore less complex than areas in the tropics. As a result, the GPP of these areas is lower overall, and the differences in GPP between various vegetation types are small. Therefore, predictions in these areas have appeared to be more accurate. The differences between the three GPP products for different vegetation coverage types in each season in 2014 are shown in Figure 6. Regardless of the season, the GPPSIF was essentially higher than those of the other two products. The GPP productivity of the three products differed, especially for EBF, DBF, WSA, SAV, WET, and CRO. GPPGLASS is higher than the other two GPP products for DJF, MAM, and SON for EBF. GPPGLASS is greater than that of the other two products for CRO and WET. GPPMODIS is always lower The differences between the three GPP products for different vegetation coverage types in each season in 2014 are shown in Figure 6. Regardless of the season, the GPP SIF was essentially higher than those of the other two products. The GPP productivity of the three products differed, especially for EBF, DBF, WSA, SAV, WET, and CRO. GPP GLASS is higher than the other two GPP products for DJF, MAM, and SON for EBF. GPP GLASS is greater than that of the other two products for CRO and WET. GPP MODIS is always lower than GPP except for CSH in MAM. CSH and OSH had the smallest differences because of their low global distribution.
Remote Sens. 2021, 13, x FOR PEER REVIEW 12 of 22 than GPP except for CSH in MAM. CSH and OSH had the smallest differences because of their low global distribution.

Relative Importance of Variables
In estimating global GPP, the variables used in the model included monthly SIF, Land cover, latitude, longitude, DEM, month, and NIRv. We discussed the relative importance of variables and the results as shown in Figure 7. Generally, SIF was the most important variable in estimating global GPP. The other variables contributed less in estimating global GPP in this study. This conclusion fully reflects the relationship between SIF and GPP, which is strongly consistent with most research results [45,48,53,54]. Other variables used in this study, such as latitude, longitude, DEM, month, etc., reflect precipitation and temperature information and are important for plant photosynthesis. SIF and NIRv provide alternative approaches to estimate global GPP [45,70,73], but as shown in Figure 7, SIF was far more important than NIRv. The advantages and differences of SIF and NIRV estimation of GPP need to be further studied. Although the contribution of these variables to the estimation of GPP is small, the best estimates can only be obtained by using all GPP-related variables.

Relative Importance of Variables
In estimating global GPP, the variables used in the model included monthly SIF, Land cover, latitude, longitude, DEM, month, and NIRv. We discussed the relative importance of variables and the results as shown in Figure 7. Generally, SIF was the most important variable in estimating global GPP. The other variables contributed less in estimating global GPP in this study. This conclusion fully reflects the relationship between SIF and GPP, which is strongly consistent with most research results [45,48,53,54]. Other variables used in this study, such as latitude, longitude, DEM, month, etc., reflect precipitation and temperature information and are important for plant photosynthesis. SIF and NIRv provide alternative approaches to estimate global GPP [45,70,73], but as shown in Figure 7, SIF was far more important than NIRv. The advantages and differences of SIF and NIR V estimation of GPP need to be further studied. Although the contribution of these variables to the estimation of GPP is small, the best estimates can only be obtained by using all GPP-related variables.

Limitations of the Current Study
The results of this study reveal that the GOSIF dataset has great potential for estimating GPP for different vegetation coverage types. Our method requires fewer input variables and has higher computational efficiency than other satellite GPP estimation methods. Satellite-based SIF data provide a unique opportunity for more accurate, near real-time GPP mapping in the future.
However, its suitability varies somewhat. One of the main reasons for this is that significant amounts of flux tower data are not available to the public, and the available data are not evenly distributed across the globe. Consequently, there are huge challenges for using GPPEC to validate SIF-estimated GPP or other GPP products on a global scale. In addition, inaccurate GPPEC observations can also cause errors. To some extent, GPPEC observations obtained by flux towers can simply be considered as "reference values". However, different flux towers have different qualities [85]. Moreover, in this study, we assumed that the vegetation coverage types around the flux towers were homogeneous and uneven. This assumption does not hold consistently. In general, we believe that the smaller the pixel corresponding to the product at the location of the flux tower, the more representative the reference value of the pixel likely is. Here, we used site data to calibrate the GPP products with a spatial resolution of 500 m, and then resampled to 0.05°; this will affect the accuracy of the results. Above all, the input variables of the model training set used in this study also introduced certain uncertainties.
The SIF signal exhibited a strong relationship with GPP, which can be explained mechanistically as follows. Based on the LUE model, GPP can be expressed as Equation (1).
SIF can be similarly conceptualized as a byproduct of photosynthesis, which can be expressed as [34]: where is the spectral wavelength, is the fluorescence quantum yield (i.e., the rate at which a plant absorbs photosynthetically effective radiation and re-emits fluorescence at the wavelength ), and is the ratio at which fluorescence emitted by chloroplasts can escape out of the canopy. Equations (1) and (3) can be combined to describe the theoretical formulation of the GPP-SIF relationship as follows:

Limitations of the Current Study
The results of this study reveal that the GOSIF dataset has great potential for estimating GPP for different vegetation coverage types. Our method requires fewer input variables and has higher computational efficiency than other satellite GPP estimation methods. Satellite-based SIF data provide a unique opportunity for more accurate, near real-time GPP mapping in the future.
However, its suitability varies somewhat. One of the main reasons for this is that significant amounts of flux tower data are not available to the public, and the available data are not evenly distributed across the globe. Consequently, there are huge challenges for using GPP EC to validate SIF-estimated GPP or other GPP products on a global scale. In addition, inaccurate GPP EC observations can also cause errors. To some extent, GPP EC observations obtained by flux towers can simply be considered as "reference values". However, different flux towers have different qualities [85]. Moreover, in this study, we assumed that the vegetation coverage types around the flux towers were homogeneous and uneven. This assumption does not hold consistently. In general, we believe that the smaller the pixel corresponding to the product at the location of the flux tower, the more representative the reference value of the pixel likely is. Here, we used site data to calibrate the GPP products with a spatial resolution of 500 m, and then resampled to 0.05 • ; this will affect the accuracy of the results. Above all, the input variables of the model training set used in this study also introduced certain uncertainties.
The SIF signal exhibited a strong relationship with GPP, which can be explained mechanistically as follows. Based on the LUE model, GPP can be expressed as Equation (1).
SIF can be similarly conceptualized as a byproduct of photosynthesis, which can be expressed as [34]: where λ is the spectral wavelength, ε F is the fluorescence quantum yield (i.e., the rate at which a plant absorbs photosynthetically effective radiation and re-emits fluorescence at the wavelength λ), and f esc (λ) is the ratio at which fluorescence emitted by chloroplasts can escape out of the canopy. Equations (1) and (3) can be combined to describe the theoretical formulation of the GPP-SIF relationship as follows: For the same vegetation type, f esc (λ) remains relatively constant, and if the ratio is constant then SIF(λ) should be a good constraint on the photosynthetic rate, regardless of changes in FPAR, PAR, or environmental stresses. Thus, for this study, the land cover type determined f esc (λ) to some extent, whereas NIRv was used as an approximation of the canopy-escaping probability of emitted SIF.
Over the years, many researches have demonstrated that SIF and GPP have a strong linear relationship compared with the traditional vegetation index [9,45,48]. However, GPP estimates by SIF have been limited by the spatial resolution of the satellite spatial resolution. GOSIF(0.05 • ) offers a finer spatial resolution than other SIF products (e.g., 0.5 • ), providing an unprecedented opportunity to explore the estimation of GPP on an ecosystem or global scale. However, the spatial resolution of GOSIF is currently far from sufficient. To date, there is no global high-precision SIF remote sensing product for estimating GPP on a large regional scale. Here, MODIS datasets, such as the Enhanced Vegetation Index (EVI) and PAR, were used to calculate GOSIF data, and then data were resampled to 0.05 • , which would have caused errors in the predictions. In addition to the uncertainty caused by additional model inputs, the quality of OCO-2 SIF also represented a source of uncertainty for GOSIF in this study. Currently, there are no satellites dedicated to SIF observation. Hence, it is necessary to obtain more accurate SIF datasets. The greatest limitation of the current research is that the spatial resolution and temporal resolution of SIF data are too low to reflect the greatest advantage of SIF estimation of GPP: real-time estimation [85]. Additionally, the linear relationship between SIF and GPP for different vegetation types is different on different time scales [56,87].
Further to the above-stated issues, other input variables in the model and the number of training samples will also have an impact on the results of this study. We argue that the uncertainty of GPP can be estimated by remote sensing data across vegetation coverage types. The accuracy of the vegetation coverage type data has a significant impact on the accuracy of the LUE model, which is one of the advantages of using SIF to estimate GPP. But in this study, we also used the land cover types data [17]. Hence, producing accurate estimates of predictions of land cover types and NIRv are further issues that urgently need to be addressed when estimating GPP.

Conclusions
Most current research into sun-induced chlorophyll fluorescence (SIF)-estimated Gross primary production (GPP) focuses on flux towers and at small regional scales. GPP can be obtained based on the relationship between SIF and GPP, but there are still complex basic physiological processes that need to be constrained. Therefore, it is difficult to quantitatively evaluate the factors that affect the relationship between SIF and GPP.
In this study, global GPP was estimated from GOSIF SIF data using the gradient boosting regression trees (GBRT) model at a spatial resolution of 0.05 • . In this model, we also used land cover, DEM, NIRv, and other data as input variables while SIF was far more important than other variables. This is a new method for estimating GPP using SIF on a global scale. Subsequently, we evaluated the performance of three GPP products GPP SIF , GPP MODIS , and GPP GLASS , when estimating GPP from flux towers and for different vegetation types. Our results show that SIF can estimate global GPP very well, and the GPP estimated based on GOSIF is similar to the results of that derived from FLUXNET sites (R 2 = 0.58, RMSE = 2.74 g C·m −2 , bias = −0.34 g C·m −2 ). Furthermore, the GPP that we estimated using GOSIF demonstrated reasonable spatial and seasonal changes on a global scale. Although we have revealed that it is possible to estimate GPP from SIF at the satellite scale, challenges remain in estimating GPP directly from SIF at the regional or global scale. In the future, the accuracy of estimations of GPP using SIF could be improved by obtaining higher-quality SIF remote sensing datasets.

Conflicts of Interest:
The authors declare no conflict of interest.

Results of Five Algorithms on Validation Datasets
In this study, we used random forest (RF), multivariate adaptive regression splines (MARS), gradient boosting regression trees (GBRT), light gradient boosting machine (Light-GBM), and long short-term memory (LSTM) algorithms to build GPP estimation models. In addition, the testing dataset was a randomly selected 30% of the generated dataset used to validate these models. Based on the training and testing results for the five algorithms. The validation results for the five models are given in Table A1. We determined the optimal models by comparing their respective R 2 and root mean square error (RMSE) values. Based on a comparison of results, we found that the GBRT model was better than the other four models in this study. The R 2 values based on the GBRT model were higher than the other four models while the RMSE achieved by the GBRT model was also lower than others.