Generation of a Global Spatially Continuous TanSat Solar-Induced Chlorophyll Fluorescence Product by Considering the Impact of the Solar Radiation Intensity

Abstract: Solar-induced chlorophyll fluorescence (SIF) provides a new and direct way of monitoring photosynthetic activity. However, current SIF products are limited by low spatial resolution or sparse sampling. In this paper, we present a data-driven method of generating a global, spatially continuous TanSat SIF product. Firstly, the key explanatory variables for modelling canopy SIF were investigated using in-situ and satellite observations. According to theoretical and experimental analysis, the solar radiation intensity was found to be a dominant environmental driver of the SIF yield at both the canopy and global scales; this has, however, been neglected in previous research. The cosine of the solar zenith angle at noon (cos(SZA₀)), a proxy for solar radiation intensity, was found to be a dominant abiotic factor for the SIF yield. Next, a Random Forest (RF) approach was employed for SIF prediction based on Moderate Resolution Imaging Spectroradiometer (MODIS) visible-to-NIR reflectance data, the normalized difference vegetation index (NDVI), cos(SZA₀), and air temperature. The machine learning model performed well at predicting SIF, giving an R² of 0.73, an RMSE of 0.30 mW m⁻² nm⁻¹ sr⁻¹ and a bias of 0.22 mW m⁻² nm⁻¹ sr⁻¹ for 2018. If cos(SZA₀) was not included, the accuracy of the RF model decreased: the R² was then 0.65, the RMSE 0.34 mW m⁻² nm⁻¹ sr⁻¹ and the bias 0.26 mW m⁻² nm⁻¹ sr⁻¹, further verifying the importance of cos(SZA₀). Finally, the globally continuous TanSat SIF product was developed and compared to the TROPOspheric Monitoring Instrument (TROPOMI) SIF data. The results showed that the globally continuous TanSat SIF product agreed well with the TROPOMI SIF data, with an R² of 0.73. Thus, this paper presents an improved approach to modelling satellite SIF with better accuracy, and the study also generated a global, spatially continuous TanSat SIF product with a spatial resolution of 0.05°.


Introduction
Solar-induced chlorophyll fluorescence (SIF), which is emitted at wavelengths from 650 to 850 nm and has two peaks centered at 685 and 740 nm [1], is well known as a good proxy for photosynthetic activity [2,3]. Numerous studies have shown that the gross primary productivity (GPP) of vegetation can be directly estimated using satellite, airborne, in-situ and ground-level SIF data [4][5][6][7][8][9].
$$\mathrm{SIF} = \mathrm{PAR} \times f\mathrm{PAR} \times \mathrm{SIF}_{yield} \qquad (1)$$

where PAR is the downwelling photosynthetically active radiation, fPAR is the fraction of PAR absorbed by vegetation, and SIF_yield is the fluorescence quantum yield at the canopy level. The SIF contains plant physiological and structural information together with the mixed effects of vegetation biochemistry (i.e., pigments) and structure [26,27], whereas the reflectance provides only canopy structural and biochemical information. SIF_yield, which is influenced by vegetation biochemistry, contains plant physiological information, while fPAR, which can be estimated using reflectance data, mainly contains vegetation structural information. Zhang et al. [22] and Gentine and Alemohammad [28] constructed the relationship between reflectance and SIF, but reflectance contains only vegetation structural information. Other researchers have taken the land-cover type, Vapor Pressure Deficit (VPD), air temperature, Evapotranspiration (ET) and Normalized Difference Water Index (NDWI) into consideration [21,[29][30][31] in order to better model the SIF; however, these variables provide only part of the physiological information carried by the SIF. According to Gu et al. [32], SIF_yield is affected by NPQ and q_L, and the solar radiation intensity is a dominant factor for these two physiological variables. However, the solar radiation intensity has not been taken into account in recent prediction models [21,22,31], so the currently available continuous OCO-2 SIF products contain very limited physiological information.
The aims of this study were: (1) to investigate whether the solar radiation intensity, approximated by the cosine of the solar zenith angle at noon, can provide good physiological information on SIF; (2) to generate global, continuous TanSat SIF products with a spatial resolution of 0.05°; and (3) to investigate the accuracy of spatially continuous TanSat SIF products.

TanSat SIF Product
TanSat is China's first sun-synchronous satellite with the potential to retrieve SIF, and it has a revisit period of 16 days. The high spectral resolution (0.044 nm) and high SNR (360 at 15.2 mW m⁻² nm⁻¹ sr⁻¹) of the Atmospheric Carbon dioxide Grating Spectroradiometer (ACGS) in the region of the O2-A band (758-778 nm) make SIF retrieval possible. The TanSat SIF dataset used in this study (https://zenodo.org/record/3883434) was provided by Du et al. [20], and consisted of SIF retrievals (referenced at 758 and 771 nm) acquired at 13:30 local time. Singular vector decomposition (SVD) was used for SIF retrieval, and each footprint covers an area of 2 × 2 km. TanSat can work in nadir, sun-glint or target mode. However, there was a change from sun-glint mode to nadir mode from October 2018, so the available data for 2019 is equivalent to that for 2017 for the first five months only.
Previous studies have shown that the SIF at different wavelengths has a different sensitivity to stress, and to leaf and canopy reabsorption [33][34][35]. We chose the 758 nm fluorescence because it is closer to the peak band and has a stronger fluorescence signal. In addition, studies have shown that the SIF at 757 nm is better at predicting GPP than the 771 nm SIF [36]. Observations covering the period from February 2017 to August 2019 made at nadir were used in this study to avoid the potential impact of the viewing geometry, as the glint mode tends to underestimate SIF [7].

TROPOMI SIF Product
The TROPOMI instrument is carried on-board the Sentinel 5 Precursor (S-5P) satellite [37], which has a crossing time of 13:30 local solar time and a revisit time of 17 days. Ungridded SIF data with a footprint of 3.5 × 7 km are available from February 2018, and gridded data with a spatial resolution of 0.2° × 0.2° are available from March to October 2018 [18]. For each 0.2° grid cell, a sample was used to calculate the SIF if its footprint covered the center of the cell [18].
As the crossing time for TROPOMI is the same as that for TanSat and because it provides almost complete global coverage, 0.2° TROPOMI SIF data were used for comparison with the continuous TanSat SIF produced in this study.

MODIS NBAR Reflectance Product
The MODIS Collection 6 Nadir Bidirectional reflectance distribution Adjusted Reflectance (NBAR) product (MCD43C4) was employed to explain the SIF structural information. The BRDF product uses a combination of Terra and Aqua data over a period of 16 days to generate the highest possible quality data for each day. The NBAR product has a spatial resolution of 0.05° × 0.05° and computes the reflectance at a nadir viewing angle for each pixel at local solar noon, which should result in a more stable and consistent product [38].
As the swath width of the TanSat satellite is narrow and the selected samples were acquired in nadir mode, the use of NBAR was considered reasonable. Additionally, only four bands of MCD43C4 (red, NIR, blue and green) were used as explanatory remote sensing variables in the modelling of the SIF, which is the same as in Zhang et al. [22] and Gentine and Alemohammad [28]. These four bands contain most of the vegetation information and drive most of the variation in the SIF [39]. In order to reduce the uncertainty in modelling the SIF, only high-quality MCD43C4 data (Quality = 0) were used.

Air Temperature Datasets
Air temperature products can be used to provide physiological information for vegetation [31]. In this study, we used GLDAS/Noah L4 3-hourly 0.25-degree products (download address: http://disc.sci.gsfc.nasa.gov/hydrology/data-holdings). At the basin scale (14,700 km²), this highly accurate temperature product is suitable for water and energy-cycling studies [40,41]. The GLDAS data, acquired at 3-h intervals, have been produced since February 24, 2000; a detailed description can be found in Rodell et al. [42].
Remote Sens. 2020, 12, 2167

Tower-Based Datasets
We used both satellite and tower-based data to obtain the relationship between SIF_yield and cos(SZA₀). The SIF inversion was based mainly on the Fraunhofer Line Depth (FLD) and the Spectral Fitting Method (SFM). The tower-based cropland SIF dataset used in this study was acquired at the Huailai, Daman and Aurora sites [43][44][45]. The canopy SIF data from Huailai and Daman covered the period from July to September in both 2017 and 2018, and from April to October in 2019. The canopy SIF dataset for the maize field at the Aurora site covered the period from July 2018 to September 2018 (doi: 10.22002/D1.1226) and is available from the California Institute of Technology (https://data.caltech.edu/records/1226) [45]. Another canopy SIF dataset was acquired at the Niwot Ridge site from June 2017 to June 2018; the major species here is evergreen needleleaf. These needle-scale SIF data (doi: 10.22002/D1.1231) are available from the California Institute of Technology (https://data.caltech.edu/records/1231) [9].
In total we acquired 372 samples from the Huailai and Daman sites, 209 samples from the Niwot Ridge site and 57 samples from the Aurora site. Table 1 shows the detailed information of these sites.

According to Gu et al. [32], the SIF can be calculated as:

$$\mathrm{SIF} = \mathrm{PAR} \times \varepsilon \times \alpha_{gm} \times \Phi_{SIF} \qquad (2)$$

where ε is the canopy escape probability of SIF photons, α_gm is the fraction of PAR absorbed by green leaves, and Φ_SIF is the fluorescence quantum yield. This formula can be divided into three terms: PAR, εα_gm (the SIF variation related to the canopy structure) and Φ_SIF (the SIF variation related to plant physiological factors). Since the incoming PAR of instantaneous SIF differs according to the location, the SIF normalized by PAR has been widely used to describe the global photosynthetic capability [46,47]. For satellite SIF observations, which are available only under clear-sky conditions, the PAR can be assumed to be linearly correlated with the cosine of the solar zenith angle at the transit time of the satellite [8,11,48]. SIF_normalized is the SIF normalized to a fixed transit time, based on the central latitude and longitude of each grid cell (Equation (3)):

$$\mathrm{SIF}_{normalized} = \frac{\mathrm{SIF}}{\cos(\mathrm{SZA})} \qquad (3)$$

where cos(SZA) is the cosine of the solar zenith angle at the satellite transit time. Then Equation (2) can be written as:

$$\mathrm{SIF}_{normalized} \propto \varepsilon \times \alpha_{gm} \times \Phi_{SIF} \qquad (4)$$

At the far-red band, ε, the canopy escape probability of the SIF, is dominated by the canopy bi-directional reflectance (BRF) [25,[49][50][51]. α_gm, the FAPAR for green leaves, has been widely estimated using combinations of BRFs [52][53][54]. The second term in Equation (2) depends on the reflectance and vegetation indices (VIs), which are directly related to plant structure. Additionally, since the selected TanSat SIF was acquired in nadir mode, it was reasonable to assume that εα_gm, which is the contribution of the structural information to SIF, could be modeled using BRDF-corrected reflectance and VIs.
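The normalization by cos(SZA) described above can be illustrated with a minimal sketch. It assumes that the normalization is a plain division by cos(SZA) and that the solar declination follows Cooper's approximation; neither detail is stated in the text. The function names, the 13.5 h overpass default and the 0.05 cut-off are hypothetical and serve only as an illustration.

```python
import math

def solar_declination(day_of_year):
    """Approximate solar declination in radians (Cooper's formula, an assumption)."""
    return math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)

def cos_sza(lat_deg, day_of_year, local_solar_hour=12.0):
    """Cosine of the solar zenith angle; the default hour gives cos(SZA0) at noon."""
    lat = math.radians(lat_deg)
    decl = solar_declination(day_of_year)
    # The hour angle is 15 degrees per hour away from local solar noon.
    hour_angle = math.radians(15.0 * (local_solar_hour - 12.0))
    return (math.sin(lat) * math.sin(decl)
            + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))

def normalize_sif(sif, lat_deg, day_of_year, overpass_hour=13.5):
    """Divide SIF by cos(SZA) at the overpass time; return None for a very low sun."""
    c = cos_sza(lat_deg, day_of_year, overpass_hour)
    if c <= 0.05:  # hypothetical cut-off that avoids dividing by a near-zero value
        return None
    return sif / c

if __name__ == "__main__":
    print(cos_sza(40.0, 172))             # noon value near the June solstice
    print(normalize_sif(1.0, 40.0, 172))  # normalized SIF at a 13:30 overpass
```

Because the noon value depends only on latitude and day of year, cos(SZA₀) can be computed for every grid cell without any additional input data.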
Φ_SIF, which accounts for the contribution of plant physiological factors to SIF, can be determined using chlorophyll fluorescence parameters, which, in turn, can be obtained by pulse-amplitude-modulated (PAM) fluorometry [32]. Since the SIF from photosystem I (PSI) is low and stable, PSI was not taken into consideration when calculating Φ_SIF [32,33,55]:

$$\Phi_{SIF} = \frac{1 - \Phi_{PSIImax}}{(1 + k_{DF})\left[(1 + \mathrm{NPQ})(1 - \Phi_{PSIImax}) + q_L \Phi_{PSIImax}\right]} \qquad (5)$$

where k_DF is the ratio of thermal dissipation (k_D) to fluorescence emission (k_F), NPQ is the adjustable heat dissipation, q_L is the fraction of open photosystem II (PSII) reaction centers, and Φ_PSIImax is the maximum photochemical quantum yield of PSII.
In Equation (5), k_DF is always assumed to be constant as k_D and k_F are considered to be intrinsic physical properties of the chlorophyll molecule. Φ_PSIImax typically ranges from 0.8 to 0.83 and varies slightly under environmental stress [56]. Therefore, Φ_SIF mainly depends on NPQ and q_L. Numerous studies have shown that NPQ and q_L vary with external conditions and stresses, such as illumination, temperature, and water stress [32,57,58], and that illumination or PAR is the main driving factor for them [32]. In addition, in diurnal experiments carried out on sunny days and investigations into light response, NPQ and q_L have been shown to have a strong relationship with the illumination [32,58,59]. For satellite-based observations, only the SIF on sunny days is available, and so our aim was to produce a global SIF product for clear conditions. Therefore, the cosine of the solar zenith angle at noon (cos(SZA₀)) was used to represent the illumination conditions in this study and for simulating the fluorescence quantum yield (SIF_yield) at the canopy scale.
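The dependence of Φ_SIF on NPQ and q_L can be explored numerically. The sketch below assumes that Equation (5) takes the lake-model form written in the code comment, which follows from the definitions of k_DF, NPQ, q_L and Φ_PSIImax given above; the default k_DF = 19 and the function name are illustrative assumptions rather than values taken from this study.

```python
def phi_sif(npq, q_l, phi_psii_max=0.83, k_df=19.0):
    """Fluorescence quantum yield under an assumed lake-model form:

    phi = (1 - Pmax) / ((1 + k_DF) * ((1 + NPQ) * (1 - Pmax) + q_L * Pmax))

    k_df = 19.0 is an illustrative assumption, not a value from the text.
    """
    one_minus = 1.0 - phi_psii_max
    return one_minus / ((1.0 + k_df) * ((1.0 + npq) * one_minus + q_l * phi_psii_max))

if __name__ == "__main__":
    # Rising NPQ lowers the yield; closing reaction centers (lower q_L) raises it.
    for npq, q_l in [(0.5, 0.8), (1.0, 0.5), (2.0, 0.2)]:
        print(npq, q_l, round(phi_sif(npq, q_l), 4))
```

Because NPQ rises and q_L falls as illumination increases, the two terms in the denominator move in opposite directions, which is why the net response of Φ_SIF to light has to be established from observations.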
Therefore, BRDF-corrected reflectances (red, NIR, blue and green bands), NDVI, cos(SZA₀), and the air temperature were selected as explanatory variables for modelling satellite SIF_normalized signals. Compared to other studies [21,22,[29][30][31], it was hoped that adding cos(SZA₀) as a new explanatory variable might improve the modelling of the contribution of Φ_SIF to SIF, given the light response of NPQ and q_L to cos(SZA₀). The data-driven model used for reconstructing the normalized SIF was:

$$\mathrm{SIF}_{normalized} = f(R_s, \mathrm{NDVI}, \cos(\mathrm{SZA}_0), T) \qquad (6)$$

where NDVI is the normalized difference vegetation index, T is the air temperature, and R_s represents the four BRDF-corrected reflectances at the red, NIR, blue and green bands.

Random Forest Approach for SIF Modeling
Random forest has been widely applied in remote sensing applications such as classification and the estimation of high-density wetland biomass [60][61][62][63]. Random forest (RF) was first proposed by Leo Breiman [64] and is an ensemble method that uses multiple decision trees to train and predict samples. Compared with other widely used non-parametric algorithms, RF is insensitive to unbalanced distributions and the problem of missing data. Because of the random way it selects data for splitting each tree node, it is also less sensitive to the over-fitting problem [60,64]. It performs better for large, high-dimensional data sets and is more robust to noise and feature selection [34,[61][62][63]65]. Additionally, the prediction ability of Random Forest is resistant to the multicollinearity of the driving variables [66][67][68], so all the variables in Equation (2) were used to predict the spatially continuous TanSat SIF. The main parameters used in RF are the numbers of input prediction variables and decision trees. Gislason et al. [69] found that there was no obvious relationship between the accuracy and the number of selected prediction variables, and so used the default value of the square root of the total number of variables [62,70]. For the number of decision trees, Du et al. [71] found by developing 10 to 200 trees (at intervals of 10) that the accuracy was insensitive to the number of trees. In this study, 100 trees and six variables (three prediction variables were selected) were used in RF. The importance of each predictor was obtained from the percentage increase in the mean-squared error when the value of a particular variable was changed (while keeping the other values unchanged): this allowed us to determine which were the important variables [64].
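The importance measure described above, the percentage increase in mean-squared error when one variable is changed while the others are kept fixed, can be sketched as a permutation test. This is a generic illustration that works with any prediction function; it is not the RF implementation used in the study, and the function names, the number of repeats and the toy model in the usage example are hypothetical.

```python
import random

def mse(y_true, y_pred):
    """Mean-squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Percentage increase in MSE after shuffling each feature column in turn.

    predict: callable mapping one feature row (a list) to a prediction.
    rows:    list of feature rows (lists of equal length).
    """
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    if baseline == 0.0:
        raise ValueError("baseline MSE is zero; a percentage increase is undefined")
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            permuted = mse(targets, [predict(r) for r in shuffled])
            increases.append(100.0 * (permuted - baseline) / baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

if __name__ == "__main__":
    # Toy model that uses only the first feature, so the second scores zero.
    rows = [[float(i), float((i * 7) % 10)] for i in range(20)]
    targets = [2.0 * r[0] + 0.5 for r in rows]
    print(permutation_importance(lambda r: 2.0 * r[0], rows, targets))
```

A variable that the model does not rely on leaves the error unchanged when shuffled, whereas a dominant predictor produces a large increase.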
Here, a Random Forest (RF) approach was employed to model the normalized SIF, as in Equation (6): the flowchart for the corresponding procedure is illustrated in Figure 1. The input variables used were MCD43C4, and the vegetation index, air temperature and cos(SZA₀) datasets. Since the spatial resolution of MCD43C4 is 0.05°, the TanSat SIF needed to be aggregated to a resolution of 0.05° to maintain consistency. The SIF training samples were initially eliminated if there were fewer than 20 samples in a 0.05° cell between 10° S and 60° N. However, due to the low vegetation coverage in some areas, there were few valid data. Therefore, in order to keep the integrity of the global coverage of the samples, we relaxed this restriction and only eliminated grid cells that contained fewer than 5 samples. The final results were that there were 118,722 SIF grid cells for 2017, 187,077 cells for 2018, and 173,153 cells for 2019. The air temperature data were also resampled to 0.05 degrees using polynomial interpolation.
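The aggregation of footprint-level retrievals to 0.05° cells with a minimum sample count can be sketched as follows. The sketch assumes an unweighted mean per cell and a simple index derived from latitude and longitude; the function names and the input layout are hypothetical.

```python
import math
from collections import defaultdict

def grid_index(lat, lon, cell=0.05):
    """Row and column of the grid cell that contains the point (lat, lon)."""
    return (math.floor((lat + 90.0) / cell), math.floor((lon + 180.0) / cell))

def aggregate_footprints(footprints, cell=0.05, min_samples=5):
    """Average SIF per grid cell, dropping cells with fewer than min_samples.

    footprints: iterable of (lat, lon, sif) tuples.
    Returns a dict mapping (row, col) to the mean SIF of that cell.
    """
    buckets = defaultdict(list)
    for lat, lon, sif in footprints:
        buckets[grid_index(lat, lon, cell)].append(sif)
    return {key: sum(vals) / len(vals)
            for key, vals in buckets.items()
            if len(vals) >= min_samples}

if __name__ == "__main__":
    dense = [(10.01, 20.02, 1.0), (10.02, 20.02, 2.0), (10.03, 20.02, 3.0),
             (10.04, 20.02, 4.0), (10.025, 20.02, 5.0)]
    sparse = [(30.01, 40.01, 0.5)] * 4  # only four samples, so the cell is dropped
    print(aggregate_footprints(dense + sparse))
```

Lowering `min_samples` from 20 to 5 keeps more cells in sparsely sampled regions at the cost of noisier cell means.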
To validate the RF model, we randomly took 70% of the data as the training set and 30% as the validation set for each year. Three accuracy metrics, the coefficient of determination (R²), Relative Deviation (RD) and Root Mean Square Error (RMSE), were used to evaluate the performance of the RF model.
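The random 70/30 split and two of the accuracy metrics can be sketched as follows. The sketch assumes that R² is defined as one minus the ratio of the residual to the total sum of squares; the text does not give the formula, and it does not define the Relative Deviation, which is therefore left out. The function names and the seed are hypothetical.

```python
import math
import random

def split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle the indices 0..n-1 and split them into training and validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]

def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rmse(obs, pred):
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

if __name__ == "__main__":
    train, val = split_indices(10)
    print(len(train), len(val))
    print(r_squared([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
    print(rmse([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

Fixing the seed makes the split reproducible, which matters when models with and without a given predictor are compared on the same validation set.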

Relationship between cos(SZA₀) and Apparent SIF Yield
The tower-based and satellite experimental data were used to investigate the relationship between the apparent SIF_yield (calculated using the SIF observed at the top of the canopy) and cos(SZA₀). As illustrated in Figure 2, there was a seasonal change in the apparent SIF_yield, which was correlated with cos(SZA₀) in the maize crop and evergreen forest ecosystems. Therefore, it was concluded that cos(SZA₀), which is a proxy for the solar radiation intensity, could be used to describe the seasonal variation in the apparent SIF_yield.
From Figure 2a,c it can be seen that the value of apparent SIF_yield reaches its peak in July. As illustrated in Figure 2b,d, the apparent SIF_yield also increases as cos(SZA₀) increases at the crop sites (slope = 0.68, R² = 0.32) and the NR forest site (slope = 0.22, R² = 0.41). The different values of the slope may be due to the different types of ecosystem and number of samples. As has been widely observed previously, the value of apparent SIF_yield at the canopy level was higher for cropland than for forest [72,73].
The relationship between apparent SIF_yield and cos(SZA₀) was also investigated using the TROPOMI SIF data acquired from February 2018 to February 2019. As in-situ APAR data were not available for TROPOMI SIF, SIF_normalized (as defined in Equation (3)) for dense vegetation (NDVI > 0.85) was used as a proxy of the apparent SIF_yield. Only forest pixels lying within the area of dense vegetation (NDVI > 0.85) were selected. The dense vegetation regions at DOY 210 and 365 in 2018 are illustrated in Figure 3.
[Figure caption: Relationship between SIF_normalized (apparent SIF_yield within the dense vegetation area in TROPOMI, ignoring the canopy escape probability) and cos(SZA₀) according to TROPOMI for a dense forest region from February 2018 to February 2019. cos(SZA₀) is the cosine of the solar zenith angle at noon, which was used to represent the solar radiation intensity.]
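The slope and R² values reported for the relationship between apparent SIF yield and the noon solar zenith angle cosine correspond to a simple linear regression. A minimal sketch of ordinary least squares with one predictor is given below; the function name and the sample values are hypothetical, and the study does not state which fitting routine was used.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2

if __name__ == "__main__":
    cos_sza0 = [0.55, 0.65, 0.75, 0.85, 0.95]       # hypothetical noon values
    apparent_yield = [0.30, 0.41, 0.43, 0.52, 0.58]  # hypothetical yields
    print(linear_fit(cos_sza0, apparent_yield))
```

With a single predictor, R² equals the squared correlation coefficient, so a moderate R² alongside a clear positive slope indicates a consistent trend with considerable scatter around it.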

Performance of the Random Forest Model in SIF Prediction
To better understand the performance of explanatory variables for SIF_normalized, we calculated the importance of the variables in the model using RF; the results are shown in Figure 5. Figure 5 shows that cos(SZA₀) and the air temperature proved to be the most important input parameters for SIF_normalized; the green and NIR reflectance and NDVI were the next most important. Healthy plants have a high reflectivity in the NIR band. The results for the performance of RF with six explanatory variables are shown in Figure 6.
The results show that the RF model performed well at simulating TanSat SIF. A comparison between models that did and did not include cos(SZA₀) was also made. Taking the 2018 dataset as an example, the RF model gave a better accuracy if cos(SZA₀) was included, giving an R² of 0.72, an RMSE of 0.30 mW m⁻² nm⁻¹ sr⁻¹ and a bias of 0.22 mW m⁻² nm⁻¹ sr⁻¹; this was against an R² of 0.65, an RMSE of 0.34 mW m⁻² nm⁻¹ sr⁻¹ and a bias of 0.26 mW m⁻² nm⁻¹ sr⁻¹ without cos(SZA₀). In comparison, R² was improved by 0.07, RMSE was reduced by 0.04, and bias was reduced by 0.04.

Global Continuous TanSIF Product
Based on the RF model, a global, continuous SIF dataset was produced for the period 2017-2019; it included cos(SZA₀) at the transit time for the central latitude and longitude of each grid cell and was normalized to the same level (0.2° × 0.2°) as TROPOMI. This dataset had a spatial resolution of 0.05° and a temporal resolution of four days. Figure 7 shows the spatially continuous TanSat SIF products for July and December in 2018, together with the NDVI for comparison. Tropical rainforest areas had high SIF values in both July and December. In July, southeast and northeast China, the eastern United States, and southern Europe as well as southern Russia were hot spots of SIF, which agrees well with the distribution of dense vegetation.
Since the transit time of TanSat and TROPOMI was almost the same at about 13:00 local time, the continuous TanSat SIF data were also validated using TROPOMI SIF data. In order to reduce the impact of noise, 0.05-degree TanSat SIF was aggregated to 0.2 degrees so that it could be compared with 0.2-degree TROPOMI SIF, as produced by Koehler et al. [18]. The comparison is illustrated in Figure 8. Since TROPOMI SIF is retrieved at around 740 nm, it has a higher value than TanSat SIF, which is retrieved at 758 nm. R² was used to evaluate the consistency between the two data sets. The results show that the continuous TanSat SIF dataset agreed well with the TROPOMI SIF with an R² of 0.73, indicating that the SIF can be well modeled by integrating cos(SZA₀) with other metrics.
The values of TanSat SIF were all lower than 2.5 mW m⁻² µm⁻¹, whereas the maximum value of TROPOMI SIF reached around 6 mW m⁻² µm⁻¹. Compared with TROPOMI SIF at 740 nm, TanSat SIF at 758 nm was obviously lower in the high (>1 mW m⁻² µm⁻¹) fluorescence region but higher where the fluorescence was around 0.1-0.5 mW m⁻² µm⁻¹.
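The aggregation of the 0.05° product to the 0.2° TROPOMI grid amounts to averaging blocks of 4 × 4 cells. The sketch below assumes an unweighted mean that skips missing cells; the aggregation rule is not specified in the text, and the function name and the use of `None` for missing values are hypothetical.

```python
def block_mean(grid, factor=4):
    """Average non-missing values in factor x factor blocks of a 2-D grid.

    grid: list of equally long rows; None marks a missing cell.
    Rows or columns that do not fill a complete block are ignored.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    coarse = []
    for i in range(0, n_rows - n_rows % factor, factor):
        coarse_row = []
        for j in range(0, n_cols - n_cols % factor, factor):
            vals = [grid[a][b]
                    for a in range(i, i + factor)
                    for b in range(j, j + factor)
                    if grid[a][b] is not None]
            coarse_row.append(sum(vals) / len(vals) if vals else None)
        coarse.append(coarse_row)
    return coarse

if __name__ == "__main__":
    fine = [[1.0] * 4 + [2.0] * 4 for _ in range(4)]
    fine[0][4] = None  # one missing 0.05-degree cell in the right-hand block
    print(block_mean(fine))
```

Averaging 16 fine cells reduces random noise in each coarse value, which is the stated motivation for comparing the two products at 0.2°.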

Importance of Solar Radiation Intensity for Better SIF Modelling
The biggest problem in existing SIF prediction models is that the influence of Φ_SIF is not well considered. Φ_SIF is mainly determined by heat dissipation (NPQ) and the fraction of PSII reaction centers (q_L) [32]. NPQ increases as PAR increases, while q_L decreases with increased PAR [59,74,75]. Therefore, the solar radiation intensity should be taken into account for better SIF modelling. The relationship between cos(SZA) and PAR is approximately linear for clear-sky conditions. Based on the above relationship, we have drawn a conceptual diagram that briefly indicates the relationship between cos(SZA₀) and NPQ/q_L, as shown in Figure 9. For satellite observations, only retrievals under clear-sky conditions were available; therefore, in this study, the cosine of the solar zenith angle at noon (cos(SZA₀)) was used to represent the seasonal variation of solar radiation intensity.
The results presented in this paper confirm that cos (SZA0) can be used successfully as a proxy for solar radiation intensity to improve modelling based on satellite SIF data. A certain degree of correlation was found between cos (SZA0) and both the tower-based and the satellite apparent SIF yield: the apparent SIF yield increased as cos (SZA0) increased. Analysis of the importance of the different variables using the RF approach clearly showed that cos (SZA0) was the most important variable for modelling the normalized SIF (SIFnormalized). Our results are therefore consistent with those of Gu et al. [32], who emphasized that, in theory, the solar radiation intensity is the main factor affecting the apparent SIF yield.
The seasonal trend in SIF at the Niwot Ridge site, where the forest consists mainly of evergreen needleleaf trees, was investigated using both the continuous TanSat SIF and the tower-based SIF measurements made by Magney et al. [76], and was also compared with the tower-based NDVI observations, as illustrated in Figure 10. Here, we used the daily mean clearness index (CI) to identify clear-sky days (only days with CI > 0.6 were used) [77]. The daily SIF under clear-sky conditions was calculated only when more than three samples were available for that day. We found that the seasonal trend in the continuous TanSat SIF values was consistent with that observed in the tower-based SIF data. The SIF signal reached its peak in summer and remained low in winter. In the first growing season, the slight difference may have been caused by observation error. The low values of SIF for evergreen needleleaf forest in the winter season may be caused by snow cover [78]. Considering the in-situ APAR and the tower-based SIF under clear-sky conditions, the high values of TanSat SIF may be due to the fact that cloudy conditions were not considered in the model. The differences between TanSat SIF and tower-based SIF may also be due to differences in time scale and geographic coverage. The TanSat SIF dataset consisted of 0.05-degree instantaneous SIF values acquired at 13:30 local time, whereas the tower-based SIF values used were the daily averages for the station. The NDVI was relatively stable in all seasons, at around 0.8. Compared with the NDVI, the SIF was found to track seasonal changes in vegetation better, which indicates that the spatially continuous TanSat SIF contains more physiological information related to plant photosynthesis.
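The clear-sky screening of the tower-based data described above can be sketched as follows. This is a minimal illustration under stated assumptions: the input layout (a list of `(day, sif)` pairs plus a dictionary of daily mean CI values) and the function name `daily_clear_sky_sif` are hypothetical, and only the two thresholds (CI > 0.6, more than three samples per day) come from the text.

```python
from collections import defaultdict
from statistics import mean


def daily_clear_sky_sif(samples, ci_by_day, ci_threshold=0.6, min_samples=3):
    """Return {day: mean SIF} for clear-sky days with enough samples.

    samples   : iterable of (day, sif_value) pairs (hypothetical layout)
    ci_by_day : {day: daily mean clearness index}
    A day is kept only if its CI exceeds ci_threshold and it has
    more than min_samples SIF samples.
    """
    by_day = defaultdict(list)
    for day, sif in samples:
        by_day[day].append(sif)

    result = {}
    for day, values in by_day.items():
        # Days without a CI value are treated as not clear.
        is_clear = ci_by_day.get(day, 0.0) > ci_threshold
        if is_clear and len(values) > min_samples:
            result[day] = mean(values)
    return result


if __name__ == "__main__":
    demo = [("d1", 0.5), ("d1", 1.0), ("d1", 1.5), ("d1", 1.0), ("d2", 0.8)]
    print(daily_clear_sky_sif(demo, {"d1": 0.7, "d2": 0.9}))
```

In this sketch a day with exactly three samples is rejected, which follows the wording "more than three samples a day".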

Importance of Reflectance and NDVI
Recent studies have used reflectance or the NDVI to represent most of the vegetation information used in predicting SIF [21,22,28]. In this paper, the main parameter used to provide plant physiological information for the SIF was the solar radiation intensity (as discussed in Section 2.3.1), whereas the reflectance and NDVI were mainly used to provide information related to canopy structure, which was mainly represented by the fraction of PAR absorbed by green leaves in Equation (2).
The canopy structure influences the re-absorption, scattering and escape probability of emitted SIF photons [25,50,79], and the reflectance and NDVI can provide structural information [33,80]. The NDVI was designed to enhance the vegetation information contained in the canopy reflectance. However, the NDVI characteristic may not be learned well by the random forest machine-learning model from the red and near-infrared reflectances alone, even though the relationship between the NDVI and these two reflectances is completely deterministic [81,82]. Therefore, the NDVI and the red and near-infrared reflectances were all included in the random forest model in this study, although this may seem somewhat redundant.
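The deterministic relationship mentioned above is the standard definition NDVI = (NIR − red) / (NIR + red). The sketch below shows how the NDVI could be supplied as an explicit feature alongside the reflectances it is derived from. The function names and the ordering of the feature vector are assumptions made for illustration and are not taken from this study.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized difference vegetation index from red and near-infrared reflectance."""
    total = nir + red
    if total == 0:
        # Undefined when both reflectances are zero.
        return float("nan")
    return (nir - red) / total


def build_features(red, nir, blue, green, cos_sza0, air_temp):
    """Assemble one predictor row (hypothetical ordering).

    The NDVI is redundant with red and NIR in principle, but it is passed
    explicitly so that a tree-based model does not have to approximate the ratio.
    """
    return [red, nir, blue, green, ndvi(red, nir), cos_sza0, air_temp]


if __name__ == "__main__":
    print(build_features(0.05, 0.45, 0.03, 0.08, 0.9, 295.0))
```

Tree-based models split on one variable at a time, so a ratio of two inputs can only be approximated through many axis-aligned splits; providing the ratio directly is one common way around this.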
Gentine and Alemohammad [28] and Yu et al. [21] used seven MODIS reflectance bands to represent SIF information. It has been found that the first four bands (red, near-infrared, blue and green) can provide most of the vegetation information [39,83]. In our case, the improvement in R² achieved by using seven bands rather than four was less than 0.1, which is consistent with the results of Zhang et al. [22]. Therefore, only four MODIS reflectance bands were subsequently used in this study.
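Comparisons such as the four-band versus seven-band test rely on accuracy statistics computed from observed and predicted SIF. The sketch below shows one common way of computing R², RMSE and bias. It assumes that R² is the coefficient of determination and that bias is the mean of predicted minus observed values; the exact definitions used in this study are not stated in this section, and the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return (r2, rmse, bias) for paired observed and predicted values.

    r2   : coefficient of determination, 1 - SS_res / SS_tot (assumed definition)
    rmse : root-mean-square error
    bias : mean of (predicted - observed) (assumed definition)
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    bias = sum(p - o for o, p in zip(observed, predicted)) / n
    return r2, rmse, bias


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.5, 2.5, 3.5, 4.5]
    print(regression_metrics(obs, pred))
```

Running the same function on predictions from the four-band and seven-band models would give the difference in R² on a like-for-like basis.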
The near-infrared and red bands are most commonly used to represent vegetation structure and pigment concentration. The near-infrared band contains structural information, which influences the SIF escape probability (and also the NDVI) [50,84]. The near-infrared reflectance has a linear relationship with the canopy scattering scale, and has an inverse relationship with the leaf albedo and the amount of light intercepted by the canopy (the complement to the fraction of light that reaches the soil and does not interact with the canopy) [25,26,49]. Chlorophyll concentration has less impact on the escape ratio and reflectance for near-infrared SIF [85]. The red reflectance is strongly associated with pigments other than the xanthophyll carotenoids, and can provide information about absorption [83,86]. Red radiation is strongly absorbed by chlorophyll, and the absorption increases with chlorophyll concentration [26,87].
The blue and green bands also contain helpful information for predicting SIF. Firstly, reflectance is mainly used to provide information related to fPAR. PAR covers the spectral window from 400 to 700 nm, which includes the blue, green and red bands. This means that the red band alone cannot adequately represent fPAR-related information. Secondly, as carotenoids transfer energy to chlorophyll molecules during photosynthesis [76], the use of the blue and green bands can help in the measurement of carotenoid concentration and variation, which are related to the amount of absorption. The blue reflectance is strongly related to chlorophyll and carotenoid (including carotenes and xanthophylls) absorption, and the green reflectance is sensitive to the difference between two extreme situations of xanthophyll carotenoids with higher absorption [83,86]. Thirdly, the green reflectance changes slowly with the pigment pool size at the seasonal scale [63]. Additionally, the Photochemical Reflectance Index (PRI), which is calculated from the green reflectance, is related to canopy photosynthetic efficiency across different species and time scales [88,89]. Many researchers have shown that the PRI can represent NPQ [83,90,91].
Therefore, the BRDF-corrected reflectance (in the red, near-infrared, blue and green bands) and the NDVI are all important in predicting SIF, and also provide a certain amount of structural information.

Conclusions
In this paper, we have presented an improved approach to modelling satellite SIF by taking the influence of illumination on the SIF yield into account. The cosine of the solar zenith angle at noon (cos (SZA0)) was used to represent the illumination conditions and, together with the reflectance, NDVI, and air temperature, was used to generate a spatially continuous TanSat SIF product with a resolution of 0.05° for the period 2017-2019. The results showed that cos (SZA0) is an important factor affecting the SIF yield. It was also shown that, of the variables in the RF model, cos (SZA0) was the most important, and that the model accuracy could be significantly improved by including cos (SZA0) as an explanatory variable. The RF model also modelled the TanSat SIF well (R² = 0.74 for 2017, 0.74 for 2018, and 0.81 for 2019). The global, continuous TanSat SIF data were highly consistent with