Feasibility of Using MODIS Products to Simulate Sun-Induced Chlorophyll Fluorescence (SIF) in Boreal Forests

: Solar-induced chlorophyll fluorescence (SIF) is a novel approach to gain information about plant activity from remote sensing observations. However, there are currently no continuous SIF data produced at high spatial resolutions. Many previous studies have discussed the relationship between SIF and gross primary production (GPP) and showed a significant correlation between them, but few researchers have focused on forests, which are one the most important terrestrial ecosystems. This study takes Greater Khingan Mountains, a typical boreal forest in China, as an example to explore the feasibility of using MODerate resolution Imaging Spectroradiometer (MODIS) products and Orbiting Carbon Observatory-2 (OCO-2) SIF data to simulate continuous SIF at higher spatial resolutions. The results show that there is no significant correlation between SIF and MODIS GPP at a spatial resolution of 1 km; however, significant correlations between SIF and the enhanced vegetation index (EVI) were found during growing seasons. Furthermore, the broadleaf forest has a higher SIF than coniferous forest because of the difference in leaf and canopy bio-chemical and structural characteristic. When using MODIS EVI to model SIF, linear regression models show average performance (R 2 = 0.58, Root Mean Squared Error (RMSE) = 0.14 from Julian day 145 to 257) at a 16-day time scale. However, when using MODIS EVI and temperature, multiple regressions perform better (R 2 = 0.71, RMSE = 0.13 from Julian day 145 to 241). An important contribution of this paper is the analysis of the relationships between SIF and vegetation indices at different spatial resolutions and the finding that the relationships became closer with a decrease in spatial resolution. From this research, we conclude that the SIF of the boreal forest investigated can mainly be explained by EVI and air temperature. SIF and while for SIF simulation in boreal forests based on the enhanced vegetation index (EVI) and temperature.


Introduction
Forests are important terrestrial ecosystems which contribute more than 70% of the terrestrial carbon pool [1]. Monitoring the status of forests from space is important for forest management [2]. Remote sensing has provided many global vegetation indices which rely on canopy reflectance in different spectral channels and has been widely used in forest research [3][4][5][6]. Over the past 30+ years, vegetation remote sensing based on "greenness" vegetation indices has greatly improved our understanding of the biosphere at a large scale. However, the vegetation indices (VIs) only explore

Study Region
Greater Khingan Mountains are part of the coniferous forests of northern Eurasia and the largest virgin forest in China with an area of 8.3 × 10 6 hm 2 ( Figure 1). The direction of the main vein of Greater Khingan Mountains is NNE-SSW, and direction of the branch vein (the Yilehuli Mountain) is WWN-EES [36]. The region has a mountainous topography with elevation ranging from 260 m to 1700 m. The zonal vegetation in Greater Khingan Mountains is typical taiga dominated by Dahurian larch (Larix gmelinii), with areas of temperate and mixed broadleaf-conifer dominated by deciduous broadleaf birch (Betula platyphylla) and aspen (Populus davidiana and Populus suaveolens) [37].
The study region has cold temperate and continental monsoon climates with an annual average temperature of -2.8℃, mean annual maximum and minimum temperatures of 35℃ and -40℃, respectively, and mean annual precipitation of about 460 mm. The growing season in this area is relatively short (150 days), and the frost-free period is also relatively short (~80-100 days) [37]. The period between May 10 and September 14 (Julian period 145 to 257) was selected as the growing season to explore the possibility of using MODIS products to simulate SIF.
In this study, we aim to determine whether MODIS products can be used to estimate SIF in the boreal forest. The first step was to identify the forest. When compared MODIS NDVI with Landsat 8 Operational Land Imager (OLI) images, we found that when the normalized difference vegetation index (NDVI) in July is greater than 0.8, the land cover is forest. By using this threshold, the regional forest distribution of Greater Khingan Mountains was extracted to estimate SIF.

OCO-2 Data
The OCO-2 satellite, launched on July 2, 2014, is one of the National Aeronautics and Space Administration's (NASA) mission satellite designed to monitor CO2 concentrations in the atmosphere. It ascends across the equator at 1:30 Local Standard Time (LST) and has a 16-day re-visit time. Each second, OCO-2 acquires 24 data points with a ground pixel size of 1.3 km × 2.25 km.
A novel data product of OCO-2 is SIF, which was evaluated at 757 nm and 770.1 nm of the OCO-2 O2 A-band, and is considered an indicator of the plant functional status [12,28]. OCO-2 acquires data in three different observation modes: nadir, glint, and target. Nadir mode is used to view the ground directly below the spacecraft. Glint mode is used when tracking near the location with direct, reflected sunlight. Target mode collects a large number of measurements from calibration/validation sites [40].
In this study, cloud-free nadir observations of offset-corrected OCO-2 SIF (Version 8r) were used to explore the relationship between MODIS products and SIF at a regional scale. We averaged both OCO-2 SIF retrieval results (valid at 757 and 770.1 nm) after multiplying SIF at 770.1 with 1.7 to account for the decrease in the wavelength of SIF in this part of the spectrum, as documented in the OCO-2 Data Product User's Guide and Yu, et al. [41]. To make SIF data more reasonable, we deleted SIF values lower than 0.1 W m -2 µm -1 sr -1 .

MODIS Products
MODIS is a key instrument aboard the satellites Terra (which crosses the equator in the morning) and Aqua (which crosses the equator in the afternoon). Terra MODIS and Aqua MODIS sample the entirety of Earth's surface in a 2330-km swath, providing near-global daily coverage. MODIS acquires data in 36 spectral bands from 0.4 to 14.4 µm at three spatial resolutions: 250 m (bands 1 and 2), 500 m (band 3-7), and 1,000 m (band . It covers visible, near-infrared, shortwave infrared, and thermal-IR regions in the electro-magnetic spectrum [6,38,42]. Many MODIS products at different spatial and temporal scales have been released on an official website. They are freely downloadable and have been widely used in many studies. MODIS NDVI, EVI, leaf area index (LAI), fraction of absorbed photosynthetically active radiation (FAPAR), and GPP were used in this study. NDVI and EVI were extracted from MOD13A2, with 1 km spatial resolution and a 16-day composite. LAI and FAPAR, with a 500 m spatial resolution and 8-day composite, were extracted from MOD15A2H. GPP were extracted from MOD17A2H at a spatial resolution of 500 m and 8-day composite. All data were downloaded from https://ladsweb.modaps.eosdis.nasa.gov.

Meteorological Data
The meteorological data in this study contain two parts: one is MERRA-2, a reanalysis data product from NASA/GMAO (Global Modeling and Assimilation Office) with a resolution of 0.5° × 0.625°. Daily surface downwelling PAR data from MERRA-2 [43] were selected to analyze their relationship with SIF. The other meteorological data used in this study are temperature and precipitation data from the National Meteorological Information Center (NMIC, http://data.cma.cn/).

Data Processing
To maintain the concordance between MODIS and OCO-2 data, we selected a 16-day temporal resolution at a spatial resolution of 1 km. FAPAR, LAI, and GPP data were resampled to 1 km and compiled to 16 days using the maximum value composition (MVC) technique [44]. As mentioned above, the re-visit cycle of OCO-2 is 16 days, and point data were obtained during its clear-sky overpass. SIF data were also compiled in 16 days and grouped according to MODIS products. To avoid spatial autocorrelation, 200 SIF points were randomly selected every day for 16 days to calculate their relationship with vegetation. Squares with 1 km length that cover the study region were built, and then the mean value of SIF and the vegetation index values of NDVI, EVI, LAI, FAPAR, and GPP within each square were extracted.

Temporal Variation of SIF in the Large Study Area
We calculated the average SIF and EVI values of the study region for each of the 16 days from 2015 to 2018 ( Figure 2). During the beginning of the growing season, SIF increased from Julian day 145 (May 10-25) and reached its peak value on Julian day 209 (July 13-28), before subsequently decreasing. Autumn arrives early in the study region of Greater Khingan Mountains because of higher latitudes (at 46°-53° N), and on Julian day 257 (October 30-September 14) the SIF value was quite low. The SIF and EVI curves show similar change trend, but the peak value of SIF lags behind that of EVI. Figure 2 shows that the trend of the SIF values was quite similar to the plant growth cycle, and SIF has been advised in phenology studies.

SIF-EVI and SIF-GPP at a Regional Scale
To understand the relationships of SIF-EVI and SIF-GPP at the regional scale, the mean values of EVI, GPP, and SIF for the study region were calculated for each 16-day period, and a scatter plot was drawn ( Figure 3). Here, the MODIS GPP product was used to represent the true GPP value. The relationship between SIF and GPP (R 2 = 0.74, n = 8, P < 0.01) was closer than that between SIF and EVI (R 2 = 0.57, n = 8, P < 0.05). This shows that, at a regional scale, EVI and GPP had significant correlations with SIF. The slope between SIF and EVI was 2.90, and the value between SIF and GPP was 10.70. However, this result only reflects the relationships at a regional scale. To understand more accurate relationships between vegetation indices and SIF, a more detailed analysis at a local scale is needed.

EVI/GPP
two slope directions, Lingnan (the southern slope of the mountains) and Lingbei (the northern slope of the mountains) ( Figure 4). The broadleaf tree ratio in the south of Yilehuli Mountain is higher than that in the north [36], and a broadleaf tree usually has a higher LAI than a needle leaf tree. To understand the relationship between vegetation indices and SIF, vegetation index values corresponding to OCO-2 SIF points were extracted. Thanks to the highly efficient data acquisition techniques of OCO-2, there are thousands of SIF data points and density distributions over the course of 16 days. To eliminate the spatial autocorrelation in SIF, we randomly selected 200 points with point distances longer than 5 km (on some Julian days, the random points did not reach 200). Figure 4 shows an example of random point extraction on Julian day 2015193. SIF can reflect a part of actual photosynthesis. NDVI and EVI are the vegetation indices that reflect vegetation greenness, and EVI is widely used in forest regions because it is less affected by the saturation effect compared to the NDVI [33]. Significant correlations (P < 0.05) between SIF and EVI in the growing seasons in 2015 and 2016 were found ( Figure 5), which are different from the results of former studies and those at a regional scale. The correlation coefficients of SIF-NDVI, SIF-GPP, SIF-LAI, and SIF-FAPAR were also calculated, but no significant correlations were found.
The correlation coefficient between EVI and SIF changed over the course of 16 days; however, it steadily reached a peak value on Julian day 209 and subsequently decreased. This is similar to the pattern observed in Figure 2.

SIF-EVI Relationships at Different Spatial Resolutions
We examined the correlation between SIF and EVI at spatial resolutions of 5 km, 10 km, 20 km, 30 km, 40 km, and 50 km using the growing-season data from 2015 and 2016. The EVI was resampled to the corresponding spatial resolution, and average SIF values were obtained within each EVI pixel. From the result (Figure 6), we found that the R 2 values were different for different spatial resolutions, with a clear trend of lower spatial resolution with higher R 2 value. Between Julian days 177 and 241, R 2 values for the 20 km and 30 km spatial resolutions were higher than that for 10 km but lower than that of the 40 km spatial resolution. On Julian days 193 and 225, R 2 values of 20 km were higher than those of 30 km and on other days, R 2 values of 30 km were higher than those of 20 km.

The Relationships between Meteorological Factors and SIF
Some researchers agree that SIF is affected by meteorological factors such as precipitation, temperature, and PAR [28,[45][46][47]. We examined the relationships between SIF and meteorological factors on daily and 16-day time scale. Figure 7 shows the relationships between SIF and meteorological factors at 16 days and at a regional scale. Significant correlations between SIF and temperature and SIF and radiation (R 2 = 0.68, P < 0.001; R 2 = 0.49, P = 0.005, respectively) were found. Precipitation had no significant correlation with SIF at the regional scale which indicated that precipitation was not the factor restraining plant growth in Greater Khingan Mountains.
Daily mean meteorological parameters and SIF at the regional scale were also calculated, and scatter plots were drawn (Figure 8). Only temperature had a significant correlation with SIF (R 2 = 0.40, P < 0.001). Based on the significant correlations between SIF and temperature at both 16-day and daily time scales, we concluded that, at the regional scale, temperature may affect SIF via photosynthesis. To understand the relationship between temperature and SIF at a 16-day time scale, we downloaded the daily mean temperatures of 27 ground stations within and around the study region in 2015 and 2016. We used kriging interpolation to obtain spatial continuous maps of air temperature (Figure 9). The results are shown in Figure 10. There was a significant correlation between SIF and temperature in the growing season, except on Julian day 257 when vegetation became yellow, and when the chlorophyll content decreased in the autumn. In the growing season, canopy capacity and SIF increased with increasing air temperatures.

SIF Differences between Vegetation Types
To understand the SIF values between needle and broadleaf plants, we selected the highest 5% (among all 1519 SIF points) and lowest 5% (among all 1482 points) SIF values of each Julian day and analyzed the spatial distribution of high and low SIF values. We found that 77.75% (1181 out of 1519) of the highest SIF values and 69.23% (1026 out of 1482) of the lowest SIF values were distributed in the southern and northern Yilehuli Mountain, respectively ( Figure 11). Photosynthesis between coniferous and broad-leaved forests is different because of the differences in leaf characteristics and canopy structure [30,48]. The average temperature in Lingnan is higher than that in Lingbei because of the lower latitude and the southerly aspect, and these are other reasons for higher SIF. Therefore, SIF increased due to the increase in photosynthesis and temperature. Compared with coniferous forests, broad-leaved forests usually have a larger leaf area, higher rates of photosynthesis [48], and then higher GPP and SIF, which is an important conclusion of this research. This result is in good agreement with the findings of Rossini, et al. [30], who argued that different vegetation types are usually characterized by different SIF values. In general, the highest SIF values are found in croplands, followed by broadleaf and needleleaf forests [30]. Temperature and perpetually frozen soil in northern Greater Khingan Mountains may also be the reason for the lower SIF values. The northern Greater Khingan Mountains are a permafrost region [49] with poor plant growth and low photosynthetic activity.

The Different SIF-GPP Results between Our Study and Previous Research
Many researchers argue that SIF is sensitive to photosynthesis and closely related to GPP at canopy and ecosystem scales because both SIF and GPP depend on absorbed radiation [21,32]. However, the relationships are affected by many factors such as absorbed photosynthetically active radiation (APAR) [50], vegetation types [51] or canopy complexity [7]. SIF provides insight into GPP which helps us gain a more mechanistic understanding of the ecosystem carbon cycle [12,52]; however, the correlations are different under various conditions. Contrary to previous studies, we did not find any significant relationship between SIF and GPP due to spatial resolution, vegetation type, and GPP accuracy.
However, previous studies used coarse resolution or limited sampling points. When decreasing the spatial resolution of remote sensing data, vegetation indices only reflect the mean value of a region because of mixed pixels. Due to mixed pixels at coarse spatial resolutions, the true SIF values of each vegetation type cannot be characterized. Thus, the objective relationships between vegetation types and SIF cannot be obtained. To eliminate the effect of mixed pixels, we need to understand the correlations at fine spatial resolutions. As mentioned above, SIF reflects vegetation photosynthesis and differences in vegetation types. In our study, no significant correlations were found between SIF and MODIS GPP at a 1 km spatial resolution in each 16-day interval.
SIF and GPP are closely (indirectly) related through APAR which is affected by canopy structure, leaf biochemical parameters, sampling time, etc. Thus, in croplands, grasslands, and rice paddy fields, ecosystems mainly determined by the chlorophyll content, SIF and GPP were found to be closely related [45,51]. In a word, different vegetation types in previous studies is another reason for no significant SIF-GPP relationship.
The semiempirical MODIS GPP model (MOD17) relies on the combination of satellite-based estimates of the FAPAR and tabulated light use efficiency [21]. Due to the algorithm of MODIS GPP products, the accuracy was affected by the uncertainties from various upstream inputs, and the products have always been questionable [5,55]. GPP obtained from eddy covariance (EC, GPP flux tower) is considered the most accurate GPP data. However, a flux tower has a small sampling area (< 1 km 2 ) and is unevenly distributed [51]. Sun, et al. [54] demonstrated that SIF-GPPflux tower (R 2 = 0.99) was higher than SIF-GPPMODIS (R 2 = 0.90) because of a higher flux tower GPP accuracy in a forest based on a few samples. MODIS GPP accuracy was also the reason that the correlation of SIF-GPP was affected.
GPP can be simulated using SIF at both regional and point scales because of the close relationships between them, but SIF cannot be simulated by MODIS GPP data because of the uncertainty of MODIS GPP data. Figure 5 shows significant relationships of SIF-EVI over the course of 16 days and at a 1 km spatial resolution. Significant correlations were found from Julian day 145 to 241, and R 2 values of 0.30 on Julian day 145 increased to 0.52 on Julian day 193 and subsequently decreased to 0.24 on Julian day 241.

The Correlations of 16-Day SIF-EVI
Many researchers have studied the relationships between SIF and EVI. Guanter, et al. [21] found R 2 = 0.80 between GOSAT SIF and MODIS EVI. Wood, et al. [56] obtained a strong linear regression (R = 0.69) between MODIS EVI and SIF in a cropland at the OCO-2 footprint level. Li, et al. [11] found a significant correlation between OCO-2 SIF and MODIS EVI, which was slightly stronger than the relationship between OCO-2 SIF and flux-tower observed GPP. Using eddy flux tower data, Yang, et al. [45] observed strong linear relationships (R 2 = 0.76) between SIF and absorbed photosynthetically active radiation (APAR) which was quite similar to that between SIF and EVI. In this research, significant correlations between EVI and SIF were found at the point scale. However, due to saturation in forest regions, nonlinearity was apparent between NDVI and SIF. Figure 6 shows the SIF-EVI relationships at different spatial resolutions for forest ecosystems. It shows that with decreasing spatial resolution, R 2 increased. Higher correlations for each spatial resolution appeared on Julian day 193, 209, and 225 (July and August). Li and Xiao [28] and Duveiller and Cescatti [35] agreed that most ecosystems are highly heterogeneous at spatial scales larger than 5-10 km, which implies that the characteristic scale of most ecosystems is between 5-10 km. In this study, we found that similar R 2 values appeared at about 30 km, which indicated that the characteristic scale of SIF-EVI was about 30 km in the forest region studied. Figure 6 shows the obvious scale effect of SIF-EVI in forests.

Potential of Model SIF from MODIS and Meteorological Data
Simulated SIF results based on only EVI and EVI combined with temperature were compared. When simulated using only EVI, the EVI data of 2018 were selected, the liner regression model of Figure 5 was employed, and the simulated SIF of each 16-day interval (SIFModeled) was obtained. Taking OCO-2 of 2018 as the observed value, scatter plots of SIFModeled and SIFOCO-2 are shown in Figure  12 EVI, as an index of greenness, can reflect vegetation canopy and leaf area and explain a part of SIF. Li and Xiao [28] used MODIS EVI and meteorological factors to simulate SIF and found a similar conclusion that temperature affects plant photosynthesis and SIF. However, in boreal forests, we only found a significant correlation between SIF and temperature under similar meteorological conditions. As Figure 10 shows, SIF had a significant relationship with daily mean temperature at the 16-day time scale (except for Julian day 257). Multi-regression models were built using EVI and temperature (Table 1), and simulated results are shown in Figure 13. As there was no significant correlation between temperature and SIF on Julian day 257, SIF values from Julian day 145 to 241 were simulated. The scatter plot information between OCO-2 observed data and the multi-regression model simulated SIF is shown in Table 1. The R 2 values improved compared to Figure 12, suggesting that EVI and temperature explained a large proportion of the variation in SIF.  For the boreal forest, our predictive multi-regression model (R 2 = 0.71, RMSE = 0.13 W m −2 µm −1 sr −1 ) performed slightly better than that of Yu, et al. [41] (R 2 = 0.46, RMSE = 0.065 W m −2 µm −1 sr −1 ) and Li and Xiao [28] (R 2 = 0.67, RMSE = 0.07 W m −2 µm −1 sr −1 ). Obviously, our result had a higher R 2 and RMSE; a higher R 2 indicates higher simulation accuracy, but a larger RMSE means greater simulation error. In this study, the larger RMSE was the result of a small number of training samples. As mentioned above, to avoid spatial autocorrelation, 200 SIF points were randomly selected for each 16-day interval, and a small training sample has a larger prediction error [28,57].
All three studies used a similar spatial resolution of 0.05° or 1 km (but different data processing methods). The different simulation accuracies of the three studies were also due to different data processing: Yu, et al. [41] used the mean value of 5 points for the same land cover type to improve the accuracy of SIF; Li and Xiao [28] used EVI, PAR, vapor pressure deficit (VPD), and air temperature to model SIF, and both studies simulated SIF for all the land cover types. Our research focused on only the boreal forest, and we did not find any significant correlation between environmental conditions and SIF, except for air temperature; this study only used EVI and air temperature to simulate SIF at a 1 km spatial resolution.

Conclusions
SIF images will improve our understanding of carbon cycles in forest ecosystems. Due to complex vegetation canopy and topographic conditions, it is challenging to accurately model SIF in boreal forests. To obtain continuous SIF images of forests at finer spatial resolutions, MODIS EVI and meteorological temperature data were selected to build multi-regression models in the growing season of a boreal forest, and conclusions were drawn.
In the growing season, forest SIF increased from Julian day 145 to 209 and subsequently decreased until Julian day 257 which is in agreement with the plant-growing season. Broad-leaved forests usually have a higher SIF than needle forests because of a number of complex effects related to the leaf and canopy bio-chemical and structural variables, as well different efficiencies in using absorbed sunlight.
Contrary to previous research, no significant relationship between SIF and GPP was found in this study due to different spatial resolutions, GPP dataset accuracy, and vegetation types. A significant correlation was found between SIF and EVI, and the correlation coefficients depend on the spatial resolutions; they are inversely proportional to spatial resolutions. Relationships between SIF and temperature and precipitation and PAR on daily and 16-day time scale were also calculated, and a significant correlation between SIF and temperature was found at both daily and 16-day time scale.
Finally, spatially continuous maps of SIF with a high spatial resolution were computed by means of a multiple regression model between MODIS EVI and air temperature; satisfactory results were obtained (R 2 = 0.71, RMSE = 0.13 W m −2 µm −1 sr −1 ), i.e., better than only using MODIS EVI (R 2 = 0.58, RMSE = 0.14 W m −2 µm −1 sr −1 ), indicating that in forest regions, MODIS EVI and temperature could explain SIF.