Estimating Crop and Grass Productivity over the United States Using Satellite Solar-Induced Chlorophyll Fluorescence, Precipitation and Soil Moisture Data

This study investigates how gross primary production (GPP) estimates can be improved with the use of solar-induced chlorophyll fluorescence (SIF) based on the interdependence between SIF, precipitation, soil moisture and GPP itself. We have used multi-year datasets from Global Ozone Monitoring Experiment-2 (GOME-2), Tropical Rainfall Measuring Mission (TRMM), European Space Agency Climate Change Initiative Soil Moisture (ESA CCI SM), and FLUXNET observations from ten stations in the continental United States. We have employed a GPP quantification framework that makes use of two factors whose influence on the SIF–GPP relationship was not evaluated previously: differential plant sensitivity to water supply at different stages of its lifecycle, and spatial variability patterns in SIF that are in contrast to those of GPP, precipitation, and soil moisture. It was found that over the Great Plains and Texas, fluorescence emission levels lag behind precipitation events by about two weeks for grasses and four weeks for crops. The spatial variability of SIF and GPP is shown to be characterized by different patterns: SIF demonstrates less variation over the same spatial extent as compared to GPP, precipitation and soil moisture. Thus, using newly introduced SIF–precipitation lead–lag relationships, we estimate GPP using SIF, precipitation and soil moisture data for grasses and crops over the US by applying the multiple linear regression technique. Our GPP estimates capture the drought impact over the US better than those from Moderate Resolution Imaging Spectroradiometer (MODIS). During the drought year of 2011 over Texas, our GPP values show a decrease of 50–75 gC/m²/month relative to the normal yielding year of 2007. In 2012, a drought year over the Great Plains, we observe a significant reduction in GPP, as compared to 2007. Hence, estimating GPP using specific SIF–GPP relationships, and information on different plant functional types (PFTs) and their interactions with precipitation and soil moisture over the Great Plains and Texas regions can help produce more reasonable GPP estimates.


Introduction
Knowledge of how global vegetation takes up atmospheric carbon dioxide is crucial for understanding the Earth's carbon cycle processes. Gross primary production (GPP), which is equivalent to the amount of carbon fixed during photosynthesis, constitutes the largest global land carbon flux that maintains ecosystem functions such as growth and respiration [1][2][3]. GPP is also closely related to (e.g., pod development stage in soybean [41,42]) and response from the plant machinery, as reflected in photosynthesis and fluorescence levels.
In this study, we intended to explore and investigate the influence of soil moisture and precipitation along with SIF on GPP quantification. Accordingly, we needed a statistical tool to establish such relationships between dependent and independent parameters. Thus, this study aims to use multiple linear regression (MLR) analysis in a statistically justified manner to include SIF, precipitation and soil moisture for the purpose of GPP quantification. We proposed to use SIF-based GPP equations constrained by local conditions and formulated such equations to characterize crop and grass plant production over the contiguous US. Note that we did not estimate GPP for vegetation types other than crops and grasses, e.g., mixed forests along the East Coast or shrublands over the Southwestern US and other US regions [43,44]. We have also tested the relationships between GPP and precipitation, and soil moisture with the inclusion of a lead-lag effect. This effect is likely related to the fact that plant water demand and availability during the development stage influence its production at later stages of the lifecycle.

Observational Data
Global Ozone Monitoring Experiment-2 (GOME-2) terrestrial chlorophyll fluorescence data product is the primary dataset being used in this study. GOME-2 provides retrievals of SIF peaking at 740 nm, which are based on the measurements from a broader spectral range of 734-758 nm [45,46]. GOME-2 SIF data have been successfully used to obtain the plant functional states related to GPP [22,46-49]. The GOME-2 v25 level 3 dataset we used in this study covers the time period from 2007 to 2012 and has a spatial resolution of 0.5° × 0.5°.
Tropical Rainfall Measuring Mission (TRMM) 3B42 rainfall data were further used to quantify the relationship between SIF, precipitation, soil moisture, and GPP. TRMM 3B42 and 3B43 gridded estimates have a spatial resolution of 0.25° × 0.25° in the belt extending from 50°N to 50°S. We also used TRMM 3B43 monthly precipitation, which is based on 3-hourly rainfall estimates summed for the calendar month, with rain gauge data applied for large-scale bias adjustment. The TRMM 3B43 product was used to produce SIF predictions based on the dependence between SIF and precipitation.
In order to provide information on soil moisture impact on SIF and GPP, we have used the European Space Agency Climate Change Initiative Soil Moisture (ESA CCI SM) combined daily dataset at 0.25° × 0.25° spatial resolution. This dataset represents the most comprehensive global time series of satellite-based soil moisture applicable at up to the top 5 cm of the soil. The CCI Soil Moisture product combines passive Level 2 radiometer-based products from Scanning Multichannel Microwave Radiometer (SMMR), Special Sensor Microwave Imager (SSM/I), Tropical Rainfall Measuring Mission's Microwave Imager (TMI), and Advanced Microwave Scanning Radiometer (AMSR-E) with active scatterometer-based products from European Remote Sensing satellite (ERS-1/2) and Advanced Scatterometer (ASCAT). As shown in several studies [50][51][52][53][54], ESA CCI SM data have been successfully used for a variety of applications, e.g., studying land-atmosphere coupling dynamics.
We linked GPP from FLUXNET measurements at five crop (US-Ne3, US-ARM, US-Twt, US-Tw2, US-Tw3) and five grass stations (US-KFS, US-AR1, US-AR2, US-Cop, US-Wkg) (Table 1) to SIF, precipitation and soil moisture. FLUXNET data have been extensively used in various studies investigating processes related to the exchange of CO2 between the land surface and atmosphere [55][56][57][58][59]. It was previously demonstrated that ecosystem-level GPP can be accurately estimated from measurements of CO2 fluxes at eddy covariance towers [60]. FLUXNET GPP values, which are calculated as the difference between total ecosystem respiration and net ecosystem exchange, are obtained at a half-hourly step and expressed in µmol/m²/s; therefore, we have converted GPP measurements to gC/m²/day and gC/m²/month, the units used throughout this study. GPP data are also available from Moderate Resolution Imaging Spectroradiometer (MODIS), a key instrument onboard the Terra (originally EOS AM-1) satellite that started providing global GPP products in 2000 [61]. We used the monthly MOD17A2 data product available at 1 × 1 km spatial resolution as an observational reference to evaluate the performance of SIF-, precipitation- and soil moisture-based GPP estimates.
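The unit conversion mentioned above can be illustrated with a minimal sketch. It assumes that the tower flux is reported in µmol CO2/m²/s, that each mole of CO2 carries one mole of carbon (12.011 g), and that a day's half-hourly values are simply averaged; the function names are hypothetical and are not taken from the FLUXNET processing chain.

```python
# Assumed constants: molar mass of carbon and the number of seconds in a day.
MOLAR_MASS_C = 12.011      # g C per mol CO2 (one C atom per molecule)
SECONDS_PER_DAY = 86_400


def umol_to_gc_per_day(flux_umol_m2_s):
    """Convert a mean CO2 flux in umol/m^2/s to gC/m^2/day."""
    return flux_umol_m2_s * 1e-6 * MOLAR_MASS_C * SECONDS_PER_DAY


def daily_gpp(half_hourly_umol_m2_s):
    """Average one day's half-hourly GPP values, then convert to gC/m^2/day."""
    values = list(half_hourly_umol_m2_s)
    mean_flux = sum(values) / len(values)
    return umol_to_gc_per_day(mean_flux)


def monthly_gpp(daily_values_gc):
    """Sum daily GPP (gC/m^2/day) over a month to obtain gC/m^2/month."""
    return sum(daily_values_gc)
```

Under these assumptions, a constant flux of 1 µmol/m²/s corresponds to roughly 1.04 gC/m²/day.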
Based on the light use efficiency (LUE) concept [62], MODIS GPP is expressed as
$$\mathrm{GPP} = \mathrm{PAR} \times \mathrm{FPAR} \times \mathrm{LUE} \times f(\mathrm{VPD}) \times g(T_{\min})$$
where PAR is photosynthetically active radiation, FPAR is fractional absorption of PAR, LUE is the efficiency with which absorbed radiation is converted to fixed carbon, $f(\mathrm{VPD})$ is the scalar of daily vapor pressure deficit (VPD), $g(T_{\min})$ is the scalar of daily minimum air temperature ($T_{\min}$), and k is the extinction coefficient. Biome physiological parameters were specified with the use of a biome property look-up table (BPLUT), which was modified to agree with GPP derived from flux towers and synthesized net primary production [13].
While MODIS algorithm provides spatial patterns of GPP reasonably well and captures its temporal variability across various biome types [63], accurate GPP estimates over certain biomes are still difficult to achieve [64][65][66]. Recently, it was demonstrated that standard MOD17 GPP product substantially underestimates GPP over croplands [67][68][69], especially in summer, which is likely due to the prescribed LUE being too low [64,70]. One limitation of MOD17 product is related to the fact that it does not take into account the influence exerted on the photosynthetic capacity and consequently GPP estimation by leaf quality, which is linked to leaf chlorophyll and nitrogen content [71].
We have employed Breathing Earth System Simulator (BESS) GPP data as an additional observational reference. The BESS GPP dataset used in this study is a global 8-day 1-km product derived with the use of the simplified process-based BESS model, MODIS land and atmosphere data, reanalysis datasets and ancillary data sources [72]. BESS GPP product has demonstrated performance similar to that of MODIS GPP when compared against FLUXNET data, and better consistency with GPP modeled by MPI-BGC [73].
Normalized Difference Vegetation Index (NDVI) is also provided by the MODIS instrument. In this study, we used a 16-day composite MOD13A2 product at 1 × 1 km spatial resolution.
NDVI is an index calculated from the near-infrared and visible radiation reflected by vegetation
$$\mathrm{NDVI} = \frac{R_{\mathrm{NIR}} - R_{\mathrm{red}}}{R_{\mathrm{NIR}} + R_{\mathrm{red}}}$$
where $R_{\mathrm{NIR}}$ is the reflectance in the near-infrared (NIR) region of the spectrum, and $R_{\mathrm{red}}$ is the reflectance in the red range of the spectrum. NDVI values typically vary from −1 to 1, where zero indicates the absence of vegetation and values approaching 1 generally correspond to tropical rainforests with characteristic dense vegetation. As a measure of vegetation "greenness", NDVI has proved to be useful in monitoring seasonal and phenologic activity, as well as quantifying the duration of the growing season and leaf turnover, or 'dry-down', period [74]. A time integral of NDVI values over the growing season is highly correlated with net primary production (NPP) [75][76][77], and in some cases NDVI can be used as a surrogate estimator of NPP [78].
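As a small illustration of the NDVI definition, the sketch below computes the normalized difference of near-infrared and red reflectances for arrays of pixels. The function name and the choice to return NaN where both reflectances sum to zero are assumptions made for this example, not part of the MOD13A2 algorithm.

```python
import numpy as np


def ndvi(r_nir, r_red):
    """Return (R_NIR - R_red) / (R_NIR + R_red), NaN where the sum is zero."""
    r_nir = np.asarray(r_nir, dtype=float)
    r_red = np.asarray(r_red, dtype=float)
    total = r_nir + r_red
    result = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(r_nir - r_red, total, out=result, where=total != 0)
    return result
```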

Framework for Quantifying GPP, SIF, Precipitation and Soil Moisture Relationships
As stated in the introduction section, precipitation and soil moisture influence both SIF and GPP. Therefore, in order to avoid the replication of soil moisture and precipitation effects due to their multicollinearity, we have turned to multiple linear regression analysis.
We have used the multiple linear regression method to estimate the relative importance of fluorescence, precipitation and soil moisture for GPP quantification. Multiple linear regression is a technique widely applied in the atmospheric sciences [79][80][81]. Multiple regression is an extension of linear regression with two or more predictor variables. It is also known to take into account the interdependence between independent parameters by regulating the weights assigned to them. The coefficients of independent parameters can be high but negative depending on the relationships between them [81]. This is due to the fact that the parameters can have a strong correlation with GPP while being not totally independent of each other. MLR accounts for this interdependence so that the impacts from precipitation and soil moisture are not replicated in SIF and GPP.
The multiple linear regression equation is as follows
$$Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$$
where Y is the predicted or expected value of the dependent variable, $x_1$ through $x_p$ are p distinct independent, or predictor, variables, $b_0$ is the value of Y when all of the independent variables ($x_1$ through $x_p$) are equal to zero, and $b_1$ through $b_p$ are the estimated regression coefficients [82].
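A minimal sketch of how such an equation can be fitted by ordinary least squares is given here. It uses NumPy's `np.linalg.lstsq`; the function names `fit_mlr` and `predict_mlr` are hypothetical, and the sketch is not the specific software used in this study.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares estimate of [b0, b1, ..., bp] for Y = b0 + sum(b_i * x_i).

    X has shape (n_samples, p); y has shape (n_samples,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Evaluate the fitted equation for new predictor values."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]
```

In the setting of this study, the columns of `X` would hold SIF, precipitation and soil moisture, and `y` would hold FLUXNET GPP.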
To derive the equations, we used weekly averaged values of precipitation, SIF, soil moisture, and FLUXNET GPP in order to have a larger number of data points and thus achieve better statistical significance. The predictor contributing the most to the equation is chosen automatically first, and further predictors are then added until an added predictor is no longer statistically significant. The relationship between SIF and GPP is deemed a biome-dependent one [29,83]. Therefore, we have calculated the equations separately for grass and crop vegetation types using remotely sensed SIF, precipitation and soil moisture data along with station-based GPP measurements. Our goal was to derive an equation that best explains the relationships between the predictors (SIF, precipitation, and soil moisture) and the predictand (GPP) in terms of standard error analysis and explained variance. Starting from a subset of one predictor (here, SIF), we have extended our analysis using combinations of two and three different predictors (for example, SIF and precipitation) to identify a subset of independent variables that maximizes the explained variance in GPP.
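The comparison of predictor subsets by explained variance can be sketched as follows. The example fits every combination of the supplied predictors and ranks them; ranking by adjusted R² is an assumption made here so that larger subsets are not favoured automatically, and it stands in for the significance testing described above rather than reproducing it. All names are hypothetical.

```python
from itertools import combinations

import numpy as np


def explained_variance(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - residuals.var() / y.var()


def rank_subsets(predictors, y):
    """Rank all predictor subsets by adjusted R^2 (best first).

    predictors: dict mapping a name to a 1-D array of weekly values.
    Returns a list of (names, r2, adjusted_r2) tuples.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    names = list(predictors)
    results = []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            X = np.column_stack(
                [np.asarray(predictors[name], dtype=float) for name in subset]
            )
            r2 = explained_variance(X, y)
            adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
            results.append((subset, r2, adjusted))
    return sorted(results, key=lambda item: item[2], reverse=True)
```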
We used a fraction of our data to derive the MLR equations and the rest to validate them: we chose a combination of six FLUXNET stations with the longest records to derive the model and the remaining stations to validate it. This approach was chosen because not all the FLUXNET stations in this study have a record over the entire period of interest, i.e., 2007-2012; e.g., the Twitchell corn (US-Tw2) station has data available for a period of 2 years only. Using the explained variance, analysis of the error values, and the coefficient of correlation between predicted GPP and actual GPP, the equations were derived from the FLUXNET stations with C3 crop, C3 non-arctic grasses and C4 grasses.
While estimating GPP within each grid cell, we multiplied GPP for grass and crop by the corresponding plant functional type (PFT) fractions from the Community Land Model (CLM) to avoid an overestimation of predicted GPP. This is because the amount of rainfall inside a grid cell is shared by different PFTs, and the recorded SIF is a combination of SIF emitted from vegetation belonging to different plant types. Therefore, we multiplied the GPP equations by the grass and crop PFT fractions, since the differences between grass and crop responses to SIF and precipitation have already been accounted for by the different coefficients obtained in the equations based on 100% grass or crop stations, as stated above. It is necessary to emphasize that the PFT classification from the CLM [84,85], PFT percentage and spatial patterns have been used to estimate GPP over the continental US.
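The PFT weighting described above amounts to a fraction-weighted sum per grid cell. The sketch below assumes that the grass and crop GPP fields come from the respective MLR equations and that the PFT cover is expressed as fractions between 0 and 1 (CLM percentages divided by 100); the function name is hypothetical.

```python
import numpy as np


def grid_cell_gpp(gpp_grass, gpp_crop, frac_grass, frac_crop):
    """Combine per-PFT GPP estimates into a grid-cell total.

    gpp_grass, gpp_crop: GPP from the grass and crop equations (same units).
    frac_grass, frac_crop: PFT cover fractions in [0, 1] for the same cells.
    """
    gpp_grass = np.asarray(gpp_grass, dtype=float)
    gpp_crop = np.asarray(gpp_crop, dtype=float)
    frac_grass = np.asarray(frac_grass, dtype=float)
    frac_crop = np.asarray(frac_crop, dtype=float)
    # Each PFT contributes in proportion to the area it occupies in the cell.
    return gpp_grass * frac_grass + gpp_crop * frac_crop
```

For instance, a cell with 25% grass at 100 gC/m²/month and 50% crop at 200 gC/m²/month yields 125 gC/m²/month; the remaining 25% of the cell, covered by other PFTs, contributes nothing in this framework.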
It has long been recognized that water stress has a large effect on chlorophyll fluorescence parameters, which reflect plant structural and functional damage [86]. In this analysis, we focused on temporal dependence between SIF, precipitation and soil moisture. As stated earlier in the introduction, such relationships appear not to be simultaneous. Thus, production from crops received later in the season may not be directly linked to the precipitation and/or soil moisture availability during the same period. Hence, we further performed lead-lag correlation analysis using the above independent variables of SIF, precipitation and soil moisture to ascertain the temporal relationship between them.
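A lead-lag correlation analysis of this kind can be sketched by correlating a driver series (e.g., weekly precipitation) with a response series (e.g., weekly SIF) shifted by an increasing number of time steps. The example assumes evenly spaced, gap-free series of equal length; the function names are hypothetical.

```python
import numpy as np


def lagged_correlation(driver, response, max_lag):
    """Pearson r between driver[t] and response[t + lag] for lag = 0..max_lag."""
    driver = np.asarray(driver, dtype=float)
    response = np.asarray(response, dtype=float)
    correlations = {}
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]   # driver values at time t
        y = response[lag:]                # response values lag steps later
        correlations[lag] = float(np.corrcoef(x, y)[0, 1])
    return correlations


def best_lag(driver, response, max_lag):
    """Lag (in time steps) at which the correlation is highest."""
    correlations = lagged_correlation(driver, response, max_lag)
    return max(correlations, key=correlations.get)
```

With weekly data, a returned lag of 4 would mean that the response follows the driver by about four weeks.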
We then introduced a new method of MLR GPP quantification, which employs knowledge of the lead-lag between precipitation and SIF. In order to do so, we have used SIF data at a given point in time, and precipitation and soil moisture from two to four weeks earlier, depending on PFT, and entered these into the MLR equations as predictors to calculate the predictand (GPP). MLR equations quantifying GPP were formulated for crop and grass vegetation types separately and produced GPP estimates for June, July and August (JJA) in 2007, 2011 and 2012. Calculation of predicted GPP is only performed when SIF, as an indicator of the activity of plant photosynthetic machinery, is greater than or equal to zero.
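The construction of lagged predictors can be illustrated with the sketch below, which pairs SIF at week t with precipitation and soil moisture from `lag_weeks` earlier and skips the prediction where SIF is negative. It assumes weekly, gap-free series; the coefficient vector passed to `predict_gpp` is a placeholder for fitted MLR coefficients, and all names are hypothetical.

```python
import numpy as np


def build_lagged_predictors(sif, precip, soil_moisture, lag_weeks):
    """Return (X, valid): rows are [SIF(t), precip(t - lag), SM(t - lag)].

    valid marks rows where SIF(t) >= 0, i.e. where GPP is to be calculated.
    """
    sif = np.asarray(sif, dtype=float)
    precip = np.asarray(precip, dtype=float)
    soil_moisture = np.asarray(soil_moisture, dtype=float)
    n = len(sif)
    sif_now = sif[lag_weeks:]
    precip_lagged = precip[: n - lag_weeks]
    sm_lagged = soil_moisture[: n - lag_weeks]
    X = np.column_stack([sif_now, precip_lagged, sm_lagged])
    return X, sif_now >= 0


def predict_gpp(coef, X, valid):
    """Apply Y = b0 + b1*SIF + b2*precip + b3*SM; NaN where SIF < 0."""
    coef = np.asarray(coef, dtype=float)
    gpp = np.full(len(X), np.nan)
    gpp[valid] = coef[0] + X[valid] @ coef[1:]
    return gpp
```

Following the lags reported in this study, `lag_weeks` would be about 2 for grass and 4 for crop grid cells.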
Since all the equations linking GPP, SIF, precipitation and soil moisture were generated for C3 non-arctic grasses, C4 grasses and C3 crops only, we expect the estimates based on them to be the most robust in the areas with a high percentage of these vegetation types, such as the Great Plains and Texas (Figure 1).

Figure 1.
Percentage of C3 non-arctic grasses (PFT 13 in CLM) (a), crops (PFT 15) (b), and their combination (c) over the contiguous US. PFT maps were derived from the 1-km IGBP (International Geosphere-Biosphere Program) DISCover dataset and the 1-km University of Maryland tree cover dataset [85]. This PFT product is modified by re-labeling the IGBP classes of the MODIS Land Cover Type 1 product. Red dots indicate FLUXNET station locations.
Over regions where crop and grass PFT percentage is low (such as the East Coast or Southwest), our GPP estimation approach may not be accurate since we have not formulated equations for other PFTs. Therefore, our primary objective is to investigate and evaluate the performance of MLR equations over the Great Plains and Texas. We principally focus on finding the relationships between SIF and GPP that are hypothesized to vary for different plant types and be pertinent to environmental conditions over a specific region.

Results
While SIF has a strong and highly linear relationship with GPP and has been previously used for its quantification without auxiliary data [22,27], it is necessary to ensure that no information useful for GPP quantification purposes is lost. In our case, one caveat can potentially stem from the relatively coarse spatial resolution of the datasets. It is to be noted that while changes in soil moisture and precipitation are already reflected in corresponding SIF signal fluctuations, it is possible that these parameters might not be well delineated by SIF due to its comparatively lower spatial resolution. Therefore, in order to provide substantiation for usage of a few predictor variables in addition to SIF, we have turned to the covariogram analysis.
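An empirical covariogram, i.e., the covariance of paired values as a function of their separation distance, can be sketched as follows. The example averages the product of anomalies over all point pairs falling into each distance bin. Treating grid coordinates as planar and using simple equal-weight bins are assumptions of this sketch, which is not necessarily the estimator used to produce Figure 2; the function name is hypothetical.

```python
import numpy as np


def empirical_covariogram(coords, values, bin_edges):
    """Mean product of anomalies for point pairs, binned by distance.

    coords: (n, 2) array of point positions; values: (n,) array.
    bin_edges: increasing distances delimiting the bins.
    Returns one covariance per bin (NaN for bins without pairs).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    anomalies = values - values.mean()
    i, j = np.triu_indices(len(values), k=1)          # every unordered pair once
    distance = np.linalg.norm(coords[i] - coords[j], axis=1)
    product = anomalies[i] * anomalies[j]
    bin_index = np.digitize(distance, bin_edges) - 1  # -1 or too large: outside
    covariance = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(covariance)):
        in_bin = bin_index == b
        if in_bin.any():
            covariance[b] = product[in_bin].mean()
    return covariance
```

A variable with strong spatial structure shows covariance that drops quickly with distance, whereas a spatially smooth field yields a flatter curve.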
The covariogram might be thought of as the covariance, i.e., similarity, of point values as a function of distance between the points, thus enabling an understanding of how important spatial variability is for each given variable. Over both the Great Plains (Figure 2a) and Texas (Figure 2b), soil moisture and GPP show a pattern of an almost linear reduction in covariance, i.e., the degree of similarity between the values of each variable at two given points becomes significantly lower as the distance between the points increases. In contrast, the SIF covariogram is represented by a relatively straight line, which is characterized by a greater number of smaller-scale fluctuations. As can be inferred from Figure 2, SIF generally demonstrates less variability over the same spatial extent compared to precipitation, soil moisture, and GPP itself, at least in part due to the coarseness of the satellite SIF data. One important aspect of the GPP and SIF relationship, as seen from the covariogram analysis, is that GPP and SIF covariance changes do not go hand in hand, and GPP covariance trends are similar to those of precipitation and soil moisture, not SIF. Since spatial variability is more significant in GPP, using SIF that is less prone to variations over the same spatial domain might lead to larger errors in GPP values if they are predicted based solely on SIF.
Figure 3 demonstrates the lead-lag relationship between precipitation and SIF, soil moisture and precipitation, and SIF and soil moisture for US-Ne3 and US-KFS stations, respectively. It can be inferred that for the US-Ne3 station with the dominant crop vegetation, the lag between precipitation and corresponding SIF is on the scale of 4 weeks (Figure 3a), while over the US-KFS station, where grass is prevalent, this lag is about 2 weeks (Figure 3b). There is no significant lag between precipitation and soil moisture values for either US-Ne3 or US-KFS (Figure 3c,d), which is also expected.
Similar temporal relationships between SIF, precipitation and soil moisture have also been found for the remaining FLUXNET stations used in this research (not shown).
Figure 4 illustrates the temporal relationship between plant water need, expressed as a soil moisture index based on ESA soil moisture, and SIF data. The soil moisture index calculations are similar to those used in the parameterization scheme for soil moisture limitation to transpiration in land surface models, e.g., Noah-Multiparameterization Land Surface Model (Noah-MP LSM) [87]: when the index approaches 1, the plant water need reaches its maximum; when the index is equal to 0, the need is at its minimum. As can be inferred from Figure 4, water need peaks from mid-June to the end of July, which roughly coincides with the timing of pollination and kernel development in corn, and the beginning of pod development in soybean, the most important periods for corn and soy plants to have an adequate water supply. This peak in water need, as expressed by the soil moisture index, is followed by a peak in SIF, which is indicative of high photosynthetic rates and potentially mass gain and GPP in crops by this later time in the growing season. NDVI generally follows a dynamic similar to that of SIF: both show a peak after the highest water need is noted. NDVI demonstrates a wider peak during the growth season, which is probably due to the difference in the temporal resolution of SIF and NDVI data products used in this study. These findings show that water availability at the developmental stage of plant lifecycle is likely to affect plant SIF and GPP at later stages: when a plant does not have enough water available to meet evapotranspiration demands during the late development stage (i.e., when a plant is typically most sensitive to water stress), this can lead to a significant reduction in plant production [40,88,89], which is reflected in SIF and GPP.

Assessing the Relationships between the Predictors for Different Vegetation Types
First, we looked into (1) how satellite-retrieved SIF itself correlates with GPP from ground-based flux tower stations. Then, as shown in Figures 5-9, we estimated correlation and root-mean-square error (RMSE) for the following cases: (2) SIF-based GPP, (3) SIF- and precipitation-based GPP, (4) SIF-, precipitation- and soil-moisture-based GPP (i.e., MLR-based GPP). For the final case (5), we explored the SIF-precipitation-soil-moisture relationship using the lead-lag correlation analysis and its application for GPP quantification.
Figure 5 demonstrates correlation between GPP measured at the FLUXNET stations with C3 crop vegetation and GPP calculated with the use of the above-mentioned five combinations of predictors, over the entire vegetation period of a year, for 2007-2012. It can be noted that satellite SIF has a moderately strong relationship with FLUXNET GPP (Figure 5a), but the approach producing GPP estimates based on varying sets of predictors tends to yield better results, as expressed by improvements in the correlation coefficient R: e.g., from 0.65 when SIF connection to FLUXNET GPP is considered (Figure 5a) to 0.68 for the case when SIF-based GPP and FLUXNET GPP are used (Figure 5b). One curious feature is a "jump" in GPP quantification quality when precipitation data are taken into consideration: with the introduction of precipitation into the MLR calculation, the correlation coefficient R increases by 0.11 (from 0.68 to 0.79) as compared to the case using SIF data only (compare Figure 5b and Figure 5c); further addition of soil moisture leads to a relatively small increase in R (Figure 5d). However, the most significant rise in R is noted when the MLR equation is enhanced by the introduction of lead-lag: R increases from 0.82 to 0.98 (Figure 5e).
Figure 6 illustrates changes in the tightness of the correlation between FLUXNET GPP and C3 grass GPP calculated with the use of the same methods as in Figure 5.
Similarly to Figure 5, calculated GPP estimates and FLUXNET GPP become more significantly correlated as the independent variables are introduced; it is also shown that, initially, the correlation between SIF and FLUXNET GPP without any ancillary data is relatively high (R = 0.77, Figure 6a) while that for the case of C3 crops was only 0.65 (Figure 5a). A similar pattern of increasing correlation and decreasing RMSE is also noted for C4 grasses (Figure 7a-e); the MLR-based equation with the inclusion of the lead-lag relationship yields the maximum correlation (R = 0.99) between estimated and FLUXNET GPP values. One can also notice that grass and crop vegetation cases exhibit a slightly different response to the addition of predictor variables: C3 non-arctic and C4 grasses GPP values demonstrate a rather monotonic increase in correlation with FLUXNET GPP, with small incremental steps, and one significant increase when lead-lag is considered (compare
Overall, GPP estimates derived with the use of MLR while taking the lead-lag relationship into consideration are close to those from FLUXNET network: at the crop stations peak mean monthly GPP can be up to 20 gC/m²/day, while for the grass sites mean monthly GPP does not exceed 10 gC/m²/day. These findings agree well with those reported by Guanter et al. [22]. We have also explored how the inclusion of lead-lag into the relationship between predictors influences GPP estimates for individual plant species. Alfalfa and rice, both C3 crop species, cultivated at US-Tw3 and US-Twt stations, respectively, were selected for this purpose. Figures 8a and 9a reveal that the correlation between SIF and flux tower GPP estimates demonstrates a moderate strength of relationship (0.57 and 0.41 for alfalfa and rice, respectively).
It can be seen that for the rice station, the addition of precipitation (compare Figure 9b and Figure 9c) into the GPP equations has played a substantial role in the improvement of the correlation between MLR GPP estimates and respective FLUXNET GPP values, while such a change in R is not noted for alfalfa vegetation. These results imply that the sensitivity of crop production to various environmental parameters such as precipitation is an essential factor to consider when investigating crop production. Moreover, it is necessary to analyze the interplay between such sensitivities: e.g., rice, which is often planted during the tropical rainy season, has a water demand of 450-700 mm/total growing season while alfalfa water need is about two times higher and comprises 800-1600 mm/total growing season. However, rice is known to be less drought-resistant than alfalfa [40] and also to be more significantly impacted by precipitation levels than other crops such as wheat and maize [91], which might explain why the addition of precipitation to the MLR GPP equation leads to the pronounced improvement in GPP estimates for rice but not for alfalfa.
Overall, as in the previous cases with C3/C4 grasses and C3 crops considered as a whole, GPP estimates for individual plant types demonstrate an analogous response to the introduction of more predictor variables: the correlation between predicted and observed GPP becomes more significant with the addition of predictors, and especially lead-lag connection.
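The two skill measures used in the comparisons above, the correlation coefficient R and the RMSE between predicted and flux tower GPP, can be computed as in the following sketch. It assumes paired, gap-free arrays in the same units; the function names are hypothetical.

```python
import numpy as np


def correlation_r(predicted, observed):
    """Pearson correlation coefficient between predicted and observed GPP."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.corrcoef(predicted, observed)[0, 1])


def rmse(predicted, observed):
    """Root-mean-square error, in the units of the inputs."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))
```

The two measures are complementary: a prediction with a constant offset from the observations keeps R equal to 1 while its RMSE equals the offset.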
It is apparent from our findings that knowledge of plant lifecycle, lead-lag relationships, and the sensitivity of different plants to various environmental parameters is crucial for the purpose of plant production quantification. In order to provide more support for the proposed framework aimed at better MLR GPP estimation, we have calculated mean values and errors for the predicted GPP (MLR equation with lead-lag), and the reference datasets, MOD17A2 and FLUXNET GPP, within the CLM grid box surrounding US-ARM, US-AR1 and US-AR2 stations. Figure 10 shows that our mean monthly (for JJA of all the years) total (grass and crops combined) GPP within the grid box (315 gC/m²/month) is consistent with FLUXNET stations' observations (394 gC/m²/month). Although MODIS captures the drought trend, the observed values of GPP within the grid box surrounding the stations are only 123 gC/m²/month. This shows that our predicted, i.e., MLR-based with lead-lag, GPP estimates are the closest to the actual plant production, as inferred from FLUXNET flux tower observations.
Figures 11 and 12 show total predicted GPP over Texas and the contiguous US, respectively. As mentioned in the data and methodology section, we have calculated GPP from the MLR equation using GOME-2 SIF, TRMM precipitation, and ESA CCI soil moisture data, with the inclusion of lead-lag relationship between SIF and precipitation. Figure 11 shows that predicted GPP over Texas (within 30-36°N/95-105°W grid box) in 2011 is significantly lower than that of 2007. The area with GPP values higher than 75 gC/m²/month decreased significantly from 2007 to 2011, and specifically from June 2011 to August 2011, which is not typical for a normal yielding year. Figure 11d-f demonstrate that in 2011 drought conditions were limited to Texas as there was no significant change in GPP over the Great Plains (Figure 12d-f).
These findings indicate that our MLR-based approach with lead-lag is able to capture the drought-related reduction in GPP over Texas in 2011. Figure 12 shows that maximum GPP values found over the Great Plains (Figure 12a-c) in the normal yielding year amount to ∼500 gC/m²/month, which agrees well with FLUXNET data [22]. Predicted GPP indicates that 2012 was a severe drought year over the Great Plains as compared to 2007. The impact of the drought is clearly seen in 2012, as GPP decreased significantly compared to 2007 and 2011 (Figure 12g-h). Figures 11 and 12 show that our predicted GPP estimates are capable of capturing the 2011 Texas drought effect and also the progression of the drought over the Great Plains region (along with Texas) in 2012. These results demonstrate the robustness of the approach presented in this research study, both in terms of the magnitude of GPP values and their trend.
We have also compared our results with MODIS observational reference data: Figure 13 shows that MODIS GPP values are about 50% lower than those of predicted gross primary production. GPP values based on MODIS satellite data range between 200 and 300 gC/m²/month over the Great Plains. MODIS is able to detect a drought signal over the Great Plains in 2012 (Figure 13g-i) and Texas in 2011 (Figure 13d-f), as areas with GPP < 100 gC/m²/month spread to the north and east, towards the Great Plains. MODIS observations also provide evidence of no reduction in GPP over the Great Plains in 2011, which is expected and also confirmed by monthly US Drought Monitor reports for JJA in 2011. Moreover, MODIS GPP over the Great Plains in August 2011 was higher than that in August 2007 and is consistent with the trend we observed in the predicted GPP (Figure 12c,f). Thus, our predicted GPP agrees well with MODIS GPP values in terms of capturing the drought conditions effect on plant production.
Our MLR-based approach with lead-lag consideration can be used to provide GPP estimates that are more accurate in terms of absolute values than MODIS reference data.
Figure 14 indicates that BESS GPP values are typically lower than MLR GPP presented in this study and are slightly higher than MODIS GPP, which is likely related to the fact that the BESS algorithm considers the differences in C3/C4 photosynthetic pathways, while MODIS algorithm uses the same LUE_max across all vegetation types [92]. BESS is also capable of capturing the drought propagation in water stress years. It can be noted that in 2011 Texas experienced a significant decrease in GPP (below 100 gC/m²/month), which is also noticeable in 2012. GPP over the Great Plains in 2012 demonstrated a decrease, which is especially pronounced in July-August 2012 as compared to 2007 and 2011 (Figure 14h
More detailed information about the stations can be found in Table 1.

Discussion
This study aims specifically at investigating how existing satellite SIF products (using GOME-2 as an example) can be complemented by other parameters, such as precipitation and soil moisture, in order to produce reasonable GPP estimates. To our knowledge, this is also the first study to quantitatively assess the lead-lag relationship between water availability in the critical plant development period and its reflection in GPP and SIF at later stages of the plant lifecycle. We have derived separate equations quantifying the relationships between SIF and GPP for the grass and crop plant functional types, taking into account precipitation and soil moisture conditions over the contiguous US.
In this regard, a factor that has the potential to complicate the relationship between SIF itself and GPP is the sun-satellite view observation geometry, which could introduce unwanted variation in observational SIF and cause large uncertainties in the SIF-GPP relationship. He et al. [93] demonstrated that angular normalization of SIF by hot spot direction could improve the correlation between SIF and GPP by 0.04 ± 0.03 to 0.07 ± 0.04 on average, especially in deciduous broadleaf forests. Angular normalization as a prerequisite step in using satellite SIF products will be addressed in future studies.
We have used well established MODIS products as an observational reference along with more recent BESS data. Other newly available GPP products such as Vegetation Photosynthesis Model (VPM) [71] and FLUXCOM [94] can potentially serve as additional benchmarks for the evaluation of GPP estimates and might be useful for future research. Near-infrared reflectance of vegetation (NIRv, [95]) will also be used in future studies, as it has demonstrated a strong relationship with both measured and modeled GPP and mostly resolves the mixed-pixel problem arising from the need to disentangle the contribution of vegetation to the observed spectral signal in remotely sensed data.
We have performed GPP estimation using precipitation and soil moisture datasets for the contiguous US only. The same approach could be applied with FLUXNET measurements from other regions, such as Europe, East and Southeast Asia, and South America, to establish similar SIF-GPP relationships, depending on the prevalent biome types, precipitation, and soil moisture content.
The lead-lag relationship discussed in this study is likely to differ for various biomes represented by different PFTs, as shown in our analysis. This fact is to be taken into account if similar research is to be conducted for vegetation that cannot be classified as crops or grasses. For example, deciduous trees might exhibit more resistance to drought conditions compared to crops due to their trunk water storage, which may result in a longer delay between precipitation and fluorescence levels. Hence, there might be a different lead-lag relationship between tree production and precipitation, which warrants further investigation.
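One simple way to explore how such a delay might differ between vegetation types is to scan candidate lags and keep the one that maximizes the correlation between precipitation and the later fluorescence signal. The sketch below is illustrative only: the function name `best_lag`, the weekly time step, and the synthetic series are assumptions made for this example and are not the procedure or data used in this study.

```python
import numpy as np

def best_lag(driver, response, max_lag):
    """Return (lag, r): the lag in time steps at which driver[t - lag]
    has the highest Pearson correlation with response[t]."""
    driver = np.asarray(driver, dtype=float)
    response = np.asarray(response, dtype=float)
    best = (0, -np.inf)
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]   # earlier driver values
        y = response[lag:]                # later response values
        r = np.corrcoef(x, y)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best

# Synthetic weekly series (assumed, not real data): the response follows
# the driver with a 3-step delay plus a small amount of noise.
rng = np.random.default_rng(1)
precip = rng.gamma(2.0, 5.0, size=120)
delayed = np.empty_like(precip)
delayed[:3] = precip.mean()
delayed[3:] = precip[:-3]
sif = 0.02 * delayed + 0.5 + rng.normal(0.0, 0.01, size=120)

lag, r = best_lag(precip, sif, max_lag=8)
print(f"best lag: {lag} steps, r = {r:.2f}")
```

Running the same scan separately for each plant functional type would give one lag per type, which can then be compared across biomes.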
Our findings indicate that SIF-GPP relationships tend to vary not only between vegetation types, but also between crop species specifically. This is reflected in a significant improvement in agreement between FLUXNET and predicted MLR GPP when precipitation is considered for rice, but not for alfalfa. These results imply that differing sensitivity of crop production to various environmental parameters such as precipitation is an important factor to be taken into account when examining and quantifying crop production.
It is also important to examine possible differences in GPP quantification related to the fact that most plants follow C3 or C4 photosynthetic pathways. For instance, C3 plants are expected to be more sensitive to precipitation than C4 vegetation [96]. Such a differential plant response might need to be accounted for in further research focused on plant production, especially in light of global climate change, which might have implications for plant productivity.

Conclusions
In this study, we have modeled gross primary production with the use of solar-induced chlorophyll fluorescence, precipitation and soil moisture, while considering differences in grass and crop plant functional types over the contiguous US. Our results show that the correlation between the FLUXNET and modeled gross primary production increases from 0.41 to 0.99, and the root-mean-square error decreases from 8.97 gC/m²/d to 0.13 gC/m²/d, with the addition of extra parameters such as precipitation and soil moisture and inclusion of the lead-lag relationship. Mean monthly GPP values calculated for summers in 2007-2012 with the use of the multiple linear regression approach and lead-lag are comparable to the FLUXNET measurements, while reference MODIS and BESS datasets demonstrate a tendency to underestimate GPP by approximately 50%. GPP values calculated with the use of the MLR approach and lead-lag also capture the drought trends successfully and are consistent with the observational trends from MODIS and BESS. In 2011, modeled GPP over Texas is approximately 50-75 gC/m²/month lower than in the normal yielding year of 2007; over the Great Plains in 2012, the departure is comparable to that over Texas, as the area with production greater than 400 gC/m²/month was reduced significantly in July and August.
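For readers who wish to reproduce this kind of comparison, the two agreement metrics quoted above (Pearson correlation and root-mean-square error) can be computed as in the following minimal sketch. The function name `evaluate` and the sample values are hypothetical and chosen only for illustration; they are not FLUXNET or model output from this study.

```python
import numpy as np

def evaluate(observed, modeled):
    """Return (Pearson r, RMSE) for paired series, skipping NaN pairs."""
    observed = np.asarray(observed, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    valid = ~np.isnan(observed) & ~np.isnan(modeled)
    obs, mod = observed[valid], modeled[valid]
    r = np.corrcoef(obs, mod)[0, 1]
    rmse = np.sqrt(np.mean((mod - obs) ** 2))
    return r, rmse

# Hypothetical daily GPP values in gC/m2/d (illustrative only)
observed = [1.0, 2.0, 3.0, 4.0]
modeled = [1.5, 2.5, 3.5, 4.5]
r, rmse = evaluate(observed, modeled)
print(f"r = {r:.2f}, RMSE = {rmse:.2f} gC/m2/d")
```

A constant offset, as in this toy case, leaves the correlation at 1 while the RMSE equals the offset, which is why the two metrics are reported together.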
Our results demonstrate that GPP values based on various combinations of predictors, i.e., SIF, precipitation and soil moisture, have lower correlation coefficients and higher RMSE unless the lead-lag relationship between these predictors is considered. Since a plant is most vulnerable to water stress in its development period, we expect that plant production later in the season would depend on precipitation and soil moisture during the development stage. This study highlights the importance of water availability for plants during their development stage and its influence on plant production later in the season. It is also the first to demonstrate that incorporating plant physiological variations associated with water availability and demand at different stages of the plant lifecycle can play an important role in the estimation of plant production [97][98][99][100], especially under drought conditions, which are likely to increase in frequency and severity in the future.
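The role of the lead-lag term in the regression can be illustrated with a minimal sketch that fits GPP against SIF, lagged precipitation and soil moisture by ordinary least squares. The regression form, the choice to lag only precipitation, the helper names (`lagged`, `fit_mlr`, `predict_mlr`) and the synthetic data are all assumptions made for this illustration; the fitted equations of this study are not reproduced here.

```python
import numpy as np

def lagged(series, lag):
    """Shift a 1-D series forward by `lag` steps; the first `lag` entries are NaN."""
    series = np.asarray(series, dtype=float)
    if lag == 0:
        return series.copy()
    out = np.full(series.shape, np.nan)
    out[lag:] = series[:-lag]
    return out

def design_matrix(sif, precip, soil_moisture, lag):
    """Columns: intercept, SIF(t), precipitation(t - lag), soil moisture(t)."""
    return np.column_stack(
        [np.ones(len(sif)), sif, lagged(precip, lag), soil_moisture]
    )

def fit_mlr(gpp, sif, precip, soil_moisture, lag):
    """Least-squares coefficients, ignoring rows that contain NaN."""
    X = design_matrix(sif, precip, soil_moisture, lag)
    y = np.asarray(gpp, dtype=float)
    valid = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    coef, *_ = np.linalg.lstsq(X[valid], y[valid], rcond=None)
    return coef

def predict_mlr(coef, sif, precip, soil_moisture, lag):
    """Predicted GPP; entries without a lagged precipitation value are NaN."""
    return design_matrix(sif, precip, soil_moisture, lag) @ coef

# Synthetic, noise-free example (assumed values, not observations)
rng = np.random.default_rng(0)
n = 60
sif = rng.uniform(0.5, 2.0, n)
precip = rng.uniform(0.0, 10.0, n)
sm = rng.uniform(0.1, 0.4, n)
true_coef = np.array([1.0, 3.0, 0.5, 4.0])
gpp = design_matrix(sif, precip, sm, lag=2) @ true_coef

coef = fit_mlr(gpp, sif, precip, sm, lag=2)
gpp_hat = predict_mlr(coef, sif, precip, sm, lag=2)
```

In practice a separate fit would be made for each plant functional type, with the lag chosen per type (about two weeks for grasses and four weeks for crops in this study).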
This study further supports the claim that relationships between SIF and GPP are different for crops and grasses; using the statistically justified inclusion of regional precipitation and soil moisture, such relationships can be derived and used to improve GPP estimates.

Acknowledgments: GOME-2 v25 level 3 SIF data are available at http://avdc.gsfc.nasa.gov. TRMM 3B42 and 3B43 products were obtained from https://mirador.gsfc.nasa.gov/. MODIS and FLUXNET GPP data are publicly available at https://lpdaac.usgs.gov/ and https://fluxnet.ornl.gov/, respectively. ESA CCI soil moisture products are provided by the European Space Agency at http://www.esa-soilmoisture-cci.org/. BESS GPP data can be accessed at http://environment.snu.ac.kr/bess_flux/

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: