An Algorithm for Gross Primary Production ( GPP ) and Net Ecosystem Production ( NEP ) Estimations in the Midstream of the Heihe River Basin , China

An accurate estimation of carbon fluxes is very important in carbon cycle studies. A remote sensing based gross primary production (GPP) and net ecosystem production (NEP) algorithm, RS-CFLUX, was presented in this work. The algorithm was calibrated with Markov Chain Monte Carlo (MCMC) method at Daman superstation and Zhangye wetland station in the midstream of the Heihe River Basin. Results indicated that both of the stations present high GPP (1442.04 g C/m2/year at Daman superstation and 928.89 g C/m2/year at Zhangye wetland station) and NEP (409.38 g C/m2/year at Daman superstation and 422.60 g C/m2/year at Zhangye wetland station). The RS-CFLUX model can correctly simulate the seasonal dynamics and quantities of carbon fluxes at both stations, using photosynthetically active radiation (PAR), land surface temperature (LST), normalized difference water index (NDWI) and enhanced vegetation index (EVI) as input. RS-CFLUX model were sensitive to maximum light use efficiency, respiration at reference temperature, activation energy parameter of respiration.


Introduction
Vegetation can be affected by climate change through the energy and mass exchange between vegetation and atmosphere [1][2][3].Carbon flux is a mass exchange that is related to global warming [4,5].Accurate estimation of carbon flux plays an important role in carbon cycle study.Generally, two types of methods can be used to estimate carbon flux: observing and modeling.Observing instrument (such as Eddy covariance) can measure the continuous exchange of carbon flux at a specific site with an area of several square kilometers.However, model can efficiently estimate carbon flux at regional scale.
Remote sensing is an important source of data for simulating carbon flux.In recent years, many remote sensing based carbon flux models were developed, such as Glo-PEM [6], C-Fix [7,8], VPM [9], MODIS GPP/NPP algorithm [10], and DCFM [11].All of these models use air temperature as an input.At many observation stations, the air temperature is observed above the canopy, and is different from the canopy temperature [12,13], which actually controls photosynthesis and respiration of vegetation.During the growing season, the land surface temperature (LST) within a high vegetation coverage area is dominated by the canopy temperature.With the development of remote sensing, more and more thermal infrared data is available to estimate LST.MODIS provide thermal infrared image twice every day with spatial resolution of 1 km.Landsat series data provide thermal infrared image every 16 days with spatial resolution of 100 m.These datasets can be used to retrieve continuous LST, which can force carbon flux model.Thus, LST could be used as inputs in remote sensing based carbon flux estimation.
The Heihe River Basin, which is located in Northwest China, is the second largest inland river basin in the country [14].In the midstream of the Heihe River Basin, yearly rainfall is less than 100 mm.Most of the area is Gobi Desert, which has little carbon flux.The largest vegetated area in the midstream of the Heihe River Basin is irrigated cropland.Another well-vegetated area is wetland because of abundant water supply from ground water and river water.Irrigated croplands and wetlands are the two ecosystems with the highest production in the midstream of the Heihe River Basin.Carbon fluxes of these two ecosystems contribute the majority of carbon flux in the midstream of the Heihe River Basin.Thus the accurate estimating of carbon fluxes of these two ecosystems is very important to evaluate the regional scale carbon flux of the midstream of the Heihe River Basin.
Hence, the objectives of this study were to (1) analyze the carbon flux characteristics of irrigated croplands and wetlands in midstream of the Heihe River Basin; (2) develop a remote sensing based carbon flux algorithm, RS-CFLUX, to simulate the carbon flux quantity and dynamics at the two ecosystems.

Observation Stations
Eddy covariance (EC) and automatic meteorological station (AMS) data in 2013 was collected at Daman superstation and Zhangye wetland station.These two observation stations were established during HiWATER experiment in May of 2012 [15].Figure 1 shows the location and photos of Daman superstation and Zhangye wetland station.The observing items and information of Daman superstation and Zhangye wetland station are listed in Table 1.Meteorological data was sampled every minute and was converted to a 30-min average value for comparison with the flux data.Values were deleted if they were beyond the physical range or instrument range [16].The EC systems at the two stations consist of an open-path infrared gas analyzer and a 3-D sonic anemometer [17,18].The 3-D wind speed, H2O concentration and CO2 concentration data were sampled at a frequency of 10 Hz by a data logger.Half hour fluxes were calculated by post-processing raw 10 Hz data using the EdiRe software.The processing steps include spike detection, lag correction of H2O/CO2 relative to the vertical wind component, sonic virtual temperature conversion, planar fit coordinate rotation, the WPL (Webb-Pearman-Leuning) density fluctuation correction and frequency response correction [16].
Half-hourly flux data were excluded if the instrument range was excessive, rainfall occurred or instrument malfunctions occurred.At night (downward short wave radiation < 1 W/m 2 ), half-hourly flux data were excluded if friction velocity (u*) was lower than a specific threshold (0.15 m/s), because of weak turbulent mixing at a low friction velocity.Then, missing values in the fluxes were filled using non-linear functions, which use PAR and air temperature as input.Missing values in the fluxes were not filled when the meteorological measurements were not simultaneously available.The gap-filling data were used only to analyze the seasonal and annual variations in the carbon flux.GPP was the sum of the daytime net ecosystem production (NEP) and ecosystem respiration (ER).The daytime ER was estimated with the Van't Hoff function which was calibrated using the night-time air temperature and nocturnal NEE.The specific method and formulas in the NEE partitioning can be found in Wang's paper [19,20].Moderate-resolution imaging spectroradiometer (MODIS) remote sensing data products are frequently used in light use efficiency models.MODIS provides high temporal resolution images of the land surface.The sensor contains 36 bands in the visible (459 nm) to thermal-infrared band (14,385 nm).Vegetation indices and land surface temperature can be retrieved from these bands.Many land surface variables products were estimated and distributed by the Land Processes Distributed Active Archive Centre (LP DAAC), and are widely used in scientific research.In this study, the 8-day land surface reflectance product (MOD09A1) and daily land surface temperature product (MOD11A1) were downloaded from the website of MODIS Land Product Subsets [21,22].The vegetation indices (EVI and NDWI) were estimated from the MOD09A1.MOD09A1 data were resampled to 1 km spatial resolution.The temporal profiles of the vegetation indices and LST were extracted according to the geographical coordinates of the two stations.The vegetation indices were smoothed using the Savizky-Golay algorithm [23] and were interpolated into daily data.
The formulas for EVI, NDWI and the Savizky-Golay algorithm are as follows: where G is 2.5, C1 is 6, C2 is 7.5 and L is 1, ρblue, ρred, ρnir and ρsir are the reflectance of the blue (band 4), red (band 1), near infrared (band 2) and shortwave infrared bands (band 6), Y is the original time-series data, Yi * is the reconstructed time-series data, Cj is the jth weight of the filter window and 2 m + 1 is the size of the filter window.The window size and order of Savizky-Golay algorithm were set to 13 and 4 based on previous studies [24,25].

GPP Estimation
The general form of light use efficiency (LUE) approach is as follows, max ) where GPP (g C/m 2 /day) represents absorbed CO2, PAR (MJ/m 2 /day) is photosynthetically active radiation, fAPAR (dimensionless) is the fraction of absorbed PAR by the canopy, and LUEmax (g C/MJ) is the maximum LUE.f(T) (dimensionless) and f(W) (dimensionless) are temperature and water scalars, respectively.

fAPAR
fAPAR is usually retrieved from vegetation indices, such as the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI).The EVI is an optimized vegetation index developed from the NDVI, and it has a strong seasonal correlation with GPP [26,27].Recent studies indicated that fAPAR should be partitioned into the fraction of PAR absorbed by chlorophyll and by non-photosynthetically active components.In addition, the EVI equals to the PAR absorbed by chlorophyll according to Xiao's study [9].

Temperature Scalar
In many LUE models, temperature scalars are calculated using near-surface air temperature, because it is easy to obtain from site-based meteorological measurements.When estimating regional carbon fluxes using LUE models, site-observed near-surface air temperature is interpolated into gridded data, which is spatially and temporally consistent with remote sensing data.This process can lead to uncertainties when estimating the results.In this study, we attempt to use LST as the input for the LUE model, because high temporal and spatial resolution LSTs can be estimated using thermal infrared remote sensing images.The temperature scalar is estimated at each time step, using the equation developed for the Terrestrial Ecosystem Model [28]: where LST (°C) is the land surface temperature, LSTmax (°C), LSTopt (°C) and LSTmin (°C) are the maximum, optimum, and minimum land surface temperature for photosynthesis, respectively.f(T) equals 0 when the LST is more than LSTmax or less than LSTmin.

Water Scalar
Water status f (W) is another important environmental factor that affects the photosynthesis rate.f (W) is usually calculated using different variables in LUE models, such as soil water content in Glo-PEM [6].These variables are only observed at a few of points in a particular area.The process of converting from point data to spatially gridded data leads to uncertainties in LUE model results.The normalized difference water index (NDWI) is a remote sensing data based index that can reflect the water content of vegetation [29].Xiao [9] used the NDWI to calculate water scalar of LUE model.We also used this formula to estimate f (W).

NEP Estimation
Net ecosystem production (NEP) is the difference between GPP and ecosystem ER.The Lloyd-Taylor respiration function is widely used in ecosystem respiration calculation [30].Here, the Lloyd-Taylor respiration function was modified in two ways.First, in order to consider the impact of seasonal changes in biomass on respiration, a × EVI was added to the respiration rate at the reference temperature [11].Second, air temperature was replaced with LST.The formula is as follows: where Rref (g C/m 2 /day) is the respiration rate at the reference temperature, and a (g C/m 2 /day) is an empirical coefficient.E0 (°C) is the activation energy parameter, which determines sensitivity of the respiration to temperature.

Model Calibration Method
Remote sensing based carbon flux models usually contain many parameters.When the models are used to simulate carbon flux at a specific site, the parameters should be calibrated.Bayesian theory based parameter estimation method, Markov Chain Monte Carlo (MCMC), is very popular in recent years, because it not only solves the value of parameters but also predicts the posterior probability distribution function of parameters.The posterior probability distribution function can help us to know the characteristics of model parameters and how the parameters will impact on models' output.The MCMC method contains two repeated steps: a proposal step and a movement step [31,32].
In the proposal step, a newly proposed parameter is generated from the previous accepted parameters with a random walk Δc.Δc is calculated using a random number r between 0 and 1, the parameter range (cmax − cmin) and a step length factor s. s equals to 5, according to a previous study [31].
where c i+1 and c i are the proposed parameter vector and previously accepted parameters vector, respectively.In the movement step, c i+1 is tested using the Metropolis rule to determine if the parameter should be accepted.The accepted probability (p(c i , c i+1 ) ) of the proposed parameters is derived from the likelihood function (L(c)) using the parameters accepted previously.The mathematical formula is as follows: where Cfluxt o and Cfluxt s are the in-situ and RS-CFLUX model predicted carbon flux, respectively.δ is the standard deviation of the data-model difference.m is the number of datasets used in the cost function, and n is the number of observations.Then, the acceptance probability is compared with a uniform random number U [0, 1].Only when the acceptance probability is greater than U, will c i+1 be accepted.In this study, five parallel MCMC chains with 50,000 iterations each were run in the calibration process.The first 10,000 iterations of each chain were burn-in iterations and were deleted.The remaining iterations of the parameters were used to generate posterior distributions.G-R diagnostic method [33] was used to evaluate the chain convergence.The MCMC algorithm was programmed in Matlab code.

Model Accuracy Evaluation
Three indices were used to evaluate the performance of the model.(1) The coefficient of determination (R 2 ), which describes to what extent the variation in the observations can be explained by the models.R 2 ranges from 0 to 1, where a value near 1 indicates a good performance of the model.
(2) The root mean square error (RMSE) represents the sample standard deviation of the differences between the predicted values and observed values.A smaller RSME indicates a better performance of the model.(3) The Nash-Sutcliffe model efficiency coefficient (NSE) is often used to assess the power of a model.The coefficient ranges from −∞ to 1.A NSE close to 1 indicates a better match between the modeled and observed values.The formulas for R 2 , RMSE and NSE are as follows: ( ) ( ) where Ot is observed value at time t, Mt is the predicted value at time t, and n is the number of observations.is the mean of the observations.

Seasonal Dynamics of the Carbon Flux and Environmental Factors
The seasonal dynamics of PAR, LST, air temperature, NDWI and EVI are shown in Figure 2. The PAR values observed at the two stations were very consistent in terms of the quantity and seasonal dynamics, except the period from early March to early May (See Figure 2a).At the Zhangye wetland station, the LST was close to the air temperature for entire year.However, at the Daman superstation, the LST was higher than the air temperature.Specifically, the soil surface at Zhangye wetland station remained wet all year, whereas the soil surface was dry for most of the year at the Daman superstation (see Figure 2b).The two stations had the same seasonal trends of vegetation indices (EVI and NDWI).At the Daman superstation, the EVI and NDWI reached their maximum values (0.53 and 0.33, respectively) in late July.At the Zhangye wetland station, the EVI and NDWI reached their maximum values (0.45 and 0.26, respectively) in the middle of July.At the peak of the growing season, the EVI and NDWI were higher at Daman superstation than the Zhangye wetland station, and vice versa at the beginning and end of the growing season.Furthermore, reed has a longer growing season than maize, because EVI of reed start increasing earlier than maize and decrease at the same time (see Figure 2c).
The seasonal dynamics of in-situ GPP and NEP are shown in Figure 3.The GPP and NEP of the Daman superstation (maize) were higher than those of the Zhangye wetland station (reed) during the growing season (see Figure 3), but were lower in the non-growing season.The maximum GPP was 22.64 g C/m 2 /day at the Daman superstation and 10.06 g C/m 2 /day at the Zhangye wetland station.The maximum NEP was attained during the peak of the growing season and was 14.02 g C/m 2 /day at the Daman superstation and 5.90 g C/m 2 /day at the Zhangye wetland station.The minimum NEP was attained in late April and early October.In 2013, the GPP at the Daman superstation was 1442.04 g C/m 2 /year, and the GPP at the Zhangye wetland station was 928.89 g C/m 2 /year (GPP for 20 days in May, 5 days in July, and days from the November 21 to the end of 2013 were missed).The NEP of 2013 is 409.38 g C/m 2 /year for the Daman superstation and 422.60 g C/m 2 /year for the Zhangye wetland station.During the non-growing season, carbon was slightly absorbed at the Zhangye wetland station.

Calibration of the RS-CFLUX Model at the Two Stations
The prior value and range of the parameters were obtained from the literature or from conventional knowledge.The posterior value and range of the parameters in the RS-CFLUX model were estimated using the MCMC method (see Table 2, Figures 4 and 5).From the posterior distribution of parameters, we found that LUEmax was well constrained at both stations.The value of 2.70 g C/MJ of LUEmax at the Daman superstation was much higher than the value of 2.15 g C/MJ at the Zhangye wetland station.LSTmax, LSTopt and LSTmin were not well constrained.LSTmax and LSTmin were very similar at the two stations, but LSTopt was obviously higher for maize than for reed.The range from LSTmin to LSTmax for maize was smaller than that for reed because the growing season for maize is shorter than for reed.Respiration rate at reference temperature Rref was also well constrained at both stations, but empirical parameter of respiration function E0 and a were not well constrained at either site.

Carbon Flux Predicted by the RS-CFLUX Model
Figures 6 and 7 show the comparison between the carbon flux predicted by the RS-CFLUX model and the carbon flux estimated from EC at Daman superstation and Zhangye wetland station.The model predictions and observations were consistent.The RS-CFLUX model can correctly simulate the seasonal dynamics and quantities of GPP and NEP at the two stations.R 2 , RMSE and NSE of the GPP between the RS-CFLUX model and EC were, respectively 0.92, 1.96 g C/m 2 /d and 0.87 at Daman superstation, and 0.94, 1.00 g C/m 2 /day and 0.85 at Zhangye wetland station.R 2 , RMSE and NSE of the NEP between the RS-CFLUX model and EC were 0.88, 1.31 g C/m 2 /d and 0.87 at the Daman superstation, and 0.78, 1.02 g C/m 2 /day and 0.61 at the Zhangye wetland station (see Table 3).The sum of the EC-estimated GPP was 1442.04 g C/m 2 /year at the Daman superstation and 928.89 g C/m 2 /year at the Zhangye wetland station (the GPP missed for 68 days in 2013 and was not filled for the Zhangye wetland station); The sum of predicted GPP was 1196.93 g C/m 2 /year at the Daman superstation and 775.24 g C/m 2 /year at the Zhangye wetland station at the corresponding time.At the peak-growing season, GPP was obviously underestimated at Daman superstation; this is mainly caused by model parameter LUEmax.LUEmax is smaller when both in-situ GPP and NEE were simultaneously used to calibrate CFLUX model than when only in-situ GPP was used.The sum of the EC-estimated NEP was 409.38 g C/m 2 /year at the Daman superstation and 422.60 g C/m 2 /year at the Zhangye wetland station (the GPP missed for 68 days in 2013 were not filled at Zhangye wetland station); The sum of the predicted NEP was 409.46 g C/m 2 /year at the Daman superstation and 433.12 g C/m 2 /year at the Zhangye wetland station at the corresponding time.The linear regression slope between the model and observations showed GPP was only slightly underestimated at the Daman superstation.3.

Using the MODIS LST Product as Input for the RS-CFLUX Model
LST input for the RS-CFLUX model can be replaced with the MODIS LST product.The average of daytime and nighttime LSTs of the MODIS daily LST product MOD11A1 was used as input of RS-CFLUX, and it is compared with the daily average in-situ LST (see left column of Figure 7 and Table 4).MODIS LST has a large positive bias comparing with in-situ LST at the Zhangye wetland station, but little bias at the Daman superstation.This is attributed to the strong land surface heterogeneity at Zhangye wetland station.When the average daytime and nighttime LSTs of MOD11A1 were used as input for RS-CFLUX at the Daman superstation and Zhangye wetland station, the RS-CFLUX model performed well at both stations (see middle and right columns of Figure 8 and Table 4).4.

Comparison of the Carbon Flux
Farmlands and wetlands are the two most productive ecosystems in the midstream of the Heihe River Basin.During the peak of the growing season, the GPP and NEP obviously higher at the Daman superstation than the Zhangye wetland station.However, the GPP and NEP were slightly lower at the Daman superstation than the Zhangye wetland station at the beginning and end of the growing season.Maize at the Daman superstation is usually sown in late April and harvested in late September, and it has faster growth speed than reed.However, reed at the Zhangye wetland station usually begins to sprout in early April and wither in early October.The yearly GPP was obviously higher at the Daman superstation than at the Zhangye wetland station.Because maize is C4 plant which has higher light use efficiency than C3, and maize land is fertilized and irrigated by farmers.The NEP value of the Daman superstation and Zhangye wetland station were nearly equal.Thus, the ecosystem respiration of the Daman superstation was also higher than that of the Zhangye wetland station.Under waterlogged conditions, soil respiration is constrained because of the lack of oxygen and reduced porosity due to the high soil water content [36,37].Another reason is higher GPP at Daman superstation will result in higher autotrophic respiration.

Uncertainties Resulted from Model Parameters
The model parameters can lead to uncertainties in the results of the RS-CFLUX model.The model calibration suggested that only a few parameters were well constrained.The number of parameters that can be well constrained depends on the amount and quality of measurement that used in calibration.It is also reported by previous study that there are only few parameters can be well constrained when only the carbon flux data was used in the reversion process [38,39].Based on the probability density function proposed by MCMC for each parameter, we used the Monte Carlo method to assess the uncertainties in the model outputs resulting from each parameter (see Figures 9 and 10).LUEmax resulted in the greatest uncertainties in GPP, followed by LSTopt.LSTmin and LSTmax both resulted in low uncertainties in the GPP.Rref, E0 and a did not result in uncertainties in the GPP prediction.However, E0, Rref and LUEmax led to the greatest uncertainties in the NEP, followed by LSTopt and a. LSTmin and LSTmax led to low uncertainties in the NEP prediction.In addition to the parameters, forcing variables and model structure also resulted in uncertainties in the model outputs [40].Observation instruments contain stochastic error.Furthermore, data processing also results in errors.The model is a simple function with a few environmental factors; it does not include all exchange processes between the ecosystem and atmosphere [41].

Potential to be a Fully Remote Sensing Base Carbon Flux Model
Generally, the RS-CFLUX model performed well in both the Daman superstation and Zhangye wetland station.The model correctly estimated the GPP and NEP of the ecosystem.The model accuracy in term of RMSE and R 2 is as high as the other model used in this area, such as VPM [19].Based on in-situ data, it is found that correlation coefficient between LST and NEE is much higher than that between air temperature and NEE.Furthermore, high-resolution LST and PAR data could be retrieved from remote sensing data; this will improve the carbon fluxes estimation at the areas with sparse weather stations, such as Northwest China.When using the remote-sensing-derived PAR product and LST product as inputs, RS-CFLUX will simulate the carbon flux using only remote sensing data.However, high temporal resolution PAR and LST products are difficult to generate.A lot of LST value is missing in current MODIS LST product because of the impact of the weather.Some methods are developed in recent years, which can be used to reconstruct the LST product and form a time continuous LST [42].Polar orbiting and geostationary satellite data can be combined with other remote sensing data to generate continuous daily even hourly PAR data.If the quality of remote sensing data product is improved using these new data and method, then the RS-CFLUX model could easily estimate the carbon flux at regional scale.

Conclusions
In summary, carbon fluxes model based on remote sensing data were presented in this study.The model was calibrated and validated at two stations in the midstream of the Heihe River Basin.Based on this research, the following conclusions are presented.(1) In the midstream of the Heihe River Basin, the maize land had higher GPP (1442.04g C/m 2 /year) than the reed dominated wetland (928.89g C/m 2 /year).However, the NEP of maize land (409.38 g C/m 2 /year) was nearly equal to that of the reed dominated wetland (422.60 g C/m 2 /year).( 2) RS-CFLUX can adequately simulate the seasonal dynamics and quantity of carbon fluxes for the maize land and reed dominated wetland.(3) From the calibration of RS-CFLUX, only a few parameters were well constrained.However, these wellconstrained parameters are important to model output, such as the maximum light use efficiency, respiration at the reference temperature and the activation energy parameter of respiration.If these parameters are not well estimated, then they can result in large uncertainties in the model output.

Figure 3 .
Figure 3. Seasonal dynamics of the GPP and NEP at the Daman superstation and Zhangye wetland station.

Figure 5 .
Figure 5. Histograms of the RS-CFLUX model parameters at the Zhangye wetland station (reed).

Figure 6 .
Figure 6.Validation of the GPP estimated by the RS-CFLUX model at the Daman superstation (Daman) and Zhangye wetland station (Wetland).The slope and residual of the linear fit are shown in Table3.

Figure 7 .
Figure 7. Validation of the NEP estimated by the RS-CFLUX model at the Daman superstation (Daman) and Zhangye wetland station (Wetland).The slope and residual of the linear fit are shown in Table3.

Figure 8 .
Figure 8. Validation of the RS-CFLUX model using MODIS LST as input at the Daman superstation (Daman) and Zhangye wetland station (Wetland).The slope and residual of the linear fit are shown in Table4.

Figure 9 .Figure 10 .
Figure 9. Uncertainties in the GPP caused by parameters in the RS-CFLUX model.N (mu, delta) is a normal distribution with a mean mu and standard deviation delta.

Table 1 .
Observing items at the Daman superstation and Zhangye wetland station.

Table 2 .
Prior distribution (initial value and range) and posterior distribution (mean value and 95% confidence interval) of the parameters of the RS-CFLUX model at the Daman superstation (Daman) and Zhangye wetland station (Wetland).

Table 3 .
Comparison of the RS-CFLUX modeled GPP with the observed GPP at the Daman superstation (Daman) and Zhangye wetland station (Wetland).

Table 4 .
Results from the RS-CFLUX model using MODIS LST as input.