Modeling the Carbon Cycle of a Subtropical Chinese Fir Plantation Using a Multi-Source Data Fusion Approach

: Process-based terrestrial ecosystem models are increasingly being used to predict carbon (C) cycling in forest ecosystems. Given the complexity of ecosystems, these models inevitably have certain deﬁciencies, and thus the model parameters and simulations can be highly uncertain. Through long-term direct observation of ecosystems, numerous di ﬀ erent types of data have accumulated, providing valuable opportunities to determine which sources of data can most e ﬀ ectively reduce the uncertainty of simulation results, and thereby improve simulation accuracy. In this study, based on a long-term series of observations (biometric and ﬂux data) of a subtropical Chinese ﬁr plantation ecosystem, we use a model–data fusion framework to evaluate the e ﬀ ects of di ﬀ erent constrained data on the parameter estimation and uncertainty of related variables, and systematically evaluate the uncertainty of parameters. We found that plant C pool observational data contributed to signiﬁcant reductions in the uncertainty of parameter estimates and simulation, as these data provide information on C pool size. However, none of the data e ﬀ ectively constrained the foliage C pool, indicating that this pool should be a target for future observational activities. The assimilation of soil organic C observations was found to be important for reducing the uncertainty or bias in soil C pools. The key ﬁndings of this study are that the assimilation of multiple time scales and types of data stream are critical for model constraint and that the most accurate simulation results are obtained when all available biometric and ﬂux data are used as constraints. Accordingly, our results highlight the importance of using multi-source data when seeking to constrain process-based terrestrial ecosystem models.

Abstract: Process-based terrestrial ecosystem models are increasingly being used to predict carbon (C) cycling in forest ecosystems. Given the complexity of ecosystems, these models inevitably have certain deficiencies, and thus the model parameters and simulations can be highly uncertain. Through long-term direct observation of ecosystems, numerous different types of data have accumulated, providing valuable opportunities to determine which sources of data can most effectively reduce the uncertainty of simulation results, and thereby improve simulation accuracy. In this study, based on a long-term series of observations (biometric and flux data) of a subtropical Chinese fir plantation ecosystem, we use a model-data fusion framework to evaluate the effects of different constrained data on the parameter estimation and uncertainty of related variables, and systematically evaluate the uncertainty of parameters. We found that plant C pool observational data contributed to significant reductions in the uncertainty of parameter estimates and simulation, as these data provide information on C pool size. However, none of the data effectively constrained the foliage C pool, indicating that this pool should be a target for future observational activities. The assimilation of soil organic C observations was found to be important for reducing the uncertainty or bias in soil C pools. The key findings of this study are that the assimilation of multiple time scales and types of data stream are critical for model constraint and that the most accurate simulation results are obtained when all available biometric and flux data are used as constraints. Accordingly, our results highlight the importance of using multi-source data when seeking to constrain process-based terrestrial ecosystem models.

Introduction
Forest ecosystems are among the most important of terrestrial ecosystems, in that they store large amounts of carbon (C) and play vital roles in regulating the global C balance and mitigating climate change. In the context of climate change, it is essential to accurately estimate C cycling in forest ecosystems, as this can provide a basis for predicting climate change and controlling the greenhouse effect. In recent years, a large number of different types of observations related to changes in ecosystem C cycles have accumulated, which have mainly been derived from environmental control experiments and eddy covariance measurements [1][2][3][4][5]. However, deciding how most effectively to utilize the accumulated data to enhance our understanding of terrestrial ecosystems remains a challenge [6].
Process-based terrestrial ecosystem models are important tools for studying key processes of C cycles and the mechanisms underlying their control and have been widely used to estimate regional and/or global C budgets [7][8][9][10][11]. However, given the complexity of ecosystems, our current understanding of ecosystem-related key processes and control mechanisms is insufficiently comprehensive, as model parameters inevitably have associated uncertainties, and thus these models are still unable to accurately simulate and predict ecosystem processes and C source/sink distribution and changes [12][13][14][15]. In this regard, the model-data fusion technique (MDF) provides a powerful tool for reducing the uncertainty when simulating ecosystem C cycles by combining observations and models, which can contribute to improving model simulation accuracy [16][17][18].
MDF makes full use of existing observations and applies mathematical methods to optimize the parameters and/or state variables of the model to achieve the best combination between simulations and measurements, thereby enabling more accurate simulation of the changes in ecosystem state [19][20][21]. The accuracy of these simulations, in turn, depends on the appropriate acquisition of model parameters [2,12,[22][23][24] and, given that models contain a large number of parameters that are difficult to estimate accurately, most of the research on MDF in the field of C cycles have tended to focus on parameter estimation [9,12,15,25,26].
At present, various time scales and types of observational data are commonly used in terrestrial process-based terrestrial ecosystem modeling, which may contain different types of information on ecosystem processes. For example, C flux data contains considerable amounts of information regarding how "fast" processes respond to environmental drivers (e.g., photosynthesis and respiration), whereas biometric data contains information relating to numerous "slow" processes (e.g., C pool size and turnover) [1,15,27,28]. Therefore, the data used as constraints require multiple data streams. In the MDF, due to differences in the circumstances of data collection, biometric data estimation parameters [26] and flux data [29][30][31][32][33][34] have been used to independently constrain parameters, as well as being used in combination [9,25,35]. However, although there have been numerous studies in this field, few have specifically quantified the impact of different data streams on simulation effects.
Plantations grow rapidly, with higher carbon sequestration rates and greater potential [36,37]. Accordingly, accurate simulation of the C cycle of planted forests is of considerable significance for reasonable evaluations of the C sequestration rates and potentials of forest ecosystems. In this study, we used a Data Assimilation Linked Ecosystem Carbon (DALEC) [38] model in conjunction with the Markov Chain-Monte Carlo (MCMC) method, in an inverse analysis using long-term data obtained from the Chinese Ecosystem Research Network (CERN, http://www.cern.ac.cn/) and China FLUX (http://www.chinaflux.org/) to model the C cycle of a subtropical coniferous plantation in Huitong, China. We constrained the model parameters using a range of different data streams, including field measurements of leaf area index (LAI), foliage, wood, and fine root biomass, annual litterfall, soil organic matter, and eddy covariance measurements of net ecosystem exchange (NEE), as well as manual chamber measurements of soil respiration (Rs). Our objective is to use the MDF framework to determine how combinations of different constraint data influence parameter estimates and how these different parameter-sets influence simulation. On the basis of this information, we can identify which observational data are most likely to reduce model uncertainties.

Site Description
The Huitong subtropical Chinese fir plantation (HTF) observation station (26 • 47 N, 109 • 35 E) is located in Huitong County, Hunan Province, China. It lies in a transition zone that extends from the Yun-Guizhou Plateau to the Jiangnan Hills, which is characterized by low mountains and hills with elevations of between approximately 300 and 1000 m. The parent rock is dominated by slab and shale, and the soil is a red or yellow soil. It is a typical subtropical region, with an average annual temperature of 16.5 • C and annual precipitation of 1200-1400 mm. The Chinese fir plantation, which comprises pure stands of Chinese fir, was planted in 1983. The landform is hilly with a slope of 20 • , which runs in a southeast direction [39]. The observation station of HTF has a comprehensive observation field (15,000 m 2 ) of Chinese fir plantation, and long-term observation of water, soil, meteorology, and biology. There are permanent plots, and destructive plots are set in the comprehensive observation field. The area of permanent plots are about 5000 m 2 , and a first-class plot (40 × 50 m) is set inside, which is divided into 25 secondary plots (8 × 10 m), five of them are surveyed every five years, with a cycle of 25 years planned.

Data
The collected data included a meteorological observation dataset (drive data) and eight sets of observational data (constraint data), including one flux dataset (NEE, Rs), three biomass datasets (foliage, fine root, and wood), and single datasets for litter, soil organic C, and canopy dynamics (LAI).

Flux Data
The NEE data were obtained from ChinaFLUX and were aggregated to the daily time step from 30 min CO 2 flux data measured using the eddy covariance technique and processed via quality control and gap-filling [40]. The flux observation system consists of an ultrasonic anemometer (CSAT3, Campbell, Camden, NJ, USA) and a CO 2 /H 2 O infrared gas analyzer (Li7500, Li-cor, Lincoln, NE, USA). The height of the observation tower is 32.5 m, and the installation height of the flux observation system is 32.5 m; flux observation time was May 2007. A negative NEE value indicates that the ecosystem absorbs CO 2 from the atmosphere, whereas a positive value indicates that the ecosystem releases CO 2 to the atmosphere. Rs values were measured using static chamber-gas chromatography techniques [41]. A total of four to six replicate measurements were obtained two to three times per month, with sampling times between 09:00 and 11:00. The Rs values thus obtained were converted to monthly averages to reduce the effects of wind, rain, and other weather.

Biometric Data
LAI was measured optically using an LAI-2000 plant canopy analyzer (LI-COR, Lincoln, NE, USA) at least quarterly every year and corrected using a foliage clumping index, which was set for plant functional type (PFT)-specific empirical values [42]. In the MDF analysis, the seasonal variation in LAI combined with the leaf C mass per leaf area (LCMA) parameter constrained the dynamic trajectory of the foliage C.
Data obtained for biomass observations made during the study period (2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015) were multiplied by 50% for conversion to C values. The biomass of leaves, stems, and roots was calculated using an allometric method according to measured diameters at breast height and tree height. The ratio of the biomass of fine roots and the biomass of entire roots in typical Chinese forests was obtained from Zhang and Wu (2001) [43] to facilitate the division of biomass into fine and coarse root components, and the biomass of coarse roots was combined with the biomass of branches and stems to provide estimates of woody biomass. The data for leaf, fine root, and woody biomass were used to constrain the corresponding C pools in the inverse analysis.
The aboveground litterfall biomass (foliage and fine twigs) was measured by 10 replicates of baskets (100 × 100 cm). Litter was collected monthly during the growing season and once during the non-growing season. All collected litter was dried at 70 • C for 24 h and weighed. We used the annual litter biomass data (2005,2006,2007,2010,2011) for inversion analysis to avoid the influence of wind on the measurement of litterfall biomass within an individual month. The results of long-term annual measurements indicate that the proportion of C content to dry matter in litter is essentially maintained at approximately 50%, and thus the average dry weight of litter can be reasonably accurately converted to litter C.

Soil Data
Soil organic matter (SOM) and soil bulk density were measured using the potassium dichromate oxidation titrimetric method and the cutting ring method, respectively. Soil sampling was performed in five layers (at depths of 0-10, 10-20, 20-40, 40-60, and 60-100 cm) at 5-year intervals (2005 and 2010), with three replicates per sampling. We calculated the soil organic C (SOC) using Equation (1) [44]: where SOC is the soil organic C density (g C/m 2 ) of all n layers, H i is soil thickness (cm), B i is soil bulk density (g/cm 3 ), and O i is the SOM content of the i-th layer (%).

Meteorological Driving Data
Data for meteorological drivers were obtained from the CERN Science and Technology Resource Service System, including daily meteorological data from 2005 to 2015 (air temperature (T, • C), photosynthetically active radiation (PAR, mol m −2 /day), relative humidity (RH, %), and saturated vapor pressure difference (VPD, hPa)). The data were processed by standardized quality control and gap-filling [40,45].

DALEC Model
The Data Assimilation Linked Ecosystem Carbon (DALEC) model is a simple ecosystem process model developed by Williams and colleagues [38], which integrates assimilated data and has been widely used in MDF analyses [5,9,27,38,46]. The DALEC model connects the C pools by flux. In the present study, we used the evergreen version of DALEC [27] (Figure 1), which represents the C pools of foliage (Cf), wood (Cw), fine roots (Cr), litter (Clit), and soil (Csom).
In general, the C cycle of forest ecosystems can be characterized in detail based on four properties, which are outlined as follows. (1) CO 2 in the atmosphere enters the ecosystem via photosynthesis, and the C cycle typically commences with canopy C [8]. In this regard, we estimated gross primary productivity (GPP) in the present study using a canopy photosynthesis model, which is a function of LAI, T, PAR, VPD, and RH [47]. (2) A proportion of the GPP is consumed as autotrophic respiration (Ra), whereas the remainder (i.e., net primary production, NPP) is allocated to the different plant C pools (i.e., foliage, wood, and fine root pools). The C from the plant pool subsequently enters the dead organic matter pool with temperature-dependent loss (heterotrophic respiration, Rh). (3) Subsequent C transfer is dominated by the upstream C pools (i.e., the litter is decomposed (D) and is incorporated into the soil). (4) The C released from C reservoirs is based on a first-order differential equation. In general, the C cycle of forest ecosystems can be characterized in detail based on four properties, which are outlined as follows. (1) CO2 in the atmosphere enters the ecosystem via photosynthesis, and the C cycle typically commences with canopy C [8]. In this regard, we estimated gross primary productivity (GPP) in the present study using a canopy photosynthesis model, which is a function of LAI, T, PAR, VPD, and RH [47]. (2) A proportion of the GPP is consumed as autotrophic respiration (Ra), whereas the remainder (i.e., net primary production, NPP) is allocated to the different plant C pools (i.e., foliage, wood, and fine root pools). The C from the plant pool subsequently enters the dead organic matter pool with temperature-dependent loss (heterotrophic respiration, Rh). (3) Subsequent C transfer is dominated by the upstream C pools (i.e., the litter is decomposed (D) and is incorporated into the soil). (4) The C released from C reservoirs is based on a first-order differential equation.

Parameterization of DALEC
In order to parameterize DALEC, we used the Markov Chain-Monte Carlo (MCMC) method, which is a non-sequential global search method based on Bayesian theory [2]. The MCMC algorithm treats the solved parameters as random variables that meet a certain prior probability distribution. According to the Bayesian formula, the MCMC sampling sequence is constructed through a large number of random samples [21]. The prior distribution of the parameters is used to determine their posterior distribution, and then infer the unknown parameters based on the posterior distribution. The Bayes formula is as follows (Equation (2)): where ρ(θ|x) and ρ(θ) are the posterior and prior probability density distributions of the parameters, ρ(x) is the probability of the observation data, and ρ(x|θ) is the prior parameter of the observations under the value, that is, the likelihood function of θ. In this study, we assumed that the random error follows a normal distribution, and its likelihood function is represented as follows: where n is the number of data points in the observations, xi and ηi are the i-th observations and simulations, respectively, and σ is the standard error of the observation

Parameterization of DALEC
In order to parameterize DALEC, we used the Markov Chain-Monte Carlo (MCMC) method, which is a non-sequential global search method based on Bayesian theory [2]. The MCMC algorithm treats the solved parameters as random variables that meet a certain prior probability distribution. According to the Bayesian formula, the MCMC sampling sequence is constructed through a large number of random samples [21]. The prior distribution of the parameters is used to determine their posterior distribution, and then infer the unknown parameters based on the posterior distribution. The Bayes formula is as follows (Equation (2)): where ρ(θ|x) and ρ(θ) are the posterior and prior probability density distributions of the parameters, ρ(x) is the probability of the observation data, and ρ(x|θ) is the prior parameter of the observations under the value, that is, the likelihood function of θ. In this study, we assumed that the random error follows a normal distribution, and its likelihood function is represented as follows: where n is the number of data points in the observations, x i and η i are the i-th observations and simulations, respectively, and σ is the standard error of the observation. We designed five optimization experiments (Runs 1 to 5; Table 1) to examine the effects of different data streams on parameters and simulation results. Using LAI observations as the base dataset (Run 1), we added other biometric data and C flux data in a step-by-step manner. All experiments assimilated the LAI time series, which can be used to constrain leaf C mass per unit area (LCMA) and indirectly constrains foliage carbon. Run 2 assimilated the biometric data (foliage, fine root, and wood biomass) reflecting plant C. We designed two experiments to constrain the associated C pool for the soil C simulation results, which were highly uncertain, with Run 3 assimilating the soil C information reflecting the dynamics of soil C storage and Run 4 assimilating the observational data for litter [48]. Run 5 assimilated all the observational data used, including C flux data (NEE, Rs) reflecting the carbon exchange process of the ecosystem. The root mean square error (RMSE) and mean absolute error (MAE) for simulated and observed values were used to quantify the error.

Parameter Posterior Distribution
In this study, we used biometric data and C flux data to constrain the parameters of the DALEC model. The MCMC method was used to estimate the posterior distribution of the 10 parameters of the DALEC model ( Figure 2). According to the shape of posterior distributions, the optimized parameters could be classified into three groups: well-constrained, poorly constrained, and edge hitting, and thus we were able to determine the constraint effect of different data streams on model parameters.  Table 2) estimated for the DALEC model using a variety of different data constraints (Runs 1-5, x-axis, see Table 1). All y-axis represent priori range. Whiskers indicate 95% confidence interval; box indicate interquartile range).   Table 2) estimated for the DALEC model using a variety of different data constraints (Runs 1-5, x-axis, see Table 1). All y-axis represent priori range. Whiskers indicate 95% confidence interval; box indicate interquartile range). The parameters were considered to be well constrained if their posterior distribution was normally distributed (or approximated to a normal distribution) [49]. The posterior distribution of the poorly constrained parameters was uniformly distributed, with the exception of the uniform distribution of Fg (fraction of GPP respired) in Run 1, this type of distribution was rarely observed in other experiments. The posterior distribution of the edge-hitting parameters was often concentrated near the extreme value of the parameter value range. For example, Td (decomposition rate) and Tw (rate of wood turnover) in Run 1. Although edge hitting was rarely detected in the posterior distribution with increases in constraint data, Tf (rate of foliage turnover) was the only parameter that was observed in all five experiments. Edge hitting may be partly ascribed to the insignificant constraining effect of the data in model parameters and the limit of the threshold of model parameters. Our analysis indicated that only two parameters (Fg and Tw) could be constrained using only LAI data (Run 1). Six parameters-Fg, Fnf (fraction of NPP allocated to foliage), Fnr (fraction of NPP allocated to roots), Tw, Tr (rate of root wood turnover), Et (parameter in exponential term of temperature-dependent rate parameter)-could be well constrained by plant C stock data (Run 2). These parameters are mainly related to the C pools of foliage, fine roots, and wood. Moreover, when the data for the soil C pool were included, Tl and Ts (representing the mineralization rates of litter and SOM, respectively) conformed to a normal distribution. With the addition of litter data (Run 4), Fg, Fnf, Tw, Tr, and Ts were all normally distributed. When all data streams were used in parameter optimization (Run 5), with the exception of Tf and Et, most parameters could be well constrained. In general, increasing the amounts of constraint data was found to be conducive to increasing the number of parameters that can be constrained. In particular, for certain parameters (e.g., Fnf and Td), a multiple constraint approach may be required in order to achieve well-constrained posterior distribution.

Parameter Uncertainties
The degree to which the posterior distribution of the parameters was constrained and the improvement in the prior value of the uniform distribution ( Figure 2) depend on both the data used to constrain the calibration (Table 1) and the parameters involved (Table 2). We found that the parameter posterior distributions generally included the upper and lower limits of the prior value range. However, the parameters Td, Tf, Tw, Tr, Tl, and Ts were noteworthy in all instances, for which the posterior interquartile ranges were significantly reduced (with average reductions of 81%, 87%, 74%, 67%, 75%, and 76%, respectively) compared with the parameter prior interquartile range (= half the width of the prior range, given a uniform prior).
Whether the parameters could be constrained by the data was dependent on the information contained in the data and the expression of this information in the model. When only LAI was used to constrain parameters (Run 1), the 95% confidence interval of the posterior distribution of six of the 10 parameters was outside the range of the prior interquartile, whereas for four of the 10 parameters, the posterior interquartile range of the posterior distribution was not substantially reduced compared with the prior value interquartile (a less than 20% reduction). Although increasing the number of constraint data sets used could generally reduce the uncertainty of the parameters, which parameters were affected was highly dependent on the data being assimilated. For example, soil respiration data (Run 5) were required to constrain the estimates of Fg, fine root data could strongly constrain Fnr, and litter (Run 4) data were needed to effectively prevent the posterior distribution of Tf from widening. Unsurprising, when all data streams were used (Run 5), we observed the tightest parameter distribution.

C Pools
When all the observed data were used to constrain state variables (Run 5), the goodness-of-fit statistics ( Table 3) and simulated C pool of the DALEC model ( Figure 3) indicated that the model was capable of capturing the observational data. When the simulations and observations of C pools (foliage (Cf), wood (Cw), fine root (Cr), litter (Clit), and soil (Csom)) were compared with Run 1, we observed that the RMSE decreased by 42%, 94%, 75%, 97%, and 93%, respectively, and that the MAE decreased by 60%, 94%, 75%, 96%, and 94%, respectively. In contrast, the RMSE and MAE of the simulated LAI error increased slightly (by less than 4%).  (Table 1). The solid line in the figure represents the average value of modeled results, the gray area represents the 95% confidence interval, and the blank circle represents the observed value).   (Table 1). The solid line in the figure represents the average value of modeled results, the gray area represents the 95% confidence interval, and the blank circle represents the observed value). Simulating foliage C pools represents a more challenging exercise. Although all the experiments captured the interannual variability of foliage C, the magnitude of this pool was significantly underestimated (Figure 3, Table 3). Assimilation of LAI time series data (Run 1), which reflect canopy dynamics, failed to adequately constrain foliage C. When biometric data (Run 2) reflecting the dynamics of vegetation C stocks were assimilated, RMSE and MAE were significantly reduced by 41% and 59%, respectively, compared with Run 1. Although the 95% confidence interval was reduced by 86% and the uncertainty was reduced, the RMSE and MAE remained at 347.74 and 217.55 g C m −2 , respectively. With the addition of soil C and litterfall information, the simulations of Cf were still markedly different from the observations. The smallest confidence interval for Cf was obtained in Run 5; however, the simulation results were still highly uncertain (RMSE > 300 g C m −2 , MAE > 200 g C m −2 ). These observations thus indicated that the fast-turnover C pool was not sufficiently constrained, which may indicate the need for additional types of data and/or larger amounts of data to constrain this C pool. Alternatively, the insufficient constraint could be attributable to uncertainties in the observations. We found that the assimilation of information on plant C pools (Run 2) strongly constrained the wood and fine roots C pools ( Figure 3, Table 3). When only LAI was assimilated (Run 1), the simulation of Cw was significantly underestimated, with RMSE and MAE values of 2847.26 and 2564.48 g C m −2 , respectively. Subsequent to the assimilation of wood information (Run 2), compared with Run 1, the RMSE decreased by 17-fold to 167.01 g C m −2 (Table 3), the MAE decreased by 21-fold to 19.28 g C m −2 , and the 95% confidence interval decreased by 91%. However, the incorporation of additional observation data (Run 3 to 5) had little effect on the simulation results of Cw (R 2 > 0.93), with the increases in RMSE and MAE being less than 3 and 10 g C m −2 , respectively (Table 3). Similar to Cw, we detected the largest confidence interval when information relating to Cr was not assimilated (i.e., Run 1). With the exception of Run 1, Cr (R 2 > 0.88) was well simulated in Run 2 to 5, and compared with Run 1, the RMSE and MAE decreased by an average of 71% and 72%, respectively, and the confidence interval decrease of more than 90%.
Soil C is a key component of process-based terrestrial ecosystem models and one of the largest sources of uncertainty, and in this regard, soil C time series data represent key data for C research, which can minimize the uncertainty of simulation results [48]. We obtained the smallest average RMSE and MAE values (108.99 and 90.08 g C m −2 , respectively) when assimilating soil C data. Compared with the experiments in which soil C information was not assimilated (Run 1 and 2), RMSE and MAE decreased by an average of 80% and 83%, respectively, and the average confidence interval decreased by an average of 95% (Figure 3, Table 3). However, after assimilating information for litterfall, the RMSE and MAE values increased by 215% and 266%, although the confidence interval decreased by 48%. With the assimilation of the C flux, the bias decreased, with RMSE and MAE values reaching 113.20 and 90.68 g C m −2 , respectively.
The litter C pool is notably sensitive to the assimilation biometric data (Figure 3, Table 3), and we found that when assimilating only LAI, the simulated values overestimated Clit, with RMSE and MAE values of 651.56 and 600.34 g C m −2 , respectively. With the assimilation of information on foliage, fine roots, and wood, the bias between the simulated and observed value was appreciably reduced (R 2 = 0.90), and RMSE and MAE decreased by 82% and 80%, respectively. With the addition of soil C data, the bias showed further decreases. After assimilating litterfall data, the RMSE and MAE further decreased to 15.00 and 13.05 g C m −2 , respectively. Compared with the experiments without assimilated litter (Run 1 to 3), RMSE and MAE decreased by an average of 76% and 78%, respectively, and the confidence interval decreased by an average of 58%.
When all observations with the exception of Csom were used to constrain the DALEC model, the confidence interval of the other four carbon pools were the narrowest (Csom was the narrowest at Run 4), with smaller bias, indicating that multi-source data is effective in constraining the DALEC model. It should be pointed out, however, that MDF entails a balance between variables [50]. For example, the simulation results of Run 3 and 4 revealed that better simulation results for Clit were gained at the expense of increasing the bias of other state variables (such as Csom).

C Fluxes
Compared with the observational data (R 2 = 0.003 and 0.001, respectively), assimilation of LAI and plant C pool information (Run 1 and 2) yielded mean RMSE and MAE values of 2.81 and 2.20g C m −2 , respectively, thereby indicating that these C sources do not reflect the true state of Chinese fir plantations ( Figure 4, Table 4) and that assimilation of LAI and plant C data alone cannot constrain NEE. When the soil C data were assimilated (Run 3), the simulation results are were C sink (R 2 = 0.094), and the average values of RMSE and MAE were 1.17 and 1.47 g C m −2 , respectively. After the addition of litterfall data (Run 4), RMSE and MAE increased slightly (R 2 = 0.098). Furthermore, with the assimilation of C flux (Run 5), the simulated total annual NEE in 2008 was −140.92 g C m −2 (−0.39 ± 1.22 g C m −2 ), the RMSE and MAE were 1.44 and 1.17 g C m −2 (R 2 = 0.099), respectively, and the simulation corresponded well with the observation (−168.90 g C m −2 ) (Table 4, Figure 4).   Table 1) (the solid line in the figure represents the average value of modeled results, the gray area represents the 95% confidence interval, and the blank circle represents the observed value). Soil respiration was also found to be sensitive to information assimilation ( Figure 4, Table 4), with the bias of the simulation results generally becoming smaller, with an increase in assimilated constraint data. The assimilation of LAI had virtually no constraining effect on Rs (R 2 = 0.97), whereas, in response to the addition of information on foliage, fine roots, and wood (Run 2), the bias of the simulation results increased (R 2 = 0.96), whereas in contrast, the addition of soil C (Run 3), resulted in a significant decrease in bias (R 2 = 0.85), and compared with Run 2, RMSE and MAE decreased by 83% and 81%, respectively. When information on litterfall was added (Run 4), the bias was further decreased (R 2 = 0.91). Not surprisingly, the bias reached a minimum, with RMSE and MAE values of 0.66 and 0.55g C m -2 , respectively, when C flux data were assimilated (Run 5), although the simulated  Table 1) (the solid line in the figure represents the average value of modeled results, the gray area represents the 95% confidence interval, and the blank circle represents the observed value).
Soil respiration was also found to be sensitive to information assimilation ( Figure 4, Table 4), with the bias of the simulation results generally becoming smaller, with an increase in assimilated constraint data. The assimilation of LAI had virtually no constraining effect on Rs (R 2 = 0.97), whereas, in response to the addition of information on foliage, fine roots, and wood (Run 2), the bias of the simulation results increased (R 2 = 0.96), whereas in contrast, the addition of soil C (Run 3), resulted in a significant decrease in bias (R 2 = 0.85), and compared with Run 2, RMSE and MAE decreased by 83% and 81%, respectively. When information on litterfall was added (Run 4), the bias was further decreased (R 2 = 0.91). Not surprisingly, the bias reached a minimum, with RMSE and MAE values of 0.66 and 0.55 g C m −2 , respectively, when C flux data were assimilated (Run 5), although the simulated value for soil respiration (1.75 ± 0.80 g C m −2 ) was slightly different from the observed value (1.21 ± 0.60 g C m −2 ) (R 2 = 0.93), which can be attributed mainly to an overestimation of summer soil respiration ( Figure 4, Table 4).

Discussion
In this study, we used multi-source data to constrain the parameters of a DALEC model for a subtropical coniferous plantation in Huitong, China. After assimilating all collected observations, we observed that the posterior interquartile range of parameters Td, Tf, Tw, Tr, Tl, and Ts were reduced by at least 60%, and the bias of simulations was markedly reduced, which enhanced the accuracy of the model. When constraint is based on a single source of data, the effect on the model is generally limited [6,9,12,25,51,52]. For example, when using only flux data or biometric data, only a fraction of the parameters in the model are typically well constrained. However, with the incorporation of additional observations, the estimates of many (although not all) model parameters are significantly tightened, and the uncertainty associated with the simulation results is generally substantially reduced (Tables 3 and 4; Figures 3 and 4) [9,25]. When all the relevant data were used to constrain the DALEC model, we found that only two of the 10 assessed parameters (Tr and Et) were poorly constrained (Figure 2), thereby indicating that other types of data constraint may be required, although this does not necessarily imply that larger quantities of constraint data will have a better constraining effect on the model [6,20]. In this regard, Keenan et al. (2013) [6] found that in terms of constraining model performance, much of the data information is redundant, as these authors observed that only five of the 17 data sources they evaluated were required for model constraint.
Whether the model parameters can be well constrained by the data is dependent on the data information content and how this information is represented in the model [1,20]. With respect to forest C cycling, for example, the data per se contains only limited information on some aspects of the C flux, such as NEE and Rs, and does not contain sufficient information to constrain parameters related to C pools, whereas C pool data includes the allocation and transfer between different C pools, which is relevant to internal system dynamic information. These data are extremely effective for constraining parameters and variables, and thus multiple data streams are needed to constrain the model. Plant C pool observations are effective in constraining the relevant parameters of leaf, fine root, and wood C pools, whereas C flux (NEE, Rs) has a strong constraining effect on NPP allocation and related parameters of soil C pools [53]. This indicates that observational data for foliage, fine root, and woody biomass provide information on C transfer from plant C to litter and soil C, and C flux data provide information on NPP allocation and carbon transfer between the litter and soil C pools.
Simulating foliage C remains a challenge necessitating further attention. Even with the assimilation of litterfall data (Run 4), the simulation of foliage C pools was not further constrained. This may be because fir canopies contain multiple years of foliage (i.e., foliage longevity is two or more years). In addition, the dynamics of foliar C pool has a limit point in the evergreen version of DALEC model [54], which depends on the values of Fnf and Tf, there may also be a need for further constraints on these two parameters. In the present study, we found that even when multiple types of observational data were assimilated, the simulation results for foliage C pools continued to show a large bias. Therefore, we speculate that other types of data are necessary for foliage C simulation (for example, soil moisture [55] and chlorophyll content [56] that are closely related to plant growth), suggesting that these should be targets for future measurement efforts. In addition, the simulated annual summed values of NEE were in good agreement with observed value (Run 5). However, it appears that the DALEC model significantly overpredicts Rs, thus should underpredict NEE for the year. To better simulate C flux, more amounts and smaller scales of data may be required. It is worth noting that the RMSE, MAE, and confidence intervals mainly reflect the impact of the differences in parameter values on the simulation results (uncertainty), and we did not consider the influence of the driving data, observational data, or model structure on the simulation results. Data errors (including system errors) are particularly important factors with regards to the estimation of parameters and state variables, and in future research, it will be necessary to take into account data errors and model structure, and the quantitative errors of the system.
The uncertainty associated with the simulation results for C pools increases with time, whereas the uncertainty for C flux is generally relatively small (Figures 3 and 4) [57,58]. This is because the size of C pools is a cumulative quantity, and simulation errors typically show a gradual increase with an increase in time, whereas the C flux represents a change within a given time period, and thus the error of the simulation results is relatively independent. In addition, in the present study, we applied a parameter optimization method for the batch method, that is, we assimilated all our observational data to optimize the parameters and state variables of the model, whereas the sequential method [such as the Kalman filter method [59,60] can make parameters and state variables with the appearance of the new observation update (calibration), if the flux data, simulation is carried out in real-time. This may be very important (for example, in numerical weather prediction), as it may effectively reduce the uncertainty of the simulation results. The uncertainty of model parameters and model prediction results generally decrease with the addition of new data to the inversion analysis, which enables us to gain a more comprehensive understanding of the information contained in these data [25,61]. C flux data also provides valuable constraining information, and since the measurement of these fluxes is often performed at many sites, these data are increasingly being used in MDF analysis. Our analysis indicates that incorporating multiple constraints in the inversion analysis can contribute to enhancing model simulation results by reducing the prediction bias and uncertainty during the forward run of the model.

Conclusions
Biometric data obtained from site observations and NEE data derived using the eddy covariance method provide rich sources of information for the parameters and state variables of constraint models. There are still challenges in simulating the foliar C pool, which needs to be paid attention to in future research. In this study, we found that the degrees to which model parameters are constrained and model performance is enhanced are dependent on the different constraint data employed. With an increase in assimilated data, we observed a gradual decrease in the uncertainty of simulations, and when all evaluated observations were used for model constraint, parameter uncertainties were minimized and the simulated values were in good agreement with the observed values. Plant carbon data were found to be the most effective in terms of significantly reducing the uncertainty of the DALEC model carbon pool, and when all observation data are used to constrain the model, the number of well-constrained model parameters increases, thereby enhancing the accuracy of model simulations. Furthermore, we observed that incorporating the additional constraints of soil organic carbon observations is a key factor with respect to reducing the uncertainty or bias associated with the soil carbon pool.