On Approaches to Analyze the Sensitivity of Simulated Hydrologic Fluxes to Model Parameters in the Community Land Model

Effective sensitivity analysis approaches are needed to identify important parameters or factors and their uncertainties in complex Earth system models composed of multi-phase multi-component phenomena and multiple biogeophysical-biogeochemical processes. In this study, the impacts of 10 hydrologic parameters in the Community Land Model on simulations of runoff and latent heat flux are evaluated using data from a watershed. Different metrics, including residual statistics, the Nash–Sutcliffe coefficient, and log mean square error, are used as alternative measures of the deviations between the simulated and field observed values. Four sensitivity analysis (SA) approaches, including analysis of variance based on the generalized linear model, generalized cross validation based on the multivariate adaptive regression splines model, standardized regression coefficients based on a linear regression model, and analysis of variance based on support vector machine, are investigated. Results suggest that these approaches show consistent measurement of the impacts of major hydrologic parameters on response variables, but with differences in the relative contributions, particularly for the secondary parameters. The convergence behaviors of the SA with respect to the number of sampling points are also examined with different combinations of input parameter sets and output response variables and their alternative metrics. This study helps identify the optimal SA approach, provides guidance for the calibration of the Community Land Model parameters to improve the model simulations of land surface fluxes, and approximates the magnitudes to be adjusted in the parameter values during parametric model optimization.


Introduction
Recent advances in modeling Earth systems involve model development and integration, data collection, and high-performance computing.The integrated Earth system involves multi-phase, multi-component biogeophysical and biogeochemical processes.Integrated models introduce numerous model and coupling parameters and therefore formidable high-dimensional parameter space; moreover, many of the parameters cannot be observed or measured, and therefore are subject to great uncertainty.All of these factors make it difficult to predict and quantify uncertainty and risk for decision-making.A land surface model (LSM) is the land component of a numerical weather prediction or a climate model.Developed as physically based models, the parameters in LSMs are designed to be physically meaningful, measurable, and transferable to locations that share the same physical properties [1,2].Nevertheless, predictions from current-generation LSMs remain subject to large uncertainties because of model structure errors caused by inaccurate parameterizations used to represent physical processes, uncertainty in external forcing data and initial and boundary conditions, as well as uncertainty in model parameters.As LSMs become increasingly sophisticated with interacting processes described by complicated parameterizations yet with poorly constrained parameter values, more and more research is focusing on designing algorithms to better estimate and constrain LSM parameters and to quantify the associated uncertainties [3][4][5][6].Version 4 of the Community Land Model (CLM) was released by the National Center for Atmospheric Research as the land component within the Community Earth System Model (CESM) [7,8].CLM4 simulates biogeophysical processes such as energy and water fluxes from canopy and soil; heat transfer in soil and snow; the hydrology of canopy, soil, and snow; and stomatal physiology and photosynthesis.Even though most of its applications are conducted at continental or global scales [9,10], CLM4 can be run at any resolution, such as at flux tower sites [4,11] or small watersheds [12,13].The main governing equations and parameters are briefly introduced in Supplementary Materials Section 1.
Various techniques have been used for sensitivity analysis (SA) in land surface modeling [14][15][16][17].These uncertainty quantification (UQ) and SA approaches have different focuses: global vs. local sensitivity, derivative vs. interpreted variance, and quantitative vs. qualitative.To evaluate contributions quantitatively, one can choose from different approaches [18,19], such as generalized cross validation (GCV) [20], analysis of variance (ANOVA), standardized regression coefficients (SRCs) [21], the Morris method [22], and the Sobol method [23,24].Some of these approaches (e.g., the Morris and Sobol methods) are based on simulation data directly, while others (e.g., GCV, ANOVA, and SRC) are based on regression models.The simulation-based SA methods require numerous simulation trials for high-order parameter space, which can be unaffordable for computationally expensive simulations, so regression-based SA methods become favorable under such circumstances.The various regression models that could be used in SA include the generalized linear model (GLM) [25][26][27][28], multivariate adaptive regression splines (MARS) model [20,24,29,30], support vector machine (SVM) [31][32][33], Gaussian process [34], and artificial neural networks (ANN) [35].In this study, we investigated four SA approaches to provide quantitative measures of the significance of a parameter/factor by calculating its contribution of variance to the overall variability of the model responses.Using these methods, the effects of linear, interaction, and high-order terms of input variables can all be explored and narrowed down to fewer terms using parameter reduction/selection techniques.
In our previous studies [4,13], the sensitivity of surface fluxes and runoff simulations to major hydrologic parameters in CLM4 was investigated by integrating CLM4 with a stochastic exploratory UQ framework.The framework features an entropy approach to objectively define priors distributions of the input parameters that are representative of our knowledge of the system, an efficient sampling method to explore the parameter space so that the output statistics are exploratory of most if not all of the possibilities of the reality, and the multivariate GLM analyses and significance statistical tests to rank the significance of input parameters and develop relationships between inputs and outputs using response surface plots and the finalized linear models.The UQ framework was applied at 13 flux towers in the Ameriflux network and 20 watersheds of the Model Parameter Estimation Experiment (MOPEX) [13] spanning a wide range of climate, landscape, and soil conditions.We found that the CLM4-simulated latent heat flux (LH), sensible heat flux (SH), and runoff show the largest sensitivity to parameters of subsurface runoff generation.
It is particularly challenging to choose among the many SA approaches and various combinations of input and response variables.Firstly, it is widely recognized that different SA approaches might yield different conclusions regarding parameter sensitivities and/or developed Water 2015, 7,[6810][6811][6812][6813][6814][6815][6816][6817][6818][6819][6820][6821][6822][6823][6824][6825][6826] relationships between the input parameters and output responses.Secondly, the choices of statistical measures or the descriptive properties/metrics of model outputs may provide quite different findings and/or conclusions about the model sensitivity from different perspectives.Furthermore, using different metrics/measures/properties as the output variables in the SA would require different numbers of samples or ensemble simulations to yield converged response surfaces (output variable vs. input parameters).Therefore, the robustness of the conclusions derived using a single SA approach (e.g., Hou et al. [4] and Huang et al. [13]) is to be assessed.In addition, because the ultimate goal of SA is to provide guidance for calibration, rather than focusing solely on model responses to parameters, more attention must be paid to deviations between simulated and observed variables.
Therefore, the study documented here aimed to evaluate the sensitivity of deviations between simulated and observed runoff and latent heat flux, identify the possible ways of adjusting the input parameter values for model improvement, and evaluate the feasibility of model calibration given guidance results using the various SA methods.The SAs conducted during this study also provide systematic and quantitative means for reducing the order of parameter space that can significantly accelerate the calibration process.We also evaluated the robustness of the conclusions derived from previous studies by comparing the choices of response variables and/or metrics for calibration, sampling, and SA methods.Hence, the analyses described herein provide guidance for parameter inversion design before integrating strategies such as Bayesian inversion [36,37].
In Section 2, we describe the study site, its model parameterization, the different metrics used in the SAs, and the SA approaches.Section 3 presents the results of the effects of using different metrics and SA approaches and discusses the convergence behaviors relative to the number of training sample sets.Discussion and conclusions are provided in Section 4.

Site Description
To evaluate how various metrics or SA methods might affect the SA results, we explore four different approaches at one MOPEX watershed investigated in our previous studies [4,13,36].In the current study, the streamflow at the outlet of the watershed is retrieved from U.S. Geological Survey (USGS) station 07147800 located at Winfield, Walnut River, Kansas.The 1 km resolution evapotranspiration products at monthly time step during the period of 2000-2008, based on the Moderate Resolution Imaging Spectroradiometer (MODIS) [38] (i.e., MOD16), is aggregated over the entire watershed and serve as observations for validating CLM4 simulations.Details of the MOD16 algorithm can be found at the description of MODIS Global Evapotranspiration project (MOD16) [39].The watershed, covered by grasslands and croplands, has a drainage area of 4869.18 km 2 , and is hereafter referred to as MOPEX 07147800.Daily streamflow observation data are from the USGS station and are partitioned into surface and baseflow components using the one parameter recursive filter [40].The data are further aggregated to the monthly time step in this study to evaluate simulations of total runoff and the partitioning between surface and subsurface runoff climatologically.
In this study, CLM4 is applied as a typical lumped watershed hydrologic model by assuming that the entire watershed can be represented as one grid cell using surface and subsurface runoff generation formulations based on the TOPMODEL assumptions [12,41].A 3.8 m soil column is used to simulate the soil moisture redistribution.It is discretized into 10 layers.An unconfined aquifer with a prescribed storage capacity and specific yield is added to the bottom of the soil column that could potentially be recharged or supply water to the soil column.The temporal resolution is one hour, and the results are averaged in each month to match the resolution of the available observation data.Therefore, all of the comparisons between simulations and observations in this study are based on the differences between the averaged simulated values in each month for this single lumped grid Water 2015, 7, 6810-6826 and the monthly averaged water flow data recorded at the MOPEX 07147800 streamflow station or the monthly averaged latent heat derived from MODIS.

Model Parameterization
Ten parameters associated with the soil hydrologic process (i.e., surface and subsurface runoff, recharge, soil water transport) are evaluated in this study, and consistent with our earlier studies [4,13].The parameters are f max (maximum factional saturated area), C s (shape parameter of topographic index distribution ), f over (decay factor that represents the distribution of surface runoff with depth), f drai (decay factor that represents the distribution of subsurface runoff with depth), Q dm (maximum subsurface drainage), b (Clapp and Hornberger exponent), S y (specific yield), Ψ s (saturated soil matrix potential), K s (hydraulic conductivity), and θ s (porosity).Definitions of the 10 selected parameters are in Supplementary Materials Section 1.The uncertainty ranges and prior information about each input parameter are reported in our previous work [13] and are summarized in Supplementary Materials Table S1.Given the prior information, the prior probability density functions (pdfs) of input parameters are derived using the minimum-relative-entropy (MRE) algorithm [42,43].With the entropy concept, the MRE solution is unique and maximally uncommitted with respect to unknown information given information such as bounds and moments.Therefore, the input parametric uncertainties are fully represented in the form of the MRE prior pdfs, from which samples of the input parameters can be generated using efficient sampling approaches (e.g., quasi Monte Carlo (QMC), Latin Hypercube sampling (LHS)).The details of the MRE algorithm and how to apply it on CLM4 input parameters were introduced in our previous work [4].Other parameters, like leaf area index and rooting depth of vegetation, may also affect runoff and evapotranspiration.Considering too many parameters at the same time may significantly increase the computational cost and make the SA results unreliable.Therefore, it is preferable to not work directly with a very high-dimensional parameter space, because doing so would make the SA results inconclusive.One solution is to divide all of the CLM4 parameters into several groups for SA study according to their associated physical processes and interactions.We chose the group of parameters associated with the soil hydrologic process in this study.In future work, we will consider adding more parameters or evaluating different groups of parameters.

Response Variables and Metrics
In this study, we focus on CLM4-simulated runoff and latent heat flux at the MOPEX 0717800 basin from 2000 to 2008.The observed and simulated data to be evaluated include total runoff (runoff), subsurface runoff (q drai ), surface runoff (q over ), and latent heat flux (LH).Three metrics are applied to quantitatively evaluate the deviations between the CLM4 simulations and observations: residuals, Nash-Sutcliffe coefficient (NSC) [44,45], and log mean square error (LMSE).Residuals directly measure the deviation between simulation results and observed data, and they favor more of the high flow or high heat flux, because in general the absolute differences between simulations and site observations are bigger when the observations of runoff or latent heat flux are higher.The NSC measures the deviation between simulations and observations during the whole time period of investigation by a nonlinear combination of the correlation, the difference in the mean, and the difference in the variability.It emphasizes high flows or heat flux.LMSE evaluates the difference between simulations and observations during the whole time period of investigation as well.Because the value of runoff usually ranges from 0 to a few hundreds in the units of millimeters per day, the log 10 prunoffq usually ranges from ´10 to 2. Therefore, the LMSE puts much more weight on the small runoff values (e.g., near 0 mm/day).The same behavior is true for latent heat flux.Therefore, LMSE places more weight on low flows or low heat flux.

Sampling Methods
Input parametric uncertainty can be fully represented in the form of MRE prior pdfs.To reliably evaluate the propagation of such uncertainty using the CLM, samples should be generated from the pdfs to explore all possibilities within the parametric space without introducing undesired bias.However, as the model's parameter dimensionality increases, the number of samples required for a systematic approach to adequately cover the parametric space becomes unreasonable without efficient sampling methods.Furthermore, the distribution of samples should be consistent with prior information about the parameters.
QMC methods have become increasingly popular over the last two decades because of their faster convergence speeds and effective sampling of high-dimensional parametric space without clumping and gaps [4,46].In this study, 1024 QMC samples were generated from existing scrambled Sobol sequences [23,47] that ranged from 0 to 1 and were scaled to the actual input parameter ranges listed in our previous work [4].The QMC samples were projected onto the prior MRE pdfs to construct realizations of the samples in physical space, as shown in Figure 1.
Water 2015, 7, page-page efficient sampling methods.Furthermore, the distribution of samples should be consistent with prior information about the parameters.
QMC methods have become increasingly popular over the last two decades because of their faster convergence speeds and effective sampling of high-dimensional parametric space without clumping and gaps [4,46].In this study, 1024 QMC samples were generated from existing scrambled Sobol sequences [23,47] that ranged from 0 to 1 and were scaled to the actual input parameter ranges listed in our previous work [4].The QMC samples were projected onto the prior MRE pdfs to construct realizations of the samples in physical space, as shown in Figure 1.

Sensitivity Analysis Approaches
Sensitivity analysis can be based on the direct system model outputs or a surrogate (e.g., a regression model) that can fit the original system outputs to input parameters or factors.Analysis based on the direct outputs avoids the inaccuracy and biases in the regression models, but requires a large amount of sampling points with the necessary number of replicated values of input parameters [48,49], such as those produced using the replicated Latin Hypercube sampling (rLHS) method [50].Water 2015, 7, 6810-6826

Sensitivity Analysis Approaches
Sensitivity analysis can be based on the direct system model outputs or a surrogate (e.g., a regression model) that can fit the original system outputs to input parameters or factors.Analysis based on the direct outputs avoids the inaccuracy and biases in the regression models, but requires a large amount of sampling points with the necessary number of replicated values of input parameters [48,49], such as those produced using the replicated Latin Hypercube sampling (rLHS) method [50].For a high-dimensional input parameter space, the number of sampling points is usually more than 10,000 to achieve a converged result [51,52].This number of sampling points is unaffordable for most studies, particularly when the forward model (e.g., CLM4) is computationally demanding.Therefore, SA based on a regression model is more efficient, especially for high-dimensional input parameter space, because it does not require replicated values of each input parameter.The accuracy of the regression method directly affects the reliability of SA.In this study, four SA approaches are investigated, including analysis of variance (ANOVA) based on the generalized linear model (GLM), generalized cross validation (GCV) based on the multivariate adaptive regression splines (MARS) model, standardized regression coefficients (SRCs) based on the linear regression model (LM), and ANOVA based on support vector machine (SVM).These four approaches are briefly introduced in the following sections.

ANOVA Based on GLM
Assuming the response variable Y (e.g., runoff and/or its metrics) follows a normal distribution, a GLM is fitted with the following starting model: where x i,j represents the ith realization of the jth parameter, which can be original or transformed first-order, two-way interaction, or higher order terms; c j is the fitted coefficient for the jth parameter, which can be calculated by the least squares method [53,54]; Y i represents the ith realization of the response variable, such as r i,t , NSC i , and LMSE i in this study; and ε is model-fitting residuals.Generally, use of a GLM is recommended for a monotonic system [33], and it can offer an explicit equation to present the relationship between input and output parameters, which is favorable for engineering applications.ANOVA is a collection of statistical models used to analyze the regression models.In this study, a new regression model built by removing a selected Kth parameter is The total deviation difference between the new regression model (Equation ( 2)) and the completed regression model (Equation ( 1)) is defined as a quantitative measurement of the importance of parameter x K .This approach summarizes all of the contribution of the parameter x K in different orders and interaction terms.

GCV Based on the MARS Model
The MARS model is a form of a nonparametric regression technique and can be seen as an extension of linear models that automatically model nonlinearities and interactions between variables.The MARS regression model is of the following form: Water 2015, 7, 6810-6826 where B i pxq is a basis function, c i is a coefficient, and k is the number of MARS terms.The basis function B i pxq can be a constant, a hinge function [20], or a product of two or more hinge functions, and x represents input parameters.In the MARS model-building process, MARS repeatedly adds paired basis functions to the model, until the residual sum-of-squares (RSS) is smaller than required or the maximum number of terms (specified by the user) is reached.This process is called forward passing and it usually builds an over-fit model, which means the model has a good fit to the training data used to build the model but it will not generalize well to new data.Therefore, the MARS model can be implemented with backward passing to remove terms one by one until the GCV is small enough.GCV is defined as follows: where N is the number of observed system outputs, and P is a penalty coefficient, which is about 2 or 3 [20].With MARS terms dropped, the RSS always increases, but the denominator of Equation ( 4) increases as well, so the GCV is a form of regularization that trades off goodness-of-fit against model complexity.After the MARS model is specified, the GCV can be used to quantitatively measure the importance of input parameters.For a developed MARS model, when all of the terms related to an input parameter are removed, the RSS and the number of MARS terms (k) will change, which leads to GCV changes.Conversely, the GCV change due to removal of the MARS model terms related to a parameter can be used as a quantitative measure of how much the regression model depends on the removed input parameter, that is, the importance of the corresponding input parameter.The MARS model theoretically can offer better interpolation than the GLM for a nonmonotonic nonlinear system, so the measurement of input parameter importance is supposed to be more reliable.In addition, the convergence behavior and performance with the number of sampling points for the MARS model and GLM are different, as demonstrated and discussed in Section 3.3.

SRC Based on the LM
A linear polynomial regression analysis assumes the following form: For the m-th sample points, x are the input parameters, b are the coefficients to be determined, Y is the output from the computational model, and is the discrepancy between the simulation model output and the regression results.For the k-th input parameter, the SRC is defined as [21,55] where ŝ " and M is the total number of sampling sets, and symbol p q stands for the mean value.

ANOVA Based on SVM
The basic idea of the SVM regression model is to find a function that deviates most from the actually obtained outputs for all of the training data and, at the same time, is as flat as possible [31,32].For a linear function in the form of Y i " where a j is the weighting coefficient, b j is a translation coefficient, and x i,j is j th input parameter for the i th sample, the formulated SVM approximation is subject to # Y i ´řj"1 to J a j x i,j `bi ď ř j"1 to J a j x i,j `bi ´Yi ď where is the most required deviation.Because the SVM assumption may be unrealistic in general, a slack variable is used to control the tradeoff between flatness and how much deviation is tolerated.Coefficients a j can be calculated from the optimization problem (Equation ( 9)), and the b i can be calculated using Karush-Kuhn-Tucker (KKT) conditions [33].More details regarding SVM can be found in references [31][32][33].The ANOVA for SVM is designed to be the total deviation difference between a new regression model (Equations ( 10) and( 11)) that is kicked out a selected input parameter (x K ) and the completed regression model (Equations ( 8) and ( 9)) is defined as a quantitative measurement of the importance of parameter x K .
subject to # Y i ´řj"1 to J, j‰K a j x i,j `bi ď ř j"1 to J, j‰K a j x i,j `bi ´Yi ď

Effects of Choices of Response Variables/Metrics
Figure 2 shows boxplots [56][57][58] that compare the residuals for different response variables and metrics as a function of input parameters S y .In this study, S y is one of the input parameters that show significant sensitivity relative to the deviation between the simulation and observation data, so we only show the boxplots of S y to be concise.The boxplots for other input parameters are shown in Figures S1-S3 in the Supplementary Materials Section 3. As mentioned in Section 2.1, a 3.8 m soil column is used to simulate the soil moisture redistribution; it is discretized into 10 layers and an unconfined aquifer with a prescribed storage capacity and specific yield (S y ) is added to the bottom of the soil column that could potentially be recharged or supply water to the soil column.Generally, a large value of S y leads to more water exchange between soil layers and the aquifer.At this site, such exchange seems to reduce overland flow (q over ) and consequently increases subsurface runoff (q drai ) by infiltrating more water into the subsurface.The recharge of the aquifer enhances percolation of water from shallow to deep soil layers, reduces available water to the shallow rooted vegetation dominated by cropland and grassland, and therefore the latent heat flux (LH).On the other hand, under extremely wet conditions, both q drai and LH are high because there is sufficient soil water supply to both.Therefore, there is no simple linear relationship between S y and LH.The total amount of LH may be affected nonlinearly by the combination of S y and other parameters.In Figure 2, Water 2015, 7, 6810-6826 the vertical red line is the default value of S y , which is 0.18 at this site.The red circles are the corresponding metric value obtained by using the default values for all of the selected 10 input parameters.The boxplots of residual and NSC of total runoff, subsurface runoff q drai , and surface runoff q over indicate that increased values of S y can reduce the deviations between simulation and observation data.The boxplots of LMSE of total runoff, q drai , and q over do not indicate an obvious preference of S y value that can reduce the deviations.For latent heat (LH), residual and NSC indicate that decreased values of S y can reduce the deviations, while LMSE indicates that increased values of S y can reduce the deviations.In summary, the boxplots of various metrics of streamflow indicate that a large value of S y can significantly reduce the model deviations.The metrics of LH indicate that simultaneous adjustments of S y and other parameters are required to further reduce the deviation for LH.Many trials in the boxplots show that NSC LH approaches 1 and LMSE LH approaches 0 simultaneously with large S y values.It should be noted that the boxplot is mainly used for qualitatively showing the significance of the metrics variation due to changes in the input parameters and the potential for reducing the deviations by adjusting parameter values.It is impossible to find the optimal input parameter value directly from the boxplot.Systematic calibration of input parameters will be discussed in our future work.

Effects of Choices of SA Approach
The preceding analyses provide qualitative information (Figure 2 and Figures S1-S3) about which parameters are more significant, and therefore need to be adjusted when calibrating the CLM4, and they indicate the direction of such adjustments.Sensitivity analysis approaches can quantitatively measure the importance of each input parameter, but different approaches usually yield different evaluations, especially for complicated systems.In many cases, it is hard to justify which approach is best for the particular system.For example, ANOVA based on a GLM usually enlarges the importance of top-ranked parameters, so it is efficient for parameter screening and reduced-order model development.GCV based on a MARS model usually more evenly distributes the contribution of variance among all of the parameters, so it may enlarge the importance of lower ranked parameters.There is no universal method that can accurately evaluate the importance of input parameters for every system, but GCV is considered a more reliable SA approach for nonmonotonic nonlinear systems.It is reasonable to compare the measurements/findings from

Effects of Choices of SA Approach
The preceding analyses provide qualitative information (Figure 2 and Figures S1-S3) about which parameters are more significant, and therefore need to be adjusted when calibrating the CLM4, and they indicate the direction of such adjustments.Sensitivity analysis approaches can quantitatively measure the importance of each input parameter, but different approaches usually yield different evaluations, especially for complicated systems.In many cases, it is hard to justify which approach is best for the particular system.For example, ANOVA based on a GLM usually Water 2015, 7, 6810-6826 enlarges the importance of top-ranked parameters, so it is efficient for parameter screening and reduced-order model development.GCV based on a MARS model usually more evenly distributes the contribution of variance among all of the parameters, so it may enlarge the importance of lower ranked parameters.There is no universal method that can accurately evaluate the importance of input parameters for every system, but GCV is considered a more reliable SA approach for nonmonotonic nonlinear systems.It is reasonable to compare the measurements/findings from multiple approaches, such as ANOVA, GCV, and SRC, or combine them with qualitative analysis to provide supplementary information to each other and to remove ambiguities from the conclusions.As mentioned above, the reliability of the regression model affects the accuracy of the sensitivity measurement.Generally, a higher order regression model can fit the training set better, which means that an SA based on higher order regression model is more accurate.However, a high-order regression model may over-fit the training set, e.g., fitting the noises or outliers in the system, which may reduce the reliability of the SA results.Therefore, an appropriate order of the regression model is critical for SA.In this study, a first-order GLM, MARS, LM, and SVM model can provide a correlation coefficient (R 2 ) of about 0.2 to 0.4, and a second-order GLM or MARS model can fit the system R 2 of around 0.8 to 0.9.A third-order GLM or MARS model can provide a R 2 reaching 0.99, but the resulting relationships are hard to interpret physically, and likely to be over-fitted statistically given the high number of coefficients to be fitted with limited sample sets; therefore, third or higher order models are not considered in this study.
Figure 3 shows the sensitivity score of the input parameters for the NSC and LMSE of runoff, q drai , q over , and LH derived using six different approaches.They are SRC based on LM, ANOVA based on SVM, GCV based on first-and second-order MARS, and ANOVA based on first-and second-order GLM.The score stands for the percentage of the contributions to the total variance of the investigated output response from each input parameter's uncertainty.The warm color (red) stands for more important, and the cold color (green) means less important.Generally, S y shows considerable importance for most response variables and metrics.f drai is important for total runoff, subsurface runoff, and LH, while log 10 pK s q is more important for surface runoff and LH.log 10 pΨ s q is relatively important for surface runoff and latent heat as well.log 10 pQ dm q also shows considerable impact on q drai and LH.In general, the six approaches show consistent results.
Water 2015, 7, page-page Figure 3 shows the sensitivity score of the input parameters for the NSC and LMSE of runoff, , , and LH derived using six different approaches.They are SRC based on LM, ANOVA based on SVM, GCV based on first-and second-order MARS, and ANOVA based on first-and second-order GLM.The score stands for the percentage of the contributions to the total variance of the investigated output response from each input parameter's uncertainty.The warm color (red) stands for more important, and the cold color (green) means less important.Generally, shows considerable importance for most response variables and metrics.
is important for total runoff, subsurface runoff, and LH, while ( ) is more important for surface runoff and LH.
is relatively important for surface runoff and latent heat as well.( ) also shows considerable impact on and LH.In general, the six approaches show consistent results.Figure 4 shows the proportion of sensitivity of ANOVA based on the first-order GLM for residuals of runoff, , , and LH in 108 months to investigate the seasonal patterns of parameter importance.The proportion of sensitivity is calculated for each month separately.For the residuals of total runoff (Figure 4a), and are dominant in the contributions to the residuals between simulations and observed data in all of the 108 months.
is more important in winter and is more important in summer.Using the residual of as the response variable (Figure 4b), the sensitivity scores are almost the same as using total runoff.Regarding the residual of (Figure 4c), , , , and ( ) are the top four input parameters in most of the 108 months., , and are more important in winter and less important in summer, while ( ) is more important in summer and less important in winter.Figure 4d shows the residuals of LH. , as the critical parameter, is more important in winter and less important in summer.For the investigated site, mainly controls (indirectly) how fast the water can percolate through the soil layers; it modifies the lower boundary condition of the 1-D Richards equation by recharging the Figure 4 shows the proportion of sensitivity of ANOVA based on the first-order GLM for residuals of runoff, q drai , q over , and LH in 108 months to investigate the seasonal patterns of parameter importance.The proportion of sensitivity is calculated for each month separately.For the residuals of total runoff (Figure 4a), S y and f drai are dominant in the contributions to the residuals between simulations and observed data in all of the 108 months.f drai is more important in winter and S y is more important in summer.Using the residual of q drai as the response variable (Figure 4b), the sensitivity scores are almost the same as using total runoff.Regarding the residual of q over (Figure 4c), Water 2015, 7, 6810-6826 f max , f over , S y , and log 10 pK s q are the top four input parameters in most of the 108 months f max , f over , and S y are more important in winter and less important in summer, while log 10 pK s q is more important in summer and less important in winter.Figure 4d shows the residuals of LH. S y , as the critical parameter, is more important in winter and less important in summer.For the investigated site, S y mainly controls (indirectly) how fast the water can percolate through the soil layers; it modifies the lower boundary condition of the 1-D Richards equation by recharging the underlying aquifer layer during and after precipitation.Fast percolation leads to less overland flow and shallow-layer soil moisture, and therefore less LH.It should be noted that the proportion of sensitivity shown here is only for ANOVA based on the first-order GLM.The figures showing the proportion of sensitivity variation with time for other SA approaches are in Supplementary Materials Figure S4, which assigns a relatively lower proportion of sensitivity to S y .
Figure 5 summarizes the four most important input parameters (red blocks) for different response variables (total runoff, q drai , q over , and LH), metrics (NSC, LMSE, and residual), and four SA approaches (SRC based on LM, ANOVA based on SVM, GCV based on second-order MARS, and ANOVA based on second-order GLM).The total contributions percentages of the top four input parameters are also listed in the bottom line in Figure 5. ANOVA based on GLM or SVM indicates that the top four important parameters are adequate to calibrate the CLM4, because they account for 80% to 90% of the variations in response variables.However, GCV and SRC show that the top four most important parameters only contribute 60% to 80% to the total variability of the response variables.The quantitative importance measurement provides a guideline for further model calibration for CLM4.For example, using the four most important parameters listed in Figure 5 as unknowns, one can efficiently calibrate CLM by adjusting the parameters to match the total runoff simulations; adding the fifth and sixth important parameters (i.e., b and θ s ) would have little impact on matching runoff observations and CLM4 predictions.

Convergence, Under-Sampling Issues, and Model Verification
As mentioned in Section 3.2, different SA approaches may lead to different important measurement results for input parameters.Moreover, the convergence speed of the SA approach may vary, and the SA approach with higher convergence speed is always preferred.The convergence speed also varies with different response variables.Figure 6 shows the comparison of the convergence of sensitivity scores for different response variables (NSC of total runoff, , , and LH) and SA approaches (ANOVA based on first-order GLM and GCV based on first-order MARS).The convergence curves for LMSE and second-order regression models are shown in Figure Figure 5.The four most important input parameters (red blocks) for different response variables (total runoff, q drai , q over , and LH), metrics (NSC, LMSE, and residual), and four SA approaches (SRC based on LM, ANOVA based on SVM, GCV based on second-order MARS, and ANOVA based on second-order GLM).

Convergence, Under-Sampling Issues, and Model Verification
As mentioned in Section 3.2, different SA approaches may lead to different important measurement results for input parameters.Moreover, the convergence speed of the SA approach may vary, and the SA approach with higher convergence speed is always preferred.The convergence speed also varies with different response variables.Figure 6 shows the comparison of the convergence of sensitivity scores for different response variables (NSC of total runoff, q drai , q over , and LH) and SA approaches (ANOVA based on first-order GLM and GCV based on first-order MARS).The convergence curves for LMSE and second-order regression models are shown in Figure S5 in the Supplementary Materials, where only the three most important input parameters are compared.For the NSCs of total runoff and q drai (Figure 6a,b), around 700 sampling points are needed to get a converged sensitivity measurement, and the GCV measurement still shows small fluctuations after 700 sampling points.ANOVA measurement seems to be a little more stable than GCV.For the NSC of q over (Figure 6c), around 250 sampling points are needed for stabilized sensitivity scores; similarly, GCV shows slightly larger fluctuations than ANOVA.For the NSC of LH (Figure 6d), 250 samples are needed, while the LMSE of LH needs about 400 sampling points.In summary, GCV and ANOVA can both achieve relatively stable measurements using a reasonable number of sampling points.The sensitivity score results converge slightly slower than the ANOVA results.From this point of view, the ANOVA results are more reliable with limited samples.In addition, the convergence curves provide guidance on the reliability of SA with different numbers of sampling points.For the field site of investigation, the recommended number of sampling points would be at least 400 in order to fully understand and therefore use the calibratable portion of the total variability in the observations.If the research focus is to separate the dominant and secondary input parameters, 250 or even 128 sampling points can provide relatively reliable results in ranking the parameter significance.
Water 2015, 7, 6810-6826 Theoretically, the MARS model can be used to build a regression model that perfectly matches the training set (here the CLM4 simulations), because the MARS model can keep adding hinge functions into the model to reduce the deviations to 0. Therefore, the MARS model provides a more accurate regressed model than the other three regression models.However, a more accurate regression model does not guarantee that the corresponding sensitivity measurement will be more reliable, because the training data not only provide information about the relationship between input and output parameters, but also contain noises associated with errors and uncertainties in observation-based data and numerical errors in CLM4 simulations.The regression model that perfectly matches the training data set might be over-fitted; therefore, it is reasonable to use multiple models to provide complementary information.Considering that the four SA approaches provide similar sensitivity measurements for most response variables in this study, we believe they are reliable for the study site.
The first-and regression models were used to fit the response variables, and the impacts of the order of regression models on the sensitivity measurement were investigated.If the relationships between response variables and input parameters are linear, the order of regression models do not affect the sensitivity measurement much, e.g., the NSC of q over , NSC of LH, and LMSE of q drai .If the relationships between response variables and input parameters are mostly quadratic, the order of regression models affect the sensitivity measurement, e.g., the NSC of runoff, NSC of q drai , LMSE of runoff, and LMSE of q over .This study also evaluated the effectiveness of quantitative SA approaches on different types of CLM4 output data, including runoff and latent heat fluxes.Different metrics of these simulated deviations are treated as response variables in the SA, which identifies the parameters dominating the deviations and therefore should be included in the model calibration or parameter inversion design.Depending on the purpose of matching the whole time series (low, high, and medium values) or just the extreme events (e.g., high runoff/LH), different sets of parameters should be inverted.For example, in model calibration on subsurface runoff, if the purpose is matching whole time series, S y , f drai , and log 10 pQ dm q would be the parameters to be adjusted, and adjustment of f max and log 10 pK s q might also be considered if higher accuracy numerical prediction is expected.If the calibration focuses on a low flow regime, S y , f drai , log 10 pΨ s q, log 10 pQ dm q, and f over should be included as unknowns.In addition, adopting different SA approaches may yield different ideas of inversion setup, and such differences are more obvious for parameters whose contributions are somewhere between dominant and secondary.The common parameters identified to be important by different SA approaches are of high priority to be optimized for reducing the discrepancies between runoff/LH observations and model simulations.It is reasonable to conduct joint inversion of multiple data or metrics to provide supplementary information to each other.
As mentioned in Section 2, the subsurface and surface runoffs are derived from total runoff.Because the subsurface runoff dominates the total runoff in the studied basin, the SA results are almost the same for total runoff and subsurface runoff.The metrics of NSC and LMSE indicate that the separation of subsurface runoff from total runoff helps improve the matches between simulated q drai and observations.Because the temporal variations in total runoff and surface runoff (q over ) are mainly caused by precipitation, the effects of the input parameters may be concealed by the temporal variations in precipitation, especially the short-term components.Separating the subsurface flow (q drai ) from total runoff might weaken the influence of precipitation variations and enhance the contributions from the input parameters.As a result, adjustments of the input parameters can clearly modify the simulated q drai , and therefore it could be more favorable to use q drai for model calibration.
The convergence behaviors (as a function of the number of samples) of the SA approaches provide guidance for the number of sampling points to be used for reliable SA and parameter identifiability study.For the study site, the recommended number of sample sets is at least 400.If the focus is to separate the dominant and secondary input parameters, 250 or even 128 sampling points are acceptable [4].

Figure 1 .
Figure 1.Paired scatterplots of realizations of quasi-Monte Carlo samples for the MOPEX 07147800.

Figure 1 .
Figure 1.Paired scatterplots of realizations of quasi-Monte Carlo samples for the MOPEX 07147800.

Figure 2 .
Figure 2. Boxplots for different response variables and metrics as a function of input parameters specific yield ( ).

Figure 2 .
Figure 2. Boxplots for different response variables and metrics as a function of input parameters specific yield (S y ).

Figure 3 .
Figure 3. Sensitivity scores for different response variables/metrics and SA approaches.

Figure 3 .
Figure 3. Sensitivity scores for different response variables/metrics and SA approaches.

Water 2015, 7 ,
page-page simulations; adding the fifth and sixth important parameters (i.e., and θ ) would have little impact on matching runoff observations and CLM4 predictions.

Figure 5 .
Figure 5.The four most important input parameters (red blocks) for different response variables (total runoff, , , and LH), metrics (NSC, LMSE, and residual), and four SA approaches (SRC based on LM, ANOVA based on SVM, GCV based on second-order MARS, and ANOVA based on second-order GLM).