1. Introduction
Computer models make it possible to perform otherwise costly experiments from the office, e.g., distributed modelling of water fluxes across the hydrosphere. Physically based distributed hydrologic models have become increasingly complex owing to the large number of parameters needed to represent a variety of spatially distributed processes. These models are typically calibrated against stream gauge observations, i.e., a variable that lumps all hydrological processes at the catchment scale. This can cause equifinality problems [1] and poor performance of the simulated spatial patterns, since optimizing the water balance and streamflow dynamics is the only concern [2,3]. To solve this issue, the community needs models with flexible spatial parametrization and calibration frameworks that incorporate spatially distributed observations, e.g., remote sensing estimates of evapotranspiration (ET).
The basic idea behind any sensitivity analysis (SA) method is to relate the response of the model output to variations in the parameter values [4]. SA methods can, therefore, enhance our control over spatiotemporal model behavior [5]. Local (LSA) and global sensitivity analysis (GSA) methods evaluate the distinct and joint effects of model parameters, respectively [6,7,8].
While LSA methods evaluate point sensitivity in the parameter space [9], GSA covers the entire parameter space and parameter interactions as well [10,11], because GSA perturbs all parameters simultaneously to assess their inter-relations [12]. The best-known GSA methods are Sobol's method [13] and the Fourier amplitude sensitivity test (FAST) [14]. Both methods can evaluate the main effects (e.g., first-order sensitivity) and the elementary effects originally described by Morris [15]. The Morris method has been widely applied in hydrologic modelling; Herman et al. [16] were able to classify the parameters of a spatially distributed watershed model as sensitive or insensitive based on the Morris method with 300 times fewer model evaluations than Sobol's approach. GSA methods are usually considered more appropriate for hydrologic applications than LSA methods, since hydrological processes are nonlinear and the interactions between parameters have a substantial effect on the results. However, computational cost is the crucial obstacle to applying GSA methods in distributed hydrologic modelling [17,18]. LSA methods give fast results by assessing only one parameter at a time, without interactions between parameters [19]. The local derivatives are, however, based on a particular initial set in the parameter space.
The foremost objective of our study is to assess the major driving parameters for the spatial patterns of actual evapotranspiration (AET) simulated by a catchment model. Furthermore, we address how the selected initial parameter set affects the LSA results and how many initial sets are required for a robust sensitivity analysis. We evaluate parameter sensitivities using an LSA method with random and behavioral initial parameter samples (each containing 100 initial parameter sets). We focus on both the spatial patterns of AET over the basin and the temporal hydrograph dynamics, using multiple performance metrics to evaluate different aspects of the simulated maps and the hydrograph. Streamflow performance has typically been the main concern in conventional model calibrations, whereas improving the simulated spatial pattern during calibration has rarely been targeted [20,21,22,23,24,25,26]. A unique feature of our study is evaluating the model's sensitivity based on a set of 10 spatial metrics that, unlike traditional cell-to-cell metrics, provide true pattern information. We include an innovative metric based on empirical orthogonal function analysis [23] as well as the fractions skill score [27], among others, to evaluate the simulated spatial patterns. The added value of each metric is assessed with a redundancy test in order to identify the most robust metric(s) with unique information content to apply in a subsequent spatial calibration study.
Höllering et al. [28] performed a hydrologic fingerprint-based sensitivity analysis using temporally independent characteristics and the temporal dynamics of streamflow data only. They identified two major driving parameters for evapotranspiration and soil moisture dynamics in different mesoscale catchments in Germany and revealed their relation to different streamflow characteristics. Westerhoff [29] used remotely sensed ET data as an interpolator between point data and applied a simplified sensitivity analysis focusing only on interpolation between ground-based estimates based on simple (linear) relationships. In our study, we go a step further by using spatial patterns in sensitivity analyses, with multiple methods, for improved hydrological model calibration. We tested a large range of methods and spatial metrics. We applied the mesoscale hydrologic model (mHM) [30], which can simulate distributed variables using pedo-transfer functions (PTF) and related parameters. Additionally, a recently introduced dynamic ET scaling function (DSF) was used to increase the physical control on simulated spatial patterns of AET [2]. Ultimately, the parameters identified as important for simulating both stream discharge and spatial patterns of AET were used in a very recent model calibration study [2]. The novelty of the current study lies in using sensitivity maps, which show the difference between the initial run and the perturbation of a parameter, together with various spatial metrics to complement conventional SA with spatial pattern evaluation.
3. Methods
In this study, we applied the Latin hypercube sampling strategy [44] combined with a local sensitivity analysis approach [19]. Latin hypercube sampling, first introduced by McKay et al. [44], is an efficient multidimensional sampling scheme similar to Monte Carlo sampling (MCS) but requiring far fewer runs, as it avoids the clustering typical of uniform random sampling [45]. It splits the range of each variable into intervals of equal probability, and one value is randomly selected from each interval [46]. This improves the stability of MCS while preserving the tractability of random sampling.
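The stratified sampling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study; the function name and bounds are illustrative.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng=None):
    """Draw a Latin hypercube sample.

    Each parameter range is split into n_samples equally probable
    intervals; one value is drawn at random from every interval, and
    the intervals are shuffled independently for each parameter.
    """
    rng = np.random.default_rng(rng)
    bounds = np.asarray(bounds, dtype=float)   # shape (n_params, 2)
    n_params = bounds.shape[0]
    # one random point inside each of the n_samples strata of [0, 1)
    u = (rng.random((n_samples, n_params)) + np.arange(n_samples)[:, None]) / n_samples
    # shuffle the strata independently for every parameter
    for j in range(n_params):
        rng.shuffle(u[:, j])
    # scale the unit hypercube to the parameter bounds
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])

# example: 5 initial sets for 2 parameters with different ranges
sets = latin_hypercube(5, [[0.0, 1.0], [10.0, 20.0]], rng=42)
```

By construction, each column contains exactly one sample per stratum, which is what distinguishes this scheme from plain Monte Carlo sampling.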
We tested 10 performance measures (hereinafter called objective functions) to identify the most important parameters. Three of these metrics, i.e., Nash–Sutcliffe efficiency (NSE, Nash and Sutcliffe [47]), Kling–Gupta efficiency (KGE, Gupta et al. [48]), and percent bias (PB), focus on simulated streamflow, whereas the remaining objective functions focus on the spatial distribution of AET (Table 3). Although PB is included in the KGE equation, PB, which reflects errors in the water balance, was evaluated separately to consider the volume error in addition to the streamflow timing.
3.1. Objective Functions Focusing on Spatial Patterns
The objective functions focusing on streamflow, i.e., NSE, KGE, and PB, are all well known and therefore not presented in detail. In the following, we focus on introducing the spatial objective functions. Seven objective functions were used to evaluate the pattern similarity between the reference AET from TSEB and the simulated AET from mHM. Five of the seven spatial-pattern-oriented metrics are calculated from the spatial overlap of categorical maps [54]. In this study, we used three classes to transform the continuous patterns of AET into categorical maps: (1) below the 15th percentile, (2) above the 85th percentile, and (3) the remaining grids between the 15th and 85th percentiles. The empirical orthogonal function analysis does not require categorical maps, and the data therefore do not need to be transformed. The fractions skill score is based on categorical maps but uses more bins in the classification than the three mentioned above. The applied spatial metrics are all bias insensitive, which we consider favorable for a meaningful pattern comparison; the overall water balance error is well represented by the streamflow observations and is therefore captured by the respective objective functions. Moreover, AET is simulated in mm/day, whereas the remote sensing reference is given in W/m². The two units are closely related but differ in range; applying bias-insensitive metrics is therefore necessary.
3.1.1. Goodman and Kruskal’s Lambda
Goodman and Kruskal's lambda (λ) is a similarity metric for contingency tables. It has an optimal value of one, indicating a perfect match, and a lowest value of zero, indicating no overlap [55]. λ is calculated as

λ = ( ∑_{j=1..m} max_i(c_ij) − max_i(c_i·) ) / ( N − max_i(c_i·) )

where N is the total number of grids; m is the number of classes in the maps to be compared; c_ij is the number of grids assigned to class i in the first map (A) and to class j in the second map (B); c_i· is the number of grids contained in category i of map A; c_·j is the number of grids contained in category j of map B; max_i(c_i·) is the number of grids in the modal class of map A, i.e., the class with the largest number of grids; and max_i(c_ij) is the number of grids in the dominant class of map A within a given class j of map B.
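As a minimal illustration, the error-reduction form of λ can be computed from the contingency table of two flattened categorical maps. The implementation below is a sketch with illustrative names, not the code used in the study; it measures how much better the class of map A is predicted when the class of map B is known, relative to always guessing the modal class of A.

```python
import numpy as np

def gk_lambda(a, b, n_classes):
    """Goodman-Kruskal lambda for two categorical maps (sketch).

    a, b: integer class labels (flattened maps) of equal length.
    Assumes map A is not entirely one class (otherwise the
    denominator is zero).
    """
    c = np.zeros((n_classes, n_classes))        # contingency table
    for i, j in zip(a, b):
        c[i, j] += 1
    n = c.sum()
    modal_a = c.sum(axis=1).max()               # modal class of map A
    within = c.max(axis=0).sum()                # best guess of A per class of B
    return (within - modal_a) / (n - modal_a)

# identical maps give a perfect score
a = np.array([0, 0, 1, 1, 2, 2])
print(gk_lambda(a, a, 3))   # -> 1.0
```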
3.1.2. Theil’s Uncertainty Coefficient
Theil's uncertainty coefficient (U) is a measure of the percent reduction in error. It is also known as average mutual information [55]. It yields the same value whether the reference map is A or B, i.e., it is symmetric; therefore, a reference does not need to be defined. Unlike λ, which accounts only for the modal class, Theil's U considers the whole distribution of the data. It is based on entropy, joint entropy, and average mutual information [50,54]. Using the same notation as for λ, the information content (entropy) of map A is calculated as

H(A) = − ∑_{i=1..m} (c_i·/N) ln(c_i·/N)

and similarly for map B:

H(B) = − ∑_{j=1..m} (c_·j/N) ln(c_·j/N)

The joint entropy is then calculated as

H(A,B) = − ∑_{i=1..m} ∑_{j=1..m} (c_ij/N) ln(c_ij/N)

The information shared by maps A and B is estimated by the average mutual information I(A;B), i.e., the sum of the entropies of the two maps minus the joint entropy:

I(A;B) = H(A) + H(B) − H(A,B)

The uncertainty coefficient is then calculated as

U = 2 I(A;B) / (H(A) + H(B))
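The entropy-based steps above translate directly into code. The following is a minimal sketch (function names are illustrative) of the symmetric uncertainty coefficient computed from the contingency table of two flattened maps.

```python
import numpy as np

def theils_u(a, b, n_classes):
    """Symmetric Theil's uncertainty coefficient (sketch).

    U = 2 * I(A;B) / (H(A) + H(B)), computed from the joint
    distribution of class labels in maps a and b.
    """
    c = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):
        c[i, j] += 1
    p = c / c.sum()                    # joint probabilities

    def entropy(q):
        q = q[q > 0]                   # skip empty cells (0 ln 0 = 0)
        return -(q * np.log(q)).sum()

    h_a = entropy(p.sum(axis=1))       # entropy of map A
    h_b = entropy(p.sum(axis=0))       # entropy of map B
    h_ab = entropy(p.ravel())          # joint entropy
    i_ab = h_a + h_b - h_ab            # average mutual information
    return 2.0 * i_ab / (h_a + h_b)

a = np.array([0, 0, 1, 1, 2, 2])
print(theils_u(a, a, 3))   # identical maps -> 1.0
```

For statistically independent maps the mutual information vanishes and U goes to zero, which matches the interpretation of U as a percent reduction in error.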
3.1.3. Cramér’s V
Cramér's V is a metric based on Pearson's χ² statistic calculated from the contingency table of maps A and B [51]. Recently, Speich et al. [54] used this metric to assess the similarity of different bivariate maps for Switzerland; in their case, the variable pairs snowmelt and runoff as well as precipitation and PET were selected to describe the water balance in Swiss catchments. Using the same notation of c_ij, c_i·, and c_·j as for the metrics above, and with m and n denoting the number of classes in maps A and B, respectively, the χ² statistic can be calculated as

χ² = ∑_{i=1..m} ∑_{j=1..n} (c_ij − c_i· c_·j/N)² / (c_i· c_·j/N)

χ² always yields non-negative values; it is zero only when c_ij = c_i· c_·j/N for all i and j, i.e., when the two maps are statistically independent. A zero value hence indicates no similarity between the map pair. There have been different modifications of the χ²-based association measure [55], but the simplest and most widely used form was proposed by Cramér [51]. V is a transformation of χ², as shown below:

V = √( χ² / (N (min(m, n) − 1)) )
In an earlier study, Rees [55] used Cramér's V together with two other categorical association metrics (U and λ) to assess the similarity of two thematic maps derived from Landsat images. All three metrics investigated in that study appeared to work well, as they produced high values for maps that were reasonably similar and low values for maps that obviously differed. Rees [55] recommended Cramér's V for three reasons: (1) the metric is relatively simple to calculate; (2) it is symmetric, giving the same value whether the reference map is A or B; and (3) it performs slightly better than U and λ in discriminating between two different maps or confirming the similarity of two similar maps.
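A minimal computation of V from the χ² statistic might look as follows (a sketch with illustrative names; empty classes are excluded so that the normalization uses the number of occupied classes).

```python
import numpy as np

def cramers_v(a, b, n_classes):
    """Cramér's V from the chi-square statistic (sketch)."""
    c = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):
        c[i, j] += 1
    n = c.sum()
    # expected counts under independence: c_i. * c_.j / N
    expected = np.outer(c.sum(axis=1), c.sum(axis=0)) / n
    mask = expected > 0
    chi2 = ((c[mask] - expected[mask]) ** 2 / expected[mask]).sum()
    # number of occupied classes in each map
    k = min(np.count_nonzero(c.sum(axis=1)), np.count_nonzero(c.sum(axis=0)))
    return np.sqrt(chi2 / (n * (k - 1)))

a = np.array([0, 0, 1, 1, 2, 2])
print(cramers_v(a, a, 3))   # identical maps -> 1.0
```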
3.1.4. Mapcurves
Mapcurves (MC) is a goodness-of-fit (GOF) measure indicating the degree of match between two categorical maps [54]. It has an optimal value of one, whereas the lowest value is zero. For each pair of classes (i, j) between the two maps A and B, the algorithm calculates the GOF. The equations below are presented for the classes of map A (index i), since map A represents the observed maps and map B the simulated maps (index j); the calculation for the classes of map B is analogous. The GOF values are summed over the classes of the simulated map (B) for each class of the observed map (A):

GOF_i = ∑_{j=1..m} (c_ij / c_·j) (c_ij / c_i·)

where c_ij is the number of grids shared by class i of map A and class j of map B, and c_i· and c_·j are the class totals defined above. Note that the two maps must have the same size for this comparison. The GOF values are organized in ascending order to form the vector G. The values 0 and 1 are included in the series of G to allow the later integration of the function; the length of G is hence m + 2. For each GOF value x ∊ G, f(x) is calculated as the fraction of classes with a GOF greater than or equal to x:

f(x) = (1/m) ∑_{i=1..m} 1[GOF_i ≥ x]

The MC value is then calculated by integrating f(x) between zero and one, applying the trapezoid rule to compute the area under the curve. It has a best value of one.
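The full procedure can be sketched as follows. This is a minimal illustration (names and structure are ours, not the study's implementation), assuming the GOF form after Hargrove et al., i.e., overlap weighted by both class totals.

```python
import numpy as np

def mapcurves(a, b, n_classes):
    """Mapcurves goodness-of-fit score (sketch).

    a, b: integer class labels (flattened maps) of equal length.
    GOF per class i of the observed map A is the sum over classes j
    of the simulated map B of (overlap/total_j) * (overlap/total_i).
    The score is the area under the curve of the fraction of classes
    reaching at least a given GOF value.
    """
    c = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):
        c[i, j] += 1
    col = c.sum(axis=0)                      # class totals in map B
    row = c.sum(axis=1)                      # class totals in map A
    gof = np.zeros(n_classes)
    for i in range(n_classes):
        for j in range(n_classes):
            if c[i, j] > 0:
                gof[i] += (c[i, j] / col[j]) * (c[i, j] / row[i])
    # curve: fraction of classes with GOF >= x, integrated from 0 to 1
    xs = np.concatenate(([0.0], np.sort(gof), [1.0]))
    ys = np.array([(gof >= x).mean() for x in xs])
    return np.sum((ys[1:] + ys[:-1]) / 2.0 * np.diff(xs))  # trapezoid rule

a = np.array([0, 0, 1, 1, 2, 2])
print(mapcurves(a, a, 3))   # identical maps -> 1.0
```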
3.1.5. Empirical Orthogonal Functions
The empirical orthogonal function (EOF) analysis is a frequently applied tool to study the spatiotemporal variability of environmental and meteorological variables [56,57]. Its most important feature is that it decomposes the variability of a spatiotemporal dataset into two components, i.e., time-invariant orthogonal spatial patterns and a set of time-variant loadings [2]. Perry et al. [56] give a brief description of the mathematical background of the EOF analysis. The EOF-based similarity score (S_EOF) at time x is formulated from the loadings of the observed and simulated patterns, where n is the number of EOFs and w_i represents the covariation contribution of the ith EOF. In our study, we focused on the overall AET pattern performance and thus averaged S_EOF over the individual months of the growing season into a single overall skill score.
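The decomposition itself can be sketched via a singular value decomposition of the anomaly matrix. This is a minimal illustration of the EOF machinery only; the exact skill-score weighting follows the cited papers, and the function name and shapes are illustrative.

```python
import numpy as np

def eof_analysis(data):
    """EOF decomposition of a space-time dataset (sketch).

    data: array of shape (n_times, n_grids), e.g., monthly AET maps
    flattened to vectors. Returns the time-invariant spatial EOF
    patterns, the time-variant loadings, and each EOF's explained
    variance share (candidate weights w_i).
    """
    anomalies = data - data.mean(axis=0)          # remove the temporal mean
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt                                     # orthogonal spatial patterns
    loadings = u * s                              # time-variant loadings
    variance_share = s**2 / (s**2).sum()          # explained variance per EOF
    return eofs, loadings, variance_share
```

Feeding observed and simulated maps through a combined analysis and comparing their loadings, weighted by the variance share, is one way to arrive at a similarity score of the S_EOF type.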
3.1.6. Fractions Skill Score
Roberts and Lean [27] introduced the fractions skill score (FSS) to the atmospheric science community to quantify how the skill of precipitation products varies across spatial scales. Fractions relate to occurrences of values exceeding a certain threshold within a given window size (scale) and are compared between model and observation at individual grids. Most commonly, the thresholds represent percentiles, which eliminates the impact of any potential bias. Hence, FSS assesses the spatial performance of a model as a function of threshold and scale and has been applied by Gilleland et al. [58], Wolff et al. [59], and others to spatially validate precipitation forecasts. In summary, the FSS methodology performs the following steps: (1) truncate the observed (A) and simulated (B) spatial patterns into binary patterns for each threshold of interest; (2) compute the fractions A(n) and B(n) within a given spatial scale n, based on the number of grids that exceed the threshold and lie within the window of size n by n; and (3) estimate the mean squared error (MSE) and standardize it with a worst-case MSE_ref that corresponds to zero spatial agreement between A and B. The MSE is based on all grids (N_xy) that define the catchment area with dimensions N_x and N_y. For a certain threshold, the FSS at scale n is given by

FSS(n) = 1 − MSE(n) / MSE(n)_ref

where

MSE(n) = (1/N_xy) ∑_{x=1..N_x} ∑_{y=1..N_y} [A(n)_{x,y} − B(n)_{x,y}]²

and

MSE(n)_ref = (1/N_xy) [ ∑_{x=1..N_x} ∑_{y=1..N_y} A(n)²_{x,y} + ∑_{x=1..N_x} ∑_{y=1..N_y} B(n)²_{x,y} ]

FSS ranges from zero to one, where one indicates a perfect agreement between observed and simulated patterns and zero reflects the worst possible performance. To our knowledge, Koch et al. [60] were the first to transfer FSS from the atmospheric to the hydrological community, applying it to spatial patterns of land-surface variables simulated by a hydrological catchment model. The flexibility of FSS in terms of scale and threshold is very desirable in hydrological modelling applications, where uncertainties in model forcing and parameters as well as scale differences between model and observation hinder a meaningful validation at native scales. In this study, FSS was implemented in an automated manner. To reduce the computational time, an overall FSS score was computed as an average over six selected percentiles with individual scales. We decided to tolerate placement errors of the extreme percentiles (1% and 99%, which focus on the bottom and top 1% of AET, respectively) more than those of the moderate percentiles (20% and 80%) by assessing the former at a larger scale (25 km) than the latter (5 km). In addition, the 5th and 95th percentiles, which represent the bottom and top 5% of AET grids, were assessed at a 15-km scale. The average of these six percentiles was used as the overall FSS score.
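The three steps above can be sketched for a single percentile threshold as follows. This is a minimal, loop-based illustration (names are ours; a production version would use a convolution for the moving window and mask the catchment grids).

```python
import numpy as np

def fss(obs, sim, percentile, scale):
    """Fractions skill score for one percentile threshold (sketch).

    obs, sim: 2-D fields; percentile: e.g. 95 targets the top 5 % of
    grids; scale: moving-window size n in grid cells.
    """
    # 1) binary patterns from each field's own percentile (bias insensitive)
    bo = obs >= np.percentile(obs, percentile)
    bs = sim >= np.percentile(sim, percentile)

    # 2) fractions of exceeding grids within an n-by-n window
    def fractions(binary, n):
        pad = n // 2
        padded = np.pad(binary.astype(float), pad, mode="constant")
        out = np.empty(binary.shape, dtype=float)
        for x in range(binary.shape[0]):
            for y in range(binary.shape[1]):
                out[x, y] = padded[x:x + n, y:y + n].mean()
        return out

    fo, fs = fractions(bo, scale), fractions(bs, scale)

    # 3) MSE standardized by the zero-agreement reference
    mse = np.mean((fo - fs) ** 2)
    mse_ref = np.mean(fo ** 2) + np.mean(fs ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

rng = np.random.default_rng(0)
field = rng.random((20, 20))
print(fss(field, field, 95, 5))   # identical fields -> 1.0
```

Averaging such scores over several (percentile, scale) pairs, as described above, yields the overall FSS.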
3.2. Latin Hypercube Sampling One-Factor-at-a-Time Sensitivity Analysis
We used Latin hypercube (LH) sampling in combination with a local sensitivity analysis method that changes one factor at a time (OAT), i.e., an integration of a global sampling method with a local SA method. Each perturbation yields a local derivative based on a certain initial point in the parameter space [19]. A similar design, based on one random perturbation at a time along trajectories, was first proposed by Morris [15]. SA based on Monte Carlo simulation is robust but requires a large number of simulations. Alternatively, LHS is based on a stratified sampling method that divides each parameter range into N strata, each with a probability of occurrence of 1/N. This feature leads to a more robust sensitivity analysis for a given number of initial values [19]. Here, we tested whether behavioral initial parameter sets resulted in a different parameter identification than random initial parameter sets. In addition, we used 100 different initial sets to assess if and when the accumulated relative sensitivities became stable, which allowed us to evaluate how many initial samples are required to obtain robust results with LHS-OAT.
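The combination of LH-sampled initial sets with OAT perturbations can be sketched as below. The `model` callable stands in for a full mHM run reduced to one objective function (e.g., KGE on streamflow or FSS on AET patterns) and is hypothetical; the perturbation step and averaging are illustrative choices.

```python
import numpy as np

def lhs_oat_sensitivity(model, initial_sets, bounds, delta=0.05):
    """One-factor-at-a-time sensitivity around many initial sets (sketch).

    model: function mapping a parameter vector to a scalar objective
    (hypothetical stand-in for a full model run).
    initial_sets: array (n_sets, n_params) of LHS-sampled parameter sets.
    Returns the relative sensitivity of each parameter averaged over
    all initial sets.
    """
    bounds = np.asarray(bounds, dtype=float)
    n_sets, n_params = initial_sets.shape
    sens = np.zeros((n_sets, n_params))
    for s in range(n_sets):
        base = initial_sets[s].copy()
        f0 = model(base)                                  # reference run
        for p in range(n_params):
            pert = base.copy()
            step = delta * (bounds[p, 1] - bounds[p, 0])  # perturb one parameter
            pert[p] = min(pert[p] + step, bounds[p, 1])   # clamp to bounds
            sens[s, p] = abs(model(pert) - f0) / step     # local derivative
    return sens.mean(axis=0)

# toy example: the first parameter dominates the response
toy = lambda x: 3.0 * x[0] + 0.1 * x[1]
inits = np.array([[0.2, 0.2], [0.5, 0.5]])
print(lhs_oat_sensitivity(toy, inits, [[0.0, 1.0], [0.0, 1.0]], delta=0.1))
```

Tracking how the averaged sensitivities evolve as initial sets are added is the stability check described above.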
5. Discussion
In this study, we identified important parameters for streamflow dynamics and spatial patterns of AET by incorporating different spatial and temporal performance metrics in an LHS-OAT sensitivity analysis combined with spatial sensitivity maps. We first analyzed the suitability of spatial performance metrics for sensitivity analysis. The results suggest that a combination of complementary performance metrics should be included in a sensitivity analysis. Most importantly, the metrics should be able to separate similar and dissimilar spatial patterns. We concluded this by benchmarking a set of performance metrics against human perception, which is a reliable and well-trained reference for comparing spatial patterns. Furthermore, redundant metrics need to be identified and excluded from the analysis.
Another important point is that the modeler has to select or design a model parametrization scheme that allows the simulated spatial patterns to change while minimizing the number of model parameters. Otherwise, efforts towards a spatial model calibration will be inadequate. Here, the mHM model was selected due to the flexibility offered by its pedo-transfer functions.
Once the appropriate spatial performance metrics and model are selected, the model can be evaluated against relevant spatial observations; such a scheme can easily be combined with any parameter sensitivity analysis. We are aware that the use of another model could lead to different sensitivities and thus different conclusions. In our study, the parameters identified as sensitive to either streamflow or spatial patterns were used in a subsequent calibration framework. With this study, we ensured that we selected the best combination of objective functions for model calibration while simultaneously reducing its computational costs by reducing the number of model parameters.
Utility of the Multicriteria Spatial Sensitivity Analysis
To compensate for the weakness of one-at-a-time perturbation, we incorporated different random and behavioral initial sets in our study. This made the combined approach simple and robust when used with appropriate multiple metrics. It should be noted that LHS-OAT has been validated in different study areas [5,19]. Although parameter interactions are not evaluated explicitly, the identified parameters appear to be the most important and relevant parameters for calibration. This might stem from the fact that the parameters in mHM do not interact strongly with each other, as already shown by Cuntz et al. [18]. The resultant maps in Figure 4 are instrumental in deciding which parameters have a spatial effect on the results. The saturation after 20 initial sets (Figure 3) corresponds to 940 model simulations (20 × 47), with all 47 mHM parameters evaluated in LHS-OAT. This is far fewer runs than a first- and second-order sensitivity analysis based on Sobol's approach, which requires a minimum of 10⁴–10⁵ model simulations [63,64]. The gain in computational cost is mostly due to the fact that we are not primarily interested in quantitative sensitivity indices but rather in the importance ranking of the model parameters. Here, we showed that the LHS-OAT method can reduce the computational burden relative to GSA methods such as Sobol's.
In this study, we exemplified the impact of selecting thresholds of 0.5 for FSS and 0.15 for EOF on the selection of behavioral models, although it is recognized that these thresholds are case specific and rather arbitrary. While five decades of using NSE for streamflow evaluation have built familiarity with this metric and generated a consensus on thresholds for behavioral models regarding streamflow, those thresholds are also arbitrary and differ slightly from one study to another. It is anticipated that, once spatial metrics are used more often in hydrologic modelling practice, expertise in identifying suitable performance metrics and threshold values for satisfactory spatial performance will grow. Particularly for GLUE-type uncertainty analysis methods and other model calibration frameworks, a new spatiotemporal perspective on the definition of behavioral models is indispensable. The framework described in this study does not aim to replace discharge as a hydrologic model evaluation target; however, it strongly encourages that spatial data on, e.g., AET be used in conjunction with point discharge data when evaluating distributed models. By gradually making spatial pattern evaluation an integral part of standard distributed model evaluation and performance reporting, the hydrologic community will build a familiarity with spatial performance metrics similar to that already built with discharge metrics.
6. Conclusions
The effect of hydrologic model parameters on the spatial distribution of AET has been evaluated with the LHS-OAT method and sensitivity maps. This was done to identify the parameters with the strongest influence on both streamflow dynamics and monthly spatial patterns of AET. To increase the model's ability to change simulated spatial patterns during calibration, we introduced a new dynamic scaling function using actual vegetation information to update reference evapotranspiration at the model scale. Moreover, the uncertainties arising from random and behavioral Latin hypercube sampling were addressed. The following conclusions can be drawn from our results:
Based on the detailed analysis of spatial metrics, EOF, FSS, and Cramér's V are found to be relevant (nonredundant) metrics for the spatial comparison of categorical maps. Further, the PCC metric can provide an easily understood measure of map association, although it can be very sensitive to extreme values.
Based on the results from sensitivity analysis, vegetation and soil parametrization mainly control the spatial pattern of the actual evapotranspiration in the mHM model for this study area.
In addition, the interception, recharge, and geological parameters are also important for streamflow dynamics. Their effect on the spatial AET pattern is substantial but uniform over the basin. For interception, the lack of effect on the spatial pattern of AET is due to the exclusion of rainy days from the spatial pattern evaluation.
More than half of the 47 parameters included in this study have either little or no effect on simulated spatial patterns, i.e., noninformative parameters, in the Skjern Basin with the chosen setup. In total, only 17 of 47 mHM parameters were selected for a subsequent spatial calibration study.
The sensitivity maps are consistent with parameter types, as they reflect land cover, LAI, and soil maps of the Skjern Basin.
Combining NSE with a spatial metric strengthens the physical meaningfulness and robustness of selecting behavioral models.
Our results are in line with the study by Cornelissen et al. [24], showing that spatial parameterization directly affects the monthly AET patterns simulated by the hydrologic model. Further, Berezowski et al. [5] used a similar Latin hypercube one-factor-at-a-time algorithm for the sensitivity analysis of model parameters affecting simulated snow distribution patterns over the Biebrza River catchment in Poland. However, to our knowledge, this is the first study incorporating sensitivity maps with a wide range of spatial performance metrics. The LHS-OAT method is easy to apply and informative when used with bias-insensitive spatial metrics. The framework is transferable to other catchments, and further metrics can be added to the spatial metric group provided they are not redundant with the current ones.