Aberystwyth Extending the Global Sensitivity Analysis of the SimSphere model in the Context of its Future Exploitation by the Scientific Community

: In today’s changing climate, the development of robust, accurate and globally applicable models is imperative for a wider understanding of Earth’s terrestrial biosphere. Moreover, an understanding of the representation, sensitivity and coherence of such models are vital for the operationalisation of any physically based model. A Global Sensitivity Analysis (GSA) was conducted on the SimSphere land biosphere model in which a meta-modelling method adopting Bayesian theory was implemented. Initially, effects of assuming uniform probability distribution functions (PDFs) for the model inputs, when examining sensitivity of key quantities simulated by SimSphere at different output times, were examined. The development of topographic model input of the output variance decomposition, but it did not seem to change the relative importance of each input parameter. Sensitivity Analysis (SA) results of the newly modelled outputs allowed identification of the most responsive model inputs and interactions. Our study presents an important step forward in SimSphere verification given the increasing interest in its use both as an independent modelling and educational tool. Furthermore, this study is very timely given on-going efforts towards the development of operational products based on the synergy of SimSphere with Earth Observation (EO) data. In this context, results also provide additional support for the potential applicability of the assimilation of spatial analysis data derived from GIS and EO data into an accurate modelling framework.


Introduction
Advances in computer science over recent decades have led to the increased use of sophisticated deterministic models in the simulation and prediction of processes, feedbacks and mechanisms related to a number of science and engineering fields [1,2]. These computer-based system models have become indispensable in many disciplines, ranging from finance to life-sciences and from quantum physics to Earth sciences and environmental engineering [3]. Notably, the advent of environmental modelling has made it possible to study the complex interconnected state variables and physical processes that represent the mass and energy exchanges of land-surface interactions within an Earth system [4]. In particular, Soil-Vegetation-Atmosphere Transfer (SVAT) models are one such type of deterministic models designed to simulate the continuously evolving interactions and feedback processes within the soil-plantatmosphere continuum [5]. These models are essentially mathematical representations of 1-dimensional "views" of the increasingly complex physical mechanisms governing radiative, turbulent and water transfers [6]. They require a large set of input parameters and initial state variables, some of which are spatially and temporally very variable, where in general, model complexity and the number of parameters increase simultaneously [7]. Some SVAT models provide a framework for assessing the spatial variability of mass and energy exchanges by combining the remotely sensed surface conditions within a Surface Temperature (Ts) and Vegetation Index (VI) feature space with a land surface process model to derive regional maps of energy fluxes [8]. One such approach is the so-called "triangle" method [9]. This technique allows a mechanistic framework for scaling small scale observations to regional fluxes [10] through the combined use of models and Earth Observation (EO) data. It also addresses issues related to sub-grid variability and the impact of spatial heterogeneity on the sources and sinks of mass, energy, and heat fluxes [11]. The SimSphere land biosphere model is a recent example of a SVAT model which is implemented to the "triangle" method. Formerly known as the Penn-State University Biosphere-Atmosphere Modeling Scheme (PSUBAMS) [12][13][14], it was modified to its current state by Gillies et al. [15] and Petropoulos et al. [16]. The synergistic use of SimSphere with EO data via the "triangle" method allows the pixel distribution from an image to fix the boundary conditions for the model, thereby largely bypassing the need for ancillary atmospheric and surface data. Variants of this method are currently being considered for the development of operational products from EO data, some anticipated to be delivered on a global scale [17][18][19].
It should be noted that parameter estimation can become a high-dimensional, multi-modal and mostly non-linear problem in land-surface deterministic models such as SimSphere. This is due to the inherently intricate nature of the complex feedbacks and interactions represented by the model [20]. This difficulty can be further compounded by over-parameterisation problems associated with the complexity of such models, as well as the problem of equifinality (i.e., different parameter sets resulting in identical or similar model performance measures) [3,21]. This is not a trivial issue as such problems can present a barrier to the quantitative analysis of model performance [1,3]. To overcome these difficulties, Sensitivity Analysis (SA) should be performed as a key component of any modelling process. By helping to identify redundant parameters, SA allows the modeller to focus on those input parameters that are influential. With this simplification, the difficulties mentioned above should be easier to overcome. In general terms, SA contains mathematical approaches used to quantify the relative influence of each input parameter on the model's output variability [22,23]. It allows for an objective assessment of model structure and coherence, and also a quantitative evaluation of the influence of each parameter on model performance [1,3]. Thus, SA can be defined as an application of qualitative or quantitative tools to study how the variation in the output of a model can be attributed to the uncertainty in the input information provided [24]. As a result, it provides a valuable method to determine which input parameters are important (factor prioritisation) and which factors are non-influential (factor fixing) and rank them in order of importance [25,26]. The latter can be significant since low-impact modelled outputs may be converted to fixed values or dropped to simplify the model, reducing the required computing power. It may also be possible to concentrate on high-impact parameters during calibration or when offering guidance to the design of experimental programs for more efficient model coding [1,23,27]. SA methods can be categorised as either local (LSA) or global (GSA). In the LSA (so-called "local perturbation") the partial derivatives of the system response with respect to its input variables are evaluated at a given base (local) point in the input space. Therefore, the information it gives about the sensitivities is totally dependent on the point at which the partial derivatives are evaluated [2]. Criticism of this technique is centred on the fact that it can only obtain a qualitative analysis of the importance of each factor on the output response for a limited subset and particular values of the input variables, and is deemed to be an inadequate method for analysing a high number of complex and high-dimensional parameters [28]. A model-independent GSA technique is preferable in these situations; however, an LSA technique is frequently adopted due to its relatively low computational burden [29]. Contrary to local SA approaches, the global SA aims to quantify the effect of an individual or a set of inputs to the uncertainty of model output by considering the inputs over their full distribution ranges [30]. This is achieved by screening the non-influential input parameters on model outputs during the parameterisation process. The complexity of parameter estimation can thus be reduced by removing the non-influential input parameters. Because the interactions between parameters can deeply affect the strategy of parameter estimation, the interactions between modelled parameters are also evaluated [31,32]. The GSA approach, despite its often high computational demands, has recently become popular in environmental modelling due to its ability to incorporate parameter interactions whilst also allowing relatively straightforward interpretation [33][34][35].
SA is a valuable technique to verify any physically based deterministic model, such as SimSphere. SA studies typically contribute to the wider understanding of model architecture, as well as comprehensively develop a thorough scientific understanding of the physical processes being modelled. As a result, numerous SA techniques have been developed, these include: (i) studies associated with the evaluation of outputs simulated by the model e.g., [14,36]; (ii) studies where the model has been employed in examining hypothetical scenarios e.g., [37,38]; and (iii) studies where the model has been used in combination with EO data, e.g., to derive spatially distributed maps of energy fluxes and soil surface moisture content e.g., [15,39]. In addition, some SA studies have been undertaken as part of the verification of SimSphere e.g., [40,41]. However, in these studies, the sensitivity of specific model outputs to the model inputs has typically been examined by varying the inputs around their expected "nominal" values and observing the effect on the desired model output-often by calculating the derivative of the output with respect to the input considered [42]. These SA studies have been based on basic empirical or LSA methods. Although simple, these analyses were able to identify a number of critical input parameters in a number of studies while also identifying parameters associated with model initialisation error. Recognising the requirement for further SA on SimSphere using the more robust GSA technique, Petropoulos et al. [6,16,42,43] performed a series of GSA experiments utilising a Gaussian process emulator, providing for the first time an insight into the model architecture. This study allowed for the quantitative evaluation of key target quantities simulated by the model with respect to its inputs [43]. Previous work has analysed the impact of changes in Probability Distribution Functions (PDFs) [6,16], changes in atmospheric sounding settings [44] and of a wide range of output variables, representing a significant step forward in global SimSphere verification efforts, particularly those focusing on the development of global operational products from the synergy of SimSphere with EO data.
Although previous SA works have examined the ability of SimSphere to provide realistic simulations of key land surface processes, the effect of varying the time of model simulation and its relative influence on how input parameters influence output variance decomposition is yet to be examined. A wide variety of interconnected radiative (incident angle of solar radiation), topographic (slope and aspect), biotic (shading of the ground surface, soil characteristics, influence on roughness, soil moisture availability) factors can all influence these land surface processes by controlling the net radiation balance and proportion of energy transferred as latent and sensible heat. The influence of these factors change throughout the day, based primarily on changes to the energy and hydrological cycles and the interplay between different processes and mechanics of the Earth system. In addition, as SimSphere is used synergistically with EO data to produce spatio-temporal estimates of energy fluxes and soil moisture content via the "triangle" method, it is of key interest to evaluate the model's performance at different model output times that are close to the overpass times of common polar orbiting satellites which provide data suitable for the "triangle" implementation. Thus to provide a full and comprehensive programme of testing for this model, analysing the influence of parameters, the relationships between them and how they are affected by time of day is clearly needed.
In this context, this study performs a GSA on SimSphere aiming to further extend our understanding of the model structure and establish its coherence. Initially, the effect of assuming uniform PDFs for the model inputs on modelled outputs ( Subsequently, a further exploration of the sensitivity of new, previously unexplored outputs simulated by the model was also undertaken. This study builds on previous SA analyses of the model aiming to help establish a more stable and comprehensive evaluation of the model structure and correspondence to the processes it simulates. To satisfy these objectives, a cutting edge GSA meta-modelling method adopting Bayesian theory is implemented.

SimSphere Land Biosphere Model
Carlson and Boland [12] originally developed the SimSphere Soil Vegetation Atmosphere Transfer (SVAT) model at Penn State's Department of Meteorology and Atmospheric Science. After initial development, the model was modified considerably by Gillies et al. [15], and to its current state by Petropoulos et al. [16]. A detailed mathematical account of the model's bare soil component has been described by Carlson et al. [13], its vegetation component by Mehrez et al. [45] and the model plant hydraulics by Carlson and Lynn [46]. Gillies [47] provides a comprehensive systematic description of the overall model architecture and its initialisation process. SimSphere is at present distributed globally from Aberystwyth University, United Kingdom (UK) [48].
The physical components of SimSphere ultimately determine the microclimate conditions in the model and are grouped into three categories-radiative, atmospheric and hydrological. The primary forcing in the model is the available radiant energy reaching the surface or the plant canopy. The components of the vertical structure effectively correspond to the components of the Planetary Boundary Layer (PBL) that are divided into three layers-a surface mixing layer, a surface of constant flux layer and a surface transition layer, the depths of which are variable with time. The vertical structure also contains a fourth layer, the substrate layer, which refers to the depth of the soil over which heat and water is conducted. The third model facet, the horizontal component, is used to account for the spatial diversity of vegetation cover across the land surface. This component accounts for a layer of vegetation between the atmospheric surface layer and the ground surface. Heat and moisture fluxes are exchanged between both the ground and foliage, with the inter-plant airspaces through resistances in the leaf (for water vapour) and the air. SimSphere is therefore, fundamentally, a one-dimensional boundary layer model with a plant component. Over a 24 h cycle, SimSphere simulates physical processes occurring in a 1D vertical column between the root zone and the lower atmosphere. Spatially, the model is only representative of an area the size of its parameterised conditions. Initial conditions are set at 06.00 h local time and develop at a time step of 15' or better. Model outputs include variables associated with the radiative, hydrological and atmospheric physical domains; (e.g., energy fluxes, the transfer of water in the soil and in the vegetation, carbon dioxide flux between the atmosphere and the vegetation, surface ozone flux, wind velocity, temperature and humidity above the surface canopy). In addition to its use as a stand-alone modelling tool, SimSphere can also be integrated synergistically with EO data using the "triangle" method which aims to derive regional estimates of Latent Heat (LE) and Sensible Heat (H) fluxes as well as Surface Soil Moisture (Mo) [9,49]. A recent overview of SimSphere use can be found in Petropoulos et al. [50].

BACCO GSA: Principles
Research in the field of Bayesian statistics has recently produced a number of general purpose tools for the analysis of computer simulators. Bayesian Analysis of Computer Code Outputs (BACCO) includes methods designed for prediction, GSA, and Uncertainty Analysis (UA). BACCO methods allow for a comprehensive global assessment across a large number of model inputs, accounting for prior information about the inputs and outputs, in a way that is highly efficient and mathematically tractable. Relevant methods are introduced in detail in the tutorial of O'Hagan [51], but we include here an introduction to the main features. Bayesian statistics assumes that uncertainty about any unknown quantity can be represented as a probability distribution, and works by updating the distribution as more information becomes available. It is necessary as part of the Bayesian model to specify a prior distribution which represents any knowledge of the unknown quantity before observing the data. After obtaining new observations this is then modified using Bayes' theorem to yield the posterior distribution. The first stage in a BACCO approach is to create an emulator of the simulation code, from a set of pre-calculated training runs. The emulator essentially comprises a statistical approximation of the code's input/output relationship (the posterior mean from the Bayesian analysis) together with a representation of uncertainty about the true function (the posterior standard deviation). The standard deviation is a measure of the accuracy of the code approximation, and as discussed in Section 3.1 this can also be used in diagnostics to check the overall quality of the emulator. First we assume the output function f(x) is an unknown function of its multi-parameter input vector x. Uncertainty about the function f(·) is quantified using statistical analysis of the input and output data that are generated from running the code at selected inputs. We refer to these data as the training runs. The Bayesian analysis requires a prior distribution for the function. The most practical form of prior is a Gaussian Process (GP) with the following simple form. Suppose the code takes a p-dimensional input = , … , . The prior expected output at input is assumed to be an additive linear function: where , … , are unknown coefficients. The prior covariance function is: The , … , , , , … , are unknown parameters estimated using maximum likelihood as part of the model fitting (or emulator building). The theory is directly linked to the standard analysis of multivariate normal data, as the implied prior distribution of any set of outputs is multivariate normal with mean vector and covariance matrix derived directly from Equations (1) and (2). The choice of smooth covariance function (Equation (2)) represents a belief that the output is a smooth function of its inputs, in particular that it is infinitely differentiable. The correlation is a product of functions, where the ith factor tends to 1 as the distance in the th component approaches 0. The result is that as → ′, the covariance (Equation (2)) tends to . The parameter is a scaling parameter that determines how quickly the correlation between two model outputs decays in the th input component. These are referred to as roughness parameters and their estimated values can be useful to assess the nonlinearity of the output as individual input components are varied. The GP prior formulation also leads to a posterior distribution that is also Gaussian. This is important because it makes the subsequent integrations, necessary for UA and GSA, available in closed form [52]. The posterior mean of (·) is a smooth curve that passes exactly through the points observed in the training runs. The posterior variance of (·) is 0 at the training points and increases smoothly as the input deviates away from those points. Although the prior expectation (Equation (1)) has a particular linear form, the posterior mean increasingly adapts to the shape of the true output as more training points are added. Therefore in practice if the training sample is well designed, the approximation follows the true code accurately even when the code is non-linear. One aspect of this formulation that is important to recognise is that, apart from the linear term, it assumes stationarity of the output, i.e., the distributions of deviations from the linear term are similar in different parts of the input space. This is seen in the fact that correlations (Equation (2)) depend only on the distance between points rather than the values of , ′. Again, the posterior distribution tends to adapt to the output shape even if the true output is a non-stationary function, but the posterior standard deviations may be incorrectly estimated. These issues can be diagnosed using cross-validation as part of the analysis (see Section 3.1). More detailed prior beliefs about functions are unlikely to occur in practice, and the flexibility of the GP coupled with the information provided in the training runs tends to produce useful approximations to the true output, so these modelling choices are fine for practical applications. Once built, the emulator can be used to derive predictions, sensitivity measures, and uncertainty analyses. Each of these can be carried out from the same emulator without further code runs, which again highlights the efficiency of BACCO.
Prediction: the emulator mean is the posterior mean of the GP distribution, and this can be evaluated at any input value to give a fast version of the original code (typically 1-3 orders of magnitude reduction in code runs, relative to more conventional Monte-Carlo based methods such as those covered in [53]). BACCO prediction is therefore most beneficial for computationally expensive codes. Accuracy of the predictions can be reported too, as the posterior variance or standard deviation of ( ). The GP posterior has the intuitive property that at any of the observed training runs the prediction is exact, with variance = 0. The posterior mean adapts to the true shape of the output as more training points are added. Hence the recommendation that the input design for training runs is well-dispersed throughout the plausible input region.
Uncertainty Analysis: in a given use of the computer code the true values of some or all of the inputs are typically uncertain, which implies uncertainty in the corresponding output. Uncertainty analysis is the characterisation of the total uncertainty of an output, due to the input uncertainty. As ( ) is generally non-linear, the probability distribution of the output has a complex form that cannot be derived analytically. However the mean ( ( )) and variance ( ( )) are defined as integrals with respect to the distribution of the uncertain input parameters , so by using the GP formulation and for particular forms of input distributions, summaries of these distributions can be estimated directly via the emulator in closed form. A detailed and comprehensive overview of the BACCO method can be found in Oakley and O'Hagan [52].
Sensitivity Analysis: GSA also accounts for uncertainty in the inputs, so is similar to uncertainty analysis but is specifically concerned with the relative contributions to the total output uncertainty that are due to uncertainty in individual inputs or in pairs/groups of inputs. GSA also provides useful visualisations showing the impact that changing any single input or a group of inputs has on the output. The particular measures of sensitivity we use here are main effects, joint effects, and sensitivity indices. The main effect of the ith component/variable of the input ( ) shows how the output changes on average as is varied, after averaging over the impact of all other components. When plotted as a function of , this can be a useful visual aid to understand the model behaviour and can sometimes reveal unusual features. Similarly, the joint effect of a pair of input variables ( , ) is the equivalent measure showing how changes to two inputs influence the output when varied in combination. While main and joint effect plots are very useful, sensitivity indices provide a more direct quantitative measure of the influence of each input of group of inputs. As mentioned above, the combined input uncertainty induces a total output uncertainty, and it is possible to partition this total output variance into contributions from individual inputs. The first order sensitivity index of is defined as { ( ( )| }/ ( ( )) . The expression ( ( )| ) means the expected value of the output ( ) conditional on the th component having the value , and is the ith main effect evaluated at . This ratio can be interpreted as the expected amount to which uncertainty in ( ) would be reduced if we learned the true value of , as a proportion of the total variance. Second or higher order effects can be defined similarly. The total effect of a single input refers to the sum of all first and higher order indices in which it appears, and can therefore be used as a measure of the combined influence of an input [52,53]. It is important to note that the (code approximation) uncertainty represented in the emulator is carried forward in each of the derived uncertainty and sensitivity measures and can be used to assess the accuracy of each of the measures listed above. As with the uncertainty analysis, the GP property allows for integrations to be performed analytically directly from the emulator if the input distributions are Gaussian or uniform.
BACCO is generic, in that it analyses the computer code as a black box, so can be applied wherever there is a simulation with inputs and outputs. It has been applied successfully in many fields, as diverse as biomedical engineering [54] and cosmology [55]. Some examples relevant to this study include the GSA of the Sheffield Dynamic Global Vegetation Model (SDGVM) in Picard et al. [56], and Kennedy et al. [57,58]. The Gaussian emulation machine for SA (GEM-SA) software [59] is a freely available implementation of methods described in Kennedy et al. [56] and Oakley and O'Hagan [53]. GEM-SA includes a user-friendly interface and can generate space-filling input designs for the initial training inputs, and it provides the estimation of parameters , … , , , , … , . Validation information can also be generated to identify any problems with the emulator model fit (Section 3.1). The user must supply a file of outputs to complete the training runs. Alternative tools are available, such as the R packages BACCO [60,61], SAVE [62] and Dice [63]. Apart from differences in the underlying modelling choices and intended use of these models, such R packages provide more flexibility when embedding emulators within larger software projects.

GSA BACCO Implementation on SimSphere
An overview of the GSA methodology implemented on SimSphere is provided in Figure 1. The BACCO GEM-SA was implemented along the lines of previous similar GSA studies applied to the model [6,16,[42][43][44]. Herein the sensitivity of additional previously unexplored modelled outputs (Table 1) were studied, and more notably, the effect of implementing the model over different times of simulation (7:30/9:30/11:30/13:30/15:30/17:00) was also explored. Examining an increasing number of modelled outputs allows us to potentially increase our confidence in the behaviour of the model, and better establish its coherence. A comprehensive analysis of the model structure is required to ensure accuracy in its real-world applicability and for its future development for possible operational use. The same implementation setting and sampling dataset was used as in other GSA studies on the model [6,16,[42][43][44]. In summary, a design space of 400 SimSphere simulations was developed using the LP-tau sampling method. This sampling method is designed to prevent clustering and gaps of sampling points as much as possible, even for fairly small samples, by placing the points as homogeneously as possible within the space. Thus it is generally found to be a robust, computationally fast method for the efficient space-filling design of a large number of training input points [64,65]. For both the geographical location (latitude/longitude) and atmospheric profile, a-priori real observations for August 2002 were used from the Loobos CarboEurope site, located in The Netherlands (52°10'04.29" N, 05°44'38.25" E). All other SimSphere inputs were allowed to vary across their theoretical ranges ( Table 2). Each model output was exported on six occasions relating to the different simulation output times. In addition, the emulator performance was evaluated based on the "leave final 20% out" cross-validation method offered in GEM-SA, again in accordance with previous GEM-SA studies conducted on the model.

Emulator Accuracy
In an SA experiment, it is vital to build an accurate emulator. GEM-SA internally computes a set of statistical measures which evaluate the performance of the emulator. These include the unitless "roughness value" parameters , … , , which represents emulator performance, unrelated to the surface roughness of physical model input parameter. They also include the "cross-validation root mean square error", "cross-validation root mean squared relative error", "cross-validation root mean squared standardised error" and the "sigma-squared" value (which provides a measure of the non-linearity in the emulator). These cross-validation measures work by estimating a series of "left-out" points for which we actually know the true values generated as output from the code. Hence the resulting errors are readily available. Note that we refer to cross-validation in the usual statistical sense of using data that is used to fit the emulator to test the emulator. Furthermore, this is not a validation of the model as a representation of the real system, but rather a test of the emulator as an approximation to the original computer model. "Cross-validation root mean square error" is simply the square root of the mean square error of the emulator prediction whereas the second term is that value expressed as a percentage of the true value. The third term expresses the residual divided by an estimate of its standard deviation. The standard deviation is from the emulator posterior distribution, as explained in Section 2.2, calculated at each of the left-out training runs. Because of the standardisations, the target value is 1. These statistics are summarised in Table 3. In addition, roughness values of the model inputs had a low range, which indicates that the built emulator is a very good approximation of the actual model. However some exceptions occurred for the Aspect model input where the roughness values reached 7.937 and 6.169 respectively for different times of simulation. Most roughness values obtained were below 1.0, which suggest that the emulator responded smoothly to variations in model inputs. In regards to roughness values above 1.0, this indicated that there is some degree of non-linearity between model inputs and outputs. The following sections present a detailed view in the SA results on each of the model outputs.

Parameter Sensitivity for ( daily
Rn ) The input parameters percentage variance contribution of daily Rn ranged from 0% to 79.63% and from 0% to 96.18% for the main and total effects respectively (Tables 4-9, Figure 2g). The input parameters with the largest contribution to the main effects were those of Aspect (50.10%-79.63%), Fractional Vegetation Cover (FVC) (1.62%-7.16%) and Slope (0.45%-20.29%). These three parameters together accounted for approximately 74%-82% of the total variance of daily Rn . Results also showed pairwise interactions contributing to between 13.49%-18.48% of the decomposition of the total sensitivity index dependent on time of model simulation. It should be noted that a total pairwise or 1st-order interaction is "a pairwise interaction" between two parameters (i.e., in general, an n-th order interaction is an interaction between n-1 parameters). Notably, the most influential model inputs to the overall model decomposition were those of Aspect, Slope, Leaf Area Index (LAI) and FVC. The impact of both Aspect and Slope on the simulation of daily Rn in the model is reasonable, since these parameters directly affect the amount of the incident incoming surface radiation (Rg). In addition, FVC determines the amount of incoming solar radiation that is reflected from the surface. Within the model, the reflective efficiency of FVC is given by a partition factor dependent on LAI, thus explaining the significant influence of these vegetation parameters. In terms of changes during the day, the largest changes were seen in the contribution of Aspect (Figure 2g), showing a marked decrease in its contribution from the highest value at 07:30, to its lowest contribution at 11:30, but then showing a sustained increase until 17:30.

Parameter Sensitivity for ( down L )
Here main effects and total effects varied from 0.02% to 22.23% and 0.02% to 36.28% respectively (Tables 4-9, Figure 2i). Time of simulation seemed to have an effect on percentage variance decomposition, where in the 07:30 scenario the summation of main effects alone was only 32.23%, compared to ranges between 56.75% and 64.42% for all other time scenarios. Significant input parameters with the highest main effects were those of Aspect (5.37%-22.23%), FVC (4.71%-18.59%), Vegetation Height (0.16%-16.98%), Surface Moisture Availability (Mo) and Slope (1.56%-10.11%). All other input parameters evidently contributed negligibly, with less than 1% contribution in terms of their main effects. The contribution of Aspect once again increased from 07:30 to 11:30 before decreasing steadily again throughout the afternoon. The other important parameters mentioned above increased throughout the day, with the contribution of Vegetation Height in particular increasing with each time interval. However, the influence of the other inputs (FVC, Mo and Slope) increased in the morning but stayed relatively constant in the afternoon.
Interactions appeared to contribute significantly to the total variance of down L for all six times of to approximately 21% of total variance. A large number of first order interactions with values above 0.1% were observed and results suggest that the total interaction effects are owed to first order interactions. The most important interactions were between Slope and Aspect (3.02%-11.84%), Aspect and Mo (1.66%-2.71%) and between Aspect and LAI (0.29%-1.41%). Figure 2j shows that the contribution of Aspect decreased in the morning, stayed constant in the afternoon before increasing once more in the final time of simulation. The contribution of LAI increased throughout the day before decreasing in the final time of simulation and Mo increased throughout most of the day but decreased in the final two times of simulation. were Aspect (56.34%-81.25%) and Slope (0.68%-26.63%). All other input parameters contributed less than 1% each to the decomposition of daily Rg variance. In terms of total effects the most important input parameters remained the same as for the main effects (Slope and Aspect). First order interactions varied between 12.89% and 16.47% and higher order interactions between 0.01% and 1%. Interestingly the presence of first order interactions was low. However the most important interaction between Slope and Aspect dominated the first order interaction contribution, ranging from 11.92% to 15.62%. Figure 2h shows that the contributions of both Aspect and Slope decreased throughout the day before increasing in the final two times of simulation (15:30 and 17:00).  (5) and (6) in [6]). Given this, the influence of the input parameters on the output variance decomposition of each model output was exactly the same. Thus, ranges of main and total effects for daily NEF / daily EF ranged from 0.03% to 59.70% and total effects from 0.03% to 72.79% respectively (Tables 4-9, Figure 2b and  Ranges of main and total effects varied from 0.04% to 37.07% and 0.04% to 36.12% respectively (Tables 4-9, Figure 2a). Model inputs with high main effect values were Aspect (5.55%-37.07%), FVC (9.08%-21.86%) and Vegetation Height (6.54%-19.04%). Figure 2a shows that the contribution of Aspect gradually decreased throughout the day. The contribution of Vegetation Height increased throughout the day before decreasing in the final time of simulation. Changes in the contribution of FVC and Mo were much more variable with time. To a much lesser extent, though still appreciable, were the main effects of Mo and Surface Roughness. In all six scenarios, the summation of main effects appeared to be relatively low, indicating a high degree of interactions. The model inputs with the highest total indices were those of Slope, Aspect, FVC, Vegetation Height, Mo and Surface Roughness. Much less important total effects were noted for the Cuticle Resistance and Saturated Volumetric Water Content (THM). Pairwise interactions dominated the interaction structure, as they explained on average 25% of the total daily H variance. Particularly notable was the contribution of higher order interactions, which fluctuated from 11.82% to 20.96%. The model inputs with the highest sums of main and joint effects were those of Vegetation Height, Mo and Aspect. FVC and Surface Roughness had appreciable interaction effects, as indicated by their total effect index, which was markedly higher than their corresponding main index, implying strong interaction effects.

Parameter Sensitivity for ( daily LE )
The main and total effects for the inputs of  Figure 2c shows that the contribution of Aspect initially increased in the early morning before decreasing with time during the afternoon. The contribution of FVC stayed below 10% for the majority of the time of simulation but increased markedly at the 17:00 time of simulation. The contribution Mo increased in the morning before becoming more variable in the afternoon. Tair ) The main and total effects for daily Tair vary from 0.01% to 57.68% and from 0.01% to 78.53% respectively (Tables 4-9, Figure 2k) Specifically, first order interactions indices were 23.6% on average and higher order interactions were 4.16% on average. In addition the most important first order interactions were observed between Slope and Aspect (4.24%-16.05%), FVC and Vegetation Height (0.48%-2.64%) and between Aspect and FVC (0.39%-1.41%). Figure 2k shows that the contribution of Aspect decreased during the day and FVC increased for the majority of the day with the exception of the final two simulation times.

Parameter Sensitivity for ( daily
3.2.9. Parameter Sensitivity for ( daily Trad ) Ranges of the main and total effects for this model output varied from 0.01% to 70.94% and 0.01% to 90.24% respectively (Tables 4-9, Figure 2e). Model inputs with significantly high main effect values were those of Aspect (21.22%-70.94%), Slope (0.45%-20.98%), Mo (2.77%-20.53%), Vegetation Height (0.66%-6.26%) and FVC (0.16%-5.59%). Also appreciable but to a lesser extent were the contributions of Substrate Climatological Mean Temperature and LAI input parameters. In addition, the main effects of the model input parameters summed to 72.16%, suggesting the existence of high interactions. First order interactions were approximately 20.60% on average with the most important being observed between Slope and Aspect (3.59%-13.65%) and between Aspect and FVC (0.72%-1.25%). The contribution of Aspect decreased during the morning and increased during the afternoon and the converse was true for Slope. Mo increased during the majority of the day but decreased in the final two time slots.

)
The contribution of each input parameters main effect to the total variance decomposition of daily Mo varied widely from 0% to 98.23% (Tables 4-9 and Figure 2f) showing that the variance in daily Mo was dominated by a single parameter, Mo itself, which tended to decrease during the day. In terms of total effects, it appears that all the input parameters remained in approximately the same ranges when comparing main and total effects, but also across time periods. The small observed differences between the calculated main and total effects values for most of the model inputs indicates minimal interaction effects (Tables 4-9, Figure 2k). The variance decomposition of                Table 7. Summarised results from the implementation of the BACCO GEM-SA method on the different outputs simulated by SimSphere using uniform PDFs at 13:30. Computed main (Main) and total effect (Total) indices by the GEM tool (expressed as %) for each of the model parameters are shown, whereas the last three lines summarise the percentages of the explained total output variance of the main effects alone (ME), the 1st order interactions only (1st) and the 2nd or higher order interactions only (>1st). Input parameters with a variance decomposition of greater than 1% are highlighted.      Table 8. Summarised results from the implementation of the BACCO GEM-SA method on the different outputs simulated by SimSphere using uniform PDFs at 15:30. Computed main (Main) and total effect (Total) indices by the GEM tool (expressed as %) for each of the model parameters are shown, whereas the last three lines summarise the percentages of the explained total output variance of the main effects alone (ME), the 1st order interactions only (1st) and the 2nd or higher order interactions only (>1st). Input parameters with a variance decomposition of greater than 1% are highlighted.      Table 9. Summarised results from the implementation of the BACCO GEM-SA method on the different outputs simulated by SimSphere using uniform PDFs at 17:00. Computed main (Main) and total effect (Total) indices by the GEM tool (expressed as %) for each of the model parameters are shown, whereas the last three lines summarise the percentages of the explained total output variance of the main effects alone (ME), the 1st order interactions only (1st) and the 2nd or higher order interactions only (>1st). Input parameters with a variance decomposition of greater than 1% are highlighted.

Identification of Key Input Parameters
The aim of this study was to perform a SA on SimSphere at six different times during the day. In common with previous work undertaken on SimSphere [6,16,[42][43][44] it has been shown that the model outputs are sensitive to a small number of input parameters. Once again, outputs were shown to be significantly influenced by changes in topographic input parameters which were derived using Geographic Information System (GIS) capabilities (in particular Aspect, and to a somewhat lesser extent, Slope). However, other input parameters were also found to be influential, in particular FVC, Vegetation Height and Mo. The influence of the topographic parameters in particular confirm the results of Petropoulos et al., [6,16,[42][43][44], and the fact that these parameters have consistently been found to be significant regardless of which output parameters are being analysed, which spatial location is used (Italy or the Netherlands), or which PDF is used, clearly underlines the extent of this influence.
Daily Average Net Radiation, Daily Average Shortwave Incoming Radiation and Daily Average Longwave Downwelling Radiation model outputs are perhaps the simplest examples of where Slope and Aspect input parameters combine to exert an influence on the output variables. This is of course to be expected since slopes facing in different directions and having different slope angles will receive different amounts of solar radiation [66]. This primary influence will then influence net radiation at the surface. However, FVC has been shown to exert more influence on net radiation than slope, and once again, this confirms previous work on SimSphere [44]. Indeed, the complex interplay between slope, aspect and vegetation (including albedo and soil characteristics associated with particular vegetation) control how much radiation reaches the surface and how much is reflected back into the atmosphere, explaining the significant influence of these parameters [67]. Aspect and Slope have also proven to be the most important parameters influencing Daily Average Tair at 50m and Daily Average Radiometric Temperature model outputs, once again due to the fact that these parameters control the distribution of incoming solar radiation and net radiation, and the resultant latent and sensible heat distribution which in turn warm the air.
It was also evident that the time of model simulation played an important role in the dynamic evolution of the amount of influence the topographic parameters had on the model outputs related to solar radiation (Daily Average Net Radiation, Daily Average Shortwave Incoming Radiation, Daily Average Longwave Downwelling Radiation and Daily Average Longwave Upwelling Radiation). As discussed, the intensity of solar radiation reached at the surface is heavily influenced by uneven terrain and variations in slope and aspect [68]. This influence can be explained by the effect of the Sun's zenith angle, the angle of slope, the direction of the Sun, and the direction of the sloping surface. Essentially, the sky forms an inverted bowl, or half sphere, above and around a point in the landscape. On a horizontal surface, diffuse radiation emanates from all portions of the sky. As the surface is tilted at an angle, less of the sky hemisphere is viewed from a point on the surface. A portion of the sky is essentially blocked by the terrain, from which no sky diffuse radiation is received. With a vertical wall, the sky hemisphere is cut in half and each side of the wall receives diffuse radiation from only one-half of the sky. A sloped surface, therefore, sees less of the sky as the angle increases or dependent on the uniformity of slope aspect. From early morning, the angle of the Sun is parallel to the surface, and thus the subsequent incoming solar radiation is more likely to be blocked by the terrain [68]. This effect is likely to decrease as time of day approaches noon and the Sun's zenith angle increases. At the time of day where the Sun is at nadir to the surface at an altitude angle of 90° (~12:00 p.m.), this "blocking" effect is at its least amount and less likely to influence the amount of incoming solar radiation. After this point, the Sun will then return on a similar arc, lowering its zenith angle and experiencing an increase in the "blocking" of incoming solar radiation until sunset. However, there is a complex relationship among slope, aspect, latitude, time of year, time of day, and solar radiation that precludes simple statements that north slopes receive less radiation than south slopes and gentle slopes receive more radiation than steep slopes.
The influence of vegetation on the transfer of energy and heat that has been noted in this paper is also significant, perhaps more so than in other SA on SimSphere [6,16,[42][43][44]. The factors showing the greatest influence on the Daily Average Latent Heat Flux model outputs included Mo, FVC and Vegetation Height. This influence of vegetation on latent heat transfer is to be expected given the role of evapotranspiration in partitioning incoming solar radiation into latent and sensible heat fluxes. In terms of the Vegetation Height input parameter, the height of vegetation can have an influence on the rate of evapotranspiration through the greater or lesser extent of shading of the ground surface from the incoming solar radiation and hence the energy available. Similarly, the role of surface moisture availability is also a key influence on latent heat transfer because a ready supply of surface moisture will provide water for evapotranspiration and hence influence the proportion of incoming solar radiation transfer by latent heat [69]. Given the above, it is unsurprising that the corresponding Daily Average Sensible Heat Flux output also exhibits sensitivity to the Aspect, FVC, Vegetation Height and Mo input parameters. Vegetation parameters also influence radiometric temperature and air temperature at 50 m because of shading and roughness properties and control on the proportion of energy transferred as latent and sensible heat.
The fact that Mo is the most important input parameter influencing the Daily Average Surface Soil Moisture Availability output, and the lack of interaction effects implies that the simulation of soil moisture content is controlled by the value of surface moisture availability alone. This can perhaps be explained by the definition of the above parameter in the SimSphere model structure, expressed only in terms of water content of the soil surface, and is also in accordance with other SA undertaken on the model. The results for this parameter show most clearly that the significance of the influence of an input parameter can change during the day. For example, during one scenario Aspect and Slope become the most important input parameters. For other model outputs, the importance of the main effects changes throughout the day. For example, Aspect in the case of the Daily Average Sensible Heat Flux becomes less important as the day progresses. This is to be expected since the amount and angle of incidence of incoming solar radiation will vary with time, as will the influence of shading by vegetation, which will in turn influence evapotranspiration and latent and sensible heat fluxes.
It is evident from the results of this SA that the time of model simulation can significantly affect the influence of specific parameters on model outputs. This effect needs to be considered when setting the initial conditions of the model, and should be a critical step of model parameterisation. This issue relates to questions related to model identifiability. In SA, these are concerned with changing objective functions, parameter fixing, changing model structures or changing data periods amongst others. As was the case in our study, there can be a change in the number of insensitive model parameters if different simulation times are used, although some model parameters are insensitive for all data periods.
Consideration of such issues must be taken when interpreting any results to avoid drawing generalisations or trying to apply generalisations to new situations.

SimSphere as a Tool for Environmental Management
Although numerous studies have identified the likely impacts of potential future climatic change on the hydrological system and land surface processes [70][71][72][73], great uncertainties still exist in terms of responses at a variety of scales. As a result, the need for robust and reliable models which accurately represent physical processes and enable environmental managers to effectively plan for a wide range of scenarios is becoming ever more important. Representing as it does the culmination of a series of sensitivity analyses of SimSphere analysing the behaviour of the model under different model conditions [6,16,[42][43][44], this study once again confirms that the model is potentially a significant tool for environmental management if combined with EO data. The results of this study are not directly related to the development and future exploitation of such operational products but have significant implications for the development of the model as a useful and practical educational tool. Specifically, the analysis of the effect of time of simulation on the influence of parameters on model outputs is a significant consideration when acquiring EO data for synergistic use with the model. Results suggest that time of satellite acquisition becomes an important consideration when selecting EO data. In common with Petropoulos et al., [6,16], this study has shown that many of the modelled output are sensitive to changes in FVC and surface soil moisture to a greater or lesser extent. Despite the clear need for further work in the processes by which these two parameters are derived from remotely sensed data it is generally felt that good estimates of the extent of these parameters can be obtained if remotely sensed imagery is available. These accurate estimates could be used to constrain model inputs in particular areas in order to minimise variations in model outputs.
Once again, in common with previous work on SimSphere [44], this paper has shown that model outputs are particularly sensitive to topographic inputs, in particular slope and aspect, even when the model is run at different times during the day. When considering the use of the "triangle method" for estimating energy fluxes, this is particularly significant since it suggests that the application of that method in areas of high relief may be problematic and that topographic variables may need to be considered in those cases. However, data on these properties are increasingly becoming available at no cost and at a variety of different spatial scales such that the creation of digital elevation models is relatively straightforward. Further work could perhaps focus on identifying a threshold slope, or range of aspect values above which the output variables become sensitive, thus restricting application of the model to a particular range of landscapes for which the model is robust. These results underline the potential complementarity of integrating GIS, EO and modelling approaches to involve spatial data handling and analysis support for land surface paramaterisation [74,75]. However, the development of several advanced GIS tools are needed to further progress the coupling of GIS functions within a modelling framework and exploit the full functionality of such data assimilation methods. The capability of GIS functions related to efficient data conversion algorithms, exploratory spatial data analysis tools, spatial interpolation, modelling languages, and temporal and three-dimensional data analysis capabilities are some highlighted areas where GIS has the potential to advance modelling applications related to land surface processes. Such advancements would allow for a host of spatial analysis methods to be incorporated within an environmental modelling framework, ranging from the integration of the interpolation of geostatistics which would allow researchers to include discontinuities or weight observations of quality, to a convenient source for the display and visualisation of results [74,75]. An understanding of the underlying GIS tools for environmental modelling is a potential avenue for future works.
The results presented here are further evidence of the promise of SimSphere as a model which can accurately represent those key processes that occur at the hugely significant interface between the lithosphere, hydrosphere and atmosphere, an interface which is likely to see significant changes over coming decades. Given that the SA performed here analyses the impact of changing the time of day at which the model is run, and that previous work has analysed the impact of spatial changes [44], changes in probability distribution functions [6,16] and of a wide range of output variables [42], the paper represents the final component of a comprehensive programme of testing this model. Further work is now needed to validate the model outputs against in-situ, instrumented data from different locations and over different time periods. This is an essential step in order to confirm the potential role of SimSphere as an applied tool for environmental managers, soil conservation practices, sustainable water resource management and agriculture that this study has suggested.

Conclusions
This study provides a further demonstration of the capability of the GSA meta-modelling method adopting Bayesian theory to decompose the total uncertainty of the SimSphere model and confirm results from previous sensitivity analyses on the model showing that the model's yield output sensitivity to important parameters depend strongly on topographic conditions. The fact that these parameters have consistently been found to be significant regardless of which modelled output are being analysed, which spatial location or probability distribution function is used, clearly underlines the extent of this influence. FVC, Vegetation Height and Mo were also significant influences on output variance decomposition. Several model components have been previously studied (e.g., spatial changes, changes in probability functions, a wide range of output variables), however this study, has, for the first time analysed the impact of changing the time of day at which the model is run. Although the relative importance of the contribution of the input parameters does not generally change with times of model output simulation its influence can clearly be seen in the varying percentage contribution of main and total effects results of each model output, with significant variations in the contributions of changing input parameters observed during the day.
Our results represent the final component of a comprehensive programme of testing this model, whilst also providing further evidence supporting the model coherence and correspondence to the behaviour of a natural system. The latter is an important element as regards the model use in future either as a standalone tool or synergistically with EO data. Future work should be aimed at validating SimSphere outputs against in-situ or reference data, to confirm the operational development of this model as an applied tool in a wide range of disciplines, and detect possible causes of uncertainty which might hamper future development.