4.3. Input Data Availability
Data availability in a study catchment is often the principal model selection criterion, and it is a prominent concern for model users at an early stage, before any subsequent modeling operations are carried out. Characterization of the study area is costly and time consuming, and it is often performed independently of the modeling activities, which are generally foreseen at a later stage. Available data may derive from heterogeneous sources, such as local gauging stations, existing databases, governmental environmental or water authorities, or the scientific literature. For nutrient modeling work, input data limitations are generally specific to each case study. Usually, the missing data concern the deep soil, crop and management information, the groundwater system, and nutrients.
Due to their geological features, the deep soil and the groundwater system are not easy to access, and their nutrient load is often difficult to quantify properly. Although nutrient data are recorded in some catchments, they are limited in terms of both nutrient forms and nutrient sources. Without instruments or measurement records, which require substantial investment in a monitoring network of sensors, time series with high spatio-temporal resolution are seldom available for these data. In ungauged catchments, data scarcity is even more severe. However, model functionalities can operate optimally only when all required field-specific input data are provided. When only limited input data are available or some essential data are missing, extra measures should be taken to obtain surrogate data. This can be achieved by collecting additional measurements, which increases the time required to obtain model outputs and the costs of the investigation. Thus, data availability is a dominant criterion and an important precondition for model selection. As shown in Table 2, Table 3 and Table 4, when selecting a model, the required input data should be checked carefully to make sure that they are available. With a clear comparison between the input requirements and the available data, the user can make a first judgment on which model is the easiest to set up and which data should be measured if a more complex and data-demanding model has to be selected.
To provide a general approach towards the issue of model selection, in the present study, each input parameter has the same importance in the protocol. Depending on a specific case study and on the experience of the user, different weights could be assigned to different parameters.
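The comparison between required and available inputs, with equal or user-defined weights, can be illustrated with a minimal Python sketch. The function name, the input names and the weight value below are hypothetical and serve only as an illustration; they are not taken from Tables 2 to 4.

```python
def weighted_coverage(required, available, weights=None):
    """Return (coverage score in [0, 1], sorted list of missing inputs).

    Inputs without an explicit weight count as 1.0, which reproduces
    the equal-importance default of the protocol.
    """
    weights = weights or {}
    available = set(available)
    total = sum(weights.get(name, 1.0) for name in required)
    if total == 0:
        return 1.0, []  # nothing is required, so nothing is missing
    covered = sum(weights.get(name, 1.0) for name in required if name in available)
    missing = sorted(name for name in required if name not in available)
    return covered / total, missing


# Hypothetical input names, for illustration only.
required = ["precipitation", "temperature", "land_use", "soil_nitrate"]
available = {"precipitation", "temperature", "land_use"}

# Equal weights: 3 of 4 required inputs are available.
equal_score, missing = weighted_coverage(required, available)

# Assumed user-defined weight: soil nitrate counts three times as much.
weighted_score, _ = weighted_coverage(
    required, available, weights={"soil_nitrate": 3.0}
)
```

With equal weights the score is 0.75; giving the missing soil nitrate input a weight of 3 lowers the score to 0.5, which shows how a case-specific weighting could change the ranking of candidate models.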
4.4. Model Complexity
The complexity of a model is frequently associated with the model functionality. Model functionality is reflected in the quality and quantity of the model outputs and in which and how many processes are implemented in the model. Complex models often appear to be an optimal choice, due to the detailed process description they entail. However, as a trade-off, they are extremely data-demanding if reliable model results and predictions are to be achieved. Complex models can simulate diverse processes with various outputs, but each computation requires a given amount of input data. In general, the more functionalities a model implements, the more input data are needed. In other words, full-featured and powerful functionalities can be realized only on the basis of abundant input data. Under conditions of limited data availability, complex models are difficult or impossible to operate as expected and tend to generate higher uncertainty. In such cases, a simpler model could be a better choice. A simple model may implement the same functionalities as a complex model, although it may neglect some processes, which may be of secondary importance in a specific case study. Fulfilling all the input requirements is a prerequisite for increasing model complexity.
In model selection, we suggest considering model complexity as a decision criterion, which depends on the users’ objectives, on data availability and on a cost-benefit analysis [37]. Irrelevant functionalities should never be considered as possible reasons for choosing a model. Moreover, model complexity is an inherent model attribute that cannot be changed by the user; the user can, however, choose a model with an adequate complexity level.
4.6. Model Selection Protocol: Some Applications
In each project, scientists have fixed and explicit research objects, e.g., specific nutrient forms, particular transport processes or certain reactions. For them, benefit is evaluated according to the model’s ability to support the explanation of particular research questions. The model selection protocol suggests first grouping the models that allow the research objects to be addressed and then estimating the cost of acquiring the missing data for each model, considering the limits of research funding. Let us exemplify this situation with the five models presented above. A researcher aims to study nitrogen transport from different sources to a reach, and then to perform further eco-hydrological analyses. Among the five models, only GWLF and SWAT have the corresponding functions. GWLF requires 15 basic input data to set up the hydrological model and eight N inputs for nutrient transport simulations. SWAT requires 39 basic input data and 13 N input data. The corresponding N outputs of the two models are similar. SWAT is clearly more data demanding than GWLF. However, from the point of view of a researcher, the processes described by GWLF could be too simplified. GWLF simply predicts the amount of nitrogen transported from each source but does not describe other processes; for example, it ignores the nitrogen flux interactions inside the soil profile between surface water and groundwater. Owing to its more complete representation of the hydrological cycle, SWAT can describe a larger number of processes in more detail, which benefits the investigation of the mechanisms of N transport from different sources. Thus, despite its higher cost due to the additional data required, SWAT provides benefits for a researcher that can justify the investment in collecting the missing information. If affordable within the available funding, SWAT is chosen by the researcher.
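The researcher's decision described above can be sketched in Python as a two-step rule: keep the candidate models whose missing data can be acquired within the budget, then prefer the one with the most detailed process description. The input counts come from the text; the `detail_rank` values, the acquisition costs and the budgets are hypothetical placeholders, not values from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    basic_inputs: int         # basic input data, as stated in the text
    n_inputs: int             # N input data, as stated in the text
    detail_rank: int          # hypothetical ordinal: higher = more detailed
    missing_data_cost: float  # hypothetical cost of acquiring missing data


def select_for_research(candidates, budget):
    """Pick the most detailed candidate that is affordable, or None."""
    affordable = [c for c in candidates if c.missing_data_cost <= budget]
    if not affordable:
        return None
    # Prefer more process detail; break ties with the lower cost.
    return max(affordable, key=lambda c: (c.detail_rank, -c.missing_data_cost))


# Only GWLF and SWAT offer the required N transport functions.
candidates = [
    Candidate("GWLF", basic_inputs=15, n_inputs=8, detail_rank=1, missing_data_cost=10.0),
    Candidate("SWAT", basic_inputs=39, n_inputs=13, detail_rank=2, missing_data_cost=40.0),
]

with_funding = select_for_research(candidates, budget=50.0)
tight_funding = select_for_research(candidates, budget=20.0)
```

Under the assumed costs, a budget of 50 selects SWAT, whereas a budget of 20 falls back to GWLF, mirroring the condition "if affordable within the available funding".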
For stakeholders, the first priority is usually profit. Among multiple models, the one that costs the least to supply with all the required input data is ordinarily chosen for a local application. A more data-demanding model is justified only if its outputs can provide a significant increase in the stakeholder’s profits. For example, let us assume that the goal of a stakeholder is to control nitrate leaching through soil nitrate remediation. A modeling approach is required to predict the local nitrate leaching by percolation, which can provide insights into the remediation strategy to be adopted. SWAT, SWIM, AnnAGNPS, and HSPF are potential models. For this functionality, SWAT, SWIM, AnnAGNPS, and HSPF require 39, 34, 27 and 24 basic inputs, respectively, to represent the hydrological processes, and they need the same nitrogen input: soil nitrate. Intuitively, HSPF is the best choice because it has the fewest input requirements. However, taking into account the final goal of remediation, HSPF can be replaced by SWIM or SWAT. Indeed, HSPF represents the soil with three layers, while SWIM and SWAT can divide the soil column into ten layers. A more detailed investigation of the nitrate leaching state of each layer can help to locate the crucial layers for nitrate remediation. The workload and investment of the remediation work can be greatly reduced by focusing on a specific soil layer. Compared with HSPF, SWIM and SWAT cost more to fulfill the input requirements but, in the long run, they may help to reduce the remediation costs. In this way, SWIM and SWAT could be more suitable choices. Since SWIM requires fewer input data than SWAT, it is preferred in the final selection.
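The stakeholder's trade-off between data costs and remediation costs can be illustrated with a toy calculation in Python. The input counts and soil layer numbers are those given in the text; AnnAGNPS is omitted because its soil layering is not stated here. The cost model itself is an assumption made only for illustration: every input is taken to cost the same, and remediation is assumed to target a single layer, so that its cost scales with the inverse of the number of layers.

```python
# Basic inputs and soil layers as stated in the text.
MODELS = {
    "HSPF": {"basic_inputs": 24, "soil_layers": 3},
    "SWIM": {"basic_inputs": 34, "soil_layers": 10},
    "SWAT": {"basic_inputs": 39, "soil_layers": 10},
}


def total_cost(model, cost_per_input, full_profile_remediation_cost):
    """Data cost plus remediation cost under the toy assumptions above."""
    data_cost = model["basic_inputs"] * cost_per_input
    # Assumption: only one layer is remediated, so a finer layering
    # means a smaller share of the soil profile has to be treated.
    remediation_cost = full_profile_remediation_cost / model["soil_layers"]
    return data_cost + remediation_cost


def cheapest(models, cost_per_input, full_profile_remediation_cost):
    """Name of the model with the lowest total cost."""
    return min(
        models,
        key=lambda name: total_cost(
            models[name], cost_per_input, full_profile_remediation_cost
        ),
    )


# Hypothetical cost figures in arbitrary units.
data_only_choice = cheapest(MODELS, cost_per_input=1.0, full_profile_remediation_cost=0.0)
long_run_choice = cheapest(MODELS, cost_per_input=1.0, full_profile_remediation_cost=300.0)
```

When remediation costs are ignored, HSPF is the cheapest option (24 units). With the assumed remediation cost of 300 units, the totals become 124 for HSPF, 64 for SWIM and 69 for SWAT, so SWIM is preferred, in line with the reasoning above.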
Water authorities frequently work on large projects, which involve numerous modelers from different departments of multiple districts and require close cooperation and the guidance of policy makers. Aiming at overall societal, environmental and economic planning, policy makers take numerous aspects into account in their requirements. With regard to benefit, they evaluate the benefit of the whole project from a long-term perspective. For instance, a policy maker wants to carry out a comprehensive investigation of nitrogen pollution in a large-scale rural catchment, in order to plan future economic activities in the region of interest. In this catchment, there is no reservoir, wetland, pond, or pothole. A modeling approach is applied to simulate nitrogen dynamics. As it is a comprehensive study, both nitrogen transport and transformation are considered. The catchment is divided into several sub-catchments. Modelers from multiple departments participate in this work and each is responsible for the modeling work of one sub-catchment. A model should be selected to fit this practical work. According to the nitrogen outputs (Table 5), GWLF is excluded since it does not model N transformation processes. SWAT, SWIM, AnnAGNPS, and HSPF are capable of predicting both N transport and N transformation. SWIM and AnnAGNPS are not selected because, compared with the other two models, both simulate fewer reactions (four reactions) for N transformation. Moreover, they are not able to predict N transport in groundwater flow, which may affect the accuracy of prediction, particularly if the groundwater is abundant. Therefore, SWAT and HSPF are the two options considered. Both have competitive features to predict N transport and N transformation. Concerning N transport modeling, SWAT and HSPF can simulate N transport with the same four hydrological processes. Additionally, SWAT is able to model N transport from the first soil layer to the surface by evaporation, while HSPF does not offer this feature. With respect to N transformation, SWAT and HSPF can both simulate seven reactions, including nitrification, denitrification, ammonia volatilization, mineralization/immobilization, decomposition, plant uptake and in-stream reactions. As distinctive functions, HSPF can simulate ionization and ammonium adsorption/desorption, while SWAT can model N fixation. Considering N outputs, SWAT and HSPF are comparable. There is no significant difference between the two models in terms of nutrient inputs. Considering the basic inputs required by the relevant functionalities, SWAT needs 44 input data while HSPF needs 25. This means that SWAT is more data demanding than HSPF. Based on the cost, it seems better to choose HSPF. From the perspective of the complexity of the hydrological processes represented in the two models, HSPF represents most nutrient dynamics in a relatively simple way due to its simple hydrological basis.
SWAT simulates much more complex hydrological systems than HSPF. Given the heterogeneous community of modelers involved, with potentially different backgrounds and experience, this could be a problem. A more complex representation of the processes can therefore, in this case, delay the achievement of the model results. In view of the whole project, HSPF is a better choice than SWAT for three reasons:
HSPF is complex enough to solve and further explain the problems of this practical case;
HSPF’s simpler representation of hydrological processes is easier for the modelers to handle;
A model that is easier to understand will lead to faster achievement of the project objective.
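The stepwise screening used in the policy maker example can be sketched in Python as a sequence of capability filters followed by a cost-based choice. The capabilities and input counts are those stated in the text; the field names are hypothetical, and the final rule (fewest basic inputs) is only a simplified stand-in for the combined cost and ease-of-use argument given above.

```python
# Capabilities as stated in the text; fields not mentioned for a model
# are left out rather than guessed.
MODELS = {
    "GWLF": {"n_transformation": False},
    "SWIM": {"n_transformation": True, "groundwater_n_transport": False},
    "AnnAGNPS": {"n_transformation": True, "groundwater_n_transport": False},
    "SWAT": {"n_transformation": True, "groundwater_n_transport": True, "basic_inputs": 44},
    "HSPF": {"n_transformation": True, "groundwater_n_transport": True, "basic_inputs": 25},
}


def screen(models, criteria):
    """Apply each required capability in turn and return the survivors."""
    survivors = dict(models)
    for criterion in criteria:
        survivors = {
            name: props
            for name, props in survivors.items()
            if props.get(criterion, False)
        }
    return survivors


# Step 1: N transformation is required. Step 2: N transport in groundwater.
shortlist = screen(MODELS, ["n_transformation", "groundwater_n_transport"])

# Step 3 (simplified): among the remaining models, prefer the fewest inputs.
choice = min(shortlist, key=lambda name: shortlist[name]["basic_inputs"])
```

The screening leaves SWAT and HSPF on the shortlist, and the final step selects HSPF, which agrees with the outcome of the discussion above.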