Spatio-Temporal Hydrological Model Structure and Parametrization Analysis

Setting up a coherent spatial structure and calibrating the resulting complex, highly parameterized system is a major difficulty in grid-based distributed hydrological modeling. Essential aspects of model-building, such as the spatial resolution, the limitations of the routing equation, and the calibration of spatial parameters, all influence modeling results, yet these decisions are often made without adequate analysis. In this research, we experimentally analyze the grid discretization level, the integration of processes, and the routing concepts. The HBV-96 model is set up for each cell, and the cells are then integrated into an interlinked modeling system (Hapi). The Jiboa River Basin in El Salvador is used as a case study. The first concept tested is the temporal response of the model structure, which is closely linked to the runoff dynamics. By changing the description of the runoff generation model, we explore the responses to events. Two routing models are considered: Muskingum, which routes the runoff from each cell along the river network, and Maxbas, which routes the runoff directly to the outlet. The second concept is the spatial representation, where the model is built and tested for different spatial resolutions (500 m, 1 km, 2 km, and 4 km). The results show that the sensitivity to spatial resolution is closely linked to the routing method; the routing influenced model performance more than the spatial discretization did, and allowing a coarser discretization makes the model simpler and computationally faster. A slight performance improvement is gained by using different parameter values for each cell. The 2 km cell size corresponds to the lowest model error. The proposed hydrological modeling codes have been published as open source.


Introduction
Among different types of hydrological models, an increasing interest has been shown in spatially distributed models for their ability to derive spatially distributed outputs and, in particular, to support the analysis of land use impact on landscape hydrology and water quality management [1][2][3][4]. For these applications, water balance components and fluxes throughout the landscape have to be identified, and thus the representation of land-surface heterogeneity should be considered [5]. Extensive studies have been carried out to develop physical models, but their extensive data requirements limit their use considerably [1]. On the other hand, distributed conceptual hydrological models are mainly based on mass conservation, are usually calibrated with simple rainfall-runoff information, and still preserve a certain physical basis in their structure [2,6-8].
Nowadays, there are several physically based distributed hydrological models used for practical purposes, such as the Institute of Hydrology Distributed Model (IHDM) [2], the Système Hydrologique Européen (SHE) [3], and many others. Many researchers have changed the structure of a conceptual model and coupled it with a routing model to adapt it into a conceptual distributed model. Ref. [22] used the "Flex-Topo" concept [23] to build a distributed model of spatial elements ("fields") resulting from the intersection of a subcatchment layer with an HRU layer; each HRU has the same parameters and HBV model structure, calculations are performed and updated for each field (with different state variables), and the flow from each subcatchment is routed to the outlet using the lag function of [13]. Mayr et al. [24] used HBV-ETH [21,24,25] to build a distributed model that improves the representation of snowmelt in highly glacierized basins; furthermore, they adjusted the model structure to calculate the accumulation in each cell using the cell curvature. The runoff from each cell is then summed without routing to obtain the total catchment flow. Boa [4] combined saturation-excess and infiltration-excess runoff to form the runoff generation model and coupled it with a 2D kinematic wave model for overland flow, to solve the problem of catchment sinks storing water, and a 1D kinematic wave model for channel routing.
In this research, we aim at exploring a distributed modeling system to improve the understanding of the difference in performance between different grid discretization levels. This is done for a tropical river basin that is periodically exposed to extreme events. We also aim to provide a clear identification of the changes caused by discretization, and to understand whether the change in performance with discretization is due to the lack of a well-defined routing scheme. We explore two routing methods and benchmark the errors obtained in a validation process. The Jiboa River Basin in El Salvador is used as a case study.
The paper is organized as follows: the following section describes the study catchment; Section 3 presents the methodology and experimental setup. Section 4 includes the results and discussion. The last section presents conclusions.

Case Study
The study catchment is on the Jiboa River in the central part of El Salvador. The catchment (Figure 1) spreads over 432 km², with the discharge gauge Puente Viejo located at its outlet at (13.52° N, 88.98° W). The catchment consists of four topographical areas: the San Vicente Volcano, with an elevation of 2171 m a.s.l. (above sea level) and very steep sides; Ilopango Lake, covering an area of 70 km²; the Balsamo mountain range, with an elevation of 1000 m a.s.l.; and the coastal plain, with elevations ranging from 0 to 100 m a.s.l.

Hydrological Model Setup
This section describes the general methodology followed in this research for developing and testing the distributed models. First, we describe the approach for discretizing the landscape and connecting multiple model elements. Second, we describe the conceptualization of the hydrological processes in the model structure and the iterative process used to improve the model representation of the fast runoff response. Finally, we define the model parameters and the parameterization approaches.

Hydrological Grid-Based Model "Hapi"
This work is based on an extended version of the HBV-96 model structure that provides a grid-based conceptual integration of river basin processes. The structure is based on the mass conservation principle, keeping the hydrological storages as the main state variables (Equation (1)):

P − E − Q = d/dt [SP + SM + UZ + LZ + lakes] (1)

where P is the precipitation, E is the actual evapotranspiration, Q is the discharge, and SP, SM, UZ, LZ, and lakes are the snowpack, soil moisture, upper zone, lower zone, and lake state variables, respectively, all in mm/time step. Equation (1) in its discrete form is represented in the Hapi model as follows (Equation (2)): the storage of each cell i is updated at time step t + 1 from its local fluxes,

S(i, t+1) = S(i, t) + P(i, t+1) − E(i, t+1) − Q(i, t+1) (2)

where i, t + 1 refer to cell i at time step t + 1, and the outlet discharge is obtained by applying a routing method f (Maxbas or Muskingum) to the cell runoff.
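As a minimal illustration of this bookkeeping, assuming a single aggregated storage per cell and with plain summation standing in for the routing operator f (a sketch, not the Hapi implementation; function names are ours):

```python
def update_cell(storage_mm, p_mm, e_mm, q_mm):
    # discrete per-cell mass balance: S[i, t+1] = S[i, t] + P - E - Q
    # (all quantities in mm per time step)
    return storage_mm + p_mm - e_mm - q_mm


def outlet_discharge(cell_runoff, route=sum):
    # outlet discharge is the routed aggregate f(...) of per-cell runoff;
    # plain summation stands in here for Maxbas or Muskingum routing
    return route(cell_runoff)
```

In the actual framework, the per-cell storage splits into the SP, SM, UZ, and LZ tanks, and `route` is one of the two routing schemes described below.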

Spatial Discretization and Spatial Connectivity
Two types of models with different spatial discretizations were considered in this study. In the first type, a semi-distributed model structure is used: the lake is represented at the outlet of one subcatchment, and the rest of the catchment is represented as another subcatchment. Each subcatchment is assigned mean areal forcing data and its own parameters; state variables evolve independently and do not interact with the state variables of the other subcatchment. The runoff calculated by each subcatchment is routed directly to the outlet using the Maxbas triangular function [13] (Figure 2). (This semi-distributed model of type 1 is used as a benchmark model against which the two distributed model variants of the second type are compared.)
The second type of model uses spatial discretization on a regular grid (raster-based), where the Hapi hydrologic framework [26] is used to build two variants of conceptual distributed models. Hapi uses multiple geo-registered layers of raster data sets to describe the drainage basin in order to perform the hydrologic computations and pixel-by-pixel routing in the correct order, following the drainage network to the outlet. Pixels are assumed to be laterally disconnected from each other, so state variables are uncoupled and change independently. Figure 2 presents the two considered variants (frameworks) of raster-based models of the second type. The first one, FW-1 (framework 1) (Figure 2A), routes the runoff from each cell directly to the outlet using the Maxbas function; each cell has a Maxbas value that depends on its flow path length, and the discharge at the outlet is obtained by summing up the routed runoff from all cells. In the second raster-based model, FW-2 (Figure 2B), runoff is routed from one cell to another following the flow direction input (see Section 3.1.4) using the Muskingum method (see Section 3.1.3). For both FW-1 and FW-2, only superficial discharge (surface runoff and interflow generated from the upper zone UZ reservoir in the HBV model structure) is routed, while flow generated from the lower zone LZ is assumed to be lumped for the whole catchment (see Section 3.1.3).

Model Structure
Hydrological processes within the adopted spatial discretization elements (subcatchments or cells) are represented using the same lumped HBV model structure [13]. HBV consists of three tanks (soil moisture, upper response, and lower response); the snow subroutine is not considered in this study. For each spatial discretization element (cell), the infiltration, soil moisture, percolation, and runoff generation processes are calculated. The runoff process is represented using a nonlinear function of actual soil moisture and precipitation in the upper reservoir; the nonlinear function simulates the surface runoff and interflow. The baseflow is simulated using a linear function of the lower reservoir tank. Both reservoirs interact in series through constant capillary rise and percolation. The primary parameters of HBV are shown in Table 1, with the range of values used in the calibration for each parameter. More details on the HBV model can be found in [13].
Several trials are made to adapt the model structure following an iterative calibration process to understand better which hydrologic process needs to be improved in the model structure. In the iterative process, we either modify the conceptualization of the processes or change the parameter range of values used in the calibration process using OAT sensitivity analysis (see Section 3.2).
Following the iterative process, it is noticed that the model significantly underestimates the whole flood season, which indicates a poor representation of the fast runoff response. The interaction between the soil moisture and upper zone tanks has to be regulated differently in order to increase the sensitivity of the model structure to the catchment's fast response. Both tanks are connected through recharge (Equation (3) and Figure 3) and capillary rise; by checking the sensitivity of the runoff to the parameters of both processes, it was found that model performance (RMSE) is highly sensitive to slight changes in the value of beta.
Equation (3) takes the form

R = IN × (SM/FC)^β (3)

where R is the recharge from the soil moisture tank to the upper zone tank, IN is the infiltration into the soil moisture tank, SM is the soil moisture storage in mm, FC is the maximum soil moisture storage capacity, and β is a nonlinear runoff parameter that conceptualizes the degree of soil saturation and its effect on the recharge response. Schumann [27] discussed the different interpretations and responses that a model structure exhibits simply by adopting a different range of values for β. In [27], it is also stated that during a rainfall event the soil moisture tank fills up rapidly and the recharge rate is high; this can be represented using Equation (3) with a convex behavior of the soil storage capacity (β less than 1, Figure 3A). In contrast, using the same equation with a concave behavior (β greater than 1, Figure 3B) indicates that the part of the basin with full (saturated) soil storage during a rainfall event is small and, consequently, that the recharge is small.
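A sketch of Equation (3) in Python (the function name is ours, not Hapi's) makes the convex/concave contrast concrete:

```python
def recharge(infiltration, sm, fc, beta):
    # Equation (3): R = IN * (SM / FC)**beta
    # beta < 1 (convex): most infiltration becomes recharge even at
    # partial saturation; beta > 1 (concave): recharge stays small
    return infiltration * (sm / fc) ** beta
```

At half saturation (SM/FC = 0.5) and IN = 10 mm, β = 0.5 yields about 7.1 mm of recharge, while β = 2 yields only 2.5 mm, which is the contrast Figure 3A,B illustrates.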
In the iterative calibration process, while using the convex behavior of soil moisture storage (β less than 1, referred to hereafter as "Model Structure-1") with the distributed model, the β value resulting from the calibration is minimal (less than 0.1). Very small β values (power coefficient almost zero) mean that the effect of the soil moisture subroutine is negligible: all water that reaches the soil tank goes directly to the upper zone tank. This behavior of passing all the water to the upper zone indicates that the model is trying to generate more runoff to overcome the runoff underestimation; it has already reached its highest capability (β almost zero), and still the model performance is not good.
The concave soil moisture behavior is therefore adjusted and used in the distributed model along with an additional rule that passes water to the upper zone once soil moisture exceeds the field capacity FC (Figure 3C) as an artificial recharge; this variant will be referred to as "Model Structure-2". This rule forces the calibration algorithm to assign smaller FC values to increase the artificial recharge, which in turn increases the subsurface runoff and brings the simulated hydrograph closer to the observed one during the rainy season.
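Our reading of the Model Structure-2 rule can be sketched as follows (a hypothetical simplified form, not the Hapi implementation):

```python
def recharge_ms2(infiltration, sm, fc, beta):
    # Model Structure-2 sketch: concave recharge (beta > 1) plus an
    # artificial-recharge term that spills any soil moisture above field
    # capacity FC straight to the upper zone
    spill = max(sm - fc, 0.0)
    r = infiltration * (min(sm, fc) / fc) ** beta
    return r + spill
```

With smaller calibrated FC values, the `spill` term activates more often during the wet season, which is exactly the mechanism the text describes for increasing the fast (subsurface) runoff.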
The iterative process consists of adopting a hypothesis about the catchment behavior (convex or concave), checking the adequacy of the hypothesis against model performance, and visually inspecting whether the expected hydrograph shape is obtained. Based on this process, the semi-distributed model is implemented using both Model Structure-1 and Model Structure-2 and serves as the base case, while Model Structure-2 is used for both FW-1 and FW-2.
The lake is simulated explicitly with Model Structure-1 using the storage-discharge curve relationship described by [13,28]. The outflow from basins upstream of the lake is summed and used as an inflow to the lake (represented as a separate tank with the lake storage as a state variable). The new lake storage is then updated with the recent inflow, direct lake precipitation, and evaporation, the outflow from the lake can then be obtained using the storage-discharge curve.
The superficial discharge generated from the upper zone tank in Model Structure-1 and -2 is assumed to be distributed, and computations are performed at the cell scale. The runoff is routed either directly to the outlet using the Maxbas triangular function, as in FW-1, or from one cell to another following the flow direction input (see Section 3.1.4), as in FW-2. In the FW-1 model, only one Maxbas parameter value is used as an input to the model regardless of the number of cells; this value is assigned to the cell with the longest flow path length, and the Maxbas values of all other cells are calculated proportionally based on their flow path lengths (Figure 4).
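This proportional assignment, together with a simple unit-sum triangular weighting, might be sketched as follows (helper names are hypothetical; the discrete triangle is one plausible form of the Maxbas weighting, not necessarily the exact one used in HBV-96):

```python
def maxbas_per_cell(flow_path_lengths, maxbas_outlet):
    # the single calibrated Maxbas value is assigned to the cell with the
    # longest flow path; all other cells get proportionally smaller values
    longest = max(flow_path_lengths)
    return [maxbas_outlet * length / longest for length in flow_path_lengths]


def triangular_weights(n):
    # unit-sum symmetric triangular weighting over a base of n time steps;
    # the cell's runoff is spread over n steps with these weights
    w = [min(i, n + 1 - i) for i in range(1, n + 1)]
    total = sum(w)
    return [v / total for v in w]
```

For example, a base of 3 steps gives weights [0.25, 0.5, 0.25], so half of a cell's runoff arrives at the outlet in the middle step.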
The Muskingum [29] routing method is used for routing the flow from cell to cell in FW-2, as this method allows the coefficients to be altered for each cell at every computational time step (and, as observed in [30], it also requires limited data for the computations, although with modern computers this is no longer very relevant). The Muskingum method employs the continuity equation to predict the magnitude, volume, and temporal pattern of the flow as it travels down a channel (Equations (4) and (5)):

ΔS/Δt = I − Q (4)

S = K [X I + (1 − X) Q]^m (5)

where I and Q are the inflow and outflow, respectively, Δt is the time step, and ΔS is the change in storage. K is the travel time, X is a weighting coefficient that determines the linearity of the water surface (X values lie between 0 and 0.5), and m is an exponential constant varying from 0.6 for a rectangular channel to 1 [31]. In this study, we use the linear Muskingum version, with m equal to 1. The Muskingum equation can then be written as (Equations (6) and (7)):

Q(J+1) = C0 I(J+1) + C1 I(J) + C2 Q(J) (6)

C0 = (Δt − 2KX) / (2K(1 − X) + Δt), C1 = (Δt + 2KX) / (2K(1 − X) + Δt), C2 = (2K(1 − X) − Δt) / (2K(1 − X) + Δt) (7)

where J + 1 and J denote the present and previous time steps, respectively, and C0, C1, and C2 are three coefficients calculated from K and X (Equation (7)). The main difference between the Muskingum-Cunge method and the Muskingum method lies in the estimation of the parameters K and X: [29] derives an expression to calculate each coefficient, whereas in this study both K and X are freely calibrated parameters. Depending on the values of X and K and the temporal resolution Δt (hourly time step), the coefficient C0 might be negative; Mishra and Singh (2001) [32] suggested that negative values of C0 should be avoided (Equation (8)) to maintain numerical stability:

Δt ≥ 2KX (8)

The time step Δt should also be small enough to satisfy the assumption of a linear flow rate over the time step, especially if the response time of the catchment is short [33].
To keep the numerical method theoretically stable, another constraint was suggested by Mishra and Singh (2001) [32], as described in Equation (9):

2K(1 − X) ≥ Δt (9)

Both K and X are obtained through optimization, by defining a possible range of values within which the optimization algorithm searches for the optimum. With Equations (8) and (9) defined as boundaries in the calibration process, the feasible search space for K and X is as shown in Figure 5. In this study, the Hapi model [26] was employed (the source code is available on GitHub); it is a framework for building raster-based conceptual distributed hydrological models in which computations are performed on a pixel-by-pixel basis. It also enables multiscale modeling and is thus suitable for simulations from coarse to relatively fine resolutions. Meteorological data (precipitation, evapotranspiration, and temperature) are supplied in the form of gauge inputs or raster data. Hapi uses HBV-96 as the conceptual model representing the hydrological processes within a cell, and the Muskingum method as the routing model between cells, with the potential of using any other conceptual hydrological model within cells instead of HBV-96.
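The linear Muskingum step described above can be sketched as follows (standard textbook coefficients; function names are ours, not Hapi's):

```python
def muskingum_coefficients(k, x, dt=1.0):
    # linear Muskingum (m = 1): C0, C1, C2 from travel time K, weighting
    # coefficient X, and time step dt (all times in hours);
    # the three coefficients always sum to 1
    d = 2.0 * k * (1.0 - x) + dt
    c0 = (dt - 2.0 * k * x) / d
    c1 = (dt + 2.0 * k * x) / d
    c2 = (2.0 * k * (1.0 - x) - dt) / d
    return c0, c1, c2


def route_step(inflow_now, inflow_prev, outflow_prev, coeffs):
    # Q[J+1] = C0 * I[J+1] + C1 * I[J] + C2 * Q[J]
    c0, c1, c2 = coeffs
    return c0 * inflow_now + c1 * inflow_prev + c2 * outflow_prev
```

Because the coefficients sum to 1, a constant inflow passes through unchanged, which is a quick sanity check on any implementation.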

Model Inputs and Outputs
In the models used, the computations are performed on a pixel-by-pixel basis. Meteorological data (precipitation, evapotranspiration, and temperature) can be processed in raster form. A simple statistical formula such as inverse squared distance weighting (ISDW) (Equation (10)) can be used to distribute the values of gauge data over different cell resolutions, as was done in this study (15 gauges):

Z_cell = Σ_i (Z_i / d_i²) / Σ_i (1 / d_i²) (10)

where Z_i is the value at gauge i, i runs over the gauges, and d_i is the distance between each gauge and the cell center.
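Equation (10) can be sketched as follows (a minimal version that assumes no gauge coincides with the cell center, i.e., all distances are nonzero):

```python
def isdw(gauge_values, distances):
    # inverse squared distance weighting (Equation (10)):
    # Z_cell = sum(Z_i / d_i**2) / sum(1 / d_i**2)
    num = sum(z / d ** 2 for z, d in zip(gauge_values, distances))
    den = sum(1.0 / d ** 2 for d in distances)
    return num / den
```

With equal distances this reduces to a plain average, and a nearby gauge quickly dominates as its distance shrinks, which is the behavior the squared exponent is chosen for.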
Hapi uses multiple geo-registered raster layers to describe the drainage basin. All the required raster data were derived from the DEM, which is used to delineate the catchment and subdivide it into two subcatchments (in the case of the semi-distributed model). A series of GIS modules were used to generate these input data, including pit removal to fill depressions in the DEM, flow direction, flow accumulation, river network, and flow path length (Figure 6). The DEM is resampled in several steps (not directly from 30 m to 4 km) to prepare the raster input data for the different resolutions. To keep the topographical features of the river and the flow direction at coarser resolutions as close as possible to the original shape of the river network, DEM depressions are filled after each resampling.

Model Parameters
Two model structures are used in this study (Figure 3), being different only in the surface runoff generation process. Both model structures use ten parameters (Table 1) to describe the whole range of hydrological processes in the catchment (snow subroutine is excluded). Furthermore, additional parameters were considered for the lake and the routing methods (Maxbas and Muskingum K and X parameters).

Parameterization Approaches
Two simple parameterization approaches are followed in the distributed models in this study; the first approach assumes that all cells use the same set of parameters. The second approach assumes different parameter values for each cell, considering that in a large catchment, most of the hydrological processes change significantly with the change of land use or elevation. Lower zone response is considered uniform in all cells for both parameterization approaches.

Model Calibration
Meteorological input data are split into two sets with similar statistical properties for calibration and verification, respectively. The available data of good quality are two-and-a-half years long (June 2012 to November 2014), including two dry seasons and three wet seasons. As the model is built mainly for flood forecasting purposes, data on two wet seasons are included in the training dataset (up to December 2013), including the highest peak in the hydrograph.
Conceptual model parameters are not measurable, and thus a calibration procedure has to be used to determine them. The calibration process consists of three steps. Starting from initial boundaries for each parameter that define the search space, the GA runs for 30 generations with a population of 200 to minimize the overall RMSE as the objective function. Afterward, a one-at-a-time (OAT) sensitivity analysis is performed to re-explore the search space of all parameters and to redefine the boundaries of the parameters most influential on the model performance criteria. The interaction between parameters is checked using Sabol's concept, by plotting the relative values of all parameters in one graph [34] (Figure 8). These calibration steps are performed many times as an iterative process to define the model structure and the parameter ranges (see Section 3.1.5).
The choice of parameter boundaries is crucial for the calibration results, and the initial choice may not be the best. As the GA runs for the entire 30 generations, or converges earlier, the evolution from one generation to another focuses the search around good results. The GA may thus ignore parts of the multidimensional search space after a certain number of generations while trying to converge faster. The OAT sensitivity analysis re-explores the entire range of each parameter after the optimum set of parameter values has been reached. The boundaries of each parameter are then redefined, either narrowing the range so that the algorithm can detect the influence of a minimal change in one parameter value, or expanding one of the boundaries when the best value lies very close to it (parameter K1-j). With the new search region, the randomized search may find a better result (for example, model performance improves significantly with a very slight change in parameter Perc-j; see Figure 8). This final calibration step is carried out using a different type of randomized search, harmony search (HS), with the new boundaries. HS generates a new parameter vector from all the existing vectors in the harmony memory, whereas the GA generates offspring (new parameter sets) from only two parents [35]. Our experiments showed that such hybridization leads to better results than using the GA alone at both stages.
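The OAT re-exploration step might be sketched as follows (a toy version with hypothetical names, not the calibration code used in the study): each parameter is swept across its full range while the others stay at their GA-optimized values, and the spread of the objective serves as a sensitivity score.

```python
def oat_sensitivity(objective, best_params, bounds, n_steps=11):
    # One-at-a-time sweep: vary one parameter over [low, high] while the
    # others remain at their optimized values; the score is the spread
    # (max - min) of the objective along the sweep
    scores = {}
    for name, (low, high) in bounds.items():
        errors = []
        for j in range(n_steps):
            trial = dict(best_params)
            trial[name] = low + (high - low) * j / (n_steps - 1)
            errors.append(objective(trial))
        scores[name] = max(errors) - min(errors)
    return scores
```

Parameters with large scores get narrowed boundaries; a parameter whose best value sits at the edge of its sweep is a candidate for boundary expansion before the HS stage.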

Model Performance Metrics
Model performance is assessed using traditional statistical goodness-of-fit metrics, such as the root mean square error (RMSE) (Equation (11)) and the Nash-Sutcliffe efficiency (NSE) (Equation (12)) [36]. RMSE is also used as the objective function for the calibration of all models:

RMSE = sqrt( (1/n) Σ (Qobs,t − Qsim,t)² ) (11)

NSE = 1 − Σ (Qobs,t − Qsim,t)² / Σ (Qobs,t − Qavg)² (12)

RMSE and NSE focus on the high values (peaks) in the time series and downplay lower values, while the NSE of logarithmic values (NSEln) (Equation (13)) puts greater emphasis on low flows:

NSEln = 1 − Σ (ln Qobs,t − ln Qsim,t)² / Σ (ln Qobs,t − ln Qavg)² (13)

Furthermore, to estimate how well the model reproduces the streamflow volume, the mean cumulative error WB is used (Equation (14)). Finally, the Kling-Gupta efficiency index (KGE) [37] is used as an aggregate metric (Equation (15)):

KGE = 1 − sqrt( (c − 1)² + (α − 1)² + (β − 1)² ) (15)
where Qobs and Qsim are the observed and simulated discharge at time t, respectively, Qavg is the mean observed discharge, and n is the number of time steps; c is the cross-correlation between Qobs and Qsim; α measures the variability in the data, α = σQsim/σQobs, where σQsim and σQobs are the standard deviations of the simulated and observed discharge, respectively; and β = QsimAvg/Qavg.
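These metrics can be sketched in pure-stdlib Python (versions of Equations (11)-(13) and (15); function names are ours):

```python
import math


def rmse(obs, sim):
    # Equation (11): root mean square error
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    # Equation (12): Nash-Sutcliffe efficiency
    mean_o = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den


def nse_ln(obs, sim):
    # Equation (13): NSE of log-transformed flows (requires positive flows)
    return nse([math.log(o) for o in obs], [math.log(s) for s in sim])


def kge(obs, sim):
    # Equation (15): KGE = 1 - sqrt((c-1)^2 + (alpha-1)^2 + (beta-1)^2)
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    c = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / (n * so * ss)
    alpha, beta = ss / so, ms / mo
    return 1.0 - math.sqrt((c - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

A perfect simulation gives RMSE = 0 and NSE = NSEln = KGE = 1, which is a convenient check on the implementation.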

Data
The basin is instrumented with a telemetry system of rainfall gauges at a 15-min time step (Figure 9A), which provides data for the period between 2002 and 2015. Only one water level gauge, at the catchment outlet, is available; it provides data between 2012 and 2016. For the calibration, the rainfall and water level measurements were aggregated to an hourly time step, and the water level data were further converted into discharge using a rating curve equation (Equation (16)); the equation coefficients are listed in Table 2. Due to gaps in the rainfall data, only the period between June 2012 and November 2014 is considered in this study.

Q = a (H − Ho)^b (16)

where H is the water level in meters, Ho, a, and b are coefficients that change depending on the date of the water level record, and Q is the discharge in m³/s. Monthly potential evapotranspiration data were obtained for each gauge using a potential evapotranspiration-elevation relation derived with the Hargreaves method and linear regression [38]. Hourly temperature estimates were calculated from the one available daily gauge (Cojutepeque) using a sine function approximation and were used for the whole catchment.
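Equation (16) can be sketched as follows (a hypothetical helper; the coefficients vary by date as per Table 2, and stages at or below the datum Ho are clamped to zero discharge as a reasonable assumption):

```python
def stage_to_discharge(h, h0, a, b):
    # rating curve (Equation (16)): Q = a * (H - H0)**b
    # H and H0 in metres; Q in m^3/s; a, b, H0 depend on the record's date
    if h <= h0:
        return 0.0
    return a * (h - h0) ** b
```

In practice one would select (Ho, a, b) per period before converting each hourly stage value.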
The lake is simulated using a storage-outflow relation, where the storage changes from inflow, direct rainfall, and evaporation are updated at each time step, and the lake outflow is calculated from the newly computed storage. The inverse squared distance weighting method was used to calculate distributed data at the different spatial resolutions for all the data required for the hydrological simulation (precipitation, temperature, and potential evapotranspiration). All the data used were provided by the Ministry of Environment and Natural Resources of El Salvador (MARN).
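The lake update described here might be sketched as follows (a simplified form with all fluxes in consistent volume units per time step; the storage-discharge curve is passed in as a function, and the names are ours):

```python
def lake_step(storage, inflow, rainfall, evaporation, storage_outflow):
    # update lake storage with inflow, direct rainfall, and evaporation,
    # then read the outflow from the storage-discharge curve and draw it down
    storage = storage + inflow + rainfall - evaporation
    outflow = storage_outflow(storage)
    return storage - outflow, outflow
```

For example, with a linear stand-in curve `lambda s: 0.1 * s`, a starting storage of 100 units with 10 of inflow, 5 of rainfall, and 3 of evaporation yields an outflow of 11.2 and a remaining storage of 100.8.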

Model Setup
The lake's existence in the catchment made it impossible to represent the whole catchment as one lumped model; therefore, the catchment is divided into two parts, the lake and Jiboa subcatchments (Figure 9B). The lake and all the areas upstream of it are represented as one lumped model using Model Structure-1, referred to as the lake subcatchment. The rest of the catchment, referred to as the "Jiboa subcatchment", is simulated using different spatial discretizations (lumped or cells). The experimental model setup followed in this research, covering model type, spatial discretization, model structure, routing method, and parameterization approach, is shown in Figure 10. For model descriptions, see Table 3.

Results and Discussion
Runoff dynamics in the catchment are significantly influenced by the lake, which is the reason the subcatchments have been defined based on the outlet of the lake; the semi-distributed model is therefore used as the simplest, base case in this study. Elevation, however, differs considerably between the four topographical areas of the catchment, while temperature from only one gauge is available and used in the study; this might affect the performance of the models, but since the effect is the same for all models, the comparison between them remains valid.
It is worth mentioning that we explored CMORPH satellite rainfall data with an hourly temporal resolution and an 8 × 8 km² spatial resolution; however, they showed very poor quality compared to the ground-based gauges. The kriging method was not used due to the non-Gaussian nature and skewness of the rainfall data, which are discontinuous and contain many zeros; kriging requires the rainfall data to be transformed to a normal distribution, and for this reason the Box-Cox and normal score transform methods were investigated, but they did not fully succeed in transforming the rainfall to normally distributed values. Based on our knowledge of the localized orographic rainfall pattern in the eastern part of the catchment, where there is a volcano, in addition to the convective rainfall pattern in the catchment, the ISDW method is preferred over inverse distance weighting (IDW), as it gives smaller weights (1/d² instead of 1/d) at greater distances, which suits the nature of rainfall in the area.

Table 4 presents the performance of all models. The SemiDist-1 model shows better results, with an RMSE value of 1.33, than SemiDist-2, with an RMSE value of 1.65. The performance of the FW-2 4 km model is slightly better than that of SemiDist-2, both having the same model structure, while it is lower than that of SemiDist-1; the reason is clearly related to the conceptualization of the direct runoff process. Another possible explanation, as stated by Das et al. (2008) [39], could be the overparameterization of the distributed models at high and low resolutions; it could also be that averaging in space compensates for stochastic errors. A possible source of error affecting the performance of the models at different cell resolutions is that precipitation and evapotranspiration were obtained using the ISDW interpolation method (Equation (10)), and interpolation methods tend to have higher estimation errors at higher resolutions.
Moreover, the generated values do not reflect the spatial variation of the precipitation or evapotranspiration; rather, they reflect the inherent effect of the statistical method on the gauge values at different distances. The performance statistics of all models for the calibration period (June 2012 to December 2013) are presented in normalized form in Figure 11. The FW-2 4 km distributed model has the best performance among all distributed models during the calibration period, with an RMSE value of 1.6, a Nash-Sutcliffe value of 0.64, and a logarithmic Nash-Sutcliffe value of 0.62. The RMSE of all FW-1 models ranges between 1.63 and 1.69: for the FW-1-L models between 1.65 and 1.69, and for the FW-1-D models between 1.63 and 1.69, as shown in Table 4. The optimal parameter sets obtained by calibration for both the SemiDist-1 and SemiDist-2 models are given in Table 5. Figure 12 shows the simulated and observed flow hydrographs for FW-2 4 km.
On average, the medium and low flows are well estimated, considering that low flow is dominated by the outflow from the lake during the dry season and is only slightly affected by the spatial representation of the Jiboa subcatchment. Peak flows are underestimated, as shown in Figure 13. FW-1-L 2 km and FW-1-L 1 km are the closest to the observed hydrograph in terms of the magnitude and shape of the peak (rising and falling limbs); the FW-1-D models with resolutions of 4 km and 2 km come next in capturing the highest peak, followed by FW-2 (4 km). FW-2 4 km has the highest NSE and NSEln, which represent the model's ability to reproduce high and low flow values, respectively, followed by the FW-1-D 2 km model and then FW-1-L 1 km (Figure 11 and Table 4).
The results regarding the effect of increasing the number of parameters, from lumped to distributed, on model performance are as expected (Figure 14): FW-1-L has lower performance statistics (higher error) than FW-1-D. However, the difference in performance is minimal, and using lumped parameters leads to better results at the 500 m resolution than distributed parameters do (Figure 14). Even though the calibration process consists of three independent steps and two different calibration algorithms are used to obtain better parameters and avoid ending in a local minimum, more model parameters require more iterations during the calibration process. In addition, the number of parameters increases with finer resolutions, and overestimation of the flow in one part of the catchment can be compensated by underestimation in another part, because only one flow gauge at the outlet is used for the calibration. These points could explain the slight differences in model performance as the number of model parameters increases. The decrease in performance with finer spatial resolution confirms the hypothesis of [40][41][42][43] that complex process interactions at small scales can be represented by simple approaches at a larger scale due to the self-organizing capacity of large systems. Therefore, the finer the resolution used with the conceptual model, the more of this self-organizing capacity is lost, and at a certain scale one should switch to more complex models. Both the FW-1-L and FW-1-D models show similar behavior: the error decreases with increasing model resolution, reaching its lowest value (RMSE 1.63) at a resolution of 2 km, and then starts increasing again (Figure 14).
It was expected that increasing the complexity of the model would lead to overfitting. However, considering that the calibration and validation datasets are statistically different, and that the validation dataset shows an even higher error with the semi-distributed model (Table 4), the slight increase in error with increasing spatial resolution does not indicate overfitting, especially since the error decreases again at 500 m resolution (Figure 15). Based on the nonlinear constraints that govern the values of the routing parameters K and X (Figure 5), the point in the feasible parameter space that gives minimum attenuation (X = 0.5) corresponds to a translation of one time step (K = 1), and the point that gives minimum translation (K = 0.5) corresponds to maximum attenuation (X = 0). For fine spatial resolutions, the flood wave is expected to travel between cells with minimal attenuation and little translation, which conflicts with the feasible parameter space created by the constraints. For a 500 m cell, for example, the flood wave should remain nearly unattenuated (X close to 0.5), and the travel time should be about 0.14 h (500 m / (1 m/s × 3600 s/h), assuming a wave speed of 1 m/s). These values lie outside the feasible search space of K and X required to keep the method stable at the given (hourly) time step, so the only remedy is to use a coarser cell resolution, given that a finer time step is not always possible. Figure 16 shows the effect of the Muskingum parameters on the highest simulated flood wave for different cell sizes. Small cell sizes lead to more attenuation and translation of the hydrograph, which in turn degrades model performance.
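The feasibility constraint discussed above follows from the textbook Muskingum scheme, in which stability requires 2KX ≤ Δt ≤ 2K(1−X). The sketch below implements this standard formulation to show why small K with large X (fast, unattenuated waves between fine cells) is rejected; it is a generic illustration, not Hapi's exact routing code:

```python
def muskingum_route(inflow, K, X, dt=1.0, O0=None):
    """Route a hydrograph with the classical Muskingum method.

    K: storage time constant (same unit as dt); X: weighting factor in
    [0, 0.5]. Stability requires 2*K*X <= dt <= 2*K*(1 - X), which is
    the feasible region referred to in the text."""
    if not (2.0 * K * X <= dt <= 2.0 * K * (1.0 - X)):
        raise ValueError("K, X outside the feasible region for this dt")
    D = 2.0 * K * (1.0 - X) + dt
    c0 = (dt - 2.0 * K * X) / D
    c1 = (dt + 2.0 * K * X) / D
    c2 = (2.0 * K * (1.0 - X) - dt) / D   # c0 + c1 + c2 == 1
    out = [inflow[0] if O0 is None else O0]
    for t in range(1, len(inflow)):
        out.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * out[-1])
    return out

# Hourly step, K = 1 h, X = 0.3: feasible, and the peak is attenuated.
hydrograph = [0.0, 5.0, 20.0, 12.0, 6.0, 2.0, 0.0]
routed = muskingum_route(hydrograph, K=1.0, X=0.3, dt=1.0)
```

Calling the same function with, say, K = 0.2 h and X = 0.3 raises the feasibility error, mirroring the situation at 500 m cells where the physically plausible K and X fall outside the stable search space.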
The trend between cell size and model performance demonstrates the strong influence of the routing method and cell size on the accuracy of the model predictions. It is important to stress that the lumped representation of the whole catchment, with the lake represented implicitly in the lower reservoir rather than at the catchment outlet, did not capture the catchment's baseflow response. It is therefore essential to define a separate state variable reflecting the lake characteristics (lake storage, water level, or lake surface area). This lake state variable is updated based on the interaction between the lake and the other hydrological responses (inflow, direct rainfall, evaporation, and outflow). Modeling the lake implicitly, without such a state variable, would lead to a misinterpretation of both the response of the lower reservoir and the outflow from the lake. Explicit representation of the lake, which requires dividing the catchment into sub-basins with the lake at the outlet of one of them, is therefore the simplest way to simulate the catchment; consequently, the semi-distributed model structure is used as the base case for this study.
The lake is a dominant hydrological feature: base flow in the dry season comes mainly from the lake's outflow (Figure 17). Using a rating curve (storage-outflow relation) to calculate the lake outflow overrides all upstream hydrological processes and gives all the weight to the initial lake volume at the beginning of the simulation; this initial volume cannot be corrected by a warm-up period.
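The lake state variable described above amounts to a per-step water balance in which the outflow is diagnosed from storage. The sketch below uses a linear storage-outflow relation (Q = S / k) as a stand-in for the basin-specific rating curve; the residence time k and all magnitudes are illustrative assumptions, not calibrated values from the study:

```python
def step_lake(storage, inflow_m3s, rain_m3, evap_m3, k=1.0e7, dt=3600.0):
    """Advance the explicit lake state variable by one (hourly) step.

    storage [m^3]; inflow/outflow [m^3/s]; rain/evap [m^3 per step].
    k [s] is a hypothetical residence time; Q = S / k stands in for the
    storage-outflow rating curve mentioned in the text."""
    outflow_m3s = storage / k
    storage = max(storage + (inflow_m3s - outflow_m3s) * dt
                  + rain_m3 - evap_m3, 0.0)
    return storage, outflow_m3s

# Dry-season recession: with no inflow or rain, the simulated outflow
# decays slowly from the initial lake volume.
S = 5.0e7
recession = []
for _ in range(24):
    S, q = step_lake(S, inflow_m3s=0.0, rain_m3=0.0, evap_m3=1.0e3)
    recession.append(q)
```

The example makes the paper's point visible: with a storage-outflow relation, the dry-season hydrograph is governed almost entirely by the initial storage, which is why the initial lake volume cannot be corrected by a warm-up period.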

Conclusions
In this study, we evaluated the performance of several (semi-distributed) hydrological model structures on the example of the Jiboa River Basin in El Salvador. Two routing models were considered: Muskingum, which routes the runoff from each cell along the river network, and Maxbas, which routes the runoff directly to the outlet. The second concept tested was spatial representation, where the model was built and tested at different spatial resolutions (500 m, 1 km, 2 km, and 4 km). The models were calibrated using a combination of two randomized search algorithms, complemented by a sensitivity analysis.
The results allow us to conclude that the spatial sensitivity of the resolution is closely linked to the routing method; the routing sensitivity influenced model performance more than the spatial discretization did, and allowing for coarser discretization makes the model simpler and computationally faster. A slight performance improvement is gained by considering spatial catchment characteristics. The 2 km cell size was found to yield the lowest model error.
The error shows the same trend with increasing spatial resolution from 4 km to 500 m for FW-1 with both lumped and distributed parameters, whereas all performance criteria deteriorate with increasing resolution for FW-2, due to the strong attenuation of peaks produced by the Muskingum method at fine resolutions.
This study also has limitations. Only a single catchment was considered, so one should be careful with generalizations, and only one model structure (HBV) was evaluated for the model components; no comparison with other model structures of the same class was carried out. Propagation of uncertainty from inputs and parameters to the modeling results was not studied (although model sensitivity to the parameters was explored). The main forcing data (precipitation, evapotranspiration, and temperature) were interpolated from gauge data using the ISDW method for all resolutions. Due to the complexity of the interactions and dependencies among the distributed model parameters, the calibration process is a difficult task and is highly affected by the initial parameter limits. Therefore, a genetic algorithm (GA) was used in an iterative approach to modify the model structure and to define the feasible range for each parameter.
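For reference, inverse squared-distance weighting (ISDW) assigns each gauge a weight of 1/d², where d is its distance to the target grid cell. A minimal single-cell sketch is given below; the function name and interface are illustrative, not the implementation used in the study:

```python
import numpy as np

def isdw(xy_gauges, values, xy_target, power=2.0):
    """Interpolate gauge values onto one grid cell with inverse
    squared-distance weighting (power=2 gives the ISDW case)."""
    diff = np.asarray(xy_gauges, float) - np.asarray(xy_target, float)
    d = np.hypot(diff[:, 0], diff[:, 1])
    if np.any(d == 0):                      # target coincides with a gauge
        return float(np.asarray(values, float)[d == 0][0])
    w = 1.0 / d ** power
    return float(np.sum(w * np.asarray(values, float)) / np.sum(w))

# Three hypothetical gauges; the target is equidistant from all of them,
# so the result is the plain average of the gauge values.
rain = isdw([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)],
            [2.0, 4.0, 6.0], (5.0, 5.0))
```

Because the same gauge network is interpolated at every resolution, finer grids do not add new forcing information; they only resample it, which is relevant to the resolution results discussed earlier.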
Future work will focus on uncertainty analysis of the built models and extending the Hapi model with more model formulations used in the model cells.
The Hapi modeling framework for building raster-based conceptual distributed hydrological models is open-source and is available on GitHub [26].
Author Contributions: D.S. and G.C.P. conceptualized the study; M.F. developed the Hapi algorithm, calibrated the model, analyzed the results, and drafted the initial manuscript. All authors contributed to the final manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Data Availability Statement: Restrictions apply to the availability of these data, which were used under license for this study.