Testing the Efficiency of Parameter Disaggregation for Distributed Rainfall ‐ Runoff Modelling

: A variety of hydrological models is currently available. Many of those employ physically based formulations to account for the complexity and spatial heterogeneity of natural processes. In turn, they require a substantial amount of spatial data, which may not always be available at sufficient quality. Recently, a top ‐ down approach for distributed rainfall ‐ runoff modelling has been developed, which aims at combining accuracy and simplicity. Essentially, a distributed model with uniform model parameters (base model) is derived from a calibrated lumped conceptual model. Subsequently, selected parameters are disaggregated based on links with the available spatially variable catchment properties. The disaggregation concept is now adjusted to better account for non ‐ linearities and extended to incorporate more model parameters (and, thus, larger catchment heterogeneity). The modelling approach is tested for a catchment including several flow gauging stations. The disaggregated model is shown to outperform the base model with respect to internal catchment dynamics, while performing similarly at the catchment outlet. Moreover, it manages to bridge on average 44% of the Nash–Sutcliffe efficiency difference between the base model and the lumped models calibrated for the internal gauging stations. Nevertheless, the aforementioned improvement is not necessarily sufficient for reliable model results.


Introduction
During the past decades, there has been a significant trend towards the development of hydrological catchment models. In this expanding modelling "ecosystem", one can find models of various types. Distributed models aim at describing dynamics by explicitly considering the heterogeneity in catchment properties. Those models can be useful for providing a more detailed view on catchment functioning, which is important for studies such as land cover change impact assessment. Nevertheless, due to the large number of parameters, and because sufficient data of good quality are frequently not available for assessing those, calibration is required and they can be prone to overparameterization and equifinality. The former refers to the use of more parameters than can be identified via properties retrieved based on soil and vegetation maps to get an initial estimation of the parameters for each sub-basin of the semi-distributed TOPNET model. Subsequently, some of the parameter fields were adjusted using calibrated multipliers (one per parameter field). This way, the initial spatial patterns of each parameter were maintained. Such multipliers were also used by other researchers (e.g., [6,7]). Pokhrel et al. [8] used the distributed conceptual HL-DHM model and added more degrees of freedom to the transfer functions by also using intercepts and exponents apart from multipliers. The number of parameters for calibration was greatly reduced, while the overall model performance was evaluated at the catchment outlet and was shown to improve compared to using the a priori parameters. At the same time, it was noted that the efficiency of the method largely depends on the estimation of the a priori parameter fields.
Samaniego et al. [9] proposed an interesting approach, termed multiscale parameter regionalization (MPR), which aims (among other things) to enhance the transferability of model parameters across different resolutions. The distributed model parameters are linked with catchment characteristics and are obtained at the fine resolution of the available data using calibrated transfer functions. Those parameters are then upscaled to match the model resolution. The upscaling operators are assumed to be quasi-scaleindependent. The approach was tested for a catchment of 4000 km 2 in Germany using the MHM model and, among other things, showed promising results when transferring the model parameters across different resolutions.
A different method for reducing the dimensionality of the parameter space was tested by [10], who applied the large-scale VIC land surface model [11,12] to the continental United States. The model parameters were calibrated for only a subset of the grid cells and the parameters for the remaining cells were linearly interpolated. Among other things, they tested the potential of the method on transferring the model parameters across different timesteps and resolutions. They reported promising results when parameters were transferred across temporal resolutions, while significant changes in model performance were noted when parameters were transferred across spatial resolutions. Ricard et al. [13] applied the semi-distributed Hydrotel model [14] for a region in Canada. As a trade-off between efficiency and equifinality for homogeneous hydrological regimes, they suggested an alternative calibration method (termed global calibration), which uses a single parameter set for the entire study area.
As opposed to their distributed counterparts, lumped models require substantially fewer inputs and are much faster to build and run. Moreover, since they only aim at encompassing the dominant catchment dynamics in a simplified manner, they typically use a limited number of parameters. In turn, this can mitigate equifinality. Another important aspect with respect to modelling applications is accuracy. It has been shown that, when data for calibration are available, a higher model complexity does not necessarily result in better performance (e.g., [15][16][17][18]). This largely stems from the complexity of hydrology itself and the difficulty to accurately capture important processes (e.g., actual evapotranspiration, groundwater recharge) via direct measurements. While the above form a call for more field research, inevitably they also restrict us with respect to how much complexity we can meaningfully add to a model.
In an attempt to combine advantages of both lumped and distributed models, Tran et al. [19] suggested a flexible generic top-down approach for the development of spatially distributed hydrological models. The process starts from a calibrated lumped model, whose parameters are used to derive a parameter set that is uniform over the catchment. Subsequently, disaggregation is carried out to come up with a spatially distributed parameter set. This is accomplished by identifying links between model parameters and catchment characteristics, such as topography, land cover and soil texture. An assumption of the disaggregation methodology was that the spatial parameter averages are equal to the corresponding uniform parameter values. The approach was demonstrated for the PDM [20], NAM [21] and VHM [22] conceptual models and two Belgian catchments, showing promising results and emerging as an interesting alternative to more complex and parameterized models when data limitations hinder the identification of their parameter values. The modelling framework was implemented using PCRaster Python [23].
The disaggregation concept is now extended for more model parameters. This allows for the incorporation of more catchment properties (and, thus, larger degree of heterogeneity) in the distributed model parameterization. Moreover, an alternative form of the disaggregation relationships is used to better account for potential non-linearity between the model parameters and the a priori selected catchment properties, while also to allow the spatial parameter averages to be different from the uniform parameter values. To better account for the hydrological impact of impervious areas, a modified version of the NAM model is used. The performance of the modelling approach is investigated for a catchment including several flow gauging stations. This allows for a more thorough evaluation of methodology. The disaggregated model is calibrated using data from both the catchment outlet and the internal gauging stations. The performance of the disaggregated distributed model is compared against a model with non-disaggregated parameters. In addition, lumped models are calibrated for the internal catchment gauging stations and are used as performance benchmarks.
In Sections 2-6, the materials and methods used for this work are described. In Section 2, the study area and the available flow and meteorological data are presented. The lumped and distributed model structures are described in Sections 3 and 4, respectively. Section 5 revolves around the parameter disaggregation methodology. In Section 6, the model calibration and evaluation methodologies are explained. Model results are presented and discussed in Section 7, while conclusions are formulated in Section 8.

Catchment
The study area covers about 1969 km 2 and is located in Belgium, upstream of the river Demer gauging station at Zichem (Figure 1). Its largest part is located in the Flemish region, with the southern part in Wallonia. The catchment belongs to the international Scheldt River Basin. Elevation ranges from 16.9 and 182.0 m. It is dominated by silt (50%), followed by silt loam (20%) and sand (18%). Croplands is the most frequent land cover class taking up to 67% of the total area, while urban areas account for 22% of it. Note that the above information comes from land cover and soil texture maps that were first reclassified to the IGBP (International Geosphere-Biosphere Programme) and USDA (United States Department of Agriculture) systems, respectively, and resampled to 250 m to be in line with the spatial resolution used for distributed modelling. The locations of the flow gauging stations selected for this study are also shown in Figure 1, while their drainage areas can be seen in Table 1.

Meteorological Data
The simulation period for this study was 1 January 2007 to 31 December 2019, out of which 3 years were used for warming up the models (2007-2009), 5 years for calibration (2010-2014) and 5 years for validation (2015-2019). A daily modelling timestep was used. The meteorological inputs for the rainfall-runoff models consist of rainfall and potential evapotranspiration (PET).
Precipitation data were obtained for rain gauges in Flanders [24], Wallonia [25] and the Netherlands [26]. Areal rainfall was calculated using the Thiessen polygons methodology, but excluding stations with more than 50% missing data during the simulation period. Gaps in the remaining datasets were filled using data from nearby stations, starting from the closest ones and moving further when needed. The largest distance between any pair of donor-recipient stations that occurred during the infilling process was 25.7 km, while the smallest Pearson correlation coefficient was 0.73.
To better account for the spatial rainfall variability during the infilling process, rainfall data from the donor stations were "re-scaled" before being used for data infilling. This involved multiplying data from the donor station by a factor so that the cumulative rainfall volumes of the donor and recipient stations match during their overlapping complete data periods. Stations with less than 5 years of overlapping data with the recipient stations were not used for filling the data gaps. The resulting average annual rainfall at the catchment for the study period was found to be 726 mm. Regarding PET, data from the nearby station of Maastricht (The Netherlands) were used [26]. The dataset was calculated with the Makkink method [27] and does not include missing data for the simulation period. The calculated average annual PET for the study period was 618 mm.

Discharge Data
Discharge data were obtained from [24]. Out of the stations located in the catchment, 15 (including the one at the catchment outlet) were selected for this application ( Figure 1 and Table 1). The selection was performed based on data availability and quality. Measurements characterized as "poor" or "suspect" by the data providers were not used. Furthermore, stations with an average flow rate below 0.1 m 3 /s (during the simulation period) were not considered.
Apart from runoff related to rainfall, flows from anthropogenic activities are also present in the catchment. Among others, those involve water discharged into the rivers via the wastewater treatment plants (WWTPs) and industries not connected to such plants. Such data for Flanders were retrieved from the web portal Geoviews of the Flanders Environment Agency (VMM). Although industrial effluents can be considered completely anthropogenic, those from WWTPs may originate from storm water, wastewater and parasitic flows (water intruding the pipe system via cracks and misconnections).
To quantify the outflows from WWTPs related to wastewater only, a numerical filter was applied. First, the sum of wastewater and parasitic flows (dry weather flow) was separated from the total effluent flow for each WWTP as the moving-minimum with a window of 21 days. This separation process has been described by [28]. Subsequently, wastewater was considered to account for 50% of the resulting quantity, with the rest attributed to parasitic flows. This percentage is in line with values in literature. Based on a large-scale study on Flemish catchments, parasitic flows were found to account on average for 51% of the dry weather flow [29]. Moreover, an examination of 34 sewage systems in Germany, quantified storm water, sanitary sewage and parasitic flows as 35%, 30% and 35% of the total water treated by WWTPs, respectively [28]. Flows from the WWTPs related to wastewater (resulting from the separation process described above) and the industrial effluents were averaged for the simulation period and added to model results as constant fluxes. Finally, abstractions from the rivers and the groundwater aquifers were not assessed.

Lumped Conceptual Model
The lumped conceptual rainfall-runoff model used is based on the NAM (Nedbør-Afstrømnings-Model). For the NAM model equations the reader can refer to [21]. The model is built around three major storages. The surface storage represents the rainwater intercepted by vegetation, trapped in soil surface depressions and in the top, cultivated part of the ground. As we move deeper from the soil surface, we have the root zone (or lower zone) storage and the groundwater storage. Rainfall is applied to the surface storage, while evapotranspiration first removes water from the surface storage (U) at the potential rate (Ep) and subsequently (if the atmospheric demand has not been met) from the root zone storage: (1) where: EaL is evapotranspiration from the root zone storage, L denotes the water in the root zone storage and Lmax is the root zone storage capacity.
When the surface storage reaches its capacity (Umax), a part of the excess rainfall may contribute to overland flow and baseflow, while the remaining water is stored in the root zone storage. The parameter CQOF, termed as overland flow runoff coefficient, represents the fraction of excess rainfall that becomes overland flow if the root zone storage is full Equation (2). Interflow is generated from the surface storage via Equation (3).
where: Pn is the excess rainfall (remaining after the surface storage is filled), CKIF is the time constant for generating interflow, while TOF and TIF are the respective thresholds for water in the root zone storage below which contributions to overland flow and interflow are prohibited. Equation (3) can be re-written as follows: where: IFmax is a parameter expressing the maximum contribution to interflow during a timestep, being equivalent to Umax/CKIF in the original equation. Of course, QIF is not allowed to be larger than the water in the surface storage (U). This modified equation form is used here as it seems to be more convenient for parameter disaggregation. The original parameter CKIF is bound to the surface storage capacity (Umax) for expressing the maximum contribution to interflow, while the newly introduced parameter IFmax can be assessed independently. The contribution to baseflow (G) is quantified as follows: where: TG is the root zone storage threshold below which the contribution to baseflow is prohibited. For simplicity (and to mitigate equifinality), the thresholds TIF and TG are here handled as a single parameter assuming that the contributions to interflow and groundwater storage both occur after the soil moisture surpasses a threshold allowing water to move under gravitational forces. Moreover, TOF was set to zero. It is obvious that the above model structure links infiltration and evapotranspiration with the relative wetness in the root zone storage (L/Lmax). However, in impervious areas infiltration is limited regardless of the water found deeper in the soil. Moreover, such areas can limit the interaction between the atmosphere and the soil and, thus, evaporation from the soil may be reduced. To better reflect those phenomena, the following equations are formulated: where: IMP is impervious fraction of the catchment. This parameter can be estimated from maps on catchment imperviousness and, thus, is not calibrated in the frame of this study. Via the above simplified equations, excess rainfall in impervious areas becomes overland flow while the generation of interflow and root zone storage evapotranspiration are prohibited. Finally, for our model overland flow and interflow are each routed via a linear reservoir with time constant CKQF, while baseflow via a reservoir with time constant CKBF.
The model parameters are summarized in Table 2.

Distributed Conceptual Model
A spatial resolution of 250 m was used for the distributed rainfall-runoff model. Regarding the distributed model structure, we can distinguish between the runoff production scheme and the routing scheme. The runoff production scheme refers to the way each grid cell translates inputs (rainfall, potential evapotranspiration) to internal states and outputs. This is adopted from the lumped conceptual model. In other words, each cell operates as a small lumped model.
The routing scheme determines both the direction of flow (flow paths) and the way adjacent cells are linked to each other (cell-to-cell routing). In the current modelling framework, the flow paths follow topography. It should be noted that this assumption may not always be accurate. For instance, the groundwater heads, which determine the groundwater flow paths, may not be in agreement with the land surface elevation gradients.
Regarding the cell-to-cell routing, a simplified approach was used according to which each sub-flow component coming from upstream will contribute to the corresponding sub-flow component downstream. To mitigate the dependency between the model spatial and temporal resolutions, overland flow is allowed to travel more than one cell at a timestep. For this, the "accufractionstate" and "accufractionflux" PCRaster Python functions are used. Those functions make use of a "transport fraction", which dictates the part of the accumulated overland flow that remains within or escapes each cell as we move from upstream to downstream. In our case, the transport fraction is estimated per cell and timestep using the linear reservoir concept. The reservoir time constant is the model parameter CKQF, while the inputs for such calculation consist of the overland flow generated within a cell itself (known from the runoff production scheme) and the overland flow coming from upstream (which is approximated by the corresponding quantity for the previous timestep).
Baseflow and interflow contribute to the corresponding sub-flow components of the immediate downstream cells. It should be mentioned that the aforementioned spatial linking methodology is applied for routing flows up to the river. Each river cell then collects all flows coming from its upstream river cells in a single timestep. The same simple routing scheme for river cells was used by Tran et al. [19].

General Concept
The aim of a distributed model is to adequately reflect catchment heterogeneity via its parameter values at different locations. Parameter disaggregation aims at retrieving spatially distributed model parameters starting from a base parameter set. This is accomplished using links between the parameters and location-specific properties. Tran et al. [19] used the following form of disaggregation relationship: where: θg is the parameter at the location g, θb is the base model parameter, ρg is the selected property at the location g, ̅ is the average property value for the catchment, while F is a parameter that can be used for calibrating the relationship. Equation (9) assumes a linear relationship between model parameters and the a priori selected catchment properties (ρ), which may not always stand. An alternative form of disaggregation relationship is the following: where: c is an exponent used for calibrating the relationships. This way, non-linearity can, in principle, be assessed better. An important assumption of both Equations (9) and (10) is that the base model parameter values are the averages of the spatially variable parameters. Obviously, this may not be valid as the effect of each parameter on the model results (e.g., the total flow volumes) is not necessarily linear. To address this, a more general equation is used: (11) This equation implies that the average of θg d over the model domain equals θb d (for d = 1, Equation (10) is obtained). By raising both sides of Equation (11) to the power of d −1 , the following is derived: The base model parameter, θb, in the above equations is adapted from a lumped model calibrated at the catchment outlet. A point of attention is that the lumped model implicitly uses a source-to-sink routing scheme (simulated flow is directly routed to the catchment outlet), while the distributed model uses a cell-to-cell scheme (flow passes from grid cell to grid cell). To account for that difference, a preliminary step, termed as the "1st disaggregation step", has to be carried out before the main parameter disaggregation [19]. This consists of dividing the lumped model recession constants with the average distance (in number of cells) to the river. Subsequently, fine-tuning may take place to better match the shapes of the recessions between the lumped and base model results.

Input Maps
The basic map inputs for the current modelling framework are a digital elevation map (DEM), a land cover, and a soil texture map. Moreover, a map related to soil sealing was used to assess imperviousness in the catchment. Finally, the map of Flemish watercourses, obtained from the Flanders Information Agency (AGIV) [30], was used to recondition the DEM in order to have a more accurate representation of the river paths at the model resolution. Nevertheless, the non-reconditioned DEM was used for parameter disaggregation.
A DEM of 25 m resolution was obtained from the European Environment Agency [31], while a land cover map of 2010 with 30 m resolution was acquired from the GlobeLand30 dataset [32,33] and reclassified to the IGBP classification system. Reclassification was carried out considering the general meaning of the land cover classes as their definitions are not necessarily identical between the two systems. The soil texture map (in vector format) was obtained from the Database of the Subsoil of Flanders (DOV) [34]. It is based on field observations using soil augers to a depth of 1.25 m, unless it is impossible to penetrate that deep [35]. The soil texture classes were converted from the Belgian to the USDA system based on the average particle composition (gravimetric sand, silt and clay fractions) of each Belgian class. Such a correspondence is shown in [36]. Gaps were filled using a soil texture map at the depth of 60 cm with 250 m resolution (using the USDA classification system) obtained from the International Soil Reference Information Centre (ISRIC) [37,38]. To consider the inconsistencies between the maps, the ISRIC map was reclassified prior to filling the gaps. This was done by converting its classes to those of the main map they most frequently correspond to for the areas where both datasets are complete. The catchment imperviousness was assessed via a map of 2009 with 100 m resolution which was obtained from the European Environment Agency [39]. Finally, the maps were resampled to comply with the model resolution. The continuous spatial data (elevation, imperviousness) were resampled via averaging, while majority resampling was used for the categorical data (land cover, soil texture).

Linking Model Parameters to Catchment Properties
An important step of the disaggregation methodology is the identification of links between model parameters and location-specific properties (ρ in Equation (12)). Those properties can be estimated from the elevation gradient, land cover, soil texture and imperviousness of each grid cell employing lookup tables. For example, Rawls et al. [40] classified soil water properties by soil texture. The lookup tables used for this study are the same as those used for the WetSpa Extension hydrological model [41]. Tran et al. [19] disaggregated four NAM model parameters, namely the root zone storage capacity (Lmax), and the time constants CKQF, CKBF and CKIF. We now disaggregate more model parameters. Below the links used for this study are described. Due to its conceptual nature, the model lumps several hydrological processes in a small number of parameters. Thus, such links cannot typically be considered as unique. In other words, the selection of links is (to an extent) up to the judgement of the modeller.
The root zone storage capacity parameter (Lmax) was linked with amount of water in the root zone under saturation, which can be expressed as the product of the root depth and the soil porosity. The surface storage capacity parameter (Umax) was conceptually linked with the maximum amount of water that can freely evaporate (or else, evaporate at the potential rate) from the surface of the modelled system. Therefore, Umax was related to the sum of the interception storage capacity and the readily evaporable water from the soil (REW) which is estimated as follows [42]: where: FC is the field capacity, PWP is the permanent wilting point and Zsurf denotes a small depth in the soil surface which is set equal to 40 mm. Field capacity represents the soil water content below which downward movement of water is considerably reduced, acknowledging however that the quantification of the aforementioned "considerable" reduction can be ambiguous [43]. Equation (13) was then modified to account for the limitations posed on soil evaporation due to imperviousness. Moreover, a wetting loss was added for the impervious areas, which was set to 0.5 mm as in the WetSpa Extension model: Interception storage capacity is considered to be zero for the impervious part of the grid cells, while the pervious part of the urban cells is considered covered by grass. Water stored in soil depressions can also be considered part of the surface storage capacity (Umax). However, courser soils (e.g., sand) that have larger depression storage capacities may hold less water in their depressions than heavier soils (e.g., clay) due to larger hydraulic conductivity. This is why the depression storage capacity was not linked to the Umax parameter. The parameter CQOF model was related to the potential runoff coefficient (C) of the WetSpa Extension model, which is calculated based on slope, land cover and soil texture. Such a link was also used by [44]. It should be mentioned that this link may be suboptimal as the two parameters are not used the same way by their respective models. C is used in the WetSpa Extension model to express the maximum fraction of rainwater after interception losses that does not infiltrate (either becomes surface runoff or gets trapped into depressions). For NAM model, on the other hand, the CQOF parameter is applied on the rainfall that escapes the surface storage (which accounts for more losses than just interception) and expresses its maximum fraction to become surface runoff.
The parameter CQOF in impervious areas is overshadowed by the imperviousness fraction parameter IMP (Equation (7)). To address this, parameter disaggregation was handled as follows:  Each cell was assigned to the property value (ρ) it would have if it was pervious.  The quantity (see Equation (12)) was calculated as the weighted average with the pervious cell fractions as weights. The pervious urban areas are considered covered by grass. This approach does not assess the soil compaction in the pervious urban areas, which is expected to enhance the generation of surface runoff. For instance, Gregory et al. [45] performed measurements and reported that compacted soils can reduce infiltration rates from 70% to 99%. Considering such effects could improve the disaggregation of the CQOF parameter.
The parameters TIF and TG (relative root zone storage thresholds for contribution to interflow and baseflow, respectively), here handled as a single parameter, were linked to the ratio between field capacity and porosity. The recession constant for baseflow (CKBF) was linked to the reciprocal of the saturated hydraulic conductivity (1/Ksat). Such a link was also used by Tran et al. [19]. As already mentioned, the soil texture map does not account for depths larger than 1.25 m (at maximum). This can have implications for the parameter disaggregation. For instance, when the groundwater table is located well below this depth, the estimated saturated hydraulic conductivity (Ksat) may not be representative.
The maximum contribution to interflow (IFmax) from the surface storage was linked to the product between the saturated hydraulic conductivity and the grid cell slope. This is a simplified concept based on Darcy's law, with the slope representing the hydraulic gradient. Thus, the generation of interflow will be enhanced in cells with steep slopes and soils with large hydraulic conductivity (e.g., sand). Since interflow generation in impervious areas is prohibited (Equation (8)), the disaggregation of IFmax parameter is handled as discussed above for the CQOF parameter. Finally, the time constant for routing the quick flows (CKQF) was related to the time of concentration. Using the kinematic wave equation, the concentration time can be considered proportional to the quantity n 0.6 × s −0.3 , where n is the Manning coefficient and s is the slope [46]. Tran et al. [19] also used this link. Manning coefficient was calculated as the weighted average between the values of the impervious and pervious grid cell fractions. Impervious areas (regardless of land cover) were assigned to a value of 0.05 m −1/3 s, which is the value originally attributed by the lookup tables to the urban areas. Pervious urban areas were assigned with the Manning coefficient for grass, while pervious areas for the other land cover classes were attributed with the lookup tables' value for the corresponding class: where: npervious denotes the Manning coefficient for the pervious grid cell fraction. In Table  3, the aforementioned links are summarized. Moreover, the catchment characteristics (topography, land cover, soil texture, imperviousness) used to retrieve each property are noted.

Calibration
The models were calibrated using a combination of manual and automatic methods. For automatic calibration, the Shuffled Complex Evolution (SCE-UA) method [47,48] was used. The methodology starts by calibrating a lumped model at the catchment outlet (Zichem/Demer station). This was carried out using an integrated objective function built around Nash-Sutcliffe efficiency [49]: 16) where: N is the total number of timesteps for model calibration, Qm(t) is the simulated discharge at timestep t, Qo(t) is the observed discharge at timestep t and is the average observed discharge during the calibration period. The integrated objective function (minimized) was formulated as follows: where: NSEH,ranked and NSEL,ranked denote the Nash-Sutcliffe efficiency on the ranked nearly independent peak and low flows, respectively, which were retrieved using the Water Engineering Time Series PROcessing tool (WETSPRO) by [50]. WETSPRO employs a digital filter to separate the subflows (baseflow, interflow, overland flow) from a hydrograph based on the difference in the corresponding recession constants. Subsequently, it uses those (together with other criteria) to split the hydrograph in nearly independent quick/slow flow periods and extract the corresponding peak/low flows. NSEH,ranked and NSEL,ranked compare the observed and simulated peak/low flows after being ranked and, thus, do not take into account the timing of the events. Instead, those metrics were used in an attempt to come up with a model that better matches their empirical distributions. The last term of the objective function is used to limit the discrepancy between the observed and simulated flow volumes by applying a penalty if it surpasses a threshold. The impervious fraction parameter (IMP) was not included in calibration. Instead, it was set to 6.7% which is the average catchment imperviousness as calculated from the soil sealing map. The study area belongs to a flood-prone zone with several artificial retention storages [51]. During large storms, the river may overflow its banks and the reported discharge values may underestimate the actual flows. To account for that, the recession constant for routing the quick flows (CKQF) obtained from automatic calibration was manually adjusted. The aim was to improve the overall fit of the empirical distribution related to the observed peak flows, while being more lenient with overestimations related to the flows for the highest return periods. The parameter ranges used for calibration and the final parameter values for the lumped model are shown in Table 4. For benchmarking purposes, lumped models were automatically calibrated for the internal gauging stations using NSE as the objective function. Lumped modes were not calibrated for the Zonhoven/Laambeek and Zoutleeuw/Dormaalbeek stations due to the lack of sufficient data for the calibration period.
The next step involves the parameterization of the distributed model to be used as the base for disaggregation. First of all, the soil sealing map was directly used as the IMP parameter. In other words, each grid cell was attributed with its actual impervious fraction without further calibration. Subsequently, the 1st disaggregation step was performed by adjusting the lumped model values for the recession constants (CKQF, CKBF) to account for the differences in the routing scheme between the lumped and the distributed models. The remaining parameters were directly adopted from the lumped model.
Finally, a distributed model with spatially variable parameters is derived by disaggregating the model parameters of the base distributed model. This involved all model parameters apart from IMP (which was already spatially variable based on the imperviousness map). Initially, the exponents d (see Equation (12)) were assessed. It was noticed that the disaggregation of Umax parameter can lead to a substantial added bias in terms of flow volumes at the catchment outlet in comparison with the base model. Its effect on the flow volumes was investigated by running the lumped model for the catchment outlet for different Umax values while keeping the other parameters fixed. The fitted power-law can be seen in Figure 2. Its coefficient of determination (R 2 , Equation (18)) is 0.96 and it has an exponent of −0.103. Therefore, the disaggregation exponent d for Umax was set to −0.103. For simplicity, the corresponding exponents for the remaining parameters were set to 1.0. Subsequently, the exponent c (Equation (12)) for each parameter to be disaggregated was automatically calibrated. The average NSE value for the gauging stations was used as the objective function. Stations with insufficient data for the calibration period (Zonhoven/Laambeek, Zoutleeuw/Dormaalbeek), while also stations for which the corresponding lumped model noted NSE less than 0.50 for the simulation period were not used for calibrating the disaggregated model. The calibrated values for the disaggregation exponent c can be overviewed in Table 4, while the corresponding parameter maps can be seen in Figure 3. The calibration boundaries for the exponents were determined so that the disaggregated model parameters are within logical ranges, but at the same time being more lenient compared to the lumped models in order to account for the spatial variability in catchment properties. For example, a small number of grid cells with slopes close to zero (accounting for less than 0.1% of the catchment) were allowed take CKQF values larger than the upper limit of 100 hours used for the lumped model. From Figure 3 it can be noticed that the CQOF parameter values practically remain uniform even after disaggregation. Possible reasons for this have already been discussed previously in the paper.

Evaluation
Apart from NSE and R 2 , the Kling-Gupta efficiency (KGE) goodness-of-fit measure [52] was also used as a general metric for evaluating the model results: where: r is the Pearson correlation coefficient between the observed and simulated flows, σ denotes the standard deviation and μ the mean. To give more weight to the low flows, NSE on log-transformed flows (NSE_log) was used. To avoid potential issues with zero values, a small constant ε equal to one-hundredth of the average observed flow was added to the flows before working with logarithmic expressions (Equation (20)). Such value for the constant was also suggested by [53]. Finally, percent bias (PBIAS) and absolute percent bias (APBIAS) were used to evaluate the model performance in terms of flow volumes (Equations (21) and (22), respectively).

Catchment Outlet
In Table 5, the performance metrics at the catchment outlet (Zichem/Demer station) for the base and disaggregated distributed models are shown. The base and disaggregated models perform closely to each other with respective NSE for the entire simulation period (excluding warming up) is 0.70 for both models. This value can be considered acceptable. R 2 is 0.74, while KGE is 0.83 and 0.81 for the base and disaggregated models, respectively. In Figures 4 and 5, the observed and simulated hydrographs, cumulative flow volumes and empirical distributions of the peak/low flows (retrieved using WETSPRO) are presented for the calibration and validation periods, respectively. Despite the acceptable model performance in terms of NSE, KGE and R 2 , a visual inspection of the hydrographs also reveals periods with unsatisfactory agreement between observations and model results. For instance, flow overestimations can be observed for the second half of 2016. Moreover, the simulated hydrograph recessions during the validation period (e.g., summer 2018) are milder compared to those observed ( Figure 5). This could be due to an overestimation of the recession constant for baseflow (CKBF). A solution could be to narrow this parameter's range for calibration. The estimation of flows from anthropogenic activities into the rivers (and the related uncertainty) is also expected to have a considerable effect on the parameters related to the slow flows, such as CKBF. Finally, different objective functions for calibration could also be tested, depending always on the goal of each modelling task.  PBIAS is 10% and 12% for the base and disaggregated models, respectively, when the entire modeling period is considered. During the calibration period, overestimations of the observed flow volumes are mainly noticed after 2013, while the discrepancies are much smaller before ( Figure 4). With respect to the validation period ( Figure 5), a large part of the overall deviation is due to the overestimations during the second half of 2016. Regarding NSE_log metric (more sensitive to low flows), the base model scores 0.65, while the disaggregated one 0.68 for the entire simulation period. The empirical distribution of the observed low flows is generally better followed by the models for the calibration than the validation period (Figures 4 and 5), with the two models performing closely to each other.
From Figures 4 and 5, it can be seen that empirical distribution of the observed peak flows are followed to a good extent by the models. Nevertheless, proclaimed differences are the overestimations noted for the largest return periods. For the calibration period, the maximum simulated flows are 100.1 and 98.9 m 3 /s for the base and disaggregated model respectively, while the maximum observed flow is 59.3 m 3 /s. For the validation period, the maximum simulated flows are 87.5 m 3 /s and 86.1 m 3 /s for the base and disaggregated model respectively, while the maximum observed flow is 54.3 m 3 /s. As already discussed, those could be due to flooding. Coupling of the rainfall-runoff model with a river model, could probably mitigate those differences.

Internal Gauging Stations
In Table 6, the NSE metric for each internal gauging station and modelling configuration (base model, disaggregated model, lumped models) is shown for the entire simulation period. Figure 6 presents those as a plot. The disaggregated model performs better than the base model for 12 out of 14 internal stations, while the average NSE is 0.28 and 0.39 for the base and disaggregated models, respectively.  The largest differences between the models are noted for Diepenbeek/Stiemer, Lummen/Zwartebeek, Wellen/Herk, Wimmertingen/Mombeek and Zoutleeuw/Dormaalbeek stations. Those are stations for which the base model performs poorly (NSE ranging between −0.37 and 0.27). Apart from Lummen/Zwartebeek, they are found relatively far from the catchment outlet, while their upstream areas have a substantially different soil texture composition when compared to the entire catchment. The areas upstream of Diepenbeek/Stiemer and Lummen/Zwartebeek are dominated by sand (81 and 62%, respectively), while silt is the dominant soil texture upstream of Wellen/Herk, Wimmertingen/Mombeek and Zoutleeuw/Dormaalbeek stations (98%, 75% and 99%, respectively). Moreover, the imperviousness fraction upstream of Diepenbeek/Stiemer is the highest among all stations (18%). The above can partially explain why the base model parameterization does not sufficiently express the rainfallrunoff dynamics for these areas.
The disaggregated model NSE for the aforementioned stations ranges between 0.16 and 0.48. Despite the performance difference with the base model, there is still substantial room for improvement. In Figure 7, the observed and simulated hydrographs at Diepenbeek/Stiemer, Halen/Gete and Wimmertingen/Mombeek stations for 2014 can be seen. While the disaggregated model is generally closer to the observations at Diepenbeek/Stiemer and Wimmertingen/Mombeek when compared to the base model, it is not necessarily reliable as the differences between the simulated hydrographs are often small. Regarding Halen/Gete, periods when the base model is closer to the observations can also be seen. Using distributed rainfall inputs (e.g., radar data) could enhance model performance. Moreover, it is possible that more detailed input maps on catchment characteristics (e.g., land cover) could trigger further improvement. At the same time, the respective NSE values for the lumped models calibrated for Lummen/Zwartebeek and Wimmertingen/Mombeek are 0.43 and 0.34, being among the lowest for this modelling configuration. This may indicate that the conceptual model structure is insufficient for expressing dynamics for those areas (e.g., due to anthropogenic regulations).  Table 7, the NSE, KGE, NSE_log and APBIAS metrics averaged for the internal gauging stations where lumped models are available (thus, excluding Zonhoven/Laambeek and Zoutleeuw/Dormaalbeek stations) are shown. As expected, the base model is the worst-performing one in terms of average NSE for the internal stations, with 0.32 for the entire simulation period. At the same time, NSE is 0.44 for the disaggregated model and 0.59 for the lumped models. In other words, the disaggregated model manages to bridge about 44% of the total performance difference between the base distributed model and the lumped models. KGE is 0.57, 0.61 and 0.72 for the base, disaggregated and lumped models, respectively. When all internal stations are considered, the disaggregated model performs better than the base model for 12 out of the 14 internal stations (albeit the differences are sometimes not considerable). This is illustrated in Figure 8. The average NSE_log values are −0.01, 0.19 and 0.33 for the base, disaggregated and lumped models, respectively. The disaggregated model outperforms the base model for ten internal stations ( Figure 8). Finally, the respective APBIAS metric values are 25%, 19% and 12% for the base, disaggregated and lumped models. The disaggregated model performs better than the base model for 10 gauging stations. In Figure 9, the above are summarized in the form of boxplots. Overall, it can be seen that the base model is the worst performing for all considered metrics, while the lumped models are (as expected) the best performing ones. At the same time, the disaggregated model manages to cover a substantial part of the performance difference between the other modelling configurations.

Conclusions
Tran et al. [19] developed a simple top-down approach for distributed rainfall-runoff modelling, which essentially consists of disaggregating selected parameters of a lumped conceptual model based on their links with catchment properties. With the current work, we further develop the disaggregation methodology and extend it for more model parameters. Furthermore, we test the distributed rainfall-runoff modelling approach on a catchment in Belgium involving 15 gauging stations.
Initially, a lumped conceptual model is calibrated using measurements at the catchment outlet. The lumped model is used to derive a base distributed model with spatially uniform parameters. An exception to that is the catchment imperviousness which is already assessed in a spatially distributed manner using an imperviousness map. Subsequently, parameter disaggregation is carried out to come up with spatially variable model parameters. Lumped models for the internal gauging stations are also calibrated and used as a benchmark for model evaluation.
The disaggregated model manages to maintain a similar performance to the base distributed model at the catchment outlet. At the same time, it was shown to outperform the base model for the majority of internal gauging stations with respect to the metrics used for model evaluation. While the lumped models were shown to be the best performing ones on average, the disaggregated model managed to bridge a substantial performance difference between the base model and the lumped models. Specifically for NSE, it covered about 44% of the difference between the other modelling configurations.
Those indicate the potential of the approach on expressing a part of the spatially heterogeneous catchment characteristics in the form of model parameters.
Despite the improvement when compared to the base model, it was shown that the disaggregated model performance was not necessarily satisfying with respect to simulating flow dynamics at the internal gauging stations. The differences between the hydrographs simulated by the two models were often not considerable. Further refinement of the disaggregation methodology, coupling with a river model, using distributed meteorological inputs, while also accounting for local effects such as groundwater abstractions could enhance model accuracy.
The parameter disaggregation approach poses as another small step towards the regionalization of conceptual model parameters. Eventually, this could be beneficial for modelling flow dynamics in ungauged catchments or catchments characterized by the scarcity of good quality data. Testing the disaggregation concept with other rainfall-runoff models while also for catchments with different climate and characteristics would be a step further. Finally, there is also potential on coupling the disaggregated rainfall-runoff models with other model types. Such a coupling with MODFLOW, a physically-based groundwater flow model [54], was already demonstrated by [55]. Similar tests with other model types (soil moisture, urban runoff models etc.) could also be performed.