Seamless Integration of Rainfall Spatial Variability and a Conceptual Hydrological Model

Rainfall is an important input to conceptual hydrological models, and its accuracy would have a considerable effect on that of the model simulations. However, traditional conceptual rainfall-runoff models commonly use catchment-average rainfall as inputs without recognizing its spatial variability. To solve this, a seamless integration framework that couples rainfall spatial variability with a conceptual rainfall-runoff model, named the statistical rainfall-runoff (SRR) model, is built in this study. In the SRR model, the exponential difference distribution (EDD) is proposed to describe the spatial variability of rainfall for traditional rain gauging stations. The EDD is then incorporated into the vertically mixed runoff (VMR) model to estimate the statistical runoff component. Then, the stochastic differential equation is adopted to deal with the flow routing under stochastic inflow. To test the performance, the SRR model is then calibrated and validated in a Chinese catchment. The results indicate that the EDD performs well in describing rainfall spatial variability, and that the SRR model is superior to the Xinanjiang model because it provides more accurate mean simulations. The seamless integration framework considering rainfall spatial variability can help build a more reasonable statistical rainfall-runoff model.


Introduction
Accurate simulation of the rainfall-runoff process is of great importance to runoff simulation, flood prediction, and water resources optimization [1][2][3][4]. Among various types of rainfall-runoff models, conceptual models are widely used due to their simple structure and few data requirements [5][6][7][8][9]. Rainfall, as one of the key inputs to conceptual rainfall-runoff models, is extremely vital to the accuracy of the model outputs [10][11][12][13][14][15]. Nowadays, rain gauges are the most traditional and direct way to obtain reliable rainfall data at a fine temporal resolution. Traditional conceptual rainfall-runoff models commonly use catchment-average rainfall as inputs [16]; however, rainfall is usually biased due to the spatial variability and low density of unevenly distributed rain gauging stations, especially in developing areas [15,[17][18][19][20][21]. Therefore, finding an appropriate way to describe rainfall variability for traditional rain gauging stations in conceptual rainfall-runoff models remains an unresolved issue. Recently, the study of empirical functions has provided an ingenious way to characterize the distribution of rainfall variability using a few parameters [22]. Several probability density functions (pdfs) have been commonly used to describe the variability of rainfall over the past few decades, such as gamma distribution, Pearson type III distribution, exponential distribution, Weibull distribution, and lognormal distribution [23][24][25][26]. Among these distributions, the gamma distribution is probably the most frequently used one [24,27].
Although the importance of rainfall spatial variability in runoff calculations has been acknowledged in many studies [12,[28][29][30], there is far too little research investigating the integration of rainfall variability and conceptual rainfall-runoff models. Therefore, it is urgent to find an appropriate rainfall distribution and a conceptual rainfall-runoff model to better simulate the flow. In this case, the gamma distribution is inappropriate due to its complex mathematical expression. Recently, a generalized exponential distribution (GED) was proposed and studied as an alternative to the most widely used distributions (e.g., gamma, Weibull, and lognormal distributions) [31][32][33][34][35]. The tractable exponential function, which is characterized by calculative simplicity [36,37], provides the possibility of integrating rainfall spatial variability with a conceptual rainfall-runoff model.
Typically, there are two underlying runoff generation mechanisms in conceptual rainfall-runoff models [38], i.e., the infiltration excess (Horton) mechanism [39] and saturation excess (Dunne) mechanism [40]. However, most of the models are developed based on a single mechanism, such as the Xinanjiang (XAJ) model [41], TANK model [42], Hydrological MODel (HYMOD) [43], Sacramento model [44], and Shanbei model [45]. While, in the real world, the two types of runoff processes are usually intertwined due to the heterogeneity of rainfall and the complexity of underlying surface conditions [46,47]. Therefore, the mixed runoff that couples both infiltration excess and saturation excess mechanisms needs to be studied [46,[48][49][50]. For this purpose, the vertically mixed runoff (VMR) model was developed by [48,51] and has been successfully applied to several catchments in China [13,38,46]. The VMR model assumes that the two runoff generation mechanisms exist simultaneously in the vertical direction at the same time where the Horton runoff occurs when rainfall intensity is greater than the infiltration capacity, and the Dunne runoff occurs when the soil moisture capacity is satisfied [52]. The spatial variability of soil infiltration capacity and soil moisture capacity is also included in the VMR model.
In conclusion, our key objectives are (1) developing an appropriate distribution to describe the spatial variability of rainfall for traditional rain gauging stations; (2) building a statistical rainfall-runoff model that considers rainfall spatial variability in a conceptual rainfall-runoff model where the two runoff generation mechanisms are considered. Therefore, a transformation of the GED is incorporated into the VMR conceptual rainfall-runoff model to develop a seamless integration framework named the statistical rainfall-runoff (SRR) model. In the SRR model, the uncertainty of rainfall, soil infiltration capacity, and soil moisture capacity (from the VMR model) will be propagated to the simulated runoff. In that case, the deterministic flow routing approaches are no longer applicable. To solve this problem, a stochastic differential equation, which is useful for dealing with the uncertain effects of stochastic input processes [53][54][55][56], is applied to the flow routing process in the SRR model.
The remaining sections are organized as follows. In the Section 2, the framework of the SRR model is described in three main parts: rainfall spatial variability description, statistical runoff calculation, and stochastic flow routing calculation. Section 3 gives a brief introduction to the study area and data used. Hourly rainfall, evaporation, and streamflow data of 16 historical flood events in a Chinese catchment are collected for model calibration and validation. Two metrics evaluating the performances of the rainfall distribution and the SRR model are presented as well. The fitness of the rainfall distribution and the SRR model is then assessed in Section 4, followed by a summary of this study in Section 5.

Rainfall Spatial Variability Description
The GED can be used to substitute for traditional rainfall distributions (such as gamma and Weibull distributions) [31][32][33][34][35]. The pdf of this GED is shown below: where α is the shape parameter, α > 0; λ is the scale parameter, λ > 0. For α ≤ 1, the pdf is a decreasing function while for α > 1, it is a unimodal, skewed, right-tailed function similar to the gamma or Weibull density function [35]. However, the variability of the shape parameter makes it difficult to integrate the pdf with rainfall-runoff models where integrals and derivatives need to be accounted for. Based on the idea that rainfall distribution across a catchment is usually positively (or right) skewed [57][58][59][60][61][62][63][64], the parameter α in the GED is fixed to 2 to reduce the complexity of fusion in this study. More elasticity is then given to the powers of the exponential functions to form the exponential difference distribution (EDD) in this study. The pdf of the EDD is shown below: where P is the rainfall that occurred across the catchment; λ 1 and λ 2 are parameters of the pdf and satisfy λ 2 > λ 1 > 0; c is defined as c = λ 1 λ 2 . When λ 2 = 2λ 1 , it will be the same as the particular case of Equation (1) where its shape parameter is fixed to 2.
The exceedance distribution and mean of the EDD are presented below: The least squares method can be used to estimate the parameters of the EDD by minimizing errors between the theoretical distribution (the EDD) and an empirical distribution (Equation (5)) [34,65]. An unbiased empirical distribution estimator that has been widely used for rainfall [31,35,66] is presented below: where j is the ranking number of data in a nonincreasing sequence and n is the rainfall sample size. Therefore, the empirical distribution describes the exceedance probability of a given rainfall amount, which is consistent with the exceedance distribution of the EDD (Equation (3)).

Statistical Runoff Calculation
Runoff is estimated by combining the rainfall pdf (EDD) with the VMR model. The VMR model is a conceptual rainfall-runoff model that has been widely used in China since 1997 [13,48,51,67]. This model is developed from the XAJ model [41] and postulates that two runoff generation mechanisms (i.e., infiltration excess mechanism and saturation excess mechanism) exist simultaneously in the vertical direction (see Figure 1); thus, it can be applied to both humid and arid areas theoretically. In the VMR model, rainfall will be divided into surface runoff (RS) and infiltration (I) when rainfall intensity is greater than the infiltration rate ( Figure 1a). The downward infiltration is then used for replenishing the soil moisture capacity before groundwater runoff (RG) occurs ( Figure 1b). More details of the VMR model can be found in [13,51,67]. . P is the net rainfall (or effective rainfall), I is the infiltration, I c is the infiltration capacity, I cmax is the maximum infiltration capacity within the catchment, S c is the soil moisture capacity, S cmax is the maximum soil moisture capacity within the catchment, W 0 is the initial soil moisture content, ∆W is the changes in the soil moisture content, and S c0 is the soil moisture capacity corresponding to W 0 .
Surface runoff occurs when the net rainfall intensity (P i ) is greater than the infiltration rate (I i ) according to the infiltration excess (Horton) runoff generation mechanism [39]. Therefore, the statistical distribution of surface runoff (RS) in the SRR model is obtained by combining the pdfs of rainfall and infiltration capacity: where F(RS) represents the distribution function of surface runoff and f (I i , P i ) represents the joint distribution density of dualistic variables: infiltration rate and rainfall intensity. Considering the complexity in soil structure and texture, the soil infiltration capacity varies across a catchment. The infiltration capacity distribution curve ( Figure 1a) [68,69] uses an empirical formulation without the need to detect local soil moisture capacity, which is usually unavailable due to limited in-situ measurements: where I c is the infiltration capacity, F(I c ) is the distribution of the infiltration capacity, I cmax is the maximum infiltration capacity within the catchment, BF is a parameter that reflects the unevenly distributed infiltration capacity, and I cm represents the catchment-average infiltration capacity. Thus, the pdf of the infiltration capacity is derived: The improved Green-Ampt infiltration curve is then applied to help estimate the I cm [51,70]: where I s is the stable infiltration rate, KF is the osmotic coefficient, showing the sensitivity coefficient of the influence of soil moisture on the infiltration rate, S cm is the catchmentaverage soil moisture capacity, and W refers to the soil moisture content. Assuming variables I i and P i are independent, integration in math is then applied to Equation (6) to derive the distribution of the surface runoff: with: where P min is the minimum rainfall within the catchment at a given time and r, r 1 , and r 2 are coefficients independent of RS that can be solved by numerical integration methods using MATLAB (Matrix Laboratory, USA) or R (The R Programming Language, New Zealand). Then, the expected value and variance of the surface runoff can be further derived: where: P max is the maximum rainfall within the catchment at a given time.
Groundwater runoff occurs when the soil moisture capacity is filled up according to the saturation excess (Dunne) mechanism [40]. Different from the case of surface runoff estimation, the expected value of groundwater runoff is considered because of the complexity of the joint probability distribution of infiltration and soil moisture capacity. The spatial variability of soil moisture capacity is then considered using the soil moisture capacity distribution curve ( Figure 1b): where S c is the soil moisture capacity, F(S c ) is the distribution of soil moisture capacity, S cmax is the maximum soil moisture capacity within the catchment, and β is a parameter describing the degree of the spatial variability of soil moisture capacity. The smaller the value of β( β ≥ 0), the more uniform the distribution of the soil moisture capacity. Under this condition, the mean groundwater runoff (RG) is estimated [41,67]: where: E(I) is the mean infiltration, E(I) = E(P) − E(RS), W max is the maximum catchment-average soil moisture content, W 0 is the initial soil moisture content, and S c0 is the soil moisture capacity corresponding to W 0 .
In conclusion, the surface runoff is expressed as a probabilistic function, while groundwater runoff is presented as a mean value. Therefore, the total runoff is obtained through the sum of information from the surface runoff and groundwater runoff. Uncertainties from the spatial variability of rainfall, infiltration capacity, and soil moisture capacity are transferred to the total runoff (TR). The first two moments (i.e., mean and variance) are adopted to estimate the statistical runoff yield component (i.e., TR):

Stochastic Flow Routing Calculation
To cope with the watershed flow routing under the condition of stochastic inflow, stochastic differential equation theory [56,71] is used in this study. The study catchment is simplified to a linear reservoir without lateral inflow, the inflow of the reservoir is the total runoff with uncertainties, and the outflow of the reservoir is the flood simulations. The following set of equations for the catchment routing system is built based on the water balance equation [56,72]: where Q I (t) is the stochastic inflow, m 3 where the constant coefficient (3.6) is the unit conversion coefficient, and F is the area of the study catchment, km 2 .
To simplify the calculations, we assume that the stochastic inflow process Q I (t) can be expressed as Gaussian white noise superimposed on a deterministic mean [54]: where Q I (t) is the mean of the stochastic inflow process; Gaussian white noise is expressed as ω(t) = dB(t)/dt, and B(t) is the Wiener process.
Simultaneously solving Equations (20) and (22), the stochastic differential equation of this flow routing component is obtained: It is important to pay attention to the moments (usually the first two moments, i.e., the mean and variance) of a stochastic process. Therefore, the numerical discretization method is adopted to solve this stochastic differential equation: Once the initial outflow is determined, the expected value and variance of the outflow can be estimated using Equation (24).
The statistical runoff component coupled with the stochastic routing component compose the seamless integration framework named the SRR model. Parameters calibrated in the SRR model are described in Table 1. Table 1. The description and ranges of the parameters calibrated in the statistical rainfall-runoff (SRR) model.

Parameter
Description Ranges

KC
The ratio of potential evapotranspiration to pan evaporation 0.5-1.5 ImA The impermeable area ratio 0-1 BF The exponent of the infiltration capacity distribution curve 0-0.

Study Area and Data
The methodologies described above are applied to the Huangnizhuang catchment located in the Huaihe River Basin in China (see Figure 2). The study catchment approximately covers an area of 805 km 2 . The mean annual temperature is 11 • -16 • and the mean annual precipitation is 1077 mm. However, storms usually occur between June and September (summer in China) and account for 50-80% of the total annual rainfall amount [73,74]. Most floods in this region are caused by rainstorms with characteristics of high intensity, short duration, small rainstorm center range, and high peak discharge.
There are seven rain gauging stations and a hydrological station located in the study area. To increase the number of rainfall samples for rainfall distribution estimation, spatial interpolation methods, such as the inverse distance weighting (IDW) technique, polynomial interpolation, and Kriging interpolation, can be used [75][76][77][78]. Therefore, two rain gauging stations near the catchment are also considered for better interpolation. Hourly rainfall and discharge data of 16 historical flood events that occurred between 1983 and 2008 are collected for the assessment of the SRR model. The Kolmogorov-Smirnov (KS) test is a widely used goodness-of-fit test that can help us better understand how accurately the EDD describes the observed rainfall [24]. The KS test quantifies the differences between empirical and hypothetical cumulative distribution functions [79,80]. The mathematical description of the KS test is shown below: where F n (P) is the empirical distribution function, F (P) equals 1 minus the exceedance distribution function F(P) shown in Equation (3) and D (also called the KS statistic) is the maximum difference between the empirical distribution and the EDD. The null hypothesis of the KS test is that the observed rainfall samples follow the hypothetical distribution. Usually, the KS statistic (D value) is compared with a given KS value, but the acceptable KS value is usually variable and arbitrary. Therefore, the confidence in accepting or rejecting the null hypothesis is measured by the p-value, which factors the number of samples into the calculation of its value. For a given significance level, a p-value larger than the significance level indicates that the null hypothesis cannot be rejected, which means it is acceptable to use the hypothetical distribution function to represent the rainfall across the catchment, and vice versa [24].
The Nash-Sutcliffe efficiency (NSE) [81] has been one of the most widely used metrics for model calibration and evaluation in hydrology [82]. The NSE describes how closely the simulated flow matches the observed flow: where Q obs is the observed flow, m 3 /s; Q sim is the simulated flow, m 3 /s; and Q obs is the mean of the observations, m 3 /s. The range of the NSE is between −∞ and 1, and the closer the NSE value approaches 1, the better the model simulation [83].

Fitness of the Rainfall Distribution
To guarantee the estimation accuracy of the pdf, the inverse distance weighting (IDW) technique is used to increase the rainfall samples across the study area. The mean of these rainfall samples is taken as the approximate expected value of the EDD, i.e., E(P). For a given time, the parameters in the EDD are estimated based on all the rainfall samples within the catchment using the least squares method. When no rainfall occurs, the surface runoff generated at this time is set to be zero without generating the rainfall pdf. Parameters of the EDD vary with rainfall amounts and rainfall spatial distribution. In total, hourly data from 16 rainfall events (see Table 2) that occurred in the Huangnizhuang catchment from 1983 to 2008 are collected. The KS test is then applied to assess the accuracy of the EDD in describing the observed rainfall. For comparison, the most widely used gamma distribution is also conducted for rainfall description. Similarly, the accuracy of the gamma distribution is then tested by the KS test. Table 2 presents the KS test results of both the EDD and gamma distribution at the 0.05 significance level.
In the KS test, the null hypothesis assumes that the observed data follow the theoretical distribution. 'H = 0' represents the acceptance of the null hypothesis, while 'H = 1' represents the rejection of the null hypothesis. As can be seen from Table 2, the mean p-values for all the rainfall events described by both the EDD and gamma distribution cases are larger than the significance level 0.05, which indicates the general acceptance of the null hypothesis. For the EDD case, the mean proportion of 'H = 0' is 71.9% and that of 'H = 1' is 28.1%; for the gamma distribution case, the mean proportion of 'H = 0' is 75.3% and that of 'H = 1' is 24.7%. The results indicate that at the 0.05 significance level, the performances of the EDD and gamma distribution are quite comparable. Considering the possibility of the integration of the rainfall distribution and the conceptual rainfall-runoff models, the EDD is used in this study. Figure 3 presents examples of the fitness of the probability density function (pdf) and cumulative distribution function (CDF) of the EDD for the rainfall event that occurred on 02/09/2005.

Model Calibration and Validation
The estimated EDD is combined with the VMR conceptual rainfall-runoff model to develop the SRR model. The stochastic differential equation is adopted for the flow routing. The SRR model is then calibrated in the Huangnizhuang catchment by maximizing the NSE metric using the automatic global optimization method-Shuffled Complex Evolution algorithm (SCEUA) [84,85]. Readers who are interested in the SRR model can follow the methodology section but use their own catchment data instead. Table 3 presents the calibration and validation results of the SRR model for the 16 flood events in the Huangnizhuang catchment, including the NSE and the absolute value of flood volume relative error. Moreover, the flood peak is the focus since it is one of the most important characteristics of a hydrograph [3]. Therefore, the observed peak flow, simulated peak flow, and the absolute value of flood peak relative error are also considered in the model assessment.
As can be seen from Table 3, all the NSE values of both calibration and validation periods are larger than 0.70, ranging from 0.71 to 0.92, with a mean of 0.82, which indicates good performance of the SRR model. For the calibration period, the average absolute value of the flood peak relative error is approximately 10.9%, ranging from 0.2% to 22.4%; the average absolute value of the flood volume relative error is approximately 11.9%, ranging from 1.6% to 31.5%. For the validation period, the average absolute value of the flood peak relative error is approximately 12.4%, ranging from 9.2% to 14.5%; the average absolute value of the flood volume relative error is approximately 11.5%, ranging from 4.8% to 20.3%. Figure 4 presents examples of the fitness of the observed flow and simulated mean flow chosen from the calibration period (10/05/1989) and validation period (10/07/2005), respectively. Overall, these results indicate that the SRR model has the potential for flood simulation as far as the expected values are concerned. Further study will delve into the probability forecast of the SRR framework when its variance is considered.

Comparison between the SRR Model and the XAJ Model
The XAJ model, a lumped conceptual rainfall-runoff model that has been used worldwide [86][87][88][89][90], is used in this study for comparison with the SRR model. The XAJ model is based on a single runoff generation mechanism-the saturation excess mechanism. The soil moisture capacity distribution curve is used to describe the spatial variability of catchment soil moisture capacity. Similarly, the XAJ model is calibrated using the SCEUA to maximize the NSE metric. The observed peak flow, simulated peak flow, absolute value of the flood peak relative error, absolute value of the flood volume relative error, and NSE are all listed in Table 4.
The results in Table 4 indicate that the XAJ model performs well in the Huangnizhuang catchment. The mean NSE in both calibration and validation periods is 0.84. For the calibration period, the mean absolute value of the flood peak relative error is approximately 15.9%; the average absolute value of the flood volume relative error is approximately 14.9%. For the validation period, the mean absolute value of the flood peak relative error is approximately 34.4%; the average absolute value of the flood volume relative error is approximately 16.7%.
To better understand the performance of the SRR model against that of the XAJ model, a coefficient of relative effectiveness (CEF) is used [91]. The CEF is described below: where Q M1 is the simulations from Model 1, m 3 /s, while Q M2 is the simulations from Model 2, m 3 /s. A positive CEF indicates a closer match between the results of Model 1 and the observations, whereas a negative CEF indicates a closer match between the results of Model 2 and the observations. In this study, the SRR model is regarded as Model 1, while the XAJ model is regarded as Model 2. The calculated CEF is 0.49 which indicates that the results of the SRR model are superior to those of the XAJ model for the flood events in the Huangnizhuang catchment.

Conclusions
This study aims to build a seamless integration framework that considers rainfall spatial variability in a conceptual rainfall-runoff model, which may contribute to more reasonable statistical rainfall-runoff modeling. For this purpose, the SRR model is built. The performance of the SRR model is then assessed using 16 flood events that occurred in a Chinese catchment. The main conclusions are summarized as follows: (1) A transformation form of the generalized exponential function, i.e., the EDD, is proposed to describe the spatial variability of rainfall. The unimodal, skewed, and right-tailed EDD indicates that small rainfall has a relatively high probability over the catchment and vice versa. The exponential type provides the ability to incorporate the EDD into a conceptual rainfall-runoff model. The IDW approach is then applied to obtain more rainfall samples, based on which, parameters of the rainfall distribution (the EDD) are estimated using the least squares method. (2) The VMR model is a lumped conceptual rainfall-runoff model considering both the infiltration excess and saturation excess runoff generation mechanisms. Additionally, the spatial variability of soil infiltration capacity and soil moisture capacity is described using empirical distributions in the VMR model. The EDD is then coupled with the VMR model to estimate the statistical runoff component. Specifically, the distribution of surface runoff is deduced by the joint distribution of rainfall and soil infiltration capacity according to the infiltration excess mechanism, while the expected value of groundwater runoff is estimated based on the saturation excess mechanism. Considering the complexity of the distribution, the total runoff can be described by the mean and variance where uncertainties in the spatial variability of rainfall, infiltration capacity, and soil moisture capacity are included. (3) To address the flow routing under the condition of stochastic inflow, the stochastic differential equation is adopted. The study catchment is assumed to be a linear reservoir whose input is the estimated total runoff, while the output is the simulated streamflow. The first two moments of the simulated streamflow can be given while the mean is the focus in this study. (4) The SRR model is calibrated and verified by 16 flood events that occurred in the Huangnizhuang catchment of China. The fitness of the EDD and observed rainfall is assessed by the KS test. The gamma distribution is also adopted for comparison. The KS test results indicate the good performance of the EDD, which is comparable to that of the gamma distribution. In addition, the SRR model is compared with the XAJ model based on metrics of peak flow and NSE. The results show the advantage of the SRR model when its expected values are considered. Further study will elaborate the probability forecast of the SRR framework when its variance is considered. Moreover, catchments with various hydro-climatic characteristics will be investigated.