Performance Assessment of Hydrological Models Considering Acceptable Forecast Error Threshold

Assessing a hydrological model against an acceptable forecast error threshold is essential, both because the topic has received little attention in the hydrology community and because forecast errors do not necessarily cause risk. Two forecast errors, the rainfall forecast error and the peak flood forecast error, are studied on the basis of reliability theory. The first order second moment (FOSM) method and bounds methods are used to determine the reliability. A case study of the Dahuofang (DHF) Reservoir shows that the correlation between these two errors strongly influences the reliability index of the hydrological model; in particular, the reliability index of the DHF hydrological model decreases as the correlation increases. The proposed performance evaluation framework, which is based on reliability theory and incorporates the acceptable forecast error threshold and the correlation among multiple errors, can be used to evaluate the performance of a hydrological model and to quantify the uncertainties of its output.


Introduction
The forecast error, which is usually defined as the difference between the simulated and observed streamflow, supports the assessment of a hydrological model through criteria such as the Root Mean Square Error (RMSE) and the Nash-Sutcliffe coefficient [1]. However, these criteria only represent the combined effects of the model bias and the model uncertainty, because only residuals are considered [2]. Consequently, they are unable to indicate how well the hydrological function of a system is maintained [3] and do not account for heteroscedasticity and serial dependence of the residuals [4][5][6]. As a result, the traditional criteria have limited ability to quantify the reliability of a hydrological model.
On the other hand, the forecast error does not necessarily cause risk, because it has an acceptance threshold, which has rarely been discussed in previous studies. The acceptance threshold of the forecast error is defined as an interval: if the forecast error falls within this interval, the hydrological model is reliable and may have potential application; otherwise, the hydrological model is not accepted. Thus, the acceptable threshold of forecast error could support a much broader perspective for the assessment of a hydrological model.
Although the serial dependency of the individual forecast errors of a hydrological model has been widely studied [4,5,7-14], research on the inter-correlation of forecast errors (e.g., rainfall forecast error and flood forecast error) is limited. Krzysztofowicz [15] presented an analytic-numerical method that combines precipitation uncertainty and hydrologic uncertainty to forecast river stages, based on a mixture of two distributions related to the occurrence and nonoccurrence of precipitation. Hostache et al. [16] assessed the predictive uncertainty of a coupled atmospheric-hydrologic-error correction model using a bivariate meta-Gaussian probability density function. The latter enabled the computation of confidence intervals in a normalized space, and the results demonstrated the validity of the model. Nester et al. [17] analyzed how the precipitation forecast error and the hydrological simulation error scale with catchment area. Yazdi et al. [18] proposed a stochastic methodology based on Monte Carlo simulation and multivariate analysis to provide the uncertainty band of a rainfall-runoff model and to calculate the probability of acceptable forecasts. They discussed the acceptable error for threshold discharge (amplitude error) and the acceptable error for time (phase error). Dogulu et al. [19] compared the performance of methods for predicting model residual uncertainty based on quantile regression and UNcertainty Estimation based on local Errors and Clustering (UNEEC). Although many studies focus on the uncertainties of hydrological models, studies on the rainfall forecast error and the peak flood forecast error are rare.
In a classical rainfall-runoff model, rainfall is transformed into effective rainfall after interception (e.g., canopy interception, soil detention). The rainfall forecast error therefore arises from uncertainties in the runoff yield stage, such as the simplified calculation of evaporation, canopy interception, and soil detention. During runoff confluence, the peak flood forecast error arises because the hydrological model simulates a simplified physical catchment and because of the complex interactions among soil, surface water, and groundwater. Thus, we explore the inter-correlation between the rainfall forecast error and the peak flood forecast error, and present an improved approach to assessing the performance of a hydrological model that considers the inter-correlation of forecast errors. The questions addressed in this study are: (1) What is the correlation between the rainfall forecast error and the peak flood forecast error, and is it statistically significant? (2) How can the reliability index of a hydrological model be calculated on the basis of reliability theory in the context of correlated forecast errors, considering two scenarios, the single failure mode and the double failure mode (see Section 4)? (3) What is the effect of different types of performance function on the reliability index of a hydrological model? (4) How does the reliability index of a hydrological model change as the correlation between the two forecast errors varies?
To address these questions, we propose a framework to assess the performance of the hydrological model based on the reliability theory, which has been widely applied in the civil engineering field [20][21][22][23][24].
The remainder of this paper is organized as follows: Section 2 describes the reliability theory, including the FOSM and bounds methods. Section 3 describes the study area and error samples, and the results are presented and discussed in Section 4. Finally, conclusions are drawn in Section 5.

Simple Introduction to Reliability Theory
Reliability is defined as the probabilistic measure of whether a system meets certain standards [25,26]. Reliability theory analyzes the relationship between the load and the resistance of a problem by using a reliability index and/or a failure probability. In the context of hydrological model performance assessment, the forecast error, which is the output of the hydrological model, can be regarded as the load of the model, and the acceptable threshold can be regarded as the resistance. This is because the acceptable threshold reflects some features of the physical catchment and thus may also provide a reference for assessing the hydrological model. The forecast error (load) should meet the acceptable threshold (resistance) if the hydrological model works well and can be applied in practice; otherwise, the model fails. Taking as the event whether the forecast error of a hydrological model meets the acceptable threshold of the forecast error for a certain catchment, the failure probability of the event can be defined as follows.
$$p_f = P\left[g(L) > R\right] = \iint_{g(L) > R} f(R, L)\,\mathrm{d}R\,\mathrm{d}L \qquad (1)$$
where $R$ is the acceptable error resistance, $g(L)$ is the error load, and $f(R, L)$ is the joint distribution of the error load and the acceptable error resistance. The distribution of $g(L)$ can be obtained by using either numeric methods (e.g., Monte-Carlo sampling methods [27,28]) or analytic methods.
Assuming that the failure probability is based on a specific performance criterion, the limit state corresponding to Equation (1), which represents the boundary between the safe and unsafe regions of the parameter space, is:
$$Z = G(R, L) = G(x_1, x_2, \ldots, x_n) = 0 \qquad (2)$$
Equation (2) is the performance function of a hydrological model based on forecast error information. The hydrological model is either reliable, with acceptable accuracy, or unreliable. Solutions of Equation (2) are reference points.
If $Z$ follows the normal distribution $N(\mu, \sigma)$, the reliability index $\beta$ can be obtained as follows.
$$\beta = \frac{\mu}{\sigma} \qquad (3)$$
Thus, the relationship between $\beta$ and $p_f$ (Ganji & Jowkarshorijeh, 2012) is as follows:
$$p_f = \Phi(-\beta) = 1 - \Phi(\beta) \qquad (4)$$
where $\Phi(\cdot)$ is the standard normal probability distribution function.
Equation (4) is derived under the condition that the performance function follows a normal distribution. However, this condition does not hold for a large set of random variables with arbitrary distributions. Hence, many approximation methods for calculating the reliability index have been developed, such as the first order second moment (FOSM) method [29], the second order second moment (SOSM) method [30], the second order fourth moment (SOFM) method [31], and the response surface method (RSM) [32]. The FOSM method is adopted in this study since it is the simplest and most convenient of the above-mentioned methods in reliability theory [33].
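As a quick numerical check of the normal-case relation between the reliability index and the failure probability, the following minimal Python sketch evaluates $p_f = \Phi(-\beta)$ for the two reliability indexes reported for the DHF case study in the single failure mode (0.94 and 1.38). The function names are illustrative and not part of the original study, and the relation $\beta = \mu/\sigma$ assumes a normally distributed performance function.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def reliability_index(mu_z: float, sigma_z: float) -> float:
    """Reliability index beta = mu_Z / sigma_Z (assumes Z is normal)."""
    if sigma_z <= 0.0:
        raise ValueError("sigma_z must be positive")
    return mu_z / sigma_z


def failure_probability(beta: float) -> float:
    """Failure probability p_f = Phi(-beta) = 1 - Phi(beta)."""
    return std_normal_cdf(-beta)


# Reliability indexes quoted in the text for rainfall and flood forecast errors.
for beta in (0.94, 1.38):
    print(f"beta = {beta:.2f} -> p_f = {failure_probability(beta):.3f}")
```

The computed probabilities can be compared with the values of 0.17 and 0.085 quoted in the results; small differences would be expected from rounding of the reported indexes.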

FOSM Method
FOSM is a well-known method in water resources engineering that was originally developed to evaluate the safety of structural systems. The FOSM method takes the first two statistical moments of a linear approximation of the performance function and attempts to find the minimal distance from the given nominal point to the tangent hyper-plane. Through the reliability index, this distance provides a measure of reliability for the evaluation of a hydrological model.
Without loss of generality, consider a function $G$ of several random variables $X = (X_1, X_2, \ldots, X_n)$ with mean $\mu_X$ and standard deviation $\sigma_X$ ($X$ denotes the basic variables in the reliability theory). $G$ transforms the random variable $X$ into a random variable, $Z = G(X)$, which is the performance function in the reliability theory. The linearized form of $Z$ about the mean value $\mu_X$ is defined as follows:
$$Z \approx G(\mu_X) + \nabla G(\mu_X)^{T} (X - \mu_X) \qquad (5)$$
where $\nabla G(\mu_X)$ is the gradient vector of $G$ and describes the sensitivity of $Z$ to the input variable $X$. The mean (the first moment) and variance (the second moment) of $Z$ can be computed as follows, respectively:
$$\mu_Z \approx G(\mu_X) \qquad (6)$$
$$\sigma_Z^2 \approx \sum_{i=1}^{n} \left( \left. \frac{\partial G}{\partial X_i} \right|_{\mu_X} \right)^2 \sigma_{X_i}^2 \qquad (7)$$
The FOSM method requires that the input variables are statistically independent. If the input variables are dependent, a transformation of the input variables is needed. The common transformation in engineering is the orthogonal transformation, which is detailed in Shinozuka [34]; other transformations include the Rosenblatt transformation [35], the Nataf transformation [36], the Hermite polynomials transformation [37], and so on. The orthogonal transformation was adopted in this study to deal with the correlated input variables because of its robustness and compact formulation.
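The first-order moments described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it linearizes a performance function about the mean with a numerical gradient and, instead of the orthogonal transformation adopted in the study, it handles correlated inputs by using the full covariance matrix in the first-order variance (for a linear performance function the two routes give the same variance). The performance function, the resistance value, and all numbers are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def numerical_gradient(g: Callable[[List[float]], float],
                       x: Sequence[float], h: float = 1e-6) -> List[float]:
    """Central-difference gradient of g evaluated at x."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += h
        lo[i] -= h
        grad.append((g(hi) - g(lo)) / (2.0 * h))
    return grad


def fosm(g: Callable[[List[float]], float], means: Sequence[float],
         cov: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (mu_Z, sigma_Z, beta) from a first-order expansion about the means.

    cov is the covariance matrix of the inputs; a diagonal matrix recovers
    the independent-variable variance formula.
    """
    n = len(means)
    grad = numerical_gradient(g, means)
    mu_z = g(list(means))
    var_z = sum(grad[i] * cov[i][j] * grad[j]
                for i in range(n) for j in range(n))
    sigma_z = math.sqrt(var_z)
    return mu_z, sigma_z, mu_z / sigma_z


# Hypothetical example: constant resistance minus the sum of two error loads.
RESISTANCE = 10.0            # assumed value, for illustration only
sd1, sd2, rho = 3.0, 4.0, 0.4
cov = [[sd1 ** 2, rho * sd1 * sd2],
       [rho * sd1 * sd2, sd2 ** 2]]
mu_z, sigma_z, beta = fosm(lambda x: RESISTANCE - (x[0] + x[1]),
                           [2.0, -3.0], cov)
print(f"mu_Z = {mu_z:.3f}, sigma_Z = {sigma_z:.3f}, beta = {beta:.3f}")
```

Passing the covariance matrix keeps the sketch short; an implementation that follows the study more closely would first decorrelate the inputs and then apply the independent-variable formula.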

Ditlevsen's Bounds Method
Ditlevsen's bounds method [38] is widely used for calculating the system failure probability. This method is often reasonably accurate and computationally more efficient than Monte-Carlo simulation. Considering a general structural system, the failure of the system $E$ is modeled as the union of all possible failure modes with the failure events $E_i$ ($i = 1, \cdots, m$), $E = E_1 \cup E_2 \cup \cdots \cup E_m = \bigcup_{i=1}^{m} E_i$. Suppose the joint probability density function of the basic variables is $f_X(x)$; then the failure probability of the system is:
$$P_f = \Pr(E) = \int_{E} f_X(x)\,\mathrm{d}x \qquad (8)$$
in which $\Pr(\cdot)$ denotes the failure probability. Generally, according to the relationship between the system and its members, structural systems include series structures, parallel structures, and mixtures of series and parallel structures. A series structure is one in which the failure of any member leads to an overall failure of the system. A parallel structure, on the other hand, is one in which the system fails only if all its members fail. Denote the failure probability of $E_i$ as $P_{f_i}$; the failure probability of the system $P_f$ can be bounded according to Cornell [39], which is named the wide bounds approach:
$$\max_{1 \le i \le m} P_{f_i} \le P_f \le 1 - \prod_{i=1}^{m} \left(1 - P_{f_i}\right) \qquad (9)$$
where $m$ is the number of failure events. Denoting the probability of two failure modes failing simultaneously as $P_{f_{ij}} = \Pr(E_i \cap E_j)$, the failure probability of the system $P_f$, according to the narrow bounds [38,40], is
$$P_{f_1} + \sum_{i=2}^{m} \max\left(P_{f_i} - \sum_{j=1}^{i-1} P_{f_{ij}},\ 0\right) \le P_f \le \sum_{i=1}^{m} P_{f_i} - \sum_{i=2}^{m} \max_{j<i} P_{f_{ij}} \qquad (10)$$
For the parallel case, the bounds for the failure probability of the system could be expressed as:
$$\prod_{i=1}^{m} P_{f_i} \le P_f \le \min_{1 \le i \le m} P_{f_i} \qquad (11)$$
where $m$ is the number of failure events and the system failure event is the intersection of the member failure events, $E = \bigcap_{i=1}^{m} E_i$.
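To make the bounds concrete, the sketch below computes the wide (Cornell) bounds and the narrow (Ditlevsen) bounds for a series system, and the simple bounds for a parallel system, in their standard textbook form. It is an illustration under stated assumptions: the simple series upper bound and parallel lower bound presume positively correlated failure modes, the function names are invented here, and the joint failure probability of 0.03 is a hypothetical input rather than a value from the study. The single-mode probabilities 0.17 and 0.085 are those quoted in the results.

```python
from typing import List, Sequence, Tuple


def wide_bounds_series(p: Sequence[float]) -> Tuple[float, float]:
    """Simple bounds for a series system: max(p_i) <= P_f <= 1 - prod(1 - p_i)."""
    survive = 1.0
    for p_i in p:
        survive *= (1.0 - p_i)
    return max(p), 1.0 - survive


def simple_bounds_parallel(p: Sequence[float]) -> Tuple[float, float]:
    """Simple bounds for a parallel system: prod(p_i) <= P_f <= min(p_i)."""
    joint = 1.0
    for p_i in p:
        joint *= p_i
    return joint, min(p)


def narrow_bounds_series(p: Sequence[float],
                         p_joint: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Ditlevsen bounds for a series system.

    p_joint[i][j] is the probability that modes i and j fail together.
    """
    lower = upper = p[0]
    for i in range(1, len(p)):
        lower += max(p[i] - sum(p_joint[i][j] for j in range(i)), 0.0)
        upper += p[i] - max(p_joint[i][j] for j in range(i))
    return lower, upper


p_modes: List[float] = [0.17, 0.085]   # single-mode failure probabilities
p12 = 0.03                             # hypothetical joint failure probability
joint = [[0.17, p12], [p12, 0.085]]

print("wide (series):  ", wide_bounds_series(p_modes))
print("narrow (series):", narrow_bounds_series(p_modes, joint))
print("parallel:       ", simple_bounds_parallel(p_modes))
```

With only two failure modes and an exactly known joint probability, the narrow lower and upper bounds coincide; an interval appears when there are three or more modes or when the joint probabilities are themselves only bounded.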

Study Area
The Dahuofang (DHF) Reservoir is located in the middle-to-lower course of the Hun River, between 41°31′ N-42°15′ N and 120°20′ E-125°15′ E, in Liaoning Province, northeast China (see Figure 1). It is a large-scale hydraulic project, mainly used for flood defense and irrigation, and controls an area of 5437 km². The area has a mean annual rainfall of over 800 mm, concentrated mainly in July and August, and the floods are driven mainly by storms. The DHF model, a hybrid conceptual rainfall-runoff model consisting of two parts, was used to forecast floods with 3-h time steps. The runoff yield part is an eight-parameter excess infiltration runoff model based on the Horton curve [41] and the double-layered infiltration curve for reductive calculation, which uses parabolas to describe the upper amount of water storage and the distribution of double-layered infiltration. The runoff confluence part is an eight-parameter empirical unit hydrograph convergence model with variable intensity and variable confluence velocity, which uses the product of exponential and triangular functions to describe the empirical unit hydrograph, and antecedent rainfall to reflect the velocity variations in confluence. The structure of the DHF model is shown in Figure 2, and its parameters are listed in Table 1.
In this study, 31 floods of the DHF Reservoir from 1951 to 1996 were analyzed. The frequency of floods in this period is listed in Table 2, which shows the years having more than two floods [42]. Here, the rainfall forecast error denotes the error of effective precipitation for each of the 31 floods, and the peak flood forecast error denotes the corresponding error of the forecasted flood peak. The accuracy, measured by the qualified rate, and the correlation between the forecast errors are shown in Table 3. Table 3 shows that the DHF hydrological model performs well in reproducing volumes and peak floods in the DHF river basin, because the DHF model was developed specifically for this river basin. Table 3 also shows that the rainfall forecast errors and the flood forecast errors are significantly correlated in the statistical sense. The qualified rate is calculated by counting the errors that fall within 20% of the observed rainfall or flood values and dividing by the total number of respective error samples. The correlation coefficients between the two forecast errors are calculated according to statistical theory. Comparing the different correlation coefficients between the rainfall forecast error and the flood forecast error, the association between the two errors appears to be more nonlinear than linear, because the nonlinear coefficients are larger; for convenience of calculation, however, only the linear correlation is considered in this study.
To determine the distribution of forecast errors for the DHF hydrological model, a normality test, namely the Lilliefors test [43,44], was conducted, and the result is shown in Figure 3. The rainfall forecast errors and flood forecast errors are almost normally distributed, although there are some outliers in the upper tail of the error samples. The Lilliefors test shows that we could not reject the null hypothesis that these errors come from a normal family at the 5% significance level. This result is consistent with the conventional hypothesis that the errors follow a normal distribution, but the error samples clearly do not come from a standard normal distribution, because they are not centered at the origin. Thus, the probability density functions (pdfs) of the error samples are obtained through the normal moments fitting method, with the rainfall forecast error ~ N(2.09, 9.12) and the flood forecast error ~ N(−3.13, 294.11). To identify the relationship between the forecasted items, such as rainfall or flood peaks, and the forecast errors, Figure 4 plots the standardized pairs of the forecasted rainfall and the rainfall forecast error, and of the forecasted flood peak and the flood forecast error. The standardized samples are obtained by subtracting the respective mean from each sample and dividing by the respective standard error. Figure 4 shows that the two sets of pairs appear to be randomly scattered, with both correlation coefficients less than 0.2; thus, the forecasted items and their forecast errors could be regarded as independent.
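The sample statistics used above (qualified rate, moments fitting, standardization, and linear correlation) can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the 20% tolerance is taken relative to each observed value, the standardization uses the sample standard deviation, and the short data arrays are invented for demonstration. The Lilliefors test itself is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def qualified_rate(forecast: Sequence[float], observed: Sequence[float],
                   tolerance: float = 0.20) -> float:
    """Fraction of forecasts whose error is within tolerance * |observed|."""
    if len(forecast) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for f, o in zip(forecast, observed)
               if abs(f - o) <= tolerance * abs(o))
    return hits / len(observed)


def fit_normal_moments(sample: Sequence[float]) -> Tuple[float, float]:
    """Moment estimates (mean, unbiased variance) for a normal fit."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, var


def standardize(sample: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the sample standard deviation."""
    mean, var = fit_normal_moments(sample)
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in sample]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical flood peaks (m^3/s), for illustration only.
observed_peak = [1200.0, 850.0, 2300.0, 640.0, 1500.0]
forecast_peak = [1100.0, 900.0, 2900.0, 600.0, 1450.0]
errors = [f - o for f, o in zip(forecast_peak, observed_peak)]

print("qualified rate:", qualified_rate(forecast_peak, observed_peak))
print("fitted (mean, variance):", fit_normal_moments(errors))
print("corr(forecast, error):", pearson(forecast_peak, errors))
```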

Results and Discussion
The following sections discuss two modes of reliability index that consider the acceptable error threshold, namely the single and double failure modes. The single failure mode deals with the rainfall forecast error or the flood forecast error separately, while the double failure mode deals with the joint failure of the rainfall forecast error and the flood forecast error.

Single Failure Mode
To assess the performance of a hydrological model, 20% of the standard deviations of the observed samples in rainfall forecasts and flood forecasts are regarded as their acceptable error thresholds (forecast error resistances), according to the Standard for hydrological information and hydrological forecasting of China [45]. The reliability indexes $\beta$ calculated by Equation (3) and the failure probabilities $p_f$ calculated by Equation (4) for the two errors are shown in Figure 5. Since the performance functions are normal and the respective forecast error resistance is constant, the calculation of $\beta$ and $p_f$ is straightforward in this case. The variations in failure probabilities and reliability indexes, using different proportions (5%, 10%, 15%, 20%, 25%, 30% and 35%) of the standard deviations of the observed samples as the acceptable error threshold, are also shown in Figure 5, where the solid lines are the rainfall forecast error and the dashed lines are the flood forecast error. The horizontal axis represents the different proportions of the standard deviations of observed rainfall or peak floods, and the left and right vertical axes represent the failure probability and the reliability index, respectively. Figure 5 shows that when the acceptable error threshold increases, the failure probabilities of the two errors decrease and the reliability indexes increase. This confirms that increasing the acceptable error threshold is equivalent to increasing the resistance to error. The failure probability of the flood forecast error decreases nonlinearly with the acceptable error threshold up to 35% of the standard deviation of the observed flood samples. Different acceptable thresholds give different failure probabilities of the flood forecast error, and the curve flattens after the 20% threshold, indicating that the failure probability of the flood forecast error may become insensitive once the threshold reaches a certain proportion. At the same time, the reliability index of the flood forecast error changes nonlinearly with the acceptable threshold. In contrast, the failure probability and reliability index of the rainfall forecast error change almost linearly with the threshold of the rainfall forecast error. At the 20% threshold, the reliability indexes are 0.94 for the rainfall forecast error and 1.38 for the flood forecast error, and the corresponding failure probabilities are 0.17 and 0.085, respectively. This shows that the flood forecast is more reliable than the rainfall forecast in the DHF hydrological model, because it has the larger reliability index; note that the reliability index is a relative index, since it incorporates the variance of the observed samples.
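The threshold sweep behind Figure 5 can be imitated with a small sketch. It rests on assumptions that the text does not spell out: the performance function is taken as the one-sided form Z = R − L with a constant resistance R equal to a proportion k of the observed-sample standard deviation, the error load L is normal, and the numerical values of the error mean, error standard deviation, and observed standard deviation are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def sweep_thresholds(mu_err: float, sigma_err: float, s_obs: float,
                     proportions: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (proportion, beta, p_f) for each threshold proportion.

    Assumes Z = R - L with constant R = proportion * s_obs and
    L ~ Normal(mu_err, sigma_err), so beta = (R - mu_err) / sigma_err.
    """
    phi = NormalDist()  # standard normal distribution
    rows = []
    for k in proportions:
        resistance = k * s_obs
        beta = (resistance - mu_err) / sigma_err
        rows.append((k, beta, phi.cdf(-beta)))
    return rows


# Hypothetical inputs, for illustration only.
proportions = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
for k, beta, p_f in sweep_thresholds(2.0, 3.0, 25.0, proportions):
    print(f"threshold = {k:4.0%}  beta = {beta:6.3f}  p_f = {p_f:.4f}")
```

Under these assumptions the reliability index is linear in the threshold proportion while the failure probability follows the normal tail, which is one way a curve can flatten at larger thresholds.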

Unknown Performance Function
The reliability index of the DHF hydrologic model can be calculated by combining the rainfall and flood forecast errors. Since the physical mechanism by which the two errors affect the hydrologic model is unclear, an unknown performance function is considered first. In reliability theory, the wide and narrow bounds methods [38][39][40] are used to calculate the reliability indexes of serial and parallel structural systems. Here, the serial or parallel structure describes how the two errors relate to the reliability of the hydrological model, where the forecasted rainfall and flood are both used to calculate the forecast error. Taking into account the correlation between the two failure modes (for convenience of calculation, the linear correlation coefficient between the two error samples is simply used as the correlation coefficient between the two failure modes), the results are shown in Table 4. Table 4 indicates that the two errors have a larger failure probability in the serial structure than in the parallel structure. This is due to the mutual effects of the two errors in a serial structure, where the failure of either error results in failure of the whole system. In the parallel structure, on the other hand, the failure of one error does not imply the failure of the system, since the system fails only when both errors fail. Table 4 also shows that, compared with the single failure mode of the rainfall forecast error or the flood forecast error, the system failure probability increases with the serial structure and decreases with the parallel structure, because the structure type changes how the forecast errors interact in the failure probability of the combined system. That is to say, the system fails more easily with the serial structure because of the additional load, and less easily with the parallel structure because of the additional resistance. Table 4 further shows that the interval of the wide bounds method contains the interval of the narrow bounds method. Hence, in practice, the narrow bounds method is generally used to avoid the overly wide range given by the wide bounds method, and the result obtained by the narrow bounds may be regarded as more accurate from the perspective of hydrology because of its narrower range.

Known Performance Function
Five types of performance functions (Equation (12)) are considered in this study. The failure probability of the system was calculated by Monte-Carlo sampling methods [27,28]. In the calculation, the system resistance is constant and is obtained as the product of 20% of the standard deviations of the respective observed samples. Figure 6 shows the variation of the reliability indexes with the Pearson coefficient and the system resistance. Figure 6a shows that the system reliability index varies with the correlation coefficient, and that it decreases as the positive correlation coefficient between the two error samples increases, for all five types of performance function. Due to the potential relationship between the system reliability index and the correlation of the two forecast errors [46], the derivative of the system reliability index with respect to the correlation may be negative. Thus, the system reliability index is a decreasing function of the correlation coefficient when the latter varies in the interval (0, 1]. The form of the performance function has a great impact on the system reliability index. At the same Pearson coefficient, Equation (12d) gives the smallest $\beta$, while Equation (12e) gives the largest. Equations (12a)-(12c) give almost the same curves across coefficients, which may be because the loads in Equations (12a)-(12c) are far smaller than the resistances. Figure 6b shows that the resistance also has a great impact on the system reliability index, since the system reliability index increases with the resistance for the two error samples. Again, Equation (12e) gives the largest system reliability index, Equation (12d) gives the smallest, and Equations (12a)-(12c) behave very similarly. Equation (12d) could be chosen as the form of the performance function according to the worst-case principle. As a result, the reliability index of the system was 1.39 with the performance function of Equation (12d) under a correlation coefficient of 0.40.
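A Monte-Carlo estimate of the system reliability index under correlated errors can be sketched as below. The five performance functions of Equation (12) are not reproduced here, so the sketch uses a hypothetical linear form, Z = R − (L1 + L2), purely for illustration; the resistance, the error standard deviations, and the sample size are likewise assumed values. Correlated normal loads are generated with a two-variable Cholesky factorization, and the reliability index is recovered from the estimated failure probability as β = −Φ⁻¹(p_f).

```python
import math
import random
from statistics import NormalDist
from typing import Tuple


def simulate_system_beta(rho: float, resistance: float,
                         mu: Tuple[float, float] = (0.0, 0.0),
                         sigma: Tuple[float, float] = (1.0, 1.0),
                         n: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Estimate (beta, p_f) for the illustrative Z = R - (L1 + L2).

    L1 and L2 are normal with correlation rho, built from two independent
    standard normal draws via a 2x2 Cholesky factor.
    """
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    failures = 0
    for _ in range(n):
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        load1 = mu[0] + sigma[0] * u1
        load2 = mu[1] + sigma[1] * (rho * u1 + c * u2)
        if resistance - (load1 + load2) < 0.0:   # failure: load exceeds resistance
            failures += 1
    # Keep the estimate strictly inside (0, 1) so the inverse CDF is defined.
    p_f = min(max(failures / n, 0.5 / n), 1.0 - 0.5 / n)
    return -NormalDist().inv_cdf(p_f), p_f


for rho in (0.0, 0.4, 0.8):
    beta, p_f = simulate_system_beta(rho, resistance=3.0)
    print(f"rho = {rho:.1f}  p_f = {p_f:.4f}  beta = {beta:.3f}")
```

For this linear form the variance of Z grows with the correlation, so the estimated reliability index falls as the correlation rises, in line with the trend reported for Figure 6a; other performance functions may respond differently.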

The Acceptable Threshold Value of the Performance Function
The performance assessment of a hydrological model may seem a simple matter of comparing the simulated and observed outputs. Based on this logic, many performance assessment criteria have been presented, such as the NSE [1], the RMSE, and R or R². Most of them are based on the residuals of the model for the simulated hydrological processes [47,48]. The residuals could have an acceptable threshold, which acts as the resistance from the perspective of reliability theory. Therefore, based on reliability theory, the performance of a hydrological model is assessed here by (1) considering the acceptable forecast error threshold and (2) including the correlation among the forecast errors. This adds a new insight into the forecast error and the assessment of a hydrological model, while the traditional criteria remain very useful for quantifying the performance of a hydrological model from other aspects.
In this study, the acceptable threshold of forecast error is determined only by the observed samples of rainfall or flood. This is an empirical and conventional approach, adopted because the allowable error limit lacks a physical foundation. The 20% of observed sample variance is adopted according to the legally binding standard (GAQSIQC); as a guideline, it is therefore also recommended to use this acceptable threshold in practice. However, further work is required, including a sensitivity analysis of the performance function of a hydrological model, to determine the critical value, because the resistance directly affects the evaluation of the performance of hydrological models. Many factors may influence the acceptable threshold of forecast error: not only the precision of the hydrological model, which involves the uncertainties in the hydrological system, but also the limits of human recognition of the forecast error itself, which remain unexplored.
Regarding the form of the performance function, only a linear function of the load and the resistance is considered, which is the most popular and simplest form in the literature. Other forms could be considered after analyzing the relevant features of the load and the resistance. In addition, the nonlinear correlation between the basic variables, such as the rainfall forecast error and the flood forecast error, needs to be explored in depth in future research.

Correlation between the Basic Variables of the Performance Function
The issue of correlation among the basic variables of the performance function has been widely discussed in structural engineering, and many approaches have been developed in reliability theory to deal with it. Reliability theory is therefore a convenient tool for correlated variables, with mature approaches such as the orthogonal transformation [34] and the Rosenblatt transformation [35]. Moreover, since many performance functions in practice are complex and analytical solutions are difficult to obtain, many approximation methods have been proposed, including the FOSM, SOSM, and SOFM methods.
The correlation between the rainfall forecast error and the flood forecast error is a challenging topic, since few studies have discussed the physical mechanism of the correlation [17,18]. The results are reasonable because the correlation is statistically significant in the case study of the DHF Reservoir. However, the correlation between the variables (the rainfall forecast error and the flood forecast error) may not be the same as the correlation between the two failure modes; for convenience, only the linear correlation between the two forecast errors is considered. Whether the rainfall forecast error influences the flood forecast error requires further investigation.

Conclusions
Considering that a forecast error may not immediately cause risk and that the correlation between the rainfall forecast error and the flood forecast error is statistically significant, an assessment framework for the performance of a hydrological model based on reliability theory is presented. Compared with conventional criteria such as NSE [1] and RMSE, this framework uses not only the residual information but also the threshold of the residuals. Although the determination of the threshold is empirical, the reliability index gives another insight into the evaluation of the performance of hydrological models. It remains difficult to compare the reliability index with common criteria such as RMSE, NSE [1], and R², because the former adds information on the threshold of forecast error. The correlation between the two errors and between the two failure modes is also considered through reliability theory. Although the physical mechanism of the correlation between the two errors is not clear in detail, the correlation needs to be dealt with when it is statistically significant. Future work could focus on the error propagation from the rainfall forecast error to the flood forecast error, and on using the reliability index to guide the development of criteria for evaluating the performance of hydrological models. Through the case study of the DHF hydrological model, the following conclusions can be drawn: (1) The sensitivity of the performance function to the acceptable threshold differs between the forecast errors: the flood forecast error changes nonlinearly with its acceptable threshold, whereas the rainfall forecast error changes almost linearly with its acceptable threshold. (2) The correlation between the rainfall forecast error and the flood forecast error has a great impact on the evaluation of the performance of a hydrological model, and the reliability index of the hydrological model system decreases as the positive correlation increases. (3) The resistance has a great impact on the evaluation of the performance of a hydrological model: a larger resistance gives a higher reliability index.

Figure 3. Normal distribution test for (a) rainfall forecast error sample; and (b) flood forecast error sample.

Figure 4. Relationship of (a) standardized forecasted rainfall and standardized error of rainfall; and (b) standardized forecasted flood and standardized error of flood.

Figure 5. Failure probabilities and reliability indexes of rainfall forecast errors and flood forecast errors for DHF reservoir.
In Equation (12), Z denotes the different performance functions, which are defined in terms of the systemic resistance under the combined effect of the rainfall forecast error and the flood forecast error, the rainfall forecast error load, and the flood forecast error load.

Figure 6. Relationship between system reliability indexes and (a) different correlation coefficients and (b) different system resistances of two errors of DHF reservoir.

Table 1. Parameters description of the DHF hydrological model.

Table 2. Number of floods in specified year for DHF reservoir.

Table 3. Accuracy of floods and correlation of forecast errors for DHF reservoir. * The correlation is significant at the 0.05 level; ** the correlation is significant at the 0.01 level.

Table 4. System failure probability of rainfall forecast errors and flood forecast errors for DHF reservoir (with bounds method).