An Ensemble Flow Forecast Method Based on Autoregressive Model and Hydrological Uncertainty Processer

In hydrological forecasting, uncertainties in the input data, model parameters, and model structure prevent a deterministic forecast from providing useful risk information to decision-makers. Therefore, the study of ensemble forecasting and the analysis of hydrological uncertainty are of great significance for guiding the actual operation of reservoirs in the flood season. This study proposes a Bayesian ensemble forecast method comprising a Gaussian mixture model (GMM), a hydrological uncertainty processor (HUP), and an autoregressive (AR) model. First, the GMM is selected as the marginal distribution function to estimate the uncertainty of observed and modelled data. Next, the AR model is used to correct the forecast rainfall data. Then, a modified HUP is used to deal with the uncertainty of the hydrological model structure and the rainfall input data. Finally, the ensemble flow forecast is composed of the expected values of the posterior distributions obtained by the HUP under different rainfall conditions. Taking the Three Gorges Reservoir (TGR) as a case study, the ensemble flow forecast for the forecast period is calculated using this method. Results show that the proposed method improves the accuracy of runoff forecasts and reduces the uncertainty of the hydrological forecast.


Introduction
Flooding is the most common natural hazard and the third most damaging globally, after storms and earthquakes [1]. Flood forecasting provides key information for disaster warning and flood control and therefore plays an important role in reducing flood damage [2,3]. Common flood forecasting methods use a deterministic hydrological model to predict the future flood process [4][5][6][7]. However, the traditional deterministic hydrological forecast cannot provide the hydrological forecast value in the forecast period to meet the needs of the corresponding river basin management department [8].
Ensemble flow forecasting can provide more hydrological forecasting information for policymakers in order to better ensure the safety of life and property in downstream areas and improve the social and economic benefits [9,10]. Ensemble weather forecasting (EWF) plays an important role in ensemble flow forecasting [11]. First, EWF contains more information, which can provide more comprehensive meteorological data for a hydrological forecast in a future period [12]. Second, research shows that combining EWF with a hydrological forecast model can improve the reliability and accuracy of forecast results [13]. However, due to the diversity of atmospheric conditions and topography, the simplification of physical and thermodynamic processes, the uncertainties of parameterization of However, it did not consider the effect of precipitation uncertainty. In order to solve the above two problems, a method combining precipitation-dependent HUP (PD-HUP) and GMM is proposed in this paper. The GMM function is used as the marginal distribution function in BFS, and the uncertainty of precipitation input data is also considered.
In summary, the main objective of this paper is to propose a Bayesian ensemble forecast method considering uncertainty of hydrological models and precipitation. First, the Autoregressive (AR) model is used to correct the rainfall data of Ensemble Weather Forecasts (EWF). Then, a precipitation-dependent HUP (PD-HUP) based on GMM is proposed to deal with the uncertainty of both the input data and the hydrological model. Finally, the ensemble flow forecast results are generated from the results determined by the PD-HUP method.

Postprocessing of Ensemble Weather Forecasts
The autoregressive (AR) model was used to postprocess the precipitation series of the EWF. The AR model captures the influence of related factors on the prediction target solely through the history of the time series itself, so it is not restricted by the assumption that the model variables are mutually independent. It thereby avoids the difficulties of independent-variable selection and multicollinearity that arise in general regression prediction methods. The mathematical expression of the AR model is shown below.
where $w^{j}_{fore,t}$ is the uncorrected rainfall forecast of the $j$th ensemble member; $e^{j}_{t}$ ($t = 1, 2, \ldots, N$) is the prediction error of the $j$th ensemble member at time $t$, that is, the difference between the EWF result and the observed precipitation; $N$ is the length of the precipitation sequence of each ensemble member; $e^{j}_{t-1}, e^{j}_{t-2}, \ldots, e^{j}_{t-p}$ are the prediction residuals of the $j$th ensemble member for the $p$ periods before time $t$; $\alpha^{j}_{1}, \alpha^{j}_{2}, \ldots, \alpha^{j}_{p}$ are the regression coefficients of the $j$th ensemble member; $p$ is the order, which can be determined by the Akaike information criterion (AIC); $\xi^{j}$ is a white noise sequence whose mean is 0 and whose variance is determined from the parameterization; and $w^{j}_{c}$ is the corrected prediction of the $j$th ensemble member. The calculation steps of the AR model are as follows.
First, determine the order $p$ according to the AIC. Letting $p$ take different values, calculate the AIC value of each ensemble member and select the $p$ with the minimum AIC as the regression order of the AR model. In the AIC expression, $k$ is the number of parameters (here, the order of the AR model) and $L$ is the variance of the residuals. The parameters of the resulting $p$-order AR model are then estimated by the least squares principle.
Finally, the AR models with the parameters determined above were used to correct the precipitation forecast errors.
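The three steps above (order selection, least-squares fitting, correction) can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation. It assumes the residual-variance form of the AIC, `n * ln(L) + 2k`, and it assumes the error is defined as forecast minus observation, so the corrected value is the raw forecast minus the predicted error. The function names (`solve_linear`, `fit_ar`, `select_order`, `correct_forecast`) are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(errors, p, start=None):
    """Least-squares AR(p) fit; returns (coefficients, residual variance)."""
    start = p if start is None else start
    rows = [[errors[t - i] for i in range(1, p + 1)]
            for t in range(start, len(errors))]
    targets = errors[start:]
    # Normal equations: (X^T X) alpha = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    alpha = solve_linear(xtx, xty)
    resid = [y - sum(a * v for a, v in zip(alpha, r))
             for r, y in zip(rows, targets)]
    variance = sum(e * e for e in resid) / len(resid)
    return alpha, variance


def aic(variance, k, n):
    """Assumed AIC form based on the residual variance."""
    return n * math.log(variance) + 2 * k


def select_order(errors, max_p):
    """Pick the order with the smallest AIC, using the same targets for all p."""
    n = len(errors) - max_p
    best = None
    for p in range(1, max_p + 1):
        alpha, variance = fit_ar(errors, p, start=max_p)
        score = aic(variance, p, n)
        if best is None or score < best[0]:
            best = (score, p, alpha)
    return best[1], best[2]


def correct_forecast(raw_forecast, past_errors, alpha):
    """Subtract the AR-predicted error (past_errors: most recent value last)."""
    predicted_error = sum(a * past_errors[-i]
                          for i, a in enumerate(alpha, start=1))
    return raw_forecast - predicted_error
```

In use, `select_order` would be run once per ensemble member on that member's historical error series, and `correct_forecast` applied to each new raw forecast with the most recent errors.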

Establishment of the Ensemble Flow Forecast Model
The Bayesian probability forecast system (BFS) was first proposed by Roman Krzysztofowicz in 1999. As an important part of BFS, the hydrologic uncertainty processor (HUP) presents the hydrological prediction values in a probabilistic form through Bayesian analysis and calculation of the results of the deterministic hydrological prediction model.

The Hydrological Uncertainty Processor Methodology
Let $n$ ($n = 0, 1, 2, \ldots, N$) denote the forecast lead time. Suppose that $h_n$ represents the observed inflow to the Three Gorges Reservoir (TGR) and $s_n$ represents the reservoir inflow simulated by the hydrological model; $h_n$ and $s_n$ are realizations of the random variables $H_n$ and $S_n$, respectively. The prior probability density function (PDF) under different prediction periods can be expressed as the following equation.
where $\gamma(\cdot)$ is the marginal PDF of observed inflow, $q(\cdot)$ is the standard normal density function, $Q^{-1}(\cdot)$ is the standard normal quantile function, $\Gamma(\cdot)$ is the marginal cumulative distribution function (CDF) of $H_n$, and $c$ is the Pearson correlation coefficient obtained from the normal quantile transform. The posterior PDF under different prediction periods can be expressed as follows.
where $\Lambda_n(\cdot)$ is the marginal distribution function of the reservoir inflow simulated by the hydrological model, and $A_n$, $B_n$, $D_n$, and $T_n$ are parameters calculated in the normal quantile transform space, with the following expressions.
$$A_n = \frac{a_n \tau_n^2}{a_n^2 \tau_n^2 + \sigma_n^2} \quad (11)$$
$$B_n = \frac{-a_n b_n \tau_n^2}{a_n^2 \tau_n^2 + \sigma_n^2} \quad (12)$$
$$D_n = \frac{c_n \sigma_n^2 - a_n d_n \tau_n^2}{a_n^2 \tau_n^2 + \sigma_n^2} \quad (13)$$
$$T_n^2 = \frac{\tau_n^2 \sigma_n^2}{a_n^2 \tau_n^2 + \sigma_n^2} \quad (14)$$
In the normal space, the stochastic dependence between $W_n$ and $W_{n-1}$ is governed by a normal-linear equation.
$$W_n = c_n W_{n-1} + \Xi$$
where $c_n$ is Pearson's correlation coefficient, and $\Xi$ is a normally distributed variable, stochastically independent of $W_{n-1}$, with a mean of zero and a variance of $\tau_n^2 = 1 - c_n^2$. The stochastic dependence between $X_n$ and $(W_n, W_0)$ is governed by a normal-linear equation.
$$X_n = a_n W_n + d_n W_0 + b_n + \Theta_n$$
where $a_n$, $b_n$, and $d_n$ are the regression coefficients, and $\Theta_n$ is a normally distributed variable, stochastically independent of $(W_n, W_0)$, with a mean of zero and a variance of $\sigma_n^2$.
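The posterior parameters follow from combining the two normal-linear relations by Bayes' rule. The sketch below assumes a one-step lead time (so $W_{n-1} = W_0$), assumes that $W_n$ and $X_n$ are the normal quantile transforms of $H_n$ and $S_n$, and uses the standard conjugate-normal forms for $D_n$ (including the $c_n \sigma_n^2$ term) and $T_n$. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def posterior_parameters(a, b, d, sigma2, c):
    """Return (A, B, D, T) of the posterior of W_n given X_n = x and W_0 = w0.

    Prior:      W_n = c * W_0 + Xi,             Var(Xi) = tau2 = 1 - c^2
    Likelihood: X_n = a * W_n + d * W_0 + b + Theta, Var(Theta) = sigma2
    """
    tau2 = 1.0 - c * c
    denom = a * a * tau2 + sigma2
    A = a * tau2 / denom
    B = -a * b * tau2 / denom
    D = (c * sigma2 - a * d * tau2) / denom  # standard conjugate-normal form
    T = math.sqrt(tau2 * sigma2 / denom)     # posterior standard deviation
    return A, B, D, T


def posterior_in_normal_space(x, w0, params):
    """Posterior distribution of W_n: normal with mean A*x + D*w0 + B, sd T."""
    A, B, D, T = params
    return NormalDist(mu=A * x + D * w0 + B, sigma=T)


# Illustrative (made-up) coefficients, not values from the paper's tables.
params = posterior_parameters(a=1.0, b=0.0, d=0.0, sigma2=0.64, c=0.6)
post = posterior_in_normal_space(x=1.0, w0=0.5, params=params)
```

With these made-up coefficients the prior and the likelihood carry equal weight, so the posterior mean lies halfway between the prior mean `c * w0 = 0.3` and the simulated value `x = 1.0`.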

Precipitation-Dependent Hydrological Uncertainty Processer Based on Gaussian Mixture Model
In the calculation process of the HUP, the selection of the marginal distribution function directly affects the accuracy of the calculation results and the complexity of parameter estimation. Feng et al. [28] showed that the Gaussian mixture model performs better in fitting the marginal probability distributions of the observed flow and simulated flow, which can improve the performance of the prediction method based on the HUP.
The marginal density function and distribution function in the HUP are estimated as follows.
where $gmm(\cdot|\theta_\Gamma)$ and $GMM(\cdot|\cdot)$ approximately represent the PDF and CDF of $H_n$, and $\theta_\Gamma$ and $\theta_{\Lambda_n}$ are the distribution parameters to be estimated. More details about the GMM can be found in Feng et al. [28]. Then, conditional on the observed inflow $h_0$ at time $t_0$, the prior distribution of $H_n$ can be expressed as follows. Similarly, according to Equation (10), under the condition that the observed inflow at time $t_0$ is $h_0$ and the simulated flow is $s_n$, the posterior probability density function of $H_n$ is the following.
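To make the role of the GMM marginals concrete, the following sketch evaluates a GMM density and distribution function, applies the normal quantile transform, and maps the normal-space posterior back to the flow space. It is an illustration under stated assumptions: the mixture is passed as a list of `(weight, mean, sigma)` tuples, the same observed-flow marginal is used for $h_0$ and $h_n$, and the posterior density uses the standard meta-Gaussian change of variables. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps probabilities strictly inside (0, 1) for the quantile call


def gmm_pdf(x, components):
    """Density of a Gaussian mixture given (weight, mean, sigma) tuples."""
    return sum(w * NormalDist(mu, sd).pdf(x) for w, mu, sd in components)


def gmm_cdf(x, components):
    """Distribution function of the same Gaussian mixture."""
    return sum(w * NormalDist(mu, sd).cdf(x) for w, mu, sd in components)


def nqt(x, components):
    """Normal quantile transform: Q^{-1}(CDF(x))."""
    p = min(max(gmm_cdf(x, components), EPS), 1.0 - EPS)
    return STD_NORMAL.inv_cdf(p)


def posterior_pdf(h, s, h0, obs_gmm, sim_gmm, A, B, D, T):
    """Posterior density of H_n at h, given simulated flow s and observed h0."""
    w = nqt(h, obs_gmm)
    x = nqt(s, sim_gmm)
    w0 = nqt(h0, obs_gmm)
    z = (w - A * x - D * w0 - B) / T
    # Jacobian of the transform h -> w is gmm_pdf(h) / q(w).
    return gmm_pdf(h, obs_gmm) * STD_NORMAL.pdf(z) / (T * STD_NORMAL.pdf(w))
```

When both marginals are standard normal, the transform is the identity and `posterior_pdf` reduces to a normal density with mean `A*s + D*h0 + B` and standard deviation `T`, which is a convenient sanity check.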
In order to accurately describe the uncertainty of the rainfall input data, the parameters of the posterior distribution need to be estimated separately according to the precipitation during the lead-time period. Krzysztofowicz [23] discussed the HUP model in the cases of rainfall occurrence and non-occurrence, and proposed the precipitation-dependent HUP (PD-HUP) model, which considers the uncertainty of rainfall. However, there are only a few periods during the flood season when no rainfall is generated, so it is difficult to accurately estimate the marginal distribution functions of observed and predicted flow in the case of no rainfall. Therefore, this paper establishes a PD-HUP model that divides the precipitation in the lead-time period into two types, effective rainfall and negligible rainfall, and then weights the two to form a PD-HUP based on the Gaussian mixture model (PD-HUP-GMM) that takes rainfall uncertainty into account.
Let the variable $V$ represent the rainfall condition and $W$ the precipitation: $v = 1$ denotes effective rainfall ($w \geq 1$ mm) and $v = 0$ denotes negligible rainfall ($0 \leq w < 1$ mm). Let $\gamma_{0v}(h_0) = p(h_0 | V = v)$ represent the marginal PDF of $h_0$ under the condition $V = v$, and let $\nu = P(V = 1)$ represent the probability of effective rainfall in the lead-time period. Then, the marginal PDF of $H_n$ is as follows.
Under the condition that the observed inflow is $h_0$, the probability of effective and negligible rainfall is as follows.
Under the condition that the observed inflow is $h_0$, the PDF of $S_n$ is as follows.
Under the condition that the observed inflow is $h_0$ and the simulated inflow is $s_n$, the probability of effective and negligible rainfall is as follows.
Thus, finally, the posterior probability can be expressed as follows.
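The weighting of the two rainfall branches is an application of Bayes' rule and the law of total probability. The sketch below assumes that the prior probability of effective rainfall is estimated by its relative frequency in the calibration record, and that the branch densities have already been evaluated at the values of interest. The function names and the numeric density values are hypothetical.

```python
def update_rain_probability(prior_rain, density_rain, density_dry):
    """Bayes update of P(V = 1) given a density evaluated under each branch."""
    numerator = prior_rain * density_rain
    return numerator / (numerator + (1.0 - prior_rain) * density_dry)


def mixture_density(p_rain, density_rain, density_dry):
    """Total-probability mixture of the two branch densities."""
    return p_rain * density_rain + (1.0 - p_rain) * density_dry


# Prior from day counts (1178 effective, 164 negligible), assuming a
# relative-frequency estimate.
nu = 1178 / (1178 + 164)

# Step 1: P(V = 1 | h0) from the branch marginals of h0 (made-up values).
p_rain_h0 = update_rain_probability(nu, density_rain=2.0e-5, density_dry=4.0e-5)

# Step 2: P(V = 1 | s_n, h0) from the branch densities of s_n given h0.
p_rain_s = update_rain_probability(p_rain_h0, density_rain=3.0e-5, density_dry=1.0e-5)

# Step 3: posterior density of H_n as the weighted sum of branch posteriors.
posterior = mixture_density(p_rain_s, density_rain=5.0e-5, density_dry=2.0e-5)
```

The same update function serves both conditioning steps because each step has the same form: a prior branch probability multiplied by a branch-specific density and then normalized.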

Generation of Ensemble Flow Forecasts
In order to convert the probabilistic flow forecast obtained by the PD-HUP-GMM method into an ensemble flow forecast, the expected value of the probabilistic flow forecast under each rainfall condition is taken as one member of the flow forecast ensemble.
First, the expected flood value under each rainfall forecast is as follows.
where $H_{n,t}$ represents the simulated value of the $n$th flood ensemble forecast member at time $t$, and $E(H_{n,t})$ is the expected value of $H_{n,t}$. Then, the ensemble flow forecast result can be expressed as follows.
where $T$ represents the number of periods of the simulated runoff sequence, and each row in the matrix represents a member of the flood ensemble forecast. The flow chart describing the Bayesian ensemble forecast method is shown in Figure 1.
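The expected value of each posterior density can be approximated numerically. The sketch below uses the trapezoidal rule on a fixed flow range and renormalizes by the integrated mass, which guards against truncation of the tails. The integration range, the step count, and the function names are assumptions made for illustration.

```python
from statistics import NormalDist


def expected_value(pdf, lower, upper, steps=2000):
    """Trapezoidal estimate of E[H] for a density on [lower, upper]."""
    width = (upper - lower) / steps
    mass = 0.0
    moment = 0.0
    for i in range(steps + 1):
        h = lower + i * width
        weight = 0.5 if i in (0, steps) else 1.0
        contribution = pdf(h) * weight * width
        mass += contribution
        moment += h * contribution
    return moment / mass


def build_ensemble(posteriors, lower, upper):
    """Rows are ensemble members, columns are time steps."""
    return [[expected_value(pdf, lower, upper) for pdf in member]
            for member in posteriors]


# Stand-in posteriors: two members, two time steps (made-up normal densities).
posteriors = [
    [NormalDist(5.0, 1.0).pdf, NormalDist(6.0, 1.0).pdf],
    [NormalDist(4.0, 0.5).pdf, NormalDist(7.0, 0.5).pdf],
]
ensemble = build_ensemble(posteriors, lower=0.0, upper=12.0)
```

In practice each entry of `posteriors` would be the mixture posterior density for one rainfall ensemble member at one time step.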

Data
The Three Gorges Reservoir (TGR), located at Yichang on the Yangtze River in China, was selected as the case study. It is the world's largest hydroelectric project, and its geographical location is shown in Figure 2. The main functions of the TGR are flood control, power generation, and navigation. The basin area of the TGR is 1 × 10⁶ km², the surface area of the reservoir is about 1080 km², and the average width is about 1100 m [29]. The historical daily inflow and precipitation data of the TGR during the flood seasons from 2006 to 2017 were used in this study. The EWF data come from the second version of the Global Ensemble Forecast System (GEFS) developed by the National Centers for Environmental Prediction (NCEP), one of the most widely used numerical forecasting systems in the world. The data set contains 11 ensemble forecast members, and its spatial resolution in latitude and longitude is 1° × 1°. GEFS data from January to September 2017 were used in this study. The Xin'an Jiang model was selected as the deterministic forecast model of the PD-HUP-GMM method, with the precipitation and evaporation in the study area as inputs. The forecast period was restricted to one day ahead.

Correction of Ensemble Weather Forecasts
First, the order $p$ was determined according to the AIC. In this paper, $p$ takes the values 1, 2, ..., 10, and the AIC value of each ensemble member for each order is shown in Figure 3. The value with the minimum AIC ($p = 8$) was selected as the regression order of the AR model. Then, the mean absolute error (MAE) and root-mean-square error (RMSE) were used to evaluate the performance of the AR model. The MAE and RMSE are defined as follows.
$$MAE = \frac{1}{N}\sum_{t=1}^{N}\left|F_t - O_t\right|$$
$$RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(F_t - O_t\right)^2}$$
where $F_t$ represents the forecast precipitation, $O_t$ represents the observed precipitation, and $N$ is the length of the precipitation sequence. The MAE and RMSE values of each GEFS member before and after correction are shown in Table 1. According to Table 1, after AR model correction, the MAE of the precipitation data of the GEFS members decreased by an average of 18.10 m³/s, while the RMSE decreased by an average of 16.05 m³/s. This shows that the AR model can effectively reduce the precipitation deviation of GEFS. However, there are still some biases and uncertainties in the corrected GEFS precipitation data. Therefore, the PD-HUP model was adopted in this paper to further deal with the uncertainty of the hydrological forecast caused by rainfall forecast bias.
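The two error measures are straightforward to compute. The sketch below uses their standard definitions over paired forecast and observed series; the function names and the sample values are illustrative only.

```python
import math


def mean_absolute_error(forecast, observed):
    """Average absolute difference between forecast and observation."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


def root_mean_square_error(forecast, observed):
    """Square root of the average squared difference."""
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(observed))


# Made-up series for illustration.
forecast = [1.0, 2.0, 3.0]
observed = [1.0, 1.0, 5.0]
mae = mean_absolute_error(forecast, observed)
rmse = root_mean_square_error(forecast, observed)
```

Because the RMSE squares each difference before averaging, it penalizes a few large misses more heavily than the MAE does, which is why the two are reported together.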

Generation of Ensemble Flow Forecasts
The observed flow data in the flood seasons from 2006 to 2016 were used to fit the marginal distribution. To obtain the marginal distribution function of the simulated flow, the Xin'an Jiang model was first used to calculate the simulated flow for the same period as the observed flow, with the lead time $n$ set to one day. The number of days with effective rainfall is 1178, and the number of days with negligible rainfall is 164.
Following the ideas of Feng et al. [28], a GMM with three Gaussian components was used to fit the observed and simulated inflow data of the TGR. Maximum likelihood estimation (MLE) was used to estimate the parameters of the marginal distributions under effective and negligible rainfall conditions, respectively. The estimated parameter values are given in Tables 2 and 3, and the fitted marginal CDFs are shown in Figure 4. As can be seen from Figure 4, the GMM fits the marginal distributions of the observed and simulated flow well under both effective and negligible rainfall conditions.

Parameters of PD-HUP
The prior density, likelihood function, and posterior density are calculated according to Equations (15)-(18). The parameters of the prior density and likelihood function in the transformed space are shown in Table 4, and the parameters of the posterior distribution in the transformed space are shown in Table 5.

Ensemble Forecast Analysis
In order to verify the effectiveness of the proposed method, the precipitation and inflow data of the TGR in the flood season of 2017 were taken as model input data, and the above method was used to calculate the ensemble inflow forecast of the TGR. Three scenarios, shown in Table 6, were set up for comparative analysis. The Nash-Sutcliffe efficiency (NSE) and RMSE were selected to evaluate the accuracy of the forecast results under the different scenarios, and the variance was used to evaluate the uncertainty of the Bayesian ensemble forecast result, as discussed in the following subsections. The simulation results are shown in Figure 5. The formula for calculating the NSE is as follows, and the calculated indicators are shown in Table 7.
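As a reference for how the NSE values are read, the sketch below uses the standard Nash-Sutcliffe definition: one minus the ratio of the squared simulation error to the squared deviation of the observations from their mean. The function name and the sample values are illustrative.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error / spread


# Made-up series for illustration.
observed = [1.0, 2.0, 3.0]
perfect = nash_sutcliffe(observed, [1.0, 2.0, 3.0])   # exact match
baseline = nash_sutcliffe(observed, [2.0, 2.0, 2.0])  # always the observed mean
partial = nash_sutcliffe(observed, [1.0, 2.0, 4.0])
```

A value of 1 indicates a perfect match, 0 means the simulation is no better than always predicting the observed mean, and negative values mean it is worse than that baseline.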
It can be seen from the simulation results that, when the observed historical rainfall data are used as the input of the hydrological model, the NSE of the simulation results reaches 0.92, as shown in Table 7, indicating that the Xin'an Jiang model can be used as a deterministic model for hydrological prediction in the research area. When GEFS data are used directly as the input of the Xin'an Jiang model, the simulated flows are significantly larger than the actual runoff process (NSE = −7.14). When the GEFS data are corrected with the AR model and the corrected data are used as the input of the Xin'an Jiang model, the simulation accuracy improves significantly. However, comparing the ensemble average with the observed inflow process, the NSE improves to only 0.56, which does not meet the demand for prediction accuracy in actual production. This is mainly because the uncertainty of the rainfall input data affects the accuracy of the hydrological model simulation results. With the PD-HUP-GMM method proposed in this paper, the simulation results of the hydrological model are effectively improved: the NSE of the ensemble average reaches 0.91, and the RMSE decreases by 52.6% compared with the scenario that directly adopts the corrected GEFS data. This shows that the proposed method greatly improves the accuracy of the inflow forecast of the TGR and is sufficient to meet the demand for hydrological forecast accuracy in actual production.

Uncertainty Evaluation of Ensemble Prediction Results
In order to analyze the uncertainty of hydrological model simulation results under different scenarios, the variances of ensemble members in different time periods and different scenarios are calculated. The formula for variance used is shown below.
where $n$ is the number of ensemble members, $Q_{i,t}$ represents the prediction result of the $i$th ensemble member at time $t$, and $\bar{Q}_t$ is the mean value of the prediction results of all ensemble members at time $t$.
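The ensemble spread at each time step can be computed as sketched below. The example assumes the population form of the variance (division by $n$ rather than $n - 1$) and stores the ensemble with one row per member and one column per time step. The function name and the sample values are hypothetical.

```python
def ensemble_variance(ensemble):
    """Variance across members at each time step (population form, 1/n)."""
    n = len(ensemble)
    steps = len(ensemble[0])
    variances = []
    for t in range(steps):
        values = [member[t] for member in ensemble]
        mean_t = sum(values) / n
        variances.append(sum((q - mean_t) ** 2 for q in values) / n)
    return variances


# Two members, two time steps (made-up flows).
spread = ensemble_variance([[1.0, 2.0], [3.0, 2.0]])
```

A smaller variance at a given time step means the ensemble members agree more closely there, which is the sense in which the scenarios are compared.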
The variance of ensemble members in each scenario changes over time as shown in Figure 6.