Many regions, especially in developing countries, suffer from severe water stress resulting from heterogeneous precipitation, extreme hydrological events, water shortages, and numerous demands from socio-economic and natural systems [1
]. However, extensive uncertainties in hydrological processes are among the major challenges to deriving reliable water system management strategies. These uncertainties have been addressed in a large number of studies [6
]. For hydrological processes, the variation of natural circumstances, the lack of historical records, and the limitation of measurement may cause uncertainties in input information, i.e., randomness in the streamflow inputs [11
]. Moreover, the interactive relationship among historical streamflow records may lead to uncertainties in hydrological predictions, e.g., sub-optimal parameter values and errors due to incomplete or biased model structures. All these uncertainties may affect the results of frequency analysis for streamflow. Consequently, it is necessary to develop effective tools to identify and analyze these uncertainties in order to preserve the distributional characteristics of streamflow records, maintain the dependence structure of such records, and quantify the uncertainties existing in the hydrological processes.
In order to identify and analyze uncertainties in hydrological processes, a large number of methods have been proposed in recent decades [12
]. For example, a regional frequency analysis of annual maximum streamflow for Gipuzkoa was developed by Erro and López [21
]; the results showed that Gipuzkoa could be characterized by the generalized logistic distribution. Schnier and Cai proposed a model tree ensemble (MTE) method to predict streamflow frequency statistics and found that MTEs outperform global multiple-linear regression models for predictions in ungauged watersheds [22
]. A non-stationary flood frequency model was proposed by Zhang et al. [23
] to analyze the annual peak streamflow data in the Pearl River Basin, China; the study showed that streamflow trends significantly affected the estimation of flood frequencies and investigated the reasons for changes in hydrological extremes. Among these methods, the Bayesian network has recently been used because it makes it possible to quantify the uncertainty in parameters and model representation [24
]. For instance, Nagarajan et al. utilized a scalable spatiotemporal method based on the Bayesian network to estimate streamflow [25
]; the results indicated that the developed method achieved good prediction accuracy. Mediero et al. applied the Bayesian network method to a homogeneous region in the Tagus Basin to estimate the flood quantile [26
]; the results, expressed as a probability distribution of discharges, could supply information about the prediction uncertainty. The formal safety assessment and the Bayesian network technique were combined by Zhang et al. [27
] to estimate the navigational risk of the Yangtze River, China; the advantage of the proposed model was that both probability and consequences of accidents were considered. D’Addabbo et al. [28
] employed the Bayesian network to monitor flood events, which proved that the Bayesian network method can help gain insight into the complex phenomena related to floods.
As a probabilistic computational structure, the Bayesian network infers the joint probability distributions of the related random variables from observations. In addition, the Bayesian network allows different sources of estimation uncertainty to be taken into account. However, it remains a challenge whether the joint probability distributions in the Bayesian network can accurately address the interactive and dependence structure between related random variables. The copula method has been widely applied to complex hydrologic phenomena with strongly correlated variables. Using the copula method, one may successfully capture the non-linear dependence between random variables. Given its capability to reflect correlation between random variables, the copula method was introduced into the Bayesian network framework. For example, Madadgar and Moradkhani proposed copula-based modeling of Bayesian networks for seasonal drought forecasting [29
]; they then extended the work and proposed a probabilistic forecast model by using a statistical forecast model within Bayesian networks, which was applied to evaluate the spatial variations of future droughts in the Gunnison River Basin in Gunnison, CO, USA [30
]. However, in these copula-based approaches, the marginal distributions in the copula function were specified as parametric distributions, which may be unable to preserve the distributional characteristics of the historical records properly. In addition, the capability and efficiency of the copula-based Bayesian network approach have not yet been demonstrated for frequency analysis of monthly streamflow in rivers of China.
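For reference, the role of the marginal distributions in a copula model can be stated through Sklar's theorem. The notation below ($F_X$, $F_Y$, $C$, $u$, $v$) is introduced here for illustration only and is not taken from the original text.

```latex
% Sklar's theorem for two continuous random variables X and Y
% (e.g., streamflows in two adjacent months; symbols are illustrative):
F_{X,Y}(x, y) = C\bigl(F_X(x),\, F_Y(y)\bigr) = C(u, v),
\qquad u = F_X(x), \quad v = F_Y(y).

% The conditional distribution of Y given X = x follows from the copula:
F_{Y \mid X}(y \mid x) = \frac{\partial C(u, v)}{\partial u}.
```

Because the copula $C$ acts only on the transformed values $u$ and $v$, any misfit in the marginals $F_X$ and $F_Y$ carries over to the joint and conditional distributions, which is why the choice of marginal model matters.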
Therefore, this study aims to advance a maximum entropy-Archimedean copula-based Bayesian network (MECBN) method for frequency analysis of monthly streamflow. The MECBN method is then applied to a case study of the Kaidu River Basin in the Xinjiang Uyghur Autonomous Region, an arid region in northwest China. In the MECBN method, the maximum entropy-Archimedean copula approach is incorporated into a Bayesian network framework according to the dependences between adjacent monthly streamflows in the Kaidu River Basin. MECBN is used for: (i) assessing the uncertainties that exist in the model representation; (ii) addressing the correlation structure between streamflows in two adjacent months; and (iii) presenting the key statistical characteristics (i.e., mean, standard deviation, skewness and kurtosis) of the predicted streamflow.
In this study, a maximum entropy-Archimedean copula-based Bayesian network (MECBN) method has been developed to analyze the interaction of monthly streamflows. MECBN is effective for representing the uncertainties existing in the model representation, preserving the distributional characteristics of streamflow records and addressing the correlation structure between streamflow pairs. This study is the first attempt at introducing a maximum entropy-copula into the Bayesian network modeling framework. Compared to the conventional Bayesian network approach, MECBN is more effective for addressing the correlation structure between streamflow pairs with arbitrary random characteristics. Moreover, the marginal distribution of monthly streamflow is approximated by the maximum entropy method. Such a method can preserve key statistics and does not require specific distributional assumptions (e.g., normal or gamma) on the historical streamflow data. A limitation of MECBN is that the selection of copula functions needs more computation to ensure the validity of preserving the non-linear dependence structure. Moreover, the determination of parameters in the MECBN method is computationally cumbersome.
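The maximum entropy marginal mentioned above can be written in its generic textbook form. The sketch below assumes moment constraints up to order four, chosen to mirror the four statistics named in this study (mean, standard deviation, skewness and kurtosis); the exact constraint set used in MECBN is not stated in this section, and the symbols $f$, $m_i$ and $\lambda_i$ are introduced here for illustration.

```latex
% Maximize the Shannon entropy of the density f(x) of monthly streamflow
\max_{f} \; H(f) = -\int f(x) \ln f(x) \, dx
% subject to normalization and (assumed) moment constraints
\quad \text{s.t.} \quad \int f(x) \, dx = 1, \qquad
\int x^{i} f(x) \, dx = m_i, \quad i = 1, \dots, 4.

% The solution has the exponential-polynomial form
f(x) = \exp\!\Bigl( -\lambda_0 - \sum_{i=1}^{4} \lambda_i x^{i} \Bigr),
% where the Lagrange multipliers \lambda_0, ..., \lambda_4 are found
% numerically so that the constraints hold for the sample moments m_i.
```

No parametric family is chosen in advance under this formulation; the shape of $f$ is determined by the constraints, and solving for the multipliers numerically is one source of the computational burden noted above.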
The proposed method has been applied to frequency analysis of monthly streamflow in the Kaidu River Basin. The results indicate that (a) MECBN can preserve the key statistical characteristics (i.e., mean, standard deviation, skewness and kurtosis) for each monthly streamflow in the Kaidu River Basin; (b) the Gumbel–Hougaard copula among Archimedean copulas is the most appropriate for reflecting the interactions of monthly streamflow pairs, with the minimum values of AIC and RMSE; (c) the joint distributions, the conditional distributions and the dependence measures including Spearman’s Rho, Kendall’s Tau and the tail dependence coefficient reveal the upper tail dependence between monthly streamflow pairs in the Kaidu River Basin; (d) the goodness-of-fit statistical tests show a good performance of MECBN in describing the historical data of the Kaidu River Basin, China.
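As a minimal sketch of why the Gumbel–Hougaard copula is associated with upper tail dependence, the snippet below implements its standard closed-form expressions. The function names and the parameter value in the usage note are hypothetical and are not taken from the study; no fitted values from the Kaidu River Basin are reproduced here.

```python
import math


def gumbel_hougaard_cdf(u, v, theta):
    """Gumbel-Hougaard copula C(u, v) for u, v in (0, 1) and theta >= 1.

    C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta))
    """
    if theta < 1:
        raise ValueError("theta must be >= 1 for the Gumbel-Hougaard copula")
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def kendall_tau(theta):
    """Kendall's tau implied by theta: tau = 1 - 1/theta."""
    return 1.0 - 1.0 / theta


def theta_from_tau(tau):
    """Inverse of kendall_tau; valid for 0 <= tau < 1."""
    return 1.0 / (1.0 - tau)


def upper_tail_dependence(theta):
    """Upper tail dependence coefficient: lambda_U = 2 - 2^(1/theta)."""
    return 2.0 - 2.0 ** (1.0 / theta)
```

With the illustrative value `theta = 2`, `kendall_tau(2.0)` gives 0.5 and `upper_tail_dependence(2.0)` gives `2 - sqrt(2)`, roughly 0.586. At `theta = 1` the copula reduces to independence and the upper tail dependence coefficient is zero.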
The frequency analysis of monthly streamflow in the Kaidu River Basin indicates that the interactions between adjacent monthly streamflow pairs are non-linear. Compared with the other months, April has low rainfall but high streamflow because snowmelt is the main source of streamflow in April in the Kaidu River Basin. The upper tail dependence between monthly streamflow pairs in the Kaidu River Basin indicates that the streamflow of one month depends on the streamflow of its adjacent month. It also implies a high probability that one monthly streamflow exceeds a high threshold given that its adjacent monthly streamflow exceeds that threshold. The upper tail dependence varies with streamflow status (95%, 97.5% and 99% quantiles of streamflow) and is described by the upper tail dependence coefficients. Spearman’s rho, Kendall’s tau and the upper tail dependence coefficient are inversely related to monthly streamflow values in the Kaidu River Basin. This is because a variety of factors (i.e., rainfall, snowmelt, evapotranspiration rate and water use requirements) affect the streamflow in the Kaidu River Basin in the flooding season, leading to less influence from the streamflow in the antecedent month. These findings are significant for the Kaidu River Basin, which is located in the arid region of China. They can provide vital information for the prevention and control of hydrological extremes and for further water resources planning in the Kaidu River Basin.
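The conditional exceedance reading of upper tail dependence described above can be illustrated with a simple empirical estimate. This is a sketch under stated assumptions, not the estimator used in the study: the function names are hypothetical, ranks are scaled by n + 1, and ties in the data are not given special treatment.

```python
def pseudo_observations(x):
    """Rank-transform a sample to (0, 1) using rank / (n + 1).

    Assumption: ties are broken by input order, not averaged.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def conditional_exceedance(x, y, q):
    """Empirical P(Y above its q-quantile | X above its q-quantile).

    x and y are paired samples (e.g., streamflow in two adjacent months);
    q is a probability level such as 0.95, 0.975 or 0.99.
    """
    u = pseudo_observations(x)
    v = pseudo_observations(y)
    both = sum(1 for a, b in zip(u, v) if a > q and b > q)
    cond = sum(1 for a in u if a > q)
    return both / cond if cond else float("nan")
```

Evaluating `conditional_exceedance` at increasing levels `q` shows how the dependence behaves in the upper tail; values that stay well above `1 - q` suggest that high flows in one month tend to coincide with high flows in the adjacent month. With short records, few points lie above the 99% level, so such estimates are noisy.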