A C-Vine Copula-Based Quantile Regression Method for Streamﬂow Forecasting in Xiangxi River Basin, China

: In this study, a C-vine copula-based quantile regression (CVQR) model is proposed for forecasting monthly streamﬂow. The CVQR model integrates techniques for vine copulas and quantile regression into a framework that can effectively establish relationships between the multidimensional response-independent variables as well as capture the upper tail or asymmetric dependence (i.e., upper extreme values). The CVQR model is applied to the Xiangxi River basin that is located in the Three Gorges Reservoir area in China for monthly streamﬂow forecasting. Multiple linear regression (MLR) and artiﬁcial neural network (ANN) are also compared to illustrate the applicability of CVQR. The results show that the CVQR model performs best in the calibration period for monthly streamﬂow prediction. The results also indicate that MLR has the worst effects in extreme quantile (ﬂood events) and conﬁdence interval predictions. Moreover, the performance of ANN tends to be overestimated in the process of peak prediction. Notably, CVQR is the most effective at capturing upper tail dependences among the hydrometeorological variables (i.e., ﬂoods). These ﬁndings are very helpful to decision-makers in hydrological process identiﬁcation and water resource management practices.

• A C-vine copula-based quantile regression (CVQR) model is developed. • The CVQR model is applied to monthly streamflow forecasting in the Xiangxi River basin. • It can establish relationships between multidimensional response and independent variables. • It can also capture tail or asymmetric dependences such as extremes values.

•
The results are helpful to decision-makers in water resource management practices.

Introduction
With continuously growing populations, water resources are becoming more and more important for urbanization and agricultural intensification, especially for developing countries [1][2][3]. In the process of water resource planning, streamflow forecasting plays a key role in hydrological risk assessment, reservoir operations, drought/flood prevention, and water resource allocation [4][5][6]. More importantly, the management efficiency of water resource systems mainly depends on the reliability and accuracy of hydrological prediction. Consequently, it is desirable to employ streamflow forecasting models for effective water resources planning and management.
Over the last few decades, great efforts have been made towards developing advanced forecasting techniques to improve hydrological prediction, including process-driven and data-driven statistical approaches [7][8][9]. Process-based modeling methods are based on the principle of water cycle balance coupling various physical processes, such as precipitation, evaporation, infiltration, and other processes [10,11]. These models use large amounts of data (e.g., hydrometeorology, topography, and land use/cover) and robust calibration techniques, while data-driven models can be easily built in practice without considering physical process information from hydrological models and have been extensively used [12][13][14]. Therefore, data-driven technology is very useful and valuable as an option for streamflow forecasting.
Previously, a variety of data-driven modeling techniques were proposed and promoted for streamflow forecasting, including autoregressive moving average, multiple linear regression (MLR), stepwise cluster analysis, artificial neural networks (ANN), genetic programming, and support vector regression (SVR) [15][16][17]. For example, Besaw et al. [18] employed the ANN method for streamflow forecasting in ungauged basins. The results showed that local climate measurements with time delays as the input to the model are key to improving hydrological forecasting. Guo et al. [19] coupled an SVR model with adaptive insensitive factors to predict monthly streamflow, which was proven to be effective and to have high accuracy in streamflow prediction. Terzi and Ergin [20] used autoregressive (AR) modeling, gene expression programming (GEP), and adaptive neuro-fuzzy inference system (ANFIS) to predict the monthly mean flow of a watershed in Turkey. The results indicated that the developed models had good performance. Fan et al. [21] established a stepwise cluster forecasting (SCF) model for monthly streamflow forecasting, which effectively reflected the nonlinear and discrete relationships between climatic factors and streamflow. In general, these data-driven techniques can effectively simulate hydrological elements by capturing the complex interrelationships among the multiple hydrometeorological inputs. However, these models can often be flawed when predicting outliers (such as flood events), leading to illusory relationships between the response and independent variables [22].
To overcome these limitations, in this study, the copula method is proposed to flexibly construct the joint distribution to describe the complicated dependence structure between stochastic variables. Copula functions have been extensively applied to construct multivariate models and forecasting in several areas such as flood frequency and drought analysis, rainfall and climate predictions, financial risks, and energy [23][24][25][26]. However, it is difficult to derive multivariate copulas directly. Fortunately, vines known as pair copula constructions (PCCs) can describe the correlation structures between high-dimensional response-independent variables, providing an efficient and flexible tool to analyze the dependency structures between complex coupled correlated variables [27]. Moreover, the vine copulas coupling the quantile regression provide a more complete statistical analysis of random relationships between stochastic variables, such as tail or asymmetric dependence. Specially, quantile regression (QR) was introduced by Koenker and Bassett to estimate the conditional quantiles [28]. Given the distribution of the variables, the QR method can capture the total variation, heavy tail, skewness, and kurtosis of variables and can support the calculation of confidence intervals. Moreover, the method can estimate the levels of risk in extreme cases [29,30]. Quantile regression has been successfully applied in various scientific fields, such as economics, finance, and medicine [31][32][33]. Therefore, this study integrates the copula and quantile regression methods to explore the complex dependence among variables. Notably, the data-driven model is often influenced by the division of training and validation data sets. In many cases, the simulation and validation effects of the model are often affected by the data inputs, especially in a changing climate environment. Therefore, in order to overcome the possible influence of different data inputs on the model and randomness errors in the simulation process, the calibration and verification data sets are divided at certain points with the five-fold cross-validation method. In this study, the predictions are repeated five times using different training and test data sets. Therefore, this study aims to develop a C-vine copula-based quantile regression (CVQR) model for streamflow forecasting. The proposed CVQR model can construct a conditional copula prediction model to capture the relationship between streamflow and hydrometeorology variables. The developed method has advantages in (i) modelling the dependence among the multidimensional response-independent variables, (ii) revealing the complicated interrelationships among hydrometeorological factors, and (iii) outperforming MLR and ANN on issues related to upper tail dependence (i.e., flood events). These findings are very helpful to decision-makers in hydrological process identification and water resource management practices.
In this study, the CVQR model is applied to the Xiangxi River basin to illustrate its applicability in streamflow prediction with multiple hydrometeorological factors. Specially, the structure of this article is as follows. Firstly, the MLR, ANN, and CVQR models are introduced in Section 2. Next, the study area and database, and the method of evaluation for the various functions are depicted in Section 3. In Sections 4 and 5, relevant results from the proposed model applied in our research area, and a comparison with and discussion about the results of different models are described.

Model Development
In this study, multiple linear regression (MLR), artificial neural network (ANN), and the proposed C-vine copula-based quantile regression (CVQR) models are used for streamflow forecasting. In the model development section, the MLR, ANN, and CVQR models are described, which together constitute the main modules of the proposed framework shown in Figure 1. Generally, the framework of this study entails the next four steps: (1-2) fitting and standardizing the predictors (i.e., x 1 , . . . x n−1 ) and predicted variable (x n ); (3) simulating the monthly streamflow for the calibration process using the MLR, ANN, and proposed CVQR models; and (4) performing monthly streamflow prediction during the calibration and verification periods based on the results of step 3 and comparing the results of R 2 , RMSE, and NSE for each model.

Multiple Linear Regression (MLR)
The purpose of multiple linear regression (MLR) is to investigate the relationship between the independent variables and a dependent variable. Assuming that the dependent variable y is a function of n independent variables x 1 , x 2 , x 3 , ..., x n , then the MLR can be expressed as follows: where a indicates the intercept; b 1 , . . . , b n are the slope coefficients of the corresponding independent variables; e is the random error; and y represents the independent variable. For more details, please refer to Yan and Su [34]. In this study, a generalized linear regression model is used to fit the relationship between the response variable y (monthly streamflow data) and the explanatory variables x (other hydrometeorological factors), and then, the model is used to predict the streamflow (y) with the new observations (x).

Multiple Linear Regression (MLR)
The purpose of multiple linear regression (MLR) is to investigate the relationship between the independent variables and a dependent variable. Assuming that the dependent variable y is a function of n independent variables x1, x2, x3, ..., xn, then the MLR can be expressed as follows:

Artificial Neural Networks (ANNs)
An artificial neural network is an information processing system inspired by biological neural networks (such as the brain). Artificial neural networks can model the complex relationships between the input and output by simulating human learning [35]. Neural networks can be described as simple processing nodes or neurons, which generally include inputs, weights, a sum function, an activation function, and outputs and perform the corresponding numerical operations in a specific order [36]. An ANN model is usually made up of three parts: the input layer, the hidden layer, and the output layer, each of which do not have a unique number of layers. Multilayer feedforward ANNs, also known as multilayer perceptron, are commonly used in drought and water resource management and contain one input layer, one or more hidden compute node layers, and one output layer [37]. The three-layered ANNs can be expressed as follows: iutput ith node f or the hidden layer H output ith node f or the hidden layer H output kth node f or the output layer O where w ij is the weight between node i of the hidden layer and node j of the input layer; w ki is the weight between the ith hidden layer node and the kth output layer node; b hi and b ok are the bias weights of ith node for the hidden layer and of the kth node for the output layer; and ϕ() and ψ() indicate the activation functions of the hidden and output layers, respectively. In this study, the multilayer feedforward ANNs with the back-propagation algorithm are used for monthly streamflow forecasting, and the number of hidden nodes is determined as five by the trial and error method. For more details, refer to Tan et al. [38].

Development of C-Vine Copula-Based Quantile Regression (CVQR) Model
In general, vine copulas are represented using a graph called R-vine, which consists of a series of trees (undirected acyclic graphs) [39]. Specially, the hierarchical structure, called a regular vine (R-vine), contains a series of connected trees T := (T 1 , T 2 , . . . , T d ) along with the series of edges E(T) := E 1 ∪ E 2 ∪ . . . ∪ E d−1 and the series of nodes N(T) := N 1 ∪ N 2 ∪ . . . ∪ N d−1 . However, regular vines in terms of pair-copulas are still very general and do not have unique decomposition. Thus, the canonical vine (C-vine) and the D-vine are two most common structures of regular vines [40]. C-vine has a stellar structure in their tree sequence, while D-vine has a path structure. In hydrological field in this study, the monthly streamflow is affected by various climatic and hydrological factors. Therefore, the runoff factor that has a strong dependence on all other variables is selected as the first root for C-vine construction instead of D-vines. Here, two five-dimensional examples of possible tree sequences are shown in Figure 2.

Copula Function
The general expression of bivariate copulas can be written as follows: where (x, y) are correlated random variables. θ can often be derived from Kendall's τ as a preliminary estimation, and (u x , u y ) are the marginal cumulative distribution functions of x and y, respectively. Kendall's τ is the rank correlation coefficient proposed by Kendall [41].
Let (x 1 , y 1 ), (x 2 , y 2 ), ..., (x n , y n ) be a set of observations of the joint random variables X and Y, respectively, and empirical Kendall's τ can be defined as τ = 2(C n − D n )/n(n − 1), where C n and D n indicate the number of concordant and discordant pairs, respectively. this study, the monthly streamflow is affected by various climatic and hydrological factors. Therefore, the runoff factor that has a strong dependence on all other variables is selected as the first root for C-vine construction instead of D-vines. Here, two five-dimensional examples of possible tree sequences are shown in Figure 2.

Copula Function
The general expression of bivariate copulas can be written as follows: , ; x y H x y C u u θ = where (x, y) are correlated random variables. θ can often be derived from Kendall's τ as a preliminary estimation, and (ux, uy) are the marginal cumulative distribution functions of x and y, respectively. Kendall's τ is the rank correlation coefficient proposed by Kendall [41]. Let (x1, y1), (x2, y2), ..., (xn, yn) be a set of observations of the joint random variables X and Y, respectively, and empirical Kendall's τ can be defined as  A d-dimensional copula C: [0, 1] d → [0, 1] with uniformly distributed marginals U (0, 1) on the interval [0, 1] was introduced by Sklar [42]. According to Sklar's theorem, every joint cumulative distribution function (CDF) F on R d with marginals F 1 (x 1 ), F 2 (x 2 ), . . . , F d (x d ) can be written as follows: Similarly, the multivariate density f (x 1 , . . , f d (x d ) and join probability density of copula c (u 1 , u 2 , . . . , u d ) can be written as follows: and vice versa: are the inverse distribution functions of the marginals.

Vine Copulas
For actual statistical inference, a d-dimensional copula density c can be decomposed into a product of d (d−1)/2 so-called pair-copula constructions (PCCs) based on bivariate (conditional) copulas [43]. The PCCs involve marginal conditional distributions of the form F(x |ω ). Joe [44] showed that, for every j, where ω = ω 1 , . . . , ω j , . . . , ω n is a n-dimensional vector, ω j is an arbitrarily selected component of the vector ω, and ω −j is a vector of ω without the jth component; h(x |ω ) is the conditional distribution function given the k-dimensional vector ω (i.e., h-function) [43].
Then, the C-vines with one node connected to all others is the focus of this study (as shown in a). The density of the d-dimensional C-vine can be factorized as follows [45]: where c i,i+j|1:(i−1) are the bivariate (conditional) copula densities, index j indicates the trees, while i runs over the edges in each tree.
In order to understand the decomposition of C-vine structures, only 5-dimensional C-vine structure is taken as an example to show the pair-copulas of vine structure decomposition in Figure 2a, that is, the joint density of C-vine copula can be decomposed into the following: where c 12 (F 1 (x 1 ), F 2 (x 2 )), denoted as c 12 , represents the density function of pair-copula with marginal distributions F 1 (x 1 ) and F 2 (x 2 ). According to the joint density of a C-vine copula presented in Equation (9), a Cvine copula with a certain order for given data can be fitted using all of the pair-copulas (conditional bivariate copulas). Then, the conditional distribution function C 34|12 and C 35|12 from tree 3, with edges F 3|12 (x 3 |x 1 , x 2 ), F 4|12 (x 4 |x 1 , x 2 ), and F 5|12 (x 5 |x 1 , x 2 ), can be obtained using Equation (7) along with C 3|12 , C 4|12 , C 5|12 and C 12 , C 13 , C 14 , C 15 from the first two trees. In general, the whole inferences for the conditional distribution function of predicted variable x 5 given x 1 , x 2 , x 3 , and x 4 can be decomposed recursively from the bivariate copulas as follows:

CVQR Model
Generally, taking the bivariate copula as an example, the condition distribution function of Y under the condition of X = x, i.e., F Y|X (y|x) can be expressed as follows: where u = F X (x), v = F Y (y) are the cumulative distribution function of y and x, respectively. For any probabilities τ ∈ (0, 1) (e.g., τ = 0.05, 0.1, . . . , 0.95), the τth quantile function of Y given X = x from C 1 (F X (x), F Y (y)) can be derived from the h-function: where h −1 () indicates the inverse conditional distribution function (inverse h-function) of a given parametric bivariate copula.
In this study, the main purpose of the C-vine copula-based quantile regression (CVQR) model is to predict the quantile of a response variable Y given the outcome of some predictor variables. For the five-dimensional case, according to Equations (10)-(13), the τth conditional quantile function of x 5 , Q x 5 (τ|x 1 , x 2 , x 3 , x 4 ), can be derived from the recursive formulation: A C-vine copula-based quantile regression (CVQR) model is developed for monthly streamflow forecasting coupling a C-vine copula model and a quantile regression method within a general optimization framework. Specially, the CVQR model is constructed by modelling the distributions of predictors (i.e., x 1 , . . . x n−1 ) and predicted variable (x n ) with the selected n-d C-vine (structure), i.e., unconditioned and conditioned pairs (e.g., Equation (9)); then, the predicted variable of x n is derived from the conditional distribution function (Equations (10)- (14)). In detail, the predicted variable x 5 can be obtained from the given predictor variables x 1 , x 2 , x 3 , and x 4 . Firstly, the Monte Carlo simulation is used to generate a sample of 5000 uniformly distributed random numbers spaced [0, 1] as the quantiles τ. Secondly, the 5000 implementations of x 5 can be generated using Equation (14), with one random number generated for each quantile τ. Then, the average of these realizations is considered the general prediction.
A recommended tool for statistical inference of vine copulas is statistical software R with the VineCopula package (http://CRAN.R-project.org/, accessed on 20 January 2021). In this study, the Archimedean copula family (Frank, Clayton, and Gumbel copulas [46,47]) and Normal and Student's t copulas are employed to build the C-vine structures. The optimal bivariate copula families associated with parameter estimation are selected and calculated depending on the AIC and BIC using the maximum likelihood estimation (MLE) for the first C-vine tree. Then, based on these pair-copula families and the corresponding estimated parameters, the h-function can be used to calculate and specify the pair-copula input for the second C-vine tree. The process is iterated tree by tree until the last paircopula is evaluated. The building steps were detailed in Brechmann and Schepsmeier [48]. Meanwhile, the goodness-of-fit test includes the λ-function and Kolmogorov-Smirnov (KS) test with p-values and statistics (Sn) to check whether the selected copula is suitable for describing the observed dependencies, where the λ function is defined as follows: where K(v, θ) = P(C(u 1 , u 2 |θ) ≤ v) is the Kendall distribution function of copula C with parameter θ, and v ∈ [0, 1], and (u 1 , u 2 ) are the marginal cumulative distribution func-tions of copula C. The λ-function can be obtained by the 'BiCopLambda' function in the VineCopula package. For more descriptions, please refer to Genest and Favre [49], and Genest and Rivest [50]. In general, the main procedures of the proposed CVQR model for monthly streamflow predictions can be expresses as follows: Step 1: Fit optimal marginal distributions, denoted as u i = F i (x i ), (i = 1, 2, . . . , d); Step 2: Model the joint probability distributions C (u 1 , u 2 ), . . . , C (u 1 , u d ), and then, the C-vine copula is iterated tree by tree until the last pair-copula is evaluated Step 3: Calculate the conditional distribution of the predictive variable (monthly Step 4: Generate uniformly distributed random numbers τ, and then, predictive variable is derived from the inverse function of the conditional distribution in Step 3, that is,

Study Area and Datasets
Application of the proposed approach is proven to forecast monthly streamflow in the Xiangxi River basin, which is located in the western Hubei province and is part of the Three Gorges Reservoir region with a basin area of about 3100 km 2 (between 30 • 57 -31 • 34 N and 110 • 25 -111 • 06 E, shown in Figure 3) in China. The Xiangxi River, originating in the Shennongjia Mountain area, is a tributary of the Yangtze River with a main stream length of 94 km [51,52]. Due to the influence of typical subtropical continental monsoon climate characteristics, the annual precipitation in this basin is between 670 and l700 mm [53]. The annual average temperature of this region is 15.6 • C and ranges between 12 • C and 20 • C.
The amount of streamflow is affected by many factors, a large part of which involve geographical and climatic conditions. Specifically, the climatic conditions consist of a collection of meteorological variables such as the air temperature ( • C) and the precipitation (mm). Previous studies have proven that precipitation has a significant effect on both short-and long-term streamflow [54,55]. Therefore, the total monthly precipitation is used as a predictor in this study. Most importantly, the initial catchment conditions are nonnegligible factors affecting the streamflow generation and confluence. Moreover, the monthly average temperature is also applied as a predictor for streamflow forecasting [56]. It is noted that observations of hydrological processes tend to vary with time [57]. The occurrence of rainfall events is closely related to the fluctuation in streamflow, especially the distribution of a rainfall event is crucial to the influence of peak discharge (i.e., flood events). In addition, considering the climatic characteristics of the watershed, the snowmelt runoff (mainly in winter) is relatively little, so the influence of snowmelt runoff is ignored. The available hydrological (streamflow, unit: m3/s) and meteorological data (temperature and precipitation) from 1962 to 2009 were obtained from the Xingshan Hydrometric Station (located at 110 • 45 0" E, 31 • 13 0" N, as shown in Figure 3), which was provided by the Hydrological Bureau of Xingshan County. Considering that Xingshan Hydrometric Station is the largest hydrological control station in Xiangxi watershed (the representative station of the Three Gorges Hydrological Zone between 1000-3000 km 2 ), the hydrometeorological data of Xingshan Station was used for the streamflow forecasting. Moreover, as a lumped hydrological model, good results have also been achieved in the process of streamflow simulation in the earlier study of Kong [51].

Evaluation Measures
In order to evaluate the performance of the developed models, in this study, four commonly used statistical evaluation methods are selected for model evaluation, including the coefficient of determination (R 2 ), the root mean square error (RMSE), and the Nash-Sutcliffe efficiency coefficient (NSE) and Mean Absolute Error (MAE). Then, the formulae for R2, RMSE, NSE, and MAE can be written as follows:

Evaluation Measures
In order to evaluate the performance of the developed models, in this study, four commonly used statistical evaluation methods are selected for model evaluation, including the coefficient of determination (R 2 ), the root mean square error (RMSE), and the Nash-Sutcliffe efficiency coefficient (NSE) and Mean Absolute Error (MAE). Then, the formulae for R 2 , RMSE, NSE, and MAE can be written as follows: where n indicates the total number of observations (or predictions), K is the number of repeated forecasting periods (K = 5), Q i and P i are the observed and simulated values; Q avg and P avg are the averages of all of the observed and simulated values, respectively. The 90% confidential interval containing ratio (CR90) and its dispersion index (DI) are also used to evaluated the reliability and sharpness of the probabilistic predictions, respectively. CR90 is the ratio of observations covered by the 90% prediction interval. The range is between 0 and 1, and the best effect is 0.90. DI is the ratio of the average width of the 90% prediction interval to the observed value, with the lower the value, the better the prediction [60].
where k i indicates the ith observation o i in the 90% confidence interval with the bound [s l (i), s u (i)] and N is the number of observations. Notably, from the perspective of flood forecasting, A high CR90 is still insufficient to illustrate a good prediction, and a high corresponding DI indicates an overestimation of uncertain boundaries.
To further illustrate the applicability of the CVQR model in streamflow forecasting, the relative estimated root mean square error (RRMSE) and relative mean absolute error (RMAE) are used to evaluate the comparison between the CVQR, ANN, and MLR models at different quantiles [61]: in which the RMSE and MAE of the three models are acquired from Equations (17) and (18); RMAE and RRMSE stand for the relative performances of the proposed model (CVQR), for which values greater than one suggest a worse relative performance compared to the proposed model.

Marginal Probability Distribution Functions of C-Vine Model Variables
A two-step approach that separately evaluates the dependence function and the marginals is of great advantage in stochastic modeling of multivariate data, since many manageable distribution models are available for simulating the marginal distributions. In this study, in order to build the CVQR model, firstly, after standardization, the data are fitted with some parametric distribution functions, including the gamma, lognormal, general extreme value (GEV), and Pearson type-III (P-III) distributions, which are commonly used parameter distributions to quantify the probability distribution characteristics of hydrometeorological variables in the hydrological process [62][63][64]. The expressions for the gamma, GEV, lognormal, P-III, and the associated parameter values for probability functions (PDFs) are shown in Table 2. The parameters of the above distributions were obtained through the Maximum Likelihood Estimation (MLE) method. Table 2. Parameters of optimal marginal distribution functions.  The goodness-of-fit (GOF) of each distribution was computed by using RMSE and AIC values to select the most appropriate distribution for fitting each individual variable. The results of GOF are presented in Table 3. The results demonstrate that all of the proposed four distribution models can be applied for processing the distributions of the variables (i.e., St-1, Pt-1, St-2, St-12, Tt, Pt, and St), except that the P-III distribution is not suitable for the average temperature (Tt). Specially, the P-III distribution are most suitable for the streamflow data series (i.e., St-1, St-2, St-12, and St), the Gamma distribution would perform best when fitting the distributions of precipitation data (Pt-1 and Pt), and the GEV method has advantages in quantifying the distributions of the average temperature (Tt).

Selection and Estimation of C-Vine Copula
In this section, we introduce how to define the C-vine structures according to the learning data obtained from Section 4.1. Figure 5 shows the pair plots of the learning data set. The histograms along on the diagonal represent the marginal distributions discussed in Section 4.1. Additionally, Figure 5 (above the diagonal) indicates the values of Kendall's τ between two pairs of the variables, and the results show that the correlation between the variable St-1 and other variables is approximately stronger than that other pair variables (i.e., Kendall's τ = 0.65, 0.46, 0.33, 0.40, 0.32, and 0.46). Therefore, we define the variable St-1 as the central variate 1 (e.g., in Figure 1) in the first tree. In detail, considering that the monthly streamflow (S) is affected by various climatic and hydrological factors, such as temperature and precipitation, the monthly streamflow at last month (St-1) is selected as the first root in the first tree. Moreover, the predicted variable (St) is placed last because it is the more convenient option to evaluate the probability of St and to predict the St. The rest of the tree structures follow this principle and so forth (e.g., as shown in Figure 1). In general, the order of these variables is 1-St-1, 2-Pt-1, 3-St-2, 4-St-12, 5-Tt, 6-Pt, and 7-St. Figure 5 (below the diagonal) shows scatter plots for each pair of learning data and provides a basis for revealing the dependence structures between the variables. For example, we may find that there exists a lower tail correlation between St-1 and St-2. Obviously, the Clayton copula can be used to fit the relationship between variables St-1 and St-2.
According to the process of construction of the bivariate copula, the vine copula is constructed by a series of pair-copulas iterated tree by tree. Table 4 presents the C-vine structures consisting of 6 trees, 21 nodes, and the corresponding bivariate copulas with the parameters for every edge and KS test statistics. As mentioned above, the variables from 1 to 7 correspond to St-1, Pt-1, St-2, St-12, Tt, Pt, and St, respectively. In fact, due to the flexibility of the vines' structure, this order of the variables above is only such structure. It is the best arrangement made by considering the dependence of the variables in practical applications in this study. Meanwhile, in the process of constructing the paired copula, the vine copulas are simplified by ignoring the conditional variables.
λ-function is used to test the goodness of fit for the estimation of bivariate copula in each C-vine structure. Figure 6 illustrates the dependence of St-1 and other variables with the main node in tree 1 using λ-function. The results indicate that the selected and empirical copula are consistent with each other in all edges of tree 1. As shown in Figure 6a, the empirical λ-function (black) of the observations and the theoretical λ-function (grey) of the fitted copula coincide with each other, which means that the fitted copula is consistent with the empirical values. Combined with the KS test results in Table 4, all other selected pair-copulas obtained the optimal fitting results with p > 0.05 for the KS test.       Figure 7 shows a comparison of the predicted and observed streamflow acquired by    (Figure 7a), the predicted value is slightly underestimated in the case of high flow observation values (1980)(1981)(1982)(1983)(1984)(1985)(1986), and vice versa, the predicted value is slightly overestimated during 2004-2009. Due to the inherent characteristics of the algorithm, the predicted values even become negative at some low-flow records (e.g., 1999 and 2000).

Predicted Monthly Streamflow of MLR, ANN, and C-Vine Models
The ANN model performs better than the MLR model in the calibration period (Figure 7b). The ANN model obtains an R 2 of 0.75, an NSE of 0.73, and an RMSE of 15.57 in the calibration period. Similar to the results of the MLR model, the ANN model, with values of R 2 at 0.72, NSE at 0.69, and RMSE at 16.53, performs worse in the validation period than that in the calibration period. Moreover, as presented in Figure 7b, the ANN model also underestimates some streamflow during the high flow periods (e.g., 1963-1964) but overestimates more records during 2004-2009. As presented in Figure 7c, the predicted monthly streamflow using the CVQR model could satisfy the observed values well. In the calibration period, the values of R 2 , NSE, and RMSE obtained by the CVQR model are 0.73, 0.70, and 16.75, respectively. In the validation period, the values are 0.74, 0.71, and 16.13, which shows that the performance of CVQR model in the validation period is similar to that in the calibration period. The CVQR model underestimates some high flow values (e.g., during 1980-1986). Generally, compared with MLR and ANN models, the CVQR model performs best in the calibration period for monthly streamflow prediction. The CVQR model can effectively capture both linear and nonlinear dependence of these input variables (e.g., temperature, precipitation, and streamflow). Additionally, the CVQR model based on the multivariate copula functions could effectively reveal the correlation structures between predictor-response variables, which provides a potent and adaptable tool to model the dependence of such complex and jointly correlated variables. and RMSE obtained by the CVQR model are 0.73, 0.70, and 16.75, respectively. In the validation period, the values are 0.74, 0.71, and 16.13, which shows that the performance of CVQR model in the validation period is similar to that in the calibration period. The CVQR model underestimates some high flow values (e.g., during 1980-1986). Generally, compared with MLR and ANN models, the CVQR model performs best in the calibration period for monthly streamflow prediction. The CVQR model can effectively capture both linear and nonlinear dependence of these input variables (e.g., temperature, precipitation, and streamflow). Additionally, the CVQR model based on the multivariate copula functions could effectively reveal the correlation structures between predictor-response variables, which provides a potent and adaptable tool to model the dependence of such complex and jointly correlated variables.  Table 5 illustrates the general resulting statistics from the ANN, MLR, and CVQR models for forecasting during the calibration and validation periods. For the results of R 2 , NSE, and RMSE, these results indicate that the ANN model performs best in the calibration period compared to the MLR and CVQR models while the proposed CVQR achieves the best results among the validation period compared to other models. However, the results show that ANN and CVQR performed best in terms of 90% confidence interval  Table 5 illustrates the general resulting statistics from the ANN, MLR, and CVQR models for forecasting during the calibration and validation periods. For the results of R 2 , NSE, and RMSE, these results indicate that the ANN model performs best in the calibration period compared to the MLR and CVQR models while the proposed CVQR achieves the best results among the validation period compared to other models. However, the results show that ANN and CVQR performed best in terms of 90% confidence interval prediction (CR90 and DI) while MLR performed worst. The result, on the other hand, shows that MLR is not effective in quantifying nonlinear relationships among hydrological variables. In general, the results show that CVQR performs best in the calibration period for monthly streamflow prediction compared to ANN and MLR models. Moreover, the CVQR and ANN models can reflect the complex nonlinear relationships between the hydrological and meteorological factors. Therefore, in order to understand the prediction performance of CVQR in the tail correlations, the comparison of regression predictions between the CVQR and ANN models at different quantiles are explored in the next section.

Probabilistic and Interval Predictions Obtained by the CVQR Model
As mentioned in Section 2.3, according to the C-vine copula-based quantile regression (CVQR) model, for any quantile τ ∈ (0, 1), the τth conditional quantile function of the predicted variable can be obtained. In this section, the relationships between the streamflow (St) abnormalities and other hydrometeorological indices at different levels of quantiles τ (i.e., τ = 0.05, 0.25, 0.50, 0.75, and 0.95) are explored.
The median prediction (i.e., α = 0.5) provides a general level about the monthly streamflow, while extreme values (e.g., flood, drought) in the upper tail (τ ≥ 0.75) or lower tail (τ ≤ 0.25) indicate the worst forecast scenarios. Table 6 describes the relative performance of the ANN model with respect to the CVQR model at different quantiles. It can be seen that the proposed CVQR model outperforms the ANN model at quantiles τ = 0.75 and 0.95 and that the ANN model performs better than the CVQR model at quantiles τ = 0.25 and 0.50, which indicate that the proposed CVQR model could perform better at upper extreme events (i.e., τ = 0.75 and 0.95 quantile levels) and that the ANN model provides good results in some cases of the mean and lower quantile values. A scatter diagram of the simulated streamflow at different quantiles (τ = 0.05, 0.25, 0.5, 0.75, and 0.95) by the ANN and CVQR models with five-fold K cross-validations is depicted in Figure 8. The results also show that the proposed CVQR model performs a better fit in most cases, especially in the process of upper tail predictions, which are consistent with the earlier study of Kong in Xiangxi River basin [51]. While the ANN tends to overfit overestimated in the aspect of upper tail prediction. In general, the CVQR model shows a higher accuracy at upper tail levels while the ANN model provides overestimation predictions. The results indicate that the CVQR model can effectively capture upper tail dependences and has a relatively accurate assessment of the impact of upper extreme conditions (i.e., flood) in Xiangxi watershed.   Figure 9 depicts the simulated streamflow with quantiles of 5% and 95% (90% uncertainty prediction intervals) using the ANN and CVQR models. The results indicate that the quantiles τ = 5% and 95% values of the predicted variable cover most of the observations and effectively reflect the fluctuation of the actual streamflow for the two models. Usually, hydrological forecasting in extreme cases can help policy makers make timely policy responses within the maximum risk range. The predicted 90% CI can reflect the fluctuation trend and abnormal value of the records well, whereas compared with the CVQR model, the ANN model often overestimates peaks in the prediction of flood events. Therefore, the CVQR model can effectively capture the complex nonlinear dependences among hydrological meteorological factors. This is of great significance to the practice of water resource management, for example, in rainy and dry seasons, managers can well prevent and control the occurrence of flood and make timely corresponding countermeasures.  Figure 9 depicts the simulated streamflow with quantiles of 5% and 95% (90% uncertainty prediction intervals) using the ANN and CVQR models. The results indicate that the quantiles τ = 5% and 95% values of the predicted variable cover most of the observations and effectively reflect the fluctuation of the actual streamflow for the two models. Usually, hydrological forecasting in extreme cases can help policy makers make timely policy responses within the maximum risk range. The predicted 90% CI can reflect the fluctuation trend and abnormal value of the records well, whereas compared with the CVQR model, the ANN model often overestimates peaks in the prediction of flood events. Therefore, the CVQR model can effectively capture the complex nonlinear dependences among hydrological meteorological factors. This is of great significance to the practice of water resource management, for example, in rainy and dry seasons, managers can well prevent and control the occurrence of flood and make timely corresponding countermeasures. Sustainability 2021, 13, x FOR PEER REVIEW 21 of 24 Figure 9. Comparison of the predicted and observed monthly streamflow using the CVQR and ANN models with τ = 5% and 95% (90% uncertainty prediction intervals).

Conclusions
In this study, a C-vine copula-based quantile regression (CVQR) model was developed to model the relationship between streamflow and other hydrometeorological variables, such as temperature and precipitation. The proposed CVQR model couples vine copulas (known as pair copula constructions) with a quantile regression method, which was applied to monthly streamflow forecasting in the Xiangxi River basin.
Specifically, the CVQR model could process multidimensional data problems while satisfying the wide range of dependence. Meanwhile, the CVQR model can effectively capture the upper correlations between independent and dependent variables (i.e., flood events). In this paper, comparisons between the proposed CVQR model and the MLR and ANN models for monthly streamflow prediction are explored. The results indicate that the performance of the CVQR model is most effective for monthly streamflow forecasting in the calibration period. The performance of the MLR model in extreme quantile (flood events) and confidence intervals is the worst and is mainly determined by the inherent characteristics of the algorithm. Compared with the MLR model, the ANN model has good advantages in this aspect of flood events and confidence intervals, but it tends to be over-fit in the process of peaks prediction. Undeniably, the CVQR model can effectively capture both the linear and nonlinear dependence of these input variables and to perform best when dealing with upper tail correlation issues (i.e., flood events) in this study.
In summary, this proposed method can effectually depict the complicated dependencies between the hydrometeorological variables. However, there still remain some Figure 9. Comparison of the predicted and observed monthly streamflow using the CVQR and ANN models with τ = 5% and 95% (90% uncertainty prediction intervals).

Conclusions
In this study, a C-vine copula-based quantile regression (CVQR) model was developed to model the relationship between streamflow and other hydrometeorological variables, such as temperature and precipitation. The proposed CVQR model couples vine copulas (known as pair copula constructions) with a quantile regression method, which was applied to monthly streamflow forecasting in the Xiangxi River basin.
Specifically, the CVQR model could process multidimensional data problems while satisfying the wide range of dependence. Meanwhile, the CVQR model can effectively capture the upper correlations between independent and dependent variables (i.e., flood events). In this paper, comparisons between the proposed CVQR model and the MLR and ANN models for monthly streamflow prediction are explored. The results indicate that the performance of the CVQR model is most effective for monthly streamflow forecasting in the calibration period. The performance of the MLR model in extreme quantile (flood events) and confidence intervals is the worst and is mainly determined by the inherent characteristics of the algorithm. Compared with the MLR model, the ANN model has good advantages in this aspect of flood events and confidence intervals, but it tends to be over-fit in the process of peaks prediction. Undeniably, the CVQR model can effectively capture both the linear and nonlinear dependence of these input variables and to perform best when dealing with upper tail correlation issues (i.e., flood events) in this study.
In summary, this proposed method can effectually depict the complicated dependencies between the hydrometeorological variables. However, there still remain some flaws in the process of model building. Pair-copula is joined by marginal distributions irrespective of the conditional variables, which simplifies the construction of vine copulas [65]. The structure of PCCs is often not unique due to the flexibility of vine copulas [66]. Moreover, the proposed model can be used to explore temporal and spatial dependencies among hydrological series while spatial dependence is not considered in this study [67]. Consequently, the model will be explored further in the application process of future extensions.  Data Availability Statement: Publicly available datasets were analyzed in this study. The data presented in this study are available at http://data.cma.cn/, accessed on 20 January 2021.