Vine-Copula-Based Quantile Regression for Cascade Reservoirs Management

: This paper features an application of Regular Vine (R-vine) copulas, a recently developed statistical tool to assess composite risk. Copula-based dependence modelling is a popular tool in conditional risk assessment, but is usually applied to pairs of variables. By contrast, Vine copulas provide greater ﬂexibility and permit the modelling of complex dependency patterns using a wide variety of bivariate copulas which may be arranged and analysed in a tree structure to explore multiple dependencies. This study emphasises the use of R-vine copulas in an analysis of the co-dependencies of ﬁve reservoirs in the cascade of the Saint-John River basin in Eastern Canada. The developed R-vine copulas lead to the joint and conditional return periods of maximum volumes, for hydrologic design and cascade reservoir management in the basin. The main attraction of this approach to risk modelling is the ﬂexibility in the choice of distributions used to model heavy-tailed marginals and co-dependencies.


Introduction
A successful reservoir system management requires pertinent information to make appropriate decisions about the release or diversion of water at a particular time t or over different periods of the year. Such choices are crucial and enormously complex because of the multiple uses of reservoirs such as flood control, hydroelectric production, irrigation, drinking and industrial water supply, navigation and water sports. For instance, fixing water upper extremes is very advantageous in flooding problems management, and coping with the destruction of water structures. In the same way, the utility of lower extremes management is establishing guidelines for drought management, water shortage management, and other disasters on flora and fauna. In practice, this type of management is dictated by well-defined rules and operation policies, often in the form of curves stipulating which action should be taken, depending on the current status of the system. The decision calls for optimization techniques examining all possible alternatives [1]. There is a multitude of methods that can be used for flood control management, for example implicit stochastic optimization (ISO) [2], explicit stochastic optimization (ESO) and parameterization-simulation-optimization (PSO) [3]. Each of these methods has its advantages and limitations, as discussed in [4]. More recently, the multiple reservoir management method, called the Aggregation-decomposition (AGDP) method, has been introduced to solve dimensionality problems in large-scale reservoir systems. It has been the subject of several studies, in particular those of [4][5][6] where different operating rules are developed.
In this framework, we discuss the best hydrological modelling approach for reproducing observed streamflow and handling extremes, low and high quantiles. Our method also deals with information transfer from one station to another, which improves the knowledge of geographical mechanisms behind an event. The derivation of the contribution of each upstream station in the downstream station quantiles can help predict runoff in ungauged stations and determine the responsible upstream stations of extreme downstream events.
To this aim, we introduce a new approach based on R-vine copulas. As their name suggests, they are based on the combination of theories of R-vines and copulas. The latter have been extensively used in hydrology. For instance, for the estimation of extreme values, Ref. [7] considered flooding as a joint behavior of peak and volume. Asymmetric copulas were used in [8] to model the structure of spatial interdependence between precipitation rate and its occurrence. Ref. [9] used copulas to model the correlation structure of runoff time series pairs and transfer properties from one basin to another. Ref. [10] used copulabased multivariate analysis with a joint return period to define particular T-year rainfall patterns. However, the multivariate case is not easy to handle with copulas. R-vine copulas propose the division of the multivariate structure to pair copulas, which are easier to manipulate. This approach has many remarkable properties, as detailed in [11,12]. Two particular types of R-vines (see Figure 1) are the canonical (C-) vine which has a star structure and the drawable (D-) vine, which has a path structure. The order of the nodes of the first tree entirely determines the construction of these two R-vine types, making them easy to use. R-vines are increasingly being used, principally in finance and climatology. For example, Ref. [13] use vine-copulas to analyse the interdependencies between bank and insurers in the financial market. Ref. [14] use C-vines to analyse stock exchange. Ref. [15] demonstrate the superiority of vine-copulas over conventional copulas when modelling the dependence structure of a credit portfolio. Ref. [16] used vine-copulas in trading strategies and asset allocation. Ref. [17] used R-vine models to investigate the existence of different global dependence regimes. In climatology, Ref. [18] considered a copula approach to model the dependence of multivariate factors of rainfall. Ref. [19] proved the existence of a relationship between the vine-copula parameters and the station distances in a daily mean temperature study. Ref. [20] used vine copulas to incorporate all relevant dependencies between storm variables. Ref. [21] develop a method using vine copulas to estimate prediction uncertainty in an environmental framework. Ref. [22] obtained a stochastic simulation of stream-flow scenarios using vine copulas. Ref. [23] use vine copulas to model the multivariate dependence among the glacier discharge and other related meteorological variables. More recently, Ref. [24] used vine-copulas to forecast the release of a hydro-power reservoir conditioned by its initial volume. Vine-copulas have also been used in diverse other fields, for example in the analysis of aviation safety [25] and to capture spatial and temporal dependence between hydroelectric power plants [26]. We can find in [27] a detailed survey of vine-copulas applications.
Our approach, called "the reduced-network vine" combines all the information available on all sections of a watershed. We prove throughout this article that, the notion of a "basin", can be captured through high quantile modelling. This approach will help to know the exact quantity of the quantiles of the volume of water. It will help quantify the minimum and maximum volumes expected in a dam and how to manage water requirements or face expected floods. It will also help decide how much excess water is to transfer and how much water should be released.
We capitalize on the argumentation of [28] for regular vines and the one of [29] for D-vines about the performance of quantile regression to other regression models. In both cases, vine models make less restrictive assumptions on the choice of the covariates influencing the response and can solve analytically for conditional quantiles. Unlike the classical approach (Kendall's Tau (τ) R-vine), our method finds the quantile regression for the station downstream due to the new constraint of having the station downstream estimated. Moreover, using the reduced-network R-vine copes with the imposed order condition in a D-vine, making the restrictive model assumptions easier to obtain and allowing the formulation of the conditional form.  Our contribution aims to estimate the quantiles of hydrological variables in stations located downstream of a basin. We show how the quantile regression obtained from an R-vine copula model based on the basin's structure allows us to find the desired conditional quantiles. Besides, our method makes it possible to find the station(s) responsible for extreme hydrological events downstream, and therefore, leads to a better-cascade dam management. We illustrate our approach by analysing the co-dependencies of five reservoirs in the cascade of the Saint-John River basin in Eastern Canada. The developed R-vine copulas lead to the joint and conditional return periods of water's maximum volume and the hydrologic design and cascade reservoir management in the basin.
In Section 2, we recall the main definitions of copulas and R-vine-copulas and how to obtain conditional copula expressions. We also detail how to estimate the conditional quantiles. In Section 3, we present our data and the results of the new R-vine quantile regression method for the studied variables with a discussion. In Sections 4 and 5, we discuss our main contributions and some perspectives for further studies. Supplementary material to our work is in Appendices A and B.

Copulas
A bivariate copula C is a distribution C : [0, 1] 2 → [0, 1] with uniform margins. The theorem of Sklar [30] shows that for a bivariate distribution F with continuous margins F 1 and F 2 , there exists a unique copula C(., .), such that: The following functions, related to copulas, can be found in [31,32]: (2)

R-Vines
We also recall the notion of a vine as introduced in [11].
T 1 is a tree with nodes N 1 = {1, . . . , n} and a set of edges denoted E 1 .

3.
For As described in [11], V is a regular vine when m = n, T i is a connected tree with edge set E i and node set In addition, the proximity condition must hold, i.e., for i = 2, . . . n − 1, if a = {a 1 , a 2 } and b = {b 1 , b 2 } are two nodes in N i connected by an edge then #a ∩ b = 1.

R-Vine Copulas
We define the specification of the R-vine copula corresponding to an R-vine as in [33].
Definition 2. Let F = (F 1 , . . . , F n ) be a vector of continuous invertible distribution functions, V an n-dimensional R-vine and B = {B e /i = 1, . . . , n − 1; e ∈ E i } where B e is a bivariate copula and E i is the edge set of tree T i of the R-vine V. The triplet (F, V, B) is called an R-vine copula specification.
Bedford et al. [11,34] showed that the density of an R-vine-copula, specified by the assignment of an appropriate bivariate copula to the edges of the R-vine, is equal to the product of the conditional and unconditional copula assigned to its edges. More precisely, they proved the following theorem. Theorem 1. Let (F, V, B) be an R-vine specification on n elements. The distribution F is unique and its density is given by and f i denotes the density of F i for i = 1, . . . , n.
To obtain conditional copulas, Joe [35] showed that: where each term of the partial derivative of the copula must be obtained recursively.
To specify the necessary copulas for a pair-copula construction, we use the graphical structure of R-vines [36]. Adapting an R-vine copula specification to a given data set requires, in general, the following separate tasks, as detailed in [37].

1.
Selection of vine R (structure), i.e., selection of unconditioned and conditioned pairs to be used.

2.
Choice of a family of bivariate copula for each pair selected in the previous step.

3.
Estimation of the corresponding parameter(s) for each copula.
Other methods for selecting regular vine-copulas are investigated in [38].
We have tried to fit several combinations of copulas (Joe, Tawn, and two-parameter mix copulas) to our data. However, these copulas were not appropriate for our data and resulted in poor estimators. We have, therefore, limited our choice to copulas that best fit our data : Gumbel, Clayton and Frank copulas. These copulas are members of the Archimedean family. Because of their flexible structure, Archimedean copulas are the most commonly used to model dependence in hydrology [39]. Gumbel copula has a range between independence and perfect positive dependence, Clayton copula has more mass in the negative tail and Frank copula is a flexible copula which takes no restrictions in its parameter [40]. The distributions of these copulas and their properties can be found in [31]. The choice of bivariate copulas (conditional and unconditional) of each tree is made individually each time. In this work, we use the maximum likelihood method to obtain copula parameter estimators (MLE) [32], and retain the copulas minimizing the Akaike information criterion (AIC).

Quantile Regression
As demonstrated in [41], the R-vine-copula-based quantile regression showed its superiority to linear regression in terms of making inferences. The goal is to predict the quantile of a response variable Y given the outcome of certain predictor variables [29]. Hence, all the effort is made on the joint modelling of Y and X, and more particularly on the quantile function for α in [0, 1] [29,42]: using probability integral transforms [43,44]. More precisely, let V := F Y (Y) and U j := F j (X j ) we have that: Applying the inverse function F −1 Y , we have that An estimate of the conditional quantile can, therefore, be obtained by estimating the marginal distributions F Y , F j (j = 1, . . . , d) as well as the copula C V/U 1 ,...,U d : More details about R-vine-copula-based quantile regression are to be found in [29].

Data
The Saint-John River watershed is an inter-provincial watershed connecting Quebec and New Brunswick. It is also an international watershed connecting Canada and the United States. It covers an area of approximately 55,200 km 2 , of which just over half is in New Brunswick [45]. The Saint-John River is often known by an increase in water levels above the flood threshold in several collectivities, the most remarkable of which is that of spring (April-May) of the year 2008 [45].
Reservoirs in New Brunswick generally serve several purposes, of which flood control, both at tributary and mainstream level. Therefore, the joint operation rule should be able to coordinate not only the individual reservoirs but also the different uses of the water. From this perspective, the R-vine-copula-based quantile regression is considered suitable for a large basin such as that of the Saint-John River (Figure 2). To cope with flooding episodes in the Saint-Jean basin, the implementation of efficient management rules is essential for the decision-maker. It is within this framework that we try to derive management rules for a reservoir system by limiting flood losses at the level of five selected flood control sections, based on our database. In Figure 3, the structure of reservoirs determined by these control stations chosen to cover the entire basin can be seen. We also give the station codes and names in Table 1.
Among the characteristics of the hydrographs identified for all the stations and all the observation periods, we considered the peak flow and the flood volume [46][47][48]. We choose these two indices to illustrate the use of the R-vine-copula-based quantile regression approach on five sections of flood control. We then have a matrix of 10 components for each of the 42 years of the observation period (from 1975 to 2016). For the reader's convenience, the time series of the volume variables are represented in Appendix A.
We used the Anderson-Darling test as a criterion to choose the most fitted distribution to our data [49]. Gamma distribution (Pearson type III) usually represents the maximum annual flows, the volume of annual runoff. Weibull distribution is a particular case of the Generalized extreme value distribution (GEV) recommended for representing maxima [50]. These distributions have fat tails and are the most convenient to represent the extremes. Gumbel distribution is another special case of the GEV distribution when its Kappa shape parameter is zero. We represent the fitted distributions and the results of the Box-Ljung independence test in Table 2.

The New R-Vine
The (unique) reduced-network R-vine corresponding to the Saint-John basin is given in Figure 4. In Table 3, the fitted copulas corresponding to each edge of the R-vine with their parameters are provided. The associated fitted normalized contour plots of the selected copulas are presented in Figure 5 to visually investigate their influence.
To compare the proposed R-vine structure with a classical model, we construct the vine corresponding to our data using the maximum spanning tree method. The obtained Kendall's τ values summarized in Figure 6 suggest a D-vine structure represented in Figure 7. We compare the efficiency of our model by illustrating the basin dependency structure with that of the D-vine and an all-Gauss D-vine as a reference structure. The results of Table 4 show that the Tau D-Vine model has the highest MLE, followed by the new R-Vine then all Gauss D-Vine. For the results of the AIC and the Bayesian information criterion (BIC), we have the same classification. Moreover, based on Vuong's tests results [51], we cannot reject the hypothesis that the new R-vine and the Tau D-Vine are equivalent. For the peak flow variable, the maximum spanning tree method suggests an R-vine. The results of the efficiency comparison and Vuong's tests in Table 4 lead to the same conclusion as for the volume variable. But even with the results preferring the classically obtained vine-copulas to the new R-vine, we keep this last to do the regression because it gives consistent results with the geographical position of the stations.

Quantile Regression
The expression of the h-function of the reduced-network vine of the Saint-John basin is detailed in Appendix B. Next, we show the relative contribution of each station in the final estimate of the quantiles of the volume in the downstream station (station 15). In (a) and (b) of Figure 8, we represent the situations where the volume in station 15 is the most extreme. This occurs when the volumes in stations 3 and 7 are extreme simultaneously (scenario 5), the severity is high but, less when stations 7 and 8 are extreme simultaneously (scenario 8). In (c) and (d) of Figure 8, we can see that when station 7 is extreme and the volume in the other stations is set to the median (scenario 2), there is no noticeable effect on station 15. This is also the case when stations 3 and 8 are extreme simultaneously (scenario 6) or alone as in (e) and (f) of Figure 8. We can, therefore, deduce that station 7 (which is in up-mainstream) has a key role to play in the occurrence of extreme events in station 15 when it is associated with extremes in the station 3 (also up-mainstream but more distant from station 15) or station 8 (up-tributary and the farthest from station 15). On the other hand, the scenarios for station 11 have not been represented because they have no noticeable effect on the extreme quantiles of station 15 (even if it is down-mainstream and closest to station 15).
(e) (f) Figure 8. Illustration of the obtained scenarios with respect to downstream conditional quantiles. The red color means values are set to Q 100 as in Table 5. White means values are set to Q 2 . For station 15, blue means the conditional quantile S15 Q 100 is less than the marginal index, it is pink when they are almost equal and red when S15 Q 100 is much greater than the marginal index. (a-f): scenarios 5, 8, 2, 6, 1, and 3 respectively.
We resume the obtained quantiles of the volume in station 15 in each of the above scenarios in Table 5. We also give a comparison with the observed and marginal quantiles using indexes. To complete the results, we set the values of the first four stations with the quantiles of return periods 2, 20 and 100 years. We then calculate the conditional quantiles of station 15 based on these values for different probabilities. Figure 9 represents the conditional distributions of the extreme events at station 15 as a function of potential scenarios in the upstream sub-basins. It is then possible to deduce the conditional distributions of the quantiles, at station S15, depending on the levels recorded in the other four stations. This is an estimate based on the multivariate model and, therefore, takes into account the uncertainties and interactions between the series of extremes observed at each of the measuring stations.

Discussion
This study established multivariate models for flood-risk assessment of cascade reservoirs. The management approaches for multi-reservoir flood control mainly concern minimizing the maximum water level in the dam itself or discharges at downstream flood control points. For a large watershed, such as the Saint-John River, this management is even more challenging. It should take into account the interactions between the sub-basins. Moreover, the time lag and the flooding interdependence along the river are two significant parameters. They should be taken into consideration during the management process. The proposed multivariate vine copula model offers some flexibility to combine all risk sources with different dependence structures. The flooding capacity of existing reservoirs cannot hold back flooding downstream when extreme flooding has occurred in the upper Saint-John River basin. Therefore, more reservoirs are needed to control flooding in some downstream areas. The sub-basins that require more control are those connected to stations S3 and S7. This result might suggest the necessity to control all the stations on the main river and chief tributaries. The model also makes it possible to estimate the contribution of each of the sub-basins to extreme floods.
The Saint-John River has seasonal regulation reservoirs for medium and low floods. In the case of extreme upstream floods, the valves of downstream reservoirs are opened prematurely, causing high levels downstream in early spring. By better estimating the conditional risk of floods downstream of the basin, managers can make informed decisions to reduce vulnerability to flooding.
One of the extensions to our study would be the use of Extreme value copulas such as Joe, Tawn or 2-parameter Archimedean copulas (BB class). Due to the known ability of this class of copulas to model tail behaviours, this extension would be adapted to other hydroclimatic regions. We also think that an extension of this analysis could involve all the basin's sub-catchments. In this case, each sub-catchment network could be modelled separately by a vine structure, then all the vine models obtained can be regrouped to one multivariate model.

Conclusions
In this article, we have introduced a new approach to combine all the information available on the different sections of a river basin. Flooding in a large basin, as in the case of the Saint-John River, is a highly complex phenomenon and depends on several temporal and spatial parameters. The R-vine-copula-based quantile regression approach enables dimensionality to be reduced while retaining essential information on the contribution of each section to the entire basin's flood. A similar study can be done based on climate forecasts to produce decision-making tools on the management of available reservoirs and help to mitigate the effects of floods and protect vulnerable areas to flooding. The three main achievements of the work can be summarized as follows: (1) Conception of the multivariate distribution for multi-reservoirs volumes' extremes; (2) Development of the conditional quantile regression expressions at the downstream station; (3) Simulation of scenarios for cascade reservoirs, considering both sub-basins local characteristics and interactions.
The use of an R-vine structure suggested by the geographic system of the watershed allows precise modelling of the dependency structure between stations for a given variable. The proposed model can reproduce the periodic behaviour of the volume and peak flow series. The choice of the reduced-network R-vine allows for the alleviation of the conditions imposed during the construction of other vine types (calculation of the correlation coefficients and respect of the covariates order for the D-vine).
The generated scenarios show that the extreme conditions in the two stations S3 and S7 are decisive for the downstream sub-basin. These indicate the need for an improvement in the capacity and management of flood volumes at the level of these two sub-basins. Indeed, station S15 is located near Fredericton city, the capital of New Brunswick province, which experiences frequent flooding.