Stochastic Modeling of Hydroclimatic Processes Using Vine Copulas

: The generation of synthetic time series is important in contemporary water sciences for their wide applicability and ability to model environmental uncertainty. Hydroclimatic variables often exhibit highly skewed distributions, intermittency (that is, alternating dry and wet intervals), and spatial and temporal dependencies that pose a particular challenge to their study. Vine copula models offer an appealing approach to generate synthetic time series because of their ability to preserve any marginal distribution while modeling a variety of probabilistic dependence structures. In this work, we focus on the stochastic modeling of hydroclimatic processes using vine copula models. We provide an approach to model intermittency by coupling Markov chains with vine copula models. Our approach preserves ﬁrst-order auto- and cross-dependencies (correlation). Moreover, we present a novel framework that is able to model multiple processes simultaneously. This method is based on the coupling of temporal and spatial dependence models through repetitive sampling. The result is a parsimonious and ﬂexible method that can adequately account for temporal and spatial dependencies. Our method is illustrated within the context of a recent reliability assessment of a historical hydraulic structure in central Mexico. Our results show that by ignoring important characteristics of probabilistic dependence that are well captured by our approach, the reliability of the structure could be severely underestimated.


Introduction
In the field of hydrology, the study of time series and their synthetic generation is of great importance. They are used to drive models in a wide range of applications, from reservoir design [1][2][3] and planning [4][5][6][7], ecological flow estimation [8], to flood risk [9]. There are a fair amount of models available in the literature and despite potential differences between them, they share two common goals: i) adequate modeling of the marginal distribution and ii) adequate characterization of the auto-correlation structure.
Regarding the modeling of time series of hydroclimatic variables, several challenges can be identified. Firstly, these variables typically present highly skewed behavior, especially in fine time scales [10,11]. In common practice, skewed distributions are simulated using the Thomas-Fiering models [12]. These are linear models that approximate skewness by introducing non-Gaussian white noise in the generation scheme; however, the estimation of white noise distribution parameters requires moments with an order higher than two. The estimates of higher-order moments are very uncertain [13] and have a significant impact on the reproduction of the distribution [14]. Additionally, the linear combination of the i.i.d. non-Gaussian white noise terms does not necessarily result in the target distribution [15], but rather a distribution that has equal moments up the preserved order [16]. Furthermore, hydroclimatic processes exhibit a wide range of dependence structures. Typically, these are categorized as short-range dependence (SRD) and long-range dependence (LRD) [17]. The main characteristic of SRD is the fast decay of the auto-correlation after a few lags, while LRD implies the persistence of high auto-correlation after many lags. It has been shown that LRD is present in many geophysical processes [17][18][19][20][21]. Most applications employ models that are able to reproduce only SRD with some notable exceptions [22,23].
Modeling auto-correlation with linear models (such as those typically used to account for SRD) can have some serious implications. Tsoukalas et al. [16] show that linear models result in unnatural bounded auto-correlation structures when processes other than Gaussian are simulated. This effect was termed envelope behavior. Envelope behavior is caused by the linear combination of white noise described by bounded distributions in the time series generation scheme. For example, when a Pearson-III distribution (which is bounded from below) is used to model the white noise terms in a first-order auto-regressive model (AR(1)), the resulting auto-correlation structure is bounded from below as well [16]. Tsoukalas et al. ([16]) show the importance of shifting the focus from the traditional narrow view of correlation, as expressed by the Pearson correlation coefficient, to the dependence structure in a joint distribution. This is a very important step not only in an effort to recognize the limitations of linear stochastic models, but also to understand better the dependence relationships between hydroclimatic variables. Contemporary research indicates that in many cases this relationship deviates from the Gaussian and the effects on the resulting uncertainty estimates are significant [24,25].
A promising alternative can be found in copulas. Copulas are joint distributions with uniform [0, 1] margins. They allow the modeling of the joint distribution through the decoupling of the dependence structure between variables and their marginal distributions. In many cases, this enables the entire description of the dependence structure between variables (or between lagged versions of the same variable) to one measure, such as Spearman's coefficient. Moreover, copula models are able to reproduce explicitly the marginal distributions.
In comparison to fitting a linear model, copula models can be relatively more complex and time-consuming. In order to fit a variable to a copula function, knowledge about the (physical) behavior of such a variable is required. Additionally, the complexity of these models grows significantly when dimensionality increases. Consequently, this hinders mathematical and computational handling as well as conceptual understanding. Nevertheless, copulas have been widely used and it is a rapidly growing field.
Copulas have gained popularity among the hydrological community [25][26][27][28][29]. Bivariate copulas have been used to generate synthetic time series at the River Nile [30] and the Colorado River [31]. Further, they have been applied to generate multi-site precipitation time series [32]. In the field of hydraulic engineering, for example, [33] applies copulas to produce wind speed and wave height time series for the scheduling of off-shore operations.
Bivariate copulas are limited on the order of auto-correlation they can preserve and the number of variables (for example in terms of geographical location) they can simultaneously simulate (some recent developments allow for approximating non-parametrically complex joint densities. See for example [34,35]). For this reason, extended copula models, i.e., vine copulas, can be used to represent a more complex joint distribution. This method, which was introduced by [36], uses bivariate copulas to decompose multivariate joint distributions. For example, ref. [37] generated cyclostationary stream flow time series considering higherorder auto-dependence. Jäger and Nápoles [38] demonstrated how to produce correlated significant wave heights and mean zero-crossing periods. Sarmiento et al. [39] successfully employed a regular vine (R-vine) model developed by [40] for the simulation of wind speed and direction time series. Vine copulas require the fitting of a large number of copula functions. For example, for the simulation of four variables (taking into consideration only the first-order auto-dependence and their cross-dependence), it would require the calculation of 22 copula parameters.
In this paper, we present a methodology that can be used to generate synthetic time series of hydroclimatic variables. Specifically, we present a case study that concerns an ancient dike that was located within an extinct lake in present-time Mexico City. In order to recreate the hydrological conditions of the lake at the time where the dike was used, time series of precipitation and evaporation are needed. A first attempt to characterize the dependence between the two variables is the model developed in [41]. In this paper, the model presented in [41] is extended to reproduce the first-order auto-dependence structure in the evaporation process. Additionally, specific considerations to better reproduce cross-correlations between the evaporation and precipitation processes are introduced. Furthermore, a novel framework to simulate from an arbitrary number of geographical locations is presented. This approach introduces the coupling of the spatial and temporal models using repetitive sampling. In this way, the spatial model and the temporal model can be freely chosen according to each data set and the required parameters for the quantification of the joint distribution are reduced. This paper starts by describing the general methodology to generate the time series. Next, the case study is presented including the description of the data. Then, the precipitation-evaporation model is applied to simulate daily realizations for one and four stations. Finally, a discussion of the results, conclusions, and recommendations for future work are presented.

First-Order Univariate Processes
At the core of the modeling methodology proposed here lies the notion of copulas. Copulas are multivariate distributions with uniform marginals in [0, 1]. They provide an elegant way to describe the dependence between variables by removing the effects of the marginal distributions. Sklar [42] showed that any multivariate joint distribution can be expressed as a function of its marginals. In a two-dimensional context his theorem states: where X 1 and X 2 are random variables with marginal distributions F X 1 (x 1 ) and F X 2 (x 2 ) and C the copula function. Equation (1) indicates that two components are required to define the joint distribution: the copula function and the marginal distributions. Bivariate copulas have been used before to model time series [43]. The simplest case of temporal dependence regards only two time steps. Let {X t } for t ∈ N denote the time series of interest. In terms of a copula, the relationship between consecutive time steps can be written as where C X t |X t−1 is the conditional copula with arguments ∈ [0, 1] 2 . Notice that the parameter vector Θ would model auto-correlation of order 1 for the time series of interest. Since {X t } is assumed to be a stationary process F(x t |x t−1 ) corresponds to a first-order Markov model, which is characterized with a single marginal distribution and an appropriate copula. Often, the conditional distribution is expressed as [44]: For simplicity, a different notation is introduced based on Aas et al. [45], according to , v, Θ) be the inverse conditional distribution, where the F(x t−1 ) is the conditioning variable, v is an independent uniform [0,1] random variable and Θ the parameter vector of the bivariate copula C X t ,X t−1 . Solving for x t yields the definition of the first-order univariate process.
where F −1 is the inverse marginal distribution. The process, briefly described in this section, has been used previously, for example, to model traffic loads for bridge reliability in [46,47].

First-Order Bivariate Processes
In this section, the univariate model briefly described in Section 2.1 is extended to account for dependence between two variables. Let us consider two dependent processes {X t } and {Y t } that present additionally serial correlation. Similarly, as in the univariate case, t ∈ N. A good alternative to model these kinds of processes are vine copulas or simply vines. Vines were originally introduced in Joe [44], Cooke [36], and Bedford and Cooke [48] (see [49]). Vines are a graphical way of representing multivariate joint distributions. They are a generalization of dependence trees. Roughly, a vine on n elements V(n) = {T 1 , . . . , T n } is a nested set of trees where the edges of tree j are nodes on the tree j + 1 for j = 1, . . . , n − 1. In particular, regular vines are of interest. These are vines whose edges in tree j are connected as nodes in tree j + 1 only if they share a common node in tree j. Vine copulas as dependence models assign a bivariate copula to each edge in the first tree of the vine and conditional bivariate copulas to the edges of every tree > 1. For a formal definition and their statistical treatment see for example [45]. A graphical representation of a regular vine on 3 nodes is presented in Figure 1. The edges in Figure 1 are assigned copulas that are parametrized by rank correlation coefficients and a conditional rank correlation of X t , Y t |Y t−1 . Rank correlations are denoted by r followed by a subscript indicating the respective variables and time step. For example, r Y t Y t−1 denotes the rank correlation between process Y at times t and t − 1. Notice that the essential correlations that the model needs to capture are the temporal dependencies {X t } and {Y t } and the cross-correlation {X t , Y t }. If the process {X t } is generated independently with the univariate conditional copula model discussed in Section 2.1, then all that is left is to induce the correct dependence {X t , Y t } and {Y t }. This can be achieved by modeling Y t conditional on both Y t−1 and X t with a vine as the one presented in Figure 1. Y t can be sampled according to Equation (4) where F X and F Y are the marginal distributions underlying X t and Y t (F −1 X and F −1 Y their inverses). Notice that the time index t is omitted since both processes are stationary. Moreover, h is the conditional copula (and h −1 its inverse) operator, as it was established earlier, and v is as before, an independent uniform random variable on [0,1] .
The modeling procedure for the generation of a continuous bivariate stochastic process may be summarized as: i Fit the appropriate marginal distributions characterising the random variables (or use the empirical distribution). ii Select suitable copulas Generate the first variable (X t ) using the univariate copula model from the previous subsection. iv Generate the second variable (Y t ) using Equation (4).
Similar models to the one briefly described in this section have been used in [33,38].

First-Order Intermittent-Continuous Bivariate Processes
In this section, the algorithm introduced in Section 2.2 is expanded to take into account intermittency. Results are based on the methodology presented in Torres-Alves et al. [41]. The basic idea is the coupling of the copula models with Markov chains. Specifically, the intermittent process is split into two sub-processes, (i) a discrete state-time Markov Chain accounting for the occurrence of a certain environmental condition, and (ii) copula models accounting for "amounts". For example, in the case of daily precipitation, the Markov chain model would simulate days with or without rain while the copula models would account for the amount of rain (mm) per day. In an intermittent process, a Markov Chain model accounts for two states: one for zero amount (i.e., a dry day) and one for larger than zero amount (i.e., a wet day). The first-order discrete state-time Markov process is described by: Similar to previous subsections, t ∈ N while s ∈ S for a countable set S. The probabilities that express the chance of transition between states are called transition probabilities. Continuing with our example, the two states of precipitation could be wet ⇒ 1 and dry ⇒ 0, thus S = {0, 1}. The probabilities of interest are defined as follows: For the above probabilities it holds that P 1|1 = 1 − P 0|1 and P 1|0 = 1 − P 0|0 . These probabilities comprise the so-called transition-probability matrix: This matrix characterizes the Markov chain, providing all the information necessary to reproduce the process. In this way, the probability of each state is explicitly simulated. Another way to approach intermittency could be by adopting mixed type distributions that consider the dry state as an atom of probability. These have been adopted in literature [50,51] but they will not be discussed in this text; however, they are theoretically compatible with the proposed copula models.
The way that intermittency has been approached, by splitting the main process into two sub-processes of occurrence and amount dictates that the generation algorithm cannot be continuous. Specifically, the generation of the "occurrence" process must precede the "amount" process. In other words, the Markov chain must precede the copula time series.
Consequently, to generate a synthetic time series, the first step is to generate realizations of the Markov chain. This will define the different state blocks (i.e., sequences of wet and dry periods of time) of the time series. For the dry (zero amount state) block of the intermittent process, no further action is needed. What follows is the generation of the amount process for the wet blocks. This can be performed for each block by employing the univariate model of Equation (3). Then the continuous process can be generated conditionally on the intermittent one of the same time step using the vine copula model of Equation (4); however, dry blocks require special treatment. Naturally, zero amount does not allow the expression of dependence via a copula. To overcome this, the cross-correlation is treated implicitly by conditionalizing the continuous process distribution according to the two states, wet and dry.
On a final note, to preserve correctly the first-order auto-dependence structure of the continuous process the generation of blocks must be sequential. This means that the first realization of each block should be conditional on the final realization of the previous block.
The generation procedure is schematized in Figure 2. Superscripts W and D denote the conditional wet and dry continuous process marginal distributions, respectively.
Dry Block Generation Procedure The generation algorithm is summarized below: i Fit the appropriate marginal distributions characterizing the RV's. ii Select and fit the suitable copulas Calculate the transition-probability matrix of the Markov chain for the intermittent process X t . Simulate a desired length of the Markov sequence. iv Split the time series into dry and wet blocks. v Generate "blocks" representing {X t } using the univariate copula model (Equation (3)). vi Generate the wet block Y t using the vine model of Equation (4) and the appropriate marginal distribution. Use as seed for the first value of the wet block the last value of the previous dry block. vii Generate the dry block Y t using the vine model (Equation (3)) and the appropriate marginal distribution. Use as seed for the first value of the dry block the last value of the previous wet block.

Multivariate Processes
Equations (3) and (4) define univariate and bivariate models, respectively; however, using the vine decomposition, higher-order models can be can be defined. As it was discussed during the introduction, while such models that account for temporal and spatial dependence exist, they require the calculation of a great number of parameters and are limited in the dependence structures they can represent. The methodology described herein aims to approximate dependent processes in time and space (for example precipitation and evaporation from an arbitrary number of stations) while remaining parsimonious and flexible. The key of this methodology is to approximate the conditional distribution of the spatial and temporal dependence through repetitive sampling instead of inferring it theoretically. It should be noted that the use of repetitive sampling in the context of hydrology has been used in [52,53] for the coupling of different temporal scales.
For simplicity, let us first assume a two-dimensional case. Let us define a twodimensional process {Y 1 t , Y 2 t } where the superscripts denote, for example, a corresponding station and the subscripts the time step. Moreover, let us assume that the temporal and spatial dependence can be described by the first-order model of Equation (3). The methodology can be described in four steps: i Generate one realization of {Y 1 t } from Y 1 t−1 based on the temporal model. ii Generate n realizations of Y 2 t based on Y 1 t according to the spatial model. These are denoted with a tilde and the superscript S2 as [Ỹ S2 t,1 , · · · ,Ỹ S2 t,n ]. iii Generate n realizations of Y 2 t based on Y 2 t−1 according to the temporal model. These are denoted with a tilde and the superscript T2 as [Ỹ T2 t,1 , · · · ,Ỹ T2 t,n ]. The spatial and temporal realizations are all plausible realizations of Y 2 t , which implies that a section between the two sets exists. iv Identify the common space. This is performed by identifying the realizations that minimize the root mean squared error (RMSE).
Notice that RMSE in step iv above is not used as a goodness of fit measure but rather as a selection criterion between the spatial and temporal components of the model. Another way to approach this could be a Monte Carlo procedure where a convergence criterion is targeted; however, in the context of time series generation this would result in non-practical computational demand. Other measures for selection may be very well possible; however, investigating them is out of the scope of this paper since we focus mainly on the general methodology for simulating spatially diverse data sources.
The above procedure can easily be extended to more dimensions. Let us consider another case where a set of four correlated variables [Y 1 a.
b. It can be seen that the C-Vine representation has a unique variable, in our case Y 1 t , from which the other variables depend. On the contrary in the D-Vine representation, the variables depend serially. Both could be probable representations of spatial dependence; however, the algorithm depends on the vine's sampling order. Let us assume that a C-Vine is selected. Further, let v 1 , v 2 , v 3 , v 4 denote independent random draws from the uniform (0, 1) distribution. The sampling algorithm is given by the following equations: Furthermore, let us assume that the temporal dependence can be described by the firstorder model of Equation (3). The first step is to generate a basis realization on which the repetitive sampling will be based. This decision depends on the selected spatial dependence model. It is noted that according to Equation (9), the C-Vine sampling order is 1-2-3-4. Thus, in this case, the algorithm should start by computing Y 1 t . If a D-Vine was selected instead, the algorithm should start by computing Y 3 t since the sampling order of a D-Vine is 3-2-4-1 [54]. The rest of the steps remain the same. n realizations are generated according to the spatial and temporal models of which the one that minimizes the RMSE (Equation (10)) is selected.
where m = 4 in our case. This procedure is schematized in Figure 4. I I I. The methodology above can be extended for m stations. The required number of parameters of such a model is: The multivariate generation algorithm is summarized below: i Fit the appropriate marginal distributions characterizing the RV's. ii Identify suitable models to describe spatial and temporal dependence. iii Select and fit N Θ suitable copulas to model dependencies. iv Select the number of trials n for the repetitive sampling. v Generate the first temporal realization according to the sampling order of the selected spatial model. vi Generate n realizations from the spatial and temporal models independently. vii Select the realization which minimizes the error (Equation (10)).

Admissible Marginal Distributions and Copula Fitting
One of the major benefits of modeling stochastic time series with copulas is the direct use of the marginal distribution, instead of moment approximations. This allows the utilization of classical fitting techniques such as maximum likelihood estimation [55] and L-Moments [56] and state-of-the-art methods such as K-Moments [14] and the metastatistical extreme value framework (MEV) [57]. Moreover, any finite variance distribution is admissible, which increases the flexibility of the models.
The same flexibility can be demonstrated in the admissible copula fitting techniques. In this study, the estimation of copula parameters is performed via the Spearman or Rank correlation coefficient accompanied by measures to test the goodness of fit, such as the Cramer-Von Mises statistic [58] and semi-correlations [43]; however, other methods such as maximum likelihood, Bayesian information criteria, or Kendall's tau can be supported. A more comprehensive review can be found in [25,59].

Case Study
To demonstrate the effectiveness of the proposed vine copula model, two scenarios are presented: (i) simulation of daily evaporation and precipitation at one station and (ii) simulation of daily evaporation at four stations, all located at the Valley of Mexico. Additionally, the first scenario illustrates the influence of the dependence structure on the reliability of flood defenses.
The proposed methodology is validated on the accuracy of the reproduced historical characteristics, such as the approximation of the historical marginal distributions, the first-order auto-correlation dependence, and the cross-correlation dependence structure.

Area of Study
The case study described herein is presented in Torres-Alves & Morales-Nápoles (2020) [41]. It concerns the reliability analysis due to overflow of the ancient Nezahualcoyotl dike that was once located in the Valley of Mexico ( Figure 5). The Aztecs built this structure around 1450 AD to protect their capital, the city of Tenochtitlan, from rising water levels in the lacustrine system. Around 1519, the Valley of Mexico was covered by a lacustrine system comprising of six interconnected lakes. These lakes connected during high water and because of the salinity of the lakes, the rising water posed a threat to agriculture and freshwater supplies in Tenochtitlan. Nowadays, there are no remains of this dike and the lakes were almost completely drained by the end of the 19th century. Ref. [41] reconstructed the geometry and position of the ancient structure based on historical sources. The dike was 16 km long with a height of 8 m and a width of 3.5 m. It was made out of wood, stone, and mud. Moreover, ref. [41] estimated the extent of the lacustrine system and proposed a simple hydrological balance equation (Equation (12)) to compute the water level fluctuation at the lake. Finally, an assessment of the reliability of the dike due to overflow was presented. where: • X t is the precipitation at time t (m/day). • Y t is the evaporation at time t (m/day). • A tr is the tributary area of the basin (m 2 ). • A l is the surface area of the lake (m 2 ). • dV dt is the daily change in volume (m 3 /day). The hydrological balance was simulated only for the wet seasons. For this reason, an initial water level (L 0 ) needed to be assumed. The dike's probability of failure due to overflow was computed for six initial water levels (1 to 6 m).

Data
To simulate the precipitation and evaporation time series, records from five stations situated within the Valley of Mexico are analyzed. This analysis was originally performed in [41]. Therefore, only a brief overview is provided in the present work. For a detailed presentation of the historical records the reader is referred to the original publication. The data are available from Mexico's national database [60]. Information about the stations is given in Table 1. In this region, two seasons are identified, (i) wet season (May-October), and (ii) dry season (November-April). The data from the stations were analyzed and appropriate distributions were fitted using Maximum likelihood ( Table 2). The respective cumulative distribution functions are presented in Appendix A. Furthermore, bivariate copulas were fitted for each data pair (precipitation-evaporation). The results are presented in Table 3. In [41] the authors use different goodness of fit (GOF) techniques to show that the pairs in Table 3 are adequate choices for the (conditional) copulas of interest. For further details about the GOF techniques used, the reader is referred to [41] and references therein. Their functional representations are given in Appendix B.

Simulation of Daily Evaporation and Precipitation
The simulation of precipitation {X t } and evaporation {Y t } poses a particular challenge due to the intermittent behavior exhibited by precipitation. Moreover, these variables are the basic drivers for many hydrological models (i.e., [61]). Producing correlated time series enables the quantification of uncertainty in these drivers and a probabilistic description of the results. For this scenario, the data from Atenco station were analyzed (for further details on why this station is selected, refer to [41]). From these data, 5000 hypothetical wet seasons were generated, each with a length of 185 days.
The ability of the model to reproduce the historical distributions is exhibited in Figure 6 with the use of QQ plots. These depict the relationship between the theoretical and empirical quantiles of the cumulative distribution function. In the presented case, the plotted quantiles form an almost straight line; thus, good agreement is found between historical and simulated CDFs for both variables. Figure 7 provides a comparison between historical and synthetic dependence structure in the real domain. Precipitation and evaporation exhibit a moderate correlation in the range of 0.3 to 0.4. Additionally, between the variables, a low negative correlation is observed. These dependencies are quantified in terms of rank correlations in Table 4. Overall, the model reproduces adequately the first-order auto-dependence as well as the dependence between variables.

The Effect of the Choice of Copula
For the description of the historical first-order auto-dependence, a Gumbel copula was employed; however, in non-copula models addressing directly the characteristics of the observed dependence structure is not straightforward. To demonstrate the importance of a correct dependence description, a reliability calculation will be performed using two sets of the synthetic time series. The first one will be the one generated during the previous section. The second one is a precipitation and evaporation pair with the auto-dependence of evaporation described by a Gaussian copula. The latter is selected for comparison purposes because it represents a simple dependence structure that is reproduced by commonly used linear Gaussian models and more recent models such as [50,51,53,62]. The results for both cases are summarized in Table 5.   Given an initial water level of L 0 = 1 m. the probability of failure computed for the Gaussian copula case is 73% higher than the Gumbel case.
Note that the Gaussian copula reproduces the same first-order rank auto-correlation as the Gumbel. The difference can be entirely attributed to the change in the shape of the dependence structure (Figure 8). In the absence of tail dependence, there is no clustering of high evaporation events. This moderates the outgoing flux of the lake resulting in more frequent high-water levels and thus, a larger probability of failure. This can be illustrated by comparing the two evaporation time series (Figure 9).  In Figure 9, the time series generated by the Gumbel copula presents clusters of high evaporation values around time steps 40 to 70, 100 and 160. On the contrary, the time series generated by the Gaussian copula demonstrate a more symmetrical dependence pattern. This is in accordance with the scatter plots provided in Figure 8 where the difference of tail dependence between both copulas is demonstrated. Additionally, it is observed that the differences between the computed probabilities decrease as the initial water level L 0 increases (Table 5). This is because larger probabilities (smaller return periods) correspond to events further from the tail, where the differences between the copula functions are found. These results provide evidence on the importance of selecting an appropriate model that characterizes the dependence structure of extreme events.

Simulation of Evaporation across Multiple Stations
This section presents the application of the multivariate methodology. The developed multivariate algorithm is implemented for the simulation of daily evaporation in four of the five stations at the Valley of Mexico. Evaporation data from Atenco {Y 1 t }, La Grande {Y 2 t }, San Andres {Y 3 t }, and Chapingo (DGE) {Y 4 t } were used to generate 100 seasons (each of 185 days). The spatial dependence is described by a Gaussian copula while the temporal dependence is described by a Gumbel copula. For the latter, only the first-order autodependence is taken into account. The sampling number is set to n = 100. A comparison between historical and simulated first-order auto-dependence is provided in Figure 10. On the temporal level, evaporation in all stations demonstrates a rank correlation in the range of 0.35 to 0.45. Moreover, the historical data exhibit upper tail dependence between consecutive time steps. In all cases, the simulated data reproduce the shape of the historical dependence structure. The results are quantified in Table 6 where a very good agreement is found between historical and simulated rank correlations. Table 6. Comparison of historical and simulated rank auto-correlations. Furthermore, evaporation exhibits a high spatial correlation, in the order of r ≈ 0.6 among the stations. From a physical point of view, this can be explained by the proximity of the stations and their position within the same watershed. The proposed algorithm was able to reproduce these dependencies to a good extent ( Figure 11); however, a small deviation is observed between pairs (Table 7). This difference can be attributed to the copula choice. Table 7. Comparison of historical and simulated rank cross-correlations. Finally, the model's ability to reproduce the marginal distributions is presented in Figure 12 where the empirical and synthetic quantiles of the cumulative distribution function are compared. Overall, a good agreement is observed for all variables; however, small differences are found near the tails. The size of this disagreement is not atypical in hydrological research or other fields where uncertainty in modeling or observations is, in general, significant.

Conclusions
This paper presents a methodology to simulate hydroclimatic variables through copula-based models. The proposed methodologies focus on the use of vine copulas to characterize complex temporal and spatial probabilistic dependence. Two cases are presented: (i) the generation of correlated precipitation and evaporation on a daily scale and (ii) the generation of correlated daily evaporation time series from four stations. The first case consists of a trivariate vine copula to handle temporal dependence of evaporation and cross-dependence with precipitation. To capture intermittency, Markov chains were coupled with a copula-based model. In the second case, the methodology relied on repetitive sampling to couple the temporal and spatial vine copula models. The result is a parsimonious and flexible model capable of simulating accurately observations from an arbitrary number of stations.
In both case studies discussed herein, the models proved their capability to capture the underlying dependence structures, reproduce the marginal distributions, and intermittency. In fact, for the dependence structure, important asymmetries (such as tail dependence) may be incorporated. These asymmetries are often found but overlooked in the modeling of hydroclimatic variables. One of our case studies shows that by ignoring these asymmetries, the reliability of hydraulic structures, for example, could be underestimated by a factor of 2 in some cases (Table 5).
We have shown the ability of our approach combining Markov chains and vine copula models to reproduce short-range dependence (SRD) structures [63]. Many hydroclimatic processes however exhibit long-range dependence (LRD) [64][65][66]. The performance of our approach for these kind of dependence may be a subject for future research.  Acknowledgments: The first author was partially funded by the Eugenides Foundation for this research, as part of his MSc studies scholarship.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Distribution Functions
Weibull where κ is the shape parameter and λ is the scale parameter. Generalized Extreme Value where κ is the shape parameter, λ is the scale parameter and ψ is the location parameter.