#### 2.2.3. Application of the Copula Theory

The concept of the copula was introduced by Reference [51], which defined a copula as a joint distribution function of standard uniform random variables. Modeling a joint distribution with a copula relaxes the restriction of traditional flood frequency analysis by allowing the marginals for flood characteristics to be selected from different families of probability distribution functions [52].

First, the best-fitting statistical distributions were selected for the analyzed data series. The following distributions are suitable for series of maximum values in hydrological data: Weibull, Gamma, Gumbel, and log-normal [53]. Parameters of the distributions were estimated by means of the maximum likelihood method [54]. To assess the goodness of fit of a given distribution to the data series, the Akaike information criterion (AIC) was applied [55], which was calculated from the following formulas:

$$\mathrm{AIC}=N\,\log \left(\mathrm{MSE}\right)+2\,\left(\text{number of fitted parameters}\right),$$

where MSE is the mean square error, and N is the sample size, or

$$\mathrm{AIC}=-2\,\log \left(\text{maximized likelihood of the model}\right)+2\,\left(\text{number of fitted parameters}\right).$$

The best model is the one that has the minimum AIC value [55].
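The fitting and ranking step can be illustrated with a minimal Python sketch. It is not the authors' workflow (the study used Excel, Statistica, and RStudio): the mapping of the four families to SciPy classes, the location fixed at zero for the Weibull, Gamma, and log-normal fits, the function name `rank_by_aic`, and the synthetic sample are all assumptions made for illustration. The AIC is computed in its likelihood form.

```python
import numpy as np
from scipy import stats

# Candidate families named in the text; the mapping to SciPy classes is an assumption.
CANDIDATES = {
    "Weibull": stats.weibull_min,
    "Gamma": stats.gamma,
    "Gumbel": stats.gumbel_r,
    "log-normal": stats.lognorm,
}

def rank_by_aic(sample):
    """Fit each candidate by maximum likelihood; return (name, AIC) pairs sorted by AIC."""
    scores = {}
    for name, dist in CANDIDATES.items():
        if name == "Gumbel":
            params = dist.fit(sample)            # location and scale are estimated
        else:
            params = dist.fit(sample, floc=0.0)  # shape and scale; location fixed at 0 (assumption)
        n_fitted = 2                             # two free parameters in every case above
        log_likelihood = np.sum(dist.logpdf(sample, *params))
        scores[name] = -2.0 * log_likelihood + 2.0 * n_fitted
    return sorted(scores.items(), key=lambda item: item[1])

# Synthetic annual maxima (cm), invented purely for illustration.
rng = np.random.default_rng(42)
synthetic_maxima = stats.gumbel_r.rvs(loc=50.0, scale=15.0, size=60, random_state=rng)

ranking = rank_by_aic(synthetic_maxima)
best_name, best_aic = ranking[0]
print(f"lowest AIC: {best_name} ({best_aic:.1f})")
```

The first entry of `ranking` is the family with the minimum AIC, which corresponds to the selection rule stated above.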

In the next step, the copula method was used to construct the joint distribution of lake and sea stages. In general, a bivariate Archimedean copula can be defined as [56]

$${C}_{\theta }\left(u,v\right)={\varphi }^{-1}\left(\varphi \left(u\right)+\varphi \left(v\right)\right),$$

where the subscript θ of the copula C is the parameter hidden in the generating function $\varphi $, and $\varphi $ is a continuous function called a generator that is strictly decreasing and convex from I = [0,1] to [0, $\varphi $(0)].
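As an illustration of the generator-based definition, the sketch below builds a copula from a generator and its inverse and checks it against the closed form for the Clayton family. The Clayton generator $\varphi (t)=({t}^{-\theta }-1)/\theta $ is the standard one from the copula literature; the function names are hypothetical and chosen only for this example.

```python
import numpy as np

def clayton_generator(t, theta):
    """Clayton generator: phi(t) = (t**(-theta) - 1) / theta, for theta > 0."""
    return (t ** (-theta) - 1.0) / theta

def clayton_generator_inv(s, theta):
    """Inverse of the Clayton generator: phi^{-1}(s) = (1 + theta * s)**(-1 / theta)."""
    return (1.0 + theta * s) ** (-1.0 / theta)

def archimedean_copula(u, v, generator, generator_inv, theta):
    """Generic Archimedean construction: C(u, v) = phi^{-1}(phi(u) + phi(v))."""
    return generator_inv(generator(u, theta) + generator(v, theta), theta)

def clayton_closed_form(u, v, theta):
    """Closed form of the Clayton copula, used here only as a cross-check."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

theta = 2.0
u, v = 0.3, 0.7
from_generator = archimedean_copula(u, v, clayton_generator, clayton_generator_inv, theta)
from_closed_form = clayton_closed_form(u, v, theta)
print(from_generator, from_closed_form)
```

Because $\varphi (1)=0$, the construction also reproduces the boundary condition C(u, 1) = u that any copula must satisfy.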

The Archimedean copula family is often applied in hydrological studies, for example in flood frequency analyses. It was found in References [57] and [58] that a copula-based flood frequency analysis performs better than a conventional flood frequency analysis, as joint distribution based on a copula fits the empirical joint distribution (i.e., from observed data using a plotting position formula) better than the established standard joint parametric distribution. Numerous successful applications of copula modeling have been achieved, most notably in survival analysis, actuarial science, and finance [52].

A large variety of copulas belong to the Archimedean copula family and can be applied when the correlation between hydrologic variables is positive or negative. The proofs of these properties have been reported by References [59] and [60]. For this reason, one-parameter Archimedean copulas, including the Clayton family, the Gumbel–Hougaard family, and the Frank family, were used in this study. The Gumbel–Hougaard and Clayton copula families are appropriate only for positively correlated variables, while the Frank family is appropriate for both negatively and positively correlated variables (Table 3).

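where ${D}_{k}\left(x\right)$ is the Debye function, defined for any positive integer k as

$${D}_{k}\left(x\right)=\frac{k}{{x}^{k}}{\int }_{0}^{x}\frac{{t}^{k}}{{e}^{t}-1}\,dt.$$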
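The Debye function typically enters through the relation between Kendall's τ and the Frank parameter θ. The sketch below assumes the standard textbook relations, since the table itself is not reproduced here: θ = 2τ/(1 − τ) for Clayton, θ = 1/(1 − τ) for Gumbel–Hougaard, and τ = 1 − (4/θ)[1 − D₁(θ)] for Frank, which has to be solved numerically. The function names and the search interval are assumptions, and only positive τ is handled.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def debye(x, k=1):
    """Debye function D_k(x) = k / x**k * integral_0^x t**k / (exp(t) - 1) dt, for x > 0."""
    def integrand(t):
        if t == 0.0:
            return 1.0 if k == 1 else 0.0  # limit of t**k / (exp(t) - 1) as t -> 0
        return t ** k / np.expm1(t)
    integral, _ = quad(integrand, 0.0, x)
    return k / x ** k * integral

def clayton_theta(tau):
    """Clayton parameter from Kendall's tau (0 < tau < 1)."""
    return 2.0 * tau / (1.0 - tau)

def gumbel_hougaard_theta(tau):
    """Gumbel-Hougaard parameter from Kendall's tau (0 <= tau < 1)."""
    return 1.0 / (1.0 - tau)

def frank_tau(theta):
    """Kendall's tau implied by a Frank copula with parameter theta > 0."""
    return 1.0 - 4.0 / theta * (1.0 - debye(theta, 1))

def frank_theta(tau, upper=100.0):
    """Frank parameter for a positive tau, found by root search on (1e-4, upper)."""
    return brentq(lambda theta: frank_tau(theta) - tau, 1e-4, upper)

tau = 0.5
print(clayton_theta(tau), gumbel_hougaard_theta(tau), frank_theta(tau))
```

Negative dependence, which the Frank family also covers, would need a root search over negative θ and is left out of this sketch.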

The best fitted joint distribution was selected through comparison to the empirical joint distribution using the Akaike information criterion (AIC), as mentioned earlier.
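A possible way to carry out this comparison is sketched below, assuming the mean-square-error form of the AIC, AIC = N log(MSE) + 2 (number of fitted parameters). Several details are assumptions not stated in the text: the Gringorten plotting position for the empirical joint distribution, rank-based pseudo-observations in place of fitted marginals, θ obtained from Kendall's τ, and synthetic data. Only the Clayton and Gumbel–Hougaard families are compared, to keep the example short.

```python
import numpy as np
from scipy import stats

def empirical_joint(u, v):
    """Empirical joint non-exceedance probability (Gringorten plotting position, an assumption)."""
    n = len(u)
    counts = np.array([np.sum((u <= ui) & (v <= vi)) for ui, vi in zip(u, v)])
    return (counts - 0.44) / (n + 0.12)

def clayton_cdf(u, v, theta):
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

def gumbel_hougaard_cdf(u, v, theta):
    return np.exp(-((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1.0 / theta))

def aic_from_mse(empirical, theoretical, n_fitted=1):
    """AIC = N * log(MSE) + 2 * (number of fitted parameters)."""
    mse = np.mean((empirical - theoretical) ** 2)
    return len(empirical) * np.log(mse) + 2 * n_fitted

# Synthetic, positively dependent pairs (invented for illustration only).
rng = np.random.default_rng(7)
x = rng.gumbel(50.0, 15.0, size=60)
y = 0.6 * x + rng.gumbel(20.0, 8.0, size=60)

# Pseudo-observations in (0, 1) from ranks.
u = stats.rankdata(x) / (len(x) + 1)
v = stats.rankdata(y) / (len(y) + 1)
tau, _ = stats.kendalltau(x, y)

empirical = empirical_joint(u, v)
aic = {
    "Clayton": aic_from_mse(empirical, clayton_cdf(u, v, 2.0 * tau / (1.0 - tau))),
    "Gumbel-Hougaard": aic_from_mse(empirical, gumbel_hougaard_cdf(u, v, 1.0 / (1.0 - tau))),
}
best_family = min(aic, key=aic.get)
print(aic, best_family)
```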

For each compared pair of series, 5000 hypothetical points were generated based on the previously calculated parameters of the statistical distributions. They were used to select the best-fitting copula family for a given pair of series and then to develop the appropriate copula. Based on the empirical pairs of values for particular years and the generated hypothetical points, graphs with probability curves (expressed as return periods) were developed (Figure 2).
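The text does not say how the 5000 points were drawn. One common approach, sketched here as an assumption, is conditional inversion: draw u uniformly, draw a second uniform value w, solve ∂C/∂u = w for v, and map (u, v) through the inverse marginal distributions. The example uses a Clayton copula with θ = 2 and Gumbel marginals with invented parameters; none of these values come from the study.

```python
import numpy as np
from scipy import stats

def sample_clayton(n, theta, rng):
    """Draw n pairs (u, v) from a Clayton copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)  # value of the conditional distribution C(v | u)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v

rng = np.random.default_rng(2024)
u, v = sample_clayton(5000, theta=2.0, rng=rng)

# Map the uniform pairs to water levels through hypothetical Gumbel marginals (cm).
sea_levels = stats.gumbel_r.ppf(u, loc=550.0, scale=20.0)
lake_levels = stats.gumbel_r.ppf(v, loc=60.0, scale=15.0)

sample_tau, _ = stats.kendalltau(sea_levels, lake_levels)
print(f"Kendall's tau of the generated points: {sample_tau:.3f}")
```

For the Clayton family the theoretical value is τ = θ/(θ + 2), so the sample τ printed above should lie close to 0.5, which gives a quick sanity check on the generated points.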

The next stage involved calculating the degree of synchronicity (synchronous occurrence) and asynchronicity (asynchronous occurrence) of maximum water levels in lakes and the sea. For each pair of stations, probability curves at a level of 62.5% (once in 1.6 years), 37.5% (once in approximately 2.7 years), 20% (once in 5 years), 10% (once in 10 years), 2% (once in 50 years), 1% (once in 100 years), 0.5% (once in 200 years), and 0.2% (once in 500 years) are presented (Figure 2).
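The return periods quoted in parentheses follow from the reciprocal relation T = 1/p between an annual probability of exceedance p and the return period T in years. A short check, with a hypothetical helper name:

```python
exceedance_probabilities = [0.625, 0.375, 0.20, 0.10, 0.02, 0.01, 0.005, 0.002]

def return_period(p_exceedance):
    """Return period in years for an annual probability of exceedance."""
    return 1.0 / p_exceedance

for p in exceedance_probabilities:
    print(f"{p:.1%} -> once in {return_period(p):.1f} years")
```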

The obtained data were then analyzed based on probabilities of 62.5% and 37.5% [44]. Nine sectors were designated, representing different relations between probable maximum water levels. Based on the generated points, whose distribution imitates the joint distribution of values from the compared water gauge stations, and their share in particular sectors (Figure 2), 3 sectors with synchronous occurrences of maximum water levels were designated:

Sector 1: LHWL_{S}–LHWL_{L} (X ≤ S_{62.5%}, Y ≤ L_{62.5%});

Sector 5: MHWL_{S}–MHWL_{L} (S_{62.5%} < X ≤ S_{37.5%}, L_{62.5%} < Y ≤ L_{37.5%});

Sector 9: HHWL_{S}–HHWL_{L} (X > S_{37.5%}, Y > L_{37.5%});

as well as 6 sectors with asynchronous occurrence:

Sector 2: LHWL_{S}–MHWL_{L} (X ≤ S_{62.5%}, L_{62.5%} < Y ≤ L_{37.5%});

Sector 3: LHWL_{S}–HHWL_{L} (X ≤ S_{62.5%}, Y > L_{37.5%});

Sector 4: MHWL_{S}–LHWL_{L} (S_{62.5%} < X ≤ S_{37.5%}, Y ≤ L_{62.5%});

Sector 6: MHWL_{S}–HHWL_{L} (S_{62.5%} < X ≤ S_{37.5%}, Y > L_{37.5%});

Sector 7: HHWL_{S}–LHWL_{L} (X > S_{37.5%}, Y ≤ L_{62.5%});

Sector 8: HHWL_{S}–MHWL_{L} (X > S_{37.5%}, L_{62.5%} < Y ≤ L_{37.5%});

where X = the values of the x coordinates of the generated points, Y = the values of the y coordinates of the generated points, S_{62.5%} = the value of the maximum sea water level with a probability of exceedance of 62.5%, S_{37.5%} = the value of the maximum sea water level with a probability of exceedance of 37.5%, L_{62.5%} = the value of the maximum lake water level with a probability of exceedance of 62.5%, L_{37.5%} = the value of the maximum lake water level with a probability of exceedance of 37.5%, WL = water level, LH = “low high”, MH = “mean high”, and HH = “high high”.

The percentage share of points falling in sectors 1, 5, and 9 gives the degree of synchronicity of maximum water levels between the two analyzed water bodies in a given time unit.

Synchronous and asynchronous occurrences of maximum water levels were identified using the following threshold values of the probability of exceedance:

Probable maximum water levels with a probability of exceedance of ≥62.5% were designated as LHWL;

Probable maximum water levels with a probability of exceedance in the range from 37.5% to <62.5% were designated as MHWL; and

Probable maximum water levels with a probability of exceedance of <37.5% were designated as HHWL.

For example, the occurrence of LHWL in a given lake is a synchronous event if LHWL also occurs in the Baltic Sea in a given time unit, and it is asynchronous if MHWL or HHWL is recorded there.

The total contribution of synchronous and asynchronous events is always 100%.
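The sector assignment and the resulting percentages can be sketched as follows. The function names and the four sample points are hypothetical. The thresholds are passed in directly and correspond to S_{62.5%}, S_{37.5%}, L_{62.5%}, and L_{37.5%}. Each coordinate is classified as LH (0), MH (1), or HH (2), and the sector number is then 3 × (sea class) + (lake class) + 1, which reproduces the numbering of sectors 1 to 9 given above.

```python
import numpy as np

def classify(values, level_625, level_375):
    """Return 0 (LHWL) if value <= level_625, 1 (MHWL) if <= level_375, else 2 (HHWL)."""
    return np.digitize(values, [level_625, level_375], right=True)

def sector_numbers(x, y, s625, s375, l625, l375):
    """Sector 1..9 for each point: x holds sea levels, y holds lake levels."""
    sea_class = classify(x, s625, s375)
    lake_class = classify(y, l625, l375)
    return 3 * sea_class + lake_class + 1

def synchronicity(sectors):
    """Percentage of points in synchronous sectors (1, 5, 9) and in the remaining ones."""
    synchronous = np.isin(np.asarray(sectors), [1, 5, 9]).mean() * 100.0
    return synchronous, 100.0 - synchronous

# Four made-up points with made-up thresholds, chosen so the result is easy to verify by hand.
x = np.array([1.0, 5.0, 9.0, 1.0])  # sea
y = np.array([1.0, 5.0, 9.0, 9.0])  # lake
sectors = sector_numbers(x, y, s625=3.0, s375=7.0, l625=3.0, l375=7.0)
synchronous_pct, asynchronous_pct = synchronicity(sectors)
print(sectors.tolist(), synchronous_pct, asynchronous_pct)
```

Here the first three points fall in sectors 1, 5, and 9 and the fourth in sector 3 (low sea level, high lake level), so three of the four points are synchronous and the two percentages sum to 100%.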

The mathematical and statistical processing of the results used procedures included in the following software: Excel (Microsoft), Statistica (TIBCO Software Inc.), and RStudio. The graphics were prepared using QGIS (3.6.2 Noosa) and Publisher (Microsoft).