The Use of a Uniform Technique for Harmonization and Generalization in Assessing the Flood Discharge Frequencies of Long Return Period Floods in the Danube River Basin

Abstract: The flow regime conditions of the Danube River are continually changing. These changes are the result of natural processes and anthropogenic activities. The territory of the Danube River Basin is one of the most flood-endangered regions in Europe, and assessing the design discharges along the Danube channel is complicated by the different estimation methods that are applied in particular countries. For this reason, it is necessary to harmonize flood design value assessment methods. The long-term maximum annual discharge series of the Danube River and other rivers in the Danube basin were analyzed and used to estimate the flood design values. We used the Log-Pearson type III distribution, which is one of the most widely used theoretical probability distributions to estimate extremes. This distribution can be flexibly applied to extreme values depending on the skew coefficient. We also analyzed the effect of the inclusion and exclusion of the historical extremes in the processed dataset. The results show that the inclusion of historical floods and the regionalization of the Log-Pearson type III distribution skew parameter can change the design discharges.


Introduction
Flood frequency analysis plays a major role in the design of hydraulic structures and flood control management. One of the fundamental problems of flood hydrology was (and still is) establishing the relationship between peak discharges of flood waves and the probability of their return period. Extrapolation from these variables (a so-called frequency curve) is especially necessary for water management and flood control plans. Directive 2007/60/EC of the European Parliament of 23 October 2007 concerning the assessment and management of flood risks requires member states to draw up flood hazard maps of floods with long return periods (from 100 to 1000 years). On the basis of the statistics, it is clear that the extrapolation of the data is very sensitive to both the length of the data series and the inclusion of historic extremes in the data series. Investigation of the history of extreme flood event frequency, severity and duration provides a greater understanding of the region's extreme event characteristics and the probability of occurrence at various levels of severity. This type of information is beneficial in the development of extreme response and mitigation strategies and preparedness plans. Many hydrologists consider mathematician E.J. Gumbel to be one of the pioneers in the development of extreme value theory [1]. In [2], the theory of extremes was applied to different areas, with a main focus on hydrology, the definition of floods and a practical method for estimating flood frequencies. The correct estimation of potential culminations of floods requires the inclusion of long-term observational data series and historic preinstrumental data to statistically analyze data series [3][4][5][6][7][8][9][10]. An extensive overview of scientific papers dealing with the historical floods and individual disastrous flood events in the past millennium for the rivers in various European countries can be found in [11]. 
Moreover, historic floods that occurred in the Danube River Basin are summarized in the book of Pekárová et al. [12].
Another important factor in the correct estimation of extremes is the uncertainty of the applied statistical method. For example, the estimation of uncertainty for design discharges was investigated in [13][14][15][16]. In [17], four models were compared in terms of goodness of fit, their uncertainties, parameter estimation methods and implications for estimating flood quantiles. Regional flood frequency analyses (RFFA) using L-moments and annual maximum series (AMS) methods for Pannonia basins were conducted in [18]. The following conditions are among the basic assumptions for the application of frequency analyses of maximum annual discharge:
- Maximum annual discharges must be independent and stochastic;
- Processes influencing the runoff process are stationary with respect to time (homogeneity of the series);
- Statistical characteristics of the measured data series (series of maximum annual discharge) represent the past, present, and future.
A second problem in hydrology is hydrological regionalization, which concerns the manner in which data are transferred to ungauged basins or to sites with deficient data. There are two main procedures used for this transfer. The first consists of discovering certain relationships for the spatial interpolation of the principal statistics of the probability curves. The second, which tries to eliminate the shortcomings of the first, consists of determining several statistical distribution curves of the standardized annual maximum discharge. Standardization is achieved by dividing the maximum annual discharges by their average magnitude. These standardized (or dimensionless) curves are often called growth curves [19]. All methods of estimating floods with a very long return period are associated with great uncertainties. Determining the specific values of a 500 or 1,000-year flood for engineering purposes is extremely complex. Nowadays, hydrologists are required both to determine the specific design values of the extremes and to specify the confidence intervals within which the discharge of a given flood (for example, a 100, 500 or 1,000-year flood) may occur with a probability of 90%. Globally, a huge number of scientific papers deal with the selection and testing of theoretical probability distributions for estimating the maximum values of hydrological characteristics. The application and selection of a particular probability distribution function, the method of parameter estimation and the analyzed period depend on the calculation method generally used in a given country [20][21][22][23][24]. For example, since 1967 in the United States, the Log-Pearson type III (LPIII) distribution has often been chosen by experts as "The distribution of choice for floods" [25]. The LPIII is used to estimate extremes in many natural processes and is one of the most commonly used probability distributions in hydrology [26][27][28][29].
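The standardization behind growth curves, dividing each annual maximum by the series mean, can be sketched in a few lines. This is only an illustration: the function name and the discharge values are hypothetical and are not taken from the Danube dataset.

```python
from statistics import mean


def standardize_annual_maxima(q_max):
    """Return dimensionless ratios Q_i / mean(Q) for one station."""
    q_mean = mean(q_max)
    if q_mean <= 0:
        raise ValueError("mean discharge must be positive")
    return [q / q_mean for q in q_max]


# Hypothetical annual maxima in m^3/s (illustrative values only).
example_series = [1800.0, 2400.0, 2100.0, 3300.0, 2400.0]
ratios = standardize_annual_maxima(example_series)
```

Because every standardized series has a mean of 1, ratios from stations of very different size can be compared or pooled on a common scale.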
In [30], the use of the Log-Pearson distribution to estimate maximum annual precipitation and discharge was investigated. The authors concluded that this distribution is more suitable for discharges with higher return periods, whereas for annual floods, the existence of an upper bound for the distribution may cause uncertainty in some cases. The use of historical information to improve flood quantile estimates was investigated in [31]. The authors showed that much of the information contained in historical flood records is connected with knowing the number of exceedances of the threshold rather than the magnitudes of the "historic" floods. Various authors, e.g., see [32][33][34], prefer the generalized extreme value (GEV) distribution for estimating hydrological extremes. In [35], the authors examined the suitability of several types of probability distribution (GEV, LPIII and Gumbel) for estimating T-year discharges. The results of [35] showed that the GEV distribution was applicable to the Upper Thames River Watershed data, but the authors recommended further research. Besides the aforementioned factors, the estimation of T-year discharges is ultimately influenced by the type of theoretical probability distribution function used. The chosen theoretical probability distribution function should accurately represent the uncertainty and variability of the hydrological problem. For large international basins, such as the Danube River Basin, it is necessary to synchronize the methodology and to prepare common procedures for determining flood hazards.
Therefore, the aim of the paper is to propose a uniform methodology for the harmonization and generalization of design value assessments of flood discharges in stations along the Danube River. We used LPIII distribution as a mathematical tool. The skew coefficient is a measure of the asymmetry of that distribution and is sensitive to extreme events. In the first part of this paper, we analyze the effect of the inclusion or exclusion of historical extremes in the processed dataset on design value estimation. The second part of the paper is focused on the estimation of the relationship between the skew coefficient of the Log-Pearson type III distribution function and runoff depth, basin area, and elevation, for the purpose of regionalization. The last section offers the results, conclusions and a short discussion.

Materials
The Danube River is the second largest river in Europe, after the Volga. The basin covers an area of 817,000 km² (Figure 1). The river originates in the Black Forest in Germany at the confluence of the Brigach and the Breg streams. The Danube then flows southeast for 2872 km (1785 mi), passing through four Central European capitals before emptying into the Black Sea via the Danube Delta in Romania and Ukraine. The Danube River Basin landscape geomorphology is characterized by a diversity of morphological patterns, and the river channel itself can be divided into six sections (Figure 1). The territory of the Danube River Basin is also one of the most flood-endangered regions in Europe. Therefore, it is vital to have complete data on the flood regime to be able to generalize such information on the basis of long-term observations from the whole Danube territory. The occurrence of large floods on the Danube River is described in detail in many publications [8,11,36-42]. In this study, the long-term data of annual maximum discharges from 20 stations (Table 1) along the Danube River from Germany to Ukraine were used to determine the T-year maximum discharges. The basic statistical characteristics of the stations are presented in Table 1. Figure 2 shows examples of the maximum annual discharge series, with lengths of 142 to 193 years, in the upper (Hofkirchen gauge), central (Bratislava gauge), and lower Danube (Orsova/Turnu Severin gauge).

Log-Pearson III Probability Distribution
For the estimation of the Qmax discharge series distribution function, we used Log-Pearson Type III distribution. The LPIII distribution is used to estimate the extremes in many natural processes and is the most commonly used frequency distribution, especially in hydrology. The Log-likelihood function of LPIII with estimation of its parameters was developed in [27]. In [43], a frequency factor-based method for hydrological frequency analysis for the random generation of five distributions (normal, lognormal, extreme value type 1, Pearson Type III and Log-Pearson Type III) is presented. The LPIII distribution was also used in flood frequency analysis in [28,33,34]. Use of one type of distribution also allows the value of the T-year maximum discharges to be estimated for parts of the river without observations on the basis of the long-term average of maximum annual discharge and distribution parameters from the neighboring gauging stations.
To estimate the distribution parameters, the method described in the Interagency Committee on Water Data Bulletin 17B [44] was used. Bulletin 17B provided revised procedures for weighting station skew values with results from a generalized skew study, detecting and treating outliers, making two-station comparisons and computing the confidence limits of the frequency curve. Flood estimation procedures in the United States traditionally use two primary methods: frequency analysis of peak discharges for floodplain management and levee design, and deterministic Probable Maximum Flood estimates for the design of dams and nuclear facilities [45].
The Log-Pearson Type III distribution is a three-parameter gamma distribution with a logarithmic transformation of the variable. It is widely used for flood analyses because annual maximum discharge series quite frequently fit the assumed distribution. The probability density function of the Pearson Type III distribution is of the following form:

$$f(x) = \frac{1}{\beta\,\Gamma(\alpha)} \left(\frac{x-\tau}{\beta}\right)^{\alpha-1} \exp\left(-\frac{x-\tau}{\beta}\right) \qquad (1)$$

where τ is the location parameter, α is the shape parameter, β is the scale parameter and Γ(α) is the Gamma function given by Equation (2):

$$\Gamma(\alpha) = \int_{0}^{\infty} t^{\alpha-1} e^{-t}\,dt \qquad (2)$$

The moment method uses the logarithms of variables to estimate the distribution parameters: $\log X = \hat{\mu} + K\hat{\sigma}$, where X is a random variable, $\hat{\mu}$ is the mean, $\hat{\sigma}$ is the standard deviation and K is a factor of the skew coefficient at a selected exceedance probability.
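The relation between the log-space moments and a T-year discharge, log Q = mean + K · standard deviation, can be illustrated with a short sketch. It is not the Bulletin 17B spreadsheet used in this study: the frequency factor K is taken here from the Wilson-Hilferty approximation, which is an assumption made for brevity and is reasonably accurate only for moderate skew values. The function names and the example moments are hypothetical.

```python
from statistics import NormalDist


def frequency_factor(skew, exceedance_prob):
    """Approximate Pearson III frequency factor K (Wilson-Hilferty)."""
    # Standard normal deviate for the chosen exceedance probability.
    z = NormalDist().inv_cdf(1.0 - exceedance_prob)
    if abs(skew) < 1e-6:
        return z  # zero skew reduces to the lognormal case
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)


def lp3_quantile(mean_log, std_log, skew_log, return_period):
    """T-year discharge from the moments of log10(Q)."""
    big_k = frequency_factor(skew_log, 1.0 / return_period)
    return 10.0 ** (mean_log + big_k * std_log)


# Hypothetical log10 moments (not a Danube station).
q100 = lp3_quantile(3.5, 0.2, 0.3, 100)
q1000 = lp3_quantile(3.5, 0.2, 0.3, 1000)
```

For production work, tabulated K values or an exact Pearson III quantile function would be preferable to the approximation.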

Conditions of Qmax Series
The distribution is fit by computing the base 10 logarithms of the discharge, Q, at a selected exceedance probability, P, using the following Equation (3):

$$\log Q = \bar{X} + K S \qquad (3)$$

where $\bar{X}$ is the mean, S is the standard deviation and K is a factor of the skew coefficient at the selected exceedance probability. The formulas for these parameters are provided below, where $X_i = \log Q_i$ and n is the number of observations.

Mean:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (4)$$

Standard Deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} \qquad (5)$$

Skew Coefficient:

$$G = \frac{n \sum_{i=1}^{n} (X_i - \bar{X})^3}{(n-1)(n-2)\,S^3} \qquad (6)$$

The Kolmogorov-Smirnov test was performed to test the assumption that the discharge magnitudes followed the theoretical distributions. The p-value was used as the criterion: the proposed distribution hypothesis was not rejected when p ≥ 0.05. Probability estimates were calculated for the chosen plotting positions. A basic plotting position formula for symmetrical distributions is given by [32]:

$$p_i = \frac{i - \alpha}{n + 1 - 2\alpha} \qquad (7)$$

where $p_i$ is the exceedance probability of variable observations $X_i$ ranked from largest (i = 1) to smallest (i = n), and α is a plotting position parameter (0 ≤ α ≤ 0.5).
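Empirical exceedance probabilities for a ranked series can be sketched as follows, assuming the common general form p_i = (i − α)/(n + 1 − 2α), which gives the Weibull formula for α = 0 and the Hazen formula for α = 0.5. The function name, the default α = 0.4 and the sample values are illustrative choices, not settings reported by the study.

```python
def plotting_positions(discharges, alpha=0.4):
    """Pair each discharge with its empirical exceedance probability.

    Observations are ranked from largest (i = 1) to smallest (i = n).
    """
    if not 0.0 <= alpha <= 0.5:
        raise ValueError("alpha must lie between 0 and 0.5")
    n = len(discharges)
    ranked = sorted(discharges, reverse=True)
    return [
        (q, (i - alpha) / (n + 1 - 2 * alpha))
        for i, q in enumerate(ranked, start=1)
    ]


# Hypothetical annual maxima in m^3/s.
positions = plotting_positions([2100.0, 3300.0, 1800.0, 2400.0], alpha=0.0)
```

The resulting pairs are what would be plotted against the fitted theoretical curve to judge the fit visually.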

Parameter Estimation: Simple Case
The method of moments uses the logarithms of flood discharges to estimate the distribution parameters. The first three sample moments are used to estimate the LPIII parameters. These include the mean ($\hat{\mu}$), standard deviation ($\hat{\sigma}$), and skew coefficient ($\hat{\gamma}$). In the case where only systematic data are available, with no historical information, the mean, standard deviation, and skew coefficient of the station data may be computed using the following equations:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (8)$$

$$\hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \hat{\mu})^2} \qquad (9)$$

$$\hat{\gamma} = \frac{n}{(n-1)(n-2)\,\hat{\sigma}^3}\sum_{i=1}^{n} (X_i - \hat{\mu})^3 \qquad (10)$$

where $X_i$ is the logarithm of the i-th flood discharge, n is the number of flood observations and (ˆ) represents a sample estimate. The sample standard deviation and skew coefficient include the bias correction factors (n−1) and (n−1)(n−2) for small samples, respectively.
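The three sample moments of the log-transformed series can be computed directly, as in the sketch below. It assumes base 10 logarithms and the bias-corrected estimators with the factors (n−1) and (n−1)(n−2); the function name and the test series are hypothetical.

```python
import math


def log_moments(discharges):
    """Mean, standard deviation and skew coefficient of log10(Q)."""
    x = [math.log10(q) for q in discharges]
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = sum(x) / n
    std = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    if std == 0.0:
        raise ValueError("series has zero variance")
    skew = n * sum((xi - mean) ** 3 for xi in x) / ((n - 1) * (n - 2) * std ** 3)
    return mean, std, skew


# Symmetric series in log space, so the skew coefficient should vanish.
mu, sigma, gamma = log_moments([10.0, 100.0, 1000.0])
```

A series with one very large flood would give a clearly positive skew, which is the sensitivity to extremes discussed in the text.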

Historical Floods
Historical flood peaks reflect the frequency of large floods and thus should be incorporated into the flood frequency analysis. They can also be used to judge the adequacy of estimated flood frequency relationships. For the latter purpose, appropriate plotting positions or estimates of the average exceedance probabilities associated with the historical peaks and the remainder of the data are desired. An algorithm for assigning plotting positions to censored data, such as historical floods, is provided in [46] and [47].

Skew Coefficients in Log-Pearson III Distribution-Regionalization
The skew coefficient is a measure of the asymmetry of the distribution. There is relatively large uncertainty for the station sample coefficient of skewness (third moment) because it is sensitive to extreme events in records of limited length [48,49]. The station skew coefficient (Gs) and regional skew coefficient can be combined to form a better estimate of skew for a given watershed. Under the assumption that the regional skew coefficient is unbiased and independent of the station skew, the mean-square errors (MSEs) of the station skew and the regional skew can be used to estimate a weighted skew coefficient. If the regional and station skews differ by more than 0.5, a careful examination of the data and the flood-producing characteristics of the watershed should be made. Greater weight may be given to the station skew depending on record length, the largest floods within the gauging record and watershed, and watershed characteristics. Large deviations between the regional skew and station skew may indicate that the flood frequency characteristics of the watershed of interest are different from those used to develop the regional skew estimate. It is thought that station skew is a function of rainfall skew, channel storage and basin storage [50]. There is considerable variability of responses among different basins with similar observable characteristics, in addition to the random sampling variability in estimating skew from a short record. It is considered reasonable to give greater weight to the station skew, after due consideration of the data and flood-producing characteristics of the basin [49].
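The MSE-based weighting described above can be sketched as an inverse-error average in which the estimate with the smaller mean-square error receives the larger weight, as in Bulletin 17B. The MSE inputs must come from the record length and the regional study; the values used in the example, and the function names, are hypothetical.

```python
def weighted_skew(station_skew, regional_skew, mse_station, mse_regional):
    """Combine station and regional skew, weighting by the other's MSE."""
    if mse_station < 0 or mse_regional < 0:
        raise ValueError("mean-square errors must be non-negative")
    total = mse_station + mse_regional
    if total == 0:
        raise ValueError("at least one mean-square error must be positive")
    return (mse_regional * station_skew + mse_station * regional_skew) / total


def needs_review(station_skew, regional_skew, threshold=0.5):
    """Flag cases where the two skews differ by more than the threshold."""
    return abs(station_skew - regional_skew) > threshold


# Hypothetical inputs: a long record (small station MSE) dominates.
g_w = weighted_skew(0.2, 0.8, mse_station=0.1, mse_regional=0.3)
flag = needs_review(0.2, 0.8)
```

The review flag only marks a station for the closer examination of data and flood-producing characteristics recommended in the text; it does not decide which skew to use.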
The estimation of the design discharge values by Log-Pearson III-type probability distribution according to the method described in [44] is presented hereafter. The frequency curve spreadsheet version 3.06 of [44] was used to estimate the parameters of distribution function with exclusion and inclusion of the historical floods data in the calculation. The design discharge values for 20 gauge stations from Germany to Romania along the Danube River were calculated.

Estimation of the T-Year Design Discharges Along the Danube River
As the first step, we estimated the LPIII distribution function parameters (mean Q, standard deviation S, and station skew coefficient Gs) for each of the stations separately and computed the QT design values. The design values of selected T-year annual maximum discharges along the Danube River with station skew coefficients Gs are listed in Table 2. In the case of gauges for which some historical maxima were known, we added the historical data to the calculation. Next, the parameters of the LPIII distribution curves were recalculated for individual stations. The design values of selected T-year annual maximum discharges along the Danube River with historical skew coefficients Gh are presented in Table 3. The inclusion of the historic flood data in the calculation increased the skew coefficient by an average of 0.22. The highest difference between skew coefficients with and without historical data was 0.87 for the Regensburg-Schwabelweis station. Examples of the computation of theoretical LPIII exceedance probability curves of the Danube maximum annual discharges, without historical data and with historical data, for Regensburg-Schwabelweis, Stein-Krems, Devin/Bratislava, and Ceatal Izmail are illustrated in Figure 4.
Differences in the estimation of the maximum discharges with a return period of 100 and 1,000 years along the Danube River, estimated according to LPIII distribution with historical data and without historical data for each of the stations, are illustrated in Figure 5a,b. The average difference between the estimated maximum discharges in gauging stations, with or without the inclusion of the historical data, was 751 m³ s⁻¹ for a return period of T = 100 years and 1,730 m³ s⁻¹ for a return period of T = 1,000 years. Our investigation showed that the inclusion of historical floods changed the curvature of the LPIII distribution curves and changed the design discharge.

Figure 5. Differences in the estimated maximum discharges with return periods of (a) 100 years and (b) 1000 years along the Danube River, estimated according to LPIII distribution with historical data (11 red points) and without historical data (20 blue points).

Regionalization of the Skew Coefficients of the LPIII Probability Curves for the Danube River
The previous analysis of the annual maximum discharges shows how the QT design values change along the Danube River. The ratios k = QT/Qa (Qa: long-term mean discharge) for selected stations are presented in Figure 6a. The 1,000-year discharge is 15 times higher than the mean annual discharge at the Berg station, seven times higher at the Bratislava station, and only three times higher at the Reni station. Subsequently, we individually plotted the course of the skew parameter Gs for each station (Figure 6b).

Within the regionalization, we investigated various dependences of the skew coefficient Gs on individual physical-geographical characteristics (river basin area, altitude of the station) and on runoff depth at stations along the Danube River. The equations computed from the regression analyses could then be used to calculate flood-discharge estimates at sites where the basin characteristics were known, but for which no discharge data were available. The course of the long-term runoff depth R (mm per year) along the Danube River is illustrated in Figure 7. Figures 6b and 7 show that both skew coefficients, Gs and Gh, follow a course similar to that of the long-term runoff depth R at the analyzed stations. Therefore, we primarily analyzed the relationships between the skew coefficient and the runoff depth R (Figure 8a,b).

After estimating the best fitted relationship for the Danube River stations, we propose using the generalized (regional) skew coefficient Ghr calculated according to the following relation (11):

$$G_{hr} = 0.0025\,R - 0.7756 \qquad (11)$$

where Ghr denotes the regional (generalized) skew coefficient with some historical data, and R denotes the long-term runoff depth (from ca. 260 to ca. 600 mm per year). An example of the regression Equation (11) being applied to calculate the regionalized skew parameters Ghr for the Danube River at Hofkirchen is presented herein. The station skew coefficient Gs of the selected station was relatively low. Such skew parameter values meant that the upper curvature of the LPIII exceedance probability curves did not capture extreme values. An example of the computation of theoretical LPIII exceedance probability curves of the Danube at Hofkirchen with the station skew parameter Gs and the regionalized parameter Ghr is shown in Figure 9.
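The regional relation Ghr = 0.0025R − 0.7756 is simple enough to apply in code. The sketch below evaluates it and also reports whether the runoff depth lies within the approximate range of ca. 260 to ca. 600 mm per year for which the relation was derived. The function name, the range check and the example runoff depth are illustrative additions, not part of the published method.

```python
SLOPE = 0.0025        # regression slope from relation (11)
INTERCEPT = -0.7756   # regression intercept from relation (11)
R_MIN_MM, R_MAX_MM = 260.0, 600.0  # approximate range of fitted runoff depths


def regional_skew(runoff_depth_mm):
    """Return (Ghr, in_range) for a long-term runoff depth in mm per year."""
    in_range = R_MIN_MM <= runoff_depth_mm <= R_MAX_MM
    return SLOPE * runoff_depth_mm + INTERCEPT, in_range


# Hypothetical runoff depth, not a value reported for a specific station.
g_hr, ok = regional_skew(400.0)
```

Returning a flag rather than raising an error reflects that the stated bounds are approximate; values outside them are extrapolations and should be treated with caution.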

Discussion
Monitoring and evaluation of extreme hydrological phenomena using various methods and models are very important, as anthropogenic activities can negatively affect the application of frequency analyses. It is vital to have complete flood regime data to be able to generalize such information on the basis of long-term observations from basins. We consider the plotting of flood risk maps for these extreme hydrological situations to be necessary regardless of Directive 2007/60/EC of the European Parliament. Determining design values for extreme floods with a long return period (once every 100, 500 or 1,000 years) is a complex process with great levels of uncertainty.
Assessing the design discharges along the Danube channel is also complicated by the different estimation methods in various countries. Therefore, it is necessary to harmonize flood design value assessment methods. The authors in [53] summarized the regionalization of distribution functions estimated for annual peak discharges in the Danube basin based on regional empirical relationships from sufficiently long and reliable series of annual peak discharges available for 176 water gauging stations in the Danube catchment. The aim was to facilitate the estimation of the quantile of annual peak discharge and the related specific flood discharges in the ungauged river sections of that catchment.
Our paper presents another possible approach for determining the design values of the T-year floods with very long return periods along the Danube River. We tested and used a uniform methodology to estimate the design values of flood discharges at 20 stations along the Danube River, with the aim of harmonizing design discharge estimation along the river. The Log-Pearson type III distribution was selected for its flexibility and because it can be used with extreme values according to the coefficient of skewness (G). Using one type of distribution made it possible to generalize its skewness coefficients. Thereafter, we were able to estimate T-year discharges at gauges with short observation periods, and at sites between gauges. The first results showed that the station skew parameter Gs indicated high positive values at the Danube River stations with low infiltrating areas, quick propagation of flood waves, and one or more extremely high peak discharges. On the other hand, the station skew parameter Gs indicated negative values at the Danube River stations with a higher share of infiltrating areas and runoff from catchments regulated by lakes and wetlands. Previous experiences with the occurrence of extreme floods also showed that the historical flood events need to be included in the latest calculations when estimating the threat of such events. For the Danube River, we can see how the inclusion of historical floods in the measured data series can change the estimation of the discharge values with return periods from 100 to 1,000 years. The results showed that the inclusion of historical floods can change and increase the design discharge. The authors in [54] assessed the added value of using historical data for flood quantile estimation, and their results showed that using historical flood information improved both the reliability and stability of the design flood estimates.

Conclusions
The Log-Pearson III distribution fits well with the observed data, and it is an appropriate mathematical tool for estimating the design values with long return periods.
The skew parameter and its optimal setting for the gauging station is sufficient in many cases; however, in practice, this may not be sufficient when considering a short data series or a data series from a period without historical data or a recent significant extreme.
An alternative method for estimating the design values in basins without extremes is to use the generalized skew parameter derived from a data series from basins with similar runoff regimes where the extremes are known or captured.
Our investigation showed that the regional skew parameter bent the LPIII distribution curves so that all discharges with a low probability of occurrence were captured within the Q5%-Q95% confidence intervals.
For cases in which the extreme value was included in the analyzed period and the long-term runoff depth was lower, the station's historical skew parameter achieved a better curvature for the LPIII distribution curve.
An important conclusion of this study relates to how local conditions and datasets at gauging stations can determine the magnitude of discharge uncertainty and how the discharge uncertainty changes according to time and magnitude. The results of this study can be applied to investigate and improve the estimation of the design discharges of ungauged or poorly gauged rivers within the Danube River Basin. In frequency analyses, it is again important to note that the process is never-ending and, if anything, will change according to catchment. Thus, it is necessary to recalculate distribution curves and define new design discharges for recent periods for particular stations.