On the Uncertainty and Changeability of the Estimates of Seasonal Maximum Flows

A classical approach to flood frequency modeling is based on the choice of the probability distribution to best describe the analyzed series of annual or seasonal maximum flows. In the paper, we discuss the two main problems, the uncertainty and instability of the upper quantile estimates, which serve as the design values. Ways to mitigate the above-mentioned problems are proposed and illustrated by seasonal maximum flows at the Proszówki gauging station on the Raba River. The inverse Gaussian and generalized exponential distributions, which are not commonly used for flood frequency modeling, were found to be suitable for Polish data of seasonal peak flows. At the same time, the heavy tailed distributions, which are currently recommended for extreme hydrological phenomena modeling, were found to be inappropriate. Applying the classical approach of selecting the best fitted model to the peak flows data, significant shifts in the upper quantile estimates were often observed when a new observation was added to the data series. The method of aggregation, proposed by the authors, mitigates this problem. Elimination of distributions that are poorly fitted to the data series increases the stability of the upper quantile estimates over time.


Introduction
The main goal of flood frequency analysis (FFA) is to assess the size of the probable flood peak flows. As the true probability distribution function (PDF), which describes the analyzed series of annual or seasonal maximum flows, is not known, the flood frequency analysis refers to a hypothetical distribution whose parameters are estimated based on the observation series. Due to many practical applications, interest has been focused on the upper quantile estimates. For example, their values are necessary when most hydraulic structures are dimensioned and when the limits of flood zones are determined. The most commonly used design value is the 1% quantile estimate (Q max1% ) and it represents the probable maximum flow that is exceeded on average once in a hundred years. The 1% quantile is equivalent to the 0.99th quantile in the notation of probability of non-exceedance.
Research on FFA has been conducted for almost a century [1,2], resulting in numerous PDFs proposed for describing the maximum flow series. According to the World Meteorological Organization (WMO) in 1989, the Gumbel and log-normal distributions were the most widely used, followed by log-Pearson 3 and Pearson 3 distributions [3]. Nowadays, heavy-tailed distributions are recommended such as the generalized extreme values distribution (GEV), generalized log-logistic (GLL), or log-Pearson 3 (LP3) [4][5][6]. However, the heavy-tailed form of hydrological variables is not sufficiently supported in Polish conditions [7,8] and soft-tailed distributions appear to be more adequate [9]. In general, the selection of the appropriate distribution function for peak discharges modeling is to some extent subjective and driven by the regional regime of the rivers or the particular country's regulations Water 2020, 12 (e.g., [4,10,11]). Therefore, one cannot point out one model that would be the best or even better than the others to be generally used for the estimation of flood quantiles. For annual and seasonal maximum flow models, the unbounded distributions with non-negative skewness are usually used. In this paper, two-parameter distributions with the scale and shape parameters and lower bounded at zero were investigated, instead of their three-parameter equivalents, since two parameters are more reliable to estimate from short hydrological data series, which are usually available [3,12]. Note that two-parameter distributions are commonly used for academic and practical applications when dealing with short measurement series and when there is no regional information. The list of probability density functions analyzed here is presented in Table 1. The domain of the shape parameter k is given, while the scale parameter α > 0 and random variable x ≥ 0. Table 1. Probability density functions used in the paper.

Distribution
Probability Density Function (PDF) The distributions of gamma (Ga), Weibull (We), log-normal (LN), log-logistic (LL), and log-Gumbel (LG) are commonly used in FFA, while the inverse Gaussian (IG) and generalized exponential (GE) distributions are not in standard practice. The inverse Gaussian has alternative names such as the Wald distribution and, in Poland, the convective diffusion distribution [13] and has revealed a good fit to the numerous annual maximum flow series for Polish lowland rivers [9]. The generalized exponential distribution developed by Gupta and Kundu [14] has been compared to gamma, Weibull, Pareto, and log-normal distributions [15][16][17], since all of them have a positive skewness. In Poland, Brzeziński presented the possibility of using GE distribution in order to describe the random properties of maximum flows [18], while Markiewicz introduced GE to flood frequency analysis of annual Polish data [19]. In the last paper, the GE distribution has been an alternative to IG distribution and has proven to be promising in modeling the annual peak flows for Polish rivers.
To select the theoretical distribution for the observed data series, the method of the moment ratio diagram (MRD) is used in many fields [20,21]. When two-parameter distributions are considered, the skewness coefficient is usually plotted against the variation coefficient both for the candidate PDFs and for the data series, and their relative locations are compared [22]. The MRD based on the L-moment ratios is superior to the conventional MRD, since the results from the L-moment MRD are more compatible with the results obtained from the goodness-of-fit statistics than in the case of conventional MRD [5,23,24].
The scope of this study refers to the seasonal maximum flows. In Poland and countries with similar hydrological regimes, two general hydrological seasons are considered: the summer season, which in Poland goes from May 1 to October 31, and the winter season from November 1 to April 30. The floods occurring in each season have different origins: rainfalls in summer and thaws in winter. However, nowadays, when global climate changes are observed, the genesis of floods in particular seasons is not as clear and other factors are becoming more important such as land cover, retention, or air circulation, Water 2020, 12, 704 3 of 18 which differs in summer and winter [25][26][27]. At present, very few snowmelt floods are observed [28] and we can at most say that there are floods from the mixture of rain and snowmelt. The snowmelt as a factor causing floods is of little relevance for some regions like the northern United Kingdom, western France, or northern Iberia [28,29]. The seasonal approach is beneficial in analyzing the impact of climatic change on the high water regime due to the greater vulnerability of seasonal flood generation processes on climate drivers [30,31]. The comparison of annual and seasonal approaches for modeling the peak flow series of Polish data has been investigated in theoretical and real aspects [32,33].
The seasonal maximum flows are genetically homogeneous [34,35]. If they are independent, the product model of seasonal distributions can be used [35,36]. Although the issue of the independence of seasonal peak flows can be disputed [37], for the Polish hydrological regime, the assumption of the independence of summer and winter maximum flows is reasonable [32,33,38]. Otherwise, the bivariate distributions or copula functions ought to be used [38,39].
The aim of the study was to evaluate the accuracy and stability of the estimates of seasonal maximum flows and give suggestions for their improvement. A particular purpose was the assessment of the applicability of two distributions, which has not been used for the modeling of seasonal maximum flows thus far, and these are the generalized exponential (GE) and the inverse Gaussian (IG) distributions. The second particular goal of this study was a popularization of the aggregation method, which was developed by the authors in [40], but has not been investigated within the context of flood frequency analysis as yet. Here, the influence of the number and type of competitive distributions on the aggregation of seasonal flood quantile estimates was analyzed. The novelty of this research consists of attracting the attention of researchers to the problem of changing design estimates along with the lengthening observational series. To the best our knowledge, this issue has not been addressed in theoretical studies, however, its scrutinized analysis is necessary for engineering, planning, and water-economic applications. As we are going to show here, the aggregation method provides design values that are more stable than in the case of choosing one as the best model.

Case Study
In this paper, the data of seasonal maximum flows from 37 gauging stations at Polish Rivers were investigated. The stations are listed in Appendix A (Table A1), along with the observation periods and the mean value of both the summer and winter maximum flows, while the location of all stations is illustrated in Figure 1. The studies mainly focused on the catchment of the Vistula Basin, which is characterized by high flood potential in its southern part. Here, the dominance of summer floods caused by intensive, long-lasting rainfall is observed. Gauging stations in the central part of the country represent mainly a snowmelt or mixed snowmelt-rainfall flood regime.
The studies presented in the paper are illustrated for the Proszówki gauging station on the Raba River (no. 17, Figure 1). The data series of seasonal peak flows from the period 1951-2016 was considered. A choice of this hydrological station was caused by the main reason to reflect the difficulties hydrologists come across in their work. Proszówki Station is the closing station of the Raba River catchment. The reduction of flood peaks at Proszówki is due to the Dobczyce Reservoir (put in use in 1986) situated about 30 km upstream. The reservoir, which has total storage capacity of 127 million m 3 and a flood storage capacity of about 30 million m 3 , is a multi-purpose reservoir for flood protection, hydropower generation, and drinking water supply. In the face of a lack of information about water management for the reservoir and the awareness of the importance of naturalization of the flow, we decided to apply the simplified method for the reconstruction of the seasonal flood peak flows based on regression analysis with the reservoir upstream station, Stróża, for the common observation period before 1986, the coefficients of determination being equal to 82.36% for winter and 74.25% for summer, respectively. It was obvious that this simplified procedure introduced additional uncertainty to the data series. However, there are other sources of uncertainty one cannot avoid (e.g., uncertainties of flow measurements, upper parts of rating curves, and in data processing and control). Despite the continuous development of measurement methods, the assessment of high waters is still not perfect. Moreover, we cannot quantify or eliminate uncertainty from historical data, when the measurement methods were much less precise than today. All of these uncertainties, together with different criteria of the best model selection, makes the problem of design quantile assessment equivocal, what is shown in the next sections. The studies presented in the paper are illustrated for the Proszówki gauging station on the Raba River (no. 17, Figure 1). The data series of seasonal peak flows from the period 1951-2016 was considered. A choice of this hydrological station was caused by the main reason to reflect the difficulties hydrologists come across in their work. Proszówki Station is the closing station of the Raba River catchment. The reduction of flood peaks at Proszówki is due to the Dobczyce Reservoir (put in use in 1986) situated about 30 km upstream. The reservoir, which has total storage capacity of 127 million m³ and a flood storage capacity of about 30 million m³, is a multi-purpose reservoir for flood protection, hydropower generation, and drinking water supply. In the face of a lack of information about water management for the reservoir and the awareness of the importance of naturalization of the flow, we decided to apply the simplified method for the reconstruction of the seasonal flood peak flows based on regression analysis with the reservoir upstream station, Stróża, for the common observation period before 1986, the coefficients of determination being equal to 82.36% for winter and 74.25% for summer, respectively. It was obvious that this simplified procedure introduced additional uncertainty to the data series. However, there are other sources of uncertainty one cannot avoid (e.g., uncertainties of flow measurements, upper parts of rating curves, and in data processing and control). Despite the continuous development of measurement methods, the assessment of high waters is still not perfect. Moreover, we cannot quantify or eliminate uncertainty from historical data, when the measurement methods were much less precise than today. All of these uncertainties, together with different criteria of the best model selection, makes the problem of design quantile assessment equivocal, what is shown in the next sections.
The bar charts of regulated (observed) and unregulated (reconstructed) maximum seasonal flow series are presented in Figure 2. Then, flood frequency analysis was performed with unregulated The bar charts of regulated (observed) and unregulated (reconstructed) maximum seasonal flow series are presented in Figure 2. Then, flood frequency analysis was performed with unregulated flood data (blue columns in Figure 2). The applied procedure diminished the decreasing trends in the average value of the maximum seasonal flows caused by the reduction in the culmination of flood waves by the reservoir (red and blue lines in Figure 2).
The mean of unregulated flow maxima in 1951-2016 for summer peak flows was 433 (m 3 /s), and for winter peak flows it was 213 (m 3 /s), while the variation coefficient Cv was 0.718 and 0.569, respectively. average value of the maximum seasonal flows caused by the reduction in the culmination of flood waves by the reservoir (red and blue lines in Figure 2).
The mean of unregulated flow maxima in 1951-2016 for summer peak flows was 433 (m 3 /s), and for winter peak flows it was 213 (m 3 /s), while the variation coefficient Cv was 0.718 and 0.569, respectively.

Model Selection Procedures
To determine which of the alternative models was the best for modeling the maximum flow series of each season, the following procedures of model selection were used in this paper: 1. The AIC criterion (Akaike information criterion) is based on the likelihood function of distribution maximized with respect to the parameter values, also taking into account the number of distribution parameters . The AIC selection statistic is defined by Equation (1) for the i-th distribution of the probability density function from among alternative distributions and for = 1, . . , [41]: where N is the sample size and denotes the parameters of the i-th distribution, which can be determined by any estimation method. In this paper, three methods were used: the method of conventional moments (MOM) [42]; the linear moments method (LMM) [43]; and the maximum likelihood method (MLM) [44]. 2. The QK (Quesenberry-Kent) procedure is based on the density function modified to the form given by Equation (2), which is invariant under a scale transformation of the data [45]: where is the probability density function of i-th distribution with its scale parameter equal to one, and the unknown shape parameter that is estimated by the MLM method. To facilitate the analysis of the results and to compare the positive numbers with each other, the original selection statistic is preceded by a minus sign and a minimum value from all models is determined as the solution, instead of the maximum value.

Model Selection Procedures
To determine which of the alternative models was the best for modeling the maximum flow series of each season, the following procedures of model selection were used in this paper: 1.
The AIC criterion (Akaike information criterion) is based on the likelihood function of distribution maximized with respect to the parameter values, also taking into account the number of distribution parameters k. The AIC i selection statistic is defined by Equation (1) for the i-th distribution of the probability density function f i from among m alternative distributions and for i = 1, .., m [41]: where N is the sample size andθ i denotes the parameters of the i-th distribution, which can be determined by any estimation method. In this paper, three methods were used: the method of conventional moments (MOM) [42]; the linear moments method (LMM) [43]; and the maximum likelihood method (MLM) [44].

2.
The QK (Quesenberry-Kent) procedure is based on the density function modified to the form given by Equation (2), which is invariant under a scale transformation of the data [45]: where f i is the probability density function of i-th distribution with its scale parameter equal to one, and the unknown shape parameter that is estimated by the MLM method. To facilitate the analysis of the results and to compare the positive numbers with each other, the original selection statistic is preceded by a minus sign and a minimum value from all models is determined as the solution, instead of the maximum value.

3.
The KS (Kolmogorov-Smirnov) goodness of fit test is based on the Kolmogorov-Smirnov statistics defined by Equation (3) [46,47]: where p i x j:N denotes the theoretical probability of the j-th element of the ordered sample x j:N ≥ . . . ≥ x N:N with its empirical probabilityp j:N , expressed here by the Weibull formula (i.e.,p j:N = j/(N + 1)). The parameters of distribution can be estimated using three methods. The advantage Water 2020, 12, 704 6 of 18 of the Kolmogorov-Smirnov statistics is the independence of probability distribution D max from the CDF that is tested [48].

4.
The R (range) procedure has two variants. The first variant, R 1 , is based on the distance between the estimate of the 1% quantile from the maximum likelihood method and its value from the method of moments, and it is calculated using Equation (4) [49]: Another variant of range statistic is R 2 , proposed by the authors [19], and is based on the distance between the 1% quantile estimate from the maximum likelihood method and its value from the linear moments method. The R 2 statistic is defined by Equation (5): Regarding the decision rules of model selection procedures 1-4, the probability distribution with the smallest value of the selection statistic was the closest to the true population distribution among all competing models, while the model with the highest value of the selection statistic was the least fitting. The efficiency of the above model selection procedures has been studied in the literature for various configurations of distributions [19,[49][50][51].

Models Aggregation
In the classical approach, to estimate the probable maximum flow that serves as a hydrological design value, the commonly used methods are based on the choice of the best probability distribution from among m candidate distributions (i = 1, . . . , m). Meanwhile, the aggregation method proposed by the authors uses all of the information contained in these distributions by aggregating quantiles of individual distributionsQ maxp i in the form of the weighted average. The formula of the aggregated (average) quantile Q maxp is expressed by Equation (6) [40]: The weights w i are based on the value of the AIC i criterion according to Equation (7): where is the difference between the AIC value for the i-th model and the lowest value of AIC among all distribution candidates, which in fact means the best fitted distribution to the data series, according to the AIC criterion. For the set of PDFs with the same number of parameters, Equation (7) reduces to Equation (8), and then the likelihood function is used instead of the AIC value: Note that the weights w i can be interpreted (based on the Bayes theorem with equal a priori probabilities for all models) as the a posteriori probabilities of the adequacy of the i-th model. Hence, the aggregated quantile represents the expected value of m quantiles conditioned by the set of m models used for estimation as the candidate models.

Models for Seasonal Maximum Flows of Polish Data
For making the initial selection of appropriate distributions for Polish data of seasonal peak flows, both conventional and linear moment ratio diagrams MRD were plotted in Figure 3. For two-parameter distributions, both the relation of conventional skewness coefficient Cs versus variation coefficient Cv (Figure 3a) and the relation of linear skewness coefficient LCs versus linear variation coefficient LCv (Figure 3b) were plotted as the curves, while the above relations for the data series were plotted as the point. Point number 17 (red and blue) in Figure 3 represents Proszówki Station on the Raba River, which will serve here for the case study.

Models for Seasonal Maximum Flows of Polish Data
For making the initial selection of appropriate distributions for Polish data of seasonal peak flows, both conventional and linear moment ratio diagrams MRD were plotted in Figure 3. For twoparameter distributions, both the relation of conventional skewness coefficient Cs versus variation coefficient Cv (Figure 3a) and the relation of linear skewness coefficient LCs versus linear variation coefficient LCv (Figure 3b) were plotted as the curves, while the above relations for the data series were plotted as the point. Point number 17 (red and blue) in Figure 3 represents Proszówki Station on the Raba River, which will serve here for the case study.   When a point representing a gauging station is near a line that corresponds to a distribution, it means that this distribution may be the best fit for the peak flow series from the station. In an ideal situation, when the data follow a given distribution, an exact match would be achieved for an infinite sample. Since the data series are of limited length, the above graphs can only provide an approximation and should be established by procedures of model selection. The analysis of this issue is provided in Section 4.2.2.
In Figure 3, there is a range of the values for Cv-Cs and LCv-LCs, respectively, which was achieved by a large group of Polish measurement series and not covered by two-parameter distributions usually used in FFA, marked by solid lines. The dotted lines illustrate inverse Gaussian and generalized exponential distributions, which appear to fit well within the scope of Polish data of seasonal peak flows. This is particularly evident for winter peak flows in Figure 3a and summer peak flows in Figure 3b, where about half of the stations are out of range to match the commonly used probability distributions.

Estimation of Quantiles of the Maximum Flow Distribution
For seasonal maximum flows at Proszówki Station, seven hypothetical distributions were successively assumed (see Table 1). For each of the PDFs, two goodness of fit tests, χ 2 and the Kołmogorov-Smirnov tests [47], were performed and both of them determined that all seven PDFs were acceptable under the null hypothesis. The estimates of the 1% quantile calculated with three methods are shown in Table 2 along with the standard deviation (σ) of the estimates for the individual methods. For both seasons, the bolded number represents the smallest valueQ max1% among all considered models, which is yielded by the Weibull distribution along with the MLM for the summer season and with LMM for the winter one. The underlined number represents the greatest valueQ max1% and, for both seasons, it comes from the log-Gumbel distribution along with the MLM estimation method, and it stands out with a huge value in comparison with other models. The heavy-tailed distributions, represented by the LL and LG, were included in our studies, although their poor fit to the Polish data (see Figure 3) may result in the overestimation ofQ max1% , especially in the case of the LG distribution and the MLM. The MLM's estimators of quantiles are unbiased only if a parent (true) distribution is used for the estimation and the sample is large (asymptotic). Moreover, the MLM concentrates on the main mass of probability (i.e., far from the upper quantiles). In contrast, the MOM or LMM generally produces less biased estimators of upper quantiles [52][53][54]. Please also note that the MOM, and to a lesser extent the LMM, is more robust to the misspecification of the applied statistical models, and the values of the estimators of the upper quantiles are similar, regardless of the model applied (see the values of the standard deviation in the last column).
Regarding the variability of the quantile estimates for a given distribution depending on the choice of the estimation method, the Ga, We, GE distributions remain the least sensitive to the choice of estimation method, while the LG is the most sensitive.

Uncertainty of Distribution Selection
To determine which of the alternative models is the best for modeling the maximum flow series of each season, four procedures of model selection were used (Section 3.1) and the results are presented in Table 3. For each model selection procedure and estimation method, the best fitting distribution is marked in bold, while the PDFs fitting at the second and third place are marked in italics and underlined fonts, respectively. For instance, for the summer peak flows, according to the AIC criterion and using the MOM estimation method, the distribution best fitting the data was GE, followed by Ga and We, which is identical to the results of the AIC criterion and the LMM and MLM estimation methods, and also coincides with the results of the R 1 procedure. The GE distribution also turns out to be the best fit to the data among all competitive PDFs, after which the LN and LL are placed. The GE distribution also turns out to be the best fit to the data among all competitive PDFs, when QK procedure or KS test along with MOM method of estimation are applied. In both cases, the second and third place are occupied by Ga and LN distributions, respectively. For the KS test with the LMM estimation method, the best distribution was IG, followed by GE and LN, while with the MLM estimation method, surprisingly, the best was the LL distribution, followed by LN and GE, respectively. Finally, according to the R 2 selection procedure, first place was taken by the Ga distribution, followed by We and GE.
In the case of the winter maximum flows, there was a very similar layout of the best fitted distributions (i.e., the first place) as was in the case of the summer maximum flows. The GE distribution was the best according to the AIC criterion, regardless of the estimation method and according to the QK and R 1 procedures. The IG and LL distributions occupied first place when the KS test was applied, together with the LMM and MLM estimation methods, respectively. Except for the case of the R 1 procedure, in all variants of the model selection procedure with an estimation method, the LN distribution was in second place.
Summarizing summer season, the GE distribution ranked first in six out of nine possible variants of the model selection procedure with the estimation method, while the Ga, IG, and LL distributions had one win each. For the winter season, the GE distribution was best in five out of nine possible variants, IG ranked first in two variants, while We and LL had one each.
It is worth noting that for the Proszówki gauging station, the results of the distribution fitting to the data from the model selection procedures coincided with the results from the LCv-LCs diagram (Figure 3b). In general, the GE, Ga, LN, and IG distributions were the best fit for the summer peak flow series (red point 17), while the GE, IG, LN, and LL were right for the winter peaks (blue point 17).
The analysis conducted for Proszówki and several other gauging stations from the Vistula and Oder Basins presented in Figure 1 showed that both the GE and IG distributions can be recommended for modeling the seasonal maximum flows in Poland. Compared with the distributions popular in flood frequency modeling, the IG and GE distributions occupied leading positions. Nevertheless, the choice of one model that best fits the data is ambiguous and depends on the selection of the estimation method and the model selection procedure. This problem is typical for samples usually available in hydrology. Meanwhile, at practical applications, for example, the design of hydraulic structures, one value for the maximum flow estimate is desirable with the least uncertainty. The solution may be the method of quantile aggregation proposed by the authors [19,40] and discussed in the next section.

Instability of High Quantile Estimates with Increasing Length of Data Series
In the classical approach to estimating quantiles of maximum flow distributions, apart from the uncertainty resulting from the choice of the best distribution, there is a problem of the instability of estimations when lengthening the measurement series. As the length of the observation series increases, the repetition of the procedure (choice of distribution, estimation method, and model selection procedure) can lead to a change in the type of the best distribution, and at least to a change in the values of the parameter estimates of the current distribution. Due to the small size of the series and hence the strong dependence of the result on the successive incoming elements, these changes are periodic and chaotic and can cause significant changes in the design quantile during the multistage procedure of designing. This phenomenon (more or less accentuated) has been observed for almost all investigated seasonal peak flow series, as illustrated by the example of seasonal maximum flows from the period 1951-2016 for the Proszówki gauging station on the Raba River (Figure 4). Summarizing summer season, the GE distribution ranked first in six out of nine possible variants of the model selection procedure with the estimation method, while the Ga, IG, and LL distributions had one win each. For the winter season, the GE distribution was best in five out of nine possible variants, IG ranked first in two variants, while We and LL had one each.
It is worth noting that for the Proszówki gauging station, the results of the distribution fitting to the data from the model selection procedures coincided with the results from the LCv-LCs diagram (Figure 3b). In general, the GE, Ga, LN, and IG distributions were the best fit for the summer peak flow series (red point 17), while the GE, IG, LN, and LL were right for the winter peaks (blue point 17).
The analysis conducted for Proszówki and several other gauging stations from the Vistula and Oder Basins presented in Figure 1 showed that both the GE and IG distributions can be recommended for modeling the seasonal maximum flows in Poland. Compared with the distributions popular in flood frequency modeling, the IG and GE distributions occupied leading positions. Nevertheless, the choice of one model that best fits the data is ambiguous and depends on the selection of the estimation method and the model selection procedure. This problem is typical for samples usually available in hydrology. Meanwhile, at practical applications, for example, the design of hydraulic structures, one value for the maximum flow estimate is desirable with the least uncertainty. The solution may be the method of quantile aggregation proposed by the authors [19,40] and discussed in the next section.

Instability of High Quantile Estimates with Increasing Length of Data Series
In the classical approach to estimating quantiles of maximum flow distributions, apart from the uncertainty resulting from the choice of the best distribution, there is a problem of the instability of estimations when lengthening the measurement series. As the length of the observation series increases, the repetition of the procedure (choice of distribution, estimation method, and model selection procedure) can lead to a change in the type of the best distribution, and at least to a change in the values of the parameter estimates of the current distribution. Due to the small size of the series and hence the strong dependence of the result on the successive incoming elements, these changes are periodic and chaotic and can cause significant changes in the design quantile during the multistage procedure of designing. This phenomenon (more or less accentuated) has been observed for almost all investigated seasonal peak flow series, as illustrated by the example of seasonal maximum flows from the period 1951-2016 for the Proszówki gauging station on the Raba River ( Figure 4).   Starting from the 25-year series of seasonal peak flows (i.e., the measurements from 1951 to 1975), the best fitted distribution was chosen from among seven candidate distributions (see Table 1) and the quantile Q max1% corresponding to the best fitted distribution was assigned. This procedure has been repeated for data series extended for a successive year, up to the complete 66-year series. The MLM method was applied to estimate the distribution parameters and the AIC criterion was used to choose the best fit distribution.
For the series of maximum summer flows (Figure 4a), a significant difference in the estimates of 1% quantile was observed in the years 1982, 1984, and 1993-1994. Then, the type of the best fitted distribution changed from GE to LN, resulting in a significant increase in design value. The changes of other types of distribution did not cause such large changes in the estimates. It is worth noting that the peaks in the estimation curve are related to those of the smallest value of the observations, especially the peaks in 1982 and 1993 (see Figure 4a). For the series of maximum winter flows (Figure 4b), the greatest decreases were observed in 1977 and 2000. These were due to the change in the distribution type from IG or LN into GE. Moreover, in 2005-2006, there was a two-step increase in the 1% quantile estimates. Since 2000, the GE was the best fit distribution among the seven competing models, the jump was due to a change in the parameters of the current (GE) distribution with the addition of observations from 2005 and 2006. Interestingly, the largest shifts in the estimation curve (i.e., in 1977 and 2000) were downhill and were associated with those of the highest observation values (see Figure 4b).
The above results of the best matching distributions for the summer and winter maximum flows are consistent with the LCv-LCs graph (Figure 2b) for the red and blue points of number 17, respectively. Moreover, the variability of the seasonal 1% quantile estimates (Figure 4) broadly overlapped with the earlier studies [55] and with the latest studies [28]. In general, the summer peak flows were identified as stationary, while the winter peak flows were decreasing.
The jumps in the 1% quantile estimates occurred at various scales in all of the investigated data series. Some examples are given in Figure 5. Starting from the 25-year series of seasonal peak flows (i.e., the measurements from 1951 to 1975), the best fitted distribution was chosen from among seven candidate distributions (see Table 1) and the quantile % corresponding to the best fitted distribution was assigned. This procedure has been repeated for data series extended for a successive year, up to the complete 66-year series. The MLM method was applied to estimate the distribution parameters and the AIC criterion was used to choose the best fit distribution.
For the series of maximum summer flows (Figure 4a), a significant difference in the estimates of 1% quantile was observed in the years 1982, 1984, and 1993-1994. Then, the type of the best fitted distribution changed from GE to LN, resulting in a significant increase in design value. The changes of other types of distribution did not cause such large changes in the estimates. It is worth noting that the peaks in the estimation curve are related to those of the smallest value of the observations, especially the peaks in 1982 and 1993 (see Figure 4a). For the series of maximum winter flows (Figure  4b), the greatest decreases were observed in 1977 and 2000. These were due to the change in the distribution type from IG or LN into GE. Moreover, in 2005-2006, there was a two-step increase in the 1% quantile estimates. Since 2000, the GE was the best fit distribution among the seven competing models, the jump was due to a change in the parameters of the current (GE) distribution with the addition of observations from 2005 and 2006. Interestingly, the largest shifts in the estimation curve (i.e., in 1977 and 2000) were downhill and were associated with those of the highest observation values (see Figure 4b).
The above results of the best matching distributions for the summer and winter maximum flows are consistent with the LCv-LCs graph (Figure 2b) for the red and blue points of number 17, respectively. Moreover, the variability of the seasonal 1% quantile estimates (Figure 4) broadly overlapped with the earlier studies [55] and with the latest studies [28]. In general, the summer peak flows were identified as stationary, while the winter peak flows were decreasing.
Significant differences in the estimates of hydrological design values in subsequent years are a big problem for designers of hydrotechnical structures and engineers. As hydrological design investments last several years, the projects require multiple revisions, resulting in high financial costs. To minimize the effect of unstable design values, the method of aggregation (see Section 3.2) can be used to assess the quantile of the peak flows.

Case Study of Model Aggregation
Characteristic values of the aggregation method calculated for seasonal maximum flows at the Proszówki gauging station on the Raba River from the period 1951-2016 are shown in Table 4 for seven candidate distributions. For the summer season, the lowest AIC value corresponding to the GE distribution represents the best fit of this distribution to the data series from all candidate distributions. This results in the largest share (i.e., the highest weight) of the quantile of GE distributionQ max1% (GE) in the aggregated quantile Q max1% and is equal to 0.352. The second distribution in order was Ga with a slightly smaller weight equal to 0.31. Hence, a contribution of both the GE and Ga distributions to the aggregated quantile was similarly high. Significantly smaller weights had the quantiles of We, LN, and LL distributions, which are 0.125, 0.112, and 0.082, respectively. The other two PDFs (i.e., IG and LG) yielded a much smaller contribution to the aggregated quantile, especially LG, since its weight is of the order of zero (it is 7E-06 exactly).
Regardless of the matching criterion, the distribution matches were less pronounced when the observation series was shorter. For instance, for the 40-element series from the period 1951-1990, the weights of candidate distributions are in sequence: Ga-0.219, We-0.154, GE-0.230, IG-0.126, LN-0.195, LL-0.073, and LG-0.003. Thus, the leaders have lower weights and the least fit distributions have higher weights, compared to the whole observation period.
For the winter season, the GE distribution was clearly dominant as the best fit to the data and the next was LN. The smallest weights corresponded to the We and LG distributions and the latter had an exact weight equal to 14E-05.
Due to the above considerations, the question arises as to how the poorly matched distributions affect the value of an aggregate quantile. Is it more sensible to take into account the poorly matching distributions, or to use selection and reject the distribution (or distributions) with the weakest fit and not include them in the calculation of the aggregate quantile? The analysis of the above issue is presented in the next subsection.

Impact of Competitive Distributions on Aggregated Quantiles
For the analyzed station, the 1% quantile estimates obtained from the classical method of selecting the best fitted distribution (using the MLM estimation method and AIC for the selection of the best matched distribution) were compared with the estimates from the aggregation method with several variants of the distribution sets. The results are illustrated in Figure 6. not include them in the calculation of the aggregate quantile? The analysis of the above issue is presented in the next subsection.

Impact of Competitive Distributions on Aggregated Quantiles
For the analyzed station, the 1% quantile estimates obtained from the classical method of selecting the best fitted distribution (using the MLM estimation method and AIC for the selection of the best matched distribution) were compared with the estimates from the aggregation method with several variants of the distribution sets. The results are illustrated in Figure 6.
(a) (b) Figure 6. The estimates of the 1% quantile of (a) summer and (b) winter peak flows for the Proszówki gauging station on the Raba River obtained by the method of selecting the best fit distribution and by the aggregation method with variants of the candidate distributions.
In comparison with the classical approach to flood quantile estimation (black line with dots in Figure 6), the original variant of the aggregation method (i.e., when all alternative distributions are taken into account (red line)) significantly reduced the variability of the 1% quantile estimate when extending the observation series for subsequent years. For both the summer and winter seasons, the jumps at the quantile estimates were smoothed out. When the least fitted distribution (i.e., the one with the lowest weight) was removed from the group of candidate distributions, then the estimate value of the 1% quantile decreased for a given observation length (blue line). This was due to the fact that for each length of the measurement series of the maximum flows for both seasons, the distribution of the least fit was the LG distribution, which provided much higher estimates of the design values than the other alternative distributions (see Table 1). After the removal of the second worst-fit distribution (green line), the estimates of % still decreased in the case of the summer season, while they slightly increased in the winter season. In the case of the maximum summer flows, the second least fit distribution is the LL until 1998 and since 1999 it is the IG distribution. Meanwhile, in the case of the maximum winter flows, the second weakest matching distribution was We. It is worth noting that the 1% quantile estimates were almost identical for all three variants of aggregation method (i.e., for all three groups of candidate distributions, for the summer peak flow series of more than 49 elements, that is, for the series ending in 1999 or later (Figure 6a). In this case, for the long observation series, the least-fitting distributions (i.e., LG and IG) had very small weights of zero order, so their removal from the set of alternative distributions did not significantly affect the estimation value of % . A similar effect was observed for the same period (from 1999) of the winter peak flows, but only for two variants of aggregation, when all seven candidate distributions were considered and when the least fit PDF was excluded from aggregation ( Figure 6b). Excluding two least fit distributions caused a slight increase of % .

Conclusions
In this paper, the classical and alternative approaches to the estimation of the upper quantiles of the probability distribution of maximum seasonal flows were presented for the Proszówki gauging In comparison with the classical approach to flood quantile estimation (black line with dots in Figure 6), the original variant of the aggregation method (i.e., when all alternative distributions are taken into account (red line)) significantly reduced the variability of the 1% quantile estimate when extending the observation series for subsequent years. For both the summer and winter seasons, the jumps at the quantile estimates were smoothed out. When the least fitted distribution (i.e., the one with the lowest weight) was removed from the group of candidate distributions, then the estimate value of the 1% quantile decreased for a given observation length (blue line). This was due to the fact that for each length of the measurement series of the maximum flows for both seasons, the distribution of the least fit was the LG distribution, which provided much higher estimates of the design values than the other alternative distributions (see Table 1). After the removal of the second worst-fit distribution (green line), the estimates of Q max1% still decreased in the case of the summer season, while they slightly increased in the winter season. In the case of the maximum summer flows, the second least fit distribution is the LL until 1998 and since 1999 it is the IG distribution. Meanwhile, in the case of the maximum winter flows, the second weakest matching distribution was We. It is worth noting that the 1% quantile estimates were almost identical for all three variants of aggregation method (i.e., for all three groups of candidate distributions, for the summer peak flow series of more than 49 elements, that is, for the series ending in 1999 or later (Figure 6a). In this case, for the long observation series, the least-fitting distributions (i.e., LG and IG) had very small weights of zero order, so their removal from the set of alternative distributions did not significantly affect the estimation value of Q max1% . A similar effect was observed for the same period (from 1999) of the winter peak flows, but only for two variants of aggregation, when all seven candidate distributions were considered and when the least fit PDF was excluded from aggregation ( Figure 6b). Excluding two least fit distributions caused a slight increase of Q max1% .

Conclusions
In this paper, the classical and alternative approaches to the estimation of the upper quantiles of the probability distribution of maximum seasonal flows were presented for the Proszówki gauging station on the Raba River. Despite the multiplicity of probability distributions that have been proposed for flood frequency modeling, the analysis of Polish data of seasonal maximum flows from 37 gauging stations shows that new models are still desirable. The classical approach based on the selection of the best fitting distribution among candidate distributions reveals that the inverse Gaussian and generalized exponential distributions, which are not commonly applied in standard flood frequency modeling, occupy leading positions in the fit to the seasonal peak flows of Polish rivers. At the same time, the heavy tailed distributions turned out to not be suitable for the description of seasonal maximum flows for Polish rivers, in particular the log-Gumbel distribution (i.e., two-parameter equivalent of the GEV distribution). Generally, the choice of the best model and consequently the estimation of the hydrological design value (e.g., 1% quantile) strongly depends on the estimation method and the model selection procedure, which is characteristic of hydrological sample sizes. As a consequence, the development of flood frequency analysis methods by improving statistical techniques using new probability distributions, estimation methods, and model selection procedures, increases the uncertainty of flood quantile estimates.
When the ML method and model selection procedure based on AIC criterion were used for year to year successive evaluation of design quantiles, then their sudden upward or downward jumps can occur. This is not explainable by the changes of the flood generating processes themselves, but results only from the properties of the procedure applied. Our long-term research and experience in the field of FFA and design quantile assessment confirm the occurrence of sudden changes in the design quantiles in most of the analyzed cases. In the meantime, both researchers and practitioners like hydrological designers desire one specific design value with the least uncertainty. The solution may be a method of the aggregation of models, providing quantile estimates, and thus design values that are much more stable by extending the data series than the classical statistical methods used in FFA. The problem yet to be solved is the objective selection of models for aggregation. As was shown, the poorly suited models (with low weights) may significantly affect the quantile values while used in aggregation, especially in the range where the classical FFA shows high variability of quantile estimates. It is difficult to impose a strict limit on the weights as a selection criterion. In practice, hydrologists can support their decisions through the analysis of neighboring stations on the same river or in all region to narrow or enlarge the list of potential best distributions or to justify their use. However, it should be stressed that by aggregating the models, we take advantage of the wider information provided by several best fitted models.
The application of the method has utility in many areas, especially in the natural sciences, when performing experiments of the passive type (i.e., not being able to control all of the parameters of the experiment or the experiment cannot be repeated under the same, fixed conditions.