A New Empirical Approach to Calculating Flood Frequency in Ungauged Catchments : A Case Study of the Upper Vistula Basin , Poland

The aim of the work was to develop a new empirical model for calculating the peak annual flows of a given frequency of occurrence (QT) in the ungauged catchments of the upper Vistula basin in Poland. The approach to the regionalization of the catchment and the selection of the optimal form of the empirical model are indicated as a novelty of the proposed research. The research was carried out on the basis of observation series of peak annual flows (Qmax) for 41 catchments. The analysis was performed in the following steps: statistical verification of data; estimation of Qmax flows using kernel density estimation; determination of physiographic and meteorological characteristics affecting the Qmax flow volume; determination of the value of dimensionless quantiles for QT flow calculation in the upper Vistula basin; verification of the determined correlation for the calculation of QT flows in the upper Vistula basin. Based on the research we conducted, we found that the following factors have the greatest impact on the formation of flood flows in the upper Vistula basin: the size of catchment area; the height difference in the catchment area; the density of the river network; the soil imperviousness index; and the volume of normal annual precipitation. The verification procedure that we performed made it possible to conclude that the developed empirical model functions correctly.


Introduction
One of the tasks of engineering hydrology is to determine the quantiles of peak annual flows with a certain exceedance probability (Q T ).These values constitute an important characteristic of the hydrological regime of rivers.The correct determination of these quantities has practical implications for designing hydraulic structures, for defining flood risk zones, and for certain aspects of effective water management throughout the catchment [1,2].
To determine the Q T value, statistical methods are used, based on the density of continuous random variable, i.e., probability density functions such as: Pearson III, log-normal, Gumbel, log-Pearson III, and others [3][4][5].To make Q T predictions using statistical methods, access to historical observations of peak annual flows (Q max ) is required.These observations constitute an important source of information on the course of extreme flows over the centuries [6].However, hydrometric observations, including adequately long sequences of Q max values, are not always available for specific catchments (ungauged catchments)-a fact which precludes the use of statistical methods.Furthermore, currently available hydrometric data may not reflect the current Water 2019, 11, 601 2 of 21 physiographic conditions-such as land use in the catchment area-or the current meteorological conditions therein [7].The reliability of hydrometric observations may also present a problem.The Q max flows can be burdened with significant errors, for instance related to the extrapolation of the flow curve.Therefore, for ungauged catchments, so-called regional methods for the frequency of occurrence of peak flows are used, based on the correlation between the physiographic and meteorological characteristics of the catchment and the flood flows.This correlation is usually described by multiple regression equations [8][9][10].
The key stage in the development of regional methods for determining the occurrence of peak flows is catchment regionalization.This leads to obtaining homogeneous groups in terms of the impact of physiographic and meteorological factors of the catchment on flood flows therein.The methods of catchment regionalization, which are commonly used in hydrology, include the L-moments estimation method proposed by Hosking and Wallis as well as cluster analysis [11][12][13].However, it should be noted that these methods have certain drawbacks.For the L-moments estimation method, a distribution function or a quantile function must exist in an analytical form, which is not always possible.Additionally, it is necessary to use the sample in the form of a distribution series, which may also not always be possible [14].In the case of cluster analysis, the biggest problem is the adoption of the so-called cut-off point, which is decisive for the number of homogeneous groups.
The peak annual flows with a defined frequency of occurrence constitute the characteristics whose variable course is directly related to climate change.According to Hirabayashi et al. [15], events related to the occurrence and the course of floods will be more frequent and more intense, along with the changing climate.Therefore, in order to predict the risks associated with the occurrence of Q max flows, analyses based on interrelated climate and hydrological models are increasingly employed [16].This also applies to regional methods for estimating Q T in ungauged catchments.Such models should be verified and updated periodically, which is related to the changeability of the natural mechanism that shapes the course of flood flows.
The empirical models for calculating Q T currently used in Poland were developed in the 1980s.Bearing in mind the ongoing climate changes and the land use within the catchment areas, their application in the current form may raise justifiable reservations.Therefore, the goal of this paper is to develop a new empirical model for calculating Q T flows in the upper Vistula catchments within Poland.The choice of this particular region was dictated by the fact that due to the morphoclimatic conditions prevailing therein, it is the most flood-prone area in all Poland [17].As a novelty in the conducted research, an approach to the regionalization of the studied catchments is proposed.Until now, in Poland, such analyses have been conducted on the basis of grouping the catchments with respect to their geographical location or using methods of multidimensional statistical analysis.In this work, kernel density estimation was used for the purpose.In addition, the selection of variables for the model was based on the sensitivity of the fit measures and on substantive verification, rather than the stepwise regression applied previously in the Polish context.
soils: soils originating from medium and heavy tills, cherozemic soils and alfisols derived from clay loams and silt loams, soils derived from loams of different origin, soils derived from silts of different origin, as well as soils derived from silts, clays, and loams.In the case of the non-Carpathian catchments, substrates formed by medium permeable soils predominate: chernozems and chernozemic soils, sands and loamy and sands, soils made of loess, loess formations, clayey sands and light tills, as well as low-moor, high-moor, and transitional peats.In the studied Carpathian catchments, the main land cover is that of woodland and semi-natural ecosystems (on average, 55%), as well as arable land (on average, 39%).In turn, urbanized areas occupy, on average, 5% of the studied catchment area.The remaining part of the areas (1%) comprises wetlands and bodies of water.In the non-Carpathian catchments, arable land occupies on average 50% of the catchment area, and woodland occupies 45%.Urbanized areas constitute 4% of the catchment area on average, while 1% is covered by wetlands and bodies of water.The upper Vistula basin constitutes 25% of the total area of the basin's catchment and about 15% of Poland's area.It is subdivided into three main physiographic units: Carpathian mountains, highlands, and plains.The research area varies in height, which is reflected in the mean annual sum of atmospheric precipitation.It ranges from 580 mm for the plains, up to 1540 mm for mountain catchments [18,19].The research catchments adopted for analysis range in terms of their surface areas, from 23.39 km 2 to 865.03 km 2 .The soil of the Carpathian basin is dominated by the impermeable soils: soils originating from medium and heavy tills, cherozemic soils and alfisols derived from clay loams and silt loams, soils derived from loams of different origin, soils derived from silts of different origin, as well as soils derived from silts, clays, and loams.In the case of the non-Carpathian catchments, substrates formed by medium permeable soils predominate: chernozems and chernozemic soils, sands and loamy and sands, soils made of loess, loess formations, clayey sands and light tills, as well as low-moor, high-moor, and transitional peats.In the studied Carpathian catchments, the main land cover is that of woodland and semi-natural ecosystems (on average, 55%), as well as arable land (on average, 39%).In turn, urbanized areas occupy, on average, 5% of the studied catchment area.The remaining part of the areas (1%) comprises wetlands and bodies of water.In the non-Carpathian catchments, arable land occupies on average 50% of the catchment area, and woodland occupies 45%.Urbanized areas constitute 4% of the catchment area on average, while 1% is covered by wetlands and bodies of water.

Materials and Methods
The purpose of the work was accomplished based on the observation series of Q max flows in selected research catchments of the upper Vistula basin.The data covering the years 1971-2015 were obtained from the Institute of Meteorology and Water Management of the National Research Institute in Warsaw.Based on acquired hydrometric observations, the following tests were performed: statistical verification of Q max flow observation series, estimation of Q max distribution using kernel density estimation, determination of physiographic and meteorological characteristics affecting the flow size of Q max , and the determination of dimensionless quantiles for calculating Q T flows in the upper Vistula basin.

Statistical Verification of Data
Statistical verification of the data was performed by assessing the significance of the trend of the observation series of peak annual flows using the Mann-Kendall test.The zero hypothesis of the test (H 0 ) assumes that there is no monotonic trend of the data, while the alternative hypothesis (H 1 ) states that such a trend does exist.The calculations were carried out for the significance level of α = 0.05.The Mann-Kendall S statistic is determined based on the following formula [20]: where: where: n-number of elements of the time series The normalised statistic Z calculated according to the formula: where: Var(S)-variance of S, derived from the equation: If the value of the normalised Z statistic is less than the critical Z crit value for the significance level of α = 0.05 (1.96) then the H 0 hypothesis is acceptable.Otherwise, the H 0 hypothesis is rejected in favour of the alternative.Catchments showing a statistically significant trend in the Q max observation series were excluded from further analysis.

Assessment of Peak Flow Distributions Using Kernel Density Estimation
On the basis of kernel density estimation, a direct estimation of the function of peak flows was performed, which made it possible to evaluate the modality of the function for the studied random variables.In the case of obtaining the unimodal distribution density function, it was found that the studied area is homogeneous in terms of the formation of flood flows.Estimators were determined according to the following correlation [21]: Water 2019, 11, 601 5 of 21 where: n-sample size; h-smoothing parameter, i.e., the so-called bandwidth; K-kernel density estimate; X i -sample element t.
Bandwidth h was determined according to the Silverman method [22].Kernel density estimate K was adopted as the Gaussian kernel [23].

Determination of Physiographic and Meteorological Characteristics Affecting the Formation of Peak Flows
Determination of the impact of physiographic and meteorological characteristics of the catchment on the formation of peak annual flows in the upper Vistula basin was aimed at building a model for estimating the size of the variable representing peak flows in ungauged catchments of this water region.The Q med flow (median of the observation series of Q max ) was assumed as an independent variable due to the resistance to single, extremely high flows occurring in the observation series [24].The analysed catchment characteristics applicable to the construction of the empirical Q med model are presented in Table 1.

Table 1.
Physiographic and meteorological characteristics of the catchment applicable to the construction of the empirical Q med model.Based on the values of individual physiographic and meteorological characteristics of the catchment, correlation matrices were determined in order to enable initial selection of predictors to the formulas allowing estimation of Q med in ungauged catchments of the entire upper Vistula basin.A multiple regression was used to build the model, the linear form of which is expressed by the following equation [25]:

Type of the
where: Y-dependent variable; a-regression constant (intercept); The obtained form of the model for calculating Q med flows in the ungauged catchments of the upper Vistula basin was verified in three stages: substantive, statistical, and against independent research material.By way of substantive verification, the so-called logic of the model was checked through the analysis of the correctness of regression coefficients' signs.This was aimed at determining whether the model meets the prearranged expectations, and checking the model's compliance with the assumptions that were the basis for the determination of that specific formula.Statistical verification of the model was carried out for the significance level of α = 0.05.It consisted of checking whether the following assumptions were met, regarding the significance of regression equation, the significance of partial regression coefficients, the evaluation of redundancy between independent variables, the verification of homoscedasticity of residues (residual scattering analysis), the residual autocorrelation study (using Durbin-Watson statistics), the normality of residual distribution, and the estimation of the expected value of the random component.Verification against the independent research material consisted of determining, by means of fixed forms of equations, the Q med values in the catchments not included in the structure of the analysed models, and making a comparison between the observed and the calculated Q med .
The analysis of the uncertainty of designated model forms for estimating Q med flows in the upper Vistula basin was made by specifying the range of forecast (prediction).The calculations, with an assumed significance level of α = 0.05, were computed based on the following formula [26]: where: Ŷp -predictable value of the dependent variable; t kryt -student t statistic with n -2 degrees of freedom; b. s.-standard error in matching, determined using the following formula: where: MS Res -square root of the model's residuals; X 0 -vector of independent variables; X-matrix of independent variables adopted in the model's structure.

Determination of the Values of Dimensionless Quantiles for the Calculation of Peak Annual Flows with a Defined Frequency of Occurrence
Determination of dimensionless quantile values to calculate Q T in the catchments of the upper Vistula basin was conducted in two stages, first by determining the recommended statistical distribution for Q T estimation, and second by determining dimensionless probability curves for the upper Vistula basin.The Q T values were estimated using the statistical distributions recommended in Poland: Pearson type III, Weibull's, and log-normal, based on the following formulae [27]: Pearson III distribution: Weibull distribution: Log-normal distribution: where: Water 2019, 11, 601 7 of 21 ε-lower sequence boundary; λ, β-shape parameters; α-scale parameter; µ, σ-log-normal distribution parameters; p-exceedance probability; u p -quantile of order p.
The lower sequence boundary of ε was determined graphically, whereas the parameters of distributions were determined using the maximum likelihood estimation method.The conformity assessment of the probability distributions function with the empirical distribution of the peak annual flows was conducted using the Kolmogorov test for the significance level of α = 0.05.The selection of the theoretical function with the best fit with the empirical distribution of the peak annual flows was made using the Akaike Information Criterion (AIC), based on the following correlations [28]: where: -the logarithm of the likelihood function; k-the number of estimated parameters.To determine the recommended statistical distribution for the estimation of Q T in the upper Vistula basin, the ranking method was used.The designated values of the AIC criterion were given ranks from 1 to 3, where 1 is the best fit, and 3 is the poorest fit between the theoretical distribution and the empirical distribution of the random variable in the given catchment.As a recommended function for the estimation of the Q T quantiles in the upper Vistula basin, a distribution was assumed with the lowest rank value in relation to the entire water region covered by the study.
The determination of the dimensionless probability curve for estimating Q T quantiles in the upper Vistula basin was based on the method proposed by Stachý and Fal [29], in which regional curves are estimated as arithmetic means of the dimensionless quantiles of probability distribution curves, thus arriving at the following: where: k-the number of catchments being tested.
Having determined the dimensionless curve of probability distribution for the whole river basin, we have examined the extent to which, for each studied catchment, the curves remained within the confidence interval determined for the dimensionless curve encompassing the upper Vistula water region.Since, in practice, when determining the Q T volume, it is not so much the confidence interval that is of interest, but rather its upper boundary, the verification of the dimensionless curve was based on the upper boundary Q T of the unilateral 84% interval confidence for the actual peak flows Q T .As stated in the work by Stachý and Fal [29], the verification of dimensionless curves is carried out on the basis of quantile values from 1 to 10%.Thus, in the present work the testing included dimensionless quantile values of Q 100 /Q med and Q 10 /Q med .

Verification of the Determined Correlation for Estimating the Quantiles of the Peak Annual Flows with a Given Frequency of Occurrence
As a complement to the conducted research, we performed the verification of the established empirical correlation for calculating quantiles of Q T flows against the currently used empirical formulas in the upper Vistula river basin: the Punzet formula and the spatial regression equation.It consisted of the determination of Q T recommended by the statistical distribution and the empirical models, as well as in the determination of the mean absolute percentage error (MAPE) of estimating Q T quantiles with empirical formulas in relation to the statistical method.The Punzet formula and the spatial regression equation are described with the following correlations [30]: Punzet Formula: where: ϕ T -a function dependent on the probability (-); Q 2 -peak flow with return period of T = 2 years.The function dependent on the probability ϕ T was calculated as: where: t p -quantile in a standardized normal distribution (-); c vmax -variation coefficient (-).Peak flow with return period of T = 2 was determined according to the following formulas: for mountain catchments: for upland catchments: for flatland catchments: where: A-catchment area (km 2 ); P-mean annual precipitation (mm); N-soil imperviousness index (%); I-river slope indicator ( ).Spatial equation regression: where: λ p% -quantile established for the dimensionless curves of regional peak flows; Q 100 -peak flow with return period = 100 years which is determined according to following formula: where: α-regional parameter (-); A-catchment area (km 2 ); H 100 -annual maximum daily with return period T = 100 years (mm); Φ-runoff coefficient (-); I r -slope of the watercourse in ( ); Ψ-mean slope of the catchment ( ); JEZ-lake index (quotient of the total lakes area in the catchment to the total catchment area) (-); B-swamp index (quotient of the total swamps area in the catchment to the total catchment area) (-).
Water 2019, 11, 601 9 of 21 Mean absolute percentage error for quantiles Q T was computed from the formula [31,32]: where: N-number of observations; Q T -peak flow of a determined frequency of occurrence, computed using statistical method (m 3 •s −1 ); Q T e -peak flow of a determined frequency of occurrence, computed using the analysed empirical model (m 3 •s −1 ).

Statistical Verification of Data
Taking into account the increasing frequency of human interference in the natural water environment, which is affecting changes in the river regime, research into the invariance of hydrological conditions in the studied catchments is necessary for the considered measurement period.Therefore, statistical verification of the Q max flow observation series versus the homogeneity and independence of data was carried out, using the Mann-Kendall test to examine the significance of the trend.The results of the analysis are presented in Figure 2.
Water 2019, 11, x FOR PEER REVIEW 9 of 21 period.Therefore, statistical verification of the Qmax flow observation series versus the homogeneity and independence of data was carried out, using the Mann-Kendall test to examine the significance of the trend.The results of the analysis are presented in Figure 2.
Based on the obtained results, it was found that the majority of the studied rivers did not show statistically significant trends of the Qmax flows.This is evidenced by the size of normalized statistics |Z|, for which most values were lower than the critical value of this test for the significance level of α = 0.05 (Zcrit = 1.96).The following catchment areas constitute exceptions: Bystra-Kamesznica (C_03), Skawica-Zawoja (C_07), Stryszawka-Sucha (C_08), Raba-Rabka (C_10), Kirowa Woda-Kościelisko Kiry (C_16), Grajcarek-Szczawnica (C_21), San-Dwernik (C_28), and Czarny-Polana (C_29), for which the values of |Z| are bigger than Zcrit.Such results are attributed to the response of these catchments to the course of heavy rainfall of very strong intensity that occurred in Central and Eastern Europe in 1997 and 2010, causing flash floods in the upper Vistula basin [33].In addition, as stated by Wyżga et al. [34], in recent years in the basin of the upper Vistula there had been changes in land use, which resulted in the modified occurrence of floods.For the remaining catchments, there were no statistically significant trends observed.This means that the studied variables are independent and that they derive from the same general population.Therefore, in the analysed multi-year period, no factor has appeared that would significantly affect the course of processes shaping flood flows from these catchments.Similar research results related to the analysis of changes in the flood flows from the catchments of the upper Vistula river basin are presented in the papers [35,36], where in the majority of the studied cases there were also no statistically significant trends found in the observation series of flood flows in the upper Vistula basin.Bearing in mind that the observation series adopted for further analysis should meet the requirements of a simple random sample, the following catchments were excluded from further research: Bystra-Kamesznica, Skawica-Zawoja, Stryszawka-Sucha, Raba-Rabka, and Grajcarek-Szczawnica. On the other hand, catchments where a slight deviation from the assumed Zcrit was recorded were included in further analyses.

Estimation of the Distribution of Peak Annual Flows Using Kernel Estimates
In the present study, the estimation of the distribution of the density function in its empirical form was made for an observation series comprised of the Qmed values for the catchments, which were accepted for further analysis after the statistical verification.In the cases where the distribution Based on the obtained results, it was found that the majority of the studied rivers did not show statistically significant trends of the Q max flows.This is evidenced by the size of normalized statistics |Z|, for which most values were lower than the critical value of this test for the significance level of α = 0.05 (Z crit = 1.96).The following catchment areas constitute exceptions: Bystra-Kamesznica (C_03), Skawica-Zawoja (C_07), Stryszawka-Sucha (C_08), Raba-Rabka (C_10), Kirowa Woda-Kościelisko Kiry (C_16), Grajcarek-Szczawnica (C_21), San-Dwernik (C_28), and Czarny-Polana (C_29), for which the values of |Z| are bigger than Z crit .Such results are attributed to the response of these catchments to the course of heavy rainfall of very strong intensity that occurred in Central and Eastern Europe in 1997 and 2010, causing flash floods in the upper Vistula basin [33].In addition, as stated by Wy żga et al. [34], in recent years in the basin of the upper Vistula there had been changes in land use, which resulted in the modified occurrence of floods.For the remaining catchments, there were no statistically significant trends observed.This means that the studied variables are independent and that they derive from the same general population.Therefore, in the analysed multi-year period, no factor has appeared that would significantly affect the course of processes shaping flood flows from these catchments.
Similar research results related to the analysis of changes in the flood flows from the catchments of the upper Vistula river basin are presented in the papers [35,36], where in the majority of the studied cases there were also no statistically significant trends found in the observation series of flood flows in the upper Vistula basin.Bearing in mind that the observation series adopted for further analysis should meet the requirements of a simple random sample, the following catchments were excluded from further research: Bystra-Kamesznica, Skawica-Zawoja, Stryszawka-Sucha, Raba-Rabka, and Grajcarek-Szczawnica. On the other hand, catchments where a slight deviation from the assumed Z crit was recorded were included in further analyses.

Estimation of the Distribution of Peak Annual Flows Using Kernel Estimates
In the present study, the estimation of the distribution of the density function in its empirical form was made for an observation series comprised of the Q med values for the catchments, which were accepted for further analysis after the statistical verification.In the cases where the distribution showed multimodality, it was possible to conclude about the existence of many subpopulations of the examined feature.The results of calculations are presented in Figure 3. showed multimodality, it was possible to conclude about the existence of many subpopulations of the examined feature.The results of calculations are presented in Figure 3.  as areas with a homogeneous course of the analysed phenomenon.Hence the attempt to build a general form of a multiple regression model for determining Q med throughout the whole area of the upper Vistula basin.However, it should be emphasized that the vast majority of the studied catchments are mountainous in nature, and therefore the course of kernel density function could be under strong pressure of flow-forming characteristics typical of catchments located in such areas.Furthermore, as stated in Santhosh and Srinivas [37], the choice of the method for estimating the smoothing parameter also has a significant impact on the result of kernel density estimation of the density function.An overly low value of the smoothing parameter may cause the estimator to exhibit multimodal features.However, at high values of this parameter, the estimator may be deprived of much information about the functional characteristics of the analysed random variable, which makes it more smooth, while indicating the unimodal distribution of this variable.According to Rutkowska et al. [38], regionalization of the catchment is based on its physiological characteristics, which have the greatest impact on the flood flows from such areas.This requires precise determination of numerical values describing these variables.In the case of using kernel estimates, it is possible to conduct an analysis only for the size of flows, without the necessity to provide any other information.A detailed analysis of the modality of kernel density function makes it possible to determine whether the given regions are homogeneous in terms of shaping the flood flows or not.For this reason, it has a certain advantage over the classical methods used for regionalization.

Determining the Form of the Equation for Calculating the Peak Flows in the Catchments of the Upper Vistula River Basin
The preliminary selection of physiographic and meteorological characteristics describing Q med flows in the upper Vistula basin was made on the basis of the correlation matrix analysis, conducted for the initially determined values of these factors.It should be emphasized that due to the nature of statistical significance, it follows that if a significant number of determinations of correlation coefficients are performed, then statistically significant values may occur relatively frequently.There is no universal way to identify true (actual) correlations.Therefore, all results for which the strength of the correlation relationship is insufficient should be treated with caution.They should be verified in a subjective way, intuitively assessing the impact of these characteristics on the variable under study.With this in mind, final selection was made from the group of predictors (see Table 1) for the construction of the model in its final form: surface area of the catchment A, height difference in the catchment area ∆H, river network density D, arable land index S fr , built-up index S fu , soil imperviousness index N, and annual normal precipitation P. According to Węglarczyk [39], the number of predictors describing the dependent variable should not be overly high.This is due to the fact that each independent variable, in addition to information about the forecasted value, carries with it a certain degree of uncertainty, resulting from the observation series of this particular feature.Hence the need to determine the optimum number of independent variables, based on the quality of the model.Figure 4 summarizes the values of statistics for a given number of independent variables of the analysed formula.
Based on the data presented in Figure 4, it was found that the values of the determination coefficient r 2 increase significantly with the addition of further independent variables to the equation.This results from the very essence of this coefficient, as it is a non-decreasing function of the number of independent variables in multiple regression models.On the other hand, a markedly smaller increase in this characteristic was recorded after taking into account the fifth predictor in the equation.Furthermore, the addition of a sixth independent variable did not provide a significant improvement in the quality of the model.Therefore, as the final configuration of the formula for calculating the Q med flows in the entire upper Vistula basin, a five-parameter form of the exponential equation was adopted: where: A-catchment area (km 2 ); ∆H-height difference in the catchment area (m a.s.l.); D-river network density (km•km −2 ); N-Boldakov's soil imperviousness index (%); P-annual normal precipitation in the catchment (mm).While making the substantive verification of the established model form, it was found that it is logical.This is evidenced by the values of regression coefficients n for the predictors describing particular equations.When analysing Formula ( 22) in detail, it is concluded that the flow of Q med increases with the increase of the catchment's surface area, as well as the height difference of the catchment area, the density of the river network, the value of the soil imperviousness index, and the amount of normal annual precipitation within the catchment.
Statistical verification of the established model forms was made on the basis of the significance of the linear regression of the model, the significance of partial regression coefficients, the evaluation of redundancy between independent variables, the assumption of homoscedasticity of residuals, the lack of autocorrelation of residuals, the normality of distribution of residuals, and the evaluation of the expected value of a random component.Table 2 presents the results concerning the analysis of the significance of the linear regression of the model, and the significance of partial regression coefficients.Statistical verification of the established model forms was made on the basis of the significance of the linear regression of the model, the significance of partial regression coefficients, the evaluation of redundancy between independent variables, the assumption of homoscedasticity of residuals, the lack of autocorrelation of residuals, the normality of distribution of residuals, and the evaluation of the expected value of a random component.Table 2 presents the results concerning the analysis of the significance of the linear regression of the model, and the significance of partial regression coefficients.Based on the values summarized in Table 2, it has been found that the model form for calculating Q med in the entire upper Vistula basin is characterized by a statistically significant value of the F statistic, for which the p-value is less than the assumed significance level of α = 0.05.In turn, statistically significant values of p i partial regression coefficients occur for the catchment area and river network density.Bearing in mind the analysis regarding the determination of the optimum number of predictors in the equations, it was decided that statistically insignificant parameters should be retained, because their removal decreases the quality of the examined models, reflected by a marked decrease in the value of the determination coefficient r 2 .
The evaluation of the redundancy of variables is based on the so-called tolerance factor.In cases when the value of that factor was higher than 0.1, it was concluded that there is no collinearity of independent variables.The results of this analysis are summarized in Table 3.When analysing the values listed in Table 3, it was found that the tolerance for all variables is high (above 0.1).In addition, the values of coefficient r 2 c differ significantly from one.Thus, independent variables do not show redundancy in regression equations, which indicates the lack of their collinearity.Furthermore, the relatively high values of semi-partial correlations in the studied equation forms, for independent variables, indicate relatively high correlations with the dependent variable.
The assumption of constancy of the variance of the random component for individual values of independent variables (homoscedasticity) was verified using the scatter plots.Figure 5  The evaluation of the redundancy of variables is based on the so-called tolerance factor.In cases when the value of that factor was higher than 0.1, it was concluded that there is no collinearity of independent variables.The results of this analysis are summarized in Table 3.When analysing the values listed in Table 3, it was found that the tolerance for all variables is high (above 0.1).In addition, the values of coefficient r 2 c differ significantly from one.Thus, independent variables do not show redundancy in regression equations, which indicates the lack of their collinearity.Furthermore, the relatively high values of semi-partial correlations in the studied equation forms, for independent variables, indicate relatively high correlations with the dependent variable.
The assumption of constancy of the variance of the random component for individual values of independent variables (homoscedasticity) was verified using the scatter plots.Figure 5 is a graph of predicted values relative to residual values.When analysing the values summarized in Figure 5, we noted the lack of heteroscedasticity (violation of the assumption of homoscedasticity) of the random variables being analysed.Points on the graph are arranged in the form of an evenly distributed cloud, and there are no clear systems of the points that form individual groups.Therefore, there is no reason to reject the assumption of When analysing the values summarized in Figure 5, we noted the lack of heteroscedasticity (violation of the assumption of homoscedasticity) of the random variables being analysed.Points on the graph are arranged in the form of an evenly distributed cloud, and there are no clear systems of the points that form individual groups.Therefore, there is no reason to reject the assumption of constancy of the random component variance, for individual independent variables.
To verify the autocorrelation of the residuals of the models, the Durbin-Watson statistics were used.The results of the analysis are summarized in Table 4. Based on the results as seen in Table 4, the hypothesis was adopted that the random elements were not correlated.The normality of the distribution of residues was verified using the normality plot.Figure 6 presents a chart of nominal (expected) values relative to residual values obtained by applying the tested form of the empirical model.Based on the normality plot of the residuals, it was found that for the analysed equation, most points are arranged along a straight line.Hence the inference that in these cases the distribution of residues is consistent with the normal distribution.To verify the autocorrelation of the residuals of the models, the Durbin-Watson statistics were used.The results of the analysis are summarized in Table 4. Based on the results as seen in Table 4, the hypothesis was adopted that the random elements were not correlated.The normality of the distribution of residues was verified using the normality plot.The verification of the assumption about the zero value of the expected random component εi was made based on the analysis of average residuals for the studied forms of equations.The results are summarized in Table 5.The verification of the assumption about the zero value of the expected random component ε i was made based on the analysis of average residuals for the studied forms of equations.The results are summarized in Table 5.Based on the results summarized in Table 5, it was found that the average values of the residuals for the developed model are 0; therefore, the hypothesis with a zero value for the random component ε i is true.This means that the distortions (random components) do not show any tendency of the empirical values of the dependent variable deviating from the theoretical values in any direction (either plus or minus).
Verification of the determined correlation for the forecast of Q med flows in the catchments of the upper Vistula basin was made on of independent hydrometric material for the following catchments: Przemsza-Piwo ń, Skawinka-Radziszów (non-Carpathian catchments) and Stradomka-Stradomka, Niedziczanka-Niedzica, Jasiołka-Zboiska (Carpathian catchments).Additionally, the confidence interval was estimated by applying the Formulas ( 6) and (7), for the significance level of α = 0.05.The results are shown in Table 6.Table 6.Values Q med and confidence intervals for the values obtained from the adopted empirical model.

River-Profile
Lower Boundary of Confidence Interval (m 3  Q med -median of annual peak flows, determined on the basis of the observation series; Q medp -flow calculated according to Formula (22).
Based on the results summarized in Table 6, it was found that the obtained form of the empirical model produces satisfying results.This is evidenced by the small differences between Q med and Q med p .Therefore, it is recommended that Formula (22) be used in the ungauged basins of the upper Vistula river basin.This will eliminate the problem related to the choice of the appropriate regional equation if the river flows through several physiographic regions, and above all, through both Carpathians and non-Carpathian areas.Such rivers may demonstrate characteristics acquired in the upper course, even though their water gauge profile is far beyond the region's reach.Regarding the analysis we have conducted, concerning the determination of the lower and upper boundaries of the confidence interval for the determined form of the empirical model, it can be stated that for the confidence level of 95%, the predicted Q med values remain within the range described by Equation (6).

Determination of Dimensionless Quantiles' Values for the Calculation of Peak Annual Flows with a Defined Frequency of Occurrence
Determination of the values of dimensionless µ T quantiles was meant to facilitate the determination of Q T flows, based on Formula (22).To determine the quantile values of µ T , firstly, the best-fit probability distribution function to calculate the Q T was indicated.Then the statistical distributions recommended in Poland were subjected to analysis: Pearson type III (PIII), Weibull, and log-normal.Figure 7 presents Q 100 values determined by the studied probability distributions.
Based on the recommended statistical distribution, a non-dimensional probability curve was determined (see Figure 8).The curve was verified based on the results summarized in Figure 9. Verification of the non-dimensional probability curve, subject to log-normal distribution, produced satisfactory results.In the total number of 36 tested catchments of the entire upper Vistula basin, the Q 10 /Q med quantile was outside the upper boundary of the confidence interval 4 times (11%), and the Q 100 /Q med quantile, 6 times (17%).According to the definition of the upper boundary at 84% of the confidence interval, for 36 cases outside this limit, there may be 5 observations (16% of 36 cases).With a small number of observations, such a result can be considered acceptable.Therefore, the log-normal distribution was assumed as the basis for determining Q T quantiles using the determined empirical correlation.Based on the recommended statistical distribution, a non-dimensional probability curve was determined (see Figure 8).The curve was verified based on the results summarized in Figure 9. Verification of the non-dimensional probability curve, subject to log-normal distribution, produced satisfactory results.In the total number of 36 tested catchments of the entire upper Vistula basin, the Q10/Qmed quantile was outside the upper boundary of the confidence interval 4 times (11%), and the Q100/Qmed quantile, 6 times (17%).According to the definition of the upper boundary at 84% of the confidence interval, for 36 cases outside this limit, there may be 5 observations (16% of 36 cases).With a small number of observations, such a result can be considered acceptable.Therefore, the log-normal distribution was assumed as the basis for determining QT quantiles using the determined empirical correlation.Based on the recommended statistical distribution, a non-dimensional probability curve was determined (see Figure 8).The curve was verified based on the results summarized in Figure 9. Verification of the non-dimensional probability curve, subject to log-normal distribution, produced satisfactory results.In the total number of 36 tested catchments of the entire upper Vistula basin, the Q10/Qmed quantile was outside the upper boundary of the confidence interval 4 times (11%), and the Q100/Qmed quantile, 6 times (17%).According to the definition of the upper boundary at 84% of the confidence interval, for 36 cases outside this limit, there may be 5 observations (16% of 36 cases).With a small number of observations, such a result can be considered acceptable.Therefore, the log-normal distribution was assumed as the basis for determining QT quantiles using the determined empirical correlation.Bearing in mind the calculations we have carried out, the final form of the empirical model for estimating Q T flows in the catchments of the ungauged upper Vistula basin was obtained as follows: where: Q med -median annual flow, determined by Formula ( 22) (m 3 •s −1 ); µ T -dimensionless value of distribution quantile for the assumed frequency of occurrence, taken from Figure 8 (-).
Thus, the developed empirical formula is recommended for use in catchments whose surface areas range from 50 to 600 km 2 .
As a complement to the conducted research, verification of the established Formula ( 23) for estimating Q T quantiles was performed against the currently used empirical formulas in the upper Vistula basin: Punzet's and the spatial equation of regression (SER).The results of the verification are presented in Figure 10.Based on the obtained results, it was found that compared to the Punzet and SER formula, the values obtained with Formula (23) present the lowest MAPE value for each Q T quantile.Standard error of estimating Q T using the Punzet formula is 46%; when using the area regression equation, it is 39%, and when using Formula (23) it is 21%.Therefore, it is concluded that the developed equation can be a viable alternative to the currently used empirical formulas for calculating Q T in ungauged catchments of the upper Vistula basin.Bearing in mind the calculations we have carried out, the final form of the empirical model for estimating QT flows in the catchments of the ungauged upper Vistula basin was obtained as follows: where: Qmed-median annual flow, determined by Formula ( 22) (m 3 •s −1 ); μT-dimensionless value of distribution quantile for the assumed frequency of occurrence, taken from Figure 8 (-).
Thus, the developed empirical formula is recommended for use in catchments whose surface areas range from 50 to 600 km 2 .
As a complement to the conducted research, verification of the established formula ( 23) for estimating QT quantiles was performed against the currently used empirical formulas in the upper Vistula basin: Punzet's and the spatial equation of regression (SER).The results of the verification are presented in Figure 10.Based on the obtained results, it was found that compared to the Punzet and SER formula, the values obtained with Formula (23) present the lowest MAPE value for each QT quantile.Standard error of estimating QT using the Punzet formula is 46%; when using the area regression equation, it is 39%, and when using formula (23) it is 21%.Therefore, it is concluded that the developed equation can be a viable alternative to the currently used empirical formulas for calculating QT in ungauged catchments of the upper Vistula basin.

Conclusions
The aim of the work was to determine the form of a new empirical model for estimating quantiles of peak annual flows with a defined frequency of occurrence in ungauged catchments of the upper Vistula basin.Based on the research we conducted, it was found that in the majority of the catchments there are no statistically significant trends of peak annual flows.This is evidenced by the results of the analysis carried out with the application of the Mann-Kendall test, confirming the invariability of hydrological conditions and the stationarity of the characteristics affecting the volume of flood flows (values of Z statistics from the Mann-Kendall test for the analysed time series below 1.96).Kernel estimation of the distribution function of median flows in the upper Vistula basin clearly indicated the unimodal character of the empirical distribution function, which may indicate homogeneous conditions affecting the flood flows in the analysed multi-year period, in all of the studied catchments.Since the application of this method requires knowledge of only the factor for which calculations are made; hence, it may compete with other commonly used methods of regionalization.Based on the computations conducted in the study, it was demonstrated that the

Conclusions
The aim of the work was to determine the form of a new empirical model for estimating quantiles of peak annual flows with a defined frequency of occurrence in ungauged catchments of the upper Vistula basin.Based on the research we conducted, it was found that in the majority of the catchments there are no statistically significant trends of peak annual flows.This is evidenced by the results of the analysis carried out with the application of the Mann-Kendall test, confirming the invariability of hydrological conditions and the stationarity of the characteristics affecting the volume of flood flows (values of Z statistics from the Mann-Kendall test for the analysed time series below 1.96).Kernel estimation of the distribution function of median flows in the upper Vistula basin clearly indicated the unimodal character of the empirical distribution function, which may indicate homogeneous conditions affecting the flood flows in the analysed multi-year period, in all of the studied catchments.Since the application of this method requires knowledge of only the factor for which calculations are made; hence, it may compete with other commonly used methods of regionalization.Based on the computations conducted in the study, it was demonstrated that the course of floods in the upper Vistula basin is most influenced by such factors as: surface area of the catchment, height difference in the catchment area, river network density, imperviousness of the soil, and normal annual precipitation.Based on the results obtained using the AIC criterion, it was found that among the probability distribution types tested for Q T calculation in the upper Vistula basin, the empirical Q max sequences were approximated best by the log-normal distribution.Verification of the established correlation for Q T estimation in the upper Vistula basin showed that the formula functions properly, as evidenced by the MAPE values (standard error of Q T estimating was 21% while for currently used empirical formulas in upper Vistula basin: Punzet and SER it was 46 and 39% respectively).The determined form of the empirical equation finds application in the entire upper Vistula basin, for the catchments with a surface area of 50 to 600 km 2 .

Figure 1 .
Figure 1.Location of studied catchment areas in upper Vistula basin, set against digital elevation model.
purpose of the work was accomplished based on the observation series of Qmax flows in selected research catchments of the upper Vistula basin.The data covering the years 1971-2015 were obtained from the Institute of Meteorology and Water Management of the National Research Institute

Figure 1 .
Figure 1.Location of studied catchment areas in upper Vistula basin, set against digital elevation model.

Figure 2 .
Figure 2. Results of the Mann-Kendall test of trend significance for the studied catchments.

Figure 2 .
Figure 2. Results of the Mann-Kendall test of trend significance for the studied catchments.
Water 2019, 11, x FOR PEER REVIEW 10 of 21

Figure 3 .
Figure 3.The course of the estimated kernel density function of Qmed flows for the studied catchments of the upper Vistula basin.

Figure 3 .
Figure 3.The course of the estimated kernel density function of Q med flows for the studied catchments of the upper Vistula basin.Kernel density estimation of the Q med flow density function, carried out for the tested catchments of the upper Vistula basin, clearly indicated the unimodal nature of the density function with the right-skewed distribution.It follows that the studied catchments located in different physiographic units of the upper Vistula river basin (Carpathian and non-Carpathian catchments) can be treated Water 2019, 11, x FOR PEER REVIEW 12 of 21

Figure 4 .
Figure 4. Impact of the number of predictors on the value of the coefficient of determination for the analysed form of the model for estimating Qmed throughout the upper Vistula basin.

Figure 4 .
Figure 4. Impact of the number of predictors on the value of the coefficient of determination for the analysed form of the model for estimating Q med throughout the upper Vistula basin.
is a graph of predicted values relative to residual values.Water 2019, 11, x FOR PEER REVIEW 13 of 21

Figure 5 .
Figure 5. Diagram of the predicted values versus residual values for the model for estimating Qmed in the upper Vistula basin.

Figure 5 .
Figure 5. Diagram of the predicted values versus residual values for the model for estimating Q med in the upper Vistula basin.

Water 2019 ,
11, x FOR PEER REVIEW 14 of 21 Figure 6   presents a chart of nominal (expected) values relative to residual values obtained by applying the tested form of the empirical model.Based on the normality plot of the residuals, it was found that for the analysed equation, most points are arranged along a straight line.Hence the inference that in these cases the distribution of residues is consistent with the normal distribution.

Figure 6 .
Figure 6.Diagram of normal distribution of residuals for the model for estimating Qmed in the upper Vistula basin.

Figure 6 .
Figure 6.Diagram of normal distribution of residuals for the model for estimating Q med in the upper Vistula basin.

Figure 8 .
Figure 8. Dimensionless probability curve of annual peak flows for the catchments of upper Vistula river basin.

Figure 9 .
Figure 9. Verification of the dimensionless probability curve for the upper Vistula river basin.

Figure 8 .
Figure 8. Dimensionless probability curve of annual peak flows for the catchments of upper Vistula river basin.

Figure 8 .
Figure 8. Dimensionless probability curve of annual peak flows for the catchments of upper Vistula river basin.

Figure 9 .
Figure 9. Verification of the dimensionless probability curve for the upper Vistula river basin.

Figure 9 .
Figure 9. Verification of the dimensionless probability curve for the upper Vistula river basin.

Figure 10 .
Figure 10.MAPE values for the estimation of QT quantiles for the analysed empirical formulae.

Figure 10 .
Figure 10.MAPE values for the estimation of Q T quantiles for the analysed empirical formulae.

Table 2 .
Results of the significance analysis of linear regression model, and the significance of the component coefficients of regression for the model for estimating Qmed throughout the upper Vistula basin.

Table 2 .
Results of the significance analysis of linear regression model, and the significance of the component coefficients of regression for the model for estimating Q med throughout the upper Vistula basin.
F-Fisher-Snedecor distribution; p-p value for the regression model; a-value of the absolute term; n*-normalised coefficient of regression; n-coefficient of regression; t-quotient b/(standard effort of b); p i -p value for partial coefficients of regression models.

Table 3 .
Results of the collinearity analysis of independent variables in the studied forms of the model for determining Q med in the upper Vistula basin.
r 2 c -value of the coefficient of determination between the given variable and all other independent variables.

Table 3 .
Results of the collinearity analysis of independent variables in the studied forms of the model for determining Qmed in the upper Vistula basin.
r 2 c-value of the coefficient of determination between the given variable and all other independent variables.

Table 4 .
Results of the autocorrelation analysis of residuals, conducted using Durbin-Watson test (source: own study).
N-number of cases; k-number of variables in the equation; d l , d g -threshold values of the Durbin-Watson statistic; D-Durbin-Watson statistic.

Table 4 .
Results of the autocorrelation analysis of residuals, conducted using Durbin-Watson test (source: own study).
N-number of cases; k-number of variables in the equation; dl, dg-threshold values of the Durbin-Watson statistic; D-Durbin-Watson statistic.

Table 5 .
Analysis of outlier residuals for the studied forms of the models for estimating Qmed in the upper Vistula basin.

Table 5 .
Analysis of outlier residuals for the studied forms of the models for estimating Q med in the upper Vistula basin.