Flood Frequency Analyses over Different Basin Scales in the Blue Nile River Basin, Ethiopia

The frequency and intensity of flood quantiles and its attendant damage in agricultural establishments have generated a lot of issues in Ethiopia. Moreover, precise estimates of flood quantiles are needed for efficient design of hydraulic structures; however, quantification of these quantiles in data-scarce regions has been a continuing challenge in hydrologic design. Flood frequency analysis is thus essential to reduce possible flood damage by investigating the most suitable flood prediction model. The annual maximum discharges from six representative stations in the Upper Blue Nile River Basin were fitted to the commonly used nine statistical distributions. This study also assessed the performance evolution of the probability distributions with varying spatial scales, such that three different spatial scales of small-, medium-, and large-scale basins in the Blue Nile River Basin were considered. The performances of the candidate probability distributions were assessed using three goodness-of-fit test statistics, root mean square error, and graphical interpretation approaches to investigate the robust probability distribution for flood frequency analysis over different basin spatial scales. Based on the overall analyses, the generalized extreme value distribution was proven to be a robust model for flood frequency analysis in the study region. The generalized extreme value distribution significantly improved the performance of the flood prediction over different spatial scales. The generalized extreme value flood prediction performance improvement measured in root mean square error varied between 5.84 and 67.91% over other commonly used probability distribution models. Thus, the flood frequency analysis using the generalized extreme value distribution could be essential for the efficient planning and design of hydraulic structures in the Blue Nile River Basin. Furthermore, this study suggests that, in the future, significant efforts should be put to conduct similar flood frequency analyses over the other major river basins of Ethiopia.


Introduction
In hydrology, the quantification of design peak discharges on data-scarce catchments has been a continuing problem [1,2]. Precise estimates of flood quantiles are needed for the efficient design of hydraulic structures [3,4]; however, historical data that are required to quantify the flood statistics are usually unavailable at the site of interest or the available information may not be representative of the catchment studied because of the changes in the watershed characteristics, such as urbanization [2,5]. The measured hydrological data, particularly in developing countries such as Ethiopia can be limited, short, or nonexistent to the extent that they are far from representative of the basin under consideration [1,6]. A promising and elegant way to solve this problem is deriving the flood frequency for more than 30 years from 11 stations in Pakistan, such that they suggested the generalized Pareto and Weibull distributions. Dong et al. [8] developed a nonstationary flood frequency analysis in China considering climate change and human activity. Chen et al. [7] compared eight probability distributions considering hypothesis tests and information-based criteria for identifying the best flood frequency prediction model in the Thames River from the United Kingdom, Wabash River from the United States of America, and both Beijing River and Huai River from China. They reported that the information-based criteria perform better than hypothesis test methods for identifying the most suitable probability distribution function. Langat et al. [12] compared the performance of six probability distributions for frequency analysis of the maximum, minimum, and mean flows in the Tana River, Kenya. They reported the suitability of the gamma (Pearson type 3) and lognormal distribution functions for predicting the maximum streamflow, while the Weibull and the generalized extreme value distributions for predicting the annual minimum flows, and the lognormal and generalized extreme value distributions for predicting the annual mean flows in the study region. Griffis and Stedinger [30] suggested the log-Pearson type III for flood frequency analysis in the United States of America, while Rahman et al. [32] recommended the log-Pearson type III, generalized extreme value, and generalized Pareto distributions for flood frequency analysis in Australia. Thus, the most widely used and recommended probability distributions in the previous studies have been considered in this study.
It is worth mentioning that the major sources of uncertainty in flood prediction included the heterogeneity of catchments that are included in the flood frequency analysis, the parameter estimation method, and the probability distribution selection approach [2]. The suitability of the probability distribution functions for improved flood frequency analysis might depend on the hydro-climatic regimes of the region; therefore, identification of the most suitable flood frequency probability distribution for different climate zones should be assessed. Thus, the differences in the hydro-climatic characteristics should serve as a hydro-physical basis for choosing the most suitable probability distribution for the studied region. Despite previous efforts in advancing the flood frequency prediction models in different regions, the performance evolution of the flood frequency prediction models considering different spatial scales within the same hydro-climatic setting has not perceived much attention. Most of the existing flood frequency prediction models comparison studies, to our knowledge, are either in the way of investigating the most suitable probability distribution for flood modeling at a specific site of the studied region or investigating the most suitable flood frequency approach for the entire region, including the gauged and ungauged sites, through the regional flood frequency analysis approach. However, investigation of the flood frequency distribution performance evolution considering different spatial scales within the same hydro-climatic setting might give the robust probability distribution model for flood prediction in the entire basin. It is essential to note that the choice of flood prediction model is highly influenced by the flood magnitude. Thus, considering the flood magnitudes from different basin spatial scales could improve our understanding of the probability distributions performance evolution with different basin spatial scales (i.e., with different flood magnitudes). In this respect, this study considered four small spatial scale basins with an area ranging between 514 and 1656 km 2 , while the large and medium spatial scale basins cover the areas of 199,812 and 65,784 km 2 , respectively. It is worth mentioning that each of the small-scale catchment contribute to the flow magnitude measured at both the medium scale and large scale basins, such that the performance evolution of each probability distribution function with flow magnitude can be investigated. Thus, the ultimate goal of this study is to investigate the robust flood frequency prediction model among the most commonly used probability distributions by considering different basin spatial scales in the Blue Nile River Basin. The robust probability distribution will perform satisfactorily in all selected basin spatial scale scenarios, such that the investigated flood prediction model can be used for the design, planning, and management of the regional water resources.
This study relies on the investigation of the robust probability distribution among the commonly used univariate two-, three-, and four-parameter probability distributions for flood frequency analysis in Hydrology 2020, 7, 44 4 of 21 the upper Blue Nile River Basin. However, multivariate probability distributions such as bivariate and trivariate have also been suggested by several studies for flood frequency analysis [34][35][36][37][38][39][40][41]. The flood frequency analysis commonly requires fitting univariate distributions to annual peak discharges [34]. However, the flood and watershed characteristics are multidimensional processes; thus, multivariate distributions can be considered for a simultaneous consideration of several hydrological component processes in certain situations. The decision to adopt either the univariate, bivariate or multivariate distribution mainly depends on the objectives of the particular study [34]. For example, a univariate flood frequency analysis might be more suitable in the case of flood protection risk assessment with a small to moderate sized structure [34]. Conversely, a multivariate flood frequency analysis approach is desirable if the flood risk assessment considers the flood attenuation, duration, and volume in addition to the flood magnitude. In general, a multivariate consideration is more appropriate when the variables of interest are a function of two or more hydrological variables. In this direction, the copula-based multivariate distributions were commonly used. Comprehensive definitions and formulae for the copula-based frequency analysis can be found in the study by Salvadori and De Michele [39]. Moreover, Genest and Favre [36] can be referred for a comprehensive review and illustration of a copula estimation procedures.
This study also investigates the best parameter estimation approach for each of the selected probability distribution. The commonly used parameter estimation methods include the method of moments, maximum likelihood method, probability weighted moments, and L-moments [10,12,14,42]. Gharib et al. [10] and Bermudez and Kotz [43] mentioned that the maximum likelihood estimator, the method of moments, and the probability weighted moments are the most widely used and efficient parameter estimation approaches. The maximum likelihood parameter estimation method has been suggested by several previous studies [12,42,44]. For example, Langat et al. [12] and Laio [42] reported that the probability distribution parameter estimates using the maximum likelihood parameter estimation method are consistent, reliable, and unbiased. It is worth mentioning that the maximum likelihood parameter estimation method starts with the likelihood function with unknown model parameters and yielded the parameter values that maximize the sample likelihood. Moreover, the probability-weighted moments [45] were found to be superior to that of the standard moment-based estimates and suggested as an alternative method to the maximum likelihood parameter estimation approach [12]. The L-moment method has also been popularly used for parameter estimation due to its advantages of not being affected by sampling variability, small bias, and robustness compared to that of the standard moment method [24,46]. However, the L-moment method is more suited for three-parameter distributions with small sample sizes [47], and their estimates can be influenced by the outliers [48]. In this aspect, Zhang [49] proposed the hybrid between likelihood and moment parameter estimation approaches called the likelihood moment parameter estimation approach. Thus, this study considered the three commonly used parameter estimation approaches: the maximum likelihood estimation, the probability weighted moments estimation, and the likelihood moment estimation approaches. It is essential to note that an alternative parameter estimation method, such as the maximum entropy and neural network approaches, have also been proposed by the previous studies [50][51][52][53]. The maximum entropy was used for estimating the parameters of several probability distributions such as gamma, Pearson type III, Log-Pearson type III, and generalized Pareto probability distributions [51,52]. Singh [53] reported that the maximum entropy method is comparable to the commonly used method of moments, maximum likelihood method, probability weighted moments, and L-moments parameter estimation approaches. Molina-Aguilar et al. [50] also suggested a meta-heuristic harmonic search approach for estimating the parameters of the generalized extreme value distribution. In general, the specific objectives of this study are to (1) explore the influence of basin spatial scales on the performance of probability distribution functions, (2) select the most suitable parameter estimation technique among the maximum likelihood estimation, the probability weighted moments estimation, and the likelihood moment estimation approaches, and (3) investigate the robust flood prediction probability distribution function for flood frequency analysis considering different spatial scales in the Blue Nile River Basin.

Study Area
This study evaluated the performances of the commonly used probability distribution functions for flood frequency analyses in the Upper Blue Nile River Basin, which is the Ethiopian part of the Nile River Basin and locally known as the Abay River Basin (Figure 1). Three different spatial scales (i.e., large, medium and small spatial scales) are considered to evaluate the performances of each probability distribution function in view of reproducing the annual maximum flow series obtained from different basin scales. Investigating the robust probability distribution function considering different spatial scales in the study region has relevant purposes in the development, operation and maintenance of the ongoing and proposed water resources projects in the Upper Blue Nile River Basin. In this respect, the historical annual maximum flow series measured at the border between Ethiopia and Sudan (i.e., Eldiem gauging station) is considered to represent the large spatial scale scenario, whereas the annual maximum flow series at Kessie gauging station is considered to reflect the medium basin scale. Furthermore, four hydrologically different small-scale basins are considered to investigate the performance evolution of the probability distribution functions with relatively less magnitude in the annual maximum flow. The annual maximum flow series that represent the small spatial scales are measured at Gilgel Abay, Gummara, Ribb, and Megech gauging stations ( Figure 1). The small-scale catchments area ranging between 514 km 2 for Megech and 1656 km 2 for Gilgel Abay. Conversely, the large and medium spatial scales cover the areas of 199,812 and 65,784 km 2 , respectively. Each of the small-scale catchments contribute to the flow magnitude measured at both the medium-scale and large-scale basins, such that the performance evolution of each probability distribution function with flow magnitude can be investigated ( Figure 1). The observed streamflow data recorded for the periods 1980-2012, 1965-2013, 1959-2013, 1973-2012, 1960-2014, and 1961-2002, respectively, at Megech, Ribb, Gummara, Gilgel Abay, Kessie, and Eldiem stations, were collected from the Ministry of Water, Irrigation, and Energy of Ethiopia. The selected gauging stations with their spatial scales are shown in Figure 1.
From Figure 2, the study basin mainly dominated by wooded grassland, followed by agro-pastoral, agriculture, and sylvo-pastoral. However, in the study basin, the natural vegetation cover is rapidly changing into the agricultural crop land. The deforestation of natural woody vegetation as well as the expansion of cultivated land are increasing in the study basin. Thus, the land use change over the basin can influence the temporal and spatial dynamics of surface runoff. For example, the conversion of the natural vegetation cover into the agricultural crop land will cause an increase in the surface runoff magnitude. Consequently, this will influence the type of probability distribution that has been identified in the region based on prior information. Thus, it is essential to frequently update the most suitable flood frequency prediction model considering the land use changes. Moreover, the topography of the study area shows that the elevation varied between 502 and 4133 m above mean sea level. From the topographic map, the small-scale catchments show a similar slope pattern. Conversely, the medium-and large-scale basins show the diverse slope patterns. Hydrology 2020, 4, x FOR PEER REVIEW 6 of 22

Methodology
The general steps followed in this study are (1) acquisition of data for flood extremes analysis; (2) identification of a plotting position formula to compute observed probabilities of occurrence; (3) identification of appropriate parameter estimation technique for each probability distribution function; (4) fitting of the candidate probability distribution functions to the annual maximum flow series; (5) testing the goodness of fit measures; (6) investigating the robust probability distribution function for Hydrology 2020, 7, 44 8 of 21 flood frequency analysis at three different spatial scales; and (7) estimation of flood quantiles with different return periods.

Annual Maximum Flow
The annual maximum flow series from six gauging stations in the Upper Blue Nile River Basin are selected and ranked in descending order of magnitude. From the annual maximum flow series in each station, the highest flood of 115. 53, 127.79, 229.45, 260.35, 4763.69, and 7800.03 m 3 /s were observed at the Megech, Ribb, Gummara, Gilgel Abay, Kessie, and Eldiem stations, respectively. Table 1 presents the main statistical characteristics of the annual maximum flow series obtained from six selected river gauging stations in the Upper Blue Nile River Basin.
Moreover, the analyses show that there is a weak correlation between the annual maximum flow data of the selected gauging sites in the study basin, such that the best probability distribution function for one gauging site might not be the best for the other sites. The weak correlation between the maximum flow data obtained from six sites might be attributed by the complex spatial and temporal variability of the climate in the Blue Nile River Basin. Therefore, it is essential to investigate the best probability distribution function for each gaging site considered in this study. Moreover, Hosking and Wallis [24] reported that there will be no single distribution that applies to each site in a hydrologically heterogeneous region. Hence, this study aims to find the most suitable probability distribution function for each gauging station considered in this study. Furthermore, the performance evolution of the candidate probability distribution functions over different spatial scales are investigated in order to suggest the robust model for flood frequency analysis in the study region.

Probability Distributions
The most commonly used probability distribution functions (Table 2) are applied at the selected six representative gauging stations in the Upper Blue Nile River Basin. The candidate probability distribution functions are: extreme value type-II (Frechet, EV2), the three parameter gamma (GM3P), generalized gamma (GMG), generalized extreme value (GEV), beta (BT), the four parameter generalized gamma (GMG4P), Log-Pearson type-III (LP3), the three parameter lognormal (LN3P), and the three parameter Weibull (WB3P) probability distributions.
The beta distribution is a continuous probability distribution characterized by boundary parameters [a, b] and parametrized by two shape parameters (α 1 , and α 2 ). The gamma distribution is defined in terms of a shape factor (α), a scale factor (β), and a location factor (γ). The three-parameter gamma is also referred to as the Pearson Type III distribution. The lognormal distribution is based on the normal distribution with a scale parameter σ and location parameter µ. The Log-Pearson type-III distribution [54] has a shape parameter (α), scale parameter (β), and location parameter (γ). The Log-Pearson type-III distribution has been widely used for hydrologic frequency analyses after the recommendation by the United States Water Resources Council [55]. Similarly, the Weibull distribution has been commonly used for flood frequency analysis and reliability analysis. The Weibull distribution has two versions of two-parameter and three parameter distributions with the shape parameter (α), scale parameter (β) and location parameter (γ, with γ set to zero yields the two-parameter Weibull distribution). Moreover, the generalized extreme value distribution has been popularly used for flood frequency analyses in the United Kingdom [23]; it has three parameters reflecting on shape (k), the scale (σ), and the location (µ). The extreme value type-II is one of the commonly used probability distributions for modeling extreme events with shape parameter (α), scale parameter (β), and location parameter (γ, γ = 0 yields the two-parameter Frechet distribution). Table 2 presents the candidate probability distribution functions with their parameter bounds.

Parameter Estimation Methods
In this study, nine commonly used probability distributions are considered to suggest the most suitable model for flood frequency analysis in Ethiopia, particularly in the Upper Blue Nile River Basin. The Hazen and Weibull plotting position relationships are the commonly used approaches in flood frequency analysis [56]; both of these methods satisfy the Gumbel's [57] five criteria in terms of plotting position relationships. The parameters of the candidate probability distributions considered in this study are estimated using three commonly used methods: the maximum likelihood estimation, the probability weighted moments estimation, and the likelihood moment estimation approaches. The estimated parameters are used to calculate quantile estimates for different return periods or, conversely, to calculate the return period for a given flood magnitude.
The maximum likelihood approach [58] is the commonly used parameter estimation approach. This approach maximizes the likelihood function (L) for the selected annual maximum flows (x) as follows Note that the first derivative of the function (f ) provides the estimated parameters that maximize Equation (1).
The probability weighted moment parameter estimation approach has been recommended for flood sample sizes below 500 [59]; thus, this study adopted this approach for estimating the selected probability distribution functions (F) parameters with annual maximum flow (x) as follows The likelihood moment parameter estimation approach [60] is the hybrid between likelihood and moment parameter estimation approaches based on the following approach where n is the sample size, θ = ξ/σ, and P = −m/ n j=1 log 1 − θ x j .

Performance Measures
The parameter values of the candidate probability distributions were estimated by the commonly used three parameter estimation approaches: the maximum likelihood estimation, the probability weighted moments estimation, and the likelihood moment estimation approaches. The candidate probability distributions were then evaluated based on the goodness of fit tests that were used to measure the compatibility of annual maximum flow data with each probability distribution function. Moreover, the probability differences plot is used to graphically detect the performance variation of each distribution used in this study. The Anderson-Darling (A 2 ), Kolmogorov-Smirnov (D), and Chi-Squared (χ 2 ) statistical tests are the commonly used methods to check the adequacy of the probability distribution functions for flood frequency analysis [27]; therefore, this study considered these three goodness of fit tests with the root mean square error (RMSE) method to investigate the robust probability distribution function for flood frequency analysis in the Blue Nile River Basin. In general, to investigate the robust probability distribution function for the study region, the best probability distribution fit ranking approach was considered that the robust distribution should perform satisfactorily at all spatial scales considered in this study. The probability distribution with the lowest sum of rankings considering all performance metrics at all gauging stations is considered to be the robust probability. The Anderson-Darling (A 2 ), Kolmogorov-Smirnov (D), and Chi-Squared (χ 2 ), and RMSE performance measure values can be computed as follows where F i is the empirical distribution function, F x is the tested distribution function, Q max,i is a sample maximum flow value, x is a calculated maximum flow value for the selected distribution function, n is a sample size, k is the number of bins, O i is the observed frequency for bin i, and E i is the expected frequency for bin i.

Application of the Probability Distribution Functions
The parameters of the candidate probability distribution functions considered in this study were estimated using nine of the probability distribution functions. To this end, the maximum likelihood method proved to be good for the EV2 (frechet -3P), the three-parameter gamma, the generalized gamma, the four-parameter generalized gamma, the beta, the three-parameter lognormal, and the three-parameter Weibull distributions; whereas the probability-weighted moments yielded the best parameter values for the generalized extreme value and Log-Pearson type-III distributions. Figure 3 presents the probability density functions of all candidate distributions computed at each gauging station considered in this study. The probability density function can be used to visually detect the deviation of the predicted quantile from the actual data. The actual and predicted quantiles at the Gilgel Abay and Eldiem gauging stations showed negative skewness, whereas the quantiles in the remaining gauging stations were observed to have a positive skewness. The highest positive skewness was observed at the Megech gauging station, at which the Lognormal and Log-Pearson distributions perform relatively better than the remaining candidate distributions. This indicates that the log-transformation of the highly skewed data could improve the prediction results.
In general, most of the probability distribution functions considered in this study performed relatively better in the large and medium scale basins compared to that of the smaller catchments. In general, it is essential that the flood frequency analysis needs more attention to reduce both the risk of flooding and failure of hydraulic structures as well as to mitigate the flood associated damages in the study region. Therefore, this study infers that a comprehensive flood frequency analysis that aims to investigate the robust probability distribution function in the study region could improve our understanding of the possible future probable flood that leads to the efficient planning and design of hydraulic structures.

Goodness of Fit Test Statistics
The three goodness of fit test statistics were computed for each probability distribution function at six gauging stations (i.e., Eldiem, Kessie, Gilgel Abay, Gummara, Megech, and Ribb). The goodness of fit test statistics proved that all the candidate probability distributions can be used for flood frequency analyses in the study region with an acceptable fit value that measured at the significance level (alpha) of 0.2. Figure 4 presents the rank of the candidate probability distributions for each gauging station using the three goodness of fit test statistic values. Based on the Kolmogorov-Smirnov test statistic values, the generalized extreme value distribution ranked first at Ribb, Kessie, and Gilgel Abay stations, the beta ranked first at Eldiem (Border) station, the generalized gamma ranked first at the Gummara station, and the lognormal (3P) performed better than the other candidate distributions at Megech station. Similarly, based on the Anderson-Darling test statistic values, the generalized extreme value distribution performed best at both the large and medium scale basins (Eldiem and Kessie), whereas the Weibull (3P) performed better than the other candidate probability distributions at the small scale basins (Gilgel Abay and Gummara). Moreover, the Log-Pearson type-III and lognormal (3P) distributions proved to be the best at Ribb and Megech stations, respectively. Using the Chi-Squared test statistic values, the frechet (3P), gamma (3P), generalized extreme value, and Weibull (3P) distributions were found to be the best at Megech, Gummara, Kessie and Ribb stations, respectively. Conversely, the generalized gamma distribution performed better than the other candidate distributions both at the Eldiem and Gilgel Abay stations based on the Chi-Squared test statistic. Based on the overall analyses (three test statistics and six stations), the generalized extreme value distribution proved to be the robust distribution for flood frequency analysis over different spatial scales in the study region. The generalized extreme value distribution provides satisfactory results in all spatial scales considered in this study. Conversely, the performance of the remaining probability distributions in the six gauging stations varied largely; their performance was dependent on the annual flow magnitudes.
relatively better in the large and medium scale basins compared to that of the smaller catchments. In general, it is essential that the flood frequency analysis needs more attention to reduce both the risk of flooding and failure of hydraulic structures as well as to mitigate the flood associated damages in th

Discussion
The probability distribution with the lowest sum of rankings in view of the three goodness of test statistic values at all gauging stations (three test statistics and six test stations) is considered to be the robust probability distribution for flood frequency analysis in the Blue Nile River Basin. In this direction, the generalized extreme value distribution proved to be the best for flood frequency analysis in the study region based on the Kolmogorov-Smirnov and Anderson-Darling test statistics, whereas both the generalized extreme value distribution and Weibull (3P) were found to be the best probability distributions based on the Chi-Squared test statistic. Thus, the results showed the robustness of the generalized extreme value distribution for flood frequency analysis over different spatial scales in the Blue Nile River Basin.
This study further compared the performances of the candidate probability distributions using the RMSE to support the results with the three goodness of fit test statistics. Figure 5 presents the stacked bar plots of the RMSE for each probability distribution evaluated at six gauging stations.

Discussion
The probability distribution with the lowest sum of rankings in view of the three goodness of test statistic values at all gauging stations (three test statistics and six test stations) is considered to be the robust probability distribution for flood frequency analysis in the Blue Nile River Basin. In this direction, the generalized extreme value distribution proved to be the best for flood frequency analysis in the study region based on the Kolmogorov-Smirnov and Anderson-Darling test statistics, whereas both the generalized extreme value distribution and Weibull (3P) were found to be the best probability distributions based on the Chi-Squared test statistic. Thus, the results showed the robustness of the generalized extreme value distribution for flood frequency analysis over different spatial scales in the Blue Nile River Basin.
This study further compared the performances of the candidate probability distributions using the RMSE to support the results with the three goodness of fit test statistics. Figure 5 presents the stacked bar plots of the RMSE for each probability distribution evaluated at six gauging stations. Again, the generalized extreme value distribution proved to be the best distribution to predict the flood quantiles in all spatial scales considered in this study. The generalized extreme value distribution improved the flood quantile prediction performances obtained by the Frechet (3P), generalized gamma, gamma (3P), lognormal (3P), beta, Log-Pearson 3, Webull (3P), and generalized gamma (4P) probability distributions, respectively, by 67.91%, 39.64%, 22.01%, 20.41%, 16.18%, 12.81%, 8.15%, and 5.84%. The largest cumulative RMSE was observed with the frechet (3P) with value reaches~691.62 m 3 /s, while the minimum cumulative RMSE (second to the generalized extreme value distribution) was observed with the generalized gamma (4P) with values of~235 m 3 /s. In general, based on the overall results, it can be concluded that the generalized extreme value distribution can be considered as the robust probability distribution function for flood frequency analysis in the Blue Nile River Basin. Figures 6 and 7 present the quantile estimates for each site with 95% confidence limits using the generalized extreme value distribution. Note that the predicted quantiles at each spatial scale could be essential for the efficient design of hydraulic structures in the study region. The flood frequency curve for different spatial scales might not be the same since the large scale basin may respond more to the larger-scale dynamics generated by heavy monsoon rains in the region, whereas the small scale basins may respond more to the local flash flood events. Thus, developing a flood frequency curve considering different basin spatial scales might improve our understanding of the probable flood for different return periods. This approach could be used for the efficient planning and design of hydraulic structures in the study region in view of minimizing the risk associated with the estimation of peak design discharges. From the analysis, the flood prediction uncertainty for the Megech and Ribb gauging stations was relatively higher than that of the remaining stations. In general, the flood prediction uncertainty was observed to be minimum in the medium and large basin spatial scales compared to that of the error in the small-basin spatial scales. Figures 6 and 7 present the quantile estimates for each site with 95% confidence limits using the generalized extreme value distribution. Note that the predicted quantiles at each spatial scale could be essential for the efficient design of hydraulic structures in the study region. The flood frequency curve for different spatial scales might not be the same since the large scale basin may respond more to the larger-scale dynamics generated by heavy monsoon rains in the region, whereas the small scale basins may respond more to the local flash flood events. Thus, developing a flood frequency curve considering different basin spatial scales might improve our understanding of the probable flood for different return periods. This approach could be used for the efficient planning and design of hydraulic structures in the study region in view of minimizing the risk associated with the estimation of peak design discharges. From the analysis, the flood prediction uncertainty for the Megech and Ribb gauging stations was relatively higher than that of the remaining stations. In general, the flood prediction uncertainty was observed to be minimum in the medium and large basin spatial scales compared to that of the error in the small-basin spatial scales.  To suggest the robust probability distribution in the Blue Nile River Basin, this study adopted three goodness-of-fit tests: the Anderson-Darling, Kolmogorov-Smirnov, and Chi-Squared statistical tests. However, a major limitation of the present study is that the most suitable probability distribution identified in this study might be different if more goodness-of-fit tests were considered. For example, Haddad et al. [60] considered the Akaike information criterion, Bayesian information criterion, and Anderson-Darling criterion for investigating the most suitable probability distribution function in Australia. They reported that the Anderson-Darling criterion proved to be the best for identifying the most suitable probability distribution compared to the Akaike and Bayesian To suggest the robust probability distribution in the Blue Nile River Basin, this study adopted three goodness-of-fit tests: the Anderson-Darling, Kolmogorov-Smirnov, and Chi-Squared statistical tests. However, a major limitation of the present study is that the most suitable probability distribution identified in this study might be different if more goodness-of-fit tests were considered. For example, Haddad et al. [60] considered the Akaike information criterion, Bayesian information criterion, and Anderson-Darling criterion for investigating the most suitable probability distribution function in Australia. They reported that the Anderson-Darling criterion proved to be the best for identifying the most suitable probability distribution compared to the Akaike and Bayesian information criterions for three-parameter distributions. Conversely, the Akaike and Bayesian information criterions were observed to be better than Anderson-Darling criterion for identifying the best probability distribution from two-parameter distributions. Chen et al. [7] also compared eight probability distributions considering hypothesis tests and information-based criteria for identifying the best flood frequency prediction model in the Thames River from the United Kingdom, Wabash River from the United States of America, and both the Beijing River and Huai River from China. They reported that the information-based criteria perform better than that of the hypothesis test methods for identifying the most suitable probability distribution function. In addition, for future study, an alternative flood frequency model such as the multivariate probability approach, which characterizes the simultaneous behavior of flood characteristics (i.e., flood peak, volume and duration), as well as different parameter estimation approaches (e.g., harmonic search, maximum entropy and neural network), should be investigated. Therefore, this study recommends that significant efforts should be devoted to investigating the most suitable probability distribution with a similar approach in the future by considering the major limitations of this study.

Conclusions
This study investigated the robust probability distribution function for flood frequency analysis over different basin spatial scales in the Upper Blue Nile River Basin. The commonly used annual maximum flow method was used to investigate the robust probability distribution among nine commonly used probability distributions. In this study, six streamflow gauging sites with different basin spatial scales were considered to check the performance evolution of the probability distribution functions with varying spatial scales. Of six gauging stations, four were considered to represent the small basin spatial scale, whereas one gauging station was considered for the medium and large basin spatial scales. The performances of the candidate probability distributions were assessed using the Anderson-Darling, Kolmogorov-Smirnov, and Chi-Squared statistical tests to check the adequacy of the models in predicting the flood quantiles at each station. The root mean square error was also considered to define the best fit probability distribution for each gauging station. This study also assessed the performance evolution of the selected probability distributions with spatial scales by considering three different basins scales of small-, medium-, and large-scale basins in the Blue Nile River Basin. The flood frequency curve in different basin spatial scales in the Blue Nile River Basin might not be the same, since the large-scale basin may respond more to the larger scale dynamics, whereas the small-scale basins may respond more to the local flash flooding. Thus, flood frequency analyses considering different basin spatial scales might improve our understanding of the flood dynamics in the study region.
Based on the overall analyses, the generalized extreme value distribution was found to be the robust probability distribution for flood frequency analysis over different basin spatial scales in the Blue Nile River Basin; this probability distribution was frequently found to be the highest ranked model for most of the selected stations based on the Anderson-Darling, Kolmogorov-Smirnov, and Chi-Squared statistical tests. As per the root mean square error evaluation criteria, the generalized extreme value and generalized gamma (4P) distributions, respectively, were found to be the first-and second-best probability distributions for predicting the annual maximum flow in all gauging stations. The root mean square error improvement with the generalized extreme value distribution was observed to vary between 5.84% over the generalized gamma (4P) and 67.91% over the frechet (3P). In general, it can be concluded from the overall results that the generalized extreme value distribution proved to be the robust model for flood frequency analysis over different basin spatial scales in the study region. The findings of this study could be used for the efficient planning and design of hydraulic structures that were used to minimize the risk associated with the destructive nature of the flood. Conducting similar flood frequency analyses in the other major river basins of Ethiopia is suggested.