Frequency Analysis of High Flow Extremes in the Yingluoxia Watershed in Northwest China

Zhanling Li 1,*, Yuehua Wang 1, Wei Zhao 1, Zongxue Xu 2 and Zhanjie Li 2 1 School of Water Resources and Environment, China University of Geosciences, Beijing 100083, China; 2105140007@cugb.edu.cn (Y.W.); 2105140008@cugb.edu.cn (W.Z.) 2 College of Water Sciences, Beijing Normal University, Beijing 100875, China; zongxuexu@vip.sina.com (Z.X.); lzhan@bnu.edu.cn (Z.L.) * Correspondence: zhanling.li@cugb.edu.cn; Tel.: +86-10-8232-2281


Introduction
A significant amount of attention has been paid to hydrological extreme events, which are likely to increase in frequency in most regions of the world [1][2][3].There are two common ways to study the so called extreme events: one is the annual maximum series (AM) and the other is the peaks over a threshold (POT) series [4][5][6][7].Rao and Hamed [8] showed that the AM series is statistically more efficient than the POT series when λ is small ( λ ă 1.65 ), where λ is the mean number of peaks per year included in the POT series.However, as only the peak flow in each year is considered, the use of the AM series may involve some loss of information.For example, the second or third peak within a year may be greater than the maximum flow in other years, and yet they are ignored [6].This situation is avoided in the POT series where all peaks above a certain threshold value are considered, and thus more information about the extremes would be involved in the analysis [6].In the study of Madsen et al. [9], they found that the POT model is more advantageous than the AM model in cases where only short records are available.The POT series also has some obvious disadvantages, and the major one is that the flood peaks might not form an independent time series, since some flood peaks may occur on the recession curves of the preceding flood peaks [10].Thus, both series are used to indicate high flow extremes and compared to each other in this study.
Plenty of probability distribution models, e.g., the Log Pearson Type III (LP3), Loglogistic, Lognormal, Burr, Weibull, Gamma, Generalized Extreme Value (GEV) model, and the Generalized Pareto distribution (GPD) are commonly used in fitting extreme events [11][12][13].Rahmani et al. [14] used the Weibull distribution to calculate the extreme precipitation frequency in Kansas and its adjacent states.Du et al. [15] and Xia et al. [6] used the GEV and GPD models to estimate the historical extreme precipitation frequency in the Haihe and Huaihe river basins of China.Benyahya et al. [12] compared five probability distributions to identify the appropriate modes for providing the most accurate seasonal maximum precipitation in southern Quebec, Canada.Some studies show that the GEV or GPD model provides optimum fitness to extreme events for comparison studies [4,12,16], while others show different results [13,17].
Most of the extreme value models have been obtained through mathematical arguments that assume an underlying process consisting of a sequence of independent and stationary variables.However, for the types of data to which extreme value models are commonly applied, such assumptions are usually unrealistic [18].Ling et al. [19] detected an abrupt change in 1993 in the high-flow index of the annual runoff for the Tarim River basin over the last 50 years.Chen et al. [20] found a change point in 1991 for the annual maximum flood flow in southern China.Although the usual extreme value models are still applicable in the presence of non-stationary and temporal dependence [18], the estimations of the model parameters and return levels based on model fitting would be suspicious without considering these assumptions [4,[20][21][22][23].Therefore, there is a need to investigate the validity of the stationarity and independence assumptions for the series before probability modeling is performed.
This study uses the AM and the POT series to indicate the hydrological extremes for the Yingluoxia watershed, and mainly focuses on the following issues: (1) the selection of the threshold for the POT series and a comparison of the AM and POT series; (2) statistical tests of the stationarity and independence assumptions for the AM series; and (3) estimations and uncertainties of the return levels for both series using the GEV and GPD models.

Study Area and Data Description
The Heihe River basin, the second largest inland river basin in China, has received a great deal of attention from the public and from water authorities.Based on the available literatures, most of the previous studies are mainly focused on the mean flows [24,25].For better disaster management and mitigation in general, it is important to be aware that understanding the changes in flow extremes is more important than understanding the changes in the mean pattern [26].
The Heihe River basin, which originates from the Qilian Mountains, is located between 37 ˝N-42 ˝N latitude and 97 ˝E-102 ˝E longitude.The Yingluoxia watershed is the upper reach of the Heihe River basin (Figure 1).The area of this watershed reaches 10,009 km 2 and its altitude ranges from 3300 m asl to 1700 m asl.About 50% of the total runoff at the watershed outlet is generated in the mid-mountain zone (>2900 m asl) [27].With the decreasing of altitude, the annual mean precipitation decreases from 400 mm to 180 mm, and the annual mean temperature increases from ´3 ˝C to 7 ˝C.There are three hydrological stations in the watershed.The locations and the basic information for three stations are shown in Figure 1 and Table 1.The flow data of the 36-year, 48-year, and 66-year flow volumes for the stations are obtained from the National Hydrological Statistical Yearbook, and they are regularly checked and can be regarded to be of good quality.The annual mean flow is 13 m 3 /s, 23 m 3 /s, and 52 m 3 /s for QL, ZMSK, and YLX stations, respectively.As is shown by Figure 2, the inter-annual distributions of monthly flow are quite uneven, highly correlated with the temporal distributions of precipitation in the basin.Nearly 80% of precipitation is concentrated in June-September, with the largest frequencies concentrated in July.The annual maximum flow usually occurs in July or August.

Probability Modeling
The probability distribution of the AM series can be statistically fitted by making use of the family of extreme value distributions referred to as the Generalized Extreme Value (GEV), which was introduced by Jenkinson [28].GEV distribution has been widely used in the analysis of hydrological extremes because of its flexibility in representing three asymptotic types of extreme value probability distributions [4,16].The cumulative probability function (CDF) for GEV distribution is given by: where µ, σ and ξ are the location, scale and shape parameters, respectively.In the case ξ = 0, the distribution matches the Extreme Value type 1 (EV1) or Gumbel distribuon.
The POT series usually states that excesses from any sample over a high threshold may follow such distributions as exponential or Generalized Pareto distributions (GPD).Here, we choose the GPD to fit the POT series due to its flexibility in representing the different types of extreme value probability distributions.The theoretical development of the GPD can be found in Coles [18].The CDF for the GPD distribution is given by: where u is the location parameter, also known as the threshold; x ´u is the series of exceedances over the threshold.σ u and ξ are referred to as the scale and shape parameters, respectively.For a given shape parameter, the scale parameter controls the mean of the exceedances above the threshold.The value of u must be specified before fitting Equation (2), since the GPD is a distribution of the threshold excesses [18].The shape and scale parameters in GEV and GPD distributions are estimated by the maximum likelihood estimation (MLE) [29,30].The Kolmogorov-Smirnov (K-S) test and the Anderson-Darling (A-D) test are performed for evaluating the suitability of selected probability distributions.Assume a random sample, x 1, ¨¨¨xn , from some distribution with CDF, F(x).The empirical CDF is F n (x) = 0, i/N 0 , 1 for x < x 1 , x i ď x and x ď x i+1 , x ě x n , where N 0 is the number of observations.The statistic D max in the K-S test is defined as: If D max ą D α n (the critical value at the significance level of α), then the null hypothesis that the data follows a specified distribution is rejected.The K-S test tends to be more sensitive near the center of the distribution, while the A-D test gives more weight to the tails.The Anderson-Darling statistic, A 2 , is defined as: where x p1q , ¨¨¨, x pnq are the ordered observations, in an increasing order.If A 2 ą A 2 α (the critical value at the significance level α), the hypothesis regarding the distributional form is rejected [31].

Stationarity and Independence Tests
We examine the stationarity and independence assumptions by estimating the features of the change points in mean, monotonic trends, and autocorrelations for the series.The presence of change-points can have a significant impact on the results of the monotonic trend analyses [20], and we first perform a change-point analysis.The Pettitt test [32], a non-parametric test based on a version of the Mann-Whitney statistic that allows testing whether two samples are from the same population, is used to test the high flow extremes data for abrupt changes in the mean.The test statistic, U t,n, in the Pettitt test is given by: , then the null hypothesis of Pettitt's test, with the absence of a changing point, will be accepted, at a specified significance level p such as 0.05 (0 ă p ă 1 ).
The presence of monotonic trends of time series are investigated by means of one of the most widely used tests, the Mann-Kendall test [33].This test is non-parametric, making it more robust against outliers and departures from normality, which usually occurs in hydrological datasets.In the Mann-Kendall test Z c is given by where n is the data record length, and x i and x j are the sequential data values.When |Z c | ą z 1´α{2 , in which z 1´α{2 are the standard normal deviates and α is the significance level, H 0 (no significant trend in dataset) will be rejected.Q, the gradient of Kendall, shows the extent of trends.
It means an upward trend when Q is positive and a downward trend when it is negative, where We employ the autocorrelation test by examining the autocorrelation coefficients to check the assumption of independence for the data series.When the absolute values of the autocorrelation coefficients of different lag times, for a time series with n observations, are not larger than the typical critical value, i.e., 1.96{ ? n corresponding to the 0.05 significance level, the observations in the time series can be regarded as being independent from each other.

Return Level Estimations
When we explore the extreme events, what extreme events might occur on 100-year or even longer return periods given the much shorter period data available is another active topic [5,22].Suppose that a GEV with parameters µ, ξ and σ is a suitable model for the AM series, the estimations of high flow extremes, x, at a given return period, T, can be obtained from Suppose that a GDP with parameters ξ and σ u is a suitable model for the exceedances of a threshold, u.The estimations of high flow extremes x m , which is exceeded on average once every m observations, can be written as: in which ζ u " k n , k is the exceedance data length (for example, k = 32 for QL station shown in Table 2) and n is the total observations in day (n = 36 ˆ365 = 13140 for QL station).For the POT series, the return period represents the average interval between the high flows that exceed threshold value, and thus the m-observation return level, x m , is constructed.It is should be noted that there is an important distinction in meaning between the return periods computed from the AM and POT series.In the AM series, the return period is the average interval, in which a flood of a given size will occur as an annual maximum; while in the POT series, the return period represents the average interval between floods of a given size, regardless of their relation to the year or any other period [34].It is often more convenient to give return levels on an annual scale, so that the T-year return level is approximately the level expected to be exceed once every T years.If there are N y observations per year, this corresponds to the m-observation return level, where m = T ˆNy [18].The flow data is daily in this study, so the T-year return level corresponds to the m-observation return level with m = 365 ˆT.In this case, we finally get the return levels at a T-year return period instead of at an m-observation return period.Note: The year shown in the bracket is when maximum and minimum flows occurred.

Threshold Selection and POT Series
Threshold selection for the POT series is not an easy task.Too low a threshold value is likely to violate the asymptotic basis of the model, leading to bias, while too high a threshold value will generate a few excesses, with which the model can be estimated, leading to high variance [18].Scarrott and MacDonald [35] provided a comprehensive review of threshold selection approaches.In this article, combined with the consideration of the physical meanings of "high flow extremes", we mainly employ two commonly used methods for the selection of the optimum threshold [5,36,37]: one is examining the mean excess plot carried out prior to model estimation, and the other is assessing the stability of parameter estimates based on model fittings across a range of different thresholds.
The mean excess plot uses the expectation of GPD excesses, E(X ´u|X > u = σ µ /(1 ´ξ), as a diagnostic, defined for ξ <1 to ensure the mean exists.For any higher x > u, the expectation becomes E pX ´x |X ą xq " pσ u `ξxq { p1 ´ξq , which is linear in x with gradient ξ{ p1 ´ξq and intercept σ u { p1 ´ξq .Empirical estimates of the sample mean excesses are typically plotted against a range of thresholds.The threshold is chosen to be the lowest level where all the higher threshold based sample mean excesses are consistent with a straight line [35].Figure 3a presents the mean excess plot with 95% confidence intervals (CIs) for the daily flow data at YLX station, from the extRemes package in R. It is noted that the 95% CIs in Figure 3 are not quite right since this figure is based on the threshold exceedance data, not the POT data series.From Figure 3a, four fitted mean excess straight lines are detected corresponding to thresholds of 100 m 3 /s, 350 m 3 /s, 420 m 3 /s, and 650 m 3 /s.The threshold of 100 m 3 /s gives 3089 exceedances, corresponding to as many as 47 high flow extremes occurring per year on average during the period from 1944 to 2011, which deviates from the common cognition on extreme events.Above 650 m 3 /s, less than three exceedances are included, which are too few data points to make meaningful and reliable inferences.For thresholds of 350 and 420 m 3 /s, both mean excesses are presented as straight lines.While, there are relatively larger deviations from the fitted plot, and greater uncertainties on the estimated return levels for the threshold of 420 m 3 /s.Figure 3b shows the estimated shape parameter stability plot across threshold values ranging from 100 to >500 m 3 /s.The shape parameter should be constant above some threshold levels, where asymptotics begin to apply [16].From Figure 3b, we find approximately constant estimates for the shape parameter above the threshold of 350 m 3 /s, and a very obvious decreasing or increasing trend is presented for the parameter estimates after the threshold of 420 m 3 /s, together with wider 95% CIs.Considering the points mentioned above, the threshold of 350 m 3 /s, yielding a threshold exceedance with 82 data, is finally identified for YLX station.Similar procedures are performed for QL and ZMSK stations, and thresholds of 80 m 3 /s and 170 m 3 /s are finally identified for the two stations.This assessment is largely subjective.Northrop and Coleman [38] and Wadsworth [39] reduce this subjectivity using a likelihood-based procedure to produce complementary plots that enable an automated threshold selection.Figure 4 shows the daily flow time series against series at Lag 1 for the three stations, and the red lines in the plots represent the thresholds.Above the threshold, a short-term serial correlation (although it is not strong) seems to exist, indicating the presence of extremal dependence.There are two commonly used ways to deal with this problem.One is filtering out a set of approximately dependent threshold excesses, and another is estimating the extremal index [22].We use the first approach to obtain an independent series.Considering the specific climatic and geographic conditions in one region, there is a criterion in the threshold exceedance series to meet the independence condition, and the successive high flow extremes should be separated by at least as many days as five plus the natural logarithm of the square miles in the basin area [40].The three hydrological stations in the study area control basin areas of 3000-10,009 km 2 , so the successive high flow extremes should be separated by at least as 12-13 days.Only the maximum flow is retained if successive high flow extremes occurred within the 12-13 days interval, and the rest are filtered out.We then get the POT series from the above declustering scheme with sample sizes of 32, 45, and 45 for QL, ZMSK, and YLX stations, respectively.Figure 5 shows the threshold exceedance series and Figure 4 shows the daily flow time series against series at Lag 1 for the three stations, and the red lines in the plots represent the thresholds.Above the threshold, a short-term serial correlation (although it is not strong) seems to exist, indicating the presence of extremal dependence.There are two commonly used ways to deal with this problem.One is filtering out a set of approximately dependent threshold excesses, and another is estimating the extremal index [22].We use the first approach to obtain an independent series.Considering the specific climatic and geographic conditions in one region, there is a criterion in the threshold exceedance series to meet the independence condition, and the successive high flow extremes should be separated by at least as many days as five plus the natural logarithm of the square miles in the basin area [40].The three hydrological stations in the study area control basin areas of 3000-10,009 km 2 , so the successive high flow extremes should be separated by at least 12-13 Only the maximum flow is retained if successive high flow extremes occurred within the 12-13 days interval, and the rest are filtered out.We then get the POT series from the above declustering scheme with sample sizes of 32, 45, and 45 for QL, ZMSK, and YLX stations, respectively.Figure 5 shows the threshold exceedance series and the POT series.From Figure 5, no obvious trends or serial correlations are found for the POT series, which means that the POT series can be regarded as approximately independent and stationary from graphic diagnosis.
Water 2016, 8, 215 8 of 15 the POT series.From Figure 5, no obvious trends or serial correlations are found for the POT series, which means that the POT series can be regarded as approximately independent and stationary from graphic diagnosis.the POT series.From Figure 5, no obvious trends or serial correlations are found for the POT series, which means that the POT series can be regarded as approximately independent and stationary from graphic diagnosis.

AM Series and Stationarity and Independence Tests
The AM series can be directly derived from the historical flow records, which consists of the maximum flow from each recorded year.The samples of the AM and POT series for the three stations are shown in Figure 5.The statistics of these two series are listed in Table 2.It is apparent that the numbers of high flow extremes in the POT series are not consistent with those in the AM series.For example, 31 out of 66 extremes in the AM series are not included in the POT series for YLX station, while lots of the second and even the third largest extremes from some years are included in addition to the largest one.The means of the POT series are 10%-18% higher than those of the AM series, which suggests that the POT series does take more extreme information into account in this study.
The results of stationarity and independence tests for the AM series are shown in Table 3.The calculated k ptq values in the Pettitt test are 69, 119, and 139 for QL, ZMSK, and YLX stations, respectively.These values are smaller than those of the critical values at the 0.05 significance level, which means that the null hypothesis, with the absence of a change point, will be accepted.Since no statistically significant change points are detected, we perform monotonic analysis on the entire record of the AM series directly.The absolute values of test Z c , in the Mann-Kendall test are smaller than the value of the normal standard deviations, z 1´α{2 (1.96 at α " 0.05 ), which means that no significant trends are detected for the AM series at the 0.05 significance level.From the autocorrelation coefficients for Lag1 and their corresponding 95% CIs, the AM series for the three stations are not highly autocorrelated and do not violate the assumptions of independence.The above analysis suggests that the independence and stationarity assumptions are approximately satisfied for the AM series, i.e., the subsequent model fitting is straightforward, without requiring consideration of serial dependence or the parameter changes in probability models through time.We fit the AM series using the GEV model and fit the POT series using the GPD model.The results of goodness-of-fit, together with the estimated optimal parameter values and standard errors (in the brackets), are shown in Table 4.The values of the estimated statistic, D max , in the K-S test are 0.09, 0.08, and 0.06 at QL, ZMSK, and YLX stations for the AM series, which is smaller than the critical values of D α n (0.22, 0.20, and 0.16 at the 0.05 significance level).The values of the estimated statistic, A 2 , in the A-D test are 0.19, 0.33, and 0.26 at the three stations for the AM series, which is smaller than the critical values of A 2 α at the 0.05 significance level.As for the POT series, both the estimated statistic D max in the K-S test and the estimated statistic A 2 in the A-D test are smaller than the critical values of D α n and A 2 α as well.It means that the GEV model fits the AM series well and the GPD model fits the POT series well.Table 4. Goodness-of-fit of probability modeling and the parameter estimations and standard errors (given in the parentheses) for the GEV and GPD models for the three stations in the Yingluoxia watershed.Note: The significance level, α, is 0.05.

Return Level Estimation
Extremes are scarce, thus estimates are often required for levels of a process that are much greater than have already been observed.Estimates of daily high flow extremes at a given return period are essential for the local flooding protection; thus, different return levels of daily high flow extremes in the Yingluoxia watershed are estimated herein.
The estimated return levels (solid lines) and the corresponding 95% CIs (dashed lines) from both the AM and POT series are presented in Figure 6, together with the empirical return periods (red and blue circles).The empirical return periods for the AM series are calculated using T " indicates the order of the ascending high flow extremes data (i = 1, 2, . . ., k), and k is the high flow data length [41].The empirical return periods for the POT series are calculated using T " in which ζ u " k n , k is the exceedence data length, n is the total observations in day, and i is the order of the ascending exceedance data (i = 1, 2, . . ., k).The GEV model provides a good match to the empirical estimations for the AM series and the GPD model provides a good match to those for the POT series.Table 5 lists the estimated return levels for the AM and POT series from both the GEV and GPD models, together with the 95% CIs.These estimated return levels are expected to be helpful in providing useful information for the design of local flooding defenses.At the 10-year return period, the GEV model fitted values are 135 (8.45) m 3 /s, 287 (10.70) m 3 /s, and 572 (37.63) m 3 /s for QL, ZMSK, and YLX stations, and the GPD model fitted values are 143 (9.18) m 3 /s, 295 (10.71) m 3 /s, and 591 (37.17) m 3 /s for the three stations with standard errors given in the parentheses.At the 100-year return period, the fitted values increase to 180 (21.67) m 3 /s, 337 (18.64) m 3 /s, and 867 (134.18)m 3 /s for the GEV model, and 179 (15.82) m 3 /s, 333 (12.76) m 3 /s, and 807 (93.01) m 3 /s for the GPD model for the three stations.Overall, the results for the estimated return levels from both the GEV and GPD models are fairly comparable, with the largest difference being less than 10% for the 10-year, 20-year, 50-year, and 100-year return periods.However, the 95% CIs of the estimated return levels for the AM series based on the GEV model are wider than those for the POT series based on the GPD model (Figure 6) especially at the longer return periods, which means that greater uncertainties exist for the extrapolation from the AM series and GEV model.
The POT series makes use of more information on extremes than the AM series, though discarding all but the cluster maxima is still wasteful of data.For comparison, we attempt to use all threshold exceedance points to estimate the return levels of high flow extremes with considering the extremal index.In Fawcett and Walshaw [22], the authors proposed five extremal index estimators.Due to our practical experiences, we select the approach mentioned in Coles [18] to estimate the extremal index, i.e., the extremal index θ " n c n u , in which n c indicates the number of clusters obtained above the threshold, and n u indicates the number of exceedances of the threshold.The m-observation return level is x m " u `σu ξ r `ζu θq ξ ´1‰ (ξ ‰ 0), in which ζ u " n u n [18], and the meanings of u , σ µ , and ξ are the same as those in Equation (8). Figure 7 shows the differences between the two schemes on the estimated return level.The differences are calculated with the formula of relative error.From Figure 7, the differences are not great between the POT series and all the threshold exceedances with considering the extremal index for the estimated return levels.For YLX station, it is relatively stable and within 3%.For ZMSK station, it is about 6% at less than 10-year return periods and decreases to lower than 5% with the return periods increasing.For QL station, relatively larger differences are found, with 10%-12% differences for less than 10-year return periods and 5%-10% differences for the longer return periods.These two  Figure 7 shows the differences between the two schemes on the estimated return level.The differences are calculated with the formula of relative error.From Figure 7, the differences are not great between the POT series and all the threshold exceedances with considering the extremal index for the estimated return levels.For YLX station, it is relatively stable and within 3%.For ZMSK station, it is about 6% at less than 10-year return periods and decreases to lower than 5% with the return periods increasing.For QL station, relatively larger differences are found, with 10%-12% differences for less than 10-year return periods and 5%-10% differences for the longer return periods.These two schemes give close results in return level estimations especially for YLX and ZMSK stations.We further analyze the 95% CIs of the estimated return levels from the two schemes.Results are shown in Table 6.The 95% CIs from the threshold exceedances are slightly narrower than those from the POT series.For example, for YLX station, the standard errors of the estimated return levels from the threshold exceedances are 33.55,64.43, 85.45, and 110.25 at the 10-year, 50-year, 100-year, and 200-year return periods, which are lower than those from the POT series with 37.17, 68.28, 93.01, and 123.74, respectively.Due to more data being involved in the analysis, using all excesses with considering the extremal index will tend to give more accurate return level estimates than those from the POT series.This result is consistent with the findings of Fawcett and Walshaw [22].
Water 2016, 8, 215 12 of 15 year return periods, which are lower than those from the POT series with 37.17, 68.28, 93.01, and 123.74, respectively.Due to more data being involved in the analysis, using all excesses with considering the extremal index will tend to give more accurate return level estimates than those from the POT series.This result is consistent with the findings of Fawcett and Walshaw [22].Table 5. Estimations and 95% CIs (in the brackets) of return levels from the AM and the POT series for the Yingluoxia watershed, together with the differences of return level estimations from both series.Note that for the extremal index used here, there are other methods possible for estimating the extremal index.Return level estimations are sensitive to the choice of the estimator of the extremal index, and extremal index values are related to the strength of serial correlation in the threshold exceedance series.Fawcett and Walshaw [22] have attempted to discuss this issue and found many meaningful results.As this paper mainly discusses the AM and POT series and probability modeling using the GEV and GPD models, we will pursue further research into this area by considering other estimation approaches for the extremal index, which would be helpful in understanding how the extremal index estimators affect the estimations and uncertainties of the return levels.

Conclusions
Both the AM and POT series are commonly used to study extreme events.In this paper, we construct these two series to indicate the hydrological extremes in the Yingluoxia watershed, and compare the estimations and uncertainties of return levels for both series by using the GEV and GPD models.Using all threshold excesses with considering the extremal index to estimate the return levels is also investigated for comparison.Before the probability modeling, the stationarity and independence assumptions for the AM and POT series are examined since ignoring these assumptions would result in errors in the estimations of the model parameters and return levels based on model fitting.Results show that, combined with a declustering scheme, the POT series are approximately independent, and they are also broadly stationary based on the graphic diagnosis.The AM series can be regarded as stationary and independent with the results of Mann-Kendall test, Pettitt test, and autocorrelation coefficient test.The GEV model fits the AM series well, and the GPD model fits the POT series well according the statistical K-S test and A-D test.The estimated return levels from the AM series, the POT series, and all threshold exceedances with considering the extremal index are fairly comparable.The largest difference between the former two series is less than 10% at the 10-year, 20-year, 50-year, and 100-year return periods.The difference between the latter two series increases first and then decreases as the return period increasing, and it is less than 10% for return periods over 10 years.As for the uncertainties of the estimated return levels, the AM series show the largest uncertainties on extrapolation, followed by the POT series.The threshold excesses with considering the extremal index tend to show the smallest uncertainties of return level estimations with the narrowest 95% CIs, due to more data involved in the analysis.

Figure 1 .
Figure 1.Locations of the Heihe River basin and the Yingluoxia watershed in China.

Figure 2 .
Figure 2. Inter-annual distributions of monthly mean flow in the Yingluoxia watershed.

Figure 1 . 15 Figure 1 .
Figure 1.Locations of the Heihe River basin and the Yingluoxia watershed in China.

Figure 2 .
Figure 2. Inter-annual distributions of monthly mean flow in the Yingluoxia watershed.

Figure 2 .
Figure 2. Inter-annual distributions of monthly mean flow in the Yingluoxia watershed.
and k ptq " max 1ďtďn |U t,n | represents the most significant change point t.n means the data record length.If k ptq ă b ´pn 3 `n2 qln p 2 6

Figure 3 .
Figure 3. (a) The mean excess plot for high flow data at Yingluoxia station (The solid jagged line is the mean excess plot, with 95% CIs shown as dashed lines.Blue solid lines correspond to the thresholds u = 100, 350, 420, and 650 m 3 /s.Vertical dashed lines mark the number of exceedances corresponding to these thresholds); (b) Threshold stability plots for shape parameter for the high flow data at Yingluoxia station (Circles are maximum likelihood estimates, with 95% CIs shown as vertical lines.Two thresholds u = 350 and 420 m 3 /s are shown by vertical dashed lines).

Figure 3 .
Figure 3. (a) The mean excess plot for high flow data at Yingluoxia station (The solid jagged line is the mean excess plot, with 95% CIs shown as dashed lines.Blue solid lines correspond to the thresholds u = 100, 350, 420, and 650 m 3 /s.Vertical dashed lines mark the number of exceedances corresponding to these thresholds); (b) Threshold stability plots for shape parameter for the high flow data at Yingluoxia station (Circles are maximum likelihood estimates, with 95% CIs shown as vertical lines.Two thresholds u = 350 and 420 m 3 /s are shown by vertical dashed lines).

Figure 4 .Figure 5 .
Figure 4. Plots of flow time series against series at Lag 1 at the three stations in the Yingluoxia watershed.The red lines in the plots represent thresholds above which events are classified as high flow extremes (The corresponding thresholds are 80, 170, and 350 m 3 /s, respectively, for QL (a), ZMSK (b), and YLX (c) stations).

Figure 4 .
Figure 4. Plots of flow time series against series at Lag 1 at the three stations in the Yingluoxia watershed.The red lines in the plots represent thresholds above which events are classified as high flow extremes (The corresponding thresholds are 80, 170, and 350 m 3 /s, respectively, for QL (a), ZMSK (b), and YLX (c) stations).

Figure 4 .Figure 5 .
Figure 4. Plots of flow time series against series at Lag 1 at the three stations in the Yingluoxia watershed.The red lines in the plots represent thresholds above which events are classified as high flow extremes (The corresponding thresholds are 80, 170, and 350 m 3 /s, respectively, for QL (a), ZMSK (b), and YLX (c) stations).

Figure 5 .
Figure5.The threshold exceedance series (blue circles), the POT series (red solid dots), and the AM series (black rectangle) for the three stations in the Yingluoxia watershed.

Figure 6 .
Figure 6.Return level estimations from different probability distributions and empirical return periods for high flow extremes for the Yingluoxia watershed.

Figure 6 .
Figure 6.Return level estimations from different probability distributions and empirical return periods for high flow extremes for the Yingluoxia watershed.

Figure 7 .
Figure 7. Differences on the estimated return levels from the POT series and the threshold exceedances with considering the extremal index at the three stations in the Yingluoxia watershed.

Figure 7 .
Figure 7. Differences on the estimated return levels from the POT series and the threshold exceedances with considering the extremal index at the three stations in the Yingluoxia watershed.

Table 1 .
Basic information for three hydrological stations and their statistics on the annual flow in the Yingluoxia watershed.

Table 1 .
Basic information for three hydrological stations and their statistics on the annual flow in the Yingluoxia watershed.

Table 1 .
Basic information for three hydrological stations and their statistics on the annual flow in the Yingluoxia watershed.

Table 2 .
Statistics of the AM and POT series for the three stations in the Yingluoxia watershed.

Table 3 .
Results of the stationarity and independence tests for the AM series in the Yingluoxia watershed.

Table 5 .
Estimations and 95% CIs (in the brackets) of return levels from the AM and the POT series for the Yingluoxia watershed, together with the differences of return level estimations from both series.

Table 6 .
Estimations and 95% CIs (in the brackets) of return levels from the threshold exceedances with considering the extremal index for the Yingluoxia watershed.