Improved Mixed Distribution Model Considering Historical Extraordinary Floods under Changing Environment

Historical extraordinary floods are an important factor in non-stationary flood frequency analysis and they may occur at any time, regardless of whether the environment is changing or not. Based on mixed distribution (MD) modeling, this paper proposed an improved mixed distribution (IMD) model to consider the discontinuity and non-stationarity of flood samples simultaneously, which adds historical extraordinary floods in both sub-series divided by a change point. As a case study, the annual maximum peak discharge and volume series of Ankang hydrological station, located in the upper Hanjiang River Basin of China, were selected to identify non-stationarity by using the variation diagnosis system. MD and IMD were used to fit the flood characteristic series and a genetic algorithm was employed to estimate the optimal parameters. Compared with the design flood values fitted by the stationary Pearson type-III distribution, the results computed by IMD decreased at low return periods and increased at high return periods, with the difference varying from −6.67% to 7.19%. The results highlighted that although the design flood values of IMD are slightly larger than those of MD with different return periods, IMD provided a better result than MD. IMD provides a new perspective for non-stationary flood frequency analysis.


Introduction
According to the Intergovernmental Panel on Climate Change (IPCC) Fifth assessment report [1], global warming will be a considerable issue in the future, resulting in subtle changes in the global water cycle and even the global distribution of water.Furthermore, high intensity human activities have resulted in substantial changes in the land surface conditions of many river basins [2], thus affecting the mechanism of runoff generation and convergence in basins.Under the joint influence of the above aspects, the observed hydrological time series have changed substantially, which makes the assumption of "stationarity" questionable in traditional hydrological frequency analysis [3].
Liang et al. [4] grouped non-stationarity flood frequency methods into two types: indirect and direct methods.The indirect methods are mainly based on the rainfall-runoff relation of the basin as well as the decomposition and composition of time series or the hydrological model to revise the hydrological series to eliminate the influence of climate change and human activities and to finally construct stationary time series.A large amount of literature has carried out studies with the indirect methods [5][6][7][8].However, because the direct methods do not need to restore the hydrological time series, they have been widely used.
The direct methods can be divided into three methods: the mixed distribution method, time variant moment method [9][10][11] and conditional probability distribution method [12].
The mixed distribution (MD) method was employed by Singh and Sinclair for the first time [13].Although this method was widely used in non-stationary flood frequency analysis, parameter estimation is a substantial limit in this method.Alila and Mtiraoui found that the MD model provided a more satisfactory fitting than a traditional single distribution model in the Gila River Basin [14].Meanwhile, the authors noted that the key to ensuring the accuracy of the MD model lies in two aspects.One aspect is to analyze the formation mechanism of floods in detail and to rationally divide the series of hydrological extremes.In some cases, the sub-distributions were divided by the different causes of floods, such as seasonality [15][16][17], or the change point of the hydrological series [18].The other aspect is to keep the number of sub-distributions to a minimum, mainly because the increase in sub-distributions will increase the number of parameters and affect the accuracy of the model parameter estimation.Thus, the determination of the estimated parameter is key to the MD model.Various parameter estimation methods were used to address this problem, such as the maximum likelihood method [19], principle of maximum entropy (POME) [20], EM/ECM algorithm [21] and simulated annealing algorithm (SAA) [22].These examples illustrate that the application of an intelligent optimization algorithm is more and more widely used in parameter estimation and the accuracy of estimation is improved.
Recently, Yan et al. [22] considered the time variability of the parameters in the mixed distribution.The authors proposed the time-varying two-component mixed distributions (TTMD), which considers the time variant in both the weighting coefficients of MD and the parameters of individual component distributions.However, the conventional mixed distributions method often uses the continuous gauged flood sequence as the study sample, without considering historical extraordinary flood data.The historical extraordinary floods refer to the rare extraordinary floods that have occurred in history but were not observed by hydrological stations.The peak discharge of historical extraordinary floods can be attained through historical flood investigation generally.Schendal et al. [23] and Strupczewski et al. [24] showed that the historical extraordinary flood event has a large influence on the calculation accuracy of flood frequency analysis.Taking historical extraordinary flood events into consideration not only increases the information of the flood samples [25,26] but also effectively reduces the uncertainty of flood frequency analysis [27,28].This idea provides an important reference for the design, operation and management of water conservancy projects.However, due to the addition of historical extraordinary floods, the hydrological series has become a discontinuous series.Many scientists have exploited the employment of historical extraordinary flood data in flood frequency analysis for the last few decades [29,30].However, the study of considering both historical extraordinary flood events and non-stationarity of flood series is limited.Machado et al. [31] used the time-varying model based on Generalized Additive Models for Location, Scale and Shape (GAMLSS) modelling and incorporated the external covariates to analyze the flood frequency of a 400-year flood record from the Tagus River in Spain, which obtained a better fitting Zeng et al. [18] used the mixed distribution model to handle non-stationarity.The authors divided the series by the change point and added the historical extraordinary flood data into the sub-series before the change point but not in the post-sub-series.In other words, the authors did not consider the influence of historical extraordinary flood events for the sub-series after the change point.However, historical extraordinary flood events are likely to occur at any time, regardless of whether the environment is changing or not.
Therefore, the results of design floods will first be compared in this paper in a variety of cases, such as with or without historical extraordinary floods.The novelty of this paper is to propose an improved mixed distribution (IMD) method, which adds historical extraordinary flood events in both sub-series divided by a change point.As case studies, the proposed IMD method will be applied to the Ankang hydrological station in the upper stream of the Hanjiang River Basin, China, where many extraordinary floods have occurred in history.A genetic algorithm will be employed to estimate the optimal parameters.The change in the design flood under different return periods will be compared and analyzed, which will provide a new method for non-stationarity flood frequency analysis considering historical extraordinary flood events.

Variation Diagnosis System
The time series of hydrological variables can reflect the extent of the hydrological variables affected by climatic conditions and human activities.There are many methods to test the variation of the hydrological series but there is often a problem in that the results of various methods are not consistent.The hydrological variation diagnosis system, proposed by Xie et al. [32], can be divided into three procedures: primary diagnosis, detailed diagnosis and comprehensive diagnosis, which makes the results more objective and reasonable.A variety of corresponding methods can be used at each procedure of diagnosis.Finally, the results of comprehensive diagnosis can be obtained by combining the weight synthesis.
In this study, we use the Hurst exponent [33] as the method of primary diagnosis to test the degree of variation for each series.According to the calculated Hurst exponent value and the classification of variation degree shown in Table 1 [32], the variation degree of each series can be determined.For the detailed diagnosis, we propose the Spearman and Kendall rank correlation coefficient methods [34,35] to investigate the trends.The Lee-Heghinian method [36], Sequential clustering method [37], Pettitt test [38], Mann-Kendall test [39,40] and R/S analysis method [33] are used to identify the change points of the series.Combined with the hydrological survey investigation and detailed diagnosis results, a final comprehensive diagnosis can be conducted.

Mixed Distribution
The mixed distribution model was first proposed by Singh and Sinclair and applied to the non-stationary hydrological frequency analysis [13].This model can be defined as a probability distribution composed of multiple sub-series distributions; that is, its cumulative distribution function can be regarded as a linear distribution of cumulative distribution functions of several sub-series distributions.The expression is shown in Equation (1).
where w i is the weighting coefficient of each sub-series distribution and satisfies the equation of The number of sub-series distribution is denoted as k.F k (x) denotes the cumulative distribution functions of each sub-series distributions.
Certainly, the weighted average of cumulative distribution functions is not the only way to cope with the multiple components series.Strupczewski et al. [41] provided a seasonal approach to flood frequency analysis.Kochanek et al. [42] applied this approach to the annual peak flow series of Polish rivers, which are formed from summer and winter flows.The method showed that seasonal cumulative distribution functions can also provide good results.
The number of sub-series distributions of mixed distribution can often be determined by the hydrological variation diagnosis results or the physical mechanism of floods.In most cases, to reduce the complexity of mixed distribution, two-component mixture models were often used.For example, Waylen and Woo noted that the simple Gumbel distribution does not fit the different flood-generating processes well [15]; they divided the observed annual flood series data into two subsets (the snowmelt flood and the rainfall flood) and fitted it by using mixed distribution.Zeng et al. [18] divided the annual flood series into two components according to the change points based on the variation diagnosis results.
Thus, we also divide the flood series into two sub-series according to the variation diagnosis results.In China, Pearson type-III (P3) is recommended for flood frequency analysis according to the Regulation for Calculating Design Flood of Water Resources and Hydropower Projects [43].In this paper, we assume each sub-series distribution is subject to a P3 distribution, denoted as f 1 (x) and f 2 (x), respectively.The whole mixed distribution model is given by where w is the weighting coefficient of mixed distribution.α i , β i and a 0i (i = 1, 2) denote the shape, scale and location parameter of the probability density function f i (x) of each sub-series distribution, respectively.In flood frequency analysis, these three parameters can be expressed by mean EX i , variation coefficient C vi and skewness coefficient C si .The formulas are given as follows: . The initial value of the sample mean EX 1 and EX 2 estimated by the moment method can be considered as the unbiased estimate of the total.Thus, there are w, C v1 , C v2 , C s1 and C s2 up to five parameters to be estimated in mixed distribution f(x).

Parameter Estimation
Zhao et al. [44] proposed a curve fitting method for P3 with discontinuous series considering historical extraordinary flood data by using the genetic algorithm (GA), illustrating that the genetic algorithm has good global search ability and can reduce the error of fitting.To fit the empirical frequency points of historical extraordinary flood data, the theoretical frequency curve should pass through the center of the point group of the historical extraordinary flood data and the measured flood data.Hence, we fitted discontinuous series considering historical extraordinary flood data by weight and estimated the optimal parameters for a mixed distribution by using GA.The weight of the historical extraordinary flood data is denoted p, the weight of the measured flood data is denoted q and the formula is shown in Equation (3).Thus, the weighted least square (WLS) method was applied to construct the objective function with the weight coefficient, which is shown in Equation (4).
where a is the number of historical extraordinarily large floods, n is the number of the measured flood series and l is the number of extraordinary large floods from the measured flood series.
During the iterative process, we first estimate the mean value for the two sub-series by the moment method.Then, the genetic algorithm is employed to optimize the other five parameters for mixed distributions, that is, w, C v1 , C v2 , C s1 and C s2 , which can make the objective function attain the minimum and obtain the best fitted mixed distribution parameters.The main calculation procedures are as follows.
(1) Use real number coding to generate an initialization population with a population size Np of 100.The initial parameter variation range of five parameters for mixed distribution should be constrained.For example, the weight coefficient w of each sub-series distribution should be between 0~1, the variation range of C v should be 0~2 according to the information of the Ankang hydrological station and the variation range of C s /C v should be between 2~2.5.This approach effectively avoids the large deviation between the estimated values of C v and C s /C v as well as the recommended value.(2) Calculate the fitness of the initial population.The fitness value of the initial population can be calculated through the objective function shown in Equation ( 4).(3) Set the population gap GGAP = 0.7, the crossover probability P c = 0.6 and the maximum number of iterations N G = 150; the processes of multiple selection, crossover and mutation are carried out for the initial population.Each iteration is used to evaluate the fitness of the population to minimize the objective function value until the optimal parameter is obtained according to the maximum number of iterations.

Model Evaluation Criterion and Goodness-of-Fit Test
The Kolmogorov-Smirnov (K-S) goodness-of-fit test [45] and the AIC criterion [46] were used to test the fitting of each series.The K-S test statistic D is given by where F n (x) denotes the cumulative distribution function of random samples, that is, the empirical frequency of the series.F 0 (x) denotes the distribution form to be tested, that is, the theoretical frequency.n is the sample size and α is the significance level.If the value of statistic D is less than or equal to the critical value D n (α), then the original hypothesis is accepted and it is considered that the fitting is good according to the test.The AIC criterion is also used to evaluate the goodness of fit, which is given by where Pe i and P i denotes the empirical frequency and theoretical frequency of the series, respectively.m is the number of frequency distribution parameters.Taking P3 distribution as an example, there are EX, C v and C s up to three parameters in the P3 distribution; thus, m = 3.

Improved Mixed Distribution Model
We propose the IMD model based on the methods of MD.Because historical extraordinary floods may occur before and after the change in land surface, we take historical extraordinary floods into both sub-series, which are divided by the change point.Thus, both sub-series are discontinuous and are formed with the observed data and historical data.Compared with the MD model, the sub-series before the change point should use moment estimation of the discontinuous sample mean and the sub-series after the change point is necessary for using the same method to calculate the mean value.The change in the mean value is the key difference between the IMD model and MD.The estimation methods of the other parameters are the same.The formula of moment estimation of the discontinuous sample mean is given as follows.
where EX represents the mean of the discontinuous sample.N is the recurrence period of the historical extraordinary flood.a is the number of historical extraordinary floods.l represents the number of historical extraordinary floods in the observed series.

Monte Carlo Simulation
In this work, in order to test whether the length of data will affect the results of design flood values at high return periods, we verify our design flood results via Monte Carlo simulation [47].According to the moment method, the initial value of sample mean EX, variation coefficient C v and skewness coefficient C s for hydrological series can be obtained.Based on the initial parameters, we can use the Monte Carlo method to generate synthetic data with a length of 10,000, which obey the P3 distribution.According to the optimized parameters by GA, theoretical frequency series of observed data can be calculated.To investigate the variation of the synthetic data and the results of observed data, normalized mean bias (NMB) [48] and relative root mean square error (RRMSE) [49] statistical parameters are used for comparison.The formula is given as follows.
where n is the length of synthetic data, which is 10,000.x O represents the design flood values of the observed series estimated by GA. x MC represents the design flood values of synthetic data generated by the Monte Carlo method.Note that lower NMB and RRMSE represent a better performance.
In addition, in order to estimating the uncertainties, nonparametric bootstrap method is used to determine the confidence intervals for flood frequency curves.Nonparametric bootstrap method is resampling from the original data to obtain the bootstrapped sample of flood data.

Study Area
The Hanjiang River is one of the largest tributaries of the Yangtze River in China, with a catchment area of 159,000 km 2 and a length of 1570 km.The basin is bounded by 30 • 10 N to 34 • 20 N latitude and 106 • 15 E to 114 • 20 E longitude.Originating in Hanzhong city of Shaanxi province, the main stream flows southeast through Shaanxi and Hubei provinces and returns to the Yangtze River in Wuhan city.The area controlled by the Ankang hydrological station, with a catchment area of 38,700 km 2 , is the study area (shown in Figure 1).The annual average discharge is 621 m 3 /s at the Ankang hydrological station.The study area has a subtropical continental monsoon climate, which is mild and has four distinctive seasons.The average annual temperature is 15~17 • C and the annual average rainfall is 800~1000 mm.Floods are mainly caused by rainfall, occurring over 3~10 months but mostly in summer and autumn.Summer floods mainly occur in July, mostly consisting of a heavy intensity and short duration rainstorms.Autumn floods often appear in September, generally consisting of stable and persistent rainfall.Thus, the floods in autumn often have a long duration and large volume.The hydrographs of the 1974 typical flood and the 1983 largest flood are shown in Figure 2.
Several reservoirs have been built in this drainage area for the purposes of flood control, irrigation and electricity generation [50].The geographical distribution of these reservoirs at the upper reaches of the Ankang hydrological station is shown in Figure 1.The information for the reservoirs is shown in Table 2.
Water 2018, 10, x FOR PEER REVIEW 7 of 20 Several reservoirs have been built in this drainage area for the purposes of flood control, irrigation and electricity generation [50].The geographical distribution of these reservoirs at the upper reaches of the Ankang hydrological station is shown in Figure 1.The information for the reservoirs is shown in Table 2.

Data Set
In this study, the hourly observed flood data during the period of 1968-2013 at the Ankang hydrological station were available and the series during the annual maximum peak discharge series (AMPDS), annual maximum 24-h flood volume series (24-h AMFVS) and annual maximum 72-h flood volume series (72-h AMFVS) between 1968-2013 were selected.Among them, the flood in July of 1983 with a peak discharge of 31,000 m 3 /s is the largest flood that has been encountered since the establishment (1935) of the Ankang hydrological station.In addition, combined with the historical extraordinary floods investigation results by Yang [51,52], the historical extraordinary flood data of Several reservoirs have been built in this drainage area for the purposes of flood control, irrigation and electricity generation [50].The geographical distribution of these reservoirs at the upper reaches of the Ankang hydrological station is shown in Figure 1.The information for the reservoirs is shown in Table 2.

Data Set
In this study, the hourly observed flood data during the period of 1968-2013 at the Ankang hydrological station were available and the series during the annual maximum peak discharge series (AMPDS), annual maximum 24-h flood volume series (24-h AMFVS) and annual maximum 72-h flood volume series (72-h AMFVS) between 1968-2013 were selected.Among them, the flood in July of 1983 with a peak discharge of 31,000 m 3 /s is the largest flood that has been encountered since the establishment (1935) of the Ankang hydrological station.In addition, combined with the historical extraordinary floods investigation results by Yang [51,52], the historical extraordinary flood data of

Data Set
In this study, the hourly observed flood data during the period of 1968-2013 at the Ankang hydrological station were available and the series during the annual maximum peak discharge series (AMPDS), annual maximum 24-h flood volume series (24-h AMFVS) and annual maximum 72-h flood volume series (72-h AMFVS) between 1968-2013 were selected.Among them, the flood in July of 1983 with a peak discharge of 31,000 m 3 /s is the largest flood that has been encountered since the establishment (1935) of the Ankang hydrological station.In addition, combined with the historical extraordinary floods investigation results by Yang [51,52], the historical extraordinary flood data of 36,000 m 3 /s in 1583, 30,000 m 3 /s in 1867 and 26,000 m 3 /s in 1921 were selected.Due to the lack of historical extraordinary flood volume data, the correlation relationship (shown in Figure 3) between the flood peak and volume of the Ankang hydrological station was used to calculate the historical maximum 24-h and 72-h flood volume data corresponding to the historical maximum peak discharge.The three series of flood samples that consider the discontinuity of historical extraordinary floods are formed.
Water 2018, 10, x FOR PEER REVIEW 8 of 20 36,000 m 3 /s in 1583, 30,000 m 3 /s in 1867 and 26,000 m 3 /s in 1921 were selected.Due to the lack of historical extraordinary flood volume data, the correlation relationship (shown in Figure 3) between the flood peak and volume of the Ankang hydrological station was used to calculate the historical maximum 24-h and 72-h flood volume data corresponding to the historical maximum peak discharge.
The three series of flood samples that consider the discontinuity of historical extraordinary floods are formed.

Primary Diagnosis
The Hurst exponent h values of the flood characteristics series of the Ankang hydrological station for 1968~2013 were calculated and the variation degree was determined according to the classification of variation degree shown in Table 1.The results are shown in Table 3. AMPDS, 24-h AMFVS and 72-h AMFVS all exhibit medium variation, which requires further detailed diagnosis.First, we used the Spearman and Kendall rank correlation coefficient methods to investigate the trends in the three flood characteristics series and then adopted the Lee-Heghinian method, Sequential clustering method, Pettitt test, Mann-Kendall test and R/S analysis method to identify the change points in the series.With a 5% significance level, the critical values of the Spearman and Kendall rank correlation statistics were 2.015 and 1.96, respectively.If the absolute value of the statistics exceeds the critical value, it illustrates that the trend component is significant.The positive and negative values of the statistics show that the trends of the series are increasing or decreasing.From the results, as shown in Table 4, we can learn that the statistics of the two trend analysis methods were negative and the three flood characteristic series have a significant downward trend at the 0.05 significant level.Furthermore, the change points in the flood characteristic series occur between the end of the 1980s and the early 1990s.

Primary Diagnosis
The Hurst exponent h values of the flood characteristics series of the Ankang hydrological station for 1968~2013 were calculated and the variation degree was determined according to the classification of variation degree shown in Table 1.The results are shown in Table 3. AMPDS, 24-h AMFVS and 72-h AMFVS all exhibit medium variation, which requires further detailed diagnosis.First, we used the Spearman and Kendall rank correlation coefficient methods to investigate the trends in the three flood characteristics series and then adopted the Lee-Heghinian method, Sequential clustering method, Pettitt test, Mann-Kendall test and R/S analysis method to identify the change points in the series.With a 5% significance level, the critical values of the Spearman and Kendall rank correlation statistics were 2.015 and 1.96, respectively.If the absolute value of the statistics exceeds the critical value, it illustrates that the trend component is significant.The positive and negative values of the statistics show that the trends of the series are increasing or decreasing.From the results, as shown in Table 4, we can learn that the statistics of the two trend analysis methods were negative and the three flood characteristic series have a significant downward trend at the 0.05 significant level.Furthermore, the change points in the flood characteristic series occur between the end of the 1980s and the early 1990s.and noted that the decrease in runoff in the Hanjiang River Basin was mainly due to the significant change in the land surface conditions.In addition, since there are many dam-break floods that inundated the urban areas of Ankang City, water conservancy projects such as reservoirs and dams began to be built to control floods in the 1980s.Especially after the extraordinary floods that occurred in July of 1983, causing serious economic losses, a ten-mile-long dyke began thorough renovation and was completed in 1987.Therefore, the fact that the change points of the three flood characteristic series, AMPDS, 24-h AMFVS and 72-h AMFVS, are all in 1987 is reasonable.
We also used the Kendall rank correlation coefficient method to investigate the trend for the flood series before the change point (1968)(1969)(1970)(1971)(1972)(1973)(1974)(1975)(1976)(1977)(1978)(1979)(1980)(1981)(1982)(1983)(1984)(1985)(1986)) and after the change point .The results are shown in Table 5.From the results, we can see that both trend component of the flood series before and after the change point is not significant.In addition, the trend of flood series before the change point is increasing and the trend of flood series after the change point is decreasing.Thus, the results also verified that the change point is 1987.Due to the fact that the trends and change point of each flood characteristic series are all significant, the final variation form is determined by calculating the efficiency coefficient, which is given by (11) where Q obs,i (i = 1, 2, • • • , n) is the observed hydrological series.Q obs denotes the mean value of the observed hydrological series.For the trend component, Q sim.i (i = 1, 2, • • • , n) is the fitted value of each point on the trend line.τ i denotes the change point of the flood characteristic series and the formula of Q sim,i is expressed by As Table 6 shows, the efficiency coefficients of the change points of the flood characteristic series are all larger than those of the trend variation; thus, the final variation forms of all flood characteristic series are change points and the possible change point is 1987.

Results of Monte Carlo Simulation and Uncertainty Analysis
Taking the observed 1968-2013 flood characteristic series with four historical extraordinary flood events as an example, we compared the variation between the design flood value results simulated by Monte Carlo method and estimated by GA.Values of NMB and RRMSE statistical parameters are shown in Table 7. Figure 4 shows the variation between the design flood values of AMPDS.It illustrates that there is little deviation between the Monte Carlo simulated design flood values and the estimated design flood values by observed flood series.The Monte Carlo simulation of AMPDS exhibited the best results.The statistical parameters of 24-h AMFVS and 72-h AMFVS are slightly larger than AMPDS.It is mainly because the peak discharge of historical extraordinary floods is obtained through historical flood investigation.The historical flood volume data were obtained according to the peak volume relationship of flood, which adds some deviations.However, the statistical parameters are still within the acceptable range.This indicates that we can use 50-year-long datasets including historical extraordinary floods to estimate the design flood values at high return periods.To estimate the uncertainties for flood characteristic series, nonparametric bootstrap method was used to calculate the 95% confidence interval for flood frequency curve.We obtained a bootstrapped sample from the original flood data with the length of 2000.Figure 5 shows the 95% confidence interval for flood characteristic series.

Analysis of the Design Flood Results under Changing Environments
Combined with the results of the change point diagnosis, the flood characteristic series can be divided into two sub-series: the observed series before the change point and the observed series after the change point.Taking historical extraordinary flood data into consideration, we designed four cases fitting the P3 distribution to analyze the difference in the design flood under changing environmental conditions, which are as follows.

Analysis of the Design Flood Results under Changing Environments
Combined with the results of the change point diagnosis, the flood characteristic series can be divided into two sub-series: the observed series before the change point and the observed series after the change point.Taking historical extraordinary flood data into consideration, we designed four cases fitting the P3 distribution to analyze the difference in the design flood under changing environmental conditions, which are as follows.Where i = 1, 2, 3 represents the AMPDS, 24-h AMFVS and 72-h AMFVS, respectively.
The design floods at different return periods of each flood characteristic series are calculated separately for the four different cases.The genetic algorithm is also used to estimate the parameters of the P3 distribution for the flood series.Table 8 shows the results of the estimated parameters for all flood series.Table 9 illustrates that the results all pass the goodness-of-fit test and the fitting results are shown in Figure 6.According to the optimized parameters obtained, the design values of each flood characteristic series are calculated as shown in Table 10.
Water 2018, 10, x FOR PEER REVIEW 12 of 20 • Case 1: original flood characteristic series (1968-2013) that do not consider the variation but add the historical extraordinary flood data, denoted as series 1 i A .
• Case 3: observed flood characteristic series after the change point (1987-2013) with the addition of the historical extraordinary flood data, denoted as series 3 i A .
• Case 4: only the observed flood characteristic series after the change point (1987-2013), denoted as series Where 1, 2,3 i = represents the AMPDS, 24-h AMFVS and 72-h AMFVS, respectively.The design floods at different return periods of each flood characteristic series are calculated separately for the four different cases.The genetic algorithm is also used to estimate the parameters of the P3 distribution for the flood series.Table 8 shows the results of the estimated parameters for all flood series.Table 9 illustrates that the results all pass the goodness-of-fit test and the fitting results are shown in Figure 6.According to the optimized parameters obtained, the design values of each flood characteristic series are calculated as shown in Table 10.As shown in Table 10, we learn that the design flood results of A 11 , which considered historical extraordinary flood data but not non-stationary data, are larger than the 1999 design flood results of the Ankang hydrological station at 10,000-year and 5000-year return periods.However, the A 11 design flood results are smaller at other return periods and the difference is between −8.9% and 1.1%.Compared with the design AMPDS results of the 1999 design report, the design flood of A i1 , which does not consider the non-stationarity of the series, tends to be larger at a high return period and smaller at a low return period, which may be caused by the increase of the sample size.Mainly because of the decreasing trend of the mean value for the flood characteristic series, the addition of data increases the C V of the computed series, resulting in the increase in the upper tail of the frequency curve and the decrease in the lower tail.
In the same way, it can be clearly seen that the design values after the change point of the three flood characteristic series A i3 , considering the historical extraordinary flood data, are smaller than the design values of A i2 at 100-year, 20-year and 5-year return periods.For the other return periods, the design values are larger.The difference range of AMPDS is −31.5% to 19.0%, the difference of 24-h AMFVS ranges from −31.0% to 18.2% and the difference of 72-h AMFVS is 29.9% to 17.0%.It can be seen from Fig. 6 that the design flood value, without considering the variation of flood series, is almost between the design values before and after the variation series.
Comparing the design flood results between case 3 and case 4 shows that the A i4 design values of 24-h AMFVS and 72-h AMFVS are larger than the A i3 designed flood values, except the 5-year return level.The ranges of difference are −1.7%~12.2%and −1.7%~19.2%.For AMPDS, the A i4 design values are larger than the A i3 design values at 10,000-year and 5000-year return periods and are smaller for the rest of the return periods.The range of difference is −2.8%~0.1%.
The design flood of A i3 compared with A i1 results in the same trends, with an increase in the upper tail of the frequency curve and a decrease in the lower tail.Although the mean value decreases after the environment changes, the increase in C V for the flood characteristic series illustrated the change in land surface of the Ankang hydrology station, which reflects the necessity of considering non-stationarity.The results indicate that the current regulation of reservoirs in the upper stream of the Hanjiang River may not satisfy the requirements of flood control.Furthermore, the comparison between A i3 and A i4 exhibits the importance of adding historical extraordinary flood data.Taking historical extraordinary flood data into consideration revised the design flood value and improved the accuracy of flood frequency analysis.

Analysis of the Design Flood Results Based on IMD in Consideration of Historical Extraordinary Floods
The flood characteristic series from the Ankang hydrological station are investigated to illustrate the superiority of the improved mixed distribution (IMD) method proposed in Section 2.3 compared with conventional mixed distribution (MD) methods.According to Section 2.2.2, parameter estimations for the two mixed distribution methods by GA are given in Table 11.We also use P3 to calculate design floods for different return periods for each flood characteristic series.The goodness-of-fit values of these three methods are shown in Table 12.
The D values for AMPDS, 24-h AMFVS and 72-h AMFVS are all less than the critical value D n (α), which is equal to 0.194 at the 5% significance level.This result means that the P3, MD and IMD methods provide a satisfactory fit.The results show that P3 provides the best fit of the three methods.However, regarding the mechanism, the use of P3 distribution fitting is based on satisfying the stationarity hypothesis; in fact, the observed data of the Ankang hydrological station are consistent with the non-stationarity phenomenon.Thus, it is not reasonable to use the P3 distribution after environmental change.Meanwhile, it can also be seen that the AIC criterion values of the IMD method for the three flood characteristic series are all less than the corresponding criterion values of the MD method, which indicates that the IMD method is better.This result proved that our method improved the mechanism of mixed distribution and that our work is meaningful.The fitting results of the MD and IMD methods are given in Figure 7.As is shown in Table 13, for AMPDS, the differences in the design flood values between the best fitness MD methods and the 1999 design flood results are larger at high return periods and smaller As is shown in Table 13, for AMPDS, the differences in the design flood values between the best fitness MD methods and the 1999 design flood results are larger at high return periods and smaller at low return periods, with a difference of −11.88%~6.39%.The design flood value between the best fitness MD methods and the P3 results, without considering the non-stationarity data, has the same changing trends.The differences of AMPDS, 24-h AMFVS and 72-h AMFVS are −3.24%~5.21%,−6.67%~6.29%and −6.34%~7.19%,respectively.From Table 13, we can also see that the design value calculated by the IMD method of the flood series is slightly increased compared with the MD method.However, the IMD method may cause larger changes in the design flood values in other basins.Tang et al. [54] developed the historical extraordinary flood-concerned mixed-distribution method (HFCMM) to overcome the shortcomings of the traditional methods without considering historical extraordinary flood events.However, their study focused on comparing the modelling results of different probability distribution function tail types rather than the difference in the design flood values.
Zeng et al. [18] used MD to estimate the design flood values of the Xidayang reservoir, which is located in the Daqinghe River Basin, in the northern part of China.All the design values decreased by 0.03%-20.24%with different return periods compared with the P3 distribution.However, both design flood values estimated by MD or IMD in our study increased for high return periods but decreased for small return periods.The cause of this phenomenon may be that our study area and the study area of Zeng are located in different river basins and different parts of China.Different factors such as different changes in land use cause different mechanisms of runoff generation and convergence in different basins.Thus, the design flood results we obtained are also reasonable.Yan et al. [22] noted that the time-varying two-component mixture distribution (TTMD) models exhibited better fitting results than and outperformed the stationary models in both the Huanxian and Xianyang stations of the Weihe River Basin.We can also develop the time-varying improved mixed distribution to consider the time variations of the parameters in future work.

Conclusions
Taking the Ankang hydrological station in the upper reaches of the Hanjiang River Basin as the research area, the trend and change point of the flood characteristic series were studied by the hydrological variation diagnosis system.The difference in the design flood values under four conditions showed the large significance of the non-stationary flood frequency analysis and the historical extraordinary flood data.The proposed IMD method was based on a mixed distribution and the calculation principle of discontinuous samples.The genetic algorithm was employed to estimate the parameters.The main conclusions of this paper were drawn as follows.
(1) Hydrological series diagnosis was performed by using a variant diagnostic system.The trends of AMPDS, 24-h AMFVS and 72-h AMFVS at the Ankang hydrological station all decreased significantly at the 5% significance level.However, the final variant form was the change point, which illustrated that the change points of all flood characteristic series were in the year of 1987.This result was mainly related to the construction of the Ankang reservoir.(2) Based on the principle of MD, we proposed the methods of IMD, for which the genetic algorithm was applied, to estimate the parameters and the information of historical extraordinary floods was supplemented in the series after the change point.Meanwhile, the superiority of IMD was demonstrated by the consideration of both environment changes and historical extraordinary floods.Although the design flood of IMD was slightly larger than MD at the Ankang hydrological station, adding historical extraordinary flood data into both sub-series divided by the change point improved the theoretical mechanism of the mixed distribution.The new design flood based on IMD provides the basis for the regulation of reservoir floods in the upper reaches of the Hanjiang River.
Compared with other research of mixed distribution, the greatest advantage of this study is that the discontinuity and non-stationarity of flood samples are solved simultaneously.Taking historical extraordinary floods into sub-series both before and after the change point improved the physical mechanism of mixed distribution under a changing environment.

Figure 1 .
Figure 1.Map of the upper Hanjiang Basin above the Ankang hydrological station.

Figure 2 .
Figure 2. The hydrograph of the 1974 typical flood and the 1983 largest flood.(a) The hydrograph of the typical flood in 1974; and (b) The hydrograph of the largest flood in 1983.

Figure 1 .
Figure 1.Map of the upper Hanjiang Basin above the Ankang hydrological station.

Figure 1 .
Figure 1.Map of the upper Hanjiang Basin above the Ankang hydrological station.

Figure 2 .
Figure 2. The hydrograph of the 1974 typical flood and the 1983 largest flood.(a) The hydrograph of the typical flood in 1974; and (b) The hydrograph of the largest flood in 1983.

Figure 2 .
Figure 2. The hydrograph of the 1974 typical flood and the 1983 largest flood.(a) The hydrograph of the typical flood in 1974; and (b) The hydrograph of the largest flood in 1983.

Figure 3 .
Figure 3. Correlation relationship between the flood peak and volume at the Ankang hydrological station.

Figure 3 .
Figure 3. Correlation relationship between the flood peak and volume at the Ankang hydrological station.

Figure 4 .
Figure 4. Variation between the design flood value of annual maximum peak discharge series (AMPDS) simulated by Monte Carlo and estimated by genetic algorithm (GA).(a) Design flood values of AMPDS simulated by Monte Carlo and estimated by GA at different return periods; and (b) Quantile-Quantile plots between design flood values of AMPDS simulated by Monte Carlo and estimated by GA.

Figure 5 .
Figure 5. 95% confidence intervals computed using the bootstrap method for flood characteristic series.(a) 95% confidence intervals of the annual maximum peak discharge; (b) 95% confidence intervals of the annual maximum 24-h flood volume; and (c) 95% confidence intervals of the annual maximum 72-h flood volume.

Figure 4 .
Figure 4. Variation between the design flood value of annual maximum peak discharge series (AMPDS) simulated by Monte Carlo and estimated by genetic algorithm (GA).(a) Design flood values of AMPDS simulated by Monte Carlo and estimated by GA at different return periods; and (b) Quantile-Quantile plots between design flood values of AMPDS simulated by Monte Carlo and estimated by GA.

Figure 4 .
Figure 4. Variation between the design flood value of annual maximum peak discharge series (AMPDS) simulated by Monte Carlo and estimated by genetic algorithm (GA).(a) Design flood values of AMPDS simulated by Monte Carlo and estimated by GA at different return periods; and (b) Quantile-Quantile plots between design flood values of AMPDS simulated by Monte Carlo and estimated by GA.

Figure 5 .
Figure 5. 95% confidence intervals computed using the bootstrap method for flood characteristic series.(a) 95% confidence intervals of the annual maximum peak discharge; (b) 95% confidence intervals of the annual maximum 24-h flood volume; and (c) 95% confidence intervals of the annual maximum 72-h flood volume.

Figure 5 .
Figure 5. 95% confidence intervals computed using the bootstrap method for flood characteristic series.(a) 95% confidence intervals of the annual maximum peak discharge; (b) 95% confidence intervals of the annual maximum 24-h flood volume; and (c) 95% confidence intervals of the annual maximum 72-h flood volume.

4. 3 .
Analysis of the Design Flood Results under Changing EnvironmentsCombined with the results of the change point diagnosis, the flood characteristic series can be divided into two sub-series: the observed series before the change point and the observed series after the change point.Taking historical extraordinary flood data into consideration, we designed four cases fitting the P3 distribution to analyze the difference in the design flood under changing environmental conditions, which are as follows.•Case1: original flood characteristic series (1968-2013) that do not consider the variation but add the historical extraordinary flood data, denoted as series A i1 .•Case2: observed flood characteristic series before the change point (1968-1987) with the addition of the historical extraordinary flood data, denoted as series A i2 .• Case 3: observed flood characteristic series after the change point (1987-2013) with the addition of the historical extraordinary flood data, denoted as series A i3 .• Case 4: only the observed flood characteristic series after the change point (1987-2013), denoted as series A i4 .

Figure 6 .
Figure 6.Fitting results for each flood characteristic series under four conditions.(a) Fitting results of the annual maximum peak discharge; (b) Fitting results of the annual maximum 24-h flood volume; and (c) Fitting results of the annual maximum 72-h flood volume.

Figure 6 .
Figure 6.Fitting results for each flood characteristic series under four conditions.(a) Fitting results of the annual maximum peak discharge; (b) Fitting results of the annual maximum 24-h flood volume; and (c) Fitting results of the annual maximum 72-h flood volume.

Table 1 .
Classification of variation degree.
where α is significance level, r α denotes the correlation function C(t) corresponding to α significance level,

Table 2 .
Information for the reservoirs in the study area.

Table 2 .
Information for the reservoirs in the study area.

Table 2 .
Information for the reservoirs in the study area.

Table 3 .
Calculation results of the Hurst exponent for each flood characteristic series.

Table 3 .
Calculation results of the Hurst exponent for each flood characteristic series.

Table 4
illustrates that the possible change point of AMPDS and 24-h AMFVS are in 1987 but the possible change point of 72-h AMFVS appears in 1985 and 1987, which is in accordance with the results of Xiong et al. [53].Ankang reservoir construction was started in 1978.It began to store water in 1989 and was finished in 1992.These authors considered that the change point was closely linked with the construction of the Ankang reservoir.Zhang et al. [2] compared the catchment runoff change during 2001-2010 with the previous 40 years

Table 5 .
Kendall rank correlation statistic values for the flood series before and after the change point.

Characteristics Series Kendall Rank Correlation Statistic U 1968-1986 1987-2013
represents the Kendall rank correlation statistic.With a 5% significance level, the critical value of Kendall rank correlation statistics was 1.96.If the absolute value of the statistic exceeds the critical value, it illustrates that the trend component is significant.The positive and negative values of the statistics show that the trends of the series are increasing or decreasing. U

Table 6 .
Calculation results of the efficiency coefficients.

Table 7 .
Values of statistical parameters for all flood characteristic series.

Table 8 .
Results of the estimated parameters for all flood series.

Parameters AMPDS (m 3 /s) 24-h AMFVS (10 8 m 3 ) 72-h AMFVS (10 8 m 3 ) 11A 12 A 13 A 14 A 21 A 22 A 23 A 24 A 31 A 32 A 33 A 34
EX denotes the mean value of the series.Cv denotes the variation coefficient.Cs/Cv denotes the ratio of skewness coefficient Cs and the variation coefficient Cv.

Table 8 .
Results of the estimated parameters for all flood series.

3 /s) 24-h AMFVS (10 8 m 3 ) 72-h AMFVS (10 8 m 3 )
EX denotes the mean value of the series.Cv denotes the variation coefficient.Cs/Cv denotes the ratio of skewness coefficient Cs and the variation coefficient Cv.

Table 9 .
Goodness-of-fit test results for all flood series.D denotes the K-S test statistic.D n (α) is the critical value of K-S test statistic D and D less than D n (α) means the distribution passes the goodness-of-fit test at the 5% significant level.

Table 10 .
Design value of each flood characteristic series.Difference (1) represents the difference percentage in A i1 compared with 1999 design report results, difference (2) represents the difference percentage in A i3 compared with A i2 and difference (3) represents the difference percentage in A i4 compared with A i3 .

Table 11 .
Parameter estimation results for the mixed distribution (MD) and improved mixed distribution (IMD) methods.

Flood Characteristic Series Method α EX 1 Cv 1 Cs 1 EX 2 Cv 2 Cs 2
EX 1 and EX 2 represent the mean values of the two sub-series divided by the change point; Cv 1 and Cv 2 represent the variation coefficient of the two sub-series; and Cs 1 and Cs 2 represent the skewness coefficient of the two sub-series.

Table 12 .
Goodness-of-fit test results for the MD and IMD methods.

Table 12 .
Goodness-of-fit test results for the MD and IMD methods.
D denotes the K-S test statistic.AIC denotes the evaluation value of the AIC criterion.

Table 13 .
Design flood results of each flood characteristic series with the MD and IMD methods.represents the best fitness MD methods.Difference (1) represents the difference percentage in the best fitness MD methods compared with 1999 design report results, difference (2) represents the difference percentage in the best fitness MD methods compared with P3 and difference (3) represents the difference percentage in IMD compared with MD. *