Spatial and Temporal Patterns in Nonstationary Flood Frequency across a Forest Watershed : Linkage with Rainfall and Land Use Types

Understanding the response of flood frequency to impact factors could help water resource managers make better decisions. This study applied an integrated approach of a hydrological model and partial least squares (PLS) regression to quantify the influences of rainfall and forest landscape on flood frequency dynamics in the Upper Honganjian watershed (981 km2) in China, the flood events of flood seasons in return periods from two to 100 years, wet seasons in return periods from two to 20 years, and dry seasons in return periods from two to five years show similar dynamics. Our study suggests that rainfall and the forest landscape are pivotal factors triggering flood event alterations in lower return periods, that flood event dynamics in higher return periods are attributed to hydrological regulations of water infrastructures, and that the influence of rainfall on flood events is much greater than that of land use in the dry season. This effective and simple approach could be applied to a variety of other watersheds for which a digital spatial database is available, hydrological data are lacking, and the hydroclimate context is variable.


Introduction
Flood frequency is the probability of a flood event in a certain period, which is relevant to planning and decision processes related to hydraulic works or flood alleviation programs [1].Thorough knowledge of flood frequency dynamics is crucial in a watershed [2][3][4].Variations in flood frequencies result from meteorological factors and underlying surface properties, including rainfall, flood control facilities, topography, soil and land use types [3,5].Among these factors, flood control facilities represent passive defense structures against floods; topography and soil are relatively constant in short periods, whereas rainfall and land use are variable [3,[5][6][7][8].Therefore, determining the response of flood frequencies to rainfall and land use is crucial for water resource management in watersheds [5,9].However, major challenges are associated with flood frequency response research, including failed estimation of flood frequencies, lack of data, and the high collinearity of rainfall and land use types [8,10].To quantify the factors controlling flood frequency, estimation for flood frequencies is a prerequisite.Frequency analysis (FA) is a method used to fit the frequency of extreme hydrological events to their magnitudes by using probability distributions [6,10,11].FA has played an important role in increasing the prediction accuracy of flood frequencies [2,10,11].Other methods, such as empirical relationships and the fuzzy logic approach, have also been applied to a few watersheds [10].Many FA methods have been developed and tested, including the index flood method, the rational method, and various regression-based methods [2], the disadvantages of conventional FA could be the significant uncertainties and bias, which are due to limited historical recorded data with sufficient spatial and temporal coverage and acceptable quality, sampling variability, model errors, and the errors in projections into the future [12,13].In addition, this traditional technology is based on the assumption that the hydrological observations are independently and identically distributed and the conditions remain stationary [11,14].Lastly, FA focuses on flood peak values; however, the severity of a flood is defined not only by the flood peak value but also by flood volume, duration, etc. [15].In practical applications, the flood series are always not independent and exist in a nonstationary context, and the watershed always contains a large number of ungauged areas [16].Overall, conventional FA provides a limited assessment of flood frequencies [14].Using conventional FA on nonstationary flow series may lead to uncertain flood frequency predictions [11,17].
To overcome these limitations, a nonstationary FA framework combined with lognormal or generalized extreme value distribution models and hydrological models has been developed [5,18].Nonstationary FA coupled probability distribution has been considered an effective improvement in flood frequency analysis, with observations that are not independent under nonstationary circumstances [2,11,14].To address these challenges of data scarcity, hydrological models have the ability to complement available datasets from local gauging stations with a spatial extension.This modeling approach provides some advantages: (1) planned alterations in rainfall and land use can be considered, and hydrological modeling allows one to obtain the full hydrograph for the design, and (2) the estimation of design flows can be performed for ungauged watersheds if the parameters of the hydrological model are regionalized [4].Recently, many studies have reported the use of hydrological models, such as the HECHMS, WRF/DHSVM, and SWAT models, to simulate floods [4,17,19], the SWAT model can divide a watershed into sub-watersheds and then discretize them into a series of hydrologic response units (HRUs), which are spatially identified as unique soil-land use combination areas.SWAT can also provide a wide range of flexibility for model formulation and calibration [20][21][22].A nonstationary FA framework incorporated with SWAT is a relatively new modeling approach and has been shown to provide acceptable prediction results when the hydroclimate context is variable, data are lacking, or the spatiotemporal analysis is complex [20,21,23].
Multivariate regression approaches have great potential for quantifying the relative importance of rainfall and land use types in controlling flood frequencies.However, traditional multivariate approaches cannot easily overcome the limitations of rainfall and land use types, which are highly collinear predictors [8,22,24,25].Therefore, non-independent data must be handled cautiously in quantitative analyses.Partial least squares (PLS) regression is an advanced method that combines the features of principal component analysis and multiple linear regressions.It has been widely used to overcome the issue of multicollinearity and noisy data in quantitative analyses by projecting variables on high-dimensional spaces [26], the importance of a predictor to variations in model fitting is given by the variable influence on projection (VIP) value.VIP values reflect the importance of terms in a model with respect to both Y, i.e., a variable's correlation to all responses, and X, i.e., the projection [27].Variations with higher VIP values are considered more important [28].PLS regression can be used to evaluate rainfall, forest, and other land use influences on flood events [8,22,26].
The objective of this paper is to study the influences of forest land use and precipitation on flood frequencies in the Upper Honganjian watershed in China.This investigation is separated into two parts: (1) revealing flood frequency destruction based on a nonstationary FA method and SWAT model,

Study Area
The Upper Honganjian watershed, with a total area of 981 km 2 , is located in the Yellow River Basin and lies between 36 • 2' N to 36 • 34' N and 110 • 50' E to 112 • 10' E, the average annual temperature is approximately 11.8 • C, and the average annual precipitation is approximately 558.5 mm.A large portion of precipitation occurs during the monsoon season from May to October.Floods occur primarily in July, August, and September [29], the topography of the watershed is undulating and characterized by mountain ranges, steep slopes, and deep valleys, the elevation varies from 572 m at the Dongzhuang gauging station to 2259 m at the highest point in the watershed (Figure 1), the main soil types are yellow loamy soil (50.3%) and brown soil (18.8%), which correspond to Alfisols and Entisols in the USA Soil Taxonomy [30], respectively.Most areas are covered by forest (43.4%) and farmland (34.3%).
The objective of this paper is to study the influences of forest land use and precipitation on flood frequencies in the Upper Honganjian watershed in China.This investigation is separated into two parts: (1) revealing flood frequency destruction based on a nonstationary FA method and SWAT model, and (2) illuminating the response of flood frequency distribution to land use and precipitation based on PLS regression.

Study Area
The Upper Honganjian watershed, with a total area of 981 km 2 , is located in the Yellow River Basin and lies between 36°2′ N to 36°34′ N and 110°50′ E to 112°10′ E. The average annual temperature is approximately 11.8 °C, and the average annual precipitation is approximately 558.5 mm.A large portion of precipitation occurs during the monsoon season from May to October.Floods occur primarily in July, August, and September [29].The topography of the watershed is undulating and characterized by mountain ranges, steep slopes, and deep valleys.The elevation varies from 572 m at the Dongzhuang gauging station to 2259 m at the highest point in the watershed (Figure 1).The main soil types are yellow loamy soil (50.3%) and brown soil (18.8%), which correspond to Alfisols and Entisols in the USA Soil Taxonomy [30], respectively.Most areas are covered by forest (43.4%) and farmland (34.3%).

Data Collection and Pretreatment
Hydrometeorological data: Forty-six years (1965-2010) of daily streamflow data were collected at the Dongzhuang station (the outlet of the Upper Honganjian watershed).Daily precipitation, solar radiation, wind speed, relative humidity, and max/min temperatures were attained from six weather stations (Figure 1).To account for seasonal variations, the streamflow data series were split into a wet season (i.e., May, June, and October), a flood season (i.e., July-September), and a dry

Data Collection and Pretreatment
Hydrometeorological data: Forty-six years (1965-2010) of daily streamflow data were collected at the Dongzhuang station (the outlet of the Upper Honganjian watershed).Daily precipitation, solar radiation, wind speed, relative humidity, and max/min temperatures were attained from six weather stations (Figure 1).To account for seasonal variations, the streamflow data series were split into a wet season (i.e., May, June, and October), a flood season (i.e., July-September), and a dry season (i.e., November-April), the precipitation data were interpolated over the delineated sub-watersheds using a skewed normal distribution.
Topographical and soil coverage data for the model setup: The topographical data required by the SWAT model were derived from a digital elevation model (DEM) with a resolution of 25 × 25 m, which was obtained from the National Geomatics Center of China, the soil data, including a soil type map (1:100,000) and information on the related soil properties, were obtained from the Hydrological Bureau of Shanxi Province.
Land use data: The land use data for the 1980s were obtained from the Hydrological Bureau of Shanxi Province.Four land use domains were identified, namely, forest, farmland, urban, and grassland (Figure 2).Land use and soil data were extracted using ArcGIS Version 10.2.(Esri, Redlands, CA, USA).
Forests 2018, 9, x FOR PEER REVIEW 4 of 20 season (i.e., November-April).The precipitation data were interpolated over the delineated sub-watersheds using a skewed normal distribution.Topographical and soil coverage data for the model setup: The topographical data required by the SWAT model were derived from a digital elevation model (DEM) with a resolution of 25 × 25 m, which was obtained from the National Geomatics Center of China.The soil data, including a soil type map (1:100,000) and information on the related soil properties, were obtained from the Hydrological Bureau of Shanxi Province.
Land use data: The land use data for the 1980s were obtained from the Hydrological Bureau of Shanxi Province.Four land use domains were identified, namely, forest, farmland, urban, and grassland (Figure 2).Land use and soil data were extracted using ArcGIS Version 10.2.(Esri, Redlands, CA, USA).

SWAT Model Calibration for Ungauged Sub-Watershed Streamflow
Before using the SWAT model to predict the streamflow in ungauged sub-watersheds, calibration and validation of the model were performed.Calibration was performed with automated and manual techniques.The first step in the calibration process was determination of the most sensitive parameters for studying the watershed.Sensitivity analysis of the parameters in the SWAT model was performed using the LH-OAT analysis method, which combines the Latin hypercube (LH) sampling method and the one-factor-at-a-time (OAT) sensitivity analysis method.After 350 runs, the most sensitive parameters were detected.Autocalibration was the second step.This procedure was based on shuffled complex evolution (SCE-UA), which allows calibration of model parameters based on a single objective function [31].In the last step, the SWAT model was manually calibrated against monthly and daily streamflow data, which were observed at the Dongzhuang gauge station.The calibration period was from January 1972 to December 1981.Manual calibration was performed to minimize total flow (minimized average annual percent bias), accompanied by visual inspection of the hydrographs.The parameters governing the surface runoff response were first calibrated, followed by those governing the fraction of streamflow that transform to baseflow.This preliminary calibration was followed by a fine-tuning at the daily time scale to ensure that the predicted versus measured peak flows and recession curves on a daily time step matched as closely as possible.
The validation period was from January 1982 to December 1991.Nash-Sutcliffe efficiency (ENS), percent bias (PBIAS), and coefficient of determination (R 2 ) were used to evaluate the performance of the model.ENS was calculated as follows [8,32]:

SWAT Model Calibration for Ungauged Sub-Watershed Streamflow
Before using the SWAT model to predict the streamflow in ungauged sub-watersheds, calibration and validation of the model were performed.Calibration was performed with automated and manual techniques, the first step in the calibration process was determination of the most sensitive parameters for studying the watershed.Sensitivity analysis of the parameters in the SWAT model was performed using the LH-OAT analysis method, which combines the Latin hypercube (LH) sampling method and the one-factor-at-a-time (OAT) sensitivity analysis method.After 350 runs, the most sensitive parameters were detected.Autocalibration was the second step.This procedure was based on shuffled complex evolution (SCE-UA), which allows calibration of model parameters based on a single objective function [31].In the last step, the SWAT model was manually calibrated against monthly and daily streamflow data, which were observed at the Dongzhuang gauge station, the calibration period was from January 1972 to December 1981.Manual calibration was performed to minimize total flow (minimized average annual percent bias), accompanied by visual inspection of the hydrographs, the parameters governing the surface runoff response were first calibrated, followed by those governing the fraction of streamflow that transform to baseflow.This preliminary calibration was followed by a fine-tuning at the daily time scale to ensure that the predicted versus measured peak flows and recession curves on a daily time step matched as closely as possible.The validation period was from January 1982 to December 1991.Nash-Sutcliffe efficiency (E NS ), percent bias (PBIAS), and coefficient of determination (R 2 ) were used to evaluate the performance of the model.E NS was calculated as follows [8,32]: where n is the discrete time step and O i and S i are the measured and simulated values, respectively.PBIAS is defined as follows [8,32]: where O i and S i are the measured and simulated values, respectively, and n is the total number of paired values.R 2 was calculated as follows [8,32]: where n is the number of events, O i and S i are the measured and simulated streamflow values, respectively, and O and S are the mean observed and simulated values, respectively, the performance of the SWAT model ( 1) is considered acceptable when R 2 and E NS are greater than 0.5 and PBIAS ranges from ±15% to ±25%; (2) is good when R 2 is greater than 0.5, E NS is greater than 0.65, and PBIAS ranges from ±10% to ±15%; and (3) is very good when R 2 is greater than 0.5, E NS is greater than 0.75, and PBIAS is smaller than ±10% [8,32].

Evaluation of the Quantiles of Maximum Streamflow
Based on previous studies, the annual maximum (AM) series model was used in the multistep nonstationary FA in this study [2,18], the AM model is a framework that uses annual maximum values as appropriate estimators with a preferred distribution [10].First, the AM series model was applied to identify the annual seasonal extreme streamflow.To identify the low and high outliers for the flow series, the Grubbs and Beck (1972) statistical test (see Appendix A.1 for more technical details) was used after the data were transformed to be normally distributed [33].Second, to perform FA, several statistical tests were used, including the Wald-Wolfowitz test for randomness or autocorrelation and the Mann-Kendall test for stationarity (see Appendix A.2 and A.3 for more technical details), the Wald-Wolfowitz test and Mann-Kendall test were performed by SPSS 20 (IBM SPSS Inc., Chicago, IL, USA) and MATLAB 8.4 (The MathWorks Inc., Natick, MA, USA), respectively.Third, the appropriate probability distribution for the sub-watershed streamflow frequencies was identified.In this study, for the nonstationary modeling, a generalized extreme value (GEV) distribution model and a lognormal (LN2) distribution model (see Appendix A.4 and A.5 for more technical details) were chosen because the principles of the models are incorporated in the regional frequency analysis, although some of the parameters are allowed to change with time [18,34], the parameters were estimated using the maximum likelihood (ML) estimation method in MATLAB 8.4.To identify an appropriate probability distribution for fitting the observed hydrological data, goodness-of-fit tests were used in this study (similar methods were used by [10]), the Akaike information criterion (AIC), based on the principle of maximum entropy, and the Bayesian information criterion (BIC), proposed for use in the Bayesian framework, were used to assess the performance of the two models [35][36][37][38], the equations of the AIC and BIC are given as follows: Forests 2018, 9, 339 7 of 20 where L is the likelihood function, k is the number of parameters of the distribution, and N is the sample size.Finally, the quantiles of maximum streamflow and return periods were obtained and evaluated.Details on estimating the quantiles for the used distribution and return periods can be found in [18].

Determination of Flood Events and Return Periods
The flood events of the Upper Honganjian watershed were derived from the daily streamflow data collected at the Dongzhuang gauge station, the number of extreme streamflow events was calculated by counting the number of days in a year or season for which daily values exceed the quantiles of maximum streamflow.A flood event was identified when extreme streamflow events were continuous for six or more days in high streamflow periods [39], the notion of return period for extreme hydrological events is commonly used in hydrological nonstationary FA, the return period T is an event magnitude having a probability 1/T of being exceeded during any single event [10].

SWAT Model Calibration and Validation
The calibrated SWAT parameters are listed in Table 1, the E NS , R 2 , and PBIAS values for the monthly and daily streamflow calibration and validation are listed in Table 2.All of the E NS and R 2 values for monthly streamflow were greater than 0.8, and the PBIAS values were in the range of ±10%, the statistical comparison between the measured daily streamflow and the simulation results showed good agreement, and the parameters that were calibrated for the daily streamflow of the model could be used to simulate every sub-watershed.

Annual Extreme Streamflow and Appropriate Distribution
Figure 4 displays a box plot for the annual seasonal extreme streamflow data for each decade, the total annual number of days with extreme streamflow for the entire watershed for the 1970s, 1980s, 1990s, and 2000s were 306, 321, 291, and 196, respectively, the peak in the 1980s was well defined due to the frequency of the hydrometeorological days, which represented approximately 28.8% of all the data, while the low value in the 2000s represented approximately 17.6% of all the data.
Forests 2018, 9, x FOR PEER REVIEW 8 of 20  The Wald-Wolfowitz test results (Z = −1.054,p-value = 0.387) of the streamflow at the Dongzhuang station showed that the data were independent at p < 0.05.The Mann-Kendall test results (K = −2.256,p-value = 0.021) showed that the yearly maximum streamflow increased significantly after 1999 at p < 0.05.The corresponding parameters were estimated by the ML methods.The results of the goodness-of-fit tests for selecting an appropriate probability distribution for the sub-watershed streamflow frequency analysis using the AIC and BIC criteria are summarized in Table 3.Out of 66 cases (two model selection criteria × 33 sub-watersheds), the LN2 distribution model was preferred for 53 cases (i.e., 80.3%), whereas the GEV distribution model was favored in the remaining 13 cases (i.e., 19.7%).Both model selection criteria favored the LN2 distribution over the GEV distribution, as shown in Table 3.These results demonstrated that the LN2 distribution was preferable to the GEV distribution for modeling the partial time series data for the selected watersheds.The empirical probability curves as well as the confidence intervals of the LN2 distribution for the observed streamflow at the watershed scale are presented in Figure 5.The Wald-Wolfowitz test results (Z = −1.054,p-value = 0.387) of the streamflow at the Dongzhuang station showed that the data were independent at p < 0.05, the Mann-Kendall test results (K = −2.256,p-value = 0.021) showed that the yearly maximum streamflow increased significantly after 1999 at p < 0.05, the corresponding parameters were estimated by the ML methods, the results of the goodness-of-fit tests for selecting an appropriate probability distribution for the sub-watershed streamflow frequency analysis using the AIC and BIC criteria are summarized in Table 3.Out of 66 cases (two model selection criteria × 33 sub-watersheds), the LN2 distribution model was preferred for 53 cases (i.e., 80.3%), whereas the GEV distribution model was favored in the remaining 13 cases (i.e., 19.7%).Both model selection criteria favored the LN2 distribution over the GEV distribution, as shown in Table 3.These results demonstrated that the LN2 distribution was preferable to the GEV distribution for modeling the partial time series data for the selected watersheds, the empirical probability curves as well as the confidence intervals of the LN2 distribution for the observed streamflow at the watershed scale are presented in Figure 5.

Nonstationary Regional Frequency Analysis
By fitting the LN2 distribution with the ML parameter estimation method to the sub-watershed streamflow data, the maximum streamflow quantiles at each sub-watershed were estimated for average recurrence intervals (ARIs) of two, five, 10, 20, 100, and 1000 years to analyze the flood events in each sub-watershed.
To analyze the seasonal variation in flood events at the watershed scale during different decades, the annual average number of flood events in different decades (1970s, 1980s, 1990s, and 2000s) were computed under different return levels (2-5, 5-20, 20-100, 100-1000, and >1000 years).The results are shown in Figure 6.The number of flood events during flood seasons for return periods covering two to 100 years, wet seasons for return periods covering two to 20 years, and dry seasons for the return periods of two to five years shows similar fluctuations.Similar variations can also be found for return periods that exceed 100, 20, and five years in flood seasons, wet seasons, and dry seasons, respectively.Compared with the lower and higher return periods, a significantly different flood event distribution is identified.As the seasons change (i.e., from the flood to dry season), the total number of flood events in each return periods change and the maximum and minimum numbers diminish and shift to shorter return periods.
For the return periods from two to five years, the flood events were most frequent during the 1980s in all seasons, and the flood events were least frequent in the 1990s for each season.For the return periods of five to 20 years, the maximum values occurred in the 1980s, whereas the minimum values occurred in the 1990s for both the flood and wet seasons.For the return periods of 20 to 100 years, the maximum values occurred in the 1980s for both the flood and dry seasons and in the 1970s for the wet season, and the minimum values were recorded in the 1990s for the flood season.The lowest values occurred in the 1980s, 1990s, and 2000s for the wet season, and the total number for the three decades was four.For the return periods of 100-1000 and >1000 years, the maximum values occurred in the 1970s for both the flood and wet seasons, whereas the minimum values were

Nonstationary Regional Frequency Analysis
By fitting the LN2 distribution with the ML parameter estimation method to the sub-watershed streamflow data, the maximum streamflow quantiles at each sub-watershed were estimated for average recurrence intervals (ARIs) of two, five, 10, 20, 100, and 1000 years to analyze the flood events in each sub-watershed.
To analyze the seasonal variation in flood events at the watershed scale during different decades, the annual average number of flood events in different decades (1970s, 1980s, 1990s, and 2000s) were computed under different return levels (2-5, 5-20, 20-100, 100-1000, and >1000 years), the results are shown in Figure 6, the number of flood events during flood seasons for return periods covering two to 100 years, wet seasons for return periods covering two to 20 years, and dry seasons for the return periods of two to five years shows similar fluctuations.Similar variations can also be found for return periods that exceed 100, 20, and five years in flood seasons, wet seasons, and dry seasons, respectively.Compared with the lower and higher return periods, a significantly different flood event distribution is identified.As the seasons change (i.e., from the flood to dry season), the total number of flood events in each return periods change and the maximum and minimum numbers diminish and shift to shorter return periods.
For the return periods from two to five years, the flood events were most frequent during the 1980s in all seasons, and the flood events were least frequent in the 1990s for each season.For the return periods of five to 20 years, the maximum values occurred in the 1980s, whereas the minimum values occurred in the 1990s for both the flood and wet seasons.For the return periods of 20 to 100 years, the maximum values occurred in the 1980s for both the flood and dry seasons and in the 1970s for the wet season, and the minimum values were recorded in the 1990s for the flood season, the lowest values occurred in the 1980s, 1990s, and 2000s for the wet season, and the total number for the three decades was four.For the return periods of 100-1000 and >1000 years, the maximum values occurred Forests 2018, 9, 339 10 of 20 in the 1970s for both the flood and wet seasons, whereas the minimum values were recorded in the 1990s for the flood season.Compared with the baseline return periods of two to five years, in the flood season, the mean annual number of flood events for the four decades increased for the return periods of five to 20 years and then showed a decreasing gradient for larger return periods.Similar variations occurred in each decade.In the wet season, compared with the baseline, the mean annual number of flood events over the four decades indicated a decreasing trend for return periods below 1000 years.Each decade exhibited similar dynamics.In the dry season, a strong gradient existed in the total number of events over the four decades.
Among the four decades, in the flood season, the highest number of events was found for the return periods of five to 20 years, and the lowest number was found for the return periods of >1000 years, the highest numbers were found for the return periods of two to five years in both the wet and dry seasons, while the lowest numbers were found for the return periods of 100 to 1000 years and the return periods of 20 to 100 years in the wet and dry seasons, respectively.Among the three seasons, the flood events in the flood season were more severe than in the other two seasons.
Forests 2018, 9, x FOR PEER REVIEW 10 of 20 recorded in the 1990s for the flood season.Compared with the baseline return periods of two to five years, in the flood season, the mean annual number of flood events for the four decades increased for the return periods of five to 20 years and then showed a decreasing gradient for larger return periods.Similar variations occurred in each decade.In the wet season, compared with the baseline, the mean annual number of flood events over the four decades indicated a decreasing trend for return periods below 1000 years.Each decade exhibited similar dynamics.In the dry season, a strong gradient existed in the total number of events over the four decades.Among the four decades, in the flood season, the highest number of events was found for the return periods of five to 20 years, and the lowest number was found for the return periods of >1000 years.The highest numbers were found for the return periods of two to five years in both the wet and dry seasons, while the lowest numbers were found for the return periods of 100 to 1000 years and the return periods of 20 to 100 years in the wet and dry seasons, respectively.Among the three seasons, the flood events in the flood season were more severe than in the other two seasons.

Contribution of Rainfall and Land Use to Flood Events
The performance of the PLS regression models is shown in Table 4.For the flood season, in the return periods of two to five years, the first component was dominated by forest on the negative side and explained 74.3% of the variation in flood events, the addition of the second component was dominated by rainfall, urban land, and farmland on the positive side and made the explanation approached 81.2% of the variation and generated the minimum mean square error of cross-validation (RMSECV) value, the addition of more components to the PLS regression models did not substantially improve the explanatory power but resulted in a higher RMSECV, indicating that the subsequent components were not strongly correlated with the residuals of the predicted variable according to [26].For return periods of five to 20 years model, the first component was dominated by forest on the negative side, which explained 71.8% of the flood events variance in the dataset, the addition of the second component, dominated by rainfall, farmland, and urban land on the positive side, increased the model-explained variance to 78.0%.For the return periods of 20 to 100 and 100-1000 years, the first component was dominated by forest on the negative side, and the second component was dominated by urban land on the positive side.For the wet season, in return periods of two to five and five to 20 years, the first component was dominated by forest on the negative side, and the second component was dominated by rainfall, urban land, and farmland on the positive side.For the dry season, in the return periods of two to five years, the first component was dominated by forest on the negative side and farmland on the positive side, and the second component was dominated by rainfall and urban land on the positive side.In return periods > 5 years, the rainfall became to the dominate factor.
The relationship between the number of seasonal flood events (in the 1980s) and the proportion of land use types (1980s) was analyzed at the sub-watershed scale.Figure 7 shows the relative contributions of impact factors under different return periods, the influence of land use types on flood events decreased with increasing return periods.Farmland and urban areas had a positive effect on the number of events, while forest land had a negative effect.No statistically significant relationships were detected between grassland and flood events.For the flood and wet season, in the return periods of 2-5 and 5-20 years, the forest area had a significant negative effect on the number of flood events, while significant positive correlations were detected between both farmland and urban areas and flood events.VIP values of forest, farmland, and urban land (VIP > 1) were higher than those of grassland and rainfall.For the flood season, in the return periods of 20 to 100 and 100-1000 years, forest and urban land had VIP scores greater than 1, and rainfall, farmland, and grassland had VIP value less than 1, as shown in Figure 7.In the wet seasons, no statistically significant relationships between land use types and flood events were detected when the return period exceeded 100 years.For the dry season, in the return periods of two to five years, rainfall had the highest VIP score (1.337) and a larger positive regression (0.679), followed by the farmland (coefficient = 0.723; VIP = 1.301), forest (coefficient = −0.685;VIP = 1.246), and urban land (coefficient = 0.324; VIP = 1.129).These results show a significant negative correlation between the forest area and the number of flood events.Rainfall, farmland, and urban area had remarkable positive effects on the number of flood events, the VIP value (greater than 1) of rainfall was highest in return periods > 5 years in the dry season, followed by farmland, forest, urban land, and grassland with VIP values less than 1 (ranging from 0.428 to 0.087).

Discussion
The ability of the SWAT model to simulate the daily streamflow data in the frequency analysis has been demonstrated by many papers [21,23,40].A high goodness-of-fit value for the monthly and daily streamflow suggests good model performance in our study.Table 2 shows good modeling results for the monthly streamflow, with a maximum Nash-Sutcliffe efficiency (E NS ) of 0.80.In our study, daily streamflow E NS is less satisfactory than that of monthly streamflow, which is similar to other SWAT modeling studies [41,42].For hydrologic evaluations performed at a monthly time interval, the model results are satisfactory when E NS values exceed 0.5.However, appropriate relaxing of the standard may be performed for daily time-step evaluations [32].Thus, a daily E NS of 0.5 corresponds to a monthly E NS of approximately >0.8, as suggested by [43].Therefore, the performance measures for the simulations range from satisfactory to good in all studied sub-watersheds according to [32].This evaluation methodology is suggested by many studies [8,41,43,44].Previously, we studied the Upper Du River watershed, which is a forest watershed similar to the Upper Honganjian watershed in China, using the same method and the results confirmed the satisfactory performance of SWAT [22,43,44], the results of this study are consistent with Liu [45], which was a study of a runoff simulation in the Upper Honganjian watershed.
We address the relative importance of precipitation and land use types for flood events in 33 sub-watersheds, the most convenient and comprehensive description of the relative importance of predictors can be derived from exploring their VIP values [26].For different return periods in each season, relevant fluctuations in the total number of events in each decade are found.These fluctuations are consistent with the variations in flood events in the Upper Honganjian watershed, implying that a similar pattern of the flood events seasonal distribution exists in specific return periods.However, there are significantly different dynamics for return periods that exceed 100 years in the flood season, return periods that exceed 20 years in the wet season, and return periods that exceed five years in the dry season.These results may be related to the coupling of precipitation, land use types, and anthropogenic construction [5,46].
The results in Figure 7 show that precipitation changes will influence flood events, especially at a two-to five-year return period in the flood and wet seasons.Precipitation could trigger a higher risk of floods in a watershed, and many complex factors (temperature, reservoirs, and drainage system) may also influence the streamflow volume [47].Similar conclusions were reported by Zhang et al. [5], who showed that precipitation is one of the pivotal factors triggering hydrological alterations of flood events, the flood dynamics at return periods > 5 years may be attributed to the hydrological regulations of water reservoirs.Anthropogenic construction and management, such as dams and reservoirs, river training, and human water use affect the seasonality of flood events in these periods [48,49].Reservoirs, dams, irrigation flow diversions, and flood control structures have been developed and generate significant hydrogeomorphic alterations with impacts occurring in both streams and catchments of the watershed [50].Reservoirs and dams result in increased evapotranspiration (ET) and lead to fewer flood events, while seasonal withdrawals affect the seasonality of flood events [48].Small reservoirs may lose up to 50% of their stored volume due to evaporation in many regions due to the high ratio of surface/volume area.Evaporation constitutes a major component of the water balance in the reservoirs and may significantly decrease flood events [51].Moreover, in order to sustain and maintain the ecological integrity of watershed, numerous watershed management measures, such as management of flood utilization, establishing and maintaining minimum flow releases, or permitting controlled "flushing" releases that establish the necessary high flows for sediment transport have been applied.These hydrogeomorphic and watershed management practice impacts have profoundly influenced flood evolution and frequency [50].In the dry season, the correlations between flood events and the precipitation amount shift from low to high values as the return period increases.This anomaly can be interpreted as follows: for a longer return period, the flood frequency depends on the initial soil moisture conditions [21]; therefore, the precipitation amount in the dry season determines the initial soil moisture conditions and indirectly influences the flood events that occur in the wet and flood seasons.
For each season, the influence of land use on flood events decreases with an increasing return period.Water management facilities have led to variations in hydrological events for a longer return period [4].Our results reveal a strong influence of land use types on flood events for specific return periods; the number of flood events is also highly influenced by land use type, namely, forest, urban, and farmland areas.Farmland and urban areas are found to increase the number of flood events, while forest land results in fewer flood events.Similar results were reported by [47,52].Farmland reduces evapotranspiration, enhances infiltration, increases the initial moisture stored in the soil, eventually increases the number of flood events [21], the increased number of flood events associated with the expansion of urban areas can be explained as follows.With urbanization, infiltration is reduced by soil compaction and impervious surface additions, and water flushes more quickly through the watershed as a result of decreases in the hydraulic resistance of land surfaces and channels [53].However, forest land increases evapotranspiration and tends to decrease the number of flood events [21].For return periods of two to five years, the number of flood events is most closely related to the land use types of forest, urban, and farmland in the dry season, followed by the wet and flood seasons.This conclusion was also drawn by Liu et al. [54], who indicated that the impact of deforestation or reforestation on hydrological events is more significant in the dry season than in other seasons.In dry season, the effects of rainfall are greater than those of land use type.This phenomenon can be explained as follows, the dry season is not a growth period for most vegetation, and ET of forest and other land use types associated with flood events can be ignored [22].In addition, compared with flood and wet seasons, interception of the forest canopy and undergrowth vegetation associated with throughfall in the dry season is generally lower, which mitigates the negative influences of forests on flood [53].

Conclusions
The SWAT model was used to estimate the daily streamflow for ungauged sub-watersheds.Exploratory data analysis and outlier detection were performed using box plots.Based on the maximum likelihood (ML) estimation method, Akaike information criterion (AIC), and Bayesian information criterion (BIC), the lognormal (LN2) distribution was preferable to the generalized extreme value (GEV) distribution to fit the partial time series streamflow data.
In low return periods, similar patterns existed for the flood event distribution.Significantly different patterns existed for return periods that exceeded 100 years in the flood season, return periods that exceeded 20 years in the wet season, and return periods that exceeded five years in the dry season.Rainfall and forest are pivotal factors triggering flood event alterations in lower return periods, and the flood events in higher return periods are attributed to the hydrological regulations of water management facilities.Farmland and urban areas were related to fewer flood events, while the presence of forest land was found to decrease the number of flood events.In the dry season, the influence of rainfall on flood events is much greater than that of land use.
The approach used in this study can help to easily select the optimal distributions for watersheds using ungauged sub-watersheds, the return periods and flood events can be simulated more precisely using optimal distributions.Moreover, flood-prone regions can be identified, which can provide a scientific foundation to determine flood-resistant measures by comparing the increased flood risk at different return levels.

Forests
illuminating the response of flood frequency distribution to land use and precipitation based on PLS regression.

Figure 1 .
Figure 1.Location of Upper Honganjian watershed with observation stations.

Figure 1 .
Figure 1.Location of Upper Honganjian watershed with observation stations.

Figure 2 .
Figure 2. Land uses (farmland, urban, forest, and grassland) map for the 1980s in the Upper Honganjian watershed.

Figure 2 .
Figure 2. Land uses (farmland, urban, forest, and grassland) map for the 1980s in the Upper Honganjian watershed.

ForestsFigure 3 .
Figure 3. Map showing the sub-watersheds in the Upper Honganjian watershed.

Figure 3 .
Figure 3. Map showing the sub-watersheds in the Upper Honganjian watershed.

Figure 4
Figure 4 displays a box plot for the annual seasonal extreme streamflow data for each decade.The total annual number of days with extreme streamflow for the entire watershed for the 1970s, 1980s, 1990s, and 2000s were 306, 321, 291, and 196, respectively.The peak in the 1980s was well defined due to the frequency of the hydrometeorological days, which represented approximately 28.8% of all the data, while the low value in the 2000s represented approximately 17.6% of all the data.

Figure 4 .
Figure 4. Box plot for the monthly extreme streamflow data in the flood season (July, August, and September), wet season (May, June, and October), and dry season (November-April) in the 1970s (A), 1980s (B), 1990s (C), and 2000s (D).

Figure 4 .
Figure 4. Box plot for the monthly extreme streamflow data in the flood season (July, August, and September), wet season (May, June, and October), and dry season (November-April) in the 1970s (A), 1980s (B), 1990s (C), and 2000s (D).

Figure 5 .
Figure 5. Empirical probability curves and confidence intervals (Conf.Int.) of the LN2 distributions for the observed stream flows at the watershed scale.The scale of y-axis is normal probability.

Figure 5 .
Figure 5. Empirical probability curves and confidence intervals (Conf.Int.) of the LN2 distributions for the observed stream flows at the watershed scale, the scale of y-axis is normal probability.

Figure 7 .
Figure 7. Regression coefficients (lines) and the variable influence on projection (VIP) (bars) of each factor.

Figure 7 .
Figure 7. Regression coefficients (lines) and the variable influence on projection (VIP) (bars) of each factor.

Table 1 .
Parameters for streamflow calibration of the SWAT in the Upper Honganjian watershed.

Table 2 .
Accuracy of the SWAT model calibration and validation in the Upper Honganjian watershed.

Table 3 .
Summary of the goodness-of-fit tests for the 33 sub-watersheds.

Table 3 .
Summary of the goodness-of-fit tests for the 33 sub-watersheds.

Table 4 .
Summary of PLS regression for floods in each season.Sampling R 2 = goodness of fit; b RMSECV = cross-validated root mean squared error. a