Comparison of SWAT and GWLF Model Simulation Performance in Humid South and Semi-Arid North of China

Watershed models have gradually been adapted to support both decision and policy making for global environmental pollution control. In this study, two watershed models with different complexity, the Soil and Water Assessment Tool (SWAT) and the Generalized Watershed Loading Function (GWLF), were applied in two catchments in data scarce China, namely the Tunxi and the Hanjiaying basins with contrasting climatic conditions (humid and semi-arid, respectively). The performances of both models were assessed via comparison between simulated and measured monthly streamflow, sediment yield, and total nitrogen. Time series plots as well as four statistical measures (the coefficient of determination (R2), the Nash–Sutcliffe efficiency (NSE), percent bias (PBIAS), and RMSE (root mean square error)—observations standard deviation ratio (RSR)) were used to estimate the performance of both models. The results show that both models were generally able to simulate monthly streamflow, sediment, and total nitrogen loadings during the simulation period. However, SWAT performed better for detailed representations, while GWLF could produce much better average values of the observed data. Thus, GWLF offers a user-friendly prospective alternative watershed model that requires little input data and that is applicable for areas where the input data required for SWAT are not always available. SWAT is more suitable for projects that require high accuracy and offers an advantage when measured data are scarce.


Introduction
China is the biggest developing country in the world, and its rapid economic development has resulted in a large number of significant water quality issues such as eutrophication of lakes and reservoirs, deterioration of river water, and groundwater pollution [1,2]. To resolve these environmental issues, the Chinese government has gradually turned to mathematical models to provide a scientific basis for quantitative environmental management rather than depending exclusively on empirical qualitative analyses [3]. Currently, numerous watershed models with various capabilities are widely used in hydrological research and environmental resource management around the world [4,5]. These are powerful tools that help us to understand natural processes and to find solutions to problems while assessing environmental conditions at the watershed scale [6]. However, there is typically a trade-off between model complexity, input data availability, and predictive ability for a given application objective [7]. Butts et al. developed a hydrological modeling framework that allows for the application of different model structures by providing varying levels of model complexity. The authors reported that an increase in model complexity did not increase model performance for a number of investigated cases. Accordingly, models of different complexity need to be selected when exploring the applicability of watershed models.
SWAT is a semi-distributed, physically based hydrological model, which has evolved from multiple previous models over more than 30 years [8,9]. Numerous applications across a wide range of regions and environmental conditions have shown SWAT to be an effective and acceptable tool for both scientific research and policy making [10]. It has been extensively implemented throughout the world, e.g., in America [11], Africa [12], and Australia [13]. In China, it has been used in the Chaohe basin in the north of China [14], the Heihe basin in the west of China [15], and the Three Gorges Reservoir Region in the south of China [16]. The primary categories to which SWAT has been applied include hydrologic assessments [17,18], pollutant assessments [19], and climate change impacts [20,21]. The GWLF is a simpler, continuous process-based model, which has been used in America [22], Ireland [23], and China [24] for various purposes. The Ministry of Environmental Protection of China has endorsed the GWLF as an alternative model to promote water quality and to meet environmental quality standards [25]. Both models have been used to support the development of Total Maximum Daily Loads (TMDLs) [26]. Due to their wide applicability, acceptance by the authorities, and different complexities, SWAT and GWLF were selected and compared in China, where monitoring networks are less complete than in developed countries.
Many studies have compared watershed models. Li et al. [27] compared the conceptual, lumped Water and Snow balance MODeling system (WASMOD) to SWAT for the Yingluoxia watershed and found that WASMOD provided the same, or even better, results than SWAT for the simulated hydrograph. Parajuli et al. [28] employed both the Annualized AGricultural Non-Point Source (AnnAGNPS) model and SWAT in south-central Kansas, and their study indicated SWAT as the more appropriate model for this particular watershed. Wilcox et al. [29] simulated the runoff of six uncalibrated catchments using both a simple model and a complex model. Although their results demonstrated that more complex catchment models yield more accurate results, the superiority of complex models does not hold for all watersheds. These studies show that different models perform differently in different applications. A model comparison that does not consider regional differences is easily one-sided. Niraula et al. [30] applied the SWAT and GWLF models in east central Alabama to identify critical source areas (CSAs) of sediment and nutrients. Both models performed well for streamflow; however, SWAT slightly outperformed GWLF for sediment, total nitrogen (TN), and total phosphorus (TP). The purpose of their study was to assess whether the choice of model would lead to different CSA locations, and the authors did not conduct a comprehensive comparison between the simulation results of SWAT and GWLF. Moreover, the authors applied the models at one site only, which limits the implications of their results. Therefore, the objective of this study was to conduct a comprehensive comparison between SWAT and GWLF and to evaluate their applicability in two catchments with different climate, land use, and soil types for monthly streamflow, sediment, and total nitrogen in data-scarce China.

Study Sites
The study sites were chosen based on data availability and differences in climate, land use, and soil types (Figure 1). The Tunxi catchment, located in Anhui Province, was selected to represent the humid south of China. It covers an area of approximately 2674 km² with forest covering 74%, agricultural area 15.8%, urban area 4.6%, and other land uses the remainder. Red soil (55%), paddy soil (13%), and purple soil (9.8%) are the predominant soil types. The basin had a subtropical humid monsoon climate with a mean annual temperature of 15.5 °C and a mean annual precipitation of 1752 mm during the period from 1993 to 2013. The daily temperature was always above 0 °C. The Hanjiaying basin (6736 km²) was selected to represent the semi-arid north of China. It is one of the largest subbasins of the Luan River watershed and is located in Hebei Province, north of the Qinling Mountains-Huaihe River line. Forest (49%) and agricultural land (25%) are the major land uses within the basin. Brown soil (65%) and cinnamon soil (22%) are predominant in this watershed. The basin plays an important role in ecological services and water supply for the region. The climate is a temperate continental monsoon climate with a mean annual temperature of 5.62 °C. Due to significant differences in meteorological conditions, these two sites are typical representatives of the semi-arid north and the humid south of China, respectively (Figure 2).

Watershed Models
SWAT is a distributed-parameter model, which was primarily designed by the Agricultural Research Service (ARS) of the United States Department of Agriculture (USDA) to assess the effect of land management practices on water, sediment, and agricultural chemical yields in large complex watersheds over extended periods of time [31]. GWLF is a combined distributed/lumped parameter model, which is based on a combination of simple runoff, sediment, and groundwater relationships and empirical chemical parameters [32]. Both SWAT and GWLF are continuous, pollutant-loading models that operate with a daily time step.
Table 1 lists the major processes and related methods considered by the SWAT and GWLF models. SWAT and GWLF differ greatly in the way in which they delineate the watershed. Based on the topological structure of the river network, SWAT first discretizes the watershed into a number of subbasins and subsequently divides each subbasin into hydrologic response units (HRUs) according to unique land use, soil, and slope combinations [31]. In SWAT, each physical and chemical process is modeled at the HRU scale within the subbasin and then routed along the river network toward the outlet of the watershed. In contrast, the concept of a subbasin does not exist in GWLF; therefore, it can only identify surface loading from different land covers, and the results of each area are simply added into the watershed total. In this sense, the model is distributed but lacks a spatial concept as well as a channel routing component. For sub-surface modeling, however, it is considered a lumped parameter model because it uses uniform parameters for the entire watershed, ignoring the spatial variability of physical and chemical processes [33]. These differences in how the real environment is simplified lead to the diverse properties of various watershed models.
The hydrological process is the most important component in any watershed model because it is the driving force of the whole simulation. Both models simulate the hydrological component based on the water balance equation for the shallow aquifer. The SWAT model provides two methods to estimate surface runoff: the modified SCS curve number (SCS-CN) method and the Green-Ampt infiltration method. In this study, both models used different versions of SCS-CN to estimate the surface runoff volume, with the remainder treated as infiltration [34]. The GWLF describes groundwater with the linear reservoir model, while SWAT uses empirical relationships. In addition, SWAT can calculate the lateral flow in the unsaturated zone. In GWLF, erosion is simulated via the Universal Soil Loss Equation (USLE), which predicts the average erosion as a function of rainfall energy [44]. Then, a sediment delivery ratio and transport capacities are applied to determine the monthly sediment yield for each source area [33]. In contrast, SWAT uses a modified version of the Universal Soil Loss Equation (MUSLE), which replaces the rainfall energy factor with a runoff factor to estimate daily erosion and sediment yield. A delivery ratio is not required, and sediment yields of single storms can be calculated [42,43].
Both models also differ considerably in the way they estimate nutrient loads. The GWLF simply calculates nutrient loads by multiplying N and P concentration coefficients by the runoff volume or sediment yield at a monthly scale. It uses denitrification loss fractions to calculate the amount of denitrification. With its daily time step, SWAT models nutrient cycles via different pools to simulate mineralization, decomposition, and immobilization between inorganic and organic forms within the soil. Then, the amount of mineral and organic nutrients transported in both the land phase and the routing phase is calculated.
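The difference between the two delineation schemes can be illustrated with a minimal sketch. It assumes a hypothetical list of grid cells, each tagged with a subbasin, land use, soil, and slope class; the function names and class labels are illustrative only, and the area thresholds that SWAT interfaces typically apply when defining HRUs are omitted.

```python
from collections import Counter, defaultdict

def delineate_hrus(cells):
    """SWAT-like view: within each subbasin, every unique
    (landuse, soil, slope_class) combination forms one HRU.

    cells: list of (subbasin_id, landuse, soil, slope_class) tuples,
    one per grid cell. Returns {subbasin_id: {combo: n_cells}}.
    """
    hrus = defaultdict(Counter)
    for subbasin, landuse, soil, slope in cells:
        hrus[subbasin][(landuse, soil, slope)] += 1
    return {sub: dict(combos) for sub, combos in hrus.items()}

def gwlf_source_areas(cells):
    """GWLF-like view: only land cover is distinguished; subbasin,
    soil, and slope are ignored and areas are summed watershed-wide."""
    return dict(Counter(landuse for _, landuse, _, _ in cells))

# Hypothetical six-cell watershed (labels are assumptions for illustration).
example_cells = [
    (1, "forest", "red", "steep"),
    (1, "forest", "red", "steep"),
    (1, "forest", "paddy", "flat"),
    (1, "agriculture", "paddy", "flat"),
    (2, "forest", "red", "steep"),
    (2, "urban", "red", "flat"),
]

hrus = delineate_hrus(example_cells)
n_hrus = sum(len(combos) for combos in hrus.values())
source_areas = gwlf_source_areas(example_cells)
```

In this toy case the SWAT-like scheme yields five HRUs in two subbasins, whereas the GWLF-like scheme yields three land-cover source areas for the whole watershed.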
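As an illustration of the curve number approach shared by both models, the following minimal sketch implements the textbook SCS-CN relationship in metric units. The initial abstraction ratio of 0.2 is the conventional default and is an assumption here; SWAT and GWLF each adjust the curve number for antecedent moisture in their own way, which is not reproduced.

```python
def scs_runoff_mm(precip_mm, cn, ia_ratio=0.2):
    """Textbook SCS curve number runoff depth (mm) for one event or day.

    precip_mm: rainfall depth in mm
    cn:        curve number (0 < cn <= 100)
    ia_ratio:  initial abstraction as a fraction of retention S
               (0.2 is the conventional value, assumed here)
    """
    if not 0 < cn <= 100:
        raise ValueError("curve number must be in (0, 100]")
    retention = 25400.0 / cn - 254.0          # potential retention S, mm
    initial_abstraction = ia_ratio * retention
    if precip_mm <= initial_abstraction:
        return 0.0                            # all rainfall is abstracted
    excess = precip_mm - initial_abstraction
    return excess ** 2 / (excess + retention)

# The same 50 mm storm under two hypothetical curve numbers.
runoff_cn80 = scs_runoff_mm(50.0, 80)
runoff_cn70 = scs_runoff_mm(50.0, 70)
```

For the same rainfall, the lower curve number gives less surface runoff and leaves more water for infiltration.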
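The GWLF-style load calculation described above can be sketched as a concentration multiplied by a runoff volume, summed over land covers. The function names, the concentration values, and the source-area figures below are hypothetical; only the unit conversion is fixed (1 cm of runoff over 1 ha is 100 m³, so 1 mg/L over that volume is 0.1 kg).

```python
def dissolved_load_kg(conc_mg_per_l, runoff_cm, area_ha):
    """Dissolved nutrient load (kg) from one source area in one month.

    1 cm of runoff over 1 ha = 100 m3 = 1e5 L, and 1e5 L * 1 mg/L = 0.1 kg,
    hence the factor 0.1.
    """
    return 0.1 * conc_mg_per_l * runoff_cm * area_ha

def watershed_dissolved_load_kg(source_areas):
    """Sum the loads of all source areas; there is no routing between them."""
    return sum(
        dissolved_load_kg(a["conc_mg_per_l"], a["runoff_cm"], a["area_ha"])
        for a in source_areas
    )

# Hypothetical monthly inputs for two land covers (illustrative values only).
areas = [
    {"name": "forest", "conc_mg_per_l": 0.2, "runoff_cm": 4.0, "area_ha": 1000.0},
    {"name": "agriculture", "conc_mg_per_l": 2.0, "runoff_cm": 5.0, "area_ha": 100.0},
]
total_kg = watershed_dissolved_load_kg(areas)
```

The sketch covers only the dissolved, runoff-borne part; sediment-attached nutrients and the denitrification loss fraction are left out.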
In addition to these basic components, SWAT has the additional powerful ability to simulate crop growth, management, as well as the amount of pesticide, bacteria, algae, dissolved oxygen, carbonaceous biological oxygen demand (CBOD), and their routing in the channel or reservoir.

Model Inputs
Table 2 summarizes the data used for the model setup in this study. To avoid different results caused by variations in model input data, we kept the input data of GWLF consistent with those of SWAT. SWAT (Version 2012) and the ReNuMa (Regional Nutrient Management) (Version 2.2.2) modeling platform of GWLF (Version 2) were used. Thirty-meter resolution DEMs were used to determine the watershed and sub-watershed boundaries in SWAT, and GWLF identified runoff source areas based on the same delineation. At both sites, land use data were used to obtain the major land cover classes, while SWAT required additional spatial datasets. Soil datasets were only used in SWAT to partition the watershed into HRUs along with land use and slope datasets. In SWAT, a combination of these three datasets divided the Tunxi watershed into 40 subbasins and 307 HRUs, while it divided Hanjiaying into 33 subbasins and 258 HRUs. In GWLF, there were nine major land use classes in Tunxi and seven in Hanjiaying. For SWAT, meteorological data of each subbasin were obtained from the weather station nearest to its centroid, while average climatic data were used for GWLF. Agricultural management information for the Tunxi watershed was taken from [46], and that for Hanjiaying was obtained from the local government. Furthermore, population data were also required for the GWLF.

Model Calibration, Validation, and Evaluation
In the Tunxi watershed, the period from 2001 to 2008 was chosen for model calibration, with the data from 2000 used as a "warm-up" period to define appropriate initial conditions. The five most recent years, 2009 to 2013, were used for model validation of streamflow and total nitrogen, while sediment was validated from 2009 to 2011. For the Hanjiaying watershed, the periods 2006-2011 and 2012-2014 were selected as the calibration and validation periods, respectively, for flow, sediment, and total nitrogen, while 2005 was used as the warm-up period. The SWAT and GWLF simulations were conducted with a monthly time step and followed the calibration sequence flow, sediment, and nitrogen.
Although multiple sets of parameters can achieve an optimal fit to the measured data, we selected only one of them as a representative to facilitate the comparison of both models. Tables 3 and 4 show the parameters that were chosen and defined in this study. In SWAT, a sensitivity analysis was conducted prior to model calibration, and more than 20 major parameters were selected in Tunxi and Hanjiaying. Calibration was conducted both manually and automatically via SUFI-2 uncertainty analysis through the SWAT-CUP program [47]. The SCS curve number (CN2), which is directly related to the runoff yield, was the most critical parameter for both stations. As the value of CN decreases, overland flow is reduced and infiltration potential increases. The base flow recession constant, ALPHA_BF, is a direct index of groundwater flow response to recharge from the vadose zone [48]. Values vary from 0.1-0.3 for land with slow response to recharge to 0.9-1.0 for land with rapid response. SLSOIL was the key parameter that we chose to adjust the lateral flow yield. By default, it is equal to the average slope length of the subbasin (SLSUBBSN), which tends to result in a high lateral flow ratio. Therefore, we appropriately reduced its value for both sites. In Hanjiaying, two additional parameters were considered due to their influence on the snowmelt process. SMTMP defines the base temperature above which snowmelt is allowed. SNOCOVMX is the threshold depth of snow above which the basin would be completely (100%) covered with snow. The soil property parameter SOL_K was also included because the soil categories in the Hanjiaying basin are relatively coarse. Parameters related to groundwater balance and channel routing were also taken into account. Seven parameters were chosen to calibrate the sediment simulation with respect to erosion, maximal sediment amount, and routing in the channel. For nitrogen, four parameters about nitrite and one parameter about organic nitrogen were considered. Furthermore, we distributed several parameters depending on land use, soil texture, and slope. When the calibration of one variable was completed, we kept its parameter ranges unchanged and began calibration of the next variable, unless the results were not satisfactory [47].
For the GWLF, parameters related to watershed-specific characteristics such as runoff source areas and populations were identified via GIS data analysis. Transport and nutrient parameters can be estimated using default coefficients according to [49]. In this study, we used these defaults as initial values and calibrated them manually. A total of 11 parameters were selected for calibration. The meaning of each parameter is listed in detail in Tables 3 and 4. After model calibration, the values of the input parameters remained unchanged during the validation process.
The model performance for fitting measured constituent data was qualitatively evaluated via time series plots and quantitatively evaluated via four widely used statistics in watershed model evaluation (Table 5).
The coefficient of determination (R²) indicates the degree of linear relationship between simulated and observed data. An R² value close to one indicates a better performance. However, it is very sensitive to extremely high values. The Nash-Sutcliffe efficiency (NSE) is one of the most commonly used criteria [50]. It is a normalized statistic that can be used to determine the goodness of fit. The NSE ranges from −∞ to 1, with 1 indicating a perfect match. Because the differences in its equation are squared, the NSE has the limitation of overemphasizing higher values and neglecting lower values [51]. Percent bias (PBIAS) is an error index that is generally used to measure the deviation of the simulated data. It calculates the average tendency of the simulated data to be either larger or smaller than their observed counterparts, with zero indicating the optimal value [52]. The RMSE (root mean square error)-observations standard deviation ratio (RSR) combines an error index, the RMSE, with a normalization factor so that it can be applied to various constituents [53]. RSR ranges from the optimal value of 0 to infinity; the smaller the RSR, the better the simulation results. Model performance was judged based on previously recommended performance ratings for these statistics [28,53].
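The four statistics can be computed from paired observed and simulated series as in the following sketch. It uses the commonly cited definitions rather than the exact expressions of Table 5, which are not reproduced here; in particular, the PBIAS sign convention (positive values meaning underestimation by the model) is an assumption.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect match, can go to -infinity."""
    mo = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - residual / sum((o - mo) ** 2 for o in obs)

def pbias(obs, sim):
    """Percent bias; with this (assumed) sign convention, positive values
    mean the model underestimates and negative values mean it overestimates."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    mo = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return math.sqrt(residual) / math.sqrt(sum((o - mo) ** 2 for o in obs))

# Hypothetical series: the simulation doubles every observed value.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]
stats = {
    "R2": r_squared(observed, simulated),
    "NSE": nse(observed, simulated),
    "PBIAS": pbias(observed, simulated),
    "RSR": rsr(observed, simulated),
}
```

The toy series shows why R² alone can mislead: the simulated values are perfectly correlated with the observations (R² of 1) although they are twice as large, which NSE, PBIAS, and RSR all penalize.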

Flow
Figure 3 illustrates a comparison between observed and simulated monthly mean streamflow series of both SWAT and GWLF models in two sites; the numerical criteria of model performance are summarized in Table 6.
In the Tunxi watershed, SWAT and GWLF almost replicated the entire trend of the discharge hydrograph, with the simulated peak values and low flows consistently and closely matching the observed data throughout all years (Figure 3a). The high R² and NSE values (above 0.9) and the reasonably low RSR (below 0.25) for the calibration and validation periods indicate excellent correlation and agreement between measured and simulated runoff for both models. SWAT and GWLF underestimated streamflow by 9.69% and 4.03% during calibration, respectively, while overestimating the flow volume by 1.17% and 2.97% during the validation period. The average runoff simulated by SWAT and by GWLF was in both cases close to the average of the observations. For the Hanjiaying watershed, the performance of both models degraded compared to the results for Tunxi. The shape of the monthly hydrograph was largely reproduced, but relatively large deviations from the measured data were found for the simulated peak and low flows (Figure 3b). Based on the similar values of R², NSE, and RSR for SWAT and GWLF, both models were equally able to predict monthly streamflow during the entire duration of the simulation. However, GWLF produced marginally better PBIAS values and a slightly more accurate average monthly flow than SWAT, especially during the validation period. According to these results, both models had an almost equal ability to simulate the monthly streamflow with sufficient accuracy after adequate calibration. Furthermore, the average runoff simulated by SWAT and by GWLF was in both cases close to the average of the observations. The critical reason why the performance of both models was highly consistent at the same site is that the same runoff calculation method (SCS-CN) was utilized in both models. Furthermore, the distinctly different behavior of both models between the Tunxi and Hanjiaying watersheds indicates that the SCS-CN method is more suitable for areas with high flow. Some previous applications in areas with less runoff yielded relatively poor statistics. Shen et al. [54] obtained NSEs of 0.711 and 0.690 during the calibration and validation periods for the monthly runoff of the Three Gorges Reservoir with mean monthly observed values below 0.05 m³/s. Parajuli et al. [28] obtained an NSE of 0.56 and a PBIAS of −95.06 in Red Rock Creek with a normal flow volume below 1 m³/s. Li et al. [27] obtained NSEs of 0.948 and 0.923, and REs of −0.071 and −0.084 during the calibration and validation periods for the Heihe River basin in China with mean monthly observed runoff above 49 m³/s. Other publications reported that the performance of SWAT and GWLF in simulating low flows is not as good as that for high or normal flows [22,55]. In fact, Chahinian et al. [56] compared four different infiltration-runoff models, and all tested models had difficulties simulating low runoff events and even events characterized by a mild rainfall hiatus. Furthermore, the authors attributed this phenomenon to the absence of soil moisture redistribution during flood events and to a constant value during the whole duration of the flood event.
For the calibrated parameters in SWAT, Tunxi had a higher CN2; thus, more streamflow was generated than in Hanjiaying, which is perhaps due to the more abundant rainfall in Tunxi. The GWQMN is considerably higher in the Tunxi watershed than in Hanjiaying, indicating that Tunxi has more groundwater storage. The higher ALPHA_BF in Tunxi suggests a more rapid response to recharge entering the aquifers than in Hanjiaying, which was further confirmed by the higher value of the recession coefficient for Tunxi in GWLF. The higher CH_K2 in Hanjiaying implies that its channel loses water more easily via transmission when there is no groundwater contribution. As for GWLF, CN2 is also higher in Tunxi than in Hanjiaying, which is consistent with SWAT. The parameter unsaturated available water is mainly related to soil properties. Red soil and brown soil are the main soil types of Tunxi and Hanjiaying, respectively. Brown soil is usually formed through eluviation and clayization processes and thus has poor water permeability and good water holding capacity [57]. Hanjiaying has a higher value of unsaturated available water than Tunxi, partially indicating that more water can be stored in brown soil than in red soil. As a whole, the differences among these parameters of both models consistently reflect, to some extent, differences in hydrological processes between the catchments. However, these differences still need to be experimentally verified.
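The effect of the recession coefficient discussed above can be illustrated with a minimal linear reservoir sketch, in which groundwater discharge in each step is the recession coefficient times the current storage. The order of the update and the omission of deep seepage are simplifying assumptions, and the storage, recharge, and coefficient values are hypothetical.

```python
def simulate_linear_reservoir(recharge, recession_coef, storage0=0.0):
    """Linear reservoir: discharge = recession_coef * storage.

    recharge:       list of recharge depths per time step (e.g., cm)
    recession_coef: fraction of storage released per time step (0..1)
    storage0:       initial storage depth
    Returns the list of discharges, one per time step.
    """
    storage = storage0
    discharges = []
    for rech in recharge:
        discharge = recession_coef * storage   # outflow from current storage
        storage = storage + rech - discharge   # water balance update
        discharges.append(discharge)
    return discharges

# Same initial storage and no recharge, two hypothetical coefficients.
slow = simulate_linear_reservoir([0.0, 0.0, 0.0], 0.1, storage0=100.0)
fast = simulate_linear_reservoir([0.0, 0.0, 0.0], 0.5, storage0=100.0)
```

With the larger coefficient the reservoir releases its storage much faster, which corresponds to the more rapid groundwater response to recharge that the higher calibrated values suggest for Tunxi.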

Sediment
Figure 4 shows a graphical representation of the predicted and measured sediment yield on a monthly basis. Furthermore, the numerical criteria of model performance in simulating sediment load are summarized in Table 6.
In the Tunxi watershed, both models adequately simulated the trend of monthly sediment yield, but tended to underestimate extremely high values. Furthermore, the GWLF performed worse than SWAT in tracking peak timing (Figure 4a). In summary, both models showed very good correlations and sufficient agreement between monthly measured and predicted sediment values according to the statistical criteria, except for PBIAS. During the calibration period, the GWLF model performed slightly better than the SWAT model, based on the same R², higher NSE, lower RSR, and lower PBIAS. During the validation period, SWAT performed noticeably better than during the calibration period, while the performance of the GWLF did not show an apparent improvement. In addition, the PBIAS ratings for both models did not agree with the other criteria, and the values of SWAT were always higher than those of GWLF throughout both periods, indicating a higher bias in predicting sediment. In the Hanjiaying watershed, the performance of both models decreased compared to the results for Tunxi. The trend of the monthly sediment yield was roughly represented, and there were large deviations from the measured data for the simulated peak and low values (Figure 4b). In general, both models predicted monthly sediment loads equally well and with reasonable accuracy during the entire simulation time, based on the approximately identical values of R², NSE, and RSR. In addition, the performance of both models during the validation period increased compared to the calibration period. SWAT performed marginally better than GWLF; however, this difference was so small that it was negligible. Furthermore, the average monthly sediment yield simulated by SWAT was much higher and closer to the observed values than that simulated by GWLF. According to the analysis above, both models were capable of predicting the monthly sediment yield with adequate accuracy after sufficient calibration, and SWAT was more reliable during the validation period.
The similarity of the results of both models suggests that the difference between MUSLE and USLE is not apparent in simulating monthly sediment loads, which has previously been suggested [54]. The good representation and increased performance of the SWAT model during the calibration and validation periods may be attributed to its distributed nature, which accounts for spatial variations of the study sites. In the Tunxi catchment, the consistent performance of GWLF was partially achieved due to its capability of allowing the sediment delivery ratio to be calibrated for different months. It simulated the peak values of sediment between April and July during the calibration period reasonably well, whereas it did not capture the peak values in February 2009 and March 2010 during the validation period. This indicates that the GWLF lacks adequate flexibility when the calibration and validation observations differ markedly, mainly due to its simple sediment parameters. Furthermore, errors in manual measurement and the use of empirical equations could also affect the sediment performance of both models.

Based on the comparison above, the SWAT model was capable of providing a reasonable and reliable prediction of monthly TN loadings, especially in the Hanjiaying watershed where measured data were scarce. Several published studies have verified the robustness of SWAT in representing nitrogen loadings. Stewart et al. [19] used SWAT to predict water quality changes in Texas, reporting a very good correlation (R2 = 0.89, 0.87) and good agreement (NSE = 0.71, 0.73) for monthly organic nitrogen in the calibration and validation periods. Jha et al. [58] reported that SWAT performed very well on annual and monthly nutrient predictions in the Raccoon River watershed during the simulation periods, with R2 and NSE exceeding 0.7 in most cases. Gassman et al. [10] summarized more than twenty peer-reviewed articles in which the values for R2 and NSE mostly exceeded 0.5, indicating that the SWAT model is able to replicate a wide range of observed in-stream pollutant levels. This is because SWAT considers five different chemical forms of nitrogen as well as the transformations between them in the nitrogen cycle. In contrast, GWLF only considers two physical forms of nitrogen and does not take the conversion between them into account. Furthermore, the nitrogen concentrations remain constant throughout the model run. Thus, the accuracy achieved by GWLF depends heavily on the efficacy of calibration, which may explain its poor performance in the Hanjiaying watershed, where the measured data were limited. Furthermore, the value of CMD was higher in the Tunxi watershed than in Hanjiaying, indicating that microbial activity tended to be higher in this humid and warm area. In addition, the higher SHALLST_N of SWAT and the higher nitrogen concentrations in sediment and groundwater of GWLF indicate that Hanjiaying is subject to more human intervention than Tunxi. Indeed, Hanjiaying has more agricultural land than Tunxi. This may have contributed to the relatively degraded performance of both models for the Hanjiaying watershed.
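To illustrate why constant nitrogen concentrations limit the flexibility of a loading-function model, the sketch below assumes that the dissolved load is simply a fixed concentration multiplied by the runoff volume. The function name, the concentration, the area, and the runoff series are all hypothetical and are not taken from GWLF's source code or from this study.

```python
def dissolved_load_kg(conc_mg_per_l, runoff_cm, area_ha):
    """Dissolved nutrient load (kg) from a fixed concentration and a runoff depth.

    1 cm of runoff over 1 ha is 100 m3 = 1e5 L, and 1 mg = 1e-6 kg,
    so the unit-conversion factor is 1e5 * 1e-6 = 0.1.
    """
    return 0.1 * conc_mg_per_l * runoff_cm * area_ha


# Hypothetical values, for illustration only
CONC_N = 2.0                              # mg/L, held constant for the whole run
AREA = 1000.0                             # ha
monthly_runoff_cm = [0.5, 1.0, 4.0, 2.0]

loads = [dissolved_load_kg(CONC_N, q, AREA) for q in monthly_runoff_cm]

# The flow-weighted concentration recovered from the loads never changes,
# so the simulated load can only vary in proportion to the simulated runoff.
recovered_conc = [load / (0.1 * q * AREA) for load, q in zip(loads, monthly_runoff_cm)]
```

Under this assumption, any month-to-month variation in observed concentration has to be absorbed by the calibrated constant, which is consistent with the model's strong dependence on calibration data.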

Conclusions
In this study, we compared two watershed models of different complexity and structure at two distinct sites that represent the semi-arid north and the humid south of China. According to the quantitative statistics and graphical techniques, both the SWAT and the GWLF model were capable of simulating monthly flow, sediment, and total nitrogen with adequate accuracy. They performed similarly well in terms of streamflow and sediment. For nitrogen simulation, GWLF outperformed SWAT in the Tunxi watershed, while the opposite was true in Hanjiaying. The main conclusions of our study are listed below.
- The performance of both models in the semi-arid area was not as good as in the humid area, indicating that climatic conditions can greatly affect the applicability of a given model.
- Because both models use the same surface runoff calculation method (SCS CN), their monthly streamflow results were quite similar, even though the complexity of the model structures was quite different.
- In contrast to GWLF, SWAT performed more dependably and robustly for sediment and total nitrogen and could reproduce the fluctuations of the observed data more accurately due to its spatially distributed structure and more detailed description of reality.
- In some cases, GWLF could provide similar or even better results than SWAT, with average values much closer to the measured data.
- Due to its simpler structure, GWLF requires fewer data to set up, takes less time to run, and is easier to use than SWAT. However, it is not suitable for application in large catchments and cannot reflect spatial variations because it lacks channel routing and a spatial topological relationship between land uses. Furthermore, GWLF is more dependent on the calibration process than SWAT.
Overall, the user-friendly GWLF is more suitable for a basic analysis to support environmental management in data-deficient areas such as China, where the basic input data required by SWAT are not always available or credible. SWAT, in turn, has an advantage in areas where measured data for calibration are scarce and is more suitable for projects that require high accuracy.
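The SCS curve number runoff calculation shared by both models, noted in the conclusions above, can be sketched as follows. This is the standard metric form of the method with the conventional initial abstraction ratio of 0.2. The function name and example values are hypothetical, and the way each model adjusts the curve number over time is deliberately left out.

```python
def scs_cn_runoff_mm(precip_mm, cn, ia_ratio=0.2):
    """Surface runoff depth (mm) from rainfall depth using the SCS curve number method."""
    if not 0 < cn <= 100:
        raise ValueError("curve number must be in (0, 100]")
    retention = 25400.0 / cn - 254.0          # potential maximum retention S (mm)
    initial_abstraction = ia_ratio * retention
    if precip_mm <= initial_abstraction:
        return 0.0                             # all rainfall is abstracted, no runoff
    excess = precip_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


# Hypothetical example: a 50 mm event on a surface with CN = 80
runoff = scs_cn_runoff_mm(50.0, 80.0)
```

Since both models derive surface runoff from the same relationship, differences in their monthly streamflow mainly arise from how the remaining water balance components are handled.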
of the Qinling Mountains-Huaihe River line. Forest (49%) and agricultural land (25%) are the major land uses within the basin. Brown soil (65%) and cinnamon soil (22%) are predominant in this watershed. The basin plays an important role in ecological services and water supply for the region. The climate is a temperate continental monsoon climate with a mean annual temperature of 5.62 °C and a mean annual precipitation of 446 mm from 1993 to 2013. Monthly mean temperatures fall below 0 °C from November to March and rise above 10 °C during the summer months (June-August).

Figure 1. Location and elevation of the Hanjiaying and Tunxi watersheds.

Figure 2. Mean annual rainfall and temperature of the two sites between 1993 and 2013.

Figure 5. Simulated and observed monthly total nitrogen for the (a) Tunxi watershed and (b) Hanjiaying watershed.

Table 1. Summary of the major processes and related methods used by SWAT and GWLF.

Table 2. Input data used in SWAT and GWLF.

Table 3. Parameters selected for the calibration for streamflow.

Table 4. Parameters selected for the calibration for sediment.

Table 5. Statistics used to evaluate models.

Table 6. Statistics values of model performance.