Evaluating Three Hydrological Distributed Watershed Models: MIKE SHE, APEX, SWAT

Selecting the right model to simulate a specific watershed has always been a challenge, and field testing of watersheds could help researchers to use the proper model for their purposes. Three popular Geographic Information System (GIS)-based watershed simulation models, the European Hydrological System Model (MIKE SHE), the Agricultural Policy/Environmental Extender (APEX) and the Soil and Water Assessment Tool (SWAT), were evaluated for their ability to simulate the hydrology of the 52.6-km² Canagagigue Watershed located in the Grand River Basin in southern Ontario, Canada. All three models were calibrated for a four-year period and then validated using an independent four-year period by comparing simulated and observed daily, monthly and annual streamflow. The simulated flows generated by the three models are quite similar and closely match the observed flow, particularly for the calibration results. The mean daily/monthly flow at the outlet of the Canagagigue Watershed simulated by MIKE SHE was more accurate than that simulated by either the SWAT or the APEX model, during both the calibration and validation periods. Moreover, for the validation period, MIKE SHE predicted the overall variation of streamflow slightly better than either SWAT or APEX.


Introduction
Many computer simulation models have been developed to simulate watershed-scale processes and the hydrologic effects of different management scenarios. Watershed models are effective tools for investigating the complex processes that affect surface and subsurface hydrology, soil erosion and the transport and fate of chemical constituents in watersheds [1]. A watershed model can be used to achieve a better understanding of the impact of land use activities and different management practices on these hydrologic processes. Owing to the increased availability of spatial data, distributed hydrological models are used more and more widely. For example, from 2004 to 2011, as part of the overall Conservation Effects Assessment Project, thirteen projects on agricultural watersheds in the United States were funded jointly by the U.S. Department of Agriculture (USDA) National Institute of Food and Agriculture and the Natural Resources Conservation Service [2]. The Geographic Information System (GIS) has provided another useful basis for representing spatially distributed physical processes, including in watershed models. Selecting the proper model to simulate the hydrologic processes of a specific watershed has always been a challenge, and field testing of the hydrologic components of watersheds could help researchers to use the proper model for their purposes.
In recent years, distributed watershed models have been used increasingly to implement alternative management strategies in the areas of water resources allocation, flood control, land use and climate change impact assessments, as well as pollution control [3]. Some authors tend to criticize the use of distributed models; their main concern is that many parameters can be altered during the calibration phase [4]. According to Beven [5], a key characteristic of distributed models is the problem of over-parameterization. In response, Refsgaard and Storm [6] emphasize that a rigorous parameterization procedure might help overcome the problems faced in calibrating and validating fully distributed physically-based models.
In this study, three GIS-based distributed continuous simulation models commonly used for watershed management assessment are evaluated and field verified. These models include the Agricultural Policy/Environmental Extender (APEX) [7], the European Hydrological System Model MIKE SHE [6] and the Soil and Water Assessment Tool (SWAT) [8]. There have been several applications of these models, either individually or in comparison with another model. Some of these applications are described below.
Borah and Bera [9] reviewed eleven watershed-scale hydrological models and concluded that Agricultural Non-Point Source (AGNPS), Annualized Agricultural Non-Point Source (AnnAGNPS), Dynamic Watershed Simulation Model (DWSM), Hydrological Simulation Program-Fortran (HSPF), MIKE SHE and SWAT were able to simulate all major components (hydrology, sediment and chemical) applicable to watershed-scale catchments. SWAT was considered a promising model for continuous simulations in predominantly agricultural watersheds. In general, MIKE SHE and SWAT showed better performances when compared with other models. All of the reviewed studies indicated that further investigation was needed to reach a solid conclusion about the superiority of one model over the others. APEX was not included in that review.
Borah et al. [10] evaluated and compared SWAT and DWSM results for the 620-km² Upper Little Wabash River watershed (Effingham, IL, USA) using a visual comparison of hydrographs. These results showed SWAT's weakness in predicting monthly peak flows (mostly underpredictions).
Shi et al. [3] compared the performance of the SWAT and Xinanjiang (XAJ) models, the latter widely used in China, and showed that both models performed well in the Xixian River Basin, with a percentage of bias (PBIAS) of less than 15%, Nash-Sutcliffe efficiency (NSE) > 0.69 and coefficient of determination (R²) > 0.72 for both calibration and validation periods. Two popular watershed-scale models, SWAT and HSPF, were used to simulate streamflow, sediment and nutrient loading from the Polecat Creek watershed in Virginia. The results indicated that both models were generally able to simulate streamflow, sediment and nutrient loading effectively. However, HSPF-simulated hydrology and water quality components were more accurate than those of the SWAT model at all monitoring sites within the watershed [11]. HSPF and SWAT were also evaluated for simulating the hydrology of a watershed in Illinois and Indiana. As a rule, the characteristics of simulated flows from both models were similar to each other and to observed flows, particularly for the calibration results. However, SWAT predicted flows slightly better than HSPF for the verification period, with the primary advantage being a better simulation of low flows [1].
Refsgaard and Knudsen [12] validated and compared three different models in three catchments, namely the Nedbor-Afstromnings Model (NAM) lumped conceptual modeling system [13], the MIKE SHE distributed, physically-based system [14,15] and the Hybrid Water Balance Model (WATBAL) approach [16]. The study was applied to two large catchments and a medium-sized one (1090, 1040 and 254 km²). The authors concluded that all models performed equally well when at least one year of data was available for calibration, while the distributed models performed marginally better for cases where no calibration was performed.
The performance of the fully distributed MIKE SHE model and that of the semi-distributed SWAT model were compared for the 465-km² Jeker River Basin, situated in the loamy belt region of Belgium [4]. The two models differ in conceptualization and spatial distribution, but gave similar results during calibration. However, MIKE SHE provided slightly better overall predictions of river flow.
All of these studies concluded that the models' performances are very site specific, and because no one model is superior under all conditions, a complete understanding of comparative model performance requires applications under different hydrologic conditions and watershed scales. Because APEX can be applied to small-scale watersheds and farms and can also evaluate a wide range of alternative manure management scenarios, it is important to evaluate the hydrological component of the model.
Therefore, the objective of the present study is to compare and assess the suitability of three widely-used watershed simulation models, namely APEX, MIKE SHE and SWAT, for simulating the hydrology of a major tributary of the upper Grand River Basin, the Canagagigue Watershed. This watershed is representative of land use and soils throughout much of the Grand River Basin. The performance of the three models was assessed with respect to their capacity to generate the daily flow rate at the catchment outlet of the Canagagigue Watershed, a small-sized watershed situated in a loamy region of the Grand River Basin. This paper presents the overall performances of the three models in this Ontario watershed, where significant snowfall and snowmelt influence runoff.

Study Area
With almost 7000 km² in drainage area, southwestern Ontario's Grand River basin contributes about 10% of the water received by Lake Erie. A minor tributary of the Grand River, the Canagagigue Creek has a drainage area that extends over 143 km² (43.60-43.70°N, 80.55-80.63°W) and covers the Peel and Pilkington townships of Wellington County and Woolwich Township of Waterloo County, ON (Figure 1). A gauging station (02GAC17) situated near Floradale, ON (approximately 100 km west of Toronto, ON, Canada), provided hourly stream flow observations for the period of 1989-2000, monitored at the southerly outlet of a 53-km² subwatershed housing the upper reaches of the Canagagigue Creek. With a mean elevation of 417 m above mean sea level (mAMSL), this roughly triangular and southerly downsloping subwatershed shows a flat to gently undulating topography (mean slope < 1.5%). The main soil types in the watershed are presented in Figure 2. Soil surveys of Waterloo County [17] and Wellington County [18] indicate that most of the watershed bears 0.2 to 0.6 m of loam or silty loam of the Huron and Harriston series over a loam till. In the northern part of the watershed, clay loam is predominant, while loam is the main soil type in the central portion of the watershed. In the south and southeastern sections, the soil types can be characterized as moraine deposits of very fine sand and fine sandy loam, with occasional layers of other material.
A map of land use characteristics (Figure 2) shows that 80% of the area is devoted to agriculture and another 10% to woodlots [19]. The remainder of the watershed comprises urban areas, fallow land, rivers and lakes.

SWAT Model
SWAT is a conceptual, physically-based, continuous model. It operates on a daily time step and is designed to predict the impact of watershed management practices on hydrology, sediment and water quality in a gauged or an ungauged watershed. The major model components include weather generation, hydrology, sediment, crop growth, nutrient and pesticide subroutines [8]. To accurately simulate water quality and quantity, SWAT requires specific information about topography, weather (precipitation, temperature), hydrography (groundwater reserves, channel routing, ponds or reservoirs, sedimentation patterns), soil properties (composition, moisture and nutrient content, temperature, erosion potential), crops, vegetation and agronomic practices (tillage, fertilisation, pest control) [20,21]. The model simulates a watershed by dividing it into subbasins, which are further subdivided into hydrologic response units (HRUs), units delineated by overlaying digitized soil, slope and land use maps to find regions of similarity. For each HRU in every subbasin, SWAT simulates the soil water balance, groundwater flow, lateral flow, channel routing (main and tributary), evapotranspiration, crop growth and nutrient uptake, pond and wetland balances, soil pesticide degradation and in-stream transformation of nutrients and pesticides [22].
The hydrologic components in SWAT include surface runoff, infiltration, evapotranspiration, lateral flow, tile drainage, percolation/deep seepage, consumptive use through pumping (if any), shallow aquifer contribution to streamflow for a nearby stream (baseflow) and recharge by seepage from surface water bodies [20,21]. More detailed descriptions of the model are given by Arnold et al. [8] and in the SWAT theoretical documentation [20].
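The HRU idea described in this section (overlaying soil, slope and land use maps and grouping areas that share the same combination) can be illustrated with a minimal Python sketch. The function name, cell identifiers and class labels are hypothetical and chosen only for illustration; SWAT itself delineates HRUs within each subbasin and typically applies area thresholds, which are omitted here.

```python
from collections import defaultdict


def delineate_hrus(cells):
    """Group map cells into HRU-like units by unique attribute combination.

    `cells` maps a cell id to a (soil, slope_class, land_use) tuple.
    This is an illustrative simplification, not SWAT's actual routine.
    """
    hrus = defaultdict(list)
    for cell_id, combination in cells.items():
        # Cells sharing the same soil, slope class and land use form one unit.
        hrus[combination].append(cell_id)
    return dict(hrus)


# Hypothetical cells for one subbasin.
example_cells = {
    1: ("loam", "0-2%", "agriculture"),
    2: ("loam", "0-2%", "agriculture"),
    3: ("clay loam", "0-2%", "agriculture"),
    4: ("loam", "0-2%", "woodlot"),
}
example_hrus = delineate_hrus(example_cells)
```

In this toy case, the four cells collapse into three units because cells 1 and 2 share the same soil, slope class and land use.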

APEX
The APEX model was developed to extend the Environmental Policy Integrated Climate (EPIC) model's [7] capabilities to whole farms and small watersheds. The model consists of 12 major components: climate, hydrology, crop growth, pesticide fate, nutrient cycling, erosion-sedimentation, carbon cycling, management practices, soil temperature, plant environment control, economic budgets and subarea/routing. Management capabilities include sprinkler, drip or furrow irrigation, drainage, furrow diking, buffer strips, terraces, waterways, fertilization, manure management, lagoons, reservoirs, crop rotation and selection, cover crops, biomass removal, pesticide application, grazing and tillage. The simulation of liquid waste applications from concentrated animal feeding operation (CAFO) waste storage ponds, or from lagoons, is a key component of the model. Stockpiling and subsequent land application of solid manure in feedlots or other animal feeding areas can also be simulated in APEX. In addition to routing algorithms, groundwater and reservoir components have been incorporated into APEX. The routing mechanisms provide for the evaluation of the interactions between subareas involving surface runoff, return flow, sediment deposition and degradation, nutrient transport and groundwater flow. Water quality in terms of soluble and organic N and P and pesticide losses may be estimated for each subarea and at the watershed outlet.
Williams [7] provided the first qualitative description of the APEX model, which included a description of the major components of the model, including the manure management component. Williams et al. [23] expanded the qualitative descriptions of the model and provided an overview of the manure erosion and routing components, including some mathematical description. Williams and Izaurralde [24] provided an exhaustive, qualitative description of the model coupled with mathematical theory for several of the components. Complete theoretical descriptions of APEX were initially compiled by Williams et al. [25] and Williams and Izaurralde [24]; Williams et al. [26] provided an updated, in-depth theoretical manual for the latest APEX model (version 0604). Recently, a graphical user interface for ArcGIS, ArcAPEX, has been developed to be used as a pre-processor and data entry tool.

MIKE SHE
MIKE SHE [6] is a deterministic, physically-based, distributed model for the simulation of different processes of the land phase in the hydrologic cycle. The hydrological processes are modelled by finite difference representations of the partial differential equations for the conservation of mass, momentum and energy, in addition to some empirical equations [4]. The MIKE SHE modeling system simulates hydrological components, including the movement of surface water, unsaturated subsurface water, evapotranspiration, overland and channel flow, saturated groundwater and exchanges between surface water and groundwater. With regard to water quality, the system simulates sediment, nutrient and pesticide transport in the model area. The model also simulates water use and management operations, including irrigation systems, pumping wells and various water control structures. A variety of agricultural practices and environmental protection alternatives may be evaluated using the many add-on modules developed at the Danish Hydraulic Institute (DHI). Model components describing the different parts of the hydrologic cycle can be used individually or in combination, depending on the scope of the study [27]. To account for the spatial variations in catchment properties, MIKE SHE represents the basin horizontally by an orthogonal grid network and uses a vertical column at each horizontal grid square to describe the variation in the vertical direction. This is achieved by dividing the catchment into a large number of discrete elements or grid squares, then solving the equations for the state variables for every grid square into which the study area is divided. To run the model, several parameters and variables have to be given as input for each cell [28].
The system has no limitations in terms of watershed size. The modeling area is divided into polygons based on land use, soil type and precipitation region. Most data preparation and model set-up can be completed using GIS, ArcView, or MIKE SHE's built-in graphic pre-processor.
The system has a built-in graphics and digital post-processor for model calibration and evaluation of both current conditions and management alternatives. Animation of model scenarios is another useful tool for analyzing and presenting results. The MIKE SHE model makes predictions that are distributed in space, with state variables that represent local averages of storage, flow depths or hydraulic potential. Because of the distributed nature of the model, the amount of input data required to run the model is rather large, and it is rare to find a watershed where all input data required to run the model have been measured [4].

Model Performance Evaluation
In order to calibrate and validate the models and for comparison purposes, some quantitative information is required to measure model performance. In this study, the streamflow data measured at the outlet of the watershed were used to assess the model performance. The performance assessment was based on the water balance closure of the watershed, the agreement of the overall shape of the time series of discharge together with the total accumulated volumes and the value of the statistical performance indices [29][30][31], such as the root mean square error (RMSE), the modeling efficiency (EF; the Nash-Sutcliffe efficiency, NSE) and the goodness of fit (R²).
The RMSE (Equation (1)) indicates a perfect match between observed and predicted values when it equals 0 (zero), with increasing RMSE values indicating an increasingly poor match. Singh et al. [31] stated that RMSE values less than half the standard deviation of the observed (measured) data might be considered low and indicative of a good model prediction. The Nash-Sutcliffe efficiency coefficient (NSE) ranges between −∞ and 1. It indicates a perfect match between observed and predicted values when NSE = 1 (Equation (2)). Values between 0.0 and 1.0 are generally viewed as acceptable levels of performance, whereas values less than 0.0 indicate that the mean observed value is a better predictor than the simulated value, which indicates unacceptable performance. The coefficient of determination, R² (Equation (3)), which ranges between 0 and 1, describes the proportion of the variance in the measured data that is explained by the model, with higher values indicating less error variance. Typically, R² > 0.5 is considered acceptable [32,33]. The percentage of bias (PBIAS) measures the average tendency of the simulated data to be larger or smaller than their observed counterparts [34]. The optimal value of PBIAS is 0.0, with low magnitude values indicating an accurate model simulation. Positive values indicate under-estimation bias, and negative values indicate over-estimation bias [34]. The RMSE-observations standard deviation ratio (RSR) is calculated as the ratio of the RMSE to the standard deviation of the measured data. RSR varies from the optimal value of 0 to a large positive value. The lower the RSR, the lower the RMSE and the better the model simulation performance.

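$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2} \tag{1}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2} \tag{2}$$

$$R^2 = \frac{\left[\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)\right]^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2 \, \sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2} \tag{3}$$

where $n$ is the number of observations in the period under consideration, $O_i$ is the i-th observed value, $\bar{O}$ is the mean observed value, $P_i$ is the i-th model-predicted value and $\bar{P}$ is the mean model-predicted value.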
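The performance indices described in this section can be computed with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the formulas are the standard textbook forms of RMSE, NSE, R², PBIAS and RSR (the source does not print the PBIAS and RSR equations, so those two are assumptions), and the sign convention for PBIAS follows the text, with positive values indicating underestimation.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1 is a perfect match."""
    o_bar = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - residual / spread


def r_squared(obs, sim):
    """Coefficient of determination (squared Pearson correlation)."""
    o_bar, s_bar = _mean(obs), _mean(sim)
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(obs, sim))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_s = sum((s - s_bar) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def pbias(obs, sim):
    """Percent bias; positive means the model underestimates (assumed form)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def rsr(obs, sim):
    """RMSE divided by the standard deviation of observations (assumed form)."""
    o_bar = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - o_bar) ** 2 for o in obs)
    return math.sqrt(residual) / math.sqrt(spread)


# Made-up flows: the simulation is exactly 1 unit too high every day.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
```

With these made-up values, the simulation tracks the shape of the observations perfectly (R² of 1) while overestimating every value, so NSE is low and PBIAS is negative. This is the kind of situation in which R² alone gives a misleading impression of model quality.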

Model Calibration and Validation
In order to evaluate the models, the first year of the data (1989) served to initialize each model, and the following two four-year periods were used, respectively, to validate and calibrate it. Calibration of SWAT was performed in two steps: first the average annual water balance was calibrated, and then the hydrograph shapes of the daily streamflow. This was carried out in a logical order according to the most sensitive parameters, based on the sensitivity analysis.
In order to obtain more realistic and physically meaningful results, the observed total flow was separated into two components, surface runoff and baseflow, using an automated baseflow digital filter separation technique [35,36]. The Base Flow Index (BFI) is then defined as the observed ratio of baseflow to total water yield. The baseflow is estimated to be 40% of the total annual streamflow for this watershed. Surface runoff was calibrated by adjusting the curve numbers for the different soils in the watershed under the conditions prevailing in the region and, then, using the soil available water (SAW) and soil evaporation compensation factor. In the next step, the baseflow component was calibrated by changing the "revap" coefficient (water in shallow aquifer returning to root zone), which controls the water movement from the shallow aquifer into the unsaturated zone. The temporal flow was then calibrated by changing the transmission losses for the channel hydraulic conductivity and the baseflow alpha factor, which is a direct index of groundwater flow response to changes in recharge [37]. Since the Canagagigue Creek Watershed is subject to significant snowfall and snowmelt during the winter and early spring, certain parameters related to the snow water mass balance were investigated with regard to their sensitivity to surface runoff, base flow, actual evapotranspiration and streamflow, through a review of the pertinent literature. For SWAT, these parameters were ESCO (soil evaporation compensation factor), SMTMP (snow fall temperature), TIMP (snow pack temperature lag factor), SMFMN (melt factor for snow on 21 December) and SMFMX (melt factor for snow on 21 June). The range of values for calibration of the SWAT model is listed in Table 1. All calibration steps applied to the SWAT model were in line with the recommended calibration steps listed in the SWAT User Manual 2000 [38].
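The baseflow separation and BFI calculation mentioned above can be sketched as follows. The source does not give the filter equation, so this example assumes a one-parameter recursive digital filter of the Lyne-Hollick type, run as a single forward pass with a filter parameter of 0.925; the function names, the initial condition and the sample flows are hypothetical, and published implementations commonly use several forward and backward passes.

```python
def separate_baseflow(flow, beta=0.925):
    """Return a baseflow series from total streamflow (single forward pass).

    Assumed filter: quick_t = beta * quick_{t-1}
                              + 0.5 * (1 + beta) * (flow_t - flow_{t-1}),
    with quickflow constrained to lie between 0 and the total flow.
    """
    if not flow:
        return []
    quick_prev = 0.0          # assumption: no quickflow on the first day
    baseflow = [flow[0]]
    for t in range(1, len(flow)):
        quick = beta * quick_prev + 0.5 * (1.0 + beta) * (flow[t] - flow[t - 1])
        quick = min(max(quick, 0.0), flow[t])
        baseflow.append(flow[t] - quick)
        quick_prev = quick
    return baseflow


def baseflow_index(flow, beta=0.925):
    """Ratio of total baseflow volume to total streamflow volume."""
    return sum(separate_baseflow(flow, beta)) / sum(flow)


# Made-up daily flows with one storm peak.
daily_flow = [1.0, 1.0, 10.0, 6.0, 3.0, 2.0, 1.5, 1.2]
daily_base = separate_baseflow(daily_flow)
```

A steady flow series yields a BFI of 1, while a flashy series with sharp peaks yields a lower value, which is the behaviour the index is meant to capture.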
In order to calibrate the MIKE SHE model, the snowmelt constants (degree-day factor and threshold melting temperature), as well as the Manning roughness coefficient, were adjusted to match the simulated and observed runoff. An adjustment in the timing of the peaks was attempted by increasing the Manning number (M, the reciprocal of Manning's n, which is the form of the coefficient that MIKE SHE takes as input) by 20% over the entire watershed, thus reducing the surface roughness and increasing the surface runoff velocity. Table 1 shows the MIKE SHE model parameters subjected to calibration.
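To show how the two snowmelt constants named above interact, the following Python sketch implements a generic degree-day melt calculation. It is not MIKE SHE's own routine: the function name, the default parameter values and the sample temperatures are hypothetical, and snow accumulation is left out for brevity.

```python
def degree_day_melt(temps_c, snow_storage_mm, ddf=2.0, t_melt=0.0):
    """Daily melt (mm) from a snowpack using a simple degree-day rule.

    ddf    : degree-day factor in mm per degC per day (illustrative value)
    t_melt : threshold melting temperature in degC (illustrative value)
    Returns the melt series and the snow storage left at the end.
    """
    storage = snow_storage_mm
    melt_series = []
    for temp in temps_c:
        # Potential melt is proportional to the excess over the threshold.
        potential = ddf * max(temp - t_melt, 0.0)
        # Actual melt cannot exceed the snow still in storage.
        melt = min(potential, storage)
        storage -= melt
        melt_series.append(melt)
    return melt_series, storage


melt, remaining = degree_day_melt([-2.0, 1.0, 3.0], snow_storage_mm=5.0)
```

Raising the degree-day factor releases the snowpack faster, and raising the threshold temperature delays the onset of melt, so both constants shift the timing and size of spring runoff peaks.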
To calibrate the APEX model, a sensitivity analysis on flow parameters in the model showed that certain parameters, which are presented in Table 1, are more sensitive than others. Among these parameters, the curve number for moisture Condition 2, or average curve number (CN2), is the most influential for runoff. Evapotranspiration was estimated using the Penman-Monteith method. The other parameters in Table 1 were also fine-tuned within the recommended range, which resulted in a better match between the observed and simulated flow data in the Canagagigue Watershed. In addition, the parameters affecting CN, such as soil hydrological class and land use, were modified in some of the HRUs during the calibration.
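Because the curve number is the most influential runoff parameter here, a short sketch of the standard SCS curve number relation may help show why. The example uses the common metric form with an initial abstraction of 0.2 S; this is the textbook equation, not code taken from APEX or SWAT, and the function name and sample values are hypothetical.

```python
def scs_runoff_mm(rain_mm, cn, ia_ratio=0.2):
    """Surface runoff depth (mm) from the SCS curve number method.

    S  = 25400 / CN - 254   (potential retention, mm)
    Ia = ia_ratio * S       (initial abstraction, mm)
    Q  = (P - Ia)^2 / (P - Ia + S)  when P > Ia, otherwise 0
    """
    retention = 25400.0 / cn - 254.0
    initial_abstraction = ia_ratio * retention
    if rain_mm <= initial_abstraction:
        return 0.0
    excess = rain_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


# Same 50 mm storm on two hypothetical land covers.
runoff_cn70 = scs_runoff_mm(50.0, 70)
runoff_cn80 = scs_runoff_mm(50.0, 80)
```

For the same storm, a higher curve number gives a smaller retention and therefore more runoff, which is why small changes to CN2 during calibration have a large effect on simulated surface runoff.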

Results and Discussion
Daily streamflow data from 1 October 1994 to 30 September 1998 were used for calibration, and the remaining data, from 1 October 1990 to 30 September 1994, were used to validate model performance. The calibration years were chosen for the completeness of their observed data and the inclusion of representative years (normal, wet and dry).
The watershed water balance for the calibration and the validation period is presented in Table 2. On average, SWAT overestimated and MIKE SHE underestimated the mean annual flow rate. APEX underestimated the river flow rate in the calibration period, but overestimated it during the validation period. The scatter plots of the observed and simulated monthly discharges (mm) for the three models are plotted in Figures 3 and 4 for the calibration and the validation periods, respectively. On the basis of the visual analysis of the observed and predicted runoff (Figures 3 and 4), the overall simulation appears to be reasonably good. Observed and simulated daily and monthly average streamflow using SWAT, MIKE SHE and APEX for the calibration and validation periods are presented in Figures 5-8. Based on Figures 5 and 7, one can conclude that with respect to the mean observed discharge assessed under calibration conditions, the models yielded comparable results. The performance of the models with respect to simulated river discharge was further examined using statistical criteria, applied to the calibration and validation periods. Model calibration and validation statistics, comparing observed and simulated flows for monthly and daily time intervals, are presented in Tables 3 and 4. These statistical coefficients (Tables 3 and 4) show that the fully-distributed physically-based MIKE SHE model performed better than the semi-distributed SWAT and APEX models during both calibration and validation. As might be expected, all three models performed slightly better in the calibration period than in the validation period.
Based on RMSE and R² values, all three models performed better for monthly comparisons than daily ones. On a monthly basis, the R² for APEX was slightly better than that for SWAT or MIKE SHE; however, the converse was the case for the RMSE and NSE. This shows that although the APEX prediction follows trends in the observed data, the deviation of its results from the observed values is high. For daily predictions, all statistical parameters show better performances with the MIKE SHE results.
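The monthly acceptance thresholds quoted with Table 3 (NSE ≥ 0.65, PBIAS within ±10% and RSR ≤ 0.60, attributed there to Moriasi et al.) can be written as a simple check. The function name and the sample statistics below are hypothetical and are not values from Tables 3 or 4.

```python
def meets_monthly_flow_criteria(nse, pbias_percent, rsr):
    """Return True when all three monthly thresholds quoted in the text hold."""
    return nse >= 0.65 and abs(pbias_percent) <= 10.0 and rsr <= 0.60


# Made-up statistics for two hypothetical model runs.
run_a_ok = meets_monthly_flow_criteria(nse=0.80, pbias_percent=-4.0, rsr=0.45)
run_b_ok = meets_monthly_flow_criteria(nse=0.70, pbias_percent=12.5, rsr=0.55)
```

In the second made-up run, NSE and RSR pass but the bias is too large, so the run fails the combined test. Checking the three criteria together guards against accepting a model on the strength of a single favourable statistic.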

Conclusions
The observed mean daily discharge was used to examine the performance of the fully distributed MIKE SHE model and the semi-distributed SWAT and APEX models. All three models require a fair amount of input data and model parameters. In order to understand their limitations and advantages, these widely used watershed management models were tested using the same flow data drawn from a gauging station at the outlet of the Canagagigue Creek Watershed, in Ontario, Canada. The performance of the three models was tested using both qualitative (graphical) and quantitative (statistical) methods.
For the comparison, use was made of the discharge monitored at the Floradale station, located at the outlet of the Canagagigue Creek Watershed, for the period of 1990-1998. One year of data was used to initialize the models, while from the eight-year record of daily discharge values, four years were used for calibration of the models and the remaining four years to validate them.
All three models are able to simulate the hydrology of the watershed in an acceptable way. The calibration results for the three models were similar, though the models differed in concept and spatial distribution. Notwithstanding their similarity in modelling capacity, a comparative analysis showed the MIKE SHE model to be slightly better at predicting the overall variation in streamflow. The second best model was SWAT; its performance only differed from that of MIKE SHE in the validation period. APEX performance in predicting daily mean streamflow was not as good as that of the other models. This can be attributed to the fact that it was originally developed for small-scale watersheds with a low concentration time. Therefore, APEX calculates monthly flow rates better than daily flow rates. Both the SWAT and APEX models are based on the curve number (CN) method for estimating surface runoff, so it was expected that both models would give relatively similar results. The poorer performance of APEX may be due to the fact that SWAT is more flexible to calibrate than APEX. For example, in SWAT, the curve number can be manually changed to better simulate the observed surface runoff, but in APEX, the CN values are calculated based on its components and cannot be entered directly.

Figure 1. Location of the study area in Grand River Basin and the river network.

Figure 2. Soil and land use classifications in Canagagigue Watershed.

Figure 6. Observed and simulated monthly average streamflow using SWAT, MIKE SHE and APEX for the validation period (1990-1994).

Figure 8. Observed and simulated daily average streamflow using SWAT, MIKE SHE and APEX for the validation period (1990-1994).

Table 1. Calibrated values of the adjusted parameters for streamflow calibration of the Soil and Water Assessment Tool (SWAT) model for the Canagagigue Creek Watershed. APEX, Agricultural Policy/Environmental Extender.

Table 2. Watershed water balance during the calibration period for MIKE SHE, APEX and SWAT.

Table 2. Cont. Flow_sim: simulated flow by the models; surface runoff_sim: simulated surface runoff by the models.

Better model performances are realized if the values of RMSE are closer to zero, R² and EF are close to unity and PBIAS and RSR have small values. According to Moriasi et al. (2007b), a model is considered calibrated for flow if monthly NSE ≥ 0.65, PBIAS ≤ ±10% and RSR ≤ 0.60. Therefore, all three models, MIKE SHE, SWAT and APEX, were well calibrated, as shown by the statistics in Table 3.

Table 3. Monthly calibration and validation statistics for MIKE SHE, APEX and SWAT. RSR, RMSE-observations standard deviation ratio; NSE, Nash-Sutcliffe efficiency; PBIAS, percentage of bias.

Table 4. Daily calibration and validation statistics for MIKE SHE, APEX and SWAT.