A Comparison of Continuous and Event-Based Rainfall–Runoff (RR) Modelling Using EPA-SWMM

Abstract: This study compares event-based and continuous simulation modelling with the Storm Water Management Model (EPA-SWMM) in calculating total runoff hydrographs and direct runoff hydrographs. The Myponga upstream and Scott Creek catchments in South Australia were selected as the case study catchments, and model performance was assessed using a total of 36 streamflow events from the period 2001 to 2004. The goodness-of-fit of the EPA-SWMM models, developed using automatic calibration, was assessed using eight goodness-of-fit measures, including Nash–Sutcliffe efficiency (NSE), NSE of daily high flows (ANSE) and Kling–Gupta efficiency (KGE). The results suggest that event-based modelling with EPA-SWMM outperforms the continuous simulation approach in producing both the total runoff hydrograph (TRH) and the direct runoff hydrograph (DRH).


Introduction
Rainfall-runoff (RR) models are important tools for the planning, design and management of water resource systems. These models are used in a wide range of hydrological applications, from estimating catchment runoff to analyzing the impact of land-use change on runoff [1]. The methods used to synthesize the rainfall-runoff process differ from one model to another, and consequently a variety of model classifications exist. According to Brocca et al. [2], RR models can be classified based on their spatial structure (lumped versus semi-distributed or distributed), time representation (continuous time versus event-based) or process description (physically meaningful versus data-driven). Daniel et al. [3] adopted four different classifications for RR models based on: (i) parameter specification (deterministic or stochastic); (ii) the nature of the basic algorithms (empirical, conceptual or physically-based); (iii) spatial representation (lumped, semi-distributed or distributed); and (iv) temporal representation (event-based or continuous time). Similar classifications can be found in Elga et al. [4], Singh [5] and Wheater et al. [6]. Of the available classifications, those based on spatial and temporal representation are the most commonly adopted [7,8], while event-based (EB) and continuous simulation (CS) models are the most recognized categories within the temporal domain [4,5]. In EB modelling, the rainfall-runoff process in a catchment is simulated for a single rainfall or streamflow event, with durations ranging from several hours to several days. In CS modelling, by contrast, the rainfall-runoff process is simulated over a long period, ranging from a couple of months to several years and including both dry and wet seasons. Generally, in EB modelling only the infiltration process is modelled to account for losses, while in CS modelling the evapotranspiration loss is also accounted for [9].
CS and EB models both have advantages and limitations. One major advantage of CS modelling is that it is capable of accounting for antecedent conditions such as initial soil-moisture status, stream

Study Area
For this study, several South Australian catchments were initially considered. Two of them, Myponga upstream and Scott Creek, were finally selected as case study catchments because rain gauges are located in close proximity to them and data from these gauges are readily available. In addition, these catchments have been used frequently in hydrological, hydrogeological and climate change studies. For instance, the Myponga upstream catchment has been used for hydrological studies by Wella-Hewage [15] and Akhter and Hewa [17], and for climate change studies by Jones et al. [50]. Similarly, the Scott Creek catchment has been used for hydrogeological studies by Banks et al. [51] and Bestland et al. [52], and for climate change studies by Ward et al. [53]. Moreover, these catchments have distinctive hydrogeological characteristics, and RR modelling performance can therefore vary markedly between them [15,51,54].
The Myponga upstream catchment is located about 50 km south of the city of Adelaide. The whole catchment covers an area of 122 km², with the upstream and downstream areas covering 73 km² and 49 km², respectively [15,55]. The Myponga River is the major watercourse in the catchment, originating on the south-eastern side of the catchment and flowing into the Myponga Reservoir [56]. It is a highly seasonal river, yielding approximately 7,500,000 m³/year (MDF = 0.24 m³/s) [55]. Some of the tributaries of the Myponga River are intermittent, drying up completely during dry periods. The terrain in the catchment is made up of steep hills along the western boundary and rolling terrain on the eastern side of the Myponga River. The mid-section of the catchment comprises mostly rolling terrain with some low-lying marshy areas along the banks of the Myponga River [15,55]. The aquifer in the region is made up of Permian Sands, Tertiary Limestone and Quaternary sediments [54]. The Tertiary Limestone is the most productive aquifer in the catchment, with high yields and low salinity [15,54]. A variety of soil categories exists in the catchment, but loamy soil dominates. The average daily temperature in the catchment varies from 13 °C to 27 °C in summer and from 4 °C to 12 °C in winter [57]. Rainfall in the area is winter dominated. The mean annual rainfall and evaporation in the catchment are approximately 820 mm/year and 1100 mm/year, respectively [57].
Scott Creek is a subcatchment of the Onkaparinga catchment located about 17 km southeast of Adelaide. The catchment spreads over an area of 26.6 km². The runoff from Scott Creek is highly seasonal, being concentrated in the winter and spring seasons. The mean annual streamflow measured at the Scott Bottom gauging station is 3,710,000 m³/year (MDF = 0.12 m³/s) [15]. The topography of the catchment varies from steep slopes to mild rolling terrain [52]. A range of soil categories exists in the catchment. The northern and eastern sides of the catchment comprise loamy sand soils, while the north-western and southern parts are mostly characterised by loam. About 51% of the catchment land consists of loamy sand, 31% of loam and 15% of sandy loam [15,50]. The aquifer in the region is characterised by fractured rock aquifers which are overlain by a thin soil layer [51]. Groundwater plays a vital role in streamflow generation from the Scott Creek catchment [58]. The climate of the catchment is typically temperate, with cool moist winters and warm dry summers. The average daily temperature ranges from 14 °C to 27 °C in summer and from 8 °C to 14 °C in winter [57]. The mean annual rainfall and evaporation in the catchment are 992 mm/year and 1200 mm/year, respectively [57]. A location map of the Myponga upstream and Scott Creek catchments is shown in Figure 1.
Land-use distributions for the Myponga upstream and Scott Creek catchments are shown in Figure 2a,b. As can be seen in Figure 2, the majority of the land in the Myponga upstream catchment is used for livestock, which includes broad-scale grazing of cattle and sheep. According to Wella-Hewage [15], approximately 75% of the catchment area is used for grazing, 16% is covered by native vegetation, 7% is covered by native forest and 2.4% is used for horticulture. Grazing is also the major land use in the Scott Creek catchment, covering almost 50% of the catchment area, while natural vegetation covers 48%. Natural vegetation exists mostly on the steep slopes of the catchment. The remaining land uses include 0.5% plantation forest and 0.8% urban areas [15].

EPA-SWMM Model Description
The EPA-SWMM model was first developed in 1971 (SWMM1) and since then it has undergone four major upgrades: SWMM2; SWMM3; SWMM4; and SWMM5. The current version (SWMM5) has many attractive features including an integrated graphical environment to do the modelling task in a more convenient way. It also offers several post-analysis options such as displaying simulation result using graphs and tables, showing drainage area maps, scatter plots, profile plots, etc. Under version 5, the model has several releases. This study used release 5.1.011 to perform the analysis.
EPA-SWMM is a conceptual RR model that can be used for urban and rural catchments and for both EB and CS modelling. The model accounts for the spatial variability of catchment properties by dividing the catchment into a collection of smaller, homogeneous subcatchments. It consists of four groups of algorithms that describe the hydrological processes of a catchment: (i) surface runoff algorithms; (ii) loss algorithms; (iii) conveyance algorithms; and (iv) groundwater flow algorithms [11,40,41]. Surface runoff algorithms generate surface runoff hydrographs from subcatchments. The model uses nonlinear reservoir routing to compute surface runoff from each subcatchment [40]. Loss algorithms estimate stormwater losses in a subcatchment due to infiltration and evaporation. Evaporation/evapotranspiration losses are modelled as losses from three different levels: (i) the surface; (ii) the unsaturated zone; and (iii) the saturated zone [11,40]. Infiltration losses are calculated using one of three methods: the Horton method, the Green-Ampt method or the Soil Conservation Service (SCS) Curve Number method [41]. The Horton method assumes that the infiltration rate is at its maximum at the beginning of a storm, decreases exponentially with time and reaches a minimum constant (equilibrium) rate when the soil becomes fully saturated [11,40]. The Green-Ampt method describes the infiltration process as a balance between the changing water content of the soil over time and the change in hydraulic conductivity and diffusivity through the depth of the soil profile. The SCS Curve Number method is an empirical method based on the classification of several parameters describing the direct runoff and soil types for different areas across the United States. This method treats runoff as a function of the total precipitation, the potential maximum storage retention and the initial abstraction.
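The Horton and SCS Curve Number descriptions above can be illustrated with a minimal sketch. It uses the textbook forms of both methods rather than EPA-SWMM's internal implementation; the function names, the metric form of the retention formula and the initial abstraction ratio of 0.2 are assumptions made here, not values taken from this study.

```python
import math


def horton_rate(t_hours, f0, fc, k):
    """Horton infiltration capacity (mm/h) at time t since the storm began.

    f0: maximum (initial) rate, fc: minimum (equilibrium) rate, k: decay constant (1/h).
    """
    return fc + (f0 - fc) * math.exp(-k * t_hours)


def scs_runoff_depth(p_mm, cn, ia_ratio=0.2):
    """SCS Curve Number direct runoff depth (mm) for a storm total of p_mm.

    ia_ratio = 0.2 is the conventional initial abstraction ratio (an assumption here).
    """
    s = 25400.0 / cn - 254.0  # potential maximum retention (mm)
    ia = ia_ratio * s         # initial abstraction (mm)
    if p_mm <= ia:
        return 0.0            # all rainfall is abstracted, no direct runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)
```

For example, `horton_rate(0, 75, 5, 4)` returns the initial rate `f0`, and the rate approaches `fc` as time increases, which is the exponential decay described in the text.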
The subsurface flow algorithms control the infiltration of water to the aquifer and groundwater flow [11,40,41]. Aquifers are represented in the model by two zones: an unsaturated (upper) zone and a saturated (lower) zone [40,42]. The upper zone receives water from the subcatchment through infiltration. The flow from the unsaturated zone to the saturated zone is controlled by infiltration parameters which depend on the moisture content of the upper zone. Water in the saturated zone is routed as lateral inflow to the stream and as percolation into deep groundwater. Conveyance algorithms are used to route water through a conduit link. Flow within a conduit link is routed using one of three routing methods: steady flow routing, kinematic wave routing or dynamic wave routing [41,43]. Steady flow routing assumes that flow is uniform and steady within each computational time step, and hence does not account for channel storage, backwater effects, entrance/exit losses, etc. Kinematic wave routing assumes that the energy slope equals the bed slope. Each conduit can accommodate flows up to a maximum flow level; excess water is stored at the upstream node and routed when capacity becomes available. Dynamic wave routing solves the one-dimensional Saint-Venant flow equations, which couple a continuity equation with the full momentum equation. This method is capable of accounting for channel storage, backwater effects and entrance/exit losses [11,15,40–43]. The continuity equation is applied at each node of the conveyance system, while the momentum equation is applied along the conduit links.
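For reference, the one-dimensional Saint-Venant equations behind dynamic wave routing are commonly written as below. The notation is chosen here and is not taken from the source: $A$ is the flow cross-sectional area, $Q$ the flow rate, $H$ the hydraulic head, $S_f$ the friction slope, $g$ the gravitational acceleration, $x$ the distance along the conduit and $t$ time. Local loss terms are omitted for brevity.

```latex
\begin{align*}
\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} &= 0
  && \text{(continuity)} \\
\frac{\partial Q}{\partial t} + \frac{\partial (Q^2/A)}{\partial x}
  + gA\,\frac{\partial H}{\partial x} + gA\,S_f &= 0
  && \text{(momentum)}
\end{align*}
```

Kinematic wave routing corresponds to reducing the momentum equation to a balance between the bed slope and the friction slope, which is why it cannot represent backwater effects.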

Data
Rainfall-runoff modelling with EPA-SWMM requires rainfall and evaporation data as input to the model. There are several gauging stations in and around the Myponga upstream and Scott Creek catchments that are managed by the Bureau of Meteorology (BOM) and the Department for Environment and Water (DEW). Some of these gauging stations record data on a daily time scale, while others record on a sub-daily scale. Available data for these gauging stations were collected from BOM and DEW. These data were assessed for quality; only a few gauges were found to have a continuous record on a daily scale, and these were selected for the modelling tasks. The selected gauging stations for both catchments are shown in Figure 2. Other data, including SRTM (Shuttle Radar Topography Mission) derived one-arc-second (~30 m) gridded digital elevation model (DEM) data, geological cross-section data of the catchments, soil category data and land-use data, were collected from Geoscience Australia and the South Australian Government's Spatial and Geographic Information System (GIS) databases.
Though the collected time series data of the selected gauging stations span several years, the complete series could not be used because: (i) some records are missing; and (ii) data from the selected gauges do not all coincide with each other, for reasons including shifted gauges and instrument malfunction. The records from 2000 to 2007 were found to be the longest continuous record graded as good quality by the agency, and were therefore selected for the study. These time series data were divided into three parts: the year 2000 was allocated as the warm-up period for continuous RR modelling [15,59], while the periods from 2001 to 2004 and from 2005 to 2007 were chosen as the model calibration and validation periods, respectively, in the EPA-SWMM modelling. A ratio of 70:30 was applied in allocating the data series to the calibration and validation periods. A warm-up period is allocated for a continuous RR model before the actual calibration is performed to enable the model parameters to adjust to the current hydrological conditions. This period may be shorter for wet catchments (ratio of evaporation over precipitation <0.90) and longer for dry catchments [59]. According to Rahman et al. [59], a wet catchment requires less warm-up time (~55 days) than a dry catchment (~298 days). From the calibration period (2001–2004), events that produced a peak flow above a threshold value, set at five times the mean annual flow, were extracted for the EB model calibration. As the peak flows of the selected events have an Annual Exceedance Probability (AEP) greater than 0.5, these events are considered to be the more frequent events in the context of flood frequency studies. Finally, a total of 20 streamflow events at the Myponga upstream catchment and 16 at the Scott Creek catchment were selected, as indicated in Figure 3a,b. The selected events covered a wide range of peak flows, from 1.2 m³/s to 8.8 m³/s at the Myponga upstream catchment and from 0.7 m³/s to 5.7 m³/s at the Scott Creek catchment.
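The event selection rule can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names are hypothetical, peaks are identified simply as local maxima in a daily series, and using the MDF values quoted in the Study Area section (0.24 m³/s and 0.12 m³/s) as the mean flow is an assumption.

```python
def peak_threshold(mean_flow, factor=5.0):
    """Flow threshold (m3/s) as a multiple of the mean flow."""
    return factor * mean_flow


def find_event_peaks(daily_flow, threshold):
    """Return (index, flow) pairs for local maxima that exceed the threshold."""
    peaks = []
    for i in range(1, len(daily_flow) - 1):
        q = daily_flow[i]
        # A peak is at least as high as the previous day and higher than the next day.
        if q > threshold and q >= daily_flow[i - 1] and q > daily_flow[i + 1]:
            peaks.append((i, q))
    return peaks
```

With a mean flow of 0.24 m³/s the threshold is about 1.2 m³/s, and with 0.12 m³/s it is about 0.6 m³/s, which is in line with the smallest selected peaks reported above (1.2 m³/s and 0.7 m³/s).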
Even though the collected time series data were graded as good quality by the agency, they were further checked for errors. Bonnin et al. [60] suggested that good quality data checks should include: (i) a check for extreme values above a threshold; (ii) a real-data-check of values among others; (iii) dataset trend analysis; and (iv) a study of the cross correlation between the gauges. Data were checked for outliers, normality and homogeneity using parametric and nonparametric tests [61,62]. Outliers were defined as values over M + 3SD (where M is the mean and SD is the standard deviation) and were compared with values from other gauging stations. If a similar record existed at other gauging stations, the value was not treated as an outlier; otherwise it was replaced by either the mean value or M + 3SD, as appropriate in comparison with the other gauging stations. Kolmogorov-Smirnov and Shapiro-Wilk tests of normality were applied to the datasets and showed that none of the gauges' data were normally distributed. However, the standard normal homogeneity test (SNHT) indicated that there was no significant difference between the variances of the different rain gauges' data for both catchments at the 0.05% significance level. In addition, correlation analysis indicated that significant correlation exists among all rain gauges' data at both catchments. As shown in Figure 4a,b, trend analysis suggests that, despite the fluctuations in annual rainfall and annual streamflow, a general decreasing trend in rainfall was consistent with the decreasing trend in streamflow at both catchments.
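The M + 3SD screening rule can be sketched as below. This is an illustration only: the function names are hypothetical, the sample standard deviation is an assumption (the text does not say whether the sample or population form was used), and the cross-check against neighbouring gauges is left to the analyst rather than automated.

```python
from statistics import mean, stdev


def outlier_threshold(values):
    """Upper limit M + 3SD for a series of observations."""
    return mean(values) + 3 * stdev(values)


def flag_outliers(values):
    """Indices of values above M + 3SD (candidates to compare with other gauges)."""
    limit = outlier_threshold(values)
    return [i for i, v in enumerate(values) if v > limit]
```

Flagged values are only candidates. Following the text, a flagged value that is matched by a similar record at a nearby gauge would be kept as a genuine observation.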

Catchment Delineation and Model Setup
Catchment processes are highly variable at both spatial and temporal scales [63]. To account for spatial variability, the Myponga upstream and Scott Creek catchments were subdivided into thirty and ten subcatchments, respectively, using ArcMap software and 1-arc-second DEM data. The Spatial Analyst tools in ArcMap offer a range of functions to perform the catchment delineation. First, a depressionless DEM layer is created to avoid any internal drainage within the catchment. This layer is then used to create flow direction and flow accumulation layers in ArcMap, which show flow accumulation in streams, classify the streams based on flow accumulation and display the classification using a color ramp. On the streams, several pour points are identified where significant variation in flow accumulation occurs, which roughly indicates how many subcatchments should be created. There are no specific rules on the number of subcatchments required for modelling, but the RORB model manual suggests that between 5 and 20 subcatchments are enough to allow for spatial variation of terrain, rainfall and losses [12]. Hence, by interpreting the flow accumulation layer in ArcMap, it was apparent that thirty subcatchments for the Myponga upstream catchment and ten for the Scott Creek catchment were reasonable, and pour points were inserted accordingly. The flow direction, flow accumulation and pour point layers were then processed through a few more functions in ArcMap to create the delineated watershed. In the EPA-SWMM model, water from every part of a subcatchment flows towards a common node, and from that node it is routed to the next node through a conduit link between the nodes. Figure 5a,b presents the delineated subcatchments and main flow paths for the Myponga upstream and Scott Creek catchments.
After the subcatchment delineation, some model parameters, such as the width and slope of each subcatchment, conduit length and junction invert elevation, were extracted using GIS and DEM data. The percentages of impervious and pervious area were estimated by interpreting the land-use map. Manning's roughness for the impervious and pervious areas of each subcatchment was estimated from aerial photographs of the catchment. The initial values of the depression storage parameters were assigned within the values recommended in the EPA-SWMM manual for various land-use types [40,41]. Though the EPA-SWMM model allows irregular channel cross-sections to be modelled, field observations suggested that the stream cross-sections of the Myponga upstream and Scott Creek catchments can be simplified to a trapezoidal geometry. Therefore, conduits were sized by assuming a trapezoidal cross-section, and the depth of conduit was estimated as 3 m. In addition, some previous studies at the Myponga upstream and Scott Creek catchments used a trapezoidal cross-section in the model setup and obtained reasonable modelling performance [15,17]. Conduit roughness was initially assigned a value of 0.10 as per the recommended values in the EPA-SWMM manual [41,42]. The EPA-SWMM manual suggests that the junction invert should be at or below the channel invert [40,42]. Hence, the depth of the junction invert was assigned an arbitrary value of 4 m to keep it below the channel invert of 3 m. For infiltration modelling, the Horton infiltration method was selected in preference to the Green-Ampt and SCS Curve Number methods. Horton's method is widely adopted in infiltration modelling as it can best represent the infiltration capacity curve [15,64]. The initial values of the Horton infiltration parameters were assigned by interpreting the hydrological soil group map and using the recommended values provided by Akan [18].

Aquifer bottom elevation was obtained from the geological cross-section map of the catchments. The initial aquifer water table depth for each subcatchment was estimated as equal to the junction invert elevation. Other aquifer properties, such as field capacity, wilting point and hydraulic conductivity for the soil categories identified in the catchments, were obtained from the EPA-SWMM manual [40,41]. The initial soil moisture content in the aquifer was assigned an arbitrary value between the field capacity and the wilting point of the subsoil. Conductivity slope and seepage were assigned arbitrary values within the range specified in the EPA-SWMM manual [40,42]. Tension slope was initially set to the EPA-SWMM default value, and the initial value of the upper evaporation fraction was assigned an arbitrary value. The elevation of the ground surface was set 4 m higher than the node invert elevation. As the groundwater flow equation in the EPA-SWMM model is a user-defined power function [11,41], the initial values of the groundwater flow coefficient and exponent were set to the default values.
As shown in Figure 2, rainfall data were available from five rain gauges at the Myponga upstream catchment and four gauges at the Scott Creek catchment. To identify which station's rainfall data are most relevant to a given subcatchment, the contributing area for each gauging station was determined using the Thiessen-polygon method. This method is widely used both in Australia and around the world, and some previous studies at the Myponga upstream and Scott Creek catchments used it [15,17]. Consequently, subcatchments within a gauging station's Thiessen-polygon boundary were assigned rainfall from that gauging station. Unlike rain gauge stations, evaporation gauges are sparsely located, and therefore all subcatchments were assigned evaporation data from the closest evaporation gauge station.
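A Thiessen polygon contains all points that are closer to its gauge than to any other gauge, so the assignment can be approximated by a nearest-gauge search. The sketch below is a simplification under stated assumptions: each subcatchment is represented by a single centroid in projected coordinates, and the function names, subcatchment names and gauge names used with it are hypothetical.

```python
import math


def nearest_gauge(point, gauges):
    """Name of the gauge closest to point; gauges maps name -> (x, y)."""
    return min(gauges, key=lambda name: math.dist(point, gauges[name]))


def assign_rainfall(centroids, gauges):
    """Map each subcatchment name to the gauge whose Thiessen polygon holds its centroid."""
    return {sub: nearest_gauge(xy, gauges) for sub, xy in centroids.items()}
```

A subcatchment that straddles a polygon boundary is assigned to a single gauge by this shortcut, whereas an area-weighted overlay in GIS would split it between gauges.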
The EPA-SWMM model was set up at a daily time step because no suitable sub-daily time series data were available for the study period. For flow routing, the dynamic wave routing method was selected as it is theoretically the most accurate routing method, capable of simulating non-uniform, unsteady flow conditions. The routing time step in EPA-SWMM was set to 60 s. Although shorter routing time steps (e.g., 30 s) can provide comparatively more reliable results, they require a longer model run time and are hence not preferable for automatic calibration, where the model needs to run many times during parameter optimization. The EPA-SWMM model allows different time steps to be assigned for computing surface runoff in dry and wet periods. Therefore, a time step of 10 min was assigned for dry periods and a time step of 5 min for wet periods. As the year 2000 was treated as the warm-up period [59] for CS modelling, no events were selected from that year for EB modelling.
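The settings described above map onto entries in the options section of an EPA-SWMM input file. The sketch below renders such a section from a dictionary. The keyword names follow the SWMM 5 input format as understood here and should be checked against the manual for the release in use; the helper function is hypothetical and the listing is not the input file used in this study.

```python
def swmm_options_section(options):
    """Render an [OPTIONS] block as text from a keyword -> value mapping."""
    lines = ["[OPTIONS]"]
    for key, value in options.items():
        lines.append(f"{key:<20} {value}")
    return "\n".join(lines)


# Values taken from the text; keyword spellings are assumptions to verify.
options = {
    "FLOW_ROUTING": "DYNWAVE",   # dynamic wave routing
    "INFILTRATION": "HORTON",    # Horton infiltration method
    "ROUTING_STEP": "0:01:00",   # 60 s routing time step
    "DRY_STEP": "00:10:00",      # runoff time step in dry periods
    "WET_STEP": "00:05:00",      # runoff time step in wet periods
}

print(swmm_options_section(options))
```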

Model Calibration, Validation, Parameter Sensitivity and Goodness-Of-Fit Tests
Both the EB and CS models were calibrated using automatic calibration. A nonlinear parameter optimization tool, PEST (Parameter ESTimation), was integrated with EPA-SWMM to perform the optimization. To start the process, PEST requires three files: (i) a PEST instruction file; (ii) a PEST template file; and (iii) a PEST control file [65]. The instruction file tells PEST how to read results from the EPA-SWMM output file. For each model run during the optimization process, the PEST engine writes new values of the selected calibration parameters to the EPA-SWMM model input file using the template file. The PEST control file contains the calibration data, the initial values and the upper and lower bounds of the calibration parameters.
As shown in Figure 6, during the automatic calibration process, the PEST engine takes control of the EPA-SWMM model and performs a number of model runs within the assigned parameters boundary until a termination criterion is reached and the optimized values are identified. The termination criteria used in PEST are: (i) convergence of adjustable parameters to their optimal values (parameter convergence); (ii) insignificant relative parameter change over the successive iterations (function convergence); (iii) insignificant objective function reduction over successive iterations; and (iv) exceedance of the upper limit of the number of optimization iterations. The objective function used in PEST is given by Equation (1) [65].
$$\Phi(B) = \sum_{i=1}^{n} \left[ W_i \left( x_i - \hat{x}_i \right) \right]^2 \quad (1)$$
where $B$ is a vector comprising the parameter values being estimated, $n$ is the total number of observations, $x_i$ and $\hat{x}_i$ are the observed and simulated flow ordinates for the ith observation and $W_i$ is the weight of the ith observation.
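A minimal sketch of this kind of objective function is given below, assuming the weighted least-squares form that PEST documents (the sum of squared weighted residuals). The function name and the default of unit weights are assumptions made for illustration.

```python
def pest_objective(observed, simulated, weights=None):
    """Sum of squared weighted residuals between observed and simulated flows."""
    if weights is None:
        weights = [1.0] * len(observed)  # equal weighting (assumed default)
    if not (len(observed) == len(simulated) == len(weights)):
        raise ValueError("observed, simulated and weights must have equal length")
    return sum((w * (o - s)) ** 2 for o, s, w in zip(observed, simulated, weights))
```

Raising the weight of an observation, for example a peak flow ordinate, increases its influence on the fitted parameters, since its residual contributes more to the total.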
Though several parameters were defined in the EPA-SWMM modelling, not all of them need to be calibrated, because some are insensitive compared with others and changing their values during optimization does not make any significant difference to the model output [15,40,65]. Therefore, the most appropriate set of parameters to calibrate was identified based on parameter sensitivity. Parameters that were extracted from GIS or taken from other available information were locked in the model and excluded from calibration. During the optimization process, at each model run, PEST slightly changes the parameter values from the previous run and observes the effect on the objective function. By comparing the objective function between two successive iterations, it computes the parameter sensitivities, which are recorded in the PEST sensitivity file. The sensitivities of the calibration parameters were investigated using the PEST sensitivity file, and the most sensitive parameters were identified within the first couple of runs. Finally, the calibration was carried out using these sensitive parameters, while insensitive parameters were fixed at physically sensible values. The parameters identified as the most sensitive during the sensitivity analysis are shown in Table 1. The sensitivity analysis in PEST showed that the most sensitive parameters are the same in both CS and EB modelling, but their level of sensitivity differs.
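The idea of perturbing a parameter slightly and observing the effect on the objective function can be sketched with a one-at-a-time finite difference. This is a simplified stand-in, not PEST's own calculation, which derives sensitivities from its Jacobian matrix. The function names and the 1% perturbation are assumptions.

```python
def sum_squared_error(observed, simulated):
    """Unweighted sum of squared residuals."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))


def relative_sensitivities(model, params, observed, step=0.01):
    """Change in the objective per unit relative change of each parameter.

    model: callable taking a parameter dict and returning simulated values.
    step: relative perturbation applied to one parameter at a time.
    """
    base = sum_squared_error(observed, model(params))
    result = {}
    for name, value in params.items():
        perturbed = dict(params)
        perturbed[name] = value * (1.0 + step)  # perturb one parameter only
        changed = sum_squared_error(observed, model(perturbed))
        result[name] = abs(changed - base) / step
    return result
```

A parameter whose score is close to zero has little influence on the fit and is a candidate for being fixed at a physically sensible value, as described in the text.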
initial values and upper and lower boundaries of the calibration parameters.
To assess the validity of the calibrated model, a model validation process was followed. The CS model was validated over the period from 2005 to 2007. During validation, the model was run using rainfall and evaporation data for that period without changing the parameter values obtained through model calibration.
In contrast, for the EB model validation, average model parameters, calculated from the values of the calibrated events (20 events for the Myponga catchment and 16 events for the Scott Creek catchment), were adopted. Since the events used in the EB model calibration were from the wet part of the year, validation events were purposely selected from the wet period. Therefore, it was expected that the averaged parameters would generally reflect the initial catchment conditions and that the validation would be reasonable.
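The averaging step used for EB validation can be sketched as follows. The parameter names and values are hypothetical placeholders, not the calibrated values from this study, and a simple arithmetic mean over events is assumed.

```python
from statistics import fmean


def mean_parameters(event_params):
    """Average each calibrated parameter over all calibration events.

    event_params is a list of dicts, one per event, all sharing the
    same parameter names.
    """
    if not event_params:
        raise ValueError("at least one calibrated event is required")
    names = set(event_params[0])
    if any(set(event) != names for event in event_params):
        raise ValueError("all events must define the same parameters")
    return {name: fmean(event[name] for event in event_params) for name in sorted(names)}


if __name__ == "__main__":
    # Hypothetical calibrated values for three events.
    events = [
        {"channel_roughness": 0.030, "unsat_moisture": 0.20},
        {"channel_roughness": 0.040, "unsat_moisture": 0.30},
        {"channel_roughness": 0.050, "unsat_moisture": 0.25},
    ]
    print(mean_parameters(events))
```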
The performances of CS and EB modelling were evaluated by applying a total of 8 goodness-of-fit tests, listed in Table 2. Among these tests, the Nash-Sutcliffe Efficiency (NSE) is widely used in comparable hydrological studies [9,15,17,47,66–69]. However, according to Jain and Sudheer [67], single values of NSE estimated using the full series can be higher for poor models or lower for good models. To overcome this problem, NSE was calculated separately for wet seasons (ANSE) and for dry seasons (NSE of low flows), in addition to NSE for the full series [15,66,70]. Moreover, Gupta et al. [71] reported that NSE is highly sensitive to large runoff volumes and may not be linearly linked with model performance. To rectify this issue, they introduced a new goodness-of-fit measure called the Kling-Gupta efficiency (KGE), which defines the modelling performance in terms of three statistical components: the linear correlation, the bias and the variability of flow. This measure has been widely adopted in recent studies including Saber and Yilmaz [69], Massari et al. [70], Kling et al. [72], Camici et al. [73], Dick et al. [74] and Thirel et al. [75]. In addition to NSE and KGE, the coefficient of determination (R²) [17,66,69,76], RSR, which is defined as the ratio of the root mean square error (RMSE) to the standard deviation of the observed series [77–79], the absolute volume error [15,47,80,81] and the absolute error in peak flow rate [47,80] were also used to assess the modelling performance.
Table 2. Selected goodness-of-fit measures to assess the modelling performance.

Goodness-of-Fit Tests | Equation | Interpretation
NSE, ANSE, NSE of low flows, KGE | | Value ranges from −∞ to 1, where 1 means a perfect fit and 0 means the model simulation is only as accurate as the mean of the observed dataset. A value below 0 means the observed mean is a more accurate predictor than the model.
R² | | Value ranges from 0 to 1, where 0 means the poorest fit and 1 means the best fit.
RSR | | Value ranges from 0 to +∞, where 0 means a perfect fit.
Absolute volume error | | Value ranges from 0 to +∞, where 0 means the best fit and a value greater than zero means the model underestimates or overestimates the overall flow volume.
Absolute error in peak flow rate | $1 - Q_{sim,max}/Q_{obs,max}$ | Value ranges from 0 to +∞, where 0 means the best fit and a value greater than zero means the model underestimates or overestimates the peak flow.
where n is the total number of data points; $Q_{obs,i}$ and $Q_{sim,i}$ are the ith observed and simulated flow values; $Q_{m,obs}$ and $Q_{m,sim}$ are the means of the observed and simulated flows; $Q_{obs,max}$ and $Q_{obs,min}$ are the maximum and minimum values in the dataset; and ε is a small fraction of the interannual mean flow. The term ε is introduced to avoid the error that may arise during the log transformation of zero values present in the observed and the simulated time series [15,70,82]. In the equation for KGE, r is the linear correlation; β is the bias ratio, $\beta = Q_{m,sim}/Q_{m,obs}$; and δ is the variability ratio between simulated and observed flow, $\delta = CV_{sim}/CV_{obs}$, CV being the coefficient of variation.
Note: Higher scores of NSE, ANSE, NSE of low flows and KGE indicate better model performance. Higher scores of RSR, absolute volume error and absolute error in peak flow rate indicate poorer model performance.
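Several of these measures can be sketched in a few lines. The functions below use the commonly published forms of NSE, RSR and KGE (with the β and δ ratios defined above) and assume that the volume error is the absolute difference in total flow relative to the observed total. These forms may differ in detail from the equations in Table 2, and the function names are illustrative only.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    m = _mean(obs)
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - m) ** 2 for o in obs)


def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observed series."""
    m = _mean(obs)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim))) / math.sqrt(
        sum((o - m) ** 2 for o in obs)
    )


def kge(obs, sim):
    """Kling-Gupta efficiency with bias ratio beta and CV ratio delta."""
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / len(obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / len(sim))
    r = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / (len(obs) * so * ss)
    beta = ms / mo
    delta = (ss / ms) / (so / mo)
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (delta - 1.0) ** 2)


def abs_volume_error(obs, sim):
    """Assumed form: |sum(sim) - sum(obs)| / sum(obs)."""
    return abs(sum(sim) - sum(obs)) / sum(obs)


def abs_peak_error(obs, sim):
    """Absolute value of 1 - Q_sim,max / Q_obs,max."""
    return abs(1.0 - max(sim) / max(obs))


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [0.5, 1.0, 1.5, 2.0]  # hypothetical run that halves every flow
    print(nse(observed, simulated), rsr(observed, simulated), kge(observed, simulated))
    print(abs_volume_error(observed, simulated), abs_peak_error(observed, simulated))
```

A simulation that is perfectly correlated with the observations can still score poorly on KGE when its mean is biased, which is the behaviour that motivates reporting KGE alongside NSE.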
Using these goodness-of-fit measures, the performance of both models was evaluated under two different scenarios: (i) performance in reproducing the observed streamflow hydrograph (total runoff hydrograph, TRH); and (ii) performance in reproducing the direct runoff hydrograph (DRH). In the first scenario, model performance was evaluated during the calibration period, since the calibration was done to match the observed streamflow hydrograph. To assess the models' performance under the second scenario, baseflow was removed from the observed and the simulated time series before the goodness-of-fit tests were conducted.
A variety of baseflow separation techniques are available in the literature [29–33]. Baseflow is commonly separated from time series records either graphically or by filtering. The graphical methods are generally employed to plot the baseflow component of the flood hydrograph [31,33]. They attempt to identify the points on the hydrograph where baseflow intersects the rising and falling limbs of the quick response, and in this way they delimit the entire baseflow hydrograph. Three graphical approaches can be highlighted: the constant discharge method, the constant slope method and the concave method [32]. The filtering methods, on the other hand, use a data filtering procedure to separate the baseflow component of a streamflow time series [29,30]. They are designed to generate a baseflow hydrograph for a long-term period of observations. The aim of this approach is to generate an objective, repeatable and easily automated index known as the baseflow index, which is the long-term ratio of baseflow to total streamflow. This study utilized a baseflow separation tool, "BFI+" [31], which uses the graphical local minimum method proposed by Sloto and Crouse [33].
The local minimum method searches for the minimum flow within a time interval and takes it as the baseflow component. The interval extends from 0.5(2N − 1) days before the day being considered to 0.5(2N − 1) days after it, where $N = A^{0.2}$, A being the watershed area in square miles [28]. Each such point is then connected by straight lines to the adjacent local minima, and the baseflow for each day between these points is estimated by linear interpolation. This method is illustrated in Figure 7.
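The procedure can be sketched for a daily flow series as follows. This is an illustrative implementation, not the BFI+ code. It assumes that the interval 2N is rounded to the nearest odd integer between 3 and 11 (as in the usual description of the Sloto and Crouse method), that windows are truncated at the ends of the record, that baseflow is held at the nearest local minimum before the first and after the last minimum, and that baseflow is never allowed to exceed total flow.

```python
def local_minimum_baseflow(flow, area_sq_miles):
    """Estimate daily baseflow by connecting local minima with straight lines."""
    if not flow:
        raise ValueError("flow series is empty")
    n = area_sq_miles ** 0.2
    # Assumed rounding rule: odd integer between 3 and 11 nearest to 2N.
    interval = min(range(3, 12, 2), key=lambda k: abs(k - 2.0 * n))
    half = (interval - 1) // 2  # 0.5 * (2N - 1) days on each side

    # A day is a local minimum if it is the lowest flow in its window.
    minima = []
    for i, q in enumerate(flow):
        window = flow[max(0, i - half): i + half + 1]
        if q == min(window):
            minima.append(i)

    base = [0.0] * len(flow)
    first, last = minima[0], minima[-1]
    for i in range(first):  # before the first minimum
        base[i] = min(flow[i], flow[first])
    for i in range(last, len(flow)):  # from the last minimum onward
        base[i] = min(flow[i], flow[last])
    for a, b in zip(minima, minima[1:]):  # linear interpolation between minima
        for i in range(a, b):
            interpolated = flow[a] + (i - a) / (b - a) * (flow[b] - flow[a])
            base[i] = min(flow[i], interpolated)
    return base


if __name__ == "__main__":
    daily_flow = [1.0, 1.0, 5.0, 9.0, 4.0, 2.0, 1.0, 1.0]  # hypothetical event
    baseflow = local_minimum_baseflow(daily_flow, area_sq_miles=1.0)
    direct_runoff = [q - b for q, b in zip(daily_flow, baseflow)]
    print(baseflow)
    print(direct_runoff)
```

Subtracting the estimated baseflow from the total flow gives the direct runoff ordinates on which the second scenario's goodness-of-fit tests are computed.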
Finally, to assess the overall modelling performance, the statistical framework for interpreting hydrological model performance developed by Ritter and Muñoz-Carpena [76] was adopted. Similar classifications, which sort the goodness-of-fit into four performance classes as shown in Table 3, can be found in the literature [47,68,77].

Parameter Sensitivity
During the optimization process, it was observed that for CS modelling, groundwater parameters, unsaturated zone moisture content, channel roughness and seepage were the most sensitive parameters, while depression storage and Manning's roughness for pervious area were comparatively less sensitive. For EB modelling, unsaturated zone moisture content and channel roughness showed high sensitivity, while surface runoff parameters and groundwater parameters showed comparatively less sensitivity. The sensitivity analysis in PEST suggested that for CS modelling, optimization is mainly governed by the groundwater parameters, unsaturated zone moisture content, channel roughness and seepage characteristics, whereas for EB modelling, the optimization process is mainly driven by the unsaturated zone moisture content and channel roughness.

Model Performance during Calibration and Validation
After the model optimization was completed, the EPA-SWMM model was run with the optimized parameters. Figure 8a,b compare the observed and the simulated (CS modelling) time series against the rainfall time series at the two study catchments. Visual comparison indicates that the trend in the simulated series resembles the observed series reasonably well. However, the extreme events were not simulated very successfully.
The EB model was calibrated individually for the selected events at the two case study catchments. Figure 9a,b compare the hydrographs simulated by the EB model and the CS method against the observed hydrographs at the Myponga upstream and Scott Creek catchments respectively, while the goodness-of-fit values for these events are presented in Tables A1–A4 in Appendix A. For most of the events, the hydrographs simulated by the EB model resemble the observed hydrographs more closely. Note also that the hydrographs simulated by the two models for Events 1 and 8 at Scott Creek match each other but do not reflect the observed hydrograph. This could be due to errors in the observed rainfall data. The issue is further evident in the values of the performance indices given in Tables A3 and A4 in Appendix A. Therefore, these two events are excluded when calculating average results (e.g., Table 4).
The performance of EB and CS modelling is compared here using the goodness-of-fit measures listed in Table 2 and the assessment framework shown in Table 3. As noted in Table 2, a model type is said to outperform the other when its score for NSE (NSE, ANSE and NSE for low flows), KGE or R² is higher; the opposite applies to the RSR, absolute volume error and error in peak flow criteria. At the Myponga upstream catchment, the average scores of CS modelling for NSE for the full period (NSE), NSE for the wet season (ANSE), NSE for the low flow season and KGE were estimated as 0.50, 0.56, −0.015 and 0.68, respectively. This indicates that, overall, the model is capable of simulating the daily flow series or the daily high flow series with a reasonable level of accuracy but is incapable of simulating the low flow series. Based on the framework of statistical interpretation developed by Ritter and Muñoz-Carpena [76] and Singh et al. [77], the performance level of the CS model calibration can be regarded as "satisfactory".
On the other hand, in EB modelling, the average scores of NSE, ANSE, NSE of low flows and KGE were 0.75, 0.76, 0.65 and 0.73 respectively, indicating that the performance level was "good".
In terms of the coefficient of determination (R²), the EB model (R² = 0.82) outperformed the CS model (R² = 0.51). The RSR value, which is determined by dividing the RMSE by the standard deviation of the observed flow series, was 0.46 for EB modelling and 0.64 for CS modelling, which further confirms that the EB model performs better than the CS model at the Myponga upstream catchment. The mean daily flows of the observed series and of the series simulated by CS modelling during the calibration period were 0.27 m³/s and 0.25 m³/s, respectively, with an overall absolute volume error of 9%. EB modelling shows slightly better performance in simulating the observed hydrograph, with an absolute volume error of only 8%. The average percentage errors in peak flow rate of 22% for EB modelling and 48% for CS modelling also indicate that EB modelling is better than CS modelling in achieving the peak flow objective. Overall, the performance of EB modelling can be accepted as falling in the "good" category, and it outperforms CS modelling at the Myponga upstream catchment.
CS modelling at the Scott Creek catchment achieved goodness-of-fit scores of 0.54, 0.71, 0.68 and 0.38 for NSE, ANSE, NSE of low flows and KGE respectively. The comparatively high value of ANSE indicates that the model has a high capability of simulating high flows. Based on NSE and ANSE, model performance is at the "satisfactory" level (NSE, ANSE > 0.50), although the KGE value does not fall within the same category. The performance of the EB model was also found to be at the "satisfactory" level, with scores of 0.56, 0.57, 0.51 and 0.63 for NSE, ANSE, NSE of low flows and KGE respectively. The R² of the EB model was 0.75, which indicates better agreement between the observed and the simulated flow than that achieved by CS modelling (R² = 0.61). The absolute volume error and the error in peak flow estimation in EB modelling were 17% and 25%, compared with 48% and 37% in CS modelling. Consequently, though the performance of both models was rated at the "satisfactory" level, these measures confirm that the EB model outperformed the CS model at Scott Creek during the calibration period.
As the model goodness-of-fit scores vary from one event to another, assigning one grade ("satisfactory" or "good") to a whole catchment based on an average score can be misleading. Hence, it was decided to examine the distribution of performance grades over the events. For example, Figure 10a,b present NSE data for the Myponga upstream and Scott Creek catchments respectively. Figure 10a shows that, while the CS model was unsatisfactory for over 70% of the events, the EB model gave top-level performance for over 60% of the events at the Myponga catchment. Figure 10b shows that both models were unsatisfactory for 50% of the events, while for the majority of the remaining events the EB model outperformed the CS model (achieving a "very good" grade for over 30% of the events). These observations suggest that, measured in terms of NSE, the EB model outperforms the CS model at both case study catchments. Figure 11 shows the percentage of events in which each model type outperformed the other in terms of each goodness-of-fit measure.
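The grade distribution over events can be computed as sketched below, using the NSE thresholds quoted in the summary of this paper (very good above 0.75, good from 0.65 to 0.75, satisfactory from 0.50 to 0.65, unsatisfactory below 0.50). How scores that fall exactly on a boundary are graded is an assumption here, and the event scores in the example are hypothetical.

```python
from collections import Counter

GRADES = ("very good", "good", "satisfactory", "unsatisfactory")


def grade(score):
    """Map an efficiency score (NSE, ANSE or KGE) to a performance grade."""
    if score > 0.75:
        return "very good"
    if score >= 0.65:  # boundary handling is an assumption
        return "good"
    if score >= 0.50:
        return "satisfactory"
    return "unsatisfactory"


def grade_distribution(scores):
    """Percentage of events falling into each performance grade."""
    if not scores:
        raise ValueError("at least one event score is required")
    counts = Counter(grade(s) for s in scores)
    return {g: 100.0 * counts[g] / len(scores) for g in GRADES}


if __name__ == "__main__":
    event_nse = [0.80, 0.70, 0.55, 0.20]  # hypothetical per-event NSE values
    print(grade_distribution(event_nse))
```

Reporting the distribution rather than a single averaged grade shows whether a catchment-level rating is driven by a few very good or very poor events.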
Further, as observed in Figure 11a,b, the EB model outperformed the CS model in terms of all selected goodness-of-fit criteria at the Myponga upstream catchment and in 6 out of 8 criteria at the Scott Creek catchment. More importantly, in achieving the peak flow target, the EB model performed better in the greater proportion of the events (95%) at the Myponga catchment, while it performed equally with the CS model at Scott Creek.
The outcomes of the CS model during the validation period at the study catchments are presented in Figure 12a,b. Figure 12 shows that the resemblance of the predicted flow to the observed flow is poorer than that observed during the calibration period (Figure 8) at both catchments.
As noted earlier, EB model validation was done using the mean parameter values. Four streamflow events were extracted from the validation period to evaluate the validity of the calibrated model. Three of the four extracted events were from the year 2005, because 2006 was comparatively dry and only one event with a reasonable peak was available for model validation. Figure 13a,b show how the hydrographs simulated by the two methods compare against the observed hydrographs at the Myponga upstream and Scott Creek catchments respectively. Figure 13a shows that the hydrographs simulated by the EB model resemble the observed hydrographs more closely than those of the CS model. In contrast, at the Scott Creek catchment, the simulated hydrographs of the EB and CS models resemble each other well and follow the trend of the observed hydrographs, but both models underestimate the peak flow and the flow volume in 3 out of the 4 events.
Table 5 presents the average goodness-of-fit scores of the two models at the study catchments during the validation period. As can be seen from Table 5, at the Myponga upstream catchment, the average values of NSE, ANSE, NSE of low flows and KGE for CS modelling were 0.40, 0.47, −0.54 and 0.39 respectively, while the respective values for EB modelling were 0.32, 0.48, 0.10 and 0.29. Consequently, the performance of both models is at the "unsatisfactory" level during the validation period. On the other hand, at the Scott Creek catchment, based on the average scores of NSE, ANSE and KGE, the performance of the EB model can be rated as "unsatisfactory" while that of the CS model can be accepted as "satisfactory". The R² and RSR values also support these claims. The absolute volume errors resulting from the two models were reasonably high at the Myponga catchment, while the CS model at Scott Creek achieved a surprisingly good volume balance (2% absolute volume error).
Poor model performance during validation may be due to several reasons, including poor data quality, overfitting of the model during calibration and spatial and temporal variations in catchment characteristics between the calibration and validation periods. Further, the goodness-of-fit scores estimated during the validation period cannot be taken as reliable, as the calculation was based on only 4 events. However, considering the observations made in Figure 12, the good agreement between the rainfall hyetograph and the simulated flow hydrograph at both catchments suggests that the validation of the CS model can be considered acceptable.
Table 5. Goodness-of-fit scores of EB and CS modelling during the validation period.

Model Performance in Reproducing Direct Runoff Hydrograph (DRH)
As stated before, EB and CS modelling performances were assessed under two different scenarios: (i) the performance in reproducing TRHs; and (ii) the performance in reproducing DRHs. The results of the first scenario were presented in Figure 9. As observed in Figure 9 and discussed in Section 3.2, at both the Myponga upstream and Scott Creek catchments, the total hydrographs simulated by EB modelling showed relatively better agreement with the observed hydrographs than those of CS modelling. Further, EB model performance was rated as "good" at the Myponga upstream catchment and "satisfactory" at Scott Creek based on the average scores of NSE, ANSE and KGE. On the other hand, CS modelling showed a lower performance, i.e., "satisfactory" at Scott Creek and "unsatisfactory" at the Myponga upstream catchment.
In the second scenario, performance in reproducing DRHs was assessed. To obtain the DRH, baseflow was first separated from the observed and the simulated hydrographs following the procedure described in Section 2.5. The baseflow separation of the observed time series and the CS modelling time series at the Myponga upstream catchment is presented in Figure 14a,b. It is apparent from Figure 14 that the baseflow component of the simulated time series is relatively higher than that of the observed time series. Therefore, it is vital to understand how the two modelling methods perform in predicting DRHs.
Table 6 compares the average scores of the 8 selected goodness-of-fit criteria in reproducing DRHs by the two modelling methods at the two case study catchments. The individual scores of each event are presented in Appendix A, Tables A5–A8. The average scores of NSE, ANSE, NSE of low flows and KGE for EB modelling at the Myponga upstream catchment were 0.75, 0.73, 0.86 and 0.74 respectively, whereas the respective scores for CS modelling were 0.26, 0.20, 0.55 and 0.45. These values clearly place the performance of the EB model at the "good" level and that of the CS model at the "unsatisfactory" level. This result is further confirmed by R² values of 0.82 and 0.59 and RSR values of 0.46 and 0.78 for EB and CS modelling respectively. The smaller absolute volume error and absolute error in peak flow rate of the EB model compared with those of the CS model also support the above claim, that is, in reproducing DRHs at the Myponga upstream catchment, the EB model is better than the CS model. Following the same framework with the average scores of the model performance measures, the performance of both EB and CS modelling in estimating DRHs at Scott Creek can be classified as "satisfactory". The scores of R², RSR, absolute volume error and error in peak flow rate further confirm that the two models perform equally.
Thus, for the second scenario, it is apparent from the average scores of the goodness-of-fit measures that the EB model performs slightly better at the Myponga upstream catchment, while the performance of the EB and CS models at the Scott Creek catchment was nearly the same. Further, the detailed analyses of the performance scores presented in Figures 15 and 16 clearly indicate that the percentage of events in which the EB model outperformed the CS model in reproducing DRHs at the Myponga catchment was considerably higher than the percentage in which the CS model outperformed, and that the classification as "good" is a reasonable assessment of the performance level. At the Scott Creek catchment, there is no great difference in performance between the two methods and both can be accepted as satisfactory.
Water 2019, 11, 611

Summary and Outcome of the Study
The stormwater management model EPA-SWMM was used to study the performance of EB and CS modelling at the Myponga upstream and Scott Creek catchments. While setting up the model, some parameter values were obtained from GIS and DEM data, some were taken from the recommended values in the EPA-SWMM manual and the others were assigned either realistic or default values.
Model calibrations were conducted using rainfall, evapotranspiration and observed runoff time series from 2001 to 2004. While the complete time series during the calibration period was reproduced by CS method, several events (20 at the Myponga upstream catchment and 16 at Scott Creek catchment) were reproduced through EB modeling. In order to compare results of the two methods, corresponding events from the simulated time series by the CS model was also extracted. Following the calibration, these models were validated and modelling performance in reproducing both TRHs and DRHs were assessed.
Performances of the EB modelling and CS modelling were assessed using a total of 8 goodnessof-fit measure. The scores of each goodness-of-fit criterion was averaged, and a single score was defined for each catchment for using the framework proposed by Ritter and Muñoz-Carpena [76] to classify performance levels. The four levels of performance are: (i) very good (NSE, ANSE, KGE > 0.75); (ii) good (0.65 < NSE, ANSE, KGE < 0.75); (iii) satisfactory (0.50 < NSE, ANSE, KGE < 0.65); and (iv) unsatisfactory (NSE, ANSE, KGE < 0.50). The results are summarized in Table 7. The above observations and remarks made based on the average scores of goodness-of-fit measures were further validated by comparing the percentage events of the two modelling methods that fall into each performance grades (Figures 10 and 15) as well as by comparing the percentage events that each model type outperformed the other in terms of all eight selected goodness-of-fit measures (Figures 11 and 16). Based on the above analysis and observations, it can be summarizing

Summary and Outcome of the Study
The stormwater management model EPA-SWMM was used to study the performance of EB and CS modelling at the Myponga upstream and Scott Creek catchments. While setting up the model, some parameter values were obtained from GIS and DEM data, some were taken from the recommended values in the EPA-SWMM manual, and the remainder were set to either realistic assumed values or default values.
Model calibrations were conducted using rainfall, evapotranspiration and observed runoff time series from 2001 to 2004. While the complete time series during the calibration period was reproduced by the CS method, several events (20 at the Myponga upstream catchment and 16 at the Scott Creek catchment) were reproduced through EB modelling. In order to compare the results of the two methods, the corresponding events were also extracted from the time series simulated by the CS model. Following the calibration, these models were validated and the modelling performance in reproducing both TRHs and DRHs was assessed.
The performance of EB modelling and CS modelling was assessed using a total of eight goodness-of-fit measures. The scores of each goodness-of-fit criterion were averaged, and a single score was defined for each catchment using the framework proposed by Ritter and Muñoz-Carpena [76] to classify performance levels. The four levels of performance are: (i) very good (NSE, ANSE, KGE > 0.75); (ii) good (0.65 < NSE, ANSE, KGE < 0.75); (iii) satisfactory (0.50 < NSE, ANSE, KGE < 0.65); and (iv) unsatisfactory (NSE, ANSE, KGE < 0.50). The results are summarized in Table 7. The observations and remarks made on the basis of the average scores of the goodness-of-fit measures were further validated by comparing the percentage of events from the two modelling methods that fall into each performance grade (Figures 10 and 15), as well as by comparing the percentage of events in which each model type outperformed the other in terms of all eight selected goodness-of-fit measures (Figures 11 and 16). Based on the above analysis and observations, it can be summarized that EB modelling has a better capability in estimating both TRH and DRH at the Myponga upstream and Scott Creek catchments. Specifically for the selected study catchments, it is suggested that the EB modelling approach be employed to estimate runoff hydrographs and design floods and to support further planning and management of these catchments.
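The scoring and classification step described above can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of NSE and KGE, which are assumed here rather than taken from this section, and it does not cover ANSE or the other measures. The function names, the example event scores and the handling of scores that fall exactly on a class boundary (which the stated ranges leave open) are all assumptions made for illustration.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors / variance sum of obs)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio and bias ratio."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (sd_obs * sd_sim)   # linear correlation
    alpha = sd_sim / sd_obs       # variability ratio
    beta = mean_sim / mean_obs    # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def classify(score):
    """Map an averaged NSE, ANSE or KGE score to one of the four performance levels.

    Assumption: a score exactly on a boundary is placed in the lower class.
    """
    if score > 0.75:
        return "very good"
    if score > 0.65:
        return "good"
    if score > 0.50:
        return "satisfactory"
    return "unsatisfactory"


# Hypothetical per-event NSE scores for one catchment (not values from the study).
event_scores = [0.82, 0.71, 0.55]
average_score = sum(event_scores) / len(event_scores)
performance_level = classify(average_score)
```

Averaging the per-event scores before classifying mirrors the single-score-per-catchment procedure described in the text; classifying each event separately would instead give the per-grade percentages of the kind shown in Figures 10 and 15.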

Modelling Errors, Limitations and Future Directions
During the model development, a number of assumptions and simplifications were made to represent complex hydrological processes, which can increase errors in the model output. According to Coon and Reddy [83], errors in a hydrological model can arise from various sources including: (i) errors in precipitation and streamflow data; (ii) limitations in model structure; (iii) errors in model calibration; (iv) inaccurate estimation and interpretation of land-use and land-cover data; and (v) changes in land-use during the simulation period. While developing the model, the precipitation data used were obtained from point-based rainfall gauge stations and then applied uniformly over a specified area around each gauging station, as defined by the Thiessen-polygon method. Where a Thiessen polygon is large, the spatial variation of rainfall becomes significant and the actual rainfall in the subcatchments could differ significantly from the rainfall measured at the gauging station. This may add errors to the model output.
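The Thiessen-polygon assignment can be sketched in a few lines of Python, since a point lies inside the Thiessen polygon of whichever gauge is nearest to it. The sketch below represents each subcatchment by its centroid, which is a simplification assumed here and not necessarily how the study applied the method. The gauge names and coordinates are hypothetical.

```python
import math


def nearest_gauge(point, gauges):
    """Return the name of the gauge closest to `point`, i.e. its Thiessen polygon."""
    return min(gauges, key=lambda name: math.dist(point, gauges[name]))


# Hypothetical gauge and subcatchment-centroid coordinates in km (not study data).
gauges = {"gauge_A": (0.0, 0.0), "gauge_B": (10.0, 0.0)}
centroids = {"sub_1": (2.0, 1.0), "sub_2": (8.0, -1.0), "sub_3": (4.0, 6.0)}

# Each subcatchment receives the rainfall series of its nearest gauge.
assignment = {sub: nearest_gauge(xy, gauges) for sub, xy in centroids.items()}

# Distance to the assigned gauge: a crude indicator of where the assumption of
# spatially uniform rainfall is likely to be weakest.
distance_km = {sub: math.dist(xy, gauges[assignment[sub]]) for sub, xy in centroids.items()}
```

In this toy layout, `sub_3` is assigned to `gauge_A` even though it lies more than 7 km away, which is the situation in which measured and actual subcatchment rainfall are most likely to differ.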
Limitations in model structure can affect the model output and can therefore lead to inaccurate estimates. For instance, during the model setup, infiltration into the soil and the moisture content of the unsaturated zone were assumed to be uniform over the entire subcatchment. These assumptions can have a significant impact on soil moisture characteristics. In the EPA-SWMM model, the soil moisture content in the unsaturated zone controls flow from the unsaturated zone to the saturated zone. Therefore, such simplifications can lead to estimation errors. While groundwater flow is a complex process, EPA-SWMM represents it with a simple groundwater module that treats the total groundwater flow into a stream as the sum of interflow and baseflow. During dry periods, when all the groundwater from the catchment has discharged into streams, the system becomes empty; EPA-SWMM then assumes that there is no input from groundwater and does not allow water to enter the catchment from another groundwater zone.
Another major source of error in RR modelling is error arising from improper model calibration. In automatic calibration, either a local or a global optimization routine can be chosen; this study used a local optimization procedure in which parameters were assigned initial values and a possible range. It has been found that different initial values can affect the final value of the objective function, which in turn affects the overall calibration. This is a limitation of local optimization. In contrast, global optimization does not depend on a single set of initial values; rather, it searches across the feasible parameter range. Therefore, the use of global optimization may reduce the optimization error, thereby increasing the modelling performance.
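The sensitivity of local optimization to its initial values can be illustrated with a toy one-parameter example. The objective function below is hypothetical and has two minima. The gradient-descent routine and the multi-start wrapper are simple stand-ins chosen for illustration, not the calibration routines used in this study, and multi-start search is only a basic approximation of a global optimizer.

```python
def objective(x):
    """Toy calibration objective with a local and a global minimum (hypothetical)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytical derivative of the toy objective."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_search(x0, lower=-2.0, upper=2.0, step=0.01, iterations=5000):
    """Gradient descent from one initial value, kept within the parameter range."""
    x = x0
    for _ in range(iterations):
        x -= step * gradient(x)
        x = min(max(x, lower), upper)
    return x


def multi_start(starts):
    """Run the local search from several initial values and keep the best result."""
    candidates = [local_search(x0) for x0 in starts]
    return min(candidates, key=objective)


x_from_high = local_search(1.5)    # settles in the minimum on the positive side
x_from_low = local_search(-1.5)    # settles in the minimum on the negative side
x_best = multi_start([-1.5, -0.5, 0.5, 1.5])
```

The two single-start runs end at different parameter values with different objective values, while the multi-start run recovers the lower of the two minima, which is the behaviour that motivates the recommendation to consider global optimization.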
In this study, land-use information was obtained from aerial photographs and GIS, which can be open to misinterpretation and misclassification. It was assumed that land-use would remain unchanged during the study period. However, this assumption is not realistic, as land-use can change with time. For future studies, it is recommended to explore options that allow time-varying land-use data to be input to the model during the modelling and validation periods. In fact, it is suggested that a land transformation model be integrated into the EPA-SWMM model in order to allow a more realistic representation of land-use changes.
A range of recommendations can be made to improve the modelling performance in future work. During the setup of the EPA-SWMM model, daily time step data were used due to the lack of sub-daily data. For future studies, it is recommended to use sub-daily data if possible. Because concurrent continuous data were unavailable for the selected gauging stations, this study used only eight years of time series data; future studies should consider longer periods of time series data to obtain more representative results. Because grid-based data were unavailable, this study was conducted using point-based data; in future studies, it is recommended to use grid-based rainfall data, which is expected to be more accurate than point-based data. Due to time limitations, this study was limited to two catchments only; in future, it is recommended to assess modelling performance using more catchments before the results are generalised, as RR modelling performance can vary from one catchment to another.

Conclusions
This paper presents a comparative study of the performance of event-based and continuous simulation rainfall-runoff modelling in reproducing the total runoff hydrograph and the direct runoff hydrograph. Two catchments in South Australia, Myponga upstream and Scott Creek, were used as case studies, and the EPA-SWMM model was used to formulate event-based and continuous simulation models for these catchments. The outcome of this study can be highlighted through the following findings:
• At the case study catchments, event-based modelling performs better than continuous simulation modelling in reproducing both the total runoff hydrograph and the direct runoff hydrograph.
• Event-based and continuous simulation modelling performance can vary from catchment to catchment. For instance, at Myponga upstream, the performance of the two approaches differs greatly, whereas at Scott Creek the two perform nearly equally. Therefore, before selecting an RR model, it is recommended to evaluate the performance of both approaches for any catchment of interest.
• When assessing the suitability of an event-based or continuous simulation model for a specific project, the goodness-of-fit tests used should reflect the objectives of the project, because the same model can exhibit different levels of performance under different goodness-of-fit tests. In fact, it is recommended to use a combination of goodness-of-fit tests rather than a single test to reach a more robust conclusion.
This study demonstrates the ability of event-based and continuous simulation modelling at the Myponga upstream and Scott Creek catchments and confirms that event-based modelling is more appropriate for producing a realistic hydrograph there. Therefore, this approach is recommended for further planning and management of these catchments.

Acknowledgments: The authors would like to acknowledge the Bureau of Meteorology and the Department for Environment and Water for providing necessary data for the project.

Conflicts of Interest: The authors declare no conflict of interest.

Table A1. Event-based modelling performances in reproducing TRHs at the Myponga upstream catchment (Figure 9a).