Quality Assessment of Small Urban Catchments Stormwater Models: A New Approach Using Old Metrics

Small urban catchments pose challenges in applying performance metrics when comparing measured and simulated hydrographs. Results are hampered by short-duration peak flows, which are sensitive to rainfall variability and measurement synchronization errors, and it can be both difficult and inconvenient to remove base flows from the analysis, given their influence on combined sewer overflow (CSO) performance. A new approach, based on applying metrics to peak flows for a selected set of different durations, is proposed and tested to support model quality assessment and calibration. Its advantages are: avoiding the inconveniences arising from lags in peak flows and the subjectivity of possible adjustments; favouring the assessment of the influence of base flow variability and flow lamination by CSOs; promoting integrated analysis for a wide range of rainfall events; facilitating bias identification; and guiding calibration. However, this new approach tends to provide results (e.g., for NSE, r² and PBIAS) closer to optimal values than when metrics are applied to compare the measured and simulated values of hydrographs, so the comparison of results with thresholds widely used in the literature should be done with caution. The various case study examples highlight the importance of using a judicious set of different metrics and graphical analyses.


Introduction
Urban drainage models usually aim at supporting decisions on flooding mitigation, pollution control, sewer system management and, increasingly, city planning and regeneration [1][2][3]. Depending on the objectives of the work and the availability of hydrological and asset data, the models can be either distributed and physically based or more aggregated and conceptual [4][5][6][7]. The increasing implementation of decentralized measures and nature-based solutions, i.e., solutions that seek to replicate natural phenomena across the catchment, often requires distributed models and, in some cases, more detailed modelling of infiltration, evapotranspiration and water quality [8,9].
Calibration requirements also depend on the objectives of the study, the availability of data and the time and resources needed to collect data [10]. Calibration tends to be more demanding and conservative for the variables and scales most relevant for the study purpose. For example, in a model developed for flood risk assessment, emphasis is put on calibrating peak flows of the most severe storms [11,12], while in a model designed to manage combined sewer overflows (CSO), greater accuracy is required in calibrating the volumes of hydrographs for a much wider range of events, which includes medium and small magnitude rainfall [13][14][15]. In the latter case in particular, daily and seasonal variations in dry weather flow [16] and rainfall derived infiltration and inflow (RDII) may play a very important role [17][18][19][20][21].
The number of rain events recommended for calibration and verification, and the representativeness of the monitored events, also depend on the objectives for which the model is developed.
However, the temporal and spatial scales of urban catchments and the issues related to wet weather discharges (from both CSOs and sanitary sewer overflows, SSOs) pose challenges that have not yet been sufficiently discussed, in particular when adopting performance rating thresholds proposed for other contexts. The variability of the dry weather flow and the RDII can contribute substantially to the uncertainty of the model results concerning CSO and SSO discharges (CSO structures are usually designed to carry 3 to 6 times the average daily dry weather flow to the wastewater treatment plant). In these cases, the traditional approach of removing base flows to calibrate or assess the accuracy of the hydrological model becomes particularly difficult and debatable.
This article briefly describes and discusses, from an urban drainage perspective, the most widely used metrics for model calibration and quality assessment, and applies them to small urban catchment models with different levels of performance. An innovative approach better suited to the challenges of these catchments is then proposed, and the results are discussed in detail.

Nash-Sutcliffe Efficiency Coefficient (NSE)
The Nash-Sutcliffe Efficiency coefficient (NSE, Equation (1)) [31] is probably the most used dimensionless coefficient to evaluate the performance of hydrological models [33,35]. NSE is a normalized statistic that compares the mean squared error (MSE) with the variance of the observed data ($\sigma_o^2$):

$$\mathrm{NSE} = 1 - \frac{\mathrm{MSE}}{\sigma_o^2} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2} \tag{1}$$

where $O_i$ is the observed value at timestep $i$ or at the $i$th preselected time "such as peaks or troughs in the hydrograph" [31]; $P_i$ is the predicted value at timestep $i$ or at the $i$th preselected time; $\bar{O}$ is the mean of the observed values; and $n$ is the total number of observations. In the literature, the population variance, $\sigma_o^2 = \sum_{i=1}^{n}(O_i - \bar{O})^2/n$, has been used instead of the sample variance, $s^2 = \sum_{i=1}^{n}(O_i - \bar{O})^2/(n-1)$, since the sample comprises all timesteps of each hydrograph, i.e., it is equal to the population. NSE varies between −∞ and 1, with higher values indicating a better fit of the model to the observed data. NSE values lower than zero indicate that the model is a worse predictor than the mean of the observations.
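As an illustration only, Equation (1) can be sketched in Python as below, assuming the observed and predicted values are already paired (same timesteps or same preselected points). The 1/n factors of the MSE and of the population variance cancel, so NSE reduces to a ratio of two sums. The function name `nse` is our own and not taken from any particular library.

```python
from typing import Sequence


def nse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency (Equation (1)) for paired observed/predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors (n x MSE)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Sum of squared deviations about the observed mean (n x population variance)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed values have zero variance; NSE is undefined")
    return 1.0 - sse / sst


obs = [10.0, 40.0, 120.0, 60.0, 20.0]
print(nse(obs, obs))                # perfect model -> 1.0
print(nse(obs, [50.0] * len(obs)))  # observed mean (50.0) everywhere -> 0.0
```

Predicting the observed mean at every point gives NSE = 0, the benchmark mentioned above; any model doing worse than that yields a negative NSE.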
The NSE has been widely interpreted as a classic skill score of models for various hydraulic and water quality variables [37,45]. The performance ratings proposed by Moriasi et al. (2007) [36] at the watershed scale have been widely used in the literature. According to [36], a model simulation can be judged "Satisfactory" for NSE > 0.50, "Good" for NSE > 0.65 and "Very good" for NSE > 0.75. In Moriasi et al. (2015) [39], the thresholds for the "Good" and "Very good" ratings were raised to 0.70 and 0.80, respectively, for the annual, monthly and daily temporal scales.
In urban catchments, important rainfall events last a few hours or even minutes, so the time step of measurement records is usually only a few minutes. The urban drainage literature reports NSE values from the calibration of hydrologic or hydraulic models normally between 0.5 and 0.9 [2], most of them greater than 0.7 [7,10,20,22,46-48,50-53]. However, in some cases, NSE values above 0.95 are reported [1,53,55], while in other specific cases, usually for water quality parameters, very low values, close to zero, are considered acceptable [9].
Despite NSE being widely used, there has been lengthy discussion about its limitations and suitability, and, as such, several modifications and more refined criteria have been proposed.

Kling-Gupta Efficiency Coefficient (KGE)
The Kling-Gupta Efficiency (KGE, Equation (5) or Equation (6)) [37] is one of the criteria proposed to overcome some limitations of the NSE, and it has been increasingly used in recent years, including in urban drainage (e.g., [9,48,51,54]). The KGE results from the decomposition of the NSE into three distinct components representing the correlation, the bias, and a measure of the relative variability between predicted and observed values:

$$\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}$$

where $r$ is the linear correlation coefficient between predicted and observed values; $\alpha$ is the ratio between the standard deviations of predicted and observed values ($\sigma_p/\sigma_o$); and $\beta$ is the ratio between the mean predicted and mean observed flows ($\mu_p/\mu_o$), representing the bias. Like the NSE, KGE varies between −∞ and 1, where KGE = 1 corresponds to absolute agreement between the model and the observations. Although KGE thresholds are often considered the same as NSE ones in model classification, ref. [45] demonstrates that a model is a better predictor than the mean of the observed data if KGE > 1 − √2 ≈ −0.41, whereas with NSE this only happens for NSE > 0.
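A minimal Python sketch of the KGE decomposition follows, assuming paired series and population standard deviations (the choice does not affect $r$ or $\alpha$, since both are ratios). One assumption is ours and not from the text: for a constant prediction the correlation $r$ is undefined, and the sketch sets $r = 0$, which is how a mean-of-observations prediction ($\alpha = 0$, $\beta = 1$) reaches KGE = 1 − √2. The function name is hypothetical.

```python
import math
from typing import Sequence


def kge(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Kling-Gupta Efficiency from its correlation, variability and bias components."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length series with at least two values")
    mu_o = sum(observed) / n
    mu_p = sum(predicted) / n
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in observed) / n)
    sd_p = math.sqrt(sum((p - mu_p) ** 2 for p in predicted) / n)
    if sd_o == 0 or mu_o == 0:
        raise ValueError("observed series must vary and have a non-zero mean")
    cov = sum((o - mu_o) * (p - mu_p) for o, p in zip(observed, predicted)) / n
    # Assumption: r is undefined for a constant prediction; we set r = 0 in that case
    r = cov / (sd_o * sd_p) if sd_p > 0 else 0.0
    alpha = sd_p / sd_o  # relative variability
    beta = mu_p / mu_o   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = [10.0, 40.0, 120.0, 60.0, 20.0]
print(kge(obs, obs))                 # approximately 1
print(kge(obs, [50.0] * len(obs)))  # mean benchmark: 1 - sqrt(2)
```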

Percent Bias (PBIAS)
The percent bias, or percent deviation (PBIAS, Equation (7)), is used to assess the average tendency of the predicted data to be smaller or greater than the observed data. Positive values of PBIAS indicate underestimation, and negative values indicate overestimation:

$$\mathrm{PBIAS} = 100 \times \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)}{\sum_{i=1}^{n} O_i} \tag{7}$$

where the variables have the same meaning as in Equation (1). Other names have been given to indicators calculated in a similar manner to PBIAS [36]: percent streamflow volume error (PVE), prediction error (PE), and percent deviation of streamflow volume (Dv). Absolute values of PBIAS < 10% have usually been classified as very good [36,49]. However, in Moriasi et al. (2015) [39], the PBIAS thresholds for the "Satisfactory", "Good" and "Very good" ratings were reduced to <±15%, <±10% and <±5%, respectively, for the daily, monthly and annual scales.
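Equation (7) translates directly into code. The sketch below (a hypothetical helper, not a library function) keeps the sign convention stated above, so a model that is uniformly 10% low returns +10 and one that is 10% high returns −10.

```python
from typing import Sequence


def pbias(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Percent bias (Equation (7)); positive = underestimation, negative = overestimation."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("sum of observations is zero; PBIAS is undefined")
    return 100.0 * sum(o - p for o, p in zip(observed, predicted)) / total_obs


observed = [100.0, 200.0, 300.0]
print(pbias(observed, [90.0, 180.0, 270.0]))   # model 10% low  -> 10.0
print(pbias(observed, [110.0, 220.0, 330.0]))  # model 10% high -> -10.0
```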

Root Mean Square Error (RMSE)
Both the root mean square error (RMSE) and the mean absolute error (MAE) describe the difference between model simulations and observations in the units of the variable. The RMSE (Equation (8)) is less intuitive than the MAE but tends to be more widely used in hydrological studies because it penalizes the largest errors more severely [32,33,35,36].
Lower RMSE values are associated with smaller errors, with zero corresponding to a model that perfectly fits the measured data:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2} \tag{8}$$

where the variables have the same meaning as in Equation (1). Another advantage of the RMSE is that, assuming the errors are unbiased (i.e., the mean error is zero), the RMSE equals the standard deviation of the error [56]. Therefore, if the errors are normally distributed, about 68% of the errors will lie within one standard deviation of the mean, and 95% within 1.96 (≈2) standard deviations of the mean [8].
Hence, the estimated interval of errors at a 95% probability (which we call I95) can be expressed as in Equation (9):

$$I_{95} = \pm 1.96 \times \mathrm{RMSE} \tag{9}$$

Different methods for normalizing the RMSE to a dimensionless coefficient are found in the literature. The most common ones consist of dividing the RMSE by one of the following parameters of the observed values: the mean; the standard deviation; the difference between the maximum and the minimum; or the interquartile range, i.e., the difference between the 75th and 25th percentiles. In this work, we use the first two.
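The first one is known as the coefficient of variation of the RMSE (CVRMSE or CV(RMSE), Equation (10)):

$$\mathrm{CV(RMSE)} = \frac{\mathrm{RMSE}}{\bar{O}} \tag{10}$$

The second one, named by [36] the RMSE-observations standard deviation ratio (RSR), is expressed as in Equation (11):

$$\mathrm{RSR} = \frac{\mathrm{RMSE}}{\sigma_o} = \frac{\sqrt{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}}{\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}} \tag{11}$$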
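The RMSE family of Equations (8) to (11) is sketched below in Python, as an illustration under two assumptions: I95 is treated as the half-width, 1.96 × RMSE, of a symmetric interval, and $\sigma_o$ is the population standard deviation, consistent with the variance used in Equation (1). The helper names are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error (Equation (8)), in the units of the variable."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


def i95(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Half-width of the 95% error interval, assuming unbiased, normal errors."""
    return 1.96 * rmse(observed, predicted)


def cv_rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the mean of the observations (Equation (10))."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


def rsr(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the population standard deviation of the observations (Equation (11))."""
    mean_obs = sum(observed) / len(observed)
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / len(observed))
    return rmse(observed, predicted) / sd_obs


obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 37.0]
for name, metric in (("RMSE", rmse), ("I95", i95), ("CV(RMSE)", cv_rmse), ("RSR", rsr)):
    print(name, round(metric(obs, sim), 3))
```

With these conventions, RSR² = MSE/σ_o² = 1 − NSE, so RSR and NSE carry the same information; for example, RSR = 0.58 corresponds to NSE ≈ 0.66.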

Linear Regression Coefficients and Graph
The slope, the y-intercept and the coefficient of determination (r²) of the linear regression that best fits the simulation results to the observed values can provide important information about the model's quality. While the slope reflects the relative relationship between observed and predicted data, the y-intercept indicates a bias [36]. The coefficient of determination describes the proportion of the observed dispersion that is explained by the regression. It varies between 0 and 1, where 1 indicates perfect alignment [34].
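The regression statistics can be computed with ordinary least squares, as in the sketch below. We assume, as is usual in such plots, that observed values are on the x-axis and simulated values on the y-axis; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def regression_stats(observed: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line predicted = slope * observed + intercept, plus r^2."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length series with at least two values")
    mx = sum(observed) / n
    my = sum(predicted) / n
    sxx = sum((x - mx) ** 2 for x in observed)
    syy = sum((y - my) ** 2 for y in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)  # square of Pearson's r
    return slope, intercept, r2


obs = [20.0, 50.0, 80.0, 150.0, 300.0]
print(regression_stats(obs, [0.9 * x for x in obs]))   # uniform 10% underestimation
print(regression_stats(obs, [x + 10.0 for x in obs]))  # constant offset of +10
```

A simulation that is uniformly 10% low gives a slope of 0.9, an intercept of about 0 and r² = 1, whereas a constant offset shows up only in the intercept.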
The coefficient of determination is also one of the most recommended criteria for calibrating and evaluating the accuracy of hydrological models. In Moriasi et al. (2015) [39], the r² thresholds for the "Satisfactory", "Good" and "Very good" ratings are 0.60, 0.75 and 0.85, respectively, for daily flow at the catchment scale.
However, there is also consensus on the need to interpret the linear regression parameters carefully, because the highest values carry great weight in the results; the analysis of linear regression plots can help considerably. See the example in Figure 1, in which two series of 14 points each have the same linear regression slope (0.906) but different behaviours: in (a), twelve values correspond to y = x and the two largest values to y = 0.9x; in (b), the four highest values correspond to y = x, but the other ten values correspond to y = 3x. In (a), the coefficient of determination is practically equal to unity, showing an excellent correlation between x and y for almost all values, but the slope of 0.9 indicates that a small number of intense events is underestimated by 10%. In (b), the coefficient of determination is still quite high from a statistical point of view (r² = 0.85) and the model is excellent for the most intense events; however, the slope of 0.9 and a y-intercept that is high compared with the smallest values result from the model performing very poorly for most events.
Figure 1. Example of two series, (a) and (b), with the same linear regression slope and good coefficients of determination, but with very different behaviours.
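For quick screening, the Moriasi et al. (2015) [39] thresholds quoted in this section (daily flow at catchment scale) can be encoded as below. This is a hedged sketch: treating each threshold as a strict inequality, and keeping 0.50 as the NSE "Satisfactory" limit, are our reading of the values given above, and the functions are hypothetical helpers.

```python
from typing import Callable, Sequence

LABELS = ("Very good", "Good", "Satisfactory")


def _rate(value: float, thresholds: Sequence[float], passes: Callable[[float, float], bool]) -> str:
    # Thresholds are ordered from the strictest ("Very good") to the loosest ("Satisfactory")
    for label, threshold in zip(LABELS, thresholds):
        if passes(value, threshold):
            return label
    return "Unsatisfactory"


def rate_nse(nse: float) -> str:
    return _rate(nse, (0.80, 0.70, 0.50), lambda v, t: v > t)


def rate_r2(r2: float) -> str:
    return _rate(r2, (0.85, 0.75, 0.60), lambda v, t: v > t)


def rate_pbias(pbias: float) -> str:
    # PBIAS thresholds apply to the absolute value (under- or overestimation)
    return _rate(abs(pbias), (5.0, 10.0, 15.0), lambda v, t: v < t)


print(rate_nse(0.66), rate_r2(0.80), rate_pbias(9.6))  # Satisfactory Good Good
```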

Challenges from Short-Duration Peak Flows in the Application of Metrics
The NSE, KGE and error statistics are commonly used to compare time series between modelled and observed values. However, in small urban and natural catchments, including ephemeral streams [57,58], many significant peak flows occur during very short periods of time. The spatial variability of rainfall and small desynchronizations between rain and flow measurement equipment can lead to delays or advances of the modelled series in relation to the observed series of only a few minutes, but with a significant impact on the statistical results. Figure 2 shows the measured and modelled hydrographs of the most intense storm monitored in the case study presented below. Table 1 compares the results of the statistics described in the previous section for four scenarios: (a) the case represented in Figure 2; (b) the measured flow rate advanced by 2 min; (c) the measured flow rate delayed by 4 min; and (d) measured flow delayed by 6 min.

According to the results in Table 1, scenario (b) leads to the best results, with values of NSE, KGE, slope and coefficient of determination very close to unity. However, NSE drops below 0.5 and 0.35 for scenarios (c) and (d), respectively, with delays of only 4 and 6 min. The error values also rise significantly for scenarios (c) and (d).
These results highlight the significant impact that small time deviations between measured and simulated series can have on the results of various metrics.
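The sensitivity shown in Table 1 can be reproduced qualitatively with a synthetic example. The sketch below uses a hypothetical hydrograph (constant base flow plus a short Gaussian peak, recorded every 2 min), not the case-study data, and compares it with copies of itself lagged by 0, 2, 4 and 6 min. The peak shape, magnitudes and helper names are our assumptions.

```python
import math
from typing import List, Sequence


def nse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def shift(series: Sequence[float], steps: int) -> List[float]:
    """Delay (steps > 0) or advance (steps < 0) a series, padding with its edge values."""
    values = list(series)
    if steps > 0:
        return [values[0]] * steps + values[:-steps]
    if steps < 0:
        return values[-steps:] + [values[-1]] * (-steps)
    return values


DT_MIN = 2  # recording interval (min), as in the case study
# Hypothetical hydrograph (L/s): constant base flow plus a short Gaussian peak
BASE, PEAK, T_PEAK, WIDTH = 150.0, 900.0, 90.0, 8.0
times = [i * DT_MIN for i in range(120)]  # 4 h of 2-min records
measured = [BASE + PEAK * math.exp(-(((t - T_PEAK) / WIDTH) ** 2)) for t in times]

results = {}
for lag_min in (0, 2, 4, 6):
    # The "model" is an exact replica of the measurement, only lagged in time
    simulated = shift(measured, lag_min // DT_MIN)
    results[lag_min] = nse(measured, simulated)
    print(f"lag {lag_min} min: NSE = {results[lag_min]:.3f}")
```

Since the simulated series is an exact but lagged copy of the measured one, every departure of NSE from 1 is due to timing alone; the narrower the peak relative to the lag, the larger the penalty.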

The Proposed New Approach
With the increasingly widespread adoption of decentralized and nature-based solutions, the measures to be modelled will influence the entire urban water cycle, across rainfall events from small to heavy. Therefore, assessing the shape of hydrographs for a wide range of rainfall events is of great importance. To strengthen the assessment of hydrograph shape and reduce the inconveniences of the event-by-event analysis described above, a new approach is proposed to assess the quality of hydrological models.
Rather than the performance metrics being applied to compare measured and simulated values within each hydrograph (and/or the measured and simulated peak flows of the various hydrographs), they could be applied to compare measured and simulated maximum flows for various durations. For each duration, the measured and simulated maximum flow series can be easily calculated by applying a rolling-window search routine to each hydrograph.
Hence, the assessment of model results is performed simultaneously for a pre-selected set of durations across all hydrographs. To avoid excessive complexity in the analysis, it is important to use a limited but representative number of durations, so we recommend selecting five to eight durations with increasing intervals between them. For the case study presented below, the maximum flow rates associated with the following durations are assessed: 2, 6, 16, 30, 60, 104 and 150 min. Table 2 presents an example of the application of the metrics described in Section 2 to the durations selected in the case study, according to the proposed approach. In addition to numerical metrics, graphical techniques must also be applied, as mentioned above; these are presented in the case study. By evaluating peak flows for a wide range of durations, this new approach favours the assessment of the shape of hydrographs, as well as of the effect of the uncertainty of both base flows and CSO discharges.
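The procedure can be sketched in Python as follows. We assume that the "maximum flow for a duration d" is the largest d-minute moving-average flow in each hydrograph, found with a sliding-window search, and that measured and simulated hydrographs share the 2-min recording interval of the case study. The events are synthetic (hypothetical Gaussian peaks over a constant base flow), with each simulated hydrograph lagged by 2 min and scaled down by 5%; all helper names are ours.

```python
import math
from typing import Dict, List, Sequence, Tuple

DURATIONS_MIN = (2, 6, 16, 30, 60, 104, 150)  # durations used in the case study
DT_MIN = 2  # recording interval (min)


def max_mean_flow(flows: Sequence[float], duration_min: int, dt_min: int = DT_MIN) -> float:
    """Largest average flow over any window of `duration_min` minutes (rolling-window search)."""
    if duration_min % dt_min:
        raise ValueError("duration must be a multiple of the recording interval")
    k = duration_min // dt_min
    if not 1 <= k <= len(flows):
        raise ValueError("window longer than the hydrograph")
    window = sum(flows[:k])
    best = window
    for i in range(k, len(flows)):
        window += flows[i] - flows[i - k]  # slide the window one record forward
        best = max(best, window)
    return best / k


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def pbias(obs: Sequence[float], sim: Sequence[float]) -> float:
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def assess_by_duration(events: Sequence[Tuple[Sequence[float], Sequence[float]]],
                       durations: Sequence[int] = DURATIONS_MIN) -> Dict[int, Dict[str, float]]:
    """Apply the metrics to the series of measured vs simulated maxima, one duration at a time."""
    table = {}
    for d in durations:
        measured_peaks = [max_mean_flow(m, d) for m, _ in events]
        simulated_peaks = [max_mean_flow(s, d) for _, s in events]
        table[d] = {"NSE": nse(measured_peaks, simulated_peaks),
                    "PBIAS": pbias(measured_peaks, simulated_peaks)}
    return table


def synthetic_event(peak: float, lag_steps: int = 0, scale: float = 1.0,
                    base: float = 150.0, n: int = 120) -> List[float]:
    """Hypothetical hydrograph: base flow plus a Gaussian peak centred at 90 min."""
    q = [base + peak * math.exp(-(((i * DT_MIN - 90.0) / 8.0) ** 2)) for i in range(n)]
    if lag_steps:
        q = [q[0]] * lag_steps + q[:-lag_steps]
    return [scale * v for v in q]


events = [(synthetic_event(p), synthetic_event(p, lag_steps=1, scale=0.95))
          for p in (300.0, 600.0, 900.0, 1200.0)]
table = assess_by_duration(events)
for d, m in table.items():
    print(f"{d:>4} min  NSE={m['NSE']:.3f}  PBIAS={m['PBIAS']:.1f}%")
```

Because the maximum over a window does not depend on when it occurs, the 2-min lag that would penalize a hydrograph-by-hydrograph NSE has essentially no effect here, and PBIAS recovers the 5% underestimation at every duration. NSE is lower for the longest durations in this toy example, where the averaged flows are dominated by the base flow and vary less between events.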
This new approach will be applied to assess eight different models (or modelling conditions) of the case study.

Case Study
The study area, 102 ha in size, is located in Odivelas, a 26.5 km² municipality of the Lisbon Metropolitan Area, Portugal. It consists of two distinct catchments: a 22 ha combined catchment, whose main sewer receives the foul flow from a 400 mm upstream interceptor sewer; and a mixed, partially separate catchment upstream, of about 80 ha, served by that interceptor sewer. In wet weather, the interceptor sewer transports a mixture of wastewater and stormwater, sometimes under pressure.
Both catchments are mainly residential, with some commerce. In the downstream combined catchment, most sewers are built of concrete (Manning coefficient of 0.014 s·m^(−1/3)) and have circular cross-sections of 300 and 400 mm, increasing up to 1000 mm downstream. The sewer slopes are close to those of the terrain, ranging from 0.3% to 11%, with a mean and median of 2.8% and 2.3%, respectively. The percentages of paved, roofed and green areas are 46%, 35% and 19%, respectively. However, only about 70% of the runoff from impermeable areas drains into the sewer network, due to drainage into backyards and an insufficient number of inlet devices.
Over a decade ago, the downstream combined catchment was modelled in detail using the Stormwater Management Model (SWMM) [59], with 86 sub-catchments, 145 nodes and 153 sewer branches. However, the drainage system of the upstream mixed catchment is complex and not known in detail, so it was modelled in an aggregated way using SWMM, considering only the two main combined sewer overflows (CSO) and two sub-catchments, with 68 ha and 12 ha (Figure 3).
For both catchments, the model was calibrated and verified on the basis of data from a 4-month monitoring survey, in which 26 rainfall events were recorded by two rain gauges and two flow meters. One flow meter was installed in the interceptor sewer, a few meters upstream of the entrance to the combined catchment (section B1-I), and the other was installed downstream of the combined catchment (section B1-M).
The peak flow of the most intense monitored event reached 80% of the maximum capacity of the combined system (1530 L/s) in a sewer downstream of B1-M, and 60% of the capacity in B1-M, which already has the maximum diameter of 1 m. Downstream of B1-M, the sewer is under pressure for return periods greater than 2-5 years, and flooding occurs for return periods greater than 5-10 years.
As the complexity of the upstream catchment's behaviour did not allow good calibration and verification results to be obtained in section B1-I, part of the underestimation of the flows in the interceptor sewer was compensated by some overestimation of the flows in the downstream combined catchment, which yielded very good results in B1-M. Thus, the model was left with a "black box" component inside, but it was quite adequate for the purpose of the study at the time. The model was used to evaluate the CSO discharges from the downstream catchment, both through event-by-event analyses [60,61] and using a 19-year historical rainfall series [62,63]. Currently, the model is intended to be used to study stormwater management measures distributed within the combined catchment, with a view to reducing CSO discharges, mitigating floods and improving the urban water cycle. Therefore, it is important to improve the calibration of the upstream aggregated model (modelled with only the two main CSOs and two sub-catchments) and, hence, to model more accurately the flows generated in the combined catchment downstream. Between sections B1-I and B1-M, there is a small CSO structure that shaves off the highest flows measured in B1-I; this structure could not be monitored and had not been modelled in the past. This CSO adds complexity to the model, and its calibration requires a detailed quantification of the shape of the hydrographs in B1-I and B1-M.
The variability of base flows during rainfall events also plays an important role both in the estimation of the wet weather overflow discharges and in the calibration of this CSO structure. Figure 4 shows the hourly variation of the average and median flow (Qav and Qmedian) for the weekdays, as well as the 10th, 25th, 75th and 90th percentiles (Q10, Q25, Q75 and Q90). The horizontal lines represent the same statistics for the daily flows.
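The statistics plotted in Figure 4 can be obtained with the Python standard library alone, as in the sketch below. It assumes a list of (timestamp, flow) records, keeps Monday to Friday only, and uses the "inclusive" percentile method of `statistics.quantiles`; the percentile method, the synthetic two-week record and the function name are our assumptions, not details given in the text.

```python
import statistics
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def hourly_weekday_stats(records: Iterable[Tuple[datetime, float]]) -> Dict[int, Dict[str, float]]:
    """Per-hour flow statistics computed over weekdays (Monday to Friday) only."""
    by_hour: Dict[int, List[float]] = defaultdict(list)
    for timestamp, flow in records:
        if timestamp.weekday() < 5:  # 0 = Monday ... 4 = Friday
            by_hour[timestamp.hour].append(flow)
    stats = {}
    for hour, flows in sorted(by_hour.items()):
        # 99 cut points: index p - 1 holds the p-th percentile
        pct = statistics.quantiles(flows, n=100, method="inclusive")
        stats[hour] = {
            "Qav": statistics.fmean(flows),
            "Qmedian": statistics.median(flows),
            "Q10": pct[9], "Q25": pct[24], "Q75": pct[74], "Q90": pct[89],
        }
    return stats


# Hypothetical hourly dry weather flow for two weeks, starting on Monday 1 January 2024
start = datetime(2024, 1, 1)
records = []
for h in range(14 * 24):
    ts = start + timedelta(hours=h)
    weekend_extra = 50.0 if ts.weekday() >= 5 else 0.0  # excluded by the weekday filter
    records.append((ts, 100.0 + ts.hour + (h // 24) % 5 + weekend_extra))

stats = hourly_weekday_stats(records)
print(stats[5])
```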

As Figure 4 shows, there is substantial variability in the daily dry weather flows, particularly in the downstream section. This variability is attributed to three main factors: the activities in the catchment; groundwater infiltration into the sewers, although the available series is not long enough to model the RDII component; and measurement error, which depends significantly on the cleanliness of the submerged pressure and velocity gauges and on the accumulation of debris on them.

Model Recalibration and Verification
Nine of the 26 events were selected for the sensitivity analysis of the parameters and for the recalibration of the models: events 1, 4, 7, 9, 13, 16, 19, 21 and 24. The selected events include five of the eight events that led to flow rates above the discharge threshold of the CSO structure located between sections B1-I and B1-M.
The other 17 events were used to verify the models. The recalibration of the partially separate upstream catchment consisted mainly of adjusting the contributing areas of the two sub-catchments and the flow capacity of the respective CSO structures. The recalibration of the downstream catchment consisted mainly of calibrating the discharge capacity of the CSO structure between B1-I and B1-M, adjusting the impermeable area and improving the shape of the hydrographs, through the slope and width of the sub-catchments.
Calibration was carried out manually based on volumes, peak flows and the shape of the hydrographs.
All 26 events were used to assess the quality of the models based on the proposed new approach described in Section 4.1, given that only five calibration and three verification events led to discharges in all CSO structures.
However, during the quality assessment of the global model in B1-M by the proposed approach and using the 26 events, it was determined that a small correction to the shape of the hydrographs should be done by increasing the slope and the width of the catchments and slightly decreasing the contribution of the impervious area. Hence, the set of 26 events was initially used to assess the quality of the model and later to enhance the calibration.
If the set of monitored storm events were large enough to be split into two representative subsets of at least 20 events each, it would be advisable to use one subset for model calibration and the other for verification.
Performance metrics were applied to all rainfall events, but the results were substantially influenced by the time lags between the measured and modelled hydrographs, as described in Section 3. No attempt was made to synchronize the simulated and measured hydrographs, due to the subjectivity this would introduce. These results are presented in Tables A1 and A2 of Appendix A and will be discussed in Section 5.4.

Application of the Proposed New Approach
The new approach described in Section 4.1 was applied to eight quality assessments of case study models.
For both monitoring sections B1-I and B1-M, three assessments were carried out, two to evaluate the quality of the initial and recalibrated models, in which the dry weather flow (DWF) is adjusted event by event, and the third to evaluate the accuracy of the recalibrated model results without DWF adjustment.
For the downstream monitoring section B1-M, two additional assessments were carried out: one with a recalibrated model of the upstream interceptor sewer catchment but still with the initial model of the downstream combined catchment (with DWF adjustment); and another with the recalibrated downstream model receiving, from the interceptor sewer, the inflows measured in B1-I (with the DWF adjusted only in B1-M). Table 3 lists the order in which the results of the eight assessments are presented and discussed in the next section.

Table 4 presents the results obtained in section B1-I for the first three assessments described in Table 3. The coefficients and statistics are those described in Section 2. The variables analysed are the volume of the hydrograph (Volume, in m³) and the maximum flows (ith-min peak, in L/s) associated with the following durations (ith): 2, 6, 16, 30, 60, 104 and 150 min. The 2 min duration corresponds to the recording time interval of both the monitored data and the model results.

C2. Recalibrated upstream model with dry weather flow adjustment per event.
C3. Recalibrated upstream model without dry weather flow adjustment.

As described for the case study in Section 4.2, the initial model significantly underestimates the flows in section B1-I, particularly for some events. Based on a qualitative appraisal of the hydrographs in B1-I, the authors would classify the initial model as providing a useful but limited and underestimated approximation. However, in a blind evaluation using the thresholds from Moriasi et al. (2007) [36] and Moriasi et al. (2015) [39], the model would be classified as "Good" or "Very good" for all PBIAS, NSE and r² values.
A closer analysis of Table 4 shows that the model's shortcomings are mainly reflected in the following:
• In accordance with the model's underestimation, the PBIAS values are positive for all durations. However, given that the interceptor capacity at B1-I is limited to about 320 L/s and that CSO discharges occur upstream, the variation in flows in the interceptor sewer is small compared with the base flows and, therefore, PBIAS values do not exceed 9.6%.
• RSR error indicators are greater than 46% for all peak flow durations, reaching 58% for the shortest duration. However, RSR is only 19% for the volume variable, probably because the duration of events is variable, causing the average of measured volumes to be much higher than the model errors for the smallest events.
• Linear regression slopes are less than 0.84 for durations greater than 30 min, reflecting an underestimation bias that increases with the hydrograph duration. For the volume, the slope increases to 0.87 (with r² of 0.99), probably due to the greater influence of base flows and to the explanation given above.
• The coefficients of determination are less than 0.9 for almost all durations, although they increase with duration and reach 0.99 for the volume. They are close to 0.8 for the shortest durations, showing some dispersion of results. This dispersion is attributed to the difficulty of the aggregated model in covering the variety of situations that occur in the partially separate upstream system. Although these values have good statistical significance and are also well classified according to Moriasi et al. (2015) [39], they should be interpreted with caution due to the great weight of base flows relative to wet weather flows.
• Except for the volume, where NSE is 0.96, the NSE values are below 0.8 for all durations, reaching 0.70 and 0.66 for the 6 and 2 min peak flows, respectively. Although these NSE values are classified as "good" according to [39] (except NSE = 0.66), they reflect the influence of the base flow and are much lower than those obtained in the downstream section (B1-M), as will be seen below. These results indicate that, within the scope of this new approach, in which NSE is used to analyze errors not in each hydrograph but in pre-selected parts of the various hydrographs, NSE values below 0.8 should not be considered "good", but simply "satisfactory".

In Table 4, the results from the recalibrated model that are better than those from the initial model are shaded in green; those that are worse are shaded in orange. The recalibrated model provides better results at B1-I than the initial model for virtually all statistics and all durations. The improvement is most pronounced for the statistics that were least satisfactory in the initial model.
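The statistics discussed for Table 4 can be sketched as follows. This is a minimal NumPy sketch, not the authors' code; it assumes the PBIAS sign convention of Moriasi et al. (2007) [36] (positive values indicate underestimation) and a regression of simulated on measured values, so that a slope below 1 indicates underestimation. Because RSR is the RMSE divided by the standard deviation of the measured values, NSE = 1 − RSR², which agrees with the values reported above: RSR = 58% gives NSE ≈ 0.66 and RSR = 19% gives NSE ≈ 0.96.

```python
import numpy as np


def fit_metrics(obs, sim):
    """Goodness-of-fit statistics between measured (obs) and simulated (sim) values.

    Assumed conventions: PBIAS > 0 means underestimation; the regression line
    is sim = slope * obs + intercept.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = obs - sim
    rmse = np.sqrt(np.mean(err ** 2))
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    # RSR: RMSE over the (population) standard deviation of the observations,
    # which makes NSE = 1 - RSR**2 hold exactly
    rsr = rmse / obs.std()
    pbias = 100.0 * np.sum(err) / np.sum(obs)
    slope, intercept = np.polyfit(obs, sim, 1)
    r2 = np.corrcoef(obs, sim)[0, 1] ** 2
    return {"NSE": nse, "RSR": rsr, "PBIAS": pbias, "RMSE": rmse,
            "slope": slope, "intercept": intercept, "r2": r2}


# A uniform 10% underestimation: perfect correlation, slope 0.9, PBIAS = +10%
print(fit_metrics([10, 20, 30, 40], [9, 18, 27, 36]))
```

The example shows why several metrics are needed: r² is 1 even though the model is biased, while the slope and PBIAS expose the bias.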
However, the recalibrated model continues to show an underestimation trend due to:
• The increase in the base flow during and after major rainfall events, which is attributed to groundwater infiltration into the sewer network. Although the model acceptably represents the tail of some hydrographs, there are not enough events to model the RDII component.
• The interceptor sewer capacity being limited to roughly 320 L/s; therefore, the model deviations for the most intense peak flows also tend to be limited (they can be slightly positive only in cases where the simulated flows remain at this threshold value for a longer time) (see Figure 5).
Based on a qualitative assessment of the hydrographs in B1-I, the authors would classify the recalibrated model as providing results that tend to be good, but with deviations and limitations for some events.
A more detailed analysis of the results included in Table 4 shows that:
• The PBIAS values are less than 4% for all durations, except for the 2 min one (5.3%), evidencing a much smaller underestimation by the model. However, the PBIAS values provided by this new approach cannot be compared with the thresholds in [39] (where the rating would be "very good"), because the underestimation of the largest events is largely masked within the set of all events.
• For durations of up to 30 min, the slopes of the regression line are between 1.00 and 1.04 and the intercepts remain small. For the peak flows of 60 min or longer and for the volume, the slope rose above 0.9, reflecting improvements over the initial model.
• The coefficients of determination improved only slightly compared with the initial model, remaining below 0.9 for almost all durations and increasing with duration.
• RSR error indicators remain relatively high, but are significantly lower than for the initial model, particularly for longer durations.
• Only for the 2 and 6 min durations did the NSE values remain below 0.8. For the volume variable, the NSE increased to 0.98. However, based on the hydrograph analysis, it would be an overstatement to classify these results as "very good" according to [39].
• KGE values increased to the range between 0.84 and 0.93. While in the initial model the lowest KGE values occurred for the longest durations, in the recalibrated model KGE values above 0.9 occurred for the longest durations, highlighting the effects of the model recalibration. However, it is for the longer durations that the model continues to perform worst (due to the RDII component not being modelled), which highlights some limitations of these aggregate metrics and the misinterpretations that can result if they are used alone.
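One way to see why an aggregate KGE can hide a bias is to look at its components. The sketch below assumes the original formulation of Gupta et al. (2009), KGE = 1 − √((r − 1)² + (α − 1)² + (β − 1)²) with α = σ_sim/σ_obs and β = μ_sim/μ_obs; the text does not state which KGE variant was used (the modified KGE' of Kling et al. (2012) replaces α with a ratio of coefficients of variation), so this is an illustrative assumption.

```python
import numpy as np


def kge(obs, sim):
    """Kling-Gupta efficiency (Gupta et al., 2009 form) and its components.

    Returns (KGE, r, alpha, beta) so that the term driving a low value can be
    identified; the variant used in the study is an assumption here.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    value = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return value, r, alpha, beta


# Perfectly correlated but 10% low: KGE is about 0.86, driven by alpha and beta
print(kge([10, 20, 30, 40], [9, 18, 27, 36]))
```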

Recalibrated Upstream Model without Dry Weather Flow Adjustment (C3)
The use of the model without base flow adjustment corresponds to the standard situation of its use and, therefore, it is the situation for which the precision of the results should be quantified.
As might be expected, not adjusting the dry weather flows leads to a worsening of virtually all statistics (shaded in orange in Table 4), markedly so for some. Except for the volume variable, almost all NSE values are now less than 0.8, with some being less than 0.7, and almost all coefficients of determination are less than 0.8. For durations greater than 60 min, the slopes of the regression line are less than 0.9. Interestingly, PBIAS improves slightly for all durations, indicating a smaller underestimation.
The CVRMSE values are between 15% and 19% for all durations and are 12% for the volume. As the estimated interval of errors at a 95% probability is I95 ≈ 2·RMSE, this means that I95 is between 30% and 38% of the mean of the flows associated with each duration and is 24% of the mean of the measured volumes.
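The conversion from CVRMSE to I95 used above is simple arithmetic. In this sketch, CVRMSE is taken as RMSE divided by the mean of the measured values, and I95 ≈ 2·RMSE is applied as stated in the text (our reading: roughly ±1.96 standard deviations if the errors were unbiased and approximately normal). The function name is hypothetical.

```python
import numpy as np


def cvrmse_and_i95(obs, sim):
    """Return (CVRMSE, I95 / mean of obs), using I95 ~ 2 * RMSE as in the text."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    cv = rmse / obs.mean()
    return cv, 2.0 * cv


# With the C3 values quoted above, CVRMSE = 15% and 19% map to I95 = 30% and 38%
print(cvrmse_and_i95([10, 20, 30, 40], [9, 18, 27, 36]))
```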
The increase in the RMSE (and I95) values relative to the model with base flow adjustments is noteworthy. For the volume variable, RMSE rises from 214 m³ to 294 m³, corresponding to a 38% increase in the error. The relative increase in RMSE grows with the duration of the hydrograph peak, from 16% for the 2 min duration (from 32 L/s to 37 L/s) to 41% for the 150 min duration (from 17 L/s to 24 L/s).
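The percentages quoted above are relative RMSE increases, (RMSE_without − RMSE_with)/RMSE_with. A short sketch reproducing the 2 min and 150 min figures from the RMSE values given in the text (the helper name is hypothetical):

```python
def relative_increase(rmse_with_adj, rmse_without_adj):
    """Relative increase in RMSE when the dry weather flow adjustment is dropped."""
    return (rmse_without_adj - rmse_with_adj) / rmse_with_adj


# RMSE values quoted for B1-I (with vs. without DWF adjustment), in L/s
for duration, (a, b) in {"2 min": (32, 37), "150 min": (17, 24)}.items():
    print(duration, f"{100 * relative_increase(a, b):.0f}%")
```

The same helper applies to the B1-M comparison between C6 and C7 discussed later (e.g., 228 m³ to 421 m³ for the volume).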
These results indicate that an important component of the uncertainty of the model results in B1-I comes from the variability of the base flows.

Results of the Global Model in B1-M
Table 5 presents the statistical results obtained for section B1-M, for the assessments C4 to C8 described in Table 3. Figure 6 compares the measured and modelled values for both the volume and the 6 min peak flow, for the 26 precipitation events of the recalibrated model. Results are presented for both assessments with and without dry weather flow adjustment (C6 and C8). The graphs of the other assessments are not presented as they do not add value to this discussion.
The quality of the results of the initial global model (C4) is confirmed by the very good values of all statistics for all durations: NSE between 0.96 and 0.99, KGE between 0.95 and 0.98, PBIAS from −1.7% to 2.0%, linear regression slopes between 0.97 and 1.03, coefficients of determination not less than 0.97 and CVRMSE values between 8% and 11% (corresponding to I95 between 13% and 21% of the means).
For the volume variable, NSE is 0.99, although KGE is "only" 0.93. PBIAS is only 1.7%, but the slope of the linear regression is 0.93 (with r² = 0.99), reflecting the underestimation due to the RDII component not being modelled.
The slope and KGE for the volume are the only metrics pointing to a slight bias that would prevent classifying the model as excellent, and the volume is the only variable accumulated over the different event durations. This result seems to show the advantage of including, in the new approach, a variable that accumulates over the different durations of the events.
These results also call attention to the importance of the modeler's knowledge in assessing the model quality. Despite having more than 100 nodes and providing very accurate results in B1-M, the initial model has some usability limitations due to the black box component left during calibration (it provides underestimated results for the upstream catchment and compensates most of this deviation in the downstream combined catchment, as described in Section 4.2).
It is important to note that the values of NSE, KGE and r² are closer to optimal values than in most cases reported in the literature, which may derive from the approach applied in this work. As mentioned in Section 2.1, most of the NSE values reported in the calibration of hydrological or hydraulic urban drainage models range from 0.5 to 0.9, with most being greater than 0.7 [2,7,10,20,22,46–48,50–53]. However, NSE values above 0.95 are also reported [1,53,55].

Initial Downstream Model, but with the Recalibrated Interceptor Sewer Model (with DWF Adjustment) (C5)
With the recalibration of the upstream catchment, the accuracy of the initial model in the downstream section B1-M is significantly impaired (shaded in orange in the results of the C5 analysis of Table 5), except for the volume variable (shaded in green) due to the increase in the upstream flows.
As shown in the C5 results of Table 5, the worst results occur for the 60 and 104 min durations (with NSE of 0.92, KGE of 0.86 and 0.84, linear regression slopes greater than 1.11, and negative PBIAS values with absolute values greater than 5%). The overestimation for these durations results from the model not taking into account that, during some events, part of the flow in the interceptor sewer overflows between B1-I and B1-M.
Indeed, downstream of B1-I, there is a by-pass (in a manhole of the interceptor sewer) that allows overflow to the stream when the interceptor sewer is surcharged. A detailed analysis of the linear slope and PBIAS results of the initial model (C4) already indicates a slight overestimation of the peak flows with durations between 16 and 60 min, relative to the other durations.
Figure 7 compares the hydrographs in B1-M of two rainfall events for the following three cases, all with dry weather flow adjustment: C4) the initial model; C5) the initial model downstream, but with the upstream catchment recalibrated; C6) the recalibrated model with the new CSO in the interceptor sewer downstream of B1-I. The hydrographs in Figure 7 highlight the excess inflow to B1-M in the C5 case and the good results of the C6 case, once the excess flow is shaved by the new CSO between B1-I and B1-M.
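The peak shaving by the by-pass can be idealized as a capacity cap: flow up to the pipe capacity continues downstream and the excess overflows. This is a hypothetical simplification for illustration only; the actual structure (crest level, surcharge, storage effects) is not represented, and the function name and example values are ours.

```python
import numpy as np


def split_at_capacity(inflow, capacity):
    """Idealized overflow: flow above `capacity` leaves through the by-pass.

    Returns (flow passed downstream, overflow), in the units of `inflow`.
    """
    inflow = np.asarray(inflow, dtype=float)
    passed = np.minimum(inflow, capacity)
    overflow = inflow - passed
    return passed, overflow


# Hypothetical hydrograph (L/s) and hypothetical capacity
passed, overflow = split_at_capacity([100, 250, 320, 180], 220)
print(passed, overflow)
```

Under this idealization, the longer-duration maxima downstream are reduced most whenever the inflow stays above capacity for a long time, which is consistent with the overestimation noted for C5 at 60 and 104 min.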

Recalibrated Global Model with Dry Weather Flow Adjustment per Event (C6)
Based on the slopes of the interceptor sewer downstream of B1-I and considering the Manning-Strickler coefficient for concrete Ks = 1/0.014 ≈ 71 m^(1/3)·s^(−1), the flow capacity of the interceptor sewer downstream of B1-I was estimated at about 220 L/s (about 100 L/s less than at B1-I). The downstream model was then recalibrated, yielding results that are less accurate than those of the initial model, but still quite good.
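A capacity estimate of this kind follows from the Manning-Strickler equation for a circular pipe flowing full, Q = Ks·A·Rh^(2/3)·S^(1/2), with A = πD²/4 and Rh = D/4. The pipe diameter and slope are not given in this section, so the values in the usage line below are purely hypothetical.

```python
import math


def full_pipe_capacity(diameter_m, slope, ks=1 / 0.014):
    """Full-bore capacity (m3/s) of a circular pipe by Manning-Strickler.

    ks: Strickler coefficient (m^(1/3)/s); 1/0.014 is the value used for concrete.
    """
    area = math.pi * diameter_m ** 2 / 4.0   # flow area when full
    rh = diameter_m / 4.0                    # hydraulic radius when full
    return ks * area * rh ** (2.0 / 3.0) * math.sqrt(slope)


# Hypothetical pipe: D = 0.5 m, S = 0.5%  ->  about 0.248 m3/s (about 248 L/s)
print(round(full_pipe_capacity(0.5, 0.005), 3))
```

Since capacity scales with √S, a flatter reach downstream of B1-I lowers the capacity even if the diameter is unchanged.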
In the C6 results of Table 5 (recalibrated model with dry weather flow adjustment), the results that are better than the C5 results (initial model downstream, but with the upstream catchment recalibrated) are shaded in green; those that are worse are shaded in orange.
The new CSO structure and the recalibration of the downstream model led to a significant improvement in virtually all the statistics for both the volume and the flows lasting longer than 30 min. The slope also improved for the 2, 6 and 16 min peak flows, but the other statistics are now worse for these durations. This is because a detailed analysis of the scatterplots between the measured and modelled values led to the decision to slightly increase the 6 min peak flow, compared to the initial model.
Figure 8 shows the scattergraphs and linear regression lines for the 6 and 150 min maximum flows, for both the initial model and the recalibrated model. For the 6 min peak flows, in the initial model the regression line has a slope of 0.97 and a slight underestimation of the highest values is observed. In the recalibrated model, the slight increase in the highest values of the 6 min peak flow is confirmed and the slope of the linear regression increased to 0.99. The y-intercept increased slightly to 20 L/s, mostly due to a slight overestimation of some lower-magnitude 6 min peak flows.
In Table 5, the results of C6 written in purple are worse than those of the initial model and the results written in bold are better than those of the initial model. Except for most of the volume variable values and the 6 min and 16 min peak flow slopes, almost all other statistics in the recalibrated model are (albeit slightly) worse than those of the initial model. This results from the increase in determinism, and consequent uncertainty, introduced by the CSO structure in the recalibrated model.

Recalibrated Global Model without Dry Weather Flow Adjustment (C7)
As expected, the lack of dry weather flow adjustment leads to a worsening of virtually all C7 results (shaded in orange in Table 5), significantly so for some. However, all statistics maintain values that should be classified as good: all NSE values are greater than 0.91, decreasing with duration; KGE is between 0.93 and 0.95; slopes are equal to or greater than 0.95 (except for 150 min); and the regression coefficients range from 0.92 to 0.97. PBIAS values are between −2.7% and −5.7%. The CVRMSE values are between 11% and 16% for all variables, which means that I95 ≈ 2·RMSE is between 22% and 32% of the mean of the measured values for all variables.
The increase in the values of RMSE (and I95) relative to the model with base flow adjustments is noteworthy. For the volume variable, RMSE rises from 228 m³ to 421 m³, corresponding to an 85% increase in the error. The relative increase in RMSE grows with the peak flow duration, from 7% for the 2-min duration (from 55 L/s to 59 L/s) to 53% for the 150-min duration (from 15 L/s to 23 L/s).
These results show that a significant percentage of the errors in B1-M result from the variability in the base flows.
Although several metrics deteriorate noticeably relative to assessment C6, the values of NSE, KGE, r² and probably PBIAS remain relatively close to optimal values, above most of the values reported in the literature.
The analysis of the set of results from Tables 4 and 5 and from Tables A1 and A2 of Appendix A shows the potential of the proposed approach, but indicates that it delivers results closer to optimal values than when metrics are applied to each rainfall event.
In the C8 analysis, the flows measured in B1-I are used as the upstream input and the DWF is adjusted only for B1-M. If the measurement errors in B1-I and B1-M were null, the results of the C8 analysis would deliver the errors of the downstream model, and the differences relative to the C6 case (recalibrated model with DWF adjustment) would reflect the effect of the errors of the upstream catchment model on B1-M. In Table 5, the results of the C8 analysis that are better and worse than those of C6 are shaded in green and orange, respectively.
The results tend to be slightly better than the C6 ones for most statistics. However, they are worse for PBIAS, showing an overestimation trend of the recalibrated downstream model for all durations. As such, both the slope and the y-intercept are worse for practically all durations, except for 150 min and for the volume, which benefit from the contribution of the infiltration flows and the RDII component. The coefficients of determination are 0.98 and 0.99 (except for the 2-min duration, which is 0.97).
Similarly, KGE values are better for the volume, presumably due to the RDII component, and worse for practically all durations, due to the model's tendency to overestimate.
Table 6 compares the RMSE values obtained for all variables in assessment C6 (recalibrated model with DWF adjustment) with those of the following two cases: C7 (model without DWF adjustment) and C8 (model using the values measured in B1-I and adjusting the DWF only for B1-M). As expected, the RMSE values for C8 are very close to those obtained for C6 for the shortest durations (2, 6 and 16 min) and are smaller for the other durations. The RMSE reduction is largest for the 60-min duration (28%), probably due to the combined effect of the RDII component and the improvement in the infiltration component of permeable areas throughout the event.
However, the RMSE reductions in assessment C8 are much smaller than the RMSE increases when the dry weather flow adjustment is not carried out (C7). These results highlight the role of the base flow variability in model errors.
Part of the variability of the base flows, as well as the errors obtained in C8, results from measurement errors in B1-I and B1-M. Nor can it be excluded that part of the overestimation trend of the downstream model results from a slight overestimation of the average DWF of the combined catchment relative to the average DWF coming from the upstream catchment. Figure 9 shows the dry weather flow adjustments considered for each event in sections B1-I and B1-M, as well as the adjustment difference between the two sections. Although the flow adjustments vary considerably for some events (within ±40 L/s, i.e., around 50% and 40% of the average dry weather flow in B1-I and B1-M, respectively), the adjustment difference between sections B1-I and B1-M remains relatively stable: it is null until event 11 and ranges from −15 L/s to −10 L/s between events 15 and 26 (except for event 18, where it is 5 L/s, probably due to the accumulation of debris or grease on the sensors). The difference between these two periods indicates a systematic error in the measurements, amounting to between 9% and 14% of the average dry weather flow in B1-M.
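The section does not describe how the per-event DWF adjustment is computed, so the following is only a hypothetical sketch of one simple scheme: a constant offset that matches the simulated and measured mean flows over a dry period before the rainfall, followed by a comparison of the adjustment difference between the two sections before and after a given event, which is how a step change such as the one seen in Figure 9 would show up. Function names, the pre-event window and the sign convention (downstream minus upstream) are our assumptions.

```python
import numpy as np


def dwf_offset(measured, simulated, n_pre):
    """Hypothetical per-event DWF adjustment (L/s): difference between the mean
    measured and mean simulated flows over the first n_pre dry time steps."""
    measured = np.asarray(measured, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return measured[:n_pre].mean() - simulated[:n_pre].mean()


def period_means(adj_upstream, adj_downstream, split_index):
    """Mean difference (downstream minus upstream) of the per-event adjustments
    before and after event index `split_index`, to flag a systematic shift."""
    diff = np.asarray(adj_downstream, dtype=float) - np.asarray(adj_upstream, dtype=float)
    return diff[:split_index].mean(), diff[split_index:].mean()


# Illustrative numbers only (not data from the study)
print(dwf_offset([80, 82, 81, 150], [70, 71, 72, 140], n_pre=3))
print(period_means([0, 5, -3, 10], [0, 5, -15, -2], split_index=2))
```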

Quality of Monitored Data
This systematic error was initially attributed to the ring fixing the pressure, ultrasonic and velocity sensors to the sewer in B1-I having been dragged by the flow during event 14, which required it to be reinstalled under possibly slightly different conditions. However, the analysis of the records of water depth, pressure, velocity and flow in section B1-I, as well as of the scattergraphs between these variables, did not reveal any bias between the two periods. In contrast, the analysis of the scattergraphs at B1-M suggests that the systematic error is likely due to some change in measurement conditions at B1-M.
The new approach described above could also be used to assess the influence of systematic measurement errors on the accuracy of the model results. This analysis is not presented here for the sake of brevity.

Results of the Metrics Applied to Each Rainfall Event
Appendix A presents two tables containing the results of the performance metrics applied event by event to four cases. Table A1 presents and compares the results for all events in B1-I, for both the initial and the recalibrated model, the two with DWF adjustment (C1 and C2 analysis). Table A2 compares the results for all events in B1-M for the recalibrated model, with and without DWF adjustment (C6 and C7 assessments).
As described in Section 3, in small urban catchments, these results depend significantly on both the time lags between rainfall and flow measurements and the temporal and spatial variability of rainfall within the catchment. The values presented in Tables A1 and A2 were not the subject of any attempt to synchronize rainfall and flow measurements, although a short 2–4 min adjustment could significantly improve the results for some events. In cases such as that of Figure 4, the lack of synchronism seems obvious, but in other cases such an attempt at synchronization would be subjective and debatable due to the intrinsic variability of rainfall.
The qualitative ratings proposed by Moriasi et al. (2015) [39] are indicated by the color of the NSE, PBIAS and r² results in Tables A1 and A2. Both B1-I and B1-M have a very wide range of results for most metrics, with many events rated as "very good" and many as "unsatisfactory".
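The color coding of Tables A1 and A2 amounts to a threshold lookup. The thresholds below are our reading of the streamflow criteria in Moriasi et al. (2015) [39] (NSE: > 0.80, > 0.70, > 0.50; r²: > 0.85, > 0.75, > 0.60; |PBIAS|: < 5%, < 10%, < 15%) and are an assumption that should be checked against that paper before use.

```python
def rate(metric, value):
    """Qualitative rating of a performance metric for streamflow.

    Thresholds are assumed from Moriasi et al. (2015); verify before use.
    metric: "NSE", "r2" or "PBIAS" (PBIAS in %).
    """
    if metric == "PBIAS":
        v = abs(value)
        if v < 5:
            return "very good"
        if v < 10:
            return "good"
        if v < 15:
            return "satisfactory"
        return "unsatisfactory"
    very_good, good, satisfactory = {"NSE": (0.80, 0.70, 0.50),
                                     "r2": (0.85, 0.75, 0.60)}[metric]
    if value > very_good:
        return "very good"
    if value > good:
        return "good"
    if value > satisfactory:
        return "satisfactory"
    return "unsatisfactory"


print(rate("NSE", 0.66), rate("PBIAS", 9.6), rate("r2", 0.99))
```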
However, the analysis of the hydrographs shows that the events classified as "unsatisfactory" usually correspond to small rainfalls, in which small errors in the flow or even in the base flow lead to significant relative errors. Despite such unsatisfactory results in an event-by-event analysis, the model performs reasonably given the complexity and non-linearity of the simulated phenomena and the associated uncertainty, as can be observed in Figure 10. This figure compares the measured and simulated hydrographs of the two events with the worst metrics (events 26 and 4). While in Table A2 these events make the analysis confusing and even biased or erroneous, in the proposed new approach they are all taken into account, but with their due weight (see Figures 6 and 8).
From the predominance of green shading in Table A1, we can see that the recalibrated model is better than the initial model in B1-I, but, interestingly, more events are classified as "unsatisfactory" for NSE with the recalibrated model (10 events) than with the initial one (8 events). The results in Table A1 offer little guidance to the modeler on how to improve the model during calibration, contrary to the new approach presented here using the same metrics.
From the predominance of orange shading in Table A2, we can also see that not adjusting the base flow impairs the quality of the results in B1-M. However, the results are scattered and do not give a quantitative or even qualitative idea of how much the quality is impaired, contrary to the proposed approach (see Sections 5.2.4 and 5.2.5).

Conclusions
Small urban catchments pose challenges in applying performance metrics when comparing measured and simulated hydrographs. Indeed, results are hampered by the short peak flows, due to rainfall variability and measurement synchronization errors, and it can be both difficult and inconvenient to remove base flows from the analysis, given their influence on the performance of CSO structures. In addition, base flows are an important source of uncertainty in modelling small rainfall events, which must be taken into account when assessing the quality of the models.
A new approach was proposed and tested to assess the quality of models of small combined catchments, which proved to be quite suitable not only for the assessment of the quality of the models, but also to support calibration. In the proposed approach, rather than the performance metrics being applied to compare the measured and simulated values within each hydrograph and/or the measured and simulated peak flows of the various events, they are applied to compare measured and simulated maximum flows for a set of different durations. For each duration, the measured and simulated maximum flow series can be easily calculated by applying a rolling-window search routine to each hydrograph. To keep the assessment simple, five to eight different durations with increasing intervals should be analysed.
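The rolling-window search can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' routine: the "maximum flow for a duration d" is taken here as the largest mean flow over any d-minute window of the hydrograph, flows are in L/s on a regular time step dt (in minutes), the volume is added as an extra variable, and NSE is then computed across events for each duration rather than within each hydrograph. All function names are hypothetical.

```python
import numpy as np


def max_mean_flow(q, steps):
    """Largest mean flow over any window of `steps` consecutive time steps."""
    q = np.asarray(q, dtype=float)
    if not 1 <= steps <= q.size:
        raise ValueError("window must fit inside the hydrograph")
    # mode="valid" keeps only windows lying fully inside the hydrograph
    return np.convolve(q, np.ones(steps) / steps, mode="valid").max()


def event_signature(q_ls, dt_min, durations_min):
    """Maximum flows (L/s) for each duration plus the event volume (m3)."""
    q_ls = np.asarray(q_ls, dtype=float)
    sig = {d: max_mean_flow(q_ls, int(round(d / dt_min))) for d in durations_min}
    sig["volume"] = q_ls.sum() * dt_min * 60.0 / 1000.0  # L/s * s -> L -> m3
    return sig


def nse(obs, sim):
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def nse_by_duration(measured_events, simulated_events, dt_min, durations_min):
    """One NSE per duration (and for the volume), computed across events."""
    meas = [event_signature(q, dt_min, durations_min) for q in measured_events]
    simu = [event_signature(q, dt_min, durations_min) for q in simulated_events]
    return {k: nse([m[k] for m in meas], [s[k] for s in simu]) for k in meas[0]}


# Toy hydrograph with an assumed 2-min time step
print(event_signature([0, 10, 30, 10, 0], dt_min=2, durations_min=[2, 6]))
```

Because each event contributes a single value per duration, short time lags between measured and simulated peaks do not penalize the comparison, which is one of the advantages listed below.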
This new approach presents the following advantages: (a) being simple; (b) avoiding the inconveniences arising from the time lags of very short peaks (described in Section 3) and the subjectivity of possible adjustments; (c) favouring the assessment of the influence of base flow and RDII variability (see assessments C2, C3 and C7) and the influence of peak flow shaving by upstream CSOs (assessment C5); (d) promoting and facilitating an integrated analysis for a wide range of rainfall events; (e) avoiding subjectivity in interpreting different results for the various events; and (f) making it possible to identify biases in simulated hydrographs that would otherwise be difficult to detect, also guiding calibration.
However, it has the disadvantage of requiring a sufficiently large and representative set of rainfall events to ensure statistical significance, which, in principle, should not be less than 20 events for evaluating the quality of model results or twice that number for a complete model calibration and verification.
In addition, the results delivered by this new approach should not be compared with the thresholds proposed in Moriasi et al. (2015) [39] without careful consideration, as the values of NSE, r² and PBIAS tend to be closer to optimal values than when metrics are applied to compare measured and simulated values within each hydrograph.
This recommendation extends to all modelers who apply performance metrics to peak flows in urban drainage systems.
In the application of the described new approach to a model classified as providing a useful but limited and underestimated approximation (assessment C1), the NSE values were below 0.8 for all durations and, for some, below 0.7. KGE values ranged from 0.7 to 0.85. PBIAS values were within ±10%. For the recalibrated model, which provides results that tend to be good but with deviations and limitations for some events (assessment C2), the NSE values tended to be greater than 0.8, the KGE values tended to approach 0.9 and PBIAS values were within ±5%. Finally, for models whose simulated and measured hydrographs nearly coincide for some events and show small deviations in others (assessments C4 and C6), both the NSE and the KGE values were higher than 0.95 for all durations, reaching 0.98 and 0.99 in some cases.
During normal use of the model, base flows are unknown. Without adjusting base flows (assessments C3 and C7), NSE values tended to fall by up to 0.1, depending on duration, while KGE values tended to vary much less. For the "very good" quality model (assessments C7 compared with C6), the RMSE values increased between 20% and 50% as the analysed duration increased to 150 min, due to the influence of the unmodelled RDII.
The various examples of the case study highlight the importance of using different metrics and graphical analyses and the pertinence of the proposed approach.

Funding: Part of this work was co-funded by the European Regional Development Fund (FEDER), under programs POR Lisboa2020 and CrescAlgarve2020, through Project SINERGEA (ANI 33595).