1. Introduction
British Columbia (BC), Canada, is renowned for its natural resources and is a major producer of fruits, vegetables, wine, and seafood, thanks to its diverse climate and geography. The province regulates access to its water resources by issuing water licenses under the Water Sustainability Act (WSA), which came into effect on 29 February 2016 [1], replacing the previous Water Act. This groundbreaking legislation integrates economic and environmental considerations by requiring statutory decision-makers to assess the Environmental Flow Needs (EFNs) of streams during the adjudication of water license applications. However, because only a limited number of streams are gauged across the province, managing water allocation while accounting for EFNs has become challenging for the provincial water teams responsible for technical analyses and licensing under the WSA. As a result, some watersheds, such as the Nicomekl and Serpentine, have been over-allocated, with more licenses issued than the watersheds can support [2]. At the same time, other watersheds that still have available capacity for licensing lack sufficient data to fully understand their EFN requirements. Therefore, the effective management of BC's water resources under the WSA requires accurate streamflow predictions, especially in regions with limited or no hydrological data.
Every predictive method in hydrology fundamentally relies on the transfer of hydrological information from a gauged watershed to an ungauged watershed (Ug), utilizing known relationships and patterns to infer streamflow characteristics in areas lacking direct measurements [3,4]. However, traditional predictive methods often face challenges such as complexity, the need for extensive data, or suitability primarily for large-scale watersheds like rivers and lakes [5]. Water rights authorization, by contrast, needs methods that are easy to implement, require minimal data, and perform effectively in small watersheds. These attributes are crucial to ensure that the methodology can be efficiently integrated into decision-making processes. Such methods are particularly well-suited for small streams or creeks, where applicants are often small-scale farmers or businesses.
In ungauged watersheds, where direct streamflow measurements are unavailable, hydrologists often rely on surrogate methods to estimate flow regimes. Among these methods, empirical techniques like the Drainage Area Ratio (DAR) and data-driven methods such as Nearest Neighbor (NN) are widely used in hydrology due to their simplicity and effectiveness, particularly in data-scarce environments [6,7,8]. In particular, the DAR approach, also referred to as the watershed area ratio, has been extensively applied to estimate hydrological data for ungauged watersheds worldwide [9,10], including in Canada [11]. It is a well-established empirical approach that estimates streamflow in ungauged watersheds based on the ratio of drainage areas between gauged and ungauged watersheds. While this lumped predictive method is easy to implement and data-efficient, it assumes that streamflow is directly proportional to the drainage area, without considering other hydrological factors that may influence flow.
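The proportionality assumption behind DAR can be illustrated with a minimal Python sketch. The function name, argument names, and numeric values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def dar_estimate(gauged_flows, gauged_area_km2, ungauged_area_km2):
    """Scale gauged flows by the drainage-area ratio (conventional DAR)."""
    if gauged_area_km2 <= 0 or ungauged_area_km2 <= 0:
        raise ValueError("drainage areas must be positive")
    ratio = ungauged_area_km2 / gauged_area_km2
    # Streamflow is assumed to be directly proportional to drainage area.
    return [q * ratio for q in gauged_flows]


# Hypothetical monthly flows (m^3/s) at a gauged site draining 50 km^2,
# transferred to an ungauged site draining 25 km^2.
gauged = [4.0, 3.0, 2.0]
estimate = dar_estimate(gauged, gauged_area_km2=50.0, ungauged_area_km2=25.0)
```

Because the only input besides the donor record is a pair of areas, any difference in elevation, slope, or climate between the two watersheds is ignored, which is the limitation the rest of the paper addresses.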
While the DAR method offers a straightforward approach to streamflow estimation, particularly for water licensing applications, its lumped framework and the underlying assumption that streamflow scales solely with watershed area can introduce substantial uncertainties, potentially compromising the robustness of the resulting predictions. To ensure the proposed methodology works effectively for British Columbia, adjustments must be made to DAR, as streamflow in BC fluctuates across a range of time scales. On shorter time scales, ranging from less than a day to a few days, it is primarily influenced by weather events, including rainfall, snowmelt, ice melt, and evapotranspiration [12]. On longer time scales, such as over several years or decades, large-scale climatic phenomena like El Niño become significant factors [13]. Despite these variations, streamflow often follows predictable seasonal patterns, which are generally linked to the main sources of water flow: rainfall, snowmelt, and glacier melt [14].
The influence of other watershed characteristics, such as slope and elevation, becomes evident in how these seasonal patterns manifest, as these factors directly affect the timing, volume, and distribution of runoff throughout the year [4,15]. These key characteristics influence streamflow generation and, therefore, can serve as valuable indicators for predicting streamflow, particularly in ungauged watersheds [4,8]. The slope of a watershed affects the velocity of surface runoff, with steeper slopes leading to more concentrated runoff during precipitation events or snowmelt, resulting in higher peak flows [16]. Elevation also plays a significant role, as higher elevations are more likely to experience snowfall, which contributes to streamflow when temperatures rise and snowmelt occurs [17]. This delayed contribution from snowmelt is particularly important in regions with substantial snowpacks. Additionally, elevation is often correlated with temperature and atmospheric conditions, influencing precipitation type and timing, which, in turn, affects runoff patterns. Collectively, watershed slope and elevation govern hydrological processes, offering essential insights into streamflow dynamics and overall watershed behavior. When combined with empirical methods such as DAR, these physiographic factors can enhance streamflow predictions, even in regions with limited hydrological data.
Integrating slope and elevation into the DAR method enhances its ability to normalize flow estimates beyond the conventional use of watershed area alone. Instead of relying solely on watershed area as in the traditional DAR approach, these supplementary watershed characteristics offer a more comprehensive framework for flow estimation in the ungauged watershed. The selection of normalization parameters (watershed area alone, or in conjunction with elevation or slope) should be determined based on the combination that yields the most statistically robust results in neighboring watersheds. While the integration of DAR with NN is not novel, having been previously explored [7], prior methodologies have used area as the sole scaling parameter. The novelty of the proposed approach lies in introducing two additional normalization parameters alongside area, with the optimal parameter being the one that yields the most statistically robust results when applied to the nearest neighbors of Ug, for which streamflow data are known.
The proposed methodology is grounded in the principles of traditional hydrological predictive models, where hydrological data from gauged watersheds are transferred to ungauged watersheds under specific assumptions, in accordance with established hydrological laws [3]. To ensure that the hydrological characteristics of an ungauged watershed (namely, the timing, volume, and distribution of runoff throughout the year) correspond with those of gauged watersheds, we selected its NNs with similar flow-generating parameters, particularly watershed area, slope, and elevation. In contrast to the DAR, the NN method utilizes statistical learning to identify similarities between basins based on physical and climatic attributes, thereby providing more refined predictions by leveraging a broader set of basin characteristics. While both methods (DAR and NN) have distinct strengths and limitations, their combination offers a promising approach to improving streamflow predictions in ungauged watersheds. The DAR method provides a quick, data-efficient estimate based on the basic physical characteristics of the drainage area, while the NN method can complement this by incorporating additional environmental variables to refine predictions. By integrating these methods, we aim to balance simplicity with accuracy, offering a more robust tool for hydrological forecasting in regions with limited data availability.
Building on the aforementioned observations, we hypothesize that normalizing watersheds located in the same hydrological zone by their area alone (for homogeneous watersheds), or by area combined with mean elevation (for snow-dominant watersheds) or slope (for rainfall-dominant watersheds), should predict hydrological data for ungauged watersheds with reasonable efficiency. To test this hypothesis, the study enhances the traditional DAR method by incorporating an NN approach for donor site selection, thereby accounting for hydrological and physiographic similarity. Accordingly, the methodological comparison is intentionally limited to the conventional DAR approach in order to clearly demonstrate the specific improvements introduced by the proposed modification.
3. Methodology
The rationale for adopting this methodology to predict streamflows in BC is based on the approach proposed by Ahmed [19]. That study highlights that the most effective means of estimating streamflow characteristics at ungauged sites is through regional procedures that utilize hydrologic zones, that is, areas with homogeneous runoff characteristics where available data can be extrapolated with reasonable accuracy. These zones are typically delineated using physiographic features or statistical analysis of hydrologic data to ensure reliable streamflow estimations. Given the highly heterogeneous nature of British Columbia's hydrology and the limited availability of gauged data, the study employed a physical mapping procedure to define hydrologic zones. This approach follows the methodology outlined in the BCSI report [20], which is publicly accessible at: https://catalogue.data.gov.bc.ca/dataset/329fd234-8835-4d44-9aaa-97c37bfc8d92; accessed on 14 January 2025. The delineated hydrological zones used in this study are presented in Figure 2.
However, the same study also confirmed that there are cases where an NN approach to selecting stations for prediction in ungauged watersheds may be more appropriate.
The methodology was applied to multiple locations across BC where the published literature showed that mean elevation and runoff magnitudes are directly related, i.e., runoff magnitude increases with an increase in basin elevation [19]. The slope of a watershed, by contrast, plays a pivotal role in determining variations in surface hydrology, particularly in pluvial watersheds, where it affects the time between precipitation and maximum discharge [21]. In the context of BC, Sharma and Dery [22] also identified a significant positive correlation between the slope of watersheds and the Atmospheric River-related Annual Maxima runoff percentage across the province. Therefore, three basin characteristics (area, mean elevation, and slope) were selected based on (1) the published literature, which showed how they impact runoff magnitudes; (2) the ease of availability of these parameters; and (3) the ease of implementing and reimplementing the methodology. It is also important to note that while slope is widely recognized as a key factor influencing streamflow generation within a watershed, where steeper gradients promote faster runoff and higher flow velocities, the role of elevation is more complex and may not always exhibit a direct correlation with streamflow [23]. This distinction becomes particularly relevant in lumped hydrological modeling, where streamflow predictions are derived from limited input data. In such modeling techniques, assuming a fixed relationship (whether direct or inverse) between elevation and streamflow without empirical validation can introduce significant uncertainties, leading to errors in the prediction methodology. Therefore, a more nuanced approach that accounts for the variability in elevation effects is crucial for enhancing the accuracy and reliability of hydrological models.
Beyond the scientific foundation of the methodology, its implementation necessitates hydrological data from multiple locations across the target watersheds where streamflow estimations are required. The province of BC has several abandoned and real-time hydrometric stations across its area. Since real-time hydrometric stations are only located on large water bodies (rivers or large creeks) and authorization is not limited to the larger water bodies, we used abandoned hydrometric station data in our analysis as well.
The methodology for predicting monthly flows in an ungauged watershed (Ug) is outlined in the following steps, followed by the mathematical algorithms used to generate hydrological data for Ug.
Watersheds within the same hydrological zone were clustered together. To identify the most relevant watersheds for analysis, we calculated their Euclidean distances from the Ug watershed. This clustering approach ensured that the selected watersheds were hydrologically similar and spatially relevant.
- 2.
Watershed Delineation
Using
, we delineated the watershed boundaries for each gauged and ungauged watershed. The delineation process was carried out to accurately define the contributing areas and ensure consistency in the dataset. The watershed delineated at the outlet of Gamelin Creek (represented by PD43584 in Figure 3) is provided as an example from our analysis.
- 3.
Area, Elevation, and Slope Estimation
The plugin was employed to calculate the watershed area, mean elevation, and mean slope for the NNs of the Ug watershed. This provided a quantitative basis for comparing the physical characteristics of the watershed with its neighboring watersheds.
While watershed area and elevation are the primary factors used to identify NNs and drive the streamflow prediction process, slope has also been incorporated due to its significant role in runoff generation, particularly in low-lying areas. As noted in the Introduction section, in regions where rainfall is the dominant driver of streamflow and elevation has minimal influence, slope becomes a crucial factor in determining streamflow magnitudes. This relationship has been well documented in BC by Sharma and Dery [22], reinforcing the importance of integrating slope alongside area and elevation to enhance the accuracy of hydrological modeling.
- 4.
Sorting by Elevation and Slope Difference
The NNs were further sorted based on the absolute difference in mean elevation and slope between each NN and the Ug watershed. This secondary sorting step ensured that the watersheds with the most similar topographical characteristics were prioritized for subsequent analysis.
- 5.
Flow Data Extraction
Flow magnitude data for the selected NNs were obtained from multiple sources to ensure a comprehensive hydrological assessment. These sources included: (1) the Water Survey of Canada's database (https://wateroffice.ec.gc.ca/; accessed on 14 January 2025), (2) Surface Water-managed hydrometric stations, (3) municipal government records, and (4) public contributions. The dataset comprised mean monthly flow values, spot measurements, and observations of maximum and minimum flow occurrences, all of which were essential for characterizing the hydrological patterns of the selected NNs and improving the reliability of streamflow predictions.
- 6.
Selection of NNs for Data Coverage
To ensure a comprehensive and accurate representation of flow data, the NNs were chosen in a specific sequence. The process began by selecting the NN with the smallest elevation or slope difference compared to the Ug watershed. This was carried out to ensure that the first NN was hydrologically similar, as elevation or slope can significantly influence streamflow. Once the first NN was selected, additional NNs were added one by one in iterative steps. The goal was to ensure that, for each month of the year, at least two mean monthly flow values were available from different NNs. This selection strategy was informed by observations during the cross-validation phase, where it became evident that the most accurate predictions occurred when each month was represented by flow data from at least two NNs. Using fewer than two NNs often failed to capture the variability in flow patterns, resulting in oversimplified outputs. Conversely, incorporating more than two NNs tended to introduce unnecessary complexity and led to overly smoothed predictions. These findings are consistent with prior studies by Samaniego et al. [24] and Qamar et al. [8], which also emphasized the importance of balancing representativeness and model parsimony in NN selection.
- 7.
Prediction of Flow Data for Ungauged Watersheds
Once the NNs were selected, each NN in the cluster was alternately treated as Ug. Using the remaining NNs with known hydrological data, the flow data at the Ug watershed were predicted with the proposed approach, broadly summed up in the following steps:
The flow data (obtained from the multiple sources mentioned above) for each watershed with known flow were normalized by dividing the monthly flow values by the watershed area, producing a unit-area discharge value.
The normalized flow data were then scaled up to match the watershed area of the Ug watershed, providing an initial prediction of the flow.
To further refine the prediction, the scaled-up flow data were normalized based on the elevation and slope of each watershed, accounting for topographical influences.
The adjusted flow values for each NN were averaged, yielding a representative predictive monthly flow dataset (expressed as two column vectors) for the Ug watershed. Watershed area, elevation, and slope were used as mathematical operators in this process.
- 8.
Validation Using Mean Absolute Error (MAE)
The predicted flow data were compared with observed flow data at the gauged stations to calculate the MAE. This process was repeated iteratively, with each basin being removed once to simulate an ungauged scenario. The error metric, representing the MAE for each basin $j$, was calculated as follows:
$$\mathrm{MAE}_j = \frac{1}{n}\sum_{k=1}^{n}\left|\hat{Q}_{j,k} - Q_{j,k}\right|$$
where $\hat{Q}_{j,k}$ and $Q_{j,k}$ are the predicted and observed flow values, respectively, and $n$ is the number of observations.
By systematically iterating through all basins, the methodology ensured robustness and consistency in the predictive flow estimates for ungauged watersheds.
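The donor-selection part of the steps above (zone clustering, distance ranking, sorting by topographic difference, and adding NNs until every month is covered at least twice) can be sketched in Python as follows. This is only an illustrative sketch: the dictionary keys, the `k_nearest` cut-off, and the choice to rank by elevation difference alone are assumptions made for the example, not details stated in the study.

```python
import math


def select_donors(target, candidates, k_nearest=6, min_per_month=2):
    """Pick nearest neighbours (donors) for an ungauged target watershed.

    Hypothetical keys: zone, x, y, elev, and for candidates also id and
    months (the set of calendar months 1..12 with a mean monthly flow).
    """
    # Step 1: keep watersheds in the same hydrological zone and rank them
    # by Euclidean distance from the target.
    pool = [c for c in candidates if c["zone"] == target["zone"]]
    pool.sort(key=lambda c: math.dist((c["x"], c["y"]), (target["x"], target["y"])))
    pool = pool[:k_nearest]  # assumed cut-off, not specified in the text

    # Step 4: re-rank by absolute difference in mean elevation
    # (slope difference could be used in the same way).
    pool.sort(key=lambda c: abs(c["elev"] - target["elev"]))

    # Step 6: add donors one by one until each month has enough values.
    coverage = {m: 0 for m in range(1, 13)}
    donors = []
    for c in pool:
        if all(n >= min_per_month for n in coverage.values()):
            break
        donors.append(c["id"])
        for m in c["months"]:
            coverage[m] += 1
    return donors


# Hypothetical watersheds for illustration only.
all_months = set(range(1, 13))
target = {"zone": "Z1", "x": 0.0, "y": 0.0, "elev": 100.0}
candidates = [
    {"id": "A", "zone": "Z1", "x": 1.0, "y": 0.0, "elev": 400.0, "months": all_months},
    {"id": "B", "zone": "Z1", "x": 2.0, "y": 0.0, "elev": 120.0, "months": all_months},
    {"id": "C", "zone": "Z1", "x": 3.0, "y": 0.0, "elev": 150.0, "months": all_months},
    {"id": "D", "zone": "Z2", "x": 0.5, "y": 0.0, "elev": 100.0, "months": all_months},
]
donors = select_donors(target, candidates)
```

In this example, watershed D is excluded because it lies in a different zone, and A is never added because B and C already supply two values for every month.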
Mathematically, the modeling process can be defined as the following steps.
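Definitions:
$\hat{Q}_{Ug,m}$: Predicted flow for the original Ug watershed for month $m$.
$Q_{i,m}$: Measured flow for the gauged watershed $i$ for month $m$.
$N$: Total number of gauged NNs used for prediction.
$A_{Ug}$, $A_i$: Watershed area of the Ug and gauged watershed $i$, respectively.
$E_{Ug}$, $E_i$: Mean elevation of the Ug and gauged watershed $i$, respectively.
$S_{Ug}$, $S_i$: Mean slope of the Ug and gauged watershed $i$, respectively.
$w_i$: Weight assigned to each gauged NN (optional, default is equal weighting).
To normalize the monthly flow $Q_{i,m}$ of each gauged station $i$ for the current month $m$ by its watershed area:
$$q_{i,m} = \frac{Q_{i,m}}{A_i}$$
For each gauged station $i$, normalize the area-normalized flow using elevation and slope:
$$q^{E}_{i,m} = \frac{q_{i,m}}{E_i}, \qquad q^{S}_{i,m} = \frac{q_{i,m}}{S_i}$$
Each NN $j$ is treated as if it were Ug. Its flow is predicted using the remaining NNs by scaling the normalized flows back (or up) to the characteristics of $j$:
$$\hat{Q}^{A}_{j,m} = \frac{1}{N-1}\sum_{i \neq j} q_{i,m}\,A_j, \qquad \hat{Q}^{E}_{j,m} = \frac{1}{N-1}\sum_{i \neq j} q^{E}_{i,m}\,A_j\,E_j, \qquad \hat{Q}^{S}_{j,m} = \frac{1}{N-1}\sum_{i \neq j} q^{S}_{i,m}\,A_j\,S_j$$
$$\hat{Q}^{E\downarrow}_{j,m} = \frac{1}{N-1}\sum_{i \neq j} q_{i,m}\,A_j\,\frac{E_i}{E_j}$$
where $N-1$ represents the remaining number of NNs after $j$ is assumed to be Ug and removed from the dataset.
In the last expression, the position of the elevation $E_j$ in the denominator indicates that elevation is inversely proportional to watershed streamflow.
To obtain the final predicted flow for each NN $j$ treated as Ug, we take the average of the predicted flows over all the months, for each normalization $P \in \{A, E, S\}$ and with $M$ the number of months:
$$\bar{Q}^{P}_{j} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}^{P}_{j,m}$$
To determine which normalization (area, elevation, or slope) is more effective, compare the predicted flows with the actual flows using the MAE:
$$\mathrm{MAE}^{P}_{j} = \frac{1}{M}\sum_{m=1}^{M}\left|\hat{Q}^{P}_{j,m} - Q_{j,m}\right|$$
Whichever normalization (elevation or slope) produces the lowest MAE is selected for predicting the flows for the original Ug:
$$P^{*} = \arg\min_{P}\ \mathrm{MAE}^{P}$$
Note that the downward arrows indicate stations where the inverse relationship between elevation and discharge was considered, as it resulted in a lower MAE compared to the direct relationship.
Final Predicted Flow for Each Month:
The final equation used to predict the monthly flow $\hat{Q}_{Ug,m}$ for the original Ug is as follows:
$$\hat{Q}_{Ug,m} = \sum_{i=1}^{N} w_i\,\frac{Q_{i,m}}{A_i}\,A_{Ug}\,\frac{P^{*}_{Ug}}{P^{*}_{i}}, \qquad \sum_{i=1}^{N} w_i = 1 \tag{15}$$
where $P^{*}$ is the chosen parameter (either area, elevation, or slope) based on which one resulted in the lower MAE; when area alone is selected, the ratio $P^{*}_{Ug}/P^{*}_{i}$ is omitted. Please note that if the selected $P^{*}$ is elevation and is inversely related to discharge, then Equation (15) will be displayed as follows:
$$\hat{Q}_{Ug,m} = \sum_{i=1}^{N} w_i\,\frac{Q_{i,m}}{A_i}\,A_{Ug}\,\frac{E_i}{E_{Ug}}$$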
For improved clarity and understanding, the methodology is presented in
Figure 4.
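The leave-one-out comparison and the final prediction described in this section can be sketched in Python as shown below. The data layout (dictionaries with `area`, `elev`, `slope`, and `flow` keys), equal weighting of NNs, and the use of the mean MAE across held-out NNs as the selection criterion are assumptions made for this illustration; the study may aggregate errors differently.

```python
def predict(target, donors, param=None, inverse=False):
    """Monthly flow at `target` from `donors`.

    param is None (area only), "elev", or "slope"; inverse=True applies the
    inverse elevation relationship.
    """
    n_months = len(donors[0]["flow"])
    pred = []
    for m in range(n_months):
        total = 0.0
        for d in donors:
            # Area normalization, then scale up to the target area.
            q = d["flow"][m] / d["area"] * target["area"]
            if param is not None:
                ratio = target[param] / d[param]
                q *= (1.0 / ratio) if inverse else ratio
            total += q
        pred.append(total / len(donors))  # equal weighting (assumed default)
    return pred


def mae(pred, obs):
    """Mean absolute error between predicted and observed flows."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def choose_parameter(donors):
    """Treat each donor as ungauged in turn; return the option with lowest mean MAE."""
    options = [(None, False), ("elev", False), ("elev", True), ("slope", False)]
    scores = {}
    for param, inverse in options:
        errs = []
        for j, held_out in enumerate(donors):
            rest = donors[:j] + donors[j + 1:]
            errs.append(mae(predict(held_out, rest, param, inverse), held_out["flow"]))
        scores[(param, inverse)] = sum(errs) / len(errs)
    return min(scores, key=scores.get), scores


# Hypothetical donors whose flow scales with area x elevation.
donors = [
    {"area": 10.0, "elev": 100.0, "slope": 5.0, "flow": [10.0, 20.0]},
    {"area": 20.0, "elev": 200.0, "slope": 9.0, "flow": [40.0, 80.0]},
    {"area": 5.0, "elev": 400.0, "slope": 2.0, "flow": [20.0, 40.0]},
]
best, scores = choose_parameter(donors)
ungauged = {"area": 8.0, "elev": 100.0, "slope": 4.0}
final_flow = predict(ungauged, donors, param=best[0], inverse=best[1])
```

Since the example flows were constructed to scale with area and elevation, the direct elevation normalization reproduces every held-out record exactly and is the option selected.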
It is important to note that the hydrological zones delineated in Figure 2, based on the study conducted by Ahmed [19], represent regions where watersheds exhibit strong correlations in flow behavior. Consequently, NNs are selected exclusively from within these zones, ensuring consistency in flow-generating mechanisms between the Ug watershed and its neighbors.
4. Results and Discussion
To assess the effectiveness of the proposed methodology, we compiled observed streamflow data from various hydrometric stations managed by provincial hydrometric specialists, municipal governments, and observation stations. In cases where direct discharge measurements were not feasible, local residents recorded the months of maximum and minimum flow. The observed data were then compared with the predicted values. As detailed in previous sections, some hydrometric stations recorded data at high temporal resolutions, including 15-minute intervals or daily measurements. To standardize the dataset for analysis, these high-resolution observations were aggregated to a monthly time scale by computing the mean discharge for each month using R (version 4.3.0).
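The monthly aggregation described above amounts to grouping observations by calendar month and averaging them. The study performed this step in R; the Python sketch below is only an equivalent illustration, and the record layout (ISO timestamp string, discharge value) is an assumption.

```python
from collections import defaultdict
from datetime import datetime


def monthly_means(records):
    """Aggregate (ISO timestamp, discharge) pairs to mean monthly discharge."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stamp, discharge in records:
        t = datetime.fromisoformat(stamp)
        key = (t.year, t.month)
        sums[key] += discharge
        counts[key] += 1
    # Return one mean value per (year, month), in chronological order.
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Hypothetical 15-minute readings (m^3/s) for illustration.
records = [
    ("2024-01-01T00:00:00", 1.0),
    ("2024-01-01T00:15:00", 3.0),
    ("2024-02-01T00:00:00", 5.0),
]
means = monthly_means(records)
```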
The watersheds analyzed in this study are primarily influenced by rainfall in the southern regions of the province and by snowmelt in the northern regions of British Columbia. To evaluate the performance of the proposed methodology, we compared the results with the conventional DAR method, whose watershed-area scaling serves as a normalization parameter alongside elevation and slope. In regions lacking installed hydrometric gauges, supplementary information was obtained from residents whose families have lived in the area for multiple generations. These individuals, who either utilize water from nearby channels or possess extensive knowledge of regional flow patterns, provided qualitative insights into seasonal discharge variability.
As outlined in the Methodology section, while the relationship between watershed slope and discharge is direct, the connection between elevation and discharge is less certain. To assess the relationship between elevation and discharge, we calculated the MAE in the NNs of Ug under two assumptions: one where elevation is directly related to discharge, and another where it is inversely related. This process was repeated for all NNs, with the predicted monthly streamflows compared to the observed streamflows for each NN. The relationship that resulted in the lowest MAE was selected for predicting the hydrological data at Ug.
Table 2 displays the MAE values obtained by utilizing area, elevation, and slope as normalization parameters for streamflow prediction in the NNs of Ug.
Notably, stations such as 08MH098, 08MH152, 08MH156, 08MH153, and 08MH055 demonstrated improved prediction accuracy when the inverse relationship between elevation and discharge was applied. In contrast, for other stations, the direct relationship between elevation and discharge, or the use of other normalization parameters, yielded the most accurate results. These variations emphasize the significance of considering alternative relationships for elevation as a normalization parameter, underscoring the need for a flexible approach when optimizing streamflow predictions in the current modeling framework.
The relationship between elevation and streamflow varies across the study area. While most of the stations in the study area exhibit a direct relationship with mean elevation, some stations show an inverse relationship between elevation and flow. This pattern is particularly evident in larger watersheds, such as 08MH055. This inverse trend in higher-elevation watersheds is likely influenced by a combination of delayed snowmelt contributions, high infiltration rates—suggested by Google Earth imagery indicating dense vegetation—longer water travel times (time of concentration), and precipitation distribution patterns. Identifying the dominant factor requires further analysis of precipitation data, land cover, soil permeability, and groundwater interactions.
Similarly, the inverse relationship between elevation and flow in lower-elevation watersheds along the international border between the USA and Canada (08MH152, 08MH153, and 08MH156) can be attributed to increasing groundwater contributions as the point of interest within the watershed shifts downstream. Groundwater contributions were particularly evident during our field visits, and an image from one of these visits is provided in Figure 5.
Notably, 08MH098 is located on West Creek, near its drainage point into the Fraser River, making it more susceptible to groundwater influence.
We observed that for Ug watersheds, better predictive results were achieved when the NNs had watershed characteristics similar to those of the Ug. The normalization procedure tended to produce less efficient results when it was applied to watersheds with significant differences in characteristics. In contrast, when the Ug and its NNs had similar characteristics, less normalization was required, leading to more accurate predictions. For instance, watershed PD43584 had an MAE of 38.381, significantly higher than that of other watersheds in the study, such as 08MH082. This can be explained by the substantial variation in size among PD43584's NNs, which leads to significant differences in discharge magnitudes. In particular, one of its NNs, 08MG005, exhibits a much higher discharge than the others, which skews the normalization and results in a disproportionately high MAE value. As a result, predictive performance at 08MH082 is expected to be more reliable than at PD43584, given its more consistent watershed characteristics.
Table 3 presents MAE and Nash–Sutcliffe efficiency (NSE) values for various watersheds, calculated using the hydroGOF package in R (version 4.3.0) [25], highlighting how these metrics vary based on differences in watershed characteristics between Ug and its NNs. The table also identifies the normalization parameter selected for each station and demonstrates how MAE and NSE change depending on the degree of variation, both minimum and maximum, in watershed characteristics between Ug and its NNs.
The minimum, average, and maximum variations in watershed characteristics between the target watershed Ug and its NNs, along with the corresponding performance parameters, MAE and NSE, in Table 3, highlight how different hydrometric stations exhibit varying degrees of similarity to Ug in terms of elevation, watershed area, and slope. The variation percentages indicate the extent of deviation of each NN from Ug, with lower values representing greater similarity.
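For readers who want to reproduce such summary statistics outside R, the sketch below gives the standard Nash–Sutcliffe efficiency together with one plausible definition of percentage variation. The variation formula (absolute difference relative to the target watershed's value) is an assumption, since the exact definition used for Table 3 is not stated in this section.

```python
def nse(pred, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for p, o in zip(pred, obs))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def percent_variation(donor_value, target_value):
    """Assumed definition: absolute deviation of a donor from the target, in percent."""
    return abs(donor_value - target_value) / target_value * 100.0


# Hypothetical values for illustration.
perfect = nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = nse([2.0, 2.0, 2.0], [1.0, 2.0, 3.0])
variation = percent_variation(150.0, 100.0)
```

A negative NSE, as reported for some stations, means the prediction performs worse than simply using the mean of the observed flows.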
For stations where elevation was used as the primary normalization parameter, the variation ranged from as low as 0.404% (HYD-GIBS-R1) to as high as 88.44% (PD189470). Notably, larger deviations in elevation were associated with lower NSE values, suggesting reduced predictive performance, as seen in PD189470 (NSE = −0.707). Conversely, stations with relatively low variation, such as 08MH084 (average variation = 9.034%), exhibited higher performance, with NSE values approaching 1.
Similarly, when the watershed area was used as the normalization parameter, variations remained moderate, with maximum deviations around 38.534% (08MH082, 08MH098, and 08MH156). In these cases, the NSE values generally remained positive, indicating reasonable predictive accuracy.
For stations where slope was the chosen parameter, the degree of variation differed widely. While some stations (e.g., 08MH104 and 08MH155) showed minimal variation (1.901%), others, such as PD200737, exhibited extreme deviations, exceeding 300%. Extreme variations in slope were associated with significantly lower NSE values (e.g., NSE = 0.0746 for PD200737), highlighting the critical importance of selecting NNs with watershed characteristics that closely match those of Ug to ensure more reliable predictions.
Overall, the table underscores that MAE and NSE values are sensitive to the choice of normalization parameter and the degree of variation between Ug and its NNs. The findings suggest that selecting the most appropriate NNs for Ug should be guided by minimizing the variation between Ug and NNs. Moreover, reviewing Table 3 and Table 4 concurrently shows that as the performance of the normalization parameter in predicting the hydrological data of the NNs deteriorates (resulting in larger MAE values), the prediction performance for Ug also declines. This indicates that the prediction performance can be assessed prior to the actual application of the model for Ug. For example, for hydrometric stations 08MH157, 08MH0004, and PD43584, the performance deteriorates, as shown in Table 3, with increasing MAE values for predicting the hydrological data of the NNs for Ug. By the same token, stations 08MH098, 08MH084, and 08MH0058 demonstrated strong statistical performance, as reflected in their comparatively better performance in predicting the hydrological data of their NNs, as shown in Table 2.
The results presented below illustrate the outcomes when area is used as the normalization parameter in the prediction process for Ug, for the stations where the area parameter yielded the least error among the NNs.
Figure 6 compares observed (“Actual”) and modeled (“DAR”) streamflow discharge across four hydrometric stations, revealing seasonal variations in discharge trends. At station 08MH0051, the model underestimates discharge during the high-flow season (January to March) but aligns more closely during the low-flow period (May to September). Similarly, at 08MH082, the DAR model captures seasonal trends but overestimates discharge in the high-flow period, while providing a good fit during low-flow periods. Station 08MH098 exhibits strong agreement between modeled and observed values, with minimal deviations. At PD200737, the model underestimates streamflow during the summer months but aligns better in other periods. Overall, while the DAR method effectively captures seasonal discharge patterns, some discrepancies persist, particularly in high-flow conditions, indicating the need for further model refinement.
Building upon the analysis of the area normalization parameter, the subsequent results investigate the performance of the elevation parameter in normalizing the prediction of Ug.
Figure 7 presents a comparison between predicted and observed discharge values across eight hydrometric stations: 08MH152, 08MH156, 08MH153, 08MH055, 08MH0055, 08MH0004, 08MH0058, and 08MH084.
For station 08MH152, observed discharge values (represented by the blue line) generally exceed predicted values during the high-flow period (January to March). However, during the low-flow period (June to August), predictions from both methods align more closely with the observed data. In station 08MH156, predictions from both the and methods are nearly identical, indicating that the inclusion of does not significantly enhance the accuracy over the method alone. Both methods, however, tend to underestimate discharge during the high-flow period.
For station 08MH153, the method provides improved prediction accuracy compared to alone, particularly during the high-flow period. Despite this improvement, both methods underestimate discharge during the low-flow period.
Station 08MH055 exhibits high variability in observed discharge values, with a notable peak in June. Neither prediction method successfully captures this sharp peak, although both methods adequately represent the overall seasonal trend. The method provides a better fit to observed discharge values than the method alone.
In station 08MH0055, both methods tend to underestimate discharge during high-flow periods (January to April) and overestimate during low-flow periods (July to August). However, the method provides a closer match to observed values compared to alone.
For station 08MH0004, observed discharge displays a significant peak in June, which is overestimated by the method, but more accurately captured by the approach. Both methods, however, tend to underestimate discharge during the winter months.
For station 08MH0058, both and methods closely track the observed discharge throughout the year, with slight underestimation during peak flows in January and overestimation during the late fall. These minor discrepancies indicate good predictive performance at this station. In station 08MH084, the observed, , and lines are well-aligned, effectively capturing both high- and low-flow periods, suggesting that both methods perform similarly and effectively.
For HYD-GIBS-R1, the model tends to overestimate discharge, particularly during peak flow periods, indicating potential limitations in accurately capturing seasonal variability. In contrast, the hybrid method provides a closer approximation to actual discharge values, suggesting improved predictive capability. Notably, during the low-flow period around June, both models struggle to capture the sharp decline, though follows the trend more closely.
In general, the combined method improves prediction accuracy compared to the simpler method alone, particularly in capturing seasonal trends and reducing errors during low-flow periods. However, challenges persist in accurately capturing sharp peaks and high-flow events, particularly at stations with more variable hydrological regimes, which may necessitate the introduction of basin-scale information into the current modeling framework.
Subsequent to the analysis of the elevation normalization parameter, the following section assesses the influence of the slope parameter on the normalization of predictions.
Figure 8 presents a comparison between predicted and observed discharge values for four hydrometric stations (08MH104, 08MH155, 08MH0041, and 08MH157), with the slope parameter used as the normalization parameter.
For station 08MH104, both prediction methods follow the seasonal trend of the actual data, although tends to slightly overestimate discharge in January and underestimate it during the late fall. The method provides a closer fit to observed data, especially during low-flow periods.
At station 08MH155, both the and methods show good agreement with actual discharge, effectively capturing both high- and low-flow periods, though slightly overestimates discharge in the winter months.
For station 08MH0041, both prediction methods underestimate peak flows observed in January and February but align well with actual values during the low-flow period (June to August). The approach provides a closer match to observed discharge throughout the year compared to alone.
In contrast, station 08MH157 exhibits higher variability in actual discharge, with a sharp peak observed around June. Both and methods overestimate flows during this period, although performs slightly better in aligning with the actual pattern. Additionally, both methods tend to overestimate discharge in the early months and underperform during the low-flow season.
The above results also confirm that the proposed methodology consistently predicts the months of maximum and minimum monthly flow. This accuracy was further validated for PD43584, designated as an “Observation Post,” where local Indigenous knowledge was employed to determine the timing of peak and lowest flow values. The predicted flow rates and their corresponding months are presented in Figure 9.
For PD43584, the observed maximum monthly flow occurs in June, while the minimum flow is recorded during the winter months, aligning precisely with the insights provided by local Indigenous knowledge.
An overall performance comparison of the proposed methodology with showed that the proposed methodology can predict hydrological data in watersheds with reasonable accuracy, making it applicable for predicting hydrological data elsewhere. Applying this methodology to of watersheds demonstrate their effectiveness at the local scale before extending their application to other watersheds. This property helps determine whether to use the methodology for predicting hydrological data for a specific point of diversion or to rely on information from another source if better results are not achieved in the .
Figure 10 presents a comparative analysis of model performance for streamflow prediction using the two methods.
From the figure, it is clear that the method consistently outperforms the method alone in terms of both efficiency and accuracy across most hydrometric stations. Specifically, the approach yields higher efficiency values, indicating a better ability to reproduce the observed discharge variability and lower error, reflecting smaller deviations from actual streamflow data. The improvement is especially notable in stations with more complex and fluctuating flow regimes, where the combined method demonstrates significantly better performance with higher efficiency and reduced error. However, for stations characterized by stable and less variable streamflow patterns, both methods perform comparably, suggesting that the simpler method may still be sufficient in such cases. Overall, the figure underscores the effectiveness of integrating with for enhancing streamflow prediction accuracy, particularly in dynamically varying hydrological conditions where traditional methods may fall short.
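The efficiency and error measures referred to above can be illustrated with a short sketch. The specific metrics are not named in this passage, so the code assumes a Nash–Sutcliffe-type efficiency (NSE) and a root-mean-square error (RMSE); the function names and the twelve monthly values are hypothetical and serve only to show how such indicators would be computed from observed and predicted monthly discharge.

```python
import math


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of the observed mean, and negative values are worse than it."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_term


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the discharge."""
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_error / len(observed))


# Illustrative monthly mean discharges (m^3/s), January to December.
# These numbers are made up and do not come from any station in the study.
observed = [2.4, 2.1, 1.8, 1.2, 0.8, 0.5, 0.3, 0.2, 0.4, 1.0, 1.9, 2.3]
predicted = [2.0, 1.9, 1.7, 1.3, 0.9, 0.6, 0.4, 0.3, 0.5, 1.1, 1.6, 2.0]

print(f"NSE  = {nse(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f} m^3/s")
```

Under these assumptions, a higher efficiency together with a lower error is what the comparison above describes as better performance.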
Figure 6, Figure 7, and Figure 8 indicate that the model performs poorly for watersheds 08MH055 and 08MH157. This poor performance can be attributed to the transboundary nature of these watersheds: portions of their drainage areas extend into the United States. The DEMs used in our analysis are limited to the Canadian side of the international border and therefore only generate streamflow networks within Canadian territory. As a result, the delineation of these watersheds is incomplete (see Figure 11), leading to partial streamflow networks. This incomplete representation introduces inaccuracies in key watershed characteristics, including area, slope, and elevation, which, in turn, negatively impact the model’s predictive performance.
As illustrated in Figure 11, the delineation only covers the northern portion of the watersheds within Canada, with no contribution from the U.S. side.
Additionally, the performance deterioration is more pronounced for stations with larger watershed areas. For example, stations 08MH157 and 08MH055, with watershed areas of 39,879,899 m² and 426,331,622 m², respectively, perform considerably worse than stations with smaller areas, such as 08MH0058 (1,691,188 m²) or 08MH104 (26,357,626 m²). Larger watersheds tend to exhibit more complex and varied hydrological behavior, making it more difficult for the model to accurately predict flow dynamics when there is significant variation in watershed characteristics. This highlights the importance of both the similarity in characteristics between the station and its neighboring watersheds and the size of the watershed in determining the model’s predictive accuracy.
This issue becomes particularly pertinent when one neighboring watershed of an ungauged watershed is a substantially larger water body, such as a major river, compared to the other neighboring watersheds. It also arises when predicting flow for a tributary watershed using data from a much larger hydrological system. To address such cases, we propose either excluding watersheds with significant area discrepancies or, if exclusion is not feasible, incorporating additional neighboring watershed(s) to normalize the impact of including a significantly larger water body. While establishing what constitutes a “significantly larger” watershed can be complex, this methodology proves valuable when it is necessary to incorporate a hydrologically distinct or larger watershed into the analysis in order to ensure comprehensive data coverage across all twelve months of the year. This approach is particularly useful when a suitable neighboring watershed cannot be identified, necessitating the selection of a more compromise-prone one.
A practical example of this approach is demonstrated in the flow prediction for station 08MH157, where the
08MH004 and 08MH055 were selected due to their minimal
in relation to the normalization parameter of elevation (see
Figure 12 below). To mitigate the risk of oversimplifying the results, the larger watershed 08MH055 was chosen despite its considerable size and elevation. To balance the impact of including a large watershed (with a portion of its area located in the United States and, therefore, not fully delineated in the stream network), we also incorporated watershed 08MH163, which is entirely located within Canada and covers an area of 26,075,739 m², as an additional neighboring watershed in the analysis. This adjustment helped correct the oversimplification introduced by the inclusion of a hydrologically dominant watershed.
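The screening step described above, identifying a neighboring watershed whose area is far larger than that of the target, can be sketched as a simple area-ratio check. This is a minimal illustration rather than the procedure used in the study: the function name and the `max_ratio` threshold of 5 are hypothetical, since the text notes that defining a “significantly larger” watershed is not straightforward. The areas are those reported above for 08MH157, 08MH055, and 08MH163.

```python
def flag_dominant_neighbors(target_area_m2, neighbor_areas_m2, max_ratio=5.0):
    """Return IDs of neighboring watersheds whose area exceeds
    max_ratio times the target watershed's area.

    max_ratio is an assumed, illustrative threshold, not a value
    taken from the study.
    """
    return [
        station_id
        for station_id, area in neighbor_areas_m2.items()
        if area / target_area_m2 > max_ratio
    ]


# Delineated watershed areas (m^2) as reported in the text.
target_area = 39_879_899  # 08MH157
neighbors = {
    "08MH055": 426_331_622,  # transboundary, much larger than the target
    "08MH163": 26_075_739,   # entirely within Canada
}

dominant = flag_dominant_neighbors(target_area, neighbors)
print(dominant)  # ['08MH055']
```

A flagged watershed would then either be excluded or, as in the 08MH157 example, retained and balanced by adding a further neighboring watershed of comparable size.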
Figure 12 above demonstrates the marked improvement in prediction accuracy achieved by applying the scaling factors derived from this procedure. Specifically, the performance indicators were enhanced, with
reducing from 2.070 to 0.794 and
increasing from −1.908 to 0.520. The dotted lines represent the mean monthly discharge for each dataset: actual, initial prediction, and revised prediction. The placement of these mean values highlights the tendency of the initial prediction to overestimate discharge, while the revised prediction more closely approximates observed trends. These results demonstrate the effectiveness of incorporating additional neighboring watersheds to improve streamflow predictions, particularly when dealing with larger watersheds that extend beyond Canada’s borders and, therefore, lack complete delineation. This approach contributes to a more robust and reliable modeling framework.
The accuracy of predicted discharge data can also be significantly affected by anthropogenic activities, particularly during critically low-flow periods and peak irrigation demand months, as unauthorized water diversions across British Columbia’s water channels contribute to flow variability and discrepancies between observed and predicted values [26]. These diversions disrupt the natural flow regime by introducing artificial and undocumented alterations, especially in upstream areas. When diversions occur upstream of gauging stations, they reduce recorded flow volumes, which are crucial for model calibration and validation. As a result, prediction models misinterpret these diminished flows as indicative of natural conditions, leading to a systematic underestimation of streamflow in downstream and ungauged regions. This issue is particularly relevant in cases where a disproportionate amount of unauthorized water is diverted from a stream surrounded by agricultural areas, compared to its neighboring watersheds with minimal agricultural influence. It can be argued that, because unauthorized water diversions (i.e., the extraction or redirection of water without a valid license or in excess of permitted amounts) are present in historical records, they are inherently included in the observed discharge data used for model training. However, the critical issue is the interannual variability of these unauthorized diversions, which introduces uncertainty in the prediction process. The proposed models are trained on past data that reflect a mixture of natural flows and any historical anthropogenic alterations, including unauthorized diversions. While this enables the model to implicitly learn patterns under past diversion conditions, it does not account for year-to-year fluctuations in the magnitude, timing, or spatial distribution of unauthorized diversions, particularly during periods of low flow and high irrigation demand. These variations are unrecorded and differ significantly across watersheds depending on land use (e.g., intensity of agriculture), enforcement practices, and climatic conditions. Because there is no consistent or quantifiable record of unauthorized water use across the province, these diversions introduce a non-stationary component in the discharge data that the model cannot reliably capture or forecast. This contributes to the observed discrepancies between predicted and actual flows during critical periods.
Unauthorized diversions also undermine fundamental hydrological assumptions in predictive models, such as the assumed stability of the relationship between watershed characteristics (e.g., area, slope, and elevation) and streamflow. These models are built on the premise of natural, unaltered systems; however, diversions artificially reduce flow magnitudes, weakening the model’s ability to make accurate predictions. The impacts of these diversions extend across watersheds, introducing errors that propagate through the network and distort predictions for ungauged locations. This issue becomes particularly pronounced during extreme hydrological events such as droughts, when unauthorized diversions can exacerbate low-flow conditions, further undermining prediction reliability. The lack of regulation and documentation of unauthorized diversions complicates efforts to quantify their impact, exacerbating underestimation and reducing the overall accuracy of streamflow models. To enhance future predictive models, we recommend incorporating the proportion of agriculturally licensed water use areas relative to the total agricultural or water consumption area, ensuring a more comprehensive representation of human-induced hydrological impacts.
Another factor that could introduce uncertainty is the presence of springs within the watershed, which can impact localized hydrology. Springs, as localized sources of groundwater, contribute to streamflow, particularly during dry periods. Their impact can vary based on size, location, and seasonal groundwater variations. Since spring flow may not always be captured by traditional gauging stations or models focused on surface runoff, their contribution can be overlooked, leading to inaccuracies in streamflow predictions. Additionally, springs can alter local flow dynamics, especially in areas where groundwater discharge significantly influences streamflow. It may be contended that the influence of springs, whether perennial or ephemeral, is inherently included in the discharge measurements recorded at downstream hydrometric stations. Indeed, in such cases, the spring contributions are implicitly represented in the training data used by the model. However, our concern lies in the localized and heterogeneous nature of spring inflows, which may not be uniformly distributed across different watersheds. When using data-driven models such as the proposed technique, particularly those that leverage data from neighboring watersheds or use regional generalizations, the presence or absence of spring contributions introduces site-specific hydrological complexity. These complexities are not always captured effectively when predictor variables do not explicitly account for groundwater–surface water interactions. Moreover, while perennial springs tend to have more stable contributions throughout the year, seasonal variability and interannual changes in spring discharge (due to variations in groundwater recharge, land use, or climatic conditions) can subtly alter flow dynamics, especially during transition seasons like spring and fall. In models that do not explicitly parameterize groundwater discharge or spring dynamics, this can manifest as prediction discrepancies [27]. In summary, while spring contributions are captured at the gauge level, their variability and spatial non-uniformity, especially when transferring model assumptions or training across watersheds, can still pose challenges for predictive accuracy, and this is the context in which they were noted as a potential disturbance factor.
A relevant example of this issue can be found in Wilfred Creek (PD200737), located in the Chilliwack region of BC. In this area, a spring plays a significant role in contributing to the flow of the creek, which has implications for streamflow predictions. The presence of the spring, which feeds groundwater into the creek, introduces complexities in accurately forecasting the flow, as traditional models may not capture the contribution of this subsurface water source. Since the flow from the spring is not always accounted for by standard runoff-based models, the predictions for PD200737 can be inaccurate when relying solely on surface runoff data.
Table 4 shows that when the runoff generated exclusively by the spring is incorporated into the predicted flow values, the prediction accuracy improves significantly. This highlights the importance of considering all water sources, including springs, in hydrological modeling. Without integrating these local hydrological factors, such as spring contributions, the predictions could be misleading, resulting in the over- or underestimation of streamflow. Therefore, understanding the full spectrum of hydrological processes, including spring-fed contributions, is essential for improving model accuracy and making more reliable streamflow forecasts in such areas.
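The adjustment described above amounts to adding the spring-generated runoff to the model’s monthly predictions before comparing them with observations. The sketch below illustrates that arithmetic only; the function names are hypothetical and the discharge values are invented for illustration, so they do not reproduce Table 4 or any measurement at PD200737.

```python
def add_spring_contribution(predicted, spring_flow):
    """Add the spring-generated runoff to each monthly prediction."""
    return [p + s for p, s in zip(predicted, spring_flow)]


def mean_abs_error(observed, predicted):
    """Mean absolute difference between observed and predicted flows."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Made-up summer discharges (m^3/s) for three months; NOT data from Table 4.
observed = [0.75, 0.625, 0.25]
predicted = [0.5, 0.25, 0.125]      # runoff-based prediction without the spring
spring_flow = [0.25, 0.25, 0.125]   # assumed runoff generated by the spring alone

adjusted = add_spring_contribution(predicted, spring_flow)

print("MAE without spring:", mean_abs_error(observed, predicted))
print("MAE with spring:   ", mean_abs_error(observed, adjusted))
```

In this illustrative case the adjusted series sits closer to the observations, which mirrors the direction of the improvement reported in Table 4 without implying its magnitude.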
To address these challenges regarding unauthorized diversions and the contribution of springs, it is essential to implement monitoring and quantification systems to track diversions, correct historical flow records to account for these alterations, make use of local Indigenous knowledge, and incorporate anthropogenic influences into prediction models. This requires a better understanding of local hydrology and the translation of those factors into the modeling procedure, which was beyond the scope of the current study. Implementing these strategies is vital for improving the accuracy of streamflow predictions and ensuring sustainable water resource management, particularly during “high demand” irrigation periods. Here, the term “high demand” is used specifically in the context of water supply. While individual water diversions authorized under existing licenses on the stream may appear minor, their cumulative impact on low-flow systems, typically with discharges below 1 m³/s, can be substantial. This is particularly true during dry summer months when the natural baseflow is already limited. These small but numerous withdrawals can introduce noticeable variability in observed streamflow, which, in turn, affects the accuracy of model predictions.
While we acknowledge that the lumped nature of the modeling technique introduces certain limitations in accounting for the detailed hydrological behavior of individual watersheds, which may reduce predictive efficiency and necessitate the inclusion of more localized information to improve accuracy, it is important to emphasize the primary purpose of the proposed modeling approach. This technique is designed to support informed decision-making regarding water licensing applications and to provide a standardized method that can be readily reused by water management teams across the province.
For the sake of simplicity and operational efficiency, the proposed modeling technique is a practical and viable option. One of its key advantages is that it does not impose any financial burden on the province, unlike distributed or semi-distributed modeling techniques, which typically require significant investments in data collection, software, and expert personnel.
While we recommend this technique for general flow prediction tasks, we acknowledge that it may not be suitable for complex applications in water resources engineering and hydraulics. For critical applications that require higher temporal resolution (e.g., hourly rather than the monthly data on which our current methodology is based), such as flood simulations or the design of hydraulic infrastructure like flood diversion systems or spillways, we strongly recommend using more detailed, localized watershed data and modeling approaches. This involves supplementing basic parameters, such as watershed slope and elevation, with additional meaningful descriptors to improve the precision and reliability of the analysis.
Finally, it is important to acknowledge a foundational challenge in hydrological modeling: the reliability of streamflow data derived from stage–discharge rating curves. These curves, which convert water level observations into discharge estimates, are essential for flow monitoring but are subject to long-term inaccuracies due to changes in riverbed morphology, sedimentation, and anthropogenic alterations. While the streamflow data used in this study were sourced from national hydrometric networks and subjected to standard quality control procedures, the potential influence of rating curve variability cannot be entirely eliminated. To reduce sensitivity to such uncertainties, our approach operates at a monthly time scale, which tends to smooth short-term anomalies and reduce noise associated with episodic rating curve shifts. Additionally, by selecting neighboring watersheds from hydrologically coherent zones, we ensure a degree of consistency in both flow-generating mechanisms and data quality across sites. Nonetheless, we recognize that the evolving nature of rating curves remains a fundamental limitation in hydrology that must be considered when applying and interpreting model results, particularly in regions experiencing rapid morphological or land use changes.