2.1. Study Area and Data
The Big East River (620 km²) and the Black River (1522 km²) watersheds, located in northern Ontario, Canada, were chosen for the implementation of the proposed BMA scenario-based analysis (Figure 1). Both basins are mostly forested, and their landscapes are moderately sloped, with mean elevations of 450 and 300 m above sea level for the Big East River and Black River watersheds, respectively. The historical daily streamflow records at the outlets of both watersheds (the only hydrometric station in each watershed) show that high flows mostly occur in April, when snowmelt plays an important role. Moreover, as can be seen in Figure 1, the only six Environment Canada (EC) meteorological stations with reliable and sufficient historical data are located outside the boundaries of both watersheds, which represents a typical condition for watersheds with limited data availability. Analysis of the precipitation and temperature time series of these six stations indicates an annual mean precipitation of approximately 1050 mm and a daily average temperature of approximately 5 °C. The average winter and summer temperatures are −9 °C and 18 °C, respectively, showing that all four seasons are clearly defined in both study areas (Figure 2).
Besides the ground-based precipitation data, the archived daily aggregated form of the Canadian Precipitation Analysis (CaPA) was used as an alternative precipitation forcing input for hydrologic modeling of both watersheds. CaPA is a gridded precipitation product with a spatial resolution of 15 km, produced by the Meteorological Service of Canada by combining various data sources such as radar data, climate model data, and observations [46]. It has been shown that the archived CaPA is a potentially reliable source of precipitation for data-scarce regions [47]. To make an initial assessment of the precipitation variability of each basin under different datasets, a preliminary analysis was performed. Two mean areal precipitation time series were derived for each watershed: one from the EC ground-based data interpolated with an inverse distance weighting method [48], and one from the CaPA data using a Thiessen polygon approach [49]. As can be seen in Figure 3, although CaPA produced more intense rainfalls, particularly in the Black River watershed, it underestimated the total precipitation compared with the EC data in both watersheds. Moreover, the daily correlation coefficients between the EC- and CaPA-derived datasets (0.83 and 0.87 for the Big East River and Black River watersheds, respectively) give evidence of a linear relationship. However, when only intense rainfall events (precipitation > 10 mm/day) are considered, the correlation coefficients drop dramatically to 0.42 and 0.48 for the Big East River and Black River watersheds, respectively. There are therefore remarkable differences between the two datasets, especially for intense rainfall events, suggesting a significant amount of input uncertainty in data-poor watersheds. Consequently, CaPA was used as a second forcing dataset for the hydrologic models, which helps to better quantify the predictive uncertainty of the rainfall-runoff process within a Bayesian model averaging framework.
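For illustration, the following is a minimal Python sketch of how a mean areal precipitation series can be derived from point gauges with inverse distance weighting [48]; the station coordinates, grid points, and power parameter are placeholders rather than the exact configuration used in this study.

```python
import numpy as np

def idw_mean_areal_precip(station_xy, station_precip, grid_xy, power=2.0):
    """Mean areal precipitation from point gauges via inverse distance weighting.

    station_xy     : (S, 2) array of station coordinates
    station_precip : (T, S) array of daily precipitation at each station
    grid_xy        : (G, 2) array of points covering the watershed
    Returns a (T,) series: the interpolated field averaged over the grid.
    """
    # Distances between every grid point and every station: (G, S)
    d = np.linalg.norm(grid_xy[:, None, :] - station_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-6)              # avoid division by zero at a station
    w = 1.0 / d**power                   # inverse-distance weights
    w /= w.sum(axis=1, keepdims=True)    # normalize weights per grid point
    grid_precip = station_precip @ w.T   # (T, G) interpolated field
    return grid_precip.mean(axis=1)      # spatial average = mean areal precip
```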
2.2. Standard Bayesian Model Averaging Technique
Bayesian model averaging is a statistical method for generating probabilistic predictions from various competing forecasts, providing more reliability and accuracy than the initial ensemble predictions. In this approach, a weighted average of the individual forecasts' probability density functions (PDFs) is used to generate the posterior distribution of the forecast variable. Several studies have shown that higher weights are assigned to the better-performing predictions in the training period [30,32,35,40,45].
Consider $y$ as the quantity to be forecasted (i.e., the predictand) and, accordingly, let $Y^{T}=\{y_{1},y_{2},\ldots,y_{T}\}$ denote the training-period observations with data length $T$. Having $K$ different models (i.e., $M_{1},M_{2},\ldots,M_{K}$) results in $f=\{f_{1},f_{2},\ldots,f_{K}\}$, the ensemble of model predictions for the aforementioned training period, where $f_{k}$ is the prediction of model $M_{k}$. Based on the law of total probability and the assumption of independence between the model forecasts, the PDF of the predictand conditioned on the models over the given training period can be formulated as follows [15]:

$$p\left(y \mid f_{1},f_{2},\ldots,f_{K},Y^{T}\right)=\sum_{k=1}^{K} p\left(f_{k} \mid Y^{T}\right)\, p_{k}\left(y \mid f_{k},Y^{T}\right) \qquad (1)$$

where $p_{k}(y \mid f_{k},Y^{T})$ is the posterior distribution of $y$ given the prediction of model $M_{k}$ and the observed data $Y^{T}$, which can simply be considered the forecast PDF of $y$ based on model $M_{k}$. Moreover, $p(f_{k} \mid Y^{T})$ is the posterior probability, or likelihood, of model $M_{k}$'s prediction being correct over the training period. These posterior probabilities sum to unity, $\sum_{k=1}^{K} p(f_{k} \mid Y^{T})=1$, and consequently they can be considered weights (i.e., $w_{k}=p(f_{k} \mid Y^{T})$ is the weight of model $M_{k}$). Furthermore, the BMA approach assumes that the model forecasts are unbiased, meaning that the expected difference between the observation and each model forecast is zero (i.e., $E(y-f_{k})=0$ for $k=1,2,\ldots,K$). Therefore, before BMA implementation, a bias-correction method should be applied to create an unbiased ensemble of predictions. Although several bias-correction methods can serve this purpose, a linear regression technique is used in the original BMA [16]. The bias-corrected results, $f_{k}^{'}=a_{k}+b_{k}f_{k}$ (where $a_{k}$ and $b_{k}$ are the coefficients of the linear regression model), replace the original model forecasts ($f_{k}$). Therefore, the BMA predictive model (Equation (1)) can be rewritten as follows:

$$p\left(y \mid f_{1}^{'},f_{2}^{'},\ldots,f_{K}^{'},Y^{T}\right)=\sum_{k=1}^{K} w_{k}\, p_{k}\left(y \mid f_{k}^{'},Y^{T}\right) \qquad (2)$$
In the original BMA method, the aforementioned conditional PDF (i.e., $p_{k}(y \mid f_{k}^{'},Y^{T})$) is assumed to follow the normal (Gaussian) distribution, $N(f_{k}^{'},\sigma_{k}^{2})$, with mean $f_{k}^{'}$ and variance $\sigma_{k}^{2}$, the latter reflecting the uncertainty within the individual model $M_{k}$. As explained in the Introduction, several studies have argued that this assumption is a poor choice for a non-Gaussian forecast variable such as streamflow. They therefore proposed implementing more representative distribution types (e.g., the gamma distribution) or applying data transformation procedures (e.g., the Box–Cox transformation method [50]) to transform the data from their original space to a Gaussian space. It is worth mentioning that when a data transformation procedure is applied, the inverse transformation must also be applicable in order to revert the results back to the original variable space.
Finally, based on Equation (2) and considering the Gaussian distribution, the BMA predictive mean and its associated variance can be determined using the two following equations [15,16]:

$$E\left[y \mid f_{1}^{'},\ldots,f_{K}^{'},Y^{T}\right]=\sum_{k=1}^{K} w_{k} f_{k}^{'} \qquad (3)$$

$$\operatorname{Var}\left[y \mid f_{1}^{'},\ldots,f_{K}^{'},Y^{T}\right]=\sum_{k=1}^{K} w_{k}\left(f_{k}^{'}-\sum_{i=1}^{K} w_{i} f_{i}^{'}\right)^{2}+\sum_{k=1}^{K} w_{k}\sigma_{k}^{2} \qquad (4)$$

The mean value is the weighted average of the individual predictions, and the BMA variance consists of (1) the between-model variance, reflecting the spread of the ensemble, and (2) the within-model variance, representing the uncertainty about each model having the best forecast.
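As a concrete illustration of Equations (3) and (4), the short sketch below computes the BMA predictive mean and variance at a single time step from given weights, bias-corrected forecasts, and within-model standard deviations; it is an expository sketch, not the authors' implementation.

```python
import numpy as np

def bma_mean_variance(forecasts, weights, sigmas):
    """BMA predictive mean and variance at one time step (Equations (3)-(4)).

    forecasts : (K,) bias-corrected forecasts f'_k
    weights   : (K,) BMA weights w_k, summing to 1
    sigmas    : (K,) within-model standard deviations sigma_k
    """
    mean = np.sum(weights * forecasts)                   # Eq. (3)
    between = np.sum(weights * (forecasts - mean) ** 2)  # between-model spread
    within = np.sum(weights * sigmas ** 2)               # within-model variance
    return mean, between + within                        # Eq. (4)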
Successful implementation of the BMA method relies on the proper estimation of its parameters, namely the weight ($w_{k}$) and variance ($\sigma_{k}^{2}$) of each individual prediction ($k=1,2,\ldots,K$). Following Raftery et al. [16], the standard BMA utilizes the EM algorithm to maximize the log-likelihood function of the parameter vector ($\theta=\{w_{k},\sigma_{k}^{2};\ k=1,\ldots,K\}$), which is approximated as follows:

$$\ell(\theta)=\sum_{t=1}^{T}\log\left(\sum_{k=1}^{K} w_{k}\, p_{k}\left(y_{t} \mid f_{k,t}^{'},Y^{T}\right)\right) \qquad (5)$$
Given that there is no analytical solution for maximizing this function over the training period, an iterative procedure such as the EM algorithm is required. In this procedure, the optimization problem is posed by introducing a latent variable ($z_{k,t}$). After initialization, the algorithm alternates between (1) an expectation step, in which the latent variable is calculated from the current values of the parameters, and (2) a maximization step, in which the parameters are re-estimated according to the updated values of the latent variable (Figure 4b). It is worth noting that, although the EM algorithm is computationally efficient, it has been argued that other optimization methods can lead to more robust parameter estimates.
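The sketch below illustrates this expectation–maximization loop for the Gaussian case with a common standard deviation across members (one of the parameterizations discussed in Section 2.3.4); the initialization and stopping tolerance are illustrative choices, not those of the original implementation.

```python
import numpy as np
from scipy.stats import norm

def bma_em(obs, fcst, tol=1e-6, max_iter=500):
    """EM estimation of Gaussian BMA weights and a common standard deviation.

    obs  : (T,) observations over the training period
    fcst : (T, K) bias-corrected ensemble forecasts
    """
    T, K = fcst.shape
    w = np.full(K, 1.0 / K)                      # initial equal weights
    sigma = obs.std()                            # initial common std dev
    old_ll = -np.inf
    for _ in range(max_iter):
        # E-step: z[t, k] = posterior probability that member k generated
        # observation t, given the current parameter values
        dens = w * norm.pdf(obs[:, None], loc=fcst, scale=sigma)
        z = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights and variance from the expected memberships
        w = z.mean(axis=0)
        sigma = np.sqrt(np.sum(z * (obs[:, None] - fcst) ** 2) / T)
        ll = np.sum(np.log(dens.sum(axis=1)))    # log-likelihood, Eq. (5)
        if ll - old_ll < tol:                    # stop when improvement stalls
            break
        old_ll = ll
    return w, sigma
```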
The flowchart of the classical BMA implementation based on the above equations is depicted in Figure 4a. As previously stated, several studies have attempted to improve the reliability of the standard BMA approach by modifying parts of the BMA structure. However, no comprehensive evaluation has been carried out to clarify the effects of these modifications.
2.3. BMA Scenario-Based Analysis
In order to achieve the main goal of this research, we designed a BMA scenario-based analysis (Table 1) to examine how the predictive streamflow simulation of the BMA approach is affected by modifying or changing certain steps of the original BMA procedure. The proposed evaluation makes it possible to assess how sensitive the accuracy and reliability of the BMA probabilistic results are to (1) different streamflow ensemble scenarios; (2) various data transformation methods; (3) more representative distribution types; (4) different standard deviation definitions; and (5) different optimization methods for parameter estimation. These scenarios were chosen to cover most of the aforementioned modifications proposed in previous studies (explained in Section 1). Therefore, the effects of each modification, and of combinations of modifications, on the BMA results can be assessed comprehensively through the proposed analysis. The following subsections briefly describe each of these modifications.
2.3.1. Streamflow Ensemble
As mentioned before, the ensemble can stem from different sources. Apart from different hydrologic models, various precipitation forcing inputs, as well as different reliable parameter sets of each rainfall-runoff model, can be used to generate an ensemble of streamflow simulations. In this study, four scenarios were defined to examine how the BMA performance changes with the number and origin of the ensemble members. In the first scenario, named “Multi-Model”, the ensemble was based only on different hydrologic models. In two further scenarios (Multi-Model Multi-Input and Multi-Model Multi-Parameter), different precipitation datasets and various parameter sets, respectively, were used in addition to the multiple hydrologic models. The last scenario combined all of the aforementioned sources (Multi-Model Multi-Input Multi-Parameter).
2.3.2. Data Transformation Methods
Four different data transformation procedures were assessed for the cases in which a normal function is assumed for the posterior distributions. The Box–Cox transformation is a family of power transformations, one common form of which is formulated as follows [50]:

$$z=\begin{cases}\dfrac{y^{\lambda}-1}{\lambda}, & \lambda \neq 0\\[4pt] \ln(y), & \lambda=0\end{cases} \qquad (6)$$

where $y$ and $z$ are the original and transformed data, respectively, and $\lambda$ is the Box–Cox coefficient, whose common optimum value is estimated by maximizing the log-likelihood function using either (1) the observed data only (Type 1) or (2) both the observed and simulated data (Type 2). Moreover, in the logarithmic transformation method, the daily streamflow data are transformed using the natural logarithm so that they approximately follow the normal distribution. Another data transformation method evaluated in this study is the Empirical Normal Quantile Transformation (ENQT) [51]. In this approach, the transformed data are calculated using the following equation, where $\Phi^{-1}$ is the inverse of the standard normal distribution and $\hat{F}(y_{t})$ denotes the empirical cumulative distribution of each value:

$$z_{t}=\Phi^{-1}\left(\hat{F}\left(y_{t}\right)\right) \qquad (7)$$

It is of note that, instead of the empirical distribution, a generalized Pareto distribution is fitted to extrapolate the upper tail of the sample when a value falls outside the range of the calibration data.
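The three transformation families can be illustrated compactly as follows; the input file name is a placeholder, the Box–Cox coefficient is fitted to the observations only (Type 1), and the Weibull plotting position used for the empirical CDF is an assumption, as is the omission of the Pareto tail extension. Streamflow must be strictly positive for the Box–Cox and logarithmic transformations.

```python
import numpy as np
from scipy.stats import boxcox, norm

flow = np.loadtxt("daily_flow.txt")    # hypothetical daily streamflow series

# Box-Cox (Equation (6)): scipy estimates lambda by maximizing the
# log-likelihood of the transformed data (Type 1 when fitted to observations)
z_boxcox, lam = boxcox(flow)

# Logarithmic transformation
z_log = np.log(flow)

# ENQT (Equation (7)): map each value through its empirical CDF, then through
# the inverse standard normal distribution
ranks = flow.argsort().argsort() + 1   # ranks 1..n
F_hat = ranks / (len(flow) + 1)        # Weibull plotting position avoids 0 and 1
z_enqt = norm.ppf(F_hat)
```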
2.3.3. Distribution Types
Apart from the normal distribution, which is the main assumption of the original BMA method, the log-normal, gamma, and Weibull distributions were implemented as the conditional probability distribution function in Equation (2). These distributions are more representative of highly skewed data, such as daily streamflow, and may therefore lead to better results.
2.3.4. Standard Deviation Types
In this study, following Vrugt [37], six different standard deviation parameterizations of the forecast distributions were assessed. The terms “common” and “individual” are used when all ensemble members share the same standard deviation or have distinct standard deviations, respectively. Two further terms indicate whether the standard deviations depend on the magnitude of the streamflow (“non-constant”) or not (“constant”). Moreover, the last two types are defined by adding a constant value so that the standard deviation remains greater than zero in all cases. The equations of all aforementioned standard deviation types and their corresponding numbers of parameters are presented in Table 2. In these equations, $\sigma_{k,t}$ and $Q_{k,t}$, respectively, denote the standard deviation and the daily discharge of the $k$th simulated streamflow at time step $t$, and $K$ is the total number of members in the ensemble.
2.3.5. Optimization Methods
Given the criticism of the EM algorithm regarding its ability to reach the global optimum, and its lack of flexibility when applied to the various aforementioned modifications, the dynamically dimensioned search (DDS) method [52] was used as an alternative optimization technique for estimating the BMA parameters. DDS is a single-solution global optimization method that finds a near-optimal solution by dynamically rescaling the dimension of the search space as the search progresses. As in the EM algorithm, the log-likelihood of the BMA parameter vector is used as the objective function in the DDS approach. Accordingly, the DDS parameter estimates can serve as benchmarks for evaluating the application of the EM algorithm.
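A minimal sketch of the DDS search loop, following the structure described by Tolson and Shoemaker [52], is given below; the perturbation factor r = 0.2 is the default suggested in that paper, while the bound handling and seed are illustrative choices. For the BMA application, candidate weights can simply be renormalized to sum to one inside the objective function.

```python
import numpy as np

def dds(objective, x0, lower, upper, max_iter=1000, r=0.2, seed=0):
    """Dynamically dimensioned search (maximization), after Tolson & Shoemaker.

    objective : function mapping a parameter vector to a scalar score
    x0, lower, upper : initial guess and box constraints (1-D arrays)
    r : perturbation size as a fraction of each variable's range
    """
    rng = np.random.default_rng(seed)
    x_best = np.asarray(x0, dtype=float)
    f_best = objective(x_best)
    n = len(x_best)
    for i in range(1, max_iter + 1):
        # Probability of perturbing each dimension shrinks as the search matures
        p = 1.0 - np.log(i) / np.log(max_iter)
        mask = rng.random(n) < p
        if not mask.any():
            mask[rng.integers(n)] = True       # always perturb at least one
        x_new = x_best.copy()
        step = r * (upper - lower) * rng.standard_normal(n)
        x_new[mask] += step[mask]
        # Reflect at the bounds, then clip any remaining violation
        x_new = np.where(x_new < lower, 2 * lower - x_new, x_new)
        x_new = np.where(x_new > upper, 2 * upper - x_new, x_new)
        x_new = np.clip(x_new, lower, upper)
        f_new = objective(x_new)
        if f_new > f_best:                     # greedy acceptance
            x_best, f_best = x_new, f_new
    return x_best, f_best
```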
2.4. Hydrological Models
Using different hydrologic models to generate an ensemble of competing streamflow simulations is the main basis of the BMA approach [9]. As listed in Table 3, the seven rainfall-runoff models implemented in this study are SAC-SMA, MAC-HBV, SMARG, GR4J, and three HEC-HMS-based models [53]. The HEC-HMS platform offers several methods for each component of the hydrologic cycle. In this study, rational combinations of loss methods (deficit-and-constant, and soil moisture accounting) and baseflow methods (recession, and linear reservoir) were used to generate HEC-HMS-based models with different structures. In HEC-HMS types 1 and 2, the recession baseflow method is combined with the deficit-and-constant and soil moisture accounting loss approaches, respectively, while HEC-HMS type 3 combines the soil moisture accounting and linear reservoir methods.
All of the aforementioned models are lumped conceptual models, which have been shown to provide comparable or even better performance than more complex (e.g., distributed) models in data-poor watersheds [54,55,56]. Moreover, by adding the simplified Thornthwaite formula [57,58] to the first four models, and by feeding the HEC-HMS models the average monthly potential evapotranspiration calculated with the Hargreaves equation [59], the only inputs required by all models are the mean areal daily precipitation and temperature. Streamflow at the outlet of the watershed is the only output of these models. It is worth mentioning that, because of the importance of snow accumulation and melt in cold regions, three different snowmelt modules were implemented with the different hydrologic models. The temperature-index method available in the HEC-HMS software [53] was used for the three aforementioned HEC-HMS-based models. The simple degree-day snowmelt module (DDM) [58] was added to the SMARG and GR4J models, while the SAC-SMA and MAC-HBV models were combined with the more complex SNOW17 snowmelt estimation method [60,61] for snow–rainfall discrimination and for quantifying snowpack changes over the simulation period.
In the DDM approach, snowmelt is calculated from a linear relationship between snowmelt and air temperature with a constant melt-rate factor, whereas the HEC-HMS snowmelt approach determines the melt rate from an antecedent temperature index [62]. In contrast, SNOW17 is a process-based temperature-index method that accounts for several physical processes in the snowmelt procedure, such as the energy exchange between air and snow, the heat storage and heat deficit of the snowpack, and liquid water storage. In both the DDM and SNOW17 models, upper and lower preset temperature thresholds are used to distinguish between rainfall and snowfall [63]. For a more detailed description of the snow routines, readers are referred to the aforementioned citations.
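As an illustration of the simplest of these routines, the degree-day relation used in the DDM module reduces to a few lines; the melt factor and base temperature below are placeholder values, not the calibrated ones used in this study.

```python
def degree_day_melt(temp_c, swe_mm, melt_factor=3.0, base_temp_c=0.0):
    """Daily snowmelt (mm) from a constant melt-rate factor (mm/degC/day).

    Melt is linear in the temperature excess over the base temperature and
    cannot exceed the available snow water equivalent (SWE).
    """
    potential = melt_factor * max(temp_c - base_temp_c, 0.0)
    return min(potential, swe_mm)
```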
Furthermore, five different objective functions, including the Nash–Sutcliffe efficiency (NSE) [68], the Kling–Gupta efficiency (KGE) [69], the Nash volume error (NVE) [58], the peak-weighted root mean square error (PWRMSE) [70], and the modified Nash volume error (MNVE), were used with the DDS algorithm to find the optimized parameter sets of each individual model. The last objective function was defined to place greater emphasis on high flows by using the NSE based on the square of discharge ($NSE_{sq}$):

$$MNVE = NSE_{sq} - 0.1\left|VE\right| \qquad (8)$$

where the volume error ($VE$) is:

$$VE=\frac{\sum_{t=1}^{n}\left(Q_{obs,t}-Q_{sim,t}\right)}{\sum_{t=1}^{n} Q_{obs,t}} \qquad (9)$$

and the NSE based on the square of discharge ($NSE_{sq}$) is calculated as follows:

$$NSE_{sq}=1-\frac{\sum_{t=1}^{n}\left(Q_{obs,t}^{2}-Q_{sim,t}^{2}\right)^{2}}{\sum_{t=1}^{n}\left(Q_{obs,t}^{2}-\overline{Q_{obs}^{2}}\right)^{2}} \qquad (10)$$

In the above equations, $Q_{obs,t}$ and $Q_{sim,t}$ are the observed and simulated streamflow at time step $t$, respectively, and $n$ is the data length. The years 2006 to 2011 were considered the calibration period, and validation was carried out over the 2012–2015 (4-year) period. It is of note that the best-performing parameter set of each individual model, determined on the basis of the validation results, was used to generate the multi-model and multi-model multi-input ensemble scenarios. For a detailed description of the aforementioned hydrologic models and objective functions, readers are referred to the cited references.
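These criteria translate directly into code; the sketch below implements VE, NSE_sq, and their MNVE combination as in Equations (8)–(10), under the NVE-style 0.1 weighting of the volume term assumed above.

```python
import numpy as np

def volume_error(q_obs, q_sim):
    """Relative volume error, Equation (9)."""
    return np.sum(q_obs - q_sim) / np.sum(q_obs)

def nse(q_obs, q_sim):
    """Classic Nash-Sutcliffe efficiency."""
    return 1.0 - np.sum((q_obs - q_sim) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)

def nse_sq(q_obs, q_sim):
    """NSE on squared discharges, emphasizing high flows (Equation (10))."""
    return nse(q_obs ** 2, q_sim ** 2)

def mnve(q_obs, q_sim):
    """Modified Nash volume error (Equation (8))."""
    return nse_sq(q_obs, q_sim) - 0.1 * abs(volume_error(q_obs, q_sim))
```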
2.5. Performance Evaluation Metrics
Five model evaluation statistics were used to compare the accuracy, reliability, and sharpness of the results of the different BMA variants. Accuracy is defined in terms of the error between the deterministic simulations and their corresponding observations. In this study, besides the well-known Nash–Sutcliffe efficiency criterion, the NSE calculated on squared ($NSE_{sq}$; Equation (10)) and logarithmically ($NSE_{log}$; Equation (11)) transformed streamflow data served as two additional deterministic performance criteria, focused on the accuracy of the high- and low-flow simulations, respectively:

$$NSE_{log}=1-\frac{\sum_{t=1}^{n}\left(\ln Q_{obs,t}-\ln Q_{sim,t}\right)^{2}}{\sum_{t=1}^{n}\left(\ln Q_{obs,t}-\overline{\ln Q_{obs}}\right)^{2}} \qquad (11)$$

Here, $Q_{obs,t}$ is the observed variable and $Q_{sim,t}$ is the simulated variable, taken as the expected value of the BMA predictive simulation, while $n$ is the length of the dataset. All NSE-based criteria vary between −∞ and 1, with a best value of 1.
Furthermore, two other probabilistic performance measures proposed by Xiong et al. [71] were adopted for the quantitative evaluation of the BMA probabilistic results. The containing ratio ($CR$) is defined as the percentage of the observed data falling within the 95% confidence interval, and the average bandwidth ($BW$) is the average width of the corresponding bound:

$$CR=\frac{n_{in}}{n}\times 100\% \qquad (12)$$

$$BW=\frac{1}{n}\sum_{t=1}^{n}\left(q_{u,t}-q_{l,t}\right) \qquad (13)$$

The former measures reliability, while the latter quantifies the sharpness of the results. Given two forecasts with the same $CR$ (i.e., the same reliability), the one with the smaller $BW$ shows greater precision. In the above equations, $n_{in}$ denotes the number of observations contained in the 95% confidence interval, and $q_{u,t}$ and $q_{l,t}$ are the upper and lower boundaries of the 95% confidence interval at time step $t$, respectively. In addition, to evaluate the probabilistic performance of the different BMA variants for high flows, the two aforementioned probabilistic indices were also calculated using only the streamflow values above the 90th percentile (denoted by $CR_{90}$ and $BW_{90}$ for the containing ratio and the average bandwidth, respectively).
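For completeness, a short sketch of the two probabilistic indices (Equations (12) and (13)) and the high-flow restriction follows; the helper names are illustrative.

```python
import numpy as np

def containing_ratio(q_obs, q_lower, q_upper):
    """Percentage of observations inside the 95% interval (Equation (12))."""
    inside = (q_obs >= q_lower) & (q_obs <= q_upper)
    return 100.0 * inside.mean()

def average_bandwidth(q_lower, q_upper):
    """Mean width of the 95% interval (Equation (13))."""
    return np.mean(q_upper - q_lower)

def high_flow_mask(q_obs):
    """Time steps where observed flow exceeds its 90th percentile."""
    return q_obs > np.percentile(q_obs, 90)

# Usage: restrict all series with the mask to obtain the high-flow indices
# m = high_flow_mask(q_obs)
# cr_90 = containing_ratio(q_obs[m], q_lower[m], q_upper[m])
# bw_90 = average_bandwidth(q_lower[m], q_upper[m])
```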