Novel Ensemble Forecasting of Streamflow Using Locally Weighted Learning Algorithm

The development of advanced computational models for improving the accuracy of streamflow forecasting could save time and cost for sustainable water resource management. In this study, a locally weighted learning (LWL) algorithm is combined with the Additive Regression (AR), Bagging (BG), Dagging (DG), Random Subspace (RS), and Rotation Forest (RF) ensemble techniques for streamflow forecasting in the Jhelum Catchment, Pakistan. To build the models, the data were divided into four cross-validation datasets (M1–M4) and the initial parameters were grouped into input combinations (I–V). Previous lagged values of streamflow were used as model inputs, and prediction accuracy was examined on the basis of the correlation coefficient (R), root mean squared error (RMSE), mean absolute error (MAE), relative absolute error (RAE), and root relative squared error (RRSE). The results showed that incorporating periodicity (i.e., the month number, MN) as an additional input variable considerably improved both the training and predictive performance of the models. A comparison between the results obtained from input combinations III and IV revealed a significant performance improvement. The cross-validation revealed that dataset M3 provided more accurate results than the other datasets. While all the ensemble models successfully outperformed the standalone LWL model, the ensemble LWL-AR model was identified as the best model. Our study demonstrates that the ensemble modeling approach is a robust and promising alternative to single-model forecasting of streamflow that should be further investigated with datasets from other regions around the world.


Introduction
To understand the current state, potential, and prospects of water availability, systematic studies on all aspects of basin hydrology (e.g., precipitation, surface, and sub-surface water) and investigation of all indicators are required [1][2][3]. Streamflow is one such indicator that has a direct influence on local drinking water supply and the quantity of water available for irrigation, hydro-electricity generation, and other needs [4]. Indeed, projections have shown that 20% of river discharge is controlled by human interventions [5]. Changes in land use and land cover over time, glaciers, snowfields, topographic boundaries, dams, and reservoir management are some of the key factors influencing streamflow. In this study, we combine the LWL algorithm with the Additive Regression (AR), Bagging (BG), Dagging (DG), Random Subspace (RS), and Rotation Forest (RF) ensemble techniques to develop five ensemble models for a novel ensemble forecasting of streamflow. We apply the models to lagged streamflow time-series inputs derived from the antecedent streamflow data. To the best of our knowledge, the LWL technique has not yet been investigated for streamflow forecasting, and this study is the first to use and compare different versions of LWL-based ensemble models for this purpose.

Case Study
For this study, the Jhelum Catchment, located in the western Himalayas in the northern part of Pakistan, was selected. The catchment originates in India and drains the southern slope of the Greater Himalayas and the northern slope of the Pir Panjal Mountains. The upstream part of the basin, located in India, is covered by large glaciers, and due to climate change in recent years, this transboundary river on the Pakistani side has been strongly affected by glacier melt. Pakistan operates a key reservoir (the Mangla Reservoir) downstream of the basin. This reservoir is the second largest in Pakistan, with an installed capacity of 1000 MW, and meets 6% of the country's electricity generation demand. Precise streamflow estimation in this catchment is therefore crucial for the economy and for sustainable water resource management in Pakistan. The catchment consists of two main sub-basins, the Naran and Neelum basins, and covers a drainage area of 33,342 km² up to Mangla Dam, with elevations ranging from 200 m to 6248 m. For accurate estimation of streamflow in this basin, the key gauging station, the Kohala station on the main Jhelum River downstream of the confluence of both key tributaries (Neelum and Naran), was selected, as shown in Figure 1. For model development, the monthly streamflow data of the selected station were obtained from the Water and Power Development Authority (WAPDA) of Pakistan for the period 1965 to 2012. For a robust analysis, a cross-validation scheme was applied: the data were divided into four equal datasets, and each dataset was used in turn for model testing while the other three were used for model training.
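As an illustration of this blocked cross-validation scheme, the following minimal Python sketch splits a monthly series into four equal datasets (M1–M4) and rotates the test block. The file name and array handling are illustrative assumptions, not the authors' actual code.

```python
# A minimal sketch of the four-fold blocked cross-validation described above:
# each block (M1-M4) serves once as the test set while the other three blocks
# are used for training. The input file is hypothetical.
import numpy as np

flow = np.loadtxt("kohala_monthly_flow.txt")      # hypothetical data file
blocks = np.array_split(np.arange(len(flow)), 4)  # M1..M4 index blocks

for i, test_idx in enumerate(blocks, start=1):
    # Training indices: all months outside the current test block.
    train_idx = np.hstack([b for j, b in enumerate(blocks) if j != i - 1])
    print(f"M{i}: train={len(train_idx)} months, test={len(test_idx)} months")
```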

Locally Weighted Learning (LWL) Algorithm
The locally weighted learning (LWL) algorithm is motivated by instance-based (example-based) learning approaches [30]. In this algorithm, no regression model is built until a prediction is requested for a new input vector; all learning is thus deferred to the moment of prediction (lazy learning). LWL can be viewed as an extension of the M5 method in that it fits both linear and non-linear regression locally in the neighborhood of each query example [31]. The distance of each training instance to the query point is used to assign it a weight, and a weighted regression equation is then produced. A wide range of distance-based weighting methods can be used, depending on the problem at hand [32]. The statistical models for basic linear regression and multiple linear regression are given in Equations (1) and (2), respectively:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (1)$$

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_n x_{in} + \varepsilon_i \quad (2)$$

where y is the response (dependent variable), x is the predictor (independent variable), y_i and ε_i are random variables, and x_i is a constant. The model is linear in the β parameters. The LWL objective function of squared error is expressed as:

$$F = \sum_{k=1}^{M} w_k \, \varepsilon_k^2, \qquad \varepsilon_k = y_k - \left( \alpha_{k0} + \alpha_{k1} x_{k1} + \cdots + \alpha_{kn} x_{kn} \right) \quad (3)$$

where F is the objective function, w is the weight function matrix, M is the total number of variables, ε_k is the random error, and α_{k0}, ..., α_{kn} are the regression coefficients.
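To make the lazy, query-driven nature of LWL concrete, the following Python sketch fits a weighted linear model around each query point using a Gaussian distance kernel. This is a generic illustration of locally weighted regression, not the exact Weka LWL implementation used in this study; the kernel choice and bandwidth are assumptions.

```python
# Illustrative sketch of locally weighted linear regression (not the exact
# Weka LWL implementation); the Gaussian kernel and bandwidth are assumptions.
import numpy as np

def lwl_predict(X_train, y_train, x_query, bandwidth=1.0):
    """Fit a weighted linear model around a single query point and predict."""
    # Distance-based weights: nearer training points get larger weights.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    w = np.exp(-(dists ** 2) / (2 * bandwidth ** 2))

    # Augment with a bias column, then solve the weighted least-squares
    # problem  min_beta  sum_k w_k * (y_k - x_k^T beta)^2.
    Xb = np.column_stack([np.ones(len(X_train)), X_train])
    W = np.diag(w)
    beta, *_ = np.linalg.lstsq(Xb.T @ W @ Xb, Xb.T @ W @ y_train, rcond=None)

    return np.r_[1.0, x_query] @ beta  # prediction for the query point

# Example: no model is built until a query arrives (lazy learning).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
print(lwl_predict(X, y, np.array([5.0]), bandwidth=0.5))
```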

Bagging
Bagging, or "Bootstrap Aggregating", is a method composed of two major steps for obtaining more stable, robust, and precise models [33,34]. It is a stable ensemble learning technique based on resampling the training dataset. In the first step, bootstrap samples of the raw data are drawn to form various training datasets, and multiple models are created from them. Predictions are then generated from these repeatedly trained models. The underlying notion of the Bagging technique is straightforward: instead of generating predictions from a standalone model fitted to the actual data, the relationship between the input and output variables is described by the multiple generated models, which are then combined into a single output using a weighted average [35,36]. This strategy can effectively reduce the possible uncertainties in the modeling process. Previous works have shown that Bagging is a favorable choice for ensemble modeling of many environmental problems [29].
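A minimal scikit-learn sketch of the bagging procedure is shown below. Since the Weka LWL learner is not available in scikit-learn, a k-nearest-neighbours regressor stands in as the lazy base learner; the toy data and hyperparameters are illustrative assumptions.

```python
# Bagging sketch: each base model is trained on a bootstrap resample of the
# training data and the predictions are averaged. KNN stands in for the
# lazy LWL base learner used in the paper (an assumption for illustration).
import numpy as np
from sklearn.ensemble import BaggingRegressor   # requires scikit-learn >= 1.2
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 3))   # e.g., lagged streamflow inputs
y = X.sum(axis=1) + rng.standard_normal(200)

model = BaggingRegressor(
    estimator=KNeighborsRegressor(n_neighbors=5),
    n_estimators=10,                    # ten bootstrap-trained base models
    random_state=1,
).fit(X, y)
print(model.predict(X[:3]))             # averaged ensemble prediction
```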

Additive Regression
Additive Regression was first developed by Stone [37] as a nonparametric method to approximate a multivariate function using multiple unary functions. For the dependent variable Y and the independent variables X_1, X_2, ..., X_p, the nonparametric additive model can be given by:

$$Y = \alpha + \sum_{i=1}^{p} f_i(X_i) + \varepsilon \quad (4)$$

where f_i(X_i) is a unary nonparametric function. To satisfy the identifiability conditions, it is generally required that E[f_i(X_i)] = 0, i = 1, 2, ..., p. Compared to traditional linear models, the nonparametric regression model does not pre-suppose the relationship between the variables or the form of the regression function. Further, it is an adaptable and robust data-driven model that can yield a better approximation for nonlinear, nonhomogeneous problems [38]. Given these advantages, many researchers have applied this technique to study linear and nonlinear relationships in environmental problems [39].
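The sketch below illustrates one common way to fit such an additive ensemble: stagewise residual fitting, in which each new base model is trained on the residuals of the current ensemble. The shallow regression trees and shrinkage factor are assumptions for illustration and do not reproduce Weka's exact AdditiveRegression meta-learner.

```python
# Stagewise additive-regression sketch: each stage fits the residuals of the
# current ensemble; the final prediction is the sum of all stage outputs.
# Shallow trees and shrinkage=0.5 are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def additive_fit(X, y, n_stages=50, shrinkage=0.5):
    models = []
    pred = np.full(len(y), y.mean())          # start from the mean response
    for _ in range(n_stages):
        stage = DecisionTreeRegressor(max_depth=2).fit(X, y - pred)
        pred += shrinkage * stage.predict(X)  # shrunken residual correction
        models.append(stage)
    return y.mean(), models

def additive_predict(offset, models, X, shrinkage=0.5):
    return offset + shrinkage * sum(m.predict(X) for m in models)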

Random Subspace (RS)
Random Subspace (RS) was developed by Ho [40] as an ensemble learning technique for resolving real-world problems. In this technique, numerous classifiers are trained on modified feature spaces: multiple training subsets are generated by randomly sampling subsets of the features, and the classifiers trained on them are then combined. As stated by Havlíček et al. [41], RS draws its multiple samples in feature space, as opposed to the example (instance) space used by other ensemble models. This strategy takes advantage of both bootstrapping and aggregation. The RS inputs are the training set (x), the base classifier (w), and the number of subspaces (L) [42]. Pham et al. [43] strongly recommend this approach for preventing over-fitting and for coping with highly redundant datasets.
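The following sketch illustrates the Random Subspace idea: each base model is trained on a random subset of the features (columns) rather than a resample of the instances (rows). The base learner and subspace fraction are illustrative assumptions.

```python
# Random Subspace sketch: sample feature subsets, train one base model per
# subset, and average the predictions. KNN as base learner is an assumption.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def random_subspace_fit(X, y, n_models=10, subspace_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    k = max(1, int(subspace_frac * n_features))
    ensemble = []
    for _ in range(n_models):
        feats = rng.choice(n_features, size=k, replace=False)  # feature subset
        model = KNeighborsRegressor(n_neighbors=5).fit(X[:, feats], y)
        ensemble.append((feats, model))
    return ensemble

def random_subspace_predict(ensemble, X):
    # Each model only sees its own feature subset at prediction time too.
    return np.mean([m.predict(X[:, f]) for f, m in ensemble], axis=0)
```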

Dagging
Ting and Witten [44] pioneered the Dagging algorithm as a resampling ensemble technique that uses majority voting to combine various classifiers to achieve improved prediction accuracy over the base classifier. Instead of producing bootstrap samples, Dagging generates multiple disjoint samples from which the base classifiers are built. In recent years, it has been considered a promising machine learning algorithm and has been applied to solve different real-world classification problems. From a given training dataset containing N samples, M smaller datasets can be created [45,46], each containing n (n < N) samples that are distinct from each other; because the samples are drawn without replacement, no instance appears in more than one subset. A base classifier is then fitted to each sample dataset, so that many classifiers can be obtained from a single training dataset. The capability of Dagging for improved predictive modeling of different classification problems has been frequently demonstrated [29,47].
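A minimal sketch of the Dagging idea follows: the training data are partitioned into disjoint folds drawn without replacement, one base model is trained per fold, and the predictions are combined. Averaging is used here because streamflow is a regression target; classification Dagging would use majority voting. The base learner is an illustrative assumption.

```python
# Dagging sketch: disjoint samples (no replacement), one base model per fold,
# combined predictions. KNN as base learner is an assumption for illustration.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def dagging_fit(X, y, n_folds=4, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))             # shuffle, then split disjointly
    return [
        KNeighborsRegressor(n_neighbors=5).fit(X[part], y[part])
        for part in np.array_split(idx, n_folds)
    ]

def dagging_predict(models, X):
    # Averaging for regression; majority vote would be used for classes.
    return np.mean([m.predict(X) for m in models], axis=0)
```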

Rotation Forest
Rotation Forest (RF) is an ensemble learning technique that independently trains L decision trees using, for each tree, a different set of extracted features. Suppose x = (x_1, ..., x_n)^T represents an example defined by n features (attributes), and let X be an N × n matrix containing the training examples. We assume that the actual class labels of all training instances are also given. Let D = {D_1, ..., D_L} denote the set of L classifiers and F the feature set. The purpose of Rotation Forest is to create accurate and diverse classifiers. As in Bagging, bootstrap samples are taken as the training sets for the individual classifiers. The key heuristic is to apply feature extraction and then reconstruct a full feature set for each classifier in the ensemble [48]. To do this, the feature set is randomly split into K subsets, principal component analysis (PCA) is run on each subset separately, and a new set of n linearly extracted features is constructed by pooling all principal components. The data are transformed linearly into the new feature space, and classifier D_i is trained on this transformed dataset. Different splits of the feature set lead to different extracted features, thereby adding diversity beyond that introduced by the bootstrap sampling.
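The sketch below illustrates the Rotation Forest construction in a simplified regression setting: the feature set is randomly split into K subsets, PCA is applied per subset, the loadings are assembled into a block rotation matrix, and one tree is trained per bootstrap sample of the rotated data. The tree depth and number of subsets are assumptions, and the sketch omits refinements of the original algorithm such as class-and-instance subsampling before PCA.

```python
# Simplified Rotation Forest sketch (regression variant): per-subset PCA
# loadings form a block rotation matrix; each tree sees differently rotated,
# bootstrap-resampled data. Hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeRegressor

def rotation_forest_fit(X, y, n_trees=10, n_subsets=2, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    forest = []
    for _ in range(n_trees):
        # Randomly split the features and build a block-diagonal rotation
        # from the PCA loadings of each feature subset.
        order = rng.permutation(d)
        R = np.zeros((d, d))
        for block in np.array_split(order, n_subsets):
            pca = PCA(n_components=len(block)).fit(X[:, block])
            R[np.ix_(block, block)] = pca.components_.T
        boot = rng.integers(0, n, size=n)          # bootstrap sample
        tree = DecisionTreeRegressor(max_depth=5).fit(X[boot] @ R, y[boot])
        forest.append((R, tree))
    return forest

def rotation_forest_predict(forest, X):
    return np.mean([t.predict(X @ R) for R, t in forest], axis=0)
```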

Ensemble Forecasting
Ensemble forecasting of the monthly streamflow was performed using the LWL algorithm as the base model, combined with the Additive Regression (AR), Bagging (BG), Dagging (DG), Random Subspace (RS), and Rotation Forest (RF) ensemble techniques. This combination resulted in five ensemble models, namely the LWL-AR, LWL-BG, LWL-DG, LWL-RS, and LWL-RF models. In each model, the ensemble learning technique resamples the training dataset to train the base LWL algorithm. Table 1 summarizes the statistical characteristics of the data used in this study. To build the models, we grouped the initial input parameters into different input combinations built from the lagged streamflow values, where Q_{t-1} is the streamflow at one previous month, Q_{t-2} the streamflow at two previous months, and so on, and MN is the month number of the streamflow. In the cross-validation approach, the data were divided into four equal sets (M1-M4) such that three sets were used for model training and the remaining set was used for validation [49][50][51]. We used several performance metrics to measure the performance of the models during both the training and validation phases: the correlation coefficient (R) (Equation (5)), root mean square error (RMSE) (Equation (6)), mean absolute error (MAE) (Equation (7)), relative absolute error (RAE) (Equation (8)), and root relative squared error (RRSE) (Equation (9)). A full description of these metrics can be found in the corresponding literature [2,24,52-55]:

$$R = \frac{\sum_{i=1}^{N} (T_i - \bar{T})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{N} (T_i - \bar{T})^2 \sum_{i=1}^{N} (P_i - \bar{P})^2}} \quad (5)$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (T_i - P_i)^2} \quad (6)$$

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| T_i - P_i \right| \quad (7)$$

$$RAE = \frac{\sum_{i=1}^{N} \left| T_i - P_i \right|}{\sum_{i=1}^{N} \left| T_i - \bar{T} \right|} \quad (8)$$

$$RRSE = \sqrt{\frac{\sum_{i=1}^{N} (T_i - P_i)^2}{\sum_{i=1}^{N} (T_i - \bar{T})^2}} \quad (9)$$
where P is the predicted value, T is the target value, $\bar{P}$ and $\bar{T}$ are the mean predicted and target values, and N is the number of observations. We developed the models using the open-source Weka software on an HP laptop with an Intel(R) Core(TM) i3-3110M CPU @ 2.40 GHz, 4 GB of RAM, an x64-based processor, and the Microsoft Windows 8.1 operating system. The optimum value for each model parameter was identified via a trial-and-error process in which different values were entered until the best model performance was achieved [36,56]. Table 2 details the optimum parameter setting of each model, and Table 3 presents the training and testing results of the standalone LWL model, with the best performance shown in bold. Incorporating periodicity (i.e., MN) as an additional input variable considerably improved both the training and prediction performance. A comparison between the results obtained from input combinations III and IV revealed a significant performance improvement: RMSE, MAE, RAE, and RRSE decreased by up to 10.12, 14.59, 15.41, and 10.69% in the training phase and by 6.17, 9.41, 8.40, and 9.56% in the testing phase, respectively. In terms of the R metric, the results showed 5.3 and 6.1% training and testing improvements when input combination IV was used. Further, the results revealed that the best and worst predictive (i.e., testing) performance was obtained with datasets M3 and M2, respectively.
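For concreteness, the sketch below builds the lagged input matrix (Q_{t-1}, Q_{t-2}, ..., MN) from a monthly series and implements the five evaluation metrics of Equations (5)–(9) in NumPy. The helper names and the assumption that the series starts in January are illustrative, not taken from the authors' code.

```python
# Sketch of the lagged-input construction and the metrics of Eqs. (5)-(9).
# Helper names and the January start of the series are assumptions.
import numpy as np

def make_inputs(q, n_lags=3, use_month=True):
    """Build rows [Q_{t-1}, ..., Q_{t-n}, MN] from a monthly series q."""
    rows = []
    for t in range(n_lags, len(q)):
        row = [q[t - k] for k in range(1, n_lags + 1)]
        if use_month:
            row.append(t % 12 + 1)  # MN: month number 1..12 (assumes Jan start)
        rows.append(row)
    return np.array(rows), np.array(q[n_lags:])

def metrics(T, P):  # T: target values, P: predicted values
    Tm, Pm = T.mean(), P.mean()
    R = np.sum((T - Tm) * (P - Pm)) / np.sqrt(
        np.sum((T - Tm) ** 2) * np.sum((P - Pm) ** 2))
    RMSE = np.sqrt(np.mean((T - P) ** 2))
    MAE = np.mean(np.abs(T - P))
    RAE = np.sum(np.abs(T - P)) / np.sum(np.abs(T - Tm))
    RRSE = np.sqrt(np.sum((T - P) ** 2) / np.sum((T - Tm) ** 2))
    return R, RMSE, MAE, RAE, RRSE
```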

Results
The results of the five ensemble models, that is, LWL-AR, LWL-BG, LWL-DG, LWL-RS, and LWL-RF, are summarized in Tables 4–8, where the best performance is shown in bold.
A comparison between the results obtained from the single LWL model and its ensembles clearly indicates that the ensemble learning techniques considerably improved the training and testing performance of the base LWL algorithm. In the training phase, the ensemble models outperformed the single LWL model by up to 44.7, 44.7, 47.8, 44.7, and 13.9% in terms of the RMSE, MAE, RAE, RRSE, and R metrics, respectively. In the testing phase, LWL-AR showed 53.3, 54.5, 55, 53.8, and 22.4% improvements in the corresponding metrics. Similarly, the testing performance improvements were 8, 7.1, 8, 9, and 4.5% for LWL-BG; 12.5, 8.2, 8.3, 13.2, and 5.5% for LWL-DG; 3.1, 3.2, 1.9, 4.7, and 4.1% for LWL-RS; and 12.6, 10, 11, 13.8, and 7.3% for LWL-RF.
A comparison of the models' outcomes also reveals that the ensemble LWL-AR model performed better than the other models in both training and testing phases of the monthly streamflow modeling. The LWL-DG and LWL-RF models showed similar performance and ranked as the second-best models, followed by the LWL-RS model that was identified as the least effective ensemble model.
To further compare the models' performance, we used time-variation and scatter plots as well as Taylor and violin diagrams to visualize the results obtained from the best input combination (i.e., M3-IV). Figure 2 shows that the LWL-AR predictions are much closer to the observed values than those of the other models. Figure 3 reveals that the ensemble LWL-AR model performed better than the other models in catching the extreme streamflow values (minimum and maximum), which is an important indicator in water resource management and for the evaluation of extreme events such as droughts and floods. Figure 4 compares the single LWL model with its ensemble models for low streamflow (i.e., lower than 500 m³/s) prediction and clearly demonstrates the superiority of LWL-AR in catching the streamflow minima. Figure 5 shows the scatter plots of the observed and predicted monthly streamflow for the best input combination (i.e., M3-IV). While the single LWL model resulted in a highly scattered prediction with R² = 0.809, the LWL-AR ensemble model produced a fit-line equation (y = 0.9401x + 56.669) close to the exact line (y = x), with the highest R² value (0.867) among the models. Figure 6 shows the Taylor diagram of the models and indicates how well they match the observations in terms of standard deviation and correlation. Among the models, LWL-AR achieved the standard deviation closest to the observed data, with the lowest squared error and highest correlation, followed by the LWL-BG and LWL-DG models. Figure 7 shows the violin plot of the models and indicates that LWL-AR achieved a data distribution most similar to the observed data, followed by the LWL-DG model.

Discussion
In all ensemble models, considering periodicity (i.e., MN) as an additional input variable substantially improved both the training and predictive performance. During the testing phase of the LWL-AR model, the improvements in RMSE, MAE, RAE, and RRSE were up to 13, 17.9, 17.5, and 20.5%, respectively. These findings agree with previous studies: the authors of [57] demonstrated the improved performance of three types of ANN models using the periodicity variable for the prediction of the monthly streamflow of the Canakdere and Goksudere rivers, Turkey. Adnan et al. [58] used the periodicity variable to improve the predictive capability of the FFNN, RBNN, GRNN, and ANFIS models for the prediction of the monthly streamflow of the Gilgit River, Pakistan. In a recent study, Adnan, Zounemat-Kermani, Kuriqi and Kisi [53] achieved improved performance of long short-term memory (LSTM), extreme learning machine (ELM), and random forest (RF) models for the monthly streamflow of the Kohala and Garhi Habibullah stations in Pakistan. They showed that the inclusion of the periodicity component (MN) decreased the RMSE of the optimal LSTM, ELM, and RF models by 11.9%, 6.9%, and 1% for the Garhi Habibullah Station and by 20.8%, 20.5%, and 3.7% for the Kohala Station, respectively.
A comparison of the models' outcomes revealed that the ensemble LWL-AR model performed better than the other models in both training and testing phases of the monthly streamflow modeling. The LWL-DG and LWL-RF models showed similar performance and ranked as the second-best models, followed by the LWL-RS model that was identified as the least effective ensemble model. The results of other modeling studies support our findings that the application of the ensemble learning techniques can considerably improve the capability of the base models for modeling different environmental problems [26,29,47,59]. Overall, our case study demonstrated that the ensemble models successfully outperformed the single LWL model and provided promising accuracy for streamflow forecasting. Due to the non-linear nature of many environmental processes and phenomena (e.g., streamflow), hybrid ensemble models that benefit from the advantages of multiple methods/models can better capture the complexity of these phenomena and often yield more accurate results than single simple models.

Conclusions
This study investigated the capability of five ensemble models, that is, LWL-AR, LWL-BG, LWL-DG, LWL-RS, and LWL-RF, for monthly streamflow forecasting. The results were validated using several performance metrics and compared to those of a single LWL model. Based on the results obtained, we conclude that:

• The ensemble models are predominantly superior to the single LWL model for monthly streamflow forecasting.
• Among the ensemble methods, the LWL-AR model surpasses the other models in both training and testing performance.
• The most accurate models are developed when the periodicity variable (MN, month number) is incorporated into the modeling process.
• Ensemble forecasting is a robust and promising alternative to single-model forecasting of streamflow.
Although the developed ensemble models were verified using a regional-scale dataset from Pakistan, they are sufficiently general to be applied in other regions around the world with minor adjustments of the variables to local conditions. Future research can extend this ensemble forecasting approach by using other ensemble learning techniques (e.g., AdaBoost, MultiBoost, LogitBoost, Decorate) and, perhaps even more interestingly, by testing various state-of-the-art machine learning methods as the base learner. The idea of coupling machine learning methods with ensemble learning techniques to enhance computational performance and predictive accuracy can be extended beyond monthly streamflow forecasting to many other complex geo-hydrology problems. In this study, previous streamflow values and periodicity information were considered as inputs to the ensemble models. In future work, streamflow forecasting that accounts for the flood mitigation capacity of Mangla Dam could be investigated using ensemble models. Furthermore, taking the landforms (digital terrain model) and the dimensions of the river basin into account as inputs may allow the implemented methods to provide more accurate forecasting results.