Machine Learning Application in Reservoir Water Level Forecasting for Sustainable Hydropower Generation Strategy

Abstract: The aim of this study is to accurately forecast changes in the water level of a reservoir located in Malaysia under two different scenarios: Scenario 1 (SC1) uses rainfall and water level as inputs, and Scenario 2 (SC2) uses rainfall, water level, and sent out. Different time horizons (one to seven days ahead) are investigated to check the accuracy of the proposed models. Four supervised machine learning algorithms are proposed for both scenarios: Boosted Decision Tree Regression (BDTR), Decision Forest Regression (DFR), Bayesian Linear Regression (BLR), and Neural Network Regression (NNR). Eighty percent of the total data were used for training and 20% for testing. The models' performance is evaluated using five statistical indexes: the coefficient of determination (R²), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE), and Relative Squared Error (RSE). The findings showed that, among the four proposed models, BLR outperformed the other models for SC1 with R² = 0.998952 (1-day ahead), and BDTR outperformed the other models for SC2 with R² = 0.99992 (1-day ahead). For the uncertainty analysis, 95PPU and d-factors were adopted to measure the uncertainties of the best models (BLR and BDTR). The results showed that the 95PPU values for both models in both scenarios (SC1 and SC2) fall in the range of 80% to 100%. As for the d-factor, all values in SC1 and SC2 fall below one.


Introduction
A reservoir is a physical structure (artificial or natural) used to store water for the preservation, control, and regulation of water supply [1]. Climate change is likely to intensify parts of a reservoir's water cycle, as rising global temperatures increase the rate of evaporation around the world. More evaporation causes more precipitation on average, which leads to rising water levels in reservoirs. When a reservoir is already at its top water level, excessive inflow raises the water level further and increases the rate of discharge over the spillway. This event can cause flooding downstream. Flooding is one of the extreme events that has a major impact on reservoirs, especially at unregulated sites, where it can directly or indirectly cause severe losses of houses, facilities, assets, and lives [2]. The water level in every water body varies to some extent, and intake facilities should be designed with the anticipated variations in mind. During the drought season, the water level in reservoirs will be low and abstraction might not be permitted. Even when abstraction is permitted, the water level may still not be high enough to drive the desired flow into the intake. A weir or torrent may be required to sustain an adequate water level. One major problem in current dam impact studies is the lack of a reliable model for simulating the implications of water level for reservoir operation. This reduces the efficiency of reservoir operation, as some water has to be released through the spillway to keep the water level below the full supply level. Two approaches can be applied to forecast reservoir water levels: building the model on the natural factors that drive the water level, or using historical water levels, which already reflect those factors, to anticipate future levels [3].
Essentially, utilizing historical water level information can considerably reduce the inconsistency between factors in a model. In addition, prior knowledge of the water level benefits the operator in deciding the optimal draw of the water level and in preparing a sustainable plan for hydropower generation. Therefore, this paper focuses on forecasting changes in water level using Machine Learning (ML) algorithms.
The term ML refers to enabling machines to learn without being explicitly programmed. In particular, the performance of ML in intelligence tasks is largely due to its ability to discover complex structure that has not been defined beforehand [4]. There are four general ML methods [5,6]: (1) supervised, (2) unsupervised, (3) semi-supervised, and (4) reinforcement learning. The difference between supervised and unsupervised learning is that supervised learning already has expert knowledge in the form of known input/output pairs [7]. Unsupervised learning, by contrast, has only the input, and the model learns the hidden structure or data distribution to produce outputs such as clusters or features [8][9][10]. The aim of ML is to allow machines to predict, cluster, extract association rules, or make decisions from a given dataset [11]. Over the past years, numerous techniques have been employed in forecasting hydrological events. Previously, tools for forecasting reservoir water level used the conventional approach of linear mathematical relationships based on operator experience, mathematical curves, and guidelines [12]. However, system complexity, limited access to data, overestimated parameters, and a high proportion of missing values lead to poor performance and undermine numerical models. In response to these issues, ML approaches have been introduced, as they give better results in modelling nonlinear processes and in forecasting than traditional models such as moving average methods [13][14][15]. The core advantage of this kind of modelling is its ability to map input-output patterns without prior expert knowledge of the factors affecting the forecast parameters [16,17]. Hence, researchers have since applied ML with a variety of modelling approaches and parameters to improve the accuracy and reliability of predictive models.
Numerous researchers have used ML to forecast water level, with methods such as Artificial Neural Networks (ANN) [18,19], Support Vector Machines (SVM) [17,19,20], Adaptive Neuro-Fuzzy Inference Systems (ANFIS) [17][18][19][20], Radial Basis Neural Networks (RBNN) [20], Generalized Regression Neural Networks (GRNN) [20], Radial Basis Function-Firefly Algorithm (RBF-FFA) [21], and Cuckoo Optimization Algorithms [18]. Furthermore, a few researchers have used hybrid and improved models such as Wavelet-Based Artificial Neural Network (WANN) and Wavelet-Based Adaptive Neuro-Fuzzy Inference System (WANFIS) [22] and Least Squares Support Vector Machine (LSSVM) [23]. Damien [24] stated that all ML approaches are within the same range, as they all showed decent prediction across different scenarios and levels of data quality. However, the downside of ML techniques such as ANN and ANFIS is that their outcomes vary with the complexity of the modeled system. Therefore, a new ML approach is introduced in this study for forecasting the water level, namely Boosted Decision Tree Regression (BDTR). BDTR has become more popular because of the simplicity of the model. In spite of this simplicity, it can exhibit good predictive performance and tends to improve accuracy with a minor risk of less coverage. The major aim of the study is to provide the reservoir operator with an accurate water level forecasting tool and to investigate different modelling approaches, namely BDTR, Decision Forest Regression (DFR), Bayesian Linear Regression (BLR), and Neural Network Regression (NNR), together with the effectiveness of these algorithms in learning the pattern of water level. Secondly, the study aims to assess and evaluate different scenarios and time horizons to find the most accurate and reliable model.
Thirdly, this study is relevant to industrial activities and progress: better forecasting accuracy for water level would help generate better operation policies for the reservoir, which leads to better conditions for water users, including agriculture, industry, and hydropower generation. As a result, accurate water level forecasting will support better planning for all related industrial and business activities. In a nutshell, the motivation of this study is that the water level in the reservoir is an important variable that the decision-maker or operator of the reservoir needs to know in order to optimize water resources planning.

Study Area and Dataset
Sultan Mahmud Power Station, or Kenyir Dam, is the hydroelectric dam on Kenyir Lake, Terengganu, Malaysia. It is situated 50 km southwest of Kuala Terengganu (latitude: 5°1′25.23″ N, longitude: 102°54′35.88″ E) and has a catchment area of 1260 sq km. The Kenyir Reservoir was formed by the construction of the Kenyir Dam to sustain Sungai Terengganu flows throughout the year for the production of hydroelectric power. Figures 1 and 2 illustrate the Kenyir Dam location and station layout.
Data for this study were secondary data acquired from the Kenyir Operation Unit: a total of 12,531 (34 years) historical data points for the daily water level and daily rainfall from April 1985 to July 2019, and 82,057 (9 years) hourly sent out data points from March 2010 to July 2019. For reservoir water level, the range was based on the hydraulic features of Kenyir Dam: the minimum operating level was 120 m, the full supply level (FSL) was 145 m, and the maximum water level was 153 m. The basic statistical parameters of the inputs, namely minimum, maximum, average, standard deviation (S.D.), and total count, are presented in Table 1.

Input Selection
In Machine Learning (ML), one of the main tasks is choosing the input parameters that influence the output parameters, which requires attention and a good understanding of the underlying physical process, based on causal variables and statistical analysis of possible inputs and outputs [10]. Reservoir water level is basically affected by hydrological phenomena such as rainfall, heat and temperature, evaporation, and discharge, as well as by the decision to send out water to the river downstream. Rising water levels can cause flooding downstream, so forecasting the water level is important for controlling water release during the dry and wet seasons.
There are two different input sets for forecasting water level: Scenario 1 (SC1) in Equation (1) and Scenario 2 (SC2) in Equation (2).
where R_t is the rainfall and WL_t is the water level at time t, and WL_t+n is the water level n days ahead, with n running from 1 (tomorrow) to 7 (the seventh day ahead), which gives an insight into the best forecasting scenarios. S_t is sent out, the additional input for Scenario 2. SC1 uses the data from 1985 to 2019 and SC2 the data from 2010 to 2019.
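The pairing of inputs at day t with the water level n days ahead can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the function name `make_samples` and the toy series are hypothetical, and it assumes that all series are daily and aligned by index.

```python
def make_samples(rain, level, horizon, sent_out=None):
    """Pair the inputs at day t with the water level at day t + horizon.

    SC1 uses (R_t, WL_t); passing sent_out adds S_t, which gives SC2.
    All series are assumed to be daily and aligned by index.
    """
    if len(rain) != len(level):
        raise ValueError("rain and level must have the same length")
    if sent_out is not None and len(sent_out) != len(level):
        raise ValueError("sent_out must have the same length as level")
    inputs, targets = [], []
    for t in range(len(level) - horizon):
        row = [rain[t], level[t]]
        if sent_out is not None:
            row.append(sent_out[t])
        inputs.append(row)
        targets.append(level[t + horizon])
    return inputs, targets


# Toy series (hypothetical values, not the Kenyir data).
rain = [0.0, 5.0, 2.0, 0.0, 1.0]
level = [140.0, 140.2, 140.3, 140.1, 140.0]
sent = [10.0, 12.0, 11.0, 9.0, 10.0]

X_sc1, y_sc1 = make_samples(rain, level, horizon=1)
X_sc2, y_sc2 = make_samples(rain, level, horizon=3, sent_out=sent)
```

Each forecast horizon n from 1 to 7 yields its own set of samples, and the last n days of the record have no target and are dropped.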

Data Partitioning
Data can be partitioned in several ways in an ML experiment. A training/test partition and cross validation are the most popular approaches [25,26]. For forecast modeling, the training dataset is used by the ML algorithm to learn the pattern of the input and form a model; the test partition is then used to measure the prediction precision of the model [27]. Each partition should be representative of the data to avoid bias. The training set should be larger than the test set, as the model has to learn the data pattern before it is tested. Researchers have chosen training rates between 80% and 95% [10,21,28]. Thus, the datasets were randomly split into a training set containing 80% of the total data and a test set containing the remaining 20%.
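A random 80/20 partition of this kind can be sketched with the Python standard library. The helper name `split_train_test` and the fixed seed are assumptions made for reproducibility of the illustration; the study does not state how its random split was seeded.

```python
import random


def split_train_test(samples, train_fraction=0.8, seed=0):
    """Randomly partition samples into a training set and a test set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


samples = list(range(100))  # stand-in for (input, target) pairs
train, test = split_train_test(samples)
```

For time series, a chronological split is a common alternative; the random split is shown here because it is the one described in the text.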

Models Used for Forecasting
The forecasting uses four different ML algorithms: Boosted Decision Tree Regression (BDTR), Decision Forest Regression (DFR), Neural Network Regression (NNR), and Bayesian Linear Regression (BLR).
BDTR is one of several classic methods for creating an ensemble of regression trees in which each tree depends on the prior trees [29]. Put simply, it is an ensemble learning method in which the errors of the first tree are corrected by the second tree, the third tree corrects the errors of the first and second trees, and so on. Predictions are made by the whole set of trees together. BDTR shows an exceptionally good ability to deal with tabular data [30]. Its advantages are that it is robust to missing data and normally assigns feature importance scores. BDTR usually performs slightly better than DFR and appears to be the method of choice in Kaggle competitions [31]. Unlike DFR, BDTR may be more prone to overfitting because its main purpose is to reduce bias rather than variance. BDTR also takes longer to develop, since it has many hyperparameters to tune and the trees are built sequentially [32]. Figure 3 below shows the distribution of BDTR, where the trees are generally shallow, with three parameters: number of trees, depth of trees, and learning rate.
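The sequential error correction described for BDTR can be illustrated with a small gradient boosting sketch for squared error. It is not the implementation used in the study: the trees are restricted to depth one (stumps) for brevity, and the function names and toy data are hypothetical.

```python
def fit_stump(X, residuals):
    """Least-squares regression stump: one feature, one threshold."""
    mean = sum(residuals) / len(residuals)
    # Fallback when no split exists: every row goes left and gets the mean.
    best = (float("inf"), 0, float("inf"), mean, mean)
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X))[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def stump_predict(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value


def fit_boosted(X, y, n_trees=50, learning_rate=0.1):
    """Each new stump is fitted to the residuals left by the previous ones."""
    base = sum(y) / len(y)
    current = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - ci for yi, ci in zip(y, current)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        current = [ci + learning_rate * stump_predict(stump, row)
                   for ci, row in zip(current, X)]
    return base, learning_rate, stumps


def boosted_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Toy step-shaped data (hypothetical).
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
model = fit_boosted(X, y, n_trees=50, learning_rate=0.1)
```

The three parameters named in the text appear here as `n_trees`, the fixed depth of one, and `learning_rate`; a smaller learning rate needs more trees to reach the same training error.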

A DFR is an ensemble of randomly trained decision trees [33]. It works by constructing a large number of decision trees at training time and outputting the mode of the classes (classification) or the mean forecast (regression) of the individual trees as the end product. Referring to Figure 4, each tree is developed using a random subset of features and a random subset of the data, which diversifies the trees by showing them different datasets. It has two parameters: the number of trees and the number of features to be selected at each node. DFR is good at handling uneven datasets with missing variables, and it is generally robust to overfitting. It also has lower classification error and better f-scores than decision trees, but its results are not easy to interpret [32]. Another disadvantage is that the feature importance may not be robust enough to deal with the variety within the prepared dataset. Figure 4 below shows the distribution of DFR.
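The two sources of randomness in a decision forest, a bootstrap sample of the data and a random subset of features, can be sketched as follows. This is an illustrative simplification, not the study's implementation: each tree is a single-node stump, so the per-node feature selection happens once per tree, and all names and data are hypothetical.

```python
import random


def fit_stump(X, y, features):
    """Least-squares stump restricted to the given feature indices."""
    mean = sum(y) / len(y)
    # Fallback when no split exists: every row goes left and gets the mean.
    best = (float("inf"), features[0], float("inf"), mean, mean)
    for j in features:
        for thr in sorted(set(row[j] for row in X))[:-1]:
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]           # sample with replacement
        feats = rng.sample(range(len(X[0])), n_features)      # random feature subset
        trees.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return trees


def forest_predict(trees, row):
    """Regression output is the mean of the individual tree forecasts."""
    forecasts = [lm if row[j] <= thr else rm for j, thr, lm, rm in trees]
    return sum(forecasts) / len(forecasts)


# Toy step-shaped data (hypothetical).
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
trees = fit_forest(X, y, n_trees=25, n_features=1)
```

Because the trees are trained independently, averaging them mainly reduces variance, which is the contrast with boosting drawn in the text.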
NNR is a chain of linear operations interspersed with various nonlinear activation functions [34]. In general, the network has these defaults: the first layer is the input layer, the last layer is the output layer, whose number of nodes should equal the number of classes, and a hidden layer lies in between [24]. A neural network (NN) model is defined by the structure of its graph, which includes the number of hidden layers, the number of nodes in each hidden layer, how the layers are connected, which activation function is used, and the weights on the graph edges. Although NNs are widely known for their use in deep learning and in modeling complex problems such as image recognition, they are easily adapted to regression problems. Any class of statistical models can be termed an NN if it uses adaptive weights and can approximate non-linear functions of its inputs. Thus, NNR is suited to problems where a more traditional regression model cannot fit a solution. Figure 5 below shows the architecture modelling system of NNR.
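The description of a network as linear operations interleaved with nonlinear activations can be made concrete with a forward pass through one hidden layer. The sketch assumes a tanh activation and a single linear output node; the weights are hypothetical placeholders, not trained values from the study.

```python
import math


def dense(x, weights, biases):
    """One linear operation: a weighted sum plus bias for every node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def nn_regress(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> tanh hidden layer -> single linear output node."""
    hidden = [math.tanh(v) for v in dense(x, w_hidden, b_hidden)]
    return dense(hidden, w_out, b_out)[0]


# Hypothetical weights for two inputs (for example R_t and WL_t) and two hidden nodes.
w_hidden = [[0.5, -0.2], [0.1, 0.3]]
b_hidden = [0.0, 0.1]
w_out = [[1.5, -0.7]]
b_out = [0.2]
forecast = nn_regress([0.4, 0.9], w_hidden, b_hidden, w_out, b_out)
```

Training would adjust the weights to reduce the forecast error; without the tanh step the two layers would collapse into a single linear regression.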
In contrast to classical linear regression, the Bayesian approach uses Bayesian inference [35]. Prior parameter information is combined with a likelihood function to generate parameter estimates. The forecast distribution evaluates the likelihood of a value y given x for a particular w, weighted by the current belief about w given the data (y, X); finally, all possible values of w are summed over [35]. BLR copes with insufficient or poorly distributed data in a fairly natural way. The major advantage is that Bayesian processing recovers the whole range of inferential solutions, rather than a point estimate and a confidence interval as in classical regression. Figure 6 below shows the architecture modelling system of BLR.
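The combination of prior and likelihood in BLR has a closed form in the simplest conjugate case, sketched below for one input and no intercept. The zero-mean Gaussian prior with precision `alpha` and the known noise precision `beta` are assumptions of this illustration, and the function names are hypothetical; the study does not report its prior settings here.

```python
import math


def blr_posterior(xs, ys, alpha=1.0, beta=25.0):
    """Posterior mean and variance of the weight w in y = w * x + noise."""
    precision = alpha + beta * sum(x * x for x in xs)   # prior plus data precision
    mean = beta * sum(x * y for x, y in zip(xs, ys)) / precision
    return mean, 1.0 / precision


def blr_predict(x_new, w_mean, w_var, beta=25.0):
    """Predictive mean and standard deviation after integrating over w."""
    pred_mean = w_mean * x_new
    pred_var = 1.0 / beta + x_new * x_new * w_var       # noise plus weight uncertainty
    return pred_mean, math.sqrt(pred_var)


# Toy data (hypothetical).
w_mean, w_var = blr_posterior([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
y_mean, y_std = blr_predict(4.0, w_mean, w_var)
```

The forecast comes with a standard deviation that grows when the weight is poorly determined, which is the sense in which the method returns a distribution rather than a point estimate.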
To summarize, the proposed methods were chosen because the water level process is difficult to mimic with traditional modelling methods. This is because water level behavior is affected by various stochastic natural processes, such as the reservoir inflow pattern from the upstream river and evaporation from the reservoir surface area, which in turn is affected by stochastic changes in temperature, relative humidity, etc.

Best Models Performance Evaluation
Model performance evaluation measures how successfully the scores (forecasts) produced by a trained model mimic the real values of the output parameter. The following indexes were used:
i. Mean absolute error, MAE [36], reflects the degree of absolute error between the actual and forecasted data.
ii. Root mean square error, RMSE [36], is the square root of the mean squared difference between the actual and forecasted data.
iii. Relative absolute error, RAE [37], is the relative absolute difference between actual and forecasted values.
iv. Relative squared error, RSE [37], similarly normalizes the total squared error of the forecasted values.
v. Coefficient of determination, R² [38], shows the performance of the forecasting model, where 0 means the model is random while 1 means there is a perfect fit.
In a nutshell, the forecasting performance is better when R² is close to 1, whereas for RMSE and MAE the model performs better when the value is close to 0 [39].
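The five indexes can be computed as in the sketch below. It uses the common textbook definitions, with RAE and RSE normalized by the deviations of the actual data from their mean and R² taken as 1 minus RSE; these are assumed here and may differ in detail from the formulas used in the study. The function name and toy values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """MAE, RMSE, RAE, RSE and R2 for paired actual and forecasted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    abs_dev = sum(abs(a - mean_actual) for a in actual)   # error of a mean-only forecast
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sq_err / sq_dev
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / abs_dev,
        "RSE": rse,
        "R2": 1.0 - rse,
    }


# Toy values (hypothetical).
scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

MAE and RMSE are in the units of the water level (metres), while RAE, RSE, and R² are dimensionless, which makes them easier to compare across scenarios.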

Sensitivity Analysis (SA)
Sensitivity analysis (SA) is a crucial element of decision-making: it shows how the output of the decision-making process changes when the input is varied [40]. The approach chosen here performs the SA using ML with the BDTR algorithm. Table 2 shows the forecasting performance, in terms of the coefficient of determination R², of each input in forecasting the output, WL_t+1. A value close to 1 indicates better model performance.

Uncertainty Analysis (UA)
UA attempts to measure the output variance due to input variability. It is carried out to define the range of possible outcomes based on the input uncertainty and to examine the impact of the model's lack of knowledge or errors. Consideration is given to the percentage of measured data bracketed by the 95% prediction uncertainty (95PPU) band determined by [41]. This band is calculated at the 2.5% (X_L) and 97.5% (X_U) levels of an output variable, which disallows 5% of the very bad simulations.
where k represents the total number of actual data points in the test phase. Based on Equation (8), the value of "Bracketed by 95PPU" is greatest (100%) when all measured data in the testing stage lie between X_L and X_U. If the measured data are of good quality, 80% or more of them should fall within the 95PPU band. In regions where data are poor, 50% of the data within the 95PPU band would be acceptable [42]. The d-factor is used to estimate the average width of the uncertainty band, and a value of less than 1 is desirable [42], as presented in Equation (9), where σ_x represents the standard deviation of the actual data x and d_x is the average distance between the upper and lower bands [43], as in Equation (10).
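The two uncertainty measures can be sketched as follows, taking the lower and upper bounds X_L and X_U as given for each test point. The function names and toy values are hypothetical, and the use of the sample standard deviation for σ_x is an assumption of this illustration.

```python
import statistics


def bracketed_by_95ppu(actual, lower, upper):
    """Percentage of measured values that lie between X_L and X_U."""
    inside = sum(1 for x, lo, hi in zip(actual, lower, upper) if lo <= x <= hi)
    return 100.0 * inside / len(actual)


def d_factor(actual, lower, upper):
    """Average band width d_x divided by the standard deviation of the data."""
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(actual)
    return mean_width / statistics.stdev(actual)


# Toy values (hypothetical): the third observation falls below its band.
actual = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 1.5, 3.5, 3.5]
upper = [1.5, 2.5, 4.5, 4.5]
coverage = bracketed_by_95ppu(actual, lower, upper)
width = d_factor(actual, lower, upper)
```

A band that brackets most observations while keeping the d-factor below 1 is narrow relative to the natural spread of the data, which is the criterion applied in the text.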

Results and Discussion
This study aimed to forecast the water level from one to seven days ahead, matching the actual values as closely as possible, using four ML algorithms (BDTR, DFR, NNR, and BLR) for both SC1 and SC2; the performance of each model was assessed over the seven lead times. Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE), Relative Squared Error (RSE), and the Coefficient of Determination (R²) were the indices used to validate the performance of each model. All models were then optimized to improve accuracy by tuning the hyperparameters and learning rate of each model. The detailed results are presented in the following sections.
Tables 3 and 4 present the one-day-ahead forecasting of the water level: column 2 gives the results of the trained model and column 3 gives the results after hyperparameter tuning, which searches across different hyperparameter configurations for the one that yields the best performance. As summarized in Tables 3 and 4 for the study location at Kenyir Dam, the average correlation coefficient was 0.99 and 0.95 for training and hyperparameter tuning, respectively.
For SC1, all models perform well in forecasting the water level. Among the four models, BLR outperformed the others for both training and tuning. Comparing lead times for BLR, the model clearly performs better in predicting WL+1 than WL+7 in terms of R². The RMSE and MAE of BLR for WL+1 (RMSE = 0.090613, MAE = 0.044009) are also much lower than those of the other models; the smaller these error values, the better the performance of the model.
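The five indices named above can be computed as in the following sketch. It uses the standard textbook definitions, with R² taken as 1 minus RSE; whether the study's software applies exactly these forms is an assumption, and the function name and sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, RAE, RSE and R2 for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n

    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Errors of the naive predictor that always outputs the mean of the actual data.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)

    rse = sq_err / sq_dev
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / abs_dev,
        "RSE": rse,
        "R2": 1.0 - rse,
    }


# Hypothetical values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

MAE and RMSE are expressed in the units of the water level, whereas RAE and RSE are relative to a mean-only predictor, so values well below 1 indicate that the model adds information beyond the average.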

Models Performance and Optimization for SC1 and SC2
For SC2 in Table 4, R² shows that performance decreases as the lead time extends beyond one day ahead. This happens not only in one ML algorithm; the coefficients of determination for all models (DFR, NNR, BLR, and BDTR) follow the same pattern. Hence, the water level for the first day (WL+1) is forecast most accurately by all the ML algorithms, because an R² value closer to 1 indicates a better fit. In addition, the BDTR model gives the best performance in forecasting water level compared with the other ML algorithms. In terms of R², BDTR has the highest values for training (0.999368, 0.997656, 0.995418, 0.993893, 0.99146, 0.990325, 0.988958), and hyperparameter tuning improved the model accuracy further (0.99992, 0.999774, 0.999518, 0.999279, 0.998875, 0.998596, 0.998363).
Referring to Tables 3 and 4, MAE, RMSE, RAE, and RSE increase as the lead time extends beyond one day ahead. Every ML algorithm shows the same increasing pattern, and since lower error values indicate better performance, WL+1 is forecast most accurately. The ranking of the models from highest to lowest performance in SC1 is BLR, BDTR, DFR, and NNR. Meanwhile, for SC2, the ranking is BDTR, DFR, BLR, and NNR. The results of each model for the time horizons between two and six days ahead are given in Appendices A and B. Figures 8 and 9 show scatter plots for the best models. For both scenarios, the points for WL+1 lie closer together than those for the subsequent lead times up to WL+7. The scatter plots of each model for the remaining time horizons are given in Appendices C and D. It can be seen clearly that in scenario one BLR has outstanding performance in predicting the water level in the reservoir, with a high level of precision for both one day and seven days ahead. The same performance was observed for scenario two when BDTR was used. These results indicate that these two models can be used to predict the expected changes in water level up to one week ahead.
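The lead times WL+1 to WL+7 discussed above can be framed as shifted copies of the water level series. The sketch below shows one possible way to pair SC1-style inputs at day t with the water level a given number of days later. The function name, the choice of using only same-day rainfall and water level as inputs, and the sample values are assumptions for illustration and are not the study's documented preprocessing.

```python
def make_lead_time_pairs(water_level, rainfall, lead):
    """Pair (rainfall, water level) at day t with the water level at day t + lead."""
    features, targets = [], []
    # The last `lead` days have no future value to use as a target.
    for t in range(len(water_level) - lead):
        features.append((rainfall[t], water_level[t]))
        targets.append(water_level[t + lead])
    return features, targets


# Hypothetical daily series, for illustration only.
water_level = [1.0, 2.0, 3.0, 4.0, 5.0]
rainfall = [0.0, 1.0, 0.0, 2.0, 0.0]

features, targets = make_lead_time_pairs(water_level, rainfall, lead=2)
print(features, targets)
```

Under this framing, each longer lead time leaves the inputs unchanged but asks the model to bridge a larger gap, which is consistent with the errors growing from WL+1 to WL+7.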

Uncertainty Analysis of Best Model of SC1 and SC2
UA was carried out for the best model of each scenario (BLR for SC1 and BDTR for SC2) using the 95PPU and the d-factor. Table 5 presents the uncertainty analysis results for the one-day-ahead and seven-day-ahead water level forecasts. The uncertainty analysis revealed that the proposed models exhibited a high level of accuracy in predicting the water level: the 95PPU exceeded 80% for all time horizons, and the d-factor was at an acceptable level, falling below 1. Nevertheless, further analysis is needed before adopting these models in another study area, which can be achieved by modifying the architecture of the proposed model, since each study area has its own water level pattern.

Taylor Diagram for Best Performance in Each Scenario
Figure 10 shows the Taylor diagrams for the one-day-ahead water level. Taylor diagrams facilitate the comparative assessment of different models. They quantify the degree of correspondence between the modeled and observed behavior in terms of three statistics: the Pearson correlation coefficient, the root-mean-square error (RMSE), and the standard deviation. Figure 10a shows the four models used in SC1, where M1 represents BLR, M2 represents NNR, M3 represents BDTR, and M4 represents DFR. Meanwhile, Figure 10b shows the four models used in SC2, where M1 represents DFR, M2 represents NNR, M3 represents BLR, and M4 represents BDTR. From the diagrams, we can conclude that the model whose predictions correspond most closely to the actual data is BLR in SC1 and BDTR in SC2.
Based on the correlation and standard deviation between the actual and predicted water levels, it can be seen that the developed model not only predicts the changes in water level accurately over the entire dataset, but also produces predictions whose standard deviation is close to that of the actual data. This indicates that the proposed model is capable of mimicking the variation in the dataset.
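The three statistics summarized by a Taylor diagram can be computed as in the sketch below. It follows the usual formulation, in which the RMS term is the centered RMS difference (means removed) and is linked to the other two statistics by E'² = σ_p² + σ_o² − 2σ_pσ_oR. The function name and sample values are hypothetical, and population standard deviations are assumed.

```python
import math


def taylor_statistics(observed, predicted):
    """Correlation, standard deviations and centered RMS difference."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Population standard deviations (divide by n), as assumed in the lead-in.
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)

    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    corr = cov / (std_o * std_p)

    # Centered RMS difference: RMS error after removing each series' mean.
    crmsd = math.sqrt(
        sum(((p - mean_p) - (o - mean_o)) ** 2 for o, p in zip(observed, predicted)) / n
    )
    return {"corr": corr, "std_obs": std_o, "std_pred": std_p, "crmsd": crmsd}


# Hypothetical observed and predicted water levels, for illustration only.
observed = [10.0, 10.4, 10.9, 11.3, 10.7]
predicted = [10.1, 10.3, 11.0, 11.2, 10.8]
stats = taylor_statistics(observed, predicted)
print(stats)
```

A model plots close to the reference point on the diagram when its correlation is near 1 and its standard deviation is close to that of the observations, which together force the centered RMS difference toward zero.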

Accuracy Improvement
To compare scenario one with scenario two, the percentage of accuracy improvement was calculated from R²SC1, the coefficient of determination given by SC1, and R²SC2, the corresponding coefficient of determination given by the proposed SC2. The comparison uses the R² of the best model, BDTR, in both scenarios. Table 6 shows that SC2 gives better results than SC1, since the accuracy improvement is positive. The accuracy improvement also grows as the lead time increases over the seven days. It can be concluded that introducing scenario two brings a noticeable improvement in predicting the changes in the water level of the reservoir. In addition, the accuracy improvement percentage shows that the model performs better at every lead time, with the highest improvement achieved when the model is used to predict the water level one week ahead.
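A common way to express such a percentage improvement is the relative change of R² from SC1 to SC2. The sketch below assumes that form, (R²SC2 − R²SC1) / R²SC1 × 100; the exact expression used in the study is not reproduced here, and the function name and input values are hypothetical.

```python
def accuracy_improvement(r2_sc1, r2_sc2):
    """Percentage change in R2 when moving from SC1 to SC2 (assumed relative form)."""
    return (r2_sc2 - r2_sc1) / r2_sc1 * 100.0


# Hypothetical R2 values, for illustration only (not taken from Table 6).
ai = accuracy_improvement(0.95, 0.98)
print(f"Accuracy improvement: {ai:.2f}%")
```

With this definition a positive value means SC2 attains the higher R², which matches the interpretation given in the text.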

Conclusions
This study investigated four Machine Learning techniques, namely Boosted Decision Tree Regression (BDTR), Decision Forest Regression (DFR), Neural Network Regression (NNR), and Bayesian Linear Regression (BLR), to identify the most accurate model for water level prediction based on daily measured historical data from 1985 to 2019. Time horizons from one day ahead to seven days ahead were examined. The results showed that all the models performed well and can mimic the actual values; however, the BLR model outperformed all other models for SC1, with a coefficient of determination close to 1 for forecasting water level. Meanwhile, BDTR outperformed all other models for SC2. The R² of the best model for SC1 is 0.994992, 0.9917, 0.990856, 0.984186, 0.97562, 0.967717, while for SC2 it is 0.999368, 0.997656, 0.995418, 0.993893, 0.99146, 0.990325, and 0.988958 for each lead time, respectively. The uncertainty analysis of the proposed models, using the 95PPU and the d-factor, shows that BLR (for SC1) and BDTR (for SC2) have a satisfactory degree of uncertainty. Although only a few inputs were introduced to the proposed models, two input parameters (water level and rainfall) in SC1 and three input parameters (water level, rainfall, and sent out) in SC2, the models predicted the changes in water level with a high level of accuracy. The proposed models can be used as a tool for operating reservoirs in Malaysia efficiently and could be an effective method for water decision makers, relying on new technologies such as Artificial Intelligence. However, further study is needed that includes more input parameters, such as the change in rainfall due to climate change, in order to improve the accuracy of the model and to investigate the impact of projected rainfall on the water level of the reservoir.