Daily Water Level Prediction of Zrebar Lake (Iran): A Comparison between M5P, Random Forest, Random Tree and Reduced Error Pruning Trees Algorithms

Zrebar Lake is one of the largest freshwater lakes in Iran and plays an important role in the regional ecosystem; its desiccation has a negative impact on the surrounding ecosystem. The lake also provides an attractive recreational setting for ecotourism. The prediction and forecasting of the water level of the lake through simple but practical methods can provide a reliable tool for future lake water resource management. In the present study, we predict the daily water level of Zrebar Lake in Iran through well-known decision tree-based algorithms, including the M5 pruned (M5P), random forest (RF), random tree (RT) and reduced error pruning tree (REPT). We used five different water level input combinations to find the most effective one. For our modeling, we chose 70% of the dataset for training (from 2011 to 2015) and 30% for model evaluation (from 2015 to 2017). We evaluated the models' performances using different quantitative (root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R²), percent bias (PBIAS) and ratio of the root mean square error to the standard deviation of measured data (RSR)) and visual frameworks (Taylor diagram and box plot). Our results showed that water level with a one-day lag time had the highest effect on the result and that this effect decreased as the lag time increased. The results indicated that all the developed models had a good prediction capability, but the M5P model outperformed the others, followed by RF and RT equally and then REPT. Our results showed that these algorithms can predict water level accurately with only the one-day-lagged water level as an input and that they are cost-effective tools for future predictions.

Location of the study area [53].

Data Assemblage and Preparation
One of the challenges in modeling nonlinear hydrological processes is choosing the most important variables from all possible input variables [54]. Input selection is critical for learning systems, particularly during the identification process, when the dataset is big and the number of variables is large [55]. The primary objective of data assemblage and preparation is to select the appropriate input variables based upon the available data. In the model, the combination of various input variables, also known as feature selection (variable subset selection), is a process of selecting the optimum subset of inputs according to established governing principles [56]. The tweaking of models through such selection is done to increase model accuracy and efficiency (to reduce calibration time).
In the current study, we applied different input variable combinations to address issues emerging during the modeling process. As inflow to Zrebar Lake comes primarily from subaqueous springs, we chose antecedent water level as the input variable. In this context, we used various combinations of water level (WL) at different lag times (i.e., WL(t-1) to WL(t-5)). For the purpose of this study, we calibrated the features using observed daily water level data across 6 years from September 2011 to September 2017. We initially defined five scenarios, starting with WL for the previous day (WL(t-1)), and then continued with combinations for the previous 2, 3, 4 and 5 days (Table 1). All these input scenarios were applied to develop a model (i.e., model training) to predict lake water level as an output. We then calculated the prediction accuracy for each scenario. Generally, the best input set was the combination of WL(t-1), WL(t-2), WL(t-3), WL(t-4) and WL(t-5), with WL(t) as the output variable selected for model training. Next, we developed algorithms that were applied to predict WL(t) using the testing dataset.
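The five lag scenarios described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `make_lagged_scenarios` and the sample water levels are hypothetical and are not taken from the study.

```python
def make_lagged_scenarios(levels, max_lag=5):
    """Build (inputs, target) pairs for scenarios 1..max_lag.

    Scenario k uses WL(t-1), ..., WL(t-k) as inputs and WL(t) as the target.
    """
    scenarios = {}
    for k in range(1, max_lag + 1):
        rows = []
        # Start at max_lag so that every scenario covers the same target days.
        for t in range(max_lag, len(levels)):
            inputs = [levels[t - lag] for lag in range(1, k + 1)]
            rows.append((inputs, levels[t]))
        scenarios[k] = rows
    return scenarios


# Hypothetical daily water levels (m), for illustration only.
levels = [1283.10, 1283.12, 1283.15, 1283.14, 1283.18, 1283.21, 1283.25]
scenarios = make_lagged_scenarios(levels)
first_inputs, first_target = scenarios[2][0]
print(first_inputs, first_target)
```

Starting every scenario at the same day keeps the target values identical across scenarios, so that differences in accuracy can be attributed to the inputs rather than to the number of samples.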
For the purpose of modeling, we used 70% of the dataset for training (from 2011 to 2015) and 30% for testing (from 2015 to 2017) [57]. While there is no universally applied guide for data division, the training and the testing datasets have to carry similar statistical properties and 70:30 is the most commonly used ratio [27,58-62].
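A chronological 70:30 split of this kind, together with a quick comparison of the statistical properties of the two subsets, might look as follows. This is a minimal sketch: the helper names `chronological_split` and `describe` and the synthetic series are assumptions made for illustration.

```python
import statistics


def chronological_split(samples, train_fraction=0.7):
    """Split time-ordered samples into training and testing parts, no shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(round(len(samples) * train_fraction))
    return samples[:cut], samples[cut:]


def describe(values):
    """Basic statistics used to compare the two subsets."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical daily water levels (m) in time order, for illustration only.
levels = [1283.0 + 0.05 * (day % 10) for day in range(20)]
train_set, test_set = chronological_split(levels)
print(len(train_set), len(test_set))
print(describe(train_set))
print(describe(test_set))
```

Keeping the time order means the testing period always follows the training period, which matches the 2011 to 2015 and 2015 to 2017 division stated above.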

M5P
The M5 tree, developed by Quinlan [63], is an accurate and computationally cheap decision tree learner designed for regression tasks, including those with very high dimensionality. Instead of assigning a constant to each leaf node, M5 fits a multivariate linear regression model at each leaf to forecast numerical values. Therefore, the performance of an M5 tree model is highly dependent on the chosen linear models. Amongst M5 models, M5P is a binary regression tree in which the terminal leaf nodes hold linear regression functions that yield continuous numerical outputs. To prune the tree, subtrees are removed and substituted by a linear function approximation, which diminishes variance and yields a smaller tree.
The M5P method is capable of handling large datasets, and it deals with missing data by dividing the input space into smaller sub-spaces. In general, the minimum number of instances, batch size, constructed regression trees, number of decimal places and unpruned and unchecked capabilities are all advantages of M5P models. A more detailed description of the M5P modeling approach is given by Khosravi et al. [64].
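The idea of a regression tree with linear models at its leaves can be sketched in a few lines. The example below is a deliberately simplified illustration, not the M5P implementation used in the study: it assumes a single input, makes only one split (chosen by standard deviation reduction) and omits the recursive growth, pruning and smoothing of a full model tree. All function names and data are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0, mean_y  # constant input: fall back to the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sd_reduction(all_y, left_y, right_y):
    """Standard deviation reduction obtained by a candidate split."""
    n = len(all_y)
    weighted = (len(left_y) / n) * statistics.pstdev(left_y)
    weighted += (len(right_y) / n) * statistics.pstdev(right_y)
    return statistics.pstdev(all_y) - weighted


def fit_model_tree(xs, ys, min_leaf=2):
    """Fit a one-split model tree with a linear model in each leaf."""
    pairs = sorted(zip(xs, ys))
    all_y = [y for _, y in pairs]
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical input values
        gain = sd_reduction(all_y, all_y[:i], all_y[i:])
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None:
        return {"leaf": fit_line(xs, ys)}
    _, threshold, i = best
    left, right = pairs[:i], pairs[i:]
    return {
        "threshold": threshold,
        "left": fit_line([x for x, _ in left], [y for _, y in left]),
        "right": fit_line([x for x, _ in right], [y for _, y in right]),
    }


def predict(tree, x):
    """Evaluate the linear model of the leaf that x falls into."""
    if "leaf" in tree:
        slope, intercept = tree["leaf"]
    elif x <= tree["threshold"]:
        slope, intercept = tree["left"]
    else:
        slope, intercept = tree["right"]
    return slope * x + intercept


# Hypothetical piecewise-linear data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.0, 1.0, 2.0, 3.0, 18.0, 20.0, 22.0, 24.0]
tree = fit_model_tree(xs, ys)
print(tree["threshold"], predict(tree, 2.5), predict(tree, 6.5))
```

Because each leaf holds a linear equation rather than a constant, the fitted tree can follow a trend within each sub-space, which is the property that distinguishes model trees from the constant-leaf trees described in the following sections.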

Random Forest (RF)
RF, first introduced by Breiman et al. [65], is an ensemble approach for building predictive models for both classification and regression tasks. It combines less predictive base models to yield a better predictive model. Due to their simple nature, few assumptions and high performance, RF models have been broadly used in machine learning (ML). The term "forest" refers to a series of decision trees that are by themselves "weak" learners. A single regression tree does not have the same predictive power as a regression forest [65,66]. Because a single tree splits on just one criterion at each node, it is very sensitive to the training dataset. Even small changes in the dataset and splitting criterion can produce different tree structures and yield different explanations [66]. RF models also rank the variables by their importance to attain the best RF model.

Random Tree (RT)
RT divides a dataset into sub-spaces and fits a constant for each sub-space. A single tree model tends to be very unstable and to show poor prediction accuracy. However, when RT is used as the base decision tree algorithm in bagging, it can yield highly accurate results [67]. RT has high flexibility along with fast training capability [68].
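The stabilizing effect of bagging can be sketched with very small constant-leaf trees. The example below is an illustration under stated assumptions, not the RT or RF implementation used in the study: each base learner is a one-split regression tree (a stump) fitted on a bootstrap sample, and the ensemble prediction is the average over all stumps. Function names, the seed and the data are hypothetical.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree that predicts a constant on each side."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical input values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_l, mean_r = statistics.mean(left), statistics.mean(right)
        sse = sum((y - mean_l) ** 2 for y in left)
        sse += sum((y - mean_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, mean_l, mean_r)
    if best is None:  # all inputs identical: predict the overall mean
        overall = statistics.mean(ys)
        return float("inf"), overall, overall
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return statistics.mean([predict_stump(s, x) for s in stumps])


# Hypothetical step-like data, for illustration only.
xs = [float(i) for i in range(10)]
ys = [1.0, 1.1, 0.9, 1.0, 1.1, 3.0, 3.1, 2.9, 3.0, 3.1]
stumps = fit_bagged_stumps(xs, ys)
print(predict_bagged(stumps, 0.0), predict_bagged(stumps, 9.0))
```

Each individual stump depends strongly on which samples it happened to draw; averaging many of them smooths out that dependence.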

Reduced Error Pruning Tree (REPT)
When a decision tree is constructed, several branches reflect noise or outliers in the training dataset rather than the underlying pattern. This problem, known as overfitting, is addressed by tree pruning, which uses statistical procedures to eliminate the less reliable branches and generally includes pre-pruning and post-pruning. The principal incentive of pruning is "trading accuracy for simplicity". REPT is an integrated method of reduced error pruning (REP) and the decision tree (DT) approach, in which the pruned trees are produced by using data held out from training; this validation dataset is used to estimate the generalization error. This method was first employed by Quinlan [63], who built decision trees based on information gain and variance reduction. The advantage of REPT lies in its ability to reduce the complexity of the tree by pruning, which decreases the size of the decision tree and limits overfitting during the learning process without a substantial accuracy loss [69]. Thus, a pruning process is needed to cut the tree back. REPT is capable of fast learning by decreasing variance to create decision trees [70].
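Reduced error pruning can be sketched as a bottom-up pass over an already grown tree. The following is a minimal illustration, assuming a regression tree stored as nested dictionaries in which every node keeps the mean of its training targets under the key `value`; the tree, the validation samples and the function names are hypothetical and are not taken from the study.

```python
def predict(node, x):
    """Follow the splits until a leaf is reached."""
    while "threshold" in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["value"]


def sse(node, samples):
    """Sum of squared errors of a (sub)tree on the given samples."""
    return sum((y - predict(node, x)) ** 2 for x, y in samples)


def reduced_error_prune(node, validation):
    """Return a pruned copy of node using the validation samples reaching it."""
    if "threshold" not in node:
        return dict(node)
    left_samples = [(x, y) for x, y in validation if x <= node["threshold"]]
    right_samples = [(x, y) for x, y in validation if x > node["threshold"]]
    pruned = {
        "threshold": node["threshold"],
        "value": node["value"],
        "left": reduced_error_prune(node["left"], left_samples),
        "right": reduced_error_prune(node["right"], right_samples),
    }
    leaf = {"value": node["value"]}
    # Collapse the subtree when a single leaf is at least as accurate on the
    # validation data (this also collapses nodes that no sample reaches).
    if sse(leaf, validation) <= sse(pruned, validation):
        return leaf
    return pruned


# Hypothetical tree whose left branch has fitted noise in the training data.
tree = {
    "threshold": 5.0,
    "value": 2.0,
    "left": {
        "threshold": 2.0,
        "value": 1.0,
        "left": {"value": 0.8},
        "right": {"value": 1.2},
    },
    "right": {"value": 3.0},
}
validation = [(1.0, 1.0), (3.0, 1.0), (4.0, 1.1), (7.0, 3.0), (8.0, 2.9)]
pruned = reduced_error_prune(tree, validation)
print(pruned)
```

In this example the split at 2.0 does not help on the validation samples, so it is replaced by a leaf, while the split at 5.0 is kept because removing it would increase the validation error.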

Model Evaluation and Comparison
To validate and compare the models, six quantitative statistics were utilized: root mean square error (RMSE), coefficient of determination (R²), mean absolute error (MAE), percent bias (PBIAS), ratio of the root mean square error to the standard deviation of measured data (RSR) and Nash-Sutcliffe efficiency (NSE). Furthermore, to visually compare the model performance, Taylor diagrams and boxplots were examined [71]. Taylor diagrams were introduced by Taylor [71] to graphically show how closely an estimated output (or a set of estimated outputs) matches observations. They are plotted based on the correlation and standard deviation of the estimated and observed datasets. They are especially useful in evaluating multiple aspects of complex models or in gauging the relative skill of many different models (e.g., [72]). The box plot, in turn, provides more information about data distribution, as well as maximum and minimum values, which is important in modeling.
In the equations for these indices, n is the number of samples, WL_obs is the observed value of the output, WL_pred is the simulated output value, and the averages of WL_obs and WL_pred are taken over the entire target set. R² describes the degree of collinearity between our simulated and measured data. It ranges between 0 and 1, with higher R² values indicating better prediction accuracy, and values greater than 0.5 are considered acceptable [73]. The RMSE and MAE measure the error of the models. Lower values of RMSE and MAE indicate better model predictive performance. The NSE is a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance [74]. The NSE ranges between −∞ and 1, and NSE = 1 indicates a perfect match between observed and predicted values. Model predictive performance is categorized as very good, good, acceptable or unacceptable according to the NSE value [64,75]. The PBIAS measures the average tendency of the simulated values to be larger or smaller than the observed values [76] and, hence, it is the best metric to show over- or underestimation [64]. It varies from −∞ to 1, with negative values showing overestimation, while positive values indicate underestimation [77]. The RSR is the ratio of the RMSE to the standard deviation of the measured data. The RSR ranges from the optimal value of 0 to a large positive value. A lower RSR means a lower RMSE, which shows a better model predictive performance. RSR classification ranges are represented as very good, good, acceptable and unacceptable with ranges of 0.00 ≤ RSR ≤ 0.50, 0.50 < RSR ≤ 0.60, 0.60 < RSR ≤ 0.70 and RSR > 0.70, respectively [64].
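For reference, the six indices described above are commonly written as follows. These are the standard textbook forms, given here as an assumption rather than as the exact notation of the original equations. The symbols follow the definitions above (observed value, simulated value, their averages and the number of samples n), and the PBIAS sign convention matches the text (positive values for underestimation). The factor of 100, which expresses PBIAS as a percentage, is also an assumption; the upper bound of 1 quoted above corresponds to the form without it.

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(WL_{\mathrm{obs},i}-WL_{\mathrm{pred},i}\right)^{2}} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|WL_{\mathrm{obs},i}-WL_{\mathrm{pred},i}\right| \\
R^{2} &= \left[\frac{\sum_{i=1}^{n}\left(WL_{\mathrm{obs},i}-\overline{WL}_{\mathrm{obs}}\right)\left(WL_{\mathrm{pred},i}-\overline{WL}_{\mathrm{pred}}\right)}
{\sqrt{\sum_{i=1}^{n}\left(WL_{\mathrm{obs},i}-\overline{WL}_{\mathrm{obs}}\right)^{2}\,\sum_{i=1}^{n}\left(WL_{\mathrm{pred},i}-\overline{WL}_{\mathrm{pred}}\right)^{2}}}\right]^{2} \\
\mathrm{NSE} &= 1-\frac{\sum_{i=1}^{n}\left(WL_{\mathrm{obs},i}-WL_{\mathrm{pred},i}\right)^{2}}{\sum_{i=1}^{n}\left(WL_{\mathrm{obs},i}-\overline{WL}_{\mathrm{obs}}\right)^{2}} \\
\mathrm{PBIAS} &= \frac{\sum_{i=1}^{n}\left(WL_{\mathrm{obs},i}-WL_{\mathrm{pred},i}\right)}{\sum_{i=1}^{n}WL_{\mathrm{obs},i}}\times 100 \\
\mathrm{RSR} &= \frac{\sqrt{\sum_{i=1}^{n}\left(WL_{\mathrm{obs},i}-WL_{\mathrm{pred},i}\right)^{2}}}{\sqrt{\sum_{i=1}^{n}\left(WL_{\mathrm{obs},i}-\overline{WL}_{\mathrm{obs}}\right)^{2}}}
\end{align}
```

With these forms, NSE = 1 − RSR², so the RSR and NSE classifications rank the models in the same order.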
In addition, two convenient graphical evaluation tools, Taylor diagrams and boxplots, were applied to visually compare the model performance [71]. Taylor diagrams summarize the similarity between two patterns and how closely a model pattern matches the observations. They use three related model performance statistics, the standard deviation (sigma), the correlation coefficient (R) and the centered RMSE, which can be plotted on a two-dimensional graph because they are linked by the law of cosines. In general, Taylor diagrams are a useful tool for assessing the comparative skill of different models. Furthermore, boxplots have been used for the purpose of evaluation, as they present five statistics, namely the minimum, lower quartile, median, upper quartile and maximum, in a graphic presentation. The schematic diagram of the methodology is illustrated in Figure 2.
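The law-of-cosines relation behind the Taylor diagram can be checked numerically. The sketch below assumes the usual formulation, in which the correlation coefficient and the centered (bias-removed) RMSE are used; the function name and the sample values are hypothetical.

```python
import math
import statistics


def taylor_statistics(observed, predicted):
    """Return (sd_obs, sd_pred, correlation, centered RMSE)."""
    n = len(observed)
    mean_o = statistics.mean(observed)
    mean_p = statistics.mean(predicted)
    sd_o = statistics.pstdev(observed)
    sd_p = statistics.pstdev(predicted)
    covariance = sum(
        (o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)
    ) / n
    correlation = covariance / (sd_o * sd_p)
    # Centered RMSE: the RMSE after removing the mean of each series.
    centered_rmse = math.sqrt(
        sum(((p - mean_p) - (o - mean_o)) ** 2
            for o, p in zip(observed, predicted)) / n
    )
    return sd_o, sd_p, correlation, centered_rmse


# Hypothetical observed and predicted water levels (m), for illustration only.
observed = [1283.10, 1283.25, 1283.40, 1283.32, 1283.18, 1283.05]
predicted = [1283.12, 1283.22, 1283.37, 1283.35, 1283.20, 1283.08]
sd_o, sd_p, r, crmse = taylor_statistics(observed, predicted)

# Law of cosines: E'^2 = sd_o^2 + sd_p^2 - 2 * sd_o * sd_p * R
from_cosines = math.sqrt(sd_o ** 2 + sd_p ** 2 - 2 * sd_o * sd_p * r)
print(crmse, from_cosines)
```

Because the three statistics satisfy this identity, a single point on the diagram (radius sd_pred, angle arccos R) also fixes its distance to the observed point, which equals the centered RMSE.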

Results and Analysis
We tested the performance of the four models in predicting daily water level fluctuation in both the training and testing stages using various evaluation criteria (Table 2). According to our statistical evaluation criteria, all the models had very good predictive ability (R² > 0.7). The coefficient of determination (R²) showed that these models are all acceptable but that the M5P model performed best, having the highest R² value (0.9874), followed by the RF (0.9697), RT (0.9654) and REPT (0.965) models. In terms of RMSE, the M5P model also had the highest predictive power, with the lowest RMSE (0.05), followed by the RF and RT (0.09) and REPT (0.1) models. The M5P model yielded the lowest MAE (0.01), followed by the RF (0.02), RT (0.03) and REPT (0.033) models. In addition, the NSE ranked the models from greatest predictive power to least as follows: M5P (0.98) > RF and RT (0.96) > REPT (0.95), which is similar to the R² ranking. Additionally, the PBIAS results reveal that all of the applied models underestimated water levels (positive PBIAS values). The calculated PBIAS values were between 0 and 0.2 for all models, indicating a very good performance in predicting daily water level fluctuation in the study area. Finally, the performance of all of the applied models, based on the RSR values, was classified into four classes: very good, good, satisfactory and unsatisfactory with ranges of 0.00 ≤ RSR ≤ 0.50, 0.50 < RSR ≤ 0.60, 0.60 < RSR ≤ 0.70 and RSR > 0.70, respectively. The RSR shows a very good performance across all our developed models.
We used the Pearson correlation coefficient (PCC) between each input variable (WL(t-1) to WL(t-5)) and the daily water level to calculate the relative importance of the different time lags and to determine the most important factor for the prediction of daily water level. The PCC values were WL(t-1) (0.981), followed by WL(t-2) (0.964), WL(t-3) (−0.946), WL(t-4) (0.928) and WL(t-5) (0.925). They show that water level with a one-day lag time had the highest effect on the result and that this effect decreased with greater lag times.
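The lag-by-lag correlation analysis can be reproduced with a few lines of code. This is a minimal sketch, assuming the PCC is computed between WL(t) and each lagged series over the same target days; the function names and the sample series are hypothetical and do not reproduce the values reported above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def lag_correlations(levels, max_lag=5):
    """Correlate WL(t) with WL(t-lag) for lag = 1..max_lag."""
    current = levels[max_lag:]
    result = {}
    for lag in range(1, max_lag + 1):
        lagged = levels[max_lag - lag:len(levels) - lag]
        result[lag] = pearson(lagged, current)
    return result


# Hypothetical daily water levels (m), for illustration only.
levels = [1283.10, 1283.14, 1283.21, 1283.25, 1283.24, 1283.19,
          1283.12, 1283.08, 1283.09, 1283.15, 1283.22, 1283.27]
for lag, value in lag_correlations(levels).items():
    print(f"WL(t-{lag}): {value:.3f}")
```

Using the same target days for every lag keeps the coefficients comparable, in the same way as the input scenarios share a common set of targets.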
The R values are presented in Table 3, which gives the correlations between the given inputs and the output/target variable. The PCC results show that WL(t-1) and WL(t-5) had the highest and lowest correlations with daily water level, respectively (PCC = 0.981 and 0.925). After the completion of the correlation analysis, we identified the best input combination by using the testing dataset, as shown in Table 4. Based on the R values, we evaluated different input combinations in the models (M5P, RF, RT and REPT) at both the training and testing stages. We found that WL(t-1) alone (combination 1) was the best input combination for all developed models except M5P (i.e., for REPT, RT and RF), as it gave the highest R in the testing phase, with values of 0.982, 0.981 and 0.980, respectively. For M5P, the most effective combination was WL(t-1) and WL(t-2) (combination 2) with a value of R = 0.933. We found that, overall, the M5P model generally had a better fitting accuracy and the highest correlation among the input variables and that the RF model had the poorest accuracy in the approximation of the training data.
Table 3. Pearson correlation coefficient (R) between input variables and daily water level.
Figure 3 shows the line graphs and scatter plots of the observed and predicted daily water levels. The result shows that all the models predicted daily water level with a high level of accuracy, while only M5P was able to perfectly predict the maximum values of the water level fluctuation of Zrebar Lake. Also, M5P was the closest to the observed water level values and to the best-fit line (45° line), with minimum scatter and the linear equation WL_pred = 19.576 WL_obs + 0.9848. Conversely, the REPT model provided the worst estimates with maximum scatter (Figure 4). This confirms that M5P outperformed the other models, including RF, RT and REPT. This result is in accordance with the evaluation criteria presented in Table 2.

We also further analyzed model efficiency using Taylor diagrams (Figure 5) and boxplots (Figure 6). The closer the point of each developed model to the observed point location, the higher the performance. Here, our results also show that the models had good predictive power, but the M5P algorithm had a higher correlation and lower RMSE. Based on the normalized standard deviation values, the SD of the M5P model was similar to the observed SD, whereas REPT had a lower standard deviation, followed by the RT and RF models.
The results of the boxplot are presented in Figure 6. The maximum values predicted by the M5P model were closer to the observed values, whereas REPT, RT and RF underestimated water levels. In terms of the quartiles, median and minimum values, all the models predicted WL values similar to the observed values with a significant degree of accuracy, although the M5P model outperformed the other models.

Discussion
Lakes can be complex ecosystems and provide numerous uses to society, including drinking water supply, recreation, navigation, irrigation and hydroelectric power. Tools that predict water level fluctuation are important for the management of lakes, water consumption and their surrounding catchments. In this study, we developed and applied advanced soft computing decision tree-based ML algorithms, including M5P, RF, RT and REPT, for predicting the daily water level fluctuation of Zrebar Lake, Kurdistan Province, Iran. To our knowledge, this is the first time these models have been used for predicting the daily water levels of lakes.
We computed and measured the predictive performance of the learning models for the training and validation datasets using the RMSE, MAE, NSE, PBIAS, RSR and R² criteria, as used by others [78-85]. After implementing the learning models, we created a histogram of actual and estimated values, a Taylor diagram, scatterplots and a boxplot. The results of the validation phase are of greater importance than the performance on the training dataset (modeling phase) [28,86]. We observed that, although the models were well trained and performed successfully in all scenarios, the M5P model under the second scenario (WL(t-1), WL(t-2)) outperformed the REPT, RT and RF models, which performed best under the first scenario (WL(t-1) only). The reason the M5P model succeeded over the other models is probably related to the advantages of this model. The first advantage is its more efficient learning process, which does not rely on assumptions of data type and distribution, can handle many attributes and high dimensions and is robust in dealing with missing data. The second advantage of M5P is its ability to construct a simple tree structure and applicable linear equations in multiple leaves, with which it can explicitly explain the relationship between the input variables and output parameters [64,87].
Other models, like the RF and RT models, are also decision tree-based algorithms used for both classification and regression problems, but these models are limited by the need to construct a large number of trees, which makes the algorithms slow and ineffective for real-time predictions. According to Kisi et al. [5], decision tree-based algorithms (M5P, RT and RF) have a higher prediction power than models with hidden layers in their structures (ANN, adaptive neuro-fuzzy inference system (ANFIS) and fuzzy logic (FL)).
Other water researchers who have used the M5P model have reported mixed levels of performance [88-90]. For example, Balouchi et al. [86] found that the M5P model performance was inferior to the ANN-MLP and radial basis function neural network (ANN-RBF) models for the prediction of scour depth at river confluences. In contrast, Onyari and Ilunga [90] compared multilayer neural networks (ANN-MLP) with M5P tree models to predict the stream flow in Luvuvhu Catchment, South Africa, and found that the M5P model yielded better predictions.
In our study, the M5P model provided the best prediction of the daily water level fluctuation of Zrebar Lake. The difference between this result and the less favorable results reported for M5P in other modeling studies requires further scrutiny. While M5P was optimal for predicting lake levels at Zrebar Lake, it underperformed in other applications, as described above, and thus requires further testing in other settings. The main limitation of the current research is the lack of a comprehensive dataset including variables such as rainfall, inflow discharge and evaporation, which have a meaningful effect on the result. It is recommended to compare the results of the present study with ensemble-based models and optimization algorithms to develop a more robust algorithm.

Conclusions
The accurate prediction of lake water level fluctuation can help guide the sustainable development and management of lake water usage. In this study, we developed and tested a number of state-of-the-art soft computing benchmark ML models, including M5P, RF, RT and REPT, to predict the daily water level fluctuation of Zrebar Lake, Kurdistan Province, Iran. We used various scenarios, based on the combination of data inputs, to select the optimal parameters. We evaluated the performance of the developed models quantitatively using RMSE, MAE, NSE, PBIAS, RSR and R² measures. Our results are summarized as follows:
The M5P model outperformed the other models when the WL(t-1) and WL(t-2) variables (the second scenario) were selected as inputs, i.e., a combination of one- and two-day lag times of water level. The best performance by the other ML models was achieved with a one-day lag time of measured water levels.
The M5P model had the highest performance and accuracy (the lowest RMSE and the highest R²) in comparison to the other machine learning models. Additionally, the M5P model had a tighter fit to the observed data based on the scatter plots and histograms of actual and estimated values, thus showing promise for wider applications in water level prediction.
Water level with a one-day lag time has the highest effect on the results, and this effect decreases with longer lag times.
The best input scenario is the one in which all input variables are considered.
The M5P model is able to predict maximum lake water level perfectly, compared to the other developed algorithms.