Simulation of Titicaca Lake Water Level Fluctuations Using Hybrid Machine Learning Technique Integrated with Grey Wolf Optimizer Algorithm

Lakes have an important role in storing water for drinking, producing hydroelectric power, and environmental, agricultural, and industrial uses. In order to optimize the use of lakes, precise prediction of the lake water level (LWL) is a main issue in water resources management. Due to the existence of nonlinear relations, uncertainty, and characteristics of the time series variables, the exact prediction of the lake water level is difficult. In this study the hybrid support vector regression (SVR) and the grey wolf algorithm (GWO) are used to predict lake water level fluctuations. Also, three types of data preprocessing methods, namely Principal component analysis, Random forest, and Relief algorithm were used for finding the best input variables for prediction LWL by the SVR and SVR-GWO models. Before the LWL simulation on monthly time step using the hybrid model, an evolutionary approach based on different monthly lags was conducted for determining the best mask of the input variables. Results showed that based on the random forest method, the best scenario of the inputs was Xt−1, Xt−2, Xt−3, Xt−4 for the SVR-GWO model. Also, the performance of the SVR-GWO model indicated that it could simulate the LWL with acceptable accuracy (with RMSE = 0.08 m, MAE = 0.06 m, and R2 = 0.96).


Introduction
Lake water level fluctuations have important impacts on water resource management planning, such as water supply management, irrigation and drinking, recharge of groundwater aquifers, ecological and environmental changes, fishing, tourism, and many other forms of activities [1,2]. Changes in lake levels are affected by many factors such as precipitation, direct and indirect runoff from adjacent basins, evaporation from the free surface of the lake, and interactions between the lake and the groundwater table. The study of the lake water fluctuations was always of interest to hydrologists and environmental experts [3]. The accurate prediction of a lake water level intervals, has always been one of the challenges faced by hydrologists. Prediction of the lake water level (LWL) fluctuations at various time intervals is a challenging task in water resources management, and forecasting it using the records of past time series provides useful data for planning, designing, and constructing water resource projects. Hence, researchers are trying to predict the LWL through accurate and cost-effective ways. Lake water level forecasting can be done using some complex models that include many relevant parameters, but the sensitivity of the LWL to these factors may vary from one area to another [4].
Over recent decades, several theory-driven (e.g., conceptual hydrological models) and data-driven techniques have been adopted and extended to the use for lake water fluctuations simulation [5]. The process-driven models, e.g., IHACRES [6], HSPF [7], SWAT [8], stochastic dynamic models [9], are useful for understanding the physically-based hydrological process in a watershed. However, the main constraint for the use of these models is their extensive data requirements which restrict their implementation with intensively monitored watersheds. Generally, hydrologists are concerned about the accurate prediction at specific locations with affordable time and cost. On the other hand, the data-driven techniques are estimated not from the physically-based hydrological process, but from the concurrent hydro-meteorological input-output data time series [10]. In comparison with theory-driven techniques, fewer parameters are needed for data-driven models. Consequently, the data requirement of these models is significantly less than those of theory-driven models. Various types of data-driven techniques such as Support Vector Regression (SVR) [11], artificial neural networks (ANNs) [12], adaptive neuro-fuzzy inference system [13], genetic expression programming and M5 model trees [14], have been extensively applied in hydro-environmental studies including prediction of the hydrological variables.
In recent decades, artificial intelligence models have become a proven tool in modeling nonlinear parameters and have been utilized by researchers in different hydrological studies [15][16][17][18][19][20]. Support vector regression is one of the advanced models of machine learning that operates based on minimizing error [21]. Although many studies show the ability of support vector regression (SVR) models to predict hydrological variables [22], SVR performance can be enhanced by hybridization with the Grey Wolf Optimizer algorithm (GWO) [23]. The GWO algorithm was first introduced by Mirjalili et al. [24] and was used in hydrology and other associated areas [25].
Providing water needed for agriculture, industry, and beverage requires precise and long-term planning. Hence, the prediction of hydrological variables in order to manage the utilization of water resources is necessary, in which the level of lakes as a natural heritage is of special importance. To this end, it is very important to use methods that achieve the most accurate result with the least time and cost. Pillco and Bengtsson [26] used the water balance method to predict the water level of Poopo Lake at a monthly scale. The main inflow water source of this lake is from the Desaguadero River, which is an outflow of Titicaca Lake. They discussed the dynamic of water levels of Poopo Lake related to climate variability, as well as climate change scenarios. Their results indicated that lowering the water depth decreased one meter in a wet season; the Poopo lake will be dry in the dry season.
The objective of this study is to examine the ability of new hybrid support vector regression (SVR) model and the Grey Wolf Optimizer algorithm (GWO) for short-term prediction of lake water level. To do that, the data of the Titicaca Lake was used as a case study for the training and testing model. Finally, the performance of the hybrid SVR-GWO model was compared to the SVR model. Therefore, in this study, the hybrid support vector regression model and a Grey Wolf Optimizer model is used to improve the prediction of Titicaca lake water level at a monthly scale dataset during the 1973-2017 period. Also, in this research, we used three different pre-processing methods (principal component analysis, random forest, and Relief algorithm) to find the best mask of inputs for the proposed model.

Study Area
Titicaca Lake (TL) is one of the most important water body resources on the South American continent, plays an important role in the agriculture, habitat for animals, human being, and plant species; as well as in ecology of the region, economic and industrial activities, and the environment [27]. This lake is located at 68 • 33 -70 • 1 W and 15 • 6 -16 • 50 S between Peru and Bolivia ( Figure 1). Titicaca Lake is an international wetland, the largest, deepest, and the highest navigable water body in the world; also, it is important from the point of view tourism for South America. Furthermore, this lake feeds significantly to the second largest and shallow Poopó Lake in Bolivia, placed at 300 km downstream from Titicaca [26]. The total lake surface is 8560 km 2 at an elevation of 3810 m a.m.s.l, with a mean depth of 105 m. The outlet is still at 3807 m a.m.s.l. [28].
Water 2020, 12, x FOR PEER REVIEW 3 of 19 [27]. This lake is located at 68°33′-70°1′ W and 15°6′-16°50′ S between Peru and Bolivia ( Figure 1). Titicaca Lake is an international wetland, the largest, deepest, and the highest navigable water body in the world; also, it is important from the point of view tourism for South America. Furthermore, this lake feeds significantly to the second largest and shallow Poopó Lake in Bolivia, placed at 300 km downstream from Titicaca [26]. The total lake surface is 8560 km 2 at an elevation of 3810 m a.m.s.l, with a mean depth of 105 m. The outlet is still at 3807 m a.m.s.l. [28].

Data Used
To conduct this research, the time series data of the water level which was measured in the hydrometric station of Titicaca Lake (located in southeast Peru) were used. As shown in Figure 1, this station is the closest representative station which present water level variations within the lake. The monthly time series data was used for tuning the parameters of the investigated models. For this aim, the selection dataset is from August 1973 to January 2017. 75% of the dataset (i.e., 392 data points) used for the training phase from August 1973 to March 2006 and 25% of dataset (i.e., 130 data points) used for the testing phase from April 2006 to January 2017. The average yearly precipitation of the Titicaca Lake basin is 609.5 mm/year, the average yearly minimum air temperature is −0.08 °C, and the average yearly maximum air temperature is 17.2 °C. Table 1 shows the statistical characteristics of data used including length, minimum, maximum, mean, standard deviation, Skewness, Kurtosis, and confidence level. Additionally, Figures 2 and 3 indicate the time series of the water level of Titicaca for the duration of the study period  and also, monthly patterns of some years, respectively.

Data Used
To conduct this research, the time series data of the water level which was measured in the hydrometric station of Titicaca Lake (located in southeast Peru) were used. As shown in Figure 1, this station is the closest representative station which present water level variations within the lake. The monthly time series data was used for tuning the parameters of the investigated models. For this aim, the selection dataset is from August 1973 to January 2017. 75% of the dataset (i.e., 392 data points) used for the training phase from August 1973 to March 2006 and 25% of dataset (i.e., 130 data points) used for the testing phase from April 2006 to January 2017. The average yearly precipitation of the Titicaca Lake basin is 609.5 mm/year, the average yearly minimum air temperature is −0.08 • C, and the average yearly maximum air temperature is 17.2 • C. Table 1 shows the statistical characteristics of data used including length, minimum, maximum, mean, standard deviation, Skewness, Kurtosis, and confidence level. Additionally, Figures 2 and 3 indicate the time series of the water level of Titicaca for the duration of the study period  and also, monthly patterns of some years, respectively.

Preprocessing Methods
Model input combination and proper training and testing data lengths are among the main limiting factors on the predictive accuracy of any non-linear model [29]. In order to determine the best mask of input variables, the Principal component analysis (PCA), Random forest (RF) and Relief (RL) are used to abstract the most correlated variables.

Principal Component Analysis (PCA)
The data dimension is the number of variables that are measured per view. Many of the data sets may have a high-dimension space, and therefore their interpretation will be difficult. Therefore,

Preprocessing Methods
Model input combination and proper training and testing data lengths are among the main limiting factors on the predictive accuracy of any non-linear model [29]. In order to determine the best mask of input variables, the Principal component analysis (PCA), Random forest (RF) and Relief (RL) are used to abstract the most correlated variables.

Principal Component Analysis (PCA)
The data dimension is the number of variables that are measured per view. Many of the data sets may have a high-dimension space, and therefore their interpretation will be difficult. Therefore,

Preprocessing Methods
Model input combination and proper training and testing data lengths are among the main limiting factors on the predictive accuracy of any non-linear model [29]. In order to determine the best mask of input variables, the Principal component analysis (PCA), Random forest (RF) and Relief (RL) are used to abstract the most correlated variables.

Principal Component Analysis (PCA)
The data dimension is the number of variables that are measured per view. Many of the data sets may have a high-dimension space, and therefore their interpretation will be difficult. Therefore, we tried to reduce the dimensions of data. Principal component analysis (PCA) is one of the most common methods of reducing the dimensions with the minimum error between the original data set Water 2020, 12, 3015 5 of 18 and the new obtained dimensions. PCA is an unsupervised learning method, is similar to clustering methods [30], and can be used for extracting important variables (in the form of the component) of the large set of variables in a dataset. In this method, the variables in a multimode correlated area are considered to be a collection of non-correlated components. The PCA method reduces the number of variables and finds the relationship structure between variables that are the same classification of variables. By applying the PCA variables, the main input is converted to new variables that are non-correlation. The PCA changes the input variables to the main components that are independent and linear combinations of input variables. In this method, data will be provided with a minimum of losses in the main components [10]. In this study, the first components of data sets, which has at least 80% of the total variance, are used as the input data for the predictive models.

Random Forest (RF)
Random Forests (RF) is a popular and powerful tool in order to select proper input variables to build predictive models. This method is used extensively across a multitude of engineering fields. It is also known as random decision forests. RF is one of the latest techniques which can be used for classification and regression-based analyses [31]. In this technique, the number of trees with different verity was used for forecasting or estimation. Tree predictors were used as random numerical values to class labels in the random forest classifier. Random forest regression used in this study contains an assembly of input parameters or arbitrarily chosen parameters at each node to grow a tree. RF technique requires only two user-defined parameters, such as the number of parameters used at each node and the number of trees [32].

Relief (RL)
Relief (RL) is an algorithm introduced by Kira and Rendell [33] that takes a filter-method approach to feature selection that is notably sensitive to feature interactions. The RL method is a random search method based on the filter model and no feedback on the learning algorithm is applied to assess the subset of the selected features. The stages of the RL, as the algorithm for reducing input data dimensions, are as follows: In the first step, each ranking correlation feature is given to the ultimate goal. In the second step, the ranks are updated using randomly selected samples. Then these ratings are sorted and the features with the lowest rank are removed, and the threshold is used to ranks the feature. The remaining features are retained as the superior features are kept in the input of a batch algorithm. The RL algorithm works fine for the noise characteristics or features of a good correlation, and the complexity of the time is linear and functional of the number of features and the number of exemplars the sample collection. It works well for both continuous and discrete datasets. The RL algorithm uses an evaluation function that calculates the sum of a statistically significant criterion and a criterion of complexity makes it a minimum. This algorithm finds the first feature to be able to make classes even better. It then finds properties, which in combination with the selected features, increase the separation of classes. This process is stopped when it gets to the least expected criteria for the pending representation [33].

Support Vector Regression
Support vector regression (SVR) is a set of supervised learning methods and can be used for classification and regression tasks. This method is based on binary classification in the space of arbitrary properties and is, therefore, a suitable method for predictive problems [21]. SVR can learn and model the nonlinear relation between the input and output data in the higher dimension, which minimizes the observed training error and the distribution error extent to attain generalized regression efficiency [10]. In fact, support vector regression is an efficient learning system based on effective optimization theory. This technique implements the principle of inductive minimization of structural error in order to attain a general optimal solution. The structure of the SVR is presented in Figure 4.
Water 2020, 12, x FOR PEER REVIEW 6 of 19 structural error in order to attain a general optimal solution. The structure of the SVR is presented in Figure 4. This study used radial basis function (RBF) as the learning algorithm for SVR. Let the training data set ( , ), ( , ), … ( , ) where ( = 1 − ) and ( = 1 − ) indicate the i-th input and output vector, respectively, and n is regarded as the number of training data pairs. The SVR function attempts to detect a suitable function ( ) by minimizing the distribution error limits to attain general regression efficiency as follows: where φ(x) represents the non-linear projection function which maps the input vector x into a higherdimensional feature space where the trained data show linearity. However, w and b are the weight and constant coefficients, respectively, which are estimated by minimizing the arranged risk function.

Grey Wolf Optimizer
The grey wolf optimizer (GWO) algorithm is a nature-inspired swarm-based algorithm that emulates the social hierarchy of wolves and their behavior in approaching, encircling, and attacking the prey . Each member in a grey wolf herd categorized as α, β, δ and ω based on the effectiveness and decision-making power in the group. The alpha wolf usually corresponds to the strongest and dominant wolf who leads the herd, and his/her order should be enforced by the rest of the group. The β wolves are the second group that plays the role of advisors for alphas. The beta wolves assist the alpha wolves by reinforcing their commands through the rest of the wolves [24]. The δ wolves are ranked below the α and β wolves and above the ω wolvess in terms of leadership hierarchy. They are guards, sentinels, hunters, and caretakers of the group. The ω wolves are placed in the lowest order of decision-making and have to submit to all other wolves. To address an optimization problem using the GWO algorithm, the social behavior of the grey wolves in the hunting process is modeled in a statistical framework. The hunting process of the grey wolves is followed in three stages: tracking, encircling and attacking the prey [34]. The hunting process is organized by α, β and δ wolves. The ω wolves follow α, β and δ wolves [35]. The best solution is considered as α. The β, δ, and ω are the next subsequent solutions in terms of priority. In GWO iterations, the wolves evaluate the possible hunt situation, and update their status according to that. The mathematical formulation of the encircling process is as follows: In the above equation, denotes the current iteration, ⃗ and ⃗ respectively implicates the position vectors of the hunt and hunter (candidate solutions), ⃗ = 2 ⃗. ⃗ − ⃗ and ⃗ = 2⃗ indicate coefficient vectors with ⃗ and ⃗ being as random vectors (between 0, 1) that allow the This study used radial basis function (RBF) as the learning algorithm for SVR. Let the training data set (x 1 , y 1 ), (x 2 , y 2 ), . . . (x n , y n ) where x i (i = 1 − n) and y i (i = 1 − n) indicate the i-th input and output vector, respectively, and n is regarded as the number of training data pairs. The SVR function attempts to detect a suitable function f (x) by minimizing the distribution error limits to attain general regression efficiency as follows: where ϕ(x) represents the non-linear projection function which maps the input vector x into a higher-dimensional feature space where the trained data show linearity. However, w and b are the weight and constant coefficients, respectively, which are estimated by minimizing the arranged risk function.

Grey Wolf Optimizer
The grey wolf optimizer (GWO) algorithm is a nature-inspired swarm-based algorithm that emulates the social hierarchy of wolves and their behavior in approaching, encircling, and attacking the prey . Each member in a grey wolf herd categorized as α, β, δ and ω based on the effectiveness and decision-making power in the group. The alpha wolf usually corresponds to the strongest and dominant wolf who leads the herd, and his/her order should be enforced by the rest of the group. The β wolves are the second group that plays the role of advisors for alphas. The beta wolves assist the alpha wolves by reinforcing their commands through the rest of the wolves [24]. The δ wolves are ranked below the α and β wolves and above the ω wolvess in terms of leadership hierarchy. They are guards, sentinels, hunters, and caretakers of the group. The ω wolves are placed in the lowest order of decision-making and have to submit to all other wolves. To address an optimization problem using the GWO algorithm, the social behavior of the grey wolves in the hunting process is modeled in a statistical framework. The hunting process of the grey wolves is followed in three stages: tracking, encircling and attacking the prey [34]. The hunting process is organized by α, β and δ wolves. The ω wolves follow α, β and δ wolves [35]. The best solution is considered as α. The β, δ, and ω are the next subsequent solutions in terms of priority. In GWO iterations, the wolves evaluate the possible hunt situation, and update their status according to that. The mathematical formulation of the encircling process is as follows: In the above equation, t denotes the current iteration, → X P and → X respectively implicates the position vectors of the hunt and hunter (candidate solutions), r 2 indicate coefficient vectors with → r 1 and → r 2 being as random vectors (between 0, 1) that allow the wolves update their status in the hunt space. Values of → a are linearly reduced from 2 to zero during the process as follows: where t signifies the running iteration and t signify the total number of iterations. During the search process, ω wolves update their status according to the best exploration factors i.e., α, β and δ, which represented by the following equations:

Hybrid SVR-GWO Model
In the proposed method, the Grey Wolf optimization algorithm is coupled with support vector regression (SVR-GWO). Indeed, the GWO used to optimize the SVR parameters. Figure 3 illustrates adjusting the search between local and global optima. In the weight optimization process, the weights are randomly assigned initially. The initial weights are changed and modified in each repetition as long as the difference of support vector regression output and the output related to the actual inputs is less than a specified limit.
In order to use the GWO for SVR weight updates, training SVR as a function should be highlighted and the problem aims to optimize it in an n-dimensional space. In this regard, the SVR is considered a grey wolf optimizer in d-dimensional space, d equals to the total number of weights and SVR bias. In other words, the location of each grey wolf has a dimension, which is equal to the total number of weights and SVR bias. This algorithm modifies each grey wolf location and returns the best wolf as the answer. Then, the weight of SVR is updated based on the values obtained from the best position of wolves. The best weight is acquired if one of two final conditions is established: first when the network minimum squared errors is less than a certain threshold, and second the number of iterations which is already known. Figure 3 displays a schematic view of the SVR model integrated with GWO optimizer applied for predicting lake water lake by using data-driven techniques method for data assimilation ( Figure 5). It is an estimator hybrid procedure that utilizes the ANFIS capability together with optimization algorithms robustness. It is already confirmed that predictor tools can predict more successful results by coupling with optimization approaches [36][37][38][39][40][41].
Water 2020, 12, x FOR PEER REVIEW 8 of 19 Figure 5. Schematic view of predicting lake water lake by SVR and SVR-GWO by using data-driven techniques.

Performance Indexes
Various performance evaluation criteria such as root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R-squared) were utilized to evaluate the simulated results of each model, as illustrated in Equations (8)-(11), respectively: where is the observed value, ̅ is the mean of observed values, is the estimated value, is the mean of estimated values, and σx and σy are the standard deviation of the observed and estimated data, and n is the number of observations.

Implementation of the Preprocessing Methods
The predictive accuracy of the machine learning approaches relies significantly on the proper input combination and efficient size of training data. In this regard, three methods of the PCA, RF and RL, were implemented to find the best mask of the input variables. The predictor inputs in this study are selected among the time lags of the output series, as a time series prediction problem. This approach is applicable for forecasting the variable for one step (in this study one month) ahead. Table  2 lists the result of the PCA method to investigate the effective input variables. For this aim, the PCA method was employed to analyze the monthly time lags of water level. This method was implemented for 12 months of time lags and results show that PCA1 (component 1) covered 84.98% of all variances so that PCA1 is selected as the best mask of the input variable for the predictor models. Also, PCA1 as the most effective input by PCA method introduced (PCA1 = 0.1117 × L1 + 0.1138 × L2 Figure 5. Schematic view of predicting lake water lake by SVR and SVR-GWO by using data-driven techniques.

Performance Indexes
Various performance evaluation criteria such as root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R-squared) were utilized to evaluate the simulated results of each model, as illustrated in Equations (8)-(11), respectively: where x i is the observed value, x is the mean of observed values, y i is the estimated value, y is the mean of estimated values, and σx and σy are the standard deviation of the observed and estimated data, and n is the number of observations.

Implementation of the Preprocessing Methods
The predictive accuracy of the machine learning approaches relies significantly on the proper input combination and efficient size of training data. In this regard, three methods of the PCA, RF and RL, were implemented to find the best mask of the input variables. The predictor inputs in this study are selected among the time lags of the output series, as a time series prediction problem. This approach is applicable for forecasting the variable for one step (in this study one month) ahead. Table 2 lists the result of the PCA method to investigate the effective input variables. For this aim, the PCA method was employed to analyze the monthly time lags of water level. This method was implemented for 12 months of time lags and results show that PCA1 (component 1) covered 84.98% of all variances so Water 2020, 12, 3015 9 of 18 that PCA1 is selected as the best mask of the input variable for the predictor models. Also, PCA1 as the most effective input by PCA method introduced (PCA1 = 0.1117 × L1 + 0.1138 × L2 +0.1155 × L3 +0.1169 × L4 + 0.1179 × L5 + 0.1183 × L6 +0.1183 × L7 + 0.1179 × L8 +0.1169 × L9 + 0.1155 × L10 + 0.1138 × L11 + 0.1117 × L12 − 5287.7).              After implementation of the pre-processing methods on historical lake water level data, three scenarios were defined for the predictor models (SVR and SVR-GWO) to be implemented. Table 3 shows three scenarios defined according to the selected input variables. Table 3. Scenarios defined according to the selected input variables by PCA, RF, and RL pre-processing methods.

Performances of the SVR and SVR-GWO Models
The SVR-GWO, meta-optimized hybrid version of the base SVR with the GWO algorithm, was utilized for the simulation of Titicaca lake water lake in monthly scale. The training and testing of the SVR-GWO model were done with that measured data. The parameter settings of the SVR-GWO model have selected by trial and error method; whereas, the population size set to 20 and the max number of iterations is set to 500. The statistical results of the performance evaluation of the investigated models are also provided in Table 4. As shown in Table 4  Among the pre-processing used methods (i.e., PCA, RF and RL), the RL method has been the most successful method compared to the PCA and RF methods for determining inputs. Based on statistical indices, the third combination of data (SVR-GWO3) was selected as the best input data for prediction of the lake water level. The main reason for that was that this method introduced fewer inputs; furthermore, the performance of SVR-GWO3 was better than the other scenarios, while the RF method in SVR2 model selected more inputs and had weaker results. In general, the new hybrid model (SVR-GWO) had better result with less input than the ordinary SVR models. Also, it can be perceived that the GWO algorithm attained the best fitness values compared to ordinary SVR trainers' optimizers. So, in Table 4, the MAE's index for the training part of SVR model is 0.312 m (for SVR1), 0.082 m (for SVR2), and 0.08 m (for SVR3), respectively. The same magnitude for the training part of the SVR-GWO model is 0.208 m (for SVR-GWO1), 0.06 m (for SVR-GWO2), and 0.052 m (for SVR-GWO3). This suggests that combining the grey wolf optimization algorithm with the support vector regression model can create a new hybrid model with high learning ability, which is a powerful tool (SVR-GWO) for estimating the level of lake water. The optimal parameters of the SVR ranged between γ (Radial basis function parameter) = 0.45 -20.2 and C (Trade-off parameter) = 1.68 -19.14. Also, for running the GWO used 500 as the maximum number of iterations, 50 as the number of agents.
As the comparison between results of Table 4 shows that using data with four lag times in model SVR-GWO2 and with five lag times in SVR-GWO3 model would not further increase the prediction accuracy. In other word, increasing lag times would not improve the accuracy of the model and the results have been converged. Also, this topic shows lag number 5 is not very effective in lake water level for Titicaca Lake. Then, by ignoring some minor errors, this study suggests using less inputs for models. Figure 8 reported the scatter plots of the predictions by the SVR and SVR-GWO models versus the actual values in the training and testing stages. Based on the presented graphical variation between the observed lake water level and predicted values, the SVR-GWO model attained the distinguished correlation with the highest value at all scenarios compared to the ordinary SVR model. This is evidenced by the potential of the SVR-GWO predictive model to capture the variance observations of the water level of Titicaca Lake with a couple of scattered observations. Indeed, using the GWO optimizer improved the performance of hybrid SVR-GWO compared to the original SVR model in all scenarios. Also, according to Figure 7, the PCA pre-processing method has a weak performance in determining the input variables for prediction of the lake water level, while pre-processing methods RF and RL have been successful in determining inputs. As the comparison between results of Table 4 shows that using data with four lag times in model SVR-GWO2 and with five lag times in SVR-GWO3 model would not further increase the prediction accuracy. In other word, increasing lag times would not improve the accuracy of the model and the results have been converged. Also, this topic shows lag number 5 is not very effective in lake water level for Titicaca Lake. Then, by ignoring some minor errors, this study suggests using less inputs for models. Figure 8 reported the scatter plots of the predictions by the SVR and SVR-GWO models versus the actual values in the training and testing stages. Based on the presented graphical variation between the observed lake water level and predicted values, the SVR-GWO model attained the distinguished correlation with the highest value at all scenarios compared to the ordinary SVR model. This is evidenced by the potential of the SVR-GWO predictive model to capture the variance observations of the water level of Titicaca Lake with a couple of scattered observations. Indeed, using the GWO optimizer improved the performance of hybrid SVR-GWO compared to the original SVR model in all scenarios. Also, according to Figure 7, the PCA pre-processing method has a weak performance in determining the input variables for prediction of the lake water level, while preprocessing methods RF and RL have been successful in determining inputs.  Figure 9 shows the box plot of the measured and the predicted values by the SVR and SVR-GWO models for the testing stage. Based on these figure results obtained by the RL and RF with SVR-GWO model, it shows the highest similarities with observation so that the SVR-GWO2 and SVR-GWO3 can accurately simulate the minimum, maximum, 25% range and 75% range intervals. Based on the results of this study, the SVR-GWO model has been performing better and more accurately in predicting lake water level. Finlay, SVR1 is the weakest model and SVR-GWO2 is a strong model for the prediction of lake water level. In Figure 10, the best output of each model has plotted versus measured values for four years (2013, 2014, 2015, and 2016). This figure is drawn up to better understand the ability of each model to predict LWL for the testing period. The hybrid SVR-GWO2 model has better overlap with observational values and its disagreements with observational values are significantly less than the ordinary SVR model. Also, based on the results of this figure, most of the forecast errors in four years were from March to July in both models. Figure 11 shows the prediction error of the SVR and SVR-GWO models for the testing period. According to this figure, the SVR-GWO2 and SVR-GWO3 models predict the LWL with low error, and subsequently they reduce the average error by 27% and 28% compared to the SVR1, SVR2, SVR3, Figure 8. Scatter plot of the predicted lake water level from the SVR and SVR-GWO models versus the corresponding measurements for the testing stage. Figure 9 shows the box plot of the measured and the predicted values by the SVR and SVR-GWO models for the testing stage. Based on these figure results obtained by the RL and RF with SVR-GWO model, it shows the highest similarities with observation so that the SVR-GWO2 and SVR-GWO3 can accurately simulate the minimum, maximum, 25% range and 75% range intervals. Based on the results of this study, the SVR-GWO model has been performing better and more accurately in predicting lake water level. Finlay, SVR1 is the weakest model and SVR-GWO2 is a strong model for the prediction of lake water level.
Water 2020, 12, x FOR PEER REVIEW 12 of 19 Figure 8. Scatter plot of the predicted lake water level from the SVR and SVR-GWO models versus the corresponding measurements for the testing stage. Figure 9 shows the box plot of the measured and the predicted values by the SVR and SVR-GWO models for the testing stage. Based on these figure results obtained by the RL and RF with SVR-GWO model, it shows the highest similarities with observation so that the SVR-GWO2 and SVR-GWO3 can accurately simulate the minimum, maximum, 25% range and 75% range intervals. Based on the results of this study, the SVR-GWO model has been performing better and more accurately in predicting lake water level. Finlay, SVR1 is the weakest model and SVR-GWO2 is a strong model for the prediction of lake water level. In Figure 10, the best output of each model has plotted versus measured values for four years (2013, 2014, 2015, and 2016). This figure is drawn up to better understand the ability of each model to predict LWL for the testing period. The hybrid SVR-GWO2 model has better overlap with observational values and its disagreements with observational values are significantly less than the ordinary SVR model. Also, based on the results of this figure, most of the forecast errors in four years were from March to July in both models. Figure 11 shows the prediction error of the SVR and SVR-GWO models for the testing period. According to this figure, the SVR-GWO2 and SVR-GWO3 models predict the LWL with low error, and subsequently they reduce the average error by 27% and 28% compared to the SVR1, SVR2, SVR3, Figure 9. Box plot of observed and predicted lake water level for the testing period of the SVR and SVR-GWO models in all scenarios.
In Figure 10, the best output of each model has plotted versus measured values for four years (2013, 2014, 2015, and 2016). This figure is drawn up to better understand the ability of each model to predict LWL for the testing period. The hybrid SVR-GWO2 model has better overlap with observational values and its disagreements with observational values are significantly less than the ordinary SVR model. Also, based on the results of this figure, most of the forecast errors in four years were from March to July in both models. and SVR-GWO3 models offer much better performance in terms of prediction of the high and the low values of the Titicaca Lake water level dataset. Also, the combination of different models indicates that using the integrated of the SVR model and GWO algorithm increased the accuracy of predicting the lake water level. In addition, Figure 11 shows that two methods of the Random forest and Relief have a good ability for LWL data pre-processing and these methods can find the best mask of input variables.   Figure 11 shows the prediction error of the SVR and SVR-GWO models for the testing period. According to this figure, the SVR-GWO2 and SVR-GWO3 models predict the LWL with low error, and subsequently they reduce the average error by 27% and 28% compared to the SVR1, SVR2, SVR3, and SVR-GWO1 models for testing stages, respectively. Accordingly, two models of the SVR-GWO2 and SVR-GWO3 models offer much better performance in terms of prediction of the high and the low values of the Titicaca Lake water level dataset. Also, the combination of different models indicates that using the integrated of the SVR model and GWO algorithm increased the accuracy of predicting the lake water level. In addition, Figure 11 shows that two methods of the Random forest and Relief have a good ability for LWL data pre-processing and these methods can find the best mask of input variables.
Water 2020, 12, x FOR PEER REVIEW 14 of 19 Figure 11. Time series of prediction error for the SVR and SVR-GWO models for the testing period.
Histogram plot is another method of evaluation ability of models. Figure 12 shows the histogram diagram for the testing period of all scenarios. Based on the result of this figure, the PCA method is a weak method for identifying effective variables for the perdition lake water level, so that both the SVR1 and SVR-GWO1 models had a poor performance, while RL and RF methods had better performance in identifying effective variables for lake water level prediction, because the SVR2, SVR3, SVR-GWO2 and SVR-GWO3 had better performance. Overall, the pre-processing of the RF method was more successful in identifying input variable to predict Titicaca lake water level compared the RL and PCA methods, and using this method improved the efficiency of the SVR-GWO model. Histogram plot is another method of evaluation ability of models. Figure 12 shows the histogram diagram for the testing period of all scenarios. Based on the result of this figure, the PCA method is a weak method for identifying effective variables for the perdition lake water level, so that both the SVR1 and SVR-GWO1 models had a poor performance, while RL and RF methods had better performance in identifying effective variables for lake water level prediction, because the SVR2, SVR3, SVR-GWO2 and SVR-GWO3 had better performance. Overall, the pre-processing of the RF method was more successful in identifying input variable to predict Titicaca lake water level compared the RL and PCA methods, and using this method improved the efficiency of the SVR-GWO model. There are some similar studies to discuss regarding the current study's results. An SVR model was used to predict the water level of Poyang Lake in China [42], which showed similar results. In Li et al. [42] the R 2 value of SVR predictions were about 0.95 & 0.96 and approved the high capability of this model in lake water level prediction; which is in line with the results of the current research, despite the lakes being located in two completely different areas. To evaluate the positive effect of merging SVR with GWO, there are no studies about lake water level prediction. However, in other There are some similar studies to discuss regarding the current study's results. An SVR model was used to predict the water level of Poyang Lake in China [42], which showed similar results. In Li et al. [42] the R 2 value of SVR predictions were about 0.95 & 0.96 and approved the high capability of this model in lake water level prediction; which is in line with the results of the current research, despite the lakes being located in two completely different areas. To evaluate the positive effect of merging SVR with GWO, there are no studies about lake water level prediction. However, in other areas such as monthly streamflow forecasting, this was a successful and efficient combination; so that in hybrid forms of SVR, the GWO was superior in competition with the algorithms such as Particle Swarm Optimization (PSO), Multi-Verse Optimization (MVO) and Shuffled Complex Evolution (SCE) [43].

Conclusions
In the present study, support vector regression (SVR) integrated with the grey wolf optimizer algorithm (GWO) has been adopted for the forecasting of the lake water level (LWL). The historical water level data of Titicaca Lake for the period of 1973-2017 were used as a case study. Three types of data preprocessing methods (PCA, RF, and RL) were used for finding the best input variables for prediction the water level of Titicaca Lake by the SVR and hybrid SVR-GWO models. Based on the results of three preprocessing methods, six scenarios are considered for developing these models. The results have been evaluated with several statistical score metrics (i.e., RMSE, MAE, and R 2 ) and visual displays (i.e., scatter plot, box plot, etc.). Comparing the results of six scenarios from the implementation of the SVR and SVR-GWO models showed that the performance of the RF pre-processing method was better than PCA and RF methods for finding the best input for predictor models. The results demonstrated that the meta-optimized hybrid model (SVR-GWO) enhanced the capability of the original SVR model for the reproduction of the monthly lake water level. The SVR-GWO model with inputs of L (t − 1), L (t − 2), L (t − 3), L (t − 4) was found to be the most suitable model for prediction LWL. The results of this study suggest that the Grey Wolf Optimizer algorithm is a useful add-on tool for enhancing the accuracy of forecasting SVR model to predict LWL. Also, this research provided evidence for the effectiveness of the hybrid model, which can be utilized and investigated in hydrology for forecasting time series data such as lake water level (LWL). The results show that the models can be used to represent a 1-month (ahead) prediction of the lake water level. This can be applicable for the agricultural, industrial, environmental and urban sectors and systems related to Titicaca Lake and their managers and planners, to inform about the lake's water level status in the coming month.