1. Introduction
Depending on climatic, geographic, geological, social, and economic factors, each location has its own water quality and quantity challenges. Additionally, ongoing global warming and meteorological patterns are likely to disrupt the temporal and spatial balance of water, leading to freshwater scarcity and impeding the achievement of the United Nations Sustainable Development Goals around the world. Modeling studies suggest that there will be a paradigm shift in the distribution of freshwater on the planet by 2050 [
1,
2]. Therefore, a sound water management plan, developed using reliable forecasting models, is essential for implementing sustainable water use and conserving water resources in a given basin or region.
Turkey experiences frequent droughts that significantly reduce surface and groundwater resources, including wetlands and lakes [
3,
4]. Drought conditions affect standing water bodies when there is a reduction in surface runoff and in stream inputs. Droughts typically coincide with hot weather, which causes evaporation to increase significantly during dry periods. The effects of drought include a decrease in water levels in what is usually a very fertile littoral zone. This can leave aquatic fauna (e.g., mussels and snails) and flora stranded in the area. The increased water temperature associated with drought can lead to stratification, increased salinity, and reduced oxygen levels. In some cases, the combination of high temperatures with low oxygen may lead to the extinction of fish species [
5].
Uncontrolled drinking water supplies and inadequately managed water reservoirs pose a significant threat to developing and densely populated cities. Lake Sapanca, for example, is an important source of fresh water supply for the cities of Sakarya and Kocaeli and is also used by several bottled water companies for commercial purposes. The prospects of the reservoir appear to be affected by climate change and recent droughts, which could negatively impact several parts of the region and its ecosystems [
6], as well as water quality associated with cyanobacteria and microcystin [
7]. Because of the multitude of factors that affect the surface area of a lake, one of the most critical hydrologic problems is estimating the water level of a lake before it reaches its threshold. Hydrological models have certain limitations in terms of providing accurate predictive results [
8] due to the complex nature of hydrological and meteorological variables as well as the temporal and spatial properties of an individual catchment. Therefore, it is vital to develop more reliable predictive models that can accurately and reliably estimate the future water level of a lake.
There are two main approaches to lake water level (LWL) prediction in the literature. The most prominent approach models the underlying physical processes, whereas the emerging approach relies on data-driven methodologies that use historical datasets to predict future values. Data-driven methods simulate the LWL, together with the factors affecting it, using computer models. Different types of models have been developed for specific cases. For instance, Chang and Chang evaluated Support Vector Regression (SVR) and an Adaptive Neuro-Fuzzy Inference System (ANFIS) [
9]. Liu et al. presented a multivariate conditional model based on copulas to predict water level and improve spatial precipitation estimation [
10]. Wang et al. applied SVR to simulate the causality between LWL and the quantity of water discharged from the reservoir [
11]. Statistical methods and Artificial Intelligence (AI) techniques are two common data-driven approaches to solving LWL prediction problems [
12]. These methods include multiple regression, pattern recognition, neural network techniques, time series methods, and probability features [
13].
During the last decade, a variety of contemporary techniques have been applied and their predictive performance compared. For example, Ghorbani et al. investigated the ability of Genetic Programming (GP) and artificial neural network (ANN) models to predict LWLs in Australia, and reported accurate predictions with good agreement [
14]. To predict LWLs at Lake Urmia in Iran, Talebizadeh and Moridnejad employed the ANN and ANFIS [
15]. The ANFIS algorithm has better accuracy compared to the ANN model, as shown by the uncertainty analysis. In another study, neural network, neural fuzzy, and GP models were applied to estimate the LWL on a daily basis [
16]. The results showed that each of the three models accurately predicted the LWL. Buyukyildiz et al. developed a series of AI models, Multilayer Perceptron (MLP), hybridized SVR with Particle Swarm Optimization (PSO), a Radial Basis Neural Network, and ANFIS, to predict LWLs [
17]. Their results show that the hybrid model SVR-PSO is a reliable predictive model. Similarly, for three upstream rivers on the east coast of Malaysia, water levels for the next five hours were successfully estimated using an ANN [
18]. To predict the LWL, Yadav and Eliza used a Support Vector Machine (SVM) and Wavelet [
19]. The results of the study showed that the model implemented to predict future values of the reservoir was more accurate compared to regression models. Despite the successful attempts to use machine learning (ML) methods in these studies, there are certain inherent limitations in the algorithms used in the literature [
20]. For instance, ANNs do not expose the rules that would explain their underlying behavior. In fuzzy logic, setting precise membership limits and parameters can be difficult, and the fuzzy reasoning is not always correct. The accuracy of regression models decreases as the number of variables increases, so they work better with fewer variables. Lastly, training a deep learning model requires considerable computing power, which calls for powerful GPUs and a large amount of RAM. Another potential drawback is overfitting, which arises when a model that has been overtrained on the training data performs poorly on new, unseen data.
The most used time series prediction model with statistical analysis that is conducted by scholars for lake level prediction is the Autoregressive Integrated Moving Average (ARIMA) model [
21,
22]. It can be expressed in several ways, including as Moving Average (MA), Autoregressive (AR), or hybrid AR or MA, known as Autoregressive Moving Average (ARMA) or Seasonal Autoregressive Integrated Moving Average (SARIMA) [
23]. The SARIMA model, on the other hand, has the advantage of requiring fewer model features to explain the structure of time series that exhibit nonstationarity in seasons and between seasons [
24]. Unlike ML methods, which often require multiple features as input, this is an important simplification [
22]. The artificial neural network (ANN) algorithm is a widely used ML method for water flow modeling, water quality assessment, and water level prediction in the field of hydrology and water resources [
18,
25,
26,
27]. In addition, some research papers have presented a hybrid ANN-ARIMA model [
28,
29].
Review of the aforementioned studies shows that various models for LWL prediction have different findings and highlight their estimation uncertainty. Some scholars have used time series techniques for predicting various areas such as energy price, stock price, and corporate sales forecasts, which are critical to the global economy [
30,
31], including weather, environment, hydrology, and geological phenomena [
32,
33] in recent years. Nearly all of them concluded that the time series forecasting methods provide more accurate results compared to the benchmark models.
The recurrent neural network (RNN)-based deep learning (DL) approach is proposed in this paper as a state-of-the-art technique for predicting the LWL with improved performance. DL networks, which differ from conventional approaches in that they allow computational models consisting of numerous layers to learn representations of data at multiple levels of abstraction, replicate the functioning of the human brain [
34]. DL has been used for speech recognition and visual object recognition, as well as in domains such as genomics and drug discovery [
35]. The extraordinary success of supervised RNN-based DL algorithms for conducting recognition studies directed the use of RNN-based algorithms in multivariate time series studies. The LWL studies also have time series data due to their nature and attract hydrologists to exploit the power of these DL algorithms in their future time series prediction studies. However, the application of DL models for LWLs is limited and is the focus of this study in order to overcome several drawbacks of the available approaches to predict LWLs, such as the large number of input variables and their uncertainty. The motivation behind this study is to provide an effective prediction technique for water managers to handle drinking water supply availability in lakes before reaching an alarming level. The limited water supply in lakes not only causes frequent drought experiences and water shortages, but also causes a decrease in water quality.
In this work, novel gated RNN-based algorithms are used to build a model that can predict the future LWL to support drought mitigation and reservoir management. In addition, this study aims to help fill the gap in the literature regarding the selection of DL models and the evaluation of the performance of LWL prediction algorithms by using Naïve Benchmarks and the Diebold–Mariano test. As far as the authors are aware, there is no other study that focuses on the comparison between algorithms for multivariate prediction studies with different time lags.
3. Results
This study uses the ANN and four different RNN-based deep learning algorithms to compare their forecasting accuracy from day 1 to day 120 ahead, based on RMSE and MAPE values, the Naïve Method, and Diebold–Mariano test results. The ANN, LSTM, GRU, Stacked LSTM, and Bidirectional LSTM algorithms were successfully trained and validated, and then evaluated on test data consisting of 3004 records to assess each model’s reliability on unseen data.
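As an illustration of the two error metrics named above, the following minimal sketch computes RMSE and MAPE with the Python standard library. The function names and the sample values are hypothetical and are not taken from the study; the formulas are the standard textbook definitions, which are assumed here to match those used by the authors.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the data (metres for LWL)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical lake levels in metres (illustrative values only).
observed = [30.0, 30.2, 30.1, 29.9]
predicted = [30.1, 30.1, 30.1, 30.0]

print(f"RMSE = {rmse(observed, predicted):.4f} m")
print(f"MAPE = {mape(observed, predicted):.4f} %")
```

Because RMSE is expressed in metres while MAPE is a percentage of the observed level, the two metrics can rank models differently, which is one reason to report both.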
Table 3 presents the performance of LWL prediction of the investigated ANN and RNN algorithms from day 1 to day 120 ahead forecasting. These results show that all investigated ANN and RNN algorithms showed excellent prediction accuracy in the 1 day to 10 day ahead prediction scenario with RMSE values of <0.1 m. On the other hand, in the 60 day ahead scenarios, the LSTM algorithm had the best performance value for training and testing with an RMSE = 0.1762 m, while in the 120 day ahead scenarios, GRU showed the best performance with an RMSE score of 0.3838 m (
Table 3). In contrast, the Stacked LSTM and Bidirectional LSTM models did not show additional performance in terms of prediction accuracy over LSTM. In summary, the LSTM model is significantly efficient considering its high accuracy among other advanced models, specifically for long-term predictions such as a 60 days ahead forecast, due to the architectural benefits of the process of parameter tuning and its migration to different tasks. GRUs are easier to train and faster to run than LSTMs, but they may not be as effective at storing and accessing long-term dependencies. Since it is necessary to know the next timestamp in advance for a Bidirectional LSTM, it is more appropriate for offline applications [
55]. On the other hand, the performance difference between Stacked LSTM and LSTM comes from additional dimensions for next value prediction other than the time dimension.
Additionally, the RMSE values describing the prediction error rates of time series algorithms were compared using the Naïve Method, and the algorithms that performed better than the Naïve Method were identified as successful algorithms for predicting future LWL values. The Naïve Benchmark comparison results of algorithms are presented in
Table 4 for the day 1 to day 120 forecasts. A higher value for an algorithm in a given prediction period indicates better performance and greater predictive power. Based on the Naïve Method benchmark, the algorithm performances increased up to the 60 days ahead predictions, then decreased for the 120 days ahead predictions. On average, GRU had the highest performance across the investigated periods, whereas Stacked LSTM had the lowest average performance, followed by the Bidirectional LSTM algorithm.
The variabilities between Naïve Benchmark comparison scores are much more apparent than for RMSE values (
Table 4). Performance drops as low as −24.26%, which indicates that it would be disadvantageous to use the RNN-based algorithm concerned for that specific period. The RMSE values of some algorithms are also close to those of the Naïve Method, especially for the 5 and 10 day predictions. Therefore, the algorithms were tested further to determine whether they are needed at all for predicting future LWL values. The results show an increase in performance of at least 18.45% (Stacked LSTM) when the prediction horizon is set to 20 days or more. Based on the Naïve Method comparison, LSTM showed the highest performance with a 78.55% improvement over the Naïve Method at 60 day ahead forecasting. It is also worth noting that ANN is the only algorithm that performed better than the Naïve Method in the 1 day prediction period.
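The benchmark comparison can be sketched as follows. This is a minimal illustration under two assumptions that this section does not state explicitly: that the Naïve Method is a persistence forecast (the level predicted for day t + h is the level observed on day t), and that the comparison score is the relative RMSE reduction in percent. All names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def naive_forecast(series, horizon):
    """Persistence forecast (assumed definition of the Naive Method).

    Returns (targets, predictions), where predictions[i] is the value
    observed `horizon` steps before targets[i].
    """
    if horizon < 1 or horizon >= len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[horizon:], series[:-horizon]


def improvement_over_naive(rmse_model, rmse_naive):
    """Relative RMSE reduction in percent; negative means worse than naive."""
    return 100.0 * (rmse_naive - rmse_model) / rmse_naive


# Hypothetical daily lake levels in metres and a 2-day horizon.
levels = [30.0, 30.1, 30.3, 30.2, 30.0, 29.9]
targets, naive_pred = naive_forecast(levels, horizon=2)

# Hypothetical model output for the same target days.
model_pred = [30.2, 30.2, 30.1, 30.0]

improvement = improvement_over_naive(rmse(targets, model_pred),
                                     rmse(targets, naive_pred))
print(f"Improvement over naive forecast: {improvement:.2f} %")
```

Under this definition, a negative score such as the −24.26% reported above means that the model's RMSE is larger than that of the persistence forecast.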
The performances of ANN and RNN-based algorithms were also tested using MAPE as an evaluation metric. A similar pattern was observed in the MAPE results when considering the results in the RMSE values because as the time horizon extends to further time periods, the model performance decreases. This pattern indicates there needs to be additional evaluation criteria for model performance results between different time periods. For this reason, the results in
Table 5 were calculated further by taking differences between the Naïve Method results and the algorithms’ results (
Table 6).
Table 6 reveals the performance differences that indicate the performance improvement in models in terms of the Naïve Method. As can be seen in the table, none of the models perform better when compared with the Naïve Method in the 1 day and 5 day prediction periods. However, as the time period increases, the performance improvement also increases. The best performance is observed for the GRU algorithm in the 120 day prediction period with a 0.85 points performance increase in the MAPE. The MAPE results are occasionally compatible with the RMSE results, but the MAPE results indicate that in order to achieve the advantage of RNN-based algorithms, the models should focus on at least a 30 day ahead prediction. In addition, the ANN algorithm is advantageous when used in the 30 day and 60 day prediction period according to the results.
The performance of ANN, LSTM, GRU, Stacked LSTM, and Bidirectional LSTM for LWL, and their observed and estimated values compared with the Naïve Method for the day 1 to day 120 ahead scenarios are presented in
Figure A1,
Figure A2,
Figure A3 and
Figure A4. It can be seen from
Figure A1 that the observed and simulated lines are generally distributed closely for each investigated model, showing that all ANN and RNN algorithms have high simulation performance at day 1. However, as the forecasting time extends from day 1 to day 120, the observed, estimated, and Naïve Method lines diverge for each of the algorithms.
Figure A1 shows the 1 day and 5 days prediction results of the ANN and gated RNN algorithms and the comparison with the observed and Naïve Method values. The prediction results of all the studied algorithms are quite similar to each other and to the Naïve Method for the 1 day and 5 days prediction (
Figure A1), indicating good training, validation, and prediction.
Figure A2 shows the prediction results of the ANN and gated RNN algorithms for 10 and 20 days ahead and the comparison with the observed values and the Naïve Method. Compared to the Naïve Method, all tested algorithms had a similar prediction trend for 10 days ahead, but all algorithms outperformed the Naïve Method in their predictions for 20 days ahead. When forecasting 10 and 20 days ahead, the GRU achieved the best results (
Figure A2), showing a lower RMSE (
Table 3) and a higher performance improvement compared with the Naïve Method (
Table 4).
When comparing the performance results of the algorithms for day 30, Stacked LSTM and Bidirectional LSTM produced a similar prediction performance to LSTM and GRU, whereas for day 45 prediction, the GRU, Stacked LSTM, and LSTM algorithms produced a similar performance to Bidirectional LSTM (
Figure A3,
Table 4).
Figure A4 shows the 60 and 120 day forecast results of the ANN and gated RNN algorithms and the comparison with the observed values and the Naïve Method. The 60th day was the culminating point for the prediction performance of the tested algorithms, and LSTM performed better for LWL at the 60 day prediction based on the RMSE and Naïve Method values. Although all tested algorithms performed well in 60 day prediction (
Table 4), LSTM provided the closest prediction values to the observed values of LWL 60 days in advance compared with the other methods, as shown in
Figure A4. For the 120 day ahead predictions, there was a significant decrease in values for the studied algorithms compared to the Naïve Method, with the exception of GRU. Although the prediction performance was low, the GRU algorithm provided a statistically similar prediction performance for day 60 and day 120. These results show that the GRU algorithm may still be superior to the other algorithms in terms of prediction accuracy with higher Naïve values. However, the degree of agreement between the predicted value and the actual value is not very good and exceeds the actual value.
In summary,
Figure A1,
Figure A2,
Figure A3 and
Figure A4 show that the tested algorithms predicted LWL at a statistically acceptable level for up to 120 days. Among the proposed algorithms, the LSTM algorithm was clearly superior in tracking the nonlinear behavior of Lake Sapanca over a 60-day period with the smallest RMSE (0.1762 m) and a higher performance ratio compared to the Naïve Method result (78.55%). Thus, when a model is needed for long-term forecasting LWL, the LSTM-based DL algorithm can help to automate and manage LWL to implement more effective water management strategies. It is optimal for 60-day forecasts of LWL.
The Diebold–Mariano test values to determine the statistical significance of two separate prediction results are summarized in
Table 7. It can be noted that the RNN algorithms did not show significant superiority for the 1-day, 5-days, and 10-days LWL forecasting over the Naïve Method. However, the Naïve Method and the GRU algorithm for 5 days gave a
p-value of 0.031, indicating the GRU algorithm’s superior result is significant compared with the Naïve Method to predict the next 5 days. The same is true for predicting the next 10 days using the LSTM and GRU algorithms. However, when the prediction significances of LSTM and GRU are tested, the
p-value is lower than 0.05, indicating that GRU should be preferred for predicting the next 10 days.
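For readers unfamiliar with the Diebold–Mariano test, the sketch below shows one common form of it. It is an illustration only: the squared-error loss, the two-sided normal approximation, the rectangular-kernel long-run variance, and the absence of a small-sample correction are all assumptions, since this section does not specify which variant the authors used. The error values are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b, horizon=1):
    """Diebold-Mariano test with squared-error loss (assumed variant).

    Returns (statistic, two_sided_p_value). A positive statistic means
    forecast A has larger squared errors, i.e. forecast B is more accurate.
    """
    n = len(errors_a)
    if n != len(errors_b) or n <= horizon:
        raise ValueError("need equally long error series longer than horizon")

    # Loss differential between the two forecasts at each time step.
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    mean_d = sum(d) / n

    def autocov(lag):
        return sum((d[t] - mean_d) * (d[t - lag] - mean_d)
                   for t in range(lag, n)) / n

    # Long-run variance using autocovariances up to lag horizon - 1.
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance estimate")

    stat = mean_d / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided, N(0, 1)
    return stat, p_value


# Hypothetical forecast errors in metres (illustrative values only).
naive_errors = [0.30, -0.25, 0.35, -0.28, 0.32, -0.31, 0.27, -0.33, 0.29, -0.26]
model_errors = [0.05, -0.10, 0.08, -0.04, 0.06, -0.09, 0.03, -0.07, 0.05, -0.06]

dm_stat, dm_p = diebold_mariano(naive_errors, model_errors)
print(f"DM statistic = {dm_stat:.2f}, p-value = {dm_p:.3g}")
```

A p-value below 0.05 is read, as in the text, as evidence that the two forecasts differ in accuracy; a p-value above 0.05 means the difference in RMSE cannot be distinguished from chance.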
From
Table 4, it can be seen that, for day 20 predictions, the best performance improvement comes from the GRU algorithm. Accordingly, the
p-values are significant (
p < 0.05) based on the Diebold–Mariano test (
Table 7), which confirms the superiority of GRU. Regarding the Naïve Method comparison (
Table 4) and the Diebold–Mariano (
Table 7) test results, only the GRU algorithm should be preferred to predict the LWL for the next 20 days.
Similarly, GRU, LSTM, and Stacked LSTM gave a p-value of less than 0.05 in the Diebold–Mariano tests compared to the Naïve Method for predicting the next 30 days LWL. On the other hand, the predictive performance of Bidirectional LSTM was not significant compared to the Naïve Method as the p-value is greater than 0.05.
According to the Naïve Method comparison, GRU performed better than the other algorithms in the 45-day forecast (
Table 4).
Table 7 further confirms that the predictions of GRU algorithms have a significant
p-value compared to the Naïve Method. Moreover, the
p-values are greater than 0.05 when GRU is compared with Stacked LSTM and Bidirectional LSTM, indicating that the GRU algorithm can be used interchangeably with the Stacked LSTM and Bidirectional LSTM algorithms.
The results of the Diebold–Mariano test show that the accuracy of the prediction results and the stability of the performance of the LSTM algorithm are significantly better, with a
p-value of less than 0.05 (
Table 7). Considering the results of the RMSE, Naïve Method, and Diebold–Mariano test, only the LSTM algorithm should be preferred for predicting the next 60 days to obtain a more reliable and accurate prediction of the future dynamics of LWL.
It is clear that the implemented ANN and RNN algorithms provide a relatively accurate prediction pattern when the prediction values are compared with the observed data for the 120 day prediction (
Table 7), even though the magnitude of the Naïve Method benchmark result is reduced compared to the 60 day prediction. In addition to the benchmark, the Naïve Method, the GRU algorithm has the significant best performance for the 120 day forecast considering the Diebold–Mariano test results compared to the other algorithms (
Table 7), indicating that the GRU algorithm is more efficient at forecasting the next 120 days of LWL.
From the results obtained for LWL prediction from day 1 to day 120, we can see that: (1) Day 60 predictions provide the best LWL prediction performance, based on the high Naïve Benchmark comparison values. (2) The best-performing algorithm can change depending on the selected prediction period. (3) The LSTM algorithm can predict the LWL 60 days in advance with higher accuracy, which gives water managers time to take action. In addition, it is worth noting that the Bidirectional LSTM and Stacked LSTM algorithms contribute little or no performance increase for short prediction periods of less than 20 days.
Among the features, the most important one to affect the output was determined as withdrawal using the Mutual Information technique. The importance levels can be ordered as withdrawal, average temperature, minimum temperature, maximum temperature, and precipitation (
Figure 5).
Accurate LWL prediction is a necessity, not only to prevent possible drought conditions but also to anticipate possible water quality effects. Therefore, this study conducted additional work to examine the relationship between the LWL and the microcystin concentrations previously observed during low LWL periods. In addition to the LWL effect, this study investigated the relationship with the maximum temperature, mean temperature, minimum temperature, precipitation, light intensity, and evaporation. This experiment was conducted to reveal the importance of these parameters, so that the LWL can be predicted and measures taken in advance.
To begin with, the microcystin concentrations at the surface, 1 m, 5 m, 10 m, 15 m, and 20 m were measured over the period of 2019–2023 to understand how the changing meteorological conditions and algal growth affected water quality. The microcystin concentration at all sampled depths showed approximately the same increasing pattern over time, except for the samples collected from the depth of 15 m (
Figure 6). The variations in
Figure 6 indicate there is an increasing trend of microcystin for the surface water, 1 m, 5 m, 10 m, and 20 m depths. However, the trend is decreasing for the 15 m depth. The microcystin level was almost similar for each depth of the first 10 m; however, significant differences were recorded in the spring and autumn, specifically, vertically mixing periods. During the summer, the microcystin concentration stayed relatively low (<0.5 µg/L) or at an undetectable level from May to October. The highest concentrations were observed during the winter period from November to April with a significant fluctuation, which coincided with the mixing period. By contrast, the microcystin concentrations were higher at the sampling depths of 15 and 20 m. The microcystin was recorded at all sampling times during the experimental period. In general, the concentrations were below the 2 µg/L for both sampling depths; however, the highest concentrations of around 8 µg/L were recorded during the summer stratification phase (June to August). For the two years 2020 and 2021, the microcystin concentration was the lowest (with <3.31 µg/L), especially for 2021 (<1.61 µg/L).
The nonparametric Mann–Kendall test shows that the microcystin concentration decreases monotonically at a depth of 15 m and increases at the other depths. However, only the microcystin concentration at 20 m depth was significant at the 95% confidence limit with a z-value of 2.08 (
Figure 6), indicating an increasing positive trend in the microcystin data time series that dominates at this depth.
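The Mann–Kendall statistic referred to above can be computed as in the following minimal sketch, which uses the standard S statistic, the tie-corrected variance, and the continuity-corrected z-value. The two-sided normal p-value and the sample series are assumptions for illustration and are not data from the study.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Mann-Kendall trend test. Returns (S, z, two_sided_p_value)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are required")

    # S counts later-minus-earlier pairs that increase, minus those that decrease.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S with the correction for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5)
                   for t in Counter(values).values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if var_s == 0:  # every value identical: no trend can be detected
        return 0, 0.0, 1.0

    # Continuity correction shifts S one unit towards zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value


# Hypothetical microcystin concentrations in µg/L (illustrative values only).
concentrations = [0.4, 0.6, 0.5, 0.9, 1.1, 1.0, 1.4, 1.6]
s_stat, z_value, p_value = mann_kendall(concentrations)
print(f"S = {s_stat}, z = {z_value:.2f}, p = {p_value:.4f}")
```

With a two-sided test at the 95% level, a trend is considered significant when |z| exceeds 1.96, which is consistent with the z-value of 2.08 reported for the 20 m depth.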
Due to temporal and spatial variability, it is difficult to obtain sufficient input data for data-driven predictive models to analyze and learn the relationships between microcystin concentration and the meteorological parameters associated with algal proliferation, i.e., temperature and precipitation. To better understand the effect of changing meteorological parameters on microcystin concentration, Spearman correlations were evaluated using monthly microcystin data collected from raw water before water treatment. From
Figure 7, the significant positive contribution of temperature on microcystin concentration is evident. Light intensity also has a positive effect on microcystin concentration. On the other hand, the water level of the lake had no significant effect on the microcystin concentration.
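As a brief illustration of the Spearman correlation used here, the sketch below ranks both variables (assigning average ranks to ties) and then computes the Pearson correlation of the ranks. The helper names and the monthly values are hypothetical and are not taken from the study's dataset.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical monthly values (illustrative only).
mean_temperature = [5, 9, 14, 20, 25, 27]          # degrees Celsius
microcystin = [0.2, 0.3, 0.6, 0.5, 1.1, 1.4]       # µg/L

rho = spearman(mean_temperature, microcystin)
print(f"Spearman rho = {rho:.3f}")
```

Because it works on ranks, the coefficient captures monotonic rather than strictly linear association, which suits skewed concentration data such as microcystin measurements.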
The degree of association differs in terms of the features in
Figure 7. The minimum temperature, maximum temperature, mean temperature, and evaporation have a moderate correlation with microcystin [
56]. In addition, light intensity has a weak correlation. On the other hand, the LWL and precipitation have a very weak correlation with microcystin. The results provide a better understanding that the water quality is rarely affected by the level of water. However, the temperature, which is one of the indicators for predicting LWL, affects the water quality. Thus, it can be concluded that LWL does not directly affect the water quality, but the effect is indirect through the consideration of temperature values.
4. Discussion
Based on the experimental result of this case study that applies ANN and RNN-based deep learning algorithms for lake water level prediction, it is possible to forecast the next 120 days with a smaller RMSE (0.3838 m), reasonable Naïve Benchmark comparison value (58.00%), and significant Diebold–Mariano test results (
p < 0.05). However, compared with other models, the prediction result based on LSTM proposed in this study is optimal for the next 60 days LWL forecasting with a smaller RMSE (0.1762 m), the highest Naïve Benchmark comparison value (78.55%), and a significant Diebold–Mariano test
p-value (<0.003). The goal of this study is to compare the impact of various climates and comprehend how new AI techniques behave and perform on various event forecasting tasks. The prediction performance of the investigated ANN and RNN algorithms aligns with previous research based on the RMSE and the Naïve Method. Using ANN and SVM, Yoon et al. predicted the groundwater levels in the nearshore aquifer in Donghae City, Korea, for two wells with RMSE values of 0.13 m and 0.136 m, respectively [
57]. The objective of their research was to create and evaluate data-driven time series forecasting models for the short-term fluctuations in groundwater levels in a coastal aquifer caused by tidal influence and precipitation recharge. However, their study lacks a comparison of the proposed algorithms with baseline models and with other DL algorithms. Therefore, the performance of their models for predicting water levels cannot be put into context. The algorithms are also not evaluated against basic benchmark methods such as the Naïve Method, which raises the question of whether it is necessary to create complex DL algorithms for LWL prediction. Thus, this study could serve as a reference point for further water level studies that would otherwise attempt to develop every DL algorithm available in the field of data science.
Hrnjica and Bonacci found that the LSTM and RNN algorithms performed better than the traditional ANN algorithms on datasets with a given number of features and a time scale of one month [
58]. They also found that the feed-forward neural network and LSTM models performed better than the traditional time series forecasting models based on ARIMA and other similar techniques. The objective of their study was motivated by the realization that traditional regression and statistical techniques were insufficiently effective at predicting stochastic events such as water level. In contrast to traditional models, Lee et al. showed that the LSTM model better reproduces the variability and correlation structure of the broader time scale as well as the important statistics of the original time domain [
59]. Applying the LSTM into stochastic simulation and determining if the long-term trends of known hydroclimatological indicators can be replicated was the main objective of their work. The improved representation of long-term variability is critical for water managers as they rely on these data to plan and manage future water resources. In the future, the performance improvement over the Naïve Benchmark can be tested with other novel models, such as attention-based algorithms or other derivatives. However, the recent attempt to use an attention-based algorithm showed that it did not perform better than a recurrent network [
60].
The main hypothesis of the present study is confirmed by the fact that RNN-based algorithms achieve better predictive performance of LWL when using long-term daily data from a decade and improve predictive accuracy for 60-day forecasts (
Table 4). The trends of observations and model predictions in
Figure A1 through
Figure A4 suggest that the potential performance of RNN algorithms can also be extended beyond 120-day forecasts by incorporating more data into the models. The LSTM network has demonstrated its ability to learn from past sequential data and has been shown to be a useful model. It can effectively learn from sequences of varied durations, capturing long-term dependencies [
35]. Consistent with the results of this paper, Zhu et al. studied 69 lakes in Poland for 30 day ahead water level prediction and concluded that the recurrent DL models performed similarly to attention-based recurrent DL models in terms of predictive performance [
60]. The results of the LSTM algorithm between its variants, namely the Stacked LSTM and the Bidirectional LSTM, in the present study show that there is no significant difference in predicting less than 30 days ahead. The LSTM algorithm requires long observation datasets and the selection and optimization of hyperparameters, learning rate, and number of epochs to achieve correct prediction results [
20]. For example, Morovati et al. reported a better prediction performance of LSTM when using daily recorded data over 20 years [
61]. The results obtained for LSTM in this study are consistent with these findings. The findings also show that the LSTM predictions track the fluctuation trend of the observed LWL values well. This is due to the gated structure of the LSTM model, which makes the LSTM algorithm good at extracting short-term temporal correlations. However, due to the cyclic periods of water level variations, the performance gain drops when the forecast reaches the next LWL cycle after 60 days. The better performance of RNN algorithms compared to the Naïve Method is also due to the successful optimization of hyperparameters in the RNN networks.
Another important aspect is that, although the prevailing opinion suggests testing all available DL algorithms to find the one that performs best according to RMSE or MAPE, the results of the algorithms do not seem to differ significantly according to the Diebold–Mariano test. Therefore, in order to recommend a better performing algorithm, a statistically significant difference must be shown in addition to the RMSE or MAPE results [
53]. In some cases, the ANN and gated RNN derivatives, as indicated in the Results section, do not differ with statistical significance and can be used interchangeably.
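A minimal sketch of the Diebold–Mariano comparison is given here for illustration. It assumes a squared-error loss, a rectangular-kernel variance estimate with horizon − 1 autocovariance lags, and a normal approximation for the two-sided p-value; the exact variant used in the study (for example, a small-sample correction) is not stated in this section, and the function name and toy values are hypothetical.

```python
import math

def diebold_mariano(actual, pred_a, pred_b, horizon=1):
    """Two-sided DM test on squared-error loss; returns (statistic, p_value)."""
    # Loss differential: negative values favour forecast A.
    d = [(y - a) ** 2 - (y - b) ** 2 for y, a, b in zip(actual, pred_a, pred_b)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance of the loss differential (lags 0 .. horizon - 1).
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate; the test is undefined here")
    stat = d_bar / math.sqrt(long_run_var / n)
    # Normal approximation: p = 2 * (1 - Phi(|stat|)) = erfc(|stat| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value

# Toy example: two forecasts with mirrored errors and equal average loss,
# so the statistic is 0 and the p-value is 1.
observed = [0.0, 0.0, 0.0, 0.0]
forecast_a = [1.0, 0.0, 1.0, 0.0]
forecast_b = [0.0, 1.0, 0.0, 1.0]
stat, p_value = diebold_mariano(observed, forecast_a, forecast_b)
```

A large p-value means that the difference in RMSE between the two forecasts cannot be distinguished from sampling noise, which is the situation in which two models may be treated as interchangeable.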
The fluctuations in LWL are associated with meteorological processes and anthropogenic activities, which lead to a nonlinear and complex system. In this context, the study has several limitations. First, the results depend on geographical location. The experiment was conducted at Lake Sapanca in the northeastern Marmara region of Turkey, a location with characteristics of both Black Sea and Mediterranean climates, so the results may differ in regions with other climate characteristics. Second, the dataset produced by the Turkish Meteorological Service contains several missing values for the selected parameters. Although missing data can be interpolated, models trained on interpolated rows may produce biased LWL values, and the results could change with a longer dataset of complete records. Third, the amount of data is insufficient for some features. In practice, it may not be possible to collect every feature in the field, so a feature that could improve model prediction performance may never be presented to the algorithm. With very few available features, the system may be underrepresented and prediction performance may suffer; with too many features, the model may overfit. Therefore, a balance between overrepresentation and underrepresentation must be struck. It is further suggested to apply other appropriate preprocessing methods to improve the predictive performance of the RNN DL models over different time horizons. In the future, LWL prediction could also be carried out using GIS methods with a satellite dataset.
The performance difference between time series prediction and prediction with image data could be compared against the Naïve Method benchmark. In addition, when more features are available, researchers can conduct sensitivity and uncertainty analyses to eliminate some of the features and prevent possible overfitting issues.
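The missing-data limitation discussed above can be made concrete with a small sketch. It assumes simple linear interpolation of interior gaps and keeps a flag for every originally missing entry, so that interpolated rows can later be excluded or down-weighted in a sensitivity check. The gap-filling method actually used in the study is not described here, and the function name and values are hypothetical.

```python
def interpolate_gaps(values):
    """Linearly fill interior None gaps; return (filled, was_missing flags).

    Leading and trailing gaps are left as None rather than extrapolated.
    """
    filled = list(values)
    was_missing = [v is None for v in values]
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled, was_missing

# Illustrative daily readings with gaps (made-up values).
raw = [12.0, None, None, 15.0, None]
filled, was_missing = interpolate_gaps(raw)
```

Keeping the flags alongside the filled series makes it possible to retrain a model with and without the interpolated rows and to check whether the LWL predictions shift.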
Several well-known nutrient inputs and relatively less known meteorological parameters, together with hydrological disturbances, cause excessive growth of cyanobacteria in freshwater ecosystems, which degrade water quality with their toxins. Extreme heat waves are becoming more common as global and regional warming continues and are expected to become the norm in future scenarios. Microcystin concentration correlated positively with the temperature variables (max, min, and mean,
p < 0.01) and with evaporation and light intensity (
p < 0.05), whereas its negative correlation with precipitation, which is directly related to LWL, was not significant (
Figure 7). Significant correlations between meteorological parameters and microcystin concentrations in freshwater bodies have been reported previously [
7]. Light intensity in the metalimnion zone leads to greater development of cyanobacteria and the presence of large amounts of microcystins, posing potential problems for the use of water resources [
62]. Since freshwater lakes are used as drinking water sources, proper water and algae management is necessary to ensure a clean and safe water supply. The use of tap water is restricted when large amounts of algae are found in water reservoirs because various water treatment problems can occur, such as clogged treatment systems, bad odor, discolored water, and the presence of regulated toxic substances such as microcystin. Predicting algal blooms in advance from easily measured meteorological or hydrological parameters and responding rapidly to algal growth can minimize damage and ensure uninterrupted production of purified water.
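As an illustration of how such a correlation could be screened, the following sketch computes a Pearson coefficient and a permutation-based two-sided p-value. It is not the procedure used in the study, whose correlation method is not stated in this section; the function names are hypothetical and the temperature and microcystin values are made up for demonstration only.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Made-up paired samples: mean air temperature (deg C) and microcystin (ug/L).
temperature = [18.0, 20.5, 22.0, 24.5, 26.0, 27.5, 29.0, 30.5]
microcystin = [0.2, 0.3, 0.5, 0.4, 0.8, 0.9, 1.1, 1.0]
r = pearson_r(temperature, microcystin)
p = permutation_p_value(temperature, microcystin)
```

A permutation test is used in this sketch because it avoids distributional assumptions with small samples; a parametric test would be an equally reasonable choice when those assumptions hold.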