Deep Learning in Water Resources Management: The Case Study of Kastoria Lake in Greece

Abstract: The effects of climate change on water resources management have drawn worldwide attention. Water quality predictions that are both reliable and precise are critical for effective water resources management. Because the nonlinear biological and chemical processes occurring in a lake make prediction complex, advanced techniques are needed to develop reliable models and effective management systems. Artificial intelligence (AI) is one of the most recent methods for modeling complex structures. The applications of machine learning (ML), as a part of AI, in hydrology and water resources management have been increasing in recent years. In this paper, the ability of deep neural networks (DNNs) to predict the quality parameter of dissolved oxygen (DO) in Lake Kastoria, Greece, is tested. The available dataset, from 11 November 2015 to 15 March 2018, on an hourly basis, from four telemetric stations located in the study area, consists of (1) Chl-a (µg/L), (2) pH, (3) temperature, Tw (°C), (4) conductivity (µS/cm), (5) turbidity (NTU), (6) ammonia (NH4, mg/L), (7) nitrate nitrogen (N-NO3, mg/L), and (8) dissolved oxygen, DO (mg/L). Feed-forward deep neural networks (FF-DNNs) of DO, with different structures, are tested for all stations. All the well-trained DNNs give satisfactory results. The optimal selected FF-DNNs of DO for each station, with high efficiency (NSE > 0.89 for the optimal selected structure per station), constitute a good choice for modeling dissolved oxygen. Moreover, they provide information in real time and comprise a powerful decision support system (DSS) for preventing accidental and emergency conditions that may arise from both natural and anthropogenic hazards.


Introduction
The physical, chemical, and biological responses of lakes to the climate provide a wealth of invaluable information [1]. Lakes are affected directly by changes in climate: (a) through changes in the mixing regime, including lake stratification, oxygen saturation driven by increases in temperature, and the frequency of extreme wind events; (b) through changes in trophic structure determined by temperature; and (c) through complex interactions between temperature, nutrients, and physical forces [2]. In recent years, waterbodies have undergone extensive change as a result of widespread qualitative and quantitative degradation.
Dissolved oxygen is a very important water quality parameter, and its variation can be wide-ranging over a period of 24 hours [3]. High concentrations of DO are mainly observed: (a) in shallow eutrophic lake systems; (b) in late spring-early summer; (c) in the morning and at noon, due to the photosynthetic productivity of algae and/or cyanobacteria, which is also associated with correspondingly high concentrations of Chl-a; and (d) in association with low values of water temperature, which favor high values of DO saturation, except in cases where the lake has an ice cover, which favors DO consumption and prevents replenishment.
A typical example is Lake Kastoria, where measured values of DO higher than those of oxygen saturation (DOs) have been recorded during the spring, mainly in May. In related work, a feed-forward back-propagation artificial neural network model, including one hidden layer, was built by Lu et al. [26] to predict the total phosphorus (TP) concentration in Lake Champlain.
Zhang et al. [11] applied a multilayer neural network to approximate complex regression functions and predict trends in dissolved oxygen. The aforementioned model gave more accurate results in predicting the trend of DO than a typical ANN model, support vector regression (SVR), and a linear regression model (LRM). Moreover, Kuo et al. [20] provided useful information for effective water quality management by predicting water quality concentrations and eutrophication problems in the Te-Chi Reservoir in Taiwan through a DO back-propagation neural network model. The correlation coefficient between predicted values and measured data was well above 0.7.
In this paper, the ability of deep neural networks to predict the quality parameter of dissolved oxygen, in Lake Kastoria in Greece, is tested. Increases in water temperature lead to reduced oxygen solubility, thus reducing DO concentrations that will have an impact on the duration and intensity of algal blooms [27]. Moreover, the variation of parameters such as DO can be wide-ranging over a period of 24 hours. That is why it is so crucial to have continuous and uninterrupted measurements of the DO concentrations of a lake. The objectives of this paper are (a) to provide a useful supportive tool for water quality management of lakes and (b) to provide real-time prediction of DO in accidental and emergency situations.
The importance of this work lies in the fact that there is no other published work using the same platforms, tools, and methodology of deep learning, as in the present paper, to investigate the predictive capacity of water quality parameters of lakes in real time. Moreover, the fact that Lake Kastoria is monitored by our telemetric stations, offering continuous and uninterrupted data sets (a few hundred thousand records), supports the use of deep neural networks, which need large amounts of data to complete their learning process, and enhances the originality of the present study.

Study Area
The study area (Figure 1) comprises the catchment area of Lake Kastoria, which is located in the region of Western Macedonia in Greece and is protected by directives, regulations, and international conventions. Lake Kastoria is a shallow polymictic lake with intense agricultural activities in its catchment area, which contribute both point and nonpoint source pollutants. To record water quality characteristics (on the lake's surface) on an hourly basis, four telemetric stations were installed at specific locations in Lake Kastoria, namely, Gkiole, Stavros, Psaradika, and Toichio.
The aforementioned telemetric stations assist the main monitoring station of Lake Kastoria (according to Directive 2000/60, Law 3199/2003 and JM 140384/2011), both spatially (more stations in different locations) and chronologically (continuous monitoring). After all, without the present monitoring system of Lake Kastoria, it would be impossible to apply deep neural network techniques.
The criteria for the choice of the present positions of the monitoring stations were: (a) the environmental pressures arising from the land uses of the catchment area; (b) the hydromorphological characteristics of the lake; (c) the presence of erosion, transport, and deposition phenomena; (d) the inflows and outflows; and (e) issues related to the accessibility and the costs of the stations' installation and maintenance.

Programming Language
The Python programming language is used for the purpose of the study. Its main feature is its flexibility, as the same piece of code can be used with little or no change over a wide range of devices with different architectural and computing capabilities [28]. Another advantage is the use and interconnection of many libraries, which makes it an ideal programming language for developing ML models that use many different libraries and platforms. One of its applications is to develop DNNs.

Tools and Platforms
To develop DNNs based on machine learning methods, the Python programming language, the Spyder scientific environment, the Tensorflow open-source machine learning platform, and Nvidia's Compute Unified Device Architecture (CUDA) parallel platform (graphics card) are used. All of the above have been integrated into Anaconda, an open-source distribution of Python. The CUDA platform is a parallel computing platform developed by Nvidia that enables users to exploit the computing power of graphics card kernels for training and testing machine learning models, so that many models can be created and verified very quickly. The main reason for using Anaconda and CUDA is that the model can be created in a variety of virtual environments, with the possibility of using parallel programming through the computing power of the graphics card. This procedure gives the user the advantage of using large databases, such as in the present study, and complex learning algorithms, without having to install and reinstall various versions of libraries, offering training, verification, and testing of the model in a very short time [29].

Graphical User Interface
The Spyder scientific environment, written in Python, is used, as it offers a unique combination of advanced editing, analysis, debugging, and profiling functionality with the interactive execution, data exploration, and visualization capabilities of a scientific package. Spyder is also an open-source code-editing environment, but only for the Python programming language. The Spyder environment enables the user to execute code line by line while displaying the results of each execution step. This allows the user to inspect each step of the code and to print individual diagrams or run sections of code without having to re-execute the entire script [30].

Libraries
Various libraries (Pathlib, Matplotlib, Pandas, Seaborn) have been used for Python programming. The required libraries are not loaded by default; for that reason, a good practice is to begin the code by importing all the libraries needed. The most important of all is the open-source Tensorflow machine learning library. Tensorflow is an end-to-end open-source platform for developing machine learning models, developed by Google. It consists of a comprehensive, flexible ecosystem of tools and libraries that allows users to easily use and develop ML models [31]. This library provides state-of-the-art machine learning methods, allowing users to create their own machine learning models in a user-friendly, extensible environment. In the present study, the Tensorflow GPU library is used, which, together with the CUDA tool, enables the user to train and test the developed models through the computing power of the graphics card kernels.

Input and Output Data
The predictive ability of networks is based on their training, which in turn depends on the amount of information available to the network. For the purpose of this study, time series of quality parameters from 11 November 2015 to 15 March 2018 were used, on an hourly basis, from four telemetric stations located in the study area, namely, Gkiole, Toichio, Psaradika, and Stavros (Table 1). More specifically, the available data consist of (1) Chl-a (µg/L), (2) pH, (3) Tw (°C), (4) ECw (µS/cm), (5) turbidity (NTU), (6) ammonia nitrogen (N-NH4, ppm) (not available (NA) for Stavros station), (7) nitrate nitrogen (N-NO3, mg/L), and (8) dissolved oxygen (mg/L). Indicatively, the descriptive statistics of the available data are given for Toichio station (Table 2). An important step in DL is the selection of appropriate input variables. Based on the literature, neural network studies report that the most important water quality parameters for the modeling of dissolved oxygen are pH and water temperature [15,19]. In addition to those two inputs, the importance of NO3-N and NH4-N as input variables for DO models has been proposed [19,20]. Moreover, turbidity [13], Chl-a [20], and ECw [17,19] have been reported as input variables for DO modeling. Here, based on the literature [15,17,19,20], all the available parameters are taken into account as input parameters for each investigated model. Subsequently, based on the literature and taking into account the resulting tables concerning the impact of each input parameter on the output parameter (DO), DNNs consisting of four input parameters, namely, (1) nitrate nitrogen (N-NO3, mg/L), (2) pH, (3) Tw (°C), and (4) ECw (µS/cm), are also tested.

Number of Hidden Layers
The generalization capability of a neural network is linked to its hidden layers. Too many hidden layers in a network increase the computational burden and cause over-fitting, which results in poor prediction. Several studies [32,33] show that one or two hidden layers mostly produce better performance. It should be recalled that if an ANN has more than three layers, including the input and output layers, it is called a deep neural network [34]. As the scope of this paper is to apply a deep neural network for the modeling and prediction of a quality parameter, two hidden layers are used for the different structures.

Number of Nodes in the Hidden Layer
Deciding on the appropriate number of nodes is crucial for effective learning and performance of the network. Nevertheless, there is no systematic approach to determine the optimal number of nodes for a given problem [32,33]. Here, a preliminary investigation of deep neural networks for simulation and real-time prediction uses a default of 64 neurons per hidden layer. The predictive ability of a deep neural network consisting of 32 nodes per hidden layer is also tested.

Training Epochs
In order to train a neural network, many epochs (cycles of the training process) are needed. Some studies [32,33] have indicated that convergence can be achieved by training for 85 to 5000 epochs. Here, 1000 epochs were sufficient to achieve convergence.

Activation Function
Activation functions are an essential part of neural networks, as they provide nonlinearity. The absence of nonlinearity turns the neural network into a simple logistic regression model. Here, the rectified linear unit (ReLU) is used, which is defined as (Equation (1)):

f(x) = max(0, x)    (1)

where x is the input to the neuron. An advantage of ReLU is that it is a highly simplified and easy-to-calculate function. It is also very quick to use and train compared with other activation functions. Moreover, in ML, the update of a parameter is proportional to the partial derivative of the error function with respect to that parameter. If the gradient becomes too small, the updates stop working and the network may halt the training procedure. ReLU does not saturate in the positive direction, while other activation functions, such as the sigmoid and the hyperbolic tangent, saturate in both directions. Therefore, it suffers less from vanishing gradients, resulting in better training. The limitation of ReLU is that its mean output is not zero [35].
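A minimal, illustrative sketch of Equation (1) and its gradient behavior (pure Python, not from the study's code base) shows why ReLU does not saturate for positive inputs:

```python
def relu(x):
    """Rectified linear unit: f(x) = max(0, x), as in Equation (1)."""
    return max(0.0, x)

def relu_grad(x):
    # Gradient is 1 for positive inputs (no saturation), 0 otherwise,
    # so weight updates are not damped in the positive direction.
    return 1.0 if x > 0.0 else 0.0

print(relu(2.5), relu(-1.3))            # 2.5 0.0
print(relu_grad(2.5), relu_grad(-1.3))  # 1.0 0.0
```

Note that the mean output over symmetric inputs is positive, illustrating the limitation mentioned above.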

Learning Rate
The learning rate is a hyper-parameter that controls how much to change the model in response to the estimated error each time the model weights are updated. If the learning rate is too large, the network fails to converge within an allowable error over the training set. Choosing too small a learning rate results in a slow training process. Most neural networks utilize a learning rate between 0.01 and 0.3. In the present paper, taking advantage of the computing power of the graphics card, a learning rate of 0.001 is used.

Optimization Algorithm
RMSprop is used owing to its strong ability to find a near-optimal result. RMSprop is one of the most popular optimization algorithms used in DL, as it is a fast and effective optimizer [30]. It uses a moving average of squared gradients to normalize the gradient itself. This has the effect of balancing the step size: it decreases the step for a large gradient to avoid exploding and increases the step for a small gradient to avoid vanishing. RMSprop also avoids the decay of the learning rate to zero.
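The normalizing effect described above can be sketched as a single RMSprop update step (pure Python, simplified and illustrative only; the decay factor `rho` and stabilizer `eps` are typical default-like values, not the study's settings):

```python
def rmsprop_step(param, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update: divide the gradient by the root of a moving
    average of its squared magnitude, which balances the step size."""
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    param = param - lr * grad / ((avg_sq ** 0.5) + eps)
    return param, avg_sq

# A very large gradient and a very small one produce nearly the same
# step magnitude once normalized, illustrating the balancing effect.
p_large, _ = rmsprop_step(1.0, grad=100.0, avg_sq=0.0)
p_small, _ = rmsprop_step(1.0, grad=0.01, avg_sq=0.0)
print(round(1.0 - p_large, 6), round(1.0 - p_small, 6))
```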

The Structures
Deep learning-based techniques are used to develop the DO model, which is applied to the data of the four stations separately. To achieve this goal, a DNN was built for each case, consisting of:
• an input layer of quality parameters, depending on the investigated structure;
• two densely connected hidden layers, consisting of 64 or 32 units/nodes each, depending on the investigated structure; and
• an output layer of the DO quality parameter.
The examined structures of the DO model are 7-64-64-1, 7-32-32-1, 4-64-64-1, and 4-32-32-1 (6 instead of 7 inputs for Stavros station, where ammonia data are not available). Here, a DNN with structure, for example, 7-64-64-1, indicates a model comprising 7 inputs, 64 nodes per hidden layer, and 1 output node. In terms of how the nodes are connected to each other, a feed-forward neural network is used for each case, as there is no feedback from the outputs toward the inputs.
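The feed-forward computation of, for example, a 7-64-64-1 structure can be sketched in pure Python as follows (illustrative only: the weights here are random placeholders, not the trained values, and the study itself builds these layers with Tensorflow):

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def dense(inputs, weights, biases, activation=None):
    # One fully connected layer: out_j = act(sum_i inputs_i * w[i][j] + b[j])
    out = []
    for j in range(len(biases)):
        s = biases[j] + sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        out.append(activation(s) if activation else s)
    return out

def make_layer(n_in, n_out, rng):
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    b = [0.0] * n_out
    return w, b

rng = random.Random(0)
# Structure 7-64-64-1: 7 inputs, two hidden layers of 64 nodes, 1 output.
layers = [make_layer(7, 64, rng), make_layer(64, 64, rng), make_layer(64, 1, rng)]

def forward(x):
    h = dense(x, *layers[0], activation=relu)  # hidden layer 1
    h = dense(h, *layers[1], activation=relu)  # hidden layer 2
    return dense(h, *layers[2])                # linear output: predicted DO

sample = [0.5] * 7  # placeholder normalized inputs (Chl-a, pH, Tw, ECw, turbidity, NH4, NO3)
do_pred = forward(sample)
print(len(do_pred))  # one output node (DO)
```

There is no feedback path from the output toward the inputs, matching the feed-forward design described above.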

Preparation of the Dataset
In this context, there were three phases for each station: training, validation, and testing. In total, 80% of the data was used for the training process and 20% for testing; the training portion was then re-divided, so that 60% of the full dataset was used for the actual training process and the remaining 20% for validation.
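The resulting 60/20/20 split can be sketched as follows (an illustrative pure-Python helper, not the study's code; a chronological split is assumed here):

```python
def split_dataset(records, test_frac=0.20, val_frac=0.20):
    """Chronological train/validation/test split.

    Fractions refer to the full dataset, matching the scheme above:
    20% is held out for testing, 20% for validation, and the
    remaining 60% is used for the actual training process.
    """
    n = len(records)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    n_train = n - n_test - n_val
    train = records[:n_train]
    val = records[n_train:n_train + n_val]
    test = records[n_train + n_val:]
    return train, val, test

hours = list(range(1000))  # placeholder hourly records
train, val, test = split_dataset(hours)
print(len(train), len(val), len(test))  # 600 200 200
```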


Statistical Descriptors
The following statistical measures were used to evaluate the predictive ability of the neural network models: (a) the mean absolute error (MAE), (b) the mean square error (MSE), and (c) the Nash-Sutcliffe model efficiency coefficient (NSE).
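The three statistical descriptors can be computed as follows (a minimal pure-Python sketch with made-up example values, not data from the study):

```python
def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def mse(obs, pred):
    """Mean square error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - sum((o-p)^2) / sum((o-mean(o))^2).
    NSE approaches 1 as predictions approach the observations."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

# Illustrative DO values (mg/L), not measurements from Lake Kastoria.
obs = [8.0, 9.0, 10.0, 11.0]
pred = [8.1, 8.9, 10.2, 10.8]
print(mae(obs, pred))   # 0.15
print(mse(obs, pred))   # 0.025
print(nse(obs, pred))   # 0.98
```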

Toichio Station
The procedure that was followed is described in detail for Toichio station. The same procedure was followed for all the examined stations and structures, the results of which are given in Table 3.

3.1.1. Structure: 7-64-64-1
Figures 3 and 4 illustrate the mean absolute error and the mean square error of the DO model for Toichio station, respectively, during the training process of 1000 epochs. The model seems to "learn" from the dataset after the 20th epoch. The training mean absolute error equals 0.48, while the training mean square error equals 0.52. Moreover, the resulting values of the mean absolute error and the mean square error for the test set equal 0.49 and 0.51, respectively. The fact that the training errors are slightly lower than the testing ones indicates that a good fit is obtained. Figure 5 shows that the DO model with structure 7-64-64-1 for Toichio station predicts very well; it also illustrates the prediction error distribution.

Structure: 7-32-32-1
The DO model with structure 7-32-32-1 produces results with an MAE of 0.50 and 0.54 and an MSE of 0.54 and 0.57 for the training and the testing process, respectively, for Toichio station. The training process takes place for 1000 epochs, and convergence is achieved (Figures 9 and 10). Figure 11 shows the prediction error distribution of the DO model for this station.

Structure: 4-32-32-1
The DO model with structure 4-32-32-1 produces results with an MAE of 0.58 and 0.62 and an MSE of 0.76 and 0.77 for the training and the testing process, respectively, for Toichio station (Figures 12 and 13). Figure 14 shows the obtained versus the predicted values and the prediction error distribution. The DO model with structure 4-32-32-1 predicts reasonably well.

All Stations/Structure
The dissolved oxygen models are trained for two different numbers of neurons in the hidden layers and for two different input combinations. All the investigated structures give satisfactory results. The optimal network architecture for each station is selected as the one with the minimum statistical descriptors MAE and MSE. The NSE is used to assess the predictive power of the models. The lower the MAE and MSE, the better optimized the model (the closer the NSE approaches unity). Overall, four structures are compared for each station, as shown in Table 3.
The well-trained DNN with structure 7-32-32-1 produces results with an MAE of 0.54 and 0.55 and an MSE of 0.65 and 0.68 for the training and the testing process, respectively, with a good predictive ability for Gkiole station. For Toichio station, the structure 7-64-64-1 prevails with an MAE of 0.48 and 0.49 and an MSE of 0.52 and 0.51 for the training and the testing process, respectively, and it constitutes the structure with the best performance compared with all the structures for all stations. For Psaradika station, the structure 4-64-64-1 produces the best results in relation to the other structures of this station with an MAE of 0.57 and 0.58 and an MSE of 0.65 and 0.68 for the training and the testing process, respectively. Finally, Stavros station presents results with an MAE of 0.69 and 0.70 and an MSE of 0.98 and 1.01 for the training and the testing process, respectively (structure 6-32-32-1). However, the selected structure 6-32-32-1 for Stavros station is less appropriate compared to the aforementioned selected structures for Gkiole, Toichio, and Psaradika stations.
It should be mentioned that, in all the investigated structures, the training errors are slightly lower than the tested ones, which indicates that a good fit has been achieved. Based on the investigated structures, the results demonstrate that the proposed DNN models (Table 4) constitute a good choice for modeling dissolved oxygen for each station. Table 4 also illustrates the high efficiency of the selected structures.


Discussion
It is clear that the models showing the most appropriate predictive ability, based on their statistical descriptors for Gkiole, Toichio, and Stavros stations, are the models consisting of all the input parameters. Even though not all the input parameters have a high impact on the output, the use of all the available information seems to give better performance in deep learning. In other words, all the parameters have a significant effect on the performance of the DNN model and cannot be excluded from the input variables. This inability to explain network behavior may seem unacceptable to the scientist, but one should remember that, when moving within the ambiguous context of stochastic phenomena and, in particular, artificial intelligence, this may not only be acceptable but may be the very point. Moreover, the structure and function of the network become "autonomous," achieving the idea of "machine learning".
In case of Psaradika station, the selected model does not use all the parameters to achieve good performance, but it needs a deep network of 64 nodes per layer instead of 32 in order to give the best results compared to other structures for this station. The fact that the model with structure 4-32-32-1, utilizing fewer inputs and nodes in hidden layers, is less appropriate in relation to all the investigated models for all stations, constitutes the evidence that complex neural networks are a promising field for improvement. The use of the selected tools and platforms gives the advantage of using large databases, while the training and testing procedure is obtained in a very short time (GPU card).
The learning rate is a crucial hyper-parameter in deep learning, controlling how much to change the model in response to the estimated error each time the model weights are updated. If the learning rate is too large, the network fails to converge. Most neural networks utilize a learning rate between 0.01 and 0.3. In the present study, taking advantage of the computing power of the graphics card, a learning rate of 0.001 is used. Moreover, Tensorflow provides state-of-the-art machine learning methods, and the possibility of using parallel programming through the computing power of the graphics card helps machine learning methods constantly gain ground.
The optimal selected feed-forward deep neural networks (DNNs) of DO for each station provide information comprising a powerful decision support system (DSS) to prevent, in real time, accidental and emergency conditions that may arise from both natural and anthropogenic hazards. In practice, the use of neural networks in hydrology tends to mimic hydrological processes that science does not fully understand or cannot express with the help of a mathematical formula. However, it should be noted that, owing to their structure and function, neural networks generally do not provide a better understanding of hydrological processes and natural phenomena, as they simplify the physics and "degenerate" it into weights and threshold values. To select the appropriate network structure and to apply the appropriate training algorithm, one must understand the natural processes that occur. Moreover, the right choice of network architecture, activation functions, and learning methods can be substantiated through a test-and-control process. In addition, the selection of training data is of paramount importance, as it requires proper preparation and normalization. However, from a water management point of view, due to the user-friendly nature of the proposed neural networks, they can be operated in real time by nonspecialists, as no knowledge of the phenomenon is required after the end of the training phase.
The possible future work could include the following: • Use additional information captured from a modern drone (uncrewed aerial vehicle) equipped with a multispectral camera as ground-truth information to calibrate satellite imagery in order to improve quantification of the specific quality parameters of the water from the study area. The additional information could also be enhanced by using data (ground data collection) derived from field work (targeted area samplings) in the study area. The existing operational algorithms could be tested, or maybe new ones could be created in order to find the best fit of the band ratio.
• Use more complex machine learning methods (such as convolutional neural networks (CNNs)), not only for Lake Kastoria but also for other national and international lakes, mainly in neighboring countries with cross-border water resources.
• Use the same methodology in order to test the adequacy of the proposed models for other national and international lakes.

Conclusions
The selected DO models for each station provide information in real time and comprise a powerful decision support system (DSS) for preventing accidental and emergency conditions. The real-time monitoring of water quality parameters contributes to management by (a) controlling water quality for irrigation, (b) monitoring atmospheric conditions, (c) determining microclimate indicators, (d) issuing warnings in case of crises caused by extreme events, (e) providing continuous knowledge of the state of the water bodies, and (f) maintaining the ecological balance of the ecosystems and water resources of the region.
Author Contributions: Conceptualization, methodology, software, validation, formal analysis, investigation, writing-original draft preparation, visualization, L.K.; resources, data curation, writing-review and editing, supervision, project administration, A.P. All authors have read and agreed to the published version of the manuscript.