Genetic-Algorithm-Optimized Sequential Model for Water Temperature Prediction

Abstract: Advances in establishing real-time river water quality monitoring networks, combined with novel artificial intelligence techniques for more accurate forecasting, are at the forefront of urban water management. The preservation and improvement of the quality of our impaired urban streams are at the core of the global challenge of ensuring water sustainability. This work adopted a genetic-algorithm (GA)-optimized long short-term memory (LSTM) technique to predict river water temperature (WT), a key indicator of the health of the aquatic habitat whose modeling is crucial for effective urban water quality management. To our knowledge, this is the first attempt to adopt a GA-LSTM to predict the WT in urban rivers. Large volumes of real-time water quality data, including water temperature, conductivity, pH, and turbidity, are now constantly being collected. In the field of water quality management, this provides many opportunities for understanding and forecasting water quality impairment and for developing models for aquatic habitat assessment. The main objective of this research was to develop a reliable and simple urban river water temperature forecasting tool using advanced machine learning methods that can be used in conjunction with a real-time network of water quality monitoring stations for proactive water quality management. We proposed a hybrid time series regression model for WT forecasting. This hybrid approach was applied to select the time window size and an architectural factor (the number of units) of the LSTM network. An hourly water temperature record collected over 5 years was used as the input. Furthermore, to check its robustness, a recurrent neural network (RNN) was also tested as a benchmark model and the performances were compared. The experimental results revealed that the GA-LSTM hybrid model outperformed the RNN and that the basic problem of determining the optimal time window and number of memory cell units was solved. This research concluded that the GA-LSTM can be used as an advanced deep learning technique for time series analysis.


Introduction
The impact of urbanization on urban streams has been well established, with the term urban stream syndrome (USS) commonly used to describe the detrimental effects of high urbanization on the aquatic health of streams. The symptoms of USS form a complex web of changes, including decreases in biodiversity; a flashier flow response; increased erosion; and changes to the geomorphology, chemistry, and nutrient cycles [1][2][3][4]. The levels of urbanization that can cause stream impairment are relatively ...

Most recently, Kumar et al. investigated an LSTM model with different combinations of window size and number of memory cell units to improve model performance [46]. From the above critical appraisal, the largest difficulty lies in selecting the window size and the number of units. Therefore, this research investigated this problem through a genetic search: a genetic algorithm was applied to find the window size and number of units that gave the lowest RMSE, and the best values were then used to train an LSTM model on the dataset. To our knowledge, this paper presents the first time a GA-LSTM framework has been applied to river temperature data at the hourly timescale.

Recurrent Neural Network (RNN)
An RNN model is an extension of a feedforward neural network (FFN) in which edges that span adjacent time steps introduce a notion of time into the model. As in an FFN, an RNN has no cycles among its conventional edges; the difference is that an RNN adds recurrent edges that connect adjacent time steps. These recurrent edges can form cycles, including self-connections from a node to itself across time. An RNN is a sequential model that consists of memory cells, which act as a hidden state. These memory cells iterate over the sequence elements in a loop and maintain their state in vector form, which is known as the state vector [47]. Figure 1 represents the workflow of an RNN model. The current hidden state ($s_t$) is a function of the previous hidden state ($s_{t-1}$) and the current input ($x_t$), as expressed in Equation (1):

$$s_t = f(s_{t-1}, x_t) \qquad (1)$$

where $s_t$ and $s_{t-1}$ represent the hidden state values at the current timestep ($t$) and the previous timestep ($t-1$), and the current timestep input value is denoted as $x_t$. Through a looping mechanism, the output value is fed back into the hidden state to calculate subsequent time steps. Furthermore, an RNN model has three weight matrices (U, V, W), as shown in Figure 1. The temporal dynamics of an RNN model are described by Equation (2), a recurrence that passes the weighted sum of the inputs and states through an activation function. The output value of each time step ($o_t$) contains information about the previous time step and the current time step in a 2D tensor, which is carried forward to the next time step to form recurrent edges. In Equation (2), $\varphi$ and $\psi$ denote the activation functions at the inputs and output, respectively. The selection of these activation functions depends upon the problem statement. The interested reader can find more details about this model in Lipton et al. [48].
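As a sketch of the recurrence referred to as Equation (2), the form below uses the symbols defined above. It assumes the common convention in which U maps the input to the hidden state, W maps the previous hidden state to the current one, and V maps the hidden state to the output. This assignment of the three matrices, and the omission of bias terms, are assumptions rather than details stated in the text.

```latex
\begin{aligned}
s_t &= \varphi\left(U x_t + W s_{t-1}\right), \\
o_t &= \psi\left(V s_t\right)
\end{aligned}
```

Under this reading, $W s_{t-1}$ is the term that carries information from the previous time step forward, which corresponds to the recurrent edge described above.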

Long Short-Term Memory (LSTM)
An LSTM is a variant of the RNN architecture proposed by Hochreiter and Schmidhuber that was developed to model sequences and long-term dependencies more precisely than an RNN [49]. The gating mechanism enables an LSTM model to regulate information across the network to overcome the problem of exploding and vanishing gradients [50]. The architecture of a single LSTM block is shown in Figure 2. The memory block of an LSTM model consists of four units: the input gate, forget gate, carry state, and output gate, which regulate the temporal relationship of the previous time series values by remembering or forgetting. The input gate ($i$) and forget gate ($f$) regulate how much information passes through the current cell and how much information is to be forgotten from the previous memory ($h_{t-1}$), the carry state ($c_t$) modulates the writing of new information to the next memory cell ($h_t$), and the output gate ($o$) decides how much information will pass to the next cell from the current cell. The workflow and gating mechanism of this process are presented in Equation (3), where $c$, $f$, $i$, and $o$ denote the carry state and the forget, input, and output gates, respectively. The hidden state ($g$) present in the memory cell is assessed from the current input ($x_t$) and the previous hidden state ($h_{t-1}$). The forget gate replaces the previous memory with the new input, whereas the hidden state ($h_t$) is obtained after the multiplication of $o$ and $c_t$. More details can be found in Bandara et al. [51].
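For orientation, the gating mechanism described above matches the widely used LSTM formulation sketched below. The weight matrices $W$, biases $b$, logistic function $\sigma$, and element-wise product $\odot$ are notation assumed here and are not taken from the text. The standard form also applies a $\tanh$ to $c_t$ before the multiplication with the output gate, which the description above does not mention.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
g_t &= \tanh\left(W_g [h_{t-1}, x_t] + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

In this form, the additive update of $c_t$ is what lets gradients flow across many time steps, which is the property the gating mechanism is credited with above.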

Genetic Algorithm (GA)
A GA is a stochastic optimization technique inspired by natural evolution and is one of the most commonly applied metaheuristic algorithms [52]. A GA process includes evolutionary principles, such as crossover and mutation of chromosomes. Each chromosome represents an individual solution to the target problem and is encoded as a binary string. The initial population of chromosomes is generated randomly, and those that give a better solution to the assigned target are chosen to reproduce [53].
The whole process of optimization is divided into six stages: initialization, fitness calculation, termination via a condition check, selection, crossover, and mutation. Figure 3 shows the detailed process of a GA. During fitness estimation, only the chromosomes displaying an excellent performance are preserved for further reproduction. This selection and reproduction process is iterated several times to increase the probability of obtaining superior chromosomes. In the next step, the superior chromosomes generate offspring by interchanging string parts and gene combinations during the crossover process, which results in a new solution. In the mutation process, a randomly selected bit of a chosen chromosome is changed. The fitness of the generated solution is estimated and checked against the termination criteria. When the termination criteria have been satisfied, the GA process terminates.
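The six stages above can be illustrated with a minimal sketch. The function name `run_ga`, the binary tournament selection, the one-point crossover, and the toy fitness used in the usage note are assumptions for illustration. The text does not specify which selection or crossover operators were used, so this is not the authors' implementation. The default rates match the values reported later in the paper.

```python
import random


def run_ga(fitness, gene_length=10, pop_size=4, generations=10,
           crossover_rate=0.7, mutation_rate=0.15, seed=0):
    """Minimize `fitness` over binary chromosomes (lists of 0/1)."""
    rng = random.Random(seed)
    # Initialization: random binary chromosomes.
    population = [[rng.randint(0, 1) for _ in range(gene_length)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)

    def select(pool):
        # Selection (assumed operator): binary tournament, lower fitness wins.
        a, b = rng.sample(pool, 2)
        return a if fitness(a) <= fitness(b) else b

    # Termination check: stop after a fixed number of generations.
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best chromosome forward
        while len(children) < pop_size:
            p1, p2 = select(population), select(population)
            child = p1[:]
            # Crossover: exchange string parts at one random cut point.
            if rng.random() < crossover_rate:
                cut = rng.randint(1, gene_length - 1)
                child = p1[:cut] + p2[cut:]
            # Mutation: flip one randomly chosen bit.
            if rng.random() < mutation_rate:
                pos = rng.randrange(gene_length)
                child[pos] = 1 - child[pos]
            children.append(child)
        population = children
        # Fitness calculation for the new generation.
        best = min(population, key=fitness)
    return best
```

As a usage example, `run_ga(lambda c: c.count(0))` searches for a string with as few zeros as possible. Because the best chromosome is always copied into the next generation, the best fitness never gets worse from one generation to the next.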

Genetic Algorithm Long Short-Term Memory (GA-LSTM)
This section describes the hybrid approach in which an LSTM sequential model is integrated with a GA to find the window size and number of units (memory cells) of an LSTM model for water temperature time-series predictions. Since the performance of sequential models such as the LSTM relies on past information from the training phase, the selection of an appropriate time window plays a vital role in obtaining a more accurate model. For example, if the window is too small, important information may be neglected; on the other hand, if the window is too large, the model may overfit during the learning process. Figure 4 shows the flow diagram of the GA-LSTM model used in this study. The learning process consisted of two stages. The first stage was used to design the appropriate network parameters for the LSTM model. To keep the architecture simple, we adopted a single hidden layer, and the optimum number of memory cell units was searched for by the GA. The hyperbolic tangent function was used as the activation function at the inputs and hidden nodes to scale the inputs between −1 and 1, and a linear function was used as the output activation. Furthermore, to adjust the randomly initialized weights of the network, the gradient-based Adam optimizer was used [54].

In the second stage, to obtain the optimal window size and network parameters, an evolutionary GA was used. The population of chromosomes, each representing a possible solution, was initialized with random values. The generated chromosomes were encoded in binary bits, which represented the size of the window and the number of memory cells. Each solution was evaluated using the pre-defined fitness function (RMSE), and strings with a higher performance were retained for reproduction. Once the termination criteria were satisfied, the near-optimal solution was returned. The performance of the model was dependent upon the population size, crossover rate, and mutation rate. In this research, the population size, crossover rate, and mutation rate were selected to be 4, 0.7, and 0.15, respectively. Furthermore, for the stopping condition, the total number of generations was selected to be 10. Pseudo-code of the GA-LSTM is shown in Algorithm 1.

Algorithm 1. Pseudo-code of the GA-LSTM.
Split the data into training and test data;
Training data is used to evaluate the LSTM;
Set the RMSE as the fitness function;
While the generation count is less than the number of generations:
    Crossover of chromosomes with probability 0.7;
    Mutation of a new chromosome with probability 0.15;
    Evaluate the fitness of the newly generated chromosome;
End while
Select the best individual chromosome, which is the optimized input window size and number of hidden units in the LSTM layers;
Use the optimal input window size and number of hidden units settings to predict the unseen data/test data;
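To make the encoding concrete, the sketch below decodes a binary chromosome into a window size and a number of units and builds the sliding-window samples that an LSTM would be trained on. The helper names and the split of the chromosome into 6 window bits followed by the remaining unit bits are hypothetical. The text states only that the chromosome encodes both quantities, not how the bits are divided.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned binary integer."""
    value = 0
    for b in bits:
        value = value * 2 + b
    return value


def decode(chromosome, window_bits=6):
    """Split a chromosome into (window_size, n_units).

    The 6-bit window field is an assumption. A decoded zero is mapped
    to 1 so that both values remain usable network settings.
    """
    window = bits_to_int(chromosome[:window_bits])
    units = bits_to_int(chromosome[window_bits:])
    return max(window, 1), max(units, 1)


def make_windows(series, window):
    """Build (inputs, targets) pairs for one-step-ahead forecasting."""
    n_samples = len(series) - window
    inputs = [series[i:i + window] for i in range(n_samples)]
    targets = [series[i + window] for i in range(n_samples)]
    return inputs, targets
```

Under this assumed layout, the chromosome `[1, 0, 0, 0, 1, 0, 1, 0, 0, 1]` decodes to a window of 34 and 9 units. A fitness function for the GA would then train an LSTM with these settings on the windowed training data and return the validation RMSE; that training step is omitted here.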

Description of the Data Used
The Credit Valley Conservation Authority (CVC) operates a network of real-time water quality monitoring stations within the Credit River watershed. Temperature data from the Mississauga Golf and Country Club (MGCC) station located in Mississauga, Ontario (43°33′17.2″ N, 79°37′12.9″ W), was chosen for this study (Figure 5). This station is located on the lower Credit River, approximately 3.5 km upstream from Lake Ontario. The Credit River watershed has an area of 1000 square kilometers with land use comprising 31% urban, 34% agriculture and open space, and 35% natural areas [55]. At this point, the Credit River has a mean discharge of 8.1 m³/s according to the Water Survey of Canada station 02HB002 (Credit River at Erindale), which was operational from 1945 to 1993. The water quality was monitored using a Hydrolab DS5X multiparameter sonde. The temperature sensor in the sonde was a variable resistance thermistor with an accuracy of ±0.1 °C and a resolution of 0.01 °C. Sensor data was polled every 15 min and transferred via the intelligent SODA™ telemetry platform to a central database. The water temperature sensor was housed within a perforated pipe mounted to a bridge pier. The Credit River is 2 to 3 m deep in winter months at this station. The sonde was positioned at a depth such that it remained submerged during low flows. Over the winter, this would normally be below any potential ice cover if it had occurred. This watershed is heavily urbanized and large amounts of road salt are used on roads and parking lots for winter de-icing operations. High chloride concentrations prevent the formation of ice in urban streams. The CVC data quality and validation procedures removed periods where the data was not reliable. The Hydrolab sonde was exchanged monthly for calibration and quality assurance/quality control (QA/QC) validation of the data. Figure 6 shows the time series of the water temperatures.

Model Development, Performance Assessment, and Forecast Quality Metrics
This section describes the model development of the GA-LSTM and RNN models. To achieve the stated objectives, fully connected RNN and LSTM models were developed for the water temperature modeling using the raw data. The model proposed here was an ensemble of time delay sequential modeling. The RNN and GA-LSTM models were built in TensorFlow using "Keras: The Python Deep Learning library" [57,58]. Since this study formulated the water temperature prediction as a sequence prediction problem, we used the historical data to predict the future temperature. The investigation also focused on determining how many previous time steps of historical data should be used to predict future temperatures.

The architecture of the LSTM model was defined by the number of LSTM layers ($l_r$), the number of fully connected layers ($l_{fc}$), and the number of hidden units (units_r), which together determined the complete structure of the network. Considering all the above aspects, we first designed the simplest layered LSTM model. We fixed $l_r$ and $l_{fc}$ to be 1, while units_r was determined through optimization. Once the structure was determined, there were still unknown model parameters that were required to train the model, i.e., the learning rate, batch size, optimizer, and activation function. The determination of these control parameters heavily depends upon the skill and experience of researchers. The conventional optimization method uses stochastic gradient descent (SGD), which is a batch version of gradient descent that helps to speed up the convergence of the network during the learning process. Kingma and Ba developed the Adam optimizer, which adds an adaptive learning rate parameter for the training of large-scale neural networks, and found that Adam is more robust than SGD [54]. We used the Adam optimizer with a default learning rate of 0.001 in all our experiments, with a batch size of 10. We chose the ReLU activation function, which has been the subject of some recent attention and has shown significant improvement in terms of performance [59]. The optimal window size and units_r were selected based on the root mean square error of the validation set, which was used as the fitness function for the GA. The GA used a binary representation of a solution of length 10, which was randomly initialized using the Bernoulli distribution [60]. The final setting for the GA was population size = 4, number of generations = 2, and gene length = 10, which was used to obtain the best window and units_r by considering five-fold cross-validation. The elitism technique was utilized to obtain the best solution from the population pool, which was then passed on to the next generation; further iterations of this process took place until the termination criteria were satisfied.

The division of the data into training and testing sets varies with the problem of interest. Researchers have used different divisions in the past: Kurup and Dudani used 63% of the available data for training [61], Boadu used 80% [62], Coulibaly and Baldwin used 90% [63], and Pal used 69% [64]. In this study, the entire dataset was divided into a training set (the first 90% of the data, of which 20% was taken for validation) and a testing set (the last 10% of the data).
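The data division described above can be sketched as a chronological split. The function name is hypothetical, and taking the validation portion from the end of the first 90% is an assumption. The text does not state where the 20% validation share was drawn from, and this sketch does not reproduce the five-fold cross-validation.

```python
def chronological_split(series, test_fraction=0.10, val_fraction=0.20):
    """Split a time-ordered series into train, validation, and test parts.

    The test set is the last `test_fraction` of the record. The validation
    set is assumed to be the last `val_fraction` of the remaining data.
    """
    n = len(series)
    n_test = round(n * test_fraction)
    development = series[:n - n_test]
    test = series[n - n_test:]
    n_val = round(len(development) * val_fraction)
    train = development[:len(development) - n_val]
    validation = development[len(development) - n_val:]
    return train, validation, test
```

For a record of 100 hourly values, this yields 72 training, 18 validation, and 10 testing values. Keeping the split chronological avoids training on observations that come after the ones being predicted.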
In general, the performance of the model was assessed using the minimum prediction error criteria. To evaluate the performance of the developed model, two types of forecasting quality metrics were selected since both correlation and variance affect a model's performance. Type 1 errors account for the accuracy of the mean and the closeness of the forecasted time series to the target time series, while type 2 errors account for the closeness of the forecasted mean to the mean of the target values. Therefore, we used the coefficient of determination (r 2 ), mean absolute error (MAE), root mean square error (RMSE), ratio of the RMSE to the standard deviation (RSR), modified Nash-Sutcliffe efficiency coefficient (mNSE), modified index of agreement (md), and Kling-Gupta efficiency (KGE) as fitness indices to evaluate the models' performances (Equations (4)-(9)): where WT E i is the ith hourly water temperature estimated using the models, WT O i is the ith observed hourly water temperature, WT E i is the average of the estimated hourly water temperature, WT O i is the average of the observed hourly water temperature, l is the number of observations, and STDEV obs is the standard deviation of the observed hourly water temperature. The Kling-Gupta efficiency is calculated as follows: where r is the Pearson product-moment correlation coefficient; and s [1], s [2], and s [3] are the scaling factors used for re-scaling the criteria space before computing the Euclidean distance. The factor β      Table 1 shows the optimal window size and the number of units of the memory cell searched by the GA based on the lowest RMSE. The best window size (34) and number of units (9) were used to train the model. In addition, we trained the RNN models with a similar model architecture to compare their performances. Figure 7 shows the plot of the mean square error for the models during the training and validation period.  
From the analysis, it was evident that the models were well trained as the model performance did not improve as the number of epochs increased. Moreover, the mean square error for the RNN model had inherent noise while converging during the validation period. From this finding, we could conclude that the LSTM model had a better gradient flow for longer time steps than the RNN model and had an improved performance for long-term dependency tasks. The performance of both models was tested by considering the forecast quality metrics ( Table 2). As mentioned above, two types of errors were evaluated for both models. Slightly higher coefficients of determination were recorded for the GA-LSTM (r 2 = 0.999) than the RNN (r 2 = 0.998) during the training period, whereas during validation and testing, both the models showed similar performances. A scatter plot was drawn for better visualization for both the models for the three phases ( Figure  8a,b). Furthermore, the performance of other metrics clearly created a distinction between the two models; in terms of RMSE, smaller values were found for the GA-LSTM compared to the RNN in all From the analysis, it was evident that the models were well trained as the model performance did not improve as the number of epochs increased. Moreover, the mean square error for the RNN model had inherent noise while converging during the validation period. From this finding, we could conclude that the LSTM model had a better gradient flow for longer time steps than the RNN model and had an improved performance for long-term dependency tasks. The performance of both models was tested by considering the forecast quality metrics ( Table 2). As mentioned above, two types of errors were evaluated for both models. Slightly higher coefficients of determination were recorded for the GA-LSTM (r 2 = 0.999) than the RNN (r 2 = 0.998) during the training period, whereas during validation and testing, both the models showed similar performances. 
A scatter plot was drawn for better visualization for both the models for the three phases (Figure 8a,b). Furthermore, the performance of other metrics clearly created a distinction between the two models; in terms of RMSE, smaller values were found for the GA-LSTM compared to the RNN in all three phases. In general, the lower the error, the better the model, where the GA-LSTM model showed a smaller error (RMSE = 0.755, RSR = 0.093) than the RNN (RMSE = 1.07, RSR = 0.131) during the testing phase. In addition, the KGE was also calculated to assess the relative importance of the three components (correlation, bias, and variability), while also providing a decomposition of NSE and MSE, which comes under the type 2 error category [65]. Based on these fitness metrics, the GA-LSTM model outperformed the RNN model during all three phases ( Table 2). The superior performance of the GA-LSTM model was supported by the modified index of agreement (md), which also showed the highest values. Figure 9a shows that both models were well trained during the training and validation periods and the GA-LSTM model was very capable of forecasting the diurnal temperature peaks. In Figure 8b, it can be easily seen that the GA-LSTM model was good enough for the prediction of water temperature.

Results and Discussion
A violin plot is used to visualize the distribution of the error and the probability density produced by models [66]. Its summary statistics show the mean/median and interquartile ranges with a full distribution of the error produced. Figure 10 shows that the RNN model had more outliers and its distribution of error occurred more in the lower quantile range, which was in contrast to the GA-LSTM, where the errors were approximately equally distributed in the upper and lower quantiles. Therefore, it can be concluded that the GA-LSTM model performed better and the selection of the optimal window and number of units by the genetic search was validated. This research finding concluded that GA-LSTM can be used as a better option compared with an RNN and even an LSTM. In addition, the KGE was also calculated to assess the relative importance of the three components (correlation, bias, and variability), while also providing a decomposition of NSE and MSE, which comes under the type 2 error category [65]. Based on these fitness metrics, the GA-LSTM model outperformed the RNN model during all three phases ( Table 2). The superior performance of the GA-LSTM model was supported by the modified index of agreement (md), which also showed the highest values. Figure 9a shows that both models were well trained during the training and validation periods and the GA-LSTM model was very capable of forecasting the diurnal temperature peaks. In Figure 8b, it can be easily seen that the GA-LSTM model was good enough for the prediction of water temperature.
[Figure panels: (a) RNN model; (b) GA-LSTM model]

A violin plot visualizes the distribution of the errors produced by a model together with their probability density [66]. Its summary statistics show the mean/median and interquartile range alongside the full error distribution. Figure 10 shows that the RNN model had more outliers and that its errors were concentrated in the lower quantile range, in contrast to the GA-LSTM, whose errors were approximately equally distributed between the upper and lower quantiles. Therefore, it can be concluded that the GA-LSTM model performed better and that the selection of the optimal window size and number of units by the genetic search was validated. These findings indicate that the GA-LSTM can be a better option than an RNN and even an LSTM.
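The quantities a violin plot summarizes can also be tabulated directly. The sketch below is illustrative only: the function name is hypothetical, the error is taken as simulated minus observed, and outliers are flagged with the conventional 1.5 × IQR rule, none of which is specified in this paper.

```python
from statistics import fmean, quantiles


def error_summary(obs, sim):
    """Summarize forecast errors: mean, quartiles, and 1.5*IQR outliers."""
    errors = [s - o for o, s in zip(obs, sim)]
    q1, q2, q3 = quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": fmean(errors),
        "median": q2,
        "q1": q1,
        "q3": q3,
        # Errors outside the whisker range count as outliers.
        "n_outliers": sum(1 for e in errors if e < low or e > high),
        # Share of errors below the median-centred zero line (under-prediction).
        "share_negative": sum(1 for e in errors if e < 0) / len(errors),
    }
```

Comparing `n_outliers` and `share_negative` between two models gives a numerical counterpart to the visual comparison of outliers and lower-quantile concentration described above.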

Concluding Remarks
This study addressed the applicability of a genetic algorithm integrated with an LSTM model (GA-LSTM) for forecasting river water temperatures and for solving the long-standing problem of determining the optimal number of memory units and the window size. The LSTM network used in this study was composed of a single layer with nine memory units that used the 34 previous time steps to forecast a one-step-ahead value. To validate the effectiveness of this approach, an RNN with the same input configuration was tested as a benchmark model. To further test robustness, several forecast quality metrics were evaluated. Overall, the results demonstrated that the GA-LSTM approach can be an effective method for time series analysis and can capture the relevant features during learning.
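The input configuration described above (34 previous time steps used to forecast the next value) corresponds to a sliding-window framing of the series. A minimal sketch of that framing follows; the function name and the plain-list representation are assumptions made for illustration, not the data pipeline of the study.

```python
def make_windows(series, window=34):
    """Split a series into (inputs, targets) for one-step-ahead forecasting.

    Each input holds `window` consecutive values and its target is the
    value that immediately follows that window.
    """
    if window < 1 or len(series) <= window:
        raise ValueError("series must be longer than the window size")
    n_samples = len(series) - window
    inputs = [series[i:i + window] for i in range(n_samples)]
    targets = [series[i + window] for i in range(n_samples)]
    return inputs, targets
```

For an hourly record, `make_windows(temperatures, window=34)` would pair every 34-hour stretch with the temperature of the following hour; a different window size changes both the number of samples and the amount of history each sample carries, which is why the window is treated as a tunable parameter.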
This study suggests that the GA-LSTM approach can help in designing the architecture of an LSTM and its variants for the detection of temporal patterns in data. Future research can extend the search to other LSTM hyperparameters on which prediction performance depends. The control parameters of the GA, such as the crossover and mutation parameters, can also be tested and tuned further to enhance model performance.
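To make the role of the crossover and mutation parameters concrete, the sketch below shows a generic genetic search over two integer genes (window size, number of units). It is an assumption-laden illustration, not the GA used in this study: the operators (tournament selection, uniform crossover, resampling mutation, elitism), the default rates, and the toy fitness function are all hypothetical. In practice the fitness would be the validation error of an LSTM trained with the candidate configuration.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   crossover_rate=0.7, mutation_rate=0.15, seed=0):
    """Minimize `fitness` over integer tuples within inclusive `bounds`."""
    rng = random.Random(seed)

    def random_individual():
        return tuple(rng.randint(lo, hi) for lo, hi in bounds)

    def tournament(scored):
        # Return the better of two randomly drawn individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    def crossover(p1, p2):
        # Uniform crossover: each gene is taken from either parent.
        if rng.random() >= crossover_rate:
            return p1
        return tuple(rng.choice(pair) for pair in zip(p1, p2))

    def mutate(ind):
        # Resample each gene with probability `mutation_rate`.
        return tuple(
            rng.randint(lo, hi) if rng.random() < mutation_rate else gene
            for gene, (lo, hi) in zip(ind, bounds)
        )

    population = [random_individual() for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        gen_best = min(scored)
        if best is None or gen_best < best:
            best = gen_best
        # Elitism: the best individual so far survives unchanged.
        population = [best[1]] + [
            mutate(crossover(tournament(scored), tournament(scored)))
            for _ in range(pop_size - 1)
        ]
    return best[1], best[0]


def toy_validation_error(individual):
    # Placeholder surrogate, NOT a trained model: its minimum is placed at
    # (34, 9) only to echo the configuration reported in the text.
    window, units = individual
    return (window - 34) ** 2 / 100 + (units - 9) ** 2 / 10
```

A call such as `genetic_search(toy_validation_error, bounds=[(1, 72), (1, 32)])` returns the best (window, units) pair found and its fitness. Raising `mutation_rate` increases exploration at the cost of slower convergence, which is the kind of trade-off that further tuning of the GA control parameters would examine.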
The application of these deep learning techniques is encouraged because such models can exploit the temporal relationships and sequential nature of time series, which in turn helps in achieving higher accuracy.
This research focused on the use of hybrid and series decomposition techniques to improve forecasting accuracy. The proposed GA-LSTM framework achieved higher forecasting accuracy than a benchmark RNN model when applied to the water temperature dataset. From the analysis of the results, it was evident that the GA-LSTM model can be a good replacement for the benchmark model without compromising accuracy.
The development of real-time water quality monitoring networks for predicting and detecting toxic spills and other adverse events is a potential application for the use of the GA-LSTM framework proposed in this paper [67][68][69][70]. As more urban watercourses become instrumented, fast and effective forecasting tools will be required to predict and respond to adverse water quality events.
