Comparison of Water Quality Prediction for Red Tilapia Aquaculture in an Outdoor Recirculation System Using Deep Learning and a Hybrid Model

Abstract: In modern aquaculture, the focus is on optimizing production and minimizing environmental impact through the use of recirculating water systems, particularly in outdoor setups. In such systems, maintaining water quality is crucial for sustaining a healthy environment for aquatic life, and challenges arise from instrumentation limitations and delays in laboratory measurements that can impact aquatic animal production. This study aimed to predict key water quality parameters in an outdoor recirculation aquaculture system (RAS) for red tilapia aquaculture, including dissolved oxygen (DO), pH, total ammonia nitrogen (TAN), nitrite nitrogen (NO2-N), and alkalinity (ALK). Initially, a random forest (RF) model was employed to identify significant factors for predicting each parameter, selecting the top three features from routinely measured parameters on the farm: DO, pH, water temperature (Temp), TAN, NO2-N, and transparency (Trans). This approach aimed to streamline the analysis by reducing variables and computation time. The selected parameters were then used for prediction, comparing the performance of convolutional neural network (CNN), long short-term memory (LSTM), and CNN-LSTM models across different epochs (1000, 3000, and 5000). The results indicated that the CNN-LSTM model at 5000 epochs was effective in predicting DO, TAN, NO2-N, and ALK, with high R² values (0.815, 0.826, 0.831, and 0.780, respectively). However, pH prediction showed lower efficiency with an R² value of 0.377.


Introduction
Nile tilapia (Oreochromis niloticus Linn.) is an important freshwater fish that is widely cultivated around the world [1]. Currently, farmers prioritize maximizing productivity within confined spaces while minimizing environmental impact. To achieve this, recirculating aquaculture systems (RASs) have been widely implemented in Europe and the USA, especially indoors [2]. Maintaining water quality is pivotal in these systems [3][4][5][6][7], necessitating the continuous monitoring of parameters such as dissolved oxygen (DO) levels, water temperature (Temp), acidity, alkalinity (ALK), water clarity, total ammonia nitrogen (TAN), and nitrite nitrogen (NO2-N). Measurements involve field equipment and laboratory assessments, often requiring chemicals, instruments, labor, time, and expenses [8]. In open recirculating systems, water quality parameters may swiftly fluctuate due to environmental changes, thus demanding more resources than closed systems. Additionally, equipment breakdowns and delayed laboratory measurements can pose challenges, potentially impacting aquatic animal rearing.
Water 2024, 16, 907
However, to date, most research has used water quality relationships to predict outcomes, often assuming linear correlations, which might not always yield accurate predictions, due to nonlinear relationships and environmental variations. In recent years, machine learning (ML) has become an increasingly important tool in data analysis and has been applied in various fields of research, including water quality and aquaculture. For example, Palani et al. [9] employed artificial neural networks (ANNs) for marine water quality, while Castrillo and García [10] applied multi-factor linear regression (MLR) and random forest (RF) models for predicting river water quality. Zambrano et al. [8] utilized RF, MLR models, and an ANN for predicting water quality in fish farming reservoirs. Anand et al. [11] and Ye et al. [12] opted for convolutional neural networks (CNNs) in their models. Additionally, Hu et al. [13] used a deep long short-term memory (LSTM) network for cage-cultured environments, and Liu et al. [14] employed LSTM deep neural networks in an internet of things (IoT) setting for predicting water quality in cage-cultured environments. Ahmed et al. [15] applied gradient boosting and the multilayer perceptron (MLP). Juna et al. [16] proposed a nine-layer MLP model with k-nearest neighbors (KNN) imputation. Li et al. [17] recommended the support vector machine (SVM) for industrial aquaculture. Wang et al. [18] suggested SVM for dissolved oxygen, MLP for nonlinear modeling, and LSTM for dynamic patterns in precise water quality prediction. This diversification of ML techniques across various studies underscores the evolving and versatile nature of ML applications in understanding and forecasting water quality.
In recent advancements, various models have been integrated to enhance prediction accuracy [19]. Notably, da Silva et al. [20] introduced a novel toxicity-warning sensor with the linear concentration addition (LCA) model and ML for water quality monitoring. Chen et al. [21] proposed intelligent variable-flow technology for improved water quality control. Meanwhile, Yang et al. [22] utilized a hybrid deep learning approach (CNN, gated recurrent unit (GRU), and attention mechanism) for RAS water quality prediction. Additionally, Wu and Wang [23] introduced the artificial neural network-wavelet transform-long short-term memory (ANN-WT-LSTM) model for the Jinjiang River, surpassing others in water quality prediction. Zhou et al. [24] presented the wavelet-autoregressive integrated moving average-gated recurrent unit (W-ARIMA-GRU) model for Beijing's water, combining wavelet decomposition, ARIMA, and GRU. Chen et al. [25] developed LSTM and attention-based long short-term memory (AT-LSTM) models for forecasting Australia's Burnett River water quality. Cai et al. [26] proposed a Kalman-filter-aided LSTM with attention for improved accuracy on Haimen Bay data. Farzana et al. [27] used XGBoost and GRU models for Toowoomba reservoirs, providing insights for climate-aware water management. These studies widely attest to the efficacy and success of this methodology.
However, hybrid models often involve a relatively high number of features, leading to complex processing and potentially lengthy computations, and there is limited research on refining predictive models by reducing the number of features. Therefore, the objective of the current study was to predict essential water quality parameters in large-scale, open recirculating red tilapia systems using a reduced set of features. Feature reduction was achieved through the application of an RF model, and prediction was carried out using the CNN, LSTM, and CNN-LSTM models. Adjustments in epochs (1000, 3000, and 5000) were made to optimize accuracy levels. The findings should help to advance the development of outdoor RAS methodologies.

Farming System and Data Collection
The data were collected from a red tilapia farm in Buriram province, northeast Thailand (15°04′01.9″ N 102°47′20.3″ E). The farm used an RAS consisting of 3 treatment ponds for the water inlet, 4 nursing ponds (each 1600 m²), 18 grow-out ponds (each 1600 m²), and 5 treatment ponds for the water outlet. All the nursing and grow-out ponds were lined with polyethylene. Three grow-out ponds were selected as experimental ponds (1, 2, and 3), as shown in Figure 1. All the experimental ponds were under the same management regime, involving aeration using four 3 horsepower (Hp) aerators that operated continuously (Figure 2a). The fish were fed with 35% protein pelleted feed 3 times a day (08.00, 11.30, and 16.30) using an automatic feeder (Figure 2b). The average starting weight of fish raised was about 200 g. The stocking density was 19,000 fish/pond (about 12 fish/m²). The fish weight was assessed manually and randomly twice a month using about 45 fish/pond after anesthetization with clove oil (10 µL/L). The average fish weight on the day of harvest was about 1000 g. The average survival rate was 95%. The rearing period was approximately 90 days.

Water Quality Measurement
A total of 2250 water samples were collected for analysis of DO, Temp, pH, TAN, NO2-N, ALK, and Trans. The DO and Temp were monitored using a YSI Pro20i instrument (YSI; Yellow Springs, OH, USA). The pH was measured using a YSI pH100A instrument (YSI; USA) and Trans was measured using a Secchi disc. The TAN, NO2-N, and ALK were sampled for analysis in the laboratory according to the method of APHA [28]. All parameters were monitored every day in the morning between 07.00 and 08.00 throughout the 4 months of the growth cycle.

Pre-Processing Dataset
Before running the processes, data cleaning was performed by checking for missing data, which was addressed by either removing entries with missing values or imputing them using statistical methods (mean, median, or mode). Approximately 0.2% of the total data were affected. Additional cleaning involved correcting any incorrect data or formatting issues, such as pH values of 15 and 7.5, which impacted approximately 0.5% of the total data.
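
The cleaning steps described above can be illustrated with a minimal pandas sketch. The column names, the sample values, the 0-14 validity range for pH, and the choice of median imputation are assumptions made for illustration; the study does not state which rule was applied to which variable.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings; names and values are illustrative only.
raw = pd.DataFrame({
    "DO": [5.1, np.nan, 4.8, 5.3],
    "pH": [7.4, 15.0, 7.6, 7.5],   # 15.0 lies outside the 0-14 pH scale
    "Temp": [28.5, 29.0, np.nan, 28.8],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Flag impossible pH readings as missing, then impute with medians."""
    out = df.copy()
    # Treat physically impossible pH readings as missing values.
    out.loc[~out["pH"].between(0, 14), "pH"] = np.nan
    # Impute all gaps with the column median (mean or mode are alternatives).
    return out.fillna(out.median(numeric_only=True))

cleaned = clean(raw)
print(cleaned)
```

Dropping incomplete rows with `dropna()` instead of imputing is the other option mentioned in the text; which one is preferable depends on how much data would be lost.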

Feature Selection
RF was applied to identify important features for predicting each water quality parameter. Feature selection with RandomForestRegressor relies on the model's built-in feature importance attribute. During training, feature importance was calculated by measuring how much each feature decreases impurity across the decision trees, highlighting its contribution to predictive performance. The key steps are shown in Table 1. The training performance of the RF was assessed by computing the mean absolute error (MAE). This metric provides a numerical measure of how well the model captures the real-world conditions. After the selection process, only the top 3 features were used in subsequent processing. This adjustment aimed to reduce both processing time and complexity.

Table 1. The key steps of feature selection using RF.

| Step | Description |
| --- | --- |
| Import libraries | pandas for data manipulation; RandomForestRegressor for building the regression model; other libraries for data processing, evaluation, and visualization. |
| Load and preprocess data | Load a CSV dataset and select relevant features and the target variable. |
| Train-test split | Split the data into training and testing sets. |
| Initialize and train a RandomForestRegressor with specific parameters | The code configures the regressor with 100 trees, a random seed of 42 for consistency, a maximum tree depth of 10, and a maximum of 10 leaf nodes per tree. Then, it trains the regressor using the given dataset. |
| Model evaluation | Evaluate the model on both training and testing sets using MAE. |
| Visualize predictions | Create a scatter plot to visualize predicted vs. actual values. |
| Feature importance bar graph | Calculate and display a bar graph showing the importance of each feature in predicting the parameter. |
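
The steps in Table 1 might be sketched as follows with scikit-learn. This is not the authors' code: the dataset is synthetic, the column names and the helper `top_features` are hypothetical, and the 80/20 split ratio is an assumption, since Table 1 does not state the ratio used for the RF step. Only the regressor settings (100 trees, seed 42, depth 10, 10 leaf nodes) come from the table. The plotting steps are omitted.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the farm dataset (values are not real measurements).
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame({
    "Temp": rng.uniform(26, 33, n),
    "pH": rng.uniform(7.0, 8.5, n),
    "TAN": rng.uniform(0, 2, n),
    "NO2_N": rng.uniform(0, 1, n),
    "ALK": rng.uniform(80, 160, n),
    "Trans": rng.uniform(20, 60, n),
})
# Toy target: DO falls as temperature rises, plus a little noise.
data["DO"] = 12 - 0.25 * data["Temp"] + rng.normal(0, 0.1, n)

def top_features(df, target, k=3):
    """Rank features by RF impurity-based importance and return the top k."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42  # split ratio is an assumption
    )
    rf = RandomForestRegressor(
        n_estimators=100, random_state=42, max_depth=10, max_leaf_nodes=10
    )
    rf.fit(X_train, y_train)
    mae_train = mean_absolute_error(y_train, rf.predict(X_train))
    mae_test = mean_absolute_error(y_test, rf.predict(X_test))
    importances = pd.Series(rf.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False).head(k), mae_train, mae_test

top3, mae_train, mae_test = top_features(data, target="DO")
print(top3)
print(f"MAE train={mae_train:.3f}, test={mae_test:.3f}")
```

In this toy setup the target depends only on temperature, so `Temp` should dominate the ranking; on real data the importances are usually spread across several features.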

Data Processing, Analysis, and Visualization
Python (version 3.9), in a Colab notebook setting, was used for the essential tasks encompassing deep learning and data analysis. For data processing, the Python library pandas was utilized for data manipulation, scikit-learn was used for dataset splitting and feature scaling (min-max scaler), and TensorFlow was used for handling neural network data structures. The analysis used TensorFlow's Keras components (Sequential, Conv1D, MaxPooling1D, LSTM, Dense, Flatten, and Dropout) to construct the neural network models. Model performance was assessed using the root mean square error (RMSE), MAE, normalized root mean square error (NRMSE), Nash-Sutcliffe efficiency (NSE), and the coefficient of determination (R²), with evaluation metrics imported from scikit-learn. Matplotlib (version 3.8) was used for data visualization, enabling the creation of graphs, charts, and other visual representations of the data and model outputs within the Colab notebook.
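
The splitting and min-max scaling steps mentioned above could look roughly like the sketch below, which uses synthetic data in place of the three RF-selected features. Several choices here are assumptions not stated in the paper: keeping chronological order (`shuffle=False`), fitting the scaler on the training portion only, and reshaping each sample to `(timesteps, channels) = (3, 1)` so that it matches the 3D input that Keras Conv1D and LSTM layers expect.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 3))   # three selected features (synthetic values)
y = rng.uniform(size=500)        # target parameter (synthetic values)

# 80% training, 20% testing; shuffle=False keeps the time order (assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Fit the scaler on the training data only to avoid information leakage.
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Conv1D and LSTM layers take input shaped (samples, timesteps, channels).
X_train_3d = X_train_s.reshape(-1, X_train_s.shape[1], 1)
X_test_3d = X_test_s.reshape(-1, X_test_s.shape[1], 1)

print(X_train_3d.shape, X_test_3d.shape)
```

Another common layout is a sliding window of past days per sample, shaped `(samples, window, features)`; the paper's Table 2 would determine which layout matches the authors' models.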
Initially, the data were loaded and divided into a training set (80%) and a testing set (20%). Next, 3 models (CNN, LSTM, and a hybrid CNN-LSTM) were used for analysis. All the models were tuned by progressively increasing the number of epochs (1000, 3000, and 5000), where an epoch is one complete pass of the model through the entire training dataset. The model structures are shown in Table 2. Model performance was then evaluated using MAE, RMSE, NRMSE, NSE, and R².
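The metrics were calculated as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_{\mathrm{measured},i} - y_{\mathrm{predict},i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{\mathrm{measured},i} - y_{\mathrm{predict},i}\right)^{2}}$$

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{\max} - y_{\min}}$$

where y_measured is the observed value, y_predict is the predicted value, N is the total number of observations, y_max is the maximum value, and y_min is the minimum value.

$$\mathrm{NSE} = 1 - \frac{\sum_{t}\left(\mathrm{Obs}_{t} - \mathrm{Sim}_{t}\right)^{2}}{\sum_{t}\left(\mathrm{Obs}_{t} - \overline{\mathrm{Obs}}\right)^{2}}$$

where Obs_t is the observed value at time t, Sim_t is the simulated (predicted) value at time t, and $\overline{\mathrm{Obs}}$ is the mean of the observed values.

$$R^{2} = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}$$

where SS_res is the sum of squares of residuals (also known as the sum of squared errors or SSE), which represents the difference between the predicted values and the actual values; and SS_tot is the total sum of squares, which measures the total variance of the dependent variable (the target) from its mean. Moreover, the calculation times of each model were measured.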
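
As a rough illustration of the five metrics defined above, the following NumPy sketch computes them for a small made-up series. The function name `evaluate` and the sample values are hypothetical. Normalizing RMSE by the range of the measured values follows the y_max and y_min definitions given in the text.

```python
import numpy as np

def evaluate(y_measured, y_predict):
    """Return MAE, RMSE, NRMSE, NSE, and R2 for two equal-length series."""
    y_measured = np.asarray(y_measured, dtype=float)
    y_predict = np.asarray(y_predict, dtype=float)
    residual = y_measured - y_predict

    mae = np.mean(np.abs(residual))
    rmse = np.sqrt(np.mean(residual ** 2))
    # Normalize by the range of the measured values (y_max - y_min).
    nrmse = rmse / (y_measured.max() - y_measured.min())

    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_measured - y_measured.mean()) ** 2)
    nse = 1.0 - ss_res / ss_tot
    r2 = 1.0 - ss_res / ss_tot  # same expression as NSE under these definitions
    return {"MAE": mae, "RMSE": rmse, "NRMSE": nrmse, "NSE": nse, "R2": r2}

# Made-up DO readings (mg/L) and predictions, for illustration only.
scores = evaluate([4.0, 5.0, 6.0, 7.0], [4.5, 5.0, 5.5, 7.0])
print(scores)
```

With these definitions NSE and R² coincide numerically; they differ only when R² is instead computed as the squared correlation coefficient, which some libraries and papers do.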

Ethical Statement
The study protocol for fish care and experiments was reviewed and approved by the Kasetsart University institutional animal care and use committee (ACKU 66-FIS-004). This study followed the ARRIVE guidelines (https://arriveguidelines.org, accessed on 19 August 2023). All methods were performed in accordance with the relevant guidelines and regulations.

Important Features for Each Water Quality Parameter Prediction
The MAE values for the training performances of the RF model were as follows: DO (0.247), pH (0.053), TAN (0.246), NO2-N (0.093), and ALK (2.127). These results are visually presented in Figure 3. The top three most influential features for each water quality parameter were selected using RF. Figure 4a outlines feature importance for predicting DO in a water quality model, highlighting Temp (0.751) as the most critical, followed by NO2-N (0.065) and ALK (0.053). In Figure 4b, ALK (0.247) is crucial for pH prediction, followed by DO (0.241) and Trans (0.195). Figure 4c indicates the role of Trans (0.237) in TAN prediction, followed by ALK (0.218) and DO (0.180). Figure 4d highlights the dominance of ALK (0.372) in NO2-N prediction, followed by Temp (0.255) and TAN (0.145). Lastly, Figure 4e shows the importance of Trans (0.391) in ALK prediction, followed by TAN (0.188) and NO2-N (0.184).

Predictive Efficiency
The performance metrics (RMSE, MAE, NRMSE, NSE, and R²) were compared among the three models across various epochs (1000-5000). Notably, the CNN-LSTM model, particularly at 5000 epochs, exhibited superior predictive capabilities for key water quality parameters of DO, pH, TAN, NO2-N, and ALK. This model consistently demonstrated lower RMSE, MAE, and NRMSE values compared to the other models. Furthermore, NSE values were consistently higher than those of the other models. Specifically, the R² values for the CNN-LSTM model at 5000 epochs reached peaks at 0.815 (DO), 0.826 (TAN), 0.831 (NO2-N), and 0.780 (ALK). However, the pH prediction notably underperformed, with an R² of only 0.377. Additionally, the calculation times for this model were approximately 15 min, as shown in Table 4. Following the application of the developed model, graphs displaying both observed and predicted values for each crucial water quality parameter (DO, pH, TAN, NO2-N, and ALK) are depicted in Figures 5-9.

Discussion
The CNN-LSTM combination performed best in this study, which can be attributed to the complementary strengths of the two architectures. CNNs excel in extracting spatial features, making them well suited for data like images with spatial patterns, while LSTMs specialize in capturing temporal dependencies, making them a good fit for sequential data such as time series or text [39][40][41]. This coupling allows for hierarchical feature learning, where the CNNs extract features and the LSTMs sequentially process them for deeper insight [42,43]. Furthermore, this model exhibits acceptable computation times for predictions when compared to laboratory analysis.
The findings of this study are consistent with prior research that utilized a hybrid CNN-LSTM for predicting water quality. In 2020, Baek et al. [44] demonstrated the accuracy of a CNN-LSTM model in simulating water quality in the Nakdong River basin, achieving 'very good' performance and proving valuable for precise water level and quality simulation. Li et al. [45] employed a CNN-LSTM model to compute runoff in the Elbe River basin, Germany, using two-dimensional rainfall radar maps. This model proved beneficial for assessing water availability and providing flood alerts in river basin management. Additionally, in 2023, Li et al. [46] introduced CLATT, a CNN-LSTM-attention model, enhancing wastewater quality prediction accuracy with a sliding window method.
An advantage of combining the hybrid model with features selected using RF is that the results can be interpreted in terms of known water chemistry. For DO prediction, temperature directly affects oxygen solubility in the water, with warmer temperatures decreasing oxygen levels [47], while other parameters may have a lesser effect.
In forecasting TAN levels, Trans is used to measure clarity and particle presence, indicating organic matter [47]. This may relate to TAN levels, as organic content influences ammonia levels [48]. ALK also plays a role by influencing pH [49], where higher pH levels can elevate TAN production and toxicity [50]. In addition, the presence of nitrifying bacteria, reliant on DO, affects the efficiency of nitrification, consequently lowering TAN levels with higher DO concentrations [51].
For NO2-N prediction, it is not solely the ALK but also the presence of oxygen and specialized bacteria that govern the nitrogen cycle; however, these relations are quite complex. ALK can significantly influence the solubility of NO2-N in water, as a higher ALK level may indirectly impact the nitrogen cycle and alter the speciation of nitrogen compounds [52]. Additionally, ALK facilitates the volatilization of ammonia. In turn, temperature affects the production and consumption rate of NO2-N, as it is generated by ammonia oxidation, a process whose rate depends on temperature [53]. Furthermore, TAN acts as a precursor to NO2-N, as bacteria-mediated ammonia oxidation leads to the formation of NO2-N [47].
With ALK prediction, Trans has been reported as a pivotal factor, since Trans serves as an indicator of water clarity, influenced by suspended particles such as algae and sediment [47]. These particles absorb sunlight, potentially reducing available light for photosynthesis, subsequently impacting the growth of aquatic plants and algae, which, in turn, affects ALK levels [54]. TAN influences ALK by interacting with bicarbonate ions, leading to a reduction in alkalinity levels, with elevated TAN concentrations contributing to this reduction. Furthermore, specific waterborne bacteria, such as Nitrobacter and Nitrospira, can convert NO2-N into nitrate-nitrogen (NO3-N), a process that releases hydrogen ions (H+). This conversion indirectly leads to an increase in ALK by shifting the pH towards a more neutral or slightly alkaline state [47]. Trans itself does not directly affect ALK; instead, it refers to the clarity or clearness of water. ALK, on the other hand, is a measure of the water's ability to resist changes in pH. However, the factors influencing Trans, such as suspended particles or dissolved substances, can indirectly influence ALK. These factors have the potential to absorb or adsorb alkaline substances.
Predicting pH directly from the collected data might be difficult because factors such as CO2, minerals, pollution, and biological activities can affect pH [55], and these were not included in this study. Therefore, it is not practical to predict pH based only on the available data. These factors also affect other parameters, not just pH.
However, notably, the effectiveness of any model combination depends heavily on the nature of the data and the specific problem at hand.While CNN-LSTM hybrids have shown promise in certain applications, other architectures or models might perform better in different scenarios.The choice of model often involves empirical testing and experimentation to find the most suitable one for a particular task.

Conclusions
This study of an outdoor recirculating system for red tilapia focused on predicting the crucial water quality indicators DO, pH, TAN, NO2-N, and ALK. Three key features were selected for each parameter using the RF model, and different models (CNN, LSTM, and a hybrid CNN-LSTM) were tested by varying the epochs from 1000 to 5000. The CNN-LSTM model at 5000 epochs demonstrated notably high performance (in terms of RMSE, MAE, NRMSE, and R²) for the parameters DO, TAN, NO2-N, and ALK. However, the prediction of pH had comparatively lower results. These outcomes might have been influenced by external environmental factors. A limitation of this study is the absence of measurements for other environmental parameters, such as meteorological data. This is crucial due to environmental variations and seasonal effects that can significantly impact water quality in outdoor settings, along with other biological data.

Figure 2. Important equipment used in ponds: (a) 3 Hp aerators that operated continuously and (b) automatic feeder.

Water Quality Measurement
A total of 2250 water samples were collected for analysis of DO, Temp, pH, TAN, NO2-N, ALK, and Trans. The DO and Temp were monitored using a YSI Pro20i instrument

Figure 3. Prediction performances of RF by MAE for (a) DO, (b) pH, (c) TAN, (d) NO2-N, and (e) ALK. The red line in the scatter plot represents the MAE line, which is a measure of how different the predicted values are from the actual values. The blue dots in the scatter plot represent the actual value measurements. The horizontal position of each dot shows the actual value, and the vertical position shows the residual value, which is the difference between the predicted value and the actual value. In linear regression, the goal is to fit a line through the data points in a way that minimizes the residuals. The MAE line can be used to assess how well the fitted line meets this goal. A lower MAE value indicates that the predictions are, on average, closer to the actual values.

Figure 5. Actual and predicted values (a), along with a scatter graph (b), obtained from the CNN-LSTM model after 5000 epochs of DO.

Figure 6. Actual and predicted values (a), along with a scatter graph (b), obtained from the CNN-LSTM model after 5000 epochs of pH.

Figure 7. Actual and predicted values (a), along with a scatter graph (b), obtained from the CNN-LSTM model after 5000 epochs of TAN.

Figure 8. Actual and predicted values (a), along with a scatter graph (b), obtained from the CNN-LSTM model after 5000 epochs of NO2-N.

Figure 9. Actual and predicted values (a), along with a scatter graph (b), obtained from the CNN-LSTM model after 5000 epochs of ALK.

Table 3. Mean ± standard deviation and standard quality range of dataset variables.

Table 4. Performance comparison of different models in each epoch for predicting DO, pH, TAN, NO2-N, and ALK, where bold indicates best performing model for each water parameter.
