# Predicting Aquaculture Water Quality Using Machine Learning Approaches


## Abstract

… (NH_{3}-N), nitrate nitrogen (NO_{3}-N), and nitrite-nitrogen (NO_{2}-N). Published data were used to compare the prediction accuracy of different methods. The correlation coefficients of BPNN, RBFNN, SVM, and LSSVM for predicting DO were 0.60, 0.99, 0.99, and 0.99, respectively. The correlation coefficients of BPNN, RBFNN, SVM, and LSSVM for predicting pH were 0.56, 0.84, 0.99, and 0.57, respectively. The correlation coefficients of BPNN, RBFNN, SVM, and LSSVM for predicting NH_{3}-N were 0.28, 0.88, 0.99, and 0.25, respectively. The correlation coefficients of BPNN, RBFNN, SVM, and LSSVM for predicting NO_{3}-N were 0.96, 0.87, 0.99, and 0.87, respectively. The correlation coefficients of BPNN, RBFNN, SVM, and LSSVM for predicting NO_{2}-N were 0.87, 0.08, 0.99, and 0.75, respectively. SVM obtained the most accurate and stable prediction results and was therefore used to predict the water quality parameters of industrial aquaculture systems with groundwater as the source water. The results showed that SVM achieved the best prediction effect, with an accuracy of 99% for both the published data and the measured data from a typical industrial aquaculture system. The SVM model is recommended for simulating and predicting water quality in industrial aquaculture systems.

## 1. Introduction

… NH_{3}-N should be lower than 0.2 mg/L, while that of NO_{2}-N should generally be lower than 0.1 mg/L. In industrial aquaculture systems, especially recirculating systems, the aquatic animals can tolerate relatively high concentrations of NH_{3}-N, NO_{2}-N, and NO_{3}-N to some extent. Long-term breeding results have shown that the cultured animals grew well when the ammonia-nitrogen in the recirculating water was lower than 0.4 mg/L. Maintaining good water quality in aquaculture systems depends closely on real-time water quality monitoring and accurate water quality prediction. However, real-time monitoring or measurement of water quality parameters such as nitrate, nitrite, and ammonia is generally expensive for aquaculture farms. Thus, accurate water quality simulation and prediction are good choices for most industrial aquaculture farms, and it is of great economic value to assess the feasibility of mathematical methods of water quality prediction [4,5]. Good water quality prediction helps maintain the stability of aquaculture systems and reduces the occurrence of fish diseases caused by water quality deterioration.

… R^{2} of BPNN ranged from 78% to 83% with RMSE of 2.05–2.317, while that of RBFNN ranged from 75% to 83% with RMSE of 2.567–2.946 [10]. Xu et al. compared three prediction models, namely time series, multiple linear regression, and BPNN, and found that BPNN obtained the best prediction result [11]. SVM, a representative statistical-learning algorithm, establishes a linearly separating hyperplane for data classification and is robust against overfitting. SVM algorithms include support vector regression (SVR) and the least squares support vector machine (LSSVM). The SVR model can fit the input–output relationship of a simulation model to a high degree with less computation, with MAE in the range of 0.57–3.3 [12]. LSSVM using RBF as the kernel function was found to be the best model, with the highest R^{2} of 77% [13]. An LSSVM model used for modeling the discharge-suspended sediment relationship achieved good performance, with R^{2} in the range of 90.9–96% [14]. ANN and SVM were previously used to predict algal growth [15]. The results revealed that ANN achieved satisfactory results with quick response, while SVM was suitable for accurately identifying the optimal model but required longer training time [15]. Mirarabi et al. reported that the SVR model performed better than the ANN model for 1-, 2-, and 3-month-ahead groundwater-level forecasting and that the SVR model could be successfully used to predict monthly groundwater levels in confined and unconfined systems [16]. SVR and ANN were also used to predict floods; the predictions of the SVR model for different magnitudes of floods were similar and relatively constant, whereas the ANN model tended to overpredict the smaller floods and underpredict the extreme floods [17]. RF usually uses bootstrap sampling with random feature subsets, which makes it suitable for problems with many variables, while XGBoost is a boosting algorithm. Both RF and XGBoost are decision-tree-based machine learning methods that handle big datasets with high efficiency [18]. In general, the predictions of these methods achieved different accuracies, with R^{2} ranging from 40% to 96% [10,11,12,13,14,15]. Good prediction performance depends on multiple factors and requires careful method screening.

## 2. Material and Methods

#### 2.1. Selection of Water Quality Prediction Model

#### 2.1.1. Back Propagation Neuron Network (BPNN)

#### 2.1.2. Radial Basis Function Neuron Network (RBFNN)

…_{p} is the p-th input sample, c_{i} is the i-th center point, and h is the number of nodes in the hidden layer. For the radial basis function, the main parameters are the function center c_{i}, width σ_{i}, and hidden layer weights ω_{i}. At the output layer, the RBF network obtains the output through a linear transformation:

…_{i} is the total number of samples participating in the training or test and X_{m} represents the k-th cluster center. Clustering stops when the change in the cluster center is less than the preset constant.

…_{i} can be solved by the following equation:

…_{max} is the maximum distance between the selected centers.
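As a rough illustration of the training procedure described in this section, the following Python sketch picks the centers with a plain k-means loop, derives a shared width from the maximum distance between the centers, and solves the linear output weights by least squares. The Gaussian basis function, the width rule σ = d_max/√(2h), the added bias term, and all function names are assumptions that follow a common textbook form of RBF networks; they are not taken from the equations of this study.

```python
import numpy as np

def kmeans_centers(X, h, n_iter=100, tol=1e-6, seed=0):
    """Pick h centers with a plain k-means loop (assumed clustering step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=h, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every sample to its nearest center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(h):
            members = X[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        # Stop when the centers move less than the preset constant.
        if shift < tol:
            break
    return centers

def hidden_layer(X, centers, sigma):
    """Gaussian radial basis activations, one column per hidden node."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-dist**2 / (2.0 * sigma**2))

def fit_rbf(X, y, h):
    centers = kmeans_centers(X, h)
    # Shared width from the maximum distance between centers (assumed rule).
    pair = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    sigma = pair.max() / np.sqrt(2.0 * h)
    # Append a constant column so the output layer also learns a bias.
    H = np.column_stack([hidden_layer(X, centers, sigma), np.ones(len(X))])
    # The output layer is linear, so its weights follow from least squares.
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, sigma, weights

def predict_rbf(X, centers, sigma, weights):
    H = np.column_stack([hidden_layer(X, centers, sigma), np.ones(len(X))])
    return H @ weights

# Synthetic data only: 5 inputs and 1 output.
rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 1.0, size=(80, 5))
y_demo = np.sin(X_demo.sum(axis=1))
model = fit_rbf(X_demo, y_demo, h=10)
y_hat = predict_rbf(X_demo, *model)
print("training MSE:", float(np.mean((y_hat - y_demo) ** 2)))
```

Solving the output weights in closed form keeps the sketch short; an iterative construction that adds hidden nodes until an error goal is met would behave differently.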

#### 2.1.3. Support Vector Regression Machine (SVM)

… (x_{i}, y_{i}), i = 1, 2, …, n, where x_{i} (x_{i} ∈ R^{n}) is the input value of the i-th sample, and y_{i} ∈ R^{n} is the corresponding output value.

… ξ_{i}^{1}, ξ_{i}^{2}:

… (x_{i}, x) is used to replace the inner product φ(x_{i})•φ(x) in the high-dimensional space to obtain the final SVR regression function:

… α_{i} and α_{i}* are Lagrange multipliers.
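For orientation, the standard textbook form of ε-insensitive support vector regression is sketched below, written with the slack variables ξ_{i}^{1}, ξ_{i}^{2} and the Lagrange multipliers α_{i}, α_{i}* named in this section. The symbols w, b, C, ε, and K are assumed notation from the general SVR literature, so the block should be read as a reference form rather than as the exact equations of this study.

```latex
\[
\min_{w,\,b,\,\xi^{1},\,\xi^{2}} \; \frac{1}{2}\lVert w \rVert^{2}
  + C \sum_{i=1}^{n} \left( \xi_{i}^{1} + \xi_{i}^{2} \right)
\quad \text{s.t.} \quad
\begin{cases}
y_{i} - w \cdot \varphi(x_{i}) - b \le \varepsilon + \xi_{i}^{1}, \\
w \cdot \varphi(x_{i}) + b - y_{i} \le \varepsilon + \xi_{i}^{2}, \\
\xi_{i}^{1},\ \xi_{i}^{2} \ge 0, \quad i = 1, 2, \ldots, n
\end{cases}
\]

\[
f(x) = \sum_{i=1}^{n} \left( \alpha_{i} - \alpha_{i}^{*} \right) K(x_{i}, x) + b,
\qquad K(x_{i}, x) = \varphi(x_{i}) \cdot \varphi(x)
\]
```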

#### 2.1.4. Least Squares Support Vector Machine (LSSVM)

#### 2.2. Simulation and Prediction by Using the Empirical Data

#### 2.2.1. Data Sources

… NH_{3}-N, NO_{3}-N, NO_{2}-N, and temperature.

… NH_{3}-N, NO_{3}-N, and NO_{2}-N.

…^{3} with a feed quantity of 5% of fish weight. The inlet water was recycled during the cultivation process.

#### 2.2.2. Algorithm Implementation

… NH_{3}-N, NO_{3}-N, and NO_{2}-N were simulated and predicted. The BPNN algorithm constructed a 5-10-1 three-layer BP network for prediction through MATLAB’s neural network toolbox, where 5 is the number of neurons in the input layer, 10 is the number of nodes in the hidden layer, and 1 is the number of neurons in the output layer. For example, 5 parameters (pH, NH_{3}-N, NO_{3}-N, NO_{2}-N, and temperature) were used as the input vector to predict DO, so the output layer had 1 neuron. The number of neurons in the hidden layer was determined by empirical formula (12). The number of training epochs was set to 1000. The RBFNN was created with the newrb function, with the error goal and spread set to 1 × 10^{−5} and 2, respectively. The SVM algorithm was implemented using the libsvm toolbox, with cross-validation used to select the best combination of the penalty coefficient c and the kernel function parameter g. The relevant parameters were set as v = 3, cstep = 0.5, gstep = 0.5, and msestep = 0.05 when searching for the best regression parameters g and c. The LSSVM algorithm was optimized by the ten-fold cross-validation method [30] to obtain the regularization parameter γ and the kernel parameter σ^{2}.
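The cross-validated search for c and g can be approximated in Python with scikit-learn, whose SVR class is built on LIBSVM. This is a minimal sketch and not the MATLAB workflow of the study: the synthetic data, the search bounds of 2^{-4} to 2^{4}, the input scaling, and the helper name `grid_search_svr` are assumptions. The 3 folds mirror v = 3, and treating cstep and gstep as 0.5 steps in the base-2 exponent is likewise an assumption about how those settings are used.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

def grid_search_svr(X, y, exp_min=-4.0, exp_max=4.0, step=0.5, folds=3):
    """Search penalty c and RBF kernel parameter g on a log2 grid (assumed bounds)."""
    exponents = np.arange(exp_min, exp_max + step, step)
    param_grid = {
        "svr__C": 2.0 ** exponents,      # penalty coefficient c
        "svr__gamma": 2.0 ** exponents,  # kernel parameter g
    }
    # Scale inputs to [0, 1] inside each fold so no test-fold statistics leak.
    pipeline = make_pipeline(MinMaxScaler(), SVR(kernel="rbf"))
    search = GridSearchCV(
        pipeline,
        param_grid,
        cv=KFold(n_splits=folds, shuffle=True, random_state=0),
        scoring="neg_mean_squared_error",  # pick the pair with the lowest CV MSE
    )
    search.fit(X, y)
    return search

# Synthetic stand-in for five water quality inputs and one target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(60, 5))
y_demo = 6.0 + 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 1] ** 2 + rng.normal(0.0, 0.05, 60)

search = grid_search_svr(X_demo, y_demo)
best_c = search.best_params_["svr__C"]
best_g = search.best_params_["svr__gamma"]
print("best c:", best_c, "best g:", best_g, "CV MSE:", -search.best_score_)
```

After the search, `search.predict(X_new)` applies the refitted best model to new samples with the same five input columns.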

#### 2.2.3. Metric Evaluation Models

… R^{2}) were used to evaluate the prediction results:

…_{i} is the measured value, and y_{i} is the predicted value.
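The error measures reported in this study (MAE, MSE, and R^{2}) can be computed with a few lines of plain Python. The sketch below assumes the usual definitions, with R^{2} taken as the coefficient of determination 1 − SS_res/SS_tot; the function names and the sample values are illustrative and are not data from this study.

```python
def mae(measured, predicted):
    """Mean absolute error."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)

def mse(measured, predicted):
    """Mean squared error."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)

def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Made-up dissolved-oxygen readings in mg/L, for illustration only.
measured = [6.0, 6.5, 7.0, 7.5, 8.0]
predicted = [6.1, 6.4, 7.0, 7.6, 7.9]
print(mae(measured, predicted), mse(measured, predicted), r_squared(measured, predicted))
```

MAE and MSE carry the units of the measured parameter (squared for MSE), so they are only comparable between models for the same parameter, whereas R^{2} is dimensionless.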

#### 2.2.4. Sensitivity Analysis

## 3. Results and Discussion

#### 3.1. Model Screening for Predicting Water Quality

… NH_{3}-N concentration often occurs in breeding ponds, and excessive NH_{3}-N leads to decreased immunity, slow growth, poisoning, and death of aquatic animals. Therefore, it is important to predict the NH_{3}-N concentration to maintain the normal production of aquaculture. Figure 1c shows that SVM obtained the best prediction effect, matching the measured values with a correlation coefficient of 0.996 and MAE/MSE less than 0.001. The RBFNN prediction was the second best, with a correlation coefficient of 0.88 and MAE/MSE of 0.005/0.001 (Figure 2c). The correlation coefficients of BPNN and LSSVM were both less than 0.3, so their prediction effect was the worst.

NO_{3}-N is another key indicator of water quality. Figure 1d shows that SVM obtained the best prediction effect among the four models for NO_{3}-N, followed by BPNN. The correlation coefficients of SVM/BPNN were 0.99/0.96, while MSE was 0.006/0.02 and MAE was 0.001/0.002 (Figure 2d). The correlation coefficients of both RBFNN and LSSVM were about 0.87, while their MSE and MAE were about 0.02 and 0.007, respectively.

… NO_{2}-N in the water body is higher than 0.1 mg/L, it will endanger the normal growth of aquatic animals [34]. The best and worst models differed widely in their predictions of nitrite-nitrogen (Figure 1e). SVM obtained the best prediction result with a correlation coefficient of 0.999, while RBFNN obtained the worst with a correlation coefficient of only 0.08 (Figure 2e). BPNN ranked second after SVM with a correlation coefficient of 0.88, and LSSVM ranked third with a correlation coefficient of 0.75.

#### 3.2. Simulation and Prediction by Using Support Vector Machine

…_{3}-N when predicting NH_{3}-N, while the model was more sensitive to NH_{3}-N when predicting NO_{3}-N. The model was more sensitive to the NO_{3}-N input when predicting NO_{2}-N.
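One common way to obtain input sensitivities of this kind is a one-at-a-time perturbation test, sketched below in Python. The procedure, the 10% perturbation size, the function name `one_at_a_time_sensitivity`, and the linear `toy_model` with its coefficients are all hypothetical and chosen only for illustration; the sketch does not claim to reproduce the sensitivity analysis of this study.

```python
def one_at_a_time_sensitivity(predict, baseline, names, rel_change=0.10):
    """Relative output change when each input is raised by rel_change in turn."""
    base_output = predict(baseline)
    result = {}
    for index, name in enumerate(names):
        perturbed = list(baseline)
        perturbed[index] = baseline[index] * (1.0 + rel_change)
        result[name] = abs(predict(perturbed) - base_output) / abs(base_output)
    return result

# Hypothetical stand-in for a trained model that predicts NO2-N from four inputs.
def toy_model(inputs):
    do, ph, nh3, no3 = inputs
    return 0.02 + 0.001 * do + 0.002 * ph + 0.05 * nh3 + 0.30 * no3

names = ["DO", "pH", "NH3-N", "NO3-N"]
baseline = [7.0, 7.5, 0.2, 0.1]
scores = one_at_a_time_sensitivity(toy_model, baseline, names)
most_sensitive = max(scores, key=scores.get)
print(most_sensitive, scores)
```

In practice `predict` would wrap a trained regressor, and the baseline would be a representative operating point such as the mean of the measured inputs.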

## 4. Conclusions

… NH_{4}-N, NO_{3}-N, and NO_{2}-N. The major findings showed that SVM performed better than the other three models, with higher stability and lower data requirements. RBFNN predicted some individual indicators with relatively high accuracy, but it was unstable, and its accuracy varied too widely across indicators. The BPNN and LSSVM models were not suitable for predicting water quality parameters. It is feasible to use machine learning models to predict water quality in aquaculture systems, and SVM showed excellent prediction performance for a real aquaculture farm. The parameter selection of SVM in this study had shortcomings; parameter optimization methods can be used to obtain better prediction results. Water quality early warning can be added on the basis of water quality prediction for factory farming.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

- Lu, J.; Lin, Y.C.; Wu, J.; Zhang, C. Continental-scale spatial distribution, sources, and health risks of heavy metals in seafood: Challenge for the water-food-energy nexus sustainability in coastal regions? Environ. Sci. Pollut. Res. **2021**, 28, 1–14.
- Lu, J.; Wu, J.; Wang, J.H. Metagenomic analysis on resistance genes in water and microplastics from a mariculture system. Front Environ. Sci. Eng. **2022**, 16, 4.
- Lu, J.; Zhang, Y.X.; Wu, J.; Wang, J.H. Intervention of antimicrobial peptide usage on antimicrobial resistance in aquaculture. J. Hazard. Mater. **2022**, 427, 128154.
- Abdullah, A.H.; Saad, F.S.; Sudin, S.; Ahmad, Z.A.; Ahmad, I.; Abu, B.N.; Omar, S.; Sulaiman, S.F.; Che, M.H.; Umoruddin, N.A.; et al. Development of aquaculture water quality real-time monitoring using multi-sensory system and internet of things. J. Phys. Conf. Ser. **2021**, 1, 2107.
- Nguyen, X.C.; Nguyen, T.; La, D.D.; Kumar, G.; Nguyen, V.K. Development of machine learning-based models to forecast solid waste generation in residential areas: A case study from Vietnam. Resour. Conserv. Recycl. **2021**, 167, 105381.
- Rajaee, T.; Mirbagheri, S.A.; Zounemat-Kermani, M.; Nourani, V. Daily suspended sediment concentration simulation using ANN and neuro-fuzzy models. Sci. Total Environ. **2009**, 407, 17.
- Shouliang, H.; Zhuoshi, H.; Jing, S.; Beidou, X.; Chaowei, Z. Using Artificial Neural Network Models for Eutrophication Prediction. Procedia Environ. Sci. **2013**, 18, 310–316.
- Chang, F.J.; Chen, P.A.; Chang, L.C.; Tsai, Y.H. Estimating spatio-temporal dynamics of stream total phosphate concentration by soft computing techniques. Sci. Total Environ. **2016**, 562, 228–236.
- Markus, M.; Tsai, C.W.S.; Demissie, M. Uncertainty of weekly nitrate-nitrogen forecasts using artificial neural networks. J. Environ. Eng. **2003**, 129, 267–274.
- Suen, J.P.; Eheart, J.W. Evaluation of neural networks for modeling nitrate concentrations in rivers. J. Water Resour. Plan. Manag. **2003**, 129, 505–510.
- Xu, X.; Sun, Z.J.; Wang, L.; Fu, J.; Wang, C. A Comparative Study of Customer Complaint Prediction Model of Time Series, Multiple Linear Regression and BP Neural Network. J. Phys. Conf. Ser. **2019**, 1187, 052036.
- Fan, Y.; Lu, W.X.; Miao, T.S.; An, Y.; Li, J.; Luo, J. Optimal design of groundwater pollution monitoring network based on the SVR surrogate model under uncertainty. Environ. Sci. Pollut. Res. Int. **2020**, 27, 24090–24102.
- Chia, S.L.; Chia, M.Y.; Koo, C.H.; Huang, Y.F. Integration of advanced optimization algorithms into least-square support vector machine (LSSVM) for water quality index prediction. Water Sci. Technol. Water Supply **2022**, 22, 1951–1963.
- Kisi, O. Modeling discharge-suspended sediment relationship using least square support vector machine. J. Hydrol. **2012**, 456, 110–120.
- Deng, T.; Chau, K.W.; Duan, H.F. Machine learning based marine water quality prediction for coastal hydro-environment management. J. Environ. Manage. **2021**, 284, 112051.
- Mirarabi, A.; Nassery, H.R.; Nakhaei, M.; Adamowski, J.; Akbarzadeh, A.H.; Alijani, F. Evaluation of data-driven models (SVR and ANN) for groundwater-level prediction in confined and unconfined systems. Environ. Earth Sci. **2019**, 78, 1–15.
- Mirza, A.S.; Leal, J. Emulation of 2D Hydrodynamic Flood Simulations at Catchment Scale Using ANN and SVR. Water **2021**, 13, 2858.
- Lu, H.; Ma, X. Hybrid decision tree-based machine learning models for short-term water quality prediction. Chemosphere **2020**, 249, 126169.
- Lnaa, B.; Jtb, B.; Taa, B. Ensemble method based on Artificial Neural Networks to estimate air pollution health risks. Environ. Model. Softw. **2020**, 123, 104567.
- Li, J. Construction of legal incentive evaluation model based on BP neural network with multiple hidden layers. J. Phys. Conf. Ser. **2021**, 1941, 012087.
- Lourakis, M.I.A. A Brief Description of the Levenberg-Marquardt Algorithm Implemented by levmar. Found. Res. Technol. **2005**, 4, 1–6.
- Kc, A.; Hc, B.; Cz, B. Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data. Water Res. **2019**, 171, 115454.
- Dandy, M. Neural networks for the prediction and forecasting of water resources variables: A review of modelling issues and applications. Environ. Model. Softw. **2000**, 15, 101–124.
- Cherkassky, V.; Ma, Y. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. **2004**, 17, 113–126.
- Liu, X.P.; Lu, M.Z.; Chai, Y.Z.; Tang, J.; Gao, J.Y. A comprehensive framework for HSPF hydrological parameter sensitivity, optimization and uncertainty evaluation based on SVM surrogate model: A case study in Qinglong River watershed, China. Environ. Model Softw. **2021**, 143, 150126.
- Leong, W.C.; Bahadori, A.; Zhang, J. Prediction of water quality index (WQI) using support vector machine (SVM) and least square-support vector machine (LS-SVM). Int. J. River Basin Manag. **2019**, 19, 149–156.
- Xu, W.; Wang, G.; Zhang, X. Prediction of Chlorophyll-a content using hybrid model of least squares support vector regression and radial basis function neural networks. In Proceedings of the 2016 Sixth International Conference on Information Science & Technology, Dalian, China, 6–8 May 2016.
- Del, G.D.; Muenich, R.L.; Kalcic, M.M. On the practical usefulness of least squares for assessing uncertainty in hydrologic and water quality predictions. Environ. Model Softw. **2018**, 105, 286–295.
- Lei, T. Based on the Neural Network Model to Predict Water Quality; Haikou, D., Ed.; Hainan University: Haikou, China, 2015.
- Wang, S.; Yu, L.; Tang, L. A novel seasonal decomposition based least squares support vector regression ensemble learning approach for hydropower consumption forecasting in China. Energy **2011**, 36, 6542–6554.
- Sakaa, B.; Elbeltagi, A.; Boudibi, S.; Chaffaï, H.; Islam, A.R.M.T.; Kulimushi, L.C.; Choudhari, P.; Hani, A.; Brouziyne, Y.; Wong, Y.J. Water quality index modeling using random forest and improved SMO algorithm for support vector machine in Saf-Saf river basin. Environ. Sci. Pollut Res. Int. **2022**, 29, 32.
- Cai, O.; Xiong, Y.; Yang, H. Phosphorus transformation under the influence of aluminum, organic carbon, and dissolved oxygen at the water-sediment interface: A simulative study. Front. Environ. Sci. Eng. **2020**, 3, 165–176.
- Iraní, S.M.; Vanessa, R.R.; Fernanda, L.A. The influence of the water pH on the sex ratio of tambaqui colossoma macropomum (CUVIER, 1818). Aquac. Rep. **2020**, 17, 100334.
- Li, Y.; Ling, J.; Chen, P. Pseudomonas mendocina LYX: A novel aerobic bacterium with advantage of removing nitrate high effectively by assimilation and dissimilation simultaneously. Front. Environ. Sci. Eng. **2021**, 15, 57.

**Figure 1.** The observed and simulated values of DO (**a**), pH (**b**), NH_{3}-N (**c**), NO_{3}-N (**d**), and NO_{2}-N (**e**).

**Figure 2.** Performance indicators including MAE and MSE of different algorithms. (**a**) DO; (**b**) pH; (**c**) NH_{3}-N; (**d**) NO_{3}-N; (**e**) NO_{2}-N.

**Figure 3.** Simulation and prediction of support vector machine on water body data of industrial aquaculture farm. (**a**) DO; (**b**) pH; (**c**) NH_{3}-N; (**d**) NO_{3}-N; (**e**) NO_{2}-N.

**Figure 5.** Simulation and prediction effect of support vector machine by using expanded data. (**a**) DO; (**b**) pH; (**c**) NH_{3}-N; (**d**) NO_{3}-N; (**e**) NO_{2}-N.

Water Quality Parameter | Measurement Method |
---|---|
DO | DO sensor |
pH | pH meter |
NH_{3}-N | Nessler’s reagent spectrophotometry |
NO_{3}-N | Ultraviolet spectrophotometric method |
NO_{2}-N | 1,2-diaminoethane dihydrochloride spectrophotometry |

Parameter (Published Data) | Model | MSE | R^{2} | Parameter (Industrial Aquaculture System Data) | Model | MSE | R^{2} |
---|---|---|---|---|---|---|---|
DO | BPNN | 0.092 | 0.60 | DO | SVM | 0.001 | 0.99 |
DO | RBFNN | 0.002 | 0.99 | | | | |
DO | SVM | 0.003 | 0.99 | | | | |
DO | LSSVM | 0.004 | 0.99 | | | | |
pH | BPNN | 0.053 | 0.56 | pH | SVM | 0.0002 | 0.99 |
pH | RBFNN | 0.002 | 0.84 | | | | |
pH | SVM | 0.002 | 0.99 | | | | |
pH | LSSVM | 0.052 | 0.57 | | | | |
NH_{3}-N | BPNN | 0.055 | 0.28 | NH_{3}-N | SVM | 0.001 | 0.99 |
NH_{3}-N | RBFNN | 0.001 | 0.88 | | | | |
NH_{3}-N | SVM | 0.004 | 0.99 | | | | |
NH_{3}-N | LSSVM | 0.056 | 0.25 | | | | |
NO_{3}-N | BPNN | 0.017 | 0.96 | NO_{3}-N | SVM | 0.003 | 0.99 |
NO_{3}-N | RBFNN | 0.002 | 0.87 | | | | |
NO_{3}-N | SVM | 0.006 | 0.99 | | | | |
NO_{3}-N | LSSVM | 0.031 | 0.87 | | | | |
NO_{2}-N | BPNN | 0.002 | 0.87 | NO_{2}-N | SVM | 0.006 | 0.99 |
NO_{2}-N | RBFNN | 0.351 | 0.08 | | | | |
NO_{2}-N | SVM | 0.001 | 0.99 | | | | |
NO_{2}-N | LSSVM | 0.064 | 0.75 | | | | |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, T.; Lu, J.; Wu, J.; Zhang, Z.; Chen, L.
Predicting Aquaculture Water Quality Using Machine Learning Approaches. *Water* **2022**, *14*, 2836.
https://doi.org/10.3390/w14182836
