Water Quality Index (WQI) Forecasting and Analysis Based on Neuro-Fuzzy and Statistical Methods
Abstract
1. Introduction
2. Highlights
- Presenting NAFS, a novel model that combines neural networks and fuzzy logic to improve the accuracy of WQI prediction.
- Demonstrating that NAFS outperforms ANN and ANFIS with significantly lower error values, including MSE of 1.678 and RMSE of 1.295, indicating improved predictive accuracy and reliability.
- Utilizing an extensive dataset from six prominent Malaysian water bodies to enhance the reliability and practicality of water quality forecasts.
- Incorporating advanced learning mechanisms, such as an optimized backward pass, which improves the model’s adaptability and efficiency in real-time contexts.
- Confirming the robustness of the NAFS model through statistical validation techniques, supporting its suitability for environmental decision-making and water resource management.
3. Methods
3.1. Description of the Study Area
3.2. Dataset and Sample Analysis
3.3. Water Quality Index (WQI) Calculation
3.4. Development of NAFS (Neuro-Adapt Fuzzy Strategist)
- First layer: Each node in this layer represents one of the input water quality parameters, which include pH, DO, COD, AN, and SS.
- The next layer, “Fuzzification,” uses membership functions to transform the crisp input values into fuzzy values. The layer’s nodes are square and have a parameterized membership function, such as a bell-shaped function, that gives a membership degree between zero and one.
- Every node in the rules layer refers to a fuzzy rule. Each rule produces an output label representing a qualitative assessment of water quality, denoted as n11, n12,…, n39. These outputs are intermediate linguistic labels used in the fuzzy inference process and are later mapped to a final water quality score through defuzzification. For example, n11 to n19 correspond to combinations of pH and BOD, n21 to n29 to combinations of DO and COD, and n31 to n39 to combinations of ammonia and TDS. While the labels (n11–n39) are internal identifiers, they reflect varying degrees of water quality states—ranging from “very poor” to “excellent”—depending on the strength and nature of the fuzzy conditions.
- The list of rules is shown below:
- R1: IF pH is Low AND BOD is Low THEN water quality is n11.
- R2: IF pH is Low AND BOD is Medium THEN water quality is n12.
- R3: IF pH is Low AND BOD is High THEN water quality is n13.
- R4: IF pH is Medium AND BOD is Low THEN water quality is n14.
- R5: IF pH is Medium AND BOD is Medium THEN water quality is n15.
- R6: IF pH is Medium AND BOD is High THEN water quality is n16.
- R7: IF pH is High AND BOD is Low THEN water quality is n17.
- R8: IF pH is High AND BOD is Medium THEN water quality is n18.
- R9: IF pH is High AND BOD is High THEN water quality is n19.
- R10: IF DO is Low AND COD is Low THEN water quality is n21.
- R11: IF DO is Low AND COD is Medium THEN water quality is n22.
- R12: IF DO is Low AND COD is High THEN water quality is n23.
- R13: IF DO is Medium AND COD is Low THEN water quality is n24.
- R14: IF DO is Medium AND COD is Medium THEN water quality is n25.
- R15: IF DO is Medium AND COD is High THEN water quality is n26.
- R16: IF DO is High AND COD is Low THEN water quality is n27.
- R17: IF DO is High AND COD is Medium THEN water quality is n28.
- R18: IF DO is High AND COD is High THEN water quality is n29.
- R19: IF Ammonia is Low AND TDS is Low THEN water quality is n31.
- R20: IF Ammonia is Low AND TDS is Medium THEN water quality is n32.
- R21: IF Ammonia is Low AND TDS is High THEN water quality is n33.
- R22: IF Ammonia is Medium AND TDS is Low THEN water quality is n34.
- R23: IF Ammonia is Medium AND TDS is Medium THEN water quality is n35.
- R24: IF Ammonia is Medium AND TDS is High THEN water quality is n36.
- R25: IF Ammonia is High AND TDS is Low THEN water quality is n37.
- R26: IF Ammonia is High AND TDS is Medium THEN water quality is n38.
- R27: IF Ammonia is High AND TDS is High THEN water quality is n39.
- The membership values from the preceding layer are inputs to these nodes, and the output, which represents the rule’s strength, is the product of these values.
- The normalization layer takes the rules layer’s output and normalizes it. Each node in this layer computes the ratio of one rule’s firing strength to the sum of the firing strengths of all rules. This step converts the firing strengths of the rules into a probability distribution by ensuring that the output signals sum to 1.
- A defuzzification layer transforms the fuzzy classification results into a crisp output. The nodes in this layer usually apply a parameterized function to compute a weighted average, and the parameters of this function are adjusted during training.
- The final layer calculates the total output by aggregating all the incoming signals. The inputs and learned fuzzy rules culminate in the final water quality prediction at the “Result” node in this layer.
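The layer sequence described above can be illustrated with a short forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: it covers only the pH and BOD inputs with rules R1 to R9, and the bell parameters in `MF_PARAMS` and the crisp values in `CONSEQUENT` are hypothetical placeholders that a trained model would learn.

```python
from itertools import product

LEVELS = ("Low", "Medium", "High")

# Hypothetical (a, b, c) bell parameters: width, slope, centre. Not from the paper.
MF_PARAMS = {
    "pH": {"Low": (1.5, 2, 4.0), "Medium": (1.0, 2, 7.0), "High": (1.5, 2, 10.0)},
    "BOD": {"Low": (2.0, 2, 1.0), "Medium": (2.0, 2, 5.0), "High": (4.0, 2, 12.0)},
}

# Rules R1 to R9: every (pH level, BOD level) pair, labelled n11 to n19.
RULES = [
    (ph_level, bod_level, f"n1{k}")
    for k, (ph_level, bod_level) in enumerate(product(LEVELS, LEVELS), start=1)
]

# Hypothetical crisp consequent per label; a trained model would learn these.
CONSEQUENT = {label: 90.0 - 10.0 * k for k, (_, _, label) in enumerate(RULES)}


def bell(x, a, b, c):
    """Generalized bell membership function; returns a degree in (0, 1]."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def forward(ph, bod):
    """Run one sample through fuzzification, rules, normalization, and output."""
    # Fuzzification layer: membership degree of each input in each fuzzy set.
    mu_ph = {lvl: bell(ph, *MF_PARAMS["pH"][lvl]) for lvl in LEVELS}
    mu_bod = {lvl: bell(bod, *MF_PARAMS["BOD"][lvl]) for lvl in LEVELS}

    # Rules layer: firing strength is the product of the two membership degrees.
    strength = {label: mu_ph[p] * mu_bod[b] for p, b, label in RULES}

    # Normalization layer: each strength divided by the sum over all rules.
    total = sum(strength.values())
    normalized = {label: w / total for label, w in strength.items()}

    # Defuzzification and output layers: weighted average of the consequents.
    output = sum(normalized[label] * CONSEQUENT[label] for label in normalized)
    return normalized, output


normalized, score = forward(ph=6.6, bod=3.5)
print(max(normalized, key=normalized.get), round(score, 2))
```

Because the bell function never returns exactly zero, the normalization step cannot divide by zero in this sketch; a membership function with bounded support, such as a triangle, would need an explicit guard.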
3.5. Adaptive Neuro Fuzzy Inference System (ANFIS)
- Fuzzification: This initial step converts the crisp input variables into fuzzy values. In the context of water quality prediction, the input parameters are pH, DO, COD, BOD, AN, and SS.
- Rule base: A set of fuzzy if–then rules, derived from expert knowledge or empirical data analysis, is established to capture the relationship between the fuzzy input variables and the target output, the water quality index (WQI).
- Defuzzification: The final step in the ANFIS model converts the fuzzy output sets into crisp values, providing a specific prediction for the WQI. The centroid method is a common defuzzification technique that calculates the center of gravity of the aggregated fuzzy set to produce a precise output value.
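The centroid method mentioned above can be sketched on a discretized output axis. The example below is illustrative only: the triangular output sets `poor` and `good`, their break points, and the rule strengths 0.3 and 0.7 are hypothetical and are not taken from the paper.

```python
def triangle(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def centroid(xs, memberships):
    """Discrete centre of gravity: sum(x * mu(x)) / sum(mu(x))."""
    total = sum(memberships)
    if total == 0:
        raise ValueError("aggregated fuzzy set has zero area")
    return sum(x * m for x, m in zip(xs, memberships)) / total


# Output axis for the WQI, sampled at integer values from 0 to 100.
xs = list(range(101))

# Hypothetical output sets, each clipped at the firing strength of its rule
# and then aggregated with the maximum operator.
aggregated = [
    max(min(0.3, triangle(x, 0, 25, 50)), min(0.7, triangle(x, 50, 75, 100)))
    for x in xs
]

crisp_wqi = centroid(xs, aggregated)
print(round(crisp_wqi, 2))
```

The crisp value falls between the peaks of the two sets and is pulled toward the set whose rule fired more strongly.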
3.6. Artificial Neural Network (ANN)
3.7. Model Evaluation
3.8. Statistical Analysis
- Under the Central Limit Theorem, the sampling distribution of the correlation coefficient approximates normality because of the large sample size [36].
- Pearson correlation is used to identify linear correlations between water quality metrics, even where the data deviate considerably from normality.
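For reference, the Pearson coefficient used here can be computed directly from its definition. The following is a minimal sketch; the `bod`, `cod`, and `do` samples are made-up values chosen only to show the two extreme cases and are not drawn from the study dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("samples must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical readings in mg/L, for illustration only.
bod = [1.0, 2.0, 3.0, 4.0]
cod = [10.0, 20.0, 30.0, 40.0]
do = [8.0, 7.0, 6.0, 5.0]

r_bod_cod = pearson_r(bod, cod)  # perfectly linear, increasing
r_bod_do = pearson_r(bod, do)    # perfectly linear, decreasing
print(r_bod_cod, r_bod_do)
```

The coefficient only measures linear association, so a value near zero does not rule out a nonlinear relationship between two parameters.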
4. Results and Discussion
4.1. Preliminary Statistical Results Based on Data
- DO: The distribution shows a relatively tight grouping, indicating consistent DO levels across samples, with a few outliers suggesting instances of lower oxygen levels.
- BOD: BOD levels display a compact distribution, highlighting generally stable organic pollution levels, though outliers are present, indicating occasional higher pollution levels.
- COD: The COD boxplot reveals a slightly wider spread, suggesting more variability in the chemical pollutants within the water samples.
- SS: The distribution of SS shows a narrow interquartile range but with several outliers, indicating sporadic instances of a higher level of suspended solids.
- pH: The pH levels are concentrated around the neutral to slightly alkaline range, crucial for maintaining aquatic life health and water quality, with minimal outliers.
- AN: The distribution is relatively tight, with a few outliers indicating occasional spikes in nitrogen levels, which can result from agricultural runoff or industrial waste.
- DO: Higher values are generally better, with anything above 5 mg/L considered acceptable for most aquatic life.
- BOD and COD: Lower values are preferable as they indicate less organic pollution. BOD levels below 3 mg/L and COD levels below 10 mg/L are often considered clean for natural waters.
- SS: Lower levels are desired as high SS can affect aquatic life and water clarity. Acceptable levels can be below 25 mg/L.
- pH: A range of 6.5 to 9 is generally acceptable for freshwater systems to protect aquatic life.
- AN: Lower concentration of AN is preferable, with levels ideally below 0.5 mg/L for protection against eutrophication.
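The guideline values listed above can be turned into a simple screening check. The sketch below applies them to the mean values reported in the descriptive statistics table of this article. Treating the boundary values as acceptable is an assumption made here, and `LIMITS` and `within_guideline` are illustrative names.

```python
# (lower bound, upper bound) per parameter; None means unbounded on that side.
# Values follow the guideline list above; units are mg/L except for pH.
LIMITS = {
    "DO": (5.0, None),
    "BOD": (None, 3.0),
    "COD": (None, 10.0),
    "SS": (None, 25.0),
    "pH": (6.5, 9.0),
    "AN": (None, 0.5),
}


def within_guideline(parameter, value):
    """Return True if the value lies inside the guideline range (inclusive)."""
    low, high = LIMITS[parameter]
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


# Mean values from the descriptive statistics table.
means = {"DO": 6.84, "BOD": 3.55, "COD": 29.79, "SS": 21.80, "pH": 6.61, "AN": 0.35}

flagged = [name for name, value in means.items() if not within_guideline(name, value)]
print(flagged)  # ['BOD', 'COD']
```

On the reported means, only BOD and COD fall outside the guideline values, which indicates that organic and chemical loading are the main concerns on average.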
4.2. Water Quality Index Forecasting Analysis
- MSE and RMSE penalize larger errors more severely, which is useful when large deviations are critical.
- MAE gives a straightforward interpretation of the average error and is less sensitive to outliers than MSE and RMSE.
- MAPE expresses prediction accuracy as a percentage, making it intuitive for practical applications.
- R² (coefficient of determination) indicates the proportion of variance explained, providing a standardized goodness-of-fit indicator.
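The five metrics above follow from their standard definitions and can be computed as in the sketch below. The `actual` and `predicted` WQI values are hypothetical and serve only to exercise the function; they are not results from this study.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, MAPE (in percent), and R2 for paired samples."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero.
    mape = 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual))

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}


# Hypothetical WQI values, for illustration only.
actual = [80.0, 70.0, 90.0, 60.0]
predicted = [78.0, 71.0, 90.0, 63.0]

metrics = regression_metrics(actual, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because RMSE is the square root of MSE, the two reported values can be cross-checked: for NAFS, the square root of 1.678 is approximately 1.295, which matches the reported RMSE.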
4.3. Statistical Analysis Based on Principal Component Analysis (PCA) and Several Tests
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Damseth, S.; Thakur, K.; Kumar, R.; Kumar, S.; Mahajan, D.; Kumari, H.; Sharma, D.; Sharma, A.K. Assessing the impacts of river bed mining on aquatic ecosystems: A critical review of effects on water quality and biodiversity. HydroResearch 2024, 7, 122–130.
- Jan, F.; Min-Allah, N.; Düştegör, D. IoT Based Smart Water Quality Monitoring: Recent Techniques, Trends and Challenges for Domestic Applications. Water 2021, 13, 1729.
- Pham, Q.B.; Mohammadpour, R.; Linh, N.T.T.; Mohajane, M.; Pourjasem, A.; Sammen, S.S.; Anh, D.T.; Nam, V.T. Application of soft computing to predict water quality in wetland. Environ. Sci. Pollut. Res. 2021, 28, 185–200.
- Ibrahim, H.; Yaseen, Z.M.; Scholz, M.; Ali, M.; Gad, M.; Elsayed, S.; Khadr, M.; Hussein, H.; Ibrahim, H.H.; Eid, M.H.; et al. Evaluation and Prediction of Groundwater Quality for Irrigation Using an Integrated Water Quality Indices, Machine Learning Models and GIS Approaches: A Representative Case Study. Water 2023, 15, 694.
- Elsabagh, M.A.; Emam, O.E.; Gafar, M.G.; Medhat, T. Handling uncertainty issue in software defect prediction utilizing a hybrid of ANFIS and turbulent flow of water optimization algorithm. Neural Comput. Appl. 2023, 36, 4583–4602.
- Al-Adhaileh, M.H.; Alsaade, F.W. Modelling and Prediction of Water Quality by Using Artificial Intelligence. Sustainability 2021, 13, 4259.
- Aminu, I.I. A novel approach to predict Water Quality Index using machine learning models: A review of the methods employed and future possibilities. Glob. J. Eng. Res. 2022, 13, 26–37.
- Liang, J.; Liu, L. Prediction of Optimal Coagulant Dosage Based on FCM-ISSA-ANFIS Hybrid Model. Pol. J. Env. Stud. 2023, 32, 5171–5183.
- Rathnayake, N.; Rathnayake, U.; Dang, T.L.; Hoshino, Y. Water level prediction using soft computing techniques: A case study in the Malwathu Oya, Sri Lanka. PLoS ONE 2023, 18, e0282847.
- Shine, A.; Madhu, G. Water Quality Modelling of River Periyar Using Artificial Neural Network and Adaptive Neuro-Fuzzy Inference System. IOP Conf. Ser. Earth Environ. Sci. 2022, 1125, 012008.
- Choden, Y.; Chokden, S.; Rabten, T.; Chhetri, N.; Aryan, K.R.; Abdouli, K.M.A. Performance assessment of data driven water models using water quality parameters of Wangchu river, Bhutan. SN Appl. Sci. 2022, 4, 290.
- Dharani, D.L.; Jahnavi, S.; Yougender, Y.; Tanuja, M.; Yaswanth, M. Water Quality Prediction and Analysis Using Machine Learning. Int. J. Adv. Res. Sci. Commun. Technol. 2022, 45, 672–675.
- Olasoji, S.O.; Oyewole, N.O.; Abiola, B.; Edokpayi, J.N. Water Quality Assessment of Surface and Groundwater Sources Using a Water Quality Index Method: A Case Study of a Peri-Urban Town in Southwest, Nigeria. Environments 2019, 6, 23.
- Pappaka, R.K.; Nakkala, A.B.; Badapalli, P.K.; Gugulothu, S.; Anguluri, R.; Hasher, F.F.B.; Zhran, M. Machine Learning-Driven Groundwater Potential Zoning Using Geospatial Analytics and Random Forest in the Pandameru River Basin, South India. Sustainability 2025, 17, 3851.
- Chen, J.; Wei, X.; Liu, Y.; Zhao, C.; Liu, Z.; Bao, Z. Deep Learning for Water Quality Prediction—A Case Study of the Huangyang Reservoir. Appl. Sci. 2024, 14, 8755.
- Shahid, M.S.B.; Rifat, H.R.; Uddin, M.A.; Islam, M.M.; Mahmud, M.Z.; Sakib, M.K.H.; Roy, A. Hypertuning-Based Ensemble Machine Learning Approach for Real-Time Water Quality Monitoring and Prediction. Appl. Sci. 2024, 14, 8622.
- Li, Q.; He, J.; Mu, D.; Liu, H.; Li, S. Dissolved Oxygen Modeling by a Bayesian-Optimized Explainable Artificial Intelligence Approach. Appl. Sci. 2025, 15, 1471.
- Lokman, A.; Ismail, W.Z.W.; Aziz, N.A.A. A Review of Water Quality Forecasting and Classification Using Machine Learning Models and Statistical Analysis. Water 2025, 17, 2243.
- Moeinzadeh, H.; Yong, K.T.; Withana, A. A critical analysis of parameter choices in water quality assessment. Water Res. 2024, 258, 121777.
- Zhu, M.; Wang, J.; Yang, X.; Zhang, Y.; Zhang, L.; Ren, H.; Wu, B.; Ye, L. A review of the application of machine learning in water quality evaluation. Eco Environ. Health 2022, 1, 107–116.
- City Population. Klang (District, Malaysia)—Population Statistics, Charts, Map and Location. Available online: https://www.citypopulation.de/en/malaysia/admin/selangor/1002__klang/ (accessed on 10 March 2024).
- Saif, M.A.M.; Hussin, N.; Husin, M.M.; Alwadain, A.; Chakraborty, A. Determinants of the Intention to Adopt Digital-Only Banks in Malaysia: The Extension of Environmental Concern. Sustainability 2022, 14, 11043.
- DOE. National Water Quality Standards and Water Quality Index—Department of Environment. Available online: https://www.doe.gov.my/en/national-river-water-quality-standards-and-river-water-quality-index/ (accessed on 12 April 2024).
- Chia, S.L.; Chia, M.Y.; Koo, C.H.; Huang, Y.F. Integration of advanced optimization algorithms into least-square support vector machine (LSSVM) for water quality index prediction. Water Supply 2022, 22, 1951–1963.
- Makinde, A. Optimizing Time Series Forecasting: A Comparative Study of Adam and Nesterov Accelerated Gradient on LSTM and GRU Networks Using Stock Market Data. September 2024. Available online: https://arxiv.org/pdf/2410.01843v1 (accessed on 13 May 2025).
- Meenakshi, P.; Ambiga, K. Prediction of the Water Quality Index Using ANFIS Modelling. J. Pharm. Negat. Results 2022, 13, 1289–1298.
- Trach, R.; Trach, Y.; Kiersnowska, A.; Markiewicz, A.; Lendo-Siwicka, M.; Rusakov, K. A Study of Assessment and Prediction of Water Quality Index Using Fuzzy Logic and ANN Models. Sustainability 2022, 14, 5656.
- Menapace, A.; Zanfei, A.; Righetti, M. Tuning ANN Hyperparameters for Forecasting Drinking Water Demand. Appl. Sci. 2021, 11, 4290.
- Wu, J.; Wang, Z. A Hybrid Model for Water Quality Prediction Based on an Artificial Neural Network, Wavelet Transform, and Long Short-Term Memory. Water 2022, 14, 610.
- Uddin, M.G.; Nash, S.; Diganta, M.T.M.; Rahman, A.; Olbert, A.I. Robust machine learning algorithms for predicting coastal water quality index. J. Environ. Manag. 2022, 321, 115923.
- Patel, V.; Shukla, H.; Raval, A. Enhancing Botnet Detection With Machine Learning And Explainable AI: A Step Towards Trustworthy AI Security. Int. J. Multidiscip. Res. 2025, 7, 2.
- Rajapriya, N.; Kawajiri, K. Deep Learning for GWP Prediction: A Framework Using PCA, Quantile Transformation, and Ensemble Modeling. November 2024. Available online: https://arxiv.org/pdf/2411.19124 (accessed on 15 August 2025).
- Michelucci, U. Correlation and Linear Regression. In Statistics for Scientists; Springer: Cham, Switzerland, 2025; pp. 137–144.
- Lokman, A.; Ismail, W.Z.W.; Aziz, N.A.A. Water Quality Evaluation and Analysis by Integrating Statistical and Machine Learning Approaches. Algorithms 2025, 18, 494.
- Kermani, M.A.M.A.; Mohammadi, N.; Ghasemi, H.; Sahebi, H.; Gilani, H. Enhancing Gas Distribution Network Resilience Utilizing a Mixed Social Network Analysis-Simulation Approach: Application of Artificial Intelligence. IEEE Access 2025, 13, 6924–6944.
- Arachige, D.; Researcher, I. The Dissonance Between Statistical Theory and Practice: The Case of Central Limit Theorem and The Sample Size. ResearchGate 2025, 1, 1–14.
- Mathew, S.; Idi, D.; Stephen, M. Modeling and Inference of Insurance Sector Development on Nigeria Economic Growth African Multidisciplinary Modeling and Inference of Insurance Sector Development on Nigeria Economic Growth. J. Sci. Artif. Intell. 2024, 1, 249–263.
- Mikolajczyk, A.P.; Fortela, D.L.B.; Berry, J.C.; Chirdon, W.M.; Hernandez, R.A.; Gang, D.D.; Zappi, M.E. Evaluating the Suitability of Linear and Nonlinear Regression Approaches for the Langmuir Adsorption Model as Applied toward Biomass-Based Adsorbents: Testing Residuals and Assessing Model Validity. Langmuir 2024, 40, 20428–20442.
- Baek, S.S.; Pyo, J.; Chun, J.A. Prediction of Water Level and Water Quality Using a CNN-LSTM Combined Deep Learning Approach. Water 2020, 12, 3399.
- Rahul, G.D.; Harigovindan, V.P.; Rasheed, A.H.K.P.; Amrtha, B. Attention-driven LSTM and GRU deep learning techniques for precise water quality prediction in smart aquaculture. Aquac. Int. 2024, 32, 8455–8478.
- Huang, T.; Jiang, Y.; Gan, R.; Wang, F. A novel water quality prediction model based on BiMKANsDformer. Environ. Sci. 2025, 11, 590–603.
- Tejaswi, T.; Manoj, C.; Naidu, P.V.D.; Santhosh, T.; Akhil, P.V.S.; Ganesan, V. Nexus of Water Quality prediction by ANN. In Proceedings of the 2022 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems, Chennai, India, 15–16 July 2022.
- Chen, H.; Yang, J.; Fu, X.; Zheng, Q.; Song, X.; Fu, Z.; Wang, J.; Liang, Y.; Yin, H.; Liu, Z.; et al. Water Quality Prediction Based on LSTM and Attention Mechanism: A Case Study of the Burnett River, Australia. Sustainability 2022, 14, 13231.
- Fang, Z.; Wang, Y.; Peng, L.; Hong, H. Predicting flood susceptibility using LSTM neural networks. J. Hydrol. 2021, 594, 125734.
- Anand, M.V.; Sohitha, C.; Saraswathi, G.N.; Lavanya, G.V. Water quality prediction using CNN. J. Phys. Conf. Ser. 2023, 2484, 012051.
- Zhou, L.; Zou, H. Cross-Fitted Residual Regression for High-Dimensional Heteroscedasticity Pursuit. J. Am. Stat. Assoc. 2023, 118, 1056–1065.
- de Souza, R.S.; Borges, E.M. Teaching Descriptive Statistics and Hypothesis Tests Measuring Water Density. J. Chem. Educ. 2023, 100, 4438–4448.
Water Parameter | Mean | Standard Deviation | Coefficient of Variation (%) | Min | Max
---|---|---|---|---|---|
DO | 6.84 | 1.22 | 17.88 | 0.00 | 14.88 |
BOD | 3.55 | 1.34 | 37.87 | 0.50 | 17.00 |
COD | 29.79 | 12.84 | 43.11 | 1.00 | 110.00 |
SS | 21.80 | 23.20 | 106.43 | 0.00 | 1280 |
pH | 6.61 | 0.82 | 12.48 | 0.00 | 8.44 |
AN | 0.35 | 0.36 | 102.59 | 0.009 | 10.70 |
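The coefficient of variation column in the table above is the standard deviation expressed as a percentage of the mean. The short sketch below recomputes it from the rounded table entries; because those entries are rounded to two decimals, the recomputed values agree with the reported column only approximately.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation in percent: 100 * standard deviation / mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * std_dev / mean


# (mean, standard deviation) pairs copied from the table above.
table = {
    "DO": (6.84, 1.22),
    "BOD": (3.55, 1.34),
    "COD": (29.79, 12.84),
    "SS": (21.80, 23.20),
    "pH": (6.61, 0.82),
    "AN": (0.35, 0.36),
}

cv = {name: coefficient_of_variation(sd, mean) for name, (mean, sd) in table.items()}
for name, value in cv.items():
    print(f"{name}: {value:.2f}%")
```

A coefficient above 100%, as for SS and AN, means the standard deviation exceeds the mean, which is consistent with the strongly skewed, outlier-heavy distributions described for these two parameters.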
Model | Temporal Capability | Interpretability | Training Efficiency | Data Requirement |
---|---|---|---|---|
ANN [42] | Low | Medium | High | Moderate |
ANFIS [26] | Low | High | Moderate | Moderate |
LSTM [43] | High | Low | Slow | High |
GRU [40] | High | Low | Moderate | High |
CNN-LSTM [39] | High | Low | Slow | High |
Transformer [41] | Very High | Low | High | Very High |
NAFS (proposed model) | Medium | High | High | Moderate |
Evaluation Metric | ANN [42] | ANFIS [26] | NAFS |
---|---|---|---|
Mean Squared Error (MSE) | 14.208 | 4.815 | 1.678 |
Mean Absolute Error (MAE) | 1.687 | 0.405 | 0.257 |
Root Mean Squared Error (RMSE) | 3.769 | 2.194 | 1.295 |
Mean Absolute Percentage Error (MAPE) | 2.88% | 0.51% | 0.33% |
Coefficient of Determination (R²) | 0.822 | 0.864 | 0.943 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lokman, A.; Wan Ismail, W.Z.; Ab Aziz, N.A.; Ghazali, A.K. Water Quality Index (WQI) Forecasting and Analysis Based on Neuro-Fuzzy and Statistical Methods. Appl. Sci. 2025, 15, 9364. https://doi.org/10.3390/app15179364