The Development of a River Quality Prediction Model That Is Based on the Water Quality Index via Machine Learning: A Review

Shaheed, Hassan; Zawawi, Mohd Hafiz; Hayder, Gasim

doi:10.3390/pr13030810

by Hassan Shaheed ¹,²,*, Mohd Hafiz Zawawi ¹ and Gasim Hayder ¹,³

¹ Department of Civil Engineering, College of Engineering, Universiti Tenaga Nasional (UNITEN), Kajang 43000, Selangor Darul Ehsan, Malaysia
² Ministry of Planning of Republic of Iraq, Baghdad P.O. Box 13032, Iraq
³ Institute of Energy Infrastructure (IEI), Universiti Tenaga Nasional (UNITEN), Kajang 43000, Selangor Darul Ehsan, Malaysia
* Author to whom correspondence should be addressed.

Processes 2025, 13(3), 810; https://doi.org/10.3390/pr13030810

Submission received: 9 January 2025 / Revised: 14 February 2025 / Accepted: 5 March 2025 / Published: 10 March 2025

(This article belongs to the Section AI-Enabled Process Engineering)

Abstract

This review discusses and evaluates research articles that incorporate machine learning (ML) algorithms into the water quality index (WQI) to improve the prediction of river water quality. It finds that newer methodologies such as LSTM, CNNs, and random forest perform better than earlier methods, as they offer real-time predictions, operational cost savings, and the capacity to handle big data. In addition to case studies and real-life applications, this review identifies areas that need further work: the impacts of climate change, ways of enhancing data representation, and concerns about ethics and data privacy. Furthermore, it outlines issues such as data scarcity, model explainability, and computational overhead in real-world ML applications, together with strategies to address them preemptively and so improve the versatility of data-driven models in various domains. The key recommendations focus on the use of complex approaches, interdisciplinarity, and the involvement of stakeholders. Through detailed comparisons and specific technical and policy discussions, this review offers a broad view of how to enhance the usefulness of the predictive technologies that will be central to environmental forecasting.

Keywords:

water quality index; machine learning; river quality prediction; environmental monitoring

1. Introduction

Existing knowledge on river water quality assessment covers traditional approaches and the water quality index (WQI), a holistic measure. Traditional approaches entail direct sampling and subsequent laboratory analysis, which can be expensive and time-consuming (Essien-Ibok, M.A. et al., 2024 [1]). The WQI, in contrast, is a relatively simple tool compared with complex models and equations: it presents multiple aspects of water quality as a single, easy-to-understand figure for both experts and the general public. This approach is gradually being augmented by technologies such as remote sensing and artificial intelligence to improve the monitoring process (Li et al., 2023 [2]).

Conventional techniques involve taking water samples that are later tested in a laboratory to determine parameters such as acidity, dissolved oxygen, and the presence of chemical substances. Even though these methods are quite precise, they are time-consuming and can demand considerable resources (Essien-Ibok, M.A. et al., 2024 [1]). The single-factor assessment method, in which only the worst-rated parameter is taken into consideration, has been deemed to offer an incomplete picture of water quality (Li et al., 2023 [2]).

The WQI integrates several aspects of water quality into one value, which makes it easier to compare levels of water pollution. It has commonly been used to evaluate the health of water bodies, including the Cross-River System, which has been shown to exhibit different degrees of pollution [1]. The WQI is particularly useful for informing the public and policy-making bodies about the status of water quality and for supporting the formulation of management strategies [3]. As shown by its application to the Nakdong River, a real-time WQI makes water quality data easier to read and requires no additional interpretation, even for non-specialists [4].
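As a rough illustration of how several parameters collapse into one score, the sketch below computes a weighted-average index from per-parameter sub-indices. The parameter names, the weights, and the 0 to 100 sub-index scale are illustrative assumptions, not values taken from the reviewed studies; real WQI schemes define their own rating curves and weights.

```python
def weighted_wqi(sub_indices, weights):
    """Weighted arithmetic mean of sub-indices (each assumed to be scaled 0-100)."""
    if set(sub_indices) != set(weights):
        raise ValueError("sub-indices and weights must cover the same parameters")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(sub_indices[p] * weights[p] for p in sub_indices) / total_weight


# Hypothetical sub-index scores (0 = worst, 100 = best) and relative weights.
sub_indices = {"dissolved_oxygen": 80.0, "pH": 90.0, "turbidity": 60.0}
weights = {"dissolved_oxygen": 0.5, "pH": 0.2, "turbidity": 0.3}

wqi = weighted_wqi(sub_indices, weights)
print(f"WQI = {wqi:.1f}")
```

Dividing by the total weight means the weights do not have to sum to one, which is convenient when a parameter is dropped from the index.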

Other procedures, such as remote sensing and artificial intelligence models (for example, those applied to the Hudson River), are useful as a complement to standard procedures. These technologies can estimate the WQI from satellite images and AI models, reducing the costs associated with monitoring water quality [5]. Although conventional techniques remain important for obtaining detailed information, combining the WQI with modern technologies is more effective for monitoring river water quality. These innovations not only improve the efficiency and availability of water quality assessments but also contribute to management and policy-making processes [1,4].

Several disadvantages of conventional water quality monitoring have been identified. The labor and cost of water sampling are major setbacks, given high operational costs and limited resources. The resulting data are sparse both spatially and temporally and are often not available in real time, which presents a major challenge to water quality management. These problems can be addressed with modern sensors and new developments in machine learning (Li et al., 2023 [2]).

1.1. Complexity and Cost of Manual Sampling

Conventional water quality assessment is time-consuming and expensive because it requires a large number of sampling stations and trained personnel to collect and analyze samples [6].

These costs stem from the need for numerous site visits, equipment, and the purchase of reserve electric power and monitoring devices [6].

1.2. Limited Spatial and Temporal Coverage

Monitoring stations are few, and some parts of the world cannot be accessed; data collection is therefore limited in each region, which makes it hard to detect regional differences in water quality [6].

Manual sampling is performed at set time intervals, so it misses short-term events and periodic changes such as seasonal fluctuations in water quality [6].

1.3. Lack of Real-Time Monitoring

Conventional procedures lengthen data analysis and therefore produce outdated information, so management reacts to water quality problems after the fact instead of responding proactively [7].

Real-time monitoring systems, including wireless sensor networks (WSNs), can provide timely information, thereby reducing human effort and increasing flexibility [8].

1.4. Data Quality and Consistency

Manual data collection exposes research to human error and inconsistency, as methods differ from one region to another [9].

Incomplete datasets, caused by limited resources or breakdowns in monitoring equipment, also complicate long-term monitoring and trend analysis [9].

1.5. Potential of Machine Learning and Advanced Technologies

Machine learning models can help make water quality predictions to supplement and possibly replace manual sampling and can also fill in the gaps of unobserved locations in terms of spatial–temporal coverage [8].

Table 1 summarizes the key reviewed studies on water quality prediction using machine learning [9].

1.6. Key Machine Learning Algorithms

❖: Convolutional neural networks and gated recurrent units: These models achieved a validation accuracy of 97.86% when used to monitor the Vaigai River, which was more accurate than other state-of-the-art models. The approach incorporates real-time data collection and feedback and raises immediate alarms, which support early intervention and sound management decisions [10].
❖: Long short-term memory (LSTM): LSTM has been shown to be superior to other models, such as support vector regression and random forest, for estimating the British Columbia water quality index (BCWQI) when some parameters are missing, with a coefficient of determination of 0.91 [11].
❖: Gradient boosting and polynomial regression: these algorithms are helpful in predicting the water quality index (WQI), with the gradient boosting model reaching a mean absolute error of 1.8074 [12].
❖: Random forest and extreme gradient boosting: these models were used for classification tasks, with gradient boosting reaching an accuracy of 99.50% for water quality class prediction [13].
❖: Performance comparison of key algorithms: Comparisons of key algorithms in previous studies indicate that machine learning models have considerable potential for estimating water quality index parameters, including pH, turbidity, and dissolved oxygen. The predictive models used in previous studies include CNNs, LSTMs, and RF, which differ in the tasks they suit best.
❖: Convolutional neural networks (CNNs): CNNs can be applied where spatial and image-based analyses of water quality are possible, and they can learn complex patterns in data; hence, they may be effective when large training datasets are available [14].
❖: Long short-term memory networks (LSTMs): These are well suited to time series data, which makes them a good fit for identifying changes in water quality over time. They capture long-term dependencies well but can be computationally expensive [14].
❖: Random forest (RF): This is a strong ensemble method that works well even when the relationships between the independent variables and the dependent variable are nonlinear. Its predictions of WQI parameters have proven accurate, with published research reporting R² values of 0.98 (Khoi et al., 2022b [14]; Asadollah et al., 2021 [25]).

In terms of accuracy metrics, for example, an XGBoost model achieved an R² value of 0.989, which points to high predictive ability [14].

With respect to error metrics, the mean absolute error (MAE) and root mean square error (RMSE) offer a more general way to measure model performance; the lower the values are, the better the model is [26].
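The metrics named above can be written out in a few lines. The following is a minimal sketch using their standard definitions; the observed and predicted WQI values are made-up numbers for illustration only.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted WQI scores.
observed = [70.0, 80.0, 90.0, 60.0]
predicted = [72.0, 78.0, 91.0, 63.0]

print(mae(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

MAE and RMSE are expressed in the units of the target, whereas R² is unitless, which is why studies often report an error metric alongside R².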

Although current machine learning models enhance the ability to predict water quality, statistical models remain applicable in certain situations, e.g., where the data or the relationships between constituents are not very complex. Model interpretability remains a crucial issue when choosing a model that is both complex enough to learn and simple enough to explain.

1.7. Advantages and Limitations

ML models provide real-time predictions of outcomes, rely little on laboratory experiments, and even permit approximations of missing parameters when the dataset is compound [11] (Khaskheli et al., 2024 [12]). They also offer a high level of accuracy and flexibility when analyzing big datasets. However, some models demand considerable computational power and require data preprocessing techniques, such as normalization and imputation, to yield the best results [13].
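The two preprocessing steps mentioned above can be sketched as follows. Mean imputation and min-max scaling are only one common choice for each step, and the turbidity readings are hypothetical; the reviewed studies may have used other techniques.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical turbidity readings with one missing sample.
turbidity = [2.0, None, 6.0, 4.0]
filled = impute_mean(turbidity)
scaled = min_max_scale(filled)
print(filled, scaled)
```

In practice the imputation mean and the scaling bounds would be computed on the training split only and then reused on new data, so that information from the test period does not leak into the model.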

Thus, although ML models yield significant improvements over conventional approaches, concerns such as data issues and the demand for significant computing facilities need to be resolved to unlock the full potential of the technology. Nevertheless, the incorporation of ML into water quality management systems has improved environmental analysis and management and has helped respond to emerging problems through intelligent and adaptable strategies.

The main objective of this research is to create a model for river quality estimation that combines water quality index computation with machine learning (ML) methods. The proposed research aims to improve the accuracy of water quality prediction through a combined analysis of WQI calculations with LSTM, CNN, and random forest ML techniques. It investigates present techniques for water quality evaluation by analyzing traditional sampling practices alongside WQI-based assessment methods.

❖: This study compares different ML algorithms for real-time water quality prediction.
❖: This study analyzes four significant challenges: limited data availability, difficulty in interpreting models, high computational load, and ethical issues.
❖: This research examines machine learning applications in practical scenarios and shows their cost-saving and large-scale data-handling abilities.
❖: The authors propose future recommendations that combine ML with climate change data while improving data representation and handling privacy concerns.

2. Summary of Original Article

Recent developments in ML have greatly improved the observation of water quality, since classical methods are expensive, cover only a limited number of locations, and have a low temporal resolution. Combining ML techniques with the WQI has emerged as a plausible solution for real-time, accurate water health evaluation. This synthesis discusses different kinds of ML models and their relevance to the understanding and assessment of water quality indicators.

2.1. Machine Learning Models for Water Quality Prediction

Neural Networks and Fuzzy Methods: Fuzzy approaches integrated with deep learning neural networks, including LSTM, outperform other methods for analyzing and predicting water pollution trends. These models can handle spatial data and so provide insight into the pollution levels of a particular region [15].

Real-time Assessment with WQI Models: LSTM models, among other machine learning models, have been applied and tested for their ability to estimate the BCWQI. The results indicate that the models can estimate the BCWQI in real time even if chemical oxygen demand (COD) and total phosphorus (TP) are unavailable. This approach makes the evaluation of water quality more efficient and less time-consuming [11].

Supervised Learning Techniques: both gradient boosting and polynomial regression have been shown to be useful for WQI forecasting, whereas the multilayer perceptron (MLP) has demonstrated high accuracy in the classification of water quality [12].

2.2. Advanced Monitoring Systems

Convolutional Neural Networks and Gated Recurrent Units: these models have been used to create extensive monitoring frameworks that aggregate streams of various water quality parameters in real time, with high validation accuracy, to enable intervention [10].

Edge Computing and IoT Integration: combining edge computing and the IoT with ML algorithms, especially LSTM, makes it possible to monitor and predict water quality in aquaculture in real time so that losses due to polluted water can be avoided [27].

2.3. Feature Selection and Model Optimization

Deep Learning with Feature Selection: Combining deep learning models with feature selection methods increases the accuracy of water quality analysis when the random forest and AdaBoost methods are applied. This approach shows how feature extraction improves model performance.

Parameter Optimization: Methods such as grid search are used to tune the parameters of ML models, with a positive impact on WQI predictions. Random forest and gradient boosting have been found to be very effective for classification as well as regression analysis [13].
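Grid search itself is simple to express: every candidate setting is tried, each is scored on held-out data, and the best one is kept. The sketch below is a minimal illustration under stated assumptions; it tunes the neighbor count k of a small k-nearest-neighbor regressor, which stands in for the random forest and gradient boosting models discussed above, and the data points are invented.

```python
def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k nearest training points (single feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def validation_mae(train_x, train_y, val_x, val_y, k):
    """Mean absolute error on the held-out validation set for a given k."""
    errors = [abs(knn_predict(train_x, train_y, x, k) - y) for x, y in zip(val_x, val_y)]
    return sum(errors) / len(errors)


def grid_search(train_x, train_y, val_x, val_y, k_grid):
    """Score every candidate k and return the one with the lowest validation MAE."""
    scores = {k: validation_mae(train_x, train_y, val_x, val_y, k) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical data: one predictor (e.g., dissolved oxygen in mg/L) against a WQI score.
train_x = [2.0, 4.0, 6.0, 8.0, 10.0]
train_y = [30.0, 50.0, 65.0, 80.0, 90.0]
val_x = [3.0, 7.0, 9.0]
val_y = [40.0, 72.0, 85.0]

best_k, scores = grid_search(train_x, train_y, val_x, val_y, k_grid=[1, 2, 3])
print(best_k, scores)
```

With several hyperparameters the grid becomes the Cartesian product of the candidate lists, so the cost grows quickly; cross-validation would normally replace the single hold-out split used here.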

However, machine learning also has several limitations; for example, collecting enough data to train algorithms is a real problem, and the training process itself is challenging. Applying ML approaches alongside conventional methods can improve the effectiveness of water quality monitoring by blending reliability with accuracy. Such a combined approach reconciles traditional and innovative methods and offers a reasonable way to address water quality questions.

2.4. Key Themes in the Literature

The WQI is used to evaluate the general quality of water in any watercourse by integrating several parameters. Owing to its simplicity and ease of comprehension, it is useful for those who must decide whether water is appropriate for a given purpose. Nevertheless, conventional WQI approaches have several disadvantages that arise from manual data acquisition, including slow monitoring and prediction. There is growing interest in integrating ML into WQI models to make them more accurate and to provide real-time results.

2.5. Machine Learning Integration with WQI

Real-time Assessment: Several machine learning approaches are used to estimate the WQI from monitored parameters, including long short-term memory (LSTM), which can work in real time without measurements of some parameters, such as COD and TP. The LSTM model reached a coefficient of determination of 0.91, proving more efficient than the SVR and RF models for the real-time estimation of the WQI in Lake Päijänne, Finland [11].

Predictive Modeling: Several ML algorithms, such as the GB, RF, and MLP algorithms, have been used in studies to estimate the WQI. The MLP model was found to have a high discriminator power of 99.8% in determining the WQI from given input variables, thus being suitable for regression tasks [13].

Artificial Neural Networks (ANNs): ANNs have been applied to model the WQI faster and more precisely than conventional approaches. For the Klang River in Malaysia, ANN-based computational procedures also helped reduce the time and cost of computing the WQI [28].

2.6. Real-Time Monitoring

With IoT systems and sensors that measure temperature, turbidity, and pH, the WQI can be calculated continuously. These systems provide real-time information on water pollution and thus enable the monitoring of water quality in real time [17].
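A continuous sensor feed of this kind is typically smoothed before an alarm is raised, so that a single noisy reading does not trigger it. The sketch below shows one way to do this with a rolling mean; the class name, the window length, and the 6.5 to 8.5 pH band are illustrative assumptions rather than details from the cited systems.

```python
from collections import deque


class RollingMonitor:
    """Keep the last `window` readings and flag when their mean leaves [low, high]."""

    def __init__(self, window, low, high):
        self.readings = deque(maxlen=window)  # old readings drop out automatically
        self.low = low
        self.high = high

    def update(self, value):
        """Add a reading; return (rolling_mean, alert_flag)."""
        self.readings.append(value)
        mean = sum(self.readings) / len(self.readings)
        return mean, not (self.low <= mean <= self.high)


# Hypothetical pH stream in which the water gradually becomes too alkaline.
monitor = RollingMonitor(window=3, low=6.5, high=8.5)
stream = [7.0, 7.2, 9.0, 9.4, 9.6]
alerts = [monitor.update(v)[1] for v in stream]
print(alerts)
```

The single reading of 9.0 does not raise an alert on its own; the alert fires once the rolling mean itself leaves the band, which trades a short delay for fewer false alarms.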

3. Challenges and Considerations

❖: Model interpretability: Many ML models perform accurately, but their results cannot be easily explained. This problem has been addressed by applying methods such as Shapley additive explanations (SHAP), which explain the effects of different parameters on the WQI [29].
❖: Parameter variability: The quality of water from different sources is not constant; thus, source-specific treatments are needed. For example, pH and total dissolved solids (TDS) levels differ greatly between water sources, which calls for specific strategies [30].
❖: Ethical considerations and data privacy: The application of the IoT and big data in water quality management raises several ethical questions concerning data ownership, privacy, and equitable access to the technology, and answering them is crucial to its advancement. These issues must not be taken lightly if communities are to have confidence in the proper implementation of such measures. Ownership of data gathered through IoT devices is sometimes unclear, particularly when the data are personal or sensitive, which raises questions about who holds these rights and how the data can be utilized [18]. In wastewater-based epidemiology (WBE), for example, who owns the genetic data obtained from wastewater remains an open ethical concern, which suggests that guidelines and policies should be developed further [18]. Privacy violations can arise from the massive flow of information from the IoT, which requires effective management. Ethical practices need to be adopted in the development of machine learning to protect such information and make the utilization of the data more transparent.

Present privacy laws are generally reactive, meaning that they may not offer adequate protection against new invasions of privacy as technology continues to evolve [18].

Inequality is bound to worsen if IoT technologies are not universally deployed, because low-resource communities would be left out. In particular, efforts to improve IoT-based water quality measurement should address the needs of all communities for sustainable development [31].

Combining these monitoring programs with citizen science initiatives can help fill this gap by including local communities in data collection [32].

Addressing these ethical issues creates opportunities to carve out more effective paradigms, specifically for SES deployment, that take into account the values and norms of the communities most affected by such technologies. Solving these issues is critical to building trust and ensuring fairness when the IoT and big data are used for better water quality monitoring.

AI and ML solutions for environmental monitoring present opportunities, but there is a risk that they amplify the biases in the selected and available data. This problem is rooted in the ability of algorithms to reproduce existing injustice and to fail in ways that discriminate against minorities.

The following points discuss aspects of this concern.

❖: Data representation and bias: Machine learning that relies on non-representative datasets can generate incorrect predictions, as studies on drinking water quality in California have found. There, modeling decisions strongly affected the demographic distribution of false negatives [33]. A lack of diverse data hampers performance in certain areas or for certain groups, thus perpetuating current disparities in environmental surveillance and resource distribution [34].
❖: Impact of data drift: A critical issue with AI/ML models is that they are not robust to data shifts, so performance can degrade as the environment changes. When the training data are not current, the models carry outdated patterns forward, making the disparities in environmental monitoring even more entrenched [33].
❖: Recommendations for mitigation: To address these biases, researchers call for the proper verification of algorithms and the inclusion of datasets from diverse populations and settings (Karasaki, S. et al., 2024 [34]; Chakraborty, 2024 [35]).
❖: Frameworks: Frameworks can be used to recognize environmental AI problems and how AI can reduce dataset bias [35].
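The data drift described in the list above can be monitored with very simple statistics. The sketch below flags a batch of new readings when its mean has moved far from the training mean, measured in training standard deviations. The function names, the threshold of 2.0, and the turbidity values are illustrative assumptions; production systems often use richer tests than a mean comparison.

```python
import statistics


def mean_shift_score(reference, recent):
    """Absolute shift of the recent mean, in units of the reference standard deviation."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)  # sample standard deviation
    if ref_std == 0:
        raise ValueError("reference data has zero variance")
    return abs(statistics.mean(recent) - ref_mean) / ref_std


def has_drifted(reference, recent, threshold=2.0):
    """Flag drift when the standardized mean shift exceeds the threshold."""
    return mean_shift_score(reference, recent) > threshold


# Hypothetical turbidity values: training period versus two later batches.
training_turbidity = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
stable_batch = [5.0, 5.5, 4.5]
shifted_batch = [9.0, 10.0, 11.0]

print(has_drifted(training_turbidity, stable_batch))
print(has_drifted(training_turbidity, shifted_batch))
```

A flagged batch does not by itself prove the model is wrong, but it is a reasonable trigger for re-validating the model or retraining on more recent data.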

Although AI/ML technologies carry this potential for bias, they can supplement environmental monitoring with effective solutions, including for monitoring the quality of collected data. Their effectiveness, however, depends on the quality and sampling of the data used to develop them.

Despite the potential benefits of integrating machine learning with WQI models for real-time assessment and projection, important issues such as model interpretability and parameter variability must be considered. Addressing them improves not only the reliability of water quality assessments but also water quality management through policies related to environmental health.
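To make the interpretability point concrete, the sketch below computes exact Shapley values for a toy three-input scoring function by averaging each input's marginal contribution over all input orderings. This illustrates the idea that SHAP builds on; it is not the SHAP library, which relies on approximations for larger models. The model, the feature names, and the baseline values are all hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley attributions: average marginal contribution over all feature orderings."""
    features = list(instance)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)       # start with every feature at its baseline value
        previous_output = model(current)
        for f in order:
            current[f] = instance[f]   # switch feature f to its actual value
            output = model(current)
            totals[f] += output - previous_output
            previous_output = output
    return {f: totals[f] / len(orderings) for f in features}


def toy_wqi_model(x):
    """Hypothetical scoring function with an interaction between turbidity and pH deviation."""
    return (50.0 + 4.0 * x["dissolved_oxygen"] - 2.0 * x["turbidity"]
            - 0.5 * x["turbidity"] * x["ph_deviation"])


baseline = {"dissolved_oxygen": 5.0, "turbidity": 5.0, "ph_deviation": 0.0}
instance = {"dissolved_oxygen": 8.0, "turbidity": 3.0, "ph_deviation": 2.0}

phi = shapley_values(toy_wqi_model, instance, baseline)
print(phi)
```

The attributions always sum to the difference between the model output for the instance and for the baseline, which is what lets them be read as each parameter's share of the predicted change in the score. Enumerating all orderings is only feasible for a handful of inputs.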

The reviewed articles on machine learning (ML) applications in water quality prediction revealed several overlooked aspects that warrant deeper critique: the incorporation of socioeconomic factors, the robustness of ML models, and the possibility of applying such models in developing countries where data are scarce.

Socioeconomic factors play a considerable role in water quality management but are not well incorporated into ML models.

Including variables such as population density, industrial activity, and land use could increase the models’ accuracy, usefulness, and applicability [36].

An intervention approach to water quality issues that makes use of socioeconomic variables might therefore prove more useful.

❖: Robustness to outlier data: Most of the reviewed papers do not pay proper attention to the effect of outliers, which distort model results [19]. Sophisticated methods such as isolation forest and kernel density estimation have been well received for identifying outliers, but they have not been used extensively [19]. Resilience of model performance to noisy inputs is important because such inputs are present in real applications.
❖: Limitations of applicability of models to developing areas: The reviewed models tend to use large datasets, which are difficult to obtain in developing parts of the world [37]. New strategies, including sparse dataset modeling, must be introduced to improve the predictive power of models in these regions. This matters most for regions whose data are severely constrained, since closing this gap could enhance water quality management.
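Of the outlier methods named in the list above, kernel density estimation is compact enough to sketch directly: a reading is flagged when the density estimated from all the other readings is very low at its position. The bandwidth, the density floor, and the pH values below are illustrative assumptions, and this one-dimensional version is a simplification of what a study would apply to multivariate data.

```python
import math


def gaussian_kde_density(data, x, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at point x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)


def flag_low_density(data, bandwidth, density_floor):
    """Return indices whose leave-one-out density falls below density_floor."""
    flagged = []
    for i, x in enumerate(data):
        others = data[:i] + data[i + 1:]  # exclude the point itself
        if gaussian_kde_density(others, x, bandwidth) < density_floor:
            flagged.append(i)
    return flagged


# Hypothetical pH readings with one implausible value.
ph_readings = [7.1, 7.0, 7.3, 6.9, 7.2, 11.5, 7.1]
outliers = flag_low_density(ph_readings, bandwidth=0.3, density_floor=0.05)
print(outliers)
```

Leaving each point out of its own density estimate matters: otherwise an isolated reading would always contribute a kernel peak at its own location and could escape detection.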

4. Challenges in Traditional River Water Monitoring

Conventional methods for river water quality assessment have several drawbacks that restrict their ability to provide relevant information. These techniques often rely on costly hardware and software and on human intervention, which increases costs and may cause data loss through mishandling. Furthermore, conventional techniques cannot provide long-term forecasts of water quality because analysis is performed manually instead of via algorithms. Moreover, transporting water samples can alter their properties, which makes it difficult to follow a standard monitoring process [38]. These constraints call for better methods to enhance the accuracy and effectiveness of water quality assessment.

4.1. Limitation of Traditional Monitoring Methods

❖: High Costs and Specialized Equipment: conventional approaches are expensive in terms of both the equipment and the personnel needed to carry out the processes [38].
❖: Human Error and Data Loss: because the data are collected and analyzed manually, errors can occur and data can be lost [38].
❖: Inability to Predict Trends: these methods do not have prognostic value because they are based on human assessment rather than on sophisticated models [38].
❖: Sample Alteration During Transport: the properties of water samples may change during transportation, which affects results [38].
❖: Inadequate Real-Time Monitoring: conventional approaches are not capable of generating accurate data in real time, which is paramount to decision-making [39].

River water quality monitoring through conventional methods encounters various obstacles: it demands expensive operations, offers restricted spatial and temporal coverage, detects pollution incidents with delay, and suffers from measurement imprecision. Operation and maintenance expenses are a critical barrier because they are too high for many areas, especially those in underdeveloped nations [32]. Traditional fixed monitoring stations survey only limited sections of a river system and therefore cannot capture the full variation in water quality. Delays in pollutant detection make real-time response to pollution events difficult, which increases the potential for environmental damage and public health risks. The periodic nature of traditional data collection also limits the ability to detect pollution trends and forecast future contamination events. By contrast, today’s artificial intelligence systems process data to provide prompt information that supports quick response operations, and AI-based solutions are expanding because they deliver real-time analysis, better accuracy, larger-area surveillance, and reduced operating expenses [27].

4.2. Addressing the Limitations

❖: Machine learning models: models that use on-site parameters can be implemented to improve water quality index (WQI) predictions [40].
❖: Geospatial frameworks: geospatial frameworks can enhance the transparency and effectiveness of monitoring by providing near real-time data and supporting pollutant control.
❖: Advanced sensing technologies: traditional methods can be complex; however, by incorporating more modern sensing methods, such as the IoT, virtual sensing, and cyber–physical systems, real-time detection can be achieved [39].
❖: Biomarker analyses: combining classical chemical and biological monitoring with cutting-edge techniques such as biomarker analysis can support more sensitive assessments of environmental contaminants [41].
❖: Wireless sensor networks: autonomous microsensors could be developed to create networks that monitor continuously at appropriate time scales, reducing costs [42].

Traditional methods have significant limitations, and there are promising modern alternatives. However, these new methods carry a high initial cost and require technical expertise. They also need to be integrated coherently into current frameworks, together with a range of stakeholders, to make the most of the data they generate. Despite these challenges, advanced monitoring techniques should be adopted for better river water quality assessment and for handling dynamic river ecosystems.

4.3. Role of Machine Learning in Environmental Monitoring

River water quality monitoring increasingly relies on machine learning algorithms for greater accuracy and predictive capability. These algorithms forecast water quality parameters on the basis of historical and real-time data, which helps managers anticipate problems and intervene proactively. Ensembles of deep learning models, including convolutional neural networks, gated recurrent units, and hybrid models that mix several machine learning techniques, have been studied. These models increase prediction accuracy because they capture the spatial and temporal patterns in water quality data.

❖: Deep learning ensembles: Two deep learning models, TNX and STNX, use temporal and spatial–temporal attention mechanisms. Relative to baseline models, they improve prediction accuracy by 2.1% to 6.1% for short-step predictions and by 4.3% to 22.0% for long-step predictions. The STNX model yields further improvements of 0.5–2.4% and 2.3–5.7% over TNX and effectively mitigates prediction shifts in long-step forecasts [20].
❖: Convolutional neural networks and gated recurrent units: CNGRU-WQM, an improved model that combines a convolutional neural network with gated recurrent units, was used to model water quality along the Vaigai River. Its validation accuracy was 97.86%, which was better than that of other state-of-the-art approaches, and it provided real-time warnings of water quality breaches [10].
❖: Hybrid machine learning models: To improve prediction accuracy, the most representative data subset is selected and used to train hybrid models, including the adaptive neurofuzzy inference system (ANFIS), artificial neural networks (ANNs), and support vector machines (SVMs). These models have been shown to match or exceed the accuracy of traditional models, especially when trained on highly variable data [21].
❖: Tree-based and ensemble learning approaches: Algorithms such as random forest, gradient boosting, and XGBoost are used to predict the water quality index. These models, especially gradient boosting, achieve high accuracy, with R² values of 0.88 and 0.85 for training and testing, respectively [22,43].
❖: Recurrent neural networks and LSTM models: The Self-Attentive LSTM (SA-LSTM) model was combined with the Load Estimator (LOADEST) for water quality prediction in regions with sparse data. The method reduced the RMSE by 24.6% for CODMn and 21.3% for NH3-N and preserved accuracy at longer data collection intervals [37].
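
To make the relative RMSE reductions quoted above concrete, the following minimal Python sketch computes the RMSE of two sets of predictions and the percentage by which one is lower than the other. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from [37].

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def rmse_reduction_percent(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

# Made-up concentrations (mg/L), for illustration only.
observed = [4.0, 5.0, 6.0, 5.5]
baseline = [5.0, 4.0, 7.0, 6.5]   # every error has magnitude 1.0
improved = [4.5, 4.5, 6.5, 6.0]   # every error has magnitude 0.5

baseline_rmse = rmse(observed, baseline)
improved_rmse = rmse(observed, improved)
reduction = rmse_reduction_percent(baseline_rmse, improved_rmse)
print(f"RMSE {baseline_rmse:.2f} -> {improved_rmse:.2f} ({reduction:.1f}% lower)")
```

Under this reading, a 24.6% reduction means that the improved model's RMSE is 0.754 times the baseline RMSE for the same parameter.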

While machine learning models greatly benefit water quality prediction, problems such as data sparsity and the need for large volumes of training data remain. In addition, incorporating Internet of Things (IoT) technology and real-time monitoring systems into these models can make them more accurate and robust and lay the groundwork for managing water resources and environmental conservation [22].

Several studies have applied machine learning methods to river water quality prediction. For example, the authors of [44] classified the water quality indices of rivers in Southeast, South, and West Asia, with the goal of demonstrating the effectiveness of sophisticated algorithms in enhancing classification.

4.4. Integration of WQI and ML for River Quality Prediction

Combining WQI data with machine learning can improve the accuracy and precision of calculated river quality indices. Integrated with the WQI, machine learning models can also process vast datasets, find patterns that standard approaches might overlook, and provide timely results. This integration enables the formulation of models that incorporate diverse environmental conditions and data inputs to enhance the control and monitoring of river water quality.
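
As a reminder of the quantity that such models are trained to predict, the sketch below computes a WQI as a weighted arithmetic mean of parameter sub-indices. This aggregation form is only one common convention and is an assumption here; the parameter names, sub-index values, and weights are hypothetical and do not come from any study cited in this review.

```python
def weighted_wqi(sub_indices, weights):
    """Weighted arithmetic WQI: sum(w_i * q_i) / sum(w_i).

    sub_indices: dict mapping parameter name to a 0-100 sub-index q_i.
    weights: dict mapping the same parameter names to weights w_i.
    """
    if set(sub_indices) != set(weights):
        raise ValueError("sub_indices and weights must cover the same parameters")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * sub_indices[name] for name in sub_indices)
    return weighted_sum / total_weight

# Hypothetical sub-index scores and weights, for illustration only.
sub_indices = {"DO": 80.0, "BOD": 60.0, "pH": 90.0}
weights = {"DO": 0.5, "BOD": 0.3, "pH": 0.2}

wqi = weighted_wqi(sub_indices, weights)
print(f"WQI = {wqi:.1f}")
```

In an ML workflow, a value computed in this way for each sampling date would typically serve as the training target, with the raw parameter measurements as inputs.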

(1)

Enhanced prediction accuracy:

❖: Models such as the adaptive neurofuzzy inference system (ANFIS), artificial neural networks (ANNs), and support vector machines (SVMs) predict water quality more accurately when trained on representative and variable data, which decreases overfitting and increases model generalizability [20].
❖: The gradient boosting (GB) and random forest (RF) models have generally estimated the WQI with high precision; GB achieved an R² of 0.88 during model training and 0.85 during model testing [22].
❖: Ensemble models, which combine more than one machine learning algorithm, can predict the WQI with high accuracy, as illustrated with data from the Johor River Basin [23].
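
As a minimal sketch of how R² values such as those reported above are obtained, the coefficient of determination can be computed from observed and predicted WQI values as follows. The function name and the sample values are hypothetical; in practice the function would be applied separately to the training and testing sets.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot

# Made-up WQI values, for illustration only.
observed_wqi = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_wqi = [1.1, 1.9, 3.2, 3.8, 5.0]

score = r_squared(observed_wqi, predicted_wqi)
print(f"R^2 = {score:.2f}")
```

A testing R² that is only slightly lower than the training R², as in the 0.88 and 0.85 reported for GB, suggests limited overfitting.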

(2)

Data-driven insights:

❖: Feature importance analysis identifies and ranks the features that matter most to a model, which helps pinpoint key sources of water pollution and enhances model interpretability (Ejaz et al., 2024 [22]; Aldrees et al., 2024 [29]).
❖: Combining AI-based prediction with feature attribution methods such as SHAP improves model interpretability and provides insights into the individual water characteristics that contribute to poor water quality (Ejaz et al., 2024 [22]; Aldrees et al., 2024 [29]).
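
The idea of ranking features by importance can be illustrated with permutation importance, a simpler technique than SHAP that is used here only as a stand-in: each feature column is shuffled in turn, and the resulting increase in prediction error is recorded. The toy model, its coefficients, and the random data below are hypothetical and serve only to show the mechanics.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, targets, n_features, repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    base_error = mse(targets, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mse(targets, [predict(r) for r in shuffled])
            increases.append(error - base_error)
        importances.append(sum(increases) / repeats)
    return importances

# Hypothetical toy model: feature 0 matters most, feature 2 not at all.
def toy_model(row):
    return 10.0 * row[0] + 1.0 * row[1] + 0.0 * row[2]

data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, targets, n_features=3)
ranking = sorted(range(3), key=lambda j: scores[j], reverse=True)
print("features ranked by importance:", ranking)
```

Features whose shuffling barely changes the error contribute little to the prediction, whereas a large increase marks a parameter that deserves attention when pollution sources are investigated.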

(3)

Real-time monitoring and risk mitigation:

❖: Machine learning models support real-time water quality monitoring, which allows preventive measures against pollution to be developed as early as possible [23].
❖: Predictive river health models can improve resource management and help reduce the adverse effects of climate change and anthropogenic activities [40].
❖: Anomaly detection techniques such as isolation forest and kernel density estimation are used to correct data discrepancies and increase the robustness of WQI forecasting [19].
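
A minimal sketch of density-based outlier screening is given below, assuming a one-dimensional Gaussian kernel density estimate and a fixed density threshold. The bandwidth, the threshold, and the readings are hypothetical; this is not the procedure of [19], and an isolation forest would require a dedicated library.

```python
import math

def gaussian_kde_density(samples, x, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

def flag_low_density(samples, bandwidth, density_threshold):
    """Indices of samples whose estimated density is below the threshold.

    Each sample contributes to its own density estimate (no leave-one-out).
    """
    return [
        i for i, x in enumerate(samples)
        if gaussian_kde_density(samples, x, bandwidth) < density_threshold
    ]

# Made-up WQI readings with one implausible value at index 6.
readings = [71.0, 72.5, 70.8, 73.1, 71.9, 72.2, 15.0, 70.5]

outliers = flag_low_density(readings, bandwidth=2.0, density_threshold=0.05)
cleaned = [r for i, r in enumerate(readings) if i not in outliers]
print("flagged indices:", outliers)
```

Readings that sit far from the bulk of the data receive a low density and are flagged for review before the series is used to train or evaluate a WQI model.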

Two limitations are observed when the WQI is combined with machine learning: large datasets are often unavailable, and outliers in the data reduce the reliability of WQI predictions. Future research should refine these models to counter these challenges and should test them under a wider range of environmental conditions. Furthermore, involving stakeholders in the development and use of these models is critical because it helps keep the resulting models relevant to real applications in water quality management.

4.5. Real-World Applications and Impacts

River quality prediction models depend on several environmental and socioeconomic factors. These factors include climate change, changes in land use and land cover, urbanization, industrialization, and other socioeconomic activities. All of them affect the quality of river water, and their effects should be considered when water quality prediction models are developed. Incorporating these factors into machine learning models can improve the precision of predictions, thereby supporting the sustainable management of water resources.

The environmental factors are as follows:

❖: Climate change: Climate change has a large impact on river water quality because the hydrological regime becomes more sensitive to flow changes. It has been found to account for more than 70% of the changes in water quality at some locations, such as the Kanpur stretch of the Ganga River [45].
❖: Land use and land cover (LULC): Conversion to urban land and intensified agricultural activity increase nutrient loads and nonpoint source pollution. Such changes modify hydrological connectivity and the delivery of pollutants to rivers [46].

The socioeconomic factors are as follows:

❖: Urbanization and industrialization: These activities add impervious surfaces that prevent infiltration and load runoff with pollutants, thus causing water pollution. The Ganga River study further suggests that integrated and improved sewage management is needed to control the different sources of pollution from urban and industrial areas [45].
❖: Population growth: Higher population density increases waste accumulation and the demand for water for human use, thereby increasing pollution levels [44].

The prediction models and techniques are as follows:

❖: Machine learning models: ANNs, XGBoost, and random forest have been used for river water quality prediction. Optimizing and simplifying these models involves considering a number of environmental and socioeconomic parameters that could increase precision (Camara et al., 2019 [46]; Satish et al., 2024 [47]).
❖: Hybrid models: The SSA-VMD and BiGRU models decompose water quality series into several components to enhance forecast results. These models have shown high accuracy compared with other models, with prediction accuracies of 97.8% for dissolved oxygen and 96.1% for pH [48].
❖: Deep learning approaches: ST-GCN and attention-based LSTM models learn spatial and temporal patterns in water quality data, improving the precision of predictions (Jiao et al., 2024 [49]; Lv et al., 2023 [48]).

Although important gains have been made in prediction accuracy, these models still have limitations. In particular, nonstationarity and data limitations may affect model performance. To address these problems, approaches such as wavelet analysis and transfer learning have been developed to improve the efficiency and reliability of the models [50]. Moreover, coupling climate change projections with different socioeconomic scenarios is important for establishing workable, sustainable water management approaches [51].

4.6. Model Performance and Predictive Accuracy

Compared with previous methods, the machine learning models reviewed here significantly enhance prediction. When trained on rich, high-quality data, they predict typical indicators accurately and capture seasonal fluctuations and step-like shifts in river quality indices. This is particularly important to practitioners, who must obtain prompt and accurate information on water quality for decision-making. Given accurate and complete training data, machine learning methods allow environmental models to adapt to changing conditions in ways that regression-based models cannot.

4.7. Practical Implications of Findings

The proposed framework of machine learning for the prediction of river water quality has practical relevance if the following issues are appropriately addressed. For environmental managers and policymakers, the information obtained via these models is useful for monitoring and predicting changes in water quality so that interventions can be effected as soon as there is a noticeable increase in pollution intensity. For example, this approach could lead to the development of real-time situation-awareness systems, which would inform decision makers when water quality has dropped below certain thresholds so that response strategies may be initiated earlier.
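
The threshold-based alerting described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Alert` structure, the function name, the threshold of 50, and the readings are all hypothetical, and a deployed system would take its thresholds from the applicable water quality standard.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    timestamp: str
    wqi: float
    message: str

def check_thresholds(readings, wqi_threshold):
    """Return an Alert for every (timestamp, wqi) reading below the threshold."""
    alerts = []
    for timestamp, wqi in readings:
        if wqi < wqi_threshold:
            message = f"WQI {wqi:.1f} is below the threshold of {wqi_threshold:.1f}"
            alerts.append(Alert(timestamp, wqi, message))
    return alerts

# Made-up predicted WQI values for three consecutive hours.
predicted = [
    ("2025-01-01T00:00", 82.0),
    ("2025-01-01T01:00", 48.5),
    ("2025-01-01T02:00", 63.2),
]

alerts = check_thresholds(predicted, wqi_threshold=50.0)
for alert in alerts:
    print(alert.timestamp, alert.message)
```

Because the check runs on predicted rather than measured values, decision makers can be notified before the drop in quality is actually observed.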

The use of machine learning (ML) in policy formulation has both strengths and weaknesses. The reviewed literature suggests that ML can improve decisions by offering nuanced perspectives on policy impacts, but explicit attention to fairness and compliance with policy goals is needed. The following points describe various elements of this integration.

❖: Enhancing policy analysis: ML methods make it possible to estimate heterogeneous treatment effects for different groups. For example, when the decentralized social care system in the Netherlands was examined, ML revealed large differences in the effects of the policy between urban and rural areas [52]. Because ML can exploit large datasets, it can also be used to evaluate the effectiveness of policies [48].
❖: Addressing fairness and bias: When causal ML models are used to overcome unfair biases, designers should handle sensitive characteristics such as race or gender with care. Some conventional AI fairness approaches may not be adequate, primarily because most of them implicitly assume that the models make decisions rather than providing suggestions for people to act on, which is not always the case with ITS systems [53]. Policy-makers should understand the biases embedded in outputs generated by ML models and avoid prejudicial policy consequences [54].
❖: Technical difficulties and methodological developments: Aligning ML models with often highly complex policy objectives is not easy. Challenges such as the inclusivity of training datasets and the choice of model configurations need to be addressed whenever ML is integrated into policy work [53]. New approaches, such as causal ML and multi-objective optimization, can help close the gap between current capabilities and policy aims.

Overall, applying ML provides an opportunity to enhance the quality of policy, but taking advantage of this opportunity requires critical attention to the ethical considerations and technical constraints involved. Striking the right balance between innovation and realism to meet the aims of fairness and accuracy will be critical to integrating ML into public policy processes.

We previously highlighted that a framework for monitoring real-time water quality needs to be governed by a system that brings together several entities, such as the government, nongovernmental organizations, and some technology companies. This framework should aim at creating a strategy for implementing IoT technology in water quality management to promote both safe drinking water and the safety of the environment.

4.8. Stakeholder Roles

❖: Governments: provide the legal bodies and frameworks for water quality monitoring and fund technological advancement in the field.
❖: NGOs: promote and support stakeholders’ involvement in monitoring water quality and encourage members of the public to participate.
❖: Technology companies: design, implement, and constantly monitor Internet of Things systems that gather water quality monitoring data and make reports and alerts available to the relevant parties [21,55].

4.9. Data Management and Sharing

Cloud platforms provide both data storage and analysis services so that the parties involved can access results and real-time water quality data [55].

Mobile applications can inform the public of water quality problems, thereby improving engagement and reception [56].

4.10. Continuous Improvement

The efficacy of monitoring systems and of communication with stakeholders should be evaluated periodically, the information gathered and analyzed should be applied, and community recommendations should be taken into account [57].

However, several obstacles may arise: data protection issues, financial constraints, and differences among the participating stakeholders in their interests and in their use of technology. These challenges must be addressed for real-time water quality monitoring systems to be implemented efficiently.

Moreover, the approach discussed in this study could help determine how resources should be allocated in river management. The ability to predict trends and future threats to river quality helps managers focus attention on susceptible zones. This is in line with sustainable water management goals: available resources are directed to the most crucial needs and thus to the right problems.

4.11. Comparison with Existing Approaches

Compared with conventional methods that use historical data records and linear regression to predict river quality, machine learning algorithms provide certain benefits. Traditional approaches struggle with the nonlinear relationships inherent in water quality data. In contrast, current machine learning models can handle such complexity and therefore provide better and more nuanced classifications. This is a major improvement for river quality prediction because it removes linear assumptions that capture only the relationship between one or two water quality characteristics and instead accounts for the interaction of many parameters.

There is also evidence that the machine learning methods reviewed in this study are flexible. Whereas more traditional approaches often provide very specialized results tied to specific datasets or geographic subsets, machine learning models can be ported to different river systems with additional training. This may be particularly useful for areas where water quality data are scarce or collected at a relatively low frequency, as existing machine learning models may generalize across datasets better than classic models.

4.12. Limitations of Findings

Although the machine learning models discussed in this article achieve high accuracy, some limitations should be highlighted. Model performance depends on the quality and variety of the input data, which limits the models to some extent, so they may not be widely applicable across different geographical locations. Furthermore, model accuracy may be affected by seasonal fluctuations and by variability in the specific water quality parameters incorporated into the models; thus, future studies could benefit from a more expansive range of input values.

4.13. Exploring Future Research Opportunities

The findings of this study open a number of avenues for future investigation. Future studies could analyze additional variables, such as land use or climate data, to improve model results. One area that has yet to be properly investigated is dynamic modeling, in which model parameters are updated continuously, perhaps via IoT sensors. This could result in an integrated system that monitors river quality and supports real-time decision-making.
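
One way to picture such continuous updating is online learning, in which the model takes a small corrective step each time a new sensor reading arrives. The sketch below applies one stochastic-gradient step per reading to a linear model; the linear form, the learning rate, and the simulated noise-free stream are assumptions made for illustration and are not drawn from the reviewed studies.

```python
def sgd_update(weights, bias, features, target, learning_rate=0.1):
    """One stochastic-gradient step for a linear model under squared error."""
    prediction = bias + sum(w * x for w, x in zip(weights, features))
    error = target - prediction
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias

def simulated_reading(step):
    """Hypothetical sensor reading: two scaled features and a target value."""
    x1 = (step % 10) / 10.0
    x2 = ((step * 7) % 10) / 10.0
    target = 2.0 * x1 + 1.0 * x2 + 3.0   # hidden relationship to be learned
    return [x1, x2], target

weights, bias = [0.0, 0.0], 0.0
for step in range(10000):
    features, target = simulated_reading(step)
    weights, bias = sgd_update(weights, bias, features, target)

print("learned weights:", [round(w, 3) for w in weights], "bias:", round(bias, 3))
```

Real sensor streams are noisy and nonstationary, so a practical system would also need drift detection and safeguards against faulty readings.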

River water quality is strongly influenced by climate change through factors such as increasing temperature and changes in precipitation. The concentration gradients of pollutants may also change, with possible implications for the ecological status of river basins.

❖: Temperature rise and water quality: Warmer water holds less dissolved oxygen (DO); therefore, as temperature increases, the DO concentration in rivers decreases, which harms fish and other aquatic organisms [31]. Higher temperatures can also promote blooms of toxic algae, further worsening nutrient pollution [58].
❖: Altered precipitation patterns: Increased precipitation intensity results in more flooding and changes in nutrient transport and sediment settling [16]. For example, precipitation changes in the Qu’Appelle River also led to high fluctuations in DO and TDS values [59].
❖: Challenges for prediction models: Climate change introduces variability that makes it difficult to maintain the accuracy of predictive frameworks, as illustrated by the increased pollutant concentrations projected for the Ganga River under certain future scenarios. Predictions are further complicated because water constituents such as nutrients and pathogens respond in mixed ways, some of which involve counterbalancing mechanisms [31].

The effects of climate change on river water quality are apparent; however, increased stream flow may offer some protection against certain negative effects, which highlights the importance of sophisticated management initiatives. Recent advances in applying ML techniques to climate projection and to the modeling of highly variable environments have placed a stronger emphasis on modeling accuracy and data fusion. These factors are necessary for predicting the effects of climate change on severe weather conditions.

❖: Machine learning for extreme event attribution: Recent studies have used ML techniques, especially convolutional neural networks, for the attribution of extreme weather events. For example, Trok and others have shown how ML can quickly determine whether extreme heat is linked to climate change and estimate the temperature increases caused by global warming [60]. This facilitates the identification and attribution of historical trends and of the future trends that should be expected.
❖: Data integration and accessibility: Preprocessed data repositories, such as what the authors refer to as Deep Extreme Cubes, are crucial to the ML process. This database compiles several types of Earth observation data to quantify the effects of climate extremes on ecosystems and to enhance the forecasting of biosphere dynamics [61].
❖: Bias correction in climate models: ML is also used to correct biases in climate models, which significantly improves the forecasting of intense climate systems. Trok, J.T. et al. showed that ML substantially decreases systematic biases in large-scale environmental simulations, contributing to a better understanding of extreme weather [62].

However, obstacles remain in terms of data quality and model interpretability, which may slow the further development of ML applications in climatology. Solving these problems will be crucial for future work. Artificial intelligence (AI) models have been used routinely in climate-sensitive regions to solve diverse problems associated with climate change. These models exploit complex data relationships to improve predictions and guide adaptation plans. The following are some notable uses of ML in climate-sensitive sectors.

❖: Precipitation downscaling in the Colorado River Basin: A new generative diffusion model was used to downscale precipitation in the Colorado River Basin and showed lower error than traditional schemes. The model used reanalysis precipitation data to produce precipitation fields at a fairly high resolution, which benefited climate modeling work in this area [63].
❖: Urban climate change adaptation: AI and ML have been studied for their ability to support climate change adaptation in urban areas, with a focus on their effectiveness on different continents. The research highlights success stories in which AI and ML technologies are applied to increase the climate resilience of cities; the findings indicate that the application of these approaches differs depending on the characteristics of the urban environment [50].
❖: Multi-risk assessment in the Veneto region: Multiple risks from extreme climate events in the Veneto region of Italy were assessed using an AI framework to analyze the scenarios. This approach applied supervised ML algorithms to identify risk, vulnerability, and exposure to various hazards to inform risk management [64].
❖: Agricultural impact assessment in Ethiopia: Climate change models were combined with ML techniques to study the effects of climate change on the Gilgel Gibe Watershed in Ethiopia. The study employed ML models to forecast future climate trends, which showed negative trends in the rainfall that agriculture depends on and positive trends in temperature, both of which are important for agricultural practices [60].

Although these applications indicate the ability of ML to support climate change adaptation in climate-sensitive areas, data integration issues persist, as do concerns about model accuracy. Ongoing improvements in ML methods and closer cooperation in applying them are necessary to enhance their potential to address the challenges posed by climate change.

Table 2 compares the problems of traditional river water quality monitoring with AI and ML solutions. It includes numerical data and percentages drawn from research findings on different monitoring systems. The table shows how traditional methods differ from modern approaches in cost, precision, coverage, and real-time pollution detection capability, which demonstrates the current movement toward smart water quality monitoring technologies.

Operations become more economical when AI and smart sensors are adopted because they cut expenses by as much as 75% relative to traditional approaches. AI-driven data analysis reduces measurement errors from 25% to less than 5%. Real-time AI analytics shorten pollution detection time from multiple days to mere minutes. Smart sensors support broader monitoring coverage of water surfaces than traditional monitoring stations. AI-based systems also enable early pollution detection by forecasting water quality changes before severe damage materializes.

5. Conclusions

The current work has implications for the design of future research and the potential to inform water management policy. Major advances in water management systems may come from combining cutting-edge technologies, such as AI and the IoT, with comprehensive policy frameworks. In the short term, these developments can boost resource productivity, support more sustainable operations, and build resilience against the adverse effects of climate change. A key finding of this study is that multilateral and cross-disciplinary collaboration is needed to address the shortcomings of water management systems. The following are important areas in which this study could influence future studies, together with recommendations for water management.

5.1. Future Research Directions

(1): Integration of AI and the IoT: AI and the IoT, when integrated into water utility modernization, can significantly enhance closed monitoring and optimization procedures. Decision support systems that use artificial intelligence will improve resource allocation, organizational practices, and the sustainability and efficiency of water resource management systems [65,66].
(2): Interdisciplinary Approaches: Future studies should examine the interrelated nature of the sector to address water insecurity issues, which have interdisciplinary features. This includes bringing perspectives from outside hydrological systems, such as energy, food, and climate security, into water management strategies [66].
(3): Smart Water Metering (SWM): Although the uptake of SWM technologies is slow, it is important for the proper management of water. Supportive water policies are therefore helpful for increasing the efficiency of water assets and supporting sustainable development in the SWM market [67].

5.2. Interdisciplinary Approaches

(1): Integrated water security solutions: Water security involves competing and mutually reinforcing drivers, such as energy, food, and climate security. Structures and processes that break down disciplinary silos can produce evidence-based policy solutions that are sensitive to local conditions, as shown in studies from the Middle East and Africa [66].
(2): Comparative analysis and knowledge sharing: Benchmarking the water practices of the USA against those of Africa can help improve global water policy. This research indicates that international collaboration and knowledge sharing are preconditions for context-adapted strategies and for the creation of sustainable water futures [68,69].

5.3. Policy and Governance

(1): Comprehensive water policies: Water management policy frameworks should take scientific, environmental, economic, and cultural considerations into account. Water scarcity is driven by population pressure, economic development, and climate change and therefore requires coordinated, intersectoral effort [70].
(2): Considerations of broader impact: The broader impacts of water management research should be studied systematically to support policy formulation. This includes taking social, ethical, and cultural effects into account and ensuring that all decisions are transparent [71].

Although applying this research to various aspects of life is encouraged, several difficulties persist in translating knowledge into concrete practice. The gap between proposed and reported broader impacts in research projects reveals the absence of frameworks that capture and address these impacts, particularly for marginalized populations [72]. Moreover, the slow adoption of technologies such as SWM implies that adequate policy support and cooperation from all stakeholders are still required for the technology to be adopted fully [67]. Future research and policy can learn from these challenges and optimize technological and interdisciplinary adaptations toward sustainable water management.

5.4. Educational and Awareness Initiatives

Awareness campaigns and outreach programs are vital for encouraging organizations to implement ML-based systems. These efforts can increase public understanding and promote trust and cooperation in the development of ML technologies. The following subsections explain these points.

5.5. Significance of Public Engagement

(1): Democratic governance: public awareness of ML makes those developing the technology accountable to the public for producing systems that are acceptable to society [73].
(2): Diverse perspectives: community engagement enables the consideration of diverse views, which could lead to sound and responsible algorithmic outcomes [74].

5.6. Educational Campaigns

(1): Awareness and understanding: campaigns can fill knowledge gaps among stakeholders by explaining the specific ML technologies needed for targeted work, as shown in theoretical and empirical studies of science engagement [75].
(2): Skill development: educational interventions can give people the knowledge they need to adopt and manage ML tools in innovative applications, such as digital marketing [76].

5.7. Case Studies and Examples

The involvement of the public in the use of ML for SWM shows how integrated approaches can improve results and decision-making [77].

Despite the importance of education campaigns and public engagement in the uptake of ML systems, not all members of the public may be reached or involved. Such gaps can lead to an unequal distribution of the benefits of ML technologies, so further work is required to include all groups in the projects that use them.

6. Implications for Future Technologies

Recent advances in machine learning, including the combination of AI with IoT devices and the application of deep learning architectures, have considerable potential for improving river quality prediction. These methods also enhance data collection, analysis, and water quality prediction and provide a deeper understanding of the underlying dynamics. AI and the IoT enable real-time observation and big data analytics, and deep learning models improve the ability to process multiple datasets. Some aspects of these emerging techniques and their use in river quality prediction are outlined below.

6.1. Integration of AI and IoT

❖: IoT devices can continuously collect data on water quality parameters, such as temperature, pH, and pollutant levels, and trigger appropriate actions when needed [77].
❖: The IoT widens the geographical area over which data are collected, so the datasets available for analysis are broader. This may make models more realistic and, in turn, improve resource allocation and forecasting [78].

However, several barriers, such as unreliable internet connections and data loss, should be addressed so that the data received and analyzed are accurate [79].
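
As one simple illustration of how short gaps caused by data loss might be handled before analysis, the sketch below linearly interpolates missing readings between the nearest valid neighbors. Linear interpolation is an assumption made here for illustration, not a method prescribed by [79]; the function name and the pH values are hypothetical.

```python
def fill_gaps(values):
    """Linearly interpolate interior None gaps.

    Gaps at the start or end of the series are left as None because
    there is no valid neighbor on one side.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            fraction = (i - left) / span
            filled[i] = filled[left] + fraction * (filled[right] - filled[left])
    return filled

# Made-up pH readings; None marks a transmission that was lost.
ph_series = [7.0, None, None, 7.6, None]

repaired = fill_gaps(ph_series)
print(repaired)
```

Long gaps should not be filled in this way, since interpolation can hide real pollution events; flagging the interval as missing is safer in that case.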

6.2. Deep Learning Architectures

❖: Deep learning algorithms, particularly convolutional neural networks, work effectively with large datasets and can therefore increase the precision of water quality prediction [48,80].
❖: Working from models that have already been trained can significantly reduce the need to gather large amounts of new data, which makes river quality prediction faster and more cost-effective [48].
❖: Because it can exploit large databases, deep learning can be used to predict the concentration of contaminants, including fecal indicator bacteria, from sensor-collected physicochemical data [48].
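
The second point above, reusing something already trained so that less new data are needed, is commonly called transfer learning; that reading of the bullet is an interpretation, not a statement from the source. The sketch below illustrates the idea with a deliberately tiny linear model in place of a deep network. The two synthetic "rivers", their coefficients, and the learning rate are assumptions chosen only for illustration.

```python
import random


def predict(w, b, x):
    return w * x + b


def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)


def train(data, w=0.0, b=0.0, steps=100, lr=0.1):
    """Plain gradient descent on the mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


random.seed(0)
# Synthetic, data-rich "source" river (hypothetical relation y = 2.0x + 1.0).
source = [(x, 2.0 * x + 1.0) for x in (random.random() for _ in range(200))]
# Synthetic "target" river with a similar relation but only five samples.
target_train = [(x, 2.2 * x + 1.1) for x in (random.random() for _ in range(5))]
target_test = [(i / 20, 2.2 * (i / 20) + 1.1) for i in range(21)]

pretrained = train(source, steps=2000)                 # learn on the source
fine_tuned = train(target_train, *pretrained, steps=5) # short adaptation
scratch = train(target_train, steps=5)                 # same budget, no reuse

print("fine-tuned test MSE:", mse(*fine_tuned, target_test))
print("from-scratch test MSE:", mse(*scratch, target_test))
```

With the same five samples and the same five update steps, the model that starts from the pretrained parameters ends with a lower test error than the one that starts from zero, which is the cost argument made in the text. Whether this carries over to a real pair of rivers depends on how similar their dynamics are.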

Although these new methods can improve river quality prediction, their wider implications and challenges also need to be considered. For example, combining AI and the IoT requires robust supporting infrastructure and data management to handle the volumes of data produced. Ethical questions must also be raised and answered, especially regarding privacy in the collection and use of data by such technologies.

7. Proposed Innovations

Combining machine learning with blockchain is a new paradigm for secure and transparent water quality data management. By merging the strengths of both technologies, monitoring, prediction, and accountability can be achieved in water management systems. The following three points detail aspects of this integration.

❖: IoT and blockchain for real-time monitoring: IoT devices continuously measure parameters such as pH, turbidity, and water temperature to provide real-time data on water quality. Recording these data on a blockchain protects their integrity and makes them permanent, which increases trust among the different stakeholders [81].
❖: Machine learning for predictive analysis: ML techniques such as random forest have been used in previous works to predict water quality with good accuracy [82]. When ML is combined with blockchain, the prediction process can be safeguarded so that only authorized parties can change the data [82].
❖: Enforcement and compliance: With blockchain, compliance can be managed by recording violations as they occur, so that industries that pollute water sources can be penalized promptly. This integration therefore increases accountability and helps enforce environmental legislation. Alongside these advantages of combining ML with blockchain, open issues include data privacy and the difficulty of implementing such a system. These challenges must be addressed before such integrated systems can take root in water management [83].
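
The integrity property that the points above rely on can be illustrated with a hash-linked log of sensor readings. This is only a sketch of the chaining idea using Python's standard `hashlib`; it is not a distributed blockchain (there is no consensus, networking, or access control) and it is not the framework used in the cited works. The function names and the sample readings are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, reading, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "reading": reading, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, reading):
    """Append a reading, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({
        "index": index,
        "reading": reading,
        "prev": prev_hash,
        "hash": block_hash(index, reading, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every link; any edit breaks the chain."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["reading"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, {"station": "S1", "ph": 7.1, "turbidity_ntu": 3.2, "temp_c": 24.5})
append_block(ledger, {"station": "S1", "ph": 9.4, "turbidity_ntu": 12.0, "temp_c": 25.1})
print(is_valid(ledger))  # True

ledger[0]["reading"]["ph"] = 7.0  # simulate tampering with a stored value
print(is_valid(ledger))  # False
```

Because each block stores the hash of its predecessor, changing an old reading invalidates that block's own hash and therefore the whole record from that point on, which is what makes recorded violations hard to alter after the fact.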

Generative AI offers a more innovative approach to modeling by generating different scenarios for water quality, which makes real-world problems easier to study. These algorithms enable researchers to predict the behavior of water quality and to study the possible effects of climate change and water contamination. The following points discuss the uses and advantages of generative AI in this respect.

❖: AI-driven scenario modeling: Tools in this area include the Python toolbox introduced for water distribution networks (WDNs), which generates the hydraulic and water quality scenarios that researchers need to simulate complexities such as contamination and leakage. Hyporheic-zone biogeochemical activity has also been simulated successfully with conditional deep convolutional generative adversarial networks (cDC-GANs) without the need for significant parameterization [84].
❖: Climate change impact assessment: Machine learning and AI models such as random forest have been used as an alternative to physically based models to forecast the effects of climate change on water quality and quantity. These models use climate and weather patterns to forecast river flow and sediment load, demonstrating the versatility of AI for such analyses [85].
❖: Risk evaluation and decision support: Advanced AI-based solutions, including the s-WQI, support complex methodologies for assessing water quality with leaching periods; they take seasonal changes into account and estimate existing risks via the Monte Carlo method. AI thereby improves decision-making in water quality assessment, helping to optimize water management and providing early signs of deterioration.
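
The Monte Carlo risk estimate mentioned in the last point can be sketched as follows. The s-WQI formulation itself is not given in this section, so the sketch assumes a generic weighted arithmetic index, WQI = Σ wᵢqᵢ / Σ wᵢ, with sub-index scores qᵢ on a 0–100 scale. The weights, the seasonal means and standard deviations, and the threshold of 60 are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical parameter weights for a weighted arithmetic index.
WEIGHTS = {"do": 0.4, "ph": 0.3, "turbidity": 0.3}

# Hypothetical (mean, standard deviation) of sub-index scores per season.
SEASONS = {
    "dry": {"do": (80, 5), "ph": (85, 4), "turbidity": (75, 8)},
    "wet": {"do": (65, 10), "ph": (80, 6), "turbidity": (50, 15)},
}


def wqi(sub_indices):
    """Weighted arithmetic WQI from sub-index scores on a 0-100 scale."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * sub_indices[k] for k in WEIGHTS) / total_weight


def sample_sub_indices(season, rng):
    """Draw one random scenario, clipping each score to the 0-100 range."""
    return {
        k: min(100.0, max(0.0, rng.gauss(mu, sd)))
        for k, (mu, sd) in SEASONS[season].items()
    }


def risk_of_poor_quality(season, threshold=60.0, n=10_000, seed=42):
    """Estimate P(WQI < threshold) as the fraction of sampled scenarios below it."""
    rng = random.Random(seed)
    below = sum(wqi(sample_sub_indices(season, rng)) < threshold for _ in range(n))
    return below / n


for season in SEASONS:
    print(season, risk_of_poor_quality(season))
```

Under these assumed distributions the wet season carries a clearly higher estimated risk than the dry season, which is the kind of seasonal early-warning signal the text describes. With real data, the distributions would be fitted to observations, not chosen by hand.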

Although generative AI offers many improvements over the existing models that simulate water quality, it has pitfalls: it must be fed high-quality data, and overcomplicated scenarios can lead to overfitting. Blending AI with conventional approaches may offer the most stable way of addressing future water quality issues.

Author Contributions

Conceptualization, H.S.; methodology, M.H.Z.; data collection and analysis, M.H.Z.; supervision, G.H.; writing—original draft preparation, H.S. and M.H.Z.; writing—review and editing, G.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data and materials that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that there are no competing interests.

References

1. Essien-Ibok, M.A.; George, U.U.; Titus, D.I. Assessing ecosystem health: A comparative study using water quality index analysis across ten lotic system in the Cross River system. J. Geogr. Environ. Earth Sci. Int. 2024, 28, 96–113.
2. Li, H.; Xue, C.; Song, L.; Yu, Z.; Zhang, J.; Xiao, K. Application of comprehensive water quality labeling index method in water quality evaluation of Xiangjiang River. In Proceedings of the 2023 4th International Conference on Computer, Big Data and Artificial Intelligence (ICCBD+AI), Guiyang, China, 15–17 December 2023; IEEE: New York, NY, USA, 2023; pp. 147–151.
3. Zaman, M.U.; Pandit, B.A.; Singh, R.; Zulfiya, M. Evaluating water health: A review of varied water quality index computation techniques. Int. J. Res. Appl. Sci. Eng. Technol. 2023, 11, 716–720.
4. Jeong, D.Y.; Choi, Y.H.; Kim, K.H. Evaluation of the water quality of Nakong River using the real-time water quality index. J. Korean Soc. Hazard Mitig. 2024, 24, 199–207.
5. Najafzadeh, M.; Basirian, S. Evaluation of river water quality index using remote sensing and artificial intelligence models. Remote Sens. 2023, 15, 2359.
6. Harmel, R.D.; Preisendanz, H.E.; King, K.W.; Busch, D.; Birgand, F.; Sahoo, D. A review of data quality and cost considerations for water quality monitoring at the field scale and in small watersheds. Water 2023, 15, 3110.
7. Simonoska, E.; Bogatinoska, D.C.; Dimitrievski, I.; Malekian, R. Sensor system for real-time water quality monitoring. In Proceedings of the 2023 46th MIPRO ICT and Electronics Convention (MIPRO), Opatija, Croatia, 22–26 May 2023; IEEE: New York, NY, USA, 2023; pp. 114–119.
8. Khan, G.; Ali, W.; Qureshi, M.H.; Kumar Gola, K.; Chauhan, A. Water quality monitoring system using wireless sensor networks. In Proceedings of the 2023 6th International Conference on Contemporary Computing and Informatics (IC3I), Gautam Buddha Nagar, India, 14–16 September 2023; IEEE: New York, NY, USA, 2023; pp. 239–245.
9. Thakur, A.; Devi, P. A comprehensive review on water quality monitoring devices: Materials advances, current status, and future perspective. Crit. Rev. Anal. Chem. 2024, 54, 193–218.
10. Geetha, T.S.; Gandhimathi, G.; Chellaswamy, C.; Thiruvalar Selvan, P. Comprehensive river water quality monitoring using convolutional neural networks and gated recurrent units: A case study along the Vaigai River. J. Env. Manag. 2024, 365, 121567.
11. Il Kim, H.; Kim, D.; Mahdian, M.; Salamattalab, M.M.; Bateni, S.M.; Noori, R. Incorporation of water quality index models with machine learning-based techniques for real-time assessment of aquatic ecosystems. Environ. Pollut. 2024, 355, 124242.
12. Khaskheli, S.A.; Ahmed Rahu, M.; Siraj, S.; Jamshed, H.; Iqbal, S. Optimized water quality forecasting using machine learning. Int. J. Inf. Syst. Comput. Technol. 2024, 3, 46–60.
13. Nitya, N.J. Computational machine learning analytics for prediction of water quality. Commun. Appl. Nonlinear Anal. 2024, 31, 448–465.
14. Khoi, D.N.; Quan, N.T.; Linh, D.Q.; Nhi, P.T.T.; Thuy, N.T.D. Using machine learning models for predicting the water quality index in the La Buong River, Vietnam. Water 2022, 14, 1552.
15. Mokarram, M.; Pourghasemi, H.R.; Pham, T.M. Enhancing water quality monitoring through the integration of deep learning neural networks and fuzzy method. Mar. Pollut. Bull. 2024, 206, 116698.
16. Tefera, G.W.T.; Ray, R.L.; Singh, V.P. Simulating the effect of climate change scenarios on surface water quality in the Bosque watershed, central Texas, United States. Res. Sq. 2023.
17. Koleva, R.; Zaev, E.; Babunski, D.; Stefanovski, D.; Rath, G. Computational methods for water quality index calculation using real-time measurement system. In Proceedings of the 2024 13th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 11–14 June 2024; IEEE: New York, NY, USA, 2024; pp. 1–4.
18. Jacobs, D.; McDaniel, T.; Varsani, A.; Halden, R.U.; Forrest, S.; Lee, H. Wastewater monitoring raises privacy and ethical considerations. IEEE Trans. Technol. Soc. 2021, 2, 116–121.
19. Uddin, M.G.; Rahman, A.; Rosa Taghikhah, F.; Olbert, A.I. Data-driven evolution of water quality models: An in-depth investigation of innovative outlier detection approaches-a case study of Irish water quality index (IEWQI) model. Water Res. 2024, 255, 121499.
20. Zheng, Y.; Wei, J.; Zhang, W.; Zhang, Y.; Zhang, T.; Zhou, Y. An ensemble model for accurate prediction of key water quality parameters in river based on deep learning methods. J. Env. Manag. 2024, 366, 121932.
21. del Castillo, A.F.; Garibay, M.V.; Díaz-Vázquez, D.; Yebra-Montes, C.; Brown, L.E.; Johnson, A.; Garcia-Gonzalez, A.; Gradilla-Hernández, M.S. Improving river water quality prediction with hybrid machine learning and temporal analysis. Ecol. Inf. 2024, 82, 102655.
22. Ejaz, U.; Khan, S.M.; Jehangir, S.; Ahmad, Z.; Abdullah, A.; Iqbal, M.; Khalid, N.; Nazir, A.; Svenning, J.C. Monitoring the industrial waste polluted stream—integrated analytics and machine learning for water quality index assessment. J. Clean. Prod. 2024, 450, 141877.
23. Sidek, L.M.; Mohiyaden, H.A.; Marufuzzaman, M.; Noh, N.S.M.; Heddam, S.; Ehteram, M.; Kisi, O.; Sammen, S.S. Developing an ensembled machine learning model for predicting water quality index in Johor River Basin. Env. Sci. Eur. 2024, 36, 67.
24. Lin, Z.; Lim, J.Y.; Oh, J.M. Innovative interpretable AI-guided water quality evaluation with risk adversarial analysis in river streams considering spatial-temporal effects. Environ. Pollut. 2024, 350, 124015.
25. Asadollah, S.B.H.S.; Sharafati, A.; Motta, D.; Yaseen, Z.M. River water quality index prediction and uncertainty analysis: A comparative study of machine learning models. J. Env. Chem. Eng. 2021, 9, 104599.
26. Tabassum, S.; Kotnala, C.B.; Masih, R.K.; Shuaib, M.; Alam, S.; Alar, T.M. Performance analysis of machine learning techniques for predicting water quality index using physiochemical parameters. In Proceedings of the 2023 International Conference on Sustainable Computing and Smart Systems (ICSCSS), Coimbatore, India, 14–16 June 2023; IEEE: New York, NY, USA, 2023; pp. 372–377.
27. Nguyen, D.T.; Luong, T.L.; Nguyen, N.M.; Nguyen, T.Q.; Le, N.T.N.; Truong, C.D.; Nguyen, T.L.; Ho, V.L.; Dinh, Q.K.; Huynh, D.C.; et al. Monitoring water quality parameters in aquaculture using edge computing. Vietnam J. Catal. Adsorpt. 2024, 12, 135–140.
28. Ismail, R.; Rawashdeh, A.; Al-Mattarneh, H.; Hatamleh, R.; Telfah, D.B.; Jaradat, A. Artificial intelligence for application in water engineering: The use of ANN to determine water quality index in rivers. Civ. Eng. J. 2024, 10, 2261–2274.
29. Aldrees, A.; Khan, M.; Taha, A.T.B.; Ali, M. Evaluation of water quality indexes with novel machine learning and SHapley additive ExPlanation (SHAP) approaches. J. Water Process Eng. 2024, 58, 104789.
30. Tetali, R.R.; Salomi, K.; Gope, E.R. Analysis of water quality parameters across diverse sources. J. Pharma Insights Res. 2024, 2, 210–216.
31. Khalil, M.; Al-Haija, Q.A. Ethical machine learning for internet of things network. In Ethical Artificial Intelligence in Power Electronics; CRC Press: New York, NY, USA, 2024; pp. 12–20.
32. Amador-Castro, F.; González-López, M.E.; Lopez-Gonzalez, G.; Garcia-Gonzalez, A.; Díaz-Torres, O.; Carbajal-Espinosa, O.; Gradilla-Hernández, M.S. Internet of things and citizen science as alternative water quality monitoring approaches and the importance of effective water quality communication. J. Env. Manag. 2024, 352, 119959.
33. Kaswan, P.; Marzban, M.F.; Nam, W.; Akkarakaran, S.; Luo, T. Statistical AI/ML model monitoring for 5G/6G: Interference prediction case study. In Proceedings of the 2024 IEEE International Conference on Communications Workshops (ICC Workshops), Denver, CO, USA, 9–13 June 2024; IEEE: New York, NY, USA, 2024; pp. 638–643.
34. Karasaki, S.; Morello-Frosch, R.; Callaway, D. Machine learning for environmental justice: Dissecting an algorithmic approach to predict drinking water quality in California. Sci. Total Environ. 2024, 951, 175730.
35. Chakraborty, S. Towards a comprehensive assessment of AI’s environmental impact. arXiv 2024.
36. Torres-Martínez, J.A.; Mahlknecht, J.; Kumar, M.; Loge, F.J.; Kaown, D. Advancing groundwater quality predictions: Machine learning challenges and solutions. Sci. Total Environ. 2024, 949, 174973.
37. Huang, S.; Xia, J.; Wang, Y.; Lei, J.; Wang, G. Water quality prediction based on sparse dataset using enhanced machine learning. Environ. Sci. Ecotechnol. 2024, 20, 100402.
38. Sani, S.A.; Ibrahim, A.; Musa, A.A.; Dahiru, M.; Baballe, M.A.; Sadiku, N.; Sani, A. Drawbacks of traditional environmental monitoring systems. TMP Univers. J. Res. Rev. Arch. 2023, 2, 2583–7214.
39. Zainurin, S.N.; Wan Ismail, W.Z.; Mahamud, S.N.I.; Ismail, I.; Jamaludin, J.; Ariffin, K.N.Z.; Wan Ahmad Kamil, W.M. Advancements in monitoring water quality based on various sensing methods: A systematic review. Int. J. Env. Res. Public Health 2022, 19, 14080.
40. Azha, S.F.; Sidek, L.M.; Ahmad, Z.; Zhang, J.; Basri, H.; Zawawi, M.H.; Noh, N.M.; Ahmed, A.N. Enhancing river health monitoring: Developing a reliable predictive model and mitigation plan. Ecol. Indic. 2023, 156, 111190.
41. Popescu, S.M.; Mansoor, S.; Wani, O.A.; Kumar, S.S.; Sharma, V.; Sharma, A.; Arya, V.M.; Kirkham, M.B.; Hou, D.; Bolan, N.; et al. Artificial intelligence and IoT driven technologies for environmental pollution monitoring and management. Front. Env. Sci. 2024, 12, 1336088.
42. Namour, P.; Clement, Y.; Bernard, C.; Lyon, U.; Breil, P.; Lanteri, P. Environmental monitoring and water safety are claiming innovative tools. In Proceedings of the 8th International Conference Novatech 2013 (Event), Lyon, France, 23–27 June 2013; p. 8.
43. Md. Islam, J.; Salekin, S.U.; Abdullah, M.S.; Zaman, N.; Khan, A.A.A. Evaluation of water quality assessment through machine learning: A water quality index-based approach. Res. Sq. 2024.
44. Shaheed, H.; Zawawi, M.H.; Hayder, G. Water quality index classification of Southeast, South and West Asia rivers using machine learning algorithms. J. Ecohumanism 2024, 3, 2752–6801.
45. Benzerra, A.; Cherrared, M.; Chocat, B.; Cherqui, F.; Zekiouk, T. Decision support for sustainable urban drainage system management: A case study of Jijel, Algeria. J. Env. Manag. 2012, 101, 46–53.
46. Camara, M.; Jamil, N.R.; Abdullah, A.F.B. Impact of land uses on water quality in Malaysia: A review. Ecol. Process. 2019, 8, 10.
47. Satish, N.; Anmala, J.; Rajitha, K.; Varma, M.R.R. A stacking ANN ensemble model of ML models for stream water quality prediction of Godavari River basin, India. Ecol. Inf. 2024, 80, 102500.
48. Lv, M.; Niu, X.; Zhang, D.; Ding, H.; Lin, Z.; Zhou, S.; Zhu, Y. A data-driven framework for spatiotemporal analysis and prediction of river water quality: A case study in Pearl River, China. Water 2023, 15, 257.
49. Jiao, J.; Ma, Q.; Huang, S.; Liu, F.; Wan, Z. A hybrid water quality prediction model based on variational mode decomposition and bidirectional gated recursive unit. Water Sci. Technol. 2024, 89, 2273–2289.
50. Srivastava, A.; Maity, R. Assessing the potential of AI–ML in urban climate change adaptation and sustainable development. Sustainability 2023, 15, 16461.
51. Santy, S.; Mujumdar, P.; Bala, G. Projections of water quality along an industrialized stretch of Ganga River for the Mid-21st Century. In Proceedings of the AGU Fall Meeting 2021, New Orleans, LA, USA, 13–17 December 2021.
52. Verhagen, M.D. Using Machine Learning to Study Effect Heterogeneity in Large-Scale Policy Interventions: The Dutch Decentralisation of the Social Domain. SocArXiv 2024.
53. Fischer-Abaigar, U.; Kern, C.; Barda, N.; Kreuter, F. Bridging the gap: Towards an expanded toolkit for AI-driven decision-making in the public sector. Gov. Inf. Q. 2024, 41, 101976.
54. Rehill, P.; Biddle, N. Fairness implications of heterogeneous treatment effect estimation with machine learning methods in policy-making. arXiv 2023.
55. Iancu, G.; Ciolofan, S.N.; Drăgoicea, M. Real-time IoT architecture for water management in smart cities. Discov. Appl. Sci. 2024, 6, 191.
56. Mohanasundaram, R.; Sagar, R.K.; Khandelwal, A.; Upadhyay, D.; Poddar, H.; Rajagopal, S. Water quality monitoring and control in urban areas in real-time via IoT and mobile applications. In Proceedings of the 2024 3rd International Conference on Artificial Intelligence for Internet of Things (AIIoT), Vellore, India, 3–4 May 2024; IEEE: New York, NY, USA, 2024; pp. 1–5.
57. Nishan, R.K.; Akter, S.; Sony, R.I.; Hoque, M.M.; Anee, M.J.; Hossain, A. Development of an IoT-based multi-level system for real-time water quality monitoring in industrial wastewater. Discov. Water 2024, 4, 43.
58. Dorado-Guerra, D.Y.; Paredes-Arquiola, J.; Pérez-Martín, M.Á.; Corzo-Pérez, G.; Ríos-Rojas, L. Effect of climate change on the water quality of Mediterranean rivers and alternatives to improve its status. J. Env. Manag. 2023, 348, 119069.
59. Hassanjabbar, A.; Nezaratian, H.; Wu, P. Climate change impacts on the flow regime and water quality indicators using an artificial neural network (ANN): A case study in Saskatchewan, Canada. J. Water Clim. Change 2022, 13, 3046–3060.
60. Bojer, A.K.; Woldetsadik, M.; Biru, B.H. Machine learning and CORDEX-Africa regional model for assessing the impact of climate change on the Gilgel Gibe Watershed, Ethiopia. J. Env. Manag. 2024, 363, 121394.
61. Ji, C.; Fincke, T.; Benson, V.; Camps-Valls, G.; Fernandez-Torres, M.A.; Gans, F.; Kraemer, G.; Martinuzzi, F.; Montero, D.; Mora, K.; et al. DeepExtremeCubes: Integrating Earth System Spatio-Temporal Data for Impact Assessment of Climate Extremes. arXiv 2024.
62. Trok, J.T.; Barnes, E.A.; Davenport, F.V.; Diffenbaugh, N.S. Machine learning–based extreme event attribution. Sci. Adv. 2024, 10, eadl3242.
63. Saoulis, A.; Moraga, S.; Lord, N.; Uhe, P.; Addor, N. Application of Novel Generative Diffusion Models to Precipitation Downscaling. In Proceedings of the European Geosciences Union General Assembly 2024 (EGU24), Vienna, Austria, 14–19 April 2024; p. 19266.
64. Ferrario, D.M.; Sano, M.; Tiggeloven, T.; Claasen, J.; Petrovska, E.; Maraschini, M.; de Ruiter, M.; Torresan, S.; Critto, A. Artificial intelligence for climate change multi-risk assessment: A myriad-EU case study in the Veneto Region. In Proceedings of the European Geosciences Union General Assembly 2024 (EGU24), Vienna, Austria, 14–19 April 2024; p. EGU24-16585.
65. Elshaikh, A.; Mabrouki, J.; Mohamed, A.A.O. The future of water management. In Advancements in Climate and Smart Environment Technology; IGI Global: Hershey, PA, USA, 2024; pp. 213–225.
66. Ismail, S.; Dawoud, D.W.; Ismail, N.; Marsh, R.; Alshami, A.S. IoT-based water management systems: Survey and future research direction. IEEE Access 2022, 10, 35942–35952.
67. Msamadya, S.; Joo, J.C.; Lee, J.M.; Choi, J.S.; Lee, S.; Lee, D.J.; Go, H.W.; Jang, S.Y.; Lee, D.H. Role of water policies in the adoption of smart water metering and the future market. Water 2022, 14, 826.
68. Nwokediegwu, Z.Q.S.; Adefemi, A.; Ayorinde, O.B.; Ilojianya, V.I.; Etukudoh, E.A. Review of water policy and management: Comparing the USA and Africa. Eng. Sci. Technol. J. 2024, 5, 402–411.
69. Tóth, T.; Matias Silva, B. Development directions of water management by comparing Rio Grande do Norte to Hungary. Belügyi. Szle. 2021, 69, 53–67.
70. Bekele, D.A.; Bona, S.K.; Haji, S.H. Water policy for sustainable management: A review. J. Resour. Dev. Manag. 2021, 72, 2422–8397.
71. Dassler, T.; Myhr, A.I.; Lalyer, C.R.; Frieß, J.L.; Spök, A.; Liebert, W.; Hagen, K.; Engelhard, M.; Giese, B. Structured analysis of broader GMO impacts inspired by technology assessment to inform policy decisions. Agric. Hum. Values 2024, 41, 449–458.
72. Woodson, T.; Boutilier, S. From intent to impact—The decline of broader impacts throughout an NSF project life cycle. Res. Eval. 2023, 32, 348–355.
73. Gilman, M.E. Democratizing AI: Principles for meaningful public participation. SSRN Electron. J. 2023.
74. Pallett, H.; Price, C.; Chilvers, J.; Burall, S. Just public algorithms: Mapping public engagement with the use of algorithms in UK public services. Big Data Soc. 2024, 11, 20539517241235867.
75. Crettaz von Roten, F. Broadening the audience for science engagement with machine-learning techniques. Front. Commun. 2024, 9, 1382952.
76. Miklosik, A.; Kuchta, M.; Evans, N.; Zak, S. Towards the adoption of machine learning-based analytical tools in digital marketing. IEEE Access 2019, 7, 85705–85718.
77. Hanoon, S.K.; Abdullah, A.F.; Shafri, H.Z.M.; Wayayok, A. A novel approach based on machine learning and public engagement to predict water-scarcity risk in urban areas. ISPRS Int. J. Geoinf. 2022, 11, 606.
78. Balachandran, T.; Abreu, T.; Naloufi, M.; Souihi, S.; Lucas, F.; Janne, A. IoT and transfer learning based urban river quality prediction. In Proceedings of the 2022 IEEE Global Communications Conference (GLOBECOM 2022), Rio de Janeiro, Brazil, 4–8 December 2022; IEEE: New York, NY, USA, 2022; pp. 257–262.
79. Al-Rawas, G.; Nikoo, M.R.; Al-Wardy, M.; Etri, T. A Critical review of emerging technologies for flash flood prediction: Examining artificial intelligence, machine learning, internet of things, cloud computing, and robotics techniques. Water 2024, 16, 2069.
80. Yang, Y. Current trends in deep learning. Adv. Eng. Technol. Res. 2023, 5, 422.
81. Samanta, S.; Sarkar, A. IoT and blockchain for smart water quality management in future cities: A Hyperledger fabric framework for smart water quality management and distribution. Res. Sq. 2023.
82. Sharma, A.; Sharma, R.; Rana, R.; Kalia, A. Water quality prediction using machine learning models. E3S Web Conf. 2024, 596, 01025.
83. Alharbi, N.; Althagafi, A.; Alshomrani, O.; Almotiry, A.; Alhazmi, S. A blockchain based secure IoT solution for water quality management. In Proceedings of the 2021 International Congress of Advanced Technology and Engineering (ICOTEN), Taiz, Yemen, 4–5 July 2021; IEEE: New York, NY, USA, 2021; pp. 1–8.
84. Artelt, A.; Kyriakou, M.S.; Vrachimis, S.G.; Eliades, D.G.; Hammer, B.; Polycarpou, M.M. A toolbox for supporting research on AI in water distribution networks. arXiv 2024.
85. Moezzi, S.M.M.; Abousaeidi, A.; Khorashadi Zadeh, F.; Sheikholeslami, R.; Nkwasa, A.; van Griensven, A. Projecting Impacts of Climate Change on Water Quality and Quantity Using an AI-Based Model as an Innovative Alternative to Physically-Based Models. In Proceedings of the EGU General Assembly 2024, Vienna, Austria, 14–19 April 2024; p. EGU24-21915.

Table 1. Summary of previous studies on water quality prediction via machine learning.

No.	Reference	Method/Model Used	Results
[6]	(Harmel et al., 2023)	Traditional water quality assessment.	Identified weaknesses in manual (non-automated) sampling, including high cost, labor and time intensity, and inadequate coverage.
[10]	(Geetha et al., 2024)	Convolutional neural networks (CNNs) and gated recurrent units (GRUs).	Validated real-time monitoring on the Vaigai River with an accuracy of 97.86%.
[11]	(Kim et al., 2024)	Long short-term memory (LSTM) model for predicting monthly WQI values.	Achieved an R² of 0.91, more accurate than support vector regression (SVR) and random forest models.
[12]	(Khaskheli et al., 2024)	Optimized machine learning models for water quality forecasting.	Applied gradient boosting to achieve a mean absolute error (MAE) of 1.8074 and showed that feature selection is highly relevant.
[13]	(Nitya Nand Jha, 2024)	Random forest for classification.	Reported 99.50% accuracy in classifying actual and potential water quality classes from physicochemical parameters.
[14]	(Khoi et al., 2022)	Machine learning algorithms, including a gradient boosting model and polynomial regression.	Showed good accuracy in estimating WQI values in river systems.
[15]	(Mokarram et al., 2024)	Fuzzy neural networks.	Combined deep learning with fuzzy techniques to improve SWQRs for water quality in polluted areas.
[16]	(Tefera et al., 2023)	Deep learning model with feature selection, proposed for disease diagnosis with Ghana as a case study.	The integration of random forest with AdaBoost improved the interpretability and accuracy of the chosen model in a real-time monitoring environment.
[17]	(Koleva et al., 2024)	Internet of Things (IoT) systems.	Enabled real-time computation of the WQI with sensors for turbidity, pH, and DO in water bodies.
[18]	(Jacobs et al., 2021)	Ethical considerations in AI/ML models.	Discussed data privacy and equality issues in the use of the IoT and machine learning for water management.
[19]	(Uddin et al., 2024)	Outlier detection with isolation forest.	Emphasized the need to handle noisy data to improve the estimates of water quality prediction models.
[20]	(Zheng et al., 2024)	Ensemble deep learning models.	Implemented a spatial–temporal attention mechanism that improved prediction accuracy by up to 22%.
[21]	(del Castillo et al., 2024)	Hybrid model combining an ANFIS, an ANN, and an SVM.	Improved accuracy in estimating dissolved oxygen and pH under various conditions in water environments.
[22]	(Ejaz et al., 2024)	Random forest and gradient boosting.	Achieved a training R² of 0.88 and a testing R² of 0.85.
[23]	(Sidek et al., 2024)	Ensemble models for water quality prediction.	Demonstrated efficient prediction of the WQI for the Johor River Basin through hybrid techniques.
[24]	(Lin et al., 2024)	Explainable AI using SHAP analysis.	Highlighted the features of the WQI prediction model and the key parameters that affect its results.

Table 2. Comparison of traditional and AI-based river water quality monitoring strategies.

Criterion	Traditional Methods	AI- and Machine Learning-Based Methods	Improvement Percentage Using AI	References
Annual Operating Cost (per monitoring point)	More expensive than machine learning-based methods	Lower cost	60–75% reduction in cost	(Zainurin et al., 2022) [39]
Geographical Coverage (monitoring stations per 100 km²)	2–5 stations	15–50 IoT sensors	500–900% improvement in coverage	(Lv et al., 2023) [48]
Response Time for Pollution Detection	12–48 h (lab-based analysis)	1–5 min (real-time AI analysis)	Over 99% improvement in detection speed	(Trok et al., 2024) [62]
Quality Measurement Accuracy (pollutant measurement error rate)	±15–25%	±2–5%	80–90% improvement in accuracy	(Amador-Castro et al., 2024) [32]
Monitoring Continuity	Intermittent (once or twice per month)	Continuous (24/7)	100% improvement in data continuity	(Khoi et al., 2022) [14]
Early Pollution Detection Capability	30–50% of cases detected after damage occurs	85–98% of cases detected in real time	70–90% improvement in early detection	(Huang et al., 2024) [37]
Equipment Cost per Monitoring Station	More expensive than machine learning-based methods	Lower cost	60–80% reduction in equipment cost	(Trok et al., 2024) [62]
Logistical and Environmental Constraints	Difficult access to some areas	Deployable self-operating sensors and drones	Significant improvement in operational flow	(Nguyen et al., 2024) [27]
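
The table does not state how its improvement percentages were computed. As a worked check of one row, the sketch below assumes that "improvement" means the relative reduction (before − after) / before and applies it to the response-time ranges in Table 2; the function name is hypothetical.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


# Response-time row of Table 2, converted to minutes.
traditional_min = (12 * 60, 48 * 60)  # 12–48 h of lab-based analysis
ai_min = (1, 5)                       # 1–5 min of real-time AI analysis

# Least favorable pairing: fastest traditional vs. slowest AI response.
smallest_gain = percent_reduction(traditional_min[0], ai_min[1])
# Most favorable pairing: slowest traditional vs. fastest AI response.
largest_gain = percent_reduction(traditional_min[1], ai_min[0])

print(f"{smallest_gain:.2f}% to {largest_gain:.2f}%")  # 99.31% to 99.97%
```

Under this assumption, even the least favorable pairing gives a reduction above 99%, which is consistent with the "over 99%" entry in the table. The other rows may rest on different definitions, so the same formula should not be assumed to reproduce them.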

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

