A Bibliometric-Systematic Literature Review (B-SLR) of Machine Learning-Based Water Quality Prediction: Trends, Gaps, and Future Directions

Jeimmy Adriana Muñoz-Alegría; Jorge Núñez; Ricardo Oyarzún; Cristian Alfredo Chávez; José Luis Arumí; Lien Rodríguez-López

doi:10.3390/w17202994


¹ Doctorate Program in Energy, Water and Environment, University of La Serena, La Serena 1700000, Chile

² Department of Mining Engineering, University of La Serena, La Serena 1700000, Chile

³ Water Research Center for Agriculture and Mining (CRHIAM), ANID FONDAP Center, Universidad de Concepción, Concepción 4070411, Chile

⁴ Center for Advanced Studies in Arid Zones (CEAZA), La Serena 1700000, Chile

Water 2025, 17(20), 2994; https://doi.org/10.3390/w17202994

This article belongs to the Special Issue Machine Learning Applications in the Water Domain


Abstract

Predicting the quality of freshwater, both surface and groundwater, is essential for the sustainable management of water resources. This study collected 1822 articles from the Scopus database (2000–2024) and filtered them using Topic Modeling to create the study corpus. The B-SLR analysis identified exponential growth in scientific publications since 2020, indicating that this field has reached a stage of maturity. The results showed that the predominant techniques for predicting water quality, both for surface and groundwater, fall into three main categories: (i) ensemble models, with Bagging and Boosting representing 43.07% and 25.91%, respectively, particularly random forest (RF), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGB), along with their optimized variants; (ii) deep neural networks such as long short-term memory (LSTM) and convolutional neural network (CNN), which excel at modeling complex temporal dynamics; and (iii) traditional algorithms like artificial neural network (ANN), support vector machines (SVMs), and decision tree (DT), which remain widely used. Current trends point towards the use of hybrid and explainable architectures, with increased application of interpretability techniques. Emerging approaches such as Generative Adversarial Network (GAN) and Group Method of Data Handling (GMDH) for data-scarce contexts, Transfer Learning for knowledge reuse, and Transformer architectures that outperform LSTM in time series prediction tasks were also identified. Furthermore, the most studied water bodies (e.g., rivers, aquifers) and the most commonly used water quality indicators (e.g., WQI, EWQI, dissolved oxygen, nitrates) were identified. The B-SLR and Topic Modeling methodology provided a more robust, reproducible, and comprehensive overview of AI/ML/DL models for freshwater quality prediction, facilitating the identification of thematic patterns and research opportunities.

Keywords:

water quality; machine learning; explainable artificial intelligence (XAI); Bibliometric-Systematic Literature Review (B-SLR); Topic Modeling

1. Introduction

Surface and groundwater pollution is a problem of global concern, as it can negatively affect freshwater availability, a key aspect for the conservation of ecosystems and the health of the population. This is aggravated by the strong pressure exerted by a growing demand for fresh water to sustain population growth and associated economic development [1,2,3]. Consequently, access to reliable sources of clean water directly influences the capacity to ensure water security, particularly given that nearly 2.2 billion people lack safe drinking water [4].

Aquatic conditions are shaped by multiple variables, including weather patterns [5], seasonal hydrological fluctuations [6], geological features [7], and anthropogenic influences such as land use. Traditionally, assessments have relied on in situ sensors or laboratory analysis of biological and physicochemical parameters. More recently, predictive approaches have incorporated physically based simulation tools such as QUAL2K [8,9,10] and WASP [11,12], alongside statistical techniques such as multivariate analysis [13,14] and linear regression [15,16].

Within this landscape, the application of artificial intelligence (AI) over recent decades has significantly advanced predictive techniques for aquatic system assessment. AI goes beyond basic sensor-based data collection [17] by processing, interpreting, and modeling the complex, multidimensional relationships inherent in water systems [18,19]. This makes it possible to predict the state of water quality dynamically [20], supporting proactive and optimized resource management. Notably, the integration of machine learning (ML) [20,21,22,23] and deep learning (DL) [24,25,26,27,28,29,30] models has strengthened the path toward meeting the water-related Sustainable Development Goals (SDGs) [31]. These prediction models can handle large amounts of data and adapt to complex relationships. In fact, the use of ML in water quality studies is highlighted as an example within the United Nations Global Acceleration Framework for SDG 6 [32,33].

A key advantage of applying AI/ML/DL to water quality prediction lies in the ability to anticipate variations in physicochemical and biological parameters using historical data, remote sensing, or hydrological models, thereby enhancing environmental management and evidence-based decision-making. Two main approaches are commonly used: (a) prediction of individual water quality parameters (WQP) [10,34,35,36,37] and (b) integration of these parameters into a composite indicator, traditionally known as the water quality index (WQI) [38,39,40,41,42]. An example of the first approach is the prediction of total phosphorus concentrations in Taihu Lake, China, using Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and long short-term memory (LSTM) algorithms, combined with Shapley Additive Explanations (SHAP) for interpretability [43]. Another study [44] applied the random forest (RF) method to predict 14 physicochemical parameters from historical data in the Loa River, Chile. Regarding the second approach, ML models were used to estimate the WQI of the Bug River, Ukraine [45]. High predictive performance was also reported in Kerala, India [46], where several ML models, including extreme gradient boosting (XGB), support vector regression (SVR), artificial neural network (ANN), and random forest regression (RFR), were evaluated to predict the entropy-weighted water quality index (EWQI) and assess groundwater quality.
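To make approach (b) concrete, the Python sketch below computes one widely used formulation of the EWQI: each parameter is weighted by its information entropy across samples (parameters whose values are more unevenly distributed receive larger weights), and the index is the weighted sum of quality ratings relative to a guideline value. It illustrates the index itself, not the ML models of [46]; the function names, sample concentrations, and guideline values are hypothetical placeholders.

```python
import math

def entropy_weights(data):
    """Entropy weight of each column of `data` (rows = samples, columns = parameters)."""
    m = len(data)  # number of samples; must be >= 2
    raw = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        if hi == lo:
            # A parameter that never varies carries no information (entropy = 1).
            raw.append(0.0)
            continue
        y = [(v - lo) / (hi - lo) for v in col]      # min-max normalization
        total = sum(y)
        p = [v / total for v in y]                   # share of each sample
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(m)
        raw.append(1.0 - e)                          # degree of diversification
    s = sum(raw)
    return [w / s for w in raw]

def ewqi(data, standards):
    """EWQI per sample: sum of entropy weight * (100 * concentration / guideline)."""
    w = entropy_weights(data)
    return [sum(wj * 100.0 * c / s for wj, c, s in zip(w, row, standards))
            for row in data]

# Hypothetical samples: nitrate, TDS, fluoride (mg/L), and hypothetical guidelines.
samples = [[12.0, 450.0, 0.8], [48.0, 900.0, 1.6], [25.0, 600.0, 1.1]]
guidelines = [50.0, 1000.0, 1.5]
print([round(v, 1) for v in ewqi(samples, guidelines)])
```

In the prediction setting described above, index values like these are the target that a model such as XGB or RFR would learn to reproduce from measured inputs.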

The aforementioned studies illustrate how AI/ML/DL methods have become established water quality prediction techniques, given their robustness and efficiency in extracting patterns from complex systems such as water systems, which makes them an attractive alternative to conventional water quality modeling techniques [47,48,49]. Indeed, global systematic literature reviews show strong researcher interest in the use of AI/ML/DL for water quality prediction. For example, the authors in [50] reviewed 876 articles published between 2015 and 2022, showing that the United States, England, Iran, India, and China have emerged as major contributors to water quality prediction with ML and DL. Likewise, the authors in [51] reviewed 249 articles on water quality using Internet of Things (IoT) models and machine learning. In another study [52], 253 articles were reviewed, finding that DL optimizes the processing of large volumes of data through parallel computing, facilitating effective water quality prediction, although its success depends on the quality of the data used. Finally, [53] reviewed groundwater quality prediction studies using AI/ML/DL and found that ANN, the Adaptive Neuro-Fuzzy Inference System (ANFIS), and support vector machine (SVM) techniques have proven to be efficient and accurate tools for this purpose.

Despite the intrinsic value of these review studies, it is important to highlight a key conceptual distinction between the two main review approaches. The first is bibliometric analysis, a quantitative method that allows the identification of emerging trends, thematic networks, and research gaps [54,55,56]. The second is the systematic literature review, a rigorous and structured method used to collect and evaluate all available evidence on a research question [57,58,59]. The two approaches complement each other and, together, form a promising methodology for developing robust reviews of the scientific literature; yet, to the best of our knowledge, no work to date has examined the state of the art in the application of AI/ML/DL to water quality prediction by integrating both approaches. Such a dual-approach review is therefore the main objective of this work, which analyzes the state of the art in the application of AI/ML/DL to water quality prediction over the 2000–2024 time frame using the Bibliometric-Systematic Literature Review (B-SLR) approach [60]. We hope that the results of this work can guide researchers interested in the study, assessment, and prediction of water quality using AI/ML/DL techniques.

2. Materials and Methods

We employed a structured framework for conducting systematic literature reviews, known as the Bibliometric-Systematic Literature Review (B-SLR), following the guidelines outlined in [60]. This approach was used to assess the current status, emerging trends, and research gaps in the application of AI/ML/DL techniques to freshwater (surface and groundwater) quality studies. The B-SLR methodology combines bibliometric analysis (BA) [56,61,62] with the systematic literature review (SLR) [63,64,65], thereby broadening the topic scope and expanding the domain of knowledge available to researchers working in the field [59]. Following the adopted methodology, this work was developed in three sequential stages: (a) data collection, (b) bibliometric analysis, and (c) systematic literature review, as illustrated in Figure 1.

Figure 1. Stages of the B-SLR methodology. Stage 1: data collection: defining keywords, search terms, applying inclusion/exclusion criteria, and filtering irrelevant results using topic modeling. Stage 2: bibliometric analysis: identifying trends, thematic clusters, and performance metrics. Stage 3: systematic review: extracting and synthesizing the evidence to answer the research questions.

2.1. Data Gathering Process

This stage defines the research topic and the structural framework of the review: the research questions, keywords, search strings, database, time scope, search techniques, and analysis tools. Each step produces a result that serves as input for the next.

The research questions raised in this study seek to identify a set of patterns behind the application of AI/ML/DL techniques in water quality prediction. Table 1 presents the guiding research questions of this study with their corresponding justification, following the guidelines of the systematic review of the literature [64].

Table 1. Research questions defined in this study.

The bibliographic references were obtained from the Scopus database (https://www.scopus.com/home.uri), which offers a comprehensive, global view of scientific production. Scopus is one of the most widely used sources in bibliometric analysis and is recognized for its reliability and high data quality, especially in academic research related to water resources and water quality [17,55,56,66,67,68]. The temporal scope was defined from January 2000 to December 2024. Table 2 presents the structured search string used to obtain the first consolidated set of documents (N = 3157), as well as the resulting query line after applying the inclusion/exclusion criteria, which produced a second consolidated set of documents (N = 1822). The inclusion and exclusion criteria in Table 3 determine the eligibility of primary studies for the review and help extract the relevant information needed to answer the research questions.

Table 2. Search strategy applied to Scopus database.

Table 3. Inclusion and exclusion criteria used in the B-SLR methodology.

The bibliographic references were exported from Scopus in CSV format and imported into the R programming environment [62]. To filter the 1822 articles selected for the B-SLR analysis, a comprehensive list of 29,144 journals from Scimago (https://www.scimagojr.com/) for the year 2023 was incorporated, with the objective of retaining only studies published in journals classified in quartiles Q1 and Q2, following the recommendations in [60]. This procedure removed all publications prior to 2015, indicating that such studies were published in journals indexed outside Q1 and Q2, and thereby established by default an analysis period of 2015–2024. In addition, a systematic cleaning process was carried out, which included removing duplicate rows, records with incomplete Index Keywords, Author Keywords, DOI, or Abstract fields, entries with duplicate DOIs, and false positives [69], that is, publications not aligned with the objectives of the present study.
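The quartile filter and the record-level cleaning rules above can be expressed as a small filter. The Python sketch below assumes the Scopus export and the Scimago list have already been read into lists of dictionaries (for example with `csv.DictReader`); the column names (`"Source title"`, `"DOI"`, `"Abstract"`, `"Author Keywords"`, `"Index Keywords"`, `"Title"`, `"SJR Best Quartile"`) are assumptions about the two export formats and may need adjusting. The study itself performed this step in R, and matching journals by lower-cased title is a simplification (matching by ISSN would be more robust).

```python
def build_quartile_lookup(scimago_rows):
    """Map lower-cased journal title -> best SJR quartile (e.g. 'Q1')."""
    return {row["Title"].strip().lower(): row["SJR Best Quartile"].strip()
            for row in scimago_rows}

# Fields that must be non-empty for a record to be kept (assumed column names).
REQUIRED_FIELDS = ("Index Keywords", "Author Keywords", "DOI", "Abstract")

def filter_records(records, quartiles, keep=("Q1", "Q2")):
    """Drop incomplete records, journals outside `keep`, and duplicate DOIs."""
    seen_dois = set()
    kept = []
    for rec in records:
        # 1. Incomplete records: any required field missing or blank.
        if any(not (rec.get(f) or "").strip() for f in REQUIRED_FIELDS):
            continue
        # 2. Quartile filter; journals absent from the lookup are dropped too.
        journal = (rec.get("Source title") or "").strip().lower()
        if quartiles.get(journal) not in keep:
            continue
        # 3. Duplicate DOIs (case-insensitive); the first occurrence is kept,
        #    which also removes exact duplicate rows.
        doi = rec["DOI"].strip().lower()
        if doi in seen_dois:
            continue
        seen_dois.add(doi)
        kept.append(rec)
    return kept
```

False positives are not handled here; in the workflow described above they were identified through topic modeling.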

This process was automated through an iterative text-mining approach based on Topic Modeling, illustrated in Figure 2. For this procedure, we used the textmineR package in R, a specialized tool for building and analyzing topic models [70]. The approach assumes that each document is a mixture of several topics and that each topic is a set of words that frequently co-occur [71]. The algorithm identifies these groups of co-occurring words to infer the latent topics in the text [72,73].

Figure 2. Flowchart of the Topic Modeling Process.
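The topic-modeling step can be illustrated with latent Dirichlet allocation (LDA), the most common topic model, fitted by collapsed Gibbs sampling. The Python sketch below is a from-scratch illustration, not the R package implementation used in the study [70]; the hyperparameters (`alpha`, `beta`, `iters`), the toy documents, and the rule of keeping only documents whose dominant topic is judged relevant are assumptions chosen for clarity.

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]        # tokens per (document, topic)
    nkw = [[0] * V for _ in topics]             # tokens per (topic, word)
    nk = [0] * n_topics                         # tokens per topic
    z = []                                      # current topic of every token
    for d, doc in enumerate(docs):              # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                ndk[d][k] -= 1                  # remove the token's current assignment
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    # Document-topic (theta) and topic-word (phi) distributions.
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)] for t in topics]
    return vocab, theta, phi

def keep_relevant(theta, relevant_topics):
    """Indices of documents whose dominant topic is in `relevant_topics`."""
    return [d for d, row in enumerate(theta)
            if max(range(len(row)), key=row.__getitem__) in relevant_topics]

docs = [["river", "nitrate", "lstm"], ["river", "lstm", "cnn"],
        ["hotel", "beach", "tourism"], ["tourism", "hotel", "beach"]]
vocab, theta, phi = lda_gibbs(docs, n_topics=2, iters=100)
for t, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(t, [vocab[v] for v in top])
```

In an iterative filter of the kind shown in Figure 2, an analyst would inspect the top words of each topic, mark the off-topic ones, keep the documents returned by `keep_relevant`, and refit until no off-topic cluster remains.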

2.2. Bibliometric Analysis (BA)

The bibliometric analysis was conducted using the bibliometrix package in R (version 4.4.1) [62]. This package has been widely applied in bibliometric studies related to hydrology [55,74] and water quality [75,76]. The analysis included the generation and visualization of maps and graphs for performance analysis [17,55,61] and science mapping [56,77,78].

2.3. Systematic Literature Review (SLR)

A comprehensive reading stage was carried out for the 276 selected articles using the Systematic Literature Review (SLR) approach described in [59], in order to extract the key data and information needed to address the research questions. This methodology enabled the identification, collection, analysis, and synthesis of relevant information [79]. Its robustness stems primarily from the transparency of its implementation, which ensures the reproducibility of the review [80,81].

Each study was classified as either a Review or a Research article. Within the Research category, articles were further classified according to the type of water body under study: surface water or groundwater. A particular case involved two Review articles [82,83] that addressed both systems (groundwater and surface water); this condition did not affect the classification by water body type, as these studies were not part of the Research category.

3. Results

3.1. Bibliometric Performance Analysis

The annual scientific output on water quality prediction using AI/ML/DL increased over the study period, as shown in Figure 3, following an exponential growth pattern between 2015 and 2024 that is especially marked since 2020. An exponential model was therefore fitted to the annual output to represent both the acceleration of growth and the current development phase of this research area more accurately.

Figure 3. Annual publication output on freshwater quality prediction using AI/ML/DL.
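An exponential model of annual output has the form N(t) = a·exp(b·(t − t₀)), where N(t) is the number of publications in year t. One simple way to estimate it, sketched below in Python, is ordinary least squares on ln N(t); the doubling time of annual output then follows as ln 2 / b. This log-linear fit is an assumption made for illustration (a nonlinear least-squares fit on the raw counts gives more weight to the largest, most recent counts and can yield somewhat different parameters), and the synthetic counts in the example are not the counts shown in Figure 3.

```python
import math

def fit_exponential(years, counts):
    """Fit counts ~ a * exp(b * (year - years[0])) by least squares on log(counts)."""
    if any(c <= 0 for c in counts):
        raise ValueError("log-linear fit needs strictly positive counts")
    xs = [y - years[0] for y in years]
    logs = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_l = sum(xs) / n, sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx                      # growth rate per year
    a = math.exp(mean_l - b * mean_x)  # fitted output in the first year
    return a, b

def doubling_time(b):
    """Years needed for annual output to double at growth rate b."""
    return math.log(2) / b

# Synthetic, illustrative counts (not the study's data).
years = list(range(2015, 2025))
counts = [3 * math.exp(0.4 * (y - 2015)) for y in years]
a, b = fit_exponential(years, counts)
print(round(a, 3), round(b, 3), round(doubling_time(b), 2))
```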

The growing interest in predicting water quality using AI/ML/DL is set within a global context driven by climate change, increasing pollution, and rising demand for fresh water [5,84]. This evolution indicates that the field has moved beyond an initial exploratory stage into a phase of maturity. Even more significant development can be expected in the coming years, with continued strengthening of both interest and research in this field [85,86].

3.1.1. Journals

The H-index is a metric used to evaluate scientific productivity and journal impact: a source has an H-index of h if h of its papers have each received at least h citations. Total Citations (TC) is the cumulative number of times a publication has been cited in other academic publications; together, the two metrics reveal the reach and influence of the research from complementary perspectives. Global Citations (GC) refers to the total number of times an article has been cited in the Scopus database.

Ranking the top 10 of the 32 sources by H-index and Total Citations (TC) [87] shows that Water (Switzerland), Journal of Hydrology, Environmental Science and Pollution Research, Science of the Total Environment, and Water Research are the five sources with the highest number of papers in the reference collection (Table 4).

Table 4. Journals Local impact.
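The H-index used to rank the journals can be computed directly from a list of per-paper citation counts: sort the counts in descending order and find the last rank at which the count is still at least the rank. The short Python sketch below does this; the same function applies whether the counts are local or global citations, and the citation values in the example are made up.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h

def total_citations(citations):
    """Total Citations (TC): the sum of the per-paper counts."""
    return sum(citations)

journal_papers = [42, 17, 9, 6, 5, 2, 0]  # hypothetical citation counts of one journal
print(h_index(journal_papers), total_citations(journal_papers))
```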

3.1.2. Most Cited Documents

The most cited documents within the analyzed collection of bibliographic references were identified. Both global and local citation counts were considered to assess the reach and influence of the research from complementary perspectives. In this context, local citations reflect the impact of a document within the specific dataset under analysis, while global citations represent the total number of times each article has been cited in the Scopus database.
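The difference between the two counts can be made concrete with a small Python sketch: a document's local citations are the number of other documents in the collection whose reference lists cite it, while its global citations are taken as given from Scopus. The sketch assumes that cited references have already been matched to collection documents through a shared identifier such as the DOI (in practice this matching is the difficult part); the identifiers and counts in the example are hypothetical.

```python
def citation_table(references, global_counts):
    """references: {doc_id: set of cited doc_ids}; global_counts: {doc_id: Scopus citations}.

    Returns {doc_id: (local_citations, global_citations)}.
    """
    local = {doc: 0 for doc in references}
    for citing, cited_ids in references.items():
        for cited in cited_ids:
            # Only citations to documents inside the collection count as local.
            if cited in local and cited != citing:
                local[cited] += 1
    return {doc: (local[doc], global_counts.get(doc, 0)) for doc in references}

refs = {"A": set(), "B": {"A"}, "C": {"A", "B", "external-paper"}}
table = citation_table(refs, {"A": 120, "B": 40, "C": 5})
for doc, (lc, gc) in sorted(table.items(), key=lambda kv: -kv[1][0]):
    print(doc, lc, gc)
```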

Table 5 provides the top 10 most cited publications and corresponding (leading) authors in terms of local and global citations. The referred papers reveal important contributions to research on water quality prediction using AI/ML/DL techniques.

Table 5. Top ten Local citations and Global citations.

According to Table 5, the most cited reference is the work of Barzegar [88], entitled “Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model”, which investigates water quality prediction in Lake Prespa, Greece, and proposes a hybrid CNN–LSTM model to predict dissolved oxygen (DO; mg/L) and chlorophyll-a (Chl-a; μg/L).

Second, the study [89] compares different machine learning methods (such as artificial neural networks and support vector machines) for the prediction of water quality parameters of northwest Iran’s Aji-Chay River, demonstrating the effectiveness of these techniques in improving environmental monitoring. Its high number of global citations reflects the interest in using water quality prediction to support the transition toward more sustainable water resources management.

Third, [90] examined the application of a decision tree model for predicting the water quality index (WQI) of the Klang River, showing that the number of water quality parameters needed for monitoring can be reduced while maintaining prediction accuracy above a 75% benchmark. Finally, the most recent study in the list [95] proposes a hybrid approach that combines the random forest algorithm with an improved version of the sequential minimal optimization (SMO) algorithm for support vector machines. Applied to the Saf-Saf River Basin, this model improved WQI prediction, highlighting the growing role of optimized hybrid approaches in watershed management.

3.2. Bibliometric Science Mapping

3.2.1. Network Analysis of the Co-Occurrence of Authors’ Keywords

The co-occurrence network of author keywords in Figure 4 reveals the conceptual structure of studies on machine learning and water quality management. Each node represents a keyword, and its size indicates its degree of connectivity (the number of links to other terms). Edges represent co-occurrence relationships, and their thickness represents the strength of those associations. Nodes are grouped into three thematic communities distinguished by color (e.g., group 1 in blue and group 2 in orange), which enables the identification of conceptual subdomains within the field.

Figure 4. Network analysis co-occurrence of the author’s keywords.

Central terms such as support vector machine, deep learning, water quality predictions, and water quality management, along with technical acronyms like ML (machine learning), AI (artificial intelligence), SHAP (SHapley Additive exPlanations), ANN, CNN (convolutional neural networks), LSTM (long short-term memory), and random forest, exhibit high connectivity. This suggests their integrative role in the analyzed literature and highlights their methodological relevance in the development of predictive and explanatory models applied to water systems. The visualization supports the identification of thematic clusters, methodological relationships, and emerging areas at the intersection of artificial intelligence and environmental management.
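The quantities drawn in such a network (keyword frequency, edge weight, and node degree) can be derived from the author-keyword lists with a few lines of Python. In the sketch below, keywords are normalized only by trimming and lower-casing (an assumption; real pipelines usually also merge synonyms and acronyms such as "ML" and "machine learning"), each pair of keywords appearing in the same article adds one to that edge's weight, and a node's degree is its number of distinct neighbors. The keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations

def keyword_network(keyword_lists):
    """Return keyword frequencies, co-occurrence edge weights, and node degrees."""
    freq, edges = Counter(), Counter()
    for keywords in keyword_lists:
        # Normalize and de-duplicate keywords within one article.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        freq.update(unique)
        edges.update(combinations(unique, 2))  # each unordered pair once per article
    degree = Counter()
    for a, b in edges:                         # degree = number of distinct neighbors
        degree[a] += 1
        degree[b] += 1
    return freq, edges, degree

articles = [["LSTM", "Water quality", "CNN"],
            ["lstm", "water quality"],
            ["Random forest", "Water quality", "SHAP"]]
freq, edges, degree = keyword_network(articles)
print(freq.most_common(2))
print(edges[("lstm", "water quality")], degree["water quality"])
```

The same keyword frequencies are what a word cloud visualizes.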

3.2.2. Word Cloud

Another way to identify the most frequent terms used by the authors in the collection is the word cloud generated from the keywords. Figure 5 shows a predominant interest in “water quality,” which serves as the central axis of the study corpus. Its association with indicators such as “WQI” and “WQP,” and with water bodies such as “groundwater quality” and “surface water,” reaffirms the environmental orientation of the studies. From a methodological standpoint, techniques such as “random forest,” “deep learning,” “ML,” and “ANN” stand out, along with specific architectures and algorithms such as “LSTM,” “CNN,” “RNN,” and “SVM,” used to model and predict hydrological parameters.

Figure 5. Word cloud of water quality prediction by AI/ML/DL. The size of the words reflects their frequency and the color is used only to differentiate them within the group of words.

Complementing this trend is the appearance of terms like “hybrid model,” “ensemble learning,” and “transfer learning,” reflecting the adoption of integrated strategies and techniques for transferring pretrained representations aimed at improving model generalization and efficiency. Finally, “XAI” and “SHAP” reveal an interest in model explainability, pointing toward advanced analytical systems for the management and evaluation of water sustainability.

3.2.3. Thematic Map with Authors’ Keywords

Figure 6 presents a thematic map derived from bibliometric analysis, grouping clusters associated with distinct research subtopics in the field of water quality prediction using AI, ML, and DL. This map enables the evaluation of each topic in terms of centrality and maturity, identifying established, emerging, and declining areas. In the quadrant of Basic themes, terms like “water quality prediction,” “deep learning,” and “LSTM” reflect the consolidation of deep neural networks as a methodological foundation. Notably, LSTM networks have been highlighted in multiple studies for their strength in modeling complex temporal dynamics in water quality parameter prediction [30,88,96,97]. Furthermore, hybrid models and the ANFIS technique have provided enhanced precision and flexibility in complex scenarios, particularly in surface water contexts [89,98,99,100].

Figure 6. Thematic map based on the author’s keyword. The colors represent the different clusters of related terms that form a thematic group, and the size of each circle indicates the frequency of occurrence or density of publications related to that group.
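The quadrant logic of the thematic map can be sketched as follows. In the science-mapping approach behind such maps, centrality measures how strongly a cluster is linked to other clusters (its relevance to the field), and density measures how strongly its own keywords are linked to each other (its development). The Python sketch below uses simplified definitions, namely centrality as the total weight of edges leaving a cluster, density as internal edge weight per possible keyword pair, and quadrant boundaries at the medians; these are assumptions, not the exact normalization used by bibliometric software, and the clusters and weights in the example are invented.

```python
from statistics import median

def thematic_map(clusters, edges):
    """Assign each keyword cluster to a thematic-map quadrant.

    clusters: {cluster_name: set of keywords}
    edges: {(keyword_a, keyword_b): co-occurrence weight}
    Returns {cluster_name: (centrality, density, quadrant)}.
    """
    member = {kw: name for name, kws in clusters.items() for kw in kws}
    internal = dict.fromkeys(clusters, 0.0)
    external = dict.fromkeys(clusters, 0.0)
    for (a, b), weight in edges.items():
        ca, cb = member.get(a), member.get(b)
        if ca is None or cb is None:
            continue                      # keyword not assigned to any cluster
        if ca == cb:
            internal[ca] += weight        # link inside the cluster
        else:
            external[ca] += weight        # link to another cluster
            external[cb] += weight
    stats = {}
    for name, kws in clusters.items():
        pairs = len(kws) * (len(kws) - 1) / 2
        stats[name] = (external[name], internal[name] / pairs if pairs else 0.0)
    c_mid = median(c for c, _ in stats.values())
    d_mid = median(d for _, d in stats.values())
    result = {}
    for name, (c, d) in stats.items():
        if c >= c_mid:
            quadrant = "motor" if d >= d_mid else "basic"
        else:
            quadrant = "niche" if d >= d_mid else "emerging or declining"
        result[name] = (c, d, quadrant)
    return result

clusters = {"A": {"a1", "a2"}, "B": {"b1", "b2"}, "C": {"c1", "c2"}, "D": {"d1", "d2"}}
edges = {("a1", "a2"): 5, ("b1", "b2"): 5, ("c1", "c2"): 1, ("d1", "d2"): 1,
         ("a1", "c1"): 4, ("b1", "d1"): 1}
for name, (c, d, q) in thematic_map(clusters, edges).items():
    print(name, c, d, q)
```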

In the quadrant of Motor themes, terms such as “WQI,” “ANN,” “AI,” and “ML” stand out, indicating that artificial neural networks and machine learning models have been extensively studied and applied, yielding significant results. Conversely, “transfer learning” emerges as a nascent exploratory line within water quality prediction, although several studies [6,96,101,102,103,104] have demonstrated its potential to improve predictions in data-scarce environments.

In the quadrant of Niche themes, the terms “data-driven models,” “lake water quality,” and “prediction intervals” suggest that lake water analysis and data-driven modeling represent growing niches within water-related research, though they have yet to become dominant in AI/ML/DL-based water quality prediction.

In the quadrant of Emerging or declining themes, terms such as “support vector machine,” “feature selection,” and “gaussian process regression” indicate that these methods (SVR, GPR), although historically robust and widely used in this field, may be losing relevance compared to more modern approaches such as deep learning. Additionally, procedures like feature selection may have already transitioned into standard practice within AI/ML/DL applications. Meanwhile, “extreme gradient boosting” and “SHAP” are associated with advanced ML techniques but have not yet reached the prominence of core topics. The emerging discipline known as Explainable Artificial Intelligence (XAI), or Interpretable Machine Learning (iML), seeks to address the challenges posed by the opacity of black-box techniques in AI/ML/DL. Its application has expanded in recent years across various engineering domains [85,105,106,107], and has recently been implemented in the field of water quality, particularly in groundwater studies [108,109].
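SHAP, mentioned above, is based on Shapley values from cooperative game theory: the contribution of feature i is φᵢ = Σ over subsets S ⊆ N∖{i} of |S|!(n − |S| − 1)!/n! · [v(S ∪ {i}) − v(S)], where v(S) is the model output when only the features in S take their actual values. For a model with only a few inputs, these values can be computed exactly by enumerating all subsets, replacing "absent" features with a baseline value. The Python sketch below does this by brute force; it is an illustration under a single-baseline assumption, not the SHAP library's algorithms (which rely on approximations or model-specific shortcuts to remain tractable), and the toy model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    model: function taking a list of feature values and returning a number.
    Cost grows as 2**len(x), so this is only practical for a handful of features.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual values; the rest use the baseline.
        return model([x[j] if j in subset else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical three-input model with one interaction term.
def toy_model(z):
    return 0.5 * z[0] + 2.0 * z[1] - 0.1 * z[2] + 0.05 * z[0] * z[1]

x = [10.0, 3.0, 20.0]
baseline = [5.0, 1.0, 30.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi], round(sum(phi), 3))
```

The values satisfy the efficiency property: they add up to `toy_model(x) - toy_model(baseline)`, which is what makes them usable as an additive explanation of a single prediction.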

3.2.4. Thematic Evolution and Trend Topics with Keywords Plus

Thematic evolution in the BA helps visualize how research topics change over time, allowing the identification of windows of opportunity for future research. To provide a more global view of the thematic evolution and current trends in water quality prediction using AI/ML/DL, Figure 7 presents the thematic evolution based on Keywords Plus. These keywords are generated automatically from sources such as the titles of the references cited in the documents, which allows the areas related to the central axis of the search to be assessed from a complementary perspective. The results show that ANN-based models and decision support techniques persist over time, whereas SVM-based models have appeared less consistently in recent years.

Figure 7. Thematic evolution of AI/ML/DL applications in water quality prediction.
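Thematic evolution diagrams link clusters of consecutive periods according to how many keywords they share. A measure commonly used for this in science mapping is the inclusion index, |A ∩ B| / min(|A|, |B|), where A and B are the keyword sets of two clusters. The Python sketch below computes it and lists the links above a threshold; the threshold value and the example clusters are hypothetical, and bibliometric tools may use weighted variants of this index.

```python
def inclusion_index(a, b):
    """Share of the smaller keyword set that also appears in the other set."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def evolution_links(earlier, later, threshold=0.1):
    """Links (earlier_cluster, later_cluster, score) with score >= threshold, strongest first."""
    links = []
    for name_a, kws_a in earlier.items():
        for name_b, kws_b in later.items():
            score = inclusion_index(kws_a, kws_b)
            if score >= threshold:
                links.append((name_a, name_b, score))
    return sorted(links, key=lambda link: link[2], reverse=True)

period_1 = {"ann": {"ann", "backpropagation", "wqi"}, "svm": {"svm", "kernel"}}
period_2 = {"deep learning": {"lstm", "ann", "wqi"}, "xai": {"shap", "xgboost"}}
for link in evolution_links(period_1, period_2):
    print(link)
```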

In recent years, cutting-edge ML and DL techniques have increasingly been applied to environmental challenges such as pollution, climate change, water quality, and water availability. These approaches are often integrated with Geographic Information Systems (GIS) technologies [1,2,110,111,112], enhancing spatial analysis capabilities.

While GIS is now widely used in the field of water resources, it has yet to become a dominant tool for predicting water quality through AI/ML/DL methods. Notably, interest in developing predictive models for surface water quality has remained consistent throughout the study period. Since 2015, ML has emerged as the leading approach, demonstrating strong applicability in environmental monitoring, wastewater treatment, and water quality assessment, indicating a sustained research focus on water resources management.

3.2.5. Social Structure

Figure 8 presents the map of main collaborations between countries, which reveals a highly interconnected structure of scientific cooperation. Countries such as China and the United States stand out, acting as central nodes, evidencing their leadership in the production of knowledge and in the articulation of international efforts. This dynamic reflects not only the shared interest in addressing the challenges of water quality prediction but also the growing need to integrate data, methodologies, and interdisciplinary approaches on a global scale. Countries such as India, Australia, the United Kingdom, Italy, and Germany stand out for their collaborative connections as well.

Figure 8. Collaboration with country authors. The thickness of each line represents the intensity of collaboration between two countries and the colors represent the dominant clusters of regional or thematic collaboration.

However, the map also reveals a striking absence of Latin American countries, suggesting a potential asymmetry in scientific visibility. This underrepresentation may be partially attributed to the methodological filter applied in this study, which included only publications indexed in high-impact Q1 and Q2 journals. While this criterion ensures scientific rigor, it may inadvertently exclude valuable regional research that, due to structural or editorial limitations, does not reach these publication venues. As a result, the map not only visualizes collaboration intensity but also reflects broader epistemic inequalities in global scientific discourse.

3.3. Systematic Literature Reviews Results

The systematic review classified each work by approach (Review or Research article) and by the water body studied, that is, surface water (rivers, reservoirs, and lakes) or groundwater, as presented in Figure 9. Figure 9a shows the classification of the 274 articles collected by type: Research Articles and Review Articles. The disparity between the two suggests that, although research in the area is growing rapidly, a significant gap persists in consolidating knowledge through systematic reviews of the state of the art. This represents a valuable opportunity to strengthen the field through studies that integrate and synthesize existing findings, such as the present one.

Figure 9. Classification of publications on water quality prediction according to (a) approach: research article versus review; (b) water body: groundwater versus surface water; (c) surface water type: river, lake, reservoir.

Within Research articles, 75% (n = 190) correspond to studies focused on surface water quality, while the remaining 25% (n = 65) address groundwater quality issues (Figure 9b). This bias towards the study of surface waters can be attributed to the relevance of watershed monitoring for the sustainable management of water resources. This monitoring allows us to understand the complex interactions between biological, chemical, physical, and environmental factors that determine water quality, as well as to anticipate the appearance of alterations to water quality. This information is crucial for informed decision-making and long-term planning, which are essential for preserving the health of water bodies and the ecosystems that depend on them [113].

Figure 9c presents the distribution of works considering only surface water bodies, i.e., rivers, lakes, and reservoirs. Rivers receive the most attention, possibly because they are particularly exposed to pollution from anthropogenic activities [39,114]. In addition, during droughts, rivers are more easily and rapidly affected by reduced flow, which alters the physical, chemical, and biological properties of the water [115]. The study [57] made a significant contribution with an exhaustive review of the 2000–2020 literature on river water quality modeling worldwide, covering 209 research articles from Scopus journals. Its results showed that most study areas are in Asian countries, such as China, Iran, India, Malaysia, Taiwan, Korea, Iraq, Bangladesh, and Thailand, which account for more than 50% of research papers. These results are consistent with the findings of the current work for the period 2015–2024. Based on the classification shown in Figure 9, this study examines in depth the two water bodies that currently attract the greatest interest in the application of AI/ML/DL techniques to water quality prediction: rivers and groundwater. A detailed description of the literature for each is presented below.

3.3.1. Prediction of River Water Quality Using AI/ML/DL

For publications on river water quality prediction, Table 6 characterizes a random sample equivalent to 40% of these studies in the database (n = 57).

Table 6. Characterization of predictive models in a representative sample of river water quality prediction studies (n = 57).

Recent advances in AI/ML/DL techniques have significantly improved the prediction of river water quality, as shown by the 57 studies summarized in Table 6. These investigations employ a range of algorithmic approaches, including CNN-LSTM, XGB, RF, LSTM, SVM, and hybrid models, to estimate individual water quality parameters (WQPs) and the water quality index (WQI) in rivers across Asia, America, Europe, and Africa. Artificial neural networks (ANNs) have played a prominent role in this domain owing to their capacity to model complex nonlinear relationships and anticipate fluctuations in water quality, as confirmed by studies [135,140]. In [42], ANN-based models were evaluated alongside gradient boosted trees (GBT), decision trees (DTs), support vector machines (SVMs), and random forests (RFs) for predicting the WQI of the Indian river system, with GBT achieving the highest performance. Similarly, [157] implemented four standalone decision tree models and twelve hybrid configurations to estimate the WQI of the Talar River in Iran. The Bagging-Random Tree (BA-RT) approach yielded the most accurate results, reinforcing the advantage of hybrid models over simpler alternatives, as also noted in [161].
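To make the bagging idea behind approaches such as BA-RT concrete, the sketch below trains many depth-1 regression trees (stumps) on bootstrap resamples of the data and averages their predictions. It is a minimal pure-Python illustration under stated assumptions: a single predictor, synthetic data, and stumps in place of full random trees. It is not the configuration used in [157].

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (one split on x) by minimizing squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2  # candidate split halfway between neighbouring x values
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # all x values identical: predict the mean
        m = mean(ys)
        return lambda x: m
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Bagging: fit one stump per bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


# Hypothetical data: one predictor with a step-like effect on a WQI-like target.
xs = list(range(20))
ys = [10.0 if x < 10 else 50.0 for x in xs]
predict = fit_bagging(xs, ys)
print(round(predict(2), 1), round(predict(17), 1))
```

Averaging over bootstrap resamples reduces the variance of individual trees, which is the main reason bagged ensembles tend to be more stable than a single decision tree.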

The integration of optimization algorithms has substantially improved the accuracy of machine learning (ML) models for river water quality prediction. For instance, studies [24,95] employed Particle Swarm Optimization (PSO) and Sequential Minimal Optimization (SMO) to strengthen models such as PSO-DBN-LSSVR for the Juhe River in China and SMO-SVM for the Wadi Saf-Saf River in Algeria, respectively. Additionally, the Wavelet Transform (WT) has been key to identifying relevant variables and reducing noise, as demonstrated in studies on the Aji-Chay River (Iran) [89], Fujian (China) [103], Dongjiang (China) [147], and Sefid Rud (Iran) [156]. These enhancements have improved the predictive performance of models such as WT-SVM, WT-RF, and WT-ANN. Another widely applied technique is XGB, which has been successfully implemented in basins such as the Delaware River (USA) [117], the Han River (South Korea), and regions of the northwestern United States [142], owing to its ability to capture complex interactions among predictors and deliver high predictive accuracy.
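The wavelet denoising step can be illustrated with a single-level Haar transform: the series is split into approximation and detail coefficients, detail coefficients below a threshold (assumed to be noise) are shrunk toward zero by soft thresholding, and the signal is reconstructed. This is a minimal pure-Python sketch; the cited studies may use other wavelet families, more decomposition levels, and different threshold rules, and the threshold value used here is an assumption.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (signal length must be even)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, rebuilding each pair of samples."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def soft_threshold(values, thr):
    """Shrink coefficients toward zero; those with |v| <= thr become 0."""
    return [math.copysign(max(abs(v) - thr, 0.0), v) for v in values]


def wavelet_denoise(signal, thr):
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, thr))


# Hypothetical dissolved-oxygen readings (mg/L) with small within-pair jitter.
readings = [8.0, 8.2, 7.9, 8.1, 6.0, 6.2]
print([round(v, 2) for v in wavelet_denoise(readings, thr=0.5)])
```

A single decomposition level only smooths adjacent pairs of samples; multilevel decompositions and data-driven thresholds are more typical in practice, and the denoised series would then be passed to a model such as SVM, RF, or ANN.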

Hybrid architectures that integrate Convolutional Neural Networks (CNNs) with LSTM units have also performed strongly in river water quality prediction, as shown in studies on the Yangtze River [116,158]. In [114], the SSA-CNN-LSTM model was applied to the Sheshui River, effectively integrating temporal and spatial patterns to estimate water quality parameters (WQPs). Other studies have explored variants such as Gated Recurrent Units (GRUs) [124,127], Bidirectional GRU (BiGRU) networks [150], and hybrid approaches such as ANFIS–GP and ANFIS–SC [145] to address nonlinear modeling challenges. In this context, emerging techniques such as Transfer Learning (TL) have shown considerable promise by enabling knowledge acquired in one setting to be reused in new environments, thereby improving predictive performance, as demonstrated in studies on the Fujian river system in China [102,103].

Finally, the studies listed in Table 6 cover a geographically diverse set of river systems, including the Yangtze [116,158], Sheshui [114], Tanjiang [123], Li and Liu [127], Pearl [137], Fuyang [138], Xiaofu [130], Lijiang [131], Juhe [24], Euphrates [128,145], Júcar [134], Yamuna [30,145], Langat [141], Klang [148], Sefid Rud [156], Talar [136], Sefidrood [94], Aji-Chay [89], Nakdong [149,155], Oyster River [126], Upper Red River Basin [102], Danube [144], Burnett [154], Kelantan [140], and Bullfrog [135]. These rivers span diverse climatic zones, from tropical and temperate to arid and mountainous, and reflect a broad spectrum of hydrological and geological contexts. This geographic breadth highlights the versatility of AI/ML/DL models in adapting to varied environmental conditions, reinforcing their value as effective tools for the sustainable management of water resources.

3.3.2. Prediction of Groundwater Quality Using AI/ML/DL

Regarding groundwater, Table 7 presents a random sample of 40% (n = 26) of the publications on groundwater quality prediction found in our database, spanning various regions of the world. Diverse environmental, hydrogeological, socioeconomic, and technological factors have shaped the development of advanced computational approaches for predicting groundwater quality.

Table 7. Characterization of predictive models in a representative sample of groundwater quality prediction studies (n = 26).

These approaches range from traditional statistical models and standalone machine learning algorithms (e.g., decision trees and support vector machines) to hybrid and ensemble methods and SHAP-enhanced models. They also include the integration of geospatial tools, remote sensing data, and cloud-based platforms for data processing and visualization. The most frequently studied contaminants are nitrates, arsenic, fluoride, heavy metals, and total dissolved solids. In addition to studies focused on individual water quality parameters (WQPs), some research uses integrative indices such as the water quality index (WQI), irrigation water quality index (IWQI), entropy-weighted water quality index (EWQI), and groundwater quality index (GWQI).
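Because several of these studies use the EWQI as a prediction target, the sketch below shows one common formulation: each parameter is min-max normalized across samples, its Shannon entropy e_j determines a weight w_j = (1 − e_j) / Σ(1 − e_k), each concentration is scaled by a guideline value as q_j = 100·C_j/S_j, and the index is the weighted sum Σ w_j·q_j. The normalization variant, the parameter set, and the guideline values are assumptions for illustration; individual studies differ in these choices and in the standards they apply.

```python
import math


def entropy_weights(samples):
    """Entropy weights for the columns (parameters) of a samples x parameters matrix."""
    m, n = len(samples), len(samples[0])
    if m < 2:
        raise ValueError("at least two samples are needed")
    divergence = []  # 1 - e_j for each parameter j
    for j in range(n):
        col = [row[j] for row in samples]
        lo, hi = min(col), max(col)
        # Min-max normalization; a constant column carries no information (e_j = 1).
        y = [(v - lo) / (hi - lo) for v in col] if hi > lo else [0.0] * m
        total = sum(y)
        if total == 0:
            divergence.append(0.0)
            continue
        p = [v / total for v in y]
        # Shannon entropy scaled to [0, 1]; 0 * ln(0) is taken as 0.
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(m)
        divergence.append(1.0 - e)
    s = sum(divergence)
    if s == 0:  # no parameter varies: fall back to equal weights
        return [1.0 / n] * n
    return [d / s for d in divergence]


def ewqi(sample, weights, standards):
    """EWQI = sum_j w_j * q_j, where q_j = 100 * C_j / S_j."""
    return sum(w * 100.0 * c / s for w, c, s in zip(weights, sample, standards))


# Hypothetical wells: nitrate (mg/L), fluoride (mg/L), TDS (mg/L).
samples = [[20.0, 0.8, 450.0], [45.0, 1.2, 900.0], [10.0, 0.5, 300.0]]
standards = [50.0, 1.5, 1000.0]  # assumed guideline values, for illustration only
weights = entropy_weights(samples)
for well in samples:
    print(round(ewqi(well, weights, standards), 1))
```

Parameters whose values vary more across samples receive larger weights, so the index emphasizes the constituents that discriminate most between wells.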

The wide variety of AI/ML/DL models reflects a transition from traditional approaches such as ANN, LSTM, Multilayer Perceptron (MLP), and CNN to more sophisticated models such as XGB and LightGBM. These models are generally applied in studies that require modeling nonlinear relationships or time series and have also been widely used to estimate water quality indices. The studies listed in Table 7 reveal a current trend toward ensemble algorithms, such as Random Forest, Gradient Boosting, and Bagging, which are considered robust methods capable of handling high-dimensional and multicollinear predictor sets [46,164].

Ensemble algorithms such as CatBoost, Bagging, and Extra Trees have also been frequently used to predict specific WQPs such as nitrate [162,165,176,184], salinity [164,173], and metals [163,180], as well as water quality indices such as the WQI, IWQI, EWQI, and GWQI. These algorithms can handle multiple hydrochemical variables and produce reliable predictions in complex environments. Regression models have also been widely used in this area; for example, Multinomial Logistic Regression (MnLR) classifies water quality into multiple categories, which is useful in environmental risk and zoning studies [178].

Among the emerging models, Generative Adversarial Networks (GANs) [163] and the Group Method of Data Handling (GMDH) [167] stand out. They are generally used in contexts with complex dynamics, particularly when data are scarce, and have proven useful for predicting Sr²⁺ and salinity levels [163]. In addition, explainability techniques such as SHAP and LIME represent a significant advance in the interpretability of predictive models applied to groundwater quality [180]. These techniques, typical of the field of XAI, are important because they break down the outputs of complex algorithms into quantifiable contributions of each input variable, showing how much each factor influences the final prediction. For example, [164] applied SHAP to assess the impact of physicochemical factors on salinity levels in multiple aquifers, revealing key spatial patterns through interpretable maps. Similarly, [165] used SHAP in a nitrate prediction model to identify the main geoenvironmental drivers of pollution in a UK aquifer, integrating interpretation and prediction into a unified framework.
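The additive decomposition that SHAP performs can be illustrated by computing exact Shapley values for a tiny model, enumerating every subset (coalition) of input features. In this sketch, features outside a coalition are set to a baseline value, which is one of several conventions. The toy salinity model, its coefficients, and the baseline are hypothetical. The shap library itself provides faster, model-specific explainers (e.g., for tree ensembles) rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x); features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Toy salinity (EC) model with an interaction term; coefficients are hypothetical.
def toy_ec_model(z):
    chloride, sodium, depth = z
    return 2.0 * chloride + 1.5 * sodium + 0.01 * chloride * sodium - 0.5 * depth


x = [120.0, 80.0, 35.0]        # sample to explain
baseline = [60.0, 40.0, 50.0]  # assumed reference values (e.g., dataset means)
phi = shapley_values(toy_ec_model, x, baseline)
print([round(v, 2) for v in phi], round(toy_ec_model(x) - toy_ec_model(baseline), 2))
```

The contributions always sum to the difference between the prediction for the sample and the prediction for the baseline. This is the property that lets SHAP attribute a single prediction to its input variables.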

In this sense, combining GIS technologies with XAI techniques strengthens the capacity to represent results spatially, detect risk areas, and generate practical tools for water management. Together, AI/ML/DL models allow groundwater quality prediction to be approached from an interdisciplinary, explanatory, and applied perspective, making them key tools for water sustainability in complex environmental scenarios [108,170].

From a hydrogeological perspective, the studies cover regions with a wide range of conditions, including aquifers affected by saline intrusion (Vietnam, Iran), arid zones (Algeria, Saudi Arabia), and areas with carbonate lithology that increases water hardness (India, USA). Natural processes such as weathering, evapotranspiration, and volcanic emissions also contribute to the spatial and temporal variability of hydrochemical parameters [163,179,180]. These hydrogeological factors, combined with local climatic conditions, create physically complex and dynamic environments that demand predictive models with high adaptive capacity. The suitability of groundwater for human consumption is conditioned by these factors, including variations in well depth, which can significantly alter water mineralization and contaminant mobility through interactions with lithological formations, hydraulic gradients, and specific redox conditions [174,186]. Therefore, the applied algorithms must be capable of capturing aquifer heterogeneity by integrating the geological and hydrodynamic variables that directly influence groundwater quality.

Finally, at the regional level, Asia leads scientific production, with strong representation from India, China, Iran, and Bangladesh. These countries face severe water stress, which helps explain the scientific community’s growing interest in groundwater quality and in the development of advanced predictive models.

4. Answering the Research Questions

Consistent with the B-SLR framework adopted in this study, the guiding research questions are answered below.

RQ1. What are the most widely used AI/ML/DL algorithms in water quality prediction?

The literature review showed that ensemble models, such as Bagging and Boosting, have become the predominant techniques for predicting both surface water and groundwater quality [187]. Their effectiveness is supported by studies demonstrating their ability to reduce mean absolute and squared errors [188]. In addition, recent research has explored variants of these models, such as Grid Search Random Forest (GS-RF) and XGB, to optimize prediction accuracy for parameters such as turbidity and various nutrients [189]. Other work has shown that CatBoost Regression (CBR) offers advantages in stability and adaptability, particularly in handling heterogeneous datasets, minimizing overfitting, and maintaining consistent performance across varying environmental conditions and input configurations [173].

These ensemble models have been successfully applied in diverse hydrological contexts, including urban and rural rivers [57], and their use extends to the prediction of water quality indices in complex multivariable scenarios [190]. Moreover, their ability to handle large volumes of data and correlated variables makes them robust tools for environmental studies.
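Boosting differs from bagging in that each new learner is fitted to the residuals of the current ensemble rather than to a bootstrap resample. The sketch below implements gradient boosting for squared error with regression stumps in plain Python. The learning rate, number of rounds, and data are assumptions. Libraries such as XGBoost, LightGBM, and CatBoost add regularization and much faster split finding, among other refinements, on top of this basic loop.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Depth-1 regression tree minimizing squared error on the residuals."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        ml, mr = mean(left), mean(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # all x values identical: constant correction
        m = mean(residuals)
        return lambda x: m
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is fitted to the current residuals."""
    base = mean(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Hypothetical nonlinear relation between one predictor and a water quality target.
xs = list(range(10))
ys = [float(x * x) for x in xs]
model = fit_gradient_boosting(xs, ys)
mse = mean((model(x) - y) ** 2 for x, y in zip(xs, ys))
baseline_mse = mean((mean(ys) - y) ** 2 for y in ys)
print(f"training MSE: {mse:.2f} vs. constant baseline: {baseline_mse:.2f}")
```

The small learning rate makes each stump correct only part of the remaining error, which is the shrinkage mechanism that helps boosted ensembles generalize.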

On the other hand, ANN-based models have evolved considerably in recent years, incorporating more flexible architectures and optimization techniques such as genetic algorithms, wavelet transforms, and nature-inspired hybrid strategies, which improve performance on nonlinear and highly noisy datasets [109,161]. Their applicability has extended from the prediction of individual parameters to multivariable estimates of water quality, including dissolved oxygen and organic pollutants. Models based on fuzzy logic have also gained relevance because they can handle the uncertainty inherent in environmental data. Indeed, research in Saudi Arabia, India, and China has shown that fuzzy logic, when integrated with neural networks or evolutionary algorithms, performs well in regions with fragmented or sparse data [100,191,192]. Techniques such as the Extreme Learning Machine (ELM), in combination with RF, have also been used to extend applicability to environments with high hydrological variability [127].

Additionally, algorithms such as SVM, SVR, and RF remain highly popular due to their effectiveness in estimating water quality parameters. These methods offer non-parametric solutions that learn directly from observed data, facilitating uncertainty management and contextualized interpretation of pollution patterns [26].

The bibliometric analysis also reflects a sustained increase in the integration of XAI techniques in water quality prediction research, aimed at interpreting the internal behavior of complex models and facilitating transparent decision-making. Techniques such as SHAP and LIME have been implemented to overcome the “black box” nature of ML/DL algorithms, allowing researchers to assess and visualize the influence of each predictor variable on the results [85,107,193]. XAI has been shown to improve model reliability and provide useful explanations for water quality management [108,194,195]. A prominent example is [196], which developed an interpretable learning framework based on SHAP and RF and applied it to hydrodynamic scenarios. This approach clarified the impact of environmental variables on water quality, reinforcing the usefulness of XAI in complex studies of aquatic systems.

Figure 10a summarizes the predictive models identified in this study, classifying the AI/ML/DL algorithms most commonly used in freshwater quality prediction for both surface water and groundwater systems. Figure 10b presents the percentage share of the most widely applied models in surface and groundwater quality prediction.

Figure 10. ML/DL models most used in water quality prediction: (a) Classification; (b) Percentage of applicability.

Finally, building reliable predictive models involves following a rigorous workflow that includes database consolidation, raw data preprocessing, proper predictive algorithm selection, model training, and validation. Recent literature underscores that each stage is critical to ensure the accuracy and robustness of the predictive model [47,197,198,199]. In particular, the selection of the algorithm must be aligned with the nature of the data and the specific objectives of a given study, ensuring interpretable, adaptable and relevant results for environmental decision-making.
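The validation stage of this workflow usually reports error metrics on data withheld from training. For time-ordered monitoring records, a chronological split avoids leaking future information into the training set. The helpers below compute RMSE, MAE, and the coefficient of determination R². The 80/20 split ratio, the persistence baseline forecast, and the data are assumptions for illustration.

```python
import math


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling, so test data are strictly later."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily dissolved-oxygen series (mg/L).
dissolved_oxygen = [8.1, 8.0, 7.8, 7.9, 7.5, 7.4, 7.6, 7.2, 7.0, 7.1]
train, test = chronological_split(dissolved_oxygen)
# Persistence baseline: predict each test value with the previous observation.
preds = dissolved_oxygen[len(train) - 1:len(dissolved_oxygen) - 1]
print(rmse(test, preds), mae(test, preds), r2(test, preds))
```

With only two test points, R² is unstable and can be strongly negative; the example illustrates only the mechanics. A trained model would be worthwhile only if it beat such a naive baseline on a realistically sized test period.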

RQ2. Which AI/ML/DL algorithm allows a better estimate of water quality?

The literature review identified a wide range of AI/ML/DL models, developed both as standalone approaches and as hybrid frameworks, to enhance water quality prediction. Identifying a single best predictive model remains a challenge for researchers. However, certain algorithms stand out for their efficiency. For instance, the light gradient boosting machine (LightGBM) has emerged as a highly effective option owing to its ability to process large datasets, its fast training, and an architecture designed to minimize computational cost while maximizing predictive accuracy. Recent studies have reported accuracies exceeding 90% in the estimation of physicochemical parameters [200,201,202]. This technique is particularly notable for its consistent performance in water bodies with high variability.

The XGB algorithm has demonstrated remarkable performance in classifying both surface and groundwater quality, achieving accuracy levels close to 89% [203,204]. Its integration with XAI techniques such as SHAP enables the interpretation of key indicators such as zinc, nitrates, and chlorides, thereby improving the transparency of predictive models [203]. On the other hand, study [205] evaluated the XGB model alongside the ETR and GBR regression models to predict water quality in the Weihe River Basin, China. The study addressed the challenges of monitoring NH₃-N, TP, COD, and DO using multispectral images by integrating meteorological data (temperature, precipitation, evapotranspiration, and wind speed) and land use types as covariates in the models, which were assessed under three scenarios: remote sensing; remote sensing + meteorology; and remote sensing + meteorology + land use. Methods such as Correlation Analysis (CA), Feature Importance Analysis (FIA), and SHAP were employed to select sensitive spectral bands and evaluate the contribution of multi-source data to improve model accuracy. The results showed that the GBR and ETR models were more robust and transferable, enabling the generation of spatial inversion maps with accurate concentrations of WQPs.

Time-series-based models, particularly LSTM architectures and their hybrid variants such as LSTM-CNN and CEEMDAN-LSTM, have achieved accuracies ranging from 90% to 93% by capturing complex dynamic patterns [6,43,88,116,155,206]. The incorporation of Transfer Learning further enhances performance by enabling efficient adaptation under data limitations [101,103,104]. Although MLP models have a simpler structure, they yield accuracies between 85% and 89% in sequential scenarios. Optimizing their parameters through genetic algorithms has improved precision in hydrologically realistic contexts [197,207,208,209].
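To show what an LSTM unit computes at each time step, the sketch below implements the standard forward pass of a single cell with a scalar input and a scalar hidden state. The forget, input, and output gates control how the cell state is carried forward, updated, and exposed. The weights are arbitrary placeholders, not trained values. Real models use vector states, learned parameters, and a deep-learning framework.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, p):
    """One LSTM time step for scalar x, h, c; p maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        wx, wh, b = p[gate]
        return wx * x + wh * h + b

    f = sigmoid(pre("forget"))       # fraction of the old cell state to keep
    i = sigmoid(pre("input"))        # fraction of the candidate to write
    o = sigmoid(pre("output"))       # fraction of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate update
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(sequence, p):
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    return h, c


params = {  # placeholder weights (hypothetical, untrained)
    "forget": (0.5, 0.1, 1.0),
    "input": (0.8, 0.2, 0.0),
    "output": (0.6, 0.3, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}
# Standardized dissolved-oxygen readings (hypothetical values).
h_final, c_final = run_lstm([0.2, -0.1, 0.4, 0.3], params)
print(h_final, c_final)
```

Because the cell state is updated additively (f·c + i·g) rather than being overwritten, information can persist over many steps. This is what lets LSTMs capture the long-term dependencies mentioned above.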

Finally, hybrid approaches that combine multiple algorithms have reached accuracies above 92%, standing out for their adaptability to outlier values and their generalization capacity [157,210]. In summary, although no single model consistently outperforms across all contexts, algorithms that integrate interpretability, deep architecture, and hybrid strategies have delivered more accurate and reliable results. The selection of the most suitable model depends on the type of data, the analytical objectives, and the specific conditions of the water system under study.

RQ3. What limitations have been identified in the use of AI/ML/DL for water quality prediction?

A relevant limitation identified in this work concerns the assessment of spatial and temporal variations in water quality when data are missing, mainly because of measurement system failures, operational errors, environmental phenomena, and non-continuous sampling of water quality. These gaps constrain the availability of reliable datasets for classification and evaluation. In this context, Time Series Analysis (TSA) models combined with machine learning approaches, particularly long short-term memory (LSTM) networks, have proven highly effective in addressing these challenges and estimating future values of water quality parameters (WQPs) or water quality indices (WQIs) from historical data [211,212,213].
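Before an LSTM or another regressor can learn from a monitoring record, the series is typically reshaped into supervised pairs of an input window and the next value. The sketch below builds such pairs and skips any window that touches a missing observation (represented as None). This is one conservative option; interpolation or model-based gap filling are alternatives. The window length and the data are assumptions.

```python
def make_windows(series, window):
    """Return (inputs, target) pairs; windows containing None are skipped."""
    pairs = []
    for start in range(len(series) - window):
        chunk = series[start:start + window + 1]
        if any(v is None for v in chunk):
            continue  # gap from a sensor failure or irregular sampling
        pairs.append((chunk[:-1], chunk[-1]))
    return pairs


# Hypothetical nitrate series (mg/L) with two missing samples.
nitrate = [4.1, 4.3, None, 4.6, 4.8, 5.0, 4.9, None, 5.2]
for inputs, target in make_windows(nitrate, window=3):
    print(inputs, "->", target)
```

In this example, two missing samples out of nine leave only one complete training pair. This shows how quickly gaps reduce usable data and why gap-filling strategies such as Transfer Learning receive attention.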

Compared with traditional physically based models, DL models for water quality prediction face several challenges. These include complex internal structures and parameter tuning, reliance on large datasets for effective training, and a lack of physical constraints, which can make prediction results difficult to explain. Similarly, obtaining highly reliable data for certain water quality parameters can be difficult, limiting the applicability of deep learning approaches [214]. However, the development of hybrid models and interpretable approaches has begun to address these challenges [25,88,114,215,216]. In general, the accuracy of AI/ML/DL predictions depends on the availability and quality of historical data, and the resulting models can be sensitive to changes in environmental conditions over time.

Recent literature has shown that some of these limitations can be mitigated by combining LSTM models with Transfer Learning (TL) techniques [101,103], particularly instance-based approaches such as TrAdaBoost [96]. This combination inherits the strengths of both LSTM and TL: the ability to capture long-term dependencies in time series and the flexibility to leverage knowledge from related, more complete datasets to fill long consecutive data gaps [96]. Notably, prediction performance can be further enhanced by applying wavelet transforms to suppress noise in time-series signals, which serves as an optimization mechanism for predictive modeling [89]. More generally, model performance depends heavily on how variables are cleaned, transformed, and selected, and on how model parameters are tuned. For this reason, optimization algorithms such as Particle Swarm Optimization (PSO) [217,218], among others [114,118,219], are widely used.
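Particle Swarm Optimization, mentioned above for tuning model parameters, moves a population of candidate solutions using each particle's own best position and the best position found by the swarm. The minimal sketch below minimizes a one-dimensional objective. In practice, the objective would be a validation error expressed as a function of hyperparameters. The inertia and acceleration coefficients (w = 0.7, c1 = c2 = 1.5), the bounds, and the toy objective are assumptions.

```python
import random


def pso_minimize(objective, lower, upper, n_particles=20, n_iter=100, seed=42,
                 w=0.7, c1=1.5, c2=1.5):
    """Minimize a 1-D objective on [lower, upper] with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                          # personal best positions
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: best_val[k])
    g_pos, g_val = best_pos[g], best_val[g]    # swarm (global) best
    for _ in range(n_iter):
        for k in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[k] = (w * vel[k]
                      + c1 * r1 * (best_pos[k] - pos[k])   # pull toward own best
                      + c2 * r2 * (g_pos - pos[k]))        # pull toward swarm best
            pos[k] = min(max(pos[k] + vel[k], lower), upper)  # keep inside bounds
            val = objective(pos[k])
            if val < best_val[k]:
                best_pos[k], best_val[k] = pos[k], val
                if val < g_val:
                    g_pos, g_val = pos[k], val
    return g_pos, g_val


# Toy stand-in for a validation-error curve with its minimum at 3.2 (hypothetical).
best_x, best_err = pso_minimize(lambda x: (x - 3.2) ** 2 + 0.5, lower=0.0, upper=10.0)
print(best_x, best_err)
```

PSO needs only objective evaluations, not gradients. This makes it convenient for tuning hyperparameters of models such as SVR or DBN, whose validation error is not differentiable with respect to those settings.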

Finally, variations in model performance caused by seasonal changes, extreme events, or point-source contamination, which can significantly affect prediction accuracy, have been addressed using Generative Adversarial Networks (GANs). These networks can simulate abnormal or extreme conditions that are poorly represented in real datasets [163]. Although data scarcity and climate variability driven by extreme events have posed significant challenges to water quality prediction with AI/ML/DL techniques, the recent literature reveals growing interest in bridging this gap, an essential step toward ethical and transparent water resource management. In this context, study [220] introduced the Multi-Scale Weighted-Slope Regression (MS-WSR) model, which proved robust in predicting water levels across six lakes, achieving high accuracy and adaptability under climate variability.

From an ethical and regulatory standpoint, AI-based models have emerged as powerful yet ambivalent tools in water treatment, conservation, and management. Although they offer transformative potential for addressing pressing water challenges, their use raises concerns about considerable water consumption and potential environmental impacts. These issues can be mitigated by integrating renewable energy sources, which reduces both water footprints and greenhouse gas emissions, and by implementing water reuse systems that lessen reliance on freshwater and minimize wastewater-related effects [221]. Furthermore, the lack of explainability in certain models may undermine trust in the predictions that inform decision-making in water governance. Emerging approaches such as XAI offer promising solutions to issues of opacity and bias, paving the way for more sustainable and accountable water resource management.

RQ4. What emerging variants currently exist in AI/ML/DL models for estimating water quality?

A current trend in water resource management using AI/ML/DL is the development of hybrid algorithms, which have gained relevance as a strategy for improving the accuracy of water quality parameter estimation. For instance, models based on variational mode decomposition optimized by the sparrow search algorithm (SSA-VMD), combined with bidirectional gated recurrent units (BiGRU), have achieved over 96% efficiency for Qiandao Lake, China [222]. These hybrid approaches capture both nonlinear patterns and temporal dynamics in hydro-environmental data.

In time series prediction for water quality assessment, Transformer models have shown notable advantages over LSTM networks. This architecture overcomes limitations of earlier models by effectively capturing long-term dependencies and correlations between distant points in a time series, an essential capability for predicting variables such as pH, turbidity, and dissolved oxygen. Unlike LSTMs, which process data sequentially, the Transformer processes all time steps of the input sequence in parallel through self-attention, which can significantly reduce training time, improve the extraction of relevant features, and increase overall modeling efficiency [26]. Finally, there is growing interest in enhancing the interpretability of machine learning models, given their “black box” nature. In this regard, Explainable AI (XAI) techniques have been implemented to identify the relative impact of water quality parameters on model predictions. The recent literature highlights successful applications of SHAP in models estimating salinity and dissolved oxygen and in the prediction of heavy metal concentrations, contributing to more informed decision-making in water management [164,165,223].
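The mechanism that lets a Transformer relate every time step to every other in one pass is scaled dot-product attention, softmax(QKᵀ/√d)V. The pure-Python sketch below applies it to a short multivariate sequence using identity projections (queries, keys, and values all equal to the inputs), which is a simplifying assumption. Real Transformers learn separate projection matrices, use multiple heads, add positional encodings, and usually apply a causal mask when forecasting.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = seq (rows are time steps)."""
    d = len(seq[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in seq]
              for qi in seq]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    outputs = [[sum(w * v[c] for w, v in zip(wrow, seq)) for c in range(d)]
               for wrow in weights]
    return outputs, weights


# Four time steps of hypothetical standardized (pH, turbidity, DO) readings.
seq = [[0.1, -0.3, 0.5], [0.2, -0.1, 0.4], [-0.5, 1.2, -0.8], [0.0, 0.1, 0.2]]
outputs, weights = self_attention(seq)
for t, row in enumerate(weights):
    print(t, [round(w, 3) for w in row])
```

Every row of the weight matrix is computed independently of the others. This is why attention can be evaluated for all time steps in parallel, whereas an LSTM must step through the sequence in order.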

RQ5. What are the key water quality indicators used to assess natural water sources?

Water quality assessment in natural sources relies on a set of physicochemical and biological parameters that characterize both the environmental status and the suitability of water for various uses. According to the scientific literature reviewed, the most frequently used parameters in predictive studies employing AI/ML/DL techniques include dissolved oxygen (DO), biochemical oxygen demand (BOD), chemical oxygen demand (COD), total suspended solids (TSS), temperature, pH, electrical conductivity (EC), chlorophyll-a (Chl-a), nitrates, phosphates, and coliform bacteria. These indicators are considered conventional and are regulated by international environmental standards due to their relevance in characterizing both surface and groundwater bodies.

Beyond these essential parameters, the literature shows a growing trend toward incorporating complementary variables such as heavy metals (e.g., arsenic, copper, lead), nutrients (NO₃⁻, NO₂⁻, NH₄⁺, PO₄³⁻), and trace metals (Fe, Mn, Zn, Cu, Cr), which improve discrimination between water quality categories. Additionally, integrating hydrogeological, meteorological, land use, and socioeconomic variables has proven valuable for enriching predictive models by capturing the influence of external factors on water body dynamics [129,224,225]. For example, such variables [226] help illustrate how human activities can alter the export of chemical elements through changes in vegetation cover, ultimately affecting water quality [227].

5. Contribution and Future Work

The scientific literature review conducted in this article, covering the period 2015–2024, using the integrated bibliometric-systematic literature review (B-SLR) methodology and Topic Modeling, delved into AI/ML/DL techniques for predicting freshwater quality (surface and groundwater). The review confirmed a state of maturity in the field, characterized by exponential growth in scientific publications since 2020. It also allowed us to identify key authors in the field, research gaps, emerging trends, and models with high predictive performance. The main contribution of this study lies in its methodological approach, which allowed us to identify the most relevant works that apply AI/ML/DL models in studies related to surface and groundwater quality.

The results of this research confirm the findings of study [57], which has received more than 452 citations and reviewed predictive models for river water quality over the period 2000–2020. However, by applying the B-SLR methodology, the present study substantially expanded the scope of the analysis by updating the literature to 2024, identifying emerging trends, and incorporating studies on fresh groundwater as a potential source of drinking water. This broader and more integrative approach allowed a deeper thematic mapping of the field, revealing underexplored areas and offering a more complete understanding of current scientific advances. Regarding groundwater quality, study [109], covering 1994–2022, highlighted that ANNs are the most widely used models in water quality modeling, with nitrate being the most studied parameter.

Complementing this knowledge, the present work identifies a transition from more traditional approaches, such as ANNs, to more sophisticated models, such as XGB and LightGBM, and to hybrid and explainable architectures that combine, for example, XGB or LSTM-CNN with SHAP. In addition, promising emerging approaches are identified: Generative Adversarial Networks (GANs) and the Group Method of Data Handling (GMDH), used in data-scarce contexts; Transfer Learning (TL), applied for knowledge reuse and performance improvement in data-constrained environments; and Transformer architectures, which have been shown to outperform LSTM networks in time series prediction.

As future research, we propose expanding the range of water quality parameters considered in predictive models to include emerging contaminants such as microplastics, pharmaceuticals, and persistent organic compounds (POCs), which are of current concern in water management [195]. Another promising direction is to explore the scalability of Transformer models in real-time water quality monitoring systems, especially in resource-limited settings.

Finally, we recognize that using citations as an indicator of quality has limitations. The lack of distinction between open access and subscription articles can bias the results, given that visibility and accessibility influence citation frequency. Furthermore, citation practices influenced by national or institutional affinities can distort the relationship between citation counts and intrinsic quality.

The total number of citations reflects the overall scholarly impact of a body of work, yet it may be misleading when disproportionately driven by a few highly cited publications or diluted through extensive co-authorship, obscuring individual contributions [87]. Future research should consider incorporating complementary metrics that evaluate the quality of cited works, alongside approaches that integrate contextual impact and explainability indicators. Such enhancements would enable a more equitable and nuanced assessment of scientific value, particularly in disciplines where visibility does not necessarily correlate with methodological rigor.

6. Conclusions

This research demonstrates that the Bibliometric-Systematic Literature Review (B-SLR) approach constitutes a robust methodology for analyzing the state of the art in highly dynamic scientific fields, such as water quality prediction using AI/ML/DL techniques. By integrating the structured rigor of a systematic review with the analytical depth of bibliometric analysis, the B-SLR enabled the identification of domain-specific trends, thematic mapping of knowledge, assessment of scientific impact, and detection of gaps in the literature—offering a more comprehensive, precise, and context-aware understanding of the field.

The findings reveal that ensemble models (e.g., Bagging, Boosting), deep neural networks (LSTM, CNN, MLP), and hybrid approaches have overcome the limitations of conventional methods, delivering greater accuracy, adaptability, and the ability to handle incomplete or nonlinear data. The integration of explainable artificial intelligence (XAI) techniques, such as SHAP and LIME, has facilitated the development of more transparent and reliable models, enhancing result interpretation and supporting informed decision-making. Nevertheless, the selection of the optimal predictive model depends on multiple factors, including the type of water body, geographic context, data availability, and the specific objectives of the study. In this regard, hybrid and interpretable models emerge as the most promising alternatives for addressing current challenges in water quality prediction.

Regarding the methodological approach employed in this study, the application of B-SLR enabled the refinement of an initial database of 1822 articles into a final corpus of 274 highly relevant publications, through automated procedures and manual validation. This process ensured the quality and relevance of the analyzed studies, reinforcing the reliability of the results obtained. Furthermore, a detailed classification was achieved, covering the most frequently used algorithms, the types of water bodies studied, key quality indicators, and the methodological limitations faced by predictive models. Ultimately, this work establishes a replicable and scalable methodological foundation for future research.

Author Contributions

Conceptualization, J.A.M.-A., J.N., R.O., C.A.C., J.L.A. and L.R.-L.; methodology, J.A.M.-A.; software, J.A.M.-A. and J.N.; validation, J.A.M.-A. and J.N.; formal analysis, J.A.M.-A., J.N., R.O., C.A.C., J.L.A. and L.R.-L.; investigation, J.A.M.-A., J.N. and R.O.; resources, J.N. and R.O.; data curation, J.A.M.-A. and J.N.; writing—original draft preparation, J.A.M.-A., J.N. and R.O.; writing—review and editing, J.A.M.-A., J.N., R.O., C.A.C., J.L.A. and L.R.-L.; visualization, J.A.M.-A. and J.N.; supervision, J.N. and R.O.; project administration, R.O.; funding acquisition, R.O., J.N. and J.L.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the DIDULS Regular Project PR2553851 of the University of La Serena and by ANID/FONDAP/1523A0001. The APC was sponsored by the CRHIAM Water Center, Universidad de Concepción, Chillán, Chile.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

Ricardo Oyarzún and Jorge Núñez acknowledge the financial support of DIDULS/ULS, through the project PR2553851 (University of La Serena, Chile). José Luis Arumí and Ricardo Oyarzún acknowledge the financial support of the Water Research Center CRHIAM: ANID/FONDAP/1523A0001.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AMT	Alternating Model Tree
ANFIS	Adaptive Neuro-Fuzzy Inference System
ANFIS–GP	Adaptive Neuro-Fuzzy Inference System–Grid Partitioning
ANFIS–SC	ANFIS–Subtractive Clustering
ANN	Artificial Neural Network
AO-SVM	Aquila Optimization Support Vector Machine
AR	Additive Regression
AdaBoost	Adaptive Boosting
BDT	Boosted Decision Tree
B-SLR	Bibliometric-Systematic Literature Review
BiGRU	Bi-directional Gated Recurrent Units
BMEF	Bayesian Maximum Entropy-based Fusion
BNN	Bayesian Neural Network
BOD	Biochemical Oxygen Demand
BPNN	Backpropagation Neural Network
BRT	Boosted Regression Trees
CA	Correlation Analysis
Chl-a	Chlorophyll-a
CART	Classification and Regression Tree
CatBoost	Categorical Boosting
CBR	CatBoost Regression
CEEMD	Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CNN	Convolutional Neural Network
COD	Chemical Oxygen Demand
CSA	Crow Search Algorithm
DL	Deep Learning
DBN	Deep Belief Network
DCGAN	Deep Convolutional Generative Adversarial Network
DENFIS	Dynamic Evolving Neural-Fuzzy Inference System
DNN	Deep Neural Network
DO	Dissolved Oxygen
DR	Discretization Regression
DRNN	Deep Recurrent Neural Network
DT	Decision Tree
DWT	Discrete Wavelet Transform
EANN	Emotional Artificial Neural Network
EANN-GA	Emotional Artificial Neural Network–Genetic Algorithm
EBM	Ensemble Bagged Machine
EC	Electrical Conductivity
EFuNN	Evolving Fuzzy Neural Network
ELM	Extreme Learning Machine
EN	Elastic Net
ETR	Extra Tree Regression
EWQI	Entropy-weighted Water Quality Index
ExT	Extra Trees
FIA	Feature Importance Analysis
FFNNs	Feedforward Neural Networks
FNN	Feed-forward Neural Network
FSGCN	Functional-Structural Sub-Region Graph Convolutional Network
FFA	Firefly Algorithm
GAN	Generative Adversarial Network
GB	Gradient Boosting
GBM	Gradient Boosting Machine
GBR	Gradient Boosting Regression
GBT	Gradient Boosted Trees
GC	Global Citations
GEP	Gene Expression Programming
GMDH	Group Method of Data Handling
GNB	Gaussian Naïve Bayes
GPR	Gaussian Process Regression
GRNN	Generalized Regression Neural Network
GRU	Gated Recurrent Unit
GS-RF	Grid Search-Random Forest
GS-SVR	Grid Search-Support Vector Regression
GWQI	Groundwater Quality Index
HGB	Histogram Gradient Boosting
IABC-BPNN	Improved Artificial Bee Colony–Backpropagation Neural Network
iML	Interpretable Machine Learning
IoT	Internet of Things
IWQI	Irrigation Water Quality Index
KNN	K-Nearest Neighbors
kPCA	Kernel Principal Component Analysis
LC	Local Citation
LIME	Local Interpretable Model-agnostic Explanations
LR	Logistic Regression
LRM	Logistic Regression Model
LSSVR	Least Squares Support Vector Regression
LSTM	Long Short-Term Memory
LightGBM	Light Gradient Boosting Machine
LGB	Light Gradient Boosting
MARS	Multivariate Adaptive Regression Spline
ML	Machine Learning
MLR	Multiple Linear Regression
MLRF	Multi-label Classification Through Random Forest
MLP	Multi-Layer Perceptron
MnLR	Multinomial Logistic Regression
MS-WSR	Multi-Scale Weighted-Slope Regression
NNE	Neural Network Ensemble
PCA	Principal Component Analysis
PCOs	Persistent Organic Compounds
PLS	Partial Least Squares
PNN	Probabilistic Neural Network
PSO	Particle Swarm Optimization
RBF	Radial Basis Function
RBFNN	Radial Basis Function Neural Network
RC	Random Committee
REPT	Reduced Error Pruning Tree
RF	Random Forest
RFR	Random Forest Regression
RFC	Randomizable Filtered Classifier
RNN	Recurrent Neural Network
RR	Ridge Regression
RQ	Research Questions
RT	Regression Tree
SDGs	Sustainable Development Goals
SHAP	SHapley Additive exPlanations
SLR	Simple Linear Regression
SMO	Sequential Minimal Optimization
SMO-SVM	Sequential Minimal Optimization-Support Vector Machine
SSA-CNN-LSTM	Sparrow Search Algorithm-Convolutional Neural Network-Long Short-Term Memory
SSA-VMD	Sparrow Search Algorithm–Variational Mode Decomposition
SVM	Support Vector Machines
SVR	Support Vector Regression
SVMR	Support Vector Machine Regression
SWEBM	Stochastic Weighted Ensemble Bagged Machine
TC	Total Citations
TDS	Total Dissolved Solids
TL	Transfer Learning
TSS	Total Suspended Solids
WA	Wavelet Analysis
W-MGGP	Wavelet-Multigene Genetic Programming
WQI	Water Quality Index
WQP	Water Quality Parameters
WT	Wavelet Transform
WT-ANN	Wavelet Transform-Artificial Neural Network
XAI	eXplainable Artificial Intelligence
XGB	eXtreme Gradient Boosting

References

Ahmed, W.; Mohammed, S.; El-Shazly, A.; Morsy, S. Tigris River water surface quality monitoring using remote sensing data and GIS techniques. Egypt. J. Remote Sens. Sci. 2023, 26, 816–825.
Gaagai, A.; Aouissi, H.A.; Bencedira, S.; Hinge, G.; Athamena, A.; Heddam, S.; Gad, M.; Elsherbiny, O.; Elsayed, S.; Eid, M.H.; et al. Application of Water Quality Indices, Machine Learning Approaches, and GIS to Identify Groundwater Quality for Irrigation Purposes: A Case Study of Sahara Aquifer, Doucen Plain, Algeria. Water 2023, 15, 2.
Rahaman, M.H.; Sajjad, H.; Hussain, S.; Roshani; Masroor, M.; Sharma, A. Surface water quality prediction in the lower Thoubal river watershed, India: A hyper-tuned machine learning approach and DNN-based sensitivity analysis. J. Environ. Chem. Eng. 2024, 12, 112915.
UNESCO; UN-Water; World Water Assessment Programme. The United Nations World Water Development Report 2023: Partnerships and Cooperation for Water; UNESCO: Paris, France, 2023; Available online: http://digitallibrary.un.org/record/4007797 (accessed on 26 September 2025).
Gao, J.; Zhu, S.; Li, D.; Jiang, H.; Deng, G.; Wen, Y.; He, C.; Cao, Y. Bibliometric analysis of climate change and water quality. Hydrobiologia 2023, 850, 3441–3459.
Pyo, J.; Pachepsky, Y.; Kim, S.; Abbas, A.; Kim, M.; Kwon, Y.S.; Ligaray, M.; Cho, K.H. Long short-term memory models of water quality in inland water environments. Water Res. X 2023, 21, 100207.
Hussein, E.E.; Baloch, M.Y.J.; Nigar, A.; Abualkhair, H.F.; Aldawood, F.K.; Tageldin, E. Machine Learning Algorithms for Predicting the Water Quality Index. Water 2023, 15, 3540.
Kamal, N.A.; Muhammad, N.S.; Abdullah, J. Scenario-based pollution discharge simulations and mapping using integrated QUAL2K-GIS. Environ. Pollut. 2020, 259, 113909.
Mummidivarapu, S.K.; Rehana, S.; Rao, Y.R.S. Mapping and assessment of river water quality under varying hydro-climatic and pollution scenarios by integrating QUAL2K, GEFC, and GIS. Environ. Res. 2023, 239, 117250.
Sarafaraz, J.; Kaleybar, F.A.; Karamjavan, J.M.; Habibzadeh, N. Predicting river water quality: An imposing engagement between machine learning and the QUAL2Kw models (case study: Aji-Chai, river, Iran). Results Eng. 2024, 21, 101921.
Chueh, Y.-Y.; Fan, C.; Huang, Y.-Z. Copper concentration simulation in a river by SWAT-WASP integration and its application to assessing the impacts of climate change and various remediation strategies. J. Environ. Manag. 2021, 279, 111613.
Prajapati, S.; Sabokruhie, P.; Brinkmann, M.; Lindenschmidt, K.-E. Modelling Transport and Fate of Copper and Nickel across the South Saskatchewan River Using WASP—TOXI. Water 2023, 15, 265.
Alam, R.; Ahmed, Z.; Seefat, S.M.; Nahin, K.T.K. Assessment of surface water quality around a landfill using multivariate statistical method, Sylhet, Bangladesh. Environ. Nanotechnol. Monit. Manag. 2021, 15, 100422.
Isaac, R.; Siddiqui, S.; Higgins, P.; Paul, A.S.; Lawrence, N.A.; Lall, A.S.; Khatoon, A.; Singh, A.; Majeed, P.A.; Massey, S.; et al. Assessment of seasonal impacts on Water Quality in Yamuna river using Water Quality Index and Multivariate Statistical approaches. Waste Manag. Bull. 2024, 2, 145–153.
Fernandes, C.P.; Fonseca, A.R.; Pacheco, F.A.L.; Fernandes, L.F.S. Water quality predictions through linear regression—A brute force algorithm approach. MethodsX 2023, 10, 102153.
Galoie, M.; Motamedi, A.; Fan, J.; Moudi, M. Prediction of water quality under the impacts of fine dust and sand storm events using an experimental model and multivariate regression analysis. Environ. Pollut. 2023, 336, 122462.
Pandey, D.K.; Hunjra, A.I.; Bhaskar, R.; Al-Faryan, M.A.S. Artificial intelligence, machine learning and big data in natural resources management: A comprehensive bibliometric review of literature spanning 1975–2022. Resour. Policy 2023, 86, 104250.
Li, X.; Su, J.; Wang, H.; Boczkaj, G.; Mahlknecht, J.; Singh, S.V.; Wang, C. Bibliometric analysis of artificial intelligence in wastewater treatment: Current status, research progress, and future prospects. J. Environ. Chem. Eng. 2024, 12, 113152.
Gonzales-Inca, C.; Calle, M.; Croghan, D.; Haghighi, A.T.; Marttila, H.; Silander, J.; Alho, P. Geospatial Artificial Intelligence (GeoAI) in the Integrated Hydrological and Fluvial Systems Modeling: Review of Current Applications and Trends. Water 2022, 14, 2211.
Aliaga-Alvarado, M.; Gómez-Escalonilla, V.; Martínez-Santos, P. Identification of non-conventional groundwater resources by means of machine learning in the Aconcagua basin, Chile. J. Hydrol. Reg. Stud. 2023, 49, 101502.
Khoi, D.N.; Quan, N.T.; Linh, D.Q.; Nhi, T.T.; Thuy, N.T.D. Using Machine Learning Models for Predicting the Water Quality Index in the La Buong River, Vietnam. Water 2022, 14, 1552.
Mallya, G.; Hantush, M.M.; Govindaraju, R.S. A Machine Learning Approach to Predict Watershed Health Indices for Sediments and Nutrients at Ungauged Basins. Water 2023, 15, 586.
Ghobadi, F.; Kang, D. Application of Machine Learning in Water Resources Management: A Systematic Literature Review. Water 2023, 15, 620.
Yan, J.; Gao, Y.; Yu, Y.; Xu, H.; Xu, Z. A Prediction Model Based on Deep Belief Network and Least Squares SVR Applied to Cross-Section Water Quality. Water 2020, 12, 1929.
Zhou, Y. Real-time probabilistic forecasting of river water quality under data missing situation: Deep learning plus post-processing techniques. J. Hydrol. 2020, 589, 125164.
Tripathy, K.; Mishra, A.K. Deep learning in hydrology and water resources disciplines: Concepts, methods, applications, and research directions. J. Hydrol. 2024, 628, 130458.
Zheng, Y.; Wei, J.; Zhang, W.; Zhang, Y.; Zhang, T.; Zhou, Y. An ensemble model for accurate prediction of key water quality parameters in river based on deep learning methods. J. Environ. Manag. 2024, 366, 121932.
Chellaiah, C.; Anbalagan, S.; Swaminathan, D.; Chowdhury, S.; Kadhila, T.; Shopati, A.K.; Shangdiar, S.; Sharma, B.; Amesho, K.T.T. Integrating deep learning techniques for effective river water quality monitoring and management. J. Environ. Manag. 2024, 370, 122477.
Prasad, D.V.V.; Venkataramana, L.Y.; Kumar, P.S.; Prasannamedha, G.; Harshana, S.; Srividya, S.J.; Harrinei, K.; Indraganti, S. Analysis and prediction of water quality using deep learning and auto deep learning techniques. Sci. Total Environ. 2022, 821, 153311.
Khullar, S.; Singh, N. Water quality assessment of a river using deep learning Bi-LSTM methodology: Forecasting and validation. Environ. Sci. Pollut. Res. 2022, 29, 12875–12889.
United Nations Department of Economic and Social Affairs. The Sustainable Development Goals Report 2023; Special Edition; United Nations Department of Economic and Social Affairs: New York, NY, USA, 2023; Available online: https://unstats.un.org/sdgs/report/2023/The-Sustainable-Development-Goals-Report-2023.pdf (accessed on 4 June 2025).
UN-Water. The Sustainable Development Goal 6 Global Acceleration Framework; UN-Water: Geneva, Switzerland, 2020; Available online: https://unsceb.org/sdg-6-global-acceleration-framework (accessed on 4 June 2025).
de Carvalho Marques, M.; Mohamed, A.A.; Feitosa, P. Sustainable development goal 6 monitoring through statistical machine learning—Random Forest method. Clean. Prod. Lett. 2025, 8, 100088.
Rodríguez-López, L.; Bustos Usta, D.; Bravo Alvarez, L.; Duran-Llacer, I.; Lami, A.; Martínez-Retureta, R.; Urrutia, R. Machine Learning Algorithms for the Estimation of Water Quality Parameters in Lake Llanquihue in Southern Chile. Water 2023, 15, 1994.
Ansari, T.; Nigar, N.; Faisal, H.M.; Shahzad, M.K. AI for clean water: Efficient water quality prediction leveraging machine learning. Water Pract. Technol. 2024, 19, 1986–1996.
Zamani, M.G.; Nikoo, M.R.; Niknazar, F.; Al-Rawas, G.; Al-Wardy, M.; Gandomi, A.H. A multi-model data fusion methodology for reservoir water quality based on machine learning algorithms and bayesian maximum entropy. J. Clean. Prod. 2023, 416, 137885.
Haghiabi, H.; Nasrolahi, A.H.; Parsaie, A. Water quality prediction using machine learning methods. Water Qual. Res. J. 2018, 53, 3–13.
Satish, N.; Anmala, J.; Varma, M.R.R.; Rajitha, K. Performance of Machine Learning, Artificial Neural Network (ANN), and stacked ensemble models in predicting Water Quality Index (WQI) from surface water quality parameters, climatic and land use data. Process Saf. Environ. Prot. 2024, 192, 177–195.
Farzana, S.Z.; Paudyal, D.R.; Chadalavada, S.; Alam, M.J. Temporal Dynamics and Predictive Modelling of Streamflow and Water Quality Using Advanced Statistical and Ensemble Machine Learning Techniques. Water 2024, 16, 2107.
Uddin, M.G.; Nash, S.; Diganta, M.T.M.; Rahman, A.; Olbert, A.I. Robust machine learning algorithms for predicting coastal water quality index. J. Environ. Manag. 2022, 321, 115923.
Shamsuddin, I.S.; Othman, Z.; Sani, N.S. Water Quality Index Classification Based on Machine Learning: A Case from the Langat River Basin Model. Water 2022, 14, 2939.
Singh, S.; Das, A.; Sharma, P. Predictive modeling of water quality index (WQI) classes in Indian rivers: Insights from the application of multiple Machine Learning (ML) models on a decennial dataset. Stoch. Environ. Res. Risk Assess. 2024, 38, 3221–3238.
Yao, J.; Chen, S.; Ruan, X. Interpretable CEEMDAN-FE-LSTM-transformer hybrid model for predicting total phosphorus concentrations in surface water. J. Hydrol. 2024, 629, 130609.
Flores, V.; Bravo, I.; Saavedra, M. Water Quality Classification and Machine Learning Model for Predicting Water Quality Status—A Study on Loa River Located in an Extremely Arid Environment: Atacama Desert. Water 2023, 15, 2868.
Masood, A.; Niazkar, M.; Zakwan, M.; Piraei, R. A Machine Learning-Based Framework for Water Quality Index Estimation in the Southern Bug River. Water 2023, 15, 3543.
Aju, D.; Achu, A.L.; Mohammed, M.; Raicy, M.C.; Gopinath, G.; Reghunath, R. Groundwater quality prediction and risk assessment in Kerala, India: A machine-learning approach. J. Environ. Manag. 2024, 370, 122616.
Zhu, M.; Wang, J.; Yang, X.; Zhang, Y.; Zhang, L.; Ren, H.; Wu, B.; Ye, L. A review of the application of machine learning in water quality evaluation. Eco-Environ. Health 2022, 1, 107–116.
del Castillo, F.; Garibay, M.V.; Díaz-Vázquez, D.; Yebra-Montes, C.; Brown, L.E.; Johnson, A.; Garcia-Gonzalez, A.; Gradilla-Hernández, M.S. Improving river water quality prediction with hybrid machine learning and temporal analysis. Ecol. Inform. 2024, 82, 102655.
Yan, T.; Zhou, A.; Shen, S.-L. Prediction of long-term water quality using machine learning enhanced by Bayesian optimisation. Environ. Pollut. 2023, 318, 120870.
Liu, C.; Xu, J.; Li, X.; Yu, Z.; Wu, J. Water resource forecasting with machine learning and deep learning: A scientometric analysis. Artif. Intell. Geosci. 2024, 5, 100084.
Jayaraman, P.; Nagarajan, K.K.; Partheeban, P.; Krishnamurthy, V. Critical review on water quality analysis using IoT and machine learning models. Int. J. Inf. Manag. Data Insights 2024, 4, 100210.
Li, W.; Zhao, Y.; Zhu, Y.; Dong, Z.; Wang, F.; Huang, F. Research progress in water quality prediction based on deep learning technology: A review. Environ. Sci. Pollut. Res. 2024, 31, 26415–26431.
Nordin, N.F.C.; Mohd, N.S.; Koting, S.; Ismail, Z.; Sherif, M.; El-Shafie, A. Groundwater quality forecasting modelling using artificial intelligence: A review. Groundw. Sustain. Dev. 2021, 14, 100643.
Li, X.; Li, Y.; Li, G. A scientometric review of the research on the impacts of climate change on water quality during 1998–2018. Environ. Sci. Pollut. Res. 2020, 27, 14322–14341.
Bose, S.; Mazumdar, A.; Basu, S. Evolution of groundwater quality assessment on urban area- a bibliometric analysis. Groundw. Sustain. Dev. 2023, 20, 100894.
Donthu, N.; Kumar, S.; Mukherjee, D.; Pandey, N.; Lim, W.M. How to conduct a bibliometric analysis: An overview and guidelines. J. Bus. Res. 2021, 133, 285–296.
Tiyasha; Tung, T.M.; Yaseen, Z.M. A survey on river water quality modelling using artificial intelligence models: 2000–2020. J. Hydrol. 2020, 585, 124670.
Cojbasic, S.; Dmitrasinovic, S.; Kostic, M.; Sekulic, M.T.; Radonic, J.; Dodig, A.; Stojkovic, M. Application of machine learning in river water quality management: A review. Water Sci. Technol. 2023, 88, 2297–2308.
Kitchenham, B.; Brereton, O.P.; Budgen, D.; Turner, M.; Bailey, J.; Linkman, S. Systematic literature reviews in software engineering—A systematic literature review. Inf. Softw. Technol. 2009, 51, 7–15.
Marzi, G.; Balzano, M.; Caputo, A.; Pellegrini, M.M. Guidelines for Bibliometric-Systematic Literature Reviews: 10 steps to combine analysis, synthesis and theory development. Int. J. Manag. Rev. 2025, 27, 81–103.
Lim, W.M.; Kumar, S.; Donthu, N. How to combine and clean bibliometric data and use bibliometric tools synergistically: Guidelines using metaverse research. J. Bus. Res. 2024, 182, 114760.
Aria, M.; Cuccurullo, C. bibliometrix: An R-tool for comprehensive science mapping analysis. J. Informetr. 2017, 11, 959–975.
Mohammed, M.A.; De-Pablos-Heredero, C.; Botella, J.L.M. A Systematic Literature Review on the Revolutionary Impact of Blockchain in Modern Business. Appl. Sci. 2024, 14, 11077.
Lahami, M.; Maalej, A.J.; Krichen, M. A systematic literature review on dynamic testing of blockchain oriented software. Sci. Comput. Program. 2025, 240, 103211.
Rousso, B.Z.; Bertone, E.; Stewart, R.; Hamilton, D. A systematic literature review of forecasting and predictive models for cyanobacteria blooms in freshwater lakes. Water Res. 2020, 182, 115959.
Baas, J.; Schotten, M.; Plume, A.; Côté, G.; Karimi, R. Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies. Quant. Sci. Stud. 2020, 1, 377–386.
Baarimah, O.A.; Bazel, A.M.; Alaloul, S.W.; Alazaiza, D.Y.M.; Al-Zghoul, M.T.; Almuhaya, B.; Khan, A.; Mushtaha, W.A. Artificial intelligence in wastewater treatment: Research trends and future perspectives through bibliometric analysis. Case Stud. Chem. Environ. Eng. 2024, 10, 100926.
Gusenbauer, M. Searchsmart.org: Guiding researchers to the best databases and search systems for systematic reviews and beyond. Res. Synth. Methods 2024, 15, 1200–1213.
van Dinter, R.; Tekinerdogan, B.; Catal, C. Automation of systematic literature reviews: A systematic literature review. Inf. Softw. Technol. 2021, 136, 106589.
textmineR: Functions for Text Mining and Topic Modeling, version 3.0.5; CRAN: Vienna, Austria, 2021; Available online: https://cran.r-project.org/package=textmineR (accessed on 29 August 2025).
Chen, X.; Xie, H.; Tao, X.; Xu, L.; Wang, J.; Dai, H.; Wang, L.F. A topic modeling-based bibliometric exploration of automatic summarization research. WIREs Data Min. Knowl. Discov. 2024, 14, e1540.
Cobelli, N.; Blasi, S. Combining topic modeling and bibliometric analysis to understand the evolution of technological innovation adoption in the healthcare industry. Eur. J. Innov. Manag. 2024, 27, 127–149.
Xu, S.; Hao, L.; An, X.; Yang, G.; Wang, F. Emerging research topics detection with multiple machine learning models. J. Informetr. 2019, 13, 100983.
Kassem, A.; Sefelnasr, A.; Ebraheem, A.A.; Sherif, M. Seawater intrusion physical models: A bibliometric analysis and review of mitigation strategies. J. Hydrol. 2024, 634, 131135.
Phiri, Z.; Moja, N.T.; Nkambule, T.T.I.; de Kock, L.-A. Utilization of biochar for remediation of heavy metals in aqueous environments: A review and bibliometric analysis. Heliyon 2024, 10, e25785.
Liu, H.; Kong, F.; Yin, H.; Middel, A.; Zheng, X.; Huang, J.; Xu, H.; Wang, D.; Wen, Z. Impacts of green roofs on water, temperature, and air quality: A bibliometric review. Build. Environ. 2021, 196, 107794.
Biazatti, J.; Justi, A.C.A.; Souza, R.F.; de Carvalho Miranda, J.C. Soybean biorefinery and technological forecasts based on a bibliometric analysis and network mapping. Environ. Dev. 2024, 52, 101074.
Batagelj, V.; Cerinšek, M. On bibliographic networks. Scientometrics 2013, 96, 845–864.
Pandey, H.; Maraseni, T.N.; Apan, A.A. Enhancing systematic literature review adapting ‘double diamond’ approach. Heliyon 2024, 10, e40581.
Gusenbauer, M.; Gauster, S. How to search for literature in systematic reviews and meta-analyses: A comprehensive step-by-step guide. Technol. Forecast. Soc. Chang. 2025, 212, 123833.
Petersen, K.; Vakkalanka, S.; Kuzniarz, L. Guidelines for conducting systematic mapping studies in software engineering: An update. Inf. Softw. Technol. 2015, 64, 1–18.
Sit, M.; Demiray, B.Z.; Xiang, Z.; Ewing, G.J.; Sermet, Y.; Demir, I. A comprehensive review of deep learning applications in hydrology and water resources. Water Sci. Technol. 2020, 82, 2635–2670.
Huang, R.; Ma, C.; Ma, J.; Huangfu, X.; He, Q. Machine learning in natural and engineered water systems. Water Res. 2021, 205, 117666.
Tefera, G.W.; Ray, R.L.; Singh, V. Surface water quality under climate change scenarios in the Bosque watershed, Central Texas of United States. Ecohydrol. Hydrobiol. 2024, 25, 477–492.
Ramya, S.; Srinath, S.; Tuppad, P. Comprehensive analysis of multiple classifiers for enhanced river water quality monitoring with explainable AI. Case Stud. Chem. Environ. Eng. 2024, 10, 100822.
Sidek, L.M.; Mohiyaden, H.A.; Marufuzzaman, M.; Noh, N.S.M.; Heddam, S.; Ehteram, M.; Kisi, O.; Sammen, S.S. Developing an ensembled machine learning model for predicting water quality index in Johor River Basin. Environ. Sci. Eur. 2024, 36, 67.
Hirsch, J.E. An index to quantify an individual’s scientific research output. Proc. Natl. Acad. Sci. USA 2005, 102, 16569–16572.
Barzegar, R.; Aalami, M.T.; Adamowski, J. Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model. Stoch. Environ. Res. Risk Assess. 2020, 34, 415–433.
Barzegar, R.; Adamowski, J.; Moghaddam, A.A. Application of wavelet-artificial intelligence hybrid models for water quality prediction: A case study in Aji-Chay River, Iran. Stoch. Environ. Res. Risk Assess. 2016, 30, 1797–1819.
Ho, J.Y.; Afan, H.A.; El-Shafie, A.H.; Koting, S.B.; Mohd, N.S.; Jaafar, W.Z.B.; Hin, L.S.; Malek, M.A.; Ahmed, A.N.; Mohtar, W.H.M.W.; et al. Towards a time and cost effective approach to water quality index class prediction. J. Hydrol. 2019, 575, 148–165.
Ji, X.; Shang, X.; Dahlgren, R.A.; Zhang, M. Prediction of dissolved oxygen concentration in hypoxic river systems using support vector machine: A case study of Wen-Rui Tang River, China. Environ. Sci. Pollut. Res. 2017, 24, 16062–16076.
Fijani, E.; Barzegar, R.; Deo, R.; Tziritis, E.; Skordas, K. Design and implementation of a hybrid model based on two-layer decomposition method coupled with extreme learning machines to support real-time environmental monitoring of water quality parameters. Sci. Total Environ. 2019, 648, 839–853.
Abba, S.I.; Pham, Q.B.; Saini, G.; Linh, N.T.T.; Ahmed, A.N.; Mohajane, M.; Khaledian, M.; Abdulkadir, R.A.; Bach, Q. Implementation of data intelligence models coupled with ensemble machine learning for prediction of water quality index. Environ. Sci. Pollut. Res. 2020, 27, 41524–41539.
Noori, R.; Yeh, H.-D.; Abbasi, M.; Kachoosangi, F.T.; Moazami, S. Uncertainty analysis of support vector machine for online prediction of five-day biochemical oxygen demand. J. Hydrol. 2015, 527, 833–843.
Sakaa, B.; Elbeltagi, A.; Boudibi, S.; Chaffaï, H.; Islam, A.R.T.; Kulimushi, L.C.; Choudhari, P.; Hani, A.; Brouziyne, Y.; Wong, Y.J.; et al. Water quality index modeling using random forest and improved SMO algorithm for support vector machine in Saf-Saf river basin. Environ. Sci. Pollut. Res. 2022, 29, 48491–48508.
Chen, Z.; Xu, H.; Jiang, P.; Yu, S.; Lin, G.; Bychkov, I.; Hmelnov, A.; Ruzhnikov, G.; Zhu, N.; Liu, Z.; et al. A transfer Learning-Based LSTM strategy for imputing Large-Scale consecutive missing data and its application in a water quality prediction system. J. Hydrol. 2021, 602, 126573.
Jamshidzadeh, Z.; Ehteram, M.; Shabanian, H. Bidirectional Long Short-Term Memory (BILSTM)—Support Vector Machine: A new machine learning model for predicting water quality parameters. Ain Shams Eng. J. 2024, 15, 102510.
Chen, T.-C. Application of wavelet theory to enhance the performance of machine learning techniques in estimating water quality parameters (case study: Gao-Ping River). Water Sci. Technol. 2023, 87, 1294–1315.
Wang, K.; Liu, L.; Ben, X.; Jin, D.; Zhu, Y.; Wang, F. Hybrid deep learning based prediction for water quality of plain watershed. Environ. Res. 2024, 262, 119911.
Manzar, M.S.; Benaafi, M.; Costache, R.; Alagha, O.; Mu’azu, N.D.; Zubair, M.; Abdullahi, J.; Abba, S.I. New generation neurocomputing learning coupled with a hybrid neuro-fuzzy model for quantifying water quality index variable: A case study from Saudi Arabia. Ecol. Inform. 2022, 70, 101696.
Chen, X.; Sun, W.; Jiang, T.; Ju, H. Enhanced prediction of river dissolved oxygen through feature- and model-based transfer learning. J. Environ. Manag. 2024, 372, 123310.
Khodkar, K.; Mirchi, A.; Nourani, V.; Kaghazchi, A.; Sadler, J.M.; Mansaray, A.; Wagner, K.; Alderman, P.D.; Taghvaeian, S.; Bailey, R.T.; et al. Stream salinity prediction in data-scarce regions: Application of transfer learning and uncertainty quantification. J. Contam. Hydrol. 2024, 266, 104418.
Chen, S.; Huang, J.; Wang, P.; Tang, X.; Zhang, Z. A coupled model to improve river water quality prediction towards addressing non-stationarity and data limitation. Water Res. 2024, 248, 120895.
Peng, L.; Wu, H.; Gao, M.; Yi, H.; Xiong, Q.; Yang, L.; Cheng, S. TLT: Recurrent fine-tuning transfer learning for water quality long-term prediction. Water Res. 2022, 225, 119171.
Longo, L.; Brcic, M.; Cabitza, F.; Choi, J.; Confalonieri, R.; Del Ser, J.; Guidotti, R.; Hayashi, Y.; Herrera, F.; Holzinger, A.; et al. Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Inf. Fusion 2024, 106, 102301.
Núñez, J.; Cortés, C.B.; Yáñez, M.A. Explainable Artificial Intelligence in Hydrology: Interpreting Black-Box Snowmelt-Driven Streamflow Predictions in an Arid Andean Basin of North-Central Chile. Water 2023, 15, 3369.
Madni, H.A.; Umer, M.; Ishaq, A.; Abuzinadah, N.; Saidani, O.; Alsubai, S.; Hamdi, M.; Ashraf, I. Water-Quality Prediction Based on H2O AutoML and Explainable AI Techniques. Water 2023, 15, 475.
Alshehri, F.; Rahman, A. Coupling Machine and Deep Learning with Explainable Artificial Intelligence for Improving Prediction of Groundwater Quality and Decision-Making in Arid Region, Saudi Arabia. Water 2023, 15, 2298.
Haggerty, R.; Sun, J.; Yu, H.; Li, Y. Application of machine learning in groundwater quality modeling—A comprehensive review. Water Res. 2023, 233, 119745.
Ibrahim, H.; Yaseen, Z.M.; Scholz, M.; Ali, M.; Gad, M.; Elsayed, S.; Khadr, M.; Hussein, H.; Ibrahim, H.H.; Eid, M.H.; et al. Evaluation and Prediction of Groundwater Quality for Irrigation Using an Integrated Water Quality Indices, Machine Learning Models and GIS Approaches: A Representative Case Study. Water 2023, 15, 694.
Rahat, S.H.; Steissberg, T.; Chang, W.; Chen, X.; Mandavya, G.; Tracy, J.; Wasti, A.; Atreya, G.; Saki, S.; Bhuiyan, M.A.E.; et al. Remote sensing-enabled machine learning for river water quality modeling under multidimensional uncertainty. Sci. Total Environ. 2023, 898, 165504.
Huangfu, K.; Li, J.; Zhang, X.; Zhang, J.; Cui, H.; Sun, Q. Remote Estimation of Water Quality Parameters of Medium- and Small-Sized Inland Rivers Using Sentinel-2 Imagery. Water 2020, 12, 3124.
O’Grady, J.; Zhang, D.; O’Connor, N.; Regan, F. A comprehensive review of catchment water quality monitoring using a tiered framework of integrated sensing technologies. Sci. Total Environ. 2021, 765, 142766.
Bai, Y.; Peng, M.; Wang, M. A River Water Quality Prediction Method Based on Dual Signal Decomposition and Deep Learning. Water 2024, 16, 3099.
Ahmed, N.; Othman, F.B.; Afan, H.A.; Ibrahim, R.K.; Fai, C.M.; Hossain, M.S.; Ehteram, M.; Elshafie, A. Machine learning methods for better water quality prediction. J. Hydrol. 2019, 578, 124084.
Zhang, M.; Zhang, Z.; Wang, X.; Liao, Z.; Wang, L. The Use of Attention-Enhanced CNN-LSTM Models for Multi-Indicator and Time-Series Predictions of Surface Water Quality. Water Resour. Manag. 2024, 38, 6103–6119.
Sadler, J.M.; Koenig, L.E.; Gorski, G.; Carter, A.M.; Hall, R.O., Jr. Evaluating a process-guided deep learning approach for predicting dissolved oxygen in streams. Hydrol. Process. 2024, 38, e15270.
Rajagopal, S.; Ganesh, S.S.; Karthick, A.; Sampradeepraj, T. Environmental water quality prediction based on COOT-CSO-LSTM deep learning. Environ. Sci. Pollut. Res. 2024, 31, 54525–54533.
Poursaeid, M.; Poursaeed, A.H.; Shabanlou, S. Water quality fluctuations prediction and Debi estimation based on stochastic optimized weighted ensemble learning machine. Process Saf. Environ. Prot. 2024, 188, 1160–1174.
Vellingiri, J.; Kalaivanan, K.; Shanmugaiah, K.; Bai, F.J.J.S. AO-SVM: A machine learning model for predicting water quality in the cauvery river. Environ. Res. Commun. 2024, 6, 075025.
Poluru, R.K.; Sundararajan, S.; Balakrishnan, S.; Rajagopal, M. Predicting nitrous oxide contaminants in Cauvery basin using region-based convolutional neural network. Groundw. Sustain. Dev. 2024, 26, 101194. [Google Scholar] [CrossRef]
Lin, Z.; Lim, J.Y.; Oh, J.-M. Innovative interpretable AI-guided water quality evaluation with risk adversarial analysis in river streams considering spatial-temporal effects. Environ. Pollut. 2024, 350, 124015. [Google Scholar] [CrossRef]
Liu, W.; Lin, S.; Li, X.; Li, W.; Deng, H.; Fang, H.; Li, W. Analysis of dissolved oxygen influencing factors and concentration prediction using input variable selection technique: A hybrid machine learning approach. J. Environ. Manag. 2024, 357, 120777. [Google Scholar] [CrossRef]
Saha, G.; Shen, C.; Duncan, J.; Cibin, R. Performance evaluation of deep learning based stream nitrate concentration prediction model to fill stream nitrate data gaps at low-frequency nitrate monitoring basins. J. Environ. Manag. 2024, 357, 120721.
Singh, R.B.; Patra, K.C.; Pradhan, B.; Samantra, A. HDTO-DeepAR: A novel hybrid approach to forecast surface water quality indicators. J. Environ. Manag. 2024, 352, 120091.
Hu, Y.; Liu, C.; Wollheim, W.M. Prediction of riverine daily minimum dissolved oxygen concentrations using hybrid deep learning and routine hydrometeorological data. Sci. Total Environ. 2024, 918, 170383.
Wang, Z.; Wang, Q.; Liu, Z.; Wu, T. A deep learning interpretable model for river dissolved oxygen multi-step and interval prediction based on multi-source data fusion. J. Hydrol. 2024, 629, 130637.
Al-Mukhtar, M.; Srivastava, A.; Khadke, L.; Al-Musawi, T.; Elbeltagi, A. Prediction of Irrigation Water Quality Indices Using Random Committee, Discretization Regression, REPTree, and Additive Regression. Water Resour. Manag. 2024, 38, 343–368.
E, B.; Zhang, S.; Driscoll, C.T.; Wen, T. Human and natural impacts on the U.S. freshwater salinization and alkalinization: A machine learning approach. Sci. Total Environ. 2023, 889, 164138.
Luo, L.; Zhang, Y.; Dong, W.; Zhang, J.; Zhang, L. Ensemble Empirical Mode Decomposition and a Long Short-Term Memory Neural Network for Surface Water Quality Prediction of the Xiaofu River, China. Water 2023, 15, 1625.
Xu, R.; Wu, W.; Cai, Y.; Wan, H.; Li, J.; Zhu, Q.; Shen, S. Feature Extraction and Prediction of Water Quality Based on Candlestick Theory and Deep Learning Methods. Water 2023, 15, 845.
Im, Y.; Song, G.; Lee, J.; Cho, M. Deep Learning Methods for Predicting Tap-Water Quality Time Series in South Korea. Water 2022, 14, 3766.
Hoque, J.M.Z.; Aziz, N.A.A.; Alelyani, S.; Mohana, M.; Hosain, M. Improving Water Quality Index Prediction Using Regression Learning Models. Int. J. Environ. Res. Public Health 2022, 19, 13702.
Dorado-Guerra, D.Y.; Corzo-Pérez, G.; Paredes-Arquiola, J.; Pérez-Martín, M.Á. Machine learning models to predict nitrate concentration in a river basin. Environ. Res. Commun. 2023, 4, 125012.
Adedeji, C.; Ahmadisharaf, E.; Sun, Y. Predicting in-stream water quality constituents at the watershed scale using machine learning. J. Contam. Hydrol. 2022, 251, 104078.
Khosravi, K.; Golkarian, A.; Melesse, A.M.; Deo, R.C. Suspended sediment load modeling using advanced hybrid rotation forest based elastic network approach. J. Hydrol. 2022, 610, 127963.
Song, C.; Yao, L. A hybrid model for water quality parameter prediction based on CEEMDAN-IALO-LSTM ensemble learning. Environ. Earth Sci. 2022, 81, 262.
Hou, Y.; Zhang, A.; Lv, R.; Zhao, S.; Ma, J.; Zhang, H.; Li, Z. A study on water quality parameters estimation for urban rivers based on ground hyperspectral remote sensing technology. Environ. Sci. Pollut. Res. 2022, 29, 63640–63654.
Balson, T.; Ward, A.S. A machine learning approach to water quality forecasts and sensor network expansion: Case study in the Wabash River Basin, United States. Hydrol. Process. 2022, 36, e14619.
Malek, H.A.; Yaacob, W.F.W.; Nasir, S.A.M.; Shaadan, N. Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques. Water 2022, 14, 1067.
Rizal, N.N.M.; Hayder, G.; Yusof, K.A. Water Quality Predictive Analytics Using an Artificial Neural Network with a Graphical User Interface. Water 2022, 14, 1221.
Weierbach, H.; Lima, A.R.; Willard, J.D.; Hendrix, V.C.; Christianson, D.S.; Lubich, M.; Varadharajan, C. Stream Temperature Predictions for River Basin Management in the Pacific Northwest and Mid-Atlantic Regions Using Machine Learning. Water 2022, 14, 1032.
del Castillo, A.; Yebra-Montes, C.; Garibay, M.V.; de Anda, J.; Garcia-Gonzalez, A.; Gradilla-Hernández, M.S. Simple Prediction of an Ecosystem-Specific Water Quality Index and the Water Quality Classification of a Highly Polluted River through Supervised Machine Learning. Water 2022, 14, 1235.
Ilić, M.; Srdjević, Z.; Srdjević, B. Water quality prediction based on Naïve Bayes algorithm. Water Sci. Technol. 2022, 85, 1027–1039.
Arora, S.; Keshari, A.K. Dissolved oxygen modelling of the Yamuna River using different ANFIS models. Water Sci. Technol. 2021, 84, 3359–3371.
Moghadam, S.V.; Sharafati, A.; Feizi, H.; Marjaie, S.M.S.; Asadollah, S.B.H.S.; Motta, D. An efficient strategy for predicting river dissolved oxygen concentration: Application of deep recurrent neural network model. Environ. Monit. Assess. 2021, 193, 798.
Xu, C.; Chen, X.; Zhang, L. Predicting river dissolved oxygen time series based on stand-alone models and hybrid wavelet-based models. J. Environ. Manag. 2021, 295, 113085.
Najah, A.; Teo, F.Y.; Chow, M.F.; Huang, Y.F.; Latif, S.D.; Abdullah, S.; Ismail, M.; El-Shafie, A. Surface water quality status and prediction during movement control operation order under COVID-19 pandemic: Case studies in Malaysia. Int. J. Environ. Sci. Technol. 2021, 18, 1009–1018.
Kim, S.; Maleki, N.; Rezaie-Balf, M.; Singh, V.P.; Alizamir, M.; Kim, N.W.; Lee, J.T.; Kisi, O. Assessment of the total organic carbon employing the different nature-inspired approaches in the Nakdong River, South Korea. Environ. Monit. Assess. 2021, 193, 445.
Yan, J.; Liu, J.; Yu, Y.; Xu, H. Water Quality Prediction in the Luan River Based on 1-DRCNN and BiGRU Hybrid Neural Network Model. Water 2021, 13, 1273.
Setshedi, J.; Mutingwende, N.; Ngqwala, N. The Use of Artificial Neural Networks to Predict the Physicochemical Characteristics of Water Quality in Three District Municipalities, Eastern Cape Province, South Africa. Int. J. Environ. Res. Public Health 2021, 18, 5248.
Abba, S.I.; Abdulkadir, R.A.; Sammen, S.S.; Usman, A.G.; Meshram, S.G.; Malik, A.; Shahid, S. Comparative implementation between neuro-emotional genetic algorithm and novel ensemble computing techniques for modelling dissolved oxygen concentration. Hydrol. Sci. J. 2021, 66, 1584–1596.
Sha, J.; Li, X.; Zhang, M.; Wang, Z.-L. Comparison of Forecasting Models for Real-Time Monitoring of Water Quality Parameters Based on Hybrid Deep Learning Neural Networks. Water 2021, 13, 1547.
Zhang, Y.-F.; Fitch, P.; Thorburn, J. Predicting the Trend of Dissolved Oxygen Based on the kPCA-RNN Model. Water 2020, 12, 585.
Baek, S.-S.; Pyo, J.; Chun, J.A. Prediction of Water Level and Water Quality Using a CNN-LSTM Combined Deep Learning Approach. Water 2020, 12, 3399.
Jamei, M.; Ahmadianfar, I.; Chu, X.; Yaseen, Z.M. Prediction of surface water total dissolved solids using hybridized wavelet-multigene genetic programming: New approach. J. Hydrol. 2020, 589, 125335.
Bui, D.T.; Khosravi, K.; Tiefenbacher, J.; Nguyen, H.; Kazakis, N. Improving prediction of water quality indices using novel hybrid machine-learning algorithms. Sci. Total Environ. 2020, 721, 137612.
Chen, S.; Fang, G.; Huang, X.; Zhang, Y. Water Quality Prediction Model of a Water Diversion Project Based on the Improved Artificial Bee Colony–Backpropagation Neural Network. Water 2018, 10, 806.
Raheli, B.; Aalami, M.T.; El-Shafie, A.; Ghorbani, M.A.; Deo, R.C. Uncertainty assessment of the multilayer perceptron (MLP) neural network model with implementation of the novel hybrid MLP-FFA method for prediction of biochemical oxygen demand and dissolved oxygen: A case study of Langat River. Environ. Earth Sci. 2017, 76, 503.
Stoica, C.; Camejo, J.; Banciu, A.; Nita-Lazar, M.; Paun, I.; Cristofor, S.; Pacheco, O.R.; Guevara, M. Water quality of Danube Delta systems: Ecological status and prediction using machine-learning algorithms. Water Sci. Technol. 2016, 73, 2413–2421.
Zhang, Q.; You, X. Recent Advances in Surface Water Quality Prediction Using Artificial Intelligence Models. Water Resour. Manag. 2024, 38, 235–250.
Gómez-Escalonilla, V.; Montero-González, E.; Díaz-Alcaide, S.; Martín-Loeches, M.; del Rosario, M.R.; Martínez-Santos, P. A machine learning approach to site groundwater contamination monitoring wells. Appl. Water Sci. 2024, 14, 250.
Zhang, J.; Xiao, C.; Yang, W.; Liang, X.; Zhang, L.; Wang, X.; Dai, R. Improving prediction of groundwater quality in situations of limited monitoring data based on virtual sample generation and Gaussian process regression. Water Res. 2024, 267, 122498.
Jeong, H.; Abbas, A.; Kim, H.G.; Hoan, H.V.; Tuan, P.V.; Long, P.T.; Lee, E.; Cho, K.H. Spatial prediction of groundwater salinity in multiple aquifers of the Mekong Delta region using explainable machine learning models. Water Res. 2024, 266, 122404.
Li, X.; Liang, G.; Wang, L.; Yang, Y.; Li, Y.; Li, Z.; He, B.; Wang, G. Identifying the spatial pattern and driving factors of nitrate in groundwater using a novel framework of interpretable stacking ensemble learning. Environ. Geochem. Health 2024, 46, 482.
Boufekane, A.; Meddi, M.; Maizi, D.; Busico, G. Performance of artificial intelligence model (LSTM model) for estimating and predicting water quality index for irrigation purposes in order to improve agricultural production. Environ. Monit. Assess. 2024, 196, 1049.
Lal, A.; Sharan, A.; Sharma, K.; Ram, A.; Roy, D.K.; Datta, B. Scrutinizing different predictive modeling validation methodologies and data-partitioning strategies: New insights using groundwater modeling case study. Environ. Monit. Assess. 2024, 196, 623.
Khan, I.; Ayaz, M. Sensitivity analysis-driven machine learning approach for groundwater quality prediction: Insights from integrating ENTROPY and CRITIC methods. Groundw. Sustain. Dev. 2024, 26, 101309.
Cao, W.; Zhang, Z.; Fu, Y.; Zhao, L.; Ren, Y.; Nan, T.; Guo, H. Prediction of arsenic and fluoride in groundwater of the North China Plain using enhanced stacking ensemble learning. Water Res. 2024, 259, 121848.
Das, R.; Das, S. Coastal groundwater quality prediction using objective-weighted WQI and machine learning approach. Environ. Sci. Pollut. Res. 2024, 31, 1943–19457.
Chatterjee, T.; Gogoi, U.R.; Samanta, A.; Chatterjee, A.; Singh, M.K.; Pasupuleti, S. Identifying the Most Discriminative Parameter for Water Quality Prediction Using Machine Learning Algorithms. Water 2024, 16, 481.
Tesoriero, J.; Wherry, S.A.; Dupuy, D.I.; Johnson, T.D. Predicting Redox Conditions in Groundwater at a National Scale Using Random Forest Classification. Environ. Sci. Technol. 2024, 58, 5079–5092.
Elzain, H.E.; Abdalla, O.; Ahmed, H.A.; Kacimov, A.; Al-Maktoumi, A.; Al-Higgi, K.; Abdallah, M.; Yassin, M.A.; Senapathi, V. An innovative approach for predicting groundwater TDS using optimized ensemble machine learning algorithms at two levels of modeling strategy. J. Environ. Manag. 2024, 351, 119896.
Iqbal, J.; Su, C.; Ahmad, M.; Baloch, M.Y.J.; Rashid, A.; Ullah, Z.; Abbas, H.; Nigar, A.; Ali, A.; Ullah, A.; et al. Hydrogeochemistry and prediction of arsenic contamination in groundwater of Vehari, Pakistan: Comparison of artificial neural network, random forest and logistic regression models. Environ. Geochem. Health 2023, 46, 14.
Krishnamoorthy, L.; Lakshmanan, V.R. Groundwater quality assessment using machine learning models: A comprehensive study on the industrial corridor of a semi-arid region. Environ. Sci. Pollut. Res. 2024.
Mahboobi, H.; Shakiba, A.; Mirbagheri, B. Improving groundwater nitrate concentration prediction using local ensemble of machine learning models. J. Environ. Manag. 2023, 345, 118782.
Sajib, M.; Diganta, M.T.M.; Rahman, A.; Dabrowski, T.; Olbert, A.I.; Uddin, M.G. Developing a novel tool for assessing the groundwater incorporating water quality index and machine learning approach. Groundw. Sustain. Dev. 2023, 23, 101049.
Masoudi, R.; Mousavi, S.R.; Rahimabadi, D.; Panahi, M.; Rahmani, A. Assessing data mining algorithms to predict the quality of groundwater resources for determining irrigation hazard. Environ. Monit. Assess. 2023, 195, 319.
Liu, C.; Xu, M.; Liu, Y.; Li, X.; Pang, Z.; Miao, S. Predicting Groundwater Indicator Concentration Based on Long Short-Term Memory Neural Network: A Case Study. Int. J. Environ. Res. Public Health 2022, 19, 15612.
Huynh, T.-M.-T.; Ni, C.-F.; Su, Y.-S.; Nguyen, V.-C.-N.; Lee, I.-H.; Lin, C.-P.; Nguyen, H.-H. Predicting Heavy Metal Concentrations in Shallow Aquifer Systems Based on Low-Cost Physiochemical Parameters Using Machine Learning Techniques. Int. J. Environ. Res. Public Health 2022, 19, 12180.
Taşan, M.; Taşan, S.; Demir, Y. Estimation and uncertainty analysis of groundwater quality parameters in a coastal aquifer under seawater intrusion: A comparative study of deep learning and classic machine learning methods. Environ. Sci. Pollut. Res. 2023, 30, 2866–2890.
Banerjee, K.; Bali, V.; Nawaz, N.; Bali, S.; Mathur, S.; Mishra, R.K.; Rani, S. A Machine-Learning Approach for Prediction of Water Contamination Using Latitude, Longitude, and Elevation. Water 2022, 14, 728.
Kouadri, S.; Pande, C.B.; Panneerselvam, B.; Moharir, K.N.; Elbeltagi, A. Prediction of irrigation groundwater quality parameters using ANN, LSTM, and MLR models. Environ. Sci. Pollut. Res. 2022, 29, 21067–21091.
Messier, K.; Wheeler, D.C.; Flory, A.R.; Jones, R.R.; Patel, D.; Nolan, B.T.; Ward, M.H. Modeling groundwater nitrate exposure in private wells of North Carolina for the Agricultural Health Study. Sci. Total Environ. 2019, 655, 512–519.
Sakizadeh, M. Spatial analysis of total dissolved solids in Dezful Aquifer: Comparison between universal and fixed rank kriging. J. Contam. Hydrol. 2019, 221, 26–34.
Abbas, F.; Cai, Z.; Shoaib, M.; Iqbal, J.; Ismail, M.; Arifullah; Alrefaei, A.F.; Albeshr, M.F. Machine Learning Models for Water Quality Prediction: A Comprehensive Analysis and Uncertainty Assessment in Mirpurkhas, Sindh, Pakistan. Water 2024, 16, 941.
Zounemat-Kermani, M.; Batelaan, O.; Fadaee, M.; Hinkelmann, R. Ensemble machine learning paradigms in hydrology: A review. J. Hydrol. 2021, 598, 126266.
Aldrees, A.; Awan, H.H.; Javed, M.F.; Mohamed, A.M. Prediction of water quality indexes with ensemble learners: Bagging and boosting. Process Saf. Environ. Prot. 2022, 168, 344–361.
Chen, Y.; Yao, K.; Zhu, B.; Gao, Z.; Xu, J.; Li, Y.; Hu, Y.; Lin, F.; Zhang, X. Water Quality Inversion of a Typical Rural Small River in Southeastern China Based on UAV Multispectral Imagery: A Comparison of Multiple Machine Learning Algorithms. Water 2024, 16, 553.
Najah, A.; El-Shafie, A.; Karim, O.A.; El-Shafie, A.H. Application of artificial neural networks for water quality prediction. Neural Comput. Appl. 2013, 22, 187–201.
Gulati, S.; Pal, A. Tuning fuzzy logic controller with SGWO for river water quality modelling. Mater. Today Proc. 2022, 54, 733–737.
Dilipkumar, J.; Shanmugam, P. Fuzzy-based global water quality assessment and water quality cells identification using satellite data. Mar. Pollut. Bull. 2023, 193, 115148.
Kalyakulina, A.; Yusipov, I.; Moskalev, A.; Franceschi, C.; Ivanchenko, M. eXplainable Artificial Intelligence (XAI) in aging clock models. Ageing Res. Rev. 2024, 93, 102144.
Nallakaruppan, K.; Gangadevi, E.; Shri, M.L.; Balusamy, B.; Bhattacharya, S.; Selvarajan, S. Reliable water quality prediction and parametric analysis using explainable AI models. Sci. Rep. 2024, 14, 7520.
Kundu, S.; Datta, P.; Pal, P.; Ghosh, K.; Das, A.; Das, B.K. Unveiling the hidden connections: Using explainable artificial intelligence to assess water quality criteria in nine giant rivers. J. Clean. Prod. 2025, 492, 144861.
Nong, X.; Lai, C.; Chen, L.; Wei, J. A novel coupling interpretable machine learning framework for water quality prediction and environmental effect understanding in different flow discharge regulations of hydro-projects. Sci. Total Environ. 2024, 950, 175281.
Juna, A.; Umer, M.; Sadiq, S.; Karamti, H.; Eshmawi, A.A.; Mohamed, A.; Ashraf, I. Water Quality Prediction Using KNN Imputer and Multilayer Perceptron. Water 2022, 14, 2592.
Ahmed, U.; Mumtaz, R.; Anwar, H.; Shah, A.A.; Irfan, R.; García-Nieto, J. Efficient Water Quality Prediction Using Supervised Machine Learning. Water 2019, 11, 2210.
Pany, R.; Rath, A.; Swain, C. Water quality assessment for River Mahanadi of Odisha, India using statistical techniques and Artificial Neural Networks. J. Clean. Prod. 2023, 417, 137713.
Yang, S.; Luo, D.; Tan, J.; Li, S.; Song, X.; Xiong, R.; Wang, J.; Ma, C.; Xiong, H. Spatial Mapping and Prediction of Groundwater Quality Using Ensemble Learning Models and SHapley Additive exPlanations with Spatial Uncertainty Analysis. Water 2024, 16, 2375.
Heydari, S.; Nikoo, M.R.; Mohammadi, A.; Barzegar, R. Two-stage meta-ensembling machine learning model for enhanced water quality forecasting. J. Hydrol. 2024, 641, 131767.
Rodríguez-López, L.; Usta, D.B.; Duran-Llacer, I.; Alvarez, L.B.; Yépez, S.; Bourrel, L.; Frappart, F.; Urrutia, R. Estimation of Water Quality Parameters through a Combination of Deep Learning and Remote Sensing Techniques in a Lake in Southern Chile. Remote Sens. 2023, 15, 4157.
Zhang, T.; Wu, J.; Chu, H.; Liu, J.; Wang, G. Interpretable Machine Learning Based Quantification of the Impact of Water Quality Indicators on Groundwater Under Multiple Pollution Sources. Water 2025, 17, 905.
Yao, Z.; Wang, Z.; Huang, J.; Xu, N.; Cui, X.; Wu, T. Interpretable prediction, classification and regulation of water quality: A case study of Poyang Lake, China. Sci. Total Environ. 2024, 951, 175407.
Liang, Y.; Ding, F.; Liu, L.; Yin, F.; Hao, M.; Kang, T.; Zhao, C.; Wang, Z.; Jiang, D. Monitoring water quality parameters in urban rivers using multi-source data and machine learning approach. J. Hydrol. 2025, 648, 132394.
Shadkani, S.; Hemmatzadeh, Y.; Saber, A.; Sergini, M.M. Enhanced predictive modeling of dissolved oxygen concentrations in riverine systems using novel hybrid temporal pattern attention deep neural networks. Environ. Res. 2024, 263, 120015.
Abuzir, S.Y.; Abuzir, Y.S. Machine learning for water quality classification. Water Qual. Res. J. 2022, 57, 152–164.
Zhang, Q.; Li, Z.; Zhu, L.; Zhang, F.; Sekerinski, E.; Han, J.C.; Zhou, Y. Real-time prediction of river chloride concentration using ensemble learning. Environ. Pollut. 2021, 291, 118116.
Fertikh, S.; Boutaghane, H.; Boumaaza, M.; Belaadi, A.; Bouslah, S. Assessment and prediction of water quality indices by machine learning-genetic algorithm and response surface methodology. Model. Earth Syst. Environ. 2024, 10, 5573–5604.
Fooladi, M.; Nikoo, M.R.; Mirghafari, R.; Madramootoo, C.A.; Al-Rawas, G.; Nazari, R. Robust clustering-based hybrid technique enabling reliable reservoir water quality prediction with uncertainty quantification and spatial analysis. J. Environ. Manag. 2024, 362, 121259.
Ghashghaie, M.; Eslami, H.; Ostad-Ali-Askari, K. Applications of time series analysis to investigate components of Madiyan-rood river water quality. Appl. Water Sci. 2022, 12, 202.
Bojer, K.; Biru, B.H.; Al-Quraishi, A.M.F.; Debelee, T.G.; Negera, W.G.; Woldesillasie, F.F.; Esubalew, S.Z. Machine learning and remote sensing based time series analysis for drought risk prediction in Borena Zone, Southwest Ethiopia. J. Arid Environ. 2024, 222, 105160.
Huan, S. A novel interval decomposition correlation particle swarm optimization-extreme learning machine model for short-term and long-term water quality prediction. J. Hydrol. 2023, 625, 130034.
Yadav, A.; Raj, A.; Yadav, B. Enhancing local-scale groundwater quality predictions using advanced machine learning approaches. J. Environ. Manag. 2024, 370, 122903.
Hasani, S.S.; Arias, M.E.; Nguyen, H.Q.; Tarabih, O.M.; Welch, Z.; Zhang, Q. Leveraging explainable machine learning for enhanced management of lake water quality. J. Environ. Manag. 2024, 370, 122890.
Ezzat, D.; Soliman, M.; Ahmed, E.; Hassanien, A.E. An optimized explainable artificial intelligence approach for sustainable clean water. Environ. Dev. Sustain. 2024, 26, 25899–25919.
Maroufpoor, S.; Jalali, M.; Nikmehr, S.; Shiri, N.; Shiri, J.; Maroufpoor, E. Modeling groundwater quality by using hybrid intelligent and geostatistical methods. Environ. Sci. Pollut. Res. 2020, 27, 28183–28197.
Shah, M.I.; Javed, M.F.; Alqahtani, A.; Aldrees, A. Environmental assessment based surface water quality prediction using hyper-parameter optimized machine learning models based on consistent big data. Process Saf. Environ. Prot. 2021, 151, 324–340.
Moayedi, H.; Salari, M.; Ali, S.A.-J.; Dehrashid, A.A.; Azadi, H. Modeling the total hardness (TH) of groundwater in aquifers using novel hybrid soft computing optimizer models. Environ. Earth Sci. 2024, 83, 392.
Kaya, Y. Slope-aware and self-adaptive forecasting of water levels: A transparent model for the Great Lakes under climate variability. J. Hydrol. 2025, 662, 133948.
Egbemhenghe, U.; Ojeyemi, T.; Iwuozor, K.O.; Emenike, E.C.; Ogunsanya, T.I.; Anidiobi, S.U.; Adeniyi, A.G. Revolutionizing water treatment, conservation, and management: Harnessing the power of AI-driven ChatGPT solutions. Environ. Chall. 2023, 13, 100782.
Jiao, J.; Ma, Q.; Huang, S.; Liu, F.; Wan, Z. A hybrid water quality prediction model based on variational mode decomposition and bidirectional gated recursive unit. Water Sci. Technol. 2024, 89, 2273–2289.
Makumbura, R.K.; Mampitiya, L.; Rathnayake, N.; Meddage, D.P.P.; Henna, S.; Dang, T.L.; Hoshino, Y.; Rathnayake, U. Advancing water quality assessment and prediction using machine learning models, coupled with explainable artificial intelligence (XAI) techniques like shapley additive explanations (SHAP) for interpreting the black-box nature. Results Eng. 2024, 23, 102831.
Bordbar, M.; Busico, G.; Sirna, M.; Tedesco, D.; Mastrocicco, M. A multi-step approach to evaluate the sustainable use of groundwater resources for human consumption and agriculture. J. Environ. Manag. 2023, 347, 119041.
Lee, J.M.; Ko, K.-S.; Yoo, K. A machine learning-based approach to predict groundwater nitrate susceptibility using field measurements and hydrogeological variables in the Nonsan Stream Watershed, South Korea. Appl. Water Sci. 2023, 13, 242.
Zheng, H.; Liu, Y.; Wan, W.; Zhao, J.; Xie, G. Large-scale prediction of stream water quality using an interpretable deep learning approach. J. Environ. Manag. 2023, 331, 117309.
Zhang, Z.; Huang, J.; Duan, S.; Huang, Y.; Cai, J.; Bian, J. Use of interpretable machine learning to identify the factors influencing the nonlinear linkage between land use and river water quality in the Chesapeake Bay watershed. Ecol. Indic. 2022, 140, 108977.

Figure 1. Stages of the B-SLR methodology. Stage 1: data collection: defining keywords, search terms, applying inclusion/exclusion criteria, and filtering irrelevant results using topic modeling. Stage 2: bibliometric analysis: identifying trends, thematic clusters, and performance metrics. Stage 3: systematic review: extracting and synthesizing the evidence to answer the research questions.

Figure 2. Flowchart of the Topic Modeling Process.

Figure 3. Annual publication output on freshwater quality prediction using AI/ML/DL.

Figure 4. Co-occurrence network of authors’ keywords.
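In a keyword co-occurrence network such as the one in Figure 4, two author keywords are typically linked when they appear together in the same article, and the link weight grows with the number of such articles. The sketch below illustrates only that counting step in plain Python; the input format and the example keywords are hypothetical, and bibliometric software usually adds normalization and clustering steps that are not shown here.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(documents):
    """Count how often each pair of author keywords appears in the same document.

    `documents` is a list of keyword lists (one list per article). Keywords are
    lower-cased and de-duplicated within a document, so each pair is counted at
    most once per article. Pairs are stored in alphabetical order.
    """
    pairs = Counter()
    for keywords in documents:
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical keyword lists, for illustration only.
docs = [
    ["Machine learning", "Water quality", "Random forest"],
    ["Deep learning", "LSTM", "Water quality"],
    ["Machine learning", "Water quality", "Groundwater"],
]
edges = keyword_cooccurrence(docs)
print(edges[("machine learning", "water quality")])  # 2
```

Each counted pair becomes an edge whose weight is the count; sorting the keywords makes the pair key independent of the order in which authors listed them.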

Figure 5. Word cloud of water quality prediction using AI/ML/DL. Word size reflects frequency; color is used only to distinguish words within the group.

Figure 6. Thematic map based on authors’ keywords. Colors represent the clusters of related terms that form each thematic group, and the size of each circle indicates the frequency of occurrence, or density of publications, related to that group.

Figure 7. Thematic evolution of AI/ML/DL applications in water quality prediction.

Figure 8. Collaboration network among authors’ countries. Line thickness represents the intensity of collaboration between two countries, and colors represent the dominant clusters of regional or thematic collaboration.

Figure 9. Classification of publications on water quality prediction according to (a) approach: research article versus review; (b) water body: groundwater versus surface water; (c) surface water type: river, lake, reservoir.

Figure 10. ML/DL models most used in water quality prediction: (a) Classification; (b) Percentage of applicability.

Table 1. Research questions defined in this study.

Research Questions	Justification
RQ1. What are the most commonly used AI/ML/DL algorithms for predicting water quality?	To establish a general overview of the research topic.
RQ2. Which AI/ML/DL algorithm provides the most accurate estimation of water quality?	To identify knowledge gaps in AI/ML/DL prediction models.
RQ3. What limitations have been identified in water quality prediction using AI/ML/DL techniques?	To uncover potential research opportunities and future work.
RQ4. What emerging variants currently exist in AI/ML/DL models for estimating water quality?	To identify current trends in AI/ML/DL techniques for water quality prediction.
RQ5. What are the key water quality indicators used to assess natural water sources?	To review and understand the factors that determine water quality.

Table 2. Search strategy applied to the Scopus database.

Search Strategy	Total Documents
Search chain: “water” AND “quality” AND “prediction” AND “machine” AND “learning” OR “water” AND “quality” AND “prediction” AND “artificial” AND “intelligence” OR “water” AND “quality” AND “prediction” AND “deep” AND “learning”	3157
Research code line: (TITLE-ABS-KEY (water AND quality AND prediction AND machine AND learning) OR TITLE-ABS-KEY (water AND quality AND prediction AND artificial AND intelligence) OR TITLE-ABS-KEY (water AND quality AND prediction AND deep AND learning)) AND PUBYEAR > 1999 AND PUBYEAR < 2025 AND (LIMIT-TO (DOCTYPE, “re”) OR LIMIT-TO (DOCTYPE, “ar”)) AND (LIMIT-TO (LANGUAGE, “English”)) AND (LIMIT-TO (SUBJAREA, “ENVI”) OR LIMIT-TO (SUBJAREA, “ENGI”) OR LIMIT-TO (SUBJAREA, “EART”) OR LIMIT-TO (SUBJAREA, “MULT”))	1822
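The Boolean logic of the search chain in Table 2 can be made explicit: within each TITLE-ABS-KEY group every term must occur (AND), and a record is retrieved if any of the three groups matches (OR). The following is a minimal sketch that approximates this on title, abstract and keyword text with exact, case-insensitive word matching; the function name is hypothetical, and no stemming or word-variant handling is attempted, so results may differ from the database's own matching.

```python
import re

# Term groups taken from the search chain in Table 2. Within a group every
# term must occur (AND); a record qualifies if any group matches (OR).
TERM_GROUPS = [
    ["water", "quality", "prediction", "machine", "learning"],
    ["water", "quality", "prediction", "artificial", "intelligence"],
    ["water", "quality", "prediction", "deep", "learning"],
]


def matches_search_chain(title, abstract, keywords):
    """Approximate a TITLE-ABS-KEY search with exact, case-insensitive words.

    Title, abstract and keywords are pooled into one text, so the terms of a
    group may be spread across the three fields. No stemming is applied.
    """
    text = " ".join([title, abstract, " ".join(keywords)]).lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    return any(all(term in words for term in group) for group in TERM_GROUPS)


print(matches_search_chain("Water quality prediction using deep learning", "", []))  # True
print(matches_search_chain("Water quality monitoring with machine learning", "", []))  # False
```

The second title fails because "prediction" is absent, which is why the query returns a narrower set than a generic "water quality" plus "machine learning" search would.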

Table 3. Inclusion and exclusion criteria used in the B-SLR methodology.

Inclusion Criteria

Publications classified as “Research Article” or “Review”.

Publications written in English, to ensure accessibility and comprehension.

Publications within the subject areas of Environmental Science, Engineering, Earth and Planetary Sciences, and Multidisciplinary.

Articles containing the keywords specified in the search string in their title, abstract, or keywords.

Exclusion Criteria

Conference proceedings, books, book chapters, theses, and reports.

Publications that have not undergone formal peer review, such as preprints, unpublished reports, or unreviewed gray literature.

Research whose primary focus is not the application of AI/ML/DL to predicting freshwater quality (surface water and groundwater).
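The metadata-based criteria in Table 3 (document type, language, subject area), together with the publication-year window from the Table 2 query, can be checked mechanically, whereas the thematic exclusion criterion requires screening content; in this study the corpus was further filtered with Topic Modeling, which the sketch does not reproduce. The Python sketch below covers only the mechanical part; the `Record` fields, the helper `passes_metadata_criteria`, and the sample records are hypothetical, while the document-type and subject-area codes mirror those in the Table 2 query.

```python
from dataclasses import dataclass, field

ALLOWED_TYPES = {"ar", "re"}                      # Scopus codes: article, review
ALLOWED_AREAS = {"ENVI", "ENGI", "EART", "MULT"}  # subject-area codes from Table 2


@dataclass
class Record:
    """Hypothetical bibliographic record with only the fields the criteria need."""
    title: str
    year: int
    doc_type: str
    language: str
    subject_areas: set = field(default_factory=set)


def passes_metadata_criteria(rec: Record) -> bool:
    """True if a record meets the year window and metadata-based inclusion criteria."""
    return (
        2000 <= rec.year <= 2024
        and rec.doc_type in ALLOWED_TYPES          # drops proceedings, books, chapters
        and rec.language == "English"
        and bool(rec.subject_areas & ALLOWED_AREAS)  # at least one allowed area
    )


records = [
    Record("River WQI with RF", 2022, "ar", "English", {"ENVI"}),
    Record("Conference paper on LSTM", 2021, "cp", "English", {"ENGI"}),
    Record("Nitrate mapping review", 2019, "re", "Spanish", {"EART"}),
]
kept = [r.title for r in records if passes_metadata_criteria(r)]
print(kept)  # ['River WQI with RF']
```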

Table 4. Local impact of journals (TC = total citations).

ID	Journals	H Index	TC
1	Water (Switzerland)	19	1568
2	Journal of Hydrology	18	2231
3	Environmental Science and Pollution Research	14	803
4	Water Research	10	823
5	Science of the Total Environment	9	737
6	Journal of Environmental Management	7	193
7	International Journal of Environmental Research and Public Health	6	103
8	Environmental Monitoring and Assessment	5	115
9	Hydrological Processes	5	82
10	Process Safety and Environmental Protection	5	147
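Assuming the H Index in Table 4 follows the usual bibliometric definition restricted to each journal's documents in the corpus, it is the largest number h such that h of those documents have each received at least h citations. A minimal Python sketch of that definition follows; the function name and the citation counts in the example are illustrative only and do not correspond to any journal in the table.

```python
def h_index(citations):
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited documents first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


# Illustrative citation counts for one journal's documents in a corpus.
example = [25, 12, 9, 6, 5, 3, 1, 0]
print(h_index(example))  # 5
```

Because h is capped by the number of documents, a journal can have a high TC but a modest h-index when its citations are concentrated in a few highly cited papers.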

Table 5. Top ten documents by local citations, with their global citations.

Ranking	First Author	Year	LC ¹	GC ²	Reference
1	Rahim Barzegar	2020	32	330	[88]
2	Rahim Barzegar	2016	15	149	[89]
3	Jun Yung Ho	2019	13	101	[90]
4	Amir Hamzeh Haghiabi	2018	13	290	[37]
5	Xiaoliang Ji	2017	11	10	[91]
6	Elham Fijani	2019	10	146	[92]
7	Sani Isah Abba	2020	9	91	[93]
8	Muhammed Sit	2020	9	273	[82]
9	Roohollah Noori	2015	9	66	[94]
10	Bachir Sakaa	2022	7	66	[95]

¹ LC = Local citation. ² GC = Global citation.
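In Table 5, global citations (GC) count all citations a document has received in the database, whereas local citations (LC) count only citations from other documents within the analyzed corpus. The Python sketch below shows how LC can be derived from the reference lists of corpus documents; the document IDs, reference lists, and the helper `local_citations` are hypothetical, and real records would first need DOI or title matching, which is omitted here.

```python
from collections import Counter

# Hypothetical corpus: document ID -> IDs of the works it cites.
corpus_refs = {
    "A": ["B", "X1", "X2"],
    "B": ["X1"],
    "C": ["A", "B", "X3"],
    "D": ["A", "B"],
}


def local_citations(refs_by_doc):
    """Count, for each corpus document, how many other corpus documents cite it."""
    in_corpus = set(refs_by_doc)
    counts = Counter()
    for citing, refs in refs_by_doc.items():
        # set() avoids double counting a reference listed twice;
        # references outside the corpus (X1, X2, ...) are ignored.
        for cited in set(refs):
            if cited in in_corpus and cited != citing:
                counts[cited] += 1
    return {doc: counts.get(doc, 0) for doc in refs_by_doc}


print(local_citations(corpus_refs))  # {'A': 2, 'B': 3, 'C': 0, 'D': 0}
```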

Table 6. Characterization of predictive models in a representative sample of river water quality prediction studies (n = 57).

ID	River	Algorithm	Approach	Reference
1	Yangtze River, China	CNN-LSTM	WQP	[116]
2	Delaware River Basin, USA	XGB, RF, KNN	WQP	[117]
3	Sheshui River in Wuhan, China	RF, SSA-CNN-LSTM	WQP	[114]
4	Vaigai, Madurai, and Tamil Nadu Rivers, India	Optimization algorithm and LSTM	WQP	[118]
5	Upper Red River Basin (URRB), USA	TL, FFNNs	WQP	[102]
6	South Platte River, Colorado, USA	EBM, SWEBM	WQP	[119]
7	Cauvery River, India	AO-SVM	WQI	[120]
8	Indian Rivers	DT, RF, GBT, ANN, SVM	WQP	[42]
9	Cauvery River, India	CNN	WQP	[121]
10	Han River, South Korea	RF, SVR, XGB, LGB, hybrid model, SHAP, LIME	WQI	[122]
11	Tanjiang River, China	SVR	WQP	[123]
12	Des Moines, Iowa, and Cedar Rivers, Iowa, USA	LSTM, GRU	WQP	[124]
13	Mahanadi River, India	LSTM, GRU, XGB	WQI	[125]
14	Oyster River, New Hampshire, USA	CNN-LSTM	WQP	[126]
15	Li River and Liu River, China	SSA, GRU, SHAP	WQP	[127]
16	Fujian River Network, China	WA-LSTM-TL	WQP	[103]
17	Euphrates River, Iraq	RC, DR, REPT, AR	WQP	[128]
18	Ohio River, USA	LSTM	WQP	[111]
19	USA Rivers	RF	WQP	[129]
20	Xiaofu River, China	LSTM	WQP	[130]
21	Lijiang River, China	BPNN, SVR, GRU	WQP	[131]
22	Drinking water quality, South Korea	LSTM, GRU	WQP	[132]
23	Indian Rivers *	DT, LR, Ridge, Lasso, SVR, RF, ETR, ANN	WQI	[133]
24	Júcar River, Spain	RF, XGB, SHAP	WQP	[134]
25	Bullfrog River, Tampa, Florida, USA	SVM, RF, XGB, ANN, SHAP	WQP	[135]
26	Talar River, Iran	EN, AMT, REPT	WQP	[136]
27	Wadi Saf-Saf River, Algeria	SMO-SVM, RF	WQI	[95]
28	Pearl River, China	CEEMDAN-LSTM	WQP	[137]
29	Fuyang River, China	RF, PLS	WQP	[138]
30	Synthetic dataset, Wabash River, USA	SVMR	WQP	[139]
31	Yamuna River, India	LSTM, SVR, CNN-LSTM	WQP	[30]
32	Kelantan River, Malaysia	KNN, ANN, DT, RF, GB	WQP	[140]
33	Langat River, Malaysia	ANN	WQP	[141]
34	Mid-Atlantic and Pacific Northwest USA, River Basin	SVR, XGB	WQP	[142]
35	Santiago-Guadalajara River, Mexico	SLR, MLR	WQI	[143]
36	Danube, Tisa, and Sava Rivers, Vojvodina Province, Serbia	Naïve Bayes algorithm	WQI	[144]
37	Yamuna River, India	ANFIS–GP, ANFIS–SC	WQP	[145]
38	Fanno Creek in Oregon, USA	DRNN, SVM, ANN	WQP	[146]
39	Dongjiang River, China	WT-MLR, WT-SVM, WT-ANN, WT-RF	WQP	[147]
40	Klang and Penang Rivers, Malaysia	MLP, SVM, RF, BDT	WQI	[148]
41	Nakdong River, South Korea	CEEMDAN, CSA, MARS	WQP	[149]
42	Luan River, Tangshan, China	1-DRCNN, BiGRU	WQP	[150]
43	Tyhume, Bloukrans, and Buffalo Rivers, South Africa	ANN, MLP, RBF	WQP	[151]
44	Kinta River, Malaysia	EANN-GA, EANN, FFNN, NNE	WQI	[152]
45	Xin’anjiang River, China	CNN-LSTM, CEEMDAN	WQP	[153]
46	Juhe River, Sanhe, China	PSO-DBN-LSSVR	WQP	[24]
47	Burnett River, Australia	kPCA, RNN, FFNN, SVR, GRNN	WQP	[154]
48	Nakdong River, South Korea	CNN-LSTM	WQP	[155]
49	Sefid Rud River, Iran	W-MGGP, GEP, DWT	WQP	[156]
50	Talar River, Iran	RF, RFC	WQI	[157]
51	Yangtze River, Jiangsu, China	IABC-BP	WQP	[158]
52	Klang River, Malaysia	DT	WQI	[90]
53	Langat River, Malaysia	MLP-FFA	WQP	[159]
54	Tireh River, Iran	ANN, GMDH, SVM	WQP	[37]
55	Danube Delta River, Romania	ANN, KNN, BPNN	WQI	[160]
56	Sefidrood River, Iran	SVM	WQP	[94]
57	Aji-Chay River, Iran	ANN, ANFIS, WT	WQP	[89]

* Indian water quality data from Kaggle. 1-DRCNN: one-dimensional residual convolutional neural network.

Table 7. Characterization of predictive models in a representative sample of groundwater quality prediction studies (n = 26).

ID	Region	Parameters	Algorithm	Reference
1	Madrid, Spain	Nitrate concentrations	DT, RF, AdaBoost, ExT	[162]
2	Songyuan City, China	Strontium (Sr²⁺)	GAN, KNN, GPR	[163]
3	Mekong Delta region, Vietnam	Salinity levels	Bagging, CatBoost, ExT, HGB, XGB, DT, RF, LightGBM, KNN, SHAP	[164]
4	Eden Valley, Cumbria, North West England	Nitrate concentrations	DT, XGB, RF, KNN, SHAP	[165]
5	Kerala, India	EWQI	XGB, SVR, ANN, RF	[46]
6	The Mitidja plain, northern Algeria	IWQI	LSTM	[166]
7	Groundwater dataset	Salinity levels	GMDH algorithm	[167]
8	Tamil Nadu, India	IWQI	SVM, ANN, LRM, RT, GPR, BRT	[168]
9	North China Plain, Beijing	Arsenic (As) and fluoride (F−) concentrations	XGB, RF, SVM	[169]
10	Eastern India	WQI	MLP-ANN	[170]
11	Raipur district, Chhattisgarh, India	WQI	ANN-LR
12	Midwestern United States	Redox conditions	GBM, XGB, RF	[171]
13	Hawasinah catchment, Wilayat Al-Khaburah, Oman	TDS	CatBoost regression, ETR, Bagging regression	[172]
14	Vehari, Punjab Province of Pakistan	WQI	ANN, RF, LR	[173]
15	Northeast of Tamil Nadu, India	WQI	GB, RF, DT, KNN, MLP, XGB, SVR	[174]
16	Qom City, Iran	Nitrate concentrations	KNN, SVR, RF	[175]
17	Savar, Dhaka district, Bangladesh	GWQI *	LR, SVM, ANN	[176]
18	Al Qunfudhah, Saudi Arabia	WQI	CNN, XGB, SHAP	[177]
19	Fars Province, Iran	WQI	RF, BRT, MnLR	[178]
20	Wendeng District, China	WQI	LSTM	[179]
21	Taiwan Groundwater Pollution Monitoring Standard	Heavy metal concentrations	SVR, KNN, MLP, GBR, LIME, SHAP	[180]
22	Middle Black Sea Region of Turkey	WQP	CNN, RF, XGB, DNN	[181]
23	Noida, Uttar Pradesh, India	WQP	MLR, SVR, DT	[182]
24	Akot basin, Akola district, Maharashtra, India	IWQI	ANN, LSTM, MLR	[183]
25	North Carolina, USA	Nitrate concentrations	RF	[184]
26	Dezful Aquifer, Iran	TDS	RF	[185]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
