A Review of the Application of Machine Learning Models in Groundwater Resources Management and Quality Assessment

Qiyuan Liu; Kunjie Liang; Fu Xia; Zhichao Yun; Sheng Deng; Xu Han; Yu Yang; Yonghai Jiang

doi:10.3390/su18115261

Abstract

Machine learning (ML) has evolved into an indispensable tool for uncovering hidden patterns and deducing correlations. Currently, ML is having a profound impact on the field of groundwater resources and environment research by enhancing predictive accuracy and optimizing management strategies. In this study, we conducted a bibliometric review using CiteSpace and a global-scale analysis of ML methods applied to groundwater resources and quality based on 1326 records. The findings suggest that ML applications in groundwater resources and water environment research are still in their infancy compared with other environmental science fields. This paper then provides a systematic summary of the specific applications of machine learning methodologies within groundwater research, focusing primarily on the prediction of groundwater levels and water quality, along with the extraction of feature importance. Furthermore, we compare the strengths and weaknesses of several prevalent ML techniques used in groundwater level and water quality studies, emphasizing the importance of matching data to models when applying ML. Finally, we discuss the challenges facing ML tools in groundwater research and the opportunities ahead. ML methodologies hold significant potential to make the invisible visible in groundwater research.

Keywords:

groundwater resource; groundwater pollution; machine learning; predictive model

1. Introduction

Groundwater is one of the most important water resources for human survival and development. As a key component of global liquid freshwater, groundwater stored in the lithosphere accounts for approximately 99% of total global liquid freshwater reserves, serving as an irreplaceable water supply source for various socioeconomic sectors. Specifically, it is extensively used in agricultural production, drinking water supply, industrial manufacturing, and numerous other fields, underpinning the stable operation of these sectors. Groundwater reportedly provides approximately 50% of the world’s drinking water [1] and 40% of agricultural irrigation and industrial water [2,3]. Additionally, as an integral part of the community of mountains, waters, forests, fields, lakes, and grasslands, groundwater plays a crucial role in supporting vital global ecological functions by interacting with its surrounding environment [4,5]. However, with the continuous increase in human activities, groundwater-related environmental problems are becoming more prominent and more severe. The rapid development of intensive agriculture and industry has triggered the dual threats of excessive groundwater extraction and anthropogenic pollution [4,6,7,8]. These threats compromise the drinking water safety of over 844 million people worldwide and pose a significant barrier to the sustainable development of the global economy and the preservation of ecological integrity [9]. Therefore, obtaining a timely and precise understanding of groundwater resources and their environmental changes is crucial for formulating effective regulatory strategies. Nevertheless, groundwater systems are inherently concealed and difficult to observe, exhibit strong spatial heterogeneity, and are governed by intricate physical mechanisms [10]. As a result, accurately modeling the dynamic variations in groundwater level and water quality remains a highly challenging task.
With the continuous advancement of artificial intelligence technology, machine learning (ML) models analyze data through inductive hypotheses and “learn the rules” without relying on known equation systems, demonstrating enormous potential in groundwater pollution identification, prediction, and simulation of groundwater level changes [2,10,11,12,13,14,15]. Conventional hydrological models require the calibration of numerous physical parameters, and their prediction accuracy is largely constrained by the complex nonlinearity and non-stationarity of groundwater level fluctuations [16]. In contrast, ML algorithms exhibit significant advantages in modeling nonlinear and non-stationary problems. When applying ML technologies, modelers do not need to manually establish mathematical relationships between variables, as ML models can automatically learn these correlations from input data. However, such methods also have limitations, including insufficient generalization ability caused by overfitting, the risk of using irrelevant data, incorrect modeling with inappropriate methods, and dependence on training data [3,10,11]. Meanwhile, performance and accuracy differ across models because of variations in data quality and structure. The size of the dataset is a crucial factor in applying ML models for groundwater prediction and pollution identification, but larger datasets are not always advantageous, as each model requires datasets that are appropriately matched to its characteristics [16,17]. Researchers are constantly adjusting and innovating the application methods of ML models [18,19]. For instance, dimensionality reduction methods are adopted to decrease the number of input features, thereby improving the measurement of similarity in high-dimensional sparse data [20]; for datasets with limited data, data augmentation techniques are used to expand the scale of training datasets [21], so as to enhance the performance and accuracy of these models.

Currently, though ML models have been used in many studies to address groundwater quality and level change issues, it is worth noting that selecting an appropriate ML model for specific groundwater research scenarios remains a major challenge in the current research field. A systematic review of previously published literature reveals that only one study has explored the application of ML technologies in groundwater quality modeling; however, this study did not conduct a systematic evaluation and summary of the applicability and prediction accuracy of different ML models under diverse research conditions. At present, a comprehensive review that elaborates on the advantages and limitations of applying ML technologies in groundwater quality and water level change research remains lacking [22]. Therefore, to better clarify the application status and development trend of ML in groundwater environment and water resource research, this study conducts a comprehensive review of the application of ML in groundwater quality and water level changes through bibliometric analysis. The first part focuses on the development status of ML models in groundwater resource and water environment research, including the temporal distribution of published studies, key research focuses, major research fields, and the distribution characteristics of different ML models. The second part discusses the applicability and prediction accuracy of ML models in groundwater quality evaluation and water level prediction by category and evaluates their scope of application based on the main algorithm models involved in existing studies. The third part analyzes the challenges and opportunities faced by ML models in future groundwater quality and quantity research, combined with the current development trends of intelligent algorithms, big data technology, and groundwater monitoring methods.

2. Bibliometric Overview

2.1. Data Collection

A literature review is a critical phase in the research process, enabling researchers to comprehend and assess the focal points, application domains, and advancements within pertinent knowledge areas, thereby offering comprehensive insights [23]. This article uses the Web of Science (WOS) Core Collection as its retrieval platform. To ensure comprehensive coverage of research on the application of ML in groundwater studies, the search keywords (Article) in WOS were set to (machine learning OR machine learning models OR artificial intelligence) AND (groundwater OR ground water) (Table 1), covering the years 1996 to 2023 and refined to English-language publications. A total of 1326 articles were retrieved from the WOS Core Collection, and verification with CiteSpace confirmed that all records were unique. After removing 49 review articles and 46 conference proceedings that did not meet the eligibility criteria, 1231 research papers remained for bibliometric analysis (Figure 1). The analysis was conducted using CiteSpace software (version 6.1.R4) with 1-year time slices. Graphs were generated in CiteSpace using keywords and references as nodes, with a selection threshold of top n = 30. Pruning options were set to pathfinder and pruning sliced networks [24].

Table 1. Search keywords used for article collection from WOS.

Figure 1. The exclusion and selection process of publications.

2.2. Research Evolution

Bibliometric analysis reveals that the number of publications on the application of ML methods in groundwater research has risen markedly over the past two decades (R² = 0.98), with academic attention to the development and adoption of ML surging after 2020 (Figure 2a). As early as 1996, a groundwater quality modeling advisory system embedding independently validated ML algorithms was developed for the U.S. Air Force. The system provided a standardized analytical framework that allowed both experienced and novice researchers to select optimal groundwater models for site-specific evaluation [25]. The advent of AlphaGo in 2015 demonstrated the powerful potential of deep learning and gradually propelled the application of ML in groundwater research: annual publications grew from 5 in 2015 to 405 in 2023. Around 2020, significant advances were achieved in the exploration of big data, the recognition of complex patterns, and the prediction of intricate variables [26], tasks for which ML algorithms are an efficient tool. Podgorski and Berg were the first to publish a study in Science that used ML to predict arsenic concentrations in drinking groundwater on a global scale [27]. Their groundbreaking research has attracted considerable interest, with the paper having been cited 738 times according to the most recent search, significantly propelling the use of ML within the field of groundwater.

Figure 2. Research evolution of ML applications in groundwater from 1996 to 2023. (a) Annual publication number; (b) timeline view of keyword clusters; (c) keyword clustering graph; (d) co-occurrence of keywords. (Search options: “Machine learning” (Topic) and “Groundwater” (Topic)). Circle size represents keyword frequency, while different colors indicate clusters based on co-occurrence relationships. Lines represent the co-occurrence relationships between nodes, and their thickness and color indicate the co-occurrence intensity.

The timeline graph is designed to depict the temporal interconnections among clusters, with nodes originating from the same cluster aligned in chronological sequence along a single horizontal line, thereby highlighting the cluster’s historical milestones and research endeavors [28,29]. As shown in Figure 2b, the ANN, ML, and groundwater level clusters emerged early and persisted throughout the entire study period, indicating that these topics garnered significant attention during this time frame. Additionally, clusters closely related to groundwater research include geographic information systems (GIS), drinking water, climate change, geochemistry, and groundwater level prediction. The earliest research focused on mass transport [25,30], decision-making [25,31], and water quality [25,30].

The fundamental principle of clustering entails grouping related sets of keywords into distinct categories (Figure 2c). In this study, the modularity (Q value) of the keyword network was 0.395, with a silhouette coefficient (S value) of 0.7153. Modularity is higher when keywords are connected more densely within clusters than would be expected by chance, indicating better clustering performance for the given number of clusters. A Q value greater than 0.3 indicates substantial graph modularity, while an S value exceeding 0.5 reflects reasonable clustering results. The high-frequency clusters are primarily concentrated in groundwater level [21,32,33] and GIS [34,35,36], indicating that the current application of ML methods in groundwater research is predominantly focused on predicting groundwater levels and that the majority of studies incorporate GIS. Moreover, increasing attention is being directed toward underground drinking water, marking it as a future research trend [27,37,38,39]. This is closely linked to the depletion of global underground drinking water resources in recent years [20,40,41] and the frequent occurrence of underground drinking water source pollution incidents [16,42,43]. With the expansion of groundwater monitoring networks in various countries, large-scale climate change issues related to groundwater, such as “rainfall, drought, resources availability and interactions with surface water,” have also become major research subjects in recent years. This is attributed to the Paris Agreement, adopted in 2015 under the United Nations Framework Convention on Climate Change, which has promoted research by scholars on environmental media and climate change [44].
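To make the Q value discussed above more concrete, the following is a minimal sketch of Newman's modularity for an undirected, unweighted network. The six-keyword network and the cluster labels are hypothetical and chosen only for illustration; this is not CiteSpace's implementation, and the real keyword network is far larger.

```python
def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    Uses the per-cluster form Q = sum_c [ L_c / m - (d_c / (2m))^2 ], where
    m is the number of edges, L_c the number of edges inside cluster c, and
    d_c the summed degree of the nodes in cluster c.
    """
    m = len(edges)
    intra = {}    # cluster -> number of edges with both ends inside it
    deg_sum = {}  # cluster -> total degree of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        deg_sum[cu] = deg_sum.get(cu, 0) + 1
        deg_sum[cv] = deg_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in deg_sum.items())


# Hypothetical co-occurrence network: two keyword triangles joined by one edge.
edges = [("k1", "k2"), ("k2", "k3"), ("k1", "k3"),
         ("k4", "k5"), ("k5", "k6"), ("k4", "k6"),
         ("k3", "k4")]
two_clusters = {"k1": "A", "k2": "A", "k3": "A",
                "k4": "B", "k5": "B", "k6": "B"}
one_cluster = {k: "A" for k in two_clusters}

q_two = modularity(edges, two_clusters)  # each triangle is its own cluster
q_one = modularity(edges, one_cluster)   # everything lumped together
print(round(q_two, 3), round(q_one, 3))
```

In this toy case, splitting the network into its two triangles gives Q = 5/14 (about 0.357), above the 0.3 rule of thumb, whereas placing all keywords in a single cluster gives Q = 0.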

Keywords encapsulate the core essence of a paper’s subject matter, and through the analysis of these keywords, one can understand the research hotspots in the field. In the keyword co-occurrence map (Figure 2d), the term “machine learning” is represented by the largest node, signifying its highest frequency of occurrence (674 articles) and its earliest appearance in the dataset. This indicates that ML models have become an important method for predicting and tracing groundwater resources and water environments. “Random forest” stands out as the next key term, indicating that among the prevalent ML algorithms, the random forest algorithm is predominantly utilized in groundwater research. This preference is largely attributed to its versatility and well-established applications [22]. Other keyword nodes such as “groundwater level,” “groundwater quality,” “aquifer,” and “prediction” reflect the current research hotspots and directions in the groundwater field and also indicate that the focus of research is mainly on prediction. “Spatial prediction,” “artificial intelligence,” “logistic regression,” “artificial neural network,” and “support vector machine” point out several typical algorithms currently applied in the groundwater field using ML. “Simulation,” “optimization,” “system,” “pollution,” and “algorithm” are keywords that reflect the future direction of ML development in groundwater research.

2.3. Geographical Distribution of ML Studies

The number of publications is a key indicator of research activity in a country within a specific domain. The United States, China, Iran, and India are the top four contributors in this field, with 351, 300, 218, and 177 publications, respectively, together accounting for 77.36% of total publications (Figure 3a). These countries exhibit betweenness centralities exceeding 0.1, which indicates the strong correlation between scientific research output and policy priorities. Although institutions with 10 or more publications represent just 10.67% of all participating institutions, they produced 50.60% of all publications, signifying that research in this field is concentrated within a limited number of institutions. The collaborative network demonstrates extensive cooperation among countries, institutions, and authors. Each of the top 20 countries within this domain exhibits collaborative ties of varying strength. Among them, the cooperation between the United States and Vietnam is the closest (Figure 3b).

Figure 3. Research leaders of ML applications in groundwater from 1996 to 2023. (a) Sankey diagram of publication countries and years; (b) country cooperation network; (c) institute cooperation network; (d) author cooperation network; (e) geographical distribution of the reviewed articles.

Institutions with substantial influence in this field include the University of Tehran, Islamic Azad University, the Chinese Academy of Sciences, the IIT System, and China University of Geosciences, all of which exhibit betweenness centrality values exceeding 0.1 (Figure 3c). The three leading authors in terms of publication output are Pradhan Biswajeet from the University of Technology Sydney, Lu Wenxi from Jilin University, and Pal Subodh Chandra from Bharati Vidyapeeth University. The author collaboration network reveals that researchers outside China maintain relatively strong collaborative connections in this field, while Chinese researchers, who are still in the nascent stages of their work in this area, exhibit fewer collaborative ties with international peers (Figure 3d). This suggests that groundwater research still needs stronger international cooperation in the future.
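The betweenness centrality values cited above measure how often a node lies on the shortest paths between other nodes in the cooperation network. The sketch below computes normalized betweenness with Brandes' algorithm on a hypothetical five-node star network; the node names are placeholders, and the normalization shown (dividing by the number of node pairs) is an assumption about how such scores are scaled to the range 0 to 1, not a description of CiteSpace internals.

```python
from collections import deque


def betweenness(adj):
    """Normalized betweenness centrality (Brandes' algorithm) for an
    undirected, unweighted graph given as {node: [neighbors]}."""
    nodes = list(adj)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    # Each unordered pair was counted twice; (n-1)(n-2) rescales to [0, 1].
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: c * scale for v, c in score.items()}


# Hypothetical cooperation network: one hub linked to four partners.
network = {
    "hub": ["p1", "p2", "p3", "p4"],
    "p1": ["hub"], "p2": ["hub"], "p3": ["hub"], "p4": ["hub"],
}
centrality = betweenness(network)
print(centrality)
```

Here every shortest path between two partners passes through the hub, so the hub scores 1.0 and each partner scores 0.0. A threshold such as 0.1 therefore singles out nodes that broker a noticeable share of the connections.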

2.4. Machine Learning Models

As shown in Figure 2b, the application of ML in groundwater research can be divided into three stages. The initial phase primarily employs ML techniques to forecast and simulate groundwater levels [45,46,47]. The subsequent stage concentrates on subterranean potable water resources and the quality of drinking water [37,38,48,49,50]. The final phase involves utilizing ML to investigate the potential correlations between groundwater and climate change [32,51,52]. In general, the selection of ML models should be guided by specific research needs and objectives. ML models are usually categorized into three main categories according to their learning types: unsupervised, supervised, and optimization algorithms [22]. Unsupervised algorithms include clustering algorithms, multiple frameworks, and self-organizing maps. Supervised learning algorithms comprise six types: fuzzy theory, comparative analysis, SVM, decision tree, ANN, and random forest. Optimization algorithms are further divided into genetic algorithms, wavelet transforms, and ensemble learning algorithms (Figure 4).

Figure 4. Application of ML in groundwater prediction and pollution identification process.

3. Application of Machine Learning Models in Groundwater Level and Quality Modeling

3.1. Background on Groundwater Prediction

Owing to human activities and natural factors, groundwater resource depletion and pollution have become a pressing global issue [50,53]. In the realm of groundwater resource management, groundwater level serves as a crucial indicator for forecasting groundwater availability. Groundwater dynamic prediction generally requires refined analysis and numerical simulation of groundwater storage, recharge–discharge conditions, and hydrogeological parameters, so as to deduce the future trend of groundwater level changes and comprehensively assess the sustainability of groundwater resource development and utilization within a specific period [54]. Accurate prediction of groundwater level helps policymakers and water resource managers formulate rational water resource planning and management strategies, improve water use efficiency, and mitigate environmental impacts caused by excessive groundwater extraction [55].

In groundwater level prediction, frequently employed ML algorithms include Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), Decision Trees (DTs), and Random Forests (RFs), among others [56,57] (Figure 5). Bibliometric analysis reveals that SVM, ANN, DT, and RF models appear in the research literature 122, 84, 27, and 114 times, respectively. This indicates that SVM and RF models are more widely used than ANN and DT models in the field of groundwater resource prediction. ML-based recognition methods have also been widely applied in groundwater pollution identification. Current studies mostly adopt classical ML algorithms, such as logistic regression [58], DT [59,60], and other supervised learning methods. These algorithms construct groundwater pollution models by learning from groundwater sample data, thereby enabling the discrimination and early warning of groundwater pollution status. A growing number of studies have begun to explore the integrated application of multiple machine learning algorithms, optimizing model performance through complementary advantages and providing technical support for the precise prevention and control of groundwater pollution [7,10,11].

Ghasemi et al. integrated ML models such as RF and SVM with GIS technology to establish a land subsidence vulnerability map in Hamadan Province, Iran, confirming that the decline in groundwater levels leads to land subsidence in aquifers [61]. Jiang et al. combined four ML models (SVM, RF, multilayer perceptron, and a stacked ensemble model) with weather forecasts, using groundwater level and meteorological data from five monitoring wells in the Huaibei Plain from 2010 to 2020, to test the feasibility of predicting groundwater levels from meteorological factors and ML algorithms, thereby supporting agricultural water management [62].

3.2. Techniques Used in Groundwater Research

3.2.1. Support Vector Machine

Support Vector Machine (SVM) is a versatile ML model applicable to both classification and regression tasks. Its basic formulation is the maximum-margin classifier, which achieves classification by maximizing the margin width between different classes. Because the conventional maximum-margin classifier relies on hard-margin classification, it is highly sensitive to outliers. SVM mitigates this drawback by adopting the support vector classifier, also referred to as the soft-margin classifier, which tolerates a limited number of margin violations. Cross-validation is used to determine the soft margin that delivers the best classification performance [16,57]. Moreover, SVM overcomes several limitations of artificial neural networks in terms of overall generalization ability, such as the tendency to converge to local minima during training, overfitting to training data, and the subjectivity in selecting model architectures [54].
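As an illustration of the soft-margin idea described above, the sketch below trains a linear SVM by batch subgradient descent on the objective 0.5·||w||² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)). The penalty C sets how strongly margin violations are punished and is the quantity that would typically be tuned by cross-validation. The data, the learning rate, and the function names are assumptions made for this example; practical studies normally rely on an established SVM library, often with nonlinear kernels.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(w, b, X, y, C):
    """Soft-margin objective: 0.5*||w||^2 + C * sum of hinge losses."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * hinge


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Batch subgradient descent on the soft-margin objective."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0        # gradient of 0.5*||w||^2 is w
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # sample violates the margin
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical standardized features for two classes (+1 and -1).
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
     [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
C = 1.0

w, b = train_linear_svm(X, y, C=C)
predictions = [1 if dot(w, xi) + b >= 0 else -1 for xi in X]
print(predictions, objective(w, b, X, y, C))
```

A larger C moves the model toward hard-margin behavior and greater sensitivity to outliers, while a smaller C allows more violations in exchange for a wider margin.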

In the context of groundwater prediction, SVM is adept at forecasting future conditions of groundwater [12,20,63]. Among the 40 studies reviewed by Boo et al., SVM was identified as the most accurate model in seven [17]. A study by Samani et al. demonstrated that SVM can accurately predict groundwater level (GWL) variations in the Qazvin Aquifer of Iran over the subsequent 2 months [64]. Research by Yadav et al. in the southeastern region of Karnataka, India, showed that the SVM model, constructed using historical data on population growth rate, NOI, SOI, Nino3, T, P, and GWL, significantly outperformed the hybrid Wavelet Neural Network model in GWL prediction, with a correlation coefficient R > 0.85 [20]. Xu et al. [65] attributed the widespread application of SVM technology mainly to the following advantages: (1) excellent generalization performance; (2) consistent availability of a global optimal solution rather than a local optimal solution; and (3) the ability to achieve sparse representation of solutions (support vectors) with only a small number of training samples [54].
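For readers less familiar with the accuracy measures quoted in such comparisons, the following sketch computes the Pearson correlation coefficient R and the root-mean-square error (RMSE) between observed and predicted groundwater levels. The two short series are hypothetical values invented for this example and do not come from any of the cited studies.

```python
import math


def pearson_r(obs, pred):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def rmse(obs, pred):
    """Root-mean-square error, in the same units as the data."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Hypothetical monthly groundwater levels (m) and model predictions.
observed = [12.1, 12.4, 12.9, 13.3, 13.0, 12.6]
predicted = [12.0, 12.5, 12.7, 13.4, 13.1, 12.4]

r = pearson_r(observed, predicted)
error = rmse(observed, predicted)
print(round(r, 3), round(error, 3))
```

R captures only how well the predicted series tracks the shape of the observations, so it is usually reported together with an error measure such as RMSE that also reflects bias.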

In summary, SVM is a reliable modeling approach capable of reducing predictive uncertainty under specified conditions. Unlike neural networks, the SVM model does not depend on the design and selection of network architecture. Moreover, it is less prone to overfitting since it avoids iterative training processes, and its fitting efficiency is comparatively higher. Consistent with other machine learning methods, constructing hybrid or ensemble models by integrating SVM with other algorithms can effectively compensate for the inherent limitations of standalone SVM and further improve its prediction accuracy [43].

3.2.2. Decision Tree

The decision tree (DT) is a commonly used supervised learning algorithm that classifies and predicts by recursively splitting the dataset into subsets, building the tree one split at a time. In groundwater pollution identification, DTs can infer whether groundwater is contaminated based on a series of feature variables. The advantages of DTs lie in their ease of understanding and interpretation, and they can automatically handle missing values and outliers [66]. Additionally, the DT algorithm can consider the interactions between different features, improving the accuracy of groundwater pollution identification. By training and optimizing the DT, a reliable model for groundwater pollution identification can be obtained, providing a scientific basis for the management and protection of groundwater resources. Therefore, the application of the DT algorithm in groundwater pollution identification has significant research value and positive implications for addressing groundwater pollution issues.
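The splitting step at the heart of a decision tree can be sketched as follows: for one feature, each candidate threshold is scored by the weighted Gini impurity of the two resulting subsets, and the threshold with the lowest score is kept. The well depths and contamination labels are hypothetical, and Gini impurity is only one common splitting criterion, assumed here for illustration.

```python
def gini(labels):
    """Gini impurity: 0 for a pure subset, larger for mixed subsets."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one
    feature; threshold is None if no split improves on the unsplit data."""
    pairs = sorted(zip(values, labels))
    best_thr, best_score = None, gini(labels)
    for (v_prev, _), (v_next, _) in zip(pairs, pairs[1:]):
        if v_prev == v_next:
            continue
        thr = (v_prev + v_next) / 2.0  # midpoint between neighboring values
        left = [l for v, l in pairs if v <= thr]
        right = [l for v, l in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score


# Hypothetical samples: well depth (m) and contamination label.
depth = [5.0, 8.0, 12.0, 30.0, 45.0, 60.0]
status = ["polluted", "polluted", "polluted", "clean", "clean", "clean"]

threshold, impurity = best_split(depth, status)
print(threshold, impurity)
```

In this toy case, the split at 21.0 m separates the two classes perfectly (impurity 0.0). A full tree applies the same search recursively to each subset and across all features.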

Saghebian et al. proposed a decision tree-based method for predicting groundwater quality using data from the agricultural aquifers in Ardabil Province, northwestern Iran, based on the United States Salinity Laboratory (USSL) diagram. By comprehensively considering hydrochemical parameters of groundwater and monthly precipitation amounts at different lag times, the study aimed to find an accurate and cost-effective alternative for groundwater quality classification [67]. The results indicate that the groundwater quality classification based on DT is more precise and efficient compared with the principal component analysis method. DT models can also be applied to groundwater resource development [68], prediction of water potential zones [69], and assessment of groundwater potential [69,70,71].

3.2.3. Random Forest

The RF model is a machine learning algorithm based on DT. By employing an ensemble strategy, this algorithm effectively overcomes the inherent limitations of single decision tree models. It not only significantly alleviates overfitting but also further improves prediction accuracy and stability, making it widely used in hydrological simulations such as groundwater level dynamic prediction [54]. The core idea of RF is to construct an ensemble of multiple mutually independent decision trees that jointly perform classification or regression tasks. During model construction, each decision tree is generated independently with no information interaction among them and is generally expanded to an optimal structure without pruning according to the predefined number of trees and feature variables involved in modeling [57]. To enhance the generalization ability of the model, each decision tree is trained using bootstrap random sampling and random selection of input features. By introducing double randomness, the model variance is reduced, and overall prediction performance is improved [36,72,73].
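The double randomness described above (bootstrap sampling of rows plus random selection of features) can be sketched with a deliberately small forest of one-split trees that vote on the class. The data, feature meanings, and function names are hypothetical. Real random forests grow deep trees and redraw the feature subset at every split, which is simplified here because each tree has only one split.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Best single-threshold split (a depth-1 tree) over the given features."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # sample had no usable split: predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (None, None, majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, thr, left_label, right_label = stump
    if f is None:
        return left_label
    return left_label if row[f] <= thr else right_label


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]           # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)      # random features
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]


# Hypothetical samples: [well depth (m), nitrogen load index]; 1 = contaminated.
X = [[5.0, 80.0], [8.0, 90.0], [10.0, 70.0], [12.0, 85.0],
     [40.0, 10.0], [50.0, 20.0], [45.0, 15.0], [60.0, 5.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(X, y)
shallow_well = predict_forest(forest, [7.0, 75.0])
deep_well = predict_forest(forest, [55.0, 12.0])
print(shallow_well, deep_well)
```

Because each tree sees a different resample and a different feature, individual trees can be wrong on a given sample while the majority vote stays stable, which is the variance reduction the ensemble is designed to achieve.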

In groundwater research, the RF algorithm is utilized to develop predictive models for groundwater levels and groundwater quality. Through the analysis of the interplay and significance of various feature variables, the RF algorithm can yield more precise predictions for groundwater levels. Furthermore, RF serves as a valuable tool for feature selection, aiding in the identification of variables that exert a substantial influence on groundwater contamination. By incorporating the RF algorithm, the detection and forecasting of groundwater contamination can attain enhanced accuracy and dependability [26], offering practical instruments and methodologies for the safeguarding and administration of groundwater resources.

Rodriguez-Galiano et al. applied the RF algorithm for predicting agricultural groundwater contamination and found that the RF model needed to explain only four variables when human factors were considered as water quality simulation parameters [74]. Wang et al. indicated that in sandy shallow groundwater areas, the RF model can be used to locate the sources and flow paths of dissolved organic nitrogen (DON) in groundwater based on landscape characteristics [75]. Judeh et al. combined GIS, statistics, and ML for groundwater quality management, including water quality assessment and prediction [76]. They constructed an RF prediction model from the distribution of nitrates in groundwater and their influencing factors; the average and maximum prediction accuracies of RF were 88.5% and 91.7%, respectively, and well depth had the greatest impact on groundwater nitrate concentration (GNC).

3.2.4. Artificial Neural Network

ANN is a commonly used ML model that is composed of multiple neurons and can simulate the learning and decision-making processes of the human brain [77]. In groundwater prediction, neural networks adeptly discern the intricate interrelations between groundwater levels and their influencing factors through the analysis of extensive datasets. The primary benefit of this approach is its capacity to address nonlinear challenges and its robust fitting capabilities. During the construction of a neural network model for groundwater prediction, numerous variables are typically taken into account, such as fluctuations in groundwater levels, precipitation, and additional factors. Through the adjustment of the neural network’s architecture, including the quantity of neurons and layers, and the application of suitable activation functions alongside optimization algorithms, a model can be developed that offers a high degree of predictive accuracy.
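The ingredients named above (layers of neurons, an activation function, and an optimization algorithm) can be seen in the following minimal sketch of a one-hidden-layer network trained by stochastic gradient descent. The input–output pairs stand in for a scaled driver variable and a nonlinear groundwater response and are entirely hypothetical, as are the network size, learning rate, and function names.

```python
import math
import random


def init_mlp(n_in, n_hidden, seed=0):
    """Random small weights for one tanh hidden layer and a linear output."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return W1, b1, W2, 0.0


def forward(params, x):
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return h, sum(w * hj for w, hj in zip(W2, h)) + b2


def mse(params, X, Y):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in zip(X, Y)) / len(X)


def train(params, X, Y, lr=0.05, epochs=2000):
    """Stochastic gradient descent with backpropagation on squared error."""
    W1, b1, W2, b2 = params
    W1, b1, W2 = [row[:] for row in W1], b1[:], W2[:]  # leave inputs untouched
    for _ in range(epochs):
        for x, t in zip(X, Y):
            h, y = forward((W1, b1, W2, b2), x)
            err = y - t                                  # d(0.5*err^2)/dy
            for j in range(len(W2)):
                # Backpropagate through tanh using the pre-update W2[j].
                grad_pre = err * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_pre
                for i in range(len(x)):
                    W1[j][i] -= lr * grad_pre * x[i]
            b2 -= lr * err
    return W1, b1, W2, b2


# Hypothetical scaled input and a nonlinear (quadratic) response.
X = [[-1.0], [-0.5], [0.0], [0.5], [1.0]]
Y = [1.0, 0.25, 0.0, 0.25, 1.0]

initial = init_mlp(n_in=1, n_hidden=4)
loss_before = mse(initial, X, Y)
trained = train(initial, X, Y)
loss_after = mse(trained, X, Y)
print(loss_before, loss_after)
```

A linear model cannot represent this symmetric response, whereas the hidden tanh units allow the network to bend its output. With so few samples, a model like this would easily overfit real data, which is why architecture and training settings are normally chosen with held-out validation.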

Research into ANNs has concentrated on their utility in groundwater quality modeling, encompassing the forecasting of nitrate leaching, hydrological variables, and groundwater quality. The studies have compared various input–output configurations, tackled data deficiencies or practical applications, or have contributed to the broader domain. These investigations have concluded that neural networks, including backpropagation neural networks (BPANNs), feedforward neural networks (FFANNs), multilayer perceptrons (MLPs), and Bayesian neural networks (BNNs), are appropriate for simulating groundwater quality within specific study regions [68,78,79,80]. ANNs have been proven to be a feasible method for predicting groundwater quality variables such as nitrate concentration. Neural network models are highly accurate and easy to implement, making them the most commonly used algorithms in groundwater quality modeling. Based on the prediction results, they can be applied for management purposes, such as drinking water health management [81], agricultural irrigation [82], and others.

3.2.5. Logistic Regression

The main idea of logistic regression is to estimate the probability of a sample belonging to a certain class by defining a logistic function [36,59]. The logistic function maps any real number to a probability value between 0 and 1. For the groundwater pollution identification problem, the characteristics of groundwater can be used as input variables, and the logistic regression model can be trained based on the labels of known groundwater samples (polluted or non-polluted) to ultimately obtain a model that can classify new samples. The core of the logistic regression model is the estimation of parameters, which is usually solved using the method of maximum likelihood estimation. A trained logistic regression model can be used to classify new groundwater samples and determine whether they belong to the polluted category [83,84,85].
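The procedure described above can be sketched in a few lines: the logistic function converts a linear score into a probability, and the parameters are estimated by maximizing the log-likelihood, here with plain gradient ascent. The single standardized feature and the polluted or non-polluted labels are hypothetical, and the learning rate and iteration count are arbitrary choices for this example.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximum likelihood estimation by gradient ascent on the mean
    log-likelihood; returns the weights and the intercept."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for j, xj in enumerate(xi):
                grad_w[j] += (yi - p) * xj   # gradient of the log-likelihood
            grad_b += yi - p
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical standardized indicator; 1 = polluted, 0 = non-polluted.
# The classes overlap slightly, so the likelihood has a finite maximum.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
p_high = predict_proba(w, b, [1.8])
p_low = predict_proba(w, b, [-1.8])
print(w, b, p_high, p_low)
```

The sign and size of each fitted weight indicate how the corresponding feature shifts the odds of pollution, which is the source of the interpretability noted for this method.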

The application of logistic regression in groundwater pollution identification has the following advantages. First, it is a simple and effective algorithm with a low computational cost. Second, it provides probability estimates for each class, not just classification outcomes. Third, its results are readily interpretable, which helps to identify the key factors affecting groundwater pollution [36]. Ozdemir used logistic regression to locate potential groundwater spring areas in the Sultan Mountains, and the resulting model was in good agreement with existing groundwater spring test data [86]. This demonstrates the utility of logistic regression for conventional groundwater exploration. Nolan et al. used a logistic regression model to predict the probability of nitrate concentrations in groundwater exceeding 4 mg/L in the United States [83]. Lado et al. created a semi-automatic method for assessing the risk of arsenic (As) contamination in shallow groundwater in Cambodia [87]; their logistic regression analysis showed a good correlation between topographic and geomorphological variables and the risk of groundwater As contamination.

3.2.6. Genetic Algorithm

Genetic algorithms are optimization algorithms inspired by the principles of biological evolution [88,89,90]. Genetic algorithms emulate the genetic evolutionary process to seek an optimal solution. They initiate with a population of potential solutions and iteratively enhance their quality through genetic operations, including crossover and mutation. The main advantages of genetic algorithms lie in their global optimization capability and parallel processing ability. Through an automated optimization process, genetic algorithms can find the best solutions in complex groundwater systems and improve the accuracy and reliability of prediction results [91]. Additionally, genetic algorithms are adept at tackling the difficulties posed by high-dimensional data and nonlinear relationships inherent in groundwater forecasting.
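
The evolutionary loop described above (an initial population, then selection, crossover, and mutation) can be illustrated with a short Python sketch. Everything specific in it is an assumption made for illustration: the function `run_ga`, the tournament selection and elitism scheme, the rates, and the toy task of calibrating a linear rainfall-to-level relation from four invented observations.

```python
import random


def run_ga(fitness, bounds, pop_size=30, generations=60,
           crossover_rate=0.9, mutation_rate=0.2, seed=0):
    """Minimize `fitness` over box-bounded real parameters with a simple GA."""
    rng = random.Random(seed)

    def tournament(pop, scores):
        # Binary tournament: the fitter of two random individuals becomes a parent.
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if scores[i] <= scores[j] else pop[j]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []  # best fitness found in each generation
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        best_idx = min(range(pop_size), key=scores.__getitem__)
        history.append(scores[best_idx])
        new_pop = [pop[best_idx][:]]  # elitism: carry the best solution forward
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            if rng.random() < crossover_rate:
                alpha = rng.random()  # arithmetic crossover between two parents
                child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[k] += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian mutation
                child[k] = min(hi, max(lo, child[k]))  # keep within bounds
            new_pop.append(child)
        pop = new_pop
    scores = [fitness(ind) for ind in pop]
    best_idx = min(range(pop_size), key=scores.__getitem__)
    history.append(scores[best_idx])
    return pop[best_idx], history


# Hypothetical calibration task: level = a * rain + b (invented data)
rain = [10.0, 20.0, 30.0, 40.0]
level = [2.5, 4.5, 6.5, 8.5]


def sse(params):
    """Sum of squared errors between simulated and observed levels."""
    a, b = params
    return sum((a * r + b - h) ** 2 for r, h in zip(rain, level))


best, history = run_ga(sse, bounds=[(0.0, 1.0), (0.0, 2.0)])
```

Because the best individual is always copied into the next generation, the values in `history` never increase; how close the final solution comes to the true optimum depends on the population size, the rates, and the number of generations.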

During the groundwater prediction process, genetic algorithms can be employed to ascertain the optimal combination of parameters, including the selection of suitable input variables and the optimization of model parameters. Several studies have compared the performance of various genetic algorithms [92,93,94]. Ritzel et al. showed that the Pareto genetic algorithm is superior to the Vector Evaluated Genetic Algorithm in solving multi-objective groundwater pollution control problems [94]. Kisi et al. demonstrated that the Continuous Genetic Algorithm (CGA) is generally superior to Particle Swarm Optimization (PSO) and Ant Colony Optimization Algorithm (ACOR) in training and optimizing models [93].

Compared with ANN and RF algorithms, genetic algorithms exhibit distinct advantages, including structural independence and a reduced tendency toward overfitting and premature convergence [95]. They are also flexible in the number of parameters they handle, requiring as few as three in some applications [92] while coping with more than ten in others [96,97]. However, genetic algorithms also have limitations. First, their computational complexity is high, so they demand significant computational resources. Second, the choice of algorithm parameters strongly affects the results, necessitating careful tuning. Third, overfitting can still occur in practice, so model selection and validation remain necessary [98]. Despite these limitations, genetic algorithms are a potent optimization tool and play a significant role in research on groundwater pollution identification and prediction.

Figure 5. Demonstration of typical ML models for groundwater level and pollution prediction.

4. General Discussion

4.1. Supervised Learning Algorithm

Supervised machine learning centers on constructing an estimation model for a target variable using a set of known samples and explanatory variables [99]. Typical algorithms include DT, SVM, ANN, and RF [100]. Such methods generally first collect labeled data representative of various data sources and then build classification models based on selected features, for instance to distinguish between contaminated and uncontaminated samples. Data labeling is the critical step that enables supervised machine learning to recognize patterns and predict outcomes; this distinguishes it from unsupervised learning, which mines data patterns without labeled data [101].

Among supervised learning models, the RF algorithm offers several advantages. It is less of a “black box” than many alternatives, allowing researchers to obtain more detailed parameter information [76]. Tesoriero et al. demonstrated that RF outperformed linear regression on almost all evaluation metrics when predicting the occurrence of redox-active components in groundwater [102]. Mosavi et al. compared boosted regression trees (BRTs) and RF for the susceptibility prediction of groundwater hardness and found that RF outperformed BRT [103]. Anjum et al. found that RF performed better than ANN in predicting the water quality index (WQI) of groundwater in urban areas [104]. Kumar and Pati employed multiple ML models, including DT, RF, MLP, and naive Bayes, to assess the degree of groundwater arsenic contamination in Jharkhand, India, and RF achieved the highest prediction accuracy among all tested models [105].

In multiple studies, ANNs have been compared with other types of models. Charulatha et al. [106] and Khalil et al. [30] found that ANNs yielded lower prediction errors than linear regression (LR) or multiple linear regression (MLR) when estimating nitrate and electrical conductivity. However, in the study by Nolan et al. [107], ANNs performed the worst when compared with boosted regression trees (BRTs), Bayesian networks, MLR, and random forest regression (RFR). When compared with several Kriging and co-Kriging models, ANNs have shown the best performance in estimating groundwater electrical conductivity.

Supervised classification algorithms can optimize field investigations during the characterization of groundwater-dependent ecosystems, identify the impacts of leachate leakage from livestock burial sites on shallow groundwater quality, and support research into groundwater potential [108]. Martínez-Santos et al. applied supervised classification algorithms trained on ground-truth samples and compared the results with officially published inventories of groundwater-dependent ecosystems, achieving an accuracy of 90% [99]. Such methods can also improve environmental impact assessments related to the improper disposal of organic waste. Oh et al. employed supervised ML techniques to distinguish the impact of leachate leakage from livestock burial sites on shallow groundwater quality, even in areas where the groundwater had previously been contaminated by agriculture and livestock farming [101]. Naghibi and Dashtpagerdi used four supervised learning models to map groundwater potential, producing more accurate groundwater potential maps (GPMs) [109].

However, supervised learning algorithms also have shortcomings. Data-driven supervised ML algorithms typically rely heavily on the characteristics of the data, namely the number of labels and the data distribution. An imbalance between the numbers of majority and minority labels can hinder the generalization ability of ML models, in particular weakening their predictive power for the minority class [110]. In practical applications, ML techniques are therefore often combined with other methods to improve prediction accuracy. Furthermore, as an algorithm derived from DT, the RF model has been reported to be poorly suited to very large datasets and to require input data of relatively high accuracy and completeness [111].
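
One common way to counteract the label imbalance described above is to weight each class inversely to its frequency during training. The sketch below is an illustration under assumptions, not a technique attributed to the cited studies: the helper `balanced_class_weights` and the 90/10 label split are hypothetical, and the weighting rule is the widely used "balanced" heuristic, n_samples / (n_classes × class count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical monitoring dataset: 90 clean wells, 10 contaminated wells
labels = ["clean"] * 90 + ["contaminated"] * 10
weights = balanced_class_weights(labels)
# clean        -> 100 / (2 * 90), about 0.556
# contaminated -> 100 / (2 * 10) = 5.0
```

With these weights, the 10 contaminated samples carry the same total weight in a weighted loss as the 90 clean samples, so errors on the minority class are no longer drowned out. Resampling methods are an alternative with the same goal.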

4.2. Unsupervised Learning Algorithm

Unsupervised learning methods are also frequently applied to the identification of groundwater pollution [112,113]. Unlike supervised learning, unsupervised learning does not require labeled training data; instead, it identifies patterns and regularities within the data itself [114]. Commonly used unsupervised methods include clustering analysis, anomaly detection, and principal component analysis.

Clustering analysis can automatically divide data samples into non-overlapping subsets, where data points within each subset have similar characteristics and are significantly different from data points in other subsets [115]. In groundwater pollution identification, clustering analysis can help to identify clusters of groundwater samples with similar characteristics and pollution levels, thus better understanding the distribution and main features of groundwater pollution. It is therefore widely used in identifying different types of groundwater quality and predicting future groundwater quality. By performing clustering analysis on groundwater samples, the samples can be divided into different clusters, with each cluster representing groundwater with similar quality characteristics [116].

The application of clustering analysis typically involves the following steps. First, groundwater sample data are collected and prepared, including the concentrations of water quality parameters and geographical location information [112]. Second, an appropriate clustering algorithm is selected, such as K-means or DBSCAN. Third, suitable features are chosen so that each groundwater sample is represented as a feature vector. Fourth, algorithm parameters such as the number of clusters or the neighborhood radius are set. The algorithm then groups the samples into clusters based on feature similarity [115]. The relationship between each cluster and groundwater pollution can then be inferred, and corresponding predictions can be made by analyzing the characteristics of each cluster. Finally, the results are validated and evaluated to determine their accuracy and reliability. Clustering-based groundwater prediction can offer good accuracy and interpretability, providing useful reference information for groundwater management and protection.
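
The workflow above can be sketched with a small K-means example. It is a minimal illustration under stated assumptions: the six samples and their nitrate and chloride concentrations are invented, the helper names are hypothetical, and the centroids are initialized deterministically from the first `k` samples purely to keep the example reproducible (practical implementations usually use randomized or K-means++ initialization).

```python
import math


def standardize(rows):
    """Scale each feature to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Plain K-means: assign each point to its nearest centroid, then update."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialization
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical samples: [nitrate (mg/L), chloride (mg/L)]
samples = [[2.0, 10.0], [40.0, 150.0], [3.0, 12.0],
           [45.0, 160.0], [2.5, 11.0], [42.0, 155.0]]

features = standardize(samples)         # feature vectors on a common scale
labels, centroids = kmeans(features, k=2)  # number of clusters set to 2
```

Standardizing first matters because K-means relies on Euclidean distance, so a parameter measured on a larger numeric scale would otherwise dominate the grouping.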

Clustering analysis has certain advantages in groundwater pollution identification. First, it can discover previously unknown groups of groundwater pollution, providing an unsupervised way to identify potential pollution sources and diffusion patterns. Vesselinov et al. proposed a method for identifying pollution sources (NMFk) that combines Non-negative Matrix Factorization (NMF), a Blind Source Separation (BSS) technique, with a custom semi-supervised clustering algorithm to decompose observed mixtures [117]. From the geochemical mixtures observed in groundwater, it can identify the unknown number of groundwater types and the original geochemical concentrations of the contaminant sources, as well as the mixing ratios, without any additional field information. Second, clustering algorithms can help to characterize groundwater pollution and reveal its influencing factors, providing important references for further research and pollution control [118]. Finally, clustering analysis can support the dynamic monitoring and prediction of groundwater pollution, enabling timely pollution alerts and treatment measures and thereby helping to protect the safety and sustainable use of groundwater resources.

Principal component analysis (PCA) is a commonly used unsupervised learning algorithm that has also been widely applied in groundwater pollution identification [119,120,121]. PCA reduces the dimensionality of a dataset by finding the directions (principal components) that explain the maximum variance in the data. In groundwater pollution identification, PCA can condense a large number of water quality indicators into a few components while retaining most of the information in the original data. The resulting component scores can then be used as inputs to train classification models that identify whether groundwater is contaminated. By eliminating redundant information, PCA can improve data interpretability and classification accuracy. Its application therefore supports a better understanding of the characteristics and sources of groundwater pollution and provides a scientific basis for the protection and management of groundwater resources.
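
The dimensionality reduction described above can be illustrated with a minimal sketch that extracts only the first principal component. All specifics are assumptions for illustration: the four indicators and their values are invented, the helper names are hypothetical, and power iteration on the correlation matrix is used here simply to avoid external libraries (a full PCA would normally use an eigendecomposition or SVD routine).

```python
import math


def standardize(rows):
    """Scale each indicator to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def correlation_matrix(Z):
    """Correlation matrix of already standardized data."""
    n, d = len(Z), len(Z[0])
    return [[sum(r[i] * r[j] for r in Z) / n for j in range(d)] for i in range(d)]


def first_principal_component(C, iters=200):
    """Leading eigenvector and eigenvalue of C via power iteration."""
    d = len(C)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


# Hypothetical indicators per sample: [TDS, EC, chloride, nitrate]
samples = [[300.0, 450.0, 20.0, 5.0], [500.0, 760.0, 45.0, 8.0],
           [800.0, 1210.0, 90.0, 6.0], [1200.0, 1800.0, 150.0, 12.0],
           [1500.0, 2300.0, 210.0, 9.0], [2000.0, 3050.0, 300.0, 15.0]]

Z = standardize(samples)
C = correlation_matrix(Z)
pc1, eigenvalue = first_principal_component(C)
explained = eigenvalue / len(C)  # share of total variance captured by PC1
scores = [sum(z * v for z, v in zip(row, pc1)) for row in Z]  # one score per sample
```

In this invented dataset the indicators are strongly correlated, so a single component captures most of the variance and `scores` could replace the four original columns as a classifier input. Real monitoring data usually need several components.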

Researchers often use multivariate statistical techniques such as clustering analysis, principal component analysis, and factor analysis to analyze the physicochemical components of groundwater and thereby determine the main factors affecting groundwater quality [122,123,124]. These techniques have been applied in a wide range of areas, including Quebec, Canada [122], the Jinju coastal area of South Korea [125], Karnataka, India [126], the Shenzhen coastal area of China [127], the South China coastal area [128], and the Loess Plateau [129], demonstrating strong adaptability under various geological and environmental conditions.

Friedel et al. compared supervised and unsupervised learning methods for predicting groundwater redox status in the agriculturally dominated Tasman, Waikato, and Wellington regions of New Zealand [114]. The supervised learning algorithms showed prediction biases under oxic conditions and did not yield good results when applied to independent regional data. In contrast, the unsupervised learning algorithms performed well when using independent regional data to predict oxic, mixed, and anoxic conditions, as well as the corresponding depths. Ratolojanahary et al. combined unsupervised and supervised learning methods to determine four types of water quality and provided insights into abnormal situations [130]. According to the authors, the proposed method can be generalized to water quality data from other sources, which would help decision-makers responsible for water resource management to anticipate quality deterioration and its consequences for human health.

In summary, unsupervised learning methods have significant application value in the identification of groundwater pollution. They can extract latent patterns and regularities from groundwater monitoring data, supporting the identification and localization of groundwater pollution. Unsupervised learning methods are therefore likely to continue to play a significant role in future research on groundwater pollution identification and prediction.

4.3. Comparison of Machine Learning Models with Conventional Methods

Over the past few decades, the identification of groundwater contamination and the prediction of groundwater conditions have mainly relied on statistical analysis and numerical simulation [77,131]. Statistical analysis is based on historical data and assumes that future groundwater level trends will resemble past trends. The groundwater system, however, is subject to a myriad of intricate factors, such as precipitation, groundwater recharge, and hydrogeological conditions. Consequently, this assumption does not always yield precise forecasts of future groundwater level fluctuations. Numerical simulation, on the other hand, predicts groundwater level changes by establishing mathematical models of groundwater flow and then conducting simulation calculations. This method requires a large amount of input data and professional expertise, and it involves multiple iterative calculations, which are time-consuming and make it difficult to ensure the accuracy of the results. In addition, traditional groundwater prediction methods encounter significant challenges in data acquisition [132].

The process of collecting groundwater data necessitates drilling and monitoring wells, which are both time-consuming and labor-intensive, as well as expensive. Moreover, conventional data collection methods often yield only a limited number of data samples, which fail to comprehensively capture the intricate nature of the groundwater system. This limitation constrains the reliability and precision of the prediction models. Furthermore, traditional groundwater prediction approaches tend to oversimplify the groundwater system by using linear or static models, thereby overlooking the dynamic complexities inherent in the system [133,134]. For example, changes in groundwater levels are influenced not only by natural factors such as climate and geological conditions but also by human activities, such as groundwater extraction and land use changes. The impact of these factors is often nonlinear, which traditional methods struggle to address effectively.

To overcome these obstacles, integrating ML algorithms is crucial for developing more robust groundwater prediction models and methodologies. Unlike conventional approaches, ML algorithms do not rely on strict statistical assumptions and can process data obtained from various measurement methods [135]. They have been shown to outperform binary and multiple regression models in numerous studies [30,76]. By analyzing large datasets, these algorithms can identify the complex interrelationships within groundwater systems and adapt to dynamic changes in the data, thereby improving the accuracy of groundwater prediction [57]. For instance, ML algorithms such as SVMs, ANNs, and decision trees have been widely applied in groundwater prediction and have achieved satisfactory predictive performance [12,20,77,78,79].

Overall, traditional groundwater pollution identification methods have certain limitations, while the introduction of ML algorithms offers new perspectives and methods for the identification and prediction of groundwater pollution. The application of ML algorithms allows for more accurate identification and prediction of groundwater pollution, providing strong support for the protection and management of groundwater resources. However, the application of ML algorithms in the field of groundwater pollution identification is still in the exploratory stage and requires further research and practice for refinement and improvement. Future studies can further optimize the models and parameter settings of ML algorithms to enhance the accuracy and practicality of groundwater pollution identification, contributing to the sustainable utilization of groundwater resources.

5. Future Research Direction

ML models have demonstrated remarkable performance in groundwater level and water quality research. To further advance their practical application and better support groundwater resource security, global researchers and institutions need to make joint efforts in the following aspects.

First, strengthen international and institutional cooperation and improve groundwater monitoring systems. Currently, the application of ML in groundwater prediction is still in its initial stage, with limited cooperation between countries and institutions, mainly due to the invisibility of groundwater. Developing countries such as China and India lack sufficient groundwater monitoring systems compared with countries like the United States and Iran, which have comprehensive long-term monitoring data [2]. Therefore, it is urgent to establish more groundwater monitoring points, adopt online monitoring systems for real-time data collection to build extensive databases, and promote broader international cooperation in data sharing, algorithm development, and hydrogeological exploration to form more comprehensive groundwater datasets.

Second, optimize hybrid ML models and balance computational costs with model adaptability. With the advancement of relevant research, the focus of studies in the field of groundwater applications has shifted from comparing various single ML models to integrating multiple ML models or their combinations through weighted averaging techniques. Integrating process-based physical models with ML models can not only leverage the mechanistic insight and transferability of process-based methods but also combine the predictive capability and flexibility of ML [136]. Future research should focus on the integration and combination of models, fully considering computational costs and model adaptability to further enhance the operational efficiency and practical application value of the models.
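
As a simple illustration of combining models through weighted averaging, the sketch below weights each model's predictions by the inverse of its validation error. The inverse-error rule is only one of many possible weighting schemes and is an assumption made here, as are the helper names, the two models, their RMSE values, and the predicted groundwater levels.

```python
def inverse_error_weights(errors):
    """Normalized weights proportional to 1 / validation error."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_average(predictions, weights):
    """Combine per-model prediction series into one ensemble series."""
    n_points = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions))
            for i in range(n_points)]


# Hypothetical validation RMSE (m) of two models, e.g. an RF and an ANN
rmse = [0.5, 1.0]
weights = inverse_error_weights(rmse)  # -> [2/3, 1/3]

# Hypothetical predicted groundwater levels (m) at two time steps
model_a = [12.0, 11.4]
model_b = [12.6, 11.1]
combined = weighted_average([model_a, model_b], weights)  # about [12.2, 11.3]
```

The cost of such an ensemble is essentially the cost of running every member model, which is why the balance between computational cost and adaptability deserves explicit consideration.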

Third, enhance model interpretability and promote cross-disciplinary integration. Although techniques such as partial dependence plots and SHapley Additive exPlanations (SHAP) can improve the interpretability of ML models, the insight they provide remains limited. Future research should explore the application of Explainable AI (XAI) in this field to open the “black box” of ML [16]. At the same time, it is crucial to leverage mature algorithms from other disciplines for groundwater level and quality prediction and to prioritize cross-disciplinary strategies.
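
To make the idea behind a partial dependence plot concrete, the following sketch computes a one-dimensional partial dependence curve: the feature of interest is fixed at each grid value for every sample, and the model predictions are averaged. The function `partial_dependence`, the stand-in `toy_model`, and the two-sample dataset are hypothetical and chosen so that the result can be checked by hand.

```python
def partial_dependence(model, X, feature, grid):
    """Average model prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        predictions = []
        for row in X:
            modified = list(row)
            modified[feature] = value  # override one feature, keep the others
            predictions.append(model(modified))
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_model(x):
    """Stand-in for a trained ML model (purely illustrative)."""
    return 2.0 * x[0] + x[1] ** 2


X = [[0.0, 1.0], [0.0, 3.0]]  # hypothetical samples with two features
curve = partial_dependence(toy_model, X, feature=0, grid=[0.0, 1.0, 2.0])
# For this toy model the curve is [5.0, 7.0, 9.0]: slope 2 from feature 0,
# offset 5 from the average of the squared second feature.
```

The curve shows the average effect of one feature only; it can hide interactions between features, which is one reason such tools offer limited insight on their own.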

Finally, link ML simulation results with decision-making and serve groundwater management. Current research is mostly limited to the ML models themselves; future exploration should focus on using simulation results to support decision-making processes. By collaborating with government departments, ML can be applied to groundwater risk assessment, water level warning, and pollution identification. Building a data simulation and prediction platform using modern methods can achieve more precise monitoring and early warning, thereby providing services for groundwater management and decision-making.

6. Conclusions

This study conducted a bibliometric review using CiteSpace and a global-scale analysis of ML methods applied to groundwater resources and quality based on 1326 records. The findings indicate that research on ML for groundwater prediction has passed through three stages of rapid development in recent years, with an exponential growth trend particularly after 2020. Analysis of keyword clustering maps and timeline charts clarified the dominant research directions in this field, including underground drinking water, pollution, systems, simulation, and optimization. By cataloging the primary applications of ML in groundwater level prediction and water quality analysis, the study compared the strengths and weaknesses of various algorithms and summarized the conditions under which each is applicable. Compared with traditional prediction methods, ML algorithms demonstrate clear advantages in this field, characterized by high accuracy and stability when handling high-dimensional data and nonlinear relationships. Additionally, the study identified shortcomings in current research and outlined future research directions, laying a foundation for subsequent work.

Author Contributions

Q.L.: writing—review and editing, methodology, resources, visualization, funding acquisition; K.L.: data curation, formal analysis, investigation, writing—review and editing, visualization; F.X.: visualization; Z.Y.: methodology; S.D.: methodology; X.H.: data curation; Y.Y.: supervision; Y.J.: funding acquisition, supervision, resources. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Basic Research Operating Expenses of Central-level Public Welfare Research Institutions, Grant No. 2024YSKY-43.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Ghasemian, D.Z. Groundwater Management Using Remotely Sensed Data in High Plains Aquifer; The University of Arizona: Tucson, AZ, USA, 2016.
Jain, M.; Fishman, R.; Mondal, P.; Galford, G.L.; Bhattarai, N.; Naeem, S.; Lall, U.; Balwinder, S.; DeFries, R.S. Groundwater depletion will reduce cropping intensity in India. Sci. Adv. 2021, 7, eabd2849.
Yin, Z.J.; Xu, Y.Y.; Zhu, X.Y.; Zhao, J.W.; Yang, Y.P.; Li, J. Variations of groundwater storage in different basins of China over recent decades. J. Hydrol. 2021, 598, 126282.
Koch, F.; Blum, P.; Korbel, K.; Menberg, K. Global overview on groundwater fauna. Ecohydrology 2024, 17, e2607.
Saccò, M.; Mammola, S.; Altermatt, F.; Alther, R.; Bolpagni, R.; Brancelj, A.; Brankovits, D.; Fišer, C.; Gerovasileiou, V.; Griebler, C.; et al. Groundwater is a hidden global keystone ecosystem. Glob. Change Biol. 2024, 30, e17066.
Noori, R.; Maghrebi, M.; Jessen, S.; Bateni, S.M.; Heggy, E.; Javadi, S.; Noury, M.; Pistre, S.; Abolfathi, S.; Aghakouchak, A. Decline in Iran’s groundwater recharge. Nat. Commun. 2023, 14, 6674.
Derdour, A.; Abdo, H.G.; Almohamad, H.; Alodah, A.; Al Dughairi, A.A.; Ghoneim, S.S.M.; Ali, E. Prediction of groundwater quality index using classification techniques in arid environments. Sustainability 2023, 15, 9687.
Qu, X.Y.; Shi, L.Q.; Han, J. Spatial evaluation of groundwater quality based on toxicological indexes and their effects on ecology and human health. J. Clean. Prod. 2022, 377, 134255.
Su, F.; Wu, J.; He, S. Set pair analysis-Markov chain model for groundwater quality assessment and prediction: A case study of Xi’an city, China. Hum. Ecol. Risk Assess. 2019, 25, 158–175.
Cai, H.J.; Shi, H.Y.; Liu, S.N.; Babovic, V. Impacts of regional characteristics on improving the accuracy of groundwater level prediction using machine learning: The case of central eastern continental United States. J. Hydrol. Reg. Stud. 2021, 37, 100930.
Liu, X.; Lu, D.W.; Zhang, A.Q.; Liu, Q.; Jiang, G.B. Data-Driven Machine learning in environmental pollution: Gains and problems. Environ. Sci. Technol. 2022, 56, 2124–2133.
Subbarayan, S.; Thiyagarajan, S.; Karuppannan, S.; Panneerselvam, B. Enhancing groundwater vulnerability assessment: Comparative study of three machine learning models and five classification schemes for Cuddalore district. Environ. Res. 2024, 242, 117769.
Chen, C.; He, W.; Zhou, H.; Xue, Y.R.; Zhu, M.D. A comparative study among machine learning and numerical models for simulating groundwater dynamics in the Heihe River Basin, northwestern China. Sci. Rep. 2020, 10, 3904.
Kumar, M.; Singh, P.; Singh, P. Machine learning and GIS-RS-based algorithms for mapping the groundwater potentiality in the Bundelkhand region, India. Ecol. Inform. 2023, 74, 101980.
Teimoori, S.; Olya, M.H.; Miller, C.J. Groundwater level monitoring network design with machine learning methods. J. Hydrol. 2023, 625, 130145.
Boo, K.B.W.; El-Shafie, A.; Othman, F.; Khan, M.M.H.; Birima, A.H.; Ahmed, A.N. Groundwater level forecasting with machine learning models: A review. Water Res. 2024, 252, 121249.
Zhang, X.; Dong, F.; Chen, G.; Dai, Z. Advance prediction of coastal groundwater levels with temporal convolutional and long short-term memory networks. Hydrol. Earth Syst. Sci. 2023, 27, 83–96.
Singh, A.; Patel, S.; Bhadani, V.; Kumar, V.; Gaurav, K. AutoML-GWL: Automated machine learning model for the prediction of groundwater level. Eng. Appl. Artif. Intell. 2024, 127, 107405.
Yadav, B.; Gupta, P.K.; Patidar, N.; Himanshu, S.K. Ensemble modelling framework for groundwater level prediction in urban areas of India. Sci. Total Environ. 2020, 712, 135539.
Liu, Q.; Gui, D.W.; Zhang, L.; Niu, J.; Dai, H.; Wei, G.H.; Hu, B.X. Simulation of regional groundwater levels in arid regions using interpretable machine learning models. Sci. Total Environ. 2022, 831, 154902.
Pandya, H.; Jaiswal, K.; Shah, M.A. A Comprehensive Review of Machine learning algorithms and its application in groundwater quality prediction. Arch. Comput. Methods Eng. 2024, 31, 4633–4654.
Haggerty, R.; Sun, J.X.; Yu, H.F.; Li, Y.S. Application of machine learning in groundwater quality modeling-A comprehensive review. Water Res. 2023, 233, 119745.
Ding, Y.Z.; Sun, Q.Y.; Lin, Y.Q.; Ping, Q.; Peng, N.; Wang, L.; Li, Y.M. Application of artificial intelligence in (waste)water disinfection: Emphasizing the regulation of disinfection by-products formation and residues prediction. Water Res. 2024, 253, 121267.
Chen, C.M. Science mapping: A systematic review of the literature. J. Data Inf. Sci. 2017, 2, 1–40.
Medina, M.A.; Jacobs, T.L.; Lin, W.C.; Lin, K.C. Ground water solute transport, optimal remediation planning, and decision making under uncertainty. Water Resour. Bull. 1996, 32, 1–12.
Tahmasebi, P.; Kamrava, S.; Bai, T.; Sahimi, M. Machine learning in geo- and environmental sciences: From small to large scale. Adv. Water Resour. 2020, 142, 103619.
Podgorski, J.; Berg, M. Global threat of arsenic in groundwater. Science 2020, 368, 845.
Chen, C.M. CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. J. Am. Soc. Inf. Sci. Technol. 2006, 57, 359–377.
Liu, Z.G.; Yin, Y.M.; Liu, W.D.; Dunford, M. Visualizing the intellectual structure and evolution of innovation systems research: A bibliometric analysis. Scientometrics 2015, 103, 135–158.
Khalil, A.; Almasri, M.N.; McKee, M.; Kaluarachchi, J.J. Applicability of statistical learning algorithms in groundwater quality modeling. Water Resour. Res. 2005, 41, W05010.
Khalil, A.F.; Kaheil, Y.H.; Gill, K.M.; McKee, M. Application of Learning Machines and Combinatorial Algorithms in Water Resources Management and Hydrologic Sciences; Utah State University: Logan, UT, USA, 2010; pp. 61–106.
Wunsch, A.; Liesch, T.; Broda, S. Deep learning shows declining groundwater levels in Germany until 2100 due to climate change. Nat. Commun. 2022, 13, 1221.
Wu, C.C.; Zhang, X.Q.; Wang, W.J.; Lu, C.P.; Zhang, Y.; Qin, W.; Tick, G.R.; Liu, B.; Shu, L.C. Groundwater level modeling framework by combining the wavelet transform with a long short-term memory data-driven model. Sci. Total Environ. 2021, 783, 146948.
Naghibi, S.A.; Pourghasemi, H.R.; Dixon, B. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environ. Monit. Assess. 2016, 188, 44.
Vafadar, S.; Rahimzadegan, M.; Asadi, R. Evaluating the performance of machine learning methods and Geographic Information System (GIS) in identifying groundwater potential zones in Tehran-Karaj plain, Iran. J. Hydrol. 2023, 624, 129952.
Chen, W.; Li, H.; Hou, E.K.; Wang, S.Q.; Wang, G.R.; Panahi, M.; Li, T.; Peng, T.; Guo, C.; Niu, C.; et al. GIS-based groundwater potential analysis using novel ensemble weights-of-evidence with logistic regression and functional tree models. Sci. Total Environ. 2018, 634, 853–867.
Erickson, M.L.; Elliott, S.M.; Brown, C.J.; Stackelberg, P.E.; Ransom, K.M.; Reddy, J.E.; Cravotta, C.A. Machine-learning predictions of high arsenic and high manganese at drinking water depths of the glacial aquifer system, northern continental United States. Environ. Sci. Technol. 2021, 55, 5791–5805.
Ransom, K.M.; Nolan, B.T.; Stackelberg, P.E.; Belitz, K.; Fram, M.S. Machine learning predictions of nitrate in groundwater used for drinking supply in the conterminous United States. Sci. Total Environ. 2022, 807, 151065.
Podgorski, J.E.; Labhasetwar, P.; Saha, D.; Berg, M. Prediction Modeling and Mapping of Groundwater Fluoride Contamination throughout India. Environ. Sci. Technol. 2018, 52, 9889–9898.
Xiong, J.H.; Abhishek; Guo, S.L.; Kinouchi, T. Leveraging machine learning methods to quantify 50 years of dwindling groundwater in India. Sci. Total Environ. 2022, 835, 155474.
Hasan, M.F.; Smith, R.; Vajedian, S.; Pommerenke, R.; Majumdar, S. Global land subsidence mapping reveals widespread loss of aquifer storage capacity. Nat. Commun. 2023, 14, 6180.
Chakraborty, M.; Sarkar, S.; Mukherjee, A.; Shamsudduha, M.; Ahmed, K.M.; Bhattacharya, A.; Mitra, A. Modeling regional-scale groundwater arsenic hazard in the transboundary Ganges River Delta, India and Bangladesh: Infusing physically-based model with machine learning. Sci. Total Environ. 2020, 748, 141107.
Sajedi-Hosseini, F.; Malekian, A.; Choubin, B.; Rahmati, O.; Cipullo, S.; Coulon, F.; Pradhan, B. A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Sci. Total Environ. 2018, 644, 954–962.
Rogelj, J.; den Elzen, M.; Höhne, N.; Fransen, T.; Fekete, H.; Winkler, H.; Schaeffer, R.; Sha, F.; Riahi, K.; Meinshausen, M. Paris Agreement climate proposals need a boost to keep warming well below 2 °C. Nature 2016, 534, 631–639.
Barzegar, R.; Fijani, E.; Moghaddam, A.A.; Tziritis, E. Forecasting of groundwater level fluctuations using ensemble hybrid multi-wavelet neural network-based models. Sci. Total Environ. 2017, 599, 20–31.
Rezaie-Balf, M.; Naganna, S.R.; Ghaemi, A.; Deka, P.C. Wavelet coupled MARS and M5 Model Tree approaches for groundwater level forecasting. J. Hydrol. 2017, 553, 356–373.
Sahoo, S.; Russo, T.A.; Elliott, J.; Foster, I. Machine learning algorithms for modeling groundwater level changes in agricultural regions of the US. Water Resour. Res. 2017, 53, 3878–3895.
Erickson, M.L.; Elliott, S.M.; Christenson, C.A.; Krall, A.L. Predicting geogenic Arsenic in drinking water wells in Glacial Aquifers, North-Central USA: Accounting for Depth-Dependent Features. Water Resour. Res. 2018, 54, 10172–10187.
Kouadri, S.; Elbeltagi, A.; Islam, A.M.T.; Kateb, S. Performance of machine learning methods in predicting water quality index based on irregular data set: Application on Illizi region (Algerian southeast). Appl. Water Sci. 2021, 11, 190.
Arias-Estévez, M.; López-Periago, E.; Martínez-Carballo, E.; Simal-Gándara, J.; Mejuto, J.C.; García-Río, L. The mobility and degradation of pesticides in soils and the pollution of groundwater resources. Agric. Ecosyst. Environ. 2008, 123, 247–260.
Olivares, E.A.O.; Torres, S.S.; Jiménez, S.I.B.; Enríquez, J.O.C.; Zignol, F.; Reygadas, Y.; Tiefenbacher, J.P. Climate change, land use/land cover change, and population growth as drivers of groundwater depletion in the Central Valleys, Oaxaca, Mexico. Remote Sens. 2019, 11, 1290.
Yin, J.N.; Medellín-Azuara, J.; Escriva-Bou, A.; Liu, Z. Bayesian machine learning ensemble approach to quantify model uncertainty in predicting groundwater storage change. Sci. Total Environ. 2021, 769, 144715.
Lapworth, D.J.; Baran, N.; Stuart, M.E.; Ward, R.S. Emerging organic contaminants in groundwater: A review of sources, fate and occurrence. Environ. Pollut. 2012, 163, 287–303.
Di Salvo, C. Improving results of existing groundwater numerical models using machine learning techniques: A review. Water 2022, 14, 2307.
Chen, B.B.; Gong, H.L.; Chen, Y.; Li, X.J.; Zhou, C.F.; Lei, K.C.; Zhu, L.; Duan, L.; Zhao, X.X. Land subsidence and its relation with groundwater aquifers in Beijing Plain of China. Sci. Total Environ. 2020, 735, 139111.
Afrifa, S.; Zhang, T.; Appiahene, P.; Varadarajan, V. Mathematical and machine learning models for groundwater level changes: A systematic review and bibliographic analysis. Future Internet 2022, 14, 259.
Hussein, E.A.; Thron, C.; Ghaziasgar, M.; Bagula, A.; Vaccari, M. Groundwater prediction using machine-learning tools. Algorithms 2020, 13, 300.
Nguyen, P.T.; Ha, D.H.; Avand, M.; Jaafari, A.; Nguyen, H.D.; Al-Ansari, N.; Phong, T.V.; Sharma, R.; Kumar, R.; Le, H.V.; et al. Soft computing ensemble models based on logistic regression for groundwater potential mapping. Appl. Sci. 2020, 10, 2469.
Rahmati, O.; Falah, F.; Naghibi, S.A.; Biggs, T.; Soltani, M.; Deo, R.C.; Cerdà, A.; Mohammadi, F.; Bui, D.T. Land subsidence modelling using tree-based machine learning algorithms. Sci. Total Environ. 2019, 672, 239–252.
Sachdeva, S.; Kumar, B. A comparative study between frequency ratio model and gradient boosted decision trees with greedy dimensionality reduction in groundwater potential assessment. Water Resour. Manag. 2020, 34, 4593–4615.
Ghasemi, A.; Bahmani, O.; Akhavan, S.; Pourghasemi, H.R. Investigation of land-subsidence phenomenon and aquifer vulnerability using machine models and GIS technique. Nat. Hazards 2023, 118, 1645–1671.
Jiang, Z.W.; Yang, S.H.; Liu, Z.Y.; Xu, Y.; Shen, T.; Qi, S.T.; Pang, Q.Q.; Xu, J.Z.; Liu, F.P.; Xu, T. Can ensemble machine learning be used to predict the groundwater level dynamics of farmland under future climate: A 10-year study on Huaibei Plain. Environ. Sci. Pollut. Res. 2022, 29, 44653–44667.
Seifi, A.; Ehteram, M.; Singh, V.P.; Mosavi, A. Modeling and uncertainty analysis of groundwater level using six evolutionary optimization algorithms hybridized with ANFIS, SVM, and ANN. Sustainability 2020, 12, 4023.
Samani, S.; Vadiati, M.; Azizi, F.; Zamani, E.; Kisi, O. Groundwater level simulation using soft computing methods with emphasis on major meteorological components. Water Resour. Manag. 2022, 36, 3627–3647.
Xu, T.; Valocchi, A.J.; Choi, J.; Amir, E. Use of machine learning methods to reduce predictive error of groundwater models. Groundwater 2014, 52, 448–460.
Xue, D.M.; Pang, F.M.; Meng, F.Q.; Wang, Z.L.; Wu, W.L. Decision-tree-model identification of nitrate pollution activities in groundwater: A combination of a dual isotope approach and chemical ions. J. Contam. Hydrol. 2015, 180, 25–33.
Saghebian, S.M.; Sattari, M.T.; Mirabbasi, R.; Pal, M. Ground water quality classification by decision tree method in Ardebil region, Iran. Arab. J. Geosci. 2014, 7, 4767–4777.
Lee, S.; Lee, C.W. Application of decision-tree model to groundwater productivity-potential mapping. Sustainability 2015, 7, 13416–13432.
Duan, H.J.; Deng, Z.D.; Deng, F.F.; Wang, D.Q. Assessment of groundwater potential based on multicriteria decision making model and decision tree algorithms. Math. Probl. Eng. 2016, 2016, 2064575.
Chen, W.; Wang, Z.; Wang, G.R.; Ning, Z.X.; Lian, B.X.; Li, S.J.; Tsangaratos, P.; Ilia, I.; Xue, W.F. Optimizing rotation forest-based decision tree algorithms for groundwater potential mapping. Water 2023, 15, 2287.
Sachdeva, S.; Kumar, B. Comparison of gradient boosted decision trees and random forest for groundwater potential mapping in Dholpur (Rajasthan), India. Stoch. Environ. Res. Risk Assess. 2021, 35, 287–306.
Pal, S.; Kundu, S.; Mahato, S. Groundwater potential zones for sustainable management plans in a river basin of India and Bangladesh. J. Clean. Prod. 2020, 257, 120311.
Pham, Q.B.; Kumar, M.; Di Nunno, F.; Elbeltagi, A.; Granata, F.; Islam, A.M.T.; Talukdar, S.; Nguyen, X.C.; Ahmed, A.N.; Anh, D.T. Groundwater level prediction using machine learning algorithms in a drought-prone area. Neural Comput. Appl. 2022, 34, 10751–10773.
Rodriguez-Galiano, V.; Mendes, M.P.; Garcia-Soldado, M.J.; Chica-Olmo, M.; Ribeiro, L. Predictive modeling of groundwater nitrate pollution using Random Forest and multisource variables related to intrinsic and specific vulnerability: A case study in an agricultural setting (Southern Spain). Sci. Total Environ. 2014, 476, 189–206.
Wang, B.; Hipsey, M.R.; Ahmed, S.; Oldham, C. The impact of landscape characteristics on groundwater dissolved organic nitrogen: Insights from machine learning methods and sensitivity analysis. Water Resour. Res. 2018, 54, 4785–4804.
Judeh, T.; Almasri, M.N.; Shadeed, S.M.; Bian, H.B.; Shahrour, I. Use of GIS, Statistics and machine learning for groundwater quality management: Application to nitrate contamination. Water Resour. 2022, 49, 503–514.
Chen, Y.Y.; Song, L.H.; Liu, Y.Q.; Yang, L.; Li, D.L. A review of the artificial neural network models for water quality prediction. Appl. Sci. 2020, 10, 5776.
Darwishe, H.; El Khattabi, J.; Chaaban, F.; Louche, B.; Masson, E.; Carlier, E. Prediction and control of nitrate concentrations in groundwater by implementing a model based on GIS and artificial neural networks (ANN). Environ. Earth Sci. 2017, 76, 649.
Gemitzi, A.; Petalas, C.; Pisinaras, V.; Tsihrintzis, V.A. Spatial prediction of nitrate pollution in groundwaters using neural networks and GIS: An application to South Rhodope aquifer (Thrace, Greece). Hydrol. Process. 2009, 23, 372–383.
Heidarzadeh, N. A practical low-cost model for prediction of the groundwater quality using artificial neural networks. J. Water Supply Res. Technol.-Aqua 2017, 66, 86–95.
Nguyen, P.T.; Ha, D.H.; Jaafari, A.; Nguyen, H.D.; Phong, T.V.; Al-Ansari, N.; Prakash, I.; Le, H.V.; Pham, B.T. Groundwater potential mapping combining artificial neural network and real AdaBoost ensemble technique: The DakNong Province Case-study, Vietnam. Int. J. Environ. Res. Public Health 2020, 17, 2473.
El Bilali, A.; Taleb, A.; Brouziyne, Y. Groundwater quality forecasting using machine learning algorithms for irrigation purposes. Agric. Water Manag. 2021, 245, 106625.
Nolan, B.T.; Hitt, K.J.; Ruddy, B.C. Probability of nitrate contamination of recently recharged groundwaters in the conterminous United States. Environ. Sci. Technol. 2002, 36, 2138–2145.
Scott, M.L.; Shafroth, P.B.; Auble, G.T. Responses of riparian cottonwoods to alluvial water table declines. Environ. Manag. 1999, 23, 347–358.
Squillace, P.J.; Moran, M.J.; Lapham, W.W.; Price, C.V.; Clawges, R.M.; Zogorski, J.S. Volatile organic compounds in untreated ambient groundwater of the United States, 1985–1995. Environ. Sci. Technol. 1999, 33, 4176–4187.
Ozdemir, A. Using a binary logistic regression method and GIS for evaluating and mapping the groundwater spring potential in the Sultan Mountains (Aksehir, Turkey). J. Hydrol. 2011, 405, 123–136.
Lado, L.R.; Polya, D.A.; Hegan, A. A logistic regression method for mapping the As hazard risk in shallow, reducing groundwaters in Cambodia. Mineral. Mag. 2008, 72, 437–440.
Katoch, S.; Chauhan, S.S.; Kumar, V. A review on genetic algorithm: Past, present, and future. Multimed. Tools Appl. 2021, 80, 8091–8126.
McKinney, D.C.; Lin, M.D. Genetic algorithm solution of groundwater-management models. Water Resour. Res. 1994, 30, 1897–1906.
Yang, X.-S. Nature-inspired optimization algorithms: Challenges and open problems. J. Comput. Sci. 2020, 46, 101104.
Mahmod, W.E.; Mohamed, H.I.; Suleiman, A.H. Integrated approach for optimizing groundwater monitoring systems using evolutionary algorithms. Hydrol. Sci. J. 2021, 66, 1963–1978.
Banadkooki, F.B.; Ehteram, M.; Panahi, F.; Sammen, S.S.; Othman, F.B.; El-Shafie, A. Estimation of total dissolved solids (TDS) using new hybrid machine learning models. J. Hydrol. 2020, 587, 124989.
Kisi, O.; Azad, A.; Kashi, H.; Saeedian, A.; Hashemi, S.A.A.; Ghorbani, S. Modeling groundwater quality parameters using hybrid neuro-fuzzy methods. Water Resour. Manag. 2019, 33, 847–861.
Ritzel, B.J.; Eheart, J.W.; Ranjithan, S. Using genetic algorithms to solve a multiple-objective groundwater pollution containment-problem. Water Resour. Res. 1994, 30, 1589–1603.
Aryafar, A.; Khosravi, V.; Zarepourfard, H.; Rooki, R. Evolving genetic programming and other AI-based models for estimating groundwater quality parameters of the Khezri plain, Eastern Iran. Environ. Earth Sci. 2019, 78, 69.
Hosseini, S.M.; Mahjouri, N. Developing a fuzzy neural network-based support vector regression (FNN-SVR) for regionalizing nitrate concentration in groundwater. Environ. Monit. Assess. 2014, 186, 3685–3699.
Ransom, K.M.; Nolan, B.T.; Traum, J.A.; Faunt, C.C.; Bell, A.M.; Gronberg, J.A.M.; Wheeler, D.C.; Rosecrans, C.Z.; Jurgens, B.; Schwarz, G.E.; et al. A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA. Sci. Total Environ. 2017, 601, 1160–1172.
Jalalkamali, A.; Jalalkamali, N. Groundwater modeling using hybrid of artificial neural network with genetic algorithm. Afr. J. Agric. Res. 2011, 6, 5775–5784.
Martínez-Santos, P.; Díaz-Alcaide, S.; De la Hera-Portillo, A.; Gómez-Escalonilla, V. Mapping groundwater-dependent ecosystems by means of multi-layer supervised classification. J. Hydrol. 2021, 603, 126873.
Aish, A.M.; Zaqoot, H.A.; Sethar, W.A.; Aish, D.A. Prediction of groundwater quality index in the Gaza coastal aquifer using supervised machine learning techniques. Water Pract. Technol. 2023, 18, 501–521.
Oh, J.; Kim, H.R.; Yu, S.; Kim, K.H.; Lee, J.H.; Park, S.; Kim, H.; Yun, S.T. A supervised machine learning approach to discriminate the effect of carcass leachate on shallow groundwater quality around on-farm livestock mortality burial sites. J. Hazard. Mater. 2023, 457, 131712.
Tesoriero, A.J.; Gronberg, J.A.; Juckem, P.F.; Miller, M.P.; Austin, B.P. Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification. Water Resour. Res. 2017, 53, 7316–7331.
Mosavi, A.; Hosseini, F.S.; Choubin, B.; Abdolshahnejad, M.; Gharechaee, H.; Lahijanzadeh, A.; Dineva, A.A. Susceptibility prediction of groundwater hardness using ensemble machine learning models. Water 2020, 12, 2770.
Anjum, R.; Ali, S.A.; Siddiqui, M.A. Assessing the Impact of Land Cover on Groundwater Quality in a Smart City Using GIS and Machine Learning Algorithms. Water Air Soil Pollut. 2023, 234, 182.
Kumar, S.; Pati, J. Assessment of groundwater arsenic contamination level in Jharkhand, India using machine learning. J. Comput. Sci. 2022, 63, 101779.
Charulatha, G.; Srinivasalu, S.; Maheswari, O.U.; Venugopal, T.; Giridharan, L. Evaluation of ground water quality contaminants using linear regression and artificial neural network models. Arab. J. Geosci. 2017, 10, 128.
Nolan, B.T.; Fienen, M.N.; Lorenz, D.L. A statistical learning framework for groundwater nitrate models of the Central Valley, California, USA. J. Hydrol. 2015, 531, 902–911.
Víctor, G.E.; Marie-Louise, V.; Elisa, D.; Moussa, I.; Giaime, O.; Daira, D.; Pedro, M.S.; Francesco, H. Delineation of groundwater potential zones by means of ensemble tree supervised classification methods in the Eastern Lake Chad basin. Geocarto Int. 2022, 37, 8924–8951.
Naghibi, S.A.; Dashtpagerdi, M.M. Evaluation of four supervised learning methods for groundwater spring potential mapping in Khalkhal region (Iran) using GIS-based features. Hydrogeol. J. 2017, 25, 169–189.
Gao, S.L.; Xu, M.H.; Zhao, L.X.; Chen, Y.Y.; Geng, J.H. Seismic predictions of fluids via supervised deep learning: Incorporating various class-rebalance strategies. Geophysics 2023, 88, M185–M200.
Knoll, L.; Breuer, L.; Bach, M. Nation-wide estimation of groundwater redox conditions and nitrate concentrations through machine learning. Environ. Res. Lett. 2020, 15, 064004.
Barlow, H.B. Unsupervised Learning. Neural Comput. 1989, 1, 295–311.
Narvaez-Montoya, C.; Mahlknecht, J.; Torres-Martínez, J.A.; Mora, A.; Bertrand, G. Seawater intrusion pattern recognition supported by unsupervised learning: A systematic review and application. Sci. Total Environ. 2023, 864, 160933.
Friedel, M.J.; Wilson, S.R.; Close, M.E.; Buscema, M.; Abraham, P.; Banasiak, L. Comparison of four learning-based methods for predicting groundwater redox status. J. Hydrol. 2020, 580, 124200.
Wu, T.N.; Su, C.S. Application of principal component analysis and clustering to spatial allocation of groundwater contamination. In Proceedings of the Fifth International Conference on Fuzzy Systems and Knowledge Discovery, Jinan, China, 18–20 October 2008; IEEE: New York, NY, USA, 2008; Volume 4, pp. 236–240.
Yang, J.; Ye, M.; Tang, Z.H.; Jiao, T.; Song, X.Y.; Pei, Y.Z.; Liu, H.H. Using cluster analysis for understanding spatial and temporal patterns and controlling factors of groundwater geochemistry in a regional aquifer. J. Hydrol. 2020, 583, 124594.
Vesselinov, V.V.; Alexandrov, B.S.; O’Malley, D. Contaminant source identification using semi-supervised machine learning. J. Contam. Hydrol. 2018, 212, 134–142.
Devic, G.; Djordjevic, D.; Sakan, S. Natural and anthropogenic factors affecting the groundwater quality in Serbia. Sci. Total Environ. 2014, 468, 933–942.
Helena, B.; Pardo, R.; Vega, M.; Barrado, E.; Fernandez, J.M.; Fernandez, L. Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis. Water Res. 2000, 34, 807–816.
Nakagawa, K.; Amano, H.; Berndtsson, R. Spatial Characteristics of groundwater chemistry in Unzen, Nagasaki, Japan. Water 2021, 13, 426.
Nanni, A.; Roisenberg, A.; Fachel, J.M.G.; Mesquita, G.; Danieli, C. Fluoride characterization by principal component analysis in the hydrochemical facies of Serra Geral Aquifer System in Southern Brazil. An. Acad. Bras. Cienc. 2008, 80, 693–701.
Cloutier, V.; Lefebvre, R.; Therrien, R.; Savard, M.M. Multivariate statistical analysis of geochemical data as indicative of the hydrogeochemical evolution of groundwater in a sedimentary rock aquifer system. J. Hydrol. 2008, 353, 294–313.
Sahu, P.; Sikdar, P.K. Hydrochemical framework of the aquifer in and around East Kolkata Wetlands, West Bengal, India. Environ. Geol. 2008, 55, 823–835.
Zhang, B.; Song, X.F.; Zhang, Y.H.; Han, D.M.; Tang, C.Y.; Yu, Y.L.; Ma, Y. Hydrochemical characteristics and water quality assessment of surface water and groundwater in Songnen plain, Northeast China. Water Res. 2012, 46, 2737–2748.
Kim, J.H.; Kim, R.H.; Lee, J.; Cheong, T.J.; Yum, B.W.; Chang, H.W. Multivariate statistical analysis to identify the major factors governing groundwater quality in the coastal area of Kimje, South Korea. Hydrol. Process. 2005, 19, 1261–1276.
Reghunath, R.; Murthy, T.R.S.; Raghavan, B.R. The utility of multivariate statistical techniques in hydrogeochemical studies: An example from Karnataka, India. Water Res. 2002, 36, 2437–2442.
Chen, K.P.; Jiao, J.J.; Huang, J.M.; Huang, R.Q. Multivariate statistical evaluation of trace elements in groundwater in a coastal area in Shenzhen, China. Environ. Pollut. 2007, 147, 771–780.
Huang, G.X.; Sun, J.C.; Zhang, Y.; Chen, Z.Y.; Liu, F. Impact of anthropogenic and natural processes on the evolution of groundwater chemistry in a rapidly urbanized coastal area, South China. Sci. Total Environ. 2013, 463, 209–221.
Wu, J.H.; Li, P.Y.; Wang, D.; Ren, X.F.; Wei, M.J. Statistical and multivariate statistical techniques to trace the sources and affecting factors of groundwater pollution in a rapidly growing city on the Chinese Loess Plateau. Hum. Ecol. Risk Assess. 2020, 26, 1603–1621.
Ratolojanahary, R.; Ngouna, R.H.; Medjaher, K.; Dauriac, F.; Sebilo, M. Groundwater quality assessment combining supervised and unsupervised methods. In Proceedings of the IFAC PapersOnline; IFAC: New York, NY, USA, 2019; pp. 340–345.
Sahour, H.; Gholami, V.; Vazifedan, M. A comparative analysis of statistical and machine learning techniques for mapping the spatial distribution of groundwater salinity in a coastal aquifer. J. Hydrol. 2020, 591, 125321.
Feng, W.; Shum, C.K.; Zhong, M.; Pan, Y. Groundwater storage changes in China from satellite gravity: An overview. Remote Sens. 2018, 10, 674.
Khan, J.; Lee, E.K.Y.; Balobaid, A.S.; Kim, K. A comprehensive review of conventional, machine learning, and deep learning models for groundwater level (GWL) forecasting. Appl. Sci. 2023, 13, 2743.
Condon, L.E.; Kollet, S.; Bierkens, M.F.P.; Fogg, G.E.; Maxwell, R.M.; Hill, M.C.; Fransen, H.J.H.; Verhoef, A.; Van Loon, A.F.; Sulis, M.; et al. Global groundwater modeling and monitoring: Opportunities and challenges. Water Resour. Res. 2021, 57, e2020WR029500.
Tao, H.; Hameed, M.M.; Marhoon, H.A.; Zounemat-Kermani, M.; Heddam, S.; Kim, S.; Sulaiman, S.O.; Tan, M.L.; Sa’adi, Z.; Mehr, A.D.; et al. Groundwater level prediction using machine learning models: A comprehensive review. Neurocomputing 2022, 489, 271–308.
Elsayed, A.; Levison, J.; Binns, A.; Larocque, M.; Goel, P. A review of machine learning applications in the prediction of selected groundwater quality parameters: Key lessons, knowledge gaps, and future directions. Sci. Total Environ. 2026, 1027, 181693.

Figure 1. The exclusion and selection process of publications.

Figure 2. Research evolution of ML applications in groundwater from 1996 to 2023. (a) Annual publication number; (b) timeline view of keyword clusters; (c) keyword clustering graph; (d) co-occurrence of keywords. (Search options: “Machine learning” (Topic) and “Groundwater” (Topic)). Circle size represents keyword frequency, while different colors indicate clusters based on co-occurrence relationships. Lines represent the co-occurrence relationships between nodes, and their thickness and color indicate the co-occurrence intensity.

Figure 3. Research leaders of ML applications in groundwater from 1996 to 2023. (a) Sankey diagram of publication countries and years; (b) country cooperation network; (c) institute cooperation network; (d) author cooperation network; (e) geographical distribution of the reviewed articles.

Figure 4. Application of ML in groundwater prediction and pollution identification process.

Table 1. Search keywords used for article collection from WOS.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Target Expression	Search Keywords
Groundwater	“groundwater” or “ground water”
Machine learning (ML)	“machine learning” or “machine learning models” or “artificial intelligence”