1. Introduction
Despite the abundance of water on Earth, only 3% of it is classified as fresh water [1]. The limited availability of freshwater globally, with some regions depending on it as their sole resource, underscores its critical importance. This is further underlined by an annual increase in demand of 1% since 2000 [2]. Water quality plays an important role in determining the state of the ecological environment and public health safety [3]. People’s social and economic well-being is closely tied to access to clean and safe freshwater resources [4]. Freshwater resources are most abundant in glaciers, lakes, rivers, swamps, soil moisture, and groundwater. Groundwater, protected by the lithosphere, is considered a natural freshwater source and hosts an ecosystem invisible to the naked eye [5,6]. Groundwater is an important sustainable water source that is replenished by rainfall and snowfall, which are driven by solar energy acting on the Earth, or by infiltration from flowing surface water [7]. Groundwater also plays an important role in the hydrological cycle because of its interrelation with other water resources [8]. Groundwater is a resource in danger of depletion, which makes the investigation of its quality critical. For this reason, research on the depletion and quality of groundwater is important for examining these resources in detail. However, reviews of the groundwater literature have largely been limited to conventional approaches, and alternative approaches, such as bibliometric analysis, have not become widespread [9].
Machine learning (ML) techniques are computational techniques that use algorithms trained and tested on available data to create predictive models [10]. One of the purposes of ML is to identify non-linear relationships between variables. ML, which can be defined as a subset of artificial intelligence (AI), can make practical and predictive decisions on target variables by using input variables [11]. At the same time, ML models are effective techniques that provide promising outcomes in minimizing source quantization errors, which are considered difficult to prevent, and they are frequently used in many research fields [12]. The relevant literature reveals that ML techniques are used to examine various groundwater-related issues, including groundwater potential mapping, groundwater contamination mapping, and quantity and quality assessments. Groundwater quality indices, such as the Water Quality Index, have also recently been integrated with ML techniques [13]. These modeling schemes have shown rewarding outcomes in hydrological studies and complement traditional methods by achieving notable accuracy in predictive tasks. The use of conventional approaches in assessing complex groundwater problems poses challenges and is time-consuming for researchers and practitioners [14]. In contrast to GIS-based techniques, ML techniques in groundwater quality studies can incorporate enhanced validation frameworks, such as cross-validation and bootstrap tests, to increase the reliability of the results derived from the available data. Moreover, their effectiveness in determining the contribution of variables in water quantity and quality assessment makes ML a suitable option in the groundwater research domain, where conventional studies can be costly in terms of budget and time, especially given the large data requirements [10].
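To make the cross-validation and bootstrap ideas concrete, the sketch below evaluates a deliberately simple model: a one-variable least-squares line fitted to synthetic data. It estimates out-of-sample error with k-fold cross-validation and the variability of the fitted slope with bootstrap resampling. The model, the data, and all helper names (`fit_line`, `k_fold_rmse`, `bootstrap_slopes`) are hypothetical stand-ins for a real groundwater predictor and dataset; libraries such as Scikit-learn provide production versions of these utilities.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (hypothetical stand-in model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(model, xs, ys):
    """Root-mean-square error of the fitted line on the given points."""
    a, b = model
    return (sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Return one RMSE per fold; every sample is used for testing exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return scores


def bootstrap_slopes(xs, ys, n_boot=200, seed=0):
    """Refit the model on samples drawn with replacement to gauge slope variability."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in sample]
        if len(set(bx)) < 2:  # a slope cannot be fitted to a single distinct x value
            continue
        _, b = fit_line(bx, [ys[i] for i in sample])
        slopes.append(b)
    return slopes


rng = random.Random(42)
xs = [i / 10 for i in range(50)]                      # hypothetical predictor
ys = [2.0 + 0.5 * x + rng.gauss(0, 0.1) for x in xs]  # hypothetical noisy target
fold_scores = k_fold_rmse(xs, ys, k=5)
slopes = bootstrap_slopes(xs, ys)
print(round(statistics.fmean(fold_scores), 3), round(statistics.stdev(slopes), 4))
```

Because each fold's test points are never used to fit that fold's model, the averaged RMSE approximates the error on unseen data, while the spread of the bootstrap slopes indicates how sensitive the fitted relationship is to the particular sample.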
In the field of groundwater, supervised algorithms are the most widely used ML methods. Well-known and powerful techniques within this class include artificial neural networks (ANN), inspired by the working principle of biological nerve cells in the human brain; support vector machines (SVM), which seek the maximum margin between classes by transforming data into high-dimensional spaces; random forests (RF), which increase accuracy by combining the results of a large number of decision trees; and adaptive neuro-fuzzy inference systems (ANFIS), which combine neural networks with fuzzy logic and make successful predictions under uncertainty. These methods are especially widely used in estimating groundwater levels and pollution [15,16]. In ML applications, the R and Python languages are widely used with the support of open-source libraries such as Scikit-learn and TensorFlow. In addition, data analysis software such as MATLAB, which offers diverse toolboxes, and WEKA, which is developed specifically for ML implementations, is also widely used [17,18,19,20]. Furthermore, the performance of ML algorithms depends on various hyperparameters. For this reason, optimizing these hyperparameters is important for obtaining accurate outputs from data-driven models, and several techniques have been proposed for this purpose [21]. The techniques used in optimization processes are generally classified into two main groups. The first group includes deterministic optimization methods that provide exact solutions, such as linear programming (LP) [22] and dynamic programming (DP) [23], while the second group includes meta-heuristic approaches, such as the Genetic Algorithm (GA) [24] and Particle Swarm Optimization (PSO) [25], which offer more flexible, intuitive solution strategies.
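As an illustration of the meta-heuristic group, the sketch below implements a basic global-best particle swarm optimizer (the inertia-weight variant) in standard-library Python. The coefficients `w`, `c1`, and `c2`, the search bounds, and the quadratic test objective are illustrative assumptions. In hyperparameter tuning, the objective would instead be something like the cross-validated error of an ML model evaluated at a candidate hyperparameter vector.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # each particle's best position so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position found by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity = inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical objective standing in for a validation-error surface; minimum at (3, -1).
def objective(p):
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best, best_val = pso_minimize(objective, bounds=[(-10, 10), (-10, 10)])
print(best, best_val)
```

Unlike a deterministic method, PSO needs no gradient or closed-form model of the objective, which is why it suits hyperparameter surfaces that can only be evaluated by training and validating a model.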
Bibliometric analysis is a methodology that offers quantitative analysis and visualization of data extracted from studies in the scientific literature, using mathematical and graphical computational techniques. The term “bibliometrics” was introduced by Alan Pritchard in 1969 [3]. Through bibliometric analysis, researchers analyze the number and trends of publications within a specific research field, identify the journals in which these articles are most frequently published, and highlight the most prolific scientists, institutions, or countries. They can also explore topic trends from multiple perspectives and extract the most relevant and significant insights [26]. Systematic techniques that ensure access to transparent bibliographic information can also be incorporated for holistic evaluations [7]. Bibliometric analysis relies on extensive datasets, often comprising hundreds or even thousands of publications, and incorporates both objective metrics (e.g., most cited authors, citation counts) and subjective aspects (e.g., thematic evolution). The analysis primarily focuses on the core metadata of articles, using statistical approaches that facilitate the interpretation of large datasets that would otherwise be too complex to analyze manually [9].
In this study, comprehensive datasets were compiled from two commonly used databases, Scopus and Web of Science (WoS), to cover research at the intersection of groundwater and ML. The overarching goal of the current study is to examine the academic literature in these two fields using bibliometric analysis and to reveal both general trends and specific details. Within the scope of the bibliometric analysis, various aspects, such as current research topics in the literature, the journals that publish most frequently in the field, the most cited authors, and the most productive institutions and countries, were evaluated in detail. In addition, publication trends, thematic evolution, and collaboration networks over the years were visualized to provide a comprehensive overview for the interested research community.
In this context, Bibliometrix (version 4.0) and VOSviewer (version 1.6.20) were used to perform mathematical and graphical analyses of the data. Bibliometrix is a tool based on the R programming language that allows for in-depth studies, especially in the creation of thematic maps and scientific impact analyses. VOSviewer, in turn, was used as a powerful tool for producing visual outputs such as collaboration networks and keyword maps. The datasets used in the study cover a wide range of publications published since 2000. Accordingly, the present study aimed to address key questions such as how these two fields have evolved over time, which themes have gained prominence, and where knowledge gaps exist. In particular, the extensive body of research on groundwater served as a foundation for understanding the application and extent of ML techniques in this domain. The large-scale datasets obtained from these databases were systematically analyzed using both the objective and subjective approaches provided by bibliometric analysis. Ultimately, these analyses sought to offer a comprehensive overview of the current research landscape in groundwater and ML, highlighting current trends and potential future research directions.
2. Materials and Methods
Searches were conducted in the WoS and Scopus databases to identify groundwater studies that used ML methods, which were then examined through bibliometric analysis. The searches covered the title, abstract, and keyword fields of the records, initially using the keywords “groundwater” and “machine learning”. After reviewing the retrieved documents, additional relevant keywords associated with similar topics were identified and incorporated into the search. The results were then refined using inclusion and exclusion criteria, such as document type, language, and subject area, with the search period restricted to 2000–2023.
The main reasons for choosing the Web of Science (WoS) and Scopus databases in this study are their academic reliability, broad coverage, and data consistency in bibliometric analyses. These subscription-based, curated databases provide reliable bibliographic data by indexing a comprehensive literature of refereed journals and conferences. While WoS is referred to as the “gold standard” in the literature, Scopus is accepted as one of the largest comprehensive academic databases, and many studies base their analyses on these two databases [27,28]. On the other hand, sources such as Google Scholar and IEEE Xplore were not evaluated in the present work. Although Google Scholar offers free access and broad coverage, it is not considered reliable as a bibliographic data source because of its uncertain indexing and the inability to control data quality. Its content scanning and counting methods are not transparent, it may contain duplicate and/or incorrect records, and it does not allow large-scale data export; together, these factors may lead to data consistency problems in bibliometric analyses [29]. Similarly, IEEE Xplore was not preferred because it is a narrow-scope digital library that focuses only on IEEE publications. Since Scopus also indexes publications from major digital libraries such as IEEE Xplore, there was no need to consult such specialized databases separately. Accordingly, similar studies in the literature often use WoS and Scopus to scan the pertinent literature and do not include sources that are difficult to control, such as Google Scholar, or that have a limited scope, such as IEEE Xplore [30].
Following these steps, the final Scopus search string was: “TITLE-ABS-KEY (groundwater OR “ground water”) AND TITLE-ABS-KEY (“machine learning” OR “soft comput*” OR “data driven” OR “data-driven”) AND PUBYEAR > 1999 AND PUBYEAR < 2024 AND (LIMIT-TO (DOCTYPE, “ar”) OR LIMIT-TO (DOCTYPE, “cp”)) AND (LIMIT-TO (LANGUAGE, “English”))”, and the final WoS search string was: “Refine results for (groundwater OR ‘ground water’) AND (‘machine learning’ OR ‘soft comput*’ OR ‘data driven’ OR ‘data-driven’) (Title) OR (groundwater OR ‘ground water’) AND (‘machine learning’ OR ‘soft comput*’ OR ‘data driven’ OR ‘data-driven’) (Abstract) OR (groundwater OR ‘ground water’) AND (‘machine learning’ OR ‘soft comput*’ OR ‘data driven’ OR ‘data-driven’) (Author Keywords) and English (Languages) and Article or Proceeding Paper (Document Types) and 2024 (Exclude, Final Publication Year)”. The retrieved records were downloaded in comma-separated values (.csv) and BibTeX formats.
Missing fields in the retrieved records, such as author names, titles, and publication years, were corrected, and the WoS and Scopus data were then merged into a single file to create a unified dataset. To avoid evaluating articles indexed in both WoS and Scopus twice, duplicate entries were identified and removed using the dplyr package (version 1.1.4) in R. This step ensured the accuracy and uniqueness of the dataset. After all preprocessing steps were completed, a total of 1797 unique documents were retained for analysis.
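The exact matching rule used for deduplication is not specified beyond the use of dplyr in R. The Python sketch below shows one plausible approach, assuming records are matched on DOI when one is present and otherwise on a normalized title plus publication year. The record fields (`doi`, `title`, `year`, `source`), the helper names, and the sample records are hypothetical.

```python
import re


def dedup_key(record):
    """Match on lower-cased DOI if present, else on normalized title + year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Drop punctuation and collapse whitespace so minor formatting differences match.
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    title = " ".join(title.split())
    return ("title", title, record["year"])


def merge_and_deduplicate(*sources):
    """Concatenate record lists and keep the first occurrence of each key."""
    seen, unique = set(), []
    for records in sources:
        for rec in records:
            key = dedup_key(rec)
            if key not in seen:
                seen.add(key)
                unique.append(rec)
    return unique


wos = [  # hypothetical records
    {"doi": "10.1000/abc", "title": "Groundwater level prediction", "year": 2021, "source": "WoS"},
    {"doi": "", "title": "Nitrate mapping with random forests", "year": 2019, "source": "WoS"},
]
scopus = [
    {"doi": "10.1000/ABC", "title": "Groundwater Level Prediction", "year": 2021, "source": "Scopus"},
    {"doi": None, "title": "Nitrate Mapping with Random Forests.", "year": 2019, "source": "Scopus"},
    {"doi": "10.1000/xyz", "title": "Aquifer vulnerability and ML", "year": 2022, "source": "Scopus"},
]
combined = merge_and_deduplicate(wos, scopus)
print(len(combined))  # 3
```

A record that carries a DOI in one database but not in the other would not be matched by this rule, which is one reason real pipelines often compare several keys before discarding a record.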
To perform the bibliometric analysis, the R package Bibliometrix and the VOSviewer software, both widely used in the pertinent literature, were employed [31,32]. Bibliometrix is a freely available R package that allows researchers to analyze and evaluate the scientific literature using bibliometric methods, producing quantitative results and enhanced visual representations. Specifically, it supports analyses of scientific productivity, collaboration, and citations using data obtained from bibliographic databases such as Scopus and WoS. As a comprehensive tool for the detailed examination of scientific research, Bibliometrix also contributes greatly to the interpretability of the analysis results through its data visualization capabilities [31].
VOSviewer is a bibliometric analysis tool developed by Ludo Waltman and Nees Jan van Eck [32] at Leiden University for constructing, visualizing, and analyzing scientific maps. It can use data from databases such as WoS and Scopus to create maps that users can explore interactively. VOSviewer stands out for its visualization capabilities and its ability to create citation, collaboration, and topic maps; it classifies the scientific literature according to subject areas and research groups by clustering the data [32]. Bibliometrix and VOSviewer provide researchers with powerful, complementary tools for bibliometric analysis, university research performance evaluation, and scientific collaboration analysis. While Bibliometrix excels in customization and flexibility, VOSviewer is particularly strong in visualization and network analysis [31].
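Keyword maps of the kind VOSviewer draws start from co-occurrence counts: two keywords are linked when they appear in the same document, and the link weight is the number of such documents. The sketch below computes these raw counts from hypothetical keyword lists. VOSviewer's own normalization (association strength), layout, and clustering steps are not reproduced here, and the function name `cooccurrence` is illustrative.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs_keywords):
    """Count, for each unordered keyword pair, the number of documents containing both."""
    counts = Counter()
    for keywords in docs_keywords:
        unique = sorted({k.strip().lower() for k in keywords})  # one count per document
        counts.update(combinations(unique, 2))
    return counts


docs = [  # hypothetical author-keyword lists
    ["groundwater", "machine learning", "water quality"],
    ["Groundwater", "machine learning", "random forest"],
    ["groundwater", "water quality"],
]
links = cooccurrence(docs)
print(links.most_common(2))
# [(('groundwater', 'machine learning'), 2), (('groundwater', 'water quality'), 2)]
```

Sorting the keywords before pairing guarantees that each pair is stored in a single canonical order, so ("a", "b") and ("b", "a") are never counted as separate links.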
In this research, the analyses were grouped into two categories: performance analysis and science mapping. For the performance analysis, annual publication counts and the productivity of authors, institutions, and countries were determined, and the impact of authors and journals was examined through the h-index and g-index. For science mapping, techniques such as bibliographic coupling, keyword analysis, co-authorship analysis, and citation network analysis were used to examine the relationships between research components. Among the metrics used for bibliometric analysis, the h-index and g-index play an important role. Proposed by Jorge Hirsch [33], the h-index evaluates the productivity of a researcher together with their citation impact: a researcher has an index of $h$ if $h$ of their articles have each been cited at least $h$ times. Proposed by Leo Egghe [34], the g-index is the largest number $g$ such that the $g$ most cited articles have received at least $g^2$ citations in total; it therefore gives more weight to highly cited articles.
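Both indices can be computed directly from a list of citation counts. The sketch below follows the definitions given above; the citation counts are hypothetical. Some formulations of the g-index allow $g$ to exceed the number of articles by padding the list with uncited papers; this sketch assumes the simpler variant that caps $g$ at the number of articles.

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g articles have at least g**2 citations in total."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank ** 2:
            g = rank
    return g


papers = [25, 8, 5, 3, 3, 1, 0]  # hypothetical citation counts
print(h_index(papers), g_index(papers))  # 3 6
```

The example shows why the g-index favors highly cited work: the single article with 25 citations lifts the cumulative total enough for $g = 6$, while the h-index stays at 3 because only three articles have at least three citations each.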
Within this research, the results were presented and interpreted by utilizing graphs, word clouds, and network maps. These visualizations were used to understand the structure and dynamics of the research area.
3. Results
3.1. General Trends and Research Patterns
Covering the period 2000–2023 (Table 1), the bibliometric analysis of groundwater and ML studies offers significant insights into the trends and patterns of this research domain over more than two decades.
The dataset, comprising 1797 documents from 574 different publication sources, offers a comprehensive overview of the growth, collaboration, and research activity related to the application of ML in groundwater studies. As shown in Table 1, the field has experienced a substantial annual growth rate of 31.99%, indicating a marked rise in studies integrating ML and groundwater research. This increasing research activity highlights the growing need for advanced computational methods to model and predict groundwater dynamics, a complex environmental system that requires sophisticated analytical approaches.
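The annual growth rate reported by Bibliometrix is, to our understanding, a compound annual growth rate computed from the publication counts of the first and last years of the period, $\left[(N_{\text{last}}/N_{\text{first}})^{1/(t_{\text{last}}-t_{\text{first}})} - 1\right] \times 100$. The sketch below assumes that definition; the function name and the yearly counts are hypothetical and are not the counts of this dataset.

```python
def annual_growth_rate(counts_by_year):
    """Compound annual growth rate (%) between the first and last year.

    counts_by_year maps year -> number of publications; the count in the
    first year must be positive for the rate to be defined.
    """
    first, last = min(counts_by_year), max(counts_by_year)
    n_first, n_last = counts_by_year[first], counts_by_year[last]
    periods = last - first
    return ((n_last / n_first) ** (1 / periods) - 1) * 100


# Hypothetical counts: output doubles every year from 2020 to 2023.
counts = {2020: 10, 2021: 20, 2022: 40, 2023: 80}
print(round(annual_growth_rate(counts), 2))  # 100.0
```

Because only the two endpoints enter the formula, fluctuations in intermediate years do not affect the reported rate.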
The results also show that a total of 5425 researchers contributed to the publications and that each document has, on average, 4.76 co-authors. The latter figure counts author appearances, so researchers who appear on several documents are counted more than once, which is why it exceeds the ratio of unique authors to documents (5425/1797 ≈ 3.02). The high average number of co-authors highlights the collaborative nature of groundwater research, which requires expertise in hydrology, environmental science, and advanced data analytics. On the other hand, international co-authorship accounts for only 2.39%, reflecting that while collaboration within studies is strong, it occurs primarily at national or regional levels rather than through extensive international partnerships.
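The difference between unique authors and co-authors per document can be illustrated with a small sketch. It assumes that co-authors per document is the total number of author appearances divided by the number of documents, and that the international co-authorship share is the percentage of documents whose authors list more than one country; this is our reading of the Bibliometrix summary, not a confirmed specification. The field names, the one-country-per-author simplification, and the records are hypothetical.

```python
def collaboration_summary(docs):
    """docs: list of dicts with 'authors' and 'countries' (one country per author)."""
    n_docs = len(docs)
    appearances = sum(len(d["authors"]) for d in docs)
    unique_authors = {a for d in docs for a in d["authors"]}
    international = sum(1 for d in docs if len(set(d["countries"])) > 1)
    return {
        "unique_authors": len(unique_authors),
        "authors_per_doc": len(unique_authors) / n_docs,
        "coauthors_per_doc": appearances / n_docs,
        "intl_coauthorship_pct": 100 * international / n_docs,
    }


docs = [  # hypothetical records
    {"authors": ["Li", "Wang", "Zhang"], "countries": ["China", "China", "China"]},
    {"authors": ["Li", "Smith"], "countries": ["China", "USA"]},
    {"authors": ["Li", "Wang", "Rezaei", "Smith"], "countries": ["China", "China", "Iran", "USA"]},
    {"authors": ["Rezaei"], "countries": ["Iran"]},
]
summary = collaboration_summary(docs)
print(summary)
```

Here "Li" appears in three documents and is therefore counted three times among author appearances but only once among unique authors, so co-authors per document (2.5) exceeds unique authors per document (1.25).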
The average document age is low, at 3.49 years, revealing the contemporary nature of research in this field. This recency is also reflected in the citation metrics: each publication has received an average of 21.61 citations, underscoring the rapid advancement of ML applications in groundwater studies and their significance within the academic community. The publications analyzed contain 4368 author keywords, which cover various research topics and reveal an extensive network of interrelated research areas. In addition, the fact that the 1797 documents cite more than a thousand reference publications points to a strong foundation built on a broad spectrum of prior research. This extensive citation network highlights the interdisciplinary nature of groundwater and ML studies, connecting them to broader scientific domains such as climate science, hydrology, and artificial intelligence.
3.2. Institutional Contributions
Figure 1 presents a three-field plot based on the bibliometric analysis of groundwater and ML studies. The plot includes three dimensions: institutions (AU_UN, left), countries (AU_CO, middle), and journals (SO, right). The connections between these dimensions reveal academic collaborations and the journals in which the papers were published. On the left side of the plot, Jilin University, China University of Geosciences, and Hohai University in China stand out as prominent contributors to research on groundwater analysis using ML techniques. The strong presence of these institutions reflects China’s strategic emphasis on addressing its groundwater management challenges. Institutions such as Tehran University and Islamic Azad University in Iran have also made significant contributions to the field. This heightened focus is driven by a critical need to manage water resources effectively, particularly in regions facing severe water scarcity and pollution. The work of these institutions indicates a commitment to using advanced computational methods to improve the sustainable management of water resources.
Furthermore, the AU_CO section in the middle of the plot shows how national efforts shape the research output. As can be seen from Figure 1, the United States (US) and China are the leading countries in terms of national productivity. The leadership of the US is associated with its technological power and pioneering role in environmental research. American institutions, especially in regions facing chronic drought, such as California, have made significant contributions to the development and application of ML models in groundwater modeling, which plays a critical role in sustaining subsurface water resources. Likewise, China’s contribution to ML-based groundwater research reflects its commitment to scientific analysis and investment in environmental sciences. The cooperation of Chinese institutions with international institutions is also noteworthy. In particular, the frequent collaborations between China and the US demonstrate their mutual interest in groundwater resources and strengthen engagement between the two countries’ researchers. Given the different geographical settings of these countries, such international cooperation can be a valuable driver for finding solutions to global environmental problems such as groundwater scarcity and pollution. However, the relatively low level of international co-authorship in the dataset highlights an opportunity to expand global collaboration in this field. Strengthening such partnerships could enhance the robustness and global relevance of research findings, particularly in addressing transboundary water challenges that demand coordinated international efforts.
The SO section on the right side of the plot highlights the leading journals publishing research on groundwater and ML. Notably, the Journal of Hydrology and Hydrogeology Journal emerge as key publication platforms in this field. The dominance of hydrology-focused journals underscores the strong emphasis on water science within this research domain. These journals are recognized for their rigorous peer-review processes and play a crucial role in disseminating significant advancements in groundwater modeling and ML applications. Additionally, interdisciplinary journals such as Science of the Total Environment and Water Resources Research are prominently featured, reflecting the cross-disciplinary nature of this research area. These journals appeal to a broad audience, from environmental science to hydrology and other engineering fields, which is important for encouraging interdisciplinary innovation. The inclusion of journals such as Remote Sensing reflects the increasing integration of satellite data and remote sensing technologies in groundwater studies, further increasing the accuracy and scope of environmental modeling efforts.
Table 2 lists the most influential institutions in the field of groundwater and ML according to their scholarly output, helping identify the institutions driving progress in the field. Jilin University ranks first with 58 publications, indicating its pivotal role in advancing research at the intersection of groundwater and ML. The University of Tehran and the University of Tabriz follow closely, which reveals Iran’s focus on technological solutions for water resource management. Other notable contributions have been made by Islamic Azad University and Ton Duc Thang University, with 45 and 41 publications, respectively. These institutions contribute significantly to solving environmental problems by combining ML applications with groundwater models. Prominent institutions such as the University of California and the China University of Geosciences also contribute actively through international collaborations, enhancing both the quantity and quality of research in this field.
3.3. Influence of Publication Sources
The local impact of the publication sources, measured by their h-index, is presented in Table 3, which provides insights into the impact of the journals that publish groundwater studies using ML models. According to the table, the “Journal of Hydrology” has the highest h-index, at 37, and also has a high impact factor. This indicates that groundwater and ML articles published in this journal are influential and frequently cited. The journal’s strong impact makes it a highly preferred platform for publishing studies related to water resources management, hydrology, and environmental sciences. This journal is followed by “Science of the Total Environment”, with an h-index of 31, indicating its impactful role in environmental sciences and sustainability. The journal’s broad interdisciplinary scope further makes it highly influential in water resources studies. These two journals stand out as the leading publication sources, ensuring that pertinent research is highly cited and reaches a broad academic community.
In addition, journals such as “Water Resources Research” and “Water (Switzerland)” stand out as other significant publication sources in water resources management, specifically groundwater modeling. Articles published in these journals are widely accepted and cited by the research community, which increases the prestige of these journals in the field. Journals with an h-index of 15, such as “Water Resources Management” and “Environmental Science and Pollution Research”, also stand out for their expertise in specific areas and their significant impact on this research field. Likewise, “Advances in Water Resources”, “Environmental Earth Sciences”, and “Hydrology and Earth System Sciences” appear to be less cited in this field, possibly because of their narrower audiences; however, they still serve as valuable platforms for research on specific niche topics and host impactful studies that provide in-depth information in the groundwater research domain.
3.4. Global Perspectives: Country-Level Contributions
Whether the majority of studies on groundwater are local or international is also an important area of inquiry for bibliometric analysis.
Figure 2 shows the countries in which the corresponding authors in the field of groundwater and ML are located, together with the level of international collaboration of these countries. The plot shows the total number of documents published by each country and whether these documents are single-country publications (SCP) or multiple-country publications (MCP). This analysis is important for understanding the level of international collaboration and identifying the most productive countries in a given field.
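The SCP/MCP split is attributed to the corresponding author’s country. A minimal sketch of the classification, assuming each record lists the corresponding author’s country and the countries of all author affiliations (the field names, the function name, and the records are hypothetical): a document counts as MCP for its corresponding author’s country when at least one affiliation lies in another country, and as SCP otherwise.

```python
from collections import defaultdict


def scp_mcp_by_country(docs):
    """Return {country: {'SCP': n, 'MCP': n}} keyed by corresponding-author country."""
    table = defaultdict(lambda: {"SCP": 0, "MCP": 0})
    for d in docs:
        home = d["corresponding_country"]
        others = set(d["affiliation_countries"]) - {home}
        table[home]["MCP" if others else "SCP"] += 1
    return dict(table)


docs = [  # hypothetical records
    {"corresponding_country": "China", "affiliation_countries": ["China"]},
    {"corresponding_country": "China", "affiliation_countries": ["China", "USA"]},
    {"corresponding_country": "USA", "affiliation_countries": ["USA", "Iran"]},
    {"corresponding_country": "Iran", "affiliation_countries": ["Iran"]},
    {"corresponding_country": "China", "affiliation_countries": ["China"]},
]
table = scp_mcp_by_country(docs)
for country, c in table.items():
    total = c["SCP"] + c["MCP"]
    # columns: country, documents, SCP, MCP, MCP ratio
    print(country, total, c["SCP"], c["MCP"], round(c["MCP"] / total, 2))
```

The MCP ratio in the last column is a convenient single number for comparing how internationally connected the output of countries with very different publication volumes is.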
China has the highest number of documents, approximately 300 in total, making it the most productive country in this field. China’s research activities in groundwater and ML have largely been carried out through SCP, showing that China has made significant scientific contributions in this field with its domestic resources and local collaborations. The US ranks second with approximately 250 documents and is also a significant contributor to MCP. The international collaborations of the US facilitate the global dissemination of research in this field and create a broader impact.
India and Iran are among the other key countries in this field, ranking third and fourth, respectively. While both nations primarily conduct their research through SCP, they also place considerable emphasis on MCP. Iran’s international collaborations highlight its role as a significant leader in this field, particularly in the Middle East. Other countries, such as South Korea, Italy, Austria, and Germany, are also notably productive in this domain and contribute substantially to groundwater and ML research through both national and international collaborations.
Figure 2 further underscores the importance that countries such as Canada, Egypt, Sweden, and Denmark place on international collaborations. These nations have a relatively higher share of MCP, suggesting greater integration into global research networks than Germany and the countries ranked below it.
The high rate of SCP can be attributed to a variety of regional, political, economic, and academic factors. The lack of international collaboration seen in Figure 2 is often associated with insufficient funding, academic incentives focused on national output, language barriers, and political constraints. Limited research budgets, especially in developing countries, make participation in international projects more difficult, while legal restrictions and bureaucratic processes regarding data sharing in some countries can also hinder collaboration. Geographical distance can further hinder teamwork by making effective communication and regular collaboration among researchers difficult; different time zones, travel requirements, and logistical difficulties can make participation in international projects more costly and demanding. Methodological differences, on the other hand, arise from differences in academic traditions, data collection techniques, and analytical approaches between countries. While certain modeling methods or data processing techniques are common in some countries, different standards may be adopted in others, which can create incompatibilities in joint projects. Although the literature states that international collaborations increase scientific impact and visibility, such barriers keep SCP rates high.
Table 4 shows the most cited countries in the field of groundwater and ML. By relating the total number of citations of each country to its number of published documents, the table helps clarify the scientific impact of research in this area and which countries have contributed most to the field.
The US tops the list with 6557 citations, indicating that it has the highest academic impact in this field. The main reasons for this high impact include extensive research funding, large academic networks, publication in high-impact journals, and a well-developed research infrastructure. In addition, the large population and number of universities in the US are important factors that increase academic productivity. Iran ranks second with 5814 citations, indicating that its research in this field resonates widely in the scientific literature. Iran’s strong academic impact in this field stems from its search for solutions to water-related challenges such as water scarcity [35]. The country’s heavy investment in scientific research, especially on water management issues, is one of the main factors behind this high number of citations.
China ranks third with 5488 citations, and India ranks fourth with 2960 citations. China and India have become increasingly influential in this field owing to their large populations, rapidly developing science and technology policies, nationally promoted research programs, and large data pools. China, in particular, conducts intensive ML research on hydrological modeling and water resources management and encourages these studies through state-supported projects. India, on the other hand, prioritizes research on regional water problems by developing ML-based solutions for agricultural irrigation and water management [36]. However, the number of citations per publication for research conducted in China and India is lower than that for the US and Iran, indicating that although their research output is large, individual publications are cited less often on average.
Countries such as Switzerland (1246 citations), Germany (982 citations), and Australia (762 citations) also have moderate citation counts. Switzerland and Germany, in particular, attach importance to interdisciplinary research and international academic collaborations and strengthen their research on ML and groundwater management through European Union (EU)-supported projects. The United Kingdom (750 citations), Malaysia (735 citations), and Canada (697 citations) are also among the countries with moderate academic impact in the field of ML and groundwater management. Despite their relatively lower citation counts, the United Kingdom and Malaysia have been influential through work focusing on regional water management issues, and Canada through its research on data-driven ML models and water quality prediction.
3.5. Keyword Analysis and Research Focus
Figure 3 and Figure 4 analyze the most frequently used keywords in groundwater and ML research and their relevance. Figure 3 presents the number of times the most common keywords appear in the documents, while Figure 4 visualizes these terms as a word cloud. This analysis helps to identify the primary research focuses in the field and the topics that receive the most attention. The most frequently used keyword in Figure 3 is “groundwater”, appearing 1479 times, followed by “machine learning” with 1330 occurrences, highlighting the widespread application of ML techniques in groundwater studies and their significant role in this research domain.
Other notable keywords include “groundwater resources” (77 times), “water quality” (67 times), and “hydrogeology” (65 times), indicating that water resources management, water quality, and hydrogeology are central themes in this field. Terms such as “artificial intelligence” (20 times) and “neural networks” (20 times) reflect interest in technological innovations in groundwater research. In Figure 4, “machine learning” and “groundwater” appear as the most prominent keywords, reinforcing their dominant role. Keywords such as “groundwater resources”, “water quality”, “environmental monitoring”, and “artificial neural networks” also stand out, indicating strong research interest at the intersection of water management and ML. Overall, the analysis shows that the connection between groundwater management and ML techniques has been studied extensively.
Figure 5 presents a treemap illustrating the proportional distribution of the most frequently used keywords in groundwater and ML research. The area of each tile represents how widely a keyword is used in the literature relative to the other terms, which helps to identify the most studied topics and the areas that receive the most attention.
The keyword that occupies the largest area in the treemap is “machine learning”, accounting for 36% of the keyword occurrences shown. This indicates that ML techniques are of great importance in groundwater research and are among the most frequently used methods in this field. The keyword “groundwater” ranks second, used 278 times (16%), revealing that groundwater-related topics are intensively studied with ML. These two results are consistent with the search terms used in this study. Beyond them, specific ML methods such as “random forest” (143 times, 8%) and “deep learning” (89 times, 5%) also occupy a significant place; these methods are widely used in groundwater data analysis and account for a substantial share of the research in this field. Terms such as “GIS” (88 times, 5%), “groundwater level” (79 times, 5%), and “remote sensing” (64 times, 4%) are also noteworthy, showing the important role that geographic information systems and remote sensing technologies play in groundwater management, modeling, and data curation. Other keywords include “artificial intelligence” (56 times, 3%), “support vector machines” (46 times, 3%), “groundwater quality” (44 times, 3%), and “prediction” (44 times, 3%), revealing that AI and ML techniques are widely applied across various areas of groundwater studies.
3.6. Emerging Research and Trend Topics
The bibliometric analysis also evaluated prominent trends and popular topics in groundwater-related ML studies; the results are presented in Figure 6. The figure shows the distribution of the most frequently used key terms in groundwater-themed ML studies by year, which helps to reveal the prominent topics and to trace how the related research areas have evolved over time. Analyzing the frequency of terms over time shows when each term emerged and in which years it gained popularity. In recent years, terms such as “GIS”, “water quality index”, “geostatistics”, and “machine learning” have become particularly prominent. This trend indicates a growing interest in water quality assessments and ML-based approaches in groundwater studies, as well as an increasing integration of geographic information systems into these analyses. Notably, the terms “ground water” and “groundwater” have been used interchangeably: while “ground water” was more common in earlier years, recent studies show a clear preference for the single-word form “groundwater”. Additionally, terms such as “neural networks”, “kriging”, and “genetic algorithm” gained popularity in earlier years and remain widely used today.
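Trend-topic views of this kind are typically built by first normalizing keyword variants and then summarizing, for each term, how often and when it occurs. The Python sketch below illustrates the idea under stated assumptions: the synonym table, the `trend_topics` helper, the minimum-frequency cut-off, and the toy records are all hypothetical, and placing each term at the median year of its occurrences is one common convention, not necessarily the exact procedure behind Figure 6.

```python
from collections import defaultdict
from statistics import median

# Hypothetical synonym table mapping spelling variants to one canonical keyword.
SYNONYMS = {"ground water": "groundwater", "ground-water": "groundwater"}


def normalize(keyword):
    """Lower-case a keyword, collapse whitespace, and merge known variants."""
    k = " ".join(keyword.lower().split())
    return SYNONYMS.get(k, k)


def trend_topics(records, min_freq=2):
    """Return {keyword: (frequency, median_year)} for keywords in >= min_freq documents.

    records: iterable of (publication_year, list_of_author_keywords) pairs.
    """
    years = defaultdict(list)
    for year, keywords in records:
        # A set ensures each keyword is counted at most once per document.
        for kw in {normalize(raw) for raw in keywords}:
            years[kw].append(year)
    return {kw: (len(ys), median(ys)) for kw, ys in years.items() if len(ys) >= min_freq}


# Toy records, invented for illustration only.
records = [
    (2008, ["Ground Water", "neural networks"]),
    (2012, ["ground water", "kriging"]),
    (2020, ["groundwater", "machine learning"]),
    (2022, ["Groundwater", "Machine Learning", "GIS"]),
]
topics = trend_topics(records)
print(topics)
```

Merging “ground water” into “groundwater” before counting keeps a spelling change from splitting one topic into two separate trend lines.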
In the visual representation, “machine learning” and “groundwater” appear as the largest bubbles, signifying their prominence as fundamental research topics. The increasing use of water quality indices and geostatistical methods highlights their significance in groundwater studies. Furthermore, ML-related terms such as “neural networks” and “boosted regression trees” have gained popularity in recent years, indicating a shift toward more advanced ML models in groundwater research. The frequent use of terms such as “forecasting” and “algorithms” over an extended period underscores the critical role of predictive models in estimating variables such as groundwater levels. The presence of country names such as “Iran” suggests that research efforts may be concentrated in regions where groundwater protection and management receive particular attention. Terms such as “LiDAR” and “hydrogeophysics” highlight the adoption of advanced technologies for data collection and analysis in groundwater studies. The visualization further indicates that “machine learning”, “water quality index”, and “neural networks” have gained substantial traction, especially from 2021 onward, suggesting that ML and water quality assessment are likely to remain key research areas in groundwater analysis.
3.7. Thematic Clusters and Research Directions
Another key aspect of the investigation is the thematic map presented in Figure 7. The map divides research themes in groundwater and ML into four groups based on their density and centrality, offering insight into the current state of the field and potential future research directions. Centrality measures how strongly a theme is linked to other themes, while density measures how strongly its own keywords are linked to each other, that is, how internally developed it is. Each quadrant represents a different type of theme. “Motor themes” in the upper right quadrant are core research areas that are both highly developed and strongly connected to other topics in the field. “Niche themes” in the upper left quadrant are specialized subfields that require specific expertise but may have limited broader impact. “Basic themes” in the lower right quadrant are widely used methods that serve as foundational tools, and “emerging or declining themes” in the lower left quadrant are topics that may be either gaining or losing relevance over time.
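The quadrant assignment can be sketched in Python. This is a simplified illustration, not the algorithm of any particular tool: centrality is taken as the total strength of a cluster’s links to other clusters, density as its internal link strength per keyword (loosely following Callon’s measures), and the quadrant boundaries are assumed to lie at the medians. The `thematic_map` function, the median thresholds, and the toy co-occurrence matrix are hypothetical; bibliometric software applies its own link normalization and scaling.

```python
import numpy as np


def thematic_map(cooc, labels):
    """Simplified centrality/density per keyword cluster plus quadrant labels.

    cooc: symmetric keyword co-occurrence matrix (n x n) with a zero diagonal.
    labels: cluster id of each keyword (length n).
    """
    cooc = np.asarray(cooc, dtype=float)
    labels = np.asarray(labels)
    stats = {}
    for c in np.unique(labels):
        inside = labels == c
        # Centrality: total weight of links from this cluster to all other clusters.
        centrality = cooc[np.ix_(inside, ~inside)].sum()
        # Density: weight of links among the cluster's own keywords, per keyword.
        internal = np.triu(cooc[np.ix_(inside, inside)], k=1).sum()
        density = internal / inside.sum()
        stats[c.item()] = (float(centrality), float(density))

    # Assumed quadrant boundaries: medians of centrality and density.
    cen_mid = np.median([s[0] for s in stats.values()])
    den_mid = np.median([s[1] for s in stats.values()])
    quadrants = {}
    for c, (cen, den) in stats.items():
        if den >= den_mid:
            quadrants[c] = "motor" if cen >= cen_mid else "niche"
        else:
            quadrants[c] = "basic" if cen >= cen_mid else "emerging/declining"
    return stats, quadrants


# Toy network: 8 keywords in 4 clusters of 2 (weights invented for illustration).
W = np.zeros((8, 8))
for i, j, w in [(0, 1, 10), (2, 3, 10), (4, 5, 1), (6, 7, 1), (0, 4, 5), (2, 6, 1)]:
    W[i, j] = W[j, i] = w
stats, quadrants = thematic_map(W, [0, 0, 1, 1, 2, 2, 3, 3])
print(quadrants)
```

In the toy network, clusters 0 and 1 are tightly knit internally, while clusters 0 and 2 carry the strongest cross-cluster link, which is what separates the four quadrants.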
Motor Themes, characterized by high centrality and density, include essential and well-established topics within the field. Notable keywords in this category include “groundwater level”, “geostatistics”, “data-driven models”, “Gaussian process regression”, “SVM” (support vector machine), “ANN” (artificial neural networks), “ANFIS” (adaptive neuro-fuzzy inference system), and “logistic regression”. These terms highlight the widespread adoption of ML techniques in groundwater studies. Specifically, “groundwater level” and “geostatistics” relate to monitoring groundwater fluctuations and analyzing spatial data, while “data-driven models” and “Gaussian process regression” relate to learning directly from observed data and estimating prediction uncertainty, underscoring the increasing reliance on ML-driven approaches in groundwater research.
Although SVM, ANN, ANFIS, and logistic regression are represented by relatively small bubbles, reflecting lower frequencies, their position in this quadrant indicates that these techniques are well developed and central to groundwater research, highlighting the critical role of ML and statistical modeling in groundwater analysis.
Basic Themes (lower right quadrant) are characterized by high centrality but low density, representing fundamental methods commonly used in groundwater studies. This quadrant includes keywords such as “GIS”, “remote sensing”, “Iran”, and “groundwater potential mapping”. The presence of “GIS” and “remote sensing” underscores their widespread application in groundwater potential mapping and spatial analysis. The keyword “Iran” suggests a strong regional focus in groundwater research, while “groundwater potential mapping” refers to studies aimed at determining the spatial distribution of groundwater resources. Terms such as “machine learning”, “groundwater”, “random forest”, and “deep learning” extend partially into the Motor Themes quadrant but lie mostly within this category. “Machine learning” and “deep learning” stand out as advanced tools for prediction and modeling that can extract insights from large datasets, whereas the random forest algorithm is particularly favored for classification and regression tasks. Together, these keywords reflect the fundamental geographic tools, regional foci, and methodological approaches that shape groundwater analysis.
Niche Themes (upper left quadrant) encompass topics that serve a specialized research community and are characterized by high density but low centrality. Prominent keywords in this quadrant include “forecasting”, “groundwater recharge”, and “extreme gradient boosting”. “Forecasting” relates to studies that predict future groundwater levels and water quality, while “groundwater recharge” pertains to the natural replenishment of aquifers. “Extreme gradient boosting” is a gradient-boosted decision tree algorithm known for its high predictive accuracy. These keywords represent a specialized domain that demands expertise in hydrological forecasting and the analysis of groundwater renewal cycles.
Emerging or Declining Themes (lower left quadrant) consist of topics with both low centrality and low density that are either gaining or losing relevance. This category includes keywords such as “groundwater management” and “GRACE downscaling”. “Groundwater management” reflects the shift from traditional approaches to modern, data-driven techniques for the sustainable use of groundwater resources, while “GRACE downscaling” refers to enhancing the spatial resolution of satellite gravimetry data for groundwater analysis, offering new possibilities for large-scale hydrological assessments. These keywords indicate evolving research areas with significant implications for sustainable water resource management and the integration of remote sensing into groundwater studies.
3.8. Overlay Visualization: Mapping Research Evolution
VOSviewer visualizations were also used to analyze the publication source (journal) and author keyword networks in the field of groundwater and ML. Each figure highlights different aspects of these networks, revealing publication trends and clustering patterns in the literature and providing a dynamic overview of the field’s evolution, which helps to identify both well-established research avenues and emerging directions.
The overlay visualization, in particular, illustrates the temporal evolution of the journals in the network. As shown in Figure 8a, the color scale transitions from blue to yellow, representing the period from 2010 to 2020. Nodes in yellow indicate journals with more recent publications in this area, reflecting the latest research trends. Notably, journals such as Remote Sensing, Science of the Total Environment, Environmental Science and Pollution Research, and Water (Switzerland) are colored closer to yellow, signifying their growing prominence in recent years. In contrast, journals such as the Journal of Hydrology, Water Resources Management, Environmental Earth Sciences, and Water Resources Research appear in shades closer to green. This suggests that, while these journals have played a significant role in the field, their contributions to this area are concentrated in earlier years and their output may have slowed recently.
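Overlay coloring of this kind can be approximated as follows, assuming that each journal is scored by the average publication year of its documents (a common choice for bibliographic data in VOSviewer) and that this score is mapped linearly onto the 2010 to 2020 color scale. The `overlay_scores` helper and the journal data are hypothetical.

```python
def overlay_scores(pub_years_by_source, vmin=2010, vmax=2020):
    """Return {source: (average_year, position)} with position in [0, 1].

    Position 0.0 corresponds to the blue end of the scale (vmin) and 1.0 to the
    yellow end (vmax); averages outside [vmin, vmax] are clipped to the ends.
    """
    scores = {}
    for source, years in pub_years_by_source.items():
        avg = sum(years) / len(years)
        position = (avg - vmin) / (vmax - vmin)
        scores[source] = (avg, min(max(position, 0.0), 1.0))
    return scores


# Hypothetical journals and publication years, for illustration only.
example = {
    "Journal A": [2012, 2014, 2016, 2018],
    "Journal B": [2019, 2020, 2021, 2022],
}
print(overlay_scores(example))
```

Because the score is an average, a journal colored green may still publish in the area today; its color only indicates that its contributions are, on balance, older.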
Figure 8b presents a cluster visualization that groups journals into thematic clusters, each represented by a distinct color. These clusters reflect sub-research areas based on citation relationships. The green cluster includes key journals such as Journal of Hydrology, Water Resources Research, and Hydrological Processes and focuses primarily on fundamental hydrology and water resources management. The red cluster consists of journals such as Water (Switzerland), Environmental Science and Pollution Research, Groundwater for Sustainable Development, Environmental Monitoring and Assessment, and Environmental Research and Risk Assessment and is centered on environmental risks, pollution, sustainability, and environmental impact assessment. The blue cluster includes journals such as Remote Sensing, Journal of Environmental Management, Geocarto International, Water Resources Management, Science of the Total Environment, and Environmental Earth Sciences and focuses on environmental management, remote sensing, GIS, and water resources management. Owing to overlapping themes in environmental science, the blue and red clusters are interspersed with each other while remaining distinct from the green cluster. The yellow cluster, the smallest of the groups, contains journals such as Hydrology and Earth System Sciences, Environmental Science and Technology, Water Research, and Chemosphere. It is concentrated in a small area on the left and focuses primarily on preventing environmental pollution, protecting water resources, and developing clean technologies. Despite its distinct focus, it shares several thematic connections with the red and blue clusters, reflecting its interdisciplinary nature.
Author keywords were examined in the same way to reveal related topics and the connections between them. Visualizing the main themes, topic clusters, and their development over the years provides a useful overview of the general structure of the literature and its key research topics.
Figure 9a adds a temporal dimension, revealing how topics have changed and developed in the literature over the years. The color scale runs from 2010 to 2020; shades close to yellow represent newer research and rising trends, while green tones reflect topics that received attention in earlier years. The most central concepts in the visualization are “machine learning” and “groundwater”, around which other related topics are densely clustered. This indicates that ML is increasingly becoming a fundamental tool in environmental sciences and water management. Frequently used ML techniques include “artificial neural network”, “random forest”, and “support vector machine”. These algorithms play an important role in analyzing environmental data and estimating variables such as water quality and groundwater levels. Their yellow tones indicate that they have gained particular importance in recent years and that interest in more complex and advanced algorithms has increased in groundwater studies.
Keywords such as “groundwater level” and “groundwater potential mapping” show how ML is used in groundwater management. Their yellow color indicates that groundwater level monitoring and potential mapping have become prominent application areas in recent years. In particular, groundwater level modeling and the sustainable management of water resources are being investigated as critical issues as the effects of climate change become increasingly evident. Advanced ML models such as “boosted regression trees”, “convolutional neural networks”, and “deep learning” also attract attention. Among these, “convolutional neural networks” and “deep learning” appear in yellow tones, showing that these methods are becoming increasingly popular for analyzing data in groundwater-themed studies. In contrast, “boosted regression trees” appears in bluer tones, suggesting that interest in this algorithm has declined in recent years. Newer and more specific environmental issues, such as “groundwater drought” and “terrestrial water storage”, also appear in yellow and can be interpreted as areas of intensifying research in the context of climate change and sustainable water management.
Figure 9b groups the author keywords into topic clusters, showing how ML techniques are used in environmental sciences and, in particular, groundwater research. Each cluster shows how specific keywords are related to each other and how different topics are grouped together. This analysis helps to clarify the relationship between ML and environmental research and how data-driven approaches to environmental problems have evolved. The green cluster focuses on the use of ML techniques in groundwater modeling and prediction. Keywords such as “machine learning”, “artificial neural network”, “groundwater modeling”, and “groundwater level prediction” are at its center, showing that ML algorithms play a key role in groundwater level prediction and modeling. Techniques such as “boosted regression trees” and “principal component analysis” are also frequently used in this area. This cluster highlights the increasing importance of data-driven prediction models in groundwater management. The red cluster includes keywords related to environmental management, water resources management, and the integration of geographic information systems. Keywords such as “groundwater management”, “groundwater potential mapping”, and “grace” indicate that this cluster focuses on tools and techniques used in environmental management processes. Terms such as “terrestrial water storage” and “land surface model” indicate how work in this area intersects with other disciplines such as agriculture and land use management. Overall, the red cluster reflects the increasing role of ML in environmental management supported by geographic information systems.
The blue cluster focuses on environmental chemistry, pollution analysis, and climate change. Keywords such as “climate change”, “nitrate load”, “water resources”, “hydrochemistry”, and “solute transport” are concentrated in this cluster, representing studies of the effects of climate change and various pollutants on water resources. Words such as “agriculture” and “nitrate load” point to studies examining the effects of agricultural activities on water resources. Overall, the blue cluster is centered on environmental pollution and water quality. The yellow cluster, whose largest node is the keyword “groundwater”, includes keywords such as “groundwater quality”, “big data”, “permeability”, “hydrological model”, and “linear regression”. Its keywords are distributed evenly among the other clusters, and it is closely intertwined with, and thematically overlaps, the red and green clusters in particular.
4. Discussion
Within the scope of the study, the applications of ML techniques to groundwater quality and management were examined through bibliometric analysis and compared with current trends in the literature. In ML-based groundwater studies, MCP was found to be significantly lower than SCP, indicating that international cooperation is low. This limited cooperation between countries points to the need for coordinated international efforts, especially for transboundary water problems. As the literature suggests, cooperation in which leading countries in this field, such as the USA and China [3], share knowledge, experience, and materials with countries that have limited resources and experience would increase the richness and diversity of studies and produce valuable results for groundwater management. As confirmed in the literature, groundwater research is closely linked to ML and remote sensing [26]. ML and remote sensing, which strengthen each other in groundwater studies, have the potential to increase the reliability and success of groundwater studies if they are integrated more closely.
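The SCP/MCP distinction can be made concrete with a short Python sketch. It assumes that the set of author countries is known for each document and classifies a document as MCP when its authors come from more than one country; the `collaboration_stats` helper and the toy data are hypothetical, and bibliometric tools may instead attribute each document to the corresponding author’s country.

```python
def collaboration_stats(author_countries):
    """Count single-country (SCP) and multiple-country (MCP) publications.

    author_countries: one list of author countries per document.
    Returns (scp, mcp, mcp_ratio); documents without country data are skipped.
    """
    scp = mcp = 0
    for countries in author_countries:
        if not countries:
            continue
        if len(set(countries)) > 1:
            mcp += 1
        else:
            scp += 1
    total = scp + mcp
    return scp, mcp, (mcp / total if total else 0.0)


# Toy documents, invented for illustration only.
docs = [["USA", "USA"], ["China"], ["Iran", "Iran", "Iran"], ["USA", "China"], []]
print(collaboration_stats(docs))
```

A low MCP ratio, as in this toy example, is the pattern that signals limited international co-authorship.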
The relationship between climate change and water quality is bidirectional. The literature reports that the effects of climate change, especially rising temperatures, changing precipitation patterns, and extreme weather events, alter the physical, chemical, and biological properties of water. Conversely, water pollution can also affect climate change; for example, greenhouse gases such as methane and carbon dioxide released from water resources contribute to global warming. GRACE (Gravity Recovery and Climate Experiment), a satellite mission conducted by NASA and the German Aerospace Center (DLR), provides observations that can be used to examine such interactions [3,37]. Judging by the related keywords in the thematic map, GRACE studies appear to be on the rise, and their further integration with ML would be useful for examining these interactions in more detail.
The most widely used ML models in groundwater research include ANN, SVM, decision trees, RF, ANFIS, and, more recently, deep learning methods such as convolutional and recurrent neural networks (CNN, RNN) [16,38]. ANNs learn complex relationships between inputs and outputs, whereas SVMs find hyperplanes that separate the data (or, in regression, fit a function that keeps most errors within a margin). Decision trees are based on if-then rules, whereas ensemble models such as RF average many trees trained on bootstrap samples and random feature subsets, which reduces overfitting and improves generalization. ANFIS combines fuzzy logic and neural networks, while deep learning models tend to perform best on large datasets [16].
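The overfitting contrast between a single decision tree and an RF ensemble can be illustrated with scikit-learn on synthetic data. This is a sketch, not a reproduction of any cited study: the data come from `make_regression` rather than real groundwater records, and the hyperparameters are illustrative assumptions. A fully grown tree typically fits the training set almost perfectly but generalizes worse, which appears as a gap between training and test RMSE.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for groundwater predictors and a target variable.
X, y = make_regression(n_samples=300, n_features=6, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)


def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


models = {
    # Unpruned tree: grows until each leaf is pure, so it memorizes the training data.
    "decision tree": DecisionTreeRegressor(random_state=0),
    # Ensemble of trees on bootstrap samples; averaging reduces variance.
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (rmse(y_train, model.predict(X_train)), rmse(y_test, model.predict(X_test)))
    print(f"{name}: train RMSE={results[name][0]:.1f}, test RMSE={results[name][1]:.1f}")
```

The size of the train/test gap, rather than the training error alone, is what indicates overfitting.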
The accuracy of these models is usually evaluated with metrics such as root mean square error (RMSE), the Nash-Sutcliffe efficiency coefficient (NSE), the correlation coefficient (R), the coefficient of determination (R²), and mean absolute error (MAE) [39]. The processing time of ML models depends on model complexity and data size. Simple models such as decision trees train quickly, whereas SVMs and deep neural networks require longer training times and high processing power on large datasets. The training time of SVMs increases sharply with data size (for kernel SVMs, roughly between quadratically and cubically in the number of samples), whereas deep learning models can be trained on large datasets in reasonable times by exploiting the parallel processing capabilities of hardware such as GPUs. The scalability of models such as CNNs allows them to process large, multivariate datasets efficiently. With appropriate hardware and optimizations, the training and prediction times of deep learning models can be kept at manageable levels [40].
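The evaluation metrics listed above can be written out directly with NumPy. The sketch assumes paired arrays of observed and simulated values, and the function name `evaluate` is hypothetical. R² is computed here as the square of Pearson’s R, a common convention in hydrology; the alternative definition 1 − SS_res/SS_tot coincides with the NSE formula.

```python
import numpy as np


def evaluate(obs, sim):
    """Common goodness-of-fit metrics for observed vs. simulated values."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # NSE: 1 minus the squared-error sum relative to the variance of the observations.
    nse = float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2))
    # Pearson correlation coefficient R; R² is taken as its square.
    r = float(np.corrcoef(obs, sim)[0, 1])
    return {"RMSE": rmse, "MAE": mae, "NSE": nse, "R": r, "R2": r ** 2}


metrics = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

For these inputs, the single error of 1 gives RMSE = 0.5, MAE = 0.25, and NSE = 1 − 1/5 = 0.8, while R stays close to 1 because the simulated series still rises with the observations; this is why R alone can overstate model skill.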
Comparisons of ML algorithms for groundwater estimation in the literature show that they generally reach similar levels of accuracy. A meta-analysis of 197 studies covering 2010 to 2020 found that ANNs are slightly superior to other methods [39]. However, the best-performing model may vary with the problem definition and dataset. For example, in one study, SVM gave the lowest MAE and RMSE [41], while in another, the XGBoost model outperformed all other algorithms [42]. Ensemble models such as RF generally provide more stable results than single models, whereas models such as SVM can overfit small datasets [43]. Deep learning methods, on the other hand, can achieve very high accuracy on large datasets; in one study, a CNN achieved the highest accuracy, outperforming all traditional methods tested [40]. Therefore, many researchers compare multiple algorithms to determine the model that provides the highest accuracy for their problem [39].
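Comparing several algorithms under the same resampling scheme can be sketched with scikit-learn’s `cross_val_score`. The candidate set (SVM, RF, and an ANN via `MLPRegressor`), the hyperparameters, and the synthetic data are illustrative assumptions; the point is that every model is scored on identical K-fold splits with the same metric (RMSE).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data standing in for a groundwater dataset (illustrative only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

# Candidate models; scale-sensitive models get a StandardScaler in their pipeline.
candidates = {
    "SVM": make_pipeline(StandardScaler(), SVR(C=100.0)),
    "RF": RandomForestRegressor(n_estimators=200, random_state=1),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=1),
    ),
}

# A fixed, shuffled 5-fold splitter so every model sees the same folds.
cv = KFold(n_splits=5, shuffle=True, random_state=1)
mean_rmse = {}
for name, model in candidates.items():
    # scikit-learn maximizes scores, so RMSE is returned as a negative value.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    mean_rmse[name] = float(-scores.mean())

best = min(mean_rmse, key=mean_rmse.get)
print(mean_rmse, "best:", best)
```

Which model comes out on top depends on the data, consistent with the mixed results reported in the studies above.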
Remote sensing and Geographic Information Systems (GIS) play a critical role in ML-based groundwater analyses. Remote sensing complements field measurements by providing data over large areas. For example, GRACE satellites detect changes in terrestrial water storage, expressed as equivalent water height, from which groundwater storage changes can be derived, while InSAR techniques provide indirect information by monitoring land subsidence caused by excessive water withdrawal [44]. Such data can serve both as inputs to ML models and as an independent reference for validating model outputs; for example, ML predictions can be compared with GRACE data to assess model reliability. GIS enables spatial analysis with ML models by integrating data from different sources, such as geological maps and well measurements, and also facilitates the visualization of results. Pollution risk zones predicted by ML models can be overlaid with population distribution and land use maps in a GIS environment to clearly reveal at-risk areas [45].
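A raster overlay of the kind described above reduces to element-wise operations once the layers share a grid. The sketch assumes that the ML risk map (values in [0, 1]) and the population map are already co-registered NumPy arrays; reprojection and resampling, which a GIS would normally handle, are omitted, and the 0.7 risk threshold and the `high_risk_exposure` helper are hypothetical.

```python
import numpy as np


def high_risk_exposure(risk, population, risk_threshold=0.7):
    """Overlay an ML risk raster with a population raster on the same grid.

    Returns a boolean mask of populated high-risk cells and the population in them.
    """
    risk = np.asarray(risk, dtype=float)
    population = np.asarray(population, dtype=float)
    if risk.shape != population.shape:
        raise ValueError("risk and population rasters must share the same grid")
    high_risk = risk >= risk_threshold
    exposed = high_risk & (population > 0)
    return exposed, float(population[exposed].sum())


# 2 x 2 toy rasters (values invented for illustration).
risk = [[0.9, 0.2], [0.75, 0.8]]
population = [[120, 300], [0, 50]]
mask, exposed_population = high_risk_exposure(risk, population)
print(mask)
print(exposed_population)
```

The high-risk cell with zero population is excluded from the mask, which is how the overlay separates hazard from exposure.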
The major challenges in ML-based groundwater analyses include data access, computational cost, and model interpretability. In many regions, well measurements are sparse, and water quality data are incomplete or stored in incompatible formats by different institutions [16]. Data sharing restrictions further complicate research [44]. In terms of computational cost, training deep learning models requires high processing power, while researchers in traditional hydrogeology may have limited access to the necessary hardware, such as GPUs and sufficient memory. However, the widespread availability of cloud computing services (Google Earth Engine, AWS, etc.) facilitates the processing of large datasets [39].
The interpretability of model results is also a significant challenge for both researchers and practitioners. ML models, especially complex structures such as deep neural networks, are often considered “black boxes” whose decision mechanisms are difficult to reveal transparently. Yet in fields such as hydrology and hydrogeology, understanding why an algorithm made a particular prediction is of great importance [46]. Explainable AI techniques are therefore being developed to address this challenge. Approaches such as SHapley Additive exPlanations (SHAP) and variable importance techniques such as permutation importance can make the decision processes of ML models more understandable [16]. Generalization and overfitting must also be considered: ML models can overfit the training data and produce large errors on new data. Researchers therefore devote considerable effort to improving the generalization capacity of models through cross-validation and independent test data [39].
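Cross-validation, an independent test set, and permutation importance can be combined in a few lines of scikit-learn. In this sketch, the predictors are synthetic, only the first two affect the target by construction, and the feature names (`rainfall`, `pumping_rate`, and two noise columns) are hypothetical labels. Computing permutation importance on held-out data rather than on the training set measures how much each feature contributes to generalization.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
n = 400
# Synthetic predictors: only the first two columns drive the target.
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
feature_names = ["rainfall", "pumping_rate", "noise_a", "noise_b"]  # hypothetical labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)

# 5-fold cross-validation on the training set estimates generalization.
cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
model.fit(X_train, y_train)
# The untouched test set gives an independent check of that estimate.
test_r2 = model.score(X_test, y_test)

# Permutation importance: mean drop in test R² when each feature is shuffled.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = sorted(zip(feature_names, result.importances_mean), key=lambda t: t[1], reverse=True)
print(f"CV R2 = {cv_r2.mean():.2f}, test R2 = {test_r2:.2f}")
print(ranking)
```

A cross-validated score close to the test score suggests the model is not overfitting, and the importance ranking should place the two informative predictors first.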
In cross-country comparisons of groundwater studies, three countries stand out the most: the USA, China, and India. These countries, which are also among the leading countries in the current study, have given increasing importance to such studies, especially as urbanization has grown [9]. Urbanization brings population growth, which in turn increases water consumption. For this reason, it is essential to prioritize groundwater studies in developed and developing countries where urbanization is increasing. In addition, supporting these studies with ML techniques could contribute substantially to solving urban water problems, especially those related to groundwater.
The contributions of the journals in which groundwater and ML studies are published were also examined. In particular, “Water” and “Science of the Total Environment” attracted attention as prominent Q1 journals in environmental sciences and water resources management [26,37]. The articles in these journals are consistent with the findings of our study. In addition, the Journal of Hydrology stands out in the literature with its high h-index in the modeling of hydrological processes and the effects of climate change on groundwater. The findings of our study are directly related to the themes discussed in these journals.
5. Implications
Future research could focus on the more effective and reliable application of ML models in groundwater management. In particular, integrating physics-based models with data-driven methods is considered an important approach for more comprehensive modeling and understanding of hydrological processes. Although the existing literature states that ML models can make highly accurate predictions, the lack of transparency in their decision-making processes makes it difficult for policymakers to adopt them widely. In this context, developing explainable artificial intelligence (XAI) techniques that provide a better understanding of model outputs may contribute to the more effective use of these technologies in water management and environmental planning. It is also recommended that analyses be based on larger and more diverse datasets to increase the reliability of model training and validation. Integrating remote sensing, hydrological observations obtained by unmanned aerial vehicles (UAVs), and multi-sensor data with long-term climate projections can increase the accuracy and generalizability of prediction models. In this direction, it may be useful to focus on new modeling strategies and interdisciplinary approaches based on large-scale data integration. Furthermore, strengthening international collaborations, developing joint research projects, and establishing data sharing platforms can contribute to more comprehensive and effective solutions for groundwater management. In particular, multinational ML projects for the management of transboundary water basins can enable holistic policy recommendations for regional water crises. Further studies on the reliability of ML models, uncertainty analysis, and model adaptation are also considered an important research area.
In particular, integrating uncertainty assessments into model outputs can contribute to the more effective and reliable use of these models in decision support systems. It may also be useful to adapt the models used for groundwater prediction to include future climate change scenarios instead of relying solely on historical datasets. In this context, drought prediction, water pollution monitoring systems, and analysis of the effects of extreme weather events on hydrological processes could be priority research areas. In addition, it is recommended that efforts focus on developing ML techniques that can model pollutant transport more effectively in order to prevent groundwater pollution. Optimizing models that can predict the distribution of heavy metals, pesticides, and industrial pollutants in groundwater could be an important step toward protecting water resources. The advantages of ML and deep learning algorithms in water quality prediction and monitoring also merit more comprehensive investigation. Finally, it is recommended that awareness efforts be increased to encourage policymakers and public institutions to adopt technological developments in ML and water management. Developing ML-based water management projects not only in an academic framework but also in cooperation with local governments, water administrations, and private sector representatives can provide more holistic and applicable solutions. Encouraging data-driven approaches in water management policy and adopting more conscious and sustainable water management strategies that exploit the predictive capacity of ML models can advance the sector.
6. Conclusions
Within the scope of this study, the use and development of ML techniques in groundwater research were evaluated through a comprehensive bibliometric analysis. Groundwater-themed ML research published between 2000 and 2023 was examined in terms of scientific productivity, collaboration networks, research themes, and methods used. The findings reveal that ML is increasingly applied to a range of topics in groundwater research and has become an important method in the field. The high accuracy and predictive capacity of ML, especially in water quality assessment, water level estimation, and pollution modeling, have made it a valuable tool in this field. Countries such as China, the USA, and Iran stand out as leading centers, reflecting the strategic role of ML in groundwater management. The study also showed a strong connection between groundwater management and ML, the expansion of the research area, and collaboration among researchers from different disciplines. On the other hand, single-country publications (SCP) significantly outnumber multiple-country publications (MCP) in ML-based groundwater research. This indicates that international cooperation is limited and that coordinated global efforts are needed, especially for transboundary water problems. As supported in the literature, if leading countries in this field, such as the USA and China, cooperate with countries with fewer resources and share more information, experience, and materials, the scientific richness and diversity of studies will increase and more comprehensive solutions for groundwater management can be developed.
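The SCP/MCP comparison rests on a simple counting rule. The Python sketch below assumes a common convention in which each publication is attributed to the country of its corresponding author and counted as MCP when its authors' affiliations span more than one country; the MCP ratio, MCP / (SCP + MCP), then summarizes each country's share of internationally co-authored work. The record format, the function name `scp_mcp_by_country`, and the toy records are hypothetical illustrations, not code or data from this study.

```python
from collections import defaultdict

# Hypothetical record format: (corresponding_author_country, countries_of_all_authors).
Record = tuple[str, set[str]]


def scp_mcp_by_country(records: list[Record]) -> dict[str, tuple[int, int, float]]:
    """Return {country: (SCP, MCP, MCP ratio)} keyed by corresponding-author country."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: {"SCP": 0, "MCP": 0})
    for corresponding_country, author_countries in records:
        # A paper is multi-country if its author affiliations cover more than one country.
        countries = set(author_countries) | {corresponding_country}
        kind = "MCP" if len(countries) > 1 else "SCP"
        counts[corresponding_country][kind] += 1

    summary = {}
    for country, c in counts.items():
        total = c["SCP"] + c["MCP"]  # always >= 1, since the key exists only after a count
        summary[country] = (c["SCP"], c["MCP"], c["MCP"] / total)
    return summary


if __name__ == "__main__":
    # Toy records for illustration only; they are not data from this study.
    toy_records = [
        ("China", {"China"}),
        ("China", {"China", "USA"}),
        ("USA", {"USA"}),
        ("Iran", {"Iran"}),
    ]
    for country, (scp, mcp, ratio) in scp_mcp_by_country(toy_records).items():
        print(f"{country}: SCP={scp}, MCP={mcp}, MCP ratio={ratio:.2f}")
```

With these toy records, China has an MCP ratio of 0.50, while the USA and Iran each have 0.00; a field dominated by low ratios of this kind is what the SCP > MCP finding describes.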
Nevertheless, this study has several limitations. Only English-language publications were considered, which risks not fully reflecting the global perspective on ML and groundwater research. In addition, bibliometric analyses usually focus on scientometric indicators such as citation counts, the h-index, and the number of publications rather than directly measuring the actual scientific impact of studies. Differences in citation habits across disciplines also make cross-field comparisons difficult. Moreover, individual researchers or research groups can raise their h-index and total citation counts by frequently citing their own articles, which can lead to misleading evaluations of academic performance. This phenomenon, referred to in the literature as citation inflation, can undermine the reliability of academic metrics by overstating a researcher's scientific impact. Although bibliometric methods can detect self-citations to a certain extent, they are generally unable to filter them out completely. Large academic databases such as Web of Science (WoS) and Scopus can calculate self-citation rates under certain criteria; however, they provide no definitive mechanism for how such citations should be treated in academic evaluations. Within the scientific publishing ecosystem, bibliometric analyses can also play an important role in highlighting certain topics and research groups. High-impact-factor journals can limit scientific diversity by giving more space to certain methodologies, research traditions, or scientists. In particular, the need to publish in the leading journals of a field can make it difficult for innovative or non-mainstream ideas to be adequately represented in the academic literature. Finally, detailed analyses such as performance comparisons of ML algorithms were beyond the scope of this study; questions such as which ML algorithm is most effective under which conditions, or for which environmental parameters it performs best, are left for future work.
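To illustrate how frequent self-citation can inflate the h-index (the largest h such that h publications each have at least h citations), the Python sketch below computes it with and without self-citations. It treats a citation as a self-citation whenever the citing paper shares at least one author with the cited paper; this definition, the `Paper` record format, and the function names are assumptions for illustration, and databases such as WoS and Scopus apply their own criteria.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record: a paper's authors and the author sets of papers citing it."""
    authors: set[str]
    # One entry per received citation: the author set of the citing paper.
    citing_author_sets: list[set[str]] = field(default_factory=list)


def citation_counts(papers: list[Paper], exclude_self: bool = False) -> list[int]:
    """Citations per paper, optionally dropping citations that share an author."""
    counts = []
    for paper in papers:
        n = 0
        for citing_authors in paper.citing_author_sets:
            if exclude_self and paper.authors & citing_authors:
                continue  # shared author: treated as a self-citation (assumed definition)
            n += 1
        counts.append(n)
    return counts


def h_index(counts: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Toy portfolio of researcher "A"; illustrative only, not data from this study.
    portfolio = [
        Paper({"A"}, [{"A"}, {"B"}, {"C"}]),
        Paper({"A", "D"}, [{"A"}, {"E"}, {"D", "F"}]),
        Paper({"A"}, [{"A"}, {"A", "G"}, {"H"}]),
    ]
    print("h-index, all citations:", h_index(citation_counts(portfolio)))
    print("h-index, self-citations excluded:",
          h_index(citation_counts(portfolio, exclude_self=True)))
```

In this toy portfolio the h-index falls from 3 to 1 once citations sharing an author are removed. Because the assumed definition also counts citations by co-authors (such as "D" in the second paper) as self-citations, a stricter author-level rule that excludes only citations by "A" would give a different result, which is one reason self-citation filtering is hard to standardize.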
Future research should incorporate larger and more diverse datasets into groundwater prediction models. Incorporating environmental data such as satellite imagery, climatic variables (e.g., temperature and precipitation), and soil structure into ML models can increase their accuracy and reliability. Combining such data with ML techniques enables more comprehensive analyses of water quantity and quality. The findings of this study suggest that developing ML models that adapt to environmental changes such as climate change, drought, and temperature fluctuations can contribute to the sustainable management of water resources. Flexible models that account for regional and seasonal variability are likely to adapt better to different geographical conditions. Making ML techniques user-friendly and explainable will facilitate their effective use in water management and environmental planning. Understandable and transparent algorithms will increase decision-makers' confidence in ML results and encourage wider adoption of these technologies. In future studies, a deeper examination of citation networks may help develop strategies to reduce self-citation rates. Publishing studies across different disciplines and in a variety of academic journals may enable scientific knowledge to reach a wider academic community. Identifying countries with low levels of collaboration through bibliometric analyses and encouraging international joint projects may increase scientific interaction. In conclusion, although this bibliometric analysis provides important findings on integrated ML and groundwater research, its limitations and potential biases should be taken into account. The use of more comprehensive and balanced datasets may reduce the effects of self-citation and journal-based biases and thereby help strengthen academic collaboration at the global level.