Article

Integrated Assessment of Coastal Groundwater Vulnerability in Western Kingdom of Saudi Arabia Using the DRASTIC Model and Machine Learning Algorithms

1
Water Research Center, King Abdulaziz University, Jeddah 21598, Saudi Arabia
2
Earth Science Department, Faculty of Science, Damanhour University, Damanhour 22511, Egypt
3
Hydrology Department, Desert Research Centre, Cairo 11753, Egypt
4
Department of Water Resources, Faculty of Environmental Sciences, King Abdulaziz University, Jeddah 21598, Saudi Arabia
5
Department of Environment, Faculty of Environmental Sciences, King Abdulaziz University, Jeddah 21598, Saudi Arabia
6
Geology Department, Suez Canal University, Ismailia 41522, Egypt
7
Agricultural Engineering Department, Faculty of Agriculture, Mansoura University, Mansoura 35511, Egypt
8
Agricultural Engineering, Evaluation of Natural Resources Department, Environmental Studies and Research Institute, University of Sadat City, El Sadat City 32897, Egypt
*
Author to whom correspondence should be addressed.
Earth 2026, 7(3), 97; https://doi.org/10.3390/earth7030097 (registering DOI)
Submission received: 2 May 2026 / Revised: 24 May 2026 / Accepted: 2 June 2026 / Published: 4 June 2026

Abstract

Groundwater resources in the Kingdom of Saudi Arabia (KSA) are important for meeting the needs of human communities, agriculture, and industry. In Western KSA, groundwater from coastal aquifers is an essential resource that complements desalinated seawater. Therefore, safeguarding groundwater quality and preventing contamination has emerged as a critical priority for preserving water security. The aim of this research is to evaluate the groundwater quality and its vulnerability to contamination within the Wadi Marawani Basin. To achieve this aim, water quality indices (WQIs), the DRASTIC model, and machine learning (ML) algorithms were employed alongside a Geographic Information System (GIS). The results of the chemical analysis of 64 water samples were used in these assessments. Furthermore, several input parameters were evaluated using the DRASTIC model to estimate the DRASTIC index (DI) and generate a groundwater vulnerability map. Three ML algorithms (a Multilayer Perceptron (MLP), a Random Forest (RF), and a Decision Tree (DT)) were used to predict WQIs such as the total dissolved solids (TDS) and sodium adsorption ratio (SAR), in addition to the DI. The results revealed that around 36% of the samples were classified as fresh water (TDS < 1000 mg/L). The SAR ranged from 1.10 to 32.50, indicating that most samples were suitable for irrigation. Approximately 22% of the basin was classified as demonstrating high vulnerability, whereas about 78% demonstrated low-to-moderate vulnerability. Assessment of the ML models showed high predictive accuracy for the TDS, SAR, and DI. The MLP-Vul. model attained an R² value of 1.00 and an RMSE value of 0.01, the RF-Vul. model achieved an R² of 0.94 and an RMSE of 3.17, and the DT-Vul. model attained an R² of 0.92 and an RMSE of 3.57. Although the RMSE increased slightly for all models during the testing phase, their predictive performance remained strong.

1. Introduction

Today, the world is facing a scarcity of water resources due to increasing aridity and high population growth. Therefore, groundwater serves as a vital resource for many societies, significantly contributing to maintaining agricultural productivity and enhancing global food production [1,2]. It is estimated that fifty percent of the global supply of drinking water, along with a sizable portion of irrigation water, originates from groundwater sources [3]. The worldwide issue of providing safe and good-quality drinking and irrigation water is widely acknowledged [4]. In areas characterized by arid and semi-arid climates, like KSA, groundwater is the primary supply for domestic, industrial, and irrigation uses [5,6,7]. Climate change, combined with agricultural practices, population growth, and rapid urban development, poses significant challenges to the quantity and chemistry of groundwater resources [8]. The issue of groundwater contamination is critical, affecting not only regions that rely on groundwater as their main drinking and irrigation water supply but also the water resource management in these regions [9,10,11,12].
Previous studies have focused on the contamination of groundwater caused by a variety of human activities, such as leaks from storage tanks, chemical spills, landfill operations, and the application of fertilizers and pesticides in agriculture [13,14]. The increasing demand for groundwater resources, driven by population growth and by agricultural, industrial, and urban activities, threatens both their quality and quantity [14,15]. Numerous organizations and communities are actively engaged in mapping and identifying contaminated aquifers to understand the issue and build public knowledge [4]. As a result, measuring water quality for drinking and irrigation uses is essential for water resource management, especially in arid regions [16,17], with effective groundwater management requiring a range of factors to be studied. Evaluating groundwater vulnerability to contamination is an essential step in developing approaches to protect these water resources and prevent pollution [18,19,20]. Statistical analyses of hydrochemical components in groundwater and the application of numerical modeling techniques can aid scientists and groundwater resource managers in mitigating contamination and improving monitoring quality.
Groundwater vulnerability is a complex concept that does not lend itself to a single definition [21]. In general, it refers to the probability of pollutants penetrating the vadose zone and reaching the aquifer (saturated zone), which is affected by the properties of the ground surface and the vadose zone [22,23].
Nitrate (NO3−) contamination in groundwater systems continues to present a significant threat to human health. Various methodologies have been developed to assess groundwater vulnerability [24]. Among these, the index-based DRASTIC model is particularly prevalent (the letters of the acronym DRASTIC refer to seven parameters affecting the vulnerability of groundwater to contamination). The outcomes of the DRASTIC model are frequently represented as vulnerability zonation maps that highlight areas of risk. According to [25], the ability to generate vulnerability maps depends on the availability and interpretation of the collected data. Recognizing vulnerable regions enables water resource managers to focus groundwater (GW) development efforts in safer areas, thus minimizing water treatment expenses. Vulnerability mapping is essential for optimal groundwater management; it assesses the potential for GW pollution from both natural and anthropogenic sources [20,26].
Recently, ML algorithms have become important in evaluating water quality indices (WQIs), as demonstrated in the WQI estimations in [27,28], as well as in enhancing the outcomes derived from the DRASTIC method that is used for estimating the vulnerability index [29]. The studies in [30,31,32] have highlighted ML’s remarkable ability to represent complex nonlinear processes in water resource investigations. Each ML model has distinct advantages and limitations, with varying effectiveness depending on the specific WQIs, as evidenced in the literature review. Numerous ML techniques have been used for representing and forecasting water quality trends; for example, Decision Trees (DTs) were discussed in [33,34], Random Forests (RFs) were highlighted in [32,35], and Multilayer Perceptrons (MLPs) were examined in [31,36,37]. However, determining the optimal ML algorithm for specific water quality concerns remains a challenging task for researchers. The capabilities of MLP, DT, and RF models can help address the limitations of traditional water quality assessment techniques. At the core of water quality management, water quality modeling provides crucial support to governments and policymakers in improving water quality for various purposes. The studies in [7,38,39] underscore that the performance of ML models offers benefits such as improved consistency, feasibility, costs, and speed compared with traditional laboratory analyses. By developing prediction models that leverage a spectrum of physical and chemical properties, these advanced techniques enable effective estimation of diverse WQIs and the groundwater vulnerability index.
The primary objective of this research is to create a vulnerability distribution map and to evaluate the pollution risk of groundwater based on GIS, DRASTIC, and ML models. Machine learning (ML) is a branch of artificial intelligence (AI) concentrated on creating predictive models from real data [14]. The principal aim of every ML algorithm is to attain a precise classification or prediction, which is assessed using metrics such as classification accuracy and area under the curve; in this study, the predicted quantities are the water quality indices (WQIs) and the DRASTIC index, derived from the given input data. Individual ML models possess unique strengths and weaknesses, with their efficacy varying significantly across different water quality parameters and research settings. While many ML methods demonstrate proficiency in simulating and predicting water quality patterns, identifying the most suitable algorithm for addressing contamination issues remains a major challenge. This research addresses this gap by leveraging ML models to mitigate the constraints associated with conventional groundwater quality and vulnerability assessment approaches. The novelty of this study stems from its integrated approach to groundwater assessment of the coastal aquifer systems in Western Saudi Arabia. It combines DRASTIC-based vulnerability mapping with groundwater quality evaluation using WQI and SAR indices, offering a more holistic framework than studies that address these aspects separately. In addition, the research compares the performance of MLP, RF, and DT machine learning models with respect to the region’s arid climatic and hydrogeological conditions, providing valuable insights into their applicability and reliability in coastal aquifers that are still insufficiently studied. The outcomes of using ML to predict groundwater quality and vulnerability indices highlight the advantages of this approach over simpler methodologies.
To avoid a deterioration in groundwater quality and a high level of groundwater vulnerability within the region, several strategies may be explored, including regulating agricultural exploitation, supporting advanced irrigation methods, reducing the use of nitrogen-based fertilizers on farmland, and overseeing municipal and industrial wastewater. Finally, the outcomes of this research are intended to help regional planners and policymakers apply proactive strategies to mitigate groundwater contamination and associated health risks. Furthermore, the study results support future groundwater monitoring and decision-making in similar arid coastal environments.

2. Geological and Hydrogeological Background

This research was carried out in Wadi Marawani Basin, located to the southeast of Rabigh city, Western KSA. The basin is about 90 km northeast of Jeddah and is situated between 39°5′ and 40°15′ E, and 21°50′ and 22°55′ N (Figure 1). As shown in Figure 2a (digital elevation model), the basin exhibits significant variation in altitude, characterized by high mountains reaching approximately 1661 m above mean sea level (AMSL). The elevation of the surrounding plain ranges from 1 m to 400 m (AMSL). The course of the wadi exhibits an uneven pattern, progressing east–west toward the Red Sea [5]. According to rainfall data collected during the hydrologic period from 1966 to 2018, the study area is characterized by arid conditions, with rainfall that varies significantly in both frequency and distribution. As shown in Figure 2b, the annual precipitation levels range from 70 to 110 mm. The geological formations within the studied wadi range from the Pre-Cambrian to the Quaternary periods, as shown in Figure 2c. The upstream portion of the study area is characterized by alkali olivine basalt rocks [5,40].
Alluvial deposits resulting from the weathering of adjacent rock units serve as the base of the wadi. This alluvial layer comprises gravel in shallower regions, as well as clay and fine sand near the watercourse [5]. These deposits serve as an essential renewable water resource, extensively utilized for irrigation, household purposes, and potable water. The quality of groundwater is affected by multiple factors, including rainfall, ion dissolution in aquifer rocks, contamination from surrounding sources, biological activity, residence time, and runoff. This research has direct importance to groundwater evaluation for the coastal aquifer in Wadi Marawani, where high population densities and agricultural practices rely on groundwater resources in the basin. The primary challenges for this coastal aquifer are declining groundwater levels and increasing salinity of the groundwater due to over-pumping of groundwater and seawater intrusion; thus, a strong, scientifically based water management plan is required for the basin. A recent study [6] emphasizes that the Wadi Marawani coastal aquifer is an essential, yet vulnerable, unconfined system composed primarily of Quaternary alluvial deposits. These deposits, ranging in thickness from 10 to 100 m, are essential for local agricultural and domestic needs.

3. Data Collection and Methods

To collect data for this study, a field investigation was conducted that included measuring the depth to groundwater and collecting water samples from 64 sites within the coastal aquifer (Figure 1). Water samples were collected after a pumping period of 5 to 10 min and stored in high-quality polyethylene bottles with a capacity of 500 mL (two groups). To inhibit microbial growth and prevent water property alterations, all the collected samples were preserved at 3–5 °C. Furthermore, the hydraulic parameters for the coastal aquifer were obtained from pumping-test analyses conducted in [6], in accordance with the methodology developed in [41] and subsequently refined in [42]. The method applied in this work consisted of several stages, with each stage having a distinct purpose, as described below.

3.1. Hydrochemical Characterization of Groundwater

The chemical analysis evaluated the cation (K+, Mg2+, Ca2+, Na+) and anion (SO42−, HCO3−, Cl−, NO3−) levels in the 64 collected water samples, as detailed in Table 1. All the water samples were analyzed in accordance with standard analytical protocols [43]. A multi-parameter device (WTW, a Xylem brand, Weilheim, Germany) was used to determine the pH and electrical conductivity (EC) of the collected water samples. The levels of Mg2+, Ca2+, Cl−, HCO3−, and CO32− were determined through titration methods. Additionally, K+ and Na+, as well as sulfate (SO42−) and nitrate (NO3−), were measured using an ultraviolet (UV)–visible spectrophotometer, DR/2040 (Hach Company, Loveland, CO, USA). To verify the quality of the reported results, the accuracy of the ion concentration measurements, expressed in meq/L, was checked using a charge-balance error (CBE) evaluation, with an acceptable threshold of ±5% in accordance with the guidelines established in [44]:
$$\mathrm{CBE} = \frac{\sum \mathrm{Cations} - \sum \mathrm{Anions}}{\sum \mathrm{Cations} + \sum \mathrm{Anions}} \times 100 \tag{1}$$
The quality control analytical methods were verified through proper instrument calibration and confirmation of sample accuracy.
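The charge-balance check can be illustrated with a minimal Python sketch. The function names and the sample concentrations are hypothetical (they are not taken from Table 1); only the formula and the ±5% limit come from the text.

```python
def charge_balance_error(cations_meq, anions_meq):
    """Return the charge-balance error (%) from ion concentrations in meq/L."""
    total_cations = sum(cations_meq.values())
    total_anions = sum(anions_meq.values())
    return (total_cations - total_anions) / (total_cations + total_anions) * 100.0


def passes_cbe(cations_meq, anions_meq, limit=5.0):
    """True when |CBE| is within the acceptance limit (±5% in the text)."""
    return abs(charge_balance_error(cations_meq, anions_meq)) <= limit


# Hypothetical sample in meq/L (illustrative values only).
cations = {"Ca": 5.0, "Mg": 3.0, "Na": 10.0, "K": 0.2}
anions = {"Cl": 11.0, "SO4": 4.0, "HCO3": 2.5, "NO3": 0.3}
cbe = charge_balance_error(cations, anions)
accepted = passes_cbe(cations, anions)
```

A sample whose CBE falls outside the limit would normally be re-analyzed or excluded before any index is computed.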

Water Quality Index (WQI)

The total dissolved solids (TDS) is the concentration of minerals, metals, organic compounds, and salts present in a specific volume of water, measured in milligrams per liter (mg/L). This parameter is closely linked to water quality and purity, especially in the context of water purification systems. Accordingly, the TDS was obtained as follows [45]:
TDS = Ca2+ + Mg2+ + Na+ + K+ + Cl− + SO42− + HCO3− + CO32− + NO3− (all in mg/L)
The sodium adsorption ratio (SAR) is calculated from ion concentrations expressed in milliequivalents per liter (meq/L). It measures the sodium (Na+) concentration relative to the Ca2+ and Mg2+ concentrations. The ratio is obtained by dividing the sodium concentration by the square root of half the sum of the calcium and magnesium concentrations [46], as shown in the following equation [47]:
$$\mathrm{SAR} = \frac{\mathrm{Na}^{+}}{\sqrt{\left(\mathrm{Ca}^{2+} + \mathrm{Mg}^{2+}\right)/2}} \tag{3}$$
Soils with SAR values of 13 or higher are vulnerable to increased dispersion of organic matter and clay particles, which leads to a decrease in saturated hydraulic conductivity (Ksat) and aeration, along with an overall deterioration of soil structure [48,49].
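As a worked illustration of the TDS and SAR definitions above, the following Python sketch sums ion concentrations in mg/L and converts Na+, Ca2+, and Mg2+ to meq/L before applying the SAR formula. The equivalent weights (atomic mass divided by valence) and the sample values are assumptions introduced here for illustration; the function names are hypothetical.

```python
import math

# Equivalent weights in mg/meq (atomic mass / valence); assumed standard values.
EQ_WEIGHT = {"Na": 22.99, "Ca": 40.08 / 2, "Mg": 24.305 / 2}


def tds_from_ions(ions_mg_l):
    """TDS as the sum of the major ion concentrations (all in mg/L)."""
    return sum(ions_mg_l.values())


def sar(na_mg_l, ca_mg_l, mg_mg_l):
    """Sodium adsorption ratio from concentrations given in mg/L."""
    na = na_mg_l / EQ_WEIGHT["Na"]   # convert to meq/L
    ca = ca_mg_l / EQ_WEIGHT["Ca"]
    mg = mg_mg_l / EQ_WEIGHT["Mg"]
    return na / math.sqrt((ca + mg) / 2.0)


# Hypothetical sample: 10 meq/L Na, 4 meq/L Ca, 4 meq/L Mg.
example_sar = sar(229.9, 80.16, 48.61)
example_tds = tds_from_ions({"Ca": 80.16, "Mg": 48.61, "Na": 229.9, "Cl": 300.0})
```

With these inputs the denominator is the square root of (4 + 4)/2, so the ratio is 10/2; a value of 13 or higher would fall in the range the text associates with soil dispersion.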
Table 1. Physicochemical parameters of the 64 water samples collected in Wadi Marawani Basin, Western KSA.
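Min., Max., and Mean refer to the Wadi Marawani Basin samples (number of samples = 64).

| Physicochemical Parameter | Unit | Min. | Max. | Mean | WHO Standard for Drinking [50] |
|---|---|---|---|---|---|
| pH | - | 7.10 | 8.00 | 7.67 | 7.00 |
| EC | µmhos/cm | 658.00 | 28,700.00 | 4905.52 | 1000 |
| TDS | mg/L | 346.00 | 18,171.00 | 2936.54 | 500 |
| K+ | mg/L | 0.79 | 28.10 | 8.12 | 30 |
| Na+ | mg/L | 38.00 | 5150.00 | 588.90 | 200 |
| Mg2+ | mg/L | 9.30 | 710.00 | 129.67 | 200 |
| Ca2+ | mg/L | 11.60 | 2002.00 | 306.85 | 200 |
| Cl− | mg/L | 37.10 | 9666.00 | 1193.12 | 250 |
| SO42− | mg/L | 19.30 | 2840.00 | 609.62 | 100 |
| HCO3− | mg/L | 31.00 | 394.00 | 200.50 | 250 |
| CO32− | mg/L | N.D. | N.D. | N.D. | - |
| NO3− | mg/L | 2.20 | 290.70 | 53.99 | 50 |
| TH | mg/L | 67.18 | 7914.51 | 1298.81 | 600 |
| SAR | meq/L | 1.10 | 32.50 | 6.60 | 13 |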
N.D. means Not Detected.

3.2. DRASTIC Model

The DRASTIC method [51] applied in this study includes seven input parameters: D (depth to groundwater level), R (net recharge), A (aquifer media or formation), S (soil properties), T (topographic slope), I (the impact of the vadose zone), and C (hydraulic conductivity). Each parameter class was assigned a rating (r) from 1 to 10, drawing from existing scholarly sources [18,52]. Additionally, each parameter received a weight (w) between 1 and 5. The DRASTIC approach computes the vulnerability index (DI) by summing the rating (r) of each parameter multiplied by its corresponding weight (w), as shown in Equation (4) [53,54].
$$\mathrm{DI} = D_r D_w + R_r R_w + A_r A_w + S_r S_w + T_r T_w + I_r I_w + C_r C_w \tag{4}$$
In Equation (4), the subscript w denotes the weight and the subscript r denotes the rating of each parameter. Table 2 presents the rating values and the associated weights for each parameter, reflecting their relative significance [51].
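The weighted sum in Equation (4) can be sketched in a few lines of Python. The weights below are the generic DRASTIC weights commonly quoted for the original method and are an assumption here; the weights actually used in this study are those listed in Table 2. The function name and the example ratings are hypothetical.

```python
# Generic DRASTIC weights (assumed; the study's own weights are given in Table 2).
DRASTIC_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}


def drastic_index(ratings, weights=DRASTIC_WEIGHTS):
    """Return DI = sum(rating * weight) over the seven DRASTIC parameters."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for name in weights:
        if not 1 <= ratings[name] <= 10:
            raise ValueError(f"rating for {name} must lie between 1 and 10")
    return sum(ratings[name] * weights[name] for name in weights)


# Hypothetical ratings for one grid cell (illustrative only).
example_ratings = {"D": 7, "R": 3, "A": 8, "S": 6, "T": 10, "I": 8, "C": 4}
example_di = drastic_index(example_ratings)
```

In a GIS workflow the same arithmetic is applied cell by cell to the seven reclassified raster layers, and the resulting DI values are then grouped into vulnerability classes.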

3.3. Machine Learning (ML) Models

In this research, three ML models were used to predict the WQIs and the vulnerability index (DI): Multilayer Perceptron (MLP), Decision Tree (DT), and Random Forest (RF). The models were built with the scikit-learn library (version 1.8.0) in the Spyder environment (version 6.1) to predict the TDS, SAR, and vulnerability indices [55]. Input variables were selected according to whether their relationship with the target variable exceeded a predefined threshold. Comparable approaches have been employed in previous studies [56,57].

3.3.1. Data Preprocessing and Splitting

The dataset was preprocessed to handle null values and outliers. Rows containing null values were removed to avoid any adverse impact on model training. Outliers were replaced by imputation with the most typical values. Preprocessing was performed using the scikit-learn preprocessing module in Python version 3.14.5. The dataset was then split at random, with 70% allocated to training and validation and 30% to testing. This split enabled the predicted outcomes to be compared with the true values on data not used for training, thereby assessing the model’s effectiveness.
To prepare for model training, the dataset was standardized using Equation (5), following the approach established in [58]. This standardization centers the mean of each input variable at zero and sets its standard deviation (SD) to one [59]. In Equation (5), z represents the standardized value, x denotes the original value, μ corresponds to the mean, and σ represents the SD:
$$z = \frac{x - \mu}{\sigma} \tag{5}$$
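The random 70/30 split and the standardization in Equation (5) can be sketched with the Python standard library as follows. This is only an illustration: the study used scikit-learn, the function names are hypothetical, the fixed seed is arbitrary, and estimating μ and σ from the training portion only is an assumption made here to avoid leaking test information.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle row indices and split them into training and testing subsets."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


def fit_standardizer(values):
    """Return the mean and population SD used in z = (x - mu) / sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply z = (x - mu) / sigma to every value."""
    return [(x - mu) / sigma for x in values]


# Hypothetical single-feature dataset with 64 samples.
data = [float(v) for v in range(64)]
train, test = train_test_split(data)
mu, sigma = fit_standardizer(train)
train_z = standardize(train, mu, sigma)
test_z = standardize(test, mu, sigma)  # reuse the training statistics
```

With 64 samples this yields 45 training and 19 testing rows; the standardized training values have zero mean and unit SD by construction.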

3.3.2. Cross-Validation Procedure

To enhance the performance and generalization ability of the models, the grid search technique from the scikit-learn library, with 5-fold cross-validation, was applied to the training dataset [60]. This approach improves generalization, reduces the risk of overfitting, and yields a more detailed evaluation of the model’s predictive ability, as highlighted in [61]. The training dataset is divided into five equal folds, and the model is trained and evaluated five times. During each iteration, four folds are used for training, and the remaining fold is used for validation. The validation fold is rotated across all five folds, so that each fold acts as the validation set exactly once. Model performance metrics are recorded for each iteration. After five iterations, the mean of the metrics is used to evaluate the ML model.
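The fold rotation described above can be sketched as a small index generator. This is a simplified stand-in for the scikit-learn utilities used in the study: it does not shuffle the data, and the function names and fold scores are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, held_out_idx) pairs; each sample is held out exactly once."""
    # Distribute any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, held_out
        start += size


def mean_score(fold_scores):
    """Average the per-fold metric, as done after the five iterations."""
    return sum(fold_scores) / len(fold_scores)


# 45 training samples split into five folds of nine samples each.
folds = list(k_fold_indices(45, k=5))
# Hypothetical per-fold R² values, averaged into one cross-validated score.
cv_score = mean_score([0.90, 0.92, 0.88, 0.91, 0.89])
```

In a grid search, this loop is repeated for every hyperparameter combination, and the combination with the best mean score is retained.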

3.3.3. Model Architecture and Training

The training dataset was used to evaluate multiple hyperparameter combinations and optimize the performance and generalization capabilities of the three ML models using the scikit-learn library’s grid-search approach combined with a 5-fold cross-validation technique. The model demonstrating optimal performance was determined according to the highest R² value and the lowest root mean squared error (RMSE). The hyperparameters, which were fixed before training and not learned from the data, played a vital part in shaping the model’s performance [59,62]. The detailed flowchart in Figure 3 presents the overall process of the ML models used to forecast the TDS, SAR, and vulnerability indices.
Multilayer Perceptron (MLP) Model
Recent studies have highlighted MLP models’ capabilities as powerful regression tools, particularly for tasks involving pattern recognition and function estimation. In contrast to traditional techniques, MLP exhibits exceptional ability to derive significant insights, handle incomplete information, and maintain robustness against outliers, as shown in the studies in [59,63,64]. The Quasi-Newton (QN) technique, as detailed in Equation (6), was chosen as the optimization method to minimize the discrepancy between the predicted and actual values, iteratively fine-tuning the connections, as detailed in [60,65].
$$\omega_{j+1} \leftarrow \omega_j - \alpha \cdot L \cdot \left( \frac{\partial L}{\partial \omega_j} \right)^{-1} \tag{6}$$
where $\omega_{j+1}$ represents the weights of the following iteration, $\omega_j$ denotes the weights of the current iteration, $\alpha$ is the learning rate, and $\partial L / \partial \omega_j$ represents the first partial derivative of the loss function ($L$). To optimize performance and generalization, a range of hyperparameters was tested, including the number of hidden layers (1–5), neurons per layer (2–10), learning rate (fixed at 0.001), maximum iterations (500–1000), and activation functions (see Table 3) [66,67]. According to [68], experience and empirical testing typically guide the determination of an MLP’s structure.
Decision Tree Model (DT)
DT models are well-suited for exploratory knowledge discovery as they require little parameter tuning or specialized domain expertise. The Decision Tree algorithm contains a root node, branches, decision nodes, and leaf nodes, forming a tree-like structure in which decision nodes lead to leaf nodes [7]. Each decision node influences the path taken between nodes, as outlined in [69]. During training, hyperparameter tuning was used to identify the optimal parameters for the best-performing model [70,71]. For training the DT model, the tree depth was varied from 1 to 20. Additionally, the mean squared error (MSE), as shown in Equation (7), was employed to evaluate the quality of each split, as discussed in [72,73].
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( Y_a - Y_p \right)^2 \tag{7}$$
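To illustrate how the MSE can score a candidate split, the sketch below searches one feature for the threshold that minimizes the size-weighted MSE of the two child nodes, taking the node mean as the prediction. It is a simplified illustration of the general regression-tree idea rather than the scikit-learn implementation used in the study; the function names and data are hypothetical.

```python
def node_mse(targets):
    """MSE of a node when its prediction is the mean of its targets."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)


def best_split(feature, targets):
    """Return (threshold, weighted_mse) for the best split on a single feature."""
    pairs = sorted(zip(feature, targets))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) * node_mse(left) + len(right) * node_mse(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical feature/target pairs forming two clearly separated groups.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
threshold, score = best_split(x, y)
```

A full tree applies this search recursively to each child node until a stopping rule, such as the maximum depth, is reached.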
Random Forest Model (RF)
The RF model constitutes a widely adopted and exceptionally efficient ML algorithm utilized for regression and classification purposes in environmental analysis applications. This ensemble technique employs an integration of numerous decision trees to substantially improve the accuracy and dependability of forecasts when compared to single-tree algorithms [74]. At its core, a random forest comprises an ensemble of decision trees generated from random subsets of the available training data, which helps to reduce overfitting and improve generalization to new datasets.
This study systematically considered three important hyperparameters for the RF configuration. The number of trees within the ensemble was varied from 1 to 20 to identify the configuration that best balanced computational performance and predictive precision. The maximum depth of each individual tree was likewise varied from 1 to 20 levels to regulate the complexity of every component tree within the ensemble. The criterion function used for split evaluation was based on the MSE metric described in Equation (7).

3.3.4. Model Evaluation

To assess the differences between the measured values and the predictions produced by the models, the coefficient of determination (R²) and the root mean squared error (RMSE) were used (see Equations (8) and (9)). Here, $Y_a$ represents the actual value measured in the laboratory, $Y_p$ denotes the predicted value, $\bar{Y}$ signifies the mean of the actual values, and $N$ indicates the total number of data points [59].
$$R^2 = 1 - \frac{\sum \left( Y_a - Y_p \right)^2}{\sum \left( Y_a - \bar{Y} \right)^2} \tag{8}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Y_a - Y_p \right)^2} \tag{9}$$
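Both metrics can be computed directly from paired lists of actual and predicted values, as in the following Python sketch. The function names and the example numbers are hypothetical and are not results from this study.

```python
import math


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical measured and predicted values (illustrative only).
y_actual = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
example_r2 = r2_score(y_actual, y_pred)
example_rmse = rmse(y_actual, y_pred)
```

Because the RMSE carries the units of the target variable, values for the TDS, SAR, and DI are not directly comparable with one another, whereas R² is dimensionless.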

4. Results

4.1. Hydrogeology and Quality Assessment for Coastal Aquifer

An assessment of the production wells indicates that the Quaternary aquifer serves as the main groundwater supply in the Wadi Marawani Basin [75]. The coastal area (downstream) and stream courses of the investigated wadi are composed of unconsolidated alluvial sediments consisting mainly of sand, gravel, and silt produced from the erosion of surrounding basement rocks. These sediments constitute the Quaternary aquifer, which serves as the principal renewable groundwater source for domestic use, drinking water supply, and agricultural irrigation within the area. The aquifer is generally of the unconfined type, with semi-arid climatic conditions prevailing in the upstream mountainous regions and increasingly arid conditions toward the downstream areas. The groundwater depth in the wells varies between 2 and 110 m. Based on these measurements, a groundwater table map was created to illustrate groundwater movement, indicating a general flow direction from east to west toward the Red Sea coast (Figure 4). Recharge processes in the basin are highly variable due to fluctuations in rainfall and surface runoff across both time and space. Despite this variability, precipitation remains the dominant source of aquifer recharge, with previous studies reporting the recharge rates in Western Saudi Arabia coastal aquifers as ranging from 17% to 31% of rainfall, with an average close to 24% [76]. In general, groundwater chemistry in the aquifer is influenced by several interacting processes, including recharge from rainfall, mineral dissolution during water–rock interaction, seawater intrusion, and groundwater abstraction intensity.
The results of the chemical analysis conducted on 64 water samples (Table 1) indicated that the average values for pH, EC, TDS, sodium (Na+), calcium (Ca2+), chloride (Cl−), sulfate (SO42−), nitrate (NO3−), and total hardness (TH) exceeded the drinking water quality values established by the World Health Organization [50]. The spatial distributions of TDS, NO3−, and SAR were created using ArcGIS (Figure 5a, Figure 5b and Figure 5c, respectively). The TDS serves as an essential measure of the chemical substances dissolved in water. The TDS values of the groundwater varied between 346.00 and 18,171.00 mg/L, with an average value of approximately 2936.54 mg/L. This indicates a variety of water types within the study area, ranging from fresh to brackish and saline. The TDS distribution map (Figure 5a) exhibits a gradual increase from the upstream to the downstream of the basin. Twenty-three samples, accounting for approximately 36% of the total, were identified as fresh water (TDS < 1000 mg/L) and were mainly located in the central and eastern portions (Figure 5a). In contrast, high TDS values, indicating brackish to saline water, were detected downstream of the basin. This situation can be attributed to several factors, including seawater intrusion, increased well discharge leading to mixing, wastewater leakage, and irrigation water recharge.
The level of NO3− fluctuated widely, between 2.20 and 290.70 mg/L, with a mean of 53.99 mg/L. As presented in Figure 5b, about 72% of the samples indicated a medium-to-high risk of NO3− pollution. High NO3− in the basin is primarily attributed to the influences of domestic and industrial wastewater, agricultural runoff, and excessive fertilizer use [15,77]. The SAR parameter serves as a crucial indicator for assessing water quality in agricultural contexts, particularly concerning the management of soils affected by sodium. The SAR values ranged between 1.10 and 32.50, with a mean of 6.60 (Table 1). The map for SAR distribution and the USSL salinity chart [47], as shown in Figure 5c, indicate that most of the samples in the basin are suitable for irrigation purposes.
Furthermore, the Quaternary coastal aquifer in Wadi Marawani Basin is affected by seawater intrusion, resulting in water quality that is inadequate for various uses. Therefore, seawater intrusion indicators were investigated, including ionic ratios such as Na+/Cl−, Cl−/HCO3−, and Ca2+/Mg2+ [78], which are commonly applied to determine the impact of seawater on groundwater quality. The Na+/Cl− ratio was one of the most consistent indicators, varying between 0.06 and 3.05, with an average of about 0.79. Approximately 71% of the sampled water had a low ratio (less than 0.86), often accompanied by increasing TDS, suggesting seawater mixing. The Cl−/HCO3− ratio ranged from 0.31 to 126.95; about 64% of the water samples had high values (>2.8), suggesting an influence of seawater on the groundwater. The Ca2+/Mg2+ ratio in the sampled water ranged between 0.45 and 4.11, with an average value of about 1.67. When saltwater meets freshwater, reverse ion exchange takes place, resulting in an increase in the Ca2+/Mg2+ ratio (>1.0). In the groundwater of Wadi Marawani, about 86% of the samples exceeded 1.0 and may be classified as seawater intruded. Additionally, a hydrochemical facies evolution diagram (HFE-D) [79] was used to classify the groundwater samples and differentiate between the two primary phases (intrusion or freshening). The HFE-D distinguishes these phases via the conservative mixing line (indicated by the blue line in Figure 6). As indicated by the HFE-D, 76% of the samples were identified as part of the freshening phase, whereas 24% of the samples were classified under the salinization phase (mixing with seawater), suggesting an evolution of seawater intrusion in the study area.
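The three ionic-ratio screens quoted above can be expressed as a short Python sketch. The thresholds (0.86, 2.8, and 1.0) come from the text; treating the inputs as concentrations in meq/L is an assumption made here, and the function name and sample values are hypothetical.

```python
def seawater_indicators(na, cl, hco3, ca, mg):
    """Return ionic ratios and intrusion flags from concentrations (assumed meq/L)."""
    ratios = {
        "Na/Cl": na / cl,
        "Cl/HCO3": cl / hco3,
        "Ca/Mg": ca / mg,
    }
    flags = {
        "Na/Cl": ratios["Na/Cl"] < 0.86,     # low ratio suggests seawater mixing
        "Cl/HCO3": ratios["Cl/HCO3"] > 2.8,  # high ratio suggests seawater influence
        "Ca/Mg": ratios["Ca/Mg"] > 1.0,      # elevated by reverse ion exchange
    }
    return ratios, flags


# Hypothetical saline and fresh samples (illustrative values only).
saline_ratios, saline_flags = seawater_indicators(na=40.0, cl=60.0, hco3=3.0, ca=12.0, mg=8.0)
fresh_ratios, fresh_flags = seawater_indicators(na=3.0, cl=2.0, hco3=4.0, ca=2.0, mg=3.0)
```

Counting how many samples raise each flag reproduces the kind of percentages reported in the paragraph above; agreement among several indicators gives more confidence than any single ratio.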

4.2. Evaluation of Groundwater Vulnerability

The DRASTIC index was used in this research to develop an initial map of groundwater vulnerability in ArcGIS. The seven parameters integral to the DRASTIC model (as shown in Equation (4)) were obtained from primary data sources, as detailed in Table 2. The model is regularly applied using the GIS technique, which allows for the different parameter layers to be integrated and for the creation of vulnerability index (DI) maps. A comprehensive description of the results is presented below.
“D”—depth to water: Groundwater depths in the basin range from 2 to 110 m below the earth’s surface. The presence of relatively shallow groundwater, along with high porosity and permeability of deposits, poses a risk of contamination from human and agricultural activities in the study area [80]. To create the D parameter, data from 64 wells were analyzed using the inverse distance weighting (IDW) method within GIS (Figure 7a). Accordingly, the D parameter was reclassified and assigned a grade between 1 and 10 (Table 2). These classifications are essential for computing the vulnerability index (DI) using the DRASTIC technique.
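The inverse distance weighting step can be illustrated with a minimal Python sketch that estimates the depth to water at an unsampled location from surrounding wells. The power exponent of 2 is a common default and an assumption here (the study's GIS settings are not stated); the function name, coordinates, and depths are hypothetical.

```python
import math


def idw(points, target, power=2.0):
    """Estimate a value at `target` (x, y) from `points` given as (x, y, value)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a measured well
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical wells: (x in km, y in km, depth to water in m).
wells = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint_depth = idw(wells, (1.0, 0.0))
at_well_depth = idw(wells, (0.0, 0.0))
```

Evaluating this function on every cell of a grid produces the interpolated surface, which is then reclassified into the 1 to 10 ratings used by the DRASTIC index.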
“R”—net recharge: The R parameter measures the quantity of surface water, derived from rainfall and irrigation, for each unit zone within the study basin. It reflects the volume of water (measured in mm) that penetrates the saturated zone of the groundwater aquifer through the ground surface [14]. This parameter may be diminished and subject to variation because of progressively more arid climatic conditions. This phenomenon is attributed to climatic changes, in conjunction with variations in rock type, irrigation type, and land use [81]. The R-value plays an important role in understanding the dynamics and migration of contaminants from the vadose zone to the saturated part of the aquifer [82].
The study in [83] estimated the groundwater recharge using the chloride mass balance (CMB) method in some selected wadis in Western Saudi Arabia. The findings highlight that the overall recharge values remain low because rainfall occurs over short periods and mainly generates flash floods, which deposit clay and reduce infiltration. The Wadi Marawani Basin exhibits a low recharge value, with an average of 25.7 mm, a minimum of 16.3 mm, and a maximum of 37.5 mm. A net annual recharge map (Figure 7b) was produced utilizing the ArcGIS software (version 10.5), incorporating additional information on rainfall, evaporation, and runoff. The resultant R-values, instrumental in determining the vulnerability index, were in the range of 0.00 to 50.8 (Table 2). Regions with high recharge rates may be more susceptible to surface pollutants. The patterns of groundwater recharge identified in the study indicate that the middle region exhibits higher rates, whereas the eastern and western regions show comparatively lower rates (Figure 7b).
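The chloride mass balance idea can be sketched in its simplest steady-state form, where recharge equals precipitation multiplied by the ratio of chloride in rainfall to chloride in groundwater. This form neglects runoff and dry deposition and is an assumption here; the exact formulation used in [83] is not reproduced in this section. The input values are hypothetical.

```python
def cmb_recharge(precip_mm, cl_precip_mg_l, cl_groundwater_mg_l):
    """Recharge (mm/yr) = P * Cl_p / Cl_gw under a steady-state chloride balance."""
    if cl_groundwater_mg_l <= 0:
        raise ValueError("groundwater chloride must be positive")
    return precip_mm * cl_precip_mg_l / cl_groundwater_mg_l


# Hypothetical inputs, not values from the study.
recharge = cmb_recharge(precip_mm=100.0, cl_precip_mg_l=10.0, cl_groundwater_mg_l=40.0)
print(recharge)
```

Because groundwater chloride sits in the denominator, strongly evaporated or seawater-affected samples yield very low recharge estimates.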
“A”—aquifer media: The aquifer media (A) relates to the properties of the materials comprising the water-bearing formation (aquifer). It influences the mechanisms of pollutant behavior in the water-bearing formation [84]. The alluvial deposits (aquifer media) resulting from the weathering of the surrounding rocks consist of sand, gravel, and silt, which accumulate in the coastal plain and along the major streams of the wadi being examined. These deposits play an important role in shaping the aquifer distribution within the studied basin. The groundwater aquifers in most basins are primarily composed of underlying rock and soil layers that consist of diverse materials that significantly enhance groundwater potential. The underlying basement rocks were analyzed to identify and outline the base of the aquifer. The aquifer media in the basin are characterized by sand and gravel sediments, earning a ranking of 8 (Table 2). This classification indicates an extremely coarse, porous medium, noted for its superior drainage and transmission capabilities.
“S”—soil media: The characteristics of soil media significantly influence groundwater recharge, water infiltration, contaminant movement, the relationship between groundwater and surface water, and the pollutant removal process. The classification of the soil within the study area was based on environmental engineering and geology and was identified as gravel and sand (Table 2). Ratings and weights were assigned based on soil hydrological characteristics, with values of 9–10 and 2, respectively.
“T”—topography or slope: In the DRASTIC model, T (%) characterizes the slope index, reflecting variations in the topography of the ground surface. In this study, T reflects the terrain gradient in the Wadi Marawani Basin, directly affecting the volume of water, which can seep into the ground surface. There is a direct correlation between the extent of contaminant infiltration into an aquifer and the topographical features of a region [51]: a gentler slope promotes greater infiltration, thereby increasing the likelihood that pollutants will migrate into the aquifer [85]. In contrast, steeper slopes are less prone to contamination because of higher surface runoff. In this research, the constructed digital elevation model (DEM), shown in Figure 2a, was used to create slope layers in the basin. The slope function in ArcGIS was applied to categorize the slope layer into five classes, as shown in Figure 7c. The associated rankings and weights attributed to the T index were derived from Table 2 and Figure 7c. Consequently, the T parameter was assigned the lowest weight (1), with rating values established from 1–10.
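The derivation of percent slope from a DEM can be sketched with central differences on a small grid. This is a simplified approximation and not necessarily the algorithm applied by the ArcGIS slope function; the grid values and the 30 m cell size are hypothetical.

```python
import math


def slope_percent(dem, cell_size):
    """Slope (%) at interior cells of a DEM grid using central differences."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]  # border cells are left undefined
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = 100.0 * math.hypot(dz_dx, dz_dy)
    return out


# Hypothetical 3 x 3 DEM (m) that rises 0.9 m per 30 m cell toward the east.
dem = [[0.0, 0.9, 1.8], [0.0, 0.9, 1.8], [0.0, 0.9, 1.8]]
slope = slope_percent(dem, cell_size=30.0)
print(round(slope[1][1], 2))
```

The resulting slope values would then be grouped into classes and mapped to ratings, with gentler slopes receiving higher ratings.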
“I”—impact of vadose zone media: The Quaternary coastal aquifers in Western Saudi Arabia are primarily composed of unconsolidated sediments, including sand, gravel, silt, and clay, all of which influence the role of the vadose zone in groundwater vulnerability assessments. In the Wadi Marawani Basin, the vadose zone is commonly formed of alluvial deposits with varying sediment textures above the water table, where the soil and rock pores are filled with air [86]. Coarse sand and gravel layers generally promote faster water infiltration, whereas clay-dominated layers tend to restrict downward movement and provide partial protection against contaminant transport. Transport is also influenced by intermittent rainfall, flash flooding, and sediment accumulation within wadis. Flood events frequently leave behind fine-grained clay and silt deposits that can reduce infiltration and limit contaminant migration toward the aquifer. In contrast, zones characterized by fractured formations or coarse alluvial sediments are typically more susceptible to contamination because of their higher permeability and weaker natural filtration capacity. In general, the vadose zone is a key factor controlling surface water infiltration, groundwater recharge, and contaminant transport. Its influence depends mainly on the porosity and permeability of the subsurface sediments, where highly permeable materials promote water movement through the unsaturated zone toward the aquifer. Therefore, the I parameter was assigned a rating of 6 and a weight of 5, so the product of rating and weight (Ir × Iw) is 30 (Table 2).
“C”—hydraulic conductivity of the aquifer: The C parameter denotes the capacity of a water-bearing formation (aquifer) to convey water. It influences the dynamics of water movement within the aquifer, in turn affecting groundwater flow patterns and the mechanisms controlling the transport of contaminants [20,85]. High conductivity facilitates faster groundwater flow, potentially increasing the distance contaminants can migrate. Low hydraulic conductivity, on the other hand, can act as a natural barrier, restricting the movement of contaminants and offering a protective role. The most widely used method for estimating aquifer properties, including the C parameter, is pumping-test analysis. In the Wadi Marawani Basin, long-term pumping tests were performed and analyzed in [6] to determine key aquifer hydraulic parameters such as the transmissivity and hydraulic conductivity. The estimated hydraulic conductivity (C) values showed substantial variation across the study area, ranging from 2.8 × 10⁻⁸ m/day to 25.3 m/day, with a mean of 7.589 m/day. The high C values recorded in some wells suggest highly permeable aquifer zones with good groundwater potential, as evidenced by the rapid recovery of groundwater levels following pumping operations. Accordingly, the C parameter was classified into three groups, as shown in the distribution map constructed using the ArcGIS software (Figure 7d). The corresponding ratings for the C parameter ranged from 4 to 8, and the weight assigned in the Cr × Cw calculation was 5 (Table 2).
After establishing all seven parameters outlined above, the DRASTIC index (DI) was calculated. DI values can theoretically range from 23.00 to 230.00, as noted in [50]. The calculated DRASTIC index varied from 87.00 to 139.00, with an average of approximately 118.2. Accordingly, the DI values were categorized into three groups according to the classification in [14,19] (Figure 7e): low, medium, and high vulnerability.
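The weighted-sum structure of the index can be sketched as follows. The default weights are the commonly cited standard DRASTIC weights, which sum to 23 and therefore reproduce the 23 to 230 theoretical range quoted above; they are an assumption here, and the weights actually applied in the study are those listed in Table 2. The example ratings are hypothetical.

```python
# Commonly cited standard DRASTIC weights (assumption; see Table 2 for the study's values).
STANDARD_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}


def drastic_index(ratings, weights=STANDARD_WEIGHTS):
    """Sum of rating x weight over the seven DRASTIC parameters."""
    for name in weights:
        if not 1 <= ratings[name] <= 10:
            raise ValueError(f"rating for {name} must lie between 1 and 10")
    return sum(ratings[name] * weights[name] for name in weights)


# Hypothetical ratings for a single grid cell.
cell = {"D": 7, "R": 1, "A": 8, "S": 9, "T": 10, "I": 6, "C": 6}
di = drastic_index(cell)
print(di)
```

In a GIS workflow the same sum is evaluated cell by cell on the rated raster layers, and the result is then grouped into vulnerability classes.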

4.3. ML Performance for WQIs and DI Predictions

The three ML models achieved high accuracy in predicting various indices (TDS, SAR, and vulnerability index), demonstrating their utility for water quality assessment. The efficiency of these models was evaluated in the training and testing phases as outlined in [87], with the detailed results presented in Table 4. Moreover, the specific results calculated throughout the training and testing phases for these indices are illustrated in Figure 8. Figure 9 presents the learning curves of the applied models on both the training and validation datasets; the results indicate that none of the three models exhibit overfitting. Their predictive performance consistently improves as the size of the training set increases. In particular, although the Random Forest (RF) model achieves a relatively good fit for the SAR, its instability remains a notable concern. However, this instability diminishes, and the model’s performance stabilizes with enlargement of the training dataset.
During the training phase, the Multilayer Perceptron model for the TDS index (MLP-TDS) surpassed the other models, attaining an R2 value of 1.00 and an RMSE of 25.29. This model, which comprises a single hidden layer containing nine neurons and uses the ReLU activation function, was trained for 500 iterations, as illustrated in Figure 10a. RF-TDS (five trees, depth 6) achieved a training R2 of 0.99 and an RMSE of 301, and DT-TDS (depth 5) reached a similar training R2 (0.99) with an RMSE of 96.68. In testing, MLP-TDS remained the most precise model (R2 = 1.00, RMSE = 61.02; Figure 8), outperforming RF-TDS (R2 = 0.98, RMSE = 360.70) and DT-TDS (R2 = 0.97, RMSE = 400.95).
The MLP-SAR model, designed for the SAR, achieved greater accuracy than the other models during training, attaining an R2 of 1.00 and an RMSE of 0.02. This model, with a single hidden layer of seven neurons and the Tanh activation function, was trained for 700 iterations, as illustrated in Figure 10b. The Random Forest model (RF-SAR) achieved R2 = 0.93 and RMSE = 1.56 using a single tree with a maximum depth of 6, and the DT-SAR model achieved R2 = 1.00 and RMSE = 0.30 with a maximum depth of 7. In the testing phase (Figure 8d–f), the MLP-SAR model retained high precision in forecasting the SAR, with an R2 of about 0.90 and an RMSE of 1.08. The tree-based models were less accurate on the test data: RF-SAR attained R2 = 0.64 and RMSE = 2.03, and DT-SAR attained R2 = 0.61 and RMSE = 2.11.
Among the three ML models, the Multilayer Perceptron model for the vulnerability index (MLP-Vul.) achieved the highest accuracy during the training phase, with an R2 of 1.00 and an RMSE of <0.01. The MLP-Vul. model, consisting of two hidden layers with four neurons each and using the activation function, was trained for 500 iterations, as shown in Figure 10c. The Random Forest model for the vulnerability index (RF-Vul.) achieved R2 = 0.98 and RMSE = 1.60 using three trees with a maximum depth of 6. The decision tree model for the vulnerability index (DT-Vul.) achieved R2 = 1.00 and RMSE = 0.56 with a maximum depth of 5. In the testing phase, all three models remained accurate (Figure 8g–i): the MLP-Vul. model achieved an R2 of 1.00 and an RMSE < 0.01, the RF-Vul. model obtained R2 = 0.94 and an RMSE of 3.17, and the DT-Vul. model had an R2 of 0.92 and an RMSE of 3.57. Although the RMSE increased slightly across all models during testing, their predictive capabilities remained evident.
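For readers less familiar with the two scores reported throughout this section, the sketch below computes them from observed and predicted values. It assumes the standard definitions (coefficient of determination and root mean square error); the article's own formulas are given elsewhere and are not reproduced here. The numbers are hypothetical.

```python
import math


def r2_rmse(observed, predicted):
    """Return (R2, RMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical values, not results from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
r2, rmse = r2_rmse(observed, predicted)
print(round(r2, 3), round(rmse, 3))
```

RMSE carries the units of the target, so the RMSE values for TDS (mg/L) and SAR (dimensionless) are not comparable with one another, whereas R2 is.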

5. Discussion

The Quaternary aquifer is commonly recognized as the essential water-bearing formation in the Wadi Marawani Basin, primarily due to its large geometry and significant storage capacity. The aquifer is categorized as unconfined, exhibiting limited thickness and storage potential, and is dependent on recent precipitation. The Quaternary aquifer is developed in the downstream part of the wadi and adjacent to the basin channels, where it reaches a maximum thickness of approximately 100 m. It consists of permeable and porous materials, such as sand, silt, and gravel. The proximity of groundwater to the surface, along with these highly porous and permeable deposits, could result in contamination from both human and agricultural activities within the study area. The groundwater quality ranges from fresh to saline and is suitable for irrigation in most parts of the basin.
According to the findings of the DRASTIC model, about 78% of the Wadi Marawani Basin is characterized by low-to-medium vulnerability, with the western (downstream), eastern, and central portions of the basin showing similar conditions. A high-vulnerability zone, comprising 22% of the area, is identified in the southern and northern parts of the basin. Comparison of the nitrate distribution map (Figure 5b) with the vulnerability map obtained from the DRASTIC model (Figure 7e) indicates that certain areas with high NO3− concentrations align with areas with a high DRASTIC index, especially in the southern and northern sections of the basin. This relationship suggests that agricultural fertilizers used in cultivated lands are a major source of groundwater contamination. However, because farming activities are not uniformly distributed across the study area, some locations correspond well with the vulnerability map while others do not. Alongside agricultural practices, natural environmental factors also affect the susceptibility of groundwater to contamination: shallow groundwater levels, combined with highly permeable sandy and gravelly sediments, facilitate the infiltration of surplus irrigation water into the aquifer, thereby heightening the risk of pollution. The DRASTIC model therefore serves as an essential resource for local authorities in managing groundwater resources and formulating policies to prevent contamination, as it helps pinpoint regions that are particularly vulnerable to pollution. On the other hand, the original DRASTIC model has certain limitations, including a subjective approach to assigning ratings and weights and significant sensitivity to the recharge parameter. To overcome these challenges, researchers have developed modified versions of the model, often by refining the weights or integrating additional parameters.
Each ML model comes with distinct advantages and limitations, showing varying effectiveness depending on the water quality indices. Numerous ML techniques are adept at mimicking and forecasting water quality trends; however, determining the optimal ML algorithm for specific water quality concerns remains a challenging task for researchers. The capabilities of the MLP, DT, and RF models applied in this study can help address the limitations of traditional techniques in groundwater quality and vulnerability assessment. To mitigate the overfitting risk and ensure the models did not unduly learn from the training data, we performed hyperparameter optimization using k-fold cross-validation. The entire dataset was first split into training (70%) and hold-out testing (30%) sets, and cross-validation was conducted entirely on the training set, which was partitioned into k = 5 random folds. In each iteration, one fold served as the validation set, and the model was trained on the remaining folds. The performance metrics (R2 and RMSE) were averaged across all iterations to select the optimal model. We implemented this process using GridSearchCV, which takes the machine learning model, hyperparameter grid, and number of folds (k = 5) as inputs and outputs the optimal estimator with its hyperparameters. The final model, chosen through cross-validation, was assessed on the reserved test set, and the results are listed in Table 4. The small discrepancy between the training and test set performance (R2 and RMSE) suggests that overfitting was limited, supporting our choice of k = 5. Figure 8 plots the predicted versus actual values. The data exhibit a positive linear trend, consistent with the regression slope, with most points aligning closely. This linear relationship and point distribution indicate low model variance.
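The tuning procedure described above can be sketched with scikit-learn's GridSearchCV. Everything specific in this example is an assumption made for illustration: the data are synthetic, the estimator is a decision tree, and the hyperparameter grid and random seeds are hypothetical rather than the settings used in the study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for 64 samples with three predictors (hypothetical data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(64, 3))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0.0, 0.5, size=64)

# 70/30 split; the test set is held out and never seen during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Five random folds drawn from the training set only.
folds = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [3, 5, 7]},  # hypothetical grid
    cv=folds,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# The best estimator is refit on the full training set, then scored once on the test set.
best_model = search.best_estimator_
test_r2 = best_model.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Keeping the test set outside the search is what makes the final score an honest estimate; tuning on it would leak information into the selected hyperparameters.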
Accurate forecasting of WQIs and vulnerability assessments requires understanding the detailed physicochemical and environmental factors contributing to quality deterioration. Modern multivariate regression techniques have surpassed traditional methods, offering both improved predictive performance and practical advantages. Recent research demonstrates the superior performance of ML approaches, particularly MLP, RF, and DT models, in WQI prediction [88,89,90]. Notably, Ref. [57] achieved exceptional results using trace element data, with their RF model reaching 98.99% accuracy and MLP achieving 98.65% accuracy in WQI prediction. Similarly, Ref. [28] corroborated the efficiency of MLP, RF, and DT models for WQI prediction, with R2 values ranging from 0.68 to 0.99. Ref. [39] employed RF and DT models using physicochemical data inputs, achieving R2 values of 0.93 and 0.87, respectively. Ref. [32] demonstrated comparable performance between MLP and RF models (R2 = 0.92) for predicting WQIs and vulnerability assessments using physical characteristics. Ref. [88] highlighted the operational efficiency of DT algorithms, showing their ability to rapidly classify and forecast water quality when paired with hydrochemical parameter analysis. Furthermore, Ref. [31] identified a strong correlation (R2 = 0.95) between the MLP model and measured WQIs, emphasizing the effectiveness of multivariate regression models in accurately estimating WQIs. In a related application, ML algorithms, including Support Vector Machines (SVMs), Random Forest (RF), and Generalized Linear Models (GLMs), were used to enhance the results of the DRASTIC model. Among these, the RF model achieved exceptional predictive performance (AUC = 0.98), significantly outperforming GLMs and SVMs (AUC ≈ 0.76). These results underscore the utility of ML algorithms as valuable tools for groundwater resource evaluation and management.
Finally, the use of ML models in groundwater evaluation and management has increased significantly in recent years [91,92], being applied across various areas, including the optimization of groundwater monitoring systems, decision making on water quality, predicting water levels, and planning for groundwater vulnerability [93]. Such applications are instrumental in enhancing groundwater management strategies, covering various aspects from exploration to quality assessment and distribution for multiple purposes [92]. The results of the present study demonstrate that the approach of employing a range of ML algorithms, such as the MLP, RF, and DT models, together with the parameters typically used in more traditional methods like DRASTIC and GIS, is effective for planning groundwater resources in the Wadi Marawani Basin. With an integrated approach, significant advancements can be achieved in assessing groundwater quality (WQIs) and contamination vulnerability. This understanding offers valuable guidance for environmental monitoring and water resource management, particularly in arid regions [93,94]. By clarifying the importance of the input variables for precise predictions, the research highlights how machine learning enhances water quality indicators and vulnerability assessments, facilitating expedited computations and considerable time and effort savings. The study advocates for the widespread adoption of ML models, particularly MLP, RF, and DT, by resource managers and organizations involved in water quality monitoring. These models serve as robust alternatives to traditional WQIs and vulnerability index calculation methods and are particularly beneficial when existing methodologies involve numerous sub-index formulas and complex computations. The comprehensive and reliable nature of ML models underscores their utility for enhancing water quality and groundwater vulnerability evaluation, and for optimizing resource management strategies. 
The resultant groundwater quality and vulnerability maps are valuable tools for guiding future work, such as enhancing quality monitoring systems (ground remote sensing and hyperspectral approaches) and conducting detailed studies in regions recognized as extremely vulnerable.
This study represents an important contribution to water resource development and management in an arid coastal region of KSA. Its findings can support government agencies and farmers in improving water conservation and sustainable groundwater management for both agricultural and domestic use. Furthermore, the results provide a useful foundation for developing a decision support system (DSS) to direct land-use planning, the drilling of new production wells, and the protection of groundwater resources in arid environments. Future studies applying machine learning are also recommended to use larger and multi-seasonal datasets.

6. Limitations and Future Work

6.1. Methodological Limitations

While the ML models in this study demonstrated exceptional performance (with R2 values approaching 1.0) in predicting the DRASTIC Index (DI), Total Dissolved Solids (TDS), and Sodium Adsorption Ratio (SAR), it is important to contextualize these metrics. A known methodological limitation of this study is the inherent mathematical overlap between the input predictors and the target variables. The standard DI is calculated as a weighted sum of its hydrogeological components; SAR is derived from a strict mathematical formula involving Na+, Ca2+, and Mg2+; and TDS is the physical sum of major dissolved ions.
Consequently, the near-perfect R2 values do not reflect the simulation of highly heterogeneous, regional-scale physical groundwater flow. Rather, they demonstrate the models’ robust capacity to capture, approximate, and reconstruct these deterministic formulas and chemical mass balances. This approach, however, was intentionally adopted as a proof-of-concept to identify the relative influence of individual parameters and to explore the potential for massive model simplification.
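The deterministic character of two of the targets can be made concrete with a short sketch. The SAR expression is the standard one with concentrations in meq/L; treating TDS as a plain sum of the listed ions is a simplification made here for illustration. All sample values are hypothetical.

```python
import math


def sar(na_meq, ca_meq, mg_meq):
    """Sodium adsorption ratio from concentrations in meq/L."""
    return na_meq / math.sqrt((ca_meq + mg_meq) / 2.0)


def tds_from_ions(ions_mg_l):
    """Simplified TDS estimate (mg/L) as the sum of the supplied ion concentrations."""
    return sum(ions_mg_l.values())


# Hypothetical sample, not taken from the study data.
sar_value = sar(na_meq=10.0, ca_meq=4.0, mg_meq=4.0)
tds_value = tds_from_ions({"Na": 230.0, "Ca": 80.0, "Mg": 48.0, "Cl": 355.0, "SO4": 192.0})
print(sar_value, tds_value)
```

A model that receives Na+, Ca2+, and Mg2+ as inputs and SAR as the target is therefore learning a closed-form expression, which explains why near-perfect fits are attainable.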

6.2. Practical Implications and Future Work

By leveraging this modeling approach, the results successfully demonstrated that complex, nonlinear ML algorithms can accurately reproduce these critical groundwater quality and vulnerability indices using a significantly reduced subset of features. Originally, a total of 14 physicochemical parameters (pH, EC, TDS, K+, Na+, Mg2+, Ca2+, Cl−, SO42−, HCO3−, CO32−, NO3−, TH, and SAR) were analyzed. Based on the ML performance, the inputs required to accurately forecast both TDS and SAR were reduced to just five unique parameters: Ca2+, Mg2+, Na+, SO42−, and Cl−. This represents a reduction of approximately 64% compared to the full parameter suite. Similarly, for the DRASTIC Index, the required inputs were reduced from seven parameters down to only three (Depth to water table [D], Topography/slope [T], and Hydraulic conductivity [C]).
Standard vulnerability and water quality assessments are highly resource-intensive, requiring extensive field investigations and numerous laboratory-measured parameters. Building upon the findings of this study, our future research will focus on a rigorous economic comparison between conventional index calculations and these proposed AI-based approaches. We anticipate that by relying on fewer, simpler inputs, AI-driven models will achieve equivalent predictive utility while delivering meaningful reductions in computation time, data collection efforts, and overall financial costs for environmental decision-makers.

7. Conclusions

Groundwater is increasingly at risk of contamination worldwide. Historically, spatially distributed methods have been used to evaluate the quality and vulnerability of this vital resource. Traditional techniques have been constrained by their reliance on fixed or semi-fixed variables and weights. For planning groundwater resources in the study area, it was essential to assess their quality and vulnerability using a range of novel models and GIS tools. The implementation of machine learning (ML) methodologies in this research therefore represents notable progress in groundwater research, providing effective solutions to existing challenges. This research demonstrates how machine learning algorithms, specifically MLP, RF, and DT, can adaptively modify coefficients for specific instances and successfully incorporate site-specific variables into the evaluations of groundwater quality and vulnerability. A comparison of the vulnerability index (DI) map with the distribution map of nitrate data indicates that regions with high nitrate concentrations align with areas with a higher DRASTIC index. The use of ML for predicting groundwater quality and vulnerability indices highlights the advantages of this approach over simpler methodologies. The ML models used in this research demonstrate R2 values frequently exceeding 0.95, with some reaching 0.999. These models effectively predict the water quality indices (WQIs) and vulnerability indices, facilitating rapid water quality assessments even with a limited number of parameters, thereby reducing both monitoring costs and time. To avoid deterioration of groundwater quality and heightened groundwater vulnerability within the region, several strategies may be explored. These strategies encompass regulating agricultural exploitation, supporting advanced irrigation methods, reducing the use of nitrogen-based fertilizers on farmland, and overseeing municipal and industrial wastewater.
Additionally, while the current model is specifically calibrated to the Wadi Marawani Basin and would require recalibration for direct use elsewhere, the underlying methodological framework is fully transferable. Future work will focus on validating this approach in other watersheds and developing a broader regional model. Finally, the outcomes of this research can be disseminated to help regional planners and policymakers apply proactive strategies to mitigate groundwater contamination and associated health risks.

Author Contributions

Conceptualization, M.E.O., M.M. and S.E.; fieldwork, M.E.O., M.M. and A.A.; methodology, M.M., M.E.O., N.A.-A., A.A., R.H., M.R., M.S.A.E.-b. and S.E.; software, M.E.O., M.M., M.S.A.E.-b. and S.E.; validation, A.A., N.A.-A., R.H., M.R., M.E.O. and M.M.; investigation, N.A.-A., R.H. and M.M.; resources, M.E.O., A.A. and M.M.; data curation, M.E.O., M.M., S.E. and M.S.A.E.-b.; writing—original draft preparation, M.E.O., M.M., S.E., M.S.A.E.-b., M.R., N.A.-A., A.A. and R.H.; writing—review and editing, M.E.O., M.M., S.E., M.R., M.S.A.E.-b., N.A.-A., A.A. and R.H. All authors have read and agreed to the published version of the manuscript.

Funding

This project was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, Saudi Arabia, under grant no. (IPP: 142-123-2025). The authors, therefore, acknowledge with thanks DSR for the technical and financial support.

Data Availability Statement

All data for this study are provided as tables and figures.

Acknowledgments

The authors acknowledge with gratitude the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, Saudi Arabia, for the technical and financial support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, P.; Karunanidhi, D.; Subramani, T.; Srinivasamoorthy, K. Sources and consequences of groundwater contamination. Arch. Environ. Contam. Toxicol. 2021, 80, 1–10. [Google Scholar] [CrossRef]
  2. Mergia, T.J.; Tesfay, A.H.; Tesfamariam, G.M.; Sbhatu, D.B.; Gebresilasie, K.G.; Berhe, G.G. Assessment on Physicochemical Quality of Tap and Bottled Water in Mekelle City, Ethiopia; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2024. [Google Scholar] [CrossRef]
  3. Gleeson, T.; Befus, K.M.; Jasechko, S.; Luijendijk, E.; Cardenas, M.B. The global volume and distribution of modern groundwater. Nat. Geosci. 2016, 9, 161–167. [Google Scholar] [CrossRef]
  4. Iqbal, Z.; Imran, M.; Rahman, G.N.; Miandad, M.; Shahid, M.; Murtaza, B. Spatial distribution, health risk assessment, and public perception of groundwater in Bahawalnagar, Punjab, Pakistan: A multivariate analysis. Environ. Geochem. Health 2023, 45, 381–391. [Google Scholar] [CrossRef] [PubMed]
  5. Alqarawy, A. Assessment of shallow groundwater aquifer in an arid environment, Western Saudi Arabia. J. Afr. Earth Sci. 2023, 200, 104864. [Google Scholar] [CrossRef]
  6. Masoud, M.; El Osta, M.; Alqarawy, A.; Niyazi, B. Optimal management of the groundwater coastal aquifer based on the hydraulic characteristics in Wadi Al Marwani basin: KSA. Environ. Earth Sci. 2023, 82, 308. [Google Scholar] [CrossRef]
  7. El Osta, M.; Masoud, M.; Niyazi, B.; Al-Amri, N.; Alqarawy, A.; El-baki, M.S.A.; Elsayed, S. Utilizing machine learning algorithms to improve predictions of groundwater quality indices for irrigation in an arid environment of Saudi Arabia. Environ. Earth Sci. 2025, 84, 389. [Google Scholar] [CrossRef]
  8. Zhang, D.; Wang, P.; Cui, R.; Yang, H.; Li, G.; Chen, A.; Wang, H. Electrical conductivity and dissolved oxygen as predictors of nitrate concentrations in shallow groundwater in Erhai Lake region. Sci. Total Environ. 2022, 802, 149879. [Google Scholar] [CrossRef]
  9. El Osta, M.; Masoud, M.; Alqarawy, A.; Elsayed, S.; Gad, M. Groundwater Suitability for Drinking and Irrigation Using Water Quality Indices and Multivariate Modeling in Makkah Al-Mukarramah Province, Saudi Arabia. Water 2022, 14, 483. [Google Scholar] [CrossRef]
  10. Shinwari, F.U.; Liaquat, U.; Khan, M.A.; Kontakiotis, G.; Makri, P.; Lianou, V.; Antonarakou, A. Hydrochemical zonation and depth-based vulnerability of groundwater in Islamabad using GIS and WQI techniques. Front. Water 2025, 7, 1581668. [Google Scholar] [CrossRef]
  11. Makri, P.; Hermides, D.; Kontakiotis, G.; Zarkogiannis, S.D.; Besiou, E.; Janjuhah, H.T.; Antonarakou, A. Integrated Ecological Assessment of Heavily Polluted Sedimentary Basin within the Broader Industrialized Area of Thriassion Plain (Western Attica, Greece). Water 2022, 14, 382. [Google Scholar] [CrossRef]
  12. Hermides, D.; Makri, P.; Kontakiotis, G.; Antonarakou, A. Advances in the Coastal and Submarine Groundwater Processes: Controls and Environmental Impact on the Thriassion Plain and Eleusis Gulf (Attica, Greece). Mar. Sci. Eng. 2020, 8, 944. [Google Scholar] [CrossRef]
  13. Chamanehpour, E.; Sayadi, M.H.; Yousefi, E. The potential evaluation of groundwater pollution based on the intrinsic and the specific vulnerability index. Groundw. Sustain. Dev. 2020, 10, 100313. [Google Scholar] [CrossRef]
  14. Motlagh, Z.; Derakhshani, R.; Sayadi, M. Groundwater vulnerability assessment in central Iran: Integration of GIS based DRASTIC model and a machine learning approach. Groundw. Sustain. Dev. 2023, 23, 101037. [Google Scholar] [CrossRef]
  15. Teng, Y.; Zuo, R.; Xiong, Y.; Wu, J.; Zhai, Y.Z.; Su, J. Risk assessment framework for nitrate contamination in groundwater for regional management. Sci. Total Environ. 2019, 697, 134102. [Google Scholar] [CrossRef]
  16. Solangi, G.S.; Siyal, A.A.; Babar, M.M.; Siyal, P. Application of water quality index, synthetic pollution index, and geospatial tools for the assessment of drinking water quality in the Indus Delta, Pakistan. Environ. Monit. Assess. 2019, 191, 731. [Google Scholar] [CrossRef] [PubMed]
  17. Das, A. Harnessing hydro chemical characterization of surface water using water quality indices and machine learning-driven water quality modelling with special emphasis on side-stream pollution. Desalin. Water Treat. 2025, 324, 101592. [Google Scholar] [CrossRef]
  18. Masoud, M.H.; El Osta, M.M. Evaluation of groundwater vulnerability by using modeling and GIS techniques in El-Bahariya Oasis-Western Desert-Egypt. J. Earth Syst. Sci. 2016, 125, 1139–1155. [Google Scholar] [CrossRef]
  19. Kang, J.; Zhao, L.; Li, R.; Mo, H.; Li, Y. Groundwater vulnerability assessment based on modified DRASTIC model: A case study in Changli County, China. Geocarto Int. 2017, 32, 749–758. [Google Scholar] [CrossRef]
  20. El Osta, M.; Niyazi, B.; Masoud, M. Groundwater evolution and vulnerability in semi-arid regions using modeling and GIS tools for sustainable development: Case study of Wadi Fatimah, Saudi Arabia. Environ. Earth Sci. 2022, 81, 248. [Google Scholar] [CrossRef]
  21. Vrba, J.; Zaporožec, A. Guidebook on mapping groundwater vulnerability. In International Association of Hydrogeologists; Heise, H., Ed.; The International Association of Hydrogeologists: Hannover, Germany, 1994. [Google Scholar]
  22. Katyal, D.; Tomer, T.; Joshi, V. Recent trends in groundwater vulnerability assessment techniques: A review. Int. J. Appl. Res. 2017, 3, 646–655. [Google Scholar]
  23. Goyal, D.; Haritash, A.K.; Singh, S.K. A comprehensive review of groundwater vulnerability assessment using index-based, modelling, and coupling methods. J. Environ. Manag. 2021, 296, 113161. [Google Scholar] [CrossRef]
  24. Zupo, A.; Paula, R.; Sampaio, J.; Júnior, J.; Melo, M. Comparative study of standard and modified groundwater vulnerability methods in the gold and iron mining regions of Western Quadrilátero Ferrífero, Brazil. J. Contam. Hydrol. 2025, 270, 104516. [Google Scholar] [CrossRef]
  25. Shrestha, S.; Kafle, R.; Pandey, V. Assessment of groundwater vulnerability and risk to pollution in Kathmandu Valley, Nepal. Sci. Total Environ. 2017, 575, 779–790. [Google Scholar] [CrossRef]
  26. Rezaei, A.; Sayadi, M.H.; Zadeh, R.J.; Mousazadeh, H. Assessing the hydrogeochemical processes through classical integration of groundwater parameters in the Birjand plain in eastern Iran. Groundw. Sustain. Dev. 2021, 15, 100684. [Google Scholar] [CrossRef]
  27. Tiyasha; Tung, T.M.; Yaseen, Z.M. A survey on river water quality modelling using artificial intelligence models: 2000–2020. J. Hydrol. 2020, 585, 124670. [Google Scholar] [CrossRef]
  28. Khoi, D.N.; Quan, N.T.; Linh, D.Q.; Nhi, P.T.; Thuy, N.T.D. Using machine learning models for predicting the water quality index in the La Buong River, Vietnam. Water 2022, 14, 1552. [Google Scholar] [CrossRef]
  29. Khosravi, K.; Sartaj, M.; Tsai, F.; Singh, V.; Kazakis, N.; Melesse, A.; Prakash, I.; Bui, D.; Pham, B. A comparison study of DRASTIC methods with various objective methods for groundwater vulnerability assessment. Sci. Total Environ. 2018, 642, 1032–1049. [Google Scholar] [CrossRef] [PubMed]
  30. Nearing, G.S.; Kratzert, F.; Sampson, A.K.; Pelissier, C.S.; Klotz, D.; Frame, J.M.; Prieto, C.; Gupta, H.V. What role does hydrological science play in the age of machine learning. Water Resour. Res. 2021, 57, e2020WR028091. [Google Scholar] [CrossRef]
  31. Gazzaz, N.M.; Yusoff, M.K.; Aris, A.Z.; Juahir, H.; Ramli, M.F. Artificial neural network modeling of the water quality index for Kinta River (Malaysia) using water quality variables as predictors. Mar. Pollut. Bull. 2012, 64, 2409–2420. [Google Scholar] [CrossRef] [PubMed]
  32. El Bilali, A.; Taleb, A.; Brouziyne, Y. Groundwater quality forecasting using machine learning algorithms for irrigation purposes. Agric. Water Manag. 2021, 245, 106625. [Google Scholar] [CrossRef]
  33. Radhkrishnan, N.; Pillai, A.S. Comparison of water quality classification models using machine learning. In Proceedings of the 2020 5th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 10–12 June 2020; IEEE: New York, NY, USA, 2020; pp. 1183–1188. [Google Scholar]
  34. Ahamed, M.; Mumtaz, R.; Zaidi, S.M.H. Analysis of water quality indices and machine learning techniques for rating water pollution: A case study of Rawal Dam, Pakistan. Water Supply 2021, 21, 3225–3250. [Google Scholar] [CrossRef]
  35. Naloufi, M.; Lucas, F.S.; Souihi, S.; Servais, P.; Janne, A.; De Abreu, T.W.M. Evaluating the performance of machine learning approaches to predict the microbial quality of surface waters and to optimize the sampling effort. Water 2021, 13, 2457. [Google Scholar] [CrossRef]
  36. Raheja, H.; Goel, A.; Pal, M. Prediction of groundwater quality indices using machine learning algorithms. Water Pract. Technol. 2022, 17, 336–351. [Google Scholar] [CrossRef]
  37. Bowes, B.D.; Wang, C.; Ercan, M.B.; Culver, T.B.; Beling, P.A.; Goodall, J.L. Reinforcement learning-based real-time control of coastal urban stormwater systems to mitigate flooding and improve water quality. Environ. Sci. Water Res. Technol. 2022, 8, 2065–2086. [Google Scholar] [CrossRef]
  38. Wong, W.Y.; Al-Ani, A.K.I.; Hasikin, K.; Khairuddin, A.S.M.; Razak, S.A.; Hizaddin, H.F.; Mokhtar, M.I.; Azizan, M.M. Water quality index using modified random forest technique: Assessing novel input features. CMES-Comput. Model. Eng. Sci. 2022, 132, 1011–1038. [Google Scholar]
  39. Bui, D.T.; Khosravi, K.; Tiefenbacher, J.; Nguyen, H.; Kazakis, N. Improving prediction of water quality indices using novel hybrid machine-learning algorithms. Sci. Total Environ. 2020, 721, 137612. [Google Scholar] [CrossRef] [PubMed]
  40. Al-ahmadi, M.E.; El-Fiky, A.A. Hydrogeochemical evaluation of shallow alluvial aquifer of Wadi Marwani, western Saudi Arabia. J. King Saud Univ. (Sci.) 2009, 21, 179–190. [Google Scholar] [CrossRef]
  41. Theis, C.V. The relation between the lowering of the piezometric surface and the rate and duration of discharge of a well using groundwater storage. Eos Trans. Am. Geophys. Union 1935, 16, 519–524. [Google Scholar] [CrossRef]
  42. Cooper, H.H.; Jacob, C.E. A generalized graphical method for evaluating formation constants and summarizing well field history. Eos Trans. Am. Geophys. Union 1946, 27, 526–534. [Google Scholar]
  43. APHA. Standard Methods for the Examination of Water and Wastewater, 22nd ed.; American Public Health Association: Washington, DC, USA; American Water Works Association: Denver, CO, USA; Water Environment Federation: Alexandria, VA, USA, 2012. [Google Scholar]
  44. Domenico, P.A.; Schwartz, F.W. Physical and Chemical Hydrogeology, 2nd ed.; John Wiley & Sons Inc.: New York, NY, USA, 1998. [Google Scholar]
  45. Ayers, R.; Westcot, D. Water Quality for Agriculture. In FAO Irrigation and Drainage Paper 29 Rev. 1; Food and Agriculture Organization of the United Nations: Rome, Italy, 1994. [Google Scholar]
  46. Mohammed, S.; Arshad, S.; Bashir, B.; Ata, B.; Al-Dalahmeh, M.; Alsalman, A.; Ali, H.; Alhennawi, S.; Kiwan, S.; Harsanyi, E. Evaluating machine learning performance in predicting sodium adsorption ratio for sustainable soil-water management in the eastern Mediterranean. J. Environ. Manag. 2024, 370, 122640. [Google Scholar] [CrossRef] [PubMed]
  47. Richards, L.A. Diagnosis and Improvement of Saline and Alkali Soils. In US Department of Agriculture Handbook; U.S. Department of Agriculture: Washington, DC, USA, 1954. Available online: https://www.ars.usda.gov/ARSUserFiles/20360500/hb60_pdf/hb60complete.pdf (accessed on 3 May 2026).
  48. Alwan, I.A.; Karim, H.H.; Aziz, N.A. Groundwater Aquifer Suitability for Irrigation Purposes Using Multi-Criteria Decision Approach in Salah Al-Din Governorate/Iraq. AgriEngineering 2019, 1, 303–323. [Google Scholar] [CrossRef]
  49. Arora, P.; Rani, N.; Anand, A. Assessment of Soil Quality of Rice Fields Under Irrigation with Different Water Sources. Curr. Agric. Res. J. 2024, 12, 694–704. [Google Scholar] [CrossRef]
  50. World Health Organization (WHO). Guidelines for Drinking Water Quality: Fourth Edition Incorporating the First Addendum; World Health Organization: Geneva, Switzerland, 2017; p. 631. [Google Scholar]
  51. Aller, L.; Lehr, J.H.; Petty, R.; Bennett, T. DRASTIC—A Standardized System to Evaluate Groundwater Pollution Potential Using Hydrogeologic Setting. J. Geol. Soc. India 1987, 29, 23–37. [Google Scholar] [CrossRef]
  52. Mishima, Y.; Takada, M.; Kitagawa, R. Evaluation of intrinsic vulnerability to nitrate contamination of groundwater: Appropriate fertilizer application management. Environ. Earth Sci. 2010, 63, 571–580. [Google Scholar] [CrossRef]
  53. Nadiri, A.A.; Gharekhani, M.; Khatibi, R.; Moghaddam, A.A. Assessment of groundwater vulnerability using supervised committee to combine fuzzy logic models. Environ. Sci. Pollut. Res. 2017, 24, 8562–8577. [Google Scholar] [CrossRef] [PubMed]
  54. Awais, M.; Aslam, B.; Maqsoom, A.; Khalil, U.; Ullah, F.; Azam, S.; Imran, M. Assessing Nitrate Contamination Risks in Groundwater: A Machine Learning Approach. Appl. Sci. 2021, 11, 10034. [Google Scholar] [CrossRef]
  55. Hfaiedh, E.; Gaagai, A.; Petitta, M.; Ben Moussa, A.; Mlayah, A.; Eid, M.H.; Szűcs, P.; Elsayed, S.; El-Baki, M.S.A.; Elbeltagi, A.; et al. Hydrogeochemical characterization and water quality evaluation associated with toxic elements using indexing approaches, multivariate analysis, and artificial neural networks in Morang, Tunisia. Environ. Earth Sci. 2025, 84, 361. [Google Scholar] [CrossRef]
  56. Gad, M.; El-Safa, M.M.A.; Farouk, M.; Hussein, H.; Alnemari, A.M.; Elsayed, S.; Khalifa, M.M.; Moghanm, F.S.; Eid, E.M.; Saleh, A.H. Integration of water quality indices and multivariate modeling for assessing surface water quality in Qaroun Lake, Egypt. Water 2021, 13, 2258. [Google Scholar] [CrossRef]
  57. Hassan, M.; Hassan, M.; Akter, L.; Rahmani, M.; Zaman, S.; Hasib, K.; Jahan, N.; Smrity, R.N.; Farhana, J.; Raihan, M.; et al. Efficient prediction of water quality index (WQI) using machine learning algorithms. Hum.-Centric Intell. Syst. 2021, 1, 86–97. [Google Scholar] [CrossRef]
  58. Thara, D.; Premasudha, B.; Xiong, F. Auto-detection of epileptic seizure events using deep neural network with different feature scaling techniques. Pattern Recognit. Lett. 2019, 128, 544–550. [Google Scholar] [CrossRef]
  59. El-Baki, M.S.A.; Ibrahim, M.M.; Elsayed, S.; Yaseen, Z.M.; El-Fattah, N.G.A. Water status and plant traits of dry bean assessment using integrated spectral reflectance and RGB image indices with artificial intelligence. Sci. Rep. 2025, 15, 16808. [Google Scholar] [CrossRef]
  60. Moghanm, F.S.; Ali, R.A.; Abowaly, M.E.; Gharib, M.S.; Abbas, A.M.; Szűcs, P.; Eid, M.H.; Elwakeel, A.E.; Elsayed, S.; El-Baki, M.S.A.; et al. Evaluating land degradation and environmental hazards in North delta Egypt using machine learning and GIS approaches. Sci. Rep. 2025, 15, 37749. [Google Scholar] [CrossRef] [PubMed]
  61. Zhu, J.; Huang, Z.; Sun, H.; Wang, G. Mapping forest ecosystem biomass density for Xiangjiang river basin by combining plot and remote sensing data and comparing spatial extrapolation methods. Remote Sens. 2017, 9, 241. [Google Scholar] [CrossRef]
  62. Hossain, M.R.; Timmer, D. Machine learning model optimization with hyper parameter tuning approach. Glob. J. Comput. Sci. Technol. 2021, 21, 7–13. [Google Scholar]
  63. Elsayed, S.; El-Hendawy, S.; Khadr, M.; Elsherbiny, O.; Al-Suhaibani, N.; Alotaibi, M.; Tahir, M.U.; Darwish, W. Combining thermal and RGB imaging indices with multivariate and data-driven modeling to estimate the growth, water status, and yield of potato under different drip irrigation regimes. Remote Sens. 2021, 13, 1679. [Google Scholar] [CrossRef]
  64. Huang, J.; Zhang, Y.; Bing, H.; Peng, J.; Dong, F.; Gao, J.; Arhonditsis, G.B. Characterizing the river water quality in China: Recent progress and on-going challenges. Water Res. 2021, 201, 117309. [Google Scholar] [CrossRef]
  65. Yang, K.; Hao, J.; Wang, Y. Switching angles generation for selective harmonic elimination by using artificial neural networks and quasi-newton algorithms. In Proceedings of the IEEE Energy Conversion Congress and Exposition (ECCE), Milwaukee, WI, USA, 18–22 September 2016; IEEE: New York, NY, USA, 2016; pp. 1–5. [Google Scholar]
  66. Sharma, S.; Sharma, S.; Anidhya, A. Activation functions in neural networks. Towards Data Sci. 2020, 6, 310–316. [Google Scholar] [CrossRef]
  67. Liu, G.; Yang, S.; Zhong, Y. A computational framework for interface design using lattice matching, machine learning potentials, and active learning: A case study on LaCoO3/La2NiO4. Mater. Today Phys. 2025, 59, 101940. [Google Scholar] [CrossRef]
  68. Mijwel, M.M. Artificial neural networks advantages and disadvantages. Mesopotamian J. Big Data 2021, 2021, 29–31. [Google Scholar] [CrossRef]
  69. Han, J.; Pei, J.; Tong, H. Data Mining: Concepts and Techniques, 3rd ed.; Morgan Kaufmann: Burlington, MA, USA, 2022. [Google Scholar]
  70. Xia, Y.; Liu, C.; Li, Y.; Liu, N. A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring. Expert Syst. Appl. 2017, 78, 225–241. [Google Scholar] [CrossRef]
  71. Elsayed, S.; Gala, H.; El-Baki, M.S.A.; Maher, M.; Elbeltagi, A.; Salem, A.; Elwakeel, A.E.; Elsherbiny, O.; El-Fattah, N.G.A. Hyperspectral technology and machine learning models to estimate the fruit quality parameters of mango and strawberry crops. PLoS ONE 2025, 20, e0313397. [Google Scholar] [CrossRef]
  72. Ahmad, M.W.; Reynolds, J.; Rezgui, Y. Predictive modelling for solar thermal energy systems: A comparison of support vector regression, random forest, extra trees and regression trees. J. Clean. Prod. 2018, 203, 810–821. [Google Scholar] [CrossRef]
  73. Elvanidi, A.; Katsoulas, N.; Augoustaki, D.; Loulou, I.; Kittas, C. Crop reflectance measurements for nitrogen deficiency detection in a soilless tomato crop. Biosyst. Eng. 2018, 176, 1–11. [Google Scholar] [CrossRef]
  74. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  75. Al-Shaibani, A.M. Hydrogeology and hydrochemistry of a shallow alluvial aquifer, western Saudi Arabia. Hydrogeol. J. 2008, 16, 155–165. [Google Scholar] [CrossRef]
  76. Niyazi, B.; Ahmed, M.; Masoud, M.; Rashed, M.; Basahi, J. Sustainable and resilient management scenarios for groundwater resources of the Red Sea coastal aquifers. Sci. Total Environ. 2019, 690, 1310–1320. [Google Scholar] [CrossRef]
  77. Zhang, Q.; Wei, A.; Ren, J.; Qian, H.; Hou, K. Multi-isotope tracer for identifying nitrate sources in shallow groundwater in a large irrigation area, China. J. Environ. Manag. 2025, 376, 124424. [Google Scholar] [CrossRef]
  78. Jones, B.F.; Vengosh, A.; Rosenthal, E.; Yechieli, Y. Geochemical investigation of groundwater quality. In Seawater Intrusion in Coastal Aquifers-Concepts, Methods and Practices; Springer: Kluwer, The Netherlands, 1999; pp. 51–71. [Google Scholar]
  79. Giménez-Forcada, E.; San Román, F.J.S. An excel macro to plot the HFE-Diagram to identify seawater intrusion phases. Ground Water 2015, 53, 819–824. [Google Scholar] [CrossRef]
  80. Olea-Olea, S.; Escolero, O.; Mahlknecht, J.; Ortega, L.; Silva-Aguilera, R.; Florez-Peñaloza, J.R.; Perez-Quezadas, J.; Zamora-Martinez, O. Identification of the components of a complex groundwater flow system subjected to intensive exploitation. J. South Am. Earth Sci. 2019, 98, 102434. [Google Scholar] [CrossRef]
  81. Lerner, D.N.; Issar, A.S.; Simmers, I. Groundwater Recharge: A guide to understanding and estimating Natural Recharge. In International Contributions to Hydrogeologists; CRC Press: Boca Raton, FL, USA, 1990; Volume 10. [Google Scholar]
  82. Zomer, R.J.; Xu, J.; Trabucco, A. Version 3 of the global aridity index and potential evapotranspiration database. Sci. Data 2022, 9, 409. [Google Scholar] [CrossRef] [PubMed]
  83. El Osta, M.; Masoud, M.; Al-Amri, N.; Alqarawy, A.; Halawani, R.; Rashed, M. Estimation of groundwater recharge by chloride mass balance (CMB) method in some selected wadis, Western Saudi Arabia in (1966–2018). Environ. Earth Sci. 2025, 84, 321. [Google Scholar] [CrossRef]
  84. Shrestha, S.; Semkuyu, D.J.; Pandey, V. Evaluation of index-overlay methods for groundwater vulnerability and risk assessment in Kathmandu Valley, Nepal. Sci. Total Environ. 2016, 556, 23–35. [Google Scholar] [CrossRef]
  85. Ijlil, S.; Essahlaoui, A.; Mohajane, M.; Essahlaoui, N.; Mili, E.M.; Rompaey, A.V. Machine learning algorithms for modeling and mapping of groundwater pollution risk: A study to reach water security and sustainable development (SDG) goals in a Mediterranean aquifer system. Remote Sens. 2022, 14, 2379. [Google Scholar] [CrossRef]
  86. Güler, C.; Kurt, M.A.; Korkut, R.N. Assessment of groundwater vulnerability to nonpoint source pollution in a Mediterranean coastal zone (Mersin, Turkey) under conflicting land use practices. Ocean Coast. Manag. 2013, 71, 141–152. [Google Scholar] [CrossRef]
  87. Târcoveanu, F.; Leon, F.; Lisa, C.; Curteanu, S.; Feraru, A.; Ali, K.; Anton, N. The use of artificial neural networks in studying the progression of glaucoma. Sci. Rep. 2024, 14, 19597. [Google Scholar] [CrossRef]
  88. Gorgan-Mohamadi, F.; Rajaaee, T.; Zounemat-Kermani, M. Decision tree models in predicting water quality parameters of dissolved oxygen and phosphorus in lake water. Sustain. Water Resour. Manag. 2003, 9, 1. [Google Scholar] [CrossRef]
  89. Fang, X.; Li, X.; Zhang, Y.; Zhao, Y.; Qian, J.; Hao, C.; Zhou, J.; Wu, Y. Random forest-based understanding and predicting of the impacts of anthropogenic nutrient inputs on the water quality of a tropical lagoon. Environ. Res. Lett. 2021, 16, 055003. [Google Scholar] [CrossRef]
  90. Zhang, Y.; Yao, X.; Wu, Q.; Huang, Y.; Zhou, Z.; Yang, J.; Liu, X. Turbidity prediction of lake-type raw water using random forest model based on meteorological data: A case study of Tai Lake, China. J. Environ. Manag. 2021, 290, 112657. [Google Scholar] [CrossRef]
  91. Zaresefat, M.; Derakhshani, R. Revolutionizing Groundwater Management with Hybrid AI Models: A Practical Review. Water 2023, 15, 1750. [Google Scholar] [CrossRef]
  92. Gómez-Escalonilla, V.; Martínez-Santos, P. A Machine Learning Approach to Map the Vulnerability of Groundwater Resources to Agricultural Contamination. Hydrology 2024, 11, 153. [Google Scholar] [CrossRef]
  93. Khan, J.; Lee, E.; Balobaid, A.S.; Kim, K. A Comprehensive Review of Conventional, Machine Leaning, and Deep Learning Models for Groundwater Level (GWL) Forecasting. Appl. Sci. 2023, 13, 2743. [Google Scholar] [CrossRef]
  94. Al-Falal, A.N.A.; Elsayed, S.; El Fadaly, E.A.; Gaagai, A.; Aouissi, H.A.; El-Baki, M.S.A.; Eid, M.H.; Elwakeel, A.E.; Yaseen, Z.M.; Elsherbiny, O.; et al. Aquatic System Assessment of Potentially Toxic Elements in El Manzala Lake, Egypt: A Statistical and Machine Learning Approach. Results Eng. 2025, 26, 105027. [Google Scholar] [CrossRef]
Figure 1. Key map of Wadi Marawani Basin, Western KSA, and location of groundwater wells and water samples.
Figure 2. Digital elevation model (a), annual distribution map of average rainfall in mm (b), and geological map (c) of Wadi Marawani Basin, Western KSA.
Figure 3. Flowchart depicting a broad outline of the ML models designed to predict TDS, SAR, and vulnerability (DI) indices.
Figure 4. Groundwater level and flow direction map in Wadi Marawani Basin, Western KSA.
Figure 5. Spatial distribution maps: (a) total dissolved solids (TDS) in mg/L, (b) nitrates (NO3) in mg/L, (c) sodium adsorption ratio (SAR) in meq/L, and USSL salinity diagram in Wadi Marawani Basin, Western KSA.
Figure 6. HFE diagram [78] for groundwater hydrochemical facies in Wadi Marawani Basin, Western KSA.
Figure 7. Spatial distribution maps: (a) Depth to water table (m), (b) net recharge (mm), (c) slope (%), (d) hydraulic conductivity (m/day), and (e) vulnerability index in Wadi Marawani Basin, Western KSA.
Figure 8. Testing phase results of (a,d,g) MLP, (b,e,h) RF, and (c,f,i) DT pertaining to the association between observed and predicted TDS, SAR, and vulnerability indices (*** for R2, statistically significant at p ≤ 0.001).
Figure 9. Learning curves generated by the ML models based on the number of samples using five-fold cross-validation. (a–c) MLP, (d–f) RF, and (g–i) DT pertaining to the association between observed and predicted TDS, SAR, and vulnerability indices, respectively.
Figure 10. MLP diagrams for detecting (a) TDS, (b) SAR, and (c) vulnerability indices.
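Table 2. The range, rate, and weights of the DRASTIC parameters [51].

| DRASTIC Parameter | Range | Rating (r) | Weight (w) |
|---|---|---|---|
| D: Depth to water table (m) | 0–1.5 | 10 | 5 |
| | 1.5–4.6 | 9 | |
| | 4.6–9.1 | 7 | |
| | 9.1–15.2 | 5 | |
| | 15.2–22.8 | 3 | |
| | 22.8–30.4 | 2 | |
| | >30.4 | 1 | |
| R: Net recharge (mm) | 0–50.8 | 1 | 4 |
| | 50.8–101.6 | 3 | |
| | 101.6–177.8 | 6 | |
| | 177.8–254 | 8 | |
| | >254 | 9 | |
| A: Aquifer media | Gravel | 9 | 3 |
| | Sand and gravel | 8 | |
| | Limestone, gravel, sand, and clay | 7 | |
| | Sandy clay | 6 | |
| | Clay | 5 | |
| S: Soil media | Thin or absent | 10 | 2 |
| | Gravel | 10 | |
| | Sand | 9 | |
| | Peat | 8 | |
| | Aggregated clay | 7 | |
| | Sandy loam | 6 | |
| | Loam | 5 | |
| | Silty loam | 4 | |
| | Clay loam | 3 | |
| | Muck | 2 | |
| | Non-aggregated clay | 1 | |
| T: Slope (%) | 0–2 | 10 | 1 |
| | 2–6 | 9 | |
| | 6–12 | 5 | |
| | 12–18 | 3 | |
| | >18 | 1 | |
| I: Impact of vadose zone | Karst | 10 | 5 |
| | Basalt | 9 | |
| | Sand and gravel | 8 | |
| | Sandstone | 6 | |
| | Limestone/sandstone | 6 | |
| | Sand, gravel, and alluvium | 6 | |
| | Clay/alluvium | 3 | |
| | Calcareous | 3 | |
| | Confined aquifer | 1 | |
| C: Hydraulic conductivity (m/day) | 0.4–4.1 | 1 | 3 |
| | 4.1–12.3 | 2 | |
| | 12.3–28.7 | 4 | |
| | 28.7–41 | 6 | |
| | 41–82 | 8 | |
| | >82 | 10 | |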
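As an illustration of how the ratings and weights in Table 2 combine, the following minimal Python sketch computes a DRASTIC index as the weighted sum of the seven ratings, DI = DrDw + RrRw + ArAw + SrSw + TrTw + IrIw + CrCw. The function names, the dictionary layout, the treatment of class boundaries (a value equal to an upper bound is assigned to the lower class), and the example cell are assumptions made for illustration; they are not taken from the study's GIS workflow.

```python
from bisect import bisect_left

# Weights (w) as listed in Table 2.
WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}

# Range-based parameters: (upper bound of each class, rating of each class).
# The last rating applies to values above the final bound.
# Assumption: a value equal to a bound falls in the lower class.
RANGE_RATINGS = {
    "D": ([1.5, 4.6, 9.1, 15.2, 22.8, 30.4], [10, 9, 7, 5, 3, 2, 1]),
    "R": ([50.8, 101.6, 177.8, 254.0], [1, 3, 6, 8, 9]),
    "T": ([2, 6, 12, 18], [10, 9, 5, 3, 1]),
    "C": ([4.1, 12.3, 28.7, 41, 82], [1, 2, 4, 6, 8, 10]),
}

# Categorical parameters: class name -> rating, as listed in Table 2.
CATEGORY_RATINGS = {
    "A": {"Gravel": 9, "Sand and gravel": 8,
          "Limestone, gravel, sand, and clay": 7, "Sandy clay": 6, "Clay": 5},
    "S": {"Thin or absent": 10, "Gravel": 10, "Sand": 9, "Peat": 8,
          "Aggregated clay": 7, "Sandy loam": 6, "Loam": 5, "Silty loam": 4,
          "Clay loam": 3, "Muck": 2, "Non-aggregated clay": 1},
    "I": {"Karst": 10, "Basalt": 9, "Sand and gravel": 8, "Sandstone": 6,
          "Limestone/sandstone": 6, "Sand, gravel, and alluvium": 6,
          "Clay/alluvium": 3, "Calcareous": 3, "Confined aquifer": 1},
}


def rating(param, value):
    """Return the Table 2 rating for one parameter value."""
    if param in RANGE_RATINGS:
        bounds, ratings = RANGE_RATINGS[param]
        return ratings[bisect_left(bounds, value)]
    return CATEGORY_RATINGS[param][value]


def drastic_index(cell):
    """Weighted sum of the seven ratings for one map cell."""
    return sum(WEIGHTS[p] * rating(p, cell[p]) for p in WEIGHTS)


# Hypothetical cell, not a measured location from the basin.
example_cell = {
    "D": 12.0,                          # depth to water table (m)
    "R": 40.0,                          # net recharge (mm)
    "A": "Sand and gravel",             # aquifer media
    "S": "Sand",                        # soil media
    "T": 1.5,                           # slope (%)
    "I": "Sand, gravel, and alluvium",  # vadose zone
    "C": 15.0,                          # hydraulic conductivity (m/day)
}
print(drastic_index(example_cell))  # 25 + 4 + 24 + 18 + 10 + 30 + 12 = 123
```

With these ratings and weights, the index can range from 35 (every parameter at its lowest listed rating) to 226 (every parameter at its highest), which gives a sense of scale for the vulnerability classes.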
Table 3. Types of activation function in MLP models.

| Name | Equation |
|---|---|
| Hyperbolic Tangent (Tanh) | $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ |
| Logistic (Sigmoid) | $f(x) = \dfrac{1}{1 + e^{-x}}$ |
| Rectified Linear Unit (ReLU) | $f(x) = \max(0, x)$ |
| Linear (Identity) | $f(x) = x$ |
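The four activation functions in Table 3 can be written directly from their equations. The sketch below uses only the Python standard library; the function names are illustrative, and the sigmoid is evaluated in a numerically stable form (an implementation choice, not something stated in the table) so that large negative inputs do not overflow.

```python
import math


def tanh_act(x):
    """Hyperbolic tangent: (e^x - e^-x) / (e^x + e^-x)."""
    return math.tanh(x)


def sigmoid(x):
    """Logistic function 1 / (1 + e^-x), split by sign to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def identity(x):
    """Linear (identity) activation: returns the input unchanged."""
    return x


for name, f in [("tanh", tanh_act), ("sigmoid", sigmoid),
                ("relu", relu), ("identity", identity)]:
    print(name, [round(f(v), 4) for v in (-2.0, 0.0, 2.0)])
```

Tanh maps inputs to (−1, 1) and the sigmoid to (0, 1), whereas ReLU and the identity are unbounded above, which is why the choice of activation is commonly treated as a tunable hyperparameter.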
Table 4. Performance of various ML models in forecasting TDS, SAR, and vulnerability indices.

| Parameter | Model | Optimal Input Parameters | Training R² | Training RMSE | Testing R² | Testing RMSE |
|---|---|---|---|---|---|---|
| TDS | MLP-TDS | Ca, Mg, Na, SO4, Cl | 0.9999 *** | 25.29 | 0.9993 *** | 61.02 |
| | RF-TDS | Ca, Mg, Na, SO4, Cl | 0.9939 *** | 301 | 0.9767 *** | 360.70 |
| | DT-TDS | Ca, Mg, Na, SO4, Cl | 0.9994 *** | 96.68 | 0.9712 *** | 400.95 |
| SAR | MLP-SAR | Na, Ca, Mg | 0.9999 *** | 0.02 | 0.8979 *** | 1.08 |
| | RF-SAR | Na, Ca, Mg | 0.929 *** | 1.56 | 0.6358 ** | 2.03 |
| | DT-SAR | Na, Mg | 0.9975 *** | 0.30 | 0.6087 ** | 2.11 |
| Vulnerability index (DI) | MLP-Vul. | DrDw, TrTw, CrCw | 1.0 *** | 9.59 × 10⁻¹² | 1.0 *** | 1.45 × 10⁻¹¹ |
| | RF-Vul. | DrDw, TrTw, CrCw | 0.9848 *** | 1.60 | 0.9385 *** | 3.17 |
| | DT-Vul. | DrDw, TrTw, CrCw | 0.9981 *** | 0.56 | 0.9221 *** | 3.57 |
** and ***: statistically significant at p ≤ 0.01 and 0.001, respectively.
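For readers who want to reproduce metrics of the kind reported in Table 4, the sketch below computes RMSE and R² from paired observed and predicted values. It assumes R² is the coefficient of determination, 1 − SS_res/SS_tot; this section does not state which definition the study used (the significance stars suggest a squared correlation coefficient may have been reported instead), so treat the formula as an assumption. The sample values are invented for illustration and are not data from the basin.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(round(rmse(observed, predicted), 4))       # 0.1581
print(round(r_squared(observed, predicted), 4))  # 0.98
```

RMSE is expressed in the units of the target, so the TDS values in Table 4 (mg/L) are not directly comparable with those for SAR or DI; R² is unitless and easier to compare across targets.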
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
El Osta, M.; Masoud, M.; Al-Amri, N.; Alqarawy, A.; Halawani, R.; Rashed, M.; El-baki, M.S.A.; Elsayed, S. Integrated Assessment of Coastal Groundwater Vulnerability in Western Kingdom of Saudi Arabia Using the DRASTIC Model and Machine Learning Algorithms. Earth 2026, 7, 97. https://doi.org/10.3390/earth7030097