Hydrogeochemical Controls and Explainable Machine Learning for Reliable Prediction of Fluoride Contamination in Groundwater

Gulzar, Nighat; Liao, Xin; Xu, Zhongyuan; Rehman, Amir

doi:10.3390/hydrology13060144


by Nighat Gulzar ¹, Xin Liao ¹, Zhongyuan Xu ¹ and Amir Rehman ²,³,⁴,*

¹ Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu 611756, China

² School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China

³ School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China

⁴ Sichuan-Chongqing Joint Key Laboratory of Digital Transportation, Southwest Jiaotong University, Chengdu 611756, China

* Author to whom correspondence should be addressed.

Hydrology 2026, 13(6), 144; https://doi.org/10.3390/hydrology13060144

Submission received: 4 April 2026 / Revised: 25 May 2026 / Accepted: 27 May 2026 / Published: 29 May 2026


Abstract

Fluoride contamination in groundwater poses a significant public-health concern in many semi-arid regions, including the Punjab alluvial aquifers of Pakistan, where local concentrations exceed the WHO guideline. Reliable prediction and mechanistic interpretation of fluoride dynamics are key to targeted monitoring and risk mitigation. This study developed an integrated hydrogeochemical machine-learning framework to predict fluoride concentration and classify exceedance risk in the Rechna Doab aquifer of Tehsil Jaranwala, Punjab, Pakistan. Conventional models (linear regression, random forest, XGBoost) and a deep tabular model (FT-Transformer) were evaluated using nested cross-validation and an independent test set. Model reliability was assessed with discrimination and probability-calibration metrics, while Shapley Additive Explanations (SHAP) and permutation importance were applied to identify the main hydrogeochemical controls on fluoride prediction. Robustness was further tested through noise-sensitivity experiments. Fluoride concentrations showed a positively skewed distribution, with localized exceedances related to geogenic and hydrochemical influences. Nonlinear models greatly outperformed the linear baseline; XGBoost showed robust regression performance (test R² = 0.878; RMSE ≈ 0.190 mg/L). The FT-Transformer showed strong exceedance-classification performance, with high sensitivity (recall = 0.875) and good probability calibration (Brier ≈ 0.021). Interpretability analyses identified EC/TDS, Mg²⁺, and Ca²⁺ as important predictors, linking fluoride enrichment to chemically evolved groundwater with reduced calcium activity, sodium enrichment, and alkalinity buffering. The proposed framework provides accurate, interpretable, and risk-oriented support for groundwater fluoride monitoring in alluvial aquifer systems.

Keywords:

fluoride contamination; groundwater quality; hydrogeochemistry; machine learning; FT-transformer; risk prediction; environmental modeling

1. Introduction

Fluoride is a naturally occurring groundwater constituent that becomes a significant public-health issue at elevated concentrations [1,2]. Prolonged consumption of drinking water with high fluoride content has been linked to dental fluorosis and, in the long term, skeletal fluorosis. The World Health Organization (WHO) sets a guideline value of 1.5 mg/L for fluoride in drinking water, while acknowledging that national standards may be adjusted for climate, volume of water consumed, and intake from other exposure pathways [3]. Heavy dependence on groundwater, combined with hydrogeochemical conditions that favor fluoride mobilization, produces persistent exceedance hotspots in many semi-arid to arid regions, making fluoride one of the most widespread geogenic contaminants worldwide [4,5].

In groundwater-dependent communities, untreated groundwater is a major fluoride exposure pathway. Long-term consumption of water exceeding the WHO guideline of 1.5 mg/L may increase the risk of dental and skeletal fluorosis, particularly in semi-arid regions where water intake is relatively high. Identifying high-fluoride wells is therefore essential for safe drinking-water management and public-health protection [6]. South Asia faces severe groundwater-quality constraints because of high population density, intensive irrigation, and reliance on shallow to intermediate-depth alluvial aquifers [7]. In the Indus Basin and its doab systems, fluoride behavior is strongly governed by hydrogeochemical evolution, particularly under low calcium activity, high bicarbonate and sodium concentrations, and active ion exchange. These conditions increase fluoride solubility and inhibit fluorite precipitation, favoring fluoride enrichment in groundwater [8,9]. Large-scale hydrochemical monitoring of the Punjab plains has recorded spatially organized fluoride concentrations and attributed them to fluorite equilibrium, carbonate alkalinity, and slow, progressive water–rock interaction [8]. At smaller scales, isotopic and hydrochemical investigations in the Rechna Doab have identified salinity evolution and groundwater residence time as controlling factors of fluoride occurrence [10].
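As a rough illustration of the fluorite control described above (not part of the study's workflow), the solubility product of fluorite, CaF₂ ⇌ Ca²⁺ + 2F⁻, bounds dissolved F⁻ for a given Ca²⁺ concentration. The sketch below assumes log Ksp = −10.6 at 25 °C, a commonly tabulated value, and treats activities as molar concentrations, ignoring ionic strength and complexation.

```python
import math

# Assumed constant (illustrative): log10 of the fluorite (CaF2) solubility product at 25 °C.
LOG_KSP_FLUORITE = -10.6

MOLAR_MASS_CA = 40.078  # g/mol
MOLAR_MASS_F = 18.998   # g/mol


def max_fluoride_mg_per_l(ca_mg_per_l: float, log_ksp: float = LOG_KSP_FLUORITE) -> float:
    """F- concentration (mg/L) in equilibrium with fluorite at a given Ca2+ (mg/L).

    Ideal-solution approximation: Ksp = [Ca2+][F-]^2 with activities taken as
    molar concentrations (no ionic-strength or complexation corrections).
    """
    if ca_mg_per_l <= 0:
        raise ValueError("Ca2+ concentration must be positive")
    ca_mol_per_l = ca_mg_per_l / 1000.0 / MOLAR_MASS_CA        # mg/L -> mol/L
    f_mol_per_l = math.sqrt(10.0 ** log_ksp / ca_mol_per_l)    # [F-] = sqrt(Ksp / [Ca2+])
    return f_mol_per_l * MOLAR_MASS_F * 1000.0                 # mol/L -> mg/L


for ca in (10.0, 40.0, 100.0):
    print(f"Ca2+ = {ca:5.1f} mg/L -> F- at fluorite saturation ~ {max_fluoride_mg_per_l(ca):.2f} mg/L")
```

Because saturation-limited F⁻ scales with 1/√[Ca²⁺], a fourfold decrease in Ca²⁺ doubles the fluoride that can remain in solution, which is why processes that remove Ca²⁺, such as the ion-exchange and carbonate-equilibrium effects noted above, favor fluoride enrichment.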

Pakistan’s Punjab Province is especially susceptible because groundwater is the primary source for domestic consumption and irrigation, and aquifer systems are increasingly stressed by agricultural expansion, urbanization, and industry. Several studies in Punjab have documented high fluoride concentrations in drinking-water wells and highlighted the associated health problems [11,12,13,14]. These studies showed that fluoride contamination occurs in spatially heterogeneous pockets rather than uniformly across aquifers, implying localized hydrogeochemical controls. Despite this mounting evidence, significant knowledge gaps remain at the sub-district level, especially in emerging agricultural–industrial corridors where groundwater chemistry reflects both geogenic and anthropogenic influences [15,16]. Alongside advances in hydrogeochemistry, data-driven methods have been increasingly used to predict and map groundwater quality.

Machine-learning (ML) techniques are well suited to hydrochemical systems because they can capture nonlinear relationships and higher-order interactions among environmental predictors that challenge traditional statistical approaches [17]. Nevertheless, many groundwater ML studies [18,19,20] lack rigorous validation, systematic assessment of model reliability, and interpretability, which limits their trustworthiness and hydrogeochemical relevance. Explainable machine-learning techniques, such as Shapley Additive Explanations (SHAP), quantify the contribution of each predictor to model output and link model predictions to environmental mechanisms, thereby improving interpretability and scientific validity [21]. Recent fluoride-prediction studies [22,23] have demonstrated that boosting algorithms combined with SHAP-based interpretation can clarify geochemical controls and improve risk communication.

Two methodological gaps remain critical for groundwater fluoride modeling. First, some predictive studies emphasize discrimination metrics (ROC-AUC) while giving insufficient attention to probability calibration, which is essential for risk-based screening and decision-making [24]. Second, deep learning for tabular hydrochemical datasets remains poorly explored. Feature-tokenizer transformers (FT-Transformer) are a promising approach for learning complex feature interactions in structured environmental data; however, their performance, robustness, and interpretability in hydrogeochemical applications, particularly in South Asian alluvial aquifers, remain largely untested [21,24].

To address these gaps, this study examines groundwater fluoride dynamics in Tehsil Jaranwala (Faisalabad District, Punjab, Pakistan) within the Rechna Doab alluvial aquifer system, where fluoride exceedance is a significant issue. An integrated predictive framework is developed that (i) models fluoride concentration as a regression task and (ii) classifies exceedance risk relative to the 1.5 mg/L guideline threshold. Conventional ML models, including linear regression, random forest, and gradient boosting, are compared with a deep tabular learning model and validated using nested cross-validation and an independent test set. Beyond predictive accuracy, the study assesses probability calibration and model robustness, and applies SHAP and permutation importance to translate model behavior into hydrogeochemical insight.

This study introduces a unified framework for predictive modeling of groundwater quality that combines hydrogeochemical analysis with explainable machine learning. A dual-task modeling strategy predicts both fluoride concentration and exceedance risk against the WHO drinking-water guideline, supporting quantitative estimation as well as risk-oriented groundwater screening. Nested cross-validation and independent testing are used to obtain reliable, minimally biased estimates of predictive performance. Explainable AI methods, namely SHAP and permutation importance, connect model predictions with the hydrogeochemical processes that govern fluoride mobilization. By linking data-driven insights with geochemical mechanisms, the proposed framework provides interpretable and reliable predictions that support targeted groundwater monitoring, risk assessment, and an improved understanding of fluoride contamination in intensively used alluvial aquifers in Punjab.

2. Materials and Methods

This section describes the study design, data preparation, and modeling approach for groundwater fluoride concentration and exceedance risk, covering model construction, hyperparameter optimization, evaluation, interpretability, calibration, and robustness analysis.

2.1. Study Area

The research was conducted in Tehsil Jaranwala, Faisalabad District, one of the largest sub-districts in Punjab, Pakistan. It lies within the Rechna Doab, the interfluvial tract between the Ravi and Chenab rivers (Figure 1). The area is part of the intensively cultivated Indus Basin irrigation system, where groundwater is the main source of domestic and agricultural water. The climate is semi-arid, with hot summers, mild winters, and strongly seasonal precipitation; most rainfall occurs during the summer monsoon (June–September) and provides episodic groundwater recharge [25,26]. The aquifer system consists of thick, unconsolidated Quaternary alluvial deposits of interbedded sand, silt, and clay. These laterally extensive deposits form multilayered, highly permeable sandy aquifers that allow vertical groundwater movement, while discontinuous clay interbeds locally restrict vertical flow and thereby influence solute transport and hydrochemical evolution. The main recharge sources are rainfall infiltration and seepage from the extensive canal-irrigation network; however, groundwater abstraction for irrigation and domestic use has lowered the water table and caused localized deterioration of water quality [27,28].

Hydrogeologically, Tehsil Jaranwala belongs to the Rechna Doab aquifer system within the broader Indus Basin alluvial aquifer. Groundwater flow and residence time are controlled by regional hydraulic gradients, pumping intensity, recharge from the canal-irrigation network, and local lithological variability. These basin-scale conditions influence groundwater mineralization, salinity evolution, water–rock interaction, and the release of fluoride from fluoride-bearing minerals [27,28]. Land use is dominated by irrigated agriculture (wheat, rice, sugarcane, and fodder crops), while urban and industrial expansion around Faisalabad increases pressure on groundwater through fertilizer application, irrigation return flow, and wastewater inputs. Fluoride contamination is a growing concern in Punjab’s alluvial aquifers, where concentrations may exceed the WHO drinking-water guideline of 1.5 mg/L in spatially heterogeneous pockets. Fluoride enrichment is predominantly controlled by geogenic and hydrochemical processes, including dissolution of fluoride-bearing minerals, prolonged water–rock interaction, alkaline conditions, ion exchange, carbonate equilibrium, reduced Ca²⁺ activity, and evolution toward Ca-depleted Na-HCO₃-type groundwater [29]. The dataset used in this work comprises groundwater samples analyzed for fluoride together with a suite of physicochemical and major-ion parameters, which form the basis for quantitative modeling of fluoride concentration and exceedance risk in an intensively used alluvial aquifer system.

Table 1 summarizes the descriptive statistics of 350 groundwater samples collected once from wells across Tehsil Jaranwala during February–September. The dataset represents a single sampling campaign covering part of the agricultural season, rather than continuous monitoring of the full agrarian cycle. Sampling sites were selected to cover major agricultural, settlement, and groundwater-use areas and to represent spatial variability in groundwater quality. Based on the mapped study-area extent of approximately 1770 km², the estimated sampling density was about 0.20 samples/km². Detailed well-depth information was not available for all sampling locations; therefore, vertical variability in fluoride occurrence could not be evaluated in this study. High variability in EC, TDS, Na⁺, Cl⁻, and SO₄²⁻ indicates substantial hydrochemical heterogeneity, likely related to variable recharge, groundwater residence time, and water–rock interaction. Fluoride shows a positively skewed distribution with localized exceedances above the WHO guideline, indicating spatially heterogeneous contamination rather than uniform aquifer-wide enrichment. The measured parameters included pH, EC, turbidity, TDS, Ca²⁺, Mg²⁺, TH, HCO₃⁻, alkalinity, Cl⁻, K⁺, Na⁺, SO₄²⁻, NO₃⁻, and F⁻.
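The statistics summarized in Table 1 (range, central tendency, dispersion, coefficient of variation, and skewness) can be computed along the lines of the NumPy sketch below. The synthetic log-normal sample is a hypothetical stand-in for the measured fluoride data, and the moment-based skewness estimator is an assumption; the software used by the authors may apply a bias-corrected variant.

```python
import numpy as np


def describe(values: np.ndarray) -> dict:
    """Descriptive statistics for one hydrochemical parameter."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]                       # ignore missing entries
    mean = v.mean()
    sd = v.std(ddof=1)                        # sample standard deviation
    centered = v - mean
    # Moment-based (biased) Fisher-Pearson skewness coefficient g1
    skew = np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5
    return {
        "min": v.min(),
        "max": v.max(),
        "mean": mean,
        "median": np.median(v),
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,      # coefficient of variation
        "skewness": skew,
    }


# Hypothetical right-skewed, fluoride-like sample (mg/L); not the study data
rng = np.random.default_rng(0)
fluoride = rng.lognormal(mean=-0.3, sigma=0.5, size=350)
stats = describe(fluoride)
print({k: round(float(v), 3) for k, v in stats.items()})
print("share above 1.5 mg/L:", float(np.mean(fluoride > 1.5)))
```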

2.2. Sampling, Analytical Methods, and Quality Control

Groundwater samples were collected from selected wells and hand pumps across Tehsil Jaranwala and analyzed for physicochemical and major-ion parameters. Field parameters, including pH, electrical conductivity (EC), total dissolved solids (TDS), and turbidity, were measured using calibrated portable meters. Major cations and anions were analyzed using standard laboratory procedures: Na⁺ and K⁺ by flame photometry; Ca²⁺ and Mg²⁺ by EDTA titration; alkalinity and HCO₃⁻ by acid titration; Cl⁻ by argentometric titration; and SO₄²⁻, NO₃⁻, and F⁻ by UV–visible spectrophotometry. Analytical-grade reagents, including EDTA, standard acid solution, silver nitrate, barium chloride, nitrate reagents, fluoride standards, and SPADNS reagent, were used with deionized water. Quality assurance and quality control were maintained through instrument calibration with standard solutions, calibration records, reagent blanks, duplicate sample analysis, and analytical precision checks. Major-ion reliability was further evaluated using ionic balance error between total cations and total anions expressed in milliequivalents per liter. Samples with unacceptable analytical deviation were rechecked before statistical analysis and model development. Overall, all measurements and analyses followed standard groundwater-quality analytical procedures to ensure data reliability.
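The ionic-balance check can be sketched as follows: concentrations in mg/L are converted to meq/L by dividing by each ion's equivalent weight (molar mass divided by ionic charge), and the charge-balance error is 100 × (Σcations − Σanions)/(Σcations + Σanions). The ±5% acceptance limit and the example sample are assumptions for illustration; the study does not state the threshold it used.

```python
# Equivalent weights in mg per meq: molar mass (g/mol) divided by |ionic charge|
EQ_WEIGHT = {
    "Ca": 40.078 / 2, "Mg": 24.305 / 2, "Na": 22.990, "K": 39.098,
    "HCO3": 61.017, "Cl": 35.453, "SO4": 96.06 / 2, "NO3": 62.004, "F": 18.998,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("HCO3", "Cl", "SO4", "NO3", "F")
ACCEPTABLE_CBE_PERCENT = 5.0  # assumed acceptance limit, not stated in the study


def to_meq_per_l(sample_mg_per_l: dict) -> dict:
    """Convert mg/L to meq/L; ions missing from the sample are treated as zero."""
    return {ion: sample_mg_per_l.get(ion, 0.0) / w for ion, w in EQ_WEIGHT.items()}


def charge_balance_error(sample_mg_per_l: dict) -> float:
    """Ionic balance error (%) = 100 * (sum cations - sum anions) / (sum cations + sum anions)."""
    meq = to_meq_per_l(sample_mg_per_l)
    cations = sum(meq[ion] for ion in CATIONS)
    anions = sum(meq[ion] for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


# Hypothetical sample (mg/L), not taken from the study dataset
sample = {"Ca": 40.1, "Mg": 24.3, "Na": 115.0, "K": 5.0,
          "HCO3": 366.1, "Cl": 71.0, "SO4": 48.0, "NO3": 6.2, "F": 1.9}
cbe = charge_balance_error(sample)
status = "accept" if abs(cbe) <= ACCEPTABLE_CBE_PERCENT else "recheck"
print(f"CBE = {cbe:+.2f}% -> {status}")
```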

2.3. Study Design and Analytical Framework

This study proposes and empirically validates a unified predictive framework for groundwater fluoride that integrates hydrogeochemical predictors with machine-learning and deep tabular learning models. The overall methodological workflow is illustrated in Figure 2. The workflow followed a sequential pipeline: data-quality screening and preprocessing to minimize bias, followed by construction of the regression and classification target variables. Models were trained with systematic hyperparameter optimization and finally evaluated on a single independent hold-out test set. Model reliability for the classification task was assessed with discrimination and calibration diagnostics. Interpretability was examined through feature-attribution and importance analyses that relate prediction patterns to potential hydrogeochemical controls. Noise-sensitivity experiments tested the stability of predictions under input perturbations representing measurement uncertainty and hydrochemical variability.
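The study does not detail its noise-sensitivity protocol, so the sketch below shows one plausible scheme under stated assumptions: zero-mean Gaussian noise with a standard deviation equal to a fixed fraction of each predictor's standard deviation is added to the inputs of an already fitted model, and the mean R² over repeated perturbations is reported. The least-squares model and synthetic data are hypothetical placeholders.

```python
import numpy as np


def noise_sensitivity(predict, X, y, noise_levels=(0.0, 0.05, 0.10, 0.20), n_repeats=20, seed=0):
    """Mean R² of a fitted model after adding Gaussian noise to the predictors.

    Assumed scheme: noise for feature j has SD = level * SD(X[:, j]).
    """
    rng = np.random.default_rng(seed)
    feature_sd = X.std(axis=0, ddof=1)
    ss_tot = np.sum((y - y.mean()) ** 2)
    results = {}
    for level in noise_levels:
        scores = []
        for _ in range(n_repeats):
            X_noisy = X + rng.normal(size=X.shape) * level * feature_sd
            ss_res = np.sum((y - predict(X_noisy)) ** 2)
            scores.append(1.0 - ss_res / ss_tot)
        results[level] = float(np.mean(scores))
    return results


# Hypothetical demo: ordinary least squares fitted to synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(280, 5))
y = X @ np.array([0.6, -0.3, 0.2, 0.0, 0.1]) + rng.normal(0.0, 0.1, size=280)
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)


def predict(Z):
    return np.column_stack([np.ones(len(Z)), Z]) @ coef


for level, r2 in noise_sensitivity(predict, X, y).items():
    print(f"noise = {level:.0%} of feature SD -> mean R² = {r2:.3f}")
```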

2.4. Data Preparation, Predictors, and Target Definition

The dataset consisted of 350 groundwater samples collected from Tehsil Jaranwala, Rechna Doab aquifer, Punjab, Pakistan. Each sample included measured physicochemical and major-ion parameters. A total of 15 hydrochemical variables were available: pH, EC, turbidity, TDS, Ca²⁺, Mg²⁺, TH, HCO₃⁻, alkalinity, Cl⁻, K⁺, Na⁺, SO₄²⁻, NO₃⁻, and F⁻. Fluoride concentration (mg/L) was the target variable for the regression task, and the remaining 14 physicochemical and hydrochemical parameters were used as predictor features. The data matrix was structured as X ∈ ℝ^(n×p), where n is the number of groundwater samples and p = 14 is the number of predictor variables. For classification, a binary exceedance-risk label was derived from the WHO drinking-water guideline of 1.5 mg/L: samples with F⁻ > 1.5 mg/L were classified as high-risk (Risk = 1), and samples with F⁻ ≤ 1.5 mg/L as low-risk (Risk = 0).
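A minimal NumPy sketch of the target construction, assuming a hypothetical column order for the 15 measured parameters. Note the strict inequality: a sample at exactly 1.5 mg/L is labeled low-risk, matching the F⁻ ≤ 1.5 mg/L rule above.

```python
import numpy as np

# Hypothetical column order for the 15 measured parameters (F- is the target)
PARAMETERS = ["pH", "EC", "Turbidity", "TDS", "Ca", "Mg", "TH", "HCO3",
              "Alkalinity", "Cl", "K", "Na", "SO4", "NO3", "F"]
TARGET = "F"
PREDICTORS = [name for name in PARAMETERS if name != TARGET]   # p = 14
WHO_GUIDELINE_MG_L = 1.5


def build_targets(data: np.ndarray):
    """Split an (n, 15) array ordered as PARAMETERS into X, y_reg and y_clf."""
    target_idx = PARAMETERS.index(TARGET)
    X = np.delete(data, target_idx, axis=1)              # X has shape (n, 14)
    y_reg = data[:, target_idx]                           # fluoride in mg/L
    y_clf = (y_reg > WHO_GUIDELINE_MG_L).astype(int)      # Risk = 1 only if F- > 1.5 mg/L
    return X, y_reg, y_clf


# Usage with three hypothetical samples straddling the threshold
demo = np.zeros((3, len(PARAMETERS)))
demo[:, PARAMETERS.index(TARGET)] = [1.4, 1.5, 1.6]
X, y_reg, y_clf = build_targets(demo)
print(X.shape, y_clf)
```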

Before modeling, the dataset was screened to ensure data quality and consistency. Duplicate records were removed, parameter names and units were standardized, and all variables were checked for missing values, non-physical values, and abnormal entries. Descriptive statistics were calculated to examine data ranges, central tendency, dispersion, and coefficients of variation. Highly skewed variables were inspected, and preprocessing was applied only within the training pipeline to avoid information leakage. The dataset was then stratified according to the fluoride-risk class and divided into training, validation, and independent test subsets using a 64:16:20 ratio. The independent test set was reserved strictly for final model evaluation. Early stopping and learning-curve monitoring for the FT-Transformer were conducted only on the validation set, while classical machine-learning models were optimized using nested cross-validation.
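The 64:16:20 partition can be obtained with two stratified calls to scikit-learn's `train_test_split`: 20% is first held out as the test set, and 20% of the remaining 80% (16% of the total) becomes the validation set. The random seed, the synthetic data, and the use of `StandardScaler` as the example preprocessing step are assumptions; the point illustrated is that any preprocessing is fitted on the training subset only, so no information from the validation or test sets leaks into it.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_64_16_20(X, y_reg, y_clf, seed=42):
    """Stratified train/validation/test split in a 64:16:20 ratio."""
    # Step 1: hold out 20% of all samples as the independent test set
    X_tmp, X_test, yr_tmp, yr_test, yc_tmp, yc_test = train_test_split(
        X, y_reg, y_clf, test_size=0.20, stratify=y_clf, random_state=seed)
    # Step 2: 20% of the remaining 80% (= 16% of all samples) becomes validation data
    X_train, X_val, yr_train, yr_val, yc_train, yc_val = train_test_split(
        X_tmp, yr_tmp, yc_tmp, test_size=0.20, stratify=yc_tmp, random_state=seed)
    return (X_train, yr_train, yc_train), (X_val, yr_val, yc_val), (X_test, yr_test, yc_test)


# Hypothetical data: 350 samples, 14 predictors, 70 exceedances (20%)
rng = np.random.default_rng(0)
X = rng.normal(size=(350, 14))
y_reg = np.concatenate([rng.uniform(0.0, 1.5, 280), rng.uniform(1.6, 3.4, 70)])
y_clf = (y_reg > 1.5).astype(int)

train, val, test = split_64_16_20(X, y_reg, y_clf)

# Leakage-safe preprocessing: fit on the training subset only, then apply elsewhere
scaler = StandardScaler().fit(train[0])
X_train_s, X_val_s, X_test_s = (scaler.transform(part[0]) for part in (train, val, test))
print(len(train[0]), len(val[0]), len(test[0]))
```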

2.5. Model Development and Hyperparameter Tuning

The regression analysis compared three model families: linear regression (baseline), XGBoost regressor, and random forest regressor. The classification analysis compared logistic regression (baseline), XGBoost classifier, and random forest classifier. Class imbalance was addressed with class weighting for logistic regression and random forest and with the scale_pos_weight parameter for XGBoost. Nested cross-validation was used for model selection and performance estimation to reduce the optimistic bias that arises when the same data are used for both tuning and evaluation [30]. Generalization performance was estimated with a five-fold outer loop, and hyperparameters were optimized within a five-fold inner loop. Regression models were tuned to maximize negative RMSE, and classification models were tuned to maximize ROC-AUC. The final regression model was selected based on the lowest mean outer-fold RMSE. For the final classification model, mean outer-fold recall (sensitivity) was the primary selection criterion and ROC-AUC the secondary criterion, consistent with the screening goal of minimizing false negatives. The models were chosen to represent different levels of complexity and interpretability. Linear and logistic regression served as transparent baselines, while random forest and XGBoost were used to capture nonlinear relationships and interactions among hydrochemical variables. The FT-Transformer was included to evaluate the potential of attention-based deep tabular learning for modeling higher-order feature interactions. This selection enabled a systematic comparison of predictive accuracy, interpretability, and complexity for fluoride concentration prediction and exceedance-risk classification. All statistical analyses and machine-learning models were implemented in Python using the scikit-learn, XGBoost, SHAP, PyTorch, and PyTorch Tabular packages.
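The nested procedure can be sketched with scikit-learn by wrapping a `GridSearchCV` (inner five-fold loop) inside `cross_validate` (outer five-fold loop), so that each outer fold repeats the whole tuning process on its own training portion. The synthetic data, the random forest grid, and the seeds are illustrative assumptions; in the study the loop would be repeated for each model family, with the lowest mean outer-fold RMSE retained. For classification, `scoring="roc_auc"` and `class_weight="balanced"` would be the analogous scikit-learn settings, and the helper at the end computes the negative-to-positive ratio that the XGBoost documentation suggests as a typical `scale_pos_weight` value.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_validate

# Hypothetical stand-in data with 14 predictors
X, y = make_regression(n_samples=150, n_features=14, noise=10.0, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)   # hyperparameter tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)   # performance estimation

search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, None]},          # illustrative grid only
    scoring="neg_root_mean_squared_error",
    cv=inner_cv,
)

# Each outer fold re-runs the complete inner search on its own training part
scores = cross_validate(search, X, y, cv=outer_cv, scoring="neg_root_mean_squared_error")
outer_rmse = -scores["test_score"]
print(f"outer-fold RMSE = {outer_rmse.mean():.2f} ± {outer_rmse.std(ddof=1):.2f}")


def scale_pos_weight(y_binary):
    """Negatives / positives: the typical scale_pos_weight value suggested in XGBoost's docs."""
    y_binary = np.asarray(y_binary)
    return float(np.sum(y_binary == 0) / np.sum(y_binary == 1))
```

With 280 low-risk and 70 high-risk samples, `scale_pos_weight` would be 4.0.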

2.6. FT-Transformer Implementation and Learning Behavior

The FT-Transformer was implemented to evaluate whether attention-based deep tabular learning captures nonlinear hydrogeochemical interactions better than tree-based models, and it was trained for both the regression and classification tasks. Designed for tabular data, the FT-Transformer treats each input feature as a token, allowing the model to learn higher-order interactions through self-attention [31]. Given a predictor vector x = [x_1, x_2, …, x_F], each hydrogeochemical feature x_j is mapped into a continuous embedding space by a learnable feature tokenizer:

e_{j} = \mathrm{Embed}(x_{j}), \quad E = [e_{1}, e_{2}, \dots, e_{F}] \in \mathbb{R}^{F \times d_{\mathrm{model}}}

(1)

where Embed(·) is a learnable projection layer and d_model is the embedding dimension. Each hydrochemical variable thus becomes a token in a transformer sequence. Multi-head self-attention between feature tokens, based on the scaled dot-product attention mechanism, captures contextual dependencies among hydrochemical variables, e.g., interactions between salinity indicators and major ions:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_{k}}}\right) V

(2)

where Q = E W_Q, K = E W_K, and V = E W_V are the query, key, and value projections of the embedded tokens, and d_k is the key dimension. Stacked attention blocks with feed-forward layers learn the complex nonlinear relationships that govern fluoride behavior. For classification, a pooled feature representation z is passed to a prediction head to estimate the exceedance probability:

\hat{p} = \sigma(w^{\top} z + b)

(3)

where σ(·) is the sigmoid activation function, and w and b are the weights and bias of the prediction head. Model parameters are optimized by minimizing the binary cross-entropy loss:

L_{\mathrm{clf}} = -\frac{1}{N} \sum_{i = 1}^{N} \left[ y_{i} \log(\hat{p}_{i}) + (1 - y_{i}) \log(1 - \hat{p}_{i}) \right]

(4)

where N is the number of training samples, y_i ∈ {0, 1} is the observed exceedance label, and p̂_i is the predicted exceedance probability for sample i.
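To make Eqs. (1)–(4) concrete, the NumPy sketch below runs a single forward pass with random, untrained weights: a numerical feature tokenizer (one weight vector and bias per feature), one head of scaled dot-product self-attention, mean pooling into z, a sigmoid head, and the binary cross-entropy loss. It is a simplification, not the PyTorch Tabular implementation: it omits multiple heads, feed-forward sublayers, layer normalization, residual connections, and training, and mean pooling stands in for the [CLS]-token readout used in the original FT-Transformer. The embedding size of 8 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
F, d_model = 14, 8   # number of features and an assumed embedding size

# Numerical feature tokenizer: one learnable weight vector and bias per feature
W_tok = rng.normal(scale=0.1, size=(F, d_model))
b_tok = rng.normal(scale=0.1, size=(F, d_model))
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
w_head, b_head = rng.normal(scale=0.1, size=d_model), 0.0


def tokenize(x):
    """Eq. (1): map a (F,) feature vector to token matrix E of shape (F, d_model)."""
    return x[:, None] * W_tok + b_tok


def softmax(s, axis=-1):
    s = s - s.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)


def attention(E):
    """Eq. (2): single-head scaled dot-product self-attention over feature tokens."""
    Q, K, V = E @ W_q, E @ W_k, E @ W_v
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # (F, F) token-to-token weights
    return weights @ V, weights


def predict_proba(x):
    """Eq. (3): mean-pool the attended tokens into z and apply a sigmoid head."""
    H, _ = attention(tokenize(x))
    z = H.mean(axis=0)
    return 1.0 / (1.0 + np.exp(-(w_head @ z + b_head)))


def bce(y, p, eps=1e-12):
    """Eq. (4): mean binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


X = rng.normal(size=(4, F))
y = np.array([0, 1, 0, 1])
p = np.array([predict_proba(x) for x in X])
print("probabilities:", np.round(p, 3), "loss:", round(float(bce(y, p)), 4))
```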

For regression, the model minimizes the mean squared error between measured and predicted fluoride concentrations. The FT-Transformer was trained with the PyTorch Tabular framework using four attention blocks, eight attention heads, a learning rate of 1 × 10⁻³, and a dropout rate of 0.1. Training ran for up to 200 epochs, with early stopping (patience = 15) on the validation set to prevent overfitting. Learning dynamics were examined through training and validation loss curves to assess convergence and generalization stability. Final performance was evaluated on the independent test set using the same regression and classification metrics as for the conventional ML models, enabling direct comparison of accuracy and reliability.
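Patience-based early stopping with patience = 15 and a 200-epoch limit can be expressed as a small pure-Python function over the sequence of validation losses. This is a generic sketch of the rule, not the framework's own callback; the `min_delta` tolerance is an added assumption and defaults to zero.

```python
def early_stopping_epoch(val_losses, patience=15, max_epochs=200, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for patience-based early stopping.

    Epochs are 1-based. Training stops once `patience` consecutive epochs fail to
    improve the best validation loss by more than `min_delta`, or at `max_epochs`.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch


# Hypothetical validation curve: improves for 30 epochs, then plateaus
val_curve = [1.0 / e for e in range(1, 31)] + [0.05] * 100
print(early_stopping_epoch(val_curve))
```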

2.7. Explainable AI Analysis

To increase model transparency and relate predictive behavior to hydrogeochemical processes, explainable artificial intelligence (XAI) was implemented using two complementary approaches: SHAP and permutation-based feature importance. SHAP is a game-theoretic attribution method that decomposes a model’s output into additive contributions from each predictor [32,33]. For a trained model f(x), the prediction for a sample x is expressed as:

f(x) = \phi_{0} + \sum_{j = 1}^{F} \phi_{j}

(5)

where φ_j is the Shapley value, i.e., the contribution of feature j to the prediction relative to the expected model output φ_0. SHAP values were computed with TreeExplainer for the best-performing tree-based model (random forest or XGBoost) in the fully preprocessed feature space, ensuring compatibility with the trained prediction pipeline. Mean absolute SHAP values provided global feature importance, whereas dependence analysis examined nonlinear response patterns and feature interactions relevant to fluoride mobilization. Permutation importance, a model-agnostic measure of feature relevance, was obtained by randomly permuting each predictor and quantifying the resulting degradation in predictive performance. The importance of feature X_j is measured as:

I_{j} = \mathbb{E}\left[ L(f(X_{\mathrm{perm}(j)})) - L(f(X)) \right]

(6)

where L is the evaluation loss (RMSE for regression), X is the original dataset, and X_perm(j) is the dataset with feature j randomly permuted; the expectation is taken over random permutations. Larger performance degradation indicates greater feature influence. To confirm that SHAP attributions reflect hydrogeochemical drivers rather than model-specific artefacts, their consistency with permutation importance was checked. Together, these XAI analyses support both global and interaction-level interpretation, relating machine-learning predictions to well-known hydrogeochemical controls on fluoride behavior.
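Both attribution ideas can be checked on a model whose Shapley values are known in closed form. For a linear model with independent features, the interventional SHAP value is φ_j(x) = w_j (x_j − E[x_j]) and φ_0 = E[f(X)], so the additivity of Eq. (5) can be verified directly; permutation importance is then computed exactly as in Eq. (6) with RMSE as the loss. This NumPy sketch uses synthetic data and a least-squares model as assumptions standing in for the study's tree ensembles, for which SHAP's `TreeExplainer` would be used instead; it also illustrates the SHAP-versus-permutation consistency check by comparing feature rankings.

```python
import numpy as np

# Hypothetical synthetic data: 4 independent predictors with decreasing effects
rng = np.random.default_rng(0)
n, n_features = 300, 4
X = rng.normal(size=(n, n_features))
y = X @ np.array([1.0, 0.5, 0.1, 0.0]) + rng.normal(0.0, 0.1, size=n)

# Ordinary least-squares model f(x) = b0 + w . x
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
b0, w = coef[0], coef[1:]


def f(Z):
    return b0 + Z @ w


# Closed-form SHAP for a linear model with independent features (interventional):
# phi_0 = E[f(X)], phi_j(x) = w_j * (x_j - E[x_j])
phi0 = f(X).mean()
phi = w * (X - X.mean(axis=0))
assert np.allclose(phi0 + phi.sum(axis=1), f(X))   # additivity of Eq. (5)
mean_abs_shap = np.abs(phi).mean(axis=0)            # global importance


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def permutation_importance_rmse(predict, X, y, n_repeats=20, seed=0):
    """Eq. (6): average RMSE increase after randomly permuting each feature column."""
    rng = np.random.default_rng(seed)
    baseline = rmse(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(rmse(y, predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances


perm_imp = permutation_importance_rmse(f, X, y)
print("mean |SHAP|       :", np.round(mean_abs_shap, 3))
print("permutation (RMSE):", np.round(perm_imp, 3))
print("top-3 ranking agrees:",
      np.array_equal(np.argsort(-mean_abs_shap)[:3], np.argsort(-perm_imp)[:3]))
```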

2.8. Performance Metrics and Diagnostic Evaluation

Model performance was evaluated with complementary metrics covering predictive accuracy, discrimination, robustness to class imbalance, and probabilistic reliability.

Regression metrics: Predictive performance for fluoride concentration was measured with the coefficient of determination (R²), root-mean-squared error (RMSE), and mean absolute error (MAE):

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(7)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(8)

MAE = \frac{1}{n} \sum_{i = 1}^{n} \lvert y_{i} - \hat{y}_{i} \rvert

(9)

where y_i and ŷ_i are the observed and predicted fluoride concentrations of sample i, ȳ is the mean observed concentration, and n is the number of samples. RMSE penalizes large errors more heavily than MAE [34].
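Eqs. (7)–(9) translate directly into NumPy. The hypothetical example compares two sets of predictions with the same MAE (0.1 mg/L): one spreads the error evenly across samples, the other concentrates it in a single sample.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """R², RMSE and MAE as defined in Eqs. (7)-(9)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(residuals ** 2))),
        "MAE": float(np.mean(np.abs(residuals))),
    }


# Hypothetical observations (mg/L) and two prediction sets with identical MAE
y_obs = np.array([0.5, 0.8, 1.0, 1.2, 2.0])
even = y_obs + np.array([0.1, -0.1, 0.1, -0.1, 0.1])    # error spread over all samples
spiky = y_obs + np.array([0.0, 0.0, 0.0, 0.0, 0.5])     # error concentrated in one sample
print("even :", regression_metrics(y_obs, even))
print("spiky:", regression_metrics(y_obs, spiky))
```

Because errors are squared before averaging, RMSE rises from 0.1 mg/L for the even pattern to about 0.22 mg/L for the concentrated one, while MAE stays at 0.1 mg/L.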

Classification metrics: For exceedance-risk prediction, ROC-AUC [35] measured a model’s discriminatory performance independently of any decision threshold. Threshold-dependent metrics were calculated after converting the predicted probabilities p̂_i into class labels using a default threshold of 0.5. Sensitivity (recall), precision, F1-score, and balanced accuracy were calculated as:

Recall = \frac{TP}{TP + FN}

(10)

Precision = \frac{TP}{TP + FP}

(11)

F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}

(12)

Balanced Accuracy = \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right)

(13)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. Balanced accuracy accounts for class imbalance in exceedance-risk prediction by averaging sensitivity and specificity.
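The threshold-dependent metrics of Eqs. (10)–(13) can be computed from predicted probabilities as sketched below. The labels and probabilities are hypothetical, and assigning p̂ = 0.5 to the positive class is an assumption about tie handling at the threshold.

```python
import numpy as np


def classification_metrics(y_true, p_hat, threshold=0.5):
    """Recall, precision, F1 and balanced accuracy (Eqs. 10-13) at a probability threshold."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(p_hat) >= threshold).astype(int)   # assumed: p = 0.5 -> positive
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn, "recall": recall,
            "precision": precision, "F1": f1,
            "balanced_accuracy": 0.5 * (recall + specificity)}


# Hypothetical screening output: 16 low-risk and 4 high-risk samples
y_true = [0] * 16 + [1] * 4
p_hat = [0.1] * 14 + [0.6, 0.7] + [0.9, 0.8, 0.55, 0.3]
print(classification_metrics(y_true, p_hat))
```

In this imbalanced example, plain accuracy is 17/20 = 0.85, while balanced accuracy is 0.8125 because the single missed exceedance weighs more heavily in the minority class.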

Probability calibration and reliability: Because exceedance risk is expressed as a probability, the quality of the probabilistic predictions was measured with the Brier score [36]:

Brier = \frac{1}{n} \sum_{i = 1}^{n} (\hat{p}_{i} - y_{i})^{2}

(14)

where y_i ∈ {0, 1} is the observed risk class and p̂_i is the predicted exceedance probability. A lower Brier score indicates more accurate probability estimates; the score reflects both calibration and discrimination. To assess the correspondence between predicted probabilities and observed event frequencies, calibration (reliability) curves were constructed using both 10-bin and 20-bin probability discretization. ROC curves, precision–recall curves [37], confusion matrices, and calibration curves served as supplementary diagnostics of discrimination, threshold behavior, and reliability.
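The difference between discrimination and calibration can be shown with a short NumPy sketch. A monotone transformation of the predicted probabilities (here p ↦ p^0.3, an arbitrary choice) leaves ROC-AUC unchanged, because AUC depends only on the ranking of scores, yet it worsens the Brier score and distorts the reliability curve. The synthetic outcomes are drawn to be calibrated by construction, and the equal-width binning mirrors the 10-bin and 20-bin curves described above; scikit-learn's `brier_score_loss` and `calibration_curve` provide comparable library routines.

```python
import numpy as np


def brier(y, p):
    """Brier score, Eq. (14)."""
    return float(np.mean((np.asarray(p, dtype=float) - np.asarray(y, dtype=float)) ** 2))


def roc_auc(y, s):
    """ROC-AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    y, s = np.asarray(y), np.asarray(s, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def reliability_curve(y, p, n_bins=10):
    """Mean predicted probability and observed event frequency per equal-width bin."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_id = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)
    mean_pred, obs_freq = [], []
    for b in range(n_bins):
        in_bin = bin_id == b
        if in_bin.any():                      # empty bins are skipped
            mean_pred.append(p[in_bin].mean())
            obs_freq.append(y[in_bin].mean())
    return np.array(mean_pred), np.array(obs_freq)


# Hypothetical outcomes drawn so that p_cal is calibrated by construction
rng = np.random.default_rng(0)
p_cal = rng.uniform(0.0, 1.0, 5000)
y = (rng.uniform(0.0, 1.0, 5000) < p_cal).astype(int)
p_over = p_cal ** 0.3          # monotone distortion: same ranking, inflated probabilities

print("ROC-AUC:", round(roc_auc(y, p_cal), 4), round(roc_auc(y, p_over), 4))
print("Brier  :", round(brier(y, p_cal), 4), round(brier(y, p_over), 4))
for name, probs in (("calibrated", p_cal), ("distorted", p_over)):
    mean_pred, obs_freq = reliability_curve(y, probs, n_bins=10)
    print(name, "max |predicted - observed| per bin:",
          round(float(np.max(np.abs(mean_pred - obs_freq))), 3))
```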

3. Simulation Results

This section describes groundwater hydrogeochemistry and compares machine-learning and deep tabular models for predicting fluoride concentration and exceedance risk. Results are interpreted in terms of hydrogeochemical processes, model interpretability, calibration, and robustness to provide reliable insight into fluoride mobilization.

3.1. Hydrogeochemical Characteristics and Fluoride Occurrence

Groundwater chemistry showed substantial spatial heterogeneity, indicating variable hydrogeochemical evolution across the study area. Fluoride concentrations ranged from 0.00 to 3.39 mg/L, with a mean of 0.88 mg/L, median of 0.86 mg/L, and standard deviation of 0.54 mg/L. Although the mean fluoride concentration was below the WHO drinking-water guideline of 1.5 mg/L, 70 samples (20%) exceeded this threshold, indicating localized fluoride contamination. Figure 3 shows a positively skewed fluoride distribution, with most samples below 1.0 mg/L and a distinct high-fluoride tail. Higher fluoride concentrations were associated with more mineralized groundwater, elevated alkalinity, bicarbonate, sodium, and TDS, suggesting the influence of water–rock interaction, ion exchange, and carbonate-equilibrium processes. These conditions can reduce Ca²⁺ activity and promote fluoride release from fluoride-bearing minerals such as fluorite, biotite, and apatite.

3.2. Regression Model Performance Comparison

Regression analysis was used to quantify the nonlinear association between the hydrogeochemical predictors and dissolved fluoride concentrations. Nested cross-validation revealed clear performance differences among model families, with the nonlinear ensemble models substantially outperforming the linear baseline. Linear regression showed limited explanatory capability (mean R² = 0.464; RMSE = 0.383 mg/L), indicating that simple linear relationships are insufficient to explain fluoride variability. In contrast, the tree-based ensemble models performed strongly, reflecting their ability to capture complex relationships among hydrochemical variables. Random Forest achieved strong cross-validated performance (mean R² = 0.886; RMSE = 0.178 mg/L), and XGBoost achieved the best cross-validated accuracy (mean R² = 0.913; RMSE = 0.157 mg/L) with lower fold-to-fold error variance, indicating consistent generalization across data partitions (Table 2).

Independent test evaluation further supported model robustness. Predicted fluoride concentrations aligned closely with the 1:1 reference line, and residuals were approximately symmetric about zero, indicating negligible systematic bias (Figure 4).

The slightly larger prediction errors at higher fluoride concentrations likely reflect greater geochemical heterogeneity under strongly mineralized conditions rather than model instability. On the independent test set, the optimized XGBoost model achieved R² = 0.877, RMSE = 0.189 mg/L, and MAE = 0.084 mg/L, indicating strong agreement between observed and predicted values (Table 3). The FT-Transformer performed similarly (R² = 0.877; RMSE = 0.190 mg/L), indicating that deep tabular learning can capture nonlinear hydrogeochemical relationships, although it did not outperform the tree-based ensembles at this dataset size. The superior performance of the nonlinear models is consistent with interacting fluoride-mobilization processes, such as mineral-dissolution equilibria, alkalinity-driven desorption, and salinity effects. Overall, the regression models provide accurate and generalizable fluoride estimates, supporting subsequent risk classification and hydrogeochemical interpretation.

3.3. Classification of Fluoride Exceedance Risk

Machine learning models were evaluated for risk-oriented groundwater screening by classifying fluoride exceedance relative to the WHO guideline value of 1.5 mg/L. Nested cross-validation showed consistently high discrimination across models (Table 4), confirming that hydrogeochemical signatures are effective predictors of exceedance risk. Random Forest showed the best discriminatory performance (ROC-AUC = 0.997), whereas logistic regression achieved the highest sensitivity (recall = 0.960), which supports the screening objective of minimizing false negatives.
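A minimal sketch of how concentrations map to the binary exceedance target, and of how a screening-oriented (recall-first) decision threshold could be chosen. Two assumptions are labeled in the code: exceedance is taken as strictly above 1.5 mg/L, and, because the text does not state that decision thresholds were tuned, `threshold_for_recall` is a hypothetical illustration of the low-false-negative objective rather than the study's procedure.

```python
import math
from typing import List, Sequence

WHO_FLUORIDE_MG_L = 1.5  # WHO drinking-water guideline value for fluoride


def exceedance_labels(fluoride_mg_l: Sequence[float], limit: float = WHO_FLUORIDE_MG_L) -> List[int]:
    """1 = exceedance, 0 = compliant. ASSUMPTION: 'exceed' means strictly above the limit."""
    return [1 if c > limit else 0 for c in fluoride_mg_l]


def threshold_for_recall(y_true: Sequence[int], p_hat: Sequence[float], target_recall: float) -> float:
    """Hypothetical helper: largest probability cut-off (positive if p >= cut-off)
    whose recall on the given labelled data is at least target_recall."""
    pos_scores = sorted((p for y, p in zip(y_true, p_hat) if y == 1), reverse=True)
    # Number of true exceedances that must be flagged; the small epsilon guards
    # against floating-point products such as 0.9 * 10 = 9.000000000000002.
    needed = max(1, math.ceil(target_recall * len(pos_scores) - 1e-9))
    return pos_scores[needed - 1]
```

In a real workflow such a cut-off would be chosen on validation data (for example, inner cross-validation folds) and then fixed before scoring the independent test set.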

On the independent test set (Table 5), the FT-Transformer achieved competitive and balanced performance, with ROC-AUC = 0.948, F1-score = 0.933, perfect precision (1.000), and high recall (0.875). Its probability calibration was markedly better than that of logistic regression (Brier score = 0.0213 vs. 0.0709; lower is better), supporting reliable probabilistic forecasts for early-warning decision-support systems.
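The Brier score is the mean squared difference between the predicted exceedance probability and the observed 0/1 outcome, so lower is better. A useful reference point: a forecast that always predicts the base rate π scores π(1 − π), which is 0.16 for π = 0.2, the overall exceedance rate reported in the Discussion. The test-set base rate may differ, so this is only an approximate benchmark for the reported values. The function name below is illustrative.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


# An uninformative forecast that always predicts a 20 % exceedance probability
# on data with a 20 % exceedance rate scores 0.2 * 0.8 = 0.16.
baseline = brier_score([1, 0, 0, 0, 0], [0.2] * 5)
```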

The test-set confusion matrix (Figure 5) provides further insight into classification behavior. The model correctly identified most low-risk samples (TN = 58) and high-risk samples (TP = 7), with one false negative and four false positives. The single false negative indicates effective detection of hazardous fluoride levels, limiting undetected exposure risk.
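The summary metrics follow directly from the four confusion-matrix counts. Applying the standard definitions to the counts reported above (TP = 7, FP = 4, TN = 58, FN = 1) gives recall = 7/8 = 0.875, precision = 7/11 ≈ 0.636, F1 = 14/19 ≈ 0.737, and accuracy = 65/70 ≈ 0.929. The helper name is illustrative, and returning 0.0 for undefined ratios is an assumed convention.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Precision, recall, F1 and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity to true exceedances
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts as reported for the test-set confusion matrix in the text.
m = confusion_metrics(tp=7, fp=4, tn=58, fn=1)
```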

The receiver-operating characteristic (ROC) curves in Figure 6 show high separability, with all models exceeding ROC-AUC = 0.92. XGBoost had the greatest discrimination (AUC = 0.972), followed by Random Forest (AUC = 0.953) and logistic regression (AUC = 0.925). The steep initial rise of the ROC curves indicates strong separation between high- and low-fluoride groundwater even at low false-positive rates. This classification performance suggests that fluoride exceedance is associated with distinct geochemical characteristics, such as high alkalinity, high salinity, and cation-exchange conditions. Overall, the classification framework provides accurate, reliable, and well-calibrated exceedance predictions, supporting its use in groundwater risk screening and public-health monitoring.
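ROC-AUC equals the probability that a randomly chosen exceedance sample receives a higher score than a randomly chosen compliant sample, with ties counting one half (the Mann–Whitney formulation). The quadratic-time sketch below is adequate for small test sets; it is an illustration, not the code behind Figure 6.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (exceedance, compliant) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    correct = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5
    return correct / (len(pos) * len(neg))
```

A steep initial ROC rise corresponds to many positives being ranked above almost all negatives, which is exactly what this pairwise count rewards.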

3.4. Comparative Performance of ML and Deep Tabular Modeling

Comparative analysis of the classical ML models and the FT-Transformer revealed complementary advantages in modeling fluoride. Figure 7 shows that the training and validation losses converged within the first few epochs and then followed steady, parallel learning curves, indicating limited overfitting and good generalization. The gradual decrease in validation loss suggests that the FT-Transformer captured the underlying nonlinear hydrogeochemical structure rather than random noise, reflecting a well-optimized training process with sufficient regularization.
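The qualitative reading of Figure 7 can be expressed with a few numbers. This sketch, with a hypothetical helper and made-up loss values in the check, reports the epoch of minimum validation loss, the final train–validation gap, and how far the validation loss has risen above its minimum; a clear rise while the training loss keeps falling is the usual overfitting signature. The study does not state which stopping rule was used, so no particular early-stopping criterion is implied.

```python
from typing import Sequence


def learning_curve_summary(train_loss: Sequence[float], val_loss: Sequence[float]) -> dict:
    """Summarize per-epoch training and validation losses (equal-length sequences)."""
    best_epoch = min(range(len(val_loss)), key=lambda e: val_loss[e])
    return {
        "best_epoch": best_epoch,                                    # lowest validation loss
        "final_gap": val_loss[-1] - train_loss[-1],                  # generalization gap at the end
        "val_rise_from_best": val_loss[-1] - val_loss[best_epoch],   # > 0 signals late overfitting
    }
```

Parallel, steadily decreasing curves give a small, stable gap and a rise of zero; an overfitting run shows the best epoch well before the last one and a positive rise.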

Independent test evaluation also supports the model's predictive strength. Figure 8 shows that the FT-Transformer's observed–predicted values closely follow the 1:1 reference line, confirming high predictive power across the range of fluoride concentrations. The slight increase in dispersion at high fluoride concentrations is probably due to greater hydrogeochemical variability under strongly mineralized conditions, where complex equilibrium processes produce nonlinear responses, rather than to model instability. XGBoost and the FT-Transformer achieved similar regression accuracy (RMSE = 0.189 and 0.190 mg/L, respectively), indicating that both ensemble and deep tabular models effectively captured nonlinear hydrogeochemical interactions.
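Closeness to the 1:1 line can be summarized by regressing observed on predicted values (ideal slope 1, intercept 0) and by the mean residual (systematic bias). This is a minimal sketch under the assumption that the observed value is treated as the response; the function name is illustrative.

```python
from typing import Sequence, Tuple


def agreement_with_identity(observed: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Slope and intercept of observed ~ predicted (ordinary least squares) and mean residual."""
    n = len(observed)
    mean_pred = sum(predicted) / n
    mean_obs = sum(observed) / n
    sxx = sum((p - mean_pred) ** 2 for p in predicted)
    sxy = sum((p - mean_pred) * (o - mean_obs) for p, o in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_obs - slope * mean_pred
    bias = mean_obs - mean_pred  # mean of (observed - predicted)
    return slope, intercept, bias
```

A slope near 1 with a near-zero intercept and bias corresponds to points scattered evenly around the 1:1 line; increased dispersion at high concentrations widens the scatter without necessarily shifting these three numbers.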

Figure 9 summarizes the independent test comparison. XGBoost and the FT-Transformer showed comparable regression performance, while the FT-Transformer matched the sensitivity of the logistic-regression baseline (recall = 0.875) with higher precision and better probability calibration (Table 5), implying more reliable exceedance-risk estimation. Tree-based ensemble models thus provide robust and stable regression, whereas the FT-Transformer offers better-calibrated probabilistic classification. These results suggest that the strength of the hydrogeochemical signal, rather than model architecture, is the main limiting factor in fluoride prediction, and that the main added value of deep tabular learning lies in the calibration of risk predictions.

3.5. Hydrogeochemical Controls and Model Interpretability

Figure 10 shows the SHAP global feature importance, ranking magnesium, EC, and calcium as the most influential predictors, followed by total hardness, sulfate, chloride, bicarbonate, potassium, TDS, and nitrate. The strong influence of salinity indicators, particularly EC, is consistent with fluoride enrichment being linked to groundwater mineralization through prolonged water–rock contact. High magnesium and hardness indicate carbonate and silicate weathering, which tends to co-occur with alkaline conditions that promote fluoride release.
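SHAP attributions are Shapley values: each feature's average marginal contribution to a prediction over all coalitions of the other features. A global ranking such as Figure 10 is commonly the mean absolute SHAP value per feature (assumed here; the study's plotting settings are not stated). The sketch below computes exact Shapley values by enumeration with a simplified single-baseline value function (features outside a coalition are set to a reference point) and then aggregates them. It is not TreeSHAP, which replaces absent features by expectations over background data or tree paths, and enumeration scales as 2ⁿ, so it suits only a handful of features. The toy model, samples, and feature names are invented; note that the pure x0·x2 interaction is split equally between the two features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Tuple


def shapley_values(f: Callable[[List[float]], float], x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact single-baseline Shapley values by enumerating every coalition."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take their actual values; the rest stay at baseline.
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):  # coalition sizes 0..n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def global_importance(shap_rows: Sequence[Sequence[float]],
                      feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute SHAP value (rows = samples, columns = features)."""
    n = len(shap_rows)
    scores = [(name, sum(abs(row[j]) for row in shap_rows) / n)
              for j, name in enumerate(feature_names)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def toy_model(z: Sequence[float]) -> float:
    # Invented model: two additive effects plus an x0*x2 interaction.
    return 3.0 * z[0] - 2.0 * z[1] + z[0] * z[2]


samples = [[1.0, 2.0, 3.0], [2.0, 0.5, 1.0]]
rows = [shapley_values(toy_model, s, baseline=[0.0, 0.0, 0.0]) for s in samples]
ranking = global_importance(rows, ["x0", "x1", "x2"])
```

The values in each row sum to the model output minus the baseline output (the efficiency property), which is why SHAP rankings can be read as a decomposition of each prediction.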

Calcium has a strong regulatory effect, as expected when fluoride is controlled by fluorite (CaF₂) solubility: lower Ca²⁺ activity allows higher dissolved fluoride concentrations. SHAP dependence analysis (Figure 11) reveals strong nonlinear associations between the main hydrogeochemical variables and fluoride behavior.
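The fluorite control can be made quantitative. For CaF₂ ⇌ Ca²⁺ + 2F⁻, the solubility product is Ksp = {Ca²⁺}{F⁻}², so the fluoride activity at fluorite saturation is {F⁻} = √(Ksp/{Ca²⁺}): quadrupling calcium halves the equilibrium fluoride. The sketch below assumes log Ksp = −10.6 at 25 °C (a commonly used value; published values differ by several tenths of a log unit) and ideal behavior (activities equal to molar concentrations, no ionic-strength correction or fluoride complexation). Under these simplifications, the minimum, mean, and maximum Ca²⁺ in Table 1 (13.3, 56.8, and 128.3 mg/L) give saturation fluoride of roughly 5.2, 2.5, and 1.7 mg/L. Activity corrections for the relatively saline groundwater would raise these values, so they are indicative only.

```python
import math

LOG_KSP_FLUORITE = -10.6  # ASSUMED log10 Ksp of CaF2 at 25 °C; tabulated values vary
MOLAR_MASS_CA = 40.078    # g/mol
MOLAR_MASS_F = 18.998     # g/mol


def fluoride_at_fluorite_saturation(ca_mg_l: float, log_ksp: float = LOG_KSP_FLUORITE) -> float:
    """Dissolved F⁻ (mg/L) in equilibrium with fluorite for a given Ca²⁺ (mg/L).

    Ideal-solution simplification: activities are taken equal to molar
    concentrations (no ionic-strength correction, no F⁻ complexation).
    """
    ca_mol_l = ca_mg_l / 1000.0 / MOLAR_MASS_CA
    f_mol_l = math.sqrt(10.0 ** log_ksp / ca_mol_l)  # from Ksp = [Ca][F]^2
    return f_mol_l * MOLAR_MASS_F * 1000.0


for ca in (13.3, 56.8, 128.3):  # Table 1 minimum, mean, maximum Ca²⁺
    print(f"Ca = {ca:6.1f} mg/L -> F at saturation ≈ {fluoride_at_fluorite_saturation(ca):.2f} mg/L")
```

The inverse square-root dependence is the quantitative form of the negative calcium SHAP effect: Ca²⁺ removal by calcite precipitation or cation exchange raises the fluoride level the water can hold before fluorite saturates.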

Increasing calcium is associated with negative SHAP values, implying that fluoride concentrations are suppressed under high-Ca²⁺ conditions. This is consistent with fluorite solubility control, in which increased calcium restricts fluoride mobility. The interaction coloring suggests that sodium-rich water enhances fluoride under low-calcium conditions, in line with cation exchange of dissolved Ca²⁺ for sorbed Na⁺, which lowers calcium activity and favors fluorite dissolution. EC has a positive, monotonic relationship with its SHAP values, suggesting that greater mineralization leads to stronger fluoride enrichment, probably because of longer groundwater residence time and more extensive water–rock interaction. The TDS response is nonlinear: moderate mineralization increases predicted fluoride only slightly, whereas very high TDS can decrease it, possibly due to competitive ion effects, dilution, or mixed hydrochemical facies. Sodium has a positive effect, providing further evidence that ion exchange facilitates fluoride mobility. These trends support the view that fluoride contamination results from interacting geochemical processes rather than simple linear relationships. Interaction analysis (Figure 12) indicates that fluoride behavior is strongly coupled to calcium and bicarbonate: at low calcium levels, high bicarbonate yields higher SHAP interaction values, implying greater fluoride mobilization.

This response is consistent with carbonate equilibrium: bicarbonate buffering raises pH, promoting desorption of fluoride from mineral surfaces, and lowers Ca²⁺ activity through carbonate precipitation. Conversely, higher calcium relative to bicarbonate suppresses fluoride, consistent with control by fluorite saturation. These findings indicate that fluoride mobilization is regulated by multivariate geochemical equilibria rather than by any single dominant parameter.

Permutation importance analysis (Figure 13) independently confirms the SHAP-based rankings, with EC, magnesium, and calcium causing the largest degradation in regression performance when perturbed. The agreement between model-agnostic permutation importance and SHAP attribution suggests that the identified predictors reflect genuine hydrogeochemical drivers rather than model artifacts. Chloride, potassium, bicarbonate, and sulfate appear to make secondary contributions to fluoride variability but are not key controlling variables. Collectively, the interpretability results are highly consistent between ML attribution and established hydrogeochemical theory, suggesting that the predictive framework reflects physically meaningful controls on fluoride contamination.

The feature-importance results were interpreted in light of the correlation among salinity- and hardness-related variables. EC is a proxy for overall groundwater mineralization and may capture information shared by TDS, the major ions, and TH, while TH partly overlaps with Ca²⁺ and Mg²⁺. Therefore, a lower marginal importance of Ca²⁺ or TH does not indicate hydrogeochemical insignificance. Calcium remains a key control on fluoride mobility because low Ca²⁺ activity favors fluorite dissolution and limits fluoride removal through CaF₂ precipitation, while carbonate precipitation and ion exchange can further reduce Ca²⁺ activity and promote fluoride enrichment. Similar effects of correlated predictors on feature-attribution methods have been reported in previous studies [14,38].
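Permutation importance measures how much a fitted model's error grows when one feature's values are shuffled, breaking the feature's link to the target while preserving its marginal distribution. A plain-Python sketch follows, using RMSE as the score and an assumed 10 repeats; the helper names are illustrative, and scikit-learn offers an equivalent utility, `sklearn.inspection.permutation_importance`. Because shuffling one feature leaves its correlated partners (for example EC and TDS) intact, the model can partly recover the lost information, which is one reason the marginal importance of correlated predictors can understate their hydrogeochemical role.

```python
import random
from math import sqrt
from typing import Callable, List, Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in RMSE when each feature column is shuffled (rows are lists)."""
    rng = random.Random(seed)
    base = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace only column j; all other features keep their original values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in X_perm]) - base)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, whereas a feature the model relies on produces a clear RMSE increase, which is the "performance degradation when perturbed" read off Figure 13.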

3.6. Calibration, Reliability, and Robustness Analysis

Probability calibration was used to assess the reliability of the predicted fluoride exceedance risk. Figure 14 shows close correspondence between observed exceedance frequencies and predicted probabilities, indicating well-calibrated model behavior.
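A reliability diagram such as Figure 14 bins the predicted probabilities and compares the mean prediction in each bin with the observed exceedance frequency; points on the diagonal indicate perfect calibration. The sketch below uses equal-width bins; the bin count and binning strategy used for Figure 14 are not stated, so five bins is an assumption. scikit-learn's `sklearn.calibration.calibration_curve` implements the same idea.

```python
from typing import List, Sequence, Tuple


def calibration_bins(y_true: Sequence[int], p_hat: Sequence[float],
                     n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """(mean predicted probability, observed exceedance frequency, count)
    for each non-empty equal-width probability bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_hat):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[k].append((y, p))
    summary = []
    for members in bins:
        if members:
            n = len(members)
            summary.append((sum(p for _, p in members) / n,
                            sum(y for y, _ in members) / n,
                            n))
    return summary
```

With few exceedances in a test set, high-probability bins may hold only a handful of samples, so per-bin counts are worth reporting alongside the curve.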

The logistic regression baseline was moderately calibrated (Brier score = 0.0709), with slight underestimation in the intermediate probability range. The FT-Transformer showed better reliability, with its calibration curve closely following the diagonal in the high-risk probability range that matters most for decision support. These results indicate that the classification model provides credible probabilistic predictions that can support risk-based prioritization of groundwater wells.

Figure 15 shows model robustness in controlled noise-injection experiments. Prediction error (RMSE) remained stable across successive perturbation trials, with no systematic performance decline. This stability suggests that the model is robust to moderate measurement error and hydrochemical variability in groundwater data, and that its predictions are driven primarily by consistent hydrogeochemical signals rather than random patterns.
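A controlled noise-injection test can be sketched as follows. The text does not specify the protocol, so the following are assumptions: noise is zero-mean Gaussian with a standard deviation equal to a chosen fraction of each feature's standard deviation, and RMSE is averaged over several random trials per noise level. The `predict` callable stands in for any fitted model, and the helper names are illustrative.

```python
import random
from math import sqrt
from statistics import pstdev
from typing import Callable, Dict, List, Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def noise_robustness(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    noise_fractions: Sequence[float] = (0.0, 0.05, 0.10),
    n_trials: int = 5,
    seed: int = 0,
) -> Dict[float, float]:
    """Mean RMSE per noise level, with noise SD = fraction x feature SD (ASSUMED scheme)."""
    rng = random.Random(seed)
    sds = [pstdev([row[j] for row in X]) for j in range(len(X[0]))]
    results = {}
    for frac in noise_fractions:
        trial_errors = []
        for _ in range(n_trials):
            noisy = [[v + rng.gauss(0.0, frac * sds[j]) for j, v in enumerate(row)] for row in X]
            trial_errors.append(rmse(y, [predict(row) for row in noisy]))
        results[frac] = sum(trial_errors) / n_trials
    return results
```

Plotting the returned RMSE against the noise fraction gives a curve of the kind shown in Figure 15; a flat curve at small fractions indicates insensitivity to moderate measurement error.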

Integrating the robustness, interpretability, and calibration analyses reveals a consistent hydrogeochemical control on fluoride mobilization. The main factors associated with fluoride release are low calcium, high sodium, and high salinity (EC and TDS). These trends agree with well-known geochemical reactions, including mineral dissolution, ion exchange, and alkalinity buffering. The close agreement among SHAP values, permutation importance, and robustness tests supports the statistical validity and environmental relevance of the model. Overall, the framework can provide reliable and interpretable predictions of fluoride contamination risk in groundwater systems. Since groundwater is widely used for domestic drinking purposes in the study area, wells exceeding 1.5 mg/L fluoride should be prioritized for monitoring, public awareness, and suitable treatment or alternative water-supply options.

4. Discussion

This study shows that fluoride contamination in the Rechna Doab alluvial aquifer is mainly associated with hydrogeochemical evolution rather than a single-solute anomaly. Fluoride concentrations ranged from 0.00 to 3.39 mg/L, with a mean of 0.88 mg/L, and 70 samples (20%) exceeded the WHO drinking-water guideline of 1.5 mg/L. The positively skewed distribution indicates localized geogenic enrichment rather than uniform aquifer-wide contamination. Similar spatially heterogeneous fluoride patterns have been reported in Punjab aquifers, where fluoride enrichment is commonly linked to water–rock interaction, aquifer heterogeneity, and chemically evolved groundwater [11,39,40].

The large performance gap between linear and nonlinear models indicates that fluoride behavior is controlled by interacting hydrogeochemical processes. The linear baseline explained less than half of the fluoride variability, whereas Random Forest and XGBoost achieved substantially higher accuracy. On the independent test set, XGBoost showed strong predictive performance with R² = 0.877, RMSE = 0.189 mg/L, and MAE = 0.084 mg/L. This indicates that the predictor set captured a clear hydrochemical signal of fluoride mobilization. Such behavior is hydrogeochemically reasonable because fluoride enrichment in alluvial aquifers is often controlled by progressive mineral dissolution, ion exchange, alkalinity buffering, salinity evolution, and groundwater residence time.

The interpretability results further support the hydrogeochemical relevance of the models. SHAP and permutation-importance analyses identified EC, Mg²⁺, and Ca²⁺ as key predictors. EC reflects groundwater mineralization and may increase with longer residence time, evaporation, irrigation return flow, and water–rock interaction. Calcium has a direct regulatory role because higher Ca²⁺ activity can limit dissolved fluoride through fluorite precipitation, whereas Ca-depleted Na-HCO₃-type groundwater favors fluoride enrichment. Magnesium may indicate carbonate and silicate weathering, dolomite/calcite interaction, and ion-exchange processes under alkaline conditions. Agricultural inputs, atmospheric deposition, soil weathering, groundwater flow path, and residence time may also influence the Ca–Mg–Na balance and indirectly affect fluoride mobility. The fluoride occurrence observed in this study is comparable with previous groundwater studies in Pakistan and nearby Punjab aquifer systems. For example, previous studies in South Punjab and the wider Punjab plains reported fluoride exceedances above the WHO guideline and linked fluoride enrichment to water–rock interaction, fluoride-bearing mineral dissolution, ion exchange, low Ca²⁺ activity, and alkaline Na-HCO₃-type groundwater [12,14]. Machine-learning-based studies in Pakistan and Punjab, India, also showed that Random Forest and XGBoost-type models are effective for predicting groundwater fluoride because they capture nonlinear relationships among hydrochemical predictors [41,42]. Compared with previous studies, the present work adds calibration analysis, FT-Transformer modeling, and SHAP-based hydrogeochemical interpretation.

The exceedance-classification task is important for groundwater management because interventions are usually threshold-based. In cross-validation, all classifiers showed strong discrimination, confirming that hydrochemical signatures can distinguish high- and low-fluoride groundwater. On the independent test set, the FT-Transformer achieved ROC-AUC = 0.948, recall = 0.875, precision = 1.000, F1-score = 0.933, and Brier score = 0.021, indicating reliable classification and probability calibration. The confusion matrix showed only one false negative, which is important because missed high-fluoride wells may increase public-health risk. Robustness analysis also showed stable RMSE under noise perturbation, suggesting that the model predictions were driven mainly by consistent hydrogeochemical signals rather than random variation. Overall, the results demonstrate that combining hydrogeochemical indicators with explainable and calibration-aware machine-learning models can provide reliable support for groundwater fluoride screening. The framework is useful for sub-district-scale risk ranking, targeted monitoring, and prioritization of wells requiring treatment or alternative drinking-water supply.

Limitations

Although the proposed framework achieved accurate and interpretable predictions of groundwater fluoride contamination, some limitations should be noted. First, the dataset was limited to Tehsil Jaranwala in the Rechna Doab aquifer; therefore, model transferability to other aquifers requires external validation or retraining with local data. Second, the study mainly used physicochemical and major-ion parameters, while additional variables such as well depth, aquifer lithology, groundwater age, pumping rate, land use, seasonal water-table variation, and mineral saturation indices could improve interpretation and prediction. Third, machine-learning models identify statistical relationships but do not directly prove causality; therefore, future studies should combine ML with hydrogeochemical modeling, isotope analysis, and long-term monitoring. Finally, the relatively small number of high-fluoride samples and the lack of multi-season sampling may limit classification stability under changing recharge, evaporation, and irrigation-return-flow conditions.

5. Conclusions

This study demonstrates that integrating hydrogeochemical indicators with explainable machine learning models can support the reliable identification of fluoride contamination risk in the Rechna Doab alluvial aquifer. The results show that fluoride enrichment is spatially heterogeneous and mainly associated with groundwater mineralization, reduced Ca²⁺ activity, Mg²⁺ variation, ion exchange, carbonate equilibrium, and water–rock interaction. Nonlinear and deep tabular models provided strong predictive performance, while SHAP and permutation-importance analyses helped link model outputs with hydrogeochemical processes. These findings support the use of data-driven tools for targeted groundwater monitoring, risk-based screening, and drinking-water management in areas affected by fluoride. Future studies should include multi-season sampling, well-depth information, hydrogeological layers, mineral saturation indices, isotope data, and external validation datasets to improve model transferability and mechanistic interpretation.

Author Contributions

Conceptualization, N.G., X.L. and A.R.; methodology, N.G., X.L. and A.R.; coding, N.G., X.L. and A.R.; validation, X.L., Z.X. and A.R.; data visualization, N.G., X.L. and A.R.; writing (original draft), N.G. and A.R.; writing (review and editing), Z.X. and A.R.; supervision, X.L. and Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Fundamental Research Funds for the Central Universities (Grant No. 2682025CX157), the National Natural Science Foundation of China (Grant Nos. 42477200 and 42307089), and the Innovative Practice Bases of Geological Engineering and Surveying Engineering of Southwest Jiaotong University (Grant No. YJG-2022-JD04).

Data Availability Statement

The dataset presented in this study is not readily available because it forms part of an ongoing research project. Requests to access the dataset should be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Tiwari, K.K.; Raghav, R.; Pandey, R. Recent advancements in fluoride impact on human health: A critical review. Environ. Sustain. Indic. 2023, 20, 100305.
2. Podgorski, J.; Berg, M. Global analysis and prediction of fluoride in groundwater. Nat. Commun. 2022, 13, 4232.
3. Abdipour, H.; Azari, A.; Kamani, H.; Pirasteh, K.; Mostafapour, F.K.; Rayegnnakhost, S. Human health risk assessment for fluoride and nitrate contamination in drinking water of municipal and rural areas of Zahedan, Iran. Appl. Water Sci. 2025, 15, 47.
4. Arumugam, K.; Arpana, M. Environmental Impact Evaluation of Arsenic and Fluoride Contamination and Remediation Strategies. In Groundwater Depletion and Sustainability: A Methodology Utilizing Artificial Intelligence and Earth Observation Systems; Springer: Cham, Switzerland, 2026; pp. 219–238.
5. Nunoo, S.; Abu, M. Fluoride health risks and index-based scaling and corrosiveness potency assessment of groundwater in a peri-urban area in Ghana using multimethod techniques. Environ. Syst. Res. 2025, 14, 39.
6. Senila, M.; Levei, E.; Cadar, O.; Senila, L.R.; Roman, M.; Puskas, F.; Sima, M. Assessment of Availability and Human Health Risk Posed by Arsenic Contaminated Well Waters from Timis-Bega Area, Romania. J. Anal. Methods Chem. 2017, 2017, 3037651.
7. Ahmed, Z.; Gui, D.; Qi, Z.; Liu, Y. Poverty reduction through water interventions: A review of approaches in sub-Saharan Africa and South Asia. Irrig. Drain. 2022, 71, 539–558.
8. Wang, Z.; Guo, H.; Adimalla, N.; Pei, J.; Zhang, Z.; Liu, H. Co-occurrence of arsenic and fluoride in groundwater of Guide basin in China: Genesis, mobility and enrichment mechanism. Environ. Res. 2024, 244, 117920.
9. Li, J.; Wang, Y.; Zhu, C.; Xue, X.; Qian, K.; Xie, X.; Wang, Y. Hydrogeochemical processes controlling the mobilization and enrichment of fluoride in groundwater of the North China Plain. Sci. Total Environ. 2020, 730, 138877.
10. Khan, M.; Khan, W. Socioeconomic and recharge effect on spatial changes in the groundwater chemistry of Punjab, Pakistan: A multivariate statistical approach. SN Appl. Sci. 2020, 2, 1465.
11. Sadiq, M.; Eqani, S.A.M.A.S.; Nawaz, I.; Bangash, N.; Ilyas, S.; Podgorski, J.; Berg, M. Fluoride contamination of groundwater in different geological settings of Punjab Province, Pakistan: Levels, possible mechanisms and health risks. Sci. Total Environ. 2025, 1001, 180450.
12. Iqbal, J.; Su, C.; Wang, M.; Abbas, H.; Baloch, M.Y.J.; Ghani, J.; Ullah, Z.; Huq, M.E. Groundwater fluoride and nitrate contamination and associated human health risk assessment in South Punjab, Pakistan. Environ. Sci. Pollut. Res. 2023, 30, 61606–61625.
13. Yasar, A.; Javed, T.; Kausar, F.; Shamshad, J.; Hayat Khan, M.U.; Iqbal, R. Ground water toxicity due to fluoride contamination in Southwestern Lahore, Punjab, Pakistan. Water Supply 2021, 21, 3126–3140.
14. Khattak, J.A.; Farooqi, A.; Hussain, I.; Kumar, A.; Singh, C.K.; Mailloux, B.J.; Bostick, B.; Ellis, T.; van Geen, A. Groundwater fluoride across the Punjab plains of Pakistan and India: Distribution and underlying mechanisms. Sci. Total Environ. 2022, 806, 151353.
15. Zahi, F.; Drouiche, A.; Medjani, F.; Reghais, A.; Rahman, M.A.E.A.; Mecibah, I.; Scopa, A.; Fong, S.B.; Refaee, A. Integrating water quality indices and multivariate statistics for groundwater assessment in a Mediterranean coastal aquifer, Northeast Algeria. J. Hydrol. Reg. Stud. 2026, 64, 103200.
16. Monaci, F.; Baroni, D. Spatial distribution and ecological risk of potentially toxic elements in peri-urban soils of a historically industrialised area. Environ. Monit. Assess. 2025, 197, 948.
17. Xie, Z.; Liu, W.; Chen, S.; Yao, R.; Yang, C.; Zhang, X.; Li, J.; Wang, Y.; Zhang, Y. Machine learning approaches to identify hydrochemical processes and predict drinking water quality for groundwater environment in a metropolis. J. Hydrol. Reg. Stud. 2025, 58, 102227.
18. Rasool, U.; Yin, X.; Xu, Z.; Rasool, M.A.; Senapathi, V.; Hussain, M.; Siddique, J.; Trabucco, J.C. Mapping of groundwater productivity potential with machine learning algorithms: A case study in the provincial capital of Baluchistan, Pakistan. Chemosphere 2022, 303, 135265.
19. Pham, Q.B.; Kumar, M.; Di Nunno, F.; Elbeltagi, A.; Granata, F.; Islam, A.R.M.T.; Talukdar, S.; Nguyen, X.C.; Ahmed, A.N.; Anh, D.T. Groundwater level prediction using machine learning algorithms in a drought-prone area. Neural Comput. Appl. 2022, 34, 10751–10773.
20. Basharat, U.; Zhang, W.; Han, C.; Khan, S.H.; Abbasi, A.; Mahroof, S.; Li, S. Optimizing machine learning methods for groundwater quality prediction: Case study in District Bagh, Azad Kashmir, Pakistan. Ecotoxicol. Environ. Saf. 2025, 302, 118610.
21. Kim, S.; Alizamir, M.; Heddam, S.; Chang, S.W.; Chung, I.-M.; Kisi, O.; Kulls, C. Development of the machine learning and deep learning models with SHAP strategy for predicting groundwater levels in South Korea. Sci. Rep. 2025, 15, 35523.
22. Ye, F.; Xiao, F.; Zhan, A.; Chu, Y.; Tian, S.; Zhang, X. QSAR-based prediction of acute inhalation toxicity and SHAP interpretability analysis of fluorocarbon environmental-friendly insulating gases. Environ. Res. 2025, 285, 122340.
23. Elnakar, H. Automated machine learning and SHAP-based interpretation of PFOA removal via electrochemical oxidation. Desalin. Water Treat. 2025, 325, 101598.
24. Shah, P.; Shukla, M.; Dholakia, N.H.; Gupta, H. Predicting cardiovascular risk with hybrid ensemble learning and explainable AI. Sci. Rep. 2025, 15, 17927.
25. Arshad, A.; Zhang, Z.; Zhang, W.; Gujree, I. Long-term perspective changes in crop irrigation requirement caused by climate and agriculture land use changes in Rechna Doab, Pakistan. Water 2019, 11, 1567.
26. Inam, A.; Adamowski, J.; Prasher, S.; Albano, R. Parameter estimation and uncertainty analysis of the Spatial Agro Hydro Salinity Model (SAHYSMOD) in the semi-arid climate of Rechna Doab, Pakistan. Environ. Model. Softw. 2017, 94, 186–211.
27. Razzaq, A.; Zhou, Y.; Shahzad, M.A.; Wang, L.; Eliw, M. A tale of two countries: The potential of managed aquifer recharge in Pakistan and Egypt. In Managed Aquifer Recharge in MENA Countries: Developments, Applications, Challenges, Strategies, and Sustainability; Springer: Cham, Switzerland, 2024; pp. 165–183.
28. Awais, M.; Arshad, M.; Ahmad, S.R.; Nazeer, A.; Waqas, M.M.; Aziz, R.; Shakoor, A.; Rizwan, M.; Chauhdary, J.N.; Mehmood, Q.; et al. Simulation of groundwater flow dynamics under different stresses using MODFLOW in Rechna Doab, Pakistan. Sustainability 2022, 15, 661.
29. Sadeak, S.; Maria, F.Z.; Al Amin, M.; Chowdhury, T.; Alam, M.J.; Mia, M.B.; Ahmed, K.M.; Khan, M.R. Prioritizing geochemical drivers of groundwater quality and health risks in coastal aquifers of Bangladesh using machine learning algorithms. Environ. Geochem. Health 2025, 47, 493.
30. Calle, P.; Bates, A.; Reynolds, J.C.; Liu, Y.; Cui, H.; Ly, S.; Wang, C.; Zhang, Q.; de Armendi, A.J.; Shettarg, S.S. Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models. Comput. Methods Programs Biomed. 2025, 272, 109063.
31. Asal, B.; Yalciner, B. Benchmarking TabNet, NODE, and FT-Transformer for Software Defect Prediction: An Empirical Comparison and Explainability Analysis. IEEE Access 2026, 14, 11660–11681.
32. Henna, S.; Sakhamuri, M.R.; Moitra, L.G.; Rathnayake, U. Game-Theoretic Explainable AI for Ensemble-Boosting Models in Early Malware Prediction for Computer Systems. Int. J. Comput. Intell. Syst. 2025, 18, 318.
33. Zhang, L.; Jiang, L. Game-theoretic SHAP-driven interpretable forecasting of air cargo demand using Bayesian-optimized random forests. Front. Phys. 2025, 13, 1705687.
34. Demir Yetiş, A.; İlhan, N.; Kara, H. Integrating deep learning and regression models for accurate prediction of groundwater fluoride contamination in old city in Bitlis province, Eastern Anatolia Region, Türkiye. Environ. Sci. Pollut. Res. 2024, 31, 47201–47219.
35. Chicco, D.; Jurman, G. The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification. BioData Min. 2023, 16, 4.
36. Stehouwer, N.; Rowland-Seymour, A.; Gruppen, L.; Albert, J.M.; Qua, K. Validity and reliability of Brier scoring for assessment of probabilistic diagnostic reasoning. Diagnosis 2025, 12, 53–60.
37. Miao, J.; Zhu, W. Precision–recall curve (PRC) classification trees. Evol. Intell. 2022, 15, 1545–1569.
38. Aas, K.; Jullum, M.; Løland, A. Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artif. Intell. 2021, 298, 103502.
39. Raza, I.; Khalid, P.; Ehsan, M.I.; Ahmad, Q.A.; Khurram, S.; Zainab, R.; Farooq, S. Geospatial interpolation and hydro-geochemical characterization of alluvial aquifers in the Thal Desert, Punjab, Pakistan. PLoS ONE 2024, 19, e0307025.
40. Hagagg, K.; Mohamed, F.A. Integrated insights on causal evaluation of shallow aquifer quality using hydrogeochemistry and isotopic approaches. Discov. Sustain. 2025, 6, 1369.
41. Ling, Y.; Podgorski, J.; Sadiq, M.; Rasheed, H.; Eqani, S.A.M.A.S.; Berg, M. Monitoring and prediction of high fluoride concentrations in groundwater in Pakistan. Sci. Total Environ. 2022, 839, 156058.
42. Benyoussef, S.; Arabi, M.; El Yousfi, Y.; Makkaoui, M.; Gueddari, H.; El Ouarghi, H.; Abdaoui, A.; Ghalit, M.; Zegzouti, Y.F.; Azirar, M. Assessment of groundwater quality using hydrochemical process, GIS and multivariate statistical analysis at central Rif, North Morocco. Environ. Earth Sci. 2024, 83, 515.

Figure 1. Spatial distribution of groundwater sampling locations. (a) provinces of Pakistan; (b) location of study area in Punjab; (c) sampling points and elevation in the study area, stars indicate the locations of major towns.

Figure 2. The overview of the proposed methodology.

Figure 3. Groundwater fluoride distribution based on the WHO drinking-water guideline.

Figure 4. Observed versus predicted fluoride showing strong predictive performance. The dashed line shows the 1:1 reference line, and the inset shows the residual distribution.

Figure 5. Test-set confusion matrix showing high accuracy with minimal misclassification.

Figure 6. ROC curves showing strong discrimination, with the highest AUC for XGBoost.

Figure 7. FT-Transformer training dynamics.

Figure 8. Independent test prediction performance of FT-Transformer.

Figure 9. Regression and classification comparisons showing similar model performance.

Figure 10. SHAP global importance ranking of the top hydrochemical predictors of fluoride.

Figure 11. SHAP dependence plots show nonlinear effects on fluoride prediction.

Figure 12. SHAP interaction showing coupled Ca²⁺–HCO₃⁻ influence on fluoride prediction.

Figure 13. Permutation importance ranking identifies key predictors of fluoride.

Figure 14. Calibration curves show improved FT-Transformer probability calibration. The solid blue lines show the model calibration curves, and the dashed diagonal line represents perfect calibration.

Figure 15. Robustness analysis shows stable RMSE under feature perturbations.

Table 1. Descriptive statistics of groundwater physicochemical and hydrochemical parameters.

Parameters	Mean	Median	Min	Max	Standard Deviation	Coefficient of Variation (%)	WHO Value
pH	7.27	7.22	6.50	8.50	0.32	4.4	6.5–8.5
EC	2198	1978	221	6183	1096	49.8	1500
Turbidity	2.65	0.60	0.00	125.20	14.78	557.1	≤5
TDS	1419	1285	127	3942	727	51.2	1000
Calcium (Ca²⁺)	56.8	51.4	13.3	128.3	28.3	49.8	75
Magnesium (Mg²⁺)	43.4	39.1	8.5	122.4	23.0	52.9	50
Total Hardness (TH)	313	307	42	804	148	47.2	500
Bicarbonate (HCO₃⁻)	558	502	100	1520	233	41.7	500
Alkalinity (Alk)	11.2	10.0	1.8	30.5	4.7	41.7	120
Chloride (Cl⁻)	264	193	1	1180	231	87.7	250
Potassium (K⁺)	17.2	10.0	1.0	286	29.1	169.3	12
Sodium (Na⁺)	448	419.5	0	1755	297	66.2	200
Sulfate (SO₄²⁻)	383	304	12	1212	235	61.3	250
Nitrate (NO₃⁻)	2.43	1.35	0.00	9.88	2.44	100.7	50
Fluoride (F⁻)	0.88	0.86	0.00	3.39	0.54	61.3	1.5
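The coefficient of variation in Table 1 is the standard deviation expressed as a percentage of the mean, CV = 100 × SD / mean. A short check that recomputes it for a few rows from the tabulated values; small differences from the table reflect rounding of the published means and standard deviations.

```python
from typing import Dict, Tuple

# Mean, standard deviation, and reported CV (%) copied from Table 1 for a few parameters.
TABLE1_SUBSET: Dict[str, Tuple[float, float, float]] = {
    "pH": (7.27, 0.32, 4.4),
    "TDS": (1419, 727, 51.2),
    "Ca2+": (56.8, 28.3, 49.8),
    "F-": (0.88, 0.54, 61.3),
}


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Coefficient of variation in percent: 100 * SD / mean."""
    return 100.0 * sd / mean


if __name__ == "__main__":
    for name, (mean, sd, reported) in TABLE1_SUBSET.items():
        print(f"{name}: recomputed {coefficient_of_variation(mean, sd):.1f}% vs reported {reported}%")
```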

Table 2. Cross-validated fluoride prediction performance.

Model	R²	RMSE (mg/L)	MAE (mg/L)
Linear Regression	0.463	0.382	0.272
Random Forest	0.885	0.178	0.111
XGBoost	0.912	0.156	0.091
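The regression scores in Tables 2 and 3 follow the usual definitions: R² = 1 − SS_res / SS_tot, RMSE is the square root of the mean squared residual, and MAE is the mean absolute residual, with RMSE and MAE in the units of fluoride (mg/L). A standard-library sketch; the values in the usage lines are illustrative, not study data.

```python
from math import sqrt
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R², RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # total sum of squares about the mean
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


if __name__ == "__main__":
    observed = [0.45, 0.86, 1.20, 1.75, 2.40]   # illustrative fluoride values (mg/L)
    predicted = [0.50, 0.80, 1.25, 1.60, 2.30]  # hypothetical predictions (mg/L)
    print(regression_metrics(observed, predicted))
```

Because RMSE squares residuals, a few large errors at high concentrations raise it more than MAE, which is one reason the two can rank models differently (compare the MAE columns of Table 3).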

Table 3. Independent test performance of XGBoost and FT-Transformer.

Model	R²	RMSE (mg/L)	MAE (mg/L)
XGBoost Regression	0.877	0.189	0.084
FT-Transformer	0.877	0.190	0.148

Table 4. Classification performance under nested cross-validation.

Models	AUC	Recall	Precision	F1-Score	Accuracy
Logistic Regression	0.969	0.960	0.615	0.739	0.936
Random Forest	0.997	0.846	0.942	0.881	0.918
XGBoost	0.989	0.846	0.876	0.854	0.913
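Nested cross-validation wraps an inner loop (hyperparameter selection) inside an outer loop (performance estimation), so each outer test fold is never used during tuning. The sketch below shows only the index bookkeeping with shuffled, contiguous k-fold splits; the fold counts and seed are assumptions, and a stratified splitter may be preferable for the imbalanced exceedance classes.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def kfold(indices: Sequence[int], k: int) -> Iterator[Split]:
    """Yield (train, test) index lists for k contiguous, near-equal folds of `indices`."""
    n = len(indices)
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        test = list(indices[start:start + size])
        train = list(indices[:start]) + list(indices[start + size:])
        yield train, test
        start += size


def nested_splits(n_samples: int, outer_k: int = 5, inner_k: int = 3,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits); inner splits use only outer_train."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # shuffle once so folds do not follow sampling order
    for outer_train, outer_test in kfold(order, outer_k):
        # Hyperparameters would be tuned on these inner folds, then the refit model scored on outer_test.
        yield outer_train, outer_test, list(kfold(outer_train, inner_k))


if __name__ == "__main__":
    for outer_train, outer_test, inner in nested_splits(20):
        print(len(outer_train), len(outer_test), [len(v) for _, v in inner])
```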

Table 5. Independent test classification performance.

Models	ROC-AUC	Recall	Precision	F1-Score	Accuracy	Brier Score
Logistic Regression	0.925	0.875	0.636	0.739	0.905	0.070
FT-Transformer	0.948	0.875	1.000	0.933	0.937	0.021
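The classification columns in Tables 4 and 5 follow from confusion-matrix counts: recall = TP/(TP + FN), precision = TP/(TP + FP), F1 is their harmonic mean, and the Brier score is the mean squared difference between the predicted exceedance probability and the 0/1 outcome (lower is better). As a consistency check, the FT-Transformer's test precision of 1.000 and recall of 0.875 give F1 = 2(1.000)(0.875)/1.875 ≈ 0.933, matching Table 5. A standard-library sketch:

```python
from typing import Dict, Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Recall, precision, F1 and accuracy for binary labels (1 = exceedance)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Returning 0.0 for an undefined ratio is a convention chosen here, not taken from the paper.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1_score(precision, recall),
        "accuracy": (tp + tn) / len(pairs),
    }


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


if __name__ == "__main__":
    print(round(f1_score(1.000, 0.875), 3))  # FT-Transformer test values from Table 5 → 0.933
```

Note that F1 values averaged across cross-validation folds (as in Table 4) need not equal the harmonic mean of the averaged precision and recall.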

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gulzar, N.; Liao, X.; Xu, Z.; Rehman, A. Hydrogeochemical Controls and Explainable Machine Learning for Reliable Prediction of Fluoride Contamination in Groundwater. Hydrology 2026, 13, 144. https://doi.org/10.3390/hydrology13060144


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.


